Markov Switching
Yong Song and Tomasz Woźniak∗
Department of Economics, University of Melbourne
Abstract
Markov switching models are a popular family of models that introduce time variation in the parameters in the form of their state- or regime-specific values. Importantly, this time variation is governed by a discrete-valued latent stochastic process with limited memory. More specifically, the current value of the state indicator is determined only by the value of the state indicator from the previous period, thus the Markov property, and by the transition matrix. The latter characterizes the properties of the Markov process by determining the probability with which each of the states can be visited next period, given the state in the current period. This setup gives rise to the two main advantages of Markov switching models: the estimation of the probability of state occurrences in each of the sample periods by filtering and smoothing methods, and the estimation of the state-specific parameters. These two features open the possibility for improved interpretations of the parameters associated with specific regimes, combined with the corresponding regime probabilities, as well as for improved forecasting performance based on persistent regimes and the parameters characterizing them.

The most commonly applied models from this family are Markov switching models that presume a finite number of regimes and exogeneity of the Markov process, defined as its independence of the model's unpredictable innovations. In many such applications, desired properties of the Markov switching model have been obtained either by imposing appropriate restrictions on the transition probabilities or by making these probabilities time-dependent, determined by explanatory variables or functions of the state indicator. One recent extension of this basic specification is the infinite hidden Markov model, which grants great flexibility and improved forecasting performance by allowing the number of states to go to infinity. Another, the endogenous Markov switching model, explicitly relates the state indicator to the model's innovations, making it more interpretable and offering promising avenues for future developments.
Keywords:
Transition Probabilities, Exogenous Markov Switching, Infinite Hidden Markov Model, Endogenous Markov Switching, Markov Process, Finite Mixture Model, Change-point Model, Non-homogeneous Markov Switching, Time Series Analysis, Business Cycle Analysis

∗ Contact details:
Song: [email protected], Wo´zniak: [email protected] prepared for the Oxford Research Encyclopedia of Economics and Finance.The authors are grateful to Bill Griffiths, Chenghan Hou, Zhuo Li, Vance Martin, and Qiao Yang for theiruseful comments that improved the quality of this article.c (cid:13) . Introduction
The Markov switching (MS) methodology was introduced by the seminal work of Hamilton (1989). It is directly applicable to time series analysis for its dynamic nature. This section shows the benchmark model and the corresponding notation for the data and model parameters. For a more comprehensive textbook exposition, see Chapter 22 of Hamilton (1994), Krolzig (1997), Kim et al. (1999), and Frühwirth-Schnatter (2006).

The dependent variable at time t is denoted by y_t for t = 1, ..., T, where T is the number of periods in the sample. y_t can be a scalar, vector, or matrix. The independent variable is denoted by x_t, and it can likewise be a scalar, vector, or matrix. x_t does not need to have the same dimension as y_t, and it might include lagged values of y.

A latent state at time t is unobservable to an econometrician and denoted by s_t. It takes the value k ∈ {1, 2, ..., K}, where K is a positive integer representing the total number of states. Such a latent variable s_t indicates in which state the system is at time t. Hence, it can be called the state indicator or regime indicator; the names state and regime are used interchangeably in the literature.

From the definition, the state indicator is a scalar. However, it generalizes any vector or matrix representation of states as long as the number of states is finite. For example, assume a vector of states z_t = (z_{1t}, z_{2t}), in which z_{1t} takes values from the set {1, ..., K_1} and z_{2t} from {1, ..., K_2}. Such a tuple z_t can simply be collapsed to a scalar indicator s_t taking values 1, ..., K_1 K_2, with each value of s_t corresponding to one vector z_t. Sometimes a vector form of the state variable is preferred in specific applications; however, knowing the scalar representation is generally enough to learn the basic framework.

The dynamics of the state indicator are governed by a Markov process. The probability distribution of s_t given the whole path {s_{t-1}, s_{t-2}, ..., s_0} depends only on the most recent state s_{t-1}.
Define such a transition probability as

Pr(s_t = j | s_{t-1} = i, s_{t-2}, ..., s_0) = Pr(s_t = j | s_{t-1} = i) = p_ij,   (1)

where i, j = 1, ..., K. Given that the process is in state i in period t − 1, the probability that the state will switch to state j in period t is equal to p_ij. A transition matrix organizes these transition probabilities in a K × K matrix and is defined as

P = [p_ij]_{K×K},   (2)

where p_ij is the element in the ith row and jth column, such that the elements in each of the rows of matrix P sum to one.

A vector of unconditional state probabilities, denoted by π ≡ E[s_t], is defined from

P′π = π.   (3)

The equation above indicates the time-invariance of the distribution π: an iteration over one period, performed by premultiplication by the transpose of the transition matrix, P′, does not change the vector π. The solution to equation (3) for π given ı′_K π = 1, where ı_K is a K-vector of ones, is given in Hamilton (1994, Chapter 22) and expresses π as a function of P. Define the (K + 1) × K matrix

P̄ = [ I_K − P′
       ı′_K ].

Hamilton's solution for π is given by the (K + 1)th column of (P̄′P̄)^{-1} P̄′.

The distribution of the initial state at t = 0, denoted by s_0, is represented by the following K-vector:

π_0 ≡ [Pr(s_0 = k)]_{K×1}.   (4)

For an ergodic Markov process, the initial distribution can simply be set to the stationary distribution π. If the Markov process is non-stationary, there typically exists theory-guided information for π_0. For example, a change-point model requires that the process begins in the first state, namely s_0 = 1.

A measurement equation lays out the probability law of the observations and is given by

y_t ∼ F(x_t, s_t),   (5)

where F represents the distribution of y_t conditional on the observation x_t and the latent state s_t. There is no restriction on F. It can be a discrete, continuous, or mixture distribution depending on the structure of the data y_t.
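Returning to the unconditional probabilities in equation (3), Hamilton's solution for π can be sketched numerically in a few lines; the function name and the two-state transition matrix below are illustrative:

```python
import numpy as np

def ergodic_probabilities(P):
    """Stationary distribution pi solving P'pi = pi and 1'pi = 1 via
    Hamilton's (1994, Ch. 22) formula: the last column of (A'A)^{-1}A',
    where A stacks I_K - P' on top of a row of ones."""
    K = P.shape[0]
    A = np.vstack([np.eye(K) - P.T, np.ones((1, K))])   # (K+1) x K
    return np.linalg.solve(A.T @ A, A.T)[:, -1]

# Illustrative two-state transition matrix
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = ergodic_probabilities(P)   # (2/3, 1/3) for this P
```

Because π solves the stacked system exactly, the least-squares formula returns the exact stationary distribution whenever the Markov process is ergodic.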
For example, consider a time series of asset returns and assume that each state admits an autoregressive process of order one with Gaussian innovations. Then F can be expressed as

F(x_t, s_t) ≡ N(μ_{s_t} + β_{s_t} y_{t-1}, σ²_{s_t}),

where μ_{s_t} + β_{s_t} y_{t-1} is the mean and σ²_{s_t} the variance of this conditional distribution given s_t and y_{t-1}. The same assumption also implies a regression form of equation (5) given by

y_t = μ_{s_t} + β_{s_t} y_{t-1} + σ_{s_t} ε_t,   ε_t ∼ N(0, 1).   (6)

Equations (1) and (5) comprise the foundation of the Markov switching framework. Together with the initial condition (4), the likelihood function p(Y | Θ) is available through the filtering technique from Hamilton (1989), called the
Hamilton filter. The notation Y denotes the collection of all y_t for t = 1, ..., T, and Θ is the collection of all time-invariant parameters; for instance, Θ ≡ {μ_k, β_k, σ_k}_{k=1}^K in the example above.

The Hamilton filter gives the conditional distribution of the state s_t given the data up to time t, Y_t, denoted by p(s_t | Y_t). It is used to compute the one-period-ahead probability density of y_{t+1} from the following formula:

p(y_{t+1} | Y_t) = Σ_{s_{t+1}=1}^K p(y_{t+1}, s_{t+1} | Y_t)   (7a)
 = Σ_{s_{t+1}=1}^K p(y_{t+1} | s_{t+1}, Y_t) p(s_{t+1} | Y_t)   (7b)
 = Σ_{s_{t+1}=1}^K [ p(y_{t+1} | s_{t+1}, Y_t) Σ_{s_t=1}^K p(s_{t+1}, s_t | Y_t) ]   (7c)
 = Σ_{s_{t+1}=1}^K [ p(y_{t+1} | s_{t+1}, Y_t) Σ_{s_t=1}^K p(s_{t+1} | s_t, Y_t) p(s_t | Y_t) ]   (7d)
 = Σ_{s_{t+1}=1}^K [ p(y_{t+1} | s_{t+1}, Y_t) Σ_{s_t=1}^K p(s_{t+1} | s_t) p(s_t | Y_t) ]   (7e)
 = Σ_{s_{t+1}=1}^K Σ_{s_t=1}^K p(y_{t+1} | s_{t+1}, Y_t) p(s_{t+1} | s_t) p(s_t | Y_t),   (7f)

where p(y_{t+1} | s_{t+1}, Y_t) is obtained from the measurement equation (5) and p(s_{t+1} | s_t) is the transition probability in equation (1). The derivation above utilizes the forecasted state probability, conveniently decomposed into the transition and filtered probabilities:

p(s_{t+1} | Y_t) = Σ_{s_t=1}^K p(s_{t+1} | s_t) p(s_t | Y_t),   (8)

where the filtered probability for s_{t+1} is given by

p(s_{t+1} = k | Y_{t+1}) = p(s_{t+1}, y_{t+1} | Y_t) / p(y_{t+1} | Y_t)
 = p(y_{t+1} | s_{t+1}, Y_t) p(s_{t+1} | Y_t) / p(y_{t+1} | Y_t)
 = p(y_{t+1} | s_{t+1}, Y_t) [ Σ_{s_t=1}^K p(s_{t+1} | s_t, Y_t) p(s_t | Y_t) ] / p(y_{t+1} | Y_t).

The filtered probability of s_{t+1} is easy to compute as long as the filtered probability of s_t is known, and the conditional density p(y_{t+1} | Y_t) has been derived above. Given the initial distribution of the state, π_0, the filtered probabilities of s_t and the associated one-period-ahead probability densities can be calculated iteratively.
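The recursion in equations (7) and (8) can be sketched for the Gaussian MS-AR(1) example of equation (6); the following minimal Python implementation assumes scalar y_t, state-specific parameters (μ_k, β_k, σ_k), and illustrative parameter values of our choosing:

```python
import numpy as np

def hamilton_filter(y, mu, beta, sigma, P, pi0):
    """Hamilton filter for the Gaussian MS-AR(1) of equation (6).
    Returns the filtered probabilities p(s_t | Y_t) and the log of the
    likelihood (9), with the states integrated out as in equations (7)-(8)."""
    T, K = len(y), len(pi0)
    filtered = np.empty((T, K))
    loglik = 0.0
    p_prev, y_prev = pi0, 0.0            # p(s_0) and a pre-sample value of y
    for t in range(T):
        pred = P.T @ p_prev              # forecasted probabilities, eq. (8)
        m = mu + beta * y_prev           # state-specific conditional means
        dens = np.exp(-0.5 * ((y[t] - m) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        joint = dens * pred              # p(y_t, s_t | Y_{t-1})
        p_y = joint.sum()                # p(y_t | Y_{t-1}), eq. (7)
        filtered[t] = joint / p_y        # filtered probabilities p(s_t | Y_t)
        loglik += np.log(p_y)
        p_prev, y_prev = filtered[t], y[t]
    return filtered, loglik

# Illustrative two-state example (parameter values are ours)
y = np.array([0.1, 0.5, -0.2, 0.8, 0.3])
filtered, loglik = hamilton_filter(
    y,
    mu=np.array([0.0, 1.0]), beta=np.array([0.5, 0.2]),
    sigma=np.array([0.5, 2.0]),
    P=np.array([[0.9, 0.1], [0.2, 0.8]]),
    pi0=np.array([0.5, 0.5]),
)
```

Each pass of the loop performs one forecast step (8) and one update step, so a single sweep over the sample delivers both the filtered probabilities and the likelihood (9).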
The likelihood function is constructed as the product of conditional probability densities:

p(Y | Θ) = Π_{t=1}^T p(y_t | Y_{t-1}, Θ),   (9)

where Y is treated as known and the state indicators s_t, for t = 1, ..., T, are integrated out. An analytical expression for the likelihood function, based on equations (9) and (7b), is available and can be found in Frühwirth-Schnatter (2006).

1.4. Parameter Estimation

Maximum likelihood estimation is based on the iterative expectation-maximization (EM) algorithm (see Hamilton, 1990). In each of its iterations, a filtering-smoothing algorithm is used to propose the current estimate of S, the collection of all s_t for t = 1, ..., T, and a maximization step is applied to compute estimates of Θ and P. Maximum likelihood estimation is straightforward if regularity conditions are satisfied; however, for larger models it might become cumbersome due to an unbounded likelihood function.

Bayesian estimation relies on a data augmentation technique that requires the specification of a complete-data likelihood function p(Y, S | Θ) = p(Y | S, Θ) p(S | Θ), where the objects on the right-hand side are easily obtainable without integration. The complete-data likelihood function is subsequently used to specify the full conditional posterior distributions p(Θ | Y, S) and p(S | Y, Θ). Sampling from the joint posterior distribution of the parameters and states is performed via Markov chain Monte Carlo (MCMC) methods. Since the former conditional distribution takes S as given, it can be sampled from using standard techniques. Sampling from the latter relies on the forward-filtering backward-sampling (FFBS) method of Chib (1996).

The first results on the asymptotic normality of the maximum likelihood estimator of the MS model parameters were provided by Lindgren (1978) and Lehmann (1983), whereas the finite-sample properties of this estimator are analyzed in Psaradakis & Sola (1998).
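The backward-sampling half of the FFBS method described above can be sketched as follows. The forward pass is assumed to have produced the matrix of filtered probabilities, and the key identity is p(s_t | s_{t+1}, Y_t) ∝ p(s_t | Y_t) p_{s_t, s_{t+1}}; the numerical values are illustrative:

```python
import numpy as np

def backward_sample(filtered, P, rng):
    """Backward-sampling step of FFBS: draw the whole path S from
    p(S | Y, Theta), given the filtered probabilities p(s_t | Y_t)
    (rows of `filtered`) produced by the Hamilton filter."""
    T, K = filtered.shape
    s = np.empty(T, dtype=int)
    s[-1] = rng.choice(K, p=filtered[-1])          # s_T ~ p(s_T | Y_T)
    for t in range(T - 2, -1, -1):
        # p(s_t | s_{t+1}, Y_t) is proportional to p(s_t | Y_t) * p_{s_t, s_{t+1}}
        w = filtered[t] * P[:, s[t + 1]]
        s[t] = rng.choice(K, p=w / w.sum())
    return s

rng = np.random.default_rng(0)
filtered = np.array([[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]])  # illustrative
P = np.array([[0.9, 0.1], [0.2, 0.8]])
draw = backward_sample(filtered, P, rng)
```

One such draw of S per MCMC iteration, alternated with a draw of Θ and P given S, yields the Gibbs sampler outlined in the text.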
The frequentist solution to the problem of selecting the number of states K of the Markov process relies on information criteria such as, for instance, the Akaike Information Criterion (see the detailed study by Psaradakis & Spagnolo, 2003). Testing the hypothesis that K = 1 against the alternative that K = 2 is highly cumbersome since the MS model is not identified under the null hypothesis, and the solutions require sophisticated inferential methods, some of which are provided by Carrasco et al. (2014) and more recently by Meitz & Saikkonen (2017).

Bayesian model selection based on marginal data densities provides a solution to the problem of determining the number of states, including a model with K = 1 (see Frühwirth-Schnatter, 2006). However, the main challenge for correct Bayesian inference in MS models is the problem of label switching, defined as the invariance of the likelihood function to the labeling of the states. Consider the example in equation (6) with two states characterized by state-specific parameter vectors of state A and state B, denoted respectively by Θ_A and Θ_B. The label switching problem states that, irrespective of whether the vector of parameters (Θ_1, Θ_2) is set to (Θ_A, Θ_B) or (Θ_B, Θ_A), the value of the likelihood function evaluated at these parameters stays invariant. Frühwirth-Schnatter (2001) proposes to analyze the multimodal global shape of the likelihood function and the posterior distribution (see Frühwirth-Schnatter, 2004, for the application of this approach to the computation of marginal data densities). Imposing ordering restrictions on the state-dependent parameters of the model that provide a unique classification of the states is a solution that is only applicable if such restrictions do not bind the posterior distribution. Alternatively, Geweke (2007) proposes to base inference on label-switching-invariant characteristics such as predictive densities.
The interpretation of MS models is based on the combined analysis of the parameter estimates and one of the available estimates of the state probabilities. The latter should be chosen amongst the forecasted, filtered, and smoothed probabilities, denoted by Pr[s_t | y_{t-1}, y_{t-2}, ...], Pr[s_t | y_t, y_{t-1}, ...], and Pr[s_t | Y], respectively, that are obtained through the FFBS algorithm, depending on the particular application and the objective of the investigation.

The first application of MS models in economics was the analysis of business cycles and can be presented as follows. Consider data on gross domestic product growth rates to which the autoregressive model from equation (6) is fitted. The business cycle interpretation of the model was proposed by Hamilton (1989). It was acclaimed because the state-dependent constant term estimates were positive and negative, respectively, in the two states, whose smoothed probabilities had high values in the periods commonly referred to as economic expansions and recessions, respectively. Similar reasoning was applied to financial markets characterized by bull and bear markets occurring one after another, as in Hamilton & Lin (1996).

Another example is a multivariate model of the effects of monetary policy on the real economy in the U.S., with conditional heteroskedasticity modeled with the MS process, proposed by Sims & Zha (2006). In this model, the successive volatility states had high values of smoothed probabilities in the periods corresponding to the terms of successive Chairs of the Federal Reserve. For instance, the state with the highest volatility had high probabilities of occurrence during Paul Volcker's chairmanship, whereas the lowest volatility state spanned the term of Alan Greenspan.

Many applications in economics rely on some concept of causality.
Two popular such concepts are Granger causality, proposed by Granger (1969) and Sims (1980), which relates the causal link between variables to their predictive power, and another that considers causal links between variables to be based on a structural model of an economy. An approach that investigates Granger causality for a specific state of an MS vector autoregressive model was proposed by Psaradakis et al. (2005), whereas a framework that is unconditional on the states was proposed by Droumaguet et al. (2017). An explicit form of dependence between two or more Markov processes describing country or regional business cycles was proposed by Owyang et al. (2005), Hamilton & Owyang (2012), and more recently by Leiva-Leon (2017).

Finally, a new class of structural MS rational expectations models utilizes the MS rule to determine the time variation of the parameters of a Dynamic Stochastic General Equilibrium model. In this framework, the agents are rational and, thus, know the MS rule and take it into account in their decision-making problem. Farmer et al. (2009) provide the theory behind such a formulation of the model, while Liu et al. (2011) provide a model for macroeconomic fluctuations. Farmer et al. (2011) propose a method of solving these models.
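The filtered probabilities Pr[s_t | y_t, y_{t-1}, ...] discussed earlier in this section can be converted into the smoothed probabilities Pr[s_t | Y] by a backward recursion; the sketch below uses the standard smoothing recursion for hidden Markov chains (the particular recursion is our assumption, as the text names only the quantities), with illustrative inputs:

```python
import numpy as np

def smoothed_probabilities(filtered, P):
    """Backward recursion turning filtered probabilities p(s_t | Y_t) into
    smoothed probabilities p(s_t | Y_T):
      p(s_t|Y_T) = p(s_t|Y_t) * sum_j P[s_t, j] * p(s_{t+1}=j|Y_T) / p(s_{t+1}=j|Y_t)."""
    T, K = filtered.shape
    smoothed = np.empty((T, K))
    smoothed[-1] = filtered[-1]                 # at t = T the two coincide
    for t in range(T - 2, -1, -1):
        pred = filtered[t] @ P                  # forecasted p(s_{t+1} | Y_t)
        smoothed[t] = filtered[t] * (P @ (smoothed[t + 1] / pred))
    return smoothed

filtered = np.array([[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]])  # illustrative
P = np.array([[0.9, 0.1], [0.2, 0.8]])
smoothed = smoothed_probabilities(filtered, P)
```

The smoothed probabilities condition on the full sample and are therefore the natural input for the retrospective regime classifications, such as the expansion and recession dating, described above.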
2. Exogenous Markov Switching
The MS model was introduced in Section 1 through the definition of the likelihood function, in which the predictive densities of the data p(y_{t+1} | s_{t+1}, Y_t) are weighted by the forecasted state probabilities p(s_{t+1} | Y_t), as in equation (7b). The decomposition of the latter into the product of the transition and filtered probabilities, as in equation (8), reveals that it does not depend on the contemporaneous observation y_{t+1} but only on the previous state s_t and past observations Y_t. The independence of the forecasted state probability p(s_{t+1} | Y_t) from the error component of the measurement equation (5) defines the popular family of exogenous MS models with a finite number of states K that is considered in this section. In Section 3, models with an infinite number of states are considered, while Section 4 discusses endogenous MS models in which this dependence is introduced.

2.1. A Family of Markov Switching Models

The properties of the latent Markov process are driven by the form of the transition matrix P. A general way of imposing restrictions on the transition matrix was proposed by Sims et al. (2008) and Woźniak & Droumaguet (2015). Let P_i denote the ith row of matrix P, represented in terms of its unrestricted elements, collected in a 1 × r_i vector p_i whose elements sum to one, as follows:

P_i = p_i W_i,   (10)

where the W_i are predetermined r_i × K matrices for i = 1, ..., K. The definition of the transition matrix as in the equation above is used below to demonstrate various types of this matrix and the implied properties of the Markov process. Note that if r_i = 1 for some i, then the only element of p_i is equal to one.

A general stationary and aperiodic MS process in which each of the states can be revisited at any time t presumes r_i = K and W_i = I_K for each i, where I_K is an identity matrix of order K.
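Equation (10) can be put to work in a few lines. As an illustration, the sketch below assembles the change-point transition matrix of Chib (1998), discussed later in this section, row by row from p_i and W_i (the function name and parameter values are ours):

```python
import numpy as np

def changepoint_transition_matrix(p_stay):
    """Assemble P row by row through P_i = p_i W_i of equation (10) for the
    change-point model: r_i = 2, p_i = (p_ii, 1 - p_ii), and W_i stacks
    rows i and i+1 of the identity matrix; the last state is absorbing."""
    K = len(p_stay) + 1
    I = np.eye(K)
    P = np.zeros((K, K))
    for i in range(K - 1):
        p_i = np.array([p_stay[i], 1.0 - p_stay[i]])   # 1 x r_i with r_i = 2
        W_i = np.vstack([I[i], I[i + 1]])              # r_i x K
        P[i] = p_i @ W_i
    P[K - 1] = I[K - 1]                                # r_K = 1: stay forever
    return P

P = changepoint_transition_matrix([0.9, 0.8])          # K = 3 example
```

Other members of the family below are obtained the same way, by swapping in different p_i and W_i.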
In such a model, all K² elements of the transition matrix are estimated, there is no absorbing state, all of the elements of the ergodic probabilities vector π are nonzero, and the probabilities of the initial state are most often set to π. This type of MS model is the most frequently applied in economics and finance.

The class of finite mixture models is nested within the MS models by setting W_i = I_K and p_i = π′ for each i, where all of the elements of π are strictly positive. In this model, the forecasted state probabilities are time-invariant and equal to p(s_{t+1} | Y_t) = π. However, the clustering of observations is facilitated through the smoothed probabilities, which are allowed to change over time. Finite mixture models provide a convenient way of modeling nonstandard distributions that are often required for the error terms in economics and finance applications (see Norets, 2010). It can be shown that any distribution of a random variable defined on the real line can be approximated by a mixture of normal distributions, while the distribution of a random variable defined on the positive real line can be approximated by a mixture of gamma distributions.

Change-point models can be used to introduce monotonic regime changes, as in the model proposed by Chib (1998). The process is initiated in the first state, s_0 = 1; with probability p_11 it remains unchanged, and with probability p_12 = 1 − p_11 it switches to the second regime. The first state is never revisited. In general, given that at some period t the Markov process is in state s_t = k < K, it remains in this state with probability p_kk and is only allowed to switch to the next regime with probability p_k,k+1 = 1 − p_kk. Finally, when the process reaches the Kth state, it stays there forever. In change-point models, the transition matrix is obtained by setting r_i = 2 and W_i to the 2 × K matrix whose rows are the ith and (i + 1)th rows of I_K for i = 1, ..., K − 1, as well as r_K = 1 and the 1 × K matrix W_K containing the last row of I_K. An example of such a transition matrix for K = 3 is given by

P = [ p_11  1 − p_11  0
      0     p_22     1 − p_22
      0     0        1       ].

Therefore, these models are capable of estimating the times at which the regime changes occur. Consequently, they have attracted considerable attention in economics and finance applications, despite some recent evidence that an unrestricted transition matrix is capable of capturing similar patterns in data with non-deteriorating in-sample and out-of-sample fit. The change-point models introduce non-stationarity in the Markov process and, thus, their ergodic probabilities are all equal to zero except for the last element of π, which is equal to one. Therefore, the last state is the absorbing state that gains 100 percent of the probability mass asymptotically as T → ∞. Finally, Frühwirth-Schnatter (2006) gives a detailed discussion of the nuances of the estimation of stationary and non-stationary Markov processes.

In the simple and popular deterministic change-point model, s_t is assumed to be known and provided by the econometrician. In many applications, this model is used to estimate state-dependent parameters in pre-determined subsamples. Moreover, it is straightforward to set s_t to obtain monotonic regime changes. Nevertheless, in deterministic change-point models, the properties of the Markov process are not driven by the transition matrix, which is redundant given fixed s_t. The regime-change dates are not estimated and, unless the econometrician knows the data generating process and sets s_t accordingly, the model fit deteriorates heavily compared to the other models considered in this section.

A more elaborate form of the matrices W_i may lead to desired application-specific properties of the Markov process. Consider a model used by Sims (2001), who introduces symmetric jumping among adjacent regimes.
In this example, the desired transition matrix for the case of K = 4 states is

P = [ p          1 − p      0          0
      (1 − p)/2  p          (1 − p)/2  0
      0          (1 − p)/2  p          (1 − p)/2
      0          0          1 − p      p         ],

which can be obtained by setting r_i = 2 and p_i = (p, 1 − p) for each i, and

W_1 = [ 1  0  0  0      W_2 = [ 0    1  0    0
        0  1  0  0 ],           1/2  0  1/2  0 ],

W_3 = [ 0  0    1  0      W_4 = [ 0  0  0  1
        0  1/2  0  1/2 ],         0  0  1  0 ].

In another example, Sims et al. (2008) consider a generalization of this model, inspired by the developments in Cogley & Sargent (2005), that allows for potentially many states and the estimation of all of the parameters in matrix P, while keeping the number of its free parameters relatively small. In this model, the desired transition matrix has the following form:

P = [ p_11                   a_1 α_1 (1 − p_11)     · · ·  a_1 α_{K−1} (1 − p_11)
      a_2 α_1 (1 − p_22)     p_22                   · · ·  a_2 α_{K−2} (1 − p_22)
      ⋮                      ⋮                      ⋱      ⋮
      a_K α_{K−1} (1 − p_KK) a_K α_{K−2} (1 − p_KK) · · ·  p_KK                 ],

which can be obtained by setting r_i = 2 for each i and choosing the matrices W_i accordingly:

W_1 = [ 1  0        0        ...  0
        0  a_1 α_1  a_1 α_2  ...  a_1 α_{K−1} ],
W_2 = [ 0        1  0        ...  0
        a_2 α_1  0  a_2 α_1  ...  a_2 α_{K−2} ],
...
W_K = [ 0            0            ...  0        1
        a_K α_{K−1}  a_K α_{K−2}  ...  a_K α_1  0 ],

where α_1, ..., α_{K−1} are positive free parameters to be estimated, and a_1, ..., a_K are such that the rows of the matrices W_i sum to one. This model uses a scarce parameterization of the transition matrix that is capable of capturing occasional discontinuous shifts in the values of the regime-dependent parameters when K is small, as well as frequent, incremental changes in these parameters for larger K. The choice of K is a matter of empirical investigation.

Finally, the Markov property of the latent process might be extended by introducing the dependence of the current state, s*_t, not only on one but on several recent realizations of the latent process. The original model by Hamilton (1989) assumes the dependence of the model parameters on the current and previous regimes, s*_t and s*_{t−1} respectively.
This dependence can be modeled by a new four-state Markov process, s_t, through the following state representation:

s_t = 1 if s*_t = 1 and s*_{t−1} = 1,
s_t = 2 if s*_t = 2 and s*_{t−1} = 1,
s_t = 3 if s*_t = 1 and s*_{t−1} = 2,
s_t = 4 if s*_t = 2 and s*_{t−1} = 2,

and an appropriate form of the transition matrix:

P = [ p_11  p_12  0     0
      0     0     p_21  p_22
      p_11  p_12  0     0
      0     0     p_21  p_22 ],

where p_ij = Pr(s*_t = j | s*_{t−1} = i). It can be specified by setting r_i = 2 for each i, p_1 = p_3 and p_2 = p_4, and

W_1 = W_3 = [ 1  0  0  0      W_2 = W_4 = [ 0  0  1  0
              0  1  0  0 ],                 0  0  0  1 ].

In another class of MS models, various groups of parameters of the model depend on separate and independent Markov processes (see Phillips, 1991; Ravn & Sola, 1995, for some early applications). Examples include models in which the parameters of the conditional mean process depend on a different Markov process than the conditional variances (see, e.g., Sims et al., 2008), or structural models in which the money demand equation depends on a different Markov process than the other parameters of the model (see, e.g., Sims & Zha, 2006). Consider L such independent processes s_{lt}, each parameterized by a K_l × K_l transition matrix P_l for l = 1, ..., L. This model can be represented by a composite Markov process s_t = (s_{1t}, ..., s_{Lt}) with ∏_{l=1}^L K_l states and the corresponding transition matrix given by

P = P_1 ⊗ · · · ⊗ P_L,

where ⊗ denotes the Kronecker product. The gain from this tensor product representation of the transition matrix, which introduces nonlinear restrictions, is an economical parameterization facilitating the estimation. Note that each of the transition matrices P_l might be subject to restrictions such as that in equation (10).

2.3. Non-homogeneous Markov Switching

Finally, this survey of parameterizations of transition matrices is concluded by the presentation of a non-homogeneous MS model in which the transition probabilities change over time (see Filardo, 1994; Diebold et al., 1994).
The introduction of this time variation is often combined with dependence on some variables v_t, which might contain x_t (as in, e.g., Filardo, 1994), the state indicators s_t (as in Otranto, 2005), or a measure of the duration of the state (as in Durland & McCurdy, 1994). The restriction imposed on the rows of the transition matrix leads to the parameterization of the transition probabilities through a logistic regression of the following form:

p_ij,t ≡ Pr(s_t = j | s_{t−1} = i) = exp(v_t γ_ij) / Σ_{l=1}^K exp(v_t γ_il),

where the γ_ij are parameter vectors to be estimated, with dimensions corresponding to the vector v_t. Note that, for the identification of the transition probabilities, for each i there is a j such that γ_ij is a vector of zeros (see Koki et al., 2020, for a recent exposition of the setup of the model). The selection of the variables in v_t determines the time-dependence in the transition probabilities and is subject to empirical verification. While the identification of the factors determining the transition probabilities might be of interest in itself, in many data sets non-homogeneous MS models lead to estimates of the filtered and smoothed state probabilities similar to those from the simple MS model.
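The logistic parameterization above can be sketched as follows, with v_t = (1, x_t), γ_{i1} fixed at zero for identification, and coefficient values of our choosing:

```python
import numpy as np

def transition_matrix_t(v_t, gamma):
    """Time-t transition matrix of a non-homogeneous MS model:
    p_ij,t = exp(v_t' gamma_ij) / sum_l exp(v_t' gamma_il).
    gamma has shape (K, K, dim(v_t)); gamma[i, 0] = 0 for identification."""
    logits = np.einsum('m,ijm->ij', v_t, gamma)             # v_t' gamma_ij
    w = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return w / w.sum(axis=1, keepdims=True)                 # rows sum to one

# Illustrative K = 2 setup with v_t = (1, x_t); coefficient values are ours
gamma = np.zeros((2, 2, 2))
gamma[0, 1] = [-2.0, 0.5]
gamma[1, 1] = [1.0, -0.3]
P_t = transition_matrix_t(np.array([1.0, 0.4]), gamma)
```

Evaluating the function at each t yields the sequence of transition matrices that replaces the single matrix P in the Hamilton filter recursion.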
3. Infinite Hidden Markov Model
The infinite hidden Markov model (IHMM) was developed by Beal et al. (2002) and Teh et al. (2006). It builds on the Dirichlet process mixture (DPM) model of Escobar & West (1995) and extends the finite number of states of the MS model to the case in which this number goes to infinity, K → ∞. Such an extension introduces a fundamental advancement of econometric modeling by transforming the parametric MS framework into a nonparametric structure.

A direct consequence is that the transition matrix P implied by equation (2) has infinite dimension and can be presented as

P ≡ [p_ij] = [ p_11  p_12  p_13  ...
               p_21  p_22  p_23  ...
               p_31  p_32  p_33  ...
               ⋮     ⋮     ⋮     ⋱  ],

where i, j = 1, 2, 3, ... and p_ij ≥ 0. From the definition, each row of P must sum to one, Σ_{j=1}^∞ p_ij = 1. The time-invariant parameters that describe the kth state are denoted by θ_k, and there is an infinite number of them. Similarly to the finite-state MS models, the parameter space comprises the state indicators S ≡ {s_t}_{t=1}^T, the time-invariant parameters Θ ≡ {θ_k}_{k=1}^∞, and the transition matrix P = [p_ij]_{∞×∞}.

Because of the parameter saturation problem, the IHMM cannot be estimated by classical methods without regularization. By contrast, the Bayesian approach is coherent and more appropriate for inference, and thus the existing research on the IHMM mostly uses this framework.

Three popular ways can be used to draw inference from the IHMM. The first is to integrate out the transition probabilities P based on the Chinese restaurant representation of the Dirichlet process, as in Fox et al. (2011). This method works directly on the states, but the derivation is complicated. The second is to apply the beam sampler to stochastically truncate the number of states to a finite one during the MCMC, as in Van Gael et al. (2008). This method provides exact inference, similarly to the first method. Besides, it utilizes conditional independence by keeping the transition matrix P in the parameter space, which allows parallel computations and is usually much faster than the first approach.

The last method applies the degree-K weak limit approximation of Ishwaran & Zarepour (2002), as in Bauwens et al. (2017). It uses a truncated Dirichlet process so that the IHMM resembles an appropriate finite-state MS model. In practice, Song (2014) found that standard MS models with a large number of states perform similarly to the IHMM, provided that the number of inactive states is nonzero, where an inactive state is a state without data assigned to it. This approach renders the IHMM easier to execute, although two caveats exist. First, the prior distribution on the transition matrix must be chosen according to Ishwaran & Zarepour (2002); namely, the concentration parameter of the truncated approximation should be consistent with the concentration parameter of the IHMM. Otherwise, the approximation is not valid. Second, the number of states in the MCMC must be monitored to avoid a poor approximation. A simple rule is that the number of active states, that is, those with a nonzero number of observations classified into them, must always be less than the total number of states K in the approximation. For the execution of the algorithm, see Bauwens et al. (2017).

The IHMM originated from the machine learning literature, where it was used to reveal dynamic clustering in applications including dialogue summary and motion capture. For the last decade, it has been applied to various fields in economics and finance. The seminal developments include Song (2014), Jochmann (2015), and Dufays (2015). The motivation for using the IHMM is that it balances well the trade-off between heuristic economic interpretations and competitive forecast accuracy. This feature constitutes an advantage over many machine learning methods that hardly allow for structural interpretations.

Similarly to the conventional MS models, the IHMM maintains the first-order Markov chain property. However, due to the differences in the prior distribution setup and the problem of doubling the states, a phenomenon consisting of the possibility of producing an additional state that mimics an already existing one, the DPM should not be considered a device for detecting the number of states of mixture or MS models, as was suggested in some early approaches such as Otranto & Gallo (2002). Miller & Harrison (2013) state the argument formally.

An attractive feature of the IHMM is that it captures regime switching and structural breaks jointly.
Therefore, it grants more flexibility and accommodates data dynamics upon the arrival of new observations. The regime-switching module pools data with similar behavior to borrow statistical strength. In addition, the structural break dynamics can generate a new state when the arriving observations exhibit a new law of motion. A well-known example is the global financial crisis of 2007 and 2008. It began with the subprime mortgage crisis in the U.S. and became the most severe financial crisis since the 1930s. Such an asset market downturn does not have the same source as any bear market regime in history. Standard MS models for bull and bear markets, such as Maheu & McCurdy (2000), Lunde & Timmermann (2004), and Maheu et al. (2012), are incapable of capturing the new phenomenon because they are limited by the data history and do not allow for structural changes. The IHMM is an appropriate vehicle to achieve such a goal because the capability of generating a new, unprecedented state is a feature of the latent process.

Another advantage of the IHMM lies in its superior forecasting performance, which is an empirical observation with plausible intuition. Benchmark models such as autoregressions or linear regressions make rigid assumptions about functional forms. Instead, the IHMM is more flexible and hence robust to model misspecification. Its flexibility comes from its ability to sort data into clusters endogenously, so that behaviorally different data will not pollute each other's inference. One can find the kinship between this idea and the piecewise linear model in a univariate framework. Meanwhile, the IHMM is easily extendible to the multivariate framework, and the nodes are inferred jointly with the other model coefficients. Two vital components for achieving better prediction performance in economics and finance applications are the hierarchical prior structure (Song, 2014) and a certain regime persistence (Fox et al., 2011).
The hierarchical structure exploits more data information by letting the regimes inform one another. At the same time, regime persistence reflects the salient stylized fact that economic time series are prone to local dynamic stability.

The IHMM provides a convenient tool for a control approach in grand modeling frameworks. Consider a modeling framework that utilizes independent variables x_t and an error term ε_t. Any incorrect distributional assumption about ε_t could potentially adversely affect the inference on the functional form of x_t. If this distribution is not the focus of the application, simple estimators such as (nonlinear) least squares are perfectly competent. If the distribution matters for inference, such as in risk analysis, these methods are no longer useful, and a semi-parametric approach has its advantages by imposing a nonparametric distribution on ε_t. Such an assumption frees ε_t from any potential misspecification. Moreover, the parametric part related to x_t is free from contamination by any parametric assumption imposed on ε_t. Lastly, the curse of dimensionality is not an issue because ε_t is usually univariate, while x_t is not. For examples, see Jensen & Maheu (2010), who modeled ε_t as a DPM. The most recent research treating ε_t as an IHMM includes Hou (2017) and Dufays et al. (2019).

The IHMM provides a basis for a burgeoning academic literature on its extensions. An IHMM with DPM emission can be found in Fox et al. (2011), but it does not allow sharing particles among states. A block IHMM by Stepleton et al. (2009) generalizes the stick structure of Fox et al. (2011) by introducing more in-state dynamics to capture finer structures. In this approach, each state is a distinct MS model, which makes it suitable for the modeling of bull and bear markets as in Maheu et al. (2012). To capture long memory, Gael et al. (2009) proposed the factorial IHMM. Examples of its applications include Nakano et al. (2011) and Heller et al. (2009).
The factorial IHMM does not impose any restrictions for identification and, thus, structural interpretations are hardly possible, which limits the scope of applications in economics and finance. Many interesting modeling frameworks arise from incorporating various components into the IHMM structure, and some such ideas can be found in Gray (1996), Ehrmann et al. (2003) and Haas et al. (2004). Alternatively, a modeling strategy consisting of applying the IHMM structure to existing interpretable parametric models can be found in Liu & Maheu (2018) and Jin et al. (2019). Finally, there is a substantial body of work that follows up on and extends the IHMM in different directions; some examples include Shi & Song (2014), Bauwens et al. (2017), Maheu & Yang (2016), Yang (2019), Hou (2017) and Luo et al. (2019).
4. Endogenous Markov Switching
Recent literature has proposed models that question the assumption of exogeneity of the Markov process and make an explicit link between the measurement equation error term and this latent process. In consequence, some form of endogeneity of the Markov process is introduced. Kim et al. (2008) argue that in many applications, endogeneity of the Markov process corresponds to the data properties and theoretical considerations to a larger extent than the exogenous one. While the initial proposal by Kim et al. (2008) implements endogeneity in the MS model, this section focuses on a specification proposed by Chang et al. (2017). In this model, a discrete-valued latent process, s_t, is driven by a real-valued process, w_t, that is related to the measurement equation error term and then conveniently discretized.

Define the dynamics of the real-valued latent factor by an autoregressive process

w_t = α w_{t−1} + v_t,    (11)

where α is the persistence coefficient such that |α| ≤ 1, and v_t is a standard normal error term. The initial value of the process, w_0, is recommended to be normally distributed with zero mean and variance equal to 1/(1 − α²) for |α| < 1, or set to zero, w_0 = 0, for α = 1. The process in equation (11) is discretized by defining a threshold parameter, τ, and the discrete-valued state indicator for a two-state model, K = 2, as

s_t = 0 if w_t < τ,  and  s_t = 1 if w_t ≥ τ.    (12)

Therefore, although the process w_t may be subject to interpretation, its primary role is to define the Markov process s_t.

Chang et al. (2017) define the measurement equation in a general form that makes explicit the potential dependence of the conditional mean and standard deviation on the independent variables, x_t, and the latent processes:

y_t = m(x_t, w_t) + σ(x_t, w_t) ε_t = m(x_t, s_t) + σ(x_t, s_t) ε_t,    (13)

where ε_t, conditionally on x_t and w_t (or s_t), is a serially uncorrelated standard normal error term. Endogeneity of the Markov process is formally introduced in this model by an appropriate specification of the joint distribution of the error terms from the state and measurement equations, (11) and (13) respectively, which is given by

(ε_t, v_{t+1})′ ~ N( (0, 0)′, [ 1 ρ ; ρ 1 ] ),    (14)

where ρ is the correlation coefficient. It can be shown that the exogenous MS model discussed in Section 2 can be obtained by setting ρ = 0. In other words, the restricted endogenous and the exogenous MS models are observationally equivalent, that is, they both lead to the same value of the likelihood function given the values of Θ and S. However, for values of ρ different from zero, a relationship between ε_t and w_{t+1} and, consequently, between ε_t and s_{t+1} is established. Expressions for the implied transition probabilities, which in this model change over time, are given in the original paper.

A different form of endogeneity was proposed by Kim et al. (2008), who specified the joint distribution of error terms, such as the one in equation (14), for the vector (ε_t, v_t).
This model presumes a contemporaneous effect of ε_t on w_t and s_t, which Chang & Kwak (2017) point out to be misspecified. Kim et al. (2008) consider an application to modeling a volatility feedback effect in financial time series.

To illustrate the interpretation of a non-zero ρ, consider the model in equation (6) with s_t specified by the endogenous Markov process and with |β_k| < 1, k = 1, 2. A possible application in finance includes the modeling of the leverage effect, defined as the negative correlation between the current innovation and the future conditional variance of the return on a financial asset. Let ρ < 0 and σ_1 < σ_2. In this case, a negative realization of ε_t increases the probability of the second state in period t + 1 and leads to an increase in the conditional volatility.

A simple application in time series analysis includes the modeling of mean reversion, which works differently for ρ of different signs. Let μ_1 < μ_2 and ρ < 0. Then, a positive realization of ε_t decreases the probability of the second state in period t + 1. Therefore, mean reversion is also obtained at the level of the future conditional expected value, which now decreases. In the opposite case of ρ > 0, a positive realization of ε_t increases the probability of the second state in period t + 1 and, consequently, increases the conditional expected value of y_t. Therefore, the forecasts of y revert to a mean that is higher, which has a destabilizing effect. Examples of more elaborate applications in economics include the endogenous switching of the parameters modeling the effects of monetary and fiscal policies proposed by Chang et al. (2018) and Chang & Kwak (2017), respectively.
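The sign effects of ρ discussed above follow directly from the joint normality in equation (14): conditionally on ε_t, the next-period innovation satisfies v_{t+1} | ε_t ~ N(ρ ε_t, 1 − ρ²). The sketch below (an illustration with assumed parameter values, not code from the original paper) computes the implied one-step-ahead probability of the second state.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_state2_next(w_t, eps_t, alpha, tau, rho):
    """Pr(s_{t+1} = 1 | w_t, eps_t) implied by equations (11), (12), (14).
    From (14), v_{t+1} | eps_t ~ N(rho * eps_t, 1 - rho^2), hence
    w_{t+1} | w_t, eps_t ~ N(alpha * w_t + rho * eps_t, 1 - rho^2)."""
    mean = alpha * w_t + rho * eps_t
    sd = math.sqrt(1.0 - rho ** 2)
    return 1.0 - norm_cdf((tau - mean) / sd)

# Leverage-effect case: rho < 0 (with sigma_1 < sigma_2). A negative shock
# raises the probability of the high-volatility second state next period.
p_after_neg = prob_state2_next(w_t=0.0, eps_t=-1.0, alpha=0.9, tau=0.0, rho=-0.5)
p_after_pos = prob_state2_next(w_t=0.0, eps_t=+1.0, alpha=0.9, tau=0.0, rho=-0.5)
print(p_after_neg > p_after_pos)  # prints True
```

Flipping the sign of rho in the call reverses the inequality, which is the destabilizing ρ > 0 case described in the text.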
5. Future Developments
Two suggested paths for future developments in the MS and IHMM frameworks include efficient algorithms with the potential for parallel computations, and interpretability incurred through sparsity. Improvements on both of these fronts are required to make the analysis of big data sets possible by combining sufficient flexibility of the model with scarce parameterization.

Existing approaches to MS models rely on the FFBS technique discussed in Section 1. This filtering and smoothing method consists of an iterative procedure that implies serial computations. With an increasing number of observations and states required to capture the features of the data, the FFBS algorithm becomes computationally too demanding for practical implementations. Vectorization, tensor algebra, and efficient numerical algorithms provide some of the solutions. Still, they are far from being as computationally fast as available algorithms for real-valued state-space models such as, for instance, the precision sampler by Chan & Jeliazkov (2009).

Moreover, an increasing interest in heterogeneous MS models, in which individual parameters follow independent Markov processes, calls for new methods of allowing for sparsity in the modeling. In many such studies, the questions of whether time-variation is required for a particular parameter and, if so, how many MS regimes are required to model it, remain unanswered due to the lack of appropriate algorithms. It is worth mentioning that recent studies provide such solutions for real-valued state-space models; some examples include Frühwirth-Schnatter & Wagner (2010), Bitto & Frühwirth-Schnatter (2019), and Cadonna et al. (2019).

Similar considerations apply to the IHMM, although it should be emphasized that the IHMM provides certain solutions to the challenges singled out above for the MS models.
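The serial bottleneck of FFBS mentioned above is visible in a minimal sketch of its forward-filtering step (the Hamilton filter); this is a generic illustration, not code from any cited paper, and the array names are hypothetical.

```python
import numpy as np

def hamilton_filter(densities, P, pi0):
    """Forward-filtering step of FFBS for a K-state Markov switching model.
    densities: (T, K) array of observation densities p(y_t | s_t = k),
    P:         (K, K) transition matrix, P[i, j] = Pr(s_t = j | s_{t-1} = i),
    pi0:       (K,) initial state distribution.
    Returns the (T, K) filtered probabilities Pr(s_t = k | y_{1:t}).
    The loop over t is inherently sequential -- each step needs the
    previous filtered distribution, which is the source of the
    computational bottleneck discussed in the text."""
    T, K = densities.shape
    filt = np.empty((T, K))
    pred = pi0
    for t in range(T):
        joint = pred * densities[t]      # Bayes update with p(y_t | s_t)
        filt[t] = joint / joint.sum()    # normalize to a distribution
        pred = filt[t] @ P               # one-step-ahead prediction
    return filt
```

Only the within-step algebra vectorizes; the recursion over t cannot be parallelized directly, unlike the precision-sampler approach available for real-valued state-space models.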
Here, as the number of observations increases, the number of states, K, to be modeled in a particular iteration of the estimation algorithm following the beam sampler step increases as well. This fact increases the required computation time geometrically and calls for a more time-efficient estimation method. Variational inference offers faster algorithms at the cost of approximating the posterior distribution at an unknown precision (see Wainwright et al., 2008, for further reference). Variational Bayes correctly captures the central tendency of the approximated distribution. However, it underestimates the posterior variances (see Wang & Blei, 2019, and references therein). Applications of Variational Bayes estimation to the Dirichlet process and the IHMM can be found in Blei et al. (2006), Kurihara et al. (2007), Teh et al. (2008), and Wang et al. (2011). Alternative approaches may seek to improve the computation through parallelization, such as Fearnhead (2004), Rodriguez (2011), Williamson et al. (2013) and Tripuraneni et al. (2015).

Notable recent developments granting sparsity in Dirichlet process models were proposed by Frühwirth-Schnatter & Malsiner-Walli (2019). They build foundations for new developments for the IHMM. In this approach, sparsity is accompanied by the choice of concentration parameters and, as an extra benefit, it offers significantly simplified computations, as the sparse structure heavily penalizes the number of states.

References
Bauwens, L., Carpantier, J.-F., & Dufays, A. (2017). Autoregressive moving average infinite hidden Markov-switching models. Journal of Business & Economic Statistics, 162–182.
Beal, M. J., Ghahramani, Z., & Rasmussen, C. E. (2002). The infinite hidden Markov model. In Advances in Neural Information Processing Systems (pp. 577–584).
Bitto, A., & Frühwirth-Schnatter, S. (2019). Achieving shrinkage in a time-varying parameter model framework. Journal of Econometrics, 75–97.
Blei, D. M., Jordan, M. I., et al. (2006). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 121–143.
Cadonna, A., Frühwirth-Schnatter, S., & Knaus, P. (2019). Triple the gamma – A unifying shrinkage prior for variance and variable selection in sparse state space and TVP models.
Carrasco, M., Hu, L., & Ploberger, W. (2014). Optimal test for Markov switching parameters. Econometrica, 765–784.
Chan, J., & Jeliazkov, I. (2009). Efficient simulation and integrated likelihood estimation in state space models. International Journal of Mathematical Modelling and Numerical Optimisation, 101–120.
Chang, Y., Choi, Y., & Park, J. Y. (2017). A new approach to model regime switching. Journal of Econometrics, 127–143.
Chang, Y., & Kwak, B. (2017). U.S. monetary-fiscal regime changes in the presence of endogenous feedback in policy rules.
Chang, Y., Maih, J., & Tan, F. (2018). State space models with endogenous regime switching.
Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models. Journal of Econometrics, 79–97.
Chib, S. (1998). Estimation and comparison of multiple change-point models. Journal of Econometrics, 221–241.
Cogley, T., & Sargent, T. J. (2005). Drifts and volatilities: Monetary policies and outcomes in the post WWII US. Review of Economic Dynamics, 262–302.
Diebold, F. X., Lee, J.-H., & Weinbach, G. C. (1994). Regime switching with time-varying transition probabilities. In C. Hargreaves (Ed.), Nonstationary Time Series Analysis and Cointegration (pp. 283–302). Oxford: Oxford University Press.
Droumaguet, M., Warne, A., & Woźniak, T. (2017). Granger causality and regime inference in Markov switching VAR models with Bayesian methods. Journal of Applied Econometrics, 802–818.
Dufays, A. (2015). Infinite-state Markov-switching for dynamic volatility. Journal of Financial Econometrics, 418–460.
Dufays, A., Zhuo, L., Rombouts, J., & Song, Y. (2019). Sparse change-point VAR models. Available at SSRN.
Durland, J. M., & McCurdy, T. H. (1994). Duration-dependent transitions in a Markov model of U.S. GNP growth, 279–288.
Ehrmann, M., Ellison, M., & Valla, N. (2003). Regime-dependent impulse response functions in a Markov-switching vector autoregression model. Economics Letters, 295–299.
Escobar, M. D., & West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 577–588.
Farmer, R. E., Waggoner, D. F., & Zha, T. (2009). Understanding Markov-switching rational expectations models. Journal of Economic Theory, 1849–1867.
Farmer, R. E., Waggoner, D. F., & Zha, T. (2011). Minimal state variable solutions to Markov-switching rational expectations models. Journal of Economic Dynamics and Control, 2150–2166.
Fearnhead, P. (2004). Particle filters for mixture models with an unknown number of components. Statistics and Computing, 11–21.
Filardo, A. J. (1994). Business-cycle phases and their transitional dynamics. Journal of Business & Economic Statistics, 299–308.
Fox, E. B., Sudderth, E. B., Jordan, M. I., Willsky, A. S., et al. (2011). A sticky HDP-HMM with application to speaker diarization. The Annals of Applied Statistics, 1020–1056.
Frühwirth-Schnatter, S. (2001). Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. Journal of the American Statistical Association, 194–209.
Frühwirth-Schnatter, S. (2004). Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econometrics Journal, 143–167.
Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer Science & Business Media.
Frühwirth-Schnatter, S., & Malsiner-Walli, G. (2019). From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering. Advances in Data Analysis and Classification, 33–64.
Frühwirth-Schnatter, S., & Wagner, H. (2010). Stochastic model specification search for Gaussian and partial non-Gaussian state space models. Journal of Econometrics, 85–100.
Gael, J. V., Teh, Y. W., & Ghahramani, Z. (2009). The infinite factorial hidden Markov model. In Advances in Neural Information Processing Systems (pp. 1697–1704).
Geweke, J. (2007). Interpretation and inference in mixture models: Simple MCMC works. Computational Statistics & Data Analysis, 3529–3550.
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 424–438.
Gray, S. F. (1996). Modeling the conditional distribution of interest rates as a regime-switching process. Journal of Financial Economics, 27–62.
Haas, M., Mittnik, S., & Paolella, M. S. (2004). A new approach to Markov-switching GARCH models. Journal of Financial Econometrics, 493–530.
Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 357–384.
Hamilton, J. D. (1990). Analysis of time series subject to changes in regime. Journal of Econometrics, 39–70.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press, Princeton, NJ.
Hamilton, J. D., & Lin, G. (1996). Stock market volatility and the business cycle. Journal of Applied Econometrics, 573–593.
Hamilton, J. D., & Owyang, M. T. (2012). The propagation of regional recessions. Review of Economics and Statistics, 935–947.
Heller, K., Teh, Y. W., & Gorur, D. (2009). Infinite hierarchical hidden Markov models. In Artificial Intelligence and Statistics (pp. 224–231).
Hou, C. (2017). Infinite hidden Markov switching VARs with application to macroeconomic forecast. International Journal of Forecasting, 1025–1043.
Ishwaran, H., & Zarepour, M. (2002). Exact and approximate sum representations for the Dirichlet process. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 269–283.
Jensen, M. J., & Maheu, J. M. (2010). Bayesian semiparametric stochastic volatility modeling. Journal of Econometrics, 306–316.
Jin, X., Maheu, J. M., & Yang, Q. (2019). Bayesian parametric and semiparametric factor models for large realized covariance matrices. Journal of Applied Econometrics, 641–660.
Jochmann, M. (2015). Modeling US inflation dynamics: A Bayesian nonparametric approach. Econometric Reviews, 537–558.
Kim, C.-J., Nelson, C. R., et al. (1999). State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications. MIT Press Books.
Kim, C. J., Piger, J., & Startz, R. (2008). Estimation of Markov regime-switching regression models with endogenous switching. Journal of Econometrics, 263–273.
Koki, C., Meligkotsidou, L., & Vrontos, I. D. (2020). Forecasting under model uncertainty: Non-homogeneous hidden Markov models with Pólya-Gamma data augmentation. Journal of Forecasting, 1–19.
Krolzig, H.-M. (1997). Markov-Switching Vector Autoregressions: Modelling, Statistical Inference, and Application to Business Cycle Analysis. Springer.
Kurihara, K., Welling, M., & Teh, Y. W. (2007). Collapsed variational Dirichlet process mixture models. In IJCAI (Vol. 7, pp. 2796–2801).
Lehmann, E. L. (1983). Theory of Point Estimation. Wiley, New York.
Leiva-Leon, D. (2017). Measuring business cycles intra-synchronization in US: A regime-switching interdependence framework. Oxford Bulletin of Economics and Statistics, 513–545.
Lindgren, G. (1978). Markov regime models for mixed distributions and switching regressions. Scandinavian Journal of Statistics, 81–91.
Liu, J., & Maheu, J. M. (2018). Improving Markov switching models using realized variance. Journal of Applied Econometrics, 297–318.
Liu, Z., Waggoner, D. F., & Zha, T. (2011). Sources of macroeconomic fluctuations: A regime-switching DSGE approach, 251–301.
Lunde, A., & Timmermann, A. (2004). Duration dependence in stock prices: An analysis of bull and bear markets. Journal of Business & Economic Statistics, 253–273.
Luo, J., Klein, T., Ji, Q., & Hou, C. (2019). Forecasting realized volatility of agricultural commodity futures with infinite hidden Markov HAR models. International Journal of Forecasting, forthcoming.
Maheu, J. M., & McCurdy, T. H. (2000). Identifying bull and bear markets in stock returns. Journal of Business & Economic Statistics, 100–112.
Maheu, J. M., McCurdy, T. H., & Song, Y. (2012). Components of bull and bear markets: bull corrections and bear rallies. Journal of Business & Economic Statistics, 391–403.
Maheu, J. M., & Yang, Q. (2016). An infinite hidden Markov model for short-term interest rates. Journal of Empirical Finance, 202–220.
Meitz, M., & Saikkonen, P. (2017). Testing for observation-dependent regime switching in mixture autoregressive models.
Miller, J. W., & Harrison, M. T. (2013). A simple example of Dirichlet process mixture inconsistency for the number of components. In Advances in Neural Information Processing Systems (pp. 199–206).
Nakano, M., Le Roux, J., Kameoka, H., Nakamura, T., Ono, N., & Sagayama, S. (2011). Bayesian nonparametric spectrogram modeling based on infinite factorial infinite hidden Markov model. In (pp. 325–328). IEEE.
Norets, A. (2010). Approximation of conditional densities by smooth mixtures of regressions. Annals of Statistics, 1733–1766.
Otranto, E. (2005). The multi-chain Markov switching model. Journal of Forecasting, 523–537.
Otranto, E., & Gallo, G. M. (2002). A nonparametric Bayesian approach to detect the number of regimes in Markov switching models. Econometric Reviews, 477–496.
Owyang, M. T., Piger, J., & Wall, H. J. (2005). Business cycle phases in U.S. states. Review of Economics and Statistics, 604–616.
Phillips, K. L. (1991). A two-country model of stochastic output with changes in regime. Journal of International Economics, 121–142.
Psaradakis, Z., Ravn, M. O., & Sola, M. (2005). Markov switching causality and the money-output relationship. Journal of Applied Econometrics, 665–683.
Psaradakis, Z., & Sola, M. (1998). Finite-sample properties of the maximum likelihood estimator in autoregressive models with Markov switching. Journal of Econometrics, 369–386.
Psaradakis, Z., & Spagnolo, N. (2003). On the determination of the number of regimes in Markov-switching autoregressive models. Journal of Time Series Analysis, 237–252.
Ravn, M. O., & Sola, M. (1995). Stylized facts and regime changes: Are prices procyclical? Journal of Monetary Economics, 497–526.
Rodriguez, A. (2011). On-line learning for the infinite hidden Markov model. Communications in Statistics – Simulation and Computation, 879–893.
Shi, S., & Song, Y. (2014). Identifying speculative bubbles using an infinite hidden Markov model. Journal of Financial Econometrics, 159–184.
Sims, C. A. (1980). Macroeconomics and reality. Econometrica, 1–48.
Sims, C. A. (2001). Stability and instability in US monetary policy behavior.
Sims, C. A., Waggoner, D. F., & Zha, T. (2008). Methods for inference in large multiple-equation Markov-switching models. Journal of Econometrics, 255–274.
Sims, C. A., & Zha, T. (2006). Were there regime switches in U.S. monetary policy? American Economic Review, 54–81.
Song, Y. (2014). Modelling regime switching and structural breaks with an infinite hidden Markov model. Journal of Applied Econometrics, 825–842.
Stepleton, T., Ghahramani, Z., Gordon, G., & Lee, T.-S. (2009). The block diagonal infinite hidden Markov model. In Artificial Intelligence and Statistics (pp. 552–559).
Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 1566–1581.
Teh, Y. W., Kurihara, K., & Welling, M. (2008). Collapsed variational inference for HDP. In Advances in Neural Information Processing Systems (pp. 1481–1488).
Tripuraneni, N., Gu, S. S., Ge, H., & Ghahramani, Z. (2015). Particle Gibbs for infinite hidden Markov models. In Advances in Neural Information Processing Systems (pp. 2395–2403).
Van Gael, J., Saatci, Y., Teh, Y. W., & Ghahramani, Z. (2008). Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning (pp. 1088–1095). ACM.
Wainwright, M. J., Jordan, M. I., et al. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1–305.
Wang, C., Paisley, J., & Blei, D. (2011). Online variational inference for the hierarchical Dirichlet process. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (pp. 752–760).
Wang, Y., & Blei, D. M. (2019). Frequentist consistency of variational Bayes. Journal of the American Statistical Association, 1147–1161.
Williamson, S., Dubey, A., & Xing, E. (2013). Parallel Markov chain Monte Carlo for nonparametric mixture models. In International Conference on Machine Learning (pp. 98–106).
Woźniak, T., & Droumaguet, M. (2015). Assessing monetary policy models: Bayesian inference for heteroskedastic structural VARs.
Yang, Q. (2019). Stock returns and real growth: A Bayesian nonparametric approach. Journal of Empirical Finance, 53–69.