Bayesian Inference for Structural Vector Autoregressions Identified by Markov-Switching Heteroskedasticity
Helmut Lütkepohl (a), Tomasz Woźniak (b, *)
(a) DIW Berlin and Freie Universität Berlin
(b) University of Melbourne
Abstract
In this study, Bayesian inference is developed for structural vector autoregressive models in which the structural parameters are identified via Markov-switching heteroskedasticity. In such a model, restrictions that are just-identifying in the homoskedastic case become over-identifying and can be tested. A set of parametric restrictions is derived under which the structural matrix is globally or partially identified, and a Savage-Dickey density ratio is used to assess the validity of the identification conditions. The latter is facilitated by analytical derivations that make the computations fast and the numerical standard errors small. As an empirical example, monetary models are compared using heteroskedasticity as an additional device for identification. The empirical results support models with money in the interest rate reaction function.
Keywords:
Identification Through Heteroskedasticity, Bayesian Hypotheses Assessment, Markov-switching Models, Mixture Models, Regime Change
JEL classification:
C11, C12, C32, E52

The authors acknowledge excellent research assistance by Yang Song. They thank Dmitri Kulikov, Aleksei Netšunajev, Michele Piffer, Timo Teräsvirta, Luis Uzeda, and Benjamin Wong for discussions of the manuscript of the paper. They are also grateful to the participants of the seminars at Monash University, Technische Universität Dortmund, Bank of Canada, Queensland University of Technology, Reserve Bank of New Zealand, Australian National University, and the University of Helsinki, as well as of the presentations during the Workshop on Macroeconomic Research at the Cracow University of Economics, the 24th International Conference Computing in Economics and Finance, the Rimini Bayesian Econometrics Workshop, the Workshop on Structural VAR Models at Queen Mary University of London, the NBER-NSF Seminar on Bayesian Inference in Econometrics and Statistics 2018, the Bayesian Analysis and Modeling Summer Workshop 2017, and the Sydney Time Series & Forecasting Symposium 2017. The computations in this paper were performed using the Spartan HPC-Cloud Hybrid (see Meade, Lafayette, Sauter & Tosello, 2017) at the University of Melbourne. Helmut Lütkepohl acknowledges the support provided by the Deutsche Forschungsgemeinschaft through the SFB 649 "Economic Risk". Tomasz Woźniak acknowledges the support from the Faculty of Business and Economics at the University of Melbourne through the faculty research grant.

* Corresponding author:
Department of Economics, The University of Melbourne, Level 4, FBE Building, 111 Barry Street, Carlton 3053, Victoria, Australia,
Phone: +61 3 8344 5310,
Fax: +61 3 8344 6899,
Email address: [email protected], URL: http://bit.ly/tomaszwozniak

1. Introduction

A central problem in structural vector autoregressive (SVAR) analysis is the identification of the structural parameters or, equivalently, the identification of the structural shocks of interest. The identifying assumptions are often controversial. To avoid imposing unnecessarily many restrictions, typically only just-identifying restrictions are formulated. In that case the data are not informative about the validity of the restrictions, and the restrictions cannot be tested with statistical methods. Moreover, even if over-identifying restrictions are imposed, they can only be tested conditionally on a set of just-identifying restrictions. This state of the art has led some researchers to extract additional identifying information from the statistical properties of the data. Notably, heteroskedasticity and conditional heteroskedasticity of the reduced-form residuals have been used in this context (Rigobon, 2003). Using such additional information may enable the researcher to let the data speak on the validity of restrictions that cannot be tested in a conventional framework.

One model that has been used repeatedly in applied studies to capture heteroskedasticity is based on a latent Markov process that drives the changes in volatility. The model was first proposed by Lanne, Lütkepohl & Maciejowska (2010) for SVAR analysis with identification through heteroskedasticity, and it was further developed by Herwartz & Lütkepohl (2014). The SVAR model with Markov-switching heteroskedasticity (SVAR-MSH) is in widespread use (see, e.g., Netšunajev (2013), Lütkepohl & Netšunajev (2014, 2017), Lütkepohl & Velinov (2014), Velinov & Chen (2015), Chen & Netšunajev (2017) and Kilian & Lütkepohl (2017, Chapter 14)). Some Bayesian methodology has been developed for its analysis by Kulikov & Netšunajev (2013, 2017), Lanne & Luoto (2016) and Woźniak & Droumaguet (2015).
Apart from Woźniak & Droumaguet (2015), all Bayesian approaches base inference for these models on draws from the posterior of the reduced-form parameters and transform this output into posterior draws of the structural model identified through heteroskedasticity. Hence, their methodology can only be used to generate posterior draws of just-identified structural parameters, which limits its applicability when over-identifying restrictions are of interest. Woźniak & Droumaguet (2015) focus on a locally identified SVAR-MSH model and develop methods for drawing from the posterior of the structural parameters. The posterior distribution of the parameters of a locally identified model is multimodal, however, which allows statistical model comparison but severely limits the analysis of the structural parameters.

In the present study, a full Bayesian analysis framework is presented based on an SVAR-MSH model in which some or all equations are identified. The setup facilitates both objectives mentioned above, namely posterior sampling of structural parameters subject to over-identifying restrictions and the analysis of the structural parameters themselves. We emphasize that our setup allows for the possibility that only some of the structural equations and associated structural shocks are identified. In the SVAR literature it is not uncommon that only the responses to a single shock or a small set of shocks are of interest. For example, in a monetary model, the monetary policy shock is often of primary interest. In that case, it makes sense to focus on the structural parameters associated with that shock only. Our approach allows us to handle that situation even if the other shocks are not properly identified.

Our main additional contributions to the SVAR-MSH literature on identification through heteroskedasticity are as follows: (1) Parametric restrictions for global and partial identification of the model are derived, and the model is set up in a form such that the data become informative about the conditions required for identification through heteroskedasticity.
Moreover, the model setup facilitates the Bayesian estimation of the structural parameters. (2) A procedure for investigating the restrictions for identification of the structural parameters based on a Savage-Dickey density ratio (SDDR) is proposed. For that purpose, a probability distribution is defined that generalizes the beta, F, and compound gamma distributions. Thereby a Bayesian statistical procedure is obtained for investigating partial and global identification of the SVAR-MSH model. An SDDR can also be used for assessing the heteroskedasticity of the structural shocks. (3) A fast Markov chain Monte Carlo (MCMC) sampler is developed for the posterior distribution of the structural parameters, and a method for computing the marginal data density (MDD) is provided, which facilitates full Bayesian model selection.

The methods are illustrated by applying them to an empirical analysis of the role of a Divisia money aggregate in a monetary policy reaction function. In a frequentist SVAR analysis, Belongia & Ireland (2015) find support for the hypothesis that Divisia monetary aggregates are important variables in the monetary policy rule. In their conventional SVAR models, which do not account for heteroskedasticity, they can only test over-identifying restrictions to validate their hypotheses. Using Belongia & Ireland (2015) as a benchmark, the Bayesian methods developed in the current study for the SVAR-MSH model are applied to a broader statistical analysis of the identifying restrictions, even for models that are not identified in Belongia & Ireland's framework. We find evidence that a money aggregate is an important factor determining monetary policy.

The remainder of this study is organized as follows. The next section presents the basic model framework and derives conditions for identification of the structural parameters. Section 3 discusses the prior assumptions used for the structural parameters.
The SDDR procedure for investigating the conditions for identification of the structural parameters obtained from the volatility model is presented in Section 4, and the empirical illustration is discussed in Section 5. Conclusions follow in Section 6. Finally, the proof of a result regarding identification through heteroskedasticity is given in Appendix A, the computational details of the Gibbs sampler and the estimation of the marginal data densities are presented in Appendix B, and Appendix C contains more details on the distribution used in the SDDR procedure. Additional empirical results on the precision of our estimates are presented in Appendix D.
2. Identified Heteroskedastic Structural Vector Autoregressions
In this section, a structural VAR model is introduced for the $N$-dimensional vector of observable variables $y_t$ in which the structural shocks are conditionally heteroskedastic. The structural-form model is given by

$$A_0 y_t = \mu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t, \qquad (1)$$

where the structural matrix $A_0$ is assumed to be nonsingular with unit diagonal, denoted by $\operatorname{diag}(A_0) = \imath_N$, where $\imath_N$ is an $N$-dimensional vector of ones. In other words, each equation is normalized on one of the variables, so that there is one equation for each variable. The quantity $\mu$ is an $N$-dimensional vector of constant terms, $A_1, \ldots, A_p$ denote $N \times N$ autoregressive slope coefficient matrices, and $u_t$ is a contemporaneously and serially uncorrelated structural error term. The variances of the structural errors are assumed to change over time according to a latent process $s_t$, $t \in \{1, \ldots, T\}$, and the variance of $u_{it}$ conditional on the state $s_t$ is denoted by $\sigma^2_{s_t,i}$. Moreover, conditionally on $s_t$, the structural errors are assumed to be normally distributed with mean vector zero and diagonal covariance matrix,

$$u_t \mid s_t \sim \mathcal{N}\left(0_N, \operatorname{diag}(\sigma^2_{s_t})\right), \quad t \in \{1, \ldots, T\}, \qquad (2)$$

where $\sigma^2_{s_t} = (\sigma^2_{s_t,1}, \ldots, \sigma^2_{s_t,N})'$ is an $N$-dimensional vector of variances associated with volatility state $s_t$, and $\operatorname{diag}(\sigma^2_{s_t})$ denotes a diagonal matrix with main diagonal given by $\sigma^2_{s_t}$. Later in this section, the heteroskedasticity of the data is used to identify the structural matrix $A_0$.

For each $t$, the process $s_t$ can take a discrete number of values, $s_t \in \{1, \ldots, M\}$. In the current study it is assumed to be an unobservable Markov process that defines a Markov-switching model as proposed by Lanne et al. (2010). In principle, the uniqueness restrictions and their Bayesian verification procedure can also be used for other types of processes $s_t$ that describe the volatility changes.

The properties of the Markov process $s_t$ considered in the current study are fully governed by an $M \times M$ matrix of transition probabilities $P$.
The $(i,j)$th element of $P$ is the probability of switching to state $j$ at time $t$, given that the process is in state $i$ at time $t-1$,

$$p_{ij} = \Pr\left[s_t = j \mid s_{t-1} = i\right], \quad i, j \in \{1, \ldots, M\}, \qquad \sum_{j=1}^{M} p_{ij} = 1.$$

Since the hidden Markov process has $M$ states, $M$ vectors of state-specific structural error variances, $\sigma^2_1, \ldots, \sigma^2_M$, also have to be estimated. Such a flexible MS heteroskedastic process offers a range of possibilities for modeling particular patterns of volatility changes in economic data (see Sims, Waggoner & Zha, 2008; Woźniak & Droumaguet, 2015).

The heteroskedastic SVAR model presented so far allows for the statistical identification of all $N$ rows of $A_0$, or of some of them, under conditions that will be derived shortly. Therefore, the identified rows of the matrix $A_0$ can be estimated in the heteroskedastic structural-form model given by equations (1) and (2). Any further restrictions imposed on the identified rows of $A_0$ over-identify the system, and thus the data are informative about such restrictions.

2.1. Identification via Heteroskedasticity

To see how statistical identification of the model is obtained via heteroskedasticity, it is useful to study the implied reduced-form model and its relation to the structural form. The reduced form of the model is obtained by multiplying the structural-form model in equation (1) by $A_0^{-1}$ from the left. The reduced-form residuals are $\epsilon_t = A_0^{-1} u_t$, so that, for a uniquely determined matrix $A_0$, the structural errors are obtained from the reduced-form residuals as $A_0 \epsilon_t = u_t$. Denote the $M$ covariance matrices of the reduced-form residuals by $\Sigma_m$, $m \in \{1, \ldots, M\}$. Under the current assumptions, where only the variances of the structural errors are state dependent while the VAR structure is time-invariant, there exists a decomposition

$$\Sigma_m = A_0^{-1} \operatorname{diag}(\sigma^2_m) (A_0^{-1})', \quad m \in \{1, \ldots, M\}. \qquad (3)$$

The following theorem presents conditions for identification of the rows of $A_0$. A proof is given in Appendix A.

Theorem 1.
Let $\Sigma_m$, $m = 1, \ldots, M$, be a sequence of positive definite $N \times N$ matrices and $\Lambda_m = \operatorname{diag}(\lambda_{m,1}, \ldots, \lambda_{m,N})$ be a sequence of $N \times N$ diagonal matrices with positive diagonal elements. Suppose there exists a nonsingular $N \times N$ matrix $A$ with unit main diagonal such that

$$\Sigma_m = A^{-1} \Lambda_m (A^{-1})', \quad m = 1, \ldots, M. \qquad (4)$$

Let $\omega_i = (\lambda_{2,i}/\lambda_{1,i}, \ldots, \lambda_{M,i}/\lambda_{1,i})'$ be an $(M-1) \times 1$ vector, $i = 1, \ldots, N$. Then the $k$th row of $A$ is unique if $\omega_k \neq \omega_i$ for all $i \in \{1, \ldots, N\} \setminus \{k\}$.

The theorem provides a general result on the identification of a single equation through heteroskedasticity. It shows that a structural equation and, hence, the corresponding structural shock is identified if the associated sequence of variances is not proportional to the variance sequence of any of the other shocks. For example, if there are just two volatility states and all variances change proportionally, that is, $\sigma^2_2 = c\,\sigma^2_1$ for some scalar $c$, then $\sigma^2_{2,i}/\sigma^2_{1,i} = \sigma^2_{2,j}/\sigma^2_{1,j}$ for all $i$ and $j$, so that the conditions of Theorem 1 are not satisfied.

Theorem 1 implies that $A$ is globally identified if the vectors of relative variances $\omega_i$, $i \in \{1, \ldots, N\}$, are all distinct. We summarize this result in the following corollary for future reference.

Corollary 1.
Under the assumptions of Theorem 1, if $\omega_i \neq \omega_j$ for all $i, j \in \{1, \ldots, N\}$, $i \neq j$, then $A$ is unique.

Corollary 1 implies that full identification may be obtained even if one of the structural shocks is homoskedastic. For example, in a two-dimensional model with two volatility states, the conditions of Corollary 1 are satisfied if the variances of the first shock are equal across states ($\sigma^2_{1,1} = \sigma^2_{2,1}$), as long as $\sigma^2_{1,2}$ and $\sigma^2_{2,2}$ are distinct.

The global identification condition for $A_0$ in Corollary 1 is an advantage of the present model setup relative to the typical setup used in the related literature on identification via heteroskedasticity. In that literature, a so-called $B$-model is typically used with locally identified shocks, which are unique up to sign and ordering only (see Lütkepohl, 2005). More precisely, the structural errors $u_t$ are assumed to be related to the reduced-form residuals $\epsilon_t$ as $u_t = B^{-1} \epsilon_t$, such that we get the reduced-form covariance decomposition

$$\Sigma_1 = BB', \qquad \Sigma_m = B \Lambda_m B', \quad m = 2, \ldots, M. \qquad (5)$$

The matrix $B$ has a direct interpretation as the matrix of impact effects of the shocks on the variables. No restrictions are imposed on the main diagonal of $B$, and local uniqueness of $B$ is obtained by normalizing the variances of the structural shocks associated with the first volatility state, that is, $u_t \mid (s_t = 1) \sim \mathcal{N}(0_N, I_N)$. The conditions for local uniqueness of this decomposition for any number of states $M$ are derived in Lanne et al. (2010).

While such local identification results are sufficient for asymptotic theory in a frequentist framework, they are not convenient for Bayesian analysis because they complicate simulating posterior distributions. Thus, the setup of Corollary 1, with restricted diagonal elements of $A_0$ and a specific variance sequence associated with each equation, is particularly useful for Bayesian analysis.
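The conditions of Theorem 1 and Corollary 1 involve only the relative-variance vectors $\omega_i$, so they are easy to check for given state-specific variances. The Python sketch below is illustrative only (the function names are hypothetical, not the authors' code): it computes $\omega_i$ from an $M \times N$ array of variances, lists the rows of $A_0$ that satisfy the sufficient condition of Theorem 1, and checks the global condition of Corollary 1. States are indexed from 0 in the code, so state 0 plays the role of the first state $m = 1$; comparing the $\omega_i$ up to a numerical tolerance is an assumption of this sketch.

```python
from typing import List, Sequence


def relative_variances(sigma2: Sequence[Sequence[float]]) -> List[List[float]]:
    """omega_i = (sigma2[1][i] / sigma2[0][i], ..., sigma2[M-1][i] / sigma2[0][i]).

    sigma2[m][i] is the variance of structural shock i in volatility state m
    (0-based, so sigma2[0] is the first state).
    """
    n_states, n_shocks = len(sigma2), len(sigma2[0])
    return [[sigma2[m][i] / sigma2[0][i] for m in range(1, n_states)]
            for i in range(n_shocks)]


def identified_rows(sigma2: Sequence[Sequence[float]], tol: float = 1e-10) -> List[int]:
    """Rows k of A_0 whose omega_k differs from every other omega_i (Theorem 1)."""
    omega = relative_variances(sigma2)

    def equal(u: Sequence[float], v: Sequence[float]) -> bool:
        return all(abs(a - b) <= tol for a, b in zip(u, v))

    return [k for k in range(len(omega))
            if not any(equal(omega[k], omega[i]) for i in range(len(omega)) if i != k)]


def globally_identified(sigma2: Sequence[Sequence[float]], tol: float = 1e-10) -> bool:
    """Corollary 1: A_0 is unique if all omega_i are pairwise distinct."""
    return len(identified_rows(sigma2, tol)) == len(sigma2[0])


# Two-dimensional example from the text: shock 1 homoskedastic, shock 2 not.
print(globally_identified([[1.0, 1.0], [1.0, 4.0]]))        # True
# Proportional change sigma2_2 = 3 * sigma2_1: no row satisfies Theorem 1.
print(identified_rows([[1.0, 2.0], [3.0, 6.0]]))            # []
# Partial identification: only the third row is identified.
print(identified_rows([[1.0, 1.0, 1.0], [2.0, 2.0, 5.0]]))  # [2]
```

The last case mirrors the situation emphasized in the introduction, where only a subset of shocks (here the third) is identified while the others are not distinguishable from each other.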
Moreover, estimation and inference on the unrestricted parameters of the matrix $A_0$ in the current model are separated from the scaling problem associated with the label switching of heteroskedastic states. In effect, the likelihood function and the posterior distribution have more regular shapes with fewer modes (see Woźniak & Droumaguet, 2015, for a detailed analysis of the impact of label switching of the heteroskedastic states on the shape of the posterior distribution).

Conditions for full identification could be formulated equivalently for parametrization (5). In fact, instead of normalizing the diagonal elements of $A_0$, one could normalize the diagonal elements of $B$ to obtain global identification. Such a normalization amounts to assuming that the $k$th shock has unit instantaneous impact on the $k$th variable. That condition is used by Stock & Watson (2016), who list several of its advantages. However, a potential drawback is that such a normalization requires knowledge that the $k$th shock has a nonzero impact effect on the $k$th variable, which may not be obvious in some situations. Thus, we prefer to work with a normalized $A_0$ matrix.

The advantage of the conditions given in Theorem 1 and Corollary 1 for the variances in parametrization (3) is that they can be investigated by statistical methods because the data are informative about them. If the conditions for full identification in Corollary 1 are not satisfied, the changing volatility may still offer some additional identifying information that implies sufficient curvature in the likelihood and, hence, in the posterior to enable the data to discriminate between competing economic models.

Note that the identification of the matrix $A_0$ using heteroskedasticity is only a statistical identification that allows one to estimate all or some of the elements of this matrix without imposing any further restrictions on the model. For any identified shock, the structural impulse response functions can be computed.
However, the structural-form errors do not have economic interpretations as such. To label any of the structural shocks, say, as a monetary policy shock, economic reasoning is needed. Still, it is useful to exploit such an identification of the shocks, as it opens up the possibility of testing any further restrictions imposed on the model on the basis of economic considerations.

To obtain a flexible framework that facilitates the estimation of models with an unrestricted or restricted matrix $A_0$, the approach proposed by Amisano & Giannini (1997), also used by Canova & Pérez Forero (2015), is helpful. Let the $r \times 1$ vector $\alpha$ collect all of the unrestricted elements of the matrix $A_0$ column by column. Then we impose restrictions on the structural matrix $A_0$ by setting

$$\operatorname{vec}(A_0) = Q\alpha + q, \qquad (6)$$

where $Q$ and $q$ are an $N^2 \times r$ matrix and an $N^2 \times 1$ vector, respectively. The elements of $Q$ and $q$ will be zeros and ones if zero restrictions are imposed on the off-diagonal elements of $A_0$ in addition to the restrictions due to normalizing the main diagonal.
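Restriction (6) can be built mechanically from a pattern that marks which elements of $A_0$ are free. The following Python sketch assumes a hypothetical pattern format (`None` for a free element, a fixed number otherwise) and uses the column-major ordering of the vec operator, so that $\alpha$ collects the free elements column by column as described above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pattern = Sequence[Sequence[Optional[float]]]


def restriction_matrices(pattern: Pattern) -> Tuple[List[List[float]], List[float]]:
    """Build Q (N^2 x r) and q (N^2 x 1) with vec(A_0) = Q alpha + q.

    pattern[i][j] is None for a free element of A_0 and a fixed value otherwise
    (1.0 on the main diagonal, 0.0 for an exclusion restriction).
    vec() stacks the columns of A_0, so element (i, j) sits at position j * N + i.
    """
    n = len(pattern)
    cells = [(i, j) for j in range(n) for i in range(n)]  # column-major order
    free: Dict[Tuple[int, int], int] = {}
    for cell in cells:
        if pattern[cell[0]][cell[1]] is None:
            free[cell] = len(free)  # alpha collects free elements column by column
    Q = [[0.0] * len(free) for _ in cells]
    q = [0.0] * len(cells)
    for pos, (i, j) in enumerate(cells):
        if (i, j) in free:
            Q[pos][free[(i, j)]] = 1.0
        else:
            q[pos] = float(pattern[i][j])
    return Q, q


def build_a0(Q: List[List[float]], q: List[float], alpha: Sequence[float]) -> List[List[float]]:
    """Map alpha to A_0 via vec(A_0) = Q alpha + q and undo the vec operator."""
    vec = [sum(qk * ak for qk, ak in zip(row, alpha)) + c for row, c in zip(Q, q)]
    n = int(round(len(vec) ** 0.5))
    return [[vec[j * n + i] for j in range(n)] for i in range(n)]


# Hypothetical 3-variable example: unit diagonal and one zero restriction A_0[0][2] = 0.
pattern = [[1.0, None, 0.0],
           [None, 1.0, None],
           [None, None, 1.0]]
Q, q = restriction_matrices(pattern)
print(len(Q), len(Q[0]))   # 9 5  (N^2 = 9 rows, r = 5 free elements)
print(build_a0(Q, q, [0.1, 0.2, 0.3, 0.4, 0.5]))
# [[1.0, 0.3, 0.0], [0.1, 1.0, 0.5], [0.2, 0.4, 1.0]]
```

In this sketch the unit diagonal and the exclusion restriction both live in $q$, while $Q$ is a selection matrix of zeros and ones, matching the special case mentioned in the text.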
3. Prior Distributions for Bayesian Analysis
To facilitate inference on the restrictions for the uniqueness of the rows of the matrix $A_0$ with ones on the main diagonal, we estimate the state-specific variances of the structural shocks in a parametrization that includes the variances of the structural shocks in the first state, $\sigma^2_1$, and the $M-1$ vectors of relative variances $\omega_m = [\sigma^2_{m,i}/\sigma^2_{1,i}]$, for states $m \in \{2, \ldots, M\}$. Specifying independent inverse gamma 2 ($\mathcal{IG}2$) distributions as prior distributions for the $\omega_m$s, given our assumptions about the distribution of the error terms, leads to full conditional posterior distributions of the same type for these parameters. (For the definition of the distribution, its properties, and the random number sampling algorithm, see Bauwens, Richard & Lubrano, 1999, Appendix B.) This setup is the basis for feasible computations of the SDDRs for the uniqueness conditions, which are specified in terms of relative variances. The choice of the parametrization, marginal prior and full conditional posterior distributions for the $\omega_m$s makes our SDDR computations independent of the specification of the process $s_t$. The details of the Bayesian assessment of the uniqueness restrictions are given in Section 4.

The variances of the structural shocks in the first state, $\sigma^2_{1,n}$, are a priori independently distributed as $\mathcal{IG}2$ with both parameters $a$ and $b$ set to 1, which makes the distribution quite spread out over a wide range. In fact, under this assumption the first and second moments of $\sigma^2_{1,n}$ may be infinite, which limits the impact of the prior distribution on the posterior distribution.

The prior of each of the relative variances of the structural shocks, $\omega_{m,n}$, for $m \in \{2, \ldots, M\}$, is independently $\mathcal{IG}2(a_\omega, b_\omega)$ with $b_\omega = a_\omega + 2$, which ensures that the mode of the prior distribution is located at 1. This assumption implies that the state-specific variances for states $2, \ldots, M$ have prior distributions similar to those in Woźniak & Droumaguet (2015). At the mode of the prior distribution there is no heteroskedasticity, and hence the rows of $A_0$ are not uniquely identified.

The prior distribution for the unrestricted elements of the matrix $A_0$ collected in the vector $\alpha$, conditionally on the hyper-parameter $\gamma_\alpha$, is a normal distribution with mean vector zero and diagonal covariance matrix $\gamma_\alpha I_r$. To avoid making the prior more restrictive for some elements of $A_0$ than for others, one could make sure that the variables entering the model have similar orders of magnitude. In macro models this is typically not a problem because many variables enter in logs or rates of change. The hyper-parameter $\gamma_\alpha$ is interpreted as the level of shrinkage imposed on the structural parameters $\alpha$ and is also estimated. For that purpose, we define the marginal prior distribution of $\gamma_\alpha$ to be $\mathcal{IG}2$ with both parameters $a$ and $b$ set to 1.

The conditional prior distribution of the variable-specific constant term $\mu_n$, $n \in \{1, \ldots, N\}$, given a hyper-parameter $\gamma_\mu$ specific to the constant terms, is a univariate normal distribution with mean zero and variance $\gamma_\mu$. The marginal prior distribution for $\gamma_\mu$ is $\mathcal{IG}2$ with both parameters $a$ and $b$ set to 1. To specify the prior distribution of the structural VAR slope parameters $A_+ = [A_1, \ldots, A_p]$, let $\underline{P} = [D \;\; 0_{N \times N(p-1)}]$, where $D$ is an $N \times N$ diagonal matrix. Typically, the diagonal elements of the matrix $D$ are zeros for stationary variables and ones for persistent variables, as in the Minnesota prior, but they could also be other known quantities. Then the conditional prior distribution of the equation-specific autoregressive parameters $A_{+.n} = [A_{1.n}, \ldots, A_{p.n}]$, where $A_{l.n}$ is the $n$th row of matrix $A_l$ for $l \in \{1, \ldots, p\}$, is a $pN$-variate normal distribution.
It is conditioned on an autoregressive hyper-parameter $\gamma_A$ and the $n$th row of $A_0$, denoted by $A_{0.n}$. Its prior mean is equal to $A_{0.n}\underline{P}$, and its prior covariance matrix is equal to $\gamma_A H$. The diagonal matrix $H$ has its main diagonal set to the vector $\left(1^{-2}\imath_N', 2^{-2}\imath_N', \ldots, p^{-2}\imath_N'\right)'$, and thus it allows one to impose a decaying pattern of prior variances for subsequent lags, as in the Minnesota prior of Doan, Litterman & Sims (1983). The prior distribution for $\gamma_A$ is $\mathcal{IG}2$ with both parameters $a$ and $b$ set to 1.

Finally, denote by $P_m$ the $m$th row of the transition matrix $P$. The prior distributions for the rows of the transition probability matrix, $P_m$, are set independently for each row and are given by $M$-dimensional Dirichlet distributions ($\mathcal{D}_M$) as in Woźniak & Droumaguet (2015). The parameters of these distributions, $e_{m,k}$, for $k \in \{1, \ldots, M\}$, are all set to 1 except for the parameters corresponding to the diagonal elements of the matrix $P$ of transition probabilities, denoted by $e_{m,m}$, which are set to 10. This choice expresses the prior assumption that the volatility states are persistent over time.

To summarize, the prior specification takes the following form:

$$p(\theta) = p(\gamma_\alpha)\, p(\gamma_\mu)\, p(\gamma_A)\, p(\alpha \mid \gamma_\alpha) \left( \prod_{m=1}^{M} p(P_m) \times \prod_{n=1}^{N} p(\mu_n \mid \gamma_\mu)\, p(A_{+.n} \mid A_{0.n}, \gamma_A)\, p(\sigma^2_{1,n}) \prod_{m=2}^{M} p(\omega_{m,n}) \right), \qquad (7)$$

where the specific prior distributions are:

$$\begin{aligned}
\mu_n \mid \gamma_\mu &\sim \mathcal{N}(0, \gamma_\mu), & A_{+.n} \mid A_{0.n}, \gamma_A &\sim \mathcal{N}_{pN}(A_{0.n}\underline{P}, \gamma_A H), \\
\alpha \mid \gamma_\alpha &\sim \mathcal{N}_r(0_r, \gamma_\alpha I_r), & \sigma^2_{1,n} &\sim \mathcal{IG}2(a, b), \\
\omega_{\tilde m,n} &\sim \mathcal{IG}2(a_\omega, b_\omega), & \gamma_\alpha &\sim \mathcal{IG}2(a, b), \\
\gamma_\mu &\sim \mathcal{IG}2(a, b), & \gamma_A &\sim \mathcal{IG}2(a, b), \\
P_m &\sim \mathcal{D}_M(e_{m1}, \ldots, e_{mM}), &&
\end{aligned}$$

for $n \in \{1, \ldots, N\}$, $m \in \{1, \ldots, M\}$ and $\tilde m \in \{2, \ldots, M\}$, with $a = b = 1$ as stated above.

The above choice of the prior distributions is practical.
Priority is given to distributions that result in convenient and proper full conditional posterior distributions and, therefore, allow for the derivation of an efficient Gibbs sampler, which is described in Appendix B. The hierarchical prior distributions for the constant terms, autoregressive slope parameters, and the structural matrix constitute a flexible framework in which the impact of the choice of the hyper-parameters of the prior distribution on inference is reduced, in line with Giannone, Lenza & Primiceri (2015).
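The claim that $b_\omega = a_\omega + 2$ places the prior mode of $\omega_{m,n}$ at 1 follows from the $\mathcal{IG}2$ density of Bauwens, Richard & Lubrano (1999), written out in equation (12) of Section 4: setting the derivative of its logarithm, $-(a+2)/(2x) + b/(2x^2)$, to zero gives the mode $b/(a+2)$. The Python sketch below checks this numerically on a grid and evaluates the mean $b/(a-2)$, which is finite only for $a > 2$ and is therefore infinite for the $\mathcal{IG}2$ priors with $a = b = 1$. The value $a_\omega = 1$ used here is an arbitrary illustrative choice, not necessarily the one used in the paper.

```python
import math


def ig2_logpdf(x: float, a: float, b: float) -> float:
    """log f_IG2(x; a, b) = -log Gamma(a/2) + (a/2) log(b/2) - ((a+2)/2) log x - b/(2x)."""
    return (-math.lgamma(a / 2) + (a / 2) * math.log(b / 2)
            - ((a + 2) / 2) * math.log(x) - b / (2 * x))


def ig2_mode(a: float, b: float) -> float:
    """Mode b / (a + 2), the root of d/dx log f = -(a+2)/(2x) + b/(2x^2)."""
    return b / (a + 2)


def ig2_mean(a: float, b: float) -> float:
    """Mean b / (a - 2), finite only when a > 2."""
    return b / (a - 2) if a > 2 else math.inf


a_omega = 1.0               # illustrative value only
b_omega = a_omega + 2.0     # the restriction stated in the text
grid = [0.2 + 0.001 * k for k in range(3001)]
numerical_mode = max(grid, key=lambda x: ig2_logpdf(x, a_omega, b_omega))
print(ig2_mode(a_omega, b_omega), round(numerical_mode, 3))   # 1.0 1.0
print(ig2_mean(1.0, 1.0))                                     # inf
```

Any positive $a_\omega$ combined with $b_\omega = a_\omega + 2$ gives the same mode; the choice of $a_\omega$ only controls how concentrated the prior is around the homoskedastic value.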
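The conditional prior of the slope coefficients of equation $n$ is determined by $A_{0.n}$, $D$, $p$ and $\gamma_A$. The Python sketch below (a hypothetical helper, not the authors' code) returns the prior mean, obtained by post-multiplying $A_{0.n}$ by the selection matrix $[D \;\; 0_{N \times N(p-1)}]$, together with the diagonal of the prior covariance $\gamma_A H$. It assumes that the lag-$l$ block of $H$ equals $l^{-2}\imath_N$; the decay exponent is exposed as a parameter because it is an assumption of this sketch. Since the first-lag block of the mean is $A_{0.n}D$, the implied reduced-form first-lag matrix $A_0^{-1}A_1$ has conditional prior mean $D$, which is the Minnesota-type centring on a random walk for persistent variables.

```python
from typing import List, Sequence, Tuple


def ar_prior_moments(a0_row: Sequence[float], d: Sequence[float], p: int,
                     gamma_a: float, decay: float = 2.0) -> Tuple[List[float], List[float]]:
    """Prior mean and diagonal prior covariance gamma_A * H for the slopes of one equation.

    a0_row : n-th row of A_0 (length N)
    d      : diagonal of D (1 for persistent, 0 for stationary variables)
    decay  : exponent of the lag decay l**(-decay) on the diagonal of H (assumed 2)
    """
    n = len(a0_row)
    # [D, 0_{N x N(p-1)}]: only the first-lag block has a nonzero prior mean.
    mean = [a0_row[i] * d[i] for i in range(n)] + [0.0] * (n * (p - 1))
    # H = diag(1^-decay * iota_N', ..., p^-decay * iota_N'), scaled by gamma_A.
    var_diag = [gamma_a * lag ** (-decay) for lag in range(1, p + 1) for _ in range(n)]
    return mean, var_diag


mean, var_diag = ar_prior_moments([1.0, 0.5], d=[1.0, 0.0], p=2, gamma_a=1.0)
print(mean)      # [1.0, 0.0, 0.0, 0.0]
print(var_diag)  # [1.0, 1.0, 0.25, 0.25]
```

Because $\gamma_A$ scales all lags equally, estimating it (as the hierarchical prior does) changes the overall tightness while the relative decay across lags stays fixed.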
4. Bayesian Assessment of Identification Conditions and Heteroskedasticity
In this section, we propose to use the Savage-Dickey density ratio (SDDR) (see Verdinelli & Wasserman, 1995, and references therein) to verify the identification conditions for the structural model considered in this paper. The SDDR is one of the methods to compute a Bayes factor. Under the assumption of equal prior probabilities of the competing models, the Bayes factor is interpreted as the posterior odds ratio of the model with restrictions versus the unrestricted model. Thus, a large value of the SDDR is evidence in favor of the restriction considered, and a small SDDR provides evidence against the restriction.

The main advantage of verifying hypotheses using SDDRs is the small computational cost relative to alternative inference methods. SDDRs are computed using only the output of the MCMC estimation of the unrestricted model. If a probability density function of the restricted (function of) parameters is available, then the computations simplify even further by the application of the Rao-Blackwell tool (see Gelfand & Smith, 1990). The latter provides a marginal density ordinate estimate with high numerical precision. The SDDR for verification of identification of the SVAR proposed below exhibits all of the features mentioned above. Alternative ways of computing the Bayes factor either lose numerical efficiency or involve the estimation of two models, the restricted and the unrestricted one, as well as the estimation of their MDDs. Either of these tasks may entail significant computational costs.

The identification of SVARs requires sufficient variability in the conditional variances of the structural shocks. The uniqueness of the structural matrix $A_0$ can be assessed by verifying equality restrictions for specific relative variances, such as

$$\omega_{m,i} = \omega_{m,j}. \qquad (8)$$

If the restriction above holds for some $i$ and $j$ for all $m \in \{2, \ldots, M\}$, then the $i$th structural shock cannot be distinguished from the $j$th structural shock, and the corresponding rows of matrix $A_0$ are not uniquely identified.

Of course, identification through heteroskedasticity requires that there are at least two distinct volatility states. In other words, for heteroskedasticity of structural shock $i$, there must be two distinct variances $\sigma^2_{m,i}$, $m \in \{1, \ldots, M\}$, which translates into the requirement that at least one of the $\omega_{m,i}$, $m \in \{2, \ldots, M\}$, is not equal to one. Thus, the homoskedasticity of the $i$th structural shock is assessed by verifying the restrictions

$$\omega_{2,i} = \cdots = \omega_{M,i} = 1. \qquad (9)$$

Both of the restrictions (8) and (9) can be verified, as the data are informative about these features. We rewrite the restriction in equation (8) as

$$\frac{\omega_{m,i}}{\omega_{m,j}} = 1 \qquad (10)$$

and compute the SDDR as

$$\text{SDDR} = \frac{p\left( \frac{\omega_{m,i}}{\omega_{m,j}} = 1 \,\middle|\, Y \right)}{p\left( \frac{\omega_{m,i}}{\omega_{m,j}} = 1 \right)}, \qquad (11)$$

where $Y = (y_1, \ldots, y_T)$ denotes the data. Small values of the SDDR provide evidence against the ratio $\omega_{m,i}/\omega_{m,j}$ being 1. Of course, this raises the question of how small the SDDR has to be to indicate clear evidence against the restriction. Kass & Raftery (1995) discuss a scale for evaluating the size of the SDDR. We will use that scale in our empirical illustration in Section 5.

The SDDR is particularly suitable for the verification of the identification conditions because it does not require the estimation of the restricted models. Moreover, the SDDR can be easily computed as long as the densities of the full conditional posterior and the prior distributions of $\omega_{m,i}/\omega_{m,j}$ are of known analytical form. We propose a distribution that is useful for such computations in the context of $\mathcal{IG}2$-distributed random variables.

Definition 1 (Inverse Gamma 2 Ratio distribution).
Let $x$ and $y$ be two strictly positive independent random variables distributed according to the following $\mathcal{IG}2$ distributions: $x \sim \mathcal{IG}2(a_1, b_1)$ and $y \sim \mathcal{IG}2(a_2, b_2)$, where $a_1$, $a_2$, $b_1$, and $b_2$ are positive real numbers and the probability density function of the inverse gamma 2 distribution is given by

$$f_{\mathcal{IG}2}(x; a, b) = \Gamma\left(\frac{a}{2}\right)^{-1} \left(\frac{b}{2}\right)^{\frac{a}{2}} x^{-\frac{a+2}{2}} \exp\left(-\frac{b}{2x}\right), \qquad (12)$$

where $\Gamma(\cdot)$ denotes the gamma function. Then the random variable $z$, defined as $z = x/y$, follows the Inverse Gamma 2 Ratio ($\mathcal{IG}2R$) distribution with probability density function given by

$$f_{\mathcal{IG}2R}(z; a_1, a_2, b_1, b_2) = B\left(\frac{a_1}{2}, \frac{a_2}{2}\right)^{-1} b_1^{\frac{a_1}{2}}\, b_2^{\frac{a_2}{2}}\, \frac{z^{\frac{a_2 - 2}{2}}}{\left(b_1 + b_2 z\right)^{\frac{a_1 + a_2}{2}}}, \qquad (13)$$

where $B(\cdot, \cdot)$ denotes the beta function.

It is easy to show that the moments of the Inverse Gamma 2 Ratio distribution are as follows.
Moments of the IG R distribution. The expected value and the variance of the IG R –distributedrandom variable z are respectively given by E [ z ] = b b a a a > , (14) Var [ z ] = ⇣ b b ⌘ a ( a + a a ( a
4) for a > . (15)In general, the k th order non-central moment of z is given by E h z k i = b b ! k B ⇣ a k , a + k ⌘ B ⇣ a , a ⌘ for a > k . (16) ⇤ The density given above generalizes the F distribution that is nested within our distributionfamily by setting a = b and a = b , as well as the compound gamma distribution derived byDubey (1970) that is parametrized by three parameters a / a /
2, and b / b . For completeness of the derivations, Appendix C defines the Inverse Gamma 1 Ratio distribution of a random variable that is defined as a ratio of two independent inverse gamma 1-distributed random variables.

Footnote: Further generalizations of the F, Beta, and compound gamma distributions were proposed by McDonald (1984) and McDonald & Xu (1995). The latter work is particularly relevant for our developments as it proposes generalizations of the compound gamma distribution parametrized by four and five parameters. These distributions explicitly nest the compound gamma distribution; however, none of them nests our Inverse Gamma 2 Ratio or Inverse Gamma 1 Ratio distribution.

In Section 3 it was assumed that the parameters $\omega_{m,i}$ and $\omega_{m,j}$ are a priori independently distributed as IG2 with parameters $a_\omega$ and $b_\omega$. In effect, the denominator of the SDDR from equation (11) can be computed by simply evaluating the newly proposed distribution at $z = 1$ with parameters $a_1 = a_2 = a_\omega$ and $b_1 = b_2 = b_\omega$. The numerator is estimated by

$$\hat{p}\left(\frac{\omega_{m,i}}{\omega_{m,j}} = 1 \,\middle|\, Y\right) = \frac{1}{S}\sum_{s=1}^{S} f_{IG2R}\left(1; a^{(s)}_{i,m}, a^{(s)}_{j,m}, b^{(s)}_{i,m}, b^{(s)}_{j,m}\right), \qquad (17)$$

where $\left\{a^{(s)}_{i,m}, a^{(s)}_{j,m}, b^{(s)}_{i,m}, b^{(s)}_{j,m}\right\}_{s=1}^{S}$ is a sample of $S$ draws from the posterior distribution, defined for $n \in \{i, j\}$ as follows:

$$a^{(s)}_{n,m} = a_\omega + T^{(s)}_m, \qquad (18a)$$
$$b^{(s)}_{n,m} = b_\omega + \left(\sigma^{(s)}_{1,n}\right)^{-1} \sum_{t:\, s^{(s)}_t = m} \left(A^{(s)}_{0,n} y_t - \mu^{(s)}_n - A^{(s)}_{1,n} y_{t-1} - \cdots - A^{(s)}_{p,n} y_{t-p}\right)^2, \qquad (18b)$$

where $T^{(s)}_m$ is the number of observations classified as belonging to the $m$th state in the $s$th iteration of the sampling algorithm.

According to the conditions stated in Section 2, the $j$th structural shock may not be identified if all the ratios $\omega_{m,i}/\omega_{m,j}$ are equal to 1. Hence, to establish possible identification problems, we have to investigate whether $\omega_{m,i}/\omega_{m,j} = 1$ for all $m \in \{2, \dots, M\}$. The SDDR can be extended for that purpose. Let $U_{i.j}$ denote the event that $\omega_{m,i}/\omega_{m,j} = 1$ for all $m \in \{2, \dots, M\}$. In this case, the denominator of the SDDR for $U_{i.j}$ is computed simply as

$$\hat{p}\left(U_{i.j}\right) = \prod_{m=2}^{M} p\left(\frac{\omega_{m,i}}{\omega_{m,j}} = 1\right) = f_{IG2R}\left(1; a_\omega, a_\omega, b_\omega, b_\omega\right)^{M-1}, \qquad (19)$$

where the last equality follows from the assumption that the prior distribution is invariant with respect to $m$. The SDDR's numerator is computed as

$$\hat{p}\left(U_{i.j} \mid Y\right) = \frac{1}{S}\sum_{s=1}^{S}\prod_{m=2}^{M} f_{IG2R}\left(1; a^{(s)}_{i,m}, a^{(s)}_{j,m}, b^{(s)}_{i,m}, b^{(s)}_{j,m}\right). \qquad (20)$$

Given the output from the MCMC estimation, the computations of the SDDRs presented above are fast and accurate, which emphasizes the advantages of the current setup. The verification of the identification conditions with the SDDRs requires that the prior and full conditional posterior distributions of the relative variances of the structural shocks be IG2- or IG1-type distributions for $\omega_{m,n}$, and it does not depend on how the state variable $s_t$ is estimated. It is, therefore, easily applicable also to other regime-dependent heteroskedastic processes, such as those considered by Woźniak & Droumaguet (2015), and to Markov-switching models with time-varying transition probabilities, as considered by Sims et al. (2008) and Chen & Netšunajev (2017).

Having a procedure for verifying the identification conditions emphasizes the benefits of applying Bayesian inference in this paper. Note that there does not exist a general, valid frequentist test of such conditions. In Bayesian inference, the estimation of a model that is not identified does not pose any theoretical or practical obstacles. Therefore, verifying hypotheses in a standard way, such as with the SDDR, is straightforward. Still, verification of uniqueness is essential to understand the SVAR model identified through heteroskedasticity. If the identification conditions are confirmed, then heteroskedasticity is also established as a by-product. However, one may also be interested in testing the shocks individually or jointly for heteroskedasticity.

In the same way as the SDDR can be used to verify the identification conditions, it can also be used to investigate the heteroskedasticity of the structural shocks. Denote by $H_i$ the event that the restrictions $\omega_{2,i} = \cdots = \omega_{M,i} = 1$ hold, that is, that the $i$th shock is homoskedastic. The SDDR for assessing this hypothesis is given by

$$SDDR = \frac{p\left(H_i \mid Y\right)}{p\left(H_i\right)}. \qquad (21)$$

The elements of the SDDR in the equation above can be computed easily by

$$\hat{p}\left(H_i\right) = \prod_{m=2}^{M} p\left(\omega_{m,i} = 1\right) = f_{IG2}\left(1; a_\omega, b_\omega\right)^{M-1} \qquad (22)$$

and

$$\hat{p}\left(H_i \mid Y\right) = \frac{1}{S}\sum_{s=1}^{S}\prod_{m=2}^{M} f_{IG2}\left(1; a^{(s)}_{i,m}, b^{(s)}_{i,m}\right), \qquad (23)$$

where $a^{(s)}_{i,m}$ and $b^{(s)}_{i,m}$ are given in equation (18).

The condition for joint homoskedasticity of several structural shocks can be assessed as well. Let $J = \{j_1, \dots, j_K\} \subseteq \{1, \dots, N\}$, $K \leq N$, be the set of indices of the shocks whose homoskedasticity conditions are considered jointly. Then, the joint homoskedasticity condition is denoted by $H = H_{j_1} \cap \cdots \cap H_{j_K}$, and the elements of the SDDR are computed as follows:

$$\hat{p}\left(H\right) = \prod_{i \in J}\prod_{m=2}^{M} p\left(\omega_{m,i} = 1\right) = f_{IG2}\left(1; a_\omega, b_\omega\right)^{K(M-1)} \qquad (24)$$

and

$$\hat{p}\left(H \mid Y\right) = \frac{1}{S}\sum_{s=1}^{S}\prod_{i \in J}\prod_{m=2}^{M} f_{IG2}\left(1; a^{(s)}_{i,m}, b^{(s)}_{i,m}\right). \qquad (25)$$

All the computations in this section are facilitated by the fact that the $\omega_{m,n}$ are a priori, as well as conditionally a posteriori, independent for $m \in \{2, \dots, M\}$ and $n \in \{1, \dots, N\}$. The proposed Bayesian SDDR for assessing homoskedasticity can be computed easily given the sample of draws from the posterior distribution, and it does not pose any significant theoretical challenges (see, e.g., Frühwirth-Schnatter, 2006).
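The estimators in equations (17) to (25) reduce to evaluating IG2 and IG2 Ratio densities at one and averaging the products over posterior draws. The Python sketch below illustrates this under stated assumptions: it takes IG2(a, b) to mean that 1/x follows a gamma distribution with shape a/2 and rate b/2 (the convention under which the updates $a_\omega + T_m$ and $b_\omega$ plus a scaled sum of squares in (18) are conjugate), and it derives the ratio density directly from that convention rather than copying the definition in Appendix C, whose parametrization may differ. All function names are hypothetical. Averages are taken on the log scale, because ln SDDRs of several hundred in absolute value would otherwise underflow.

```python
import math
from typing import Sequence

# Assumed convention: x ~ IG2(a, b)  <=>  1/x ~ Gamma(shape=a/2, rate=b/2),
# i.e. f(x) = (b/2)^(a/2) / Gamma(a/2) * x^(-(a+2)/2) * exp(-b / (2x)).


def log_ig2_pdf(x: float, a: float, b: float) -> float:
    """Log density of IG2(a, b) evaluated at x > 0."""
    return (0.5 * a * math.log(0.5 * b) - math.lgamma(0.5 * a)
            - 0.5 * (a + 2.0) * math.log(x) - 0.5 * b / x)


def log_ig2_ratio_pdf(z: float, a1: float, a2: float, b1: float, b2: float) -> float:
    """Log density of z = x1 / x2 for independent x1 ~ IG2(a1, b1), x2 ~ IG2(a2, b2).

    Since z = g2 / g1 with g_k = 1 / x_k ~ Gamma(a_k/2, rate b_k/2), the density is
    z^(al2-1) be1^al1 be2^al2 / (B(al1, al2) (be1 + be2 z)^(al1+al2)).
    """
    al1, al2, be1, be2 = 0.5 * a1, 0.5 * a2, 0.5 * b1, 0.5 * b2
    log_beta = math.lgamma(al1) + math.lgamma(al2) - math.lgamma(al1 + al2)
    return (al1 * math.log(be1) + al2 * math.log(be2) + (al2 - 1.0) * math.log(z)
            - log_beta - (al1 + al2) * math.log(be1 + be2 * z))


def log_mean_exp(values: Sequence[float]) -> float:
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def log_sddr_identification(a_i, a_j, b_i, b_j, a_omega, b_omega):
    """ln SDDR for U_{i.j}: omega_{m,i} / omega_{m,j} = 1 for m = 2, ..., M.

    a_i[s][k], a_j[s][k], b_i[s][k], b_j[s][k] hold the posterior parameters of
    equation (18) for draw s and state m = k + 2 (M - 1 entries per draw).
    """
    n_ratios = len(a_i[0])
    log_num = log_mean_exp([  # numerator, equation (20)
        sum(log_ig2_ratio_pdf(1.0, ai[k], aj[k], bi[k], bj[k]) for k in range(n_ratios))
        for ai, aj, bi, bj in zip(a_i, a_j, b_i, b_j)
    ])
    # denominator, equation (19)
    log_den = n_ratios * log_ig2_ratio_pdf(1.0, a_omega, a_omega, b_omega, b_omega)
    return log_num - log_den


def log_sddr_homoskedasticity(a_i, b_i, a_omega, b_omega):
    """ln SDDR for H_i: omega_{2,i} = ... = omega_{M,i} = 1, equations (21) to (23)."""
    n_states = len(a_i[0])
    log_num = log_mean_exp([
        sum(log_ig2_pdf(1.0, ai[k], bi[k]) for k in range(n_states))
        for ai, bi in zip(a_i, b_i)
    ])
    return log_num - n_states * log_ig2_pdf(1.0, a_omega, b_omega)
```

If every posterior draw simply reproduced the prior parameters, both functions would return ln SDDR = 0, which is what one expects when the data carry no information about the restriction. The joint condition in (24) and (25) follows the same pattern, with the inner sum running over all $i \in J$ as well as over $m$.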
5. Empirical Illustration
In this section we illustrate our Bayesian procedures by applying them to SVAR models that were considered by Belongia & Ireland (2015) to study the role of Divisia monetary aggregates in monetary policy models. These authors find statistical support for the importance of Divisia monetary aggregates in the monetary policy rule. They document these relationships using Divisia measurements of several alternative monetary aggregates.

In this paper, we focus on the particular role of the money aggregate M2 which, when properly represented by a Divisia measure, can explain aggregate fluctuations to a large extent, as argued by Barnett (2012). For that purpose, we review a number of identification schemes for the SVAR model, some of which have been considered by Belongia & Ireland (2015).

We build VAR models for the following six quarterly U.S. variables: $p_t$, the log of the GDP deflator; $gdp_t$, the log of real GDP; $cp_t$, a measure of commodity prices defined as the spot index compiled now by the Commodity Research Bureau and earlier by the Bureau of Labor Statistics; $FF_t$, the federal funds rate; $m_t$, the logarithm of the M2 Divisia monetary aggregate $M_t$, which measures the flow of monetary services; and $uc_t$, the user-cost measure provided by Barnett et al. (2013), which is the price dual to the Divisia monetary aggregate $M_t$. These variables, in exactly this order, are collected in the vector $y_t$, i.e., $y_t = (p_t, gdp_t, cp_t, FF_t, m_t, uc_t)'$. The series are plotted in Figure 1 for the sample period from 1967Q1 to 2013Q4.

Assuming that there is sufficient heterogeneity in the covariance structure of the VAR model, a full set of shocks can be identified by heteroskedasticity. No further restrictions are needed for $A_0$ in this case. In the following, we refer to the model as unrestricted if it is identified purely by heteroskedasticity (see the first scheme in Table 1).
Note that the ordering of the equations in this scheme is to some extent arbitrary as no economic restrictions are imposed. In our empirical analysis, we use a model with two volatility regimes and order the equations such that the relative variances of the error terms of the unrestricted model correspond to the relative variances obtained for the conventional identification schemes. In particular, the equation with the largest relative variance is placed as the fourth equation of the model and is considered to be the interest rate equation because, for the conventional identification schemes discussed subsequently, the interest rate equation is the fourth equation and it consistently has by far the largest relative variance.

Footnote: We thank Belongia & Ireland for sharing their data set with us.

Figure 1: Time series data taken from Belongia & Ireland (2015). Panels: p, gdp, cp, FF, m, uc.

A standard conventional identification scheme is a recursive model motivated by the work of Bernanke & Blinder (1992), Sims (1986, 1992), and others (see the second scheme in Table 1). It just-identifies the system in the conventional homoskedastic case. In the heteroskedastic case, instead, the zero restrictions above the main diagonal are over-identifying and can be tested. This model identifies the monetary policy shock by imposing restrictions on the fourth row of the matrix $A_0$ such that the interest rate reacts to contemporaneous changes in the price level, output, and commodity prices.

In a monetary model, the interest rate equation is typically set up as a Taylor rule, which assumes that the interest rate reacts to inflation and the output gap. In our model comparison we include a benchmark model inspired by Leeper & Roush (2003), which is also discussed by Belongia & Ireland (2015). In Table 1 the identification of the $A_0$ matrix from this work is described as "Taylor Rule with Money".
In addition to standard Taylor rule variables such as the output gap and inflation, the interest rate equation also contains the Divisia monetary aggregate, meaning that monetary policy reacts to changes in the money stock. Additionally, this model identifies the fifth and sixth shocks as money demand and money supply shocks, respectively. The restrictions imposed on the last two rows of $A_0$ over-identify the model and, hence, they can be tested in a conventional setting as well as in our heteroskedastic setting.

Belongia & Ireland are specifically interested in the role of the Divisia money variable in the interest rate reaction function. Therefore, they test two sets of over-identifying restrictions on the fourth row of $A_0$. First, they exclude the money aggregate from the Taylor rule by imposing the restriction $\alpha_{45} = 0$. This scheme is indicated as "Taylor Rule without Money" in Table 1. It corresponds to the standard monetary policy reaction function of Taylor (1993). Another specification considered by Belongia & Ireland (2015) is denoted as "Money-Interest Rate Rule" in Table 1. It assumes that the interest rate reacts contemporaneously only to the money aggregate. Such a rule was advocated by Leeper & Roush (2003) and also used in Leeper & Zha (2003) and Sims & Zha (2006).

Table 1: Competing Monetary Policy Models
Panels: Unrestricted; Recursive Scheme; Taylor Rule with Money; Taylor Rule without Money; Money-Interest Rate Rule.
Note: The vector of variables at time $t$ is $y_t = (p_t, gdp_t, cp_t, FF_t, m_t, uc_t)'$. The fourth row of each matrix specifies the monetary policy reaction function and identifies the fourth shock as the monetary policy shock.

The restrictions α = α = α … differences in the comparison of the models proposed in the present paper and the analysis conducted by Belongia & Ireland (2015). First of all, Belongia & Ireland did not allow for heteroskedasticity of the structural shocks. Consequently, they could only test the over-identifying specifications conditional on a set of just-identifying restrictions. Thus, their tests of the restrictions in the fourth row of $A_0$ are conditional on restrictions imposed in the other rows of $A_0$. By using the heteroskedasticity of the structural shocks, we can test not only the restrictions imposed by Belongia & Ireland but also the restrictions in the fourth row alone, leaving all other rows unrestricted, provided the fourth equation is identified through heteroskedasticity.

In order to investigate the importance of the money aggregate in the interest rate equation, we estimate the models mentioned above with heteroskedastic structural shocks. We estimate models with the full set of restrictions as presented in Table 1 and also test models in which all of the rows apart from the fourth row are left unrestricted. Finally, all of the models are confronted with a model solely identified by heteroskedasticity in which all the off-diagonal elements of matrix $A_0$ are estimated without any zero restrictions. Importantly, our approach allows us to statistically compare alternative monetary policy models that are not nested within one another.
For instance, our Bayes factors allow us to compare the recursive model to each of the remaining monetary policy models even though none of them is nested within the recursive one.

We fit VAR models of order $p = 4$, as in Belongia & Ireland (2015), to our full sample of quarterly data from 1967Q1 to 2013Q4 and also to a reduced sample from 1967Q1 to 2007Q4. Following Belongia & Ireland (2015), the shorter sample is considered because it excludes the financial crisis period, which could affect the structure of monetary policy in the US and, hence, might lead to distortions in our analysis.

Table 2: Estimated Relative Variances, $\hat{\omega}_2$, of Structural Shocks in Heteroskedastic Models
Columns: Unrestricted; Recursive; Taylor Rule with Money; Taylor Rule without Money; Money-Interest Rate.
Rows: $p_t$, $gdp_t$, $cp_t$, $FF_t$, $m_t$, $uc_t$, for the sample periods 1967Q1 - 2013Q4 and 1967Q1 - 2007Q4.
Note: … $\omega_2$, for the Markov-switching models with two states, $M = 2$.

We have fitted a two-state Markov process to capture possible changes in the volatility of the residuals. The estimated variances of the second regime relative to the variances in the first regime are shown in Table 2 for all the different identification schemes imposed on $A_0$. Thus, the quantities in the table are the estimated elements of $\omega_2$. They are all distinct from one, indicating that the second regime indeed has different variances than regime 1.

The marginal posterior regime probabilities of the second volatility state are depicted in Figure 2. They show that roughly the first and the last parts of the sample constitute the second volatility regime, while the middle part, from the first half of the 1980s to the beginning of the financial crisis, constitutes the first volatility regime. There is also a short period around the turn of the millennium which is assigned to the second volatility state in the longer sample. Since the relative variances of the second volatility state are greater than one, this regime is clearly a high-volatility state.

Figure 2: Marginal posterior probabilities of the second state for the best models with Markov-switching heteroskedasticity with 2 states. Panels: Sample from 1967Q1 to 2013Q4; Sample from 1967Q1 to 2007Q4.

With respect to the dating and the interpretation, the first state resembles the "Greenspan" state found by Sims & Zha (2006), while the second one closely resembles the "Burns and Volcker" state (see, e.g., Woźniak & Droumaguet, 2015). A similar classification of the volatility states is also obtained when the model is fitted only to the reduced sample ending in 2007Q4. Thus, the assignment of states is reasonably similar in both samples and is, hence, not driven entirely by the potentially higher macroeconomic volatility during the financial crisis.

In Table 2 the posterior standard deviations of the relative variances are also presented. Some of them are quite large. Therefore, one may wonder whether the two estimated volatility states are really clearly distinct. This question can be answered by our formal statistical tools. In Table 3 we use SDDRs to assess whether the relative variances are actually 1. Note that, for our model with only two volatility regimes ($M = 2$), the $i$th structural shock is heteroskedastic if $\omega_{2,i} \neq 1$. It turns out that all results for the longer sample show strong support for the relative variances being different from 1, which implies heteroskedasticity of all structural shocks. For the shorter sample, the evidence is still strong that at least some of the relative variances are not equal to 1, while some others may not be different from 1. Apart from the unrestricted model, in all other models the evidence is strong that at most one shock is homoskedastic and, hence, we may still have full identification through heteroskedasticity, as discussed in Section 2. In any case, the evidence for two distinct volatility states is very strong for both samples because, to confirm distinct covariance matrices in the two states, it is enough that one of the relative variances differs from 1. In Appendix D we provide details on the precision of the estimated quantities. All results are sufficiently precise to ensure the qualitative validity of the conclusions.

Table 3: Natural Logarithms of SDDRs for Assessing Heteroskedasticity

                                Unrestricted   Recursive   Taylor Rule   Taylor Rule     Money-Interest
                                                           with Money    without Money   Rate
Sample period 1967Q1 - 2013Q4
Hypothesis: $\omega_{2,i} = 1$
  p_t                               -16.75       -15.14       -15.95        -11.66          -10.72
  gdp_t                             -80.55       -36.14       -35.98        -26.93          -29.59
  cp_t                              -52.62       -27.78       -31.45        -27.2           -23.63
  FF_t                             -541.29      -393.51      -542.65       -406.74         -373.75
  m_t                               -40.84       -46.34       -27.77        -33.04          -30.77
  uc_t                              -12.04       -17.25       -28.34        -21.36          -19.65
Hypothesis: $\omega_{2,i} = 1$ for $i = 1, \dots, N$
                                  -1058.07      -849.44      -981.9        -773.1          -838.41
Sample period 1967Q1 - 2007Q4
Hypothesis: $\omega_{2,i} = 1$
  p_t                                -7.29        -5.17        -8.3          -5.88           -6.5
  gdp_t                             -44.09       -27.07       -31.50        -28.49          -31.07
  cp_t                               -9.86        -7.29        -7.89         -6.55           -6.67
  FF_t                             -406.27      -280.94      -369.25       -287.87         -341.29
  m_t                                -0.64        -0.62         0.72          1.08            0.88
  uc_t                               -4.06        -5.64        -8.15         -6.32           -8.61
Hypothesis: $\omega_{2,i} = 1$ for $i = 1, \dots, N$
                                   -582.38      -464.92      -518.97       -426.52         -495.53

Note: The table reports natural logarithms of SDDRs for the hypothesis of homoskedasticity of individual structural shocks, as well as the hypothesis of joint homoskedasticity, in the models with two volatility states, $M = 2$. Numbers in boldface denote SDDR values indicating very strong evidence against the hypothesis on a scale by Kass & Raftery (1995). The numerical standard errors for the SDDRs reported in this table are given in Appendix D.
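Reading Table 3 amounts to converting each ln SDDR into a grade of evidence. Because the SDDR is the Bayes factor in favour of the restriction, the evidence against it is measured by 2 ln B = -2 ln SDDR. The following minimal sketch assumes the standard Kass & Raftery (1995) cut-offs of 2, 6, and 10 on that scale; whether the boldface in the table follows exactly these cut-offs is an assumption, and the function name is hypothetical.

```python
def kass_raftery_against(log_sddr: float) -> str:
    """Grade the evidence against a restriction from its ln SDDR.

    2 ln B = -2 ln SDDR is compared with the Kass & Raftery (1995) cut-offs 2, 6, 10.
    """
    two_ln_b = -2.0 * log_sddr
    if two_ln_b <= 0.0:
        return "none (data favour the restriction)"
    if two_ln_b <= 2.0:
        return "not worth more than a bare mention"
    if two_ln_b <= 6.0:
        return "positive"
    if two_ln_b <= 10.0:
        return "strong"
    return "very strong"


# ln SDDRs for the Recursive scheme, sample 1967Q1 - 2007Q4 (Table 3).
recursive_short = {"p_t": -5.17, "gdp_t": -27.07, "cp_t": -7.29,
                   "FF_t": -280.94, "m_t": -0.62, "uc_t": -5.64}
for name, value in recursive_short.items():
    print(f"{name:6s} {value:8.2f}  {kass_raftery_against(value)}")
```

Applied to the shorter sample, this grading singles out $m_t$ as the only shock in the recursive model whose homoskedasticity cannot be rejected, which matches the discussion of "at most one homoskedastic shock" above.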
These results clearly indicate that there is time-varying volatility in the data that can be used for identification purposes. It is therefore of interest to know whether there is sufficient heteroskedasticity to ensure a fully identified model. As discussed in the earlier sections, in a model with two states, full identification requires that all of the relative variances, $\omega_{2,n}$, $n = 1, \dots, N$, are distinct. Again, we can use SDDRs to investigate this identification condition. In Table 4 the relevant SDDRs are given. For both samples they provide strong support for at least some distinct relative variances. In particular, the SDDRs strongly indicate that $\omega_{2,4}$ is different from all other $\omega_{2,j}$, because all SDDRs related to $\omega_{2,4}/\omega_{2,j} = 1$ …

Table 4: Natural Logarithms of SDDRs for Assessing Identification
Hypothesis: $\omega_{2,i}/\omega_{2,j} = 1$
Sample period 1967Q1 - 2013Q4 (entries for pairs i, j): -22.85 -0.45 1.022 0.42 -7.44 -13.22 -12.55 -24.35
Sample period 1967Q1 - 2007Q4 (entries for pairs i, j): -19.47 -7.14 -2.32 -3.373 -22.77 -18.87 -24.36
Note: … $M = 2$, and unrestricted matrix $A_0$. Numbers in boldface denote SDDR values indicating very strong evidence against the hypothesis on a scale by Kass & Raftery (1995). The numerical standard errors for the SDDRs reported in this table are given in Appendix D.

Given the results for the relative variances in our model, we can compare different identification schemes via their MDDs. For a range of models they are given in Table 5. Each row displays the MDDs for the five identification schemes listed in Table 1 for a different model setup. The model with the largest MDD in each row is highlighted in boldface.

Looking at the models identified through heteroskedasticity, models with Divisia money in the interest rate equation have the largest MDDs for the longer sample according to the results in Table 5. The same is also true for the shorter sample when restrictions are imposed only on the interest rate equation. In contrast, the largest MDD is obtained for the scheme labelled "Taylor Rule without Money" when restrictions are imposed on all of the rows of matrix $A_0$. Note, however, that in this case the three schemes "Taylor Rule with Money", "Taylor Rule without Money", and "Money-Interest Rate" have almost identical MDDs, such that the evidence in favor of a model without money in the interest rate equation is very weak at best.

The advantage of our setup is that we can also deal with models which are only partially identified in a conventional setting, as they are compared in the second row of the two panels in Table 5. Thus, using heteroskedasticity, we can compare models which impose restrictions only on the equation of direct interest. We do not have to condition on restrictions on the other rows of $A_0$, as in a conventional frequentist analysis. Clearly, if additional restrictions are imposed and the restrictions are then rejected in such a setup, it is unclear whether the restrictions of interest or the additional restrictions drive the rejection. In contrast, using heteroskedasticity, it is possible to explicitly impose the restrictions only on the interest rate equation. The other parameters are identified by heteroskedasticity. Admittedly, this argument relies on full identification through heteroskedasticity, which is not strongly supported for our data. However, identification of the interest rate equation is strongly supported, confirming that the differences in MDDs are not only driven by our prior but reflect data properties. The last claim is based on the fact that we assume hierarchical prior distributions for the parameters for which the level of shrinkage is estimated. Thereby we leave considerable room for the data to speak. Overall, our analysis supports the importance of Divisia money in the interest rate equation.

Table 5: Natural Logarithms of MDDs for Assessing Restrictions on $A_0$

                                      Unrestricted   Recursive   Taylor Rule   Taylor Rule     Money-Interest
                                                                 with Money    without Money   Rate
Sample period 1967Q1 - 2013Q4
  all restrictions                      -1700.2      -1657.6      -1632.5       -1643.6         -1644.6
  interest rate equation restricted     -1700.2*     -1695.9      -1687.8       -1695.3         -1688.1
Sample period 1967Q1 - 2007Q4
  all restrictions                      -1469.8      -1428.6      -1414.4       -1410.4         -1411.2
  interest rate equation restricted     -1469.8*     -1457.6      -1453.7       -1453.6         -1442.1

Note: The table reports natural logarithms of marginal data densities for particular models. Numbers in boldface denote the largest values of the MDDs in rows. * denotes values that are copied from the row above. The numerical standard errors for the logarithms of the MDDs reported in this table are given in Appendix D.
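Because the log MDDs in Table 5 are on a common scale, log Bayes factors between any two schemes are simple differences, and posterior model probabilities follow from normalizing exp(log MDD). The sketch below assumes equal prior model probabilities, which the text does not state, and the function names are hypothetical; it subtracts the largest log MDD before exponentiating, since values around -1400 would otherwise underflow.

```python
import math


def posterior_model_probabilities(log_mdd: dict) -> dict:
    """Posterior model probabilities from log MDDs, assuming equal prior weights."""
    top = max(log_mdd.values())
    weights = {name: math.exp(v - top) for name, v in log_mdd.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def log_bayes_factor(log_mdd: dict, model_1: str, model_2: str) -> float:
    """ln B_12 = ln p(Y | model 1) - ln p(Y | model 2)."""
    return log_mdd[model_1] - log_mdd[model_2]


# Table 5, sample 1967Q1 - 2007Q4, all restrictions imposed.
short_all = {
    "Unrestricted": -1469.8,
    "Recursive": -1428.6,
    "Taylor Rule with Money": -1414.4,
    "Taylor Rule without Money": -1410.4,
    "Money-Interest Rate": -1411.2,
}
probs = posterior_model_probabilities(short_all)
for name, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{name:26s} {prob:.4f}")
print(log_bayes_factor(short_all, "Taylor Rule without Money", "Money-Interest Rate"))
```

Under these assumed equal weights, the two schemes with money in the interest rate equation keep a combined probability of roughly 0.32 in this row, and the log Bayes factor of 0.8 between "Taylor Rule without Money" and "Money-Interest Rate" corresponds to 2 ln B = 1.6, which is below the Kass & Raftery threshold of 2. This is consistent with the statement that the evidence for a model without money is very weak at best.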
6. Conclusions
This study considers structural VAR models with heteroskedasticity where the changes in volatility are driven by a Markov process. A full Bayesian analysis framework is presented for such SVAR-MSH models. A set of parametric restrictions for unique identification of the structural parameters through heteroskedasticity in these models is derived, and Bayesian methods based on a Savage-Dickey density ratio are presented for investigating the restrictions for global identification. Moreover, a fast Markov chain Monte Carlo sampler is developed for the posterior distribution of the structural parameters, and a method for computing the marginal data density is provided, which facilitates full Bayesian model selection and model comparison.

SVAR models from a frequentist study by Belongia & Ireland (2015) are used to illustrate the Bayesian methods. Belongia & Ireland are interested in the role of a Divisia money aggregate in an interest rate reaction function. In the empirical illustration we compare our Bayesian methods to frequentist methods. It is shown that using heteroskedasticity for identification is beneficial and that this can be done in a Bayesian framework. In fact, our methods go beyond what is currently possible in a frequentist framework. In particular, in our Bayesian framework we can formally investigate conditions for the identification of specific equations and shocks, for which formal statistical tests are currently not available in a frequentist framework.

References
Amisano, G., & Giannini, C. (1997). Topics in Structural VAR Econometrics. Springer-Verlag, Berlin.
Barnett, W. A. (2012). Getting It Wrong: How Faulty Monetary Statistics Undermine the Fed, the Financial System, and the Economy. Cambridge, MA: MIT Press.
Bauwens, L., Richard, J., & Lubrano, M. (1999). Bayesian Inference in Dynamic Econometric Models. Oxford University Press, USA.
Belongia, M. T., & Ireland, P. N. (2015). Interest Rates and Money in the Measurement of Monetary Policy. Journal of Business & Economic Statistics, 255–269.
Bernanke, B. S., & Blinder, A. S. (1992). The Federal Funds Rate and the Channels of Monetary Transmission. American Economic Review, 901–921.
Canova, F., & Pérez Forero, F. J. (2015). Estimating Overidentified, Nonrecursive, Time-varying Coefficients Structural Vector Autoregressions. Quantitative Economics, 359–384.
Chen, W., & Netšunajev, A. (2017). Structural vector autoregression with time varying transition probabilities: Identification via heteroskedasticity. Working paper, Freie Universität Berlin.
Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models. Journal of Econometrics, 79–97.
Doan, T., Litterman, R. B., & Sims, C. A. (1983). Forecasting and Conditional Projection Using Realistic Prior Distributions. NBER Working Paper Series, 1–71.
Droumaguet, M., Warne, A., & Woźniak, T. (2017). Granger Causality and Regime Inference in Markov Switching VAR Models with Bayesian Methods. Journal of Applied Econometrics, 802–818.
Dubey, S. D. (1970). Compound gamma, beta and F distributions. Metrika, 27–31.
Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-Based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association, 398–409.
Giannone, D., Lenza, M., & Primiceri, G. E. (2015). Prior selection for vector autoregressions. Review of Economics and Statistics, 436–451.
Herwartz, H., & Lütkepohl, H. (2014). Structural vector autoregressions with Markov switching: Combining conventional with statistical identification of shocks. Journal of Econometrics, 104–116.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 773–795.
Kilian, L., & Lütkepohl, H. (2017). Structural Vector Autoregressive Analysis. Cambridge: Cambridge University Press.
Kulikov, D., & Netšunajev, A. (2013). Identifying Monetary Policy Shocks via Heteroskedasticity: A Bayesian Approach.
Kulikov, D., & Netšunajev, A. (2017). Stargazing with Structural VARs: Shock Identification via Independent Component Analysis.
Lanne, M., & Luoto, J. (2016). Data-Driven Inference on Sign Restrictions in Bayesian Structural Vector Autoregression.
Lanne, M., Lütkepohl, H., & Maciejowska, K. (2010). Structural vector autoregressions with Markov switching. Journal of Economic Dynamics and Control, 121–131.
Leeper, E. M., & Roush, J. E. (2003). Putting "M" Back in Monetary Policy. Journal of Money, Credit and Banking, 1217–1256.
Leeper, E. M., & Zha, T. (2003). Modest policy interventions. Journal of Monetary Economics, 1673–1700.
Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Berlin: Springer-Verlag.
Lütkepohl, H., & Netšunajev, A. (2014). Disentangling demand and supply shocks in the crude oil market: How to check sign restrictions in structural VARs. Journal of Applied Econometrics, 479–496.
Lütkepohl, H., & Netšunajev, A. (2017). Structural vector autoregressions with heteroskedasticity: A review of different volatility models. Econometrics and Statistics, 2–18.
Lütkepohl, H., & Velinov, A. (2014). Structural vector autoregressions: Checking identifying long-run restrictions via heteroskedasticity. Journal of Economic Surveys.
McDonald, J. B. (1984). Some Generalized Functions for the Size Distribution of Income. Econometrica, 647–663.
McDonald, J. B., & Xu, Y. J. (1995). A Generalization of the Beta Distribution with Applications. Journal of Econometrics, 133–152.
Meade, B., Lafayette, L., Sauter, G., & Tosello, D. (2017). Spartan HPC-Cloud Hybrid: Delivering Performance and Flexibility. University of Melbourne.
Netšunajev, A. (2013). Reaction to technology shocks in Markov switching structural VARs: Identification via heteroscedasticity. Journal of Macroeconomics, 51–62.
Pajor, A. (2016). Estimating the Marginal Likelihood Using the Arithmetic Mean Identity. Bayesian Analysis, 1–27. Forthcoming.
Perrakis, K., Ntzoufras, I., & Tsionas, E. G. (2014). On the use of marginal posteriors in marginal likelihood estimation via importance sampling. Computational Statistics and Data Analysis, 54–69.
Rigobon, R. (2003). Identification Through Heteroskedasticity. The Review of Economics and Statistics, 777–792.
Sims, C. A. (1986). Are Forecasting Models Usable for Policy Analysis? Quarterly Review, Federal Reserve Bank of Minneapolis, 2–16.
Sims, C. A. (1992). Interpreting the macroeconomic time series facts: The effects of monetary policy. European Economic Review, 975–1011.
Sims, C. A., Waggoner, D. F., & Zha, T. (2008). Methods for inference in large multiple-equation Markov-switching models. Journal of Econometrics, 255–274.
Sims, C. A., & Zha, T. (2006). Were There Regime Switches in U.S. Monetary Policy? American Economic Review, 54–81.
Stock, J. H., & Watson, M. W. (2016). Dynamic factor models, factor-augmented vector autoregressions, and structural vector autoregressions in macroeconomics. In J. B. Taylor, & H. Uhlig (Eds.), Handbook of Macroeconomics, Volume 2A (pp. 415–525). Elsevier.
Taylor, J. (1993). Discretion Versus Policy Rules in Practice. Carnegie-Rochester Conference Series on Public Policy, 195–214.
Velinov, A., & Chen, W. (2015). Do stock prices reflect their fundamentals? New evidence in the aftermath of the financial crisis. Journal of Economics and Business, 1–20.
Verdinelli, I., & Wasserman, L. (1995). Computing Bayes Factors Using a Generalization of the Savage-Dickey Density Ratio. Journal of the American Statistical Association, 614–618.
Woźniak, T., & Droumaguet, M. (2015). Assessing Monetary Policy Models: Bayesian Inference for Heteroskedastic Structural VARs. University of Melbourne Working Paper Series.

Appendix A. Proof of Theorem 1

We first show the following matrix result.
Lemma 1.
Given a sequence of positive definite $N \times N$ matrices $\Omega_m$, $m = 1, \dots, M$, $M \geq 2$, let $C$ be a nonsingular $N \times N$ matrix and $\Lambda_m = \mathrm{diag}(\lambda_{m,1}, \dots, \lambda_{m,N})$ be a sequence of $N \times N$ diagonal matrices such that

$$\Omega_m = C \Lambda_m C', \quad m = 1, \dots, M, \qquad \text{(A.1)}$$

where $\Lambda_1 = I_N$, the $N \times N$ identity matrix. Let $\lambda_k = (\lambda_{1,k}, \dots, \lambda_{M,k})'$ be an $M$-dimensional vector. Then the $k$th column of $C$ is unique up to sign if $\lambda_k \neq \lambda_i$ for all $i \in \{1, \dots, N\} \setminus \{k\}$.

Proof:
The proof uses ideas from Lanne et al. (2010). Let $C^*$ be a matrix that satisfies $\Omega_m = C^* \Lambda_m C^{*\prime}$, $m = 1, \dots, M$. It will be shown that, under the conditions of Lemma 1, the $k$th column of $C^*$ must be the same as that of $C$, except perhaps for a reversal of sign. Without loss of generality, it is assumed in the following that $k = 1$, because this simplifies the notation. In other words, it is shown that the first columns of $C$ and $C^*$ are the same, possibly except for a reversal of sign.

There exists a nonsingular $N \times N$ matrix $Q$ such that $C^* = CQ$. Using condition (A.1) for $m = 1$, $Q$ has to satisfy the relation $CC' = CQQ'C'$. Multiplying this relation from the left by $C^{-1}$ and from the right by $C'^{-1}$ implies that $QQ' = I_N$ and, hence, $Q$ is an orthogonal matrix. The relations $C \Lambda_m C' = CQ \Lambda_m Q'C'$, $m \in \{2, \dots, M\}$, imply $\Lambda_m = Q \Lambda_m Q'$ and, hence, $Q \Lambda_m = \Lambda_m Q$ for $m \in \{2, \dots, M\}$. Denoting the $ij$th element of $Q$ by $q_{ij}$, the $(k,1)$th element of the latter equation gives $q_{k1}\lambda_{m,1} = \lambda_{m,k} q_{k1}$ for all $m$, that is, $q_{k1}\lambda_1 = q_{k1}\lambda_k$, $k = 2, \dots, N$. Hence, since $\lambda_k$ is different from $\lambda_1$ for $k = 2, \dots, N$, we must have $q_{k1} = 0$ for $k = 2, \dots, N$. Since $Q$ is orthogonal, its first column has unit length and must then be $(1, 0, \dots, 0)'$ or $(-1, 0, \dots, 0)'$, which proves Lemma 1. Q.E.D.

Using Lemma 1, the proof of Theorem 1 is straightforward.

Proof of Theorem 1.
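Consider the setup of Lemma 1 with $C = A_0^{-1}\Sigma_1^{1/2}$ and $\Lambda_m = \mathrm{diag}(\omega_{m,1}, \dots, \omega_{m,N})$, where $\Sigma_1 = \mathrm{diag}(\sigma_{1,1}, \dots, \sigma_{1,N})$ collects the variances of the structural shocks in the first state. Then the arguments in the proof of Lemma 1 show that any other admissible matrix $C$ is of the form $C^* = CQ = A_0^{-1}\Sigma_1^{1/2}Q$, where $Q$ is as in the proof of Lemma 1. Hence, $C^{*-1} = Q'\Sigma_1^{-1/2}A_0$, which shows that $\Sigma_1^{-1/2}A_0$ and $Q'\Sigma_1^{-1/2}A_0$ have the same $k$th row, up to sign. Multiplying the $k$th row of $\Sigma_1^{-1/2}A_0$ by $\sqrt{\sigma_{1,k}}$ gives the desired result. Q.E.D.

Appendix B. Computational Details

Appendix B.1. Notation and Likelihood Function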
Let the $N \times T$ matrix $Y = [y_1, \dots, y_T]$ collect all the observations of the time series considered. Let $K = 1 + pN$ and define the $K \times 1$ vector $x_t = (1, y'_{t-1}, y'_{t-2}, \dots, y'_{t-p})'$. It collects all the variables on the right-hand side (RHS) of equation (1). Moreover, let $X = [x_1, \dots, x_T]$ be a $K \times T$ matrix, where the initial conditions $y_{1-p}, \dots, y_0$ are treated as given and set to the first $p$ observations of the available dataset. Similarly, collect the structural errors in the matrix $U = [u_1, \dots, u_T]$, and denote its $n$th row by $U_n$. Let $Y_m$, $X_m$, and $U_{n.m}$ denote the matrices corresponding to $Y$, $X$, and $U_n$ that collect only the state-specific columns for which $s_t = m$, $m \in \{1, \dots, M\}$. The column dimension of these matrices is denoted by $T_m$, and $\sum_{m=1}^{M} T_m = T$. The $1 \times T$ vector $S = (s_1, \dots, s_T)$ is the realization of the hidden Markov process for periods from 1 to $T$. Define an $N \times K$ matrix $A = [\mu, A_1, \dots, A_p]$ collecting the slope parameters and constant terms on the RHS of equation (1), and denote its $n$th row by $A_n$, which is a $1 \times K$ vector. For convenience, we also denote by $\theta$ the vector of all the parameters of the model.

Using the previously defined notation, equation (1) can be written in matrix notation as

$$A_0 Y = AX + U, \qquad \text{(B.1)}$$

and the $n$th row of (B.1) can be written as

$$A_{0.n} Y = A_n X + U_n, \qquad \text{(B.2)}$$

for $n = 1, \dots, N$.

Given the assumptions above and the conditional normality assumption in equation (2) for the structural errors of the SVAR-MSH model, the likelihood function is given by

$$p(Y \mid S, \theta) = (2\pi)^{-\frac{TN}{2}} \left|\det(A_0)\right|^{T} \prod_{n=1}^{N} \sigma_{1,n}^{-\frac{T}{2}} \prod_{m=1}^{M}\prod_{n=1}^{N} \omega_{m,n}^{-\frac{T_m}{2}} \times \exp\left\{-\frac{1}{2}\sum_{m=1}^{M}\sum_{n=1}^{N} \sigma_{1,n}^{-1}\omega_{m,n}^{-1} \left[A_{0.n}Y_m - A_nX_m\right]\left[A_{0.n}Y_m - A_nX_m\right]'\right\}, \qquad \text{(B.3)}$$

where $\omega_{1,n} = 1$ for $n \in \{1, \dots, N\}$.
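The likelihood (B.3) is a product of univariate normal densities of the structural residuals $A_{0.n}y_t - A_n x_t$ with state-dependent variances $\sigma_{1,n}\omega_{s_t,n}$, times the Jacobian term $|\det(A_0)|^T$. A minimal NumPy sketch follows; the state coding (0 for the first state), the array layouts, and the function names are assumptions made for illustration only.

```python
import numpy as np


def build_regressors(data: np.ndarray, p: int):
    """Return Y (N x T) and X (K x T), K = 1 + pN, from an N x (T + p) data array.

    The first p columns serve as initial conditions, and column t of X is
    x_t = (1, y_{t-1}', ..., y_{t-p}')'.
    """
    _, total = data.shape
    n_obs = total - p
    rows = [np.ones((1, n_obs))]
    for lag in range(1, p + 1):
        rows.append(data[:, p - lag:total - lag])
    return data[:, p:], np.vstack(rows)


def log_likelihood(A0, A, Y, X, states, sigma1, omega):
    """Log of the likelihood (B.3).

    states: length-T integer array with values 0, ..., M-1 (0 = first state);
    sigma1: length-N array of sigma_{1,n}; omega: M x N array whose first row is 1.
    """
    N, T = Y.shape
    U = A0 @ Y - A @ X                        # structural residuals, N x T
    var = sigma1[:, None] * omega[states].T   # sigma_{1,n} * omega_{s_t,n}, N x T
    _, logabsdet = np.linalg.slogdet(A0)      # log |det(A0)|
    return (-0.5 * T * N * np.log(2.0 * np.pi) + T * logabsdet
            - 0.5 * np.sum(np.log(var)) - 0.5 * np.sum(U ** 2 / var))
```

A useful sanity check is that this value coincides with the sum over $t$ of reduced-form Gaussian log densities of $y_t$ with mean $A_0^{-1}Ax_t$ and covariance $A_0^{-1}\mathrm{diag}(\sigma_{1,1}\omega_{s_t,1}, \dots, \sigma_{1,N}\omega_{s_t,N})A_0^{-1\prime}$.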
The likelihood function written in this form emphasizes the feature of SVAR models that the equations of the model can be analyzed one by one, leading to a convenient form of the full conditional posterior distributions used in the Gibbs sampler.

Appendix B.2. Gibbs Sampler

Sampling the variances of the structural shocks.
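For given $Y$, $S$, $A_n$, and $A_{0.n}$, each $\sigma^2_{1.n}$ is drawn independently, for $n \in \{1, \ldots, N\}$, from an $\mathcal{IG}2$ distribution:
$$\sigma^2_{1.n} \mid Y, S, A_n, A_{0.n}, \omega_{m.n} \sim \mathcal{IG}2\left(\underline{a}_{\sigma} + T,\ \underline{b}_{\sigma} + \sum_{m=1}^{M} \omega_{m.n}^{-1}\left(A_{0.n} Y_m - A_n X_m\right)\left(A_{0.n} Y_m - A_n X_m\right)'\right).$$
The $\omega_{m.n}$ are drawn independently, for $m \in \{2, \ldots, M\}$ and $n \in \{1, \ldots, N\}$, from the following $\mathcal{IG}2$ distributions:
$$\omega_{m.n} \mid Y, S, A_n, A_{0.n}, \sigma^2_{1.n} \sim \mathcal{IG}2\left(\underline{a}_{\omega} + T_m,\ \underline{b}_{\omega} + \sigma_{1.n}^{-2}\left(A_{0.n} Y_m - A_n X_m\right)\left(A_{0.n} Y_m - A_n X_m\right)'\right).$$

Sampling the structural matrix $A_0$.
To sample from the posterior of the unrestricted elements of $A_0$ collected in the vector $\alpha$ (see equation (6)), rewrite the SVAR model from equation (1) as $\tilde{y}_t = \tilde{x}_t \alpha + u_t$, where $\tilde{y}_t = \left(y_t' \otimes I_N\right) q - A x_t$ and $\tilde{x}_t = -\left(y_t' \otimes I_N\right) Q$. Then the likelihood function takes the following form:
$$p(Y \mid S, \theta) = (2\pi)^{-\frac{TN}{2}} \prod_{t=1}^{T} \prod_{n=1}^{N} \sigma_{s_t.n}^{-1} \left|\det(A_0)\right|^{T} \exp\left\{-\frac{1}{2} \sum_{t=1}^{T} \left[\tilde{y}_t - \tilde{x}_t \alpha\right]' \operatorname{diag}\left(\sigma^2_{s_t}\right)^{-1} \left[\tilde{y}_t - \tilde{x}_t \alpha\right]\right\}, \tag{B.4}$$
where $\sigma^2_{s_t} = \left(\sigma^2_{s_t.1}, \ldots, \sigma^2_{s_t.N}\right)'$ and $\sigma^2_{m.n} = \sigma^2_{1.n}\,\omega_{m.n}$.
This likelihood function resembles a multivariate normal density function for $\alpha$, apart from the term $\left|\det(A_0)\right|^{T}$. This observation motivates the choice of the candidate-generating density in the following Metropolis-Hastings algorithm. At the $s$th iteration, draw a candidate value, denoted by $\bar{\alpha}$, from a multivariate $t$ distribution centered at the previous state of the Markov chain, $\alpha^{(s-1)}$, with scale matrix $\Sigma^{*}$ and degrees-of-freedom parameter $\nu$, where $\Sigma^{*} = \left(\sum_{t=1}^{T} \tilde{x}_t' \operatorname{diag}\left(\sigma^2_{s_t}\right)^{-1} \tilde{x}_t\right)^{-1}$. If $\alpha$ followed a multivariate normal distribution resembling the likelihood function from equation (B.4), then $\Sigma^{*}$ would be its covariance matrix. Then, compute $\rho = p(\bar{\alpha} \mid Y) / p\left(\alpha^{(s-1)} \mid Y\right)$, where $p(x \mid Y)$ is proportional to the product of the likelihood function and the prior distribution evaluated at $x$, i.e., $p(Y \mid S, x)\, p(x)$. The normalizing constant cancels in this ratio, and no proposal-density correction is needed because the $t$ proposal is symmetric around the current state. Finally, draw $u$ from a uniform distribution on the interval $(0, 1)$ and set $\alpha^{(s)} = \bar{\alpha}$ if $u < \rho$ and $\alpha^{(s)} = \alpha^{(s-1)}$ otherwise. This Metropolis-Hastings algorithm is adapted to the structural VAR identified through heteroskedasticity and, in that respect, generalizes the algorithm of Canova & Pérez Forero (2015) while maintaining its overall functionality.

Sampling the autoregressive parameters.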
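The multivariate normal draw of $A_n$ used in this step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the data enter observation by observation, with hypothetical names `xs` for the columns $x_t$, `r` for the scalars $A_{0.n}y_t$, and `w` for the weights $1/(\sigma^2_{1.n}\omega_{s_t.n})$, while `prior_mean` stands for $A_{0.n}\tilde{P}$ and `prior_var` for the diagonal of $\tilde{H}$. The function forms the posterior precision $\bar{H}_n^{-1}$, solves for the posterior mean $A_{0.n}\bar{P}_n$ through a Cholesky factorization, and adds noise with covariance $\bar{H}_n$.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L' = S for a symmetric positive-definite S."""
    k = len(S)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][r] * L[j][r] for r in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i])
    return x


def backward_solve_t(L, b):
    """Solve L' x = b for lower-triangular L."""
    k = len(b)
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        x[i] = (b[i] - sum(L[j][i] * x[j] for j in range(i + 1, k))) / L[i][i]
    return x


def draw_row(xs, r, w, prior_mean, prior_var, rng):
    """Draw A_n from N(A_{0.n} Pbar_n, Hbar_n); returns (draw, posterior mean)."""
    K = len(prior_mean)
    # prior part: Htilde^{-1} and Htilde^{-1} (A_{0.n} Ptilde)'
    prec = [[(1.0 / prior_var[i] if i == j else 0.0) for j in range(K)] for i in range(K)]
    rhs = [prior_mean[i] / prior_var[i] for i in range(K)]
    for x, rt, wt in zip(xs, r, w):  # weighted data contributions
        for i in range(K):
            rhs[i] += wt * rt * x[i]
            for j in range(K):
                prec[i][j] += wt * x[i] * x[j]
    L = cholesky(prec)  # prec = Hbar_n^{-1} = L L'
    mean = backward_solve_t(L, forward_solve(L, rhs))
    z = [rng.gauss(0.0, 1.0) for _ in range(K)]
    noise = backward_solve_t(L, z)  # L'^{-1} z has covariance (L L')^{-1} = Hbar_n
    return [m + e for m, e in zip(mean, noise)], mean
```

Solving against the Cholesky factor of the precision matrix avoids forming $\bar{H}_n$ explicitly, which is the usual numerically stable choice for this kind of draw.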
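The convenient form of the prior distribution and the likelihood function allows for sampling the constant terms and the autoregressive parameters independently, equation by equation, from a multivariate normal distribution:
$$A_n \mid Y, S, A_{0.n}, \sigma^2_{1.n}, \omega_{m.n} \sim \mathcal{N}_K\left(A_{0.n}\bar{P}_n,\ \bar{H}_n\right)$$
for $n = 1, \ldots, N$, where
$$\bar{H}_n = \left[\sigma_{1.n}^{-2} X_1 X_1' + \sigma_{1.n}^{-2} \sum_{m=2}^{M} X_m X_m' / \omega_{m.n} + \tilde{H}^{-1}\right]^{-1}$$
and
$$\bar{P}_n = \left[\sigma_{1.n}^{-2} Y_1 X_1' + \sigma_{1.n}^{-2} \sum_{m=2}^{M} Y_m X_m' / \omega_{m.n} + \tilde{P}\tilde{H}^{-1}\right]\bar{H}_n.$$
Here $\tilde{H}$ is a diagonal matrix whose first diagonal element equals $\gamma_{\mu}$ and whose remaining diagonal elements equal the diagonal of $\gamma_A \underline{H}$, and $\tilde{P} = \left[0_{N \times 1}, \underline{P}\right]$.

Sampling the shrinkage parameters.
The shrinkage parameters $\gamma_{\alpha}$, $\gamma_{\mu}$, and $\gamma_A$ are sampled independently from the following $\mathcal{IG}2$ distributions:
$$\gamma_{\alpha} \mid Y, \alpha \sim \mathcal{IG}2\left(\underline{a}_{\gamma} + r,\ \underline{b}_{\gamma} + \alpha'\alpha\right),$$
$$\gamma_{\mu} \mid Y, \mu \sim \mathcal{IG}2\left(\underline{a}_{\gamma} + N,\ \underline{b}_{\gamma} + \mu'\mu\right),$$
$$\gamma_{A} \mid Y, A, A_0 \sim \mathcal{IG}2\left(\underline{a}_{\gamma} + pN^2,\ \underline{b}_{\gamma} + \sum_{n=1}^{N} \left[A_n^{\mathrm{AR}} - A_{0.n}\underline{P}\right] \underline{H}^{-1} \left[A_n^{\mathrm{AR}} - A_{0.n}\underline{P}\right]'\right),$$
where $A_n^{\mathrm{AR}}$ denotes the $1 \times pN$ vector of autoregressive parameters in $A_n$, that is, $A_n$ without its first (constant-term) element.

Simulating the hidden Markov process.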
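The state-simulation step described next relies on forward filtering and backward sampling. The Python sketch below illustrates this multi-move sampler in the spirit of Chib (1996); it is an illustration, not the implementation of Frühwirth-Schnatter (2006). It assumes regimes coded $0, \ldots, M-1$, `P[i][j]` $= \Pr(s_t = j \mid s_{t-1} = i)$, and `loglik[t][m]` equal to the log density of $y_t$ given $s_t = m$ and the remaining parameters; all names are hypothetical. The argument `init` is the distribution of $s_1$; when it is set to the ergodic probabilities, as in the stationary specification used here, this coincides with propagating $p(s_0 \mid P)$ one period forward.

```python
import math
import random


def draw_discrete(weights, rng):
    """Draw an index with probability proportional to nonnegative weights."""
    u = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def ffbs(loglik, P, init, rng):
    """Forward filtering, backward sampling of s_1, ..., s_T.

    Returns the sampled states and the filtered probabilities Pr(s_t = m | y_1, ..., y_t).
    """
    T, M = len(loglik), len(P)
    filtered = []
    pred = list(init)  # Pr(s_t = m | y_1, ..., y_{t-1}); equals init at t = 1
    for t in range(T):
        c = max(loglik[t])  # subtract the maximum to avoid underflow
        w = [pred[m] * math.exp(loglik[t][m] - c) for m in range(M)]
        total = sum(w)
        f = [wm / total for wm in w]
        filtered.append(f)
        pred = [sum(f[i] * P[i][j] for i in range(M)) for j in range(M)]
    states = [0] * T
    states[T - 1] = draw_discrete(filtered[T - 1], rng)
    for t in range(T - 2, -1, -1):
        nxt = states[t + 1]
        # Pr(s_t = m | s_{t+1}, y_1, ..., y_t) is proportional to filtered_t(m) * P[m][s_{t+1}]
        states[t] = draw_discrete([filtered[t][m] * P[m][nxt] for m in range(M)], rng)
    return states, filtered
```

Drawing the whole path jointly in this way, rather than one state at a time, typically improves the mixing of the Gibbs sampler.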
To estimate the states of the hidden Markov process, we apply the algorithms presented in Section 11.2 of Frühwirth-Schnatter (2006), which are based on the smoothing procedure of Chib (1996). We estimate a stationary hidden Markov process for the Markov-switching mechanism and, thus, set the distribution $p(s_0 \mid P)$ to the ergodic probabilities (see Frühwirth-Schnatter, 2006, Section 11.2).

Sampling the transition probabilities matrix.
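The Dirichlet update in this step, together with the ergodic probabilities that enter $p(s_0 \mid P)$, can be sketched as follows. The sketch assumes regimes coded $0, \ldots, M-1$ and uses hypothetical function names. The function `accept_prob` shows one standard way of implementing the Metropolis-Hastings step referred to in this subsection, namely an independence proposal from the Dirichlet full conditional that ignores $p(s_0 \mid P)$; it is given as an illustration rather than as the exact scheme of Frühwirth-Schnatter (2006, Section 11.5.5). The power iteration assumes an ergodic chain.

```python
import random


def transition_counts(states, M):
    """N_ij(S): number of transitions from regime i to regime j in S."""
    counts = [[0] * M for _ in range(M)]
    for prev, cur in zip(states[:-1], states[1:]):
        counts[prev][cur] += 1
    return counts


def draw_transition_matrix(states, prior, rng):
    """Draw each row P_m from Dirichlet(e_m1 + N_m1(S), ..., e_mM + N_mM(S))."""
    M = len(prior)
    counts = transition_counts(states, M)
    P = []
    for m in range(M):
        # a Dirichlet vector is a vector of independent Gamma(alpha_j, 1) draws, normalized
        g = [rng.gammavariate(prior[m][j] + counts[m][j], 1.0) for j in range(M)]
        total = sum(g)
        P.append([gj / total for gj in g])
    return P


def ergodic_probabilities(P, max_iter=100_000, tol=1e-13):
    """Stationary distribution solving pi = pi P, by power iteration."""
    M = len(P)
    pi = [1.0 / M] * M
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(M)) for j in range(M)]
        if max(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def accept_prob(P_new, P_old, s0):
    """MH acceptance probability when the proposal ignores p(s_0 | P)."""
    return min(1.0, ergodic_probabilities(P_new)[s0] / ergodic_probabilities(P_old)[s0])
```

With this proposal the Dirichlet kernels cancel in the Metropolis-Hastings ratio, so only the ratio of ergodic probabilities of the initial state remains.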
The rows $P_{m\cdot}$ of the matrix of transition probabilities are sampled independently from $M$-dimensional Dirichlet distributions given $S$:
$$P_{m\cdot} \mid S \sim \mathcal{D}_M\left(e_{m1} + N_{m1}(S), \ldots, e_{mM} + N_{mM}(S)\right)$$
for $m \in \{1, \ldots, M\}$. The parameters of the prior Dirichlet distributions are updated by the counts of the transitions from the $i$th to the $j$th state given $S$, denoted by $N_{ij}(S)$.

Estimation of the stationary Markov chain for the Markov-switching model requires a Metropolis-Hastings step because $p(s_0 \mid P)$ is set to a vector of ergodic probabilities, which depends on $P$. For more details, the reader is referred to Section 11.5.5 of Frühwirth-Schnatter (2006) or, for the case of a restricted matrix $P$, to Droumaguet, Warne & Woźniak (2017). Woźniak & Droumaguet (2015) use a restricted matrix of transition probabilities to model different patterns of heteroskedasticity of the structural shocks.

Appendix B.3. Estimation of Marginal Data Densities
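The corrected arithmetic mean estimator (B.6) described in this subsection can be sketched in a few lines. The Python code below is a minimal illustration, not the authors' implementation: it works on the log scale with a log-sum-exp step, sets $c_O$ to the minimum likelihood value over posterior draws, and, for a self-contained check, applies the estimator to a hypothetical one-parameter normal model whose MDD is known in closed form. The untruncated normal importance density in that check is an assumption that suits the toy model, whose parameter space is the whole real line.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def pajor_log_mdd(loglik, logprior, logimp, log_c_O):
    """Log of (B.6): average of p(Y|theta) p(theta) I_O(theta) / s(theta) over J draws from s."""
    terms = [l + p - q for l, p, q in zip(loglik, logprior, logimp) if l >= log_c_O]
    if not terms:
        return float("-inf")
    c = max(terms)  # log-sum-exp for numerical stability
    return c + math.log(sum(math.exp(v - c) for v in terms)) - math.log(len(loglik))


def toy_example(y=0.5, n_post=5000, n_imp=20000, seed=1):
    """Hypothetical check: theta ~ N(0, 1), y | theta ~ N(theta, 1), so p(y) = N(y; 0, 2)."""
    rng = random.Random(seed)
    post = [rng.gauss(y / 2.0, math.sqrt(0.5)) for _ in range(n_post)]  # exact posterior draws
    log_c_O = min(log_normal_pdf(y, th, 1.0) for th in post)  # so that Pr_hat[O | Y] = 1
    m = sum(post) / n_post
    v = sum((th - m) ** 2 for th in post) / (n_post - 1)
    draws = [rng.gauss(m, math.sqrt(v)) for _ in range(n_imp)]  # importance draws from s(.)
    est = pajor_log_mdd(
        [log_normal_pdf(y, th, 1.0) for th in draws],
        [log_normal_pdf(th, 0.0, 1.0) for th in draws],
        [log_normal_pdf(th, m, v) for th in draws],
        log_c_O,
    )
    return est, log_normal_pdf(y, 0.0, 2.0)
```

Calling `toy_example()` returns the estimated and the exact log MDD of the toy model, which should agree closely because the importance density is nearly proportional to the integrand on $O$.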
To compute the posterior probabilities of alternative SVAR-MSH models, we estimate the MDD for a particular model, $\mathcal{M}$, defined as
$$p(Y \mid \mathcal{M}) = \int_{\Theta} p(Y \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{M})\, d\theta,$$
where $\Theta$ denotes the parameter space of the parameter vector $\theta$, while $p(Y \mid \theta, \mathcal{M})$ and $p(\theta \mid \mathcal{M})$ denote, respectively, the likelihood function and the prior density for model $\mathcal{M}$ (below, the conditioning on $\mathcal{M}$ is suppressed and used only in the context of model comparison).

We apply the simple corrected arithmetic mean estimator proposed by Pajor (2016), which is based on the identity
$$p(Y) = \frac{E_{\theta}\left[p(Y \mid \theta)\, \mathcal{I}_O(\theta)\right]}{\Pr[O \mid Y]} = \frac{\Pr[O]}{\Pr[O \mid Y]}\, E_{\theta}\left[p(Y \mid \theta) \mid O\right], \tag{B.5}$$
indexed by the subset $O \subseteq \Theta$, where $E_{\theta}[\cdot]$ denotes the expected value with respect to the prior distribution of $\theta$, $E_{\theta}[\cdot \mid O]$ denotes the conditional expected value given $O$, $\Pr[O]$ and $\Pr[O \mid Y]$ denote the prior and posterior probabilities, respectively, of the set $O$, and $\mathcal{I}_O(\theta)$ denotes an indicator function that takes the value one if $\theta \in O$ and zero otherwise.

Pajor (2016) shows that a consistent and unbiased estimator of the MDD in equation (B.5) is given by
$$\hat{p}(Y) = \frac{1}{J} \sum_{j=1}^{J} \frac{p\left(Y \mid \theta^{(j)}\right) p\left(\theta^{(j)}\right) \mathcal{I}_O\left(\theta^{(j)}\right)}{s\left(\theta^{(j)}\right)}, \tag{B.6}$$
where $\left\{\theta^{(j)}\right\}_{j=1}^{J}$ denotes a sample drawn from the importance density $s(\cdot)$. In the estimator above, $\widehat{\Pr}[O \mid Y] = 1$, because the set $O$ is defined as $\left\{\theta^{*} : p(Y \mid \theta^{*}) \geq c_O\right\}$, where $c_O$ is the minimum value of the likelihood function evaluated at the draws from the posterior distribution, as recommended by Pajor (2016). Moreover, following Pajor (2016), the importance density is set to a multivariate truncated normal density with the mean and covariance set to the posterior mean and posterior covariance of the parameters, respectively. The truncation is active only to ensure that $\theta^{(j)} \in \Theta$.

Appendix C. Definition and Moments of the Inverse Gamma 1 Ratio Distribution
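The density (C.2) and the moment formula (C.5) stated in this appendix can be checked by simulation. The Python sketch below is an illustration with hypothetical function names: it draws $x \sim \mathcal{IG}1(a, b)$ through the relation $b/x^2 \sim \chi^2_a$, which follows from density (C.1), evaluates the beta functions through log-gamma functions to avoid overflow, and compares a simulated mean of $z = x/y$ with (C.3), the case $k = 1$ of (C.5).

```python
import math
import random


def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def ig1r_pdf(z, a1, a2, b1, b2):
    """Density (C.2) of z = x / y, evaluated on the log scale for stability (z > 0)."""
    logf = (math.log(2.0) - log_beta(a1 / 2.0, a2 / 2.0)
            + 0.5 * a1 * math.log(b1) + 0.5 * a2 * math.log(b2)
            + (a2 - 1.0) * math.log(z) - 0.5 * (a1 + a2) * math.log(b1 + b2 * z * z))
    return math.exp(logf)


def ig1r_moment(k, a1, a2, b1, b2):
    """k-th non-central moment (C.5); exists only for a1 > k."""
    if not a1 > k:
        raise ValueError("the moment exists only for a1 > k")
    return (b1 / b2) ** (k / 2.0) * math.exp(
        log_beta((a1 - k) / 2.0, (a2 + k) / 2.0) - log_beta(a1 / 2.0, a2 / 2.0))


def draw_ig1(a, b, rng):
    """x ~ IG1(a, b) with density (C.1): b / x^2 is chi-square with a degrees of freedom."""
    return math.sqrt(b / rng.gammavariate(a / 2.0, 2.0))


def simulated_mean(a1, a2, b1, b2, n=100_000, seed=7):
    """Monte Carlo estimate of E[z] for z = x / y with independent IG1 draws."""
    rng = random.Random(seed)
    return sum(draw_ig1(a1, b1, rng) / draw_ig1(a2, b2, rng) for _ in range(n)) / n
```

For $k = 2$, (C.5) reduces to $E[z^2] = \frac{b_1}{b_2}\frac{a_2}{a_1 - 2}$, the first term of the variance formula (C.4), which provides an exact check of `ig1r_moment`.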
This appendix specifies the inverse gamma 1 ratio distribution for a random variable that is defined as the ratio of two independent inverse gamma 1-distributed random variables. The probability density function and the moments of the distribution are established. These results may facilitate the computations if one prefers to parametrize the model in terms of the conditional standard deviations instead of the variance parameters $\sigma^2_{1.n}$ and $\omega_{m.n}$ used in Section 2.

Definition 2 (Inverse Gamma 1 Ratio distribution)
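Let $x$ and $y$ be two strictly positive independent random variables distributed according to the following $\mathcal{IG}1$ distributions: $x \sim \mathcal{IG}1(a_1, b_1)$ and $y \sim \mathcal{IG}1(a_2, b_2)$, where $a_1$, $a_2$, $b_1$, and $b_2$ are positive real numbers and the probability density function of the inverse gamma 1 distribution is given by
$$f_{\mathcal{IG}1}(x; a, b) = \frac{2}{\Gamma\left(\frac{a}{2}\right)} \left(\frac{b}{2}\right)^{\frac{a}{2}} x^{-(a+1)} \exp\left(-\frac{b}{2x^2}\right). \tag{C.1}$$
Then, the random variable $z$, defined as $z = x/y$, follows the Inverse Gamma 1 Ratio ($\mathcal{IG}1R$) distribution with the probability density function given by
$$f_{\mathcal{IG}1R}(z; a_1, a_2, b_1, b_2) = \frac{2}{B\left(\frac{a_1}{2}, \frac{a_2}{2}\right)}\, b_1^{\frac{a_1}{2}}\, b_2^{\frac{a_2}{2}}\, \frac{z^{a_2 - 1}}{\left(b_1 + b_2 z^2\right)^{\frac{a_1 + a_2}{2}}}, \tag{C.2}$$
where $B(\cdot, \cdot)$ denotes the beta function.

Moments of the $\mathcal{IG}1R$ distribution. The expected value and the variance of the $\mathcal{IG}1R$-distributed random variable $z$ are, respectively, given by
$$E[z] = \left(\frac{b_1}{b_2}\right)^{\frac{1}{2}} \frac{B\left(\frac{a_1 - 1}{2}, \frac{a_2 + 1}{2}\right)}{B\left(\frac{a_1}{2}, \frac{a_2}{2}\right)} \quad \text{for } a_1 > 1, \tag{C.3}$$
$$\operatorname{Var}[z] = \frac{b_1}{b_2}\left[\frac{a_2}{a_1 - 2} - \left(\frac{B\left(\frac{a_1 - 1}{2}, \frac{a_2 + 1}{2}\right)}{B\left(\frac{a_1}{2}, \frac{a_2}{2}\right)}\right)^{2}\right] \quad \text{for } a_1 > 2. \tag{C.4}$$
In general, the $k$th order non-central moment of $z$ is given by
$$E\left[z^k\right] = \left(\frac{b_1}{b_2}\right)^{\frac{k}{2}} \frac{B\left(\frac{a_1 - k}{2}, \frac{a_2 + k}{2}\right)}{B\left(\frac{a_1}{2}, \frac{a_2}{2}\right)} \quad \text{for } a_1 > k. \tag{C.5}$$

Appendix D. Numerical Standard Errors for MDDs and SDDRs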
In Tables D.6 and D.7, we report the numerical standard errors (NSEs) for the logarithms of the SDDRs used to assess the homoskedasticity and identification conditions reported in Tables 3 and 4, respectively. All of the NSEs are small, which shows that our assessment measures are numerically stable. The NSEs increase monotonically with the absolute value of the logarithm of the corresponding SDDRs. Nevertheless, the NSEs are negligible relative to the logarithms of the SDDRs and do not affect the conclusions.

In Table D.8, we report the NSEs for the logarithms of the MDDs for the models reported in Table 5. These NSEs are larger than the NSEs for the logarithms of the SDDRs discussed above. However, they are smaller than the NSEs of the MDD estimator proposed for heteroskedastic SVARs by Woźniak & Droumaguet (2015), which were computed for similar simulation settings but for a larger model. The NSEs also make it possible to assess whether the difference between the MDDs for two models is significant. The logarithm of the MDD for the Taylor Rule without Money model is significantly different from the corresponding value for the Money-Interest Rate model when all restrictions are imposed and the sample ends in 2007. Still, the implied posterior probability of the former model is just over two times as large as the posterior probability of the latter one.

Table D.6: NSEs for Savage-Dickey Density Ratios for Assessing Heteroskedasticity
Models (columns): Unrestricted, Recursive, Taylor Rule with Money, Taylor Rule without Money, Money-Interest Rate. Samples: from 1967Q1 to 2013Q4 and from 1967Q1 to 2007Q4. Hypotheses (rows): homoskedasticity of the structural shocks labeled $p_t$, $gdp_t$, $cp_t$, $FF_t$, $m_t$, and $uc_t$, one at a time, and joint homoskedasticity of all shocks $i = 1, \ldots, N$.

Table D.7: NSEs of Savage-Dickey Density Ratios for Assessing Identification
Samples: from 1967Q1 to 2013Q4 and from 1967Q1 to 2007Q4; entries indexed by pairs of shocks $(i, j)$.

Table D.8: NSEs for Marginal Data Densities
                                     Unrestricted  Recursive  Taylor Rule  Taylor Rule    Money-Interest
                                                              with Money   without Money  Rate
Sample from 1967Q1 to 2013Q4
  all restrictions                   0.138         0.142      0.145        0.138          0.141
  interest rate equation restricted                0.135      0.137        0.136          0.145
Sample from 1967Q1 to 2007Q4
  all restrictions                   0.148         0.138      0.135        0.143          0.140
  interest rate equation restricted                0.142      0.134        0.132          0.134

Note: This table reports the NSEs of the estimates of the logarithms of the MDDs reported in Table 5. The NSEs are computed with the batch means method described in Perrakis, Ntzoufras & Tsionas (2014) based on 1000 batch means.
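The batch means calculation mentioned in the note to Table D.8 can be sketched as follows. This is a simplified illustration rather than a reproduction of the procedure in Perrakis, Ntzoufras & Tsionas (2014): the sequence of terms averaged by an estimator such as (B.6) is split into contiguous batches, the NSE of the average is the standard deviation of the batch means divided by the square root of the number of batches, and a delta-method step, assumed here, converts it into an NSE for the logarithm of the estimate. The function names are hypothetical.

```python
import math


def batch_means_nse(values, n_batches):
    """NSE of the mean of `values` from the dispersion of contiguous batch means."""
    size = len(values) // n_batches
    if size < 1 or n_batches < 2:
        raise ValueError("need at least two batches with at least one value each")
    means = [sum(values[b * size:(b + 1) * size]) / size for b in range(n_batches)]
    grand = sum(means) / n_batches
    var_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_means / n_batches)


def log_estimate_nse(log_terms, n_batches):
    """NSE of log(mean(exp(log_terms))) by the delta method: NSE(mean) / mean.

    The terms are rescaled by exp(-max) first; the ratio NSE / mean is unaffected.
    """
    c = max(log_terms)
    w = [math.exp(v - c) for v in log_terms]
    size = len(w) // n_batches
    used = w[: size * n_batches]  # drop a trailing remainder that does not fill a batch
    return batch_means_nse(used, n_batches) / (sum(used) / len(used))
```

Rescaling by the largest log term matters in practice because the individual likelihood values of a macroeconomic SVAR are far too small to be represented directly in floating point.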