Identification and Inference Under Narrative Restrictions
Raffaella Giacomini †, Toru Kitagawa ‡ and Matthew Read §
February 15, 2021
Abstract
We consider structural vector autoregressions subject to ‘narrative restrictions’, which are inequality restrictions on functions of the structural shocks in specific periods. These restrictions raise novel problems related to identification and inference, and there is currently no frequentist procedure for conducting inference in these models. We propose a solution that is valid from both Bayesian and frequentist perspectives by: 1) formalizing the identification problem under narrative restrictions; 2) correcting a feature of the existing (single-prior) Bayesian approach that can distort inference; 3) proposing a robust (multiple-prior) Bayesian approach that is useful for assessing and eliminating the posterior sensitivity that arises in these models due to the likelihood having flat regions; and 4) showing that the robust Bayesian approach has asymptotic frequentist validity. We illustrate our methods by estimating the effects of US monetary policy under a variety of narrative restrictions.
JEL classification:
C32, E52
Keywords:
Frequentist coverage, global identification, identified set, multiple priors

∗ We thank Isaiah Andrews, Sophocles Mavroeidis, José Luis Montiel-Olea, Mikkel Plagborg-Møller, Morten Ravn, Christian Wolf and seminar participants at several venues for helpful comments. We gratefully acknowledge financial support from ERC grants (numbers 536284 and 715940) and the ESRC Centre for Microdata Methods and Practice (CeMMAP) (grant number RES-589-28-0001).
† University College London, Department of Economics/Cemmap. Email: [email protected]
‡ University College London, Department of Economics/Cemmap. Email: [email protected]
§ University College London, Department of Economics. Email: [email protected]

Introduction
Estimating the dynamic causal effects of structural shocks is a key challenge in macroeconomics. A common approach to this problem is to use a structural vector autoregression (SVAR) with sign or zero restrictions on the model’s structural parameters. Recently, a number of papers have augmented these restrictions with restrictions that involve the values of the structural shocks in specific periods. For example, Antolín-Díaz and Rubio-Ramírez (2018) (AR18) propose restricting the signs of structural shocks and their contributions to the change in particular variables in certain historical episodes. Ludvigson, Ma and Ng (2018) independently propose restricting the sign or magnitude of the structural shocks in specific periods. A burgeoning empirical literature has adopted similar restrictions, including Ben Zeev (2018), Furlanetto and Robstad (2019), Cheng and Yang (2020), Inoue and Kilian (2020), Kilian and Zhou (2020a, 2020b), Laumer (2020), Redl (2020), Zhou (2020) and Ludvigson, Ma and Ng (2020). The fact that these restrictions are placed on the shocks rather than the parameters raises novel problems related to identification, estimation and inference. This paper clarifies the nature of these problems and proposes a solution that is valid from both Bayesian and frequentist perspectives.

Henceforth, we refer to any restrictions that can be written as inequalities involving structural shocks in particular periods as ‘narrative restrictions’ (NR). One example of NR is a ‘shock-sign restriction’, such as the restriction in AR18 that the US economy was hit by a positive monetary policy shock in October 1979. This is when the Federal Reserve markedly increased the federal funds rate following Paul Volcker becoming chairman, and is widely considered an example of a positive monetary policy shock (e.g., Romer and Romer (1989)).
AR18 also consider ‘historical-decomposition restrictions’, such as the restriction that the change in the federal funds rate in October 1979 was overwhelmingly due to a monetary policy shock. This is an inequality restriction that simultaneously constrains the historical decomposition of the federal funds rate with respect to all structural shocks in the SVAR. Other restrictions on the structural shocks also fit into this framework. For example, we additionally consider ‘shock-rank restrictions’, such as the restriction that the monetary policy shock in October 1979 was the largest positive realization of this shock in the sample period.

From a frequentist perspective, NR are fundamentally different from traditional identifying restrictions, such as sign restrictions on impulse responses (e.g., Uhlig (2005)). Under normally distributed structural shocks, traditional sign restrictions induce set-identification, because they generate a set-valued mapping from the SVAR’s reduced-form parameters to its structural parameters that represents observational equivalence (i.e., an identified set). This set-valued mapping corresponds to the flat region of the structural-parameter likelihood and, by the definition of observational equivalence (e.g., Rothenberg (1971)), does not depend on the realization of the data. NR also result in the structural-parameter likelihood possessing flat regions and hence generate a set-valued mapping from the reduced-form parameters to the structural parameters. Crucially, this mapping depends not only on the reduced-form parameters, but also on the realization of the data. The data-dependence of this mapping implies that the standard concept of an identified set does not apply. In turn, this means that: 1) it is unclear whether NR are point- or set-identifying restrictions; and 2) there is no known valid frequentist procedure to conduct inference in these models.
From a Bayesian perspective, AR18 and the empirical papers that adopt their approach conduct standard (single-prior) Bayesian inference under NR in much the same way as under traditional sign restrictions. However, we highlight two features of this approach that can spuriously affect inference. First, the conditional likelihood used by AR18 to construct the posterior (distribution) implies that, for some types of NR, a component of the prior (distribution) is updated only in the direction that makes the NR unlikely to hold ex ante. This occurs because the numerator of the conditional likelihood – the likelihood of the reduced-form VAR – is flat with respect to the orthonormal matrix that maps reduced-form VAR innovations into structural shocks, whereas the denominator – the ex ante probability that the NR hold – depends on this matrix. Second, standard Bayesian inference under NR may be sensitive to the choice of prior when the NR yield a likelihood with flat regions. A flat likelihood implies that the conditional posterior of the orthonormal matrix is proportional to its conditional prior whenever the likelihood is nonzero. Posterior inference may therefore be sensitive to the choice of conditional prior for the orthonormal matrix. This is a problem that also occurs in set-identified models under traditional restrictions (e.g., Poirier (1998)).

To address the above issues, we study identification under NR and propose a [...] unconditional likelihood, rather than the conditional likelihood, to construct the posterior.

Footnote: Ludvigson et al. (2018, in press) conduct inference using a bootstrap procedure, but its frequentist validity is unknown.
Third, as a tool for assessing and/or eliminating posterior sensitivity occurring due to the likelihood having flat regions, we propose a robust (multiple-prior) Bayesian approach to estimation and inference. Finally, we show that the robust Bayesian approach has frequentist validity in large samples.

To the best of our knowledge, this is the first paper to formally study identification under general NR. Plagborg-Møller and Wolf (in press[b]) suggest that shock-sign restrictions, in particular, could in principle be recast as an external instrument (or ‘proxy’) and used to point-identify impulse responses in a proxy SVAR or local projection framework. We explore this idea in Appendix F and highlight the potential sensitivity of this approach to the realization of the unrestricted shocks in the time periods that enter the NR. Petterson, Seim and Shapiro (2020) derive bounds for a slope parameter in a single equation given restrictions on the plausible magnitude of the residuals, but the restrictions are over the entire sample and the setting is non-probabilistic.

We make two main contributions to the study of identification under NR. First, we provide a necessary and sufficient condition for global identification of an SVAR under NR and show that this condition is satisfied in a simple bivariate example with a single shock-sign restriction. That is, in contrast with traditional sign restrictions, NR may be formally point-identifying despite generating a set-valued mapping from reduced-form to structural parameters in any particular sample. However, this point-identification result does not deliver a point estimator, because the observed likelihood is almost always flat at the maximum.
Second, to develop a frequentist-valid procedure for inference, we introduce the notion of a ‘conditional identified set’. The conditional identified set extends the standard notion of an identified set to a setting where identification is defined in a repeated sampling experiment conditional on the set of observations entering the NR. This provides an interpretation for the set-valued mapping induced by the NR as the set of observationally equivalent structural parameters in such a conditional frequentist experiment.

In terms of inference under NR, this paper makes contributions from both a Bayesian and a frequentist point of view.

The paper’s contribution to Bayesian inference is to address the issues associated with the current approach to standard Bayesian inference under NR. First, we advocate using the unconditional likelihood – the joint probability of observing the data and the NR being satisfied – when constructing the posterior, rather than the conditional likelihood. Regardless of the type of NR imposed, the unconditional likelihood is flat with respect to the orthonormal matrix that maps reduced-form VAR innovations into structural shocks. This removes the source of posterior distortion that arises due to conditioning on the NR holding. Standard Bayesian inference under the unconditional likelihood requires a simple change to existing computational algorithms. Second, to address posterior sensitivity to the choice of prior, we adapt the robust Bayesian approach of Giacomini and Kitagawa (in press[a]) (GK) to a setting with NR.

In the context of an SVAR under traditional identifying restrictions, the robust Bayesian approach of GK involves decomposing the prior for the structural parameters into a prior for the reduced-form parameters, which is revised by the data, and a conditional prior for the orthonormal matrix given the reduced-form parameters, which is unrevisable.
Considering the class of all conditional priors for the orthonormal matrix that are consistent with the identifying restrictions generates a class of posteriors, which can be summarized by a set of posterior means (an estimator of the identified set) and a robust credible region. This removes the source of posterior sensitivity.

We show that this approach can also be used to summarize posterior sensitivity under NR, since the unconditional likelihood at the realized data possesses flat regions and the posterior can therefore be sensitive to the choice of prior, as in standard set-identified models. There are, however, some modifications needed to account for the novel features of the NR. In particular, one cannot use a conditional prior for the orthonormal matrix to impose the NR due to the data-dependent mapping between reduced-form and structural parameters. However, by considering the class of all conditional priors consistent with any traditional identifying restrictions (if present), one can trace out all possible posteriors that are consistent with the traditional restrictions and the NR. This is because traditional restrictions truncate the support of the conditional prior, while NR truncate the support of the likelihood. Consequently, the posterior given any particular conditional prior is only supported on the common support of the conditional prior and the likelihood.

Footnote: Giacomini, Kitagawa and Read (2019) extend this approach to proxy SVARs where the parameters of interest are set-identified using external instruments.

If the researcher has a credible conditional prior, we recommend reporting the standard Bayesian posterior under the unconditional likelihood together with the robust Bayesian output. This allows other researchers to assess the extent to which posterior inference may be driven by prior choice.
In the absence of a credible conditional prior, the robust Bayesian output should be reported as an alternative to the standard Bayesian posterior.

The paper’s contribution to frequentist inference is to provide an asymptotically valid approach to inference under NR, which, to the best of our knowledge, was not previously available. To explore the asymptotic frequentist properties of our robust Bayesian procedure, we assume a fixed number of NR. This assumption is empirically relevant given that applications typically impose no more than a handful of NR. We provide conditions under which the robust credible region provides asymptotically valid frequentist coverage of the conditional identified set for the impulse response. Since the conditional identified set is guaranteed to include the true impulse response, the robust credible region also provides valid coverage of the true impulse response. Our robust Bayesian approach should therefore appeal to Bayesians as well as frequentists.

We illustrate our methods by estimating the effects of monetary policy shocks in the United States. We find that posterior inferences about the response of output obtained under restrictions based on the October 1979 episode may be sensitive to the choice of conditional prior for the orthonormal matrix. In contrast, under an extended set of restrictions constructed by AR18 based on multiple historical episodes, output falls with high posterior probability following a positive monetary policy shock regardless of the choice of conditional prior. We also estimate the set of output responses that are consistent with the restriction that the monetary policy shock in October 1979 was the largest positive realization of the shock in the sample period. Compared with the extended set of restrictions, this shock-rank restriction results in broadly similar robust posterior inferences about the output response.
Outline.
The remainder of the paper is structured as follows. Section 2 highlights the econometric issues that arise when imposing NR using a simple bivariate example. Section 3 describes the general SVAR($p$) framework. Section 4 formally analyzes identification under NR and introduces the concept of a conditional identified set. Section 5 discusses how to conduct standard and robust Bayesian inference under NR. Section 6 explores the frequentist properties of the robust Bayesian approach. Section 7 contains the empirical application and Section 8 concludes. The appendices contain proofs and other supplemental material.

Generic notation:
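For the matrix $X$, $\mathrm{vec}(X)$ is the vectorization of $X$ and $\mathrm{vech}(X)$ is the half-vectorization of $X$ (when $X$ is symmetric). $e_{i,n}$ is the $i$th column of the $n \times n$ identity matrix, $I_n$. $0_{n \times m}$ is an $n \times m$ matrix of zeros. $1(\cdot)$ is the indicator function. $\|\cdot\|$ is the Euclidean norm.

This section sets out the econometric issues that arise when imposing NR using the simplest possible SVAR as an example. Consider the SVAR(0) $A_0 y_t = \varepsilon_t$, for $t = 1, \ldots, T$, where $y_t = (y_{1t}, y_{2t})'$ and $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t})'$ with $\varepsilon_t \overset{iid}{\sim} N(0_{2 \times 1}, I_2)$. We abstract from dynamics for ease of exposition, but this is without loss of generality. The orthogonal reduced form of the model reparameterizes $A_0$ as $Q' \Sigma_{tr}^{-1}$, where $\Sigma_{tr}$ is the lower-triangular Cholesky factor (with positive diagonal elements) of $\Sigma = E(y_t y_t') = A_0^{-1} (A_0^{-1})'$. We parameterize $\Sigma_{tr}$ directly as

$$\Sigma_{tr} = \begin{bmatrix} \sigma_{11} & 0 \\ \sigma_{21} & \sigma_{22} \end{bmatrix} \quad (\sigma_{11}, \sigma_{22} > 0), \tag{1}$$

and denote the vector of reduced-form parameters as $\phi = \mathrm{vech}(\Sigma_{tr})$. $Q$ is an orthonormal matrix in the space of $2 \times 2$ orthonormal matrices, $O(2)$:

$$Q \in O(2) = \left\{ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} : \theta \in [-\pi, \pi] \right\} \cup \left\{ \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix} : \theta \in [-\pi, \pi] \right\}, \tag{2}$$

where the first set is the set of ‘rotation’ matrices and the second set is the set of ‘reflection’ matrices.

Given the ‘sign normalization’ $\mathrm{diag}(A_0) \geq 0_{2 \times 1}$, the set of values for $A_0$ that are consistent with the reduced-form parameters in the absence of additional restrictions is

$$A_0 \in \left\{ \frac{1}{\sigma_{11}\sigma_{22}} \begin{bmatrix} \sigma_{22}\cos\theta - \sigma_{21}\sin\theta & \sigma_{11}\sin\theta \\ -\sigma_{21}\cos\theta - \sigma_{22}\sin\theta & \sigma_{11}\cos\theta \end{bmatrix} : \sigma_{22}\cos\theta \geq \sigma_{21}\sin\theta,\ \cos\theta \geq 0,\ \theta \in [-\pi, \pi] \right\}$$
$$\cup \left\{ \frac{1}{\sigma_{11}\sigma_{22}} \begin{bmatrix} \sigma_{22}\cos\theta - \sigma_{21}\sin\theta & \sigma_{11}\sin\theta \\ \sigma_{22}\sin\theta + \sigma_{21}\cos\theta & -\sigma_{11}\cos\theta \end{bmatrix} : \sigma_{22}\cos\theta \geq \sigma_{21}\sin\theta,\ \cos\theta \leq 0,\ \theta \in [-\pi, \pi] \right\}. \tag{3}$$

Consider the ‘shock-sign restriction’ that $\varepsilon_{1k}$ is nonnegative for some $k \in \{1, \ldots, T\}$:

$$\varepsilon_{1k} = e_{1,2}' A_0 y_k = (\sigma_{11}\sigma_{22})^{-1} \left( \sigma_{22} y_{1k} \cos\theta + (\sigma_{11} y_{2k} - \sigma_{21} y_{1k}) \sin\theta \right) \geq 0. \tag{4}$$

Given the realization of the data in period $k$, Equation (4) implies that the restricted structural shock can be written as a function $\varepsilon_{1k}(\theta, \phi, y_k)$.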
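The mapping from $(\theta, \phi)$ to $A_0$ and the closed form in Equation (4) can be checked numerically. The following is a minimal sketch, not the authors' code; the function names and the numerical values used to exercise it are illustrative assumptions.

```python
import numpy as np

def sigma_tr(s11, s21, s22):
    """Lower-triangular Cholesky factor from Equation (1)."""
    return np.array([[s11, 0.0], [s21, s22]])

def q_matrix(theta, reflection=False):
    """Element of O(2) from Equation (2): a rotation or a reflection."""
    c, s = np.cos(theta), np.sin(theta)
    if reflection:
        return np.array([[c, s], [s, -c]])
    return np.array([[c, -s], [s, c]])

def a0(theta, s11, s21, s22, reflection=False):
    """Structural matrix A_0 = Q' Sigma_tr^{-1}."""
    return q_matrix(theta, reflection).T @ np.linalg.inv(sigma_tr(s11, s21, s22))

def eps1k_closed_form(theta, s11, s21, s22, y_k):
    """First structural shock in period k, written as in Equation (4)."""
    y1, y2 = y_k
    num = s22 * y1 * np.cos(theta) + (s11 * y2 - s21 * y1) * np.sin(theta)
    return num / (s11 * s22)

# Illustrative (hypothetical) parameter values and data.
theta, phi, y_k = 0.3, (1.0, -0.5, 2.0), np.array([0.7, -0.4])
shock_from_a0 = (a0(theta, *phi) @ y_k)[0]
shock_closed_form = eps1k_closed_form(theta, *phi, y_k)
```

Both branches of $O(2)$ give the same first row of $A_0$, so the shock-sign restriction takes the same form for rotations and reflections; the two branches differ only through the sign normalization on the second diagonal element.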
Under the sign normalization and the shock-sign restriction, $\theta$ is restricted to the set

$$\theta \in \left\{ \theta : \sigma_{21}\sin\theta \leq \sigma_{22}\cos\theta,\ \cos\theta \geq 0,\ \sigma_{22} y_{1k} \cos\theta \geq (\sigma_{21} y_{1k} - \sigma_{11} y_{2k}) \sin\theta,\ -\pi \leq \theta \leq \pi \right\}$$
$$\cup \left\{ \theta : \sigma_{21}\sin\theta \leq \sigma_{22}\cos\theta,\ \cos\theta \leq 0,\ \sigma_{22} y_{1k} \cos\theta \geq (\sigma_{21} y_{1k} - \sigma_{11} y_{2k}) \sin\theta,\ -\pi \leq \theta \leq \pi \right\}. \tag{5}$$

Since $y_{1k}$ and $y_{2k}$ enter the inequalities characterising this set, the shock-sign restriction induces a set-valued mapping from $\phi$ to $\theta$ that depends on the realization of $y_k$. For example, if $\sigma_{21} < 0$, $\sigma_{21} y_{1k} - \sigma_{11} y_{2k} > 0$ and $y_{1k} > 0$, then

$$\theta \in \left[ \arctan\left( \frac{\sigma_{22}}{\sigma_{21}} \right), \arctan\left( \frac{\sigma_{22} y_{1k}}{\sigma_{21} y_{1k} - \sigma_{11} y_{2k}} \right) \right]. \tag{6}$$

The direct dependence of this mapping on the realization of the data implies that the standard notion of an identified set – the set of observationally equivalent structural parameter values given the reduced-form parameters – does not apply. Consequently, it is not obvious whether existing frequentist procedures for conducting inference in set-identified models are valid under NR. Moreover, it is unclear whether the restrictions are, in fact, set-identifying in a formal frequentist sense. We formally analyze identification under NR in Section 4.

Footnote: See Appendix A for the full characterization of this mapping.

Letting $y^T = (y_1', \ldots, y_T')'$ represent a realization of the random variable $Y^T$, the conditional likelihood is

$$p\left( y^T \mid \theta, \phi, \varepsilon_{1k}(\theta, \phi, y_k) \geq 0 \right) = \frac{\prod_{t=1}^{T} (2\pi)^{-1} |\Sigma|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} y_t' \Sigma^{-1} y_t \right)}{\Pr(\varepsilon_{1k} \geq 0 \mid \theta, \phi)} \, 1\left( \varepsilon_{1k}(\theta, \phi, y_k) \geq 0 \right). \tag{7}$$

The numerator in the first term is a function of $\phi$ and $y^T$, while the denominator is equal to 1/2, because the marginal distribution of $\varepsilon_{1k}$ is standard normal. The conditional likelihood therefore depends on $\theta$ only through the indicator function $1(\varepsilon_{1k}(\theta, \phi, y_k) \geq 0)$, which depends on the realization of $y_k$. To illustrate, the left panel of Figure 1 plots the likelihood given different realizations of the data drawn from a data-generating process with $\sigma_{21} < 0$, assuming that $\phi$ is known.
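As a rough numerical illustration of the data-dependent mapping, the set in Equation (5) can be traced out on a grid of $\theta$ and compared with the interval in Equation (6). This is a sketch under assumed parameter values; `admissible_theta` and `interval_eq6` are hypothetical helper names, not part of the paper.

```python
import numpy as np

def admissible_theta(s11, s21, s22, y_k, n_grid=200_001):
    """Grid approximation to the set in Equation (5), split by branch of O(2)."""
    y1, y2 = y_k
    theta = np.linspace(-np.pi, np.pi, n_grid)
    c, s = np.cos(theta), np.sin(theta)
    # Sign normalization on the (1,1) element of A_0.
    first_diag = s22 * c - s21 * s >= 0
    # Shock-sign restriction, Equation (4) (the positive scale factor is dropped).
    shock_sign = s22 * y1 * c + (s11 * y2 - s21 * y1) * s >= 0
    rotation = first_diag & (c >= 0) & shock_sign
    reflection = first_diag & (c <= 0) & shock_sign
    return theta, rotation, reflection

def interval_eq6(s11, s21, s22, y_k):
    """Closed-form interval from Equation (6), valid for one sign configuration."""
    y1, y2 = y_k
    b = s21 * y1 - s11 * y2
    if not (s21 < 0 and b > 0 and y1 > 0):
        raise ValueError("Equation (6) is stated only for this sign configuration")
    return np.arctan(s22 / s21), np.arctan(s22 * y1 / b)

# Illustrative (hypothetical) values satisfying the conditions for Equation (6).
phi, y_k = (1.0, -0.5, 2.0), (1.0, -1.5)
theta, rotation, reflection = admissible_theta(*phi, y_k)
lower, upper = interval_eq6(*phi, y_k)
```

Changing `y_k` while holding `phi` fixed moves the upper endpoint of the interval, which is the sense in which the mapping depends on the realization of the data and not only on the reduced-form parameters.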
The conditional likelihood is flat over the region for $\theta$ satisfying the shock-sign restriction and is zero outside this region. The support of the nonzero region depends on the realization of $y_k$.

The flat likelihood function implies that the posterior will be proportional to the prior in the region where the likelihood function is nonzero, and it will be zero outside this region. The standard approach to Bayesian inference in SVARs identified via sign restrictions assumes a uniform (or Haar) prior over $Q$, as does the approach in AR18. In the bivariate example, this is equivalent to a prior for $\theta$ that is uniform over the interval $[-\pi, \pi]$. This prior implies that the posterior for $\theta$ is also uniform over the interval for $\theta$ where the likelihood function is nonzero.

The impact impulse response of $y_{1t}$ to a positive standard-deviation shock $\varepsilon_{1t}$ is $\eta \equiv \sigma_{11} \cos\theta$. The right panel of Figure 1 plots the posterior for $\eta$ induced by a uniform prior over $\theta$ given the same realizations of the data for which the likelihood was plotted in the left panel.

Footnote: The data-generating process assumes A_0 = [ . . ; . . ], which implies that θ = arcsin(0. σ) with $Q$ equal to the rotation matrix. We assume the time series is of length $T = 3$ and draw sequences of structural shocks such that $\varepsilon_{1k} \geq 0$. $T$ is a small number to control Monte Carlo sampling error in the exercises below. The analysis with known $\phi$ replicates the situation with a large sample, where the likelihood for $\phi$ concentrates at the truth. The assumption that $\phi$ is known also facilitates visualizing the likelihood, which otherwise is a function of four parameters.

Footnote: See, for example, Uhlig (2005), Rubio-Ramírez, Waggoner and Zha (2010), Baumeister and Hamilton (2015) and Arias, Rubio-Ramírez and Waggoner (2018).

Figure 1: Shock-sign Restriction (left panel: likelihood as a function of $\theta$; right panel: posterior for $\eta$)

Notes: $T = 3$, $\phi$ is known and $\varepsilon_{1k}(\theta, \phi, y_k) \geq 0$ is imposed; the posterior of $\eta = \sigma_{11} \cos\theta$ is approximated using 1,000,000 draws of $\theta$ from its uniform posterior.

The uniform posterior for $\theta$ induces a posterior for $\eta$ that assigns more probability mass to more-extreme values of $\eta$. This highlights that even a ‘uniform’ prior may be informative for parameters of interest, which is also the case under traditional sign restrictions (Baumeister and Hamilton (2015)). One difference is that the conditional prior under sign restrictions is never updated by the data, whereas the support and shape of the posterior for $\eta$ under NR may depend on the realization of $y_k$ through its effect on the truncation points of the likelihood, so there may be some updating of the conditional prior by the data. For example, when $\sigma_{21} < 0$, $\sigma_{21} y_{1k} - \sigma_{11} y_{2k} > 0$ and $y_{1k} > 0$,

$$\eta \in \left[ \sigma_{11} \cos\left( \arctan\left( \max\left\{ -\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22} y_{1k}}{\sigma_{21} y_{1k} - \sigma_{11} y_{2k}} \right\} \right) \right), \sigma_{11} \right]. \tag{8}$$

However, the conditional prior is not updated at values of $\theta$ corresponding to the flat region of the likelihood. Posterior inference about $\eta$ may therefore still be sensitive to the choice of prior, as in standard set-identified SVARs.

The historical decomposition is the contribution of a particular structural shock to the observed unexpected change in a particular variable over some horizon. The contribution of the first shock to the change in the first variable in the $k$th period is

$$H_{1,1,k}(\theta, \phi, y_k) = \sigma_{22}^{-1} \left( \sigma_{22} y_{1k} \cos^2\theta + (\sigma_{11} y_{2k} - \sigma_{21} y_{1k}) \cos\theta \sin\theta \right), \tag{9}$$

while the contribution of the second shock is

$$H_{1,2,k}(\theta, \phi, y_k) = \sigma_{22}^{-1} \left( \sigma_{22} y_{1k} \sin^2\theta + (\sigma_{21} y_{1k} - \sigma_{11} y_{2k}) \cos\theta \sin\theta \right). \tag{10}$$

The two contributions sum to $y_{1k}$. Consider the restriction that the first structural shock in period $k$ was positive and (in the language of AR18) the ‘most important contributor’ to the change in the first variable, which requires that $|H_{1,1,k}(\theta, \phi, y_k)| \geq |H_{1,2,k}(\theta, \phi, y_k)|$.
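The way a uniform posterior for $\theta$ maps into a non-uniform posterior for $\eta$ with the support in Equation (8) can be simulated directly. The following is a small sketch assuming the sign configuration under which Equations (6) and (8) are stated; the function names and numerical values are illustrative, not taken from the paper.

```python
import numpy as np

def eta_support(s11, s21, s22, y_k):
    """Support of eta = sigma_11 * cos(theta) from Equation (8)."""
    y1, y2 = y_k
    b = s21 * y1 - s11 * y2
    if not (s21 < 0 and b > 0 and y1 > 0):
        raise ValueError("Equation (8) is stated only for this sign configuration")
    lower = s11 * np.cos(np.arctan(max(-s22 / s21, s22 * y1 / b)))
    return lower, s11

def eta_posterior_draws(s11, s21, s22, y_k, n_draws=100_000, seed=0):
    """Draw theta uniformly on the interval in Equation (6) and transform to eta."""
    y1, y2 = y_k
    b = s21 * y1 - s11 * y2
    if not (s21 < 0 and b > 0 and y1 > 0):
        raise ValueError("Equation (6) is stated only for this sign configuration")
    lo, hi = np.arctan(s22 / s21), np.arctan(s22 * y1 / b)
    rng = np.random.default_rng(seed)
    theta = rng.uniform(lo, hi, size=n_draws)
    return s11 * np.cos(theta)

# Illustrative (hypothetical) values.
phi, y_k = (1.0, -0.5, 2.0), (1.0, -1.5)
eta = eta_posterior_draws(*phi, y_k)
lower, upper = eta_support(*phi, y_k)
share_upper_half = np.mean(eta > 0.5 * (lower + upper))
```

In this configuration more than half of the draws fall in the upper half of the support, because the cosine is flat near $\theta = 0$; this is one concrete instance of a ‘uniform’ prior on $\theta$ being informative about $\eta$.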
Under these restrictions and the sign normalization, $\theta$ must satisfy a set of inequalities that depends on $\phi$ and $y_k$. As in the case of the shock-sign restriction, this set of restrictions generates a set-valued mapping from $\phi$ to $\theta$ that depends on $y_k$.

Footnote: See Appendix A for this set of inequalities. It is more difficult to analytically characterize the induced mapping than in the shock-sign example, so we do not pursue this.

Let $D(\theta, \phi, y_k) = 1\{\varepsilon_{1k}(\theta, \phi, y_k) \geq 0, |H_{1,1,k}(\theta, \phi, y_k)| \geq |H_{1,2,k}(\theta, \phi, y_k)|\}$ represent the indicator function equal to one when the NR are satisfied and equal to zero otherwise, and let $\tilde{D}(\theta, \phi, \varepsilon_k) = 1\{\varepsilon_{1k} \geq 0, |\tilde{H}_{1,1,k}(\theta, \phi, \varepsilon_k)| \geq |\tilde{H}_{1,2,k}(\theta, \phi, \varepsilon_k)|\}$ represent the indicator function for the same event in terms of the structural shocks rather than the data. The conditional likelihood function given the restrictions is then

$$p\left( y^T \mid \theta, \phi, D(\theta, \phi, y_k) = 1 \right) = \frac{\prod_{t=1}^{T} (2\pi)^{-\frac{n}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} y_t' \Sigma^{-1} y_t \right)}{\Pr(\tilde{D}(\theta, \phi, \varepsilon_k) = 1 \mid \theta, \phi)} \, D(\theta, \phi, y_k). \tag{11}$$

As in the case of the shock-sign restriction, the numerator of the first term does not depend on $\theta$. In contrast, the probability in the denominator now depends on $\theta$ through the historical decomposition. Intuitively, changing $\theta$ changes the impulse responses of $y_{1t}$ to the two shocks and thus changes the ex ante probability that $|\tilde{H}_{1,1,k}(\theta, \phi, \varepsilon_k)| \geq |\tilde{H}_{1,2,k}(\theta, \phi, \varepsilon_k)|$. The conditional likelihood therefore depends on $\theta$ both through this probability and through the indicator function determining the truncation points of the likelihood. Consequently, the likelihood function is not necessarily flat when it is nonzero.

To illustrate, the left panel of Figure 2 plots the conditional likelihood evaluated at a random realization of the data satisfying the restrictions using the same data-generating process as above and again assuming that $\phi$ is known. The probability in the denominator of the conditional likelihood is approximated by drawing 1,000,000 realizations of $\varepsilon_k$ and computing the proportion of draws satisfying the restrictions at each value of $\theta$. This probability is plotted in the right panel of Figure 2. The likelihood is again truncated according to a set-valued mapping from $\phi$ and $y_k$ to $\theta$, but an important difference from the case with the shock-sign restriction is that the likelihood is no longer flat within the region where it is nonzero. In particular, the conditional likelihood has a maximum at the value of $\theta$ that minimizes the ex ante probability that the NR are satisfied (within the set of values of $\theta$ that are consistent with the restrictions). The posterior for $\theta$ induced by a uniform prior will therefore assign greater posterior probability to values of $\theta$ that yield a lower ex ante probability of satisfying the NR.

Figure 2: Historical-decomposition Restriction (left panel: Likelihood, conditional and unconditional; right panel: Probability that Shocks Satisfy Restrictions)

Notes: $T = 3$ and $\phi$ is known; $\varepsilon_{1k}(\phi, \theta, y_k) \geq 0$ and $|H_{1,1,k}(\phi, \theta, y_k)| \geq |H_{1,2,k}(\phi, \theta, y_k)|$ are the narrative sign restrictions; $\Pr(\tilde{D}(\theta, \phi, \varepsilon_k) = 1 \mid \theta, \phi)$ is approximated using 1,000,000 Monte Carlo draws.

If we view the narrative event as a part of the observables and its probability of occurring depends on the parameter of interest, conditioning on the narrative event implies that we are conditioning on a non-ancillary statistic. When conducting likelihood-based inference, conditioning on a non-ancillary statistic is undesirable, because it represents a loss of information about the parameter of interest. The probability that the shock-sign restriction is satisfied is independent of the parameters, so the event that the restriction is satisfied is ancillary. In the case where there is also a restriction on the historical decomposition, the probability that the NR are satisfied depends on $\theta$, so the event that the NR are satisfied is not ancillary. Conditioning on this non-ancillary event results in the likelihood no longer being flat, but the shape of the likelihood is fully driven by the inverse probability of the conditioning event. That is, the loss of information for $\theta$ can be viewed as distorting the shape of the posterior in the sense that the prior is updated toward values of $\theta$ that make the event that the NR are satisfied less likely ex ante. We therefore advocate forming the likelihood without conditioning on the restrictions holding.

The joint (or unconditional) likelihood of observing the data and the NR holding is obtained by multiplying the conditional likelihood by the probability that the NR are satisfied:

$$p\left( y^T, \tilde{D}(\theta, \phi, \varepsilon_k) = 1 \mid \theta, \phi \right) = \prod_{t=1}^{T} (2\pi)^{-\frac{n}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} y_t' \Sigma^{-1} y_t \right) D(\theta, \phi, y_k). \tag{12}$$

Conditional on being nonzero, the unconditional likelihood is flat with respect to $\theta$. The unconditional likelihood depends on $\theta$ only through the points of truncation.
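The contrast between Equations (11) and (12) can be reproduced in a few lines. The sketch below is not the authors' implementation: it restricts attention to the rotation branch of $O(2)$, ignores the sign normalization, uses fewer Monte Carlo draws than the paper, and all function names and numerical values are assumptions made for illustration.

```python
import numpy as np

def prob_nr(theta, n_draws=200_000, seed=0):
    """Monte Carlo estimate of Pr(D-tilde = 1 | theta, phi) on the rotation branch.

    The contributions are sigma_11*cos(theta)*eps_1 and -sigma_11*sin(theta)*eps_2,
    so sigma_11 cancels from the comparison of absolute values.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_draws, 2))
    h11 = np.cos(theta) * eps[:, 0]
    h12 = -np.sin(theta) * eps[:, 1]
    return np.mean((eps[:, 0] >= 0) & (np.abs(h11) >= np.abs(h12)))

def nr_indicator(theta, sig_tr, y_k):
    """D(theta, phi, y_k): shock sign and 'most important contributor' in period k."""
    c, s = np.cos(theta), np.sin(theta)
    q = np.array([[c, -s], [s, c]])
    eps_k = q.T @ np.linalg.solve(sig_tr, y_k)   # structural shocks implied by y_k
    impact = sig_tr @ q                          # impact impulse responses
    h11 = impact[0, 0] * eps_k[0]
    h12 = impact[0, 1] * eps_k[1]
    return float(eps_k[0] >= 0 and abs(h11) >= abs(h12))

def reduced_form_likelihood(sig_tr, y):
    """Product of N(0, Sigma) densities; does not depend on theta."""
    sigma = sig_tr @ sig_tr.T
    n = y.shape[1]
    quad = np.einsum("ti,ij,tj->t", y, np.linalg.inv(sigma), y)
    dens = (2 * np.pi) ** (-n / 2) * np.linalg.det(sigma) ** (-0.5) * np.exp(-0.5 * quad)
    return float(np.prod(dens))

def likelihoods(theta, sig_tr, y, k=0):
    """Return (conditional, unconditional) likelihoods as in Equations (11) and (12)."""
    d = nr_indicator(theta, sig_tr, y[k])
    if d == 0.0:
        return 0.0, 0.0
    f = reduced_form_likelihood(sig_tr, y)
    return f / prob_nr(theta), f

# Illustrative (hypothetical) reduced-form parameters and data with T = 3.
sig_tr = np.array([[1.0, 0.0], [-0.5, 2.0]])
y = np.array([[1.0, -1.5], [0.2, 0.1], [-0.3, 0.4]])
cond_a, uncond_a = likelihoods(0.3, sig_tr, y)
cond_b, uncond_b = likelihoods(0.1, sig_tr, y)
```

At both values of $\theta$ the restrictions hold for these data, so the unconditional likelihood is identical, while the conditional likelihood is larger at the value of $\theta$ for which the restrictions are less likely to hold ex ante.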
To illustrate, Figure 2 plots the unconditional likelihood given the same realization of the data used to plot the conditional likelihood. As in the case of the shock-sign restriction, the flat unconditional likelihood implies that posterior inference may be sensitive to the choice of prior. We describe our approach to addressing this posterior sensitivity in Section 5.2.

This section describes the general SVAR($p$) and outlines the restrictions that we consider.

Let $y_t$ be an $n \times 1$ vector of endogenous variables following the SVAR($p$) process:

$$A_0 y_t = \sum_{l=1}^{p} A_l y_{t-l} + \varepsilon_t, \quad t = 1, \ldots, T, \tag{13}$$

where $A_0$ is invertible and $\varepsilon_t \overset{iid}{\sim} N(0_{n \times 1}, I_n)$ are structural shocks. The initial conditions $(y_{1-p}, \ldots, y_0)$ are given. We omit exogenous regressors (such as a constant) for simplicity of exposition, but these are straightforward to include. Letting $x_t = (y_{t-1}', \ldots, y_{t-p}')'$ and $A_+ = (A_1, \ldots, A_p)$, rewrite the SVAR($p$) as

$$A_0 y_t = A_+ x_t + \varepsilon_t, \quad t = 1, \ldots, T. \tag{14}$$

$(A_0, A_+)$ are the structural parameters. The reduced-form VAR($p$) representation is

$$y_t = B x_t + u_t, \quad t = 1, \ldots, T, \tag{15}$$

where $B = (B_1, \ldots, B_p)$, $B_l = A_0^{-1} A_l$ for $l = 1, \ldots, p$, and $u_t = A_0^{-1} \varepsilon_t \overset{iid}{\sim} N(0_{n \times 1}, \Sigma)$ with $\Sigma = A_0^{-1} (A_0^{-1})'$. $\phi = (\mathrm{vec}(B)', \mathrm{vech}(\Sigma)')' \in \Phi$ are the reduced-form parameters. We assume that $B$ is such that the VAR($p$) can be inverted into an infinite-order vector moving average (VMA($\infty$)) representation.

As is standard in the literature that considers set-identified SVARs, we reparameterize the model into its orthogonal reduced form (e.g., Arias et al. (2018)):

$$y_t = B x_t + \Sigma_{tr} Q \varepsilon_t, \quad t = 1, \ldots, T, \tag{16}$$

where $\Sigma_{tr}$ is the lower-triangular Cholesky factor of $\Sigma$ (i.e. $\Sigma_{tr} \Sigma_{tr}' = \Sigma$) with diagonal elements normalized to be non-negative, $Q \in O(n)$ is an $n \times n$ orthonormal matrix and $O(n)$ is the set of all such matrices.
The structural and orthogonal reduced-form parameterizations are related through the mapping $B = A_0^{-1} A_+$, $\Sigma = A_0^{-1} (A_0^{-1})'$ and $Q = \Sigma_{tr}^{-1} A_0^{-1}$ with inverse mapping $A_0 = Q' \Sigma_{tr}^{-1}$ and $A_+ = Q' \Sigma_{tr}^{-1} B$.

The VMA($\infty$) representation of the model is

$$y_t = \sum_{h=0}^{\infty} C_h u_{t-h} = \sum_{h=0}^{\infty} C_h \Sigma_{tr} Q \varepsilon_{t-h}, \quad t = 1, \ldots, T, \tag{17}$$

where $C_h$ is the $h$th term in $(I_n - \sum_{l=1}^{p} B_l L^l)^{-1}$ and $L$ is the lag operator. $C_h$ is defined recursively by $C_h = \sum_{l=1}^{\min\{h,p\}} B_l C_{h-l}$ for $h \geq 1$ with $C_0 = I_n$. The $(i,j)$th element of the matrix $C_h \Sigma_{tr} Q$, which we denote by $\eta_{i,j,h}(\phi, Q)$, is the horizon-$h$ impulse response of the $i$th variable to the $j$th structural shock:

$$\eta_{i,j,h}(\phi, Q) = e_{i,n}' C_h \Sigma_{tr} Q e_{j,n} = c_{i,h}'(\phi) q_j, \tag{18}$$

where $c_{i,h}'(\phi) = e_{i,n}' C_h \Sigma_{tr}$ is the $i$th row of $C_h \Sigma_{tr}$ and $q_j = Q e_{j,n}$ is the $j$th column of $Q$.

Footnote: The VAR($p$) is invertible into a VMA($\infty$) process when the eigenvalues of the companion matrix lie inside the unit circle. See Hamilton (1994) or Kilian and Lütkepohl (2017).

In the absence of any identifying restrictions, it is well-known that $Q$ is set-identified. Consequently, functions of $Q$, such as the impulse responses, are also set-identified. Imposing traditional identifying restrictions on the SVAR is equivalent to restricting $Q$ to lie in a subspace of $O(n)$. It is conventional to impose a ‘sign normalization’ on the structural shocks. We normalize the diagonal elements of $A_0$ to be non-negative, so a positive value of $\varepsilon_{it}$ is a positive shock to the $i$th equation in the SVAR at time $t$. The sign normalization implies that $\mathrm{diag}(Q' \Sigma_{tr}^{-1}) \geq 0_{n \times 1}$.

It is common to impose sign restrictions on the impulse responses (e.g., Uhlig (2005)) or on the structural parameters themselves. For example, the restriction that the horizon-$h$ impulse response of the $i$th variable to the $j$th shock is nonnegative is $c_{i,h}'(\phi) q_j \geq 0$, which is a linear inequality restriction on a single column of $Q$ that depends only on the reduced-form parameter $\phi$. Restrictions on elements of $A_0$ take a similar form.

In contrast, NR constrain the values of the structural shocks in particular periods. The structural shocks are

$$\varepsilon_t = A_0 u_t = Q' \Sigma_{tr}^{-1} u_t. \tag{19}$$

The shock-sign restriction that the $i$th structural shock at time $k$ is nonnegative is

$$\varepsilon_{ik}(\phi, Q, u_k) = e_{i,n}' Q' \Sigma_{tr}^{-1} u_k = (\Sigma_{tr}^{-1} u_k)' q_i \geq 0. \tag{20}$$

We can treat $u_t$ as observable given $\phi$ and the data, so we suppress the dependence of $u_t$ on $\phi$ and $(y_t', x_t')'$ for notational convenience. The restriction in (20) is a linear inequality restriction on a single column of $Q$. In contrast with traditional sign restrictions, the shock-sign restriction depends directly on the data through the reduced-form VAR innovations.

In addition to shock-sign restrictions, AR18 consider restrictions on the historical decomposition, which is the cumulative contribution of the $j$th shock to the observed unexpected change in the $i$th variable between periods $k$ and $k+h$:

$$H_{i,j,k,k+h}\left( \phi, Q, \{u_t\}_{t=k}^{k+h} \right) = \sum_{l=0}^{h} e_{i,n}' C_l \Sigma_{tr} Q e_{j,n} e_{j,n}' \varepsilon_{k+h-l} = \sum_{l=0}^{h} c_{i,l}'(\phi) q_j q_j' \Sigma_{tr}^{-1} u_{k+h-l}. \tag{21}$$

One example of a restriction on the historical decomposition is that the $j$th structural shock was the ‘most important contributor’ to the change in the $i$th variable between periods $k$ and $k+h$, which requires that $|H_{i,j,k,k+h}| \geq \max_{l \neq j} |H_{i,l,k,k+h}|$. Another example is that the $j$th structural shock was the ‘overwhelming contributor’ to the change in the $i$th variable between periods $k$ and $k+h$, which requires that $|H_{i,j,k,k+h}| \geq \sum_{l \neq j} |H_{i,l,k,k+h}|$.
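The objects in Equations (17) to (21) are straightforward to compute for a given draw of $(\phi, Q)$. The following is a minimal sketch with hypothetical function names; it takes the reduced-form innovations $u_t$ as given and is not tied to any particular estimation routine.

```python
import numpy as np

def vma_coefficients(B_list, horizon):
    """C_0 = I_n and C_h = sum_{l=1}^{min(h,p)} B_l C_{h-l} for h >= 1."""
    n, p = B_list[0].shape[0], len(B_list)
    C = [np.eye(n)]
    for h in range(1, horizon + 1):
        C.append(sum(B_list[l - 1] @ C[h - l] for l in range(1, min(h, p) + 1)))
    return C

def impulse_response(C, sig_tr, Q, i, j, h):
    """eta_{i,j,h}: (i, j) element of C_h Sigma_tr Q, as in Equation (18)."""
    return (C[h] @ sig_tr @ Q)[i, j]

def structural_shocks(sig_tr, Q, u):
    """Rows are eps_t' = (Q' Sigma_tr^{-1} u_t)', as in Equation (19); u is (T, n)."""
    return np.linalg.solve(sig_tr, u.T).T @ Q

def historical_decomposition(C, sig_tr, Q, u_window, i, j):
    """H_{i,j,k,k+h} from Equation (21); rows of u_window are u_k, ..., u_{k+h}."""
    h = u_window.shape[0] - 1
    eps = structural_shocks(sig_tr, Q, u_window)
    return sum((C[l] @ sig_tr @ Q)[i, j] * eps[h - l, j] for l in range(h + 1))

def most_important_contributor(C, sig_tr, Q, u_window, i, j):
    """True if |H_{i,j}| >= max over l != j of |H_{i,l}|."""
    n = sig_tr.shape[0]
    H = np.array([historical_decomposition(C, sig_tr, Q, u_window, i, l) for l in range(n)])
    others = np.delete(np.abs(H), j)
    return bool(np.abs(H[j]) >= others.max())
```

A shock-sign restriction for shock `i` in period `k` is then the check `structural_shocks(sig_tr, Q, u)[k, i] >= 0`. Summing `historical_decomposition` over all `j` recovers the total contribution of the innovations in the window to variable `i`, which gives a convenient sanity check.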
From Equation (21), it is clear that these restrictions are nonlinear inequality constraints that simultaneously constrain every column of $Q$ and that depend on the realizations of the data in particular periods in addition to the reduced-form parameters.

Other restrictions also naturally fit into this framework. For instance, we can consider restrictions on the relative magnitudes of a particular structural shock in different periods. We refer to these restrictions as 'shock-rank restrictions', since they imply a (possibly partial) ordering of the shocks. As an example, one could impose that the $i$th shock in period $k$ was the largest positive realization of this shock in the observed sample. This requires that $\varepsilon_{ik}(\phi, Q, u_k) \geq \max_{t \neq k}\{\varepsilon_{it}(\phi, Q, u_t)\}$, which can be expressed as a system of $T-1$ linear inequality restrictions on a single column of $Q$: $(\Sigma_{tr}^{-1}(u_k - u_t))' q_i \geq 0$ for $t \neq k$. Alternatively, one could impose that the $i$th shock in period $k$ was the largest-magnitude realization of that shock, or $|\varepsilon_{ik}(\phi, Q, u_k)| \geq \max_{t \neq k}\{|\varepsilon_{it}(\phi, Q, u_t)|\}$. If $\varepsilon_{ik}(\phi, Q, u_k) \geq 0$, this would require that $(\Sigma_{tr}^{-1}(u_k - u_t))' q_i \geq 0$ and $(\Sigma_{tr}^{-1}(u_k + u_t))' q_i \geq 0$ for $t \neq k$, which is a system of $2(T-1)$ linear inequality restrictions on $q_i$. These restrictions could also be applied to a subset of the observations rather than the full sample (e.g., $\varepsilon_{ik}(\phi, Q, u_k) > \varepsilon_{it}(\phi, Q, u_t)$ for some $t \in \{1, \ldots, T\}$).

[Footnote: Similar to the shock-rank restrictions we describe, Ben Zeev (2018) imposes a restriction on the timing of the maximum three-year average of a particular shock, as well as restrictions on the sign and relative magnitudes of this three-year average in specific periods. Restrictions on averages of shocks can also be implemented in the framework we consider.]

The collection of NR can be represented in the general form $N(\phi, Q, Y^T) \geq 0_{s \times 1}$, where $s$ is the number of restrictions. As an illustration, consider the case where there is a single shock-sign restriction in period $k$, $\varepsilon_{1k}(\phi, Q, u_k) \geq 0$, as well as the restriction that the first structural shock was the most important contributor to the change in the first variable in period $k$. Then,
\[ N(\phi, Q, Y^T) = \begin{bmatrix} (\Sigma_{tr}^{-1} u_k)' q_1 \\ |e_{1,n}' \Sigma_{tr} q_1 q_1' \Sigma_{tr}^{-1} u_k| - \max_{j \neq 1} |e_{1,n}' \Sigma_{tr} q_j q_j' \Sigma_{tr}^{-1} u_k| \end{bmatrix} \geq 0_{2 \times 1}. \tag{22} \]

Traditional sign and zero restrictions can also be applied alongside NR. We follow AR18 by explicitly allowing for sign restrictions on impulse responses and on elements of $A_0$. We denote such sign restrictions by $S(\phi, Q) \geq 0_{\tilde{s} \times 1}$, where $\tilde{s}$ is the number of traditional sign restrictions. It is straightforward to additionally allow for zero restrictions, including 'short-run' zero restrictions (as in Sims (1980)), 'long-run' zero restrictions (as in Blanchard and Quah (1989)), or restrictions arising from external instruments (as in Mertens and Ravn (2013) and Stock and Watson (2018)); for example, see GK and Giacomini et al. (2019).

When constructing the posterior of the SVAR's parameters, AR18 use the likelihood conditional on the NR holding. Define
\[ D_N = D_N(\phi, Q, Y^T) \equiv 1\{N(\phi, Q, Y^T) \geq 0_{s \times 1}\}, \]
\[ r(\phi, Q) \equiv \Pr(D_N(\phi, Q, Y^T) = 1 \mid \phi, Q), \]
\[ f(y^T \mid \phi) \equiv \prod_{t=1}^{T} (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(y_t - Bx_t)' \Sigma^{-1} (y_t - Bx_t)\right). \]
The likelihood conditional on $D_N = 1$ can be written as
\[ p(y^T \mid D_N = 1, \phi, Q) = \frac{f(y^T \mid \phi)}{r(\phi, Q)} \cdot D_N(\phi, Q, y^T). \tag{23} \]
$f(y^T \mid \phi)$ is the joint density of the data given $\phi$ (i.e., the likelihood function of the reduced-form VAR), which depends only on $\phi$ and the data. The indicator function $D_N(\phi, Q, y^T)$ is equal to one when the NR are satisfied and is equal to zero otherwise. This determines the truncation points of the likelihood. $r(\phi, Q)$ is the ex ante probability that the NR are satisfied. This will be a constant when there are only shock-sign or shock-rank restrictions; for example, if there are $s$ shock-sign restrictions, $r(\phi, Q) = (1/2)^s$. In contrast, when there are restrictions on the historical decomposition, this probability will depend on $\phi$ and $Q$.

Consider the case where $\phi$ is known, which will be the case asymptotically because $\phi$ is point-identified. When $r(\phi, Q)$ depends on $Q$, the conditional likelihood will be maximized at the value of $Q$ that minimizes $r(\phi, Q)$ (within the set of values of $Q$ that satisfy the restrictions). The posterior based on this likelihood will therefore place higher posterior probability on values of $Q$ that result in a lower ex ante probability that the restrictions are satisfied. As discussed in Section 2.2, this is an artefact of conditioning on a non-ancillary event, which represents a loss of information about the parameters.

We therefore advocate constructing the likelihood without conditioning on the NR holding. The unconditional likelihood (the joint distribution of the data and $D_N$) can be expressed as
\[ p(y^T, D_N = d \mid \phi, Q) = \left[f(y^T \mid \phi) D_N(\phi, Q, y^T)\right]^{d} \cdot \left[f(y^T \mid \phi)\left(1 - D_N(\phi, Q, y^T)\right)\right]^{1-d} = f(y^T \mid \phi) \cdot \left[D_N(\phi, Q, y^T)\right]^{d} \cdot \left[1 - D_N(\phi, Q, y^T)\right]^{1-d}. \tag{24} \]
For any value of $\phi$ such that $y^T$ is compatible with the NR, there will be a set of values of $Q$ that satisfy the restrictions, which depend on the data, but the value of the unconditional likelihood will be the same for all values of $Q$ within this set. The conditional posterior of $Q \mid \phi, y^T$ will therefore be proportional to the conditional prior for $Q \mid \phi$ in these regions. Given a fixed number of NR, the likelihood will possess flat regions even with a time-series of infinite length, so posterior inference may be sensitive to the choice of conditional prior for $Q$, even asymptotically (which is also the case for the conditional likelihood when the restrictions are ancillary). This motivates considering Bayesian inferential procedures that are robust to the choice of unrevisable conditional prior for $Q$, which we explore in Section 5.2.

In this section, we briefly discuss the distributional assumptions for the structural shocks and the mechanism that generates the NR.

3.4.1 Distributional assumptions
Practitioners may be concerned about the robustness of inference with respect to deviations from the assumption of standard normal shocks. For instance, one could worry that the periods in which the NR are imposed are 'unusual' in the sense that the structural shocks in these periods were drawn from a distribution with, say, inflated variance or fat tails. The unconditional likelihood depends on the normality assumption only through $f(y^T \mid \phi)$. By omitting terms in $f(y^T \mid \phi)$ corresponding to the periods in which the NR are imposed, one can conduct inference that is robust to the distributional assumption about the shocks in these particular periods. To illustrate, consider the case where NR are imposed in period $k$ only and assume the likelihood function for $y^T$ takes the form
\[ \tilde{f}(y^T \mid \phi) = v(\{y_t - Bx_t\}_{t \neq k} \mid \phi)\, w(y_k - Bx_k), \tag{25} \]
where
\[ v(\{y_t - Bx_t\}_{t \neq k} \mid \phi) = \prod_{t \neq k} (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(y_t - Bx_t)' \Sigma^{-1} (y_t - Bx_t)\right) \tag{26} \]
and $w(y_k - Bx_k)$ is an unknown, potentially non-normal, density. Replacing $f(y^T \mid \phi)$ in Equation (24) with $v(\{y_t - Bx_t\}_{t \neq k} \mid \phi)$ yields an 'unconditional partial likelihood' that does not depend on the distribution of $\varepsilon_k$, but that is still truncated by the NR. This would potentially result in a loss of information relative to a likelihood that correctly specifies the distribution of the shocks in period $k$. However, when NR are imposed in only a few periods, this loss of information is likely to be small. In contrast, the conditional likelihood approach cannot leave fully unspecified the distribution of the restricted structural shocks, because computing $r(\phi, Q)$ requires specifying this distribution.

Concerns about heteroscedasticity or non-normality may also be alleviated by recognizing that the distributional assumption will become irrelevant asymptotically. The set of values of $Q$ with non-zero unconditional likelihood depends only on $\phi$, which summarizes the second moments of the data, and the realization of the data in the periods in which the NR are imposed. Under regularity assumptions, the likelihood (and thus the posterior) of $\phi$ will converge to a point at the true value of $\phi$ asymptotically regardless of whether the true data-generating process is a VAR with homoscedastic normal shocks. The set of values of $Q$ with non-zero likelihood will therefore converge asymptotically to the same set regardless of whether the distributional assumption is correct.

Note that we do not explicitly model the mechanism responsible for revealing the information underlying the NR (i.e., whether $D_N = 1$ or $D_N = 0$) or the mechanism determining the periods in which this information is revealed (e.g., the identity of $k$ in examples above), which is consistent with the papers that impose these restrictions. If the revelation of this information depends on the data, the likelihood will be misspecified.
The exact implications of this misspecification for estimation or inference will depend on assumptions about the mechanism revealing the narrative information. Exploring the consequences of such misspecification may be an interesting area for further work. In the bivariate example of Section 2, if the identity of $k$ is randomly determined independently of $\varepsilon_1, \ldots, \varepsilon_T$, we can interpret the current analysis as conditional on $k$.

This section formally analyzes identification in the SVAR under NR. Section 4.1 considers whether NR are point- or set-identifying in a frequentist sense. Section 4.2 introduces the notion of a 'conditional identified set', which extends the standard notion of an identified set to the setting where the mapping from reduced-form to structural parameters depends on the realization of the data. This provides an interpretation of the mapping induced by the NR. Additionally, we make use of this object when showing the frequentist validity of our robust Bayesian procedure in Section 6.
[Footnote: See Plagborg-Møller (2019) for a discussion of this point in the context of a structural VMA model.]

Denoting the true parameter value by $(\phi_0, Q_0)$, point-identification for the parametric model (24) requires that there is no other parameter value $(\phi, Q) \neq (\phi_0, Q_0)$ that is observationally equivalent to $(\phi_0, Q_0)$. To assess the existence or non-existence of observationally equivalent parameter points, we analyze a statistical distance between $p(y^T, D_N = d \mid \phi, Q)$ and $p(y^T, D_N = d \mid \phi_0, Q_0)$ that metrizes observational equivalence. Specifically, in the current setting where the support of the distribution of observables can depend on the parameters, it is convenient to work with the Hellinger distance:
\[ HD(\phi, Q) \equiv \sum_{d=0,1} \int_{\mathcal{Y}} \left(p^{1/2}(y^T, D_N = d \mid \phi, Q) - p^{1/2}(y^T, D_N = d \mid \phi_0, Q_0)\right)^2 dy^T = 2\left(1 - H(\phi, Q)\right), \]
where
\[ H(\phi, Q) \equiv \sum_{d=0,1} \int_{\mathcal{Y}} p^{1/2}(y^T, D_N = d \mid \phi, Q) \cdot p^{1/2}(y^T, D_N = d \mid \phi_0, Q_0)\, dy^T, \tag{27} \]
and $\mathcal{Y}$ is the sample space for $Y^T$. As is known in the literature on minimum distance estimation (see, for example, Basu, Shioya and Park (2011)), $(\phi, Q)$ and $(\phi_0, Q_0)$ are observationally equivalent if and only if $HD(\phi, Q) = 0$ or, equivalently, $H(\phi, Q) = 1$. We similarly define the Hellinger distance for the conditional likelihood as $HD_c(\phi, Q) \equiv 2\left(1 - H_c(\phi, Q)\right)$, where
\[ H_c(\phi, Q) \equiv \int_{\mathcal{Y}} p^{1/2}(y^T \mid D_N = 1, \phi, Q) \cdot p^{1/2}(y^T \mid D_N = 1, \phi_0, Q_0)\, dy^T. \tag{28} \]
The next proposition analyzes the conditions for $H(\phi, Q) = 1$ and $H_c(\phi, Q) = 1$, and shows that observational equivalence of $(\phi, Q)$ and $(\phi_0, Q_0)$ boils down to geometric equivalence of the set of reduced-form VAR innovations satisfying the NR.

Proposition 4.1.
Let $(\phi_0, Q_0)$ be the true parameter value and let $U \equiv U(y^T; \phi_0) = (u_1', \ldots, u_T')'$ collect the reduced-form VAR innovations. Define
\[ Q^* \equiv \left\{ Q \in O(n) : \{U : N(\phi_0, Q, Y^T) \geq 0_{s \times 1}\} = \{U : N(\phi_0, Q_0, Y^T) \geq 0_{s \times 1}\} \text{ up to } f(Y^T \mid \phi_0)\text{-null set},\ \mathrm{diag}(Q'\Sigma_{tr}^{-1}) \geq 0_{n \times 1} \right\}. \]
The unconditional likelihood model (24) and the conditional likelihood model (23) are globally identified (i.e., there are no observationally equivalent parameter points to $(\phi_0, Q_0)$) if and only if $Q^*$ is a singleton. If the parameter of interest is an impulse response to the $j$th structural shock, $\eta_{i,j,h}(\phi, Q)$, as defined in (18), then $\eta_{i,j,h}(\phi_0, Q_0)$ is point-identified if the projection of $Q^*$ onto its $j$th column vector is a singleton.

Proof. See Appendix B.

[Footnote: $(\phi, Q) \neq (\phi_0, Q_0)$ is observationally equivalent to $(\phi_0, Q_0)$ if $p(Y^T, D_N = d \mid \phi, Q) = p(Y^T, D_N = d \mid \phi_0, Q_0)$ holds for all $Y^T$ and $d \in \{0, 1\}$.]

This proposition provides a necessary and sufficient condition for global identification of SVARs by NR. As shown in the proof in Appendix B, $Q^*$ defined in this proposition corresponds to the observationally equivalent $Q$ matrices given $\phi = \phi_0$, but, importantly, it does not correspond to any flat region of the observed likelihood (the conditional identified set in Definition 4.1 below).

To illustrate this point, consider the simple bivariate example of Section 2 with the NR (4), where $y_t$ itself is the reduced-form error, so $U$ in Proposition 4.1 can be set to $y_k$. Given $\phi$, the set of $y_k \in \mathbb{R}^2$ satisfying the NR is the half-space given by
\[ \left\{ y_k \in \mathbb{R}^2 : (\sigma_{11}\sigma_{22})^{-1} \left(\sigma_{22}\cos\theta - \sigma_{21}\sin\theta,\ \sigma_{11}\sin\theta\right) y_k \geq 0 \right\}. \tag{29} \]
The condition for point-identification shown in Proposition 4.1 is satisfied if no $\theta' \neq \theta$ can generate the half-space of $y_k$ identical to (29).
Such $\theta'$ cannot exist, since a half-space passing through the origin, $(a_1, a_2) y_k \geq 0$, is indexed by the slope $a_1/a_2$, and (29) implies that the slope $\sigma_{11}^{-1}\left(\sigma_{22}(\tan\theta)^{-1} - \sigma_{21}\right)$ is a bijective map of $\theta$ on a constrained domain due to the sign normalization. Figure 3 plots the Hellinger distances in this bivariate example under the shock-sign restriction (4) and the historical decomposition restriction. For both the conditional and unconditional likelihood, the Hellinger distances are minimized uniquely at the true $\theta$, which is consistent with our point-identification claim for $\theta$.

[Footnote: Under the restriction on the historical decomposition, a notable difference between the conditional and unconditional likelihood cases is the slope of the Hellinger distance around the minimum. The Hellinger distance of the unconditional likelihood yields a steeper slope than the conditional likelihood. This indicates the loss of information for $\theta$ in the conditional likelihood due to conditioning on the non-ancillary event.]

[Figure 3: Hellinger Distance. Panels: Shock-sign; Historical Decomposition. Notes: $T = 3$ and $\phi$ is known; Hellinger distances are approximated using Monte Carlo.]

Proposition 4.1 also provides conditions under which $(\phi, Q)$ is not globally identified, but a particular impulse response is. To give an example of this, consider an SVAR with $n > 2$ and a shock-sign restriction on the first structural shock in period $k$. Given $\phi$, the set of $u_k \in \mathbb{R}^n$ satisfying the NR is a half-space defined by $q_1' \Sigma_{tr}^{-1} u_k \geq 0$. The set of $u_k$ satisfying this inequality is indexed uniquely by $q_1$ given $\Sigma_{tr}$ at its true value, so there are no values of $Q$ that are observationally equivalent to $Q_0$ with $q_1 \neq Q_0 e_{1,n}$. Any value for the remaining $n-1$ columns of $Q$ such that they are orthogonal to $Q_0 e_{1,n}$ will generate the same half-space for $u_k$, so $Q^*$ is not a singleton and the SVAR is not globally identified. However, the projection of $Q^*$ onto its first column is a singleton, so $\eta_{i,1,h}(\phi, Q)$ is globally identified.

Although a single NR can deliver global identification in the frequentist sense, the practical implication of this theoretical claim is not obvious. The observed unconditional likelihood is almost always flat at the maximum, so we cannot obtain a unique maximum likelihood estimator for the structural parameter. As a result, the standard asymptotic approximation of the sampling distribution of the maximum likelihood estimator is not applicable. The SVAR model with NR possesses features of set-identified models from the Bayesian standpoint (i.e., flat regions of the likelihood). However, strictly speaking, it can be classified as a globally identified model in the frequentist sense when the condition of Proposition 4.1 holds.

It is well-known that traditional sign restrictions deliver set-identification of $Q$ (or, equivalently, the structural parameters). Given the reduced-form parameter $\phi$, which is point-identified, there are multiple observationally equivalent values of $Q$, in the sense that there exist $Q$ and $\tilde{Q} \neq Q$ such that $p(y^T \mid \phi, Q) = p(y^T \mid \phi, \tilde{Q})$ for every $y^T$ in the sample space. The identified set for $Q$ given $\phi$ contains all such observationally equivalent parameter points, and is defined as
\[ \mathcal{Q}(\phi \mid S) = \left\{ Q \in O(n) : S(\phi, Q) \geq 0_{\tilde{s} \times 1},\ \mathrm{diag}(Q'\Sigma_{tr}^{-1}) \geq 0_{n \times 1} \right\}. \tag{30} \]
The identified set is a set-valued map only of $\phi$, which carries all the information about $Q$ contained in the data.

The complication in applying this definition of the identified set in SVARs when there are NR is that the reduced-form VAR parameters no longer represent all information about $Q$ contained in the data; by truncating the likelihood, the realizations of the data entering the NR contain additional information about $Q$. To address this, we introduce a refinement of the definition of an identified set.

Definition 4.1.
Let $N \equiv \{N(\phi, Q, y^T) \geq 0_{s \times 1}\}$ represent a set of NR in terms of the parameters and the data.

(i) The conditional identified set for $Q$ under NR is
\[ \mathcal{Q}(\phi \mid y^T, N) = \left\{ Q \in O(n) : N(\phi, Q, y^T) \geq 0_{s \times 1} \right\}. \tag{31} \]
The conditional identified set for the impulse response $\eta = \eta_{i,j,h}(\phi, Q)$ under NR is defined by projecting $\mathcal{Q}(\phi \mid y^T, N)$ via $\eta_{i,j,h}(\phi, Q)$:
\[ CIS_\eta(\phi \mid y^T, N) = \left\{ \eta_{i,j,h}(\phi, Q) : Q \in \mathcal{Q}(\phi \mid y^T, N) \right\}. \tag{32} \]

(ii) Let $s : \mathcal{Y} \to \mathbb{R}^S$ be a statistic. We call $s(Y^T)$ a sufficient statistic for the conditional identified set $\mathcal{Q}(\phi \mid y^T, N)$ if the conditional identified set for $Q$ depends on the sample $y^T$ through $s(y^T)$; i.e., there exists $\tilde{\mathcal{Q}}(\phi \mid \cdot, N)$ such that
\[ \mathcal{Q}(\phi \mid y^T, N) = \tilde{\mathcal{Q}}(\phi \mid s(y^T), N) \tag{33} \]
holds for all $\phi \in \Phi$ and $y^T \in \mathcal{Y}$.

Unlike the standard identified set $\mathcal{Q}(\phi \mid S)$, the conditional identified set $\mathcal{Q}(\phi \mid y^T, N)$ depends on the sample $y^T$ because of the aforementioned data-dependent support of the likelihood. In terms of the observed likelihood, however, they share the property that the likelihood is flat on the (conditional) identified set. Hence, given the sample $y^T$ and the reduced-form parameters $\phi$, all values of $Q$ in $\mathcal{Q}(\phi \mid y^T, N)$ fit the data equally well and, in this particular sense, they are observationally equivalent.

When the NR concern shocks in only a subset of the time periods in the data, the conditional identified set under these NR depends on the sample only through a few observations entering the NR. The sufficient statistics $s(y^T)$ defined in Definition 4.1(ii) represent such observations. For instance, in the toy example of Section 2.1, the conditional identified set depends only on the observations in period $k$, so $s(y^T) = y_k$. If we extend the example of Section 2.1 to the SVAR($p$), the shock-sign restriction in Equation (4) can be expressed as
\[ \varepsilon_{1k} = e_{1,2}' A_0 u_k = e_{1,2}' Q' \Sigma_{tr}^{-1} (y_k - Bx_k) \geq 0. \tag{34} \]
Hence, the conditional identified set $\mathcal{Q}(\phi \mid y^T, N)$ depends on the data only through $(y_k', x_k')' = (y_k', y_{k-1}', \cdots, y_{k-p}')'$, so we can set $s(y^T) = (y_k', y_{k-1}', \cdots, y_{k-p}')'$.

If the conditional distribution of $Y^T$ given $s(Y^T) = s(y^T)$ is nondegenerate, we can consider a frequentist experiment (repeated sampling of $Y^T$) conditional on the sufficient statistics set to the observed value. In this conditional experiment, we can view the conditional identified set $\mathcal{Q}(\phi \mid y^T, N)$ as the standard identified set in set-identified models, since it no longer depends on the data in the conditional experiment where $s(y^T)$ is fixed. This is the reason that we refer to $\mathcal{Q}(\phi \mid y^T, N)$ as the conditional identified set. In Section 6 below, we show the frequentist validity of the robust-Bayes credible region by establishing conditional coverage of the conditional identified set for an impulse response.

This section presents approaches to conducting posterior inference in SVARs under NR. Section 5.1 discusses how to modify the standard Bayesian approach in AR18 to use the unconditional likelihood rather than the conditional likelihood. Section 5.2 explains how to conduct robust Bayesian inference under NR, which further addresses the issue of posterior sensitivity due to the flat unconditional likelihood. Section 5.3 describes how to numerically implement the robust Bayesian procedure.
AR18 propose an algorithm for drawing from the uniform-normal-inverse-Wishart posterior of $(\phi, Q)$ given a set of traditional sign restrictions and NR. This is the posterior induced by a normal-inverse-Wishart prior over $\phi$ and an unconditionally uniform prior over $Q$. The algorithm proceeds by drawing $\phi$ from a normal-inverse-Wishart distribution and $Q$ from a uniform distribution over $O(n)$, and checking whether the restrictions are satisfied. If the restrictions are not satisfied, the joint draw is discarded and another draw is made. If the restrictions are satisfied, the ex ante probability that the NR are satisfied at the drawn parameter values is approximated via Monte Carlo simulation. Once the desired number of draws satisfying the restrictions has been obtained, the draws are resampled with replacement using as importance weights the inverse of the probability that the NR are satisfied.

This algorithm essentially draws from the posterior under the unconditional likelihood and then uses importance sampling to transform these draws into draws from the posterior given the conditional likelihood. To draw from the uniform-normal-inverse-Wishart posterior using the unconditional likelihood to construct the posterior, one therefore simply needs to omit the importance-sampling step from this algorithm. Approximating the probability used to construct the importance weights requires Monte Carlo integration, which can be computationally expensive, particularly when the NR constrain the structural shocks in multiple periods. Omitting the importance-sampling step can therefore ease the computational burden of drawing from the posterior.
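The Monte Carlo step behind the importance weights can be illustrated in a bivariate model. The sketch below is our own illustration rather than AR18's implementation: the covariance matrix, the two rotation angles and all function names are hypothetical. It estimates the ex ante probability $r(\phi, Q)$ by simulating standard normal structural shocks, once for a shock-sign restriction and once for a 'most important contributor' restriction at horizon zero, and then forms normalized weights proportional to $1/r(\phi, Q)$.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix; its first column is q_1 = (cos theta, sin theta)'."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def ex_ante_probability(restriction, n, n_sim=200_000, seed=0):
    """Monte Carlo estimate of r(phi, Q): simulate eps_k ~ N(0, I_n) and
    average the indicator that the narrative restriction holds."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_sim, n))
    return float(np.mean(restriction(eps)))

def shock_sign(eps):
    """First structural shock in period k is non-negative."""
    return eps[:, 0] >= 0

def most_important_contributor(Sigma_tr, Q):
    """Shock 1 contributes most to the period-k change in variable 1 (h = 0)."""
    impact = (Sigma_tr @ Q)[0, :]              # e_1' Sigma_tr q_j for each shock j
    def restriction(eps):
        contrib = np.abs(impact * eps)         # |H_{1,j,k,k}| for each simulated eps_k
        return contrib[:, 0] >= contrib[:, 1:].max(axis=1)
    return restriction

# Hypothetical reduced-form covariance and two candidate rotations.
Sigma_tr = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 2.0]]))
thetas = [0.3, 1.0]

r_sign = ex_ante_probability(shock_sign, n=2)
r_hd = np.array([
    ex_ante_probability(most_important_contributor(Sigma_tr, rotation(t)), n=2)
    for t in thetas
])

# Importance weights proportional to 1 / r: the draw with the smaller ex ante
# probability receives the larger weight under the conditional likelihood.
weights = (1.0 / r_hd) / np.sum(1.0 / r_hd)
```

The shock-sign probability does not vary with $Q$, so the corresponding weights would be constant and the resampling step would leave the draws unchanged. Under the historical-decomposition restriction the probability changes with the rotation angle (in this two-variable setup it equals $1 - 2\theta/\pi$ for $\theta \in (0, \pi/2)$), which is the source of the reweighting toward values of $Q$ with low ex ante probability.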
However, as discussed above, standard Bayesian inference under the unconditional likelihood may be sensitive to the choice of conditional prior for $Q \mid \phi$, because the likelihood possesses flat regions.

By rejecting draws that do not satisfy the restrictions, the algorithm described above places more weight on draws of $\phi$ that are more likely to satisfy the restrictions under the uniform distribution over $O(n)$, because such draws are accepted more often. As discussed in Uhlig (2017), one may instead prefer to use a prior that is conditionally uniform over $Q \mid \phi$. To draw from the posterior of $(\phi, Q)$ under the unconditional likelihood given an arbitrary prior over $\phi$ and a conditionally uniform prior over $Q \mid \phi$, one can repeat Step 2 of Algorithm 1 in Section 5.3.

[Footnote: Based on the results in Arias et al. (2018), AR18 argue that their algorithm draws from a normal-generalized-normal posterior over the SVAR's structural parameters $(A_0, A_+)$ induced by a conjugate normal-generalized-normal prior, conditional on the restrictions.]

This section explains how to conduct robust Bayesian inference about a scalar-valued function of the structural parameters under NR and traditional sign restrictions. The approach can be viewed as performing global sensitivity analysis to assess whether posterior inference is sensitive to the choice of conditional prior for $Q \mid \phi$. We take the object of interest to be an impulse response, $\eta$, although the discussion in this section also applies to any other scalar-valued function of the structural parameters, such as the forecast error variance decomposition or the historical decomposition.

Let $\pi_\phi$ be a prior over the reduced-form parameter $\phi \in \Phi$, where $\Phi$ is the space of reduced-form parameters such that $\mathcal{Q}(\phi \mid S)$ is non-empty. A joint prior for $(\phi, Q) \in \Phi \times O(n)$ can be written as $\pi_{\phi, Q} = \pi_{Q \mid \phi}\, \pi_\phi$, where $\pi_{Q \mid \phi}$ is supported only on $\mathcal{Q}(\phi \mid S)$. When there are only traditional identifying restrictions, $\pi_{Q \mid \phi}$ is not updated by the data, because the likelihood function is not a function of $Q$.
Posterior inference may therefore be sensitive to the choice of conditional prior, even asymptotically. As discussed above, a similar issue arises under NR. The difference under NR is that $\pi_{Q \mid \phi}$ is updated by the data through the truncation points of the unconditional likelihood. However, at each value of $\phi$, the unconditional likelihood is flat over the set of values of $Q$ satisfying the NR. Consequently, the conditional posterior for $Q \mid \phi, Y^T$ is proportional to the conditional prior for $Q \mid \phi$ at each $\phi$ whenever the conditional identified set for $Q$ given $(\phi, Y^T)$ is nonempty.

Rather than specifying a single prior for $Q \mid \phi$, the robust Bayesian approach of GK considers the class of all priors for $Q \mid \phi$ that are consistent with the traditional identifying restrictions:
\[ \Pi_{Q \mid \phi} = \left\{ \pi_{Q \mid \phi} : \pi_{Q \mid \phi}(\mathcal{Q}(\phi \mid S)) = 1 \right\}. \tag{35} \]
Notice that we cannot impose the NR using a particular conditional prior on $Q \mid \phi$ due to the data-dependent mapping from $\phi$ to $Q$. However, by considering all possible conditional priors for $Q \mid \phi$ that are consistent with the traditional identifying restrictions, we trace out all possible conditional posteriors for $Q \mid \phi, Y^T$ that are consistent with the traditional identifying restrictions and the NR. This is because the NR truncate the unconditional likelihood function and the traditional identifying restrictions truncate the prior for $Q \mid \phi$, so the posterior for $Q \mid \phi, Y^T$ is supported only on the values of $Q$ that satisfy both sets of restrictions.

Given a particular prior for $(\phi, Q)$ and using the unconditional likelihood, the posterior is
\[ \pi_{\phi, Q \mid Y^T, D_N = 1} \propto p(Y^T, D_N = 1 \mid \phi, Q)\, \pi_{Q \mid \phi}\, \pi_\phi \propto f(Y^T \mid \phi)\, D_N(\phi, Q, Y^T)\, \pi_\phi\, \pi_{Q \mid \phi} \propto \pi_{\phi \mid Y^T}\, \pi_{Q \mid \phi}\, D_N(\phi, Q, Y^T). \tag{36} \]
The final expression for the posterior makes it clear that any prior for $Q \mid \phi$ that is consistent with the traditional identifying restrictions is in effect further truncated by the NR (through the likelihood) once the data are realized.
Generating this posterior using every prior within the class of priors for $Q \mid \phi$ generates a class of posteriors for $(\phi, Q)$:
\[ \Pi_{\phi, Q \mid Y^T, D_N = 1} = \left\{ \pi_{\phi, Q \mid Y^T, D_N = 1} = \pi_{\phi \mid Y^T}\, \pi_{Q \mid \phi}\, D_N(\phi, Q, Y^T) : \pi_{Q \mid \phi} \in \Pi_{Q \mid \phi} \right\}. \tag{37} \]
Marginalizing each posterior in this class of posteriors induces a class of posteriors for $\eta$, $\Pi_{\eta \mid Y^T, D_N = 1}$. Each prior within the class of priors $\Pi_{Q \mid \phi}$ therefore induces a posterior for $\eta$. Associated with each of these posteriors are quantities such as the posterior mean, median and other quantiles. For example, as we consider each possible prior within $\Pi_{Q \mid \phi}$, we can trace out the set of all possible posterior means for $\eta$. This will always be an interval, so we can summarize this 'set of posterior means' by its endpoints:
\[ \left[ \int_\Phi l(\phi, Y^T)\, d\pi_{\phi \mid Y^T},\ \int_\Phi u(\phi, Y^T)\, d\pi_{\phi \mid Y^T} \right], \tag{38} \]
where $l(\phi, Y^T) = \inf\{\eta(\phi, Q) : Q \in \mathcal{Q}(\phi \mid Y^T, N, S)\}$, $u(\phi, Y^T) = \sup\{\eta(\phi, Q) : Q \in \mathcal{Q}(\phi \mid Y^T, N, S)\}$ and
\[ \mathcal{Q}(\phi \mid Y^T, N, S) = \mathcal{Q}(\phi \mid S) \cap \mathcal{Q}(\phi \mid Y^T, N) \tag{39} \]
is the set of values of $Q$ that are consistent with the traditional identifying restrictions and the NR. In contrast, in GK the set of posterior means is obtained by finding the infimum and supremum of $\eta(\phi, Q)$ over $\mathcal{Q}(\phi \mid S)$ and averaging these over $\pi_{\phi \mid Y^T}$. The important difference from GK is that the current set of posterior means depends on the data not only through the posterior for $\phi$ but also through the set of admissible values of $Q$ under the NR. As a result, in contrast with GK, we cannot interpret the set of posterior means (38) as a consistent estimator for the identified set for $\eta$.

One can also report a robust credible region with credibility $\alpha$, which is the shortest interval estimate for $\eta$ such that the posterior probability put on the interval is greater than or equal to $\alpha$ uniformly over the posteriors in $\Pi_{\eta \mid Y^T, D_N = 1}$ (see Proposition 1 of GK).
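Given draws of the bounds $[l(\phi_m, Y^T), u(\phi_m, Y^T)]$ at posterior draws $\phi_m$, both summaries reduce to simple array operations. The following is a minimal sketch under stated assumptions: the bounds are simulated placeholders rather than output from an SVAR, the function names are our own, and the search over interval centres uses a hypothetical grid instead of a numerical optimizer. An interval centred at $\eta$ with radius $z$ contains $[l_m, u_m]$ exactly when $\max\{|\eta - l_m|, |\eta - u_m|\} \leq z$, so the smallest region with credibility $\alpha$ minimizes the empirical $\alpha$-quantile of that distance over $\eta$.

```python
import numpy as np

def set_of_posterior_means(lower, upper):
    """Endpoints of the set of posterior means: averages of the bounds."""
    return float(np.mean(lower)), float(np.mean(upper))

def robust_credible_region(lower, upper, alpha, grid_size=1001):
    """Approximate the shortest interval containing [l_m, u_m] for at least
    a fraction alpha of the draws, searching over a grid of centres."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    grid = np.linspace(lower.min(), upper.max(), grid_size)   # candidate centres
    # d[g, m]: radius needed at centre grid[g] to cover [l_m, u_m].
    d = np.maximum(np.abs(grid[:, None] - lower[None, :]),
                   np.abs(grid[:, None] - upper[None, :]))
    # Order statistic with at least ceil(alpha * M) draws at or below it.
    idx = int(np.ceil(alpha * lower.size)) - 1
    z = np.sort(d, axis=1)[:, idx]
    best = int(np.argmin(z))
    return float(grid[best] - z[best]), float(grid[best] + z[best])

# Placeholder bounds (not estimates from any data set).
rng = np.random.default_rng(1)
centre = rng.normal(0.0, 0.1, size=500)
lower, upper = centre - 0.5, centre + 0.5

means = set_of_posterior_means(lower, upper)
lo, hi = robust_credible_region(lower, upper, alpha=0.9)
```

Using an order statistic rather than an interpolated quantile keeps the empirical coverage at or above $\alpha$; a finer grid or a one-dimensional optimizer would tighten the approximation to the minimizing centre.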
One may also be interested in posterior lower and upper probabilities, which are the infimum and supremum, respectively, of the probability for a hypothesis over all posteriors in the class.

GK provide conditions under which their robust Bayesian approach has a valid frequentist interpretation, in the sense that the robust credible region is an asymptotically valid confidence set for the true identified set. For the same reason as mentioned above, however, frequentist validity of the robust credible region does not immediately extend to the NR case. We provide conditions under which the robust credible region has a valid frequentist interpretation in Section 6.

This section describes a general algorithm to implement our robust Bayesian procedure under NR. GK propose numerical algorithms for conducting robust Bayesian inference in SVARs identified using traditional sign and zero restrictions. Their Algorithm 1 uses a numerical optimization routine to obtain the lower and upper bounds of the identified set at each draw of $\phi$. Obtaining the bounds via numerical optimization is not generally applicable under the class of NR considered in AR18, since the constraints on the historical decomposition are not differentiable everywhere in $Q$. We therefore adapt Algorithm 2 of GK, which approximates the bounds of the identified set at each draw of $\phi$ using Monte Carlo simulation.

Algorithm 1.
Let $N(\phi, Q, Y^T) \geq 0_{s \times 1}$ be the set of NR and let $S(\phi, Q) \geq 0_{\tilde{s} \times 1}$ be the set of traditional sign restrictions (excluding the sign normalization). Assume the object of interest is $\eta_{i,j^*,h} = c_{i,h}'(\phi) q_{j^*}$.

• Step 1: Specify a prior for $\phi$, $\pi_\phi$, and obtain the posterior $\pi_{\phi \mid Y^T}$.

• Step 2: Draw $\phi$ from $\pi_{\phi \mid Y^T}$ and check whether $\mathcal{Q}(\phi \mid Y^T, N, S)$ is empty using the subroutine below.

  – Step 2.1: Draw an $n \times n$ matrix of independent standard normal random variables, $Z$, and let $Z = \tilde{Q}R$ be the QR decomposition of $Z$.

  – Step 2.2: Define
  \[ Q = \left[ \mathrm{sgn}\left((\Sigma_{tr}^{-1} e_{1,n})' \tilde{q}_1\right) \frac{\tilde{q}_1}{\|\tilde{q}_1\|},\ \ldots,\ \mathrm{sgn}\left((\Sigma_{tr}^{-1} e_{n,n})' \tilde{q}_n\right) \frac{\tilde{q}_n}{\|\tilde{q}_n\|} \right], \]
  where $\tilde{q}_j$ is the $j$th column of $\tilde{Q}$.

  – Step 2.3: Check whether $Q$ satisfies $N(\phi, Q, Y^T) \geq 0_{s \times 1}$ and $S(\phi, Q) \geq 0_{\tilde{s} \times 1}$. If so, retain $Q$ and proceed to Step 3. Otherwise, repeat Steps 2.1 and 2.2 (up to a maximum of $L$ times) until $Q$ is obtained satisfying the restrictions. If no draws of $Q$ satisfy the restrictions, approximate $\mathcal{Q}(\phi \mid Y^T, N, S)$ as being empty and return to Step 2.

• Step 3: Repeat Steps 2.1–2.3 until $K$ draws of $Q$ are obtained. Let $\{Q_k, k = 1, \ldots, K\}$ be the $K$ draws of $Q$ that satisfy the restrictions and let $q_{j^*,k}$ be the $j^*$th column of $Q_k$. Approximate $[l(\phi, Y^T), u(\phi, Y^T)]$ by $[\min_k c_{i,h}'(\phi) q_{j^*,k},\ \max_k c_{i,h}'(\phi) q_{j^*,k}]$.

• Step 4: Repeat Steps 2–3 $M$ times to obtain $[l(\phi_m, Y^T), u(\phi_m, Y^T)]$ for $m = 1, \ldots, M$. Approximate the set of posterior means using the sample averages of $l(\phi_m, Y^T)$ and $u(\phi_m, Y^T)$.

• Step 5: To obtain an approximation of the smallest robust credible region with credibility $\alpha \in (0, 1)$, define $d(\eta, \phi, Y^T) = \max\{|\eta - l(\phi, Y^T)|, |\eta - u(\phi, Y^T)|\}$ and let $\hat{z}_\alpha(\eta)$ be the sample $\alpha$-th quantile of $\{d(\eta, \phi_m, Y^T), m = 1, \ldots, M\}$. An approximated smallest robust credible interval for $\eta_{i,j^*,h}$ is an interval centered at $\arg\min_\eta \hat{z}_\alpha(\eta)$ with radius $\min_\eta \hat{z}_\alpha(\eta)$.

[Footnote to Steps 2.1–2.2: This is the algorithm used by Rubio-Ramírez et al. (2010) to draw from the uniform distribution over $O(n)$, except that we do not normalize the diagonal elements of $R$ to be positive. This is because we impose a sign normalization based on the diagonal elements of $A_0 = Q'\Sigma_{tr}^{-1}$ in Step 2.2.]

Algorithm 1 approximates $[l(\phi, Y^T), u(\phi, Y^T)]$ at each draw of $\phi$ via Monte Carlo simulation. The approximated set will be too narrow given a finite number of draws of $Q$, but the approximation error will vanish as the number of draws goes to infinity. The algorithm may be computationally demanding when the restrictions substantially truncate $\mathcal{Q}(\phi \mid Y^T, N, S)$, because many draws of $Q$ from $O(n)$ may be rejected at each draw of $\phi$. However, the same draws of $Q$ can be used to compute $l(\phi, Y^T)$ and $u(\phi, Y^T)$ for different objects of interest, which cuts down on computation time. For example, the same draws of $Q$ can be used to compute the impulse responses of all variables to all shocks at all horizons of interest. They can also be used to compute other parameters by replacing $\eta_{i,j^*,h}$ with some other function, such as the forecast error variance decomposition, an element of $A_0$, the historical decomposition or the structural shocks themselves in particular periods. Step 3 is parallelizable, so reductions in computing time are possible by distributing computation across multiple processors. Other algorithms may be computationally more efficient than Algorithm 1 in particular cases. We discuss these in Appendix D.
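Steps 1 to 4 of Algorithm 1 can be sketched compactly. The example below is illustrative only and is not the authors' implementation. It assumes a bivariate model with no lags or constant (so $u_t = y_t$), for which a Jeffreys prior on $\Sigma$ gives an inverse-Wishart posterior with scale $\sum_t y_t y_t'$ and $T$ degrees of freedom; the data are simulated, the period-$k$ observation is set by hand, a single shock-sign restriction is imposed, and the `max_tries` cap is our own safeguard rather than part of the algorithm. The object of interest is the impact response of the first variable to the first shock.

```python
import numpy as np

def draw_Q(Sigma_tr, rng):
    """Steps 2.1-2.2: QR-based draw from O(n) with sign-normalized columns."""
    n = Sigma_tr.shape[0]
    Q_tilde, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # j-th entry is (Sigma_tr^{-1} e_j)' q_tilde_j, the j-th diagonal element of A_0.
    diag_A0 = np.sum(np.linalg.inv(Sigma_tr) * Q_tilde, axis=0)
    # Columns of Q_tilde already have unit norm, so only the sign is adjusted.
    return Q_tilde * np.where(diag_A0 < 0, -1.0, 1.0)

def bounds_at_phi(c_ih, Sigma_tr, j_star, satisfied, K, L, rng, max_tries=10_000):
    """Steps 2.3-3: (l, u) from up to K accepted draws of Q; None if the
    first L draws all violate the restrictions (set treated as empty)."""
    values = []
    for tries in range(1, max_tries + 1):
        Q = draw_Q(Sigma_tr, rng)
        if satisfied(Q):
            values.append(float(c_ih @ Q[:, j_star]))
            if len(values) == K:
                break
        elif not values and tries >= L:
            return None
    return (min(values), max(values)) if values else None

# Hypothetical data: y_t = u_t with a known 'true' covariance (illustration only).
rng = np.random.default_rng(0)
T, k, M = 100, 10, 100
Sigma_tr_true = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 0.5]]))
Y = rng.standard_normal((T, 2)) @ Sigma_tr_true.T
Y[k] = Sigma_tr_true @ np.array([1.5, 0.2])      # period-k innovation, set by hand

# Step 1 (assumed): Sigma | Y ~ inverse-Wishart(Y'Y, T) under a Jeffreys prior.
chol_S_inv = np.linalg.cholesky(np.linalg.inv(Y.T @ Y))

lower, upper = [], []
while len(lower) < M:                             # Step 4: M draws of phi
    X = rng.standard_normal((T, 2)) @ chol_S_inv.T
    Sigma_tr = np.linalg.cholesky(np.linalg.inv(X.T @ X))   # Step 2: draw phi
    v = np.linalg.solve(Sigma_tr, Y[k])                     # Sigma_tr^{-1} u_k
    out = bounds_at_phi(Sigma_tr[0], Sigma_tr, 0,
                        lambda Q, v=v: v @ Q[:, 0] >= 0,    # shock-sign NR (20)
                        K=50, L=1000, rng=rng)
    if out is not None:
        lower.append(out[0])
        upper.append(out[1])

posterior_mean_set = (float(np.mean(lower)), float(np.mean(upper)))
```

The restriction check is passed in as a function of $Q$, so other NR or traditional sign restrictions can be combined inside `satisfied` without changing the sampler. Step 5 then operates on the stored `lower` and `upper` arrays.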
In this section, we show that the robust Bayes credible region attains asymptotically valid frequentist coverage in a setting where the number of NR is small relative to the length of the sampled periods in a sense that we make precise in the next assumption. This assumption is empirically relevant given that applications typically impose these restrictions in at most a handful of periods.
Assumption 6.1. (fixed-dimensional $s(Y^T)$): The conditional identified set under NR has sufficient statistics $s(Y^T)$, as defined in Definition 4.1(ii), and the dimension of $s(Y^T)$ does not depend on $T$.

Let $(\phi_0, Q_0)$ be the true parameter values. We view the sample $Y^T$ as being drawn from $p(Y^T|\phi_0)$. Let $p(Y^T|\phi_0, s)$ be the conditional distribution of the sample $Y^T$ given the sufficient statistics for the conditional identified set $s = s(Y^T)$ at $\phi = \phi_0$. We denote by $p(s|\phi_0)$ the distribution of the sufficient statistics $s(Y^T)$ at $\phi = \phi_0$. The next assumption requires that, in the conditional experiment given $s(Y^T)$, the sampling distribution for the maximum likelihood estimator $\hat{\phi} \equiv \arg\max_{\phi} p(Y^T|\phi)$ centered at $\phi_0$ and the posterior for $\phi$ centered at $\hat{\phi}$ asymptotically coincide.

Footnote: Impulse responses to a unit shock, rather than a standard-deviation shock, can be computed as in Algorithm 3 of Giacomini et al. (2019).

Assumption 6.2. (Conditional Bernstein-von Mises property for $\phi$): For $p(s|\phi_0)$-almost every $s$ and $p(Y^T|\phi_0, s)$-almost every sampling sequence $Y^T$, the posterior for $\sqrt{T}(\phi - \hat{\phi})$ asymptotically coincides with the sampling distribution of $\sqrt{T}(\hat{\phi} - \phi_0)$ with respect to $p(Y^T|\phi_0, s)$, as $T \to \infty$, in the sense stated in Assumption 5(i) in GK.

This is a key assumption for establishing the asymptotic frequentist validity of the robust credible region under NR. It holds, for instance, when $s(y^T)$ corresponds to one or a few observations in the whole sample, as we had in the toy example of Section 2.1. In this case, the influence of $s(y^T)$ vanishes in the conditional sampling distribution of $\sqrt{T}(\hat{\phi} - \phi_0)$ as $T \to \infty$, as the latter asymptotically agrees with the asymptotically normal sampling distribution for the maximum likelihood estimator with variance-covariance matrix given by the inverse of the Fisher information matrix. By the well-known Bernstein-von Mises theorem for regular parametric models, the posterior for $\sqrt{T}(\phi - \hat{\phi})$ asymptotically agrees with this sampling distribution.

The last assumption requires convexity and smoothness of the conditional identified set, and is analogous to Assumption 5(ii) of GK for standard set-identified models.

Assumption 6.3. (Almost-sure convexity and smoothness of the impulse response identified set): Let $\widetilde{CIS}_{\eta}(\phi|s(Y^T), N)$ be the conditional identified set for $\eta$ with the sufficient statistics $s(Y^T)$. For $p(Y^T|\phi_0)$-almost every $Y^T$, $\widetilde{CIS}_{\eta}(\phi_0|s(y^T), N)$ is closed and convex, $\widetilde{CIS}_{\eta}(\phi_0|s(y^T), N) = [\tilde{\ell}(\phi_0, s(Y^T)), \tilde{u}(\phi_0, s(Y^T))]$, and its lower and upper bounds are differentiable in $\phi$ at $\phi = \phi_0$ with nonzero derivatives.

Propositions B.1–B.3 in Appendix B provide primitive conditions for Assumption 6.3 to hold in the case where there are shock-sign restrictions. Imposing Assumptions 6.1, 6.2 and 6.3, we obtain the following theorem.

Theorem 6.4.
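For $\alpha \in (0, 1)$, let $\widehat{C}^*_{\alpha}$ be the volume-minimizing robust credible region for $\eta$ with credibility $\alpha$, which satisfies
$$\inf_{\pi \in \Pi_{\phi, Q|Y^T, D_N = 1}} \pi(\widehat{C}^*_{\alpha}) = \pi_{\phi|Y^T, D_N = 1}\left(CIS_{\eta}(\phi|Y^T, N) \subset \widehat{C}^*_{\alpha} \,\middle|\, Y^T, D_N = 1\right) = \alpha. \tag{40}$$
Under Assumptions 6.1, 6.2, and 6.3, $\widehat{C}^*_{\alpha}$ attains asymptotically valid coverage for the true impulse response, $\eta_0$, conditional on $s(Y^T)$:
$$\liminf_{T \to \infty} P_{Y^T|s, \phi_0}\left(\eta_0 \in \widehat{C}^*_{\alpha} \,\middle|\, s(Y^T), \phi_0\right) \geq \lim_{T \to \infty} P_{Y^T|s, \phi_0}\left(\widetilde{CIS}_{\eta}(\phi_0|s(Y^T), N) \subset \widehat{C}^*_{\alpha} \,\middle|\, s(Y^T), \phi_0\right) = \alpha. \tag{41}$$
Accordingly, $\widehat{C}^*_{\alpha}$ attains asymptotically valid coverage for $\eta_0$ unconditionally,
$$\liminf_{T \to \infty} P_{Y^T|\phi_0}\left(\eta_0 \in \widehat{C}^*_{\alpha} \,\middle|\, \phi_0\right) \geq \lim_{T \to \infty} P_{Y^T|\phi_0}\left(\widetilde{CIS}_{\eta}(\phi_0|s(Y^T), N) \subset \widehat{C}^*_{\alpha} \,\middle|\, \phi_0\right) = \alpha. \tag{42}$$

Proof.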
See Appendix B.

This theorem shows that the robust credible region of GK applied to the SVAR model with NR attains asymptotically valid frequentist coverage for the true impulse response as well as the conditional impulse-response identified set. Even if the point-identification condition of Proposition 4.1 holds for the impulse response, it is not obvious whether the standard Bayesian credible region can attain frequentist coverage. This is because the Bernstein-von Mises theorem does not seem to hold for the impulse response due to the non-standard features of models with NR.

Footnote: The volume-minimizing robust credible region $\widehat{C}^*_{\alpha}$ is defined as a shortest interval among the connected intervals $C_{\alpha}$ satisfying $P_{Y^T|s, \phi_0}(\widetilde{CIS}_{\eta}(\phi_0|s(Y^T), N) \subset C_{\alpha} \,|\, s(Y^T), \phi_0) \geq \alpha$. See Proposition 1 in GK for a procedure to compute the volume-minimizing credible region.

One could also consider asymptotics under an increasing number of restrictions. We conjecture that, under certain assumptions about how the NR are generated, the class of posteriors for $\eta$ will converge to a point mass at its true value. An implication is that the posterior mean under any conditional prior for $Q$ that places probability one on the identified set would be consistent for the true value. This result would be an interesting contrast to the case under traditional set-identifying restrictions, where the location of the posterior mean within the identified set is determined purely by the conditional prior and the posterior quantiles lie strictly within the identified set (e.g., Moon and Schorfheide (2012)). We do not analyze this case here, since the assumption that there is a fixed number of restrictions seems to be of primary interest. However, Appendix C provides numerical evidence in support of our conjecture. We leave formal investigation of this conjecture for future work.
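The smallest robust credible interval can be approximated from simulated bounds along the lines of Step 5 of Algorithm 1. The sketch below is a minimal illustration under stated assumptions: the function name `robust_credible_interval` is hypothetical, the minimization over $\eta$ is done by a simple grid search (an assumption, not necessarily the procedure in GK), and the sample quantile uses NumPy's default linear interpolation.

```python
import numpy as np


def robust_credible_interval(lower, upper, alpha=0.68, grid_size=2001):
    """Approximate the shortest interval containing [l_m, u_m] for a share alpha of draws.

    lower, upper: arrays of l(phi_m, Y^T) and u(phi_m, Y^T), m = 1, ..., M.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Candidate centers eta (grid search is an illustrative choice).
    grid = np.linspace(lower.min(), upper.max(), grid_size)
    # d[g, m] = max(|eta_g - l_m|, |eta_g - u_m|)
    d = np.maximum(np.abs(grid[:, None] - lower[None, :]),
                   np.abs(grid[:, None] - upper[None, :]))
    # z[g] = sample alpha-quantile of d(eta_g, phi_m) across draws m
    z = np.quantile(d, alpha, axis=1)
    best = int(np.argmin(z))
    return grid[best] - z[best], grid[best] + z[best]
```

The interval is centered at the grid point minimizing $\hat{z}_{\alpha}(\eta)$ with radius equal to the minimized value, so a finer grid gives a closer approximation at higher memory cost.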
AR18 estimate the effects of monetary policy shocks on the US economy using a combination of sign restrictions on impulse responses and NR. The reduced-form VAR is the same as that used in Uhlig (2005). The model’s endogenous variables are real GDP, the GDP deflator, a commodity price index, total reserves, non-borrowed reserves (all in natural logarithms) and the federal funds rate; see Arias, Caldara and Rubio-Ramírez (2019) for details on the variables. The data are monthly and run from January 1965 to November 2007. The VAR includes 12 lags and a constant.

As NR, AR18 impose that the monetary policy shock in October 1979 was positive and that it was the overwhelming contributor to the unexpected change in the federal funds rate in that month. This was the month in which the Federal Reserve markedly and unexpectedly increased the federal funds rate following the appointment of Paul Volcker as chairman of the Federal Reserve, and is widely considered to be an example of a positive monetary policy shock (e.g., Romer and Romer (1989)). The traditional sign restrictions considered in Uhlig (2005) are also imposed. Specifically, the response of the federal funds rate is restricted to be non-negative for $h = 0, 1, \ldots$ [...] for $h = 0, 1, \ldots$ [...]. The prior for the reduced-form parameters is $\pi_{\phi} = \pi_{B, \Sigma} \propto |\Sigma|^{-\frac{n+1}{2}}$, which is truncated so that the VAR is stable. The posterior for the reduced-form parameters, $\pi_{\phi|Y^T}$, is then a normal-inverse-Wishart distribution, from which it is straightforward to obtain independent draws (for example, see Del Negro and Schorfheide (2011)). We obtain 1,000 draws from the posterior of $\phi$ such that the VAR is stable and $\mathcal{Q}(\phi|Y^T, N, S)$ is non-empty. We use Algorithm 1 with $K = 10{,}000$ draws of $Q$ at each draw of $\phi$ to approximate $l(\phi, Y^T)$ and $u(\phi, Y^T)$. If we cannot obtain a draw of $Q$ satisfying the restrictions after 100,000 draws of $Q$, we approximate $\mathcal{Q}(\phi|Y^T, N, S)$ as being empty at that draw of $\phi$.

We explore the sensitivity of posterior inference to the choice of prior for $Q|\phi$ when the unconditional likelihood is used to construct the posterior. For brevity, we report only the impulse responses of the federal funds rate and real GDP to a positive standard-deviation monetary policy shock (Figure 4). As a point of comparison, we report results obtained using a conditionally uniform prior for $Q|\phi$. Under this prior, the 68 per cent highest posterior density credible intervals for the response of real GDP exclude zero at horizons greater than a year or so. In contrast, the 68 per cent robust credible intervals include zero at all horizons. Under the single prior, the posterior probability that the output response is negative two years after the shock is 95 per cent. In contrast, the posterior lower probability of this event – the smallest probability over the class of posteriors generated by the class of priors – is only 54 per cent. The results suggest that posterior inference about the effect of monetary policy on output can be sensitive to the choice of (unrevisable) prior for $Q|\phi$.

AR18 also consider an alternative set of restrictions. Specifically, they impose that the monetary policy shock was: positive in April 1974, October 1979, December 1988 and February 1994; negative in December 1990, October 1998, April 2001 and November 2002; and the most important contributor to the observed unexpected change in the federal funds rate in these months.
The choice of these dates is based on a synthesis of information from different sources, including the chronology of monetary policy actions from Romer and Romer (1989), an updated series of the monetary policy shocks constructed using Greenbook forecasts in Romer and Romer (2004), the high-frequency monetary policy surprises from Gürkaynak, Sack and Swanson (2005), and minutes from Federal Open Market Committee meetings. Under this extended set of restrictions, the set of posterior means and the robust credible interval are tightened noticeably, particularly at shorter horizons (Figure 5). The posterior lower probability of a negative output response two years after the shock is now 80 per cent, compared with 54 per cent under the October 1979 restrictions.

Footnote: The results are not directly comparable to those presented in Figure 6 of AR18. First, we present responses to a standard-deviation shock, whereas AR18 describe their responses as being to a 25 basis point shock (although, from close inspection of their Figure 6, it is evident that this normalization is not imposed correctly, because the impact response of the federal funds rate fans out around zero). Second, we use a prior for $Q$ that is conditionally uniform given $\phi$, whereas AR18 use a prior that is unconditionally uniform.

Figure 4: Impulse Responses to a Monetary Policy Shock
[Panels: Federal Funds Rate (bps) and Real GDP (%), each plotted against horizon (months).]

Notes: Circles and dashed lines are, respectively, posterior means and 68 per cent (point-wise) highest posterior density intervals under the uniform prior for $Q|\phi$; vertical bars are sets of posterior means and solid lines are 68 per cent (point-wise) robust credible regions obtained using Algorithm 1 with 10,000 draws from $\mathcal{Q}(\phi|Y^T, N, S)$; results are based on 1,000 draws from the posterior of $\phi$ with nonempty $\mathcal{Q}(\phi|Y^T, N, S)$; impulse responses are to a standard-deviation shock.

Finally, we investigate how posterior inference about the output response is affected by replacing AR18’s extended set of restrictions with a shock-rank restriction. Specifically, we estimate the set of output responses that are consistent with the restriction that the monetary policy shock in October 1979 was the largest positive realization of the monetary policy shock in the sample period. This restriction appears plausible given that the change in the federal funds rate in October 1979 was more positive than the change in the federal funds rate in the other periods identified by AR18 as containing notable monetary policy shocks (Table 1). The shock-rank restriction somewhat shrinks the set of posterior means and robust credible regions relative to those obtained under the restrictions on the historical decomposition. Nevertheless, the two sets of restrictions lead to similar (robust) posterior inferences about the output response. The 68 per cent robust credible intervals include zero at all horizons under both sets of restrictions. The posterior lower probability that output falls two years after the shock is 73 per cent under the shock-rank restriction, compared with 80 per cent under the restriction on the historical decomposition.

Footnote: The large number of inequality constraints and tight conditional identified set induced by the shock-rank restriction pose computational challenges when using Algorithm 1. Accordingly, we use an alternative algorithm to obtain the results. The algorithm adapts an algorithm in Amir-Ahmadi and Drautzburg (2021) and is described in Appendix D.

Table 1: Monthly Change in Federal Funds Rate (ppt)

Oct 79   Apr 74   Dec 88   Feb 94   Dec 90   Oct 98   Apr 01   Nov 02
2.34     1.16     0.41     0.20     –0.50    –0.44    –0.51    –0.41

Source: FRED
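The posterior lower probabilities quoted in this section can be approximated directly from the simulated bounds. The sketch below assumes arrays `lower` and `upper` holding $l(\phi_m, Y^T)$ and $u(\phi_m, Y^T)$ across posterior draws; the function name and the toy numbers are hypothetical and do not reproduce the paper's results.

```python
import numpy as np


def probability_bounds_negative(lower, upper):
    """Posterior lower and upper probabilities of the event {eta < 0}.

    The lower probability is the share of draws whose whole set [l_m, u_m]
    lies below zero; the upper probability is the share of draws whose set
    contains at least one negative value.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    lower_prob = float(np.mean(upper < 0))
    upper_prob = float(np.mean(lower < 0))
    return lower_prob, upper_prob


# Toy bounds for four hypothetical posterior draws.
toy_lower = [-1.0, -0.5, -0.2, 0.1]
toy_upper = [-0.5, 0.2, -0.1, 0.3]
toy_probs = probability_bounds_negative(toy_lower, toy_upper)
```

Under any single prior for $Q|\phi$, the posterior probability of a negative response lies between these two numbers, which is why the gap between them measures prior sensitivity.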
Figure 5: Impulse Responses to a Monetary Policy Shock – Extended Restrictions vs Shock-rank Restriction
[Panels: Federal Funds Rate (bps) and Real GDP (%), each plotted against horizon (months, 0 to 60); series: shock-rank restriction and restrictions on the historical decomposition.]

Notes: Solid lines represent set of posterior means and dashed lines represent 68 per cent (pointwise) robust credible regions; results are based on 1,000 draws from the posterior of $\phi$ with nonempty $\mathcal{Q}(\phi|Y^T, N, S)$; results under shock-rank restriction are obtained using Algorithm D.1; results under restrictions on the historical decomposition are obtained using Algorithm 1 with 1,000 draws from $\mathcal{Q}(\phi|Y^T, N, S)$; impulse responses are to a standard-deviation shock.

In general, $\mathcal{Q}(\phi|Y^T, N, S)$ may be empty at particular values of $\phi$. The proportion of draws of $\phi$ where $\mathcal{Q}(\phi|Y^T, N, S)$ is empty can therefore be used to assess the plausibility of the restrictions (see GK). Under the October 1979 restrictions, the posterior plausibility of the restrictions is one (i.e., every draw of $\phi$ has a nonempty conditional identified set). In contrast, the posterior plausibility under AR18’s extended set of restrictions is 53 per cent, while it is only 17 per cent under the shock-rank restriction.

Directly restricting the values of structural shocks to be consistent with historical narratives offers a potentially useful approach to disciplining SVARs, but raises novel issues related to identification and inference. These restrictions generate a set-valued mapping from the model’s reduced-form parameters to its structural parameters that depends on the realization of the data entering the restrictions. This means that these restrictions do not fit neatly into the existing framework for analyzing identification in SVARs. In particular, we show that these restrictions may be point-identifying in a frequentist sense. We also highlight issues associated with existing standard Bayesian approaches to estimation and inference. Conditioning on the restrictions holding may result in the posterior placing more weight on parameters that yield a lower ex ante probability that the restrictions are satisfied.
We therefore advocate using the unconditional likelihood when constructing the posterior. However, the observed unconditional likelihood will almost always possess flat regions, which implies that a component of the prior will not be updated by the data. Posterior inference may therefore be sensitive to the choice of prior. To address this, we provide robust Bayesian tools to assess or eliminate the sensitivity of posterior inference to the choice of prior. We also provide conditions under which these tools have a valid frequentist interpretation, so our approach should appeal to both Bayesians and frequentists.

While we focus on SVARs in the paper, our analysis could be extended to other settings. For example, Plagborg-Møller and Wolf (in press[a]) explain how to impose traditional SVAR identifying restrictions in the local projection framework under the assumption that the structural shocks are invertible. In Appendix E we briefly discuss how NR could also be imposed within the local projection framework, but we leave a formal analysis of this problem to future research.

Appendices
A Bivariate example derivations
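Set of values of $\theta$ under shock-sign restriction. This section derives analytical expressions for the set of values of $\theta$ consistent with the shock-sign restriction in the bivariate example of Section 2. Throughout, we assume that $\theta \in [-\pi, \pi]$.

Under the shock-sign restriction $\varepsilon_{1k} \geq 0$ and the sign normalization $\operatorname{diag}(A_0) \geq 0_{2 \times 1}$, $\theta$ is restricted to lie in the set
$$\theta \in \left\{\theta : \sigma_{21}\sin\theta \leq \sigma_{22}\cos\theta,\ \cos\theta \geq 0,\ \sigma_{22}y_{1k}\cos\theta \geq (\sigma_{21}y_{1k} - \sigma_{11}y_{2k})\sin\theta\right\} \cup \left\{\theta : \sigma_{21}\sin\theta \leq \sigma_{22}\cos\theta,\ \cos\theta \leq 0,\ \sigma_{22}y_{1k}\cos\theta \geq (\sigma_{21}y_{1k} - \sigma_{11}y_{2k})\sin\theta\right\}. \tag{A.1}$$

Consider the case where $\sigma_{21} < 0$ and $\sigma_{21}y_{1k} - \sigma_{11}y_{2k} < 0$. Then $\theta$ is restricted to the set
$$\theta \in \left\{\theta : \tan\theta \geq \frac{\sigma_{22}}{\sigma_{21}},\ \cos\theta > 0,\ \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}} \leq \tan\theta\right\} \cup \left\{\frac{\pi}{2}\right\} \cup \left\{\theta : \tan\theta \leq \frac{\sigma_{22}}{\sigma_{21}},\ \cos\theta < 0,\ \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}} \geq \tan\theta\right\}. \tag{A.2}$$
The inequalities in the first set hold if and only if $\tan\theta \geq \max\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}$ and $\theta \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$, which implies that
$$\arctan\left(\max\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}\right) \leq \theta < \frac{\pi}{2}. \tag{A.3}$$
The inequalities in the second set hold if and only if $\tan\theta \leq \min\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}$ and $\theta \in \left[-\pi, -\frac{\pi}{2}\right) \cup \left(\frac{\pi}{2}, \pi\right]$. Since $\sigma_{21} < 0$, $\tan\theta$ must be negative, which implies that $\theta \in \left(\frac{\pi}{2}, \pi\right]$. It follows that
$$\frac{\pi}{2} < \theta \leq \pi + \arctan\left(\min\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}\right). \tag{A.4}$$
Taking the union of (A.3), (A.4) and $\left\{\frac{\pi}{2}\right\}$ implies that
$$\theta \in \left[\arctan\left(\max\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}\right),\ \pi + \arctan\left(\min\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}\right)\right]. \tag{A.5}$$

Next, consider the case where $\sigma_{21} < 0$ and $\sigma_{21}y_{1k} - \sigma_{11}y_{2k} > 0$. Then $\theta$ is restricted to the set
$$\theta \in \left\{\theta : \tan\theta \geq \frac{\sigma_{22}}{\sigma_{21}},\ \cos\theta > 0,\ \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}} \geq \tan\theta\right\} \cup \left\{\theta : \tan\theta \leq \frac{\sigma_{22}}{\sigma_{21}},\ \cos\theta < 0,\ \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}} \leq \tan\theta\right\}. \tag{A.6}$$
If $y_{1k} > 0$, or if $y_{1k} < 0$ and $\frac{\sigma_{22}}{\sigma_{21}} < \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}$, the second set of inequalities is not satisfied for any $\theta$, while the first set of inequalities is satisfied for
$$\theta \in \left[\arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right),\ \arctan\left(\frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right)\right]. \tag{A.7}$$
If $y_{1k} < 0$ and $\frac{\sigma_{22}}{\sigma_{21}} > \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}$, the first set of inequalities has no solution and the second set is satisfied for
$$\theta \in \left[\pi + \arctan\left(\frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right),\ \pi + \arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right)\right]. \tag{A.8}$$

In the case where $\sigma_{21} > 0$ and $\sigma_{21}y_{1k} - \sigma_{11}y_{2k} < 0$, $\theta$ is restricted to the set
$$\theta \in \left\{\theta : \tan\theta \leq \frac{\sigma_{22}}{\sigma_{21}},\ \cos\theta > 0,\ \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}} \leq \tan\theta\right\} \cup \left\{\theta : \tan\theta \geq \frac{\sigma_{22}}{\sigma_{21}},\ \cos\theta < 0,\ \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}} \geq \tan\theta\right\}. \tag{A.9}$$
If $y_{1k} > 0$, or if $y_{1k} < 0$ and $\frac{\sigma_{22}}{\sigma_{21}} > \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}$, the second set of inequalities has no solution, while the first is satisfied for
$$\theta \in \left[\arctan\left(\frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right),\ \arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right)\right]. \tag{A.10}$$
If $y_{1k} < 0$ and $\frac{\sigma_{22}}{\sigma_{21}} < \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}$, the first set of inequalities has no solution and the second set is satisfied for
$$\theta \in \left[-\pi + \arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right),\ -\pi + \arctan\left(\frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right)\right]. \tag{A.11}$$

Finally, in the case where $\sigma_{21} > 0$ and $\sigma_{21}y_{1k} - \sigma_{11}y_{2k} > 0$, $\theta$ is restricted to the set
$$\theta \in \left\{\theta : \tan\theta \leq \frac{\sigma_{22}}{\sigma_{21}},\ \cos\theta > 0,\ \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}} \geq \tan\theta\right\} \cup \left\{-\frac{\pi}{2}\right\} \cup \left\{\theta : \tan\theta \geq \frac{\sigma_{22}}{\sigma_{21}},\ \cos\theta < 0,\ \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}} \leq \tan\theta\right\}. \tag{A.12}$$
The first set of inequalities holds if and only if $\tan\theta \leq \min\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}$ and $\theta \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$, which implies that
$$-\frac{\pi}{2} < \theta \leq \arctan\left(\min\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}\right). \tag{A.13}$$
The second set of inequalities holds if and only if $\tan\theta \geq \max\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}$ and $\theta \in \left[-\pi, -\frac{\pi}{2}\right) \cup \left(\frac{\pi}{2}, \pi\right]$. Since $\sigma_{21} > 0$, $\tan\theta$ must be positive, which implies that $\theta \in \left[-\pi, -\frac{\pi}{2}\right)$. It follows that
$$-\pi + \arctan\left(\max\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}\right) \leq \theta < -\frac{\pi}{2}. \tag{A.14}$$
Taking the union of (A.13), (A.14) and $\left\{-\frac{\pi}{2}\right\}$ implies that
$$\theta \in \left[-\pi + \arctan\left(\max\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}\right),\ \arctan\left(\min\left\{\frac{\sigma_{22}}{\sigma_{21}}, \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}\right)\right]. \tag{A.15}$$

Set of values of $\eta$ under shock-sign restriction. Here we derive the expression for the set of impulse responses $\eta \equiv \sigma_{11}\cos\theta$ consistent with the shock-sign restriction (i.e., (8) in Section 2).

In the absence of restrictions, the set of admissible values for the matrix of contemporaneous impulse responses is
$$A_0^{-1} \in \left\{\begin{bmatrix} \sigma_{11}\cos\theta & -\sigma_{11}\sin\theta \\ \sigma_{21}\cos\theta + \sigma_{22}\sin\theta & \sigma_{22}\cos\theta - \sigma_{21}\sin\theta \end{bmatrix} : \theta \in [-\pi, \pi]\right\} \cup \left\{\begin{bmatrix} \sigma_{11}\cos\theta & \sigma_{11}\sin\theta \\ \sigma_{21}\cos\theta + \sigma_{22}\sin\theta & \sigma_{21}\sin\theta - \sigma_{22}\cos\theta \end{bmatrix} : \theta \in [-\pi, \pi]\right\}. \tag{A.16}$$
Assume that $\sigma_{21} < 0$, $\sigma_{21}y_{1k} - \sigma_{11}y_{2k} > 0$ and $y_{1k} > 0$. Within the interval for $\theta$ defined in (A.7), $\eta$ is maximized at $\theta = 0$, so $\eta_{ub} = \sigma_{11}$. The lower bound $\eta_{lb}$ occurs at one of the endpoints of the interval for $\theta$, so it satisfies
$$\begin{aligned}
\eta_{lb} &= \min\left\{\sigma_{11}\cos\left(\arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right)\right),\ \sigma_{11}\cos\left(\arctan\left(\frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right)\right)\right\} \\
&= \min\left\{\sigma_{11}\cos\left(-\arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right)\right),\ \sigma_{11}\cos\left(\arctan\left(\frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right)\right)\right\} \\
&= \min\left\{\sigma_{11}\cos\left(\arctan\left(-\frac{\sigma_{22}}{\sigma_{21}}\right)\right),\ \sigma_{11}\cos\left(\arctan\left(\frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right)\right)\right\} \\
&= \sigma_{11}\cos\left(\max\left\{\arctan\left(-\frac{\sigma_{22}}{\sigma_{21}}\right),\ \arctan\left(\frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right)\right\}\right) \\
&= \sigma_{11}\cos\left(\arctan\left(\max\left\{-\frac{\sigma_{22}}{\sigma_{21}},\ \frac{\sigma_{22}y_{1k}}{\sigma_{21}y_{1k} - \sigma_{11}y_{2k}}\right\}\right)\right).
\end{aligned} \tag{A.17}$$
The second line follows from the fact that $\cos(\cdot)$ is an even function and the third line follows from the fact that $\arctan(\cdot)$ is an odd function. The arguments entering the $\cos(\cdot)$ functions on the third line are both in the interval $\left[0, \frac{\pi}{2}\right)$, so the fourth line follows from the fact that $\cos(\cdot)$ is a decreasing function over this domain. The final line follows from the fact that $\arctan(\cdot)$ is an increasing function.

Restriction on the historical decomposition. Under the restrictions that the first structural shock is positive in period $k$ and was the most important (or overwhelming) contributor to the change in the first variable, $\theta$ is restricted to lie in the set
$$\begin{aligned}
\theta \in\ &\Big\{\theta : \sigma_{21}\sin\theta \leq \sigma_{22}\cos\theta,\ \cos\theta \geq 0,\ \sigma_{22}y_{1k}\cos\theta \geq (\sigma_{21}y_{1k} - \sigma_{11}y_{2k})\sin\theta, \\
&\quad |\sigma_{22}y_{1k}\cos^2\theta + (\sigma_{11}y_{2k} - \sigma_{21}y_{1k})\cos\theta\sin\theta| \geq |\sigma_{22}y_{1k}\sin^2\theta + (\sigma_{21}y_{1k} - \sigma_{11}y_{2k})\cos\theta\sin\theta|\Big\} \\
\cup\ &\Big\{\theta : \sigma_{21}\sin\theta \leq \sigma_{22}\cos\theta,\ \cos\theta \leq 0,\ \sigma_{22}y_{1k}\cos\theta \geq (\sigma_{21}y_{1k} - \sigma_{11}y_{2k})\sin\theta, \\
&\quad |\sigma_{22}y_{1k}\cos^2\theta + (\sigma_{11}y_{2k} - \sigma_{21}y_{1k})\cos\theta\sin\theta| \geq |\sigma_{22}y_{1k}\sin^2\theta + (\sigma_{21}y_{1k} - \sigma_{11}y_{2k})\cos\theta\sin\theta|\Big\}.
\end{aligned} \tag{A.18}$$
As in the case of the shock-sign restriction, this set also depends on the data $y_k$ independently of the reduced-form parameters.

B Omitted proofs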
Proof of Proposition 4.1.
Proof. $H(\phi, Q)$ can be written as
$$H(\phi, Q) = \int_{\mathcal{Y}} f^{1/2}(y^T|\phi) f^{1/2}(y^T|\phi_0) \cdot D_N(\phi, Q, y^T) D_N(\phi_0, Q_0, y^T)\, dy^T + \int_{\mathcal{Y}} f^{1/2}(y^T|\phi) f^{1/2}(y^T|\phi_0) \cdot (1 - D_N(\phi, Q, y^T))(1 - D_N(\phi_0, Q_0, y^T))\, dy^T.$$
Note that the likelihood for the reduced-form parameters $f(y^T|\phi)$ point-identifies $\phi$, so $f(\cdot|\phi) = f(\cdot|\phi_0)$ holds only at $\phi = \phi_0$. Hence, we set $\phi = \phi_0$ and consider $H(\phi_0, Q)$,
$$H(\phi_0, Q) = \int_{\{y^T : D_N(\phi_0, Q, y^T) = D_N(\phi_0, Q_0, y^T)\}} f(y^T|\phi_0)\, dy^T.$$
Hence, $H(\phi_0, Q) = 1$ if and only if $D_N(\phi_0, Q, y^T) = D_N(\phi_0, Q_0, y^T)$ holds $f(Y^T|\phi_0)$-a.s. In terms of the reduced-form residuals entering the NR, the latter condition is equivalent to $\{U : N(\phi_0, Q, Y^T) \geq 0_{s \times 1}\} = \{U : N(\phi_0, Q_0, Y^T) \geq 0_{s \times 1}\}$ up to an $f(Y^T|\phi_0)$-null set. Hence, $\mathcal{Q}^*$ defined in the proposition collects observationally equivalent values of $Q$ at $\phi = \phi_0$ in terms of the unconditional likelihood.

Next, consider the conditional likelihood and
$$H_c(\phi_0, Q) = \frac{1}{r^{1/2}(\phi_0, Q)\, r^{1/2}(\phi_0, Q_0)} \int_{\mathcal{Y}} f(y^T|\phi_0) \cdot D_N(\phi_0, Q, y^T) D_N(\phi_0, Q_0, y^T)\, dy^T = \frac{E_{Y^T|\phi_0}\left[D_N(\phi_0, Q, Y^T) D_N(\phi_0, Q_0, Y^T)\right]}{r^{1/2}(\phi_0, Q)\, r^{1/2}(\phi_0, Q_0)} \leq 1,$$
where the inequality follows from the Cauchy-Schwarz inequality and holds with equality if and only if $D_N(\phi_0, Q, Y^T) = D_N(\phi_0, Q_0, Y^T)$ holds $f(Y^T|\phi_0)$-a.s. Hence, by repeating the argument for the unconditional likelihood case, we conclude that $\mathcal{Q}^*$ consists of observationally equivalent values of $Q$ at $\phi = \phi_0$ in terms of the conditional likelihood.

Proof of Theorem 6.4.
Since $(\phi_0, Q_0)$ satisfies the imposed NR $N(\phi_0, Q_0, y^T) \geq 0_{s \times 1}$ and the other sign restrictions (if any are imposed), $\eta_0 \in \widetilde{CIS}_{\eta}(\phi_0|s(y^T), N)$ holds for any $y^T$. Hence, for all $T$,
$$P_{Y^T|s, \phi_0}\left(\eta_0 \in \widehat{C}^*_{\alpha} \,\middle|\, s(Y^T), \phi_0\right) \geq P_{Y^T|s, \phi_0}\left(\widetilde{CIS}_{\eta}(\phi_0|s(Y^T), N) \subset \widehat{C}^*_{\alpha} \,\middle|\, s(Y^T), \phi_0\right). \tag{B.1}$$
Hence, to prove the claim, it suffices to focus on the asymptotic behavior of the coverage probability for the conditional identified set on the right-hand side. Under Assumptions 6.2 and 6.3, the asymptotically correct coverage for the conditional identified set can be obtained by applying Proposition 2 in GK.

Primitive Conditions for Assumption 6.3.
In what follows, we present sufficient conditions for convexity, continuity and differentiability (both in $\phi$) of the conditional impulse-response identified set under the assumption that there is a fixed number of shock-sign restrictions constraining the first structural shock only (possibly in multiple periods).

Proposition B.1.
Convexity.
Let the parameter of interest be $\eta_{i,1,h}$, the impulse response of the $i$th variable at the $h$th horizon to the first structural shock. Assume that there are shock-sign restrictions on $\varepsilon_{1,t}$ for $t = t_1, \ldots, t_K$, so $N(\phi, Q, Y^T) = (\Sigma_{tr}^{-1}u_{t_1}, \ldots, \Sigma_{tr}^{-1}u_{t_K})' q_1 \geq 0_{K \times 1}$. Then the set of values of $\eta_{i,1,h}$ satisfying the shock-sign restrictions and sign normalization, $\{\eta_{i,1,h}(\phi, Q) = c_{i,h}'(\phi) q_1 : N(\phi, Q, Y^T) \geq 0_{K \times 1},\ \operatorname{diag}(Q'\Sigma_{tr}^{-1}) \geq 0_{n \times 1},\ Q \in \mathcal{O}(n)\}$, is convex for all $i$ and $h$ if there exists a unit-length vector $q \in \mathbb{R}^n$ satisfying
$$\begin{bmatrix} (\Sigma_{tr}^{-1}u_{t_1}, \ldots, \Sigma_{tr}^{-1}u_{t_K})' \\ (\Sigma_{tr}^{-1}e_{1,n})' \end{bmatrix} q \geq 0_{(K+1) \times 1}. \tag{B.2}$$

Proof of Proposition B.1.
If there exists a unit-length vector $q$ satisfying the inequality in (B.2), it must lie within the intersection of the $K$ half-spaces defined by the inequalities $(\Sigma_{tr}^{-1}u_{t_k})' q \geq 0$, $k = 1, \ldots, K$, the half-space defined by the sign normalization, $(\Sigma_{tr}^{-1}e_{1,n})' q \geq 0$, and the unit sphere in $\mathbb{R}^n$. The intersection of these $K + 1$ half-spaces and the unit sphere is a path-connected set. Since $\eta_{i,1,h}(\phi, Q)$ is a continuous function of $q_1$, the set of values of $\eta_{i,1,h}$ satisfying the restrictions is an interval and is thus convex, because the image of a path-connected set under a continuous real-valued function is always an interval.

Proposition B.2.
Continuity.
Let the parameter of interest and restrictions be as in Proposition B.1, and assume that the conditions in the proposition are satisfied. If there exists a unit-length vector $q \in \mathbb{R}^n$ such that, at $\phi = \phi_0$,
$$\begin{bmatrix} (\Sigma_{tr}^{-1}u_{t_1}, \ldots, \Sigma_{tr}^{-1}u_{t_K})' \\ (\Sigma_{tr}^{-1}e_{1,n})' \end{bmatrix} q >> 0_{(K+1) \times 1}, \tag{B.3}$$
then $u(\phi, Y^T)$ and $l(\phi, Y^T)$ are continuous at $\phi = \phi_0$ for all $i$ and $h$.

Proof of Proposition B.2. $Y^T$ enters the NR through the reduced-form VAR innovations, $u_t$. After noting that the reduced-form VAR innovations are (implicitly) continuous in $\phi$, continuity of $u(\phi, Y^T)$ and $l(\phi, Y^T)$ follows by the same logic as in the proof of Proposition B.2 of Giacomini and Kitagawa (in press[b]). We omit the details for brevity.

Proposition B.3.
Differentiability.
Let the parameter of interest and restrictions be as in Proposition B.1, and assume that the conditions in the proposition are satisfied. Denote the unit sphere in $\mathbb{R}^n$ by $\mathcal{S}^{n-1}$. If, at $\phi = \phi_0$, the set of solutions to the optimization problem
$$\max_{q \in \mathcal{S}^{n-1}} \left(\min_{q \in \mathcal{S}^{n-1}}\right) c_{i,h}'(\phi) q \quad \text{s.t.} \quad \left[(\Sigma_{tr}^{-1}u_{t_1}, \ldots, \Sigma_{tr}^{-1}u_{t_K}),\ \Sigma_{tr}^{-1}e_{1,n}\right]' q \geq 0_{(K+1) \times 1} \tag{B.4}$$
is a singleton, the optimized value $u(\phi, Y^T)$ ($l(\phi, Y^T)$) is nonzero, and the number of binding inequality restrictions at the optimum is at most $n - 1$, then $u(\phi, Y^T)$ ($l(\phi, Y^T)$) is almost-surely differentiable at $\phi = \phi_0$.

Footnote: For a vector $x = (x_1, \ldots, x_m)'$, $x >> 0_{m \times 1}$ means that $x_i > 0$ for all $i = 1, \ldots, m$.

Proof of Proposition B.3. A one-to-one differentiable reparameterization of the optimization problem in Equation (B.4) using $x = \Sigma_{tr} q$ yields the optimization problem in Equation (2.5) of Gafarov, Meier and Montiel-Olea (2018) with a set of inequality restrictions that are now a function of the data through the reduced-form VAR innovations entering the NR. Noting that $u_t$ is (implicitly) differentiable in $\phi$, differentiability of $u(\phi, Y^T)$ at $\phi = \phi_0$ follows from their Theorem 2 under the assumptions that, at $\phi = \phi_0$, the set of solutions to the optimization problem is a singleton, the optimized value $u(\phi, Y^T)$ is nonzero, and the number of binding sign restrictions at the optimum is at most $n - 1$. Differentiability of $l(\phi, Y^T)$ follows similarly. Note that Theorem 2 of Gafarov et al. (2018) additionally requires that the column vectors of $\left[(\Sigma_{tr}^{-1}u_{t_1}, \ldots, \Sigma_{tr}^{-1}u_{t_K}),\ \Sigma_{tr}^{-1}e_{1,n}\right]$ are linearly independent, but this occurs almost-surely under the probability law for $Y^T$.

C Asymptotics with increasing number of NR
What happens if the number of NR increases with the sample size? We conjecture that, under some assumptions about how the NR are generated, the class of posteriors for particular parameters will converge to a point mass at the truth. Intuitively, as the number of restrictions increases, the likelihood tends to be truncated to an increasing extent until the only point with positive likelihood is the true value of the parameter.

To provide some numerical evidence for this conjecture, we return to the bivariate example from Section 2. Assume that $\phi = \phi_0$ is known (which will be the case asymptotically under regularity conditions), so $\pi_{\phi|Y^T}$ is a point mass at $\phi = \phi_0$. Consider the case where the econometrician observes $\operatorname{sgn}(\varepsilon_{1t})$ for all $t$ and imposes the shock-sign restriction $\operatorname{sgn}(\varepsilon_{1t})\,\varepsilon_{1t}(\phi_0, Q, y_t) \geq 0$ for $t = 1, \ldots, T$. Figure C.1 plots the conditional identified set for $q_1$ for different numbers of restrictions given random realizations of a time series drawn from the data-generating process. As the sample size increases, the boundaries of the half-spaces generated by the binding shock-sign restrictions converge towards the true value of $q_1$, $q_{1,0}$. Additionally, the conditional identified set for $q_2$ will converge to its true value, since $q_2$ is orthogonal to $q_1$ and satisfies a sign normalization. In other words, imposing a growing number of shock-sign restrictions on a single shock is sufficient for the posterior of all impulse responses to converge to a point mass at the true value. Note, however, that this will not be the case in higher-dimensional VARs, since the collapse of the conditional identified set for $q_1$ does not pin down values for $q_j$, $j = 2, \ldots, n$.

Footnote: The assumption that $\operatorname{sgn}(\varepsilon_{1t})$ is observed for all $t$ can be relaxed, so that the econometrician observes $\operatorname{sgn}(\varepsilon_{1t})$ each period with some probability.

Footnote: Equivalently, we could show convergence of the conditional identified set for $\theta$ to $\theta_0$, which pins down all impulse responses.

Figure C.1: Illustration of Posterior Consistency
[Three panels, each with both axes running from −1 to 1.]

Notes: Purple line is the true value of $q_1$; orange line is boundary of half-space generated by the sign normalization; blue lines are boundaries of half-spaces generated by ‘binding’ shock-sign restrictions intersected with half-space generated by the sign normalization; red line is intersection of all half-spaces with unit circle.

D Alternative algorithms for robust Bayesian inference
Assume that the object of interest is an impulse response to the first structural shock. The upper bound of the conditional identified set for the horizon-$h$ impulse response of the $i$th variable to this shock given $\phi$ and $Y^T$ is the value function associated with the optimization problem
$$u(\phi, Y^T) = \max_{Q \in \mathcal{Q}(\phi|Y^T, N, S)} c_{i,h}'(\phi) q_1. \tag{D.1}$$
$l(\phi, Y^T)$ is obtained by minimizing the same objective function subject to the same constraints. When $N(\phi, Q, Y^T)$ and $S(\phi, Q)$ only constrain $q_1$, applying the change of variables $x = \Sigma_{tr} q_1$ yields the optimization problem in Gafarov et al. (2018) with additional inequality restrictions that are functions of $Y^T$.

Given a set of active inequality restrictions, Gafarov et al. (2018) provide an analytical expression for the value function and solution of this optimization problem. To find the bounds of the identified set, they compute these quantities for every possible combination of active restrictions and check which pair solves the optimization problem. Since the bounds are computed analytically at each set of active restrictions, this algorithm is computationally inexpensive as long as there is not a very large number of inequality restrictions. However, if $N(\phi, Q, Y^T)$ contains restrictions on the historical decomposition, all columns of $Q$ are (nonlinearly) constrained and the analytical results are no longer applicable. Similarly, the approach is not applicable when there are shock-sign or shock-rank restrictions on different structural shocks, or traditional sign restrictions on multiple columns of $Q$. This approach may also be prohibitively slow when there is a large number of restrictions, which may be the case when there are shock-rank restrictions.

Amir-Ahmadi and Drautzburg (2021) propose an algorithm to determine whether the set of admissible values for $Q$ is nonempty without recourse to random sampling from $\mathcal{O}(n)$. This algorithm can be more accurate and efficient than the simulation-based approach used in Algorithm 1, but it is applicable only when the columns of $Q$ are subject to linear inequality restrictions, which is not the case when there are restrictions on the historical decomposition. However, practitioners may not always wish to impose restrictions on the historical decomposition. Accordingly, we describe an algorithm that can be used to conduct robust Bayesian inference without recourse to rejection sampling when there are shock-rank, shock-sign and/or traditional sign restrictions on a single column of $Q$. The algorithm uses the approach in Amir-Ahmadi and Drautzburg (2021) to determine whether the conditional identified set for $q_1$ is nonempty and replaces the Monte Carlo approximation of $[l(\phi, Y^T), u(\phi, Y^T)]$ in Algorithm 1 with a numerical optimization step.

Algorithm D.1.
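Let $N(\boldsymbol{\phi}, \mathbf{Y}^T)\mathbf{q}_1 \geq \mathbf{0}_{s \times 1}$ be the set of NR and let $S(\boldsymbol{\phi})\mathbf{q}_1 \geq \mathbf{0}_{(\tilde{s}+1) \times 1}$ be the set of traditional sign restrictions (including the sign normalization). Assume the object of interest is $\eta_{i,1,h} = \mathbf{c}_{i,h}'(\boldsymbol{\phi})\mathbf{q}_1$. Replace Steps 2 and 3 of Algorithm 1 with the following.

• Step 2: Draw $\boldsymbol{\phi}$ from $\pi_{\boldsymbol{\phi} \mid \mathbf{Y}^T}$ and check whether the conditional identified set for $\mathbf{q}_1$ is empty by using the following subroutine.
  – Step 2.1: Solve for the Chebyshev center $\{R, \tilde{\mathbf{q}}\}$ of the set
  $$\left\{\tilde{\mathbf{q}} : (N(\boldsymbol{\phi}, \mathbf{Y}^T)', S(\boldsymbol{\phi})')'\tilde{\mathbf{q}} \geq \mathbf{0}_{(s+\tilde{s}+1) \times 1},\ |\tilde{q}_i| \leq 1,\ i = 1, \ldots, n\right\}. \tag{D.2}$$
  – Step 2.2: If $R > 0$, the conditional identified set is nonempty, so proceed to Step 3. Otherwise, repeat Step 2.
• Step 3: Compute $l(\boldsymbol{\phi}, \mathbf{Y}^T)$ by solving the following constrained optimization problem with initial value $\mathbf{q}_1 = \tilde{\mathbf{q}} / \|\tilde{\mathbf{q}}\|$:
  $$l(\boldsymbol{\phi}, \mathbf{Y}^T) = \min_{\mathbf{q}_1} \mathbf{c}_{i,h}'(\boldsymbol{\phi})\mathbf{q}_1 \quad \text{s.t.} \quad (N(\boldsymbol{\phi}, \mathbf{Y}^T)', S(\boldsymbol{\phi})')'\mathbf{q}_1 \geq \mathbf{0}_{(s+\tilde{s}+1) \times 1},\quad \mathbf{q}_1'\mathbf{q}_1 = 1. \tag{D.3}$$
  Similarly, obtain $u(\boldsymbol{\phi}, \mathbf{Y}^T)$ by maximising $\mathbf{c}_{i,h}'(\boldsymbol{\phi})\mathbf{q}_1$ subject to the same set of constraints.

[Footnote: At most $n-1$ restrictions can be active at once, so the number of combinations of active restrictions given $s$ NR and $\tilde{s}$ traditional sign restrictions is $\sum_{k=0}^{n-1}\binom{s+\tilde{s}+1}{k}$. For example, in the empirical application below, when we consider a shock-rank restriction alongside traditional sign restrictions, there are $\sum_{k=0}^{n-1}\binom{T+\tilde{s}+1}{k} = 3.\ldots \times \ldots$ combinations of active restrictions to check.]

Step 2.1 requires solving for the Chebyshev center of the set satisfying the narrative and traditional sign restrictions. The Chebyshev center $\tilde{\mathbf{q}}$ is the center of the largest ball with radius $R$ that can be inscribed within the set $\{\tilde{\mathbf{q}} : (N(\boldsymbol{\phi}, \mathbf{Y}^T)', S(\boldsymbol{\phi})')'\tilde{\mathbf{q}} \geq \mathbf{0}_{(s+\tilde{s}+1) \times 1},\ |\tilde{q}_i| \leq 1,\ i = 1, \ldots, n\}$, which is the intersection of the half-spaces generated by the inequality restrictions and the unit $n$-cube.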
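To make the feasibility check in Step 2.1 concrete, the sketch below computes a Chebyshev center for a small system of restrictions. It uses the standard fact that a ball of radius $R$ centered at $\tilde{\mathbf{q}}$ lies inside the half-space $\mathbf{Z}_k'\mathbf{q} \geq 0$ exactly when $\mathbf{Z}_k'\tilde{\mathbf{q}} - R\|\mathbf{Z}_k\| \geq 0$. As a simplifying assumption, the linear program is solved here by brute-force enumeration of vertices, which is workable only for a handful of restrictions; in practice one would pass the same constraints to an LP solver. The function names and the example restriction matrix are hypothetical and are not taken from the paper.

```python
from itertools import combinations
import math


def solve_square(A, b, tol=1e-12):
    """Solve A x = b by Gauss-Jordan elimination; return None if A is singular."""
    m = len(A)
    M = [[float(v) for v in A[r]] + [float(b[r])] for r in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, m + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[r][m] / M[r][r] for r in range(m)]


def chebyshev_center(Z, tol=1e-9):
    """Return (R, q) maximising R subject to Z_k'q - R*||Z_k|| >= 0 for every
    row Z_k of Z and |q_i| + R <= 1 (ball inside the cube)."""
    n = len(Z[0])
    rows, rhs = [], []  # every constraint is stored as a'x <= b with x = (q, R)
    for z in Z:
        norm = math.sqrt(sum(v * v for v in z))
        rows.append([-v for v in z] + [norm])  # -Z_k'q + R*||Z_k|| <= 0
        rhs.append(0.0)
    for i in range(n):
        unit = [1.0 if j == i else 0.0 for j in range(n)]
        rows.append(unit + [1.0])  # q_i + R <= 1
        rhs.append(1.0)
        rows.append([-v for v in unit] + [1.0])  # -q_i + R <= 1
        rhs.append(1.0)
    rows.append([0.0] * n + [-1.0])  # R >= 0
    rhs.append(0.0)

    best_R, best_q = 0.0, [0.0] * n  # (q, R) = (0, 0) is always feasible
    # An optimum of a bounded LP sits at a vertex, i.e. where n + 1
    # linearly independent constraints hold with equality.
    for idx in combinations(range(len(rows)), n + 1):
        x = solve_square([rows[i] for i in idx], [rhs[i] for i in idx])
        if x is None:
            continue
        feasible = all(
            sum(a * v for a, v in zip(rows[i], x)) <= rhs[i] + tol
            for i in range(len(rows))
        )
        if feasible and x[n] > best_R:
            best_R, best_q = x[n], x[:n]
    return best_R, best_q


Z = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical stacked restrictions, n = 2
R, q_tilde = chebyshev_center(Z)
q_start = None
if R > 1e-9:  # nonempty set: normalise the center to get a starting value
    length = math.sqrt(sum(v * v for v in q_tilde))
    q_start = [v / length for v in q_tilde]
```

When the radius is numerically positive, `q_start` has unit norm and satisfies the restrictions, so it can serve as the initial value for the constrained optimization in Step 3. Comparing `R` against a small tolerance rather than exactly zero is a design choice of this sketch.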
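Letting $\mathbf{Z}_k'$ be the $k$th row of $(N(\boldsymbol{\phi}, \mathbf{Y}^T)', S(\boldsymbol{\phi})')'$, the Chebyshev center and radius can be obtained as the solution to the following problem (see, for example, Boyd and Vandenberghe (2004)):

$$\max_{\{R \geq 0,\ \tilde{\mathbf{q}}\}} R \quad \text{subject to} \quad \begin{aligned} \mathbf{Z}_k'\tilde{\mathbf{q}} - R\|\mathbf{Z}_k\| &\geq 0, & k &= 1, \ldots, s + \tilde{s} + 1, \\ \tilde{q}_i + R &\leq 1, & i &= 1, \ldots, n, \\ \tilde{q}_i - R &\geq -1, & i &= 1, \ldots, n. \end{aligned}$$

The first set of constraints requires the ball of radius $R$ around $\tilde{\mathbf{q}}$ to lie inside each half-space. This is a linear program, which can be solved efficiently. If $R > 0$, then the conditional identified set for $\mathbf{q}_1$ is nonempty. If $\tilde{\mathbf{q}}$ is a Chebyshev center with $R > 0$, then $\tilde{\mathbf{q}}$ satisfies the inequality restrictions and $\|\tilde{\mathbf{q}}\| > 0$. The vector $\mathbf{q}_1 = \tilde{\mathbf{q}} / \|\tilde{\mathbf{q}}\|$ then has unit norm and satisfies the inequality restrictions.

[Footnote: The restriction that $\tilde{\mathbf{q}}$ lies within the unit $n$-cube ensures that the problem is well-defined.]

E NR in the local projection framework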
Plagborg-Møller and Wolf (in press[a]) explain how to impose typical SVAR identifying restrictions in the local projection framework. This appendix explains how to impose NR in this framework.
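As a concrete preview of the mechanics formalized in (E.3) to (E.5) below, the following sketch recovers structural shocks from a vector of Wold innovations in a bivariate system and checks a shock-sign restriction. It assumes $n = 2$, takes $\mathbf{Q}$ to be a rotation matrix indexed by an angle, and evaluates the historical decomposition only at horizon zero, where the reduced-form response is taken to be the identity matrix (as (E.2) implies at $h = 0$). All numerical values and function names are illustrative assumptions.

```python
import math


def rotation(theta):
    """2x2 rotation matrix; its columns play the role of q_1 and q_2."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def structural_shocks(sigma_tr, Q, u):
    """eps_t = Q' Sigma_tr^{-1} u_t for a lower-triangular 2x2 Sigma_tr."""
    (s11, _), (s21, s22) = sigma_tr
    # Forward substitution solves Sigma_tr w = u without forming the inverse.
    w1 = u[0] / s11
    w2 = (u[1] - s21 * w1) / s22
    # The j-th shock is the inner product of w with the j-th column of Q.
    return [Q[0][j] * w1 + Q[1][j] * w2 for j in range(2)]


def impact_contributions(sigma_tr, Q, eps):
    """H[i][j]: contribution of shock j to the innovation in variable i at h = 0.

    Assumes the horizon-zero reduced-form response is the identity matrix.
    """
    H = []
    for i in range(2):
        row = []
        for j in range(2):
            response = sigma_tr[i][0] * Q[0][j] + sigma_tr[i][1] * Q[1][j]
            row.append(response * eps[j])
        H.append(row)
    return H


sigma_tr = [[1.0, 0.0], [-0.5, 2.0]]  # hypothetical Cholesky factor
u_t = [0.4, -1.0]                     # hypothetical Wold innovation in period t
Q = rotation(0.3)                     # hypothetical candidate rotation
eps_t = structural_shocks(sigma_tr, Q, u_t)
H_t = impact_contributions(sigma_tr, Q, eps_t)

# Shock-sign restriction: the first structural shock is nonnegative in period t.
first_shock_nonnegative = eps_t[0] >= 0.0
```

Because the columns of an orthonormal matrix satisfy $\sum_j \mathbf{q}_j\mathbf{q}_j' = \mathbf{I}_n$, the contributions in each row of `H_t` add up to the corresponding element of `u_t`, which gives a quick sanity check on any implementation of the decomposition.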
Local projection framework.
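Assume the $n \times 1$ vector $\mathbf{y}_t$ is driven by an $n \times 1$ vector $\boldsymbol{\varepsilon}_t = (\varepsilon_{1t}, \ldots, \varepsilon_{nt})'$ of structural shocks:

$$\mathbf{y}_t = \boldsymbol{\mu}_y + \boldsymbol{\Theta}(L)\boldsymbol{\varepsilon}_t, \qquad \boldsymbol{\Theta}(L) \equiv \sum_{l=0}^{\infty} \boldsymbol{\Theta}_l L^l, \tag{E.1}$$

where $\{\boldsymbol{\Theta}_l\}_{l=0}^{\infty}$ is absolutely summable, $\boldsymbol{\Theta}(x)$ has full row rank for all complex scalars $x$ on the unit circle, and $\boldsymbol{\varepsilon}_t$ is independently and identically distributed with $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}_{n \times 1}$ and $E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t') = \mathbf{I}_n$.

Consider the coefficient vectors $\{\boldsymbol{\beta}_{i,h}\}$ obtained from the $n \times (H+1)$ local projections

$$y_{i,t+h} = \mu_{i,h} + \boldsymbol{\beta}_{i,h}'\mathbf{y}_t + \sum_{l=1}^{\infty} \boldsymbol{\delta}_{i,h,l}'\mathbf{y}_{t-l} + u_{i,h,t}, \tag{E.2}$$

where $i = 1, \ldots, n$ and $h = 0, 1, \ldots, H$. Let $\mathbf{C}_h = (\boldsymbol{\beta}_{1,h}, \ldots, \boldsymbol{\beta}_{n,h})'$ denote the $n \times n$ matrix of horizon-$h$ projection coefficients. Plagborg-Møller and Wolf (in press[a]) show that the elements of $\mathbf{C}_h$ are the impulse responses of $\mathbf{y}_t$ at horizon $h$ to the Wold innovations $\mathbf{u}_t = \mathbf{y}_t - \mathrm{Proj}(\mathbf{y}_t \mid \{\mathbf{y}_{t-l}\}_{l=1}^{\infty})$. The Wold innovations are equal to the residuals of the local projection at $h = 1$, so $\mathbf{u}_t = (u_{1,1,t}, \ldots, u_{n,1,t})'$. Let $\mathrm{Var}(\mathbf{u}_t) = \boldsymbol{\Sigma} = \boldsymbol{\Sigma}_{tr}\boldsymbol{\Sigma}_{tr}'$, where $\boldsymbol{\Sigma}_{tr}$ is the lower-triangular Cholesky factor of $\boldsymbol{\Sigma}$ with strictly positive diagonal elements.

Assume that the structural shocks are invertible, in the sense that they can be recovered as a linear combination of $\mathbf{u}_t$: $\boldsymbol{\varepsilon}_t = \mathbf{G}\mathbf{u}_t$, where $\mathbf{G}$ is a full-rank $n \times n$ matrix. Reparameterize $\mathbf{G}$ as $\mathbf{G}^{-1} = \boldsymbol{\Sigma}_{tr}\mathbf{Q}$, where $\mathbf{Q} \in \mathcal{O}(n)$. The diagonal elements of $\mathbf{G}$ are normalized to be nonnegative, so $\mathrm{diag}(\mathbf{Q}'\boldsymbol{\Sigma}_{tr}^{-1}) \geq \mathbf{0}_{n \times 1}$.

Let $\boldsymbol{\phi} = (\mathrm{vec}(\mathbf{C}_0)', \ldots, \mathrm{vec}(\mathbf{C}_H)', \mathrm{vech}(\boldsymbol{\Sigma}_{tr})')'$. The object of interest is the (structural) impulse response, which is an element of $\boldsymbol{\Theta}_l$. Invertibility of the shocks implies that the structural impulse responses can be obtained as rotations of the reduced-form impulse responses $\mathbf{C}_h$. In particular, given that $\mathbf{u}_t = \boldsymbol{\Sigma}_{tr}\mathbf{Q}\boldsymbol{\varepsilon}_t$, $\boldsymbol{\Theta}_h = \mathbf{C}_h\boldsymbol{\Sigma}_{tr}\mathbf{Q}$.

Imposing NR.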
The invertibility assumption implies that

$$\boldsymbol{\varepsilon}_t = \mathbf{Q}'\boldsymbol{\Sigma}_{tr}^{-1}\mathbf{u}_t. \tag{E.3}$$

The $i$th structural shock at time $t$ is therefore

$$\varepsilon_{it}(\boldsymbol{\phi}, \mathbf{Q}, \mathbf{u}_t) = \mathbf{e}_i'\mathbf{Q}'\boldsymbol{\Sigma}_{tr}^{-1}\mathbf{u}_t = (\boldsymbol{\Sigma}_{tr}^{-1}\mathbf{u}_t)'\mathbf{q}_i. \tag{E.4}$$

A shock-sign restriction is therefore a linear inequality restriction on $\mathbf{q}_i$ that depends on the reduced-form parameter $\boldsymbol{\Sigma}_{tr}$ and the Wold innovations $\mathbf{u}_t$.

The historical decomposition is the cumulative contribution of the $j$th shock to the observed unexpected change in the $i$th variable between periods $t$ and $t+h$:

$$H_{i,j,t,t+h} = \sum_{l=0}^{h} \mathbf{e}_i'\mathbf{C}_l\boldsymbol{\Sigma}_{tr}\mathbf{Q}\mathbf{e}_j\mathbf{e}_j'\boldsymbol{\varepsilon}_{t+h-l} = \sum_{l=0}^{h} \mathbf{e}_i'\mathbf{C}_l\boldsymbol{\Sigma}_{tr}\mathbf{q}_j\mathbf{q}_j'\boldsymbol{\Sigma}_{tr}^{-1}\mathbf{u}_{t+h-l}. \tag{E.5}$$

This is a function of the reduced-form parameter $\boldsymbol{\Sigma}_{tr}$, the reduced-form impulse responses $\{\mathbf{C}_l\}_{l=0}^{h}$ and the Wold innovations $\mathbf{u}_t$.

Given a set of traditional and narrative sign restrictions, the upper bound of the conditional identified set for a particular impulse response of interest, $\mathbf{e}_{i,n}'\boldsymbol{\Theta}_h\mathbf{e}_{j^*,n}$, is the solution $u(\boldsymbol{\phi}, \mathbf{Y}^T)$ of the following constrained optimization problem:

$$u(\boldsymbol{\phi}, \mathbf{Y}^T) = \max_{\mathbf{Q}} \mathbf{e}_{i,n}'\mathbf{C}_h\boldsymbol{\Sigma}_{tr}\mathbf{Q}\mathbf{e}_{j^*,n} \tag{E.6}$$

subject to

$$N(\boldsymbol{\phi}, \mathbf{Q}, \mathbf{Y}^T) \geq \mathbf{0}_{s \times 1}, \qquad \mathbf{Q}'\mathbf{Q} = \mathbf{I}_n, \qquad \mathrm{diag}(\mathbf{Q}'\boldsymbol{\Sigma}_{tr}^{-1}) \geq \mathbf{0}_{n \times 1}.$$

The lower bound of the conditional identified set is the solution of the corresponding minimization problem.

Remarks

• The key difference between this framework and the SVAR framework is that, in the SVAR framework, the reduced-form impulse responses would be obtained from the VMA representation of the reduced-form VAR. In contrast, here they are obtained directly from local projections. Otherwise, structural impulse responses are obtained by rotating reduced-form impulse responses in exactly the same way as in the SVAR.
• As in Plagborg-Møller and Wolf (in press[a]), we have assumed that there is an infinite number of lags appearing as controls in the local projections. Under this assumption, the reduced-form impulse responses will coincide with those from a VAR($\infty$) at all horizons.
The horizon-1 local projection innovations will also coincide with the one-step-ahead forecast errors from the VAR, so the covariance matrix of these innovations will coincide. Consequently, the conditional identified set for the structural impulse responses will also coincide at all horizons. See Plagborg-Møller and Wolf (in press[a]) for discussions of the finite-lag case and the choice between VARs and local projections.
• Given posterior draws of $\boldsymbol{\phi}$, one could conduct robust Bayesian inference in exactly the same way as in the SVAR case. However, obtaining the posterior of $\boldsymbol{\phi}$ requires specifying a joint prior over the parameters in $\boldsymbol{\phi}$ and the parameters governing the system of local projection residuals, which are in general serially correlated (for example, see Lusompa (2020)).

F NR as proxy variables
Plagborg-Møller and Wolf (in press[b]) point out that information about the sign of a particular structural shock can be recast as a variable that can be used to point-identify impulse responses in a proxy SVAR or local projection framework. Specifically, consider the variable that takes value one when the structural shock is known to be positive, minus one when it is known to be negative and zero otherwise. This ‘narrative proxy’ will clearly be positively correlated with the structural shock of interest. Since the proxy depends only on the structural shock of interest, it will also be contemporaneously uncorrelated with the other structural shocks. It can therefore be used to point-identify the impulse responses to the shock of interest in a proxy SVAR (e.g., Mertens and Ravn (2013) and Montiel-Olea, Stock and Watson (2020)). Since the instrument is additionally uncorrelated with leads and lags of all structural shocks, it could alternatively be used as an instrument in a local projection, which does not require assuming invertibility (e.g., Stock and Watson (2018)).

[Footnote: For a related approach, see Budnik and Rünstler (2020).]

This approach is valid when there are shock-sign restrictions only, but more generally it is unclear how one would encode the information underlying richer sets of NR (e.g., restrictions on the historical decomposition) as an instrument without discarding potentially useful identifying information. Additionally, when there are only a small number of shock-sign restrictions used to generate the instrument, the point estimator of the impulse response will be sensitive to the realization of the data in the periods corresponding to the shock-sign restrictions. We illustrate this point below using the bivariate example of Section 2.
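The sensitivity described above can also be checked numerically. The sketch below builds the narrative proxy from a single known shock sign, evaluates the sample analogue of the proxy moment condition that appears as (F.3) in the bivariate example, and compares two draws of the period-$k$ shocks. It assumes the bivariate data are generated as $\mathbf{y}_t = \boldsymbol{\Sigma}_{tr}\mathbf{Q}\boldsymbol{\varepsilon}_t$ with $\mathbf{Q}$ a rotation matrix with angle $\theta_0$, and it uses only the principal arctan branch, which matches the case treated in the text. All parameter values, shock draws and function names are hypothetical.

```python
import math


def narrative_proxy(known_signs, T):
    """Z_t = +1 or -1 where the sign of the first shock is known, 0 elsewhere."""
    return [known_signs.get(t, 0) for t in range(T)]


def simulate(theta0, s11, s21, s22, eps):
    """Map structural shocks (eps1, eps2) into data via y_t = Sigma_tr Q eps_t."""
    c, s = math.cos(theta0), math.sin(theta0)
    y1 = [s11 * (c * e1 - s * e2) for e1, e2 in eps]
    y2 = [(s21 * c + s22 * s) * e1 + (s22 * c - s21 * s) * e2 for e1, e2 in eps]
    return y1, y2


def theta_hat(Z, y1, y2, s11, s21, s22):
    """Sample analogue of the proxy moment condition (principal arctan branch)."""
    T = len(Z)
    num = sum(z * (s11 * b - s21 * a) for z, a, b in zip(Z, y1, y2)) / T
    den = s22 * sum(z * a for z, a in zip(Z, y1)) / T
    return math.atan(num / den)


def relative_response(theta, s11, s21, s22):
    """Impact response of variable 2 per unit impact response of variable 1."""
    eta1 = s11 * math.cos(theta)
    eta2 = s21 * math.cos(theta) + s22 * math.sin(theta)
    return eta2 / eta1


theta0, s11, s21, s22 = 0.6, 1.0, -0.5, 1.0  # hypothetical true parameters
k, T = 2, 4
Z = narrative_proxy({k: 1}, T)  # first shock known to be positive in period k

results = {}
for label, eps2_k in [("eps2_k = 0", 0.0), ("eps2_k = 0.8", 0.8)]:
    # Only the period-k shocks matter, because Z_t is zero in all other periods.
    eps = [(0.3, -1.1), (-0.7, 0.2), (1.0, eps2_k), (0.5, 0.9)]
    y1, y2 = simulate(theta0, s11, s21, s22, eps)
    th = theta_hat(Z, y1, y2, s11, s21, s22)
    results[label] = (th, relative_response(th, s11, s21, s22), y2[k] / y1[k])
```

In both cases the implied relative response equals `y2[k] / y1[k]`, so it moves one for one with the data in the single restricted period. The estimate of $\theta$ recovers `theta0` only when the second shock in period $k$ is zero.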
Proxy variables in the bivariate example.
Assume there is a variable $Z_t$ satisfying $E(Z_t\varepsilon_{1t}) \neq 0$ and $E(Z_t\varepsilon_{2t}) = 0$. After expressing $\varepsilon_{2t}$ in terms of $\mathbf{y}_t$ and the parameters, the exogeneity condition implies that

$$\sigma_{22}E(Z_t y_{1t})\sin\theta = E(Z_t(\sigma_{11}y_{2t} - \sigma_{21}y_{1t}))\cos\theta. \tag{F.1}$$

If the instrument is not relevant, so that $E(Z_t\varepsilon_{1t}) = 0$, the restriction $E(Z_t\varepsilon_{2t}) = 0$ carries no information about $\theta$, since $E(Z_t y_{1t}) = 0$ and $E(Z_t(\sigma_{11}y_{2t} - \sigma_{21}y_{1t})) = 0$. Otherwise,

$$\tan\theta = \frac{E(Z_t(\sigma_{11}y_{2t} - \sigma_{21}y_{1t}))}{\sigma_{22}E(Z_t y_{1t})}. \tag{F.2}$$

[Footnote: Note that the covariance between the narrative proxy and the structural shock of interest will converge to zero asymptotically when there is a fixed number of shock-sign restrictions used to generate the proxy. In this case, frequentist inference could be conducted using weak-instrument robust methods (e.g., Montiel-Olea et al. (2020)).]

This equation has two solutions in $[-\pi, \pi]$, one of which will be ruled out by the sign normalization restrictions. For example, if $\sigma_{21} < 0$ and the right-hand side of (F.2) (denoted $C$) is positive, then $\theta$ is either equal to $\arctan(C) - \pi$ or $\arctan(C)$. The sign normalization implies that $\theta \in [\arctan(\sigma_{22}/\sigma_{21}), \arctan(\sigma_{22}/\sigma_{21}) + \pi]$, which rules out the first solution, so $\theta = \arctan(C)$. If $C$ is negative, then $\theta$ is either equal to $\arctan(C)$ or $\arctan(C) + \pi$.
If $C > \sigma_{22}/\sigma_{21}$, then the sign normalization selects the first solution; otherwise it selects the second solution. Similar arguments apply when $\sigma_{21} > 0$.

[Footnote: When $\sigma_{21} > 0$, the sign normalization restricts $\theta$ to lie in $\left[-\pi + \arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right), \arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right)\right]$. If $C > \frac{\sigma_{22}}{\sigma_{21}}$, the sign normalization implies that $\theta = \arctan(C) - \pi$. Otherwise, the sign normalization implies that $\theta = \arctan(C)$.]

Consider the case where information about the sign of the first structural shock is recast as a binary variable. Specifically, as in the shock-sign example, assume the econometrician knows that $\varepsilon_{1k} \geq 0$ for some $k \in \{1, \ldots, T\}$, and let $Z_k = \mathrm{sgn}(\varepsilon_{1k})$ with $Z_t = 0$ for $t \neq k$. What happens if the econometrician imposes the identifying restriction that $E(Z_t\varepsilon_{2t}) = 0$?

Maintaining the assumption that $\boldsymbol{\phi}$ is known with $\sigma_{21} < 0$, in the case where $(\sigma_{11}y_{2k} - \sigma_{21}y_{1k}) > 0$ and $y_{1k} > 0$, an analogue estimator of $\theta$ is

$$\hat{\theta} = \arctan\left(\frac{\frac{1}{T}\sum_{t=1}^{T} Z_t(\sigma_{11}y_{2t} - \sigma_{21}y_{1t})}{\sigma_{22}\frac{1}{T}\sum_{t=1}^{T} Z_t y_{1t}}\right) = \arctan\left(\frac{\sigma_{11}y_{2k} - \sigma_{21}y_{1k}}{\sigma_{22}y_{1k}}\right). \tag{F.3}$$

Note that this is equal to the estimator that would be obtained if one were to impose the ‘narrative zero restriction’ $\varepsilon_{2k} = 0$. Additionally, $\hat{\theta}$ lies within the conditional identified set under the shock-sign restriction $\varepsilon_{1k} \geq 0$. To see this, first note that $\hat{\theta}$ lies in the range $(0, \pi/2)$, since the argument of the $\arctan(\cdot)$ function is positive by assumption. The conditional identified set for $\theta$ under the shock-sign restriction in this case is given by (A.5). The lower bound of the conditional identified set is bounded above by zero, while the upper bound is bounded below by $\pi/2$, so $\hat{\theta}$ necessarily lies within this conditional identified set.

How does this estimator relate to the true value of $\theta$? Assume that the data are generated by a process with parameter $\theta_0 \in (0, \pi/2)$ (with $\mathbf{Q}$ equal to the rotation matrix). Replacing $y_{1k}$ and $y_{2k}$ in (F.3) using $\mathbf{y}_k = \mathbf{A}_0^{-1}\boldsymbol{\varepsilon}_k$ yields an expression for $\hat{\theta}$ in terms of the true parameters and the underlying structural shocks:

$$\hat{\theta} = \arctan\Big(\big(\sigma_{22}(\sigma_{11}\cos\theta_0\,\varepsilon_{1k} - \sigma_{11}\sin\theta_0\,\varepsilon_{2k})\big)^{-1}\Big[\sigma_{11}\big[(\sigma_{21}\cos\theta_0 + \sigma_{22}\sin\theta_0)\varepsilon_{1k} + (\sigma_{22}\cos\theta_0 - \sigma_{21}\sin\theta_0)\varepsilon_{2k}\big] - \sigma_{21}(\sigma_{11}\cos\theta_0\,\varepsilon_{1k} - \sigma_{11}\sin\theta_0\,\varepsilon_{2k})\Big]\Big). \tag{F.4}$$

If $\varepsilon_{2k} = 0$, we have that $\hat{\theta} = \theta_0$. Otherwise, $\hat{\theta}$ will not in general coincide with $\theta_0$. For example, for $\varepsilon_{2k} \neq 0$ and $\varepsilon_{1k} \approx 0$, $\hat{\theta} \approx \arctan(\cot\theta_0) = \frac{\pi}{2} - \theta_0$. In this case, the impulse-response estimator is

$$\hat{\eta}_{11} = \sigma_{11}\cos\hat{\theta} \approx \sigma_{11}\cos\left(\frac{\pi}{2} - \theta_0\right) = -\sigma_{11}\sin\theta_0, \tag{F.5}$$

which is the true impulse response of the first variable to the second shock, rather than the first shock. In general, the estimator of the impulse response may be sensitive to the value of the second shock in period $k$, since it is based solely on the data in period $k$.

[Footnote: This follows from the fact that $\arctan(x) + \arctan\left(\frac{1}{x}\right) = \frac{\pi}{2}$ for $x > 0$.]

The impulse response considered above is to a standard-deviation shock in $\varepsilon_{1t}$ (i.e., an absolute impulse response). In the literature that uses proxies to identify the effects of macroeconomic shocks, it is common to use the relative impulse response, which is the impulse response to a shock that raises a particular variable by one unit. For example, the impulse response of $y_{2t}$ to a shock that raises the first variable by one unit, $\tilde{\eta}_{21}$, is the ratio of the absolute impulse response of the second variable to the absolute impulse response of the first variable. The analogue estimator of the absolute impulse response of the first variable is

$$\hat{\eta}_{11} = \sigma_{11}\cos\hat{\theta} = \sigma_{11}\cos\left(\arctan\left(\frac{\sigma_{11}y_{2k} - \sigma_{21}y_{1k}}{\sigma_{22}y_{1k}}\right)\right) = \frac{\sigma_{11}\sigma_{22}y_{1k}}{\sqrt{\sigma_{22}^2 y_{1k}^2 + (\sigma_{11}y_{2k} - \sigma_{21}y_{1k})^2}}, \tag{F.6}$$

where the last line follows from the fact that $\cos(\arctan(x)) = (1 + x^2)^{-1/2}$. The estimator for the absolute impulse response of the second variable is

$$\begin{aligned} \hat{\eta}_{21} &= \sigma_{21}\cos\hat{\theta} + \sigma_{22}\sin\hat{\theta} \\ &= \sqrt{\sigma_{21}^2 + \sigma_{22}^2}\,\cos\left(\hat{\theta} - \arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right)\right) \\ &= \sqrt{\sigma_{21}^2 + \sigma_{22}^2}\left[\cos\hat{\theta}\cos\left(\arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right)\right) + \sin\hat{\theta}\sin\left(\arctan\left(\frac{\sigma_{22}}{\sigma_{21}}\right)\right)\right] \\ &= \frac{\sigma_{11}\sigma_{22}y_{2k}}{\sqrt{\sigma_{22}^2 y_{1k}^2 + (\sigma_{11}y_{2k} - \sigma_{21}y_{1k})^2}}. \end{aligned} \tag{F.7}$$

[Footnote: This follows from the fact that $a\cos x + b\sin x = \sqrt{a^2 + b^2}\cos(x - \alpha)$ with $\tan\alpha = b/a$, $\cos(x - y) = \cos x\cos y + \sin x\sin y$, $\cos(\arctan(x)) = (1 + x^2)^{-1/2}$ and $\sin(\arctan(x)) = x(1 + x^2)^{-1/2}$.]

Consequently,

$$\tilde{\eta}_{21} = \frac{\hat{\eta}_{21}}{\hat{\eta}_{11}} = \frac{y_{2k}}{y_{1k}}. \tag{F.8}$$

The estimator of the relative impulse response will clearly also be sensitive to the realizations of the structural shocks in period $k$. Similar to above, if $\varepsilon_{2k} = 0$, then $\tilde{\eta}_{21}$ will be equal to the true relative impulse response of the second variable to the first shock. If $\varepsilon_{1k} \approx 0$ and $\varepsilon_{2k} \neq 0$, then $\tilde{\eta}_{21}$ will be approximately equal to the true relative impulse response of the second variable to the second shock.

References

Amir-Ahmadi, P. and T. Drautzburg (2021): “Identification and Inference with Ranking Restrictions,” Quantitative Economics, 12, 1–39.
Antolín-Díaz, J. and J. Rubio-Ramírez (2018): “Narrative Sign Restrictions for SVARs,” American Economic Review, 108, 2802–29.
Arias, J., D. Caldara, and J. Rubio-Ramírez (2019): “The Systematic Component of Monetary Policy in SVARs: An Agnostic Identification Procedure,” Journal of Monetary Economics, 101, 1–13.
Arias, J., J. Rubio-Ramírez, and D. Waggoner (2018): “Inference Based on Structural Vector Autoregressions Identified with Sign and Zero Restrictions: Theory and Applications,” Econometrica, 86, 685–720.
Basu, A., H. Shioya, and C. Park (2011): Statistical Inference: The Minimum Distance Approach, Chapman and Hall/CRC Press.
Baumeister, C. and J. Hamilton (2015): “Sign Restrictions, Structural Vector Autoregressions, and Useful Prior Information,” Econometrica, 83, 1963–1999.
Ben Zeev, N. (2018): “What Can We Learn About News Shocks from the Late 1990s and Early 2000s Boom-bust Period?” Journal of Economic Dynamics and Control, 87, 94–105.
Blanchard, O. and D. Quah (1989): “The Dynamic Effects of Aggregate Demand and Supply Disturbances,” The American Economic Review, 79, 655–673.
Boyd, S. and L. Vandenberghe (2004): Convex Optimization, Cambridge, United Kingdom: Cambridge University Press.
Budnik, K. and G. Rünstler (2020): “Identifying SVARs from Sparse Narrative Instruments: Dynamic Effects of U.S. Macroprudential Policies,” European Central Bank Working Paper No. 2353.
Cheng, K. and Y. Yang (2020): “Revisiting the Effects of Monetary Policy Shocks: Evidence from SVAR with Narrative Sign Restrictions,” Economics Letters, 196, 109598.
Del Negro, M. and F. Schorfheide (2011): “Bayesian Macroeconometrics,” in Oxford Handbook of Bayesian Econometrics, ed. by J. Geweke, G. Koop, and H. V. Dijk, Oxford, United Kingdom: Oxford University Press, 293–389.
Furlanetto, F. and Ø. Robstad (2019): “Immigration and the Macroeconomy: Some New Empirical Evidence,” Review of Economic Dynamics, 34, 1–19.
Gafarov, B., M. Meier, and J. Montiel-Olea (2018): “Delta-Method Inference for a Class of Set-Identified SVARs,” Journal of Econometrics, 203, 316–327.
Giacomini, R. and T. Kitagawa (in press[a]): “Robust Bayesian Inference for Set-identified Models,” Econometrica.
——— (in press[b]): “Supplement to “Robust Bayesian Inference for Set-identified Models”,” Econometrica.
Giacomini, R., T. Kitagawa, and M. Read (2019): “Robust Bayesian Inference in Proxy SVARs,” cemmap Working Paper CWP23/19.
Gürkaynak, R. S., B. Sack, and E. Swanson (2005): “Do Actions Speak Louder Than Words? The Response of Asset Prices to Monetary Policy Actions and Statements,” International Journal of Central Banking, 1, 55–93.
Hamilton, J. (1994): Time Series Analysis, Princeton, NJ: Princeton University Press.
Inoue, A. and L. Kilian (2020): “Joint Bayesian Inference about Impulse Responses in VAR Models.”
Kilian, L. and H. Lütkepohl (2017): Structural Vector Autoregressive Analysis, Cambridge, United Kingdom: Cambridge University Press.
Kilian, L. and X. Zhou (2020a): “Does Drawing Down the US Strategic Petroleum Reserve Help Stabilize Oil Prices?” Journal of Applied Econometrics, 35, 673–691.
——— (2020b): “Oil Prices, Exchange Rates and Interest Rates,” Center for Financial Studies Working Paper Series No. 646.
Laumer, S. (2020): “Government Spending and Heterogeneous Consumption Dynamics,” Journal of Economic Dynamics and Control, 114, 103868.
Ludvigson, S., S. Ma, and S. Ng (2018): “Shock Restricted Structural Vector-Autoregressions,” National Bureau of Economic Research Working Paper No. 23225.
——— (in press): “Uncertainty and Business Cycles: Exogenous Impulse or Endogenous Response?” American Economic Journal: Macroeconomics.
Lusompa, A. (2020): “Local Projections, Autocorrelation, and Efficiency.”
Mertens, K. and M. Ravn (2013): “The Dynamic Effects of Personal and Corporate Income Tax Changes in the United States,” American Economic Review, 103, 1212–47.
Montiel-Olea, J., J. Stock, and M. Watson (2020): “Inference in Structural Vector Autoregressions Identified with an External Instrument,” Journal of Econometrics.
Moon, H. and F. Schorfheide (2012): “Bayesian and Frequentist Inference in Partially Identified Models,” Econometrica, 80, 755–782.
Petterson, M., D. Seim, and J. Shapiro (2020): “Bounds on a Slope from Size Restrictions on Economic Shocks.”
Plagborg-Møller, M. (2019): “Bayesian Inference on Structural Impulse Response Functions,” Quantitative Economics, 10, 145–184.
Plagborg-Møller, M. and C. Wolf (in press[a]): “Local Projections and VARs Estimate the Same Impulse Responses,” Econometrica.
——— (in press[b]): “Supplement to “Local Projections and VARs Estimate the Same Impulse Responses”,” Econometrica.
Poirier, D. (1998): “Revising Beliefs in Nonidentified Models,” Econometric Theory, 14, 483–509.
Redl, C. (2020): “Uncertainty Matters: Evidence from Close Elections,” Journal of International Economics, 103296.
Romer, C. and D. Romer (1989): “Does Monetary Policy Matter? A New Test in the Spirit of Friedman and Schwartz,” in NBER Macroeconomics Annual, ed. by O. Blanchard and S. Fischer, Cambridge, MA: MIT Press, vol. 4, 121–84.
——— (2004): “A New Measure of Monetary Shocks: Derivation and Implications,” American Economic Review, 94, 1055–1084.
Rothenberg, T. (1971): “Identification in Parametric Models,” Econometrica, 39, 577–591.
Rubio-Ramírez, J., D. Waggoner, and T. Zha (2010): “Structural Vector Autoregressions: Theory of Identification and Algorithms for Inference,” The Review of Economic Studies, 77, 665–696.
Sims, C. (1980): “Macroeconomics and Reality,” Econometrica, 48, 1–48.
Stock, J. and M. Watson (2018): “Identification and Estimation of Dynamic Causal Effects in Macroeconomics Using External Instruments,” The Economic Journal, 128, 917–948.
Uhlig, H. (2005): “What are the Effects of Monetary Policy on Output? Results from an Agnostic Identification Procedure,” Journal of Monetary Economics, 52, 381–419.
——— (2017): “Shocks, Sign Restrictions, and Identification,” in Advances in Economics and Econometrics: Eleventh World Congress, ed. by B. Honoré, A. Pakes, M. Piazzesi, and L. Samuelson, Cambridge, United Kingdom: Cambridge University Press, vol. 2, 95–127.
Zhou, X. (2020): “Refining the Workhorse Oil Market Model,”