Instrumental Variable Identification of Dynamic Variance Decompositions

Abstract

Macroeconomists increasingly use external sources of exogenous variation for causal inference. However, unless such external instruments (proxies) capture the underlying shock without measurement error, existing methods are silent on the importance of that shock for macroeconomic fluctuations. We show that, in a general moving average model with external instruments, variance decompositions for the instrumented shock are interval-identified, with informative bounds. Various additional restrictions guarantee point identification of both variance and historical decompositions. Unlike SVAR analysis, our methods do not require invertibility. Applied to U.S. data, they give a tight upper bound on the importance of monetary shocks for inflation dynamics.


Mikkel Plagborg-Møller (Princeton University) and Christian K. Wolf (University of Chicago)∗

This version: November 4, 2020. First version: August 10, 2017.


Keywords: external instrument, impulse response function, invertibility, proxy variable, variance decomposition.

JEL codes: C32, C36.

∗ We received helpful comments from our editor Harald Uhlig, three anonymous referees, Isaiah Andrews, Tim Armstrong, Dario Caldara, Thorsten Drautzburg, Domenico Giannone, Yuriy Gorodnichenko, Ed Herbst, Marek Jarociński, Peter Karadi, Lutz Kilian, Michal Kolesár, Byoungchan Lee, Sophocles Mavroeidis, Pepe Montiel Olea, Ulrich Müller, Emi Nakamura, Giorgio Primiceri, Eric Renault, Giovanni Ricco, Luca Sala, Jón Steinsson, Jim Stock, Mark Watson, and seminar participants at several venues. The first draft of this paper was written while Wolf was visiting the Bundesbank, whose hospitality is gratefully acknowledged. Wolf also acknowledges support from the Alfred P. Sloan Foundation and the Macro Financial Modeling Project. Plagborg-Møller acknowledges that this material is based upon work supported by the NSF under Grant

1 Introduction

In recent years, and in parallel to popular microeconometric identification strategies, empirical practice in applied macroeconometrics has turned towards “external” sources of plausibly exogenous variation. Such external instrumental variables (IVs, or proxy variables) are now routinely used to estimate causal effects through a simple Two-Stage Least Squares version of Local Projections (Jordà, 2005; Ramey, 2016). Appealingly, this approach is valid even without the assumption of invertibility, i.e., the ability to recover structural shocks from current and past (but not future) values of the observed macro variables (Nakamura & Steinsson, 2018b; Stock & Watson, 2018).

However, applied researchers are often not just interested in dynamic causal effects, but also want to learn about a particular shock’s contribution to macroeconomic fluctuations (Christiano et al., 1999; Beaudry & Portier, 2006; Smets & Wouters, 2007). If the IV is a perfect measure of the underlying structural macro shock, then the desired variance decompositions are readily computed from standard Local Projection regression output (Gorodnichenko & Lee, 2020). In many applications, though, it is likely that external shock measures are contaminated by substantial measurement error, causing attenuation bias. For example, Gertler & Karadi (2015) use high-frequency changes in asset prices around monetary policy announcements as credible instruments for monetary shocks; since these instruments at best capture a subset of all monetary shocks, simple direct regressions on the IV are likely to substantially understate the importance of monetary disturbances.
Up to this point, the only possible alternative approach was to combine the IV with conventional Structural Vector Autoregressive (SVAR) methods (Stock, 2008; Mertens & Ravn, 2013), thus automatically imposing the otherwise unnecessary and empirically dubious invertibility assumption.

In this paper, we show precisely to what extent external instruments are informative about shock importance. Throughout, we consider an unrestricted linear moving average model, disciplined only by IVs. This model nests conventional, invertible SVARs, as well as essentially all linearized macro models. We prove three main results. First, without further restrictions, the variance decomposition of the instrumented shock’s contribution to macroeconomic fluctuations is interval-identified, with informative lower and upper bounds. Second, if the researcher is willing to impose the assumption of recoverability (i.e., that the shock is spanned by current, past and future values of the observed macro variables), then both variance decompositions and historical decompositions (the shock’s contribution to realized fluctuations) are point-identified. Third, we derive a simple Granger causality pre-test for invertibility that we show exploits the strongest possible testable implication. We complement this set of theoretical results with an extensive code suite that implements all our inference procedures.

We adopt the exact same structural vector moving average (SVMA) model with external IVs as in Stock & Watson (2018), but focus on variance decompositions, rather than impulse responses. The key identifying assumption of this model is the availability of external instruments that correlate with the shock of interest, but are otherwise dynamically uncorrelated with all other macro shocks. Importantly, the IVs may be contaminated by classical measurement error. Stock & Watson (2018) show that, in this SVMA-IV model, relative impulse responses (which normalize the impact effect) are point-identified.
While such normalized impulse responses do not require identification of the scale of the underlying shock, shock scale inevitably matters for variance and historical decompositions, and so lies at the heart of the identification challenge we face in this paper.

We bound the importance of the instrumented shock from above and from below by viewing the model as a dynamic measurement error model. Our question is: Given the second moments (autocovariances) of the macro variables and the IVs, what can be said about (forecast or unconditional) variance decompositions? The identification challenge is that we do not know the signal-to-noise ratio of the IV a priori; however, we prove that it is possible to bound this ratio using the moments of the data. At one extreme, our lower bound corresponds to the previously discussed approach of treating the IV as the shock (zero measurement error). If, as seems likely in practice, the IV is actually not perfect, then this lower bound may substantially understate the true importance of the shock. At the other extreme, given that we observe a certain degree of co-movement between the IV and the macro observables at various leads and lags, we know that measurement error also cannot be too pervasive. We translate this intuition into formal bounds and prove that these bounds are sharp, i.e., they exhaust all the information about variance decompositions contained in the second moments of the data.

We also characterize the set of additional assumptions that researchers could impose to point-identify both variance and historical decompositions. Here our main result is that point identification obtains if the instrumented shock is assumed to be recoverable, i.e., spanned by all lags and leads of the endogenous macro variables.
Appealingly, recoverability obtains in any macro model with as many observables as shocks; in particular, it holds even in many models with news and noise shocks, unlike the strictly stronger (and, as we show, testable) invertibility assumption made in SVAR analysis (Leeper et al., 2013).

We provide the applied researcher with an easy-to-use code suite that constructs confidence intervals for all parameters of interest. In a first step, we use a reduced-form VAR in macro variables and IVs as a convenient tool for approximating the second moments of the data. The second step then constructs sample analogues of our identification bounds and inserts these into the confidence procedure of Imbens & Manski (2004); alternatively, we also provide confidence intervals valid under the additional point-identifying restriction of recoverability. We prove that our confidence intervals have asymptotically valid frequentist coverage under weak nonparametric conditions on the data generating process.

To demonstrate the feasibility and applicability of our procedures, we bound the importance of monetary shocks for inflation dynamics in the U.S. We employ the high-frequency IV proposed by Gertler & Karadi (2015), mentioned above. As discussed in Ramey (2016), the rising importance of forward guidance since the early 1990s is likely to invalidate the invertibility assumption and so threatens consistency of the standard SVAR-IV estimator used by Gertler & Karadi. Indeed, we find that the data are consistent with substantial non-invertibility. Applying our robust methodology, we find that monetary shocks are almost irrelevant for aggregate inflation in our post-1990 sample: the 90% confidence intervals for the forecast variance contribution of monetary shocks rule out values above 8% at all horizons. Thus, to the extent that inflation is a monetary phenomenon, it is so because of the systematic part of U.S. monetary policy, not because of its erratic conduct.

Finally, we use a series of analytical and quantitative examples to give intuition for why, in spite of its weak identifying assumptions, our method will often manage to give very tight upper bounds on shock importance, consistent with our findings in the monetary application.

Literature.

Plagborg-Møller & Wolf (2020) prove that the invertibility-robust Local Projection IV impulse response estimator has the same estimand as a recursive SVAR that includes the IV and orders it first. This paper complements our other work by analyzing the identification of variance and historical decompositions, which requires completely different mathematical arguments.

Non-invertibility and its effects on SVAR identification have received substantial attention in recent years (see the references in Plagborg-Møller, 2019, sec. 2.3). Previous work has emphasized that, in the empirically relevant case of foresight about economic fundamentals or policy (“news”), conventional SVAR analysis invariably fails: Rational expectations equilibria create non-invertible SVMA representations, and so SVARs cannot correctly recover the structural shocks (Leeper et al., 2013; Wolf, 2020). In contrast, non-invertibility poses no challenge to the methods developed in this paper. We also show that, in the SVMA-IV model, the degree of invertibility is set-identified. Our proposed test of invertibility is related to the Granger causality tests developed in SVAR settings by Giannone & Reichlin (2006) and Forni & Gambetti (2014). Finally, the weaker notion of “recoverability” studied here has independently been proposed by Chahrour & Jurado (2020) outside the context of external IV identification.

Outline.

Section 2 defines the SVMA-IV model and the parameters of interest, and states the identification problem. Section 3 gives an overview of our procedures and their implementation. Section 4 applies the procedures to bound the importance of monetary shocks. Section 5 formally derives our identification results, and Section 6 illustrates the usefulness and interpretation of the upper bound through analytical examples. Section 7 compares the finite-sample performance of our procedures to the SVAR-IV approach through simulations. Section 8 concludes. Proofs of our main results are relegated to Appendix A. The Matlab code suite and a supplemental appendix are available online.

We begin by defining the econometric model and the parameters of interest. Then we state the identification problem.

Following Stock & Watson (2018), we assume an SVMA-IV model. This model allows for an unrestricted linear shock transmission mechanism and, unlike standard SVAR analysis, does not require shocks to be invertible. We also assume the availability of valid external IVs (proxy variables), i.e., variables that correlate with the shock of interest, but not with the other shocks. For notational clarity, we assume throughout that all time series below have zero mean and are strictly non-deterministic.

First, we specify the weak assumptions on shock transmission to endogenous variables.

Recoverability is formally equivalent to the assumption that the structural shock is spanned by current and future reduced-form VAR forecast errors. Such dynamic rotations of $u_t$ have been exploited in non-IV settings by Lippi & Reichlin (1994), Mertens & Ravn (2010), and Forni et al. (2017a,b).

https://github.com/mikkelpm/svma_iv

Assumption 1. The $n_y$-dimensional vector $y_t = (y_{1,t}, \dots, y_{n_y,t})'$ of observed macro variables is driven by an unobserved $n_\varepsilon$-dimensional vector $\varepsilon_t = (\varepsilon_{1,t}, \dots, \varepsilon_{n_\varepsilon,t})'$ of exogenous economic shocks,
$$y_t = \Theta(L)\,\varepsilon_t, \qquad \Theta(L) \equiv \sum_{\ell=0}^{\infty}\Theta_\ell L^\ell, \tag{1}$$
where $L$ is the lag operator. The matrices $\Theta_\ell$ are each $n_y \times n_\varepsilon$ and absolutely summable across $\ell$. $\Theta(x)$ is assumed to have full row rank for all complex scalars $x$ on the unit circle. The shocks are mutually orthogonal white noise processes: $\varepsilon_t \sim WN(0, I_{n_\varepsilon})$, where $I_n$ denotes the $n$-dimensional identity matrix.

The $(i,j)$ element $\Theta_{i,j,\ell}$ of the moving average coefficient matrix $\Theta_\ell$ is the impulse response of variable $i$ to shock $j$ at horizon $\ell$. The $j$-th column of $\Theta_\ell$ is denoted by $\Theta_{\bullet,j,\ell}$ and the $i$-th row by $\Theta_{i,\bullet,\ell}$. The full-rank assumption guarantees a nonsingular stochastic process. This condition requires $n_\varepsilon \ge n_y$, but, crucially, we do not assume that the number of shocks $n_\varepsilon$ is known. The mutual orthogonality of the shocks is the standard assumption in empirical macroeconomics.
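Assumption 1 deliberately allows noninvertible moving averages. As a minimal illustration (our own sketch, not part of the paper's code suite), the snippet below simulates a finite-order SVMA and approximates, by least-squares projection on a finite window of data, the share of the shock's variance explained by current and past $y$ (the degree of invertibility $R^2_0$ defined in (4) below) and by a window of leads and lags (the degree of recoverability $R^2_\infty$). The finite MA order, the window lengths, and the helper names `simulate_svma` and `projection_r2` are hypothetical choices of the sketch. For the scalar MA(1) $y_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with $\theta = 2$, the Wold innovation has variance $\theta^2$ and covariance 1 with $\varepsilon_t$, so $R^2_0 = 1/\theta^2 = 0.25$, whereas $R^2_\infty = 1$ because $n_\varepsilon = n_y$: the shock is recoverable but not invertible.

```python
import numpy as np


def simulate_svma(Theta, T, rng):
    """Simulate y_t = sum_l Theta[l] @ eps_{t-l} with eps_t ~ N(0, I), t = 0, ..., T-1.

    Theta has shape (n_lags, n_y, n_eps): a finite-order truncation of Theta(L)
    (an assumption of this sketch). Returns y (T, n_y) and the aligned eps (T, n_eps).
    """
    n_lags, n_y, n_eps = Theta.shape
    eps = rng.standard_normal((T + n_lags - 1, n_eps))  # first n_lags-1 rows are pre-sample
    y = np.zeros((T, n_y))
    for l in range(n_lags):
        start = n_lags - 1 - l  # row holding eps_{t-l} for t = 0
        y += eps[start:start + T] @ Theta[l].T
    return y, eps[n_lags - 1:]


def projection_r2(target, y, n_lags, n_leads):
    """Uncentered R^2 from regressing target_t on y_{t-n_lags}, ..., y_{t+n_leads}.

    For a zero-mean, unit-variance target this approximates Var(E(target_t | window)).
    """
    T = len(target)
    X = np.column_stack(
        [y[n_lags - j:T - n_leads - j] for j in range(-n_leads, n_lags + 1)]
    )
    tgt = target[n_lags:T - n_leads]
    coef, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    resid = tgt - X @ coef
    return 1.0 - (resid @ resid) / (tgt @ tgt)


rng = np.random.default_rng(0)
results = {}
for theta in (0.5, 2.0):  # invertible vs. noninvertible MA(1)
    Theta = np.array([[[1.0]], [[theta]]])  # y_t = eps_t + theta * eps_{t-1}
    y, eps = simulate_svma(Theta, 100_000, rng)
    r2_0 = projection_r2(eps[:, 0], y, n_lags=20, n_leads=0)     # degree of invertibility
    r2_inf = projection_r2(eps[:, 0], y, n_lags=20, n_leads=20)  # degree of recoverability
    results[theta] = (r2_0, r2_inf)
    print(f"theta={theta}: R2_0 ~ {r2_0:.3f}, R2_inf ~ {r2_inf:.3f}")
```

With $\theta = 0.5$ both measures are close to 1; with $\theta = 2$ only the two-sided projection recovers the shock, since the two-sided inverse of $1 + 2L$ loads exclusively on future values of $y_t$.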
The model is semiparametric in that we place no a priori restrictions on the coefficients of the infinite moving average, except to ensure a valid stochastic process. In particular, the infinite-order SVMA model (1) is consistent with all discrete-time Dynamic Stochastic General Equilibrium (DSGE) models and all stable SVAR models for $y_t$.

Second, we assume the availability of one or more external IVs for the shock of interest, with the shock of interest specified to be the first one, $\varepsilon_{1,t}$. Each of the $n_z$ IVs $z_t = (z_{1,t}, \dots, z_{n_z,t})'$ is assumed to correlate with the first shock but not the other shocks, after controlling for lagged variables: For all $i = 1, \dots, n_z$,
$$E(\tilde z_{i,t}\,\varepsilon_{1,t}) \neq 0, \qquad E(\tilde z_{i,t}\,\varepsilon_{j,\tau}) = 0 \quad \text{for all } (j,\tau) \neq (1,t), \tag{2}$$
where $\tilde z_{i,t}$ is the population residual from projecting $z_{i,t}$ on all lags of $\{z_t, y_t\}$. The key exclusion restriction is that the shock of interest $\varepsilon_{1,t}$ is the only contemporaneous shock to correlate with the IVs $z_t$. Thus, $\tilde z_t$ is a proxy for $\varepsilon_{1,t}$ (up to scale) that is contaminated by classical measurement error. This is a strong assumption that must be carefully defended in applications. Ramey (2016) and Stock & Watson (2018) survey the extensive applied literature that has constructed plausibly valid external IVs for various shocks.

Using linear projection notation, we can equivalently express the IV exclusion restrictions (2) as follows. $\|\cdot\|$ refers to the Euclidean norm.

Assumption 2.

The IVs $z_t = (z_{1,t}, \dots, z_{n_z,t})'$ satisfy
$$z_t = \sum_{\ell=1}^{\infty}\left(\Psi_\ell z_{t-\ell} + \Lambda_\ell y_{t-\ell}\right) + \alpha\lambda\,\varepsilon_{1,t} + \Sigma_v^{1/2} v_t, \tag{3}$$
where $\Psi_\ell$ is $n_z \times n_z$, $\Lambda_\ell$ is $n_z \times n_y$, $\lambda$ is an $n_z$-dimensional vector normalized to unit length ($\|\lambda\| = 1$) and with its first nonzero element being positive, $\alpha \ge 0$ is a scalar, and $\Sigma_v$ is a symmetric positive semidefinite $n_z \times n_z$ matrix. The elements of $\Psi_\ell$ and $\Lambda_\ell$ are absolutely summable across $\ell$, and the polynomial $x \mapsto \det\left(I_{n_z} - \sum_{\ell=1}^{\infty}\Psi_\ell x^\ell\right)$ has all its roots outside the unit circle. The disturbance vector $v_t$ is a white noise process that is dynamically uncorrelated with the structural shocks $\varepsilon_t$: $v_t \sim WN(0, I_{n_z})$, $\operatorname{Cov}(\varepsilon_t, v_\tau) = 0_{n_\varepsilon \times n_z}$ for all $t, \tau$.

The scale parameter $\alpha$ (along with the residual variance-covariance matrix $\Sigma_v$) measures the overall strength of the IVs, while the unit-length vector $\lambda$ determines which IVs are stronger than others. We emphasize that the linearity of equation (3) is not a structural assumption; it arises from a linear projection (as in the “first stage” of cross-sectional IV). Since we restrict attention to identification from second moments, we may without loss of generality simplify notation by assuming that all disturbances are Gaussian.

Assumption 3. $(\varepsilon_t', v_t')'$ is i.i.d. jointly Gaussian.

The Gaussianity assumption is strictly for notational convenience. We could instead have maintained the above white noise assumptions and phrased all our results using linear projection notation. The sole meaningful restriction is that we only consider identification from the second-moment properties of the data, as is standard in the applied macro literature (and without loss of generality for Gaussian data). We will drop the Gaussianity assumption when developing practical inference procedures in Section 3.

Finally, note that Assumptions 1 to 3 together imply that the $(n_y + n_z)$-dimensional data vector $(y_t', z_t')'$ is strictly stationary.

If we were to take the assumption of i.i.d. shocks seriously, and the shocks were not Gaussian, higher-order moments of the data would be informative about the parameters. However, we agree with most of the literature that the assumption of i.i.d. shocks is too strong due to the likely presence of stochastic volatility. Our identification results allow for this, since we only require shocks to be white noise.

2.2 Parameters of interest

We are interested in the propagation of the first structural shock $\varepsilon_{1,t}$ to the macroeconomic aggregates $y_t$. This section lists the parameters of interest to the applied macroeconomist.

Impulse responses.

As discussed above, the $(i,1)$ element $\Theta_{i,1,\ell}$ of the moving average coefficient matrix $\Theta_\ell$ is the impulse response of variable $i$ to shock 1 at horizon $\ell$. We distinguish such absolute impulse responses from relative impulse responses $\Theta_{i,1,\ell}/\Theta_{1,1,0}$, which give the response of $y_{i,t+\ell}$ to a shock to $\varepsilon_{1,t}$ that increases $y_{1,t}$ by one unit on impact.

Invertibility and recoverability.

The shock $\varepsilon_{1,t}$ is said to be invertible if it is spanned by past and current (but not future) values of the endogenous variables $y_t$: $\varepsilon_{1,t} = E(\varepsilon_{1,t} \mid \{y_\tau\}_{-\infty<\tau\le t})$. This condition may or may not hold in a given moving average model (1), depending on the impulse response parameters $\Theta_\ell$. Conventional SVAR analysis invariably imposes invertibility, since the SVAR model obtains from the additional assumptions that $n_\varepsilon = n_y$ and that $\Theta(L)$ has a one-sided inverse, so the shocks $\varepsilon_t = \Theta(L)^{-1}y_t$ are spanned by current and past data. However, in many structural macro models, at least some of the shocks cannot be recovered from only lagged macro observables, i.e., the moving average representation is noninvertible. For example, this is often the case in models with news (anticipated) shocks or noise (signal extraction) shocks (Blanchard et al., 2013; Leeper et al., 2013). Furthermore, if $n_\varepsilon > n_y$, it is impossible for all shocks to be invertible.

A continuous measure of the degree of invertibility is the $R^2$ value in a population regression of the shock on past and current observed variables (Sims & Zha, 2006, pp. 243–245; Forni et al., 2019). More generally, we define
$$R^2_\ell \equiv \operatorname{Var}\left(E(\varepsilon_{1,t} \mid \{y_\tau\}_{-\infty<\tau\le t+\ell})\right), \tag{4}$$
the population R-squared value in a projection of the shock of interest on data up to time $t+\ell$ (recall that $\operatorname{Var}(\varepsilon_{1,t}) = 1$). If the shock is invertible in the sense of the previous paragraph, then $R^2_0 = 1$. Hence, if $R^2_0 < 1$, then no SVAR model can generate the impulse responses $\Theta(L)$, although the model is nearly consistent with SVAR structure if $R^2_0 \approx 1$. The shock $\varepsilon_{1,t}$ is said to be recoverable from all leads and lags of the endogenous variables if $E(\varepsilon_{1,t} \mid \{y_\tau\}_{-\infty<\tau<\infty}) = \varepsilon_{1,t}$, or equivalently if $R^2_\infty = 1$. A sufficient condition is that $n_\varepsilon = n_y$, since then $\Theta(L)$ automatically has a two-sided inverse (Brockwell & Davis, 1991, Thm. 3.1.3), and thus the shocks $\varepsilon_t = \Theta(L)^{-1}y_t$ are spanned by current, past, and future data. This is the case in many DSGE models with news (i.e., anticipated) shocks (e.g., Leeper et al., 2013).

Variance decompositions.

Variance decompositions are the key parameters of interest in this paper. We focus in the main text on the forecast variance ratio (FVR), where the FVR for the shock of interest for variable $i$ at horizon $\ell$ is defined as
$$\mathrm{FVR}_{i,\ell} \equiv 1 - \frac{\operatorname{Var}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t}, \{\varepsilon_{1,\tau}\}_{t<\tau<\infty})}{\operatorname{Var}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t})} = \frac{\sum_{m=0}^{\ell-1}\Theta_{i,1,m}^2}{\operatorname{Var}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t})}.$$
The FVR measures the reduction in the econometrician’s forecast variance that would arise from being told the entire path of future realizations of the first shock. The larger this measure is, the more important is the first shock for forecasting variable $i$ at horizon $\ell$. The FVR is always between 0 and 1.

Appendix B.1 defines and provides identification analysis for two additional variance decomposition concepts. First, the forecast variance decomposition (FVD) is like the FVR but instead conditions on the history of all past shocks $\{\varepsilon_\tau\}_{-\infty<\tau\le t}$, rather than the history of observables $\{y_\tau\}_{-\infty<\tau\le t}$. Under invertibility, the FVR and FVD are identical (since then the information set $\{y_\tau\}_{-\infty<\tau\le t}$ equals the information set $\{\varepsilon_\tau\}_{-\infty<\tau\le t}$), explaining why the previous SVAR literature has not distinguished between the two. Second, we consider the unconditional frequency-specific variance decomposition (VD) of Forni et al. (2019, sec. 3.4).

Historical decomposition.

The historical decomposition of variable $y_{i,t}$ at time $t$ attributable to the shock of interest is defined as $E(y_{i,t} \mid \{\varepsilon_{1,\tau}\}_{-\infty<\tau\le t}) = \sum_{\ell=0}^{\infty}\Theta_{i,1,\ell}\,\varepsilon_{1,t-\ell}$.

Our goal for the remainder of the paper is to answer the question: Given Assumptions 1 to 3, what do the second moments (autocovariances) of the data $(y_t, z_t)$ say about the parameters of interest defined above? In particular, can we test whether the shock $\varepsilon_{1,t}$ is invertible?

Stock & Watson (2018) showed that relative impulse responses are point-identified in the SVMA-IV model. To see this transparently, consider the case with a single IV, so $\lambda = 1$. Since Cov(y_{i,t}, z_t | {y_τ, z_τ}_{−∞<τ

FVR_{i,ℓ} of variable $y_{i,t}$ at horizon $\ell$ are given by
$$\left[\frac{1}{\hat{\bar\alpha}^2}\times\frac{\sum_{m=0}^{\ell-1}\widehat{\operatorname{Cov}}(y_{i,t}, \tilde z_{t-m})^2}{\widehat{\operatorname{Var}}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t})},\ \frac{1}{\hat{\underline\alpha}^2}\times\frac{\sum_{m=0}^{\ell-1}\widehat{\operatorname{Cov}}(y_{i,t}, \tilde z_{t-m})^2}{\widehat{\operatorname{Var}}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t})}\right]. \tag{6}$$
The interval is always non-empty and never collapses to a point. The true FVR is contained in this interval with high probability asymptotically, but the analysis does not allow us to say where in the interval the parameter lies without making further assumptions. We show theoretically in Section 5 that the lower bound, which upon inspection is obtainable from a simple regression of $y_{i,t}$ on lags of the residualized IV $\tilde z_t$, is closer to the true FVR when the IV is stronger (i.e., there is less measurement error). The upper bound instead does not depend on the amount of measurement error, and is closer to the true FVR when the macro variables $y_t$ are more informative about the hidden shock $\varepsilon_{1,t}$ (in the sense that the degree of recoverability $R^2_\infty$ is larger).

• The estimated bounds for the degree of invertibility $R^2_0$ are given by
$$\left[\frac{1}{\hat{\bar\alpha}^2}\times\widehat{\operatorname{Var}}\left(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau\le t})\right),\ \frac{1}{\hat{\underline\alpha}^2}\times\widehat{\operatorname{Var}}\left(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau\le t})\right)\right].$$
The data are consistent with substantial non-invertibility of the shock $\varepsilon_{1,t}$ if the above interval contains values substantially below 1. By the definition of $\hat{\underline\alpha}$, this is the case if future values of the macro observables help to predict the residualized IV $\tilde z_t$.

• The estimated bounds for the degree of recoverability $R^2_\infty$ are given by
$$\left[\frac{\hat{\underline\alpha}^2}{\hat{\bar\alpha}^2},\ 1\right].$$
The data are consistent with substantial non-recoverability of the shock of interest $\varepsilon_{1,t}$ if the interval contains values substantially below 1. The reason why the upper bound above equals the trivial bound of 1 is that we do not exploit the sharp upper bound, since the latter turns out to be difficult to estimate in realistic sample sizes. The theoretical sharp upper bound is derived in Section 5 below.

Point identification/estimation under recoverability.

Finally, if the researcher is willing to impose additional a priori assumptions, then it is possible to point-identify many of the parameters of interest. In particular, if we are willing to assume that the shock is recoverable, i.e., $R^2_\infty = 1$, then the upper bound for $\mathrm{FVR}_{i,\ell}$ in (6) is a consistent estimator of the true FVR. As discussed in Sections 2.2 and 6, recoverability is a mathematically and economically weaker assumption than the invertibility assumption required by conventional SVAR-IV analysis. Section 5 presents other sufficient conditions for point identification.
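The plug-in logic behind (6) can be sketched with direct regressions. The snippet below is a minimal illustration under simplifying assumptions of our own; it is not the paper's procedure, which derives all second moments from a reduced-form VAR, and the helper names, lag length `p`, and window half-width `K` are hypothetical. It uses that, under Assumptions 1 and 2 with a single IV, $\tilde z_t = \alpha\varepsilon_{1,t} + \Sigma_v^{1/2}v_t$, so $\operatorname{Cov}(y_{i,t}, \tilde z_{t-m}) = \alpha\Theta_{i,1,m}$ and $\operatorname{Var}(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau<\infty})) = \alpha^2 R^2_\infty \le \alpha^2 \le \operatorname{Var}(\tilde z_t)$. The sketch bounds $\alpha^2$ above by $\operatorname{Var}(\tilde z_t)$, below by the variance of the projection of $\tilde z_t$ on $y_{t-K}, \dots, y_{t+K}$ (standing in for all leads and lags), and approximates the forecast variance by a direct $\ell$-step projection on $p$ lags of $y_t$. Data are assumed demeaned.

```python
import numpy as np


def shifted(x, js, lo, hi):
    """Columns x_{t-j} for each j in js (negative j = lead), rows t = lo, ..., hi-1."""
    return np.column_stack([x[lo - j:hi - j] for j in js])


def fitted(X, target):
    """Least-squares fitted values of target on X (no intercept; data assumed demeaned)."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return X @ coef


def fvr_bounds_sketch(y, z, i, horizon, p=6, K=12):
    """Hypothetical plug-in bounds on FVR_{i,horizon} in the spirit of (6), single IV.

    y: (T, n_y) macro data; z: (T,) instrument.
    """
    T = len(y)
    data = np.column_stack([y, z])
    # Residualized IV z~_t: residual from projecting z_t on p lags of (y_t, z_t).
    z_tl = z[p:] - fitted(shifted(data, range(1, p + 1), p, T), z[p:])
    y_al = y[p:]  # y aligned with z_tl
    T2 = len(z_tl)
    # Cov(y_{i,t+m}, z~_t) = alpha * Theta_{i,1,m}, for m = 0, ..., horizon-1.
    covs = np.array([y_al[m:, i] @ z_tl[:T2 - m] / (T2 - m) for m in range(horizon)])
    # Upper bound on alpha^2: Var(z~_t), i.e. treat the IV as free of measurement error.
    alpha2_hi = z_tl @ z_tl / T2
    # Lower bound on alpha^2: variance of the projection of z~_t on y_{t-K}, ..., y_{t+K}.
    proj = fitted(shifted(y_al, range(-K, K + 1), K, T2 - K), z_tl[K:T2 - K])
    alpha2_lo = proj @ proj / len(proj)
    # Var(y_{i,t+h} | y_t, ..., y_{t-p+1}) via a direct h-step projection (an approximation).
    h = horizon
    target = y[p - 1 + h:, i]
    resid = target - fitted(shifted(y, range(p), p - 1, T - h), target)
    fvar = resid @ resid / len(resid)
    # Theta_{i,1,m}^2 = covs_m^2 / alpha^2, so the alpha^2 bounds flip into FVR bounds.
    num = covs @ covs / fvar
    return num / alpha2_hi, num / alpha2_lo


# Static check: y_t = Theta0 eps_t (n_eps = n_y, so recoverable), z_t = eps_{1,t} + v_t.
rng = np.random.default_rng(1)
T = 50_000
Theta0 = np.array([[1.0, 0.5], [0.3, 1.0]])
eps = rng.standard_normal((T, 2))
y = eps @ Theta0.T
z = eps[:, 0] + rng.standard_normal(T)
fvr_lo, fvr_hi = fvr_bounds_sketch(y, z, i=0, horizon=1, p=4, K=4)
print(f"FVR bounds: [{fvr_lo:.3f}, {fvr_hi:.3f}]  (true FVR = 1/1.25 = 0.8)")
```

In this check the IV has a signal-to-noise ratio of one, so the lower bound is attenuated to roughly half the true FVR, while the upper bound is close to the truth because the shock is recoverable. Relative impulse responses follow from the same covariances as ratios `covs[m] / covs[0]` for a given variable.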

In Appendix B.7 we prove that the above-mentioned bounds are jointly asymptotically normal under weak nonparametric regularity conditions on the data generating process (DGP). We assume neither that the true DGP is a finite-order VAR, nor that the shocks are Gaussian. This argument requires the VAR lag length $p = p_T$ used for estimation to diverge with the sample size $T$ at an appropriate rate.

Since the bounds are asymptotically normal, we can use standard arguments to construct confidence sets (Imbens & Manski, 2004). Consider any one of the partially identified parameters discussed above and denote the estimates of its bounds by the generic notation $[\hat{\underline\theta}, \hat{\bar\theta}]$. We then use a conventional bootstrap for VAR models (Kilian & Lütkepohl, 2017, ch. 12.2) to generate bootstrap samples of the bound estimates $\hat{\underline\theta}$ and $\hat{\bar\theta}$, and let $\hat{\underline q}_\beta$ and $\hat{\bar q}_\beta$ denote the bootstrap $\beta$-quantiles of the lower and upper bounds, respectively. Then the interval $[\hat{\underline q}_{\beta/2}, \hat{\bar q}_{1-\beta/2}]$ is a valid $1-\beta$ confidence interval for the identified set of the parameter in question. That is, the probability that the confidence interval contains the entire identified set is greater than or equal to $1-\beta$ asymptotically; in particular, this confidence interval is therefore also a valid confidence set for the parameter itself. Under the additional point-identifying assumption that the shock is recoverable, the FVR is consistently estimated by the upper bound $\hat{\bar\theta}$, so we can construct a $1-\beta$ confidence interval as $[\hat{\bar q}_{\beta/2}, \hat{\bar q}_{1-\beta/2}]$.

Because VAR inference is subject to well-known small-sample biases (Kilian & Lütkepohl, 2017), we recommend that the following alternative formulas be used. Let $\hat{\underline\theta}^*$ and $\hat{\bar\theta}^*$ denote the average bootstrap draws of $\hat{\underline\theta}$ and $\hat{\bar\theta}$. Then we report the bias-corrected point estimate $[2\hat{\underline\theta} - \hat{\underline\theta}^*, 2\hat{\bar\theta} - \hat{\bar\theta}^*]$ of the bounds, as well as Hall’s percentile confidence interval $[2\hat{\underline\theta} - \hat{\underline q}_{1-\beta/2}, 2\hat{\bar\theta} - \hat{\bar q}_{\beta/2}]$.
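Given bootstrap draws of the two bound estimates, the interval formulas above reduce to a few quantile computations. A minimal sketch follows; generating the draws (the VAR bootstrap itself) is not shown, and the function name `identified_set_intervals` and the toy draws are hypothetical, for illustration only.

```python
import numpy as np


def identified_set_intervals(lo_hat, hi_hat, lo_boot, hi_boot, beta=0.10):
    """Bootstrap intervals for an identified set [lo, hi], following the formulas in the text.

    lo_hat, hi_hat: point estimates of the lower/upper bound.
    lo_boot, hi_boot: arrays of bootstrap draws of the lower/upper bound estimates.
    """
    return {
        # Interval covering the identified set: [q_lo(beta/2), q_hi(1 - beta/2)].
        "percentile": (np.quantile(lo_boot, beta / 2), np.quantile(hi_boot, 1 - beta / 2)),
        # Bias-corrected bounds: 2 * estimate - mean of bootstrap draws.
        "bias_corrected": (2 * lo_hat - np.mean(lo_boot), 2 * hi_hat - np.mean(hi_boot)),
        # Hall's percentile interval: [2 lo_hat - q_lo(1 - beta/2), 2 hi_hat - q_hi(beta/2)].
        "hall": (2 * lo_hat - np.quantile(lo_boot, 1 - beta / 2),
                 2 * hi_hat - np.quantile(hi_boot, beta / 2)),
        # Under recoverability the upper bound estimates the parameter itself.
        "recoverable": (np.quantile(hi_boot, beta / 2), np.quantile(hi_boot, 1 - beta / 2)),
    }


# Toy bootstrap draws (hypothetical numbers).
lo_boot = np.linspace(0.0, 0.2, 101)
hi_boot = np.linspace(0.5, 0.7, 101)
cis = identified_set_intervals(0.1, 0.6, lo_boot, hi_boot, beta=0.10)
for name, (a, b) in cis.items():
    print(f"{name:>15}: [{a:.3f}, {b:.3f}]")
```

Because these toy draws are symmetric around the point estimates, the percentile and Hall intervals coincide and the bias correction leaves the bounds unchanged; with skewed draws, as is typical for VAR-based estimates, they differ.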
Similar corrections can be applied in the case of point identification via recoverability.

Validity requires that the VAR bootstrap procedure is consistent. See Kilian & Lütkepohl (2017, ch. 12.2) for references to papers that prove bootstrap consistency under various conditions.

In principle one could construct narrower confidence intervals that only guarantee coverage of the parameter itself (not the identified set), as in Imbens & Manski (2004) and Stoye (2009), and we do this in our Matlab code suite. However, the decrease in length appears to be minimal in realistic applications.

4 Application to monetary policy shocks

To illustrate our method, we revisit an old question: the importance of monetary shocks for U.S. macro fluctuations. Our main result is that monetary shocks are of limited importance for post-1990 aggregate dynamics, especially for inflation. The application illustrates that our upper bound on variance decompositions can yield surprisingly sharp inference, despite the weakness of our identifying assumptions.

Background.

Gertler & Karadi (2015) construct an external instrument for monetary shocks from high-frequency changes in asset prices in very short time windows around FOMC announcements. This setting is an ideal laboratory to illustrate the appeal of our method, for two reasons.

First, measurement error is likely to be substantial. Intuitively, while short time windows around FOMC meetings may be a clean way of isolating some monetary shocks, all shocks occurring outside of that window are necessarily missed. Moreover, financial data are subject to noise due to market microstructure effects and uninformed traders. Treating the IV as the shock (as in the method of Gorodnichenko & Lee, 2020, which is equivalent to our lower bound) will then understate the importance of monetary shocks due to attenuation bias.

Second, non-invertibility is a threat to SVAR-IV analysis. For example, Ramey (2016), citing the increasing prevalence of forward guidance in the conduct of U.S. monetary policy, cautions against the conventional SVAR-IV approach. In contrast, our partial identification approach does not require the shock to be invertible (or even recoverable).

Model.

Our specification largely follows Gertler & Karadi (2015), except that we do not impose an SVAR structure. We consider four endogenous macro variables $y_t$: output growth (log growth rate of industrial production), inflation (log growth rate of the CPI), the Federal Funds Rate (FFR), and the Excess Bond Premium of Gilchrist & Zakrajšek (2012) as a measure of the non-default-related corporate bond spread. For robustness, we also try replacing the FFR with the 1-year Treasury rate, as in Gertler & Karadi (2015). The IV $z_t$ is constructed from changes in 3-month-ahead futures prices written on the FFR, where the changes are measured over short time windows around Federal Open Market Committee monetary policy announcement times. Data are monthly from January 1990 to June 2012. The AIC selects $p = 6$ lags in the reduced-form VAR. We use 1,000 bootstrap draws from a homoskedastic recursive residual VAR bootstrap.

Formally, let the total monetary shock consist of two independent components, $\varepsilon_{1,t} \equiv \bar\varepsilon_{1,t} + \tilde\varepsilon_{1,t}$, where $\bar\varepsilon_{1,t}$ captures those shocks that occur inside FOMC announcement windows. Assume $\{\bar\varepsilon_{1,t}, \tilde\varepsilon_{1,t}\}$ are independent of $\{\varepsilon_{2,t}, \dots, \varepsilon_{n_\varepsilon,t}\}$. If $z_t = \bar\varepsilon_{1,t} + \bar v_t$, where the noise $\bar v_t$ is independent of $\{\bar\varepsilon_{1,t}, \tilde\varepsilon_{1,t}, \varepsilon_{2,t}, \dots, \varepsilon_{n_\varepsilon,t}\}$, then the IV moment conditions (2) are satisfied. For the case $\bar v_t = 0$, our results in Section 5 imply that the Gorodnichenko & Lee (2020) FVR estimator will be biased downward by a factor of $\operatorname{Var}(\bar\varepsilon_{1,t}) \in [0, 1]$.

Caldara & Herbst (2019) compute FVDs for a similar specification, assuming an SVAR model. Their estimates of the importance of monetary shocks for inflation are somewhat larger than our upper bounds.

Results.

The data are consistent with substantial non-invertibility. Table 1 shows point estimates and 90% confidence intervals for the identified sets of the degree of invertibility and the degree of recoverability, either using the FFR or the 1-year rate as the interest rate variable. When we use the FFR, we can reject invertibility at the 10% level, since the confidence set for the degree of invertibility excludes 1. When we use the 1-year rate, we cannot outright reject invertibility, but the confidence set is still consistent with very low degrees of invertibility. Since the data cannot rule out a low degree of invertibility in either case, we proceed with our invertibility-robust SVMA-IV analysis. The data are similarly consistent with a wide range of values for the degree of recoverability.

Figure 1 shows partial-identification-robust confidence intervals for the forecast variance ratio of the four endogenous macro variables with respect to the monetary shock. We report point estimates and confidence intervals for the identified sets at each horizon separately. We focus here on the specification with the FFR instead of the 1-year rate, since our quantitative conclusions are if anything even starker with the latter observable. At all forecast horizons, the 90% confidence intervals rule out FVRs above 31% for output growth and 8% for inflation. At forecast horizons up to 6 months, we can rule out that the monetary shock accounts for more than 19% of the forecast variance of the Excess Bond Premium. However, we cannot rule out that the monetary shock is an important contributor to medium- or long-run forecasts of the bond premium. On the other hand, we cannot rule out that the monetary shock is completely unimportant either.

Hence, we have shown that the weak assumptions of the SVMA-IV model suffice to obtain tight upper bounds on the forecast variance contribution of monetary shocks for several variables, especially inflation.
This is despite the finding by Stock & Watson (2018) …

Footnote: See Gertler & Karadi (2015) for details on the construction of the IV and a discussion of the exclusion restriction. Nakamura & Steinsson (2018a) argue that the monetary shock identified using this IV partially captures revelation of the Federal Reserve's superior information about economic fundamentals. Appendix B.3 shows that our FVR bounds can generally be interpreted as bounding the importance of the particular linear combination of shocks that tend to hit during FOMC announcements.

Footnote: The p-values for the Granger causality pre-test of invertibility in Section 3 are 0.0001 (FFR) and 0.390 (1-year rate). Note that Stock & Watson (2018) fail to reject invertibility in a somewhat different specification.

Table 1: Empirical application: Degree of invertibility/recoverability. Rows: bound estimates for $R_0$ and $R_\infty$; columns: FFR and 1-year rate. Bounds on the degree of invertibility $R_0$ and the degree of recoverability $R_\infty$. The interest rate variable is either the Federal Funds Rate (left) or the 1-year Treasury rate (right). All numbers are bootstrap bias corrected.

Figure 1: Empirical application: Forecast variance ratios. Point estimates and 90% confidence intervals for the identified sets of forecast variance ratios, across different variables and forecast horizons. For visual clarity, we force bias-corrected estimates/bounds to lie in [0, 1].

… conditional on monetary policy shocks in post-1990 data. Although this finding echoes the conclusions of previous work (Christiano et al., 1999; Ramey, 2016), our identifying assumptions are much weaker: we are merely imposing validity of the IV in the context of a general SVMA model. We conclude that, to the extent that inflation is a monetary phenomenon, it is so because of the systematic component of monetary policy, not because of erratic policy shocks.
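The Granger causality pre-test of invertibility reported above (p-values of 0.0001 with the FFR and 0.390 with the 1-year rate) asks whether lags of $z_t$ help predict $y_t$ once lags of $y_t$ are controlled for; Proposition 2 in Section 5 explains why a rejection rules out invertibility. As a minimal sketch only, assuming an OLS-estimated VAR with $p$ lags and homoskedastic errors, the code below computes equation-by-equation F statistics, which simplifies a joint test across all equations. The function names and the simulated data are hypothetical and are not taken from the paper's software package.

```python
import numpy as np

def lag_matrix(x, p):
    """Columns [x_{t-1}, ..., x_{t-p}] for t = p, ..., T-1; x has shape (T, k)."""
    T = x.shape[0]
    return np.hstack([x[p - l:T - l] for l in range(1, p + 1)])

def granger_f_stats(y, z, p):
    """Equation-by-equation F statistics for H0: lags 1..p of z do not predict y_i."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float).reshape(-1, 1)
    T = y.shape[0]
    X_r = np.hstack([np.ones((T - p, 1)), lag_matrix(y, p)])  # own lags only
    X_u = np.hstack([X_r, lag_matrix(z, p)])                  # add lags of z
    df_den = (T - p) - X_u.shape[1]
    stats = []
    for i in range(y.shape[1]):
        target = y[p:, i]
        rss = []
        for X in (X_r, X_u):
            beta, *_ = np.linalg.lstsq(X, target, rcond=None)
            resid = target - X @ beta
            rss.append(resid @ resid)
        stats.append(((rss[0] - rss[1]) / p) / (rss[1] / df_den))
    return np.array(stats)

# Simulated illustration (not the paper's data)
rng = np.random.default_rng(0)
T = 500
z = rng.standard_normal(T)
e = rng.standard_normal((T, 2))
y_null = e                                   # y unrelated to z
y_caused = np.zeros((T, 2))
for t in range(1, T):
    y_caused[t] = 0.3 * y_caused[t - 1] + np.array([1.0, -0.5]) * z[t - 1] + 0.2 * e[t]

f_null = granger_f_stats(y_null, z, p=2)
f_caused = granger_f_stats(y_caused, z, p=2)
```

Large statistics for `y_caused` point to Granger causality and hence to non-invertibility in the sense of Proposition 2. Statistics of ordinary size for `y_null` are consistent with invertibility, although, as the text notes, they do not imply $R_0 = 1$.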

This section contains the main results of our paper: the formal identification analysis justifying the inference procedures in Section 3. For exposition, we start in Section 5.1 by deriving results for a simple static version of our SVMA-IV model. We then turn to the general dynamic model, applying the static results to the frequency domain representation of the data. Section 5.2 provides the core building block result, Section 5.3 derives identified sets for the parameters of interest, and Section 5.4 discusses additional restrictions that guarantee point identification. We initially focus on the case with a single IV, but we discuss the straightforward extension to multiple IVs in Section 5.5.

To build intuition, consider a static version of the SVMA-IV model with a single instrument:
$$y_t = \Theta_{\bullet,1,0}\,\varepsilon_{1,t} + \xi_t, \qquad z_t = \alpha\,\varepsilon_{1,t} + \sigma_v v_t, \qquad (\varepsilon_{1,t}, v_t, \xi_t')' \overset{\text{i.i.d.}}{\sim} N\!\left(0, \begin{pmatrix} I_2 & 0_{2 \times n_y} \\ 0_{n_y \times 2} & \Sigma_\xi \end{pmatrix}\right).$$
Here $\alpha, \sigma_v \geq 0$, $\xi_t \equiv \sum_{j=2}^{n_\varepsilon} \Theta_{\bullet,j,0}\,\varepsilon_{j,t}$ is an $n_y$-dimensional random vector that captures all the structural shocks other than the one of interest, and $\Sigma_\xi \equiv \mathrm{Var}(\xi_t)$.

Footnote: Appendix B.6 reports variance decompositions obtained from a conventional SVAR-IV procedure. These results confirm the limited importance of the monetary shock, though under stronger identifying assumptions.

Footnote: While the static model is primarily intended to provide intuition about the analysis of the SVMA-IV model, the results in this subsection are directly relevant for identification in the more restrictive SVAR model with an external IV. In that framework, $y_t$ would denote the $n_y$ reduced-form VAR residuals, which are linear functions of the vector $\varepsilon_t$ of $n_\varepsilon$ contemporaneous structural shocks.

In this static model, the forecast variance ratio is
$$\mathrm{FVR}_{i,1} = 1 - \frac{\mathrm{Var}(y_{i,t} \mid \varepsilon_{1,t})}{\mathrm{Var}(y_{i,t})} = \frac{\Theta_{i,1,0}^2}{\mathrm{Var}(y_{i,t})}.$$
This is just the population R-squared value in the (infeasible) regression of $y_{i,t}$ on $\varepsilon_{1,t}$. Since $\mathrm{Cov}(y_{i,t}, z_t) = \alpha\,\Theta_{i,1,0}$, it is easy to see that the FVR is identified up to a factor $1/\alpha^2$:
$$\mathrm{FVR}_{i,1} = \frac{1}{\alpha^2} \times \frac{\mathrm{Cov}(y_{i,t}, z_t)^2}{\mathrm{Var}(y_{i,t})}. \tag{7}$$
Thus, we ask: What does the variance-covariance matrix of the data $(y_t', z_t)'$ say about the scale parameter $\alpha$?

Our key insight is that the static model is nothing but a multivariate classical measurement error model: whereas we would like to measure the R-squared value from a regression of $y_t$ on $\varepsilon_{1,t}$, we only observe the noisy proxy $z_t$ for the "regressor". Intuitively, the contribution of $\varepsilon_{1,t}$ to $y_t$ is not point-identified because the signal-to-noise ratio $\alpha^2/\sigma_v^2$ of the proxy $z_t$ is not known a priori. For example, upon observing a small correlation between the IV and macro observables, we do not know whether this correlation is small because of measurement error or because the shock is unimportant. Nevertheless, the moments of the data are informative about the signal-to-noise ratio. At one extreme, the IV can never be more than perfect: at best, there is no measurement error (infinite signal-to-noise ratio). At the other extreme, the signal-to-noise ratio cannot be zero, since then the IV would not correlate at all with macro observables. We now formalize this intuition.

Footnote: Our bounds do not follow from existing results in the literature on measurement error in linear regression (e.g., Klepper & Leamer, 1984), since our parameters of interest are not regression coefficients.

Lower bound on shock importance.

We begin with a lower bound on the importance of the shock (and so on the amount of measurement error), or equivalently an upper bound on $\alpha$. To derive this bound, simply observe that
$$\alpha^2 \leq \alpha^2 + \sigma_v^2 = \mathrm{Var}(z_t).$$
This inequality binds when there is no measurement error in the IV, i.e., when $\sigma_v = 0$. Mapping the upper bound on $\alpha$ into a lower bound on the FVR via (7), we get
$$\mathrm{FVR}_{i,1} \geq \frac{\mathrm{Cov}(y_{i,t}, z_t)^2}{\mathrm{Var}(z_t)\,\mathrm{Var}(y_{i,t})} = \mathrm{Corr}(y_{i,t}, z_t)^2. \tag{8}$$
The lower bound corresponds to the population R-squared value in a regression of $y_{i,t}$ on $z_t$, that is, a regression which treats the IV as if it were a perfect measure of the shock $\varepsilon_{1,t}$ (up to scale). The attenuation bias imparted by the measurement error $v_t$ implies that this regression yields a lower bound on the true FVR.

Upper bound on shock importance.

To derive the upper bound on the importance of the shock (and on the amount of measurement error), or equivalently the lower bound on $\alpha$, first define $z_t^\dagger \equiv E(z_t \mid y_t)$ and $\varepsilon_{1,t}^\dagger \equiv E(\varepsilon_{1,t} \mid y_t)$. Then, by standard linear projection algebra, we must have
$$\mathrm{Var}(z_t^\dagger) = \alpha^2\,\mathrm{Var}(\varepsilon_{1,t}^\dagger) \leq \alpha^2\,\mathrm{Var}(\varepsilon_{1,t}) = \alpha^2.$$
Intuitively, $\alpha^2 = \mathrm{Var}(E(z_t \mid \varepsilon_{1,t}))$ is the explained sum of squares from a projection of $z_t$ on the shock $\varepsilon_{1,t}$. This must weakly exceed the explained sum of squares $\mathrm{Var}(z_t^\dagger) = \mathrm{Var}(E(z_t \mid y_t))$ from a projection of $z_t$ on $y_t$, simply because the variables in $y_t$ are effectively noisy measures of the shock $\varepsilon_{1,t}$, contaminated by the other structural shocks $\xi_t$, and are uncorrelated with $v_t$. In other words, the explanatory power of the variables $y_t$ for the IV $z_t$ puts a lower bound on the possible signal-to-noise ratio $\alpha^2/\sigma_v^2 = \alpha^2/(\mathrm{Var}(z_t) - \alpha^2)$. The inequality above binds when the shock is invertible ($\varepsilon_{1,t}^\dagger = \varepsilon_{1,t}$), i.e., when the macro observables $y_t$ explain as much of the variation in the IV as the shock $\varepsilon_{1,t}$ itself does.

Mapping the lower bound on $\alpha$ into an upper bound on the FVR via (7), we get
$$\mathrm{FVR}_{i,1} \leq \frac{\mathrm{Cov}(y_{i,t}, z_t)^2}{\mathrm{Var}(z_t^\dagger)\,\mathrm{Var}(y_{i,t})} = \frac{\mathrm{Cov}(y_{i,t}, z_t^\dagger)^2}{\mathrm{Var}(z_t^\dagger)\,\mathrm{Var}(y_{i,t})} = \mathrm{Corr}(y_{i,t}, z_t^\dagger)^2. \tag{9}$$
The second equality uses that the projection residual $z_t - z_t^\dagger$ is uncorrelated with $y_t$. The upper bound corresponds to treating the projection $z_t^\dagger = \alpha\,\varepsilon_{1,t}^\dagger$ of the IV on the macro observables as a perfect measure of the shock (up to scale). This is correct if the shock is indeed invertible ($\varepsilon_{1,t}^\dagger = \varepsilon_{1,t}$), but otherwise overstates the importance of the shock. Intuitively, unless the shock is in fact invertible, the upper bound mistakenly attributes too much of the lack of co-movement between $y_t$ and $z_t$ to measurement error (rather than to the actual limited importance of $\varepsilon_{1,t}$).

Whereas the lower bound (8) on the FVR for variable $i$ does not depend on the entire set of observed macro aggregates $y_t$, the upper bound (9) decreases (weakly) monotonically as we add more variables to the vector $y_t$.
In particular, the upper bound equals the trivial bound of 1 if there is only one observable ($n_y = 1$), since in this case we cannot rule out that the scalar time series $y_t$ is driven entirely by the first shock, with the imperfect correlation between $y_t$ and $z_t$ purely caused by measurement error. However, when $n_y \geq 2$, the upper bound is generally below 1. We present an analytical example in Section 6.1 that shows how the addition of extra observables helps sharpen identification, and clarifies the conditions under which we can expect the upper bound to be close to the true FVR.
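The bounds (8) and (9) require only the second moments of $(y_t', z_t)'$. The following sketch computes them from the population moments of the static model, for illustrative parameter values of our own choosing ($\Theta_{\bullet,1,0} = (1, 0.5)'$, $\Sigma_\xi = I_2$, $\alpha = \sigma_v = 1$); the function names are hypothetical. In this example the lower bound equals the true FVR times $\alpha^2/\mathrm{Var}(z_t) = 1/2$, while the upper bound equals the true FVR divided by the degree of invertibility $\mathrm{Var}(E(\varepsilon_{1,t} \mid y_t)) = 5/9$.

```python
import numpy as np

def static_moments(theta1, Sigma_xi, alpha, sigma_v):
    """Population moments of (y_t, z_t) for y_t = theta1*eps_1t + xi_t, z_t = alpha*eps_1t + sigma_v*v_t."""
    theta1 = np.asarray(theta1, dtype=float)
    var_y = np.outer(theta1, theta1) + np.asarray(Sigma_xi, dtype=float)
    cov_yz = alpha * theta1
    var_z = alpha**2 + sigma_v**2
    return var_y, cov_yz, var_z

def fvr_bounds(var_y, cov_yz, var_z):
    """Lower bound (8) = Corr(y_i, z)^2 and upper bound (9) = Corr(y_i, z_dagger)^2."""
    var_y = np.asarray(var_y, dtype=float)
    cov_yz = np.asarray(cov_yz, dtype=float)
    var_i = np.diag(var_y)
    var_zdag = cov_yz @ np.linalg.solve(var_y, cov_yz)   # Var(E(z_t | y_t))
    lower = cov_yz**2 / (var_z * var_i)
    upper = cov_yz**2 / (var_zdag * var_i)
    return lower, upper

# Illustrative parameters (assumptions, not estimates from the paper)
theta1 = np.array([1.0, 0.5])
var_y, cov_yz, var_z = static_moments(theta1, np.eye(2), alpha=1.0, sigma_v=1.0)
fvr_true = theta1**2 / np.diag(var_y)                     # [0.5, 0.2]
lower, upper = fvr_bounds(var_y, cov_yz, var_z)           # [0.25, 0.1] and [0.9, 0.36]

# With a single observable (n_y = 1) the upper bound is the trivial bound of 1
_, upper_scalar = fvr_bounds(var_y[:1, :1], cov_yz[:1], var_z)
```

Keeping only the first observable in the last line reproduces the trivial upper bound of 1 discussed above for $n_y = 1$.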

Identified set.

The bounds $\alpha^2 \in [\mathrm{Var}(z_t^\dagger), \mathrm{Var}(z_t)]$ are sharp, i.e., they exploit all information contained in the second moments of the data, in the following sense. Suppose we are given any non-singular variance-covariance matrix for the data $(y_t', z_t)'$, as well as any value of $\alpha^2$ in our interval. We can then choose appropriate values of the remaining parameters such that the model matches the given variance-covariance matrix of the data.

Under what conditions are the bounds on $\alpha$, and thus on the FVR, likely to be tight (i.e., close to the true value)? We can express the identified set for $1/\alpha^2$ in terms of the underlying model parameters as follows:
$$\frac{1}{\alpha^2} \in \left[\frac{\alpha^2}{\alpha^2 + \sigma_v^2} \times \frac{1}{\alpha^2},\; \frac{1}{\mathrm{Var}(E(\varepsilon_{1,t} \mid y_t))} \times \frac{1}{\alpha^2}\right].$$
The lower bound is closer to the true value $1/\alpha^2$ when the actual signal-to-noise ratio $\alpha^2/\sigma_v^2$ is larger, i.e., when the IV is stronger. The upper bound is closer to the true value when the degree of invertibility $R_0 = \mathrm{Var}(\varepsilon_{1,t}^\dagger) = \mathrm{Var}(E(\varepsilon_{1,t} \mid y_t))$ is larger, i.e., when the macro variables $y_t$ are more informative about the hidden shock $\varepsilon_{1,t}$. Finally, the identified set is never empty, and it collapses to a point only in the case of a perfect IV and invertibility.

Point identification.

Point identification obtains if the researcher assumes either that the IV is perfect ($\sigma_v = 0$), in which case the lower bound for the FVR binds, or that the shock of interest is invertible ($\varepsilon_{1,t}^\dagger = \varepsilon_{1,t}$), in which case the upper bound binds.

Footnote: Mathematically, when $y_t$ is a scalar, then $z_t^\dagger = E(z_t \mid y_t) \propto y_t$, so $\mathrm{Corr}(y_t, z_t^\dagger) = \pm 1$.

Footnote: This is achieved by the choices $\Theta_{\bullet,1,0} = \frac{1}{\alpha}\mathrm{Cov}(y_t, z_t)$, $\sigma_v^2 = \mathrm{Var}(z_t) - \alpha^2$, and $\Sigma_\xi = \mathrm{Var}(y_t) - \frac{1}{\alpha^2}\mathrm{Cov}(y_t, z_t)\,\mathrm{Cov}(y_t, z_t)'$. This choice of $\sigma_v^2$ is nonnegative since $\mathrm{Var}(z_t) \geq \alpha^2$, and Lemma 1 in Appendix A.2.1 implies that the choice of $\Sigma_\xi$ is a positive semidefinite matrix since $\alpha^2 \geq \mathrm{Var}(z_t^\dagger) = \mathrm{Cov}(z_t, y_t)\,\mathrm{Var}(y_t)^{-1}\,\mathrm{Cov}(y_t, z_t)$.

5.2 Dynamic model: shock scale

We now analyze identification in the general dynamic model of Section 2.1. The key idea in our proofs is to apply the logic of the static model frequency-by-frequency to the frequency domain representation of the data.

As in the static case, we begin in this section by characterizing the identified set for the scale parameter $\alpha$. While not economically interesting in itself, this scale parameter is ultimately key to identification of our actual parameters of interest. We maintain Assumptions 1 to 3 throughout, but for the moment consider the case of a single IV ($n_z = 1$), leaving the generalization to Section 5.5. That is, $z_t$ is a scalar and $\lambda = 1$ in equation (3). We write $\Sigma_v^{1/2} = \sigma_v \geq 0$, a scalar.
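Applying the static logic frequency by frequency requires the spectral density of the data. As a sketch, assume a finite-order VMA $y_t = \sum_{\ell=0}^{L}\Theta_\ell\varepsilon_{t-\ell}$ with unit-variance shocks and, for simplicity, an IV without dynamics, $z_t = \alpha\varepsilon_{1,t} + \sigma_v v_t$ (so that $\tilde z_t = z_t$). With the normalization $\mathrm{Var}(x_t) = \int_{-\pi}^{\pi} s_x(\omega)\,d\omega$, we then have $s_y(\omega) = \frac{1}{2\pi}\Theta(e^{-i\omega})\Theta(e^{-i\omega})^*$ and $s_{y\tilde z}(\omega) = \frac{\alpha}{2\pi}\Theta_{\bullet,1}(e^{-i\omega})$. The helper names and the MA coefficients below are hypothetical. The check confirms that integrating over frequencies recovers $\mathrm{Var}(y_t) = \sum_\ell \Theta_\ell\Theta_\ell'$ and $\mathrm{Cov}(y_t, \tilde z_t) = \alpha\Theta_{\bullet,1,0}$.

```python
import numpy as np

def ma_transfer(Theta, omega):
    """Theta(e^{-i omega}) for MA coefficients stored as an array of shape (L + 1, n_y, n_eps)."""
    phase = np.exp(-1j * omega * np.arange(Theta.shape[0]))
    return np.tensordot(phase, Theta, axes=(0, 0))

def spectra(Theta, alpha, omega):
    """Return s_y(omega) and s_{y z~}(omega) when z~_t = alpha * eps_1t + sigma_v * v_t."""
    Th = ma_transfer(Theta, omega)
    s_y = Th @ Th.conj().T / (2 * np.pi)
    s_yz = alpha * Th[:, 0] / (2 * np.pi)
    return s_y, s_yz

def integrate_over_frequencies(f, n_grid=4001):
    """Trapezoid rule for the integral of f(omega) over [-pi, pi]."""
    grid = np.linspace(-np.pi, np.pi, n_grid)
    vals = np.array([f(w) for w in grid])
    h = grid[1] - grid[0]
    return h * (vals.sum(axis=0) - 0.5 * (vals[0] + vals[-1]))

# Hypothetical MA(1) coefficients with n_y = n_eps = 2, and alpha = 0.7
Theta = np.array([[[1.0, 0.0], [0.5, 1.0]],
                  [[0.4, 0.2], [0.0, 0.3]]])
alpha = 0.7

var_y_spectral = np.real(integrate_over_frequencies(lambda w: spectra(Theta, alpha, w)[0]))
cov_yz_spectral = np.real(integrate_over_frequencies(lambda w: spectra(Theta, alpha, w)[1]))
var_y_direct = sum(Th @ Th.T for Th in Theta)   # sum_l Theta_l Theta_l'
cov_yz_direct = alpha * Theta[0][:, 0]          # Cov(y_t, z~_t) = alpha * Theta_{.,1,0}
```

The trapezoid rule is essentially exact here because the integrands are trigonometric polynomials over a full period.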

Preliminaries.

It will prove convenient to define the IV projection residual that removes any dependence on lagged observed variables:
$$\tilde z_t \equiv z_t - E(z_t \mid \{y_\tau, z_\tau\}_{-\infty<\tau<t}). \tag{10}$$

We again begin with a lower bound on shock importance (or the amount of measurement error), which corresponds to an upper bound on the scale parameter $\alpha$. As in the static model, we find
$$\alpha^2 \leq \alpha^2 + \sigma_v^2 = \mathrm{Var}(\tilde z_t) \equiv \alpha_{UB}^2. \tag{11}$$
Thus, once we look at the residualized IV in (10), the bound construction works as in the static case, with the boundary $\alpha = \alpha_{UB}$ corresponding to a perfect IV.

Upper bound on shock importance.

For the upper bound on shock importance (or the lower bound on $\alpha$), we apply a version of the argument from the static case to the joint spectrum of the data at every frequency. First, as in the static case, we define the projections of $\tilde z_t$ and $\varepsilon_{1,t}$, respectively, but now onto all leads and lags of the endogenous variables $y_t$:
$$\tilde z_t^\dagger \equiv E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau<\infty}), \tag{12}$$
$$\varepsilon_{1,t}^\dagger \equiv E(\varepsilon_{1,t} \mid \{y_\tau\}_{-\infty<\tau<\infty}).$$
Note that $\tilde z_t^\dagger = \alpha\,\varepsilon_{1,t}^\dagger$, since the measurement error $v_t$ is dynamically uncorrelated with $y_t$. Applying the same logic as in the static case at an arbitrary frequency $\omega \in [0, \pi]$, we have
$$s_{\tilde z^\dagger}(\omega) = \alpha^2 s_{\varepsilon_1^\dagger}(\omega) \leq \alpha^2 s_{\varepsilon_1}(\omega) = \alpha^2 \times \frac{1}{2\pi}. \tag{13}$$
The last equality uses that the shock $\varepsilon_{1,t}$ is white noise with variance 1. Similar to the static case, the inequality above arises because the "explained sum of squares" $s_{\varepsilon_1^\dagger}(\omega)$ from a frequency-specific projection of the shock $\varepsilon_{1,t}$ on all leads and lags of the macro observables $y_t$ cannot exceed the "total sum of squares" $s_{\varepsilon_1}(\omega)$. Exploiting the inequality (13) at all frequencies, we obtain the lower bound
$$\alpha^2 \geq 2\pi \sup_{\omega \in [0,\pi]} s_{\tilde z^\dagger}(\omega) \equiv \alpha_{LB}^2. \tag{14}$$
The bound binds if at some frequency $\omega \in [0, \pi]$ the observed macro aggregates are perfectly informative about the hidden shock $\varepsilon_{1,t}$. This is the natural dynamic, frequency-domain analogue of the condition in the static case, where we required the static $y_t$ to be perfectly informative about $\varepsilon_{1,t}$.
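The bounds (11) and (14) can be evaluated numerically once the spectrum is known. The sketch below keeps two simplifying assumptions of our own: $y_t$ follows a finite-order VMA, and the IV is static, $z_t = \alpha\varepsilon_{1,t} + \sigma_v v_t$, so that $\tilde z_t = z_t$. It uses $s_{\tilde z^\dagger}(\omega) = s_{y\tilde z}(\omega)^* s_y(\omega)^{-1} s_{y\tilde z}(\omega)$ and approximates the supremum on a frequency grid. For comparison, it also reports the integral $\int_{-\pi}^{\pi} s_{\tilde z^\dagger}(\omega)\,d\omega = \mathrm{Var}(\tilde z_t^\dagger)$, the quantity the paper uses in place of $\alpha_{LB}^2$ for estimation. The three examples and all function names are hypothetical illustrations: a news-type MA(1) with as many observables as shocks, a scalar series driven by two white-noise shocks, and two AR(1) components as in Section 6.2.

```python
import numpy as np

def s_zdag(Theta, alpha, omega):
    """Spectral density of z~_t^dagger = E(z~_t | all leads and lags of y)."""
    phase = np.exp(-1j * omega * np.arange(Theta.shape[0]))
    Th = np.tensordot(phase, Theta, axes=(0, 0))          # Theta(e^{-i omega})
    s_y = Th @ Th.conj().T / (2 * np.pi)
    s_yz = alpha * Th[:, 0] / (2 * np.pi)
    return np.real(s_yz.conj() @ np.linalg.solve(s_y, s_yz))

def alpha2_bounds(Theta, alpha, sigma_v, n_grid=2001):
    """Return (alpha_LB^2, integral of s_zdag over [-pi, pi], alpha_UB^2) for a static IV."""
    grid = np.linspace(0.0, np.pi, n_grid)
    s = np.array([s_zdag(Theta, alpha, w) for w in grid])
    alpha2_lb = 2 * np.pi * s.max()                       # eq. (14), sup taken on the grid
    h = grid[1] - grid[0]
    integral = 2 * h * (s.sum() - 0.5 * (s[0] + s[-1]))   # s is even in omega
    alpha2_ub = alpha**2 + sigma_v**2                     # eq. (11): Var(z~_t)
    return alpha2_lb, integral, alpha2_ub

# News-type MA(1): y_t = (1 + zeta L) Theta_0 eps_t with zeta = 2 (non-invertible)
Theta0 = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 1.0]]))
news = np.array([Theta0, 2.0 * Theta0])
lb_news, int_news, ub_news = alpha2_bounds(news, alpha=0.7, sigma_v=0.5)

# Scalar y_t = eps_1t + eps_2t: the shock is not recoverable
wn = np.array([[[1.0, 1.0]]])
lb_wn, int_wn, ub_wn = alpha2_bounds(wn, alpha=1.0, sigma_v=1.0)

# Two AR(1) components with rho_1 = 0.9, rho_2 = 0.5 (MA truncated at 400 lags)
lags = np.arange(400)
ar = np.stack([0.9**lags, 0.5**lags], axis=1)[:, None, :]   # shape (400, 1, 2)
lb_ar, int_ar, ub_ar = alpha2_bounds(ar, alpha=1.0, sigma_v=1.0)
```

In the news example, both the supremum and the integral return $\alpha^2$, consistent with recoverability despite non-invertibility. In the AR(1) example, the supremum is attained at frequency zero and exceeds the integral, which is why the integral bound is weaker.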
If the macro aggregates are in fact not perfectly informative about the shock at any frequency, then the lower bound attributes too much of the (frequency-by-frequency) lack of co-movement between $y_t$ and $z_t$ to measurement error.

The identified set.

The main theoretical result of this paper is that the above bounds $\alpha_{LB}$ and $\alpha_{UB}$ are sharp.

Proposition 1.

Let there be given a joint spectral density for $w_t = (y_t', \tilde z_t)'$, continuous and positive definite at every frequency, with $\tilde z_t$ unpredictable from $\{w_\tau\}_{-\infty<\tau<t}$ …

In practice, we do not recommend exploiting the sharp lower bound on $\alpha$ for estimation and inference. The reason is that $\alpha_{LB}$ in equation (14) equals the supremum of a function, which depends on the spectral density …

Footnote: The proposition does not cover the knife-edge case $\alpha = \alpha_{LB}$ due to economically inessential technicalities.

…
$$\underline{\alpha}^2 \equiv \mathrm{Var}(\tilde z_t^\dagger) = \int_{-\pi}^{\pi} s_{\tilde z^\dagger}(\omega)\,d\omega \leq 2\pi \sup_{\omega \in [0,\pi]} s_{\tilde z^\dagger}(\omega) = \alpha_{LB}^2. \tag{15}$$
Since $\underline{\alpha}^2$ is given by an integral of the spectrum as opposed to a supremum, its point estimator defined in Section 3 is consistent and asymptotically normal, as shown in Appendix B.7. Since $\mathrm{Var}(\tilde z_t^\dagger) = \alpha^2\,\mathrm{Var}(\varepsilon_{1,t}^\dagger) = \alpha^2 \times R_\infty$, the weaker lower bound $\underline{\alpha}$ will nevertheless be close to the truth if the shock of interest is close to being recoverable ($R_\infty \approx 1$). … $\underline{\alpha}$ binds in a model with news shocks, which cannot be analyzed using conventional SVAR-IV methods that assume invertibility.

Given the identified set for $\alpha$, it is now straightforward to derive identified sets for variance decompositions as well as the degree of invertibility and recoverability.

Variance decompositions.

The FVR satisﬁes

$$\mathrm{FVR}_{i,\ell} = \frac{\sum_{m=0}^{\ell-1} \Theta_{i,1,m}^2}{\mathrm{Var}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\leq t})} = \frac{1}{\alpha^2} \times \frac{\sum_{m=0}^{\ell-1} \mathrm{Cov}(y_{i,t}, \tilde z_{t-m})^2}{\mathrm{Var}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\leq t})}.$$
Hence, as in the static case, the identified set for $\mathrm{FVR}_{i,\ell}$ equals the identified set for $1/\alpha^2$, scaled by the (point-identified) second fraction on the far right-hand side above. This yields the estimated bounds defined in Section 3 (which use the weaker bound $\underline{\alpha}$ in place of $\alpha_{LB}$). As discussed previously, and as in the static case, the lower bound for the FVR depends on the strength of the IV, and the upper bound on the FVR depends on the informativeness of the macro variables for the shock of interest. Adding more variables to the vector $y_t$ of endogenous observables always leads to a weakly narrower identified set (in percentage terms, since the parameter $\mathrm{FVR}_{i,\ell}$ itself also changes when we change the vector $y_t$). Unlike in the static case, the upper bound in the dynamic case is generally below 1 even if we only observe a single macro time series ($n_y = 1$), as shown by example in Section 6.2.

Footnote: Methods from the moment inequality literature could be applied to develop confidence intervals that exploit our sharp lower bound $\alpha_{LB}$ (Andrews & Shi, 2013, 2017; Chernozhukov et al., 2013). We leave this more complicated option to future work. Alternatively, if researchers have a strong a priori reason to believe that the shock is likely to be particularly important at certain frequencies, then they may fix frequency bounds $[\underline{\omega}, \bar{\omega}]$ and compute the integral in (15) by integrating over this interval only.

An analogous argument yields the identified set for the frequency-specific unconditional variance decomposition (VD), while the sharp bounds for the FVD require somewhat more work. We state these results in Appendix B.1.

Degree of invertibility & recoverability.

The definition (4) of $R_\ell$ implies
$$R_\ell = \frac{1}{\alpha^2} \times \mathrm{Var}\big(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau\leq t+\ell})\big). \tag{16}$$
Since the variance on the right-hand side above is point-identified, the identified sets for the degree of invertibility ($\ell = 0$) and the degree of recoverability ($\ell = \infty$) follow immediately from the identified set for $\alpha$. This yields the bound estimates for $R_0$ and $R_\infty$ in Section 3 (which use the weaker bound $\underline{\alpha}$ in place of $\alpha_{LB}$). From the sharp bounds on $R_0$ and $R_\infty$, we can also derive testable conditions under which the distribution of the observable data is consistent with invertibility or recoverability.

Proposition 2.

Assume $\alpha_{LB} > 0$. The identified set for $R_0$ contains 1 if and only if the instrument residual $\tilde z_t$ does not Granger cause the macro observables $y_t$. The identified set for $R_\infty$ contains 1 if and only if the projection $\tilde z_t^\dagger$ is serially uncorrelated.

According to Proposition 2, $\varepsilon_{1,t}$ is certain to be non-invertible if and only if $\tilde z_t$ Granger causes $y_t$ (which is equivalent to the condition that $z_t$ Granger causes $y_t$). This result is the basis for the pre-test of invertibility in Section 3. Note, however, that a finding of Granger non-causality need not imply that $R_0 = 1$; the identified set for $R_0$ always includes values below 1. Proposition 2 additionally implies that $\varepsilon_{1,t}$ is certain to be non-recoverable if and only if $\tilde z_t^\dagger$, defined in (12), is serially correlated at some lag.

Absolute impulse responses.

For completeness, we note that the identified set for the absolute impulse response $\Theta_{i,1,\ell}$ is obtained by scaling the identified set for $1/\alpha$, cf. equation (5). This extends existing results on the point identification of relative impulse responses (Stock & Watson, 2018), as discussed at the end of Section 2.

Footnote: We leave the development of a practical statistical test of recoverability to future research.

5.4 Dynamic model: point identification

As we have seen, without further restrictions, our various parameters of interest are only interval-identified, albeit with informative bounds. In this section we complement those results by stating a menu of sufficient conditions, each of which guarantees point identification of the FVR and historical decompositions.
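Both $\mathrm{FVR}_{i,\ell}$ and $R_\ell$ equal a point-identified quantity divided by $\alpha^2$ (the FVR formula in Section 5.3 and equation (16)). Each sufficient condition below therefore works by pinning $\alpha^2$ at one end of its identified set. The following minimal sketch illustrates that rescaling with made-up inputs. The function name is hypothetical, and clipping upper ends at 1 is our own choice, in the spirit of how Figure 1 forces displayed bounds into [0, 1].

```python
import numpy as np

def interval_from_alpha2(numerator, alpha2_lo, alpha2_hi):
    """Identified interval for a parameter of the form numerator / alpha^2.

    numerator: point-identified quantity, e.g. sum_m Cov(y_it, z~_{t-m})^2 divided by
               the forecast error variance (FVR), or Var(E(z~_t | y up to t + l)) (R_l).
    alpha2_lo: lower bound on alpha^2 (alpha_LB^2, or the weaker integral bound).
    alpha2_hi: upper bound on alpha^2, i.e. alpha_UB^2 = Var(z~_t).
    Upper ends are clipped at 1 because both parameters are shares.
    """
    return numerator / alpha2_hi, min(1.0, numerator / alpha2_lo)

# Hypothetical inputs: Cov(y_it, z~_{t-m}) for m = 0, 1 and the 2-step forecast error variance
covs = np.array([0.8, 0.4])
fev = 2.0
fvr_num = np.sum(covs**2) / fev                 # = 0.4
alpha2_lo, alpha2_hi = 0.5, 1.25

fvr_set = interval_from_alpha2(fvr_num, alpha2_lo, alpha2_hi)   # (0.32, 0.8)

# Perfect IV (sigma_v = 0): alpha^2 = alpha_UB^2, so the lower end is the FVR
fvr_perfect_iv = fvr_num / alpha2_hi
# Recoverability: the lower bound on alpha^2 equals alpha^2, so the upper end is the FVR
fvr_recoverable = fvr_num / alpha2_lo
```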

Informative instruments.

Point identification obtains if the researcher is willing to assume that the instrument is perfect, i.e., $\sigma_v = 0$. In this case the lower bounds on the FVR and the degree of invertibility/recoverability bind. Indeed, since the instrument equals the shock up to scale, $\tilde z_t = \alpha\varepsilon_{1,t}$, the FVR and historical decompositions are easily computed through regressions (Jordà, 2005; Gorodnichenko & Lee, 2020). Note that the assumption that the IV is perfect is not testable.

Informative macro aggregates.

The second set of sufficient conditions relates to the informativeness of the macro aggregates $y_t$ for the hidden shock $\varepsilon_{1,t}$. In this category, our weakest condition for point identification is that the data $y_t$ are perfectly informative about $\varepsilon_{1,t}$ at some frequency, i.e., the spectral density of the projection residual $\varepsilon_{1,t} - \varepsilon_{1,t}^\dagger$ vanishes at some frequency $\omega$. Then $\alpha = \alpha_{LB}$, so the FVR and the degree of invertibility/recoverability are identified. This assumption is not testable.

A stronger but more easily interpretable assumption is recoverability, i.e., $\varepsilon_{1,t}^\dagger \equiv E(\varepsilon_{1,t} \mid \{y_\tau\}_{-\infty<\tau<\infty}) = \varepsilon_{1,t}$. This assumption is testable, cf. Proposition 2. As explained in Section 2.2, recoverability is restrictive, but it is a meaningfully weaker requirement than invertibility in many economic applications, such as in the news shock model in Section 6.3 below. In particular, it is satisfied whenever there are as many shocks as variables, $n_\varepsilon = n_y$. Under recoverability, the shock itself can be identified as $\varepsilon_{1,t} = \alpha^{-1}\tilde z_t^\dagger$, so the historical decomposition $E(y_{i,t} \mid \{\varepsilon_{1,\tau}\}_{-\infty<\tau\leq t}) = E(y_{i,t} \mid \{\tilde z_\tau^\dagger\}_{-\infty<\tau\leq t})$ is also identified.

To conclude, we briefly extend the analysis to a model with multiple IVs for the shock of interest ($n_z \geq 2$). … $\tilde z_t \equiv z_t - E(z_t \mid \{y_\tau, z_\tau\}_{-\infty<\tau<t})$ …

… 0. Importantly, even with the amount of measurement error unknown, the IV $z_t$ reveals the signs and the relative magnitudes of the co-movement in observables induced by monetary shocks: the shock $\varepsilon_{1,t}$ moves inflation and interest rates in opposite directions (Uhlig, 2005), while the unconditional correlation of interest rates and inflation is $\rho$.

We consider three instructive special cases. For the first two, we set $\zeta = 1$; applying our identification analysis, we then get the bounds
$$\mathrm{FVR}_{i,1} \leq \tfrac{1}{2}(1 - \rho), \qquad i = 1, 2.$$
Now suppose first that $\rho = -1$; that is, interest rates and inflation are not just perfectly negatively correlated conditional on monetary shocks, but also unconditionally. In that case the upper bound for the FVR of both variables equals 1: the data cannot rule out that the correlation of the IV with macro observables is imperfect purely because of measurement error. Second, suppose that $\rho = 1$; that is, interest rates and inflation are perfectly positively correlated in the data. Then our upper bound for the FVR suddenly equals zero: the monetary shock induces co-movement patterns that we never see in the data, so it cannot possibly explain any observed macro fluctuations. Third, if instead $\rho = 0$, then our upper bounds are, for any $\zeta \geq 0$,
$$\mathrm{FVR}_{1,1} \leq \frac{1}{1 + \zeta^2}, \qquad \mathrm{FVR}_{2,1} \leq \frac{\zeta^2}{1 + \zeta^2}.$$
Suppose now that $\zeta \gg 1$. Then the upper bound on the inflation variance decomposition

$\mathrm{FVR}_{1,1}$ is very small. Intuitively, since the IV reveals that the monetary shock moves interest rates by much more than inflation, but both have the same unconditional variance, the monetary shock cannot possibly be an important driver of inflation. This third example rationalizes the findings in our application to monetary shocks in Section 4: the IV $z_t$ correlates much more with interest rates than with prices, yet prices are not commensurately less volatile than interest rates, so monetary shocks cannot account for much of the volatility in prices.

This example shows that our upper bound on the FVR is close to the true value if either the shock is very prominent (so that the bound of one is not far from the truth) or the shock induces somewhat atypical co-movements of the various observed macro aggregates. This second condition is equivalent to the shock being prominent for some linear combination of the macro observables $y_t$, which is equivalent to the shock being nearly invertible in this static model. Thus, the preceding arguments agree with the analysis in Section 5.1.

Whereas the previous example illustrated how the availability of several macro time series sharpens identification in a static context, we now show how the dynamics of individual time series can do the same. Consider the univariate but dynamic model
$$y_t = \sum_{j=1}^{n_\varepsilon} \sum_{\ell=0}^{\infty} \rho_j^\ell\,\varepsilon_{j,t-\ell}$$
with $n_y = 1$. That is, we observe a single variable $y_t$ driven by $n_\varepsilon$ independent AR(1) processes. To fix ideas, we think of $\varepsilon_{1,t}$ as a technology shock and $y_t$ as aggregate output.

Now assume that long-run fluctuations in output $y_t$ are exclusively driven by the technology shock $\varepsilon_{1,t}$; that is, consider the limit $\rho_1 \to 1$, while fixing $|\rho_j| < 1$ for $j \geq 2$. In this case, the sharp lower bound on $\alpha$ converges to the truth:
$$\lim_{\rho_1 \to 1} \alpha_{LB}^2 = \lim_{\rho_1 \to 1} 2\pi \sup_{\omega \in [0,\pi]} s_{\tilde z^\dagger}(\omega) = \lim_{\rho_1 \to 1} 2\pi\, s_{\tilde z^\dagger}(0) = \alpha^2.$$
Intuitively, at spectral frequency zero, all fluctuations in $y_t$ are driven by the technology shock. A low-pass filtered version of $y_t$ therefore isolates the fluctuations caused by the technology shock. Leads and lags of this low-pass filtered series are thus highly correlated with the IV, putting a lower bound on the signal-to-noise ratio in the IV. This is why the sharp upper bound for the FVR converges to the true value.

Footnote: Note that $s_{y\tilde z}(0) = \frac{\alpha}{2\pi}\sum_{\ell=0}^{\infty}\rho_1^\ell = \frac{\alpha}{2\pi} \times \frac{1}{1-\rho_1}$ and $s_y(0) = \frac{1}{2\pi}\sum_{j=1}^{n_\varepsilon}\big(\sum_{\ell=0}^{\infty}\rho_j^\ell\big)^2 = \frac{1}{2\pi}\sum_{j=1}^{n_\varepsilon}\frac{1}{(1-\rho_j)^2}$. Footnote 16 then implies $2\pi s_{\tilde z^\dagger}(0) = 2\pi\,s_{y\tilde z}(0)^2/s_y(0) = 2\pi\big((1-\rho_1)\,s_{y\tilde z}(0)\big)^2/\big((1-\rho_1)^2 s_y(0)\big) \to \alpha^2$ as $\rho_1 \to 1$.

The example reveals that cross-restrictions over time can be highly informative even if the shock of interest is neither invertible nor recoverable. Intuitively, for the sharp upper bound on the FVR to bind, our method only needs the shock to dominate at some frequency; the across-frequency restrictions then do the rest, exactly like the cross-variable restrictions in the static example above.

In the third example, we show how our method deals with non-invertible news shocks. First discussed in Pigou (1927), news shocks have recently received much attention as drivers of macroeconomic fluctuations (Beaudry & Portier, 2006, 2014; Jaimovich & Rebelo, 2009; Schmitt-Grohé & Uribe, 2012). Unfortunately, foresight of economic agents complicates conventional SVAR-based analysis since it induces equilibria with non-invertible MA representations (Leeper et al., 2013). In contrast, our methods are valid irrespective of invertibility.

To illustrate, consider a moving average model of order 1 with $n_y = n_\varepsilon = 2$:
$$y_t = (1 + \zeta L)\,\Theta_0\,\varepsilon_t,$$
where $\zeta > 1$. As is well known, this assumption implies that the moving average representation is non-invertible. We think of $\varepsilon_{1,t}$ as a monetary forward guidance shock: the shock moves inflation and nominal interest rates by more tomorrow (when the shock directly hits the monetary policy rule) than today (when the news is revealed).

The conventional SVAR-IV approach mis-measures the FVR because of non-invertibility. By standard arguments (e.g., Leeper et al., 2013) the reduced-form VAR residuals equal $u_t \equiv y_t - E(y_t \mid \{y_\tau\}_{-\infty<\tau<t})$ …

Footnote: … 1, unless $\varepsilon_{1,t}$ is recoverable. However, as discussed in Footnote 18, researchers may leverage a strong prior belief about the low-frequency importance of shocks by computing the integral in (15) for a pre-specified range of (low) frequencies.

… $R_0 = \zeta^{-2}$. Since SVAR procedures assume that the structural shocks $\varepsilon_t$ can be obtained as linear functions of the reduced-form residuals $u_t$, equation (19) shows that any SVAR analysis will conflate the explanatory power of the shock $\varepsilon_{1,t}$ with that of its lags. As a consequence, Appendix B.4 shows that the SVAR-IV estimand of the FVR overstates the contribution of the shock to one-step-ahead forecasts:
$$\mathrm{FVR}^{\text{SVAR-IV}}_{1,1} = \frac{1}{R_0} \times \mathrm{FVR}_{1,1} > \mathrm{FVR}_{1,1}.$$
Clearly, the population bias of the SVAR-IV estimand worsens as the degree of invertibility $R_0$ decreases to 0 (see also Forni et al., 2019).

In contrast, our identification bounds are valid irrespective of invertibility, since we do not assume that $\varepsilon_{1,t}$ can be recovered as a function of only the contemporaneous VAR residuals $u_t$. In fact, in this model with as many observables as shocks, both shocks $\varepsilon_t = (\varepsilon_{1,t}, \varepsilon_{2,t})'$ are recoverable. Hence, if we exploit this knowledge, we can even point-identify the shock as $\varepsilon_{1,t} \propto z_t^\dagger = E(z_t \mid \{y_\tau\}_{-\infty<\tau<\infty})$. The key is that our method can use the future values of nominal rates and inflation, $y_\tau$, $\tau \geq t$, to recover the forward guidance shock $\varepsilon_{1,t}$ at time $t$. In so doing, it effectively realigns the information sets of the economic agents and of the econometrician, sidestepping the invertibility problem.

We finish by showing that our inference procedures have good finite-sample performance in simulations. Our methods continue to work well in non-invertible models, unlike the conventional SVAR-IV procedure.

DGP.

We adopt a variant of the DGP in Kilian & Kim (2011) and assume that the macro aggregates $y_t$ follow a structural VARMA($p$,1) model:
$$y_t = \sum_{\ell=1}^{p} \Xi_\ell\, y_{t-\ell} + \Theta_0(\varepsilon_t + \zeta\varepsilon_{t-1}).$$

Footnote: In particular, $\varepsilon_t = R_0\,\Theta_0^{-1}u_t - (1 - R_0)\sum_{\ell=1}^{\infty}(-\zeta)^{-\ell}\,\Theta_0^{-1}u_{t+\ell}$.
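In the news-shock model of Section 6.3, the weight on the contemporaneous residual in this expression, and hence the degree of invertibility, is $R_0 = \zeta^{-2}$. The following quick numerical check rests on a simplifying assumption of ours: a scalar MA(1) $y_t = \eta_t + \zeta\eta_{t-1}$ with unit-variance $\eta_t$, which suffices because each component of the vector model shares this lag polynomial. The check projects $\eta_t$ on $(y_t, y_{t-1}, \dots, y_{t-M})$ using the MA(1) autocovariances and compares the explained variance with $\zeta^{-2}$. The function name and the truncation $M$ are hypothetical choices.

```python
import numpy as np

def degree_of_invertibility_ma1(zeta, M=200):
    """Var(E(eta_t | y_t, ..., y_{t-M})) for y_t = eta_t + zeta*eta_{t-1}, Var(eta_t) = 1."""
    n = M + 1
    gamma0, gamma1 = 1.0 + zeta**2, zeta          # MA(1) autocovariances
    Gamma = (np.diag(np.full(n, gamma0))
             + np.diag(np.full(n - 1, gamma1), 1)
             + np.diag(np.full(n - 1, gamma1), -1))
    c = np.zeros(n)
    c[0] = 1.0                                    # Cov(eta_t, y_{t-k}) = 1 if k == 0 else 0
    return c @ np.linalg.solve(Gamma, c)

r0_news = degree_of_invertibility_ma1(2.0)        # close to 1 / zeta**2 = 0.25
r0_inv = degree_of_invertibility_ma1(0.5)         # invertible case: close to 1
overstatement = 1.0 / r0_news                     # factor 1/R_0 by which SVAR-IV inflates FVR_{1,1}
```

With $\zeta = 2$, the SVAR-IV estimand of the one-step FVR would therefore be about four times too large in this model.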

We consider $n_y = 2$ macro variables and $p = 1$ autoregressive lag (with one exception discussed below), and set
$$\Xi_1 = \begin{pmatrix} \rho_y & \cdots \\ \cdots & \cdots \end{pmatrix}.$$
For the MA part, we consider $n_\varepsilon = 2$ shocks (which are thus both recoverable) and set
$$\Theta_0 = \mathrm{chol}\begin{pmatrix} \cdots & \cdots \\ \cdots & \cdots \end{pmatrix},$$
where "chol" denotes the lower triangular Cholesky decomposition. As in Section 6.3, $\zeta$ is a scalar parameter that governs the degree of invertibility, with $\zeta > 1$ implying non-invertibility. … $z_t$ for the shock of interest $\varepsilon_{1,t}$:
$$z_t = \rho_z z_{t-1} + \rho_{zy}(y_{1,t-1} + y_{2,t-1}) + \varepsilon_{1,t} + \sigma_v v_t.$$
Notice that we have normalized $\alpha = 1$. Finally, the measurement error and structural shocks are i.i.d. Gaussian and orthogonal as in Assumption 3.

We run Monte Carlo experiments for nine different parameterizations of the above DGP. Specifically, we consider various deviations from a baseline parametrization. In our benchmark, we set $\rho_y = 0.\ldots$, $\rho_z = \rho_{zy} = 0$, $\zeta = 0$, $\sigma_v = 1$, and sample size $T = 250$. We then consider variations with more autoregressive persistence (either $\rho_y = 0.9$, or positive $\rho_z$ and $\rho_{zy}$), different degrees of invertibility (a value of $\zeta$ between 0 and 1, or $\zeta = 2$), a weaker instrument ($\sigma_v = 2$), and different sample sizes ($T = 100$, $T = 500$). Finally, we allow for richer dynamics, with $p = 4$ and lag matrices $\Xi_j$ proportional to $\Xi_1$ for $j = 2, 3, 4$.

Results.

Our parameters of interest are the degree of invertibility $R_0$ and the FVR for variable $y_{1,t}$ at horizons 1 and 4. We conduct 5,000 Monte Carlo repetitions per DGP, and construct confidence intervals at the 90% level using 1,000 bootstrap draws per simulation. We use a homoskedastic recursive residual bootstrap. The reduced-form VAR lag length is selected using AIC, and we use Hall's percentile bootstrap confidence interval, cf. Section 3.

Table 2 shows that the partial identification robust SVMA-IV confidence sets defined in Section 3 achieve coverage rates close to or exceeding the desired level of 90% throughout. We report coverage rates both for the population identified sets (columns "Set") and for the underlying parameters (columns "Param"). The coverage rate for the parameter is never below 86.8% in any case. The coverage rate for the identified set is mostly close to 90% and at …

Table 2: Monte Carlo study: Coverage rates of confidence intervals. Columns: true parameter values ($R_0$, $\mathrm{FVR}_{1,1}$, $\mathrm{FVR}_{1,4}$); coverage for $R_0$ (Set, Param); coverage for $\mathrm{FVR}_{1,1}$ (Set, Param, SVAR); coverage for $\mathrm{FVR}_{1,4}$ (Set, Param, SVAR). Rows: Baseline; $\rho_y = 0.9$; $\rho_z$, $\rho_{zy}$ variation; two $\zeta$ variations; $\sigma_v = 2$; $T = 100$; $T = 500$; $p = 4$. Coverage rates of 90% confidence intervals, constructed as in Section 3, using 1,000 bootstrap iterations for each Monte Carlo experiment, and 5,000 Monte Carlo experiments per DGP. The DGPs (along rows) are described in the text. "Set": probability that the SVMA-IV confidence interval covers the entire identified set. "Param": probability that the SVMA-IV confidence interval covers the parameter. "SVAR": probability that the conventional SVAR-IV confidence interval covers the parameter.
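The two resampling ingredients named above are a homoskedastic recursive residual VAR bootstrap and Hall's percentile interval $[2\hat\theta - q_{1-a/2},\, 2\hat\theta - q_{a/2}]$, where $q$ denotes quantiles of the bootstrap estimates. The following compact sketch shows both. To stay short, it assumes a VAR(1) with intercept, with no AIC lag selection and no bias correction, and it accepts a generic statistic. `var1_fit`, the statistic, and the simulated data are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def var1_fit(W):
    """OLS for W_t = c + A W_{t-1} + u_t; returns (c, A, residuals)."""
    X = np.hstack([np.ones((W.shape[0] - 1, 1)), W[:-1]])
    coef, *_ = np.linalg.lstsq(X, W[1:], rcond=None)
    resid = W[1:] - X @ coef
    return coef[0], coef[1:].T, resid

def recursive_residual_bootstrap(W, stat, n_boot=1000, level=0.90, seed=0):
    """Hall's percentile interval for stat(W) via a homoskedastic recursive residual VAR(1) bootstrap."""
    rng = np.random.default_rng(seed)
    c, A, resid = var1_fit(W)
    resid = resid - resid.mean(axis=0)                 # centered residuals
    theta_hat = stat(W)
    T = W.shape[0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        u = resid[rng.integers(0, resid.shape[0], size=T - 1)]  # i.i.d. resampling
        Wb = np.empty_like(W)
        Wb[0] = W[0]                                   # keep the observed initial value
        for t in range(1, T):
            Wb[t] = c + A @ Wb[t - 1] + u[t - 1]       # rebuild the series recursively
        draws[b] = stat(Wb)
    a = 1.0 - level
    q_lo, q_hi = np.quantile(draws, [a / 2, 1 - a / 2])
    return theta_hat, (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)

# Illustrative use: interval for the (1, 1) autoregressive coefficient of a simulated VAR(1)
rng = np.random.default_rng(1)
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
W = np.zeros((250, 2))
for t in range(1, 250):
    W[t] = A_true @ W[t - 1] + rng.standard_normal(2)
est, ci = recursive_residual_bootstrap(W, lambda X: var1_fit(X)[1][0, 0], n_boot=200)
```

In the paper's procedure, the statistic would be a bound on the FVR or on $R_0$ computed from the re-estimated VAR, rather than a single VAR coefficient.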
… $\zeta = 2$), whereas the SVAR-IV procedure under-covers severely in this case.

We make the following additional remarks. First, coverage deteriorates slightly with noisier/weaker instruments ($\sigma_v = 2$), as expected. Our inference methods are not robust to arbitrarily weak instruments ($\sigma_v \to \infty$); we leave this issue to future work. Second, we face some well-known parameter-at-the-boundary issues. For most experiments, $R_0 = 1$. This explains the over-coverage of confidence intervals for this parameter and, to a lesser extent, for the overall identified set. Similar problems would arise if the true FVR were close to 0. Third, for more persistent DGPs, the AIC tends to select an insufficient number of lags, resulting in moderate under-coverage, in particular for the FVRs at horizon 4. For example, in the experiment with $p = 4$ autoregressive lags, the AIC selects an average lag length of 2.…

Applied macroeconomists have recently turned to external sources of exogenous variation to identify dynamic causal effects. Though such external instruments or proxies are frequently used to estimate impulse responses, existing methods did not allow researchers to quantify the contribution of individual shocks to business-cycle fluctuations, a question of first-order interest in traditional business-cycle analysis. We fill this gap by providing identification results and inference techniques for variance decompositions, historical decompositions, and the degree of invertibility. Our methods require neither the absence of measurement error in the external instrument, nor the often dubious assumption that the instrumented shock is invertible (as assumed in conventional SVAR analysis). We prove that the importance of the instrumented shock is generally interval-identified. Point identification can be achieved if the shock is known to be recoverable, a substantively weaker assumption than invertibility. We provide a software package that implements all steps of our inference procedures. Applying our method to U.S.
data, we are able to establish a tight upper bound on the importance ofmonetary shocks for recent inﬂation dynamics, despite our weak identifying assumptions. We acknowledge, however, that in DGPs with only mild non-invertibility, SVAR-IV procedures maybe preferable to our more robust SVMA-IV procedure, since the former procedure has fewer parameters toestimate and will be only mildly biased (cf. Appendix B.4). Appendix

A.1 Formulas for estimation and inference

Here we provide the remaining formulas needed for the inference procedures in Section 3. Let $\hat A_1, \dots, \hat A_p$ denote the $(n_y+1)\times(n_y+1)$ coefficient matrix estimates for the VAR in $W_t = (y_t', z_t)'$. Let $\hat\Sigma$ denote the residual sample variance-covariance matrix. Let $\hat\Sigma^{1/2}$ denote any square matrix such that $\hat\Sigma^{1/2}\hat\Sigma^{1/2\prime} = \hat\Sigma$, e.g., the Cholesky factor. Compute the moving average coefficients $\hat B(L) \equiv (I_{n_y+1} - \sum_{\ell=1}^{p}\hat A_\ell L^\ell)^{-1}\hat\Sigma^{1/2}$ using the familiar recursion
$$\hat B_0 = \hat\Sigma^{1/2}, \qquad \hat B_h = \sum_{\ell=1}^{\min\{h,p\}} \hat A_\ell \hat B_{h-\ell}, \quad h \ge 1.$$
Denote the top $n_y$ rows of $\hat B_h$ by $\hat B_{y,h}$ and the bottom row by $\hat B_{z,h}$. Then
$$\widehat{\mathrm{Var}}(\tilde z_t) \equiv \hat B_{z,0}\hat B_{z,0}', \qquad \widehat{\mathrm{Cov}}(\tilde z_t, y_{t+h}) \equiv \begin{cases} \hat B_{z,0}\hat B_{y,h}' & \text{if } h \ge 0, \\ 0_{1\times n_y} & \text{otherwise}, \end{cases} \qquad \widehat{\mathrm{Cov}}(y_t, y_{t-h}) \equiv \sum_{\ell=0}^{\infty}\hat B_{y,\ell}\hat B_{y,\ell+h}' \ \text{ for } h \ge 0.$$
In practice, we truncate the infinite sum above at a large value of $\ell$. Define now the projection variances
$$\widehat{\mathrm{Var}}(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau<\infty})) \equiv \hat\Sigma_{\tilde z,y,(M,M)}\hat\Sigma_{y,(M,M)}^{-1}\hat\Sigma_{\tilde z,y,(M,M)}',$$
$$\widehat{\mathrm{Var}}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t}) \equiv \widehat{\mathrm{Var}}(y_{i,t}) - \big(\widehat{\mathrm{Cov}}(y_{i,t+\ell}, y_t), \dots, \widehat{\mathrm{Cov}}(y_{i,t+\ell}, y_{t-M})\big)\hat\Sigma_{y,(M,0)}^{-1}\big(\widehat{\mathrm{Cov}}(y_{i,t+\ell}, y_t), \dots, \widehat{\mathrm{Cov}}(y_{i,t+\ell}, y_{t-M})\big)',$$
$$\widehat{\mathrm{Var}}(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau\le t})) \equiv \big(\widehat{\mathrm{Cov}}(\tilde z_t, y_t), 0_{1\times n_y M}\big)\hat\Sigma_{y,(M,0)}^{-1}\big(\widehat{\mathrm{Cov}}(\tilde z_t, y_t), 0_{1\times n_y M}\big)',$$
where $\hat\Sigma_{\tilde z,y,(M,M)}$ is the estimated covariance vector of $\tilde z_t$ and $(y_{t+M}', \dots, y_t', \dots, y_{t-M}')'$ obtained by stacking the estimates $\widehat{\mathrm{Cov}}(\tilde z_t, y_{t+h})$ defined above, $\hat\Sigma_{y,(M,M)}$ is similarly the estimated variance-covariance matrix of $(y_{t+M}', \dots, y_t', \dots, y_{t-M}')'$, and $\hat\Sigma_{y,(M,0)}$ is the estimated variance-covariance matrix of $(y_t', y_{t-1}', \dots, y_{t-M}')'$. In these formulas, the integer $M$ is a numerical truncation parameter.
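The recursion and second-moment formulas above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' software package: the function names are hypothetical, the Cholesky factor is used as $\hat\Sigma^{1/2}$, infinite sums are truncated at a user-chosen lag `H`, and autocovariances are stored under the convention $\Gamma(h) = E[y_t y_{t-h}'] = \sum_\ell \hat B_{y,\ell+h}\hat B_{y,\ell}'$. Only the one-sided projection variance $\widehat{\mathrm{Var}}(E(\tilde z_t \mid y_t, \dots, y_{t-M}))$ is computed; the other projection variances follow the same block-Toeplitz pattern.

```python
import numpy as np


def ma_coefficients(A, Sigma, H):
    """Orthogonalized MA coefficients B_0, ..., B_H of the VAR in W_t = (y_t', z_t)'.

    A: (p, k, k) array with A[l - 1] the lag-l coefficient matrix; Sigma: (k, k).
    B_0 = Cholesky factor of Sigma, B_h = sum_{l=1}^{min(h, p)} A_l B_{h-l}.
    """
    p, k, _ = A.shape
    B = np.zeros((H + 1, k, k))
    B[0] = np.linalg.cholesky(Sigma)
    for h in range(1, H + 1):
        for l in range(1, min(h, p) + 1):
            B[h] += A[l - 1] @ B[h - l]
    return B


def second_moments(B, n_y):
    """Var(z~_t), Cov(z~_t, y_{t+h}) for h = 0..H, and Gamma(h) = E[y_t y_{t-h}'] (truncated at H)."""
    H = B.shape[0] - 1
    B_y = B[:, :n_y, :]          # top n_y rows of each B_h
    b_z0 = B[0, n_y, :]          # bottom row of B_0 (z~_t is the z-equation innovation)
    var_z = b_z0 @ b_z0
    cov_z_y = np.array([B_y[h] @ b_z0 for h in range(H + 1)])
    gamma = np.array([sum(B_y[l + h] @ B_y[l].T for l in range(H + 1 - h))
                      for h in range(H + 1)])
    return var_z, cov_z_y, gamma


def var_proj_on_past(cov_z_y, gamma, M):
    """Var(E(z~_t | y_t, ..., y_{t-M})), using Cov(z~_t, y_{t-k}) = 0 for k >= 1."""
    n_y = gamma.shape[1]
    S = np.zeros(((M + 1) * n_y, (M + 1) * n_y))   # Var of (y_t', ..., y_{t-M}')'
    for a in range(M + 1):
        for b in range(M + 1):
            block = gamma[b - a] if b >= a else gamma[a - b].T   # Cov(y_{t-a}, y_{t-b})
            S[a * n_y:(a + 1) * n_y, b * n_y:(b + 1) * n_y] = block
    c = np.zeros((M + 1) * n_y)
    c[:n_y] = cov_z_y[0]
    return c @ np.linalg.solve(S, c)


# Toy check: y_t = 0.5 y_{t-1} + u_{y,t}, z_t = u_{z,t}, Corr(u_{y,t}, u_{z,t}) = 0.5
A = np.array([[[0.5, 0.0], [0.0, 0.0]]])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
var_z, cov_z_y, gamma = second_moments(ma_coefficients(A, Sigma, H=200), n_y=1)
print(var_z, gamma[0][0, 0], var_proj_on_past(cov_z_y, gamma, M=50))
```

In this toy AR(1) example the best linear predictor of $\tilde z_t$ given current and past $y$ is $0.5(y_t - 0.5y_{t-1})$, so the projection variance should be $0.25$ once $M \ge 1$; rerunning with larger `M` and `H` is a cheap way to follow the truncation advice in the next paragraph.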
For example, we estimate $\mathrm{Var}(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau<\infty}))$ using an estimate of the truncated projection variance $\mathrm{Var}(E(\tilde z_t \mid \{y_\tau\}_{t-M\le\tau\le t+M}))$. $M$ should be at least 50 to yield an accurate approximation. We recommend checking that the numerical results do not change much when $M$ is increased, since the effects of truncation will depend on the persistence of the data.

The projection variances above could alternatively be computed using the Kalman filter, but there appears to be little difference in numerical accuracy or speed relative to the formulas stated here.

A.2 Proofs of main results

A.2.1 Auxiliary lemma

Lemma 1.

Let $B$ be an $n \times n$ Hermitian positive definite complex-valued matrix and $b$ an $n$-dimensional complex-valued column vector. Let $x$ be a nonnegative real scalar. Then $B - x^{-1}bb^*$ is positive (semi)definite if and only if $x > (\ge)\ b^*B^{-1}b$. See Appendix B.8.1 for the proof.
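Lemma 1 is a Schur-complement statement, and it is easy to sanity-check numerically. The sketch below uses NumPy and a randomly generated Hermitian positive definite $B$ (our own illustrative choice, not part of the paper); `min_eig` is a hypothetical helper returning the smallest eigenvalue of $B - x^{-1}bb^*$.

```python
import numpy as np


def min_eig(B, b, x):
    """Smallest eigenvalue of the Hermitian matrix B - x^{-1} b b^*."""
    return np.linalg.eigvalsh(B - np.outer(b, b.conj()) / x).min()


rng = np.random.default_rng(0)
n = 4
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = G @ G.conj().T + np.eye(n)                         # Hermitian positive definite
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
threshold = np.real(b.conj() @ np.linalg.solve(B, b))  # b^* B^{-1} b

for factor in (0.9, 1.0, 1.1):
    # Lemma 1: negative below the threshold, (numerically) zero at it, positive above it
    print(factor, min_eig(B, b, factor * threshold))
```

At $x = b^*B^{-1}b$ the vector $B^{-1}b$ lies in the null space of $B - x^{-1}bb^*$, which is why the smallest eigenvalue is zero exactly at the threshold.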

A.2.2 Proof of Proposition 1

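Let $\alpha$ and the spectrum $s_w(\omega)$ be given. Define the $n_y$-dimensional vectors
$$\Theta_{\bullet,1,\ell} = \alpha^{-1}\mathrm{Cov}(y_t, \tilde z_{t-\ell}), \quad \ell \ge 0,$$
and the corresponding vector lag polynomial $\Theta_{\bullet,1}(L) = \sum_{\ell=0}^{\infty}\Theta_{\bullet,1,\ell}L^\ell$. Since $\alpha \le \alpha_{UB}$, we may define $\sigma_v = \sqrt{\mathrm{Var}(\tilde z_t) - \alpha^2}$. Since $\alpha > \alpha_{LB}$, Lemma 1 implies that
$$s_y(\omega) - \frac{2\pi}{\alpha^2}s_{y\tilde z}(\omega)s_{y\tilde z}(\omega)^* = s_y(\omega) - \frac{1}{2\pi}\Theta_{\bullet,1}(e^{-i\omega})\Theta_{\bullet,1}(e^{-i\omega})^*$$
is positive definite for every $\omega \in [0, \pi]$. Hence, the Wold decomposition theorem (Hannan, 1970, Thm. 2, p. 158) implies that there exists an $n_y \times n_y$ matrix lag polynomial $\tilde\Theta(L) = \sum_{\ell=0}^{\infty}\tilde\Theta_\ell L^\ell$ such that
$$s_y(\omega) - \frac{1}{2\pi}\Theta_{\bullet,1}(e^{-i\omega})\Theta_{\bullet,1}(e^{-i\omega})^* = \frac{1}{2\pi}\tilde\Theta(e^{-i\omega})\tilde\Theta(e^{-i\omega})^*, \quad \omega \in [0, \pi].$$
We can rule out a deterministic term in the Wold decomposition because a continuous and positive definite spectral density satisfies the full-rank condition of Hannan (1970, p. 162). The following model for $w_t = (y_t', \tilde z_t)'$ then generates the desired spectrum $s_w(\omega)$:
$$y_t = \Theta_{\bullet,1}(L)\varepsilon_{1,t} + \tilde\Theta(L)\tilde\varepsilon_t, \qquad \tilde z_t = \alpha\varepsilon_{1,t} + \sigma_v v_t, \qquad (\varepsilon_{1,t}, \tilde\varepsilon_t', v_t)' \overset{i.i.d.}{\sim} N(0, I_{n_y+2}).$$
Note that the construction requires only $n_\varepsilon = n_y + 1$ shocks, $\varepsilon_{1,t} \in \mathbb{R}$ and $\tilde\varepsilon_t \in \mathbb{R}^{n_y}$.

A.2.3 Proof of Proposition 2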

Identified set for $R_0^2$. If the identified set contains 1, then there must exist an $\alpha \in [\alpha_{LB}, \alpha_{UB}]$ and i.i.d., independent standard Gaussian processes $\varepsilon_{1,t}$ and $v_t$ such that (i) $\tilde z_t = \alpha \times \varepsilon_{1,t} + v_t$, (ii) $v_t$ is uncorrelated with $y_t$ at all leads and lags, and (iii) $\varepsilon_{1,t}$ lies in the closed linear span of $\{y_\tau\}_{-\infty<\tau\le t}$. This immediately implies the "only if" statement. For the "if" part, assume $\tilde z_t$ does not Granger cause $y_t$. By the equivalence of Sims and Granger causality, $\tilde z_t^\dagger = E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau<\infty}) = E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau\le t})$. Note that the latter best linear predictor is white noise since, for any $\ell \ge 1$,
$$\mathrm{Cov}\big(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau\le t}), y_{t-\ell}\big) = \mathrm{Cov}(\tilde z_t, y_{t-\ell}) - \mathrm{Cov}\big(\tilde z_t - E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau\le t}), y_{t-\ell}\big) = 0 - 0,$$
using the fact that $\tilde z_t$ is a projection residual. In conclusion, the best linear predictor $\tilde z_t^\dagger$ of $\tilde z_t$ given $\{y_\tau\}_{-\infty<\tau<\infty}$ depends only on $\{y_\tau\}_{-\infty<\tau\le t}$ and it has a constant spectrum. From the expression for $\alpha_{LB}$, we get that $\alpha_{LB}^2 = \mathrm{Var}(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau\le t}))$. Hence, expression (16) implies that the upper bound of the identified set for $R_0^2$ equals 1.

Identified set for $R_\infty^2$. The upper bound of the identified set for $R_\infty^2$ equals 1 if and only if $2\pi\sup_{\omega\in[0,\pi]}s_{\tilde z^\dagger}(\omega) = \mathrm{Var}(E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau<\infty}))$. The right-hand side of this equation equals $\mathrm{Var}(\tilde z_t^\dagger) = \int_{-\pi}^{\pi}s_{\tilde z^\dagger}(\omega)\,d\omega$. But $\sup_{\omega\in[0,\pi]}s_{\tilde z^\dagger}(\omega) = \frac{1}{2\pi}\int_{-\pi}^{\pi}s_{\tilde z^\dagger}(\omega)\,d\omega$ if and only if $s_{\tilde z^\dagger}(\omega)$ is constant in $\omega$ almost everywhere, i.e., $\tilde z_t^\dagger$ is white noise.

References

Andrews, D. W. & Shi, X. (2013). Inference Based on Conditional Moment Inequalities. Econometrica, (2), 609–666.
Andrews, D. W. & Shi, X. (2017). Inference based on many conditional moment inequalities. Journal of Econometrics, (2), 275–287.
Beaudry, P. & Portier, F. (2006). Stock Prices, News, and Economic Fluctuations. American Economic Review, (4), 1293–1307.
Beaudry, P. & Portier, F. (2014). News-Driven Business Cycles: Insights and Challenges. Journal of Economic Literature, (4), 993–1074.
Blanchard, O. J., L'Huillier, J. P., & Lorenzoni, G. (2013). News, Noise, and Fluctuations: An Empirical Exploration. American Economic Review, (7), 3045–3070.
Brockwell, P. J. & Davis, R. A. (1991). Time Series: Theory and Methods (2nd ed.). Springer Series in Statistics. Springer.
Caldara, D. & Herbst, E. (2019). Monetary policy, real activity, and credit spreads: Evidence from Bayesian proxy SVARs. American Economic Journal: Macroeconomics, (1), 157–92.
Chahrour, R. & Jurado, K. (2020). Recoverability and Expectations-Driven Fluctuations. Manuscript, Duke University.
Chernozhukov, V., Lee, S., & Rosen, A. M. (2013). Intersection Bounds: Estimation and Inference. Econometrica, (2), 667–737.
Christiano, L., Eichenbaum, M., & Evans, C. (1999). Monetary Policy Shocks: What Have We Learned and to What End? In J. B. Taylor & M. Woodford (Eds.), Handbook of Macroeconomics, Volume 1A, chapter 2 (pp. 65–148). Elsevier.
Forni, M. & Gambetti, L. (2014). Sufficient information in structural VARs. Journal of Monetary Economics, (Supplement C), 124–136.
Forni, M., Gambetti, L., Lippi, M., & Sala, L. (2017a). Noise Bubbles. Economic Journal, (604), 1940–1976.
Forni, M., Gambetti, L., Lippi, M., & Sala, L. (2017b). Noisy News in Business Cycles. American Economic Journal: Macroeconomics, (4), 122–152.
Forni, M., Gambetti, L., & Sala, L. (2019). Structural VARs and noninvertible macroeconomic models. Journal of Applied Econometrics, (2), 221–246.
Gafarov, B., Meier, M., & Montiel Olea, J. L. (2018). Delta-Method Inference for a Class of Set-Identified SVARs. Journal of Econometrics, (2), 316–327.
Gertler, M. & Karadi, P. (2015). Monetary Policy Surprises, Credit Costs, and Economic Activity. American Economic Journal: Macroeconomics, (1), 44–76.
Giannone, D. & Reichlin, L. (2006). Does Information Help Recovering Structural Shocks from Past Observations? Journal of the European Economic Association, (2/3), 455–465.
Gilchrist, S. & Zakrajšek, E. (2012). Credit Spreads and Business Cycle Fluctuations. American Economic Review, (4), 1692–1720.
Gorodnichenko, Y. & Lee, B. (2020). Forecast Error Variance Decompositions with Local Projections. Journal of Business & Economic Statistics, (4), 921–933.
Hall, R. E. (2011). The Long Slump. American Economic Review, (2), 431–469.
Hannan, E. (1970). Multiple Time Series. Wiley Series in Probability and Statistics. John Wiley & Sons.
Imbens, G. W. & Manski, C. F. (2004). Confidence Intervals for Partially Identified Parameters. Econometrica, (6), 1845–1857.
Jaimovich, N. & Rebelo, S. (2009). Can news about the future drive the business cycle? American Economic Review, (4), 1097–1118.
Jordà, Ò. (2005). Estimation and Inference of Impulse Responses by Local Projections. American Economic Review, (1), 161–182.
Kilian, L. & Kim, Y. J. (2011). How Reliable Are Local Projection Estimators of Impulse Responses? Review of Economics and Statistics, (4), 1460–1466.
Kilian, L. & Lütkepohl, H. (2017). Structural Vector Autoregressive Analysis. Cambridge University Press.
Klepper, S. & Leamer, E. E. (1984). Consistent Sets of Estimates for Regressions with Errors in All Variables. Econometrica, (1), 163–183.
Leeper, E. M., Walker, T. B., & Yang, S.-C. S. (2013). Fiscal Foresight and Information Flows. Econometrica, (3), 1115–1145.
Lippi, M. & Reichlin, L. (1994). VAR analysis, nonfundamental representations, Blaschke matrices. Journal of Econometrics, (1), 307–325.
Mertens, K. & Ravn, M. O. (2010). Measuring the Impact of Fiscal Policy in the Face of Anticipation: A Structural VAR Approach. Economic Journal, (544), 393–413.
Mertens, K. & Ravn, M. O. (2013). The Dynamic Effects of Personal and Corporate Income Tax Changes in the United States. American Economic Review, (4), 1212–1247.
Nakamura, E. & Steinsson, J. (2018a). High Frequency Identification of Monetary Non-Neutrality: The Information Effect. Quarterly Journal of Economics, (3), 1283–1330.
Nakamura, E. & Steinsson, J. (2018b). Identification in Macroeconomics. Journal of Economic Perspectives, (3), 59–86.
Pigou, A. C. (1927). Industrial Fluctuations (2nd ed.). London: Macmillan.
Plagborg-Møller, M. (2019). Bayesian inference on structural impulse response functions. Quantitative Economics, (1), 145–184.
Plagborg-Møller, M. & Wolf, C. K. (2020). Local Projections and VARs Estimate the Same Impulse Responses. Econometrica. Forthcoming.
Ramey, V. A. (2016). Macroeconomic Shocks and Their Propagation. In J. B. Taylor & H. Uhlig (Eds.), Handbook of Macroeconomics, volume 2, chapter 2 (pp. 71–162). Elsevier.
Schmitt-Grohé, S. & Uribe, M. (2012). What's News in Business Cycles. Econometrica, (6), 2733–2764.
Sims, C. A. & Zha, T. (2006). Does Monetary Policy Generate Recessions? Macroeconomic Dynamics, (02), 231–272.
Smets, F. & Wouters, R. (2007). Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach. American Economic Review, (3), 586–606.
Stock, J. H. (2008). What's New in Econometrics: Time Series, Lecture 7. Lecture slides, NBER Summer Institute.
Stock, J. H. & Watson, M. W. (2018). Identification and Estimation of Dynamic Causal Effects in Macroeconomics Using External Instruments. Economic Journal, (610), 917–948.
Stoye, J. (2009). More on Confidence Intervals for Partially Identified Parameters. Econometrica, (4), 1299–1315.
Uhlig, H. (2005). What are the effects of monetary policy on output? Results from an agnostic identification procedure. Journal of Monetary Economics, (2), 381–419.
Wolf, C. K. (2020). SVAR (Mis)identification and the Real Effects of Monetary Policy Shocks. American Economic Journal: Macroeconomics, (4), 1–32.

Online Appendix for: Instrumental Variable Identification of Dynamic Variance Decompositions
Mikkel Plagborg-Møller, Christian K. Wolf
November 4, 2020

This online appendix contains supplemental material for the article “Instrumental Variable Identification of Dynamic Variance Decompositions”. We provide (i) bounds on other notions of variance decompositions, (ii) extensions of the identification analysis to multiple instruments correlated with a single or multiple shocks, (iii) characterizations of the bias of SVAR-IV (or “proxy SVAR”) procedures under noninvertibility, (iv) an illustration of our method using a quantitative structural macro model, (v) supplementary results for our empirical application, and (vi) asymptotic theory on the nonparametric validity of our sieve VAR inference strategy. The end of this appendix contains proofs and auxiliary lemmas.

Any references to equations, figures, tables, assumptions, propositions, lemmas, or sections that are not preceded by “B.” refer to the main article.

Contents

B.1 Identification and estimation of other variance decomposition concepts
B.2 Multiple instruments correlated with one shock
B.3 Instruments correlated with multiple shocks
B.4 Invertibility and SVAR-IV
B.5 Illustration in a structural macro model
B.5.1 Preliminaries
B.5.2 Information content of several observables
B.5.3 Dynamic information content
B.5.4 Non-invertibility and news shocks
B.5.5 Other observables
B.6 Supplementary empirical results
B.7 Nonparametric sieve VAR inference
B.7.1 Assumptions, parameters of interest, and estimator
B.7.2 Main convergence results
B.8 Additional proofs and auxiliary lemmas
B.8.1 Proof of Lemma 1
B.8.2 Auxiliary lemma for proof of Proposition B.1
B.8.3 Proof of Proposition B.1
B.8.4 Proof of Proposition B.2
B.8.5 Proof of Proposition B.3
B.8.6 Auxiliary lemmas for sieve VAR results
B.8.7 Proof of Lemma B.3
B.8.8 Proof of Lemma B.4
B.8.9 Proof of Lemma B.6
B.8.10 Proof of Lemma B.7
B.8.11 Proof of Lemma B.8
B.8.12 Proof of Lemma B.9
B.8.13 Proof of Lemma B.10
B.8.14 Proof of Lemma B.11
B.8.15 Proof of Proposition B.4
B.8.16 Proof of Proposition B.5

References

B.1 Identification and estimation of other variance decomposition concepts

Our main analysis focuses on forecast variance ratios as a measure of shock importance, defined in Section 2.2. This appendix defines two additional concepts, forecast variance decompositions (FVD) and unconditional frequency-specific variance decompositions (VD), and discusses the identification and estimation of both.

Definitions.

The forecast variance decomposition (FVD) for variable $i$ at horizon $\ell$ is defined as
$$\mathrm{FVD}_{i,\ell} \equiv 1 - \frac{\mathrm{Var}(y_{i,t+\ell} \mid \{\varepsilon_\tau\}_{-\infty<\tau\le t}, \{\varepsilon_{1,\tau}\}_{t<\tau<\infty})}{\mathrm{Var}(y_{i,t+\ell} \mid \{\varepsilon_\tau\}_{-\infty<\tau\le t})} = \frac{\sum_{m=0}^{\ell-1}\Theta_{i,1,m}^2}{\sum_{j=1}^{n_\varepsilon}\sum_{m=0}^{\ell-1}\Theta_{i,j,m}^2}. \tag{B.1}$$
The FVD measures the reduction in forecast variance that arises from learning the path of future realizations of the shock of interest, supposing that we already had the history of past structural shocks $\varepsilon_t$ available when forming our forecast. Because the econometrician generally does not observe the structural shocks directly, the FVD is best thought of as reflecting forecasts of economic agents who observe the underlying shocks. The FVD always lies between 0 and 1, purely reflects fundamental forecasting uncertainty, and equals 1 if the first shock is the only shock driving variable $i$ in equation (1). The software package Dynare reports FVDs after having estimated a DSGE model.

While the FVR and FVD concepts generally differ, they coincide in the case where all shocks are invertible, since in that case the information set $\{y_\tau\}_{-\infty<\tau\le t}$ equals the information set $\{\varepsilon_\tau\}_{-\infty<\tau\le t}$. This explains why the SVAR literature has not made the distinction between the two concepts. B.1
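When the structural MA coefficients are known, for instance in a simulated or solved model, (B.1) can be evaluated directly. The sketch below assumes the coefficients are stored as an array `Theta` with `Theta[m]` equal to the $n_y \times n_\varepsilon$ matrix $\Theta_m$; the function name `fvd` and the example model are hypothetical, and index 0 plays the role of the "first" shock.

```python
import numpy as np


def fvd(Theta, i, ell, shock=0):
    """FVD_{i,ell} from (B.1); Theta[m] is the n_y x n_eps matrix Theta_m, shock 0 = first shock."""
    sq = Theta[:ell, i, :] ** 2          # squared responses of y_i at horizons 0, ..., ell - 1
    return sq[:, shock].sum() / sq.sum()


# Hypothetical example: y_t = eps_{1,t} + 0.5 eps_{1,t-1} + eps_{2,t}
Theta = np.array([[[1.0, 1.0]], [[0.5, 0.0]]])
print(fvd(Theta, i=0, ell=1), fvd(Theta, i=0, ell=2))
```

Here the one-step FVD is $1/2$ and the two-step FVD is $1.25/2.25$. With data alone the $\Theta_{i,j,m}$ for $j \ne 1$ are not identified, which is why the bounds below are needed.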

Our second additional concept is the frequency-specific unconditional variance decomposition (VD) of Forni et al. (2019, Sec. 3.4). The VD for variable $i$ over the frequency band $[\omega_1, \omega_2]$ is given by
$$\mathrm{VD}_i(\omega_1, \omega_2) \equiv \frac{\int_{\omega_1}^{\omega_2}|\Theta_{i,1}(e^{-i\omega})|^2\,d\omega}{\sum_{j=1}^{n_\varepsilon}\int_{\omega_1}^{\omega_2}|\Theta_{i,j}(e^{-i\omega})|^2\,d\omega}, \quad 0 \le \omega_1 < \omega_2 \le \pi, \tag{B.2}$$
where $\Theta_{i,j}(L)$ is the $(i,j)$ element of the lag polynomial $\Theta(L)$. $\mathrm{VD}_i(\omega_1, \omega_2)$ is the percentage reduction in the variance of $y_{i,t}$ (after passing the data through a band-pass filter that retains only cyclical frequencies $[\omega_1, \omega_2]$) caused by entirely "shutting off" the shock of interest $\varepsilon_{1,t}$. The software package Dynare automatically reports $\mathrm{VD}_i(0, \pi)$ after solving a DSGE model.

B.1 Forni et al. (2019) point out the bias caused by noninvertibility when estimating the FVD using SVARs.

Identification and estimation: VD.

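Identification of the VD is completely analogous to our analysis of the FVR. By definition,
$$\mathrm{VD}_i(\omega_1, \omega_2) = \frac{2\pi}{\alpha^2} \times \frac{\int_{\omega_1}^{\omega_2}|s_{y_i\tilde z}(\omega)|^2\,d\omega}{\int_{\omega_1}^{\omega_2}s_{y_i}(\omega)\,d\omega},$$
where $s_{y_i\tilde z}(\omega) = \frac{\alpha}{2\pi}\Theta_{i,1}(e^{-i\omega})$ is the $i$-th element of $s_{y\tilde z}(\omega)$, cf. equation (B.2). Since the last fraction on the right-hand side is point-identified, our identified set for $\alpha$ immediately maps into an identified set for the VD. We estimate the bounds as
$$\left[\frac{2\pi}{\hat\alpha_{UB}^2} \times \frac{\int_{\omega_1}^{\omega_2}|\hat s_{y_i\tilde z}(\omega)|^2\,d\omega}{\int_{\omega_1}^{\omega_2}\hat s_{y_i}(\omega)\,d\omega},\ \frac{2\pi}{\hat\alpha_{LB}^2} \times \frac{\int_{\omega_1}^{\omega_2}|\hat s_{y_i\tilde z}(\omega)|^2\,d\omega}{\int_{\omega_1}^{\omega_2}\hat s_{y_i}(\omega)\,d\omega}\right]. \tag{B.3}$$
The integrals are computed numerically. The spectral densities required to compute (B.3) are functions of the estimated reduced-form VAR parameters (see Appendix A.1). Specifically,
$$\hat s_y(\omega) = \frac{1}{2\pi}\hat B_y(e^{-i\omega})\hat B_y(e^{-i\omega})^*, \qquad \hat s_{y\tilde z}(\omega) = \frac{1}{2\pi}\sum_{\ell=0}^{\infty}\hat\Sigma_{y,\tilde z,\ell}e^{-i\omega\ell},$$
with $\hat B_y(e^{-i\omega}) \equiv \sum_{\ell=0}^{\infty}\hat B_{y,\ell}e^{-i\omega\ell}$ and $\hat\Sigma_{y,\tilde z,\ell} \equiv \widehat{\mathrm{Cov}}(y_t, \tilde z_{t-\ell}) = \hat B_{y,\ell}\hat B_{z,0}'$. In practice, we truncate the infinite sums at a large lag.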
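The mapping from the identified set for $\alpha$ into VD bounds can be illustrated on a population example. The sketch below is our own: the model, the grid, and the function names are hypothetical; integrals use a hand-written trapezoidal rule; and we assume the main-text bound definitions $\alpha_{LB}^2 = 2\pi\sup_\omega s_{\tilde z^\dagger}(\omega)$, with $s_{\tilde z^\dagger}(\omega) = s_{y\tilde z}(\omega)^* s_y(\omega)^{-1} s_{y\tilde z}(\omega)$, and $\alpha_{UB}^2 = \mathrm{Var}(\tilde z_t)$, which are not restated in this appendix.

```python
import numpy as np


def trapezoid(f, w):
    """Trapezoidal rule for samples f on the grid w."""
    return np.sum((f[1:] + f[:-1]) * np.diff(w)) / 2


def vd(s_yi_z, s_yi, w, alpha):
    """VD_i over the band covered by w: (2 pi / alpha^2) int |s_{y_i z~}|^2 / int s_{y_i}."""
    return 2 * np.pi / alpha ** 2 * trapezoid(np.abs(s_yi_z) ** 2, w) / trapezoid(s_yi, w)


# Hypothetical population example (n_y = 1):
#   y_t = eps_{1,t} + theta eps_{1,t-1} + eps_{2,t},  z~_t = alpha eps_{1,t} + sigma_v v_t
theta, alpha, sigma_v = 0.5, 1.0, 1.0
w = np.linspace(0.0, np.pi, 2001)                     # band [0, pi]
theta_11 = 1.0 + theta * np.exp(-1j * w)              # Theta_{1,1}(e^{-iw})
s_y = (np.abs(theta_11) ** 2 + 1.0) / (2 * np.pi)     # spectral density of y_t
s_yz = alpha * theta_11 / (2 * np.pi)                 # cross-spectrum of y_t and z~_t

# Assumed bound definitions: alpha_LB^2 = 2 pi sup_w s_{z~dagger}(w), alpha_UB^2 = Var(z~_t)
s_zdag = np.abs(s_yz) ** 2 / s_y
alpha_lb = np.sqrt(2 * np.pi * s_zdag.max())
alpha_ub = np.sqrt(alpha ** 2 + sigma_v ** 2)

vd_true = vd(s_yz, s_y, w, alpha)                     # evaluates (B.2) at the true alpha
bounds = (vd(s_yz, s_y, w, alpha_ub), vd(s_yz, s_y, w, alpha_lb))
print(alpha_lb, alpha_ub, vd_true, bounds)
```

At the true $\alpha$ the formula reproduces (B.2), here $1.25/2.25$. The interval `bounds` brackets it because the VD is decreasing in $\alpha$, so the lower bound uses $\alpha_{UB}$ and the upper bound uses $\alpha_{LB}$, as in (B.3).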

Identification and estimation: FVD.

Bounding the FVD requires more work. Intuitively, identification of the FVD is more challenging than for the FVR because, even if we knew $\alpha$, the IV $z_t$ provides no information about the other structural shocks $\varepsilon_{j,t}$, $j \ne 1$. This matters because the definition (B.1) of the FVD, unlike that of the FVR, conditions on knowing all past shocks, rather than all past macro observables. Proposition B.1 formally characterizes the resulting identified set.

Proposition B.1. Let there be given a joint spectral density for $w_t = (y_t', \tilde z_t)'$ satisfying the assumptions in Proposition 1. Given knowledge of $\alpha \in (\alpha_{LB}, \alpha_{UB}]$, the largest possible value of the forecast variance decomposition $\mathrm{FVD}_{i,\ell}$ is 1 (the trivial bound), while the smallest possible value is given by
$$\frac{\sum_{m=0}^{\ell-1}\mathrm{Cov}(y_{i,t}, \tilde z_{t-m})^2}{\sum_{m=0}^{\ell-1}\mathrm{Cov}(y_{i,t}, \tilde z_{t-m})^2 + \alpha^2\,\mathrm{Var}(\tilde y^{(\alpha)}_{i,t+\ell} \mid \{\tilde y^{(\alpha)}_\tau\}_{-\infty<\tau\le t})}. \tag{B.4}$$
Here $\tilde y^{(\alpha)}_t = (\tilde y^{(\alpha)}_{1,t}, \dots, \tilde y^{(\alpha)}_{n_y,t})'$ denotes a stationary Gaussian time series with spectral density $s_{\tilde y^{(\alpha)}}(\omega) = s_y(\omega) - \frac{2\pi}{\alpha^2}s_{y\tilde z}(\omega)s_{y\tilde z}(\omega)^*$, $\omega \in [0, \pi]$. Expression (B.4) is monotonically decreasing in $\alpha$, so the overall lower bound on $\mathrm{FVD}_{i,\ell}$ is attained by $\alpha = \alpha_{UB}$; in this boundary case we can represent $\tilde y^{(\alpha_{UB})}_t = y_t - E(y_t \mid \{\tilde z_\tau\}_{-\infty<\tau\le t})$.

The upper bound on the FVD always equals the trivial bound of 1, for any $\ell \ge 1$. This upper bound is achieved by a model in which all shocks, except the first one, only affect $y_t$ after an $\ell$-period delay. The lower bound, in contrast, is nontrivial and informative. The argument is as follows: Even if $\alpha$ is known, the denominator $\mathrm{Var}(y_{i,t+\ell} \mid \{\varepsilon_\tau\}_{-\infty<\tau\le t})$ of the FVD is not identified due to the lack of information about shocks other than the first. Although we can upper-bound this conditional variance by the denominator of the FVR, this upper bound is not sharp. Instead, to maximize the denominator, as much forecasting noise as possible should be of the pure forecasting variety, and not related to noninvertibility. For all shocks except for $\varepsilon_{1,t}$, this is achievable through a Wold decomposition construction (Hannan, 1970, Thm. 2, p. 158). Given $\alpha$, we know the contribution of the first shock to $y_t$; the residual after removing this contribution has the distribution of $\tilde y^{(\alpha)}_t$, as defined in the proposition. If $\alpha$ is not known, the smallest possible value of the lower bound (B.4) is attained at the largest possible value of $\alpha$, namely $\alpha_{UB}$, for which $\varepsilon_{1,t}$ contributes the least to forecasts of $y_t$.

We estimate the bounds in Proposition B.1 as
$$\left[\frac{\sum_{m=0}^{\ell-1}\widehat{\mathrm{Cov}}(y_{i,t}, \tilde z_{t-m})^2}{\sum_{m=0}^{\ell-1}\widehat{\mathrm{Cov}}(y_{i,t}, \tilde z_{t-m})^2 + \hat{\bar\alpha}^2\,\widehat{\mathrm{Var}}(\tilde y^{(\bar\alpha)}_{i,t+\ell} \mid \{\tilde y^{(\bar\alpha)}_\tau\}_{-\infty<\tau\le t})},\ 1\right]. \tag{B.5}$$
To approximate the conditional variance in the denominator, we proceed as in Appendix A.1. First, we replace the infinite conditioning set with the finite set $\{\tilde y^{(\bar\alpha)}_\tau\}_{t-M\le\tau\le t}$. Second, we compute the conditional variance using the standard projection formula, where the autocovariances of the process $\{\tilde y^{(\bar\alpha)}_\tau\}$ are estimated as
$$\widehat{\mathrm{Cov}}(\tilde y^{(\bar\alpha)}_{t+\ell}, \tilde y^{(\bar\alpha)}_t) = \widehat{\mathrm{Cov}}(y_{t+\ell}, y_t) - \hat{\bar\alpha}^{-2}\sum_{m=0}^{\infty}\widehat{\mathrm{Cov}}(y_t, \tilde z_{t-m-\ell})\,\widehat{\mathrm{Cov}}(y_t, \tilde z_{t-m})'.$$
In practice, we truncate the infinite sum at a large lag.

B.2 Multiple instruments correlated with one shock

Here we show that the multiple-IV model in Assumptions 1 to 3 is testable, but if it is consistent with the data, then identification analysis can be reduced to the single-IV case. Define the IV residual vector $\tilde z_t$ as in equation (17). The multiple-IV model in Assumptions 1 and 2 implies the following cross-spectrum between $y_t$ and $\tilde z_t$:
$$s_{y\tilde z}(\omega) = \frac{\alpha}{2\pi}\Theta(e^{-i\omega})e_1\lambda', \quad \omega \in [0, \pi]. \tag{B.6}$$
Thus, the cross-spectrum has a rank-1 factor structure: it equals a nonconstant column vector times a constant row vector. This testable property turns out to be exactly what characterizes the multiple-IV model.

Proposition B.2.

Let a spectrum $s_w(\omega)$ for $w_t = (y_t', \tilde z_t')'$ be given, satisfying the assumptions of Proposition 1. There exists a model of the form in Assumptions 1 and 2 which generates the spectrum $s_w(\omega)$ if and only if there exist $n_y$-dimensional real vectors $\zeta_\ell$, $\ell \ge 0$, and an $n_z$-dimensional constant real vector $\eta$ of unit length such that
$$s_{y\tilde z}(\omega) = \zeta(e^{-i\omega})\eta', \quad \omega \in [0, \pi], \tag{B.7}$$
where $\zeta(L) = \sum_{\ell=0}^{\infty}\zeta_\ell L^\ell$.

Assuming henceforth that the factor structure obtains, we now show that identification in the multiple-IV model reduces to the single-IV case. It is convenient first to reparametrize the model slightly, by setting $\Sigma_v = \Sigma_{\tilde z} - \alpha^2\lambda\lambda'$ and treating $\Sigma_{\tilde z}$ as a basic model parameter instead of $\Sigma_v$. We then impose the requirement that $\Sigma_{\tilde z} - \alpha^2\lambda\lambda'$ be positive semidefinite. Clearly, $\Sigma_{\tilde z} = \mathrm{Var}(\tilde z_t)$ is point-identified. Next, note from (B.6) that $\lambda$ is point-identified and equal to the $\eta$ vector in equation (B.7). This is because any rank-1 factorization of a matrix is identified up to sign and scale, and we have normalized $\eta$ to have length 1. Let $\Xi$ be any $(n_z - 1) \times n_z$ matrix such that $\Xi\Sigma_{\tilde z}^{-1/2}\lambda = 0$. Define the $n_z \times n_z$ matrix
$$Q \equiv \begin{pmatrix} \frac{1}{\lambda'\Sigma_{\tilde z}^{-1}\lambda}\lambda'\Sigma_{\tilde z}^{-1} \\ \Xi\Sigma_{\tilde z}^{-1/2} \end{pmatrix}.$$
Since $Q$ is point-identified (given a choice of $\Xi$), it is without loss of generality to perform identification analysis based on the linearly transformed IV residuals
$$Q\tilde z_t = \begin{pmatrix} \alpha \\ 0 \end{pmatrix}\varepsilon_{1,t} + \tilde v_t, \qquad \tilde v_t \sim N\left(0, \begin{pmatrix} \frac{1}{\lambda'\Sigma_{\tilde z}^{-1}\lambda} - \alpha^2 & 0 \\ 0 & \Xi\Xi' \end{pmatrix}\right).$$
Notice that $\alpha$ only enters into the equation for the first element of $Q\tilde z_t$, and the $(n_z - 1)$ last elements of $Q\tilde z_t$ are independent of the first element (and independent of $y_t$ at all leads and lags). Hence, it is without loss of generality to limit attention to the first element of $Q\tilde z_t$ when performing identification analysis for $\Theta_{i,j,\ell}$ and $\alpha$. The first element of $Q\tilde z_t$ equals $\breve z_t$ as defined in equation (18) in the main text. B.2
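The two steps above, recovering $\lambda = \eta$ from the rank-1 structure (B.7) and then forming $Q$, can be checked numerically. This is a sketch under our own assumptions: recovering $\eta$ from the leading right singular vector of stacked cross-spectra is an illustrative choice, not a procedure stated in the text (with estimated spectra the second singular value would only be approximately zero); $\Xi$ is taken as an orthonormal basis of the complement of $\Sigma_{\tilde z}^{-1/2}\lambda$; and all inputs are hypothetical.

```python
import numpy as np


def loading_from_cross_spectra(S):
    """Unit-length eta in s_{yz~}(w) = zeta(e^{-iw}) eta' (B.7), up to sign.

    S: complex array (K, n_y, n_z) of cross-spectra on a frequency grid. Under (B.7) the
    real matrix stacking Re(S) and Im(S) has rank one with row space spanned by eta."""
    M = S.reshape(-1, S.shape[2])
    _, sv, Vt = np.linalg.svd(np.vstack([M.real, M.imag]), full_matrices=False)
    return Vt[0], sv


def q_matrix(lam, Sigma_z):
    """Q with first row lam' Sigma^{-1} / (lam' Sigma^{-1} lam) and remaining rows Xi Sigma^{-1/2}."""
    evals, V = np.linalg.eigh(Sigma_z)
    S_inv_half = V @ np.diag(evals ** -0.5) @ V.T         # symmetric Sigma^{-1/2}
    a = S_inv_half @ lam
    U, _, _ = np.linalg.svd(a[:, None])                     # U[:, 1:] spans the complement of a
    Xi = U[:, 1:].T                                         # so that Xi Sigma^{-1/2} lam = 0
    first = np.linalg.solve(Sigma_z, lam)
    return np.vstack([first / (lam @ first), Xi @ S_inv_half])


# Hypothetical example with n_y = 2, n_z = 2
w = np.linspace(0.0, np.pi, 9)
zeta = np.stack([1 + 0.5 * np.exp(-1j * w), 0.3 - 0.2 * np.exp(-1j * w)], axis=1)  # (K, n_y)
eta = np.array([0.6, 0.8])
S = zeta[:, :, None] * eta[None, None, :]                   # s_{yz~}(w_k) = zeta_k eta'
lam, sv = loading_from_cross_spectra(S)
lam = lam * np.sign(lam @ eta)                              # sign normalization (toy check only)

alpha = 0.7
Sigma_z = np.diag([0.5, 0.3]) + alpha ** 2 * np.outer(lam, lam)   # Sigma_v + alpha^2 lam lam'
Q = q_matrix(lam, Sigma_z)
print(sv, Q @ lam, Q @ Sigma_z @ Q.T)
```

In this example `Q @ lam` equals $(1, 0)'$, and $Q\Sigma_{\tilde z}Q'$ is block diagonal with top-left entry $1/(\lambda'\Sigma_{\tilde z}^{-1}\lambda)$. That entry is never smaller than $\alpha^2$, in line with the covariance of $\tilde v_t$ displayed above.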

Additional restrictions on the IVs can ensure point identification. In particular, if $n_z \ge 2$ and we assume $\Sigma_v$ to be diagonal, then $\alpha$ is point-identified from any off-diagonal element of $\mathrm{Var}(\tilde z_t) = \Sigma_v + \alpha^2\lambda\lambda'$, since $\lambda$ is point-identified.

B.2

The above display implies that we must have $\alpha^2 \le (\lambda'\mathrm{Var}(\tilde z_t)^{-1}\lambda)^{-1}$, which is precisely what the upper bound for $\alpha$ yields when applied to $\breve z_t$.

B.3 Instruments correlated with multiple shocks

In this section, we ask how much can be said about forecast variance ratios if the researcher is only willing to assume that the observed set of external instruments $z_t$ is correlated with at most $n_{\varepsilon_x}$ shocks, collected in the vector $\varepsilon_{x,t}$. Hence, in this section we do not impose the exclusion restriction that only the first shock $\varepsilon_{1,t}$ be correlated with the IV(s).

Extended model and FVR.

Without loss of generality, suppose the $n_z$ IVs are correlated with the first $n_{\varepsilon_x}$ of the $n_\varepsilon$ shocks. Denote this sub-vector of shocks by $\varepsilon_{x,t}$. For now, $n_{\varepsilon_x}$ need not be known to the econometrician. We define the extended SVMA-IV model as
$$y_t = \Theta(L)\varepsilon_t, \qquad \Theta(L) \equiv \sum_{\ell=0}^{\infty}\Theta_\ell L^\ell, \tag{B.8}$$
$$z_t = \sum_{\ell=1}^{\infty}(\Psi_\ell z_{t-\ell} + \Lambda_\ell y_{t-\ell}) + \underbrace{\Gamma\varepsilon_{x,t} + \Sigma_v^{1/2}v_t}_{\tilde z_t}, \tag{B.9}$$
where $\Gamma$ is $n_z \times n_{\varepsilon_x}$. We continue to impose i.i.d. normality of the shocks, cf. Assumption 3. Our object of interest is the forecast variance ratio with respect to the $n_z$ particular linear combinations of shocks that enter into the IV equations, $\Gamma\varepsilon_{x,t}$:
$$\mathrm{FVR}_{i,\ell} \equiv 1 - \frac{\mathrm{Var}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t}, \{\Gamma\varepsilon_{x,\tau}\}_{t<\tau<\infty})}{\mathrm{Var}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t})} = \frac{\sum_{m=0}^{\ell-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})(\Gamma\Gamma')^{-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})'}{\mathrm{Var}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t})}. \tag{B.10}$$
In the following we provide upper and lower bounds on this object. Given $\Gamma\Gamma'$, the FVR is point-identified, so we need to derive the identified set for $\Gamma\Gamma'$. At the end of this section we discuss how the FVR with respect to $\Gamma\varepsilon_{x,t}$ relates to other objects of interest.

Similar to Appendix B.2, the testable restriction of the model (B.8)–(B.9) is that the joint spectrum of $y_t$ and $\tilde z_t$ has a rank-$n_{\varepsilon_x}$ factor structure. If this assumption is not rejected, we can reduce the instrument vector to dimension $\min(n_z, n_{\varepsilon_x})$ without affecting the identification of $\mathrm{FVR}_{i,\ell}$. B.3

In particular, we may assume that $\Gamma$ has full row rank, which we do from now on, thus justifying the second equality in (B.10).

B.3

The argument is very similar to the one-shock case in Appendix B.2 and is available upon request.

Identified set for $\Gamma$. Define $\Sigma_{\tilde z} \equiv \mathrm{Var}(\tilde z_t)$. Proceeding similarly to the proof of Proposition 1, we can show that a given $\Gamma$ is consistent with the joint spectral density of the data if and only if $\Gamma\Gamma'$ has full row rank,
$$\Sigma_{\tilde z} - \Gamma\Gamma' \ge 0, \tag{B.11}$$
and
$$\Gamma\Gamma' - 2\pi s_{\tilde z^\dagger}(\omega) \ge 0, \quad \forall\,\omega \in [0, \pi], \tag{B.12}$$
where $s_{\tilde z^\dagger}(\omega) = s_{y\tilde z}(\omega)^* s_y(\omega)^{-1} s_{y\tilde z}(\omega)$ and we use the notation $A \ge B$ if $A - B$ is Hermitian positive semidefinite (and similarly for $\le$). Sharp bounds on $\mathrm{FVR}_{i,\ell}$ thus follow from minimizing/maximizing (B.10) over the space of $n_z \times n_z$ symmetric positive definite matrices $\Gamma\Gamma'$ subject to constraints (B.11)–(B.12).

Lower bound on FVR.

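We now establish a sharp lower bound on the numerator in the definition (B.10) of the FVR (the denominator is point-identified). Observe that
$$\begin{aligned} \sum_{m=0}^{\ell-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})(\Gamma\Gamma')^{-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})' &= \sum_{m=0}^{\ell-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})\Sigma_{\tilde z}^{-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})' \\ &\quad + \sum_{m=0}^{\ell-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})\{(\Gamma\Gamma')^{-1} - \Sigma_{\tilde z}^{-1}\}\mathrm{Cov}(y_{it}, \tilde z_{t-m})' \\ &\ge \sum_{m=0}^{\ell-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})\Sigma_{\tilde z}^{-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})', \end{aligned}$$
where the inequality uses the constraint (B.11), which implies $(\Gamma\Gamma')^{-1} \ge \Sigma_{\tilde z}^{-1}$. The above lower bound is sharp: It is attained in a model where $\Sigma_v = 0_{n_z\times n_z}$ and $\Gamma\Gamma' = \Sigma_{\tilde z}$, i.e., when all IVs are perfect. B.4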
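The numerator of (B.10) is a simple quadratic form once $\Gamma\Gamma'$ is fixed, so the lower bound is just that form evaluated at $\Gamma\Gamma' = \Sigma_{\tilde z}$. The sketch below uses hypothetical covariance inputs and a hypothetical helper name; it also shows that shrinking $\Gamma\Gamma'$ below $\Sigma_{\tilde z}$, which keeps (B.11) satisfied, can only raise the numerator. Constraint (B.12) is not checked here.

```python
import numpy as np


def fvr_numerator(cov_list, GG):
    """sum_m Cov(y_it, z~_{t-m}) (Gamma Gamma')^{-1} Cov(y_it, z~_{t-m})' as in (B.10)."""
    return sum(c @ np.linalg.solve(GG, c) for c in cov_list)


# Hypothetical inputs: n_z = 2 residual IVs, horizon ell = 2
Sigma_z = np.array([[1.0, 0.3], [0.3, 0.8]])               # Var(z~_t)
cov_list = [np.array([0.4, 0.2]), np.array([0.1, -0.3])]   # Cov(y_it, z~_{t-m}), m = 0, 1
lower = fvr_numerator(cov_list, Sigma_z)                   # sharp lower bound: Gamma Gamma' = Sigma_z

# A Gamma Gamma' satisfying (B.11), i.e. Gamma Gamma' <= Sigma_z, gives a weakly larger value
GG = 0.7 * Sigma_z
print(lower, fvr_numerator(cov_list, GG))
```

Dividing `lower` by the point-identified denominator $\mathrm{Var}(y_{i,t+\ell} \mid \{y_\tau\}_{-\infty<\tau\le t})$ would give the lower bound on the FVR itself.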

Upper bound on FVR.

While we have not been able to derive a closed-form expression for the sharp upper bound on the FVR, it is straightforward to compute it numerically. Let $\mathcal S_n$ denote the space of $n \times n$ real symmetric positive definite matrices, and let $\mathrm{tr}(A)$ denote the trace of a matrix $A$. The sharp upper bound on the numerator in the definition (B.10) of the FVR is given by the value of the program
$$\max_{X\in\mathcal S_{n_z}}\ \mathrm{tr}(XC) + \mathrm{tr}(AC) \quad \text{subject to} \quad X \le B(\omega),\ \omega \in [0, \pi]. \tag{B.13}$$
Here $X$ is a stand-in for $(\Gamma\Gamma')^{-1} - \Sigma_{\tilde z}^{-1}$, $C \equiv \sum_{m=0}^{\ell-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})'\mathrm{Cov}(y_{it}, \tilde z_{t-m})$, $A \equiv \Sigma_{\tilde z}^{-1}$, and $B(\omega) \equiv (2\pi s_{\tilde z^\dagger}(\omega))^{-1} - \Sigma_{\tilde z}^{-1}$.

B.4 Note that $\Gamma\Gamma' = \Sigma_{\tilde z} = 2\pi s_{\tilde z}(\omega)$ satisfies constraint (B.12) by the Schur complement formula and the positive semidefiniteness of the spectrum of $(y_t', \tilde z_t')'$.

We can solve the above program to arbitrary accuracy by casting it as a (convex) semidefinite program with a finite number of constraints. Partition the interval $[0, \pi]$ into $N$ equal-length pieces, and consider the relaxed constraint set
$$X \le \tilde B_m, \quad m \in \{1, 2, \dots, N\}, \tag{B.14}$$
where $\tilde B_m \equiv \frac{N}{\pi} \times \int_{(m-1)\pi/N}^{m\pi/N}B(\omega)\,d\omega$. As $N \to \infty$, this constraint set approximates that of the original problem arbitrarily well, but for any finite $N$ the value of the discretized program provides an upper bound on the numerator in (B.10), since any $X$ satisfying $X \le B(\omega)$ for all $\omega$ also satisfies the averaged constraints (B.14). Efficient numerical algorithms to compute the solution to semidefinite programs of the form (B.13)–(B.14) are available in Matlab and other environments. B.5
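The constraint side of the discretized program (B.13)–(B.14) can be sketched with NumPy alone. The block below forms $\tilde B_m$ by averaging a Hermitian matrix function over each subinterval (midpoint rule), and it checks $\tilde B_m - X \ge 0$ through the real embedding described in footnote B.5. It only tests feasibility of a candidate $X$; actually maximizing $\mathrm{tr}(XC)$ would require a semidefinite programming solver, which is not shown. `B_fun` is a made-up Hermitian function, not derived from any data, and all function names are hypothetical.

```python
import numpy as np


def is_psd_hermitian(H, tol=1e-10):
    """H >= 0 via the real embedding [[Re H, -Im H], [Im H, Re H]] (footnote B.5)."""
    R = np.block([[H.real, -H.imag], [H.imag, H.real]])
    return np.linalg.eigvalsh(R).min() >= -tol


def averaged_constraints(B_fun, N, n_sub=64):
    """B~_m = (N / pi) * integral of B(w) over [(m-1) pi / N, m pi / N], by the midpoint rule."""
    out = []
    for m in range(1, N + 1):
        w = (m - 1) * np.pi / N + (np.arange(n_sub) + 0.5) * np.pi / (N * n_sub)
        out.append(np.mean([B_fun(x) for x in w], axis=0))
    return out


def satisfies_relaxed(X, B_tilde, tol=1e-10):
    """Relaxed constraint set (B.14): B~_m - X >= 0 for m = 1, ..., N."""
    return all(is_psd_hermitian(Bm - X, tol) for Bm in B_tilde)


# Hypothetical Hermitian B(w) with n_z = 2, used only to exercise the functions
def B_fun(w):
    return np.array([[1.5 + np.cos(w), 0.4j * np.sin(w)],
                     [-0.4j * np.sin(w), 1.2 + 0.5 * np.cos(w)]])


B_tilde = averaged_constraints(B_fun, N=8)
print(satisfies_relaxed(np.zeros((2, 2)), B_tilde),
      satisfies_relaxed(np.eye(2), B_tilde))
```

For this made-up $B(\omega)$, $X = 0$ is feasible while $X = I$ is not, because the diagonal of $B(\omega)$ drops below 1 near $\omega = \pi$. Increasing `N` tightens the relaxation, as described above.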

Alternatively, we can derive non-sharp upper bounds on the FVR numerator (B.10) in closed form. For example, one conservative upper bound is obtained by maximizing $\mathrm{tr}(XC) + \mathrm{tr}(AC)$ subject to $X + \Sigma_{\tilde z}^{-1} \le \left(\int_{-\pi}^{\pi}s_{\tilde z^\dagger}(\omega)\,d\omega\right)^{-1} = \mathrm{Var}(\tilde z_t^\dagger)^{-1}$. This yields the upper bound
$$\sum_{m=0}^{\ell-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})\,\mathrm{Var}(\tilde z_t^\dagger)^{-1}\mathrm{Cov}(y_{it}, \tilde z_{t-m})',$$
which binds if the shocks $\varepsilon_{x,t}$ are all recoverable, but is otherwise not sharp. A less conservative, but still generally suboptimal, upper bound is given by
$$\mathrm{tr}(AC) + \sum_{m=1}^{n_z}\inf_{\omega\in[0,\pi]}\check B_{mm}(\omega),$$
where $\check B_{mm}(\omega)$ is the $(m,m)$ element of $\check B(\omega) \equiv C^{1/2\prime}B(\omega)C^{1/2}$, and $C = C^{1/2}C^{1/2\prime}$. This latter upper bound is sharp when $n_z = 1$, in which case the lower and upper bounds in this section reduce to the FVR bound expressions derived in Section 5. B.5

Footnote B.5: See for example http://cvxr.com/cvx/doc/sdp.html. To transform our constraints into ones involving real matrices, note that a Hermitian matrix with real part $A$ and imaginary part $B$ is positive semidefinite if and only if the real symmetric matrix $\begin{pmatrix} A & -B \\ B & A \end{pmatrix}$ is positive semidefinite.

Interpretation. We highlight two special cases where the FVR with respect to $\Gamma\varepsilon_{x,t}$ (which we partially identified above) is of interest.

First, as in Mertens & Ravn (2013), one may assume that the $n_z$ instruments are correlated with the same number $n_{\varepsilon_x} = n_z$ of structural shocks. In that case $\Gamma$ is square and nonsingular, so the FVR with respect to $\Gamma\varepsilon_{x,t}$ is the same as the FVR with respect to the shocks $\varepsilon_{x,t}$ themselves. Moreover, if we further assume that all included shocks $\varepsilon_{x,t}$ are recoverable, then $\tilde z^\dagger_t \equiv E(\tilde z_t \mid \{y_\tau\}_{-\infty<\tau<\infty}) = \Gamma\varepsilon_{x,t}$, so the historical decomposition of $y_t$ with respect to $\varepsilon_{x,t}$ is point-identified as $E(y_t \mid \{\varepsilon_{x,\tau}\}_{-\infty<\tau\le t}) = E(y_t \mid \{\tilde z^\dagger_\tau\}_{-\infty<\tau\le t})$.

Second, consider the case with a single IV but possibly several included shocks, $n_{\varepsilon_x} > n_z$. The above analysis shows that, even though the IV exclusion restrictions in the baseline model (3) fail, the data are informative about the FVR with respect to the particular linear combination $\Gamma\varepsilon_{x,t}$ of shocks that enters the IV equation. The FVR with respect to this particular linear combination of shocks is evidently a lower bound for the FVR with respect to the full vector $\varepsilon_{x,t}$ of shocks that are correlated with the IV.

B.4 Invertibility and SVAR-IV

In this section we characterize the bias of SVAR-IV methods when shocks may be noninvertible. Throughout we assume the validity of the SVMA-IV model in Assumptions 1 to 3. Our analysis builds on results by Lippi & Reichlin (1994) and Forni et al. (2019), who do not consider identification using external instruments.

The SVAR-IV (or "proxy SVAR") strategy identifies structural shocks by using the external IV to rotate the forecast errors from a reduced-form VAR (Stock, 2008; Stock & Watson, 2012; Mertens & Ravn, 2013; Gertler & Karadi, 2015; Ramey, 2016). For analytical clarity, we work with a VAR($\infty$) model with forecast errors $u_t \equiv y_t - E(y_t \mid \{y_\tau\}_{-\infty<\tau<t})$.
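To make the SVAR-IV rotation concrete, here is a minimal simulation sketch in Python with NumPy. It assumes a hypothetical bivariate VAR(1) driven by two structural shocks (so the shock of interest is invertible), approximates the forecast errors $u_t$ by least-squares VAR($p$) residuals, and forms the rotation $\gamma \propto \Sigma_u^{-1}\operatorname{Cov}(u_t, z_t)$, scaled so that $\gamma'u_t$ has unit second moment, mirroring the normalization used in the proof of Proposition B.3. All function names and parameter values are illustrative.

```python
import numpy as np


def simulate_var1(A, B, T, rng):
    """Simulate y_t = A y_{t-1} + B eps_t with eps_t ~ N(0, I); y_0 = 0."""
    n, k = B.shape
    eps = rng.standard_normal((T, k))
    y = np.zeros((T, n))
    for t in range(1, T):
        y[t] = A @ y[t - 1] + B @ eps[t]
    return y, eps


def var_residuals(y, p):
    """Least-squares VAR(p) without intercept; residuals for t = p, ..., T-1."""
    T = y.shape[0]
    X = np.hstack([y[p - lag:T - lag] for lag in range(1, p + 1)])  # lagged regressors
    Y = y[p:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef


def proxy_svar_weights(u, z):
    """gamma proportional to Sigma_u^{-1} Cov(u_t, z_t), scaled so gamma' Sigma_u gamma = 1."""
    Sigma_u = u.T @ u / len(u)
    s_uz = u.T @ z / len(u)
    g = np.linalg.solve(Sigma_u, s_uz)
    return g / np.sqrt(s_uz @ g)


rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0, 0.0], [0.5, 1.0]])          # square and nonsingular: invertible shocks
T, p = 20_000, 1
y, eps = simulate_var1(A, B, T, rng)
z = eps[:, 0] + 0.5 * rng.standard_normal(T)    # noisy external instrument for shock 1

u = var_residuals(y, p)
gamma = proxy_svar_weights(u, z[p:])
eps1_hat = u @ gamma                            # SVAR-IV estimate of eps_1t
corr = np.corrcoef(eps1_hat, eps[p:, 0])[0, 1]
```

In this invertible design the recovered series is almost perfectly correlated with the true shock; with more shocks than observables, $\gamma'u_t$ instead mixes several shocks, which is the distortion characterized in Proposition B.3.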

Proposition B.3. Assume the SVMA-IV model in Assumptions 1 to 3. The shock that is (mis)identified by SVAR-IV is given by

$$\tilde\varepsilon_{1,t} \equiv \gamma' u_t = \sum_{j=1}^{n_\varepsilon}\sum_{\ell=0}^{\infty} a_{j,\ell}\,\varepsilon_{j,t-\ell}, \tag{B.15}$$

where the scalar coefficients $\{a_{j,\ell}\}$ satisfy $\sum_{j=1}^{n_\varepsilon}\sum_{\ell=0}^{\infty} a_{j,\ell}^2 = 1$ and $a_{1,0} = \sqrt{R_0}$. The associated SVAR-IV impulse responses are given by [fn. B.6]

$$\tilde\Theta_{\bullet,1,\ell} \equiv \operatorname{Cov}(y_t, \tilde\varepsilon_{1,t-\ell}) = \sum_{j=1}^{n_\varepsilon}\sum_{m=0}^{\infty} a_{j,m}\,\Theta_{\bullet,j,\ell+m}, \qquad \ell = 0, 1, 2, \dots,$$

and the impact impulse responses satisfy $\tilde\Theta_{\bullet,1,0} = \frac{1}{\sqrt{R_0}}\Theta_{\bullet,1,0}$.

Footnote B.6: In any SVAR($\infty$) model, the impulse responses implied by the model must equal the local projections of the outcomes on the identified shock(s). This follows from the Wold representation.

Under noninvertibility, SVAR-IV mis-identifies the shock as a distributed lag of all the structural shocks, with the weight on $\varepsilon_{1,t}$ equal to $\sqrt{R_0}$ (the square root of the degree of invertibility, cf. Section 2.1). This causes impulse responses to be conflated across horizons and shocks. At the impact horizon, SVAR-IV overstates the magnitudes of the true impulse responses $\Theta_{\bullet,1,0}$ (to a one standard deviation shock) by a factor of $1/\sqrt{R_0}$. Thus, the SVAR-IV-implied one-step-ahead forecast variance decompositions for the first shock overstate the true one-step-ahead FVRs (as defined in Section 2.1) by a factor of $1/R_0$. The bias of SVAR-IV-implied multi-step forecast variance decompositions depends in more complicated ways on the sequence of true impulse responses.

In summary, while SVAR-IV analysis solves the familiar "rotation problem" in SVAR analysis, it does not solve the invertibility problem. The issue is not that the IV selects a suboptimal linear combination $\gamma$ of the forecast residuals $u_t$ under noninvertibility, since it can be verified that $\gamma' u_t \propto E(\varepsilon_{1,t} \mid u_t)$ regardless of invertibility. [fn. B.7]

Rather, SVAR methods fail because they assume that the time-$t$ forecast residuals suffice to recover $\varepsilon_{1,t}$ (Lippi & Reichlin, 1994). Only under invertibility (i.e., $R_0 = 1$) do we have $a_{j,\ell} = 0$ for all $(j,\ell) \neq (1,0)$, so that $\tilde\varepsilon_{1,t}$ equals the true shock $\varepsilon_{1,t}$. The higher the degree of invertibility $R_0$, the smaller is the extent of the SVAR-IV bias, as discussed by Sims & Zha (2006), Forni et al. (2019), and Wolf (2020). An explicit illustration of SVAR-IV mis-identification is provided in Appendix B.5.
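The algebra behind Proposition B.3 can be checked numerically. The sketch below (Python with NumPy) assumes a hypothetical static design with two observables and three shocks, $u_t = M_0\varepsilon_t$ (so $y_t = u_t$ and $\Theta_{\bullet,1,0} = M_{\bullet,1,0}$), and an instrument with loading $\alpha$ on $\varepsilon_{1,t}$. It confirms that $a_{1,0} = \sqrt{R_0}$, that the squared weights sum to one, and that impact responses and one-step FVRs are inflated by $1/\sqrt{R_0}$ and $1/R_0$; the matrix entries and $\alpha$ are arbitrary.

```python
import numpy as np

# Hypothetical static design: y_t = u_t = M0 eps_t with 2 observables and 3 shocks.
M0 = np.array([[1.0, 0.5, 0.0],
               [0.3, 0.0, 1.0]])
Sigma_u = M0 @ M0.T
m = M0[:, 0]                                   # M_{.,1,0}, equal to Theta_{.,1,0} here

R0 = m @ np.linalg.solve(Sigma_u, m)           # degree of invertibility
alpha = 0.7                                    # hypothetical IV loading on eps_1t
Sigma_uz = alpha * m                           # Cov(u_t, z_t)
g = np.linalg.solve(Sigma_u, Sigma_uz)
gamma = g / np.sqrt(Sigma_uz @ g)              # SVAR-IV rotation with Var(gamma' u_t) = 1

a = gamma @ M0                                 # weights a_{j,0} on each shock
impact_svar = Sigma_u @ gamma                  # SVAR-IV impact responses
fvr_ratio = impact_svar**2 / m**2              # SVAR-IV over true one-step FVR, per variable

print(f"R0 = {R0:.4f}, a_10 = {a[0]:.4f}, sqrt(R0) = {np.sqrt(R0):.4f}")
```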

Footnote B.7: In particular, no other linear combination $\gamma$ can yield a representation (B.15) where the weight $a_{1,0}$ exceeds $\sqrt{R_0}$ (subject to $\operatorname{Var}(\tilde\varepsilon_{1,t}) = 1$). Thus, the IV handles the identification problem as well as possible subject to the constraints imposed by the (erroneous) invertibility assumption. As discussed in Section 2.1, dynamic rotations circumvent this issue by obtaining the shock $\tilde\varepsilon_{1,t}$ as a function of current and future reduced-form residuals $\{u_\tau\}_{\tau\ge t}$. An argument similar to that in the proof of Proposition B.3 shows that, with such dynamic rotations, the weight on the true shock of interest is bounded above by $\sqrt{R_\infty}$. Dynamic rotations can thus solve the identification problem if and only if the shock of interest is recoverable.

B.5 Illustration in a structural macro model

In Section 6 we use several simple analytical examples to illustrate how our upper bound works. In this section we complement those simple examples with a quantitative exercise.

The nature of our exercise is as follows. We consider an econometrician observing (i) a small set of macroeconomic aggregates generated from the model of Smets & Wouters (2007) and (ii) noisy measures of some of the model's true underlying structural shocks (i.e., valid external instruments). For clarity, we abstract from any sampling uncertainty and assume that the econometrician observes an infinite amount of data, so the joint spectral density of observed macro aggregates and external IVs is perfectly known to her. Given this spectral density, she uses our bounds to draw conclusions about variance decompositions and the degree of invertibility, without exploiting the underlying structure of the model. Overall, the point of this exercise is to show that our conclusions on the likely tightness of the upper bound are not an artifact of the particular stylized environments considered in Section 6, but similarly obtain in quantitatively relevant, dynamic structural macro models, for exactly the same economic reasons.

B.5.1 Preliminaries

We employ the Smets & Wouters (2007) model. Throughout, we parametrize the model according to the posterior mode estimates of Smets & Wouters (2007). [fn. B.8]

Following the canonical trivariate VAR in the empirical literature on monetary policy shock transmission, our baseline specification assumes the econometrician observes aggregate output, inflation, and the short-term policy interest rate; we consider additional observables below. These macro aggregates are all stationary in the model, so they should be viewed as deviations from trend. The model features seven unobserved shocks, so not all shocks can be invertible in the baseline specification.

The econometrician observes a single external instrument $z_t$ for the shock of interest $\varepsilon_{1,t}$:

$$z_t = \alpha\varepsilon_{1,t} + \sigma_v v_t.$$

We normalize $\alpha = 1$ throughout and compute identified sets for two different degrees of informativeness of the external instrument, $\sigma_v \in \{0.25, \ldots\}$. We do not attach any specific

Footnote B.8: Our implementation of the Smets-Wouters model is based on Dynare replication code kindly provided by Johannes Pfeifer. The code is available at https://sites.google.com/site/pfeiferecon/dynare.

$$r_t = \rho_r r_{t-1} + (1 - \rho_r)\left(\phi_\pi \pi_t + \phi_y \hat y_t + \phi_{dy}(\hat y_t - \hat y_{t-1})\right) + \varepsilon^m_{t-2},$$

where $r_t$ denotes the nominal interest rate, $\pi_t$ denotes inflation, $\hat y_t$ is the output gap, and $\varepsilon^m_t$ is the monetary shock. [fn. B.9]

Overall, these three shocks are chosen in line with our simple analytical illustrations in Section 6, and identification will be subject to the same economic intuition as the small-scale examples discussed there.

We emphasize that our results in the remainder of this section should not be taken to imply that conventional monetary shocks are robustly near-invertible, or that forward guidance and technology shocks are never invertible; clearly, our statements are always conditional on a certain set of observables. Instead, the only purpose of this section is to document that the simple economic intuition of Section 6 still plays out in a quantitative macro model with rich dynamics. To further clarify this point, we finish this section by briefly discussing how our results change with alternative sets of observables.

B.5.2 Information content of several observables

We first consider identification of monetary policy shocks, i.e., shocks to the serially correlated disturbance in the model's Taylor rule.

The monetary shock is nearly invertible. In the model of Smets & Wouters, monetary policy shocks are the only shock to contemporaneously move inflation and nominal rates in opposite directions (Uhlig, 2005). Given this unique conditional co-movement, the intuition offered in Section 6.1 suggests that the degree of invertibility should be high, and indeed it equals $R_0 = 0.8702$, with degree of recoverability $R_\infty = 0.8763$ (Table B.1).

Footnote B.9: This is the notion of forward guidance discussed, for example, in Del Negro et al. (2012).

Figure B.1: Monetary shock: Identified set of FVRs. Horizon-by-horizon identified sets for FVRs up to 10 quarters. The two lower bounds correspond to an IV with $\sigma_v = 0.25$ (lower dashed line) and an IV with $\sigma_v = 0.\ldots$ … frequency-by-frequency ($\alpha_{LB} = 0.\ldots$). [fn. B.10]

Figure B.1 shows that the upper bounds on the forecast variance ratios are close to the true values. By construction, the upper and lower bounds are proportional to the true FVRs. The lower bound scales one-for-one with instrument informativeness, while the upper bound scales one-for-one with the maximal informativeness of the data for the shock across frequencies. Thus, near-invertibility immediately implies that the upper bounds are throughout close to the true FVR. In contrast, the informativeness of the lower bounds depends entirely on the strength of the IV.
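A minimal sketch of this proportionality, under two assumptions of ours that follow from the instrument equation in Appendix B.5.1 but are not stated in this paragraph: because $z_t = \alpha\varepsilon_{1,t} + \sigma_v v_t$ is serially uncorrelated, the largest admissible IV loading satisfies $\alpha_{UB}^2 = \alpha^2 + \sigma_v^2$; and the smallest satisfies $\alpha_{LB}^2 = \alpha^2 \sup_\omega 2\pi s_{\varepsilon^\dagger}(\omega)$. Since the identified FVR numerator is proportional to $1/\alpha^2$, the lower bound is the true FVR times $\alpha^2/(\alpha^2+\sigma_v^2)$, and the upper bound is the true FVR divided by the peak of $2\pi s_{\varepsilon^\dagger}(\cdot)$. The inputs below are hypothetical.

```python
import numpy as np


def fvr_bounds(fvr_true, sigma_v, scaled_spec, alpha=1.0):
    """Population FVR bounds implied by the proportionality argument (illustrative).

    fvr_true    : true FVRs by horizon.
    sigma_v     : noise scale in z_t = alpha * eps_1t + sigma_v * v_t.
    scaled_spec : values of 2*pi*s_{eps_dagger}(omega) on a frequency grid.
    """
    fvr_true = np.asarray(fvr_true, dtype=float)
    iv_strength = alpha**2 / (alpha**2 + sigma_v**2)   # alpha^2 / alpha_UB^2
    peak_info = float(np.max(scaled_spec))            # alpha_LB^2 / alpha^2 (assumed)
    return fvr_true * iv_strength, fvr_true / peak_info


fvr = np.array([0.10, 0.20, 0.30])       # hypothetical true FVRs
flat_spec = np.full(50, 0.9)             # nearly flat predictor spectrum, as for the monetary shock
lower, upper = fvr_bounds(fvr, sigma_v=0.25, scaled_spec=flat_spec)
```

With $\sigma_v = 0.25$ the lower bound is about 94% of the truth, while a flat spectrum at 0.9 puts the upper bound about 11% above it.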

B.5.3 Dynamic information content

Next, we consider the identification of technology shocks, i.e., innovations to the autoregressive process of total factor productivity. This type of shock illustrates how our sharp upper bound leverages information across frequencies.

In our baseline trivariate specification, the macro aggregates are informative about only the longest cycles of the technology shock. Figure B.2 reports the spectral density of the best two-sided linear predictor of the technology shock. Strikingly, this spectral density is small at business-cycle frequencies, but close to 1 for long-run fluctuations, with a peak of $\alpha_{LB} = 0.\ldots$ … ($R_0 = 0.1977$, $R_\infty = 0.2166$; Table B.1). [fn. B.11]

Footnote B.10: Formally, the scaled spectral density $2\pi s_{\varepsilon^\dagger}(\cdot)$ of the best two-sided linear predictor of the monetary shock is nearly flat at around 0.9.

Figure B.2: Technology shock: Spectral density of best 2-sided linear predictor. Scaled spectral density $2\pi s_{\varepsilon^\dagger}(\cdot)$ of the best two-sided linear predictor of the technology shock. A frequency $\omega$ corresponds to a cycle of length $2\pi/\omega$ quarters.

Consequently, Figure B.3 shows that the sharp upper bounds on FVRs are tight. As always, the tightness of the lower bound is entirely governed by the strength of the IV.
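The frequency-domain reading of Figure B.2 can be illustrated with a hypothetical toy model that is not part of the paper: $y_t = \varepsilon_{1,t} + \varepsilon_{1,t-1} + e_t$, with $e_t$ white noise of variance $\sigma^2$ independent of the unit-variance shock. Then $2\pi s_{\varepsilon^\dagger}(\omega) = (2+2\cos\omega)/(2+2\cos\omega+\sigma^2)$, which peaks at $\omega = 0$ (infinitely long cycles) and vanishes at $\omega = \pi$ (two-quarter cycles). The Python sketch computes the peak, the degree of recoverability $R_\infty$ as the average of $2\pi s_{\varepsilon^\dagger}$ over $[0,\pi]$, and the average over a 6 to 32 quarter business-cycle band, using cycle length $2\pi/\omega$.

```python
import numpy as np


def scaled_spec_eps_dagger(omega, sigma2):
    """2*pi times the spectral density of E(eps_1t | all y) in the toy model."""
    gain = 2.0 + 2.0 * np.cos(omega)        # |1 + exp(-i*omega)|^2
    return gain / (gain + sigma2)


sigma2 = 1.0
n_grid = 100_000
omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid   # midpoint grid on [0, pi]
f = scaled_spec_eps_dagger(omega, sigma2)

peak = float(scaled_spec_eps_dagger(0.0, sigma2))    # 4 / (4 + sigma2)
R_inf = float(f.mean())                              # (1/pi) * integral of f over [0, pi]

# Cycle length in quarters is 2*pi/omega, so 6 to 32 quarters means omega in [2*pi/32, 2*pi/6].
band = (omega >= 2 * np.pi / 32) & (omega <= 2 * np.pi / 6)
bc_average = float(f[band].mean())
```

For $\sigma^2 = 1$ the closed form is $R_\infty = 1 - 1/\sqrt{5} \approx 0.553$, well below the peak of 0.8; in this toy model the upper bound is governed by the peak rather than by $R_\infty$.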

B.5.4 Non-invertibility and news shocks

For our third example, we modify the model to include forward guidance shocks, a type of news shock. As discussed above, a forward guidance shock is identical to a monetary shock, except that it is anticipated two quarters in advance by economic agents. This third example illustrates the robustness of our method to non-invertibility induced by news shocks.

The forward guidance shock is highly non-invertible, though it is nearly recoverable. Figure B.4 shows the degree of invertibility $R_\ell$ up to time $t+\ell$ of the forward guidance shock. The figure considers horizons from $\ell = 0$ (the degree of invertibility) up to $\ell = 10$ (close to the degree of recoverability). Contemporaneous informativeness is limited, with $R_0 = 0.0768$. [fn. B.12] At $\ell = 2$, however, the corresponding $R_\ell$ jumps to $R_2 = 0.\ldots$ … $R_\infty = 0.8807$.

Footnote B.11: The issue is that, at short horizons, other shocks (notably the price and wage mark-up shocks) also push inflation and output in opposite directions. However, the technology shock becomes nearly invertible with a judicious choice of further observables; in particular, including either the level of TFP or hours worked leads to a nearly invertible representation.

Figure B.3: Technology shock: Identified set of FVRs. Horizon-by-horizon identified sets for FVRs up to 10 quarters. The two lower bounds correspond to an IV with $\sigma_v = 0.25$ (lower dashed line) and an IV with $\sigma_v = 0.\ldots$.

Footnote B.12: This belief is further reinforced by the movement of nominal interest rates: since output and inflation have both increased (for an expansionary forward guidance shock), nominal interest rates initially increase (as dictated by the Taylor rule), before declining two periods later.

Figure B.4: Forward guidance shock: Degree of invertibility at time $t+\ell$. Population $R_\ell$ for the forward guidance shock, with three observables (output, inflation, interest rate) and seven observables (the full set in Smets & Wouters, 2007).

… in Appendix B.4, the (impact) FVR is biased upward by a factor of $1/R_0 \approx 13$ (!). [fn. B.13]

Figure B.6 shows that our method instead achieves a tight upper bound on the FVR, irrespective of the degree of invertibility. Since the forward guidance shock is nearly recoverable, the upper bounds of our identified sets for the different FVRs are again close to the truth, similar to the conventional (near-invertible) monetary shock studied in Appendix B.5.2.
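The SVAR-IV bias factors can be reproduced from the baseline degrees of invertibility in Table B.1, using the results of Appendix B.4: impact responses are overstated by $1/\sqrt{R_0}$ and one-step-ahead FVRs by $1/R_0$. A short Python sketch of that arithmetic:

```python
import math

# Baseline R_0 values from Table B.1 (output, inflation, and interest rate observed).
R0_baseline = {"monetary": 0.8702, "technology": 0.1977, "forward_guidance": 0.0768}

bias = {
    shock: {"impact_irf": 1 / math.sqrt(R0), "one_step_fvr": 1 / R0}
    for shock, R0 in R0_baseline.items()
}

for shock, factors in bias.items():
    print(f"{shock:17s} IRF x{factors['impact_irf']:.2f}  FVR x{factors['one_step_fvr']:.2f}")
```

The forward guidance row gives the factor of about 13 quoted above, while the near-invertible monetary shock is inflated by only about 15%.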

B.5.5 Other observables

The results in the preceding sections are designed to illustrate the economic logic of our method. They should not, however, be interpreted as offering generally valid conclusions on the invertibility (or lack thereof) of different structural shocks; such statements are invariably sensitive to the choice of observables. To further emphasize this point, Table B.1 reports the degrees of invertibility and recoverability for each shock, for different sets of macro observables.

The degrees of invertibility and recoverability are by definition weakly increasing as macroeconomic observables are added. For the baseline monetary shock, the degree of invertibility is high as soon as the researcher observes both the nominal interest rate and inflation; with the full set of observables, the shock becomes perfectly invertible. For the technology shock, the degree of invertibility jumps to almost 1 as soon as hours worked become observable; intuitively, this is so because, at the posterior mode of the Smets-Wouters model, most high- and low-frequency variation of hours worked is driven by technology shocks. Finally, because none of the observables included in the estimation exercise of Smets & Wouters are forward-looking measures of nominal interest rates, the forward guidance shock remains highly non-invertible regardless of the choice of observables.

Footnote B.13: Of course, for suitably chosen sets of observables (e.g., expectations about future interest rates), the non-invertibility problem would disappear (Leeper et al., 2013). Our method is robust in the sense that it does not require such a judicious choice of further observables; it works even under extreme non-invertibility.

Figure B.5: Forward guidance shock: SVAR-IV FVRs. FVRs for a forward guidance shock in the Smets-Wouters model, true values and SVAR-IV-estimated values (population limit). Baseline set of three observables.

Figure B.6: Forward guidance shock: Identified set of FVRs. Horizon-by-horizon identified sets for FVRs up to 10 quarters. The two lower bounds correspond to an IV with $\sigma_v = 0.25$ (lower dashed line) and an IV with $\sigma_v = 0.\ldots$.

Table B.1: Structural illustration: Degree of invertibility/recoverability

| Macro observables | Monetary shock $R_0$ | Monetary shock $R_\infty$ | Technology shock $R_0$ | Technology shock $R_\infty$ | Forw. guid. shock $R_0$ | Forw. guid. shock $R_\infty$ |
|---|---|---|---|---|---|---|
| Baseline | 0.8702 | 0.8763 | 0.1977 | 0.2166 | 0.0768 | 0.8807 |
| + investm. + consum. | 0.9415 | 0.9507 | 0.2128 | 0.2384 | 0.0980 | 0.9492 |
| + hours | 0.9272 | 0.9286 | 0.9799 | 0.9816 | 0.0774 | 0.9331 |
| All observables | 1 | 1 | 1 | 1 | 0.1049 | 1 |

Degree of invertibility $R_0$ and degree of recoverability $R_\infty$ in the Smets-Wouters model, given different sets of macro observables $y_t$. "Baseline" is the 3-variable specification with output, inflation, and short-term interest rate. The second and third rows add either (i) investment and consumption or (ii) hours to the baseline observables. The last row has the full set of observables considered in Smets & Wouters (2007).

Figure B.7: Empirical application: Forecast variance ratios, SVAR-IV.

Point estimates and 90% confidence intervals for the identified sets of forecast variance ratios/decompositions estimated by SVAR-IV, across different variables and forecast horizons. For visual clarity, we force bias-corrected estimates/bounds to lie in [0, 1].

B.6 Supplementary empirical results

Complementing the empirical results in Section 4, Figure B.7 shows the forecast variance ratios/decompositions estimated by a conventional SVAR-IV procedure, with bootstrap confidence intervals. The conclusions about the irrelevance of monetary shocks for output growth and inflation are even starker in this figure than in the main paper. Our bounds estimates in Section 4 show that the irrelevance of monetary shocks is not merely an artifact of assuming invertibility, but is instead a robust implication of the empirical covariances of the macro aggregates $y_t$ with the monetary shock instrument $z_t$ of Gertler & Karadi (2015).

B.7 Nonparametric sieve VAR inference

In this appendix section we show that the bound estimates proposed in Section 3 are jointly asymptotically normal under nonparametric conditions on the DGP, as long as the VAR lag length is chosen to increase with the sample size at an appropriate rate. The nonparametric viewpoint does not change the practical steps necessary to implement the inference strategy; it only provides regularity conditions under which it is asymptotically innocuous to approximate the true VAR($\infty$) data generating process by a finite-lag VAR. We utilize the classic sieve VAR results of Lewis & Reinsel (1985) (who build on the univariate results of Berk, 1974) to prove asymptotic normality of those nonlinear functionals of the estimated VAR spectrum that appear in our bounds. Our main result below is similar in spirit to the abstract theorem in Saikkonen & Lütkepohl (2000, Thm. 2), although our regularity conditions are more easily verifiable as they are tailored to our parameters of interest (however, unlike Saikkonen & Lütkepohl, we only consider stationary data).

The purpose of this section is merely to demonstrate that existing sieve VAR theory implies that empirical SVMA-IV analysis can be carried out in a nonparametric fashion. We do not claim to provide conceptually new insights into sieve VAR econometrics. Although here we only prove the validity of the sieve VAR strategy for delta method inference, we expect that similar results could be established for bootstrap sieve VAR inference in the SVMA-IV model; see Gonçalves & Kilian (2007), Meyer & Kreiss (2015), and references therein.

B.7.1 Assumptions, parameters of interest, and estimator

We first define the general class of parameters of interest for empirical SVMA-IV analysis, and we place assumptions on the DGP and VAR lag length. Our goal is to stay close to the set-up in Lewis & Reinsel (1985), so as to demonstrate how existing asymptotic results can be readily adapted to study sieve VAR estimators for SVMA-IV purposes.

We assume that the data are generated by a reduced-form VAR($\infty$) model with i.i.d. innovations. The observations are denoted by $W_t \equiv (y_t', z_t)' \in \mathbb{R}^{n_W}$, $t = 1, 2, \dots, T$, where $n_W \equiv n_y + 1$. In order to make clear the connection with Lewis & Reinsel (1985), we assume that the data are known to have mean zero. It is straightforward to extend all results to allow for non-zero means by including an intercept in the estimated VAR. Let $\|B\| \equiv (\operatorname{tr}(B'B))^{1/2}$ denote the Frobenius norm.

Assumption B.1. The process $\{W_t\}$ is generated by the mean-zero stationary VAR($\infty$) model $A(L)W_t = e_t$. Here $A(z) \equiv I_{n_W} - \sum_{\ell=1}^{\infty} A_\ell z^\ell$ for $z \in \mathbb{C}$, and $A_\ell \in \mathbb{R}^{n_W\times n_W}$ for all $\ell$. We impose the following conditions:
i) $\det(A(z)) \neq 0$ for all $|z| \le 1$, and $\sum_{\ell=1}^{\infty}\|A_\ell\| < \infty$.
ii) $\{e_t\}$ is an $n_W$-dimensional i.i.d. process with $E(e_t) = 0_{n_W\times 1}$, $\Sigma \equiv \operatorname{Var}(e_t)$ is positive definite, and $E\|e_t\|^8 < \infty$.

These conditions are the same as in Lewis & Reinsel (1985), except that we here assume that $e_t$ has 8 moments instead of just 4. [fn. B.14]
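For a finite-order VAR, condition (i) of Assumption B.1 can be checked through the companion matrix: $\det(A(z)) \neq 0$ for all $|z| \le 1$ holds exactly when every companion eigenvalue lies strictly inside the unit circle. The Python sketch below (coefficient values are hypothetical) performs that check and evaluates the VAR spectral density $s_W(\omega) = \frac{1}{2\pi}A(e^{-i\omega})^{-1}\Sigma A(e^{-i\omega})^{-1*}$, verifying numerically that it integrates over $[-\pi,\pi]$ to $\operatorname{Var}(W_t)$, which for a VAR(1) solves $\Gamma_0 = A_1\Gamma_0A_1' + \Sigma$.

```python
import numpy as np


def companion(A_list):
    """Companion matrix of W_t = A_1 W_{t-1} + ... + A_p W_{t-p} + e_t."""
    n, p = A_list[0].shape[0], len(A_list)
    top = np.hstack(A_list)
    if p == 1:
        return top
    bottom = np.hstack([np.eye(n * (p - 1)), np.zeros((n * (p - 1), n))])
    return np.vstack([top, bottom])


def is_stable(A_list):
    """True iff det(I - sum_l A_l z^l) != 0 for all |z| <= 1."""
    return bool(np.max(np.abs(np.linalg.eigvals(companion(A_list)))) < 1.0)


def spectral_density(A_list, Sigma, omega):
    """s_W(omega) = (1/2pi) A(e^{-i omega})^{-1} Sigma A(e^{-i omega})^{-1*}."""
    n = Sigma.shape[0]
    A_w = np.eye(n, dtype=complex)
    for lag, A in enumerate(A_list, start=1):
        A_w -= A * np.exp(-1j * omega * lag)
    Psi = np.linalg.inv(A_w)
    return Psi @ Sigma @ Psi.conj().T / (2.0 * np.pi)


A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

# Integrate s_W over [-pi, pi] with the midpoint rule.
n_grid = 2000
grid = -np.pi + (np.arange(n_grid) + 0.5) * 2.0 * np.pi / n_grid
integral = sum(spectral_density([A1], Sigma, w) for w in grid) * (2.0 * np.pi / n_grid)

# Var(W_t) for a VAR(1): vec(Gamma0) = (I - A1 kron A1)^{-1} vec(Sigma).
n = A1.shape[0]
Gamma0 = np.linalg.solve(np.eye(n * n) - np.kron(A1, A1), Sigma.reshape(-1)).reshape(n, n)
```

Because the integrand is smooth and periodic, the midpoint rule converges very quickly, so a modest grid reproduces $\Gamma_0$ to high precision.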

Meyer & Kreiss (2015) discuss the generality of assuming a reduced-form VAR($\infty$) with i.i.d. disturbances; see also Kreiss et al. (2011) for more details in the univariate case. Assuming the SVMA-IV model (1) to (3) holds, the i.i.d. assumption on the one-step-ahead reduced-form forecast errors $e_t$ is automatically satisfied, provided that the structural shocks $(\varepsilon_t', v_t)'$ are themselves i.i.d. and either (i) invertible, or (ii) Gaussian (regardless of invertibility). Although we are here deliberately aiming at conceptual clarity rather than full generality, we expect it would be straightforward to weaken the i.i.d. assumption on $e_t$ by appealing to a suitable multivariate version of the sieve VAR result of Gonçalves & Kilian (2007), who assume heteroskedastic martingale difference innovations.

Next, we define the class of parameters of interest for empirical SVMA-IV analysis. Define the two matrix-valued functions

$$A_{\cos}(\omega) \equiv \sum_{\ell=1}^{\infty} A_\ell\cos(\omega\ell), \qquad A_{\sin}(\omega) \equiv \sum_{\ell=1}^{\infty} A_\ell\sin(\omega\ell), \qquad \omega\in[0,\pi].$$

The parameter of interest is of the form

$$\psi \equiv \int_0^{\pi} h(\omega)\, g(A_{\cos}(\omega), A_{\sin}(\omega), \Sigma)\, d\omega,$$

where we define the functions $h\colon[0,\pi]\to\mathbb{R}^K$ and $g\colon\mathcal{A}_\delta\times\mathcal{S}_{n_W}\to\mathbb{R}^K$, the set

$$\mathcal{A}_\delta = \left\{(B_1, B_2)\in\mathbb{R}^{n_W\times n_W}\times\mathbb{R}^{n_W\times n_W} : |\det(I_{n_W} - B_1 - iB_2)| \ge \delta\right\},$$

and the fixed number $\delta > 0$, which is strictly smaller than $\inf_{\omega\in[0,\pi]}|\det(A(e^{i\omega}))|$. $\mathcal{S}_{n_W}$ denotes the set of $n_W\times n_W$ symmetric positive definite matrices.

For appropriate choices of $h(\cdot)$ and $g(\cdot)$, the above class of parameters includes almost all the parameters/bounds in SVMA-IV analysis. [fn. B.15]

Footnote B.14: We only use more than four moments in the proofs of Lemmas B.6 to B.8 below, where the extra moments make the arguments more transparent.

For example, the class contains (i) elements $\Sigma_{ij}$ of $\Sigma$, (ii) the degree of recoverability $R_\infty = \int_0^{\pi} s_{y\tilde z}(\omega)^* s_y(\omega)^{-1} s_{y\tilde z}(\omega)\, d\omega$, and (iii) autocovariances $E(w_{i,t}w_{j,t-\ell}) = \int_0^{\pi} e^{i\omega\ell} s_{w,ij}(\omega)\, d\omega$. Here $w_t \equiv (y_t', \tilde z_t)'$, and for all $\omega\in[0,\pi]$,

$$s_w(\omega) = \begin{pmatrix} s_y(\omega) & s_{y\tilde z}(\omega) \\ s_{\tilde z y}(\omega) & s_{\tilde z}(\omega) \end{pmatrix} = \frac{1}{2\pi}\begin{pmatrix} (I_{n_y}, 0_{n_y\times 1})\,A(e^{-i\omega})^{-1} \\ (0_{1\times n_y}, 1) \end{pmatrix} \Sigma \begin{pmatrix} (I_{n_y}, 0_{n_y\times 1})\,A(e^{-i\omega})^{-1} \\ (0_{1\times n_y}, 1) \end{pmatrix}^{*},$$

where $A(e^{-i\omega}) = I_{n_W} - A_{\cos}(\omega) + iA_{\sin}(\omega)$. Other SVMA-IV parameters can be constructed as nonlinear transformations of a finite number of autocovariances. By the Cramér-Wold device, it is without loss of generality to consider vector-valued (rather than matrix-valued) functions $h(\cdot)$ and $g(\cdot)$. In the following, we further assume $K = 1$ so that both $h(\cdot)$ and $g(\cdot)$ are scalar. This eases the notation without sacrificing essential generality, as should be clear from the proofs.

We place certain smoothness conditions on the parameter of interest, thus permitting a delta method argument.

Assumption B.2.

The function $h(\cdot)$ is continuous on $[0,\pi]$. On any non-empty, compact subset of the domain $\mathcal{A}_\delta\times\mathcal{S}_{n_W}$, the function $g(\cdot,\cdot,\cdot)$ is twice continuously differentiable. Denote the partial derivatives by $g_1(B_1,B_2,S) \equiv \frac{\partial g(B_1,B_2,S)}{\partial\operatorname{vec}(B_1)'}$, $g_2(B_1,B_2,S) \equiv \frac{\partial g(B_1,B_2,S)}{\partial\operatorname{vec}(B_2)'}$, and $g_3(B_1,B_2,S) \equiv \frac{\partial g(B_1,B_2,S)}{\partial\operatorname{vec}(S)'}$. At the true VAR parameters $\{A_\ell\}$ and $\Sigma$, each of the functions $\omega\mapsto g_j(A_{\cos}(\omega), A_{\sin}(\omega), \Sigma)$, $j = 1, 2, 3$, belongs to $L^2(0,\pi)$ (element-wise).

The smoothness conditions in Assumption B.2 are easily verified for all parameters of interest in SVMA-IV analysis, since Assumption B.1 ensures that the true VAR spectrum is non-singular.

Finally, we define a sieve VAR estimator as the sample analogue of the population parameter of interest. For any $p\in\mathbb{N}$, define $X_t(p) \equiv (W_{t-1}', \dots, W_{t-p}')' \in \mathbb{R}^{n_W p}$ and the least-squares estimator

$$\hat\beta(p) \equiv \left(\hat A_1(p), \dots, \hat A_p(p)\right) \equiv \left(\sum_{t=p+1}^{T} W_t X_t(p)'\right)\left(\sum_{t=p+1}^{T} X_t(p)X_t(p)'\right)^{-1}.$$

Let $\hat\Sigma(p) \equiv (T-p)^{-1}\sum_{t=p+1}^{T}\hat e_t(p)\hat e_t(p)'$, where $\hat e_t(p) \equiv W_t - \hat\beta(p)X_t(p)$. Define also

$$\hat A_{\cos}(\omega; p) \equiv \sum_{\ell=1}^{p}\hat A_\ell(p)\cos(\omega\ell), \qquad \hat A_{\sin}(\omega; p) \equiv \sum_{\ell=1}^{p}\hat A_\ell(p)\sin(\omega\ell), \qquad \omega\in[0,\pi].$$

The VAR($p$) estimator of the parameter of interest $\psi$ is then

$$\hat\psi(p) \equiv \int_0^{\pi} h(\omega)\, g(\hat A_{\cos}(\omega; p), \hat A_{\sin}(\omega; p), \hat\Sigma(p))\, d\omega.$$

The VAR lag length $p = p_T$ must be chosen to grow with the sample size $T$ at an appropriate rate, unless the true DGP is a finite-order VAR.

Assumption B.3. $p_T\in\mathbb{N}$ is a deterministic function of the sample size $T$ such that $p_T^3/T \to 0$ and $T^{1/2}\sum_{\ell=p_T+1}^{\infty}\|A_\ell\| \to 0$ as $T\to\infty$.

These conditions are adopted from Lewis & Reinsel (1985, Thm. 2); see also Berk (1974). The last condition in Assumption B.3 amounts to oversmoothing (i.e., choosing the lag length $p$ so large that the variance dominates the mean square error), which ensures that the nonparametric bias does not show up in asymptotic limiting distributions. If the partial autocorrelations of the data decay exponentially fast with the lag length, Assumption B.3 is satisfied by choosing $p_T \propto T^{\phi}$ for any $\phi\in(0, 1/3)$. If the true DGP is a finite-order VAR, we may instead choose $p_T$ to be any constant greater than the true lag length.

Footnote B.15: The only exception is the parameter $\sup_{\omega\in[0,\pi]} s_{\tilde z^\dagger}(\omega)$, which is discussed in the main text.

B.7.2 Main convergence results

We now state our main results on the asymptotic normality of the sieve VAR estimator and the consistency of the asymptotic variance estimator.

In preparation for stating our results, define for all $T$ the vector $\nu_T = (\nu_{1,T}', \dots, \nu_{p_T,T}')' \in \mathbb{R}^{n_W^2 p_T}$, where

$$\nu_{\ell,T} \equiv \int_0^{\pi} h(\omega)\left\{g_1(A_{\cos}(\omega), A_{\sin}(\omega), \Sigma)'\cos(\omega\ell) + g_2(A_{\cos}(\omega), A_{\sin}(\omega), \Sigma)'\sin(\omega\ell)\right\} d\omega \in \mathbb{R}^{n_W^2}, \qquad \ell = 1, 2, \dots, p_T.$$

Define also

$$\xi \equiv \int_0^{\pi} h(\omega)\, g_3(A_{\cos}(\omega), A_{\sin}(\omega), \Sigma)'\, d\omega \in \mathbb{R}^{n_W^2}.$$

We also define the estimators $\hat\nu_T$ and $\hat\xi(p_T)$ of $\nu_T$ and $\xi$ obtained by substituting $A_{\cos}(\cdot)$ and $A_{\sin}(\cdot)$ with $\hat A_{\cos}(\cdot; p_T)$ and $\hat A_{\sin}(\cdot; p_T)$ in the above formulas. Finally, we define $\Gamma(p) \equiv E(X_t(p)X_t(p)')$ for all $p\in\mathbb{N}$ and the sample analogue $\hat\Gamma(p) \equiv (T-p)^{-1}\sum_{t=p+1}^{T} X_t(p)X_t(p)'$.

In the rest of this section, all convergence statements are understood to be taken as $T\to\infty$. Our first main proposition states that the sieve VAR estimator of the parameter of interest is asymptotically normal under our nonparametric conditions on the data generating process, the conditions on the estimated VAR lag order, and the regularity conditions on the parameter of interest.

Proposition B.4.

Let Assumptions B.1 to B.3 hold. Assume $\sigma_\psi^2 \equiv \lim_{T\to\infty}\nu_T'(\Gamma(p_T)^{-1}\otimes\Sigma)\nu_T + \xi'\operatorname{Var}(e_t\otimes e_t)\xi$ is strictly positive and that the limit exists. Then $(T-p_T)^{1/2}(\hat\psi(p_T) - \psi) \xrightarrow{d} N(0, \sigma_\psi^2)$.

Under our regularity conditions on the parameter of interest, the convergence rate of the sieve VAR estimator $\hat\psi(p_T)$ is $(T-p_T)^{-1/2} = O(T^{-1/2})$. The condition that $\sigma_\psi^2$ exists and is nonzero rules out degenerate parameters that can be estimated super-consistently. This condition could, for example, be violated if the true parameter of interest is on the boundary of its parameter space (e.g., if the true FVD is 0, or the true degree of invertibility is 1). Such issues are not unique to SVMA-IV and could similarly arise in SVAR inference.

Our second main proposition states that the usual delta method standard errors for a VAR($p_T$) model are valid asymptotically.

Proposition B.5.

Let the assumptions of Proposition B.4 hold. Let

$$\hat\sigma_\psi^2(p_T) \equiv \hat\nu_T'\left(\hat\Gamma(p_T)^{-1}\otimes\hat\Sigma(p_T)\right)\hat\nu_T + \hat\xi(p_T)'\,\hat\Xi(p_T)\,\hat\xi(p_T),$$

where $\hat\chi_t(p_T) \equiv \operatorname{vec}(\hat e_t(p_T)\hat e_t(p_T)' - \hat\Sigma(p_T))$ and $\hat\Xi(p_T) \equiv (T-p_T)^{-1}\sum_{t=p_T+1}^{T}\hat\chi_t(p_T)\hat\chi_t(p_T)'$. Then $\hat\sigma_\psi^2(p_T) \xrightarrow{p} \sigma_\psi^2$.

Observe that $\hat\sigma_\psi^2(p_T)$ is precisely the asymptotic variance estimator for $\hat\psi(p_T)$ that one would compute from the delta method formula based on a VAR($p_T$) model for the data.

To summarize, Propositions B.4 and B.5 imply that delta method inference based on the estimated VAR($p_T$) process is valid asymptotically even if the true DGP is a VAR($\infty$). Hence, the partial identification robust confidence intervals proposed in Section 3 are valid under our regularity conditions. This conclusion is consistent with the finite-sample simulation evidence presented in Section 7.

B.8 Additional proofs and auxiliary lemmas

Here we prove Lemma 1 and all additional results stated in this appendix. We first prove results related to the SVMA-IV identification analysis. Then we address the sieve VAR convergence results.

B.8.1 Proof of Lemma 1

We focus on the semidefiniteness statement. Decompose $B = B^{1/2}B^{1/2*}$ and define $\tilde b = B^{-1/2}b$. The statement of the lemma is equivalent to the statement that $I_n - x^{-1}\tilde b\tilde b^*$ is positive semidefinite if and only if $x \ge \tilde b^*\tilde b$. Let $\nu$ be an arbitrary $n$-dimensional complex vector satisfying $\nu^*\nu = 1$. Then

$$\nu^*\left(I_n - x^{-1}\tilde b\tilde b^*\right)\nu = 1 - \frac{\tilde b^*\tilde b}{x}\cos^2\!\left(\theta(\nu,\tilde b)\right),$$

where $\theta(\nu,\tilde b)$ is the angle between $\nu$ and $\tilde b$. Evidently, this expression is nonnegative for all such $\nu$ if and only if $x^{-1}\tilde b^*\tilde b \le 1$.

B.8.2 Auxiliary lemma for proof of Proposition B.1

Lemma B.1.

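Let $x_t$ and $\tilde x_t$ be two stationary $n$-dimensional Gaussian time series whose spectral densities $s_x(\omega)$ and $s_{\tilde x}(\omega)$ are such that $s_{\tilde x}(\omega) - s_x(\omega)$ is positive semidefinite for all $\omega\in[0,\pi]$. Then

$$\operatorname{Var}(\mu'x_{t+\ell} \mid \{x_\tau\}_{-\infty<\tau\le t}) \le \operatorname{Var}(\mu'\tilde x_{t+\ell} \mid \{\tilde x_\tau\}_{-\infty<\tau\le t})$$

for all $\ell = 1, 2, \dots$ and all constant vectors $\mu\in\mathbb{R}^n$.

Proof. We may define an $n$-dimensional stationary Gaussian process $\nu_t$ with spectral density $s_\nu(\omega) = s_{\tilde x}(\omega) - s_x(\omega)$, $\omega\in[0,\pi]$, and such that the $\nu_t$ process is independent of the $x_t$ process. Then the process $\check x_t = x_t + \nu_t$ has the same distribution as the $\tilde x_t$ process. Hence,

$$\begin{aligned}
\operatorname{Var}(\mu'\tilde x_{t+\ell} \mid \{\tilde x_\tau\}_{-\infty<\tau\le t}) &= \operatorname{Var}(\mu'\check x_{t+\ell} \mid \{\check x_\tau\}_{-\infty<\tau\le t}) \\
&\ge \operatorname{Var}(\mu'\check x_{t+\ell} \mid \{x_\tau, \nu_\tau\}_{-\infty<\tau\le t}) \\
&= \operatorname{Var}(\mu'x_{t+\ell} \mid \{x_\tau, \nu_\tau\}_{-\infty<\tau\le t}) + \operatorname{Var}(\mu'\nu_{t+\ell} \mid \{x_\tau, \nu_\tau\}_{-\infty<\tau\le t}) \\
&\ge \operatorname{Var}(\mu'x_{t+\ell} \mid \{x_\tau, \nu_\tau\}_{-\infty<\tau\le t}) \\
&= \operatorname{Var}(\mu'x_{t+\ell} \mid \{x_\tau\}_{-\infty<\tau\le t}).
\end{aligned}$$

Here we used that independence of the $x_t$ and $\nu_t$ processes implies that $x_{t+\ell}$ and $\nu_{t+\ell}$ are independent also conditional on $\{x_\tau, \nu_\tau\}_{-\infty<\tau\le t}$.

B.8.3 Proof of Proposition B.1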

The proof proceeds in two steps. First, for a given known $\alpha$, we show that $\text{FVD}_{i,\ell}$ is sharply bounded above by 1 and below by (B.4). Second, we show that the lower bound is monotonically decreasing in $\alpha$, so that the overall lower bound is attained by $\alpha_{UB}$.

1. Given $\alpha\in(\alpha_{LB}, \alpha_{UB}]$, the numerator of $\text{FVD}_{i,\ell}$ is point-identified (see below), so we need only concern ourselves with the denominator. We can write the denominator as

$$\operatorname{Var}(y_{i,t+\ell} \mid \{\varepsilon_\tau\}_{-\infty<\tau\le t}) = \sum_{m=0}^{\ell-1}\Theta_{i,1,m}^2 + \sum_{j=2}^{n_\varepsilon}\sum_{m=0}^{\ell-1}\Theta_{i,j,m}^2 = \frac{1}{\alpha^2}\sum_{m=0}^{\ell-1}\operatorname{Cov}(y_{i,t}, \tilde z_{t-m})^2 + \sum_{j=2}^{n_\varepsilon}\sum_{m=0}^{\ell-1}\Theta_{i,j,m}^2. \tag{B.16}$$

Given $\alpha$, the first term in (B.16) is point-identified (note that it equals the numerator of the FVD), while the second is not. To upper-bound $\text{FVD}_{i,\ell}$, we seek to make that second term as small as possible. In fact, we can always set it to 0. To see this, let $\{\Theta_{\bullet,j,m}\}_{2\le j\le n_\varepsilon,\,0\le m<\infty}$ denote some sequence of impulse responses for the structural shocks $j \neq 1$ that is consistent with the second-moment properties of the data. Since $\alpha\in(\alpha_{LB}, \alpha_{UB}]$, such a sequence exists by Proposition 1. Now, for a given forecast horizon $\ell$, instead consider the new sequence $\{\breve\Theta_{\bullet,j,m}\}_{2\le j\le n_\varepsilon,\,0\le m<\infty}$, defined via

$$\breve\Theta_{\bullet,j,m} = \begin{cases} 0_{n_y\times 1} & \text{if } m \le \ell-1, \\ \Theta_{\bullet,j,m-\ell} & \text{if } m > \ell-1. \end{cases}$$

Then the stochastic process induced by $\{\breve\Theta_{\bullet,j,m}\}$ has the exact same second-moment properties as the (by assumption admissible) stochastic process induced by $\{\Theta_{\bullet,j,m}\}$. However, by construction, we now have $\text{FVD}_{i,\ell} = 1$, as claimed.

For the lower bound, we want to make the second term in (B.16) as large as possible. Given a known $\alpha\in(\alpha_{LB}, \alpha_{UB}]$, define

$$\tilde y^{(\alpha)}_t = (\tilde y^{(\alpha)}_{1,t}, \dots, \tilde y^{(\alpha)}_{n_y,t})' \equiv y_t - \frac{1}{\alpha}\sum_{\ell=0}^{\infty}\operatorname{Cov}(y_t, \tilde z_{t-\ell})\,\varepsilon_{1,t-\ell} = \sum_{j=2}^{n_\varepsilon}\sum_{\ell=0}^{\infty}\Theta_{\bullet,j,\ell}\,\varepsilon_{j,t-\ell}.$$

Then

$$\operatorname{Var}(\tilde y^{(\alpha)}_{i,t+\ell} \mid \{\tilde y^{(\alpha)}_\tau\}_{-\infty<\tau\le t}) \ge \operatorname{Var}(\tilde y^{(\alpha)}_{i,t+\ell} \mid \{\varepsilon_{j,\tau}\}_{2\le j\le n_\varepsilon,\,-\infty<\tau\le t}) = \sum_{j=2}^{n_\varepsilon}\sum_{m=0}^{\ell-1}\Theta_{i,j,m}^2,$$

so the second term in (B.16) has a point-identified upper bound. Thus, given $\alpha$, $\text{FVD}_{i,\ell}$ is bounded below by the expression (B.4).

We now argue that the lower bound (B.4) is attained by an admissible model with the given $\alpha$. To that end, consider the Wold decomposition $\tilde y^{(\alpha)}_t = \sum_{\ell=0}^{\infty}\tilde\Theta_\ell\,\tilde\varepsilon_{t-\ell}$, where the $\tilde\Theta_\ell$ matrices are $n_y\times n_y$, and $\tilde\varepsilon_t$ is $n_y$-dimensional i.i.d. standard normal and spanned by $\{\tilde y^{(\alpha)}_\tau\}_{-\infty<\tau\le t}$. [fn. B.16]

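Then $\operatorname{Var}(\tilde y^{(\alpha)}_{i,t+\ell} \mid \{\tilde y^{(\alpha)}_\tau\}_{-\infty<\tau\le t}) = \sum_{j=1}^{n_y}\sum_{m=0}^{\ell-1}\tilde\Theta_{i,j,m}^2$, so the following model attains the lower bound (B.4) and is consistent with the given spectrum $s_w(\cdot)$:

$$y_t = \frac{1}{\alpha}\sum_{\ell=0}^{\infty}\operatorname{Cov}(y_t, \tilde z_{t-\ell})\,\varepsilon_{1,t-\ell} + \sum_{\ell=0}^{\infty}\tilde\Theta_\ell\,\tilde\varepsilon_{t-\ell}, \qquad \tilde z_t = \alpha\varepsilon_{1,t} + \sqrt{\operatorname{Var}(\tilde z_t) - \alpha^2}\times v_t, \tag{B.17}$$

$$(\varepsilon_{1,t}, \tilde\varepsilon_t', v_t)' \overset{\text{i.i.d.}}{\sim} N(0, I_{n_y+2}).$$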

2. Lemma B.1 implies that $\operatorname{Var}(\tilde y^{(\alpha)}_{i,t+\ell} \mid \{\tilde y^{(\alpha)}_\tau\}_{-\infty<\tau\le t})$ is increasing in $\alpha$. Hence, the expression (B.4) is decreasing in $\alpha$, as claimed. At $\alpha = \alpha_{UB}$, the representation (B.17) has $\tilde z_t = \alpha_{UB}\varepsilon_{1,t}$, so we can represent $\tilde y^{(\alpha_{UB})}_t = y_t - E(y_t \mid \{\varepsilon_{1,\tau}\}_{-\infty<\tau\le t}) = y_t - E(y_t \mid \{\tilde z_\tau\}_{-\infty<\tau\le t})$.

B.8.4 Proof of Proposition B.2

The "only if" part was proved already in the text of Appendix B.2. For the "if" part, assume that the cross-spectrum has the given factor structure. Since $\tilde z_t$ is serially uncorrelated, we can write $s_{\tilde z}(\cdot)=s_{\tilde z}$. Because $s_w(\omega)$ is positive definite, the Schur complement
$$s_{\tilde z} - s_{y\tilde z}(\omega)^*s_y(\omega)^{-1}s_{y\tilde z}(\omega) = s_{\tilde z} - \eta\,\zeta(\omega)^*s_y(\omega)^{-1}\zeta(\omega)\,\eta'$$
is also positive definite. Pre-multiplying the above expression by $\eta's_{\tilde z}^{-1}$, post-multiplying by $s_{\tilde z}^{-1}\eta$, and rearranging the positive definiteness condition, we obtain the implication that
$$2\pi\,\zeta(\omega)^*s_y(\omega)^{-1}\zeta(\omega) < \frac{2\pi}{\eta's_{\tilde z}^{-1}\eta}, \qquad \omega\in[0,\pi].$$
Now choose any $\alpha\ge 0$ such that $\alpha^2$ lies strictly between the left- and right-hand sides in the above inequality. The matrix $\Sigma_v\equiv 2\pi s_{\tilde z} - \alpha^2\eta\eta'$ is then positive definite by Lemma 1. Moreover, the same lemma implies that $s_y(\omega) - \frac{2\pi}{\alpha^2}\zeta(\omega)\zeta(\omega)^*$ is positive definite for all $\omega\in[0,\pi]$. If we set $\Theta_{\bullet,1}(L) = (2\pi/\alpha)\zeta(L)$, the same arguments as in the proof of Proposition 1 show that there exists an $n_y\times n_y$ matrix polynomial $\tilde\Theta(L)$ such that the following model achieves the desired spectrum $s_w(\omega)$:
$$y_t = \Theta_{\bullet,1}(L)\varepsilon_{1,t} + \tilde\Theta(L)\tilde\varepsilon_t, \qquad \tilde z_t = \alpha\eta\varepsilon_{1,t} + \Sigma_v^{1/2}v_t, \qquad (\varepsilon_{1,t},\tilde\varepsilon_t',v_t')' \overset{\text{i.i.d.}}{\sim} N(0,I_{n_y+n_z+1}).$$
Note that $\eta$ assumes the role of $\lambda$.

Footnote B.16: Since $\alpha>\alpha_{\mathrm{LB}}$, the Wold decomposition has no deterministic term, cf. the proof of Proposition 1.

B.8.5 Proof of Proposition B.3
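The proof below rests on a few algebraic identities: with $\gamma = \Sigma_u^{-1}\Sigma_{u\tilde z}/\sqrt{\Sigma_{u\tilde z}'\Sigma_u^{-1}\Sigma_{u\tilde z}}$ and $\Sigma_{u\tilde z}=\alpha M_{\bullet,1,0}$, one gets $\mathrm{Var}(\gamma'u_t)=1$, $\sum_{j,\ell}a_{j,\ell}^2=1$, $a_{1,0}=\sqrt{R^2}$ and $\Sigma_u\gamma = M_{\bullet,1,0}/\sqrt{R^2}$. A minimal NumPy check, assuming finitely many hypothetical, randomly drawn matrices $M_\ell$ and unit-variance white-noise shocks; only contemporaneous second moments enter these identities, so the sketch does not impose that $u_t$ is serially uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_y, n_eps, L = 3, 4, 3          # hypothetical dimensions and number of MA terms of u_t
alpha = 0.7                      # hypothetical instrument loading
M = rng.standard_normal((L, n_y, n_eps))   # M[l] plays the role of M_l

Sigma_u = sum(M[l] @ M[l].T for l in range(L))   # Var(u_t) when eps_t ~ (0, I)
m10 = M[0][:, 0]                                 # M_{.,1,0}
Sigma_uz = alpha * m10                           # Cov(u_t, z_t) = alpha * M_{.,1,0}

w = np.linalg.solve(Sigma_u, Sigma_uz)           # Sigma_u^{-1} Sigma_uz
gamma = w / np.sqrt(Sigma_uz @ w)
R2 = m10 @ np.linalg.solve(Sigma_u, m10)         # R^2 = M' Sigma_u^{-1} M

# a[l, j] = gamma' M_{.,j+1,l}: loadings of the recovered shock on eps_{j+1, t-l}
a = np.einsum("i,lij->lj", gamma, M)

print(np.isclose(gamma @ Sigma_u @ gamma, 1.0))          # Var(gamma' u_t) = 1
print(np.isclose((a ** 2).sum(), 1.0))                   # squared loadings sum to 1
print(np.isclose(a[0, 0], np.sqrt(R2)))                  # a_{1,0} = sqrt(R^2)
print(np.allclose(Sigma_u @ gamma, m10 / np.sqrt(R2)))   # Sigma_u gamma = M_{.,1,0} / sqrt(R^2)
```

Note that $\alpha$ cancels from $\gamma$, which is why the identities do not depend on the (unknown) instrument strength.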

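According to the model (1), we can write $u_t = \sum_{\ell=0}^\infty M_\ell\,\varepsilon_{t-\ell}$, for some $n_y\times n_\varepsilon$ matrices $\{M_\ell\}$. Let $M_{\bullet,j,\ell}$ denote the $j$-th column of $M_\ell$. Then
$$\tilde\varepsilon_{1,t} = \gamma'u_t = \sum_{j=1}^{n_\varepsilon}\sum_{\ell=0}^\infty a_{j,\ell}\,\varepsilon_{j,t-\ell}, \qquad a_{j,\ell} = \gamma'M_{\bullet,j,\ell}.$$
We have $\mathrm{Var}(\tilde\varepsilon_{1,t})=1$ by construction of $\gamma$, so $\sum_{j=1}^{n_\varepsilon}\sum_{\ell=0}^\infty a_{j,\ell}^2 = 1$. The expression for $\tilde\Theta_{\bullet,1,\ell}$ in the proposition also immediately follows from the above display and the fact $\mathrm{Cov}(y_t,\varepsilon_{j,t-\ell}) = \Theta_{\bullet,j,\ell}$. Next, observe that
$$\begin{aligned} R^2 &= \mathrm{Var}(E(\varepsilon_{1,t}\mid\{y_\tau\}_{-\infty<\tau\le t}))\\ &= \mathrm{Var}(E(\varepsilon_{1,t}\mid\{u_\tau\}_{-\infty<\tau\le t}))\\ &= \mathrm{Var}(E(\varepsilon_{1,t}\mid u_t))\\ &= \mathrm{Cov}(u_t,\varepsilon_{1,t})'\,\Sigma_u^{-1}\,\mathrm{Cov}(u_t,\varepsilon_{1,t})\\ &= M_{\bullet,1,0}'\Sigma_u^{-1}M_{\bullet,1,0}. \end{aligned}$$
Since $\Sigma_{u\tilde z} = \sum_{\ell=0}^\infty M_\ell\,\mathrm{Cov}(\varepsilon_{t-\ell},\tilde z_t) = \alpha M_{\bullet,1,0}$, we therefore have
$$\gamma = \frac{1}{\sqrt{\Sigma_{u\tilde z}'\Sigma_u^{-1}\Sigma_{u\tilde z}}}\,\Sigma_u^{-1}\Sigma_{u\tilde z} = \frac{1}{\sqrt{M_{\bullet,1,0}'\Sigma_u^{-1}M_{\bullet,1,0}}}\,\Sigma_u^{-1}M_{\bullet,1,0} = \frac{1}{\sqrt{R^2}}\,\Sigma_u^{-1}M_{\bullet,1,0}.$$
This implies $a_{1,0} = \gamma'M_{\bullet,1,0} = \sqrt{M_{\bullet,1,0}'\Sigma_u^{-1}M_{\bullet,1,0}} = \sqrt{R^2}$. Finally,
$$\tilde\Theta_{\bullet,1,0} = \mathrm{Cov}(y_t,\tilde\varepsilon_{1,t}) = \mathrm{Cov}(y_t,u_t)\gamma = \mathrm{Cov}(u_t,u_t)\gamma = \Sigma_u\gamma = \frac{1}{\sqrt{R^2}}M_{\bullet,1,0},$$
and $M_{\bullet,1,0} = \mathrm{Cov}(u_t,\varepsilon_{1,t}) = \mathrm{Cov}(y_t,\varepsilon_{1,t}) = \Theta_{\bullet,1,0}$.

B.8.6 Auxiliary lemmas for sieve VAR results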
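The lemmas below concern the least-squares sieve VAR objects $\hat\beta(p)$, $\hat\Sigma(p)$, $\hat\Gamma(p)$ and the trigonometric sums $A_{\cos}$, $A_{\sin}$. The following NumPy sketch computes sample analogues of these objects under assumptions that are ours rather than stated in this section: $X_t(p)$ stacks the lags $W_{t-1},\dots,W_{t-p}$, the VAR is fitted by OLS without an intercept, and $\hat\Gamma(p)$ and $\hat\Sigma(p)$ are sample second moments normalized by $T-p$. The simulated VAR(1) data are purely illustrative.

```python
import numpy as np

def fit_sieve_var(W, p):
    """OLS VAR(p) without intercept on data W of shape (T, n).

    Returns beta_hat = (A_1, ..., A_p) as an n x (n p) matrix, Sigma_hat (n x n),
    Gamma_hat (n p x n p) and the residuals (T - p x n)."""
    T, n = W.shape
    # Row s of X is X_t(p)' = (W_{t-1}', ..., W_{t-p}') for t = p + s (0-based time).
    X = np.hstack([W[p - l : T - l] for l in range(1, p + 1)])
    Y = W[p:]
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
    resid = Y - X @ beta_hat.T
    Sigma_hat = resid.T @ resid / (T - p)
    Gamma_hat = X.T @ X / (T - p)
    return beta_hat, Sigma_hat, Gamma_hat, resid

def a_cos_sin(beta, omega):
    """A_cos(omega; p) = sum_l A_l cos(omega l) and A_sin(omega; p) = sum_l A_l sin(omega l)."""
    n = beta.shape[0]
    p = beta.shape[1] // n
    blocks = [beta[:, (l - 1) * n : l * n] for l in range(1, p + 1)]
    a_cos = sum(A_l * np.cos(omega * l) for l, A_l in enumerate(blocks, start=1))
    a_sin = sum(A_l * np.sin(omega * l) for l, A_l in enumerate(blocks, start=1))
    return a_cos, a_sin

# Illustrative data: a bivariate VAR(1) with standard normal innovations.
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
T = 5_000
W = np.zeros((T, 2))
for t in range(1, T):
    W[t] = A1 @ W[t - 1] + rng.standard_normal(2)

beta_hat, Sigma_hat, Gamma_hat, resid = fit_sieve_var(W, p=4)
a_cos, a_sin = a_cos_sin(beta_hat, omega=0.0)
```

Calling `a_cos_sin(beta_hat, omega)` at any frequency gives the estimated trigonometric sums; at $\omega=0$ the cosine sum reduces to $\sum_\ell\hat A_\ell$ and the sine sum vanishes.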

Here we define notation and state auxiliary lemmas used to prove the propositions in Appendix B.7. The lemmas are proved below. For any matrix $B$, let $\|B\|_2$ denote the largest singular value of $B$. Recall that $\|B\|_2\le\|B\|$ and $\|BC\|\le\|B\|_2\|C\|$ for conformable matrices $B$ and $C$. Let $e_t(p)\equiv W_t-\beta(p)X_t(p)$ for all $t$ and $p$. Finally, define
$$A_{\cos}(\omega;p)\equiv\sum_{\ell=1}^p A_\ell\cos(\omega\ell), \qquad A_{\sin}(\omega;p)\equiv\sum_{\ell=1}^p A_\ell\sin(\omega\ell), \qquad \omega\in[0,\pi],\ p\in\mathbb N.$$

Lemma B.2 (Lewis & Reinsel, 1985, p. 397). Let Assumptions B.1 and B.3 hold. Then $E(\|\hat\Gamma(p_T)-\Gamma(p_T)\|^2) = O(p_T^2/T)$.

Lemma B.3. Let Assumptions B.1 and B.3 hold. Then $\|\hat\beta(p_T)-\beta(p_T)\| = O_p((p_T/T)^{1/2})$.

Lemma B.4.

Let Assumptions B.1 and B.3 hold. Then $\hat\Sigma(p_T) - (T-p_T)^{-1}\sum_{t=p_T+1}^T e_te_t' = o_p(T^{-1/2})$.

Lemma B.5 (Lewis & Reinsel, 1985, Thm. 2). Let Assumptions B.1 and B.3 hold. Let $\tilde\nu_T\in\mathbb R^{n_W^2p_T}$ be a deterministic sequence of vectors such that $\|\tilde\nu_T\|\le M<\infty$ for all $T$. Define
$$\zeta_T\equiv(T-p_T)^{-1/2}\sum_{t=p_T+1}^T\tilde\nu_T'\big(\Gamma(p_T)^{-1}X_t(p_T)\otimes e_t\big).$$
Then $(T-p_T)^{1/2}\,\tilde\nu_T'\operatorname{vec}(\hat\beta(p_T)-\beta(p_T)) - \zeta_T \overset{p}{\to} 0$.

Lemma B.6.

Let Assumption B.1 hold. Then for all $j_1,j_2,j_3,j_4\in\{1,2,\dots,n_W\}$, all $p,T\in\mathbb N$ such that $p<T$, and all $m_1,m_2,m_3,m_4\in\mathbb Z$ we have
$$\frac{1}{T-p}\sum_{t=p+1}^T\sum_{s=p+1}^T\big|\mathrm{Cov}(e_{j_1,t+m_1}e_{j_2,t+m_2}e_{j_3,t}e_{j_4,t},\ e_{j_1,s+m_3}e_{j_2,s+m_4}e_{j_3,s}e_{j_4,s})\big| \le 9\,E\|e_t\|^8.$$

Lemma B.7.

Let Assumptions B.1 and B.3 hold. Then
$$\Big\|\frac{1}{T-p_T}\sum_{t=p_T+1}^T\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_tX_t(p_T)')' - E[\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_tX_t(p_T)')']\Big\|^2 = O_p(p_T^2/T),$$
and
$$\Big\|\frac{1}{T-p_T}\sum_{t=p_T+1}^T\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_te_t'-\Sigma)'\Big\|^2 = O_p(p_T/T).$$

Lemma B.8.

Let Assumptions B.1 and B.3 hold. Define a sequence $\tilde\nu_T$ as in Lemma B.5, and assume $v_\zeta\equiv\lim_{T\to\infty}\tilde\nu_T'(\Gamma(p_T)^{-1}\otimes\Sigma)\tilde\nu_T$ exists. Then
$$(T-p_T)^{1/2}\,\tilde\nu_T'\operatorname{vec}(\hat\beta(p_T)-\beta(p_T)) \overset{d}{\to} N(0,v_\zeta), \qquad (T-p_T)^{1/2}\operatorname{vec}(\hat\Sigma(p_T)-\Sigma) \overset{d}{\to} N(0,\mathrm{Var}(e_t\otimes e_t)),$$
and these two random vectors are asymptotically independent.

Lemma B.9. Let Assumptions B.1 and B.3 hold. Then
$$\sup_{\omega\in[0,\pi]}\Big(\|\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega;p_T)\|^2 + \|\hat A_{\sin}(\omega;p_T)-A_{\sin}(\omega;p_T)\|^2\Big) = O_p(p_T/T).$$

Lemma B.10.

Let Assumptions B.1 and B.3 hold. For $M>0$, define $\mathcal A_M\equiv\{(B_1,B_2)\in\mathcal A_\delta\times\mathbb R^{n_W\times n_W} : \|B_j-\sum_{\ell=1}^\infty A_\ell\|\le M,\ j=1,2\}$ and $\mathcal S_M = \{\tilde\Sigma\in\mathcal S_{n_W} : \|\tilde\Sigma-\Sigma\|\le M\}$. Then there exists an $M<\infty$ such that
$$P\Big((\hat A_{\cos}(\omega;p_T),\hat A_{\sin}(\omega;p_T))\in\mathcal A_M\text{ for all }\omega\in[0,\pi],\ \hat\Sigma(p_T)\in\mathcal S_M\Big)\to 1.$$

Lemma B.11.

Let Assumptions B.1 to B.3 hold. Define $\nu_T$ and $\xi$ as in Appendix B.7.2. Then
$$(T-p_T)^{1/2}\Big\{(\hat\psi(p_T)-\psi) - \nu_T'\operatorname{vec}(\hat\beta(p_T)-\beta(p_T)) - \xi'\operatorname{vec}(\hat\Sigma-\Sigma)\Big\} \overset{p}{\to} 0.$$

B.8.7 Proof of Lemma B.3

The result follows almost directly from the proof of Thm. 1 in Lewis & Reinsel (1985). As in that proof, define
$$U_{1,T}\equiv\frac{1}{T-p_T}\sum_{t=p_T+1}^T(e_t(p_T)-e_t)X_t(p_T)', \qquad U_{2,T}\equiv\frac{1}{T-p_T}\sum_{t=p_T+1}^T e_tX_t(p_T)'.$$
Lewis & Reinsel's arguments show that $\|U_{1,T}\| = O_p\big(p_T^{1/2}\sum_{\ell=p_T+1}^\infty\|A_\ell\|\big) = o_p((p_T/T)^{1/2})$ and $\|U_{2,T}\| = O_p((p_T/T)^{1/2})$ under Assumptions B.1 and B.3. The rest of the arguments in Lewis & Reinsel's proof then yield the desired convergence rate of $\hat\beta(p_T)$.

B.8.8 Proof of Lemma B.4

Recall the notation $U_{1,T}$ and $U_{2,T}$ in the proof of Lemma B.3. Since
$$\begin{aligned}\hat\Sigma &= \frac{1}{T-p_T}\sum_{t=p_T+1}^T e_te_t' + \frac{1}{T-p_T}\sum_{t=p_T+1}^T(\hat e_t-e_t)e_t' + \frac{1}{T-p_T}\sum_{t=p_T+1}^T e_t(\hat e_t-e_t)' + \frac{1}{T-p_T}\sum_{t=p_T+1}^T(\hat e_t-e_t)(\hat e_t-e_t)'\\ &\equiv \frac{1}{T-p_T}\sum_{t=p_T+1}^T e_te_t' + R_{1,T} + R_{1,T}' + R_{2,T},\end{aligned}$$
we need to show $R_{1,T} = o_p(T^{-1/2})$ and $R_{2,T} = o_p(T^{-1/2})$.

Decompose $R_{1,T}$ as
$$R_{1,T} = \frac{1}{T-p_T}\sum_{t=p_T+1}^T(\hat e_t-e_t(p_T))e_t' + \frac{1}{T-p_T}\sum_{t=p_T+1}^T(e_t(p_T)-e_t)e_t' \equiv \tilde R_{1,T} + \tilde R_{2,T}.$$
Since $\hat e_t(p_T)-e_t(p_T) = (\beta(p_T)-\hat\beta(p_T))X_t(p_T)$, we have
$$\|\tilde R_{1,T}\| \le \|\hat\beta(p_T)-\beta(p_T)\|\,\|U_{2,T}\| = O_p((p_T/T)^{1/2})\,O_p((p_T/T)^{1/2}) = o_p(T^{-1/2}).$$
Moreover, since $e_t(p_T)-e_t = \sum_{\ell=p_T+1}^\infty A_\ell W_{t-\ell}$,
$$E\|\tilde R_{2,T}\| \le \frac{1}{T-p_T}\sum_{t=p_T+1}^T\sum_{\ell=p_T+1}^\infty\|A_\ell\|\,E(\|W_{t-\ell}e_t'\|) \le \sum_{\ell=p_T+1}^\infty\|A_\ell\|\,(E\|W_t\|^2E\|e_t\|^2)^{1/2} = \text{constant}\times\sum_{\ell=p_T+1}^\infty\|A_\ell\| = o(T^{-1/2}).$$
Now decompose $R_{2,T}$ as
$$\begin{aligned}\frac{1}{T-p_T}\sum_{t=p_T+1}^T(\hat e_t-e_t)(\hat e_t-e_t)' &= \frac{1}{T-p_T}\sum_{t=p_T+1}^T(\hat e_t-e_t(p_T))(\hat e_t-e_t(p_T))' + \frac{1}{T-p_T}\sum_{t=p_T+1}^T(\hat e_t-e_t(p_T))(e_t(p_T)-e_t)'\\ &\quad+ \frac{1}{T-p_T}\sum_{t=p_T+1}^T(e_t(p_T)-e_t)(\hat e_t-e_t(p_T))' + \frac{1}{T-p_T}\sum_{t=p_T+1}^T(e_t(p_T)-e_t)(e_t(p_T)-e_t)'\\ &\equiv \hat R_{1,T} + \hat R_{2,T} + \hat R_{2,T}' + \hat R_{3,T}.\end{aligned}$$
We have
$$\|\hat R_{1,T}\| \le \|\hat\beta(p_T)-\beta(p_T)\|^2\,\|\hat\Gamma(p_T)\|_2 \le \|\hat\beta(p_T)-\beta(p_T)\|^2\big(\|\hat\Gamma(p_T)-\Gamma(p_T)\| + \|\Gamma(p_T)\|_2\big) = O_p(p_T/T),$$
using Lemma B.2 and Lemma B.3. Further,
$$\|\hat R_{2,T}\| \le \|\hat\beta(p_T)-\beta(p_T)\|\,\|U_{1,T}\| = O_p((p_T/T)^{1/2})\,o_p((p_T/T)^{1/2}) = o_p(T^{-1/2}).$$
Finally,
$$E\|\hat R_{3,T}\| \le E\|e_t(p_T)-e_t\|^2 \le \sum_{\ell=p_T+1}^\infty\sum_{m=p_T+1}^\infty\|A_\ell\|\,\|A_m\|\,E(\|W_{t-\ell}\|\,\|W_{t-m}\|) \le \text{constant}\times\Big(\sum_{\ell=p_T+1}^\infty\|A_\ell\|\Big)^2 = o(T^{-1}).$$

B.8.9 Proof of Lemma B.6
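The counting step in the proof that follows can be made explicit: a lag $\ell$ can contribute only if some element of $\{\ell+m_1,\ell+m_2,\ell\}$ equals some element of $\{m_3,m_4,0\}$, and each of the $3\times 3$ such pairings pins down at most one $\ell$. A brute-force Python check over a small, arbitrarily chosen grid of offsets (a sketch, not part of the original argument):

```python
import itertools

def overlapping_lags(m1, m2, m3, m4, max_lag=50):
    """Lags l for which {l + m1, l + m2, l} and {m3, m4, 0} share an element."""
    targets = {m3, m4, 0}
    return [l for l in range(-max_lag, max_lag + 1) if {l + m1, l + m2, l} & targets]

worst = max(
    len(overlapping_lags(*ms)) for ms in itertools.product(range(-4, 5), repeat=4)
)
print(worst)  # 9, attained e.g. by (m1, m2, m3, m4) = (1, 2, 3, -3)
```

The search window `max_lag=50` comfortably contains every solution for offsets in $\{-4,\dots,4\}$, since any matching lag satisfies $|\ell|\le 8$.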

By stationarity,
$$\begin{aligned}&\frac{1}{T-p}\sum_{t=p+1}^T\sum_{s=p+1}^T\big|\mathrm{Cov}(e_{j_1,t+m_1}e_{j_2,t+m_2}e_{j_3,t}e_{j_4,t},\ e_{j_1,s+m_3}e_{j_2,s+m_4}e_{j_3,s}e_{j_4,s})\big|\\ &\quad= \sum_{\ell=-(T-p-1)}^{T-p-1}\Big(1-\frac{|\ell|}{T-p}\Big)\big|\mathrm{Cov}(e_{j_1,\ell+m_1}e_{j_2,\ell+m_2}e_{j_3,\ell}e_{j_4,\ell},\ e_{j_1,m_3}e_{j_2,m_4}e_{j_3,0}e_{j_4,0})\big|.\end{aligned}\tag{B.18}$$
We first argue that each term in the sum (B.18) is bounded. This follows from Cauchy-Schwarz:
$$\big|\mathrm{Cov}(e_{j_1,\ell+m_1}e_{j_2,\ell+m_2}e_{j_3,\ell}e_{j_4,\ell},\ e_{j_1,m_3}e_{j_2,m_4}e_{j_3,0}e_{j_4,0})\big| \le \big(\mathrm{Var}(e_{j_1,\ell+m_1}e_{j_2,\ell+m_2}e_{j_3,\ell}e_{j_4,\ell})\,\mathrm{Var}(e_{j_1,m_3}e_{j_2,m_4}e_{j_3,0}e_{j_4,0})\big)^{1/2} \le \max_{1\le j\le n_W}E(e_{j,t}^8) \le E\|e_t\|^8.$$
Next, we show that at most 9 of the terms in the sum (B.18) are nonzero. Consider the term corresponding to a given index $\ell$ in the sum. For the covariance in the term to be nonzero, it must be the case that $\{\ell+m_1,\ell+m_2,\ell\}\cap\{m_3,m_4,0\}\neq\emptyset$ (otherwise the two variables in the covariance would be independent). At most 9 values of $\ell$ have this property. Putting the preceding two results together, we obtain the statement of the lemma.

B.8.10 Proof of Lemma B.7

We first remark that Assumption B.1 implies $\{W_t\}$ is a strictly non-deterministic time series with Wold innovation $e_t$. Thus, the Wold representation $W_t = B(L)e_t$ has $B(L) = \sum_{\ell=0}^\infty B_\ell L^\ell = A(L)^{-1}$, and so for fixed $i,j$, the elements $B_{i,j,\ell}$ of $B_\ell$ are absolutely summable across $\ell$ (Brockwell & Davis, 1991, p. 418).

Define the $n_W^2p_T\times n_W^2p_T$ matrix
$$R_{1,T}\equiv\frac{1}{T-p_T}\sum_{t=p_T+1}^T\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_tX_t(p_T)')' - E[\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_tX_t(p_T)')']$$
with elements $R_{1,T,i,j}$. Then $\|R_{1,T}\|^2 = \sum_{i,j=1}^{n_W^2p_T}R_{1,T,i,j}^2$, and the first statement of the lemma follows if we can show that $E(R_{1,T,i,j}^2) = O(T^{-1})$ uniformly in $i,j$. Since $E(R_{1,T,i,j})=0$ for all $i,j$, we need to show that $\mathrm{Var}(R_{1,T,i,j}) = O(T^{-1})$ uniformly in $i,j$. The typical element $R_{1,T,i,j}$ has the form
$$\frac{1}{T-p_T}\sum_{t=p_T+1}^T e_{j_1,t}W_{j_2,t-m_1}e_{j_3,t}W_{j_4,t-m_2} - E[e_{j_1,t}W_{j_2,t-m_1}e_{j_3,t}W_{j_4,t-m_2}]$$
for appropriate $j_1,j_2,j_3,j_4,m_1,m_2\in\mathbb N$. Here $W_{j,t}$ is the $j$-th element of $W_t$, and similarly for $e_t$. The variance of the above expression is given by
$$\frac{1}{(T-p_T)^2}\sum_{t=p_T+1}^T\sum_{s=p_T+1}^T\mathrm{Cov}(e_{j_1,t}W_{j_2,t-m_1}e_{j_3,t}W_{j_4,t-m_2},\ e_{j_1,s}W_{j_2,s-m_1}e_{j_3,s}W_{j_4,s-m_2}). \tag{B.19}$$
Using the above-mentioned Wold decomposition of $\{W_t\}$, we can write $W_{j_2,t-m_1} = \sum_{b=1}^{n_W}\sum_{\ell=0}^\infty B_{j_2,b,\ell}\,e_{b,t-m_1-\ell}$, and similarly for $W_{j_4,t-m_2}$, so that (B.19) equals
$$\frac{1}{T-p_T}\sum_{\ell_1,\ell_2,\ell_3,\ell_4=0}^\infty\sum_{b_1,b_2,b_3,b_4=1}^{n_W}B_{j_2,b_1,\ell_1}B_{j_2,b_2,\ell_2}B_{j_4,b_3,\ell_3}B_{j_4,b_4,\ell_4}\times\frac{1}{T-p_T}\sum_{s,t=p_T+1}^T\mathrm{Cov}(e_{j_1,t}e_{b_1,t-m_1-\ell_1}e_{j_3,t}e_{b_3,t-m_2-\ell_3},\ e_{j_1,s}e_{b_2,s-m_1-\ell_2}e_{j_3,s}e_{b_4,s-m_2-\ell_4}).$$
According to Lemma B.6, the above display is bounded by
$$\frac{1}{T-p_T}\sum_{\ell_1,\ell_2,\ell_3,\ell_4=0}^\infty\sum_{b_1,b_2,b_3,b_4=1}^{n_W}|B_{j_2,b_1,\ell_1}B_{j_2,b_2,\ell_2}B_{j_4,b_3,\ell_3}B_{j_4,b_4,\ell_4}|\times 9\,E\|e_t\|^8 = O(T^{-1}), \tag{B.20}$$
where the equality uses the previously mentioned absolute summability of $\{B_\ell\}$. This concludes the proof of the first statement of the lemma.

We prove the second statement of the lemma in a similar fashion.
Define the $n_W^2p_T\times n_W^2$ matrix
$$R_{2,T}\equiv\frac{1}{T-p_T}\sum_{t=p_T+1}^T\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_te_t'-\Sigma)'.$$
Decompose it as
$$R_{2,T} = \frac{1}{T-p_T}\sum_{t=p_T+1}^T\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_te_t')' - \frac{1}{T-p_T}\sum_{t=p_T+1}^T\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(\Sigma)' \equiv \tilde R_{1,T} - \tilde R_{2,T}.$$
Since $\{\operatorname{vec}(e_tX_t(p_T)')\}$ is a serially uncorrelated $(n_W^2p_T)$-dimensional sequence, it is easy to show that $E\|\tilde R_{2,T}\|^2 = O(p_T/T)$. Consider now the matrix $\tilde R_{1,T}$. Its typical element $(T-p_T)^{-1}\sum_{t=p_T+1}^T e_{j_1,t}W_{j_2,t-m}e_{j_3,t}e_{j_4,t}$ has mean zero due to the independence of $e_t$ and $W_{t-m}$ for $m\ge 1$. We need to show that it has variance of order $O(T^{-1})$. Said variance equals
$$\begin{aligned}&\frac{1}{(T-p_T)^2}\sum_{s,t=p_T+1}^T\mathrm{Cov}(e_{j_1,t}W_{j_2,t-m}e_{j_3,t}e_{j_4,t},\ e_{j_1,s}W_{j_2,s-m}e_{j_3,s}e_{j_4,s})\\ &\quad= \frac{1}{T-p_T}\sum_{\ell_1,\ell_2=0}^\infty\sum_{b_1,b_2=1}^{n_W}B_{j_2,b_1,\ell_1}B_{j_2,b_2,\ell_2}\times\frac{1}{T-p_T}\sum_{s,t=p_T+1}^T\mathrm{Cov}(e_{j_1,t}e_{b_1,t-m-\ell_1}e_{j_3,t}e_{j_4,t},\ e_{j_1,s}e_{b_2,s-m-\ell_2}e_{j_3,s}e_{j_4,s}).\end{aligned}$$
This expression is of order $O(T^{-1})$, for the same reason as (B.20) above.

B.8.11 Proof of Lemma B.8

This result is very similar to Thm. 2 in Lewis & Reinsel (1985), with the twist that we here deal also with the convergence of $\hat\Sigma$. Define $v_{\zeta,T}\equiv\tilde\nu_T'(\Gamma(p_T)^{-1}\otimes\Sigma)\tilde\nu_T$ for all $T$. If $v_\zeta\equiv\lim_{T\to\infty}v_{\zeta,T}=0$, it is easy to show that $(T-p_T)^{1/2}\tilde\nu_T'\operatorname{vec}(\hat\beta(p_T)-\beta(p_T)) = o_p(1)$ using Lemma B.5 and a mean-square bound, so in the following we assume $v_\zeta>0$. By Lemma B.5 and the Cramér-Wold device, we need to show that, for any $\lambda\in\mathbb R^{n_W^2}$,
$$\sum_{t=p_T+1}^T J_{t,T}\overset{d}{\to}N(0,1),$$
where we define the triangular array
$$J_{t,T}\equiv\frac{\tilde\nu_T'\big(\Gamma(p_T)^{-1}X_t(p_T)\otimes e_t\big) + \lambda'\operatorname{vec}(e_te_t'-\Sigma)}{(T-p_T)^{1/2}\big(v_{\zeta,T}+\lambda'\mathrm{Var}(e_t\otimes e_t)\lambda\big)^{1/2}}, \qquad t=p_T+1,\dots,T,\quad T\in\mathbb N.$$
Since $\{e_t\}$ is i.i.d., $e_t$ is independent of $X_t(p_T)$, so $\{J_{t,T}\}_{p_T+1\le t\le T}$ is a martingale difference sequence with respect to the filtration generated by $\{e_t\}$. Also, since $E[X_t(p_T)]=0$, we have $E(J_{t,T}^2) = (T-p_T)^{-1}$. The statement of the lemma then follows from Davidson (1994, Thm. 24.3) if we can show
$$\sum_{t=p_T+1}^T J_{t,T}^2\overset{p}{\to}1, \tag{B.21}$$
$$\max_{p_T+1\le t\le T}|J_{t,T}|\overset{p}{\to}0. \tag{B.22}$$
We first prove (B.21), following the univariate argument in Gonçalves & Kilian (2007, pp. 633–636). Decompose
$$\begin{aligned}\sum_{t=p_T+1}^T J_{t,T}^2 - 1 &= \{v_{\zeta,T}+\lambda'\mathrm{Var}(e_t\otimes e_t)\lambda\}^{-1}\Big\{\frac{1}{T-p_T}\sum_{t=p_T+1}^T\Big[\big(\tilde\nu_T'(\Gamma(p_T)^{-1}X_t(p_T)\otimes e_t)\big)^2 - v_{\zeta,T}\Big]\\ &\qquad+ \frac{2}{T-p_T}\sum_{t=p_T+1}^T\tilde\nu_T'(\Gamma(p_T)^{-1}X_t(p_T)\otimes e_t)\operatorname{vec}(e_te_t'-\Sigma)'\lambda\\ &\qquad+ \frac{1}{T-p_T}\sum_{t=p_T+1}^T\Big[\big(\lambda'\operatorname{vec}(e_te_t'-\Sigma)\big)^2 - \lambda'\mathrm{Var}(e_t\otimes e_t)\lambda\Big]\Big\}\\ &\equiv \{v_{\zeta,T}+\lambda'\mathrm{Var}(e_t\otimes e_t)\lambda\}^{-1}\{R_{1,T}+2R_{2,T}+R_{3,T}\}.\end{aligned}$$
The i.i.d. law of large numbers implies that $R_{3,T}=o_p(1)$. We now show that also $R_{1,T}=o_p(1)$ and $R_{2,T}=o_p(1)$. First,
$$\begin{aligned}|R_{1,T}| &= \Big|\tilde\nu_T'(\Gamma(p_T)^{-1}\otimes I_{n_W})\Big\{\frac{1}{T-p_T}\sum_{t=p_T+1}^T\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_tX_t(p_T)')' - E[\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_tX_t(p_T)')']\Big\}(\Gamma(p_T)^{-1}\otimes I_{n_W})\tilde\nu_T\Big|\\ &\le \|\tilde\nu_T\|^2\,\|\Gamma(p_T)^{-1}\|_2^2\times\Big\|\frac{1}{T-p_T}\sum_{t=p_T+1}^T\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_tX_t(p_T)')' - E[\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_tX_t(p_T)')']\Big\| = o_p(1),\end{aligned}$$
where the last line follows from Lemma B.7 and Assumptions B.1 and B.3.
Second, we analogously have
$$|R_{2,T}| \le \|\tilde\nu_T\|\,\|\lambda\|\,\|\Gamma(p_T)^{-1}\|_2\,\Big\|\frac{1}{T-p_T}\sum_{t=p_T+1}^T\operatorname{vec}(e_tX_t(p_T)')\operatorname{vec}(e_te_t'-\Sigma)'\Big\| = o_p(1),$$
again using Lemma B.7 and Assumptions B.1 and B.3. This concludes the proof of (B.21).

To prove (B.22), first note that since $E\|e_t\|^{4+\epsilon}<\infty$ for some $\epsilon>0$, a standard argument for i.i.d. variables gives that $(T-p_T)^{-1/2}\max_{p_T+1\le t\le T}|\lambda'\operatorname{vec}(e_te_t'-\Sigma)| = o_p(1)$. Next, the same calculations as in equation (2.12) in Lewis & Reinsel (1985, p. 401) yield
$$P\Big(\max_{p_T+1\le t\le T}\frac{\big(\tilde\nu_T'(\Gamma(p_T)^{-1}X_t(p_T)\otimes e_t)\big)^2}{T-p_T}\ge\tilde\epsilon\Big) \le \frac{1}{\tilde\epsilon^2}\,\frac{p_T^2}{T-p_T}\,\|\tilde\nu_T\|^4\,\|\Gamma(p_T)^{-1}\|_2^4\,E\|e_t\|^4\,E\|W_t\|^4\to 0$$
for all $\tilde\epsilon>0$. Putting the previous two facts together, we obtain (B.22).
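The normalization of $J_{t,T}$ uses that $\mathrm{Var}\big(\tilde\nu_T'(\Gamma(p_T)^{-1}X_t(p_T)\otimes e_t)\big) = \tilde\nu_T'(\Gamma(p_T)^{-1}\otimes\Sigma)\tilde\nu_T = v_{\zeta,T}$, which follows from the independence of $e_t$ and $X_t(p_T)$ and the mixed-product rule for Kronecker products. The NumPy sketch below checks the identity $x\otimes e = \operatorname{vec}(ex')$ and the variance formula, both exactly and by Monte Carlo; the dimensions, covariance matrices and the vector $\nu$ are hypothetical, and Gaussian draws are used only for convenience.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 4, 2  # hypothetical sizes: X_t(p) has k = n_W * p entries, e_t has n entries

def random_spd(d):
    """A random symmetric positive definite d x d matrix."""
    a = rng.standard_normal((d, d))
    return a @ a.T + d * np.eye(d)

Gamma, Sigma = random_spd(k), random_spd(n)
Gamma_inv = np.linalg.inv(Gamma)
nu = rng.standard_normal(k * n)

# (1) x ⊗ e equals vec(e x') when vec stacks columns (Fortran order).
x, e = rng.standard_normal(k), rng.standard_normal(n)
vec_ok = np.allclose(np.kron(x, e), np.outer(e, x).reshape(-1, order="F"))

# (2) Exact: Cov(Gamma^{-1} X ⊗ e) = (Gamma^{-1} Gamma Gamma^{-1}) ⊗ Sigma = Gamma^{-1} ⊗ Sigma.
v_formula = nu @ np.kron(Gamma_inv, Sigma) @ nu
v_exact = nu @ np.kron(Gamma_inv @ Gamma @ Gamma_inv, Sigma) @ nu

# (3) Monte Carlo with independent X_t ~ N(0, Gamma) and e_t ~ N(0, Sigma).
T = 200_000
X = rng.multivariate_normal(np.zeros(k), Gamma, size=T)
E = rng.multivariate_normal(np.zeros(n), Sigma, size=T)
Z = np.einsum("ti,tj->tij", X @ Gamma_inv.T, E).reshape(T, k * n)  # rows: Gamma^{-1} X_t ⊗ e_t
v_mc = (Z @ nu).var()

print(vec_ok, np.isclose(v_formula, v_exact), v_mc / v_formula)
```

The Monte Carlo ratio should be close to one; the exact check (2) is the substantive identity, while (3) only illustrates it.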

B.8.12 Proof of Lemma B.9

For any $\omega\in[0,\pi]$,
$$\|\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega;p_T)\|^2 = \sum_{\ell=1}^{p_T}\|\hat A_\ell-A_\ell\|^2\cos^2(\omega\ell) \le \sum_{\ell=1}^{p_T}\|\hat A_\ell-A_\ell\|^2 = \|\hat\beta(p_T)-\beta(p_T)\|^2 = O_p(p_T/T),$$
using Lemma B.3. The argument for $A_{\sin}$ is identical.

B.8.13 Proof of Lemma B.10
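The proof below uses the perturbation bound $|\det(B)-\det(C)|\le n\|C-B\|\max\{\|B\|,\|C\|\}^{n-1}$ (Bhatia, 1997). The following NumPy sketch checks it on random complex matrices using the spectral norm; since the Frobenius norm is never smaller than the spectral norm, the right-hand side can only grow when the Frobenius norm is used, so the check covers that case too. Matrix sizes, perturbation scales and the number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def spectral_norm(M):
    """Largest singular value of M."""
    return np.linalg.norm(M, 2)

def det_bound_holds(n, scale):
    """Check |det B - det C| <= n ||B - C|| max(||B||, ||C||)^(n-1) for random complex B, C."""
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    C = B + scale * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    lhs = abs(np.linalg.det(B) - np.linalg.det(C))
    rhs = n * spectral_norm(B - C) * max(spectral_norm(B), spectral_norm(C)) ** (n - 1)
    return lhs <= rhs * (1 + 1e-9)  # small slack for floating-point rounding

checks = [
    det_bound_holds(n, s) for n in range(1, 7) for s in (0.01, 0.1, 1.0) for _ in range(100)
]
print(all(checks))
```

In the proof, $B$ and $C$ play the roles of $A(e^{i\omega})$ and its sieve estimate, so the bound converts uniform closeness of the lag polynomials into uniform closeness of their determinants.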

We start off by showing that the estimated VAR spectrum is nonsingular, asymptotically. Extend the definition of the Frobenius norm to complex matrices, so $\|B\|^2\equiv\operatorname{tr}(B^*B)$. The matrix perturbation bound $|\det(B)-\det(C)|\le n\|C-B\|\max\{\|B\|,\|C\|\}^{n-1}$ for $n\times n$ complex matrices $B$ and $C$ (Bhatia, 1997, Problem I.6.11, p. 22) implies
$$\begin{aligned}&\Big|\det(A(e^{i\omega})) - \det\big(I_{n_W}-\hat A_{\cos}(\omega;p_T)-i\hat A_{\sin}(\omega;p_T)\big)\Big|\\ &\quad\le n_W\Big\|\sum_{\ell=p_T+1}^\infty A_\ell e^{i\omega\ell} - \sum_{\ell=1}^{p_T}(\hat A_\ell-A_\ell)e^{i\omega\ell}\Big\|\max\Big\{\Big\|I_{n_W}-\sum_{\ell=1}^\infty A_\ell e^{i\omega\ell}\Big\|,\Big\|I_{n_W}-\sum_{\ell=1}^{p_T}\hat A_\ell e^{i\omega\ell}\Big\|\Big\}^{n_W-1}.\end{aligned}\tag{B.23}$$
Lemma B.9 implies $\sup_{\omega\in[0,\pi]}\big\|\sum_{\ell=1}^{p_T}(\hat A_\ell-A_\ell)e^{i\omega\ell}\big\| = o_p(1)$.

By Assumptions B.1 and B.3, the right-hand side of (B.23) therefore tends to 0 in probability uniformly in $\omega$, implying
$$\inf_{\omega\in[0,\pi]}\big|\det\big(I_{n_W}-\hat A_{\cos}(\omega;p_T)-i\hat A_{\sin}(\omega;p_T)\big)\big| = \inf_{\omega\in[0,\pi]}|\det(A(e^{i\omega}))| + o_p(1) > \delta + o_p(1).$$
Thus, with probability approaching 1, $(\hat A_{\cos}(\omega;p_T),\hat A_{\sin}(\omega;p_T))\in\mathcal A_\delta$ for all $\omega\in[0,\pi]$. We now show that, asymptotically, the estimated VAR spectrum lies in a region where $g(\cdot)$ is smooth. Let $M\equiv\max\{2\sum_{\ell=1}^\infty\|A_\ell\|,\|\Sigma\|\}+1$. By Assumption B.2, $g(\cdot,\cdot,\cdot)$ is continuously differentiable on $\mathcal A_M\times\mathcal S_M$. Since
$$\Big\|\hat A_{\cos}(\omega;p_T)-\sum_{\ell=1}^\infty A_\ell\Big\| \le \big\|\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega;p_T)\big\| + 2\sum_{\ell=1}^\infty\|A_\ell\| = 2\sum_{\ell=1}^\infty\|A_\ell\| + o_p(1)$$
uniformly in $\omega$ by Lemma B.9 and Assumption B.3 (and similarly for sin instead of cos), it follows that, with probability approaching 1, $(\hat A_{\cos}(\omega;p_T),\hat A_{\sin}(\omega;p_T))\in\mathcal A_M$ for all $\omega\in[0,\pi]$. Moreover, by the law of large numbers for i.i.d. variables and Lemma B.4, we also have $\hat\Sigma(p_T)\in\mathcal S_M$ with probability approaching 1.

B.8.14 Proof of Lemma B.11

We start out by applying a first-order Taylor expansion to the parameter of interest $\psi$. By Lemma B.10 and Assumption B.2, we can write
$$\begin{aligned}&g(\hat A_{\cos}(\omega;p_T),\hat A_{\sin}(\omega;p_T),\hat\Sigma) - g(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\\ &\quad= g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega))\\ &\qquad+ g_2(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat A_{\sin}(\omega;p_T)-A_{\sin}(\omega))\\ &\qquad+ g_3(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat\Sigma-\Sigma) + \hat R_T(\omega),\end{aligned}$$
where the fact that $g(\cdot,\cdot,\cdot)$ is twice continuously differentiable implies that there exists a $C>0$ such that
$$|\hat R_T(\omega)| \le C\big(\|\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega)\|^2 + \|\hat A_{\sin}(\omega;p_T)-A_{\sin}(\omega)\|^2 + \|\hat\Sigma-\Sigma\|^2\big)$$
for all $\omega$, with probability approaching 1. Since
$$\|\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega)\| \le \sum_{\ell=p_T+1}^\infty\|A_\ell\| + \|\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega;p_T)\| = O_p((p_T/T)^{1/2})$$
by Lemma B.9 and Assumption B.3 (and similarly with sin instead of cos), and since $\|\hat\Sigma-\Sigma\| = O_p(T^{-1/2})$ by Lemma B.4, we obtain
$$\int_0^\pi|\hat R_T(\omega)|\,d\omega = O_p(p_T/T).$$
Using the continuity and thus boundedness of $h(\cdot)$, we therefore get
$$\begin{aligned}\hat\psi(p_T)-\psi &= \int_0^\pi h(\omega)g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega))\,d\omega\\ &\quad+ \int_0^\pi h(\omega)g_2(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat A_{\sin}(\omega;p_T)-A_{\sin}(\omega))\,d\omega\\ &\quad+ \int_0^\pi h(\omega)g_3(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat\Sigma-\Sigma)\,d\omega + O_p(p_T/T)\\ &= \int_0^\pi h(\omega)g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega;p_T))\,d\omega\\ &\quad+ \int_0^\pi h(\omega)g_2(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat A_{\sin}(\omega;p_T)-A_{\sin}(\omega;p_T))\,d\omega\\ &\quad+ \int_0^\pi h(\omega)g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(A_{\cos}(\omega;p_T)-A_{\cos}(\omega))\,d\omega \qquad\text{(B.24)}\\ &\quad+ \int_0^\pi h(\omega)g_2(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(A_{\sin}(\omega;p_T)-A_{\sin}(\omega))\,d\omega \qquad\text{(B.25)}\\ &\quad+ \xi'\operatorname{vec}(\hat\Sigma-\Sigma) + O_p(p_T/T).\end{aligned}$$

We now bound the nonparametric bias term (B.24); the argument for (B.25) is similar. Note that $h(\cdot)$ is bounded, and
$$\begin{aligned}\int_0^\pi\|g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(A_{\cos}(\omega;p_T)-A_{\cos}(\omega))\|\,d\omega &\le \int_0^\pi\|g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\|\,d\omega\times\sup_{\omega\in[0,\pi]}\|A_{\cos}(\omega;p_T)-A_{\cos}(\omega)\|\\ &\le \int_0^\pi\|g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\|\,d\omega\times\sum_{\ell=p_T+1}^\infty\|A_\ell\| = o(T^{-1/2}),\end{aligned}$$
by Assumption B.3. We also used that Assumption B.2 implies $\omega\mapsto\|g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\|$ is in $L^2(0,\pi)$, implying that this function is integrable. Thus, the terms (B.24)–(B.25) are each $o(T^{-1/2})$.

To complete the proof, observe that
$$\begin{aligned}&\int_0^\pi h(\omega)g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega;p_T))\,d\omega + \int_0^\pi h(\omega)g_2(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\operatorname{vec}(\hat A_{\sin}(\omega;p_T)-A_{\sin}(\omega;p_T))\,d\omega\\ &\quad= \int_0^\pi h(\omega)g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\sum_{\ell=1}^{p_T}\operatorname{vec}(\hat A_\ell-A_\ell)\cos(\omega\ell)\,d\omega + \int_0^\pi h(\omega)g_2(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\sum_{\ell=1}^{p_T}\operatorname{vec}(\hat A_\ell-A_\ell)\sin(\omega\ell)\,d\omega\\ &\quad= \sum_{\ell=1}^{p_T}\nu_{\ell,T}'\operatorname{vec}(\hat A_\ell-A_\ell).\end{aligned}$$
In conclusion,
$$\hat\psi(p_T)-\psi = \nu_T'\operatorname{vec}(\hat\beta(p_T)-\beta(p_T)) + \xi'\operatorname{vec}(\hat\Sigma-\Sigma) + o(T^{-1/2}) + O_p(p_T/T).$$
The above remainder terms are both $o_p((T-p_T)^{-1/2})$ by Assumption B.3.

B.8.15 Proof of Proposition B.4

The proposition follows immediately from Lemmas B.8 and B.11 if we can show that $\|\nu_T\|$ is bounded asymptotically. Let $g_{j,i}(\cdot,\cdot,\cdot)$ denote the $i$-th element of $g_j(\cdot,\cdot,\cdot)$, $j=1,2$, $i=1,2,\dots,n_W^2$. Let $M\equiv\sup_{\omega\in[0,\pi]}|h(\omega)|<\infty$. Then
$$\begin{aligned}\|\nu_T\|^2 &= \sum_{i=1}^{n_W^2}\sum_{\ell=1}^{p_T}\Big(\int_0^\pi h(\omega)\big\{g_{1,i}(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\cos(\omega\ell) + g_{2,i}(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\sin(\omega\ell)\big\}\,d\omega\Big)^2\\ &\le 2\sum_{i=1}^{n_W^2}\sum_{\ell=1}^{p_T}\Big\{\Big(\int_0^\pi h(\omega)g_{1,i}(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\cos(\omega\ell)\,d\omega\Big)^2 + \Big(\int_0^\pi h(\omega)g_{2,i}(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\sin(\omega\ell)\,d\omega\Big)^2\Big\}.\end{aligned}$$
The sum
$$\sum_{\ell=1}^{p_T}\Big(\sqrt{2/\pi}\int_0^\pi h(\omega)g_{1,i}(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\cos(\omega\ell)\,d\omega\Big)^2 \tag{B.26}$$
equals the squared $L^2(0,\pi)$ norm of the projection of the function $\omega\mapsto h(\omega)g_{1,i}(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)$ onto the space spanned by the orthonormal functions $\{\omega\mapsto\sqrt{2/\pi}\cos(\omega\ell)\}_{1\le\ell\le p_T}$. Bessel's inequality therefore states that (B.26) is bounded above by the squared $L^2(0,\pi)$ norm of the function $\omega\mapsto h(\omega)g_{1,i}(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)$, which in turn is at most $M^2$ times the squared $L^2(0,\pi)$ norm of $\omega\mapsto g_{1,i}(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)$.
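Bessel's inequality for the orthonormal system $\{\omega\mapsto\sqrt{2/\pi}\cos(\omega\ell)\}_{\ell\ge 1}$ on $[0,\pi]$ can be checked numerically. In the sketch below, the function `f` is an arbitrary smooth stand-in (hypothetical, not from the paper) for the function whose cosine coefficients enter (B.26), and integrals are approximated by the trapezoidal rule on a fine grid.

```python
import numpy as np

omega = np.linspace(0.0, np.pi, 20_001)

def integrate(values):
    """Trapezoidal rule on the grid omega."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(omega)) / 2.0)

p = 20
basis = [np.sqrt(2.0 / np.pi) * np.cos(omega * l) for l in range(1, p + 1)]

# Orthonormality of the basis on [0, pi].
gram = np.array([[integrate(b1 * b2) for b2 in basis] for b1 in basis])

# Hypothetical stand-in for the function whose coefficients enter (B.26).
f = np.exp(np.cos(omega)) + omega ** 2 - 2.0

coefficients = np.array([integrate(f * b) for b in basis])
bessel_sum = float(np.sum(coefficients ** 2))   # analogue of (B.26)
l2_norm_sq = integrate(f ** 2)                  # squared L^2(0, pi) norm of f

print(np.allclose(gram, np.eye(p), atol=1e-6), bessel_sum <= l2_norm_sq)
```

The bound does not depend on $p_T$, which is what delivers the uniform bound on $\|\nu_T\|$.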
We can similarly bound the expression (B.26) with $g_{2,i}(\cdot,\cdot,\cdot)$ in place of $g_{1,i}(\cdot,\cdot,\cdot)$ and with $\sin(\omega\ell)$ in place of $\cos(\omega\ell)$. Hence,
$$\|\nu_T\|^2 \le \pi M^2\sum_{i=1}^{n_W^2}\Big(\|g_{1,i}(A_{\cos}(\cdot),A_{\sin}(\cdot),\Sigma)\|_{L^2(0,\pi)}^2 + \|g_{2,i}(A_{\cos}(\cdot),A_{\sin}(\cdot),\Sigma)\|_{L^2(0,\pi)}^2\Big),$$
using obvious notation for the $L^2$ norms. These norms are finite by Assumption B.2.

B.8.16 Proof of Proposition B.5

We start by showing that $\|\hat\nu_T-\nu_T\| = o_p(1)$ and $\|\hat\xi(p_T)-\xi\| = o_p(1)$. By Lemma B.10 and the twice continuous differentiability assumed in Assumption B.2, there exists a constant $C<\infty$ such that, with probability approaching one,
$$\|g_j(\hat A_{\cos}(\omega;p_T),\hat A_{\sin}(\omega;p_T),\hat\Sigma) - g_j(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\| \le C\big(\|\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega)\| + \|\hat A_{\sin}(\omega;p_T)-A_{\sin}(\omega)\| + \|\hat\Sigma(p_T)-\Sigma\|\big)$$
for $j=1,2,3$. By Lemma B.4 and the i.i.d. central limit theorem, we have $\|\hat\Sigma(p_T)-\Sigma\| = O_p(T^{-1/2})$. Using additionally Lemma B.9, we then have, for example, that
$$\begin{aligned}&\sum_{\ell=1}^{p_T}\Big\|\int_0^\pi h(\omega)\big[g_1(\hat A_{\cos}(\omega;p_T),\hat A_{\sin}(\omega;p_T),\hat\Sigma) - g_1(A_{\cos}(\omega),A_{\sin}(\omega),\Sigma)\big]\cos(\omega\ell)\,d\omega\Big\|\\ &\quad\le \tilde Cp_T\sup_{\omega\in[0,\pi]}\big(\|\hat A_{\cos}(\omega;p_T)-A_{\cos}(\omega)\| + \|\hat A_{\sin}(\omega;p_T)-A_{\sin}(\omega)\| + \|\hat\Sigma(p_T)-\Sigma\|\big) = O_p((p_T^3/T)^{1/2}) = o_p(1),\end{aligned}$$
where $\tilde C$ is some constant. This type of calculation implies $\|\hat\nu_T-\nu_T\| = o_p(1)$ and $\|\hat\xi(p_T)-\xi\| = o_p(1)$.

We now deal with the consistency of the two terms in $\hat\sigma_\psi^2(p_T)$ one at a time. First, decompose
$$\begin{aligned}&\hat\nu_T'(\hat\Gamma(p_T)^{-1}\otimes\hat\Sigma(p_T))\hat\nu_T - \nu_T'(\Gamma(p_T)^{-1}\otimes\Sigma)\nu_T\\ &\quad= \nu_T'\big((\hat\Gamma(p_T)^{-1}\otimes\hat\Sigma(p_T)) - (\Gamma(p_T)^{-1}\otimes\Sigma)\big)\nu_T + (\hat\nu_T-\nu_T)'(\hat\Gamma(p_T)^{-1}\otimes\hat\Sigma(p_T))(\hat\nu_T-\nu_T) + 2(\hat\nu_T-\nu_T)'(\hat\Gamma(p_T)^{-1}\otimes\hat\Sigma(p_T))\nu_T\\ &\quad\equiv R_{1,T} + R_{2,T} + 2R_{3,T}.\end{aligned}$$
Using Lemma B.2, we find
$$\begin{aligned}|R_{1,T}| &\le \|\nu_T\|^2\,\big\|(\hat\Gamma(p_T)^{-1}\otimes\hat\Sigma(p_T)) - (\Gamma(p_T)^{-1}\otimes\Sigma)\big\|_2\\ &\le M\big(\|\hat\Gamma(p_T)^{-1}-\Gamma(p_T)^{-1}\|_2\,\|\hat\Sigma(p_T)\| + \|\Gamma(p_T)^{-1}\|_2\,\|\hat\Sigma(p_T)-\Sigma\|\big)\\ &\le M\big(\|\hat\Gamma(p_T)-\Gamma(p_T)\|\,\|\Gamma(p_T)^{-1}\|_2\,\|\hat\Gamma(p_T)^{-1}\|_2\,\|\hat\Sigma(p_T)\| + \|\Gamma(p_T)^{-1}\|_2\,\|\hat\Sigma(p_T)-\Sigma\|\big) = o_p(1).\end{aligned}$$
Similar calculations, along with the fact $\|\hat\nu_T-\nu_T\| = o_p(1)$, can be used to show that $R_{2,T} = o_p(1)$ and $R_{3,T} = o_p(1)$.

Second, define $\Xi\equiv\mathrm{Var}(e_t\otimes e_t)$ and decompose
$$\hat\xi(p_T)'\hat\Xi(p_T)\hat\xi(p_T) - \xi'\Xi\xi = \xi'(\hat\Xi(p_T)-\Xi)\xi + (\hat\xi(p_T)-\xi)'\hat\Xi(p_T)(\hat\xi(p_T)-\xi) + 2(\hat\xi(p_T)-\xi)'\hat\Xi(p_T)\xi \equiv \tilde R_{1,T} + \tilde R_{2,T} + 2\tilde R_{3,T}.$$
Since $\|\hat\xi(p_T)-\xi\| = o_p(1)$, the statement of the proposition follows if we can show $\|\hat\Xi(p_T)-\Xi\| = o_p(1)$. Define $\chi_t\equiv\operatorname{vec}(e_te_t'-\Sigma)$, and note that $(T-p_T)^{-1}\sum_{t=p_T+1}^T\chi_t\chi_t'\overset{p}{\to}\Xi$ by the usual law of large numbers for i.i.d. variables. Because
$$\begin{aligned}\Big\|\hat\Xi(p_T) - \frac{1}{T-p_T}\sum_{t=p_T+1}^T\chi_t\chi_t'\Big\| &\le \frac{1}{T-p_T}\sum_{t=p_T+1}^T\|\hat\chi_t\hat\chi_t'-\chi_t\chi_t'\|\\ &\le \frac{1}{T-p_T}\sum_{t=p_T+1}^T\|\hat\chi_t-\chi_t\|^2 + \frac{2}{T-p_T}\sum_{t=p_T+1}^T\|\hat\chi_t-\chi_t\|\,\|\chi_t\|\\ &\le \frac{1}{T-p_T}\sum_{t=p_T+1}^T\|\hat\chi_t-\chi_t\|^2 + 2\Big(\frac{1}{T-p_T}\sum_{t=p_T+1}^T\|\hat\chi_t-\chi_t\|^2\times\frac{1}{T-p_T}\sum_{t=p_T+1}^T\|\chi_t\|^2\Big)^{1/2}\end{aligned}$$
by Cauchy-Schwarz, we just need to show that $(T-p_T)^{-1}\sum_{t=p_T+1}^T\|\hat\chi_t-\chi_t\|^2 = o_p(1)$. Since
$$\|\hat\chi_t-\chi_t\| = \|\hat e_t(p_T)\hat e_t(p_T)' - e_te_t'\| \le \|\hat e_t(p_T)-e_t\|^2 + 2\|\hat e_t(p_T)-e_t\|\,\|e_t\|,$$
we have
$$\frac{1}{T-p_T}\sum_{t=p_T+1}^T\|\hat\chi_t-\chi_t\|^2 \le \frac{2}{T-p_T}\sum_{t=p_T+1}^T\|\hat e_t-e_t\|^4 + 8\Big(\frac{1}{T-p_T}\sum_{t=p_T+1}^T\|\hat e_t-e_t\|^4\times\frac{1}{T-p_T}\sum_{t=p_T+1}^T\|e_t\|^4\Big)^{1/2}.$$
The i.i.d. law of large numbers gives $(T-p_T)^{-1}\sum_{t=p_T+1}^T\|e_t\|^4 = O_p(1)$. To complete the proof, we bound
$$\frac{1}{T-p_T}\sum_{t=p_T+1}^T\|\hat e_t-e_t\|^4 \le \frac{8}{T-p_T}\sum_{t=p_T+1}^T\|\hat e_t-e_t(p_T)\|^4 + \frac{8}{T-p_T}\sum_{t=p_T+1}^T\|e_t-e_t(p_T)\|^4 \equiv 8(\hat R_{1,T}+\hat R_{2,T})$$
and show that the two terms on the right-hand side tend to zero, using similar arguments as in the proof of Lemma B.4. First,
$$\hat R_{1,T} \le \|\hat\beta(p_T)-\beta(p_T)\|^4\,\frac{1}{T-p_T}\sum_{t=p_T+1}^T\|X_t(p_T)\|^4 = O_p((p_T/T)^2)\,O_p(p_T^2) = o_p(1),$$
since $E\|X_t(p_T)\|^4 = E\big(\sum_{\ell=1}^{p_T}\|W_{t-\ell}\|^2\big)^2 = \sum_{\ell=1}^{p_T}\sum_{m=1}^{p_T}E(\|W_{t-\ell}\|^2\|W_{t-m}\|^2) = O(p_T^2)$. Second,
$$\begin{aligned}E(\hat R_{2,T}) = E\|e_t-e_t(p_T)\|^4 &\le E\Big(\sum_{\ell=p_T+1}^\infty\|A_\ell\|\,\|W_{t-\ell}\|\Big)^4\\ &= \sum_{\ell_1,\ell_2,\ell_3,\ell_4=p_T+1}^\infty\|A_{\ell_1}\|\,\|A_{\ell_2}\|\,\|A_{\ell_3}\|\,\|A_{\ell_4}\|\,E(\|W_{t-\ell_1}\|\,\|W_{t-\ell_2}\|\,\|W_{t-\ell_3}\|\,\|W_{t-\ell_4}\|)\\ &\le \text{constant}\times\Big(\sum_{\ell=p_T+1}^\infty\|A_\ell\|\Big)^4 = o(1).\end{aligned}$$

References

Berk, K. (1974). Consistent Autoregressive Spectral Estimates. Annals of Statistics, 2(3), 489–502.
Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics. Springer.
Brockwell, P. J. & Davis, R. A. (1991). Time Series: Theory and Methods (2nd ed.). Springer Series in Statistics. Springer.
Davidson, J. (1994). Stochastic Limit Theory: An Introduction for Econometricians. Advanced Texts in Econometrics. Oxford University Press.
Del Negro, M., Giannoni, M. P., & Patterson, C. (2012). The Forward Guidance Puzzle. Federal Reserve Bank of New York Staff Report No. 574.
Forni, M., Gambetti, L., & Sala, L. (2019). Structural VARs and noninvertible macroeconomic models. Journal of Applied Econometrics, 34(2), 221–246.
Gertler, M. & Karadi, P. (2015). Monetary Policy Surprises, Credit Costs, and Economic Activity. American Economic Journal: Macroeconomics, 7(1), 44–76.
Gonçalves, S. & Kilian, L. (2007). Asymptotic and bootstrap inference for AR(∞) processes with conditional heteroskedasticity. Econometric Reviews, 26(6), 609–641.
Hannan, E. (1970). Multiple Time Series. Wiley Series in Probability and Statistics. John Wiley & Sons.
Kreiss, J.-P., Paparoditis, E., & Politis, D. N. (2011). On the Range of Validity of the Autoregressive Sieve Bootstrap. Annals of Statistics, 39(4), 2103–2130.
Leeper, E. M., Walker, T. B., & Yang, S.-C. S. (2013). Fiscal Foresight and Information Flows. Econometrica, 81(3), 1115–1145.
Lewis, R. & Reinsel, G. C. (1985). Prediction of Multivariate Time Series by Autoregressive Model Fitting. Journal of Multivariate Analysis, 16(3), 393–411.
Lippi, M. & Reichlin, L. (1994). VAR analysis, nonfundamental representations, Blaschke matrices. Journal of Econometrics, 63(1), 307–325.
Mertens, K. & Ravn, M. O. (2013). The Dynamic Effects of Personal and Corporate Income Tax Changes in the United States. American Economic Review, 103(4), 1212–1247.
Meyer, M. & Kreiss, J.-P. (2015). On the Vector Autoregressive Sieve Bootstrap. Journal of Time Series Analysis, 36(3), 377–397.
Ramey, V. A. (2016). Macroeconomic Shocks and Their Propagation. In J. B. Taylor & H. Uhlig (Eds.), Handbook of Macroeconomics, volume 2, chapter 2 (pp. 71–162). Elsevier.
Saikkonen, P. & Lütkepohl, H. (2000). Asymptotic Inference on Nonlinear Functions of the Coefficients of Infinite Order Cointegrated VAR Processes. In W. Barnett, D. Hendry, S. Hylleberg, T. Teräsvirta, D. Tjøstheim, & A. Würtz (Eds.), Nonlinear Econometric Modeling in Time Series Analysis (pp. 165–201). Cambridge University Press.
Sims, C. A. & Zha, T. (2006). Does Monetary Policy Generate Recessions? Macroeconomic Dynamics, 10(2), 231–272.
Smets, F. & Wouters, R. (2007). Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach. American Economic Review, 97(3), 586–606.
Stock, J. H. (2008). What's New in Econometrics: Time Series, Lecture 7. Lecture slides, NBER Summer Institute.
Stock, J. H. & Watson, M. W. (2012). Disentangling the Channels of the 2007–09 Recession. Brookings Papers on Economic Activity, 2012(1), 81–135.
Uhlig, H. (2005). What are the effects of monetary policy on output? Results from an agnostic identification procedure. Journal of Monetary Economics, 52(2), 381–419.
Wolf, C. K. (2020). SVAR (Mis)identification and the Real Effects of Monetary Policy Shocks. American Economic Journal: Macroeconomics, 12