Econometric analysis of potential outcomes time series: instruments, shocks, linearity and the causal response function

Abstract

Bojinov & Shephard (2019) defined potential outcome time series to nonparametrically measure dynamic causal effects in time series experiments. Four innovations are developed in this paper: "instrumental paths," treatments which are "shocks," "linear potential outcomes" and the "causal response function." Potential outcome time series are then used to provide a nonparametric causal interpretation of impulse response functions, generalized impulse response functions, local projections and LP-IV.


Ashesh Rambachan

Department of Economics, Harvard University

Neil Shephard

Department of Economics and Department of Statistics, Harvard University

February 27, 2020


Keywords: Dynamic causality, instrumental variables, linearity, potential outcomes, time series, shocks.

∗ This is a revised version of “A nonparametric dynamic causal model for macroeconometrics.” We thank Iavor Bojinov, Gary Chamberlain, Fabrizia Mealli, James M. Robins and James H. Stock for conversations that have developed our thinking on causality. We are also grateful to Isaiah Andrews, John Campbell, Peng Ding, Avi Feller, Ron Gallant, Peter R. Hansen, Kosuke Imai, Guido Kuersteiner, Daniel Lewis, Luke Miratrix, Sendhil Mullainathan, Ulrich Muller, Susan A. Murphy, Andrew Patton, Natesh Pillai, Mikkel Plagborg-Moller, Julia Shephard and Christopher Sims for valuable feedback on an earlier draft. Participants at the Harvard Econometrics Workshop, Workshop on Score-driven Time-series Models at the University of Cambridge, NBER-NSF Time Series Conference 2019, the Conference in Honor of George Tauchen at Duke University, the Workshop on Causal Inference with Interactions at UCL and the (EC)

1 Introduction

Bojinov and Shephard (2019) developed potential outcome time series to nonparametrically measure dynamic causal effects from time series randomized experiments conducted in financial markets. However, most time series data used in economics is observational. In this paper we develop the tools needed to use the potential outcome time series framework on observational data, yielding an observational, nonparametric framework for measuring dynamic causal effects. It provides a flexible foundation upon which to build new methods and interpret existing methods for causal inference on economic time series.

Our analysis is based on four new ideas beyond Bojinov and Shephard (2019). The first three are special cases of the potential outcome time series: “instrumented potential outcome time series”, treatments which are “shocks” and “linear potential outcomes”. The fourth innovation is the “causal response function,” which is a new dynamic causal estimand. To illustrate the power of these four ideas, we provide a nonparametric causal interpretation to four tools commonly used in the time series literature: impulse response functions, generalized impulse response functions, local projections and local projections with an instrumental variable (LP-IV). Our results show that a tightly parameterized model such as the structural moving average is not needed to provide a causal interpretation to these objects.

Of course, there is a storied history of economists trying to learn dynamic causal effects from time series data. Modern reviews include Ramey (2016) and Stock and Watson (2018). As this vast body of research emphasizes, conceptualizing and estimating dynamic causal effects is quite challenging. Dynamic feedback between treatments and observed outcomes makes it difficult to disentangle causes from effects. Additionally, in many important applications, only several hundred time series observations are available.
Given these challenges, much of the literature on dynamic causal effects in time series relies on parameterized linear models. Canonical examples are structural vector autoregressions (Sims, 1980), local projections (Jordà, 2005) and LP-IV (Jordà et al., 2015; Stock and Watson, 2018). However, there are exceptions such as Priestley (1988), Engle et al. (1990) and Gallant et al. (1993). Influentially, Koop et al. (1996) defined a “generalized impulse response function” for non-linear, non-causal models.

While tractable, the heavy emphasis on linear models has drawbacks. The role of particular sets of assumptions is often unclear in existing approaches. For example, it is common in economics to [...]

Footnote: Structural vector autoregressions are typically motivated as a linear approximation to an equilibrium arising from an underlying dynamic stochastic general equilibrium model such as Christiano et al. (1999, 2005), Smets and Wouters (2003, 2007).

[...] T, large-N panels. The groundbreaking panel work of Robins (1986) led to an enormous literature on dynamic causal effects (Murphy et al., 2001; Murphy, 2003; Abbring and Heckman, 2007; Heckman and Navarro, 2007; Lechner, 2011; Heckman et al., 2016; Boruvka et al., 2018; Blackwell and Glynn, 2018; Hernan and Robins, 2019). However, the four new ideas in this paper are not the focus of those papers.

Inference on dynamic causal effects is one of the great themes of the broader time series literature. Researchers quantify causality in time series in a variety of ways such as using “Granger causality” (Wiener, 1956; Granger, 1969), highly structured models such as DSGE models (Herbst and Schorfheide, 2015), behavioral game theory (Toulis and Parkes, 2016), state space modelling (Harvey and Durbin, 1986; Harvey, 1996; Bondersen et al., 2015), Bayesian structural models (Brodersen et al., 2015) as well as intervention analysis (Box and Tiao, 1975) and regression discontinuity (Kuersteiner et al., 2018).
The potential outcome time series is distinct from each of those approaches.

The closest work to the potential outcome time series framework is Angrist and Kuersteiner (2011) and Angrist et al. (2018), which also study time series using potential outcomes (see also White and Lu (2010) and Lu et al. (2017)). That work is importantly different from Bojinov and Shephard (2019), as it avoids discussion of treatment paths, defining potential outcomes as a function of a single prior treatment; this difference is detailed in Section 2. More importantly, Angrist and Kuersteiner (2011) and Angrist et al. (2018) do not discuss the main contributions of this paper, which are the special cases of instruments, shocks and linear potential outcomes and the establishment of the causal response function. Also related to the framework is Robins et al. (1999), who used potential outcome paths for binary time series, and Bondersen et al. (2015), who used them for state space models. Recently, Bojinov and Shephard (2019) and Blackwell and Glynn (2018) used them in more general settings.

Overview of the paper:

Section 2 recalls the definition of a potential outcome time series. Then two examples of this setup are given, before developing the three important special cases that deal with instrumental variables, shocks and linear potential outcomes. Section 3 defines causal effects, introducing a weighted causal effect and a causal response function. We provide definitions that allow us to link them with the economics literature and are more general than those in Bojinov and Shephard (2019). We show that the causal response function is closely related to the impulse response function. We also analyze the properties of these causal estimands under the assumptions of linear potential outcomes and shocked treatments. Section 3 finishes with a nonparametric causal interpretation of the local projection and its instrumental variables version. Section 4 concludes the paper. Longer proofs and a series of additional results are collected in our Web Appendix.

Notation:

The mathematics of this paper is written using standard path notation: for a time series $\{A_t : t = 1, 2, \ldots, T\}$, let $A_{1:t} := (A_1, \ldots, A_t)$. Here $:=$ denotes a definition of the left hand side of the equation. Further, $A \perp\!\!\!\perp B$ generically means that the random variable $A$ is stochastically independent of $B$. $A \not\perp\!\!\!\perp B$ denotes $A$ and $B$ not being independent, while $A \overset{L}{=} B$ means $A$ and $B$ have the same law or distribution. For a matrix $A$, $A^\intercal$ is the transpose of $A$.

We recall the definition of the potential outcome time series developed by Bojinov and Shephard (2019) in the context of time series experiments seen in financial economics. There is nothing novel in this first subsection.

There is a single unit that is observed over $t = 1, \ldots, T$ periods. At each time period, the unit receives a new $K$-dimensional treatment $W_t$ and we observe a scalar outcome $Y_t$. The potential outcome time series links treatments and outcomes using four foundation stones: (i) the definition of treatment and potential outcome paths, (ii) an assumption of non-anticipating outcomes, (iii) an assumption that generates outcomes by linking potential outcomes to treatments and (iv) an assumption of non-anticipating treatments.

A potential outcome describes what would be observed at time $t$ for a particular path of treatments. Its formal definition is given below.

Definition 1. A treatment path $W_{1:T}$ is a stochastic process where each random variable $W_t$ has compact support $\mathcal{W} \subset \mathbb{R}^K$. The potential outcome path is, for any deterministic $w_{1:T} \in \mathcal{W}^T$, the stochastic process

$$Y_{1:T}(w_{1:T}) := (Y_1(w_{1:T}), Y_2(w_{1:T}), \ldots, Y_T(w_{1:T}))^\intercal,$$

where the time-$t$ potential outcome $Y_t(w_{1:T}) : \mathcal{W}^T \to \mathbb{R}$.

In the definition above, the potential outcomes can depend on future treatments. Now, we employ our second foundation stone: restricting the potential outcomes to only depend on past and current treatments.

Assumption 1 (Non-anticipating potential outcomes). For each $t = 1, \ldots, T$, $Y_t(w_{1:t}, w_{t+1:T}) = Y_t(w_{1:t}, w'_{t+1:T})$ almost surely, for all deterministic $w_{1:T} \in \mathcal{W}^T$, $w'_{t+1:T} \in \mathcal{W}^{T-t}$.

Assumption 1 is the time series analogue of SUTVA (Cox, 1958; Rubin, 1980). For convenience, we will drop references to future treatments and write the time-$t$ potential outcome random variable $Y_t(w_{1:t}) : \mathcal{W}^t \to \mathbb{R}$, while the stochastic process version is written as $Y_{1:T}(w_{1:T}) = (Y_1(w_1), Y_2(w_{1:2}), \ldots, Y_T(w_{1:T}))^\intercal$.

We link the potential outcomes and treatments to deliver outcomes through our third stone:

Assumption 2 (Outcomes). The time-$t$ outcome is the random variable $Y_t := Y_t(W_{1:t})$, while the outcome stochastic process is $Y_{1:T} := (Y_1(W_1), Y_2(W_{1:2}), \ldots, Y_T(W_{1:T}))^\intercal$.

Let $\mathcal{F}_t$ stand for the natural filtration generated by the observed stochastic process $\{Y_t, W_t\}$.

The final foundation stone is that $W_{1:T}$ is non-anticipating: the assignment of the treatment depends only on past outcomes and past treatments. This is a probabilistic assumption involving the $\{W_t, Y_{t:T}(W_{1:t-1}, w_{t:T})\} \mid \mathcal{F}_{t-1}$. The associated (conditional) probability triple of this joint conditional distribution is written as $(\Omega, \mathcal{G}, \Pr)$, hiding the implicit dependence on $w_{t:T}$ and $\mathcal{F}_{t-1}$.

Footnote: Angrist and Kuersteiner (2011); Angrist et al. (2018) allow treatments to stochastically depend on past outcomes and treatments but define their potential outcomes as $\{Y_{t,p}(w), w \in \mathcal{W}\}$, for each lag $p \geq 0$. This latter step limits the dependence of the potential outcomes on the full treatment path, e.g. for $p = 1$, $\{Y_{t,1}(w), w \in \mathcal{W}\}$ only depends on the treatment assigned at period $t-1$, not on the treatment assigned at period $t$. In principle, the 1-step ahead causal effect of the treatment on the outcome may differ depending on what treatments are assigned at period $t$ but this notation rules this out. As we next discuss in detail in Section 3, introducing explicit dependence on the full treatment path leads to a rich set of interesting causal estimands.

Assumption 3 (Non-anticipating treatment paths). For each $t = 1, \ldots, T$,

$$\{\{Y_{t:T}(W_{1:t-1}, w_{t:T}),\; w_{t:T} \in \mathcal{W}^{T-t+1}\} \perp\!\!\!\perp W_t\} \mid \mathcal{F}_{t-1}.$$

Assumption 3 is the time-series analogue of unconfoundedness. It says that the future potential outcomes $\{Y_{t:T}(W_{1:t-1}, w_{t:T}), w_{t:T} \in \mathcal{W}^{T-t+1}\}$ do not Granger-cause the current treatment $W_t$ (Sims, 1972; Chamberlain, 1982; Engle et al., 1983; Kuersteiner, 2010; Lechner, 2011; Hendry, 2017).

With our four foundation stones in place we can now define a potential outcome time series.

Definition 2 (Potential outcome time series). A stochastic process of potential outcomes and treatments $\{Y_t, W_t\}$ that satisfies Assumptions 1, 2 and 3 is a potential outcome time series.

We now illustrate the potential outcome time series through two examples.

Example 1 (Autoregression). Consider a bivariate time series, where for all $w_{1:t} \in \mathcal{W}^t$,

$$
\begin{pmatrix} Y_t(w_{1:t}) \\ W_t \end{pmatrix}
=
\begin{pmatrix} \mu + \phi Y_{t-1}(w_{1:t-1}) + \beta w_t \\ \gamma + \theta W_{t-1} + \delta Y_{t-1}(W_{1:t-1}) \end{pmatrix}
+
\begin{pmatrix} \epsilon_t \\ \eta_t \end{pmatrix},
\qquad
\begin{pmatrix} \epsilon_t \\ \eta_t \end{pmatrix}
\overset{iid}{\sim}
N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_\epsilon^2 & \rho \sigma_\epsilon \sigma_\eta \\ \rho \sigma_\epsilon \sigma_\eta & \sigma_\eta^2 \end{pmatrix} \right).
\tag{1}
$$

The resulting $\{Y_t, W_t\}$ is a Gaussian process. However, in general this system is not a potential outcome time series, as $\epsilon_t$ and $\eta_t$ are contemporaneously correlated, which disallows the use of Assumption 3. If $\rho = 0$ then this is a potential outcome time series. More generally, if we replace the assumption about the joint law of $\epsilon_t, \eta_t$ in (1) entirely with the assumption $\{\epsilon_t \perp\!\!\!\perp W_t\} \mid \mathcal{F}_{t-1}$, then this is a potential outcome time series.

Example 2 (Expectations of future treatments and non-anticipation). In economics, consumers and firms are often modelled as forward-looking, with the distribution of future outcomes influencing today’s treatment choice. A simple version of this (e.g. in the tradition of Muth (1961), Lucas (1972), Sargent (1981)) is:

$$
W_t = \arg\max_{w_t} \left( \max_{w_{t+1:T}} E\left[ U^*\big(Y_{t:T}(W_{1:t-1}, w_{t:T}), w_{t:T}\big) \mid \mathcal{F}_{t-1} \right] \right),
\tag{2}
$$

where $U^*$ is a utility function of future outcomes and treatments. This decision rule delivers $W_t$ and thus $Y_t(W_{1:t})$. This is a potential outcome time series.

Footnote: In panel data settings, Robins (1994), Robins et al. (1999) and Abbring and van den Berg (2003) use this type of “selection on observables” assumption for the treatment paths $W_{1:T}$. When $T = 2$ this assumption is equivalent to the “latent sequential ignorability” assumption of Ricciardi et al. (2020). More broadly, Frangakis and Rubin (1999) call this type of assumption “latent ignorable”.

We now focus on three new special cases of the nonparametric potential outcome time series which allow the formal definitions of an instrumental path, a linear potential outcome and a nonparametric shock in causal time series models. These three cases were not in Bojinov and Shephard (2019).

The first special case of the potential outcome time series connects this framework to the literature on instrumental variables (Angrist et al. (1996); Angrist and Krueger (2001)).
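Before turning to instruments, the role of $\rho$ in Example 1 can be checked numerically. The sketch below is our own illustration, not part of the paper: it simulates the observed path implied by (1) and regresses $Y_t$ on a constant, $Y_{t-1}$, $W_{t-1}$ and $W_t$. The function names `simulate_example1` and `coef_on_current_treatment` and all parameter values are hypothetical choices made for the illustration.

```python
import numpy as np


def simulate_example1(T, rho, seed=0, mu=0.0, phi=0.5, beta=1.0,
                      gamma=0.0, theta=0.3, delta=0.2,
                      sigma_eps=1.0, sigma_eta=1.0):
    """Simulate the observed path {Y_t, W_t} implied by equation (1).

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[sigma_eps**2, rho * sigma_eps * sigma_eta],
                    [rho * sigma_eps * sigma_eta, sigma_eta**2]])
    # Column 0 holds epsilon_t, column 1 holds eta_t.
    noise = rng.multivariate_normal(np.zeros(2), cov, size=T)
    Y = np.zeros(T)
    W = np.zeros(T)
    for t in range(1, T):
        # The treatment depends on the past only, plus eta_t.
        W[t] = gamma + theta * W[t - 1] + delta * Y[t - 1] + noise[t, 1]
        # The observed outcome is the potential outcome along the realised path.
        Y[t] = mu + phi * Y[t - 1] + beta * W[t] + noise[t, 0]
    return Y, W


def coef_on_current_treatment(Y, W):
    """OLS coefficient on W_t when regressing Y_t on (1, Y_{t-1}, W_{t-1}, W_t)."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1], W[:-1], W[1:]])
    coef, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    return coef[3]


if __name__ == "__main__":
    for rho in (0.0, 0.5):
        Y, W = simulate_example1(T=50_000, rho=rho, seed=1)
        print(f"rho={rho}: coefficient on W_t = {coef_on_current_treatment(Y, W):.3f}")
```

In this Gaussian setup the population coefficient on $W_t$ is $\beta + \rho\sigma_\epsilon/\sigma_\eta$, so with the values assumed here the regression should sit near $\beta = 1$ when $\rho = 0$ and near $1.5$ when $\rho = 0.5$. The second case is the failure of Assumption 3 that the example describes.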

Definition 3 (Instrumented potential outcome time series). Partition the treatment path $V_t = (W_t', Z_t')'$, where $W_t \in \mathcal{W}_W$ and $Z_t \in \mathcal{W}_Z$. Assume $\{Y_t, V_t\}$ is a potential outcome time series and additionally that:

1. Exclusion condition: $Y_t(w_1, z_1, \ldots, w_t, z_t) = Y_t(w_1, z'_1, \ldots, w_t, z'_t)$ for all $w_{1:t} \in \mathcal{W}_W^t$, $z_{1:t}, z'_{1:t} \in \mathcal{W}_Z^t$.
2. Relevance condition: $Z_t \not\perp\!\!\!\perp W_t \mid \mathcal{F}_{t-1}$.

Then $\{Y_t, V_t\}$ is an instrumented potential outcome time series, where $Z_{1:t}$ is labelled an instrument path.

The lack of dependence of the potential outcomes on the instrument means it is convenient to refer to it as $Y_t(w_{1:t}) : \mathcal{W}_W^t \to \mathbb{R}$, while $Y_{1:T}(w_{1:T}) = (Y_1(w_1), Y_2(w_{1:2}), \ldots, Y_T(w_{1:T}))^\intercal$.

Example 1 (continued). In economics, it is often difficult to measure accurately the treatment $W_t$, so instead, researchers use an estimator, $\hat{W}_t$, of the treatment. We take the instrument $Z_t = \hat{W}_t$, following the statistical measurement error tradition of Durbin (1954), which is used in the context of dynamic linear causal models by Jordà et al. (2015). An empirical example of this is Stock and Watson (2018), where $W_t$ is a monetary policy movement and $\hat{W}_t$ is an estimator of $W_t$ constructed from high-frequency movements in the rates on federal funds contracts around policy announcements. A simple time series example of this extends Example 1 with $\hat{W}_t = \alpha_0 + \alpha_1 W_t + \zeta_t$, where

$$
\begin{pmatrix} \epsilon_t \\ \eta_t \\ \zeta_t \end{pmatrix}
\overset{iid}{\sim}
N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_\epsilon^2 & 0 & 0 \\ 0 & \sigma_\eta^2 & 0 \\ 0 & 0 & \sigma_\zeta^2 \end{pmatrix} \right).
$$

Hence the estimated treatment is biased but not independent of the treatment. As $\hat{W}_t$ does not move around the potential outcomes, this system is an instrumented potential outcome time series.

Footnote: The non-anticipation assumptions are similarly plausible if a different model for expectations is used. “Natural expectations” as in Fuster et al. (2010) or “diagnostic expectations” as in Bordalo et al. (2018) both only allow current decisions to depend on (possibly biased) beliefs about future outcomes, not the exact realizations along alternative paths $Y_t(w_{1:t})$.

Remark 2.1.

The non-anticipation of treatments means that an instrumented potential outcome time series has

$$\{Z_{t-p} \perp\!\!\!\perp Y_t(W_{1:t-p-1}, w_{t-p:t})\} \mid \mathcal{F}_{t-p-1}$$

for all $w_{t-p:t} \in \mathcal{W}_W^{p+1}$.

Our second special case of the potential outcome time series bridges this framework to the literature on linear dynamic causal models (e.g. the survey of Ramey (2016)).

Definition 4 (Linear potential outcome time series). Assume a potential outcome time series. If, for every $w_{1:t} \in \mathcal{W}^t$,

$$Y_t(w_{1:t}) = U_t + \sum_{s=0}^{t-1} \beta_{t,s} w_{t-s}, \quad \text{almost surely},$$

where $\beta_{t,s}$ are non-stochastic, then $\{Y_t, W_t\}$ is called a linear potential outcome time series. If $\beta_{t,s} = \beta_s$ for every $t$, then the linear potential outcome time series is time-invariant.

Here, $\{U_t\}$ is an arbitrary stochastic process whose only constraint is that it does not vary with $w_{1:t}$ and $(U_t \perp\!\!\!\perp W_t) \mid \mathcal{F}_{t-1}$. For example, $\{U_t\}$ may be an ARCH process, which is non-linear, or a random walk, which is non-stationary.

Our last special case bridges the potential outcome time series framework to the literature on shocks in economics (e.g. the surveys of Ramey (2016) and Stock and Watson (2018)).

Definition 5 (Shocked potential outcome). For a potential outcome time series, if

$$E[W_t \mid \mathcal{F}_{t-1}] = 0,$$

then $W_t$ is called a shock and we label $\{Y_t, W_t\}$ a shocked potential outcome time series.

Footnote: Moreover, constructed measures of changes in government spending and tax policy have also recently been used as instruments to study the effects of fiscal policy on macroeconomic outcomes using time series data (Ramey and Zubairy, 2018; Fieldhouse et al., 2018; Mertens and Montiel Olea, 2018).
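Definition 5 only restricts the conditional mean of the treatment, so a shock need not be independent over time. As a minimal sketch of this point (our own illustration, with hypothetical names and parameter values), the code below simulates an ARCH(1)-type treatment $W_t = \sigma_t z_t$ with $\sigma_t^2 = \omega + \alpha W_{t-1}^2$ and $z_t$ iid standard normal, generated independently of any outcome. Its conditional mean given the past is zero, yet its squares are serially dependent.

```python
import numpy as np


def simulate_arch_shock(T, omega=0.7, alpha=0.3, seed=0):
    """Simulate W_t = sigma_t * z_t with sigma_t^2 = omega + alpha * W_{t-1}^2.

    Given the past, z_t has mean zero, so E[W_t | past] = 0 (a shock in the
    sense of Definition 5), while the conditional variance moves over time.
    omega and alpha are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)
    W = np.zeros(T)
    for t in range(1, T):
        sigma2 = omega + alpha * W[t - 1] ** 2
        W[t] = np.sqrt(sigma2) * z[t]
    return W


def lag1_autocorr(x):
    """Sample correlation between x_t and x_{t-1}."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]


if __name__ == "__main__":
    W = simulate_arch_shock(T=100_000, seed=2)
    print("lag-1 autocorrelation of W_t:  ", round(lag1_autocorr(W), 3))
    print("lag-1 autocorrelation of W_t^2:", round(lag1_autocorr(W ** 2), 3))
```

The first autocorrelation should be close to zero and the second clearly positive, which is consistent with a treatment that is a martingale difference but not an iid sequence.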

Example 1 (continued). If $Y_t(w_{1:t}) = \mu + \phi Y_{t-1}(w_{1:t-1}) + \beta w_t + \epsilon_t$, where $E(W_t \mid \mathcal{F}_{t-1}) = 0$, and $\{\epsilon_t \perp\!\!\!\perp W_t\} \mid \mathcal{F}_{t-1}$, then the system is a shocked potential outcome time series.

The class of shocked potential outcome time series provides the formal definition of a sequence of nonparametric shocks within a causal framework. To our knowledge, this formalization of a causal shock is novel. Shocks are often described heuristically or precisely with respect to a model such as a structural moving average. For example, Stock and Watson (2018) describe macroeconomic shocks as “unanticipated structural disturbances” that produce “unexpected changes” in the macroeconomic outcomes of interest. Ramey (2016) also describes shocks as: (1) “exogenous with respect to the other current and lagged endogenous variables,” (2) “uncorrelated with other exogenous shocks” and (3) “either unanticipated movements in exogenous variables or news about future movements in exogenous variables.”

Shocks are central to modern macroeconomics and financial economics. Leading empirical examples of macroeconomic shocks include “oil price shocks” (Hamilton, 2003, 2013) and sudden changes in national defense spending (Ramey, 2011; Barro and Redlick, 2011; Ramey and Zubairy, 2018). Examples of shocks in financial economics include “earnings surprises” (Kothari, 2001; Kothari et al., 2006; Patton and Verardo, 2012) and “news impact” (Engle and Ng, 1993; Anatolyev and Petukhov, 2016).

$L^2$ projections of potential outcomes

In economics, it is common to use best linear approximations or representations of potentially non-linear systems or expectations (Rudd, 2000; Plagborg-Møller and Wolf, 2019). That tradition generates two superpopulation $L^2$ projections of potential outcomes on lagged treatments.

Definition 6.

Suppose $\{Y_t, W_t\}$ is a shocked potential outcome time series where $K = 1$, $E(Y_t^2) < \infty$, $0 < E(W_{t-p}^2) < \infty$, and $p = 0, 1, 2, \ldots$. Define the time-$t$ projection

$$\beta^L_{t,p} := \arg\min_\beta \left[ \min_\alpha E(Y_t - \alpha - \beta W_{t-p})^2 \right],$$

and the “universal”

$$\beta^U_p := \arg\min_\beta \left[ \min_\alpha S_p(\alpha, \beta) \right], \quad \text{where} \quad S_p(\alpha, \beta) := \lim_{T \to \infty} E\left[ \frac{1}{T-p} \sum_{t=p+1}^{T} (Y_t - \alpha - \beta W_{t-p})^2 \right].$$

Also define the $L^2$ projections of the time-$t$ potential outcomes

$$Y^L_t(w_{1:t}) := \alpha_t + \sum_{s=0}^{t-1} \beta^L_{t,s} w_{t-s}, \quad \text{and} \quad Y^U_t(w_{1:t}) := \alpha + \sum_{s=0}^{t-1} \beta^U_s w_{t-s},$$

where $\alpha_t = E(Y_t)$ and $\alpha = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E(Y_t)$.

Then

$$\beta^L_{t,p} = \frac{E(Y_t W_{t-p})}{E(W_{t-p}^2)}, \quad \text{and} \quad \beta^U_p = \frac{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} E(Y_t W_{t-p})}{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} E(W_{t-p}^2)},$$

since the martingale difference treatments imply $E(W_{t-p}) = 0$. The two terms are related to one another through

$$\beta^U_p = \frac{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} \beta^L_{t,p} E(W_{t-p}^2)}{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} E(W_{t-p}^2)}. \tag{3}$$

Hence $\beta^U_p$ is a weighted average of $\{\beta^L_{t,p}\}$, where the weights are the time-varying variance of the treatments. If $E(W_t^2)$ is time-invariant, then the simplification $\beta^U_p = \lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} \beta^L_{t,p}$ holds.

These quantities are important in modern dynamic econometrics. In Section 3.5, we will show that $\beta^U_p$ is the implicit estimand for the “Local Projection” estimator of the lag-$p$ dynamic causal effect for a shocked potential outcome time series.

To link $\{\beta^L_{t,p}\}$ and $\beta^U_p$ directly to definitional terms $\{\beta_{t,p}\}$ under the linear potential outcomes (Definition 4), we combine the shocked assumption with linearity.

Theorem 2.1. If $\{Y_t, W_t\}$ is a shocked, linear potential outcome time series where $K = 1$, $E(Y_t^2) < \infty$, $0 < E(W_{t-p}^2) < \infty$, and $p = 0, 1, 2, \ldots$, then $\beta^L_{t,p} = \beta_{t,p}$, and $\beta^U_p = \beta^{U*}_p$, where

$$\beta^{U*}_p := \frac{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} \beta_{t,p} E(W_{t-p}^2)}{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} E(W_{t-p}^2)}. \tag{4}$$

Proof.

Given in Appendix A.
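The variance weighting in (3) and (4) can be made concrete with a small simulation. The sketch below is our own illustration under assumptions we chose for convenience: treatments are independent, mean-zero Gaussians whose standard deviation alternates between 1 and 2, $\beta_{t,0}$ alternates between 1 and 3, $\beta_{t,1} = 0.5$ for all $t$, and $U_t$ is iid standard normal. The function names are hypothetical.

```python
import numpy as np


def simulate_time_varying(T, seed=0):
    """Shocked, linear potential outcomes with time-varying beta_{t,0} and variance.

    Y_t = U_t + beta_{t,0} W_t + beta_{t,1} W_{t-1}; every numeric choice here
    is an illustrative assumption, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    even = (np.arange(T) % 2 == 0)
    beta0 = np.where(even, 1.0, 3.0)      # beta_{t,0}
    beta1 = np.full(T, 0.5)               # beta_{t,1}
    sd_w = np.where(even, 1.0, 2.0)       # standard deviation of W_t
    W = sd_w * rng.standard_normal(T)     # independent, mean zero: shocks
    U = rng.standard_normal(T)            # independent of the treatments
    Y = U + beta0 * W
    Y[1:] += beta1[1:] * W[:-1]
    return Y, W, beta0, beta1, sd_w


def projection_ratio(Y, W, p):
    """Sample analogue of sum_t E(Y_t W_{t-p}) / sum_t E(W_{t-p}^2)."""
    y, w = Y[p:], W[:len(W) - p]
    return np.sum(y * w) / np.sum(w * w)


def variance_weighted_average(beta_p, sd_w, p):
    """Finite-T version of (4), using E(W_{t-p}^2) = sd_w[t-p]**2."""
    b = beta_p[p:]
    v = sd_w[:len(sd_w) - p] ** 2
    return np.sum(b * v) / np.sum(v)


if __name__ == "__main__":
    Y, W, beta0, beta1, sd_w = simulate_time_varying(T=100_000, seed=3)
    print("simple average of beta_{t,0}:   ", beta0.mean())
    print("variance-weighted average (4):  ", variance_weighted_average(beta0, sd_w, 0))
    print("sample projection ratio, p = 0: ", round(projection_ratio(Y, W, 0), 3))
    print("sample projection ratio, p = 1: ", round(projection_ratio(Y, W, 1), 3))
```

With these choices the simple average of $\beta_{t,0}$ is 2 while the variance-weighted average is $(1 \cdot 1 + 3 \cdot 4)/(1 + 4) = 2.6$, and the sample ratio at $p = 0$ should track the latter. At $p = 1$ the coefficient is constant, so weighting is irrelevant and the ratio should be near 0.5.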

3 Dynamic causal effects

3.1 Lag-$p$ causal effects

Dynamic causal effects are comparisons of potential outcomes at a particular point in time along different treatment paths. In particular, for a potential outcome time series, the time-$t$ causal effect on $Y_t$ of treatment path $w_{1:t}$, compared to counterfactual path $w'_{1:t}$, is $Y_t(w_{1:t}) - Y_t(w'_{1:t})$. The time-$t$, lag-$p$ causal effect measures how the outcome at time $t$ changes if the treatment at time $t-p$ changes, where $p \geq 0$, fixing the treatment path up to time $t-p-1$ at $W_{1:t-p-1}$.

Definition 7 (Lag-$p$ causal effect). For a potential outcome time series and scalars $w, w'$, then

$$\tau_{t,p}(w, w') := Y_t(W_{1:t-p-1}, w, w_{t-p+1:t}) - Y_t(W_{1:t-p-1}, w', w'_{t-p+1:t}),$$

is a lag-$p$, time-$t$ causal effect for all $w_{t-p+1:t}, w'_{t-p+1:t} \in \mathcal{W}^p$.

Bojinov and Shephard (2019) introduced and studied the case where the treatment and counterfactual at time $t-p$ vary but $w'_{t-p+1:t} = w_{t-p+1:t}$. The more general time-$t$, lag-$p$ $\tau_{t,p}(w, w')$ is new and our focus. This generalization is essential to link existing model-based dynamic causal methods developed in economics to the potential outcome framework. Its development below is the fourth main contribution of this paper.

We can similarly define the projection versions of the lag-$p$, time-$t$ causal effect as

$$\tau^L_{t,p}(w, w') := Y^L_t(W_{1:t-p-1}, w, w_{t-p+1:t}) - Y^L_t(W_{1:t-p-1}, w', w'_{t-p+1:t})$$

and

$$\tau^U_{t,p}(w, w') := Y^U_t(W_{1:t-p-1}, w, w_{t-p+1:t}) - Y^U_t(W_{1:t-p-1}, w', w'_{t-p+1:t}).$$

Example 3.

Assume a linear potential outcome time series, then

$$\tau^I_{t,p}(w, w') = \beta_{t,p}(w - w'), \quad \text{and} \quad \tau_{t,p}(w, w') = \beta_{t,p}(w - w') + \sum_{s=0}^{p-1} \beta_{t,s}(w_{t-s} - w'_{t-s}).$$

For a shocked potential outcome time series:

$$\tau^L_{t,p}(w, w') = \beta^L_{t,p}(w - w') + \sum_{s=0}^{p-1} \beta^L_{t,s}(w_{t-s} - w'_{t-s}), \quad \text{and} \quad \tau^U_{t,p}(w, w') = \beta^U_p(w - w') + \sum_{s=0}^{p-1} \beta^U_s(w_{t-s} - w'_{t-s})$$

are, respectively, the time-$t$ and universal $L^2$ projections of the lag-$p$, time-$t$ causal effect. Under shocked, linear potential outcomes, notice that $\tau^L_{t,p}(w, w') = \tau_{t,p}(w, w') = \tau^U_{t,p}(w, w')$ and that $Y^L_t(w_{1:t})$, $Y^U_t(w_{1:t})$ and $Y_t(w_{1:t})$ all differ, recalling the definitions of $Y^L_t(w_{1:t})$, $Y^U_t(w_{1:t})$ from Definition 6 (e.g. $\alpha$, $\alpha_t$ and $U_t$ all differ).

Footnote: Our approach follows the finite sample tradition that manipulates causal effects without reference to superpopulations (Imbens and Rubin, 2015). It contrasts with the superpopulation approach used by Robins (1986), Angrist and Kuersteiner (2011) and Boruvka et al. (2018) in the context of panel data and Angrist et al. (2018) for time series.

We now introduce causal estimands built from the lag-$p$, time-$t$ causal effect $\tau_{t,p}(w, w')$.

Many possible $w_{t-p:t}$ and $w'_{t-p:t}$ are consistent with passing through $w_{t-p} = w$ and $w'_{t-p} = w'$. Each possible path leads to a valid lag-$p$, time-$t$ causal effect. We weight these different paths, selecting a weight function which will eventually lead to existing model-based dynamic causal methods developed in economics. The weights we choose will be generated by using distributions of $W_{t-p:t}, W'_{t-p:t}$ given past data.

Definition 8.

Let $Y_t := Y_t(W_{1:t})$, $Y'_t := Y_t(W_{1:t-p-1}, W'_{t-p:t})$. Then, the weighted causal effect is

$$\tau^*_{t,p}(w, w') := E\left[ (Y_t - Y'_t) \mid \mathcal{F}_{t-p-1}, W_{t-p} = w, W'_{t-p} = w', \{Y_{t-p:t}(W_{1:t-p-1}, w_{t-p:t}), w_{t-p:t} \in \mathcal{W}^{p+1}\} \right], \tag{5}$$

if it exists, where the expectation is generated by $\{W_{t-p:t}, W'_{t-p:t}\} \mid \mathcal{F}_{t-p-1}, \{Y_{t-p:t}(W_{1:t-p-1}, w_{t-p:t}), w_{t-p:t} \in \mathcal{W}^{p+1}\}$. The causal response function is, if it exists,

$$\mathrm{CRF}_{t,p}(w, w') := E\left[ (Y_t - Y'_t) \mid W_{t-p} = w, W'_{t-p} = w', \mathcal{F}_{t-p-1} \right], \tag{6}$$

where the expectation is generated by $\{Y_t, W_{t-p:t}, Y'_t, W'_{t-p:t}\} \mid \mathcal{F}_{t-p-1}$.

Temporally averaging these causal effects produces the estimands:

$$\bar{\tau}^*_p(w, w') = \frac{1}{T-p} \sum_{t=p+1}^{T} \tau^*_{t,p}(w, w'), \qquad \mathrm{CRF}_p(w, w') = \frac{1}{T-p} \sum_{t=p+1}^{T} \mathrm{CRF}_{t,p}(w, w'), \tag{7}$$

which we label the lag-$p$ average weighted causal effect and the lag-$p$ average causal response function, respectively.

The lag-$p$ average weighted causal effect $\bar{\tau}^*_p(w, w')$ is a finite sample dynamic causal estimand, invoking no stochastic model for the potential outcomes. Intuitively, it describes the observed, historical causal effects. $\mathrm{CRF}_p(w, w')$ is a superpopulation quantity. The difference between superpopulation and finite sample causal estimands is subtle and increasingly emphasized in microeconomics (Aronow and Samii, 2016; Abadie et al., 2020). Here we introduce this distinction into time series.

We now make an additional assumption about the selected weights that places restrictions on the relationship between the counterfactual and treatment paths, enabling us to simplify the expressions for the weighted causal effect and the causal response function.

Assumption 4.

For a potential outcome time series assume that:

1. $\{Y'_t, W'_{t-p:t}\} \mid \mathcal{F}_{t-p-1} \overset{L}{=} \{Y_t, W_{t-p:t}\} \mid \mathcal{F}_{t-p-1}$, where $Y_t := Y_t(W_{1:t})$, $Y'_t := Y_t(W_{1:t-p-1}, W'_{t-p:t})$.
2. $\{Y_t, W_{t-p:t}\} \perp\!\!\!\perp W'_{t-p} \mid \mathcal{F}_{t-p-1}$, and $\{Y'_t, W'_{t-p:t}\} \perp\!\!\!\perp W_{t-p} \mid \mathcal{F}_{t-p-1}$.

Assumption 4.2 means that the treatment path and outcome are independent of the time $t-p$ counterfactual, given the past.

Lemma 3.1.

For a potential outcome time series, if Assumption 4 holds and the expectations exist, then

$$
\begin{aligned}
\tau^*_{t,p}(w, w') ={}& E\left[ Y_t \mid \mathcal{F}_{t-p-1}, W_{t-p} = w, \{Y_{t-p:t}(W_{1:t-p-1}, w_{t-p:t}), w_{t-p:t} \in \mathcal{W}^{p+1}\} \right] \\
&- E\left[ Y_t \mid \mathcal{F}_{t-p-1}, W_{t-p} = w', \{Y_{t-p:t}(W_{1:t-p-1}, w_{t-p:t}), w_{t-p:t} \in \mathcal{W}^{p+1}\} \right],
\end{aligned}
$$

where the expectations are from $W_{t-p:t} \mid \mathcal{F}_{t-p-1}, \{Y_{t-p:t}(W_{1:t-p-1}, w_{t-p:t}), w_{t-p:t} \in \mathcal{W}^{p+1}\}$. Likewise,

$$\mathrm{CRF}_{t,p}(w, w') = E[Y_t \mid \mathcal{F}_{t-p-1}, W_{t-p} = w] - E[Y_t \mid \mathcal{F}_{t-p-1}, W_{t-p} = w'],$$

where the expectations are from the law of $\{Y_t(W_{1:t}), W_{t-p}\} \mid \mathcal{F}_{t-p-1}$.

Proof.

Given in Appendix A.

Lemma 3.1 shows that under Assumption 4 the $\mathrm{CRF}_{t,p}(w, w')$ is the same as the “generalized impulse response function” of Koop et al. (1996) when $w' = 0$, but those authors have no broad discussion of causality. The $\mathrm{CRF}_{t,p}(w, w')$ is also similar in spirit to the “average policy effect” in Angrist et al. (2018) where $w, w'$ are discrete. However, the “average policy effect” is not explicitly defined in terms of treatment paths.

A simple $T^{2/5}$-consistent and asymptotically Gaussian kernel estimator of the finite sample, average weighted causal effect $\bar{\tau}^*_p(w, w')$ is developed in Appendix B for continuous $w, w'$. This means that the average dynamic causal effects can be nonparametrically identified solely from assuming a potential outcome time series. No further assumptions on the potential outcomes, such as stationarity, linearity or shocks, are needed. Those auxiliary assumptions on the potential outcomes may improve the efficiency of estimation, but they are not fundamental to causal identification of the average weighted causal effect $\bar{\tau}^*_p(w, w')$. This is a conceptually important point.

We now link the CRF to the impulse response function (IRF), which was introduced by Sims (1980) for vector autoregressions (Ramey, 2016; Stock and Watson, 2016; Kilian and Lutkepohl, 2017). We first give the IRF definition.

Definition 9 (Impulse response function). Assume $\{Y_t, W_t\}$ is strictly stationary and

$$\mathrm{IRF}_p(w, w') := E[Y_t \mid W_{t-p} = w] - E[Y_t \mid W_{t-p} = w']$$

exists, where here $E[\cdot]$ is calculated from the joint law of $Y_t, W_{t-p}$. Then, $\mathrm{IRF}_p(w, w')$ is an impulse response function (IRF).

The IRF is commonly viewed as tracing out the dynamic causal effect of the treatment on the outcome. However, the IRF does not have causal meaning without additional assumptions as it is just the difference of two conditional expectations. In contrast, a causal effect measures what would happen if $W_{t-p}$ is moved from $w$ to $w'$. This is well known, as IRFs are typically used in the context of parametrized causal models such as the structural vector moving average.

With that said, Theorem 3.1 gives the IRF a nonparametric causal meaning by linking it to the CRF.

Theorem 3.1.

Assume $\{Y_t, W_t\}$ is a stationary potential outcome time series and that Assumption 4 holds. Then, if the expectations exist,

$$E[\mathrm{CRF}_{t,p}(w, w')] = \mathrm{IRF}_p(w, w'),$$

where the expectation is generated by the stationary distribution of treatments and outcomes.

Proof. If the expectations exist, then $E[\mathrm{CRF}_{t,p}(w, w')] = E[Y_t(W_{1:t}) \mid W_{t-p} = w] - E[Y_t(W_{1:t}) \mid W_{t-p} = w']$, and the RHS is the IRF.

Here, $\mathcal{F}_{t-p-1}$ is averaged out by stationarity, implying the causal measure holds universally. Hence, if we add stationarity to the potential outcome time series assumption, we can nonparametrically estimate the impulse response function by the difference of a kernel regression of $Y_t$ on $W_{t-p}$ (Robinson, 1983; Fan and Yao, 2006) evaluated at $w$ and $w'$, respectively, converging at, again, $T^{2/5}$, the standard nonparametric rate. However, this rate is not an improvement over what could be obtained for the average weighted causal effect $\bar{\tau}^*_p(w, w')$ without stationarity.

3.4 Example: linear potential outcomes and shocked treatments

Here we detail the properties of the weighted causal effect and the causal response function under special features such as linear potential outcomes and shocked treatments. These two assumptions are crucial, as most empirical dynamic causal work in economics is carried out using linear models under the assumption that treatments are shocks. It is this restriction that will eventually allow a parametric rate of convergence.
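As a numerical sketch of the linear, shocked case studied in this subsection, the code below estimates the impulse response function by the difference of two kernel regressions of $Y_t$ on $W_{t-p}$. It is our own illustration, not the estimator developed in the paper's appendix: the Nadaraya-Watson form, the Gaussian kernel, the fixed bandwidth `h`, the function names and all parameter values are assumptions made for the example.

```python
import numpy as np


def simulate_linear_shocked(T, beta=(1.0, 0.5), rho_u=0.5, seed=0):
    """Time-invariant, linear potential outcomes with iid mean-zero treatments.

    Y_t = U_t + sum_s beta[s] * W_{t-s}, with U_t an AR(1) process that is
    independent of the treatments. All values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal(T)
    e = rng.standard_normal(T)
    U = np.zeros(T)
    for t in range(1, T):
        U[t] = rho_u * U[t - 1] + e[t]
    Y = U.copy()
    for s, b in enumerate(beta):
        Y[s:] += b * W[:T - s]
    return Y, W


def nadaraya_watson(x, y, x0, h):
    """Gaussian-kernel estimate of E[y | x = x0] with bandwidth h."""
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(k * y) / np.sum(k)


def kernel_irf(Y, W, p, w, w_prime, h=0.2):
    """Difference of kernel regressions of Y_t on W_{t-p} at w and w_prime."""
    y, x = Y[p:], W[:len(W) - p]
    return nadaraya_watson(x, y, w, h) - nadaraya_watson(x, y, w_prime, h)


if __name__ == "__main__":
    Y, W = simulate_linear_shocked(T=100_000, seed=4)
    for p in (0, 1):
        print(f"p={p}: kernel IRF estimate at (w, w') = (1, 0):",
              round(kernel_irf(Y, W, p, 1.0, 0.0), 3))
```

Under these assumptions the target is $\beta_p (w - w')$, that is 1 at $p = 0$ and 0.5 at $p = 1$ for $(w, w') = (1, 0)$. The kernel estimates should be close to these values, with a small downward smoothing bias that shrinks as `h` is reduced.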

Example 3 (continuing from p. 11). Under linear potential outcomes, the weighted causal effect becomes

$$\tau^*_{t,p}(w, w') = \beta_{t,p}(w - w') + \sum_{s=0}^{p-1} \beta_{t,s} \{\mu_{t-s|t-p-1}(w) - \mu_{t-s|t-p-1}(w')\},$$

where $\mu_{t-s|t-p-1}(w) = E[W_{t-s} \mid \mathcal{F}_{t-p-1}, W_{t-p} = w, \{Y_{t-p:t}(W_{1:t-p-1}, w_{t-p:t}), w_{t-p:t} \in \mathcal{W}^{p+1}\}]$, and the causal response function becomes

$$CRF_{t,p}(w, w') = \beta_{t,p}(w - w') + \sum_{s=0}^{p-1} \beta_{t,s} \{E[W_{t-s} \mid \mathcal{F}_{t-p-1}, W_{t-p} = w] - E[W_{t-s} \mid \mathcal{F}_{t-p-1}, W_{t-p} = w']\},$$

assuming all the relevant moments exist. Likewise, for a linear, shocked potential outcome time series,

$$\tau^*_{t,p}(w, w') = CRF_{t,p}(w, w') = \beta_{t,p}(w - w') = \tau^{I}_{t,p}(w, w').$$

Under a time-invariant, linear, stationary potential outcome time series,

$$IRF_p(w, w') = \beta_p(w - w') + \sum_{s=0}^{p-1} \beta_s \{E[W_{t-s} \mid W_{t-p} = w] - E[W_{t-s} \mid W_{t-p} = w']\}.$$

For a time-invariant, linear, stationary, shocked potential outcome time series,

$$IRF_p(w, w') = \tau^*_{t,p}(w, w') = CRF_{t,p}(w, w') = \beta_p(w - w').$$

This example shows that if treatments are shocks and potential outcomes are linear, then

$$CRF_p(w, w') = \bar{\tau}^*_p(w, w') = (w - w') \frac{1}{T-p} \sum_{t=p+1}^{T} \beta_{t,p}.$$

Thus estimating $CRF_p(w, w')$ or $\bar{\tau}^*_p(w, w')$ will be estimating the temporal average of $\beta_{t,p}$. The time series properties of the outcomes (which include $\{U_t\}$) do not drive this result; it is the properties of the treatments and the linear potential outcomes which determine it.

3.5 Local projection estimator of causal estimands

Here we use the shocked potential outcome time series to provide a formal, causal interpretation of the "local projections" estimator, which is commonly used in economics. This estimator directly regresses the observed outcome on the observed treatment at a variety of lags, interpreting the coefficients on the lagged treatments as estimates of dynamic causal effects (Jordá, 2005; Ramey, 2016; Stock and Watson, 2018).

Theorem 3.2 (Local projection). Assume $\{Y_t, W_t\}$ is a shocked potential outcome time series where $K = 1$, $E(Y_t^2) < \infty$, $0 < E(W_{t-p}^2) < \infty$, $p = 0, 1, 2, \ldots$. Construct $\beta^L_{t,p} = E(Y_t W_{t-p}) / E(W_{t-p}^2)$ and the mean-zero error $U^L_t := Y_t W_{t-p} - \beta^L_{t,p} W_{t-p}^2$. Assume that $\{U^L_t\}$, $\{W_{t-p}^2\}$ are ergodic processes and $\beta^U_p$ (Definition 6) exists. If $T \to \infty$, then

$$\hat{\beta}^{OLS}_p = \frac{\sum_{t=p+1}^{T} Y_t W_{t-p}}{\sum_{t=p+1}^{T} W_{t-p}^2} \xrightarrow{p} \beta^U_p.$$

If $T^{-1/2} \sum_{t=p+1}^{T} U^L_t = O_p(1)$ and $T^{-1} \sum_{t=p+1}^{T} W_{t-p}^2 \xrightarrow{p} \sigma^2_W > 0$, then $\hat{\beta}^{OLS}_p$ is $T^{1/2}$-consistent for $\beta^U_p$.

Proof. The probability limit is by construction. The convergence rate is a standard calculation.

By construction, $\hat{\beta}^{OLS}_p$ estimates, at the parametric rate, the universal $\beta^U_p$ from Definition 6. However, $\beta^U_p$ only has indirect causal meaning, through the definition $\tau^U_{t,p}(w, w')$ in Example 3. If we further assume a linear potential outcome time series, then this has a direct causal meaning.

Corollary 3.1.

Maintain the same conditions as Theorem 3.2 and strengthen $\{Y_t, W_t\}$ to a shocked, linear potential outcome time series. If $T \to \infty$, then

$$\hat{\beta}^{OLS}_p \xrightarrow{p} \beta^{U*}_p = \frac{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} CRF_{t,p}(1, 0) \, E(W_{t-p}^2)}{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} E(W_{t-p}^2)}.$$

If $T^{-1/2} \sum_{t=p+1}^{T} U^L_t = O_p(1)$ and $T^{-1} \sum_{t=p+1}^{T} W_{t-p}^2 \xrightarrow{p} \sigma^2_W > 0$, then $\hat{\beta}^{OLS}_p$ is $T^{1/2}$-consistent for $\beta^{U*}_p$.

Proof. The strengthening to linearity implies $\beta^L_{t,p} = \beta_{t,p} = CRF_{t,p}(1, 0)$ and $\beta^U_p = \beta^{U*}_p$, so the result follows from Theorem 3.2.

Under a shocked, linear potential outcome time series, $\beta^{U*}_p$ is the temporal weighted average of $CRF_{t,p}(1, 0)$, with weights $E(W_{t-p}^2)$. It is $\lim_{T \to \infty} CRF_p(1, 0)$ if $E(W_{t-p}^2)$ is time-invariant. If $\beta_{t,p} = \beta_p$, then $\hat{\beta}^{OLS}_p \xrightarrow{p} \beta_p$ irrespective of the variation of $E(W_{t-p}^2)$.

Footnote. Local projection is related to, but different from, the literature on direct forecasting, which forecasts $Y_t$ by regressing on $Y_{t-p}$ rather than iterating one-step-ahead forecasts $p$ times (Cox, 1961; Marcellino et al., 2006).

A major concern is that precisely measuring the treatment may be very difficult (Jordá et al., 2015; Stock and Watson, 2018; Plagborg-Møller and Wolf, 2018). Here, we use the instrumented potential outcome time series to provide a causal interpretation of LP-IV.
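The local projection estimator of Theorem 3.2 is a no-intercept regression of $Y_t$ on $W_{t-p}$, run one lag at a time. A minimal sketch on simulated data follows; the data-generating process (time-invariant coefficients `beta`, i.i.d. Gaussian shocks, AR(1) outcome noise independent of the shocks) and the helper name `local_projection` are assumptions made for illustration, not taken from the paper. With time-invariant $\beta_{t,p} = \beta_p$, the estimates should settle near `beta`.

```python
import numpy as np


def local_projection(y, w, p):
    """No-intercept OLS slope of Y_t on W_{t-p}: sum(Y_t W_{t-p}) / sum(W_{t-p}^2)."""
    y_t = y[p:]                  # Y_t for every t with an observed W_{t-p}
    w_lag = w[: len(w) - p]      # W_{t-p}, aligned element-wise with y_t
    return np.sum(y_t * w_lag) / np.sum(w_lag ** 2)


# Hypothetical linear, shocked outcome series (assumed for illustration).
rng = np.random.default_rng(1)
T = 50_000
beta = np.array([1.0, 0.6, 0.3, 0.1])   # assumed lag coefficients beta_0..beta_3
w = rng.standard_normal(T)              # treatments: i.i.d. mean-zero shocks

# Serially correlated outcome noise U_t, generated independently of the shocks.
eps = rng.standard_normal(T)
u = np.empty(T)
u[0] = eps[0]
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]

y = u.copy()
for s, b in enumerate(beta):
    y[s:] += b * w[: T - s]             # adds beta_s * W_{t-s} to Y_t for t >= s

# One local projection per horizon p.
lp_hat = np.array([local_projection(y, w, p) for p in range(len(beta))])
```

The autocorrelation in `u` is deliberate: in this sketch the time series properties of the outcome noise do not affect where the estimates converge, only how precisely they do.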

Theorem 3.3 (LP-IV). Suppose $\{Y_t, V_t\}$ is a shocked, linear, instrumented potential outcome time series where, for each $t = 1, 2, \ldots, T$, $V_t = (W_t, \hat{W}_t)$, $E(Y_t^2) < \infty$, $0 < E(W_t^2) < \infty$, $0 < E(\hat{W}_t^2) < \infty$. For each $t = 1, 2, \ldots, T$ and $p = 0, 1, \ldots, t-1$ construct $\beta^L_{t,p} = E(Y_t W_{t-p}) / E(W_{t-p}^2)$, $\eta^L_t := (Y_t - \beta^L_{t,p} W_{t-p}) \hat{W}_{t-p}$ and $\zeta^L_t := \beta^L_{t,p} \{W_{t-p} \hat{W}_{t-p} - E(W_{t-p} \hat{W}_{t-p})\}$, and assume that

$$\beta^{\gamma}_p := \lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} \beta^L_{t,p} E(W_{t-p} \hat{W}_{t-p})$$

exists. If $\{\eta^L_t\}$ and $\{\zeta^L_t\}$ are ergodic and $\beta^{\gamma}_0 \neq 0$, then

$$\hat{\beta}^{IV}_p = \frac{\sum_{t=p+1}^{T} Y_t \hat{W}_{t-p}}{\sum_{t=p+1}^{T} Y_{t-p} \hat{W}_{t-p}} \xrightarrow{p} \beta^{IV}_p := \frac{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} CRF_{t,p}(1, 0) \, E(W_{t-p} \hat{W}_{t-p})}{\lim_{T \to \infty} \frac{1}{T-p} \sum_{t=p+1}^{T} CRF_{t,0}(1, 0) \, E(W_{t-p} \hat{W}_{t-p})}.$$

If, additionally, $T^{-1/2} \sum_{t=p+1}^{T} \eta^L_t = O_p(1)$ and $T^{-1} \sum_{t=p+1}^{T} Y_{t-p} \hat{W}_{t-p} \xrightarrow{p} \beta^{\gamma}_0$, then $\hat{\beta}^{IV}_p$ is $T^{1/2}$-consistent for $\beta^{IV}_p$.

Proof. Given in Appendix A.

Under a shocked, linear, instrumented potential outcome time series, $\beta^{IV}_p$ is the ratio of the weighted average of the $CRF_{t,p}(1, 0)$, with weights $E(W_{t-p} \hat{W}_{t-p})$, to the weighted average of $CRF_{t,0}(1, 0)$. If $E(W_{t-p} \hat{W}_{t-p})$ is time-invariant, then $\beta^{IV}_p = \lim_{T \to \infty} CRF_p(1, 0) / \lim_{T \to \infty} CRF_0(1, 0)$. In the LP-IV literature it is conventional to take $\beta_{t,0} = 1$ (e.g. Stock and Watson (2018)), which would mean that $\beta^{IV}_p = \lim_{T \to \infty} CRF_p(1, 0)$ [...] $\beta^{IV}_p$. A sufficient condition to rule out such behavior is $E(W_{t-p} \hat{W}_{t-p}) \geq 0$ for all $t$, which is a sign restriction and is similar in spirit to the "monotonicity" assumption found in the LATE literature on cross-sectional instrumental variables (Imbens and Angrist, 1994; Angrist et al., 1996). Whether such a restriction is reasonable will depend on the empirical application.

Remark 2.1 says that the instrumented potential outcome time series implies $\hat{W}_{t-p}$ is uncorrelated with the counterfactual. This lack of correlation is needed for the LP-IV to be causal. Otherwise,

$$\hat{\beta}^{IV}_p \xrightarrow{p} \frac{\beta^{\gamma}_p + \beta'_p}{\beta^{\gamma}_0}, \quad \text{where } \frac{1}{T-p} \sum_{t=p+1}^{T} E(U_t \hat{W}_{t-p}) \to \beta'_p.$$

Remark 3.1 (Lead-lag exogeneity). The need for the condition that $Cov(\hat{W}_{t-p}, Y_t(W_{1:t-p-1}, w_{t-p:t})) = 0$ is seemingly missing from the LP-IV literature. Instead, the existing literature typically uses a "lead-lag exogeneity" assumption that $Cov(W_t, \hat{W}_s) = 0$ for all $t \neq s$. Unfortunately, lead-lag exogeneity plus the assumption that $\{Y_t, W_t\}$ is a shocked potential outcome time series does not imply that $Cov(\hat{W}_{t-p}, Y_t(W_{1:t-p-1}, w_{t-p:t})) = 0$. This assumption is implied by the lead-lag exogeneity assumption in the tightly parameterized setting studied by the existing literature on LP-IV (i.e., outcomes that are generated by a structural moving average in the treatment, where the treatments are white noise). Our analysis shows that lead-lag exogeneity is not sufficient in more general causal models.

In this paper, we adapted the nonparametric potential outcomes time series framework for experiments to formalize dynamic causal effects in observational time series data. We did so by introducing three crucial special cases of the potential outcome time series: instruments, shocks and linearity. Further, we deepened our understanding of dynamic causal effects by developing a fourth idea: the finite sample weighted causal effect and its superpopulation analogue, the causal response function.

These four ideas give nonparametric causal meaning to the impulse response function, which is a major device for economists to measure dynamic causal effects. Further, we used this framework to provide a causal interpretation of the implicit estimand of the local projections estimator. Finally, we made two important contributions to the literature on LP-IV. We showed that the LP-IV estimator identifies a weighted average of dynamic causal effects, where the weights depend on the possibly time-varying relationship between the instrument and the treatment. We also showed that typical assumptions (i.e. lead-lag exogeneity) are not sufficient to identify a causally interpretable estimand because they do not enforce that the instrument is independent of the counterfactual given the past.

References

Abadie, A., S. C. Athey, G. W. Imbens, and J. Wooldridge (2017). When should you adjust standarderrors for clustering? Technical report.Abadie, A., S. C. Athey, G. W. Imbens, and J. Wooldridge (2020). Sampling-based vs. design-baseduncertainty in regression analysis.

Econometrica . Forthcoming.Abadie, A. and M. D. Cattaneo (2018). Econometric methods for program evaluation.

Annual Reviewof Economics 10 , 465–503.Abbring, J. H. and J. J. Heckman (2007). Econometric evaluation of social programs, part III:Using the marginal treatment eﬀect to organize alternative econometric estimators to evaluate socialprograms, and to forecast their eﬀects in new environments. In J. J. Heckman and E. E. Leamer(Eds.),

Handbook of Econometrics , Volume 6, pp. 5145–5303. Amsterdam, The Netherlands: NorthHolland.Abbring, J. H. and G. J. van den Berg (2003). The nonparametric identiﬁcation of treatment eﬀectsin duration models.

Econometrica 71 , 1491–1517.Anatolyev, S. and A. Petukhov (2016). Uncovering the skewness news impact curve.

The Journal ofFinancial Econometrics 14 , 746–771.Angrist, J. D., G. W. Imbens, and D. B. Rubin (1996). Identiﬁcation of causal eﬀects using instru-mental variables.

Journal of the American Statistical Association 91 , 444–455.Angrist, J. D., O. Jorda, and G. M. Kuersteiner (2018). Semiparametric estimates of monetary policyeﬀects: String theory revisited.

Journal of Business and Economic Statistics 36 , 381–387.Angrist, J. D. and A. B. Krueger (2001). Instrumental variables and the search for identiﬁcation:From supply and demand to natural experiments.

Journal of Economic Perspectives 15 , 69–85.Angrist, J. D. and G. M. Kuersteiner (2011). Causal eﬀects of monetary shocks: Semiparametricconditional independence tests with a multinomial propensity score.

Review of Economics andStatistics 93 , 725–747.Angrist, J. D. and J.-S. Pischke (2009).

Mostly Harmless Econometrics: An Empiricist’s Companion .Princeton: Princeton University Press.Aronow, P. M. and C. D. Samii (2016). Does regression produce representative estimates of causaleﬀects?

American Journal of Political Science 60 , 250–267.Athey, S. C. and G. W. Imbens (2017). The state of applied econometrics: Causality and policyevaluation.

Journal of Economic Perspectives 31 , 3–32.Barro, R. and C. Redlick (2011). Macroeconomic eﬀects from government purchases and taxes.

The Quarterly Journal of Economics 126, 51–102.

Blackwell, M. and A. Glynn (2018). How to make causal inferences with time-series and cross-sectional data.

American Political Science Review 112 , 1067–1082.Bojinov, I. and N. Shephard (2019). Time series experiments and causal estimands: Exact random-ization tests and trading.

Journal of the American Statistical Association 114, 1665–1682.

Brodersen, K. H., F. Gallusser, J. Koehler, N. Remy, and S. L. Scott (2015). Inferring causal impact using Bayesian structural time-series models.

The Annals of Applied Statistics 9 , 247–274.Bordalo, P., N. Gennaioli, and A. Shleifer (2018). Diagnostic expectations and credit cycles.

Journalof Finance 73 , 199–227.Boruvka, A., D. Almirall, K. Witkiwitz, and S. A. Murphy (2018). Assessing time-varying causal eﬀectmoderation in mobile health.

Journal of the American Statistical Association 113 , 1112–1121.Box, G. E. P. and G. C. Tiao (1975). Intervention analysis with applications to economic and envi-ronmental problems.

Journal of the American Statistical Association 70 , 70–79.Brodersen, K., F. Gallusser, J. Koehler, N. Remy, and S. Scott (2015). Inferring causal impact usingBayesian structural time-series models.

The Annals of Applied Statistics 9 , 247–274.Cattaneo, M. (2010). Eﬃcient semiparametric estimation of multi-level treatment eﬀects under ignor-ability.

Journal of Econometrics 155 , 138–154.Chamberlain, G. (1982). The general equivalence of Granger and Sims causality.

Econometrica 50 ,1305–1324.Christiano, L. J., M. S. Eichenbaum, and C. L. Evans (1999). Monetary policy shocks: What have welearned and to what end? In J. B. Taylor and M. Woodford (Eds.),

Handbook of Macroeconomics ,Volume 1A, pp. 65–148. Amsterdam, The Netherlands: North Holland.Christiano, L. J., M. S. Eichenbaum, and C. L. Evans (2005). Nominal rigidities and the dynamiceﬀects of a shock to monetary policy.

Journal of Political Economy 113 , 1–45.Cox, D. R. (1958).

Planning of Experiments . Oxford, United Kingdom: Wiley.Cox, D. R. (1961). Prediction by exponentially weighted moving averages and related methods.

Journalof the Royal Statistical Society, Series B 23 , 414–422.Durbin, J. (1954). Errors in variables.

Review of the Institute of International Statistics 22 , 23–54.Engle, R. F., D. F. Hendry, and J. F. Richard (1983). Exogeneity.

Econometrica 51 , 277–304.Engle, R. F., T. Ito, and W. L. Lin (1990). Meteor showers or heat waves? Heteroskedastic intra-dayvolatility in the foreign exchange market.

Econometrica 58 , 525–542.Engle, R. F. and V. K. Ng (1993). Measuring and testing the impact of news on volatility.

Journal ofFinance 48 , 1749–1778.Fan, J. and Q. Yao (2006).

Nonlinear Time Series: Nonparametric and Parametric Methods . NewYork: Springer.Fieldhouse, A., K. Mertens, and M. Ravn (2018). The macroeconomic eﬀects of government assetpurchases: Evidence from postwar U.S. housing credit policy.

Quarterly Journal of Economics 133 ,1503–1560.Fisher, R. A. (1925).

Statistical Methods for Research Workers (1 ed.). London, United Kingdom: Oliver and Boyd.

Fisher, R. A. (1935).

Design of Experiments (1 ed.). London, United Kingdom: Oliver and Boyd.Frangakis, C. E. and D. B. Rubin (1999). Addressing complications of intention-to-treat analysis inthe combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes.

Biometrika 86 , 365–379.Frisch, R. (1933).

Propagation Problems and Impulse Problems in Dynamic Economics . London,United Kingdom: Allen and Unwin.Fuster, A., D. I. Laibson, and B. Mendel (2010). Natural expectations and macroeconomic ﬂuctuations.

Journal of Economic Perspectives 24 , 67–84.Gallant, A. R., P. E. Rossi, and G. Tauchen (1993). Nonlinear dynamic structures.

Econometrica 61 ,871–907.Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectralmethods.

Econometrica 37 , 424–438.Hall, P. and C. C. Heyde (1980).

Martingale Limit Theory and its Applications . San Diego, California,USA: Academic Press.Hamilton, J. D. (2003). What is an oil shock?

Journal of Econometrics 113 , 363–398.Hamilton, J. D. (2013). Historical oil price shocks. In R. E. Parker and R. M. Whaples (Eds.),

Routledge Handbook of Major Events in Economic History , pp. 239–265. Abingdon, Oxfordshire:Routledge. Also: NBER Working Paper, 2011.Harvey, A. C. (1996). Intervention analysis with control groups.

International Statistical Review 64 ,313–328.Harvey, A. C. and J. Durbin (1986). The eﬀects of seat belt legislation on British road casualties:A case study in structural time series modelling.

Journal of the Royal Statistical Society, SeriesA 149 , 187–227.Heckman, J. J., J. E. Humphries, and G. Veramendi (2016). Dynamic treatment eﬀects.

Journal ofEconometrics 191 , 276–292.Heckman, J. J. and S. Navarro (2007). Dynamic discrete choice and dynamic treatment eﬀects.

Journalof Econometrics 136 , 341–396.Hendry, D. F. (2017). Granger causality.

European Journal of Pure and Applied Mathematics 10 ,12–29.Herbst, E. and F. Schorfheide (2015).

Bayesian Estimation of DSGE Models . Princeton, New Jersey,USA: Princeton University Press.Hernan, M. A. and J. M. Robins (2019).

Causal Inference . Boca Raton, Florida, USA: Chapman &Hall. Forthcoming.Hirano, K. and G. W. Imbens (2004). The propensity score with continuous treatments. In A. Gelmanand X.-L. Meng (Eds.),

Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, pp. 73–84. Hoboken, New Jersey, USA: John Wiley.

Horvitz, D. G. and D. J. Thompson (1952). A generalization of sampling without replacement from a finite universe.

Journal of the American Statistical Association 47 , 663–685.Imbens, G. W. and J. D. Angrist (1994). Identiﬁcation and estimation of local average treatmenteﬀects.

Econometrica 62 , 467–475.Imbens, G. W. and D. B. Rubin (2015).

Causal Inference for statistics, social and biomedical sciences:An introduction . Cambridge, United Kingdom: Cambridge University Press.Jord´a, O. (2005). Estimation and inference of impulse responses by local projections.

AmericanEconomic Review 95 , 161–182.Jord´a, O., M. Schularick, and A. M. Taylor (2015). Betting the house.

Journal of InternationalEconomics 96 , S2–S18.Kempthorne, O. (1955). The randomization theory of experimental inference.

Journal of the AmericanStatistical Association 50 , 946–967.Kilian, L. and H. Lutkepohl (2017).

Structural Vector Autoregressive Analysis . Cambridge, UnitedKingdom: Cambridge University Press.Koop, G., M. H. Pesaran, and S. M. Potter (1996). Impulse response analysis in nonlinear multivariatemodels.

Journal of Econometrics 74 , 119–147.Kothari, S. P. (2001). Capital markets research in accounting.

Journal of Accounting and Eco-nomics 31 , 105–231.Kothari, S. P., J. Lewellen, and J. Warner (2006). Stock returns, aggregate earnings surprises, andbehavioral ﬁnance.

Journal of Financial Economics 79 , 537–568.Kuersteiner, G. (2010). Granger-Sims causality. In S. N. Durlauf and L. Blume (Eds.),

Macroeconomicsand Time Series Analysis , pp. 119–134. London, United Kingdom: Palgrave Macmillian.Kuersteiner, G., D. Phillips, and M. Villamizar-Villegas (2018). Eﬀective sterilized foreign exchangeintervention? Evidence from a rule-based policy.

Journal of International Economics 118 , 118–138.Lechner, M. (2011). The relation of diﬀerent concepts of causality used in time series and microeco-nomics.

Econometric Reviews 30 , 109–127.Lu, X., L. Su, and H. White (2017). Granger causality and structural causality in cross-section andpanel data.

Econometric Theory 33 , 263–291.Lucas, R. E. (1972). Expectations and the neutrality of money.

Journal of Economic Theory 4 ,103–124.Marcellino, M., J. H. Stock, and M. W. Watson (2006). A comparison of direct and iterated multistepAR methods for forecasting macroeconomic time series.

Journal of Econometrics 135 , 499–526.Mertens, K. and J. L. Montiel Olea (2018). Marginal tax rates and income: New time series evidence.

The Quarterly Journal of Economics 133 , 1803–1884.Murphy, S. A. (2003). Optimal dynamic treatment regimes.

Journal of the Royal Statistical Society B 65, 331–366.

Murphy, S. A., M. J. van der Laan, J. M. Robins, and C. P. P. R. Group (2001). Marginal mean models for dynamic regimes.

Journal of the American Statistical Association 96 , 1410–1423.Muth, J. F. (1961). Rational expectations and the theory of price movements.

Econometrica 29 ,315–335.Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay onPrinciples. Section 9.

Statistical Science 5 , 465–472. Originally published 1923, republished in 1990,translated by Dorota M. Dabrowska and Terence P. Speed.Patton, A. and M. Verardo (2012). Does beta move with news? Firm-speciﬁc information ﬂows andlearning about proﬁtability.

The Review of Financial Studies 25 , 2789–2839.Plagborg-Møller, M. and C. K. Wolf (2018). Instrumental variable identiﬁcation of dynamic variancedecompositions. Unpublished paper: Department of Economics, Princeton University.Plagborg-Møller, M. and C. K. Wolf (2019). Local projections and VARs estimate the same impulseresponses. Unpublished paper: Department of Economics, Princeton University.Priestley, M. B. (1988).

Nonlinear and Non-stationary Time Series Analysis . London, United King-dom: Academic Press.Ramey, V. (2011). Identifying government spending shocks: It’s all in the timing.

The QuarterlyJournal of Economics 126 (1), 1–50.Ramey, V. A. (2016). Macroeconomics shocks and their propagation. In J. B. Taylor and H. Uhlig(Eds.),

Handbook of Macroeconomics , Volume 2A, pp. 71–162. Amsterdam, The Netherlands: NorthHolland.Ramey, V. A. and S. Zubairy (2018). Government spending multipliers in good times and in bad:Evidence from US historical data.

Journal of Political Economy 126 , 850–901.Ricciardi, F., A. Mattei, and F. Mealli (2020). Bayesian inference for sequential treatments underlatent sequential ignorability.

Journal of the American Statistical Association . Forthcoming.Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposureperiods: Application to control of the healthy worker survivor eﬀect.

Mathematical Modelling 7 ,1393–1512.Robins, J. M. (1994). Correcting for non-compliance in randomization trials using structural nestedmean models.

Communications in Statistics — Theory and Methods 23 , 2379–2412.Robins, J. M., S. Greenland, and F.-C. Hu (1999). Estimation of the causal eﬀect of a time-varyingexposure on the marginal mean of a repeated binary outcome.

Journal of the American StatisticalAssociation 94 , 687–700.Robins, J. M., M. A. Hernan, and B. Brumback (2000). Marginal structural models and causalinference in epidemiology.

Epidemiology 11 , 550–560.Robinson, P. M. (1983). Nonparametric estimators for time series.

Journal of Time Series Analysis 4, 185–207.

Roy, A. D. (1951). Some thoughts on the distribution of earnings.

Oxford Economic Papers 3 , 135–146.Rubin, D. B. (1974). Estimating causal eﬀects of treatments in randomized and nonrandomizedstudies.

Journal of Educational Psychology 66 , 688–701.Rubin, D. B. (1980). Randomization analysis of experimental data: The Fisher randomization testcomment.

Journal of the American Statistical Association 75 , 591–593.Rudd, P. A. (2000).

An Introduction to Classical Econometric Theory . Oxford: Oxford UniversityPress.Sargent, T. J. (1981). Interpreting economic time series.

Journal of Political Economy 89 , 213–248.Sims, C. A. (1972). Money, income and causality.

American Economic Review 62 , 540–552.Sims, C. A. (1980). Macroeconomics and reality.

Econometrica 48 , 1–48.Slutzky, E. (1937). The summation of random causes as the source of cyclic processes.

Econometrica 5 ,105–146.Smets, F. R. and R. Wouters (2003). An estimated dynamic stochastic general equilibrium model ofthe Euro area.

Journal of the European Economic Association 1 , 1123–1175.Smets, F. R. and R. Wouters (2007). Shocks and frictions in US business cycles: A Bayesian DSGEapproach.

American Economic Review 97 , 586–606.Stock, J. H. and M. W. Watson (2016). Dynamic factor models, factor-augmented vector autoregres-sions, and structural vector autoregressions in macroeconomics. In J. B. Taylor and H. Uhlig (Eds.),

Handbook of Macroeconomics , Volume 2A, pp. 415–525.Stock, J. H. and M. W. Watson (2018). Identiﬁcation and estimation of dynamic causal eﬀects inmacroeconomics using external instruments.

Economic Journal 128 , 917–948.Toulis, P. and D. C. Parkes (2016). Long-term causal eﬀects via behavioral game theory. 30thConference on Neural Information Processing Systems (NIPS’16).White, H. and X. Lu (2010). Granger causality and dynamic structural systems.

Journal of FinancialEconometrics 8 , 193–243.Wiener, N. (1956). The theory of prediction. In E. F. Beckenbeck (Ed.),

Modern Mathematics , pp.165–190. New York, USA: McGraw-Hill.Yang, S., G. W. Imbens, Z. Cui, D. E. Faries, and Z. Kadziola (2016). Propensity score matching andsubclassiﬁcation in observational studies with multilevel treatments.

Biometrics 72, 1055–1065.

Econometric analysis of potential outcomes time series: instruments, shocks, linearity and the causal response function

Online Appendix

Ashesh Rambachan Neil Shephard

A Appendix: a collection of proofs

Proof of Theorem 2.1.

Under a linear potential outcome time series,

$$Y_t(w_{1:t}) = U_t + \sum_{s=0}^{t-1} \beta_{t,s} w_{t-s},$$

so if $\{W_t\}$ is a MD sequence, then

$$E(Y_t W_{t-p}) = \beta_{t,p} E(W_{t-p}^2) + E(U_t W_{t-p}), \qquad E(Y_t) = E(U_t) = \alpha_t.$$

By the non-anticipating treatments property of the potential outcome time series, $E(U_t W_{t-p}) = 0$ so long as the moment exists. This delivers the required result using conventional arguments.

Proof of Lemma 3.1.

As the moments exist, $CRF_{t,p}(w, w')$ simplifies to

$$E[Y_t(W_{1:t-p-1}, w, W_{t-p+1:t}) \mid W_{t-p} = w, W'_{t-p} = w', \mathcal{F}_{t-p-1}] - E[Y_t(W_{1:t-p-1}, w', W'_{t-p+1:t}) \mid W_{t-p} = w, W'_{t-p} = w', \mathcal{F}_{t-p-1}].$$

Due to property 2 of the causal predictive weights,

$$E[Y_t(W_{1:t}) \mid \mathcal{F}_{t-p-1}, W_{t-p} = w, W'_{t-p} = w'] = E[Y_t(W_{1:t}) \mid \mathcal{F}_{t-p-1}, W_{t-p} = w].$$

Due to property 1 of the causal predictive weights,

$$E[Y_t(W_{1:t-p-1}, w', W'_{t-p+1:t}) \mid \mathcal{F}_{t-p-1}, W'_{t-p} = w'] = E[Y_t(W_{1:t}) \mid \mathcal{F}_{t-p-1}, W_{t-p} = w'].$$

The corresponding results for the weighted causal effect follow using the same logical arguments.

Proof of Theorem 3.3.

Define $\epsilon^L_{t,p} := Y_t - \beta^L_{t,p} W_{t-p}$; then, by the shock and instrument properties of the time series, $E(\epsilon^L_{t,p} \hat{W}_{t-p}) = 0$. So construct the zero-mean time series $\eta^L_t := \epsilon^L_{t,p} \hat{W}_{t-p}$ and $\zeta^L_t := \beta^L_{t,p} \{W_{t-p} \hat{W}_{t-p} - E(W_{t-p} \hat{W}_{t-p})\}$. Then

$$\frac{1}{T-p} \sum_{t=p+1}^{T} Y_t \hat{W}_{t-p} = \frac{1}{T-p} \sum_{t=p+1}^{T} \beta^L_{t,p} W_{t-p} \hat{W}_{t-p} + \frac{1}{T-p} \sum_{t=p+1}^{T} \eta^L_t.$$

If $\{\eta^L_t\}$ is ergodic, the latter sum disappears, while if $\{\zeta^L_t\}$ is ergodic then the former term converges to the limit of the expectations, as expected. Shocks plus linearity implies $\beta^L_{t,p} = \beta_{t,p} = CRF_{t,p}(1, 0)$.

B Appendix: estimation of $\bar{\tau}^*_p(w, w')$

B.1 Conditioning on the potential outcomes

Throughout this Section, $\mathcal{F}_{T,t}$ denotes the triangular filtration (pg. 53 of Hall and Heyde (1980)) generated by $\{W_{1:t}, Y_{1:t}, \{Y_{t+1:T}(W_{1:t}, w_{t+1:T}), w_{t+1:T} \in \mathcal{W}^{T-t}\}\}$. Recall

$$\bar{\tau}^*_p(w, w') = \frac{1}{T-p} \sum_{t=p+1}^{T} \tau^*_{t,p}(w, w'),$$

where

$$\tau^*_{t,p}(w, w') = E[Y_t \mid \mathcal{F}_{T,t-p-1}, W_{t-p} = w] - E[Y_t \mid \mathcal{F}_{T,t-p-1}, W_{t-p} = w'].$$

The expectations are over the treatment path, holding fixed the potential outcomes. Fixing the potential outcomes follows the microeconometrics tradition discussed by Imbens and Rubin (2015) and Abadie et al. (2017, 2020), and traces back to Fisher (1925, 1935) and Cox (1958). Bojinov and Shephard (2019) first introduced this type of approach into time series experiments.

Our task is to estimate $\tau^*_{t,p}(w, w')$ and $\bar{\tau}^*_p(w, w')$.

B.2 When W is discrete

B.2.1 Estimator

We start by assuming that $W$ is discrete and that the treatment is probabilistic.

Assumption 5 (Probabilistic treatment). For all $t \geq 1$, $\mathcal{F}_{T,t-1}$ and $w \in \mathcal{W}$, $p_t(w) := \Pr(W_t = w \mid \mathcal{F}_{T,t-1}) > 0$.

Assumption 5 is the analogue of the "overlap" assumption made in cross-sectional settings. Throughout we will regard $p_t(w)$ as known, which will be true in experimental settings and unlikely in observational ones, where $p_t(w)$ would need to be estimated.

Define a time series version of the classic Horvitz and Thompson (1952) style estimator

$$\hat{\bar{\tau}}^*_p(w, w') := \frac{1}{T-p} \sum_{t=p+1}^{T} \hat{\tau}^*_{t,p}(w, w'), \qquad \hat{\tau}^*_{t,p}(w, w') := \frac{Y_t \{1(W_{t-p} = w) - 1(W_{t-p} = w')\}}{p_{t-p}(W_{t-p})}. \quad (8)$$

This estimator appears in Angrist et al. (2018), but for a superpopulation estimand. Bojinov and Shephard (2019) also use a Horvitz and Thompson (1952) style estimator, but set up differently and for a different finite sample estimand. The results which follow are roughly in line with those in Bojinov and Shephard (2019), although the details differ. No new ideas are needed to generate the results.

B.2.2 Properties of $\hat{\tau}^*_{t,p}(w, w')$ and $\hat{\bar{\tau}}^*_p(w, w')$

The following theorem shows that $\hat{\tau}^*_{t,p}(w, w') - \tau^*_{t,p}(w, w')$ has martingale difference errors and hence $\hat{\bar{\tau}}^*_p(w, w')$ is unbiased, conditional on the potential outcomes.

Theorem B.1 (Properties of $\hat{\tau}^*_{t,p}(w, w')$). Assume a potential outcome time series and Assumption 5. Let $u_{t-p}(w, w') := \hat{\tau}^*_{t,p}(w, w') - \tau^*_{t,p}(w, w')$. Then, over the non-anticipating treatment path,

$$E[u_{t-p}(w, w') \mid \mathcal{F}_{T,t-p-1}] = 0, \quad \text{and} \quad E[\hat{\bar{\tau}}^*_p(w, w')] = \bar{\tau}^*_p(w, w'). \quad (9)$$

Further, $\eta_{t-p}(w, w') := Var[u_{t-p}(w, w') \mid \mathcal{F}_{T,t-p-1}]$ is

$$E\left( \frac{Y_t(W_{1:t-p-1}, w, W_{t-p+1:t})^2}{p_{t-p}(w)} \,\Big|\, \mathcal{F}_{T,t-p-1}, W_{t-p} = w \right) \quad (10)$$

$$+ \; E\left( \frac{Y_t(W_{1:t-p-1}, w', W_{t-p+1:t})^2}{p_{t-p}(w')} \,\Big|\, \mathcal{F}_{T,t-p-1}, W_{t-p} = w' \right) - \tau^{*2}_{t,p}. \quad (11)$$

Proof. We produce equation (9) by noting that

$$E\left( \frac{Y_t(W_{1:t-p-1}, w, W_{t-p+1:t}) \, 1(W_{t-p} = w)}{p_{t-p}(w)} \,\Big|\, \mathcal{F}_{T,t-p-1} \right) = E\{Y_t(W_{1:t-p-1}, w, W_{t-p+1:t}) \mid \mathcal{F}_{T,t-p-1}, W_{t-p} = w\} = E\{Y_t \mid \mathcal{F}_{T,t-p-1}, W_{t-p} = w\}.$$

The form of $\eta_{t-p}(w, w')$ is expected from the cross-sectional literature, and can be derived using the variance of a Bernoulli trial.

Thus, over the treatment path, conditioning on the entire path of all potential outcomes,

$$(T-p) \, Var(\hat{\bar{\tau}}^*_p(w, w') - \bar{\tau}^*_p(w, w') \mid \{Y_{1:T}(w_{1:T}), w_{1:T} \in \mathcal{W}^T\}) = \bar{\eta}_T(w, w'), \quad (12)$$

where

$$\bar{\eta}_T(w, w') = \frac{1}{T-p} \sum_{t=p+1}^{T} E(\eta_{t-p}(w, w') \mid \{Y_{1:T}(w_{1:T}), w_{1:T} \in \mathcal{W}^T\}).$$

So long as the conditional mean of $\eta_{t-p}(w, w')$ is bounded, the conditional variance of $\hat{\bar{\tau}}^*_p(w, w')$ will contract with $T$.

The following theorem, which just applies a triangular martingale difference central limit theorem, extends these results to where $T \to \infty$. It shows that $\hat{\bar{\tau}}^*_p(w, w')$ is consistent for $\bar{\tau}^*_p(w, w')$ and the estimator's error is asymptotically normal under weak conditions.

Theorem B.2.

Under the conditions of Theorem B.1, additionally assume that $\lim_{T \to \infty} \bar{\eta}_T(w, w') < \infty$. Then $\hat{\bar{\tau}}^*_p(w, w') - \bar{\tau}^*_p(w, w') \xrightarrow{p} 0$ as $T \to \infty$. Finally, if $\frac{1}{T-p} \sum_{t=p+1}^{T} \eta_{t-p}(w, w') \xrightarrow{p} \eta(w, w') > 0$, then, over the non-anticipating treatment path, as $T \to \infty$,

$$\frac{\sqrt{T} \{\hat{\bar{\tau}}^*_p(w, w') - \bar{\tau}^*_p(w, w')\}}{\sqrt{\eta(w, w')}} \xrightarrow{d} N(0, 1). \quad (13)$$

Proof. The first result follows from (12) as $\bar{\eta}_T(w, w')$ is bounded. The second follows from the martingale array CLT of Theorem 3.2 in Hall and Heyde (1980), as the potential outcomes are bounded, which means the Lindeberg condition holds.

Again, the only source of randomness here is the path of the treatments.

B.3 When W is continuous

B.3.1 Estimator

There is a modest literature on the nonparametric estimation of causal effects when treatments are continuous in cross-sectional and panel settings. For example, Hirano and Imbens (2004) study continuous treatments using "generalized propensity scores." Marginal structural models of Robins et al. (2000) provide parametric and series-based nonparametric strategies to deal with continuous treatments. Cattaneo (2010) provides an extensive discussion of the multivalued case and the related literature. Yang et al. (2016) is a recent paper on this topic.

Write

$$F_t(w) := \Pr(W_t \leq w \mid \mathcal{F}_{T,t-1}), \quad \text{and} \quad f_t(w) := \partial F_t(w) / \partial w.$$

Then, using a bandwidth $h > 0$, define the time-$t$ kernel regression estimator

$$\hat{\tau}^*_{t,p}(w, w') := \hat{g}_{t,p}(w) - \hat{g}_{t,p}(w'), \quad \text{and} \quad \hat{g}_{t,p}(w) := \frac{Y_t \, k_h(W_{t-p} - w)}{f_{t-p}(W_{t-p})}, \quad (14)$$

where $k_h(u) = h^{-1} k(u/h)$ is a kernel weight function, and the estimand is

$$\tau^*_{t,p}(w, w') = g_{t,p}(w) - g_{t,p}(w'), \quad \text{where} \quad g_{t,p}(W_{t-p}) := E(Y_t \mid \mathcal{F}_{T,t-p-1}, W_{t-p}).$$

In a moment we will use the definitions $g^{[2]}_{t,p}(W_{t-p}) = E(Y_t^2 \mid \mathcal{F}_{T,t-p-1}, W_{t-p})$, $\kappa_j = \int u^j k(u) \, du$, $b = \int k(u)^2 \, du$ and $k^*(x) = \int k(u) k(x + u) \, du$.

Theorem B.3 quantifies the variance and bias terms of the time-$t$ kernel regression estimator, holding the potential outcomes as fixed. The derivation of the result is entirely conventional from the kernel literature.

Theorem B.3.

Assume $h > 0$ and $f_{t-p}(w) > 0$ for all $w$. Define

$$\mu_{t,p}(w) := E\left[\hat{g}_{t,p}(w) \mid \mathcal{F}_{T,t-p-1}\right], \quad \sigma^2_{t,p}(w) := h \times \mathrm{Var}\left[\hat{g}_{t,p}(w) \mid \mathcal{F}_{T,t-p-1}\right],$$

$$c_{t,p}(w, w') := h \times \mathrm{Cov}\left(\hat{g}_{t,p}(w), \hat{g}_{t,p}(w') \mid \mathcal{F}_{T,t-p-1}\right),$$

where the expectations are over the treatment process $W_{t-p:t} \mid \mathcal{F}_{T,t-p-1}$, holding the potential outcomes fixed. If $u_{t,p}(w) := \hat{g}_{t,p}(w) - \mu_{t,p}(w)$ then

$$E(u_{t,p}(w) \mid \mathcal{F}_{T,t-p-1}) = 0, \quad \mathrm{Var}(u_{t,p}(w) \mid \mathcal{F}_{T,t-p-1}) = h^{-1} \sigma^2_{t,p}(w),$$

$$\mathrm{Cov}(u_{t,p}(w), u_{t,p}(w') \mid \mathcal{F}_{T,t-p-1}) = h^{-1} c_{t,p}(w, w').$$

Further, if $g_{t,p}(w)$ is twice continuously differentiable in $w$, $\kappa_0 = 1$, $\kappa_1 = 0$ and $h \downarrow 0$, then

$$\mu_{t,p}(w) \simeq g_{t,p}(w) + 0.5 h^2 g''_{t,p}(w) \kappa_2, \quad \sigma^2_{t,p}(w) \simeq \frac{g^{[2]}_{t,p}(w)}{f_{t-p}(w)}\, b,$$

$$c_{t,p}(w, w') \simeq \frac{1}{2} \left( \frac{g^{[2]}_{t,p}(w)}{f_{t-p}(w)} + \frac{g^{[2]}_{t,p}(w')}{f_{t-p}(w')} \right) k^*((w - w')/h).$$

Finally, if $k^*(x) = o(1)$ as $|x| \to \infty$, then $c_{t,p}(w, w') = o(1)$ if $w \ne w'$.

Proof. All but the last three results hold by definition. Now

$$\mu_{t,p}(w) = E\left[ \frac{Y_t\, k_h(W_{t-p} - w)}{f_{t-p}(W_{t-p})} \,\Big|\, \mathcal{F}_{T,t-p-1} \right] = h^{-1} \int g_{t,p}(x)\, k((x - w)/h)\,dx.$$

Transforming to $u = (x - w)/h$, so $x = w + hu$, we have

$$\mu_{t,p}(w) = \int g_{t,p}(w + hu)\, k(u)\,du \simeq g_{t,p}(w) + 0.5 h^2 g''_{t,p}(w) \kappa_2,$$

as $\kappa_0 = 1$ and $\kappa_1 = 0$. Likewise

$$E\left[ \left\{ \frac{Y_t\, k_h(W_{t-p} - w)}{f_{t-p}(W_{t-p})} \right\}^2 \Big|\, \mathcal{F}_{T,t-p-1} \right] = h^{-2} \int \frac{g^{[2]}_{t,p}(x)}{f_{t-p}(x)}\, k((x - w)/h)^2\,dx = h^{-1} \int \frac{g^{[2]}_{t,p}(w + hu)}{f_{t-p}(w + hu)}\, k(u)^2\,du \simeq h^{-1} \frac{g^{[2]}_{t,p}(w)}{f_{t-p}(w)}\, b,$$

while

$$E\left[ \frac{Y_t\, k_h(W_{t-p} - w)}{f_{t-p}(W_{t-p})} \cdot \frac{Y_t\, k_h(W_{t-p} - w')}{f_{t-p}(W_{t-p})} \,\Big|\, \mathcal{F}_{T,t-p-1} \right] = h^{-2} \int \frac{g^{[2]}_{t,p}(x)}{f_{t-p}(x)}\, k((x - w)/h)\, k((x - w')/h)\,dx$$

$$= \frac{1}{2} h^{-2} \int \frac{g^{[2]}_{t,p}(x)}{f_{t-p}(x)}\, k((x - w)/h)\, k((x - w')/h)\,dx + \frac{1}{2} h^{-2} \int \frac{g^{[2]}_{t,p}(x)}{f_{t-p}(x)}\, k((x - w)/h)\, k((x - w')/h)\,dx$$

$$= \frac{1}{2} h^{-1} \int \frac{g^{[2]}_{t,p}(w + hu)}{f_{t-p}(w + hu)}\, k(u)\, k(u + (w - w')/h)\,du + \frac{1}{2} h^{-1} \int \frac{g^{[2]}_{t,p}(w' + hu)}{f_{t-p}(w' + hu)}\, k(u + (w' - w)/h)\, k(u)\,du$$

$$\simeq \frac{1}{2} h^{-1} \left( \frac{g^{[2]}_{t,p}(w)}{f_{t-p}(w)} + \frac{g^{[2]}_{t,p}(w')}{f_{t-p}(w')} \right) \int k(u)\, k(u + (w - w')/h)\,du = \frac{1}{2} h^{-1} \left( \frac{g^{[2]}_{t,p}(w)}{f_{t-p}(w)} + \frac{g^{[2]}_{t,p}(w')}{f_{t-p}(w')} \right) k^*((w - w')/h),$$

where the first integral substitutes $x = w + hu$, the second substitutes $x = w' + hu$, and we use $k^*(-x) = k^*(x)$. Then the result follows by the assumed property of $k^*$.

For each $T$ fix the bandwidth as $h_T$. For each $T$, the estimation error $\{u_{t,p}(w)\}$ is a martingale difference sequence, but centered at $\mu_{t,p}(w)$, not $g_{t,p}(w)$. Now assume that $\frac{1}{T-p} \sum_{t=p+1}^{T} \sigma^2_{t,p}(w) \overset{p}{\to} \sigma^2_p(w)$; then for $h_T > 0$,

$$\sqrt{h(T - p)}\; \frac{\{\hat{\bar{\tau}}^*_p(w) - \hat{\bar{\tau}}^*_p(w')\} - \{\bar{\mu}_p(w) - \bar{\mu}_p(w')\}}{\sqrt{\sigma^2_p(w) + \sigma^2_p(w')}} \overset{d}{\to} N(0, 1),$$

where $\bar{\mu}_p(w) = \frac{1}{T-p} \sum_{t=p+1}^{T} \mu_{t,p}(w)$. Of course

$$\bar{\mu}_p(w) - \bar{\mu}_p(w') \simeq \{\bar{g}_p(w) - \bar{g}_p(w')\} + 0.5 h^2 \kappa_2 \{\bar{g}''_p(w) - \bar{g}''_p(w')\},$$

where $\bar{g}''_p(w) = \frac{1}{T-p} \sum_{t=p+1}^{T} g''_{t,p}(w)$. Notice that the bias involves the difference of two second derivatives of $\bar{g}_p(w)$, evaluated at $w$ and $w'$.

Remark B.1.

The corresponding results when the regression kernels for $\hat{\bar{g}}_p(w)$ and $\hat{\bar{g}}_p(w')$ use different bandwidths, $h_w$ and $h_{w'}$, are straightforward to write out. However, in practice this has the disadvantage that the bias term becomes

$$0.5 \kappa_2 \{h_w^2\, \bar{g}''_p(w) - h_{w'}^2\, \bar{g}''_p(w')\},$$

which shows no sign of cancelling. Further, if we aggregate the period-by-period mean square error, then

$$\frac{1}{T-p} \sum_{t=p+1}^{T} E\left[ \left( \{\hat{\tau}^*_{t,p}(w) - \hat{\tau}^*_{t,p}(w')\} - \{g_{t,p}(w) - g_{t,p}(w')\} \right)^2 \Big|\, \mathcal{F}_{T,t-p-1} \right] \simeq \frac{1}{h(T-p)} \{\sigma^2_p(w) + \sigma^2_p(w')\} + \frac{0.25 h^4 \kappa_2^2}{T-p} \sum_{t=p+1}^{T} \left( g''_{t,p}(w) - g''_{t,p}(w') \right)^2,$$

which is minimized by selecting $h \propto (T - p)^{-1/5}$, so the mean square error declines at the usual nonparametric rate $T^{-4/5}$, which does not vary with $p$. None of these results are surprising given the vast nonparametric literature.

Remark B.2.

At a fundamental level it would be convenient to be able to estimate individual finite sample terms like $Y_t(W_{1:t-p-1}, w) - Y_t(W_{1:t-p-1}, w')$, where $w, w' \in \mathcal{W}^{p+1}$, or their temporal average. Can these terms be nonparametrically identified just using the structure of the potential outcome time series, conditioning on all of the potential outcomes? We sketch out below that the answer is yes, but that the result is of little immediate practical use due to the slow rate of convergence. Write the intermediate estimand as $g_{t,p}(w) := Y_t(W_{1:t-p-1}, w)$, where $w \in \mathcal{W}^{p+1}$, and write $g''_{it,p}(w) := \partial^2 g_{t,p}(w)/\partial w_i^2$, $k_{h,r}(u) := h^{-r} k(u_1/h) \cdots k(u_r/h)$, $K(u) := k(u_1) \cdots k(u_{p+1})$, $F_{t-p:t}(w) := \Pr(W_{t-p:t} \le w \mid \mathcal{F}_{T,t-p-1})$ and $f_{t-p:t}(w) := \partial F_{t-p:t}(w)/\partial w$. The corresponding intermediate estimator is

$$\hat{g}_{t,p}(w) := \frac{Y_t}{f_{t-p:t}(W_{t-p:t})}\, k_{h,p+1}(W_{t-p:t} - w).$$

The eventual goal is to use $\hat{g}_{t,p}(w) - \hat{g}_{t,p}(w')$ to estimate $g_{t,p}(w) - g_{t,p}(w')$. Now

$$E(\hat{g}_{t,p}(w) \mid \mathcal{F}_{T,t-p-1}) = h^{-(p+1)} \int g_{t,p}(x)\, K((x - w)/h)\,dx_1 \cdots dx_{p+1} = \int g_{t,p}(w + hu)\, k(u_1) \cdots k(u_{p+1})\,du_1 \cdots du_{p+1} \simeq g_{t,p}(w) + h^2 (p + 1)\, 0.5 \kappa_2\, \frac{1}{p+1} \sum_{i=1}^{p+1} g''_{it,p}(w),$$

while

$$E(\hat{g}_{t,p}(w)^2 \mid \mathcal{F}_{T,t-p-1}) = h^{-2(p+1)} \int \frac{g_{t,p}(x)^2}{f_{t-p:t}(x)}\, K((x - w)/h)^2\,dx_1 \cdots dx_{p+1} \simeq h^{-(p+1)}\, \frac{g_{t,p}(w)^2}{f_{t-p:t}(w)}\, b^{p+1}.$$

As before, the covariance between $\hat{g}_{t,p}(w)$ and $\hat{g}_{t,p}(w')$ is comparatively unimportant. Hence, averaging over $T$ data points, the variance is of order $1/(T h^{p+1})$ and the squared bias is of order $h^4$, so in terms of mean square error the best bandwidth choice would be $h \propto T^{-1/(p+5)}$, and the mean square error declines at the usual multivariate rate of $T^{-4/(p+5)}$. Hence $g_{t,p}(w) - g_{t,p}(w')$ is nonparametrically identified, but at its core it is a very nasty result empirically. As the length of the lags increases the rate of convergence slows.
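The bandwidth arithmetic behind Remarks B.1 and B.2 can be checked numerically. The Python sketch below is illustrative only. It assumes the approximate mean square error takes the stylized form $V/(T h^{p+1}) + B h^4$, that is, a variance of order $1/(T h^{p+1})$ plus a squared bias of order $h^4$. The constants `V` and `B` and all function names are hypothetical stand-ins introduced here, not quantities or code from the paper.

```python
import math


def approx_mse(h, T, p, V=1.0, B=1.0):
    """Stylized MSE: variance of order 1/(T h^(p+1)) plus squared bias of order h^4.

    V and B are hypothetical constants standing in for the variance and
    squared-bias factors; they are assumptions made for this illustration.
    """
    return V / (T * h ** (p + 1)) + B * h ** 4


def optimal_bandwidth(T, p, V=1.0, B=1.0):
    """Closed-form minimizer of approx_mse over h > 0.

    The first-order condition 4 B h^3 = (p + 1) V / (T h^(p + 2)) gives
    h^(p + 5) = (p + 1) V / (4 B T), so h is proportional to T^(-1/(p + 5)).
    """
    return ((p + 1) * V / (4.0 * B * T)) ** (1.0 / (p + 5))


def empirical_rate(p, T=10_000, factor=10):
    """Log-log slope in T of the minimized stylized MSE."""
    mse_small = approx_mse(optimal_bandwidth(T, p), T, p)
    mse_large = approx_mse(optimal_bandwidth(factor * T, p), factor * T, p)
    return math.log(mse_large / mse_small) / math.log(factor)


if __name__ == "__main__":
    for p in (0, 1, 4, 12):
        print(f"p={p:2d}  slope={empirical_rate(p):.4f}  theory={-4.0 / (p + 5):.4f}")
```

Under these assumptions the computed slope coincides with $-4/(p+5)$: the case $p = 0$ reproduces the scalar rate $T^{-4/5}$ of Remark B.1, and the exponent shrinks towards zero as $p$ grows, which is the slowdown described in Remark B.2.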