Difference-in-Differences Estimators of Intertemporal Treatment Effects
Clément de Chaisemartin ∗ Xavier D’Haultfœuille † August 13, 2020
Abstract
We consider the estimation of the effect of a policy or treatment, using panel data where different groups of units are exposed to the treatment at different times. We focus on parameters aggregating instantaneous and dynamic treatment effects, with a clear welfare interpretation. We show that under parallel trends conditions, these parameters can be unbiasedly estimated by a weighted average of differences-in-differences, provided that at least one group is always untreated, and another group is always treated. Our estimators are valid if the treatment effect is heterogeneous, contrary to the commonly-used event-study regression.
Keywords: differences-in-differences, panel data, repeated cross-section data, dynamic treatment effects, welfare analysis, event-study regression.
JEL Codes: C21, C23
∗ University of California at Santa Barbara, [email protected]
† CREST-ENSAE, [email protected]
1 Introduction
We consider the estimation of the effect of a policy or treatment on an outcome, using a panel of groups (indexed by $g$ hereafter) that are exposed to the policy at different times (indexed by $t$ hereafter). It is often appealing to study the dynamic effects of the policy, rather than just focusing on its instantaneous effect at the time of implementation. To do so, a commonly-used method, first proposed by Autor (2003), is to regress the outcome on group fixed effects, time fixed effects, the value of the treatment in group $g$ and period $t$, and lags of the treatment in group $g$. Hereafter, we refer to this regression as the event-study regression. Intuitively, the coefficient of the contemporaneous treatment should estimate its instantaneous effect, while the coefficients of the lagged treatments should estimate its dynamic effects. However, Abraham and Sun (2020) have shown that those estimators are only valid if the treatment effect is homogeneous over groups and time, the latter assumption being especially unappealing when one is interested in estimating dynamic effects. Instead, we propose to use differences-in-differences (DID) estimators. Like the event-study regression, such estimators rely on the standard parallel trends assumption. But unlike it, they remain valid if the treatment effect is heterogeneous.

In our panel data setting, there is a wealth of instantaneous and dynamic treatment effects one could estimate, and some aggregation is in order to improve power. Welfare analysis is a natural guide to perform said aggregation (see Manski, 2005). Specifically, we assume units' utility is additively separable in the outcome, and adopt the perspective of a planner interested in comparing the population's expected average intertemporal utility under the actual treatments received and under the scenario where all groups keep all along the same treatment as in the first period of the panel.
Our parameter of interest is the difference between these two expectations, hereafter referred to as the actual-versus-status-quo parameter. It measures the welfare effects of the policy changes that occurred over the period.

Footnote on Abraham and Sun (2020): their result is a generalization of that in de Chaisemartin and D'Haultfœuille (2020), who show that a similar result holds for the static two-way fixed effects regression without the lagged treatments.

We start by focusing on staggered adoption designs, where groups adopt the treatment at different time periods and cannot switch out of the treatment after adoption. Many applications do not fall into this special case, but it is much simpler to analyze than the general case, so we focus on it first for the sake of exposition. We show that if at least one group is still untreated at the end of the panel, the actual-versus-status-quo parameter can be unbiasedly estimated. Our estimator proceeds in two steps. We start by estimating the average effect of having started to receive the treatment $\ell$ periods ago, from $\ell = 0$ (corresponding to the instantaneous treatment effect) to the highest value of $\ell$ observed in the data. For each $\ell$, our estimator $\text{DID}_\ell$ is a weighted average, across $t$, of DID estimators comparing the $t-\ell-1$ to $t$ outcome evolution, in groups that become treated in $t-\ell$ and in groups not yet treated in $t$. Then, our estimator of the actual-versus-status-quo parameter is a weighted sum of the $\text{DID}_\ell$ estimators. If all groups are treated at the end of the panel, a truncated version of our actual-versus-status-quo parameter can still be unbiasedly estimated, where the truncation happens at the last period when at least one group is still untreated. Finally, we show how to test for the parallel trends condition underlying our estimators, using placebo estimators comparing the outcome evolution between the same groups as above, before groups that eventually become treated do so.
In staggered adoption designs, the $\text{DID}_\ell$ and placebo estimators are computed by the Stata did_multiplegt package (see de Chaisemartin et al., 2019).

We then consider general designs, where groups may switch in and out of the treatment at any time. We start by showing that if there is at least one group that is always untreated and another group that is always treated, the actual-versus-status-quo parameter can be unbiasedly estimated. Our estimator is a weighted average, across $t$ and $\ell$, of DID estimators comparing the $t-\ell-1$ to $t$ outcome evolution, in groups whose treatment first changed in $t-\ell$ and in groups whose treatment has not changed yet in $t$. Again, if there is no group untreated till the end of the panel, or no group treated till the end of the panel, a truncated version of our parameter can be unbiasedly estimated. We also propose placebo estimators to test the parallel trends conditions underlying our estimators.

In long panels, or in instances where groups change treatment frequently, the truncation may happen early in the panel, and the truncated parameter could then be very different from the original target parameter. Then, we consider an assumption on dynamic effects, and show that under this assumption, one can unbiasedly estimate a parameter that will often be closer to the parameter the planner needs to know to evaluate the policy changes that took place over the period. This assumption amounts to ruling out dynamic effects beyond $k$ lags, for a given $k$ that the analyst should choose based on the context. Ruling out dynamic effects beyond a certain lag is an assumption that is implicitly made in event-study regressions since without it, such regressions are not identified (Borusyak and Jaravel, 2017; Schmidheiny and Siegloch, 2020).

Abraham and Sun (2020) and Callaway and Sant'Anna (2018) have also proposed DID estimators of instantaneous and dynamic treatment effects in panels with multiple groups and periods.
Our paper differs from those on at least three dimensions. First and foremost, those papers only consider staggered adoption designs, while we also consider general designs, where groups may switch in and out of the treatment at any time. Our estimators can also be easily extended to non-binary treatments, unlike theirs. In a survey of all the papers using regressions with group and time fixed effects published by the AER between 2010 and 2012, de Chaisemartin and D'Haultfœuille (2020) find that less than 10% have a staggered adoption design. Therefore, our estimators can be used in a much larger set of empirical applications. Second, our paper is the first to use welfare analysis to guide the aggregation of instantaneous and dynamic treatment effects. Callaway and Sant'Anna (2018) propose several interesting aggregation methods, but the estimands they propose differ from our actual-versus-status-quo parameter, and they do not have a clear welfare interpretation. The same applies to the aggregation method in Abraham and Sun (2020). Aggregation is especially important outside of staggered designs. With $T$ time periods in the panel, there are only $T+1$ possible treatment trajectories in a staggered design, against $2^T$ in general designs. Hence, we can expect the estimation of the difference between the outcome of two such trajectories to be very noisy. Third, even in staggered designs, our estimators differ from those in Abraham and Sun (2020) and Callaway and Sant'Anna (2018). To estimate the treatment effect at date $t$ in groups that became treated at date $t-\ell$, we use as controls all groups not yet treated at $t$, while Callaway and Sant'Anna (2018) use the never treated groups, and Abraham and Sun (2020) use the never treated groups or the groups that become treated last if there are no never treated groups. Our control group is larger, so we can expect our estimators to be more precise.

This paper is also related to previous work of ours.
In staggered designs, the $\text{DID}_0$ estimator we propose in this paper is equivalent to the $\text{DID}_M$ estimator we proposed in de Chaisemartin and D'Haultfœuille (2020). Outside of staggered designs, the $\text{DID}_M$ estimator is equivalent to the estimator we propose in this paper under the assumption of no dynamic effects.

This paper is organized as follows. Section 2 introduces the notation and the assumptions we maintain throughout the paper. Section 3 considers the case of staggered adoption designs. Section 4 considers general designs.

One considers observations that can be divided into $G$ groups and $T$ periods. Time periods are indexed by $t \in \{1, ..., T\}$. Groups are indexed by $g \in \{1, ..., G\}$. There are $N_{g,t} > 0$ observations in group $g$ at period $t$. The data may be an individual-level panel or repeated cross-section data set where groups are, say, individuals' county of birth. The data could also be a cross-section where cohort of birth plays the role of time. It is also possible that for all $(g,t)$, $N_{g,t} = 1$, e.g. a group is one individual or firm.

One is interested in measuring the effect of a treatment on some outcome. Throughout the paper we assume that treatment is binary, but our results can easily be generalized to any ordered treatment. Then, for every $(i,g,t) \in \{1, ..., N_{g,t}\} \times \{1, ..., G\} \times \{1, ..., T\}$, let $D_{i,g,t}$ denote the treatment status of observation $i$ in group $g$ at period $t$. We focus on sharp designs, where the treatment does not vary within $(g,t)$ cells.

Assumption 1 (Sharp design) $\forall (i,g,t) \in \{1, ..., N_{g,t}\} \times \{1, ..., G\} \times \{1, ..., T\}$, $D_{i,g,t} = D_{g,t}$.

Footnote: Assumptions 1, 3, 4, 6, 7, 9-$k$, and Theorems 1 and 2 have equalities and inequalities involving random variables. Implicitly, these equalities and inequalities are assumed to hold with probability 1.

Assumption 1 holds mechanically when $N_{g,t} = 1$. Then, let $\boldsymbol{D}_g = (D_{g,1}, ..., D_{g,T})$ be a $1 \times T$ vector stacking the treatments of group $g$ from period 1 to $T$.
For all $t$, let $\boldsymbol{D}_{g,t} = (D_{g,1}, ..., D_{g,t})$ be a $1 \times t$ vector stacking the treatments of group $g$ from period 1 to $t$.

For all $\boldsymbol{d} \in \{0,1\}^T$, let $Y_{g,t}(\boldsymbol{d})$ denote the average potential outcome of group $g$ at period $t$, if the treatments of group $g$ from period 1 to $T$ are equal to $\boldsymbol{d}$. This notation allows for the possibility that group $g$'s outcome at time $t$ be affected by her past and future treatments. Some groups may have already been treated prior to period 1, the first period in the data, and those treatments may still affect some of their period-1-to-$T$ outcomes. However, we cannot estimate such dynamic effects, as treatments and outcomes are not observed for those periods, so we do not account for this potential dependency in our notation. Finally, we let $Y_{g,t} = Y_{g,t}(\boldsymbol{D}_g)$ denote the observed outcome, and for all $t$ we let $\boldsymbol{0}_t$ denote a vector of $t$ zeros.

Assumption 2 (Independence between groups) The $G$ vectors $\left(\boldsymbol{D}_g, (Y_{g,1}(\boldsymbol{d}), ..., Y_{g,T}(\boldsymbol{d}))_{\boldsymbol{d} \in \{0,1\}^T}\right)$ are mutually independent.

We consider the treatment and potential outcomes of each $(g,t)$ cell as random variables. For instance, aggregate random shocks may affect the potential outcomes of group $g$ at period $t$, and that cell's treatment may also be random. The expectations below are taken with respect to the distribution of those random variables. Under Assumption 2, the treatments and potential outcomes of a group may be correlated over time, but the potential outcomes and treatments of different groups have to be independent, a commonly made assumption in difference-in-differences (DID) designs (see Bertrand et al., 2004).

Assumption 3 (No Anticipation) For all $g$, for all $\boldsymbol{d} \in \{0,1\}^T$, $Y_{g,t}(\boldsymbol{d}) = Y_{g,t}(d_1, ..., d_t)$.

Assumption 3 requires that a group's current outcome does not depend on her future treatments, the so-called no-anticipation hypothesis, see, e.g., Abbring and Van den Berg (2003).
Under Assumption 3, $Y_{g,t} = Y_{g,t}(\boldsymbol{D}_{g,t})$.

Hereafter, we refer to $Y_{g,t}(\boldsymbol{0}_t)$, the potential outcome that group $g$ will obtain at period $t$ if she remains untreated from period 1 to $t$, as the never-treated potential outcome. We consider two assumptions on that outcome.

Assumption 4 (Strong exogeneity) $\forall t \geq 2$, $E\left(Y_{g,t}(\boldsymbol{0}_t) - Y_{g,t-1}(\boldsymbol{0}_{t-1}) \mid \boldsymbol{D}_g\right) = E\left(Y_{g,t}(\boldsymbol{0}_t) - Y_{g,t-1}(\boldsymbol{0}_{t-1})\right)$.

Assumption 4 requires that the shocks affecting a group's never-treated potential outcome be mean independent of her treatments. For instance, this rules out cases where a group gets treated because it experiences some negative shocks, the so-called Ashenfelter's dip (see Ashenfelter, 1978). Assumption 4 is related to the strong exogeneity condition in panel data models.
Assumption 5 (Common trends) $\forall t \geq 2$, $E\left(Y_{g,t}(\boldsymbol{0}_t) - Y_{g,t-1}(\boldsymbol{0}_{t-1})\right)$ does not vary across $g$.

Throughout this section, we assume that the treatment follows a staggered adoption design, where groups can switch in but not out of the treatment.
Assumption 6 (Staggered adoption designs) For all $g$ and $t \geq 2$, $D_{g,t} \geq D_{g,t-1}$.

For any $g \in \{1, ..., G\}$, let $F_g = \min\{t : D_{g,t} = 1\}$ denote the first date at which group $g$ is treated, with the convention that $F_g = T+1$ if group $g$ is never treated.

Our parameters of interest are motivated by a welfare analysis. The treatment $D_{g,t}$ may for instance correspond to a policy that is costly to implement, and a planner may seek to compare the welfare gains produced by the policy to its cost (see Manski, 2005). We assume that for any possible value $\boldsymbol{d}_t$ of the treatments up to period $t$, the average utility in group $g$ at period $t$ is equal to $Y_{g,t}(\boldsymbol{d}_t) + U_{g,t}$. If the social planner wants to compare the population's expected average intertemporal utility under the actual treatments received by all groups and under the scenario where all groups are never treated, then she would like to learn
$$\delta_{ATT} = E\left[\sum_{g: F_g \leq T} \sum_{t=F_g}^{T} \frac{N_{g,t}}{N_D} \beta^t \left(Y_{g,t}(\boldsymbol{D}_{g,t}) - Y_{g,t}(\boldsymbol{0}_t)\right)\right],$$
where $\beta$ is the planner's discount factor, and $N_D = \sum_{g: F_g \leq T} \sum_{t=F_g}^{T} N_{g,t}$ is the number of units in the treated $(g,t)$ cells. With $\beta = 1$ and assuming no dynamic effects ($Y_{g,t}(d_1, ..., d_t) = Y_{g,t}(d_t)$), $\delta_{ATT}$ is equal to the standard average treatment effect on the treated (ATT) parameter. Thus, $\delta_{ATT}$ generalizes the ATT to settings with dynamic treatment effects, and it allows for the possibility that the planner may discount later periods relative to earlier ones.

If the social planner wants to compare the population's utility under the actual treatments received and under the scenario where all groups keep the same treatment as in the first period (the status quo scenario), then she would like to learn
$$\delta_{SQ} = E\left[\sum_{g: 2 \leq F_g \leq T} \sum_{t=F_g}^{T} \frac{N_{g,t}}{N_S} \beta^t \left(Y_{g,t}(\boldsymbol{D}_{g,t}) - Y_{g,t}(\boldsymbol{0}_t)\right)\right],$$
where $N_S = \sum_{g: 2 \leq F_g \leq T} \sum_{t=F_g}^{T} N_{g,t}$ is the number of units in the treated $(g,t)$ cells, excluding the always treated groups.
If there are no always treated groups ($F_g > 1$ for all $g$), $\delta_{ATT} = \delta_{SQ}$. If there are always treated groups, not taking their treatment effect into account may be justified, if the planner wants to evaluate the welfare effects of the policy decisions that took place over the period under consideration, and not of earlier policy decisions.

Before proposing an estimator of $\delta_{SQ}$, we consider parameters that are simpler to estimate. For any $\ell \in \{0, ..., T-2\}$, let $N_\ell = \sum_{g: 2 \leq F_g \leq T-\ell} N_{g,F_g+\ell}$ denote the number of units in the $(g,t)$ cells such that at period $t$, group $g$ has started receiving the treatment $\ell$ periods ago. Let
$$\Delta_\ell = \sum_{g: 2 \leq F_g \leq T-\ell} \frac{N_{g,F_g+\ell}}{N_\ell} \beta^{F_g+\ell} \left(Y_{g,F_g+\ell}(\boldsymbol{D}_{g,F_g+\ell}) - Y_{g,F_g+\ell}(\boldsymbol{0}_{F_g+\ell})\right) \qquad (1)$$
if $N_\ell > 0$, and let $\Delta_\ell = 0$ otherwise. If $\beta = 1$, $\Delta_\ell$ is just the average effect of having been treated for $\ell$ periods, across all the groups treated for $\ell$ periods before period $T$, and excluding the always treated groups. If $\beta < 1$, $\Delta_\ell$ has the same interpretation, except that groups' treatment effect is discounted according to the date when they reach $\ell$ periods of treatment. Let $L = T - \min_{g: F_g \geq 2} F_g$ denote the number of time periods between the earliest date at which a group goes from untreated to treated and date $T$. $N_\ell = 0$ if and only if $\ell > L$: if $\ell > L$, the effect of being treated for $\ell$ periods is not observed for any group.
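To make the counting definitions above concrete, the following is a minimal Python sketch computing $F_g$, $N_S$, $N_\ell$ and $L$ from a treatment matrix in a staggered design. It is an illustration only and is not taken from the did_multiplegt package: the function name `staggered_counts`, the array layout (rows are groups, columns are periods) and the toy design are assumptions made for this example.

```python
import numpy as np

def staggered_counts(D, N):
    """Hypothetical helper: D is a (G, T) 0/1 treatment matrix, N a (G, T)
    matrix of cell sizes N_{g,t}. Column j corresponds to period t = j + 1."""
    D, N = np.asarray(D), np.asarray(N)
    G, T = D.shape
    # Assumption 6: treatment can switch on but never off.
    assert np.all(np.diff(D, axis=1) >= 0), "not a staggered adoption design"
    # F_g: first treated period, with the convention F_g = T + 1 if never treated.
    F = np.where(D.any(axis=1), D.argmax(axis=1) + 1, T + 1)
    # N_S: units in treated cells of groups with 2 <= F_g <= T.
    N_S = sum(N[g, F[g] - 1:].sum() for g in range(G) if 2 <= F[g] <= T)
    # N_ell for ell = 0, ..., T - 2: units in cells reached ell periods after F_g.
    N_ell = np.zeros(T - 1, dtype=N.dtype)
    for ell in range(T - 1):
        for g in range(G):
            if 2 <= F[g] <= T - ell:
                N_ell[ell] += N[g, F[g] + ell - 1]
    # L = T - min_{g: F_g >= 2} F_g (None if no group has F_g >= 2).
    L = T - F[F >= 2].min() if np.any(F >= 2) else None
    return F, N_S, N_ell, L

# Toy design (assumed): T = 4 periods, one never-treated and one always-treated group.
D = np.array([[0, 0, 1, 1],
              [0, 1, 1, 1],
              [0, 0, 0, 0],
              [1, 1, 1, 1]])
N = np.ones_like(D)
F, N_S, N_ell, L = staggered_counts(D, N)
print(F, N_S, N_ell, L)
```

In this toy design the always treated group ($F_g = 1$) and the never treated group ($F_g = T+1$) contribute to neither $N_S$ nor any $N_\ell$, and every $\ell \leq L$ has $N_\ell > 0$.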
Finally, let $\delta_\ell = E(\Delta_\ell)$.

For any $\ell \in \{0, ..., T-2\}$ and $t \in \{\ell+2, ..., T\}$, let $N^\ell_t = \sum_{g: F_g = t-\ell} N_{g,t}$ and $N^{nt}_t = \sum_{g: F_g > t} N_{g,t}$. Let
$$\text{DID}_{t,\ell} = \beta^t \left(\sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^\ell_t} \left(Y_{g,t} - Y_{g,t-\ell-1}\right) - \sum_{g: F_g > t} \frac{N_{g,t}}{N^{nt}_t} \left(Y_{g,t} - Y_{g,t-\ell-1}\right)\right)$$
if $N^\ell_t > 0$ and $N^{nt}_t > 0$, and let $\text{DID}_{t,\ell} = 0$ otherwise. $\text{DID}_{t,\ell}$ is the $\beta^t$-discounted DID estimator comparing the outcome evolution from period $t-\ell-1$ to $t$ in groups that became treated in $t-\ell$ and in groups still untreated in $t$. Under Assumptions 4-5, the expectation of the outcome evolution in the latter set of groups is a counterfactual of the evolution that would have taken place in the former set of groups if it had not started receiving the treatment $\ell$ periods ago. Thus, $\text{DID}_{t,\ell}$ is an unbiased estimator of the effect of having been treated for $\ell$ periods in those groups. Then, let $N^{\text{DID}_\ell} = \sum_{t \geq \ell+2,\, N^{nt}_t > 0} N^\ell_t$ be the number of units in $(g,t)$ cells such that group $g$ has been treated for $\ell$ periods at date $t$, and another group is untreated at $t$. Let
$$\text{DID}_\ell = \sum_{t=\ell+2}^{T} \frac{N^\ell_t}{N^{\text{DID}_\ell}} \text{DID}_{t,\ell}$$
if $N^{\text{DID}_\ell} > 0$, and let $\text{DID}_\ell = 0$ otherwise. $\text{DID}_\ell$ is our estimator of $\delta_\ell$. Under Assumption 6, $\text{DID}_0$ with $\beta = 1$ is equal to the $\text{DID}_M$ estimator in de Chaisemartin and D'Haultfœuille (2020).

The $\text{DID}_\ell$ estimators with $\beta = 1$ are computed by the Stata did_multiplegt package.

We have the following relationship between $\delta_{SQ}$ and the $(\Delta_\ell)_{0 \leq \ell \leq L}$:
$$E\left(\sum_{\ell=0}^{L} \frac{N_\ell}{N_S} \Delta_\ell\right) = E\left[\sum_{\ell=0}^{L} \sum_{g: 2 \leq F_g \leq T-\ell} \frac{N_{g,F_g+\ell}}{N_S} \beta^{F_g+\ell} \left(Y_{g,F_g+\ell}(\boldsymbol{D}_{g,F_g+\ell}) - Y_{g,F_g+\ell}(\boldsymbol{0}_{F_g+\ell})\right)\right]$$
$$= E\left[\sum_{g: 2 \leq F_g \leq T} \sum_{\ell=0}^{T-F_g} \frac{N_{g,F_g+\ell}}{N_S} \beta^{F_g+\ell} \left(Y_{g,F_g+\ell}(\boldsymbol{D}_{g,F_g+\ell}) - Y_{g,F_g+\ell}(\boldsymbol{0}_{F_g+\ell})\right)\right] = \delta_{SQ}. \qquad (2)$$
Accordingly, we estimate $\delta_{SQ}$ by
$$\widehat{\delta}_{SQ} = \sum_{\ell=0}^{L} \frac{N^{\text{DID}_\ell}}{N_S} \text{DID}_\ell.$$
The following theorem gives conditions under which $\text{DID}_\ell$ and $\widehat{\delta}_{SQ}$ are unbiased estimators of $\delta_\ell$ and $\delta_{SQ}$.

Theorem 1
Suppose that Assumptions 1-6 hold and $N_S > 0$.
1. For any $\ell \in \{0, ..., T-2\}$, if $\max_{g \in \{1,...,G\}} F_g > \max_{g: 2 \leq F_g \leq T-\ell} F_g + \ell$, $E[\text{DID}_\ell] = \delta_\ell$.
2. If $\max_{g \in \{1,...,G\}} F_g = T+1$, $E\left[\widehat{\delta}_{SQ}\right] = \delta_{SQ}$.

The condition $\max_{g \in \{1,...,G\}} F_g > \max_{g: 2 \leq F_g \leq T-\ell} F_g + \ell$ requires that at the last time period when a group reaches $\ell$ periods of treatment, there is still at least one untreated group. This condition can be tested from the data. Point 1 of Theorem 1 shows that under this condition, the average effect of having been treated for $\ell$ periods $\delta_\ell$ can be unbiasedly estimated. Then, Point 2 of the theorem shows that $\delta_{SQ}$ can also be unbiasedly estimated, provided there is at least one group that is still untreated at date $T$. When no group is treated at the start of the panel, $\delta_{SQ}$ and $\delta_{ATT}$ are equal, so $\delta_{ATT}$ can also be unbiasedly estimated.

Theorem 1 applies to designs where at least one group is still untreated at the end of the panel. We now consider cases where that condition is not met. Let
$NT = \max_{g \in \{1,...,G\}} F_g - 1$ denote the last period where at least one group is still untreated. Let $N^{trun}_S = \sum_{g: 2 \leq F_g \leq NT} \sum_{t=F_g}^{NT} N_{g,t}$ be the number of units in the $(g,t)$ cells that are treated and such that $t \leq NT$, excluding the always treated groups. Let
$$\delta^{trun}_{SQ} = E\left[\sum_{g: 2 \leq F_g \leq NT} \sum_{t=F_g}^{NT} \frac{N_{g,t}}{N^{trun}_S} \beta^t \left(Y_{g,t}(\boldsymbol{D}_{g,t}) - Y_{g,t}(\boldsymbol{0}_t)\right)\right]$$
denote the truncated-at-$NT$ version of $\delta_{SQ}$, that only takes into account the treatment effects until period $NT$. For any $\ell \in \{0, ..., T-2\}$, let $N^{trun}_\ell = \sum_{g: 2 \leq F_g \leq NT-\ell} N_{g,F_g+\ell}$ denote the number of units in $(g,t)$ cells such that at period $t$, group $g$ has started receiving the treatment $\ell$ periods ago and there is still an untreated group, excluding the always treated groups. Let
$$\Delta^{trun}_\ell = \sum_{g: 2 \leq F_g \leq NT-\ell} \frac{N_{g,F_g+\ell}}{N^{trun}_\ell} \beta^{F_g+\ell} \left(Y_{g,F_g+\ell}(\boldsymbol{D}_{g,F_g+\ell}) - Y_{g,F_g+\ell}(\boldsymbol{0}_{F_g+\ell})\right)$$
if $N^{trun}_\ell > 0$, and let $\Delta^{trun}_\ell = 0$ otherwise. $\Delta^{trun}_\ell$ is the truncated-at-$NT$ version of $\Delta_\ell$, the effect of being treated for $\ell$ periods. Let $\delta^{trun}_\ell$ denote its expectation. Using the same steps as those used to prove Equation (2), one can show that
$$E\left(\sum_{\ell=0}^{L} \frac{N^{trun}_\ell}{N^{trun}_S} \Delta^{trun}_\ell\right) = \delta^{trun}_{SQ}. \qquad (3)$$
Then, we let $\widehat{\delta}^{trun}_{SQ} = \sum_{\ell=0}^{L} \left(N^{\text{DID}_\ell} / N^{trun}_S\right) \text{DID}_\ell$. The $\widehat{\delta}^{trun}_{SQ}$ estimator with $\beta = 1$ is computed by the Stata did_multiplegt package.

Theorem 2
Suppose that Assumptions 1-6 hold and $N^{trun}_S > 0$.
1. For any $\ell \in \{0, ..., T-2\}$, $E[\text{DID}_\ell] = \delta^{trun}_\ell$.
2. $E\left[\widehat{\delta}^{trun}_{SQ}\right] = \delta^{trun}_{SQ}$.

Theorem 2 shows that $\delta^{trun}_{SQ}$ can be unbiasedly estimated even when there is no untreated group at period $T$. However, unlike for $\delta_{SQ}$, it is hard to rationalize why a social planner would want to learn $\delta^{trun}_{SQ}$. The treatment effects in $\delta^{trun}_{SQ}$ are a subset of those in $\delta_{SQ}$. Accordingly, we propose to use
$$\lambda^{trun} = \frac{\sum_{g: 2 \leq F_g \leq NT} \sum_{t=F_g}^{NT} N_{g,t} \beta^t}{\sum_{g: 2 \leq F_g \leq T} \sum_{t=F_g}^{T} N_{g,t} \beta^t},$$
a quantity that can be computed from the data, to assess whether $\delta^{trun}_{SQ}$ is an "interesting" parameter. When that ratio is close to 1, most of the treatment effects in $\delta_{SQ}$ are also in $\delta^{trun}_{SQ}$, so $\delta^{trun}_{SQ}$ may be useful for social choice. When potential outcomes are bounded, one can estimate bounds for $\delta_{SQ}$ based on Theorem 2 and $\lambda^{trun}$. Notice that if $\beta = 1$, $\lambda^{trun} = N^{trun}_S / N_S$.

Finally, we propose placebo estimators to test Assumptions 4 and 5. For any $\ell \in \{0, ..., \lfloor (T-3)/2 \rfloor\}$ and $t \in \{2\ell+3, ..., T\}$, let
$$\text{DID}^{pl}_{t,\ell} = \beta^t \left(\sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^\ell_t} \left(Y_{g,t-2\ell-2} - Y_{g,t-\ell-1}\right) - \sum_{g: F_g > t} \frac{N_{g,t}}{N^{nt}_t} \left(Y_{g,t-2\ell-2} - Y_{g,t-\ell-1}\right)\right)$$
if $N^\ell_t > 0$ and $N^{nt}_t > 0$, and let $\text{DID}^{pl}_{t,\ell} = 0$ otherwise. Let also $N^{\text{DID}^{pl}_\ell} = \sum_{t=2\ell+3}^{T} N^\ell_t$, and let
$$\text{DID}^{pl}_\ell = \sum_{t=2\ell+3}^{T} \frac{N^\ell_t}{N^{\text{DID}^{pl}_\ell}} \text{DID}^{pl}_{t,\ell}$$
if $N^{\text{DID}^{pl}_\ell} > 0$, and let $\text{DID}^{pl}_\ell = 0$ otherwise. $\text{DID}^{pl}_{t,\ell}$ compares the outcome evolution in the same two sets of groups as those used in $\text{DID}_{t,\ell}$, but between periods $t-2\ell-2$ and $t-\ell-1$ instead of $t-\ell-1$ and $t$.
Thus, $\text{DID}^{pl}_{t,\ell}$ is a placebo estimator testing if parallel trends holds for $\ell+1$ periods, the same number of periods over which parallel trends has to hold for $\text{DID}_{t,\ell}$ to be an unbiased estimator of the dynamic treatment effect $\ell$ periods after starting receiving the treatment. $\text{DID}^{pl}_\ell$ averages those placebos across $t$, thus providing us a placebo estimator mimicking $\text{DID}_\ell$.

Theorem 3
Suppose that Assumptions 1-6 hold. For any $\ell \in \{0, ..., \lfloor (T-3)/2 \rfloor\}$, $E\left[\text{DID}^{pl}_\ell\right] = 0$.

Theorem 3 shows that $E\left[\text{DID}^{pl}_\ell\right] = 0$ is a testable implication of Assumptions 4 and 5, so one can reject those assumptions when $\text{DID}^{pl}_\ell$ is significantly different from 0. More informally, it may be the case that, say, $\text{DID}^{pl}_\ell$ is significantly different from 0 for $\ell \geq 2$, but $\text{DID}^{pl}_0$ and $\text{DID}^{pl}_1$ are not significantly different from 0. That would suggest that violations of parallel trends may invalidate $\text{DID}_\ell$ for $\ell \geq 2$, but not $\text{DID}_0$ and $\text{DID}_1$.

Finally, note that the "long-difference" placebos we propose here differ from the "first-difference" placebos we proposed in de Chaisemartin and D'Haultfœuille (2020), and that compare the $t-k-2$ to $t-k-1$ outcome evolution in groups becoming treated at $t$ and in groups not yet treated at $t$, for $k \in \{0, ..., T-3\}$. Both placebo estimators have advantages and drawbacks. The long-difference ones test the common trends assumption underlying the instantaneous and dynamic treatment effect estimators, but one can at most compute $\lfloor (T-3)/2 \rfloor + 1$ of them. On the other hand, the first-difference placebos test only the common trends assumption underlying the instantaneous treatment effect estimator, but one will often be able to compute more of them, using parts of the data that the long-difference placebos may not be using.

Footnote: For completeness, we define those first-difference placebo estimators in the Web Appendix.

In this section we no longer assume that treatments follow a staggered adoption design. Groups may switch in or out of the treatment at any date. For every $g \in \{1, ..., G\}$, let $S_g = \min\{t \geq 2 : D_{g,t} \neq D_{g,t-1}\}$ denote the first date at which group $g$'s treatment changes, with the convention that $S_g = T+1$ if group $g$'s treatment never changes. For all $t$, let $\boldsymbol{1}_t$ denote a $1 \times t$ vector of ones. Hereafter,
we refer to $Y_{g,t}(\boldsymbol{1}_t)$, the potential outcome that group $g$ will obtain at period $t$ if she remains treated from period 1 to $t$, as the always-treated potential outcome.

Again, our parameters of interest are motivated by a welfare analysis. In groups untreated at period 1, a planner interested in evaluating the welfare effects of the policy changes that took place over the $T$ periods of the panel wants to compare the actual outcome to the never-treated outcome, the outcome that would have been realized without any policy change. This leads us to consider the following parameter:
$$\delta_{SQ,0} = E\left[\sum_{g: S_g \leq T} (1 - D_{g,1}) \sum_{t=S_g}^{T} \frac{N_{g,t}}{N_{S,0}} \beta^t \left(Y_{g,t}(\boldsymbol{D}_{g,t}) - Y_{g,t}(\boldsymbol{0}_t)\right)\right],$$
where $N_{S,0} = \sum_{g: S_g \leq T} (1 - D_{g,1}) \sum_{t=S_g}^{T} N_{g,t}$ is the number of units in the $(g,t)$ cells such that group $g$ was untreated at period 1 and her treatment has changed for the first time at or before $t$. Under Assumption 6, $\delta_{SQ,0}$ is equal to $\delta_{SQ}$ in the previous section, so $\delta_{SQ,0}$ is a generalization of $\delta_{SQ}$ to non-staggered designs.

In groups treated at period 1, the planner wants to compare the actual outcome to the always-treated outcome, the outcome that would have been realized without any policy change. This leads us to consider the following parameter:
$$\delta_{SQ,1} = E\left[\sum_{g: S_g \leq T} D_{g,1} \sum_{t=S_g}^{T} \frac{N_{g,t}}{N_{S,1}} \beta^t \left(Y_{g,t}(\boldsymbol{D}_{g,t}) - Y_{g,t}(\boldsymbol{1}_t)\right)\right],$$
where $N_{S,1} = \sum_{g: S_g \leq T} D_{g,1} \sum_{t=S_g}^{T} N_{g,t}$ is the number of units in the $(g,t)$ cells such that group $g$ was treated at period 1 and her treatment has changed for the first time at or before $t$.

Finally, the planner may be interested in aggregating those two parameters, to perform a cost-benefit analysis of all the policy changes that occurred over the period. This leads us to consider the following parameter:
$$\delta_{CB} = \frac{N_{S,0}}{N_S} \delta_{SQ,0} - \frac{N_{S,1}}{N_S} \delta_{SQ,1},$$
where $N_S = N_{S,0} + N_{S,1}$.
$\delta_{SQ,1}$ enters with a negative sign, because it is the effect of a reduction in exposure to the treatment, while $\delta_{SQ,0}$ is the effect of an increase. When aggregating them, the two parameters have to be put on the same scale.

As in the previous section, we cannot unbiasedly estimate $\delta_{SQ,0}$, $\delta_{SQ,1}$ and $\delta_{CB}$ in general, so we consider hereafter their truncated versions. Let $NT = \max_{g: D_{g,1} = 0} S_g - 1$ denote the last period at which at least one group has never been treated, and let $AT = \max_{g: D_{g,1} = 1} S_g - 1$ denote the last period at which at least one group has always been treated. Then, let
$$\delta^{trun}_{SQ,0} = E\left[\sum_{g: S_g \leq NT} (1 - D_{g,1}) \sum_{t=S_g}^{NT} \frac{N_{g,t}}{N^{trun}_{S,0}} \beta^t \left(Y_{g,t}(\boldsymbol{D}_{g,t}) - Y_{g,t}(\boldsymbol{0}_t)\right)\right],$$
$$\delta^{trun}_{SQ,1} = E\left[\sum_{g: S_g \leq AT} D_{g,1} \sum_{t=S_g}^{AT} \frac{N_{g,t}}{N^{trun}_{S,1}} \beta^t \left(Y_{g,t}(\boldsymbol{D}_{g,t}) - Y_{g,t}(\boldsymbol{1}_t)\right)\right],$$
$$\delta^{trun}_{CB} = \frac{N^{trun}_{S,0}}{N^{trun}_S} \delta^{trun}_{SQ,0} - \frac{N^{trun}_{S,1}}{N^{trun}_S} \delta^{trun}_{SQ,1},$$
where
$$N^{trun}_{S,0} = \sum_{g: S_g \leq NT} (1 - D_{g,1}) \sum_{t=S_g}^{NT} N_{g,t}, \quad N^{trun}_{S,1} = \sum_{g: S_g \leq AT} D_{g,1} \sum_{t=S_g}^{AT} N_{g,t}, \quad N^{trun}_S = N^{trun}_{S,0} + N^{trun}_{S,1}.$$
$\delta^{trun}_{SQ,0}$ (resp. $\delta^{trun}_{SQ,1}$) is a version of $\delta_{SQ,0}$ (resp. $\delta_{SQ,1}$) truncated at $NT$ (resp. $AT$). Similarly, $\delta^{trun}_{CB}$ is a truncated version of $\delta_{CB}$. As in the previous section, let
$$\lambda^{trun} = \frac{\sum_{g: S_g \leq NT} (1 - D_{g,1}) \sum_{t=S_g}^{NT} N_{g,t} \beta^t + \sum_{g: S_g \leq AT} D_{g,1} \sum_{t=S_g}^{AT} N_{g,t} \beta^t}{\sum_{g: S_g \leq T} (1 - D_{g,1}) \sum_{t=S_g}^{T} N_{g,t} \beta^t + \sum_{g: S_g \leq T} D_{g,1} \sum_{t=S_g}^{T} N_{g,t} \beta^t}$$
denote the "proportion" of $\delta_{CB}$'s treatment effects that are also in $\delta^{trun}_{CB}$.

$\delta^{trun}_{SQ,0}$ can be unbiasedly estimated under the same assumptions as in the previous section. On the other hand, new assumptions are needed to unbiasedly estimate $\delta^{trun}_{SQ,1}$.

Assumption 7 (Strong exogeneity for the always treated outcome) $\forall t \geq 2$, $E\left(Y_{g,t}(\boldsymbol{1}_t) - Y_{g,t-1}(\boldsymbol{1}_{t-1}) \mid \boldsymbol{D}_g\right) = E\left(Y_{g,t}(\boldsymbol{1}_t) - Y_{g,t-1}(\boldsymbol{1}_{t-1})\right)$.
Assumption 7 is the equivalent of Assumption 4, for the always-treated potential outcome. It requires that the shocks affecting a group's $Y_{g,t}(\boldsymbol{1}_t)$ be mean independent of that group's treatment sequence.

Assumption 8 (Common trends for the always treated outcome) $\forall t \geq 2$, $E\left(Y_{g,t}(\boldsymbol{1}_t) - Y_{g,t-1}(\boldsymbol{1}_{t-1})\right)$ does not vary across $g$.

Again, Assumption 8 is the equivalent of Assumption 5, for the always-treated potential outcome. It requires that between each pair of consecutive periods, the expectation of the always-treated outcome follow the same evolution over time in every group.

For any $\ell \in \{0, ..., T-2\}$ and $t \in \{\ell+2, ..., T\}$, let
$$N^{\ell,+}_t = \sum_{g: S_g = t-\ell} (1 - D_{g,1}) N_{g,t}, \quad N^{\ell,-}_t = \sum_{g: S_g = t-\ell} D_{g,1} N_{g,t}, \quad N^{nt}_t = \sum_{g: S_g > t} (1 - D_{g,1}) N_{g,t}, \quad N^{at}_t = \sum_{g: S_g > t} D_{g,1} N_{g,t}.$$
Let
$$\text{DID}^{+}_{t,\ell} = \beta^t \left(\sum_{g: S_g = t-\ell} (1 - D_{g,1}) \frac{N_{g,t}}{N^{\ell,+}_t} \left(Y_{g,t} - Y_{g,t-\ell-1}\right) - \sum_{g: S_g > t} (1 - D_{g,1}) \frac{N_{g,t}}{N^{nt}_t} \left(Y_{g,t} - Y_{g,t-\ell-1}\right)\right)$$
if $N^{\ell,+}_t > 0$ and $N^{nt}_t > 0$, and let $\text{DID}^{+}_{t,\ell} = 0$ otherwise. $\text{DID}^{+}_{t,\ell}$ is the $\beta^t$-discounted DID estimator comparing the outcome evolution from period $t-\ell-1$ to $t$ in groups untreated in period 1 and whose treatment changed for the first time in $t-\ell$ and in groups untreated in period 1 and whose treatment has not changed yet in $t$. Under Assumptions 4-5, the expectation of the outcome evolution in the latter set of groups is a counterfactual of the evolution that would have taken place in the former set of groups if it had remained untreated till $t$.
Thus, DID + t,(cid:96) is an unbiasedestimator of the effect of not having remained untreated till t in those groups.Let DID − t,(cid:96) = β t (cid:88) g : S g = t − (cid:96) D g, N g,t N (cid:96), − t ( Y g,t − Y g,t − (cid:96) − ) − (cid:88) g : S g >t D g, N g,t N att ( Y g,t − Y g,t − (cid:96) − ) if N (cid:96), − t > and N att > , and let DID − t,(cid:96) = 0 otherwise. DID − t,(cid:96) is the β t -discounted DID estimatorcomparing the outcome evolution from period t − (cid:96) − to t in groups treated in period 1 andwhose treatment changed for the first time in t − (cid:96) and in groups treated in period 1 and whosetreatment has not changed yet in t . Under Assumptions 7-8, DID − t,(cid:96) is an unbiased estimator ofthe effect of not having remained treated till t in the former groups.Then, let N DID + (cid:96) = (cid:88) t ≥ (cid:96) +2 ,N ntt > N (cid:96), + t N DID − (cid:96) = (cid:88) t ≥ (cid:96) +2 ,N att > N (cid:96), − t . + (cid:96) = T (cid:88) t = (cid:96) +2 N (cid:96), + t N DID + (cid:96) DID + t,(cid:96) if N DID + (cid:96) > and let DID + (cid:96) = 0 otherwise, and letDID − (cid:96) = T (cid:88) t = (cid:96) +2 N (cid:96), − t N DID − (cid:96) DID − t,(cid:96) if N DID − (cid:96) > and let DID − (cid:96) = 0 otherwise. Under Assumption 6, DID + (cid:96) is equal to DID (cid:96) in theprevious section. Finally, we let (cid:98) δ trunSQ, = L (cid:88) (cid:96) =0 N DID + (cid:96) N trunS, DID + (cid:96) , (cid:98) δ trunSQ, = L (cid:88) (cid:96) =0 N DID − (cid:96) N trunS, DID − (cid:96) , (cid:98) δ trunCB = N trunS, N trunS (cid:98) δ trunSQ, − N trunS, N trunS (cid:98) δ trunSQ, . Theorem 4
Suppose that Assumptions 1-3 hold.
1. If Assumptions 4 and 5 also hold, $E\left[\widehat{\delta}^{\mathrm{trun}}_{SQ,0}\right] = \delta^{\mathrm{trun}}_{SQ,0}$.
2. If Assumptions 7 and 8 also hold, $E\left[\widehat{\delta}^{\mathrm{trun}}_{SQ,1}\right] = \delta^{\mathrm{trun}}_{SQ,1}$.
3. If Assumptions 4-5 and 7-8 also hold, $E\left[\widehat{\delta}^{\mathrm{trun}}_{CB}\right] = \delta^{\mathrm{trun}}_{CB}$.

If there is at least one group untreated from period 1 to $T$, $\delta^{\mathrm{trun}}_{SQ,0} = \delta_{SQ,0}$, so Point 1 of Theorem 4 implies that $\delta_{SQ,0}$ can be unbiasedly estimated. Similarly, if there is at least one group treated from period 1 to $T$, $\delta^{\mathrm{trun}}_{SQ,1} = \delta_{SQ,1}$, so Point 2 of Theorem 4 implies that $\delta_{SQ,1}$ can be unbiasedly estimated. Accordingly, if there is both a never-treated and an always-treated group, $\delta_{CB}$, the parameter the planner needs to know to evaluate the policy changes that took place during the period under consideration, can also be unbiasedly estimated. If there is no never-treated or no always-treated group, $\delta_{CB}$ can no longer be unbiasedly estimated, but one can still unbiasedly estimate $\delta^{\mathrm{trun}}_{CB}$, whose closeness to $\delta_{CB}$ can be assessed by computing $\lambda^{\mathrm{trun}}$. Whether $\lambda^{\mathrm{trun}}$ is close to 1 or not depends on whether $NT$ (resp. $AT$), the number of periods for which at least one group remains untreated (resp. treated), is close to $T$ or not. Accordingly, $\delta^{\mathrm{trun}}_{CB}$ and $\delta_{CB}$ should be close in short panels, or in instances where groups rarely change treatment.

Theorem 4 above relies on Assumptions 4, 5, 7, and 8. We now define placebo estimators one can use to test those assumptions. For any $\ell \in \{0, ..., \lfloor (T-3)/2 \rfloor\}$ and $t \in \{2\ell+3, ..., T\}$, let
\[
\mathrm{DID}^{+,pl}_{t,\ell} = \beta^t \left( \sum_{g: S_g = t-\ell} (1 - D_{g,1}) \frac{N_{g,t}}{N^{\ell,+}_t} \left(Y_{g,t-\ell-1} - Y_{g,t-2\ell-2}\right) - \sum_{g: S_g > t} (1 - D_{g,1}) \frac{N_{g,t}}{N^{nt}_t} \left(Y_{g,t-\ell-1} - Y_{g,t-2\ell-2}\right) \right)
\]
if $N^{\ell,+}_t > 0$ and $N^{nt}_t > 0$, and let $\mathrm{DID}^{+,pl}_{t,\ell} = 0$ otherwise. Let also $N^{\mathrm{DID}^{+,pl}}_{\ell} = \sum_{t=2\ell+3}^{T} N^{\ell,+}_t$, and let
\[
\mathrm{DID}^{+,pl}_{\ell} = \sum_{t=2\ell+3}^{T} \frac{N^{\ell,+}_t}{N^{\mathrm{DID}^{+,pl}}_{\ell}} \mathrm{DID}^{+,pl}_{t,\ell}
\]
if $N^{\mathrm{DID}^{+,pl}}_{\ell} > 0$, and let $\mathrm{DID}^{+,pl}_{\ell} = 0$ otherwise.
$\mathrm{DID}^{+,pl}_{t,\ell}$ compares the outcome evolution in the same two sets of groups as those used in $\mathrm{DID}^{+}_{t,\ell}$, but between periods $t-2\ell-2$ and $t-\ell-1$ instead of $t-\ell-1$ and $t$. Thus, $\mathrm{DID}^{+,pl}_{t,\ell}$ tests if parallel trends holds for $\ell+1$ periods, the same number of periods over which parallel trends has to hold for $\mathrm{DID}^{+}_{t,\ell}$ to be an unbiased estimator of the effect of having changed treatment $\ell$ periods ago.

Similarly, for any $\ell \in \{0, ..., \lfloor (T-3)/2 \rfloor\}$ and $t \in \{2\ell+3, ..., T\}$, let
\[
\mathrm{DID}^{-,pl}_{t,\ell} = \beta^t \left( \sum_{g: S_g = t-\ell} D_{g,1} \frac{N_{g,t}}{N^{\ell,-}_t} \left(Y_{g,t-\ell-1} - Y_{g,t-2\ell-2}\right) - \sum_{g: S_g > t} D_{g,1} \frac{N_{g,t}}{N^{at}_t} \left(Y_{g,t-\ell-1} - Y_{g,t-2\ell-2}\right) \right)
\]
if $N^{\ell,-}_t > 0$ and $N^{at}_t > 0$, and let $\mathrm{DID}^{-,pl}_{t,\ell} = 0$ otherwise. Let also $N^{\mathrm{DID}^{-,pl}}_{\ell} = \sum_{t=2\ell+3}^{T} N^{\ell,-}_t$, and let
\[
\mathrm{DID}^{-,pl}_{\ell} = \sum_{t=2\ell+3}^{T} \frac{N^{\ell,-}_t}{N^{\mathrm{DID}^{-,pl}}_{\ell}} \mathrm{DID}^{-,pl}_{t,\ell}
\]
if $N^{\mathrm{DID}^{-,pl}}_{\ell} > 0$, and let $\mathrm{DID}^{-,pl}_{\ell} = 0$ otherwise.

Theorem 5
Suppose that Assumptions 1-3 hold and $\ell \in \{0, ..., \lfloor (T-3)/2 \rfloor\}$.
1. If Assumptions 4 and 5 also hold, $E\left[\mathrm{DID}^{+,pl}_{\ell}\right] = 0$.
2. If Assumptions 7 and 8 also hold, $E\left[\mathrm{DID}^{-,pl}_{\ell}\right] = 0$.

Theorem 5 shows that $E\left[\mathrm{DID}^{+,pl}_{\ell}\right] = 0$ (resp. $E\left[\mathrm{DID}^{-,pl}_{\ell}\right] = 0$) is a testable implication of Assumptions 4 and 5 (resp. Assumptions 7 and 8), so one can reject those assumptions when $\mathrm{DID}^{+,pl}_{\ell}$ (resp. $\mathrm{DID}^{-,pl}_{\ell}$) is significantly different from 0. One could consider instead first-difference placebo estimators, comparing the $t-k-2$ to $t-k-1$ outcome evolution in groups changing treatment for the first time in $t$ and in groups that have not changed treatment yet in $t$. The trade-off between the first- and long-difference placebo estimators is the same as in staggered adoption designs. These alternative estimators are defined and discussed in our Web Appendix.

On the other hand, $\lambda^{\mathrm{trun}}$ may be low in long panels, and in instances where groups often change treatment. Then, $\delta^{\mathrm{trun}}_{CB}$ may be very different from $\delta_{CB}$. We now consider an assumption on dynamic treatment effects indexed by $k \in \{0, ..., T-1\}$. If this assumption is plausible for some $k$, one may be able to unbiasedly estimate a parameter close to the one the planner needs to know for policy evaluation.

Assumption 9-$k$ (Treatments from up to $k$ periods ago can affect the current outcome) For all $g$, all $t \ge k+1$, and all $(d_1, ..., d_t) \in \{0,1\}^t$, $Y_{g,t}(d_1, ..., d_t) = Y_{g,t}(d_{t-k}, ..., d_t)$.

Assumption 9-$k$ is equivalent to ruling out dynamic effects beyond the $k$th-lagged treatment, an assumption often implicitly made in event-study regressions (see Autor, 2003). For instance, if $k = 1$, only the period-$t$ and period-$(t-1)$ treatments can affect the period-$t$ outcome.
Assumption 9-$k$ is plausible in instances where the treatment is unlikely to have very long-run effects.

Under Assumption 9-$k$, if a group's treatment changes at some point, this change may have an effect for at most $k$ periods thereafter. Accordingly, in her cost-benefit analysis, the planner only needs to take into account the effect of that change at the period it takes place and the $k$ following periods, relative to the scenario where that change had not taken place. This motivates the following generalization of the $\delta_{CB}$ parameter introduced in the previous subsection. For all $g$, $t \ge 2$, and $k \ge 0$, let
\[
S_{g,t,k} = \min\left\{t' \in \{\max(t-k, 2), ..., t\} : D_{g,t'} \ne D_{g,t'-1}\right\}
\]
be the least recent date included between $t-k$ and $t$ (or 2 and $t$ if $t-k < 2$) at which group $g$'s treatment changed. We let $S_{g,t,k} = 0$ if group $g$'s treatment did not change between $\max(t-k, 2) - 1$ and $t$, or if $t = 1$. Let $N_{S,k} = \sum_{(g,t): S_{g,t,k} \ge 2} N_{g,t}$ denote the number of units in the $(g,t)$ cells such that $g$'s treatment changed at least once over the $k$ periods before $t$. Finally, let
\[
\begin{aligned}
\delta_{CB,k} = E\Bigg[ &\sum_{(g,t): S_{g,t,k} \ge 2} (1 - D_{g,S_{g,t,k}-1}) \frac{N_{g,t}}{N_{S,k}} \beta^t \left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{0}_{\min(t,k+1)})\right) \\
- &\sum_{(g,t): S_{g,t,k} \ge 2} D_{g,S_{g,t,k}-1} \frac{N_{g,t}}{N_{S,k}} \beta^t \left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{1}_{\min(t,k+1)})\right) \Bigg].
\end{aligned}
\]
$\delta_{CB,k}$ is a discounted weighted average of the effect of having changed treatment, in all the $(g,t)$ cells such that $g$'s treatment changed at least once over the last $k$ periods before $t$. Remarking that $S_{g,t,T-1} = S_g 1\{t \ge S_g\}$, one can show that $\delta_{CB,T-1}$ is equal to $\delta_{CB}$ in the previous subsection. Also, $\delta_{CB,0}$ with $\beta = 1$ is equal to the average treatment effect among the switchers considered by de Chaisemartin and D'Haultfœuille (2020).

As in the previous sections, we cannot unbiasedly estimate $\delta_{CB,k}$ in general, so we consider a truncated version of it.
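To make the definition of $S_{g,t,k}$ and $N_{S,k}$ concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes treatments and cell sizes are stored as plain lists, one per group, with index 0 holding period 1; the function names `first_switch_date` and `n_switch_cells` are hypothetical.

```python
def first_switch_date(d, t, k):
    """S_{g,t,k}: earliest date t' in {max(t-k, 2), ..., t} at which the
    treatment path d changes (d[t'] != d[t'-1]); 0 if no change, or if t = 1.
    d is a list of 0/1 treatments, with d[0] holding period 1 (assumed layout)."""
    if t < 2:
        return 0
    for tp in range(max(t - k, 2), t + 1):
        if d[tp - 1] != d[tp - 2]:  # periods tp and tp-1 in 1-indexed terms
            return tp
    return 0


def n_switch_cells(D, N, k):
    """N_{S,k}: total number of units in (g, t) cells with S_{g,t,k} != 0.
    D and N are lists of per-group lists (treatments and cell sizes)."""
    total = 0
    for d, n in zip(D, N):
        for t in range(1, len(d) + 1):
            if first_switch_date(d, t, k) != 0:
                total += n[t - 1]
    return total


# Hypothetical panel: group 0 switches on in period 2 and off in period 4,
# group 1 is never treated; every cell has 10 units.
D = [[0, 1, 1, 0], [0, 0, 0, 0]]
N = [[10] * 4, [10] * 4]
s_k0 = [first_switch_date(D[0], t, 0) for t in range(1, 5)]
s_k1 = [first_switch_date(D[0], t, 1) for t in range(1, 5)]
```

With $k = 0$ only the cells where the treatment changes in period $t$ itself are counted, whereas with $k = 1$ the cell one period after a change is counted as well, so $N_{S,k}$ is weakly increasing in $k$ in this example.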
Let
\[
C_{g,t,k} = S_{g,t,k} \times 1\left\{ S_{g,S_{g,t,k}-1,k-1} = 0,\ \exists g' \in \{1, ..., G\} : S_{g',t,k} = S_{g',S_{g,t,k}-1,k-1} = 0,\ D_{g',S_{g,t,k}-1} = D_{g,S_{g,t,k}-1} \right\}.
\]
Thus, $C_{g,t,k} \ne 0$ (and then $C_{g,t,k} \ge 2$) under the following conditions. First, $g$'s treatment has changed at least once over the $k$ periods before $t$ (so that $S_{g,t,k} > 0$) and did not change for at least $k-1$ periods before that change (so that $S_{g,S_{g,t,k}-1,k-1} = 0$). Second, at least another group $g'$ had the same treatment as $g$ before $g$'s treatment changed ($D_{g',S_{g,t,k}-1} = D_{g,S_{g,t,k}-1}$) and did not experience any change in its treatment from the $(k-1)$th period before $g$'s treatment changed until period $t$ ($S_{g',t,k} = S_{g',S_{g,t,k}-1,k-1} = 0$). Under Assumptions 4, 5, 7, 8, and 9-$k$, $g'$ can act as a control to infer the outcome evolution $g$ would have experienced until period $t$ if its treatment had not changed. Let $N^{\mathrm{trun}}_{S,k} = \sum_{(g,t): C_{g,t,k} \ge 2} N_{g,t}$ denote the number of units in the $(g,t)$ cells such that $g$'s treatment changed at least once over the $k$ periods before $t$, and there is at least another group $g'$ that can act as a control for $g$. Finally, let
\[
\begin{aligned}
\delta^{\mathrm{trun}}_{CB,k} = E\Bigg[ &\sum_{(g,t): C_{g,t,k} \ge 2} (1 - D_{g,S_{g,t,k}-1}) \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,k}} \beta^t \left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{0}_{\min(t,k+1)})\right) \\
- &\sum_{(g,t): C_{g,t,k} \ge 2} D_{g,S_{g,t,k}-1} \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,k}} \beta^t \left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{1}_{\min(t,k+1)})\right) \Bigg].
\end{aligned}
\]
$\delta^{\mathrm{trun}}_{CB,k}$ is a version of $\delta_{CB,k}$ truncated from the treatment effect in all the $(g,t)$ cells such that $g$'s treatment changed at least once over the $k$ periods before $t$ but there is no other group $g'$ that can act as a control to estimate the effect of that change. As previously, let
\[
\lambda^{\mathrm{trun}}_k = \frac{\sum_{(g,t): C_{g,t,k} \ge 2} N_{g,t} \beta^t}{\sum_{(g,t): S_{g,t,k} \ge 2} N_{g,t} \beta^t}
\]
denote the "proportion" of $\delta_{CB,k}$'s treatment effects that are also in $\delta^{\mathrm{trun}}_{CB,k}$.
$\delta^{\mathrm{trun}}_{CB,T-1} = \delta^{\mathrm{trun}}_{CB}$, so if $k = T-1$, the estimators we propose are the same as in the previous subsection. We therefore assume that $k \le T-2$. For any $\ell \in \{0, ..., k\}$ and $t \in \{\ell+2, ..., T\}$, let
\[
\begin{aligned}
N^{\ell,+}_t &= \sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) N_{g,t}, \\
N^{\ell,-}_t &= \sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k-1} = 0} D_{g,t-\ell-1} N_{g,t}, \\
N^{\ell,nt}_t &= \sum_{g: S_{g,t,k} = 0,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) N_{g,t}, \\
N^{\ell,at}_t &= \sum_{g: S_{g,t,k} = 0,\ S_{g,t-\ell-1,k-1} = 0} D_{g,t-\ell-1} N_{g,t}.
\end{aligned}
\]
Let
\[
\begin{aligned}
\mathrm{DID}^{+}_{t,\ell} = \beta^t \Bigg( &\sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,+}_t} \left(Y_{g,t} - Y_{g,t-\ell-1}\right) \\
- &\sum_{g: S_{g,t,k} = 0,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,nt}_t} \left(Y_{g,t} - Y_{g,t-\ell-1}\right) \Bigg)
\end{aligned}
\]
if $N^{\ell,+}_t > 0$ and $N^{\ell,nt}_t > 0$, and let $\mathrm{DID}^{+}_{t,\ell} = 0$ otherwise. $\mathrm{DID}^{+}_{t,\ell}$ is the $\beta^t$-discounted DID estimator comparing the outcome evolution from period $t-\ell-1$ to $t$ in groups untreated from period $t-\ell-1-k$ to $t-\ell-1$ and treated in $t-\ell$ to the same evolution in groups untreated from period $t-\ell-1-k$ to $t$. Under Assumptions 4-5 and 9-$k$, the expectation of the outcome evolution in the latter set of groups is a counterfactual of the evolution that would have taken place in the former set of groups if it had remained untreated till $t$. Thus, $\mathrm{DID}^{+}_{t,\ell}$ is an unbiased estimator of the effect of not having remained untreated till $t$ in those groups.
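As an illustration of the estimator just defined, here is a minimal Python sketch of $\mathrm{DID}^{+}_{t,\ell}$ under Assumption 9-$k$. It is a sketch under stated assumptions rather than the authors' code: outcomes `Y`, treatments `D` and cell sizes `N` are assumed to be lists of per-group lists with index 0 holding period 1, and the names `first_switch_date` and `did_plus` are hypothetical.

```python
def first_switch_date(d, t, k):
    """S_{g,t,k}: earliest t' in {max(t-k, 2), ..., t} with d[t'] != d[t'-1],
    0 if there is none or if t = 1. d[0] holds period 1 (assumed layout)."""
    if t < 2:
        return 0
    for tp in range(max(t - k, 2), t + 1):
        if d[tp - 1] != d[tp - 2]:
            return tp
    return 0


def did_plus(Y, D, N, t, l, k, beta=1.0):
    """beta^t-discounted DID comparing the t-l-1 to t outcome evolution of
    groups switching from untreated to treated in t-l with that of groups
    remaining untreated, among groups with a stable treatment before t-l."""
    base = t - l - 1  # last period before the switch
    switchers, controls = [], []
    for g, d in enumerate(D):
        # keep groups untreated in `base` whose treatment was stable before it
        if d[base - 1] != 0 or first_switch_date(d, base, k - 1) != 0:
            continue
        s = first_switch_date(d, t, k)
        if s == t - l:
            switchers.append(g)
        elif s == 0:
            controls.append(g)
    n_sw = sum(N[g][t - 1] for g in switchers)   # N_t^{l,+}
    n_ct = sum(N[g][t - 1] for g in controls)    # N_t^{l,nt}
    if n_sw == 0 or n_ct == 0:
        return 0.0

    def evol(g):
        return Y[g][t - 1] - Y[g][base - 1]

    avg_sw = sum(N[g][t - 1] * evol(g) for g in switchers) / n_sw
    avg_ct = sum(N[g][t - 1] * evol(g) for g in controls) / n_ct
    return beta ** t * (avg_sw - avg_ct)


# Hypothetical three-group, three-period panel with one unit per cell.
D = [[0, 1, 1], [0, 0, 0], [0, 0, 1]]
Y = [[1, 4, 5], [1, 2, 3], [2, 3, 7]]
N = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
instantaneous = did_plus(Y, D, N, t=2, l=0, k=0)
one_lag = did_plus(Y, D, N, t=3, l=1, k=1, beta=0.5)
```

In this toy panel, group 2 serves as a control for the period-2 comparison but drops out of the period-3 comparison with $\ell = 1$, because its own treatment changes in period 3. The sketch returns 0 whenever no switcher or no control is available, mirroring the convention in the definition.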
Let
\[
\begin{aligned}
\mathrm{DID}^{-}_{t,\ell} = \beta^t \Bigg( &\sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k-1} = 0} D_{g,t-\ell-1} \frac{N_{g,t}}{N^{\ell,-}_t} \left(Y_{g,t} - Y_{g,t-\ell-1}\right) \\
- &\sum_{g: S_{g,t,k} = 0,\ S_{g,t-\ell-1,k-1} = 0} D_{g,t-\ell-1} \frac{N_{g,t}}{N^{\ell,at}_t} \left(Y_{g,t} - Y_{g,t-\ell-1}\right) \Bigg)
\end{aligned}
\]
if $N^{\ell,-}_t > 0$ and $N^{\ell,at}_t > 0$, and let $\mathrm{DID}^{-}_{t,\ell} = 0$ otherwise. $\mathrm{DID}^{-}_{t,\ell}$ is the $\beta^t$-discounted DID estimator comparing the outcome evolution from period $t-\ell-1$ to $t$ in groups treated from period $t-\ell-1-k$ to $t-\ell-1$ and untreated in $t-\ell$ to the same evolution in groups treated from period $t-\ell-1-k$ to $t$. Under Assumptions 7-8 and 9-$k$, $\mathrm{DID}^{-}_{t,\ell}$ is an unbiased estimator of the effect of not having remained treated till $t$ in the former set of groups.

Then, let $N^{\mathrm{DID}}_{\ell} = \sum_{t \ge \ell+2,\, N^{\ell,nt}_t > 0} N^{\ell,+}_t + \sum_{t \ge \ell+2,\, N^{\ell,at}_t > 0} N^{\ell,-}_t$. Let
\[
\mathrm{DID}_{\ell} = \sum_{t=\ell+2}^{T} \left( \frac{N^{\ell,+}_t}{N^{\mathrm{DID}}_{\ell}} \mathrm{DID}^{+}_{t,\ell} - \frac{N^{\ell,-}_t}{N^{\mathrm{DID}}_{\ell}} \mathrm{DID}^{-}_{t,\ell} \right)
\]
if $N^{\mathrm{DID}}_{\ell} > 0$, and let $\mathrm{DID}_{\ell} = 0$ otherwise. For $k = 0$ and $\beta = 1$, $\mathrm{DID}_0$ is equal to the $\mathrm{DID}_M$ estimator in de Chaisemartin and D'Haultfœuille (2020). Finally, let $\widehat{\delta}^{\mathrm{trun}}_{CB,k} = \sum_{\ell=0}^{k} \left(N^{\mathrm{DID}}_{\ell} / N^{\mathrm{trun}}_{S,k}\right) \mathrm{DID}_{\ell}$.

Theorem 6
Suppose that Assumptions 2-5, 7, 8, and 9-$k$ hold and $N^{\mathrm{trun}}_{S,k} > 0$. Then, $E\left[\widehat{\delta}^{\mathrm{trun}}_{CB,k}\right] = \delta^{\mathrm{trun}}_{CB,k}$.

Suppose Assumption 9-0 is plausible. If for all $(g,t)$ such that $t \ge 2$ and $D_{g,t} \ne D_{g,t-1}$, there is another $g'$ such that $D_{g',t} = D_{g',t-1} = D_{g,t-1}$, then $\delta^{\mathrm{trun}}_{CB,0} = \delta_{CB,0}$. Therefore, Theorem 6 implies that the parameter the planner needs to know to evaluate the policy changes that took place over the period can be unbiasedly estimated. Note also that in this case, with $k = 0$ and $\beta = 1$, Theorem 6 is equivalent to Theorem 3 in de Chaisemartin and D'Haultfœuille (2020).

Now, suppose Assumption 9-0 is implausible, but Assumption 9-1 is. Moreover, assume that the following two conditions hold:
1. for all $t \ge 3$, all $(g,t)$ such that $D_{g,t} \ne D_{g,t-1}$ are also such that $D_{g,t-1} = D_{g,t-2}$ and there is another $g'$ such that $D_{g',t+1} = D_{g',t} = D_{g',t-1} = D_{g',t-2} = D_{g,t-1}$;
2. for all $g$ such that $D_{g,2} \ne D_{g,1}$, there is another $g'$ such that $D_{g',2} = D_{g',1} = D_{g,1}$.

Then $\delta^{\mathrm{trun}}_{CB,1} = \delta_{CB,1}$, and Theorem 6 implies that $\delta_{CB,1}$ can be unbiasedly estimated. Notice that the conditions under which $\delta^{\mathrm{trun}}_{CB,1} = \delta_{CB,1}$ are stronger than those under which $\delta^{\mathrm{trun}}_{CB,0} = \delta_{CB,0}$. Therefore, it is more likely that only the truncated parameter can be unbiasedly estimated when one works under Assumption 9-1 than when one works under Assumption 9-0. Similarly, when only the truncated parameter can be estimated in both cases, that parameter should be closer to the untruncated one under Assumption 9-0 than under Assumption 9-1 ($\lambda^{\mathrm{trun}}_1 < \lambda^{\mathrm{trun}}_0$). The same conclusion applies to higher values of $k$: one should typically have that $k \mapsto \lambda^{\mathrm{trun}}_k$ is decreasing. Then, there is a trade-off between the plausibility of the assumptions one imposes, and the relevance of the parameter that can be estimated under those assumptions.

As discussed previously, some groups may have already been treated prior to period 1, and those treatments may still affect some of their period-1-to-$T$ outcomes. However, under Assumption 9-$k$, one can circumvent this problem by redefining the $\mathrm{DID}_{\ell}$ estimators above, considering only time periods after $k+1$.

Finally, Theorem 6 above relies on Assumptions 4, 5, 7, and 8. Under Assumption 9-$k$, the placebo estimators one can use to test those assumptions differ from those in the previous section.
For any $\ell \in \{0, ..., \min(\lfloor (T-3)/2 \rfloor, k)\}$ and $t \in \{2\ell+3, ..., T\}$, let
\[
\begin{aligned}
\mathrm{DID}^{+,pl}_{t,\ell} = \beta^t \Bigg( &\sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k+\ell} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,+}_t} \left(Y_{g,t-\ell-1} - Y_{g,t-2\ell-2}\right) \\
- &\sum_{g: S_{g,t,k} = 0,\ S_{g,t-\ell-1,k+\ell} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,nt}_t} \left(Y_{g,t-\ell-1} - Y_{g,t-2\ell-2}\right) \Bigg)
\end{aligned}
\]
if $N^{\ell,+}_t > 0$ and $N^{\ell,nt}_t > 0$, and let $\mathrm{DID}^{+,pl}_{t,\ell} = 0$ otherwise. Note that $\{g: S_{g,t-\ell-1,k+\ell} = 0\}$ is the subset of $\{g: S_{g,t-\ell-1,k-1} = 0\}$ (the latter being used in $\mathrm{DID}^{+}_{t,\ell}$) for which $D_{g,t-\ell-1} = ... = D_{g,t-2\ell-2-k}$. Let also $N^{\mathrm{DID}^{+,pl}}_{\ell} = \sum_{t=2\ell+3}^{T} N^{\ell,+}_t$, and let
\[
\mathrm{DID}^{+,pl}_{\ell} = \sum_{t=2\ell+3}^{T} \frac{N^{\ell,+}_t}{N^{\mathrm{DID}^{+,pl}}_{\ell}} \mathrm{DID}^{+,pl}_{t,\ell}
\]
if $N^{\mathrm{DID}^{+,pl}}_{\ell} > 0$, and let $\mathrm{DID}^{+,pl}_{\ell} = 0$ otherwise. Similarly, let
\[
\begin{aligned}
\mathrm{DID}^{-,pl}_{t,\ell} = \beta^t \Bigg( &\sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k+\ell} = 0} D_{g,t-\ell-1} \frac{N_{g,t}}{N^{\ell,-}_t} \left(Y_{g,t-\ell-1} - Y_{g,t-2\ell-2}\right) \\
- &\sum_{g: S_{g,t,k} = 0,\ S_{g,t-\ell-1,k+\ell} = 0} D_{g,t-\ell-1} \frac{N_{g,t}}{N^{\ell,at}_t} \left(Y_{g,t-\ell-1} - Y_{g,t-2\ell-2}\right) \Bigg)
\end{aligned}
\]
if $N^{\ell,-}_t > 0$ and $N^{\ell,at}_t > 0$, and let $\mathrm{DID}^{-,pl}_{t,\ell} = 0$ otherwise. Let also $N^{\mathrm{DID}^{-,pl}}_{\ell} = \sum_{t=2\ell+3}^{T} N^{\ell,-}_t$, and let
\[
\mathrm{DID}^{-,pl}_{\ell} = \sum_{t=2\ell+3}^{T} \frac{N^{\ell,-}_t}{N^{\mathrm{DID}^{-,pl}}_{\ell}} \mathrm{DID}^{-,pl}_{t,\ell}
\]
if $N^{\mathrm{DID}^{-,pl}}_{\ell} > 0$, and let $\mathrm{DID}^{-,pl}_{\ell} = 0$ otherwise.

Theorem 7
Suppose that Assumptions 1-3 and 9-$k$ hold, and $\ell \in \{0, ..., \min(\lfloor (T-3)/2 \rfloor, k)\}$.
1. If Assumptions 4 and 5 also hold, $E\left[\mathrm{DID}^{+,pl}_{\ell}\right] = 0$.
2. If Assumptions 7 and 8 also hold, $E\left[\mathrm{DID}^{-,pl}_{\ell}\right] = 0$.

Theorem 7 shows that under Assumption 9-$k$, $E\left[\mathrm{DID}^{+,pl}_{\ell}\right] = 0$ (resp. $E\left[\mathrm{DID}^{-,pl}_{\ell}\right] = 0$) is a testable implication of Assumptions 4 and 5 (resp. Assumptions 7 and 8), so one can reject those assumptions when $\mathrm{DID}^{+,pl}_{\ell}$ (resp. $\mathrm{DID}^{-,pl}_{\ell}$) is significantly different from 0. Again, one could instead define first-difference placebo estimators under Assumption 9-$k$. These alternative placebos are defined and discussed in our Web Appendix.

References

Abadie, A. (2005), 'Semiparametric difference-in-differences estimators', Review of Economic Studies 72(1), 1–19.

Abbring, J. H. and Van den Berg, G. J. (2003), 'The nonparametric identification of treatment effects in duration models', Econometrica 71(5), 1491–1517.

Abraham, S. and Sun, L. (2020), Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Working Paper.

Ashenfelter, O. (1978), 'Estimating the effect of training programs on earnings', The Review of Economics and Statistics pp. 47–57.

Athey, S. and Imbens, G. W. (2018), Design-based analysis in difference-in-differences settings with staggered adoption, Technical report, National Bureau of Economic Research.

Autor, D. H. (2003), 'Outsourcing at will: The contribution of unjust dismissal doctrine to the growth of employment outsourcing', Journal of Labor Economics 21(1), 1–42.

Bertrand, M., Duflo, E. and Mullainathan, S. (2004), 'How much should we trust differences-in-differences estimates?', The Quarterly Journal of Economics 119(1), 249–275.

Borusyak, K. and Jaravel, X. (2017), Revisiting event study designs. Working Paper.

Callaway, B. and Sant'Anna, P. H. (2018), Difference-in-differences with multiple time periods and an application on the minimum wage and employment. arXiv e-print 1803.09015.

de Chaisemartin, C. and D'Haultfœuille, X. (2020), 'Two-way fixed effects estimators with heterogeneous treatment effects', American Economic Review, Forthcoming.

de Chaisemartin, C., D'Haultfœuille, X. and Guyonvarch, Y. (2019), 'DID_MULTIPLEGT: Stata module to estimate sharp Difference-in-Difference designs with multiple groups and periods', Statistical Software Components, Boston College Department of Economics. URL: https://ideas.repec.org/c/boc/bocode/s458643.html

Manski, C. F. (2005), Social choice with partial knowledge of treatment response, Princeton University Press.

Schmidheiny, K. and Siegloch, S. (2020), On event studies and distributed-lags in two-way fixed effects models: Identification, equivalence, and generalization. ZEW Discussion Paper 20-01.
Appendix: proofs
A.1 Proofs of Theorems 1 and 2
Let $\mathbf{D} = (\mathbf{D}_g)_{g=1,...,G}$. We first prove the following lemma.

Lemma 1
If Assumptions 2-6 hold, then for any $\ell \in \{0, ..., T-2\}$, $E[\mathrm{DID}_{\ell} \mid \mathbf{D}] = E[\Delta^{\mathrm{trun}}_{\ell} \mid \mathbf{D}]$.

Proof:
$\mathrm{DID}_{\ell} 1\{L < \ell\} = \Delta^{\mathrm{trun}}_{\ell} 1\{L < \ell\} = 0$, so to prove the result, it is sufficient to show that
\[
1\{L \ge \ell\} E[\mathrm{DID}_{\ell} \mid \mathbf{D}] = 1\{L \ge \ell\} E[\Delta^{\mathrm{trun}}_{\ell} \mid \mathbf{D}].
\]
We consider an arbitrary $\ell \in \{0, ..., L\}$. By Assumption 5, for all $t \ge 2$ there is a real number $\psi_t$ such that $\psi_t = E\left(Y_{g,t}(\mathbf{0}_t) - Y_{g,t-1}(\mathbf{0}_{t-1})\right)$ for all $g$. Then, for all $g$ and all $t \ge \ell+2$,
\[
E\left[Y_{g,t}(\mathbf{0}_t) - Y_{g,t-\ell-1}(\mathbf{0}_{t-\ell-1})\right] = \sum_{k=0}^{\ell} \psi_{t-k}. \tag{4}
\]
Then, for all $t \in \{\ell+2, ..., T\}$ such that $N^{\ell}_t > 0$ and $N^{nt}_t > 0$,
\[
\begin{aligned}
E[\mathrm{DID}_{t,\ell} \mid \mathbf{D}]
&= \beta^t \Bigg( \sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^{\ell}_t} E[Y_{g,t} - Y_{g,t-\ell-1} \mid \mathbf{D}] - \sum_{g: F_g > t} \frac{N_{g,t}}{N^{nt}_t} E[Y_{g,t} - Y_{g,t-\ell-1} \mid \mathbf{D}] \Bigg) \\
&= \beta^t \Bigg( \sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^{\ell}_t} E[Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t) \mid \mathbf{D}] + \sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^{\ell}_t} E[Y_{g,t}(\mathbf{0}_t) - Y_{g,t-\ell-1}(\mathbf{0}_{t-\ell-1}) \mid \mathbf{D}] \\
&\qquad\quad - \sum_{g: F_g > t} \frac{N_{g,t}}{N^{nt}_t} E[Y_{g,t}(\mathbf{0}_t) - Y_{g,t-\ell-1}(\mathbf{0}_{t-\ell-1}) \mid \mathbf{D}] \Bigg) \\
&= \beta^t \Bigg( \sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^{\ell}_t} E[Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t) \mid \mathbf{D}] + \sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^{\ell}_t} E[Y_{g,t}(\mathbf{0}_t) - Y_{g,t-\ell-1}(\mathbf{0}_{t-\ell-1})] \\
&\qquad\quad - \sum_{g: F_g > t} \frac{N_{g,t}}{N^{nt}_t} E[Y_{g,t}(\mathbf{0}_t) - Y_{g,t-\ell-1}(\mathbf{0}_{t-\ell-1})] \Bigg) \\
&= \beta^t E\Bigg[ \sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^{\ell}_t} \left(Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t)\right) \,\Bigg|\, \mathbf{D} \Bigg].
\end{aligned} \tag{5}
\]
The first equality holds by the definition of $\mathrm{DID}_{t,\ell}$ and the fact that $N^{\ell}_t > 0$ and $N^{nt}_t > 0$. The second equality holds by Assumptions 3 and 6. The third equality follows from Assumptions 2 and 4. The last equality follows from Equation (4). Then,
\[
\begin{aligned}
1\{L \ge \ell\} E[\mathrm{DID}_{\ell} \mid \mathbf{D}]
&= 1\{L \ge \ell, N^{\mathrm{DID}}_{\ell} > 0\} \Bigg( \sum_{t=\ell+2}^{T} \frac{N^{\ell}_t}{N^{\mathrm{DID}}_{\ell}} E[\mathrm{DID}_{t,\ell} \mid \mathbf{D}] \Bigg) \\
&= 1\{L \ge \ell, N^{\mathrm{DID}}_{\ell} > 0\} \sum_{t=\ell+2}^{NT} \beta^t E\Bigg[ \sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^{\mathrm{trun}}_{\ell}} \left(Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t)\right) \,\Bigg|\, \mathbf{D} \Bigg] \\
&= 1\{L \ge \ell, N^{\mathrm{DID}}_{\ell} > 0\} E\left[\Delta^{\mathrm{trun}}_{\ell} \mid \mathbf{D}\right] \\
&= 1\{L \ge \ell\} E\left[\Delta^{\mathrm{trun}}_{\ell} \mid \mathbf{D}\right].
\end{aligned} \tag{6}
\]
The first equality follows from the definition of $\mathrm{DID}_{\ell}$. The second equality follows from the following facts. First, $\mathrm{DID}_{t,\ell} = 0$ for $t > NT$. Second, if $t \le NT$, we have $N^{nt}_t > 0$ and thus Equation (5) holds if $N^{\ell}_t > 0$. Third, if $N^{\ell}_t = 0$, we have $\sum_{g: F_g = t-\ell} N_{g,t} \left(Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t)\right) = \mathrm{DID}_{t,\ell} = 0$. Finally, we use $N^{\mathrm{trun}}_{\ell} = N^{\mathrm{DID}}_{\ell}$. The last equality follows from $N^{\mathrm{trun}}_{\ell} = N^{\mathrm{DID}}_{\ell}$, and $\Delta^{\mathrm{trun}}_{\ell} = 0$ if $N^{\mathrm{trun}}_{\ell} = 0$. $\square$

Turning to Theorem 2, Point 1 follows from Lemma 1 and the law of iterated expectations. Point 2 follows from $N^{\mathrm{DID}}_{\ell} = N^{\mathrm{trun}}_{\ell}$, Lemma 1, the law of iterated expectations and (3).

Finally, we prove Theorem 1. Point 1 follows from Point 1 of Theorem 2, and the fact that $\max_{g \in \{1,...,G\}} F_g > \max_{g: 2 \le F_g \le T-\ell} F_g + \ell$ implies that $\delta^{\mathrm{trun}}_{\ell} = \delta_{\ell}$. Point 2 follows from Point 2 of Theorem 2, and the fact that $\max_{g \in \{1,...,G\}} F_g = T+1$ implies that $\delta^{\mathrm{trun}}_{SQ} = \delta_{SQ}$.

A.2 Proof of Theorem 3
Following the same steps as those used to obtain (5), we get, whenever $N^{\ell}_t > 0$ and $N^{nt}_t > 0$,
\[
E[\mathrm{DID}^{pl}_{t,\ell} \mid \mathbf{D}] = \beta^t E\Bigg[ \sum_{g: F_g = t-\ell} \frac{N_{g,t}}{N^{\ell}_t} \left(Y_{g,t-\ell-1}(\mathbf{D}_{g,t-\ell-1}) - Y_{g,t-\ell-1}(\mathbf{0}_{t-\ell-1})\right) \,\Bigg|\, \mathbf{D} \Bigg] = 0,
\]
where the second equality follows since by definition, $F_g = t-\ell$ implies that $\mathbf{D}_{g,t-\ell-1} = \mathbf{0}_{t-\ell-1}$. The result follows using the same reasoning as that used to obtain (6).

A.3 Proof of Theorem 4
1. First, note that for any $\ell \in \{0, ..., T-2\}$ and $t \in \{\ell+2, ..., T\}$ such that $N^{\ell,+}_t > 0$ and $N^{nt}_t > 0$,
\[
\begin{aligned}
E\left(\mathrm{DID}^{+}_{t,\ell} \mid \mathbf{D}\right)
&= \beta^t \Bigg( \sum_{g: S_g = t-\ell} (1 - D_{g,1}) \frac{N_{g,t}}{N^{\ell,+}_t} E(Y_{g,t} - Y_{g,t-\ell-1} \mid \mathbf{D}) - \sum_{g: S_g > t} (1 - D_{g,1}) \frac{N_{g,t}}{N^{nt}_t} E(Y_{g,t} - Y_{g,t-\ell-1} \mid \mathbf{D}) \Bigg) \\
&= \beta^t \Bigg( \sum_{g: S_g = t-\ell} (1 - D_{g,1}) \frac{N_{g,t}}{N^{\ell,+}_t} E(Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t) \mid \mathbf{D}) \\
&\qquad\quad + \sum_{g: S_g = t-\ell} (1 - D_{g,1}) \frac{N_{g,t}}{N^{\ell,+}_t} E(Y_{g,t}(\mathbf{0}_t) - Y_{g,t-\ell-1}(\mathbf{0}_{t-\ell-1}) \mid \mathbf{D}) \\
&\qquad\quad - \sum_{g: S_g > t} (1 - D_{g,1}) \frac{N_{g,t}}{N^{nt}_t} E(Y_{g,t}(\mathbf{0}_t) - Y_{g,t-\ell-1}(\mathbf{0}_{t-\ell-1}) \mid \mathbf{D}) \Bigg) \\
&= \beta^t \sum_{g: S_g = t-\ell} (1 - D_{g,1}) \frac{N_{g,t}}{N^{\ell,+}_t} E(Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t) \mid \mathbf{D}).
\end{aligned} \tag{7}
\]
The first equality follows from the definition of $\mathrm{DID}^{+}_{t,\ell}$, $N^{\ell,+}_t > 0$ and $N^{nt}_t > 0$. The second equality follows from Assumption 3. The third equality follows from Assumptions 2 and 4 and Equation (4). Then,
\[
\begin{aligned}
\sum_{\ell=0}^{L} \frac{N^{\mathrm{DID}^+}_{\ell}}{N^{\mathrm{trun}}_{S,0}} E\left(\mathrm{DID}^{+}_{\ell} \mid \mathbf{D}\right)
&= \sum_{\ell=0}^{L} \sum_{t=\ell+2}^{T} \frac{N^{\ell,+}_t}{N^{\mathrm{trun}}_{S,0}} E\left(\mathrm{DID}^{+}_{t,\ell} \mid \mathbf{D}\right) \\
&= \sum_{\ell=0}^{L} \sum_{t=\ell+2}^{NT} \beta^t \sum_{g: S_g = t-\ell} (1 - D_{g,1}) \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,0}} E(Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t) \mid \mathbf{D}) \\
&= \sum_{g: S_g \le NT} (1 - D_{g,1}) \sum_{t=S_g}^{NT} \beta^t \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,0}} E(Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t) \mid \mathbf{D}).
\end{aligned} \tag{8}
\]
The first equality follows from the definition of $\mathrm{DID}^{+}_{\ell}$, and from the fact that $\mathrm{DID}^{+}_{t,\ell} = 0$ for all $t \in \{\ell+2, ..., T\}$ if $N^{\mathrm{DID}^+}_{\ell} = 0$. The second equality follows from the following facts. First, if $t > NT$, $\mathrm{DID}^{+}_{t,\ell} = 0$. Second, if $t \le NT$, $N^{nt}_t > 0$ by definition of $NT$. Then we use Equation (7) if $N^{\ell,+}_t > 0$, and if $N^{\ell,+}_t = 0$,
\[
\mathrm{DID}^{+}_{t,\ell} = \sum_{g: S_g = t-\ell} (1 - D_{g,1}) \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,0}} E(Y_{g,t}(\mathbf{D}_{g,t}) - Y_{g,t}(\mathbf{0}_t) \mid \mathbf{D}) = 0.
\]
Then Point 1 follows from Equation (8) and the law of iterated expectations. Point 2 follows from the same reasoning, using Assumptions 7 and 8 instead of Assumptions 4 and 5. Point 3 follows directly from Points 1 and 2.
A.4 Proof of Theorem 5
Following the same steps as those used to obtain (7), we get, whenever $N^{\ell,+}_t > 0$ and $N^{nt}_t > 0$,
\[
E[\mathrm{DID}^{+,pl}_{t,\ell} \mid \mathbf{D}] = \beta^t \sum_{g: S_g = t-\ell} (1 - D_{g,1}) \frac{N_{g,t}}{N^{\ell,+}_t} E\left[Y_{g,t-\ell-1}(\mathbf{D}_{g,t-\ell-1}) - Y_{g,t-\ell-1}(\mathbf{0}_{t-\ell-1}) \mid \mathbf{D}\right] = 0,
\]
where the second equality follows since by definition, $S_g = t-\ell$ and $D_{g,1} = 0$ imply that $\mathbf{D}_{g,t-\ell-1} = \mathbf{0}_{t-\ell-1}$. Then, Point 1 follows using the same reasoning as that used to obtain (8). Point 2 can be obtained similarly.

A.5 Proof of Theorem 6
Under Assumption 9-$k$, it follows from Equation (4) that for all $g$, all $\ell \ge 0$ and all $t \ge \ell+2$,
\[
E\left[Y_{g,t}(\mathbf{0}_{\min(t,k+1)}) - Y_{g,t-\ell-1}(\mathbf{0}_{\min(t-\ell-1,k+1)})\right] = \sum_{j=0}^{\ell} \psi_{t-j}. \tag{9}
\]
For any $\ell \in \{0, ..., k\}$ and $t \in \{\ell+2, ..., T\}$ such that $N^{\ell,+}_t > 0$ and $N^{\ell,nt}_t > 0$,
\[
\begin{aligned}
E\left(\mathrm{DID}^{+}_{t,\ell} \mid \mathbf{D}\right)
&= \beta^t \Bigg( \sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,+}_t} E(Y_{g,t} - Y_{g,t-\ell-1} \mid \mathbf{D}) \\
&\qquad\quad - \sum_{g: S_{g,t,k} = 0,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,nt}_t} E(Y_{g,t} - Y_{g,t-\ell-1} \mid \mathbf{D}) \Bigg) \\
&= \beta^t \Bigg( \sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,+}_t} E\left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{0}_{\min(t,k+1)}) \mid \mathbf{D}\right) \\
&\qquad\quad + \sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,+}_t} E\left(Y_{g,t}(\mathbf{0}_{\min(t,k+1)}) - Y_{g,t-\ell-1}(\mathbf{0}_{\min(t-\ell-1,k+1)}) \mid \mathbf{D}\right) \\
&\qquad\quad - \sum_{g: S_{g,t,k} = 0,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,nt}_t} E\left(Y_{g,t}(\mathbf{0}_{\min(t,k+1)}) - Y_{g,t-\ell-1}(\mathbf{0}_{\min(t-\ell-1,k+1)}) \mid \mathbf{D}\right) \Bigg) \\
&= \beta^t \sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k-1} = 0} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,+}_t} E\left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{0}_{\min(t,k+1)}) \mid \mathbf{D}\right).
\end{aligned} \tag{10}
\]
The first equality follows from the definition of $\mathrm{DID}^{+}_{t,\ell}$ and $N^{\ell,+}_t > 0$ and $N^{\ell,nt}_t > 0$. The second equality follows from Assumptions 3 and 9-$k$ and the definitions of $S_{g,t,k}$ and $S_{g,t-\ell-1,k-1}$. The third equality follows from Assumptions 2 and 4 and Equation (9).

Similarly, one can show that for any $\ell \in \{0, ..., k\}$ and $t \in \{\ell+2, ..., T\}$ such that $N^{\ell,-}_t > 0$ and $N^{\ell,at}_t > 0$,
\[
E\left(\mathrm{DID}^{-}_{t,\ell} \mid \mathbf{D}\right) = \beta^t \sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k-1} = 0} D_{g,t-\ell-1} \frac{N_{g,t}}{N^{\ell,-}_t} E\left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{1}_{\min(t,k+1)}) \mid \mathbf{D}\right). \tag{11}
\]
Finally,
\[
\begin{aligned}
\sum_{\ell=0}^{k} \frac{N^{\mathrm{DID}}_{\ell}}{N^{\mathrm{trun}}_{S,k}} E(\mathrm{DID}_{\ell} \mid \mathbf{D})
&= \sum_{\ell=0}^{k} \sum_{t=\ell+2}^{T} \Bigg( \frac{N^{\ell,+}_t}{N^{\mathrm{trun}}_{S,k}} E\left(\mathrm{DID}^{+}_{t,\ell} \mid \mathbf{D}\right) - \frac{N^{\ell,-}_t}{N^{\mathrm{trun}}_{S,k}} E\left(\mathrm{DID}^{-}_{t,\ell} \mid \mathbf{D}\right) \Bigg) \\
&= \sum_{\ell=0}^{k} \sum_{t=\ell+2}^{T} \beta^t \sum_{g: C_{g,t,k} = t-\ell} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,k}} E\left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{0}_{\min(t,k+1)}) \mid \mathbf{D}\right) \\
&\quad - \sum_{\ell=0}^{k} \sum_{t=\ell+2}^{T} \beta^t \sum_{g: C_{g,t,k} = t-\ell} D_{g,t-\ell-1} \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,k}} E\left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{1}_{\min(t,k+1)}) \mid \mathbf{D}\right) \\
&= \sum_{(g,t): C_{g,t,k} \ge 2} (1 - D_{g,S_{g,t,k}-1}) \beta^t \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,k}} E\left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{0}_{\min(t,k+1)}) \mid \mathbf{D}\right) \\
&\quad - \sum_{(g,t): C_{g,t,k} \ge 2} D_{g,S_{g,t,k}-1} \beta^t \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,k}} E\left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{1}_{\min(t,k+1)}) \mid \mathbf{D}\right).
\end{aligned} \tag{12}
\]
The first equality follows from the definition of $\mathrm{DID}_{\ell}$, and from the fact that $\mathrm{DID}^{+}_{t,\ell} = \mathrm{DID}^{-}_{t,\ell} = 0$ for all $t \in \{\ell+2, ..., T\}$ if $N^{\mathrm{DID}}_{\ell} = 0$. The second equality follows from the following facts. First,
\[
\mathrm{DID}^{+}_{t,\ell} = \beta^t \sum_{g: C_{g,t,k} = t-\ell} (1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,k}} E\left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{0}_{\min(t,k+1)}) \mid \mathbf{D}\right) = 0
\]
if $N^{\ell,+}_t = 0$ or $N^{\ell,nt}_t = 0$, and
\[
\mathrm{DID}^{-}_{t,\ell} = \beta^t \sum_{g: C_{g,t,k} = t-\ell} D_{g,t-\ell-1} \frac{N_{g,t}}{N^{\mathrm{trun}}_{S,k}} E\left(Y_{g,t}(D_{g,\max(t-k,1)}, ..., D_{g,t}) - Y_{g,t}(\mathbf{1}_{\min(t,k+1)}) \mid \mathbf{D}\right) = 0
\]
if $N^{\ell,-}_t = 0$ or $N^{\ell,at}_t = 0$. Second, if $N^{\ell,+}_t > 0$ and $N^{\ell,nt}_t > 0$ (resp. $N^{\ell,-}_t > 0$ and $N^{\ell,at}_t > 0$), Equation (10) (resp. (11)) holds.

The result follows from Equation (12) and the law of iterated expectations.

A.6 Proof of Theorem 7
Following the same steps as those used to obtain (10), we get, whenever $N^{\ell,+}_t > 0$ and $N^{\ell,nt}_t > 0$,
\[
\begin{aligned}
E[\mathrm{DID}^{+,pl}_{t,\ell} \mid \mathbf{D}] = \beta^t \sum_{g: S_{g,t,k} = t-\ell,\ S_{g,t-\ell-1,k+\ell} = 0} &(1 - D_{g,t-\ell-1}) \frac{N_{g,t}}{N^{\ell,+}_t} E\Big[Y_{g,t-\ell-1}(D_{g,\max(t-\ell-1-k,1)}, ..., D_{g,t-\ell-1}) \\
&- Y_{g,t-\ell-1}(\mathbf{0}_{\min(t-\ell-1,k+1)}) \,\Big|\, \mathbf{D}\Big] = 0,
\end{aligned}
\]
where the second equality follows since $S_{g,t-\ell-1,k+\ell} = 0$ and $D_{g,t-\ell-1} = 0$ imply $(D_{g,\max(t-\ell-1-k,1)}, ..., D_{g,t-\ell-1}) = \mathbf{0}_{\min(t-\ell-1,k+1)}$