A Design-Based Perspective on Synthetic Control Methods
Lea Bottmer ∗ Guido Imbens † Jann Spiess ‡ Merrill Warnick §
First version: November 2020
Current version: January 2021
Abstract
Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control (SC) methods have quickly become one of the leading methods for estimating causal effects in observational studies with panel data. Formal discussions often motivate SC methods by the assumption that the potential outcomes were generated by a factor model. Here we study SC methods from a design-based perspective, assuming a model for the selection of the treated unit(s), e.g., random selection as guaranteed in a randomized experiment. We show that SC methods offer benefits even in settings with randomized assignment, and that the design perspective offers new insights into SC methods for observational data. A first insight is that the standard SC estimator is not unbiased under random assignment. We propose a simple modification of the SC estimator that guarantees unbiasedness in this setting and derive its exact, randomization-based, finite sample variance. We also propose an unbiased estimator for this variance. We show in settings with real data that under random assignment this Modified Unbiased Synthetic Control (MUSC) estimator can have a root mean-squared error (RMSE) that is substantially lower than that of the difference-in-means estimator. We show that such an improvement is weakly guaranteed if the treated period is similar to the other periods, for example, if the treated period was randomly selected. The improvement is most likely to be substantial if the number of pre-treatment periods is large relative to the number of control units.
Keywords: Randomization, Synthetic Controls, Panel Data, Causal Effects

∗ Department of Economics, Stanford University, [email protected].
† Professor of Economics, Graduate School of Business, and Department of Economics, Stanford University, SIEPR, and NBER, [email protected].
‡ Assistant Professor of Operations, Information & Technology, Graduate School of Business, Stanford University, and SIEPR, [email protected].
§ Department of Economics, Stanford University, [email protected].

Introduction
Synthetic Control (SC) methods for estimating causal effects have become commonplace in empirical work in the social sciences since their introduction by Alberto Abadie and coauthors (Abadie and Gardeazabal, 2003; Abadie et al., 2010, 2015). Typically the properties of the original SC estimator, as well as those of the various modifications that have been proposed subsequently, are studied under model-based assumptions about the distribution of the potential outcomes in the absence of the intervention. A common approach is to assume that the potential outcomes follow a factor model plus noise.

In this article we take a different approach to studying the properties of SC estimators. Instead of making model-based assumptions about the distribution of the potential outcomes, we make design-based assumptions about the assignment of the unit/time-period pairs to treatment. In particular, we consider the case with a single treated unit/time-period, with the treated unit selected at random from a set of units. We find that in this setting the original SC estimator is generally biased. We propose a simple modification of the SC estimator, labeled the Modified Unbiased Synthetic Control (MUSC) estimator, which is unbiased under random assignment of the treatment. We also propose a variance estimator that is unbiased for the variance of this estimator.

Studying the properties of SC-type estimators under design-based assumptions serves two distinct purposes. First, it suggests an important role for SC methods in randomized experiments. Second, it leads to new insights into SC methods in observational studies. SC methods are particularly valuable in experimental settings with relatively few units with correlated variation over time.
In such experiments, where the design assumptions hold by definition, SC-type methods can have substantially better RMSE properties than the standard estimator based on the difference in means by treatment status, where we focus on the average treatment effect on the treated. However, it may be important to maintain the guarantees that the standard estimator for randomized experiments, the difference in means, enjoys under randomization. The proposed MUSC estimator does so, and combines the typical improvement in terms of RMSE with the guarantees under randomization.

To illustrate the benefits of the MUSC estimator in experimental settings, we simulate an experiment based on data on average log wages observed across 50 states over 40 years. We randomly select one state to be treated in the last period, and compare the difference in means, the standard SC estimator, and our proposed MUSC (Modified Unbiased Synthetic Control) estimator. There are three key findings, partly reported in Table 1 and expanded on in Section 4. First, the difference-in-means and the new MUSC estimator are unbiased by construction whereas the SC estimator is biased. Although in this example the bias of the SC estimator is modest, there are no guarantees that the bias is small in general. Second, the RMSE is substantially lower for the SC and MUSC estimators relative to the RMSE of the difference-in-means estimator. This suggests that there can be considerable gains to using SC methods in randomized experiments, without a need to surrender the unbiasedness guaranteed by the randomization. Third, the new variance estimators are accurate in this setting.
Table 1: Simulation Experiment Based on CPS Average Log Wage by State and Year

            Diff in Means    Synth Control    Modified Unbiased Synth Control
Bias        0                −0.007           0
RMSE        0.105            0.051            0.[…]
[…]         0.105            0.051            0.[…]

[…] (e.g., Abadie et al., 2010; Doudchenko and Imbens, 2016; Ferman and Pinto, 2017), is also biased. With placebo methods, one estimates the variance by calculating the variance of the SC estimator over the distribution generated by applying the SC estimator to randomly selected control units, with or without the actual treated unit. Such methods can both under-estimate and over-estimate the true variance under our design assumptions. In contrast, our proposed variance estimator is unbiased in finite samples, even with a single treated unit, under the random assignment assumption, irrespective of any autocorrelation in the potential outcomes. Third, the design perspective highlights the importance of the choice of estimand. We define four different average treatment effects, either for the treated unit/period pairs, or averaged over units and/or periods, and show how the choice of estimand relates to the assumptions and inferential methods.

The article builds on the general Synthetic Control literature where early key contributions are Abadie and Gardeazabal (2003); Abadie et al. (2010, 2015); Abadie (2019). Recent work proposing new estimators in this general class includes Doudchenko and Imbens (2016); Abadie and L’Hour (2017); Ferman and Pinto (2017); Arkhangelsky et al. (2019); Li (2020); Ben-Michael et al. (2020). The current paper also contributes to the literature on inference for SC estimators, including Abadie et al. (2010); Doudchenko and Imbens (2016); Ferman and Pinto (2017); Hahn and Shi (2016); Lei and Candès (2020); Chernozhukov et al. (2017). There is also a connection to randomization inference in the general evaluation literature (e.g., Neyman, 1990; Imbens and Rubin, 2015; Sekhon and Shem-Tov, 2017; Abadie et al., 2020; Rambachan and Roth, 2020).
We consider a setting with $N$ units, for which we observe outcomes for $T$ time periods. There is a binary treatment that varies by units and time periods, denoted by $W_{it} \in \{0, 1\}$, and a pair of potential outcomes $Y_{it}(0)$ and $Y_{it}(1)$ for all unit/period combinations (Rubin, 1974). We assume there are no dynamic effects for the time being, so the potential outcomes are indexed only by the contemporaneous treatment. In some of the settings we consider, the dynamic effects would simply change the interpretation of the estimand. There are no restrictions on the time path of the potential outcomes. The $N \times T$ matrices of treatments and potential outcomes are denoted by $\mathbf{W}$, $\mathbf{Y}(0)$ and $\mathbf{Y}(1)$ respectively. Given the treatment, the realized/observed outcome matrix is $\mathbf{Y}$, with typical element
\[
Y_{it} \equiv W_{it} Y_{it}(1) + (1 - W_{it}) Y_{it}(0). \tag{2.1}
\]
In contrast to most of the literature, we take the potential outcomes $\mathbf{Y}(0)$ and $\mathbf{Y}(1)$ as fixed in our analysis, and treat the assignment matrix $\mathbf{W}$ as stochastic. This in turn makes the realized outcomes $\mathbf{Y}$ stochastic.

For much of the discussion we focus on the case with a single treated unit and a single treated period. Many of the insights carry over to the case with a block of treated unit/time-period pairs, and we discuss explicitly the case with multiple treated units in Section 5.1. In the case with a single treated unit/time-period pair, the $N \times T$ matrix $\mathbf{W}$, with typical element $W_{it} \in \{0, 1\}$, satisfies $\sum_{i,t} W_{it} = 1$.

It is useful for our design-based analysis to separate out the assignment mechanism into the selection of the time period treated and the unit treated. For that purpose, exploiting the fact that there is only a single pair $(i, t)$ with $W_{it} = 1$, we write
\[
\mathbf{W} = \mathbf{U} \mathbf{V}^\top,
\]
where $\mathbf{U}$ is an $N$-vector with typical element $U_i \in \{0, 1\}$ and $\sum_{i=1}^{N} U_i = 1$, and $\mathbf{V}$ is a $T$-vector with typical element $V_t \in \{0, 1\}$ and $\sum_{t=1}^{T} V_t = 1$.
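As a concrete illustration of this notation, the following minimal Python sketch builds the assignment matrix from the two indicator vectors and applies (2.1). The dimensions, the simulated potential outcomes, the constant treatment effect of 1, and the helper names `outer` and `realized` are all hypothetical choices made only for this example; the treated unit is drawn uniformly at random, as it would be in a randomized experiment, and the last period is taken as the treated period.

```python
import random

def outer(u, v):
    """Return the N x T assignment matrix W = U V^T as nested lists."""
    return [[ui * vt for vt in v] for ui in u]

def realized(W, Y0, Y1):
    """Apply (2.1): Y_it = W_it * Y_it(1) + (1 - W_it) * Y_it(0)."""
    return [[w * y1 + (1 - w) * y0 for w, y0, y1 in zip(w_row, y0_row, y1_row)]
            for w_row, y0_row, y1_row in zip(W, Y0, Y1)]

rng = random.Random(0)
N, T = 4, 3
# Fixed (non-stochastic) potential outcomes; values are simulated placeholders.
Y0 = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(N)]
Y1 = [[y + 1.0 for y in row] for row in Y0]  # hypothetical constant effect

treated_unit = rng.randrange(N)              # uniform draw of the treated unit
U = [int(i == treated_unit) for i in range(N)]
V = [0] * (T - 1) + [1]                      # last period is the treated period
W = outer(U, V)
Y = realized(W, Y0, Y1)

# Row sums of W give back U, and column sums give back V.
U_back = [sum(row) for row in W]
V_back = [sum(W[i][t] for i in range(N)) for t in range(T)]
```

Only the randomness in `U` (and, in other designs, `V`) makes `Y` stochastic here; `Y0` and `Y1` stay fixed across repeated draws of the assignment.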
The $V_t$ and $U_i$ can be defined in terms of $\mathbf{W}$: $V_t = \sum_{i=1}^{N} W_{it}$ and $U_i = \sum_{t=1}^{T} W_{it}$.

In many cases the treated unit is exposed only in the last period, so $\mathbf{V}$ is non-stochastic and has the last element equal to one and all earlier elements equal to zero. We examine this case separately, but additional insights are obtained by considering the more general case where both the treated unit and the treated period are stochastic.

Next we define the estimands we consider in this article. Being precise about estimands will be important for the discussion of bias and variance, as well as the role of various assumptions we introduce. The estimands are defined as averages over elements of the matrix $\mathbf{Y}(1) - \mathbf{Y}(0)$ with typical element $\tau_{it} \equiv Y_{it}(1) - Y_{it}(0)$. Which elements of this matrix we average over potentially depends on $\mathbf{W}$, and thus the estimand may be stochastic. It is useful to write the estimands as functions of $\mathbf{U}$ and $\mathbf{V}$ to show that some of the estimands depend only on one component of $\mathbf{W} = \mathbf{U}\mathbf{V}^\top$. For ease of exposition, the dependence on the potential outcomes is suppressed in the notation.

The estimand that is the primary focus in this discussion is the causal effect for the single treated unit/time-period:
\[
\tau \equiv \tau(\mathbf{U}, \mathbf{V}) \equiv \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \Big( Y_{it}(1) - Y_{it}(0) \Big). \tag{2.2}
\]
Because there is only a single treated unit/period, this is not an average but a single difference in potential outcomes. For the case with multiple treated units or periods this estimand could be generalized to
\[
\tilde{\tau} \equiv \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \Big( Y_{it}(1) - Y_{it}(0) \Big) \Bigg/ \Bigg( \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \Bigg).
\]
We are interested in accurate estimation of $\tau$ as well as inference. In particular, we focus on two properties of estimators for $\tau$: the bias and the (exact finite sample) variance.
We also discuss estimation of this variance.

There are three other estimands that are important to contrast with the primary estimand $\tau$. First, the average effect for all $N$ units in the treated period, the “vertical” effect, which only depends on $\mathbf{V}$ and not on $\mathbf{U}$:
\[
\tau^{V} \equiv \tau^{V}(\mathbf{V}) \equiv \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t \Big( Y_{it}(1) - Y_{it}(0) \Big).
\]
Next, the average effect for the treated unit over all $T$ periods, the “horizontal” effect. This only depends on $\mathbf{U}$ and not on $\mathbf{V}$:
\[
\tau^{H} \equiv \tau^{H}(\mathbf{U}) \equiv \frac{1}{T} \sum_{i=1}^{N} \sum_{t=1}^{T} U_i \Big( Y_{it}(1) - Y_{it}(0) \Big).
\]
Finally, the population average treatment effect, which depends on neither $\mathbf{U}$ nor $\mathbf{V}$:
\[
\tau^{\mathrm{POP}} \equiv \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \Big( Y_{it}(1) - Y_{it}(0) \Big).
\]

An important conceptual advantage of focusing on $\tau$ rather than any of the other three estimands is that it frees us from conceptualizing $Y_{it}(1)$ for unit/time periods other than the unit/time pair that was actually treated. To make this explicit, we can write $\tau$ as
\[
\tau = \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t Y_{it} - \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t Y_{it}(0),
\]
which clarifies that the estimand does not depend on any unobserved $Y_{it}(1)$. This can be important because in some cases it is difficult to give meaning to $Y_{it}(1)$ for units other than the treated unit. For example, in the German reunification application in Abadie et al. (2015), it is difficult to conceptualize $Y_{it}(1)$ for countries other than West Germany: what does it mean for France to unify with East Germany? However, focusing on $\tau$ allows us to ignore all unobserved $Y_{it}(1)$ in this application; we only need to consider the value of West German GDP in the absence of the reunification, $Y_{it}(0)$.

Most of the literature on panel data in general, and SC methods in particular, relies on modeling assumptions on the potential outcomes to derive properties of the proposed methods.
For SC methods, the model often takes the form of an $R$-factor latent-factor model for the control outcome
\[
Y_{it}(0) = \sum_{r=1}^{R} \gamma_{ir} \beta_{tr} + \varepsilon_{it},
\]
in combination with independence assumptions on the noise components $\varepsilon_{it}$ (Abadie et al., 2010; Athey et al., 2017; Amjad et al., 2018; Xu, 2017). Here we focus instead on design assumptions about the assignment process, that is, assumptions on the distribution of $\mathbf{W}$ (or, equivalently, the distributions of $\mathbf{U}$ and $\mathbf{V}$). In this approach, we place no restrictions on the potential outcomes, and in fact take those as fixed in the repeated sampling thought experiments. Such design-based, as opposed to model-based, approaches have been used in the general experimental design and program evaluation literature (e.g., Fisher, 1937; Neyman, 1990; Imbens and Rubin, 2015; Rosenbaum, 2002; Cunningham, 2018), as well as more recently in regression settings (Abadie et al., 2020), and panel data analyses (Athey and Imbens, 2018). However, to the best of our knowledge, they have not yet been used to analyze SC estimators.

First, we consider random assignment of the units to the treatment.
Assumption 1. (Random Assignment of Units)
\[
\mathrm{pr}(\mathbf{U} = \mathbf{u}) =
\begin{cases}
\dfrac{1}{N} & \text{if } u_i \in \{0, 1\}\ \forall i,\ \sum_{i=1}^{N} u_i = 1, \\
0 & \text{otherwise.}
\end{cases}
\]
We can also write this as $\mathbf{U} \perp\!\!\!\perp (\mathbf{Y}(0), \mathbf{Y}(1))$. Some version of this assumption underlies many approaches to inference in synthetic control estimation.

A second assumption we consider for some (but not all) of our results is less common. Here we assume that the (single) treated period was randomly selected from the $T$ periods under observation.

Assumption 2. (Random Assignment of Treated Period)
\[
\mathrm{pr}(\mathbf{V} = \mathbf{v}) =
\begin{cases}
\dfrac{1}{T} & \text{if } v_t \in \{0, 1\}\ \forall t,\ \sum_{t=1}^{T} v_t = 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Although this assumption is not plausible in many cases, as it is often the last period(s) that are treated, it is useful to consider its implications. The implicit synthetic control assumption is that the within-period relation between outcomes for different units during the pre-treatment periods is similar to that within-period relationship during the treated periods. Assumption 2 formalizes this.

We also consider the special case, which is common in empirical work, where the last element of $\mathbf{V}$ is equal to one and all other elements are equal to zero, so the treatment only occurs in the last period. In this case, the distribution of $\mathbf{V}$ is degenerate.

Assumption 3. (Last Period Assignment)
\[
\mathrm{pr}(\mathbf{V} = \mathbf{v}) =
\begin{cases}
1 & \text{if } v_t = 0\ \forall t = 1, \ldots, T - 1,\ v_T = 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Though some properties we derive rely on Assumption 1 and Assumption 2, some results are conditional on $\mathbf{V}$ and therefore still apply to this common special case.

Most of our discussion concerns the finite-sample case. However, for some results it will be useful to consider large-$T$ approximations. There are also settings where it may be useful to consider large-$N$ results, but we do not do so here. Large-$N$ settings require regularization of the SC weights, and the properties of the estimators will depend directly on the specific regularization method used (e.g., Abadie and L’Hour, 2017; Doudchenko and Imbens, 2016). For large-$T$ results we consider a stationarity assumption.
First define $Y_{\cdot t}(0)$ to be the $N$-vector with typical element $Y_{it}(0)$. Define the averages up to period $t$ of the first and the centered second moment:
\[
\hat{\mu}_t = \frac{1}{t} \sum_{s=1}^{t} Y_{\cdot s}(0), \qquad
\hat{\Sigma}_t = \frac{1}{t} \sum_{s=1}^{t} \big( Y_{\cdot s}(0) - \hat{\mu}_s \big) \big( Y_{\cdot s}(0) - \hat{\mu}_s \big)^\top.
\]

Assumption 4. (Large-T Stationarity)
For some finite $\mu$ and $\Sigma$ the sequence of populations indexed by $T$ satisfies, as $T \to \infty$,
\[
\hat{\mu}_T \longrightarrow \mu, \qquad \hat{\Sigma}_T \longrightarrow \Sigma.
\]

Next we introduce four conventional estimators and assess their properties as estimators for the four estimands defined in Section 2.1 under Assumption 1 and/or Assumption 2. This serves to clarify the role of the assumptions and the comparisons they validate, as well as set the stage for the discussion of the SC estimators.

The first estimator is the simple difference in outcomes for treated and control unit/period pairs:
\[
\hat{\tau}^{\mathrm{DiM}}(\mathbf{U}, \mathbf{V}, \mathbf{Y}) = \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t Y_{it} - \frac{1}{NT - 1} \sum_{i=1}^{N} \sum_{t=1}^{T} (1 - U_i V_t) Y_{it}.
\]
Second, the vertical difference, only for the treated period, between the treated unit and the control units:
\[
\hat{\tau}^{V}(\mathbf{U}, \mathbf{V}, \mathbf{Y}) = \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t Y_{it} - \frac{1}{N - 1} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t (1 - U_i) Y_{it}.
\]
Third, the horizontal difference, only for the treated unit, between the treated period and the control periods:
\[
\hat{\tau}^{H}(\mathbf{U}, \mathbf{V}, \mathbf{Y}) = \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t Y_{it} - \frac{1}{T - 1} \sum_{i=1}^{N} \sum_{t=1}^{T} U_i (1 - V_t) Y_{it}.
\]
Fourth, the difference-in-differences or two-way fixed effect estimator (based on the two-way fixed effect regression specification $Y_{it} = \mu + \beta_t + \alpha_i + \tau W_{it} + \varepsilon_{it}$). This DiD estimator can be written as the double difference in four averages:
\[
\begin{aligned}
\hat{\tau}^{\mathrm{DiD}}(\mathbf{U}, \mathbf{V}, \mathbf{Y}) ={}& \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t Y_{it} - \frac{1}{N - 1} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t (1 - U_i) Y_{it} \\
& - \frac{1}{T - 1} \sum_{i=1}^{N} \sum_{t=1}^{T} U_i (1 - V_t) Y_{it} + \frac{1}{(N - 1)(T - 1)} \sum_{i=1}^{N} \sum_{t=1}^{T} (1 - V_t)(1 - U_i) Y_{it}.
\end{aligned}
\]
In Table 2 we report the bias properties of the four estimators under Assumptions 1 and 2, both separately and jointly, for the four estimands.
Table 2: Bias Properties of Estimators for Varying Estimands and Assumptions (U: unbiased; B: biased)

Maintained Assumptions    Unit Randomization      Time Randomization      Unit & Time Rand
Estimand →                τ   τ^POP  τ^V  τ^H     τ   τ^POP  τ^V  τ^H     τ   τ^POP  τ^V  τ^H
Estimator ↓
τ̂^DiM                     B   B      B    B       B   B      B    B       U   U      U    U
τ̂^H                       B   B      B    B       U   B      B    U       U   U      U    U
τ̂^V                       U   B      U    B       B   B      B    B       U   U      U    U
τ̂^DiD                     U   B      U    B       U   B      B    U       U   U      U    U

For the first four estimators based on simple sample averages, unit randomization guarantees unbiasedness for any estimator that compares average outcomes for the treated unit to the average for the control units, which includes the DiD estimator $\hat{\tau}^{\mathrm{DiD}}$ and the vertical average estimator $\hat{\tau}^{V}$. On the other hand, time randomization ensures unbiasedness for estimators that compare average outcomes for treated periods and control periods, which includes the DiD estimator $\hat{\tau}^{\mathrm{DiD}}$ and the horizontal average estimator $\hat{\tau}^{H}$. The simple difference estimator is only unbiased under both unit and time randomization.

In this section we consider SC type estimators, including the original SC estimator proposed by Abadie et al. (2010), as well as three modifications thereof. We define these estimators for all possible treatment assignment vectors $\mathbf{U}$ and $\mathbf{V}$, initially for the case where both $\mathbf{U}$ and $\mathbf{V}$ have a single element equal to one and all other elements equal to zero. We do this at a higher level of abstraction than is typically done in the SC literature in order to accommodate some modifications and to convey additional insights.

We characterize the SC type estimators in terms of a set of weights $M_{ijt}$, indexed by $i = 1, \ldots, N$, $j = 0, \ldots, N$, and $t = 1, \ldots, T$. Given a set of weights $M$, and given the assignments $\mathbf{U}$, $\mathbf{V}$, and the outcomes $\mathbf{Y}$, the general SC estimator has the form
\[
\hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M) \equiv \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \Bigg\{ M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt} \Bigg\}. \tag{2.3}
\]
Note that this estimator simplifies to the vertical estimator $\hat{\tau}^{V}(\mathbf{U}, \mathbf{V}, \mathbf{Y})$ if we choose the weights $M_{i0t} = 0$, $M_{ijt} = -1/(N - 1)$ for $j \neq 0, i$, and $M_{iit} = 1$ for all $i$.

The various estimators in this SC class we consider differ in the choice of the weights $M$. There are in general two components to this choice. First, there is a set of possible weights, denoted by $\mathcal{M}$, over which we search for an optimal weight. These sets differ between the estimators we consider, but in all cases the sets are non-stochastic and do not depend on either the assignment matrix $\mathbf{W}$ or the outcome data $\mathbf{Y}$. Second, there is an objective function that defines the chosen weight within the set of possible weights. This objective function is the same for all estimators we consider in the current article.

We start with the second component. For a given matrix of outcomes $\mathbf{Y}$, and a given set of possible weights $\mathcal{M}$, define the tensor $M(\mathbf{Y}, \mathcal{M})$ with elements $M_{ijt}$, for $i = 1, \ldots, N$, $j = 0, \ldots, N$, and $t = 1, \ldots, T$, through the minimization over the weights $M$ in the set $\mathcal{M}$:
\[
M(\mathbf{Y}, \mathcal{M}) \equiv \arg\min_{M \in \mathcal{M}} \sum_{i=1}^{N} \sum_{t=1}^{T} \sum_{s \neq t} \Bigg( M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{js} \Bigg)^{2}. \tag{2.4}
\]
For the connection to the original setup for the SC estimator, see Abadie et al. (2010) and Doudchenko and Imbens (2016). This definition of the weights has some important features that drive some of the subsequent results. It is convenient to define the infeasible weights $M^{*} \equiv M(\mathbf{Y}(0), \mathcal{M})$. The reason that it is convenient to work with $M^{*}$ is that $M^{*}$ is non-stochastic, as a function of the fixed potential outcomes and the set $\mathcal{M}$. However, because we do not observe all elements of $\mathbf{Y}(0)$, we cannot calculate $M^{*}_{ijt}$ for all $i$, $j$, and $t$. But, for the purpose of estimation we do not need to know every element of $M^{*}$. In fact, we only need to know $M^{*}_{ijt}$ for all $j$ for the pair $(i, t)$ such that $U_i = V_t = 1$, and these components are calculable given $\mathbf{W}$ and $\mathbf{Y}$.

Next, we consider the sets of possible weights $\mathcal{M}$.
For a given set $\mathcal{M}$ we can characterize the estimators as
\[
\hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M(\mathbf{Y}, \mathcal{M})) = \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \Bigg\{ M_{i0t}(\mathbf{Y}, \mathcal{M}) + \sum_{j=1}^{N} M_{ijt}(\mathbf{Y}, \mathcal{M}) Y_{jt} \Bigg\}. \tag{2.5}
\]
For this class of estimators we can view the weights as non-stochastic:

Lemma 1. For all $\mathbf{Y}(0)$, $\mathbf{Y}(1)$, $\mathbf{U}$, $\mathbf{V}$, and $\mathcal{M}$,
\[
\hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M(\mathbf{Y}, \mathcal{M})) = \hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M(\mathbf{Y}(0), \mathcal{M})). \tag{2.6}
\]

This representation is useful because the properties of $\hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M(\mathbf{Y}(0), \mathcal{M}))$ are easier to establish under assumptions on $\mathbf{V}$ and $\mathbf{U}$ than those of $\hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M(\mathbf{Y}, \mathcal{M}))$ for the general case.

We consider in this discussion only sets of weights that impose the following restrictions. First,
\[
M_{iit} = 1, \quad \forall i \in \{1, \ldots, N\},\ t \in \{1, \ldots, T\}. \tag{2.7}
\]
Second, the weight for unit $j$ for the prediction of the causal effect for unit $i$ is nonpositive:
\[
M_{ijt} \leq 0, \quad \forall i \in \{1, \ldots, N\},\ j \in \{1, \ldots, N\} \setminus \{i\},\ t \in \{1, \ldots, T\}. \tag{2.8}
\]
Note that there is no restriction on the intercepts $M_{i0t}$. To simplify the notation, and to highlight the restrictions that differ between the different estimators, we leave the restrictions (2.7) and (2.8) implicit in the sets below. We consider four estimators in this class, characterized by four sets of possible weights $\mathcal{M}$ which all impose restrictions (2.7) and (2.8).

The original SC estimator (Abadie and Gardeazabal, 2003; Abadie et al., 2010) corresponds to the estimator in (2.5) with the set
\[
\mathcal{M}^{\mathrm{SC}} = \Bigg\{ M \,\Bigg|\, M_{i0t} = 0\ \forall i, t,\ \sum_{j=1}^{N} M_{ijt} = 0\ \forall i, t \Bigg\}.
\]
The modification introduced in Doudchenko and Imbens (2016) and Ferman and Pinto (2019) allows for an intercept by dropping the restriction $M_{i0t} = 0$, leading to the Modified Synthetic Control (MSC) estimator:
\[
\mathcal{M}^{\mathrm{MSC}} = \Bigg\{ M \,\Bigg|\, \sum_{j=1}^{N} M_{ijt} = 0\ \forall i, t \Bigg\}.
\]
Allowing for this intercept in the weights has been proposed to make the SC estimator more robust. Arkhangelsky et al. (2019) show that the inclusion of the intercept can be interpreted as including a unit fixed effect in the regression function.
Here we discuss how the inclusion of the intercept ties in with the time randomization assumption.

We also consider a second modification of the basic SC estimator, where we place an additional set of restrictions on the weights:
\[
\sum_{i=1}^{N} M_{ijt} = 0, \quad \forall j = 1, \ldots, N,\ t = 1, \ldots, T. \tag{2.9}
\]
This restriction will be seen to ensure unbiasedness of the corresponding estimator given randomization of the treated units. Formally the set of possible weights for the Unbiased Synthetic Control (USC) estimator is
\[
\mathcal{M}^{\mathrm{USC}} = \Bigg\{ M \,\Bigg|\, M_{i0t} = 0\ \forall i, t,\ \sum_{j=1}^{N} M_{ijt} = 0\ \forall i, t,\ \sum_{i=1}^{N} M_{ijt} = 0\ \forall j \geq 1, t \Bigg\}.
\]
Finally, we combine the inclusion of the intercept with the additional restriction, leading to the Modified Unbiased Synthetic Control (MUSC) estimator:
\[
\mathcal{M}^{\mathrm{MUSC}} = \Bigg\{ M \,\Bigg|\, \sum_{j=1}^{N} M_{ijt} = 0\ \forall i, t,\ \sum_{i=1}^{N} M_{ijt} = 0\ \forall j \geq 1, t \Bigg\}.
\]
Here, we drop the restriction that there is no intercept, $M_{i0t} = 0\ \forall i, t$. The optimization problem in (2.4) together with the restrictions in $\mathcal{M}^{\mathrm{MUSC}}$ then imply that intercepts do not contribute bias, $\sum_{i=1}^{N} M_{i0t} = 0\ \forall t$.

These four sets of restrictions define four estimators: $\hat{\tau}^{\mathrm{SC}}$, $\hat{\tau}^{\mathrm{MSC}}$, $\hat{\tau}^{\mathrm{USC}}$, and $\hat{\tau}^{\mathrm{MUSC}}$, based on the sets $\mathcal{M}^{\mathrm{SC}}$, $\mathcal{M}^{\mathrm{MSC}}$, $\mathcal{M}^{\mathrm{USC}}$ and $\mathcal{M}^{\mathrm{MUSC}}$, respectively. Our focus is mainly on $\hat{\tau}^{\mathrm{SC}}$ and $\hat{\tau}^{\mathrm{MUSC}}$, with the comparison with $\mathcal{M}^{\mathrm{MSC}}$ and $\mathcal{M}^{\mathrm{USC}}$ serving to aid the interpretation of the restrictions that make up the difference between $\hat{\tau}^{\mathrm{SC}}$ and $\hat{\tau}^{\mathrm{MUSC}}$. Specifically, we show that relaxing the no-intercept restriction leads to unbiasedness under time randomization for large $T$, and imposing the second restriction (2.9) leads to unbiasedness under unit randomization.

In this section, we investigate the properties of the various estimators given the unit and/or time randomization assumptions (Assumption 1 and Assumption 2).
In some cases there are exact properties while in other cases these may depend on $N$ and/or $T$ being large.

The first question we study is the bias of the four SC estimators, as estimators of the treatment effect for the treated unit, $\tau$. We summarize the results in Table 3.

Table 3: Bias Properties of SC Estimators for τ (U: unbiased; B: biased)

Maintained Assumptions
Estimator ↓    Unit Rand    Time Rand    Time Rand,    Unit & Time Rand    Unit & Time Rand,
                                         Large T                           Large T
τ̂^SC           B            B            B             B                   B
τ̂^MSC          B            B            U             B                   B
τ̂^USC          U            B            B             U                   U
τ̂^MUSC         U            B            U             U                   U

The basic SC estimator is not unbiased even if the unit treated and the time period treated were both randomly selected. Because this is perhaps surprising, it is useful to investigate the bias of the SC estimator in more detail. In order to guarantee unbiasedness given random selection of the treated unit, we require that $\sum_{i=1}^{N} M_{ijt} = 0$, which holds for the USC and MUSC estimators. In order to achieve unbiasedness given random selection of the time period, we need an unrestricted intercept in the weights, as in the MSC and MUSC estimators, as well as a large number of time periods.

Recall the general definition of the SC type estimators in (2.3), which we rewrite as
\[
\begin{aligned}
\hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M) &= \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \Bigg\{ M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt} \Bigg\} \\
&= \sum_{t=1}^{T} \sum_{i=1}^{N} V_t U_i \Bigg\{ Y_{it}(1) + M_{i0t} + \sum_{j=1}^{N} (1 - U_j) M_{ijt} Y_{jt}(0) \Bigg\}.
\end{aligned}
\]
We focus on the properties relative to the treatment effect for the treated, $\sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t (Y_{it}(1) - Y_{it}(0))$. The estimation error is equal to
\[
\hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M) - \tau(\mathbf{U}, \mathbf{V}) = \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \Bigg\{ Y_{it}(0) + M_{i0t} + \sum_{j=1}^{N} (1 - U_j) M_{ijt} Y_{jt}(0) \Bigg\}.
\]
We consider separately the SC estimators without an intercept, for fixed $M$, so that $M_{i0t} = 0$. This includes both the SC and USC estimators.

Lemma 2.
Suppose Assumption 1 (random assignment of units to treatment) holds. Then if one of the following two conditions holds: (i) the intercept is zero, $M_{i0t} = 0$ for all $i, t$, or (ii) the intercept is estimated through (2.4), then the conditional bias vanishes if $\mathcal{M}$ guarantees
\[
\sum_{i=1}^{N} M_{ijt} = 0 \quad \forall j, t.
\]
This lemma shows that $\hat{\tau}^{\mathrm{USC}}$ and $\hat{\tau}^{\mathrm{MUSC}}$ are unbiased, since both estimators only search over weight sets that fulfill the summing-up condition. Intuitively, this weight restriction ensures that each unit on average is used the same amount in treatment and in control when forming the synthetic control estimator (and that the intercepts average out to zero).
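The role of the summing-up condition can be checked numerically. The sketch below is only an illustration under stated assumptions: the weights are fixed by hand rather than obtained from the optimization in (2.4), there is no intercept, the outcomes are simulated, and the names `sc_error`, `mean_error`, `M_balanced` and `M_rows_only` are hypothetical. It averages the estimation error over the $N$ equally likely treated units for one weight matrix whose columns sum to zero and one whose columns do not.

```python
import random

def sc_error(i, M, y0):
    """Estimation error in the treated period when unit i is the treated unit.

    M[k] = [intercept, weight on unit 0, ..., weight on unit N-1] with
    M[k][k + 1] == 1, so the error equals M_k0 + sum_j M_kj * Y_j(0).
    """
    return M[i][0] + sum(M[i][j + 1] * y0[j] for j in range(len(y0)))

def mean_error(M, y0):
    """Average the error over the N equally likely choices of treated unit."""
    N = len(y0)
    return sum(sc_error(i, M, y0) for i in range(N)) / N

N = 5
rng = random.Random(1)
y0 = [rng.gauss(0, 1) for _ in range(N)]  # simulated Y_jt(0) in the treated period

# Zero row sums AND zero column sums: identity minus a mixture of two cyclic
# shifts (hand-picked weights, not the solution of the SC optimization).
M_balanced = [[0.0] * (N + 1) for _ in range(N)]
for i in range(N):
    M_balanced[i][i + 1] = 1.0
    M_balanced[i][(i + 1) % N + 1] -= 0.7
    M_balanced[i][(i + 2) % N + 1] -= 0.3

# Zero row sums only: every unit is matched to unit 0, unit 0 to unit 1.
M_rows_only = [[0.0] * (N + 1) for _ in range(N)]
for i in range(N):
    M_rows_only[i][i + 1] = 1.0
    M_rows_only[i][(1 if i == 0 else 0) + 1] -= 1.0

bias_balanced = mean_error(M_balanced, y0)    # zero up to rounding error
bias_rows_only = mean_error(M_rows_only, y0)  # generally nonzero
```

In the second matrix unit 0 serves as a control far more often than it is treated, which is exactly the imbalance that the column restriction rules out.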
Lemma 3. For the SC estimator the conditional bias under Assumption 1 is
\[
\mathrm{Bias}^{\mathrm{SC}} = E[\hat{\tau} - \tau \mid \mathbf{V}] = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t Y_{it}(0) \sum_{j=1}^{N} M^{\mathrm{SC}}_{jit}.
\]

The intuition for the bias of the SC estimator also holds for the simple matching estimator (Abadie and Imbens, 2006), which is also biased under randomization in finite samples, and it extends to other weighting estimators with the addition of the conditional bias term $\frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t M_{i0t}$ coming from the intercepts.

We can estimate the bias for the SC estimator as
\[
\widehat{\mathrm{Bias}}^{\mathrm{SC}} = \frac{1}{N - 1} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t (1 - U_i) Y_{it} \sum_{j=1}^{N} M^{\mathrm{SC}}_{jit}. \tag{3.1}
\]
This estimator for the bias is unbiased, and so in principle an unbiased estimator can also be generated by subtracting this estimated bias from the standard SC estimator. However, the properties of this unbiased estimator are not very attractive in terms of root mean-squared error (RMSE).

To see the role the time randomization plays in the bias, consider the MSC estimator in a setting with large $T$, and random selection of the treated period. For ease of exposition, suppose unit $N$ is the treated unit. We can view the MSC estimator as a regression estimator where we regress the outcomes $Y_{N1}, \ldots, Y_{NT}$ on the treatment indicator and the predictors $Y_{1t}, \ldots, Y_{N-1,t}$ and an intercept. It is well known that this leads to an estimator that is asymptotically unbiased in large samples (in this case meaning large $T$; Freedman, 2008; Lin, 2013).

Consider the general SC class estimator $\hat{\tau} = \hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M)$ as an estimator of $\tau$.

Lemma 4.
Suppose Assumption 1 holds. Then
\[
\mathbb{V}(\mathbf{V}, M) = E\Big[ \big( \hat{\tau}(\mathbf{U}, \mathbf{V}, \mathbf{Y}, M) - \tau \big)^{2} \,\Big|\, \mathbf{V} \Big] = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t \Bigg( M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt}(0) \Bigg)^{2}.
\]
For the unbiased estimators $\hat{\tau}^{\mathrm{USC}}$ and $\hat{\tau}^{\mathrm{MUSC}}$ this is also the variance. For the other estimators this is the expected squared error. Note that the synthetic-control objective in (2.4) solves for the analogue of this error in the non-treated periods, which we return to in Section 3.4.
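A short derivation may help connect the estimation error to the expression in Lemma 4. It is a sketch that treats the weights $M$ as fixed, uses only restriction (2.7) and Assumption 1, and takes $\mathbf{U}$ to be drawn independently of $\mathbf{V}$; the shorthand $a_{it}$ is introduced here purely for illustration. Define
\[
a_{it} \equiv M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt}(0).
\]
When $U_i = 1$, the factor $(1 - U_j)$ removes the term $j = i$ from the sum in the estimation error, and $M_{iit} = 1$ means that the separate term $Y_{it}(0)$ puts it back, so the error for that $(i, t)$ pair is exactly $a_{it}$. Because exactly one product $U_i V_t$ equals one,
\[
(\hat{\tau} - \tau)^{2} = \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \, a_{it}^{2},
\qquad
E\big[ (\hat{\tau} - \tau)^{2} \mid \mathbf{V} \big] = \sum_{i=1}^{N} \sum_{t=1}^{T} E[U_i] \, V_t \, a_{it}^{2} = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t \, a_{it}^{2},
\]
where the last step uses $E[U_i] = 1/N$ under Assumption 1.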
Proposition 1.
Suppose Assumption 1 holds. Then the estimator
\[
\begin{aligned}
\hat{\mathbb{V}} = \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \Bigg\{ & \frac{1}{N - 3} \sum_{\substack{k=1 \\ k \neq i}}^{N} \Bigg( \sum_{\substack{j=1 \\ j \neq i}}^{N} M_{kjt} (Y_{jt} - Y_{kt}) \Bigg)^{2}
- \frac{1}{(N - 2)(N - 3)} \sum_{\substack{k=1 \\ k \neq i}}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} M_{kjt}^{2} (Y_{jt} - Y_{kt})^{2} \\
& + \frac{2}{N - 2} \sum_{\substack{k=1 \\ k \neq i}}^{N} M_{k0t} \sum_{\substack{j=1 \\ j \neq i}}^{N} M_{kjt} (Y_{jt} - Y_{kt})
+ \frac{1}{N} \sum_{k=1}^{N} M_{k0t}^{2} \Bigg\}
\end{aligned}
\]
is unbiased for $\mathbb{V}(\mathbf{V}, M)$.

The variance estimator in this proposition takes the form of a leave-one-out estimator, with an additional term that corrects for over-counting the diagonal elements in the inner square of the first term and additional terms for the intercept.
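The unbiasedness claim can be checked by brute force, since under Assumption 1 the expectation is just an average over the $N$ possible treated units. The sketch below makes several assumptions that should be read as such: the weights are fixed random numbers satisfying (2.7), (2.8) and zero row sums rather than the solution of (2.4); the outcomes are simulated; the normalizing constants are taken to be $1/(N-3)$, $1/((N-2)(N-3))$, $2/(N-2)$ and $1/N$, which is the choice that makes the average match Lemma 4 exactly; and the function names are hypothetical.

```python
import random

def true_variance(M, y0):
    """Lemma 4 in the treated period: (1/N) sum_k (M_k0 + sum_j M_kj Y_j(0))^2."""
    N = len(y0)
    return sum((M[k][0] + sum(M[k][j + 1] * y0[j] for j in range(N))) ** 2
               for k in range(N)) / N

def variance_estimate(i, M, y):
    """Leave-one-out variance estimate when unit i is treated; never reads y[i]."""
    N = len(y)
    others = [k for k in range(N) if k != i]
    square = diag = cross = 0.0
    for k in others:
        inner = sum(M[k][j + 1] * (y[j] - y[k]) for j in others)
        square += inner ** 2
        diag += sum(M[k][j + 1] ** 2 * (y[j] - y[k]) ** 2 for j in others)
        cross += M[k][0] * inner
    intercepts = sum(M[k][0] ** 2 for k in range(N))
    return (square / (N - 3) - diag / ((N - 2) * (N - 3))
            + 2 * cross / (N - 2) + intercepts / N)

rng = random.Random(2)
N = 6
y0 = [rng.gauss(0, 1) for _ in range(N)]  # simulated control potential outcomes

# Row k: [intercept, weights]; own weight 1, other weights nonpositive, sum zero.
M = []
for k in range(N):
    raw = [rng.random() if j != k else 0.0 for j in range(N)]
    total = sum(raw)
    M.append([rng.gauss(0, 0.1)]
             + [1.0 if j == k else -raw[j] / total for j in range(N)])

estimates = []
for i in range(N):
    y = list(y0)
    y[i] = y0[i] + 5.0  # hypothetical treated outcome, unused by the estimator
    estimates.append(variance_estimate(i, M, y))
average_estimate = sum(estimates) / N
target = true_variance(M, y0)
```

Individual estimates can be negative, as is common for exactly unbiased variance estimators; only their average over treated units is pinned down.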
Here we consider an alternative method for estimating the variance of SC estimators, versions of which have been proposed previously both for testing zero effects (e.g., Abadie et al., 2010) and for constructing confidence intervals (e.g., Doudchenko and Imbens, 2016). Suppose unit $i$ is the treated unit. For each of the $N-1$ control units $j$ (with $j \neq i$) we recalculate the weights, leaving out the treated unit, and then estimate the treatment effect. For ease of exposition we focus on the case of Assumption 3 where the last period is the treated period, $V_T = 1$.
Formally, define for $i = 1, \dots, N$ the $N$ restrictions of the weights $\mathcal{M}$ for fixed $t = T$ to $\mathcal{M}^{(i)}$, where each weight matrix $\mathbf{M}^{(i)}$ of dimension $N \times (N+1)$ (indexed by $\{1, \dots, N\} \times \{0, \dots, N\}$ as before) is further restricted by $M^{(i)}_{ij} = 0 = M^{(i)}_{ji}$ for all $j$, and
$$\mathbf{M}^{(i)}(\mathbf{Y}, \mathcal{M}^{(i)}) = \arg\min_{\mathbf{M}^{(i)} \in \mathcal{M}^{(i)}} \sum_{s=1}^{T-1} \sum_{\substack{j=1 \\ j \neq i}}^{N} \left( M^{(i)}_{j0} + \sum_{\substack{k=1 \\ k \neq i}}^{N} M^{(i)}_{jk} Y_{ks} \right)^2. \tag{3.2}$$
The placebo estimator is then
$$\hat\tau^{(i)}_j = M^{(i)}_{j0} + \sum_{\substack{k=1 \\ k \neq i}}^{N} M^{(i)}_{jk} Y_{kT}.$$
This is an estimator of zero, and we can use it to estimate the variance as
$$\hat{\mathbb{V}}^{\mathrm{PCB}} = \frac{1}{N-1} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} U_i \left( \hat\tau^{(i)}_j \right)^2 = \frac{1}{N-1} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} U_i \left( M^{(i)}_{j0} + \sum_{\substack{k=1 \\ k \neq i}}^{N} M^{(i)}_{jk} Y_{kT} \right)^2.$$
Below, we consider the properties of this placebo variance estimator for the case of the MUSC estimator.
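To make the placebo construction concrete, the sketch below evaluates the placebo variance for the simplest member of the class, the difference-in-means weights, where no optimization is needed. This choice is an assumption made for illustration only: it sets the intercept to zero, puts weight one on the placebo unit j and weight -1/(N-2) on each remaining control. The function name and the toy outcomes are hypothetical.

```python
import numpy as np

def placebo_variance_dim(y_T, i):
    """Placebo variance in the treated period when unit i is treated,
    using difference-in-means weights (an illustrative choice)."""
    N = len(y_T)
    controls = [j for j in range(N) if j != i]   # leave out the treated unit
    total = 0.0
    for j in controls:
        others = [k for k in controls if k != j]
        # Placebo estimate for unit j: its outcome minus the mean of the other controls.
        tau_j = y_T[j] - np.mean(y_T[others])
        total += tau_j ** 2
    return total / (N - 1)

# Toy treated-period outcomes (hypothetical numbers); unit 0 is treated.
y_T = np.array([1.0, 2.0, 3.0, 4.0])
v_pcb = placebo_variance_dim(y_T, i=0)
```

For SC or MUSC weights the same loop applies, with `tau_j` computed from weights refitted on the pre-treatment periods without unit i.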
Remark 1.
The placebo variance can be biased downward, conditional on $\mathbf{V}$. We focus on the case with $V_T = 1$ (i.e. Assumption 3). Suppose $N = 4$, $\mathbf{M} =$ , $Y_{\cdot T} =$ , so the units are matched in pairs, but the matching is of poor quality. Then the placebo variance is smaller in expectation than the true variance.
Remark 2.
In the same setting of Assumption 3, the placebo variance can be biased upward, conditional on $\mathbf{V}$. Suppose $N = 4$, $\mathbf{M} =$ , $Y_{\cdot T} =$ , so the units are matched in pairs, and the matching is of perfect quality. Then the placebo variance is higher in expectation than the true variance.
Here we discuss how to motivate the SC estimators based on our design perspective. We may wish to choose a matrix $\mathbf{M}$ to minimize the expected variance, where the expectation is both over the random period that is treated and the random unit that is treated. The expected variance is
$$\mathbb{EV}(\hat\tau) = E\left[ \left(\hat\tau(\mathbf{U}, \mathbf{V}, \mathbf{Y}) - \tau\right)^2 \right] = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt}(0) \right)^2.$$
Let $\mathbf{M}^*$ be the value of $\mathbf{M}$ that minimizes this infeasible objective function. Under time-randomization (Assumption 2), an unbiased estimator for this objective function is
$$\widehat{\mathbb{EV}} = \frac{1}{N(T-1)} \sum_{i=1}^{N} \sum_{t=1}^{T} (1 - V_t) \left( M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt} \right)^2.$$
This is in fact the MUSC/SC objective function from (2.4). Let $\widehat{\mathbf{M}}$ be the value that minimizes this empirical objective. $\widehat{\mathbf{M}}$ is not unbiased for $\mathbf{M}^*$, but it is a natural approximation.
Figure 1: Pre- and Post-Treatment fit of SC and MUSC
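The unbiasedness of the empirical objective for the expected variance, stated in the design-based motivation of the SC objective, can be verified in one line. The step below is a sketch under two assumptions: the weights are held fixed, and time randomization means that each of the T periods is the treated one with probability 1/T, so that the expectation of 1 - V_t is (T-1)/T. It also uses that the observed outcome equals the control potential outcome in every non-treated period.

```latex
\begin{aligned}
E\left[\widehat{\mathbb{EV}}\right]
 &= \frac{1}{N(T-1)} \sum_{i=1}^{N} \sum_{t=1}^{T} E[1 - V_t]
    \left( M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt}(0) \right)^2 \\
 &= \frac{1}{N(T-1)} \cdot \frac{T-1}{T} \sum_{i=1}^{N} \sum_{t=1}^{T}
    \left( M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt}(0) \right)^2
  = \mathbb{EV}(\hat\tau).
\end{aligned}
```

The argument applies to a fixed weight tensor; it does not imply that the minimizer of the empirical objective is unbiased for the minimizer of the infeasible one.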
In this section we illustrate some of the methods proposed in this article.
To illustrate some of the concepts developed in this article, we first turn to the data from the California smoking study (Abadie et al., 2010). In Figure 1 we compare the SC and MUSC estimates. We find that the pre-treatment fit as well as the point estimates are similar for the SC and MUSC estimators.
4.2 A Simulation Study
We also perform a small simulation study to assess the properties of the MUSC estimator. Following Bertrand et al. (2004) and Arkhangelsky et al. (2019), we use data from the Current Population Survey for 50 states and 40 years. The variables we use include state/year average log wages, hours, and the state/year unemployment rate. For each of the variables we use the Difference-in-Means (DiM) estimator, the standard Synthetic Control (SC) estimator and the Modified Unbiased Synthetic Control (MUSC) estimator. We calculate for randomly selected states the estimated effect and compare that to the actual value.
In Table 4 we report the RMSE. For log wages, the RMSE of the SC and MUSC estimators is substantially lower than that of the DiM estimator in every treated period. For hours and the unemployment rate the gains are smaller and not uniform: at T = 39, for example, the MUSC RMSE exceeds the DiM RMSE for both variables. Table 5 reports the standard errors based on our variance estimator, and using the placebo approach.
Table 4: Simulation Experiment Based on CPS Data by State and Year – RMSE

Treated            Log Wages                  Hours               Unemployment Rate
Period      DiM     SC      MUSC      DiM     SC      MUSC      DiM     SC      MUSC
T = 21    0.1157  0.0543  0.0550    1.4838  0.9797  0.9340    0.0123  0.0111  0.0102
T = 22    0.1112  0.0496  0.0502    1.3764  0.8204  0.9170    0.0107  0.0086  0.0084
T = 23    0.1089  0.0435  0.0484    1.2821  0.9091  0.9009    0.0114  0.0111  0.0116
T = 24    0.1143  0.0421  0.0450    1.2400  0.9569  0.8829    0.0145  0.0146  0.0142
T = 25    0.1136  0.0465  0.0500    1.3125  0.9074  0.8896    0.0124  0.0110  0.0115
T = 26    0.1111  0.0487  0.0471    1.1424  0.8509  0.8347    0.0132  0.0136  0.0146
T = 27    0.1105  0.0481  0.0493    1.1926  0.8612  0.8141    0.0155  0.0145  0.0149
T = 28    0.1014  0.0551  0.0674    1.0627  0.7611  0.7660    0.0136  0.0105  0.0106
T = 29    0.0964  0.0473  0.0594    1.1342  0.8552  0.8686    0.0137  0.0124  0.0118
T = 30    0.0980  0.0505  0.0516    1.0741  0.7765  0.7755    0.0137  0.0141  0.0147
T = 31    0.1084  0.0410  0.0393    1.2340  0.9700  0.9537    0.0209  0.0180  0.0175
T = 32    0.1049  0.0460  0.0468    1.0571  0.8496  0.8209    0.0214  0.0182  0.0167
T = 33    0.1046  0.0530  0.0536    1.2947  1.0018  0.9696    0.0215  0.0161  0.0151
T = 34    0.1060  0.0625  0.0662    1.1640  0.9174  0.9007    0.0182  0.0131  0.0129
T = 35    0.1053  0.0560  0.0537    1.2233  1.0824  0.9914    0.0201  0.0137  0.0137
T = 36    0.0985  0.0594  0.0535    1.1267  0.8083  0.7396    0.0150  0.0124  0.0119
T = 37    0.1017  0.0605  0.0579    1.3066  1.2080  1.1235    0.0156  0.0128  0.0131
T = 38    0.0929  0.0580  0.0615    0.9917  0.7742  0.7929    0.0126  0.0116  0.0117
T = 39    0.0853  0.0459  0.0554    0.8437  0.7979  0.9475    0.0112  0.0120  0.0122
T = 40    0.1051  0.0517  0.0479    1.4048  1.2714  1.2382    0.0126  0.0112  0.0106
Average

Table 5: Simulation Experiment Based on CPS Data (Log Wages) by State and Year – Standard Error; Treated Period T = 40

Columns: Unit, $\sqrt{\hat{\mathbb{V}}^{\mathrm{DiM}}}$, $\sqrt{\hat{\mathbb{V}}^{\mathrm{DiM,PCB}}}$, $\sqrt{\hat{\mathbb{V}}^{\mathrm{SC}}}$, $\sqrt{\hat{\mathbb{V}}^{\mathrm{SC,PCB}}}$, $\sqrt{\hat{\mathbb{V}}^{\mathrm{MUSC}}}$, $\sqrt{\hat{\mathbb{V}}^{\mathrm{MUSC,PCB}}}$

Unit      DiM     DiM,PCB  SC      SC,PCB   MUSC    MUSC,PCB
i = 1    0.1047   0.1047  0.0506   0.0520  0.0546   0.0492
i = 2    0.1060   0.1060  0.0524   0.0519  0.0478   0.0480
i = 3    0.1046   0.1046  0.0529   0.0521  0.0479   0.0480
i = 4    0.1062   0.1062  0.0514   0.0522  0.0455   0.0479
i = 5    0.1052   0.1052  0.0520   0.0524  0.0487   0.0484
i = 6    0.1040   0.1040  0.0526   0.0526  0.0492   0.0486
i = 7    0.1045   0.1045  0.0486   0.0491  0.0461   0.0467
i = 8    0.1048   0.1049  0.0522   0.0518  0.0474   0.0476
i = 9    0.1061   0.1062  0.0527   0.0519  0.0501   0.0492
i = 10   0.1060   0.1060  0.0526   0.0525  0.0470   0.0486
...
i = 31   0.1012   0.1012  0.0480   0.0516  0.0465   0.0467
i = 32   0.1062   0.1062  0.0517   0.0510  0.0466   0.0474
i = 33   0.1062   0.1062  0.0527   0.0524  0.0485   0.0488
i = 34   0.1060   0.1060  0.0499   0.0496  0.0452   0.0466
i = 35   0.1061   0.1061  0.0526   0.0523  0.0469   0.0489
i = 36   0.1049   0.1050  0.0523   0.0524  0.0481   0.0483
i = 37   0.1062   0.1062  0.0527   0.0527  0.0474   0.0492
i = 38   0.1062   0.1062  0.0529   0.0521  0.0476   0.0482
i = 39   0.1059   0.1059  0.0524   0.0523  0.0464   0.0485
i = 40   0.1051   0.1052  0.0522   0.0521  0.0481   0.0486
i = 41   0.1062   0.1062  0.0510   0.0525  0.0464   0.0472
i = 42   0.1060   0.1060  0.0522   0.0519  0.0483   0.0485
i = 43   0.1054   0.1054  0.0511   0.0507  0.0523   0.0478
i = 44   0.1028   0.1028  0.0541   0.0518  0.0563   0.0482
i = 45   0.1060   0.1060  0.0500   0.0508  0.0508   0.0475
i = 46   0.1051   0.1051  0.0504   0.0507  0.0475   0.0472
i = 47   0.1048   0.1048  0.0520   0.0515  0.0486   0.0474
i = 48   0.1059   0.1059  0.0497   0.0506  0.0443   0.0454
i = 49   0.1056   0.1056  0.0525   0.0523  0.0471   0.0487
i = 50   0.1043   0.1043  0.0523   0.0523  0.0472   0.0479
Average

In this section we look at generalizations of the setup considered so far with a single treated unit and single treated period where the estimand was the average effect for the treated. First, we consider the case with multiple treated units. Second, we consider the case where the estimand is the average effect for all units in the treated period. Both of these generalizations create conceptual complications. Third, we consider the case of non-constant propensity scores.
In this section we look at the case with multiple treated units. We fix the number of treated units at $N_T$. The estimand is, as before, the average effect for the $N_T$ treated units:
$$\tau = \tau(\mathbf{U}, \mathbf{V}) \equiv \frac{1}{N_T} \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \left( Y_{it}(1) - Y_{it}(0) \right).$$
We modify Assumption 1 to
Assumption 5. (Random Assignment of Units)
$$\mathrm{pr}(\mathbf{U} = \mathbf{u}) = \begin{cases} \left( \dfrac{N!}{N_T!\, N_C!} \right)^{-1} & \text{if } u_i \in \{0, 1\}\ \forall i,\ \sum_{i=1}^{N} u_i = N_T, \\ 0 & \text{otherwise.} \end{cases}$$
In this case it is convenient to work with the sets of units assigned to the treatment, rather than the individual units. There are $K = N!/(N_T!\, N_C!) = \binom{N}{N_T}$ such sets. Of these, $(N-1)!/((N_T-1)!\, N_C!) = \binom{N-1}{N_T-1}$ include a given unit, such as unit 1, since there are $\binom{N-1}{N_T-1}$ combinations of the remaining units if that unit is treated. This represents a fraction $N_T/N$ of the total number of sets of $N_T$ treated units.
Hence the fraction of sets that does not include unit 1 is $N_C/N$.
Let $\tilde{\mathbf{U}}$ be the vector of length $K$ of indicators that denotes which set of $N_T$ units is treated. Let $e_k$ denote the $K$-component vector with all elements equal to zero, other than the $k$-th component which is equal to one. By construction, $\sum_{k=1}^{K} \tilde{U}_k = 1$, and $\tilde{U}_k \in \{0, 1\}$. Assumption 5 implies that the probability that $\tilde{U}_k = 1$ is equal to $N_T!\, N_C!/N! = 1/K$. Let $u_i(\tilde{\mathbf{U}}) \in \{0, 1\}$ be an indicator for unit $i$ being treated given the assignment vector $\tilde{\mathbf{U}}$. In this notation we can rewrite $\tau$ as
$$\tau = \tau(\tilde{\mathbf{U}}, \mathbf{V}) = \frac{1}{N_T} \sum_{k=1}^{K} \sum_{t=1}^{T} \tilde{U}_k V_t \sum_{i=1}^{N} u_i(\tilde{U}_k) \left( Y_{it}(1) - Y_{it}(0) \right).$$
Instead of the tensors $\mathbf{M}$ with dimension $N \times (N+1) \times T$, we now have tensors with dimension $K \times (N+1) \times T$, with one row for each of the $K = N!/(N_T!\, N_C!)$ possible sets of treated units. The estimators we consider are of the form
$$\hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}) \equiv \sum_{k=1}^{K} \sum_{t=1}^{T} \tilde{U}_k V_t \left\{ M_{k0t} + \sum_{j=1}^{N} M_{kjt} Y_{jt} \right\}. \tag{5.1}$$
This formulation suggests the restriction that $M_{kjt} = 1/N_T$ for all $j$ such that $u_j(e_k) = 1$, and $M_{kjt} \leq 0$ whenever $u_j(e_k) = 0$. The set of such $\mathbf{M}$ we consider for the generalized modified unbiased synthetic control (MUSC) estimator is
$$\mathcal{M}^{\mathrm{MUSC}} = \left\{ \mathbf{M} \,\middle|\, \sum_{j=1}^{N} M_{kjt} = 0\ \forall k, t,\quad \sum_{k=1}^{K} M_{kjt} = 0\ \forall j \geq 0, t \right\}.$$
The objective function for choosing $\mathbf{M}$ is now
$$\mathbf{M}(\mathbf{Y}, \mathcal{M}^{\mathrm{MUSC}}) = \arg\min_{\mathbf{M} \in \mathcal{M}^{\mathrm{MUSC}}} \sum_{k=1}^{K} \sum_{t=1}^{T} \sum_{s \neq t} \left( M_{k0t} + \sum_{j=1}^{N} M_{kjt} Y_{js} \right)^2.$$
Lemma 5.
Suppose that Assumption 5 holds. Then (i) the estimator $\hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}(\mathbf{Y}, \mathcal{M}^{\mathrm{MUSC}}))$ is unbiased conditional on $\mathbf{V}$:
$$E\left[ \hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}(\mathbf{Y}, \mathcal{M}^{\mathrm{MUSC}})) - \tau(\mathbf{U}, \mathbf{V}) \,\middle|\, \mathbf{V} \right] = 0,$$
(ii) the variance of $\hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}(\mathbf{Y}, \mathcal{M}^{\mathrm{MUSC}}))$ is
$$\mathbb{V}\left( \hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}(\mathbf{Y}, \mathcal{M}^{\mathrm{MUSC}})) \,\middle|\, \mathbf{V} \right) = \frac{1}{K} \sum_{t=1}^{T} V_t \sum_{k=1}^{K} \left( M_{k0t} + \sum_{j=1}^{N} M_{kjt} Y_{jt}(0) \right)^2,$$
and (iii) the variance can be estimated without bias (conditional on $\mathbf{V}$) by a generalization of the variance estimator in Proposition 1,
$$\hat{\mathbb{V}} = \sum_{k=1}^{K} \sum_{t=1}^{T} \tilde{U}_k V_t \left( \sum_{\substack{k'=1; \\ u_i(\tilde{U}_k) + u_i(\tilde{U}_{k'}) \leq 1\ \forall i}}^{K} \left\{ \frac{1}{\binom{N_C-2}{N_T}} \left( \sum_{j=1}^{N} (1 - u_j(\tilde{U}_k)) M_{k'jt} (Y_{jt} - \bar{Y}_{k't}) \right)^2 - \frac{N_T}{(N_C-1) \binom{N_C-2}{N_T}} \sum_{j=1}^{N} (1 - u_j(\tilde{U}_k)) M_{k'jt}^2 (Y_{jt} - \bar{Y}_{k't})^2 + \frac{2}{\binom{N_C-1}{N_T}} M_{k'0t} \sum_{j=1}^{N} (1 - u_j(\tilde{U}_k)) M_{k'jt} (Y_{jt} - \bar{Y}_{k't}) \right\} + \frac{1}{K} \sum_{k'=1}^{K} M_{k'0t}^2 \right)$$
for $\bar{Y}_{k't} = \frac{1}{N_T} \sum_{j=1}^{N} u_j(\tilde{U}_{k'}) Y_{jt}$.
So far the estimators we consider have imposed the restriction that all the treated units receive equal weight,
$$M_{kjt} = \frac{1}{N_T}.$$
In the case with a single treated unit that restriction was natural, but here we could relax this to requiring only that the weights for the treated units sum to one:
$$\sum_{j=1}^{N} u_j(e_k) M_{kjt} = 1.$$
This allows us to choose the weights for the treated units to reduce the variance. Yet changing the weights on treated units also affects the expectation under unit randomization. Specifically, we can understand the resulting estimator as estimating the weighted estimand
$$\sum_{k=1}^{K} \sum_{t=1}^{T} \tilde{U}_k V_t \sum_{i=1}^{N} u_i(\tilde{U}_k) M_{kit} \left( Y_{it}(1) - Y_{it}(0) \right).$$
While the estimator is unbiased relative to this estimand, and we can estimate its variance in the same way as in Lemma 5, we cannot generally estimate its error relative to the equally-weighted average treatment effect on the treated $\tau$.
To see the issue of unequally weighted treatment units, let us focus on the simplest case with two treated units and a single control unit. In that case there are $K = 3$ possible sets of two treated units. If we impose the restrictions that the weights for the treated units sum to one and the weights for the control units sum to minus one, there is only one free parameter in each row of the weight matrix. Consider the first row of the weight matrix, with the third unit the control unit. The estimator in that case is
$$\hat\tau = M_{11T} Y_1(1) + M_{12T} Y_2(1) - Y_3(0).$$
The error is
$$\hat\tau - \tfrac{1}{2} \left( Y_1(1) + Y_2(1) \right) + \tfrac{1}{2} \left( Y_1(0) + Y_2(0) \right) = (M_{11T} - 1/2) Y_1(1) + (M_{12T} - 1/2) Y_2(1) + \tfrac{1}{2} \left( Y_1(0) + Y_2(0) \right) - Y_3(0).$$
Hence the expected squared error over the three assignments is
$$\frac{1}{3} \Big\{ \left( (M_{11T} - 1/2) Y_1(1) + (M_{12T} - 1/2) Y_2(1) + \tfrac{1}{2} (Y_1(0) + Y_2(0)) - Y_3(0) \right)^2 + \left( (M_{21T} - 1/2) Y_1(1) + (M_{23T} - 1/2) Y_3(1) + \tfrac{1}{2} (Y_1(0) + Y_3(0)) - Y_2(0) \right)^2 + \left( (M_{32T} - 1/2) Y_2(1) + (M_{33T} - 1/2) Y_3(1) + \tfrac{1}{2} (Y_2(0) + Y_3(0)) - Y_1(0) \right)^2 \Big\}.$$
The complication is that there is no unbiased estimator for this error because it involves cross-products of $Y_i(0)$ and $Y_i(1)$ which cannot be estimated. If we impose the restriction that the weights for all the treated units are equal, the dependence of the error on the $Y_i(1)$ vanishes, and the error can in general be estimated without bias.
Here we look at the case where the estimand changes from the average effect for the treated unit(s) to the average effect over all units in the treated periods. For ease of exposition we continue to focus on the case with a single treated period and a single treated unit. The extension to the case with multiple treated units is conceptually clear based on the discussion in the previous subsection. Formally, the estimand is
$$\tau_V = \tau_V(\mathbf{V}) \equiv \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t \left( Y_{it}(1) - Y_{it}(0) \right).$$
We can separate this into two components, the effect for the treated unit,
$$\tau_T \equiv \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \left( Y_{it}(1) - Y_{it}(0) \right)$$
(the estimand $\tau$ before), and the average effect for the control units:
$$\tau_C \equiv \frac{1}{N-1} \sum_{i=1}^{N} \sum_{t=1}^{T} (1 - U_i) V_t \left( Y_{it}(1) - Y_{it}(0) \right),$$
with $\tau_V = \frac{1}{N} \tau_T + \frac{N-1}{N} \tau_C$. Consider, as before, an estimator of the form
$$\hat\tau(\mathbf{U}, \mathbf{V}, \mathbf{Y}, \mathbf{M}) = \sum_{i=1}^{N} \sum_{t=1}^{T} U_i V_t \left\{ M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt} \right\} = \sum_{t=1}^{T} \sum_{i=1}^{N} V_t U_i \left\{ M_{i0t} + M_{iit} Y_{it}(1) + \sum_{j=1}^{N} (1 - U_j) M_{ijt} Y_{jt}(0) \right\}.$$
The restrictions $M_{iit} = 1\ \forall i, t$, $\sum_{i=1}^{N} M_{ijt} = 0\ \forall j, t$ (including the intercept) still imply unbiasedness conditional on $\mathbf{V}$, and the MUSC remains unbiased for $\tau_V$. Yet the variance (and more generally the conditional expected loss of such a weighted estimator) is now
$$E\left[ \left( \hat\tau(\mathbf{U}, \mathbf{V}, \mathbf{Y}, \mathbf{M}) - \tau_V \right)^2 \,\middle|\, \mathbf{V} \right] = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t \left( M_{i0t} + \left( M_{iit} - \frac{1}{N} \right) Y_{it}(1) - \sum_{j=1}^{N} (1 - U_j) \frac{1}{N} Y_{jt}(1) + \frac{1}{N} Y_{it}(0) + \sum_{j=1}^{N} (1 - U_j) \left( M_{ijt} + \frac{1}{N} \right) Y_{jt}(0) \right)^2 \tag{5.2}$$
which depends on treated and untreated potential outcomes. This creates two related challenges: First, since the expression depends on treated outcomes, there is no immediate sample analogue available that corresponds to minimizing expected error, even under time randomization. Second, the variance cannot generally be estimated without bias, since it depends not only on the variation of the $Y_{it}(0)$ (which can be estimated), but also on the variation of the $Y_{it}(1)$ and their covariance with the $Y_{it}(0)$ (neither of which is identified from the data).
We briefly discuss two assumptions on the (non-stochastic) correlation of treatment and control outcomes, and what they imply for estimation. First, if treatment effects are constant within time period (and treatment and control potential outcomes thus perfectly correlated, $Y_{it}(1) - Y_{it}(0) = \tau$), then (5.2) becomes
$$\frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t \left( M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{jt}(0) \right)^2$$
as before, suggesting the MUSC estimator. If, on the other hand, treated outcomes are uncorrelated to control outcomes, and we focus on unbiased estimators (so in particular $M_{iit} = 1$), then (5.2) becomes
$$\frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} V_t \left( M_{i0t} + \frac{1}{N} \sum_{j=1}^{N} Y_{jt}(0) + \sum_{j=1}^{N} (1 - U_j) M_{ijt} Y_{jt}(0) \right)^2 + \text{const.},$$
suggesting an alternative MUSC-type estimator that minimizes the sample analogue in non-treated time periods over weights $\mathcal{M}^{\mathrm{MUSC}}$, which could effectively shrink the MUSC weights on control units towards the DiM weights $-\frac{1}{N-1}$. (Indeed, one feasible set of weights would yield the estimator $\hat\tau = \frac{1}{N} \hat\tau^{\mathrm{MUSC}} + \frac{N-1}{N} \hat\tau^{\mathrm{DiM}}$ of $\tau_V = \frac{1}{N} \tau_T + \frac{N-1}{N} \tau_C$ corresponding to control weights $\frac{1}{N} M^{\mathrm{MUSC}}_{ijt} - \frac{1}{N}$. This solution would likely be suboptimal because it enforces $M_{ijt} \leq -\frac{1}{N}$ for control weights.)
Throughout this article, we have assumed that treatment is assigned with equal probability across units, time periods, or unit–time pairs. Yet the theory we develop generalizes to non-constant propensity scores. For example, assume that treatment is assigned randomly to unit $i$ with probability $p_i$ (or, similarly, to a time period or a unit–time pair), where $\sum_{i=1}^{N} p_i = 1$. A natural analogue of the MUSC estimator is then
$$\mathbf{M}^{\mathrm{MUSC}_p}(\mathbf{Y}, \mathcal{M}^{\mathrm{MUSC}_p}) \equiv \arg\min_{\mathbf{M} \in \mathcal{M}^{\mathrm{MUSC}_p}} \sum_{i=1}^{N} p_i \sum_{t=1}^{T} \sum_{s \neq t} \left( M_{i0t} + \sum_{j=1}^{N} M_{ijt} Y_{js} \right)^2 \tag{5.3}$$
where unbiasedness is guaranteed by the constraints
$$\mathcal{M}^{\mathrm{MUSC}_p} = \left\{ \mathbf{M} \,\middle|\, \sum_{j=1}^{N} M_{ijt} = 0\ \forall i, t,\quad \sum_{i=1}^{N} p_i M_{ijt} = 0\ \forall j, t \right\}.$$
Such an estimator could be used when treatment is assigned randomly. When the analyst has a choice over the treatment assignment, and $t = T$, the optimization in (5.3) could also include the choice of propensity score.
In this article we study Synthetic Control methods from a design perspective. We show that when a randomized experiment is conducted, the standard SC estimator is biased. However, a minor modification of the SC estimator is unbiased under randomization, and in cases with few treated units can have RMSE properties superior to those of the standard Difference-in-Means estimator.
We show that the design perspective also has implications for observational studies. We propose a variance estimator that is validated by randomization.
References
Abadie, A. (2019). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature.
Abadie, A., Athey, S., Imbens, G. W., and Wooldridge, J. M. (2020). Sampling-based versus design-based uncertainty in regression analysis. Econometrica, 88(1):265–296.
Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association, 105(490):493–505.
Abadie, A., Diamond, A., and Hainmueller, J. (2015). Comparative politics and the synthetic control method. American Journal of Political Science, pages 495–510.
Abadie, A. and Gardeazabal, J. (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, 93(-):113–132.
Abadie, A. and Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica, 74(1):235–267.
Abadie, A. and L’Hour, J. (2017). A penalized synthetic control estimator for disaggregated data. Work. Pap., Mass. Inst. Technol., Cambridge, MA.
Amjad, M., Shah, D., and Shen, D. (2018). Robust synthetic control. The Journal of Machine Learning Research, 19(1):802–852.
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2019). Synthetic difference in differences. Technical report, National Bureau of Economic Research.
Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. (2017). Matrix completion methods for causal panel data models. arXiv preprint arXiv:1710.10251.
Athey, S. and Imbens, G. W. (2018). Design-based analysis in difference-in-differences settings with staggered adoption. Technical report, National Bureau of Economic Research.
Ben-Michael, E., Feller, A., and Rothstein, J. (2020). The augmented synthetic control method. arXiv preprint arXiv:1811.04170.
Bertrand, M., Duflo, E., and Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1):249–275.
Chernozhukov, V., Wuthrich, K., and Zhu, Y. (2017). An exact and robust conformal inference method for counterfactual and synthetic controls. arXiv preprint arXiv:1712.09089.
Cunningham, S. (2018). Causal inference: The mixtape. Yale University Press.
Doudchenko, N. and Imbens, G. W. (2016). Balancing, regression, difference-in-differences and synthetic control methods: A synthesis. Technical report, National Bureau of Economic Research.
Ferman, B. and Pinto, C. (2017). Placebo tests for synthetic controls.
Ferman, B. and Pinto, C. (2019). Synthetic controls with imperfect pre-treatment fit. arXiv preprint arXiv:1911.08521.
Fisher, R. A. (1937). The design of experiments. Oliver And Boyd; Edinburgh; London.
Freedman, D. A. (2008). On regression adjustments in experiments with several treatments. The Annals of Applied Statistics, 2(1):176–196.
Hahn, J. and Shi, R. (2016). Synthetic control and inference. Available at UCLA.
Imbens, G. W. and Rubin, D. B. (2015). Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
Lei, L. and Candès, E. J. (2020). Conformal inference of counterfactuals and individual treatment effects. arXiv preprint arXiv:2006.06138.
Li, K. T. (2020). Statistical inference for average treatment effects estimated by synthetic control methods. Journal of the American Statistical Association, 115(532):2068–2083.
Lin, W. (2013). Agnostic notes on regression adjustments for experimental data: Reexamining Freedman’s critique. The Annals of Applied Statistics, 7(1):295–318.
Neyman, J. (1923/1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4):465–472.
Rambachan, A. and Roth, J. (2020). Design-Based Uncertainty for Quasi-Experiments. arXiv preprint arXiv:2008.00602.
Rosenbaum, P. R. (2002). Observational Studies. Springer.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688.
Sekhon, J. S. and Shem-Tov, Y. (2017). Inference on a new class of sample average treatment effects. arXiv preprint arXiv:1708.02140.
Xu, Y. (2017). Generalized synthetic control method: Causal inference with interactive fixed effects models. Political Analysis, 25(1):57–76.
Appendix
Proof of Proposition 1.
We first consider the case without intercept. As a preliminary calculation, note that for $k, j, j' \in \{1, \dots, N\}$
$$\sum_{i=1}^{N} \sum_{k,j,j' \neq i} \frac{1}{N - |\{k, j, j'\}|}\, a_{kjj'} = \sum_{k,j,j'} a_{kjj'}, \tag{A.1}$$
since every $kjj'$ term appears $N - |\{k, j, j'\}|$ times in the sum on the left. Let now
$$a_{kjj'} = M_{kj} (Y_j(0) - Y_k(0)) \cdot M_{kj'} (Y_{j'}(0) - Y_k(0)),$$
where for simplicity we fix the period $t$, drop all time indices to set $M_{ij} = M_{ijt}$, and write $\hat{\mathbb{V}}_i$ for the variance estimator when $U_i = 1$. Then $a_{kjj'} = 0$ for $k \in \{j, j'\}$ and thus
$$\begin{aligned}
\frac{1}{N} \sum_{i=1}^{N} & \underbrace{\left[ \frac{1}{N-3} \sum_{k \neq i} \Big( \sum_{j \neq i} M_{kj} (Y_j(0) - Y_k(0)) \Big)^2 - \frac{1}{(N-2)(N-3)} \sum_{k,j \neq i} M_{kj}^2 (Y_j(0) - Y_k(0))^2 \right]}_{= \hat{\mathbb{V}}_i} \\
&= \frac{1}{N} \sum_{i=1}^{N} \Bigg( \sum_{k,j,j' \neq i} \frac{1}{N-3}\, a_{kjj'} - \sum_{\substack{k,j,j' \neq i \\ j = j'}} \underbrace{\frac{1}{(N-2)(N-3)}}_{= \frac{1}{N-3} - \frac{1}{N-2}} a_{kjj'} \Bigg) \\
&= \frac{1}{N} \sum_{i=1}^{N} \Bigg( \sum_{\substack{k,j,j' \neq i \\ |\{k,j,j'\}| = 3}} \frac{1}{N-3}\, a_{kjj'} + \sum_{\substack{k,j,j' \neq i \\ |\{k,j,j'\}| = 2}} \frac{1}{N-2} \underbrace{a_{kjj'}}_{= 0 \text{ for } j \neq j'} + \sum_{\substack{k,j,j' \neq i \\ |\{k,j,j'\}| = 1}} \frac{1}{N-1} \underbrace{a_{kjj'}}_{= 0} \Bigg) \\
&= \frac{1}{N} \sum_{i=1}^{N} \sum_{k,j,j' \neq i} \frac{1}{N - |\{k, j, j'\}|}\, a_{kjj'} \overset{\text{(A.1)}}{=} \frac{1}{N} \sum_{k,j,j'} a_{kjj'} \\
&= \frac{1}{N} \sum_{i=1}^{N} \Big( \sum_{j=1}^{N} M_{ij} (Y_j(0) - Y_i(0)) \Big)^2 = \frac{1}{N} \sum_{i=1}^{N} \Big( \sum_{j=1}^{N} M_{ij} Y_j(0) \Big)^2 = \mathbb{V}.
\end{aligned}$$
Here, we have used that $\sum_{j=1}^{N} M_{ij} = 0$.
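The counting identity (A.1) does not depend on the particular form of the terms, so it can be checked by brute force for an arbitrary tensor. The sketch below is illustrative only; the function name and the random tensor are hypothetical, and N must exceed 3 so that every denominator is positive.

```python
import itertools
import numpy as np

def lhs_A1(a):
    """Left-hand side of (A.1): sum over the left-out unit i and over all
    triples (k, j, j') that avoid i, each weighted by 1 / (N - |{k, j, j'}|)."""
    N = a.shape[0]
    total = 0.0
    for i in range(N):
        others = [u for u in range(N) if u != i]
        for k, j, jp in itertools.product(others, repeat=3):
            total += a[k, j, jp] / (N - len({k, j, jp}))
    return total

# Arbitrary tensor with N = 5 (hypothetical values).
rng = np.random.default_rng(1)
a = rng.normal(size=(5, 5, 5))
rhs_A1 = a.sum()
```

Each triple with s distinct indices is skipped by exactly s choices of i, so it is counted N - s times with weight 1/(N - s), which is the whole content of the identity.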
With an intercept we note that
$$\begin{aligned}
\mathbb{V} &= \frac{1}{N} \sum_{i=1}^{N} \Big( M_{i0} + \sum_{j=1}^{N} M_{ij} Y_j(0) \Big)^2 = \frac{1}{N} \sum_{i=1}^{N} \Big( M_{i0} + \sum_{j=1}^{N} M_{ij} (Y_j(0) - Y_i(0)) \Big)^2 \\
&= \frac{1}{N} \sum_{i=1}^{N} M_{i0}^2 + \frac{2}{N} \sum_{i=1}^{N} M_{i0} \sum_{j=1}^{N} M_{ij} (Y_j(0) - Y_i(0)) + \frac{1}{N} \sum_{i=1}^{N} \Big( \sum_{j=1}^{N} M_{ij} (Y_j(0) - Y_i(0)) \Big)^2,
\end{aligned}$$
where
$$\frac{2}{N-2} \sum_{k \neq i} M_{k0} \sum_{j \neq i} M_{kj} (Y_j(0) - Y_k(0))$$
is unbiased for the middle term, using that $M_{kj} (Y_j(0) - Y_k(0)) = 0$ for $k = j$. It follows that
$$\hat{\mathbb{V}}_i = \frac{1}{N-3} \sum_{k \neq i} \Big( \sum_{j \neq i} M_{kj} (Y_j(0) - Y_k(0)) \Big)^2 - \frac{1}{(N-2)(N-3)} \sum_{k,j \neq i} M_{kj}^2 (Y_j(0) - Y_k(0))^2 + \frac{2}{N-2} \sum_{k \neq i} M_{k0} \sum_{j \neq i} M_{kj} (Y_j(0) - Y_k(0)) + \frac{1}{N} \sum_{k} M_{k0}^2$$
is an unbiased estimator of the conditional variance $\mathbb{V}$.
Proof of the variance expression in Lemma 5.
This proof generalizes the proof of Proposition 1 above. Specifically, for $[N] = \{1, \dots, N\}$,
$$\sum_{k \subseteq [N];\, |k| = N_T}\ \sum_{i \subseteq [N] \setminus k;\, |i| = N_T}\ \sum_{j,j' \in ([N] \setminus k) \cup \{0\}} \frac{1}{\binom{|[N] \setminus (i \cup \{j, j'\})|}{N_T}}\, a_{i,j,j'} = \sum_{k \subseteq \{1, \dots, N\};\, |k| = N_T}\ \sum_{j,j' \in [N] \cup \{0\}} a_{k,j,j'} \tag{A.2}$$
for a conformal tensor $a$.
For fixed $t$ as above consider weights $M_{kj}$ indexed by $k \subseteq [N]$ with $|k| = N_T$ and $j \in [N]\ \cup$