A Design-Based Perspective on Synthetic Control Methods

Abstract

Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control (SC) methods have quickly become one of the leading methods for estimating causal effects in observational studies with panel data. Formal discussions often motivate SC methods by the assumption that the potential outcomes were generated by a factor model. Here we study SC methods from a design-based perspective, assuming a model for the selection of the treated unit(s), e.g., random selection as guaranteed in a randomized experiment. We show that SC methods offer benefits even in settings with randomized assignment, and that the design perspective offers new insights into SC methods for observational data. A first insight is that the standard SC estimator is not unbiased under random assignment. We propose a simple modification of the SC estimator that guarantees unbiasedness in this setting and derive its exact, randomization-based, finite sample variance. We also propose an unbiased estimator for this variance. We show in settings with real data that under random assignment this Modified Unbiased Synthetic Control (MUSC) estimator can have a root mean-squared error (RMSE) that is substantially lower than that of the difference-in-means estimator. We show that such an improvement is weakly guaranteed if the treated period is similar to the other periods, for example, if the treated period was randomly selected. The improvement is most likely to be substantial if the number of pre-treatment periods is large relative to the number of control units.


Lea Bottmer ∗, Guido Imbens †, Jann Spiess ‡, Merrill Warnick §

First version: November 2020. Current version: January 2021.


Keywords: Randomization, Synthetic Controls, Panel Data, Causal Effects

∗ Department of Economics, Stanford University.
† Professor of Economics, Graduate School of Business, and Department of Economics, Stanford University, SIEPR, and NBER.
‡ Assistant Professor of Operations, Information & Technology, Graduate School of Business, Stanford University, and SIEPR.
§ Department of Economics, Stanford University.

Introduction

Synthetic Control (SC) methods for estimating causal effects have become commonplace in empirical work in the social sciences since their introduction by Alberto Abadie and coauthors (Abadie and Gardeazabal, 2003; Abadie et al., 2010, 2015). Typically the properties of the original SC estimator, as well as those of the various modifications that have been proposed subsequently, are studied under model-based assumptions about the distribution of the potential outcomes in the absence of the intervention. A common approach is to assume that the potential outcomes follow a factor model plus noise.

In this article we take a different approach to studying the properties of SC estimators. Instead of making model-based assumptions about the distribution of the potential outcomes, we make design-based assumptions about the assignment of the unit/time-period pairs to treatment. In particular, we consider the case with a single treated unit/time-period, with the treated unit selected at random from a set of units. We find that in this setting the original SC estimator is generally biased. We propose a simple modification of the SC estimator, labeled the Modified Unbiased Synthetic Control (MUSC) estimator, which is unbiased under random assignment of the treatment. We also propose a variance estimator that is unbiased for the variance of this estimator.

Studying the properties of SC-type estimators under design-based assumptions serves two distinct purposes. First, it suggests an important role for SC methods in randomized experiments. Second, it leads to new insights into SC methods in observational studies. SC methods are particularly valuable in experimental settings with relatively few units with correlated variation over time. In such experiments, where the design assumptions hold by definition, SC-type methods can have substantially better RMSE properties than the standard estimator based on the difference in means by treatment status, where we focus on the average treatment effect on the treated. However, it may be important to maintain the guarantees that the standard estimator for randomized experiments, the difference in means, enjoys under randomization. The proposed MUSC estimator does so, and combines the typical improvement in terms of RMSE with the guarantees under randomization.

To illustrate the benefits of the MUSC estimator in experimental settings, we simulate an experiment based on data on average log wages observed across 50 states over 40 years. We randomly select one state to be treated in the last period, and compare the difference in means, the standard SC estimator, and our proposed MUSC (Modified Unbiased Synthetic Control) estimator. There are three key findings, partly reported in Table 1 and expanded on in Section 4. First, the difference-in-means and the new MUSC estimator are unbiased by construction whereas the SC estimator is biased. Although in this example the bias of the SC estimator is modest, there are no guarantees that the bias is small in general. Second, the RMSE is substantially lower for the SC and MUSC estimators relative to the RMSE of the difference-in-means estimator. This suggests that there can be considerable gains to using SC methods in randomized experiments, without a need to surrender the unbiasedness guaranteed by the randomization. Third, the new variance estimators are accurate in this setting.

Table 1: Simulation Experiment Based on CPS Average Log Wage by State and Year

                      Diff in Means    Synth Control    Modified Unbiased Synth Control
Bias                  0                −0.007           0
RMSE                  0.105            0.051            0.[illegible]
[row label lost]      0.105            0.051            0.[illegible]

[...] (e.g., Abadie et al., 2010; Doudchenko and Imbens, 2016; Ferman and Pinto, 2017), is also biased. With placebo methods, one estimates the variance by calculating the variance of the SC estimator over the distribution generated by applying the SC estimator to randomly selected control units, with or without the actual treated unit. Such methods can both under-estimate and over-estimate the true variance under our design assumptions. In contrast, our proposed variance estimator is unbiased in finite samples, even with a single treated unit, under the random assignment assumption, irrespective of any autocorrelation in the potential outcomes. Third, the design perspective highlights the importance of the choice of estimand. We define four different average treatment effects, either for the treated unit/period pairs, or averaged over units and/or periods, and show how the choice of estimand relates to the assumptions and inferential methods.

The article builds on the general Synthetic Control literature where early key contributions are Abadie and Gardeazabal (2003); Abadie et al. (2010, 2015); Abadie (2019). Recent work proposing new estimators in this general class includes Doudchenko and Imbens (2016); Abadie and L'Hour (2017); Ferman and Pinto (2017); Arkhangelsky et al. (2019); Li (2020); Ben-Michael et al. (2020). The current paper also contributes to the literature on inference for SC estimators, including Abadie et al. (2010); Doudchenko and Imbens (2016); Ferman and Pinto (2017); Hahn and Shi (2016); Lei and Candès (2020); Chernozhukov et al. (2017). There is also a connection to randomization inference in the general evaluation literature (e.g., Neyman, 1990; Imbens and Rubin, 2015; Sekhon and Shem-Tov, 2017; Abadie et al., 2020; Rambachan and Roth, 2020).

We consider a setting with $N$ units, for whom we observe outcomes for $T$ time periods. There is a binary treatment that varies by units and time periods, denoted by $W_{it} \in \{0,1\}$, and a pair of potential outcomes $Y_{it}(0)$ and $Y_{it}(1)$ for all unit/period combinations (Rubin, 1974). For the time being we assume there are no dynamic effects, so the potential outcomes are indexed only by the contemporaneous treatment. In some of the settings we consider, dynamic effects would simply change the interpretation of the estimand. There are no restrictions on the time path of the potential outcomes. The $N \times T$ matrices of treatments and potential outcomes are denoted by $W$, $Y(0)$ and $Y(1)$, respectively. Given the treatment, the realized/observed outcome matrix is $Y$, with typical element

$$Y_{it} \equiv W_{it} Y_{it}(1) + (1 - W_{it}) Y_{it}(0). \tag{2.1}$$

In contrast to most of the literature, we take the potential outcomes $Y(0)$ and $Y(1)$ as fixed in our analysis, and treat the assignment matrix $W$ as stochastic. This in turn makes the realized outcomes $Y$ stochastic.

For much of the discussion we focus on the case with a single treated unit and a single treated period. Many of the insights carry over to the case with a block of treated unit/time-period pairs, and we discuss explicitly the case with multiple treated units in Section 5.1. In the case with a single treated unit/time-period pair, the $N \times T$ matrix $W$, with typical element $W_{it} \in \{0,1\}$, satisfies $\sum_{i,t} W_{it} = 1$.

It is useful for our design-based analysis to separate out the assignment mechanism into the selection of the time period treated and the unit treated. For that purpose, exploiting the fact that there is only a single pair $(i,t)$ with $W_{it} = 1$, we write

$$W = U V^\top,$$

where $U$ is an $N$-vector with typical element $U_i \in \{0,1\}$ and $\sum_{i=1}^N U_i = 1$, and $V$ is a $T$-vector with typical element $V_t \in \{0,1\}$ and $\sum_{t=1}^T V_t = 1$.
The $V_t$ and $U_i$ can be defined in terms of $W$: $V_t = \sum_{i=1}^N W_{it}$ and $U_i = \sum_{t=1}^T W_{it}$.

In many cases the treated unit is exposed only in the last period, so $V$ is non-stochastic and has the last element equal to one and all earlier elements equal to zero. We examine this case separately, but additional insights are obtained by considering the more general case where both the treated unit and the treated period are stochastic.

Next we define the estimands we consider in this article. Being precise about estimands will be important for the discussion of bias and variance, as well as the role of various assumptions we introduce. The estimands are defined as averages over elements of the matrix $Y(1) - Y(0)$ with typical element $\tau_{it} \equiv Y_{it}(1) - Y_{it}(0)$. Which elements of this matrix we average over potentially depends on $W$, and thus the estimand may be stochastic. It is useful to write the estimands as functions of $U$ and $V$ to show that some of the estimands depend only on one component of $W = U V^\top$. For ease of exposition, the dependence on the potential outcomes is suppressed in the notation.

The estimand that is the primary focus in this discussion is the causal effect for the single treated unit/time-period:

$$\tau \equiv \tau(U, V) \equiv \sum_{i=1}^N \sum_{t=1}^T U_i V_t \bigl( Y_{it}(1) - Y_{it}(0) \bigr). \tag{2.2}$$

Because there is only a single treated unit/period, this is not an average but a single difference in potential outcomes. For the case with multiple treated units or periods this estimand could be generalized to

$$\tilde{\tau} \equiv \sum_{i=1}^N \sum_{t=1}^T U_i V_t \bigl( Y_{it}(1) - Y_{it}(0) \bigr) \Bigg/ \Biggl( \sum_{i=1}^N \sum_{t=1}^T U_i V_t \Biggr).$$

We are interested in accurate estimation of $\tau$ as well as inference. In particular, we focus on two properties of estimators for $\tau$: the bias and the (exact finite sample) variance.
We also discuss estimation of this variance.

There are three other estimands that are important to contrast with the primary estimand $\tau$. First, the average effect for all $N$ units in the treated period, the "vertical" effect, which only depends on $V$ and not on $U$:

$$\tau^V \equiv \tau^V(V) \equiv \frac{1}{N} \sum_{i=1}^N \sum_{t=1}^T V_t \bigl( Y_{it}(1) - Y_{it}(0) \bigr).$$

Next, the average effect for the treated unit over all $T$ periods, the "horizontal" effect. This only depends on $U$ and not on $V$:

$$\tau^H \equiv \tau^H(U) \equiv \frac{1}{T} \sum_{i=1}^N \sum_{t=1}^T U_i \bigl( Y_{it}(1) - Y_{it}(0) \bigr).$$

Finally, the population average treatment effect, which depends on neither $U$ nor $V$:

$$\tau^{POP} \equiv \frac{1}{NT} \sum_{i=1}^N \sum_{t=1}^T \bigl( Y_{it}(1) - Y_{it}(0) \bigr).$$

An important conceptual advantage of focusing on $\tau$ rather than any of the other three estimands is that it frees us from conceptualizing $Y_{it}(1)$ for unit/time periods other than the unit/time pair that was actually treated. To make this explicit, we can write $\tau$ as

$$\tau = \sum_{i=1}^N \sum_{t=1}^T U_i V_t Y_{it} - \sum_{i=1}^N \sum_{t=1}^T U_i V_t Y_{it}(0),$$

which clarifies that the estimand does not depend on any unobserved $Y_{it}(1)$. This can be important because in some cases it is difficult to give meaning to $Y_{it}(1)$ for units other than the treated unit. For example, in the German reunification application in Abadie et al. (2015), it is difficult to conceptualize $Y_{it}(1)$ for countries other than West Germany: what does it mean for France to unify with East Germany? However, focusing on $\tau$ allows us to ignore all unobserved $Y_{it}(1)$ in this application; we only need to consider the value of West German GDP in the absence of the reunification, $Y_{it}(0)$.

Most of the literature on panel data in general, and SC methods in particular, relies on modeling assumptions on the potential outcomes to derive properties of the proposed methods.
For SC methods, the model often takes the form of an $R$-factor latent-factor model for the control outcome,

$$Y_{it}(0) = \sum_{r=1}^R \gamma_{ir} \beta_{tr} + \varepsilon_{it},$$

in combination with independence assumptions on the noise components $\varepsilon_{it}$ (Abadie et al., 2010; Athey et al., 2017; Amjad et al., 2018; Xu, 2017). Here we focus instead on design assumptions about the assignment process, that is, assumptions on the distribution of $W$ (or, equivalently, the distributions of $U$ and $V$). In this approach, we place no restrictions on the potential outcomes, and in fact take those as fixed in the repeated sampling thought experiments. Such design-based, as opposed to model-based, approaches have been used in the general experimental design and program evaluation literature (e.g., Fisher, 1937; Neyman, 1990; Imbens and Rubin, 2015; Rosenbaum, 2002; Cunningham, 2018), as well as more recently in regression settings (Abadie et al., 2020) and panel data analyses (Athey and Imbens, 2018). However, to the best of our knowledge, they have not yet been used to analyze SC estimators.

First, we consider random assignment of the units to the treatment.

Assumption 1. (Random Assignment of Units)

$$\mathrm{pr}(U = u) = \begin{cases} 1/N & \text{if } u_i \in \{0,1\}\ \forall i,\ \sum_{i=1}^N u_i = 1, \\ 0 & \text{otherwise.} \end{cases}$$

We can also write this as $U \perp\!\!\!\perp (Y(0), Y(1))$. Some version of this assumption underlies many approaches to inference in synthetic control estimation.

A second assumption we consider for some (but not all) of our results is less common. Here we assume that the (single) treated period was randomly selected from the $T$ periods under observation.

Assumption 2. (Random Assignment of Treated Period)

$$\mathrm{pr}(V = v) = \begin{cases} 1/T & \text{if } v_t \in \{0,1\}\ \forall t,\ \sum_{t=1}^T v_t = 1, \\ 0 & \text{otherwise.} \end{cases}$$

Although this assumption is not plausible in many cases, as it is often the last period(s) that are treated, it is useful to consider its implications. The implicit synthetic control assumption is that the within-period relation between outcomes for different units during the pre-treatment periods is similar to that within-period relationship during the treated periods. Assumption 2 formalizes this.

We also consider the special case, which is common in empirical work, where the last element of $V$ is equal to one and all other elements are equal to zero, so the treatment only occurs in the last period. In this case, the distribution of $V$ is degenerate.

Assumption 3. (Last Period Assignment)

$$\mathrm{pr}(V = v) = \begin{cases} 1 & \text{if } v_t = 0\ \forall t = 1, \ldots, T-1,\ v_T = 1, \\ 0 & \text{otherwise.} \end{cases}$$

Though some properties we derive rely on Assumption 1 and Assumption 2, some results are conditional on $V$ and therefore still apply to this common special case.

Most of our discussion concerns the finite-sample case. However, for some results it will be useful to consider large-$T$ approximations. There are also settings where it may be useful to consider large-$N$ results, but we do not do so here.
Large-$N$ settings require regularization of the SC weights, and the properties of the estimators will depend directly on the specific regularization method used (e.g., Abadie and L'Hour, 2017; Doudchenko and Imbens, 2016). For large-$T$ results we consider a stationarity assumption. First define $Y_{\cdot t}(0)$ to be the $N$-vector with typical element $Y_{it}(0)$. Define the averages up to period $t$ of the first and the centered second moment:

$$\hat{\mu}_t = \frac{1}{t} \sum_{s=1}^t Y_{\cdot s}(0), \qquad \hat{\Sigma}_t = \frac{1}{t} \sum_{s=1}^t \bigl( Y_{\cdot s}(0) - \hat{\mu}_s \bigr) \bigl( Y_{\cdot s}(0) - \hat{\mu}_s \bigr)^\top.$$

Assumption 4. (Large-$T$ Stationarity) For some finite $\mu$ and $\Sigma$ the sequence of populations indexed by $T$ satisfies, as $T \to \infty$,

$$\hat{\mu}_T \longrightarrow \mu, \qquad \hat{\Sigma}_T \longrightarrow \Sigma.$$

Next we introduce four conventional estimators and assess their properties as estimators for the four estimands defined in Section 2.1 under Assumption 1 and/or Assumption 2. This serves to clarify the role of the assumptions and the comparisons they validate, as well as set the stage for the discussion of the SC estimators.

The first estimator is the simple difference in outcomes for treated and control unit/period pairs:

$$\hat{\tau}^{DiM}(U, V, Y) = \sum_{i=1}^N \sum_{t=1}^T U_i V_t Y_{it} - \frac{1}{NT - 1} \sum_{i=1}^N \sum_{t=1}^T (1 - U_i V_t) Y_{it}.$$

Second, the vertical difference, only for the treated period, between the treated unit and the control units:

$$\hat{\tau}^{V}(U, V, Y) = \sum_{i=1}^N \sum_{t=1}^T U_i V_t Y_{it} - \frac{1}{N - 1} \sum_{i=1}^N \sum_{t=1}^T V_t (1 - U_i) Y_{it}.$$

Third, the horizontal difference, only for the treated unit, between the treated period and the control periods:

$$\hat{\tau}^{H}(U, V, Y) = \sum_{i=1}^N \sum_{t=1}^T U_i V_t Y_{it} - \frac{1}{T - 1} \sum_{i=1}^N \sum_{t=1}^T U_i (1 - V_t) Y_{it}.$$

Fourth, the difference-in-differences or two-way fixed effect estimator (based on the two-way fixed effect regression specification $Y_{it} = \mu + \beta_t + \alpha_i + \tau W_{it} + \varepsilon_{it}$). This DiD estimator can be written as the double difference in four averages:

$$\hat{\tau}^{DiD}(U, V, Y) = \sum_{i=1}^N \sum_{t=1}^T U_i V_t Y_{it} - \frac{1}{N - 1} \sum_{i=1}^N \sum_{t=1}^T V_t (1 - U_i) Y_{it} - \frac{1}{T - 1} \sum_{i=1}^N \sum_{t=1}^T U_i (1 - V_t) Y_{it} + \frac{1}{(N - 1)(T - 1)} \sum_{i=1}^N \sum_{t=1}^T (1 - V_t)(1 - U_i) Y_{it}.$$

In Table 2 we report the bias properties of the four estimators under Assumptions 1 and 2, both separately and jointly, for the four estimands.

Table 2: Bias Properties of Estimators for Varying Estimands and Assumptions (B: biased, U: unbiased)

Maintained assumption:   Unit Randomization       Time Randomization       Unit & Time Rand
Estimand:                τ   τ^POP  τ^V  τ^H      τ   τ^POP  τ^V  τ^H      τ   τ^POP  τ^V  τ^H
Estimator
τ̂^DiM                    B   B      B    B        B   B      B    B        U   U      U    U
τ̂^H                      B   B      B    B        U   B      B    U        U   U      U    U
τ̂^V                      U   B      U    B        B   B      B    B        U   U      U    U
τ̂^DiD                    U   B      U    B        U   B      B    U        U   U      U    U

For the first four estimators based on simple sample averages, unit randomization guarantees unbiasedness for any estimator that compares average outcomes for the treated unit to the average for the control units, which includes the DiD estimator $\hat{\tau}^{DiD}$ and the vertical average estimator $\hat{\tau}^{V}$. On the other hand, time randomization ensures unbiasedness for estimators that compare average outcomes for treated periods and control periods, which includes the DiD estimator $\hat{\tau}^{DiD}$ and the horizontal average estimator $\hat{\tau}^{H}$. The simple difference estimator is only unbiased under both unit and time randomization.

In this section we consider SC type estimators, including the original SC estimator proposed by Abadie et al. (2010), as well as three modifications thereof. We define these estimators for all possible treatment assignment vectors $U$ and $V$, initially for the case where both $U$ and $V$ have a single element equal to one and all other elements equal to zero. We do this at a higher level of abstraction than is typically done in the SC literature in order to accommodate some modifications and to convey additional insights.

We characterize the SC type estimators in terms of a set of weights $M_{ijt}$, indexed by $i = 1, \ldots, N$, $j = 0, \ldots, N$, and $t = 1, \ldots, T$. Given a set of weights $M$, and given the assignments $U$, $V$, and the outcomes $Y$, the general SC estimator has the form

$$\hat{\tau}(U, V, Y, M) \equiv \sum_{i=1}^N \sum_{t=1}^T U_i V_t \Biggl\{ M_{i0t} + \sum_{j=1}^N M_{ijt} Y_{jt} \Biggr\}. \tag{2.3}$$

Note that this estimator simplifies to the vertical estimator $\hat{\tau}^{V}(U, V, Y)$ if we choose the weights $M_{i0t} = 0$, $M_{ijt} = -1/(N-1)$ for $j \neq 0, i$, and $M_{iit} = 1$ for all $i$.

The various estimators in this SC class we consider differ in the choice of the weights $M$. There are in general two components to this choice. First, there is a set of possible weights, denoted by $\mathcal{M}$, over which we search for an optimal weight. These sets differ between the estimators we consider, but in all cases the sets are non-stochastic and do not depend on either the assignment matrix $W$ or the outcome data $Y$. Second, there is an objective function that defines the chosen weight within the set of possible weights. This objective function is the same for all estimators we consider in the current article.

We start with the second component. For a given matrix of outcomes $Y$, and a given set of possible weights $\mathcal{M}$, define the tensor $M(Y, \mathcal{M})$ with elements $M_{ijt}$, for $i = 1, \ldots, N$, $j = 0, \ldots, N$, and $t = 1, \ldots, T$, through the minimization over the weights $M$ over the set of weights $\mathcal{M}$:

$$M(Y, \mathcal{M}) \equiv \arg\min_{M \in \mathcal{M}} \sum_{i=1}^N \sum_{t=1}^T \sum_{s \neq t} \Biggl( M_{i0t} + \sum_{j=1}^N M_{ijt} Y_{js} \Biggr)^2. \tag{2.4}$$

For the connection to the original setup for the SC estimator, see Abadie et al. (2010) and Doudchenko and Imbens (2016). This definition of the weights has some important features that drive some of the subsequent results. It is convenient to define the infeasible weights $M^* \equiv M(Y(0), \mathcal{M})$. The reason that it is convenient to work with $M^*$ is that $M^*$ is non-stochastic, as a function of the fixed potential outcomes and the set $\mathcal{M}$. However, because we do not observe all elements of $Y(0)$, we cannot calculate $M^*_{ijt}$ for all $i$, $j$, and $t$. But, for the purpose of estimation we do not need to know every element of $M^*$. In fact, we only need to know $M^*_{ijt}$ for all $j$ for the pair $(i, t)$ such that $U_i = V_t = 1$, and these components are calculable given $W$ and $Y$.

Next, we consider the sets of possible weights $\mathcal{M}$.
For a given set $\mathcal{M}$ we can characterize the estimators as

$$\hat{\tau}(U, V, Y, M(Y, \mathcal{M})) = \sum_{i=1}^N \sum_{t=1}^T U_i V_t \Biggl\{ M_{i0t}(Y, \mathcal{M}) + \sum_{j=1}^N M_{ijt}(Y, \mathcal{M}) Y_{jt} \Biggr\}. \tag{2.5}$$

For this class of estimators we can view the weights as non-stochastic:

Lemma 1. For all $Y(0)$, $Y(1)$, $U$, $V$, and $\mathcal{M}$,

$$\hat{\tau}(U, V, Y, M(Y, \mathcal{M})) = \hat{\tau}(U, V, Y, M(Y(0), \mathcal{M})). \tag{2.6}$$

This representation is useful because the properties of $\hat{\tau}(U, V, Y, M(Y(0), \mathcal{M}))$ are easier to establish under assumptions on $V$ and $U$ than those of $\hat{\tau}(U, V, Y, M(Y, \mathcal{M}))$ for the general case.

We consider in this discussion only sets of weights that impose the following restrictions. First,

$$M_{iit} = 1, \quad \forall i \in \{1, \ldots, N\},\ t \in \{1, \ldots, T\}. \tag{2.7}$$

Second, the weight for unit $j$ for the prediction of the causal effect for unit $i$ is nonpositive:

$$M_{ijt} \leq 0, \quad \forall i \in \{1, \ldots, N\},\ j \in \{1, \ldots, N\} \setminus \{i\},\ t \in \{1, \ldots, T\}. \tag{2.8}$$

Note that there is no restriction on the intercepts $M_{i0t}$. To simplify the notation, and to highlight the restrictions that differ between the different estimators, we leave the restrictions (2.7) and (2.8) implicit in the sets below. We consider four estimators in this class, characterized by four sets of possible weights $\mathcal{M}$, all of which impose restrictions (2.7) and (2.8).

The original SC estimator (Abadie and Gardeazabal, 2003; Abadie et al., 2010) corresponds to the estimator in (2.5) with the set

$$\mathcal{M}^{SC} = \Biggl\{ M \,\Bigg|\, M_{i0t} = 0\ \forall i, t,\ \ \sum_{j=1}^N M_{ijt} = 0\ \forall i, t \Biggr\}.$$

The modification introduced in Doudchenko and Imbens (2016) and Ferman and Pinto (2019) allows for an intercept by dropping the restriction $M_{i0t} = 0$, leading to the Modified Synthetic Control (MSC) estimator:

$$\mathcal{M}^{MSC} = \Biggl\{ M \,\Bigg|\, \sum_{j=1}^N M_{ijt} = 0\ \forall i, t \Biggr\}.$$

Allowing for this intercept in the weights has been proposed to make the SC estimator more robust. Arkhangelsky et al. (2019) show that the inclusion of the intercept can be interpreted as including a unit fixed effect in the regression function.
Here we discuss how the inclusion of the intercept ties in with the time randomization assumption.

We also consider a second modification of the basic SC estimator, where we place an additional set of restrictions on the weights:

$$\sum_{i=1}^N M_{ijt} = 0, \quad \forall j = 1, \ldots, N,\ t = 1, \ldots, T. \tag{2.9}$$

This restriction will be seen to ensure unbiasedness of the corresponding estimator given randomization of the treated units. Formally the set of possible weights for the Unbiased Synthetic Control (USC) estimator is

$$\mathcal{M}^{USC} = \Biggl\{ M \,\Bigg|\, M_{i0t} = 0\ \forall i, t,\ \ \sum_{j=1}^N M_{ijt} = 0\ \forall i, t,\ \ \sum_{i=1}^N M_{ijt} = 0\ \forall j \geq 1, t \Biggr\}.$$

Finally, we combine the inclusion of the intercept with the additional restriction, leading to the Modified Unbiased Synthetic Control (MUSC) estimator:

$$\mathcal{M}^{MUSC} = \Biggl\{ M \,\Bigg|\, \sum_{j=1}^N M_{ijt} = 0\ \forall i, t,\ \ \sum_{i=1}^N M_{ijt} = 0\ \forall j \geq 1, t \Biggr\}.$$

Here, we drop the restriction that there is no intercept, $M_{i0t} = 0\ \forall i, t$. The optimization problem in (2.4) together with the restrictions in $\mathcal{M}^{MUSC}$ then imply that intercepts do not contribute bias, $\sum_{i=1}^N M_{i0t} = 0\ \forall t$.

These four sets of restrictions define four estimators: $\hat{\tau}^{SC}$, $\hat{\tau}^{MSC}$, $\hat{\tau}^{USC}$, and $\hat{\tau}^{MUSC}$, based on the sets $\mathcal{M}^{SC}$, $\mathcal{M}^{MSC}$, $\mathcal{M}^{USC}$ and $\mathcal{M}^{MUSC}$, respectively. Our focus is mainly on $\hat{\tau}^{SC}$ and $\hat{\tau}^{MUSC}$, with the comparison with $\mathcal{M}^{MSC}$ and $\mathcal{M}^{USC}$ serving to aid the interpretation of the restrictions that make up the difference between $\hat{\tau}^{SC}$ and $\hat{\tau}^{MUSC}$. Specifically, we show that relaxing the no-intercept restriction leads to unbiasedness under time randomization for large $T$, and imposing the second restriction (2.9) leads to unbiasedness under unit randomization.

In this section, we investigate the properties of the various estimators given the unit and/or time randomization assumptions (Assumption 1 and Assumption 2).
In some cases there are exact properties while in other cases these may depend on $N$ and/or $T$ being large.

The first question we study is the bias of the four SC estimators, as estimators of the treatment effect for the treated unit, $\tau$. We summarize the results in Table 3.

Table 3: Bias Properties of SC Estimators for τ (B: biased, U: unbiased)

Maintained assumption:   Unit Rand   Time Rand   Time Rand,   Unit & Time Rand   Unit & Time Rand,
                                                 Large T                         Large T
Estimator
τ̂^SC                     B           B           B            B                  B
τ̂^MSC                    B           B           U            B                  B
τ̂^USC                    U           B           B            U                  U
τ̂^MUSC                   U           B           U            U                  U

The basic SC estimator is not unbiased even if the unit treated and the time period treated were both randomly selected. Because this is perhaps surprising, it is useful to investigate the bias of the SC estimator in more detail. In order to guarantee unbiasedness given random selection of the treated unit, we require that $\sum_{i=1}^N M_{ijt} = 0$, which holds for the USC and MUSC estimators. In order to achieve unbiasedness given random selection of the time period, we need an unrestricted intercept in the weights, as in the MSC and MUSC estimators, as well as a large number of time periods.

Recall the general definition of the SC type estimators in (2.3), which we rewrite as

$$\hat{\tau}(U, V, Y, M) = \sum_{i=1}^N \sum_{t=1}^T U_i V_t \Biggl\{ M_{i0t} + \sum_{j=1}^N M_{ijt} Y_{jt} \Biggr\} = \sum_{t=1}^T \sum_{i=1}^N V_t U_i \Biggl\{ Y_{it}(1) + M_{i0t} + \sum_{j=1}^N (1 - U_j) M_{ijt} Y_{jt}(0) \Biggr\}.$$

We focus on the properties relative to the treatment effect for the treated, $\sum_{i=1}^N \sum_{t=1}^T U_i V_t \bigl( Y_{it}(1) - Y_{it}(0) \bigr)$. The estimation error is equal to

$$\hat{\tau}(U, V, Y, M) - \tau(U, V) = \sum_{i=1}^N \sum_{t=1}^T U_i V_t \Biggl\{ Y_{it}(0) + M_{i0t} + \sum_{j=1}^N (1 - U_j) M_{ijt} Y_{jt}(0) \Biggr\}.$$

We consider separately the SC estimators without an intercept, for fixed $M$, so that $M_{i0t} = 0$. This includes both the SC and USC estimators.

Lemma 2. Suppose Assumption 1 (random assignment of units to treatment) holds, and that one of the following two conditions holds: (i) the intercept is zero, $M_{i0t} = 0$ for all $i, t$, or (ii) the intercept is estimated through (2.4). Then the conditional bias vanishes if $\mathcal{M}$ guarantees

$$\sum_{i=1}^N M_{ijt} = 0 \quad \forall j, t.$$

This lemma shows that $\hat{\tau}^{USC}$ and $\hat{\tau}^{MUSC}$ are unbiased, since both estimators only search over weight sets that fulfill the summing-up condition. Intuitively, this weight restriction ensures that each unit on average is used the same amount in treatment and in control when forming the synthetic control estimator (and that the intercepts average out to zero).
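The summing-up condition in Lemma 2 can be checked by brute force in a tiny example. The sketch below is illustrative only: it fixes one treated period, takes the weights as given rather than solving (2.4), sets all intercepts to zero, and averages the estimation error over the $N$ equally likely treated units. The outcome values, the treatment effects, and the "nearest-neighbor" weights are hypothetical choices made for this illustration. The vertical weights ($M_{ijt} = -1/(N-1)$ off the diagonal) have column sums of zero, whereas the nearest-neighbor weights satisfy the row-sum restriction of $\mathcal{M}^{SC}$ but not (2.9).

```python
from fractions import Fraction

def mean_error(y0, weights, effects):
    """Average of (estimate - effect) over all N equally likely treated units.

    y0[j]         control outcome Y_j(0) in the treated period
    weights[i][j] fixed weight M_ij with M_ii = 1 (intercepts are zero here)
    effects[i]    unit-level effect Y_i(1) - Y_i(0)
    """
    n = len(y0)
    total = Fraction(0)
    for i in range(n):
        observed = list(y0)
        observed[i] = y0[i] + effects[i]  # only the treated unit reveals Y_i(1)
        estimate = sum(weights[i][j] * observed[j] for j in range(n))
        total += estimate - effects[i]
    return total / n

# Hypothetical data for one treated period with N = 3 units.
y0 = [Fraction(1), Fraction(2), Fraction(4)]
effects = [Fraction(5), Fraction(-3), Fraction(7)]
n = len(y0)

# Vertical weights: rows and columns both sum to zero.
vertical = [[Fraction(1) if i == j else Fraction(-1, n - 1) for j in range(n)]
            for i in range(n)]
# Each unit is compared with its nearest neighbor only: rows sum to zero,
# but the column sums are (0, -1, 1), so restriction (2.9) fails.
nearest = [[1, -1, 0], [-1, 1, 0], [0, -1, 1]]

bias_vertical = mean_error(y0, vertical, effects)
bias_nearest = mean_error(y0, nearest, effects)
print(bias_vertical, bias_nearest)  # 0 2/3
```

The nonzero value for the nearest-neighbor weights equals $\frac{1}{N}\sum_j Y_{j}(0) \sum_i M_{ij} = (0 \cdot 1 - 1 \cdot 2 + 1 \cdot 4)/3$, so the bias comes entirely from the column sums and does not depend on the treatment effects.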

Lemma 3.

For the SC estimator the conditional bias under Assumption 1 is

Bias SC = E [ ˆ τ − τ | V ] = 1 N N (cid:88) i =1 T (cid:88) t =1 V t Y it (0) N (cid:88) j =1 M SC jit . The intuition for the bias of the SC estimator also holds for the simple matching estima-tor (Abadie and Imbens, 2006), which is also biased under randomization in ﬁnite samples,and it extends to other weighting estimators with the addition of the conditional bias term N (cid:80) Ni =1 (cid:80) Tt =1 V t M i t coming from the intercepts.We can estimate the bias for the SC estimator as (cid:100) Bias SC = 1 N − N (cid:88) i =1 T (cid:88) t =1 V t (1 − U i ) Y it N (cid:88) j =1 M SC jit . (3.1)This estimator for the bias is unbiased, and so in principle an unbiased estimator can also begenerated by subtracting this estimated bias from the standard SC estimator. However, theproperties of this unbiased estimator are not very attractive in terms of root mean-squared error(RMSE).To see the role the time randomization plays in the bias, consider the MSC estimator in asetting with large T , and random selection of the treated period. For ease of exposition, supposeunit N is the treated unit. We can view the MSC estimator as a regression estimator where weregress the outcomes Y N , . . . , Y NT on the treatment indicator and the predictors Y t , . . . , Y N − ,t and an intercept. It is well known that this leads to an estimator that is asymptotically unbiased15n large samples (in this case meaning large T ; Freedman, 2008; Lin, 2013). Consider the general SC class estimator ˆ τ = ˆ τ ( U , V , Y , M ) as an estimator of τ . Lemma 4.

Suppose Assumption 1 holds. Then

$$\mathbb{V}(\mathbf{V}, \mathbf{M}) = E\left[\left(\hat\tau(\mathbf{U}, \mathbf{V}, \mathbf{Y}, \mathbf{M}) - \tau\right)^2 \,\middle|\, \mathbf{V}\right] = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} V_t \left( M_{i0t} + \sum_{j=1}^{N} M_{ijt}\, Y_{jt}(0) \right)^2.$$

For the unbiased estimators $\hat\tau^{USC}$ and $\hat\tau^{MUSC}$ this is also the variance. For the other estimators this is the expected squared error. Note that the synthetic-control objective in (2.4) solves for the analogue of this error in the non-treated periods, which we return to in Section 3.4.
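As a rough illustration of the expression in Lemma 4, the following sketch evaluates the conditional expected squared error for a single treated period by enumerating which unit is treated. The function names, the toy outcomes, and the choice of difference-in-means weights (1 on the treated unit, $-1/(N-1)$ on each other unit, zero intercept) are illustrative assumptions, not part of the paper.

```python
from typing import List


def expected_squared_error(m0: List[float], m: List[List[float]], y0: List[float]) -> float:
    """(1/N) * sum_i (m0[i] + sum_j m[i][j] * y0[j])**2 for one treated period.

    m0[i] is the intercept used when unit i is treated, m[i][j] the weight on
    unit j in that case, and y0[j] the control potential outcome of unit j.
    """
    n = len(y0)
    total = 0.0
    for i in range(n):  # each unit is the treated unit with probability 1/N
        error_i = m0[i] + sum(m[i][j] * y0[j] for j in range(n))
        total += error_i ** 2
    return total / n


def dim_weights(n: int) -> List[List[float]]:
    """Difference-in-means weights: 1 on the treated unit, -1/(n-1) elsewhere."""
    return [[1.0 if i == j else -1.0 / (n - 1) for j in range(n)] for i in range(n)]


# Hypothetical control outcomes in the treated period for N = 4 units.
y0 = [1.0, 2.0, 4.0, 7.0]
weights = dim_weights(len(y0))
intercepts = [0.0] * len(y0)
v_dim = expected_squared_error(intercepts, weights, y0)
print(v_dim)
```

Because the difference-in-means weights have columns that sum to zero, this value is also the randomization variance of that estimator in the toy example; for weights that violate the column condition it would be an expected squared error instead.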

Proposition 1.

Suppose Assumption 1 holds. Then the estimator

$$\hat{\mathbb{V}} = \sum_{i=1}^{N}\sum_{t=1}^{T} U_i V_t \Bigg\{ \frac{1}{N-3}\sum_{k \neq i}\Bigg(\sum_{j \neq i} M_{kjt}\,(Y_{jt} - Y_{kt})\Bigg)^2 - \frac{1}{(N-2)(N-3)}\sum_{k \neq i}\sum_{j \neq i} M_{kjt}^2\,(Y_{jt} - Y_{kt})^2 + \frac{2}{N-2}\sum_{k \neq i} M_{k0t}\Bigg(\sum_{j \neq i} M_{kjt}\,(Y_{jt} - Y_{kt})\Bigg) + \frac{1}{N}\sum_{k=1}^{N} M_{k0t}^2 \Bigg\}$$

is unbiased for $\mathbb{V}(\mathbf{V}, \mathbf{M})$.

The variance estimator in this proposition takes the form of a leave-one-out estimator, with an additional term that corrects for over-counting the diagonal elements in the inner square of the first term and additional terms for the intercept.
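The unbiasedness claim can be checked numerically by averaging the estimator over all $N$ equally likely choices of the treated unit. The sketch below does this for a single treated period, using the coefficients $1/(N-3)$, $1/((N-2)(N-3))$, $2/(N-2)$ and $1/N$ that appear in the proof in the appendix. The function names and the toy weights and outcomes are hypothetical; the only property of the weights the check relies on is that each row sums to zero, and it requires $N \geq 4$.

```python
from typing import List


def variance_estimate(i: int, m0: List[float], m: List[List[float]], y: List[float]) -> float:
    """Leave-one-out variance estimate when unit i is treated (one period).

    Only outcomes of the control units (all units except i) are used.
    """
    n = len(y)
    controls = [k for k in range(n) if k != i]
    square_of_sums = 0.0
    sum_of_squares = 0.0
    intercept_cross = 0.0
    for k in controls:
        terms = [m[k][j] * (y[j] - y[k]) for j in controls]
        row_sum = sum(terms)
        square_of_sums += row_sum ** 2
        sum_of_squares += sum(term ** 2 for term in terms)
        intercept_cross += m0[k] * row_sum
    return (square_of_sums / (n - 3)
            - sum_of_squares / ((n - 2) * (n - 3))
            + 2.0 * intercept_cross / (n - 2)
            + sum(a ** 2 for a in m0) / n)


def true_variance(m0: List[float], m: List[List[float]], y0: List[float]) -> float:
    """(1/N) * sum_i (m0[i] + sum_j m[i][j] * y0[j])**2, as in Lemma 4."""
    n = len(y0)
    return sum((m0[i] + sum(m[i][j] * y0[j] for j in range(n))) ** 2 for i in range(n)) / n


def mean_estimate(m0: List[float], m: List[List[float]], y0: List[float]) -> float:
    """Average of the estimator over the N equally likely treated units."""
    n = len(y0)
    return sum(variance_estimate(i, m0, m, y0) for i in range(n)) / n


def toy_weights(n: int) -> List[List[float]]:
    """Hypothetical weights: 1 on the diagonal, uneven negative weights summing to -1."""
    rows = []
    for i in range(n):
        raw = [0.0 if j == i else 1.0 + ((3 * i + 5 * j) % 7) for j in range(n)]
        scale = sum(raw)
        rows.append([1.0 if j == i else -raw[j] / scale for j in range(n)])
    return rows


y0 = [1.0, 2.0, 4.0, 7.0, 3.5]
m = toy_weights(len(y0))
m0 = [0.3, -0.1, 0.2, -0.5, 0.1]
print(mean_estimate(m0, m, y0), true_variance(m0, m, y0))
```

The two printed numbers should agree up to floating-point error. The estimate for any single assignment generally differs from the true variance; only the average over assignments matches.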

Here we consider an alternative method for estimating the variance of SC estimators, versions of which have been proposed previously both for testing zero effects (e.g., Abadie et al., 2010) and for constructing confidence intervals (e.g., Doudchenko and Imbens, 2016). Suppose unit $i$ is the treated unit. For each of the $N-1$ control units ($j$ with $j \neq i$) we recalculate the weights, leaving out the treated unit, and then estimate the treatment effect. For ease of exposition we focus on the case of Assumption 3 where the last period is the treated period, $V_T = 1$.

Formally, define for $i = 1, \ldots, N$ the $N$ restrictions of the weights $\mathcal{M}$ for fixed $t = T$ to $\mathcal{M}^{(i)}$, where each weight matrix $\mathbf{M}^{(i)}$ of dimension $N \times (N+1)$ (indexed by $\{1, \ldots, N\} \times \{0, \ldots, N\}$ as before) is further restricted by $M^{(i)}_{ij} = 0 = M^{(i)}_{ji}$ for all $j$, and

$$\mathbf{M}^{(i)}(\mathbf{Y}, \mathcal{M}^{(i)}) = \arg\min_{\mathbf{M}^{(i)} \in \mathcal{M}^{(i)}} \sum_{s=1}^{T-1}\sum_{j \neq i}\Bigg( M^{(i)}_{j0} + \sum_{k \neq i} M^{(i)}_{jk}\, Y_{ks} \Bigg)^2. \tag{3.2}$$

The placebo estimator is then

$$\hat\tau^{(i)}_j = M^{(i)}_{j0} + \sum_{k \neq i} M^{(i)}_{jk}\, Y_{kT}.$$

This is an estimator of zero, and we can use it to estimate the variance as

$$\hat{\mathbb{V}}_{PCB} = \frac{1}{N-1}\sum_{i=1}^{N}\sum_{j \neq i} U_i \left(\hat\tau^{(i)}_j\right)^2 = \frac{1}{N-1}\sum_{i=1}^{N}\sum_{j \neq i} U_i \Bigg( M^{(i)}_{j0} + \sum_{k \neq i} M^{(i)}_{jk}\, Y_{kT} \Bigg)^2.$$

Below, we consider the properties of this placebo variance estimator for the case of the MUSC estimator.
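To make the placebo construction concrete, here is a minimal sketch for the difference-in-means estimator, where no optimization as in (3.2) is needed. It assumes, for illustration only, that the leave-one-out weights are 1 on the placebo unit and $-1/(N-2)$ on each remaining control, with zero intercept; the function name and the data are hypothetical. For SC-type estimators the weights would instead have to be re-estimated for each left-out treated unit.

```python
from typing import List


def placebo_variance_dim(i: int, y_last: List[float]) -> float:
    """Placebo variance estimate when unit i is treated in the last period.

    Each control unit j is treated as a placebo unit; its placebo effect is
    its outcome minus the mean outcome of the other controls (excluding i and j).
    The treated unit's own outcome is never used.
    """
    n = len(y_last)
    controls = [j for j in range(n) if j != i]
    total = 0.0
    for j in controls:
        donors = [k for k in controls if k != j]
        placebo_effect = y_last[j] - sum(y_last[k] for k in donors) / len(donors)
        total += placebo_effect ** 2
    return total / (n - 1)


# Hypothetical last-period outcomes for N = 4 units; unit 0 is treated.
y_last = [1.0, 2.0, 4.0, 7.0]
v_pcb = placebo_variance_dim(0, y_last)
print(v_pcb)
```

In this toy example the three placebo effects are $-3.5$, $-0.5$ and $4$, so the placebo variance is $28.5 / 3 = 9.5$.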

Remark 1.

The placebo variance can be biased downward, conditional on $\mathbf{V}$. We focus on the case with $V_T = 1$ (i.e., Assumption 3). Suppose $N = 4$, $\mathbf{M} = \ldots$, $\mathbf{Y}_{\cdot T} = \ldots$, so the units are matched in pairs, but the matching is of poor quality. Then the placebo variance is smaller in expectation than the true variance.

Remark 2.

In the same setting of Assumption 3, the placebo variance can be biased upward, conditional on $\mathbf{V}$. Suppose $N = 4$, $\mathbf{M} = \ldots$, $\mathbf{Y}_{\cdot T} = \ldots$, so the units are matched in pairs, and the matching is of perfect quality. Then the placebo variance is higher in expectation than the true variance.

Here we discuss how to motivate the SC estimators based on our design perspective. We may wish to choose a matrix $\mathbf{M}$ to minimize the expected variance, where the expectation is both over the random period that is treated and the random unit that is treated. The expected variance is

$$\mathbb{EV}(\hat\tau) = E\left[\left(\hat\tau(\mathbf{U}, \mathbf{V}, \mathbf{Y}) - \tau\right)^2\right] = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left( M_{i0t} + \sum_{j=1}^{N} M_{ijt}\, Y_{jt}(0) \right)^2.$$

Let $\mathbf{M}^*$ be the value of $\mathbf{M}$ that minimizes this infeasible objective function. Under time-randomization (Assumption 2), an unbiased estimator for this objective function is

$$\widehat{\mathbb{EV}} = \frac{1}{N(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}(1 - V_t)\left( M_{i0t} + \sum_{j=1}^{N} M_{ijt}\, Y_{jt} \right)^2.$$

This is in fact the MUSC/SC objective function from (2.4). Let $\widehat{\mathbf{M}}$ be the value that minimizes this empirical objective. $\widehat{\mathbf{M}}$ is not unbiased for $\mathbf{M}^*$, but it is a natural approximation.

Figure 1: Pre- and Post-Treatment fit of SC and MUSC

In this section we illustrate some of the methods proposed in this article.

To illustrate some of the concepts developed in this article, we first turn to the data from the California smoking study (Abadie et al., 2010). In Figure 1 we compare the SC and MUSC estimates. We find that the pre-treatment fit as well as the point estimates are similar for the SC and MUSC estimators.

4.2 A Simulation Study

We also perform a small simulation study to assess the properties of the MUSC estimator. Following Bertrand et al. (2004) and Arkhangelsky et al. (2019), we use data from the Current Population Survey for 50 states and 40 years. The variables we use include state/year average log wages, hours, and the state/year unemployment rate. For each of the variables we use the Difference-in-Means (DiM) estimator, the standard Synthetic Control (SC) estimator and the Modified Unbiased Synthetic Control (MUSC) estimator. We calculate for randomly selected states the estimated effect and compare that to the actual value.

In Table 4 we report the RMSE. We see that the RMSE is substantially lower for the SC and the MUSC estimators than for the DiM estimator for log wages and hours: this holds in every treated period for log wages and, with the exception of the MUSC estimator at T = 39, for hours. For the unemployment rate the gains are smaller, and in a few periods (for example, T = 26, 30, and 39) the DiM estimator has the lowest RMSE. Table 5 reports the standard errors based on our variance estimator, and using the placebo approach.

In this section we look at generalizations of the setup considered so far with a single treated unit and single treated period where the estimand was the average effect for the treated. First, we consider the case with multiple treated units. Second, we consider the case where the estimand is the average effect for all units in the treated period. Both of these generalizations create conceptual complications. Third, we consider the case of non-constant propensity scores.

Table 4: Simulation Experiment Based on CPS Data by State and Year – RMSE

Treated   Log Wages                 Hours                     Unemployment Rate
Period    DiM     SC      MUSC      DiM     SC      MUSC      DiM     SC      MUSC
T = 21    0.1157  0.0543  0.0550    1.4838  0.9797  0.9340    0.0123  0.0111  0.0102
T = 22    0.1112  0.0496  0.0502    1.3764  0.8204  0.9170    0.0107  0.0086  0.0084
T = 23    0.1089  0.0435  0.0484    1.2821  0.9091  0.9009    0.0114  0.0111  0.0116
T = 24    0.1143  0.0421  0.0450    1.2400  0.9569  0.8829    0.0145  0.0146  0.0142
T = 25    0.1136  0.0465  0.0500    1.3125  0.9074  0.8896    0.0124  0.0110  0.0115
T = 26    0.1111  0.0487  0.0471    1.1424  0.8509  0.8347    0.0132  0.0136  0.0146
T = 27    0.1105  0.0481  0.0493    1.1926  0.8612  0.8141    0.0155  0.0145  0.0149
T = 28    0.1014  0.0551  0.0674    1.0627  0.7611  0.7660    0.0136  0.0105  0.0106
T = 29    0.0964  0.0473  0.0594    1.1342  0.8552  0.8686    0.0137  0.0124  0.0118
T = 30    0.0980  0.0505  0.0516    1.0741  0.7765  0.7755    0.0137  0.0141  0.0147
T = 31    0.1084  0.0410  0.0393    1.2340  0.9700  0.9537    0.0209  0.0180  0.0175
T = 32    0.1049  0.0460  0.0468    1.0571  0.8496  0.8209    0.0214  0.0182  0.0167
T = 33    0.1046  0.0530  0.0536    1.2947  1.0018  0.9696    0.0215  0.0161  0.0151
T = 34    0.1060  0.0625  0.0662    1.1640  0.9174  0.9007    0.0182  0.0131  0.0129
T = 35    0.1053  0.0560  0.0537    1.2233  1.0824  0.9914    0.0201  0.0137  0.0137
T = 36    0.0985  0.0594  0.0535    1.1267  0.8083  0.7396    0.0150  0.0124  0.0119
T = 37    0.1017  0.0605  0.0579    1.3066  1.2080  1.1235    0.0156  0.0128  0.0131
T = 38    0.0929  0.0580  0.0615    0.9917  0.7742  0.7929    0.0126  0.0116  0.0117
T = 39    0.0853  0.0459  0.0554    0.8437  0.7979  0.9475    0.0112  0.0120  0.0122
T = 40    0.1051  0.0517  0.0479    1.4048  1.2714  1.2382    0.0126  0.0112  0.0106
Average

Table 5: Simulation Experiment Based on CPS Data (Log Wages) by State and Year – Standard Error; Treated Period T = 40

Each column reports $\sqrt{\hat{\mathbb{V}}}$ for the named estimator; PCB denotes the placebo variance estimator.

Unit      DiM     DiM,PCB   SC      SC,PCB   MUSC    MUSC,PCB
i = 1     0.1047  0.1047    0.0506  0.0520   0.0546  0.0492
i = 2     0.1060  0.1060    0.0524  0.0519   0.0478  0.0480
i = 3     0.1046  0.1046    0.0529  0.0521   0.0479  0.0480
i = 4     0.1062  0.1062    0.0514  0.0522   0.0455  0.0479
i = 5     0.1052  0.1052    0.0520  0.0524   0.0487  0.0484
i = 6     0.1040  0.1040    0.0526  0.0526   0.0492  0.0486
i = 7     0.1045  0.1045    0.0486  0.0491   0.0461  0.0467
i = 8     0.1048  0.1049    0.0522  0.0518   0.0474  0.0476
i = 9     0.1061  0.1062    0.0527  0.0519   0.0501  0.0492
i = 10    0.1060  0.1060    0.0526  0.0525   0.0470  0.0486
...
i = 31    0.1012  0.1012    0.0480  0.0516   0.0465  0.0467
i = 32    0.1062  0.1062    0.0517  0.0510   0.0466  0.0474
i = 33    0.1062  0.1062    0.0527  0.0524   0.0485  0.0488
i = 34    0.1060  0.1060    0.0499  0.0496   0.0452  0.0466
i = 35    0.1061  0.1061    0.0526  0.0523   0.0469  0.0489
i = 36    0.1049  0.1050    0.0523  0.0524   0.0481  0.0483
i = 37    0.1062  0.1062    0.0527  0.0527   0.0474  0.0492
i = 38    0.1062  0.1062    0.0529  0.0521   0.0476  0.0482
i = 39    0.1059  0.1059    0.0524  0.0523   0.0464  0.0485
i = 40    0.1051  0.1052    0.0522  0.0521   0.0481  0.0486
i = 41    0.1062  0.1062    0.0510  0.0525   0.0464  0.0472
i = 42    0.1060  0.1060    0.0522  0.0519   0.0483  0.0485
i = 43    0.1054  0.1054    0.0511  0.0507   0.0523  0.0478
i = 44    0.1028  0.1028    0.0541  0.0518   0.0563  0.0482
i = 45    0.1060  0.1060    0.0500  0.0508   0.0508  0.0475
i = 46    0.1051  0.1051    0.0504  0.0507   0.0475  0.0472
i = 47    0.1048  0.1048    0.0520  0.0515   0.0486  0.0474
i = 48    0.1059  0.1059    0.0497  0.0506   0.0443  0.0454
i = 49    0.1056  0.1056    0.0525  0.0523   0.0471  0.0487
i = 50    0.1043  0.1043    0.0523  0.0523   0.0472  0.0479
Average

In this section we look at the case with multiple treated units. We fix the number of treated units at $N_T$. The estimand is, as before, the average effect for the $N_T$ treated units:

$$\tau = \tau(\mathbf{U}, \mathbf{V}) \equiv \frac{1}{N_T}\sum_{i=1}^{N}\sum_{t=1}^{T} U_i V_t \left( Y_{it}(1) - Y_{it}(0) \right).$$

We modify Assumption 1 to

Assumption 5. (Random Assignment of Units)

$$\mathrm{pr}(\mathbf{U} = \mathbf{u}) = \begin{cases} \left( \dfrac{N!}{N_T!\, N_C!} \right)^{-1} & \text{if } u_i \in \{0, 1\}\ \forall i,\ \sum_{i=1}^{N} u_i = N_T, \\ 0 & \text{otherwise.} \end{cases}$$

In this case it is convenient to work with the sets of units assigned to the treatment, rather than the individual units. There are $K = N!/(N_T!\, N_C!) = \binom{N}{N_T}$ such sets. Of these, $(N-1)!/((N_T-1)!\, N_C!) = \binom{N-1}{N_T-1}$ include a given unit, such as unit 1, since there are $\binom{N-1}{N_T-1}$ combinations of the remaining units if that unit is treated. This represents a fraction $N_T/N$ of the total number of sets of $N_T$ treated units.
Hence the fraction of sets that does not include unit 1 is $N_C/N$.

Let $\tilde{\mathbf{U}}$ be the vector of length $K$ of indicators that denotes which set of $N_T$ units is treated. Let $e_k$ denote the $K$-component vector with all elements equal to zero, other than the $k$-th component which is equal to one. By construction, $\sum_{k=1}^{K} \tilde U_k = 1$, and $\tilde U_k \in \{0, 1\}$. Assumption 5 implies that the probability that $\tilde U_k = 1$ is equal to $N_T!\, N_C!/N! = 1/K$. Let $u_i(\tilde{\mathbf{U}}) \in \{0, 1\}$ be an indicator for unit $i$ being treated given the assignment vector $\tilde{\mathbf{U}}$. In this notation we can rewrite $\tau$ as

$$\tau = \tau(\tilde{\mathbf{U}}, \mathbf{V}) = \frac{1}{N_T}\sum_{k=1}^{K}\sum_{t=1}^{T} \tilde U_k V_t \sum_{i=1}^{N} u_i(\tilde U_k)\left( Y_{it}(1) - Y_{it}(0) \right).$$

Instead of the tensors $\mathbf{M}$ with dimension $N \times (N+1) \times T$, we now have tensors with dimension $K \times (N+1) \times T$, with one row for each of the $K = N!/(N_T!\, N_C!)$ possible sets of treated units. The estimators we consider are of the form

$$\hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}) \equiv \sum_{k=1}^{K}\sum_{t=1}^{T} \tilde U_k V_t \left\{ M_{k0t} + \sum_{j=1}^{N} M_{kjt}\, Y_{jt} \right\}. \tag{5.1}$$

This formulation suggests the restriction that $M_{kjt} = 1/N_T$ for all $j$ such that $u_j(e_k) = 1$, and $M_{kjt} \leq 0$ whenever $u_j(e_k) = 0$. The set of such $\mathbf{M}$ we consider for the generalized modified unbiased synthetic control (MUSC) estimator is

$$\mathcal{M}^{MUSC} = \left\{ \mathbf{M} \,\middle|\, \sum_{j=1}^{N} M_{kjt} = 0\ \forall k, t,\ \ \sum_{k=1}^{K} M_{kjt} = 0\ \forall j \geq 0, t \right\}.$$

The objective function for choosing $\mathbf{M}$ is now

$$\mathbf{M}(\mathbf{Y}, \mathcal{M}^{MUSC}) = \arg\min_{\mathbf{M} \in \mathcal{M}^{MUSC}} \sum_{k=1}^{K}\sum_{t=1}^{T}\sum_{s \neq t}\left( M_{k0t} + \sum_{j=1}^{N} M_{kjt}\, Y_{js} \right)^2.$$

Lemma 5.

Suppose that Assumption 5 holds. Then (i) the estimator $\hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}(\mathbf{Y}, \mathcal{M}^{MUSC}))$ is unbiased conditional on $\mathbf{V}$:

$$E\left[ \hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}(\mathbf{Y}, \mathcal{M}^{MUSC})) - \tau(\mathbf{U}, \mathbf{V}) \,\middle|\, \mathbf{V} \right] = 0,$$

(ii) the variance of $\hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}(\mathbf{Y}, \mathcal{M}^{MUSC}))$ is

$$\mathbb{V}\left( \hat\tau(\tilde{\mathbf{U}}, \mathbf{V}, \mathbf{Y}, \mathbf{M}(\mathbf{Y}, \mathcal{M}^{MUSC})) \,\middle|\, \mathbf{V} \right) = \frac{1}{K}\sum_{t=1}^{T} V_t \sum_{k=1}^{K}\left( M_{k0t} + \sum_{j=1}^{N} M_{kjt}\, Y_{jt}(0) \right)^2,$$

and (iii) the variance can be estimated without bias (conditional on $\mathbf{V}$) by a generalization of the variance estimator in Proposition 1,

$$\hat{\mathbb{V}} = \sum_{k=1}^{K}\sum_{t=1}^{T} \tilde U_k V_t \Bigg( \sum_{\substack{k'=1 \\ u_i(\tilde U_k) + u_i(\tilde U_{k'}) \leq 1\ \forall i}}^{K} \Bigg\{ \frac{1}{\binom{N_C - 2}{N_T}} \Bigg( \sum_{j=1}^{N} (1 - u_j(\tilde U_k))\, M_{k'jt}\,(Y_{jt} - \bar Y_{k't}) \Bigg)^2 - \frac{N_T}{(N_C - 1)\binom{N_C - 2}{N_T}} \sum_{j=1}^{N} (1 - u_j(\tilde U_k))\, M_{k'jt}^2\,(Y_{jt} - \bar Y_{k't})^2 + \frac{2}{\binom{N_C - 1}{N_T}}\, M_{k'0t} \sum_{j=1}^{N} (1 - u_j(\tilde U_k))\, M_{k'jt}\,(Y_{jt} - \bar Y_{k't}) \Bigg\} + \frac{1}{K}\sum_{k'=1}^{K} M_{k'0t}^2 \Bigg)$$

for $\bar Y_{k't} = \frac{1}{N_T}\sum_{j=1}^{N} u_j(\tilde U_{k'})\, Y_{jt}$.

So far the estimators we consider have imposed the restriction that all the treated units receive equal weight,

$$M_{kjt} = \frac{1}{N_T}.$$

In the case with a single treated unit that restriction was natural, but here we could relax this to requiring only that the sum of the weights for the treated units is restricted to unity:

$$\sum_{j=1}^{N} u_j(e_k)\, M_{kjt} = 1.$$

This allows us to choose the weights for the treated units to reduce the variance. Yet changing the weights on treated units also affects the expectation under unit randomization. Specifically, we can understand the resulting estimator as estimating the weighted estimand

$$\sum_{k=1}^{K}\sum_{t=1}^{T} \tilde U_k V_t \sum_{i=1}^{N} u_i(\tilde U_k)\, M_{kit}\left( Y_{it}(1) - Y_{it}(0) \right).$$
While the estimator is unbiased relative to this estimand, and we can estimate its variance in the same way as in Lemma 5, we cannot generally estimate its error relative to the equally-weighted average treatment effect on the treated $\tau$.

To see the issue of unequally weighted treatment units, let us focus on the simplest case with two treated units and a single control unit. In that case there are $K = 3$ possible sets of two treated units. If we impose the restrictions that the weights for the treated units sum to one and the weights for the control units sum to minus one, there is only one free parameter in each row of the weight matrix. Consider the first row of the weight matrix, with the third unit the control unit. The estimator in that case is

$$\hat\tau = M_{11T}\, Y_1(1) + M_{12T}\, Y_2(1) - Y_3(0).$$

The error is

$$\hat\tau - \tfrac{1}{2}\left( Y_1(1) + Y_2(1) \right) + \tfrac{1}{2}\left( Y_1(0) + Y_2(0) \right) = (M_{11T} - 1/2)\, Y_1(1) + (M_{12T} - 1/2)\, Y_2(1) + \tfrac{1}{2}\left( Y_1(0) + Y_2(0) \right) - Y_3(0).$$

Hence the expected squared error over the three assignments is

$$\frac{1}{3}\Bigg\{ \Big( (M_{11T} - 1/2)\, Y_1(1) + (M_{12T} - 1/2)\, Y_2(1) + \tfrac{1}{2}\left( Y_1(0) + Y_2(0) \right) - Y_3(0) \Big)^2 + \Big( (M_{21T} - 1/2)\, Y_1(1) + (M_{23T} - 1/2)\, Y_3(1) + \tfrac{1}{2}\left( Y_1(0) + Y_3(0) \right) - Y_2(0) \Big)^2 + \Big( (M_{32T} - 1/2)\, Y_2(1) + (M_{33T} - 1/2)\, Y_3(1) + \tfrac{1}{2}\left( Y_2(0) + Y_3(0) \right) - Y_1(0) \Big)^2 \Bigg\}.$$

The complication is that there is no unbiased estimator for this error because it involves cross-products of $Y_i(0)$ and $Y_i(1)$ which cannot be estimated. If we impose the restriction that the weights for all the treated units are equal, the dependence of the error on the $Y_i(1)$ vanishes, and the error can in general be estimated without bias.

Here we look at the case where the estimand changes from the average effect for the treated unit(s) to the average effect over all units in the treated periods. For ease of exposition we continue to focus on the case with a single treated period and a single treated unit. The extension to the case with multiple treated units is conceptually clear based on the discussion in the previous subsection. Formally, the estimand is

$$\tau_V = \tau_V(\mathbf{V}) \equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} V_t\left( Y_{it}(1) - Y_{it}(0) \right).$$

We can separate this into two components, the effect for the treated unit,

$$\tau_T \equiv \sum_{i=1}^{N}\sum_{t=1}^{T} U_i V_t\left( Y_{it}(1) - Y_{it}(0) \right)$$

(the estimand $\tau$ from before), and the average effect for the control units:

$$\tau_C \equiv \frac{1}{N-1}\sum_{i=1}^{N}\sum_{t=1}^{T} (1 - U_i)\, V_t\left( Y_{it}(1) - Y_{it}(0) \right),$$

with

$$\tau_V = \frac{1}{N}\,\tau_T + \frac{N-1}{N}\,\tau_C.$$

Consider, as before, an estimator of the form

$$\hat\tau(\mathbf{U}, \mathbf{V}, \mathbf{Y}, \mathbf{M}) = \sum_{i=1}^{N}\sum_{t=1}^{T} U_i V_t\left\{ M_{i0t} + \sum_{j=1}^{N} M_{ijt}\, Y_{jt} \right\} = \sum_{t=1}^{T}\sum_{i=1}^{N} V_t U_i\left\{ M_{i0t} + M_{iit}\, Y_{it}(1) + \sum_{j=1}^{N} (1 - U_j)\, M_{ijt}\, Y_{jt}(0) \right\}.$$
The restrictions $M_{iit} = 1\ \forall i, t$ and $\sum_{i=1}^{N} M_{ijt} = 0\ \forall j, t$ (including the intercept) still imply unbiasedness conditional on $\mathbf{V}$, and the MUSC remains unbiased for $\tau_V$. Yet the variance (and more generally the conditional expected loss of such a weighted estimator) is now

$$E\left[ \left( \hat\tau(\mathbf{U}, \mathbf{V}, \mathbf{Y}, \mathbf{M}) - \tau_V \right)^2 \,\middle|\, \mathbf{V} \right] = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} V_t \Bigg( M_{i0t} + \left( M_{iit} - \frac{1}{N} \right) Y_{it}(1) - \sum_{j=1}^{N} (1 - U_j)\,\frac{1}{N}\, Y_{jt}(1) + \frac{1}{N}\, Y_{it}(0) + \sum_{j=1}^{N} (1 - U_j)\left( M_{ijt} + \frac{1}{N} \right) Y_{jt}(0) \Bigg)^2 \tag{5.2}$$

which depends on treated and untreated potential outcomes. This creates two related challenges. First, since the expression depends on treated outcomes, there is no immediate sample analogue available that corresponds to minimizing expected error, even under time randomization. Second, the variance cannot generally be estimated without bias, since it depends not only on the variation of the $Y_{it}(0)$ (which can be estimated), but also on the variation of the $Y_{it}(1)$ and their covariance with the $Y_{it}(0)$ (neither of which is identified from the data).

We briefly discuss two assumptions on the (non-stochastic) correlation of treatment and control outcomes, and what they imply for estimation. First, if treatment effects are constant within time period (and treatment and control potential outcomes thus perfectly correlated, $Y_{it}(1) - Y_{it}(0) = \tau$), then (5.2) becomes

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} V_t\left( M_{i0t} + \sum_{j=1}^{N} M_{ijt}\, Y_{jt}(0) \right)^2$$

as before, suggesting the MUSC estimator. If, on the other hand, treated outcomes are uncorrelated to control outcomes, and we focus on unbiased estimators (so in particular $M_{iit} = 1$), then (5.2) becomes

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} V_t\left( M_{i0t} + \frac{1}{N}\sum_{j=1}^{N} Y_{jt}(0) + \sum_{j=1}^{N} (1 - U_j)\, M_{ijt}\, Y_{jt}(0) \right)^2 + \text{const.},$$

suggesting an alternative MUSC-type estimator that minimizes the sample analogue in non-treated time periods over weights $\mathcal{M}^{MUSC}$, which could effectively shrink the MUSC weights on control units towards the DiM weights $-\frac{1}{N-1}$. (Indeed, one feasible set of weights would yield the estimator $\hat\tau = \frac{1}{N}\hat\tau^{MUSC} + \frac{N-1}{N}\hat\tau^{DiM}$ of $\tau_V = \frac{1}{N}\tau_T + \frac{N-1}{N}\tau_C$, corresponding to control weights $\frac{1}{N} M^{MUSC}_{ijt} - \frac{1}{N}$. This solution would likely be suboptimal because it enforces $M_{ijt} \leq -\frac{1}{N}$ for control weights.)

Throughout this article, we have assumed that treatment is assigned with equal probability across units, time periods, or unit–time pairs. Yet the theory we develop generalizes to non-constant propensity scores. For example, assume that treatment is assigned randomly to unit $i$ with probability $p_i$ (or, similarly, to a time period or a unit–time pair), where $\sum_{i=1}^{N} p_i = 1$. A natural analogue of the MUSC estimator is then

$$\mathbf{M}^{MUSC}_p(\mathbf{Y}, \mathcal{M}^{MUSC}_p) \equiv \arg\min_{\mathbf{M} \in \mathcal{M}^{MUSC}_p} \sum_{i=1}^{N} p_i \sum_{t=1}^{T}\sum_{s \neq t}\left( M_{i0t} + \sum_{j=1}^{N} M_{ijt}\, Y_{js} \right)^2 \tag{5.3}$$

where unbiasedness is guaranteed by the constraints

$$\mathcal{M}^{MUSC}_p = \left\{ \mathbf{M} \,\middle|\, \sum_{j=1}^{N} M_{ijt} = 0\ \forall i, t,\ \ \sum_{i=1}^{N} p_i\, M_{ijt} = 0\ \forall j, t \right\}.$$

Such an estimator could be used when treatment is assigned randomly. When the analyst has a choice over the treatment assignment, and $t = T$, the optimization in (5.3) could also include the choice of propensity score.

In this article we study Synthetic Control methods from a design perspective. We show that when a randomized experiment is conducted, the standard SC estimator is biased. However, a minor modification of the SC estimator is unbiased under randomization, and in cases with few treated units can have RMSE properties superior to those of the standard Difference-in-Means estimator. We show that the design perspective also has implications for observational studies. We propose a variance estimator that is validated by randomization.

References

Abadie, A. (2019). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature.

Abadie, A., Athey, S., Imbens, G. W., and Wooldridge, J. M. (2020). Sampling-based versus design-based uncertainty in regression analysis. Econometrica, 88(1):265–296.

Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493–505.

Abadie, A., Diamond, A., and Hainmueller, J. (2015). Comparative politics and the synthetic control method. American Journal of Political Science, pages 495–510.

Abadie, A. and Gardeazabal, J. (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, 93(1):113–132.

Abadie, A. and Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica, 74(1):235–267.

Abadie, A. and L'Hour, J. (2017). A penalized synthetic control estimator for disaggregated data. Work. Pap., Mass. Inst. Technol., Cambridge, MA.

Amjad, M., Shah, D., and Shen, D. (2018). Robust synthetic control. The Journal of Machine Learning Research, 19(1):802–852.

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2019). Synthetic difference in differences. Technical report, National Bureau of Economic Research.

Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. (2017). Matrix completion methods for causal panel data models. arXiv preprint arXiv:1710.10251.

Athey, S. and Imbens, G. W. (2018). Design-based analysis in difference-in-differences settings with staggered adoption. Technical report, National Bureau of Economic Research.

Ben-Michael, E., Feller, A., and Rothstein, J. (2020). The augmented synthetic control method. arXiv preprint arXiv:1811.04170.

Bertrand, M., Duflo, E., and Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1):249–275.

Chernozhukov, V., Wuthrich, K., and Zhu, Y. (2017). An exact and robust conformal inference method for counterfactual and synthetic controls. arXiv preprint arXiv:1712.09089.

Cunningham, S. (2018). Causal inference: The mixtape. Yale University Press.

Doudchenko, N. and Imbens, G. W. (2016). Balancing, regression, difference-in-differences and synthetic control methods: A synthesis. Technical report, National Bureau of Economic Research.

Ferman, B. and Pinto, C. (2017). Placebo tests for synthetic controls.

Ferman, B. and Pinto, C. (2019). Synthetic controls with imperfect pre-treatment fit. arXiv preprint arXiv:1911.08521.

Fisher, R. A. (1937). The design of experiments. Oliver and Boyd; Edinburgh; London.

Freedman, D. A. (2008). On regression adjustments in experiments with several treatments. The Annals of Applied Statistics, 2(1):176–196.

Hahn, J. and Shi, R. (2016). Synthetic control and inference. Available at UCLA.

Imbens, G. W. and Rubin, D. B. (2015). Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.

Lei, L. and Candès, E. J. (2020). Conformal inference of counterfactuals and individual treatment effects. arXiv preprint arXiv:2006.06138.

Li, K. T. (2020). Statistical inference for average treatment effects estimated by synthetic control methods. Journal of the American Statistical Association, 115(532):2068–2083.

Lin, W. (2013). Agnostic notes on regression adjustments for experimental data: Reexamining Freedman's critique. The Annals of Applied Statistics, 7(1):295–318.

Neyman, J. (1923/1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4):465–472.

Rambachan, A. and Roth, J. (2020). Design-Based Uncertainty for Quasi-Experiments. arXiv preprint arXiv:2008.00602.

Rosenbaum, P. R. (2002). Observational Studies. Springer.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688.

Sekhon, J. S. and Shem-Tov, Y. (2017). Inference on a new class of sample average treatment effects. arXiv preprint arXiv:1708.02140.

Xu, Y. (2017). Generalized synthetic control method: Causal inference with interactive fixed effects models. Political Analysis, 25(1):57–76.

Appendix

Proof of Proposition 1.

We first consider the case without intercept. As a preliminary calculation, note that for $k, j, j' \in \{1, \ldots, N\}$

$$\sum_{i=1}^{N}\sum_{k, j, j' \neq i} \frac{1}{N - |\{k, j, j'\}|}\, a_{kjj'} = \sum_{k, j, j'} a_{kjj'}, \tag{A.1}$$

since every term $a_{kjj'}$ appears $N - |\{k, j, j'\}|$ times in the sum on the left. Let now

$$a_{kjj'} = M_{kj}\,(Y_j(0) - Y_k(0)) \cdot M_{kj'}\,(Y_{j'}(0) - Y_k(0)),$$

where for simplicity we fix the period $t$, drop all time indices to set $M_{ij} = M_{ijt}$, and write $\hat{\mathbb{V}}_i$ for the variance estimator when $U_i = 1$. Then $a_{kjj'} = 0$ for $k \in \{j, j'\}$ and thus

$$\frac{1}{N}\sum_{i=1}^{N} \underbrace{\Bigg[ \frac{1}{N-3}\sum_{k \neq i}\Bigg( \sum_{j \neq i} M_{kj}\,(Y_j(0) - Y_k(0)) \Bigg)^2 - \frac{1}{(N-2)(N-3)}\sum_{k, j \neq i} M_{kj}^2\,(Y_j(0) - Y_k(0))^2 \Bigg]}_{= \hat{\mathbb{V}}_i}$$

$$= \frac{1}{N}\sum_{i=1}^{N}\Bigg( \sum_{k, j, j' \neq i} \frac{1}{N-3}\, a_{kjj'} - \sum_{\substack{k, j, j' \neq i \\ j = j'}} \underbrace{\frac{1}{(N-2)(N-3)}}_{= \frac{1}{N-3} - \frac{1}{N-2}}\, a_{kjj'} \Bigg)$$

$$= \frac{1}{N}\sum_{i=1}^{N}\Bigg( \sum_{\substack{k, j, j' \neq i \\ |\{k, j, j'\}| = 3}} \frac{1}{N-3}\, a_{kjj'} + \sum_{\substack{k, j, j' \neq i \\ |\{k, j, j'\}| = 2}} \frac{1}{N-2}\, \underbrace{a_{kjj'}}_{= 0 \text{ for } j \neq j'} + \sum_{\substack{k, j, j' \neq i \\ |\{k, j, j'\}| = 1}} \frac{1}{N-1}\, \underbrace{a_{kjj'}}_{= 0} \Bigg)$$

$$= \frac{1}{N}\sum_{i=1}^{N}\sum_{k, j, j' \neq i} \frac{1}{N - |\{k, j, j'\}|}\, a_{kjj'} \overset{\text{(A.1)}}{=} \frac{1}{N}\sum_{k, j, j'} a_{kjj'} = \frac{1}{N}\sum_{i=1}^{N}\Bigg( \sum_{j=1}^{N} M_{ij}\,(Y_j(0) - Y_i(0)) \Bigg)^2 = \frac{1}{N}\sum_{i=1}^{N}\Bigg( \sum_{j=1}^{N} M_{ij}\, Y_j(0) \Bigg)^2 = \mathbb{V}.$$

Here, we have used that $\sum_{j=1}^{N} M_{ij} = 0$.

With an intercept we note that

$$\mathbb{V} = \frac{1}{N}\sum_{i=1}^{N}\Bigg( M_{i0} + \sum_{j=1}^{N} M_{ij}\, Y_j(0) \Bigg)^2 = \frac{1}{N}\sum_{i=1}^{N}\Bigg( M_{i0} + \sum_{j=1}^{N} M_{ij}\,(Y_j(0) - Y_i(0)) \Bigg)^2$$

$$= \frac{1}{N}\sum_{i=1}^{N} M_{i0}^2 + \frac{2}{N}\sum_{i=1}^{N} M_{i0}\Bigg( \sum_{j=1}^{N} M_{ij}\,(Y_j(0) - Y_i(0)) \Bigg) + \frac{1}{N}\sum_{i=1}^{N}\Bigg( \sum_{j=1}^{N} M_{ij}\,(Y_j(0) - Y_i(0)) \Bigg)^2,$$

where

$$\frac{2}{N-2}\sum_{k \neq i} M_{k0}\Bigg( \sum_{j \neq i} M_{kj}\,(Y_j(0) - Y_k(0)) \Bigg)$$

is unbiased for the middle term, using that $M_{kj}\,(Y_j(0) - Y_k(0)) = 0$ for $k = j$. It follows that

$$\hat{\mathbb{V}}_i = \frac{1}{N-3}\sum_{k \neq i}\Bigg( \sum_{j \neq i} M_{kj}\,(Y_j(0) - Y_k(0)) \Bigg)^2 - \frac{1}{(N-2)(N-3)}\sum_{k, j \neq i} M_{kj}^2\,(Y_j(0) - Y_k(0))^2 + \frac{2}{N-2}\sum_{k \neq i} M_{k0}\Bigg( \sum_{j \neq i} M_{kj}\,(Y_j(0) - Y_k(0)) \Bigg) + \frac{1}{N}\sum_{k} M_{k0}^2$$

is an unbiased estimator of the conditional variance $\mathbb{V}$.

Proof of the variance expression in Lemma 5.

This proof generalized the proof of Proposition 1 above.Speciﬁcally, for [ N ] = { , . . . , N } , (cid:88) k ⊆ [ N ]; | k | = N T (cid:88) i ⊆ [ N ] \ k ; | i | = N T (cid:88) j,j (cid:48) ∈ [ N ] \ k ∪{ } (cid:0) | [ N ] \ ( k ∪ [ { j,j (cid:48) } ) | N T (cid:1) a i,j,j (cid:48) = (cid:88) k ⊆{ ,...,N } ; | k | = N T (cid:88) j,j (cid:48) ∈ [ N ] ∪{ } a k,j,j (cid:48) (A.2)for a conformal tensor a .For ﬁxed t as above consider weights M kj indexed by k ⊆ [ N ] with | k | = N T and j ∈ [ N ] ∪