Design-Based Uncertainty for Quasi-Experiments

Abstract

Social scientists are often interested in estimating causal effects in settings where all units in the population are observed (e.g. all 50 US states). Design-based approaches, which view the treatment as the random object of interest, may be more appealing than standard sampling-based approaches in such contexts. This paper develops a design-based theory of uncertainty suitable for quasi-experimental settings, in which the researcher estimates the treatment effect as if treatment was randomly assigned, but in reality treatment probabilities may depend in unknown ways on the potential outcomes. We first study the properties of the simple difference-in-means (SDIM) estimator. The SDIM is unbiased for a finite-population design-based analog to the average treatment effect on the treated (ATT) if treatment probabilities are uncorrelated with the potential outcomes in a finite population sense. We further derive expressions for the variance of the SDIM estimator and a central limit theorem under sequences of finite populations with growing sample size. We then show how our results can be applied to analyze the distribution and estimand of difference-in-differences (DiD) and two-stage least squares (2SLS) from a design-based perspective when treatment is not completely randomly assigned.


Ashesh Rambachan †, Jonathan Roth ‡

August 6, 2020 (arXiv, econ.EM)

∗ We thank Isaiah Andrews, Iavor Bojinov, Peng Ding, Pedro Sant'Anna, Yotam Shem-Tov, and Neil Shephard for helpful comments and suggestions. Rambachan gratefully acknowledges support from the NSF Graduate Research Fellowship under Grant DGE1745303.
† Harvard University, Department of Economics.
‡ Microsoft Research.

Introduction

Standard econometric analyses of causal effects typically view the data obtained by the econometrician as a random sample from a larger superpopulation. This sampling-based view may be unnatural in economic contexts where the entire population of interest is observed. For example, applied researchers are often interested in the causal effect of state-level policies when outcomes for all 50 US states are observed (Manski and Pepper, 2018). Similar difficulties arise when the researcher has access to large-scale administrative data for the entire population of interest. In these settings, it may be more attractive to view uncertainty as purely design-based, i.e. arising due to the stochastic nature of the treatment assignment for a finite population. A celebrated literature in statistics, dating to at least Neyman (1923) and Fisher (1935), has analyzed randomized experiments from such a design-based perspective. This finite population view has received recent attention in the econometrics literature, e.g. from Abadie et al. (2017, 2020).

However, there remains a gap between the typical assumptions used in existing finite population causal analyses and many leading empirical settings in which a finite population perspective is conceptually attractive. Typically, finite population analyses of causal effects assume that the observable data were generated from a randomized experiment, in which the treatment is randomly assigned to units through an assignment mechanism with known probabilities (e.g., Imbens and Rubin (2015), Aronow and Middleton (2015), Middleton (2018), Savje and Delevoye (2020) among others). In contrast, social scientists often employ "quasi-experimental" methods, in which the data is analyzed as if treatment were randomly assigned, but random assignment is not guaranteed by design. The probability of treatment assignment is therefore not known to the researcher.
In such settings, it is desirable to understand the properties of quasi-experimental estimators if in fact the data-generating process differs from random assignment.

Existing analyses of quasi-experimental estimators, such as simple difference-in-means (SDIM), difference-in-differences (DiD), and two-stage least squares (2SLS), often adopt a sampling-based view and consider the limiting distribution of the estimator in settings where treatment is not independent of potential outcomes. It is typically possible to obtain asymptotically valid causal estimation and inference under orthogonality conditions that are weaker than strict independence between the treatment (or instrument) and potential outcomes. However, the interpretation of the causal estimand differs under these weaker assumptions: for example, it may be an average treatment effect on the treated (ATT) or a local average treatment effect (LATE), rather than an average treatment effect (ATE). Given the attractiveness of the design-based approach for many quasi-experimental settings, it is useful to understand from the design-based perspective whether it is possible to obtain valid inference on an interpretable causal parameter when randomization fails.

To bridge these gaps, we study the estimation and inference of treatment effects in a finite population setting where the probability of treatment assignment varies arbitrarily across units. We analyze a treatment assignment mechanism that allows each unit to have an idiosyncratic probability $p_i$ of receiving a binary treatment. The idiosyncratic probability $p_i$ may depend arbitrarily on $i$'s potential outcomes $(Y_i(1), Y_i(0))$. In this sense, our model allows for the possibility that the "quasi-experimental" research design may not, in fact, mimic random assignment.
We study the properties of three popular quasi-experimental estimators (SDIM, DiD, and 2SLS) under this assignment mechanism from a purely design-based perspective.

We begin with an analysis of the simple difference-in-means estimator (SDIM) in Section 3. We first establish a finite-population analog to the omitted variable bias formula, which decomposes the expectation of the SDIM into two terms: (i) a finite-population design-based analog to the average treatment effect on the treated (ATT), and (ii) a bias term equal to the finite-population covariance between the unit-specific treatment probabilities and their untreated potential outcomes. We then derive the finite population asymptotic distribution of the SDIM as the size of the population grows large. We derive intuitive formulas for the asymptotic variance of the SDIM statistic, as well as a central limit theorem under appropriate regularity conditions. As in the standard completely randomized experiment, the usual variance estimate is consistent for an upper bound on the variance of the estimator. An interesting feature of our setting is that the standard variance estimator may be conservative even under constant treatment effects if treatment probabilities differ across units. Thus, standard confidence intervals deliver asymptotically conservative inference for the finite-population ATT when the unit-specific treatment probabilities are orthogonal to the potential outcomes.

In Section 4, we extend the results for the SDIM to difference-in-differences (DiD). We show that the DiD estimator is unbiased for the finite population ATT under a finite-population analogue to the well-known "parallel trends" assumption in the sampling-based literature (e.g., see Chapter 5 of Angrist and Pischke (2009)). Our results thus help bridge the gap between the sampling-based literature on DiD and recent work by Athey and Imbens (2018), who study DiD from a design-based perspective but assume completely random treatment timing.
As with the SDIM, we show that widely used cluster-robust standard errors [...]

Footnote: Concretely, we analyze the asymptotic distribution of the SDIM along a sequence of finite populations in which both the size of the population and the number of treated units grows large. Similar finite population asymptotics have been considered in the context of randomized experiments (Li and Ding, 2017; Abadie et al., 2017, 2020).

[...] $Z_i$ and binary treatment $D_i$. The stochastic nature of the data now arises due to the assignment of the instrument $Z_i$, holding fixed the potential outcomes $Y(d)$ and the potential treatments $D(z)$, as in Kang et al. (2018). We provide an intuitive expression for the estimand of 2SLS allowing for an arbitrary relationship between the probability that $Z_i = 1$ and the potential outcomes. Our results thus provide a bridge between recent work by Kang et al. (2018), who study instrumental variables models from a design-based perspective in which the instrument is completely randomly assigned, and sampling-based models of sensitivity analysis for IV (e.g. Conley et al. (2010)). When the instrument is completely random, our expression reduces to the well-known result that the estimand of 2SLS is a local average treatment effect (LATE) (Angrist and Imbens, 1994; Angrist et al., 1996). We generalize this result, showing that the 2SLS estimand also has an interesting causal interpretation from a design-based perspective under the weaker condition that the probability that $Z_i = 1$ has zero finite population covariance with both $D_i(0)$ and $Y_i(D_i(0))$. Under this condition, the 2SLS estimand is a weighted average of the causal effects for compliers, where the weights are equal to the unit-specific probabilities of receiving $Z_i = 1$. This parameter can be interpreted as an instrument-propensity reweighted local average treatment effect. As with the previously discussed estimators, standard inference methods yield asymptotically conservative inference for this estimand under "strong instrument" asymptotics.

Consider a finite population of $N$ units. Let $D_i$ denote a binary indicator for whether unit $i$ adopts a treatment of interest. Units are associated with potential outcomes $Y_i(1)$, $Y_i(0)$, under treatment and control respectively, and the observed outcome equals $Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$. Throughout the paper, the potential outcomes are treated as fixed (or conditioned on), and the stochastic nature of the data arises only due to the random assignment or adoption of treatment.

Each unit independently adopts the treatment with idiosyncratic probability $p_i$. We allow for $p_i$ to be arbitrarily related to the potential outcomes with $p_i = g(Y_i(1), Y_i(0), W_i)$, where $g$ is an unknown link function that maps $(Y_i(1), Y_i(0))$ and some other (possibly unobserved) $i$-level pre-treatment covariates $W_i$ into the unit interval. Since the researcher neither observes the pair of potential outcomes nor knows the link function $g$, the unit-specific treatment probabilities $p_i$ are unknown to the researcher. For example, such unit-specific treatment probabilities may arise if units decide whether to adopt the treatment based on a choice model in which each unit's adoption decision depends on its potential outcomes, pre-treatment covariates and idiosyncratic taste or information shocks $\nu_i$ (e.g., see Heckman and Vytlacil (2006) among many others). In this view, the randomness in treatment adoption in our model arises from the randomness in the idiosyncratic shocks $\nu_i$ conditional on the potential outcomes and pre-treatment covariates.

Example 1.

The Tax Cuts and Jobs Act of 2017 allowed for US census tracts meeting certain criteria to receive tax benefits if they were designated by the governor of their state as "Opportunity Zones." Suppose we are interested in the effect of an eligible census tract being designated as an Opportunity Zone ($D$) on housing price growth ($Y$), as in Chen et al. (2019). Since housing price growth is observed for all eligible census tracts, it is attractive to think of the randomness in the data as coming from the choice of which tracts to designate as Opportunity Zones, rather than from drawing the observed sample from a superpopulation of census tracts. Owing to the vagaries of the political process, it is plausible that the choice of which of the eligible census tracts to designate as Opportunity Zones is as-if randomly assigned. For instance, the choice of which tracts to designate may depend on arbitrary factors such as the order in which briefings about tracts were presented ($\nu_i$) that are unrelated to the potential outcomes. It therefore may be sensible to estimate the causal effect of the policy by comparing outcomes for designated and non-designated census tracts as if it were a randomized experiment. Nevertheless, we may still worry that, in addition to the aforementioned idiosyncratic factors, the probability a particular tract is designated as an Opportunity Zone depends on the benefit of treatment ($Y_i(1) - Y_i(0)$) and other fixed features of the tract such as its partisan lean ($W_i$). It is therefore instructive to analyze the properties of quasi-experimental estimators if we view the uncertainty in the data as coming from the idiosyncratic factors $\nu_i$ but allow the probability of treatment to depend arbitrarily on the other fixed factors that affect treatment choice, $p_i = g(Y_i(1), Y_i(0), W_i)$.

Following the literature on completely randomized experiments (e.g. Imbens and Rubin (2015)), we condition on the number of treatment and control units, $N_1 := \sum_i D_i$ and $N_0 := N - N_1$ respectively. It is straightforward to derive the distribution of treatment assignments $D = (D_1, ..., D_N)$ conditional on $N_1$ and $N_0$:

$$P\left(D = d \,\Big|\, \sum_i D_i = N_1\right) = C \prod_i p_i^{d_i} (1 - p_i)^{1 - d_i} \tag{1}$$

for all $d \in \{0, 1\}^N$ such that $\sum_i d_i = N_1$, and zero otherwise. We refer to this as a Poisson rejective assignment mechanism, since it parallels what Hajek (1964) refers to as Poisson rejective sampling, in which units are sampled from a finite population only if $D_i = 1$ and $D$ has the distribution given in (1).

Footnote: This follows from the fact that $P(D = d \mid \sum_i D_i = N_1) = P(D = d \wedge \sum_i D_i = N_1) / P(\sum_i D_i = N_1)$.

As notation, define the marginal assignment probability as $\pi_i := P(D_i = 1 \mid \sum_i D_i = N_1)$. Additionally, for non-stochastic weights $w_i$ and a non-stochastic attribute $X_i$ (such as a potential outcome), define

$$E_w[X_i] := \frac{1}{\sum_i w_i} \sum_i w_i X_i \quad \text{and} \quad \mathrm{Var}_w[X_i] := \frac{1}{\sum_i w_i} \sum_i w_i (X_i - E_w[X_i])^2$$

to be the finite-population weighted expectation and variance respectively. Analogously, define $\mathrm{Cov}_w[X_i, Y_i] = E_w[(X_i - E_w[X_i])(Y_i - E_w[Y_i])]$. A subscript 1 denotes uniform weights $w_i = 1$. We denote by $E_R[\cdot] = E[\cdot \mid \sum_i D_i = N_1]$ the expectation with respect to the randomization distribution for the treatment assignment $D$, conditional on the number of treated units. The operators $V_R[\cdot]$ and $\mathrm{Cov}_R[\cdot, \cdot]$ are defined analogously as the variance and covariance respectively over the randomization distribution for the treatment assignment $D$, conditional on the number of treated units.

We begin by analyzing the properties of the simple difference in means (SDIM) estimator,

$$\hat\tau := \frac{1}{N_1} \sum_i D_i Y_i - \frac{1}{N_0} \sum_i (1 - D_i) Y_i. \tag{2}$$

Our results are thus relevant for quasi-experimental settings where the researcher compares the treated and untreated units as if they were randomly assigned, but may be concerned that in fact treatment probabilities were related to potential outcomes.

We first turn our attention to the expectation of $\hat\tau$ under the treatment assignment mechanism (1). Observe that

$$E_R[\hat\tau] = \frac{1}{N_1} \sum_i \pi_i \underbrace{(Y_i(0) + \tau_i)}_{= Y_i(1)} - \frac{1}{N_0} \sum_i (1 - \pi_i) Y_i(0) = \underbrace{\frac{1}{N_1} \sum_i \pi_i \tau_i}_{= \tau_{ATT}} + \frac{N}{N_0} \frac{N}{N_1} \underbrace{\left( \frac{1}{N} \sum_i \left( \pi_i - \frac{N_1}{N} \right) Y_i(0) \right)}_{= \mathrm{Cov}_1[\pi_i, Y_i(0)]}, \tag{3}$$

where $\tau_i = Y_i(1) - Y_i(0)$ is unit $i$'s causal effect.
The first term in the previous display is a weighted average of the unit-specific causal effects, where the weights are proportional to the unit-specific treatment probabilities. We interpret this object as a finite-population analogue to the average treatment effect on the treated since

$$\frac{1}{N_1} \sum_i \pi_i \tau_i = E_R\left[ \frac{1}{N_1} \sum_i D_i \tau_i \right] =: \tau_{ATT}. \tag{4}$$

$\tau_{ATT}$ is the expected value of what Imbens (2004) and Sekhon and Shem-Tov (2020) refer to as the sample average treatment effect on the treated (SATT), where the expectation is taken over the stochastic realization of which units are treated. The second term in (3) is the SDIM's bias for $\tau_{ATT}$ and equals a constant times the finite-population covariance between the treatment probabilities $\pi_i$ and the untreated potential outcomes $Y_i(0)$. The bias is zero if all units are treated with the same probability (i.e. $\pi_i = N_1/N$ for all $i$), and furthermore under this condition $\tau_{ATT}$ reduces to the average treatment effect.

This characterization of the bias of the SDIM estimator suggests that researchers may conduct sensitivity analysis under different assumptions about the finite-population covariance between the treatment probabilities and the untreated potential outcomes, i.e., report the range of possible values for $\hat\tau - \frac{N}{N_0} \frac{N}{N_1} \mathrm{Cov}_1[\pi_i, Y_i(0)]$ under different assumptions about the possible magnitudes of $\mathrm{Cov}_1[\pi_i, Y_i(0)]$. Such a sensitivity analysis is related to, but different from, existing design-based sensitivity analyses developed in, for example, Rosenbaum (1987), Chapter 4 of Rosenbaum (2002), Rosenbaum (2005) among many others. The approach in those papers places bounds on the relative odds ratio of treatment between two units (i.e., $\frac{\pi_i (1 - \pi_j)}{\pi_j (1 - \pi_i)}$ for $i \neq j$) and examines the extent to which the relative odds ratio must vary across units such that we may no longer reject a particular sharp (Fisher) null of interest.
In contrast, we focus on examining how the bias of the SDIM estimator for a particular weighted average treatment effect varies with the finite population covariance between treatment probabilities and untreated potential outcomes.

Equation (3) may also be interpreted as a finite population version of the omitted variables bias formula for regression analyses. Defining the errors $\varepsilon^Y_i = Y_i(0) - E_{1-\pi}[Y_i(0)]$ and $\varepsilon^\tau_i = \tau_i - \tau_{ATT}$, we may rewrite the observed outcome for unit $i$ as

$$Y_i = \beta_0 + D_i \tau_{ATT} + u_i, \tag{5}$$

where $\beta_0 = E_{1-\pi}[Y_i(0)]$ and $u_i = \varepsilon^Y_i + D_i \varepsilon^\tau_i$. One can show that the expression derived above for $E_R[\hat\tau - \tau_{ATT}]$ is equivalent to $E_R\left[ \frac{\mathrm{Cov}_1[D_i, u_i]}{\mathrm{Var}_1[D_i]} \right]$, which in light of equation (5) coincides with the omitted variable bias formula for the coefficient on $D_i$ in an OLS regression of $Y_i$ on $D_i$.

We now turn our attention to the variance and distribution of $\hat\tau$. The exact finite-sample variance and distribution functions are complicated functions of the $p_i$, and we therefore rely on a triangular array asymptotic approximation using a sequence of finite populations where the number of units grows large, in the spirit of Freedman (2008b,a), Lin (2013), and Li and Ding (2017). We consider sequences of populations indexed by $m$ of size $N_m$, with $N_{1m}$ treated units, potential outcomes $\{Y_{im}(d) : d = 0, 1;\ i = 1, ..., N_m\}$, and assignment weights $p_{1m}, ..., p_{N_m m}$. For brevity, we leave the subscript $m$ implicit in our notation; all limits are implicitly taken as $m \to \infty$. Our results will provide an approximation to the properties of $\hat\tau$ for finite populations with a sufficiently large number of units.

To analyze its distribution, note that $\hat\tau$ may be re-written as

$$\hat\tau = \sum_i \frac{D_i}{\pi_i} \tilde{Y}_i - \frac{1}{N_0} \sum_i Y_i(0), \tag{6}$$

where $\tilde{Y}_i := \pi_i \left( \frac{1}{N_1} Y_i(1) + \frac{1}{N_0} Y_i(0) \right)$. The second term on the right-hand side of the previous display is non-stochastic.
The first term, on the other hand, can be viewed as a Horvitz-Thompson estimator for $\sum_{i=1}^N \tilde{Y}_i$ under what Hajek (1964) refers to as Poisson rejective sampling. We can therefore make use of results from Hajek (1964) to obtain its asymptotic distribution under a sequence of finite populations as described above.

To obtain the asymptotic variance of $\hat\tau$, we impose the following assumption on the sequence of populations.

Assumption 1.

The sequence of populations satisfies $\sum_{i=1}^N \pi_i (1 - \pi_i) \to \infty$.

Note that $\pi_i (1 - \pi_i)$ is the variance of the Bernoulli random variable $D_i$, so Assumption 1 implies that the sum of the variances of the $D_i$ grows large. Assumption 1 also implies that both $N_1$ and $N_0$ go to infinity, since $\sum_{i=1}^N \pi_i (1 - \pi_i) \le \min\{\sum_i \pi_i, \sum_i (1 - \pi_i)\} = \min\{N_1, N_0\}$. Note that Assumption 1 is trivially satisfied under the familiar overlap condition (i.e., $\pi_i \in [\eta, 1 - \eta]$ for some $\eta > 0$). However, overlap for all units is not necessary for Assumption 1 to hold, and indeed Assumption 1 allows for $\pi_i = 0$ or $\pi_i = 1$ for some units.

Lemma 3.1. Under Assumption 1,

$$V_R[\hat\tau] \, [1 + o(1)] = \frac{\frac{1}{N} \sum_{k=1}^N \pi_k (1 - \pi_k)}{\frac{N_1}{N} \frac{N_0}{N}} \left[ \frac{1}{N_1} \mathrm{Var}_{\tilde\pi}[Y_i(1)] + \frac{1}{N_0} \mathrm{Var}_{\tilde\pi}[Y_i(0)] - \frac{1}{N} \mathrm{Var}_{\tilde\pi}[\tau_i] \right], \tag{7}$$

where $o(1) \to 0$ and the weights are given by $\tilde\pi_i = \pi_i (1 - \pi_i)$.

Proof. Since $\hat\tau$ can be represented as a Horvitz-Thompson estimator under Poisson rejective sampling, Theorem 6.1 in Hajek (1964) implies that, with $\tilde{Y}_i$ as defined in (6),

$$V_R[\hat\tau] \, [1 + o(1)] = \left[ \sum_{k=1}^N \pi_k (1 - \pi_k) \right] \mathrm{Var}_{\tilde\pi}\left[ \tilde{Y}_i / \pi_i \right]. \tag{8}$$

Standard decomposition arguments for completely randomized experiments (e.g. Imbens and Rubin (2015)), modified to replace unweighted variances with weighted variances, yield that

$$\mathrm{Var}_{\tilde\pi}\left[ \tilde{Y}_i / \pi_i \right] = \frac{N}{N_1 N_0} \left( \frac{1}{N_1} \mathrm{Var}_{\tilde\pi}[Y_i(1)] + \frac{1}{N_0} \mathrm{Var}_{\tilde\pi}[Y_i(0)] - \frac{1}{N} \mathrm{Var}_{\tilde\pi}[\tau_i] \right),$$

which together with the previous display yields the desired result.

Lemma 3.1 shows that the asymptotic variance of $\hat\tau$ depends on the weighted variance of the treated and untreated potential outcomes and treatment effects, where unit $i$ is weighted proportionally to the variance of their treatment status $V_R[D_i] = \pi_i (1 - \pi_i)$. The leading constant term is less than or equal to one by Jensen's inequality, with equality when $\pi_i$ is constant across units.
Thus, in the special case of a completely random experiment, the formula in Lemma 3.1 reduces to $(1 + o(1)) \left( \frac{1}{N_1} \mathrm{Var}_1[Y_i(1)] + \frac{1}{N_0} \mathrm{Var}_1[Y_i(0)] - \frac{1}{N} \mathrm{Var}_1[\tau_i] \right)$, which mimics the familiar formula for completely randomized experiments up to a degrees-of-freedom correction.

We next provide an upper bound for the asymptotic variance derived in Lemma 3.1. We will later provide regularity conditions under which the standard variance estimator is asymptotically consistent for this upper bound.
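The assignment mechanism (1) and the decomposition (3) can be checked numerically on a small population. The following is a minimal sketch, not part of the original analysis: it enumerates every assignment with $N_1$ treated units, so it is only practical for very small $N$. The population values and the function names (`rejective_distribution`, `sdim`, `exact_moments`, `decomposition`) are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import prod


def rejective_distribution(p, n1):
    """Exact assignment distribution (1): P(D = d | sum_i D_i = n1).

    Each assignment is represented by the tuple of treated unit indices.
    """
    n = len(p)
    weights = {}
    for treated in combinations(range(n), n1):
        weights[treated] = prod(p[i] if i in treated else 1 - p[i] for i in range(n))
    total = sum(weights.values())  # the normalizing constant C in (1) is 1 / total
    return {treated: w / total for treated, w in weights.items()}


def sdim(treated, y1, y0):
    """Simple difference in means (2) for one realized set of treated units."""
    n = len(y1)
    n1 = len(treated)
    treated_mean = sum(y1[i] for i in treated) / n1
    control_mean = sum(y0[i] for i in range(n) if i not in treated) / (n - n1)
    return treated_mean - control_mean


def exact_moments(p, y1, y0, n1):
    """Marginal probabilities pi_i and the exact mean and variance of the SDIM."""
    n = len(p)
    dist = rejective_distribution(p, n1)
    pi = [sum(pr for d, pr in dist.items() if i in d) for i in range(n)]
    mean = sum(pr * sdim(d, y1, y0) for d, pr in dist.items())
    var = sum(pr * (sdim(d, y1, y0) - mean) ** 2 for d, pr in dist.items())
    return pi, mean, var


def decomposition(pi, y1, y0, n1):
    """The two terms on the right-hand side of (3): tau_ATT and the bias."""
    n = len(pi)
    n0 = n - n1
    tau_att = sum(pi[i] * (y1[i] - y0[i]) for i in range(n)) / n1
    cov = sum((pi[i] - n1 / n) * y0[i] for i in range(n)) / n
    return tau_att, (n / n0) * (n / n1) * cov


# Hypothetical population of N = 6 units with N_1 = 3 treated.
p = [0.2, 0.3, 0.5, 0.6, 0.7, 0.8]
y0 = [1.0, 2.0, 0.5, 3.0, 2.5, 4.0]
y1 = [2.0, 2.5, 2.5, 3.5, 4.5, 4.0]

pi, mean, var = exact_moments(p, y1, y0, n1=3)
tau_att, bias = decomposition(pi, y1, y0, n1=3)
print(f"E_R[tau_hat] = {mean:.6f}, tau_ATT + bias = {tau_att + bias:.6f}")
print(f"exact V_R[tau_hat] = {var:.6f}")
```

Because Lemma 3.1 is an asymptotic statement, the exact variance printed here need not match the right-hand side of (7) for a population this small; the sketch is meant to illustrate the objects involved rather than to validate the approximation.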

Lemma 3.2.

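Under Assumption 1, the right-hand side of (7) is bounded above by

$$\frac{1}{N_1} \mathrm{Var}_\pi[Y_i(1)] + \frac{1}{N_0} \mathrm{Var}_{1-\pi}[Y_i(0)], \tag{9}$$

and the bound holds with equality if and only if

$$E_{\tilde\pi}\left[ \frac{1}{N_1} Y_i(1) + \frac{1}{N_0} Y_i(0) \right] = \frac{1}{N_1} E_\pi[Y_i(1)] + \frac{1}{N_0} E_{1-\pi}[Y_i(0)]$$

and

$$\frac{\pi_i}{N_1 / N} Y_i(1) - \frac{1 - \pi_i}{N_0 / N} Y_i(0) = \frac{\pi_i}{N_1 / N} E_\pi[Y_i(1)] - \frac{1 - \pi_i}{N_0 / N} E_{1-\pi}[Y_i(0)] \quad \text{for all } i.$$

Footnote (on the degrees-of-freedom correction in the special case of Lemma 3.1): The $1 + o(1)$ correction is needed because $\mathrm{Var}_1[Y_i(d)] = \frac{1}{N} \sum_i (Y_i(d) - E_1[Y_i(d)])^2$, which differs from the usual finite population variance by the degrees-of-freedom correction factor $\frac{N}{N-1}$.

Proof.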

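From (8), we see that the right-hand side of (7) is equivalent to

$$\sum_{i=1}^N \pi_i (1 - \pi_i) \left( \frac{1}{N_1} Y_i(1) + \frac{1}{N_0} Y_i(0) - E_{\tilde\pi}\left[ \frac{1}{N_1} Y_i(1) + \frac{1}{N_0} Y_i(0) \right] \right)^2.$$

Since for any $X$, $E_{\tilde\pi}[X] = \arg\min_\mu \sum_{i=1}^N \pi_i (1 - \pi_i) (X_i - \mu)^2$, it follows that this is bounded above by

$$\sum_{i=1}^N \pi_i (1 - \pi_i) \left( \frac{1}{N_1} Y_i(1) + \frac{1}{N_0} Y_i(0) - \left( E_\pi\left[ \frac{1}{N_1} Y_i(1) \right] + E_{1-\pi}\left[ \frac{1}{N_0} Y_i(0) \right] \right) \right)^2, \tag{10}$$

and the bound holds with equality if and only if

$$E_{\tilde\pi}\left[ \frac{1}{N_1} Y_i(1) + \frac{1}{N_0} Y_i(0) \right] = \frac{1}{N_1} E_\pi[Y_i(1)] + \frac{1}{N_0} E_{1-\pi}[Y_i(0)].$$

Let $\dot{Y}_i(1) = Y_i(1) - E_\pi[Y_i(1)]$ and $\dot{Y}_i(0) = Y_i(0) - E_{1-\pi}[Y_i(0)]$. Then the expression in (10) can be written as

$$\sum_{i=1}^N \pi_i (1 - \pi_i) \left( \frac{1}{N_1} \dot{Y}_i(1) + \frac{1}{N_0} \dot{Y}_i(0) \right)^2 = \left[ \frac{1}{N_1^2} \sum_{i=1}^N \pi_i \dot{Y}_i(1)^2 + \frac{1}{N_0^2} \sum_{i=1}^N (1 - \pi_i) \dot{Y}_i(0)^2 - \frac{1}{N_1^2} \sum_{i=1}^N \pi_i^2 \dot{Y}_i(1)^2 - \frac{1}{N_0^2} \sum_{i=1}^N (1 - \pi_i)^2 \dot{Y}_i(0)^2 + \frac{2}{N_1 N_0} \sum_{i=1}^N \pi_i (1 - \pi_i) \dot{Y}_i(1) \dot{Y}_i(0) \right]$$

$$= \left[ \frac{1}{N_1} \mathrm{Var}_\pi[Y_i(1)] + \frac{1}{N_0} \mathrm{Var}_{1-\pi}[Y_i(0)] - \frac{1}{N^2} \sum_{i=1}^N \left( \frac{\pi_i}{N_1 / N} \dot{Y}_i(1) - \frac{1 - \pi_i}{N_0 / N} \dot{Y}_i(0) \right)^2 \right],$$

from which the result is immediate.

Corollary 3.1.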

If treatment effects are constant, $Y_i(1) = \tau + Y_i(0)$ for all $i$, and $E_R[\hat\tau] = \tau$, then the bound in Lemma 3.2 holds with equality only if $\pi_i = \frac{N_1}{N}$ for all $i$ such that $Y_i(0) \neq E_{\tilde\pi}[Y_i(0)]$.

Proof. The two conditions for equality in Lemma 3.2 together with the assumption that $Y_i(1) = \tau + Y_i(0)$ imply that

$$E_R[\hat\tau] - \tau = N \left( \pi_i - \frac{N_1}{N} \right) \left( \frac{1}{N_1} + \frac{1}{N_0} \right) (Y_i(0) - E_{\tilde\pi}[Y_i(0)]) \quad \text{for all } i,$$

from which the result follows immediately.

We thus see that under constant treatment effects, if $\hat\tau$ is unbiased then the asymptotic variance of $\hat\tau$ will be strictly lower than the upper bound when treatment probabilities are not uniform (unless the treatment probabilities differ from uniformity only for a set of units for which $Y_i(0) = E_{\tilde\pi}[Y_i(0)]$).

Remark 1.

It is straightforward to show that if $\pi_i = \frac{N_1}{N}$ for all $i$, then the bound in Lemma 3.2 holds with equality if and only if treatment effects are constant, which is a standard result for completely randomized experiments. When $\pi_i \neq \frac{N_1}{N}$, Lemma 3.2 implies that the bound holds with equality only in knife-edge cases.

Next, we provide a regularity condition under which the standard variance estimator is consistent for the upper bound on the asymptotic variance of $\hat\tau$ given in (9). Let

$$\hat{s}^2 = \frac{1}{N_1} \hat{s}_1^2 + \frac{1}{N_0} \hat{s}_0^2,$$

where

$$\hat{s}_1^2 := \frac{1}{N_1} \sum_i D_i (Y_i - \bar{Y}_1)^2, \quad \hat{s}_0^2 := \frac{1}{N_0} \sum_i (1 - D_i) (Y_i - \bar{Y}_0)^2,$$

and $\bar{Y}_1 := \frac{1}{N_1} \sum_i D_i Y_i$, $\bar{Y}_0 := \frac{1}{N_0} \sum_i (1 - D_i) Y_i$.

The following assumption and consistency result generalize those in Li and Ding (2017) for the case of completely randomized assignment.

Assumption 2.

Define $m_N(1) := \max_{1 \le i \le N} (Y_i(1) - E_\pi[Y_i(1)])^2$, and analogously $m_N(0) := \max_{1 \le i \le N} (Y_i(0) - E_{1-\pi}[Y_i(0)])^2$. Assume that

$$\frac{1}{N_1} \frac{m_N(1)}{\mathrm{Var}_\pi[Y_i(1)]} \to 0 \quad \text{and} \quad \frac{1}{N_0} \frac{m_N(0)}{\mathrm{Var}_{1-\pi}[Y_i(0)]} \to 0.$$

Lemma 3.3.

Under Assumptions 1 and 2,

$$\frac{\hat{s}^2}{\frac{1}{N_1} \mathrm{Var}_\pi[Y_i(1)] + \frac{1}{N_0} \mathrm{Var}_{1-\pi}[Y_i(0)]} \xrightarrow{p} 1.$$

Proof. See Appendix.
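The relationship between the asymptotic variance in (7), the upper bound in (9), and the estimator $\hat{s}^2$ can be illustrated with a short computation. The sketch below is an illustration under stated assumptions, not the authors' code: the marginal probabilities `pi` are taken as given (hypothetical values that sum to $N_1$), and all function names are invented for this example.

```python
def weighted_mean(w, x):
    """Finite-population weighted expectation E_w[x]."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_var(w, x):
    """Finite-population weighted variance Var_w[x]."""
    m = weighted_mean(w, x)
    return sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sum(w)


def asymptotic_variance(pi, y1, y0):
    """Right-hand side of (7), using weights pi_i * (1 - pi_i)."""
    n = len(pi)
    n1 = sum(pi)  # the marginal probabilities sum to N_1
    n0 = n - n1
    w = [q * (1 - q) for q in pi]
    tau = [a - b for a, b in zip(y1, y0)]
    lead = (sum(w) / n) / ((n1 / n) * (n0 / n))
    inner = weighted_var(w, y1) / n1 + weighted_var(w, y0) / n0 - weighted_var(w, tau) / n
    return lead * inner


def upper_bound(pi, y1, y0):
    """Expression (9): the limit targeted by the standard variance estimator."""
    n1 = sum(pi)
    n0 = len(pi) - n1
    return weighted_var(pi, y1) / n1 + weighted_var([1 - q for q in pi], y0) / n0


def s2_hat(d, y):
    """Standard variance estimator s^2 = s_1^2 / N_1 + s_0^2 / N_0."""
    n1 = sum(d)
    n0 = len(d) - n1
    ybar1 = sum(di * yi for di, yi in zip(d, y)) / n1
    ybar0 = sum((1 - di) * yi for di, yi in zip(d, y)) / n0
    s1 = sum(di * (yi - ybar1) ** 2 for di, yi in zip(d, y)) / n1
    s0 = sum((1 - di) * (yi - ybar0) ** 2 for di, yi in zip(d, y)) / n0
    return s1 / n1 + s0 / n0


# Hypothetical marginal probabilities (summing to N_1 = 3) and potential outcomes.
pi = [0.2, 0.3, 0.5, 0.6, 0.6, 0.8]
y0 = [1.0, 2.0, 0.5, 3.0, 2.5, 4.0]
y1 = [2.0, 2.5, 2.5, 3.5, 4.5, 4.0]

av = asymptotic_variance(pi, y1, y0)
ub = upper_bound(pi, y1, y0)
print(f"right-hand side of (7): {av:.6f}; upper bound (9): {ub:.6f}")

# One realized assignment and the corresponding observed outcomes.
d = [0, 1, 0, 1, 0, 1]
y_obs = [a if di == 1 else b for di, a, b in zip(d, y1, y0)]
print(f"s^2 for this assignment: {s2_hat(d, y_obs):.6f}")
```

Comparing `av` and `ub` for different choices of `pi` gives a sense of how conservative the standard variance estimator may be when treatment probabilities are far from uniform.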

Finally, we introduce an assumption that allows us to obtain a central limit theorem for the SDIM $\hat\tau$.

Assumption 3.

Let $\tilde{Y}_i = \frac{1}{N_1} Y_i(1) + \frac{1}{N_0} Y_i(0)$, and assume $\sigma^2_{\tilde\pi} = \mathrm{Var}_{\tilde\pi}[\tilde{Y}_i] > 0$. Suppose that for all $\epsilon > 0$,

$$\frac{1}{\sigma^2_{\tilde\pi}} E_{\tilde\pi}\left[ \left( \tilde{Y}_i - E_{\tilde\pi}[\tilde{Y}_i] \right)^2 \cdot 1\left[ \left| \tilde{Y}_i - E_{\tilde\pi}[\tilde{Y}_i] \right| \ge \sqrt{\sum_i \pi_i (1 - \pi_i)} \cdot \sigma_{\tilde\pi} \epsilon \right] \right] \to 0.$$

Assumption 3 is similar to the Lindeberg condition for the standard Lindeberg-Levy central limit theorem, and imposes that the weighted finite-population variance of $\tilde{Y}_i$ is not dominated by a small number of observations. Viewing $\hat\tau$ as a Horvitz-Thompson estimator under Poisson rejective sampling in light of (6), the following result follows immediately from Theorem 1 in Berger (1998), which is based on Hajek (1964).

Lemma 3.4.

Suppose Assumptions 1 and 3 hold. Then,

$$\frac{\hat\tau - E_R[\hat\tau]}{\sqrt{V_R[\hat\tau]}} \xrightarrow{d} N(0, 1).$$

Footnote: Berger (1998) gives the result using the actual inclusion probabilities $\pi_i$, whereas Hajek (1964) states a similar result where the Horvitz-Thompson estimator uses an approximation to the $\pi_i$ in terms of the $p_i$.

The results for scalar outcomes $Y_i$ extend easily to the multiple outcome case with $Y_i \in \mathbb{R}^K$. This is relevant when we observe multiple outcome measures in a cross-section, or we observe the same outcome measure for multiple periods (or both). We use the extension to multiple outcomes in our finite population analysis of difference-in-differences and instrumental variables settings later in the paper.

We extend our notation from the scalar case, so that $Y_i \in \mathbb{R}^K$, and for a fixed vector-valued characteristic $X_i$ (e.g. a function of the potential outcomes), $E_w[X_i] := \frac{1}{\sum_i w_i} \sum_i w_i X_i$ and $\mathrm{Var}_w[X_i] = \frac{1}{\sum_i w_i} \sum_i w_i (X_i - E_w[X_i]) (X_i - E_w[X_i])'$. In particular, define

$$S_{1,w} := \mathrm{Var}_w[Y_i(1)], \quad S_{0,w} := \mathrm{Var}_w[Y_i(0)], \quad S_{10,w} := E_w\left[ (Y_i(1) - E_w[Y_i(1)]) (Y_i(0) - E_w[Y_i(0)])' \right]$$

to be the weighted finite population variances and covariance of $Y_i(1)$ and $Y_i(0)$. Additionally, the vector-valued ATT is defined as $\tau_{ATT} := \frac{1}{N_1} \sum_i \pi_i (Y_i(1) - Y_i(0))$, and consider the vector-valued SDIM estimator $\hat\tau = \frac{1}{N_1} \sum_i D_i Y_i(1) - \frac{1}{N_0} \sum_i (1 - D_i) Y_i(0)$. We also generalize the variance estimators introduced above,

$$\hat{s} := \frac{1}{N_1} \hat{s}_1 + \frac{1}{N_0} \hat{s}_0, \quad \hat{s}_1 := \frac{1}{N_1} \sum_i D_i (Y_i - \bar{Y}_1)(Y_i - \bar{Y}_1)', \quad \hat{s}_0 := \frac{1}{N_0} \sum_i (1 - D_i)(Y_i - \bar{Y}_0)(Y_i - \bar{Y}_0)',$$

where $\bar{Y}_1 := \frac{1}{N_1} \sum_i D_i Y_i$ and $\bar{Y}_0 := \frac{1}{N_0} \sum_i (1 - D_i) Y_i$.

We introduce the following assumptions on the sequence of finite populations.

Assumption 4.

Suppose that $N_1 / N \to p \in (0, 1)$, and $S_{1,w}$, $S_{0,w}$, $S_{10,w}$ have finite limits for $w \in \{\pi, 1 - \pi, \tilde\pi\}$.

Assumption 5.

Assume that

$$\max_{1 \le i \le N} \| Y_i(1) - E_\pi[Y_i(1)] \|^2 / N \to 0 \quad \text{and} \quad \max_{1 \le i \le N} \| Y_i(0) - E_{1-\pi}[Y_i(0)] \|^2 / N \to 0,$$

where $\| \cdot \|$ is the Euclidean norm.

Assumption 6.

Let $\tilde{Y}_i = \frac{1}{N_1} Y_i(1) + \frac{1}{N_0} Y_i(0)$, and let $\lambda_{\min}$ be the minimal eigenvalue of $\Sigma_{\tilde\pi} = \mathrm{Var}_{\tilde\pi}[\tilde{Y}_i]$. Assume $\lambda_{\min} > 0$ and for all $\epsilon > 0$,

$$\frac{1}{\lambda_{\min}} E_{\tilde\pi}\left[ \left\| \tilde{Y}_i - E_{\tilde\pi}[\tilde{Y}_i] \right\|^2 \cdot 1\left[ \left\| \tilde{Y}_i - E_{\tilde\pi}[\tilde{Y}_i] \right\| \ge \sqrt{\sum_i \pi_i (1 - \pi_i) \cdot \lambda_{\min}} \cdot \epsilon \right] \right] \to 0.$$

Assumption 4 requires that the fraction of treated units and the (weighted) variance and covariances of the potential outcomes have limits. Assumption 5 is a multivariate analog of Assumption 2 in that it requires that no single observation dominate the $\pi$ or $(1 - \pi)$-weighted variance of the potential outcomes. Assumption 6 is a multivariate generalization of the Lindeberg-type condition in Assumption 3.

Proposition 3.1 (Results for vector-valued outcomes).

(1) $$E_R[\hat\tau] = \tau_{ATT} + \frac{N}{N_0} \frac{N}{N_1} \left( \frac{1}{N} \sum_i \left( \pi_i - \frac{N_1}{N} \right) Y_i(0) \right).$$

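(2) Under Assumptions 1 and 4,

$$V_R[\hat\tau] + o(N^{-1}) = \frac{\frac{1}{N} \sum_{k=1}^N \pi_k (1 - \pi_k)}{\frac{N_1}{N} \frac{N_0}{N}} \left[ \frac{1}{N_1} \mathrm{Var}_{\tilde\pi}[Y_i(1)] + \frac{1}{N_0} \mathrm{Var}_{\tilde\pi}[Y_i(0)] - \frac{1}{N} \mathrm{Var}_{\tilde\pi}[\tau_i] \right] \le \frac{1}{N_1} \mathrm{Var}_\pi[Y_i(1)] + \frac{1}{N_0} \mathrm{Var}_{1-\pi}[Y_i(0)],$$

where $A \le B$ if $B - A$ is positive semi-definite.

(3) Under Assumptions 1, 4, and 5,

$$\hat{s}_1 - \mathrm{Var}_\pi[Y_i(1)] \xrightarrow{p} 0, \quad \hat{s}_0 - \mathrm{Var}_{1-\pi}[Y_i(0)] \xrightarrow{p} 0.$$

(4) Under Assumptions 1, 4, and 6,

$$V_R[\hat\tau]^{-1/2} (\hat\tau - \tau) \xrightarrow{d} N(0, I).$$

Assumption 4 implies $\Sigma_\tau = \lim_{N \to \infty} N V_R[\hat\tau]$ exists, so the previous display can alternatively be written as $\sqrt{N} (\hat\tau - \tau) \xrightarrow{d} N(0, \Sigma_\tau)$.

Proof.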

See appendix.
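To make the vector-valued objects in Proposition 3.1 concrete, here is a minimal sketch that computes the vector SDIM and the matrix variance estimator $\hat{s}$ from observed data, and then forms componentwise normal-approximation intervals from the diagonal of $\hat{s}$. The data are hypothetical, the function names are invented for this example, and the 1.96 critical value is simply the usual two-sided 95% normal quantile; by parts (2) to (4) such intervals would be expected to be asymptotically conservative for $E_R[\hat\tau]$, which is an interpretation of the results above rather than a procedure spelled out in the text.

```python
def vector_sdim(d, ys):
    """Vector SDIM; ys[i] is the K-vector of observed outcomes for unit i."""
    n1 = sum(d)
    n0 = len(d) - n1
    k = len(ys[0])
    ybar1 = [sum(di * y[j] for di, y in zip(d, ys)) / n1 for j in range(k)]
    ybar0 = [sum((1 - di) * y[j] for di, y in zip(d, ys)) / n0 for j in range(k)]
    tau_hat = [a - b for a, b in zip(ybar1, ybar0)]
    return tau_hat, ybar1, ybar0


def s_hat(d, ys):
    """Matrix variance estimator s = s_1 / N_1 + s_0 / N_0."""
    n1 = sum(d)
    n0 = len(d) - n1
    k = len(ys[0])
    _, ybar1, ybar0 = vector_sdim(d, ys)
    s = [[0.0] * k for _ in range(k)]
    for di, y in zip(d, ys):
        # s_1 and s_0 each carry a 1/N_g factor, and so does their weight in s.
        centre, denom = (ybar1, n1 * n1) if di == 1 else (ybar0, n0 * n0)
        for a in range(k):
            for b in range(k):
                s[a][b] += (y[a] - centre[a]) * (y[b] - centre[b]) / denom
    return s


def componentwise_intervals(tau_hat, s, z=1.96):
    """Normal-approximation interval for each component, using diag(s)."""
    return [(t - z * s[j][j] ** 0.5, t + z * s[j][j] ** 0.5)
            for j, t in enumerate(tau_hat)]


# Hypothetical data: N = 6 units, K = 2 outcomes, first three units treated.
d = [1, 1, 1, 0, 0, 0]
ys = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 1.0], [6.0, 1.0], [8.0, 4.0]]

tau_hat, _, _ = vector_sdim(d, ys)
s = s_hat(d, ys)
intervals = componentwise_intervals(tau_hat, s)
print("tau_hat =", tau_hat)
print("s_hat =", s)
print("intervals =", intervals)
```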

In this section, we apply our results to provide a design-based analysis of difference-in-differences estimators (e.g., Chapter 5 of Angrist and Pischke (2009)). Such a design-based analysis is useful since applied researchers commonly use difference-in-differences estimators in quasi-experimental settings to analyze the causal effects of state-level policies in which outcomes for all 50 US states are observed.

Suppose we observe panel data for a population of $N$ units for periods $t = -\bar{T}, ..., \bar{T}$. Units with $D_i = 1$ receive a treatment of interest beginning at period $t = 1$. The observed outcome for unit $i$ at period $t$ is $Y_{it} = Y_{it}(D_i)$. We assume the treatment has no effect prior to its implementation, so that $Y_{it}(1) = Y_{it}(0)$ for all $t < 1$. Consider the common dynamic two-way fixed effects (TWFE) or "event-study" regression specification

$$Y_{it} = \alpha_i + \phi_t + \sum_{s \neq 0} D_i \times 1[s = t] \times \beta_s + \epsilon_{it}. \tag{11}$$

Footnote: We focus on the case with non-staggered treatment timing, since it may be difficult to interpret the estimand of standard two-way fixed effects models under treatment effect heterogeneity and staggered treatment timing (Borusyak and Jaravel, 2016; de Chaisemartin and D'Haultfœuille, 2018; Goodman-Bacon, 2018; Athey and Imbens, 2018). The results in this section could be extended to other estimators with a more sensible interpretation under staggered timing, e.g. Callaway and Sant'Anna (2019); Sun and Abraham (2020).

It is well known in this setting that $\hat\beta_t = \hat\tau_t - \hat\tau_0$, where

$$\hat\tau_t = \frac{1}{N_1} \sum_i D_i Y_{it} - \frac{1}{N_0} \sum_i (1 - D_i) Y_{it}.$$

Thus, $\hat\beta_t$ is the difference in the SDIM estimators for the outcome in period $t$ and period 0. Letting $Y_i = (Y_{i,-\bar{T}}, ..., Y_{i,\bar{T}})$, (3) implies that under Poisson rejective assignment,

$$E_R[\hat\beta_t] = \tau_t + \frac{N}{N_0} \frac{N}{N_1} \mathrm{Cov}_1[\pi_i, Y_{it}(0) - Y_{i0}(0)],$$

where $\tau_t = \frac{1}{N_1} \sum_i \pi_i (Y_{it}(1) - Y_{it}(0))$ is the ATT in period $t$, and we use the fact that $\tau_0 = 0$ by the no-anticipation assumption. Thus, the bias in $\hat\beta_t$ is proportional to the finite population covariance between $\pi_i$ and trends in the untreated potential outcomes, $Y_{it}(0) - Y_{i0}(0)$. It follows that $\hat\beta_t$ is unbiased for $\tau_t$ over the randomization distribution if $\mathrm{Cov}_1[\pi_i, Y_{it}(0) - Y_{i0}(0)] = 0$, or equivalently, if

$$E_R\left[ \frac{1}{N_1} \sum_i D_i (Y_{it}(0) - Y_{i0}(0)) \right] = E_R\left[ \frac{1}{N_0} \sum_i (1 - D_i)(Y_{it}(0) - Y_{i0}(0)) \right],$$

which mimics the familiar "parallel trends" assumption from the sampling-based model.

Further, if the sequence of populations satisfies the assumptions in part (4) of Proposition 3.1, then

$$\sqrt{N} (\hat\beta - (\tau + \delta)) \xrightarrow{d} N(0, \Sigma), \tag{12}$$

where $\hat\beta$ is the vector that stacks $\hat\beta_t$, $\Sigma = \lim_{N \to \infty} N V_R[\hat\beta]$, and $\tau$, $\delta$ are the vectors that stack $\tau_t$ and $\delta_t = \frac{N}{N_0} \frac{N}{N_1} \mathrm{Cov}_1[\pi_i, Y_{it}(0) - Y_{i0}(0)]$. Part (3) of Proposition 3.1 implies that the variance estimator $\hat{s}$ is asymptotically conservative for $\hat\beta$. It is easily verified that $\hat{s}$ corresponds with the cluster-robust variance estimator for (11) that clusters at level $i$ (up to degrees of freedom corrections). The normal limiting model in (12) has been studied by Roth (2019) and Rambachan and Roth (2019) from a sampling-based perspective in which parallel trends may fail; our results show that it also has a sensible interpretation from a design-based perspective.
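The bias expression for $\hat\beta_t$ can be illustrated by exact enumeration on a small hypothetical panel with one pre-period (period 0) and one post-period $t$. In the sketch below, which is an illustration rather than the authors' implementation, the first scenario gives every unit the same untreated trend, so the finite-population parallel trends condition holds even though treatment probabilities are related to outcome levels; the second scenario makes the trend increase with the treatment probability. All numbers and function names are assumptions made for this example.

```python
from itertools import combinations
from math import prod


def expected_did(p, base, yt0, yt1, n1):
    """Exact pi_i and E_R[beta_hat_t] under Poisson rejective assignment.

    base[i] is the period-0 outcome (identical under treatment and control by
    no anticipation); yt0[i] and yt1[i] are the period-t potential outcomes.
    """
    n = len(p)
    weights = {
        treated: prod(p[i] if i in treated else 1 - p[i] for i in range(n))
        for treated in combinations(range(n), n1)
    }
    total = sum(weights.values())
    pi = [sum(w for d, w in weights.items() if i in d) / total for i in range(n)]
    mean = 0.0
    for treated, w in weights.items():
        treated_change = sum(yt1[i] - base[i] for i in treated) / n1
        control_change = sum(yt0[i] - base[i] for i in range(n) if i not in treated) / (n - n1)
        mean += (w / total) * (treated_change - control_change)
    return pi, mean


def att_and_bias(pi, base, yt0, yt1, n1):
    """tau_t and the bias term (N/N_0)(N/N_1) Cov_1[pi_i, Y_it(0) - Y_i0(0)]."""
    n = len(pi)
    n0 = n - n1
    tau_t = sum(pi[i] * (yt1[i] - yt0[i]) for i in range(n)) / n1
    trend = [yt0[i] - base[i] for i in range(n)]
    cov = sum((pi[i] - n1 / n) * trend[i] for i in range(n)) / n
    return tau_t, (n / n0) * (n / n1) * cov


p = [0.2, 0.3, 0.5, 0.6, 0.7, 0.8]          # related to outcome levels below
base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
effect = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

# Scenario 1: common untreated trend, so parallel trends holds.
yt0_par = [b + 1.0 for b in base]
yt1_par = [a + e for a, e in zip(yt0_par, effect)]
pi, mean_par = expected_did(p, base, yt0_par, yt1_par, n1=3)
tau_par, bias_par = att_and_bias(pi, base, yt0_par, yt1_par, n1=3)

# Scenario 2: untreated trend rises with the unit index, and hence with p_i.
yt0_vio = [b + 0.5 * i for i, b in enumerate(base)]
yt1_vio = [a + e for a, e in zip(yt0_vio, effect)]
_, mean_vio = expected_did(p, base, yt0_vio, yt1_vio, n1=3)
tau_vio, bias_vio = att_and_bias(pi, base, yt0_vio, yt1_vio, n1=3)

print(f"parallel trends: E_R[beta_hat] = {mean_par:.4f}, tau_t = {tau_par:.4f}")
print(f"violation:       E_R[beta_hat] = {mean_vio:.4f}, tau_t = {tau_vio:.4f}, bias = {bias_vio:.4f}")
```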
Instrumental Variables

In this section, we apply our results to analyze the properties of two-stage least squares instrumental variables estimators. Let $Z_i \in \{0, 1\}$ be an instrument. Let $D_i(z) \in \{0, 1\}$ be the potential treatment status as a function of $z$. Let $Y_i(d)$ be the potential outcome as a function of $d \in \{0, 1\}$. Our notation $Y(d)$ encodes the so-called “exclusion restriction” that $Z$ affects $Y$ only through $D$. We observe $(Y_i, D_i, Z_i)$ where $Y_i = Y_i(D_i(Z_i))$ and $D_i = D_i(Z_i)$. We treat $Z_i$ as stochastic and the potential outcomes for both $D$ and $Y$ as fixed. The number of units with $Z_i = 1$ is denoted by $N_1^Z$ and the number of units with $Z_i = 0$ is denoted by $N_0^Z$.

Example 2. Researchers may have data on student outcomes for all students attending public and private schools in a particular geographic area (e.g., Goodman (2008) observes data on all high school graduates in Massachusetts from 2003-2005). The instrument $Z_i$ could be an indicator for whether a student is offered a subsidy for attending private school, $D_i$ could be an indicator for whether a student attends private school, and $Y_i$ could be a student’s test score. We might suspect that an organization assigns scholarships essentially as-if random, but it is also plausible that they may target their offers to students that are likely to accept if offered, or who have high benefits from private school, so that $P(Z_i = 1)$ may be related to $Y_i(d)$ and $D_i(z)$. It is therefore instructive to consider the distribution of the 2SLS estimator when $Z_i$ is not completely randomly assigned.

In canonical IV frameworks, it is traditionally assumed that the instrument $Z$ is independent of the potential outcomes (see Angrist and Imbens (1994); Angrist et al. (1996) for a sampling-based model, and Kang et al. (2018) for a design-based model). We instead allow for the possibility that the probability that $Z_i = 1$ may differ across units, and be arbitrarily related to the potential outcomes. In particular, we suppose that

$$P\left( Z = z \,\middle|\, \sum_i Z_i = N_1^Z \right) = C \prod_i p_i^{z_i} (1 - p_i)^{1 - z_i} \tag{13}$$

for all $z \in \{0, 1\}^N$ such that $\sum_i z_i = N_1^Z$, and zero otherwise. Thus, the assignment of the instrument $Z_i$ mimics the Poisson rejective assignment of $D_i$ in (1). We update the notation to use $E_{R_Z}[\cdot]$, $V_{R_Z}[\cdot]$ to denote the expectations and variances with respect to the randomization distribution of $Z$ conditional on the number of units assigned to $Z = 1$. We also maintain the typical monotonicity assumption that is commonly imposed in IV settings.

Assumption 7 (Monotonicity). $D_i(1) \geq D_i(0)$ for all $i$.
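To make the assignment mechanism in (13) concrete, the sketch below enumerates its support for a small hypothetical population and computes the implied marginal probabilities that $Z_i = 1$. The function names `instrument_distribution` and `marginal_probabilities` and the example probability vectors are illustrative assumptions, not objects defined in the paper; the normalization by the sum of weights plays the role of the constant $C$.

```python
import itertools
import numpy as np

def instrument_distribution(p, n1z):
    """Support and probabilities of Z under (13), conditional on sum(Z) = n1z."""
    n = len(p)
    support, weights = [], []
    for ones in itertools.combinations(range(n), n1z):
        z = np.zeros(n, dtype=int)
        z[list(ones)] = 1
        support.append(z)
        # Unnormalized weight: prod_i p_i^z_i (1 - p_i)^(1 - z_i)
        weights.append(np.prod(p ** z * (1 - p) ** (1 - z)))
    weights = np.asarray(weights)
    return np.asarray(support), weights / weights.sum()

def marginal_probabilities(p, n1z):
    """Marginal probability that Z_i = 1 under the conditional distribution."""
    support, prob = instrument_distribution(p, n1z)
    return prob @ support

# Equal p_i: the mechanism reduces to complete randomization.
pi_equal = marginal_probabilities(np.full(6, 0.3), 3)

# Hypothetical "targeted" offers: later units are more likely to receive Z = 1.
p_targeted = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.8])
pi_targeted = marginal_probabilities(p_targeted, 3)

print("equal p_i:   ", pi_equal)
print("targeted p_i:", pi_targeted)
```

With equal $p_i$ every unit has marginal probability $N_1^Z / N$, while unequal $p_i$ produce unit-specific probabilities that still sum to $N_1^Z$. If the $p_i$ were chosen as a function of a unit's potential outcomes, as in the targeting story of Example 2, these marginal probabilities would be correlated with the potential outcomes.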

A common method for estimating treatment effects in an instrumental variables setting is two-stage least squares (2SLS), defined as $\hat{\beta}_{2SLS} := \hat{\tau}_{RF} / \hat{\tau}_{FS}$ with

$$\hat{\tau}_{RF} := \frac{1}{N_1^Z} \sum_i Z_i Y_i - \frac{1}{N_0^Z} \sum_i (1 - Z_i) Y_i,$$

$$\hat{\tau}_{FS} := \frac{1}{N_1^Z} \sum_i Z_i D_i - \frac{1}{N_0^Z} \sum_i (1 - Z_i) D_i.$$

$\hat{\tau}_{RF}$ is often referred to as the “reduced-form” coefficient, whereas $\hat{\tau}_{FS}$ is referred to as the “first-stage” coefficient.

Observe that $\hat{\tau}_{RF}$ is a SDIM for the effect of $Z_i$ on $Y_i$, whereas $\hat{\tau}_{FS}$ can be viewed as a SDIM for the effect of $Z_i$ on $D_i$. Equation (3) thus implies that

$$E_{R_Z}[\hat{\tau}_{RF}] = \frac{1}{N_1^Z} \sum_i \pi_i^Z \left( Y_i(D_i(1)) - Y_i(D_i(0)) \right) + \frac{N}{N_0^Z} \frac{N}{N_1^Z} \mathrm{Cov}\left[ \pi_i^Z, Y_i(D_i(0)) \right],$$

where $\mathrm{Cov}\left[ \pi_i^Z, Y_i(D_i(0)) \right] = \frac{1}{N} \sum_i \left( \pi_i^Z - \frac{N_1^Z}{N} \right) Y_i(D_i(0))$ is the finite population covariance between $\pi_i^Z$ and $Y_i(D_i(0))$. Let $C = \{ i : D_i(1) > D_i(0) \}$ denote the set of compliers. The previous display along with Assumption 7 imply that

$$E_{R_Z}[\hat{\tau}_{RF}] = \frac{1}{N_1^Z} \sum_{i \in C} \pi_i^Z (Y_i(1) - Y_i(0)) + \frac{N}{N_0^Z} \frac{N}{N_1^Z} \mathrm{Cov}\left[ \pi_i^Z, Y_i(D_i(0)) \right]. \tag{14}$$

By an analogous argument for $\hat{\tau}_{FS}$, we obtain that

$$E_{R_Z}[\hat{\tau}_{FS}] = \frac{1}{N_1^Z} \sum_{i \in C} \pi_i^Z + \frac{N}{N_0^Z} \frac{N}{N_1^Z} \mathrm{Cov}\left[ \pi_i^Z, D_i(0) \right]. \tag{15}$$

Define $\beta_{2SLS} := E_{R_Z}[\hat{\tau}_{RF}] / E_{R_Z}[\hat{\tau}_{FS}]$.

Our earlier results imply that under suitable regularity conditions $\hat{\beta}_{2SLS}$ is approximately normally distributed around $\beta_{2SLS}$ in large populations. Let $\mathbf{Y}_i = (Y_i, D_i)'$ and define the potential outcomes $\mathbf{Y}_i(z) = (Y_i(D_i(z)), D_i(z))'$. If the sequence of populations satisfies the assumptions in Proposition 3.1, part 4 (using $\mathbf{Y}_i$ as just defined, and adding sub- or super-script $Z$ as needed), then

$$\sqrt{N} \begin{pmatrix} \hat{\tau}_{RF} - E_{R_Z}[\hat{\tau}_{RF}] \\ \hat{\tau}_{FS} - E_{R_Z}[\hat{\tau}_{FS}] \end{pmatrix} \to_d N(0, \Sigma_\tau),$$

where $\Sigma_\tau = \lim_{N \to \infty} N \, V_{R_Z}\left[ (\hat{\tau}_{RF}, \hat{\tau}_{FS})' \right]$. Assuming further that the sequence of populations satisfies $(E_{R_Z}[\hat{\tau}_{RF}], E_{R_Z}[\hat{\tau}_{FS}]) \to (\tau^*_{RF}, \tau^*_{FS})$ with $\tau^*_{FS} > 0$, then the uniform delta method (e.g., Theorem 3.8 in van der Vaart (2000)) implies that

$$\sqrt{N} (\hat{\beta}_{2SLS} - \beta_{2SLS}) \to_d N(0, g' \Sigma_\tau g),$$

where $g$ is the gradient of $h(x, y) = x / y$ evaluated at $(\tau^*_{RF}, \tau^*_{FS})$, that is, $g = \left( 1 / \tau^*_{FS}, \; -\tau^*_{RF} / (\tau^*_{FS})^2 \right)'$. Proposition 3.1 likewise implies that it is possible to obtain asymptotically conservative inference for $\beta_{2SLS}$ using plug-in estimates of the variance.

How should we interpret the estimand $\beta_{2SLS}$? First, note that if $\pi_i^Z \equiv N_1^Z / N$, so that all units receive $Z = 1$ with equal probability, then equations (14) and (15) imply that $\beta_{2SLS} = \frac{1}{|C|} \sum_{i \in C} (Y_i(1) - Y_i(0))$, which is the canonical local average treatment effect (LATE) for compliers (Angrist et al., 1996). Interestingly, our results show that $\beta_{2SLS}$ has a general causal interpretation under the weaker assumption that $\mathrm{Cov}\left[ \pi_i^Z, Y_i(D_i(0)) \right] = \mathrm{Cov}\left[ \pi_i^Z, D_i(0) \right] = 0$, so that the probability that $Z_i = 1$ may differ across units but the finite population covariance between treatment probabilities and $D_i(0)$ and $Y_i(D_i(0))$ is equal to zero. Under this assumption, we have that

$$\beta_{2SLS} = \frac{1}{\sum_{i \in C} \pi_i^Z} \sum_{i \in C} \pi_i^Z (Y_i(1) - Y_i(0)).$$

The parameter $\beta_{2SLS}$ can then be interpreted as a $\pi_i^Z$-weighted local average treatment effect (LATE) for compliers. The weights given to each complier are proportional to the probability that $Z_i = 1$. This is intuitive, as a complier with a low probability of having $Z_i = 1$ should have little effect on the 2SLS estimator.

This paper analyzes the properties of quasi-experimental estimators, such as SDIM, DiD, and 2SLS, in a finite population setting in which treatment probabilities are non-constant across units and may vary systematically with potential outcomes. Analogous to familiar results in the sampling-based framework, we show that one can obtain valid causal inference for certain interpretable causal estimands if complete randomization is replaced with weaker orthogonality conditions.
More generally, our results allow one to understand the bias and limiting distribution of these estimators for the ATT as a function of the finite-population covariances between $\pi_i$ and functions of the potential outcomes, akin to familiar omitted variable bias formulas.

The analysis in this paper could be extended in a variety of directions. First, the analysis might be extended to settings where the stochastic nature of the data arises both from the assignment of treatment and from sampling a subset of units from a finite population, as in Abadie et al. (2020). As in Abadie et al. (2020), the analysis could also be extended to allow for clustered sampling or treatment assignment. Second, our results on the limiting distribution of the SDIM suggest that a variety of mis-specification robust tools and sensitivity analyses which have been developed under the assumption of asymptotic normality from a sampling-based perspective could also potentially be applied in finite population contexts (e.g., Armstrong and Kolesar (2018a,b); Bonhomme and Weidner (2018); Andrews et al. (2017, 2019)). However, the finite population setting studied here differs from the usual sampling-based approach in that the variance matrix is only conservatively estimated. It would be useful to study which guarantees of size control and/or optimality from the sampling literature are robust to this modification.

Footnote: It is well-known in sampling-based instrumental variables settings that the delta method fails under “weak-instrument asymptotics” in which $E_{R_Z}[\hat{\tau}_{FS}]$ drifts towards zero (Staiger and Stock, 1997). Similar issues apply here. However, the test statistic used to form Anderson-Rubin confidence intervals, which are robust to weak identification, can be written as a quadratic form in a SDIM statistic (see, e.g., Li and Ding (2017)). Our results could thus also be applied to analyze the properties of Anderson-Rubin based CIs under weak identification asymptotics.

References

Abadie, Alberto, Susan Athey, Guido W.
Imbens, and Jeffrey M. Wooldridge, “Sampling-Based versus Design-Based Uncertainty in Regression Analysis,” Econometrica, 2020, (1), 265–296.

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge, “When Should You Adjust Standard Errors for Clustering?,” Working Paper 24003, National Bureau of Economic Research, November 2017.

Andrews, Isaiah, Matthew Gentzkow, and Jesse Shapiro, “Measuring the Sensitivity of Parameter Estimates to Estimation Moments,” The Quarterly Journal of Economics, 2017, (4), 1553–1592.

Andrews, Isaiah, Matthew Gentzkow, and Jesse Shapiro, “On the Informativeness of Descriptive Statistics for Structural Estimates,” Technical Report, 2019.

Angrist, Joshua and Guido Imbens, “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 1994, (2), 467–475.

Angrist, Joshua D. and Jorn-Steffen Pischke, Mostly Harmless Econometrics: An Empiricist’s Companion, Princeton: Princeton University Press, 2009.

Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin, “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association, 1996, (434), 444–455.

Armstrong, Timothy and Michal Kolesar, “Optimal Inference in a Class of Regression Models,” Econometrica, 2018, 655–683.

Armstrong, Timothy and Michal Kolesar, “Simple and Honest Confidence Intervals in Nonparametric Regression,” Technical Report, 2018.

Aronow, Peter M. and Joel A. Middleton, “A class of unbiased estimators of the average treatment effect in randomized experiments,” Journal of Causal Inference, 2015, (1), 135–154.

Athey, Susan and Guido Imbens, “Design-Based Analysis in Difference-In-Differences Settings with Staggered Adoption,” arXiv:1808.05293 [cs, econ, math, stat], August 2018.

Berger, Yves G., “Rate of convergence to normal distribution for the Horvitz-Thompson estimator,” Journal of Statistical Planning and Inference, April 1998, (2), 209–226.

Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan, “How Much Should We Trust Differences-In-Differences Estimates?,” The Quarterly Journal of Economics, February 2004, (1), 249–275.

Bonhomme, Stéphane and Martin Weidner, “Minimizing Sensitivity to Model Misspecification,” Technical Report, 2018.

Borusyak, Kirill and Xavier Jaravel, “Revisiting Event Study Designs,” SSRN Scholarly Paper ID 2826228, Social Science Research Network, Rochester, NY, August 2016.

Callaway, Brantly and Pedro H. C. Sant’Anna, “Difference-in-Differences with Multiple Time Periods,” SSRN Scholarly Paper ID 3148250, Social Science Research Network, Rochester, NY, March 2019.

Chen, Jiafeng, Edward Glaeser, and David Wessel, “The (Non-) Effect of Opportunity Zones on Housing Prices,” Technical Report w26587, National Bureau of Economic Research, Cambridge, MA, December 2019.

Conley, Timothy G., Christian B. Hansen, and Peter E. Rossi, “Plausibly Exogenous,” The Review of Economics and Statistics, October 2010, (1), 260–272.

de Chaisemartin, Clément and Xavier D’Haultfœuille, “Two-way fixed effects estimators with heterogeneous treatment effects,” arXiv:1803.08807 [econ], March 2018.

Fisher, R. A., The design of experiments, Oxford, England: Oliver & Boyd, 1935.

Freedman, David A., “On Regression Adjustments in Experiments with Several Treatments,” The Annals of Applied Statistics, 2008, (1), 176–196.

Freedman, David A., “On regression adjustments to experimental data,” Advances in Applied Mathematics, 2008, (2), 180–193.

Goodman-Bacon, Andrew, “Difference-in-Differences with Variation in Treatment Timing,” Working Paper 25018, National Bureau of Economic Research, September 2018.

Goodman, Joshua, “Who merits financial aid?: Massachusetts’ Adams Scholarship,” Journal of Public Economics, 2008, 2121–2131.

Hajek, Jaroslav, “Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population,” Annals of Mathematical Statistics, December 1964, (4), 1491–1523.

Heckman, James J. and Edward J. Vytlacil, “Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation,” in “Handbook of Econometrics,” Vol. 6, 2006, pp. 4779–4874.

Imbens, Guido W., “Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review,” The Review of Economics and Statistics, February 2004, (1), 4–29.

Imbens, Guido W. and Donald B. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Cambridge: Cambridge University Press, 2015.

Kang, Hyunseung, Laura Peck, and Luke Keele, “Inference for instrumental variables: a randomization inference approach,” Journal of the Royal Statistical Society: Series A (Statistics in Society), 2018, (4), 1231–1254.

Li, Xinran and Peng Ding, “General Forms of Finite Population Central Limit Theorems with Applications to Causal Inference,” Journal of the American Statistical Association, October 2017, (520), 1759–1769.

Lin, Winston, “Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s critique,” The Annals of Applied Statistics, 2013, (1), 295–318.

Manski, Charles F. and John V. Pepper, “How Do Right-to-Carry Laws Affect Crime Rates? Coping with Ambiguity Using Bounded-Variation Assumptions,” Review of Economics and Statistics, 2018, (2), 232–244.

Middleton, Joel A., “A Unified Theory of Regression Adjustment for Design-based Inference,” Technical Report, arXiv preprint arXiv:1803.06011, 2018.

Neyman, Jerzy, “On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9.,” Statistical Science, 1923, (4), 465–472.

Rambachan, Ashesh and Jonathan Roth, “An Honest Approach to Parallel Trends,” Technical Report, 2019.

Rosenbaum, Paul, “Sensitivity Analysis in Observational Studies,” in B. S. Everitt and D. C. Howell, eds., Encyclopedia of Statistics in Behavioral Science, 2005.

Rosenbaum, Paul R., “Sensitivity analysis for certain permutation inferences in matched observational studies,” Technical Report 1, 1987.

Rosenbaum, Paul R., Observational Studies, Springer Science, 2002.

Roth, Jonathan, “Pre-test with Caution: Event-study Estimates After Testing for Parallel Trends,” Working paper, 2019.

Savje, Frederik and Angele Delevoye, “Consistency of the Horvitz-Thompson estimator under general sampling and experimental designs,” Journal of Statistical Planning and Inference, 2020, 190–197.

Sekhon, Jasjeet S. and Yotam Shem-Tov, “Inference on a New Class of Sample Average Treatment Effects,” Journal of the American Statistical Association, February 2020, pp. 1–18.

Staiger, Douglas and James H. Stock, “Instrumental Variables Regression with Weak Instruments,” Econometrica, 1997, (3), 557–586.

Sun, Liyan and Sarah Abraham, “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects,” Working Paper, 2020.

van der Vaart, A. W., Asymptotic Statistics, Cambridge University Press, June 2000.

Design-Based Uncertainty for Quasi-Experiments

Appendix

Ashesh Rambachan and Jonathan Roth

August 6, 2020

A Additional Proofs

Proof of Lemma 4

Proof.

It suffices to show that $\hat{s}_1^2 / \mathrm{Var}_\pi[Y_i(1)] \to_p 1$ and $\hat{s}_0^2 / \mathrm{Var}_{1-\pi}[Y_i(0)] \to_p 1$. We provide a proof for the former; the latter proof is analogous. For notational convenience, let $v_1 = \mathrm{Var}_\pi[Y_i(1)]$. From the definition of $\hat{s}_1^2$, we can write

$$\frac{\hat{s}_1^2}{v_1} = \frac{1}{v_1} \left( \left( \frac{1}{N_1} \sum_i D_i (Y_i(1) - E_\pi[Y_i(1)])^2 \right) - (\bar{Y}_1 - E_\pi[Y_i(1)])^2 \right).$$

Now, $\frac{1}{N_1} \sum_i D_i (Y_i(1) - E_\pi[Y_i(1)])^2$ can be viewed as a Horvitz-Thompson estimator of $\frac{1}{N_1} \sum_i \pi_i (Y_i(1) - E_\pi[Y_i(1)])^2 = v_1$, and thus by Theorem 6.2 in Hajek (1964), its variance is equal to

$$(1 + o(1)) \left( \frac{1}{N_1^2} \sum_i \pi_i (1 - \pi_i) \right) \cdot \mathrm{Var}_{\tilde{\pi}}\left[ (Y_i(1) - E_\pi[Y_i(1)])^2 \right].$$

Note further that

$$\left( \frac{1}{N_1^2} \sum_i \pi_i (1 - \pi_i) \right) \cdot \mathrm{Var}_{\tilde{\pi}}\left[ (Y_i(1) - E_\pi[Y_i(1)])^2 \right] \leq \frac{1}{N_1^2} \sum_i \pi_i (1 - \pi_i) (Y_i(1) - E_\pi[Y_i(1)])^4 \leq \frac{1}{N_1^2} m_N(1) \sum_i \pi_i (Y_i(1) - E_\pi[Y_i(1)])^2 = \frac{1}{N_1} m_N(1) \mathrm{Var}_\pi[Y_i(1)].$$

Applying Chebychev’s inequality, we have

$$\frac{1}{N_1} \sum_i D_i (Y_i(1) - E_\pi[Y_i(1)])^2 - v_1 = O_p\left( \sqrt{ \frac{1}{N_1} m_N(1) \mathrm{Var}_\pi[Y_i(1)] } \right).$$

Next, viewing $\bar{Y}_1$ as a Horvitz-Thompson estimator, we see that its variance is bounded by $(1 + o(1)) \left( \frac{1}{N_1^2} \sum_i \pi_i (1 - \pi_i) \right) \cdot \mathrm{Var}_{\tilde{\pi}}[Y_i(1)]$, which by similar logic to that above is bounded above by $(1 + o(1)) \frac{1}{N_1} \mathrm{Var}_\pi[Y_i(1)]$. Thus, by Chebychev’s inequality,

$$\bar{Y}_1 - E_\pi[Y_i(1)] = O_p\left( \sqrt{ \frac{1}{N_1} \mathrm{Var}_\pi[Y_i(1)] } \right).$$

Combining the results above, it follows that

$$\frac{\hat{s}_1^2}{v_1} = \frac{1}{v_1} \left( v_1 + O_p\left( \sqrt{ \frac{m_N(1) v_1}{N_1} } \right) + O_p\left( \frac{v_1}{N_1} \right) \right) = 1 + O_p\left( \sqrt{ \frac{m_N(1)}{v_1 N_1} } \right) + O_p\left( \frac{1}{N_1} \right).$$

However, the first $O_p$ term converges to 0 by assumption, and since Assumption 1 implies that $N_1 \to \infty$, the second $O_p$ term converges to 0 as well.

Proof of Proposition 3.1

Proof.

The proof of claim (1) is analogous to equation (3). We next prove claim (2). For simplicity, let $A_n = V_R[\hat{\boldsymbol{\tau}}]$, let $B_n$ be the right-hand side of the first equality in claim (2), and let $C_n$ be the right-hand side of the inequality in claim (2). We first prove the inequality. Note that by the definition of a positive semi-definite matrix, it suffices to show that $l' B_n l \leq l' C_n l$ for all $l \in \mathbb{R}^K$. However, letting $Y_i(d) = l' \mathbf{Y}_i(d)$, the desired inequality follows from Lemma 3.2. Next, observe that $A_n - B_n = o(N^{-1})$ if and only if $D_n := N A_n - N B_n = o(1)$, which holds if and only if $l' D_n l = o(1)$ for all $l \in L := \{ e_j \mid 1 \leq j \leq K \} \cup \{ e_j - e_{j'} \mid 1 \leq j, j' \leq K \}$, where $e_j$ is the $j$th basis vector in $\mathbb{R}^K$. To obtain the last equivalence, note that $e_j' D_n e_j = [D_n]_{jj}$ (the $(j, j)$ element of $D_n$), whereas exploiting the fact that $D_n$ is symmetric, $(e_j - e_{j'})' D_n (e_j - e_{j'}) = [D_n]_{jj} + [D_n]_{j'j'} - 2 [D_n]_{jj'}$, and so convergence of $l' D_n l$ to zero for all $l \in L$ is equivalent to convergence to zero of each of the elements of $D_n$. Next, note that if $Y_i(d) = l' \mathbf{Y}_i(d)$, then $\hat{\tau}$ as defined in (2) is equal to $l' \hat{\boldsymbol{\tau}}$ and $\mathrm{Var}_{\tilde{\pi}}[Y_i(d)] = l' \mathrm{Var}_{\tilde{\pi}}[\mathbf{Y}_i(d)] l$. It follows from Lemma 3.1 that

$$N \cdot l' V_R[\hat{\boldsymbol{\tau}}] l \, [1 + o(1)] = \frac{ \frac{1}{N} \sum_{k=1}^N \pi_k (1 - \pi_k) }{ \frac{N_0}{N} \frac{N_1}{N} } \, l' \left[ \frac{N}{N_1} \mathrm{Var}_{\tilde{\pi}}[\mathbf{Y}_i(1)] + \frac{N}{N_0} \mathrm{Var}_{\tilde{\pi}}[\mathbf{Y}_i(0)] - \mathrm{Var}_{\tilde{\pi}}[\boldsymbol{\tau}_i] \right] l, \tag{16}$$

which implies that $l' D_n l = l' (N A_n) l \cdot o(1)$. However, Assumption 4, together with the inequality in claim (2), implies that the right-hand side of the previous display is $O(1)$, and thus $l' (N A_n) l = O(1)$, from which the desired result follows.

The proof of (3) is similar to the proof of Lemma A3 in Li and Ding (2017), which gives a similar result in the case of completely randomized experiments. We provide a proof for the convergence of $\hat{s}_1$; the convergence of $\hat{s}_0$ is similar. As in the proof to claim (2), it suffices to show that $l' \hat{s}_1 l - l' \mathrm{Var}_\pi[\mathbf{Y}_i(1)] l \to_p 0$ for all $l \in L$.
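Let $Y_i(d) = l' \mathbf{Y}_i(d)$. Then

$$l' \hat{s}_1 l = \frac{1}{N_1} \sum_i D_i \left( l' \mathbf{Y}_i(1) - \frac{1}{N_1} \sum_j D_j l' \mathbf{Y}_j(1) \right)^2 = \left( \frac{1}{N_1} \sum_i D_i \left( l' \mathbf{Y}_i(1) - l' E_\pi[\mathbf{Y}_i(1)] \right)^2 \right) - \left( \frac{1}{N_1} \sum_i D_i l' \mathbf{Y}_i(1) - E_\pi[l' \mathbf{Y}_i(1)] \right)^2, \tag{17}$$

where the second equality uses the bias variance decomposition. The first term can be viewed as a Horvitz-Thompson estimator of $\frac{1}{N_1} \sum_i \pi_i (l' \mathbf{Y}_i(1) - E_\pi[l' \mathbf{Y}_i(1)])^2 = \mathrm{Var}_\pi[l' \mathbf{Y}_i(1)]$ under Poisson rejective sampling, and thus has variance equal to

$$(1 + o(1)) \frac{1}{N_1^2} \sum_i \pi_i (1 - \pi_i) \mathrm{Var}_{\tilde{\pi}}\left[ (l' \mathbf{Y}_i(1) - E_\pi[l' \mathbf{Y}_i(1)])^2 \right].$$

Further, observe that

$$\frac{1}{N_1^2} \sum_i \pi_i (1 - \pi_i) \mathrm{Var}_{\tilde{\pi}}\left[ (l' \mathbf{Y}_i(1) - E_\pi[l' \mathbf{Y}_i(1)])^2 \right] \leq \frac{1}{N_1} E_\pi\left[ (l' \mathbf{Y}_i(1) - E_\pi[l' \mathbf{Y}_i(1)])^4 \right] \leq \frac{1}{N_1} \max_i \left\{ (l' \mathbf{Y}_i(1) - E_\pi[l' \mathbf{Y}_i(1)])^2 \right\} \cdot \mathrm{Var}_\pi[l' \mathbf{Y}_i(1)] \leq \left[ \|l\|^2 \frac{N}{N_1} \right] \left[ \max_i \| \mathbf{Y}_i(1) - E_\pi[\mathbf{Y}_i(1)] \|^2 / N \right] \cdot \left[ l' \mathrm{Var}_\pi[\mathbf{Y}_i(1)] l \right] = o(1),$$

where the first inequality is obtained using the fact that $\mathrm{Var}_{\tilde{\pi}}[X] \leq E_{\tilde{\pi}}[X^2]$, expanding the definition of $E_{\tilde{\pi}}[\cdot]$, and using the inequality $\pi_i (1 - \pi_i) \leq \pi_i$, analogous to the argument in the proof to Lemma 3.3; the final inequality uses the Cauchy-Schwarz inequality and factors out $l$; and we obtain that the final term is $o(1)$ by noting that the first and final bracketed terms are $O(1)$ by Assumption 4 and the middle term is $o(1)$ by Assumption 5. Applying Chebychev’s inequality, it follows that the first term in (17) is equal to $\mathrm{Var}_\pi[l' \mathbf{Y}_i(1)] + o_p(1)$.

To complete the proof of the claim, we show that the second term in (17) is $o_p(1)$. Note that we can view $\frac{1}{N_1} \sum_i D_i l' \mathbf{Y}_i(1)$ as a Horvitz-Thompson estimator of $E_\pi[l' \mathbf{Y}_i(1)]$. Following similar arguments to that in the preceding paragraph, we have that its variance is bounded above by $\frac{1}{N_1} l' \mathrm{Var}_\pi[\mathbf{Y}_i(1)] l$, which is $o(1)$ by Assumption 4 combined with the fact that Assumption 1 implies $N_1 \to \infty$. Applying Chebychev’s inequality again, we obtain that the second term in (17) is $o_p(1)$, as needed.

To prove claim (4), appealing to the Cramer-Wold device, it suffices to show that for any $l \in \mathbb{R}^K \setminus \{0\}$, $Y_i = l' \mathbf{Y}_i$, and $\hat{\tau}$ as defined in (2), $V_R[\hat{\tau}]^{-1/2} (\hat{\tau} - \tau) \to_d N(0, 1)$. This follows from Proposition 3.4, provided that we can show that Assumption 6 implies that Assumption 3 holds when $Y_i = l' \mathbf{Y}_i$ for any conformable vector $l$. Indeed, recall that $\sigma_{\tilde{\pi}}^2 = l' \Sigma_{\tilde{\pi}} l \geq \lambda_{\min} \|l\|^2$, and hence $\frac{1}{\lambda_{\min}} \geq \frac{\|l\|^2}{\sigma_{\tilde{\pi}}^2}$. From the Cauchy-Schwarz inequality,

$$\left\| \tilde{\mathbf{Y}}_i - E_{\tilde{\pi}}[\tilde{\mathbf{Y}}_i] \right\|^2 \cdot \|l\|^2 \geq \left( \tilde{Y}_i - E_{\tilde{\pi}}[\tilde{Y}_i] \right)^2.$$

Together with the previous inequality, this implies that

$$\frac{1}{\lambda_{\min}} E_{\tilde{\pi}}\left[ \left\| \tilde{\mathbf{Y}}_i - E_{\tilde{\pi}}[\tilde{\mathbf{Y}}_i] \right\|^2 \cdot 1\left[ \left\| \tilde{\mathbf{Y}}_i - E_{\tilde{\pi}}[\tilde{\mathbf{Y}}_i] \right\| \geq \sqrt{ \sum_i \pi_i (1 - \pi_i) \cdot \lambda_{\min} } \cdot \epsilon \right] \right] \geq \frac{1}{\sigma_{\tilde{\pi}}^2} E_{\tilde{\pi}}\left[ \left( \tilde{Y}_i - E_{\tilde{\pi}}[\tilde{Y}_i] \right)^2 \cdot 1\left[ \left| \tilde{Y}_i - E_{\tilde{\pi}}[\tilde{Y}_i] \right| \geq \sqrt{ \sum_i \pi_i (1 - \pi_i) } \cdot \sigma_{\tilde{\pi}} \epsilon \right] \right],$$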