Inference under Covariate-Adaptive Randomization with Imperfect Compliance
Federico A. Bugni
Department of Economics
Duke University
[email protected]
Mengsi Gao
Department of Economics
UC Berkeley
[email protected]
February 9, 2021
Abstract
This paper studies inference in a randomized controlled trial (RCT) with covariate-adaptive randomization (CAR) and imperfect compliance of a binary treatment. In this context, we study inference on the local average treatment effect (LATE), i.e., the average treatment effect conditional on individuals that always comply with the assigned treatment. As in Bugni et al. (2018, 2019), CAR refers to randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. In contrast to these papers, however, we allow participants of the RCT to endogenously decide whether to comply with the assigned treatment status.

We study the properties of an estimator of the LATE derived from a “fully saturated” instrumental variable (IV) linear regression, i.e., a linear regression of the outcome on indicators for all strata and their interaction with the treatment decision, with the latter instrumented with the treatment assignment. We show that the proposed LATE estimator is asymptotically normal, and we characterize its asymptotic variance in terms of primitives of the problem. We provide consistent estimators of the standard errors and asymptotically exact hypothesis tests. In the special case when the target proportion of units assigned to each treatment does not vary across strata, we can also consider two other estimators of the LATE, including the one based on the “strata fixed effects” IV linear regression, i.e., a linear regression of the outcome on indicators for all strata and the treatment decision, with the latter instrumented with the treatment assignment.

Our characterization of the asymptotic variance of the LATE estimators in terms of the primitives of the problem allows us to understand the influence of the parameters of the RCT.
We use this to propose strategies to minimize their asymptotic variance in a hypothetical RCT based on data from a pilot study. We illustrate the practical relevance of these results using a simulation study and an empirical application based on Dupas et al. (2018).
KEYWORDS: Covariate-adaptive randomization, stratified block randomization, treatment assignment, randomized controlled trial, strata fixed effects, saturated regression, imperfect compliance.

JEL classification codes: C12, C14

∗ We thank Ivan Canay, Azeem Shaikh, and Max Tabord-Meehan for helpful comments and discussion. This research was supported by the National Science Foundation Grant SES-1729280.
1 Introduction
This paper studies inference in a randomized controlled trial (RCT) with covariate-adaptive randomization (CAR) and a binary treatment. As in Bugni et al. (2018, 2019), CAR refers to randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. As these references explain, CAR is commonly used to assign treatment status in RCTs in all parts of the sciences.

In contrast to Bugni et al. (2018, 2019), this paper allows for imperfect compliance. That is, the participants of the RCT can endogenously decide their treatment, denoted by D, which may or may not coincide with the assigned treatment status, denoted by A. This constitutes an empirically relevant contribution to this growing literature, as imperfect compliance is a common occurrence in many RCTs. For recent examples of RCTs that use CAR and have imperfect compliance, see Angrist and Lavy (2009), Attanasio et al. (2011), Dupas et al. (2018), McIntosh et al. (2018), and Somville and Vandewalle (2018), among many others.

Our goal is to study the effect of the treatment on an outcome of interest, denoted by Y. We consider the potential outcome framework with Y = Y(1)D + Y(0)(1 − D), where Y(1) denotes the outcome with treatment and Y(0) denotes the outcome without treatment. In the context of imperfect treatment compliance, the causal parameter of interest is the so-called local average treatment effect (LATE) introduced in Angrist and Imbens (1994), given by

β ≡ E[Y(1) − Y(0) | D = A].  (1.1)

In words, the LATE is the average treatment effect for those individuals who decide to comply with their assigned treatment status, i.e., the “compliers”. As in Bugni et al. (2019), we consider inference on the LATE based on simple linear regressions.
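To fix ideas with a concrete (hypothetical) computation: with a binary assignment A instrumenting a binary take-up D, the LATE is estimated by the Wald ratio, i.e., the effect of assignment on the outcome divided by the effect of assignment on take-up. The following is a minimal sketch only; the function name and the toy data are ours, not part of the paper.

```python
def wald_late(y, d, a):
    """Wald/IV estimate of the LATE: the effect of assignment A on the
    outcome Y divided by the effect of A on the take-up decision D.
    y, d, a are equal-length lists; d and a are 0/1 indicators."""
    treated = [i for i in range(len(a)) if a[i] == 1]
    control = [i for i in range(len(a)) if a[i] == 0]
    mean = lambda grp, v: sum(v[i] for i in grp) / len(grp)
    itt_y = mean(treated, y) - mean(control, y)  # effect of A on Y
    itt_d = mean(treated, d) - mean(control, d)  # first stage: effect of A on D
    return itt_y / itt_d

# Under perfect compliance (d == a), the ratio reduces to a difference in means.
print(wald_late([3.0, 5.0, 1.0, 2.0], [1, 1, 0, 0], [1, 1, 0, 0]))  # → 2.5
```

The regression-based estimators studied in the paper refine this ratio by exploiting the strata used in the randomization.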
In Section 3, we study the properties of an estimator of the LATE derived from a “fully saturated” instrumental variable (IV) linear regression, i.e., a linear regression of the outcome on indicators for all strata and their interaction with the treatment decision, where the latter is instrumented with the treatment assignment. We show that its coefficients can be used to consistently estimate the LATE under very general conditions. We show that the proposed LATE estimator is asymptotically normal, and we characterize its asymptotic variance in terms of primitives of the problem. As expected, we show that the asymptotic variance is different from the one under perfect compliance derived in Bugni et al. (2018, 2019). We provide consistent estimators of these new standard errors and asymptotically exact hypothesis tests. In addition, we show that the results of the fully saturated regression can be used to estimate all of the primitive parameters of the problem.

In the special case when the target proportion of units being assigned to each of the treatments does not vary across strata, we also consider two other regression-based estimators of the LATE. Section 4 proposes an estimator of the LATE based on the “strata fixed effects” IV linear regression, i.e., a linear regression of the outcome on indicators for all strata and the treatment decision, where the latter is instrumented with the treatment assignment. In turn, Section 5 proposes an estimator of the LATE based on a “two sample regression” IV linear regression, i.e., a linear regression of the outcome on a constant and the treatment decision, where the latter is instrumented by the treatment assignment. We show that the LATE estimators from these regressions are also consistent and asymptotically normal, and we characterize their asymptotic variances.

See Rosenberger and Lachin (2016) for a textbook treatment focused on clinical trials, and Duflo et al. (2007) and Bruhn and McKenzie (2008) for reviews focused on development economics.
Under imperfect compliance, the literature also considers other parameters, such as the intention to treat, defined as ITT ≡ E[Y | A = 1] − E[Y | A = 0], or the average treatment effect on the treated, defined as TOT ≡ E[Y(1) − Y(0) | D = 1]. In this paper, we prefer the LATE over these alternative parameters. First, it can be shown that ITT = LATE × P(D = A). Unlike the ITT, the LATE represents the average treatment effect for a subset of the population (i.e., the compliers), which makes it preferable. Second, it can be shown that TOT = LATE if there are no “always takers”, i.e., individuals who adopt the treatment regardless of the assignment. However, if always takers are present, the TOT is not identified under our assumptions.

Within this class of CAR methods, our second result in Section 6 establishes that the asymptotic variance of the estimators of the LATE cannot increase when the collection of strata becomes finer. To the best of our knowledge, this is a new finding in this literature even when there is perfect compliance. In addition, we show how to use data from a pilot RCT to estimate the asymptotic variance that would result from using a finer set of strata in a hypothetical RCT. Our third and final result in Section 6 provides an expression for the optimal treatment propensity in a hypothetical RCT in terms of its primitive parameters. To exploit this result in practice, we provide a consistent estimator of the optimal treatment propensity based on data from a pilot version of the RCT.

In recent work, Ansel et al. (2018) also consider inference for the LATE in RCTs with a binary treatment and imperfect compliance. While most of their paper focuses on the case in which treatment assignment is done via simple random sampling, they consider inference on RCTs with CAR in Section 4. In contrast, our paper is entirely focused on RCTs with CAR.
First, this focus allows us to tailor our assumptions to the problem under consideration, and it enables us to give more detailed formal arguments. Second, Ansel et al. (2018) consider IV regressions without fully specifying the set of covariates in these regressions. Consequently, they derive the asymptotic variance of their LATE estimators in terms of high-level expressions. In contrast, we fully specify the covariates in our IV regressions according to the specification typically used by practitioners. This allows us to obtain explicit expressions for the asymptotic variance of our LATE estimators, detailing which of these depend on the underlying population and which are chosen by the researcher implementing the RCT. In this sense, our expressions reveal the underlying forces determining the asymptotic variance and enable researchers to choose the RCT parameters to improve the efficiency of their estimators. We consider the topic of RCT design in Section 6.

The remainder of the paper is organized as follows. In Section 2, we describe the setup of the inference problem and we specify our assumptions. The next three sections consider the problem of inference on the LATE based on different IV regression models. Section 3 considers the “fully saturated” IV linear regression, Section 4 considers the “strata fixed effects” IV linear regression, and Section 5 considers the “two-sample regression” IV linear regression. In each one of these sections, we propose a consistent estimator of the LATE, characterize its asymptotic distribution, and propose a consistent estimator of its standard errors together with asymptotically valid hypothesis tests. In Section 6, we consider the problem of designing a hypothetical RCT with CAR based on data from a pilot RCT with CAR. In Section 7, we study the finite-sample behavior of our hypothesis tests via Monte Carlo simulations. Section 8 illustrates the practical relevance of our results with an empirical application based on the RCT in Dupas et al. (2018).
Section 9 provides concluding remarks. All proofs and several intermediate results are collected in the appendix.

This finding extends the results obtained by Bugni et al. (2018, 2019) to the case of imperfect compliance.

An example of this is the proof of Lemma A.5, where we modify the arguments in Bugni et al. (2018, Lemma B.2) to allow for the presence of imperfect compliance.

2 Setup and notation
We consider an RCT with n participants. For each participant i = 1, . . . , n, Y_i ∈ R denotes the observed outcome of interest, Z_i ∈ Z denotes a vector of observed baseline covariates, A_i ∈ {0, 1} indicates the treatment assignment, and D_i ∈ {0, 1} indicates the treatment decision. Relative to the setup in Bugni et al. (2018, 2019), we allow for imperfect compliance, i.e., we allow D_i ≠ A_i.

We consider potential outcome models for both outcomes and treatment decisions. For each participant i = 1, . . . , n, we use Y_i(D) to denote the potential outcome of participant i if he/she makes treatment decision D, and we use D_i(A) to denote the potential treatment decision of participant i if he/she has assigned treatment A. These are related to their observed counterparts in the usual manner:

D_i = D_i(1) A_i + D_i(0)(1 − A_i),
Y_i = Y_i(1) D_i + Y_i(0)(1 − D_i).  (2.1)

Following the usual classification in the LATE framework in Angrist and Imbens (1994), each participant in the RCT can only be one of four types: complier, always taker, never taker, or defier. An individual i is said to be a complier if {D_i(0) = 0, D_i(1) = 1}, an always taker if {D_i(0) = D_i(1) = 1}, a never taker if {D_i(0) = D_i(1) = 0}, and a defier if {D_i(0) = 1, D_i(1) = 0}. As usual in the literature, we later impose that there are no defiers in our population of participants in order to identify the LATE. It is convenient to use C to denote a complier, AT to denote an always taker, NT to denote a never taker, and
DEF to denote a defier. Our goal in this paper is to consistently estimate the LATE β ≡ E[Y(1) − Y(0) | C] and to test hypotheses about it. In particular, for a prespecified choice of β_0 ∈ R, we are interested in the following hypothesis testing problem

H_0 : β = β_0 versus H_1 : β ≠ β_0  (2.2)

at a significance level α ∈ (0, 1).

We use P_n to denote the distribution of the observed data X(n) = {(Y_i, D_i, A_i, Z_i) : i = 1, . . . , n} and denote by Q_n the distribution of the underlying random variables, given by W(n) = {(Y_i(1), Y_i(0), D_i(0), D_i(1), Z_i) : i = 1, . . . , n}. Note that P_n is jointly determined by (2.1), Q_n, and the treatment assignment mechanism. We therefore state our assumptions below in terms of restrictions on Q_n and the treatment assignment mechanism. In fact, we will not make reference to P_n for the remainder of the paper, and all operations are understood to be under Q_n and the treatment assignment mechanism.

Strata are constructed from the observed baseline covariates Z_i using a prespecified function S : Z → S, where S is a finite set. For each participant i = 1, . . . , n, let S_i ≡ S(Z_i) and let S(n) = {S_i : i = 1, . . . , n}. By definition, we note that S(n) is completely determined by the covariates in W(n).

We begin by describing our assumptions on the underlying data generating process (DGP) of W(n).

Assumption 2.1. W(n) is an i.i.d. sample that satisfies
(a) E[Y_i(d)^2] < ∞ for all d ∈ {0, 1},
(b) p(s) ≡ P(S_i = s) > 0 for all s ∈ S,
(c) P(D_i(0) = 1, D_i(1) = 0) = 0 or, equivalently, P(D_i(1) ≥ D_i(0)) = 1,
(d) π_{D(1)}(s) − π_{D(0)}(s) > 0 for all s ∈ S, where π_{D(a)}(s) ≡ P(D_i(a) = 1 | S_i = s) for (a, s) ∈ {0, 1} × S.

Assumption 2.1 requires the underlying data distribution to be i.i.d., i.e., Q_n = Q^n, where Q denotes the common marginal distribution of (Y_i(1), Y_i(0), D_i(0), D_i(1), Z_i).
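The four-type taxonomy above can be made concrete with a small sketch. The helper name and the toy population of (D(0), D(1)) pairs below are ours, purely to illustrate the mapping from potential decisions to types.

```python
def compliance_type(d0, d1):
    """Map potential treatment decisions (D(0), D(1)) to the four
    Angrist-Imbens types; the defier case is ruled out by the
    monotonicity condition in Assumption 2.1(c)."""
    return {(0, 1): "C", (1, 1): "AT", (0, 0): "NT", (1, 0): "DEF"}[(d0, d1)]

# Type shares in a small hypothetical population of (D(0), D(1)) pairs.
pop = [(0, 1), (0, 1), (1, 1), (0, 0)]
shares = {t: sum(compliance_type(*w) == t for w in pop) / len(pop)
          for t in ("C", "AT", "NT", "DEF")}
# shares["C"] == 0.5: half of this population complies under either assignment.
```

Note that the type of a participant is never fully observed: only D_i = D_i(A_i) is in the data, which is why the complier shares must be estimated, as done in Section 3.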
In addition, the assumption imposes several requirements on Q. Assumption 2.1(a) demands that the potential outcomes have finite second moments, which is important to develop our asymptotic analysis. Assumption 2.1(b) requires all strata to be relevant. Assumption 2.1(c) corresponds to Angrist and Imbens (1994, Condition 2), and imposes the standard “no defiers” or “monotonicity” condition that is essential to identify the LATE. In other words, this condition implies that there are no participants who will decide to defy the treatment assignment, i.e., who would both adopt the treatment when assigned to the control and adopt the control when assigned to the treatment. To interpret Assumption 2.1(d), we note that Lemma A.1 provides the following expression for the probability of each type of participant conditional on the stratum s ∈ S:

P(AT | S = s) = π_{D(0)}(s),
P(NT | S = s) = 1 − π_{D(1)}(s),
P(C | S = s) = π_{D(1)}(s) − π_{D(0)}(s).  (2.3)

By (2.3), Assumption 2.1(d) imposes that every stratum has a non-trivial amount of participants who will decide to comply with the assigned treatment status. This corresponds to a strata-specific version of the requirement in Angrist and Imbens (1994, Condition 1).

Next, we describe our assumptions on the treatment assignment mechanism. As explained earlier, we focus our analysis on CAR, i.e., on randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. To describe our assumptions more formally, we require some further notation. Let A(n) = {A_i : i = 1, . . . , n} denote the vector of treatment assignments. For any s ∈ S, let π_A(s) ∈ (0, 1) denote the “target” proportion of participants to assign to treatment in stratum s, determined by the researcher implementing the RCT. Also, let

n_A(s) ≡ Σ_{i=1}^n I{A_i = 1, S_i = s}

denote the number of participants assigned to treatment in stratum s, and let

n(s) ≡ Σ_{i=1}^n I{S_i = s}

denote the number of participants in stratum s. With this notation in place, the following assumption states the conditions that we require of the treatment assignment mechanism.

Assumption 2.2.
The treatment assignment mechanism satisfies
(a) W(n) ⊥ A(n) | S(n),
(b) n_A(s)/n(s) →p π_A(s) ∈ (0, 1) for all s ∈ S.

Assumption 2.2(a) requires that the treatment assignment mechanism is a function of the vector of strata S(n) and an exogenous randomization device. Assumption 2.2(b) imposes that the fraction of units assigned to treatment in stratum s converges in probability to the target proportion π_A(s) as the sample size diverges. As we show in Section 3, Assumption 2.2 imposes sufficient structure on the CAR mechanism to analyze the asymptotic distribution of the LATE estimator in the SAT regression. However, as we show in Sections 4 and 5, Assumption 2.2 will not be enough to guarantee the consistency of the LATE estimators in the SFE and 2SR regressions. To analyze the asymptotic properties of these estimators, we replace Assumption 2.2 with the following condition, which mildly strengthens it.

Assumption 2.3.
The treatment assignment mechanism satisfies
(a) W(n) ⊥ A(n) | S(n),
(b) {√n(n_A(s)/n(s) − π_A(s)) : s ∈ S} | S(n) →d N(0, Σ_A) w.p.a.1, where, for some τ(s) ∈ [0, 1],

Σ_A ≡ diag({τ(s) π_A(s)(1 − π_A(s))/p(s) : s ∈ S}),

(c) π_A = π_A(s) ∈ (0, 1) for all s ∈ S.

Note that Assumption 2.2(a) and Assumption 2.3(a) coincide. Assumption 2.3(b) strengthens the convergence in Assumption 2.2(b), and it requires the fraction of units assigned to the treatment in stratum s to be asymptotically normal, conditional on the vector of strata S(n). For each stratum s ∈ S, the parameter τ(s) ∈ [0, 1] determines the amount of dispersion that the CAR mechanism allows in the fraction of units assigned to the treatment in that stratum. A lower value of τ(s) implies that the CAR mechanism imposes a higher degree of “balance” or “control” of the treatment assignment proportion relative to its desired target value. Finally, Assumption 2.3(c) imposes that the target value for the treatment assignment does not vary by strata. As we show in Sections 4 and 5, this condition is key to the consistency of the LATE estimators produced by the SFE and 2SR regressions.

As explained by Bugni et al. (2018, 2019) and Rosenberger and Lachin (2016, Sections 3.10 and 3.11), Assumptions 2.2 and 2.3 are satisfied by a wide array of CAR schemes. We briefly consider two popular schemes that can easily be seen to satisfy this assumption.
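As a concrete companion to the two examples that follow, both schemes admit a short sketch. The helper functions below are ours (illustrative only) and simply mirror the verbal descriptions of the two mechanisms.

```python
import math
import random

def srs_assign(strata, pi_a, rng=random):
    """Simple random sampling within strata: each unit in stratum s is
    assigned to treatment independently with probability pi_a[s]."""
    return [1 if rng.random() < pi_a[s] else 0 for s in strata]

def sbr_assign(strata, pi_a, rng=random):
    """Stratified block randomization: exactly floor(n(s) * pi_a[s]) units
    in stratum s are treated, with all such subsets equally likely."""
    assign = [0] * len(strata)
    for s in set(strata):
        idx = [i for i, si in enumerate(strata) if si == s]
        n_treat = math.floor(len(idx) * pi_a[s])
        for i in rng.sample(idx, n_treat):  # uniformly random subset
            assign[i] = 1
    return assign

strata = ["a"] * 10 + ["b"] * 10
a = sbr_assign(strata, {"a": 0.5, "b": 0.3})
# SBR guarantees exact within-stratum counts: 5 treated in "a", 3 in "b",
# whereas under SRS those counts would themselves be random.
```

The contrast between the two functions previews the role of τ(s): SRS leaves the within-stratum treated fraction random (τ(s) = 1), while SBR pins it down exactly (τ(s) = 0).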
Example 2.1 (Simple Random Sampling (SRS)). This refers to a treatment assignment mechanism in which A(n) satisfies

P(A(n) = {a_i : i = 1, . . . , n} | S(n) = {s_i : i = 1, . . . , n}, W(n)) = ∏_{i=1}^n π_A(s_i)^{a_i} (1 − π_A(s_i))^{1 − a_i}.  (2.4)

In other words, SRS assigns each participant in stratum s to treatment with probability π_A(s) and to control with probability (1 − π_A(s)), independently of everything else in the sample.

Note that Assumption 2.3(a) follows immediately from (2.4). Also, by combining (2.4), Assumption 2.1, and the Central Limit Theorem (CLT), it is possible to show that Assumption 2.3(b) holds with τ(s) = 1 for all s ∈ S. In terms of the range of values of τ(s) allowed by Assumption 2.3(b), SRS imposes the least amount of “balance” of the treatment assignment proportion relative to its desired target value. Finally, Assumption 2.3(c) can be satisfied by setting π_A(s) to be constant across strata.

Example 2.2 (Stratified Block Randomization (SBR)). This is sometimes also referred to as block randomization or permuted blocks within strata. In SBR, the assignments across strata are independent of each other and of the rest of the information in the sample. Within every stratum s, SBR assigns exactly ⌊n(s)π_A(s)⌋ of the n(s) participants in stratum s to treatment and the remaining n(s) − ⌊n(s)π_A(s)⌋ to control, where all possible (n(s) choose ⌊n(s)π_A(s)⌋) assignments are equally likely. As explained by Bugni et al. (2018, 2019), this mechanism satisfies Assumptions 2.3(a)-(b) with τ(s) = 0 for all s ∈ S. In terms of the range of values of τ(s) allowed by Assumption 2.3(b), SBR imposes the most amount of “balance” of the treatment assignment proportion relative to its desired target value. Finally, Assumption 2.3(c) can be satisfied by setting π_A(s) to be constant across strata.

Assumption 2.3(b) is slightly weaker than Bugni et al. (2019, Assumption 4.1(c)) in that we require the condition to hold w.p.a.1 instead of a.s. We establish that the w.p.a.1 version of the assumption is sufficient to establish all of our formal results.

3 The SAT IV regression

In this section, we study the asymptotic properties of an IV estimator of the LATE based on a linear regression model of the outcome of interest on the full set of indicators for all strata and their interaction with the treatment decision, where the latter is instrumented with the treatment assignment. We refer to this as the SAT IV regression. We show in this section that the SAT IV regression can consistently estimate the LATE for each stratum, and we derive the joint asymptotic distribution of these strata-specific estimators. These estimators can then be combined to produce a consistent estimator of the LATE. We show that this estimator is asymptotically normal and we characterize its asymptotic variance in terms of the primitive parameters of the RCT. We also show that the coefficients and residuals of the SAT IV regression can be used to consistently estimate these primitive parameters, which allows us to propose a consistent estimator of the standard errors of the LATE estimator. All of this allows us to propose hypothesis tests for the LATE that are asymptotically exact, i.e., their limiting rejection probability under the null hypothesis is equal to the nominal level.

In terms of our notation, the SAT IV regression is the result of regressing Y_i on {I{S_i = s} : s ∈ S} and {D_i I{S_i = s} : s ∈ S}. Since the treatment decision D_i is endogenously decided by the RCT participant, we instrument it with the exogenous treatment assignment A_i. To define these IV estimators precisely, set

Y_n ≡ {Y_i : i = 1, . . . , n}′,
X_n^sat ≡ {X_i^sat : i = 1, . . . , n}′ with X_i^sat ≡ {{I{S_i = s} : s ∈ S}′, {D_i I{S_i = s} : s ∈ S}′},
Z_n^sat ≡ {Z_i^sat : i = 1, . . .
, n}′ with Z_i^sat ≡ {{I{S_i = s} : s ∈ S}′, {A_i I{S_i = s} : s ∈ S}′}.

The IV estimators of the coefficients in the SAT regression are

({γ̂_sat(s) : s ∈ S}′, {β̂_sat(s) : s ∈ S}′)′ ≡ (Z_n^sat′ X_n^sat)^{−1} (Z_n^sat′ Y_n),  (3.1)

where γ̂_sat(s) corresponds to the IV estimator of the coefficient on I{S_i = s} and β̂_sat(s) corresponds to the IV estimator of the coefficient on D_i I{S_i = s}.

Under Assumptions 2.1 and 2.2, Theorem A.1 in the appendix shows that, for each stratum s ∈ S,

γ̂_sat(s) →p γ(s) ≡ π_{D(1)}(s) E[Y(0) | C, S = s] − π_{D(0)}(s) E[Y(1) | C, S = s] + π_{D(0)}(s) E[Y(1) | AT, S = s] + (1 − π_{D(1)}(s)) E[Y(0) | NT, S = s],
β̂_sat(s) →p β(s) ≡ E[Y(1) − Y(0) | C, S = s].  (3.2)

The last equation in (3.2) reveals that {β̂_sat(s) : s ∈ S} is a consistent estimator of the vector of strata-specific LATEs. To define a consistent estimator of the LATE based on these, all we then need is a consistent estimator of the probability that a participant belongs to each stratum conditional on being a complier, i.e., P(S = s | C) for s ∈ S. To this end, for every s ∈ S, let

P̂(S = s, C) ≡ (n(s)/n) (n_AD(s)/n_A(s) − (n_D(s) − n_AD(s))/(n(s) − n_A(s))),
P̂(C) ≡ Σ_{s∈S} (n(s)/n) (n_AD(s)/n_A(s) − (n_D(s) − n_AD(s))/(n(s) − n_A(s))),
P̂(S = s | C) ≡ P̂(S = s, C)/P̂(C),  (3.3)

where

n_AD(s) ≡ Σ_{i=1}^n I{A_i = 1, D_i = 1, S_i = s} and n_D(s) ≡ Σ_{i=1}^n I{D_i = 1, S_i = s}.

Under Assumptions 2.1 and 2.2, Theorem A.1 also shows that P̂(S = s | C) is a consistent estimator of P(S = s | C) for every s ∈ S. It is then natural to propose the following estimator of the LATE:

β̂_sat ≡ Σ_{s∈S} P̂(S = s | C) β̂_sat(s).
(3.4)

It follows from our previous discussion, the continuous mapping theorem, and the law of iterated expectations that β̂_sat is a consistent estimator of the LATE. The following theorem confirms this result, and also characterizes its asymptotic distribution in terms of primitive parameters of the RCT.

Theorem 3.1 (SAT main result). Suppose that Assumptions 2.1 and 2.2 hold. Then,

√n(β̂_sat − β) →d N(0, V_sat),

where β ≡ E[Y(1) − Y(0) | C] and V_sat ≡ V_{Y,1}^sat + V_{Y,0}^sat + V_{D,1}^sat + V_{D,0}^sat + V_H^sat with

V_{Y,1}^sat ≡ (1/P(C)^2) Σ_{s∈S} (p(s)/π_A(s)) { V[Y(1) | AT, S = s] π_{D(0)}(s) + V[Y(0) | NT, S = s](1 − π_{D(1)}(s)) + V[Y(1) | C, S = s](π_{D(1)}(s) − π_{D(0)}(s)) + (E[Y(1) | C, S = s] − E[Y(1) | AT, S = s])^2 π_{D(0)}(s)(π_{D(1)}(s) − π_{D(0)}(s))/π_{D(1)}(s) },

V_{Y,0}^sat ≡ (1/P(C)^2) Σ_{s∈S} (p(s)/(1 − π_A(s))) { V[Y(1) | AT, S = s] π_{D(0)}(s) + V[Y(0) | NT, S = s](1 − π_{D(1)}(s)) + V[Y(0) | C, S = s](π_{D(1)}(s) − π_{D(0)}(s)) + (E[Y(0) | C, S = s] − E[Y(0) | NT, S = s])^2 (1 − π_{D(1)}(s))(π_{D(1)}(s) − π_{D(0)}(s))/(1 − π_{D(0)}(s)) },

V_{D,1}^sat ≡ (1/P(C)^2) Σ_{s∈S} (p(s)(1 − π_{D(1)}(s))/(π_A(s) π_{D(1)}(s))) { −π_{D(0)}(s)(E[Y(1) | C, S = s] − E[Y(1) | AT, S = s]) + π_{D(1)}(s)(E[Y(0) | C, S = s] − E[Y(0) | NT, S = s]) + π_{D(1)}(s)(E[Y(1) − Y(0) | C, S = s] − β) }^2,

V_{D,0}^sat ≡ (1/P(C)^2) Σ_{s∈S} (p(s) π_{D(0)}(s)/((1 − π_A(s))(1 − π_{D(0)}(s)))) { −(1 − π_{D(0)}(s))(E[Y(1) | C, S = s] − E[Y(1) | AT, S = s]) + (1 − π_{D(1)}(s))(E[Y(0) | C, S = s] − E[Y(0) | NT, S = s]) + (1 − π_{D(0)}(s))(E[Y(1) − Y(0) | C, S = s] − β) }^2,

V_H^sat ≡ (1/P(C)^2) Σ_{s∈S} p(s)(π_{D(1)}(s) − π_{D(0)}(s))^2 (E[Y(1) − Y(0) | C, S = s] − β)^2,

P(C) ≡ Σ_{s∈S} p(s)(π_{D(1)}(s) − π_{D(0)}(s)) > 0.  (3.5)

Several remarks about Theorem 3.1 are in order.
First, we note that β̂_sat is related to the IV estimator β̂ considered by Ansel et al. (2018). In fact, these two estimators coincide if one specifies their covariate X_i as a full vector of strata dummies. By specifying the set of covariates in our regression, we are able to obtain a closed-form expression for the asymptotic variance of β̂_sat in terms of the primitive parameters of the RCT. This will become useful in Section 6, where we consider the problem of choosing the parameters of the RCT to improve efficiency.

Second, we note that Bugni et al. (2019) derive the asymptotic distribution of β̂_sat under perfect compliance (i.e., π_{D(1)}(s) = 1, π_{D(0)}(s) = 0). This means that we can understand the consequences of imperfect compliance by comparing Theorem 3.1 and Bugni et al. (2019, Theorem 3.1). First and foremost, we note that under imperfect compliance the probability limit of β̂_sat is no longer the ATE, but rather the LATE. Second, we note that imperfect compliance introduces significant changes to the asymptotic variance of β̂_sat. Imperfect compliance not only changes the expressions of V_Y^sat = V_{Y,1}^sat + V_{Y,0}^sat and V_H^sat, but it also adds two new terms, V_{D,1}^sat and V_{D,0}^sat. All of this implies that the consistent estimator of V_sat proposed in Bugni et al. (2019, Theorem 3.3) no longer applies, and a new one is required. We do this in Theorem 3.2.

Third, it is notable that Assumption 2.3 is not required to derive Theorem 3.1. In other words, the details of the CAR mechanism are not relevant to the asymptotic distribution of β̂_sat. This was pointed out in the case of perfect compliance by Bugni et al. (2019), and Theorem 3.1 reveals that it also extends to the present setup.

Fourth, we note that Theorem 3.1 allows for π_{D(0)}(s) = P(AT | S = s) = 0 or 1 − π_{D(1)}(s) = P(NT | S = s) = 0, but this requires a mild abuse of notation.
If π_{D(0)}(s) = P(AT | S = s) = 0, the mean and the variance of {Y(1) | AT, S = s} are not properly defined, but we can set V[Y(1) | AT, S = s] π_{D(0)}(s) = 0 and E[Y(1) | AT, S = s] π_{D(0)}(s) = 0. Similarly, if 1 − π_{D(1)}(s) = P(NT | S = s) = 0, the mean and the variance of {Y(0) | NT, S = s} are not properly defined, but we can set V[Y(0) | NT, S = s](1 − π_{D(1)}(s)) = 0 and E[Y(0) | NT, S = s](1 − π_{D(1)}(s)) = 0. In particular, in the special case of perfect compliance (i.e., π_{D(1)}(s) = 1, π_{D(0)}(s) = 0), Theorem 3.1 then holds with

β = E[Y(1) − Y(0)],
V_sat = Σ_{s∈S} p(s) ( V[Y(1) | S = s]/π_A(s) + V[Y(0) | S = s]/(1 − π_A(s)) + (E[Y(1) − Y(0) | S = s] − β)^2 ),

which can be shown to coincide with the corresponding result in Bugni et al. (2019, Section 5).

As promised earlier, the next result provides a consistent estimator of V_sat.

Theorem 3.2 (Estimator of SAT asy. variance). Suppose that Assumptions 2.1 and 2.2 hold.
Define the following estimators:

V̂_1^sat ≡ (1/P̂(C)^2) Σ_{s∈S} (n(s)/n_A(s))^2 [ (1/n) Σ_{i=1}^n I{D_i = 1, A_i = 1, S_i = s} (û_i + (1 − n_AD(s)/n_A(s))(β̂_sat(s) − β̂_sat))^2 + (1/n) Σ_{i=1}^n I{D_i = 0, A_i = 1, S_i = s} (û_i − (n_AD(s)/n_A(s))(β̂_sat(s) − β̂_sat))^2 ],

V̂_0^sat ≡ (1/P̂(C)^2) Σ_{s∈S} (n(s)/(n(s) − n_A(s)))^2 [ (1/n) Σ_{i=1}^n I{D_i = 1, A_i = 0, S_i = s} (û_i + (1 − (n_D(s) − n_AD(s))/(n(s) − n_A(s)))(β̂_sat(s) − β̂_sat))^2 + (1/n) Σ_{i=1}^n I{D_i = 0, A_i = 0, S_i = s} (û_i − ((n_D(s) − n_AD(s))/(n(s) − n_A(s)))(β̂_sat(s) − β̂_sat))^2 ],

V̂_H^sat ≡ (1/P̂(C)^2) Σ_{s∈S} (n(s)/n) (n_AD(s)/n_A(s) − (n_D(s) − n_AD(s))/(n(s) − n_A(s)))^2 (β̂_sat(s) − β̂_sat)^2,  (3.6)

where {û_i}_{i=1}^n are the SAT IV-regression residuals, given by

û_i ≡ Y_i − Σ_{s∈S} I{S_i = s} γ̂_sat(s) − Σ_{s∈S} D_i I{S_i = s} β̂_sat(s),  (3.7)

and (γ̂_sat(s), β̂_sat(s)), β̂_sat, and P̂(C) are as in (3.1), (3.4), and (3.3), respectively. Then,

V̂_sat ≡ V̂_1^sat + V̂_0^sat + V̂_H^sat →p V_sat.  (3.8)

We can propose hypothesis tests for the LATE by combining Theorems 3.1 and 3.2. For completeness, this is recorded in the next result.

Theorem 3.3 (SAT test). Suppose that Assumptions 2.1 and 2.2 hold, and that V_sat > 0. For the problem of testing (2.2) at level α ∈ (0, 1), consider the following hypothesis testing procedure:

φ_n^sat(X(n)) ≡ I{ |√n(β̂_sat − β_0)/√V̂_sat| > z_{1−α/2} },

where z_{1−α/2} is the (1 − α/2)-quantile of N(0, 1). Then, lim_{n→∞} E[φ_n^sat(X(n))] = α whenever H_0 in (2.2) holds, i.e., β = β_0.
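To make the construction of this section concrete: the point estimator combines per-stratum Wald ratios with the estimated complier shares from (3.3), and the test compares a studentized statistic with a normal quantile. The sketch below is ours (function names and toy data included); the variance estimator V̂_sat of Theorem 3.2 is taken as an input rather than re-implemented.

```python
import math

def sat_late(y, d, a, strata):
    """Sketch of beta_hat_sat in (3.4): per-stratum Wald ratios beta_hat(s),
    weighted by the estimated complier shares P_hat(S = s | C) of (3.3)."""
    beta_s, p_sc = {}, {}
    n = len(y)
    for s in set(strata):
        idx = [i for i, si in enumerate(strata) if si == s]
        t = [i for i in idx if a[i] == 1]
        c = [i for i in idx if a[i] == 0]
        mean = lambda grp, v: sum(v[i] for i in grp) / len(grp)
        first_stage = mean(t, d) - mean(c, d)   # pi_hat_D(1)(s) - pi_hat_D(0)(s)
        beta_s[s] = (mean(t, y) - mean(c, y)) / first_stage
        p_sc[s] = (len(idx) / n) * first_stage  # P_hat(S = s, C), cf. (3.3)
    p_c = sum(p_sc.values())                    # P_hat(C)
    return sum((p_sc[s] / p_c) * beta_s[s] for s in beta_s)

def sat_test(beta_hat, beta_0, v_hat, n, z=1.959963984540054):
    """Theorem 3.3 at alpha = 0.05: reject H0 when the studentized statistic
    exceeds the 0.975 normal quantile; v_hat plays the role of V_hat_sat."""
    return abs(math.sqrt(n) * (beta_hat - beta_0) / math.sqrt(v_hat)) > z

strata = ["x"] * 4 + ["z"] * 4
a = [1, 1, 0, 0, 1, 1, 0, 0]
d = [1, 1, 0, 0, 1, 0, 0, 0]                   # stratum "z" has imperfect take-up
y = [2.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
# Stratum "x": Wald = 2, complier weight 2/3; stratum "z": Wald = 4, weight 1/3.
```

With this toy data the estimator returns (2/3)·2 + (1/3)·4 = 8/3, illustrating that β̂_sat is a complier-share-weighted average of the strata-specific LATE estimates.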
4 The SFE IV regression

In this section, we consider the asymptotic properties of an IV estimator of the LATE based on a linear regression model of the outcome of interest on a full set of indicators for all strata and the treatment decision, where the latter is instrumented with the treatment assignment. We refer to this as the SFE IV regression. Under certain conditions, we show that this SFE IV regression consistently estimates the LATE. We show that this estimator is asymptotically normal and we characterize its asymptotic variance in terms of the primitive parameters of the RCT. We also propose a consistent estimator of this asymptotic variance by using the results of the SAT IV regression in Section 3. This allows us to propose hypothesis tests for the LATE that are asymptotically exact, i.e., their limiting rejection probability under the null hypothesis is equal to the nominal level.

In terms of our notation, the SFE IV regression is the result of regressing Y_i on {I{S_i = s} : s ∈ S} and D_i. Since the treatment decision D_i is endogenously decided by the RCT participant, we instrument it with the exogenous treatment assignment A_i. To define this IV estimator precisely, set

• Y_n = {Y_i : i = 1, . . . , n}′,
• X_n^sfe = {X_i^sfe : i = 1, . . . , n}′ with X_i^sfe = {{I{S_i = s} : s ∈ S}′, D_i},
• Z_n^sfe = {Z_i^sfe : i = 1, . . . , n}′ with Z_i^sfe = {{I{S_i = s} : s ∈ S}′, A_i}.

The IV estimators of the coefficients in the SFE regression are

{{γ̂_sfe(s) : s ∈ S}′, β̂_sfe}′ ≡ (Z_n^sfe′ X_n^sfe)^{−1} (Z_n^sfe′ Y_n),
where γ̂^{sfe}(s) corresponds to the IV estimator of the coefficient on I{S_i = s} and β̂^{sfe} corresponds to the IV estimator of the coefficient on D_i.

Under Assumptions 2.1 and 2.2, Theorem A.5 in the appendix shows that

\[
\hat\beta^{sfe} \overset{p}{\to} \sum_{s \in \mathcal S} \omega(s)\, E[Y(1) - Y(0) \mid C, S = s],
\]

where {ω(s)}_{s∈S} are non-negative weights defined by

\[
\omega(s) \equiv \frac{\pi_A(s)(1 - \pi_A(s))\, P(C, S = s)}{\sum_{\tilde s \in \mathcal S} \pi_A(\tilde s)(1 - \pi_A(\tilde s))\, P(C, S = \tilde s)}. \tag{4.2}
\]

These equations show that β̂^{sfe} is not necessarily a consistent estimator of the LATE under Assumptions 2.1 and 2.2. By inspecting (4.2), it follows that consistency is restored provided the treatment propensity does not vary across strata, i.e., Assumption 2.3(c). For this reason, we maintain this condition for the remainder of this section.

The following result shows that β̂^{sfe} is a consistent and asymptotically normal estimator of the LATE, and characterizes its asymptotic distribution in terms of primitive parameters of the RCT.

Theorem 4.1 (SFE main result). Suppose that Assumptions 2.1 and 2.3 hold. Then,

\[
\sqrt n\,(\hat\beta^{sfe} - \beta) \overset{d}{\to} N(0, V^{sfe}),
\]

where β ≡ E[Y(1) − Y(0) | C] and V^{sfe} ≡ V^{sat} + V_A^{sfe} with

\[
V_A^{sfe} \equiv \frac{(1 - 2\pi_A)^2}{P(C)^2\, \pi_A (1 - \pi_A)} \sum_{s \in \mathcal S} p(s)\, \tau(s)\, \big(\pi_{D(1)}(s) - \pi_{D(0)}(s)\big)^2 \big(E[Y(1) - Y(0) \mid C, S = s] - \beta\big)^2, \tag{4.3}
\]

with P(C) and V^{sat} as defined in (3.5).

We now give several remarks about Theorem 4.1. First, we note that β̂^{sfe} is related to the IV estimator β̂ considered by Ansel et al. (2018). In fact, these two estimators coincide if one specifies their covariate X_i as a full vector of strata dummies.
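The weights ω(s) in (4.2) can be computed directly from the strata-level propensities and complier masses. A small sketch (the input values below are hypothetical, chosen only to illustrate the formula):

```python
def sfe_weights(pi_a, p_c):
    """Weights omega(s) from (4.2): proportional to pi_A(s)(1 - pi_A(s)) P(C, S = s).

    pi_a : dict mapping stratum s -> pi_A(s)
    p_c  : dict mapping stratum s -> P(C, S = s)
    """
    raw = {s: pi_a[s] * (1.0 - pi_a[s]) * p_c[s] for s in pi_a}
    total = sum(raw.values())
    return {s: w / total for s, w in raw.items()}
```

When π_A(s) is constant across strata (Assumption 2.3(c)), the factor π_A(s)(1 − π_A(s)) cancels, the weights reduce to P(C, S = s)/P(C), and the probability limit Σ_s ω(s) E[Y(1) − Y(0) | C, S = s] becomes the LATE.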
As pointed out in Section 3, specifying the covariates in the regression allows us to obtain a closed-form expression for the asymptotic variance of β̂^{sfe} in terms of the primitive parameters of the RCT.

As we did in Section 3 with the SAT IV regression, we can analyze the consequences of imperfect compliance for the SFE IV regression by comparing Theorem 4.1 with Bugni et al. (2018, Theorem 4.3) or Bugni et al. (2019, Theorem 4.1). First, note that under imperfect compliance the probability limit of β̂^{sfe} is not the ATE, but rather the LATE. Second, imperfect compliance introduces significant changes to the asymptotic variance of β̂^{sfe}. This implies that the consistent estimators of V^{sfe} proposed in Bugni et al. (2018, Section 4.2) or Bugni et al. (2019, Theorem 4.2) do not apply, and a new one is required. We provide this in Theorem 4.2.

Third, we note that Theorem 4.1 relies on Assumption 2.3, which is stronger than Assumption 2.2 used to derive Theorem 3.1. First, and as discussed earlier, Assumption 2.3(c) is important to guarantee that β̂^{sfe} is a consistent estimator of the LATE. Second, the derivation of the asymptotic distribution of β̂^{sfe} relies on the details about the CAR mechanism provided in Assumption 2.3(b). These types of details were not required to derive the asymptotic distribution of β̂^{sat} in Theorem 3.1.

Fourth, it is relevant to note that V^{sfe} − V^{sat} = V_A^{sfe} ≥ 0, which reveals that β̂^{sat} is weakly more efficient than β̂^{sfe}. In particular, both estimators have the same asymptotic distribution if and only if V_A^{sfe} = 0. By inspecting (4.3), this occurs if the RCT is implemented with either π_A = 1/2 or τ(s) = 0 for all s ∈ S (e.g., by using SBR as described in Example 2.2).

Fifth, we note that Theorem 4.1 allows for π_{D(0)}(s) = P(AT | S = s) = 0 or 1 − π_{D(1)}(s) = P(NT | S = s) = 0, by using the same abuse of notation as in Section 3. In particular, in the special case of perfect compliance (i.e., π_{D(1)}(s) = 1 and π_{D(0)}(s) = 0), Theorem 4.1 then holds with β = E[Y(1) − Y(0)] and

\[
V^{sfe} = \sum_{s \in \mathcal S} p(s) \Big[ \frac{V[Y(1) \mid S = s]}{\pi_A} + \frac{V[Y(0) \mid S = s]}{1 - \pi_A} + \Big( 1 + \frac{\tau(s)(1 - 2\pi_A)^2}{\pi_A(1 - \pi_A)} \Big) \big(E[Y(1) - Y(0) \mid S = s] - \beta\big)^2 \Big],
\]

which coincides with the corresponding results in Bugni et al. (2018, Section 4.2) and Bugni et al. (2019, Section 5).

As promised earlier, the next result provides a consistent estimator of V^{sfe}.

Theorem 4.2 (Estimator of SFE asy. variance). Assume Assumptions 2.1 and 2.3. Define the following estimator:

\[
\hat V_A^{sfe} \equiv \hat P(C)^{-2} \sum_{s \in \mathcal S} \frac{n(s)}{n}\, \tau(s)\, \frac{\big(1 - 2\, n_A(s)/n(s)\big)^2}{\big(n_A(s)/n(s)\big)\big(1 - n_A(s)/n(s)\big)} \Big[ \frac{n_{AD}(s)}{n_A(s)} - \frac{n_D(s) - n_{AD}(s)}{n(s) - n_A(s)} \Big]^{2} \big(\hat\beta^{sat}(s) - \hat\beta^{sat}\big)^{2}, \tag{4.4}
\]

where (β̂^{sat}(s) : s ∈ S), β̂^{sat}, and P̂(C) are as in (3.1), (3.4), and (3.3), respectively. Then, V̂^{sfe} = V̂^{sat} + V̂_A^{sfe} →_p V^{sfe}, where V̂^{sat} is as in (3.6).

To conclude the section, we can propose hypothesis tests for the LATE by combining Theorems 4.1 and 4.2. For completeness, this is recorded in the next result.
Theorem 4.3 (SFE test). Suppose that Assumptions 2.1 and 2.3 hold, and that V^{sfe} > 0. For the problem of testing (2.2) at level α ∈ (0, 1), consider the following hypothesis testing procedure:

\[
\phi^{sfe}_n(X^{(n)}) \equiv I\Bigg\{ \Bigg| \frac{\sqrt n\,(\hat\beta^{sfe} - \beta_0)}{\sqrt{\hat V^{sfe}}} \Bigg| > z_{1-\alpha/2} \Bigg\},
\]

where z_{1−α/2} is the (1 − α/2)-quantile of N(0, 1). Then, lim_{n→∞} E[φ^{sfe}_n(X^{(n)})] = α whenever H_0 in (2.2) holds, i.e., β = β_0.

We now consider the asymptotic properties of an IV estimator of the LATE based on a linear regression model of the outcome of interest on a constant and the treatment decision, where the latter is instrumented with the treatment assignment. We refer to this as the 2S IV regression. Under certain conditions, the 2S IV regression consistently estimates the LATE. We show that this estimator is asymptotically normal, and we characterize its asymptotic variance in terms of the primitive parameters of the RCT. We also propose a consistent estimator of this asymptotic variance by using the results of the SAT IV regression in Section 3. This allows us to propose hypothesis tests for the LATE that are asymptotically exact, i.e., their limiting rejection probability under the null hypothesis equals the nominal level.

In terms of our notation, the 2S IV regression is the result of regressing Y_i on 1 and D_i. Since the treatment decision D_i is endogenously decided by the RCT participant, we instrument it with the exogenous treatment assignment A_i. To define this IV estimator precisely, set

• Y_n = {Y_i : i = 1, …, n}′,
• X_n = {X_i : i = 1, …, n}′ with X_i = (1, D_i),
• Z_n = {Z_i : i = 1, …, n}′ with Z_i = (1, A_i).

The estimators of the coefficients in the 2S IV regression are

\[
(\hat\gamma, \hat\beta)' \equiv (Z_n' X_n)^{-1} (Z_n' Y_n). \tag{5.1}
\]
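With a binary instrument, the 2×2 system (Z′X)^{-1}(Z′Y) for the 2S IV regression can be solved in closed form, and the resulting β̂ is the familiar Wald ratio. A self-contained sketch of that computation (the function name is ours):

```python
def iv_2s(y, d, a):
    """2S IV estimate: regress y on (1, d), instrumenting d with a.

    Solves (Z'X)^{-1} (Z'Y) for Z_i = (1, A_i), X_i = (1, D_i)
    via the explicit 2x2 inverse. Returns (gamma_hat, beta_hat).
    """
    n = len(y)
    # Z'X = [[n, sum d], [sum a, sum a*d]],  Z'Y = [sum y, sum a*y]
    ztx = [[n, sum(d)], [sum(a), sum(di * ai for di, ai in zip(d, a))]]
    zty = [sum(y), sum(yi * ai for yi, ai in zip(y, a))]
    det = ztx[0][0] * ztx[1][1] - ztx[0][1] * ztx[1][0]
    gamma = (ztx[1][1] * zty[0] - ztx[0][1] * zty[1]) / det
    beta = (ztx[0][0] * zty[1] - ztx[1][0] * zty[0]) / det
    return gamma, beta
```

For binary A_i, `beta` equals (mean of Y among A = 1 minus mean among A = 0) divided by (mean of D among A = 1 minus mean among A = 0), i.e., the Wald estimator.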
where γ̂ corresponds to the IV estimator of the coefficient on the constant and β̂ corresponds to the IV estimator of the coefficient on D_i.

Under Assumptions 2.1 and 2.2, Theorem A.6 in the appendix shows that

\[
\hat\beta \overset{p}{\to} \frac{ \sum_{s \in \mathcal S} p(s) \Big\{ [\pi_A(s) - P(A)]\, \pi_{D(0)}(s)\, E[Y(1) \mid AT, S = s] + [\pi_A(s) - P(A)](1 - \pi_{D(1)}(s))\, E[Y(0) \mid NT, S = s] + (1 - P(A))\, \pi_A(s)(\pi_{D(1)}(s) - \pi_{D(0)}(s))\, E[Y(1) \mid C, S = s] - P(A)(1 - \pi_A(s))(\pi_{D(1)}(s) - \pi_{D(0)}(s))\, E[Y(0) \mid C, S = s] \Big\} }{ (1 - P(A)) \big( \sum_{s \in \mathcal S} p(s)\, \pi_A(s)\, \pi_{D(1)}(s) \big) - P(A) \big( \sum_{s \in \mathcal S} p(s)(1 - \pi_A(s))\, \pi_{D(0)}(s) \big) }, \tag{5.2}
\]

with P(A) ≡ Σ_{s∈S} p(s) π_A(s). This equation reveals that β̂ is not necessarily a consistent estimator of the LATE under Assumptions 2.1 and 2.2. However, if we additionally impose Assumption 2.3(c), it follows that β̂ becomes a consistent estimator of the LATE. For this reason, we maintain this condition for the remainder of this section.

The following result shows that β̂ is a consistent and asymptotically normal estimator of the LATE, and characterizes its asymptotic distribution in terms of primitive parameters of the RCT.

Theorem 5.1 (2S main result). Suppose that Assumptions 2.1 and 2.3 hold.
Then,

\[
\sqrt n\,(\hat\beta - \beta) \overset{d}{\to} N(0, V),
\]

where β ≡ E[Y(1) − Y(0) | C] and V ≡ V^{sat} + V_A with

\[
V_A = \sum_{s \in \mathcal S} \frac{p(s)\, \tau(s)}{\pi_A(1 - \pi_A)\, P(C)^2} \Big[ \big(\pi_A \pi_{D(0)}(s) + (1 - \pi_A)\pi_{D(1)}(s)\big)\big(E[Y(1) - Y(0) \mid C, S = s] - \beta\big) + \pi_{D(1)}(s)\, E[Y(0) \mid C, S = s] - \pi_{D(0)}(s)\, E[Y(1) \mid C, S = s] + \pi_{D(0)}(s)\, E[Y(1) \mid AT, S = s] + (1 - \pi_{D(1)}(s))\, E[Y(0) \mid NT, S = s] - \sum_{\tilde s \in \mathcal S} p(\tilde s) \Big( \big(\pi_A \pi_{D(1)}(\tilde s) + (1 - \pi_A)\pi_{D(0)}(\tilde s)\big)\big(E[Y(1) - Y(0) \mid C, S = \tilde s] - \beta\big) + \pi_{D(1)}(\tilde s)\, E[Y(0) \mid C, S = \tilde s] - \pi_{D(0)}(\tilde s)\, E[Y(1) \mid C, S = \tilde s] + \pi_{D(0)}(\tilde s)\, E[Y(1) \mid AT, S = \tilde s] + (1 - \pi_{D(1)}(\tilde s))\, E[Y(0) \mid NT, S = \tilde s] \Big) \Big]^{2}, \tag{5.3}
\]

with P(C) and V^{sat} as defined in (3.5).

Several remarks about Theorem 5.1 are in order. First, we note that β̂ coincides with the IV estimator considered by Ansel et al. (2018). Relative to their results, we provide a closed-form expression for the asymptotic variance of β̂ in terms of the primitive parameters of the RCT.

As we have done in the previous sections, we can analyze the consequences of imperfect compliance for the 2S IV regression by comparing Theorem 5.1 with Bugni et al. (2018, Theorem 4.1). First, note that under imperfect compliance the probability limit of β̂ is not the ATE, but rather the LATE. Second, imperfect compliance introduces significant changes to the asymptotic variance of β̂. This implies that the consistent estimator of V proposed in Bugni et al. (2018, Section 4.1) does not apply, and a new one is required. We provide this in Theorem 5.2.

Third, we note that Theorem 5.1 relies on Assumption 2.3, which is stronger than Assumption 2.2 used to derive Theorem 3.1. The argument here is the same as in Section 4. First, Assumption 2.3(c) is important to guarantee that β̂ is a consistent estimator of the LATE.
Second, the derivation of the asymptotic distribution of β̂ relies on the details about the CAR mechanism provided in Assumption 2.3(b).

Fourth, it is relevant to note that V − V^{sat} = V_A ≥ 0, which reveals that β̂^{sat} is weakly more efficient than β̂. In particular, both estimators have the same asymptotic distribution if and only if V_A = 0. By inspecting (5.3), this occurs if the RCT is implemented with τ(s) = 0 for all s ∈ S (e.g., by using SBR as described in Example 2.2). We also note that τ(s) = 0 for all s ∈ S implies that V = V^{sfe} = V^{sat}, while π_A = 1/2 implies V ≥ V^{sfe} = V^{sat}. Other than these special cases, V and V^{sfe} cannot be ordered unambiguously (see Bugni et al. (2018, Remark 4.8) for a similar point in the context of perfect compliance).

Fifth, we note that Theorem 5.1 allows for π_{D(0)}(s) = P(AT | S = s) = 0 or 1 − π_{D(1)}(s) = P(NT | S = s) = 0, by using the same abuse of notation as in Section 3. In the special case of perfect compliance (i.e., π_{D(1)}(s) = 1 and π_{D(0)}(s) = 0), Theorem 5.1 then holds with β = E[Y(1) − Y(0)] and

\[
V = \sum_{s \in \mathcal S} p(s) \Big[ \frac{V[Y(1) \mid S = s]}{\pi_A} + \frac{V[Y(0) \mid S = s]}{1 - \pi_A} + \big(E[Y(1) - Y(0) \mid S = s] - \beta\big)^2 + \tau(s) \Big( \frac{E[Y(1) \mid S = s] - E[Y(1)]}{\pi_A} + \frac{E[Y(0) \mid S = s] - E[Y(0)]}{1 - \pi_A} \Big)^{2} \Big],
\]

which coincides with the result in Bugni et al. (2018, Section 4.1).

As promised earlier, the next result provides a consistent estimator of V.

Theorem 5.2 (Estimator of 2S asy. variance). Assume Assumptions 2.1 and 2.3. Define the following estimator:

\[
\hat V_A \equiv \sum_{s \in \mathcal S} \frac{n(s)}{n} \frac{\tau(s)}{ \frac{n_A(s)}{n(s)} \big(1 - \frac{n_A(s)}{n(s)}\big)\, \hat P(C)^2 } \Bigg[ \Big( \frac{n_A(s)}{n(s)} \frac{n_D(s) - n_{AD}(s)}{n(s) - n_A(s)} + \Big(1 - \frac{n_A(s)}{n(s)}\Big) \frac{n_{AD}(s)}{n_A(s)} \Big) \big(\hat\beta^{sat}(s) - \hat\beta^{sat}\big) - \sum_{\tilde s \in \mathcal S} \frac{n_D(\tilde s)}{n} \big(\hat\beta^{sat}(\tilde s) - \hat\beta^{sat}\big) + \hat\gamma^{sat}(s) - \sum_{\tilde s \in \mathcal S} \frac{n(\tilde s)}{n}\, \hat\gamma^{sat}(\tilde s) \Bigg]^{2}, \tag{5.4}
\]

where (β̂^{sat}(s) : s ∈ S), β̂^{sat}, and P̂(C) are as in (3.1), (3.4), and (3.3), respectively. Then, V̂ = V̂^{sat} + V̂_A →_p V, where V̂^{sat} is as in (3.6).

To conclude the section, we can propose hypothesis tests for the LATE by combining Theorems 5.1 and 5.2.
For completeness, this is recorded in the next result.
Theorem 5.3 (2S test). Suppose that Assumptions 2.1 and 2.3 hold, and that V > 0. For the problem of testing (2.2) at level α ∈ (0, 1), consider the following hypothesis testing procedure:

\[
\phi_n(X^{(n)}) \equiv I\Bigg\{ \Bigg| \frac{\sqrt n\,(\hat\beta - \beta_0)}{\sqrt{\hat V}} \Bigg| > z_{1-\alpha/2} \Bigg\},
\]

where z_{1−α/2} is the (1 − α/2)-quantile of N(0, 1). Then, lim_{n→∞} E[φ_n(X^{(n)})] = α whenever H_0 in (2.2) holds, i.e., β = β_0.

Designing an RCT based on pilot RCT data
This section considers the situation of a researcher who is interested in designing a hypothetical RCT with CAR to estimate the LATE. The researcher is in charge of choosing the parameters of the RCT, which comprise the randomization scheme, the stratification function, and the treatment probability vector. To make these decisions, the researcher observes the results of a previous pilot RCT (also with CAR) conducted on the same population of interest. Throughout this section, we presume that the researcher chooses the parameters of the hypothetical RCT based on the asymptotic efficiency of a LATE estimator derived from the IV regressions.
In this section, we consider a hypothetical RCT with CAR with a given stratification function S : Z → S. Our objective is to study the effect that the randomization scheme has on the asymptotic distribution of the LATE estimators.

Under Assumptions 2.1 and 2.2, Theorem 3.1 reveals that the randomization scheme has no influence on the asymptotic distribution of the LATE estimator based on the SAT IV regression. If we additionally impose Assumption 2.3, then Theorems 4.1 and 5.1 show that the randomization scheme affects the asymptotic variance of the LATE estimators based on the SFE IV and 2S IV regressions via the parameters {τ(s) : s ∈ S}, which appear in V_A^{sfe} and V_A, respectively. For both of these estimators, the optimal choice of the randomization scheme is to set τ(s) = 0 for all s ∈ S. This choice is optimal regardless of the information in the pilot RCT. Under this optimal choice, the asymptotic distributions of the LATE estimators in the SFE IV and 2S IV regressions coincide with that of the LATE estimator in the SAT IV regression, given in Theorem 3.1. In practice, τ(s) = 0 for all s ∈ S can be achieved by implementing the CAR using SBR. Based on these arguments, it is natural to focus the remainder of this section on the case in which the hypothetical RCT has τ(s) = 0 for all s ∈ S.

This section considers the effect of the choice of the stratification function of the hypothetical RCT on the asymptotic variance of the LATE estimators. For the sake of simplicity, we impose some restrictions on the hypothetical RCT under consideration. First, we assume that the hypothetical RCT satisfies Assumptions 2.1 and 2.3 with τ(s) = 0 for all s ∈ S. This choice is motivated by the discussion in Section 6.1, and has the additional benefit that we can describe the asymptotic distribution of all three LATE estimators in a single statement.
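A randomization scheme with τ(s) = 0 can be implemented with a stratified-blocking rule: within each stratum, treat exactly a fixed fraction of the units. A minimal sketch (the function and its interface are ours, as an illustration of the idea rather than the paper's exact SBR scheme):

```python
import random


def sbr_assign(strata, pi_a, seed=0):
    """Stratified block randomization: within each stratum s, assign exactly
    floor(pi_a * n(s)) units to treatment, chosen uniformly at random.

    The within-stratum imbalance n_A(s) - pi_a * n(s) is then bounded by 1,
    so it does not contribute variance asymptotically (tau(s) = 0).
    """
    rng = random.Random(seed)
    a = [0] * len(strata)
    for s in set(strata):
        idx = [i for i, si in enumerate(strata) if si == s]
        rng.shuffle(idx)
        for i in idx[: int(pi_a * len(idx))]:
            a[i] = 1
    return a
```

Under simple random sampling (SRS), by contrast, n_A(s) fluctuates at the √n scale, which is what generates τ(s) > 0.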
The next result establishes that if the strata of the hypothetical RCT become coarser and all else remains equal, then the asymptotic variance of the IV estimators either remains constant or increases.
Theorem 6.1.
Consider two hypothetical RCTs with CAR on the same population and with the same parameters except for the strata function. In particular, both RCTs satisfy Assumptions 2.1 and 2.3, and both have τ(s) = 0 and the same π_A. (Note that Assumption 2.3(c) imposes a constant treatment assignment probability. In principle, we could generalize our analysis at the expense of substantially complicating the notation.) The first RCT has strata function S₁ : Z → S₁, the second RCT has strata function S₂ : Z → S₂, and S₁ is (weakly) finer than S₂, i.e.,

\[
\forall z, z' \in \mathcal Z, \quad S_1(z) = S_1(z') \implies S_2(z) = S_2(z'). \tag{6.1}
\]

Let β̂₁ denote the SAT, SFE, or 2S IV LATE estimator from the first RCT with sample size n₁, and let β̂₂ denote the SAT, SFE, or 2S IV LATE estimator from the second RCT with sample size n₂. Then, as n₁ → ∞ and n₂ → ∞,

\[
\sqrt{n_1}\,(\hat\beta_1 - \beta) \overset{d}{\to} N(0, V_{sat,1}), \qquad \sqrt{n_2}\,(\hat\beta_2 - \beta) \overset{d}{\to} N(0, V_{sat,2}), \tag{6.2}
\]

and

\[
V_{sat,1} \le V_{sat,2}. \tag{6.3}
\]

One implication of Theorem 6.1 is that, all else equal, a finer strata structure is always weakly preferable from the point of view of the asymptotic efficiency of the LATE estimator. Since the strata are defined based on the baseline covariate Z, the finest possible strata structure is one in which each point in the support of Z, denoted by 𝒵, is assigned to its own stratum. This idea would work if Z is discretely distributed and 𝒵 is finite. On the other hand, this conclusion would fail in the common case in which Z takes infinitely many values, as our formal analysis is entirely based on the presumption that S is a finite set.

It is also relevant to connect Theorem 6.1 with the recent work by Tabord-Meehan (2020). In that paper, the author proposes using a randomization procedure referred to as stratification trees in the context of an RCT with perfect compliance.
The main result in Tabord-Meehan (2020) shows that these stratification trees can be used to find an optimal stratification function in the sense of minimizing the asymptotic variance of the corresponding ATE estimator. To derive this result, the author restricts attention to strata functions with a fixed level of complexity or tree depth. Applied to an RCT with perfect compliance, Theorem 6.1 implies that the asymptotic variance cannot increase if we take the optimal stratification tree in Tabord-Meehan (2020) and further divide any of its branches. Note that this finding is compatible with the main results in Tabord-Meehan (2020), as the alternative tree would have a level of complexity that is not allowed in the optimization problem in that paper.

Consider the situation of a researcher who has completed a pilot RCT with CAR. Motivated by Theorem 6.1, a researcher interested in gaining efficiency in the estimation of the LATE would want to run a hypothetical RCT with a finer strata partition than that of the pilot RCT. In this case, the researcher would naturally be interested in estimating the asymptotic variance of the LATE estimator in this hypothetical RCT based only on the pilot RCT data, that is, before implementing the hypothetical RCT. This is precisely the problem addressed in Theorem 6.2. This result provides a consistent estimator of the asymptotic variance of the LATE estimator in the hypothetical RCT with finer strata than the pilot RCT, based on the pilot RCT data. The result follows from showing that the data from the pilot RCT can be reinterpreted as if it belonged to the hypothetical RCT. Theorem 6.2.
Let {(Y_i, Z_i, S^P_i, A_i)}_{i=1}^{n_P} denote data from a pilot RCT that satisfies Assumptions 2.1, 2.2, and 2.3(c), and uses a strata function given by S^P : Z → S^P. Consider a hypothetical RCT on the same population that satisfies Assumptions 2.1 and 2.3, that uses the same π_A as the pilot RCT, τ(s) = 0 for all s ∈ S, and a strata function S : Z → S that is finer than that of the pilot RCT, i.e.,

\[
\forall z, z' \in \mathcal Z, \quad S(z) = S(z') \implies S^P(z) = S^P(z'). \tag{6.4}
\]

We also assume that n^P_A(s)/n^P(s) →_p π_A for all s ∈ S, where n^P_A(s) ≡ Σ_{i=1}^{n_P} I[A_i = 1, S(Z_i) = s] and n^P(s) ≡ Σ_{i=1}^{n_P} I[S(Z_i) = s].

Let β̂ denote the SAT, SFE, or 2S IV LATE estimator from this hypothetical RCT with sample size n. Then, as n → ∞,

\[
\sqrt n\,(\hat\beta - \beta) \overset{d}{\to} N(0, V^{sat}). \tag{6.5}
\]

Furthermore, V^{sat} can be consistently estimated based on {(Y_i, Z_i, S(Z_i), A_i)}_{i=1}^{n_P} as in Theorem 3.2.

This section considers the effect of the choice of the treatment propensity vector {π_A(s) : s ∈ S} on the asymptotic variance of the LATE estimators, for a given stratification function S : Z → S. Under appropriate assumptions, Theorems 3.1, 4.1, and 5.1 provide the asymptotic distribution of our IV LATE estimators for any treatment propensity vector {π_A(s) : s ∈ S}. The following result calculates the optimal treatment propensity in the sense of minimizing their asymptotic variance, and provides a strategy to estimate it based on data from a pilot RCT. Theorem 6.3.
Consider a hypothetical RCT that satisfies Assumptions 2.1 and 2.2. The treatment assignment probability vector that minimizes the asymptotic variance of the SAT IV estimator is {π*_A(s) : s ∈ S} with

\[
\pi^*_A(s) \equiv \Bigg( 1 + \sqrt{ \frac{\Pi_0(s)}{\Pi_1(s)} } \Bigg)^{-1}, \tag{6.6}
\]

where

\[
\Pi_1(s) \equiv V[Y(1) \mid AT, S = s]\, \pi_{D(0)}(s) + V[Y(0) \mid NT, S = s](1 - \pi_{D(1)}(s)) + V[Y(1) \mid C, S = s](\pi_{D(1)}(s) - \pi_{D(0)}(s)) + \big(E[Y(1) \mid C, S = s] - E[Y(1) \mid AT, S = s]\big)^2\, \frac{\pi_{D(0)}(s)(\pi_{D(1)}(s) - \pi_{D(0)}(s))}{\pi_{D(1)}(s)} + \frac{1 - \pi_{D(1)}(s)}{\pi_{D(1)}(s)} \Big( - \pi_{D(0)}(s)\big(E[Y(1) \mid C, S = s] - E[Y(1) \mid AT, S = s]\big) + \pi_{D(1)}(s)\big(E[Y(0) \mid C, S = s] - E[Y(0) \mid NT, S = s]\big) + \pi_{D(1)}(s)\big(E[Y(1) - Y(0) \mid C, S = s] - \beta\big) \Big)^{2}
\]

\[
\Pi_0(s) \equiv V[Y(1) \mid AT, S = s]\, \pi_{D(0)}(s) + V[Y(0) \mid NT, S = s](1 - \pi_{D(1)}(s)) + V[Y(0) \mid C, S = s](\pi_{D(1)}(s) - \pi_{D(0)}(s)) + \big(E[Y(0) \mid C, S = s] - E[Y(0) \mid NT, S = s]\big)^2\, \frac{(1 - \pi_{D(1)}(s))(\pi_{D(1)}(s) - \pi_{D(0)}(s))}{1 - \pi_{D(0)}(s)} + \frac{\pi_{D(0)}(s)}{1 - \pi_{D(0)}(s)} \Big( - (1 - \pi_{D(0)}(s))\big(E[Y(1) \mid C, S = s] - E[Y(1) \mid AT, S = s]\big) + (1 - \pi_{D(1)}(s))\big(E[Y(0) \mid C, S = s] - E[Y(0) \mid NT, S = s]\big) + (1 - \pi_{D(0)}(s))\big(E[Y(1) - Y(0) \mid C, S = s] - \beta\big) \Big)^{2}. \tag{6.7}
\]

Consider a hypothetical RCT that satisfies Assumptions 2.1 and 2.3 with τ(s) = 0 for all s ∈ S. The constant treatment assignment probability that minimizes the asymptotic variance of any of the LATE IV estimators is

\[
\pi^*_A \equiv \Bigg( 1 + \sqrt{ \frac{\sum_{s \in \mathcal S} p(s)\, \Pi_0(s)}{\sum_{\tilde s \in \mathcal S} p(\tilde s)\, \Pi_1(\tilde s)} } \Bigg)^{-1}. \tag{6.8}
\]

Furthermore, (6.6) and (6.8) can be consistently estimated from the data in the pilot RCT. To this end, we propose plug-in estimators of the terms on the right-hand side of (6.6) or (6.8), based on Theorems A.1 and A.4 in the appendix.

Monte Carlo simulations
In this section, we explore the various inference methods proposed in the paper via Monte Carlo simulations. This exercise has multiple goals. First, we hope to show that we can accurately estimate the LATE and the asymptotic variance of the various LATE estimators. Second, we seek to confirm the accuracy of our asymptotic normal approximation by showing that the empirical coverage rate of our proposed confidence intervals is close to the desired coverage level. Third, we use these simulations to explore how the asymptotic variance of the various estimators changes as we vary the parameters of the DGP. In particular, we numerically explore the theoretical predictions in Section 6 regarding the optimal RCT design.

We consider four simulation designs, which we describe in Table 1. Design 1 is our baseline design, where we consider an RCT with five strata and with treatment assignment probabilities and strata-specific LATEs that are constant across strata. Design 2 is similar to our baseline design, but we increase the number of strata from five to ten. Design 3 is similar to our baseline design, but we allow the strata-specific LATE to vary across strata. Finally, Design 4 is similar to Design 3, but we also allow the treatment assignment probabilities to vary by strata.

Table 1: Description of the simulation designs (number of strata |S|, treatment assignment probabilities {π_A(s) : s ∈ S}, and strata-specific LATEs {β(s) : s ∈ S}).
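One replication of an RCT of this type can be sketched as follows. The type probabilities and outcome parameters below are illustrative placeholders, not the values used in the designs of this section:

```python
import random


def simulate_once(n=1000, num_strata=5, pi_a=0.5, seed=0):
    """One RCT replication: draw stratum, compliance type, assignment A (SRS),
    endogenous take-up D, and observed outcome Y.

    All numeric parameters (type probabilities, outcome means, LATE of 1)
    are placeholders for illustration only.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        s = rng.randrange(num_strata)                 # equally likely strata
        typ = rng.choices(["C", "AT", "NT"], weights=[0.6, 0.2, 0.2])[0]
        a = 1 if rng.random() < pi_a else 0           # SRS treatment assignment
        # endogenous take-up: always takers take treatment, never takers never
        # do, and compliers follow the assignment
        d = 1 if typ == "AT" or (typ == "C" and a == 1) else 0
        y0 = rng.gauss(0.0, 1.0)
        y1 = y0 + (1.0 if typ == "C" else 0.5)        # placeholder effects
        y = y1 if d == 1 else y0
        sample.append((y, d, a, s, typ))
    return sample
```

Repeating this across many replications and applying the SAT, SFE, and 2S IV estimators to each sample yields the quantities reported in the tables below.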
For each simulation design, we show the average results of computing the estimators over 5,000 independent replications of the RCT with sample size n = 1,000.

We begin our description of Design 1 by specifying the features known to the researcher implementing the RCT. This researcher knows that there are five strata (i.e., |S| = 5), and that RCT participants are assigned to treatment or control by using either SRS or SBR with constant treatment assignment probabilities equal to π_A(s) = 1/2 for all s ∈ S.

We next describe those aspects of Design 1 unknown to the researcher implementing the RCT. First, we make all strata equally likely, i.e., p(s) = 1/5 for all s ∈ S. Second, RCT participants belong to each compliance type according to i.i.d. draws of a multinomial distribution with

P(C | S = s) = 0.…, P(AT | S = s) = 0.…, and P(NT | S = s) = 0.…. (7.1)

Third, conditional on their type and their strata, the RCT participants have potential outcomes that are i.i.d. draws from a normal distribution with

{E(Y(0) | C, S = s) : s ∈ S} = [0, …], {E(Y(1) | C, S = s) : s ∈ S} = [1, …], {E(Y(0) | NT, S = s) : s ∈ S} = [−…], {E(Y(1) | AT, S = s) : s ∈ S} = [2, …, 3] (7.2)

and

V(Y(0) | C, S = s) = 0.…, V(Y(1) | C, S = s) = 3, V(Y(0) | NT, S = s) = V(Y(1) | AT, S = s) = 1. (7.3)

Note that (7.2) implies that the strata-specific LATE is β(s) = E(Y(1) − Y(0) | C, S = s) = 1 and, thus, the LATE is β = 1. Also, while the distribution of treatment effects for compliers is homogeneous across strata, the corresponding distributions for always takers and never takers are not.

The simulation results for Design 1 are provided in Table 2. The results reveal that all the proposed estimators are very close to the true LATE (equal to one). Across our simulations, the bias is almost zero and the squared error of estimation is close to the asymptotic variance. We now describe the behavior of the asymptotic variance of these estimators. As shown by our formal results, the asymptotic variance is constant under SBR (i.e., τ(s) = 0), and the asymptotic variance of the SAT IV estimator is the same across CAR mechanisms. The homogeneity of the distribution of treatment effects for compliers across strata implies that V_A^{sfe} = 0 and, thus, the asymptotic variance of the SFE IV estimator is also the same across CAR mechanisms. Finally, the heterogeneity of the distribution of potential outcomes for never takers and always takers causes V_A > 0. This explains why the asymptotic variance of the 2S IV estimator is higher under SRS than under SBR. It is relevant to note that, for all estimators, the asymptotic variance under SBR is smaller than or equal to that under SRS, as demonstrated in Section 6.1. In all cases, the average value of our proposed asymptotic variance estimate is very close to the true asymptotic variance. From these results and the fact that the normal asymptotic approximation is accurate, it follows that the empirical coverage rate of the true LATE is very close to the desired coverage rate of 95%.

The simulation results in Table 2 were obtained using a treatment assignment probability of π_A(s) = 1/2 for all s ∈ S.
The most efficient LATE estimator in Table 2 is the SAT IV estimator, which has an asymptotic variance equal to 15.5408. Section 6.3 shows how to improve on the efficiency of this estimator by optimizing the treatment assignment probabilities. According to Theorem 6.3, the optimal treatment assignment probability vector is {π*_A(s) : s ∈ S} = (0.…, …) and the optimal constant treatment assignment probability is π*_A = 0.…. In terms of "effective sample size," using π_A(s) = 1/2 for all s ∈ S is approximately equivalent to discarding 5.93% of the sample relative to the optimal constant treatment assignment probability and 5.94% of the sample relative to the optimal treatment assignment probability vector.

As discussed in earlier sections, the ITT is equal to the LATE multiplied by P(C) = 0.…. We have also conducted simulations in a case in which there is homogeneity across strata for all types. Those results are omitted for brevity and available upon request. The heterogeneity across strata for never takers and always takers is the main difference with those simulations: in that case, V_A = 0 and thus all estimators under consideration have the same asymptotic variance under SRS and SBR.

Table 2: Simulation results over 5,000 replications of Design 1 with sample size n = 1,000. "Estimator" refers to β̂^{sat}, β̂^{sfe}, or β̂; "Avg. est." denotes the average LATE estimate over simulations; "Avg. SE" denotes the average squared error of estimation over simulations; "AVar." denotes the asymptotic variance of the LATE estimator; "Avg. AVar. est." denotes the average asymptotic variance estimate over simulations; and "Coverage" denotes the coverage rate of the true LATE over the simulations, with desired coverage rate 1 − α = 95%.

In Section 6.2, we showed that the asymptotic variance of the proposed LATE estimators does not increase if the strata of the RCT become finer and all else remains equal. We explore this prediction in Design 2, where we split each stratum in Design 1 into two equally-sized strata. This produces a DGP with ten strata (i.e., |S| = 10). As in Design 1, RCT participants are still assigned to treatment or control by using either SRS or SBR with constant treatment assignment probabilities equal to π_A(s) = 1/2. We set p(s) = 1/10 for all s ∈ S, and RCT participants are assigned to types according to i.i.d. draws of a multinomial distribution with probabilities as in (7.1). Conditional on their type and their strata, the RCT participants have potential outcomes that are i.i.d. draws from a normal distribution with

{E(Y(0) | C, S = s) : s ∈ S} = [−…], {E(Y(1) | C, S = s) : s ∈ S} = [0.…], {E(Y(0) | NT, S = s) : s ∈ S} = [−…], {E(Y(1) | AT, S = s) : s ∈ S} = [1.…, 5] (7.4)

and

V(Y(0) | C, S = s) = 0.…, V(Y(1) | C, S = s) = 2.…, V(Y(0) | NT, S = s) = V(Y(1) | AT, S = s) = 0.…. (7.5)

The parameters of the DGP in Design 2 result from splitting each stratum in Design 1 into two equally-sized strata. In particular, the law of iterated expectations and the law of total variance imply that (7.4) and (7.5) are compatible with (7.2) and (7.3). Note also that (7.4) implies that the strata-specific LATE is β(s) = E(Y(1) − Y(0) | C, S = s) = 1 and, thus, the LATE is β = 1.

The simulation results for Design 2 are provided in Table 3. These results are qualitatively similar to those in Table 2. The estimators of the LATE and the various asymptotic variances are extremely accurate, and the empirical coverage rate is extremely close to the desired coverage level. The main noticeable difference between the tables is that the asymptotic variance of each estimator in Design 2 is smaller than the corresponding one in Design 1, as expected from Theorem 6.1. For example, the asymptotic variance of the SAT IV estimator is 15.5408 in the RCT with five strata and becomes 13.5 in the RCT with ten strata. In terms of "effective sample size," this means that using five strata is approximately equivalent to discarding 15.91% of the sample relative to using ten strata.

We can also consider optimizing the treatment assignment probability in these simulations. Focusing on the SAT IV estimator, the optimal treatment assignment probability vector yields an asymptotic variance of 12.4249, and the optimal constant treatment assignment probability yields an asymptotic variance of 14.3473. In terms of "effective sample size," this means that using π_A(s) = 1/2 for all s ∈ S is approximately equivalent to discarding 8.19% of the sample relative to the optimal constant treatment assignment probability and 8.22% of the sample relative to the optimal treatment assignment probability vector.

Table 3: Simulation results over 5,000 replications of Design 2 with sample size n = 1,000. Columns are defined as in Table 2.

Relative to Design 1, Design 3 introduces heterogeneity in the strata-specific treatment effects. In terms of the features observable to the researcher, however, Design 3 is identical to Design 1. There are five strata (i.e., |S| = 5), and RCT participants are assigned to treatment or control by using either SRS or SBR with constant treatment assignment probabilities equal to π_A(s) = 1/2 for all s ∈ S.

We now describe the features of Design 3 that are unobserved by the researcher. As in Design 1, all strata are equally likely, i.e., p(s) = 1/5 for all s ∈ S, and RCT participants are assigned to types according to i.i.d. draws of a multinomial distribution with probabilities as in (7.1). Conditional on their type and their strata, RCT participants have potential outcomes that are i.i.d. draws from a normal distribution. The conditional variance of this distribution is as in (7.3). The conditional mean of this distribution for never takers and always takers is as in (7.2), while that for compliers is as follows:

{E(Y(0) | C, S = s) : s ∈ S} = [0, 0.…], {E(Y(1) | C, S = s) : s ∈ S} = [−…]. (7.6)

Note also that (7.6) implies that the strata-specific LATE is {β(s) : s ∈ S} = {E(Y(1) − Y(0) | C, S = s) : s ∈ S} = [−…]. Since P(C | S = s) = 0.… and p(s) = 1/5 for all s ∈ S, it follows that the LATE is (still) β = 1.

The simulation results for Design 3 are provided in Table 4. Most of the results are qualitatively similar to those in previous simulations.
The estimators of the LATE and the various asymptotic variances are extremely accurate, and the empirical coverage rate is extremely close to the desired coverage level. As predicted by our results, all LATE IV estimators have the same asymptotic variance when SBR is used. The main difference with respect to the results in Design 1 appears when SRS is used. In Design 1, the SAT and SFE IV estimators have the same asymptotic variance under SRS. In Design 3, the SFE IV estimator has a larger asymptotic variance than the SAT IV estimator when SRS is used. This is compatible with Theorem 4.1, as the heterogeneity of the strata-specific treatment effect in Design 3 implies that V_A^sfe > V_A^sat.

Table 4: Simulation results over 5,000 replications of Design 3 with sample size n = 1,000, where the LATE estimator is ˆβ_sat, ˆβ_sfe, or ˆβ_2s. "Avg. est." denotes the average LATE estimate over simulations, "Avg. SE" denotes the average squared error of estimation over simulations, "AVar." denotes the asymptotic variance of the LATE estimator, "Avg. AVar. est." denotes the average asymptotic variance estimate over simulations, and "Coverage" denotes the coverage rate of the true LATE over the simulations with desired coverage rate of 1 − α = 95%.

Design 4 considers a researcher who implements CAR with a heterogeneous treatment assignment probability. The RCT in Design 4 has five strata, just like in Designs 1 and 3. Unlike all previous designs, RCT participants are assigned into treatment or control by using either SRS or SBR with a treatment assignment probability vector {π_A(s) : s ∈ S} that varies across strata, as specified in (7.7).

We now describe the features of Design 4 that are unobserved to the researcher. First, all strata are equally likely, i.e., p(s) = 1/5 for all s ∈ S. Second, RCT participants are assigned into types according to i.i.d. draws of a multinomial distribution with probabilities {P(AT|S = s) : s ∈ S}, {P(NT|S = s) : s ∈ S}, and {P(C|S = s) : s ∈ S} as specified in (7.8). Conditional on their type and their stratum, RCT participants have potential outcomes that are i.i.d. draws from a normal distribution. Its conditional variance is as in (7.3), while its conditional means for compliers, never takers, and always takers are as specified in (7.9). The parameters in (7.9) imply that the strata-specific LATE {β(s) : s ∈ S} = {E(Y(1) − Y(0)|C, S = s) : s ∈ S} varies across strata. Combining these with {P(C|S = s) : s ∈ S} in (7.8) and p(s) = 1/5 for all s ∈ S, we can verify that the LATE is (still) β = 1.

The simulation results for Design 4 are provided in Table 5. Given the heterogeneity of the treatment assignment probability in (7.7), neither the SFE IV estimator nor the 2S IV estimator is guaranteed to be consistent for the LATE. In fact, under our current conditions, Theorem A.5 in the appendix implies that ˆβ_sfe converges in probability to a limit different from β = 1, and Theorem A.6 in the appendix implies that ˆβ_2s also converges in probability to a limit different from β = 1. Note that this occurs for both CAR mechanisms. Since these estimators are not consistent for the parameter of interest, there is no point in discussing their asymptotic variance. If we blindly apply our inference method, it is then not surprising that the empirical coverage rates for both of these IV estimators are significantly below the desired coverage level. In contrast, the SAT IV estimator remains consistent in this scenario. In addition, the estimator of the asymptotic variance of the SAT IV estimator is very accurate, and the empirical coverage level is extremely close to the desired coverage level. These simulations confirm that inference based on the SAT IV estimator is valid under more general conditions than that based on the other two IV estimators.

Table 5: Simulation results over 5,000 replications of Design 4 with sample size n = 1,000, where the LATE estimator is ˆβ_sat, ˆβ_sfe, or ˆβ_2s. "Avg. est." denotes the average LATE estimate over simulations, "Avg. SE" denotes the average squared error of estimation over simulations, "AVar." denotes the asymptotic variance of the LATE estimator, "Avg. AVar. est." denotes the average asymptotic variance estimate over simulations, and "Coverage" denotes the coverage rate of the true LATE over the simulations with desired coverage rate of 1 − α = 95%. Finally, "NR" indicates that the corresponding asymptotic variance is not relevant, as the conditions for the consistency of the corresponding estimator are not satisfied.

In this section, we consider an empirical illustration based on Dupas et al. (2018). The authors use an RCT to investigate the economic impact of expanding access to basic bank accounts in several countries: Malawi, Uganda, and Chile. These countries differ significantly in their level of development and banking access, with Uganda being the intermediate country in both of these respects. For our illustration, we focus on their RCT conducted in Uganda. The data are publicly available.
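Before turning to the data, the Design 4 contrast between the SAT estimand and a pooled Wald estimand can be reproduced at the population level with a few lines. All numbers below are hypothetical, chosen only to make the wedge visible; for simplicity there are no always takers and never takers have outcome zero.

```python
import numpy as np

# Two strata with different assignment probabilities, complier shares, and
# strata-specific LATEs (hypothetical values). With no always takers and
# zero outcomes for never takers, E[Y|A=0,s] = E[D|A=0,s] = 0.
p = np.array([0.5, 0.5])      # p(s)
pi_a = np.array([0.2, 0.8])   # pi_A(s), heterogeneous as in Design 4
c = np.array([0.8, 0.4])      # P(C|S=s)
beta = np.array([1.0, 3.0])   # strata-specific LATE beta(s)

# SAT estimand = LATE: stratum Wald ratios aggregated with complier weights.
late_sat = (p * c * beta).sum() / (p * c).sum()

# Pooled Wald estimand: unconditional means reweight strata among the treated
# by P(s|A=1) = p(s)*pi_A(s) / sum_s' p(s')*pi_A(s').
p_given_a1 = p * pi_a / (p * pi_a).sum()
pooled = (p_given_a1 * c * beta).sum() / (p_given_a1 * c).sum()

print(late_sat, pooled)  # 5/3 vs 7/3: the pooled Wald misses the LATE
```

The pooled estimand up-weights the high-π_A(s) stratum, so it converges to 7/3 rather than the LATE of 5/3; when π_A(s) is constant across strata, the reweighting disappears and the two estimands coincide.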
We now briefly summarize the empirical setting; see Dupas et al. (2018) for a more detailed description. Bank accounts are important to daily economic life, but the rate of bank account ownership in developing countries is relatively low compared to developed countries. At the time of the RCT, the authors report that 74% of households in Uganda were unbanked. Among the many benefits of having access to bank accounts, the authors focus on its arguably most primary function: safekeeping.

Dupas et al. (2018) selected a random sample of 2,159 Ugandan households who did not have a bank account in 2011. These households were assigned into a treatment or a control group. Treated households were given a voucher for a free savings account at their nearest bank branch, without any fees for the RCT duration, along with assistance to complete the necessary paperwork required to open this account. Households in the control group were not provided with these vouchers. We use the binary variable A_i ∈ {0, 1} to indicate whether household i was given the voucher for the free savings account. The treatment assignment was stratified by gender, occupation, and bank branch, which generated 41 strata, i.e., s ∈ S = {1, . . . , 41}. Within each stratum, households were randomly assigned to treatment or control using SBR with π_A(s) = 1/2 for all s ∈ S. As a result, 1,080 households were selected to receive the treatment, while the remaining 1,079 were placed in the control group. The households in the sample were re-interviewed three times during 2012–2013. In these follow-up surveys, the authors collected information about their saving behavior and several other related outcomes.

This RCT featured imperfect compliance. While none of the 1,079 households in the control group accessed a free savings account, not every one of the 1,080 households in the treatment group opened their free savings account. Dupas et al. (2018, Table 1) reveal that, out of the 1,080 treated households, only 54% actually opened the bank account, and only 32% made at least one deposit during the RCT. As in Dupas et al. (2018), we are interested in the effect of opening and using the free savings account. We define the binary variable D_i = D_i(A_i) ∈ {0, 1} to indicate whether household i opened and used the free savings account in this RCT.

Given this setup, we use our IV regressions to estimate and conduct inference on the LATE for several outcomes of interest. Dupas et al. (2018) collect information on two types of outcomes: saving stocks and downstream outcomes. For brevity, we select three outcomes: savings in formal financial institutions, savings in cash at home or in a secret place, and expenditures in the last month. These variables were reported by the households in the final survey of the RCT and are measured in 2010 US dollars.

Table 6 presents the results of the IV regressions computed using our Stata package. For each outcome variable, we estimate the LATE of using the savings account based on the SAT, SFE, and 2S IV regressions. Given the setup of the RCT, all of these estimators are consistent. We can also consistently estimate the standard errors of these LATE estimators. Since the RCT uses SBR (i.e., τ(s) = 0 for all s ∈ S), the standard errors for all of these estimators coincide. To illustrate our results, it is also relevant to compare these with the estimates of the standard errors under the incorrect assumption that the RCT used SRS (i.e., τ(s) = 1 for all s ∈ S). (It is relevant to note that the standard Stata command ivregress would presume that the data were collected using SRS.) First, by Theorem 3.1, the standard error of the SAT IV estimator does not depend on the details of the CAR mechanism, so the estimated standard error does not depend on {τ(s) : s ∈ S}.
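For concreteness, the stratum-level aggregation behind a fully saturated IV estimator can be sketched as follows: compute a Wald ratio within each stratum and aggregate with estimated complier-share weights. This is a simulation sketch with hypothetical data (one-sided noncompliance, as in the Uganda RCT), not the Stata package's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def sat_late(y, d, a, strata):
    """Aggregate stratum-level Wald numerators and denominators with weights
    n(s)/n -- one standard form of the fully saturated IV estimator."""
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        itt_y = y[m & (a == 1)].mean() - y[m & (a == 0)].mean()
        itt_d = d[m & (a == 1)].mean() - d[m & (a == 0)].mean()
        num += m.mean() * itt_y  # n(s)/n * (ITT on Y in stratum s)
        den += m.mean() * itt_d  # n(s)/n * (estimated complier share)
    return num / den

# Hypothetical data with D_i(0) = 0 (no always takers) and true LATE = 2.
n = 20_000
strata = rng.integers(0, 5, size=n)
a = rng.binomial(1, 0.5, size=n)
complier = rng.binomial(1, 0.6, size=n)
d = a * complier                                  # only treated compliers take up
y = 2.0 * d + 0.1 * strata + rng.normal(size=n)   # stratum-level shifts in Y(0)
est = sat_late(y, d, a, strata)
print(round(est, 2))
```

With one-sided noncompliance the same statistic is also the TOT, since all treated units with D_i = 1 are compliers.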
Second, since this RCT uses π_A(s) = 1/2 for all s ∈ S, Theorem 4.1 implies that the standard error of the SFE IV estimator does not depend on {τ(s) : s ∈ S}. Finally, as predicted by Theorem 5.1, the standard error of the 2S IV estimator under τ(s) = 1 for all s ∈ S is larger than necessary, resulting in a loss of statistical power. When measured in terms of effective sample size, using the larger standard errors in the 2S IV regression is analogous to losing 5%, 10.7%, and 16.8% of the sample in the IV regressions of the first, second, and third outcome variables, respectively.

Following the paper, we consider an account used if the owner has made at least one deposit during the RCT. We have also explored alternative definitions of usage, and we obtained qualitatively similar findings. In this RCT, D_i(0) = 0, and so there are no always takers. As a consequence, the LATE coincides with the TOT, i.e., E[Y(1) − Y(0)|D = 1].

                         Savings (formal)   Savings (cash)   Expenditures
  ˆβ_sat
    s.e. if τ(s) = 0         4.059              4.818            3.870
    s.e. if τ(s) = 1         4.059              4.818            3.870
  ˆβ_sfe
    s.e. if τ(s) = 0         4.059              4.818            3.870
    s.e. if τ(s) = 1         4.059              4.818            3.870
  ˆβ_2s
    s.e. if τ(s) = 0         4.059              4.818            3.870
    s.e. if τ(s) = 1         4.163              5.099            4.244

Table 6: Results of the IV regressions based on data from Dupas et al. (2018). For t ∈ {0, 1}, "s.e. if τ(s) = t" denotes the estimated standard error of the LATE estimator (divided by √n) under the assumption that the RCT uses τ(s) = t for all s ∈ S. Note that the RCT used SBR, and so τ(s) = 0 for all s ∈ S. The significance level of LATE estimators is indicated with stars in the usual manner: "***" means significant at α = 1%, "**" means significant at α = 5%, and "*" means significant at α = 10%.

We now briefly describe the quantitative findings. For brevity, we focus on the LATE estimators based on the SAT IV regression.
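The effective-sample-size figures quoted above follow from the ratio of squared standard errors; the arithmetic can be reproduced directly from the 2S IV rows of Table 6.

```python
# Effective-sample-size loss from using SRS-based (tau(s) = 1) standard errors
# instead of the correct SBR-based (tau(s) = 0) ones for the 2S IV estimator.
se_sbr = [4.059, 4.818, 3.870]  # s.e. if tau(s) = 0 (Table 6, 2S IV rows)
se_srs = [4.163, 5.099, 4.244]  # s.e. if tau(s) = 1

# Since se^2 scales like 1/n, the implied fraction of the sample "lost" is
# 1 - (se_sbr / se_srs)^2.
losses = [100 * (1 - (s0 / s1) ** 2) for s0, s1 in zip(se_sbr, se_srs)]
print([round(x, 1) for x in losses])  # [4.9, 10.7, 16.8], i.e. ~5%, 10.7%, 16.8%
```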
For complier households, opening and using these savings accounts results in an average increase of savings in formal financial institutions. The optimal treatment assignment probabilities π*_A for the three outcome variables are 0.549, 0.491, and 0.496, respectively. These probabilities are all very close to the one used in the RCT and consequently result in very minor efficiency gains.

This paper studies inference in an RCT with CAR and imperfect compliance of a binary treatment. By CAR, we refer to randomization schemes that first stratify according to baseline covariates and then assign treatment status to achieve "balance" within each stratum. In this context, we allow the RCT participants to endogenously decide whether or not to comply with the assigned treatment status. Given the possibility of imperfect compliance, we study inference on the LATE.

We study the asymptotic properties of three LATE estimators derived from IV regressions. The first one is based on the "fully saturated" or SAT IV regression, i.e., a linear regression of the outcome on indicators for all strata and their interaction with the treatment decision, with the latter instrumented with the treatment assignment. We show that the proposed LATE estimator is asymptotically normal, and we characterize its asymptotic variance in terms of primitives of the problem. We provide consistent estimators of the standard errors and asymptotically exact hypothesis tests. This LATE estimator is consistent under weak conditions regarding the CAR method used to implement the RCT (i.e., Assumptions 2.1–2.2).

Our second LATE estimator is based on the "strata fixed effects" or SFE IV linear regression, i.e., a linear regression of the outcome on indicators for all strata and the treatment decision, with the latter instrumented with the treatment assignment.
Our last LATE estimator is based on the "two-sample" or 2S IV linear regression, i.e., a linear regression of the outcome on a constant and the treatment decision, with the latter instrumented with the treatment assignment. The consistency of both of these LATE estimators requires additional conditions relative to the one based on the SAT IV linear regression (i.e., Assumptions 2.1–2.3). In particular, they require that the target proportion of RCT participants assigned to each treatment does not vary by strata (see Assumption 2.3). Under these conditions, we show that both LATE estimators are asymptotically normal, and we characterize their asymptotic variances in terms of primitives of the problem. We also provide consistent estimators of their standard errors and asymptotically exact hypothesis tests.

Our characterization of the asymptotic properties of the LATE estimators allows us to investigate the influence of the parameters of the RCT. We use this to propose strategies to minimize their asymptotic variance in a hypothetical RCT based on data from its pilot study. We also establish that the asymptotic variance of the proposed LATE estimators does not increase if the strata of the RCT become finer and all else remains equal. We determine the optimal treatment assignment probability vector and show how to estimate it consistently based on data from a pilot study.

We confirm our theoretical results in Monte Carlo simulations. We also illustrate the practical relevance of our findings by revisiting the RCT in Dupas et al. (2018).
A Appendix
A.1 Additional notation
This appendix uses the following notation. We use LHS and RHS to denote "left-hand side" and "right-hand side", respectively. We also use LLN, CLT, CMT, and LIE to denote "law of large numbers", "central limit theorem", "continuous mapping theorem", and "law of iterated expectations", respectively.

For any i = 1, . . . , n and (d, a, s) ∈ {0, 1}² × S, we define

  Ỹ_i(d) ≡ Y_i(d) − E[Y_i(d)|S_i] (1)= Y_i(d) − E[Y(d)|S = S_i],
  E[Ỹ_i(d)|D_i(a) = d, A_i = a, S_i = s] (2)= E[Ỹ(d)|D(a) = d, S = s],
  V[Ỹ_i(d)|D_i(a) = d, A_i = a, S_i = s] (3)= V[Ỹ(d)|D(a) = d, S = s],   (A-1)

where (1) follows from Assumption 2.1, and (2) and (3) follow from Lemma A.2. Based on (A-1), we define

  μ(d, a, s) ≡ E[Ỹ(d)|D(a) = d, S = s] and σ²(d, a, s) ≡ V[Ỹ(d)|D(a) = d, S = s].   (A-2)

Lemma A.3 translates μ(d, a, s) and σ²(d, a, s) in terms of the conditional moments of Y(d).

For every s ∈ S, Sections A.4 and A.5 will use the following notation:

  n_A ≡ Σ_{s∈S} n_A(s),  n_D ≡ Σ_{s∈S} n_D(s),  and  n_AD ≡ Σ_{s∈S} n_AD(s).

A.2 Auxiliary results

Lemma A.1.
Under Assumption 2.1, and for any i = 1, . . . , n and s ∈ S,

  P(D_i(1) = 1, D_i(0) = 1|S_i = s) = P(AT|S = s) = π_{D(0)}(s) ∈ [0, 1),
  P(D_i(1) = 0, D_i(0) = 0|S_i = s) = P(NT|S = s) = 1 − π_{D(1)}(s) ∈ [0, 1),
  P(D_i(1) = 1, D_i(0) = 0|S_i = s) = P(C|S = s) = π_{D(1)}(s) − π_{D(0)}(s) ∈ (0, 1].   (A-3)

Proof.
We begin by showing the first line in (A-3). Note that

  P(D_i(1) = 1, D_i(0) = 1|S_i = s) (1)= P(D(1) = 1, D(0) = 1|S = s) (2)= P(D(0) = 1|S = s) = π_{D(0)}(s) (3)< 1,

where (1) and (3) hold by the i.i.d. condition in Assumption 2.1, and (2) holds by Assumption 2.1(c). To complete the argument, note that P(AT|S = s) ≡ P(D(1) = 1, D(0) = 1|S = s). The second line in (A-3) follows from a similar argument. The last line in (A-3) follows from the other two lines and Assumption 2.1(c).

Lemma A.2.
Assume Assumptions 2.1 and 2.2. For any i = 1, . . . , n and (d, a, s, b) ∈ {0, 1} × {0, 1} × S × {1, 2},

  P(D_i(a) = 1|A_i = a, S_i = s) = P(D(a) = 1|S = s) = π_{D(a)}(s).   (A-4)

Also, provided that the conditioning event has positive probability,

  E[Y_i(d)^b|D_i(a) = d, A_i = a, S_i = s] = E[Y(d)^b|D(a) = d, S = s].   (A-5)

Proof.
Fix i = 1, . . . , n and (d, a, s, b) ∈ {0, 1} × {0, 1} × S × {1, 2} arbitrarily, and set s_i = s throughout this proof. To prove (A-4), consider the following derivation:

  P(D_i(a) = 1|A_i = a, S_i = s)
    = Σ_{{s_j}_{j≠i} ∈ S^{n−1}} P(D_i(a) = 1|A_i = a, {S_j}_{j=1}^n = {s_j}_{j=1}^n) P({S_j}_{j≠i} = {s_j}_{j≠i}|A_i = a, S_i = s)
    (1)= Σ_{{s_j}_{j≠i} ∈ S^{n−1}} P(D_i(a) = 1|{S_j}_{j=1}^n = {s_j}_{j=1}^n) P({S_j}_{j≠i} = {s_j}_{j≠i}|A_i = a, S_i = s)
    = π_{D(a)}(s),

where (1) holds by Assumption 2.2(a).

To derive (A-5), consider the following argument:

  E[Y_i(d)^b|A_i = a, D_i(a) = d, S_i = s]
    (1)= ∫_y y^b [Σ_{{s_j}_{j≠i} ∈ S^{n−1}} dP(Y_i(d) = y, D_i(a) = d|A_i = a, {S_i}_{i=1}^n = {s_i}_{i=1}^n) P({S_i}_{i=1}^n = {s_i}_{i=1}^n)]
             / [Σ_{{s_j}_{j≠i} ∈ S^{n−1}} P(D_i(a) = d|A_i = a, {S_i}_{i=1}^n = {s_i}_{i=1}^n) P({S_i}_{i=1}^n = {s_i}_{i=1}^n)]
    (2)= ∫_y y^b [Σ_{{s_j}_{j≠i} ∈ S^{n−1}} dP(Y_i(d) = y, D_i(a) = d|{S_i}_{i=1}^n = {s_i}_{i=1}^n) P({S_i}_{i=1}^n = {s_i}_{i=1}^n)]
             / [Σ_{{s_j}_{j≠i} ∈ S^{n−1}} P(D_i(a) = d|{S_i}_{i=1}^n = {s_i}_{i=1}^n) P({S_i}_{i=1}^n = {s_i}_{i=1}^n)]
    (3)= ∫_y y^b dP(Y(d) = y, D(a) = d, S(Z) = s) / P(D(a) = d, S(Z) = s)
    = E[Y(d)^b|D(a) = d, S = s],

where (1) and (3) hold by Assumption 2.1, and (2) holds by Assumption 2.2(a).

Lemma A.3.
Under Assumptions 2.1 and 2.2, and provided that the conditioning event has positive probability,

  μ(1, 1, s) = E[Y(1)|AT, S = s] · π_{D(0)}(s)/π_{D(1)}(s) + E[Y(1)|C, S = s] · (π_{D(1)}(s) − π_{D(0)}(s))/π_{D(1)}(s) − E[Y(1)|S = s],
  μ(0, 1, s) = E[Y(0)|NT, S = s] − E[Y(0)|S = s],
  μ(1, 0, s) = E[Y(1)|AT, S = s] − E[Y(1)|S = s],
  μ(0, 0, s) = E[Y(0)|NT, S = s] · (1 − π_{D(1)}(s))/(1 − π_{D(0)}(s)) + E[Y(0)|C, S = s] · (π_{D(1)}(s) − π_{D(0)}(s))/(1 − π_{D(0)}(s)) − E[Y(0)|S = s],   (A-6)

and

  σ²(1, 1, s) = V[Y(1)|AT, S = s] · π_{D(0)}(s)/π_{D(1)}(s) + V[Y(1)|C, S = s] · (π_{D(1)}(s) − π_{D(0)}(s))/π_{D(1)}(s)
      + (E[Y(1)|C, S = s] − E[Y(1)|AT, S = s])² · (π_{D(0)}(s)/π_{D(1)}(s)) · ((π_{D(1)}(s) − π_{D(0)}(s))/π_{D(1)}(s)),
  σ²(0, 1, s) = V[Y(0)|NT, S = s],
  σ²(1, 0, s) = V[Y(1)|AT, S = s],
  σ²(0, 0, s) = V[Y(0)|NT, S = s] · (1 − π_{D(1)}(s))/(1 − π_{D(0)}(s)) + V[Y(0)|C, S = s] · (π_{D(1)}(s) − π_{D(0)}(s))/(1 − π_{D(0)}(s))
      + (E[Y(0)|C, S = s] − E[Y(0)|NT, S = s])² · ((1 − π_{D(1)}(s))/(1 − π_{D(0)}(s))) · ((π_{D(1)}(s) − π_{D(0)}(s))/(1 − π_{D(0)}(s))).   (A-7)

Proof.
The subscript i = 1, . . . , n is absent from all expressions due to the i.i.d. condition in Assumption 2.1. For any (d, a, s, b) ∈ {0, 1} × {0, 1} × S × {1, 2} such that P(D(a) = d|S = s) = I[d = 1]π_{D(a)}(s) + I[d = 0](1 − π_{D(a)}(s)) > 0,

  E[Y(d)^b|D(a) = d, S = s] = [ E[Y(d)^b|D(a) = d, D(1 − a) = 0, S = s] P(D(1 − a) = 0, D(a) = d|S = s)
      + E[Y(d)^b|D(a) = d, D(1 − a) = 1, S = s] P(D(1 − a) = 1, D(a) = d|S = s) ] / [ I[d = 1]π_{D(a)}(s) + I[d = 0](1 − π_{D(a)}(s)) ].   (A-8)

We begin by showing (A-6). We only show the first line, as the others can be shown analogously. Note that

  μ(1, 1, s) (1)= E[Y(1)|D(1) = 1, S = s] − E[Y(1)|S = s]
    (2)= [ E[Y(1)|D(1) = 1, D(0) = 0, S = s] P(D(0) = 0, D(1) = 1|S = s) + E[Y(1)|D(1) = 1, D(0) = 1, S = s] P(D(0) = 1, D(1) = 1|S = s) ] / π_{D(1)}(s) − E[Y(1)|S = s]
    (3)= E[Y(1)|AT, S = s] · π_{D(0)}(s)/π_{D(1)}(s) + E[Y(1)|C, S = s] · (π_{D(1)}(s) − π_{D(0)}(s))/π_{D(1)}(s) − E[Y(1)|S = s],

where (1) follows from (A-2), (2) follows from (A-8), and (3) follows from Lemma A.1.

To conclude, we show (A-7). Again, we only show the first line, as the others can be shown analogously.
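It may help to note that the first line of (A-7) is the standard two-component mixture variance implied by the law of total variance. A quick numeric check, with hypothetical values π_{D(0)}(s) = 0.2 and π_{D(1)}(s) = 0.5 and small discrete populations standing in for the AT and C outcome distributions:

```python
import numpy as np

def sigma2_11(v_at, v_c, m_at, m_c, pi_d0, pi_d1):
    """First line of (A-7): V[Y(1) | D(1)=1, S=s] as a mixture of the AT and C
    components, with weight pi_D(0)(s)/pi_D(1)(s) on the AT component."""
    w_at = pi_d0 / pi_d1
    w_c = (pi_d1 - pi_d0) / pi_d1
    return v_at * w_at + v_c * w_c + (m_c - m_at) ** 2 * w_at * w_c

# Discrete stand-in populations (hypothetical): AT outcomes {1, 3}, C outcomes {0, 2, 4}.
at = np.array([1.0, 3.0])      # mean 2, population variance 1
c = np.array([0.0, 2.0, 4.0])  # mean 2, population variance 8/3
pi_d0, pi_d1 = 0.2, 0.5        # mixture weights 0.4 (AT) and 0.6 (C)

# With these weights, each AT point has mass 0.4/2 = 0.2 and each C point has
# mass 0.6/3 = 0.2, so the mixture is uniform on {1, 3, 0, 2, 4}.
mix = np.concatenate([at, c])
direct = mix.var()  # direct population variance of the mixture

formula = sigma2_11(at.var(), c.var(), at.mean(), c.mean(), pi_d0, pi_d1)
print(direct, formula)  # both approximately 2.0
```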
Note that

  σ²(1, 1, s) (1)= V[Y(1) − E[Y(1)|S = s] | D(1) = 1, S = s] = V[Y(1)|D(1) = 1, S = s]
    (2)= [ E[Y(1)²|D(1) = 1, D(0) = 0, S = s] P(D(0) = 0, D(1) = 1|S = s) + E[Y(1)²|D(1) = 1, D(0) = 1, S = s] P(D(0) = 1, D(1) = 1|S = s) ] / π_{D(1)}(s)
      − ( [ E[Y(1)|D(1) = 1, D(0) = 0, S = s] P(D(0) = 0, D(1) = 1|S = s) + E[Y(1)|D(1) = 1, D(0) = 1, S = s] P(D(0) = 1, D(1) = 1|S = s) ] / π_{D(1)}(s) )²
    (3)= V[Y(1)|AT, S = s] · π_{D(0)}(s)/π_{D(1)}(s) + V[Y(1)|C, S = s] · (π_{D(1)}(s) − π_{D(0)}(s))/π_{D(1)}(s)
      + (E[Y(1)|AT, S = s] − E[Y(1)|C, S = s])² · (π_{D(0)}(s)/π_{D(1)}(s)) · ((π_{D(1)}(s) − π_{D(0)}(s))/π_{D(1)}(s)),

where (1) follows from (A-2), (2) follows from (A-8), and (3) follows from Lemma A.1.

Lemma A.4. Under Assumptions 2.1 and 2.2,

  { { √n ( n_AD(s)/n_A(s) − π_{D(1)}(s), (n_D(s) − n_AD(s))/(n(s) − n_A(s)) − π_{D(0)}(s) )′ : s ∈ S } | {(S_i, A_i)}_{i=1}^n } →d N(0, Σ_D) w.p.a.1,   (A-9)

where

  Σ_D ≡ diag( (1/p(s)) [ π_{D(1)}(s)(1 − π_{D(1)}(s))/π_A(s), 0 ; 0, π_{D(0)}(s)(1 − π_{D(0)}(s))/(1 − π_A(s)) ] : s ∈ S ).   (A-10)

In addition,

  ( n_AD(s)/n_A(s), (n_D(s) − n_AD(s))/(n(s) − n_A(s)) ) →p ( π_{D(1)}(s), π_{D(0)}(s) ).   (A-11)

Proof.
We only show (A-9), as (A-11) follows from (A-9) and elementary convergence arguments. We divide theproof of (A-9) in two steps. The first step shows that (cid:26)(cid:26)(cid:18)p n A ( s ) (cid:18) n AD ( s ) n A ( s ) − π D (1) ( s ) (cid:19) , p n ( s ) − n A ( s ) (cid:18) n D ( s ) n ( s ) − n A ( s ) − π D (0) ( s ) (cid:19)(cid:19) ′ : s ∈ S (cid:27)(cid:12)(cid:12)(cid:12)(cid:12) { ( S i , A i ) } ni =1 (cid:27) d → N , diag (" π D (1) ( s )(1 − π D (1) ( s )) 00 π D (0) ( s )(1 − π D (0) ( s )) : s ∈ S )! w.p.a.1, (A-12)where n D ( s ) ≡ P ni =1 A i = 0 , D i = 1 , S i = s ] for any s ∈ S . The second step shows that (( √ n p n A ( s ) , √ n p n ( s ) − n A ( s ) ! ′ : s ∈ S )(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) { ( S i , A i ) } ni =1 ) → p p ( s ) p π A ( s ) , p − π A ( s ) ! w.p.a.1. (A-13)Then, (A-9) follows from (A-12) and (A-13) via elementary convergence arguments.Step 1: Show (A-12). Conditional on { ( S i , A i ) } ni =1 , note that { ( n A ( s ) , n ( s )) : s ∈ S} is non-stochastic, and so theonly source of randomness in (A-12) is { ( n AD ( s ) , n D ( s )) : s ∈ S} . Also, It is relevant to note that n AD ( s ) = n X i =1 A i = 1 , D i (1) = 1 , S i = s ] = n X i =1 A i = 1 , S i = s ] D i (1) n D ( s ) = n X i =1 A i = 0 , D i (0) = 1 , S i = s ] = n X i =1 A i = 0 , S i = s ] D i (0) . (A-14)According to (A-14), each component of { ( n AD ( s ) , n D ( s )) ′ : s ∈ S} is determined by different subset of individualsin the random sample.As a next step, consider the following derivation for any { d ,i } ni =1 × { d ,i } ni =1 × { a i } ni =1 × { s i } ni =1 ∈ { , } n ×{ , } n × { , } n × S n . 
P ( { ( D i (0) , D i (1)) } ni =1 = { ( d ,i , d ,i ) } ni =1 |{ ( A i , S i ) } ni =1 = { ( a i , s i ) } ni =1 ) (1) = P ( { ( D i (0) , D i (1)) } ni =1 = { ( d ,i , d ,i ) } ni =1 |{ S i } ni =1 = { s i } ni =1 )= P ( { ( D i (0) , D i (1) , S i ) } ni =1 = { ( d ,i , d ,i , s i ) } ni =1 ) P ( { S i } ni =1 = { s i } ni =1 ) (2) = Π ni =1 P (( D (0) , D (1)) = ( d ,i , d ,i ) | S = s i ) , (A-15)where (1) follows from Assumption 2.2(a) and (2) follows from Assumption 2.1. Conditionally on { ( A i , S i ) } ni =1 = { ( a i , s i ) } ni =1 , (A-15) reveals that { ( D i (0) , D i (1)) } ni =1 is an independent sample with { ( D i (0) , D i (1)) |{ ( A i , S i ) } ni =1 = { ( a i , s i ) } ni =1 } d = { ( D (0) , D (1)) | S = s i } .By (A-14), { ( n AD ( s ) , n D ( s )) ′ : s ∈ S} are the sum of binary observations from different individuals. If wecondition on { ( A i , S i ) } ni =1 , (A-14) and (A-15) imply that {{ ( n AD ( s ) , n D ( s ) − n AD ( s )) ′ : s ∈ S}|{ S i } ni =1 , { A i } ni =1 } d = { ( B (1 , s ) , B (0 , s )) ′ : s ∈ S} , (A-16) here { ( B (1 , s ) , B (0 , s )) ′ : s ∈ S} are independent random variables with B (1 , s ) ∼ Bi ( n A ( s ) , π D (1) ( s )) and B (0 , s ) ∼ Bi ( n ( s ) − n A ( s ) , π D (0) ( s )). Provided that we condition on sequences of { ( S i , A i ) } ni =1 with n A ( s ) → ∞ and n ( s ) − n A ( s ) → ∞ for all s ∈ S , (A-12) follows immediately from (A-16) and the normal approximation to the binomial.To conclude the step, it suffices to show that n A ( s ) → ∞ and n ( s ) − n A ( s ) → ∞ for all s ∈ S w.p.a.1. In turn,note that this is a consequence of n ( s ) /n a.s → p ( s ) > n A ( s ) /n ( s ) p → π A ( s ) ∈ (0 ,
1) byAssumption 2.2(b).Step 2: Show (A-13). Fix s ∈ S arbitrarily and notice that √ n p n A ( s ) , √ n p n ( s ) − n A ( s ) ! = h nn ( s ) (cid:16) n A ( s ) /n ( s ) , − n A ( s ) /n ( s ) (cid:17)i / p → p p ( s ) p π A ( s ) , p − π A ( s ) ! , (A-17)where the convergence follows from Assumptions 2.1 and 2.2(b). From (A-17) and the fact that { ( n A ( s ) , n ( s )) : s ∈ S} is non-stochastic once we condition on { ( A i , S i ) } ni =1 , (A-13) follows. Lemma A.5.
Assume Assumptions 2.1 and 2.3, and define R_n ≡ (R′_{n,1}, R′_{n,2}, R′_{n,3}, R′_{n,4})′, where

  R_{n,1} ≡ { (1/√n) Σ_{i=1}^n I{D_i = d, A_i = a, S_i = s} (Ỹ_i(d) − μ(d, a, s)) : (d, a, s) ∈ {0, 1}² × S },
  R_{n,2} ≡ { [ √n (n_AD(s)/n_A(s) − π_{D(1)}(s)), √n ((n_D(s) − n_AD(s))/(n(s) − n_A(s)) − π_{D(0)}(s)) ]′ : s ∈ S },
  R_{n,3} ≡ { √n (n_A(s)/n(s) − π_A(s)) : s ∈ S },
  R_{n,4} ≡ { √n (n(s)/n − p(s)) : s ∈ S }.   (A-18)

Then, R_n →d N(0, diag(Σ₁, Σ₂, Σ₃, Σ₄)), where

  Σ₁ ≡ diag( [ 1[(d, a) = (0, 0)](1 − π_{D(0)}(s))(1 − π_A(s)) + 1[(d, a) = (1, 0)]π_{D(0)}(s)(1 − π_A(s))
      + 1[(d, a) = (0, 1)](1 − π_{D(1)}(s))π_A(s) + 1[(d, a) = (1, 1)]π_{D(1)}(s)π_A(s) ] p(s) σ²(d, a, s) : (d, a, s) ∈ {0, 1}² × S ),
  Σ₂ ≡ diag( (1/p(s)) [ (1 − π_{D(1)}(s))π_{D(1)}(s)/π_A(s), 0 ; 0, (1 − π_{D(0)}(s))π_{D(0)}(s)/(1 − π_A(s)) ] : s ∈ S ),
  Σ₃ ≡ diag{ τ(s)(1 − π_A(s))π_A(s)/p(s) : s ∈ S },
  Σ₄ ≡ diag{ p(s) : s ∈ S } − { p(s) : s ∈ S }{ p(s) : s ∈ S }′.

Proof.
Throughout this proof, it is relevant to recall that Assumption 2.3 implies Assumption 2.2. Also, let ζ j ∼ N ( , Σ j ) for j = 1 , , ,
4, with ( ζ ′ , ζ ′ , ζ ′ , ζ ′ ) are independent. Our goal is to show that ( R ′ n, , R ′ n, , R ′ n, , R ′ n, ) d → ( ζ ′ , ζ ′ , ζ ′ , ζ ′ ). We divide the argument into 3 steps. tep 1. Under Assumptions 2.1 and 2.2, we show that for random vectors R Cn, and R Dn, ,( R ′ n, , R ′ n, , R ′ n, , R ′ n, ) d = ( R Cn, ′ , R ′ n, , R ′ n, , R ′ n, ) (A-19) R Dn, ⊥ ( R ′ n, , R ′ n, , R ′ n, ) ′ (A-20) R Dn, d → ζ (A-21) R Cn, = R Dn, + o p (1) . (A-22)For any arbitrary ( { y i } ni =1 , { d i } ni =1 , { a i } ni =1 , { s i } ni =1 ) ∈ R n × { , } n × { , } n × S n , consider first the followingderivation. Provided that the conditioning event has positive probability, dP ( { ˜ Y ( d i ) = y i } ni =1 |{ ( D i , A i , S i ) = ( d i , a i , s i ) } ni =1 ) (1) = dP ( { Y ( d i ) = y i + E [ Y ( d i ) | S = s i ] } ni =1 |{ ( D i , A i , S i ) = ( d i , a i , s i ) } ni =1 ) (2) = dP ( { ( Y ( d i ) , D ( a i )) = ( y i + E [ Y ( d i ) | S = s i ] , d i ) } ni =1 |{ ( S i , A i ) = ( s i , a i ) } ni =1 ) P ( { D ( a i ) = d i } ni =1 |{ ( S i , A i ) = ( s i , a i ) } ni =1 ) (3) = dP ( { ( Y ( d i ) , D ( a i )) = ( y i + E [ Y ( d i ) | S = s i ] , d i ) } ni =1 |{ S i = s i } ni =1 ) P ( { D ( a i ) = d i } ni =1 |{ S i = s i } ni =1 ) (4) = R { z i : S ( z i )= s i } ni =1 dP ( { ( Y ( d i ) , D ( a i ) , Z i ) = ( y i + E [ Y ( d i ) | S = s i ] , d i , z i ) } ni =1 ) R { z i : S ( z i )= s i } ni =1 P ( { ( D ( a i ) , Z i ) = ( d i , z i ) } ni =1 ) (5) = Q ni =1 R z i : S ( z i )= s i dP (( Y ( d i ) , D ( a i ) , Z i ) = ( y i + E [ Y ( d i ) | S = s i ] , d i , z i )) Q ni =1 R z i : S ( z i )= s i P (( D ( a i ) , Z i ) = ( d i , z i ))= n Y i =1 dP ( Y ( d i ) = y i + E [ Y ( d i ) | S = s i ] | D ( a i ) = d i , S = s i ) (6) = n Y i =1 dP ( ˜ Y ( d i ) = y i | D ( a i ) = d i , S = s i ) , (A-23)where (1) and (6) follow from (A-1), (2) follows from D i = D ( A i ), (3) follows form Assumption 2.2(a), (4) follows from S i = S ( Z i ), (5) follows from 
Assumption 2.1. A corollary of (A-23) is that $\{\{\tilde{Y}(d_i)\}_{i=1}^n \mid \{(D_i,A_i,S_i)=(d_i,a_i,s_i)\}_{i=1}^n\}$ has the distribution of an independent sample with observation $i=1,\ldots,n$ distributed according to $\{\tilde{Y}(d_i) \mid D(a_i)=d_i, S=s_i\}$. Then, conditionally on $\{(D_i,A_i,S_i)=(d_i,a_i,s_i)\}_{i=1}^n$, $\{\tilde{Y}_i(d_i)-\mu(d_i,a_i,s_i)\}_{i=1}^n$ is an independent sample with
$$\{\tilde{Y}_i(d_i)-\mu(d_i,a_i,s_i) \mid \{(D_i,A_i,S_i)=(d_i,a_i,s_i)\}_{i=1}^n\} \overset{d}{=} \{\tilde{Y}(d_i)-\mu(d_i,a_i,s_i) \mid D(a_i)=d_i, S=s_i\}.$$
Conditional on $\{(D_i,A_i,S_i)=(d_i,a_i,s_i)\}_{i=1}^n$, consider the following matrix:
$$\{\{I\{D_i=d,A_i=a,S_i=s\}(\tilde{Y}_i(d)-\mu(d,a,s)) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\}. \quad \text{(A-24)}$$
Consider the following observation for each row $i=1,\ldots,n$ of (A-24). Row $i$ has one and only one indicator in $\{I\{D_i=d,A_i=a,S_i=s\} : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}$ that is turned on, corresponding to $(d,a,s)=(d_i,a_i,s_i)$. For this entry, we have that $I\{D_i=d,A_i=a,S_i=s\}(\tilde{Y}_i(d)-\mu(d,a,s)) = \tilde{Y}_i(d_i)-\mu(d_i,a_i,s_i)$. The remaining entries in row $i$ are equal to zero and, thus, independent of $\tilde{Y}_i(d_i)-\mu(d_i,a_i,s_i)$. In this sense, the elements of each row are independent. By the derivation in (A-23), conditional on $\{(D_i,A_i,S_i)=(d_i,a_i,s_i)\}_{i=1}^n$, the rows are independent. As a consequence, conditional on $\{(D_i,S_i,A_i)\}_{i=1}^n$, (A-24) has the same distribution as the following matrix:
$$\{\{I\{D_i=d,A_i=a,S_i=s\}\breve{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\}, \quad \text{(A-25)}$$
where $\{\{\breve{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\}$ denotes a matrix of $4|\mathcal{S}|\times n$ independent random variables, independent of $\{(D_i,S_i,A_i)\}_{i=1}^n$, with $\breve{Y}_i(d,a,s) \overset{d}{=} \{\tilde{Y}(d)-\mu(d,a,s) \mid D(a)=d, S=s\}$ for each $(d,a,s)\in\{0,1\}^2\times\mathcal{S}$. As a corollary,
$$\{R_{n,2} \mid \{(D_i,A_i,S_i)\}_{i=1}^n\} \overset{d}{=} \{R^B_{n,2} \mid \{(D_i,A_i,S_i)\}_{i=1}^n\}, \quad \text{(A-26)}$$
where
$$R^B_{n,2} \equiv \Big(\frac{1}{\sqrt{n}}\sum_{i=1}^n I\{D_i=d,A_i=a,S_i=s\}\breve{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\Big).$$
Consider the following classification of observations. Let $g=1$ represent an observation with $(d,a,s)=(0,0,1)$, $g=2$ represent $(d,a,s)=(1,0,1)$, $g=3$ represent $(d,a,s)=(0,1,1)$, $g=4$ represent $(d,a,s)=(1,1,1)$, $g=5$ represent $(d,a,s)=(0,0,2)$, and so on, up to $g=4|\mathcal{S}|$, which represents $(d,a,s)=(1,1,|\mathcal{S}|)$. Let $G:\{0,1\}^2\times\mathcal{S}\to\mathcal{G}\equiv\{1,\ldots,4|\mathcal{S}|\}$ denote the function that maps each $(d,a,s)$ into a group $g\in\mathcal{G}$. Note that $G$ is invertible, with $G^{-1}:\mathcal{G}\to\{0,1\}^2\times\mathcal{S}$. Also, for any $s\in\mathcal{S}$, let $\bar{g}(s)\in\mathcal{G}$ be the unique value of $G$ s.t. $\bar{g}(s)=G(0,0,s)$, and so $\bar{g}(s)+1=G(1,0,s)$, $\bar{g}(s)+2=G(0,1,s)$, and $\bar{g}(s)+3=G(1,1,s)$. For each $g\in\mathcal{G}$, let $N_g\equiv\sum_{i=1}^n I\{G(D_i,A_i,S_i)<g\}$. Note that, conditional on $\{(D_i,A_i,S_i)\}_{i=1}^n$, $\{N_{G(d,a,s)} : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}$ is nonstochastic. Let us now consider a reordering of the units $i=1,\ldots,n$ in the following manner: first by strata $s\in\mathcal{S}$, then by treatment assignment $a\in\{0,1\}$, and then by decision $d\in\{0,1\}$. In other words, the units are reordered in increasing order of $g=1,\ldots,4|\mathcal{S}|$. Let $R^C_{n,2}$ denote the reordered sum, and let $\{\pi(i) : i=1,\ldots,n\}$ denote the permutation of the units described by this reordering. Since $\breve{Y}_i(d,a,s) \overset{d}{=} \{\tilde{Y}(d)-\mu(d,a,s) \mid D(a)=d, S=s\}$, note that
$$\{\{\breve{Y}_{\pi(i)}(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\} \overset{d}{=} \{\{\breve{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\}. \quad \text{(A-27)}$$
As a corollary of (A-27),
$$\{R^B_{n,2} \mid \{(D_i,A_i,S_i)\}_{i=1}^n\} \overset{d}{=} \{R^C_{n,2} \mid \{(D_i,A_i,S_i)\}_{i=1}^n\}, \quad \text{(A-28)}$$
where
$$R^C_{n,2} \equiv \Big(\frac{1}{\sqrt{n}}\sum_{i=N_{G(d,a,s)}+1}^{N_{G(d,a,s)+1}} \check{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\Big) \quad \text{(A-29)}$$
and $\{\{\check{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\}$ denotes a matrix of row-wise i.i.d. random vectors, independent of $\{(D_i,A_i,S_i)\}_{i=1}^n$, with $\check{Y}_i(d,a,s) \overset{d}{=} \{\tilde{Y}(d)-\mu(d,a,s) \mid D(a)=d, S=s\}$. By (A-26) and (A-28),
$$\{R_{n,2} \mid \{(D_i,A_i,S_i)\}_{i=1}^n\} \overset{d}{=} \{R^C_{n,2} \mid \{(D_i,A_i,S_i)\}_{i=1}^n\}. \quad \text{(A-30)}$$
For any $(h_1,h_2,h_3,h_4)\in\mathbb{R}^{|\mathcal{S}|}\times\mathbb{R}^{4|\mathcal{S}|}\times\mathbb{R}^{2|\mathcal{S}|}\times\mathbb{R}^{|\mathcal{S}|}$, consider the following derivation:
$$P(R_{n,1}\le h_1, R_{n,2}\le h_2, R_{n,3}\le h_3, R_{n,4}\le h_4) = E[P(R_{n,1}\le h_1, R_{n,2}\le h_2, R_{n,3}\le h_3, R_{n,4}\le h_4 \mid \{(D_i,A_i,S_i)\}_{i=1}^n)]$$
$$\overset{(1)}{=} E[P(R_{n,2}\le h_2 \mid \{(D_i,A_i,S_i)\}_{i=1}^n)\,1[R_{n,1}\le h_1, R_{n,3}\le h_3, R_{n,4}\le h_4]]$$
$$\overset{(2)}{=} E[P(R^C_{n,2}\le h_2 \mid \{(D_i,A_i,S_i)\}_{i=1}^n)\,1[R_{n,1}\le h_1, R_{n,3}\le h_3, R_{n,4}\le h_4]]$$
$$\overset{(3)}{=} E[P(R^C_{n,2}\le h_2, R_{n,1}\le h_1, R_{n,3}\le h_3, R_{n,4}\le h_4 \mid \{(D_i,A_i,S_i)\}_{i=1}^n)] = P(R^C_{n,2}\le h_2, R_{n,1}\le h_1, R_{n,3}\le h_3, R_{n,4}\le h_4),$$
where (1) and (3) hold because $(R_{n,1},R_{n,3},R_{n,4})$ is a nonstochastic function of $\{(D_i,A_i,S_i)\}_{i=1}^n$, and (2) holds by (A-30). Since the choice of $(h_1,h_2,h_3,h_4)$ was arbitrary, (A-19) follows.

For each $g\in\mathcal{G}$, let
$$F_g \equiv \sum_{s\in\mathcal{S}}\big[P(G(D(1),1,s)<g \mid S=s)\,\pi_A(s)p(s) + P(G(D(0),0,s)<g \mid S=s)\,(1-\pi_A(s))p(s)\big]. \quad \text{(A-31)}$$
Also, define
$$R^D_{n,2} \equiv \Big(\frac{1}{\sqrt{n}}\sum_{i=\lfloor nF_{G(d,a,s)}\rfloor+1}^{\lfloor nF_{G(d,a,s)+1}\rfloor} \check{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\Big). \quad \text{(A-32)}$$
Since $R^D_{n,2}$ is a nonstochastic function of $\{\{\check{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\}$, $(R_{n,1},R_{n,3},R_{n,4})$ is a nonstochastic function of $\{(D_i,A_i,S_i)\}_{i=1}^n$, and $\{\{\check{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\} \perp \{(D_i,A_i,S_i)\}_{i=1}^n$, we conclude that (A-20) holds.

For each $(d,a,s,u)\in\{0,1\}^2\times\mathcal{S}\times(0,1]$, let
$$L_n(u) = \frac{1}{\sqrt{n}}\sum_{i=1}^{\lfloor nu\rfloor} \check{Y}_i(d,a,s).$$
Note that $\check{Y}_i(d,a,s) \overset{d}{=} \{\tilde{Y}(d)-\mu(d,a,s) \mid D(a)=d, S=s\}$, and so (A-2) implies that $E[\check{Y}_i(d,a,s)]=0$ and $V[\check{Y}_i(d,a,s)]=\sigma^2(d,a,s)$. By repeating arguments in the proof of Bugni et al. (2018, Lemma B.2),
$$L_n(u) \overset{d}{\to} N(0, u\sigma^2(d,a,s)). \quad \text{(A-33)}$$
Since the elements of $\{\{\check{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\}$ are independent across $i$, $L_n(F_{G(d,a,s)}) \perp L_n(F_{G(d,a,s)+1})-L_n(F_{G(d,a,s)})$. By this and the fact that (A-33) holds for $u\in\{F_{G(d,a,s)}, F_{G(d,a,s)+1}\}$,
$$\frac{1}{\sqrt{n}}\sum_{i=\lfloor nF_{G(d,a,s)}\rfloor+1}^{\lfloor nF_{G(d,a,s)+1}\rfloor} \check{Y}_i(d,a,s) = L_n(F_{G(d,a,s)+1})-L_n(F_{G(d,a,s)}) \overset{d}{\to} N\big(0, (F_{G(d,a,s)+1}-F_{G(d,a,s)})\sigma^2(d,a,s)\big). \quad \text{(A-34)}$$
Since $\{\{\check{Y}_i(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}' : i=1,\ldots,n\}$ are independent random variables, we conclude that for any $(d,a,s),(\tilde{d},\tilde{a},\tilde{s})\in\{0,1\}^2\times\mathcal{S}$ with $(d,a,s)\ne(\tilde{d},\tilde{a},\tilde{s})$,
$$\frac{1}{\sqrt{n}}\sum_{i=\lfloor nF_{G(d,a,s)}\rfloor+1}^{\lfloor nF_{G(d,a,s)+1}\rfloor} \check{Y}_i(d,a,s) \perp \frac{1}{\sqrt{n}}\sum_{i=\lfloor nF_{G(\tilde{d},\tilde{a},\tilde{s})}\rfloor+1}^{\lfloor nF_{G(\tilde{d},\tilde{a},\tilde{s})+1}\rfloor} \check{Y}_i(\tilde{d},\tilde{a},\tilde{s}). \quad \text{(A-35)}$$
By (A-34) and (A-35),
$$R^D_{n,2} \overset{d}{\to} N\big(0, \mathrm{diag}\{(F_{G(d,a,s)+1}-F_{G(d,a,s)})\sigma^2(d,a,s) : (d,a,s)\in\{0,1\}^2\times\mathcal{S}\}\big). \quad \text{(A-36)}$$
To show (A-21) from (A-36), it then suffices to show that for all $(d,a,s)\in\{0,1\}^2\times\mathcal{S}$,
$$F_{G(d,a,s)+1}-F_{G(d,a,s)} = \left(\begin{aligned}&1[(a,d)=(0,0)](1-\pi_{D(0)}(s))(1-\pi_A(s)) + 1[(a,d)=(0,1)]\pi_{D(0)}(s)(1-\pi_A(s))\\ &+ 1[(a,d)=(1,0)](1-\pi_{D(1)}(s))\pi_A(s) + 1[(a,d)=(1,1)]\pi_{D(1)}(s)\pi_A(s)\end{aligned}\right)p(s). \quad \text{(A-37)}$$
We can show this from (A-31) by using an induction argument. As an initial step, note that (A-31) implies that $F_1=0$, $F_2=(1-\pi_{D(0)}(1))(1-\pi_A(1))p(1)$, $F_3=(1-\pi_A(1))p(1)$, and $F_4=(1-\pi_A(1))p(1)+(1-\pi_{D(1)}(1))\pi_A(1)p(1)$.
As the inductive step, note that for $g=\bar{g}(s)$ with $s\in\mathcal{S}$ (i.e., $g=1,5,9,\ldots,4|\mathcal{S}|-3$), (A-31) implies that $F_{g+1}=F_g+(1-\pi_{D(0)}(s))(1-\pi_A(s))p(s)$, $F_{g+2}=F_{g+1}+\pi_{D(0)}(s)(1-\pi_A(s))p(s)$, $F_{g+3}=F_{g+2}+(1-\pi_{D(1)}(s))\pi_A(s)p(s)$, and $F_{g+4}=F_{g+3}+\pi_{D(1)}(s)\pi_A(s)p(s)$. By finite induction, (A-37) follows.

By repeating arguments in the proof of Bugni et al. (2018, Lemma B.2), (A-22) follows from showing that $N_g/n \overset{p}{\to} F_g$ for all $g\in\mathcal{G}$. Note that $F_1=N_1/n=0$. For any $(d,a,s)\in\{0,1\}^2\times\mathcal{S}$, consider the following argument:
$$\frac{N_{G(d,a,s)+1}}{n}-\frac{N_{G(d,a,s)}}{n} \overset{(1)}{=} \frac{1}{n}\left(\begin{aligned}&1[(a,d)=(0,0)](n(s)-n_D(s)-n_A(s)+n_{AD}(s)) + 1[(a,d)=(0,1)](n_D(s)-n_{AD}(s))\\ &+ 1[(a,d)=(1,0)](n_A(s)-n_{AD}(s)) + 1[(a,d)=(1,1)]n_{AD}(s)\end{aligned}\right)$$
$$= \left(\begin{aligned}&1[(a,d)=(0,0)]\Big(1-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\Big(1-\tfrac{n_A(s)}{n(s)}\Big) + 1[(a,d)=(0,1)]\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big(1-\tfrac{n_A(s)}{n(s)}\Big)\\ &+ 1[(a,d)=(1,0)]\Big(1-\tfrac{n_{AD}(s)}{n_A(s)}\Big)\tfrac{n_A(s)}{n(s)} + 1[(a,d)=(1,1)]\tfrac{n_{AD}(s)}{n_A(s)}\tfrac{n_A(s)}{n(s)}\end{aligned}\right)\frac{n(s)}{n} \overset{(2)}{=} F_{G(d,a,s)+1}-F_{G(d,a,s)}+o_p(1), \quad \text{(A-38)}$$
where (1) follows from an induction argument similar to the one used to show (A-37), and (2) follows from Assumptions 2.1 and 2.2(b), Lemma A.4, and the LLN, which implies that $n(s)/n=p(s)+o_p(1)$.

Step 2. Under Assumptions 2.1 and 2.3, we show that $(R'_{n,1},R'_{n,3},R'_{n,4}) \overset{d}{\to} (\zeta'_1,\zeta'_3,\zeta'_4)$.

By definition, $\zeta_3$ and $\zeta_4$ are continuously distributed, and $\zeta_1=\{\zeta_{1,s} : s\in\mathcal{S}\}$ is a vector of $|\mathcal{S}|$ independent coordinates, where $\zeta_{1,s}$ is continuously distributed if $\tau(s)>0$ and $\zeta_{1,s}=0$ if $\tau(s)=0$. Then, $(h'_1,h'_3,h'_4)'$ is a continuity point of the CDF of $(\zeta'_1,\zeta'_3,\zeta'_4)'$ if and only if $h_{1,s}\ne 0$ for all $s\in\mathcal{S}$ with $\tau(s)=0$. Therefore, $(h'_1,h'_3,h'_4)'$ is a continuity point of the CDF of $(\zeta'_1,\zeta'_3,\zeta'_4)'$ if and only if $h_1$ is a continuity point of the CDF of $\zeta_1$. For any such $(h'_1,h'_3,h'_4)'$, consider the following argument:
$$\lim P(R_{n,1}\le h_1, R_{n,3}\le h_3, R_{n,4}\le h_4) \overset{(1)}{=} \lim E[E[E[I\{R_{n,3}\le h_3\}I\{R_{n,1}\le h_1\}I\{R_{n,4}\le h_4\} \mid \{A_i\}_{i=1}^n,\{S_i\}_{i=1}^n] \mid \{S_i\}_{i=1}^n]]$$
$$\overset{(2)}{=} \lim E[E[E[I\{R_{n,3}\le h_3\} \mid \{(A_i,S_i)\}_{i=1}^n]\,I\{R_{n,1}\le h_1\} \mid \{S_i\}_{i=1}^n]\,I\{R_{n,4}\le h_4\}]$$
$$= \lim E[E[(P(R_{n,3}\le h_3 \mid \{(A_i,S_i)\}_{i=1}^n)-P(\zeta_3\le h_3))\,I\{R_{n,1}\le h_1\} \mid \{S_i\}_{i=1}^n]\,I\{R_{n,4}\le h_4\}]$$
$$\quad + P(\zeta_3\le h_3)\,E[(P(R_{n,1}\le h_1 \mid \{S_i\}_{i=1}^n)-P(\zeta_1\le h_1))\,I\{R_{n,4}\le h_4\}]$$
$$\quad + P(\zeta_3\le h_3)P(\zeta_1\le h_1)(P(R_{n,4}\le h_4)-P(\zeta_4\le h_4)) + P(\zeta_3\le h_3)P(\zeta_1\le h_1)P(\zeta_4\le h_4), \quad \text{(A-39)}$$
where (1) follows from the LIE, and (2) follows from the fact that $R_{n,1}$ is nonstochastic conditional on $\{(A_i,S_i)\}_{i=1}^n$ and $R_{n,4}$ is nonstochastic conditional on $\{S_i\}_{i=1}^n$. By (A-39),
$$|\lim P(R_{n,1}\le h_1, R_{n,3}\le h_3, R_{n,4}\le h_4) - P(\zeta_1\le h_1)P(\zeta_3\le h_3)P(\zeta_4\le h_4)|$$
$$\le \lim E[E[|P(R_{n,3}\le h_3 \mid \{(A_i,S_i)\}_{i=1}^n)-P(\zeta_3\le h_3)| \mid \{S_i\}_{i=1}^n]] + \lim E[|P(R_{n,1}\le h_1 \mid \{S_i\}_{i=1}^n)-P(\zeta_1\le h_1)|]$$
$$\quad + \lim |P(R_{n,4}\le h_4)-P(\zeta_4\le h_4)|. \quad \text{(A-40)}$$
The proof of this step is completed by showing that the three terms on the right-hand side of (A-40) are zero.

We begin with the first term. Fix $\varepsilon>0$; we find $N\in\mathbb{N}$ s.t. $\forall n\ge N$,
$$E[E[|P(R_{n,3}\le h_3 \mid \{(A_i,S_i)\}_{i=1}^n)-P(\zeta_3\le h_3)| \mid \{S_i\}_{i=1}^n]] \le \varepsilon.$$
By Assumption 2.2(b), there exists a set of values of $\{(A_i,S_i)\}_{i=1}^n$, denoted by $M_n$, s.t. $P(\{(A_i,S_i)\}_{i=1}^n\in M_n)\to 1$ and, for every $\{(a_i,s_i)\}_{i=1}^n\in M_n$, $P(R_{n,3}\le h_3 \mid \{(A_i,S_i)\}_{i=1}^n=\{(a_i,s_i)\}_{i=1}^n) \to P(\zeta_3\le h_3)$, where we are using that $\zeta_3$ is continuously distributed. This implies that $\exists N\in\mathbb{N}$ s.t. $\forall n\ge N$ and $\forall\{(a_i,s_i)\}_{i=1}^n\in M_n$,
$$|P(R_{n,3}\le h_3 \mid \{(A_i,S_i)\}_{i=1}^n=\{(a_i,s_i)\}_{i=1}^n)-P(\zeta_3\le h_3)| \le \varepsilon/2 \quad \text{(A-41)}$$
and
$$P(\{(A_i,S_i)\}_{i=1}^n\in M_n) \ge 1-\varepsilon/4. \quad \text{(A-42)}$$
Then,
$$E[E[|P(R_{n,3}\le h_3 \mid \{(A_i,S_i)\}_{i=1}^n)-P(\zeta_3\le h_3)| \mid \{S_i\}_{i=1}^n]]$$
$$= \int_{\{(a_i,s_i)\}_{i=1}^n\in M_n} |P(R_{n,3}\le h_3 \mid \{(a_i,s_i)\}_{i=1}^n)-P(\zeta_3\le h_3)|\,dP(\{(A_i,S_i)\}_{i=1}^n=\{(a_i,s_i)\}_{i=1}^n)$$
$$\quad + \int_{\{(a_i,s_i)\}_{i=1}^n\in M_n^c} |P(R_{n,3}\le h_3 \mid \{(a_i,s_i)\}_{i=1}^n)-P(\zeta_3\le h_3)|\,dP(\{(A_i,S_i)\}_{i=1}^n=\{(a_i,s_i)\}_{i=1}^n)$$
$$\overset{(1)}{\le} P(\{(A_i,S_i)\}_{i=1}^n\in M_n)\,\varepsilon/2 + 2P(\{(A_i,S_i)\}_{i=1}^n\in M_n^c) \overset{(2)}{\le} \varepsilon,$$
where (1) holds by (A-41) and (2) holds by (A-42). This completes the proof for the first term on the right-hand side of (A-40). The argument for the second term is similar, except that the step relying on Assumption 2.2(b) instead relies on Lemma A.4. Finally, the argument for the third term holds because $\zeta_4$ is continuously distributed and $R_{n,4}\overset{d}{\to}\zeta_4$, which holds by Assumption 2.1, $S(Z_i)=S_i$, and the CLT.

Step 3. We now combine steps 1 and 2 to complete the proof. Let $(h'_1,h'_2,h'_3,h'_4)$ be a continuity point of the CDF of $(\zeta'_1,\zeta'_2,\zeta'_3,\zeta'_4)$. By the same argument as in step 2, this implies that $h_{1,s}\ne 0$ for all $s\in\mathcal{S}$ with $\tau(s)=0$. Under these conditions, consider the following derivation:
$$\lim P(R_{n,1}\le h_1, R_{n,2}\le h_2, R_{n,3}\le h_3, R_{n,4}\le h_4) \overset{(1)}{=} \lim P(R_{n,1}\le h_1, R^C_{n,2}\le h_2, R_{n,3}\le h_3, R_{n,4}\le h_4)$$
$$\overset{(2)}{=} \lim P(R_{n,1}\le h_1, R^D_{n,2}\le h_2, R_{n,3}\le h_3, R_{n,4}\le h_4) \overset{(3)}{=} \lim P(R^D_{n,2}\le h_2)\,\lim P(R_{n,1}\le h_1, R_{n,3}\le h_3, R_{n,4}\le h_4)$$
$$\overset{(4)}{=} P(\zeta_1\le h_1)P(\zeta_2\le h_2)P(\zeta_3\le h_3)P(\zeta_4\le h_4),$$
as desired, where (1) holds by (A-19) in step 1, (2) holds by (A-22) in step 1, (3) holds by (A-20) in step 1, and (4) holds by (A-21) in step 1 and (A-40) in step 2.

Lemma A.6.
Assume Assumptions 2.1 and 2.2. For any $(d,a,s)\in\{0,1\}^2\times\mathcal{S}$,
$$\frac{R_{n,2}(d,a,s)}{\sqrt{n}} = \frac{1}{n}\sum_{i=1}^n I\{D_i=d,A_i=a,S_i=s\}(\tilde{Y}_i(d)-\mu(d,a,s)) = o_p(1) \quad \text{(A-43)}$$
and
$$R_{n,5}(d,a,s) \equiv \frac{1}{n}\sum_{i=1}^n I\{D_i=d,A_i=a,S_i=s\}(\tilde{Y}_i(d)-\mu(d,a,s))^2$$
$$= \left(\begin{aligned}&1[(a,d)=(0,0)](1-\pi_A(s))(1-\pi_{D(0)}(s)) + 1[(a,d)=(0,1)](1-\pi_A(s))\pi_{D(0)}(s)\\ &+ 1[(a,d)=(1,0)]\pi_A(s)(1-\pi_{D(1)}(s)) + 1[(a,d)=(1,1)]\pi_A(s)\pi_{D(1)}(s)\end{aligned}\right)p(s)\sigma^2(d,a,s) + o_p(1), \quad \text{(A-44)}$$
where $R_n$ is as in (A-18).

Proof. Fix $(d,a,s)\in\{0,1\}^2\times\mathcal{S}$ arbitrarily throughout this proof. We begin by showing (A-43). Under our current assumptions, step 1 of the proof of Lemma A.5 implies that
$$\frac{R_{n,2}(d,a,s)}{\sqrt{n}} \overset{d}{=} \frac{R^C_{n,2}(d,a,s)}{\sqrt{n}} = \frac{R^D_{n,2}(d,a,s)}{\sqrt{n}} + \frac{o_p(1)}{\sqrt{n}} = \frac{R^D_{n,2}(d,a,s)}{\sqrt{n}} + o_p(1),$$
where $R^C_{n,2}$ and $R^D_{n,2}$ are defined in (A-29) and (A-32), respectively. Therefore, (A-43) follows from the following derivation:
$$\frac{R^D_{n,2}(d,a,s)}{\sqrt{n}} \overset{(1)}{=} \frac{1}{n}\sum_{i=\lfloor nF_{G(d,a,s)}\rfloor+1}^{\lfloor nF_{G(d,a,s)+1}\rfloor}\check{Y}_i(d,a,s) = \frac{\lfloor nF_{G(d,a,s)+1}\rfloor-\lfloor nF_{G(d,a,s)}\rfloor}{n}\cdot\frac{\sum_{i=\lfloor nF_{G(d,a,s)}\rfloor+1}^{\lfloor nF_{G(d,a,s)+1}\rfloor}\check{Y}_i(d,a,s)}{\lfloor nF_{G(d,a,s)+1}\rfloor-\lfloor nF_{G(d,a,s)}\rfloor}$$
$$\overset{(2)}{=} (F_{G(d,a,s)+1}-F_{G(d,a,s)}+o(1))\,o_p(1) = o_p(1),$$
as required, where (1) holds by (A-32), with $F_g$ as defined in (A-31) in step 1 of the proof of Lemma A.5 and $\{\check{Y}_i(d,a,s) : i=1,\ldots,n\}$ given by an i.i.d. sequence with $\check{Y}_i(d,a,s)\overset{d}{=}\{\tilde{Y}(d)-\mu(d,a,s)\mid D(a)=d, S=s\}$, and (2) holds by $1\ge F_{G(d,a,s)+1}>F_{G(d,a,s)}\ge 0$, $E[\check{Y}_i(d,a,s)]=0$ (due to (A-2)), and the LLN.

We now show (A-44). By repeating arguments used in step 1 of the proof of Lemma A.5, we can show that
$$R_{n,5}(d,a,s) \overset{d}{=} R^D_{n,5}(d,a,s)+o_p(1), \quad\text{where}\quad R^D_{n,5}(d,a,s) \equiv \frac{1}{n}\sum_{i=\lfloor nF_{G(d,a,s)}\rfloor+1}^{\lfloor nF_{G(d,a,s)+1}\rfloor} U_i(d,a,s)$$
and $\{U_i(d,a,s) : i=1,\ldots,n\}$ is an i.i.d. sequence with $U_i(d,a,s)\overset{d}{=}\{(\tilde{Y}(d)-\mu(d,a,s))^2 \mid D(a)=d, S=s\}$. To show (A-44), consider the following argument:
$$R^D_{n,5}(d,a,s) = \frac{\lfloor nF_{G(d,a,s)+1}\rfloor-\lfloor nF_{G(d,a,s)}\rfloor}{n}\cdot\frac{\sum_{i=\lfloor nF_{G(d,a,s)}\rfloor+1}^{\lfloor nF_{G(d,a,s)+1}\rfloor}U_i(d,a,s)}{\lfloor nF_{G(d,a,s)+1}\rfloor-\lfloor nF_{G(d,a,s)}\rfloor} \overset{(1)}{=} (F_{G(d,a,s)+1}-F_{G(d,a,s)}+o(1))(\sigma^2(d,a,s)+o_p(1))$$
$$\overset{(2)}{=} \left(\begin{aligned}&1[(a,d)=(0,0)](1-\pi_A(s))(1-\pi_{D(0)}(s)) + 1[(a,d)=(0,1)](1-\pi_A(s))\pi_{D(0)}(s)\\ &+ 1[(a,d)=(1,0)]\pi_A(s)(1-\pi_{D(1)}(s)) + 1[(a,d)=(1,1)]\pi_A(s)\pi_{D(1)}(s)\end{aligned}\right)p(s)\sigma^2(d,a,s) + o_p(1),$$
where (1) holds by $1\ge F_{G(d,a,s)+1}>F_{G(d,a,s)}\ge 0$, $E[U_i(d,a,s)]=\sigma^2(d,a,s)$ (due to (A-2)), and the LLN, and (2) follows from (A-37).

A.3 Proofs of results related to Section 3
Lemma A.7 (SAT matrices). Assume Assumptions 2.1 and 2.2. Then,
$$Z^{sat\prime}_n X^{sat}_n/n = \begin{bmatrix} \mathrm{diag}\{n(s)/n : s\in\mathcal{S}\} & \mathrm{diag}\{n_D(s)/n : s\in\mathcal{S}\} \\ \mathrm{diag}\{n_A(s)/n : s\in\mathcal{S}\} & \mathrm{diag}\{n_{AD}(s)/n : s\in\mathcal{S}\}\end{bmatrix}$$
$$= \begin{bmatrix} \mathrm{diag}\{p(s) : s\in\mathcal{S}\} & \mathrm{diag}\{[\pi_{D(1)}(s)\pi_A(s)+\pi_{D(0)}(s)(1-\pi_A(s))]p(s) : s\in\mathcal{S}\} \\ \mathrm{diag}\{\pi_A(s)p(s) : s\in\mathcal{S}\} & \mathrm{diag}\{\pi_{D(1)}(s)\pi_A(s)p(s) : s\in\mathcal{S}\}\end{bmatrix} + o_p(1),$$
and thus
$$(Z^{sat\prime}_n X^{sat}_n/n)^{-1} = \begin{bmatrix} \mathrm{diag}\big\{\tfrac{n_{AD}(s)\,n}{n(s)n_{AD}(s)-n_A(s)n_D(s)} : s\in\mathcal{S}\big\} & \mathrm{diag}\big\{\tfrac{-n_D(s)\,n}{n(s)n_{AD}(s)-n_A(s)n_D(s)} : s\in\mathcal{S}\big\} \\ \mathrm{diag}\big\{\tfrac{-n_A(s)\,n}{n(s)n_{AD}(s)-n_A(s)n_D(s)} : s\in\mathcal{S}\big\} & \mathrm{diag}\big\{\tfrac{n(s)\,n}{n(s)n_{AD}(s)-n_A(s)n_D(s)} : s\in\mathcal{S}\big\}\end{bmatrix}$$
$$= \begin{bmatrix} \mathrm{diag}\big\{\tfrac{\pi_{D(1)}(s)}{p(s)(1-\pi_A(s))[\pi_{D(1)}(s)-\pi_{D(0)}(s)]} : s\in\mathcal{S}\big\} & \mathrm{diag}\big\{\tfrac{-[\pi_{D(1)}(s)\pi_A(s)+\pi_{D(0)}(s)(1-\pi_A(s))]}{p(s)\pi_A(s)(1-\pi_A(s))[\pi_{D(1)}(s)-\pi_{D(0)}(s)]} : s\in\mathcal{S}\big\} \\ \mathrm{diag}\big\{\tfrac{-1}{p(s)(1-\pi_A(s))[\pi_{D(1)}(s)-\pi_{D(0)}(s)]} : s\in\mathcal{S}\big\} & \mathrm{diag}\big\{\tfrac{1}{p(s)\pi_A(s)(1-\pi_A(s))[\pi_{D(1)}(s)-\pi_{D(0)}(s)]} : s\in\mathcal{S}\big\}\end{bmatrix} + o_p(1).$$
Also,
$$Z^{sat\prime}_n Y_n/n = \begin{bmatrix}\big\{\tfrac{1}{n}\sum_{i=1}^n I\{S_i=s\}Y_i : s\in\mathcal{S}\big\} \\ \big\{\tfrac{1}{n}\sum_{i=1}^n I\{A_i=1,S_i=s\}Y_i : s\in\mathcal{S}\big\}\end{bmatrix}.$$
Proof.
These equalities follow from algebra, and the convergences follow from the CMT.
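The block structure in Lemma A.7 is easy to sanity-check numerically. The sketch below is my own illustration (the stratum counts are made up, not from the paper): it assembles $Z^{sat\prime}_n X^{sat}_n/n$ from cell counts for two strata and confirms that the stated closed-form entries coincide with a direct matrix inverse.

```python
import numpy as np

# Hypothetical counts for two strata: n(s), n_A(s), n_D(s), n_AD(s).
n = 1000
n_s = np.array([600.0, 400.0])      # stratum sizes n(s)
nA = np.array([300.0, 200.0])       # assigned to treatment, n_A(s)
nD = np.array([350.0, 180.0])       # chose treatment, n_D(s)
nAD = np.array([280.0, 150.0])      # assigned and chose treatment, n_AD(s)

# Z'X/n as in Lemma A.7: a 2|S| x 2|S| matrix of diagonal blocks.
M = np.vstack([np.hstack([np.diag(n_s / n), np.diag(nD / n)]),
               np.hstack([np.diag(nA / n), np.diag(nAD / n)])])

# Closed-form inverse from Lemma A.7, with denominator n(s)n_AD(s) - n_A(s)n_D(s).
den = n_s * nAD - nA * nD
M_inv = np.vstack([np.hstack([np.diag(nAD * n / den), np.diag(-nD * n / den)]),
                   np.hstack([np.diag(-nA * n / den), np.diag(n_s * n / den)])])

assert np.allclose(M_inv, np.linalg.inv(M))
```

Because the blocks are diagonal, the inverse reduces to a 2×2 inverse stratum by stratum, which is exactly what the closed form expresses.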
Theorem A.1 (SAT limits). Assume Assumptions 2.1 and 2.2. Then, for every $s\in\mathcal{S}$,
$$\hat{\beta}^{sat}(s) \overset{p}{\to} \beta(s) \equiv E[Y(1)-Y(0)\mid C, S=s]$$
$$\hat{\gamma}^{sat}(s) \overset{p}{\to} \gamma(s) \equiv \pi_{D(1)}(s)E[Y(0)\mid C,S=s] - \pi_{D(0)}(s)E[Y(1)\mid C,S=s] + \pi_{D(0)}(s)E[Y(1)\mid AT,S=s] + (1-\pi_{D(1)}(s))E[Y(0)\mid NT,S=s]$$
$$\hat{P}(S=s,C) \overset{p}{\to} P(S=s,C) \equiv p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))$$
$$\hat{P}(S=s\mid C) \overset{p}{\to} P(S=s\mid C) \equiv \frac{p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))}{\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})(\pi_{D(1)}(\tilde{s})-\pi_{D(0)}(\tilde{s}))}, \quad \text{(A-45)}$$
where $(\hat{\beta}^{sat}(s),\hat{\gamma}^{sat}(s))$ is as in (3.1) and $(\hat{P}(S=s,C),\hat{P}(S=s\mid C))$ is as in (3.3). Also,
$$\hat{\beta}^{sat} \overset{p}{\to} \beta \equiv E[Y(1)-Y(0)\mid C], \qquad \hat{P}(C) \overset{p}{\to} P(C) \equiv \sum_{s\in\mathcal{S}}p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s)), \quad \text{(A-46)}$$
where $\hat{\beta}^{sat}$ is as in (3.4) and $\hat{P}(C)$ is as in (3.3).

Proof. We focus on showing (A-45), as (A-46) follows from (A-45) and the CMT. To show the first line of (A-45), consider the following derivation:
$$\hat{\beta}^{sat}(s) \overset{(1)}{=} \frac{n(s)\,n}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\cdot\frac{1}{n}\sum_{i=1}^n I\{A_i=1,S_i=s\}Y_i - \frac{n_A(s)\,n}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\cdot\frac{1}{n}\sum_{i=1}^n I\{S_i=s\}Y_i$$
$$\overset{(2)}{=} \frac{\begin{aligned}&(n(s)-n_A(s))\textstyle\sum_{i=1}^n 1[D_i=1,A_i=1,S_i=s][\tilde{Y}_i(1)+E[Y(1)\mid S=s]]\\ &+(n(s)-n_A(s))\textstyle\sum_{i=1}^n 1[D_i=0,A_i=1,S_i=s][\tilde{Y}_i(0)+E[Y(0)\mid S=s]]\\ &-n_A(s)\textstyle\sum_{i=1}^n 1[D_i=1,A_i=0,S_i=s][\tilde{Y}_i(1)+E[Y(1)\mid S=s]]\\ &-n_A(s)\textstyle\sum_{i=1}^n 1[D_i=0,A_i=0,S_i=s][\tilde{Y}_i(0)+E[Y(0)\mid S=s]]\end{aligned}}{n(s)n_{AD}(s)-n_A(s)n_D(s)}$$
$$\overset{(3)}{=} \frac{\begin{aligned}&(n(s)-n_A(s))\textstyle\sum_{i=1}^n 1[D_i=1,A_i=1,S_i=s]\tilde{Y}_i(1) + (n(s)-n_A(s))\textstyle\sum_{i=1}^n 1[D_i=0,A_i=1,S_i=s]\tilde{Y}_i(0)\\ &-n_A(s)\textstyle\sum_{i=1}^n 1[D_i=1,A_i=0,S_i=s]\tilde{Y}_i(1) - n_A(s)\textstyle\sum_{i=1}^n 1[D_i=0,A_i=0,S_i=s]\tilde{Y}_i(0)\end{aligned}}{n(s)n_{AD}(s)-n_A(s)n_D(s)} + E[Y(1)-Y(0)\mid S=s], \quad \text{(A-47)}$$
where (1) holds by (3.1) and Lemma A.7, (2) holds by the fact that, conditional on $(D_i,S_i)=(d,s)$, $Y_i=Y_i(d)=\tilde{Y}_i(d)+E[Y(d)\mid S=s]$ (by (A-1)), and (3) holds by (A-2) and the following algebraic derivation:
$$E[Y(1)-Y(0)\mid S=s] = \frac{\begin{aligned}&(n(s)-n_A(s))\textstyle\sum_{i=1}^n 1[D_i=1,A_i=1,S_i=s]E[Y(1)\mid S=s] + (n(s)-n_A(s))\textstyle\sum_{i=1}^n 1[D_i=0,A_i=1,S_i=s]E[Y(0)\mid S=s]\\ &-n_A(s)\textstyle\sum_{i=1}^n 1[D_i=1,A_i=0,S_i=s]E[Y(1)\mid S=s] - n_A(s)\textstyle\sum_{i=1}^n 1[D_i=0,A_i=0,S_i=s]E[Y(0)\mid S=s]\end{aligned}}{n(s)n_{AD}(s)-n_A(s)n_D(s)}.$$
To complete the proof of the first line of (A-45), consider the following derivation:
$$\hat{\beta}^{sat}(s)-\beta(s) \overset{(1)}{=} \frac{\begin{aligned}&\tfrac{n}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\tfrac{1}{\sqrt{n}}R_{n,2}(1,1,s) + \tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\tfrac{n_{AD}(s)}{n_A(s)}\mu(1,1,s)\\ &+\tfrac{n}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\tfrac{1}{\sqrt{n}}R_{n,2}(0,1,s) + \tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\big(1-\tfrac{n_{AD}(s)}{n_A(s)}\big)\mu(0,1,s)\\ &-\tfrac{n}{n(s)}\tfrac{n_A(s)}{n(s)}\tfrac{1}{\sqrt{n}}R_{n,2}(1,0,s) - \tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\mu(1,0,s)\\ &-\tfrac{n}{n(s)}\tfrac{n_A(s)}{n(s)}\tfrac{1}{\sqrt{n}}R_{n,2}(0,0,s) - \tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\big(1-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big)\mu(0,0,s)\\ &+\tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\big(\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big)\big(E[Y(1)-Y(0)\mid S=s]-\beta(s)\big)\end{aligned}}{\tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\big(\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big)}$$
$$\overset{(2)}{=} \frac{\begin{aligned}&\tfrac{n_{AD}(s)}{n_A(s)}\Big(E[Y(1)\mid AT,S=s]\tfrac{\pi_{D(0)}(s)}{\pi_{D(1)}(s)} + E[Y(1)\mid C,S=s]\tfrac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)} - E[Y(1)\mid S=s]\Big)\\ &+\big(1-\tfrac{n_{AD}(s)}{n_A(s)}\big)\big(E[Y(0)\mid NT,S=s]-E[Y(0)\mid S=s]\big) - \tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big(E[Y(1)\mid AT,S=s]-E[Y(1)\mid S=s]\big)\\ &-\big(1-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big)\Big(E[Y(0)\mid NT,S=s]\tfrac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)} + E[Y(0)\mid C,S=s]\tfrac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)} - E[Y(0)\mid S=s]\Big)\\ &+\big(\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big)\big(E[Y(1)-Y(0)\mid S=s]-\beta(s)\big)\end{aligned}}{\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}} + o_p(1) \overset{(3)}{=} o_p(1),$$
as desired, where (1) holds by (A-18), (A-47), and $\beta(s)=E[Y(1)-Y(0)\mid C,S=s]$, (2) holds by Lemma A.3, and (3) holds by Assumption 2.1 and Lemma A.4.

To show the second line of (A-45), consider the following argument:
$$\hat{\gamma}^{sat}(s) \overset{(1)}{=} \frac{n_{AD}(s)\,n}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\cdot\frac{1}{n}\sum_{i=1}^n Y_i I\{S_i=s\} - \frac{n_D(s)\,n}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\cdot\frac{1}{n}\sum_{i=1}^n Y_i A_i I\{S_i=s\}$$
$$\overset{(2)}{=} \frac{\begin{aligned}&-\Big[\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\tfrac{n}{n(s)}\Big]\Big[\tfrac{1}{\sqrt{n}}(R_{n,2}(1,1,s)+R_{n,2}(0,1,s))\Big] + \Big[\tfrac{n_{AD}(s)}{n_A(s)}\tfrac{n_A(s)}{n(s)}\tfrac{n}{n(s)}\Big]\Big[\tfrac{1}{\sqrt{n}}(R_{n,2}(1,0,s)+R_{n,2}(0,0,s))\Big]\\ &-\Big[\tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\tfrac{n_{AD}(s)}{n_A(s)}\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big]\big[\mu(1,1,s)+E[Y(1)\mid S=s]\big] - \Big[\tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\big(1-\tfrac{n_{AD}(s)}{n_A(s)}\big)\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big]\big[\mu(0,1,s)+E[Y(0)\mid S=s]\big]\\ &+\Big[\tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\tfrac{n_{AD}(s)}{n_A(s)}\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big]\big[\mu(1,0,s)+E[Y(1)\mid S=s]\big] + \Big[\tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\tfrac{n_{AD}(s)}{n_A(s)}\big(1-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big)\Big]\big[\mu(0,0,s)+E[Y(0)\mid S=s]\big]\end{aligned}}{\tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\big(\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big)}$$
$$\overset{(3)}{=} \frac{\begin{aligned}&-\pi_{D(0)}(s)\pi_{D(1)}(s)\big[\mu(1,1,s)+E[Y(1)\mid S=s]\big] - \pi_{D(0)}(s)(1-\pi_{D(1)}(s))\big[\mu(0,1,s)+E[Y(0)\mid S=s]\big]\\ &+\pi_{D(1)}(s)\pi_{D(0)}(s)\big[\mu(1,0,s)+E[Y(1)\mid S=s]\big] + \pi_{D(1)}(s)(1-\pi_{D(0)}(s))\big[\mu(0,0,s)+E[Y(0)\mid S=s]\big]\end{aligned}}{\pi_{D(1)}(s)-\pi_{D(0)}(s)} + o_p(1)$$
$$\overset{(4)}{=} \pi_{D(0)}(s)E[Y(1)\mid AT,S=s] + (1-\pi_{D(1)}(s))E[Y(0)\mid NT,S=s] - \pi_{D(0)}(s)E[Y(1)\mid C,S=s] + \pi_{D(1)}(s)E[Y(0)\mid C,S=s] + o_p(1),$$
where (1) follows from (3.1) and Lemma A.7, (2) follows from the fact that, conditional on $(D_i,S_i)=(d,s)$, $Y_i=Y_i(d)=\tilde{Y}_i(d)+E[Y(d)\mid S=s]$ (by (A-1)), (3) follows from Assumptions 2.1 and 2.2, the LLN, and Lemmas A.4 and A.6, and (4) follows from Lemma A.3.

To conclude the proof, note that the third line of (A-45) holds by Lemma A.5. In turn, this and the CMT imply the fourth line of (A-45).

Theorem A.2 (SAT representation). Assume Assumptions 2.1 and 2.2. Then, for any $s\in\mathcal{S}$,
$$\sqrt{n}(\hat{\beta}^{sat}(s)-\beta(s)) = \xi_{n,1}(s)+\xi_{n,2}(s)+\xi_{n,3}(s)+o_p(1),$$
where $\hat{\beta}^{sat}(s)$ is as in (3.1), $\beta(s)$ is as in (3.2), and also
$$\xi_{n,1}(s) \equiv \frac{(1-\pi_A(s))(R_{n,2}(1,1,s)+R_{n,2}(0,1,s)) - \pi_A(s)(R_{n,2}(1,0,s)+R_{n,2}(0,0,s))}{p(s)\pi_A(s)(1-\pi_A(s))(\pi_{D(1)}(s)-\pi_{D(0)}(s))}$$
$$\xi_{n,2}(s) \equiv \frac{R_{n,3}(1,s)}{\pi_{D(1)}(s)-\pi_{D(0)}(s)}\Big[(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s]) - \frac{\pi_{D(0)}(s)}{\pi_{D(1)}(s)}(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s])\Big]$$
$$\xi_{n,3}(s) \equiv \frac{R_{n,3}(2,s)}{\pi_{D(1)}(s)-\pi_{D(0)}(s)}\Big[(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s]) - \frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s])\Big], \quad \text{(A-48)}$$
with $R_{n,2}$ and $R_{n,3}$ as in (A-18).

Proof. Consider the following derivation.
$$\sqrt{n}(\hat{\beta}^{sat}(s)-\beta(s)) \overset{(1)}{=} \frac{n(n(s)-n_A(s))}{n(s)n_{AD}(s)-n_A(s)n_D(s)}(R_{n,2}(1,1,s)+R_{n,2}(0,1,s)) - \frac{n_A(s)\,n}{n(s)n_{AD}(s)-n_A(s)n_D(s)}(R_{n,2}(1,0,s)+R_{n,2}(0,0,s))$$
$$\quad + \frac{R_{n,3}(1,s)}{\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}}\Big[(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s]) - \frac{\pi_{D(0)}(s)}{\pi_{D(1)}(s)}(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s])\Big]$$
$$\quad + \frac{R_{n,3}(2,s)}{\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}}\Big[(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s]) - \frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s])\Big]$$
$$\overset{(2)}{=} \xi_{n,1}(s)+\xi_{n,2}(s)+\xi_{n,3}(s)+o_p(1),$$
where (1) holds by (A-18), (A-47), and $\beta(s)=E[Y(1)-Y(0)\mid C,S=s]$, and (2) holds by the auxiliary derivations in (A-49) and (A-50) that appear below.

The first auxiliary derivation is
$$\frac{n}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\left(\begin{aligned}&(n(s)-n_A(s))\tfrac{1}{\sqrt{n}}\textstyle\sum_{i=1}^n 1[D_i=1,A_i=1,S_i=s](\tilde{Y}_i(1)-\mu(1,1,s))\\ &+(n(s)-n_A(s))\tfrac{1}{\sqrt{n}}\textstyle\sum_{i=1}^n 1[D_i=0,A_i=1,S_i=s](\tilde{Y}_i(0)-\mu(0,1,s))\\ &-n_A(s)\tfrac{1}{\sqrt{n}}\textstyle\sum_{i=1}^n 1[D_i=1,A_i=0,S_i=s](\tilde{Y}_i(1)-\mu(1,0,s))\\ &-n_A(s)\tfrac{1}{\sqrt{n}}\textstyle\sum_{i=1}^n 1[D_i=0,A_i=0,S_i=s](\tilde{Y}_i(0)-\mu(0,0,s))\end{aligned}\right)$$
$$\overset{(1)}{=} \frac{\tfrac{n}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)(R_{n,2}(1,1,s)+R_{n,2}(0,1,s)) - \tfrac{n}{n(s)}\tfrac{n_A(s)}{n(s)}(R_{n,2}(1,0,s)+R_{n,2}(0,0,s))}{\tfrac{n_A(s)}{n(s)}\big(1-\tfrac{n_A(s)}{n(s)}\big)\big(\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big)} \overset{(2)}{=} \xi_{n,1}(s)+o_p(1), \quad \text{(A-49)}$$
where (1) holds by (A-18), and (2) holds by Lemma A.5, as this implies that $R_{n,2}=O_p(1)$ and also that $n(s)/n \overset{p}{\to} p(s)$, $n_A(s)/n(s) \overset{p}{\to} \pi_A(s)$, $n_{AD}(s)/n_A(s) \overset{p}{\to} \pi_{D(1)}(s)$, and $(n_D(s)-n_{AD}(s))/(n(s)-n_A(s)) \overset{p}{\to} \pi_{D(0)}(s)$ for all $s\in\mathcal{S}$.

The second auxiliary derivation is
$$\sqrt{n}\left(\frac{\begin{aligned}&(n(s)-n_A(s))n_{AD}(s)\mu(1,1,s) + (n(s)-n_A(s))(n_A(s)-n_{AD}(s))\mu(0,1,s)\\ &-n_A(s)(n_D(s)-n_{AD}(s))\mu(1,0,s) - n_A(s)(n(s)-n_A(s)-n_D(s)+n_{AD}(s))\mu(0,0,s)\end{aligned}}{n(s)n_{AD}(s)-n_A(s)n_D(s)} + E[Y(1)-Y(0)\mid S=s] - E[Y(1)-Y(0)\mid C,S=s]\right)$$
$$\overset{(1)}{=} \sqrt{n}\left(\begin{aligned}&\Big(\tfrac{(n(s)-n_A(s))n_{AD}(s)}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\cdot\tfrac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)}-1\Big)E[Y(1)\mid C,S=s]\\ &+\Big(1-\tfrac{n_A(s)(n(s)-n_A(s)-n_D(s)+n_{AD}(s))}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\cdot\tfrac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)}\Big)E[Y(0)\mid C,S=s]\\ &+\Big(\tfrac{(n(s)-n_A(s))n_{AD}(s)}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\cdot\tfrac{\pi_{D(0)}(s)}{\pi_{D(1)}(s)}-\tfrac{n_A(s)(n_D(s)-n_{AD}(s))}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\Big)E[Y(1)\mid AT,S=s]\\ &+\Big(\tfrac{(n(s)-n_A(s))(n_A(s)-n_{AD}(s))}{n(s)n_{AD}(s)-n_A(s)n_D(s)}-\tfrac{n_A(s)(n(s)-n_A(s)-n_D(s)+n_{AD}(s))}{n(s)n_{AD}(s)-n_A(s)n_D(s)}\cdot\tfrac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}\Big)E[Y(0)\mid NT,S=s]\end{aligned}\right)$$
$$\overset{(2)}{=} \frac{R_{n,3}(2,s)-R_{n,3}(1,s)\tfrac{\pi_{D(0)}(s)}{\pi_{D(1)}(s)}}{\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}}\big(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s]\big) + \frac{R_{n,3}(1,s)-R_{n,3}(2,s)\tfrac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}}{\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}}\big(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s]\big)$$
$$\overset{(3)}{=} \xi_{n,2}(s)+\xi_{n,3}(s)+o_p(1), \quad \text{(A-50)}$$
where (1) holds by Lemma A.3, (2) holds by (A-18), and (3) holds by Lemma A.5, as this implies that $R_{n,3}=O_p(1)$ and also that $n(s)/n \overset{p}{\to} p(s)$ and $n_A(s)/n(s) \overset{p}{\to} \pi_A(s)$ for all $s\in\mathcal{S}$.
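The stratum-level consistency established above can be illustrated with a small Monte Carlo. The simulation below is my own sketch, not code from the paper: all numbers (shares of always-takers, never-takers, and compliers, assignment targets, effects) are hypothetical, and simple random assignment within strata stands in for a covariate-adaptive scheme, which satisfies the assignment assumptions trivially. It computes the fully saturated IV estimator, which here reduces to a stratum-by-stratum Wald ratio, and compares it to the true stratum LATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two strata; hypothetical primitives.
p = [0.6, 0.4]                       # p(s)
piA = [0.5, 0.3]                     # pi_A(s), assignment target per stratum
shares = {0: (0.1, 0.2, 0.7),        # (P(AT|s), P(NT|s), P(C|s))
          1: (0.2, 0.3, 0.5)}
late = {0: 1.0, 1: 2.0}              # E[Y(1)-Y(0)|C,S=s], the stratum LATEs

S = rng.choice([0, 1], size=n, p=p)
A = (rng.random(n) < np.array(piA)[S]).astype(int)   # assignment within stratum
T = np.empty(n, dtype=int)                           # compliance type: 0=AT, 1=NT, 2=C
for s in (0, 1):
    idx = S == s
    T[idx] = rng.choice(3, size=idx.sum(), p=shares[s])
D = np.where(T == 0, 1, np.where(T == 1, 0, A))      # AT always take, NT never do, C comply

Y0 = 0.5 * S + 0.3 * T + rng.normal(0, 1, n)
Y1 = Y0 + np.where(T == 2, np.array([late[0], late[1]])[S], 0.4)
Y = np.where(D == 1, Y1, Y0)

# Fully saturated IV = per-stratum Wald ratio: beta_hat_sat(s) -> beta(s).
est = {}
for s in (0, 1):
    m = S == s
    num = Y[m & (A == 1)].mean() - Y[m & (A == 0)].mean()
    den = D[m & (A == 1)].mean() - D[m & (A == 0)].mean()
    est[s] = num / den
print(est)
```

With this sample size, each stratum estimate lands within a couple of standard errors of its target LATE, consistent with the per-stratum convergence the proof establishes.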
Theorem A.3 (SAT strata-specific asymptotic distribution). Under Assumptions 2.1 and 2.2, and for any $s\in\mathcal{S}$,
$$\{\sqrt{n}(\hat{\beta}^{sat}(s)-\beta(s)) : s\in\mathcal{S}\}' \overset{d}{\to} N\big(0, \mathrm{diag}(\{V^{sat}_{Y,1}(s)+V^{sat}_{Y,0}(s)+V^{sat}_{D,1}(s)+V^{sat}_{D,0}(s) : s\in\mathcal{S}\})\big),$$
where $\hat{\beta}^{sat}(s)$ is as in (3.1), $\beta(s)$ is as in (3.2), and
$$V^{sat}_{Y,1}(s) \equiv \frac{\begin{aligned}&V[Y(1)\mid S=s,AT]\pi_{D(0)}(s) + V[Y(0)\mid S=s,NT](1-\pi_{D(1)}(s)) + V[Y(1)\mid S=s,C](\pi_{D(1)}(s)-\pi_{D(0)}(s))\\ &+(E[Y(1)\mid S=s,C]-E[Y(1)\mid S=s,AT])^2\,\pi_{D(0)}(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))/\pi_{D(1)}(s)\end{aligned}}{p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))^2\,\pi_A(s)}$$
$$V^{sat}_{Y,0}(s) \equiv \frac{\begin{aligned}&V[Y(1)\mid S=s,AT]\pi_{D(0)}(s) + V[Y(0)\mid S=s,NT](1-\pi_{D(1)}(s)) + V[Y(0)\mid S=s,C](\pi_{D(1)}(s)-\pi_{D(0)}(s))\\ &+(E[Y(0)\mid S=s,C]-E[Y(0)\mid S=s,NT])^2\,(1-\pi_{D(1)}(s))(\pi_{D(1)}(s)-\pi_{D(0)}(s))/(1-\pi_{D(0)}(s))\end{aligned}}{p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))^2\,(1-\pi_A(s))}$$
$$V^{sat}_{D,1}(s) \equiv \frac{(1-\pi_{D(1)}(s))\big[\pi_{D(0)}(s)(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s]) - \pi_{D(1)}(s)(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s])\big]^2}{p(s)\,\pi_A(s)\,\pi_{D(1)}(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))^2}$$
$$V^{sat}_{D,0}(s) \equiv \frac{\pi_{D(0)}(s)\big[(1-\pi_{D(0)}(s))(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s]) - (1-\pi_{D(1)}(s))(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s])\big]^2}{p(s)(1-\pi_A(s))(1-\pi_{D(0)}(s))(\pi_{D(1)}(s)-\pi_{D(0)}(s))^2}.$$
Proof.
This result follows from Lemma A.5 and Theorem A.2.
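The population objects appearing in the results above are simple functions of the primitives $(p(s),\pi_A(s),\pi_{D(0)}(s),\pi_{D(1)}(s))$. The sketch below is my own illustration with made-up numbers: it computes the cell fractions $F_{G(d,a,s)+1}-F_{G(d,a,s)}$ from (A-37) and the complier quantities $P(S=s,C)$, $P(C)$, and $P(S=s\mid C)$ from Theorem A.1, and checks that each set adds up as it should.

```python
# Hypothetical primitives for two strata.
p = {1: 0.6, 2: 0.4}            # p(s)
piA = {1: 0.5, 2: 0.3}          # pi_A(s)
piD0 = {1: 0.1, 2: 0.2}         # pi_{D(0)}(s) = P(D(0)=1|S=s), the always-taker share
piD1 = {1: 0.8, 2: 0.7}         # pi_{D(1)}(s) = P(D(1)=1|S=s)

# Cell fractions from (A-37), one per (d, a, s).
incr = {}
for s in p:
    incr[(0, 0, s)] = (1 - piD0[s]) * (1 - piA[s]) * p[s]
    incr[(1, 0, s)] = piD0[s] * (1 - piA[s]) * p[s]
    incr[(0, 1, s)] = (1 - piD1[s]) * piA[s] * p[s]
    incr[(1, 1, s)] = piD1[s] * piA[s] * p[s]
# The 4|S| cell fractions partition [0, 1].
assert abs(sum(incr.values()) - 1.0) < 1e-9

# Complier quantities from Theorem A.1: P(S=s,C) = p(s)(piD1(s) - piD0(s)).
P_sC = {s: p[s] * (piD1[s] - piD0[s]) for s in p}
P_C = sum(P_sC.values())
w = {s: P_sC[s] / P_C for s in p}   # P(S=s|C)
assert abs(sum(w.values()) - 1.0) < 1e-9
print(P_C, w)
```

The first check mirrors the induction behind (A-37): within each stratum the four $(d,a)$ cells exhaust the stratum's mass $p(s)$, so across strata the increments sum to one. The second check mirrors the normalization of $\hat{P}(S=s\mid C)$ used in the proof of Theorem 3.1.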
Proof of Theorem 3.1.
Before proving the result, we generate several auxiliary derivations for an arbitrary $s\in\mathcal{S}$. The first auxiliary derivation is as follows:
$$\sqrt{n}[\hat{P}(S=s\mid C)-P(S=s\mid C)] \overset{(1)}{=} \sqrt{n}\left[\frac{\tfrac{n(s)}{n}\big(\tfrac{n_{AD}(s)}{n_A(s)}-\tfrac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\big)}{\sum_{\tilde{s}\in\mathcal{S}}\tfrac{n(\tilde{s})}{n}\big(\tfrac{n_{AD}(\tilde{s})}{n_A(\tilde{s})}-\tfrac{n_D(\tilde{s})-n_{AD}(\tilde{s})}{n(\tilde{s})-n_A(\tilde{s})}\big)} - \frac{P(S=s,C)}{P(C)}\right]$$
$$\overset{(2)}{=} \frac{1}{\hat{P}(C)}\left(\begin{aligned}&\tfrac{n(s)}{n}(R_{n,3}(1,s)-R_{n,3}(2,s)) + (\pi_{D(1)}(s)-\pi_{D(0)}(s))R_{n,4}(s)\\ &-P(S=s\mid C)\textstyle\sum_{\tilde{s}\in\mathcal{S}}\tfrac{n(\tilde{s})}{n}(R_{n,3}(1,\tilde{s})-R_{n,3}(2,\tilde{s})) - P(S=s\mid C)\textstyle\sum_{\tilde{s}\in\mathcal{S}}(\pi_{D(1)}(\tilde{s})-\pi_{D(0)}(\tilde{s}))R_{n,4}(\tilde{s})\end{aligned}\right)$$
$$\overset{(3)}{=} \frac{1}{P(C)}\left(\begin{aligned}&p(s)(R_{n,3}(1,s)-R_{n,3}(2,s)) + (\pi_{D(1)}(s)-\pi_{D(0)}(s))R_{n,4}(s)\\ &-P(S=s\mid C)\textstyle\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})(R_{n,3}(1,\tilde{s})-R_{n,3}(2,\tilde{s})) - P(S=s\mid C)\textstyle\sum_{\tilde{s}\in\mathcal{S}}(\pi_{D(1)}(\tilde{s})-\pi_{D(0)}(\tilde{s}))R_{n,4}(\tilde{s})\end{aligned}\right) + o_p(1), \quad \text{(A-51)}$$
where (1) holds by (A-46), (2) holds by Lemma A.5, and (3) follows from (A-46) and Lemma A.5, as these imply that $R_{n,3}=O_p(1)$ and $R_{n,4}=O_p(1)$, and from Assumption 2.1, as it implies that $n(s)/n \overset{p}{\to} p(s)$ for all $s\in\mathcal{S}$.

The second auxiliary derivation is as follows:
$$\sum_{s\in\mathcal{S}}\hat{P}(S=s\mid C)\sqrt{n}(\hat{\beta}^{sat}(s)-\beta(s)) \overset{(1)}{=} \frac{1}{P(C)}\sum_{s\in\mathcal{S}}p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))\sqrt{n}(\hat{\beta}^{sat}(s)-\beta(s)) + o_p(1)$$
$$\overset{(2)}{=} \frac{1}{P(C)}\left(\begin{aligned}&\textstyle\sum_{s\in\mathcal{S}}\tfrac{1}{\pi_A(s)}(R_{n,2}(1,1,s)+R_{n,2}(0,1,s)) - \textstyle\sum_{s\in\mathcal{S}}\tfrac{1}{1-\pi_A(s)}(R_{n,2}(1,0,s)+R_{n,2}(0,0,s))\\ &+\textstyle\sum_{s\in\mathcal{S}}p(s)(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s])\big(R_{n,3}(2,s)-R_{n,3}(1,s)\tfrac{\pi_{D(0)}(s)}{\pi_{D(1)}(s)}\big)\\ &+\textstyle\sum_{s\in\mathcal{S}}p(s)(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s])\big(R_{n,3}(1,s)-R_{n,3}(2,s)\tfrac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}\big)\end{aligned}\right) + o_p(1), \quad \text{(A-52)}$$
where (1) holds by (A-45) and Theorem A.3, as this implies $\sqrt{n}(\hat{\beta}^{sat}(s)-\beta(s))=O_p(1)$ for all $s\in\mathcal{S}$, and (2) follows from Theorem A.2.

We are now ready to complete the proof of the desired result. To this end, consider the following derivation:
$$\sqrt{n}(\hat{\beta}^{sat}-\beta) = \sum_{s\in\mathcal{S}}\hat{P}(S=s\mid C)\sqrt{n}(\hat{\beta}^{sat}(s)-\beta(s)) + \sum_{s\in\mathcal{S}}\beta(s)\sqrt{n}(\hat{P}(S=s\mid C)-P(S=s\mid C))$$
$$\overset{(1)}{=} \frac{1}{P(C)}\left(\begin{aligned}&\textstyle\sum_{s\in\mathcal{S}}\tfrac{1}{\pi_A(s)}(R_{n,2}(1,1,s)+R_{n,2}(0,1,s)) - \textstyle\sum_{s\in\mathcal{S}}\tfrac{1}{1-\pi_A(s)}(R_{n,2}(1,0,s)+R_{n,2}(0,0,s))\\ &+\textstyle\sum_{s\in\mathcal{S}}p(s)(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s])\big(R_{n,3}(2,s)-R_{n,3}(1,s)\tfrac{\pi_{D(0)}(s)}{\pi_{D(1)}(s)}\big)\\ &+\textstyle\sum_{s\in\mathcal{S}}p(s)(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s])\big(R_{n,3}(1,s)-R_{n,3}(2,s)\tfrac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}\big)\\ &+\textstyle\sum_{s\in\mathcal{S}}\beta(s)p(s)(R_{n,3}(1,s)-R_{n,3}(2,s)) + \textstyle\sum_{s\in\mathcal{S}}\beta(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))R_{n,4}(s)\\ &-\textstyle\sum_{s\in\mathcal{S}}\beta(s)P(S=s\mid C)\big[\textstyle\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})(R_{n,3}(1,\tilde{s})-R_{n,3}(2,\tilde{s}))\big] - \textstyle\sum_{s\in\mathcal{S}}\beta(s)P(S=s\mid C)\big[\textstyle\sum_{\tilde{s}\in\mathcal{S}}(\pi_{D(1)}(\tilde{s})-\pi_{D(0)}(\tilde{s}))R_{n,4}(\tilde{s})\big]\end{aligned}\right) + o_p(1)$$
$$\overset{(2)}{=} \frac{1}{P(C)}\left(\begin{aligned}&\textstyle\sum_{s\in\mathcal{S}}\tfrac{1}{\pi_A(s)}(R_{n,2}(1,1,s)+R_{n,2}(0,1,s)) - \textstyle\sum_{s\in\mathcal{S}}\tfrac{1}{1-\pi_A(s)}(R_{n,2}(1,0,s)+R_{n,2}(0,0,s))\\ &+\textstyle\sum_{s\in\mathcal{S}}p(s)\Big((\beta(s)-\beta) + E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s] - (E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s])\tfrac{\pi_{D(0)}(s)}{\pi_{D(1)}(s)}\Big)R_{n,3}(1,s)\\ &+\textstyle\sum_{s\in\mathcal{S}}p(s)\Big(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s] - (E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s])\tfrac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)} - (\beta(s)-\beta)\Big)R_{n,3}(2,s)\\ &+\textstyle\sum_{s\in\mathcal{S}}(\beta(s)-\beta)(\pi_{D(1)}(s)-\pi_{D(0)}(s))R_{n,4}(s)\end{aligned}\right) + o_p(1), \quad \text{(A-53)}$$
where (1) holds by (A-51) and (A-52), and (2) holds by $\beta=\sum_{s\in\mathcal{S}}\beta(s)P(S=s\mid C)$. The desired result follows from combining (A-53) with Lemmas A.3 and A.5.

Lemma A.8 (SAT residuals). Assume Assumptions 2.1 and 2.2. Let $\{u_i\}_{i=1}^n$ denote the population version of the SAT regression residuals, defined by
$$u_i \equiv Y_i - \sum_{s\in\mathcal{S}} I\{S_i=s\}\gamma(s) - \sum_{s\in\mathcal{S}} D_i I\{S_i=s\}\beta(s), \quad \text{(A-54)}$$
where $\{(\beta(s),\gamma(s)) : s\in\mathcal{S}\}$ is as in (3.2).
Then, the SAT residuals { ˆ u i } ni =1 defined in (3.7) are such that, for any ( d, a, s ) ∈ { , } × S , n n X i =1 I { D i = d, A i = a, S i = s } ˆ u i = 1 n n X i =1 I { D i = d, A i = a, S i = s } u i + o p (1) = o p (1)+ p ( s ) − I { ( d, a ) = (1 , } π A ( s )(1 − π D (1) ( s )) π D (0) ( s )( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − π D (1) ( s )( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) ! + I { ( d, a ) = (0 , } π A ( s )(1 − π D (1) ( s )) π D (0) ( s )( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − π D (1) ( s )( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) ! − I { ( d, a ) = (1 , } (1 − π A ( s )) π D (0) ( s ) (1 − π D (0) ( s ))( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − (1 − π D (1) ( s ))( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) ! + I { ( d, a ) = (0 , } (1 − π A ( s )) π D (0) ( s ) (1 − π D (0) ( s ))( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − (1 − π D (1) ( s ))( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) ! (A-55) and n n X i =1 I { D i = d, A i = a, S i = s } ˆ u i = 1 n n X i =1 I { D i = d, A i = a, S i = s } u i + o p (1) = o p (1)+ p ( s ) I { ( d, a ) = (1 , } π A ( s ) π D (1) ( s ) × " σ (1 , , s ) + π D (0) ( s )( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − π D (1) ( s )( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) ! ( − π D (1) ( s ) π D (1) ( s ) ) + I { ( d, a ) = (0 , } π A ( s )(1 − π D (1) ( s )) × " σ (0 , , s ) + π D (0) ( s )( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − π D (1) ( s )( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) ! + I { ( d, a ) = (1 , } (1 − π A ( s )) π D (0) ( s ) × " σ (1 , , s ) + (1 − π D (0) ( s ))( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − (1 − π D (1) ( s ))( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) ! 
+ I{(d, a) = (0, 0)} (1 − π_A(s))(1 − π_{D(0)}(s)) × [ σ(0, 0, s) + ( (1 − π_{D(0)}(s))(E[Y(1) | C, S = s] − E[Y(1) | AT, S = s]) − (1 − π_{D(1)}(s))(E[Y(0) | C, S = s] − E[Y(0) | NT, S = s]) ) ( π_{D(0)}(s) / (1 − π_{D(0)}(s)) ) ]. (A-56)

Proof.
We only show (A-55), as the proof of (A-56) follows from analogous arguments. Fix ( d, a, s ) ∈ { , } × S arbitrarily. To show the first equality in (A-55), consider the following argument.1 n n X i =1 I { D i = d, A i = a, S i = s } ˆ u i = 1 n n X i =1 I { D i = d, A i = a, S i = s } [ u i + γ ( s ) − ˆ γ sat ( s ) + I { d = 1 } ( β ( s ) − ˆ β sat ( s ))] (1) = 1 n n X i =1 I { D i = d, A i = a, S i = s } u i + o p (1) , where (1) holds by Theorem A.1 and ( n P ni =1 I { D i = d, A i = a, S i = s } u i ) ≤ n P ni =1 u i = O p (1). To show the econd equality in (A-55), consider the following argument.1 n n X i =1 I { D i = d, A i = a, S i = s } u i = 1 n n X i =1 I { D i = d, A i = a, S i = s } ( Y i − γ ( s ) − dβ ( s )) (1) = 1 n n X i =1 I { D i = d, A i = a, S i = s } ( ˜ Y i ( d ) − µ ( d, a, s )) + ( µ ( d, a, s ) + E [ Y ( d ) | S = s ])+( π D (0) ( s ) − d ) E [ Y (1) | C, S = s ] + ( d − π D (1) ( s )) E [ Y (0) | C, S = s ] − π D (0) ( s ) E [ Y (1) | AT, S = s ] − (1 − π D (1) ( s )) E [ Y (0) | NT, S = s ] (2) = p ( s ) [ I { ( d, a ) = (0 , } − I { ( d, a ) = (1 , } ] π A ( s )(1 − π D (1) ( s )) × π D (0) ( s )( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − π D (1) ( s )( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) ! +[ I { ( d, a ) = (0 , } − I { ( d, a ) = (1 , } ](1 − π A ( s )) π D (0) ( s ) × (1 − π D (0) ( s ))( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − (1 − π D (1) ( s ))( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) ! + o p (1) , where (1) holds by Theorem A.1, β ( s ) = E [ Y (1) − Y (0) | C, S = s ], and the fact that, conditional on ( D i , S i ) = ( d, s ), Y i = Y i ( d ) = ˜ Y i ( d ) + E [ Y ( d ) | S = s ] (by (A-1)), (2) follows from Lemmas A.3, A.4, and A.6, and Assumptions2.2(b). Proof of Theorem 3.2.
The desired result follows from showing thatˆ V sat1 p → V sat Y, + V sat D, , ˆ V sat0 p → V sat Y, + V sat D, , and ˆ V sat H p → V sat H . (A-57)We only show the result in (A-57), as the others can be shown analogously. Consider the following derivation.ˆ V sat1 = 1ˆ P ( C ) X s ∈S ( 1 n A ( s ) /n ( s ) ) " n P ni =1 I { D i = 1 , A i = 1 , S i = s } (ˆ u i + (1 − n AD ( s ) n A ( s ) )( ˆ β sat ( s ) − ˆ β sat )) + n P ni =1 I { D i = 0 , A i = 1 , S i = s } (ˆ u i − n AD ( s ) n A ( s ) ( ˆ β sat ( s ) − ˆ β sat )) (1) = 1 P ( C ) X s ∈S p ( s ) π A ( s ) π D (1) ( s ) σ (1 , , s ) + (1 − π D (1) ( s )) σ (0 , , s )+ − π D (1) ( s ) π D (1) ( s ) π D (0) ( s )( E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ]) − π D (1) ( s )( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) − π D (1) ( s )( β ( s ) − β ) + o p (1) (2) = V sat Y, + V sat D, + o p (1) , where (1) holds by Theorem A.1 and Lemma A.8, and (2) holds by Lemma A.3. Theorem A.4 (Estimation of primitive parameters) . Under Assumptions 2.1 and 2.2, the primitive parameters in (2.3) can be consistently estimated. In particular, for any s ∈ S , (cid:18) n ( s ) n , n A ( s ) n ( s ) , n AD ( s ) n A ( s ) , n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) (cid:19) p → ( p ( s ) , π A ( s ) , π D (1) ( s ) , π D (0) ( s )) . 
(A-58) lso, provided that the conditioning event has positive probability, ˆ E [ Y (0) | NT, S = s ] ≡ n/ n A ( s ) − n AD ( s ) " n P ni =1 I { D i = 0 , A i = 1 , S i = s } ˆ u i − n P ni =1 I { D i = 1 , A i = 1 , S i = s } ˆ u i + ˆ γ sat ( s ) p → E [ Y (0) | NT, S = s ]ˆ E [ Y (1) | AT, S = s ] ≡ n/ n D ( s ) − n AD ( s ) " n P ni =1 I { D i = 1 , A i = 0 , S i = s } ˆ u i − n P ni =1 I { D i = 0 , A i = 0 , S i = s } ˆ u i + ˆ β sat ( s ) + ˆ γ sat ( s ) p → E [ Y (1) | AT, S = s ]ˆ E [ Y (0) | C, S = s ] ≡ nAD ( s ) nA ( s ) − nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) ˆ γ sat ( s ) + n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ˆ β sat ( s ) − (1 − n AD ( s ) n A ( s ) ) ˆ E [ Y (0) | NT, S = s ] − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ˆ E [ Y (1) | AT, S = s ] p → E [ Y (0) | C, S = s ]ˆ E [ Y (1) | C, S = s ] ≡ ˆ β sat ( s ) + ˆ E [ Y (0) | C, S = s ] p → E [ Y (1) | C, S = s ] (A-59) and ˆ V [ Y (1) | AT, S = s ] ≡ n ( s ) n (1 − nA ( s ) n ( s ) ) nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) n P ni =1 I { D i = 1 , A i = 0 , S i = s } ˆ u i − (1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) )( ˆ E [ Y (1) | C, S = s ] − ˆ E [ Y (1) | AT, S = s ]) − (1 − n AD ( s ) n A ( s ) )( ˆ E [ Y (0) | C, S = s ] − ˆ E [ Y (0) | NT, S = s ]) ! p → V [ Y (1) | AT, S = s ]ˆ V [ Y (0) | NT, S = s ] ≡ nn A ( s ) − n AD ( s ) 1 n P ni =1 I { D i = 0 , A i = 1 , S i = s } ˆ u i − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ( ˆ E [ Y (1) | C, S = s ] − ˆ E [ Y (1) | AT, S = s ]) − n AD ( s ) n A ( s ) ( ˆ E [ Y (0) | C, S = s ] − ˆ E [ Y (0) | NT, S = s ]) ! p → V [ Y (0) | NT, S = s ]ˆ V [ Y (1) | C, S = s ] ≡ nAD ( s ) nA ( s ) − nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) × nn A ( s ) 1 n P ni =1 I { D i = 1 , A i = 1 , S i = s } ˆ u i − (1 − nAD ( s ) nA ( s ) ) nAD ( s ) nA ( s ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ( ˆ E [ Y (1) | C, S = s ] − ˆ E [ Y (1) | AT, S = s ]) − n AD ( s ) n A ( s ) ( ˆ E [ Y (0) | C, S = s ] − ˆ E [ Y (0) | NT, S = s ]) ! 
− nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) ( nAD ( s ) nA ( s ) − nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) ) nAD ( s ) nA ( s ) ( ˆ E [ Y (1) | C, S = s ] − ˆ E [ Y (1) | AT, S = s ]) − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ˆ V [ Y (1) | AT, S = s ] p → V [ Y (1) | C, S = s ]ˆ V [ Y (0) | C, S = s ] ≡ nAD ( s ) nA ( s ) − nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) × n ( n ( s ) − n A ( s )) 1 n P ni =1 I { D i = 0 , A i = 0 , S i = s } ˆ u i − ( nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) ) (1 − nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) ) (1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) )( ˆ E [ Y (1) | C, S = s ] − ˆ E [ Y (1) | AT, S = s ]) − (1 − n AD ( s ) n A ( s ) )( ˆ E [ Y (0) | C, S = s ] − ˆ E [ Y (0) | NT, S = s ]) ! − nA ( s ) − nAD ( s ) nA ( s ) ( nAD ( s ) nA ( s ) − nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) )(1 − nD ( s ) − nAD ( s ) n ( s ) − nA ( s ) ) ( ˆ E [ Y (0) | C, S = s ] − ˆ E [ Y (0) | NT, S = s ]) − (1 − n AD ( s ) n A ( s ) ) ˆ V [ Y (0) | NT, S = s ] p → V [ Y (0) | C, S = s ] . (A-60) Proof.
The first convergence in (A-58) holds by Assumption 2.1 and the LLN. The second convergence of (A-58) is imposed in Assumption 2.2(b). The remaining results hold by Lemma A.4.

We next show the first line of (A-59). By Lemma A.8,

(1/2) [ (1/n) Σ_{i=1}^n I{D_i = 0, A_i = 1, S_i = s} û_i − (1/n) Σ_{i=1}^n I{D_i = 1, A_i = 1, S_i = s} û_i ]
→_p p(s) π_A(s)(1 − π_{D(1)}(s)) ( π_{D(0)}(s)(E[Y(1) | C, S = s] − E[Y(1) | AT, S = s]) − π_{D(1)}(s)(E[Y(0) | C, S = s] − E[Y(0) | NT, S = s]) ). (A-61)

Then, consider the following derivation.

Ê[Y(0) | NT, S = s] = [ (n(s)/n)(n_A(s)/n(s))(1 − n_AD(s)/n_A(s)) ]^{−1} (1/2) [ (1/n) Σ_{i=1}^n I{D_i = 0, A_i = 1, S_i = s} û_i − (1/n) Σ_{i=1}^n I{D_i = 1, A_i = 1, S_i = s} û_i ] + γ̂_sat(s) →_p E[Y(0) | NT, S = s], (A-62)

where the convergence follows from (A-61), Assumptions 2.1 and 2.2(b), Lemma A.4, and Theorem A.1. An analogous argument can be used to show the second line of (A-59). Next, note that the third line of (A-59) follows from Assumptions 2.1 and 2.2(b), Lemma A.4, the first and second lines of (A-59), and Theorem A.1. Finally, note that the last line of (A-59) follows from the third line of (A-59) and Theorem A.1.

To show the first line of (A-60), note that V̂[Y(1) | AT, S = s], as defined in (A-60), satisfies V̂[Y(1) | AT, S = s] →_p V[Y(1) | AT, S = s], where the convergence holds by (A-56) and (A-59), and Assumptions 2.1 and 2.2(b). An analogous argument can be used to show the remaining lines of (A-60).
This result follows from elementary convergence arguments and Theorems 3.1 and 3.2.
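The sample-frequency estimators in (A-58) are easy to illustrate numerically. The sketch below uses entirely hypothetical primitive values and assigns treatment i.i.d. within strata (simple randomization, a special case of the covariate-adaptive schemes covered by Assumption 2.2); it checks that the four sample fractions approach (p(s), π_A(s), π_{D(1)}(s), π_{D(0)}(s)).

```python
import numpy as np

# Sample-frequency estimators from (A-58):
# (n(s)/n, n_A(s)/n(s), n_AD(s)/n_A(s), (n_D(s)-n_AD(s))/(n(s)-n_A(s)))
# should converge to (p(s), pi_A(s), pi_D(1)(s), pi_D(0)(s)).
rng = np.random.default_rng(0)
n = 200_000

# Hypothetical primitives for two strata.
p = np.array([0.4, 0.6])       # p(s) = P(S = s)
pi_A = np.array([0.5, 0.7])    # P(A = 1 | S = s)
pi_D1 = np.array([0.9, 0.8])   # P(D = 1 | A = 1, S = s)
pi_D0 = np.array([0.2, 0.1])   # P(D = 1 | A = 0, S = s)

S = rng.choice(2, size=n, p=p)                 # stratum labels
A = (rng.random(n) < pi_A[S]).astype(int)      # treatment assignment
D = (rng.random(n) < np.where(A == 1, pi_D1[S], pi_D0[S])).astype(int)  # take-up

est = {}
for s in range(2):
    n_s = np.sum(S == s)
    nA_s = np.sum((S == s) & (A == 1))
    nAD_s = np.sum((S == s) & (A == 1) & (D == 1))
    nD_s = np.sum((S == s) & (D == 1))
    est[s] = (n_s / n, nA_s / n_s, nAD_s / nA_s, (nD_s - nAD_s) / (n_s - nA_s))
    print(s, np.round(est[s], 3))
```

With n = 200,000 each fraction is within a few thousandths of its population counterpart, consistent with the LLN argument in the proof of Theorem A.4.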
A.4 Proofs of results related to Section 4
Lemma A.9 (SFE matrices). Assume Assumptions 2.1 and 2.2. Then,

Z_n^sfe′ X_n^sfe / n = [ diag{n(s)/n : s ∈ S}  {n_D(s)/n : s ∈ S} ; {n_A(s)/n : s ∈ S}′  n_AD/n ]
= [ diag{p(s) : s ∈ S}  {[π_{D(1)}(s)π_A(s) + π_{D(0)}(s)(1 − π_A(s))] p(s) : s ∈ S} ; {π_A(s) p(s) : s ∈ S}′  Σ_{s∈S} π_{D(1)}(s)π_A(s)p(s) ] + o_p(1).

Thus,

(Z_n^sfe′ X_n^sfe / n)^{−1} = [ diag{n/n(s) : s ∈ S}  0_{|S|×1} ; 0_{1×|S|}  0 ]
+ ( n_AD/n − Σ_{s∈S} (n_A(s)/n(s))(n_D(s)/n(s))(n(s)/n) )^{−1} [ {n_D(s)/n(s) : s ∈ S}{n_A(s)/n(s) : s ∈ S}′  {−n_D(s)/n(s) : s ∈ S} ; {−n_A(s)/n(s) : s ∈ S}′  1 ]
= [ diag{1/p(s) : s ∈ S}  0_{|S|×1} ; 0_{1×|S|}  0 ]
+ ( Σ_{s∈S} p(s)π_A(s)(1 − π_A(s))(π_{D(1)}(s) − π_{D(0)}(s)) )^{−1} [ {π_{D(1)}(s)π_A(s) + π_{D(0)}(s)(1 − π_A(s)) : s ∈ S}{π_A(s) : s ∈ S}′  {−(π_{D(1)}(s)π_A(s) + π_{D(0)}(s)(1 − π_A(s))) : s ∈ S} ; {−π_A(s) : s ∈ S}′  1 ] + o_p(1).

Also,

Z_n^sfe′ Y_n / n = [ {(1/n) Σ_{i=1}^n I{S_i = s} Y_i : s ∈ S} , (1/n) Σ_{i=1}^n A_i Y_i ].

Proof.
The equalities follow from algebra and the convergences follow from the CMT.
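The first display of Lemma A.9 is pure accounting and can be checked numerically. The sketch below (hypothetical data; two strata) builds X = [strata dummies, D] and Z = [strata dummies, A], as in the SFE IV regression, and verifies the stated block structure of Z′X/n.

```python
import numpy as np

# Numerical check of the block structure of Z'X/n in Lemma A.9.
rng = np.random.default_rng(1)
n, strata = 1_000, 2
S = rng.integers(0, strata, size=n)   # stratum labels (hypothetical)
A = rng.integers(0, 2, size=n)        # treatment assignment
D = (rng.random(n) < np.where(A == 1, 0.8, 0.2)).astype(int)  # take-up

ind = np.column_stack([(S == s).astype(float) for s in range(strata)])
X = np.column_stack([ind, D])         # regressors: strata dummies and D
Z = np.column_stack([ind, A])         # instruments: strata dummies and A
ZtX = Z.T @ X / n

# Lemma A.9: upper-left diag{n(s)/n}, upper-right {n_D(s)/n},
# lower-left {n_A(s)/n}', lower-right n_AD/n.
nS = np.array([(S == s).mean() for s in range(strata)])
nD_s = np.array([((S == s) & (D == 1)).mean() for s in range(strata)])
nA_s = np.array([((S == s) & (A == 1)).mean() for s in range(strata)])
nAD = ((A == 1) & (D == 1)).mean()
expected = np.block([[np.diag(nS), nD_s[:, None]],
                     [nA_s[None, :], np.array([[nAD]])]])
ok = np.allclose(ZtX, expected)
print(ok)
```

The equality is exact in any finite sample, which is why the proof of Lemma A.9 only needs algebra plus the CMT for the limit versions.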
Theorem A.5 (SFE limits). Assume Assumptions 2.1 and 2.2. For every s ∈ S,

γ̂_sfe(s) →_p π_{D(0)}(s) E[Y(1) | AT, S = s] + (1 − π_{D(1)}(s)) E[Y(0) | NT, S = s]
+ (π_{D(1)}(s) − π_{D(0)}(s)) [ π_A(s) E[Y(1) | C, S = s] + (1 − π_A(s)) E[Y(0) | C, S = s] ]
− ((1 − π_A(s)) π_{D(0)}(s) + π_A(s) π_{D(1)}(s)) × [ Σ_{s̃∈S} p(s̃) π_A(s̃)(1 − π_A(s̃))(π_{D(1)}(s̃) − π_{D(0)}(s̃)) E[Y(1) − Y(0) | C, S = s̃] ] / [ Σ_{s̃∈S} p(s̃) π_A(s̃)(1 − π_A(s̃))(π_{D(1)}(s̃) − π_{D(0)}(s̃)) ]

β̂_sfe →_p [ Σ_{s∈S} p(s) π_A(s)(1 − π_A(s))(π_{D(1)}(s) − π_{D(0)}(s)) E[Y(1) − Y(0) | C, S = s] ] / [ Σ_{s̃∈S} p(s̃) π_A(s̃)(1 − π_A(s̃))(π_{D(1)}(s̃) − π_{D(0)}(s̃)) ]. (A-63)

If we add Assumption 2.3(c),

β̂_sfe →_p β. (A-64)

Proof.
Throughout this proof, defineΛ n ≡ n AD n − X s ∈S n A ( s ) n ( s ) n D ( s ) n ( s ) n ( s ) n = X s ∈S n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )( n AD ( s ) n A ( s ) − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) p → Λ ≡ X s ∈S p ( s ) π A ( s )(1 − π A ( s ))( π D (1) ( s ) − π D (0) ( s )) , (A-65)where the convergence holds by Assumptions 2.1 and 2.2, Lemma A.4, and the LLN. o show the first line of (A-63), consider the following derivation.ˆ γ sfe ( s ) = nn ( s ) 1 n n X i =1 I { S i = s } Y i − n D ( s ) n ( s ) 1Λ n n n X i =1 I { A i = 1 } Y i + n D ( s ) n ( s ) 1Λ n X ˜ s ∈S n A (˜ s ) n (˜ s ) 1 n n X i =1 I { S i = ˜ s } Y i (1) = nn ( s ) [ √ n R n, (1 , , s ) + √ n R n, (1 , , s ) + √ n R n, (0 , , s ) + √ n R n, (0 , , s )]+ h (1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) + n A ( s ) n ( s ) n AD ( s ) n A ( s ) i n × P ˜ s ∈S [ n A (˜ s ) n (˜ s ) 1 √ n ( R n, (1 , , ˜ s ) + R n, (0 , , ˜ s )) − (1 − n A (˜ s ) n (˜ s ) ) √ n ( R n, (1 , , ˜ s ) + R n, (0 , , ˜ s ))]+ n A ( s ) n ( s ) n AD ( s ) n A ( s ) ( µ (1 , , s ) + E [ Y (1) | S = s ]) + (1 − n A ( s ) n ( s ) )( n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) )( µ (1 , , s ) + E [ Y (1) | S = s ])+ n A ( s ) n ( s ) (1 − n AD ( s ) n A ( s ) )( µ (0 , , s ) + E [ Y (0) | S = s ])+(1 − n A ( s ) n ( s ) )(1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) )( µ (0 , , s ) + E [ Y (0) | S = s ])+((1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) + n A ( s ) n ( s ) n AD ( s ) n A ( s ) ) n × P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) (1 − n A (˜ s ) n (˜ s ) ) × " − n AD (˜ s ) n A (˜ s ) ( µ (1 , , ˜ s ) + E [ Y (1) | S = ˜ s ]) − (1 − n AD (˜ s ) n A (˜ s ) )( µ (0 , , ˜ s ) + E [ Y (0) | S = ˜ s ])+( n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )( µ (1 , , ˜ s ) + E [ Y (1) | S = ˜ s ]) + (1 − n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )( µ (0 , , ˜ s ) + E [ Y (0) | S = ˜ s ]) (2) = nn ( s ) [ √ n R n, (1 , , s ) + √ n R n, (1 , , s ) + √ n 
R n, (0 , , s ) + √ n R n, (0 , , s )]+ h (1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) + n A ( s ) n ( s ) n AD ( s ) n A ( s ) i n × P ˜ s ∈S [ n A (˜ s ) n (˜ s ) 1 √ n ( R n, (1 , , ˜ s ) + R n, (0 , , ˜ s )) − (1 − n A (˜ s ) n (˜ s ) ) √ n ( R n, (1 , , ˜ s ) + R n, (0 , , ˜ s ))]+ n A ( s ) n ( s ) n AD ( s ) n A ( s ) ( E [ Y (1) | AT, S = s ] π D (0) ( s ) π D (1) ( s ) + E [ Y (1) | C, S = s ] π D (1) ( s ) − π D (0) ( s ) π D (1) ( s ) )+(1 − n A ( s ) n ( s ) )( n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) E [ Y (1) | AT, S = s ] + n A ( s ) n ( s ) (1 − n AD ( s ) n A ( s ) ) E [ Y (0) | NT, S = s ]+(1 − n A ( s ) n ( s ) )(1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) )( E [ Y (0) | NT, S = s ] − π D (1) ( s )1 − π D (0) ( s ) + E [ Y (0) | C, S = s ] π D (1) ( s ) − π D (0) ( s )1 − π D (0) ( s ) )+((1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) + n A ( s ) n ( s ) n AD ( s ) n A ( s ) ) n × P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) (1 − n A (˜ s ) n (˜ s ) ) × − n AD (˜ s ) n A (˜ s ) ( E [ Y (1) | AT, S = ˜ s ] π D (0) (˜ s ) π D (1) (˜ s ) + E [ Y (1) | C, S = ˜ s ] π D (1) (˜ s ) − π D (0) (˜ s ) π D (1) (˜ s ) ) − (1 − n AD (˜ s ) n A (˜ s ) ) E [ Y (0) | NT, S = ˜ s ] + ( n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) ) E [ Y (1) | AT, S = ˜ s ]+(1 − n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )( E [ Y (0) | NT, S = ˜ s ] − π D (1) (˜ s )1 − π D (0) (˜ s ) + E [ Y (0) | C, S = ˜ s ] π D (1) (˜ s ) − π D (0) (˜ s )1 − π D (0) (˜ s ) ) (3) = π D (0) ( s ) E [ Y (1) | AT, S = s ] + (1 − π D (1) ( s )) E [ Y (0) | NT, S = s ]+( π D (1) ( s ) − π D (0) ( s ))[ π A ( s ) E [ Y (1) | C, S = s ] + (1 − π A ( s )) E [ Y (0) | C, S = s ]] − ((1 − π A ( s )) π D (0) ( s ) + π A ( s ) π D (1) ( s )) × P ˜ s ∈S p (˜ s ) π A (˜ s )(1 − π A (˜ s ))( π D (1) (˜ s ) − π D (0) (˜ s )) E [ Y (1) − Y (0) | C, S = ˜ s ] + o p (1) , where (1) holds by Y i = Y i ( D i ), (A-1), (A-2), and (A-18), (2) holds by Lemma A.3, 
and (3) holds by Assumptions2.1 and 2.2(b), Lemmas A.4 and A.6, and (A-65). o show the second line of (A-63), consider the following derivation.ˆ β sfe = 1Λ n n n X i =1 I { A i = 1 } Y i − n X s ∈S n n X i =1 Y i I { S i = s } n A ( s ) n ( s ) (1) = 1Λ n X ˜ s ∈S (1 − n A (˜ s ) n (˜ s ) ) √ n ( R n, (1 , , s ) + R n, (0 , , s )) − n A (˜ s ) n (˜ s ) 1 √ n ( R n, (1 , , s ) + R n, (0 , , s ))+ n (˜ s ) n n A (˜ s ) n (˜ s ) (1 − n A (˜ s ) n (˜ s ) ) n AD (˜ s ) n A (˜ s ) ( µ (1 , , ˜ s ) + E [ Y (1) | S = ˜ s ])+ n (˜ s ) n n A (˜ s ) n (˜ s ) (1 − n A (˜ s ) n (˜ s ) )(1 − n AD (˜ s ) n A (˜ s ) )( µ (0 , , ˜ s ) + E [ Y (0) | S = ˜ s ]) − n (˜ s ) n n A (˜ s ) n (˜ s ) (1 − n A (˜ s ) n (˜ s ) )( n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )( µ (1 , , ˜ s ) + E [ Y (1) | S = ˜ s ]) − n (˜ s ) n n A (˜ s ) n (˜ s ) (1 − n A (˜ s ) n (˜ s ) )(1 − n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )( µ (0 , , ˜ s ) + E [ Y (0) | S = ˜ s ]) (2) = 1Λ n X s ∈S (1 − n A ( s ) n ( s ) ) √ n R n, (1 , , s ) + (1 − n A ( s ) n ( s ) ) √ n R n, (0 , , s ) − n A ( s ) n ( s ) 1 √ n R n, (1 , , s ) − n A ( s ) n ( s ) 1 √ n R n, (0 , , s )+ n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )[ n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) − ( n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) )] E [ Y (1) | AT, S = s ]+ n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )[(1 − n AD ( s ) n A ( s ) ) − (1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) − π D (1) ( s )1 − π D (0) ( s ) ] E [ Y (0) | NT, S = s ]+ n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) ) n AD ( s ) n A ( s ) π D (1) ( s ) − π D (0) ( s ) π D (1) ( s ) E [ Y (1) | C, S = s ] − n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )(1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) π D (1) ( s ) − π D (0) ( s )1 − π D (0) ( s ) E [ Y (0) | C, S = s ] (3) = 1Λ X s ∈S p ( s ) π A ( s )(1 − π A ( s ))( π D (1) ( s ) − π D (0) ( s )) E [ Y (1) − Y (0) | C, S = s ] + o p (1) , (A-66)where (1) holds by Y i = Y i ( 
D_i), (A-1), (A-2), and (A-18), (2) holds by Lemma A.3, and (3) holds by Assumptions 2.1 and 2.2(b), Lemmas A.4 and A.6, and (A-65).

Finally, (A-64) holds by the following derivation.

β̂_sfe (1)= [ Σ_{s∈S} p(s)(π_{D(1)}(s) − π_{D(0)}(s)) E[Y(1) − Y(0) | C, S = s] ] / [ Σ_{s̃∈S} p(s̃)(π_{D(1)}(s̃) − π_{D(0)}(s̃)) ] + o_p(1)
(2)= Σ_{s∈S} P(S = s | C) E[Y(1) − Y(0) | C, S = s] + o_p(1) = β + o_p(1),

where (1) holds by Assumption 2.3(c), (A-65), and (A-66), and (2) holds by (2.3).

Proof of Theorem 4.1.
As a preliminary result, note that (A-65) and Assumption 2.3(c) imply thatΛ n ≡ n AD n − X s ∈S n A ( s ) n ( s ) n D ( s ) n ( s ) n ( s ) n p → Λ ≡ π A (1 − π A ) P ( C ) . (A-67)From here, consider the following argument. ξ n, ≡ √ n (Λ n − Λ) (1) = X s ∈S n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )( n AD ( s ) n A ( s ) − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) R n, ( s )+ p ( s )(1 − n A ( s ) n ( s ) − π A )( n AD ( s ) n A ( s ) − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) R n, ( s )+ p ( s ) π A (1 − π A )( R n, (1 , s ) − R n, (2 , s )) (2) = X s ∈S p ( s ) π A (1 − π A )( R n, (1 , s ) − R n, (2 , s ))+ p ( s )(1 − π A )( π D (1) ( s ) − π D (0) ( s )) R n, ( s )+ π A (1 − π A )( π D (1) ( s ) − π D (0) ( s )) R n, ( s ) + o p (1) , (A-68)where (1) holds by the definitions in (A-67) and (2) holds by Assumption 2.2(b) and Lemma A.4. ext, consider the following derivation. √ n ( ˆ β sfe − β ) × Λ n (1) = X s ∈S (1 − n A ( s ) n ( s ) ) R n, (1 , , s ) + (1 − n A ( s ) n ( s ) ) R n, (0 , , s ) − n A ( s ) n ( s ) R n, (1 , , s ) − n A ( s ) n ( s ) R n, (0 , , s )+ √ n n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )[ n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) − ( n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) )] E [ Y (1) | AT, S = s ]+ √ n n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )[(1 − n AD ( s ) n A ( s ) ) − (1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) − π D (1) ( s )1 − π D (0) ( s ) ] E [ Y (0) | NT, S = s ]+ √ n n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) ) n AD ( s ) n A ( s ) π D (1) ( s ) − π D (0) ( s ) π D (1) ( s ) E [ Y (1) | C, S = s ] −√ n n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )(1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) π D (1) ( s ) − π D (0) ( s )1 − π D (0) ( s ) E [ Y (0) | C, S = s ] −√ np ( s ) π A (1 − π A )( π D (1) ( s ) − π D (0) ( s )) E [ Y (1) − Y (0) | C, S = s ] − βξ n, = X s ∈S (1 − n A ( s ) n ( s ) ) R n, (1 , , s ) + (1 − n A ( s ) n ( s ) ) R n, (0 , , s ) − n A ( s ) 
n ( s ) R n, (1 , , s ) − n A ( s ) n ( s ) R n, (0 , , s )+ ξ n, ( s ) + ξ n, ( s ) + ξ n, ( s ) − βξ n, , (A-69)where (1) follows from (A-66) and (A-68) and (2) follows from defining ξ n, ( s ), ξ n, ( s ), and ξ n, ( s ) as in (A-70),(A-71), and (A-72), respectively. To complete the argument in (A-69), consider the following definitions. First, ξ n, ( s ) ≡ √ n n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )[ n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) − ( n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) )] E [ Y (1) | AT, S = s ] (1) = p ( s ) π A (1 − π A ) E [ Y (1) | AT, S = s ][ π D (0) ( s ) π D (1) ( s ) R n, (1 , s ) − R n, (2 , s )] + o p (1) , (A-70)where (1) uses Assumptions 2.1 and 2.2(b). Second, ξ n, ( s ) ≡ √ n n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) ) (cid:20) (1 − n AD ( s ) n A ( s ) ) − (1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) − π D (1) ( s )1 − π D (0) ( s ) (cid:21) E [ Y (0) | NT, S = s ] (1) = p ( s ) π A (1 − π A ) E [ Y (0) | NT, S = s ] (cid:20) − π D (1) ( s )1 − π D (0) ( s ) R n, (2 , s ) − R n, (1 , s ) (cid:21) + o p (1) , (A-71)where (1) uses Assumptions 2.1 and 2.2(b). 
Third, ξ n, ( s ) ≡ + √ n n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) ) n AD ( s ) n A ( s ) π D (1) ( s ) − π D (0) ( s ) π D (1) ( s ) E [ Y (1) | C, S = s ] −√ n n ( s ) n n A ( s ) n ( s ) (1 − n A ( s ) n ( s ) )(1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) π D (1) ( s ) − π D (0) ( s )1 − π D (0) ( s ) E [ Y (0) | C, S = s ] −√ np ( s ) π A (1 − π A )( π D (1) ( s ) − π D (0) ( s )) E [ Y (1) − Y (0) | C, S = s ] (1) = p ( s ) π A (1 − π A )( π D (1) ( s ) − π D (0) ( s )) π D (1) ( s ) E [ Y (1) | C, S = s ] R n, (1 , s )+ p ( s ) π A (1 − π A )( π D (1) ( s ) − π D (0) ( s )) − π D (0) ( s ) E [ Y (0) | C, S = s ] R n, (2 , s )+ p ( s )(1 − π A )( π D (1) ( s ) − π D (0) ( s )) E [ Y (1) − Y (0) | C, S = s ] R n, ( s )+ π A (1 − π A )( π D (1) ( s ) − π D (0) ( s )) E [ Y (1) − Y (0) | C, S = s ] R n, ( s ) + o p (1) , (A-72)where (1) uses Assumptions 2.1 and 2.2(b), and Lemma A.4. rom there results, the next result follows. √ n ( ˆ β sfe − β ) (1) = 1 P ( C ) X s ∈S R n, (1 , ,s ) π A + R n, (0 , ,s ) π A − R n, (1 , ,s )(1 − π A ) − R n, (0 , ,s )(1 − π A ) + p ( s ) ( β ( s ) − β )+ E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]+( E [ Y (1) | AT, S = s ] − E [ Y (1) | C, S = s ]) π D (0) ( s ) π D (1) ( s ) R n, (1 , s )+ p ( s ) E [ Y (1) | C, S = s ] − E [ Y (1) | AT, S = s ] − ( E [ Y (0) | C, S = s ] − E [ Y (0) | NT, S = s ]) − π D (1) ( s )1 − π D (0) ( s ) − ( β ( s ) − β ) R n, (2 , s )+ p ( s ) (1 − π A ) π A (1 − π A ) ( π D (1) ( s ) − π D (0) ( s ))[ E [ Y (1) − Y (0) | C, S = s ] − β ] R n, ( s )+( π D (1) ( s ) − π D (0) ( s ))[ E [ Y (1) − Y (0) | C, S = s ] − β ] R n, ( s ) + o p (1) , (A-73)where (1) uses (A-67), (A-69), (A-70), (A-71), and (A-72), and Lemma A.5, as it implies R n = O p (1). 
The desired result then follows from (A-73), Lemmas A.3 and A.5, and

Σ_{s∈S} (π_{D(1)}(s) − π_{D(0)}(s)) (E[Y(1) − Y(0) | C, S = s] − β) p(s) = 0,

which in turn follows from β(s) = E[Y(1) − Y(0) | C, S = s] and (2.3).

Proof of Theorem 4.2.
Note that Assumptions 2.1, 2.2(b), and 2.3(c), Lemma A.4, and (A-45) and (A-46) imply that V̂_A^sfe →_p V_A^sfe. The desired result follows from this and Theorem 3.2.

Proof of Theorem 4.3.
This result follows from elementary convergence arguments and Theorems 4.1 and 4.2.
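Before turning to Section 5, note that the orthogonality condition used at the end of the proof of Theorem 4.1, Σ_{s} p(s)(π_{D(1)}(s) − π_{D(0)}(s))(β(s) − β) = 0, is a direct consequence of the definition of β in (2.3), since P(S = s | C) = p(s)(π_{D(1)}(s) − π_{D(0)}(s))/P(C). A numerical check with entirely hypothetical primitives:

```python
# Check: sum_s p(s)(pi_D1(s) - pi_D0(s))(beta(s) - beta) = 0, where
# beta = sum_s P(S=s|C) beta(s) and P(S=s|C) = p(s)(pi_D1(s)-pi_D0(s))/P(C).
# All numbers below are hypothetical.
p = [0.3, 0.5, 0.2]                 # strata probabilities p(s)
pi_D1 = [0.9, 0.8, 0.7]             # P(D=1 | A=1, S=s)
pi_D0 = [0.2, 0.1, 0.3]             # P(D=1 | A=0, S=s)
beta_s = [1.0, 2.0, -0.5]           # strata-level LATEs beta(s)

shares = [ps * (d1 - d0) for ps, d1, d0 in zip(p, pi_D1, pi_D0)]
P_C = sum(shares)                   # P(C), the complier share
beta = sum(w / P_C * b for w, b in zip(shares, beta_s))
residual = sum(w * (b - beta) for w, b in zip(shares, beta_s))
print(abs(residual) < 1e-12)
```

The residual vanishes identically, since Σ_s w_s β(s) − β Σ_s w_s = P(C)β − βP(C) = 0 for the complier weights w_s = p(s)(π_{D(1)}(s) − π_{D(0)}(s)).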
A.5 Proofs of results in Section 5
Lemma A.10 (2S matrices). Assume Assumptions 2.1 and 2.2. Then,

Z_n′ X_n / n = [ 1  n_D/n ; n_A/n  n_AD/n ] = [ 1  Σ_{s∈S} p(s)[π_{D(1)}(s)π_A(s) + π_{D(0)}(s)(1 − π_A(s))] ; Σ_{s∈S} p(s)π_A(s)  Σ_{s∈S} p(s)π_{D(1)}(s)π_A(s) ] + o_p(1).

Thus,

(Z_n′ X_n / n)^{−1} = ( n_AD/n − (n_A/n)(n_D/n) )^{−1} [ n_AD/n  −n_D/n ; −n_A/n  1 ]
= ( Σ_{s∈S} p(s)π_{D(1)}(s)π_A(s) − (Σ_{s∈S} p(s)[π_{D(1)}(s)π_A(s) + π_{D(0)}(s)(1 − π_A(s))])(Σ_{s∈S} p(s)π_A(s)) )^{−1} [ Σ_{s∈S} p(s)π_{D(1)}(s)π_A(s)  −Σ_{s∈S} p(s)[π_{D(1)}(s)π_A(s) + π_{D(0)}(s)(1 − π_A(s))] ; −Σ_{s∈S} p(s)π_A(s)  1 ] + o_p(1).

Also,

Z_n′ Y_n / n = [ (1/n) Σ_{i=1}^n Y_i , (1/n) Σ_{i=1}^n I{A_i = 1} Y_i ].

Proof.
The equalities follow from algebra and the convergences follow from the CMT and Lemma A.4.

Theorem A.6 (2S limits). Assume Assumptions 2.1 and 2.2. Then,

γ̂_2s →_p Σ_{s∈S} { [ (Σ_{s̃∈S} p(s̃)π_A(s̃)π_{D(1)}(s̃))(1 − π_A(s)) − (Σ_{s̃∈S} p(s̃)π_{D(0)}(s̃)(1 − π_A(s̃)))π_A(s) ] p(s)π_{D(0)}(s) E[Y(1) | AT, S = s]
− (Σ_{s̃∈S} p(s̃)π_{D(0)}(s̃)(1 − π_A(s̃))) p(s)π_A(s)(π_{D(1)}(s) − π_{D(0)}(s)) E[Y(1) | C, S = s]
+ (Σ_{s̃∈S} p(s̃)π_A(s̃)π_{D(1)}(s̃)) p(s)(1 − π_A(s))(π_{D(1)}(s) − π_{D(0)}(s)) E[Y(0) | C, S = s]
+ [ (Σ_{s̃∈S} p(s̃)π_A(s̃)π_{D(1)}(s̃))(1 − π_A(s)) − (Σ_{s̃∈S} p(s̃)π_{D(0)}(s̃)(1 − π_A(s̃)))π_A(s) ] p(s)(1 − π_{D(1)}(s)) E[Y(0) | NT, S = s] }
/ [ (1 − Σ_{s̃∈S} p(s̃)π_A(s̃))(Σ_{s̃∈S} p(s̃)π_A(s̃)π_{D(1)}(s̃)) − (Σ_{s̃∈S} p(s̃)π_A(s̃))(Σ_{s̃∈S} p(s̃)(1 − π_A(s̃))π_{D(0)}(s̃)) ]

β̂_2s →_p Σ_{s∈S} p(s) { [π_A(s) − Σ_{s̃∈S} p(s̃)π_A(s̃)] π_{D(0)}(s) E[Y(1) | AT, S = s]
+ [π_A(s) − Σ_{s̃∈S} p(s̃)π_A(s̃)] (1 − π_{D(1)}(s)) E[Y(0) | NT, S = s]
+ (1 − Σ_{s̃∈S} p(s̃)π_A(s̃)) π_A(s)(π_{D(1)}(s) − π_{D(0)}(s)) E[Y(1) | C, S = s]
− (Σ_{s̃∈S} p(s̃)π_A(s̃)) (1 − π_A(s))(π_{D(1)}(s) − π_{D(0)}(s)) E[Y(0) | C, S = s] }
/ [ (1 − Σ_{s̃∈S} p(s̃)π_A(s̃))(Σ_{s̃∈S} p(s̃)π_A(s̃)π_{D(1)}(s̃)) − (Σ_{s̃∈S} p(s̃)π_A(s̃))(Σ_{s̃∈S} p(s̃)(1 − π_A(s̃))π_{D(0)}(s̃)) ]. (A-74)

If we add Assumption 2.3(c),

β̂_2s →_p β. (A-75)

Proof.
Throughout this proof, defineΞ n ≡ n AD n − n A n n D n p → Ξ ≡ − X ˜ s ∈S p (˜ s ) π A (˜ s ) ! X ˜ s ∈S p (˜ s ) π A (˜ s ) π D (1) (˜ s ) ! − X ˜ s ∈S p (˜ s ) π A (˜ s ) ! X ˜ s ∈S p (˜ s )(1 − π A (˜ s )) π D (0) (˜ s ) ! , (A-76)where the convergence holds by Assumptions 2.1 and 2.2(b), and Lemma A.4. o show the first line of (A-74), consider the following derivation.ˆ γ = 1 n AD /n − ( n A /n )( n D /n ) " n AD n n n X i =1 Y i − n D n n n X i =1 I { A i = 1 } Y i (1) = 1Ξ n X s ∈S ( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) − P ˜ s ∈S n (˜ s ) n [( n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )(1 − n A (˜ s ) n (˜ s ) ) + n AD (˜ s ) n A (˜ s ) n A (˜ s ) n (˜ s ) ]) √ n R n, (1 , , s )+( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) − P ˜ s ∈S n (˜ s ) n [( n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )(1 − n A (˜ s ) n (˜ s ) ) + n AD (˜ s ) n A (˜ s ) n A (˜ s ) n (˜ s ) ]) √ n R n, (0 , , s )+( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) ) √ n R n, (1 , , s ) + ( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) ) √ n R n, (0 , , s )+ P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) − P ˜ s ∈S n (˜ s ) n [( n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )(1 − n A (˜ s ) n (˜ s ) ) + n AD (˜ s ) n A (˜ s ) n A (˜ s ) n (˜ s ) ] ! × n A ( s ) n ( s ) n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) +( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) )(1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) n ( s ) n E [ Y (1) | AT, S = s ]+ P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) − P ˜ s ∈S n (˜ s ) n [( n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )(1 − n A (˜ s ) n (˜ s ) ) + n AD (˜ s ) n A (˜ s ) n A (˜ s ) n (˜ s ) ] ! 
× n ( s ) n n A ( s ) n ( s ) n AD ( s ) n A ( s ) π D (1) ( s ) − π D (0) ( s ) π D (1) ( s ) E [ Y (1) | C, S = s ]+( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) ) n ( s ) n (1 − n A ( s ) n ( s ) )(1 − ( n D ( s ) − n AD ( s )) n ( s ) − n A ( s ) ) π D (1) ( s ) − π D (0) ( s )1 − π D (0) ( s ) E [ Y (0) | C, S = s ]+ P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) − P ˜ s ∈S n (˜ s ) n [( n D (˜ s ) − n AD (˜ s ) n (˜ s ) − n A (˜ s ) )(1 − n A (˜ s ) n (˜ s ) ) + n AD (˜ s ) n A (˜ s ) n A (˜ s ) n (˜ s ) ] ! n A ( s ) n ( s ) (1 − n AD ( s ) n A ( s ) )+( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) n AD (˜ s ) n A (˜ s ) )(1 − n A ( s ) n ( s ) )(1 − ( n D ( s ) − n AD ( s )) n ( s ) − n A ( s ) ) − π D (1) ( s )1 − π D (0) ( s ) × n ( s ) n E [ Y (0) | NT, S = s ] (2) = 1Ξ X s ∈S [( P ˜ s ∈S p (˜ s ) π A (˜ s ) π D (1) (˜ s ))(1 − π A ( s )) − ( P ˜ s ∈S p (˜ s ) π D (0) (˜ s )(1 − π A (˜ s ))) π A ( s )] × p ( s ) π D (0) ( s ) E [ Y (1) | AT, S = s ] − ( P ˜ s ∈S p (˜ s ) π D (0) (˜ s )(1 − π A (˜ s ))) p ( s ) π A ( s )( π D (1) ( s ) − π D (0) ( s )) E [ Y (1) | C, S = s ]+( P ˜ s ∈S p (˜ s ) π A (˜ s ) π D (1) (˜ s )) p ( s )(1 − π A ( s ))( π D (1) ( s ) − π D (0) ( s )) E [ Y (0) | C, S = s ] (cid:2) ( P ˜ s ∈S p (˜ s ) π A (˜ s ) π D (1) (˜ s ))(1 − π A ( s )) − ( P ˜ s ∈S p (˜ s ) π D (0) (˜ s )(1 − π A (˜ s ))) π A ( s ) (cid:3) × p ( s )(1 − π D (1) ( s )) E [ Y (0) | NT, S = s ] + o p (1) , (A-77)where (1) holds by Y i = Y i ( D i ), (A-1), (A-2), (A-18), and (2) holds by Assumption 2.1, 2.2(b), Lemma A.4 and A.6,and (A-76).To show the second line of (A-74), consider the following derivation.ˆ β = 1 n AD /n − ( n A /n )( n D /n ) " − n A n n n X i =1 Y i + 1 n n X i =1 I { A i = 1 } Y i (1) = 1Ξ n X s ∈S (1 − P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) √ n R n, (1 , , s ) + (1 − P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) √ n R n, (0 , , s ) − ( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) √ n R n, (1 , , s ) − ( P ˜ s ∈S 
n (˜ s ) n n A (˜ s ) n (˜ s ) ) √ n R n, (0 , , s )+ (1 − P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) n ( s ) n n A ( s ) n ( s ) n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) − ( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) n ( s ) n (1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) E [ Y (1) | AT, S = s ]+ (1 − P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) n ( s ) n n A ( s ) n ( s ) (1 − n AD ( s ) n A ( s ) ) − ( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) n ( s ) n (1 − n A ( s ) n ( s ) )(1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) − π D (1) ( s )1 − π D (0) ( s ) E [ Y (0) | NT, S = s ]+(1 − P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) n ( s ) n n A ( s ) n ( s ) n AD ( s ) n A ( s ) π D (1) ( s ) − π D (0) ( s ) π D (1) ( s ) E [ Y (1) | C, S = s ] − ( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) n ( s ) n (1 − n A ( s ) n ( s ) )(1 − n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ) π D (1) ( s ) − π D (0) ( s )1 − π D (0) ( s ) E [ Y (0) | C, S = s ] (2) = 1Ξ X s ∈S p ( s ) +[ π A ( s ) − P ˜ s ∈S p (˜ s ) π A (˜ s )] π D (0) ( s ) E [ Y (1) | AT, S = s ]++[ π A ( s ) − P ˜ s ∈S p (˜ s ) π A (˜ s )](1 − π D (1) ( s )) E [ Y (0) | NT, S = s ]+(1 − P ˜ s ∈S p (˜ s ) π A (˜ s )) π A ( s )( π D (1) ( s ) − π D (0) ( s )) E [ Y (1) | C, S = s ] − ( P ˜ s ∈S p (˜ s ) π A (˜ s ))(1 − π A ( s ))( π D (1) ( s ) − π D (0) ( s )) E [ Y (0) | C, S = s ] + o p (1) , (A-78)where (1) holds by Y i = Y i ( D i ), (A-1), (A-2), (A-18), and (2) holds by Assumption 2.1, 2.2(b), Lemma A.4 and A.6, nd (A-76).Finally, (A-75) holds by the following derivation.ˆ β
2s (1)= [ Σ_{s∈S} p(s)(π_{D(1)}(s) − π_{D(0)}(s)) E[Y(1) − Y(0) | C, S = s] ] / [ Σ_{s̃∈S} p(s̃)(π_{D(1)}(s̃) − π_{D(0)}(s̃)) ] + o_p(1)
(2)= Σ_{s∈S} P(S = s | C) E[Y(1) − Y(0) | C, S = s] + o_p(1) = β + o_p(1),

where (1) holds by (A-76), (A-78), and Assumption 2.3(c), and (2) holds by (2.3).

Proof of Theorem 5.1.
As a preliminary result, consider the following derivation. √ n (Ξ n − Ξ) (1) = √ n " n AD n − n A n n D n − P s ∈S p ( s ) π A π D (1) ( s )+ π A P s ∈S p ( s )[(1 − π A ) π D (0) ( s ) + π A π D (1) ( s )] (2) = P s ∈S π A (1 − π A )[ π D (1) ( s ) − π D (0) ( s )] R n, ( s )+ P s ∈S p ( s ) " (1 − π A ) π D (1) ( s ) − π A P ˜ s ∈S p (˜ s ) π D (1) (˜ s )+ π A π D (0) ( s ) − (1 − π A ) P ˜ s ∈S p (˜ s ) π D (0) (˜ s ) R n, ( s )+ π A (1 − π A ) P s ∈S p ( s ) R n, (1 , s ) − π A (1 − π A ) P s ∈S p ( s ) R n, (2 , s ) + o p (1) , where (1) holds by definitions in (A-76) and Assumption 2.3(c), and (2) holds by Assumptions 2.1 and 2.2(b), andLemmas A.4 and A.5. In particular, Lemma A.5 implies that R n = O p (1) and P s ∈S R n, ( s ) = 0.Consider the following derivation. √ n ( ˆ β − β ) = n P s ∈S (1 − P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) R n, (1 , , s ) + n P s ∈S (1 − P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) R n, (0 , , s ) − n P s ∈S ( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) R n, (1 , , s ) − n P s ∈S ( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) R n, (0 , , s )+ n P s ∈S ζ n, ( s ) + n P s ∈S ζ n, ( s ) + ζ n, , (A-79)where the equality holds by the definitions of ζ n, ( s ), ζ n, ( s ), and ζ n, that appear below in (A-80), (A-82), and(A-84). 
First, ζ n, ( s ) ≡ √ n (1 − P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) n ( s ) n n A ( s ) n ( s ) n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) − ( P ˜ s ∈S n (˜ s ) n n A (˜ s ) n (˜ s ) ) n ( s ) n (1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) E [ Y (1) | AT, S = s ] (1) = n ( s ) n [ − n A ( s ) n ( s ) n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) − (1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ]( P ˜ s ∈S n A (˜ s ) n (˜ s ) R n, (˜ s ))+[(1 − π A ) n A ( s ) n ( s ) n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) − π A (1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ] R n, ( s ) − n ( s ) n [ n A ( s ) n ( s ) n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) + (1 − n A ( s ) n ( s ) ) n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ]( P ˜ s ∈S p (˜ s ) R n, (˜ s ))+ p ( s )[(1 − π A ) n AD ( s ) n A ( s ) π D (0) ( s ) π D (1) ( s ) + π A n D ( s ) − n AD ( s ) n ( s ) − n A ( s ) ] R n, ( s )+ π A (1 − π A ) p ( s ) π D (0) ( s ) π D (1) ( s ) R n, (1 , s ) − π A (1 − π A ) p ( s ) R n, (2 , s ) E [ Y (1) | AT, S = s ] (2) = p ( s ) − π D (0) ( s ) E [ Y (1) | AT, S = s ]( P ˜ s ∈S p (˜ s ) R n, (˜ s ))+ π D (0) ( s ) E [ Y (1) | AT, S = s ] R n, ( s )+ π A (1 − π A ) π D (0) ( s ) π D (1) ( s ) E [ Y (1) | AT, S = s ] R n, (1 , s ) − π A (1 − π A ) E [ Y (1) | AT, S = s ] R n, (2 , s ) + o p (1) , (A-80)where (1) holds by the definitions in (A-76), and (2) holds by Assumptions 2.1, 2.2(b), and Lemmas A.4 and A.5. his implies that1Ξ n X s ∈S ζ n, ( s ) = 1Ξ X s ∈S p ( s ) " π D (0) ( s ) E [ Y (1) | AT, S = s ] − ( P ˜ s ∈S p (˜ s ) π D (0) (˜ s ) E [ Y (1) | AT, S = ˜ s ]) R n, ( s )+ π A (1 − π A ) π D (0) ( s ) π D (1) ( s ) E [ Y (1) | AT, S = s ] R n, (1 , s ) − π A (1 − π A ) E [ Y (1) | AT, S = s ] R n, (2 , s ) + o p (1) . 
Second,
\begin{align*}
\zeta_{n,0}(s)&\equiv\sqrt{n}\bigg[\Big(1-\sum_{\tilde{s}\in\mathcal{S}}\frac{n(\tilde{s})}{n}\frac{n_A(\tilde{s})}{n(\tilde{s})}\Big)\frac{n(s)}{n}\frac{n_A(s)}{n(s)}\Big(1-\frac{n_{AD}(s)}{n_A(s)}\Big)\\
&\qquad\;-\Big(\sum_{\tilde{s}\in\mathcal{S}}\frac{n(\tilde{s})}{n}\frac{n_A(\tilde{s})}{n(\tilde{s})}\Big)\frac{n(s)}{n}\Big(1-\frac{n_A(s)}{n(s)}\Big)\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}\bigg]E[Y(0)\mid NT,S=s]\\
&\overset{(1)}{=}\bigg[-\frac{n(s)}{n}\Big[\frac{n_A(s)}{n(s)}\Big(1-\frac{n_{AD}(s)}{n_A(s)}\Big)+\Big(1-\frac{n_A(s)}{n(s)}\Big)\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}\Big]\Big(\sum_{\tilde{s}\in\mathcal{S}}\frac{n_A(\tilde{s})}{n(\tilde{s})}R_n(\tilde{s})\Big)\\
&\quad+\Big[(1-\pi_A)\frac{n_A(s)}{n(s)}\Big(1-\frac{n_{AD}(s)}{n_A(s)}\Big)-\pi_A\Big(1-\frac{n_A(s)}{n(s)}\Big)\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}\Big]R_n(s)\\
&\quad-\frac{n(s)}{n}\Big[\frac{n_A(s)}{n(s)}\Big(1-\frac{n_{AD}(s)}{n_A(s)}\Big)+\Big(1-\frac{n_A(s)}{n(s)}\Big)\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}\Big]\Big(\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})R_n(\tilde{s})\Big)\\
&\quad+p(s)\Big[(1-\pi_A)\Big(1-\frac{n_{AD}(s)}{n_A(s)}\Big)+\pi_A\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}\Big]R_n(s)\\
&\quad-p(s)(1-\pi_A)\pi_AR_n(1,s)+p(s)\pi_A(1-\pi_A)\frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}R_n(2,s)\bigg]E[Y(0)\mid NT,S=s]\\
&\overset{(2)}{=}p(s)\Big[(1-\pi_{D(1)}(s))E[Y(0)\mid NT,S=s]R_n(s)-(1-\pi_{D(1)}(s))E[Y(0)\mid NT,S=s]\Big(\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})R_n(\tilde{s})\Big)\\
&\qquad-\pi_A(1-\pi_A)E[Y(0)\mid NT,S=s]R_n(1,s)+\pi_A(1-\pi_A)\frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid NT,S=s]R_n(2,s)\Big]+o_p(1),\tag{A-82}
\end{align*}
where (1) holds by the definitions in (A-76), and (2) holds by Assumptions 2.1 and 2.2(b) and Lemmas A.4 and A.5. This implies that
\begin{align*}
\frac{1}{\hat{\Xi}_n}\sum_{s\in\mathcal{S}}\zeta_{n,0}(s)&=\frac{1}{\Xi}\sum_{s\in\mathcal{S}}p(s)\Big[\Big((1-\pi_{D(1)}(s))E[Y(0)\mid NT,S=s]-\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})(1-\pi_{D(1)}(\tilde{s}))E[Y(0)\mid NT,S=\tilde{s}]\Big)R_n(s)\\
&\qquad-\pi_A(1-\pi_A)E[Y(0)\mid NT,S=s]R_n(1,s)+\pi_A(1-\pi_A)\frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid NT,S=s]R_n(2,s)\Big]+o_p(1).\tag{A-83}
\end{align*}
Third,
\begin{align*}
\zeta_n&\equiv\frac{\sqrt{n}}{\hat{\Xi}_n}\bigg[\sum_{s\in\mathcal{S}}\Big(1-\sum_{\tilde{s}\in\mathcal{S}}\frac{n(\tilde{s})}{n}\frac{n_A(\tilde{s})}{n(\tilde{s})}\Big)\frac{n(s)}{n}\frac{n_A(s)}{n(s)}\frac{n_{AD}(s)}{n_A(s)}\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)}E[Y(1)\mid C,S=s]\\
&\qquad-\sum_{s\in\mathcal{S}}\pi_A(1-\pi_A)p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))E[Y(1)\mid C,S=s]\\
&\qquad-\sum_{s\in\mathcal{S}}\Big(\sum_{\tilde{s}\in\mathcal{S}}\frac{n(\tilde{s})}{n}\frac{n_A(\tilde{s})}{n(\tilde{s})}\Big)\frac{n(s)}{n}\Big(1-\frac{n_A(s)}{n(s)}\Big)\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid C,S=s]\\
&\qquad+\sum_{s\in\mathcal{S}}\pi_A(1-\pi_A)p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))E[Y(0)\mid C,S=s]\\
&\qquad+\sum_{s\in\mathcal{S}}\pi_A(1-\pi_A)p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))\beta(s)-\hat{\Xi}_n\beta\bigg]\\
&\overset{(1)}{=}\frac{1}{\hat{\Xi}_n}\bigg[-\sum_{s\in\mathcal{S}}\frac{n(s)}{n}\frac{n_A(s)}{n(s)}\frac{n_{AD}(s)}{n_A(s)}\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)}E[Y(1)\mid C,S=s]\Big(\sum_{\tilde{s}\in\mathcal{S}}\frac{n_A(\tilde{s})}{n(\tilde{s})}R_n(\tilde{s})\Big)\\
&\qquad-\sum_{s\in\mathcal{S}}\frac{n(s)}{n}\frac{n_A(s)}{n(s)}\frac{n_{AD}(s)}{n_A(s)}\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)}E[Y(1)\mid C,S=s]\Big(\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})R_n(\tilde{s})\Big)\\
&\qquad+(1-\pi_A)\sum_{s\in\mathcal{S}}\frac{n_A(s)}{n(s)}\frac{n_{AD}(s)}{n_A(s)}\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)}E[Y(1)\mid C,S=s]R_n(s)\\
&\qquad+(1-\pi_A)\sum_{s\in\mathcal{S}}p(s)\frac{n_{AD}(s)}{n_A(s)}\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)}E[Y(1)\mid C,S=s]R_n(s)\\
&\qquad+(1-\pi_A)\sum_{s\in\mathcal{S}}p(s)\pi_A\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)}E[Y(1)\mid C,S=s]R_n(1,s)\\
&\qquad-\sum_{s\in\mathcal{S}}\frac{n(s)}{n}\Big(1-\frac{n_A(s)}{n(s)}\Big)\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid C,S=s]\Big(\sum_{\tilde{s}\in\mathcal{S}}\frac{n_A(\tilde{s})}{n(\tilde{s})}R_n(\tilde{s})\Big)\\
&\qquad-\sum_{s\in\mathcal{S}}\frac{n(s)}{n}\Big(1-\frac{n_A(s)}{n(s)}\Big)\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid C,S=s]\Big(\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})R_n(\tilde{s})\Big)\\
&\qquad-\pi_A\sum_{s\in\mathcal{S}}\Big(1-\frac{n_A(s)}{n(s)}\Big)\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid C,S=s]R_n(s)\\
&\qquad+\pi_A\sum_{s\in\mathcal{S}}p(s)\Big(1-\frac{n_D(s)-n_{AD}(s)}{n(s)-n_A(s)}\Big)\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid C,S=s]R_n(s)\\
&\qquad+\pi_A(1-\pi_A)\sum_{s\in\mathcal{S}}p(s)\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid C,S=s]R_n(2,s)-\beta\sqrt{n}(\hat{\Xi}_n-\Xi)\bigg]\\
&\overset{(2)}{=}\frac{1}{\Xi}\bigg[\pi_A(1-\pi_A)\sum_{s\in\mathcal{S}}(\pi_{D(1)}(s)-\pi_{D(0)}(s))(\beta(s)-\beta)R_n(s)\\
&\qquad+(1-\pi_A)\sum_{s\in\mathcal{S}}p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))E[Y(1)\mid C,S=s]R_n(s)\\
&\qquad+\pi_A\sum_{s\in\mathcal{S}}p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))E[Y(0)\mid C,S=s]R_n(s)\\
&\qquad-\beta\sum_{s\in\mathcal{S}}p(s)\Big[(1-\pi_A)\pi_{D(1)}(s)-\pi_A\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})\pi_{D(1)}(\tilde{s})+\pi_A\pi_{D(0)}(s)-(1-\pi_A)\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})\pi_{D(0)}(\tilde{s})\Big]R_n(s)\\
&\qquad-\pi_A\sum_{s\in\mathcal{S}}p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))E[Y(1)\mid C,S=s]\Big(\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})R_n(\tilde{s})\Big)\\
&\qquad-(1-\pi_A)\sum_{s\in\mathcal{S}}p(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))E[Y(0)\mid C,S=s]\Big(\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})R_n(\tilde{s})\Big)\\
&\qquad+\pi_A(1-\pi_A)\sum_{s\in\mathcal{S}}p(s)\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)}E[Y(1)\mid C,S=s]R_n(1,s)-\beta\pi_A(1-\pi_A)\sum_{s\in\mathcal{S}}p(s)R_n(1,s)\\
&\qquad+\beta\pi_A(1-\pi_A)\sum_{s\in\mathcal{S}}p(s)R_n(2,s)+\pi_A(1-\pi_A)\sum_{s\in\mathcal{S}}p(s)\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid C,S=s]R_n(2,s)\bigg]+o_p(1),\tag{A-84}
\end{align*}
where (1) holds by the definitions in (A-76), and (2) holds by Assumptions 2.1 and 2.2(b) and Lemmas A.4 and A.5.

From these results, the following derivation follows.
\begin{align*}
\sqrt{n}(\hat{\beta}-\beta)&=\frac{1}{P(C)}\bigg[\sum_{s\in\mathcal{S}}\frac{R_n(1,1,s)}{\pi_A}+\sum_{s\in\mathcal{S}}\frac{R_n(0,1,s)}{\pi_A}-\sum_{s\in\mathcal{S}}\frac{R_n(1,0,s)}{1-\pi_A}-\sum_{s\in\mathcal{S}}\frac{R_n(0,0,s)}{1-\pi_A}\\
&\quad+\sum_{s\in\mathcal{S}}p(s)\Big[\frac{\pi_{D(0)}(s)}{\pi_{D(1)}(s)}\big(E[Y(1)\mid AT,S=s]-E[Y(0)\mid NT,S=s]\big)+\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{\pi_{D(1)}(s)}E[Y(1)\mid C,S=s]-\beta\Big]R_n(1,s)\\
&\quad+\sum_{s\in\mathcal{S}}p(s)\Big[\frac{1-\pi_{D(1)}(s)}{1-\pi_{D(0)}(s)}\big(E[Y(0)\mid NT,S=s]-E[Y(1)\mid AT,S=s]\big)+\frac{\pi_{D(1)}(s)-\pi_{D(0)}(s)}{1-\pi_{D(0)}(s)}E[Y(0)\mid C,S=s]+\beta\Big]R_n(2,s)\\
&\quad+\sum_{s\in\mathcal{S}}\frac{p(s)}{\pi_A(1-\pi_A)}\Big[\big[\pi_A\pi_{D(0)}(s)+(1-\pi_A)\pi_{D(1)}(s)\big](\beta(s)-\beta)\\
&\qquad\qquad-\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})\big(\pi_A\pi_{D(1)}(\tilde{s})+(1-\pi_A)\pi_{D(0)}(\tilde{s})\big)(\beta(\tilde{s})-\beta)+\gamma(s)-\sum_{\tilde{s}\in\mathcal{S}}p(\tilde{s})\gamma(\tilde{s})\Big]R_n(s)\\
&\quad+\sum_{s\in\mathcal{S}}\big(\pi_{D(1)}(s)-\pi_{D(0)}(s)\big)(\beta(s)-\beta)R_n(s)\bigg]+o_p(1),\tag{A-85}
\end{align*}
where the equality holds by (A-76), (A-79), (A-81), (A-83), and (A-84), and where $(\beta(s),\gamma(s))$ are defined as in (A-45). The desired result then follows from (A-85) and Lemma A.5.

Proof of Theorem 5.2.
Note that Assumptions 2.1, 2.2(b), and 2.3(c), Lemma A.4, and (A-45) and (A-46) imply that $\hat{V}_A\overset{p}{\to}V_A$. The desired result follows from this and Theorem 3.2.

Proof of Theorem 5.3.
This result follows from elementary convergence arguments and Theorems 5.1 and 5.2.
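In practice, Theorems 5.1–5.3 combine in the standard way: asymptotic normality plus a consistent standard error yields Wald-type tests and confidence intervals for the LATE. A schematic sketch follows; `beta_hat`, `V_hat`, and `n` are hypothetical placeholders, and 1.96 is the two-sided 5% normal critical value.

```python
import math

# Schematic Wald inference from an estimate beta_hat with estimated asymptotic
# variance V_hat based on n observations: sqrt(n)(beta_hat - beta) ~ N(0, V).
# All numbers are hypothetical placeholders.
beta_hat, V_hat, n = 0.5, 4.0, 400

se = math.sqrt(V_hat / n)                          # standard error
t_stat = beta_hat / se                             # t-statistic for H0: beta = 0
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)  # 95% confidence interval
reject = abs(t_stat) > 1.96                        # asymptotically exact 5% test

print(se, t_stat, ci, reject)
```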
A.6 Proofs of results in Section 6
Proof of Theorem 6.1.
Note that (6.2) is a consequence of Theorems 3.1, 4.1, and 5.1. To complete the proof, it suffices to show (6.3).

Fix $s\in\mathcal{S}$ arbitrarily. By (6.1), there is a set $\{s_j(s)\in\mathcal{S}\}_{j=1}^{J(s)}$ (dependent on $s$) such that $\{S=s\}=\cup_{j=1}^{J(s)}\{S=s_j(s)\}$. Then, consider the following derivation:
\begin{align*}
V[Y(1)\mid AT,S=s]\,\pi_{D(0)}(s)&\overset{(1)}{=}\sum_{j=1}^{J(s)}\Big[V[Y(1)\mid AT,S=s_j(s)]+\big(E[Y(1)\mid AT,S=s_j(s)]-E[Y(1)\mid AT,S=s]\big)^2\Big]\pi_{D(0)}(s_j(s))\frac{p(s_j(s))}{p(s)}\\
&=\sum_{j=1}^{J(s)}V[Y(1)\mid AT,S=s_j(s)]\,\pi_{D(0)}(s_j(s))\frac{p(s_j(s))}{p(s)}+\sum_{j=1}^{J(s)}E[Y(1)\mid AT,S=s_j(s)]^2\,\pi_{D(0)}(s_j(s))\frac{p(s_j(s))}{p(s)}\\
&\qquad-E[Y(1)\mid AT,S=s]^2\,\pi_{D(0)}(s),\tag{A-86}
\end{align*}
where (1) follows from the law of total variance. By a similar argument,
\begin{align*}
V[Y(0)\mid NT,S=s]\,(1-\pi_{D(1)}(s))&=\sum_{j=1}^{J(s)}V[Y(0)\mid NT,S=s_j(s)]\,(1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\\
&\qquad+\sum_{j=1}^{J(s)}E[Y(0)\mid NT,S=s_j(s)]^2\,(1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}-E[Y(0)\mid NT,S=s]^2\,(1-\pi_{D(1)}(s))\tag{A-87}
\end{align*}
and, for $d\in\{0,1\}$,
\begin{align*}
V[Y(d)\mid C,S=s]\,(\pi_{D(1)}(s)-\pi_{D(0)}(s))&=\sum_{j=1}^{J(s)}V[Y(d)\mid C,S=s_j(s)]\,(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\\
&\qquad+\sum_{j=1}^{J(s)}E[Y(d)\mid C,S=s_j(s)]^2\,(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\\
&\qquad-E[Y(d)\mid C,S=s]^2\,(\pi_{D(1)}(s)-\pi_{D(0)}(s)).\tag{A-88}
\end{align*}
For $a=1,2$, let $V_{1,a}$ be equal to $V^{sat}_{Y,1}+V^{sat}_{D,1}$ for the RCT with strata $\mathcal{S}_a$. Next, consider the following derivation.
\begin{align*}
V_{1,1}&\overset{(1)}{=}\frac{1}{\pi_AP(C)^2}\sum_{s\in\mathcal{S}}p(s)\bigg[V[Y(1)\mid AT,S=s]\pi_{D(0)}(s)+V[Y(0)\mid NT,S=s](1-\pi_{D(1)}(s))\\
&\qquad+V[Y(1)\mid C,S=s](\pi_{D(1)}(s)-\pi_{D(0)}(s))+\big(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s]\big)^2\frac{\pi_{D(0)}(s)(\pi_{D(1)}(s)-\pi_{D(0)}(s))}{\pi_{D(1)}(s)}\\
&\qquad+\Big(-\pi_{D(0)}(s)\big(E[Y(1)\mid C,S=s]-E[Y(1)\mid AT,S=s]\big)+\pi_{D(1)}(s)\big(E[Y(0)\mid C,S=s]-E[Y(0)\mid NT,S=s]\big)\\
&\qquad\qquad+\pi_{D(1)}(s)\big(E[Y(1)-Y(0)\mid C,S=s]-\beta\big)\Big)^2\frac{1-\pi_{D(1)}(s)}{\pi_{D(1)}(s)}\bigg]\\
&\overset{(2)}{=}\frac{1}{\pi_AP(C)^2}\sum_{s\in\mathcal{S}}p(s)\bigg[\sum_{j=1}^{J(s)}V[Y(1)\mid AT,S=s_j(s)]\,\pi_{D(0)}(s_j(s))\frac{p(s_j(s))}{p(s)}\\
&\qquad+\sum_{j=1}^{J(s)}V[Y(0)\mid NT,S=s_j(s)]\,(1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}+\sum_{j=1}^{J(s)}V[Y(1)\mid C,S=s_j(s)]\,(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\\
&\qquad+\sum_{j=1}^{J(s)}E[Y(1)\mid AT,S=s_j(s)]^2\,\pi_{D(0)}(s_j(s))\frac{p(s_j(s))}{p(s)}+\sum_{j=1}^{J(s)}E[Y(0)\mid NT,S=s_j(s)]^2\,(1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\\
&\qquad+\sum_{j=1}^{J(s)}E[Y(1)\mid C,S=s_j(s)]^2\,(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\\
&\qquad-\Big[(\pi_{D(1)}(s)-\pi_{D(0)}(s))E[Y(1)\mid C,S=s]+\pi_{D(0)}(s)E[Y(1)\mid AT,S=s]+(1-\pi_{D(1)}(s))\big(E[Y(0)\mid NT,S=s]+\beta\big)\Big]^2\\
&\qquad+\beta^2\sum_{j=1}^{J(s)}(1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}+2\beta\sum_{j=1}^{J(s)}E[Y(0)\mid NT,S=s_j(s)](1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\bigg],\tag{A-89}
\end{align*}
where (1) follows from Theorem 3.1, and (2) follows from (A-86), (A-87), and (A-88).
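The substitutions in step (2) rest on the (weighted) law of total variance used in (A-86)–(A-88), which can be checked numerically. In the sketch below, the substratum weights `w`, means `m`, and variances `v` are hypothetical values, not quantities from the paper.

```python
# Numerical check of the (weighted) law of total variance used in (A-86):
# V[Y | s] = sum_j w_j * V[Y | s_j] + sum_j w_j * (E[Y | s_j] - E[Y | s])^2,
# where w_j is the conditional probability of substratum s_j within stratum s.

# Hypothetical substratum weights, means, and variances (weights sum to 1).
w = [0.2, 0.5, 0.3]          # P(S = s_j | S in s)
m = [1.0, 2.0, 4.0]          # E[Y | S = s_j]
v = [0.5, 1.0, 0.25]         # V[Y | S = s_j]

# Mean over the coarse stratum s, by iterated expectations.
mean_s = sum(wj * mj for wj, mj in zip(w, m))

# Within + between decomposition.
within = sum(wj * vj for wj, vj in zip(w, v))
between = sum(wj * (mj - mean_s) ** 2 for wj, mj in zip(w, m))
var_s = within + between

# Equivalent "second moments minus squared mean" form, as in (A-86):
# sum_j w_j * (v_j + m_j^2) - mean_s^2.
var_s_alt = sum(wj * (vj + mj ** 2) for wj, vj, mj in zip(w, v, m)) - mean_s ** 2

assert abs(var_s - var_s_alt) < 1e-12
print(var_s)
```

The second form is the one that appears after expanding the square in (A-86), with the cross term eliminated by iterated expectations.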
In turn,
\begin{align*}
V_{1,2}&\overset{(1)}{=}\frac{1}{\pi_AP(C)^2}\sum_{s\in\mathcal{S}}p(s)\bigg[\sum_{j=1}^{J(s)}V[Y(1)\mid AT,S=s_j(s)]\,\pi_{D(0)}(s_j(s))\frac{p(s_j(s))}{p(s)}\\
&\qquad+\sum_{j=1}^{J(s)}V[Y(0)\mid NT,S=s_j(s)]\,(1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}+\sum_{j=1}^{J(s)}V[Y(1)\mid C,S=s_j(s)]\,(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\\
&\qquad+\sum_{j=1}^{J(s)}E[Y(1)\mid AT,S=s_j(s)]^2\,\pi_{D(0)}(s_j(s))\frac{p(s_j(s))}{p(s)}+\sum_{j=1}^{J(s)}E[Y(0)\mid NT,S=s_j(s)]^2\,(1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\\
&\qquad+\sum_{j=1}^{J(s)}E[Y(1)\mid C,S=s_j(s)]^2\,(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\\
&\qquad-\sum_{j=1}^{J(s)}\Big[(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))E[Y(1)\mid C,S=s_j(s)]+\pi_{D(0)}(s_j(s))E[Y(1)\mid AT,S=s_j(s)]\\
&\qquad\qquad+(1-\pi_{D(1)}(s_j(s)))\big(E[Y(0)\mid NT,S=s_j(s)]+\beta\big)\Big]^2\frac{p(s_j(s))}{p(s)}\\
&\qquad+\beta^2\sum_{j=1}^{J(s)}(1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}+2\beta\sum_{j=1}^{J(s)}E[Y(0)\mid NT,S=s_j(s)](1-\pi_{D(1)}(s_j(s)))\frac{p(s_j(s))}{p(s)}\bigg],\tag{A-90}
\end{align*}
where (1) holds by Theorem 3.1. By (A-89) and (A-90),
\begin{equation*}
V_{1,2}-V_{1,1}=-\frac{1}{\pi_AP(C)^2}\sum_{s\in\mathcal{S}}p(s)\sum_{j=1}^{J(s)}\frac{p(s_j(s))}{p(s)}\bigg(M_1(s_j(s))-\sum_{\tilde{j}=1}^{J(s)}M_1(s_{\tilde{j}}(s))\frac{p(s_{\tilde{j}}(s))}{p(s)}\bigg)^2\le 0,\tag{A-91}
\end{equation*}
where (1) uses the following definition:
\begin{equation*}
M_1(s_j(s))\equiv(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))E[Y(1)\mid C,S=s_j(s)]+\pi_{D(0)}(s_j(s))E[Y(1)\mid AT,S=s_j(s)]+(1-\pi_{D(1)}(s_j(s)))\big(E[Y(0)\mid NT,S=s_j(s)]+\beta\big).\tag{A-92}
\end{equation*}
For $a=1,2$, let $V_{0,a}$ be equal to $V^{sat}_{Y,0}+V^{sat}_{D,0}$ for the RCT with strata $\mathcal{S}_a$.
By a similar argument,
\begin{equation*}
V_{0,2}-V_{0,1}=-\frac{1}{(1-\pi_A)P(C)^2}\sum_{s\in\mathcal{S}}p(s)\sum_{j=1}^{J(s)}\frac{p(s_j(s))}{p(s)}\bigg(M_0(s_j(s))-\sum_{\tilde{j}=1}^{J(s)}M_0(s_{\tilde{j}}(s))\frac{p(s_{\tilde{j}}(s))}{p(s)}\bigg)^2\le 0,\tag{A-93}
\end{equation*}
where we use the following definition:
\begin{equation*}
M_0(s_j(s))\equiv(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))E[Y(0)\mid C,S=s_j(s)]+(1-\pi_{D(1)}(s_j(s)))E[Y(0)\mid NT,S=s_j(s)]+\pi_{D(0)}(s_j(s))\big(E[Y(1)\mid AT,S=s_j(s)]-\beta\big).\tag{A-94}
\end{equation*}
For $a=1,2$, let $V_{H,a}$ be equal to $V^{sat}_H$ for the RCT with strata $\mathcal{S}_a$. By Theorem 3.1,
\begin{align*}
V_{H,1}&=\frac{1}{P(C)^2}\sum_{s\in\mathcal{S}}p(s)\big[(\pi_{D(1)}(s)-\pi_{D(0)}(s))\big(E[Y(1)-Y(0)\mid C,S=s]-\beta\big)\big]^2\\
&\overset{(1)}{=}\frac{1}{P(C)^2}\sum_{s\in\mathcal{S}}p(s)\bigg[\sum_{j=1}^{J(s)}(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))\big(E[Y(1)-Y(0)\mid C,S=s_j(s)]-\beta\big)\frac{p(s_j(s))}{p(s)}\bigg]^2\\
&\overset{(2)}{=}\frac{1}{P(C)^2}\sum_{s\in\mathcal{S}}p(s)\bigg[\sum_{j=1}^{J(s)}\frac{p(s_j(s))}{p(s)}M_H(s_j(s))\bigg]^2,\tag{A-95}
\end{align*}
where (1) follows from the LIE, and (2) uses the following definition:
\begin{equation*}
M_H(s_j(s))\equiv(\pi_{D(1)}(s_j(s))-\pi_{D(0)}(s_j(s)))\big(E[Y(1)-Y(0)\mid C,S=s_j(s)]-\beta\big).\tag{A-96}
\end{equation*}
In turn,
\begin{equation*}
V_{H,2}=\frac{1}{P(C)^2}\sum_{s\in\mathcal{S}}p(s)\sum_{j=1}^{J(s)}\frac{p(s_j(s))}{p(s)}M_H(s_j(s))^2.\tag{A-97}
\end{equation*}
By (A-95) and (A-97),
\begin{equation*}
V_{H,2}-V_{H,1}=\frac{1}{P(C)^2}\sum_{s\in\mathcal{S}}p(s)\sum_{j=1}^{J(s)}\frac{p(s_j(s))}{p(s)}\bigg(M_H(s_j(s))-\sum_{\tilde{j}=1}^{J(s)}\frac{p(s_{\tilde{j}}(s))}{p(s)}M_H(s_{\tilde{j}}(s))\bigg)^2\ge 0.\tag{A-98}
\end{equation*}
To conclude the proof, consider the following argument.
\begin{align*}
V^{sat}_1-V^{sat}_2&\overset{(1)}{=}V_{1,1}+V_{0,1}+V_{H,1}-V_{1,2}-V_{0,2}-V_{H,2}\\
&\overset{(2)}{=}\frac{1}{P(C)^2}\sum_{s\in\mathcal{S}}\sum_{j=1}^{J(s)}p(s)\frac{p(s_j(s))}{p(s)}\bigg[\frac{1}{\pi_A}\bigg(M_1(s_j(s))-\sum_{\tilde{j}=1}^{J(s)}M_1(s_{\tilde{j}}(s))\frac{p(s_{\tilde{j}}(s))}{p(s)}\bigg)^2\\
&\qquad+\frac{1}{1-\pi_A}\bigg(M_0(s_j(s))-\sum_{\tilde{j}=1}^{J(s)}M_0(s_{\tilde{j}}(s))\frac{p(s_{\tilde{j}}(s))}{p(s)}\bigg)^2-\bigg(M_H(s_j(s))-\sum_{\tilde{j}=1}^{J(s)}M_H(s_{\tilde{j}}(s))\frac{p(s_{\tilde{j}}(s))}{p(s)}\bigg)^2\bigg]\\
&\overset{(3)}{=}\frac{1}{P(C)^2}\sum_{s\in\mathcal{S}}p(s)\bigg[\frac{1}{\pi_A}\sum_{j=1}^{J(s)}\frac{p(s_j(s))}{p(s)}\bigg(M_0(s_j(s))+M_H(s_j(s))+\beta-\sum_{\tilde{j}=1}^{J(s)}\big(M_0(s_{\tilde{j}}(s))+M_H(s_{\tilde{j}}(s))+\beta\big)\frac{p(s_{\tilde{j}}(s))}{p(s)}\bigg)^2\\
&\qquad+\frac{1}{1-\pi_A}\sum_{j=1}^{J(s)}\frac{p(s_j(s))}{p(s)}\bigg(M_0(s_j(s))-\sum_{\tilde{j}=1}^{J(s)}M_0(s_{\tilde{j}}(s))\frac{p(s_{\tilde{j}}(s))}{p(s)}\bigg)^2-\sum_{j=1}^{J(s)}\frac{p(s_j(s))}{p(s)}\bigg(M_H(s_j(s))-\sum_{\tilde{j}=1}^{J(s)}M_H(s_{\tilde{j}}(s))\frac{p(s_{\tilde{j}}(s))}{p(s)}\bigg)^2\bigg]\\
&=\frac{1}{\pi_AP(C)^2}\sum_{s\in\mathcal{S}}p(s)\sum_{j=1}^{J(s)}\frac{p(s_j(s))}{p(s)}\bigg(\frac{M_0(s_j(s))}{\sqrt{1-\pi_A}}+\sqrt{1-\pi_A}\,M_H(s_j(s))-\sum_{\tilde{j}=1}^{J(s)}\Big(\frac{M_0(s_{\tilde{j}}(s))}{\sqrt{1-\pi_A}}+\sqrt{1-\pi_A}\,M_H(s_{\tilde{j}}(s))\Big)\frac{p(s_{\tilde{j}}(s))}{p(s)}\bigg)^2\ge 0,
\end{align*}
as required by (6.3), where (1) follows from Theorem 3.1, (2) follows from (A-91), (A-93), and (A-98), and (3) follows from (A-92), (A-94), and (A-96), as they imply that $M_1(s_j(s))=M_0(s_j(s))+M_H(s_j(s))+\beta$.

Proof of Theorem 6.2.
Note that (6.5) is a consequence of Theorems 3.1, 4.1, and 5.1. It then suffices to show that $V^{sat}$ and the primitive parameters can be consistently estimated as in Theorems 3.2 and A.4 based on $\{(Y_i,Z_i,S_i,A_i)\}_{i=1}^{n_P}$ with $S_i=S(Z_i)$ for all $i=1,\ldots,n_P$. To this end, it suffices to show that $\{(Y_i(1),Y_i(0),Z_i,D_i(1),D_i(0))\}_{i=1}^{n_P}$, $\{A_i\}_{i=1}^{n_P}$, and $\{S_i\}_{i=1}^{n_P}$ satisfy Assumptions 2.1 and 2.2 with strata function $S:\mathcal{Z}\to\mathcal{S}$.

First, note that Assumption 2.1 in the hypothetical RCT follows from the fact that the data in the pilot RCT are an i.i.d. sample from the same underlying population. Second, note that Assumption 2.2(b) in the hypothetical RCT is directly imposed in the statement.

To conclude, we show Assumption 2.2(a) in the hypothetical RCT. We use $W_i\equiv(Y_i(1),Y_i(0),Z_i,D_i(1),D_i(0))$ for all $i=1,\ldots,n_P$. By Assumption 2.2(a) in the pilot RCT,
\begin{equation*}
P(\{(A_i,W_i)\}_{i=1}^{n_P}\mid\{S^P_i\}_{i=1}^{n_P})=P(\{A_i\}_{i=1}^{n_P}\mid\{S^P_i\}_{i=1}^{n_P})\,P(\{W_i\}_{i=1}^{n_P}\mid\{S^P_i\}_{i=1}^{n_P}).\tag{A-99}
\end{equation*}
Since $\{S^P_i\}_{i=1}^{n_P}$ is determined by $\{Z_i\}_{i=1}^{n_P}$ via $S^P_i=S^P(Z_i)$, (A-99) implies that
\begin{equation*}
\frac{P(\{(A_i,W_i)\}_{i=1}^{n_P})}{P(\{S^P_i\}_{i=1}^{n_P})}=P(\{A_i\}_{i=1}^{n_P}\mid\{S^P_i\}_{i=1}^{n_P})\,\frac{P(\{W_i\}_{i=1}^{n_P})}{P(\{S^P_i\}_{i=1}^{n_P})},
\end{equation*}
and so
\begin{equation*}
P(\{A_i\}_{i=1}^{n_P}\mid\{W_i\}_{i=1}^{n_P})=P(\{A_i\}_{i=1}^{n_P}\mid\{S^P_i\}_{i=1}^{n_P}).\tag{A-100}
\end{equation*}
Next, consider the following derivation for any arbitrary $\{s_i\}_{i=1}^{n_P}\in\mathcal{S}^{n_P}$.
\begin{align*}
P(\{A_i\}_{i=1}^{n_P}\mid\{S_i\}_{i=1}^{n_P}=\{s_i\}_{i=1}^{n_P})&=\int_{\{w_i\}_{i=1}^{n_P}:\,\{S_i\}_{i=1}^{n_P}=\{s_i\}_{i=1}^{n_P}}P(\{A_i\}_{i=1}^{n_P}\mid\{W_i\}_{i=1}^{n_P}=\{w_i\}_{i=1}^{n_P})\\
&\hspace{10em}dP(\{W_i\}_{i=1}^{n_P}=\{w_i\}_{i=1}^{n_P}\mid\{S_i\}_{i=1}^{n_P}=\{s_i\}_{i=1}^{n_P})\\
&\overset{(1)}{=}\int_{\{w_i\}_{i=1}^{n_P}:\,\{(S_i,S^P_i)\}_{i=1}^{n_P}=\{(s_i,s^P_i)\}_{i=1}^{n_P}}P(\{A_i\}_{i=1}^{n_P}\mid\{W_i\}_{i=1}^{n_P}=\{w_i\}_{i=1}^{n_P})\\
&\hspace{10em}dP(\{W_i\}_{i=1}^{n_P}=\{w_i\}_{i=1}^{n_P}\mid\{(S_i,S^P_i)\}_{i=1}^{n_P}=\{(s_i,s^P_i)\}_{i=1}^{n_P})\\
&\overset{(2)}{=}\int_{\{w_i\}_{i=1}^{n_P}:\,\{(S_i,S^P_i)\}_{i=1}^{n_P}=\{(s_i,s^P_i)\}_{i=1}^{n_P}}P(\{A_i\}_{i=1}^{n_P}\mid\{S^P_i\}_{i=1}^{n_P}=\{s^P_i\}_{i=1}^{n_P})\\
&\hspace{10em}dP(\{W_i\}_{i=1}^{n_P}=\{w_i\}_{i=1}^{n_P}\mid\{(S_i,S^P_i)\}_{i=1}^{n_P}=\{(s_i,s^P_i)\}_{i=1}^{n_P})\\
&=P(\{A_i\}_{i=1}^{n_P}\mid\{S^P_i\}_{i=1}^{n_P}=\{s^P_i\}_{i=1}^{n_P}),\tag{A-101}
\end{align*}
where (1) holds by (6.4), as the values of $\{s^P_i\}_{i=1}^{n_P}$ are then determined by $\{s_i\}_{i=1}^{n_P}$, and (2) holds by (A-100). By combining (A-100) and (A-101), the desired result follows.

Proof of Theorem 6.3.
We only prove both results for the SAT IV estimator; the proof of the second result for the other IV estimators is identical under $\tau(s)=0$ for all $s\in\mathcal{S}$. Using the notation in the statement, the asymptotic variance of the SAT IV estimator is
\begin{equation*}
V^{sat}=\frac{1}{P(C)^2}\sum_{s\in\mathcal{S}}p(s)\left(\frac{\Pi_1(s)^2}{\pi_A(s)}+\frac{\Pi_0(s)^2}{1-\pi_A(s)}\right)+V^{sat}_H.
\end{equation*}
Then, (6.6) follows from minimizing this expression with respect to $\{\pi_A(s):s\in\mathcal{S}\}$; for each $s\in\mathcal{S}$, the map $\pi\mapsto\Pi_1(s)^2/\pi+\Pi_0(s)^2/(1-\pi)$ is convex on $(0,1)$, with first-order condition $\Pi_1(s)^2/\pi^2=\Pi_0(s)^2/(1-\pi)^2$. In turn, (6.8) follows from a similar minimization under the restriction imposed by Assumption 2.3(c).

To conclude, note that the CMT implies the consistency of the plug-in estimators based on (6.6) and (6.8).

References
Angrist, J. D. and G. Imbens (1994): "Identification and Estimation of Local Average Treatment Effects," Econometrica, 62, 467–475.
Angrist, J. D. and V. Lavy (2009): "The Effects of High Stakes High School Achievement Awards: Evidence from a Randomized Trial," American Economic Review, 99, 1384–1414.

Ansel, J., H. Hong, and J. Li (2018): "OLS and 2SLS in Randomized and Conditionally Randomized Experiments," Journal of Economics and Statistics (Jahrbücher für Nationalökonomie und Statistik), 238, 243–293.
Attanasio, O., A. Kugler, and C. Meghir (2011): "Subsidizing Vocational Training for Disadvantaged Youth in Colombia: Evidence from a Randomized Trial," American Economic Journal: Applied Economics, 3, 188–220.
Bruhn, M. and D. McKenzie (2008): "In Pursuit of Balance: Randomization in Practice in Development Field Experiments," American Economic Journal: Applied Economics, 1, 200–232.
Bugni, F. A., I. A. Canay, and A. M. Shaikh (2018): "Inference under Covariate Adaptive Randomization," Journal of the American Statistical Association (Theory & Methods), 113, 1741–1768.

——— (2019): "Inference under Covariate-Adaptive Randomization with Multiple Treatments," Quantitative Economics, 10, 1741–1768.
Duflo, E., R. Glennerster, and M. Kremer (2007): "Using Randomization in Development Economics Research: A Toolkit," Centre for Economic Policy Research, Discussion Paper No. 6059.
Dupas, P., D. Karlan, J. Robinson, and D. Ubfal (2018): "Banking the Unbanked? Evidence from Three Countries," American Economic Journal: Applied Economics, 10, 257–297.
McIntosh, C., T. Alegría, G. Ordóñez, and R. Zenteno (2018): "The Neighborhood Impacts of Local Infrastructure Investment: Evidence from Urban Mexico," American Economic Journal: Applied Economics, 10, 263–286.
Rosenberger, W. F. and J. M. Lachin (2016): Randomization in Clinical Trials: Theory and Practice, John Wiley & Sons, Inc., second ed.
Somville, V. and L. Vandewalle (2018): "Saving by Default: Evidence from a Field Experiment in Rural India," American Economic Journal: Applied Economics, 10, 39–66.