Identifying Causal Effects in Experiments with Spillovers and Non-compliance
Francis J. DiTraglia, Camilo García-Jimeno, Rossa O'Keeffe-O'Donovan, Alejandro Sánchez-Becerra
Department of Economics, University of Oxford
Federal Reserve Bank of Chicago
Department of Economics, University of Pennsylvania
This Version: November 16, 2020
Abstract
This paper shows how to use a randomized saturation experimental design to identify and estimate causal effects in the presence of social interactions (one person's treatment may affect another's outcome) and one-sided non-compliance (subjects can only be offered treatment, not compelled to take it up). Two distinct causal effects are of interest in this setting: direct effects quantify how a person's own treatment changes her outcome, while indirect effects quantify how her peers' treatments change her outcome. We consider the case in which social interactions occur only within known groups, and take-up decisions do not depend on peers' offers. In this setting we point identify local average treatment effects, both direct and indirect, in a flexible random coefficients model that allows for both heterogeneous treatment effects and endogenous selection into treatment. We go on to propose a feasible estimator that is consistent and asymptotically normal as the number and size of groups increases.
Keywords: social interactions, spillovers, non-compliance, randomized saturation
JEL Codes:
C14, C21, C26, C90

The views expressed in this article are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of Chicago or the Federal Reserve System. We thank Esther Duflo, Roland Rathelot, and Philippe Zamora for their help securing our access to the experimental data set we use in this paper. We also thank Christina Goldschmidt, seminar participants at The Philadelphia Fed, the 2018 IAAE Annual Conference, UPenn, the 2018 SEA Annual Meetings, and the 2020 Econometric Society World Congress for helpful comments and suggestions. Corresponding author: [email protected], Manor Road, Oxford OX1 3UQ, UK.
1 Introduction
Randomized saturation experiments provide a powerful tool for estimating causal effects in the presence of social interactions, also known as spillovers or interference, by generating exogenous variation in both individuals' own treatment offers and the fraction of their peers who are offered treatment (Hudgens and Halloran, 2008). These two sources of variation allow researchers to study both direct causal effects (the effect of Alice's treatment on her own outcome) and indirect causal effects (the effect of Bob's treatment on Alice's outcome). A complete understanding of both direct and indirect effects is crucial for program evaluation in settings with social interactions. When considering a national job placement program, for example, policymakers may worry that the indirect effects of the program could completely offset the direct effects: in a slack labor market, job placement could merely change who is employed without affecting the overall employment rate (Crépon et al., 2013).

In this paper we provide methods that use data from a randomized saturation design to identify and estimate direct and indirect causal effects in the presence of social interactions and one-sided non-compliance. In real-world experiments non-compliance is the norm rather than the exception. In their study of the French labor market, Crépon et al. (2013) found that only 35% of workers offered job placement services took them up. Despite pervasive non-compliance in practice, most of the existing literature on randomized saturation designs either assumes perfect compliance (all subjects adhere to their experimentally-assigned treatment allocation) or identifies only intent-to-treat effects (the effect of being offered treatment). In contrast, we use the experimental design as a source of instrumental variables to estimate local average treatment effects (LATE) when subjects endogenously select into treatment on the basis of their experimental offers.
In a world of homogeneous treatment effects, a simple instrumental variables (IV) regression using individual treatment offers and group saturations as instruments would identify both direct and indirect effects. In most if not all real-world settings, however, treatment effects vary across individuals. In the presence of heterogeneity, this "naïve" IV approach will not in general recover interpretable causal effects. To allow for realistic patterns of heterogeneity in a tractable framework, we study a flexible random coefficients model in which causal effects may depend on an individual's treatment take-up as well as that of her peers.

Our approach relies on four key assumptions. First is partial interference: we assume that each subject belongs to a single, known group and that social interactions occur only within groups. This is reasonable in many experimental settings where, for example, groups correspond to villages, and social interactions across them are negligible. Second is anonymous interactions: we assume that individuals' potential outcome functions depend on their peers' treatment take-up only through the average take-up in their group. Under this assumption only the number of treated neighbors matters, not their identities (Manski, 2013). In the absence of detailed network data, the assumption of anonymous interactions is a natural starting point and is likely to be reasonable in settings such as the labor market example described above. Third is one-sided non-compliance: we assume that the only individuals who can take up treatment are those to whom treatment was offered via the experimental design. One-sided non-compliance is relatively common in practice, for example when an "encouragement design" is used to introduce a new program, product or technology that is otherwise unavailable (e.g. Crépon et al., 2013; Miguel and Kremer, 2004). We refer to our fourth key assumption as individualized offer response, or IOR for short.
IOR requires that each subject's treatment take-up decision depends only on her own treatment offer, and not on the offers made to her peers. While IOR is a strong assumption, it is testable and a priori reasonable in many contexts. In Crépon et al. (2013), for example, local labor markets are large and potential participants in the job placement program are unlikely to know each other in advance. As such, they are unlikely to influence each other's treatment take-up decisions, even if they may impose employment externalities on one another. IOR is also reasonable in online settings where other subjects' take-up decisions are unobserved (Anderson et al., 2014; Bond et al., 2012; Eckles et al., 2016) or confidential (Yi et al., 2015).

Because it rules out any form of strategic take-up, IOR allows us to divide the population into never-takers and compliers, two of the traditional LATE strata. Under the randomized saturation design and a standard exclusion restriction, we show how to construct valid and relevant instruments that identify the average causal effects of interest. The key to our approach is a result showing that conditioning on group size $n$ and the share of compliers $\bar{c}$ in a group breaks any dependence between peers' average take-up and an individual's random coefficients. Under the randomized saturation design, the share of Alice's neighbors who are offered treatment is exogenous. Under IOR, their average take-up depends only on how many of them are compliers and whether they are offered treatment. Thus, conditional on $n$ and $\bar{c}$, any residual variation in the take-up of Alice's neighbors comes solely from the experimental design. Although group size is observed, the share of compliers in a given group is not. In a large group, however, the rate of take-up among those offered treatment, call it $\hat{c}$, closely approximates $\bar{c}$.
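This approximation is easy to see in a short simulation. The sketch below is ours, with made-up group size and saturation; the 35% complier share echoes the take-up rate reported by Crépon et al. (2013):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000       # group size (illustrative)
c_bar = 0.35   # complier share
s = 0.5        # saturation: probability of receiving a treatment offer

C = rng.random(n) < c_bar   # complier indicator for each group member
Z = rng.random(n) < s       # Bernoulli treatment offers at saturation s
D = C & Z                   # IOR + one-sided non-compliance: take up iff complier and offered

c_hat = D[Z].mean()         # take-up rate among those offered treatment
print(c_hat)                # close to c_bar in a large group
```

With roughly a thousand offers, $\hat{c}$ lands within a few percentage points of $\bar{c}$, and the gap shrinks at the usual square-root rate as group size grows.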
Using this insight, we provide feasible estimators of direct and indirect causal effects that are consistent and asymptotically normal in the limit as group size grows at an appropriate rate relative to the number of groups. After constructing the appropriate instruments, our estimators can be implemented as simple IV regressions without the need for non-parametric estimation. One-sided non-compliance rules out always-takers and defiers. Kang and Imbens (2016) identify effects similar to those of Imai et al. (2018) using a variant of our IOR assumption that they call "personalized encouragement." Both Kang and Imbens (2016) and Imai et al. (2018) identify well-defined effects while placing limited structure on the potential outcome functions. The cost of this generality is that the effects they recover have a "reduced form" flavor, and are only defined relative to the specific saturations used in the experiment. While our random coefficients model is slightly more restrictive over the potential outcome functions, it allows us to recover "fully structural" causal effects that are not specific to the design of the experiment.

Our paper also relates to the applied literature that estimates spillover effects in various settings. This includes "partial population" studies in which a subset of subjects in the treatment group are left untreated and their outcomes are compared to those of subjects in a control group (Angelucci and De Giorgi, 2009; Barrera-Osorio et al., 2011; Bobonis and Finan, 2009; Duflo and Saez, 2003; Haushofer and Shapiro, 2016). It also includes cluster-randomized trials where groups are defined by a spatial radius within which social interactions may arise (Bobba and Gignoux, 2014; Miguel and Kremer, 2004) and more recent papers that use a randomized saturation design (Banerjee et al., 2012; Bursztyn et al., 2019; Giné and Mansuri, 2018; Sinclair et al., 2012). In general, this literature estimates intent-to-treat (ITT) effects.
Two notable exceptions are Crépon et al. (2013) and Akram et al. (2018), who estimate effects that are similar in spirit to the CADE of Imai et al. (2018). Our identification approach also relates to a large literature on random coefficients models, the closest being Wooldridge (2004) and Masten and Torgovitsky (2016), as well as methods that identify structural effects using control functions (Altonji and Matzkin, 2005; Imbens and Newey, 2009).

The remainder of the paper is organized as follows. Section 2 details our notation and assumptions, while section 3 presents our identification results. Section 4 provides consistent and asymptotically normal estimators of the effects identified in section 3, and section 5 concludes. Proofs appear in the appendix.
2 Notation and Assumptions
We observe $N$ individuals divided between $G$ groups. We assume throughout the paper that each group has at least two members so there is scope for social interactions. Let $g = 1, \dots, G$ index groups and $i = 1, \dots, N_g$ index individuals within a given group $g$. Using this notation, $N = \sum_g N_g$. For each individual $(i,g)$ we observe a binary treatment offer $Z_{ig}$, an indicator of treatment take-up $D_{ig}$, and an outcome $Y_{ig}$. For each group $g$ we observe a saturation $S_g \in [0,1]$ that determines the fraction of individuals offered treatment in that group. A bold letter indicates a vector and a $g$-subscript shows that this vector is restricted to members of a particular group. For example $\mathbf{Z}$ is the $N$-vector of all treatment offers $Z_{ig}$ while $\mathbf{Z}_g$ is the $N_g$-vector obtained by restricting $\mathbf{Z}$ to group $g$. Define $\mathbf{D}$ and $\mathbf{D}_g$ analogously and let $\mathbf{S}$ denote the $G$-vector of all $S_g$. At various points in our discussion we will need to refer to the average value of a variable for everyone in a group besides person $(i,g)$. As shorthand, we refer to these other individuals as person $(i,g)$'s neighbors. To indicate such an average, we use a bar along with an $(i,g)$ subscript. For instance, $\bar{D}_{ig}$ denotes the treatment take-up rate in group $g$ excluding $(i,g)$, while $\bar{Z}_{ig}$ is the analogous treatment offer rate:
$$\bar{D}_{ig} \equiv \frac{1}{N_g - 1} \sum_{j \neq i} D_{jg}, \qquad \bar{Z}_{ig} \equiv \frac{1}{N_g - 1} \sum_{j \neq i} Z_{jg}. \quad (1)$$
Note that, under this definition, $\bar{D}_{ig}$ and $\bar{Z}_{ig}$ vary across individuals in the same group depending on their values of $D_{ig}$ or $Z_{ig}$. For example in a group of eleven people, of whom five take up treatment, $\bar{D}_{ig} = 0.5$ if $D_{ig} = 0$ and $\bar{D}_{ig} = 0.4$ if $D_{ig} = 1$. We now introduce our basic assumptions, beginning with the experimental design.

Assumption 1 (Assignment of Saturations). Let $\mathcal{S} = \{s_1, s_2, \dots, s_J\}$ where $s_j \in [0,1]$ for all $j$. Saturations are assigned to groups completely at random from $\mathcal{S}$ such that $m_j$ groups are assigned to saturation $s_j$ with probability one, where $\sum_{j=1}^J m_j = G$. In other words,
$$P(S_g = s_j) = \begin{cases} m_j / G & \text{for } j = 1, \dots, J \\ 0 & \text{otherwise.} \end{cases}$$

Assumption 1 details the first stage of the randomized saturation design. In this stage, each group $g$ is assigned a saturation $S_g$ drawn completely at random from a set $\mathcal{S}$. In the example from Figure 1, fifty groups (balls) are divided equally between five saturations (urns), namely $\mathcal{S} = \{0, 0.25, 0.5, 0.75, 1\}$.
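The first-stage assignment in Assumption 1 amounts to drawing saturations without replacement from a pool containing $m_j$ copies of each $s_j$. A minimal sketch using the Figure 1 numbers (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
saturations = [0.0, 0.25, 0.5, 0.75, 1.0]  # the set S from the Figure 1 example
m = [10, 10, 10, 10, 10]                   # m_j groups per saturation; sum(m) = G = 50

# Completely random assignment: exactly m_j groups end up at saturation s_j.
pool = np.repeat(saturations, m)
S_g = rng.permutation(pool)   # S_g[g] is the saturation assigned to group g

values, counts = np.unique(S_g, return_counts=True)
print(dict(zip(values, counts)))   # each saturation appears exactly 10 times
```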
The saturation drawn in this first stage determines the fraction of individuals in the group that will be offered treatment in the second stage. Figure 1, for example, depicts a group of eight individuals that has been assigned to the 25% saturation: two are offered treatment and six are not.

Figure 1: Randomized Saturation Design. In the first stage groups (balls) are randomly assigned to saturations (urns). In the second stage, individuals within a group are randomly assigned treatment offers at the saturation selected in the first stage. The figure zooms in on a group of size eight that has been assigned to a 25% saturation: two individuals are offered treatment.

For simplicity we assume that treatment offers in the second stage follow a
Bernoulli design, in which $S_g$ determines the probability of treatment rather than the number of treatment offers.

Assumption 2 (Bernoulli Offers). $P(\mathbf{Z}_g = \mathbf{z} \mid S_g = s, N_g = n) = \prod_{i=1}^n s^{z_i} (1 - s)^{1 - z_i}$.

The randomized saturation design creates exogenous variation at the individual and group levels. Within a group some individuals are offered while others are not. Between groups, some have a large number of individuals offered treatment (a high saturation) while others do not. Many randomized saturation experiments, like the illustration in Figure 1, feature a 0% saturation or even a 100% saturation. We refer to 0% and 100% saturations collectively as corner saturations to distinguish them from all other saturations, which we call interior. (With minor modifications, all of our results can be extended to a completely randomized design, in which the number of treatment offers made to a given group is fixed conditional on $S_g$.) There is no variation in treatment offers between individuals in a group assigned a corner saturation. For this reason, as we discuss in section 3 below, the number of interior saturations in the design will determine the flexibility with which we can model potential outcome functions.

Assumptions 1–2 concern the design of the experiment. Our remaining assumptions, in contrast, concern the potential outcome and treatment functions. Without imposing any restrictions, an individual's potential outcome function $Y_{ig}(\cdot)$ could in principle depend on the treatment take-up of all individuals in the sample. We denote this unrestricted potential outcome function by $Y_{ig}(\mathbf{D})$. Assumption 3 restricts $Y_{ig}(\cdot)$ to depend only on $D_{ig}$ and $\bar{D}_{ig}$ via a random coefficients model.

Assumption 3 (Random Coefficients Model). Let $f(\cdot)$ be a $K$-vector of known functions $f_k \colon [0,1] \to \mathbb{R}$, each of which satisfies $\sup_{x \in [0,1]} |f_k(x)| < \infty$.
We assume that
$$Y_{ig}(\mathbf{D}) = Y_{ig}(\mathbf{D}_g) = Y_{ig}(D_{ig}, \bar{D}_{ig}) = f(\bar{D}_{ig})' \left[ (1 - D_{ig}) \boldsymbol{\theta}_{ig} + D_{ig} \boldsymbol{\psi}_{ig} \right]$$
where $\boldsymbol{\theta}_{ig}$ and $\boldsymbol{\psi}_{ig}$ are $K$-dimensional random vectors that may be dependent on $(D_{ig}, \bar{D}_{ig})$.

The first equality in Assumption 3 is the so-called partial interference assumption, used widely in the literature on spillover effects. This assumption states that there are no social interactions between individuals in different groups: only the treatment take-up of individuals in group $g$ affects the potential outcome of person $(i,g)$. The second equality in Assumption 3 states that person $(i,g)$'s potential outcome is only affected by the treatment take-up of the others in her group through the aggregate $\bar{D}_{ig}$. (Recall that $\bar{D}_{ig}$ is defined to exclude person $(i,g)$.) This is related to the anonymous interactions assumption from the network literature as it implies that only the number of $(i,g)$'s neighbors who take up treatment matters for her outcome; the identities of the neighbors are irrelevant (Manski, 2013). The third equality in Assumption 3 posits a finite basis function expansion for the potential outcome functions $Y_{ig}(0, \bar{D}_{ig})$ and $Y_{ig}(1, \bar{D}_{ig})$, namely
$$Y_{ig}(0, \bar{D}_{ig}) = \sum_{k=1}^K \theta_{ig}^{(k)} f_k(\bar{D}_{ig}), \qquad Y_{ig}(1, \bar{D}_{ig}) = \sum_{k=1}^K \psi_{ig}^{(k)} f_k(\bar{D}_{ig})$$
or, written more compactly in matrix form,
$$Y_{ig} = \mathbf{X}_{ig}' \mathbf{B}_{ig}, \qquad \mathbf{X}_{ig} \equiv \begin{bmatrix} 1 \\ D_{ig} \end{bmatrix} \otimes f(\bar{D}_{ig}), \qquad \mathbf{B}_{ig} \equiv \begin{bmatrix} \boldsymbol{\theta}_{ig} \\ \boldsymbol{\psi}_{ig} - \boldsymbol{\theta}_{ig} \end{bmatrix} \quad (2)$$
where the coefficient vectors $\boldsymbol{\theta}_{ig}$ and $\boldsymbol{\psi}_{ig}$, and hence $\mathbf{B}_{ig}$, are allowed to vary arbitrarily across individuals. If person $(i,g)$ has some prior knowledge of her potential outcome function $Y_{ig}(\cdot, \cdot)$, her take-up decision may depend on $\boldsymbol{\theta}_{ig}$ and $\boldsymbol{\psi}_{ig}$. More generally, the same unobserved characteristics that determine a person's decision to take up treatment could affect her potential outcomes.
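To fix ideas, the outcome equation in Assumption 3 can be evaluated directly for the linear basis $f(x) = (1, x)'$; the coefficient values below are invented for illustration:

```python
import numpy as np

def outcome(D_ig, Dbar_ig, theta, psi):
    """Potential outcome f(Dbar)'[(1 - D)theta + D psi] with f(x) = (1, x)'."""
    f = np.array([1.0, Dbar_ig])
    return f @ ((1 - D_ig) * theta + D_ig * psi)

theta = np.array([0.6, -0.3])  # untreated intercept and spillover slope (illustrative)
psi = np.array([0.8, -0.1])    # treated intercept and spillover slope (illustrative)

y0 = outcome(0, 0.5, theta, psi)  # Y(0, 0.5) = 0.6 - 0.3 * 0.5 = 0.45
y1 = outcome(1, 0.5, theta, psi)  # Y(1, 0.5) = 0.8 - 0.1 * 0.5 = 0.75
print(y0, y1)
```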
To account for these possibilities, we allow arbitrary statistical dependence between $(D_{ig}, \bar{D}_{ig})$ and $\mathbf{B}_{ig}$.

Ideally, our goal would be to identify the average direct and indirect causal effects of the binary treatment $D_{ig}$. Under Assumption 3, we define these as follows, building on the definitions of Hudgens and Halloran (2008). The direct treatment effect, DE, gives the average effect of exogenously changing an individual's own treatment $D_{ig}$ from 0 to 1 while holding the share of her treated neighbors $\bar{D}_{ig}$ fixed at $\bar{d}$, namely
$$\mathrm{DE}(\bar{d}) \equiv E\left[ Y_{ig}(1, \bar{d}) - Y_{ig}(0, \bar{d}) \right] = f(\bar{d})' E\left[ \boldsymbol{\psi}_{ig} - \boldsymbol{\theta}_{ig} \right] \quad (3)$$
where the expectations are taken over all individuals in the population from which our experimental subjects were drawn. Recall that $\bar{D}_{ig}$ excludes person $(i,g)$, ensuring that $\mathrm{DE}(\bar{d})$ is well-defined. An indirect treatment effect, in contrast, gives the average effect of exogenously increasing a person's share of treated neighbors $\bar{D}_{ig}$ from $\bar{d}$ to $\bar{d} + \Delta$ while holding her own treatment $D_{ig}$ fixed at $d$, in other words
$$\mathrm{IE}_d(\bar{d}, \Delta) \equiv E\left[ Y_{ig}(d, \bar{d} + \Delta) - Y_{ig}(d, \bar{d}) \right] = \left[ f(\bar{d} + \Delta) - f(\bar{d}) \right]' \left\{ (1 - d)\, E[\boldsymbol{\theta}_{ig}] + d\, E[\boldsymbol{\psi}_{ig}] \right\} \quad (4)$$
where $\Delta$ is a positive increment. There are two indirect treatment effect functions, $\mathrm{IE}_0$ and $\mathrm{IE}_1$, corresponding to the two possible values at which we could hold $D_{ig}$ fixed: a spillover on the untreated, and a spillover on the treated. Because the direct and indirect causal effects are fully determined by $E[\mathbf{B}_{ig}]$ under Assumption 3, this is our object of interest below. For example, if $f(x)' = [1 \;\; x]$ we obtain a linear model of the form
$$Y_{ig} = \alpha_{ig} + \beta_{ig} D_{ig} + \gamma_{ig} \bar{D}_{ig} + \delta_{ig} D_{ig} \bar{D}_{ig}. \quad (5)$$
In this case the direct effect is $\mathrm{DE}(\bar{d}) = E[\beta_{ig}] + E[\delta_{ig}]\,\bar{d}$ while the indirect effects are
$$\mathrm{IE}_0(\bar{d}, \Delta) = \Delta \times E[\gamma_{ig}], \qquad \mathrm{IE}_1(\bar{d}, \Delta) = \Delta \times E[\gamma_{ig} + \delta_{ig}].$$
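The mapping from mean coefficients to effects in the linear model can then be coded in a few lines; the mean coefficient values are again hypothetical:

```python
# Average effects in the linear model (5), given hypothetical mean coefficients.
E_beta, E_gamma, E_delta = 0.2, -0.3, 0.1

def DE(dbar):
    """Direct effect: DE(dbar) = E[beta] + E[delta] * dbar."""
    return E_beta + E_delta * dbar

def IE(d, dbar, inc):
    """Indirect effect IE_d(dbar, inc) = inc * (E[gamma] + d * E[delta]).
    In the linear model it does not depend on dbar itself."""
    return inc * (E_gamma + d * E_delta)

print(DE(0.5))                           # E[beta] + 0.5 * E[delta]
print(IE(0, 0.4, 0.1), IE(1, 0.4, 0.1))  # spillover on the untreated vs. the treated
```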
Figure 2 presents a hypothetical example of (5) in a setting with employment displacement effects. Suppose that $Y_{ig}$ is Alice's probability of long-term employment. Both $Y_{ig}(1, \bar{d})$ and $Y_{ig}(0, \bar{d})$ have a negative slope. This means that Alice's probability of long-term employment decreases if more of her neighbors obtain job placement services. But since $\delta_{ig}$ is positive, the spillover is more harmful if Alice is untreated. Alice's direct effect of treatment $Y_{ig}(1, \bar{d}) - Y_{ig}(0, \bar{d})$ is positive for all $\bar{d}$ in this example and increases as $\bar{d}$ does: job placement services are more valuable to Alice when more of her neighbors obtain them. By averaging these effects for everyone in the population, we obtain $\mathrm{IE}_0$, $\mathrm{IE}_1$, and DE.

Figure 2: A hypothetical example of the linear potential outcomes model from (5). The slope of the bottom line, $\gamma_{ig}$, is the indirect effect when untreated while that of the top line, $\gamma_{ig} + \delta_{ig}$, is the marginal indirect effect when treated. The distance between the two lines is the direct treatment effect.

Under perfect compliance $D_{ig}$ would simply equal $Z_{ig}$, making both $D_{ig}$ and $\bar{D}_{ig}$ exogenous. In this case a sample analogue of $E[Y_{ig}(d, \bar{d})]$ could be used to recover all of the treatment effects discussed above, at least at values of $\bar{d}$ that arise in the experimental design. Unfortunately non-compliance is pervasive in real-world experiments, greatly complicating the identification of causal effects. In a large-scale experiment carried out in France, for example, only 35% of unemployed workers offered job placement services took them up (Crépon et al., 2013). Those who did take up treatment likely differ in myriad ways from those who did not: they may, for example, be more conscientious. One way to avoid this problem of self-selection is to carry out an intent-to-treat (ITT) analysis, conditioning on $Z_{ig}$ and $S_g$ rather than $D_{ig}$ and $\bar{D}_{ig}$. But with take-up rates as low as 35%, ITT estimates could be very far from the causal effects of interest. In this paper we adopt a different approach.
Following the tradition in the local average treatment effect (LATE) literature, we provide conditions under which direct and indirect causal effects, rather than ITT effects, can be identified for well-defined sub-populations of individuals. We focus on the case of one-sided non-compliance, in which only those offered treatment can take it up. One-sided non-compliance is fairly common in practice (e.g. Crépon et al., 2013) and simplifies the analysis considerably.

Assumption 4 (One-sided Non-compliance). If $Z_{ig} = 0$ then $D_{ig} = 0$.

To account for endogenous treatment take-up, we define potential treatment functions $D_{ig}(\cdot)$. In principle these could depend on the treatment offers of every individual, $\mathbf{Z}$, in the experiment. The following assumption restricts $D_{ig}(\cdot)$ to permit identification of the direct and indirect causal effects described above.

Assumption 5 (IOR). $D_{ig}(\mathbf{Z}) = D_{ig}(\mathbf{Z}_g) = D_{ig}(Z_{ig}, \bar{Z}_{ig}) = D_{ig}(Z_{ig})$.

The first equality of Assumption 5 is a partial interference assumption: it requires that there are no social interactions in take-up between individuals in different groups. The second equality of Assumption 5 states that person $(i,g)$'s take-up decision depends on the treatment offers of others in her group only through the fraction $\bar{Z}_{ig}$ of treatment offers made to the others in her group. Unfortunately these first two equalities are not in general sufficient to point identify direct and indirect causal effects. The third equality, which we call individualized offer response or IOR for short, imposes the further restriction that each person's take-up decision depends only on her own treatment offer. IOR states that there are no social interactions in take-up. This is a strong assumption, but one that has also appeared in the existing literature. Kang and Imbens (2016), for example, employ a variant of IOR that they call "personalized encouragement." And while Imai et al.
(2018) derive their so-called "complier average direct effect (CADE)" under a weaker condition than IOR, the CADE is in fact a hybrid of direct and indirect effects unless one is willing to assume that there are no social interactions in take-up. (Work in progress explores the possibility of relaxing IOR in specific settings to obtain point, or at least partial, identification.) Fortunately, IOR is testable: it implies, for example, that $E[D_{ig} \mid Z_{ig} = 1, S_g = s]$ does not vary with $s$. If the observed average take-up rate among individuals who are offered treatment varies with saturation, this indicates a violation of IOR.

Under IOR and one-sided non-compliance (Assumptions 4 and 5), we can divide individuals into never-takers and compliers, two of the principal strata from the LATE literature. (Under one-sided non-compliance, Assumption 4, there are no always-takers; an extension of our results to two-sided non-compliance is currently in progress.) Never-takers are defined as those for whom $D_{ig}(0) = D_{ig}(1) = 0$, while compliers are those for whom $D_{ig}(z) = z$ for all $z$. Defining $C_{ig}$ to be the indicator that person $(i,g)$ is a complier, we have
$$D_{ig} = C_{ig} Z_{ig}, \qquad \bar{D}_{ig} = \frac{1}{N_g - 1} \sum_{j \neq i} C_{jg} Z_{jg}. \quad (6)$$
By analogy to $\bar{Z}_{ig}$ and $\bar{D}_{ig}$, we define $\bar{C}_{ig}$ to be the share of compliers among person $(i,g)$'s neighbors in group $g$, namely
$$\bar{C}_{ig} = \frac{1}{N_g - 1} \sum_{j \neq i} C_{jg}. \quad (7)$$
(Recall that the average $\bar{Z}_{ig}$ is defined to exclude $(i,g)$.) Note that $\bar{C}_{ig}$ varies across individuals in the same group, depending on their values of $C_{ig}$. Finally, let $\mathbf{C}_g$ denote the vector of $C_{ig}$ for all individuals in group $g$.

Our final assumption is an exclusion restriction for the treatment offers $\mathbf{Z}_g$ and saturation $S_g$. To state it we require two additional pieces of notation. First, let $\mathbf{B}_g$ denote the vector that stacks $\mathbf{B}_{ig}$ for all individuals in group $g$.
Second, following Dawid (1979), let "$\perp\!\!\!\perp$" denote (conditional) independence, so that $X \perp\!\!\!\perp Y$ indicates that $X$ is statistically independent of $Y$ while $X \perp\!\!\!\perp Y \mid Z$ indicates that $X$ is conditionally independent of $Y$ given $Z$. Using this notation, the exclusion restriction is as follows.

Assumption 6 (Exclusion Restriction). (i) $S_g \perp\!\!\!\perp (\mathbf{C}_g, \mathbf{B}_g, N_g)$; (ii) $\mathbf{Z}_g \perp\!\!\!\perp (\mathbf{C}_g, \mathbf{B}_g) \mid (S_g, N_g)$.

Intuitively, Assumption 6 states that $(\mathbf{C}_g, \mathbf{B}_g, N_g)$ are "predetermined" with respect to the treatment offers and saturations. In a traditional LATE setting, the counterparts of Assumption 6 are the "unconfounded type" assumption and the independence of potential outcomes and treatment offers. Assumption 6 could be violated in a number of ways. If, for example, individuals chose their group membership based on knowledge of their group's saturation, $N_g$ would not be independent of $S_g$. Similarly, if some individuals decided to comply with their treatment offers only because their group was assigned a high saturation, $\mathbf{C}_g$ would not be independent of $S_g$. This latter possibility illustrates that Assumption 6 partially embeds IOR by ruling out "selection into compliance." More prosaically, Assumption 6 would be violated if either $S_g$ or $Z_{ig}$ had a direct effect on the random coefficients $\mathbf{B}_g$. Notice that part (ii) of Assumption 6 conditions on $(S_g, N_g)$. This is because the second stage of the randomized saturation experiment assigns $\mathbf{Z}_g$ conditional on this information: see Assumption 2.

3 Identification
Under Assumption 3, the functional form of the random coefficients model is known. So why not simply use $(Z_{ig}, S_g)$ as instrumental variables for $D_{ig}$ and $f(\bar{D}_{ig})$? If the first-stage relationship between instruments and endogenous regressors is homogeneous, two-stage least squares identifies the average effects in a random coefficients model, i.e. $E[\boldsymbol{\theta}_{ig}]$ and $E[\boldsymbol{\psi}_{ig} - \boldsymbol{\theta}_{ig}]$ under (2) (Heckman and Vytlacil, 1998; Wooldridge, 1997, 2003, 2016). In our case, however, this result does not apply: the following lemma shows that the first stage is heterogeneous because the conditional distribution of $\bar{D}_{ig}$ given $S_g$ varies with $(\bar{C}_{ig}, N_g)$.

Lemma 1. Let $\bar{c}$ be a value in $[0,1]$ such that $(n-1)\bar{c}$ is a non-negative integer. Under Assumptions 1–2 and 4–6 and conditional on $(N_g = n, S_g = s, \mathbf{C}_g = \mathbf{c}, \bar{C}_{ig} = \bar{c}, Z_{ig} = z)$, $(n-1)\bar{D}_{ig}$ follows a Binomial$((n-1)\bar{c},\, s)$ distribution.

Intuitively, the problem presented by Lemma 1 is as follows. Although $S_g$ is randomly assigned, the variation that it induces in $\bar{D}_{ig}$ is mediated by the share of compliers $\bar{C}_{ig}$. Accordingly, if $\bar{C}_{ig}$, a source of first-stage heterogeneity, is correlated with the random coefficients in the second stage, the IV estimator will not identify the effects of interest. To make this problem more concrete, consider the linear potential outcomes model from (5) and let $\boldsymbol{\vartheta}_{IV}$ be the IV estimand using instruments $(1, Z_{ig}, S_g, Z_{ig}S_g)$. In this example $\boldsymbol{\vartheta}_{IV}$ takes a particularly simple form, as shown in the following lemma.

Lemma 2.
Let $\boldsymbol{\vartheta}_{IV}$ be the IV estimand from a regression of $Y_{ig}$ on $\mathbf{X}_{ig} \equiv (1, D_{ig}, \bar{D}_{ig}, D_{ig}\bar{D}_{ig})'$ with instruments $\mathbf{Z}_{ig} \equiv (1, Z_{ig}, S_g, Z_{ig}S_g)'$, namely
$$\boldsymbol{\vartheta}_{IV} \equiv \begin{bmatrix} \alpha_{IV} & \beta_{IV} & \gamma_{IV} & \delta_{IV} \end{bmatrix}' = E\left[ \mathbf{Z}_{ig} \mathbf{X}_{ig}' \right]^{-1} E[\mathbf{Z}_{ig} Y_{ig}],$$
assuming that $E[\mathbf{Z}_{ig}\mathbf{X}_{ig}']$ is invertible. Then, under (5) and Assumptions 1–2 and 4–6,
$$\alpha_{IV} = E[\alpha_{ig}], \qquad \beta_{IV} = E[\beta_{ig} \mid C_{ig} = 1],$$
$$\gamma_{IV} = E[\gamma_{ig}] + \frac{\mathrm{Cov}(\bar{C}_{ig}, \gamma_{ig})}{E(\bar{C}_{ig})}, \qquad \delta_{IV} = E[\delta_{ig} \mid C_{ig} = 1] + \frac{\mathrm{Cov}(\bar{C}_{ig}, \delta_{ig} \mid C_{ig} = 1)}{E(\bar{C}_{ig} \mid C_{ig} = 1)}.$$

As we see from Lemma 2, IV identifies the population average of $\alpha_{ig}$, along with the population average of $\beta_{ig}$ for the subset of individuals who select into treatment. Neither of these, however, is itself a causal effect. In general, IV recovers neither direct nor indirect causal effects for any well-defined group of individuals. Specializing (4) to the linear model from (5) gives $\mathrm{IE}_0(\bar{d}, \Delta) = E[\gamma_{ig}]\Delta$. In other words, $E[\gamma_{ig}]$ is an average spillover. Lemma 2 shows that IV fails to identify this quantity unless the individual-specific spillovers $\gamma_{ig}$ are uncorrelated with the share of compliers $\bar{C}_{ig}$. This condition could easily fail in practice. In the labor market example from the introduction, cities with a particularly depressed labor market might be expected to contain a large share of compliers. If negative spillovers are more intense in such cities, IV will not recover the average indirect effect. A similar problem hampers the interpretation of $\delta_{IV}$. Under (5) the average direct effect for compliers, as a function of $\bar{d}$, is given by $E[\beta_{ig} \mid C_{ig} = 1] + E[\delta_{ig} \mid C_{ig} = 1]\,\bar{d}$.
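The size of the distortion in $\gamma_{IV}$ is easy to gauge by evaluating Lemma 2's covariance correction on simulated draws. The dependence between $\gamma_{ig}$ and $\bar{C}_{ig}$ below is invented purely to make the covariance term visible:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Complier shares across groups, and spillovers that are more negative
# where compliers are more common (an invented dependence).
Cbar = rng.uniform(0.2, 0.8, n)
gamma = -0.3 - 0.5 * (Cbar - Cbar.mean()) + rng.normal(0, 0.1, n)

E_gamma = gamma.mean()
gamma_IV = E_gamma + np.cov(Cbar, gamma)[0, 1] / Cbar.mean()  # Lemma 2's formula
print(E_gamma, gamma_IV)  # gamma_IV is pushed below E[gamma] by the covariance term
```

Here $\mathrm{Cov}(\bar{C}_{ig}, \gamma_{ig}) \approx -0.5 \times \mathrm{Var}(\bar{C}_{ig}) = -0.015$, so $\gamma_{IV}$ understates the average spillover by about $0.03$, a tenth of its magnitude.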
While IV identifies the intercept of this function, it only identifies the slope if $\delta_{ig}$ is uncorrelated with $\bar{C}_{ig}$ for compliers.

As this example illustrates, identifying direct and indirect causal effects requires us to correct for possible dependence between individual-specific coefficients and group-level take-up that arises from the first-stage relationship in Lemma 1. The key to our approach, as shown in the following theorem, is to condition on $\bar{C}_{ig}$ and $N_g$.

Theorem 1.
Under Assumptions 1–2 and 4–6, $(S_g, Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (\mathbf{B}_{ig}, C_{ig}) \mid (\bar{C}_{ig}, N_g)$.

Theorem 1 implies that conditioning on $(\bar{C}_{ig}, N_g)$ is sufficient to break any dependence between $f(\bar{D}_{ig})$ and $(\mathbf{B}_{ig}, C_{ig})$ that may be present. The intuition for this result is as follows. Conditional on $\bar{C}_{ig}$ and $N_g$, we know precisely how many of $(i,g)$'s neighbors are compliers. Given this information, IOR implies that all remaining variation in $\bar{D}_{ig}$ arises solely from experimental variation in the saturation $S_g$ assigned to different groups, and the share of compliers offered treatment across groups assigned the same saturation. So long as $Z_{ig}$ and $S_g$ do not affect $(\mathbf{B}_{ig}, \mathbf{C}_g)$, Assumption 6, it follows that $(Z_{ig}, \bar{D}_{ig}, S_g)$ are exogenous given $(\bar{C}_{ig}, N_g)$, even when individuals decide whether or not to take up treatment based on knowledge of their potential outcome functions.

Before stating our identification results, we require some additional notation and one further assumption. Define the vector $\mathbf{W}_{ig}$ and matrix-valued functions $Q$, $Q_0$, $Q_1$ as follows:
$$Q(\bar{c}, n) \equiv E\left[ \mathbf{W}_{ig}\mathbf{W}_{ig}' \mid \bar{C}_{ig} = \bar{c}, N_g = n \right], \qquad \mathbf{W}_{ig} \equiv \begin{bmatrix} 1 \\ Z_{ig} \end{bmatrix} \otimes f(\bar{D}_{ig}) \quad (8)$$
$$Q_0(\bar{c}, n) \equiv E\left[ (1 - Z_{ig}) f(\bar{D}_{ig}) f(\bar{D}_{ig})' \mid \bar{C}_{ig} = \bar{c}, N_g = n \right] \quad (9)$$
$$Q_1(\bar{c}, n) \equiv E\left[ Z_{ig} f(\bar{D}_{ig}) f(\bar{D}_{ig})' \mid \bar{C}_{ig} = \bar{c}, N_g = n \right]. \quad (10)$$
We use $Q$, $Q_0$, $Q_1$ below to construct instrumental variables that are not subject to the shortcomings of $\mathbf{Z}_{ig}$ from Lemma 2 discussed above. The final ingredient that we need to construct these alternative instruments is a rank condition.

Assumption 7 (Rank Condition). (i) $0 < E(C_{ig}) < 1$; (ii) $Q(\bar{c}, n)$ is invertible at every point $(\bar{c}, n)$ in the support of $(\bar{C}_{ig}, N_g)$.

Part (i) of Assumption 7 asserts that there is at least some degree of non-compliance with the experimental treatment offers, $E(C_{ig}) < 1$, and that the population contains at least some compliers, $E(C_{ig}) > 0$. Part (ii) requires that the matrix-valued function $Q$ defined in (8) is full rank when evaluated at any share of compliers $\bar{c}$ and group size $n$ that occur in the population. Assumption 7 does not explicitly restrict $Q_0$ or $Q_1$. By the linearity of conditional expectation, however,
$$Q(\bar{c}, n) = \begin{bmatrix} Q_0(\bar{c}, n) + Q_1(\bar{c}, n) & Q_1(\bar{c}, n) \\ Q_1(\bar{c}, n) & Q_1(\bar{c}, n) \end{bmatrix} \quad (11)$$
so Assumption 7(ii) could equivalently be stated in terms of $Q_0$ and $Q_1$.

Lemma 3. $Q(\bar{c}, n)$ is invertible iff $Q_0(\bar{c}, n)$ and $Q_1(\bar{c}, n)$ are both invertible, in which case
$$Q(\bar{c}, n)^{-1} = \begin{bmatrix} Q_0(\bar{c}, n)^{-1} & -Q_0(\bar{c}, n)^{-1} \\ -Q_0(\bar{c}, n)^{-1} & Q_0(\bar{c}, n)^{-1} + Q_1(\bar{c}, n)^{-1} \end{bmatrix}.$$

We discuss low-level conditions for the invertibility of $(Q_0, Q_1)$, and hence $Q$, below. Having assumed the necessary rank condition, we can now state our main identification results. The following theorem shows how $Q_0(\bar{C}_{ig}, N_g)$ and $Q_1(\bar{C}_{ig}, N_g)$ can be used to construct instrumental variables that identify average values of the random coefficients for well-defined groups of individuals.

Theorem 2.
Define the instrument vectors

  Z^W_ig ≡ Q(¯C_ig, N_g)^{−1} W_ig,   Z^1_ig ≡ Q_1(¯C_ig, N_g)^{−1} f(¯D_ig),   Z^0_ig ≡ Q_0(¯C_ig, N_g)^{−1} f(¯D_ig),

where Q, Q_0, Q_1, and W_ig are as given in (8)–(10). Then, under Assumptions 3–5 and 7 and assuming that (Z_ig, ¯D_ig) ⊥⊥ (B_ig, C_ig) | (¯C_ig, N_g), we have

(i) [ E(θ_ig) ; E(ψ_ig − θ_ig | C_ig = 1) ] = E[ Z^W_ig X′_ig ]^{−1} E[ Z^W_ig Y_ig ],
(ii) E[ ψ_ig | C_ig = 1 ] = E[ Z^1_ig {D_ig f(¯D_ig)}′ ]^{−1} E[ Z^1_ig D_ig Y_ig ],
(iii) E[ θ_ig | C_ig = 0 ] = E[ Z^1_ig {Z_ig (1 − D_ig) f(¯D_ig)}′ ]^{−1} E[ Z^1_ig Z_ig (1 − D_ig) Y_ig ], and
(iv) E[ θ_ig ] = E[ Z^0_ig {(1 − Z_ig) f(¯D_ig)}′ ]^{−1} E[ Z^0_ig (1 − Z_ig) Y_ig ].

In part (i), rather than using Z_ig and S_g directly as a source of instruments for f(¯D_ig), we transform this vector of endogenous regressors into a set of exogenous instruments using Q(¯C_ig, N_g)^{−1}. Parts (ii) and (iii) use a similar approach to obtain moment equations for the average value of ψ_ig for compliers and θ_ig for never-takers. Given part (i), part (iv) is technically redundant, but it is convenient to have an expression for E(θ_ig) in isolation. To understand the intuition behind the instruments from Theorem 2, consider the linear potential outcomes example from (5) above. Here we have f(x) = (1, x)′ and thus

  Q_z(¯C_ig, N_g) = P(Z_ig = z) E[ ( 1      ¯D_ig
                                    ¯D_ig   ¯D²_ig ) | ¯C_ig, N_g, Z_ig = z ],   z ∈ {0, 1},

using the fact that Z_ig ⊥⊥ (¯C_ig, N_g) by Lemma A.2. It follows after a few steps of algebra that

  Q_z(¯C_ig, N_g)^{−1} f(¯D_ig) = 1/P(Z_ig = z) × [ ( E(¯D²_ig | ¯C_ig, N_g, Z_ig = z) − ¯D_ig E(¯D_ig | ¯C_ig, N_g, Z_ig = z) ) / Var(¯D_ig | ¯C_ig, N_g, Z_ig = z)
                                                    ( ¯D_ig − E(¯D_ig | ¯C_ig, N_g, Z_ig = z) ) / Var(¯D_ig | ¯C_ig, N_g, Z_ig = z) ].
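The closed-form expression above is easy to check numerically. The following Python sketch (all moment values hypothetical) builds Q_z for the linear basis f(x) = (1, x)′ from an assumed P(Z_ig = z), a conditional mean, and a conditional second moment of ¯D_ig, and confirms that directly inverting Q_z reproduces the displayed formula.

```python
import numpy as np

# Hypothetical conditional moments of Dbar given (Cbar_ig, N_g, Z_ig = z); any
# values with positive conditional variance work for this check.
p_z = 0.5                 # P(Z_ig = z)
m = 0.3                   # E[Dbar | Cbar, N, Z = z]
s2 = 0.13                 # E[Dbar^2 | Cbar, N, Z = z]
v = s2 - m ** 2           # Var(Dbar | Cbar, N, Z = z)

# Q_z for the linear basis f(x) = (1, x)'
Q_z = p_z * np.array([[1.0, m], [m, s2]])

for dbar in (0.0, 0.25, 0.8):
    f = np.array([1.0, dbar])
    direct = np.linalg.solve(Q_z, f)                 # Q_z^{-1} f(dbar)
    closed = np.array([(s2 - dbar * m) / v,          # first entry of the display
                       (dbar - m) / v]) / p_z        # second entry
    assert np.allclose(direct, closed)
```

The first entry of the closed form is constant in the data while the second is the scaled deviation of ¯D_ig from its conditional mean, which is the source of the intuition discussed next.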
While ¯D_ig is endogenous, we see that the scaled difference between ¯D_ig and its conditional expectation is a valid instrument under the linear potential outcomes model. Intuitively, this transformation adjusts for the first-stage heterogeneity discussed at the beginning of this section: after controlling for differences in (¯C_ig, N_g), the remaining variation in ¯D_ig arises only from the experimentally-assigned saturations. Thus, rather than using S_g as an instrument directly, we use it indirectly to generate variation in ¯D_ig given (¯C_ig, N_g). As discussed below, this is crucial for part (ii) of Assumption 7.

Notice that Theorem 2 does not explicitly invoke the randomized saturation design, Assumptions 1–2, or the exclusion restriction, Assumption 6. (As Theorem 2 does not, strictly speaking, require a randomized saturation design, it could in principle be applied in other settings, e.g. a "natural" experiment, if our other assumptions are satisfied. In this case Q_0 and Q_1 would not be known, but could potentially be recovered via a non-parametric approach.) Using this result for identification, however, requires two conditions. First we need to satisfy (Z_ig, ¯D_ig) ⊥⊥ (B_ig, C_ig) | (¯C_ig, N_g). As shown in Theorem 1 above, the randomized saturation design and exclusion restriction are sufficient for this condition to hold under one-sided non-compliance and IOR, Assumptions 4 and 5. Second, we need to show that the functions Q_0, Q_1 are identified in order to construct the instruments from Theorem 2. Fortunately, these functions are in fact known under the randomized saturation design and exclusion restriction. In particular, they depend only on the distribution of ¯D_ig | (Z_ig, ¯C_ig, N_g), which can be calculated from Lemma 1, and the distribution of Z_ig | (¯C_ig, N_g), which coincides with its unconditional distribution by Lemma A.2. As such, we can always calculate Q_1(¯C_ig, N_g) and Q_0(¯C_ig, N_g) by simulating the experimental design.
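Computing Q_0 and Q_1 by simulating the design can be illustrated with a short Monte Carlo sketch. All design parameters below are hypothetical: two equally likely saturations, Bernoulli offers, and a fixed complier share among a person's neighbors. The simulation also confirms the block structure in (11) and the inverse formula from Lemma 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: two equally likely saturations, group size n, and a
# fixed complier count k among person i's n - 1 neighbors.
s_L, s_H, n, cbar = 0.25, 0.75, 41, 0.5
k = int(cbar * (n - 1))
reps = 50_000

S = rng.choice([s_L, s_H], size=reps)                # group saturations
Z_i = (rng.random(reps) < S).astype(float)           # own offer
offers = rng.random((reps, k)) < S[:, None]          # complier neighbors' offers
Dbar = offers.sum(axis=1) / (n - 1)                  # neighbors' take-up rate

f = np.stack([np.ones(reps), Dbar], axis=1)          # f(x) = (1, x)'
W = np.concatenate([f, Z_i[:, None] * f], axis=1)    # (1, Z)' kron f(Dbar)

Q = W.T @ W / reps
ff = f[:, :, None] * f[:, None, :]
Q0 = (ff * (1 - Z_i)[:, None, None]).mean(axis=0)
Q1 = (ff * Z_i[:, None, None]).mean(axis=0)

# Block structure from (11): Q = [[Q0 + Q1, Q1], [Q1, Q1]].
assert np.allclose(Q, np.block([[Q0 + Q1, Q1], [Q1, Q1]]))

# Lemma 3: Q^{-1} assembled from Q0^{-1} and Q1^{-1}.
Q0i, Q1i = np.linalg.inv(Q0), np.linalg.inv(Q1)
assert np.allclose(np.linalg.inv(Q), np.block([[Q0i, -Q0i], [-Q0i, Q0i + Q1i]]))
```

Because Z²_ig = Z_ig draw by draw, the block identity holds exactly in the simulated sample, not merely in the limit.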
Depending on the choice of f, analytical expressions for Q_0, Q_1 may be available, as shown below for the linear potential outcomes model from (5).

Constructing the instruments that appear in Theorem 2 requires us to evaluate Q_0 and Q_1 at (¯C_ig, N_g). While the group size N_g is observed, the share of compliers ¯C_ig is not. In large groups, however, ¯C_ig can be precisely estimated by calculating the rate of treatment take-up among the neighbors of (i, g) who are offered treatment. In the following section we use this approach to provide consistent and asymptotically normal estimators of the parameters from Theorem 2. For the remainder of this section, however, we consider identification conditional on knowledge of ¯C_ig. Subject to this qualification, the following result catalogues the full set of causal effects that are identified under our assumptions.

Theorem 3.
Given knowledge of ¯C_ig, the following are identified under Assumptions 1–7:

(i) IE(¯d, ∆) ≡ E[ Y_ig(0, ¯d + ∆) − Y_ig(0, ¯d) ],
(ii) DE(¯d | D_ig = 1) ≡ E[ Y_ig(1, ¯d) − Y_ig(0, ¯d) | D_ig = 1 ],
(iii) IE_0(¯d, ∆ | D_ig = 1) ≡ E[ Y_ig(0, ¯d + ∆) − Y_ig(0, ¯d) | D_ig = 1 ],
(iv) IE_1(¯d, ∆ | D_ig = 1) ≡ E[ Y_ig(1, ¯d + ∆) − Y_ig(1, ¯d) | D_ig = 1 ],
(v) IE(¯d, ∆ | C_ig = 0) ≡ E[ Y_ig(0, ¯d + ∆) − Y_ig(0, ¯d) | C_ig = 0 ].

Part (i) of Theorem 3 is an indirect treatment effect, as defined in (4) above. It measures the causal impact of increasing the treatment take-up rate among Alice's neighbors from ¯d to (¯d + ∆) when Alice's own treatment is held fixed at zero. In the Crépon et al. (2013) experiment discussed in our empirical example below, this corresponds to the average labor market displacement effect. Whereas part (i) is an average treatment effect, parts (ii)–(iv) are effects of treatment on the treated. Part (ii) gives the direct effect of treating Alice while holding the treatment take-up rate of her neighbors fixed at ¯d, while (iii) and (iv) give the indirect effect of increasing her neighbors' treatment take-up from ¯d to ¯d + ∆ while holding Alice's treatment fixed at either zero, part (iii), or one, part (iv). Part (v) is a LATE generalization of Equation 4: it gives the indirect effect for never-takers, holding their treatment fixed at zero. While we identify the full set of direct and indirect effects for compliers, never-takers are never observed with D_ig = 1: because we consider a setting with one-sided non-compliance, any experimental participant with D_ig = 1 must be a complier. As such, we cannot identify direct treatment effects for this group or indirect treatment effects when D_ig is held fixed at one. This in turn implies that we cannot identify the average direct effect for the population as a whole, DE(¯d), or the average indirect effect when D_ig is held fixed at one, IE_1(¯d, ∆).

Given that Q_0 and Q_1 are completely determined by the experimental design, we can directly check part (ii) of Assumption 7 for any choice of basis functions f and probability distribution over saturations. Consider again the linear potential outcomes model from (5). In this example f(x) = (1, x)′ and thus

  Q_0(¯c, n) = [ E{1 − S_g}            ¯c E{S_g(1 − S_g)}
                 ¯c E{S_g(1 − S_g)}    ¯c² E{S²_g(1 − S_g)} + ¯c(n − 1)^{−1} E{S_g(1 − S_g)} ]   (12)

  Q_1(¯c, n) = [ E{S_g}       ¯c E{S²_g}
                 ¯c E{S²_g}   ¯c² E{S³_g} + ¯c(n − 1)^{−1} E{S²_g(1 − S_g)} ]   (13)

by Bayes' Theorem, the Law of Total Probability, and Lemmas 1 and A.2. Suppose first that there is a single saturation s. Then (12) and (13) simplify to yield

  |Q_0(¯c, n)| = ¯c s(1 − s)² / (n − 1),   |Q_1(¯c, n)| = ¯c s³(1 − s) / (n − 1),

so that Q_0(¯c, n) and Q_1(¯c, n) are both invertible for any n and all ¯c greater than zero provided that 0 < s < 1. The identifying power of this "degenerate" randomized saturation design, however, is weak: Q_0, Q_1 are arbitrarily close to being singular for any ¯c if n is sufficiently large. Consider next a so-called "cluster randomized" experiment in which there are two saturations, 0 and 1, and P(S_g = 1) = p. Calculating the expectations in (12) and (13),

  Q_0(¯c, n) = [ 1 − p   0
                 0       0 ],   Q_1(¯c, n) = [ p      ¯c p
                                               ¯c p   ¯c² p ].

In this case neither Q_0 nor Q_1 is invertible for any values of n and ¯c. Finally, consider a design with two distinct, equally likely saturations s_L < s_H. For this design, straightforward but tedious algebra gives

  |Q_0(¯c, n)| = ¯c²(1 − s_L)(1 − s_H)(s_H − s_L)² / 4 + ¯c [(1 − s_L) + (1 − s_H)] [s_L(1 − s_L) + s_H(1 − s_H)] / (4(n − 1))

  |Q_1(¯c, n)| = ¯c² s_L s_H (s_H − s_L)² / 4 + ¯c (s_L + s_H) [s²_L(1 − s_L) + s²_H(1 − s_H)] / (4(n − 1)).

So long as neither s_L nor s_H equals zero or one, both terms in each expression are strictly positive for any ¯c > 0, so that Q_0 and Q_1 are invertible. Moreover, in contrast to the single saturation design discussed above, this design does not suffer from a weak identification problem. While the second term in each of the preceding equalities vanishes for large n, the first term does not. Thus, two interior saturations are sufficient to strongly identify the linear potential outcomes model.

As the three preceding examples show, two distinct sources of experimental variation determine the rank of Q_0(¯c, n) and Q_1(¯c, n): "between" saturation variation and "within" saturation variation. Our first example lacks "between" variation because each group is assigned the same saturation, S_g = s. Yet even with a single saturation, there is still "within" variation under Assumption 2, because the number of offers made to a given group is random. This "within" variation, however, is negligible when n is large. In our second example, the cluster randomized experiment, the situation is reversed. Because everyone in a given group is either offered (S_g = 1) or unoffered (S_g = 0), this design generates no "within" variation. While a cluster randomized design does generate some "between" variation, it is too coarse to identify our effects of interest: under our assumptions ¯D_ig equals zero when S_g = 0 and ¯C_ig when S_g = 1. Our third example, with two interior saturations 0 < s_L < s_H < 1, generates "between" variation that remains informative even when n is so large that "within" variation becomes negligible.

If ¯C_ig were observed, a handful of just-identified IV regressions would suffice to estimate the causal effects from Theorem 3. While ¯C_ig is unobserved in practice, fortunately we can estimate it under one-sided non-compliance by comparing treatment take-up to the share of treatment offers, i.e.

  Ĉ_ig ≡ ¯D_ig / ¯Z_ig  if ¯Z_ig > 0,   and   Ĉ_ig ≡ 0  otherwise,   (14)

where we arbitrarily define Ĉ_ig = 0 if none of (i, g)'s neighbors are offered treatment.
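A minimal simulation, under assumed values for the offer probability and complier share, illustrates that the estimator in (14) concentrates around the true neighbor complier share as group size grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def max_chat_error(n, G, s=0.3, cbar=0.4):
    """Worst case of |Chat_ig - Cbar_ig| across G groups when each of a
    person's n - 1 neighbors is independently a complier w.p. cbar and
    offered treatment w.p. s (one-sided non-compliance: D = C * Z)."""
    C = rng.random((G, n - 1)) < cbar          # neighbors' complier status
    Z = rng.random((G, n - 1)) < s             # neighbors' offers
    Zbar = Z.mean(axis=1)
    Dbar = (C & Z).mean(axis=1)                # take-up = complier and offered
    Cbar = C.mean(axis=1)
    Chat = np.divide(Dbar, Zbar, out=np.zeros_like(Dbar), where=Zbar > 0)
    return np.abs(Chat - Cbar).max()

# Consistent with Lemma 4 below, the worst-case error shrinks as n grows.
errors = [max_chat_error(n, G=200) for n in (26, 101, 401, 1601)]
assert errors[0] > errors[-1]
```

The offer probability `s`, the complier share `cbar`, and the grid of group sizes are hypothetical choices made purely for illustration.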
In this section we use (14) to derive feasible, consistent, and asymptotically normal estimators of the direct and indirect causal effects identified in section 3. For simplicity, we assume throughout that the random saturation S_g is bounded below by s > 0. Because we cannot estimate ¯C_ig when S_g = 0, experiments that include a 0% saturation require a slightly different approach. We explain these differences in Appendix B. (In general, sufficient conditions for Assumption 7(ii) will depend on the specific choice of basis functions f. For large n, however, a necessary condition is that the design contains at least as many distinct interior saturations as there are elements in f. For details, see Appendix C. Note also that under Assumption 2 it is possible, although unlikely, that ¯Z_ig could be zero even if S_g > 0.)

In the interest of brevity, we introduce shorthand notation and high-level regularity conditions that apply to all four of our sample analogue estimators. These take the form

  ϑ̂ ≡ ( Σ_{g=1}^G Σ_{i=1}^{N_g} Ẑ_ig X′_ig )^{−1} ( Σ_{g=1}^G Σ_{i=1}^{N_g} Ẑ_ig Y_ig ),   Ẑ_ig ≡ R(Ĉ_ig, N_g)^+ W_ig,   (15)

where Y_ig is the outcome variable from Assumption 3, and M^+ denotes the Moore-Penrose inverse of a square matrix M. Table 1 gives the definitions of X_ig, R, and W_ig corresponding to each part of Theorem 2.

  Part   X_ig                         R     W_ig
  (i)    [1  D_ig]′ ⊗ f(¯D_ig)       Q     [1  Z_ig]′ ⊗ f(¯D_ig)
  (ii)   D_ig f(¯D_ig)               Q_1   f(¯D_ig)
  (iii)  Z_ig(1 − D_ig) f(¯D_ig)     Q_1   f(¯D_ig)
  (iv)   (1 − Z_ig) f(¯D_ig)         Q_0   f(¯D_ig)

Table 1: This table defines the shorthand from (15) for the four sample analogue estimators corresponding to the parts of Theorem 2. In each part, the vector of regressors is X_ig, the true instrument vector is Z_ig ≡ R(¯C_ig, N_g)^{−1} W_ig, and the estimated instrument vector is Ẑ_ig ≡ R(Ĉ_ig, N_g)^+ W_ig, where M^+ denotes the Moore-Penrose inverse of a square matrix M, and Ĉ_ig is as defined in (14). The functions Q, Q_0, Q_1 are as defined in (8)–(10).

The "estimated" instrument Ẑ_ig is a stand-in for the unobserved "true" instrument Z_ig ≡ R(¯C_ig, N_g)^{−1} W_ig. While R(¯C_ig, N_g) is invertible under Assumption 7, R(Ĉ_ig, N_g) may not be, since Ĉ_ig could fall outside the support of ¯C_ig or even equal zero. For this reason we define Ẑ_ig using the Moore-Penrose inverse, which always exists and coincides with the ordinary matrix inverse when R(Ĉ_ig, N_g) is indeed invertible. As G grows, so does the number of unknown values ¯C_ig that we must estimate to construct the instrument vectors Ẑ_ig. For this reason, we consider an asymptotic sequence in which the minimum group size n grows along with the number of groups G.
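The sample analogue estimator in (15) is straightforward to code. The sketch below is a generic implementation, not tied to any particular dataset: it forms Ẑ_ig with the Moore-Penrose inverse exactly as described, and includes a degenerate sanity check in which the instrument equals the regressor so that (15) reduces to least squares.

```python
import numpy as np

def iv_estimate(X, W, Chat, N, Y, R_fn):
    """Sample analogue of (15): Zhat_ig = R(Chat_ig, N_g)^+ W_ig, then
    thetahat = (sum Zhat X')^{-1} (sum Zhat Y). Rows of X and W correspond
    to individuals; R_fn maps (chat, n) to the square matrix R."""
    den, num = 0.0, 0.0
    for x, w, chat, n_g, y in zip(X, W, Chat, N, Y):
        zhat = np.linalg.pinv(R_fn(chat, n_g)) @ w   # Moore-Penrose inverse
        den = den + np.outer(zhat, x)
        num = num + zhat * y
    return np.linalg.solve(den, num)

# Degenerate check: with W = X exogenous and R the identity, the estimator
# recovers theta exactly in a noiseless linear model.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
theta = np.array([1.0, 2.0])
Y = X @ theta
est = iv_estimate(X, X, np.zeros(500), np.full(500, 10), Y, lambda c, n: np.eye(2))
assert np.allclose(est, theta)
```

In an application, `R_fn` would be one of Q, Q_0, Q_1 evaluated via (12)–(13) or by simulating the design, and `Chat` would hold the estimates from (14).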
Under appropriate assumptions, letting the minimum group size n grow with G implies that the limit behavior of ϑ̂ coincides with that of the infeasible estimator that uses the true instrument Z_ig instead of its estimate Ẑ_ig. (While ¯C_ig can vary across individuals in the same group, it takes on at most two distinct values for fixed g. If a group contains T total individuals, of whom c are compliers and n are never-takers, then the share of compliers among a given person's neighbors is either (c − 1)/(T − 1) if she is a complier or c/(T − 1) if she is a never-taker. Thus, the number of incidental parameters is 2G.)

Like Baird et al. (2018), we take an infinite population approach to inference, assuming that the researcher observes a random sample of size G from a population of groups. Unlike Baird et al. (2018), we allow these groups to differ in size. Upon drawing a group g from the population, we observe the group-level random variables (S_g, N_g) along with the individual-level random variables (Y_ig, D_ig, Z_ig) for each member of the group: 1 ≤ i ≤ N_g. We further assume that observations are identically distributed, but not independent, within groups.

Groups are only observed as a unit: either everyone from the group appears in the sample or no one does. For this reason, some care is needed in defining random variables to represent our sampling procedure and expectations to represent the population averages that define our causal effects of interest. The expectations in Theorems 2–3 are averages that give equal weight to each individual in the population, or sub-population if we condition on C_ig. Analogously, the estimator in (15) is an average that gives equal weight to each individual in the sample. Both of these are precisely what we want, as our goal is to identify and estimate average causal effects for individuals. Under the sampling assumptions introduced in the preceding paragraph, (Y_ig, D_ig, Z_ig, ¯D_ig) are random variables that are drawn by choosing a group uniformly at random from the population of groups, and then a single person from the chosen group. If all groups were the same size, this would be equivalent to choosing a person uniformly at random from the population of individuals. When groups vary in size, however, the equivalence no longer holds.
This creates the possibility for ambiguity when taking the expectation of an individual-level random variable, such as Y_ig, without conditioning on group size: is the expectation intended to give equal weight to groups or individuals? Fortunately this is only a question of defining appropriate notation. Our group sampling procedure unambiguously gives equal weight to each individual in the population because we observe not isolated individuals but whole groups. While small groups are just as likely to be drawn as large groups, large groups make a greater contribution to the sample averages from (15) because they contain more people. (The assumption that observations are identically distributed within a group amounts to stipulating that the indices 1 ≤ i ≤ N_g are assigned at random. To see how the ambiguity arises, consider a population of 100 groups, half of which have 5 members and the rest of which have 15 members, so that 250 of the 1000 people in the population belong to a small group and the remaining 750 belong to a large group. Suppose first that we choose a single group at random and then a single person within the selected group. Then someone from a small group has probability 1/500 of being selected while someone from a large group has probability 1/1500 of being selected. Suppose instead that we randomly sample 10 groups and observe everyone in the selected groups. Then, on average, our sample will contain 5 small groups and 5 large groups. While the total sample size is random, we will on average observe 100 people, of whom 25 come from small groups and the rest from large groups, matching the shares of each kind of individual in the population.) The question is merely how to represent this weighting in our notation. Let ρ_g ≡ N_g / E(N_g) denote the relative size of group g.
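The numerical example above can be made concrete in a few lines of Python: with 100 groups, half of size 5 and half of size 15, the group-weighted and individual-weighted expectations of a variable that happens to equal group size differ (10 versus 12.5).

```python
import numpy as np

# The population from the example: 100 groups, half of size 5, half of size 15.
N = np.array([5] * 50 + [15] * 50, dtype=float)
rho = N / N.mean()                    # relative group size, rho_g = N_g / E(N_g)

# A hypothetical individual-level variable that equals group size, chosen so
# that the two weighting conventions visibly disagree.
Y = N.copy()

group_weighted = Y.mean()             # E[Y]: one random person per random group
indiv_weighted = (rho * Y).mean()     # E[rho Y]: every member of a random group

assert group_weighted == 10.0
assert indiv_weighted == 12.5
```

Any individual-level variable uncorrelated with group size would give the same answer under both conventions; the divergence arises exactly when the variable and N_g are dependent.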
We write E[Y_ig] to denote the average that gives equal weight to groups, choosing one person at random from a randomly-chosen group, and E[ρ_g Y_ig] to denote the average that gives equal weight to individuals, observing an entire group chosen at random. It is the latter expectation that appears in our asymptotic results below, as it denotes the population equivalent of the double sums from (15). While this is a slight abuse of notation, expectations from section 3 above that involve individual-level random variables but do not condition on group size should be interpreted as (implicitly) weighting by relative group size. Using the notation and sampling scheme defined above, we now state high-level sufficient conditions for the consistency of ϑ̂ from Equation 15.

Theorem 4.
Let ρ_g ≡ N_g / E(N_g) and suppose that

(i) we observe a random sample of G groups, where observations within a given group are identically distributed although not necessarily independent,
(ii) Y_ig = X′_ig ϑ + U_ig for 1 ≤ g ≤ G, 1 ≤ i ≤ N_g,
(iii) E(ρ_g Z_ig U_ig) = 0 and E(ρ_g Z_ig X′_ig) = I,
(iv) E[ ρ²_g ||Z_ig X′_ig||² ] = o(G),
(v) E[ ρ²_g ||Z_ig U_ig||² ] = o(G),
(vi) || Σ_{g=1}^G N_g^{−1} Σ_{i=1}^{N_g} ρ_g (Ẑ_ig − Z_ig) X′_ig || = o_P(G), and
(vii) || Σ_{g=1}^G N_g^{−1} Σ_{i=1}^{N_g} ρ_g (Ẑ_ig − Z_ig) U_ig || = o_P(G).

Then ϑ̂, defined in (15), is consistent for ϑ as G → ∞.

Condition (i) of Theorem 4 simply restates our group sampling assumption. Conditions (ii) and (iii) hold under the assumptions of Theorem 2, as shown in the proof of that result: for each average effect ϑ from the theorem, we can define an appropriate error term U_ig, vector of regressors X_ig, and vector of instruments Z_ig such that Y_ig = X′_ig ϑ + U_ig where Z_ig is an exogenous and relevant instrument. Moreover, for each part of Theorem 2, E(ρ_g Z_ig X′_ig) equals the identity matrix. (For effects that condition on C_ig = c, e.g. those from parts (ii) and (iii) of Theorem 2, the appropriate definition of ρ_g becomes N_g E[1(C_ig = c)] / E[N_g 1(C_ig = c)]. Also, given that E(ρ_g Z_ig X′_ig) = I, we could have defined our estimator as a sample average of Ẑ_ig Y_ig rather than as ϑ̂; it is more convenient both for our asymptotic derivations and for practical implementation, however, to work with an IV estimator.) Conditions (iv) and (v) of Theorem 4 would be implied by requiring that the second moments of ρ_g Z_ig X′_ig and ρ_g Z_ig U_ig exist and are bounded. We do not impose this requirement because the distribution of ρ_g necessarily changes with G if we consider an asymptotic sequence in which the minimum group size n increases with the number of groups, as we will assume below.
Requiring the relevant expectations to be o(G) in principle allows the variance of relative group size ρ_g to grow along with the number of groups, provided that it does not grow too quickly. Conditions (i)–(v) together are sufficient for the consistency of

  ϑ̃ ≡ ( Σ_{g=1}^G Σ_{i=1}^{N_g} Z_ig X′_ig )^{−1} ( Σ_{g=1}^G Σ_{i=1}^{N_g} Z_ig Y_ig ),   (16)

an infeasible estimator that uses the true instrument vector Z_ig instead of its estimate Ẑ_ig. The final two conditions of Theorem 4 assume that Ẑ_ig is a sufficiently accurate estimator of Z_ig to ensure that ϑ̂ = ϑ̃ + o_P(1). In the setting we consider here, this will require a condition on how quickly the minimum group size n grows relative to G, as we discuss in detail below. Strengthening conditions (v) and (vii) and adding one further assumption implies that ϑ̂ is asymptotically normal.

Theorem 5.
Suppose that

(i) Var( N_g^{−1} Σ_{i=1}^{N_g} ρ_g Z_ig U_ig ) → Σ as G → ∞,
(ii) E[ ρ_g^{2+δ} ||Z_ig U_ig||^{2+δ} ] = o(G^{δ/2}) for some δ > 0, and
(iii) || Σ_{g=1}^G N_g^{−1} Σ_{i=1}^{N_g} ρ_g (Ẑ_ig − Z_ig) U_ig || = o_P(G^{1/2}).

Then, under the conditions of Theorem 4, √G (ϑ̂ − ϑ) →_d N(0, Σ).

Combined with the first four conditions of Theorem 4, (i) and (ii) from Theorem 5 are sufficient for the asymptotic normality of ϑ̃, the infeasible estimator defined in (16). Condition (i) implies that the rate of convergence of ϑ̃ is G^{−1/2}. Obtaining a rate of convergence that depends on the total number of individuals rather than groups in the sample would require assumptions that are implausible in typical applications of the randomized saturation design. (Obtaining the faster rate would require Var( N_g^{−1} Σ_{i=1}^{N_g} ρ_g Z_ig U_ig ) → 0 as G → ∞. Because we consider an asymptotic sequence in which the minimum group size grows with G, this is technically possible. It would, however, require both heterogeneity between groups and dependence within groups to vanish in the limit.) Conditions (ii) and (iii) strengthen (v) and (vii), respectively, from Theorem 4: (ii) is sufficient for the Lindeberg condition, which we use to establish a central limit theorem, while (iii) ensures that the limit distribution of the feasible estimator ϑ̂ coincides with that of the infeasible estimator ϑ̃. For (iii) to hold, we need the estimation error (Ẑ_ig − Z_ig) to be sufficiently small on average that the limiting behavior of ϑ̂ coincides with that of the infeasible estimator. We now provide low-level sufficient conditions for this to obtain. By definition,

  Ẑ_ig − Z_ig = [ R(Ĉ_ig, N_g)^+ − R(¯C_ig, N_g)^{−1} ] W_ig.   (17)

Accordingly, so long as R is a sufficiently well-behaved function, (Ẑ_ig − Z_ig) will be small if |Ĉ_ig − ¯C_ig| is.
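The claim that (Ẑ_ig − Z_ig) inherits the size of |Ĉ_ig − ¯C_ig| when R is well-behaved can be illustrated numerically. The sketch below uses the single-saturation case of Q_0 from (12), with hypothetical values of s, ¯c, and n, and shows that the gap between the Moore-Penrose inverse evaluated at a perturbed complier share and the exact inverse shrinks with the size of the perturbation.

```python
import numpy as np

def Q0(cbar, n, s=0.5):
    """Q0 from (12) when the design has a single saturation s, so that all
    expectations over S_g are degenerate. Values of s, cbar, n hypothetical."""
    return np.array([
        [1 - s,               cbar * s * (1 - s)],
        [cbar * s * (1 - s),  cbar**2 * s**2 * (1 - s)
                              + cbar * s * (1 - s) / (n - 1)],
    ])

cbar, n = 0.5, 50
gaps = [np.linalg.norm(np.linalg.pinv(Q0(cbar + eps, n)) - np.linalg.inv(Q0(cbar, n)))
        for eps in (0.1, 0.01, 0.001)]

# The instrument error shrinks with the perturbation |Chat - Cbar|.
assert gaps[0] > gaps[1] > gaps[2]
```

Away from singularity the matrix inverse is locally Lipschitz in ¯c, which is exactly the role Assumption 8 plays below.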
As shown in the following lemma, a sufficient condition for this difference to vanish uniformly over (i, g) is for the minimum group size n to be large relative to log G.

Lemma 4.
Suppose that 0 < s ≤ S_g and n ≤ N_g. Under Assumptions 1–2 and 4–6,

  max_{1 ≤ g ≤ G} ( max_{1 ≤ i ≤ N_g} | Ĉ_ig − ¯C_ig | ) = O_P( √(log G / n) )   as (n, G) → ∞.

The following regularity conditions are sufficient for R(Ĉ_ig, N_g)^+ − R(¯C_ig, N_g)^{−1} to inherit the asymptotic behavior of (Ĉ_ig − ¯C_ig).

Assumption 8 (Regularity Conditions for R). (i) R(¯c, n) is well-defined and symmetric for all ¯c ∈ [¯c_L/2, 1] and n ≥ n_0, where 0 < ¯c_L ≤ ¯C_ig; (ii) inf_{¯c ≥ ¯c_L/2, n ≥ n_0} σ_min(R(¯c, n)) > σ_0 > 0, where σ_min(M) denotes the minimum eigenvalue of M; (iii) ||R(¯c_1, n) − R(¯c_2, n)|| ≤ L { |¯c_1 − ¯c_2| + O(n^{−1/2}) } as n → ∞ for some 0 < L < ∞.

Parts (i) and (ii) of Assumption 8 require that R is well-defined and uniformly invertible over a range of values for ¯c that includes the support of ¯C_ig and excludes zero. Part (iii) is a variant of Lipschitz continuity that holds in the limit as n grows. These conditions are mild: they amount to a slight strengthening of the rank condition from Assumption 7. In the linear basis function example from (12) and (13), for instance, Assumption 8 holds whenever ¯C_ig is bounded away from zero and S_g takes on at least two distinct values between zero and one. (See the discussion in section 3 immediately following (12) for details.) More generally, provided that Assumption 7 holds, whenever ¯C_ig is bounded away from zero and the basis functions f are well-behaved, we can always extend the definitions of Q_0, Q_1 from (9)–(10) to ensure that Assumption 8 holds. See Appendix C for full details. Under this assumption, we can derive sufficient conditions on the rates at which G and n approach infinity to ensure that the difference between Ẑ_ig and Z_ig is negligible.

Theorem 6. Suppose that E[ ρ²_g ||W_ig X′_ig||² ] and E[ ρ²_g ||W_ig U_ig||² ] are both o(G).
Then, under condition (i) of Theorem 4 and the conditions of Lemma 4,

(i) log G / n → 0 is sufficient for conditions (vi)–(vii) of Theorem 4, and
(ii) G log G / n → 0 is sufficient for condition (iii) of Theorem 5.

Taken together, Theorems 4–6 establish that ϑ̂ from (15) is consistent, and asymptotically normal, in the limit as G and n grow at an appropriate rate. In practical terms, our estimators are appropriate for settings with many large groups, such as the experiment of Crépon et al. (2013). To implement them in practice, all that is required is to calculate the estimated instrument Ẑ_ig and then run the appropriate just-identified IV regression from Table 1 with standard errors clustered by group.

Conclusion

In this paper we have proposed methods to identify and estimate direct and indirect causal effects under one-sided non-compliance, using data from a randomized saturation experiment. Under appropriate assumptions, we show that the key source of unobserved heterogeneity is the share of compliers within a given group. In a setting with many large groups, this quantity can be estimated, yielding a simple IV estimator that is consistent and asymptotically normal in the limit as group size and the number of groups grow. A possible extension of the methods described above would be to consider settings with two-sided non-compliance. In this case our identification approach would condition on the share of always-takers in addition to the share of compliers. Another interesting extension would be to consider relaxing Assumption 5 to allow some dependence of individuals' take-up decisions on the offers of their peers. Work currently in progress explores this possibility.
A Proofs
The following lemma, taken from Constantinou and Dawid (2017), summarizes several useful properties of conditional independence that we use in our proofs below. The names attached to properties (i) and (iii)–(v) are taken from Pearl (1988). For the purposes of this document, we call the second property "redundancy."
Lemma A.1 (Axioms of Conditional Independence). Let
X, Y, Z, W be random vectors defined on a common probability space, and let h be a measurable function. Then:

(i) (Symmetry): X ⊥⊥ Y | Z ⟹ Y ⊥⊥ X | Z.
(ii) (Redundancy): X ⊥⊥ Y | Y.
(iii) (Decomposition): X ⊥⊥ Y | Z and W = h(Y) ⟹ X ⊥⊥ W | Z.
(iv) (Weak Union): X ⊥⊥ Y | Z and W = h(Y) ⟹ X ⊥⊥ Y | (W, Z).
(v) (Contraction): X ⊥⊥ Y | Z and X ⊥⊥ W | (Y, Z) ⟹ X ⊥⊥ (Y, W) | Z.

For simplicity, our proofs below freely use the Symmetry property without comment, although we reference the other properties when used. We also rely on the following corollary of Lemma A.1.
Corollary A.1. X ⊥⊥ Y | Z implies (X, Z) ⊥⊥ Y | Z.

Proof of Lemma 1.
Applying Corollary A.1 and the Decomposition property to Assumption 6(ii) yields Z_g ⊥⊥ (C_g, ¯C_ig) | (N_g, S_g). By the definition of conditional independence, it follows that the distribution of Z_g | (N_g, S_g, C_g, ¯C_ig) is the same as that of Z_g | (N_g, S_g):

  P(Z_g = z | N_g = n, S_g = s, C_g, ¯C_ig) = P(Z_g = z | N_g = n, S_g = s).   (A.1)

Now, define the shorthand A ≡ {N_g = n, S_g = s, C_g = c, ¯C_ig = ¯c} and let C(i) be the indices of all non-zero components of c, excluding the ith component, i.e. C(i) ≡ {j ≠ i : c_j = 1}. By the definition of ¯D_ig, the event {¯D_ig = ¯d} is equivalent to { Σ_{j≠i} C_jg Z_jg = ¯d (N_g − 1) }. Consequently,

  P(¯D_ig = ¯d | A, Z_ig) = P( Σ_{j≠i} C_jg Z_jg = ¯d (n − 1) | A, Z_ig ) = P( Σ_{j∈C(i)} Z_jg = ¯d (n − 1) | A, Z_ig ),

where the first equality uses the fact that A implies N_g = n, and the second uses the fact that A implies C_g = c, so we know precisely which of the indicators C_jg equal zero and which equal one. Under Assumption 2, (A.1) implies that the components of Z_g are iid Bernoulli(s) conditional on A. By our definition of C(i) it follows that, conditional on A, the subvector of Z_g that corresponds to C(i) constitutes an iid sequence of ¯c(n − 1) Bernoulli(s) random variables, each of which is independent of Z_ig. Hence, conditional on (A, Z_ig), we see that Σ_{j∈C(i)} Z_jg ∼ Binomial( ¯c(n − 1), s ).

Proof of Lemma 2.
Under (5), Y ig = X ′ ig B ig where B ig = ( α ig , β ig , γ ig , δ ig ) ′ . Now, let R ig ≡ (cid:8) S g , Z ig , N g , ¯ C ig , C ig , B ig (cid:9) and Λ ig ≡ diag (cid:8) , C ig , ¯ C ig , C ig ¯ C ig (cid:9) . From Lemma 1 we see that E [ ¯ D ig |R ] =¯ C ig S g . Since D ig = C ig Z ig under one-sided non-compliance and IOR, it follows that E [ X ′ ig |R ig ] = Z ′ ig Λ ig . Hence, E [ Z ig Y ig ] = E (cid:2) Z ig E ( X ′ ig |R ig ) B ig (cid:3) = E (cid:2)(cid:0) Z ig Z ′ ig (cid:1) ( Λ ig B ig ) (cid:3) E (cid:2) Z ig X ′ ig (cid:3) = E (cid:2) Z ig E (cid:0) X ′ ig |R ig (cid:1)(cid:3) = E (cid:2)(cid:0) Z ig Z ′ ig (cid:1) Λ ig (cid:3) since Z ig and B ig are R ig –measurable. Now, applying Decomposition and Corollary A.1 to part(ii) of Assumption 6 gives Z ig | = ( C ig , ¯ C ig , B ig ) | ( S g , N g ). Under Bernoulli offers, however, this con-ditional distribution does not involve N g , so we obtain( C ig , ¯ C ig , B ig ) | = Z ig | S g . (A.2)Similarly, applying Decomposition to part (ii) of Corollary A.1, we see that ( C ig , ¯ C ig , B ig ) | = S g .Combining this with (A.2), the Contraction axiom yields ( C ig , ¯ C ig , B ig ) | = ( Z ig , S g ), implying that( Z ig Z ′ ig ) is independent of both Λ ig and ( Λ ig B ig ). Accordingly, ϑ IV = (cid:8) E (cid:2)(cid:0) Z ig Z ′ ig (cid:1) Λ ig (cid:3)(cid:9) − E (cid:2)(cid:0) Z ig Z ′ ig (cid:1) ( Λ ig B ig ) (cid:3) = E [ Λ ig ] − E [ Λ ig B ig ] . y the definitions of ϑ IV , Λ ig and B ig it follows that α IV = E [ α ig ] , β IV = E [ C ig β ig ] E [ C ig ] , γ IV = E (cid:2) ¯ C ig γ ig (cid:3) E (cid:2) ¯ C ig (cid:3) , δ IV = E (cid:2) C ig ¯ C ig δ ig (cid:3) E (cid:2) C ig ¯ C ig (cid:3) . By iterated expectations over C ig , we obtain β IV = E [ β ig | C ig = 1] while γ IV = E (cid:2) ¯ C ig γ ig (cid:3) E (cid:2) ¯ C ig (cid:3) = Cov( ¯ C ig , γ ig ) + E ( ¯ C ig ) E ( γ ig ) E ( ¯ C ig ) = E [ γ ig ] + Cov( ¯ C ig , γ ig ) E ( ¯ C ig ) . 
Similarly, again taking iterated expectations over $C_{ig}$,
\[ \delta_{IV} = \frac{E[\bar{C}_{ig}\delta_{ig} \mid C_{ig} = 1]}{E[\bar{C}_{ig} \mid C_{ig} = 1]} = E[\delta_{ig} \mid C_{ig} = 1] + \frac{\mathrm{Cov}(\bar{C}_{ig}, \delta_{ig} \mid C_{ig} = 1)}{E(\bar{C}_{ig} \mid C_{ig} = 1)}. \]

Proof of Theorem 1.
Assumption 6(i) implies $(C_g, B_g) \perp\!\!\!\perp S_g \mid N_g$ by Weak Union and Decomposition. Combining this with Assumption 6(ii) gives
\[ (Z_g, S_g) \perp\!\!\!\perp (B_g, C_g) \mid N_g \tag{A.3} \]
by Contraction. Now let $C_{-ig}$ denote the subvector of $C_g$ that excludes element $i$. Applying Decomposition, Corollary A.1, and Weak Union to (A.3),
\[ (S_g, Z_g) \perp\!\!\!\perp (B_{ig}, C_{ig}, C_{-ig}, N_g) \mid (N_g, \bar{C}_{ig}) \tag{A.4} \]
because $\bar{C}_{ig}$ is a function of $(C_g, N_g)$. By Lemma 1,
\[ \bar{D}_{ig} \perp\!\!\!\perp C_{-ig} \mid (N_g, \bar{C}_{ig}, S_g, Z_{ig}). \tag{A.5} \]
Applying Decomposition to (A.4) gives $C_{-ig} \perp\!\!\!\perp (S_g, Z_{ig}) \mid (N_g, \bar{C}_{ig})$. Combining this with (A.5),
\[ (S_g, Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp C_{-ig} \mid (N_g, \bar{C}_{ig}) \tag{A.6} \]
by Contraction. Now, applying Weak Union, Decomposition, and Corollary A.1 to (A.4),
\[ (S_g, Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (B_{ig}, C_{ig}) \mid (C_{-ig}, \bar{C}_{ig}, N_g) \tag{A.7} \]
since $\bar{D}_{ig}$ is a function of $(Z_g, C_{-ig}, N_g)$. Finally, applying Contraction to (A.6) and (A.7),
\[ (S_g, Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (C_{-ig}, B_{ig}, C_{ig}) \mid (\bar{C}_{ig}, N_g) \]
and the result follows by a final application of Decomposition.

Proof of Lemma 3.
Define the shorthand $U \equiv Q(\bar{c}, n)$, $A \equiv Q_0(\bar{c}, n)$, and $B \equiv Q_1(\bar{c}, n)$ so that
\[ U = \begin{bmatrix} A + B & B \\ B & B \end{bmatrix}. \]
Using this notation, we are asked to show that $U$ is invertible if and only if $A$ and $B$ are both invertible, in which case $U^{-1} = V$ where
\[ V \equiv \begin{bmatrix} A^{-1} & -A^{-1} \\ -A^{-1} & A^{-1} + B^{-1} \end{bmatrix}. \]
The "if" direction follows by direct calculation: $VU = UV = I$. For the "only if" direction, suppose that $U$ is invertible. Partitioning $U^{-1}$ into blocks $(C, D, E, F)$ conformably with the partition of $U$, we have
\[ UU^{-1} = \begin{bmatrix} A + B & B \\ B & B \end{bmatrix}\begin{bmatrix} C & D \\ E & F \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} = \begin{bmatrix} C & D \\ E & F \end{bmatrix}\begin{bmatrix} A + B & B \\ B & B \end{bmatrix} = U^{-1}U. \]
We begin by showing that $A$ is invertible. Consider the product $UU^{-1}$. Multiplying the first row of $U$ by the first column of $U^{-1}$ gives the equation $AC + B(C + E) = I$; multiplying the second row of $U$ by the first column of $U^{-1}$ gives $B(C + E) = 0$. Combining these, $AC = I$. Now consider the product $U^{-1}U$. Multiplying the first row of $U^{-1}$ by the first column of $U$ gives $CA + (C + D)B = I$; multiplying the first row of $U^{-1}$ by the second column of $U$ gives $(C + D)B = 0$. Combining these, $CA = I$. Since $AC = CA = I$, we have shown that $A$ is invertible with $A^{-1} = C$.

We next show that $D = E = -C$. Consider again the product $UU^{-1}$. Multiplying the first row of $U$ by the second column of $U^{-1}$ gives $AD + B(D + F) = 0$; multiplying the second row of $U$ by the second column of $U^{-1}$ gives $B(D + F) = I$. Combining these, $AD = -I$, and because $A^{-1} = C$ we can solve this equation to yield $D = -C$. Now consider $U^{-1}U$. Multiplying the second row of $U^{-1}$ by the first column of $U$ gives $EA + (E + F)B = 0$; multiplying the second row of $U^{-1}$ by the second column of $U$ gives $(E + F)B = I$. Combining these, $EA = -I$ and, solving for $E$, we have $E = -C$ since $A^{-1} = C$.

Finally we show that $B$ is invertible. Multiplying the second row of $U$ by the second column of $U^{-1}$ gives $B(D + F) = I$, but since $D = -C$ this becomes $B(F - C) = I$. Multiplying the second row of $U^{-1}$ by the second column of $U$ gives $(E + F)B = I$, and because $E = -C$ this becomes $(F - C)B = I$. Thus, $B(F - C) = (F - C)B = I$, so we have shown that $B$ is invertible with $B^{-1} = F - C$.

Proof of Theorem 2.
For each part, it suffices to find an appropriate outcome variable $\widetilde{Y}_{ig}$, regressor vector $\widetilde{X}_{ig}$, and instrument vector $\widetilde{Z}_{ig}$ such that we can write $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta$ is the parameter of interest, $E[\widetilde{Z}_{ig} U_{ig}] = 0$, and $E[\widetilde{Z}_{ig}\widetilde{X}_{ig}']$ is invertible. Note that $(\widetilde{X}_{ig}, \widetilde{Y}_{ig}, \widetilde{Z}_{ig})$ are placeholders for quantities that differ in each part of the proof: for part (i) they represent $(X_{ig}, Y_{ig}, Z^W_{ig})$ while for part (ii) they stand for $\big(D_{ig} f(\bar{D}_{ig}), D_{ig} Y_{ig}, Z^1_{ig}\big)$, for example. The definitions of $U_{ig}$ and $\vartheta$ are also specific to each part of the proof.

Part (i)
By (2) we can write $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta' \equiv \big[E(\theta_{ig}') \;\; E(\psi_{ig}' - \theta_{ig}' \mid C_{ig} = 1)\big]$, $\widetilde{Y}_{ig} \equiv Y_{ig}$, $\widetilde{X}_{ig} \equiv X_{ig}$, and $U_{ig} \equiv X_{ig}'(B_{ig} - \vartheta)$. Under IOR, $D_{ig} = C_{ig} Z_{ig}$. Hence, defining $M_{ig} \equiv \mathrm{diag}\{1, C_{ig}\} \otimes I_K$,
\[ X_{ig} = \left( \begin{bmatrix} 1 \\ C_{ig} \end{bmatrix} \odot \begin{bmatrix} 1 \\ Z_{ig} \end{bmatrix} \right) \otimes f(\bar{D}_{ig}) = \left( \begin{bmatrix} 1 & 0 \\ 0 & C_{ig} \end{bmatrix} \otimes I_K \right)\left( \begin{bmatrix} 1 \\ Z_{ig} \end{bmatrix} \otimes f(\bar{D}_{ig}) \right) = M_{ig} W_{ig}. \]
Since $M_{ig}$ is symmetric, $U_{ig} = W_{ig}'[M_{ig}(B_{ig} - \vartheta)]$. Thus, taking $\widetilde{Z}_{ig} \equiv Z^W_{ig}$, we have
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ E\big[\widetilde{Z}_{ig} U_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ Q(\bar{C}_{ig}, N_g)^{-1} E\big[W_{ig} W_{ig}' M_{ig}(B_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} \]
by iterated expectations. By assumption $(Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (C_{ig}, B_{ig}) \mid (\bar{C}_{ig}, N_g)$. Hence,
\[ E\big[W_{ig} W_{ig}' M_{ig}(B_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] = E\big[W_{ig} W_{ig}' \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[M_{ig}(B_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \]
by Decomposition, since $W_{ig} W_{ig}'$ is a measurable function of $(Z_{ig}, \bar{D}_{ig})$ and $M_{ig}(B_{ig} - \vartheta)$ is a measurable function of $(C_{ig}, B_{ig})$. Substituting into the expression for $E[\widetilde{Z}_{ig} U_{ig}]$,
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ E\big[M_{ig}(B_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E[M_{ig}(B_{ig} - \vartheta)] \]
by iterated expectations, since $Q(\bar{C}_{ig}, N_g)^{-1} = E[W_{ig} W_{ig}' \mid \bar{C}_{ig}, N_g]^{-1}$. Now, substituting the definitions of $M_{ig}$, $B_{ig}$, and $\vartheta$,
\[ E[M_{ig}(B_{ig} - \vartheta)] = E\begin{bmatrix} \theta_{ig} - E(\theta_{ig}) \\ C_{ig}\big[ (\psi_{ig} - \theta_{ig}) - E(\psi_{ig} - \theta_{ig} \mid C_{ig} = 1) \big] \end{bmatrix} = 0 \]
since $E[C_{ig}(\psi_{ig} - \theta_{ig})] = E(C_{ig})\, E(\psi_{ig} - \theta_{ig} \mid C_{ig} = 1)$. Therefore $E[\widetilde{Z}_{ig} U_{ig}] = 0$.
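The orthogonality step just established hinges on the identity $E[C_{ig}(\psi_{ig} - \theta_{ig})] = E(C_{ig})\,E(\psi_{ig} - \theta_{ig} \mid C_{ig} = 1)$, which is iterated expectations over a binary variable. A quick numerical sanity check, with illustrative distributions that are not taken from the paper:

```python
import numpy as np

# Check of E[C * (psi - theta)] = E(C) * E(psi - theta | C = 1) for binary C.
# Distributions are illustrative placeholders, not the paper's.
rng = np.random.default_rng(1)
n = 1_000_000
c = rng.binomial(1, 0.4, size=n)                    # compliance indicator
psi_minus_theta = rng.normal(loc=2.0 * c, size=n)   # heterogeneity that depends on C

lhs = (c * psi_minus_theta).mean()
rhs = c.mean() * psi_minus_theta[c == 1].mean()
assert abs(lhs - rhs) < 1e-8
```

Because $C_{ig}$ is binary, the identity holds exactly in any finite sample, so only floating-point error separates the two sides.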
Similarly,
\[ E[\widetilde{Z}_{ig}\widetilde{X}_{ig}'] = E\Big\{ Q(\bar{C}_{ig}, N_g)^{-1} E\big[W_{ig} W_{ig}' M_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ Q(\bar{C}_{ig}, N_g)^{-1} E\big[W_{ig} W_{ig}' \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[M_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E[M_{ig}]. \]
Since $E[M_{ig}]$ is invertible if and only if $E(C_{ig}) \neq 0$, it follows that $E[\widetilde{Z}_{ig}\widetilde{X}_{ig}']$ is invertible by Assumption 7.

Part (ii)
Since $D_{ig}^2 = D_{ig}$ and $D_{ig}(1 - D_{ig}) = 0$, multiplying both sides of (2) by $D_{ig}$ and simplifying gives $D_{ig} Y_{ig} = D_{ig} f(\bar{D}_{ig})'\psi_{ig}$. Thus $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta \equiv E(\psi_{ig} \mid C_{ig} = 1)$, $\widetilde{Y}_{ig} \equiv D_{ig} Y_{ig}$, $\widetilde{X}_{ig} \equiv D_{ig} f(\bar{D}_{ig})$, and $U_{ig} \equiv \big[D_{ig} f(\bar{D}_{ig})\big]'(\psi_{ig} - \vartheta)$. The remainder of the argument is similar to that of part (i). Taking $\widetilde{Z}_{ig} \equiv Z^1_{ig}$ and substituting $D_{ig} = Z_{ig} C_{ig}$ gives
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[C_{ig}(\psi_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ E\big[C_{ig}(\psi_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\big[C_{ig}(\psi_{ig} - \vartheta)\big]. \]
Since $E[C_{ig}\psi_{ig}] = E(C_{ig})\, E(\psi_{ig} \mid C_{ig} = 1) = E(C_{ig}\vartheta)$, we obtain $E(\widetilde{Z}_{ig} U_{ig}) = 0$. Similarly,
\[ E[\widetilde{Z}_{ig}\widetilde{X}_{ig}'] = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} C_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[C_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E(C_{ig}) I_K. \]
Hence, $E[\widetilde{Z}_{ig}\widetilde{X}_{ig}']$ is invertible by Assumption 7.

Part (iii)
Since $(1 - D_{ig})^2 = (1 - D_{ig})$ and $D_{ig}(1 - D_{ig}) = 0$, multiplying both sides of (2) by $Z_{ig}(1 - D_{ig})$ and simplifying gives $Z_{ig}(1 - D_{ig}) Y_{ig} = Z_{ig}(1 - D_{ig}) f(\bar{D}_{ig})'\theta_{ig}$. Thus we have $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta \equiv E(\theta_{ig} \mid C_{ig} = 0)$, $\widetilde{Y}_{ig} \equiv Z_{ig}(1 - D_{ig}) Y_{ig}$, $\widetilde{X}_{ig} \equiv Z_{ig}(1 - D_{ig}) f(\bar{D}_{ig})$, and $U_{ig} \equiv \big[Z_{ig}(1 - D_{ig}) f(\bar{D}_{ig})\big]'(\theta_{ig} - \vartheta)$. The remainder of the argument is similar to that of part (i). Taking $\widetilde{Z}_{ig} \equiv Z^1_{ig}$ and substituting $Z_{ig}(1 - D_{ig}) = Z_{ig}(1 - C_{ig})$ gives
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[(1 - C_{ig})(\theta_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ E\big[(1 - C_{ig})(\theta_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\big[(1 - C_{ig})(\theta_{ig} - \vartheta)\big]. \]
Since $E[(1 - C_{ig})\theta_{ig}] = E(1 - C_{ig})\, E(\theta_{ig} \mid C_{ig} = 0) = E[(1 - C_{ig})\vartheta]$, we obtain $E(\widetilde{Z}_{ig} U_{ig}) = 0$. Similarly,
\[ E[\widetilde{Z}_{ig}\widetilde{X}_{ig}'] = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig}(1 - C_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[(1 - C_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E(1 - C_{ig}) I_K. \]
It follows that $E[\widetilde{Z}_{ig}\widetilde{X}_{ig}']$ is invertible by Assumption 7.

Part (iv)
Under one-sided non-compliance and IOR, $(1 - Z_{ig})(1 - D_{ig}) = (1 - Z_{ig})$. Hence, multiplying both sides of (2) by $(1 - Z_{ig})$ and using the fact that $Z_{ig}(1 - Z_{ig}) = 0$, we obtain $(1 - Z_{ig}) Y_{ig} = (1 - Z_{ig}) f(\bar{D}_{ig})'\theta_{ig}$. Thus we can write $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta \equiv E(\theta_{ig})$, $\widetilde{Y}_{ig} \equiv (1 - Z_{ig}) Y_{ig}$, $\widetilde{X}_{ig} \equiv (1 - Z_{ig}) f(\bar{D}_{ig})$, and $U_{ig} \equiv (1 - Z_{ig}) f(\bar{D}_{ig})'(\theta_{ig} - \vartheta)$. The remainder of the argument is similar to that of part (i). Taking $\widetilde{Z}_{ig} \equiv Z^0_{ig}$, we obtain
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ Q_0(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})'(1 - Z_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[\theta_{ig} - \vartheta \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ E\big[\theta_{ig} - E(\theta_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = 0 \]
and
\[ E[\widetilde{Z}_{ig}\widetilde{X}_{ig}'] = E\Big\{ Q_0(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})'(1 - Z_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = I_K. \]

Lemma A.2.
Under Assumptions 2 and 6, $(S_g, Z_{ig}) \perp\!\!\!\perp (C_{ig}, \bar{C}_{ig}, N_g, B_{ig})$.

Proof of Lemma A.2.
By Assumption 2, $Z_{ig} \perp\!\!\!\perp N_g \mid S_g$, and by Assumption 6(ii) and Decomposition, $Z_{ig} \perp\!\!\!\perp (C_{ig}, B_{ig}) \mid (S_g, N_g)$. Combining these by Contraction yields
\[ Z_{ig} \perp\!\!\!\perp (C_g, B_{ig}, N_g) \mid S_g. \tag{A.8} \]
Now, by Assumption 6(i) we have $S_g \perp\!\!\!\perp (C_g, B_{ig}, N_g)$. Combining this with (A.8) by a second application of Contraction gives $(Z_{ig}, S_g) \perp\!\!\!\perp (C_g, B_{ig}, N_g)$. The result follows by a final application of Decomposition.

Proof of Theorem 3.
Assumptions 1–6 imply that $(Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (B_{ig}, C_{ig}) \mid (\bar{C}_{ig}, N_g)$ by Theorem 1. Hence Assumptions 1–7 are sufficient for the conclusions of Theorem 2 to hold. Now, by Lemma 1, Assumptions 1–2 and 4–6 imply that the conditional distribution of $\bar{D}_{ig} \mid (\bar{C}_{ig}, N_g, Z_{ig})$ is known. Moreover, by Lemma A.2, $Z_{ig} \perp\!\!\!\perp (\bar{C}_{ig}, N_g)$, so the distribution of $Z_{ig} \mid (\bar{C}_{ig}, N_g)$ is likewise known. It follows that $Q$, $Q_0$ and $Q_1$ are known functions of $(\bar{C}_{ig}, N_g)$. Since $N_g$ is observed, knowledge of $\bar{C}_{ig}$ is thus sufficient to identify the quantities
\[ E(\theta_{ig}), \quad E(\psi_{ig} - \theta_{ig} \mid C_{ig} = 1), \quad E(\psi_{ig} \mid C_{ig} = 1), \quad E(\theta_{ig} \mid C_{ig} = 0) \]
by the relevant parts of Theorem 2. Now, by iterated expectations,
\[ E(\theta_{ig} \mid C_{ig} = 1) = E(\theta_{ig} \mid C_{ig} = 0) + \frac{1}{E(C_{ig})}\big[ E(\theta_{ig}) - E(\theta_{ig} \mid C_{ig} = 0) \big]. \]
Since $E(C_{ig}) = E(D_{ig} \mid Z_{ig} = 1)$, it follows that $E(\theta_{ig} \mid C_{ig} = 1)$ is identified. Under IOR and one-sided non-compliance, $\{D_{ig} = 1\} = \{C_{ig} = 1, Z_{ig} = 1\}$, and applying Weak Union and Decomposition to Lemma A.2, we see that $Z_{ig} \perp\!\!\!\perp B_{ig} \mid C_{ig}$. Thus,
\[ E(B_{ig} \mid D_{ig} = 1) = E(B_{ig} \mid C_{ig} = 1, Z_{ig} = 1) = E(B_{ig} \mid C_{ig} = 1). \]
The result follows since $Y_{ig}(d, \bar{d}) = f(\bar{d})'\theta_{ig} + d\, f(\bar{d})'(\psi_{ig} - \theta_{ig})$ under Assumption 3.

Proof of Theorem 4.
Substituting the model into the definition of $\widehat{\vartheta}$ and writing $\rho_g \equiv N_g/E(N_g)$,
\[ \widehat{\vartheta} - \vartheta = \left( \sum_{g=1}^{G}\sum_{i=1}^{N_g} \widehat{Z}_{ig} X_{ig}' \right)^{-1} \sum_{g=1}^{G}\sum_{i=1}^{N_g} \widehat{Z}_{ig} U_{ig} = \left( \frac{1}{G}\sum_{g=1}^{G} A_g + \frac{1}{G}\sum_{g=1}^{G} R^{(1)}_g \right)^{-1}\left( \frac{1}{G}\sum_{g=1}^{G} P_g + \frac{1}{G}\sum_{g=1}^{G} R^{(2)}_g \right) \]
where we define
\[ A_g \equiv \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g Z_{ig} X_{ig}', \quad R^{(1)}_g \equiv \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g(\widehat{Z}_{ig} - Z_{ig})X_{ig}', \quad P_g \equiv \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g Z_{ig} U_{ig}, \quad R^{(2)}_g \equiv \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g(\widehat{Z}_{ig} - Z_{ig})U_{ig}. \]
By assumption, both $\big\|\sum_{g=1}^G R^{(1)}_g\big\|$ and $\big\|\sum_{g=1}^G R^{(2)}_g\big\|$ are $o_P(G)$, and thus
\[ \widehat{\vartheta} - \vartheta = \left( \frac{1}{G}\sum_{g=1}^{G} A_g + o_P(1) \right)^{-1}\left( \frac{1}{G}\sum_{g=1}^{G} P_g + o_P(1) \right). \]
Now, since we observe a random sample of groups and $A_g$ is a group-level random variable,
\[ E\left( \frac{1}{G}\sum_{g=1}^{G} A_g \right) = E(A_g) = E\left[ \frac{1}{N_g}\sum_{i=1}^{N_g} E\big(\rho_g Z_{ig} X_{ig}' \,\big|\, N_g\big) \right] = E\big[ E\big(\rho_g Z_{ig} X_{ig}' \,\big|\, N_g\big) \big] = E(\rho_g Z_{ig} X_{ig}') \]
where the second equality uses iterated expectations and linearity, the third uses the assumption of identical distribution within groups, and the fourth uses iterated expectations a second time. Now consider an arbitrary entry $A^{(j,k)}_g$ of the matrix $A_g$ and let $\|\cdot\|_F$ denote the Frobenius norm. By the triangle and Cauchy-Schwarz inequalities, and using the assumption of identical distribution within groups, we have
\[ \mathrm{Var}\left( \frac{1}{G}\sum_{g=1}^{G} A^{(j,k)}_g \right) = \frac{1}{G}\mathrm{Var}\big(A^{(j,k)}_g\big) \leq \frac{1}{G} E\big[\|A_g\|_F^2\big] = \frac{1}{G} E\left[ \frac{1}{N_g^2}\Big\| \sum_{i=1}^{N_g} \rho_g Z_{ig} X_{ig}' \Big\|_F^2 \right] \leq \frac{1}{G} E\left[ \frac{1}{N_g^2} E\left( \sum_{i,j \leq N_g} \big\|\rho_g Z_{ig} X_{ig}'\big\|_F \big\|\rho_g Z_{jg} X_{jg}'\big\|_F \,\Big|\, N_g \right) \right] \leq \frac{1}{G} E\Big[ E\Big( \big\|\rho_g Z_{ig} X_{ig}'\big\|_F^2 \,\Big|\, N_g \Big) \Big] = \frac{1}{G} E\Big[ \rho_g^2 \big\|Z_{ig} X_{ig}'\big\|_F^2 \Big] \to 0, \]
since $E\big[\rho_g^2 \|Z_{ig} X_{ig}'\|_F^2\big] < \infty$. Hence, by the $L^2$ weak law of large numbers, $G^{-1}\sum_{g=1}^G A_g \to_p E(\rho_g Z_{ig} X_{ig}') = I$. An analogous argument shows that $G^{-1}\sum_{g=1}^G P_g \to_p E(\rho_g Z_{ig} U_{ig}) = 0$. The result follows by the continuous mapping theorem.

Proof of Theorem 5.
Continuing the argument from the proof of Theorem 4, we have
\[ \sqrt{G}\big(\widehat{\vartheta} - \vartheta\big) = \big[I + o_P(1)\big]^{-1}\left( \frac{1}{\sqrt{G}}\sum_{g=1}^{G} P_g + \frac{1}{\sqrt{G}}\sum_{g=1}^{G} R^{(2)}_g \right). \]
By assumption, $\big\|\sum_{g=1}^G R^{(2)}_g\big\| = o_P(G^{1/2})$, and hence $\sqrt{G}(\widehat{\vartheta} - \vartheta) = G^{-1/2}\sum_{g=1}^G P_g + o_P(1)$. Thus, it suffices to apply the Lindeberg-Feller central limit theorem to $P_g/\sqrt{G}$. Because we observe a random sample of groups, $\mathrm{Var}\big(\sum_{g=1}^G P_g/\sqrt{G}\big) = \mathrm{Var}(P_g)$, which by assumption converges to $\Sigma$. All that remains is to verify the Lindeberg condition, namely
\[ E\Big[ \|P_g\|^2 \, \mathbb{1}\big\{\|P_g\| > \varepsilon\sqrt{G}\big\} \Big] \to 0 \quad \text{for all } \varepsilon > 0. \]
A sufficient condition for this to hold is $G^{-\delta/2} E\big[\|P_g\|^{2+\delta}\big] \to 0$ for some $\delta > 0$. Arguing as in the bound $E\big[\|A_g\|_F^2\big] \leq E\big[\rho_g^2\|Z_{ig} X_{ig}'\|_F^2\big]$ from the proof of Theorem 4, we likewise have
\[ G^{-\delta/2} E\Big[\|P_g\|^{2+\delta}\Big] \leq G^{-\delta/2} E\Big[\rho_g^{2+\delta}\|Z_{ig} U_{ig}\|^{2+\delta}\Big] = o(1), \]
so the result follows.

Lemma A.3.
Let $\bar{Z}_g \equiv \sum_{j=1}^{N_g} Z_{jg}/N_g$. Under the conditions of Lemma 4, $P(\bar{Z}_g < s/2) \leq \exp\{-ns^2/2\}$.

Proof of Lemma A.3.
Conditional on $(N_g = n', S_g = s')$, the treatment offers $(Z_{1g}, \ldots, Z_{N_g g})$ are a collection of $n'$ iid Bernoulli($s'$) random variables by Assumption 2. Hence, by Hoeffding's inequality,
\[ P\big(\bar{Z}_g < s/2 \,\big|\, N_g = n', S_g = s'\big) \leq \exp\big\{-2n'(s' - s/2)^2\big\} \leq \exp\big\{-ns^2/2\big\} \]
where the second inequality follows since $s \leq s'$ and $n \leq n'$. Thus,
\[ P(\bar{Z}_g < s/2) = \sum_{n', s'} P\big(\bar{Z}_g < s/2 \,\big|\, N_g = n', S_g = s'\big)\, P(N_g = n', S_g = s') \leq \exp\big\{-ns^2/2\big\} \]
by the law of total probability. The result follows since $P(\bar{Z}_g < s/2) \leq P(\bar{Z}_g \leq s/2)$.

Lemma A.4.
Let $\bar{C}_g \equiv \sum_{j=1}^{N_g} C_{jg}/N_g$ and $\widehat{C}_g \equiv \sum_{j=1}^{N_g} D_{jg}/(N_g \bar{Z}_g)$, where $\bar{Z}_g$ is as defined in Lemma A.3. Under the conditions of Lemma 4 and for any $t > 0$,
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| \geq t \,\Big|\, \bar{Z}_g \geq s/2 \Big) \leq 2\exp\big\{-ns^2 t^2/2\big\}. \]

Proof of Lemma A.4.
Let
\[ \mathcal{A} \equiv \big\{C_g = c, \; N_g = n', \; \bar{C}_g = \bar{c}, \; N_g\bar{Z}_g = m, \; S_g = s'\big\} \]
where $m > 0$, and consider first the case $\bar{c} > 0$. In this case,
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, \mathcal{A} \Big) = P\left( \left| \sum_{j=1}^{n'} \frac{c_j Z_{jg}}{m} - \bar{c} \right| > t \,\Bigg|\, \mathcal{A} \right) = P\left( \left| \frac{1}{n'\bar{c}} \sum_{j \in \mathcal{C}} Z^*_{jg} - \bar{c} \right| > t \,\Bigg|\, \mathcal{A} \right) \]
where $\mathcal{C} \equiv \{j : c_j = 1\}$ and $Z^*_{jg} \equiv n'\bar{c}\, Z_{jg}/m$. Given $\mathcal{A}$, the $\{Z_{jg}\}_{j\in\mathcal{C}}$ are a sequence of $n'\bar{c}$ draws made without replacement from a population of $m$ ones and $(n' - m)$ zeros. Thus,
\[ E(Z^*_{jg}) = \frac{n'\bar{c}}{m}\, P(Z_{jg} = 1 \mid \mathcal{A}) = \frac{n'\bar{c}}{m} \cdot \frac{m}{n'} = \bar{c}. \]
Moreover, since $Z_{jg} \in \{0, 1\}$, each of the $Z^*_{jg}$ is bounded between 0 and $n'\bar{c}/m$. While these random variables are identically distributed, they are not independent—like the $Z_{jg}$ from which they are constructed, $\{Z^*_{jg}\}_{j\in\mathcal{C}}$ are draws made without replacement from a finite population. Under this form of dependence, however, Hoeffding's inequality continues to apply (Hoeffding, 1963, p. 28), and hence
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, \mathcal{A} \Big) \leq 2\exp\left\{ -\frac{2t^2 m^2}{n'\bar{c}} \right\} \leq 2\exp\left\{ -2n'\left(\frac{m}{n'}\right)^2 t^2 \right\} \]
where the second inequality follows because $0 < \bar{c} \leq 1$. If $\bar{c} = 0$, we have
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, \mathcal{A} \Big) = P(|0 - 0| > t \mid \mathcal{A}) = 0 \leq 2\exp\left\{ -2n'\left(\frac{m}{n'}\right)^2 t^2 \right\}, \]
so this inequality holds for any $\bar{c}$. Applying the law of total probability as in the proof of Lemma A.3, we see that
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, N_g = n', N_g\bar{Z}_g = m \Big) \leq 2\exp\left\{ -2n'\left(\frac{m}{n'}\right)^2 t^2 \right\} \]
and thus
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| \geq t \,\Big|\, \bar{Z}_g \geq s/2 \Big) = \sum_{\{(m,n'):\, m/n' \geq s/2\}} P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, N_g = n', N_g\bar{Z}_g = m \Big) \, P\big(N_g\bar{Z}_g = m, N_g = n' \,\big|\, \bar{Z}_g \geq s/2\big) \leq 2\exp\big\{-ns^2t^2/2\big\} \]
by a second application of the law of total probability, since $n \leq N_g$.

Lemma A.5.
Suppose that $sn > 2$. Then, under the conditions of Lemma 4,
\[ P\left( \max_{1 \leq i \leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \,\Bigg|\, \bar{Z}_g \geq s/2 \right) \leq 2\exp\big\{-ns^2 h(sn, t)^2/2\big\} \]
where we define
\[ h(x, t) \equiv \left( \frac{x - 2}{x} \right)^2 t - \left[ 1 + \left( \frac{x - 2}{x} \right)^2 \right] \frac{1}{x - 2}. \]

Proof of Lemma A.5.
If $\bar{Z}_g > s/2 > 1/n$, then $N_g\bar{Z}_g - Z_{ig} \geq N_g\bar{Z}_g - 1 > 0$. Hence,
\[ \widehat{C}_{ig} \equiv \frac{\bar{D}_{ig}}{\bar{Z}_{ig}} = \frac{N_g\bar{D}_g - D_{ig}}{N_g\bar{Z}_g - Z_{ig}} = \frac{N_g\bar{Z}_g\widehat{C}_g - D_{ig}}{N_g\bar{Z}_g - Z_{ig}} = \left( \frac{N_g\bar{Z}_g}{N_g\bar{Z}_g - Z_{ig}} \right)\widehat{C}_g - \frac{D_{ig}}{N_g\bar{Z}_g - Z_{ig}}. \]
Similar manipulations give
\[ \bar{C}_{ig} = \left( \frac{N_g}{N_g - 1} \right)\bar{C}_g - \frac{C_{ig}}{N_g - 1}, \]
so that
\[ \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \leq \left| \left( \frac{N_g\bar{Z}_g}{N_g\bar{Z}_g - Z_{ig}} \right)\widehat{C}_g - \left( \frac{N_g}{N_g - 1} \right)\bar{C}_g \right| + \left| \frac{C_{ig}}{N_g - 1} - \frac{D_{ig}}{N_g\bar{Z}_g - Z_{ig}} \right| \]
by the triangle inequality. Using the fact that $Z_{ig}$, $D_{ig}$, and $C_{ig}$ are binary, along with $n \leq N_g$ and $\bar{Z}_g > s/2 > 1/n$, tedious but straightforward algebra allows us to bound the right-hand side of the preceding inequality from above, yielding
\[ \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \leq \left( \frac{sn}{sn - 2} \right)^2 \big|\widehat{C}_g - \bar{C}_g\big| + \left[ \left( \frac{sn}{sn - 2} \right)^2 + 1 \right] \frac{1}{sn - 2}. \]
Since this upper bound for $|\widehat{C}_{ig} - \bar{C}_{ig}|$ does not depend on $i$, it follows that
\[ \max_{1 \leq i \leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \leq \left( \frac{sn}{sn - 2} \right)^2 \big|\widehat{C}_g - \bar{C}_g\big| + \left[ \left( \frac{sn}{sn - 2} \right)^2 + 1 \right] \frac{1}{sn - 2} \]
provided that $\bar{Z}_g > s/2 > 1/n$. In other words, so long as $sn > 2$,
\[ \big\{\bar{Z}_g \geq s/2\big\} \cap \left\{ \max_{1 \leq i \leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right\} \subseteq \big\{\bar{Z}_g \geq s/2\big\} \cap \Big\{ \big|\widehat{C}_g - \bar{C}_g\big| > h(sn, t) \Big\}. \]
Therefore, by the monotonicity of probability,
\[ P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \,\Bigg|\, \bar{Z}_g \geq s/2 \right) \leq P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > h(sn, t) \,\Big|\, \bar{Z}_g \geq s/2 \Big) \]
and the result follows by Lemma A.4.

Proof of Lemma 4.
By the law of total probability, Lemma A.3, and Lemma A.5,
\[ P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right) \leq P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \,\Bigg|\, \bar{Z}_g \geq s/2 \right) + P(\bar{Z}_g < s/2) \leq 2\exp\big\{-ns^2 h(sn, t)^2/2\big\} + \exp\big\{-ns^2/2\big\} \]
where $h(\cdot, \cdot)$ is as defined in Lemma A.5. Expanding and simplifying, we see that
\[ h(sn, t) \geq \left( \frac{sn - 2}{sn} \right)^2 t - \frac{2}{sn - 2} \equiv h^*(sn, t). \]
Now, for any $t \geq 1$ we have $P\big( \max_{1\leq i\leq N_g} |\widehat{C}_{ig} - \bar{C}_{ig}| > t \big) = 0$, since both $\widehat{C}_{ig}$ and $\bar{C}_{ig}$ are between zero and one, so we may restrict attention to $t < 1$. Since $h^*(sn, t) \leq h(sn, t)$ and $h^*(sn, t) < t < 1$, it follows that
\[ P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right) \leq 2\exp\big\{-ns^2 h(sn, t)^2/2\big\} + \exp\big\{-ns^2/2\big\} \leq 2\exp\big\{-ns^2 h^*(sn, t)^2/2\big\} + \exp\big\{-ns^2/2\big\} \leq 3\exp\big\{-ns^2 h^*(sn, t)^2/2\big\}. \]
Applying the union bound, we obtain
\[ P\left( \max_{1\leq g\leq G}\max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right) = P\left( \bigcup_{g=1}^{G}\left\{ \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right\} \right) \leq \sum_{g=1}^{G} P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right) \leq \sum_{g=1}^{G} 3\exp\big\{-ns^2 h^*(sn, t)^2/2\big\} = 3G\exp\big\{-ns^2 h^*(sn, t)^2/2\big\}, \]
and accordingly, setting $t = M\sqrt{\log G/n}$, we have
\[ P\left( \max_{1\leq g\leq G}\max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \sqrt{\frac{n}{\log G}} > M \right) \leq 3G\exp\left( -\frac{ns^2}{2}\left[ \left( \frac{sn - 2}{sn} \right)^2 M\sqrt{\frac{\log G}{n}} - \frac{2}{sn - 2} \right]^2 \right) = 3\exp\left( \log G\left[ 1 - \frac{s^2}{2}\left( \left( \frac{sn - 2}{sn} \right)^2 M - \frac{2}{sn - 2}\sqrt{\frac{n}{\log G}} \right)^2 \right] \right). \]
The expression on the right-hand side converges to $3\exp\big\{\log G\big[1 - s^2 M^2/2\big]\big\}$ as $(n, G) \to \infty$ and hence can be made arbitrarily small by choosing a sufficiently large value of $M$.

Proof of Theorem 6.
We provide the argument for condition (vii) of Theorem 4 and condition (iii) of Theorem 5 only. For (vi) from Theorem 4, simply replace $U_{ig}$ with $X_{ig}$ in the following derivations. By (17) and the triangle inequality,
\[ \left\| \sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g\big(\widehat{Z}_{ig} - Z_{ig}\big) U_{ig} \right\| \leq \Delta_G \sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \big\|\rho_g W_{ig} U_{ig}\big\| \tag{A.9} \]
where we define the shorthand
\[ \Delta_G \equiv \max_{1\leq g\leq G}\left( \max_{1\leq i\leq N_g} \Big\| R\big(\widehat{C}_{ig}, N_g\big)^{+} - R\big(\bar{C}_{ig}, N_g\big)^{-1} \Big\| \right). \]
Consider the second factor on the RHS of (A.9). By an argument similar to that used in the proof of Theorem 4,
\[ \frac{1}{G}\sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \big\|\rho_g W_{ig} U_{ig}\big\| \to_p E\big[\|\rho_g W_{ig} U_{ig}\|\big] < \infty, \]
so that $\sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \|\rho_g W_{ig} U_{ig}\| = O_P(G)$. Now, define $\mathbb{1}_G$ as the indicator of the event
\[ \left\{ \min_{1\leq g\leq G}\left( \min_{1\leq i\leq N_g} \widehat{C}_{ig} \right) \geq \bar{c}_L/2 \right\}. \]
By assumption $R(\bar{C}_{ig}, N_g)$ is invertible, and conditional on $\widehat{C}_{ig} \geq \bar{c}_L/2$, $R(\widehat{C}_{ig}, N_g)$ is likewise invertible. Hence, if $\mathbb{1}_G = 1$ we can write
\[ \Big\| R\big(\widehat{C}_{ig}, N_g\big)^{-1} - R\big(\bar{C}_{ig}, N_g\big)^{-1} \Big\| = \Big\| R\big(\widehat{C}_{ig}, N_g\big)^{-1}\Big[ R\big(\widehat{C}_{ig}, N_g\big) - R\big(\bar{C}_{ig}, N_g\big) \Big] R\big(\bar{C}_{ig}, N_g\big)^{-1} \Big\| \leq \Big\| R\big(\widehat{C}_{ig}, N_g\big)^{-1} \Big\| \, \Big\| R\big(\widehat{C}_{ig}, N_g\big) - R\big(\bar{C}_{ig}, N_g\big) \Big\| \, \Big\| R\big(\bar{C}_{ig}, N_g\big)^{-1} \Big\|. \]
Let $\|M\|_2$ denote the spectral norm of a matrix $M$, i.e. its largest singular value. Since $R(\bar{C}_{ig}, N_g)$ is square, symmetric, and positive definite, we have $\|R(\bar{C}_{ig}, N_g)^{-1}\|_2 \leq 1/\underline{\sigma} < \infty$. Similarly, if $\mathbb{1}_G = 1$, then $\|R(\widehat{C}_{ig}, N_g)^{-1}\|_2 \leq 1/\underline{\sigma} < \infty$. Because all finite-dimensional norms are equivalent, it follows that
\[ \mathbb{1}_G \Delta_G \leq K \max_{1\leq g\leq G}\left( \max_{1\leq i\leq N_g} \Big\| R\big(\widehat{C}_{ig}, N_g\big) - R\big(\bar{C}_{ig}, N_g\big) \Big\| \right) \leq K\left\{ \max_{1\leq g\leq G}\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \right) + O(n^{-1/2}) \right\} \]
where $0 < K < \infty$ denotes a generic, unspecified constant. Applying Lemma 4, we see that $\mathbb{1}_G \Delta_G = O_P\big(\sqrt{\log G/n}\big)$ as $(n, G) \to \infty$. Thus, by (A.9),
\[ \mathbb{1}_G \left\| \sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g\big(\widehat{Z}_{ig} - Z_{ig}\big) U_{ig} \right\| = O_P\left( \sqrt{\frac{\log G}{n}} \right) O_P(G). \tag{A.10} \]
If $\log G/n \to 0$ as $(n, G) \to \infty$, then the rate on the RHS of (A.10) becomes $o_P(G)$. If $G\log G/n \to 0$, it becomes $o_P(G^{1/2})$. Finally, since $\bar{c}_L \leq \bar{C}_{ig}$, it follows that
\[ P(\mathbb{1}_G \neq 1) \leq P\left[ \max_{1\leq g\leq G}\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \right) \geq \bar{c}_L/2 \right]. \]
Hence, applying Lemma 4, if $\log G/n \to 0$ then $\mathbb{1}_G \to_p 1$. The result follows.
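The $\sqrt{\log G/n}$ rate delivered by Lemma 4 and used throughout the preceding proof can be illustrated by simulation. The design below (Bernoulli offers, a constant complier share, group-level complier-share estimates) is a deliberately simplified stand-in for the paper's setting, with all parameter values assumed for illustration:

```python
import numpy as np

# Illustration of the sqrt(log G / n) rate: the worst estimation error of the
# group-level complier share across G groups grows only logarithmically in G.
# The simulation design is illustrative, not the paper's.
rng = np.random.default_rng(5)
s, c_share = 0.5, 0.6    # offer saturation and complier share (assumed values)

def max_error(G, n):
    z = rng.binomial(1, s, size=(G, n))           # treatment offers
    c = rng.binomial(1, c_share, size=(G, n))     # complier indicators
    d = z * c                                     # take-up under one-sided non-compliance
    c_hat = d.sum(axis=1) / np.maximum(z.sum(axis=1), 1)
    return np.abs(c_hat - c.mean(axis=1)).max()

small, large = max_error(G=50, n=400), max_error(G=5000, n=400)
# A 100-fold increase in G inflates the worst-case error only mildly.
assert large < 5 * small
```

With $n$ fixed, moving from 50 to 5,000 groups multiplies the bound's $\sqrt{\log G}$ factor by only about $\sqrt{\log 5000/\log 50} \approx 1.5$, which is what the simulation reflects.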
B Experiments with a 0% Saturation
Some randomized saturation designs, including the experiment of Crépon et al. (2013), include a zero percent saturation, also known as a "pure control" condition. Under one-sided non-compliance, $S_g = 0$ implies $Z_{ig} = D_{ig} = \bar{D}_{ig} = 0$ for all $1 \leq i \leq N_g$. Accordingly, we cannot estimate the share of compliers $\widehat{C}_{ig}$ from (14) for groups assigned a saturation of zero. The easiest solution to this problem is simply to drop observations for any zero saturation groups. Under Assumptions 1–2 and 6 this has no effect on our identification or large-sample results, provided that we replace $Q$, $Q_0$ and $Q_1$ with expectations that condition on $S_g > 0$, namely
\[ \begin{aligned} \widetilde{Q}(\bar{c}, n) &\equiv E\big[W_{ig} W_{ig}' \,\big|\, \bar{C}_{ig} = \bar{c}, N_g = n, S_g > 0\big] \\ \widetilde{Q}_0(\bar{c}, n) &\equiv E\big[(1 - Z_{ig}) f(\bar{D}_{ig}) f(\bar{D}_{ig})' \,\big|\, \bar{C}_{ig} = \bar{c}, N_g = n, S_g > 0\big] \\ \widetilde{Q}_1(\bar{c}, n) &\equiv E\big[Z_{ig} f(\bar{D}_{ig}) f(\bar{D}_{ig})' \,\big|\, \bar{C}_{ig} = \bar{c}, N_g = n, S_g > 0\big]. \end{aligned} \]
Zero percent saturation groups, however, are informative: they pin down the value of $E[Y_{ig}(0, 0)] = f(0)'E[\theta_{ig}]$. To exploit this information, we replace the instrument vector from part (iv) of Theorem 2 with
\[ \widetilde{Z}_{ig} \equiv \begin{bmatrix} \mathbb{1}\{S_g > 0\}\, \widetilde{Q}_0(\bar{C}_{ig}, N_g)^{-1}(1 - Z_{ig}) f(\bar{D}_{ig}) \\ \mathbb{1}\{S_g = 0\} \end{bmatrix}. \]
Calculations similar to those in the proof of Theorem 2 establish that this is a valid and relevant instrument. Because its dimension exceeds that of $\theta_{ig}$ by one, this instrument vector provides over-identifying information. As such, the just-identified IV moment condition from part (iv) of Theorem 2 must be replaced with a linear GMM moment equation. Subject to this small change, estimation and inference can proceed almost exactly as in section 4: we merely substitute $\widehat{C}_{ig}$ for $\bar{C}_{ig}$ in $\widetilde{Q}_0$ to yield a feasible GMM estimator, e.g. two-stage least squares. With minor notational modifications, our large-sample results continue to apply.

C Extending the Definition of Q
Technically, the conditional expectations in (8)–(10) are only well-defined when $(n-1)\bar{c}$ is a non-negative integer, whereas Assumption 8 requires the functions $Q$, $Q_0$, and $Q_1$ to be defined over a continuous range of values for $\bar{c}$. This problem is easily solved by extending the definitions of $Q_0$ and $Q_1$. In many cases, the natural extension will be obvious. In the linear potential outcomes model, for example, (12) and (13) agree with (9) and (10) when these conditional expectations are well-defined, and satisfy all the conditions of Assumption 8.

More generally, we can always construct extended definitions of $Q_0$ and $Q_1$ to satisfy these regularity conditions. Here we provide one such construction based on linear interpolation. Let
\[ \bar{c}_\ell(\bar{c}, n) \equiv \frac{\lfloor (n-1)\bar{c} \rfloor}{n-1}, \qquad \bar{c}_u(\bar{c}, n) \equiv \frac{\lceil (n-1)\bar{c} \rceil}{n-1}. \]
By construction, $(n-1)\bar{c}_u(\bar{c}, n)$ and $(n-1)\bar{c}_\ell(\bar{c}, n)$ are always non-negative integers. Now let
\[ Q^\ell_z(\bar{c}, n) \equiv E\big[ \mathbb{1}(Z_{ig} = z) f(\bar{D}_{ig}) f(\bar{D}_{ig})' \,\big|\, \bar{C}_{ig} = \bar{c}_\ell(\bar{c}, n), N_g = n \big], \qquad Q^u_z(\bar{c}, n) \equiv E\big[ \mathbb{1}(Z_{ig} = z) f(\bar{D}_{ig}) f(\bar{D}_{ig})' \,\big|\, \bar{C}_{ig} = \bar{c}_u(\bar{c}, n), N_g = n \big] \]
for $z = 0, 1$. Notice that $Q^\ell_0$, $Q^\ell_1$ and $Q^u_0$, $Q^u_1$ are well-defined regardless of whether $(n-1)\bar{c}$ is an integer. From these ingredients, we construct extended definitions $Q^*_0$ and $Q^*_1$ of $Q_0$, $Q_1$ as
\[ Q^*_z(\bar{c}, n) = \big[1 - \omega(\bar{c}, n)\big] Q^\ell_z(\bar{c}, n) + \omega(\bar{c}, n)\, Q^u_z(\bar{c}, n); \qquad \omega(\bar{c}, n) \equiv \frac{\bar{c} - \bar{c}_\ell(\bar{c}, n)}{\bar{c}_u(\bar{c}, n) - \bar{c}_\ell(\bar{c}, n)} \in [0, 1] \]
for $z = 0, 1$. Since both $Q^\ell_z$ and $Q^u_z$ are symmetric and positive definite, their convex combination $Q^*_z$ is as well. To show that this construction satisfies Assumption 8(iii), define
\[ Q^\infty_0(\bar{c}) \equiv E\big[ (1 - S_g) f(\bar{c}S_g) f(\bar{c}S_g)' \big], \qquad Q^\infty_1(\bar{c}) \equiv E\big[ S_g f(\bar{c}S_g) f(\bar{c}S_g)' \big]. \]
Recall that $0 \leq S_g \leq 1$, $\bar{c}$ is a real number between zero and one, and $f$ is a $K$-vector of Lipschitz-continuous functions, all of which are bounded on $[0, 1]$. It follows that $Q^\infty_0$ and $Q^\infty_1$ are bounded and Lipschitz-continuous on $[0, 1]$, and
\[ \Big\| Q^\ell_z(\bar{c}, n) - Q^\infty_z\big(\bar{c}_\ell(\bar{c}, n)\big) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \qquad \Big\| Q^u_z(\bar{c}, n) - Q^\infty_z\big(\bar{c}_u(\bar{c}, n)\big) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \]
where $L$ denotes an arbitrary, finite, positive constant. Similarly,
\[ \Big\| Q^\infty_z(\bar{c}) - Q^\infty_z\big(\bar{c}_\ell(\bar{c}, n)\big) \Big\| \leq \frac{L}{n - 1}, \qquad \Big\| Q^\infty_z(\bar{c}) - Q^\infty_z\big(\bar{c}_u(\bar{c}, n)\big) \Big\| \leq \frac{L}{n - 1}. \]
Combining these inequalities and applying the triangle inequality, it follows that
\[ \Big\| Q^u_z(\bar{c}, n) - Q^\ell_z(\bar{c}, n) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \qquad \Big\| Q^u_z(\bar{c}, n) - Q^\infty_z(\bar{c}) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \qquad \Big\| Q^\ell_z(\bar{c}, n) - Q^\infty_z(\bar{c}) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \]
where, again, $L$ is an arbitrary, finite, positive constant. Thus,
\[ \Big\| Q^*_z(\bar{c}, n) - Q^\infty_z(\bar{c}) \Big\| \leq \Big\| Q^*_z(\bar{c}, n) - Q^\ell_z(\bar{c}, n) \Big\| + \Big\| Q^\ell_z(\bar{c}, n) - Q^\infty_z(\bar{c}) \Big\| \leq \omega(\bar{c}, n)\Big\| Q^u_z(\bar{c}, n) - Q^\ell_z(\bar{c}, n) \Big\| + \frac{L}{\sqrt{n - 1}} \leq \frac{L}{\sqrt{n - 1}}, \]
using the definitions of $Q^*_z$ and $\omega(\bar{c}, n)$ from above. Combining all of the preceding inequalities,
\[ \Big\| Q^*_z\big(\widehat{C}_{ig}, N_g\big) - Q^*_z\big(\bar{C}_{ig}, N_g\big) \Big\| \leq L\left\{ \frac{1}{\sqrt{n - 1}} + \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \right\} \]
since $n \leq N_g$ and $Q^\infty_z$ is Lipschitz-continuous.

References
Akram, A. A., Chowdhury, S., Mobarak, A. M., 2018. Effects of emigration on rural labor markets.URL http://faculty.som.yale.edu/mushfiqmobarak/papers/migrationge.pdf
Altonji, J. G., Matzkin, R. L., 2005. Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica 73 (4), 1053–1102.
Anderson, A., Huttenlocher, D., Kleinberg, J., Leskovec, J., 2014. Engaging with massive online courses. In: Proceedings of the 23rd International Conference on World Wide Web. ACM, pp. 687–698.
Angelucci, M., De Giorgi, G., 2009. Indirect effects of an aid program: How do cash transfers affect ineligibles' consumption? American Economic Review 99 (1), 486–508.
Baird, S., Bohren, J. A., McIntosh, C., Özler, B., 2018. Optimal design of experiments in the presence of interference. Review of Economics and Statistics 100 (5), 844–860.
Banerjee, A. V., Chattopadhyay, R., Duflo, E., Keniston, D., Singh, N., 2012. Can institutions be reformed from within? Evidence from a randomized experiment with the Rajasthan police.
Barrera-Osorio, F., Bertrand, M., Linden, L. L., Perez-Calle, F., 2011. Improving the design of conditional transfer programs: Evidence from a randomized education experiment in Colombia. American Economic Journal: Applied Economics 3 (2), 167–195.
Bobba, M., Gignoux, J., 2014. Neighborhood effects and take-up of transfers in integrated social policies: Evidence from Progresa.
Bobonis, G. J., Finan, F., 2009. Neighborhood peer effects in secondary school enrollment decisions. The Review of Economics and Statistics 91 (4), 695–716.
Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D., Marlow, C., Settle, J. E., Fowler, J. H., 2012. A 61-million-person experiment in social influence and political mobilization. Nature 489 (7415), 295.
Bursztyn, L., Cantoni, D., Yang, D., Yuchtman, N., Zhang, J., 2019. Persistent political engagement: Social interactions and the dynamics of protest movements. Working paper.
Constantinou, P., Dawid, A. P., 2017. Extended conditional independence and applications in causal inference. The Annals of Statistics 45 (6), 2618–2653.
Crépon, B., Duflo, E., Gurgand, M., Rathelot, R., Zamora, P., 2013. Do labor market policies have displacement effects? Evidence from a clustered randomized experiment. The Quarterly Journal of Economics 128 (2), 531–580.
Dawid, A. P., 1979. Conditional independence in statistical theory. Journal of the Royal Statistical Society: Series B (Methodological) 41 (1), 1–15.
Duflo, E., Saez, E., 2003. The role of information and social interactions in retirement plan decisions: Evidence from a randomized experiment. The Quarterly Journal of Economics 118 (3), 815–842.
Eckles, D., Kizilcec, R. F., Bakshy, E., 2016. Estimating peer effects in networks with peer encouragement designs. Proceedings of the National Academy of Sciences 113 (27), 7316–7322.
Giné, X., Mansuri, G., 2018. Together we will: Experimental evidence on female voting behavior in Pakistan. American Economic Journal: Applied Economics 10 (1), 207–235.
Haushofer, J., Shapiro, J., 2016. The short-term impact of unconditional cash transfers to the poor: Experimental evidence from Kenya. The Quarterly Journal of Economics 131 (4), 1973–2042.
Heckman, J., Vytlacil, E., 1998. Instrumental variables methods for the correlated random coefficient model: Estimating the average rate of return to schooling when the return is correlated with schooling. Journal of Human Resources, 974–987.
Hoeffding, W., 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58 (301), 13–30.
Hudgens, M. G., Halloran, M. E., 2008. Toward causal inference with interference. Journal of the American Statistical Association 103 (482), 832–842.
Imai, K., Jiang, Z., Malani, A., 2018. Causal inference with interference and noncompliance in two-stage randomized experiments.
URL http://imai.princeton.edu/research/files/spillover.pdf
Imbens, G. W., Newey, W. K., 2009. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica 77 (5), 1481–1512.
Kang, H., Imbens, G., 2016. Peer encouragement designs in causal inference with partial interference and identification of local average network effects, 1–39.
URL http://arxiv.org/abs/1609.04464
Manski, C. F., 2013. Identification of treatment response with social interactions. Econometrics Journal 16 (1), 1–23.
Masten, M. A., Torgovitsky, A., 2016. Identification of instrumental variable correlated random coefficients models. Review of Economics and Statistics 98 (5), 1001–1005.
Miguel, E., Kremer, M., 2004. Worms: Identifying impacts on education and health in the presence of treatment externalities. Econometrica, 159–217.
Pearl, J., 1988. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.
Sinclair, B., McConnell, M., Green, D. P., 2012. Detecting spillover effects: Design and analysis of multilevel experiments. American Journal of Political Science 56 (4), 1055–1069.
Wooldridge, J. M., 1997. On two stage least squares estimation of the average treatment effect in a random coefficient model. Economics Letters 56 (2), 129–133.
Wooldridge, J. M., 2003. Further results on instrumental variables estimation of average treatment effects in the correlated random coefficient model. Economics Letters 79 (2), 185–191.
Wooldridge, J. M., 2004. Estimating average partial effects under conditional moment independence assumptions. Tech. rep., cemmap working paper.
Wooldridge, J. M., 2016. Instrumental variables estimation of the average treatment effect in the correlated random coefficient model. Advances in Econometrics 21, 93–116.
Yi, H., Song, Y., Liu, C., Huang, X., Zhang, L., Bai, Y., Ren, B., Shi, Y., Loyalka, P., Chu, J., et al., 2015. Giving kids a head start: The impact and mechanisms of early commitment of financial aid on poor students in rural China. Journal of Development Economics 113, 1–15.