Identifying Causal Effects in Experiments with Spillovers and Non-compliance
Francis J. DiTraglia, Camilo García-Jimeno, Rossa O'Keeffe-O'Donovan, Alejandro Sánchez-Becerra
Department of Economics, University of Oxford
Federal Reserve Bank of Chicago
Department of Economics, University of Pennsylvania
This Version: November 16, 2020
Abstract
This paper shows how to use a randomized saturation experimental design to identify and estimate causal effects in the presence of social interactions (one person's treatment may affect another's outcome) and one-sided non-compliance (subjects can only be offered treatment, not compelled to take it up). Two distinct causal effects are of interest in this setting: direct effects quantify how a person's own treatment changes her outcome, while indirect effects quantify how her peers' treatments change her outcome. We consider the case in which social interactions occur only within known groups, and take-up decisions do not depend on peers' offers. In this setting we point identify local average treatment effects, both direct and indirect, in a flexible random coefficients model that allows for both heterogeneous treatment effects and endogenous selection into treatment. We go on to propose a feasible estimator that is consistent and asymptotically normal as the number and size of groups increases.
Keywords: social interactions, spillovers, non-compliance, randomized saturation
JEL Codes:
C14, C21, C26, C90

The views expressed in this article are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of Chicago or the Federal Reserve System. We thank Esther Duflo, Roland Rathelot, and Philippe Zamora for their help securing our access to the experimental data set we use in this paper. We also thank Christina Goldschmidt, seminar participants at The Philadelphia Fed, the 2018 IAAE Annual Conference, UPenn, the 2018 SEA Annual Meetings, and the 2020 Econometric Society World Congress for helpful comments and suggestions. Corresponding author: [email protected], Manor Road, Oxford OX1 3UQ, UK.
1 Introduction
Randomized saturation experiments provide a powerful tool for estimating causal effects in the presence of social interactions, also known as spillovers or interference, by generating exogenous variation in both individuals' own treatment offers and the fraction of their peers who are offered treatment (Hudgens and Halloran, 2008). These two sources of variation allow researchers to study both direct causal effects (the effect of Alice's treatment on her own outcome) and indirect causal effects (the effect of Bob's treatment on Alice's outcome). A complete understanding of both direct and indirect effects is crucial for program evaluation in settings with social interactions. When considering a national job placement program, for example, policymakers may worry that the indirect effects of the program could completely offset the direct effects: in a slack labor market, job placement could merely change who is employed without affecting the overall employment rate (Crépon et al., 2013).

In this paper we provide methods that use data from a randomized saturation design to identify and estimate direct and indirect causal effects in the presence of social interactions and one-sided non-compliance. In real-world experiments non-compliance is the norm rather than the exception. In their study of the French labor market, Crépon et al. (2013) found that only 35% of workers offered job placement services took them up. Despite pervasive non-compliance in practice, most of the existing literature on randomized saturation designs either assumes perfect compliance (all subjects adhere to their experimentally-assigned treatment allocation) or identifies only intent-to-treat effects (the effect of being offered treatment). In contrast, we use the experimental design as a source of instrumental variables to estimate local average treatment effects (LATE) when subjects endogenously select into treatment on the basis of their experimental offers.
In a world of homogeneous treatment effects, a simple instrumental variables (IV) regression using individual treatment offers and group saturations as instruments would identify both direct and indirect effects. In most if not all real-world settings, however, treatment effects vary across individuals. In the presence of heterogeneity, this "naïve" IV approach will not in general recover interpretable causal effects. To allow for realistic patterns of heterogeneity in a tractable framework, we study a flexible random coefficients model in which causal effects may depend on an individual's treatment take-up as well as that of her peers.

Our approach relies on four key assumptions. First is partial interference: we assume that each subject belongs to a single, known group and that social interactions occur only within groups. This is reasonable in many experimental settings where, for example, groups correspond to villages, and social interactions across them are negligible. Second is anonymous interactions: we assume that individuals' potential outcome functions depend on their peers' treatment take-up only through the average take-up in their group. Under this assumption only the number of treated neighbors matters, not their identities (Manski, 2013). In the absence of detailed network data, the assumption of anonymous interactions is a natural starting point and is likely to be reasonable in settings such as the labor market example described above. Third is one-sided non-compliance: we assume that the only individuals who can take up treatment are those to whom treatment was offered via the experimental design. One-sided non-compliance is relatively common in practice, for example when an "encouragement design" is used to introduce a new program, product or technology that is otherwise unavailable (e.g. Crépon et al., 2013; Miguel and Kremer, 2004). We refer to our fourth key assumption as individualized offer response, or IOR for short.
IOR requires that each subject's treatment take-up decision depends only on her own treatment offer, and not on the offers made to her peers. While IOR is a strong assumption, it is testable and a priori reasonable in many contexts. In Crépon et al. (2013), for example, local labor markets are large and potential participants in the job placement program are unlikely to know each other in advance. As such, they are unlikely to influence each other's treatment take-up decisions, even if they may impose employment externalities on one another. IOR is also reasonable in online settings where other subjects' take-up decisions are unobserved (Anderson et al., 2014; Bond et al., 2012; Eckles et al., 2016) or confidential (Yi et al., 2015).

Because it rules out any form of strategic take-up, IOR allows us to divide the population into never-takers and compliers, two of the traditional LATE strata. Under the randomized saturation design and a standard exclusion restriction, we show how to construct valid and relevant instruments that identify the average causal effects of interest. The key to our approach is a result showing that conditioning on group size $n$ and the share of compliers $\bar{c}$ in a group breaks any dependence between peers' average take-up and an individual's random coefficients. Under the randomized saturation design, the share of Alice's neighbors who are offered treatment is exogenous. Under IOR, their average take-up depends only on how many of them are compliers and whether they are offered treatment. Thus, conditional on $n$ and $\bar{c}$, any residual variation in the take-up of Alice's neighbors comes solely from the experimental design. Although group size is observed, the share of compliers in a given group is not. In a large group, however, the rate of take-up among those offered treatment, call it $\hat{c}$, closely approximates $\bar{c}$.
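This approximation is easy to see in a short simulation. The sketch below is ours, with made-up group size and saturation; the 35% complier share echoes the take-up rate reported by Crépon et al. (2013):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000       # group size (illustrative)
c_bar = 0.35   # complier share
s = 0.5        # saturation: probability of receiving a treatment offer

C = rng.random(n) < c_bar   # complier indicator for each group member
Z = rng.random(n) < s       # Bernoulli treatment offers at saturation s
D = C & Z                   # IOR + one-sided non-compliance: take up iff complier and offered

c_hat = D[Z].mean()         # take-up rate among those offered treatment
print(c_hat)                # close to c_bar in a large group
```

With roughly a thousand offers, $\hat{c}$ lands within a few percentage points of $\bar{c}$, and the gap shrinks at the usual square-root rate as group size grows.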
Using this insight, we provide feasible estimators of direct and indirect causal effects that are consistent and asymptotically normal in the limit as group size grows at an appropriate rate relative to the number of groups. After constructing the appropriate instruments, our estimators can be implemented as simple IV regressions without the need for non-parametric estimation. One-sided non-compliance rules out always-takers and defiers. Kang and Imbens (2016) identify effects similar to those of Imai et al. (2018) using a variant of our IOR assumption that they call "personalized encouragement." Both Kang and Imbens (2016) and Imai et al. (2018) identify well-defined effects while placing limited structure on the potential outcome functions. The cost of this generality is that the effects they recover have a "reduced form" flavor, and are only defined relative to the specific saturations used in the experiment. While our random coefficients model is slightly more restrictive over the potential outcome functions, it allows us to recover "fully structural" causal effects that are not specific to the design of the experiment.

Our paper also relates to the applied literature that estimates spillover effects in various settings. This includes "partial population" studies in which a subset of subjects in the treatment group are left untreated and their outcomes are compared to those of subjects in a control group (Angelucci and De Giorgi, 2009; Barrera-Osorio et al., 2011; Bobonis and Finan, 2009; Duflo and Saez, 2003; Haushofer and Shapiro, 2016). It also includes cluster-randomized trials where groups are defined by a spatial radius within which social interactions may arise (Bobba and Gignoux, 2014; Miguel and Kremer, 2004) and more recent papers that use a randomized saturation design (Banerjee et al., 2012; Bursztyn et al., 2019; Giné and Mansuri, 2018; Sinclair et al., 2012). In general, this literature estimates intent-to-treat (ITT) effects.
Two notable exceptions are Crépon et al. (2013) and Akram et al. (2018), who estimate effects that are similar in spirit to the CADE of Imai et al. (2018). Our identification approach also relates to a large literature on random coefficients models, the closest being Wooldridge (2004) and Masten and Torgovitsky (2016), as well as methods that identify structural effects using control functions (Altonji and Matzkin, 2005; Imbens and Newey, 2009).

The remainder of the paper is organized as follows. Section 2 details our notation and assumptions, while section 3 presents our identification results. Section 4 provides consistent and asymptotically normal estimators of the effects identified in section 3, and section 5 concludes. Proofs appear in the appendix.
2 Notation and Assumptions
We observe $N$ individuals divided between $G$ groups. We assume throughout the paper that each group has at least two members so there is scope for social interactions. Let $g = 1, \dots, G$ index groups and $i = 1, \dots, N_g$ index individuals within a given group $g$. Using this notation, $N = \sum_g N_g$. For each individual $(i,g)$ we observe a binary treatment offer $Z_{ig}$, an indicator of treatment take-up $D_{ig}$, and an outcome $Y_{ig}$. For each group $g$ we observe a saturation $S_g \in [0,1]$ that determines the fraction of individuals offered treatment in that group. A bold letter indicates a vector and a $g$-subscript shows that this vector is restricted to members of a particular group. For example $\mathbf{Z}$ is the $N$-vector of all treatment offers $Z_{ig}$ while $\mathbf{Z}_g$ is the $N_g$-vector obtained by restricting $\mathbf{Z}$ to group $g$. Define $\mathbf{D}$ and $\mathbf{D}_g$ analogously and let $\mathbf{S}$ denote the $G$-vector of all $S_g$. At various points in our discussion we will need to refer to the average value of a variable for everyone in a group besides person $(i,g)$. As shorthand, we refer to these other individuals as person $(i,g)$'s neighbors. To indicate such an average, we use a bar along with an $(i,g)$ subscript. For instance, $\bar{D}_{ig}$ denotes the treatment take-up rate in group $g$ excluding $(i,g)$, while $\bar{Z}_{ig}$ is the analogous treatment offer rate:
$$\bar{D}_{ig} \equiv \frac{1}{N_g - 1} \sum_{j \neq i} D_{jg}, \qquad \bar{Z}_{ig} \equiv \frac{1}{N_g - 1} \sum_{j \neq i} Z_{jg}. \quad (1)$$
Note that, under this definition, $\bar{D}_{ig}$ and $\bar{Z}_{ig}$ vary across individuals in the same group depending on their values of $D_{ig}$ or $Z_{ig}$. For example in a group of eleven people, of whom five take up treatment, $\bar{D}_{ig} = 0.5$ if $D_{ig} = 0$ and $\bar{D}_{ig} = 0.4$ if $D_{ig} = 1$. We now introduce our basic assumptions, beginning with the experimental design.

Assumption 1 (Assignment of Saturations). Let $\mathcal{S} = \{s_1, s_2, \dots, s_J\}$ where $s_j \in [0,1]$ for all $j$. Saturations are assigned to groups completely at random from $\mathcal{S}$ such that $m_j$ groups are assigned to saturation $s_j$ with probability one, where $\sum_{j=1}^J m_j = G$. In other words,
$$P(S_g = s_j) = \begin{cases} m_j / G & \text{for } j = 1, \dots, J \\ 0 & \text{otherwise.} \end{cases}$$

Assumption 1 details the first stage of the randomized saturation design. In this stage, each group $g$ is assigned a saturation $S_g$ drawn completely at random from a set $\mathcal{S}$. In the example from Figure 1, fifty groups (balls) are divided equally between five saturations (urns), namely $\mathcal{S} = \{0, 0.25, 0.5, 0.75, 1\}$.
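The first-stage assignment in Assumption 1 amounts to drawing saturations without replacement from a pool containing $m_j$ copies of each $s_j$. A minimal sketch using the Figure 1 numbers (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
saturations = [0.0, 0.25, 0.5, 0.75, 1.0]  # the set S from the Figure 1 example
m = [10, 10, 10, 10, 10]                   # m_j groups per saturation; sum(m) = G = 50

# Completely random assignment: exactly m_j groups end up at saturation s_j.
pool = np.repeat(saturations, m)
S_g = rng.permutation(pool)   # S_g[g] is the saturation assigned to group g

values, counts = np.unique(S_g, return_counts=True)
print(dict(zip(values, counts)))   # each saturation appears exactly 10 times
```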
The saturation drawn in this first stage determines the fraction of individuals in the group that will be offered treatment in the second stage. Figure 1, for example, depicts a group of eight individuals that has been assigned to the 25% saturation: two are offered treatment and six are not.

Figure 1: Randomized Saturation Design. In the first stage groups (balls) are randomly assigned to saturations (urns). In the second stage, individuals within a group are randomly assigned treatment offers at the saturation selected in the first stage. The figure zooms in on a group of size eight that has been assigned to a 25% saturation: two individuals are offered treatment.

For simplicity we assume that treatment offers in the second stage follow a
Bernoulli design, in which $S_g$ determines the probability of treatment rather than the number of treatment offers.

Assumption 2 (Bernoulli Offers). $P(\mathbf{Z}_g = \mathbf{z} \mid S_g = s, N_g = n) = \prod_{i=1}^n s^{z_i} (1 - s)^{1 - z_i}$.

The randomized saturation design creates exogenous variation at the individual and group levels. Within a group some individuals are offered while others are not. Between groups, some have a large number of individuals offered treatment (a high saturation) while others do not. Many randomized saturation experiments, like the illustration in Figure 1, feature a 0% saturation or even a 100% saturation. We refer to 0% and 100% saturations collectively as corner saturations to distinguish them from all other saturations, which we call interior. (With minor modifications, all of our results can be extended to a completely randomized design, in which the number of treatment offers made to a given group is fixed conditional on $S_g$.) There is no variation in treatment offers between individuals in a group assigned a corner saturation. For this reason, as we discuss in section 3 below, the number of interior saturations in the design will determine the flexibility with which we can model potential outcome functions.

Assumptions 1–2 concern the design of the experiment. Our remaining assumptions, in contrast, concern the potential outcome and treatment functions. Without imposing any restrictions, an individual's potential outcome function $Y_{ig}(\cdot)$ could in principle depend on the treatment take-up of all individuals in the sample. We denote this unrestricted potential outcome function by $Y_{ig}(\mathbf{D})$. Assumption 3 restricts $Y_{ig}(\cdot)$ to depend only on $D_{ig}$ and $\bar{D}_{ig}$ via a random coefficients model.

Assumption 3 (Random Coefficients Model). Let $f(\cdot)$ be a $K$-vector of known functions $f_k \colon [0,1] \to \mathbb{R}$, each of which satisfies $\sup_{x \in [0,1]} |f_k(x)| < \infty$.
We assume that
$$Y_{ig}(\mathbf{D}) = Y_{ig}(\mathbf{D}_g) = Y_{ig}(D_{ig}, \bar{D}_{ig}) = f(\bar{D}_{ig})' \left[ (1 - D_{ig}) \boldsymbol{\theta}_{ig} + D_{ig} \boldsymbol{\psi}_{ig} \right]$$
where $\boldsymbol{\theta}_{ig}$ and $\boldsymbol{\psi}_{ig}$ are $K$-dimensional random vectors that may be dependent on $(D_{ig}, \bar{D}_{ig})$.

The first equality in Assumption 3 is the so-called partial interference assumption, used widely in the literature on spillover effects. This assumption states that there are no social interactions between individuals in different groups: only the treatment take-up of individuals in group $g$ affects the potential outcome of person $(i,g)$. The second equality in Assumption 3 states that person $(i,g)$'s potential outcome is only affected by the treatment take-up of the others in her group through the aggregate $\bar{D}_{ig}$. (Recall that $\bar{D}_{ig}$ is defined to exclude person $(i,g)$.) This is related to the anonymous interactions assumption from the network literature as it implies that only the number of $(i,g)$'s neighbors who take up treatment matters for her outcome; the identities of the neighbors are irrelevant (Manski, 2013). The third equality in Assumption 3 posits a finite basis function expansion for the potential outcome functions $Y_{ig}(0, \bar{D}_{ig})$ and $Y_{ig}(1, \bar{D}_{ig})$, namely
$$Y_{ig}(0, \bar{D}_{ig}) = \sum_{k=1}^K \theta_{ig}^{(k)} f_k(\bar{D}_{ig}), \qquad Y_{ig}(1, \bar{D}_{ig}) = \sum_{k=1}^K \psi_{ig}^{(k)} f_k(\bar{D}_{ig})$$
or, written more compactly in matrix form,
$$Y_{ig} = \mathbf{X}_{ig}' \mathbf{B}_{ig}, \qquad \mathbf{X}_{ig} \equiv \begin{bmatrix} 1 \\ D_{ig} \end{bmatrix} \otimes f(\bar{D}_{ig}), \qquad \mathbf{B}_{ig} \equiv \begin{bmatrix} \boldsymbol{\theta}_{ig} \\ \boldsymbol{\psi}_{ig} - \boldsymbol{\theta}_{ig} \end{bmatrix} \quad (2)$$
where the coefficient vectors $\boldsymbol{\theta}_{ig}$ and $\boldsymbol{\psi}_{ig}$, and hence $\mathbf{B}_{ig}$, are allowed to vary arbitrarily across individuals. If person $(i,g)$ has some prior knowledge of her potential outcome function $Y_{ig}(\cdot, \cdot)$, her take-up decision may depend on $\boldsymbol{\theta}_{ig}$ and $\boldsymbol{\psi}_{ig}$. More generally, the same unobserved characteristics that determine a person's decision to take up treatment could affect her potential outcomes.
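To fix ideas, the outcome equation in Assumption 3 can be evaluated directly for the linear basis $f(x) = (1, x)'$; the coefficient values below are invented for illustration:

```python
import numpy as np

def outcome(D_ig, Dbar_ig, theta, psi):
    """Potential outcome f(Dbar)'[(1 - D)theta + D psi] with f(x) = (1, x)'."""
    f = np.array([1.0, Dbar_ig])
    return f @ ((1 - D_ig) * theta + D_ig * psi)

theta = np.array([0.6, -0.3])  # untreated intercept and spillover slope (illustrative)
psi = np.array([0.8, -0.1])    # treated intercept and spillover slope (illustrative)

y0 = outcome(0, 0.5, theta, psi)  # Y(0, 0.5) = 0.6 - 0.3 * 0.5 = 0.45
y1 = outcome(1, 0.5, theta, psi)  # Y(1, 0.5) = 0.8 - 0.1 * 0.5 = 0.75
print(y0, y1)
```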
To account for these possibilities, we allow arbitrary statistical dependence between $(D_{ig}, \bar{D}_{ig})$ and $\mathbf{B}_{ig}$.

Ideally, our goal would be to identify the average direct and indirect causal effects of the binary treatment $D_{ig}$. Under Assumption 3, we define these as follows, building on the definitions of Hudgens and Halloran (2008). The direct treatment effect, DE, gives the average effect of exogenously changing an individual's own treatment $D_{ig}$ from 0 to 1 while holding the share of her treated neighbors $\bar{D}_{ig}$ fixed at $\bar{d}$, namely
$$\mathrm{DE}(\bar{d}) \equiv E\left[ Y_{ig}(1, \bar{d}) - Y_{ig}(0, \bar{d}) \right] = f(\bar{d})' E\left[ \boldsymbol{\psi}_{ig} - \boldsymbol{\theta}_{ig} \right] \quad (3)$$
where the expectations are taken over all individuals in the population from which our experimental subjects were drawn. Recall that $\bar{D}_{ig}$ excludes person $(i,g)$, ensuring that $\mathrm{DE}(\bar{d})$ is well-defined. An indirect treatment effect, in contrast, gives the average effect of exogenously increasing a person's share of treated neighbors $\bar{D}_{ig}$ from $\bar{d}$ to $\bar{d} + \Delta$ while holding her own treatment $D_{ig}$ fixed at $d$, in other words
$$\mathrm{IE}_d(\bar{d}, \Delta) \equiv E\left[ Y_{ig}(d, \bar{d} + \Delta) - Y_{ig}(d, \bar{d}) \right] = \left[ f(\bar{d} + \Delta) - f(\bar{d}) \right]' \left\{ (1 - d)\, E[\boldsymbol{\theta}_{ig}] + d\, E[\boldsymbol{\psi}_{ig}] \right\} \quad (4)$$
where $\Delta$ is a positive increment. There are two indirect treatment effect functions, $\mathrm{IE}_0$ and $\mathrm{IE}_1$, corresponding to the two possible values at which we could hold $D_{ig}$ fixed: a spillover on the untreated, and a spillover on the treated. Because the direct and indirect causal effects are fully determined by $E[\mathbf{B}_{ig}]$ under Assumption 3, this is our object of interest below. For example, if $f(x)' = [1 \;\; x]$ we obtain a linear model of the form
$$Y_{ig} = \alpha_{ig} + \beta_{ig} D_{ig} + \gamma_{ig} \bar{D}_{ig} + \delta_{ig} D_{ig} \bar{D}_{ig}. \quad (5)$$
In this case the direct effect is $\mathrm{DE}(\bar{d}) = E[\beta_{ig}] + E[\delta_{ig}]\,\bar{d}$ while the indirect effects are
$$\mathrm{IE}_0(\bar{d}, \Delta) = \Delta \times E[\gamma_{ig}], \qquad \mathrm{IE}_1(\bar{d}, \Delta) = \Delta \times E[\gamma_{ig} + \delta_{ig}].$$
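The mapping from mean coefficients to effects in the linear model can then be coded in a few lines; the mean coefficient values are again hypothetical:

```python
# Average effects in the linear model (5), given hypothetical mean coefficients.
E_beta, E_gamma, E_delta = 0.2, -0.3, 0.1

def DE(dbar):
    """Direct effect: DE(dbar) = E[beta] + E[delta] * dbar."""
    return E_beta + E_delta * dbar

def IE(d, dbar, inc):
    """Indirect effect IE_d(dbar, inc) = inc * (E[gamma] + d * E[delta]).
    In the linear model it does not depend on dbar itself."""
    return inc * (E_gamma + d * E_delta)

print(DE(0.5))                           # E[beta] + 0.5 * E[delta]
print(IE(0, 0.4, 0.1), IE(1, 0.4, 0.1))  # spillover on the untreated vs. the treated
```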
Figure 2 presents a hypothetical example of (5) in a setting with employment displacement effects. Suppose that $Y_{ig}$ is Alice's probability of long-term employment. Both $Y_{ig}(1, \bar{d})$ and $Y_{ig}(0, \bar{d})$ have a negative slope. This means that Alice's probability of long-term employment decreases if more of her neighbors obtain job placement services. But since $\delta_{ig}$ is positive, the spillover is more harmful if Alice is untreated. Alice's direct effect of treatment $Y_{ig}(1, \bar{d}) - Y_{ig}(0, \bar{d})$ is positive for all $\bar{d}$ in this example and increases as $\bar{d}$ does: job placement services are more valuable to Alice when more of her neighbors obtain them. By averaging these effects for everyone in the population, we obtain $\mathrm{IE}_0$, $\mathrm{IE}_1$, and DE.

Figure 2: A hypothetical example of the linear potential outcomes model from (5). The slope of the bottom line, $\gamma_{ig}$, is the indirect effect when untreated while that of the top line, $\gamma_{ig} + \delta_{ig}$, is the marginal indirect effect when treated. The distance between the two lines is the direct treatment effect.

Under perfect compliance $D_{ig}$ would simply equal $Z_{ig}$, making both $D_{ig}$ and $\bar{D}_{ig}$ exogenous. In this case a sample analogue of $E[Y_{ig}(d, \bar{d})]$ could be used to recover all of the treatment effects discussed above, at least at values of $\bar{d}$ that arise in the experimental design. Unfortunately non-compliance is pervasive in real-world experiments, greatly complicating the identification of causal effects. In a large-scale experiment carried out in France, for example, only 35% of unemployed workers offered job placement services took them up (Crépon et al., 2013). Those who did take up treatment likely differ in myriad ways from those who did not: they may, for example, be more conscientious. One way to avoid this problem of self-selection is to carry out an intent-to-treat (ITT) analysis, conditioning on $Z_{ig}$ and $S_g$ rather than $D_{ig}$ and $\bar{D}_{ig}$. But with take-up rates as low as 35%, ITT estimates could be very far from the causal effects of interest. In this paper we adopt a different approach.
Following the tradition in the local average treatment effect (LATE) literature, we provide conditions under which direct and indirect causal effects, rather than ITT effects, can be identified for well-defined sub-populations of individuals. We focus on the case of one-sided non-compliance, in which only those offered treatment can take it up. One-sided non-compliance is fairly common in practice (e.g. Crépon et al., 2013) and simplifies the analysis considerably.

Assumption 4 (One-sided Non-compliance). If $Z_{ig} = 0$ then $D_{ig} = 0$.

To account for endogenous treatment take-up, we define potential treatment functions $D_{ig}(\cdot)$. In principle these could depend on the treatment offers of every individual, $\mathbf{Z}$, in the experiment. The following assumption restricts $D_{ig}(\cdot)$ to permit identification of the direct and indirect causal effects described above.

Assumption 5 (IOR). $D_{ig}(\mathbf{Z}) = D_{ig}(\mathbf{Z}_g) = D_{ig}(Z_{ig}, \bar{Z}_{ig}) = D_{ig}(Z_{ig})$.

The first equality of Assumption 5 is a partial interference assumption: it requires that there are no social interactions in take-up between individuals in different groups. The second equality of Assumption 5 states that person $(i,g)$'s take-up decision depends on the treatment offers of others in her group only through the fraction $\bar{Z}_{ig}$ of treatment offers made to the others in her group. Unfortunately these first two equalities are not in general sufficient to point identify direct and indirect causal effects. The third equality, which we call individualized offer response or IOR for short, imposes the further restriction that each person's take-up decision depends only on her own treatment offer. IOR states that there are no social interactions in take-up. This is a strong assumption, but one that has also appeared in the existing literature. Kang and Imbens (2016), for example, employ a variant of IOR that they call "personalized encouragement." And while Imai et al.
(2018) derive their so-called "complier average direct effect (CADE)" under a weaker condition than IOR, the CADE is in fact a hybrid of direct and indirect effects unless one is willing to assume that there are no social interactions in take-up. (Work in progress explores the possibility of relaxing IOR in specific settings to obtain point, or at least partial, identification.) Fortunately, IOR is testable: it implies, for example, that $E[D_{ig} \mid Z_{ig} = 1, S_g = s]$ does not vary with $s$. If the observed average take-up rate among individuals who are offered treatment varies with saturation, this indicates a violation of IOR.

Under IOR and one-sided non-compliance (Assumptions 4 and 5), we can divide individuals into never-takers and compliers, two of the principal strata from the LATE literature. (Under one-sided non-compliance, Assumption 4, there are no always-takers; an extension of our results to two-sided non-compliance is currently in progress.) Never-takers are defined as those for whom $D_{ig}(0) = D_{ig}(1) = 0$, while compliers are those for whom $D_{ig}(z) = z$ for all $z$. Defining $C_{ig}$ to be the indicator that person $(i,g)$ is a complier, we have
$$D_{ig} = C_{ig} Z_{ig}, \qquad \bar{D}_{ig} = \frac{1}{N_g - 1} \sum_{j \neq i} C_{jg} Z_{jg}. \quad (6)$$
By analogy to $\bar{Z}_{ig}$ and $\bar{D}_{ig}$, we define $\bar{C}_{ig}$ to be the share of compliers among person $(i,g)$'s neighbors in group $g$, namely
$$\bar{C}_{ig} = \frac{1}{N_g - 1} \sum_{j \neq i} C_{jg}. \quad (7)$$
(Recall that the average $\bar{Z}_{ig}$ is defined to exclude $(i,g)$.) Note that $\bar{C}_{ig}$ varies across individuals in the same group, depending on their values of $C_{ig}$. Finally, let $\mathbf{C}_g$ denote the vector of $C_{ig}$ for all individuals in group $g$.

Our final assumption is an exclusion restriction for the treatment offers $\mathbf{Z}_g$ and saturation $S_g$. To state it we require two additional pieces of notation. First, let $\mathbf{B}_g$ denote the vector that stacks $\mathbf{B}_{ig}$ for all individuals in group $g$.
Second, following Dawid (1979), let "$\perp\!\!\!\perp$" denote (conditional) independence, so that $X \perp\!\!\!\perp Y$ indicates that $X$ is statistically independent of $Y$ while $X \perp\!\!\!\perp Y \mid Z$ indicates that $X$ is conditionally independent of $Y$ given $Z$. Using this notation, the exclusion restriction is as follows.

Assumption 6 (Exclusion Restriction). (i) $S_g \perp\!\!\!\perp (\mathbf{C}_g, \mathbf{B}_g, N_g)$; (ii) $\mathbf{Z}_g \perp\!\!\!\perp (\mathbf{C}_g, \mathbf{B}_g) \mid (S_g, N_g)$.

Intuitively, Assumption 6 states that $(\mathbf{C}_g, \mathbf{B}_g, N_g)$ are "predetermined" with respect to the treatment offers and saturations. In a traditional LATE setting, the counterparts of Assumption 6 are the "unconfounded type" assumption and the independence of potential outcomes and treatment offers. Assumption 6 could be violated in a number of ways. If, for example, individuals chose their group membership based on knowledge of their group's saturation, $N_g$ would not be independent of $S_g$. Similarly, if some individuals decided to comply with their treatment offers only because their group was assigned a high saturation, $\mathbf{C}_g$ would not be independent of $S_g$. This latter possibility illustrates that Assumption 6 partially embeds IOR by ruling out "selection into compliance." More prosaically, Assumption 6 would be violated if either $S_g$ or $Z_{ig}$ had a direct effect on the random coefficients $\mathbf{B}_g$. Notice that part (ii) of Assumption 6 conditions on $(S_g, N_g)$. This is because the second stage of the randomized saturation experiment assigns $\mathbf{Z}_g$ conditional on this information: see Assumption 2.

3 Identification
Under Assumption 3, the functional form of the random coefficients model is known. So why not simply use $(Z_{ig}, S_g)$ as instrumental variables for $D_{ig}$ and $f(\bar{D}_{ig})$? If the first-stage relationship between instruments and endogenous regressors is homogeneous, two-stage least squares identifies the average effects in a random coefficients model, i.e. $E[\boldsymbol{\theta}_{ig}]$ and $E[\boldsymbol{\psi}_{ig} - \boldsymbol{\theta}_{ig}]$ under (2) (Heckman and Vytlacil, 1998; Wooldridge, 1997, 2003, 2016). In our case, however, this result does not apply: the following lemma shows that the first stage is heterogeneous because the conditional distribution of $\bar{D}_{ig}$ given $S_g$ varies with $(\bar{C}_{ig}, N_g)$.

Lemma 1. Let $\bar{c}$ be a value in $[0,1]$ such that $(n-1)\bar{c}$ is a non-negative integer. Under Assumptions 1–2 and 4–6 and conditional on $(N_g = n, S_g = s, \mathbf{C}_g = \mathbf{c}, \bar{C}_{ig} = \bar{c}, Z_{ig} = z)$, $(n-1)\bar{D}_{ig}$ follows a Binomial$((n-1)\bar{c},\, s)$ distribution.

Intuitively, the problem presented by Lemma 1 is as follows. Although $S_g$ is randomly assigned, the variation that it induces in $\bar{D}_{ig}$ is mediated by the share of compliers $\bar{C}_{ig}$. Accordingly, if $\bar{C}_{ig}$, a source of first-stage heterogeneity, is correlated with the random coefficients in the second stage, the IV estimator will not identify the effects of interest. To make this problem more concrete, consider the linear potential outcomes model from (5) and let $\boldsymbol{\vartheta}_{IV}$ be the IV estimand using instruments $(1, Z_{ig}, S_g, Z_{ig}S_g)$. In this example $\boldsymbol{\vartheta}_{IV}$ takes a particularly simple form, as shown in the following lemma.

Lemma 2.
Let $\boldsymbol{\vartheta}_{IV}$ be the IV estimand from a regression of $Y_{ig}$ on $\mathbf{X}_{ig} \equiv (1, D_{ig}, \bar{D}_{ig}, D_{ig}\bar{D}_{ig})'$ with instruments $\mathbf{Z}_{ig} \equiv (1, Z_{ig}, S_g, Z_{ig}S_g)'$, namely
$$\boldsymbol{\vartheta}_{IV} \equiv \begin{bmatrix} \alpha_{IV} & \beta_{IV} & \gamma_{IV} & \delta_{IV} \end{bmatrix}' = E\left[ \mathbf{Z}_{ig} \mathbf{X}_{ig}' \right]^{-1} E[\mathbf{Z}_{ig} Y_{ig}],$$
assuming that $E[\mathbf{Z}_{ig}\mathbf{X}_{ig}']$ is invertible. Then, under (5) and Assumptions 1–2 and 4–6,
$$\alpha_{IV} = E[\alpha_{ig}], \qquad \beta_{IV} = E[\beta_{ig} \mid C_{ig} = 1],$$
$$\gamma_{IV} = E[\gamma_{ig}] + \frac{\mathrm{Cov}(\bar{C}_{ig}, \gamma_{ig})}{E(\bar{C}_{ig})}, \qquad \delta_{IV} = E[\delta_{ig} \mid C_{ig} = 1] + \frac{\mathrm{Cov}(\bar{C}_{ig}, \delta_{ig} \mid C_{ig} = 1)}{E(\bar{C}_{ig} \mid C_{ig} = 1)}.$$

As we see from Lemma 2, IV identifies the population average of $\alpha_{ig}$, along with the population average of $\beta_{ig}$ for the subset of individuals who select into treatment. Neither of these, however, is itself a causal effect. In general, IV recovers neither direct nor indirect causal effects for any well-defined group of individuals. Specializing (4) to the linear model from (5) gives $\mathrm{IE}_0(\bar{d}, \Delta) = E[\gamma_{ig}]\Delta$. In other words, $E[\gamma_{ig}]$ is an average spillover. Lemma 2 shows that IV fails to identify this quantity unless the individual-specific spillovers $\gamma_{ig}$ are uncorrelated with the share of compliers $\bar{C}_{ig}$. This condition could easily fail in practice. In the labor market example from the introduction, cities with a particularly depressed labor market might be expected to contain a large share of compliers. If negative spillovers are more intense in such cities, IV will not recover the average indirect effect. A similar problem hampers the interpretation of $\delta_{IV}$. Under (5) the average direct effect for compliers, as a function of $\bar{d}$, is given by $E[\beta_{ig} \mid C_{ig} = 1] + E[\delta_{ig} \mid C_{ig} = 1]\,\bar{d}$.
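The size of the distortion in $\gamma_{IV}$ is easy to gauge by evaluating Lemma 2's covariance correction on simulated draws. The dependence between $\gamma_{ig}$ and $\bar{C}_{ig}$ below is invented purely to make the covariance term visible:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Complier shares across groups, and spillovers that are more negative
# where compliers are more common (an invented dependence).
Cbar = rng.uniform(0.2, 0.8, n)
gamma = -0.3 - 0.5 * (Cbar - Cbar.mean()) + rng.normal(0, 0.1, n)

E_gamma = gamma.mean()
gamma_IV = E_gamma + np.cov(Cbar, gamma)[0, 1] / Cbar.mean()  # Lemma 2's formula
print(E_gamma, gamma_IV)  # gamma_IV is pushed below E[gamma] by the covariance term
```

Here $\mathrm{Cov}(\bar{C}_{ig}, \gamma_{ig}) \approx -0.5 \times \mathrm{Var}(\bar{C}_{ig}) = -0.015$, so $\gamma_{IV}$ understates the average spillover by about $0.03$, a tenth of its magnitude.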
While IV identifies the intercept of this function, it only identifies the slope if $\delta_{ig}$ is uncorrelated with $\bar{C}_{ig}$ for compliers.

As this example illustrates, identifying direct and indirect causal effects requires us to correct for possible dependence between individual-specific coefficients and group-level take-up that arises from the first-stage relationship in Lemma 1. The key to our approach, as shown in the following theorem, is to condition on $\bar{C}_{ig}$ and $N_g$.

Theorem 1.
Under Assumptions 1–2 and 4–6, $(S_g, Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (\mathbf{B}_{ig}, C_{ig}) \mid (\bar{C}_{ig}, N_g)$.

Theorem 1 implies that conditioning on $(\bar{C}_{ig}, N_g)$ is sufficient to break any dependence between $f(\bar{D}_{ig})$ and $(\mathbf{B}_{ig}, C_{ig})$ that may be present. The intuition for this result is as follows. Conditional on $\bar{C}_{ig}$ and $N_g$, we know precisely how many of $(i,g)$'s neighbors are compliers. Given this information, IOR implies that all remaining variation in $\bar{D}_{ig}$ arises solely from experimental variation in the saturation $S_g$ assigned to different groups, and the share of compliers offered treatment across groups assigned the same saturation. So long as $Z_{ig}$ and $S_g$ do not affect $(\mathbf{B}_{ig}, \mathbf{C}_g)$, Assumption 6, it follows that $(Z_{ig}, \bar{D}_{ig}, S_g)$ are exogenous given $(\bar{C}_{ig}, N_g)$, even when individuals decide whether or not to take up treatment based on knowledge of their potential outcome functions.

Before stating our identification results, we require some additional notation and one further assumption. Define the vector $\mathbf{W}_{ig}$ and matrix-valued functions $Q$, $Q_0$, $Q_1$ as follows:
$$Q(\bar{c}, n) \equiv E\left[ \mathbf{W}_{ig}\mathbf{W}_{ig}' \mid \bar{C}_{ig} = \bar{c}, N_g = n \right], \qquad \mathbf{W}_{ig} \equiv \begin{bmatrix} 1 \\ Z_{ig} \end{bmatrix} \otimes f(\bar{D}_{ig}) \quad (8)$$
$$Q_0(\bar{c}, n) \equiv E\left[ (1 - Z_{ig}) f(\bar{D}_{ig}) f(\bar{D}_{ig})' \mid \bar{C}_{ig} = \bar{c}, N_g = n \right] \quad (9)$$
$$Q_1(\bar{c}, n) \equiv E\left[ Z_{ig} f(\bar{D}_{ig}) f(\bar{D}_{ig})' \mid \bar{C}_{ig} = \bar{c}, N_g = n \right]. \quad (10)$$
We use $Q$, $Q_0$, $Q_1$ below to construct instrumental variables that are not subject to the shortcomings of $\mathbf{Z}_{ig}$ from Lemma 2 discussed above. The final ingredient that we need to construct these alternative instruments is a rank condition.

Assumption 7 (Rank Condition). (i) $0 < E(C_{ig}) < 1$; (ii) $Q(\bar{c}, n)$ is invertible at every point $(\bar{c}, n)$ in the support of $(\bar{C}_{ig}, N_g)$.

Part (i) of Assumption 7 asserts that there is at least some degree of non-compliance with the experimental treatment offers, $E(C_{ig}) < 1$, and that the population contains at least some compliers, $E(C_{ig}) > 0$. Part (ii) requires that the matrix-valued function $Q$ defined in (8) is full rank when evaluated at any share of compliers $\bar{c}$ and group size $n$ that occur in the population. Assumption 7 does not explicitly restrict $Q_0$ or $Q_1$. By the linearity of conditional expectation, however,
$$Q(\bar{c}, n) = \begin{bmatrix} Q_0(\bar{c}, n) + Q_1(\bar{c}, n) & Q_1(\bar{c}, n) \\ Q_1(\bar{c}, n) & Q_1(\bar{c}, n) \end{bmatrix} \quad (11)$$
so Assumption 7(ii) could equivalently be stated in terms of $Q_0$ and $Q_1$.

Lemma 3. $Q(\bar{c}, n)$ is invertible iff $Q_0(\bar{c}, n)$ and $Q_1(\bar{c}, n)$ are both invertible, in which case
$$Q(\bar{c}, n)^{-1} = \begin{bmatrix} Q_0(\bar{c}, n)^{-1} & -Q_0(\bar{c}, n)^{-1} \\ -Q_0(\bar{c}, n)^{-1} & Q_0(\bar{c}, n)^{-1} + Q_1(\bar{c}, n)^{-1} \end{bmatrix}.$$

We discuss low-level conditions for the invertibility of $(Q_0, Q_1)$, and hence $Q$, below. Having assumed the necessary rank condition, we can now state our main identification results. The following theorem shows how $Q_0(\bar{C}_{ig}, N_g)$ and $Q_1(\bar{C}_{ig}, N_g)$ can be used to construct instrumental variables that identify average values of the random coefficients for well-defined groups of individuals.

Theorem 2.
Define the instrument vectors

  Z^W_ig ≡ Q(¯C_ig, N_g)^{−1} W_ig,   Z^1_ig ≡ Q_1(¯C_ig, N_g)^{−1} f(¯D_ig),   Z^0_ig ≡ Q_0(¯C_ig, N_g)^{−1} f(¯D_ig),

where Q, Q_0, Q_1, and W_ig are as given in (8)–(10). Then, under Assumptions 3–5 and 7 and assuming that (Z_ig, ¯D_ig) ⊥⊥ (B_ig, C_ig) | (¯C_ig, N_g), we have

(i) [ E(θ_ig) ; E(ψ_ig − θ_ig | C_ig = 1) ] = E[ Z^W_ig X′_ig ]^{−1} E[ Z^W_ig Y_ig ],
(ii) E[ ψ_ig | C_ig = 1 ] = E[ Z^1_ig {D_ig f(¯D_ig)}′ ]^{−1} E[ Z^1_ig D_ig Y_ig ],
(iii) E[ θ_ig | C_ig = 0 ] = E[ Z^1_ig {Z_ig (1 − D_ig) f(¯D_ig)}′ ]^{−1} E[ Z^1_ig Z_ig (1 − D_ig) Y_ig ], and
(iv) E[ θ_ig ] = E[ Z^0_ig {(1 − Z_ig) f(¯D_ig)}′ ]^{−1} E[ Z^0_ig (1 − Z_ig) Y_ig ].

In part (i), rather than using Z_ig and S_g directly as a source of instruments for f(¯D_ig), we transform this vector of endogenous regressors into a set of exogenous instruments using Q(¯C_ig, N_g)^{−1}. Parts (ii) and (iii) use a similar approach to obtain moment equations for the average value of ψ_ig for compliers and θ_ig for never-takers. Given part (i), part (iv) is technically redundant, but it is convenient to have an expression for E(θ_ig) in isolation. To understand the intuition behind the instruments from Theorem 2, consider the linear potential outcomes example from (5) above. Here we have f(x) = (1, x)′ and thus

  Q_z(¯C_ig, N_g) = P(Z_ig = z) E[ ( 1      ¯D_ig
                                    ¯D_ig   ¯D²_ig ) | ¯C_ig, N_g, Z_ig = z ],   z ∈ {0, 1},

using the fact that Z_ig ⊥⊥ (¯C_ig, N_g) by Lemma A.2. It follows after a few steps of algebra that

  Q_z(¯C_ig, N_g)^{−1} f(¯D_ig) = 1/P(Z_ig = z) × [ ( E(¯D²_ig | ¯C_ig, N_g, Z_ig = z) − ¯D_ig E(¯D_ig | ¯C_ig, N_g, Z_ig = z) ) / Var(¯D_ig | ¯C_ig, N_g, Z_ig = z)
                                                    ( ¯D_ig − E(¯D_ig | ¯C_ig, N_g, Z_ig = z) ) / Var(¯D_ig | ¯C_ig, N_g, Z_ig = z) ].
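The closed-form expression above is easy to check numerically. The following Python sketch (all moment values hypothetical) builds Q_z for the linear basis f(x) = (1, x)′ from an assumed P(Z_ig = z), a conditional mean, and a conditional second moment of ¯D_ig, and confirms that directly inverting Q_z reproduces the displayed formula.

```python
import numpy as np

# Hypothetical conditional moments of Dbar given (Cbar_ig, N_g, Z_ig = z); any
# values with positive conditional variance work for this check.
p_z = 0.5                 # P(Z_ig = z)
m = 0.3                   # E[Dbar | Cbar, N, Z = z]
s2 = 0.13                 # E[Dbar^2 | Cbar, N, Z = z]
v = s2 - m ** 2           # Var(Dbar | Cbar, N, Z = z)

# Q_z for the linear basis f(x) = (1, x)'
Q_z = p_z * np.array([[1.0, m], [m, s2]])

for dbar in (0.0, 0.25, 0.8):
    f = np.array([1.0, dbar])
    direct = np.linalg.solve(Q_z, f)                 # Q_z^{-1} f(dbar)
    closed = np.array([(s2 - dbar * m) / v,          # first entry of the display
                       (dbar - m) / v]) / p_z        # second entry
    assert np.allclose(direct, closed)
```

The first entry of the closed form is constant in the data while the second is the scaled deviation of ¯D_ig from its conditional mean, which is the source of the intuition discussed next.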
While ¯D_ig is endogenous, we see that the scaled difference between ¯D_ig and its conditional expectation is a valid instrument under the linear potential outcomes model. Intuitively, this transformation adjusts for the first-stage heterogeneity discussed at the beginning of this section: after controlling for differences in (¯C_ig, N_g), the remaining variation in ¯D_ig arises only from the experimentally-assigned saturations. Thus, rather than using S_g as an instrument directly, we use it indirectly to generate variation in ¯D_ig given (¯C_ig, N_g). As discussed below, this is crucial for part (ii) of Assumption 7.

Notice that Theorem 2 does not explicitly invoke the randomized saturation design, Assumptions 1–2, or the exclusion restriction, Assumption 6. (As Theorem 2 does not, strictly speaking, require a randomized saturation design, it could in principle be applied in other settings, e.g. a "natural" experiment, if our other assumptions are satisfied. In this case Q_0 and Q_1 would not be known, but could potentially be recovered via a non-parametric approach.) Using this result for identification, however, requires two conditions. First we need to satisfy (Z_ig, ¯D_ig) ⊥⊥ (B_ig, C_ig) | (¯C_ig, N_g). As shown in Theorem 1 above, the randomized saturation design and exclusion restriction are sufficient for this condition to hold under one-sided non-compliance and IOR, Assumptions 4 and 5. Second, we need to show that the functions Q_0, Q_1 are identified in order to construct the instruments from Theorem 2. Fortunately, these functions are in fact known under the randomized saturation design and exclusion restriction. In particular, they depend only on the distribution of ¯D_ig | (Z_ig, ¯C_ig, N_g), which can be calculated from Lemma 1, and the distribution of Z_ig | (¯C_ig, N_g), which coincides with its unconditional distribution by Lemma A.2. As such, we can always calculate Q_1(¯C_ig, N_g) and Q_0(¯C_ig, N_g) by simulating the experimental design.
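Computing Q_0 and Q_1 by simulating the design can be illustrated with a short Monte Carlo sketch. All design parameters below are hypothetical: two equally likely saturations, Bernoulli offers, and a fixed complier share among a person's neighbors. The simulation also confirms the block structure in (11) and the inverse formula from Lemma 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: two equally likely saturations, group size n, and a
# fixed complier count k among person i's n - 1 neighbors.
s_L, s_H, n, cbar = 0.25, 0.75, 41, 0.5
k = int(cbar * (n - 1))
reps = 50_000

S = rng.choice([s_L, s_H], size=reps)                # group saturations
Z_i = (rng.random(reps) < S).astype(float)           # own offer
offers = rng.random((reps, k)) < S[:, None]          # complier neighbors' offers
Dbar = offers.sum(axis=1) / (n - 1)                  # neighbors' take-up rate

f = np.stack([np.ones(reps), Dbar], axis=1)          # f(x) = (1, x)'
W = np.concatenate([f, Z_i[:, None] * f], axis=1)    # (1, Z)' kron f(Dbar)

Q = W.T @ W / reps
ff = f[:, :, None] * f[:, None, :]
Q0 = (ff * (1 - Z_i)[:, None, None]).mean(axis=0)
Q1 = (ff * Z_i[:, None, None]).mean(axis=0)

# Block structure from (11): Q = [[Q0 + Q1, Q1], [Q1, Q1]].
assert np.allclose(Q, np.block([[Q0 + Q1, Q1], [Q1, Q1]]))

# Lemma 3: Q^{-1} assembled from Q0^{-1} and Q1^{-1}.
Q0i, Q1i = np.linalg.inv(Q0), np.linalg.inv(Q1)
assert np.allclose(np.linalg.inv(Q), np.block([[Q0i, -Q0i], [-Q0i, Q0i + Q1i]]))
```

Because Z²_ig = Z_ig draw by draw, the block identity holds exactly in the simulated sample, not merely in the limit.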
Depending on the choice of f, analytical expressions for Q_0, Q_1 may be available, as shown below for the linear potential outcomes model from (5).

Constructing the instruments that appear in Theorem 2 requires us to evaluate Q_0 and Q_1 at (¯C_ig, N_g). While the group size N_g is observed, the share of compliers ¯C_ig is not. In large groups, however, ¯C_ig can be precisely estimated by calculating the rate of treatment take-up among the neighbors of (i, g) who are offered treatment. In the following section we use this approach to provide consistent and asymptotically normal estimators of the parameters from Theorem 2. For the remainder of this section, however, we consider identification conditional on knowledge of ¯C_ig. Subject to this qualification, the following result catalogues the full set of causal effects that are identified under our assumptions.

Theorem 3.
Given knowledge of ¯C_ig, the following are identified under Assumptions 1–7:

(i) IE(¯d, ∆) ≡ E[ Y_ig(0, ¯d + ∆) − Y_ig(0, ¯d) ],
(ii) DE(¯d | D_ig = 1) ≡ E[ Y_ig(1, ¯d) − Y_ig(0, ¯d) | D_ig = 1 ],
(iii) IE_0(¯d, ∆ | D_ig = 1) ≡ E[ Y_ig(0, ¯d + ∆) − Y_ig(0, ¯d) | D_ig = 1 ],
(iv) IE_1(¯d, ∆ | D_ig = 1) ≡ E[ Y_ig(1, ¯d + ∆) − Y_ig(1, ¯d) | D_ig = 1 ],
(v) IE(¯d, ∆ | C_ig = 0) ≡ E[ Y_ig(0, ¯d + ∆) − Y_ig(0, ¯d) | C_ig = 0 ].

Part (i) of Theorem 3 is an indirect treatment effect, as defined in (4) above. It measures the causal impact of increasing the treatment take-up rate among Alice's neighbors from ¯d to (¯d + ∆) when Alice's own treatment is held fixed at zero. In the Crépon et al. (2013) experiment discussed in our empirical example below, this corresponds to the average labor market displacement effect. Whereas part (i) is an average treatment effect, parts (ii)–(iv) are effects of treatment on the treated. Part (ii) gives the direct effect of treating Alice while holding the treatment take-up rate of her neighbors fixed at ¯d, while (iii) and (iv) give the indirect effect of increasing her neighbors' treatment take-up from ¯d to ¯d + ∆ while holding Alice's treatment fixed at either zero, part (iii), or one, part (iv). Part (v) is a LATE generalization of Equation 4: it gives the indirect effect for never-takers, holding their treatment fixed at zero. While we identify the full set of direct and indirect effects for compliers, never-takers are never observed with D_ig = 1: because we consider a setting with one-sided non-compliance, any experimental participant with D_ig = 1 must be a complier. As such, we cannot identify direct treatment effects for this group or indirect treatment effects when D_ig is held fixed at one. This in turn implies that we cannot identify the average direct effect for the population as a whole, DE(¯d), or the average indirect effect when D_ig is held fixed at one, IE_1(¯d, ∆).

Given that Q_0 and Q_1 are completely determined by the experimental design, we can directly check part (ii) of Assumption 7 for any choice of basis functions f and probability distribution over saturations. Consider again the linear potential outcomes model from (5). In this example f(x) = (1, x)′ and thus

  Q_0(¯c, n) = [ E{1 − S_g}            ¯c E{S_g(1 − S_g)}
                 ¯c E{S_g(1 − S_g)}    ¯c² E{S²_g(1 − S_g)} + ¯c(n − 1)^{−1} E{S_g(1 − S_g)} ]   (12)

  Q_1(¯c, n) = [ E{S_g}       ¯c E{S²_g}
                 ¯c E{S²_g}   ¯c² E{S³_g} + ¯c(n − 1)^{−1} E{S²_g(1 − S_g)} ]   (13)

by Bayes' Theorem, the Law of Total Probability, and Lemmas 1 and A.2. Suppose first that there is a single saturation s. Then (12) and (13) simplify to yield

  |Q_0(¯c, n)| = ¯c s(1 − s)² / (n − 1),   |Q_1(¯c, n)| = ¯c s³(1 − s) / (n − 1),

so that Q_0(¯c, n) and Q_1(¯c, n) are both invertible for any n and all ¯c greater than zero provided that 0 < s < 1. The identifying power of this "degenerate" randomized saturation design, however, is weak: Q_0, Q_1 are arbitrarily close to being singular for any ¯c if n is sufficiently large. Consider next a so-called "cluster randomized" experiment in which there are two saturations, 0 and 1, and P(S_g = 1) = p. Calculating the expectations in (12) and (13),

  Q_0(¯c, n) = [ 1 − p   0
                 0       0 ],   Q_1(¯c, n) = [ p      ¯c p
                                               ¯c p   ¯c² p ].

In this case neither Q_0 nor Q_1 is invertible for any values of n and ¯c. Finally, consider a design with two distinct, equally likely saturations s_L < s_H. For this design, straightforward but tedious algebra gives

  |Q_0(¯c, n)| = ¯c²(1 − s_L)(1 − s_H)(s_H − s_L)² / 4 + ¯c [(1 − s_L) + (1 − s_H)] [s_L(1 − s_L) + s_H(1 − s_H)] / (4(n − 1))

  |Q_1(¯c, n)| = ¯c² s_L s_H (s_H − s_L)² / 4 + ¯c (s_L + s_H) [s²_L(1 − s_L) + s²_H(1 − s_H)] / (4(n − 1)).

So long as neither s_L nor s_H equals zero or one, both terms in each expression are strictly positive for any ¯c > 0, so that Q_0 and Q_1 are invertible. Moreover, in contrast to the single saturation design discussed above, this design does not suffer from a weak identification problem. While the second term in each of the preceding equalities vanishes for large n, the first term does not. Thus, two interior saturations are sufficient to strongly identify the linear potential outcomes model.

As the three preceding examples show, two distinct sources of experimental variation determine the rank of Q_0(¯c, n) and Q_1(¯c, n): "between" saturation variation and "within" saturation variation. Our first example lacks "between" variation because each group is assigned the same saturation, S_g = s. Yet even with a single saturation, there is still "within" variation under Assumption 2, because the number of offers made to a given group is random. This "within" variation, however, is negligible when n is large. In our second example, the cluster randomized experiment, the situation is reversed. Because everyone in a given group is either offered (S_g = 1) or unoffered (S_g = 0), this design generates no "within" variation. While a cluster randomized design does generate some "between" variation, it is too coarse to identify our effects of interest: under our assumptions ¯D_ig equals zero when S_g = 0 and ¯C_ig when S_g = 1. Our third example, with two interior saturations 0 < s_L < s_H < 1, generates "between" variation that remains informative even when n is so large that "within" variation becomes negligible.

If ¯C_ig were observed, a handful of just-identified IV regressions would suffice to estimate the causal effects from Theorem 3. While ¯C_ig is unobserved in practice, fortunately we can estimate it under one-sided non-compliance by comparing treatment take-up to the share of treatment offers, i.e.

  Ĉ_ig ≡ ¯D_ig / ¯Z_ig  if ¯Z_ig > 0,   and   Ĉ_ig ≡ 0  otherwise,   (14)

where we arbitrarily define Ĉ_ig = 0 if none of (i, g)'s neighbors are offered treatment.
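A minimal simulation, under assumed values for the offer probability and complier share, illustrates that the estimator in (14) concentrates around the true neighbor complier share as group size grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def max_chat_error(n, G, s=0.3, cbar=0.4):
    """Worst case of |Chat_ig - Cbar_ig| across G groups when each of a
    person's n - 1 neighbors is independently a complier w.p. cbar and
    offered treatment w.p. s (one-sided non-compliance: D = C * Z)."""
    C = rng.random((G, n - 1)) < cbar          # neighbors' complier status
    Z = rng.random((G, n - 1)) < s             # neighbors' offers
    Zbar = Z.mean(axis=1)
    Dbar = (C & Z).mean(axis=1)                # take-up = complier and offered
    Cbar = C.mean(axis=1)
    Chat = np.divide(Dbar, Zbar, out=np.zeros_like(Dbar), where=Zbar > 0)
    return np.abs(Chat - Cbar).max()

# Consistent with Lemma 4 below, the worst-case error shrinks as n grows.
errors = [max_chat_error(n, G=200) for n in (26, 101, 401, 1601)]
assert errors[0] > errors[-1]
```

The offer probability `s`, the complier share `cbar`, and the grid of group sizes are hypothetical choices made purely for illustration.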
In this section we use (14) to derive feasible, consistent, and asymptotically normal estimators of the direct and indirect causal effects identified in section 3. For simplicity, we assume throughout that the random saturation S_g is bounded below by s > 0. Because we cannot estimate ¯C_ig when S_g = 0, experiments that include a 0% saturation require a slightly different approach. We explain these differences in Appendix B. (In general, sufficient conditions for Assumption 7(ii) will depend on the specific choice of basis functions f. For large n, however, a necessary condition is that the design contains at least as many distinct interior saturations as there are elements in f. For details, see Appendix C. Note also that under Assumption 2 it is possible, although unlikely, that ¯Z_ig could be zero even if S_g > 0.)

In the interest of brevity, we introduce shorthand notation and high-level regularity conditions that apply to all four of our sample analogue estimators. These take the form

  ϑ̂ ≡ ( Σ_{g=1}^G Σ_{i=1}^{N_g} Ẑ_ig X′_ig )^{−1} ( Σ_{g=1}^G Σ_{i=1}^{N_g} Ẑ_ig Y_ig ),   Ẑ_ig ≡ R(Ĉ_ig, N_g)^+ W_ig,   (15)

where Y_ig is the outcome variable from Assumption 3, and M^+ denotes the Moore-Penrose inverse of a square matrix M. Table 1 gives the definitions of X_ig, R, and W_ig corresponding to each part of Theorem 2.

  Part   X_ig                         R     W_ig
  (i)    [1  D_ig]′ ⊗ f(¯D_ig)       Q     [1  Z_ig]′ ⊗ f(¯D_ig)
  (ii)   D_ig f(¯D_ig)               Q_1   f(¯D_ig)
  (iii)  Z_ig(1 − D_ig) f(¯D_ig)     Q_1   f(¯D_ig)
  (iv)   (1 − Z_ig) f(¯D_ig)         Q_0   f(¯D_ig)

Table 1: This table defines the shorthand from (15) for the four sample analogue estimators corresponding to the parts of Theorem 2. In each part, the vector of regressors is X_ig, the true instrument vector is Z_ig ≡ R(¯C_ig, N_g)^{−1} W_ig, and the estimated instrument vector is Ẑ_ig ≡ R(Ĉ_ig, N_g)^+ W_ig, where M^+ denotes the Moore-Penrose inverse of a square matrix M, and Ĉ_ig is as defined in (14). The functions Q, Q_0, Q_1 are as defined in (8)–(10).

The "estimated" instrument Ẑ_ig is a stand-in for the unobserved "true" instrument Z_ig ≡ R(¯C_ig, N_g)^{−1} W_ig. While R(¯C_ig, N_g) is invertible under Assumption 7, R(Ĉ_ig, N_g) may not be, since Ĉ_ig could fall outside the support of ¯C_ig or even equal zero. For this reason we define Ẑ_ig using the Moore-Penrose inverse, which always exists and coincides with the ordinary matrix inverse when R(Ĉ_ig, N_g) is indeed invertible. As G grows, so does the number of unknown values ¯C_ig that we must estimate to construct the instrument vectors Ẑ_ig. For this reason, we consider an asymptotic sequence in which the minimum group size n grows along with the number of groups G.
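The sample analogue estimator in (15) is straightforward to code. The sketch below is a generic implementation, not tied to any particular dataset: it forms Ẑ_ig with the Moore-Penrose inverse exactly as described, and includes a degenerate sanity check in which the instrument equals the regressor so that (15) reduces to least squares.

```python
import numpy as np

def iv_estimate(X, W, Chat, N, Y, R_fn):
    """Sample analogue of (15): Zhat_ig = R(Chat_ig, N_g)^+ W_ig, then
    thetahat = (sum Zhat X')^{-1} (sum Zhat Y). Rows of X and W correspond
    to individuals; R_fn maps (chat, n) to the square matrix R."""
    den, num = 0.0, 0.0
    for x, w, chat, n_g, y in zip(X, W, Chat, N, Y):
        zhat = np.linalg.pinv(R_fn(chat, n_g)) @ w   # Moore-Penrose inverse
        den = den + np.outer(zhat, x)
        num = num + zhat * y
    return np.linalg.solve(den, num)

# Degenerate check: with W = X exogenous and R the identity, the estimator
# recovers theta exactly in a noiseless linear model.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
theta = np.array([1.0, 2.0])
Y = X @ theta
est = iv_estimate(X, X, np.zeros(500), np.full(500, 10), Y, lambda c, n: np.eye(2))
assert np.allclose(est, theta)
```

In an application, `R_fn` would be one of Q, Q_0, Q_1 evaluated via (12)–(13) or by simulating the design, and `Chat` would hold the estimates from (14).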
Under appropriate assumptions, letting the minimum group size n grow with G implies that the limit behavior of ϑ̂ coincides with that of the infeasible estimator that uses the true instrument Z_ig instead of its estimate Ẑ_ig. (While ¯C_ig can vary across individuals in the same group, it takes on at most two distinct values for fixed g. If a group contains T total individuals, of whom c are compliers and n are never-takers, then the share of compliers among a given person's neighbors is either (c − 1)/(T − 1) if she is a complier or c/(T − 1) if she is a never-taker. Thus, the number of incidental parameters is 2G.)

Like Baird et al. (2018), we take an infinite population approach to inference, assuming that the researcher observes a random sample of size G from a population of groups. Unlike Baird et al. (2018), we allow these groups to differ in size. Upon drawing a group g from the population, we observe the group-level random variables (S_g, N_g) along with the individual-level random variables (Y_ig, D_ig, Z_ig) for each member of the group: 1 ≤ i ≤ N_g. We further assume that observations are identically distributed, but not independent, within groups.

Groups are only observed as a unit: either everyone from the group appears in the sample or no one does. For this reason, some care is needed in defining random variables to represent our sampling procedure and expectations to represent the population averages that define our causal effects of interest. The expectations in Theorems 2–3 are averages that give equal weight to each individual in the population, or sub-population if we condition on C_ig. Analogously, the estimator in (15) is an average that gives equal weight to each individual in the sample. Both of these are precisely what we want, as our goal is to identify and estimate average causal effects for individuals. Under the sampling assumptions introduced in the preceding paragraph, (Y_ig, D_ig, Z_ig, ¯D_ig) are random variables that are drawn by choosing a group uniformly at random from the population of groups, and then a single person from the chosen group. If all groups were the same size, this would be equivalent to choosing a person uniformly at random from the population of individuals. When groups vary in size, however, the equivalence no longer holds.
This creates the possibility for ambiguity when taking the expectation of an individual-level random variable, such as Y_ig, without conditioning on group size: is the expectation intended to give equal weight to groups or individuals? Fortunately this is only a question of defining appropriate notation. Our group sampling procedure unambiguously gives equal weight to each individual in the population because we observe not isolated individuals but whole groups. While small groups are just as likely to be drawn as large groups, large groups make a greater contribution to the sample averages from (15) because they contain more people. (The assumption that observations are identically distributed within a group amounts to stipulating that the indices 1 ≤ i ≤ N_g are assigned at random. To see how the ambiguity arises, consider a population of 100 groups, half of which have 5 members and the rest of which have 15 members, so that 250 of the 1000 people in the population belong to a small group and the remaining 750 belong to a large group. Suppose first that we choose a single group at random and then a single person within the selected group. Then someone from a small group has probability 1/500 of being selected while someone from a large group has probability 1/1500 of being selected. Suppose instead that we randomly sample 10 groups and observe everyone in the selected groups. Then, on average, our sample will contain 5 small groups and 5 large groups. While the total sample size is random, we will on average observe 100 people, of whom 25 come from small groups and the rest from large groups, matching the shares of each kind of individual in the population.) The question is merely how to represent this weighting in our notation. Let ρ_g ≡ N_g / E(N_g) denote the relative size of group g.
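The numerical example above can be made concrete in a few lines of Python: with 100 groups, half of size 5 and half of size 15, the group-weighted and individual-weighted expectations of a variable that happens to equal group size differ (10 versus 12.5).

```python
import numpy as np

# The population from the example: 100 groups, half of size 5, half of size 15.
N = np.array([5] * 50 + [15] * 50, dtype=float)
rho = N / N.mean()                    # relative group size, rho_g = N_g / E(N_g)

# A hypothetical individual-level variable that equals group size, chosen so
# that the two weighting conventions visibly disagree.
Y = N.copy()

group_weighted = Y.mean()             # E[Y]: one random person per random group
indiv_weighted = (rho * Y).mean()     # E[rho Y]: every member of a random group

assert group_weighted == 10.0
assert indiv_weighted == 12.5
```

Any individual-level variable uncorrelated with group size would give the same answer under both conventions; the divergence arises exactly when the variable and N_g are dependent.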
We write E[Y_ig] to denote the average that gives equal weight to groups, choosing one person at random from a randomly-chosen group, and E[ρ_g Y_ig] to denote the average that gives equal weight to individuals, observing an entire group chosen at random. It is the latter expectation that appears in our asymptotic results below, as it denotes the population equivalent of the double sums from (15). While this is a slight abuse of notation, expectations from section 3 above that involve individual-level random variables but do not condition on group size should be interpreted as (implicitly) weighting by relative group size. Using the notation and sampling scheme defined above, we now state high-level sufficient conditions for the consistency of ϑ̂ from Equation 15.

Theorem 4.
Let ρ_g ≡ N_g / E(N_g) and suppose that

(i) we observe a random sample of G groups, where observations within a given group are identically distributed although not necessarily independent,
(ii) Y_ig = X′_ig ϑ + U_ig for 1 ≤ g ≤ G, 1 ≤ i ≤ N_g,
(iii) E(ρ_g Z_ig U_ig) = 0 and E(ρ_g Z_ig X′_ig) = I,
(iv) E[ ρ²_g ||Z_ig X′_ig||² ] = o(G),
(v) E[ ρ²_g ||Z_ig U_ig||² ] = o(G),
(vi) || Σ_{g=1}^G N_g^{−1} Σ_{i=1}^{N_g} ρ_g (Ẑ_ig − Z_ig) X′_ig || = o_P(G), and
(vii) || Σ_{g=1}^G N_g^{−1} Σ_{i=1}^{N_g} ρ_g (Ẑ_ig − Z_ig) U_ig || = o_P(G).

Then ϑ̂, defined in (15), is consistent for ϑ as G → ∞.

Condition (i) of Theorem 4 simply restates our group sampling assumption. Conditions (ii) and (iii) hold under the assumptions of Theorem 2, as shown in the proof of that result: for each average effect ϑ from the theorem, we can define an appropriate error term U_ig, vector of regressors X_ig, and vector of instruments Z_ig such that Y_ig = X′_ig ϑ + U_ig where Z_ig is an exogenous and relevant instrument. Moreover, for each part of Theorem 2, E(ρ_g Z_ig X′_ig) equals the identity matrix. (For effects that condition on C_ig = c, e.g. those from parts (ii) and (iii) of Theorem 2, the appropriate definition of ρ_g becomes N_g E[1(C_ig = c)] / E[N_g 1(C_ig = c)]. Also, given that E(ρ_g Z_ig X′_ig) = I, we could have defined our estimator as a sample average of Ẑ_ig Y_ig rather than as ϑ̂; it is more convenient both for our asymptotic derivations and for practical implementation, however, to work with an IV estimator.) Conditions (iv) and (v) of Theorem 4 would be implied by requiring that the second moments of ρ_g Z_ig X′_ig and ρ_g Z_ig U_ig exist and are bounded. We do not impose this requirement because the distribution of ρ_g necessarily changes with G if we consider an asymptotic sequence in which the minimum group size n increases with the number of groups, as we will assume below.
Requiring the relevant expectations to be o(G) in principle allows the variance of relative group size ρ_g to grow along with the number of groups, provided that it does not grow too quickly. Conditions (i)–(v) together are sufficient for the consistency of

  ϑ̃ ≡ ( Σ_{g=1}^G Σ_{i=1}^{N_g} Z_ig X′_ig )^{−1} ( Σ_{g=1}^G Σ_{i=1}^{N_g} Z_ig Y_ig ),   (16)

an infeasible estimator that uses the true instrument vector Z_ig instead of its estimate Ẑ_ig. The final two conditions of Theorem 4 assume that Ẑ_ig is a sufficiently accurate estimator of Z_ig to ensure that ϑ̂ = ϑ̃ + o_P(1). In the setting we consider here, this will require a condition on how quickly the minimum group size n grows relative to G, as we discuss in detail below. Strengthening conditions (v) and (vii) and adding one further assumption implies that ϑ̂ is asymptotically normal.

Theorem 5.
Suppose that

(i) Var( N_g^{−1} Σ_{i=1}^{N_g} ρ_g Z_ig U_ig ) → Σ as G → ∞,
(ii) E[ ρ_g^{2+δ} ||Z_ig U_ig||^{2+δ} ] = o(G^{δ/2}) for some δ > 0, and
(iii) || Σ_{g=1}^G N_g^{−1} Σ_{i=1}^{N_g} ρ_g (Ẑ_ig − Z_ig) U_ig || = o_P(G^{1/2}).

Then, under the conditions of Theorem 4, √G (ϑ̂ − ϑ) →_d N(0, Σ).

Combined with the first four conditions of Theorem 4, (i) and (ii) from Theorem 5 are sufficient for the asymptotic normality of ϑ̃, the infeasible estimator defined in (16). Condition (i) implies that the rate of convergence of ϑ̃ is G^{−1/2}. Obtaining a rate of convergence that depends on the total number of individuals rather than groups in the sample would require assumptions that are implausible in typical applications of the randomized saturation design. (Obtaining the faster rate would require Var( N_g^{−1} Σ_{i=1}^{N_g} ρ_g Z_ig U_ig ) → 0 as G → ∞. Because we consider an asymptotic sequence in which the minimum group size grows with G, this is technically possible. It would, however, require both heterogeneity between groups and dependence within groups to vanish in the limit.) Conditions (ii) and (iii) strengthen (v) and (vii), respectively, from Theorem 4: (ii) is sufficient for the Lindeberg condition, which we use to establish a central limit theorem, while (iii) ensures that the limit distribution of the feasible estimator ϑ̂ coincides with that of the infeasible estimator ϑ̃. For (iii) to hold, we need the estimation error (Ẑ_ig − Z_ig) to be sufficiently small on average that the limiting behavior of ϑ̂ coincides with that of the infeasible estimator. We now provide low-level sufficient conditions for this to obtain. By definition,

  Ẑ_ig − Z_ig = [ R(Ĉ_ig, N_g)^+ − R(¯C_ig, N_g)^{−1} ] W_ig.   (17)

Accordingly, so long as R is a sufficiently well-behaved function, (Ẑ_ig − Z_ig) will be small if |Ĉ_ig − ¯C_ig| is.
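The claim that (Ẑ_ig − Z_ig) inherits the size of |Ĉ_ig − ¯C_ig| when R is well-behaved can be illustrated numerically. The sketch below uses the single-saturation case of Q_0 from (12), with hypothetical values of s, ¯c, and n, and shows that the gap between the Moore-Penrose inverse evaluated at a perturbed complier share and the exact inverse shrinks with the size of the perturbation.

```python
import numpy as np

def Q0(cbar, n, s=0.5):
    """Q0 from (12) when the design has a single saturation s, so that all
    expectations over S_g are degenerate. Values of s, cbar, n hypothetical."""
    return np.array([
        [1 - s,               cbar * s * (1 - s)],
        [cbar * s * (1 - s),  cbar**2 * s**2 * (1 - s)
                              + cbar * s * (1 - s) / (n - 1)],
    ])

cbar, n = 0.5, 50
gaps = [np.linalg.norm(np.linalg.pinv(Q0(cbar + eps, n)) - np.linalg.inv(Q0(cbar, n)))
        for eps in (0.1, 0.01, 0.001)]

# The instrument error shrinks with the perturbation |Chat - Cbar|.
assert gaps[0] > gaps[1] > gaps[2]
```

Away from singularity the matrix inverse is locally Lipschitz in ¯c, which is exactly the role Assumption 8 plays below.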
As shown in the following lemma, a sufficient condition for this difference to vanish uniformly over (i, g) is for the minimum group size n to be large relative to log G.

Lemma 4.
Suppose that 0 < s ≤ S_g and n ≤ N_g. Under Assumptions 1–2 and 4–6,

  max_{1 ≤ g ≤ G} ( max_{1 ≤ i ≤ N_g} | Ĉ_ig − ¯C_ig | ) = O_P( √(log G / n) )   as (n, G) → ∞.

The following regularity conditions are sufficient for R(Ĉ_ig, N_g)^+ − R(¯C_ig, N_g)^{−1} to inherit the asymptotic behavior of (Ĉ_ig − ¯C_ig).

Assumption 8 (Regularity Conditions for R). (i) R(¯c, n) is well-defined and symmetric for all ¯c ∈ [¯c_L/2, 1] and n ≥ n_0, where 0 < ¯c_L ≤ ¯C_ig; (ii) inf_{¯c ≥ ¯c_L/2, n ≥ n_0} σ_min(R(¯c, n)) > σ_0 > 0, where σ_min(M) denotes the minimum eigenvalue of M; (iii) ||R(¯c_1, n) − R(¯c_2, n)|| ≤ L { |¯c_1 − ¯c_2| + O(n^{−1/2}) } as n → ∞ for some 0 < L < ∞.

Parts (i) and (ii) of Assumption 8 require that R is well-defined and uniformly invertible over a range of values for ¯c that includes the support of ¯C_ig and excludes zero. Part (iii) is a variant of Lipschitz continuity that holds in the limit as n grows. These conditions are mild: they amount to a slight strengthening of the rank condition from Assumption 7. In the linear basis function example from (12) and (13), for instance, Assumption 8 holds whenever ¯C_ig is bounded away from zero and S_g takes on at least two distinct values between zero and one. (See the discussion in section 3 immediately following (12) for details.) More generally, provided that Assumption 7 holds, whenever ¯C_ig is bounded away from zero and the basis functions f are well-behaved, we can always extend the definitions of Q_0, Q_1 from (9)–(10) to ensure that Assumption 8 holds. See Appendix C for full details. Under this assumption, we can derive sufficient conditions on the rates at which G and n approach infinity to ensure that the difference between Ẑ_ig and Z_ig is negligible.

Theorem 6. Suppose that E[ ρ²_g ||W_ig X′_ig||² ] and E[ ρ²_g ||W_ig U_ig||² ] are both o(G).
Then, under condition (i) of Theorem 4 and the conditions of Lemma 4,

(i) log G / n → 0 is sufficient for conditions (vi)–(vii) of Theorem 4, and
(ii) G log G / n → 0 is sufficient for condition (iii) of Theorem 5.

Taken together, Theorems 4–6 establish that ϑ̂ from (15) is consistent, and asymptotically normal, in the limit as G and n grow at an appropriate rate. In practical terms, our estimators are appropriate for settings with many large groups, such as the experiment of Crépon et al. (2013). To implement them in practice, all that is required is to calculate the estimated instrument Ẑ_ig and then run the appropriate just-identified IV regression from Table 1 with standard errors clustered by group.

Conclusion

In this paper we have proposed methods to identify and estimate direct and indirect causal effects under one-sided non-compliance, using data from a randomized saturation experiment. Under appropriate assumptions, we show that the key source of unobserved heterogeneity is the share of compliers within a given group. In a setting with many large groups, this quantity can be estimated, yielding a simple IV estimator that is consistent and asymptotically normal in the limit as group size and the number of groups grow. A possible extension of the methods described above would be to consider settings with two-sided non-compliance. In this case our identification approach would condition on the share of always-takers in addition to the share of compliers. Another interesting extension would be to consider relaxing Assumption 5 to allow some dependence of individuals' take-up decisions on the offers of their peers. Work currently in progress explores this possibility.
A Proofs
The following lemma, taken from Constantinou and Dawid (2017), summarizes several useful properties of conditional independence that we use in our proofs below. The names attached to properties (i) and (iii)–(v) are taken from Pearl (1988). For the purposes of this document, we call the second property "redundancy."
Lemma A.1 (Axioms of Conditional Independence). Let
X, Y, Z, W be random vectors defined on a common probability space, and let h be a measurable function. Then:

(i) (Symmetry): X ⊥⊥ Y | Z ⟹ Y ⊥⊥ X | Z.
(ii) (Redundancy): X ⊥⊥ Y | Y.
(iii) (Decomposition): X ⊥⊥ Y | Z and W = h(Y) ⟹ X ⊥⊥ W | Z.
(iv) (Weak Union): X ⊥⊥ Y | Z and W = h(Y) ⟹ X ⊥⊥ Y | (W, Z).
(v) (Contraction): X ⊥⊥ Y | Z and X ⊥⊥ W | (Y, Z) ⟹ X ⊥⊥ (Y, W) | Z.

For simplicity, our proofs below freely use the Symmetry property without comment, although we reference the other properties when used. We also rely on the following corollary of Lemma A.1.
Corollary A.1. X ⊥⊥ Y | Z implies (X, Z) ⊥⊥ Y | Z.

Proof of Lemma 1.
Applying Corollary A.1 and the Decomposition property to Assumption 6(ii) yields Z_g ⊥⊥ (C_g, ¯C_ig) | (N_g, S_g). By the definition of conditional independence, it follows that the distribution of Z_g | (N_g, S_g, C_g, ¯C_ig) is the same as that of Z_g | (N_g, S_g):

  P(Z_g = z | N_g = n, S_g = s, C_g, ¯C_ig) = P(Z_g = z | N_g = n, S_g = s).   (A.1)

Now, define the shorthand A ≡ {N_g = n, S_g = s, C_g = c, ¯C_ig = ¯c} and let C(i) be the indices of all non-zero components of c, excluding the ith component, i.e. C(i) ≡ {j ≠ i : c_j = 1}. By the definition of ¯D_ig, the event {¯D_ig = ¯d} is equivalent to { Σ_{j≠i} C_jg Z_jg = ¯d (N_g − 1) }. Consequently,

  P(¯D_ig = ¯d | A, Z_ig) = P( Σ_{j≠i} C_jg Z_jg = ¯d (n − 1) | A, Z_ig ) = P( Σ_{j∈C(i)} Z_jg = ¯d (n − 1) | A, Z_ig ),

where the first equality uses the fact that A implies N_g = n, and the second uses the fact that A implies C_g = c, so we know precisely which of the indicators C_jg equal zero and which equal one. Under Assumption 2, (A.1) implies that the components of Z_g are iid Bernoulli(s) conditional on A. By our definition of C(i) it follows that, conditional on A, the subvector of Z_g that corresponds to C(i) constitutes an iid sequence of ¯c(n − 1) Bernoulli(s) random variables, each of which is independent of Z_ig. Hence, conditional on (A, Z_ig), we see that Σ_{j∈C(i)} Z_jg ∼ Binomial( ¯c(n − 1), s ).

Proof of Lemma 2.
Under (5), Y ig = X ′ ig B ig where B ig = ( α ig , β ig , γ ig , δ ig ) ′ . Now, let R ig ≡ (cid:8) S g , Z ig , N g , ¯ C ig , C ig , B ig (cid:9) and Λ ig ≡ diag (cid:8) , C ig , ¯ C ig , C ig ¯ C ig (cid:9) . From Lemma 1 we see that E [ ¯ D ig |R ] =¯ C ig S g . Since D ig = C ig Z ig under one-sided non-compliance and IOR, it follows that E [ X ′ ig |R ig ] = Z ′ ig Λ ig . Hence, E [ Z ig Y ig ] = E (cid:2) Z ig E ( X ′ ig |R ig ) B ig (cid:3) = E (cid:2)(cid:0) Z ig Z ′ ig (cid:1) ( Λ ig B ig ) (cid:3) E (cid:2) Z ig X ′ ig (cid:3) = E (cid:2) Z ig E (cid:0) X ′ ig |R ig (cid:1)(cid:3) = E (cid:2)(cid:0) Z ig Z ′ ig (cid:1) Λ ig (cid:3) since Z ig and B ig are R ig –measurable. Now, applying Decomposition and Corollary A.1 to part(ii) of Assumption 6 gives Z ig | = ( C ig , ¯ C ig , B ig ) | ( S g , N g ). Under Bernoulli offers, however, this con-ditional distribution does not involve N g , so we obtain( C ig , ¯ C ig , B ig ) | = Z ig | S g . (A.2)Similarly, applying Decomposition to part (ii) of Corollary A.1, we see that ( C ig , ¯ C ig , B ig ) | = S g .Combining this with (A.2), the Contraction axiom yields ( C ig , ¯ C ig , B ig ) | = ( Z ig , S g ), implying that( Z ig Z ′ ig ) is independent of both Λ ig and ( Λ ig B ig ). Accordingly, ϑ IV = (cid:8) E (cid:2)(cid:0) Z ig Z ′ ig (cid:1) Λ ig (cid:3)(cid:9) − E (cid:2)(cid:0) Z ig Z ′ ig (cid:1) ( Λ ig B ig ) (cid:3) = E [ Λ ig ] − E [ Λ ig B ig ] . y the definitions of ϑ IV , Λ ig and B ig it follows that α IV = E [ α ig ] , β IV = E [ C ig β ig ] E [ C ig ] , γ IV = E (cid:2) ¯ C ig γ ig (cid:3) E (cid:2) ¯ C ig (cid:3) , δ IV = E (cid:2) C ig ¯ C ig δ ig (cid:3) E (cid:2) C ig ¯ C ig (cid:3) . By iterated expectations over C ig , we obtain β IV = E [ β ig | C ig = 1] while γ IV = E (cid:2) ¯ C ig γ ig (cid:3) E (cid:2) ¯ C ig (cid:3) = Cov( ¯ C ig , γ ig ) + E ( ¯ C ig ) E ( γ ig ) E ( ¯ C ig ) = E [ γ ig ] + Cov( ¯ C ig , γ ig ) E ( ¯ C ig ) . 
Similarly, again taking iterated expectations over $C_{ig}$,
\[ \delta_{IV} = \frac{E[\bar{C}_{ig}\delta_{ig} \mid C_{ig} = 1]}{E[\bar{C}_{ig} \mid C_{ig} = 1]} = E[\delta_{ig} \mid C_{ig} = 1] + \frac{\mathrm{Cov}(\bar{C}_{ig}, \delta_{ig} \mid C_{ig} = 1)}{E(\bar{C}_{ig} \mid C_{ig} = 1)}. \]

Proof of Theorem 1.
Assumption 6(i) implies $(C_g, B_g) \perp\!\!\!\perp S_g \mid N_g$ by Weak Union and Decomposition. Combining this with Assumption 6(ii) gives
\[ (Z_g, S_g) \perp\!\!\!\perp (B_g, C_g) \mid N_g \tag{A.3} \]
by Contraction. Now let $C_{-ig}$ denote the subvector of $C_g$ that excludes element $i$. Applying Decomposition, Corollary A.1, and Weak Union to (A.3),
\[ (S_g, Z_g) \perp\!\!\!\perp (B_{ig}, C_{ig}, C_{-ig}, N_g) \mid (N_g, \bar{C}_{ig}) \tag{A.4} \]
because $\bar{C}_{ig}$ is a function of $(C_g, N_g)$. By Lemma 1,
\[ \bar{D}_{ig} \perp\!\!\!\perp C_{-ig} \mid (N_g, \bar{C}_{ig}, S_g, Z_{ig}). \tag{A.5} \]
Applying Decomposition to (A.4) gives $C_{-ig} \perp\!\!\!\perp (S_g, Z_{ig}) \mid (N_g, \bar{C}_{ig})$. Combining this with (A.5),
\[ (S_g, Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp C_{-ig} \mid (N_g, \bar{C}_{ig}) \tag{A.6} \]
by Contraction. Now, applying Weak Union, Decomposition, and Corollary A.1 to (A.4),
\[ (S_g, Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (B_{ig}, C_{ig}) \mid (C_{-ig}, \bar{C}_{ig}, N_g) \tag{A.7} \]
since $\bar{D}_{ig}$ is a function of $(Z_g, C_{-ig}, N_g)$. Finally, applying Contraction to (A.6) and (A.7),
\[ (S_g, Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (C_{-ig}, B_{ig}, C_{ig}) \mid (\bar{C}_{ig}, N_g) \]
and the result follows by a final application of Decomposition.

Proof of Lemma 3.
Define the shorthand $U \equiv Q(\bar{c}, n)$, $A \equiv Q_0(\bar{c}, n)$, and $B \equiv Q_1(\bar{c}, n)$ so that
\[ U = \begin{bmatrix} A + B & B \\ B & B \end{bmatrix}. \]
Using this notation, we are asked to show that $U$ is invertible if and only if $A$ and $B$ are both invertible, in which case $U^{-1} = V$ where
\[ V \equiv \begin{bmatrix} A^{-1} & -A^{-1} \\ -A^{-1} & A^{-1} + B^{-1} \end{bmatrix}. \]
The "if" direction follows by direct calculation: $VU = UV = I$. For the "only if" direction, suppose that $U$ is invertible. Partitioning $U^{-1}$ into blocks $(C, D, E, F)$ conformably with the partition of $U$, we have
\[ UU^{-1} = \begin{bmatrix} A + B & B \\ B & B \end{bmatrix}\begin{bmatrix} C & D \\ E & F \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} = \begin{bmatrix} C & D \\ E & F \end{bmatrix}\begin{bmatrix} A + B & B \\ B & B \end{bmatrix} = U^{-1}U. \]
We begin by showing that $A$ is invertible. Consider the product $UU^{-1}$. Multiplying the first row of $U$ by the first column of $U^{-1}$ gives the equation $AC + B(C + E) = I$; multiplying the second row of $U$ by the first column of $U^{-1}$ gives $B(C + E) = 0$. Combining these, $AC = I$. Now consider the product $U^{-1}U$. Multiplying the first row of $U^{-1}$ by the first column of $U$ gives $CA + (C + D)B = I$; multiplying the first row of $U^{-1}$ by the second column of $U$ gives $(C + D)B = 0$. Combining these, $CA = I$. Since $AC = CA = I$, we have shown that $A$ is invertible with $A^{-1} = C$.

We next show that $D = E = -C$. Consider again the product $UU^{-1}$. Multiplying the first row of $U$ by the second column of $U^{-1}$ gives $AD + B(D + F) = 0$; multiplying the second row of $U$ by the second column of $U^{-1}$ gives $B(D + F) = I$. Combining these, $AD = -I$, and because $A^{-1} = C$ we can solve this equation to yield $D = -C$. Now consider $U^{-1}U$. Multiplying the second row of $U^{-1}$ by the first column of $U$ gives $EA + (E + F)B = 0$; multiplying the second row of $U^{-1}$ by the second column of $U$ gives $(E + F)B = I$. Combining these, $EA = -I$ and, solving for $E$, we have $E = -C$ since $A^{-1} = C$.

Finally we show that $B$ is invertible. Multiplying the second row of $U$ by the second column of $U^{-1}$ gives $B(D + F) = I$, but since $D = -C$ this becomes $B(F - C) = I$. Multiplying the second row of $U^{-1}$ by the second column of $U$ gives $(E + F)B = I$, and because $E = -C$ this becomes $(F - C)B = I$. Thus, $B(F - C) = (F - C)B = I$, so we have shown that $B$ is invertible with $B^{-1} = F - C$.

Proof of Theorem 2.
For each part, it suffices to find an appropriate outcome variable $\widetilde{Y}_{ig}$, regressor vector $\widetilde{X}_{ig}$, and instrument vector $\widetilde{Z}_{ig}$ such that we can write $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta$ is the parameter of interest, $E[\widetilde{Z}_{ig} U_{ig}] = 0$, and $E[\widetilde{Z}_{ig}\widetilde{X}_{ig}']$ is invertible. Note that $(\widetilde{X}_{ig}, \widetilde{Y}_{ig}, \widetilde{Z}_{ig})$ are placeholders for quantities that differ in each part of the proof: for part (i) they represent $(X_{ig}, Y_{ig}, Z^W_{ig})$ while for part (ii) they stand for $\big(D_{ig} f(\bar{D}_{ig}), D_{ig} Y_{ig}, Z^1_{ig}\big)$, for example. The definitions of $U_{ig}$ and $\vartheta$ are also specific to each part of the proof.

Part (i)
By (2) we can write $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta' \equiv \big[E(\theta_{ig}') \;\; E(\psi_{ig}' - \theta_{ig}' \mid C_{ig} = 1)\big]$, $\widetilde{Y}_{ig} \equiv Y_{ig}$, $\widetilde{X}_{ig} \equiv X_{ig}$, and $U_{ig} \equiv X_{ig}'(B_{ig} - \vartheta)$. Under IOR, $D_{ig} = C_{ig} Z_{ig}$. Hence, defining $M_{ig} \equiv \mathrm{diag}\{1, C_{ig}\} \otimes I_K$,
\[ X_{ig} = \left( \begin{bmatrix} 1 \\ C_{ig} \end{bmatrix} \odot \begin{bmatrix} 1 \\ Z_{ig} \end{bmatrix} \right) \otimes f(\bar{D}_{ig}) = \left( \begin{bmatrix} 1 & 0 \\ 0 & C_{ig} \end{bmatrix} \otimes I_K \right)\left( \begin{bmatrix} 1 \\ Z_{ig} \end{bmatrix} \otimes f(\bar{D}_{ig}) \right) = M_{ig} W_{ig}. \]
Since $M_{ig}$ is symmetric, $U_{ig} = W_{ig}'[M_{ig}(B_{ig} - \vartheta)]$. Thus, taking $\widetilde{Z}_{ig} \equiv Z^W_{ig}$, we have
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ E\big[\widetilde{Z}_{ig} U_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ Q(\bar{C}_{ig}, N_g)^{-1} E\big[W_{ig} W_{ig}' M_{ig}(B_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} \]
by iterated expectations. By assumption $(Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (C_{ig}, B_{ig}) \mid (\bar{C}_{ig}, N_g)$. Hence,
\[ E\big[W_{ig} W_{ig}' M_{ig}(B_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] = E\big[W_{ig} W_{ig}' \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[M_{ig}(B_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \]
by Decomposition, since $W_{ig} W_{ig}'$ is a measurable function of $(Z_{ig}, \bar{D}_{ig})$ and $M_{ig}(B_{ig} - \vartheta)$ is a measurable function of $(C_{ig}, B_{ig})$. Substituting into the expression for $E[\widetilde{Z}_{ig} U_{ig}]$,
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ E\big[M_{ig}(B_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E[M_{ig}(B_{ig} - \vartheta)] \]
by iterated expectations, since $Q(\bar{C}_{ig}, N_g)^{-1} = E[W_{ig} W_{ig}' \mid \bar{C}_{ig}, N_g]^{-1}$. Now, substituting the definitions of $M_{ig}$, $B_{ig}$, and $\vartheta$,
\[ E[M_{ig}(B_{ig} - \vartheta)] = E\begin{bmatrix} \theta_{ig} - E(\theta_{ig}) \\ C_{ig}\big[ (\psi_{ig} - \theta_{ig}) - E(\psi_{ig} - \theta_{ig} \mid C_{ig} = 1) \big] \end{bmatrix} = 0 \]
since $E[C_{ig}(\psi_{ig} - \theta_{ig})] = E(C_{ig})\, E(\psi_{ig} - \theta_{ig} \mid C_{ig} = 1)$. Therefore $E[\widetilde{Z}_{ig} U_{ig}] = 0$.
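The orthogonality step just established hinges on the identity $E[C_{ig}(\psi_{ig} - \theta_{ig})] = E(C_{ig})\,E(\psi_{ig} - \theta_{ig} \mid C_{ig} = 1)$, which is iterated expectations over a binary variable. A quick numerical sanity check, with illustrative distributions that are not taken from the paper:

```python
import numpy as np

# Check of E[C * (psi - theta)] = E(C) * E(psi - theta | C = 1) for binary C.
# Distributions are illustrative placeholders, not the paper's.
rng = np.random.default_rng(1)
n = 1_000_000
c = rng.binomial(1, 0.4, size=n)                    # compliance indicator
psi_minus_theta = rng.normal(loc=2.0 * c, size=n)   # heterogeneity that depends on C

lhs = (c * psi_minus_theta).mean()
rhs = c.mean() * psi_minus_theta[c == 1].mean()
assert abs(lhs - rhs) < 1e-8
```

Because $C_{ig}$ is binary, the identity holds exactly in any finite sample, so only floating-point error separates the two sides.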
Similarly,
\[ E[\widetilde{Z}_{ig}\widetilde{X}_{ig}'] = E\Big\{ Q(\bar{C}_{ig}, N_g)^{-1} E\big[W_{ig} W_{ig}' M_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ Q(\bar{C}_{ig}, N_g)^{-1} E\big[W_{ig} W_{ig}' \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[M_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E[M_{ig}]. \]
Since $E[M_{ig}]$ is invertible if and only if $E(C_{ig}) \neq 0$, it follows that $E[\widetilde{Z}_{ig}\widetilde{X}_{ig}']$ is invertible by Assumption 7.

Part (ii)
Since $D_{ig}^2 = D_{ig}$ and $D_{ig}(1 - D_{ig}) = 0$, multiplying both sides of (2) by $D_{ig}$ and simplifying gives $D_{ig} Y_{ig} = D_{ig} f(\bar{D}_{ig})'\psi_{ig}$. Thus $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta \equiv E(\psi_{ig} \mid C_{ig} = 1)$, $\widetilde{Y}_{ig} \equiv D_{ig} Y_{ig}$, $\widetilde{X}_{ig} \equiv D_{ig} f(\bar{D}_{ig})$, and $U_{ig} \equiv \big[D_{ig} f(\bar{D}_{ig})\big]'(\psi_{ig} - \vartheta)$. The remainder of the argument is similar to that of part (i). Taking $\widetilde{Z}_{ig} \equiv Z^1_{ig}$ and substituting $D_{ig} = Z_{ig} C_{ig}$ gives
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[C_{ig}(\psi_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ E\big[C_{ig}(\psi_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\big[C_{ig}(\psi_{ig} - \vartheta)\big]. \]
Since $E[C_{ig}\psi_{ig}] = E(C_{ig})\, E(\psi_{ig} \mid C_{ig} = 1) = E(C_{ig}\vartheta)$, we obtain $E(\widetilde{Z}_{ig} U_{ig}) = 0$. Similarly,
\[ E[\widetilde{Z}_{ig}\widetilde{X}_{ig}'] = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} C_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[C_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E(C_{ig}) I_K. \]
Hence, $E[\widetilde{Z}_{ig}\widetilde{X}_{ig}']$ is invertible by Assumption 7.

Part (iii)
Since $(1 - D_{ig})^2 = (1 - D_{ig})$ and $D_{ig}(1 - D_{ig}) = 0$, multiplying both sides of (2) by $Z_{ig}(1 - D_{ig})$ and simplifying gives $Z_{ig}(1 - D_{ig}) Y_{ig} = Z_{ig}(1 - D_{ig}) f(\bar{D}_{ig})'\theta_{ig}$. Thus we have $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta \equiv E(\theta_{ig} \mid C_{ig} = 0)$, $\widetilde{Y}_{ig} \equiv Z_{ig}(1 - D_{ig}) Y_{ig}$, $\widetilde{X}_{ig} \equiv Z_{ig}(1 - D_{ig}) f(\bar{D}_{ig})$, and $U_{ig} \equiv \big[Z_{ig}(1 - D_{ig}) f(\bar{D}_{ig})\big]'(\theta_{ig} - \vartheta)$. The remainder of the argument is similar to that of part (i). Taking $\widetilde{Z}_{ig} \equiv Z^1_{ig}$ and substituting $Z_{ig}(1 - D_{ig}) = Z_{ig}(1 - C_{ig})$ gives
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[(1 - C_{ig})(\theta_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ E\big[(1 - C_{ig})(\theta_{ig} - \vartheta) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\big[(1 - C_{ig})(\theta_{ig} - \vartheta)\big]. \]
Since $E[(1 - C_{ig})\theta_{ig}] = E(1 - C_{ig})\, E(\theta_{ig} \mid C_{ig} = 0) = E[(1 - C_{ig})\vartheta]$, we obtain $E(\widetilde{Z}_{ig} U_{ig}) = 0$. Similarly,
\[ E[\widetilde{Z}_{ig}\widetilde{X}_{ig}'] = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig}(1 - C_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ Q_1(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})' Z_{ig} \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[(1 - C_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E(1 - C_{ig}) I_K. \]
It follows that $E[\widetilde{Z}_{ig}\widetilde{X}_{ig}']$ is invertible by Assumption 7.

Part (iv)
Under one-sided non-compliance and IOR, $(1 - Z_{ig})(1 - D_{ig}) = (1 - Z_{ig})$. Hence, multiplying both sides of (2) by $(1 - Z_{ig})$ and using the fact that $Z_{ig}(1 - Z_{ig}) = 0$, we obtain $(1 - Z_{ig}) Y_{ig} = (1 - Z_{ig}) f(\bar{D}_{ig})'\theta_{ig}$. Thus we can write $\widetilde{Y}_{ig} = \widetilde{X}_{ig}'\vartheta + U_{ig}$ where $\vartheta \equiv E(\theta_{ig})$, $\widetilde{Y}_{ig} \equiv (1 - Z_{ig}) Y_{ig}$, $\widetilde{X}_{ig} \equiv (1 - Z_{ig}) f(\bar{D}_{ig})$, and $U_{ig} \equiv (1 - Z_{ig}) f(\bar{D}_{ig})'(\theta_{ig} - \vartheta)$. The remainder of the argument is similar to that of part (i). Taking $\widetilde{Z}_{ig} \equiv Z^0_{ig}$, we obtain
\[ E[\widetilde{Z}_{ig} U_{ig}] = E\Big\{ Q_0(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})'(1 - Z_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \, E\big[\theta_{ig} - \vartheta \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = E\Big\{ E\big[\theta_{ig} - E(\theta_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = 0 \]
and
\[ E[\widetilde{Z}_{ig}\widetilde{X}_{ig}'] = E\Big\{ Q_0(\bar{C}_{ig}, N_g)^{-1} E\big[f(\bar{D}_{ig}) f(\bar{D}_{ig})'(1 - Z_{ig}) \,\big|\, \bar{C}_{ig}, N_g\big] \Big\} = I_K. \]

Lemma A.2.
Under Assumptions 2 and 6, $(S_g, Z_{ig}) \perp\!\!\!\perp (C_{ig}, \bar{C}_{ig}, N_g, B_{ig})$.

Proof of Lemma A.2.
By Assumption 2, $Z_{ig} \perp\!\!\!\perp N_g \mid S_g$, and by Assumption 6(ii) and Decomposition, $Z_{ig} \perp\!\!\!\perp (C_{ig}, B_{ig}) \mid (S_g, N_g)$. Combining these by Contraction yields
\[ Z_{ig} \perp\!\!\!\perp (C_g, B_{ig}, N_g) \mid S_g. \tag{A.8} \]
Now, by Assumption 6(i) we have $S_g \perp\!\!\!\perp (C_g, B_{ig}, N_g)$. Combining this with (A.8) by a second application of Contraction gives $(Z_{ig}, S_g) \perp\!\!\!\perp (C_g, B_{ig}, N_g)$. The result follows by a final application of Decomposition.

Proof of Theorem 3.
Assumptions 1–6 imply that $(Z_{ig}, \bar{D}_{ig}) \perp\!\!\!\perp (B_{ig}, C_{ig}) \mid (\bar{C}_{ig}, N_g)$ by Theorem 1. Hence Assumptions 1–7 are sufficient for the conclusions of Theorem 2 to hold. Now, by Lemma 1, Assumptions 1–2 and 4–6 imply that the conditional distribution of $\bar{D}_{ig} \mid (\bar{C}_{ig}, N_g, Z_{ig})$ is known. Moreover, by Lemma A.2, $Z_{ig} \perp\!\!\!\perp (\bar{C}_{ig}, N_g)$, so the distribution of $Z_{ig} \mid (\bar{C}_{ig}, N_g)$ is likewise known. It follows that $Q$, $Q_0$ and $Q_1$ are known functions of $(\bar{C}_{ig}, N_g)$. Since $N_g$ is observed, knowledge of $\bar{C}_{ig}$ is thus sufficient to identify the quantities
\[ E(\theta_{ig}), \quad E(\psi_{ig} - \theta_{ig} \mid C_{ig} = 1), \quad E(\psi_{ig} \mid C_{ig} = 1), \quad E(\theta_{ig} \mid C_{ig} = 0) \]
by the relevant parts of Theorem 2. Now, by iterated expectations,
\[ E(\theta_{ig} \mid C_{ig} = 1) = E(\theta_{ig} \mid C_{ig} = 0) + \frac{1}{E(C_{ig})}\big[ E(\theta_{ig}) - E(\theta_{ig} \mid C_{ig} = 0) \big]. \]
Since $E(C_{ig}) = E(D_{ig} \mid Z_{ig} = 1)$, it follows that $E(\theta_{ig} \mid C_{ig} = 1)$ is identified. Under IOR and one-sided non-compliance, $\{D_{ig} = 1\} = \{C_{ig} = 1, Z_{ig} = 1\}$, and applying Weak Union and Decomposition to Lemma A.2, we see that $Z_{ig} \perp\!\!\!\perp B_{ig} \mid C_{ig}$. Thus,
\[ E(B_{ig} \mid D_{ig} = 1) = E(B_{ig} \mid C_{ig} = 1, Z_{ig} = 1) = E(B_{ig} \mid C_{ig} = 1). \]
The result follows since $Y_{ig}(d, \bar{d}) = f(\bar{d})'\theta_{ig} + d\, f(\bar{d})'(\psi_{ig} - \theta_{ig})$ under Assumption 3.

Proof of Theorem 4.
Substituting the model into the definition of $\widehat{\vartheta}$ and writing $\rho_g \equiv N_g/E(N_g)$,
\[ \widehat{\vartheta} - \vartheta = \left( \sum_{g=1}^{G}\sum_{i=1}^{N_g} \widehat{Z}_{ig} X_{ig}' \right)^{-1} \sum_{g=1}^{G}\sum_{i=1}^{N_g} \widehat{Z}_{ig} U_{ig} = \left( \frac{1}{G}\sum_{g=1}^{G} A_g + \frac{1}{G}\sum_{g=1}^{G} R^{(1)}_g \right)^{-1}\left( \frac{1}{G}\sum_{g=1}^{G} P_g + \frac{1}{G}\sum_{g=1}^{G} R^{(2)}_g \right) \]
where we define
\[ A_g \equiv \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g Z_{ig} X_{ig}', \quad R^{(1)}_g \equiv \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g(\widehat{Z}_{ig} - Z_{ig})X_{ig}', \quad P_g \equiv \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g Z_{ig} U_{ig}, \quad R^{(2)}_g \equiv \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g(\widehat{Z}_{ig} - Z_{ig})U_{ig}. \]
By assumption, both $\big\|\sum_{g=1}^G R^{(1)}_g\big\|$ and $\big\|\sum_{g=1}^G R^{(2)}_g\big\|$ are $o_P(G)$, and thus
\[ \widehat{\vartheta} - \vartheta = \left( \frac{1}{G}\sum_{g=1}^{G} A_g + o_P(1) \right)^{-1}\left( \frac{1}{G}\sum_{g=1}^{G} P_g + o_P(1) \right). \]
Now, since we observe a random sample of groups and $A_g$ is a group-level random variable,
\[ E\left( \frac{1}{G}\sum_{g=1}^{G} A_g \right) = E(A_g) = E\left[ \frac{1}{N_g}\sum_{i=1}^{N_g} E\big(\rho_g Z_{ig} X_{ig}' \,\big|\, N_g\big) \right] = E\big[ E\big(\rho_g Z_{ig} X_{ig}' \,\big|\, N_g\big) \big] = E(\rho_g Z_{ig} X_{ig}') \]
where the second equality uses iterated expectations and linearity, the third uses the assumption of identical distribution within groups, and the fourth uses iterated expectations a second time. Now consider an arbitrary entry $A^{(j,k)}_g$ of the matrix $A_g$ and let $\|\cdot\|_F$ denote the Frobenius norm. By the triangle and Cauchy-Schwarz inequalities, and using the assumption of identical distribution within groups, we have
\[ \mathrm{Var}\left( \frac{1}{G}\sum_{g=1}^{G} A^{(j,k)}_g \right) = \frac{1}{G}\mathrm{Var}\big(A^{(j,k)}_g\big) \leq \frac{1}{G} E\big[\|A_g\|_F^2\big] = \frac{1}{G} E\left[ \frac{1}{N_g^2}\Big\| \sum_{i=1}^{N_g} \rho_g Z_{ig} X_{ig}' \Big\|_F^2 \right] \leq \frac{1}{G} E\left[ \frac{1}{N_g^2} E\left( \sum_{i,j \leq N_g} \big\|\rho_g Z_{ig} X_{ig}'\big\|_F \big\|\rho_g Z_{jg} X_{jg}'\big\|_F \,\Big|\, N_g \right) \right] \leq \frac{1}{G} E\Big[ E\Big( \big\|\rho_g Z_{ig} X_{ig}'\big\|_F^2 \,\Big|\, N_g \Big) \Big] = \frac{1}{G} E\Big[ \rho_g^2 \big\|Z_{ig} X_{ig}'\big\|_F^2 \Big] \to 0, \]
since $E\big[\rho_g^2 \|Z_{ig} X_{ig}'\|_F^2\big] < \infty$. Hence, by the $L^2$ weak law of large numbers, $G^{-1}\sum_{g=1}^G A_g \to_p E(\rho_g Z_{ig} X_{ig}') = I$. An analogous argument shows that $G^{-1}\sum_{g=1}^G P_g \to_p E(\rho_g Z_{ig} U_{ig}) = 0$. The result follows by the continuous mapping theorem.

Proof of Theorem 5.
Continuing the argument from the proof of Theorem 4, we have
\[ \sqrt{G}\big(\widehat{\vartheta} - \vartheta\big) = \big[I + o_P(1)\big]^{-1}\left( \frac{1}{\sqrt{G}}\sum_{g=1}^{G} P_g + \frac{1}{\sqrt{G}}\sum_{g=1}^{G} R^{(2)}_g \right). \]
By assumption, $\big\|\sum_{g=1}^G R^{(2)}_g\big\| = o_P(G^{1/2})$, and hence $\sqrt{G}(\widehat{\vartheta} - \vartheta) = G^{-1/2}\sum_{g=1}^G P_g + o_P(1)$. Thus, it suffices to apply the Lindeberg-Feller central limit theorem to $P_g/\sqrt{G}$. Because we observe a random sample of groups, $\mathrm{Var}\big(\sum_{g=1}^G P_g/\sqrt{G}\big) = \mathrm{Var}(P_g)$, which by assumption converges to $\Sigma$. All that remains is to verify the Lindeberg condition, namely
\[ E\Big[ \|P_g\|^2 \, \mathbb{1}\big\{\|P_g\| > \varepsilon\sqrt{G}\big\} \Big] \to 0 \quad \text{for all } \varepsilon > 0. \]
A sufficient condition for this to hold is $G^{-\delta/2} E\big[\|P_g\|^{2+\delta}\big] \to 0$ for some $\delta > 0$. Arguing as in the bound $E\big[\|A_g\|_F^2\big] \leq E\big[\rho_g^2\|Z_{ig} X_{ig}'\|_F^2\big]$ from the proof of Theorem 4, we likewise have
\[ G^{-\delta/2} E\Big[\|P_g\|^{2+\delta}\Big] \leq G^{-\delta/2} E\Big[\rho_g^{2+\delta}\|Z_{ig} U_{ig}\|^{2+\delta}\Big] = o(1), \]
so the result follows.

Lemma A.3.
Let $\bar{Z}_g \equiv \sum_{j=1}^{N_g} Z_{jg}/N_g$. Under the conditions of Lemma 4, $P(\bar{Z}_g < s/2) \leq \exp\{-ns^2/2\}$.

Proof of Lemma A.3.
Conditional on $(N_g = n', S_g = s')$, the treatment offers $(Z_{1g}, \ldots, Z_{N_g g})$ are a collection of $n'$ iid Bernoulli($s'$) random variables by Assumption 2. Hence, by Hoeffding's inequality,
\[ P\big(\bar{Z}_g < s/2 \,\big|\, N_g = n', S_g = s'\big) \leq \exp\big\{-2n'(s' - s/2)^2\big\} \leq \exp\big\{-ns^2/2\big\} \]
where the second inequality follows since $s \leq s'$ and $n \leq n'$. Thus,
\[ P(\bar{Z}_g < s/2) = \sum_{n', s'} P\big(\bar{Z}_g < s/2 \,\big|\, N_g = n', S_g = s'\big)\, P(N_g = n', S_g = s') \leq \exp\big\{-ns^2/2\big\} \]
by the law of total probability. The result follows since $P(\bar{Z}_g < s/2) \leq P(\bar{Z}_g \leq s/2)$.

Lemma A.4.
Let $\bar{C}_g \equiv \sum_{j=1}^{N_g} C_{jg}/N_g$ and $\widehat{C}_g \equiv \sum_{j=1}^{N_g} D_{jg}/(N_g \bar{Z}_g)$, where $\bar{Z}_g$ is as defined in Lemma A.3. Under the conditions of Lemma 4 and for any $t > 0$,
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| \geq t \,\Big|\, \bar{Z}_g \geq s/2 \Big) \leq 2\exp\big\{-ns^2 t^2/2\big\}. \]

Proof of Lemma A.4.
Let
\[ \mathcal{A} \equiv \big\{C_g = c, \; N_g = n', \; \bar{C}_g = \bar{c}, \; N_g\bar{Z}_g = m, \; S_g = s'\big\} \]
where $m > 0$, and consider first the case $\bar{c} > 0$. In this case,
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, \mathcal{A} \Big) = P\left( \left| \sum_{j=1}^{n'} \frac{c_j Z_{jg}}{m} - \bar{c} \right| > t \,\Bigg|\, \mathcal{A} \right) = P\left( \left| \frac{1}{n'\bar{c}} \sum_{j \in \mathcal{C}} Z^*_{jg} - \bar{c} \right| > t \,\Bigg|\, \mathcal{A} \right) \]
where $\mathcal{C} \equiv \{j : c_j = 1\}$ and $Z^*_{jg} \equiv n'\bar{c}\, Z_{jg}/m$. Given $\mathcal{A}$, the $\{Z_{jg}\}_{j\in\mathcal{C}}$ are a sequence of $n'\bar{c}$ draws made without replacement from a population of $m$ ones and $(n' - m)$ zeros. Thus,
\[ E(Z^*_{jg}) = \frac{n'\bar{c}}{m}\, P(Z_{jg} = 1 \mid \mathcal{A}) = \frac{n'\bar{c}}{m} \cdot \frac{m}{n'} = \bar{c}. \]
Moreover, since $Z_{jg} \in \{0, 1\}$, each of the $Z^*_{jg}$ is bounded between 0 and $n'\bar{c}/m$. While these random variables are identically distributed, they are not independent—like the $Z_{jg}$ from which they are constructed, $\{Z^*_{jg}\}_{j\in\mathcal{C}}$ are draws made without replacement from a finite population. Under this form of dependence, however, Hoeffding's inequality continues to apply (Hoeffding, 1963, p. 28), and hence
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, \mathcal{A} \Big) \leq 2\exp\left\{ -\frac{2t^2 m^2}{n'\bar{c}} \right\} \leq 2\exp\left\{ -2n'\left(\frac{m}{n'}\right)^2 t^2 \right\} \]
where the second inequality follows because $0 < \bar{c} \leq 1$. If $\bar{c} = 0$, we have
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, \mathcal{A} \Big) = P(|0 - 0| > t \mid \mathcal{A}) = 0 \leq 2\exp\left\{ -2n'\left(\frac{m}{n'}\right)^2 t^2 \right\}, \]
so this inequality holds for any $\bar{c}$. Applying the law of total probability as in the proof of Lemma A.3, we see that
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, N_g = n', N_g\bar{Z}_g = m \Big) \leq 2\exp\left\{ -2n'\left(\frac{m}{n'}\right)^2 t^2 \right\} \]
and thus
\[ P\Big( \big|\widehat{C}_g - \bar{C}_g\big| \geq t \,\Big|\, \bar{Z}_g \geq s/2 \Big) = \sum_{\{(m,n'):\, m/n' \geq s/2\}} P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > t \,\Big|\, N_g = n', N_g\bar{Z}_g = m \Big) \, P\big(N_g\bar{Z}_g = m, N_g = n' \,\big|\, \bar{Z}_g \geq s/2\big) \leq 2\exp\big\{-ns^2t^2/2\big\} \]
by a second application of the law of total probability, since $n \leq N_g$.

Lemma A.5.
Suppose that $sn > 2$. Then, under the conditions of Lemma 4,
\[ P\left( \max_{1 \leq i \leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \,\Bigg|\, \bar{Z}_g \geq s/2 \right) \leq 2\exp\big\{-ns^2 h(sn, t)^2/2\big\} \]
where we define
\[ h(x, t) \equiv \left( \frac{x - 2}{x} \right)^2 t - \left[ 1 + \left( \frac{x - 2}{x} \right)^2 \right] \frac{1}{x - 2}. \]

Proof of Lemma A.5.
If $\bar{Z}_g > s/2 > 1/n$, then $N_g\bar{Z}_g - Z_{ig} \geq N_g\bar{Z}_g - 1 > 0$. Hence,
\[ \widehat{C}_{ig} \equiv \frac{\bar{D}_{ig}}{\bar{Z}_{ig}} = \frac{N_g\bar{D}_g - D_{ig}}{N_g\bar{Z}_g - Z_{ig}} = \frac{N_g\bar{Z}_g\widehat{C}_g - D_{ig}}{N_g\bar{Z}_g - Z_{ig}} = \left( \frac{N_g\bar{Z}_g}{N_g\bar{Z}_g - Z_{ig}} \right)\widehat{C}_g - \frac{D_{ig}}{N_g\bar{Z}_g - Z_{ig}}. \]
Similar manipulations give
\[ \bar{C}_{ig} = \left( \frac{N_g}{N_g - 1} \right)\bar{C}_g - \frac{C_{ig}}{N_g - 1}, \]
so that
\[ \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \leq \left| \left( \frac{N_g\bar{Z}_g}{N_g\bar{Z}_g - Z_{ig}} \right)\widehat{C}_g - \left( \frac{N_g}{N_g - 1} \right)\bar{C}_g \right| + \left| \frac{C_{ig}}{N_g - 1} - \frac{D_{ig}}{N_g\bar{Z}_g - Z_{ig}} \right| \]
by the triangle inequality. Using the fact that $Z_{ig}$, $D_{ig}$, and $C_{ig}$ are binary, along with $n \leq N_g$ and $\bar{Z}_g > s/2 > 1/n$, tedious but straightforward algebra allows us to bound the right-hand side of the preceding inequality from above, yielding
\[ \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \leq \left( \frac{sn}{sn - 2} \right)^2 \big|\widehat{C}_g - \bar{C}_g\big| + \left[ \left( \frac{sn}{sn - 2} \right)^2 + 1 \right] \frac{1}{sn - 2}. \]
Since this upper bound for $|\widehat{C}_{ig} - \bar{C}_{ig}|$ does not depend on $i$, it follows that
\[ \max_{1 \leq i \leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \leq \left( \frac{sn}{sn - 2} \right)^2 \big|\widehat{C}_g - \bar{C}_g\big| + \left[ \left( \frac{sn}{sn - 2} \right)^2 + 1 \right] \frac{1}{sn - 2} \]
provided that $\bar{Z}_g > s/2 > 1/n$. In other words, so long as $sn > 2$,
\[ \big\{\bar{Z}_g \geq s/2\big\} \cap \left\{ \max_{1 \leq i \leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right\} \subseteq \big\{\bar{Z}_g \geq s/2\big\} \cap \Big\{ \big|\widehat{C}_g - \bar{C}_g\big| > h(sn, t) \Big\}. \]
Therefore, by the monotonicity of probability,
\[ P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \,\Bigg|\, \bar{Z}_g \geq s/2 \right) \leq P\Big( \big|\widehat{C}_g - \bar{C}_g\big| > h(sn, t) \,\Big|\, \bar{Z}_g \geq s/2 \Big) \]
and the result follows by Lemma A.4.

Proof of Lemma 4.
By the law of total probability, Lemma A.3, and Lemma A.5,
\[ P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right) \leq P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \,\Bigg|\, \bar{Z}_g \geq s/2 \right) + P(\bar{Z}_g < s/2) \leq 2\exp\big\{-ns^2 h(sn, t)^2/2\big\} + \exp\big\{-ns^2/2\big\} \]
where $h(\cdot, \cdot)$ is as defined in Lemma A.5. Expanding and simplifying, we see that
\[ h(sn, t) \geq \left( \frac{sn - 2}{sn} \right)^2 t - \frac{2}{sn - 2} \equiv h^*(sn, t). \]
Now, for any $t \geq 1$ we have $P\big( \max_{1\leq i\leq N_g} |\widehat{C}_{ig} - \bar{C}_{ig}| > t \big) = 0$, since both $\widehat{C}_{ig}$ and $\bar{C}_{ig}$ are between zero and one, so we may restrict attention to $t < 1$. Since $h^*(sn, t) \leq h(sn, t)$ and $h^*(sn, t) < t < 1$, it follows that
\[ P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right) \leq 2\exp\big\{-ns^2 h(sn, t)^2/2\big\} + \exp\big\{-ns^2/2\big\} \leq 2\exp\big\{-ns^2 h^*(sn, t)^2/2\big\} + \exp\big\{-ns^2/2\big\} \leq 3\exp\big\{-ns^2 h^*(sn, t)^2/2\big\}. \]
Applying the union bound, we obtain
\[ P\left( \max_{1\leq g\leq G}\max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right) = P\left( \bigcup_{g=1}^{G}\left\{ \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right\} \right) \leq \sum_{g=1}^{G} P\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| > t \right) \leq \sum_{g=1}^{G} 3\exp\big\{-ns^2 h^*(sn, t)^2/2\big\} = 3G\exp\big\{-ns^2 h^*(sn, t)^2/2\big\}, \]
and accordingly, setting $t = M\sqrt{\log G/n}$, we have
\[ P\left( \max_{1\leq g\leq G}\max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \sqrt{\frac{n}{\log G}} > M \right) \leq 3G\exp\left( -\frac{ns^2}{2}\left[ \left( \frac{sn - 2}{sn} \right)^2 M\sqrt{\frac{\log G}{n}} - \frac{2}{sn - 2} \right]^2 \right) = 3\exp\left( \log G\left[ 1 - \frac{s^2}{2}\left( \left( \frac{sn - 2}{sn} \right)^2 M - \frac{2}{sn - 2}\sqrt{\frac{n}{\log G}} \right)^2 \right] \right). \]
The expression on the right-hand side converges to $3\exp\big\{\log G\big[1 - s^2 M^2/2\big]\big\}$ as $(n, G) \to \infty$ and hence can be made arbitrarily small by choosing a sufficiently large value of $M$.

Proof of Theorem 6.
We provide the argument for condition (vii) of Theorem 4 and condition (iii) of Theorem 5 only. For (vi) from Theorem 4, simply replace $U_{ig}$ with $X_{ig}$ in the following derivations. By (17) and the triangle inequality,
\[ \left\| \sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g\big(\widehat{Z}_{ig} - Z_{ig}\big) U_{ig} \right\| \leq \Delta_G \sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \big\|\rho_g W_{ig} U_{ig}\big\| \tag{A.9} \]
where we define the shorthand
\[ \Delta_G \equiv \max_{1\leq g\leq G}\left( \max_{1\leq i\leq N_g} \Big\| R\big(\widehat{C}_{ig}, N_g\big)^{+} - R\big(\bar{C}_{ig}, N_g\big)^{-1} \Big\| \right). \]
Consider the second factor on the RHS of (A.9). By an argument similar to that used in the proof of Theorem 4,
\[ \frac{1}{G}\sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \big\|\rho_g W_{ig} U_{ig}\big\| \to_p E\big[\|\rho_g W_{ig} U_{ig}\|\big] < \infty, \]
so that $\sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \|\rho_g W_{ig} U_{ig}\| = O_P(G)$. Now, define $\mathbb{1}_G$ as the indicator of the event
\[ \left\{ \min_{1\leq g\leq G}\left( \min_{1\leq i\leq N_g} \widehat{C}_{ig} \right) \geq \bar{c}_L/2 \right\}. \]
By assumption $R(\bar{C}_{ig}, N_g)$ is invertible, and conditional on $\widehat{C}_{ig} \geq \bar{c}_L/2$, $R(\widehat{C}_{ig}, N_g)$ is likewise invertible. Hence, if $\mathbb{1}_G = 1$ we can write
\[ \Big\| R\big(\widehat{C}_{ig}, N_g\big)^{-1} - R\big(\bar{C}_{ig}, N_g\big)^{-1} \Big\| = \Big\| R\big(\widehat{C}_{ig}, N_g\big)^{-1}\Big[ R\big(\widehat{C}_{ig}, N_g\big) - R\big(\bar{C}_{ig}, N_g\big) \Big] R\big(\bar{C}_{ig}, N_g\big)^{-1} \Big\| \leq \Big\| R\big(\widehat{C}_{ig}, N_g\big)^{-1} \Big\| \, \Big\| R\big(\widehat{C}_{ig}, N_g\big) - R\big(\bar{C}_{ig}, N_g\big) \Big\| \, \Big\| R\big(\bar{C}_{ig}, N_g\big)^{-1} \Big\|. \]
Let $\|M\|_2$ denote the spectral norm of a matrix $M$, i.e. its largest singular value. Since $R(\bar{C}_{ig}, N_g)$ is square, symmetric, and positive definite, we have $\|R(\bar{C}_{ig}, N_g)^{-1}\|_2 \leq 1/\underline{\sigma} < \infty$. Similarly, if $\mathbb{1}_G = 1$, then $\|R(\widehat{C}_{ig}, N_g)^{-1}\|_2 \leq 1/\underline{\sigma} < \infty$. Because all finite-dimensional norms are equivalent, it follows that
\[ \mathbb{1}_G \Delta_G \leq K \max_{1\leq g\leq G}\left( \max_{1\leq i\leq N_g} \Big\| R\big(\widehat{C}_{ig}, N_g\big) - R\big(\bar{C}_{ig}, N_g\big) \Big\| \right) \leq K\left\{ \max_{1\leq g\leq G}\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \right) + O(n^{-1/2}) \right\} \]
where $0 < K < \infty$ denotes a generic, unspecified constant. Applying Lemma 4, we see that $\mathbb{1}_G \Delta_G = O_P\big(\sqrt{\log G/n}\big)$ as $(n, G) \to \infty$. Thus, by (A.9),
\[ \mathbb{1}_G \left\| \sum_{g=1}^{G} \frac{1}{N_g}\sum_{i=1}^{N_g} \rho_g\big(\widehat{Z}_{ig} - Z_{ig}\big) U_{ig} \right\| = O_P\left( \sqrt{\frac{\log G}{n}} \right) O_P(G). \tag{A.10} \]
If $\log G/n \to 0$ as $(n, G) \to \infty$, then the rate on the RHS of (A.10) becomes $o_P(G)$. If $G\log G/n \to 0$, it becomes $o_P(G^{1/2})$. Finally, since $\bar{c}_L \leq \bar{C}_{ig}$, it follows that
\[ P(\mathbb{1}_G \neq 1) \leq P\left[ \max_{1\leq g\leq G}\left( \max_{1\leq i\leq N_g} \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \right) \geq \bar{c}_L/2 \right]. \]
Hence, applying Lemma 4, if $\log G/n \to 0$ then $\mathbb{1}_G \to_p 1$. The result follows.
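The $\sqrt{\log G/n}$ rate delivered by Lemma 4 and used throughout the preceding proof can be illustrated by simulation. The design below (Bernoulli offers, a constant complier share, group-level complier-share estimates) is a deliberately simplified stand-in for the paper's setting, with all parameter values assumed for illustration:

```python
import numpy as np

# Illustration of the sqrt(log G / n) rate: the worst estimation error of the
# group-level complier share across G groups grows only logarithmically in G.
# The simulation design is illustrative, not the paper's.
rng = np.random.default_rng(5)
s, c_share = 0.5, 0.6    # offer saturation and complier share (assumed values)

def max_error(G, n):
    z = rng.binomial(1, s, size=(G, n))           # treatment offers
    c = rng.binomial(1, c_share, size=(G, n))     # complier indicators
    d = z * c                                     # take-up under one-sided non-compliance
    c_hat = d.sum(axis=1) / np.maximum(z.sum(axis=1), 1)
    return np.abs(c_hat - c.mean(axis=1)).max()

small, large = max_error(G=50, n=400), max_error(G=5000, n=400)
# A 100-fold increase in G inflates the worst-case error only mildly.
assert large < 5 * small
```

With $n$ fixed, moving from 50 to 5,000 groups multiplies the bound's $\sqrt{\log G}$ factor by only about $\sqrt{\log 5000/\log 50} \approx 1.5$, which is what the simulation reflects.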
B Experiments with a 0% Saturation
Some randomized saturation designs, including the experiment of Crépon et al. (2013), include a zero percent saturation, also known as a "pure control" condition. Under one-sided non-compliance, $S_g = 0$ implies $Z_{ig} = D_{ig} = \bar{D}_{ig} = 0$ for all $1 \leq i \leq N_g$. Accordingly, we cannot estimate the share of compliers $\widehat{C}_{ig}$ from (14) for groups assigned a saturation of zero. The easiest solution to this problem is simply to drop observations for any zero saturation groups. Under Assumptions 1–2 and 6 this has no effect on our identification or large-sample results, provided that we replace $Q$, $Q_0$ and $Q_1$ with expectations that condition on $S_g > 0$, namely
\[ \begin{aligned} \widetilde{Q}(\bar{c}, n) &\equiv E\big[W_{ig} W_{ig}' \,\big|\, \bar{C}_{ig} = \bar{c}, N_g = n, S_g > 0\big] \\ \widetilde{Q}_0(\bar{c}, n) &\equiv E\big[(1 - Z_{ig}) f(\bar{D}_{ig}) f(\bar{D}_{ig})' \,\big|\, \bar{C}_{ig} = \bar{c}, N_g = n, S_g > 0\big] \\ \widetilde{Q}_1(\bar{c}, n) &\equiv E\big[Z_{ig} f(\bar{D}_{ig}) f(\bar{D}_{ig})' \,\big|\, \bar{C}_{ig} = \bar{c}, N_g = n, S_g > 0\big]. \end{aligned} \]
Zero percent saturation groups, however, are informative: they pin down the value of $E[Y_{ig}(0, 0)] = f(0)'E[\theta_{ig}]$. To exploit this information, we replace the instrument vector from part (iv) of Theorem 2 with
\[ \widetilde{Z}_{ig} \equiv \begin{bmatrix} \mathbb{1}\{S_g > 0\}\, \widetilde{Q}_0(\bar{C}_{ig}, N_g)^{-1}(1 - Z_{ig}) f(\bar{D}_{ig}) \\ \mathbb{1}\{S_g = 0\} \end{bmatrix}. \]
Calculations similar to those in the proof of Theorem 2 establish that this is a valid and relevant instrument. Because its dimension exceeds that of $\theta_{ig}$ by one, this instrument vector provides over-identifying information. As such, the just-identified IV moment condition from part (iv) of Theorem 2 must be replaced with a linear GMM moment equation. Subject to this small change, estimation and inference can proceed almost exactly as in section 4: we merely substitute $\widehat{C}_{ig}$ for $\bar{C}_{ig}$ in $\widetilde{Q}_0$ to yield a feasible GMM estimator, e.g. two-stage least squares. With minor notational modifications, our large-sample results continue to apply.

C Extending the Definition of Q
Technically, the conditional expectations in (8)–(10) are only well-defined when $(n-1)\bar{c}$ is a non-negative integer, whereas Assumption 8 requires the functions $Q$, $Q_0$, and $Q_1$ to be defined over a continuous range of values for $\bar{c}$. This problem is easily solved by extending the definitions of $Q_0$ and $Q_1$. In many cases, the natural extension will be obvious. In the linear potential outcomes model, for example, (12) and (13) agree with (9) and (10) when these conditional expectations are well-defined, and satisfy all the conditions of Assumption 8.

More generally, we can always construct extended definitions of $Q_0$ and $Q_1$ to satisfy these regularity conditions. Here we provide one such construction based on linear interpolation. Let
\[ \bar{c}_\ell(\bar{c}, n) \equiv \frac{\lfloor (n-1)\bar{c} \rfloor}{n-1}, \qquad \bar{c}_u(\bar{c}, n) \equiv \frac{\lceil (n-1)\bar{c} \rceil}{n-1}. \]
By construction, $(n-1)\bar{c}_u(\bar{c}, n)$ and $(n-1)\bar{c}_\ell(\bar{c}, n)$ are always non-negative integers. Now let
\[ Q^\ell_z(\bar{c}, n) \equiv E\big[ \mathbb{1}(Z_{ig} = z) f(\bar{D}_{ig}) f(\bar{D}_{ig})' \,\big|\, \bar{C}_{ig} = \bar{c}_\ell(\bar{c}, n), N_g = n \big], \qquad Q^u_z(\bar{c}, n) \equiv E\big[ \mathbb{1}(Z_{ig} = z) f(\bar{D}_{ig}) f(\bar{D}_{ig})' \,\big|\, \bar{C}_{ig} = \bar{c}_u(\bar{c}, n), N_g = n \big] \]
for $z = 0, 1$. Notice that $Q^\ell_0$, $Q^\ell_1$ and $Q^u_0$, $Q^u_1$ are well-defined regardless of whether $(n-1)\bar{c}$ is an integer. From these ingredients, we construct extended definitions $Q^*_0$ and $Q^*_1$ of $Q_0$, $Q_1$ as
\[ Q^*_z(\bar{c}, n) = \big[1 - \omega(\bar{c}, n)\big] Q^\ell_z(\bar{c}, n) + \omega(\bar{c}, n)\, Q^u_z(\bar{c}, n); \qquad \omega(\bar{c}, n) \equiv \frac{\bar{c} - \bar{c}_\ell(\bar{c}, n)}{\bar{c}_u(\bar{c}, n) - \bar{c}_\ell(\bar{c}, n)} \in [0, 1] \]
for $z = 0, 1$. Since both $Q^\ell_z$ and $Q^u_z$ are symmetric and positive definite, their convex combination $Q^*_z$ is as well. To show that this construction satisfies Assumption 8(iii), define
\[ Q^\infty_0(\bar{c}) \equiv E\big[ (1 - S_g) f(\bar{c}S_g) f(\bar{c}S_g)' \big], \qquad Q^\infty_1(\bar{c}) \equiv E\big[ S_g f(\bar{c}S_g) f(\bar{c}S_g)' \big]. \]
Recall that $0 \leq S_g \leq 1$, $\bar{c}$ is a real number between zero and one, and $f$ is a $K$-vector of Lipschitz-continuous functions, all of which are bounded on $[0, 1]$. It follows that $Q^\infty_0$ and $Q^\infty_1$ are bounded and Lipschitz-continuous on $[0, 1]$, and
\[ \Big\| Q^\ell_z(\bar{c}, n) - Q^\infty_z\big(\bar{c}_\ell(\bar{c}, n)\big) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \qquad \Big\| Q^u_z(\bar{c}, n) - Q^\infty_z\big(\bar{c}_u(\bar{c}, n)\big) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \]
where $L$ denotes an arbitrary, finite, positive constant. Similarly,
\[ \Big\| Q^\infty_z(\bar{c}) - Q^\infty_z\big(\bar{c}_\ell(\bar{c}, n)\big) \Big\| \leq \frac{L}{n - 1}, \qquad \Big\| Q^\infty_z(\bar{c}) - Q^\infty_z\big(\bar{c}_u(\bar{c}, n)\big) \Big\| \leq \frac{L}{n - 1}. \]
Combining these inequalities and applying the triangle inequality, it follows that
\[ \Big\| Q^u_z(\bar{c}, n) - Q^\ell_z(\bar{c}, n) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \qquad \Big\| Q^u_z(\bar{c}, n) - Q^\infty_z(\bar{c}) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \qquad \Big\| Q^\ell_z(\bar{c}, n) - Q^\infty_z(\bar{c}) \Big\| \leq \frac{L}{\sqrt{n - 1}}, \]
where, again, $L$ is an arbitrary, finite, positive constant. Thus,
\[ \Big\| Q^*_z(\bar{c}, n) - Q^\infty_z(\bar{c}) \Big\| \leq \Big\| Q^*_z(\bar{c}, n) - Q^\ell_z(\bar{c}, n) \Big\| + \Big\| Q^\ell_z(\bar{c}, n) - Q^\infty_z(\bar{c}) \Big\| \leq \omega(\bar{c}, n)\Big\| Q^u_z(\bar{c}, n) - Q^\ell_z(\bar{c}, n) \Big\| + \frac{L}{\sqrt{n - 1}} \leq \frac{L}{\sqrt{n - 1}}, \]
using the definitions of $Q^*_z$ and $\omega(\bar{c}, n)$ from above. Combining all of the preceding inequalities,
\[ \Big\| Q^*_z\big(\widehat{C}_{ig}, N_g\big) - Q^*_z\big(\bar{C}_{ig}, N_g\big) \Big\| \leq L\left\{ \frac{1}{\sqrt{n - 1}} + \big|\widehat{C}_{ig} - \bar{C}_{ig}\big| \right\} \]
since $n \leq N_g$ and $Q^\infty_z$ is Lipschitz-continuous.

References
Akram, A. A., Chowdhury, S., Mobarak, A. M., 2018. Effects of emigration on rural labor markets.URL http://faculty.som.yale.edu/mushfiqmobarak/papers/migrationge.pdf
Altonji, J. G., Matzkin, R. L., 2005. Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica 73 (4), 1053–1102.
Anderson, A., Huttenlocher, D., Kleinberg, J., Leskovec, J., 2014. Engaging with massive online courses. In: Proceedings of the 23rd International Conference on World Wide Web. ACM, pp. 687–698.
Angelucci, M., De Giorgi, G., 2009. Indirect effects of an aid program: How do cash transfers affect ineligibles' consumption? American Economic Review 99 (1), 486–508.
Baird, S., Bohren, J. A., McIntosh, C., Özler, B., 2018. Optimal design of experiments in the presence of interference. Review of Economics and Statistics 100 (5), 844–860.
Banerjee, A. V., Chattopadhyay, R., Duflo, E., Keniston, D., Singh, N., 2012. Can institutions be reformed from within? Evidence from a randomized experiment with the Rajasthan police.
Barrera-Osorio, F., Bertrand, M., Linden, L. L., Perez-Calle, F., 2011. Improving the design of conditional transfer programs: Evidence from a randomized education experiment in Colombia. American Economic Journal: Applied Economics 3 (2), 167–195.
Bobba, M., Gignoux, J., 2014. Neighborhood effects and take-up of transfers in integrated social policies: Evidence from Progresa.
Bobonis, G. J., Finan, F., 2009. Neighborhood peer effects in secondary school enrollment decisions. The Review of Economics and Statistics 91 (4), 695–716.
Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D., Marlow, C., Settle, J. E., Fowler, J. H., 2012. A 61-million-person experiment in social influence and political mobilization. Nature 489 (7415), 295.
Bursztyn, L., Cantoni, D., Yang, D., Yuchtman, N., Zhang, J., 2019. Persistent political engagement: Social interactions and the dynamics of protest movements. Working paper.
Constantinou, P., Dawid, A. P., 2017. Extended conditional independence and applications in causal inference. The Annals of Statistics 45 (6), 2618–2653.
Crépon, B., Duflo, E., Gurgand, M., Rathelot, R., Zamora, P., 2013. Do labor market policies have displacement effects? Evidence from a clustered randomized experiment. The Quarterly Journal of Economics 128 (2), 531–580.
Dawid, A. P., 1979. Conditional independence in statistical theory. Journal of the Royal Statistical Society: Series B (Methodological) 41 (1), 1–15.
Duflo, E., Saez, E., 2003. The role of information and social interactions in retirement plan decisions: Evidence from a randomized experiment. The Quarterly Journal of Economics 118 (3), 815–842.
Eckles, D., Kizilcec, R. F., Bakshy, E., 2016. Estimating peer effects in networks with peer encouragement designs. Proceedings of the National Academy of Sciences 113 (27), 7316–7322.
Giné, X., Mansuri, G., 2018. Together we will: Experimental evidence on female voting behavior in Pakistan. American Economic Journal: Applied Economics 10 (1), 207–235.
Haushofer, J., Shapiro, J., 2016. The short-term impact of unconditional cash transfers to the poor: Experimental evidence from Kenya. The Quarterly Journal of Economics 131 (4), 1973–2042.
Heckman, J., Vytlacil, E., 1998. Instrumental variables methods for the correlated random coefficient model: Estimating the average rate of return to schooling when the return is correlated with schooling. Journal of Human Resources, 974–987.
Hoeffding, W., 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58 (301), 13–30.
Hudgens, M. G., Halloran, M. E., 2008. Toward causal inference with interference. Journal of the American Statistical Association 103 (482), 832–842.
Imai, K., Jiang, Z., Malani, A., 2018. Causal inference with interference and noncompliance in two-stage randomized experiments.
URL http://imai.princeton.edu/research/files/spillover.pdf
Imbens, G. W., Newey, W. K., 2009. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica 77 (5), 1481–1512.
Kang, H., Imbens, G., 2016. Peer encouragement designs in causal inference with partial interference and identification of local average network effects, 1–39.
URL http://arxiv.org/abs/1609.04464
Manski, C. F., 2013. Identification of treatment response with social interactions. Econometrics Journal 16 (1), 1–23.
Masten, M. A., Torgovitsky, A., 2016. Identification of instrumental variable correlated random coefficients models. Review of Economics and Statistics 98 (5), 1001–1005.
Miguel, E., Kremer, M., 2004. Worms: Identifying impacts on education and health in the presence of treatment externalities. Econometrica, 159–217.
Pearl, J., 1988. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.
Sinclair, B., McConnell, M., Green, D. P., 2012. Detecting spillover effects: Design and analysis of multilevel experiments. American Journal of Political Science 56 (4), 1055–1069.
Wooldridge, J. M., 1997. On two stage least squares estimation of the average treatment effect in a random coefficient model. Economics Letters 56 (2), 129–133.
Wooldridge, J. M., 2003. Further results on instrumental variables estimation of average treatment effects in the correlated random coefficient model. Economics Letters 79 (2), 185–191.
Wooldridge, J. M., 2004. Estimating average partial effects under conditional moment independence assumptions. Tech. rep., cemmap working paper.
Wooldridge, J. M., 2016. Instrumental variables estimation of the average treatment effect in the correlated random coefficient model. Advances in Econometrics 21, 93–116.
Yi, H., Song, Y., Liu, C., Huang, X., Zhang, L., Bai, Y., Ren, B., Shi, Y., Loyalka, P., Chu, J., et al., 2015. Giving kids a head start: The impact and mechanisms of early commitment of financial aid on poor students in rural China. Journal of Development Economics 113, 1–15.