Experimental Design under Network Interference∗

Davide Viviano†

June 15, 2020
Abstract
This paper discusses the problem of the design of a two-wave experiment under network interference. We consider (i) a possibly fully connected network, (ii) spillover effects occurring across neighbors, and (iii) local dependence of unobservable characteristics. We allow for a class of estimands of interest which includes the average effect of treating the entire network, the average spillover effects, average direct effects, and interactions of the latter two. We propose a design mechanism where the experimenter optimizes over participants and treatment assignments to minimize the variance of the estimators of interest, using the first-wave experiment for estimation of the variance. We characterize conditions on the first and second wave experiments to guarantee unconfounded experimentation, we showcase trade-offs in the choice of the pilot's size, and we formally characterize the pilot's size relative to the main experiment. We derive asymptotic properties of estimators of interest under the proposed design mechanism, and regret guarantees of the proposed method. Finally, we illustrate the advantage of the method over state-of-the-art methodologies on simulated and real-world networks.

∗I acknowledge support of the Social Science Computing Facilities at UC San Diego and HPC@UC awards from the San Diego Super Computer Center. I thank Jelena Bradic, Graham Elliott, James Fowler, Karthik Muralidharan, Yixiao Sun, and Kaspar Wüthrich for helpful comments and discussion. All mistakes are mine.
†University of California, San Diego. Correspondence: [email protected].

1 Introduction
This paper discusses the problem of experimental design with units embedded in a network. Motivated by applications in economic studies, we consider the main experiment to be conducted once, whereas a first-wave experiment is available to the researcher. Our goal is to understand how the researcher can best select participants and treatment assignments in the first and second-wave experiments for conducting precise inference on treatment effects.

We now discuss the main setting under consideration. We consider N units connected on an observed network, with spillovers occurring between neighbors. The network is potentially fully connected: differently from typical settings for clustered or saturation design experiments (Baird et al., 2018), no independent clusters are necessarily available. The following experimental protocol is considered: (1) researchers select a small sub-sample of individuals, and they implement a pilot study; (2) using information from such a study, they select participants in the main experiment, and treatment assignments of participants and their neighbors; (3) researchers collect end-line information on the outcome of interest of participating units. We consider a class of estimands of interest, which include as the main ones (i) the overall effect of treatment, (ii) the direct effect, (iii) the spillover effects, and interactions of the latter two. For example, in the presence of a cash transfer program (Barrera-Osorio et al., 2011), we may be interested in the effect on recipients (i.e., direct effects), as well as on those non-recipients living close to the recipients (i.e., spillovers), and on the sum of these effects (i.e., overall effect). Estimators under consideration include difference-in-means estimators and estimators obtained from linear models.

To the best of our knowledge, this paper provides the first statistical framework for the design and inference of a two-wave experiment under network interference.
Two main facts challenge two-wave experimentation on networks. First, dependence across observations may induce dependence of the pilot study with the outcomes in the second-wave experiment, and confound the assignment mechanism. Our first contribution consists in the design of the pilot study and second-wave experiment to guarantee precise and unconfounded experimentation. We derive conditions on the selection of participants in the second-wave experiment, which require that the pilot is "well separated" from the main experiment. We outline the existence of a trade-off in the choice of the pilot's size: a larger pilot guarantees more precise estimates of the variance components, at the expense of stricter conditions on the participants' selection in the main experiment. Motivated by this trade-off, we select the first-wave experiment by finding a maximum cut in the network, under additional constraints. We then compare the variance achieved by the resulting feasible design with the one achieved by an oracle design with access to the true variance of the estimator. We showcase that such a difference, rescaled by the sample size, converges to zero with the rate depending on the ratio of the sizes of the first and second wave experiments, and on the maximum degree of the network. This result permits to formally characterize the pilot's size relative to the main experiment.

[Footnote: See, for example, Muralidharan and Niehaus (2017).]
[Footnote: Usage of pilot studies is common practice; some examples include Karlan and Appel (2018); Karlan and Zinman (2008); DellaVigna and Pope (2018).]
[Footnote: This assumption is known as local interference (Manski, 2013), and it can be tested using, for instance, the framework in Athey et al. (2018). This is often assumed in practice (Egger et al., 2019; Dupas, 2014; Miguel and Kremer, 2004; Bhattacharya et al., 2013; Duflo et al., 2011) as well as in theoretical analysis (Forastiere et al., 2016; Leung, 2019b; Sinclair et al., 2012).]
A key step in our proof consists in deriving lower bounds to the oracle solution, under stricter constraints on the decision space of the experimenter, to permit comparability of the feasible and oracle solutions.

A second challenge is represented by the dependence of potential outcomes on individual and neighbors' treatment status. We allow for heteroskedastic variances and covariances, and we assign treatments and select units to minimize the variance in the main experiment. The optimization problem naturally leads to arbitrary dependence among treatment assignments. Motivated by this consideration, we derive asymptotic properties of the estimator under the proposed design, conditional on the assignment mechanism. We consider local dependency graphs, and we allow the maximum degree and the number of highly connected nodes to grow with the sample size at a slower rate.

Our mechanism imposes the following conditions: (a) effects may be heterogeneous in summary statistics of the network structure, such as the number of neighbors or centrality measures having discrete support; (b) unobservables are locally dependent with neighbors connected by finitely many edges (e.g., one- or two-degree neighbors). Such a model allows us to fully exploit network information in the design mechanism, and it encompasses a large number of economic examples from the literature: spillovers in public policy programs (Muralidharan et al., 2017), cash transfer programs (Egger et al., 2019), health programs (Dupas, 2014; Miguel and Kremer, 2004; Bhattacharya et al., 2013), educational programs (Duflo et al., 2011), advertising campaigns (Cai et al., 2015; Bond et al., 2012), among others.

We discuss extensions in the presence of partially observed networks. In this context, the pilot study is assumed to be conducted on an independent component of the network, and the unobserved edges are imputed under modeling assumptions.
Asymptotic inference on causal effects can be conducted, as long as neighbors' treatment status of the participants is observed. Such information can be obtained from a survey conducted on the participant individuals only. Finally, in the Appendix, we extend our framework to experiments where treatments are randomized, proposing a procedure for the design of the propensity score function.

We conclude our discussion with a set of simulation results. Using real-world networks, we illustrate the advantages of the method over state-of-the-art methodologies.

[Footnote: In the presence of continuous centrality measures, discretization is necessary for the validity of the results.]
Related Literature

The problem of experimental design has received growing attention in recent years. A simple approach for experimental design may consist of dividing units into clusters (Hudgens and Halloran, 2008). Methods in such a setting include clustered experiments (Eckles et al., 2017; Taylor and Eckles, 2018) and saturation design experiments (Baird et al., 2018; Basse and Feller, 2016; Pouget-Abadie, 2018).

Whereas extensions of clustered experiments to fully connected graphs are available (Ugander et al., 2013), two drawbacks characterize such designs: (i) they impose severe limitations on the set of causal estimands that may be considered, without allowing for separate identification of direct and spillover effects; (ii) they drastically reduce the effective sample size of the experiment. Saturation design experiments, instead, require mutually independent clusters, and they do not apply to a fully connected network. Optimal randomization for saturation design experiments also requires knowledge of the variance and covariance across individuals (Baird et al., 2018).

Recent literature in statistics and econometrics discusses design mechanisms under alternative modeling assumptions, without exploiting knowledge from a pilot study in the estimation of the variance. For instance, Basse and Airoldi (2018b) assume Gaussian outcomes, with dependence but no interference across units. Wager and Xu (2019) discuss instead sequential randomization for optimal pricing strategies under global interference, without discussing the problem of inference on treatment effects. A further reference is Kang and Imbens (2016), who discuss encouragement designs in the presence of interference, without focusing on the problem of variance-optimal design. Basse and Airoldi (2018a) discuss limitations of design-based causal inference under interference. Finally, Jagadeesan et al.
(2017) and Sussman and Airoldi (2017) discuss the design of experiments for estimating direct treatment effects only, whereas this paper considers a more general class of estimands, which may include overall and spillover effects.

We relate to a large literature on optimal experimental design in the i.i.d. setting for batch experiments, which can be divided into "one-stage" procedures (Harshaw et al., 2019; Kasy, 2016; Kallus, 2018; Barrios, 2014) and randomization "with a pilot study" (Bai, 2019; Tabord-Meehan, 2018). Our setting relates to this latter strand of literature. Dependence and interference across observations induce at least two challenges: (i) the use of a pilot restricts the selection of participants in the main experiment, due to possible dependence between pilot units and individuals in the main experiment; such restrictions motivate the "regret" analysis discussed in the current paper; (ii) the optimization problem induces arbitrary dependence in the treatment assignments, due to interference conditions, which, together with the dependence across units, motivates the asymptotic analysis discussed in this paper.

A further strand of literature to which we refer is inference under interference. References include Aronow et al. (2017), Choi (2017), Forastiere et al. (2016), Manski (2013), Leung (2019a), Vazquez-Bare (2017), Li et al. (2019), Athey et al. (2018), Ogburn et al. (2017), Goldsmith-Pinkham and Imbens (2013), and Sävje et al. (2017), among others. All these references discuss valid inferential procedures for treatment effects under interference, but they do not provide insights for variance-optimal designs. The asymptotic analysis in previous references often imposes either independence or weak dependence conditions on treatment assignments (Leung, 2019b; Chin, 2018; Ogburn et al., 2017; Kojevnikov et al., 2019) or independent clusters (e.g., Vazquez-Bare (2017)).
Finally, Viviano (2019) discusses small-sample guarantees for policy targeting under interference, without providing insights on either asymptotic inference or the design mechanism.
2 Setup

In this section, we discuss the setup, model, estimands, and estimators.
We consider the following setting: $N$ units are connected by an adjacency matrix $A$ and have outcomes $Y_i \in \mathbb{R}$ drawn from a super-population. The researcher samples $n \leq N$ units participating in the experiment. For each unit $i \in \{1, ..., N\}$ we denote

$$R_i = 1\{i \text{ is in the experiment}\}, \qquad D_i \in \{0, 1\},$$

respectively the participation indicator variable, which is equal to one if unit $i$ is sampled by the researcher and zero otherwise, and the treatment assignment indicator. We consider both $R_i$ and $D_i$ as decision variables in the "hands" of the experimenter. Once such variables are assigned, $n = \sum_{i=1}^N R_i$ denotes the total number of participants in the experiment. Throughout our discussion we interpret the treatment from an intention-to-treat perspective.

Notation
A short summary is provided in Table 2, contained in the Appendix. We denote the set of neighbors of each individual by $N_i = \{j \neq i : A_{i,j} = 1\}$, where $A_{i,j} = A_{j,i} \in \{0,1\}$ denotes the edge between individuals $i$ and $j$. We consider $A_{i,j} \in \{0,1\}$ and $A_{i,i} = 0$. We let $|N_i|$ denote the cardinality of the set $N_i$. Throughout our discussion we define $[n] := \{i : R_i = 1\}$, the set of all participants, and $[\tilde{n}]$ the set $[n] \cup \{\cup_j N_j : j \in [n]\}$ of all participants and their neighbors; we denote by $\tilde{n}$ the size of such a set. Finally, we denote $[N] = \{1, ..., N\}$ the set of all units of interest. We let $R_{[N]}$ be the vector of participation indicators, and similarly we denote $D_{[\tilde{n}]}$ the vector of treatment assignments of participants and their neighbors. Denote by $T_i \in \mathcal{T}$ arbitrary additional information on individual $i$. Let $\theta_i = f_i(A, T_{[N]}) \in \Theta$ be some arbitrary and observable statistic of individual $i$, which always contains the number of neighbors $|N_i|$, and may also depend on the network. $\theta_{[n]}$ denotes the vector of such characteristics for all participants.

We consider the following outcome model:

$$Y_i = r\Big(D_i, \sum_{k \in N_i} D_k, \theta_i, \varepsilon_i\Big), \qquad \varepsilon_i \mid A, T_{[N]}, \theta_i = l \sim P_l, \qquad \theta_i \in \Theta, \quad \forall i \in \{1, ..., N\}, \qquad (1)$$

where the function $r(\cdot)$ and $P_l$ are potentially unknown to the researcher, and $\varepsilon_i$ denotes unobservable characteristics. $\theta_i \in \Theta$ is assumed to be observable, and $\Theta$ is assumed to be a countable space (e.g., $\theta_i$ denoting the number of neighbors). The above model assumes that the network affects the outcome variable through the variable $\theta_i$ only.

The above model is motivated by its large use in applications, where treatment effects are assumed to propagate to first-order degree neighbors, and to possibly be heterogeneous in observable characteristics $\theta_i$ such as the number of neighbors. Similar assumptions can be found, for instance, in Leung (2019b).
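To fix ideas, model (1) can be simulated on a toy network. The chain graph, the functional form of $r(\cdot)$, and all parameter values below are assumptions made only for this sketch, not part of the paper's framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 6-node chain network; A is symmetric with zero diagonal.
N = 6
A = np.zeros((N, N), dtype=int)
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1

D = rng.integers(0, 2, size=N)   # treatment assignments D_i in {0, 1}
theta = A.sum(axis=1)            # theta_i = |N_i|, the number of neighbors
S = A @ D                        # treated neighbors, sum_{k in N_i} D_k

def r(d, s, l, e):
    # One illustrative (assumed) choice of r(.): linear in own treatment and
    # in the share of treated neighbors; the true r is unknown to the researcher.
    return 1.0 + 2.0 * d + 0.5 * s / max(l, 1) + e

eps = rng.normal(size=N)         # unobservables (drawn independently here)
Y = np.array([r(D[i], S[i], theta[i], eps[i]) for i in range(N)])
```

Note that the outcome of unit $i$ depends on the network only through its own treatment, the count of treated neighbors, and $\theta_i$, mirroring the anonymous local interference restriction.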
Throughout the rest of our discussion, we denote

$$E\Big[r\big(d, s, l, \varepsilon_i\big) \,\Big|\, \theta_i = l\Big] = m(d, s, l), \qquad (2)$$

the conditional mean given $\theta_i = l$, fixing the individual and neighbors' treatment assignments to be respectively $(d, s)$. Next, we discuss dependence among unobservables. Unobservables are assumed to be locally dependent conditional on the adjacency matrix, with nodes connected by at most $M$ edges. For instance, local dependence of degree one reads as follows:

$$\varepsilon_i \perp \varepsilon_{j \notin N_i} \mid A, \theta_i, \theta_j \qquad \text{but} \qquad \varepsilon_i \not\perp \varepsilon_{j \in N_i} \mid A, \theta_i, \theta_j.$$

Throughout the rest of our discussion we consider one-degree dependence only, as described in the above equation, and we leave extensions to higher-order degree dependence to Appendix A.1. We formalize such an assumption in the following lines.

[Footnote: For expositional convenience we only consider symmetric graphs, whereas all our results also extend to asymmetric graphs.]
[Footnote: Examples include Muralidharan et al. (2017); Sinclair et al. (2012); Cai et al. (2015); Duflo et al. (2011), among others.]

Assumption 2.1 (Model under One-Degree Dependence). Let Equation (1) hold. Assume in addition that for all $i \in \{1, ..., N\}$,

$$\Big\{\varepsilon_i, \{\varepsilon_k\}_{k \notin N_j, j \in N_i}\Big\} \perp \{\varepsilon_j\}_{j \notin N_i} \,\Big|\, A, \theta_{[N]} \quad \text{a.s.},$$
$$(\varepsilon_i, \varepsilon_j) =_d (\varepsilon_{i'}, \varepsilon_{j'}) \,\Big|\, A, \theta_{[N]} \quad \forall (i, j, i', j') : i \in N_j, \; i' \in N_{j'}, \; \theta_i = \theta_{i'}, \; \theta_j = \theta_{j'} \quad \text{a.s.} \qquad (3)$$

The above condition states that unobservables of non-adjacent units are mutually independent. The condition also imposes that the dependence between unobservables depends only on whether the units are neighbors, but not on the identity of the neighbors.

Remark 1 (Higher-Order Dependence).
Extensions to higher-order dependence of degree $M$ read as follows:

$$\Big\{\varepsilon_i, \{\varepsilon_k\}_{k \notin \cup_{u=1}^M N_j^u, \, j \in \cup_{u=1}^M N_i^u}\Big\} \perp \{\varepsilon_j\}_{j \notin \cup_{u=1}^M N_i^u} \,\Big|\, A, \theta_{[N]} \quad \text{a.s.}, \qquad (4)$$

where $N_i^u$ denotes the set of neighbors of degree $u$. Under such a model, unobservables that are not adjacent by at least $M$ edges are independent. All our results extend to this setting, and are contained in Section A.1.

We provide examples under one- and two-degree dependence in the following lines.

Example 2.1.
Sinclair et al. (2012) study spillover effects for political decisions within households. The authors propose a model of the form

$$Y_i = \mu + \tau_1 D_i + \tau_2 1\Big\{\sum_{j \in N_i} D_j \geq 1\Big\} + \tau_3 1\Big\{\sum_{j \in N_i} D_j \geq |N_i|/2\Big\} + \tau_4 1\Big\{\sum_{j \in N_i} D_j = |N_i|\Big\} + \varepsilon_i, \qquad (5)$$

where $N_i$ denotes the elements in the same household as individual $i$. The model captures the effect for individual $i$ of being treated, and of at least one, half, and all of the other units in the household being treated. Under the above model, the local and anonymous interference condition holds with $\theta_i = |N_i|$. Suppose in addition that

$$\varepsilon_{[N]} \mid A \sim \mathcal{N}(0, \Sigma), \qquad (6)$$

where $\Sigma_{i,i} = \sigma^2$, $\Sigma_{i,j} = \alpha \times 1\{i \in N_j\}$ for $\alpha > 0$.
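A sketch of Example 2.1's data-generating process on hypothetical three-person households follows; the coefficient values and the household structure are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed parameters for illustration only.
mu, t1, t2, t3, t4 = 1.0, 0.5, 0.3, 0.2, 0.1
sigma2, alpha = 1.0, 0.4         # Sigma_ii = sigma^2, Sigma_ij = alpha for neighbors

H, size = 4, 3                   # four households of three members each
N = H * size
A = np.zeros((N, N), dtype=int)
for h in range(H):
    for i in range(h * size, (h + 1) * size):
        for j in range(h * size, (h + 1) * size):
            if i != j:
                A[i, j] = 1

D = rng.integers(0, 2, size=N)
S = A @ D                        # treated members of i's household (excluding i)
nbrs = A.sum(axis=1)             # |N_i| = 2 for everyone here

# Within-household correlated unobservables, as in (6).
Sigma = sigma2 * np.eye(N) + alpha * A
eps = rng.multivariate_normal(np.zeros(N), Sigma)

Y = (mu + t1 * D + t2 * (S >= 1) + t3 * (S >= nbrs / 2)
     + t4 * (S == nbrs) + eps)
```

The saturation indicators (at least one, at least half, all treated) enter as 0/1 terms, and the covariance matrix places weight $\alpha$ only on within-household pairs.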
Consider the following equation:

$$Y_i = \mu + \tau_1 D_i + \tau_2 \sum_{k \in N_i} D_k \Big/ |N_i| + \sqrt{|N_i|} \times \varepsilon_i, \qquad (7)$$

where

$$\varepsilon_i = \sum_{k \in N_i} \eta_k \Big/ \sqrt{|N_i|}, \qquad \eta_i \sim_{iid} \mathcal{N}(0, \sigma^2). \qquad (8)$$

Then the above assumption holds with $\theta_i = |N_i|$. In such a case unobservables are dependent with their neighbors and the neighbors of their neighbors only.

2.3 Estimands and Estimators

This paper considers a class of estimands which encompasses direct effects, spillover effects, and interactions of these. In particular, the class of estimands of interest of this paper is the following:

$$\tau(d, s, d', s', l) = m(d, s, l) - m(d', s', l), \qquad \text{for some arbitrary } d, d' \in \{0, 1\}, \; s, s' \in \mathbb{Z}, \; l \in \Theta, \qquad (9)$$

and any weighted combination of the form

$$\sum_{l \in \Theta} v(l) \, \tau(d, s, d', s', l)$$

for some given weights $v(l)$. The above estimand denotes a weighted average of conditional treatment effects, given $\theta_i = l$.

We provide three main examples below, setting $\theta_i = |N_i|$ for expositional convenience:

1. Average direct effect: $\tau(1, s, 0, s, l)$ denotes the average effect of treating an individual with $s$ treated neighbors, conditional on having $l$ neighbors;
2. Average marginal spillover effect: $\tau(0, s, 0, s-1, l)$ denotes the average effect of treating one more neighbor, conditional on having $l$ neighbors and $s$ treated neighbors;
3. Average overall effect: $\tau(1, l, 0, 0, l)$ denotes the average effect of treating each individual on the network, conditional on having $l$ neighbors.

Researchers are assumed to be interested in either some or all of the above effects.

Example 2.1
Cont’d
In this case,

$$\tau(1, l, 0, 0, l) = \tau_1 + \tau_2 + \tau_3 + \tau_4, \qquad \tau(1, s, 0, s, l) = \tau_1. \qquad (10)$$
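Before continuing, the two-degree dependence claimed in Example 2.2 can be verified numerically: building $\varepsilon_i$ from iid $\eta_k$ as in (8) makes units that share a neighbor correlated, while units farther than two edges apart remain uncorrelated. The chain network below is an assumed illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Chain 0-1-2-3-4: units 0 and 2 share neighbor 1; units 0 and 4 are four
# edges apart and share no eta term.
N = 5
A = np.zeros((N, N), dtype=int)
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1
deg = A.sum(axis=1)

reps = 20000
eta = rng.normal(size=(reps, N))            # eta_i ~ iid N(0, 1)
eps = (eta @ A) / np.sqrt(deg)              # eps_i = sum_{k in N_i} eta_k / sqrt(|N_i|)

C = np.cov(eps, rowvar=False)
# eps_0 = eta_1 and eps_2 = (eta_1 + eta_3) / sqrt(2) share the term eta_1,
# so their covariance is about 1/sqrt(2), while Cov(eps_0, eps_4) is about 0.
```

The empirical covariance matrix `C` exhibits exactly the local dependency-graph structure that the asymptotic analysis exploits.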
Cont’d
In this case τ (1 , l, , , l ) = τ + τ , τ (0 , s, , s − , l ) = τ /l, τ (1 , s, l, , s, l ) = τ . (11)For sake of generality, since we consider either parametric or non-parametric formula-tion, we formally define the estimand of interest as a weighted combination of the expectedoutcomes, conditional on the assignment mechanism. We formalize the definition below,and provide examples in the following sub-section that showcase equivalence of the follow-ing definition with those provided above. 8 efinition 2.1 (Conditional Estimand and Estimator) . Given a set of weights w N ∈ W N ,let the estimand be defined as τ n ( w N ) = 1 n (cid:88) i : R i =1 w N ( i, D [˜ n ] , R [ N ] , θ [ n ] ) m (cid:16) D i , (cid:88) k ∈ N i D k , θ i (cid:17) . (12)Denote the corresponding estimator asˆ τ n ( w N ) = 1 n (cid:88) i : R i =1 w N ( i, D [˜ n ] , R [ N ] , θ [ n ] ) Y i . (13)The weights w N ( . ) are functions of observable characteristics, i.e., treatment assign-ments, selection indicators, network characteristics on the participants, and outcomes ofthe participants. Without loss of generality, we assume that the weights of non-participantsare equal to zero. The size of the set W N depends on the number of estimands (and soestimators) of interest. We provide examples in the following lines. In this section, we discuss leading examples of estimands and corresponding estimatorsconsidered throughout this paper.
Difference in Means
Let $\theta_i = |N_i|$ and consider the following class of weights:

$$w_N(i, D_{[\tilde{n}]}, R_{[N]}, \theta_{[n]}) = \begin{cases} \gamma_i(d_1, s_1, l) - \gamma_i(d_0, s_0, l) & \text{if } R_i = 1, \\ 0 & \text{otherwise}, \end{cases} \qquad (14)$$

where

$$\gamma_i(d, s, l) = \frac{1\{D_i = d, \; \sum_{k \in N_i} D_k = s, \; \theta_i = l\}}{\sum_{i : R_i = 1} 1\{D_i = d, \; \sum_{k \in N_i} D_k = s, \; \theta_i = l\} \big/ n}. \qquad (15)$$

Then

$$\tau_n(w_N) = m(d_1, s_1, l) - m(d_0, s_0, l) = \tau(d_1, s_1, d_0, s_0, l), \qquad (16)$$

which defines the effect on an individual with $l$ neighbors of being exposed to treatment $d_1$ and $s_1$ treated neighbors, against the case where such an individual is exposed to treatment $d_0$ and $s_0$ treated neighbors. In addition, any weighted average of the form

$$\sum_{l=0}^{\infty} v(l) \, \tau(d_1, s_1, d_0, s_0, l),$$

for some weights $v(l)$, satisfies Definition 2.1.

Model-Based Estimands

Consider the following weighting mechanism, for $i : R_i = 1$ and $\theta_i = T_i$:

$$w_N(i, \cdot) = \Big(\frac{1}{n} \sum_{i : R_i = 1} X_i X_i'\Big)^{-1} X_i, \qquad (17)$$

where, for example,

$$X_i = \Big(1, \; D_i, \; \sum_{k \in N_i} D_k, \; D_i \sum_{k \in N_i} D_k, \; D_i T_i\Big).$$

Under Assumption 3.1, by further assuming that

$$m(d, s, l) = \mu + d\beta + \gamma s + ds\phi + ld\omega, \qquad (18)$$

we have $\tau_n(w_N) = (\mu, \beta, \gamma, \phi, \omega)$.

The Experimental Design

In this section, we discuss the main experimental protocol, allowing the entire adjacency matrix to be observed by the researchers. We consider the following setting. Researchers:

1. either observe the adjacency matrix $A$, or partial information on such a matrix, such as the connections of a random subset of individuals, encoded in $\tilde{A}$;
2. select units in a set $\mathcal{I}$ and run a pilot study on such units, collecting their outcomes and treatment assignments;
3.
based on the pilot study, researchers select the participants (i.e., the indicators $R_i$) and the treatment assignments $D_i$ for all such participants and their neighbors; the treatment $D_i$ for the remaining units is assumed to be exogenous (e.g., constant at zero), and the treatment assignment of the pilot units remains unchanged;
4. they collect information $(Y_i, D_i, \theta_i, D_{j \in N_i}, N_i)$ for all participants (i.e., $R_i = 1$);
5. researchers estimate the causal effect of interest using such information.

[Footnote: Here we make an abuse of notation and define the vector of estimands as discussed below.]

Figure 1: Graphical representation of the data structure. The square denotes all observations in the population of size N. The green region denotes the individuals selected for the pilot study. The white region denotes the individuals not in the pilot study and whose outcomes are sampled by the researchers. The brown area denotes all the units that are not in the pilot study, which have not been selected for the experiment, but which are neighbors of the selected units. The gray region denotes all those observations that are eligible for the experiment but whose outcomes are not sampled by the researcher and that are not neighbors of the units in the white area.

For an intuitive explanation, consider Figure 1. The figure partitions the population of interest into four regions: (i) the pilot study, for which the treatment is assigned in a first-wave experiment; (ii) the set of participants (white region), whose end-line outcomes are observed by the researcher; (iii) the set of units which are both non-participants and neighbors of the participant units. For (iii), we assume that the researcher assigns the treatment, but the end-line outcomes of these units are not necessarily collected.
Therefore, (iii) assumes that collecting end-line information may be costly: even if some units are exposed to treatment, researchers may face constraints on the number of units followed during the experiment. However, constraints imposing that the assignment of the individuals in the brown region (i.e., neighbors of participants who are not themselves participants) equals the baseline may also be considered, and all our results also hold whenever the treatment is constrained to be exogenous for such units.

An important intuition is that by imposing anonymous and local interference, we can construct a design mechanism that allows researchers to sample units and to assign treatments arbitrarily dependent on the network structure. However, this is possible as long as the sampling and treatment assignment mechanism does not depend on the outcome variables of those units participating in the experiment. We formalize such a condition as follows: after excluding individuals in the pilot study, which may be used for the design of the experiment, and their neighbors, whose unobservables depend on the pilot units, the treatment assignment mechanism and the selection mechanism must be randomized based on the adjacency matrix and pilot units only. Formally, let us define

$$H = \{1, ..., N\} \setminus \big\{\mathcal{I} \cup_{j \in \mathcal{I}} N_j\big\}, \qquad (19)$$

the set of units after excluding individuals in the pilot study and the corresponding neighbors. Then an experiment is defined as valid if the following condition holds.

Assumption 3.1 (Experimental Restriction). Let the following hold:

$$(a): \; \varepsilon_{i \in H} \perp \big(D_{[\tilde{n}]}, R_{[N]}\big) \,\big|\, A, \theta_{[N]}, \mathcal{I}, \qquad \text{and} \qquad (b): \; \varepsilon_{[N]} \perp 1\{j \in \mathcal{I}\} \,\big|\, A, \theta_{[N]}.$$

Assume in addition that

$$(c): \; R_i = 0 \;\; \forall i \in J, \qquad J = \mathcal{I} \cup_{j \in \mathcal{I}} N_j. \qquad (20)$$

The first condition states that unobservables in the set $H$ are independent of treatment assignments and selection indicators.
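The exclusion rule in (19) and Assumption 3.1(c) is mechanical and easy to sketch; the chain graph and pilot set below are assumed for illustration.

```python
import numpy as np

def eligible_set(A, pilot):
    # J = pilot units together with all their neighbors; H = [N] \ J is the
    # set from which second-wave participants may be drawn.
    N = A.shape[0]
    J = set(pilot)
    for i in pilot:
        J |= {j for j in range(N) if A[i, j] == 1}
    H = sorted(set(range(N)) - J)
    return H, sorted(J)

# Chain 0-1-2-3-4-5 with pilot {0, 1}: unit 2 neighbors the pilot, so the
# eligible set H excludes units 0, 1, and 2.
A = np.zeros((6, 6), dtype=int)
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1
H, J = eligible_set(A, [0, 1])
```

Under higher-order dependence of degree $M$, the same computation would exclude all units within $M$ edges of the pilot.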
The second condition states that the choice of the pilot units is randomized, depending on information $A, \theta_{[N]}$ only. The third condition imposes that units participating in the experiment are neither in the pilot nor neighbors of pilot units.

The first condition in Assumption 3.1 is imposed on all units, with the exception of those units in the pilot study and their neighbors. The reason why such a condition is not imposed on the units in the pilot and their neighbors is the local dependence assumption: the outcomes of the neighbors may be dependent with the outcomes of the pilot units, and therefore be "confounded" when treatments are assigned on the basis of a pilot study. This motivates the third condition, i.e., that participants are neither pilot units nor their neighbors. To gain further intuition, consider Figure 2. In the figure, the set of pilot units includes the vertices N4, N5, N6. Researchers may use their outcomes for the design of the experiment. Therefore, the treatment assignment mechanism is clearly dependent on the unobservables of such units. However, the outcomes of such units are also dependent on N7, namely a neighbor of the pilot set. To guarantee that potential outcomes are independent of the treatment and selection assignment mechanism, N7 should not be included as a participant in the experiment. Under the above condition, the estimator is conditionally unbiased.

Theorem 3.1 (Conditional Unbiasedness). Under Assumptions 2.1 and 3.1,

$$E\Big[\hat{\tau}_n(w_N) \,\Big|\, D_{[\tilde{n}]}, R_{[N]}, A, \theta_{[N]}, \mathcal{I}\Big] = \tau_n(w_N). \qquad (21)$$

The proof is contained in the Appendix. Theorem 3.1 showcases that the estimator of interest is centered around the correct estimand conditional on the design mechanism and the adjacency matrix, under local interference. Our result is derived conditional on the design mechanism, which guarantees valid inference for any design under Assumption 3.1.
The theorem relates to coloring arguments on graphs, which have also been used in Sussman and Airoldi (2017) for studying the bias of estimators of direct treatment effects induced by neighbors' interference. Here, however, conditions are instead imposed on the construction of a pilot study, and the lack of such conditions would induce bias in general estimators, due to the dependence of unobservables in the pilot study with the design mechanism.
Figure 2: Example of a network. In such a setting, under one-degree dependence, N7 does not satisfy the validity condition, since it connects to the pilot study, which is used for the randomization of treatments and indicators.
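In the spirit of Theorem 3.1, conditional unbiasedness can be checked by Monte Carlo, holding $A$, $D$, and $R$ fixed. The linear outcome model, the paired network, and the abstraction from the pilot stage below are all simplifying assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

# Four disjoint pairs; with A, D, R fixed, a difference in means across
# exposure cells should be centered at the true contrast m(1,0,l) - m(0,0,l).
N = 8
A = np.zeros((N, N), dtype=int)
for i, j in [(0, 1), (2, 3), (4, 5), (6, 7)]:
    A[i, j] = A[j, i] = 1

D = np.array([1, 0, 0, 0, 1, 0, 0, 0])   # fixed design
S = A @ D
cell1 = (D == 1) & (S == 0)              # treated, no treated neighbor
cell0 = (D == 0) & (S == 0)              # control, no treated neighbor
true_tau = 2.0                           # coefficient on D below

reps = 20000
eta = rng.normal(size=(reps, N))
eps = eta + eta @ A                      # locally dependent within pairs
Y = 1.0 + 2.0 * D + 0.5 * S + eps       # assumed linear outcome model
est = Y[:, cell1].mean(axis=1) - Y[:, cell0].mean(axis=1)
```

Each draw of `est` fluctuates, but its average across draws approaches `true_tau`; the restrictions of Assumption 3.1 are what preserve this property when the design itself depends on pilot data.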
Before discussing the main results, we introduce additional notation. We define below the conditional variance of the estimators of interest, conditional on the treatment assignments and the underlying network:

$$V_N(w_N; A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}, \mathcal{I}) = \mathrm{Var}\Big(\frac{1}{\sum_{i=1}^N R_i} \sum_{i : R_i = 1} w_N(i, D_{[\tilde{n}]}, R_{[N]}, \theta_{[n]}) \, Y_i \,\Big|\, A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}, \mathcal{I}\Big). \qquad (22)$$

We omit the last arguments of the above expression whenever clear from the context. Given the selection of the pilot study $\mathcal{I}$, the design of the main experiment (i.e., the second-wave experiment) solves the following minimax problem: for $\alpha \in (0, 1)$,

$$\min_{D_{[\tilde{n}]}, R_{[N]}} \max_{w_N \in \mathcal{W}_N} \hat{V}_{N,p}(w_N; A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}, \mathcal{I}), \qquad \text{s.t.} \quad \sum_{i=1}^N R_i \in [\alpha n, n], \quad R_j = 0 \;\; \forall j \in J, \qquad (23)$$

where $\hat{V}_{N,p}$ denotes an estimator of the variance obtained from the pilot study, discussed in the following paragraphs, and $\alpha n, n$ are given constraints on the minimum and maximum number of participants. The maximum number of participants may be imposed by cost considerations, whereas the minimum number of participants guarantees valid asymptotic approximations for inference, discussed in Section 4. The minimization is with respect to two sets of choice variables: the participation indicators and the treatment assignments. Intuitively, different participants have different variances depending on the number of their connections, as well as different covariances with other participants. Similarly, different treatment assignments may lead to different variances and covariances in the presence of heteroskedasticity.

Additional constraints may be included: for example, only some units may be able to participate in the experiment, which corresponds to constraints of the form $R_i = 0$ for some of the units. An alternative constraint may impose that $D_i \times R_i \geq D_i$.
Such a constraint imposes that those units which are not selected as participants have treatment assignment constant at the baseline. The proposed method accommodates all such constraints, and the results discussed in the following lines also hold in such scenarios.

The above definition showcases one main trade-off in the selection of the pilot study: the larger the pilot study, the more precise the estimator of the variance; however, the larger the pilot study, the larger the set $J$, and therefore the more stringent the constraint imposed in the above optimization procedure. In Section 4 we formally characterize such a trade-off and derive the optimal size of the pilot.

Remark 2 (Weighted Average Objective). An alternative specification of the objective function consists in minimizing the weighted average

$$\sum_{w_N \in \mathcal{W}_N} u(w_N) \, V_N(w_N; A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}, \mathcal{I})$$

for given weights $u(w_N)$. The proposed mechanism and all our results directly extend to this setting.

We now discuss the choice of the pilot study and the treatment assignment mechanism. The treatment assignment is randomized as follows.
Assumption 3.2 (Pilot Experiment). Let $D_{i \in \mathcal{I} \cup \{\cup_{j \in \mathcal{I}} \mathcal{N}_j\}} \perp \varepsilon_{[N]} \mid A, \theta_{[N]}$.

The above condition imposes restrictions on the treatment assignment in the pilot. Such an assignment may be, for instance, fully randomized.

Next, we discuss the selection of participants in the pilot study. There are two important facts to consider: (i) under Assumption 3.1, participants in the pilot study can be selected based on network information only; (ii) the larger the set $\mathcal{J}$, the stricter the constraint imposed on the second-wave experiment. Therefore, the selection of the pilot must minimize $|\mathcal{J}|$. Two constraints must be imposed: (i) a minimum number of elements to be selected in the pilot; (ii) the pilot must include some neighbors, in order to be able to estimate covariances between individuals.

The problem is formally stated below. We denote by $x_i = 1\{i \in \mathcal{I}\}$ the indicator of whether individual $i$ belongs to the pilot study, and by $\alpha m, m$ the lower and upper bounds on the number of pilot units. Let $\delta$ be a given parameter. Then, for $\alpha \in (0,1)$, we solve

$$\min_{\{x_1, \dots, x_N\} \in \{0,1\}^N} \; \sum_{i=1}^N \sum_{j \in \mathcal{N}_i} x_i (1 - x_j), \quad \text{s.t.} \quad \sum_{i=1}^N x_i \in [\alpha m, m], \quad \sum_{i=1}^N x_i \sum_{j \in \mathcal{N}_i} x_j \geq \delta. \tag{24}$$

The problem reads as a variation of the min-cut problem in a graph: we aim to find a set of units that are "well" separated from the rest, under constraints on the number of such units and their number of neighbors. The parameter $\delta$ imposes a lower bound on the number of neighbors within the pilot study, which is required to be able to estimate the covariance between units. The optimization can be solved using mixed-integer quadratic programming (MIQP).

Next, we discuss identification and estimation of the variance component.
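Before turning to the variance, the pilot-selection problem in Equation (24) can be illustrated with a brute-force search over all pilot sets on a toy graph. This is only a sketch standing in for an MIQP solver: the graph, the size bounds, and the value of $\delta$ below are illustrative assumptions, not choices made in the paper.

```python
from itertools import product

def pilot_selection(adj, m_lo, m_hi, delta):
    """Brute-force the pilot-selection problem in Equation (24):
    minimize the number of boundary pairs between pilot and non-pilot
    units, subject to a pilot of size in [m_lo, m_hi] containing at
    least delta within-pilot (ordered) neighbor pairs."""
    N = len(adj)
    best, best_x = None, None
    for x in product((0, 1), repeat=N):
        if not (m_lo <= sum(x) <= m_hi):
            continue
        # within-pilot pairs: sum_i x_i * sum_{j in N_i} x_j
        internal = sum(x[i] * sum(x[j] for j in adj[i]) for i in range(N))
        if internal < delta:
            continue
        # boundary objective: sum_i sum_{j in N_i} x_i * (1 - x_j)
        cut = sum(x[i] * (1 - x[j]) for i in range(N) for j in adj[i])
        if best is None or cut < best:
            best, best_x = cut, x
    return best, best_x

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by one bridge edge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cut, x = pilot_selection(adj, m_lo=3, m_hi=3, delta=2)
```

On this graph the optimum picks one of the two triangles as the pilot, cutting only the single bridge edge, which matches the min-cut intuition in the text.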
Observe that under Assumption 2.1, we obtain

$$n V_N(w_N) = \frac{1}{n} \sum_{i: R_i=1} w_N(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})^2 \, \mathrm{Var}(Y_i \mid A, D_{[\tilde n]}, R_{[N]}, \theta_{[N]}, \mathcal{I}) + \frac{1}{n} \sum_{i: R_i=1} \sum_{j \in \mathcal{N}_i} R_j\, w_N(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})\, w_N(j, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})\, \mathrm{Cov}(Y_i, Y_j \mid A, D_{[\tilde n]}, R_{[N]}, \theta_{[N]}, \mathcal{I}). \tag{25}$$

The variance depends on two main components: the sum of the variances over the units participating in the experiment, and the covariances among participating units only. We now discuss identification and estimation of these components.

Lemma 3.2.
Suppose that Assumptions 2.1 and 3.1 hold. Then, for all units participating in the experiment (i.e., $R_i = 1$):

$$\mathrm{Var}(Y_i \mid A, D_{[\tilde n]}, R_{[N]}, \theta_{[N]}, \mathcal{I}) = \sigma\Big(\theta_i, D_i, \sum_{k \in \mathcal{N}_i} D_k\Big), \qquad \mathrm{Cov}(Y_i, Y_j \mid A, D_{[\tilde n]}, R_{[N]}, \theta_{[N]}, \mathcal{I}, i \in \mathcal{N}_j) = \eta\Big(\theta_i, D_i, \sum_{k \in \mathcal{N}_i} D_k, \theta_j, D_j, \sum_{k \in \mathcal{N}_j} D_k\Big) \tag{26}$$

for some functions $\sigma(\cdot), \eta(\cdot)$. In addition, under Assumptions 2.1, 3.1, and 3.2, for all units in the pilot experiment the following hold:

$$\mathrm{Var}\Big(Y_i \,\Big|\, D_i = d, \sum_{k \in \mathcal{N}_i} D_k = s, \theta_i = l, A\Big) = \sigma(l, d, s), \qquad \mathrm{Cov}\Big(Y_i, Y_j \,\Big|\, D_i = d, \sum_{k \in \mathcal{N}_i} D_k = s, \theta_i = l, D_j = d', \sum_{k \in \mathcal{N}_j} D_k = s', \theta_j = l', A\Big) = \eta(l, d, s, l', d', s'). \tag{27}$$

The above lemma states that the variance of the outcome, and the covariance between outcomes, can be expressed as functions of $\theta_i$, of the individual treatment assignment, and of the number of treated neighbors. In addition, the same functions are identified on the pilot units. This result permits a plug-in procedure with the estimated individual variance and covariance functions. In particular, given estimators $\hat\sigma_p(\cdot), \hat\eta_p(\cdot)$ of the variance and covariance components, the variance estimator is defined as follows:

$$n \hat V_{n,p}(w_N) = \frac{1}{n} \sum_{i: R_i=1} w_N(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})^2 \, \hat\sigma_p\Big(\theta_i, D_i, \sum_{k \in \mathcal{N}_i} D_k\Big) + \frac{1}{n} \sum_{i: R_i=1} \sum_{j \in \mathcal{N}_i} R_j\, w_N(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})\, w_N(j, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})\, \hat\eta_p\Big(\theta_i, D_i, \sum_{k \in \mathcal{N}_i} D_k, \theta_j, D_j, \sum_{k \in \mathcal{N}_j} D_k\Big). \tag{28}$$

One question remains for practitioners: how would researchers find the optimal number of participants, for a given target level of power and a pre-specified minimum detectable effect of the treatment?
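Before turning to that question, the plug-in construction in Equation (28) and the way it feeds into the design problem in Equation (23) can be sketched numerically. The functional forms of $\hat\sigma_p$, $\hat\eta_p$, the toy graph, and the contrast weights below are hypothetical stand-ins for quantities estimated from a pilot; the brute-force search stands in for the paper's optimizer.

```python
from itertools import product

def v_hat(weights, R, D, adj, theta, sigma_hat, eta_hat):
    """Plug-in objective n * V_hat of Equation (28): a weighted sum of
    estimated unit variances plus a double sum of estimated covariances
    between neighboring participants, scaled by 1/n."""
    n = sum(R)
    participants = [i for i in range(len(R)) if R[i]]
    s = {i: sum(D[j] for j in adj[i]) for i in range(len(R))}  # treated neighbors
    total = sum(weights[i] ** 2 * sigma_hat(theta[i], D[i], s[i])
                for i in participants)
    total += sum(weights[i] * weights[j]
                 * eta_hat(theta[i], D[i], s[i], theta[j], D[j], s[j])
                 for i in participants for j in adj[i] if R[j])
    return total / n

# Hypothetical fitted functions (in practice estimated on the pilot units).
sigma_hat = lambda l, d, s: 1.0 + 0.5 * d + s / max(l, 1)
eta_hat = lambda l, d, s, l2, d2, s2: 0.1 * (sigma_hat(l, d, s) * sigma_hat(l2, d2, s2)) ** 0.5

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a path graph on 4 units
theta = [len(adj[i]) for i in adj]             # theta_i = degree, as in Section 6
R = (1, 1, 1, 1)                               # all units participate
w = [1.0, -1.0, 1.0, -1.0]                     # illustrative contrast weights

# Second-wave step: choose the treatment vector minimizing the estimated variance.
best_D = min(product((0, 1), repeat=4),
             key=lambda D: v_hat(w, R, D, adj, theta, sigma_hat, eta_hat))
```

With these (assumed) functions, treating any unit raises the variance terms faster than it changes the covariance terms, so the search returns the all-control assignment; with other fitted functions the optimizer would trade off the two components differently.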
We consider the problem where the number of treated units is internalized in the decision problem. We define $\beta_\alpha(w_N)$ to be an upper bound on the maximal variance that permits rejecting the null hypothesis of interest, against a local alternative, for a given level $1 - \alpha$. In practice, $\beta_\alpha(w_N)$ may be computed after specifying the minimum detectable effect. An example is provided at the end of this section. The optimization problem in this case takes the following form:

$$\min_{R_{[N]}, D_{[\tilde n]}} \; \sum_{i=1}^N R_i \tag{29}$$

such that, for all $w_N \in \mathcal{W}_N$,

$$(i)\;\; \beta_\alpha(w_N) \geq \hat V_{N,p}(w_N; A, D_{[\tilde n]}, R_{[N]}, \theta_{[N]}, \mathcal{I}), \qquad \text{and} \qquad (ii)\;\; R_i = 0 \;\; \forall i \in \mathcal{J}. \tag{30}$$

Intuitively, the optimization problem minimizes the total number of participants, imposing that the resulting variance is not larger than the maximal variance that would allow rejecting the null hypothesis of interest under a fixed alternative. In the following lines we discuss choices of $\beta_\alpha(w_N)$.

Example 3.1.
Suppose we are interested in performing the following test:

$$H_0: m(1, 0) - m(0, 0) = 0, \qquad H_1: m(1, 0) - m(0, 0) > \nu \tag{31}$$

for some $\nu > 0$. Let $n = \sum_{i=1}^N R_i$, where $R_i$ solves the optimization in Equation (29). Using asymptotic approximations (see Section 4), rejection of the null at size $\alpha$ requires

$$0 \leq \nu - z_{1-\alpha} \times \sqrt{V_N(w_N)} \;\;\Rightarrow\;\; V_N(w_N) \leq \nu^2 / z_{1-\alpha}^2, \tag{32}$$

where $z_{1-\alpha}$ denotes the $1-\alpha$ quantile of a standard normal distribution. Therefore, a valid choice is $\beta_\alpha(w_N) = \nu^2 / z_{1-\alpha}^2$.

Theoretical Analysis
In this section, we discuss optimality guarantees of the proposed procedure and asymptoticinference.
A natural question is how the variance obtained from the minimization above compares to the variance obtained from the oracle experiment, where the variance and covariance functions are known and where all units, including the ones in the pilot experiment, may participate in the main experiment. Formally, we compare the solution of the feasible experiment with the oracle solution of the following optimization problem:

$$V_N^* = \min_{D_{[\tilde n]}, R_{[N]}} \; \max_{w_N \in \mathcal{W}_N} \; \mathrm{Var}\Big( \frac{1}{\sum_{i=1}^N R_i} \sum_{i=1}^N R_i\, w_N(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})\, Y_i \,\Big|\, A, D_{[\tilde n]}, R_{[N]}, \theta_{[N]} \Big), \tag{33}$$

subject to $\alpha n + |\mathcal{J}| \leq \sum_{i=1}^N R_i \leq n$.

The oracle experiment minimizes the true variance, and it does not impose any condition on the units in the pilot and their neighbors not participating in the main experiment. We impose a lower and an upper bound on the number of participants in the oracle experiment. The upper bound matches the upper bound in the empirical design discussed in Equation (23): the same maximum number of participants is considered in the two cases. On the other hand, for the oracle experiment we impose $\sum_{i=1}^N R_i \geq \alpha n + |\mathcal{J}|$, which exceeds the lower bound of the original design in Equation (23) by the term $|\mathcal{J}|$. In the asymptotic regime, where the size of the pilot experiment grows at a slower rate than the number of participants in the main experiment, $|\mathcal{J}|/(\alpha n) \lesssim \mathcal{N}_N m / n = o(1)$ for $m \lesssim n^{2/3}$, under the conditions stated in the following paragraphs, and the term is therefore asymptotically negligible.

We define the regret as the difference between the variance under the oracle solution of the optimization problem and the variance evaluated at the estimated treatment assignment.
$$\mathcal{R}_N = \max_{w_N \in \mathcal{W}_N} \mathrm{Var}\Big( \frac{1}{\sum_{i=1}^N R_i} \sum_{i: R_i = 1} w_N(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})\, Y_i \,\Big|\, A, D_{[\tilde n]}, R_{[N]}, \theta_{[N]} \Big) - V_N^*, \tag{34}$$

where $R_{[N]}, D_{[\tilde n]}$ solve the two-wave experiment in Equation (23). (Here, the lower bound $\alpha n + |\mathcal{J}|$ on the number of participants is assumed to be less than or equal to the upper bound $n$.) Since we might expect that each component in the above expression converges to zero, we study the behavior of $\mathcal{R}_N$ after appropriately multiplying this difference by the maximum sample size $n$. The following assumption is imposed.

Assumption 4.1.
Let $|\mathcal{I}| = m$. Assume that for some $\xi > 0$ the following hold:

$$\sup_{l,d,s} \big| \hat\sigma_p(l, d, s) - \sigma(l, d, s) \big| \lesssim m^{-\xi}, \qquad \sup_{l,d,s,l',d',s'} \big| \hat\eta_p(l, d, s, l', d', s') - \eta(l, d, s, l', d', s') \big| \lesssim m^{-\xi}.$$

Assumption 4.1 characterizes the convergence rate of the variance and covariance functions. Examples are provided at the end of the section. In the following assumption we impose moment conditions; we denote by $\mathcal{K}_N$ the set of restrictions imposed on $R_{[N]}$ as in Assumption 3.1.

Assumption 4.2 (Moment and Distributional Conditions). Suppose that the following holds for each $w_N \in \mathcal{W}_N$:

(A) $Y_i \in [-M, M]$ where $M < \infty$;
(B) $n V_N(w_N) > \bar c$ for some constant $\bar c > 0$;
(C) $\mathcal{N}_N / n^{1/4} = o(1)$;
(D) $|w_N(i; D_{[\tilde n]}, R_{[N]}, \theta_{[n]})| < \infty$ for all $i$ almost surely.

Assumption 4.2 imposes the following conditions: (A) the outcome is bounded; (B) the variance of the outcome, once reweighted by the weights $w_N(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})$, is non-zero, conditional on the network and the treatment assignments; (C) the maximum degree grows at a rate slower than $n^{1/4}$; finally, (D) assumes that the weights are finite. For linear regression models, (D) is satisfied under invertibility of the Gram matrix, and for difference-in-means estimators it requires that a non-zero number of observations is assigned to each group of interest. These conditions can be directly incorporated in the optimization problem. We can now state the first theorem.

Theorem 4.1.
Under Assumptions 2.1, 4.1, and 4.2,

$$n \mathcal{R}_N \lesssim \mathcal{N}_N m^{-\xi} + \mathcal{N}_N \frac{m}{n}. \tag{35}$$

Theorem 4.1 characterizes the difference between the variance of the experiment with a pilot study and the variance of the oracle experiment with known variance and covariance functions. The theorem outlines a key trade-off: the size of the pilot experiment has two contrasting effects on the upper bound for the regret: (i) the larger the pilot experiment, the smaller the estimation error; (ii) the larger the pilot, the stronger the constraints imposed in the optimization algorithm, and therefore the larger the regret with respect to the oracle assignment mechanism. Motivated by the above theorem, the following corollary holds.

Corollary. Suppose that the conditions in Theorem 4.1 hold, with $\xi = 1/2$ (i.e., the parametric rate). Then for $m \asymp (n/\mathcal{N}_N)^{2/3}$, we have $n \mathcal{R}_N \lesssim \mathcal{N}_N^{4/3} n^{-1/3}$. Therefore, under the above conditions, $n \mathcal{R}_N \to_{a.s.} 0$.

The above corollary is the first result that formally characterizes the pilot's size with respect to the main experiment. The corollary shows that the pilot's size should scale as the sample size of the main experiment, rescaled by the maximum degree, raised to the power two-thirds. Such a result has important practical implications: it provides guidance on the choice of the pilot's size relative to the main experiment.
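As a practical illustration, the corollary's rate and the variance threshold of Example 3.1 can be turned into simple rules of thumb. The proportionality constant below is an arbitrary assumption (the theory pins down only the rate), and the numerical inputs are illustrative.

```python
from math import ceil
from statistics import NormalDist

def pilot_size(n, max_degree, c=1.0):
    """Corollary-style rule of thumb: pilot size proportional to
    (n / max_degree)^(2/3). The constant c is an assumption; the
    asymptotic result only fixes the exponent."""
    return ceil(c * (n / max_degree) ** (2 / 3))

def beta_alpha(nu, alpha):
    """Maximal variance allowing rejection at level alpha against a
    minimum detectable effect nu (Example 3.1): nu^2 / z_{1-alpha}^2."""
    z = NormalDist().inv_cdf(1 - alpha)  # 1-alpha standard normal quantile
    return nu ** 2 / z ** 2

m = pilot_size(n=400, max_degree=4)  # ceil(100^(2/3)) = 22
threshold = beta_alpha(nu=0.5, alpha=0.05)
```

For a main experiment with at most 400 participants and maximum degree 4, this rule suggests a pilot of roughly two dozen units; the threshold then enters constraint (i) of the optimization in Equation (29).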
In the following lines, we derive the asymptotic properties of the estimator without imposing any assumption on the dependence between the treatment assignments. The result guarantees valid asymptotic inference on causal effects under general experimental design mechanisms, as well as local dependence of the outcomes of interest. Throughout the rest of our discussion, we consider a sequence of data generating processes with $n, N \to \infty$, where $n \leq N$.

Given the second-wave experiment, we estimate $\sigma$ and $\eta$ using the entire sample. We then estimate the variance using a plug-in procedure:

$$n \hat V_N(w_N) = \frac{1}{n} \sum_{i: R_i=1} w_N(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})^2 \, \hat\sigma\Big(\theta_i, D_i, \sum_{k \in \mathcal{N}_i} D_k\Big) + \frac{1}{n} \sum_{i: R_i=1} \sum_{j \in \mathcal{N}_i} R_j\, w_N(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})\, w_N(j, D_{[\tilde n]}, R_{[N]}, \theta_{[n]})\, \hat\eta\Big(\theta_i, D_i, \sum_{k \in \mathcal{N}_i} D_k, \theta_j, D_j, \sum_{k \in \mathcal{N}_j} D_k\Big). \tag{36}$$

One necessary condition for the validity of the above estimator is uniform consistency of the estimators of the conditional variance function $\hat\sigma(\cdot)$ and covariance function $\hat\eta(\cdot)$. On the other hand, in the presence of a growing maximum degree, such a condition is not sufficient, since the second component may have arbitrarily many elements. Therefore, validity may require additional conditions on the network topology. Here, we require that highly connected individuals represent a relatively small portion of the sample.

Assumption 4.3.
Assume that (i) there exists a finite $L < \infty$ such that

$$\Big| \{ i : |\mathcal{N}_i| > L \} \Big| \leq n / \bar C, \quad a.s.,$$

for some universal constant $\bar C < \infty$. Assume in addition that (ii)

$$\sup_{l,d,s} \big| \sigma(l,d,s) - \hat\sigma(l,d,s) \big| = o_p(1), \qquad \sup_{l,d,s,l',d',s'} \big| \eta(l,d,s,l',d',s') - \hat\eta(l,d,s,l',d',s') \big| = o_p(1).$$

Finally, (iii) assume that $\mathcal{W}_N$ is finite dimensional.

Condition (i) states that the number of "influential nodes", namely the number of individuals with a large degree (larger than some finite $L$), grows at a slower rate than the sample size. Condition (ii) assumes that consistent variance and covariance estimators are available to the researcher. Based on the above conditions, we can state the next theorem.

Theorem 4.2.
Suppose that Assumptions 2.1, 3.1, 4.2, and 4.3 hold. Then for all $w_N \in \mathcal{W}_N$,

$$\frac{\sqrt{n}\,\big(\hat\tau(w_N) - \tau_n(w_N)\big)}{\sqrt{n \hat V_n(w_N)}} \to_d \mathcal{N}(0, 1). \tag{37}$$

The proof of the theorem is contained in the Appendix. The above theorem establishes asymptotic normality for a general class of linear estimators. The rate of convergence of the estimator depends on the variance component $V_N(w_N)$. Whenever $n V_N(w_N) = O(1)$, the estimator achieves the optimal $\sqrt{n}$ convergence rate. The result exploits applications of Stein's method for dependency graphs (Ross et al., 2011), which in the context of network interference has also been discussed in Chin (2018) for a different class of causal estimands. Asymptotic properties of estimators for network data have been discussed in a variety of contexts (e.g., Ogburn et al. (2017)). Differently from previous references, however, our result is derived conditionally on the treatment assignment mechanism, allowing for dependence of the outcomes on the treatment assignments.

We now consider the case where the researcher has access to partial network information only. We consider a study that follows these steps.
Experimental Protocol:
1. Pilot study: Researchers collect information from a random sample of individuals, which is assumed to be disconnected from all other eligible units. Such a sample may be collected from a disconnected component of the network, which we denote as $\mathcal{C}$, such as a village (Banerjee et al., 2013), a school (Paluck et al., 2016), or a region (Muralidharan et al., 2017). The identities of the neighbors of the individuals in the pilot study, as well as their network characteristics $\theta_i$, are collected during the first-wave experiment.
2. Survey: Researchers collect network information from a random subset of individuals $i \in \{1, \dots, N\}$.
3. Experimental design: Researchers select the participants and the corresponding treatment assignments based on the available information, selecting participants $i \notin \mathcal{C}$.

4. Second survey and analysis: Researchers collect information $(Y_i, D_i, \{D_j\}_{j \in \mathcal{N}_i}, \theta_i, \mathcal{N}_i)$ for each participant.

The experiment consists of four main steps: a pilot study, where network information is available to the researcher; a first survey that collects partial network information; the design; and the analysis. The analysis is based on one key assumption: the neighbors of the participant units, as well as their network characteristics, are observable to the policymaker once the main experiment is implemented, but not necessarily before. Such information can be obtained by including questions on neighbors' information in the end-line survey of the experiment. Under the above protocol, the following result holds.

Proposition 5.1.
Let Assumption 2.1 hold. Then the experimental design in Equation (23) satisfies Assumption 3.1. In addition, the estimator $\hat\tau(w_n)$ in Definition 2.1 and the estimated variance $\hat V_n(w_n)$ in Equation (28) are observable to the researcher after Step 4.

The above proposition guarantees that the validity condition holds under such a protocol. In addition, by Step 4 of the protocol, $\hat\tau(w_n)$ as well as $\hat V_n(w_n)$ are observable by definition of these estimators. This result permits valid asymptotic inference on causal effects of interest also in the scenario of partial network information.

We now answer the question of optimal design in the presence of partial network information. The optimal design consists in minimizing the expected variance, where the expectation is taken also with respect to the missing links. Formally, we minimize the following expression:

$$\min_{D_{[\tilde n]}, R_{[N]}} \; \max_{w_N \in \mathcal{W}_N} \; \mathbb{E}\Big[ \hat V_{N,p}(w_N; A, D_{[\tilde n]}, R_{[N]}, \theta_{[N]}, \mathcal{I}) \,\Big|\, \tilde A, T_1, \dots, T_N \Big], \quad \text{s.t.} \quad \sum_{i=1}^N R_i \in [\alpha n, n], \quad R_i = 0 \;\; \forall i \in \mathcal{C}, \tag{38}$$

where $\hat\eta_p, \hat\sigma_p$ are the covariance and variance functions estimated on the pilot experiment. The expectation is taken with respect to the posterior distribution of the edges, given current information. (The problem can also be solved in a fully Bayesian fashion, by imposing a prior distribution also on potential outcomes. A full derivation of a hierarchical model goes beyond the scope of this paper, and we leave this extension for future research.) Such an expectation can be computed via Monte Carlo methods, by explicitly modeling the network formation process, using the posterior distribution of the edges, and using plug-in estimates for the variance and covariance functions.

Example 5.1.
Consider the following Erdős–Rényi model:

$$\{A_{i,j}\}_{j > i} \sim_{i.i.d.} \mathrm{Bern}(p), \qquad p \sim \mathcal{U}(0, 1). \tag{39}$$

Assume in addition that $A_{i,j} = A_{j,i}$ and $A_{i,i} = 0$. The model assumes that each pair of individuals connects independently with probability $p$, modeled with a uniform prior. Suppose we observe the edges of a subset of individuals of size $\tilde n$. Then we obtain

$$P(A_{i,j} = 1 \mid \tilde A) \sim \begin{cases} \delta_1 & \text{if } \tilde A_{i,j} = 1, \\ \delta_0 & \text{if } \tilde A_{i,j} = 0, \\ \mathrm{Beta}(\alpha, \beta) & \text{if } \tilde A_{i,j} \text{ is missing}, \end{cases} \tag{40}$$

where $\delta_c$ denotes a point-mass distribution at $c$ and

$$\alpha = \sum_{u > v} \tilde A_{u,v} 1\{\tilde A_{u,v} \in \{0, 1\}\} + 1, \qquad \beta = \tilde N - \sum_{u > v} \tilde A_{u,v} 1\{\tilde A_{u,v} \in \{0, 1\}\} + 1, \tag{41}$$

with $\tilde N$ the number of observed pairs.

Example 5.2.
Following Breza et al. (2017), we can consider a model of the form

$$P(A_{i,j} = 1 \mid \nu_i, \nu_j, z_i, z_j, \delta) \propto \exp\big( \nu_i + \nu_j + \delta\, \mathrm{dist}(z_i, z_j) \big), \tag{42}$$

where $\nu_i$ denotes an individual fixed effect, $z_i$ denotes a position in some latent space, and $\delta$ is a hyper-parameter of interest.

In this section, we collect simulation results. Throughout this section, we set $\theta_i = |\mathcal{N}_i|$, i.e., the sufficient network statistic is the number of neighbors of each individual. We consider the following functional forms for the variance and covariance functions:

$$\sigma(l, d, s) = \mu + \beta_1 d + \beta_2 \frac{s}{\max\{l, 1\}}, \qquad \eta(l, d, s, l', d', s') = \sqrt{\sigma(l, d, s) \times \sigma(l', d', s')}\; \alpha. \tag{43}$$

The variance depends on the individual treatment status and on the percentage of treated neighbors. The covariance instead is chosen using the Cauchy–Schwarz inequality, with $\alpha$ being the equivalent of the intra-cluster correlation in the presence of clustered networks (Baird et al., 2018). Notice that $\alpha \in [-1, 1]$; we set $\alpha = 0.1$, fix the intercept $\mu$ across designs, and choose $\beta_1$ and $\beta_2$ in $\{0, 0.5, 1\}$. Outcomes are generated as

$$Y_i = D_i \gamma_1 + \frac{\sum_{k \in \mathcal{N}_i} D_k}{|\mathcal{N}_i|} \gamma_2 + \varepsilon_i. \tag{44}$$

We choose $\gamma_2 = 1$; we remark that the choice of these coefficients does not affect the resulting variance of the estimator.

Simulated and Real-World Networks

In a first set of simulations, we generate data from an Erdős–Rényi graph with $P(A_{i,j} = 1) = 2/n$ and from an Albert–Barabási graph. For the latter, we first draw an Erdős–Rényi graph on an initial set of nodes with $p = 2/n$, and second, we draw connections of the new nodes sequentially to the existing ones, with probability equal to the number of connections of each pre-existing node divided by the overall number of connections in the graph. We evaluate the methods over 200 data sets. For the simulated networks, we consider a graph with $N = 800$, where the number of participants selected by the proposed method is at most half of the sample (i.e., $n = 400$).

In the second set of simulations, we evaluate results using the adjacency matrix from Cai et al. (2015). We consider two different adjacency matrices obtained from this study: the "weak" network, where two individuals are connected if either indicated the other as a friend, and the "strong" network, where two individuals are connected if both individuals indicated the other as a friend. The weak network presents a dense structure, whereas the strong network presents a sparse structure. We take as adjacency matrix the matrix obtained from the first five villages, which counts in total $N = 832$ units, and we constrain the maximum number of participants selected by the proposed method to be 416 (i.e., $n = N/2$).

We evaluate the proposed method with complete knowledge of the adjacency matrix and with a pilot study containing 70 units. Estimation of the variances and covariances is performed using a quadratic program with a positivity constraint on the variance function. In the estimation, we impose the constraint that the estimated parameter $\alpha$ lies in $[0, 1]$. We also evaluate the proposed method with a partially observed network.
We estimate the variances and covariances selecting 70 units for a pilot study from the sixth village. For this method, the network in the main villages is only partially observed before the randomization of the experiment: we consider the case where only the sub-block of the adjacency matrix corresponding to the first 200 of the 832 individuals is observable to the researcher before randomization. We impute missing edges using a simple Erdős–Rényi model, with a uniform prior on the probability of connection. The model is clearly misspecified in the real-world scenario, and it is used only to outline the benefits of the proposed method even when a simplistic model is used for the imputation of missing edges. We solve the optimization problem by alternating a Monte Carlo step, estimating the variance over the unobserved edges, with an optimization step over treatment assignments and participation indicators.

We compare to a set of competitors, where the number of participants either equals the number of participants in the main experiment, or equals the sum of the number of participants in the main experiment and the number of units used in the pilot study. We consider the following competing methods: (ii) the 3-$\epsilon$ net graph clustering method with 400 participants discussed in Ugander et al. (2013); (iii) the 3-$\epsilon$ net graph clustering method with 470 participants, denoted as Clustering+; and three different saturation designs. Since saturation design methods are not directly applicable in the presence of a fully connected network, we consider extensions of saturation designs, where we combine the $\epsilon$-net clustering discussed in Ugander et al. (2013) with the saturation design mechanism (Baird et al., 2018). We consider several alternative specifications: (iv)
Saturation1, with 400 participants and uniform probability assignment across the estimated clusters; (v) Saturation1+, with 470 participants but otherwise identical to Saturation1; (vii) Saturation2+, with 470 participants, which selects the saturation probabilities and the percentage of clusters assigned to each probability by minimizing the sum of the standard errors of the treatment and spillover effects, with intra-cluster correlation equal to the true $\alpha$ and with the variance of the individual error set to be homoskedastic; (viii) Saturation3+, with 470 participants, which instead minimizes the sum of the standard errors of treatment effects, spillover effects, as well as the slope effects as defined in Baird et al. (2018). We remark, however, that saturation designs may show poor performance in this particular case, since they are not directly applicable in scenarios where (i) the network is not clustered and (ii) the variance is unknown to the researcher. Finally, we consider Random Assignment+, which selects 470 participants at random and assigns treatments with equal probabilities. All competitors, with the exception of the random assignment mechanism, use complete information of the network structure.
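The simulation design in Equations (43)–(44) can be sketched as follows. For brevity the errors are drawn independently (i.e., setting the correlation parameter $\alpha$ to zero, so the covariance function $\eta$ is not used), the guard $\max\{|\mathcal{N}_i|, 1\}$ handles isolated nodes, and the parameter values below are illustrative assumptions rather than the exact choices in the paper.

```python
import random

random.seed(0)

def simulate(N, p, gamma1, gamma2, mu, beta1, beta2):
    """One draw from the outcome model of Equation (44) with the
    heteroskedastic variance of Equation (43):
    sigma(l, d, s) = mu + beta1*d + beta2*s/max(l, 1)."""
    # Erdos-Renyi adjacency structure
    adj = {i: set() for i in range(N)}
    for i in range(N):
        for j in range(i + 1, N):
            if random.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    # fully randomized binary treatments
    D = [random.randint(0, 1) for _ in range(N)]
    Y = []
    for i in range(N):
        l = len(adj[i])                # theta_i = degree
        s = sum(D[j] for j in adj[i])  # number of treated neighbors
        var = mu + beta1 * D[i] + beta2 * s / max(l, 1)
        eps = random.gauss(0.0, var ** 0.5)
        Y.append(D[i] * gamma1 + gamma2 * s / max(l, 1) + eps)
    return D, Y

D, Y = simulate(N=800, p=2 / 800, gamma1=0.5, gamma2=1.0,
                mu=1.0, beta1=0.5, beta2=0.5)
```

Repeating such draws over many data sets, and over the $(\beta_1, \beta_2)$ grid, reproduces the structure of the Monte Carlo exercise described above.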
We collect results for the real-world network in Table 1, where we report the variance of the estimator. Each column corresponds to different values of the coefficients $(\beta_1, \beta_2)$. The left-hand-side panel collects results for the network with strong ties, and the right-hand-side panel collects results for the network with weak ties. The results show that, on the real-world network simulations, the proposed method with the pilot study significantly outperforms every competitor under every design. (Since the method in Jagadeesan et al. (2017) is only valid for direct effects, but not for spillover and overall effects, it is not a suitable competitor in these simulations.) The improvement is significantly larger as the values of the coefficients increase, i.e., in the presence of heteroskedasticity.

In the presence of the partially observed network, the only valid competitor to the proposed method is the random allocation. In this case, we observe that the proposed method significantly outperforms the random allocation strategy uniformly.

On simulated networks, the proposed method likewise outperforms uniformly any competitor, and the improvement with respect to the competitors increases for a larger degree of heteroskedasticity. In the homoskedastic case (i.e., $(\beta_1, \beta_2) = (0, 0)$), we observe one single exception, corresponding to estimating the overall effect under the Albert–Barabási network. In that case, the only method that outperforms the proposed procedure is the graph clustering algorithm, with 70 more participants in the main experiment than the proposed method. In all remaining cases, the proposed method outperforms any competitor, including those with 70 more participants. Such behavior reflects the benefit of conducting a small pilot study before the main experiment, especially in the presence of heteroskedastic variances.
Since, in this setting, we do not consider the presence of a separate cluster, as in the real-world network analysis, results for the partially observed network are not computed for simulated networks.

Figure 3: In the left panel, we report the percentage decrease in the number of units necessary to achieve the same level of variance between the best-performing competitor and the ELI method, using the simulations with the real-world network. The case denoted as "Unobserved network" compares the random allocation to the ELI method with the partially observed network. In the difference, we consider the number of units used by the ELI method to be given by the sum of participants and the size of the pilot study. In the right panel of Figure 3, we report the variance on the log scale of the proposed method (in blue) against the competitor with the lowest median variance, which randomizes using the sum of participants selected by ELI and units in the pilot study.

Table 1: Variance for estimating the overall effect, using data originating from Cai et al. (2015), with the first five villages as the population of interest ($N = 832$). Each column corresponds to a different design, for different values of the coefficients $(\beta_1, \beta_2)$. "ELI" corresponds to the proposed method, where 400 participants out of the 832 potential participants are sampled in the main experiment, and a pilot study with 70 units is used. The second row corresponds to the proposed method where only the sub-block with the first 200 observations is observable before the main experiment, and a pilot of 70 units from the sixth village is used. Methods with a + use 470 participants in the main experiment; methods without a + use 400 participants. All competitors, with the exception of the random allocation (Random All+), exploit full knowledge of the network structure.

Strong Weak
Overall Effect (0,0) (0.5,0.5) (0.5,1) (0,0) (0.5,0.5) (0.5,1)
ELI 0.551 1.134 1.367 0.769 1.442 …
[remaining entries of Table 1 are not recoverable from the extracted text]

Conclusions
In this paper, we have introduced a novel method for designing experiments under interference. Motivated by applications in the social sciences, we consider a general network structure, and we accommodate the estimation of a large class of causal estimands using parametric and non-parametric estimators. We allow the variance and covariance between units to be unknown, and we provide the first set of conditions on pilot studies under interference when such functions are estimated from a first-wave experiment. We propose a design that selects treatment assignments and participation indicators to minimize the variance of the final estimator. We derive the first set of guarantees on the variance, and a theoretical analysis of the pilot's size.

We considered designs where either full or partial network information is available to the researchers. In the latter case, we outlined the importance of exploiting modeling strategies for the network formation process to minimize the resulting variance. Our empirical findings suggest robustness to such a model in the presence of a partially observed network. We leave for future research the question of network model selection for experimental design with a partially observed network.

This paper makes two key assumptions: interactions are anonymous, and interference propagates to neighbors only. Future research should address the question of design under general interactions and interference propagating through the entire network. Exploring the effect of the network topology, as well as of different exposure mappings, on the performance of the design mechanisms remains an open research question.

References
Aronow, P. M., C. Samii, et al. (2017). Estimating average causal effects under general interference, with application to a social network experiment. The Annals of Applied Statistics 11(4), 1912–1947.
Athey, S., D. Eckles, and G. W. Imbens (2018). Exact p-values for network interference. Journal of the American Statistical Association 113(521), 230–240.
Bai, Y. (2019). Optimality of matched-pair designs in randomized controlled trials. Available at SSRN 3483834.
Baird, S., J. A. Bohren, C. McIntosh, and B. Özler (2018). Optimal design of experiments in the presence of interference. Review of Economics and Statistics 100(5), 844–860.
Banerjee, A., A. G. Chandrasekhar, E. Duflo, and M. O. Jackson (2013). The diffusion of microfinance. Science 341(6144), 1236498.
Barrera-Osorio, F., M. Bertrand, L. L. Linden, and F. Perez-Calle (2011). Improving the design of conditional transfer programs: Evidence from a randomized education experiment in Colombia. American Economic Journal: Applied Economics 3(2), 167–95.
Barrios, T. (2014). Optimal stratification in randomized experiments. Manuscript, Harvard University.
Basse, G. and A. Feller (2016). Analyzing multilevel experiments in the presence of peer effects. arXiv preprint arXiv:1608.
Basse, G. W. and E. M. Airoldi (2018a). Limitations of design-based causal inference and a/b testing under arbitrary and network interference. Sociological Methodology 48(1), 136–151.
Basse, G. W. and E. M. Airoldi (2018b). Model-assisted design of experiments in the presence of network-correlated outcomes. Biometrika 105(4), 849–858.
Bhattacharya, D., P. Dupas, and S. Kanaya (2013). Estimating the impact of means-tested subsidies under treatment externalities with application to anti-malarial bednets. Technical report, National Bureau of Economic Research.
Bond, R. M., C. J. Fariss, J. J. Jones, A. D. Kramer, C. Marlow, J. E. Settle, and J. H. Fowler (2012). A 61-million-person experiment in social influence and political mobilization. Nature 489(7415), 295.
Breza, E., A. G. Chandrasekhar, T. H. McCormick, and M. Pan (2017). Using aggregated relational data to feasibly identify network structure without network data. Technical report, National Bureau of Economic Research.
Cai, J., A. De Janvry, and E. Sadoulet (2015). Social networks and the decision to insure. American Economic Journal: Applied Economics 7(2), 81–108.
Charnes, A. and W. W. Cooper (1962). Programming with linear fractional functionals. Naval Research Logistics Quarterly 9(3-4), 181–186.
Chin, A. (2018). Central limit theorems via Stein's method for randomized experiments under interference. arXiv preprint arXiv:1804.03105.
Choi, D. (2017). Estimation of monotone treatment effects in network experiments. Journal of the American Statistical Association 112(519), 1147–1155.
DellaVigna, S. and D. Pope (2018). Predicting experimental results: who knows what? Journal of Political Economy 126(6), 2410–2456.
Duflo, E., P. Dupas, and M. Kremer (2011). Peer effects, teacher incentives, and the impact of tracking: Evidence from a randomized evaluation in Kenya. American Economic Review 101(5), 1739–74.
Dupas, P. (2014). Short-run subsidies and long-run adoption of new health products: Evidence from a field experiment. Econometrica 82(1), 197–228.
Eckles, D., B. Karrer, and J. Ugander (2017). Design and analysis of experiments in networks: Reducing bias from interference. Journal of Causal Inference 5(1).
Egger, D., J. Haushofer, E. Miguel, P. Niehaus, and M. W. Walker (2019). General equilibrium effects of cash transfers: experimental evidence from Kenya. Technical report, National Bureau of Economic Research.
Forastiere, L., E. M. Airoldi, and F. Mealli (2016). Identification and estimation of treatment and interference effects in observational studies on networks. arXiv preprint arXiv:1609.06245.
Goldsmith-Pinkham, P. and G. W. Imbens (2013). Social networks and the identification of peer effects. Journal of Business & Economic Statistics 31(3), 253–264.
Graham, B. S., G. W. Imbens, and G. Ridder (2010). Measuring the effects of segregation in the presence of social spillovers: A nonparametric approach. Technical report, National Bureau of Economic Research.
Harshaw, C., F. Sävje, D. Spielman, and P. Zhang (2019). Balancing covariates in randomized experiments using the Gram-Schmidt walk. arXiv preprint arXiv:1911.03071.
Horvitz, D. G. and D. J. Thompson (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47(260), 663–685.
Hudgens, M. G. and M. E. Halloran (2008). Toward causal inference with interference. Journal of the American Statistical Association 103(482), 832–842.
Jagadeesan, R., N. Pillai, and A. Volfovsky (2017). Designs for estimating the treatment effect in networks with interference. arXiv preprint arXiv:1705.08524.
Kallus, N. (2018). Optimal a priori balance in the design of controlled experiments. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80(1), 85–112.
Kang, H. and G. Imbens (2016). Peer encouragement designs in causal inference with partial interference and identification of local average network effects. arXiv preprint arXiv:1609.04464.
Karlan, D. and J. Appel (2018). Failing in the Field: What We Can Learn When Field Research Goes Wrong. Princeton University Press.
Karlan, D. S. and J. Zinman (2008). Credit elasticities in less-developed economies: Implications for microfinance. American Economic Review 98(3), 1040–68.
Kasy, M. (2016). Why experimenters might not always want to randomize, and what they could do instead. Political Analysis 24(3), 324–338.
Kojevnikov, D., V. Marmer, and K. Song (2019). Limit theorems for network dependent random variables. arXiv preprint arXiv:1903.01059.
Leung, M. P. (2019a). Inference in models of discrete choice with social interactions using network data. Available at SSRN 3446926.
Leung, M. P. (2019b). Treatment and spillover effects under network interference. Available at SSRN 2757313.
Li, X., P. Ding, Q. Lin, D. Yang, and J. S. Liu (2019). Randomization inference for peer effects. Journal of the American Statistical Association, 1–31.
Manski, C. F. (2013). Identification of treatment response with social interactions. The Econometrics Journal 16(1), S1–S23.
Miguel, E. and M. Kremer (2004). Worms: identifying impacts on education and health in the presence of treatment externalities. Econometrica 72(1), 159–217.
Muralidharan, K. and P. Niehaus (2017). Experimentation at scale. Journal of Economic Perspectives 31(4), 103–24.
Muralidharan, K., P. Niehaus, and S. Sukhtankar (2017). General equilibrium effects of (improving) public employment programs: Experimental evidence from India. Technical report, National Bureau of Economic Research.
Ogburn, E. L., O. Sofrygin, I. Diaz, and M. J. van der Laan (2017). Causal inference for social network data. arXiv preprint arXiv:1705.08527.
Paluck, E. L., H. Shepherd, and P. M. Aronow (2016). Changing climates of conflict: A social network experiment in 56 schools. Proceedings of the National Academy of Sciences 113(3), 566–571.
Pouget-Abadie, J. (2018). Dealing with Interference on Experimentation Platforms. Ph.D. thesis.
Ross, N. et al. (2011). Fundamentals of Stein's method. Probability Surveys 8, 210–293.
Sävje, F., P. M. Aronow, and M. G. Hudgens (2017). Average treatment effects in the presence of unknown interference. arXiv preprint arXiv:1711.06399.
Sinclair, B., M. McConnell, and D. P. Green (2012). Detecting spillover effects: Design and analysis of multilevel experiments. American Journal of Political Science 56(4), 1055–1069.
Sussman, D. L. and E. M. Airoldi (2017). Elements of estimation theory for causal effects in the presence of network interference. arXiv preprint arXiv:1702.03578.
Tabord-Meehan, M. (2018). Stratification trees for adaptive randomization in randomized controlled trials. arXiv preprint arXiv:1806.05127.
Taylor, S. J. and D. Eckles (2018). Randomized experiments to detect and estimate social influence in networks. In Complex Spreading Phenomena in Social Systems, pp. 289–322. Springer.
Ugander, J., B. Karrer, L. Backstrom, and J. Kleinberg (2013). Graph cluster randomization: Network exposure to multiple universes. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 329–337. ACM.
Vazquez-Bare, G. (2017). Identification and estimation of spillover effects in randomized experiments. arXiv preprint arXiv:1711.02745.
Viviano, D. (2019). Policy targeting under network interference. arXiv preprint arXiv:1906.10258.
Wager, S. and K. Xu (2019). Experimenting in equilibrium. arXiv preprint arXiv:1903.02124.
A Extensions
A.1 Allowing for Higher-Order Dependence
In this section, we relax the local dependence assumption and consider the general case where unobservables exhibit $M$-dependence. Formally, we replace Assumption 2.1 with the following weaker conditions.

Assumption A.1 (Model under Higher-Order Dependence). Let Equation (1) hold. Assume in addition that for all $i \in \{1, ..., N\}$,
$$
\Big\{ \varepsilon_i, \{\varepsilon_k\}_{k \notin \cup_{u=1}^{M} N_j^u,\, j \in \cup_{u=1}^{M} N_i^u} \Big\} \perp \{\varepsilon_j\}_{j \notin \cup_{u=1}^{M} N_i^u} \,\Big|\, A, \theta_{[N]} \quad a.s.,
$$
$$
(\varepsilon_i, \varepsilon_j) =_d (\varepsilon_{i'}, \varepsilon_{j'}) \,\Big|\, A, \theta_{[N]} \quad \forall (i, j, i', j') \text{ and } u \in \{1, ..., M\}:\ i \in N_j^u,\ i' \in N_{j'}^u,\ \theta_i = \theta_{i'},\ \theta_j = \theta_{j'} \quad a.s.,
$$
$$
\mathcal{N}_N \le \bar{C} < \infty, \tag{45}
$$
where $\mathcal{N}_N$ denotes the maximum degree.

Assumption A.1 states the following: (i) unobservables are independent whenever they are more than $M$ edges apart; (ii) the joint distribution of two unobservables given the adjacency matrix is the same whenever (a) potential treatments are the same, and (b) the two unobservables are at the same distance from the unit of interest. Condition (b) implies, for example, that the dependence between an individual and her first-degree neighbor can differ from the dependence between the individual and a second- or third-degree neighbor. In addition, the assumption states that the maximum degree is uniformly bounded. (After a quick inspection of the derivations contained in the second part of the Appendix, the reader may observe that this last condition can be replaced by assuming that the maximum degree of the sampled units and their neighbors up to order $M$ scales at a rate slower than $n^{1/4}$.)

The second condition is the experimental restriction. In the following condition, we define
$$
\tilde{\mathcal{H}} = [N] \setminus \Big\{ \mathcal{I} \cup_{j \in \mathcal{I}} \cup_{u=1}^{M} N_j^u \Big\}. \tag{46}
$$
The set $\tilde{\mathcal{H}}$ denotes all individuals in the population of interest, after excluding the pilot units and the neighbors of the pilot units up to the $M$-th degree. The following restriction is imposed.
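As a computational illustration, the set $\tilde{\mathcal{H}}$ in Equation (46) can be obtained from the adjacency matrix by a depth-limited breadth-first search from each pilot unit. The sketch below is ours, not the paper's; the function names are hypothetical:

```python
from collections import deque

def neighbors_up_to(adj, source, M):
    """BFS: all nodes within graph distance M of `source` (excluding source)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == M:
            continue  # do not expand beyond distance M
        for w, linked in enumerate(adj[v]):
            if linked and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return {v for v in dist if v != source}

def eligible_units(adj, pilot, M):
    """H-tilde: the population minus pilot units and their neighbors up to degree M."""
    excluded = set(pilot)
    for j in pilot:
        excluded |= neighbors_up_to(adj, j, M)
    return [i for i in range(len(adj)) if i not in excluded]

# Path graph 0-1-2-3-4-5 with pilot {0} and M = 2: units 1 and 2 are excluded.
adj = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
print(eligible_units(adj, {0}, M=2))  # → [3, 4, 5]
```

For $M = 1$ this reduces to excluding the pilot units and their direct neighbors, the set $\mathcal{J}$ of the main text.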
Assumption A.2 (Experimental Restriction). Let the following hold:
$$
(A):\ \varepsilon_{i \in \tilde{\mathcal{H}}} \perp \big( D_{[\tilde{n}]}, R_{[N]} \big) \,\big|\, A, \theta_{[N]}, \mathcal{I}, \qquad (B):\ \varepsilon_{[N]} \perp 1\{ j \in \mathcal{I} \} \,\big|\, A, \theta_{[N]}.
$$
Assume in addition that
$$
(C):\ R_i = 0 \quad \forall i \in \tilde{\mathcal{J}}, \tag{47}
$$
where $\tilde{\mathcal{J}} = \{ \mathcal{I} \cup_{j \in \mathcal{I}} \cup_{u=1}^{M} N_j^u \}$.

The following theorem extends Theorem 3.1 to higher-order dependence.

Theorem A.1. Under Assumptions A.1 and A.2,
$$
E\Big[ \hat{\tau}_n(w_n) \,\Big|\, \mathcal{I}, A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]} \Big] = \tau_n(w_n).
$$

The proof of the theorem is contained in the second part of the Appendix. Based on Assumption A.1, the variance component takes the following form:
$$
\begin{aligned}
n V_N(w_N) = \ & \frac{1}{n} \sum_{i: R_i = 1} w_N^2(i, \cdot) \, Var(Y_i \,|\, A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}) \\
& + \frac{1}{n} \sum_{i: R_i = 1} \sum_{j \in N_i^1} R_j \, w_N(i, \cdot) \, w_N(j, \cdot) \, Cov(Y_i, Y_j \,|\, A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}) \\
& + \frac{1}{n} \sum_{i: R_i = 1} \sum_{j \in N_i^2} R_j \, w_N(i, \cdot) \, w_N(j, \cdot) \, Cov(Y_i, Y_j \,|\, A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}) \\
& + \cdots + \frac{1}{n} \sum_{i: R_i = 1} \sum_{j \in N_i^M} R_j \, w_N(i, \cdot) \, w_N(j, \cdot) \, Cov(Y_i, Y_j \,|\, A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}),
\end{aligned} \tag{48}
$$
where $w_N(i, \cdot)$ abbreviates $w_N(i, D_{[\tilde{n}]}, R_{[N]}, \theta_{[n]})$. Therefore, the variance sums the covariances of each individual with her neighbors up to the $M$-th degree. Notice now that the variance and each covariance component are identified, where each covariance component depends on the distance of unit $i$ from unit $j$. Formally, we obtain that the following holds.
$$
Var(Y_i \,|\, A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}) = Var\Big( r\big(D_i, \sum_{k \in N_i} D_k, \theta_i, \varepsilon_i\big) \,\Big|\, D_i, \sum_{k \in N_i} D_k, \theta_i \Big) = \sigma^2\Big( \theta_i, D_i, \sum_{k \in N_i} D_k \Big), \tag{49}
$$
which guarantees identifiability of the variance function. Similarly, for a given $j \in N_i^u$ we have
$$
\begin{aligned}
Cov(Y_i, Y_j \,|\, A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}) &= Cov\Big( r\big(D_i, \sum_{k \in N_i} D_k, \theta_i, \varepsilon_i\big),\ r\big(D_j, \sum_{k \in N_j} D_k, \theta_j, \varepsilon_j\big) \,\Big|\, A, D_{[\tilde{n}]} \Big) \\
&= Cov\Big( r\big(D_i, \sum_{k \in N_i} D_k, \theta_i, \varepsilon_i\big),\ r\big(D_j, \sum_{k \in N_j} D_k, \theta_j, \varepsilon_j\big) \,\Big|\, j \in N_i^u, D_{[\tilde{n}]} \Big) \\
&= \eta^u\Big( \theta_i, D_i, \sum_{k \in N_i} D_k, \theta_j, D_j, \sum_{k \in N_j} D_k \Big). \tag{50}
\end{aligned}
$$
The above expression states that, under the above condition, the covariance between two individuals whose shortest path has length $u$ is a function which depends only on (a) the length of the path, (b) the treatment assignment of each of these two individuals, (c) the treatment assignments of the corresponding neighbors, and (d) the network statistics $\theta_i$ and $\theta_j$ of the two individuals.

Based on such a conclusion, estimation of these components can be performed via parametric or non-parametric procedures, although the latter may be extremely data-intensive. The design of the experiment consists in minimizing the variance under the restriction in Assumption A.2. Inference is guaranteed under the following theorem.

Theorem A.2.
Suppose that Assumptions 2.1, A.1, and A.2 hold. Then for $V_N(w_n)$ as defined in Equation (22),
$$
\frac{\sqrt{n}\,\big(\hat{\tau}(w_N) - \tau(w_n)\big)}{\sqrt{n V_N(w_n)}} \rightarrow_d \mathcal{N}(0, 1). \tag{51}
$$
The proof of the theorem is contained in the second part of the Appendix.

A.2 Randomized Treatments
In this section, we extend the model to discuss randomization based on observable covariates. In particular, we assign treatments and participation indicators at random, and independently. Randomization is stratified on observable covariates. The following restriction on the outcome model is imposed:
$$
Y_i = r\Big( D_i, |N_i|^{-1} \sum_{k \in N_i} D_k, \theta_i, \varepsilon_i \Big), \qquad \varepsilon_i \,\big|\, T_{[N]}, \theta_i = l, A \sim P_l, \tag{52}
$$
where $\varepsilon_i$ denotes unobservables. The above model defines potential outcomes as a function of the share of treated neighbors, as well as of unobservables and additional covariates.

Using the first-wave experiment, we estimate the pilot variance and covariance as follows:
$$
\hat{\sigma}^2_p\Big( \theta_i, D_i, \sum_{k \in N_i} D_k \Big), \qquad \hat{\eta}_p\Big( \theta_i, D_i, \sum_{k \in N_i} D_k, \theta_j, D_j, \sum_{k \in N_j} D_k \Big),
$$
where the variance and covariance depend on the individual and neighbors' treatment assignments, and on covariates. We aim to estimate the propensity score and the selection probability based on the first-wave experiment. Formally, we aim to estimate the following functions:
$$
P(D_i = 1 \,|\, R_i = 1, \theta_i = l) = e(l), \quad e \in \mathcal{E}, \qquad P(R_i = 1 \,|\, \theta_i = l) = r(l), \quad r \in \mathcal{R},
$$
where such functions are assumed to depend on the number of neighbors of each individual and the observed characteristics of such an individual. For expositional convenience, we discuss estimating the overall effect of the treatment; the framework directly extends to direct and spillover effects. We consider the following estimator:
$$
\hat{\tau} = \frac{1}{N} \sum_{i: R_i = 1} \tilde{w}\Big( D_i, \sum_{k \in N_i} D_k, \theta_i \Big) Y_i, \tag{53}
$$
where
$$
\tilde{w}\Big( D_i, \sum_{k \in N_i} D_k, \theta_i \Big) = \frac{1}{P(R_i = 1 \,|\, \theta_i)} \times \Bigg[ \frac{1\big\{ D_i = 1, \sum_{k \in N_i} D_k = |N_i| \big\}}{P\big( D_i = 1, \sum_{k \in N_i} D_k = |N_i| \,\big|\, R_i = 1, \theta_i, \theta_{j \in N_i} \big)} - \frac{1\big\{ D_i = 0, \sum_{k \in N_i} D_k = 0 \big\}}{P\big( D_i = 0, \sum_{k \in N_i} D_k = 0 \,\big|\, R_i = 1, \theta_i, \theta_{j \in N_i} \big)} \Bigg].
$$
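As an illustration, the estimator in Equation (53) amounts to a single re-weighted pass over the sampled units. The sketch below assumes the participation and assignment probabilities are known and passed in directly; `p_r`, `p_all`, and `p_none` are hypothetical stand-ins for the conditional probabilities appearing in $\tilde{w}$, and none of the names come from the paper:

```python
def overall_effect_ht(Y, D, R, neighbors, p_r, p_all, p_none):
    """Horvitz-Thompson-style estimate of the overall effect, in the spirit of
    Equation (53): contrast units with the full neighborhood treated against
    units with no one treated, re-weighted by participation probability p_r[i]
    and assignment probabilities p_all[i], p_none[i]."""
    N = len(Y)
    total = 0.0
    for i in range(N):
        if R[i] != 1:
            continue  # only participating units contribute
        s = sum(D[k] for k in neighbors[i])
        ind_all = 1 if (D[i] == 1 and s == len(neighbors[i])) else 0
        ind_none = 1 if (D[i] == 0 and s == 0) else 0
        total += (ind_all / p_all[i] - ind_none / p_none[i]) * Y[i] / p_r[i]
    return total / N

# Two units who are each other's neighbor, both treated and participating:
Y = [2.0, 1.0]
D = [1, 1]
R = [1, 1]
neighbors = {0: [1], 1: [0]}
print(overall_effect_ht(Y, D, R, neighbors, [1.0, 1.0], [0.5, 0.5], [0.5, 0.5]))  # → 3.0
```

Only the "fully treated neighborhood" branch fires here, so each outcome is inflated by the inverse assignment probability before averaging.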
The estimator re-weights observations by the propensity score, in the spirit of the Horvitz-Thompson estimator (Horvitz and Thompson, 1952).

The key idea is the following: we select a sub-sample, and we minimize the expected variance with respect to the distribution of treatment assignments and participation indicators. Formally, given a randomly selected sub-sample $\mathcal{G}$, we minimize over $e \in \mathcal{E}$, $r \in \mathcal{R}$ the following expression:
$$
\begin{aligned}
& \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}} E_{D \sim e, R \sim r}\Big[ R_i\, \tilde{w}^2\Big( D_i, \sum_{k \in N_i} D_k, \theta_i, \theta_{k \in N_i} \Big)\, \hat{\sigma}^2_p\Big( D_i, \sum_{k \in N_i} D_k, \theta_i \Big) \Big] \\
& + \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}} \sum_{j \in N_i} E_{D \sim e, R \sim r}\Big[ R_i R_j\, \tilde{w}\Big( D_i, \sum_{k \in N_i} D_k, \theta_i, \theta_{k \in N_i} \Big)\, \tilde{w}\Big( D_j, \sum_{k \in N_j} D_k, \theta_j, \theta_{k \in N_j} \Big)\, \hat{\eta}_p(\cdot) \Big]. \tag{55}
\end{aligned}
$$
Given the minimizers $\hat{e}, \hat{r}$, we then implement the second-wave experiment on the population defined as $[N] \setminus \big\{ \mathcal{J} \cup \mathcal{G} \cup_{j \in \mathcal{G}} N_j \big\}$. Namely, we implement the experiment on those units whose covariates and outcomes have not been used for the design of the experiment. Under the above modeling condition, $(\hat{e}, \hat{r})$ is independent of the outcomes and covariates of all other participants in the main experiment.

A.3 Minimax Design in the Absence of Pilots
Whenever the variance and covariance functions are not available to the researcher, we devise an optimization algorithm over the identity of participants, treatment assignments, and the number of participating units, under a maximal constraint on the variance function. Suppose that the researcher has prior knowledge of
$$
\sigma^2 \in \mathcal{S}, \qquad \eta \in \mathcal{E}(\mathcal{S}), \tag{56}
$$
where, for constants $B_\sigma \in (0, \infty)$ and $L_\eta, U_\eta \in [0, 1]$,
$$
\mathcal{S} = \big\{ f : \{0,1\} \times \mathcal{Z} \mapsto \mathbb{R}_+,\ \|f\|_\infty \le B_\sigma \big\}, \qquad \mathcal{E}(\mathcal{S}) = \big\{ g :\ g(f_1, f_2) \in [-L_\eta f_1 f_2,\ U_\eta f_1 f_2],\ f_1, f_2 \in \mathcal{S} \big\}. \tag{57}
$$
The function classes encode upper and lower bounds on the variance and covariance functions. In such a case, the min-max optimization problem can be written as follows:
$$
\min_{R_{[N]}, D_{[\tilde{n}]}} \sum_{i=1}^{N} R_i \tag{58}
$$
$$
\text{subject to} \quad (i)\ \sup_{w_N \in \mathcal{W}_n,\, \eta \in \mathcal{E}(\mathcal{S}),\, \sigma^2 \in \mathcal{S}} \hat{V}_{n,p}(w_n; \cdot) - \beta_\alpha(w_N) \le 0. \tag{59}
$$
The optimization problem consists in minimizing the number of participants, after imposing constraints on the maximal variance. Similarly to Section 5, $\beta_\alpha(w_N)$ denotes the maximal variance required to reject a given null hypothesis with size $\alpha$ for a fixed alternative. (Under independence of treatment assignments, the estimator in Equation (53) simplifies given the following identity:
$$
P\Big( D_i = d, \sum_{k \in N_i} D_k = s|N_i| \,\Big|\, R_i = 1, \theta_i, \theta_{j \in N_i} \Big) = P\big( D_i = d \,\big|\, \theta_i \big) \times \sum_{u_1, ..., u_{|N_i|}: \sum_v u_v = s|N_i|} \ \prod_{k=1}^{|N_i|} P\Big( D_{N_i^{(k)}} = u_k \,\Big|\, \theta_{N_i^{(k)}} \Big).) \tag{54}
$$

Remark 3. (Implementation)
The optimization can be written with respect to additional parameters $\sigma^2_i$, which denote the variance of each unit $i$, and parameters $\eta_{i,j}$, which denote the covariance between units $i$ and $j$. The supremum is taken over a finite set of such parameters, under the constraint that $\sigma^2_i = \sigma^2_j$ whenever $i$ and $j$ have the same treatment status, the same number of treated neighbors, and $\theta_i = \theta_j$; similarly for any pair $(\eta_{i,j}, \eta_{u,v})$. Additional constraints on the function class, such as a linear function class with bounded coefficients, may be considered. In such a case, the restriction translates into possibly different upper and lower bounds on each $\sigma^2_i$ and $\eta_{i,j}$.

B Additional Tables
Table 2 reports the main notation. We collect the results for the simulated networks in Table 3 and Table 4. Each table reports the variance averaged over two hundred replications. Each design corresponds to a different set of parameters $(\beta_1, \beta_2)$, which can be found at the top of the table.

R_i: Indicator of whether an individual participates in the experiment;
D_i: Treatment assignment indicator;
Y_i: Outcome of interest;
A: Adjacency matrix;
N_i: Neighbors of individual i;
|N_i|: Number of neighbors of individual i;
T_i: Additional covariates;
θ_i: Individual-specific characteristics;
n: Number of participants in the experiment;
ñ: Number of participants in the experiment and their neighbors;
[N]: Population of interest;
[ñ]: Set of participants and their neighbors;
I: Set of units in a pilot study;
J: Set of pilot units and their neighbors;
R_[N]: Vector containing participation indicators of all units;
D_[ñ]: Vector containing treatment assignments of all participants and their neighbors;
θ_[n]: Vector containing relevant network characteristics of the participants.

Table 2: Notation in the main text.

Table 3: Variance of the overall effect (sum of spillover and treatment effects). 200 replications. Each column corresponds to different values of the coefficients. The top panel collects results for the Erdős-Rényi graph and the bottom panel for the Albert-Barabási graph.

ER (0,0) (0,0.5) (0,1) (0,1.5) (0.5,0.5) (0.5,1) (0.5,1.5) (1,1.5)
[The numerical entries of Tables 3 and 4 were garbled in extraction and are omitted; each row reports, for one design, the variance averaged over 200 replications across the eight $(\beta_1, \beta_2)$ configurations listed in the header.]

C Auxiliary Lemmas
Lemma C.1. (Ross et al., 2011) Let $X_1, ..., X_n$ be random variables such that $E[X_i^4] < \infty$, $E[X_i] = 0$, $\sigma^2 = Var(\sum_{i=1}^n X_i)$, and define $W = \sum_{i=1}^n X_i / \sigma$. Let the collection $(X_1, ..., X_n)$ have dependency neighborhoods $N_i$, $i = 1, ..., n$, and define $D = \max_{1 \le i \le n} |N_i|$. Then for $Z$ a standard normal random variable,
$$
d_W(W, Z) \le \frac{D^2}{\sigma^3} \sum_{i=1}^{n} E|X_i|^3 + \frac{\sqrt{28}\, D^{3/2}}{\sqrt{\pi}\, \sigma^2} \sqrt{\sum_{i=1}^{n} E[X_i^4]}, \tag{60}
$$
where $d_W$ denotes the Wasserstein metric.

To show that the optimization problem admits a mixed-integer linear program formulation, we first introduce the following lemma, which follows similarly to what is discussed in Viviano (2019).
Lemma C.2. (Viviano, 2019) Any function $g_i$ that depends on $D_i$ and $\sum_{k \in N_i} D_k$ can be written as
$$
g_i\Big( D_i, \sum_{k \in N_i} D_k \Big) = \sum_{h=0}^{|N_i|} \Big[ \big(g_i(1, h) - g_i(0, h)\big)\, u_{i,h} + (t_{i,h,1} + t_{i,h,2} - 1)\, g_i(0, h) \Big], \tag{61}
$$
where $u_{i,h}, t_{i,h,1}, t_{i,h,2}$ are defined by the following linear inequalities:
$$
\begin{aligned}
(A)\ & D_i + t_{i,h,1} + t_{i,h,2} - 3 < u_{i,h} \le \frac{D_i + t_{i,h,1} + t_{i,h,2}}{3}, && u_{i,h} \in \{0, 1\}, \quad \forall h \in \{0, ..., |N_i|\}, \\
(B)\ & \frac{\sum_k A_{i,k} D_k - h}{|N_i| + 1} < t_{i,h,1} \le \frac{\sum_k A_{i,k} D_k - h}{|N_i| + 1} + 1, && t_{i,h,1} \in \{0, 1\}, \quad \forall h \in \{0, ..., |N_i|\}, \\
(C)\ & \frac{h - \sum_k A_{i,k} D_k}{|N_i| + 1} < t_{i,h,2} \le \frac{h - \sum_k A_{i,k} D_k}{|N_i| + 1} + 1, && t_{i,h,2} \in \{0, 1\}, \quad \forall h \in \{0, ..., |N_i|\}.
\end{aligned} \tag{62}
$$

Proof.
We define the following variables:
$$
t_{i,h,1} = 1\Big\{ \sum_{k \in N_i} D_k \ge h \Big\}, \qquad t_{i,h,2} = 1\Big\{ \sum_{k \in N_i} D_k \le h \Big\}, \qquad h \in \{0, ..., |N_i|\}.
$$
The first variable is one if at least $h$ neighbors are treated, and the second variable is one if at most $h$ neighbors are treated. Since each unit has $|N_i|$ neighbors, and zero to $|N_i|$ neighbors can be treated, there are in total $\sum_{i=1}^{n} (2|N_i| + 2)$ such variables.

The variable $t_{i,h,1}$ can equivalently be defined as
$$
\frac{\sum_k A_{i,k} D_k - h}{|N_i| + 1} < t_{i,h,1} \le \frac{\sum_k A_{i,k} D_k - h}{|N_i| + 1} + 1, \qquad t_{i,h,1} \in \{0, 1\}. \tag{63}
$$
The above equation holds for the following reason. Suppose that $h > \sum_k A_{i,k} D_k$. Since $\frac{\sum_k A_{i,k} D_k - h}{|N_i| + 1} < 0$, the left-hand side of the inequality is negative and the right-hand side is positive and strictly smaller than one. Since $t_{i,h,1}$ is constrained to be either zero or one, in such a case it is set to zero. Suppose now that $h \le \sum_k A_{i,k} D_k$. Then the left-hand side is bounded from below by zero, and the right-hand side is bounded from below by one. Therefore $t_{i,h,1}$ is set to one. Similarly, we can write
$$
\frac{h - \sum_k A_{i,k} D_k}{|N_i| + 1} < t_{i,h,2} \le \frac{h - \sum_k A_{i,k} D_k}{|N_i| + 1} + 1, \qquad t_{i,h,2} \in \{0, 1\}. \tag{64}
$$
By definition,
$$
t_{i,h,1} + t_{i,h,2} = \begin{cases} 2 & \text{if } \sum_{k \in N_i} D_k = h, \\ 1 & \text{if } \sum_{k \in N_i} D_k \ne h. \end{cases} \tag{65}
$$
Therefore, we can write
$$
\frac{1}{n} \sum_{i=1}^{n} \sum_{h=0}^{|N_i|} \Big[ \big(g_i(1, h) - g_i(0, h)\big)\, D_i (t_{i,h,1} + t_{i,h,2} - 1) + (t_{i,h,1} + t_{i,h,2} - 1)\, g_i(0, h) \Big]. \tag{66}
$$
Finally, we introduce the variable $u_{i,h} = D_i (t_{i,h,1} + t_{i,h,2} - 1)$. Since $D_i, t_{i,h,1}, t_{i,h,2} \in \{0, 1\}$, it is easy to show that this variable is completely determined by constraint (A). This completes the proof.

D Identification
Proof of Theorem 3.1
Consider all $D_{[\tilde{n}]}$ such that $D_i = d$, $\sum_{k \in N_i} D_k = s$, and all $A$ such that $\theta_i = l$. To derive the result, we want to show that
$$
E\Big[ Y_i \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, \theta_i = l, D_{[\tilde{n}]}, A, R_{[N]}, \theta_{[N]}, \mathcal{I} \Big] = m(d, s, l) \tag{67}
$$
for all units in the sample (i.e., $R_i = 1$). Notice first that under Assumption 2.1,
$$
E\Big[ Y_i \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, \theta_i = l, D_{[\tilde{n}]}, A, R_{[N]}, \theta_{[N]}, \mathcal{I} \Big] = E\Big[ r(d, s, l, \varepsilon_i) \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, \theta_i = l, D_{[\tilde{n}]}, A, R_{[N]}, \mathcal{I} \Big]. \tag{68}
$$
Observe now that under Assumption 3.1, since participants are not units in the pilot study, the following holds:
$$
E\Big[ r(d, s, l, \varepsilon_i) \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, \theta_i = l, D_{[\tilde{n}]}, A, R_{[N]}, \theta_{[N]}, \mathcal{I} \Big] = E\Big[ r(d, s, l, \varepsilon_i) \,\Big|\, \theta_i = l, A, \theta_{[N]} \Big]. \tag{69}
$$
Under Assumption 2.1, since $\varepsilon_i \perp (A, T_{[N]}) \,|\, \theta_i$, the proof is complete.

Proof of Theorem A.1
The proof follows similarly to the previous proof. Consider all $D_{[\tilde{n}]}$ such that $D_i = d$, $\sum_{k \in N_i} D_k = s$, and all $A$ such that $\theta_i = l$. Notice first that under Equation (1),
$$
E\Big[ Y_i \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, \theta_i = l, D_{[\tilde{n}]}, A, R_{[N]}, \theta_{[N]}, \mathcal{I} \Big] = E\Big[ r(d, s, l, \varepsilon_i) \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, \theta_i = l, \theta_{[N]}, D_{[\tilde{n}]}, A, R_{[N]}, \mathcal{I} \Big]. \tag{70}
$$
Observe now that under Assumption A.2, since participants are neither units in the pilot study nor their neighbors up to the $M$-th degree, the following holds:
$$
E\Big[ r(d, s, l, \varepsilon_i) \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, \theta_i = l, D_{[\tilde{n}]}, A, R_{[N]}, \theta_{[N]}, \mathcal{I} \Big] = E\Big[ r(d, s, l, \varepsilon_i) \,\Big|\, A, \theta_{[N]}, \theta_i = l \Big]. \tag{71}
$$
Under Equation (1), since $\varepsilon_i \perp (A, T_{[N]}) \,|\, \theta_i$, the proof is complete.

Proof of Lemma 3.2
Consider all $D_{[\tilde{n}]}$ such that $D_i = d$, $\sum_{k \in N_i} D_k = s$, and all $A$ such that $\theta_i = l$. First notice that under Assumption 2.1,
$$
Var\Big( Y_i \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, D_{[\tilde{n}]}, R_{[N] \setminus i}, A, \theta_i = l, R_i = 1, \theta_{[N]}, \mathcal{I} \Big) = Var\Big( r(d, s, l, \varepsilon_i) \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, D_{[\tilde{n}]}, R_{[N] \setminus i}, A, \theta_i = l, R_i = 1, \mathcal{I}, \theta_{[N]} \Big). \tag{72}
$$
Under Assumption 3.1, since $R_i = 0$ for all pilot units, we then obtain
$$
Var\Big( r(d, s, l, \varepsilon_i) \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, D_{[\tilde{n}]}, R_{[N] \setminus i}, A, \theta_i = l, R_i = 1, \mathcal{I}, \theta_{[N]} \Big) = Var\Big( r(d, s, l, \varepsilon_i) \,\Big|\, \theta_{[N]}, A, \theta_i = l \Big). \tag{73}
$$
Under Assumption 2.1, since $\varepsilon_i \perp (A, T_{[N]}) \,|\, \theta_i$, the proof of the first part is complete.

For the covariance component, the same reasoning follows. Consider all $D_{[\tilde{n}]}$ such that $D_i = d$, $\sum_{k \in N_i} D_k = s$, $D_j = d'$, $\sum_{k \in N_j} D_k = s'$, and all $A$ such that $\theta_i = l$, $\theta_j = l'$. First, notice that by the second condition in Assumption 3.1 and by Assumption 2.1,
$$
\begin{aligned}
& Cov\Big( Y_i, Y_j \,\Big|\, D_i = d, D_j = d', \sum_{k \in N_i} D_k = s, \sum_{k \in N_j} D_k = s', D_{[\tilde{n}]}, R_{[N] \setminus \{i,j\}}, A, \theta_i = l, \theta_j = l', R_i = 1, R_j = 1, \theta_{[N]}, \mathcal{I} \Big) \\
& = Cov\Big( r(d, s, l, \varepsilon_i),\ r(d', s', l', \varepsilon_j) \,\Big|\, D_i = d, D_j = d', \sum_{k \in N_i} D_k = s, \sum_{k \in N_j} D_k = s', D_{[\tilde{n}]}, R_{[N] \setminus \{i,j\}}, A, \theta_i = l, \theta_j = l', R_i = 1, R_j = 1, \theta_{[N]}, \mathcal{I} \Big). \tag{74}
\end{aligned}
$$
By Assumption 3.1, we obtain that the above component equals
$$
Cov\Big( r(d, s, l, \varepsilon_i),\ r(d', s', l', \varepsilon_j) \,\Big|\, A, \theta_i = l, \theta_j = l', \theta_{[N]}, i \in \mathcal{H}, j \in \mathcal{H} \Big). \tag{75}
$$
By Assumption 3.1, the covariance is zero if the two individuals are not neighbors, in which case the lemma trivially holds. Therefore, consider the case where the individuals are neighbors. Then we obtain
$$
Cov\Big( r(d, s, l, \varepsilon_i),\ r(d', s', l', \varepsilon_j) \,\Big|\, A, \theta_i = l, \theta_j = l', i \in \mathcal{H}, j \in \mathcal{H}, \theta_{[N]} \Big) = Cov\Big( r(d, s, l, \varepsilon_i),\ r(d', s', l', \varepsilon_j) \,\Big|\, i \in N_j, \theta_i = l, \theta_j = l' \Big) := \eta(l, d, s, l', d', s'). \tag{76}
$$
The last equality follows by Assumption 2.1. For the pilot study, observe that by Assumption 3.2 we obtain
$$
Var\Big( Y_i \,\Big|\, D_i = d, \sum_{k \in N_i} D_k = s, \theta_i = l, A \Big) = Var\Big( r(d, s, l, \varepsilon_i) \,\Big|\, \theta_i = l \Big), \tag{77}
$$
and similarly for the covariance component under Assumption 3.1.

E Asymptotics
Theorem E.1.
Suppose that Assumptions 2.1, 3.1, and 4.2 hold. Then for all $w_N \in \mathcal{W}_N$,
$$
\frac{\sqrt{n}\,\big(\hat{\tau}(w_N) - \tau_n(w_N)\big)}{\sqrt{n V_N(w_N)}} \rightarrow_d \mathcal{N}(0, 1). \tag{78}
$$

Proof of Theorem E.1.
We prove asymptotic normality after conditioning on the sigma-algebra $\sigma(D_{[\tilde{n}]}, A, R_{[N]}, \theta_{[N]}, \mathcal{I})$. Since $\mathcal{H} = [N] \setminus \mathcal{J}$, conditioning on $\sigma(D_{[\tilde{n}]}, A, R_{[N]}, \theta_{[N]}, \mathcal{I})$ is equivalent to conditioning on $\sigma(D_{[\tilde{n}]}, A, R_{[N]}, \theta_{[N]}, \mathcal{H})$, since given $A$, $\mathcal{J}$ only depends on $\mathcal{I}$, and $[N]$ is deterministic. Notice that unbiasedness holds by Theorem 3.1. Next, we show that $Y_i$, for all $i: R_i = 1$, are locally dependent given $\sigma(A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}, \mathcal{H})$. To showcase such a result, it suffices to show that
$$
\{\varepsilon_i\}_{i: R_i = 1} \,\Big|\, \sigma(A, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}, \mathcal{H})
$$
are locally dependent. Here local dependence refers to the set of random variables $Y_{[n]}$ forming a dependency graph, as discussed in Ross et al. (2011).

The argument is the following. Under Assumption 3.1, unobservables are locally dependent given the adjacency matrix $A$ only. By the second condition in Assumption 3.1, since unobservables are mutually independent of the set $\mathcal{H}$ given the adjacency matrix, we obtain that the unobservables also define a dependency graph as in Assumption 2.1 given $A, \mathcal{H}, \theta_{[N]}$. That is,
$$
\{\varepsilon_{[N]}\}_{i: R_i = 1} \,\Big|\, \sigma(A, \mathcal{H}, \theta_{[N]})
$$
are locally dependent. Consider now the distribution of all unobservables in the set $\mathcal{H}$, given $A, \mathcal{H}, \theta_{[N]}$. By the first condition in Assumption 3.1, such unobservables are mutually independent of $D_{[\tilde{n}]}, R_{[N]}$ given $\sigma(A, \mathcal{H}, \theta_{[N]})$. Therefore,
$$
\{\varepsilon_i\}_{i \in \mathcal{H}} \,\Big|\, \sigma(A, \mathcal{H}, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]})
$$
are locally dependent. Since $\{i: R_i = 1\} \subseteq \mathcal{H}$, local dependence of the unobservables of such units holds conditional on $A, \mathcal{H}, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]}$. Notice now that by Assumption 2.1,
$$
Y_i = r\Big( D_i, \sum_{k \in N_i} D_k, \theta_i, \varepsilon_i \Big). \tag{79}
$$
Therefore, given $\sigma(A, \mathcal{H}, D_{[\tilde{n}]}, R_{[N]}, \theta_{[N]})$, the outcomes $Y_{[n]}$ are locally dependent.
Let
$$X_i := \frac{1}{\sqrt{V_N(w_N)}}\, w_N\big(i, D_{[\tilde n]}, R_{[N]}, \theta_{[n]}\big)\Big(Y_i - m\big(D_i, \sum_{k \in N_i} D_k, \theta_i\big)\Big). \tag{80}$$
Notice that by Assumption 3.1, similarly to the argument in Theorem 3.1, we have
$$E\big[X_i \mid \sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})\big] = 0. \tag{81}$$
To prove the theorem we invoke Lemma C.1. In particular, we observe that for $Z \sim \mathcal{N}(0,1)$,
$$\sup_{x \in \mathbb{R}} \Big| P\Big(\sum_{i : R_i = 1} X_i \le x \,\Big|\, \sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})\Big) - \Phi(x) \Big| \le c \sqrt{d_W\Big(\sum_{i : R_i = 1} X_i,\, Z\Big)}, \tag{82}$$
where $d_W(\sum_{i : R_i = 1} X_i, Z)$ denotes the Wasserstein metric taken with respect to the conditional distribution of $\sum_{i : R_i = 1} X_i$ given $\sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})$, $\Phi(x)$ is the CDF of a standard normal distribution, and $c < \infty$ is a universal constant. To apply Lemma C.1 we take $\sigma = 1$, since $X_i$ already contains the rescaling factor defined in Lemma C.1. In addition, since $n V_N(w_N)$ is strictly bounded away from zero, we obtain under Assumption 4.2
$$E\big[|X_i|^3 \mid \sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})\big] \le \frac{\bar C}{n^{3/2}}, \qquad E\big[X_i^4 \mid \sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})\big] \le \frac{\bar C}{n^{2}}. \tag{83}$$
Therefore, the conditions in Lemma C.1 are satisfied, and we obtain
$$d_W\Big(\sum_{i : R_i = 1} X_i,\, Z\Big) \le N_n^2 \sum_{i : R_i = 1} E\big[|X_i|^3 \mid D_{[\tilde n]}, R_{[N]}, A, \mathcal{H}, \theta_{[N]}\big] + \frac{\sqrt{28}\, N_n^{3/2}}{\sqrt{\pi}} \sqrt{\sum_{i : R_i = 1} E\big[X_i^4 \mid R_{[N]}, A, D_{[\tilde n]}, \mathcal{H}, \theta_{[N]}\big]} \le \frac{N_n^2\, \bar C}{n^{1/2}} + \frac{\sqrt{28}\, N_n^{3/2}\, \bar C}{\sqrt{\pi}\, n^{1/2}} \tag{84}$$
for a universal constant $\bar C < \infty$, where $d_W$ is taken conditionally on $\sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})$.
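The bound in (84) has the shape of the dependency-graph Wasserstein bound in Ross et al. (2011) (with $\sigma = 1$): a third-moment term scaled by the squared degree bound plus a fourth-moment term. Plugging in moment rates of the form $E|X_i|^3 \le \bar C / n^{3/2}$ and $E X_i^4 \le \bar C / n^2$, both terms decay at rate $n^{-1/2}$ when the maximum degree is bounded. A numerical sketch (the constant $\bar C$, the degree, and the rates are assumptions of this illustration):

```python
import math

def stein_bound(n, max_deg, C_bar=1.0):
    """Dependency-graph Wasserstein bound, D^2 * sum_i E|X_i|^3
    + sqrt(28/pi) * D^(3/2) * sqrt(sum_i E X_i^4), evaluated under the
    assumed moment rates E|X_i|^3 <= C/n^(3/2) and E X_i^4 <= C/n^2."""
    D = max_deg + 1                                  # degree constant of the graph
    term_third = D**2 * n * C_bar / n**1.5           # = D^2 * C / sqrt(n)
    term_fourth = math.sqrt(28 / math.pi) * D**1.5 * math.sqrt(n * C_bar / n**2)
    return term_third + term_fourth

# With a bounded maximum degree the bound shrinks like 1/sqrt(n):
bounds = [stein_bound(n, max_deg=3) for n in (100, 10_000, 1_000_000)]
```

Each hundredfold increase in $n$ divides the bound by ten, consistent with the requirement $N_n^2 / n^{1/2} = o(1)$ used below.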
Since $N_n^2 / n^{1/2} = o(1)$, we obtain
$$\sup_{x \in \mathbb{R}} \Big| P\Big(\sum_{i : R_i = 1} X_i \le x \,\Big|\, \sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})\Big) - \Phi(x) \Big| \le \sqrt{\frac{N_n^2\, \bar C}{n^{1/2}} + \frac{\sqrt{28}\, N_n^{3/2}\, \bar C}{\sqrt{\pi}\, n^{1/2}}} = o(1), \tag{85}$$
where the latter result holds since the conditions in Lemma C.1 are satisfied pointwise for any $w_N \in \mathcal{W}_N$, and by the properties of the Wasserstein metric. To show that the result also holds unconditionally, notice that for an arbitrary measure $\mu_N$,
$$\sup_{x \in \mathbb{R}} \Big| \int P\Big(\sum_{i : R_i = 1} X_i \le x \,\Big|\, \sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})\Big)\, d\mu_N - \Phi(x) \Big| \le \sup_{x \in \mathbb{R}} \int \Big| P\Big(\sum_{i : R_i = 1} X_i \le x \,\Big|\, \sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})\Big) - \Phi(x) \Big|\, d\mu_N \le \int \sup_{x \in \mathbb{R}} \Big| P\Big(\sum_{i : R_i = 1} X_i \le x \,\Big|\, \sigma(D_{[\tilde n]}, A, R_{[N]}, \mathcal{H}, \theta_{[N]})\Big) - \Phi(x) \Big|\, d\mu_N = o(1). \tag{86}$$
This concludes the proof.

Corollary.
Theorem A.2 holds.

Proof.
The proof follows similarly to the theorem above, with one important modification. We observe that the variables $X_i$ in Equation (80) do not form a dependency graph, since they exhibit $M$-degree dependence. Instead, we construct a graph in which two individuals are connected if they are linked by a path of at most $M$ edges in the original graph. In this graph, the variables $X_i$ defined in Equation (80) satisfy the local dependence assumption in Lemma C.1. In order for the lemma to apply, we need to show that the maximum degree of this graph, denoted $\bar N_M$, satisfies $\bar N_M^2 / n^{1/2} = o(1)$. This follows under Assumption A.1, since the maximum degree is uniformly bounded. This completes the proof.

Theorem E.2.
Let Assumptions 2.1, 3.1, 4.2, and 4.3 hold. Then for all $w_N \in \mathcal{W}_N$,
$$\frac{V_N(w_N)}{\hat V_n(w_N)} \to_p 1. \tag{87}$$

Proof of Theorem E.2.
First, notice that under Assumptions 2.1 and 3.1, Lemma 3.2 holds, and therefore the conditional variance can be written as a function of $\sigma(\cdot)$ and $\eta(\cdot)$. Next, we prove consistency pointwise for each element in $\mathcal{W}_n$. Throughout the proof we denote $\eta(i,j) = \eta\big(\theta_i, D_i, \sum_{k \in N_i} D_k, \theta_j, D_j, \sum_{k \in N_j} D_k\big)$ and $\sigma(i) = \sigma\big(\theta_i, D_i, \sum_{k \in N_i} D_k\big)$. For notational convenience, we write $w_N(i, \cdot)$, omitting the last arguments when clear from the context. We have
$$\big| n V_N(w_N) - n \hat V_n(w_N) \big| \le \underbrace{\Big| \frac{1}{n} \sum_{i : R_i = 1} w_N^2(i)\,\big(\hat\sigma(i) - \sigma(i)\big) \Big|}_{(a)} + \underbrace{\Big| \frac{1}{n} \sum_{i : R_i = 1} \sum_{j \in N_i} w_N(i)\, w_N(j)\,\big(\hat\eta(i,j) - \eta(i,j)\big) \Big|}_{(b)}. \tag{88}$$
Consider first term $(a)$. We can write
$$(a) \le \max_{o \in [n]} w_N^2(o)\, \frac{1}{n} \sum_{i : R_i = 1} \big| \hat\sigma(i) - \sigma(i) \big| = o_p(1). \tag{89}$$
Consider now the covariance component. We have
$$(b) \le \max_{o \in [n]} |w_N(o)|\, \frac{1}{n} \sum_{i : R_i = 1} \Big| \sum_{j \in N_i} w_N(j)\,\big(\hat\eta(i,j) - \eta(i,j)\big) \Big| =: (J). \tag{90}$$
We have
$$(J) \le \max_{o \in [n]} |w_N(o)|\, \frac{1}{n} \sum_{i : |N_i| \le L} \Big| \sum_{j \in N_i} w_N(j)\,\big(\hat\eta(i,j) - \eta(i,j)\big) \Big| + \max_{o \in [n]} |w_N(o)|\, \frac{1}{n} \sum_{i : |N_i| > L} \Big| \sum_{j \in N_i} w_N(j)\,\big(\hat\eta(i,j) - \eta(i,j)\big) \Big|. \tag{91}$$
By Hölder's inequality and Assumption 4.2,
$$\max_{o \in [n]} |w_N(o)|\, \frac{1}{n} \sum_{i : |N_i| \le L} \Big| \sum_{j \in N_i} w_N(j)\,\big(\hat\eta(i,j) - \eta(i,j)\big) \Big| \le \frac{L \bar C}{n} \sum_{i : |N_i| \le L} \max_j \big| \hat\eta(i,j) - \eta(i,j) \big| = o_p(1), \tag{92}$$
where the last equality follows by Assumption 4.2, for a constant $\bar C$. The second component reads as follows:
$$\max_{o \in [n]} |w_N(o)|\, \frac{1}{n} \sum_{i : |N_i| > L} \Big| \sum_{j \in N_i} w_N(j)\,\big(\hat\eta(i,j) - \eta(i,j)\big) \Big| \le \frac{\bar C N_n}{n} \sum_{i : |N_i| > L} \max_j \big| \hat\eta(i,j) - \eta(i,j) \big|. \tag{93}$$
By Assumption 4.3, we have
$$\frac{\bar C N_n}{n} \sum_{i : |N_i| > L} \max_j \big| \hat\eta(i,j) - \eta(i,j) \big| \le O_p(1)\, \frac{N_n\, n^{1/2}}{n} = o_p(1). \tag{94}$$
Here $\max_j |\hat\eta(i,j) - \eta(i,j)| = O_p(1)$, since $\hat\eta$ converges uniformly to $\eta$. Uniform consistency over $\mathcal{W}_n$ then follows from the union bound, since $\mathcal{W}_n$ contains finitely many elements. The proof is complete by the fact that $n V_N(w_N) > 0$.

Corollary.
Theorem 4.2 holds.

Proof.
The proof follows from Theorem E.1 and Theorem E.2 by Slutsky's theorem.
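The core of the consistency argument above is an $\ell_1$–$\ell_\infty$ (Hölder) bound: the plug-in variance error is controlled by the largest weight times the summed estimation error of the moment functions. A minimal numerical sketch of that step (the weights, the function values, and the estimation errors below are illustrative stand-ins, not the paper's estimators):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
w = rng.uniform(0.5, 1.5, size=n) / n          # weights of order 1/n
sigma = rng.uniform(1.0, 2.0, size=n)          # "true" variance-function values
sigma_hat = sigma + rng.normal(0.0, 0.1, n)    # noisy plug-in estimates

# Term (a) of the decomposition: the weighted plug-in error is bounded by
# the largest weight times the total absolute estimation error (Holder bound).
term_a = abs(np.sum(w * (sigma_hat - sigma)))
holder_bound = w.max() * np.sum(np.abs(sigma_hat - sigma))
```

When the estimated moment function is uniformly consistent, the right-hand side is $o_p(1)$, which is exactly how terms $(a)$ and $(J)$ are handled in the proof.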
F Proof of Theorem 4.1
Proof.
First, notice that under Assumptions 2.1 and 3.2, Lemma 3.2 holds, and therefore the conditional variance can be written as a function of $\sigma(\cdot)$ and $\eta(\cdot)$. Recall in addition that the weights of units not in the experiment (i.e., with $R_i = 0$) are equal to zero; in such cases we only consider the sub-sample of participants. Throughout the proof, for arbitrary $D^*, R^*$, we denote by
$$\hat V_{n,p}(D^*_{[\tilde n]}, R^*_{[N]}) = \max_{w_N \in \mathcal{W}_N} \hat V_{n,p}(w_N; D^*_{[\tilde n]}, R^*_{[N]}, \theta_{[N]}, A)$$
the maximum variance over $\mathcal{W}_N$, with the variance and covariance functions estimated from the pilot experiment, and by $V_N(D^*_{[\tilde n]}, R^*_{[N]})$ its population counterpart. For notational convenience we write $w_N(i, \cdot)$, omitting the last arguments, whenever clear from the context. Let
$$(\tilde D_{[\tilde n]}, \tilde R_{[N]}) \in \arg\min_{\substack{D_{[\tilde n]}, R_{[N]} \\ \alpha n \le \sum_{i=1}^N R_i \le n,\; R_j = 0\, \forall j \in \mathcal{J}}} V_N(D_{[\tilde n]}, R_{[N]}) \tag{95}$$
denote the optimal assignments for known variance and covariance functions under the constraint on the pilot units, and let $(D_{[\tilde n]}, R_{[N]})$ denote the assignments that solve the experimenter's problem in Equation (23). Then we have
$$\begin{aligned} R_N &= V_N(D_{[\tilde n]}, R_{[N]}) - \min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n + |\mathcal{J}| \le \sum_{i=1}^N R^*_i \le n}} V_N(D^*_{[\tilde n]}, R^*_{[N]}) \\ &\le \underbrace{\big( V_N(D_{[\tilde n]}, R_{[N]}) - \hat V_{n,p}(D_{[\tilde n]}, R_{[N]}) \big)}_{(i)} + \underbrace{\big( \hat V_{n,p}(\tilde D_{[\tilde n]}, \tilde R_{[N]}) - V_N(\tilde D_{[\tilde n]}, \tilde R_{[N]}) \big)}_{(ii)} + \underbrace{\Big( V_N(\tilde D_{[\tilde n]}, \tilde R_{[N]}) - \min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n + |\mathcal{J}| \le \sum_{i=1}^N R^*_i \le n}} V_N(D^*_{[\tilde n]}, R^*_{[N]}) \Big)}_{(iii)}, \end{aligned} \tag{96}$$
where we added and subtracted $\hat V_{n,p}(D_{[\tilde n]}, R_{[N]})$ and $V_N(\tilde D_{[\tilde n]}, \tilde R_{[N]})$, and used that $\hat V_{n,p}(D_{[\tilde n]}, R_{[N]}) \le \hat V_{n,p}(\tilde D_{[\tilde n]}, \tilde R_{[N]})$, since $(D_{[\tilde n]}, R_{[N]})$ minimizes $\hat V_{n,p}$ over a feasible set containing $(\tilde D_{[\tilde n]}, \tilde R_{[N]})$. We study each component separately.
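The decomposition in (96) is the usual oracle-inequality step: a design chosen by minimizing the estimated variance incurs regret bounded by (twice) the sup-norm estimation error, plus the approximation term $(iii)$. A minimal numerical sketch over a hypothetical finite design space (with matched feasible sets, so $(iii)$ vanishes; all quantities are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 50
V_true = rng.uniform(1.0, 3.0, size=K)             # true variance of each design
V_hat = V_true + rng.uniform(-0.2, 0.2, size=K)    # pilot-based estimates

chosen = int(np.argmin(V_hat))                     # minimize the estimated objective
regret = V_true[chosen] - V_true.min()

# Oracle inequality: V_true[chosen] <= V_hat[chosen] + err <= V_hat[best] + err
#                    <= V_true[best] + 2 * err, with err the sup-norm error.
sup_err = float(np.max(np.abs(V_hat - V_true)))
```

Terms $(i)$ and $(ii)$ in (96) play the role of the two `err` contributions; the proof additionally tracks $(iii)$ because the pilot units constrain the feasible set.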
We can write
$$(i) \le \frac{1}{n} \sum_{i=1}^N w_N^{*2}(i)\, R_i \Big( \sigma\big(\theta_i, D_i, \sum_{k \in N_i} D_k\big) - \hat\sigma_p\big(\theta_i, D_i, \sum_{k \in N_i} D_k\big) \Big) + \frac{1}{n} \sum_{i=1}^N \sum_{j \in N_i} w_N^*(i)\, w_N^*(j)\, R_i R_j \Big( (\eta - \hat\eta_p)\big(\theta_i, D_i, \sum_{k \in N_i} D_k, \theta_j, D_j, \sum_{k \in N_j} D_k\big) \Big), \tag{97}$$
where
$$w_N^* \in \arg\max_{w_N \in \mathcal{W}_N} \frac{1}{n} \sum_{i=1}^N w_N^2(i)\, R_i\, \sigma\big(\theta_i, D_i, \sum_{k \in N_i} D_k\big) + \frac{1}{n} \sum_{i=1}^N \sum_{j \in N_i} w_N(i)\, w_N(j)\, R_i R_j\, \eta\big(\theta_i, D_i, \sum_{k \in N_i} D_k, \theta_j, D_j, \sum_{k \in N_j} D_k\big). \tag{98}$$
Here, for notational convenience, $(\eta - \hat\eta_p)\big(\theta_i, D_i, \sum_{k \in N_i} D_k, \theta_j, D_j, \sum_{k \in N_j} D_k\big)$ denotes the difference between the two functions, both evaluated at $\big(\theta_i, D_i, \sum_{k \in N_i} D_k, \theta_j, D_j, \sum_{k \in N_j} D_k\big)$. Therefore, we obtain
$$(i) \le \underbrace{\max_{w_N \in \mathcal{W}_N} \Big| \frac{1}{n} \sum_{i=1}^N w_N^2(i)\, R_i \Big( \sigma\big(\theta_i, D_i, \sum_{k \in N_i} D_k\big) - \hat\sigma_p\big(\theta_i, D_i, \sum_{k \in N_i} D_k\big) \Big) \Big|}_{(I)} + \underbrace{\max_{w_N \in \mathcal{W}_N} \Big| \frac{1}{n} \sum_{i=1}^N \sum_{j \in N_i} w_N(i)\, w_N(j)\, R_i R_j \Big( (\eta - \hat\eta_p)\big(\theta_i, D_i, \sum_{k \in N_i} D_k, \theta_j, D_j, \sum_{k \in N_j} D_k\big) \Big) \Big|}_{(II)}. \tag{99}$$
The above term satisfies
$$(99) \lesssim \frac{N_n}{\alpha}\, \sup_{d,s,l,d',s',l'} \big| \eta(d,s,l,d',s',l') - \hat\eta_p(d,s,l,d',s',l') \big| + \frac{1}{\alpha}\, \sup_{d,s,l} \big| \sigma(l,d,s) - \hat\sigma_p(l,d,s) \big|, \tag{100}$$
using that the weights are uniformly bounded and that at least $\alpha n$ units participate. The same reasoning also applies to term $(ii)$. Finally, consider term $(iii)$. Let
$$(\hat D_{[\tilde n]}, \hat R_{[N]}) \in \arg\min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n + |\mathcal{J}| \le \sum_{i=1}^N R^*_i \le n}} V_N(D^*_{[\tilde n]}, R^*_{[N]}).$$
For notational convenience, define
$$\sigma(i, \hat D, A) = \sigma\big(\theta_i, \hat D_i, \sum_{k \in N_i} \hat D_k\big) \tag{101}$$
and similarly for $\eta(i, j, \hat D, A)$. Let $\mathcal{J} = \mathcal{I} \cup \bigcup_{j \in \mathcal{I}} N_j$ and $\mathcal{J}^c = [N] \setminus \mathcal{J}$. By the definition of $(\hat D, \hat R)$, we can write
$$\begin{aligned} &\max_{w_N \in \mathcal{W}_N} \frac{1}{\sum_{i=1}^N \hat R_i} \sum_{i=1}^N \Big( \hat R_i\, w_N^2(i, \hat D, \hat R)\, \sigma(i, \hat D, A) + \sum_{j \in N_i} \hat R_i \hat R_j\, w_N(i, \hat D, \hat R)\, w_N(j, \hat D, \hat R)\, \eta(i, j, \hat D, A) \Big) \\ &= \min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n + |\mathcal{J}| \le \sum_{i=1}^N R^*_i \le n}} \max_{w_N \in \mathcal{W}_N} \bigg( \frac{1}{\sum_{i=1}^N R^*_i} \sum_{i \in \mathcal{J}^c} \Big( R^*_i\, w_N^2(i, D^*, R^*)\, \sigma(i, D^*, A) + \sum_{j \in N_i \setminus \mathcal{J}} R^*_i R^*_j\, w_N(i, D^*, R^*)\, w_N(j, D^*, R^*)\, \eta(i, j, D^*, A) \Big) \\ &\qquad + \frac{1}{\sum_{i=1}^N R^*_i} \sum_{i \in \mathcal{J}} \Big( R^*_i\, w_N^2(i, D^*, R^*)\, \sigma(i, D^*, A) + \sum_{j \in N_i} R^*_i R^*_j\, w_N(i, D^*, R^*)\, w_N(j, D^*, R^*)\, \eta(i, j, D^*, A) \Big) \bigg). \end{aligned} \tag{102}$$
Notice now that the term
$$\frac{1}{\sum_{i=1}^N R^*_i} \sum_{i \in \mathcal{J}} \Big( R^*_i\, w_N^2(i, D^*, R^*)\, \sigma(i, D^*, A) + \sum_{j \in N_i} R^*_i R^*_j\, w_N(i, D^*, R^*)\, w_N(j, D^*, R^*)\, \eta(i, j, D^*, A) \Big) \ge \frac{1}{\sum_{i=1}^N R^*_i} \sum_{i \in \mathcal{J}} \sum_{j \in N_i} R^*_i R^*_j\, w_N(i, D^*, R^*)\, w_N(j, D^*, R^*)\, \eta(i, j, D^*, A) \ge -\frac{\bar C\, |\mathcal{J}| \max_{i \in \mathcal{J}} |N_i|}{\alpha n + |\mathcal{J}|} \tag{103}$$
for all $R^*, D^*$ satisfying the above constraints, since the variance terms are non-negative and the second moments are bounded by Assumption 4.2, for a universal constant $\bar C < \infty$. Therefore, the following holds:
$$(102) \ge \min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n + |\mathcal{J}| \le \sum_{i=1}^N R^*_i \le n}} \bigg( \max_{w_N \in \mathcal{W}_N} \frac{1}{\sum_{i=1}^N R^*_i} \sum_{i \in \mathcal{J}^c} \Big( R^*_i\, w_N^2(i, D^*, R^*)\, \sigma(i, D^*, A) + \sum_{j \in N_i \setminus \mathcal{J}} R^*_i R^*_j\, w_N(i, D^*, R^*)\, w_N(j, D^*, R^*)\, \eta(i, j, D^*, A) \Big) \bigg) - \frac{\bar C\, |\mathcal{J}| \max_{i \in \mathcal{J}} |N_i|}{\alpha n + |\mathcal{J}|}. \tag{104}$$
In the above expression, neither the variance nor the covariance components of the units in $\mathcal{J}$ appear.
Instead, the decision variables of the units in $\mathcal{J}$ affect the objective function only through the constraint and the denominator. The next step is to consider an optimization problem with a weaker constraint, whose value is a lower bound on the above objective function. Since $R^*_i \in \{0, 1\}$, the constraint
$$\alpha n + |\mathcal{J}| \le \sum_{i=1}^N R^*_i = \sum_{i \in \mathcal{J}} R^*_i + \sum_{i \in \mathcal{J}^c} R^*_i \le n \tag{105}$$
is stricter than the constraint
$$\alpha n \le \sum_{i \in \mathcal{J}^c} R^*_i \le n, \tag{106}$$
since $|\mathcal{J}| \ge \sum_{i \in \mathcal{J}} R^*_i \ge 0$. Hence
$$(104) \ge \min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n \le \sum_{i \in \mathcal{J}^c} R^*_i \le n}} \bigg( \max_{w_N \in \mathcal{W}_N} \frac{1}{\sum_{i=1}^N R^*_i} \sum_{i \in \mathcal{J}^c} \Big( R^*_i\, w_N^2(i, D^*, R^*)\, \sigma(i, D^*, A) + \sum_{j \in N_i \setminus \mathcal{J}} R^*_i R^*_j\, w_N(i, D^*, R^*)\, w_N(j, D^*, R^*)\, \eta(i, j, D^*, A) \Big) \bigg) - \frac{\bar C\, |\mathcal{J}| \max_{i \in \mathcal{J}} |N_i|}{\alpha n + |\mathcal{J}|}. \tag{107}$$
In the above expression we relaxed the constraint by allowing the decision variables of the units in $\mathcal{J}$ to be unconstrained. Since such variables affect the above expression only through the denominator, the value of the above problem satisfies
$$(107) \ge \min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n \le \sum_{i \in \mathcal{J}^c} R^*_i \le n}} \bigg( \max_{w_N \in \mathcal{W}_N} \frac{1}{\sum_{i \in \mathcal{J}^c} R^*_i + |\mathcal{J}|} \sum_{i \in \mathcal{J}^c} \Big( R^*_i\, w_N^2(i, D^*, R^*)\, \sigma(i, D^*, A) + \sum_{j \in N_i \setminus \mathcal{J}} R^*_i R^*_j\, w_N(i, D^*, R^*)\, w_N(j, D^*, R^*)\, \eta(i, j, D^*, A) \Big) \bigg) - \frac{\bar C\, |\mathcal{J}| \max_{i \in \mathcal{J}} |N_i|}{\alpha n + |\mathcal{J}|}. \tag{108}$$
Notice now that the solution to Equation (108) satisfies the constraints imposed in the optimization problem in Equation (95).
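The step from (105) to (106) uses only that enlarging the feasible set weakly decreases a minimum: any assignment satisfying the stricter constraint also satisfies the relaxed one, so the relaxed problem's value is a lower bound. Schematically (with an arbitrary objective over a hypothetical finite set of assignments):

```python
import numpy as np

rng = np.random.default_rng(3)
V = rng.uniform(size=200)                # arbitrary objective over 200 assignments

# The stricter constraint picks out a subset of the assignments allowed by
# the relaxed constraint, so minimizing over the larger set can only give a
# smaller (or equal) value.
strict_set = [i for i in range(200) if 50 <= i <= 120]
relaxed_set = [i for i in range(200) if 25 <= i <= 150]  # contains strict_set

min_strict = min(V[i] for i in strict_set)
min_relaxed = min(V[i] for i in relaxed_set)
```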
Therefore, we obtain that the following two inequalities hold:
$$\min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n + |\mathcal{J}| \le \sum_{i=1}^N R^*_i \le n}} V_N(D^*_{[\tilde n]}, R^*_{[N]}) \ge V_N(\tilde D^{**}_{[\tilde n]}, \tilde R^{**}_{[N]}), \qquad V_N(\tilde D_{[\tilde n]}, \tilde R_{[N]}) \le V_N(\tilde D^{**}_{[\tilde n]}, \tilde R^{**}_{[N]}), \tag{109}$$
where
$$(\tilde D^{**}_{[\tilde n]}, \tilde R^{**}_{[N]}) \in \arg\min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n \le \sum_{i \in \mathcal{J}^c} R^*_i \le n}} \bigg( \max_{w_N \in \mathcal{W}_N} \frac{1}{\sum_{i \in \mathcal{J}^c} R^*_i + |\mathcal{J}|} \sum_{i \in \mathcal{J}^c} \Big( R^*_i\, w_N^2(i, D^*, R^*)\, \sigma(i, D^*, A) + \sum_{j \in N_i \setminus \mathcal{J}} R^*_i R^*_j\, w_N(i, D^*, R^*)\, w_N(j, D^*, R^*)\, \eta(i, j, D^*, A) \Big) \bigg)$$
is the solution to Equation (108). Combining the above bounds, it follows that
$$\begin{aligned} &V_N(\tilde D_{[\tilde n]}, \tilde R_{[N]}) - \min_{\substack{D^*_{[\tilde n]}, R^*_{[N]} \\ \alpha n + |\mathcal{J}| \le \sum_{i=1}^N R^*_i \le n}} V_N(D^*_{[\tilde n]}, R^*_{[N]}) \\ &\le V_N(\tilde D^{**}_{[\tilde n]}, \tilde R^{**}_{[N]}) - \bigg( \max_{w_N \in \mathcal{W}_N} \frac{1}{\sum_{i \in \mathcal{J}^c} \tilde R^{**}_i + |\mathcal{J}|} \sum_{i \in \mathcal{J}^c} \Big( \tilde R^{**}_i\, w_N^2(i, \tilde D^{**}, \tilde R^{**})\, \sigma(i, \tilde D^{**}, A) + \sum_{j \in N_i \setminus \mathcal{J}} \tilde R^{**}_i \tilde R^{**}_j\, w_N(i, \tilde D^{**}, \tilde R^{**})\, w_N(j, \tilde D^{**}, \tilde R^{**})\, \eta(i, j, \tilde D^{**}, A) \Big) \bigg) + \frac{\bar C\, |\mathcal{J}| \max_{i \in \mathcal{J}} |N_i|}{\alpha n + |\mathcal{J}|}. \end{aligned} \tag{110}$$
By simple algebra, and using for the weights the same argument used for term $(i)$, the right-hand side of Equation (110) is bounded, by Assumption 4.2, as follows:
$$(110) \le \bar C\, \frac{N_n\, |\mathcal{J}| + |\mathcal{J}|}{\sum_{i=1}^N \tilde R^{**}_i} + \frac{\bar C\, |\mathcal{J}| \max_{i \in \mathcal{J}} |N_i|}{\alpha n + |\mathcal{J}|} \le \bar C\, \frac{N_n\, |\mathcal{J}| + |\mathcal{J}|}{\alpha n} + \frac{\bar C\, |\mathcal{J}| \max_{i \in \mathcal{J}} |N_i|}{\alpha n + |\mathcal{J}|} \tag{111}$$
for a universal constant $\bar C < \infty$, after basic rearrangement. Notice now that $|\mathcal{J}| \le (1 + \max_{i \in \mathcal{I}} |N_i|) \times m$, which completes the proof.

G Optimization: MILP for Difference in Means Estimators