Elements of estimation theory for causal effects in the presence of network interference
Daniel L. Sussman ∗ , Edoardo M. Airoldi † February 14, 2017
Abstract
Randomized experiments in which the treatment of a unit can affect the outcomes of other units are becoming increasingly common in healthcare, economics, and in the social and information sciences. From a causal inference perspective, the typical assumption of no interference becomes untenable in such experiments. In many problems, however, the patterns of interference may be informed by the observation of network connections among the units of analysis. Here, we develop elements of optimal estimation theory for causal effects leveraging an observed network, by assuming that the potential outcomes of an individual depend only on the individual's treatment and on the treatment of the neighbors. We propose a collection of exclusion restrictions on the potential outcomes, and show how subsets of these restrictions lead to various parameterizations. Considering the class of linear unbiased estimators of the average direct treatment effect, we derive conditions on the design that lead to the existence of unbiased estimators, and offer analytical insights on the weights that lead to minimum integrated variance estimators. We illustrate the improved performance of these estimators when compared to more standard biased and unbiased estimators, using simulations.
Keywords: causal inference, network data, interference, inferential targets, exclusion restrictions, general additive models, unbiased estimation, minimum integrated variance
* Boston University, Department of Mathematics & Statistics, Boston, MA, [email protected]
† Harvard University, Department of Statistics, Cambridge, MA, [email protected]

1 Introduction

Social scientists, businesses, public health researchers and economists frequently seek to perform experiments in environments where it may be prohibitive or undesirable to restrict interactions between individuals in the study [Gui et al., 2015]. Experiments in these contexts present challenges and opportunities, as the causal effects of treatments need not be confined to the treated individual. A core assumption in causal inference, that of no interference [Cox, 1958, Rubin, 1974, Manski, 1990], frequently cannot be made for such studies.

In the presence of interference, two key questions arise: (i) Which units' treatments can affect which units' outcomes? (ii) How can treatments affect each unit's outcomes? Fortunately, in some instances the answer to the first question is partially exposed by an observed social network, whether based on an online user base [Bond et al., 2012, Gui et al., 2015] or collected as part of a study [McQueen et al., 2015]. Provided the network is an accurate depiction of which individuals interact with each other, a tempting and often sensible tack is for an experimenter to assume that the transmission of treatment effects must respect the structure of the network.

For the second question, various properties of the treatments, outcomes and the social network itself may be used to justify further assumptions regarding the nature of how outcomes relate to treatments. Furthermore, making additional assumptions may enlarge the set of reasonable analytic tools sufficiently to allow for variance reduction in exchange for a small increase in bias.

We consider a set of core assumptions for interference when an observed network illuminates the structure of this interference.
Combinations of these core assumptions lead to a variety of parametric forms for the potential outcomes, which serve to decompose treatment effects as either direct, indirect or interaction effects. From a randomization inference perspective, we consider the set of unbiased estimators of the direct treatment effect, deriving conditions on the experimental design that ensure linear unbiased estimators exist. We also propose a novel optimality criterion for this setting which draws on Bayesian ideas, deriving minimum integrated variance unbiased estimators. We demonstrate that using appropriate distributions for the parameters leads to improved performance over more standard methods for constructing unbiased estimators, as well as over naive biased estimators.

In Section 2, we review causal inference with interference and networks, and we introduce the potential outcomes framework. In Section 3, we consider one core and four structural assumptions regarding network interference, and in Section 4 we show how these assumptions lead to parameterizations of the potential outcomes which clarify relevant estimands. As many of the ideas can be translated between the various sets of assumptions, much of the focus of this manuscript is on a particular parameterization under which two of the four structural assumptions hold. In Section 5 we consider the set of unbiased estimators of the direct treatment effect, as characterized by collections of linear equations, and in Section 6 we propose a method to choose among this set of estimators by optimizing a quadratic objective subject to the linear constraints for unbiasedness. Finally, in Section 7 we perform simulations to illustrate the effectiveness of our methods, and in Section 8 we discuss the ideas proposed and future directions.

2 Background
The assumption of no interference is at the core of causal inference [Cox, 1958, Manski, 1990], as it is a key part of the stable unit treatment value assumption [Rubin, 1974]. To relax this assumption, researchers have proposed different answers to the question of which units can interfere with which other units, based on the type of experiment performed. Early forms of interference included interference localized to an individual across different rounds of treatment, such as in clinical trials with crossover designs [Grizzle, 1965]. Interference based on spatial proximity of treated units also received relatively early attention [Kempton and Lockwood, 1984]. More recently, interference confined to blocks, where interference is allowed within blocks but not between blocks, has been studied [Hudgens and Halloran, 2012]. General interference has been studied occasionally [Rosenbaum, 2007], but allowing any unit's treatment to affect any unit's outcome typically makes design and analysis major challenges.

With the rise of interest in social networks, many authors have proposed that an observed network of interactions or relationships between units may inform how causal effects can be transmitted between units [Ugander et al., 2013, Eckles et al., 2014, Athey et al., 2015]. An assumption that has gained traction for causal inference on networks is that causal effects can be passed along edges in the network, so that units may experience an effect from the treatment of other nodes that are in some way proximal in the network. Observation of the network itself does not determine which units interfere with each other, but typically the network structure drives key assumptions about the interference. One form of these assumptions is that the outcome of unit i depends only on the treatments of units which can be connected to i by a path through the network of length at most k, for some k [Athey et al., 2015].
The strongest and most studied assumption is that units' outcomes are only impacted by their neighbors in the network, which will be the focus of this manuscript.

In the presence of network interference, the experimental design and the analysis of the outcomes present distinct challenges. Efforts to design experiments which take the structure of interference into account include block and neighbor designs [David and Kempton, 1996], graph cluster designs [Ugander et al., 2013], nomination designs, high-degree designs [Kim et al., 2015], peer encouragement designs [Eckles et al., 2016] and model-based designs [Basse and Airoldi, 2015]. Many of these can be viewed as versions of classical blocking designs in the network context, while others are based on how individuals communicate within the network. Whether using a design which takes into account the network structure, or a standard design such as a Bernoulli trial, the subsequent analysis should take into account the structure of the interference.

Our goals are to build estimators that are unbiased and in some sense optimal for estimands under a variety of network interference assumptions. While unbiased estimation is not necessary for all scenarios and can often be traded off for a substantial variance reduction, understanding the structure of unbiased estimators can be useful on its own and can also illuminate how accepting small amounts of bias can lead to even better estimators.

Aronow and Samii [2013] proposed a method to create unbiased estimators of the average outcome for any given exposure to treatment under interference, where exposure levels are defined as a subset of treatment allocations for each unit. As with Horvitz-Thompson estimation, these estimators rely on deriving the probability of a given exposure based on the design, known as the propensity score, and using inverse propensity score weighting of the observed outcomes.
While this method is generally applicable and easily computed, it is far from the only way to construct unbiased estimators, and one of our goals is to investigate the questions: (i) when do unbiased estimators exist? (ii) what is the set of unbiased estimators for a given design, network, and interference assumptions? and (iii) how can we choose from among these estimators?

We concretely answer the first two questions and provide a criterion for answering the third. In particular, we propose finding the minimum integrated variance linear unbiased estimator with respect to a distribution on the potential outcomes. We will call this distribution a prior distribution, which can be viewed as an a priori weighting of the parameter values of interest. Minimum integrated variance unbiased estimators have not been studied extensively in the causal inference or statistics literature. One closely related area where these ideas have been investigated is the field of finite population inference [Ghosh and Meeden, 1997]. Minimum integrated variance designs have also been studied in the design of experiments literature, but those ideas seek a design that functions well for standard estimators, whereas our goal is the reverse [Box and Draper, 1959]. In analogy to results from finite population statistics, we will see in Section 6 that estimators similar to those proposed by Aronow and Samii [2013] do arise as minimum integrated variance solutions for certain prior distributions. However, in our simulation studies (see Section 7) we show that other prior distributions on the parameters generally lead to improved performance compared to inverse propensity score weighting.

We will now give an overview of the potential outcomes framework for causal inference, sometimes referred to as the Rubin causal model [Neyman, 1923, Rubin, 1974]. Denote the population of experimental units as [n] = {1, 2, ..., n}.
A treatment allocation vector, or simply an allocation, is a vector z ∈ Z_n = {0, 1}^n, where the i-th component z_i represents the treatment assigned to unit i. Our results extend in a straightforward way to finite treatment regimes, but as notation can become unwieldy, we consider the binary treatment regime. The outcome for unit i under allocation z is denoted as Y_i(z) ∈ R.

The design of a randomized trial is a probability distribution over all allocations, with probability mass function denoted as p : Z_n → [0, 1], and with the observed allocation z^obs distributed according to the design p. The observed data consist of the allocation z^obs and the outcomes Y_i^obs = Y_i(z^obs). The support of the design is the set of allocations z ∈ Z_n such that p(z) > 0. For any fixed allocation z, the potential outcomes for all units are non-random, meaning that all randomness in the observed data is due to the random selection of the treatment allocation z^obs. Hence, we use P and E to denote probability and expectation with respect to the design p. Occasionally, it will be useful to refer to the allocation where only unit j is in the treatment group and all others are in the control group; this allocation will be denoted as e_j.

If one can reasonably assume that the experimental units are not interacting, it is plausible to make the stable unit treatment value assumption (SUTVA) [Rubin, 1974]. Under SUTVA, for each unit i the outcome Y_i(z) depends on z only through z_i, so we can abuse notation and write Y_i(z) as Y_i(z_i). Hence, SUTVA effectively restricts the set of potential outcomes for each unit to Y_i(1) and Y_i(0). For any observed allocation z^obs, one observes n out of the 2n distinct potential outcomes, leaving n potential outcomes unobserved.
Much of the work under the Rubin causal model can be viewed as developing and studying methods to deal with this missing data problem.

Without making any assumptions about interference, we must view the potential outcome for each individual and each treatment allocation as possibly depending on the treatment assignments for all individuals [Rubin, 1978]. In this situation, the set of distinct potential outcomes explodes in cardinality to 2^n for each individual and n 2^n for the experimental population. We will still only observe n of these potential outcomes, and dealing with a missing data problem of this magnitude is unlikely to yield useful inferences about causal effects without further assumptions or restrictions. However, we need not restrict ourselves to SUTVA or arbitrary interference, and in the next section we will explore a collection of assumptions for the interference between units in a network.

The key data for constraining interference between units in our setting is an observed network of relationships between units that accounts for possible paths of interference. We will first introduce some necessary notation for the network setting.
The network will be represented by a binary adjacency matrix g = [g_ij] ∈ {0, 1}^{n×n}. The entry g_ij is 1 if there is an edge from unit i to unit j and 0 if there is no edge. For the purpose of identifiability, we assume that g_ii = 0. For generality, the network is directed, so that g_ij need not equal g_ji, though all results below apply in the undirected case where g_ij = g_ji for all i, j. We will say that unit j is a neighbor of unit i if g_ji = 1.

The neighborhood of unit i is the set of vertices with edges directed towards unit i. We denote the neighborhood by N_i = {j : g_ji = 1}. The degree of unit i is denoted as d_i = Σ_{j=1}^n g_ji = |N_i|. For unit i and allocation z, we denote the vector of treatments for the neighbors of unit i by z_{N_i} = (z_j)_{j ∈ N_i} ∈ {0, 1}^{d_i}. The set of treated neighbors of unit i is N_i^z = {j : g_ji = 1, z_j = 1}. We define the treated degree of unit i, denoted d_i^z, to be the number of treated neighbors of unit i, d_i^z = |N_i^z| = (g^T z)_i.

3.2 Core Assumption

While many assumptions could be made with regard to how causal effects can propagate through the network, perhaps the most concrete assumption is the following.
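The neighborhood quantities above can be computed directly from the adjacency matrix. A minimal sketch in Python, where the network g and allocation z are illustrative examples rather than data from the paper:

```python
import numpy as np

# Directed adjacency matrix g: g[i, j] = 1 means an edge from unit i to unit j,
# so the neighbors of unit i are the units j with g[j, i] = 1.
g = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 0]])
z = np.array([1, 0, 1, 0])  # treatment allocation

d = g.sum(axis=0)   # in-degrees d_i = |N_i|
d_z = g.T @ z       # treated degrees d_i^z = (g^T z)_i

neighbors = [np.flatnonzero(g[:, i]) for i in range(len(z))]  # N_i for each i
```

The treated degree is a single matrix-vector product, which is what makes the degree-based models below cheap to evaluate even on large networks.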
Definition 3.1 (Neighborhood Interference Assumption (NIA)). For a network g ∈ {0, 1}^{n×n}, the potential outcomes satisfy the neighborhood interference assumption for g if for each unit i ∈ [n] and all treatment allocations z, z' ∈ Z_n with z_j = z'_j for all j ∈ N_i ∪ {i}, it holds that Y_i(z) = Y_i(z').

NIA means that if the treatments of unit i and its neighbors are held fixed, then changing the treatments of other units does not change the outcome for unit i. Under NIA, we define an alternative function for the potential outcome in terms of the treatment of unit i and the treatments of its neighbors, which will help simplify exposition: let the function Ỹ_i : {0, 1} × {0, 1}^{d_i} → R satisfy Y_i(z) = Ỹ_i(z_i, z_{N_i}) for all allocations z.

Depending on the structure of the observed network, NIA can be a significant reduction in the set of distinct potential outcomes as compared to an unconstrained interference setting. In general, the number of distinct potential outcomes for each unit under NIA is at most 2^{d_i + 1}. If the network is empty, so that g_ij = 0 for all i, j, then NIA is equivalent to SUTVA, while if the network is complete, so that g_ij = 1 for all i ≠ j, then NIA allows for arbitrary interference. The results in this paper are best applied in an intermediate regime with sparse networks where 0 < d_i ≪ n, which is the setting for many real-world networks. Hence, conditioning on observing g pre-treatment, we can limit the set of distinct potential outcomes in a way that allows for the development of estimation theory.

Remark 3.2.
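The reduction NIA buys can be made concrete by counting: each unit has at most 2^{d_i + 1} distinct potential outcomes under NIA, versus 2^n per unit without restrictions. A small sketch on an illustrative 4-unit network:

```python
import numpy as np

g = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 0]])
n = g.shape[0]
d = g.sum(axis=0)                  # in-degrees d_i

per_unit_nia = 2 ** (d + 1)        # at most 2^(d_i + 1) outcomes per unit under NIA
total_nia = per_unit_nia.sum()     # total across the population
total_unrestricted = n * 2 ** n    # n * 2^n under arbitrary interference
```

Even on this tiny network the counts differ (20 versus 64); on sparse networks with d_i ≪ n the gap grows exponentially.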
The network g does not need to correspond to observed interactions between units. It is easy to apply NIA in settings where N_i is defined in other ways; for example, the neighbors could correspond to nearby units in space or to units sharing certain properties. Importantly, the network must be observed, or at least observable, before the experiment, and the network must be fixed and assumed not to be affected by treatment.

While assuming NIA answers the question of which units' treatments can affect which units' outcomes, the question of how the outcomes are affected remains open. In this section, we consider four structural assumptions that will more clearly restrict how outcomes are affected by neighbors' treatments. These can be combined in various ways to fit the different goals of the practitioner in restricting the set of potential outcomes. Two of the assumptions are related to symmetry of the interference effects, and the other two are related to additivity of effects.

Under no interference, the potential outcomes must satisfy the form Y_i(z_i) = α_i + β_i z_i, which provides a relationship between the potential outcomes framework and linear regression. Each combination of the assumptions below similarly implies a parametric form for the potential outcomes. We will illustrate the parameterizations for some combinations in Section 4.1, and the details for the remaining combinations are in Appendix A. For each of the following definitions we assume that NIA holds.

The first assumption is that there is no interaction between a unit's treatment and the treatments of a unit's neighbors.
Definition 3.3 (Additivity of Main Effects). The potential outcomes satisfy additivity of main effects if

Ỹ_i(z_i, z_{N_i}) = Ỹ_i(0, 0) + (Ỹ_i(z_i, 0) − Ỹ_i(0, 0)) + (Ỹ_i(0, z_{N_i}) − Ỹ_i(0, 0))   (3.1)

for all allocations z and all units i.

The first term in Eq. (3.1) is the baseline outcome under no treatment, the second term is the direct treatment effect, the usual object of study in the causal inference literature, and the third term is the interference effect.

The second assumption is that the interference effects are additive among the interfering units.
Definition 3.4 (Additivity of Interference Effects). The potential outcomes satisfy additivity of interference effects if

Ỹ_i(z_i, z_{N_i}) = Ỹ_i(z_i, 0) + Σ_{j=1}^{d_i} ( Ỹ_i(z_i, (z_{N_i})_j e_j) − Ỹ_i(z_i, 0) ),   (3.2)

for all allocations z and units i. Note that here e_j ∈ {0, 1}^{d_i} is zero in each coordinate except coordinate j.

The first term above is the outcome if no neighbors are treated, and each summand is the additional effect of each neighbor's treatment. As an example, this assumption can be verified to hold in the models proposed in Toulis and Kao [2013], and Eckles et al. [2016] use closely related assumptions.

The third assumption is that the impact of interference is invariant to which particular subset of neighbors is treated.
Definition 3.5 (Symmetrically Received Interference Effects). The potential outcomes have symmetrically received interference if

Ỹ_i(z_i, z_{N_i}) = Ỹ_i(z_i, τ(z_{N_i})),   (3.3)

for all allocations z, units i, and permutations τ of vectors of length d_i.

Equivalently, this means that Ỹ_i depends on its second argument only through the number of treated neighbors d_i^z. This is also referred to as anonymous interference [Manski, 2013] or no peer-effect-heterogeneity [Athey et al., 2015].

While the symmetrically received interference effects assumption asserts that the interference effects for unit i are equal across permutations of the neighbors' treatments, the symmetrically sent interference assumption asserts that units that share a neighbor are affected "equally" by that neighbor. In general, it is unclear what should be assumed equal, since there may be interaction effects between sets of neighbors. Under additivity of interference effects, however, we can assume that the equality is in terms of the differences between the relevant potential outcomes. In this manuscript, we will only assume symmetrically sent interference effects in conjunction with additivity of interference effects.

Definition 3.6 (Symmetrically Sent Interference Effects). If the potential outcomes satisfy additivity of interference effects (see Definition 3.4), then the potential outcomes have symmetrically sent interference if

Y_i(z + e_j) − Y_i(z) = Y_{i'}(z + e_j) − Y_{i'}(z),   (3.4)

for all units j and allocations z where z_j = 0, and all units i, i' where g_ji = g_ji' = 1, where e_j is the allocation in which only unit j is in the treatment group.

Again, this assumption can be verified to hold in certain models proposed in Toulis and Kao [2013] and Eckles et al. [2016].

Remark 3.7.
In this manuscript, we avoid assumptions of constant treatment effects across units, so that our methods may be applicable to realistic settings. However, as we will see later, certain combinations of assumptions, such as combining symmetrically sent and symmetrically received interference effects, will imply constant treatment effects across certain units. Conversely, if we assume symmetrically sent interference effects and that the interference effects are constant across units, then the potential outcomes will also satisfy the symmetrically received interference assumption. This arises because the assumption of constant interference effects imposes that the potential outcomes are linear in the treated degree. Section 4.1.3 and Appendix A offer more details.
In practice, an analyst can choose which additional assumptions may hold in addition to NIA. Figure 1 shows the partial ordering of the twelve possible subsets of assumptions that can be chosen from the four structural assumptions. Certain combinations of assumptions are disallowed because we require that symmetrically sent interference effects only be assumed in combination with additivity of interference effects. Each combination of assumptions leads to a particular model or parameterization for the potential outcomes.

Figure 1: The partial ordering of models that arise from combining structural assumptions. An arrow indicates that one model contains the other. Each element is labeled by four possibly blank characters, each of which indicates whether a particular assumption is made: (1) Symmetrically Received Interference Effects, (2) Additivity of Main Effects, (3) Additivity of Interference Effects, (4) Symmetrically Sent Interference Effects. The largest model is NIA, while the smallest model is SANASIA, in which all four structural assumptions are made.

In Section 4.1, we analyze three of these models: NIA with no additional assumptions; SANIA, where we assume symmetrically received interference and additivity of main effects; and SANASIA, where we include all four assumptions from Section 3.3. We then discuss how these parameterizations can be used to define estimands, and how these estimands relate to estimands defined in terms of potential outcomes, in Section 4.2. Appendix A describes the parameterizations implied by the remaining nine combinations of assumptions.
For each of these models, we can parametrize the potential outcomes in terms of the baseline outcome, the direct treatment effect, the interference effect, and the possible interaction between direct treatment and interference effects. We note that the direct treatment effect and the interference effect are closely related to the direct and indirect effects, respectively, as defined in [Hudgens and Halloran, 2012], while the interaction effect can be viewed as the difference between the total causal effect and the indirect effects as defined there.
Under NIA, we can write the potential outcome Y_i(z) as

Ỹ_i(0, 0)   [baseline]
+ z_i (Ỹ_i(1, 0) − Ỹ_i(0, 0))   [direct treatment effect]
+ (Ỹ_i(0, z_{N_i}) − Ỹ_i(0, 0))   [interference effect]
+ z_i ( Ỹ_i(1, z_{N_i}) − Ỹ_i(1, 0) − (Ỹ_i(0, z_{N_i}) − Ỹ_i(0, 0)) ).   [interaction effect]
We define α_i = Ỹ_i(0, 0) as the baseline outcome and β_i = Ỹ_i(1, 0) − Ỹ_i(0, 0) as the direct treatment effect. The functions Γ_i : {0, 1}^{d_i} → R and Δ_i : {0, 1}^{d_i} → R are defined as

Γ_i(z_{N_i}) = Ỹ_i(0, z_{N_i}) − Ỹ_i(0, 0)   and
Δ_i(z_{N_i}) = Ỹ_i(1, z_{N_i}) − Ỹ_i(1, 0) − (Ỹ_i(0, z_{N_i}) − Ỹ_i(0, 0)).

Hence, Γ_i(z_{N_i}) gives the interference effect if the treatment allocation for unit i's neighbors is z_{N_i}. Similarly, Δ_i gives the additional interaction effect between the neighbors' treatments and the direct treatment of the unit.

Using this notation, the NIA assumption is equivalent to the assertion that the potential outcomes adhere to the following parameterization:

Y_i(z) = α_i + β_i z_i + Γ_i(z_{N_i}) + z_i Δ_i(z_{N_i}).   (4.1)

For unit i, this parameterization has 2^{d_i + 1} parameters (noting that Γ_i(0) = Δ_i(0) = 0), which is exactly the number of possibly distinct potential outcomes.

Under SANIA, where symmetrically received interference effects and additivity of main effects are assumed, we abuse notation and redefine Γ_i : {0, 1, ..., d_i} → R such that Γ_i(d) = Ỹ_i(0, z_{N_i}) − Ỹ_i(0, 0) for all z where Σ_{j ∈ N_i} z_j = d. This is well defined due to the symmetrically received interference effects assumption. Since the main effects are additive, we may also omit the Δ term, yielding

Y_i(z) = α_i + β_i z_i + Γ_i(d_i^z)   (4.2)

for all allocations z. Here we have 2 + d_i parameters which encode 2d_i + 2 distinct potential outcomes for unit i. Our simulations in Section 7 and much of our theoretical results focus on this relatively parsimonious model.

The SANASIA assumption restricts SANIA further by assuming additivity of interference effects and symmetrically sent interference effects. This means that the functions Γ_i must be additive.
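The SANIA parameterization (4.2) is easy to simulate from: each observed outcome is a baseline, plus a direct effect if the unit is treated, plus an interference term depending only on the treated degree. A sketch with illustrative random parameters, where the per-unit interference functions Γ_i are arbitrary apart from Γ_i(0) = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
g = (rng.random((n, n)) < 0.4).astype(int)   # illustrative directed network
np.fill_diagonal(g, 0)
z = rng.integers(0, 2, size=n)               # Bernoulli(1/2) allocation

alpha = rng.normal(size=n)                   # baselines alpha_i
beta = rng.normal(size=n)                    # direct effects beta_i
d = g.sum(axis=0)                            # degrees d_i
# Gamma[i, k] = Gamma_i(k) for k = 0, ..., max degree, with Gamma_i(0) = 0.
Gamma = rng.normal(size=(n, d.max() + 1))
Gamma[:, 0] = 0.0

d_z = g.T @ z                                # treated degrees d_i^z
Y = alpha + beta * z + Gamma[np.arange(n), d_z]   # SANIA outcomes, Eq. (4.2)
```

An untreated unit with no treated neighbors reveals its baseline α_i exactly, which is the intuition behind the unbiasedness conditions in Section 5.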
Since we require that Γ_i(0) = 0, additivity implies that Γ_i is linear, so that Γ_i(d) = γ_i d for some γ_i ∈ R. Hence, the potential outcomes must satisfy

Y_i(z) = α_i + β_i z_i + γ_i d_i^z.   (4.3)

Additionally, the interference effects are symmetrically sent. This implies that if i and i' both have j as a neighbor, then the additive effect of j being treated is the same for both i and i', so that γ_i = γ_{i'}. By iterating on this idea, one can show the following proposition, which relies on the construction of an undirected graph h indicating whether two nodes share a neighbor.

Proposition 4.1. Let h ∈ {0, 1}^{n×n} be an adjacency matrix where h_ij = 1{(g^T g)_ij > 0}. Under SANASIA, the potential outcomes can be parametrized as Y_i(z) = α_i + β_i z_i + γ_i d_i^z, where if i and i' are in the same connected component of h then γ_i = γ_{i'}.

Since the interference parameters are equal within connected components of h, the total number of parameters needed for all potential outcomes is 2n plus the number of connected components of h. In the case that g is undirected and connected, if g is bipartite then h has two connected components; otherwise h has one connected component. To parametrize the potential outcomes in the latter case, we need only 2n + 1 parameters, with Y_i(z) = α_i + β_i z_i + γ d_i^z for each unit. These parameters encode 2n + 2 Σ_i d_i distinct potential outcomes.

Causal estimands are defined as functions of the potential outcomes, frequently an average of a given function over a set of units. Given parametric forms for the potential outcomes, we can define estimands in terms of these parameters as well: (i) estimands for direct treatment effects, the β_i terms; (ii) estimands for interference effects, the Γ_i terms; and (iii) estimands for interaction effects, the Δ_i terms.
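Returning to Proposition 4.1, the shared-neighbor graph h and its connected components are straightforward to compute. A sketch using scipy on an illustrative directed network (here unit 1 sends edges to both units 2 and 4, so γ_2 = γ_4 is forced):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

# Illustrative directed network on 5 units: edges 0->1, 1->2, 1->4, 3->2, 4->3.
g = np.array([[0, 1, 0, 0, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0]])

# h_ij = 1 exactly when units i and j share a neighbor, i.e. (g^T g)_ij > 0.
h = ((g.T @ g) > 0).astype(int)
np.fill_diagonal(h, 0)  # drop the diagonal before forming components

n_comp, labels = connected_components(h, directed=False)
# Under SANASIA, gamma_i must be constant within each component of h.
```

Units 2 and 4 fall in the same component of h, so they share one interference parameter; every other unit gets its own, for 4 components in total.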
Estimands defined in terms of potential outcomes map to different functions of the parameters depending on which assumptions hold, as we demonstrate below for the three parameterizations from Section 4.1. Combinations of these effects, such as the mean effect of treating all units versus treating no units, can be identified via potential outcomes as

(1/n) Σ_{i=1}^n (Y_i(1_n) − Y_i(0_n)),

where 1_n and 0_n denote the all-ones and all-zeros allocations. Under NIA this estimand is identified parametrically as (1/n) Σ_i β_i + Γ_i(1) + Δ_i(1), where 1 is the all-ones vector of the appropriate neighborhood length. If instead we assume SANIA holds, this estimand is (1/n) Σ_i β_i + Γ_i(d_i), while under SANASIA this estimand is (1/n) Σ_i β_i + γ_i d_i. Finally, if SUTVA holds, then the above estimand is simply β̄ = (1/n) Σ_i β_i. Alternatively, the same estimand defined in terms of parameters, such as β̄, will potentially have multiple definitions in terms of potential outcomes depending on the specific assumptions made. Defining estimands from potential outcomes or from parameterizations are complementary approaches that are differentially impacted by changes in the underlying assumptions.

For the remainder of this manuscript, we will focus on estimation of β̄, though the core ideas presented below apply to other estimands. The estimand β̄ can be identified as the average direct treatment effect, and under NIA and all submodels, including SUTVA, β̄ can be defined in terms of the potential outcomes as

β̄ = (1/n) Σ_{i=1}^n (Y_i(e_i) − Y_i(0_n)),   (4.4)

where under e_i only unit i receives treatment. If we know SUTVA holds, then

β̄ = (1/n) Σ_{i=1}^n (Y_i(1) − Y_i(0)) = (1/n) Σ_{i=1}^n (Y_i(1_n) − Y_i(0_n)) = (1/n) Σ_{i=1}^n (Y_i(e_i) − Y_i(0_n)),

so several definitions coincide; under the other models, one can similarly express β̄ in terms of other potential outcomes, possibly depending on the structure of the graph.

In this section we will answer the question of when unbiased estimators of the average direct treatment effect exist.
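Before turning to estimators, the identity (4.4) can be checked numerically under, say, the SANASIA form (4.3): since g_ii = 0, the allocation e_i gives unit i a treated degree of zero, so Y_i(e_i) − Y_i(0_n) = β_i. A sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
g = (rng.random((n, n)) < 0.5).astype(int)   # illustrative directed network
np.fill_diagonal(g, 0)
alpha, beta, gamma = rng.normal(size=n), rng.normal(size=n), 0.7

def Y(z):
    """SANASIA potential outcomes, Eq. (4.3), with a common gamma."""
    return alpha + beta * z + gamma * (g.T @ z)

zeros = np.zeros(n, dtype=int)
# Eq. (4.4): average Y_i(e_i) - Y_i(0_n) over units, with e_i the i-th unit vector.
beta_bar = np.mean([Y(np.eye(n, dtype=int)[i])[i] - Y(zeros)[i]
                    for i in range(n)])
```

The computed beta_bar equals the mean of the beta vector, matching the identification of β̄ as the average direct treatment effect.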
We also focus on linear estimators, which are of the form β̂_w = Σ_{i=1}^n w_i(z) Y_i(z), where each w_i : Z_n → R is a function giving the coefficient for unit i under allocation z. Non-linear estimators can also be unbiased, but since the estimands are linear functions of the potential outcomes, linear estimators are a natural starting point for study. Linear estimators include the naive difference-of-means estimator, where

w_i(z) = z_i / Σ_j z_j − (1 − z_i) / Σ_j (1 − z_j),   (5.1)

and the Horvitz-Thompson inverse propensity score weighting estimator, where

w_i(z) = z_i / (n P[z_i^obs = z_i]) − (1 − z_i) / (n P[z_i^obs = z_i]),   (5.2)

as well as stratified estimators (see Eq. 6.6 in Proposition 6.5) and other more sophisticated estimators.

Even under SUTVA, certain pathological designs will preclude the existence of unbiased estimators. In Section 5.1, we give conditions on the design under which linear unbiased estimators exist under NIA, SANIA, and SANASIA. In Section 5.2, we analyze completely randomized designs and Bernoulli trials, as well as various estimators, in the context of NIA.

We will require that β̂_w be unbiased for β̄ with respect to the known design. For linear estimators, the requirement that β̂_w is unbiased is equivalent to a set of linear constraints on the coefficients. The particular constraints depend on which assumptions we make.

5.1.1 NIA

Proposition 5.1. Under NIA, a linear estimator β̂_w is unbiased for β̄ if and only if for all i ∈ [n] the following constraints hold:

Σ_{z ∈ Z_n} p(z) w_i(z) z_i = 1/n,
( β i constraints) (C1) (cid:88) z ∈Z p ( z ) w i ( z ) = 0 , ( α i constraints) (C2) ∀ z (cid:48) ∈ { , } d i \ { } , (cid:88) z ∈Z p ( z ) w i ( z ) { z N i = z (cid:48) } = 0 , (Γ i ( z (cid:48) ) constraints) (C3) and ∀ z (cid:48) ∈ { , } d i \ { } , (cid:88) z ∈Z p ( z ) w i ( z ) z i { z N i = z (cid:48) } = 0 , (∆ i ( z (cid:48) ) constraints) (C4)The first line, C1, ensures that the weights for β i terms average to 1 /n across allocationswhich guarantees the weights in front all β terms average to unity across allocations. Thelast three lines, C2, C3, and C4, ensure all other parameters will have weights that average tozero. Due to the fact that we allow for direct treatment effect heterogeneity, these constraintsare also equivalent to the constraint that for each i ∈ [ n ], the estimator (cid:98) β i = w i ( z obs ) Y i ( z obs )is unbiased for β i /n .By manipulating the constraints above, we can find a concise condition on the design p suchthat unbiased estimators of ¯ β exist. First, by combining the C4 constraints and subtractingthem from the C1 constraints we have that for all i ∈ [ n ] it must hold that (cid:88) z ∈Z p ( z ) w i ( z ) z i { d z i = 0 } = 1 /n and (cid:88) z ∈Z p ( z ) w i ( z )(1 − z i ) { d z i = 0 } = − /n This leads to the following results.
Proposition 5.2.
Under NIA, LUEs exist if and only if for each $i \in [n]$ there exist allocations $z, z'$ such that $p(z) > 0$, $p(z') > 0$, $d_i^z = d_i^{z'} = 0$, $z_i = 1$, and $z'_i = 0$.

Proof. For the necessity, if no such allocations exist, then one of the summations in the displayed equation above would necessarily sum to 0 rather than $\pm 1/n$. The sufficiency follows by considering inverse propensity score weighting. Namely, for the two allocations that satisfy the conditions, set $w_i(z) = p(z)^{-1}/n$, $w_i(z') = -p(z')^{-1}/n$, and $w_i(z'') = 0$ for all other allocations.

This proposition implies that, regardless of the network structure, one can construct a design such that unbiased estimators exist. By ensuring the support of the design contains $\{e_i : i \in [n]\} \cup \{\mathbf{0}\}$, where $e_i$ is the allocation in which only unit $i$ is treated, the constraints of Proposition 5.2 are satisfied. In particular, unbiased estimators will always exist for non-trivial Bernoulli trials.

Remark 5.3.
More generally, we can consider designs derived from a collection of independent sets, which are subsets of vertices where no two vertices in the same subset are adjacent.

Figure 2: (a) The left panel shows three independent sets, labeled 1, 2, and 3, which partition the graph and form a proper vertex coloring. Notice that no vertex is adjacent to a vertex with the same number. (b) The right panel shows a partition, but the sets are not independent sets since vertices are adjacent to other vertices in the same part.

An independent set for a network $g$ is a set $V \subset [n]$ such that for all $i, j \in V$, $g_{ij} = 0$, meaning there are no edges between units in $V$. Let $V_1, \ldots, V_M$ be a collection of disjoint independent sets whose union is $[n]$. Such a collection is called a proper vertex coloring, and no two vertices of the same color are adjacent. An example of a proper vertex coloring is demonstrated in Figure 2. Proper vertex colorings exist for every graph, because each singleton $\{i\}$ is an independent set and hence $\{1\}, \{2\}, \ldots, \{n\}$ is a proper vertex coloring. One can verify that the conditions of Proposition 5.2 will be satisfied by a design supported on $\mathbf{0}, z^{(1)}, \ldots, z^{(M)}$, where for each $m$, $z^{(m)}$ is the allocation with treatment group $V_m$. Designs based on independent sets ensure that unbiased estimates exist, and they can also be used to minimize the presence of interference effects.

In Section 5.2 we will examine whether the constraints from Proposition 5.2 are satisfied for standard designs on certain networks. Next, we investigate the existence of unbiased estimators under SANIA, but note that merely assuming symmetrically received interference leaves the conditions for the existence of unbiased estimators the same as in Proposition 5.2.

Under SANIA, where $Y_i(z) = \alpha_i + \beta_i z_i + \Gamma_i(d_i^z)$, the constraints C1 and C2 from Proposition 5.1 remain unchanged and constraint C4 is vacuous. The constraints C3 for the $\Gamma_i$ coefficients are replaced with the requirement that for all $i \in [n]$,

$$\forall d \in [d_i], \quad \sum_{z \in \mathcal{Z}} p(z) w_i(z) \mathbb{1}\{d_i^z = d\} = 0. \qquad (\Gamma_i(d) \text{ constraints}) \quad (C3')$$

The exclusion of interaction effects due to the additivity of main effects, Definition 3.3, as well as the symmetrically received interference effects, Definition 3.5, affords much more flexibility for unbiased estimates, yielding weaker constraints for their existence.

Proposition 5.4.
Under SANIA, LUEs exist if and only if for each $i \in [n]$ there exist allocations $z, z'$ such that $p(z) > 0$, $p(z') > 0$, $d_i^z = d_i^{z'}$, $z_i = 1$, and $z'_i = 0$.

The constraints C1 and C2 on the $\alpha_i$ and $\beta_i$ terms, respectively, are the same for SANASIA as under NIA and SANIA. Let $\mathcal{C}_1, \ldots, \mathcal{C}_K$ be the connected components of $h = \mathbb{1}\{g^\top g > 0\}$, as described in Proposition 4.1. In this case, the only other constraint needed to ensure that $\widehat{\beta}_w$ is unbiased for $\bar{\beta}$ is that

$$\forall k \in [K], \quad \sum_z p(z) \sum_{i \in \mathcal{C}_k} w_i(z) d_i^z = 0. \qquad (\gamma_i \text{ constraint}) \quad (C3'')$$

Due to the nested nature of SANASIA and SANIA, a sufficient condition for unbiased estimates to exist is that the design satisfies Proposition 5.4. A necessary but insufficient condition is that for each unit $i$ there exist allocations $z, z'$ such that $p(z) > 0$, $p(z') > 0$, $z_i = 0$, and $z'_i = 1$, the condition for the existence of unbiased estimates under SUTVA. Under SANASIA, we have not found a concise necessary and sufficient condition for the existence of unbiased estimators given a design.

In this section we will examine Bernoulli trials, completely randomized designs, and other designs, as well as standard estimators such as the naive estimator and the Horvitz-Thompson estimator, in the context of small examples.
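Because the existence conditions in Propositions 5.2 and 5.4 depend only on the support of the design and on the treated degrees, they can be checked mechanically before an experiment is run. The following sketch is our own illustration (the function name and example are assumptions, not from the paper); it checks the Proposition 5.2 condition under NIA and, on the complete graph, confirms the observations of Example 5.6.

```python
import numpy as np

def lue_exists_nia(g, support):
    """Proposition 5.2 check: under NIA, LUEs exist iff each unit i appears
    in some supported allocation with z_i = 1 and treated degree 0, and in
    another supported allocation with z_i = 0 and treated degree 0."""
    g = np.asarray(g)
    n = g.shape[0]
    for i in range(n):
        # Allocations in the support where unit i has no treated neighbors.
        isolated = [z for z in support if g[i] @ np.asarray(z) == 0]
        if not any(z[i] == 1 for z in isolated):
            return False
        if not any(z[i] == 0 for z in isolated):
            return False
    return True

# Complete graph on 3 units (Example 5.6): LUEs need p(e_i) > 0 for all i
# and p(0) > 0; the singleton allocations alone are not enough.
g = 1 - np.eye(3, dtype=int)
singletons = [np.eye(3, dtype=int)[i] for i in range(3)]
print(lue_exists_nia(g, singletons))                                # prints: False
print(lue_exists_nia(g, singletons + [np.zeros(3, dtype=int)]))     # prints: True
```

The same loop, with the equality of treated degrees relaxed from zero to any common value, checks the Proposition 5.4 condition for SANIA.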
Example 5.5 (Empty Graph, SUTVA). First, consider the case where the graph is empty, so that $g_{ij} = 0$ for all $i, j$, or equivalently that SUTVA holds. In this case unbiased estimators exist as long as for each unit $i$ there exist allocations $z, z'$ such that $z_i = 1 = 1 - z'_i$ and $p(z), p(z') > 0$. Hence, unbiased estimators exist for any nontrivial Bernoulli trial and any completely randomized design. A standard result is that the Horvitz-Thompson estimator, Eq. (5.2), is unbiased whenever unbiased estimators exist. The naive estimator, Eq. (5.1), will be unbiased for any design which is invariant under permutations of the unit labels, which can be proved by an application of Proposition 6.5 to the case of the empty graph.
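The unbiasedness claims in this example can be checked directly against the constraints of Proposition 5.1; under SUTVA only C1 and C2 bind, since the interference constraints C3 and C4 are vacuous for an empty graph. Below is a minimal sketch (our own code; the Bernoulli(1/2) design and all names are illustrative assumptions) verifying C1 and C2 for the Horvitz-Thompson weights of Eq. (5.2).

```python
import numpy as np
from itertools import product

n, p = 4, 0.5
allocations = [np.array(z) for z in product([0, 1], repeat=n)]
# Bernoulli(p) design: each unit treated independently with probability p.
design = [(z, p ** z.sum() * (1 - p) ** (n - z.sum())) for z in allocations]

def ht_weights(z):
    # Horvitz-Thompson weights, Eq. (5.2): P[z_i^obs = z_i] equals p when
    # z_i = 1 and 1 - p when z_i = 0, giving (2 z_i - 1) / (n P[z_i^obs = z_i]).
    return (2 * z - 1) / (n * np.where(z == 1, p, 1 - p))

# C1: sum_z p(z) w_i(z) z_i = 1/n, and C2: sum_z p(z) w_i(z) = 0, for all i.
c1 = [sum(pz * ht_weights(z)[i] * z[i] for z, pz in design) for i in range(n)]
c2 = [sum(pz * ht_weights(z)[i] for z, pz in design) for i in range(n)]
print(np.allclose(c1, 1 / n), np.allclose(c2, 0))  # prints: True True
```

With a non-empty graph the same weights generally fail C3 and C4, consistent with the bias of this estimator under interference noted below.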
Example 5.6 (Complete Graph). On the other end of the spectrum, suppose that the graph is complete, so that $g_{ij} = 1$ for all $i \neq j$. In this case, NIA does not constrain the interference, and by Proposition 5.2 unbiased estimators exist if and only if $p(e_i) > 0$ for all $i$ and $p(\mathbf{0}) > 0$. Note that a Bernoulli trial will satisfy this, but a completely randomized design will not, because $p(\mathbf{0}) = 0$ for all but trivial completely randomized designs. For any design where unbiased estimators exist, the Horvitz-Thompson estimator with

$$w_i(z) = \frac{1}{n}\left( p(e_i)^{-1} \mathbb{1}\{z = e_i\} - p(\mathbf{0})^{-1} \mathbb{1}\{z = \mathbf{0}\} \right)$$

will be unbiased. On the other hand, the naive difference of means estimator will never be unbiased in this case.

If SANIA holds, completely randomized designs also do not yield unbiased estimators, because the treated degree of a unit is determined by its treatment. For a completely randomized design with $k$ treated units, a unit that receives treatment will have $k - 1$ treated neighbors while a unit in the control group will have $k$ treated neighbors, and hence the conditions of Proposition 5.4 will not hold. It is also straightforward to show that the same is true even under SANASIA. If we consider a mixture of two completely randomized designs, one where $n_t$ units are treated and one where $n_t + 1$ units are treated, then unbiased estimators will exist for any $n_t < n$.

As shown in Karwa and Airoldi [2016], even under SANASIA with the added assumption of constant treatment effects, the naive estimator will be biased for any Bernoulli trial or completely randomized design on any non-empty graph. On the other hand, while the standard Horvitz-Thompson estimator under SUTVA may be biased, the use of inverse propensity score weighting will be unbiased in some generality [Aronow and Samii, 2013].

Example 5.7 (Triangle plus Tail). Consider an undirected network on four nodes where the first three nodes are all connected and the fourth node is adjacent only to the third node, as illustrated in Figure 3.

Figure 3: Triangle plus tail graph as described in Example 5.7.

For this network, under SANIA, a Bernoulli trial will again yield unbiased estimates, while unbiased estimates do not exist for any completely randomized design because, as in the complete graph example, the third node has all other nodes as neighbors. Table 1 shows the values of the potential outcomes under SANIA for all units and all allocations in a completely randomized design with two units treated. Hence, if the support of the design distribution is any subset of the allocations with exactly two treated units, then no unbiased estimator will exist, since the treated degree for unit three is always two if the unit is in the control group and one if the unit is in the treatment group. This violates Proposition 5.4.
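The violation is easy to see computationally: under any allocation with exactly two treated units, unit three's treated degree is determined by its own treatment status. A small sketch of this check (our own code, using the triangle-plus-tail adjacency of Example 5.7):

```python
import numpy as np
from itertools import combinations

# Triangle on units 1, 2, 3; unit 4 adjacent only to unit 3 (Example 5.7).
g = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

pairs = []  # (own treatment, treated degree) for unit 3 (index 2)
for treated in combinations(range(4), 2):   # completely randomized, k = 2
    z = np.zeros(4, dtype=int)
    z[list(treated)] = 1
    pairs.append((int(z[2]), int(g[2] @ z)))

# Unit 3's treated degree is 1 whenever it is treated and 2 whenever it is in
# control, so no pair of supported allocations can satisfy Proposition 5.4.
print(sorted(set(pairs)))  # prints: [(0, 2), (1, 1)]
```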
$z$ | $Y_1^{obs}$ | $Y_2^{obs}$ | $Y_3^{obs}$ | $Y_4^{obs}$
(1,1,0,0) | $\beta_1 + \Gamma_1(1) + \alpha_1$ | $\beta_2 + \Gamma_2(1) + \alpha_2$ | $\Gamma_3(2) + \alpha_3$ | $\alpha_4$
(1,0,1,0) | $\beta_1 + \Gamma_1(1) + \alpha_1$ | $\Gamma_2(2) + \alpha_2$ | $\beta_3 + \Gamma_3(1) + \alpha_3$ | $\Gamma_4(1) + \alpha_4$
(1,0,0,1) | $\beta_1 + \alpha_1$ | $\Gamma_2(1) + \alpha_2$ | $\Gamma_3(2) + \alpha_3$ | $\beta_4 + \alpha_4$
(0,1,1,0) | $\Gamma_1(2) + \alpha_1$ | $\beta_2 + \Gamma_2(1) + \alpha_2$ | $\beta_3 + \Gamma_3(1) + \alpha_3$ | $\Gamma_4(1) + \alpha_4$
(0,1,0,1) | $\Gamma_1(1) + \alpha_1$ | $\beta_2 + \alpha_2$ | $\Gamma_3(2) + \alpha_3$ | $\beta_4 + \alpha_4$
(0,0,1,1) | $\Gamma_1(1) + \alpha_1$ | $\Gamma_2(1) + \alpha_2$ | $\beta_3 + \Gamma_3(1) + \alpha_3$ | $\beta_4 + \Gamma_4(1) + \alpha_4$

Table 1: The potential outcomes for the network in Example 5.7 for the six treatment assignments that assign exactly two units to treatment under SANIA.

At this point we have established the constraints on a linear estimator such that it is unbiased, and the constraints on the design such that LUEs exist. Typically this unbiased estimator will not be unique, and so we must introduce additional criteria to choose among them. Adopting the variance as our measure of performance for an unbiased estimate, we would ideally minimize variance for all values of the parameters to achieve a uniformly minimum variance LUE. However, this project is futile, as such estimates do not exist even if one assumes there is no interference. Hence, we adopt an alternative notion of optimality motivated by Bayesian considerations.

We propose minimizing the integrated variance (MIV) with respect to distributions on the parameters defined in Section 4.1. These distributions can be viewed as a prior distribution or simply as a weighting of the parameter space where accuracy is most desired. Note that one may want to compute the best LUE with respect to a posterior distribution of the parameters that depends on the observed treatments and outcomes; however, this will not be guaranteed to be unbiased, since the particular unbiased estimator used for each allocation can be different, and hence bias can be introduced. For convenience, we will refer to the distribution on the parameters as the prior distribution.

We will introduce some additional notation used throughout this section. We denote covariance matrices for the parameters with respect to the prior as

$$\Sigma_\xi = \mathrm{Cov}(\xi, \xi) \in \mathbb{R}^{n \times n} \quad \text{and} \quad \Sigma_{\xi,\xi'} = \mathrm{Cov}(\xi, \xi') \in \mathbb{R}^{n \times n},$$

where $\xi, \xi'$ denote parameter vectors in $\mathbb{R}^n$. For example, under SANIA $\xi, \xi'$ can be any of $\alpha, \beta, \Gamma(1), \ldots, \Gamma(n-1) \in \mathbb{R}^n$, and $\Sigma_\alpha$ denotes the prior covariance for the vector $\alpha = (\alpha_1, \ldots, \alpha_n)$ and similarly $\Sigma_{\alpha\beta} = \mathrm{Cov}(\alpha, \beta)$. As is typical for linear estimates [Bickel and Doksum, 2015, Hoff, 2009], the MIV LUE will depend only on the covariance of the prior and not on the explicit form of the prior distribution or higher moments, so we do not specify these.

To illustrate the ideas, we begin by giving results for MIV LUEs in the SUTVA case before extending these results to SANIA, NIA, and SANASIA.
We will also focus our attention on two extreme settings which correspond loosely to maximum and minimum heterogeneity. The first is priors where parameters are independent across units, where analytical results are straightforward. The second is priors where, with probability one, the parameters are equal across units.

(Footnote: For each allocation, let $\widehat{\beta}_z$ be the MIV LUE for the posterior when the observed allocation is $z$. While the constraints will ensure that $\sum_{z'} p(z') \widehat{\beta}_z(z') = \bar{\beta}$, the actual bias of this procedure is $\sum_z p(z) \widehat{\beta}_z(z) - \bar{\beta}$, where the estimator is different for each allocation $z$. Hence, we cannot ensure the overall unbiasedness of the procedure.)

6.1 SUTVA

Recall that for SUTVA the parameters are $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\beta = (\beta_1, \ldots, \beta_n)$, and $Y_i(z) = \alpha_i + z_i \beta_i$. For SUTVA and all other models, it is sufficient to consider mean-zero priors on the parameters of the model. If the priors are not mean zero, then the estimate

$$\widehat{\beta} = \sum_{i=1}^n w_i(z) \left( Y_i(z) - \mu_{Y_i(z)} \right) + \mu_{\bar{\beta}}$$

will be unbiased provided the estimate $\widehat{\beta}_w$ is unbiased, where $\mu_{Y_i(z)}$ and $\mu_{\bar{\beta}}$ denote the prior mean of the potential outcome $Y_i(z)$ and of the average direct treatment effect $\bar{\beta}$, respectively. Additionally, if $\widehat{\beta}_w$ minimizes the integrated variance for a mean-zero prior with the same covariance, then the above estimator will minimize variance for the shifted prior. This follows from standard arguments in Bayesian estimation [Hoff, 2009].

If the priors on $(\alpha_i, \beta_i)$ are mean zero, then the integrated variance is

$$\mathrm{IVAR}(\widehat{\beta}_w) = \sum_z p(z) \sum_{i,j} w_i(z) w_j(z) \left( \Sigma_{\alpha,ij} + z_i z_j \Sigma_{\beta,ij} + z_j \Sigma_{\alpha\beta,ij} + z_i \Sigma_{\beta\alpha,ij} \right). \qquad (6.1)$$

There are two extreme cases to consider for the priors, one corresponding to fully heterogeneous outcomes and effects, and one corresponding to constant outcomes and effects.
The first is the case where the potential outcomes are identically distributed and uncorrelated across units, so that

$$\Sigma_{\alpha,ij} = \mathbb{1}\{i = j\} \sigma^2_\alpha, \quad \Sigma_{\beta,ij} = \mathbb{1}\{i = j\} \sigma^2_\beta, \quad \text{and} \quad \Sigma_{\alpha\beta,ij} = \mathbb{1}\{i = j\} \sigma_{\alpha,\beta}.$$

The integrated variance simplifies to $\sum_z p(z) \sum_{i=1}^n w_i(z)^2 \left( \sigma^2_\alpha + z_i (\sigma^2_\beta + 2\sigma_{\alpha,\beta}) \right)$. For this prior, the Horvitz-Thompson estimator is the MIV LUE; see Theorem 6.1, part 1.

Alternatively, we can specify a prior where we assume $\alpha_i = \alpha_j$ and $\beta_i = \beta_j$ for all $i, j$. This is the prior for constant treatment effects with constant baseline outcomes, and

$$\Sigma_{\alpha,ij} = \sigma^2_\alpha, \quad \Sigma_{\beta,ij} = \sigma^2_\beta, \quad \text{and} \quad \Sigma_{\alpha\beta,ij} = \sigma_{\alpha,\beta}.$$

For nontrivial completely randomized designs, and for Bernoulli trials excluding the allocations where either all units or no units are treated, the MIV LUE for this prior is the naive estimate; see Theorem 6.1, part 2. More generally, this holds for symmetric designs, where $p(z) = p(z')$ whenever $z$ is a permutation of $z'$, which are exactly the designs which are mixtures of completely randomized designs.

The formal statements of these results follow.

Theorem 6.1.
Suppose that the potential outcomes satisfy SUTVA and the design admits unbiased estimators.

1. For a prior where the parameters are mean zero and uncorrelated across units, the MIV LUE of $\bar{\beta}$ is the Horvitz-Thompson estimator with

$$w_i(z) = \frac{z_i}{n \sum_{z'} z'_i p(z')} - \frac{1 - z_i}{n \sum_{z'} (1 - z'_i) p(z')} = \frac{2 z_i - 1}{n \Pr[z_i^{obs} = z_i]}.$$

2. For a prior where the parameters are constant across units, if the design is symmetric and does not contain $\mathbf{0}$ or $\mathbf{1}$ in its support, then the MIV LUE for $\bar{\beta}$ is the naive estimator with

$$w_i(z) = \frac{z_i}{\sum_j z_j} - \frac{1 - z_i}{n - \sum_j z_j}.$$

We will not prove this theorem directly, but instead apply results from the SANIA setting where the network is empty. Part 1 of the above is an application of Theorem 6.2, and part 2 is an application of Proposition 6.5.
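Part 2 can be sanity-checked by exact enumeration over a completely randomized design: averaging the naive estimate over all equally likely allocations gives its exact expectation, which matches $\bar{\beta}$ under SUTVA even with heterogeneous effects. This is our own illustrative sketch (all names and parameter values are arbitrary assumptions):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, k = 6, 3
alpha = rng.normal(size=n)          # baseline outcomes
beta = rng.normal(size=n)           # heterogeneous direct effects

def naive(z, y):
    # Difference of means, Eq. (5.1).
    return y[z == 1].mean() - y[z == 0].mean()

# Completely randomized design: every allocation with k treated units is
# equally likely, so the average over all of them is the exact expectation.
estimates = []
for treated in combinations(range(n), k):
    z = np.zeros(n, dtype=int)
    z[list(treated)] = 1
    y = alpha + beta * z            # SUTVA potential outcomes
    estimates.append(naive(z, y))

print(np.isclose(np.mean(estimates), beta.mean()))  # prints: True
```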
We now return to SANIA, where the main effects are additive, Definition 3.3, and the interference effects are symmetrically received, Definition 3.5. The potential outcomes satisfy $Y_i(z) = \alpha_i + \beta_i z_i + \Gamma_i(d_i^z)$; see Eq. (4.2). For a given allocation $z$, the integrated square error for $\widehat{\beta}_w$ is $w^\top \Sigma(z) w - \frac{1}{n^2} \mathbf{1}^\top \Sigma_\beta \mathbf{1}$, where, writing $D_z = \mathrm{diag}(z)$ and $E_d = \mathrm{diag}(\mathbb{1}\{d^z = d\})$,

$$\begin{aligned}
\Sigma(z) = {} & \Sigma_\alpha + D_z \Sigma_\beta D_z + D_z \Sigma_{\beta,\alpha} + \Sigma_{\alpha,\beta} D_z + \sum_{d=1}^{n-1} \sum_{d'=1}^{n-1} E_d \Sigma_{\Gamma(d),\Gamma(d')} E_{d'} \\
& + \sum_{d=1}^{n-1} \left( E_d \Sigma_{\Gamma(d),\alpha} + \Sigma_{\alpha,\Gamma(d)} E_d \right) + \sum_{d=1}^{n-1} \left( E_d \Sigma_{\Gamma(d),\beta} D_z + D_z \Sigma_{\beta,\Gamma(d)} E_d \right). \qquad (6.2)
\end{aligned}$$

We will first consider the special case of uncorrelated priors, where an explicit solution can be derived in the SANIA and NIA cases. We then analyze the general case, where we can compute the solution but cannot derive a closed form except in a special case.

If the parameters are uncorrelated across units, but possibly correlated within units, then the integrated variance simplifies to

$$\sum_z p(z) \sum_{i=1}^n w_i(z)^2 \Sigma(z)_{ii} - \frac{1}{n^2} \sum_i \Sigma_{\beta,ii},$$

where we require that $\Sigma(z)_{ii} > 0$. Using the method of Lagrange multipliers, we derive an analytic form for the optimal weights $w$ as a function of the covariances of the parameters and the design. As under SUTVA, this leads to a MIV LUE which uses inverse propensity score weighting.

Theorem 6.2. Suppose the potential outcomes satisfy SANIA and the prior on the parameters has no correlation between units. If any unbiased estimators exist, then the weights for the MIV LUE are given by

$$w_i(z) = \frac{C_{i,d_i^z}}{\sum_{d=0}^{n-1} C_{i,d}} \cdot \frac{2 z_i - 1}{n \Pr[z_i^{obs} = z_i, \, d_i^{z^{obs}} = d_i^z]}, \qquad (6.3)$$

where we define

$$C_{i,d} = \left( \sum_z p(z) \, \mathbb{1}\{d_i^z = d\} \, \frac{\Sigma(z)_{ii}}{\Pr[z_i^{obs} = z_i, \, d_i^{z^{obs}} = d]^2} \right)^{-1} \qquad (6.4)$$

$$= \left( \frac{\Sigma(0, d)_i}{\Pr[z_i^{obs} = 0, \, d_i^{z^{obs}} = d]} + \frac{\Sigma(1, d)_i}{\Pr[z_i^{obs} = 1, \, d_i^{z^{obs}} = d]} \right)^{-1}, \qquad (6.5)$$

with $\Sigma(a, d)_i$ denoting $\Sigma(z)_{ii}$ for any allocation $z$ with $z_i = a$ and $d_i^z = d$, or $C_{i,d} = 0$ if either denominator in Eq. (6.5) is zero.

We make a few observations about this estimator before directing our attention to the general case. First, if $C_{i,d} = 0$ for all $d$, then no unbiased estimators for $\bar{\beta}$ exist, as this violates the conditions of Proposition 5.4. Second, in the case where the graph is empty, so that SUTVA holds, $C_{i,d} = 0$ for each $d > 0$, and the weights reduce to those of the Horvitz-Thompson estimator in Theorem 6.1, part 1. Third, consider the estimator $\widehat{\beta}_{\widetilde{w},i}$ with weights $\widetilde{w}_i$ defined as $w_i$ above but with $C_{i,d}$ replaced by $\mathbb{1}\{d = d_i^*\}$ for some $d_i^*$. We require that $d_i^*$ satisfy $\Pr[z_i^{obs} = 1 \mid d_i^{z^{obs}} = d_i^*] \in (0, 1)$ for all $i$. Provided this holds for each $i$, the estimate $\widehat{\beta}_{\widetilde{w},i}$ is unbiased for $\beta_i / n$, and hence $\sum_i \widehat{\beta}_{\widetilde{w},i}$ is unbiased for $\bar{\beta}$.

The estimator in Theorem 6.2 is a weighted average, over all valid values of $d_i^*$, of estimates like $\widehat{\beta}_{\widetilde{w},i}$. The value $C_{i,d}$ in Theorem 6.2 is equal to the inverse of the integrated variance of $\widehat{\beta}_{\widetilde{w},i}$ when $d_i^* = d$. Hence, the estimator from Theorem 6.2 can be viewed as a weighted average of the estimators proposed in Aronow and Samii [2013], with weights proportional to the inverse integrated variances of said estimators.

The general case in some ways resembles the case of independent priors but does not have an explicit form. Computation of the MIV LUE under a general prior requires finding the solution to a system of linear equations with $n\,|\mathrm{supp}(z^{obs})| + 2n + \sum_{i=1}^n |\mathrm{supp}(d_i^{z^{obs}}) \setminus \{0\}|$ unknowns. If the covariance matrices are non-singular, some simplifications can be made to reduce the computational complexity of computing this estimator. The details for the general case are provided in Section B.1.

As in Section 6.1, the opposite end of the spectrum from uncorrelated priors are priors where all parameters are constant across units. Such simplifications unfortunately do not yield a closed form for the MIV LUE. The next two examples illustrate the types of estimators that arise out of such constant-parameter priors for two different graphs.

Example 6.3 (Triangle plus tail). Consider again the graph from Example 5.7 and Figure 3, with four nodes consisting of a triangle with one extra node adjacent to the first node in the triangle. We consider the prior where, with probability one, $\alpha_i = \alpha$, $\beta_i = \beta$, and $\Gamma_i = \Gamma$ for all $i$.
The prior distribution will be $\alpha \sim N(0, 1)$, $\beta \sim N(0, 1)$, and $\Gamma(d) \sim N(0, d)$. We consider the design which is a uniform sample over all allocations except for $\mathbf{0}$ and $\mathbf{1}$, essentially a Bernoulli trial with treatment probability 1/2 and the trivial allocations removed. The resulting weights for the MIV LUE are given in Table 2. Careful investigation does not illuminate any straightforward pattern to these weights, and moreover some weights seem nonsensical, as high positive weights are given to units in the control group and vice versa. Nonetheless, using prior distributions such as these will often lead to well-performing estimators, but this lack of a clear pattern is prototypical of MIV LUEs.

$z = (z_1, z_2, z_3, z_4)$ | $w_1(z)$ | $w_2(z)$ | $w_3(z)$ | $w_4(z)$
(1,0,0,0) | 0 | -2 | -2 | 3.9
(0,1,0,0) | -1.7 | 1.3 | 1.8 | -1.7
(1,1,0,0) | 0.92 | 0.41 | -0.017 | -1.4
(0,0,1,0) | -1.7 | 1.8 | 1.3 | -1.7
(1,0,1,0) | 0.92 | -0.017 | 0.41 | -1.4
(0,1,1,0) | 0.067 | 0.37 | 0.37 | -1
(1,1,1,0) | -2.5 | 1.4 | 1.4 | -0.23
(0,0,0,1) | 0.13 | -0.85 | -0.85 | 1.3
(1,0,0,1) | 1.4 | -0.69 | -0.69 | -0.015
(0,1,0,1) | -0.18 | -0.49 | -0.37 | 1.3
(1,1,0,1) | 1.4 | 0.4 | -1.4 | -0.45
(0,0,1,1) | -0.18 | -0.37 | -0.49 | 1.3
(1,0,1,1) | 1.4 | -1.4 | 0.4 | -0.45
(0,1,1,1) | 0 | 0.064 | 0.064 | 0.43

Table 2: The (approximate) weights for the MIV LUE for each allocation in a Bernoulli trial excluding the two trivial allocations, for Example 6.3. The treated degrees $d_i^z$ are determined by $z$ and the graph.

Section 6.2.1, Section 6.3, and Theorem 6.1 illustrate how, when the prior distribution has no correlation across units, the resulting MIV LUE uses inverse propensity weighting. Theorem 6.1 also states that under SUTVA and a constant-parameter prior, the naive estimator is the MIV LUE provided the design is symmetric.
On the other hand, the previous example shows that under SANIA, neither the naive estimator nor a stratified version of it is the MIV LUE for the constant prior. There are two key reasons that this fails in the previous example. In the following example, we illustrate a set of conditions under which the stratified naive estimator is the MIV LUE. While these conditions are sufficient, we have not been able to show that they are necessary.
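For reference, the treated-degree-stratified difference of means that appears below in Proposition 6.5 (Eq. (6.6)) can be written as a weighted average of within-stratum contrasts. The sketch below is our own rendering of those weights (not the paper's code); strata containing only treated or only control units receive zero weight.

```python
import numpy as np

def stratified_naive(z, d, y):
    """Eq. (6.6): within each treated-degree stratum take the difference of
    treated and control means, then combine strata with weights
    C_d = (1/n_{1,d} + 1/n_{0,d})^{-1}, normalized to sum to one."""
    z, d, y = (np.asarray(a) for a in (z, d, y))
    total, weight = 0.0, 0.0
    for dv in np.unique(d):
        t = (d == dv) & (z == 1)
        c = (d == dv) & (z == 0)
        if t.any() and c.any():
            cd = 1.0 / (1.0 / t.sum() + 1.0 / c.sum())
            total += cd * (y[t].mean() - y[c].mean())
            weight += cd
    return total / weight

# With a single stratum this reduces to the naive difference of means.
print(stratified_naive([1, 0, 1, 0], [0, 0, 0, 0], [3.0, 1.0, 5.0, 1.0]))  # prints: 3.0
```

The harmonic-mean weight $C_d$ gives more influence to strata where both groups are well populated, which is what makes the estimator sensitive to the balance conditions discussed next.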
Example 6.4 (Vertex Transitive Graphs). One reason stratified naive estimators are not MIV LUEs is that they are not even unbiased. The key reason that a naive-type estimator fails to be the MIV LUE in the previous example is the lack of sufficient symmetry. Even if the graph were regular, so that all units have equal degree, stratified naive estimators are not necessarily MIV LUEs, since the joint distribution of treated degrees and treatments is not exchangeable across units.

A sufficient property for the MIV LUE to be the stratified naive estimator is that the graph is vertex transitive and the design is sufficiently symmetric. A graph is vertex transitive if for all pairs of units $i, j$ there exists a graph automorphism $\tau_{ij} : [n] \mapsto [n]$ such that $\tau_{ij}(i) = j$ [Godsil and Royle, 2013]. Vertex transitive graphs include rings, hypercubes, and the Petersen graph, and can be informally viewed as graphs which are locally identical at each vertex.

Vertex transitivity is not quite sufficient to lead to the stratified naive estimator. Recall that Theorem 6.1, part 2, requires that the design is symmetric and the support omits $\mathbf{0}$ and $\mathbf{1}$. For the SANIA case, we need to exclude any allocations where the set of treated degrees for the control units, $\{d_i^z : i \in [n], z_i = 0\}$, is disjoint from the set of treated degrees for the treated units, $\{d_i^z : i \in [n], z_i = 1\}$. Viewing the treated degree as a covariate, this excludes any allocations which have no balance on the covariates.

Proposition 6.5.
Suppose that the graph $g$ is vertex transitive and suppose the design $p$ satisfies:

1. for any graph automorphism $\tau$ of $g$, $p(z) = p(\tau(z))$, and
2. for all allocations $z$ where $\{d_i^z : i \in [n], z_i = 0\} \cap \{d_i^z : i \in [n], z_i = 1\} = \emptyset$, it holds that $p(z) = 0$.

Let $n_{z,d} = |\{i : z_i = z, d_i^z = d\}|$. For any prior where for all $i, j$ it holds that almost surely $\alpha_i = \alpha_j$, $\beta_i = \beta_j$, and $\Gamma_i = \Gamma_j$, the MIV LUE has weights given by

$$w_i(z) = \frac{C_{d_i^z}(z)}{\sum_d C_d(z)} \cdot \frac{2 z_i - 1}{n_{z_i, d_i^z}}, \qquad (6.6)$$

where

$$C_d(z) = \mathbb{1}\{n_{0,d} > 0, \, n_{1,d} > 0\} \left( \frac{1}{n_{0,d}} + \frac{1}{n_{1,d}} \right)^{-1}.$$

The proof of this proposition is in Appendix B.4.

(Footnote: A graph automorphism $\tau : [n] \mapsto [n]$ is a bijection where $g_{ij} = g_{\tau(i)\tau(j)}$ for all $i, j$.)

6.3 NIA

Under NIA it holds that $Y_i(z) = \alpha_i + \beta_i z_i + \Gamma_i(z_{N_i}) + \Delta_i(z_{N_i}) z_i$. The techniques to derive MIV LUEs are the same in the NIA case as in the SANIA case. One major difference is that the possible interaction between the direct treatment effect and the interference effect means that for an allocation $z$ where $z_i = 0$, it does not hold that $Y_i(z + e_i) - Y_i(z) = \beta_i$. This means that we cannot use stratified estimates like those derived for the SANIA case with uncorrelated priors. Indeed, if only NIA or NIA with symmetrically received interference (SNIA) can be assumed, and we find the MIV LUE under an uncorrelated prior, then non-zero weights will only be assigned to units with no treated neighbors, as detailed in the following result.

Theorem 6.6.
Under NIA (or SNIA), for a prior on the parameters which is uncorrelated across units, the MIV LUE has weights

$$w_i(z) = \frac{(2 z_i - 1) \, \mathbb{1}\{d_i^z = 0\}}{n \Pr[z_i^{obs} = z_i, \, d_i^{z^{obs}} = 0]}. \qquad (6.7)$$

Note that for correlated priors the MIV LUE can have nonzero weights on units with positive treated degrees, but again a closed form does not exist in the general case.

Consider the special case where the graph is undirected and not bipartite, so that under SANASIA the potential outcomes can be parameterized as $Y_i(z) = \alpha_i + \beta_i z_i + \gamma d_i^z$. In this case, we need to consider a prior distribution over the parameters $\alpha_1, \ldots, \alpha_n$, $\beta_1, \ldots, \beta_n$, and $\gamma$. In the case of independent priors on all of these parameters, closed forms for the MIV LUE weights $w(z)$ do exist but do not have a compact form. In Appendix C, we give the optimal weights in the independent-prior case.

We will now attempt a brief analysis via simulation of the quality of the estimators derived above, in comparison to other commonly used estimators. While a full exploration of the space of parameters is well beyond the scope of this work, we will demonstrate how the performance of various estimators varies as a function of various aspects of the problem, including the potential outcome parameters, network properties, and the number of units. While our theoretical results show that certain estimators will minimize the integrated variance among all unbiased estimators, under other notions of optimality certain estimators may be more desirable than others, and our simulations will serve to investigate the impact of the choice of estimator in situations where the estimator is not necessarily optimal.
In particular, since our focus in the theory is on finding estimators which minimize integrated variance, we will avoid integrating with respect to parameter distributions and instead focus on the performance of the estimators as the parameters change and with respect to the distributions of the networks.

Certain elements of the simulation will be common across experiments. First, we consider six different estimators, which we detail here.
Naive: Difference of means between the treatment group and the control group (see Eq. (5.1)). The Naive estimator is the MIV LUE under SUTVA for the prior where all parameters are constant across units, provided the design is invariant under permutation of the units. The Naive estimator is not unbiased if there is interference.

Horvitz-Thompson: Inverse propensity score weighting ignoring interference (see Eq. (5.2)). The Horvitz-Thompson estimator is the MIV LUE under SUTVA if the prior distribution is independent across units. This estimator is biased if there is interference.

Stratified Naive: A weighted difference of means estimator stratified according to treated degree (see Eq. (6.6)). This estimator is the MIV LUE under certain strong symmetry conditions, as described in Proposition 6.5; otherwise this estimator can be biased.

Independent: The MIV LUE under SANIA for the prior distribution where all parameters are independent with $\alpha_i, \beta_i \sim N(0, 1)$ and $\Gamma_i(d) \sim N(0, d)$ for all $i$ and $d$. (See Theorem 6.2.)

Equal: The MIV LUE under SANIA for a prior where $\alpha_i = \alpha + \epsilon_i$, $\beta_i = \beta$, and $\Gamma_i = \Gamma$ for all $i$. The parameters $\alpha$, $\beta$, and $\Gamma(1), \ldots, \Gamma(d)$ are all independent with $\Gamma(d) \sim N(0, d)$ and $\alpha, \beta \sim N(0, 1)$, and $\epsilon_1, \ldots, \epsilon_n$ are iid normal with mean zero and small variance (see Remark 7.1).

SANASIA: The MIV LUE under SANASIA, where all parameters are independent and $\alpha_i, \beta_i \sim N(0, 1)$ and $\gamma \sim N(0, 1/n)$.

These six estimators will be compared in various settings.

Remark 7.1.
For the Equal estimator we impose some variance in the $\alpha_i$ parameters in order to guarantee that the $\Sigma(z)$ matrices are all non-singular. This allows for the much more rapid computational procedure described in Section B.2.

Remark 7.2.
For the SANASIA estimator, we choose $\gamma$ to have variance that is a factor of $1/n$ smaller than the variances of the $\Gamma$ parameters. Viewing SANASIA as an approximation to SANIA, $\gamma$ corresponds to an average across units of the $\Gamma$ parameters; hence the variance of $\gamma$ matches the variance of $\bar{\Gamma}(1)$ for the Independent estimator. We could have scaled the variances for the Equal estimator, but since the MIV LUE is invariant under uniform scaling of the variances, it would not change the results presented.

Computation times for these simulations depend greatly on two factors: the number of units and the number of allocations in the support of the design. For most of the estimators, computation will be approximately linear in the number of units, but for the Equal and SANASIA estimators the computation time will scale polynomially with the number of units, since a matrix inversion is required. Additionally, since we will compute the mean square error with respect to the design, each of the estimators will have computation time which scales linearly with the number of allocations in the support of the design. In order to consider more analyses, we choose to keep these numbers small and to leave efforts towards more efficient computation for future work. To this end, the second common aspect of our simulations is that we impose that the cardinality of the support is the minimum of $2^n - 2$ and a fixed cap, where the particular allocations used are sampled uniformly from all allocations. Additionally, we always omit the allocations $\mathbf{0}$ and $\mathbf{1}$, where all units receive the same treatment.

Finally, for each of the simulations we impose that SANIA holds for the potential outcomes. While other sets of assumptions such as NIA may be of interest, restricting our attention to the SANIA setting allows us to investigate the impact of the core aspects of the problem, such as the number of units, the density and degree distribution of the network, and the effect sizes.
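The basic simulation loop just described, sampling a network, fixing SANIA parameters, drawing a small design support, and evaluating the mean square error exactly over that support, can be sketched as follows. This is our own simplified illustration (only the naive estimator is shown, and all sizes and seeds are arbitrary assumptions), not the paper's simulation code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# SANIA parameters: Y_i(z) = alpha_i + beta_i z_i + Gamma_i(d_i^z).
alpha = rng.normal(0.0, 1.0, n)
beta = rng.normal(2.0, 1.0, n)
gamma = rng.normal(np.arange(n), 1.0, size=(n, n))  # gamma[i, d] plays Gamma_i(d)

# Erdos-Renyi graph with edge probability 1/2.
upper = np.triu(rng.random((n, n)) < 0.5, k=1).astype(int)
g = upper + upper.T

def outcomes(z):
    d = g @ z                       # treated degrees under allocation z
    return alpha + beta * z + gamma[np.arange(n), d]

def naive(z, y):
    return y[z == 1].mean() - y[z == 0].mean()

# Design: uniform over a small random support, omitting all-0 and all-1.
support = [z for z in (rng.integers(0, 2, n) for _ in range(20))
           if 0 < z.sum() < n]

beta_bar = beta.mean()
mse = np.mean([(naive(z, outcomes(z)) - beta_bar) ** 2 for z in support])
```

Swapping in the other five estimators and averaging the resulting mean square errors over many replicate graphs reproduces the structure of the experiments reported below.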
As we vary these aspects of the problem, we will deviate from the regions of the parameter space where each of the estimators tends to perform well, in order to discover how the performance of each estimator changes.

For this experiment we investigate the performance of the estimators as the number of units $n$ varies. We also consider two types of graphs: dense graphs, where for each Monte Carlo replicate we sample an Erdős-Rényi graph with edge probability $1/2$, and sparse graphs, where for each Monte Carlo replicate we sample an Erdős-Rényi graph with edge probability $1/n$.

For the potential outcomes, we impose that SANIA holds and use parameters that are common across all Monte Carlo replicates with the same number of units. The parameters are sampled independently with $\alpha_i \sim N(0, 1)$, $\beta_i \sim N(2, 1)$, and $\Gamma_i(d) \sim N(d, 1)$, using the same seed across replicates.

For each $n$ ranging from 10 to 50, we sampled 500 graphs for each of the settings, dense $ER(n, 1/2)$ and sparse $ER(n, 1/n)$. For each graph, we computed the six estimators and the mean square error for the estimators over the fixed randomization of the allocations $z$. We then averaged the mean square error across graphs with the same number of units for the dense and sparse settings. These estimated mean square errors are plotted in Figure 4. We omit error bars, as the number of Monte Carlo replicates was sufficiently large for the error bars to be quite small.

First, the performance ranks of the estimators are consistent across the number of units and the density. Typically, Equal has the lowest mean square error, followed by Stratified Naive, Naive, SANASIA, Independent, and finally Horvitz-Thompson. However, when the number of units is small and the graph is dense, this order can change. Both the Horvitz-Thompson and Naive estimators are biased and do not take treated degree into account. Nevertheless, Naive is generally one of the three best estimators according to these metrics. The Horvitz-Thompson estimator has much higher variance and is always the worst estimator.
Figure 4: Mean square errors on the log scale for the six estimators as a function of the number of units, for dense graphs and sparse graphs.

The MIV LUE Independent can be viewed as a version of the Horvitz-Thompson estimator which stratifies based on treated degree. Independent has slightly better performance, though the bias reduction does not make up for the large variance due to inverse propensity weighting. The MIV LUE SANASIA is also derived from a prior where all the $\alpha_i$ and $\beta_i$ are independent, but it additionally imposes that the interference effects are constant and linear, which yields slight improvements over Independent. The MIV LUE Equal imposes the most regularity in the prior, with all parameters being equal across units, which yields the best performance overall.

One problematic aspect of Horvitz-Thompson, Independent, and SANASIA, each of which implicitly or explicitly allows heterogeneity in baseline outcomes and direct effects, is that in the dense case the mean square error actually increases with the number of units. This suggests that these estimators have difficulty accounting for large interference effects; indeed, unlike naive-type estimators, Horvitz-Thompson estimators can have variances that scale with the effect sizes. All the estimators improve as the number of units increases in the sparse case, where the overall interference effect per unit remains approximately constant in the number of units.

In this next experiment, we investigate the impact of the size of the effects on the performance of the estimators. This is of interest since the MIV LUEs are each derived from mean-zero priors, so we might expect performance to vary as the effect sizes increase.
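The SANIA data-generating process used across these experiments, and the design-based mean square error by which the estimators are compared, can be sketched as follows (helper names are ours; `Gamma[i, d]` stores $\Gamma_i(d)$):

```python
import numpy as np

def sania_outcomes(alpha, beta, Gamma, z, A):
    """Potential outcomes under SANIA:
        Y_i(z) = alpha_i + beta_i z_i + Gamma_i(d_i^z),
    where d_i^z is unit i's treated degree (number of treated neighbors)
    computed from the adjacency matrix A."""
    z = np.asarray(z)
    d = A @ z                                  # treated degrees
    return alpha + beta * z + Gamma[np.arange(len(z)), d]

def design_mse(weights, support, probs, alpha, beta, Gamma, A):
    """Exact design-based MSE of a linear estimator sum_i w_i(z) Y_i(z)
    for the average direct treatment effect beta-bar, computed by
    enumerating the (restricted) support of the design."""
    target = beta.mean()
    mse = 0.0
    for z, p in zip(support, probs):
        est = weights(z) @ sania_outcomes(alpha, beta, Gamma, z, A)
        mse += p * (est - target) ** 2
    return mse
```

Here `weights` is any map from an allocation to a weight vector, so the same routine evaluates all six estimators on a common design.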
Figure 5: Mean square errors on the log scale for the six estimators as a function of the direct treatment and interference effect sizes. Each panel corresponds to one interference effect size (0, 0.8, and 1.6), with the direct treatment effect size on the horizontal axis.

For each Monte Carlo replicate in this setting, we sampled an Erdős-Rényi random graph with edge probability 1/2 and drew parameters $\alpha_i \sim N(0, 1)$, $\beta_i \sim N(\mu_\beta, 1)$, and $\Gamma_i(d) \sim N(\mu_\gamma d, 1)$, where $\mu_\beta$ ranged from 0 to 4 and $\mu_\gamma$ took the values 0, 0.8, and 1.6. We also investigated the impact of the variances of the $\beta_i$ and $\Gamma_i(d)$ terms, but we found that the impact of treatment effect heterogeneity was small compared to the impact of effect sizes. The optimal estimators tended not to change quickly with the level of treatment effect heterogeneity. While it may be natural to posit that an estimator such as Independent will tend to have relatively superior performance as the effect heterogeneity increases, we found that this is only marginally the case. Independent was less impacted by treatment effect heterogeneity, with a relatively small slope as effect heterogeneity increased; however, the overall scale of the errors for Independent means that the effect heterogeneity must be very large before Independent will perform better than the other estimators we investigated. Additionally, the Naive estimator was the least susceptible to treatment effect heterogeneity and hence is frequently the best estimator when treatment effect variances are large. The performance of the estimators could be impacted by other aspects of the effect size distribution, such as the skewness, but we have not yet investigated the nature of those impacts.

For the simulations above, we only considered graphs distributed according to Erdős-Rényi distributions, which many have noted do not adequately fit most real-world graphs [Barabási and Albert, 1999, Newman, 2003]. For example, the degree distributions of Erdős-Rényi graphs are approximately Poisson or binomial, while many real-world graphs exhibit more skewed degree distributions, such as power-law distributions. In this simulation we generate graphs with different degree distributions using variations on the preferential attachment model [Barabási and Albert, 1999]. In the preferential attachment model, nodes are added one at a time, with each new node having one edge connecting it to one of the nodes currently in the graph.
The probability that the new edge connects to a node currently in the graph is proportional to $d_i^\rho$, where $d_i$ is the current degree of node i and $\rho$ is a power parameter. Once all n nodes have been added and attached to the graph, we randomly permute the vertices, which preserves the degrees but not the order in which the nodes were added. This ensures that the fixed designs are not biased towards higher or lower degree nodes.

In the standard preferential attachment model, $\rho = 1$, which yields a power-law distribution for the degrees. When $\rho = 0$, the distribution is similar to an Erdős-Rényi distribution, while when $\rho = 2$, the degree distribution becomes highly skewed, frequently yielding a star graph.

Figure 6 shows the mean square error as a function of the power parameter $\rho$. We see that most estimators have their performance degrade increasingly quickly as the power parameter increases, especially beyond $\rho = 1.5$. The best estimator up to that point is again the MIV LUE Equal. This estimator has performance that becomes non-smooth when the power gets large. We believe this occurs because the distribution of the graphs tends to concentrate and the Equal estimator may be sensitive to small changes in the graph structure. Finally, the Stratified Naive estimator actually has improved performance as the degree distribution becomes more skewed. This is likely because the Stratified Naive estimator omits units that have high treated degree, effectively estimating the mean DTE for the remaining nodes, which all have similar treated degrees. Estimators such as Equal must instead be fully unbiased and hence cannot omit nodes with large treated degree.

Figure 6: Mean square errors for the six estimators as a function of the power parameter $\rho$ of the preferential attachment model.
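The graph generator described above can be sketched as follows, assuming a single-edge seed graph (the paper does not specify the initial condition):

```python
import random

def power_preferential_attachment(n, rho, rng=random):
    """Grow a graph by adding nodes one at a time; each new node attaches
    a single edge to an existing node chosen with probability
    proportional to (current degree)**rho.  rho=1 is standard
    preferential attachment; rho=0 is close to uniform attachment;
    large rho concentrates edges on a few hubs, approaching a star."""
    edges = [(0, 1)]                   # seed graph: a single edge
    degree = [1, 1]
    for new in range(2, n):
        weights = [d ** rho for d in degree]
        target = rng.choices(range(new), weights=weights)[0]
        edges.append((new, target))
        degree[target] += 1
        degree.append(1)
    # randomly relabel vertices so that fixed designs are not biased
    # by the order in which nodes were added
    perm = list(range(n))
    rng.shuffle(perm)
    return [(perm[u], perm[v]) for u, v in edges]
```

By construction the result is a connected tree on n nodes with n − 1 edges; the final relabeling preserves the degree sequence, matching the permutation step described above.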
In this work we have considered the neighborhood interference assumption and four core structural assumptions regarding the potential outcomes. The neighborhood interference assumption is the simplest nonparametric assumption about which units' treatments can affect which units' outcomes in a setting with networked units. As we have seen, even this assumption can lead to complex high-dimensional parameterizations of the potential outcomes that make estimation and inference difficult.

Some practitioners may desire to allow units at a greater distance to affect each other, and this can be done, but it requires either allowing for even more potential outcomes, losing information about path lengths by rewiring the graph, or parameterizing how vertices at a greater distance can affect units. Additionally, due to the small-world property of most observed networks, allowing treatments to affect units at a distance of more than two or three edges would often be nearly equivalent to the arbitrary interference setting. Hence, even though NIA does not capture all aspects of possible interference in a real problem, we believe that for many practitioners NIA does provide a sufficiently rich structure to approximate real-world interference, while promoting a more parsimonious parameterization.

A more challenging problem in many real-world settings is that the observed network may not fully capture the possible paths of interference. If we view the observed network as a noisy version of a true network encoding interference, then under SANIA, the observed treated degree will also be a noisy observation of the true treated degree for that allocation. In these settings, it may be necessary to account for this noise by modeling the true network and the noise structure of the observed network, or by modeling the noisy structure of the interference directly.
We believe these problems pose very interesting challenges even in the SANIA setting.

We focused on estimation of the direct treatment effect, and most of our efforts were restricted to the SANIA setting. Experimenters may be interested in other estimands under other sets of assumptions, but we believe that the general structure of our results regarding how to choose among unbiased estimators will hold for other estimands, provided the unbiasedness constraints are changed appropriately.

Another core aspect of this work is the focus on unbiased estimators, and while we showed in simulation that this can lead to improved performance, it is still not clear that unbiased estimators should be generally preferred. Our simulations showed that when interference was present, the Equal MIV LUE and the stratified naive estimator performed comparably. Unbiasedness may be desired if combining the results of many experiments is a goal, and unbiased estimators can sometimes be more easily interpreted. In general, the desired optimality properties of the estimator must be chosen by the practitioner. If unbiasedness is wanted, we have shown that using more generic unbiased estimators based on inverse propensity score weighting can lead to a significant sacrifice in performance. Additionally, the minimum integrated variance concept provides a core tool for choosing among unbiased estimators.

One of the major challenges for using these ideas on large-scale social networks is the computational burden of computing the MIV LUEs, especially those where the prior imposes correlation between units. For relatively simple designs, such as Bernoulli trials and completely randomized designs, computation of propensities is straightforward, so for uncorrelated priors the computation can be easily parallelized across units. For correlated priors, computation of $\Sigma(z)^{-1}$ is required for each z in the support of p.
This can be parallelized across allocations, with the relevant $w(z)$ computed in a MapReduce framework.

To reduce the computational burden, practitioners do have a few options. One option is to reduce the number of units or to repeat the experiment across small disconnected subgraphs. Another option, which we adopted in the simulations, is to reduce the support of the design p. While we did this randomly, one could use rerandomization to reduce the support towards certain desirable designs, such as those with low treated degrees or balanced treated degrees [Basse and Airoldi, 2015].

In general, constructing the design presents many challenges, and one promising aspect of this work is to use the minimum integrated variance ideas to motivate design decisions as well as estimation procedures. Ideally, one could seek to jointly optimize the design p and the weights w, but this introduces further computational challenges, since an iterative approach could involve finding a MIV LUE at each iteration. Another option is to first optimize the design based on other criteria, such as balance of the treated degrees, and then find the MIV LUE for that fixed design.

Finally, we considered the relatively unrealistic setting where there are no additional covariates about the units. Unit covariates can be handled by many classical tools from causal inference, and tools like MIV LUEs can be applied in an analogous manner. A challenge more closely related to this manuscript arises if the edges between units have covariate information. If this information is categorical, such as a communication type or topic, then one may want to partition the treated degree according to each edge type, effectively increasing the number of types of interference treatments. If these covariates are continuous or more complex, then accounting for that information presents further challenges.

Acknowledgements
This work was partially supported by National Science Foundation awards CAREER IIS-1149662 and IIS-1409177, and by Office of Naval Research YIP award N00014-14-1-0485, all to Harvard University. EMA is an Alfred P. Sloan Research Fellow, and a Shutzer Fellow at the Radcliffe Institute for Advanced Study.
References
Peter M. Aronow and Cyrus Samii. Estimating average causal effects under interference between units. arXiv preprint arXiv:1305.6156, 2013.

Susan Athey, Dean Eckles, and Guido W. Imbens. Exact p-values for network interference. Technical report, National Bureau of Economic Research, 2015.

Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

Guillaume W. Basse and Edoardo M. Airoldi. Optimal design of experiments in the presence of network-correlated outcomes. arXiv preprint arXiv:1507.00803, 2015.

Peter J. Bickel and Kjell A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, Volume I, volume 117. CRC Press, 2015.

Robert M. Bond, Christopher J. Fariss, Jason J. Jones, Adam D. I. Kramer, Cameron Marlow, Jaime E. Settle, and James H. Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415):295–298, 2012.

G. E. P. Box and Norman R. Draper. A basis for the selection of a response surface design. Journal of the American Statistical Association, 54:622–654, 1959.

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

David Roxbee Cox. Planning of Experiments. 1958.

Olivier David and Rob A. Kempton. Designs for interference. Biometrics, pages 597–606, 1996.

Dean Eckles, Brian Karrer, and Johan Ugander. Design and analysis of experiments in networks: Reducing bias from interference. arXiv preprint arXiv:1404.7530, 2014.

Dean Eckles, René F. Kizilcec, and Eytan Bakshy. Estimating peer effects in networks with peer encouragement designs. Proceedings of the National Academy of Sciences, 113(27):7316–7322, 2016.

M. Ghosh and G. Meeden. Bayesian Methods for Finite Population Sampling, volume 79 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1997.

Chris Godsil and Gordon F. Royle. Algebraic Graph Theory, volume 207. Springer Science & Business Media, 2013.

James E. Grizzle. The two-period change-over design and its use in clinical trials. Biometrics, pages 467–480, 1965.

Huan Gui, Ya Xu, Anmol Bhasin, and Jiawei Han. Network A/B testing: From sampling to estimation. In Proceedings of the 24th International Conference on World Wide Web, pages 399–409. International World Wide Web Conferences Steering Committee, 2015.

Nicholas J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 2002.

Peter D. Hoff. A First Course in Bayesian Statistical Methods. Springer Science & Business Media, 2009.

Michael G. Hudgens and M. Elizabeth Halloran. Toward causal inference with interference. Journal of the American Statistical Association, 2012.

Vishesh Karwa and Edo Airoldi. Bias of classic estimates under interference. In preparation, 2016.

R. A. Kempton and G. Lockwood. Inter-plot competition in variety trials of field beans (Vicia faba L.). The Journal of Agricultural Science, 103(2):293–302, 1984.

David A. Kim, Alison R. Hwong, Derek Stafford, D. Alex Hughes, A. James O'Malley, James H. Fowler, and Nicholas A. Christakis. Social network targeting to maximise population behaviour change: a cluster randomised controlled trial. The Lancet, 386(9989):145–153, 2015.

Charles F. Manski. Nonparametric bounds on treatment effects. The American Economic Review, pages 319–323, 1990.

Charles F. Manski. Identification of treatment response with social interactions. The Econometrics Journal, 16(1):S1–S23, 2013.

Matthew B. McQueen, Jason D. Boardman, Benjamin W. Domingue, Andrew Smolen, Joyce Tabor, Ley Killeya-Jones, Carolyn T. Halpern, Eric A. Whitsel, and Kathleen Mullan Harris. The National Longitudinal Study of Adolescent to Adult Health (Add Health) sibling pairs genome-wide data. Behavior Genetics, 45(1):12–23, 2015.

Mark E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.

J. Neyman. Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. Excerpts reprinted in English (1990), Statistical Science, 5:463–472, 1923.

Paul R. Rosenbaum. Interference between units in randomized experiments. Journal of the American Statistical Association, 102(477), 2007.

Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688, 1974.

Donald B. Rubin. Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, 6(1):34–58, 1978.

Yousef Saad. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, second edition, 2003.

Panos Toulis and Edward Kao. Estimation of causal peer influence effects. In Proceedings of the 30th International Conference on Machine Learning, 2013.

Johan Ugander, Brian Karrer, Lars Backstrom, and Jon Kleinberg. Graph cluster randomization: Network exposure to multiple universes. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 329–337. ACM, 2013.
A Parametric Forms for Combinations of Assumptions
Here we give parameterizations for the remaining nine combinations of the four assumptionsfrom Section 3.3.
Definition A.1 (SNIA). If we assume symmetrically received interference effects, then the potential outcomes can be parameterized as
$$Y_i(z) = \alpha_i + \beta_i z_i + \Gamma_i(d_i^z) + \Delta_i(d_i^z)\, z_i, \tag{A.1}$$
where $\alpha_i, \beta_i \in \mathbb{R}$ and $\Gamma_i, \Delta_i : \{0, 1, \ldots, d_i\} \mapsto \mathbb{R}$.

Definition A.2 (ANIA). The potential outcomes satisfy additivity of main effects if and only if the potential outcomes can be parameterized as $Y_i(z) = \alpha_i + \beta_i z_i + \Gamma_i(z_{N_i})$, where $\alpha_i, \beta_i \in \mathbb{R}$ and $\Gamma_i : \{0,1\}^{d_i} \mapsto \mathbb{R}$.

Definition A.3 (NAIA). The potential outcomes satisfy additivity of interference effects if and only if the potential outcomes can be parameterized as
$$Y_i(z) = \alpha_i + \beta_i z_i + \sum_j \gamma_{ji}\, g_{ji} z_j + \sum_j \delta_{ji}\, g_{ji} z_i z_j, \tag{A.2}$$
where for each $i, j \in [n]$, $\alpha_i, \beta_i, \gamma_{ij}, \delta_{ij} \in \mathbb{R}$.

Remark A.4. Each $\gamma_{ji}$ represents the effect of unit j being treated on the outcome of unit i, provided there is an edge from j to i; similarly, $\delta_{ij}$ represents the additional effect due to any interaction between the treatments of i and j. Note that for each unit i there are still $2^{d_i + 1}$ distinct potential outcomes, but those outcomes can be parameterized using only $2 d_i + 2$ parameters.

Definition A.5 (ANAIA). The potential outcomes satisfy additivity of main effects and additivity of interference effects if and only if the potential outcomes can be parameterized as
$$Y_i(z) = \alpha_i + \beta_i z_i + \sum_j \gamma_{ij}\, g_{ij} z_j \tag{A.3}$$
for $\alpha_i, \beta_i, \gamma_{ij} \in \mathbb{R}$.

Definition A.6 (SNAIA). The potential outcomes satisfy symmetrically received interference effects and additivity of interference effects if and only if the potential outcomes can be parameterized as
$$Y_i(z) = \alpha_i + \beta_i z_i + \gamma_i d_i^z + \delta_i d_i^z z_i \tag{A.4}$$
for some $\alpha_i, \beta_i, \gamma_i, \delta_i \in \mathbb{R}$.

Definition A.7 (SANAIA). The potential outcomes satisfy symmetrically received interference effects, additivity of main effects, and additivity of interference effects if and only if the potential outcomes can be parameterized as
$$Y_i(z) = \alpha_i + \beta_i z_i + \gamma_i \sum_j g_{ji} z_j = \alpha_i + \beta_i z_i + \gamma_i d_i^z \tag{A.5, A.6}$$
for some $\alpha_i, \beta_i, \gamma_i \in \mathbb{R}$.

Definition A.8 (NASIA). The potential outcomes satisfy additivity of interference effects and symmetrically sent interference effects if and only if the potential outcomes can be parameterized as
$$Y_i(z) = \alpha_i + \beta_i z_i + \sum_j \gamma_j\, g_{ji} z_j + \sum_j \delta_j\, g_{ji} z_i z_j \tag{A.7}$$
for some $\alpha_i, \beta_i, \gamma_j, \delta_j \in \mathbb{R}$.

Definition A.9 (ANASIA). The potential outcomes satisfy additivity of main effects, additivity of interference effects, and symmetrically sent interference effects if and only if the potential outcomes can be parameterized as
$$Y_i(z) = \alpha_i + \beta_i z_i + \sum_j \gamma_j\, g_{ji} z_j \tag{A.8}$$
for some $\alpha_i, \beta_i, \gamma_j \in \mathbb{R}$.

Definition A.10 (SNASIA). The potential outcomes satisfy symmetrically received interference effects, additivity of interference effects, and symmetrically sent interference effects if and only if the potential outcomes can be parameterized as
$$Y_i(z) = \alpha_i + \beta_i z_i + \gamma_i d_i^z + \delta_i d_i^z z_i \tag{A.9}$$
for some $\alpha_i, \beta_i, \gamma_i, \delta_i \in \mathbb{R}$, where as in Proposition 4.1, $\gamma_i = \gamma_j$ and $\delta_i = \delta_j$ if i and j are in the same connected component of $h = I\{g^T g > 0\}$.

B MIV LUE under SANIA
In this section we develop a method to solve for the minimum integrated variance linear unbiased estimator under SANIA for quite general priors. The method relies on the use of Lagrange multipliers to solve the constrained optimization problem [Boyd and Vandenberghe, 2004]. We require that $\Sigma(z)$ is positive semidefinite for each z in the support of the design. Note that we do not require $\Sigma(z)$ to be nonsingular, since singular $\Sigma(z)$ arise for certain priors, such as those where parameters are almost surely constant across units.

B.1 General Case

Using the definition of $\Sigma(z)$ from Eq. (6.2), half of the integrated mean square error for an estimate $\widehat{\beta}_w$ is given by
$$\frac{1}{2} \sum_z p(z)\, w(z)^T \Sigma(z)\, w(z).$$
We use half the IMSE because it simplifies some formulas later without changing the solution. Using the method of Lagrange multipliers, we know that the best weights will be a stationary point of the function
$$L(w, \lambda) = \frac{1}{2} \sum_z p(z)\, w(z)^T \Sigma(z)\, w(z) + \lambda_\alpha^T \sum_z p(z)\, w(z) + \lambda_\beta^T \left( \sum_z p(z)\, \mathrm{diag}(z)\, w(z) - \frac{1}{n} \mathbf{1} \right) + \sum_{d=1}^{n-1} \lambda_{\Gamma d}^T \sum_z p(z)\, \mathrm{diag}\!\left( I\{d^z = d\} \right) w(z),$$
where $w : \{0,1\}^n \mapsto \mathbb{R}^n$ and $\lambda_\alpha, \lambda_\beta, \lambda_{\Gamma 1}, \ldots, \lambda_{\Gamma, n-1} \in \mathbb{R}^n$ are vectors. Note that we use the notation $I\{d^z = d\}$ to refer to the vector of indicators, so that $I\{d^z = d\}_i = I\{d_i^z = d\}$. Similar abuses of notation will be made later, hopefully with minimal confusion for the reader. Finally, for a discrete random variable X, its support is the set of values which the random variable takes with positive probability, which we denote $\mathrm{supp}(X) = \{x : P[X = x] > 0\}$.

In order to find the MIV LUE, we must find the Karush-Kuhn-Tucker (KKT) point [Boyd and Vandenberghe, 2004] for this problem, which is the pair $(w, \lambda)$ satisfying $\nabla L(w, \lambda) = 0$. This is a system of linear equations with $|\mathrm{supp}(z^{\mathrm{obs}})|\, n + 2n + \sum_{i=1}^n |\mathrm{supp}(d_i^{z^{\mathrm{obs}}}) \setminus \{0\}|$ unknowns and as many equations. As is the case for generalized least squares estimation in linear regression, the optimal weights are found by solving the linear system $\nabla L = 0$.

Remark B.1.
For a Bernoulli trial, $\nabla L$ has $2^n n + 2n + \sum_i d_i$ rows, which for $n = 12$ is nearly 50000. Fortunately, this system is quite sparse, with only $O(2^n n^2)$ non-zero entries rather than $O(4^n n^2)$, so the use of sparse linear system solvers may lead to faster algorithms [Saad, 2003]. Furthermore, the support of the design can be restricted, as was done in our simulations, to ease this computational burden.

B.2 Nonsingular case
If it holds that $\Sigma(z)$ is nonsingular for each z in the support of p, then this system can be solved more rapidly. First, taking derivatives with respect to $w(z)$ for a given allocation z, we have that
$$\frac{\partial L}{\partial w(z)} = p(z) \left( \Sigma(z)\, w(z) + \lambda_\alpha + \mathrm{diag}(z)\, \lambda_\beta + \sum_{d=1}^{n-1} \mathrm{diag}\!\left( I\{d^z = d\} \right) \lambda_{\Gamma d} \right).$$
As $\Sigma(z)$ is invertible, we can solve $\frac{\partial L}{\partial w(z)} = 0$, which yields
$$w(z) = -\Sigma(z)^{-1} \left( \lambda_\alpha + \mathrm{diag}(z)\, \lambda_\beta + \sum_{d=1}^{n-1} \mathrm{diag}\!\left( I\{d^z = d\} \right) \lambda_{\Gamma d} \right). \tag{B.1}$$
Taking derivatives with respect to the λ terms and plugging in Eq. (B.1) gives
$$\frac{\partial L}{\partial \lambda_\alpha} = -\sum_z p(z)\, \Sigma(z)^{-1} \left( \lambda_\alpha + \mathrm{diag}(z)\, \lambda_\beta + \sum_{d=1}^{n-1} \mathrm{diag}\!\left( I\{d^z = d\} \right) \lambda_{\Gamma d} \right),$$
$$\frac{\partial L}{\partial \lambda_\beta} = -\sum_z p(z)\, \mathrm{diag}(z)\, \Sigma(z)^{-1} \left( \lambda_\alpha + \mathrm{diag}(z)\, \lambda_\beta + \sum_{d=1}^{n-1} \mathrm{diag}\!\left( I\{d^z = d\} \right) \lambda_{\Gamma d} \right) - \frac{1}{n} \mathbf{1},$$
$$\frac{\partial L}{\partial \lambda_{\Gamma d}} = -\sum_z p(z)\, \mathrm{diag}\!\left( I\{d^z = d\} \right) \Sigma(z)^{-1} \left( \lambda_\alpha + \mathrm{diag}(z)\, \lambda_\beta + \sum_{d'=1}^{n-1} \mathrm{diag}\!\left( I\{d^z = d'\} \right) \lambda_{\Gamma d'} \right).$$
To find the optimal λ, we set the above equations to zero and solve for the λ terms. This is now a system with $2n + \sum_i |\mathrm{supp}(d_i^{z}) \setminus \{0\}|$ unknowns and equations, which must have a solution since we are working under the conditions where an unbiased estimator exists. Finally, after solving for the λ terms, the MIV LUE is found by plugging them back into Eq. (B.1). In the next section we show how this system can be solved analytically in the case of uncorrelated priors, which leads to Theorem 6.2.

Remark B.2.
The case where $\Sigma(z)$ is invertible offers a substantial reduction in computation compared to the general case. First, finding all $\Sigma(z)^{-1}$ involves inverting $|\mathrm{supp}(z^{\mathrm{obs}})|$ matrices of size $n \times n$. Following this, we need to sum various diagonal transformations of the $\Sigma(z)^{-1}$ over all allocations. We note that these two operations fit well into a MapReduce framework and hence can be performed relatively quickly in a distributed or parallel system. At this point we have not fully explored the computational aspects of this procedure, but further methodological research will likely allow scaling to relatively large problems, provided the support of the design does not grow too quickly.

Remark B.3.
If $\Sigma(z)$ is singular, the strategy outlined in this section can still be attempted with the inverse of $\Sigma(z)$ replaced by the Moore-Penrose pseudoinverse. Unfortunately, this method is not guaranteed to find an optimal solution, but it is straightforward to check whether the KKT conditions are satisfied. Again, we have not explored this in detail, but we do note that this method finds the solution given in Section B.4.

B.3 Uncorrelated Case
In this subsection, we prove Theorem 6.2. Uncorrelated priors automatically yield a nonsingular covariance structure, so everything in the previous subsection applies in this case, but the equations simplify substantially.
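The closed form that emerges from the proof below can be sketched computationally. In this illustration (function and variable names are ours), `var(z)` supplies the prior variances $\sigma_i(z)$, which under SANIA with a between-unit-uncorrelated prior depend on z only through $z_i$ and the treated degree $d_i^z$:

```python
import numpy as np

def mivlue_weights_uncorrelated(support, probs, A, var):
    """MIV LUE weights under SANIA for a prior uncorrelated across units:
        w_i(z) = (z_i - g_{i,d}/f_{i,d}) / (n * sigma_i(z) * D_i),
    with d = d_i^z the treated degree of unit i,
        f_{id} = sum_z p(z) 1{d_i^z = d} / sigma_i(z),
        g_{id} = sum_z p(z) z_i 1{d_i^z = d} / sigma_i(z),
        D_i    = sum_d g_{id} (1 - g_{id}/f_{id})."""
    n = A.shape[0]
    f, g = {}, {}
    for z, p in zip(support, probs):
        d = A @ np.asarray(z)
        s = var(z)
        for i in range(n):
            f[i, d[i]] = f.get((i, d[i]), 0.0) + p / s[i]
            g[i, d[i]] = g.get((i, d[i]), 0.0) + p * z[i] / s[i]
    D = np.zeros(n)
    for (i, d), gid in g.items():
        D[i] += gid * (1.0 - gid / f[i, d])
    weights = {}
    for z in support:
        dz = A @ np.asarray(z)
        s = var(z)
        weights[z] = np.array([(z[i] - g[i, dz[i]] / f[i, dz[i]]) / (n * s[i] * D[i])
                               for i in range(n)])
    return weights
```

The resulting weights satisfy the unbiasedness constraints exactly: for each unit, $\sum_z p(z) w_i(z) = 0$ and $\sum_z p(z) z_i w_i(z) = 1/n$, so the weighted sum of outcomes has expectation $\bar{\beta}$ under SANIA.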
Proof of Theorem 6.2.
First, $\Sigma(z) \in \mathbb{R}^{n \times n}$ will be diagonal with $\Sigma(z)_{ii} = \sigma_i(z)$, so that
$$w_i(z) = -\sigma_i(z)^{-1} \left( \lambda_{\alpha i} + z_i \lambda_{\beta i} + \sum_{d=1}^{n-1} I\{d_i^z = d\}\, \lambda_{\Gamma d, i} \right). \tag{B.2}$$
Let
$$a_i = \sum_z \frac{p(z)}{\sigma_i(z)}, \qquad b_i = \sum_z \frac{p(z)\, z_i}{\sigma_i(z)}, \tag{B.3}$$
$$f_{id} = \sum_z \frac{p(z)\, I\{d_i^z = d\}}{\sigma_i(z)}, \qquad g_{id} = \sum_z \frac{p(z)\, z_i\, I\{d_i^z = d\}}{\sigma_i(z)}, \tag{B.4}$$
so that
$$\frac{\partial L}{\partial \lambda_{\alpha i}} = -\left( \lambda_{\alpha i}\, a_i + \lambda_{\beta i}\, b_i + \sum_{d=1}^{n-1} \lambda_{\Gamma, d, i}\, f_{id} \right),$$
$$\frac{\partial L}{\partial \lambda_{\beta i}} = -\left( \lambda_{\alpha i}\, b_i + \lambda_{\beta i}\, b_i + \sum_{d=1}^{n-1} \lambda_{\Gamma, d, i}\, g_{id} + \frac{1}{n} \right), \quad \text{and}$$
$$\frac{\partial L}{\partial \lambda_{\Gamma, d, i}} = -\left( \lambda_{\alpha i}\, f_{id} + \lambda_{\beta i}\, g_{id} + \lambda_{\Gamma, d, i}\, f_{id} \right).$$
The last equation implies $\lambda_{\Gamma, d, i} = -\lambda_{\alpha i} - \lambda_{\beta i}\, g_{id}/f_{id}$, which we plug into the first two equations, yielding
$$\frac{\partial L}{\partial \lambda_{\alpha i}} = \lambda_{\alpha i} \left( -a_i + \sum_{d=1}^{n-1} f_{id} \right) + \lambda_{\beta i} \left( -b_i + \sum_{d=1}^{n-1} g_{id} \right),$$
$$\frac{\partial L}{\partial \lambda_{\beta i}} = \lambda_{\alpha i} \left( -b_i + \sum_{d=1}^{n-1} g_{id} \right) + \lambda_{\beta i} \left( -b_i + \sum_{d=1}^{n-1} \frac{g_{id}^2}{f_{id}} \right) - \frac{1}{n}.$$
Setting the above to zero gives
$$\lambda_{\alpha i} = \frac{g_{i0}/f_{i0}}{n \sum_{d=0}^{n-1} g_{id} \left( 1 - \frac{g_{id}}{f_{id}} \right)}, \qquad \lambda_{\beta i} = \frac{-1}{n \sum_{d=0}^{n-1} g_{id} \left( 1 - \frac{g_{id}}{f_{id}} \right)},$$
and
$$\lambda_{\Gamma, d, i} = \frac{-g_{i0}/f_{i0} + g_{id}/f_{id}}{n \sum_{d=0}^{n-1} g_{id} \left( 1 - \frac{g_{id}}{f_{id}} \right)},$$
which implies, after plugging into Eq. (B.2), that
$$w_i(z) = \frac{z_i - g_{i d_i^z} / f_{i d_i^z}}{n\, \sigma_i(z) \sum_{d=0}^{n-1} g_{id} \left( 1 - \frac{g_{id}}{f_{id}} \right)}.$$
By letting $h_{id} = f_{id} - g_{id}$, we can rewrite the above equation as
$$w_i(z) = (2 z_i - 1) \left( n\, P[z_i^{\mathrm{obs}} = z_i,\, d_i^{z^{\mathrm{obs}}} = d_i^z] \sum_{d=0}^{n-1} \frac{g_{id}\, h_{id}}{f_{id}} \right)^{-1} \frac{g_{i d_i^z}\, h_{i d_i^z}}{f_{i d_i^z}}.$$
Noting that
$$g_{id} = \frac{P[z_i^{\mathrm{obs}} = 1,\, d_i^{z^{\mathrm{obs}}} = d]}{\sigma_i(z')} \quad \text{and} \quad h_{id} = \frac{P[z_i^{\mathrm{obs}} = 0,\, d_i^{z^{\mathrm{obs}}} = d]}{\sigma_i(z'')},$$
where $z'$ and $z''$ are allocations in which the treated degree for unit i is d and $z'_i = 1$ and $z''_i = 0$, yields the estimate in Eq. 6.3 in Theorem 6.2. The first part of Theorem 6.1 is simply an application of Theorem 6.2.

B.4 Constant Prior Case for Vertex Transitive Graphs
The constant prior case, where the parameters are almost surely constant across units, does not provide immediate simplifications or closed forms for the MIV LUE for general graphs and designs. At this point, only the vertex transitive case has been fully explored (see Section 6.2.2), in which case a very simple form for the MIV LUE arises.
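As an illustration of the simple form derived below, the following sketch (names are ours) computes the weights of Eq. (6.6) from the counts $n_{z,d}$; it assumes $C > 0$, that is, some treated degree is attained by both treated and untreated units:

```python
import numpy as np

def constant_prior_weights(z, A):
    """Weights of the MIV LUE under a constant prior on a vertex-transitive
    graph (Proposition 6.5):
        w_i(z) = (2 z_i - 1) C_{d_i^z} / (C n_{z_i, d_i^z}),
    where n_{z,d} counts units with treatment z and treated degree d, and
    C_d = (1/n_{1,d} + 1/n_{0,d})^{-1} when both counts are positive
    (and C_d = 0 otherwise), with C = sum_d C_d."""
    z = np.asarray(z)
    n = len(z)
    d = A @ z                                   # treated degrees
    counts = {}
    for zi, di in zip(z, d):
        counts[zi, di] = counts.get((zi, di), 0) + 1
    C_d = {}
    for dv in set(d):
        n1, n0 = counts.get((1, dv), 0), counts.get((0, dv), 0)
        C_d[dv] = 0.0 if n1 == 0 or n0 == 0 else 1.0 / (1.0 / n1 + 1.0 / n0)
    C = sum(C_d.values())
    return np.array([(2 * zi - 1) * C_d[di] / (C * counts[zi, di])
                     for zi, di in zip(z, d)])
```

Because the weights depend only on the counts, they are identical across all allocations in an orbit, which is what the unbiasedness argument in the proof below relies on.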
Proof of Proposition 6.5.
To begin, we prove the result in the special case that the design p is supported on a single orbit of the automorphism group acting on the set of allocations. Specifically, the orbit of an allocation z is the set of allocations τ(z), where τ is an automorphism of the graph g acting on allocations via $\tau(z)_i = z_{\tau^{-1}(i)}$. Importantly, the set of orbits partitions the space of allocations [Godsil and Royle, 2013].

The first step in this proof is to show that the estimator is unbiased. In this special case, since we assume that the design is symmetric with respect to automorphisms, p must be uniform on the orbit. Recall that $n_{z,d}(z)$ denotes the number of units i where $z_i = z$ and $d_i^z = d$. Note that $n_{z,d}(z) = n_{z,d}(\tau(z))$ for any automorphism τ, since reassigning treatments according to an automorphism τ also reassigns the treated degrees according to τ. Hence, for now we abbreviate $n_{z,d}(z)$ as simply $n_{z,d}$, since it is the same for all allocations in the support. Before proceeding, we briefly prove the following group-theoretic lemmas.

Lemma B.4.
Suppose that g is vertex transitive and let T be the automorphism group of g. For any $i, j, k \in [n]$, the number of automorphisms τ with $\tau(i) = j$ is equal to the number of automorphisms with $\tau(i) = k$.

Proof. For any $i, j \in [n]$, let $T_{ij} \subset T$ be the set of automorphisms which map i to j. For any $\tau_{jk} \in T_{jk}$ and $\tau_{ij} \in T_{ij}$, we have that $\tau_{jk} \circ \tau_{ij} \in T_{ik}$. Additionally, by the fact that $\tau_{jk}$ has an inverse, $\tau_{ij} \neq \tau'_{ij}$ implies $\tau_{jk} \circ \tau_{ij} \neq \tau_{jk} \circ \tau'_{ij}$ for any distinct $\tau_{ij}, \tau'_{ij} \in T_{ij}$. This implies that $|T_{ij}| \leq |T_{ik}|$, and reversing the roles of j and k implies the result.

Lemma B.5.
Under the conditions of Proposition 6.5, where the design $p$ is supported on a single orbit,
$$P[z^{obs}_i = z,\, d^{z^{obs}}_i = d] = n_{z,d}/n.$$

Proof. Select a fixed allocation $\tilde{z} \in \mathrm{supp}(z^{obs})$. We can sample from $p$ by selecting a random automorphism $\tau$ drawn uniformly from all automorphisms and setting $z^{obs} = \tau(\tilde{z}) \sim p$. We now have that $P[z^{obs}_i = z,\, d^{z^{obs}}_i = d]$ is the probability that we draw an allocation that maps $i$ to a unit $j$ with $\tilde{z}_j = z$ and $d^{\tilde{z}}_j = d$. The number of such units is exactly $n_{z,d}$, and by Lemma B.4 the probability that the random automorphism $\tau$ maps $i$ to $j$, for any units $i$ and $j$, is exactly $|T_{ij}| / \sum_k |T_{ik}| = 1/n$.

Now, recall from Eq. (6.6) in Proposition 6.5 that the estimator is defined by the weights
$$w_i(z) = \frac{C_{d^z_i}\,(2 z_i - 1)}{C\, n_{z_i, d^z_i}}, \quad\text{where}\quad C_d = I\{n_{1,d} > 0,\, n_{0,d} > 0\}\left(\frac{1}{n_{1,d}} + \frac{1}{n_{0,d}}\right)^{-1} \ \text{ and } \ C = \sum_{d=0}^{n-1} C_d.$$
Altogether, we have that
$$\begin{aligned}
E[w_i(z^{obs})\, Y_i(z^{obs})]
&= \sum_{d=0}^{n-1} \frac{C_d}{C}\left(P[z^{obs}_i = 1, d^{z^{obs}}_i = d]\,\frac{1}{n_{1,d}}(\alpha_i + \beta_i + \Gamma_i(d)) - P[z^{obs}_i = 0, d^{z^{obs}}_i = d]\,\frac{1}{n_{0,d}}(\alpha_i + \Gamma_i(d))\right) \\
&= \sum_{d=0}^{n-1} \frac{C_d}{C}\left(\frac{n_{1,d}}{n\, n_{1,d}}(\alpha_i + \beta_i + \Gamma_i(d)) - \frac{n_{0,d}}{n\, n_{0,d}}(\alpha_i + \Gamma_i(d))\right) \\
&= \sum_{d=0}^{n-1} \frac{C_d}{C}\left(\frac{1}{n}(\alpha_i + \beta_i + \Gamma_i(d)) - \frac{1}{n}(\alpha_i + \Gamma_i(d))\right)
= \sum_{d=0}^{n-1} \frac{C_d}{C}\,\frac{1}{n}\,\beta_i = \frac{1}{n}\,\beta_i.
\end{aligned}$$
This implies that the estimator is unbiased.

The remaining KKT conditions are that
$$\Sigma(z)\, w(z) - \lambda_\alpha - \mathrm{diag}(z)\,\lambda_\beta - \sum_{d=1}^{n-1} \mathrm{diag}(I\{d^z = d\})\,\lambda_{\Gamma(d)} = 0, \tag{B.5}$$
for each $z$ in the support, where we are free to choose each $\lambda$ term, but the $\lambda$ terms cannot depend on $z$. Since all the parameters are equal across units, let us define $\sigma_{\xi,\xi'} = \mathrm{Cov}(\xi, \xi')$, where $\xi, \xi'$ can be any of $\alpha, \beta, \Gamma(1),$
$\ldots, \Gamma(n-1)$. Then the covariance of the outcomes is
$$\Sigma(z)_{ij} = \sigma_{\alpha,\alpha} + (z_i + z_j)\,\sigma_{\alpha,\beta} + z_i z_j\, \sigma_{\beta,\beta} + \sigma_{\Gamma(d^z_i),\Gamma(d^z_j)} + z_i\, \sigma_{\beta,\Gamma(d^z_j)} + z_j\, \sigma_{\beta,\Gamma(d^z_i)} + \sigma_{\alpha,\Gamma(d^z_i)} + \sigma_{\alpha,\Gamma(d^z_j)}.$$
Note that since $\sum_j w_j(z) = 0$, the quantity $(\Sigma(z)\, w(z))_i$ will have no contribution from the terms above that depend only on $z_i$ and $d^z_i$, including $\sigma_{\alpha,\alpha}$, $z_i\, \sigma_{\alpha,\beta}$, $\sigma_{\alpha,\Gamma(d^z_i)}$, and so on. Let $w_{z,d} = w_i(z)$ for any unit $i$ where $d^z_i = d$ and $z_i = z$, and note that $w_{z,d}\, n_{z,d} = (2z - 1)\, C_d / C$. Hence,
$$\begin{aligned}
(\Sigma(z)\, w(z))_i &= \sum_{d=0}^{n-1} \Big[ w_{0,d}\, n_{0,d} \left(\sigma_{\alpha,\alpha} + z_i \sigma_{\alpha,\beta} + \sigma_{\Gamma(d^z_i),\Gamma(d)} + z_i \sigma_{\beta,\Gamma(d)} + \sigma_{\alpha,\Gamma(d^z_i)} + \sigma_{\alpha,\Gamma(d)}\right) \\
&\qquad\quad + w_{1,d}\, n_{1,d} \left(\sigma_{\alpha,\alpha} + z_i \sigma_{\alpha,\beta} + \sigma_{\alpha,\beta} + z_i \sigma_{\beta,\beta} + \sigma_{\Gamma(d^z_i),\Gamma(d)} + z_i \sigma_{\beta,\Gamma(d)} + \sigma_{\beta,\Gamma(d^z_i)} + \sigma_{\alpha,\Gamma(d^z_i)} + \sigma_{\alpha,\Gamma(d)}\right) \Big] \\
&= \sum_{d=0}^{n-1} \frac{C_d}{C}\left(\sigma_{\alpha,\beta} + z_i\, \sigma_{\beta,\beta} + \sigma_{\beta,\Gamma(d^z_i)}\right)
= \sigma_{\alpha,\beta} + z_i\, \sigma_{\beta,\beta} + \sigma_{\beta,\Gamma(d^z_i)},
\end{aligned}$$
or, succinctly,
$$\Sigma(z)\, w(z) = \sigma_{\alpha,\beta}\,\mathbf{1} + \sigma_{\beta,\beta}\, z + \sum_{d=1}^{n-1} \sigma_{\beta,\Gamma(d)}\, I\{d^z = d\}.$$
Hence, setting $\lambda_\alpha = \sigma_{\alpha,\beta}\,\mathbf{1}$, $\lambda_\beta = \sigma_{\beta,\beta}\,\mathbf{1}$, and $\lambda_{\Gamma(d)} = \sigma_{\beta,\Gamma(d)}\,\mathbf{1}$ will ensure that Eq. (B.5) is satisfied.

This proves the result if $p$ is supported on a single orbit. If $p$ is not supported on a single orbit, then we can write $p$ as a mixture of designs, each supported on a single orbit. Since the estimator does not depend on the design, by conditioning we can verify that the estimator is unbiased for any such mixture. Similarly, the analysis of the remaining KKT conditions follows mutatis mutandis.

C MIV LUE under SANASIA
In this section we will sketch a derivation of the MIV LUE under SANASIA for a specific set of priors. We assume that the priors on the $\alpha_i$ and $\beta_i$ parameters are uncorrelated across units, and that the prior for the $\gamma$ parameter is uncorrelated with the priors on the $\alpha_i$ and $\beta_i$ parameters. As in the previous sections, we assume that the priors are all mean zero.

We will also assume, in the context of Proposition 4.1, that the graph $h$ has exactly one connected component, so that the interference effects are equal for all units. The results are similar for $h$ with multiple connected components, but the derivation requires additional notation and bookkeeping.

Recall that under SANASIA, $Y_i(z) = \alpha_i + \beta_i z_i + \gamma\, d^z_i$. The mean square error for a LUE is
$$MSE(\widehat{\beta}_w) = \sum_z p(z) \sum_{i=1}^n \sum_{j=1}^n w_i(z)\, w_j(z)\,(\alpha_i + \beta_i z_i + \gamma d^z_i)(\alpha_j + \beta_j z_j + \gamma d^z_j) - \bar{\beta}^2.$$
We will write the prior variance of $\alpha_i$ as $\sigma^2_{\alpha,i}$ and of $\beta_i$ as $\sigma^2_{\beta,i}$, and we write $\mathrm{Var}(\gamma) = \sigma^2_\gamma$. Hence, half of the integrated mean square error can be written as
$$\frac{1}{2}\sum_z p(z) \sum_{i=1}^n w_i(z)^2\left(\sigma^2_{\alpha,i} + \sigma^2_{\beta,i}\, z_i + d^z_i\, \sigma^2_\gamma \sum_j d^z_j\right) - \frac{1}{2n^2}\sum_{i=1}^n \sigma^2_{\beta,i}.$$
The Lagrangian for minimizing this subject to the unbiasedness constraints is
$$\mathcal{L}(w, \lambda) = \frac{1}{2}\sum_z p(z) \sum_{i=1}^n w_i(z)^2\left(\sigma^2_{\alpha,i} + \sigma^2_{\beta,i}\, z_i + d^z_i\, \sigma^2_\gamma \sum_j d^z_j\right) + \lambda_\alpha^T \sum_z p(z)\, w(z) + \lambda_\beta^T \left(\sum_z p(z)\,\mathrm{diag}(z)\, w(z) - \frac{1}{n}\mathbf{1}\right) + \lambda_\gamma \sum_z p(z) \sum_i w_i(z)\, d^z_i.$$
The derivative of the Lagrangian with respect to $w_i(z)$ is
$$\frac{\partial \mathcal{L}}{\partial w_i(z)} = p(z)\left(w_i(z)\left(\sigma^2_{\alpha,i} + \sigma^2_{\beta,i}\, z_i + \sigma^2_\gamma\, d^z_i \sum_j d^z_j\right) + \lambda_{\alpha,i} + \lambda_{\beta,i}\, z_i + \lambda_\gamma\, d^z_i\right),$$
so that
$$w_i(z) = -\frac{\lambda_{\alpha,i} + \lambda_{\beta,i}\, z_i + \lambda_\gamma\, d^z_i}{\sigma_i(z)},$$
where we define $\sigma_i(z) = \sigma^2_{\alpha,i} + \sigma^2_{\beta,i}\, z_i + \sigma^2_\gamma\, d^z_i \sum_j d^z_j$.

The derivatives with respect to the $\lambda$ terms, after plugging in the solution for $w_i(z)$ above, are
$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial \lambda_{\alpha,i}} &= -\sum_z p(z)\, \frac{\lambda_{\alpha,i} + \lambda_{\beta,i}\, z_i + \lambda_\gamma\, d^z_i}{\sigma_i(z)}, \\
\frac{\partial \mathcal{L}}{\partial \lambda_{\beta,i}} &= -\sum_z p(z)\, \frac{\lambda_{\alpha,i}\, z_i + \lambda_{\beta,i}\, z_i + \lambda_\gamma\, d^z_i z_i}{\sigma_i(z)} - \frac{1}{n}, \\
\frac{\partial \mathcal{L}}{\partial \lambda_\gamma} &= -\sum_z p(z) \sum_i \frac{\lambda_{\alpha,i}\, d^z_i + \lambda_{\beta,i}\, d^z_i z_i + \lambda_\gamma\, (d^z_i)^2}{\sigma_i(z)}.
\end{aligned}$$
Hence,
$$\lambda_\gamma = C^{-1} \sum_z p(z) \sum_i \frac{\lambda_{\alpha,i}\, d^z_i + \lambda_{\beta,i}\, d^z_i z_i}{\sigma_i(z)}, \quad\text{where}\quad C = -\sum_z p(z) \sum_i \frac{(d^z_i)^2}{\sigma_i(z)}.$$
Let $a_i$ and $b_i$ be as in Eq. (B.3), and let $g_i = \sum_z p(z)\, d^z_i / \sigma_i(z)$ and $h_i = \sum_z p(z)\, d^z_i z_i / \sigma_i(z)$. Plugging the solution for $\lambda_\gamma$ in terms of $\lambda_\alpha$ and $\lambda_\beta$ into the conditions $\partial\mathcal{L}/\partial\lambda_{\alpha,i} = 0$ and $\partial\mathcal{L}/\partial\lambda_{\beta,i} = 0$, we can write these conditions as
$$-\left(\begin{pmatrix} \mathrm{diag}(a) & \mathrm{diag}(b) \\ \mathrm{diag}(b) & \mathrm{diag}(b) \end{pmatrix} + C^{-1}\begin{pmatrix} g \\ h \end{pmatrix}\begin{pmatrix} g^T & h^T \end{pmatrix}\right)\begin{pmatrix} \lambda_\alpha \\ \lambda_\beta \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{1}{n}\mathbf{1} \end{pmatrix}.$$
Writing $A$ for the first matrix on the left,
$$A^{-1} = \begin{pmatrix} \mathrm{diag}(a-b)^{-1} & -\mathrm{diag}(a-b)^{-1} \\ -\mathrm{diag}(a-b)^{-1} & \mathrm{diag}\!\left(a \circ b^{-1} \circ (a-b)^{-1}\right) \end{pmatrix},$$
where $\circ$ denotes the Hadamard product of vectors, $(a \circ b)_i = a_i b_i$, and inverses of vectors are taken entrywise. Using the Woodbury matrix identity [Higham, 2002] with the above, the solution for $\lambda_\alpha$ and $\lambda_\beta$ is
$$\begin{pmatrix} \lambda_\alpha \\ \lambda_\beta \end{pmatrix} = -\left(A^{-1} - Q^{-1} A^{-1}\begin{pmatrix} g \\ h \end{pmatrix}\begin{pmatrix} g^T & h^T \end{pmatrix} A^{-1}\right)\begin{pmatrix} 0 \\ \frac{1}{n}\mathbf{1} \end{pmatrix},$$
where $Q$ is the scalar
$$Q = C + g^T \mathrm{diag}(a-b)^{-1}\, g - 2\, g^T \mathrm{diag}(a-b)^{-1}\, h + h^T \mathrm{diag}\!\left(a \circ b^{-1} \circ (a-b)^{-1}\right) h.$$
So
$$\begin{pmatrix} \lambda_\alpha \\ \lambda_\beta \end{pmatrix} = \frac{1}{n}\begin{pmatrix} (a-b)^{-1} \\ -\,a \circ b^{-1} \circ (a-b)^{-1} \end{pmatrix} + \frac{R}{n}\begin{pmatrix} (g-h) \circ (a-b)^{-1} \\ (h \circ a - g \circ b) \circ b^{-1} \circ (a-b)^{-1} \end{pmatrix},
\quad\text{where}\quad
R = Q^{-1}\left(h^T\left(a \circ b^{-1} \circ (a-b)^{-1}\right) - g^T (a-b)^{-1}\right).$$
Plugging this into the expression for $\lambda_\gamma$ and simplifying gives that
$$\lambda_\gamma = C^{-1}\left(g^T \lambda_\alpha + h^T \lambda_\beta\right) = -\frac{RQ}{nC} + \frac{R(Q-C)}{nC} = -\frac{R}{n}.$$
So we have
$$w_i(z) = \frac{1}{n\,\sigma_i(z)}\left(\frac{-b_i - R\, b_i\, (g_i - h_i)}{b_i\, (a_i - b_i)} + z_i\, \frac{a_i - R\, (h_i a_i - g_i b_i)}{b_i\, (a_i - b_i)} + R\, d^z_i\right).$$
This is the formula that is used for the MIV LUEs based on SANASIA in the simulations section.
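As a numerical sanity check on this formula, the following Python sketch computes the weights by brute force for a small design, a uniform draw of exactly two treated units on a 4-cycle, and verifies the three unbiasedness constraints that the multipliers $\lambda_\alpha$, $\lambda_\beta$, and $\lambda_\gamma$ enforce: $\sum_z p(z)\, w_i(z) = 0$, $\sum_z p(z)\, w_i(z) z_i = 1/n$, and $\sum_z p(z) \sum_i w_i(z)\, d^z_i = 0$. The function and variable names, the graph, and the prior variances are illustrative choices of ours, not objects from the paper.

```python
import itertools

def miv_weights(allocs, probs, deg, sa2, sb2, sg2):
    """SANASIA MIV LUE weights: for each allocation z,
    w_i(z) = (1/(n sigma_i(z))) * ( (-b_i - R b_i (g_i - h_i)) / (b_i (a_i - b_i))
             + z_i (a_i - R (h_i a_i - g_i b_i)) / (b_i (a_i - b_i)) + R d_i^z ).
    `deg(z)` returns the treated-degree vector d^z; sa2 and sb2 hold the per-unit
    prior variances of alpha_i and beta_i; sg2 is the prior variance of gamma."""
    n = len(allocs[0])
    degs = [deg(z) for z in allocs]

    def sigma(i, z, d):  # sigma_i(z) = sa2_i + sb2_i z_i + sg2 d_i^z sum_j d_j^z
        return sa2[i] + sb2[i] * z[i] + sg2 * d[i] * sum(d)

    trip = list(zip(allocs, degs, probs))
    a = [sum(p / sigma(i, z, d) for z, d, p in trip) for i in range(n)]
    b = [sum(p * z[i] / sigma(i, z, d) for z, d, p in trip) for i in range(n)]
    g = [sum(p * d[i] / sigma(i, z, d) for z, d, p in trip) for i in range(n)]
    h = [sum(p * d[i] * z[i] / sigma(i, z, d) for z, d, p in trip) for i in range(n)]
    C = -sum(p * d[i] ** 2 / sigma(i, z, d) for z, d, p in trip for i in range(n))
    Q = C + sum((g[i] ** 2 - 2 * g[i] * h[i]) / (a[i] - b[i])
                + h[i] ** 2 * a[i] / (b[i] * (a[i] - b[i])) for i in range(n))
    R = sum(h[i] * a[i] / (b[i] * (a[i] - b[i])) - g[i] / (a[i] - b[i])
            for i in range(n)) / Q
    return {z: [((-b[i] - R * b[i] * (g[i] - h[i])) / (b[i] * (a[i] - b[i]))
                 + z[i] * (a[i] - R * (h[i] * a[i] - g[i] * b[i])) / (b[i] * (a[i] - b[i]))
                 + R * d[i]) / (n * sigma(i, z, d))
                for i in range(n)]
            for z, d in zip(allocs, degs)}

# Design: uniform over allocations with exactly two treated units on a 4-cycle.
adj = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}
deg = lambda z: [sum(z[j] for j in adj[i]) for i in range(4)]
allocs = [z for z in itertools.product([0, 1], repeat=4) if sum(z) == 2]
probs = [1.0 / len(allocs)] * len(allocs)
w = miv_weights(allocs, probs, deg, sa2=[1.0] * 4, sb2=[0.5] * 4, sg2=0.25)
```

The constraints hold exactly (up to floating point) for any design for which the denominators $b_i$, $a_i - b_i$, $C$, and $Q$ are nonzero, since the formula is the stationary point of the full KKT system.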
Theorem C.1.
Suppose the potential outcomes satisfy SANASIA and that the interference effect is constant, so that $Y_i(z) = \alpha_i + \beta_i z_i + \gamma\, d^z_i$. Suppose the prior distribution on the parameters satisfies $\mathrm{Cov}((\alpha_i, \beta_i), (\alpha_j, \beta_j)) = 0$ for $i \neq j$, and that $\gamma$ is independent of the $(\alpha_i, \beta_i)$, so that the prior on the parameters has no correlation between units. If any unbiased estimators exist, then the weights for the MIV LUE are given by
$$w_i(z) = \frac{1}{n\,\sigma_i(z)}\left(\frac{-b_i - R\, b_i\, (g_i - h_i)}{b_i\, (a_i - b_i)} + z_i\, \frac{a_i - R\, (h_i a_i - g_i b_i)}{b_i\, (a_i - b_i)} + R\, d^z_i\right), \tag{C.1}$$
where $\sigma_i(z) = \sigma^2_{\alpha,i} + \sigma^2_{\beta,i}\, z_i + \sigma^2_\gamma\, d^z_i \sum_j d^z_j$ and we define the vectors $a, b, g, h \in \mathbb{R}^n$ as
$$a_i = \sum_z \frac{p(z)}{\sigma_i(z)}, \qquad b_i = \sum_z \frac{p(z)\, z_i}{\sigma_i(z)}, \tag{C.2}$$
$$g_i = \sum_z \frac{p(z)\, d^z_i}{\sigma_i(z)}, \qquad h_i = \sum_z \frac{p(z)\, d^z_i z_i}{\sigma_i(z)}. \tag{C.3}$$
The scalar $R$ is defined in terms of
$$C = -\sum_z p(z) \sum_i \frac{(d^z_i)^2}{\sigma_i(z)} \in \mathbb{R},$$
and
$$Q = C + g^T \mathrm{diag}(a-b)^{-1}\, g - 2\, g^T \mathrm{diag}(a-b)^{-1}\, h + h^T \mathrm{diag}\!\left(a \circ b^{-1} \circ (a-b)^{-1}\right) h \in \mathbb{R},$$
$$R = Q^{-1}\left(h^T\left(a \circ b^{-1} \circ (a-b)^{-1}\right) - g^T (a-b)^{-1}\right) \in \mathbb{R}.$$