Analysis of Randomized Experiments with Network Interference and Noncompliance

Abstract

Randomized experiments have become a standard tool in economics. In analyzing randomized experiments, the traditional approach has been based on the Stable Unit Treatment Value (SUTVA: Rubin (1990)) assumption, which dictates that there is no interference between individuals. However, the SUTVA assumption fails to hold in many applications due to social interaction, general equilibrium, and/or externality effects. While much progress has been made in relaxing the SUTVA assumption, most of this literature has only considered a setting with perfect compliance to treatment assignment. In practice, however, noncompliance occurs frequently: the actual treatment receipt often differs from the treatment assignment. In this paper, we study causal effects in randomized experiments with network interference and noncompliance. Spillovers are allowed to occur at both the treatment choice stage and the outcome realization stage. In particular, we explicitly model treatment choices of agents as a binary game of incomplete information in which the resulting equilibrium treatment choice probabilities affect outcomes of interest. Outcomes are further characterized by a random coefficient model to allow for general unobserved heterogeneity in the causal effects. After defining our causal parameters of interest, we propose a simple control function estimator and derive its asymptotic properties under large-network asymptotics. We apply our methods to the randomized subsidy program of Dupas (2014), where we find evidence of spillover effects on both short-run and long-run adoption of insecticide-treated bed nets. Finally, we illustrate the usefulness of our methods by analyzing the impact of counterfactual subsidy policies.


Bora Kim
December 29, 2020


Keywords: causal inference, interference, spillover, networks, games of incomplete information, control function

1 Introduction

Randomized experiments have become a standard tool for causal inference in economics. In analyzing randomized experiments, the traditional approach is based on the Stable Unit Treatment Value (SUTVA: Rubin (1990)) assumption, which dictates that there is no interference between individuals. However, there are many settings where the SUTVA assumption fails to hold. For instance, deworming treatment given to some students may affect the academic achievements of other students through externality effects (see, for instance, Miguel and Kremer (2004)). In the labor market, Crépon et al. (2013) show that a large-scale job placement program affects non-participants' employment probability through general equilibrium effects. Ferracci et al. (2014) also report similar results. In such cases, there is interference, or a spillover effect, where an individual's behavior either directly or indirectly affects others' outcomes through social interactions, externalities, or general equilibrium effects.

In recent years, there has been substantial progress in relaxing the SUTVA assumption in the causal inference framework. Examples include Manski (2013), Hudgens and Halloran (2008), Leung (2020a), Vazquez-Bare (2020), and Baird et al. (2018). Much of the literature, however, has been built on the restrictive assumption of perfect compliance, in which experimental units perfectly comply with their assignment of treatment. In practice, noncompliance occurs commonly: some units assigned to the treatment group may opt out of the treatment, while some units assigned to the control group may decide to take the treatment. In studies of the labor market, for example, Crépon et al. (2013) report that only 35% of those who were offered intensive job counseling actually took up the offer.
While instrumental variables (IV) methods are widely used to address the noncompliance problem, these methods were developed under an assumption that rules out interference between units (Imbens and Angrist (1994)).

The goal of this paper is to develop a formal framework to conduct causal inference in randomized experiments with both spillovers and noncompliance. In the presence of noncompliance, spillovers can occur at two stages: at the treatment decision stage and at the outcome realization stage. In the first stage, in which each agent chooses their treatment status, spillovers may occur if the utility from choosing treatment depends on the treatment choices of others. In the second stage, where outcomes (or responses) are realized, an agent's outcome can be affected not only by their own treatment choice but also by the treatment choices of others, either directly or indirectly. While most of the existing literature has only addressed spillover effects at the outcome level (i.e., at the second stage), we allow for spillover effects both at the treatment choice (first stage) and at the outcome (second stage).

To model spillovers, we take a game-theoretic approach. We consider a first-stage model in which agents play a binary game of incomplete information. Such binary games of incomplete information have been used in various economic applications, e.g., in the empirical industrial organization literature (Bajari et al. (2010)), to model binary choices under peer effects (Brock and Durlauf (2001), Brock and Durlauf (2007), and Xu (2018)), and recently, to model the network formation process (Leung (2015) and Ridder and Sheng (2020)). We apply the method to the problem of endogenous treatment choices in the presence of spillovers. Specifically, we assume that agents simultaneously choose their treatment status so as to maximize their expected utilities, given beliefs about the anticipated treatment choices of their neighbors. In equilibrium, agents' subjective beliefs coincide with objective choice probabilities.
Assuming that a unique equilibrium exists, the reduced-form model of an agent's treatment choice can be written as a single threshold-crossing model where the threshold is a function of the agent's own treatment assignment and the average equilibrium treatment choice probability of their neighbors. In the second stage, outcomes are modeled as a function of the agent's own treatment choice and the equilibrium average treatment choice probability of their neighbors, as determined in the first-stage game. As in the first-stage choice model, spillovers are captured by the equilibrium treatment choice probabilities.

In our model, therefore, equilibrium treatment choice probabilities work as a mediator of spillover effects. This is different from the existing literature, which often models the spillover at the outcome level by the proportion of treated neighbors. See, for instance, Hudgens and Halloran (2008), Leung (2020a), and Vazquez-Bare (2020). As we show later, when the outcome of interest represents a choice or behavior of individuals, their formulation implicitly assumes that the proportion of treated neighbors is fully observable to agents, i.e., agents possess complete information about the behaviors of their peers. However, the assumption of complete information is unrealistic, especially in a single large network setting such as ours where each individual has a considerable number of peers. (In our application, for instance, agents have 17 neighbors on average.) In such cases, it is more reasonable to assume that agents face uncertainty over others' behavior, making an incomplete information framework a more adequate approximation of reality.

We then characterize outcomes by a random coefficient model to allow for general unobserved heterogeneity. Our parameters of interest are average causal effects, which include an average direct effect of own treatment take-up and an average spillover effect from direct neighbors. After rigorously defining our parameters of interest, we show our identification result. We first note that under general unobserved heterogeneity in the outcome, the conventional instrumental variables (IV) methods do not identify the causal parameters. We therefore propose an alternative identification strategy based on a control function approach.

We then propose a simple two-step estimator where the first step estimates the payoff parameters of the treatment choice game using nested fixed-point maximum-likelihood estimation and the second step estimates the average potential outcome functions using control function regression. Our estimator extends the canonical Heckman (1979) sample selection estimator ("Heckit") to incorporate possible spillover effects. We show that the estimators are $\sqrt{n}$-consistent and asymptotically normal under "large-network" asymptotics, in which the number of individuals connected in a single network increases to infinity. We study the finite-sample properties of our estimators through Monte Carlo simulation.

Our methods are applied to the randomized subsidy program of Dupas (2014). While the use of insecticide-treated nets (ITNs) has been shown to be effective in controlling malaria, the rate of adoption remains low. Given that mosquito nets need to be re-purchased and replaced regularly, understanding the factors affecting households' short-run and long-run decisions to purchase a bednet is an important step toward achieving a sufficiently high equilibrium adoption rate.
In our application, we study the effect of the short-run purchase of a bednet on the long-run purchase decision while incorporating possible spillovers from neighbors defined by geographical proximity. The treatment is a binary indicator for purchasing a mosquito net in the short run (in Phase 1) and the outcome is a binary indicator for purchasing a mosquito net in the long run (in Phase 2).

We find evidence of positive spillover effects in the short-run bednet purchase decision. More specifically, in Phase 1, households were more likely to purchase the bednet when the average expected purchase rate of their neighbors was higher. In contrast, we find evidence of negative spillover effects in the long run, although the statistical power is limited. Specifically, households were less likely to purchase the bednet in Phase 2 when the average expected purchase rate in Phase 1 was higher. Our results also suggest that the average direct effect of the bednet purchase in Phase 1 on the purchase in Phase 2 declines monotonically with respect to the expected neighborhood purchase rate in Phase 1. When the Phase-1 neighborhood purchase rate was 0% (no spillover), households who purchased the bednet in Phase 1 were 36.9 percentage points more likely to purchase the bednet in Phase 2 compared to those who did not purchase the bednet in Phase 1. This effect shrinks to almost zero at the other extreme, where the neighborhood purchase rate was 100% (full spillover). Ignoring spillover effects leads to the misleading conclusion that the average direct effect of the short-run purchase on the long-run purchase is almost zero when, in fact, the effect varies from almost zero to 36.9 percentage points depending on the degree of spillovers.

Our structural modeling allows researchers to analyze the impact of counterfactual policies on the outcome of interest.
We illustrate this by analyzing the impact of a counterfactual subsidy program on long-run adoption, in which a policy-maker implements a means-tested subsidy rule where the subsidy is given only when the household's income level is below some pre-specified threshold. We predict the average long-run adoption rate under different subsidy regimes defined by different values of the eligibility threshold. We find that even under a very generous subsidy regime where almost everyone in the sample receives the subsidy, the average long-run adoption rate does not exceed 20%, due to the large negative spillover in the long run.

Related Literature

Recent works on causal inference under spillovers mainly concentrate on the case with random treatment, i.e., they do not address treatment choice endogeneity. Examples include Hudgens and Halloran (2008), Leung (2020a), and Vazquez-Bare (2020).

In the causal inference literature, game-theoretic models have been used in several papers. Lazzati (2015) proposes a structural model of treatment responses using games of complete information. However, the paper does not address the endogeneity of treatment choices. Balat and Han (2019) allow spillovers at both the choice and outcome stages using a game-theoretic approach. Their model is different from ours in that they model treatment choice by a binary game of complete (perfect) information. Also, Balat and Han (2019) consider interactions within groups, while we consider interactions in a general network. While the assumption of complete information may be appropriate for interactions in a relatively small group, the incomplete information assumption is more reasonable under network interactions, especially when the network is large. Jackson et al. (2020) model treatment choices as a binary game of incomplete information. However, they do not consider spillovers at the outcome level, while we are interested in separately identifying the individual treatment effect and the spillover effect.

Meanwhile, a literature in statistics has started to incorporate spillovers and noncompliance in network settings. See Imai et al. (2020) for the most recent progress. Unlike our game-theoretic model, their model is reduced-form in nature and, consequently, important aspects of the economic mechanism behind treatment choices, such as utility maximization, are largely ignored.

Outline

We describe our model in Section 2. We first outline our model of treatment choices and then the model of potential outcomes. Parameters of interest are also discussed. Section 3 discusses identification of the parameters of interest. We first show that the conventional IV methods are not valid in the presence of treatment effect heterogeneity. We then show how to use a control function approach to achieve point identification. In Section 4, we propose a simple two-stage estimation procedure. Asymptotic properties are derived and simulation results are also presented. Section 5 applies our methods to an empirical setting.

In this section, we first describe our treatment choice model as a binary game under incomplete information. We then describe our model of treatment responses under spillovers.

Let $\mathcal{N}_n = \{1, \cdots, n\}$ denote the set of agents. The $n$ agents are connected through a single, large network. Let $G$ be a symmetric $n \times n$ adjacency matrix whose $ij$th entry ($G_{ij}$) represents a connection, or link, between agents $i$ and $j$. Specifically, $G_{ij} = 1$ if agents $i$ and $j$ are connected and $G_{ij} = 0$ otherwise. We assume $G_{ii} = 0$ for all $i \in \mathcal{N}_n$ (no self-links). When $G_{ij} = 1$, we say that $i$ and $j$ are (direct) peers or neighbors. Let $\mathcal{N}_i$ be the set of $i$'s peers, i.e., $\mathcal{N}_i = \{ j \in \mathcal{N}_n : G_{ij} = 1 \}$. The number of $i$'s neighbors, or the degree of $i$, is denoted by $|\mathcal{N}_i|$.

We consider a game-theoretic model of treatment choice. Specifically, we characterize a realized treatment choice as a solution to a binary game under incomplete information played by agents in a given network. In this framework, agents simultaneously choose their treatment status in order to maximize their expected utility, given beliefs about the anticipated behaviors of their peers.
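The network notation above can be made concrete with a minimal Python sketch. The four-agent edge list is hypothetical and chosen only for illustration, agents are labeled from 0 rather than 1, and the helper names `neighbors` and `degree` are assumptions, not part of the paper.

```python
# Hypothetical 4-agent network; the edge list is made up for illustration.
n = 4
edges = [(0, 1), (0, 2), (2, 3)]  # undirected links, 0-based agent labels

# Symmetric adjacency matrix G with a zero diagonal (no self-links).
G = [[0] * n for _ in range(n)]
for i, j in edges:
    G[i][j] = 1
    G[j][i] = 1


def neighbors(G, i):
    """Return N_i = {j : G_ij = 1}, the set of i's direct peers."""
    return {j for j, g in enumerate(G[i]) if g == 1}


def degree(G, i):
    """Return |N_i|, the number of i's direct peers."""
    return len(neighbors(G, i))


# Sanity checks matching the assumptions stated in the text.
is_symmetric = all(G[i][j] == G[j][i] for i in range(n) for j in range(n))
no_self_links = all(G[i][i] == 0 for i in range(n))
```

Storing `G` as a dense list of lists mirrors the matrix notation; for a large sparse network an adjacency-list representation would likely be preferable.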

Utility

Each agent $i$ has a vector of observed characteristics $X_i \in \mathcal{X}$ and an unobserved utility shock $v_i \in \mathbb{R}$. Throughout the paper, we assume that $\mathcal{X}$ is a bounded subset of $\mathbb{R}^k$. In addition, each $i$ is randomly assigned to treatment. Let $Z_i \in \{0, 1\}$ represent $i$'s randomized treatment assignment, where $Z_i = 1$ if $i$ is assigned to treatment and $Z_i = 0$ if $i$ is assigned to control. Let $Z = (Z_i)_{i \in \mathcal{N}_n}$ and $X = (X_i)_{i \in \mathcal{N}_n}$. There is noncompliance if $Z \neq D$, i.e., for some $i$, the treatment assignment $Z_i$ is different from the actual treatment received $D_i$. There are two possible cases for this: $(Z_i, D_i) = (1, 0)$ and $(Z_i, D_i) = (0, 1)$. The former indicates that $i$, who was assigned to the treatment group, has refused to take the treatment. The latter indicates that $i$ has received the treatment even though $i$ was assigned to the control group. In this paper, we allow for both cases, i.e., we consider a setting with two-sided noncompliance.

Unlike $Z_i$, $D_i$ is self-selected. We assume that each $i$ chooses $D_i \in \{0, 1\}$ by utility maximization, where the utility that $i$ receives depends on the choices of $i$'s peers. Let the utility function of agent $i$ be $\pi(D_i, D_{-i}, X_i, Z_i, v_i)$, where $D_{-i} \in \{0, 1\}^{n-1}$ is the vector of treatment choices of all agents except $i$. We specify the utility function as the following linear model:

\begin{equation}
\pi(D_i, D_{-i}, X_i, Z_i, v_i) =
\begin{cases}
X_i'\theta_1 + \theta_2 Z_i + \dfrac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} D_j - v_i & \text{if } D_i = 1, \\
0 & \text{if } D_i = 0.
\end{cases}
\tag{1}
\end{equation}

First note that the utility from choosing $D_i = 0$ is normalized to zero. This is without loss of generality, as only the difference in utilities is identified. The utility of choosing $D_i = 1$ depends on other agents' treatment choices through the term $\sum_{j \in \mathcal{N}_i} D_j / |\mathcal{N}_i|$, the fraction of peers taking up the treatment. This term represents social interactions or spillover effects in treatment choice. When $\theta_3 = 0$, there are no spillovers and the model becomes a usual single-agent binary choice model as in McFadden (1984). When $\theta_3 > 0$, we have positive spillovers, where the utility of choosing $D_i = 1$ is higher when members of $i$'s reference group (direct neighbors in our specification) behave similarly. When $\theta_3 < 0$, we conclude that there are negative spillovers in treatment choice.

We assume that $v_i$ is private information, i.e., $v_i$ is known only to $i$, and other agents cannot observe $v_i$. Therefore agents have incomplete information about others' choices. In other words, $i$ cannot observe other players' treatment choices at the time their own choice is made. Instead, each agent $i$ chooses an action that maximizes their expected utility given their beliefs about $\sum_{j \in \mathcal{N}_i} D_j / |\mathcal{N}_i|$. Beliefs are formed under the information set available to $i$. Let $\tau_i$ denote $i$'s information set. We specify $\tau_i$ as follows:

Assumption 1 (informational structure). Let $G = (G_{ij})_{i,j \in \mathcal{N}_n}$, $X = (X_i)_{i \in \mathcal{N}_n}$ and $Z = (Z_i)_{i \in \mathcal{N}_n}$. We assume that $(G, X, Z)$ is public information, i.e., every agent knows the entire network structure ($G$), the vector of observed characteristics ($X$) and the vector of treatment assignments ($Z$). On the other hand, $v_i$ is private information of $i$, whose value is known only to $i$. Therefore $\tau_i = (G, X, Z, v_i)$ summarizes the information available to $i$.

Assumption 1 is standard in the literature on games of incomplete information. Let $S = (G, X, Z)$ be the set of public information. This is often called a public state variable as well. For the private information $v_i$, we make the following assumption:

Assumption 2 (unobserved heterogeneity). For all $i \in \mathcal{N}_n$, the private information $v_i$ is
(i) i.i.d. with a standard normal cdf $\Phi$ and
(ii) independent of $S$.

As in standard single-agent binary choice models, the distribution of $v_i$ must be known up to a finite-dimensional parameter. We use the normal distribution only for convenience. Other distributional assumptions, such as the logistic distribution, can be used as well. The assumption that the $v_i$'s are independent of each other is critical for our identification analysis. This assumption implies that knowledge of $v_i$ does not help in predicting $v_j$ for any $j \neq i$.
To our knowledge, identification of incomplete information games with correlated private information in a general network setting is an open question. Assumption 2 (ii) is trivially satisfied if we treat $S$ as fixed. Consequently, we do not address the issue of network endogeneity, as it is not a focus of this paper.

Strategy

Let $D_i(\tau_i, \theta)$ denote $i$'s pure strategy, which maps $i$'s information set $\tau_i = (S, v_i)$ to a treatment choice $D_i \in \{0, 1\}$ given a parameter value $\theta = (\theta_1, \theta_2, \theta_3)$. Agent $i$ chooses their optimal action by maximizing their expected utility $E[\pi(D_i, D_{-i}, X_i, Z_i, v_i) \mid \tau_i]$, where the expectation is taken with respect to $D_{-i}$ given their belief about $D_{-i}$. Let $\sigma_{j,i}$ be $i$'s belief over the event $\{D_j = 1\}$ given the information $\tau_i$. Then

\begin{align}
\sigma_{j,i} &\overset{\text{def}}{=} \Pr(D_j = 1 \mid \tau_i) \tag{2} \\
&= \Pr(D_j(\tau_j, \theta) = 1 \mid \tau_i) \tag{3} \\
&= \Pr(D_j(S, v_j, \theta) = 1 \mid S, v_i) \tag{4} \\
&= \Pr(D_j(S, v_j, \theta) = 1) \tag{5} \\
&= \sigma_j(S, \theta) \tag{6}
\end{align}

where the fourth equality follows from Assumption 2. From the last equality, we see that $\sigma_{j,i} = \sigma_j$ for all $i \neq j$, i.e., every agent shares a common belief about $j$'s choice. This common belief should be consistent with the actual probability of $j$ choosing $D_j = 1$ under rational expectations, as we show below.

Equilibrium

Given the belief profile $\{\sigma_j(S, \theta)\}_{j \neq i}$, agent $i$ calculates the expected utility from choosing $D_i = 1$ as follows:

\begin{align}
E\big[\pi(1, D_{-i}, X_i, Z_i, v_i) \mid \tau_i\big]
&= E\Big[ X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} D_j - v_i \,\Big|\, S, v_i \Big] \tag{7} \\
&= X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \underbrace{\Pr(D_j = 1 \mid S)}_{= \sigma_j(S, \theta)} - v_i \tag{8} \\
&= X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j(S, \theta) - v_i. \tag{9}
\end{align}

Agent $i$ chooses $D_i = 1$ if $E\big[\pi(1, D_{-i}, X_i, Z_i, v_i) \mid \tau_i\big] \geq 0$. Therefore,

\[
D_i = \mathbb{1}\Big\{ v_i \leq X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j(S, \theta) \Big\}.
\]

A Bayes-Nash equilibrium (BNE) is defined by a vector of choice probabilities $\sigma^*(S, \theta) = \big(\sigma_i^*(S, \theta)\big)_{i \in \mathcal{N}_n}$ that is consistent with the observed decision rule in the sense that it satisfies the following system of equations:

\begin{align}
\sigma_i^*(S, \theta) &= \Pr\Big( v_i \leq X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) \Big), \quad \forall i \in \mathcal{N}_n \tag{10} \\
&= \Phi\Big( X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) \Big), \quad \forall i \in \mathcal{N}_n. \tag{11}
\end{align}

Here we use the superscript $*$ to emphasize that $\sigma^*(S, \theta)$ is an equilibrium quantity. In other words, a Bayes-Nash equilibrium given $(S, \theta)$ is a vector $\sigma^*(S, \theta)$ defined as a fixed point of the system of equations above. By the implicit function theorem, it can be shown that $\sigma^*(S, \theta)$ is smooth in both $S$ and $\theta$. The existence of a fixed point is guaranteed by Brouwer's fixed point theorem for any realized data $S$ and parameter value $\theta$, because the right-hand side of (11) is a continuous map from $[0, 1]^n$ into itself. However, there can be many fixed points $\sigma^*(S, \theta)$ solving the system. We show that a unique equilibrium exists if the spillover parameter $\theta_3$ is sufficiently small in magnitude. Formally,

Theorem 1 (unique equilibrium). Let the pdf of $v_i$ be $\phi(v)$. Define $\lambda = |\theta_3| \sup_u \phi(u)$. For any $S$ and $\theta$, there exists a unique equilibrium $\{\sigma_j^*(S, \theta)\}_{j \in \mathcal{N}_n}$ if $\lambda < 1$.

See Appendix A for the proof. When $v_i$ is standard normally distributed, we have $\sup_u \phi(u) = 1/\sqrt{2\pi}$. Therefore $\lambda < 1$ if and only if $|\theta_3| < \sqrt{2\pi} \approx 2.5$. Throughout the paper we assume that $\lambda < 1$:

Assumption 3 (unique equilibrium). $|\theta_3| < \sqrt{2\pi}$.

Under the unique equilibrium, an agent's treatment choice can be written as the following reduced-form equation:

\begin{align}
D_i &= \mathbb{1}\Big\{ v_i \leq X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) \Big\} \tag{12} \\
\iff D_i &= \mathbb{1}\Big\{ \Phi(v_i) \leq \Phi\Big( X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) \Big) \Big\} \tag{13} \\
\iff D_i &= \mathbb{1}\big\{ \Phi(v_i) \leq \sigma_i^*(S, \theta) \big\} \tag{14}
\end{align}

where the last step follows from (11).

The timing is as follows. For given $S$ and $\theta$, the equilibrium choice probabilities $\sigma_i^*(S, \theta)$, $\forall i \in \mathcal{N}_n$, are realized. Observing this equilibrium, each agent chooses their treatment status according to any of the equivalent rules (12), (13), or (14).

2.2 Potential Outcomes Model with Spillovers

In this section, we propose our model of treatment response in settings with spillovers. Previous research on treatment response has been based on the SUTVA assumption, which requires that an individual's outcome depends only on their own treatment status. Under the SUTVA assumption, $i$'s outcome or response $Y_i$ can be written as $Y_i = Y_i(D_i)$. Let $d \in \{0, 1\}$ be a possible treatment value that agents can receive. The potential outcome under the SUTVA assumption is denoted by $Y_i(d)$, which delivers the response of $i$ when assigned to $D_i = d$. Unlike the SUTVA case, however, there is no obvious way to model spillovers in the treatment response. As Manski (2013) and Kline and Tamer (2020) show, there are many ways to relax the SUTVA assumption, each of which is based on different restrictions on the nature of interference between agents.

In our paper, we assume that $i$'s outcome is a function of a direct effect from own treatment status and an indirect effect, or spillover effect, from $i$'s neighbors. Spillover effects are assumed to be mediated by $\sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) / |\mathcal{N}_i|$. For notational simplicity, let us define $\pi_i^*(S, \theta) = \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) / |\mathcal{N}_i|$.
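To make the equilibrium system (10) and (11), the decision rule (14), and the average of neighbors' equilibrium probabilities concrete, the following is a minimal Python sketch that solves the fixed point by simple iteration. It is an illustration under stated assumptions, not the paper's nested fixed-point estimator: the function names, the network, the values in `index` (standing in for $X_i'\theta_1 + \theta_2 Z_i$) and `theta3` are all hypothetical, and the convention of assigning a score of 0 to an agent without neighbors is an assumption made here only to avoid division by zero. Under Assumption 3 the update map is a sup-norm contraction with modulus $|\theta_3|/\sqrt{2\pi}$, so the iteration converges; this is a standard argument and not necessarily the one used in the paper's appendix.

```python
import math
import random


def Phi(x):
    """Standard normal cdf, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def neighborhood_score(G, sigma):
    """Average of sigma_j over i's neighbors (0.0 if i has none: an assumption)."""
    scores = []
    for row in G:
        deg = sum(row)
        total = sum(g * s for g, s in zip(row, sigma))
        scores.append(total / deg if deg > 0 else 0.0)
    return scores


def solve_equilibrium(G, index, theta3, tol=1e-12, max_iter=10_000):
    """Iterate sigma <- Phi(index + theta3 * score(sigma)) until convergence.

    index[i] stands for X_i' theta_1 + theta_2 * Z_i (hypothetical values).
    """
    if abs(theta3) >= math.sqrt(2.0 * math.pi):
        raise ValueError("Assumption 3 violated: |theta3| >= sqrt(2*pi)")
    sigma = [0.5] * len(G)
    for _ in range(max_iter):
        score = neighborhood_score(G, sigma)
        new = [Phi(a + theta3 * p) for a, p in zip(index, score)]
        if max(abs(a - b) for a, b in zip(new, sigma)) < tol:
            return new
        sigma = new
    raise RuntimeError("fixed-point iteration did not converge")


def draw_treatments(sigma, rng):
    """Rule (14): D_i = 1{U_i <= sigma_i}, where U_i = Phi(v_i) is Uniform(0, 1)."""
    return [1 if rng.random() <= s else 0 for s in sigma]


# Hypothetical inputs, for illustration only.
G = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]
index = [-0.3, 0.4, 0.1, -0.8]
theta3 = 1.0

sigma_star = solve_equilibrium(G, index, theta3)
pi_star = neighborhood_score(G, sigma_star)
D = draw_treatments(sigma_star, random.Random(0))
```

Starting the iteration from 0.5 is arbitrary; under the uniqueness condition any starting point in $[0, 1]^n$ leads to the same fixed point.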
Also, $\pi_i^*$ and $\pi_i^*(S, \theta)$ will be used interchangeably. Thus, we write the realized outcome of $i$ as follows:

\[
Y_i = Y_i(D_i, \pi_i^*)
\]

where $\pi_i^* = \pi_i^*(S, \theta)$ is the average of the equilibrium treatment choice probabilities of $i$'s neighbors. From now on, we simply refer to $\pi_i^*$ as $i$'s "neighborhood (propensity) score". This is the average value of the propensity scores of $i$'s direct neighbors, where each score measures the probability of taking up the treatment given the public information $S$. Jackson et al. (2020) term the same object the "peer-influenced propensity score".

Let $\pi \in [0, 1]$ be a possible value that $\pi_i^*$ can take. The potential outcome $Y_i(d, \pi)$ represents $i$'s response when we exogenously assign $D_i = d$ and $\pi_i^* = \pi$. Concretely, $Y_i(1, \pi)$ represents $i$'s outcome when $i$ is required to be treated and $i$'s neighborhood score has been exogenously set to $\pi$. Similarly, $Y_i(0, \pi)$ is $i$'s outcome when $i$ is forbidden to be treated and $i$'s neighborhood score has been exogenously set to $\pi$. The underlying assumption is that it is possible to manipulate the values of $D_i$ and $\pi_i^*$. Since $\pi_i^*$ is a function of the public state variable $S = (G, X, Z)$, we can conceivably manipulate the value of $\pi_i^*$ by changing $Z$ for a given $(G, X)$, which is assumed to be predetermined and non-manipulable. Thus $Y_i(d, \pi)$ can be realized by changing the $Z$ profile in the population in a way that induces $\pi_i^* = \pi$ as an equilibrium in the first stage and then requiring $i$ to choose $D_i = d$.

Comparison to other approaches

The existing literature on interference often models potential outcomes as a function of own treatment status and the proportion or number of treated neighbors (e.g., Hudgens and Halloran (2008), Leung (2020a), Vazquez-Bare (2020)). Define $\bar{D}_i \equiv \sum_{j \in \mathcal{N}_i} D_j / |\mathcal{N}_i|$ with a generic value $\bar{d} \in [0, 1]$. These papers write the observed outcome as $Y_i = Y_i(D_i, \bar{D}_i)$ and the potential outcomes as $Y_i(d, \bar{d})$. Our model differs from theirs in that we model spillovers via the ex ante (anticipated) expectation of $\bar{D}_i$ rather than the ex post realization of $\bar{D}_i$ itself. Recall that $\pi_i^*(S, \theta) = E[\bar{D}_i \mid S]$. Since the difference between $\bar{D}_i$ and $\pi_i^*(S, \theta)$ has mean zero (i.e., $E[\bar{D}_i - \pi_i^*(S, \theta) \mid S] = 0$), in practice the values of these two quantities may not be too different, especially when $|\mathcal{N}_i|$ is large.

Nevertheless, they are based on two different behavioral assumptions. Suppose that the outcome of interest represents a decision or behavior of agents. Then the formulation $Y_i = Y_i(D_i, \bar{D}_i)$ is derived under the assumption that agents base their decisions on $\bar{D}_i$ rather than on the expected $\bar{D}_i$. This is realistic only when $\bar{D}_i$ is fully observed at the time the decision on $Y_i$ is made. Thus, the model could be interpreted as a model with complete or perfect information. On the other hand, our specification $Y_i = Y_i(D_i, \pi_i^*)$ assumes that agents do not fully observe $\bar{D}_i$ when they decide their $Y_i$. Thus agents face intrinsic uncertainty over others' treatment choices even at the second stage. This is plausible when the reference group is relatively large, so that it is not easy for agents to fully observe the value of $\bar{D}_i$. Also, there are settings where agents are reluctant to reveal their treatment status, for instance when treatment represents learning about their HIV status as in Godlonton and Thornton (2012). In such cases, it may be more realistic to assume that agents have private information even in the second stage.
Unlike $\bar{D}_i$, the equilibrium neighborhood score $\pi_i^*$ is always observable to agents, as it is a function of the public information $S$. Thus it is plausible that agents base their decisions on the equilibrium quantity $\pi_i^*$, which signals the a priori prevalence of treatment adoption in the neighborhood.

Note that some combinations $(d, \pi)$ may represent off-equilibrium quantities. Thus, the resulting $Y_i(d, \pi)$ may not be a policy-relevant counterfactual. Nevertheless, to define causal effects rigorously, we need to consider every possible combination $(d, \pi) \in \{0, 1\} \times [0, 1]$.

Random Coefficients Model of Potential Responses

We put more structure on $Y_i(d, \pi)$ by using a random coefficients model in which we allow for correlation between individual treatment status and the random coefficients. Therefore our model can be seen as a correlated random coefficient model, as in Masten and Torgovitsky (2016) and Wooldridge (2003).

Assumption 4 (random coefficient model).
(i) For any $i \in \mathcal{N}_n$, $d \in \{0, 1\}$ and $\pi \in [0, 1]$, we have

\[
Y_i(1, \pi) = \alpha_{1i} + \beta_{1i} \pi, \qquad Y_i(0, \pi) = \alpha_{0i} + \beta_{0i} \pi
\]

where $(\alpha_{1i}, \beta_{1i})$ and $(\alpha_{0i}, \beta_{0i})$ are unit-specific coefficients.
(ii) For $S = (G, X, Z)$, the unit-specific coefficients satisfy the following restrictions:

\[
E[\alpha_{1i} \mid S] = E[\alpha_{1i} \mid X_i] = X_i'\alpha_1, \qquad E[\beta_{1i} \mid S] = E[\beta_{1i} \mid X_i] = X_i'\beta_1
\]

and similarly,

\[
E[\alpha_{0i} \mid S] = E[\alpha_{0i} \mid X_i] = X_i'\alpha_0, \qquad E[\beta_{0i} \mid S] = E[\beta_{0i} \mid X_i] = X_i'\beta_0.
\]

Recall that $Y_i(1, \pi)$ represents $i$'s response when $i$ is given the treatment and $i$'s neighborhood score has been exogenously set to $\pi$. Under Assumption 4 (i), this response is assumed to be linear in $\pi$ with intercept $\alpha_{1i}$ and slope $\beta_{1i}$, which are allowed to differ across agents. Similarly, $Y_i(0, \pi)$ is assumed to be linear in $\pi$ with intercept $\alpha_{0i}$ and slope $\beta_{0i}$.
Note that the unit-specific coefficients under treatment, $(\alpha_{1i}, \beta_{1i})$, are allowed to differ from those without treatment, $(\alpha_{0i}, \beta_{0i})$, for generality.

The assumption that $\pi$ affects the potential outcomes $Y_i(1, \pi)$ and $Y_i(0, \pi)$ linearly is only for convenience. It is straightforward to extend our model to include higher-order terms such as $\pi^2$, e.g., $Y_i(d, \pi) = \alpha_{d,i} + \beta_{d,i} \pi + \gamma_{d,i} \pi^2$ for $d \in \{0, 1\}$.

Unit-specific coefficients are unobservable random variables that are potentially dependent on the unit's observed covariates. By Assumption 4 (ii), we assume that the observed parts of the coefficients depend on the public state variable $S = (G, X, Z)$ only through $X_i$. Importantly, this assumption implies that $Z$ is irrelevant for the random coefficients. This rules out the case in which the treatment assignment vector $Z = (Z_i, Z_{-i})$ directly affects $Y_i$. This is the standard exclusion restriction for instruments. Therefore, under this assumption, $Z$ has the status of an instrumental variable.

The assumption that $G$ is redundant is only for convenience, as we can always include network statistics such as the number of direct peers in $X_i$. Finally, the linearity of the conditional expectation in $X_i$ is also for convenience, as we can always allow $X_i$ to include nonlinear functions of underlying covariates.

Under Assumption 4 (ii), we can decompose each unit-specific coefficient into its mean given $X_i$ and its deviation from the mean as follows:

\[
\alpha_{1i} = X_i'\alpha_1 + u_{1i}, \quad E[u_{1i} \mid S] = 0, \qquad
\beta_{1i} = X_i'\beta_1 + e_{1i}, \quad E[e_{1i} \mid S] = 0.
\]

Analogously for $D_i = 0$:

\[
\alpha_{0i} = X_i'\alpha_0 + u_{0i}, \quad E[u_{0i} \mid S] = 0, \qquad
\beta_{0i} = X_i'\beta_0 + e_{0i}, \quad E[e_{0i} \mid S] = 0.
\]
Therefore the potential outcomes can be written as
\[ Y_i(1, \pi) = X_i'\alpha_1 + u_{1i} + \pi\,(X_i'\beta_1 + e_{1i}), \qquad E[u_{1i} \mid S] = E[e_{1i} \mid S] = 0, \]
\[ Y_i(0, \pi) = X_i'\alpha_0 + u_{0i} + \pi\,(X_i'\beta_0 + e_{0i}), \qquad E[u_{0i} \mid S] = E[e_{0i} \mid S] = 0, \]
while the observed outcome is given by
\[ Y_i = Y_i(D_i, \pi_i^*) = \begin{cases} X_i'\alpha_1 + u_{1i} + \pi_i^*\,(X_i'\beta_1 + e_{1i}) & \text{if } D_i = 1, \\ X_i'\alpha_0 + u_{0i} + \pi_i^*\,(X_i'\beta_0 + e_{0i}) & \text{if } D_i = 0. \end{cases} \]

Our model contains a four-dimensional error term, $\eta_i = (u_{1i}, e_{1i}, u_{0i}, e_{0i})$. By construction, $\eta_i$ is uncorrelated with $S$, i.e., $E[\eta_i \mid S] = 0$. Through $\eta_i$, the random coefficients are allowed to be heterogeneous even after controlling for the relevant observed characteristics $X_i$. The importance of allowing for such unobserved heterogeneity has been emphasized in the modern program evaluation literature (see, e.g., Heckman (2001), Heckman et al. (2006) and Imbens (2007)).

Parameters of Interest

In this section, we formally define our parameters of interest, the class of average causal effects. For this purpose, let us first study the average potential outcome functions.

Average potential outcomes

Under our specifications, average potential outcomes for agents with $X_i = x$ are computed as follows: for $\pi \in [0, 1]$,
\[ E[Y_i(1, \pi) \mid X_i = x] = x'\alpha_1 + (x'\beta_1)\pi, \qquad E[Y_i(0, \pi) \mid X_i = x] = x'\alpha_0 + (x'\beta_0)\pi. \]
Integrating them over the identically distributed $X_i$ gives the unconditional average potential outcomes. Letting $\mu_X = E[X_i]$,
\[ E[Y_i(1, \pi)] = \mu_X'\alpha_1 + (\mu_X'\beta_1)\pi \quad (15) \]
\[ \phantom{E[Y_i(1, \pi)]} = \alpha_{1m} + \beta_{1m}\pi, \quad (16) \]
\[ E[Y_i(0, \pi)] = \mu_X'\alpha_0 + (\mu_X'\beta_0)\pi \quad (17) \]
\[ \phantom{E[Y_i(0, \pi)]} = \alpha_{0m} + \beta_{0m}\pi, \quad (18) \]
where $(\alpha_{1m}, \beta_{1m}, \alpha_{0m}, \beta_{0m}) = (\mu_X'\alpha_1, \mu_X'\beta_1, \mu_X'\alpha_0, \mu_X'\beta_0)$. Since $\mu_X$ is identifiable from the data, identification of $(\alpha_{1m}, \beta_{1m}, \alpha_{0m}, \beta_{0m})$ requires one to identify $(\alpha_1, \beta_1, \alpha_0, \beta_0)$.

$(\alpha_{1m}, \alpha_{0m})$ represent the baseline mean potential outcomes when we set $\pi = 0$, i.e., $(\alpha_{1m}, \alpha_{0m}) = (E[Y_i(1, 0)], E[Y_i(0, 0)])$. The effect of $\pi$ is captured by $(\beta_{1m}, \beta_{0m})$.

On the other hand, $(\alpha_1, \beta_1, \alpha_0, \beta_0)$ measure the heterogeneous effect of $X_i$ on the mean potential outcomes. To see this, notice that the following equations hold:
\[ E[Y_i(1, \pi) \mid X_i = x] = x'\alpha_1 + \pi x'\beta_1 = E[Y_i(1, \pi)] + (x - \mu_X)'\alpha_1 + \pi (x - \mu_X)'\beta_1, \]
\[ E[Y_i(0, \pi) \mid X_i = x] = x'\alpha_0 + \pi x'\beta_0 = E[Y_i(0, \pi)] + (x - \mu_X)'\alpha_0 + \pi (x - \mu_X)'\beta_0. \]
Therefore for $d \in \{0, 1\}$, $(\alpha_d, \beta_d)$, excluding the constant-term coefficients, explains the difference between $E[Y_i(d, \pi) \mid X_i = x]$ and $E[Y_i(d, \pi)]$.

Average causal effects

Given the average response functions, we now define the average causal effects, which are our parameters of interest. Let us define the average direct effect (ADE) of own treatment under $\pi$ as follows:
\[ ADE(\pi) = E[Y_i(1, \pi) - Y_i(0, \pi)]. \]
$ADE(\pi)$ measures the average change in outcomes under the regime in which $i$ is required to choose $D_i = 1$, compared to the regime in which $i$ is forbidden to choose $D_i = 1$, while $i$'s neighborhood score is fixed at $\pi$.
Under our random coefficients specification, $ADE(\pi)$ can be written as
\[ ADE(\pi) = \alpha_{1m} - \alpha_{0m} + (\beta_{1m} - \beta_{0m})\pi. \]
Similarly, we define the average spillover effect (ASE) from changing the neighborhood score from $\pi$ to $\tilde{\pi}$ for each $d \in \{0, 1\}$ as follows:
\[ ASE(\pi, \tilde{\pi}, d) = E[Y_i(d, \tilde{\pi}) - Y_i(d, \pi)] = (\tilde{\pi} - \pi)\beta_{dm}, \]
which measures the effect of changing the neighborhood score from $\pi$ to $\tilde{\pi}$ while fixing the agent's treatment status at $D_i = d$. Whether $\beta_{1m} = 0$ or $\beta_{0m} = 0$ is of interest, as it indicates whether there are treatment spillovers at the outcome level.

In sum, our model of treatment choices and outcomes can be written as the following semi-triangular system:
\[ Y_i = Y_i(D_i, \pi_i^*) = \begin{cases} X_i'\alpha_1 + u_{1i} + (X_i'\beta_1 + e_{1i})\pi_i^* & \text{if } D_i = 1, \\ X_i'\alpha_0 + u_{0i} + (X_i'\beta_0 + e_{0i})\pi_i^* & \text{if } D_i = 0, \end{cases} \quad (19) \]
\[ D_i = \mathbf{1}\{ v_i \le X_i'\theta_1 + \theta_2 Z_i + \theta_3 \pi_i^* \} \quad (20) \]
\[ \text{s.t.} \quad \sigma_i^* = \Phi\big( X_i'\theta_1 + \theta_2 Z_i + \theta_3 \pi_i^* \big), \quad \forall i \in \mathcal{N}_n. \quad (21) \]
Using the formula $Y_i = D_i Y_i(1, \pi_i^*) + (1 - D_i) Y_i(0, \pi_i^*) = Y_i(0, \pi_i^*) + D_i \big( Y_i(1, \pi_i^*) - Y_i(0, \pi_i^*) \big)$, equation 19 can be written as follows:
\[ Y_i = X_i'\alpha_0 + \pi_i^* X_i'\beta_0 + D_i X_i'(\alpha_1 - \alpha_0) + D_i \pi_i^* X_i'(\beta_1 - \beta_0) + \epsilon_i \quad (22) \]
where
\[ \epsilon_i = u_{0i} + \pi_i^* e_{0i} + D_i \big( u_{1i} - u_{0i} + \pi_i^* (e_{1i} - e_{0i}) \big). \quad (23) \]
Equation 22 gives the conventional linear regression model.
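The step from the switching form 19 to the regression form 22 with composite error 23 can be checked numerically. The following is a minimal sketch, not the paper's code: the function names, the two-dimensional covariate vector, and the random draws are all illustrative assumptions. It also includes small helpers for the ADE and ASE formulas.

```python
import random


def dot(v, w):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(v, w))


def switching_form(d, pi, x, coef, err):
    """Equation (19): the observed outcome is the realized potential outcome."""
    a1, b1, a0, b0 = coef          # coefficient vectors (alpha_1, beta_1, alpha_0, beta_0)
    u1, e1, u0, e0 = err           # idiosyncratic errors (u_1i, e_1i, u_0i, e_0i)
    y1 = dot(x, a1) + u1 + (dot(x, b1) + e1) * pi
    y0 = dot(x, a0) + u0 + (dot(x, b0) + e0) * pi
    return y1 if d == 1 else y0


def regression_form(d, pi, x, coef, err):
    """Equations (22)-(23): linear regression form with the composite error."""
    a1, b1, a0, b0 = coef
    u1, e1, u0, e0 = err
    d_alpha = [p - q for p, q in zip(a1, a0)]
    d_beta = [p - q for p, q in zip(b1, b0)]
    eps = u0 + pi * e0 + d * (u1 - u0 + pi * (e1 - e0))
    return (dot(x, a0) + pi * dot(x, b0)
            + d * dot(x, d_alpha) + d * pi * dot(x, d_beta) + eps)


def ade(pi, a1m, b1m, a0m, b0m):
    """ADE(pi) = alpha_1m - alpha_0m + (beta_1m - beta_0m) * pi."""
    return (a1m - a0m) + (b1m - b0m) * pi


def ase(pi_from, pi_to, beta_dm):
    """ASE(pi, pi~, d) = (pi~ - pi) * beta_dm."""
    return (pi_to - pi_from) * beta_dm


def max_gap(draws=1000, seed=0):
    """Largest absolute difference between the two forms over random draws."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(draws):
        x = [1.0, rng.gauss(0, 1)]  # hypothetical covariates: constant plus one regressor
        coef = tuple([rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(4))
        err = tuple(rng.gauss(0, 1) for _ in range(4))
        d, pi = rng.randint(0, 1), rng.random()
        gap = abs(switching_form(d, pi, x, coef, err)
                  - regression_form(d, pi, x, coef, err))
        worst = max(worst, gap)
    return worst
```

Calling `max_gap()` should return a value at the level of floating-point rounding error, since the two forms are algebraically identical.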
Naturally, one may consider estimating $(\alpha_1, \alpha_0, \beta_1, \beta_0)$ by the least squares regression of $Y_i$ on $(X_i, \pi_i^* X_i, D_i X_i, D_i \pi_i^* X_i)$. The resulting OLS estimator is consistent only when $\epsilon_i$ is uncorrelated with the regressors, i.e., $E[\epsilon_i \mid D_i, X_i, \pi_i^*] = 0$, which requires that the following two conditions hold:
\[ E[u_{0i} + \pi_i^* e_{0i} \mid D_i = 0, X_i, \pi_i^*] \overset{(a)}{=} E[u_{0i} + \pi_i^* e_{0i} \mid X_i, \pi_i^*] \overset{(b)}{=} 0, \]
\[ E[u_{1i} + \pi_i^* e_{1i} \mid D_i = 1, X_i, \pi_i^*] \overset{(a)'}{=} E[u_{1i} + \pi_i^* e_{1i} \mid X_i, \pi_i^*] \overset{(b)'}{=} 0. \]
Since $\eta_i = (u_{1i}, u_{0i}, e_{1i}, e_{0i})$ is uncorrelated with $S = (G, X, Z)$ by construction, $(b)$ and $(b)'$ are automatically satisfied. Therefore, we only need to show that $(a)$ and $(a)'$ are satisfied. This is true only when $D_i$ is uncorrelated with $\eta_i$ conditional on $(X_i, \pi_i^*)$. This is the familiar selection-on-observables assumption. Such an assumption is unlikely to hold if the treatment group and the control group differ systematically in their unobserved factors $\eta_i$ even after controlling for all relevant observables. Indeed, the very fact that agents with the same observed characteristics $(X_i, \pi_i^*)$ have made different treatment choices suggests that they differ in their unobserved factors. Thus, the source of endogeneity is the correlation between $v_i$ and $\eta_i$ that remains even after conditioning on $S$.

More specifically, note that the selection-on-observables assumption requires that the following two conditions hold:
\[ \mathrm{Corr}\big( Y_i(0, \pi_i^*), D_i \mid X_i, \pi_i^* \big) = 0 \quad (24) \]
and
\[ \mathrm{Corr}\big( Y_i(1, \pi_i^*) - Y_i(0, \pi_i^*), D_i \mid X_i, \pi_i^* \big) = 0. \quad (25) \]
Condition 24 requires that the idiosyncratic part of $Y_i(0, \pi_i^*)$ is uncorrelated with $D_i$, i.e., in the absence of the treatment, there should be no difference in the mean potential outcomes across the treatment group and the control group once we account for the relevant observables $(X_i, \pi_i^*)$.
However, agents who take up the treatment may have unusual values of $Y_i(0, \pi)$ even after controlling for $(X_i, \pi_i^*)$. If individuals who take up the treatment tend to have higher values of $Y_i(0, \pi)$ in terms of unobservables, then the naive least squares regression would suffer from an upward bias since $\mathrm{cov}(D_i, \epsilon_i \mid S) > 0$. This is the classic selection problem.

Requirement 25 is also troublesome, as the condition implies that the unobserved gain from the treatment given $\pi_i^*$ should not vary across the treatment group and the control group. This is not satisfied if the treatment choice is correlated with unobserved gains from the treatment. It is plausible that agents have some knowledge of their likely idiosyncratic gains from the treatment at the time they choose their treatment status. If an agent's treatment choice is partially based on such knowledge, then 25 would not be satisfied. This type of sorting on the unobserved gain, termed "essential heterogeneity" by Heckman et al. (2006), has been emphasized in the modern program evaluation literature.

In conclusion, whenever a selection problem or essential heterogeneity exists, the naive OLS regression delivers inconsistent estimates of the structural parameters $(\alpha_1, \alpha_0, \beta_1, \beta_0)$.

In the previous section, we showed that the OLS regression of 22 suffers from bias when $v_i$ is correlated with $\eta_i = (u_{1i}, u_{0i}, e_{1i}, e_{0i})$ even when we control for $S$. In this section, we first show that IV methods do not identify the causal parameters of interest in the presence of general heterogeneity. We then propose an alternative method known as the control function approach.

Endogeneity is often addressed by IV methods such as two-stage least squares (2SLS). In our setup, $Z_i$ is a valid IV for $D_i$ since (i) $D_i$ is correlated with $Z_i$, and (ii) $Z_i$ is exogenous and is excluded from the outcome equation. In fact, in the presence of spillovers in the first stage, not only $Z_i$ but also the $n$-dimensional vector $Z = (Z_i, Z_{-i})$ is a valid instrument for $D_i$ since in that case $D_i$ is a function of the entire assignment vector $Z$. Therefore, we may run an IV regression on 22 where we instrument $D_i$ by $Z_i$ or by $Z = (Z_i, Z_{-i})$, depending on whether spillovers exist in the first stage.

[Footnote: Recall that when there exist spillovers in the first-stage choice model, not only the $Z$ of $i$'s direct neighbors but also the $Z$ of indirect neighbors affects $D_i$. Therefore $Z_j$ for any $j$ that is eventually connected to $i$ is also relevant for $D_i$. However, as the network distance between $i$ and $j$ becomes greater, the dependence between $Z_j$ and $D_i$ decays exponentially when $\lambda < 1$ (see Xu (2018) and Leung (2020b)). Therefore, using a $Z_j$ that is too far from $i$ as an IV may incur a weak IV problem.]

We argue that such a strategy does not identify $(\alpha_1, \beta_1, \alpha_0, \beta_0)$ in our setup. Suppose we instrument $D_i$ by $Z_i$. The resulting IV estimator is consistent only when $E[\epsilon_i \mid Z_i, X_i, \pi_i^*] = 0$, where $\epsilon_i = u_{0i} + \pi_i^* e_{0i} + D_i \big( u_{1i} - u_{0i} + \pi_i^* (e_{1i} - e_{0i}) \big)$ as in 23. Note that
\[ E[\epsilon_i \mid Z_i, X_i, \pi_i^*] = E\big[ u_{0i} + \pi_i^* e_{0i} + D_i \big( u_{1i} - u_{0i} + \pi_i^* (e_{1i} - e_{0i}) \big) \mid Z_i, X_i, \pi_i^* \big] \]
\[ = \underbrace{E[u_{0i} + \pi_i^* e_{0i} \mid Z_i, X_i, \pi_i^*]}_{A} + \underbrace{E[u_{1i} - u_{0i} + \pi_i^* (e_{1i} - e_{0i}) \mid D_i = 1, Z_i, X_i, \pi_i^*]}_{B} \; \underbrace{\Pr(D_i = 1 \mid Z_i, X_i, \pi_i^*)}_{C}. \]
$A = 0$ since $\eta_i = (u_{1i}, u_{0i}, e_{1i}, e_{0i})$ is uncorrelated with $S$, and thereby with $(Z, X_i, \pi_i^*)$. $C$ cannot be zero except in trivial cases. Therefore $E[\epsilon_i \mid Z, X_i, \pi_i^*] = 0$ only when $B = 0$. This is satisfied when
\[ E[u_{1i} - u_{0i} + \pi_i^* (e_{1i} - e_{0i}) \mid D_i = 1, Z_i, X_i, \pi_i^*] = E[u_{1i} - u_{0i} + \pi_i^* (e_{1i} - e_{0i}) \mid Z_i, X_i, \pi_i^*], \]
as $E[\eta_i \mid S] = 0$ implies that the last term is zero. Note that $u_{1i} - u_{0i} + \pi_i^* (e_{1i} - e_{0i})$ can be interpreted as the idiosyncratic part of $Y_i(1, \pi_i^*) - Y_i(0, \pi_i^*)$. Therefore we need to assume that $D_i$ is uncorrelated with the idiosyncratic gain from taking the treatment once we condition on $(Z_i, X_i, \pi_i^*)$. Such a requirement is unrealistic when agents have some knowledge of their idiosyncratic gains and base their treatment decision on that knowledge, i.e., when there is sorting on unobserved gains.

Whether $u_{1i} - u_{0i} + \pi_i^* (e_{1i} - e_{0i})$ is correlated with $D_i$ is an empirical matter and should not be settled a priori. IV methods rule out the possibility of such correlation and are subject to failure when the correlation exists. This point has also been made in the traditional treatment effect literature, which rules out spillover effects (see Hahn and Ridder (2011)). For instance, it is now well established in the literature that IV/2SLS does not recover average causal parameters such as the ATE under heterogeneous response models such as random coefficients models (see Imbens and Angrist (1994)).

We now propose an alternative strategy known as the control function approach. The control function approach addresses the endogeneity problem by explicitly formulating $E[Y_i \mid D_i = 1, S]$ and $E[Y_i \mid D_i = 0, S]$ as follows:
\[ E[Y_i \mid D_i = 1, S] = E[Y_i \mid D_i = 1, \sigma_i^*(S, \theta), \pi_i^*(S, \theta), S] \]
\[ = E\big[ Y_i(1, \pi_i^*(S, \theta)) \mid v_i \le \Phi^{-1}(\sigma_i^*(S, \theta)), \sigma_i^*(S, \theta), \pi_i^*(S, \theta), S \big] \]
\[ = X_i'\alpha_1 + E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S, \theta)), S] + \pi_i^*(S, \theta) \big\{ X_i'\beta_1 + E[e_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S, \theta)), S] \big\}, \]
since $D_i = 1 \iff \Phi(v_i) \le \sigma_i^*$ (see 14). Similarly, the observed conditional mean for the control group is
\[ E[Y_i \mid D_i = 0, S] = X_i'\alpha_0 + E[u_{0i} \mid v_i > \Phi^{-1}(\sigma_i^*(S, \theta)), S] + \pi_i^*(S, \theta) \big\{ X_i'\beta_0 + E[e_{0i} \mid v_i > \Phi^{-1}(\sigma_i^*(S, \theta)), S] \big\}. \]
The terms $E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S, \theta)), S]$, $E[e_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S, \theta)), S]$ and $E[u_{0i} \mid v_i > \Phi^{-1}(\sigma_i^*(S, \theta)), S]$, $E[e_{0i} \mid v_i > \Phi^{-1}(\sigma_i^*(S, \theta)), S]$ are "control functions" which account for the endogeneity of $D_i$. Assumption 5 below restricts the form of these control functions.

Assumption 5.

For all $i \in \mathcal{N}_n$, $\eta_i = (u_{1i}, u_{0i}, e_{1i}, e_{0i})$ satisfies the following conditions.
(i) $\eta_i$ is i.i.d. and is independent of $S$.
(ii) $E[\eta_i \mid v_i]$ is a linear function of $v_i$.
Under these two conditions, we write
\[ E[u_{1i} \mid v_i, S] = E[u_{1i} \mid v_i] = \rho_{u1} v_i, \qquad E[e_{1i} \mid v_i, S] = E[e_{1i} \mid v_i] = \rho_{e1} v_i, \]
\[ E[u_{0i} \mid v_i, S] = E[u_{0i} \mid v_i] = \rho_{u0} v_i, \qquad E[e_{0i} \mid v_i, S] = E[e_{0i} \mid v_i] = \rho_{e0} v_i, \]
where $\rho = (\rho_{u1}, \rho_{e1}, \rho_{u0}, \rho_{e0})$ captures the covariances between each component of $\eta_i$ and $v_i$.

Assumption 5 (i) is often referred to as a "separability" assumption and has been used in the literature, as in Carneiro et al. (2011) and Brinch et al. (2017). Under this assumption, the control functions depend only on the individual propensity score $\sigma_i^*(S, \theta)$, e.g., $E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S, \theta)), S] = E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S, \theta))]$, so that the control functions are separated from $S$. As a result, $E[Y_i \mid D_i = 1, S]$ and $E[Y_i \mid D_i = 0, S]$ depend on $S$ only through $(X_i, \pi_i^*, \sigma_i^*)$. This step is necessary since it is not possible to control for $S = (G, X, Z)$ itself, as our data consist of one large network.

Assumption 5 (ii) further allows us to write $E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*)]$, for instance, as $\rho_{u1} E[v_i \mid v_i \le \Phi^{-1}(\sigma_i^*)]$. Combined with the normality assumption on $v_i$, we effectively assume that $(\eta_i, v_i)$ are jointly normal. However, the approach can easily accommodate distributional assumptions on $v_i$ other than normality.

Under the joint normality assumption, the control functions take the form of inverse Mills ratios. Define $\lambda_1(\cdot)$ and $\lambda_0(\cdot)$ as follows: for $\sigma \in (0, 1)$,
\[ \lambda_1(\sigma) = -\frac{\phi(\Phi^{-1}(\sigma))}{\sigma}, \qquad \lambda_0(\sigma) = \frac{\phi(\Phi^{-1}(\sigma))}{1 - \sigma}. \]
It follows that
\[ E[Y_i \mid D_i = 1, S] = X_i'\alpha_1 + \rho_{u1} \lambda_1(\sigma_i^*) + \pi_i^* \big( X_i'\beta_1 + \rho_{e1} \lambda_1(\sigma_i^*) \big), \]
\[ E[Y_i \mid D_i = 0, S] = X_i'\alpha_0 + \rho_{u0} \lambda_0(\sigma_i^*) + \pi_i^* \big( X_i'\beta_0 + \rho_{e0} \lambda_0(\sigma_i^*) \big). \]
Let $\lambda_i = D_i \lambda_{1i} + (1 - D_i) \lambda_{0i}$.
We see that $(\alpha_1, \beta_1, \rho_{u1}, \rho_{e1})$ is identified by regressing $Y_i$ on $(X_i', \lambda_i, \pi_i^* X_i', \pi_i^* \lambda_i)'$ using the subsample with $D_i = 1$. Similarly, we can identify $(\alpha_0, \beta_0, \rho_{u0}, \rho_{e0})$ by regressing $Y_i$ on $X_i$, $\lambda_i$ and their interactions with $\pi_i^*$ using the subsample with $D_i = 0$. The inclusion of $\lambda_i$ accounts for the correlation between $\eta_i$ and $v_i$, so we can test for the endogeneity of $D_i$ by checking whether the correlations are jointly zero.

Our model achieves point identification by exploiting a functional form assumption on the relationship between $\eta_i$ and $v_i$. We can relax the linearity assumption and obtain a more flexible parametric functional form by adding higher-order terms. For instance, we may specify $E[u_{1i} \mid v_i]$ as a quadratic function of $v_i$ as follows: $E[u_{1i} \mid v_i] = \rho_{u1} v_i + \tilde{\rho}_{u1} v_i^2$. Then it can be shown that
\[ E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*)] = -\rho_{u1} \frac{\phi(\Phi^{-1}(\sigma_i^*))}{\sigma_i^*} + \tilde{\rho}_{u1} \Big[ \Phi^{-1}(\sigma_i^*) \frac{\phi(\Phi^{-1}(\sigma_i^*))}{\sigma_i^*} + \Big\{ \frac{\phi(\Phi^{-1}(\sigma_i^*))}{\sigma_i^*} \Big\}^2 \Big]. \]
This also offers a way to test the linearity assumption in the spirit of Lee (1984).

Estimation

We propose a two-stage estimation procedure. In the first stage, we estimate the treatment choice game using a nested fixed point maximum likelihood (NFXP-ML) method. In the second stage, using the first-stage estimates, we estimate regression models of treatment outcomes with generated regressors.

Recall that the treatment choice model boils down to equation 20 subject to the fixed-point requirement 28. Our sample log-likelihood function is defined as follows:
\[ \hat{L}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \Big\{ D_i \ln \sigma_i^*(S, \theta) + (1 - D_i) \ln\big(1 - \sigma_i^*(S, \theta)\big) \Big\}. \quad (26) \]
Our estimator $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3)$ is defined as the maximizer of $\hat{L}_n(\theta)$ subject to the constraint that $\{\sigma_i^*(S, \hat{\theta})\}$ satisfies the fixed-point requirement. Formally,
\[ \hat{\theta} = \arg\max_{\theta \in \Theta} \hat{L}_n(\theta) \quad (27) \]
subject to
\[ \sigma_i^*(S, \hat{\theta}) = \Phi\Big( X_i'\hat{\theta}_1 + \hat{\theta}_2 Z_i + \hat{\theta}_3 \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \hat{\theta}) \Big), \quad \forall i \in \mathcal{N}_n. \quad (28) \]
For computation, we use the nested fixed point (NFXP) algorithm. Specifically, starting with an arbitrary initial guess for $\hat{\theta}$, we find the fixed point of 28 via contraction iterations (it can be shown that 28 is a contraction mapping when $\lambda < 1$). We then update $\hat{\theta}$ to $\hat{\theta}'$ according to, say, Newton's method. The procedure is iterated until the sequence of estimates converges. Our NFXP-ML estimator is taken as its limit.

Let us define the set of regressors as
\[ W_i = [X_i', \lambda_i, \pi_i^*(S, \theta) X_i', \pi_i^*(S, \theta) \lambda_i]', \qquad \lambda_i = D_i \lambda_{1i} + (1 - D_i) \lambda_{0i}, \]
with $\lambda_{1i} = \lambda_1(\sigma_i^*(S, \theta))$ and $\lambda_{0i} = \lambda_0(\sigma_i^*(S, \theta))$. Our estimators are based on the following moment conditions:
\[ E[Y_i \mid D_i = 1, S] = W_i'\gamma_1, \qquad E[Y_i \mid D_i = 0, S] = W_i'\gamma_0, \]
where $\gamma_1 = (\alpha_1, \rho_{u1}, \beta_1, \rho_{e1})'$ and $\gamma_0 = (\alpha_0, \rho_{u0}, \beta_0, \rho_{e0})'$. This suggests that $\gamma_1$ and $\gamma_0$ can be estimated by regressing $Y_i$ on $W_i$ separately on the subsamples with $D_i = 1$ and $D_i = 0$, respectively. However, since $\lambda_i$ and $\pi_i^*$ are functions of the unknown first-stage parameters $\theta$, we need to replace $\theta$ with $\hat{\theta}$. Define $\hat{\lambda}_{1i} = \lambda_1(\sigma_i^*(S, \hat{\theta}))$ and $\hat{\lambda}_{0i} = \lambda_0(\sigma_i^*(S, \hat{\theta}))$. Let $\hat{\lambda}_i = D_i \hat{\lambda}_{1i} + (1 - D_i) \hat{\lambda}_{0i}$.
Similarly, we replace the unknown quantity $\pi_i^*(S, \theta) \equiv \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta)$ with $\hat{\pi}_i^* = \pi_i^*(S, \hat{\theta}) = \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \hat{\theta})$. Thus, our generated regressor $\hat{W}_i$ for $W_i$ is
\[ \hat{W}_i = [X_i', \hat{\lambda}_i, \hat{\pi}_i X_i', \hat{\pi}_i \hat{\lambda}_i]'. \]
The estimator for $\gamma_1$ is then defined as
\[ \hat{\gamma}_1 = \arg\min_{\gamma} \frac{1}{n} \sum_{i=1}^{n} D_i \big( Y_i - \hat{W}_i'\gamma \big)^2 = \Big\{ \sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i' \Big\}^{-1} \sum_{i=1}^{n} D_i \hat{W}_i Y_i. \]
Similarly, the estimator for $\gamma_0$ is
\[ \hat{\gamma}_0 = \arg\min_{\gamma} \frac{1}{n} \sum_{i=1}^{n} (1 - D_i) \big( Y_i - \hat{W}_i'\gamma \big)^2 = \Big\{ \sum_{i=1}^{n} (1 - D_i) \hat{W}_i \hat{W}_i' \Big\}^{-1} \sum_{i=1}^{n} (1 - D_i) \hat{W}_i Y_i. \]

For the asymptotic analysis, we consider large-network asymptotics in which the number of individuals connected in a single network goes to infinity. Moreover, for each $n$, we treat $S = (G, X, Z)$ as fixed. This is justified since $S$ is an ancillary statistic, i.e., $S$ does not contain any information on the parameters of interest.

Inference for the first-stage game

We first establish $\sqrt{n}$-consistency and asymptotic normality of the first-stage estimator $\hat{\theta}$. The true parameter is denoted by $\theta_0$. Therefore our data $\{D_i\}_{i=1}^{n}$ are assumed to be generated from
\[ D_i = \mathbf{1}\{ v_i \le X_i'\theta_{10} + \theta_{20} Z_i + \theta_{30} \pi_i^*(S, \theta_0) \} \]
subject to $\sigma_i^*(S, \theta_0) = \Phi\big( X_i'\theta_{10} + \theta_{20} Z_i + \theta_{30} \pi_i^*(S, \theta_0) \big)$ for all $i \in \mathcal{N}_n$.

Theorem 2 (consistency of $\hat{\theta}$). Under the following assumptions, $\hat{\theta} - \theta_0 \xrightarrow{p} 0$.
(i) The true parameter $\theta_0 = (\theta_{10}, \theta_{20}, \theta_{30})$ lies in a compact set $\Theta \subseteq \mathbb{R}^{\dim(\theta)}$ and $|\theta_{30}| < \sqrt{2\pi}$. The support of $X_i$ is a bounded subset of $\mathbb{R}^k$.
(ii) Let $R_i = (X_i', Z_i, \pi_i^*(S, \theta_0))'$. For large enough $n$, $\sum_{i=1}^{n} R_i R_i'$ is invertible, i.e.,
\[ \liminf_{n \to \infty} \det\Big( \sum_{i=1}^{n} R_i R_i' \Big) > 0. \]
See Appendix B.1 for the proof.
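The inner step of the NFXP algorithm, solving the fixed-point requirement 28 by contraction iterations for a given $\theta$, can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: the function name, the adjacency-list representation of the network, and the pre-computed index `index[i]` (standing for $X_i'\theta_1 + \theta_2 Z_i$) are assumptions. The guard on $|\theta_3| < \sqrt{2\pi}$ reflects the bound in condition (i) of Theorem 2: since the normal density is bounded by $1/\sqrt{2\pi}$, the map is then a contraction in the sup norm.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def solve_equilibrium(neighbors, index, theta3, start=0.5, tol=1e-12, max_iter=10000):
    """Iterate sigma_i <- Phi(index_i + theta3 * mean of neighbors' sigma) to a fixed point.

    neighbors[i] lists the peers of node i (no isolated nodes);
    index[i] stands for X_i' theta_1 + theta_2 * Z_i (hypothetical input).
    Returns (sigma, pi): choice probabilities and neighborhood scores.
    """
    if abs(theta3) >= math.sqrt(2.0 * math.pi):
        raise ValueError("contraction is only guaranteed for |theta3| < sqrt(2*pi)")
    n = len(neighbors)
    sigma = [start] * n
    for _ in range(max_iter):
        pi = [sum(sigma[j] for j in nb) / len(nb) for nb in neighbors]
        new_sigma = [PHI(index[i] + theta3 * pi[i]) for i in range(n)]
        gap = max(abs(a - b) for a, b in zip(new_sigma, sigma))
        sigma = new_sigma
        if gap < tol:
            break
    else:
        raise RuntimeError("fixed-point iteration did not converge")
    pi = [sum(sigma[j] for j in nb) / len(nb) for nb in neighbors]
    return sigma, pi
```

In a full NFXP routine this solver would be called inside every likelihood evaluation, with an outer optimizer updating $\theta$; only the inner loop is shown here.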

Assumption (i) ensures that there is a unique equilibrium at the true parameter (see Theorem 1) and that each equilibrium probability satisfies $\sigma_i^*(S, \theta_0) \in (0, 1)$ for all $i$. Assumption (ii) is the rank condition for identification, which requires that for all large enough $n$ the moment matrix of regressors has full rank.

We now establish asymptotic normality of $\hat{\theta}$. Let us define the information matrix as follows:
\[ I_n(\theta) = E\Big[ \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta l_i(\theta) \nabla_\theta l_i(\theta)' \,\Big|\, S \Big], \]
where $l_i(\theta) = D_i \ln \sigma_i^*(S, \theta) + (1 - D_i) \ln(1 - \sigma_i^*(S, \theta))$ is the individual log-likelihood function. Therefore $\nabla_\theta l_i(\theta)$ is given by
\[ \nabla_\theta l_i(\theta) = D_i \frac{\nabla_\theta \sigma_i^*(S, \theta)}{\sigma_i^*(S, \theta)} + (1 - D_i) \frac{-\nabla_\theta \sigma_i^*(S, \theta)}{1 - \sigma_i^*(S, \theta)}. \quad (29) \]

Theorem 3 (asymptotic normality of $\hat{\theta}$). In addition to the conditions for Theorem 2, assume
(i) The true parameter $\theta_0$ lies in the interior of the compact set $\Theta \subseteq \mathbb{R}^{\dim(\theta)}$.
(ii) For any $n$, $I_n(\theta_0)$ is nonsingular.
Then
\[ \big( I_n^{-1}(\theta_0) \big)^{-1/2} \sqrt{n} (\hat{\theta} - \theta_0) \xrightarrow{d} N(0, I_{\dim(\theta)}), \quad (30) \]
where $I_{\dim(\theta)}$ is the $\dim(\theta) \times \dim(\theta)$ identity matrix.
See Appendix B.2 for the proof.

Variance Estimation

The asymptotic variance of $\hat{\theta}$ can be estimated by $\widehat{Var}(\hat{\theta}) = \hat{I}_n^{-1} / n$, where
\[ \hat{I}_n \equiv \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta l_i(\hat{\theta}) \nabla_\theta l_i(\hat{\theta})'. \]
In order to compute $\nabla_\theta l_i(\hat{\theta})$ using equation 29, we need to evaluate $\nabla_\theta \sigma_i^*(S, \hat{\theta})$. For this we use a numerical approximation: take $\hat{\theta} + \epsilon$ for a small perturbation $\epsilon$ (e.g., a small negative power of 10), then compute the new equilibrium $\{\sigma_i^*(S, \hat{\theta} + \epsilon)\}_{i=1}^{n}$ by solving the fixed point. $\nabla_\theta \sigma_i^*(S, \hat{\theta})$ is then computed as $\big( \sigma_i^*(S, \hat{\theta} + \epsilon) - \sigma_i^*(S, \hat{\theta}) \big) / \epsilon$.

Next, we establish $\sqrt{n}$-consistency and asymptotic normality of the second-stage estimators $(\hat{\gamma}_1, \hat{\gamma}_0)$. Let us denote the true parameters by $(\gamma_{10}, \gamma_{00})$. We assume that our model is correctly specified, i.e., $Y_i$ satisfies the following conditional moment restrictions:
\[ E[Y_i \mid S, D_i = 1] = W_i'\gamma_{10}, \qquad E[Y_i \mid S, D_i = 0] = W_i'\gamma_{00}. \]
We maintain the conditions for $\sqrt{n}$-consistency and asymptotic normality of the first-stage estimator $\hat{\theta}$.

Theorem 4 (consistency of $(\hat{\gamma}_1, \hat{\gamma}_0)$). Under the following assumptions, $\hat{\gamma}_1 - \gamma_{10} \xrightarrow{p} 0$ and $\hat{\gamma}_0 - \gamma_{00} \xrightarrow{p} 0$.
(i) The true parameter $\gamma_{10}$ lies in a compact set $\Gamma_1 \subseteq \mathbb{R}^{\dim(\gamma_1)}$. Similarly, the true parameter $\gamma_{00}$ lies in a compact set $\Gamma_0 \subseteq \mathbb{R}^{\dim(\gamma_0)}$.
(ii)
\[ \liminf_{n \to \infty} \det\Big\{ \sum_{i=1}^{n} E[D_i W_i W_i' \mid S] \Big\} > 0 \quad \text{and} \quad \liminf_{n \to \infty} \det\Big\{ \sum_{i=1}^{n} E[(1 - D_i) W_i W_i' \mid S] \Big\} > 0. \]
See Appendix B.3 for the proof.
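The finite-difference step described under Variance Estimation, followed by the score in equation 29 and the outer-product estimate $\hat{I}_n$, can be sketched as below. The sketch assumes, as in the simulation design, a scalar intercept in place of $X_i'\theta_1$; the function names, the step size `eps=1e-6`, and the adjacency-list network format are illustrative assumptions rather than the paper's choices.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def equilibrium(theta, Z, neighbors, tol=1e-13, max_iter=10000):
    """Fixed point of sigma_i = Phi(t1 + t2*Z_i + t3 * mean of neighbors' sigma)."""
    t1, t2, t3 = theta
    sigma = [0.5] * len(Z)
    for _ in range(max_iter):
        new = [PHI(t1 + t2 * Z[i] + t3 * sum(sigma[j] for j in nb) / len(nb))
               for i, nb in enumerate(neighbors)]
        gap = max(abs(a - b) for a, b in zip(new, sigma))
        sigma = new
        if gap < tol:
            return sigma
    raise RuntimeError("fixed-point iteration did not converge")


def grad_sigma(theta, Z, neighbors, eps=1e-6):
    """Forward-difference gradient of each sigma_i with respect to theta."""
    base = equilibrium(theta, Z, neighbors)
    grads = [[0.0] * len(theta) for _ in base]
    for k in range(len(theta)):
        bumped = list(theta)
        bumped[k] += eps  # perturb one coordinate, then re-solve the equilibrium
        shifted = equilibrium(bumped, Z, neighbors)
        for i in range(len(base)):
            grads[i][k] = (shifted[i] - base[i]) / eps
    return base, grads


def scores(D, sigma, grads):
    """Equation (29): individual score vectors of the log-likelihood."""
    return [[(D[i] / sigma[i] - (1 - D[i]) / (1 - sigma[i])) * g for g in grads[i]]
            for i in range(len(D))]


def information(score_rows):
    """Outer-product estimate (1/n) * sum_i s_i s_i'."""
    n, k = len(score_rows), len(score_rows[0])
    return [[sum(s[a] * s[b] for s in score_rows) / n for b in range(k)]
            for a in range(k)]
```

Inverting the resulting matrix and dividing by $n$ would give the variance estimate; the inversion is omitted here to keep the sketch free of external dependencies.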

Next, we derive the asymptotic results for the second-step estimators. For compactness, we only report results for $\hat{\gamma}_1$, as the $\hat{\gamma}_0$ case can be derived in an analogous way.

Theorem 5 (asymptotic normality of $\hat{\gamma}_1$). Define
\[ \Upsilon_n = E\Big[ \frac{1}{n} \sum_{i=1}^{n} D_i W_i W_i' \,\Big|\, S \Big], \]
\[ \Psi_n = E\Big[ \frac{1}{n} \sum_{i=1}^{n} D_i W_i W_i' \epsilon_i^2 \,\Big|\, S \Big] + E\Big[ \frac{1}{n} \sum_{i=1}^{n} D_i W_i \gamma' \nabla_\gamma W_i(\gamma) \,\Big|\, S \Big] \, E\Big[ \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta l_i(\theta) \nabla_\theta l_i(\theta)' \,\Big|\, S \Big]^{-1} E\Big[ \frac{1}{n} \sum_{i=1}^{n} D_i W_i \gamma' \nabla_\gamma W_i(\gamma) \,\Big|\, S \Big]'. \]
In addition to the conditions for Theorem 4, assume
(i) The true parameter $\gamma_{10}$ lies in the interior of the compact set $\Gamma_1 \subseteq \mathbb{R}^{\dim(\gamma_1)}$.
(ii) For any $n$, $\Psi_n$ and $\Upsilon_n$ are nonsingular.
Then we have
\[ \Lambda_n^{-1/2} \sqrt{n} (\hat{\gamma}_1 - \gamma_{10}) \xrightarrow{d} N(0, I_{\dim(\gamma_1)}), \]
where $\Lambda_n = \Upsilon_n^{-1} \Psi_n \Upsilon_n^{-1}$. See Appendix B.4 for the proof.

If we ignore the first-stage estimation, the asymptotic variance would be
\[ \Upsilon_n^{-1} E\Big[ \frac{1}{n} \sum_{i=1}^{n} D_i W_i W_i' \epsilon_i^2 \,\Big|\, S \Big] \Upsilon_n^{-1}, \]
which is smaller, in the positive semi-definite sense, than the correct asymptotic variance $\Upsilon_n^{-1} \Psi_n \Upsilon_n^{-1}$.

Variance Estimation

The asymptotic variance $\Lambda_n$ can be estimated by replacing the population means with their sample counterparts. Specifically,
\[ \hat{\Upsilon}_n = \frac{1}{n} \sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i', \]
\[ \hat{\Psi}_n = \frac{1}{n} \sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i' \hat{\epsilon}_i^2 + \Big( \frac{1}{n} \sum_{i=1}^{n} D_i W_i \hat{\gamma}' \nabla_\gamma W_i(\hat{\gamma}) \Big) \Big( \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta l_i(\hat{\theta}) \nabla_\theta l_i(\hat{\theta})' \Big)^{-1} \Big( \frac{1}{n} \sum_{i=1}^{n} D_i W_i \hat{\gamma}' \nabla_\gamma W_i(\hat{\gamma}) \Big)', \]
where $\hat{\epsilon}_i = D_i (Y_i - \hat{W}_i'\hat{\gamma}_1)$.

In this section, we illustrate the finite sample properties of our estimators through simulation exercises.

Exogenous Variables

For simulation purposes, we imitate the environment of Dupas (2014). The network $G$ is constructed from the GPS data of Dupas (2014). Specifically, two households $i$ and $j$ are considered connected if they live within a 500-meter radius of each other. After removing isolated nodes, we have a sample size of 538. The instrumental variable $Z$ is also taken from Dupas (2014), where the binary $Z_i$ represents whether $i$ received a high level of subsidy or not. Summary statistics of $(G, Z)$ can be found in the next section. Throughout the simulation replications, $G$ and $Z$ are treated as fixed. We do not consider $X$.

Generating Endogenous Variables

Treatment choices are determined according to the following equation:
\[ D_i = \mathbf{1}\{ v_i \le \theta_1 + \theta_2 Z_i + \theta_3 \pi_i^* \}, \]
where $v_i \sim_{iid} N(0, 1)$ and $\theta = (\theta_1, \theta_2, \theta_3) = (-[...], [...], [...].5)$, under which the probability of $D = 1$ is around 0.8. Since $|\theta_3| < 2.5$, there exists a unique equilibrium by Theorem 1. Given our parameter values, we can compute the unique equilibrium $\{\sigma_i^*(G, Z, \theta)\}_{i=1}^{n}$ by calculating the fixed point of the following system:
\[ \sigma_i^*(G, Z, \theta) = \Phi\Big\{ \theta_1 + \theta_2 Z_i + \theta_3 \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(G, Z, \theta) \Big\}, \quad \forall i \in \mathcal{N}_n. \]
$\pi_i^*$ is then computed by $\pi_i^*(G, Z, \theta) = \sum_{j \in \mathcal{N}_i} \sigma_j^*(G, Z, \theta) / |\mathcal{N}_i|$.

stage  coeff.      bias     se      cov.prob.
FS     theta_1     ...      ...     ...
FS     theta_2     -0.034   0.181   0.937
FS     theta_3     ...      ...     ...
SS     alpha_1     ...      ...     ...
SS     beta_1      -0.005   0.530   0.979
SS     alpha_0     -0.004   0.333   0.959
SS     beta_0      ...      ...     ...
Table 1: $n = 538$ with 3000 simulations. Target coverage probability is 0.95.

Outcomes are realized according to the following rule:
\[ Y_i = \begin{cases} \alpha_{1i} + \beta_{1i} \pi_i^* & \text{if } D_i = 1, \\ \alpha_{0i} + \beta_{0i} \pi_i^* & \text{if } D_i = 0. \end{cases} \]
We generate the random coefficients according to
\[ \alpha_{1i} \mid v_i \sim_{iid} N(2 + 0.[...]\, v_i, [...]), \qquad \beta_{1i} \mid v_i \sim_{iid} N(1 + 0.[...]\, v_i, [...]), \]
\[ \alpha_{0i} \mid v_i \sim_{iid} N(4 + 0.[...]\, v_i, [...]), \qquad \beta_{0i} \mid v_i \sim_{iid} N(3 + 0.[...]\, v_i, [...]), \]
so that $(E[\alpha_{1i}], E[\beta_{1i}], E[\alpha_{0i}], E[\beta_{0i}])$, or $(\alpha_1, \beta_1, \alpha_0, \beta_0)$, is given as $(2, 1, 4, 3)$. The correlations between $(\alpha_{1i}, \beta_{1i}, \alpha_{0i}, \beta_{0i})$ and $v_i$ are given by $(\rho_{\alpha 1}, \rho_{\beta 1}, \rho_{\alpha 0}, \rho_{\beta 0}) = (0.[...], [...], [...], [...].2)$, so that $D_i$ is endogenous with respect to all coefficients.

Table 1 reports the bias, standard errors, and coverage probabilities over 3000 replications. The target coverage probability is 0.95. As the bias column shows, the biases of our estimators are close to zero. Our estimators perform well in terms of coverage probabilities as well.

Malaria is a life-threatening infectious disease responsible for approximately 1-3 million deaths per year. Most of these deaths are among children less than five years of age in rural sub-Saharan Africa. The use of insecticide-treated nets (ITNs) has been shown to be a cost-effective way to control malaria. However, the rate of adoption remains low and many households exhibit low willingness to pay (WTP) for ITNs. In addition, positive health externalities generated by the use of ITNs make the private adoption level lower than the socially optimal one. For these reasons, public subsidy programs have been proposed to achieve the socially optimal coverage rate.

variable   definition            mean    min    max
degree     number of neighbors   16.41   1.00   38.00
Z          ...                   ...     ...    ...
D          ...                   ...     ...    ...
Y          ...                   ...     ...    ...
Table 2 ($n = 583$)

While it has been shown that distributing ITNs for free or at highly subsidized prices is effective in increasing adoption in the short run, there have been concerns that short-run, one-time subsidies would lower households' WTP for the product later, and thus reduce the adoption rate in the long run. This could happen, for instance, when there exist reference dependence effects in which households anchor their WTP to previously paid subsidized prices. Consequently, households may be unwilling to pay a higher price for the product later once the subsidies end.

On the other hand, some argue that short-run subsidies would be beneficial for long-run adoption since households could learn the benefits of the product better with prior experience. Such learning effects would increase consumers' future WTP.
Moreover, the adoption process can be facilitated by social learning effects in which households learn the benefits of the product from their neighbors' prior experiences. As a result, one-time subsidies would also be beneficial for the long-run adoption rate and households' WTP.

Since ITNs need to be regularly replaced and re-purchased, understanding the factors determining the short-run and long-run adoption decisions is an important task for sustainable public subsidy schemes. Depending on whether reference dependence or learning effects exist, subsidy schemes would lead to different predictions on the short-run and long-run demand for ITNs. In this application, therefore, we study the factors affecting the short-run and long-run adoption (purchase) decisions for ITNs. In doing so, we allow for possible spillover effects in both the short-run and long-run adoption decisions. As Dupas (2014) showed, social interactions seem to play an important role in households' bednet purchase decisions. Depending on whether there exist positive or negative peer effects in the short run and in the long run, subsidy effectiveness may vary greatly.

variable          estimates   marginal effects   p-value
spillover (pi)    2.308       0.661              0.000
subsidy           0.694       0.199              0.000
female-educ       0.223       0.064              0.026
wealth            0.005       0.001              0.001
Table 3: estimation results for FS model ($n = 583$)

Design of Experiment

We use data from a two-stage randomized pricing experiment conducted in Kenya by Dupas (2014). In Phase 1, households within six villages were given a voucher for a bednet at a randomly assigned subsidy level varying from 100% to 40%, with the corresponding prices varying from 0 to 250 Ksh. In Phase 2, a year later, all study households in four villages were given a second voucher for a bednet. This time, however, all households faced the same subsidy level of 36%.

Data

Let $Z_i$ be a binary indicator representing that household $i$ received a high subsidy (defined as an assigned price of less than Ksh 50) in Phase 1. The treatment variable $D_i$ equals 1 if $i$ purchased a bednet in Phase 1. $Y_i$ is also binary, taking the value 1 if $i$ purchased a bednet in Phase 2. Following Dupas (2014), we may interpret $Y_i$ as a proxy for $i$'s WTP for the future bednet.

Network

Using GPS data, we construct the binarized spatial network. Two households $i$ and $j$ are considered connected (i.e., $G_{ij} = 1$) if they live within a 500-meter radius of each other. We also consider 250-meter and 750-meter radii. Since the results do not differ much, we only report results for the 500-meter radius.

Other Covariates

For household pre-treatment covariates, we consider wealth and the education level of the female head. Summary statistics of the variables can be found in Table 2. After deleting 25 isolated nodes, we have $n = 538$ observations from four villages.

Results on the short-run adoption

We first estimate the equation for the short-run adoption decision using our game-theoretic model. Table 3 displays the estimates of the coefficients and marginal effects, as well as the associated standard errors and p-values. As anticipated, a high subsidy level is associated with higher adoption of the bednet. Education and wealth are also positively associated with the adoption decision in the short run. These variables are all significant at 1 percent level. Figure 1 shows the estimated plot of $(\hat{\sigma}_i^*, \hat{\pi}_i^*)$ by the value of $Z_i$. The plot shows clearly that individual $Z_i$ is relevant for the treatment choice.

[Footnote: Marginal effects are computed as the sample average of conditional effects. For instance, the marginal effect of $Z_i$ is computed as $\frac{1}{n} \sum_{i=1}^{n} \phi\big( X_i'\hat{\theta}_1 + \hat{\theta}_2 Z_i + \hat{\theta}_3 \pi_i^*(S, \hat{\theta}) \big) \hat{\theta}_2$.]

[Figure 1: plot of $(\hat{\sigma}_i^*, \hat{\pi}_i^*)$ by the value of $Z_i$]

Our results show strong evidence of positive spillover effects in the short-run adoption decision. When the average adoption probability of neighbors ($\pi_i^*$) increases by 10 percentage points, $i$'s short-run adoption probability ($\sigma_i^*$) increases by 6.6 percentage points. The resulting conformity effects imply that if we ignore spillover effects in the specification, we would underestimate the full effect of the programs.

D = 1             estimates   p-value     D = 0             estimates   p-value
cons               0.497      0.043       cons               0.128      0.174
female-educ       -0.094      0.530       female-educ       -0.070      0.519
wealth             0.003      0.388       wealth            -0.002      0.325
lambda             0.059      0.767       lambda             0.036      0.841
pi                -0.347      0.324       pi                -0.021      0.940
pi*female-educ     0.031      0.906       pi*female-educ     0.176      0.513
pi*wealth         -0.003      0.610       pi*wealth          0.013      0.098
pi*lambda          0.317      0.375       pi*lambda         -0.063      0.832
Table 4: estimation results for SS model ($n = 583$)

Results on the long-run adoption

Table 4 presents the estimates of the effects of own short-run adoption experience ($D_i$) and the average adoption probability of neighbors ($\pi_i^*$) on the long-run adoption decision. Unfortunately, we have very limited statistical power, except for a few constants, due to the small sample size.
However, in terms of magnitudes, the estimated coefficients have implications for the spillover effects in the long-run adoption decision.

Using formulas 16 and 18, we get the following estimated mean response functions:

$\widehat{E}[Y_i(1,\pi)]$ = 0. − . π,   $\widehat{E}[Y_i(0,\pi)]$ = 0. − . π   (31)

First, let us consider $\widehat{E}[Y_i(1,\pi)]$. Although the coefficient on $\pi$ is not significant, we observe considerable negative spillover effects in terms of magnitude: if $\pi$ increases by 10 percentage points, the second-period adoption probability decreases by 3.4 percentage points. This is contrary to the positive spillovers observed in the first-period adoption decision. One possible explanation for such negative spillovers in the treated response is that they result from positive health spillovers occurring over time. For instance, households with a higher value of $\pi$ would anticipate a higher coverage rate in their area, which would result in lower malaria prevalence in the long run. This might make households less likely to re-invest in the product later. Such results highlight the importance of distinguishing the mechanism of static spillovers from that of dynamic spillovers.

Such effects do not seem to apply to the untreated households, as $\widehat{E}[Y_i(0,\pi)]$ shows. However, the statistical power is very limited.

Average Direct Effect

From 31, the average direct effect (ADE) of own short-run adoption on long-run adoption is computed as follows:

$\widehat{E}[Y_i(1,\pi) - Y_i(0,\pi)]$ = 0. − . π   (32)

The result suggests that the ADE varies greatly with the value of $\pi$: when $\pi = 0$, treated households are 36.9 percentage points more likely to invest in the second bednet. However, this effect declines with the neighborhood exposure rate $\pi$. When $\pi = 1$, the effect is almost zero. The fact that the ADE is positive for all possible values of $\pi$ points to the existence of learning effects from prior experience, rather than

Footnote: Dupas (2014) also reports similar results from reduced-form regression models. Those results show that adoption in Phase 2 is negatively affected by the share of neighbors who received a high subsidy in Phase 1.
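As a rough illustration of how an ADE of the form in (32) is read off from two linear mean response functions, the sketch below subtracts the untreated response from the treated one and evaluates the difference over a grid of $\pi$. The intercepts and slopes are placeholders taken from the `cons` and `π` rows of Table 4. This is an assumption made for illustration only: the paper's mean response functions are computed with formulas 16 and 18 and need not coincide with these raw coefficients. The function names are hypothetical.

```python
# Sketch: average direct effect (ADE) implied by two linear mean responses.
# ASSUMPTION: the coefficients are the "cons" and "π" rows of Table 4, used
# as illustrative stand-ins for the estimated mean response functions.


def mean_response(intercept, slope, pi):
    """Linear mean response E[Y(d, pi)] = intercept + slope * pi."""
    return intercept + slope * pi


def average_direct_effect(treated, untreated, pi):
    """ADE(pi) = E[Y(1, pi)] - E[Y(0, pi)] for (intercept, slope) pairs."""
    return mean_response(*treated, pi) - mean_response(*untreated, pi)


TREATED = (0.497, -0.347)    # (cons, coefficient on pi) for D = 1
UNTREATED = (0.128, -0.021)  # (cons, coefficient on pi) for D = 0

if __name__ == "__main__":
    for pi in (0.0, 0.25, 0.5, 0.75, 1.0):
        ade = average_direct_effect(TREATED, UNTREATED, pi)
        print(f"pi = {pi:.2f}  ADE = {ade:.3f}")
```

With these placeholder inputs the ADE equals 0.369 at $\pi = 0$, in line with the 36.9 percentage points reported above, and it shrinks toward zero as $\pi$ approaches 1.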

Bias from ignoring spillovers

Suppose that we falsely ignore spillover effects in responses. Using the conventional Heckit model, we obtain the following estimated average treatment effect (ATE):

$\widehat{E}[Y_i(1) - Y_i(0)]$ = 0. .

This result suggests that the effect of $D$ on $Y$ is very limited. However, as equation 32 shows, there is substantial heterogeneity in the effect of $D$ on $Y$ depending on the value of $\pi$: the effect of $D$ varies from almost 0 to 37 percentage points. Thus, by ignoring the spillover effects, we would draw the misleading conclusion that there is no treatment effect.

Observed heterogeneity in effects

Let us turn to effect heterogeneity due to the observable covariates, education and wealth. For the treated, the effects of education and wealth on the adoption rate seem to be trivial in magnitude: the coefficients are close to zero and their associated p-values are large. We also compute the estimates without covariates. The magnitudes of the estimates resemble those with covariates, so we do not report the results here. This also suggests that there is little observed heterogeneity in $E[Y_i(1,\pi)]$ in terms of education and wealth.

On the other hand, for the $D_i = 0$ case, the magnitudes of the estimates on the covariates are much larger than those for the $D_i = 1$ case. Consider education first. The interaction between $\pi$ and education suggests that higher education is associated with a higher spillover effect: one more year of education increases the effect of $\pi$ from $-0.02$ to $-0.02 + 0.17 = 0.15$. Similarly, if the wealth level increases by 1000 units, the effect on $\pi$ increases by 1 .

One advantage of our structural approach is that it allows researchers to simulate counterfactual policies. Suppose that a policy-maker is interested in implementing a means-tested subsidy scheme where $Z$ is determined according to the following rule:

\[ Z_i = \mathbf{1}\{\text{wealth}_i \le \tau\}, \quad \forall i \in \mathcal{N}_n \tag{33} \]

i.e., household $i$ gets the high subsidy only when its wealth level is below some specified threshold $\tau$. The question is: what would be the expected outcome under this new, counterfactual subsidy rule?

This problem is related to the literature on policy-relevant treatment effects (PRTE: Heckman and Vytlacil (2001)). In this framework, each intervention or policy is defined by a manipulation of the exogenous variable $S = (G, X, Z)$. In our setup, we assume that a policy maker has no means of changing the underlying network structure $G$ or the pre-treatment covariates $X$. Thus, the only way to change $S$ is through changing $Z$. Let us denote the new counterfactual policy by $S^{new} = (G, X, Z^{new})$, where $Z$ is set to a value $Z^{new}$ that is not in the data. The expected outcome of $i$ under the new policy is $E[Y_i \mid S = S^{new}]$. Note that for any $S$,

\[ E[Y_i \mid S] = E[Y_i \mid D_i = 1, S]\Pr(D_i = 1 \mid S) + E[Y_i \mid D_i = 0, S]\Pr(D_i = 0 \mid S). \tag{34} \]

Under our control function specification, $E[Y_i \mid S]$ can be written as follows:

\[
\begin{aligned}
E[Y_i \mid S] ={}& \sigma_i^*(S)\Big[X_i'\alpha_1 + \lambda_1^{\alpha}(\sigma_i^*(S)) + \big\{X_i'\beta_1 + \lambda_1^{\beta}(\sigma_i^*(S))\big\}\pi_i^*(S)\Big] \\
&+ (1 - \sigma_i^*(S))\Big[X_i'\alpha_0 + \lambda_0^{\alpha}(\sigma_i^*(S)) + \big\{X_i'\beta_0 + \lambda_0^{\beta}(\sigma_i^*(S))\big\}\pi_i^*(S)\Big] \\
={}& E[Y_i \mid X_i, \sigma_i^*(S), \pi_i^*(S)].
\end{aligned}
\]

Note that $E[Y_i \mid S]$ is a function of $S$ only through $(X_i, \sigma_i^*(S), \pi_i^*(S))$, thus we write $E[Y_i \mid X_i, \sigma_i^*(S), \pi_i^*(S)]$.
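To make the objects in this construction concrete, the following sketch builds a counterfactual assignment from rule (33) and then solves for equilibrium choice probabilities by iterating a probit best-response map of the kind used in the paper's treatment choice game. It is only an illustration under stated assumptions: the names `x_index`, `theta_z`, `theta_pi`, the toy network and all numeric values are hypothetical; plain iteration is assumed to converge, which the uniqueness condition $\lambda < 1$ of Theorem 1 suggests; and `mean_treated` / `mean_untreated` are stand-ins for fitted conditional means $E[Y_i \mid D_i = d, S]$, not the paper's estimates.

```python
import math


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def means_tested_assignment(wealth, tau):
    """Rule (33): Z_i = 1 if wealth_i <= tau, else 0."""
    return [1 if w <= tau else 0 for w in wealth]


def neighbor_average(sigma, neighbors):
    """pi_i = average of sigma_j over i's neighbors (0.0 if i is isolated)."""
    return [
        sum(sigma[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        for nbrs in neighbors
    ]


def solve_equilibrium(neighbors, x_index, z, theta_z, theta_pi,
                      tol=1e-12, max_iter=10_000):
    """Iterate sigma <- Phi(x_index + theta_z * z + theta_pi * pi(sigma))."""
    sigma = [0.5] * len(z)
    for _ in range(max_iter):
        pi = neighbor_average(sigma, neighbors)
        updated = [
            norm_cdf(x_index[i] + theta_z * z[i] + theta_pi * pi[i])
            for i in range(len(z))
        ]
        gap = max(abs(a - b) for a, b in zip(updated, sigma))
        sigma = updated
        if gap < tol:
            return sigma, neighbor_average(sigma, neighbors)
    raise RuntimeError("fixed-point iteration did not converge")


def average_outcome(sigma, pi, mean_treated, mean_untreated):
    """Sample analogue of (34): mean of sigma * m1 + (1 - sigma) * m0."""
    total = 0.0
    for s, p in zip(sigma, pi):
        total += s * mean_treated(s, p) + (1.0 - s) * mean_untreated(s, p)
    return total / len(sigma)


# Toy example (all values hypothetical).
NEIGHBORS = [[1], [0, 2], [1, 3], [2]]  # a line network with 4 households
WEALTH = [1.0, 4.0, 6.0, 9.0]
X_INDEX = [-0.2, 0.0, 0.1, -0.1]        # covariate part of the probit index
THETA_Z, THETA_PI = 0.8, 1.0            # |THETA_PI| / sqrt(2 * pi) < 1

Z_NEW = means_tested_assignment(WEALTH, tau=5.0)
SIGMA_NEW, PI_NEW = solve_equilibrium(
    NEIGHBORS, X_INDEX, Z_NEW, THETA_Z, THETA_PI
)
Y_BAR = average_outcome(
    SIGMA_NEW, PI_NEW,
    mean_treated=lambda s, p: 0.5 - 0.3 * p,  # placeholder for E[Y | D = 1, S]
    mean_untreated=lambda s, p: 0.1,          # placeholder for E[Y | D = 0, S]
)
```

Sweeping `tau` over a grid and recomputing `Y_BAR` each time would trace out the expected outcome as a function of the threshold.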
The expected outcome of $i$ under the new policy is then given by $E[Y_i \mid X_i, \sigma_i^*(S^{new}), \pi_i^*(S^{new})]$.

To estimate this, we first need to compute the new equilibrium choice probabilities $\{\sigma_i^*(G, X, Z^{new})\}_{i \in \mathcal{N}_n}$, where $Z^{new}$ is determined according to 33. Given the identified first-stage parameters, this is done by solving for the new fixed point of the best-response functions under the new data set $S^{new} = (G, X, Z^{new})$. We then estimate $\hat{Y}_i \equiv \widehat{E}[Y_i \mid X_i, \sigma_i^*(S^{new}), \pi_i^*(S^{new})]$ for each $i \in \mathcal{N}_n$ using the formula above. The overall impact of policy $S^{new}$ is computed as $\sum_{i=1}^{n} \hat{Y}_i / n$.

Results

See Figure 2. The red line shows the effect of $\tau$ on the overall long-run adoption level when we ignore interference effects. In that case, as $\tau$ increases, the long-run adoption level increases monotonically. This is because, as $\tau$ increases, more households get the subsidy and, without interference, treated agents are more likely to adopt in the long run.

In the presence of spillovers, the effect of $\tau$ is no longer monotonically increasing, as the blue line shows. A higher $\tau$ also induces a higher $\pi_i^*$, which affects long-run adoption negatively. Therefore, a priori, we cannot expect that a higher $\tau$ would give a higher overall long-run adoption rate in the population. In fact, as the blue line shows, the highest long-run adoption rate is achieved under the subsidy scheme targeting the very lowest percentile households.

Figure 2: counterfactual impact of means-tested subsidy on LR-adoption

The result also highlights the complications involved in the use of subsidies to increase the long-run adoption rate. As the result shows, the highest expected coverage is only 17 percent.

In this paper, we propose a new methodological framework to analyze randomized experiments with spillovers and noncompliance in a general network setup. Using a game-theoretic framework, we allow for spillover effects to occur at two stages: the choice stage and the outcome stage. Potential outcomes are modeled as a random coefficient model to account for general unobserved heterogeneity. We extend the traditional control function estimator of Heckman (1979) to incorporate spillovers. Finally, we illustrate our methods using the Dupas (2014) data and show that our model can be used to evaluate counterfactual policies.

In our treatment choice games, we assumed that private information is independently distributed across agents. Relaxing this assumption to allow for network dependence in private information would be a rewarding task.
Another important issue is multiple equilibria: formalizing the problem of policy evaluation and counterfactual prediction in the presence of multiple equilibria is important for realistic policy design. Finally, we conclude by noting that our model can be used to derive an ex ante optimal treatment assignment rule under interference, especially in settings where a social planner should take possible noncompliance and spillovers into account.

Appendix

A Proof of Theorem 1

Following Xu (2018), we show this by contradiction. Define $\bar\sigma_i = \sum_{j \in \mathcal{N}_i} \sigma_j / |\mathcal{N}_i|$. Let $\Gamma(X_i, Z_i, \bar\sigma_i, \theta) = \Phi(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \bar\sigma_i)$ be $i$'s best-response function to inputs $(X_i, Z_i, \bar\sigma_i)$ and parameter value $\theta$. Suppose there are two distinct equilibria $\sigma^* = (\sigma_i^*)_{i \in \mathcal{N}_n}$ and $\sigma^+ = (\sigma_i^+)_{i \in \mathcal{N}_n}$. By definition, they satisfy

\[ \sigma_i^* = \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta), \ \forall i \in \mathcal{N}_n \quad \text{and} \quad \sigma_i^+ = \Gamma(X_i, Z_i, \bar\sigma_i^+, \theta), \ \forall i \in \mathcal{N}_n. \]

Taking the difference and applying the mean-value theorem, we have

\[ \sigma_i^* - \sigma_i^+ = \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta) - \Gamma(X_i, Z_i, \bar\sigma_i^+, \theta) = \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^m, \theta)}{\partial \bar\sigma_i}(\bar\sigma_i^* - \bar\sigma_i^+), \]

where $\bar\sigma_i^m$ is a mean value between $\bar\sigma_i^*$ and $\bar\sigma_i^+$. Taking absolute values,

\[ |\sigma_i^* - \sigma_i^+| \le \left|\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^m, \theta)}{\partial \bar\sigma_i}\right| \cdot |\bar\sigma_i^* - \bar\sigma_i^+| \tag{35} \]
\[ \le \left|\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^m, \theta)}{\partial \bar\sigma_i}\right| \cdot \max_{j \in \mathcal{N}_i} |\sigma_j^* - \sigma_j^+|. \tag{36} \]

From the definition of $\Gamma(\cdot)$, observe that

\[ \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i, \theta)}{\partial \bar\sigma_i} = \frac{\partial \Phi(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \bar\sigma_i)}{\partial \bar\sigma_i} = \phi\big(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \bar\sigma_i\big)\theta_3. \]

Thus,

\[ \left|\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^m, \theta)}{\partial \bar\sigma_i}\right| \le |\theta_3| \sup_u \phi(u) \equiv \lambda. \tag{37} \]

Therefore we can write 36 as

\[ |\sigma_i^* - \sigma_i^+| \le \lambda \max_{j \in \mathcal{N}_i} |\sigma_j^* - \sigma_j^+|. \]

Taking the maximum over $i \in \mathcal{N}_n$ on both sides gives

\[ \max_{i \in \mathcal{N}_n} |\sigma_i^* - \sigma_i^+| \le \lambda \max_{i \in \mathcal{N}_n} \max_{j \in \mathcal{N}_i} |\sigma_j^* - \sigma_j^+| \le \lambda \max_{k \in \mathcal{N}_n} |\sigma_k^* - \sigma_k^+|, \]

which leads to a contradiction when $\lambda < 1$. ■

B Proofs for Asymptotic Results

B.1 Proof of consistency of ﬁrst-stage estimators

Let $l_i(\theta) \equiv D_i \ln \sigma_i^*(S, \theta) + (1 - D_i) \ln(1 - \sigma_i^*(S, \theta))$ be the individual log-likelihood function of $i$. Then $\widehat{L}_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} l_i(\theta)$. Define

\[ L_n(\theta) = E[\widehat{L}_n(\theta) \mid S], \]

where the population objective function $L_n(\theta)$ depends on $n$ through the public state $S = (G, X, Z)$. Recall that the true parameter is denoted by $\theta^0$. Following Gallant and White (1988), Theorem 3.3, we establish consistency by showing identifiable uniqueness and uniform convergence.

Identifiable Uniqueness

We show that $\liminf_{n \to \infty} (L_n(\theta^0) - L_n(\theta)) > 0$ for any $\theta$ such that $|\theta - \theta^0| \ge \epsilon > 0$. Note that

\[
\begin{aligned}
\liminf_{n \to \infty} \big(L_n(\theta^0) - L_n(\theta)\big)
&= \liminf_{n \to \infty} -\frac{1}{n}\sum_{i=1}^{n} E\Big[D_i \ln \frac{\sigma_i^*(S, \theta)}{\sigma_i^*(S, \theta^0)} + (1 - D_i) \ln \frac{1 - \sigma_i^*(S, \theta)}{1 - \sigma_i^*(S, \theta^0)} \,\Big|\, S\Big] \\
&= \liminf_{n \to \infty} -\frac{1}{n}\sum_{i=1}^{n} \Big[\sigma_i^*(S, \theta^0) \ln \frac{\sigma_i^*(S, \theta)}{\sigma_i^*(S, \theta^0)} + (1 - \sigma_i^*(S, \theta^0)) \ln \frac{1 - \sigma_i^*(S, \theta)}{1 - \sigma_i^*(S, \theta^0)}\Big] \\
&\ge \liminf_{n \to \infty} -\frac{1}{n}\sum_{i=1}^{n} \ln\big(\sigma_i^*(S, \theta) + 1 - \sigma_i^*(S, \theta)\big) = 0.
\end{aligned}
\]

The second equality follows from $E[D_i \mid S] = \sigma_i^*(S, \theta^0)$ and the last weak inequality is due to Jensen's inequality. To show that the inequality holds strictly, we need to rule out the case $\liminf_{n \to \infty} (L_n(\theta^0) - L_n(\theta)) = 0$. This happens when, for some large enough $n$, $\sigma_i^*(S, \theta) = \sigma_i^*(S, \theta^0)$ for all $i \in \mathcal{N}_n = \{1, 2, \cdots, n\}$, i.e., there exists $n$ that delivers observationally equivalent choice probabilities.

Suppose this is the case. By the fixed point requirement, the following needs to be satisfied for any arbitrary $\theta$, including the true parameter $\theta^0$:

\[ \Phi^{-1}(\sigma_i^*(S, \theta)) = X_i'\theta_1 + \theta_2 Z_i + \theta_3 \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta), \quad \forall i \in \mathcal{N}_n \]

and

\[ \Phi^{-1}(\sigma_i^*(S, \theta^0)) = X_i'\theta_1^0 + \theta_2^0 Z_i + \theta_3^0 \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta^0), \quad \forall i \in \mathcal{N}_n. \]

If $\sigma_i^*(S, \theta) = \sigma_i^*(S, \theta^0)$ for all $i \in \mathcal{N}_n$, we have

\[ X_i'(\theta_1 - \theta_1^0) + Z_i(\theta_2 - \theta_2^0) + (\theta_3 - \theta_3^0) \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta^0) = 0, \quad \forall i \in \mathcal{N}_n. \]

Equivalently, $R_i'(\theta - \theta^0) = 0$ for all $i \in \mathcal{N}_n$, where $R_i$ is defined as in Theorem 2. It follows that $(\theta - \theta^0)' \sum_{i=1}^{n} R_i R_i' (\theta - \theta^0) = 0$. Given the assumption that $\sum_{i=1}^{n} R_i R_i'$ is positive definite for all large enough $n$, the above equation holds only if $\theta = \theta^0$, which is a contradiction. □

Next, we verify that $\sup_{\theta \in \Theta} |\widehat{L}_n(\theta) - L_n(\theta)| \xrightarrow{p} 0$. We first show that pointwise convergence holds. Uniform convergence then follows from Lipschitz conditions.
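The identifiable-uniqueness step above rests on the fact that, for each $i$, the expected log-likelihood gap between the true choice probability and any other candidate is a Bernoulli Kullback-Leibler divergence, which is nonnegative and zero only when the two probabilities coincide. The following sketch checks this numerically. The function names and the probability values are hypothetical and chosen only for illustration.

```python
import math


def bernoulli_kl(p_true, p_alt):
    """KL divergence between Bernoulli(p_true) and Bernoulli(p_alt)."""
    return (p_true * math.log(p_true / p_alt)
            + (1.0 - p_true) * math.log((1.0 - p_true) / (1.0 - p_alt)))


def likelihood_gap(sigma_true, sigma_alt):
    """Average over i of E[l_i(true) - l_i(alt) | S], i.e. the mean KL term."""
    pairs = list(zip(sigma_true, sigma_alt))
    return sum(bernoulli_kl(p, q) for p, q in pairs) / len(pairs)


# Hypothetical choice probabilities at the true and at another parameter value.
SIGMA_TRUE = [0.30, 0.55, 0.80]
SIGMA_ALT = [0.35, 0.50, 0.70]

GAP_SAME = likelihood_gap(SIGMA_TRUE, SIGMA_TRUE)  # identical probabilities
GAP_DIFF = likelihood_gap(SIGMA_TRUE, SIGMA_ALT)   # different probabilities
```

The gap is zero when the two probability vectors coincide and strictly positive otherwise, which is why the proof only needs to rule out observationally equivalent choice probabilities.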

Pointwise Convergence

We first show that for any $\theta \in \Theta$, $|\widehat{L}_n(\theta) - L_n(\theta)| \xrightarrow{p} 0$. It can be shown that

\[ \widehat{L}_n(\theta) - L_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \underbrace{\big(D_i - \sigma_i^*(S, \theta^0)\big) \ln \frac{\sigma_i^*(S, \theta)}{1 - \sigma_i^*(S, \theta)}}_{\zeta_i}. \]

$\{\zeta_i\}_{i=1}^{n}$ is conditionally independent with mean zero given $S$. It is also uniformly bounded due to Lemma 1. Therefore we can apply a LLN for independent observations (e.g., Markov) and the result follows.

Uniform Convergence

Given the pointwise convergence result, uniform convergence follows if we can establish that $\{\widehat{L}_n(\theta) - L_n(\theta)\}_n$ is stochastically equicontinuous on $\Theta$ (Theorem 1 in Andrews (1992)). A sufficient condition for this is that the summand in the sample objective function, $\{l_i(\theta)\}$, is Lipschitz (Assumption W-LIP in Andrews (1992)). Note that

\[ \nabla_\theta l_i(\theta) = D_i \frac{\nabla_\theta \sigma_i^*(S, \theta)}{\sigma_i^*(S, \theta)} + (1 - D_i) \frac{-\nabla_\theta \sigma_i^*(S, \theta)}{1 - \sigma_i^*(S, \theta)}, \]

which is bounded by

\[ |\nabla_\theta l_i(\theta)| \le \left|\frac{\nabla_\theta \sigma_i^*(S, \theta)}{\sigma_i^*(S, \theta)}\right| + \left|\frac{\nabla_\theta \sigma_i^*(S, \theta)}{1 - \sigma_i^*(S, \theta)}\right|. \]

By Lemma 1 and Lemma 2, $\sigma_i^*(S, \theta)$ and $\nabla_\theta \sigma_i^*(S, \theta)$ are uniformly bounded. Therefore $\{l_i(\theta)\}$ is Lipschitz-continuous and the result follows. ■

B.2 Proof of asymptotic normality of first-stage estimators

$\hat\theta$ should satisfy the first-order condition for maximization: $\nabla_\theta \widehat{L}_n(\hat\theta) = 0$. Given that $\widehat{L}_n(\cdot)$ is smooth, we can apply the mean-value theorem to the first-order condition around the true parameter $\theta^0$:

\[ \nabla_\theta \widehat{L}_n(\hat\theta) = \nabla_\theta \widehat{L}_n(\theta^0) + \nabla_{\theta\theta} \widehat{L}_n(\bar\theta)(\hat\theta - \theta^0) = 0 \tag{38} \]
\[ \iff \sqrt{n}(\hat\theta - \theta^0) = -\big(\nabla_{\theta\theta} \widehat{L}_n(\bar\theta)\big)^{-1} \sqrt{n}\, \nabla_\theta \widehat{L}_n(\theta^0) \tag{39} \]

where $\bar\theta$ is a mean value on the line joining $\hat\theta$ and $\theta^0$. Define the Hessian matrix as

\[ H_n(\theta) = E\Big[\frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\theta) \,\Big|\, S\Big] \]

and the information matrix as

\[ I_n(\theta) = E\Big[\frac{1}{n}\sum_{i=1}^{n} \nabla_\theta l_i(\theta) \nabla_\theta l_i(\theta)' \,\Big|\, S\Big]. \]

We first show that $\nabla_{\theta\theta} \widehat{L}_n(\bar\theta) - H_n(\theta^0) \xrightarrow{p} 0$ (LLN of the Hessian matrix) and then that $\sqrt{n}\, I_n^{-1/2}(\theta^0) \nabla_\theta \widehat{L}_n(\theta^0) \xrightarrow{d} N(0, I_{\dim(\theta)})$ (CLT on the score).

LLN of the Hessian Matrix

We show that $\nabla_{\theta\theta} \widehat{L}_n(\bar\theta) - H_n(\theta^0) \xrightarrow{p} 0$. Note that

\[
\nabla_{\theta\theta} \widehat{L}_n(\bar\theta) - H_n(\theta^0)
= \underbrace{\frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\bar\theta) - \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\theta^0)}_{A}
+ \underbrace{\frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\theta^0) - E\Big[\frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\theta^0) \,\Big|\, S\Big]}_{B}.
\]

First, $A = o_p(1)$ since $\hat\theta - \theta^0 \xrightarrow{p} 0$ and $\nabla_{\theta\theta} l_i(\cdot)$ is continuous as a result of Lemma 3. Next, note that

\[ B = \frac{1}{n}\sum_{i=1}^{n} \underbrace{\Big\{\nabla_{\theta\theta} l_i(\theta^0) - E\big[\nabla_{\theta\theta} l_i(\theta^0) \,\big|\, S\big]\Big\}}_{\xi_i}. \]

$\{\xi_i\}$ is independent conditional on $S$ with mean zero. Also, by Lemma 3, it is uniformly bounded. Therefore, by the LLN for independent observations, $B = o_p(1)$.

CLT on the Score

Note that $\sqrt{n}\, \nabla_\theta \widehat{L}_n(\theta^0) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \nabla_\theta l_i(\theta^0)$ and that $\{\nabla_\theta l_i(\theta^0)\}$ is independently distributed conditional on $S$, with uniformly bounded conditional variance $I_n(\theta^0)$. Therefore we can apply Lyapunov's CLT for independent observations to get $\sqrt{n}\, I_n^{-1/2}(\theta^0) \nabla_\theta \widehat{L}_n(\theta^0) \xrightarrow{d} N(0, I)$.

Combining all these results, we see that equation 39 can be written as

\[ \sqrt{n}(\hat\theta - \theta^0) = -\big(H_n(\theta^0) + o_p(1)\big)^{-1} I_n(\theta^0)^{1/2} \sqrt{n}\, I_n(\theta^0)^{-1/2} \nabla_\theta \widehat{L}_n(\theta^0). \]

By the information matrix equality, when the model is correctly specified, $H_n(\theta^0) = -I_n(\theta^0)$, so that we have

\[ \sqrt{n}(\hat\theta - \theta^0) = \big(I_n(\theta^0) + o_p(1)\big)^{-1} I_n(\theta^0)^{1/2} \sqrt{n}\, I_n(\theta^0)^{-1/2} \nabla_\theta \widehat{L}_n(\theta^0). \]

Under the assumption that $I_n(\theta^0)$ is nonsingular, we get the desired result:

\[ \sqrt{n}\, \big(I_n^{-1}(\theta^0)\big)^{-1/2} (\hat\theta - \theta^0) \xrightarrow{d} N(0, I_{\dim(\theta)}). \ ■ \]

B.3 Proof of consistency of second-stage estimators

Our estimators are based on the following moment conditions:

\[ E[Y_i \mid D_i = 1, S] = W_i'\gamma_1, \qquad E[Y_i \mid D_i = 0, S] = W_i'\gamma_0. \]

Let us focus on the $\hat\gamma_1$ case, as the $\hat\gamma_0$ case can be analyzed in an analogous way. Given the moment condition $E[Y_i \mid D_i = 1, S] = W_i'\gamma_1$, we write the equation in error form as

\[ Y_i = W_i'\gamma_1 + \epsilon_i, \qquad E[\epsilon_i \mid D_i = 1, S] = 0. \]
Estimator for γ is deﬁned asˆ γ = arg min γ n n (cid:88) i =1 D i (cid:0) Y i − ˆ W (cid:48) i γ (cid:1) (40)= arg min γ n n (cid:88) i =1 (cid:0) D i Y i − D i ˆ W (cid:48) i γ (cid:1) (41)= (cid:110) n (cid:88) i =1 D i ˆ W i ˆ W (cid:48) i (cid:111) − n (cid:88) i =1 D i ˆ W i Y i (42)Note that D i Y i = D i Y i (1 , π ∗ i ( S, θ )) = D i ( W (cid:48) i γ + (cid:15) i ) = D i (cid:0) ˆ W (cid:48) i γ + (cid:15) i − ( ˆ W i − W i ) (cid:48) γ (cid:1) .Plugging this into 42 gives thatˆ γ = (cid:0) n (cid:88) i =1 D i ˆ W i ˆ W (cid:48) i (cid:1) − n (cid:88) i =1 D i ˆ W i (cid:0) ˆ W (cid:48) i γ + (cid:15) i − ( ˆ W i − W i ) (cid:48) γ (cid:1) = γ + (cid:0) n (cid:88) i =1 D i ˆ W i ˆ W (cid:48) i (cid:1) − (cid:88) i D i ˆ W i (cid:0) (cid:15) i − ( ˆ W i − W i ) (cid:48) γ (cid:1) so thatˆ γ − γ = (cid:0) n n (cid:88) i =1 D i ˆ W i ˆ W (cid:48) i (cid:124) (cid:123)(cid:122) (cid:125) A (cid:1) − n (cid:88) i D i ˆ W i (cid:0) (cid:15) i − ( ˆ W i − W i ) (cid:48) γ (cid:1)(cid:124) (cid:123)(cid:122) (cid:125) B = A − B . (43)42 art A We show that n (cid:80) i =1 D i ˆ W i ˆ W (cid:48) i − E [ n (cid:80) ni =1 D i W i W (cid:48) i | S ] = o p (1). Decompose n (cid:80) i =1 D i ˆ W i ˆ W (cid:48) i − E [ n (cid:80) ni =1 D i W i W (cid:48) i | S ] into two parts as follows:1 n (cid:88) i =1 D i ˆ W i ˆ W (cid:48) i − n n (cid:88) i =1 D i W i W (cid:48) i (cid:124) (cid:123)(cid:122) (cid:125) ( a ) + 1 n n (cid:88) i =1 D i W i W (cid:48) i − n n (cid:88) i =1 E [ D i W i W (cid:48) i | S ] (cid:124) (cid:123)(cid:122) (cid:125) ( b ) . ( a ) = o p (1) since ˆ θ − θ p −→ W i ( θ ) is continuous in θ . For ( b ), note that the sum-mand { D i W i W (cid:48) i − E [ D i W i W (cid:48) i | S ] } is conditionally independent given S with mean zero.It is also uniformly bounded. Therefore by LLN, ( b ) = o p (1). Finally, invertibility of E [ n (cid:80) ni =1 D i W i W (cid:48) i | S ] follows from the identiﬁcation condition. Part B

Since ˆ W i − W i = o p (1), we can write it B as1 n n (cid:88) i =1 D i ( W i + o p (1))( (cid:15) i − o p (1)) = 1 n n (cid:88) i =1 D i W i (cid:15) i Similar argument as above shows that1 n n (cid:88) i =1 (cid:16) D i W i (cid:15) i − E [ D i W i (cid:15) i | S ] (cid:17) = o p (1) . It follows from the moment condition E [ (cid:15) i | D i = 1 , S ] = 0 that E [ D i W i (cid:15) i | S ] = 0. There-fore we conclude that B = 1 n n (cid:88) i =1 D i W i (cid:15) i + o p (1) = o p (1) . Combining with the result on part A , we conclude that ˆ γ − γ = o p (1). (cid:4) B.4 Proof of asymptotic normality of second-stage estimators

From 43, √ n (ˆ γ − γ ) = (cid:16) n n (cid:88) i =1 D i ˆ W i ˆ W (cid:48) i (cid:17) − √ n (cid:88) i D i ˆ W i (cid:16) (cid:15) i − γ (cid:48) ( ˆ W i − W i ) (cid:17) (44)= (cid:16) E [ 1 n n (cid:88) i =1 D i W i W (cid:48) i | S ] + o p (1) (cid:17) − √ n (cid:88) i D i ˆ W i (cid:16) (cid:15) i − γ (cid:48) ( ˆ W i − W i ) (cid:17)(cid:124) (cid:123)(cid:122) (cid:125) C (45)43here the last step has been established in the previous section. Consider the term ˆ W i − W i in C . By mean-value theorem,ˆ W i − W i = W i (ˆ γ ) − W i ( γ ) = ∇ γ W i (¯ γ )(ˆ γ − γ )= ⇒ √ n ( ˆ W i − W i ) = ∇ γ W i (¯ γ ) √ n (ˆ γ − γ )where ¯ γ is a mean value of the line joining ˆ γ and γ . By the asymptotic normality of theﬁrst-step estimator ˆ θ as in the equation 30, we can show that √ n (ˆ θ − θ ) is asymptoticallylinear. Speciﬁcally, deﬁne the inﬂuence function as η i = E [ n (cid:80) ni =1 ∇ θ l i ( θ ) ∇ θ l i ( θ ) (cid:48) | S ] ∇ θ l i ( θ ),then √ n (ˆ θ − θ ) = 1 √ n n (cid:88) i =1 η i + o p (1) . Therefore the term C in √ n (ˆ γ − γ ) can be written as1 √ n (cid:88) i D i ˆ W i ( (cid:15) i − γ (cid:48) ( ˆ W i − W i )) = 1 √ n n (cid:88) i =1 D i ˆ W i (cid:15) i − n n (cid:88) i =1 D i ˆ W i γ (cid:48) √ n ( ˆ W i − W i )= 1 √ n n (cid:88) i =1 D i ˆ W i (cid:15) i (cid:124) (cid:123)(cid:122) (cid:125) C ( a ) − (cid:110) n n (cid:88) i =1 D i ˆ W i γ (cid:48) ∇ γ W i (¯ γ ) (cid:111)(cid:124) (cid:123)(cid:122) (cid:125) C ( b ) √ n n (cid:88) i =1 η i + o p (1)We ﬁrst show that C ( a ) can be replaced by √ n (cid:80) ni =1 D i W i (cid:15) i and that C ( b ) can bereplaced by E [ n (cid:80) ni =1 D i W i γ (cid:48) ∇ γ W i ( γ )]. Part C(a)

We show that 1 √ n n (cid:88) i =1 (cid:16) D i ˆ W i (cid:15) i − D i W i (cid:15) i (cid:17) p −→ √ n n (cid:88) i =1 D i ( ˆ W i − W i ) (cid:15) i = 1 √ n n (cid:88) i =1 D i ∇ γ W i (¯ γ )(ˆ γ − γ (cid:1) (cid:15) i (46)= 1 n n (cid:88) i =1 D i ∇ γ W i (¯ γ ) √ n (ˆ γ − γ ) (cid:15) i (47)= 1 n n (cid:88) i =1 D i ∇ γ W i (¯ γ ) (cid:0) √ n n (cid:88) i =1 η i (cid:1) (cid:15) i (48)= (cid:16) n n (cid:88) i =1 D i ∇ γ W i (¯ γ ) (cid:15) i (cid:17) √ n n (cid:88) i =1 η i (49)44t can be shown easily that n (cid:80) ni =1 (cid:16) D i ∇ γ W i ( γ ) (cid:15) i − E [ D i ∇ γ W i (¯ γ ) (cid:15) i | S ] (cid:17) p −→ E [ D i ∇ γ W i ( γ ) (cid:15) i | S ] = 0 from the moment condition. Therefore equation 49 becomes o p (1) × O p (1) and the result follows. (cid:4) Part C ( b ) We show that1 n n (cid:88) i =1 D i ˆ W i γ (cid:48) ∇ γ W i (¯ γ ) − E [ 1 n n (cid:88) i =1 D i W i γ (cid:48) ∇ γ W i ( γ ) | S ] = o p (1) . Decompose the LHS as1 n n (cid:88) i =1 D i ˆ W i γ (cid:48) ∇ γ W i (¯ γ ) − n n (cid:88) i =1 D i W i γ (cid:48) ∇ γ W i ( γ ) (cid:124) (cid:123)(cid:122) (cid:125) A + 1 n n (cid:88) i =1 D i W i γ (cid:48) ∇ γ W i ( γ ) − E [ 1 n n (cid:88) i =1 D i W i γ (cid:48) ∇ γ W i ( γ ) | S ] (cid:124) (cid:123)(cid:122) (cid:125) B .A = o p (1) since ˆ θ − θ p −→

0. Also, since { D i W i γ (cid:48) ∇ γ W i ( γ ) } are conditionally independentgiven S and uniformly bounded, we can apply Markov LLN to show that B = o p (1). (cid:4) Combining all the results, term C can be written as C = 1 √ n n (cid:88) i =1 D i W i (cid:15) i − E (cid:104) n n (cid:88) i =1 D i W i γ (cid:48) ∇ γ W i ( γ ) (cid:12)(cid:12)(cid:12) S (cid:105) √ n n (cid:88) i =1 η i + o p (1)= 1 √ n n (cid:88) i =1 (cid:110) D i W i (cid:15) i − E (cid:104) n n (cid:88) i =1 D i W i γ (cid:48) ∇ γ W i ( γ ) (cid:12)(cid:12)(cid:12) S (cid:105) η i (cid:111)(cid:124) (cid:123)(cid:122) (cid:125) ζ i . Since ζ i | S has a mean zero and is independently distributed, we can apply CLT forthe independent observation and get Ψ − / n √ n (cid:80) ni =1 ζ i d −→ N (0 , I dim ( γ ) ) where Ψ n = n (cid:80) ni =1 E [ ζ i ζ (cid:48) i | S ] which can be simpliﬁed as1 n n (cid:88) i =1 E [ D i W i W (cid:48) i (cid:15) i ] + E (cid:104) n n (cid:88) i =1 D i W i γ (cid:48) ∇ γ W i ( γ ) (cid:12)(cid:12)(cid:12) S (cid:105) n n (cid:88) i =1 E [ η i η (cid:48) i | S ] E (cid:104) n n (cid:88) i =1 D i W i γ (cid:48) ∇ γ W i ( γ ) (cid:12)(cid:12)(cid:12) S (cid:105) (cid:48) as the cross-terms get crossed out due to E [ (cid:15) i η (cid:48) i | S ] = 0, i.e., the ﬁrst- and second-stagemoments are uncorrelated. Finally, from 45, and by deﬁning Υ n = E [ n (cid:80) ni =1 W i W (cid:48) i | S ],45e have Λ − / n √ n (ˆ γ − γ ) d −→ N (0 , I dim ( γ ) )for Λ n = Υ − n Ψ n Υ − n as desired. (cid:4) C Auxiliary Lemmas

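Lemma 1 (uniform boundedness of $\sigma_i^*(S, \theta)$). There exists a constant $C \in (0, 1)$ such that $\sigma_i^*(S, \theta) \ge C$ for any $i, S, \theta$ and $n$.

(Proof) As in Appendix A, let us define the agent's best-response function as $\Gamma(X_i, Z_i, \bar\sigma_i, \theta) = \Phi(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \bar\sigma_i)$. Recall that $\sigma_i^*(S, \theta) = \Phi(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \pi_i^*(S, \theta))$. The result follows since $X_i$ is bounded, $Z_i$ is binary, and $\pi_i^*(S, \theta) \le 1$. ■

Lemma 2 (uniform boundedness of $\nabla \sigma_i$). Suppose $\lambda < 1$. There exists a finite constant $C$ such that

\[ \sup_{i, n, S, \theta, k} \left|\frac{\partial \sigma_i^*(S, \theta)}{\partial \theta_k}\right| < C < \infty. \]

(Proof) Recall that $\sigma_i^*(S, \theta) = \Gamma(X_i, Z_i, \bar\sigma_i^*(S, \theta), \theta)$. Differentiating this equation with respect to $\theta_k$ gives

\[ \frac{\partial \sigma_i^*(S, \theta)}{\partial \theta_k} = \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \theta_k} + \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \bar\sigma_i^*} \frac{\partial \bar\sigma_i^*(S, \theta)}{\partial \theta_k}. \]

Equivalently,

\[ \frac{\partial \sigma_i^*(S, \theta)}{\partial \theta_k} = \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \theta_k} + \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \bar\sigma_i^*} \frac{\partial \sigma_j^*(S, \theta)}{\partial \theta_k}, \tag{50} \]

which implicitly defines $[\partial \sigma_i^*(S, \theta)/\partial \theta_k]_{i \in \mathcal{N}_n}$. Let us write 50 in matrix form by defining the following:

• Let $\chi_n$ be the $n \times 1$ vector with $i$th component $\partial \sigma_i^*(S, \theta)/\partial \theta_k$.
• Let $D_n$ be the $n \times n$ matrix whose $ij$th element is $\frac{1}{|\mathcal{N}_i|} \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \bar\sigma_i^*}$ if $G_{ij} = 1$ and zero if $G_{ij} = 0$.
• Let $\tau_n$ be the $n \times 1$ vector with $i$th component $\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)/\partial \theta_k$.

Then we can write the system 50 as $\chi_n = D_n \chi_n + \tau_n$ or, equivalently, $(I_n - D_n)\chi_n = \tau_n$, where $I_n - D_n$ is invertible if $\|D_n\|_\infty < 1$. Here $\|D_n\|_\infty$ is the maximum of the absolute values of row sums, i.e.,

\[ \|D_n\|_\infty = \max_{i \in \mathcal{N}_n} \left|\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \bar\sigma_i^*}\right|. \]

37 implies that $\|D_n\|_\infty \le \lambda$, thus $\|D_n\|_\infty < 1$. Therefore $I_n - D_n$ is invertible and $(I_n - D_n)^{-1} = \sum_{t=0}^{\infty} D_n^t$. It follows that $\chi_n = (\sum_{t=0}^{\infty} D_n^t)\tau_n$. Taking the sup norm gives

\[ \|\chi_n\|_\infty \le \sum_{t=0}^{\infty} \|D_n^t\|_\infty \|\tau_n\|_\infty \le \sum_{t=0}^{\infty} \lambda^t \|\tau_n\|_\infty = \frac{\|\tau_n\|_\infty}{1 - \lambda} < \frac{C_\tau}{1 - \lambda}. \]

Since the RHS does not depend on $(i, n, z_n, \theta, k)$, we have the desired result. ■

Lemma 3 (uniform boundedness of $\nabla^2 \sigma_i$). Suppose $\lambda < 1$. There exists a finite constant $C$ such that

\[ \left|\frac{\partial^2 \sigma_i^*(S, \theta)}{\partial \theta_m \partial \theta_k}\right| < C < \infty \]

for any $i, n, S, \theta, k, m$ a.s.

(Proof) Fix $m$. Differentiating equation 50 with respect to $\theta_m$ gives

\[ \frac{\partial^2 \sigma_i}{\partial \theta_m \partial \theta_k} = \frac{\partial^2 \Gamma}{\partial \theta_m \partial \theta_k} + \frac{\partial^2 \Gamma}{\partial \bar\sigma_i \partial \theta_k} \frac{\partial \bar\sigma_i}{\partial \theta_m} + \frac{\partial \Gamma}{\partial \bar\sigma_i} \frac{\partial^2 \bar\sigma_i}{\partial \theta_m \partial \theta_k} + \frac{\partial \bar\sigma_i}{\partial \theta_k} \Big\{\frac{\partial^2 \Gamma}{\partial \bar\sigma_i^2} \frac{\partial \bar\sigma_i}{\partial \theta_m} + \frac{\partial^2 \Gamma}{\partial \theta_m \partial \bar\sigma_i}\Big\}. \]

Let us write it compactly as follows:

\[ \partial_{mk} \sigma_i = \Gamma_{mk} + \Gamma_{\bar\sigma k}\, \partial_m \bar\sigma_i + \Gamma_{\bar\sigma}\, \partial_{mk} \bar\sigma_i + \Gamma_{\bar\sigma\bar\sigma}\, \partial_k \bar\sigma_i\, \partial_m \bar\sigma_i + \Gamma_{\bar\sigma m}\, \partial_k \bar\sigma_i. \tag{51} \]

Write 51 in matrix form by defining the following:

• Let $\tilde\chi_n$ be the $n \times 1$ vector with $i$th component $\partial_{mk} \sigma_i$.
• Let $\tilde\tau_n$ be the $n \times 1$ vector with $i$th component $\Gamma_{mk} + \Gamma_{\bar\sigma k}\, \partial_m \bar\sigma_i + \Gamma_{\bar\sigma\bar\sigma}\, \partial_k \bar\sigma_i\, \partial_m \bar\sigma_i + \Gamma_{\bar\sigma m}\, \partial_k \bar\sigma_i$.

Then 51 can be written as $(I_n - D_n)\tilde\chi_n = \tilde\tau_n$. As we have shown before, $I_n - D_n$ is invertible. For any $i \in \mathcal{N}_n$, $|\tilde\tau_i| \le B_{\theta,\theta} + 2 B_{\bar\sigma\theta} C_{\partial\sigma} + B_{\bar\sigma,\bar\sigma} C_{\partial\sigma}^2$, so that $\|\tilde\tau_n\|_\infty = \max_i |\tilde\tau_i|$ is uniformly bounded. Therefore $\|\tilde\chi_n\|_\infty \le \frac{C_{\tilde\tau}}{1 - \lambda}$ and the result follows. ■

References

Donald W. K. Andrews. Generic uniform convergence.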

Econometric Theory , 8(2):241–257,1992.Sarah Baird, J. Aislinn Bohren, Craig McIntosh, and Berk ¨Ozler. Optimal design ofexperiments in the presence of interference.

The Review of Economics and Statistics ,(5):844–860, 2018.Patrick Bajari, Han Hong, John Krainer, and Denis Nekipelov. Estimating static models ofstrategic interactions.

Journal of Business & Economic Statistics , 28(4):469–482, 2010.doi: 10.1198/jbes.2009.07264. URL https://doi.org/10.1198/jbes.2009.07264 .Jorge Balat and Sukjin Han. Multiple treatments with strategic interaction. arXiv, 2019.Christian N. Brinch, Magne Mogstad, and Matthew Wiswall. Beyond late with a discreteinstrument.

Journal of Political Economy , 125(4):985–1039, 2017. doi: 10.1086/692712.URL https://doi.org/10.1086/692712 .William Brock and Steven Durlauf. Identiﬁcation of binary choice models with socialinteractions.

Journal of Econometrics , 140(1):52–75, 2007. URL https://EconPapers.repec.org/RePEc:eee:econom:v:140:y:2007:i:1:p:52-75 .William A. Brock and Steven N. Durlauf. Discrete Choice with Social Interactions.

TheReview of Economic Studies , 68(2):235–260, 04 2001. ISSN 0034-6527. doi: 10.1111/1467-937X.00168. URL https://doi.org/10.1111/1467-937X.00168 .Pedro Carneiro, James J. Heckman, and Edward J. Vytlacil. Estimating marginal returnsto education.

American Economic Review , 101(6):2754–81, October 2011. doi: 10.1257/aer.101.6.2754. URL .Bruno Cr´epon, Esther Duﬂo, Marc Gurgand, Roland Rathelot, and Philippe Zamora. DoLabor Market Policies have Displacement Eﬀects? Evidence from a Clustered Random-ized Experiment *.

The Quarterly Journal of Economics , 128(2):531–580, 04 2013. ISSN0033-5533. doi: 10.1093/qje/qjt001. URL https://doi.org/10.1093/qje/qjt001 .Pascaline Dupas. Short-run subsidies and long-run adoption of new health products:Evidence from a ﬁeld experiment.

Econometrica , 82(1):197–228, 2014. doi: https:49/doi.org/10.3982/ECTA9508. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA9508 .Marc Ferracci, Gr´egory Jolivet, and Gerard J. van den Berg. Evidence of treatmentspillovers within markets.

The Review of Economics and Statistics , 95(5):812–823,2014.A. Gallant and H. White.

A Uniﬁed Theory of Estimation and Inference for NonlinearDynamic Models . Oxford: Basil Blackwell, 1988.Susan Godlonton and Rebecca Thornton. Peer eﬀects in learning hiv results.

Journalof Development Economics , 97(1):118 – 129, 2012. ISSN 0304-3878. doi: https://doi.org/10.1016/j.jdeveco.2010.12.003. URL .Jinyong Hahn and Geert Ridder. Conditional moment restrictions and triangular simul-taneous equations.

The Review of Economics and Statistics , 93(2):683–689, 2011.James J. Heckman. Sample selection bias as a speciﬁcation error.

Econometrica , 47(1):153–161, 1979.James J. Heckman. Micro data, heterogeneity, and the evaluation of public policy: Nobellecture.

Journal of Political Economy , 109(4):673–748, 2001.James J. Heckman and Edward Vytlacil. Policy-relevant treatment eﬀects.

AmericanEconomic Review , 91(2):107–111, May 2001. doi: 10.1257/aer.91.2.107. URL .James J Heckman, Sergio Urzua, and Edward Vytlacil. Understanding instrumental vari-ables in models with essential heterogeneity.

The Review of Economics and Statistics , 88(3):389–432, 2006. doi: 10.1162/rest.88.3.389. URL https://doi.org/10.1162/rest.88.3.389 .Michael G Hudgens and M. Elizabeth Halloran. Toward causal inference with interference.

Journal of the American Statistical Association , 103(482):832–842, 2008. doi: 10.1198/016214508000000292. URL https://doi.org/10.1198/016214508000000292 . PMID:19081744. 50osuke Imai, Zhichao Jiang, and Anup Malani. Causal inference with interferenceand noncompliance in two-stage randomized experiments.

Journal of the AmericanStatistical Association , 0(0):1–13, 2020. doi: 10.1080/01621459.2020.1775612. URL https://doi.org/10.1080/01621459.2020.1775612 .Guido W. Imbens.

Nonadditive Models with Endogenous Regressors , volume 3 of

Econo-metric Society Monographs , pages 17–46. Cambridge University Press, advances ineconomics and econometrics: theory and applications, ninth world congress edition,2007.Guido W. Imbens and Joshua D. Angrist. Identiﬁcation and estimation of local averagetreatment eﬀects.

Econometrica, 62:467–475, 1994.

Matthew O. Jackson, Zhongjian Lin, and Ning Neil Yu. Adjusting for peer-influence in propensity scoring when estimating treatment effects, 2020.

Brendan Kline and Elie Tamer. Chapter 7 - Econometric analysis of models with social interactions. In Bryan Graham and Áureo de Paula, editors, The Econometric Analysis of Network Data, pages 149–181. Academic Press, 2020. ISBN 978-0-12-811771-2. doi: https://doi.org/10.1016/B978-0-12-811771-2.00013-4. (Some of this chapter had been previously distributed as “The empirical content of models with social interactions” and “Some interpretation of the linear-in-means model of social interactions” by the same authors.)

Natalia Lazzati. Treatment response with social interactions: Partial identification via monotone comparative statics.
Quantitative Economics, 6(1):49–83, 2015. doi: https://doi.org/10.3982/QE308. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/QE308.

Lung-Fei Lee. Tests for the bivariate normal distribution in econometric models with selectivity. Econometrica, 52(4):843–863, 1984.

Michael P. Leung. Two-step estimation of network-formation models with incomplete information. Journal of Econometrics, 188(1):182–195, 2015. ISSN 0304-4076. doi: https://doi.org/10.1016/j.jeconom.2015.04.001.

Michael P. Leung. Treatment and spillover effects under network interference.
The Review of Economics and Statistics, 102(2):368–380, 2020a.

Michael P. Leung. Causal inference under approximate neighborhood interference. arXiv, 2020b.

Charles F. Manski. Identification of treatment response with social interactions. The Econometrics Journal, 16(1):S1–S23, 2013. doi: https://doi.org/10.1111/j.1368-423X.2012.00368.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1368-423X.2012.00368.x.

Matthew A. Masten and Alexander Torgovitsky. Identification of instrumental variable correlated random coefficients models. The Review of Economics and Statistics, 98(5):1001–1005, 2016.

Daniel McFadden. Econometric analysis of qualitative response models. In Z. Griliches† and M. D. Intriligator, editors, Handbook of Econometrics, volume 2, chapter 24, pages 1395–1457. Elsevier, 1 edition, 1984. URL https://EconPapers.repec.org/RePEc:eee:ecochp:2-24.

Edward Miguel and Michael Kremer. Worms: Identifying impacts on education and health in the presence of treatment externalities. Econometrica, 72(1):159–217, 2004. doi: https://doi.org/10.1111/j.1468-0262.2004.00481.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-0262.2004.00481.x.

Geert Ridder and Shuyang Sheng. Estimation of large network formation games. arXiv, 2020.

D. B. Rubin. Comments on “On the application of probability theory to agricultural experiments. Essay on principles. Section 9” by J. Splawa-Neyman translated from the Polish and edited by D. M. Dabrowska and T. P. Speed.
Statistical Science, 5:472–480, 1990.

Gonzalo Vazquez-Bare. Causal spillover effects using instrumental variables. arXiv, 2020.

Jeffrey M. Wooldridge. Further results on instrumental variables estimation of average treatment effects in the correlated random coefficient model. Economics Letters, 79(2):185–191, 2003. ISSN 0165-1765. doi: https://doi.org/10.1016/S0165-1765(02)00318-X.

Haiqing Xu. Social interactions in large networks: A game theoretic approach. International Economic Review, 59(1):257–284, 2018. doi: https://doi.org/10.1111/iere.12269. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/iere.12269.