Analysis of Randomized Experiments with Network Interference and Noncompliance
Bora Kim
December 29, 2020
Abstract
Randomized experiments have become a standard tool in economics. In analyzing randomized experiments, the traditional approach has been based on the Stable Unit Treatment Value Assumption (SUTVA; Rubin (1990)), which dictates that there is no interference between individuals. However, SUTVA fails to hold in many applications due to social interaction, general equilibrium, and/or externality effects. While much progress has been made in relaxing SUTVA, most of this literature has only considered settings with perfect compliance to treatment assignment. In practice, however, noncompliance occurs frequently: the actual treatment receipt differs from the treatment assignment. In this paper, we study causal effects in randomized experiments with network interference and noncompliance. Spillovers are allowed to occur at both the treatment choice stage and the outcome realization stage. In particular, we explicitly model treatment choices of agents as a binary game of incomplete information in which the resulting equilibrium treatment choice probabilities affect the outcomes of interest. Outcomes are further characterized by a random coefficient model to allow for general unobserved heterogeneity in the causal effects. After defining our causal parameters of interest, we propose a simple control function estimator and derive its asymptotic properties under large-network asymptotics. We apply our methods to the randomized subsidy program of Dupas (2014), where we find evidence of spillover effects on both short-run and long-run adoption of insecticide-treated bed nets. Finally, we illustrate the usefulness of our methods by analyzing the impact of counterfactual subsidy policies.
Keywords: causal inference, interference, spillover, networks, games of incomplete information, control function

1 Introduction
Randomized experiments have become a standard tool for causal inference in economics. In analyzing randomized experiments, the traditional approach is based on the Stable Unit Treatment Value Assumption (SUTVA; Rubin (1990)), which dictates that there is no interference between individuals. However, there are many settings where SUTVA fails to hold. For instance, a deworming treatment given to some students may affect the academic achievement of other students through externality effects (see, for instance, Miguel and Kremer (2004)). In the labor market, Crépon et al. (2013) show that a large-scale job placement program affects non-participants' employment probability through general equilibrium effects. Ferracci et al. (2014) report similar results. In such cases, there is interference, or a spillover effect, where an individual's behavior either directly or indirectly affects others' outcomes through social interactions, externalities, or general equilibrium effects.

In recent years, there has been substantial progress in relaxing SUTVA in the causal inference framework. Examples include Manski (2013), Hudgens and Halloran (2008), Leung (2020a), Vazquez-Bare (2020), and Baird et al. (2018). Much of the literature, however, has been built on the restrictive assumption of perfect compliance, in which experimental units perfectly comply with their treatment assignment. In practice, noncompliance is common: some units assigned to the treatment group may opt out of the treatment, while some units assigned to the control group may decide to take the treatment. In studies of the labor market, for example, Crépon et al. (2013) report that only 35% of those who were offered intensive job counseling actually took up the offer.
While instrumental variables (IV) methods are widely used to address the noncompliance problem, these methods were developed under an assumption that rules out interference between units (Imbens and Angrist (1994)).

The goal of this paper is to develop a formal framework for causal inference in randomized experiments with both spillovers and noncompliance. In the presence of noncompliance, spillovers can occur at two stages: the treatment decision stage and the outcome realization stage. In the first stage, in which each agent chooses their treatment status, spillovers may occur if the utility from choosing treatment depends on the treatment choices of others. In the second stage, where outcomes (or responses) are realized, an agent's outcome can be affected not only by their own treatment choice but also by the treatment choices of others, either directly or indirectly. While most of the existing literature has only addressed spillover effects at the outcome level (i.e., at the second stage), we allow for spillover effects both at the treatment choice stage (first stage) and at the outcome stage (second stage).

To model spillovers, we take a game-theoretic approach. We consider a first-stage model in which agents play a binary game of incomplete information. Such binary games of incomplete information have been used in various economic applications, e.g., in the empirical industrial organization literature (Bajari et al. (2010)), to model binary choices under peer effects (Brock and Durlauf (2001), Brock and Durlauf (2007), and Xu (2018)), and, recently, to model the network formation process (Leung (2015) and Ridder and Sheng (2020)). We apply the method to the problem of endogenous treatment choices in the presence of spillovers. Specifically, we assume that agents simultaneously choose their treatment status so as to maximize their expected utilities, given beliefs about the anticipated treatment choices of their neighbors.
In equilibrium, agents' subjective beliefs coincide with objective choice probabilities. Assuming that a unique equilibrium exists, the reduced-form model of an agent's treatment choice can be written as a single threshold-crossing model where the threshold is a function of the agent's own treatment assignment and the average equilibrium treatment choice probability of their neighbors. In the second stage, outcomes are modeled as a function of the agent's own treatment choice and the equilibrium average treatment choice probability of their neighbors, as determined in the first-stage game. As in the first-stage choice model, spillovers are captured by the equilibrium treatment choice probabilities.

In our model, therefore, equilibrium treatment choice probabilities work as a mediator of spillover effects. This differs from the existing literature, which often models spillovers at the outcome level through the proportion of treated neighbors; see, for instance, Hudgens and Halloran (2008), Leung (2020a), and Vazquez-Bare (2020). As we show later, when the outcome of interest represents a choice or behavior of individuals, their formulation implicitly assumes that the proportion of treated neighbors is fully observable to agents, i.e., agents possess complete information about the behavior of their peers. However, the assumption of complete information is unrealistic, especially in a single large network setting such as ours, where each individual has a considerable number of peers. (In our application, for instance, agents have 17 neighbors on average.) In such cases, it is more reasonable to assume that agents face uncertainty over others' behavior, making an incomplete information framework a more adequate approximation of reality.

We then characterize outcomes by a random coefficient model to allow for general unobserved heterogeneity. Our parameters of interest are average causal effects, which include an average direct effect of own treatment take-up and an average spillover effect from direct neighbors. After rigorously defining our parameters of interest, we present our identification result. We first note that, under general unobserved heterogeneity in the outcome, conventional instrumental variables (IV) methods do not identify the causal parameters. We therefore propose an alternative identification strategy based on a control function approach.

We then propose a simple two-step estimator: the first step estimates the payoff parameters of the treatment choice game using nested fixed-point maximum likelihood estimation, and the second step estimates the average potential outcome functions using a control function regression. Our estimator extends the canonical Heckman (1979) sample selection estimator ("Heckit") to incorporate possible spillover effects. We show that the estimators are $\sqrt{n}$-consistent and asymptotically normal under "large-network" asymptotics, in which the number of individuals connected in a single network increases to infinity. We study the finite-sample properties of our estimators through Monte Carlo simulation.

Our methods are applied to the randomized subsidy program of Dupas (2014). While the use of insecticide-treated nets (ITNs) has been shown to be effective in controlling malaria, the rate of adoption remains low. Given that mosquito nets need to be re-purchased and replaced regularly, understanding the factors affecting households' short-run and long-run decisions to purchase a bednet is an important step toward achieving a sufficiently high equilibrium adoption rate.
In our application, we study the effect of short-run purchase of the bednet on the long-run purchase decision while incorporating possible spillovers from neighbors defined by geographical proximity. The treatment is a binary indicator for purchasing a mosquito net in the short run (in Phase 1), and the outcome is a binary indicator for purchasing a mosquito net in the long run (in Phase 2). We find evidence of positive spillover effects in the short-run bednet purchase decision. More specifically, in Phase 1, households were more likely to purchase the bednet when the average expected purchase rate of their neighbors was higher. In contrast, we find evidence of negative spillover effects in the long run, although the statistical power is limited. Specifically, households were less likely to purchase the bednet in Phase 2 when the average expected purchase rate in Phase 1 was higher. Our results also suggest that the average direct effect of the bednet purchase in Phase 1 on the purchase in Phase 2 declines monotonically with the expected neighborhood purchase rate in Phase 1. When the Phase-1 neighborhood purchase rate was 0% (no spillover), households who purchased the bednet in Phase 1 were 36.9 percentage points more likely to purchase the bednet in Phase 2 than those who did not purchase it in Phase 1. This effect falls to almost zero at the other extreme, where the neighborhood purchase rate was 100% (full spillover). Ignoring spillover effects leads to the misleading conclusion that the average direct effect of the short-run purchase on the long-run purchase is almost zero when, in fact, the effect varies from roughly 0 to 36 percentage points depending on the degree of spillovers.

Our structural modeling allows researchers to analyze the impact of counterfactual policies on the outcome of interest.
We illustrate this by analyzing the impact of a counterfactual subsidy program on long-run adoption, in which a policy-maker implements a means-tested subsidy rule where the subsidy is given only when the household's income level is below some pre-specified threshold. We predict the average long-run adoption rate under different subsidy regimes defined by different values of the eligibility threshold. We find that even under a very generous subsidy regime where almost everyone in the sample receives the subsidy, the average long-run adoption rate does not exceed 20%, due to the large negative spillover in the long run.

Related Literature
Recent work on causal inference under spillovers mainly concentrates on the case with random treatment, i.e., it does not address treatment choice endogeneity. Examples include Hudgens and Halloran (2008), Leung (2020a), and Vazquez-Bare (2020).

In the causal inference literature, game-theoretic models have been used in several papers. Lazzati (2015) proposes a structural model of treatment responses using games of complete information. However, that paper does not address the endogeneity of treatment choices. Balat and Han (2019) allow spillovers at both the choice and outcome stages using a game-theoretic approach. Their model differs from ours in that they model treatment choice by a binary game of complete (perfect) information. Also, Balat and Han (2019) consider interactions within groups, while we consider interactions in a general network. While the assumption of complete information may be appropriate for interactions in a relatively small group, the incomplete information assumption is more reasonable for network interactions, especially when the network is large. Jackson et al. (2020) model treatment choices as a binary game of incomplete information. However, they do not consider spillovers at the outcome level, while we are interested in separately identifying the individual treatment effect and the spillover effect.

Meanwhile, a literature in statistics has started to incorporate spillovers and noncompliance in network settings; see Imai et al. (2020) for recent progress. Unlike our game-theoretic model, their model is reduced-form in nature and, consequently, important aspects of the economic mechanism behind treatment choices, such as utility maximization, are largely ignored.
Outline
We describe our model in Section 2. We first outline our model of treatment choices and then the model of potential outcomes. Parameters of interest are also discussed. Section 3 discusses identification of the parameters of interest. We first show that conventional IV methods are not valid in the presence of treatment effect heterogeneity. We then show how to use a control function approach to achieve point identification. In Section 4, we propose a simple two-stage estimation procedure. Asymptotic properties are derived and simulation results are presented. Section 5 applies our methods to an empirical setting.
In this section, we first describe our treatment choice model as a binary game of incomplete information. We then describe our model of treatment responses under spillovers.

Let $\mathcal{N}_n = \{1, \dots, n\}$ denote the set of agents. The $n$ agents are connected through a single, large network. Let $G$ be a symmetric $n \times n$ adjacency matrix whose $ij$th entry $G_{ij}$ represents a connection, or link, between agents $i$ and $j$. Specifically, $G_{ij} = 1$ if agents $i$ and $j$ are connected and $G_{ij} = 0$ otherwise. We assume $G_{ii} = 0$ for all $i \in \mathcal{N}_n$ (no self-links). When $G_{ij} = 1$, we say that $i$ and $j$ are (direct) peers or neighbors. Let $\mathcal{N}_i$ be the set of $i$'s peers, i.e., $\mathcal{N}_i = \{ j \in \mathcal{N}_n : G_{ij} = 1 \}$. The number of $i$'s neighbors, or the degree of $i$, is denoted by $|\mathcal{N}_i|$.

We consider a game-theoretic model of treatment choice. Specifically, we characterize a realized treatment choice as a solution to a binary game of incomplete information played by agents in a given network. In this framework, agents simultaneously choose their treatment status in order to maximize their expected utility, given beliefs about the anticipated behavior of their peers.
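The network notation introduced at the start of this section can be illustrated with a minimal Python sketch. The four-agent matrix `G` and the helper names `neighbor_sets` and `is_valid_adjacency` are hypothetical and chosen only for illustration; agents are indexed from 0 here rather than from 1 as in the text.

```python
# Hypothetical 4-agent network (illustrative values, not from the paper's data).
# G is symmetric with a zero diagonal, i.e., undirected links and no self-links.
G = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def is_valid_adjacency(G):
    """Check the two restrictions placed on G: symmetry and G_ii = 0."""
    n = len(G)
    symmetric = all(G[i][j] == G[j][i] for i in range(n) for j in range(n))
    no_self_links = all(G[i][i] == 0 for i in range(n))
    return symmetric and no_self_links


def neighbor_sets(G):
    """Return the list of peer sets N_i = {j : G_ij = 1}, one per agent i."""
    n = len(G)
    return [{j for j in range(n) if G[i][j] == 1} for i in range(n)]


neighbors = neighbor_sets(G)
degrees = [len(N_i) for N_i in neighbors]  # |N_i| for each agent
```

Representing each $\mathcal{N}_i$ as a set keeps membership checks cheap; for a large sparse network, an adjacency-list or sparse-matrix representation would likely be preferable.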
Utility
Each agent $i$ has a vector of observed characteristics $X_i \in \mathcal{X}$ and an unobserved utility shock $v_i \in \mathbb{R}$. Throughout the paper, we assume that $\mathcal{X}$ is a bounded subset of $\mathbb{R}^k$. In addition, each $i$ is randomly assigned to treatment. Let $Z_i \in \{0,1\}$ represent $i$'s randomized treatment assignment, where $Z_i = 1$ if $i$ is assigned to treatment and $Z_i = 0$ if $i$ is assigned to control. Let $Z = (Z_i)_{i \in \mathcal{N}_n}$ and $X = (X_i)_{i \in \mathcal{N}_n}$, and let $D = (D_i)_{i \in \mathcal{N}_n}$ collect the treatments actually received. There is noncompliance if $Z \neq D$, i.e., for some $i$, the treatment assignment differs from the treatment actually received. There are two possible cases: $(Z_i, D_i) = (1, 0)$ and $(Z_i, D_i) = (0, 1)$. The former indicates that $i$, who was assigned to the treatment group, has refused to take the treatment. The latter indicates that $i$ has received the treatment even though $i$ was assigned to the control group. In this paper, we allow for both cases, i.e., we consider a setting with two-sided noncompliance.

Unlike $Z_i$, $D_i$ is self-selected. We assume that each $i$ chooses $D_i \in \{0,1\}$ by utility maximization, where the utility that $i$ receives depends on the choices of $i$'s peers. Let the utility function of agent $i$ be $\pi(D_i, D_{-i}, X_i, Z_i, v_i)$, where $D_{-i} \in \{0,1\}^{n-1}$ is the vector of treatment choices of all agents except $i$. We specify the utility function as the following linear model:

$$
\pi(D_i, D_{-i}, X_i, Z_i, v_i) =
\begin{cases}
X_i'\theta_1 + \theta_2 Z_i + \dfrac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} D_j - v_i & \text{if } D_i = 1, \\
0 & \text{if } D_i = 0.
\end{cases}
\tag{1}
$$

First note that the utility from choosing $D_i = 0$ is normalized to zero. This is without loss of generality, as only the difference in utilities is identified. The utility of choosing $D_i = 1$ depends on other agents' treatment choices through the term $\sum_{j \in \mathcal{N}_i} D_j / |\mathcal{N}_i|$, the fraction of peers taking up the treatment. This term represents social interactions, or spillover effects, in treatment choice. When $\theta_3 = 0$, there are no spillovers and the model becomes a usual single-agent binary choice model as in McFadden (1984). When $\theta_3 > 0$, we have positive spillovers: the utility of choosing $D_i = 1$ is higher when members of $i$'s reference group (direct neighbors in our specification) behave similarly. Conversely, when $\theta_3 < 0$, there are negative spillovers in treatment choice.

We assume that $v_i$ is private information, i.e., $v_i$ is known only to $i$, and other agents cannot observe $v_i$. Therefore agents have incomplete information about others' choices. In other words, $i$ cannot observe other players' treatment choices at the time their own choice is made. Instead, each agent $i$ chooses the action that maximizes their expected utility given their beliefs about $\sum_{j \in \mathcal{N}_i} D_j / |\mathcal{N}_i|$. Beliefs are formed under the information set available to $i$. Let $\tau_i$ denote $i$'s information set. We specify $\tau_i$ as follows:

Assumption 1 (informational structure). Let $G = (G_{ij})_{i,j \in \mathcal{N}_n}$, $X = (X_i)_{i \in \mathcal{N}_n}$ and $Z = (Z_i)_{i \in \mathcal{N}_n}$. We assume that $(G, X, Z)$ is public information, i.e., every agent knows the entire network structure ($G$), the vector of observed characteristics ($X$) and the vector of treatment assignments ($Z$). On the other hand, $v_i$ is private information of $i$: its value is known only to $i$. Therefore $\tau_i = (G, X, Z, v_i)$ summarizes the information available to $i$.

Assumption 1 is standard in the literature on games of incomplete information. Let $S = (G, X, Z)$ be the set of public information; this is often called the public state variable. For the private information $v_i$, we make the following assumption:

Assumption 2 (unobserved heterogeneity). For all $i \in \mathcal{N}_n$, the private information $v_i$ is
(i) i.i.d. with standard normal cdf $\Phi$, and
(ii) independent of $S$.

As in standard single-agent binary choice models, the distribution of $v_i$ must be known up to a finite-dimensional parameter. We use the normal distribution only for convenience; other distributional assumptions, such as the logistic distribution, can be used as well. The assumption that the $v_i$'s are independent of each other is critical for our identification analysis. This assumption implies that knowledge of $v_i$ does not help in predicting $v_j$ for any $j \neq i$.
To our knowledge, identification of incomplete information games with correlated private information in a general network setting is an open question. Assumption 2 (ii) is trivially satisfied if we treat $S$ as fixed. Consequently, we do not address the issue of network endogeneity, as it is not a focus of this paper.

Strategy
Let $D_i(\tau_i, \theta)$ denote $i$'s pure strategy, which maps $i$'s information set $\tau_i = (S, v_i)$ to a treatment choice $D_i \in \{0,1\}$ given a parameter value $\theta = (\theta_1, \theta_2, \theta_3)$. Agent $i$ chooses their optimal action by maximizing their expected utility $E[\pi(D_i, D_{-i}, X_i, Z_i, v_i) \mid \tau_i]$, where the expectation is taken with respect to $D_{-i}$ given $i$'s belief about $D_{-i}$. Let $\sigma_{j,i}$ be $i$'s belief over the event $\{D_j = 1\}$ given the information $\tau_i$. Then

\begin{align}
\sigma_{j,i} &\overset{\text{def}}{=} \Pr(D_j = 1 \mid \tau_i) \tag{2} \\
&= \Pr(D_j(\tau_j, \theta) = 1 \mid \tau_i) \tag{3} \\
&= \Pr(D_j(S, v_j, \theta) = 1 \mid S, v_i) \tag{4} \\
&= \Pr(D_j(S, v_j, \theta) = 1) \tag{5} \\
&= \sigma_j(S, \theta), \tag{6}
\end{align}

where the fourth equality follows from Assumption 2. From the last equality, we see that $\sigma_{j,i} = \sigma_j$ for all $i \neq j$, i.e., every agent shares a common belief about $j$'s choice. Under rational expectations, this common belief should be consistent with the actual probability that $j$ chooses $D_j = 1$, as we show below.

Equilibrium
Given the belief profile $\{\sigma_j(S, \theta)\}_{j \neq i}$, agent $i$ calculates the expected utility from choosing $D_i = 1$ as follows:

\begin{align}
E\big[\pi(1, D_{-i}, X_i, Z_i, v_i) \mid \tau_i\big]
&= E\Big[X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} D_j - v_i \,\Big|\, S, v_i\Big] \tag{7} \\
&= X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \underbrace{\Pr(D_j = 1 \mid S)}_{= \sigma_j(S, \theta)} - v_i \tag{8} \\
&= X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j(S, \theta) - v_i. \tag{9}
\end{align}

Agent $i$ chooses $D_i = 1$ if $E\big[\pi(1, D_{-i}, X_i, Z_i, v_i) \mid \tau_i\big] \geq 0$. Therefore,

$$
D_i = \mathbf{1}\Big\{ v_i \leq X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j(S, \theta) \Big\}.
$$

A Bayes-Nash equilibrium (BNE) is defined by a vector of choice probabilities $\sigma^*(S, \theta) = \big(\sigma_i^*(S, \theta)\big)_{i \in \mathcal{N}_n}$ that is consistent with the observed decision rule in the sense that it satisfies the following system of equations:

\begin{align}
\sigma_i^*(S, \theta) &= \Pr\Big( v_i \leq X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) \Big), \quad \forall i \in \mathcal{N}_n \tag{10} \\
&= \Phi\Big( X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) \Big), \quad \forall i \in \mathcal{N}_n. \tag{11}
\end{align}

Here we use the superscript $*$ to emphasize that $\sigma^*(S, \theta)$ is an equilibrium quantity. In other words, a Bayes-Nash equilibrium given $(S, \theta)$ is a vector $\sigma^*(S, \theta)$ defined as a fixed point of the system of equations above. The existence of a fixed point is guaranteed by Brouwer's fixed point theorem for any realized data $S$ and parameter value $\theta$, because the right-hand side of (11) maps $[0,1]^n$ continuously into itself. By the implicit function theorem, it can also be shown that $\sigma^*(S, \theta)$ is smooth in both $S$ and $\theta$. However, there can be many fixed points $\sigma^*(S, \theta)$ solving the system. We show that a unique equilibrium exists if the magnitude of the spillover parameter $\theta_3$ is sufficiently small. Formally,

Theorem 1 (unique equilibrium). Let the pdf of $v_i$ be $\phi(v)$. Define $\lambda = |\theta_3| \sup_u \phi(u)$. For any $S$ and $\theta$, there exists a unique equilibrium $\{\sigma_j^*(S, \theta)\}_{j \in \mathcal{N}_n}$ if $\lambda < 1$.

See Appendix A for the proof. When $v_i$ is normally distributed, we have $\sup_u \phi(u) = 1/\sqrt{2\pi}$. Therefore $\lambda < 1$ if and only if $|\theta_3| < \sqrt{2\pi} \approx 2.5$. Throughout the paper we assume that $\lambda < 1$.

Assumption 3 (unique equilibrium). $|\theta_3| < \sqrt{2\pi}$.

Under the unique equilibrium, an agent's treatment choice can be written as the following reduced-form equation:

\begin{align}
D_i &= \mathbf{1}\Big\{ v_i \leq X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) \Big\} \tag{12} \\
\iff D_i &= \mathbf{1}\Big\{ \Phi(v_i) \leq \Phi\Big( X_i'\theta_1 + \theta_2 Z_i + \frac{\theta_3}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) \Big) \Big\} \tag{13} \\
\iff D_i &= \mathbf{1}\big\{ \Phi(v_i) \leq \sigma_i^*(S, \theta) \big\}, \tag{14}
\end{align}

where the last step follows from (11).

To summarize: for given $S$ and $\theta$, the equilibrium choice probabilities $\sigma_i^*(S, \theta)$, $i \in \mathcal{N}_n$, are realized. Given this equilibrium, each agent chooses their treatment status according to any of the equivalent rules (12), (13), or (14).

2.2 Potential Outcomes Model with Spillovers

In this section, we propose our model of treatment response in settings with spillovers. Previous research on treatment response has been based on SUTVA, which requires that an individual's outcome depend only on their own treatment status. Under SUTVA, $i$'s outcome or response $Y_i$ can be written as $Y_i = Y_i(D_i)$. Let $d \in \{0,1\}$ be a possible treatment value. The potential outcome under SUTVA is denoted by $Y_i(d)$, which gives the response of $i$ when assigned to $D_i = d$. Without SUTVA, however, there is no obvious way to model spillovers in the treatment response. As Manski (2013) and Kline and Tamer (2020) show, there are many ways to relax SUTVA, each of which is based on different restrictions on the nature of interference between agents.

In our paper, we assume that $i$'s outcome is a function of a direct effect from own treatment status and an indirect effect, or spillover effect, from $i$'s neighbors. Spillover effects are assumed to be mediated by $\sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) / |\mathcal{N}_i|$. For notational simplicity, let us define $\pi_i^*(S, \theta) = \sum_{j \in \mathcal{N}_i} \sigma_j^*(S, \theta) / |\mathcal{N}_i|$.
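The fixed-point system (10)-(11) and the neighborhood average $\pi_i^*$ can be illustrated with a minimal Python sketch. It uses plain successive approximation, which is a choice made here for illustration and not necessarily the solver used in the paper: under $\lambda < 1$, the map in (11) is a sup-norm contraction, so the iteration converges from any starting point. The function names, the toy network `G`, the values in `index` (standing in for $X_i'\theta_1 + \theta_2 Z_i$) and `theta3 = 1.0` are all hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal cdf Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def neighborhood_average(sigma, nbrs):
    """Average of sigma_j over each agent's neighbors."""
    return [sum(sigma[j] for j in nb) / len(nb) for nb in nbrs]


def solve_equilibrium(G, index, theta3, start=0.5, tol=1e-12, max_iter=10_000):
    """Iterate sigma_i <- Phi(index_i + theta3 * mean_{j in N_i} sigma_j).

    index[i] stands for X_i' theta_1 + theta_2 Z_i (an assumption of this sketch).
    Returns the approximate equilibrium probabilities and neighborhood averages.
    """
    n = len(G)
    nbrs = [[j for j in range(n) if G[i][j] == 1] for i in range(n)]
    if any(len(nb) == 0 for nb in nbrs):
        raise ValueError("every agent needs at least one neighbor")
    sigma = [start] * n
    for _ in range(max_iter):
        pi = neighborhood_average(sigma, nbrs)
        new_sigma = [std_normal_cdf(index[i] + theta3 * pi[i]) for i in range(n)]
        gap = max(abs(a - b) for a, b in zip(new_sigma, sigma))
        sigma = new_sigma
        if gap < tol:
            break
    return sigma, neighborhood_average(sigma, nbrs)


# Hypothetical inputs; |theta3| = 1.0 is below sqrt(2 * pi), so lambda < 1.
G = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
index = [-0.5, 0.2, 0.0, 0.7]
sigma_star, pi_star = solve_equilibrium(G, index, theta3=1.0)
```

Starting the iteration from different initial vectors (for example all zeros and all ones) and comparing the results is a cheap numerical check of the uniqueness claim for a given parameter value.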
We use $\pi_i^*$ and $\pi_i^*(S, \theta)$ interchangeably. Thus, we write the realized outcome of $i$ as

$$
Y_i = Y_i(D_i, \pi_i^*),
$$

where $\pi_i^* = \pi_i^*(S, \theta)$ is the average of the equilibrium treatment choice probabilities of $i$'s neighbors. From now on, we simply refer to $\pi_i^*$ as $i$'s "neighborhood (propensity) score". This is the average of the propensity scores of $i$'s direct neighbors, where each score measures the probability of taking up the treatment given the public information $S$. Jackson et al. (2020) call the same object the "peer-influenced propensity score".

Let $\pi \in [0,1]$ be a possible value that $\pi_i^*$ can take. The potential outcome $Y_i(d, \pi)$ represents $i$'s response when we exogenously assign $D_i = d$ and $\pi_i^* = \pi$. Concretely, $Y_i(1, \pi)$ represents $i$'s outcome when $i$ is required to be treated and $i$'s neighborhood score has been exogenously set to $\pi$. Similarly, $Y_i(0, \pi)$ is $i$'s outcome when $i$ is forbidden to be treated and $i$'s neighborhood score has been exogenously set to $\pi$. The underlying assumption is that it is possible to manipulate the values of $D_i$ and $\pi_i^*$. Since $\pi_i^*$ is a function of the public state variable $S = (G, X, Z)$, we can conceivably manipulate the value of $\pi_i^*$ by changing $Z$ for a given $(G, X)$, which is assumed to be predetermined and non-manipulable. Thus $Y_i(d, \pi)$ can be realized by changing the $Z$ profile in the population in a way that induces $\pi_i^* = \pi$ as an equilibrium in the first stage and then requiring $i$ to choose $D_i = d$.

Comparison to other approaches
The existing literature on interference often models potential outcomes as a function of own treatment status and the proportion, or the number, of treated neighbors (e.g., Hudgens and Halloran (2008), Leung (2020a), Vazquez-Bare (2020)). Define $\bar{D}_i \equiv \sum_{j \in \mathcal{N}_i} D_j / |\mathcal{N}_i|$ with a generic value $\bar{d} \in [0,1]$. In that formulation, the realized outcome is $Y_i = Y_i(D_i, \bar{D}_i)$ and the potential outcomes are $Y_i(d, \bar{d})$. Our model differs in that we model spillovers via the ex ante (anticipated) expectation of $\bar{D}_i$ rather than the ex post realization of $\bar{D}_i$ itself. Recall that $\pi_i^*(S, \theta) = E[\bar{D}_i \mid S]$. Since the difference between $\bar{D}_i$ and $\pi_i^*(S, \theta)$ has mean zero (i.e., $E[\bar{D}_i - \pi_i^*(S, \theta) \mid S] = 0$), in practice the values of these two quantities may not differ much, especially when $|\mathcal{N}_i|$ is large.

Nevertheless, the two formulations rest on different behavioral assumptions. Suppose that the outcome of interest represents a decision or behavior of agents. Then the formulation $Y_i = Y_i(D_i, \bar{D}_i)$ is derived under the assumption that agents base their decisions on $\bar{D}_i$ rather than on the expected $\bar{D}_i$. This is realistic only when $\bar{D}_i$ is fully observed at the time the decision on $Y_i$ is made. Thus, that model can be interpreted as a model with complete or perfect information. On the other hand, our specification $Y_i = Y_i(D_i, \pi_i^*)$ assumes that agents do not fully observe $\bar{D}_i$ when they decide their $Y_i$. Thus agents face intrinsic uncertainty over others' treatment choices even at the second stage. This is plausible when the reference group is relatively large, so that it is not easy for agents to fully observe the value of $\bar{D}_i$. There are also settings where agents are reluctant to reveal their treatment status, for instance when treatment represents learning about their HIV status as in Godlonton and Thornton (2012). In such cases, it may be more realistic to assume that agents have private information even in the second stage.
Unlike $\bar{D}_i$, the equilibrium neighborhood score $\pi_i^*$ is always observable to agents, as it is a function of the public information $S$. Thus it is plausible that agents base their decisions on the equilibrium quantity $\pi_i^*$, which signals the a priori prevalence of treatment adoption in the neighborhood.

(Footnote: Some combinations $(d, \pi)$ may represent off-equilibrium quantities. Thus, the resulting $Y_i(d, \pi)$ may not be a policy-relevant counterfactual. Nevertheless, to define causal effects rigorously, we need to consider every possible combination $(d, \pi) \in \{0,1\} \times [0,1]$.)

Random Coefficients Model of Potential Responses

We put more structure on $Y_i(d, \pi)$ by using a random coefficients model in which we allow for correlation between individual treatment status and the random coefficients. Therefore our model can be seen as a correlated random coefficient model as in Masten and Torgovitsky (2016) and Wooldridge (2003).

Assumption 4 (random coefficient model).
(i) For any $i \in \mathcal{N}_n$, $d \in \{0,1\}$ and $\pi \in [0,1]$, we have
$$
Y_i(1, \pi) = \alpha_{1i} + \beta_{1i}\pi, \qquad Y_i(0, \pi) = \alpha_{0i} + \beta_{0i}\pi,
$$
where $(\alpha_{1i}, \beta_{1i})$ and $(\alpha_{0i}, \beta_{0i})$ are unit-specific coefficients.
(ii) For $S = (G, X, Z)$, the unit-specific coefficients satisfy the following restrictions:
$$
E[\alpha_{1i} \mid S] = E[\alpha_{1i} \mid X_i] = X_i'\alpha_1, \qquad E[\beta_{1i} \mid S] = E[\beta_{1i} \mid X_i] = X_i'\beta_1,
$$
and similarly,
$$
E[\alpha_{0i} \mid S] = E[\alpha_{0i} \mid X_i] = X_i'\alpha_0, \qquad E[\beta_{0i} \mid S] = E[\beta_{0i} \mid X_i] = X_i'\beta_0.
$$

Recall that $Y_i(1, \pi)$ represents $i$'s response when $i$ is given the treatment and $i$'s neighborhood score has been exogenously set to $\pi$. Under Assumption 4 (i), this response is assumed to be linear in $\pi$ with intercept $\alpha_{1i}$ and slope $\beta_{1i}$, both of which are allowed to differ across agents. Similarly, $Y_i(0, \pi)$ is assumed to be linear in $\pi$ with intercept $\alpha_{0i}$ and slope $\beta_{0i}$.
Note that the unit-specific coefficients under treatment, $(\alpha_{1i}, \beta_{1i})$, are allowed to differ from those without treatment, $(\alpha_{0i}, \beta_{0i})$, for generality.

The assumption that $\pi$ affects the potential outcomes $Y_i(1, \pi)$ and $Y_i(0, \pi)$ linearly is only for convenience. It is straightforward to extend our model to include higher-order terms such as $\pi^2$, e.g., $Y_i(d, \pi) = \alpha_{di} + \beta_{di}\pi + \gamma_{di}\pi^2$ for $d \in \{0,1\}$.

The unit-specific coefficients are unobservable random variables that are potentially dependent on the unit's observed covariates. By Assumption 4 (ii), we assume that the conditional means of the coefficients depend on the public state variable $S = (G, X, Z)$ only through $X_i$. Importantly, this assumption implies that $Z$ is irrelevant for the random coefficients. This rules out the case in which the treatment assignment vector $Z = (Z_i, Z_{-i})$ directly affects $Y_i$. This is the standard exclusion restriction for instruments. Therefore, under this assumption, $Z$ has the status of an instrumental variable.

The assumption that $G$ is redundant is only for convenience, as we can always include network statistics, such as the number of direct peers, in $X_i$. Finally, linearity of the conditional expectation in $X_i$ is also for convenience, as we can always allow $X_i$ to include nonlinear functions of the underlying covariates.

Under Assumption 4 (ii), we can decompose each unit-specific coefficient into its mean given $X_i$ and its deviation from that mean as follows:

$$
\alpha_{1i} = X_i'\alpha_1 + u_{1i}, \quad E[u_{1i} \mid S] = 0, \qquad
\beta_{1i} = X_i'\beta_1 + e_{1i}, \quad E[e_{1i} \mid S] = 0.
$$

Analogously for $D_i = 0$:

$$
\alpha_{0i} = X_i'\alpha_0 + u_{0i}, \quad E[u_{0i} \mid S] = 0, \qquad
\beta_{0i} = X_i'\beta_0 + e_{0i}, \quad E[e_{0i} \mid S] = 0.
$$
Therefore the potential outcomes can be written as

\begin{align*}
Y_i(1, \pi) &= X_i'\alpha_1 + u_{1i} + \pi\big(X_i'\beta_1 + e_{1i}\big), & E[u_{1i} \mid S] = E[e_{1i} \mid S] = 0, \\
Y_i(0, \pi) &= X_i'\alpha_0 + u_{0i} + \pi\big(X_i'\beta_0 + e_{0i}\big), & E[u_{0i} \mid S] = E[e_{0i} \mid S] = 0,
\end{align*}

while the observed outcome is given by

$$
Y_i = Y_i(D_i, \pi_i^*) =
\begin{cases}
X_i'\alpha_1 + u_{1i} + \pi_i^*\big(X_i'\beta_1 + e_{1i}\big) & \text{if } D_i = 1, \\
X_i'\alpha_0 + u_{0i} + \pi_i^*\big(X_i'\beta_0 + e_{0i}\big) & \text{if } D_i = 0.
\end{cases}
$$

Our model contains the four-dimensional error term $\eta_i = (u_{1i}, e_{1i}, u_{0i}, e_{0i})$. By construction, $\eta_i$ is mean-independent of $S$, i.e., $E[\eta_i \mid S] = 0$. Through $\eta_i$, the random coefficients are allowed to be heterogeneous even after controlling for the relevant observed characteristics $X_i$. The importance of allowing for such unobserved heterogeneity has been emphasized in the modern program evaluation literature (see, e.g., Heckman (2001), Heckman et al. (2006) and Imbens (2007)).

2.3 Parameters of Interest

In this section, we formally define our parameters of interest, the class of average causal effects. For this purpose, let us first study the average potential outcome functions.
Average potential outcomes
Under our specifications, average potential outcomes for agents with $X_i = x$ are computed as follows: for $\pi \in [0,1]$,
$$E[Y_i(1,\pi) \mid X_i = x] = x'\alpha_1 + (x'\beta_1)\pi, \qquad E[Y_i(0,\pi) \mid X_i = x] = x'\alpha_0 + (x'\beta_0)\pi.$$
Integrating them over identically distributed $X_i$ gives the unconditional average potential outcomes. Letting $\mu_X = E[X_i]$,
$$E[Y_i(1,\pi)] = \mu_X'\alpha_1 + (\mu_X'\beta_1)\pi \tag{15}$$
$$= \alpha_{1m} + \beta_{1m}\pi, \tag{16}$$
$$E[Y_i(0,\pi)] = \mu_X'\alpha_0 + (\mu_X'\beta_0)\pi \tag{17}$$
$$= \alpha_{0m} + \beta_{0m}\pi \tag{18}$$
where $(\alpha_{1m}, \beta_{1m}, \alpha_{0m}, \beta_{0m}) = (\mu_X'\alpha_1, \mu_X'\beta_1, \mu_X'\alpha_0, \mu_X'\beta_0)$. Since $\mu_X$ is identifiable from the data, identification of $(\alpha_{1m}, \beta_{1m}, \alpha_{0m}, \beta_{0m})$ requires one to identify $(\alpha_1, \beta_1, \alpha_0, \beta_0)$.

$(\alpha_{1m}, \alpha_{0m})$ represent the baseline mean potential outcomes when we set $\pi = 0$, i.e., $(\alpha_{1m}, \alpha_{0m}) = (E[Y_i(1,0)], E[Y_i(0,0)])$. The effect of $\pi$ is captured by $(\beta_{1m}, \beta_{0m})$.

On the other hand, $(\alpha_1, \beta_1, \alpha_0, \beta_0)$ measures the heterogeneous effect of $X_i$ on the mean potential outcomes. To see this, notice that the following equations hold:
$$E[Y_i(1,\pi) \mid X_i = x] = x'\alpha_1 + \pi x'\beta_1 = E[Y_i(1,\pi)] + (x - \mu_X)'\alpha_1 + \pi (x - \mu_X)'\beta_1,$$
$$E[Y_i(0,\pi) \mid X_i = x] = x'\alpha_0 + \pi x'\beta_0 = E[Y_i(0,\pi)] + (x - \mu_X)'\alpha_0 + \pi (x - \mu_X)'\beta_0.$$
Therefore, for $d \in \{0,1\}$, $(\alpha_d, \beta_d)$, without the constant coefficient parts, explains the difference between $E[Y_i(d,\pi) \mid X_i = x]$ and $E[Y_i(d,\pi)]$.

Average causal effects

Given the average response functions, we now define average causal effects, which are our parameters of interest.
Let us define the average direct effect (ADE) of own treatment under $\pi$ as follows:
$$ADE(\pi) = E[Y_i(1,\pi) - Y_i(0,\pi)].$$
$ADE(\pi)$ measures the average change in outcomes under the regime in which $i$ is required to choose $D_i = 1$, compared to the regime in which $i$ is forbidden to choose $D_i = 1$, while $i$'s neighborhood score is fixed to $\pi$. Under our random coefficients specification, $ADE(\pi)$ can be written as
$$ADE(\pi) = \alpha_{1m} - \alpha_{0m} + (\beta_{1m} - \beta_{0m})\pi.$$

Similarly, we define the average spillover effect (ASE) from changing the neighborhood score from $\pi$ to $\tilde\pi$ for each $d \in \{0,1\}$ as follows:
$$ASE(\pi, \tilde\pi, d) = E[Y_i(d,\tilde\pi) - Y_i(d,\pi)] = (\tilde\pi - \pi)\beta_{dm},$$
which measures the effect of changing the neighborhood score from $\pi$ to $\tilde\pi$ while fixing the agent's treatment status at $D_i = d$. Whether $\beta_{1m} = 0$ or $\beta_{0m} = 0$ is of interest as it indicates whether there are treatment spillovers at the outcome level.

In sum, our model of treatment choices and outcomes can be written as the following semi-triangular system:
$$Y_i = Y_i(D_i, \pi_i^*) = \begin{cases} X_i'\alpha_1 + u_{1i} + (X_i'\beta_1 + e_{1i})\pi_i^* & \text{if } D_i = 1 \\ X_i'\alpha_0 + u_{0i} + (X_i'\beta_0 + e_{0i})\pi_i^* & \text{if } D_i = 0 \end{cases} \tag{19}$$
$$D_i = 1\{v_i \le X_i'\theta_1 + \theta_2 Z_i + \theta_3 \pi_i^*\} \tag{20}$$
$$\text{s.t. } \sigma_i^* = \Phi\big(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \pi_i^*\big), \quad \forall i \in \mathcal{N}_n. \tag{21}$$
Using the formula $Y_i = D_i Y_i(1,\pi_i^*) + (1 - D_i) Y_i(0,\pi_i^*) = Y_i(0,\pi_i^*) + D_i\,(Y_i(1,\pi_i^*) - Y_i(0,\pi_i^*))$, equation 19 can be written as follows:
$$Y_i = X_i'\alpha_0 + \pi_i^* X_i'\beta_0 + D_i X_i'(\alpha_1 - \alpha_0) + D_i \pi_i^* X_i'(\beta_1 - \beta_0) + \epsilon_i \tag{22}$$
where
$$\epsilon_i = u_{0i} + \pi_i^* e_{0i} + D_i\big(u_{1i} - u_{0i} + \pi_i^*(e_{1i} - e_{0i})\big). \tag{23}$$
Equation 22 gives the conventional linear regression model.
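The step from the switching form of the observed outcome to the regression form with a composite error can be checked numerically. The following is a minimal sketch in Python: all coefficient values, the covariate vector `x`, and the function names are hypothetical illustrations and are not taken from the paper. It also evaluates the ADE and ASE formulas for made-up mean coefficients.

```python
import random


def dot(x, b):
    """Inner product x'b for two equal-length sequences."""
    return sum(xk * bk for xk, bk in zip(x, b))


def potential_outcome(x, alpha_d, beta_d, u_d, e_d, pi):
    """Y_i(d, pi) = X_i'alpha_d + u_di + pi * (X_i'beta_d + e_di)."""
    return dot(x, alpha_d) + u_d + pi * (dot(x, beta_d) + e_d)


def ade(alpha_1m, beta_1m, alpha_0m, beta_0m, pi):
    """Average direct effect: alpha_1m - alpha_0m + (beta_1m - beta_0m) * pi."""
    return (alpha_1m - alpha_0m) + (beta_1m - beta_0m) * pi


def ase(beta_dm, pi, pi_tilde):
    """Average spillover effect of moving the score from pi to pi_tilde."""
    return (pi_tilde - pi) * beta_dm


# Hypothetical values, chosen only for illustration.
rng = random.Random(0)
x = [1.0, 0.5]                              # constant plus one covariate
alpha_1, beta_1 = [2.0, 0.3], [1.0, -0.2]
alpha_0, beta_0 = [4.0, 0.1], [3.0, 0.4]
u_1, e_1, u_0, e_0 = (rng.gauss(0.0, 1.0) for _ in range(4))
pi_star = 0.6

diff_alpha = [a1 - a0 for a1, a0 in zip(alpha_1, alpha_0)]
diff_beta = [b1 - b0 for b1, b0 in zip(beta_1, beta_0)]

max_gap = 0.0
for d in (0, 1):
    y_1 = potential_outcome(x, alpha_1, beta_1, u_1, e_1, pi_star)
    y_0 = potential_outcome(x, alpha_0, beta_0, u_0, e_0, pi_star)
    # Switching form: Y = D * Y(1, pi*) + (1 - D) * Y(0, pi*).
    y_switching = d * y_1 + (1 - d) * y_0
    # Composite error of the regression form.
    eps = u_0 + pi_star * e_0 + d * (u_1 - u_0 + pi_star * (e_1 - e_0))
    # Regression form with D and D * pi* interactions.
    y_regression = (dot(x, alpha_0) + pi_star * dot(x, beta_0)
                    + d * dot(x, diff_alpha)
                    + d * pi_star * dot(x, diff_beta) + eps)
    max_gap = max(max_gap, abs(y_switching - y_regression))

print("largest gap between the two forms:", max_gap)
print("ADE at pi = 0.5:", ade(2.0, 1.0, 4.0, 3.0, 0.5))
```

The two forms agree up to floating-point rounding for both values of `d`, which is all the algebra claims; the check says nothing about whether the composite error is uncorrelated with the regressors.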
Naturally, one may consider estimating $(\alpha_1, \alpha_0, \beta_1, \beta_0)$ by the least squares regression of $Y_i$ on $(X_i, \pi_i^* X_i, D_i X_i, D_i \pi_i^* X_i)$. The resulting OLS estimator is consistent only when $\epsilon_i$ is uncorrelated with the regressors, i.e., $E[\epsilon_i \mid D_i, X_i, \pi_i^*] = 0$, which requires that the following two conditions hold:
$$E[u_{0i} + \pi_i^* e_{0i} \mid D_i = 0, X_i, \pi_i^*] \overset{(a)}{=} E[u_{0i} + \pi_i^* e_{0i} \mid X_i, \pi_i^*] \overset{(b)}{=} 0,$$
$$E[u_{1i} + \pi_i^* e_{1i} \mid D_i = 1, X_i, \pi_i^*] \overset{(a)'}{=} E[u_{1i} + \pi_i^* e_{1i} \mid X_i, \pi_i^*] \overset{(b)'}{=} 0.$$
Since $\eta_i = (u_{1i}, u_{0i}, e_{1i}, e_{0i})$ is uncorrelated with $S = (G, X, Z)$ by construction, $(b)$ and $(b)'$ are automatically satisfied. Therefore, we only need to show that $(a)$ and $(a)'$ are satisfied. This is true only when $D_i$ is uncorrelated with $\eta_i$ conditional on $(X_i, \pi_i^*)$. This is the familiar selection-on-observables assumption. Such an assumption is unlikely to hold if the treatment group and control group are systematically different in their unobserved factors $\eta_i$ even after controlling for all relevant observables. Indeed, the very fact that agents with the same observed characteristics $(X_i, \pi_i^*)$ have made different treatment choices suggests that they differ in their unobserved factors. Thus, endogeneity arises from the correlation between $v_i$ and $\eta_i$ that remains even after conditioning on $S$.

More specifically, note that the selection-on-observables assumption requires that the following two conditions hold:
$$Corr(Y_i(0,\pi_i^*), D_i \mid X_i, \pi_i^*) = 0 \tag{24}$$
and
$$Corr(Y_i(1,\pi_i^*) - Y_i(0,\pi_i^*), D_i \mid X_i, \pi_i^*) = 0. \tag{25}$$
Condition 24 requires that the idiosyncratic part of $Y_i(0,\pi_i^*)$ is uncorrelated with $D_i$, i.e., in the absence of the treatment, there should be no difference in the mean potential outcomes across the treatment group and the control group once we account for relevant observables $(X_i, \pi_i^*)$. However, agents who take up the treatment may have unusual values of $Y_i(0,\pi)$ even after controlling for $(X_i, \pi_i^*)$. If individuals who take up the treatment tend to have higher values of $Y_i(0,\pi)$ in terms of unobservables, then the naive least squares regression would suffer from an upward bias since $cov(D_i, \epsilon_i \mid S) > 0$. This is the classic selection problem.

The requirement 25 is also troublesome, as the condition implies that the unobserved gain from the treatment given $\pi_i^*$ should not vary across the treatment group and the control group. This is not satisfied if the treatment choice is correlated with unobserved gains from the treatment. It is plausible that agents have some knowledge of likely idiosyncratic gains from the treatment at the time they choose their treatment status. If an agent's treatment choice is partially based on such knowledge, then 25 would not be satisfied. This type of sorting on the unobserved gain, termed "essential heterogeneity" by Heckman et al. (2006), has been emphasized in the modern program evaluation literature.

In conclusion, whenever the selection problem or essential heterogeneity exists, the naive OLS regression delivers inconsistent estimates of the structural parameters $(\alpha_1, \alpha_0, \beta_1, \beta_0)$.

In the previous section, we showed that the OLS regression of 22 suffers from bias when $v_i$ is correlated with $\eta_i = (u_{1i}, u_{0i}, e_{1i}, e_{0i})$ even when we control for $S$. In this section, we first show that IV methods do not identify the causal parameters of interest in the presence of general heterogeneity. We then propose an alternative method known as the control function approach.

Endogeneity is often addressed by IV methods such as two-stage least squares (2SLS). In our setup, $Z_i$ is a valid IV for $D_i$ since (i) $D_i$ is correlated with $Z_i$, and (ii) $Z_i$ is exogenous and is excluded from the outcome equation. In fact, in the presence of spillovers in the first stage, not only $Z_i$ but also the $n$-dimensional vector $Z = (Z_i, Z_{-i})$ is a valid instrument for $D_i$ since, in that case, $D_i$ is a function of the entire assignment vector $Z$. Therefore, we may run an IV regression on 22 where we instrument $D_i$ by $Z_i$ or by $Z = (Z_i, Z_{-i})$, depending on whether spillovers exist in the first stage.

(Footnote: Recall that when there exist spillovers in the first-stage choice model, not only the $Z$ of $i$'s direct neighbors but also the $Z$ of indirect neighbors affects $D_i$. Therefore $Z_j$ for any $j$ that is eventually connected to $i$ is also relevant for $D_i$. However, as the network distance between $i$ and $j$ becomes greater, the dependence between $Z_j$ and $D_i$ decays exponentially when $\lambda < 1$ (see Xu (2018) and Leung (2020b)). Therefore, using a $Z_j$ that is too far from $i$ as an IV may incur a weak IV problem.)

We argue that such a strategy does not identify $(\alpha_1, \beta_1, \alpha_0, \beta_0)$ in our setup. Suppose we instrument $D_i$ by $Z_i$. The resulting IV estimator is consistent only when $E[\epsilon_i \mid Z_i, X_i, \pi_i^*] = 0$, where $\epsilon_i = u_{0i} + \pi_i^* e_{0i} + D_i\big(u_{1i} - u_{0i} + \pi_i^*(e_{1i} - e_{0i})\big)$ as in 23. Note that
$$E[\epsilon_i \mid Z_i, X_i, \pi_i^*] = E\big[u_{0i} + \pi_i^* e_{0i} + D_i\big(u_{1i} - u_{0i} + \pi_i^*(e_{1i} - e_{0i})\big) \mid Z_i, X_i, \pi_i^*\big]$$
$$= \underbrace{E[u_{0i} + \pi_i^* e_{0i} \mid Z_i, X_i, \pi_i^*]}_{A} + \underbrace{E[u_{1i} - u_{0i} + \pi_i^*(e_{1i} - e_{0i}) \mid D_i = 1, Z_i, X_i, \pi_i^*]}_{B}\;\underbrace{\Pr(D_i = 1 \mid Z_i, X_i, \pi_i^*)}_{C}.$$
$A = 0$ since $\eta_i = (u_{1i}, u_{0i}, e_{1i}, e_{0i})$ is uncorrelated with $S$, and thereby with $(Z, X_i, \pi_i^*)$. $C$ cannot be zero except for trivial cases. Therefore $E[\epsilon_i \mid Z, X_i, \pi_i^*] = 0$ only when $B = 0$. This is satisfied when
$$E[u_{1i} - u_{0i} + \pi_i^*(e_{1i} - e_{0i}) \mid D_i = 1, Z_i, X_i, \pi_i^*] = E[u_{1i} - u_{0i} + \pi_i^*(e_{1i} - e_{0i}) \mid Z_i, X_i, \pi_i^*],$$
as $E[\eta_i \mid S] = 0$ implies that the last term is zero. Note that $u_{1i} - u_{0i} + \pi_i^*(e_{1i} - e_{0i})$ can be interpreted as the idiosyncratic part of $Y_i(1,\pi_i^*) - Y_i(0,\pi_i^*)$. Therefore we need to assume that $D_i$ is uncorrelated with the idiosyncratic gain from taking the treatment once we condition on $(Z_i, X_i, \pi_i^*)$. Such a requirement is unrealistic when agents have some knowledge of their idiosyncratic gains and base their treatment decision on such knowledge, i.e., when there is sorting on unobserved gains.

Whether $u_{1i} - u_{0i} + \pi_i^*(e_{1i} - e_{0i})$ is correlated with $D_i$ is an empirical matter and should not be settled a priori. IV methods rule out the possibility of such correlation and are subject to failure when the correlation exists. This point has also been made in the traditional treatment effect literature, which rules out spillover effects (see Hahn and Ridder (2011)). For instance, it is now well established in the literature that IV/2SLS does not recover average causal parameters such as the ATE under heterogeneous response models such as random coefficients models (see Imbens and Angrist (1994)).

We now propose the alternative strategy known as the control function approach. The control function approach addresses the endogeneity problem by explicitly formulating $E[Y_i \mid D_i = 1, S]$ and $E[Y_i \mid D_i = 0, S]$ as follows:
$$E[Y_i \mid D_i = 1, S] = E[Y_i \mid D_i = 1, \sigma_i^*(S,\theta), \pi_i^*(S,\theta), S]$$
$$= E[Y_i(1, \pi_i^*(S,\theta)) \mid v_i \le \Phi^{-1}(\sigma_i^*(S,\theta)), \sigma_i^*(S,\theta), \pi_i^*(S,\theta), S]$$
$$= X_i'\alpha_1 + E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S,\theta)), S] + \pi_i^*(S,\theta)\Big\{X_i'\beta_1 + E[e_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S,\theta)), S]\Big\}$$
since $D_i = 1 \iff \Phi(v_i) \le \sigma_i^*$ (see 14). Similarly, the observed conditional mean for the control group is
$$E[Y_i \mid D_i = 0, S] = X_i'\alpha_0 + E[u_{0i} \mid v_i > \Phi^{-1}(\sigma_i^*(S,\theta)), S] + \pi_i^*(S,\theta)\Big\{X_i'\beta_0 + E[e_{0i} \mid v_i > \Phi^{-1}(\sigma_i^*(S,\theta)), S]\Big\}.$$
The terms $E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S,\theta)), S]$, $E[e_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S,\theta)), S]$ and $E[u_{0i} \mid v_i > \Phi^{-1}(\sigma_i^*(S,\theta)), S]$, $E[e_{0i} \mid v_i > \Phi^{-1}(\sigma_i^*(S,\theta)), S]$ are "control functions" which account for the endogeneity of $D_i$. Assumption 5 below restricts the form of these control functions.

Assumption 5.
For all $i \in \mathcal{N}_n$, $\eta_i = (u_{1i}, u_{0i}, e_{1i}, e_{0i})$ satisfies the following conditions.
(i) $\eta_i$ is i.i.d. and is independent of $S$.
(ii) $E[\eta_i \mid v_i]$ is a linear function of $v_i$.
Under these two conditions, we write
$$E[u_{1i} \mid v_i, S] = E[u_{1i} \mid v_i] = \rho_{u1} v_i, \qquad E[e_{1i} \mid v_i, S] = E[e_{1i} \mid v_i] = \rho_{e1} v_i,$$
$$E[u_{0i} \mid v_i, S] = E[u_{0i} \mid v_i] = \rho_{u0} v_i, \qquad E[e_{0i} \mid v_i, S] = E[e_{0i} \mid v_i] = \rho_{e0} v_i,$$
where $\rho = (\rho_{u1}, \rho_{e1}, \rho_{u0}, \rho_{e0})$ captures the covariances between each component of $\eta_i$ and $v_i$.

Assumption 5 (i) is often referred to as the "separability" assumption and has been utilized in the literature, as in Carneiro et al. (2011) and Brinch et al. (2017). Under this assumption, the control functions depend only on the individual propensity score $\sigma_i^*(S,\theta)$, e.g., $E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S,\theta)), S] = E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*(S,\theta))]$, so that the control functions are separated from $S$. As a result, $E[Y_i \mid D_i = 1, S]$ and $E[Y_i \mid D_i = 0, S]$ depend on $S$ only through $(X_i, \pi_i^*, \sigma_i^*)$. This step is necessary since it is not possible to control for $S = (G, X, Z)$ itself, as our data consist of one large network.

Assumption 5 (ii) further allows us to write $E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*)]$, for instance, as $\rho_{u1} E[v_i \mid v_i \le \Phi^{-1}(\sigma_i^*)]$. Combined with the normality assumption on $v_i$, we effectively assume that $(\eta_i, v_i)$ are jointly normal. However, the approach can easily accommodate distributional assumptions on $v_i$ other than normality.

Under the joint normality assumption, the control functions take the form of an inverse Mills ratio. Define $\lambda_1(\cdot)$ and $\lambda_0(\cdot)$ as follows: for $\sigma \in (0,1)$,
$$\lambda_1(\sigma) = -\frac{\phi(\Phi^{-1}(\sigma))}{\sigma}, \qquad \lambda_0(\sigma) = \frac{\phi(\Phi^{-1}(\sigma))}{1 - \sigma}.$$
It follows that
$$E[Y_i \mid D_i = 1, S] = X_i'\alpha_1 + \rho_{u1}\lambda_1(\sigma_i^*) + \pi_i^*\big(X_i'\beta_1 + \rho_{e1}\lambda_1(\sigma_i^*)\big),$$
$$E[Y_i \mid D_i = 0, S] = X_i'\alpha_0 + \rho_{u0}\lambda_0(\sigma_i^*) + \pi_i^*\big(X_i'\beta_0 + \rho_{e0}\lambda_0(\sigma_i^*)\big).$$
Let $\lambda_i = D_i\lambda_{1i} + (1 - D_i)\lambda_{0i}$.
We see that $(\alpha_1, \beta_1, \rho_{u1}, \rho_{e1})$ is identified by regressing $Y_i$ on $(X_i', \lambda_i, \pi_i^* X_i', \pi_i^*\lambda_i)'$ using the subsample with $D_i = 1$. Similarly, we can identify $(\alpha_0, \beta_0, \rho_{u0}, \rho_{e0})$ by regressing $Y_i$ on $X_i$, $\lambda_i$ and their interactions with $\pi_i^*$ using the subsample with $D_i = 0$. The inclusion of $\lambda_i$ accounts for the correlation between $\eta_i$ and $v_i$, so that we can test for the endogeneity of $D_i$ by checking whether the correlations are jointly zero.

Our model achieves point identification by exploiting a functional form assumption between $\eta_i$ and $v_i$. We can relax the linearity assumption and obtain a more flexible parametric functional form by adding higher-order terms. For instance, we may specify $E[u_{1i} \mid v_i]$ as a quadratic function of $v_i$ as follows: $E[u_{1i} \mid v_i] = \rho_{u1} v_i + \tilde\rho_{u1} v_i^2$. Then it can be shown that
$$E[u_{1i} \mid v_i \le \Phi^{-1}(\sigma_i^*)] = -\rho_{u1}\frac{\phi(\Phi^{-1}(\sigma_i^*))}{\sigma_i^*} + \tilde\rho_{u1}\left[\frac{\Phi^{-1}(\sigma_i^*)\,\phi(\Phi^{-1}(\sigma_i^*))}{\sigma_i^*} + \left\{\frac{\phi(\Phi^{-1}(\sigma_i^*))}{\sigma_i^*}\right\}^2\right].$$
This also offers a way to test the linearity assumption in the spirit of Lee (1984).
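The inverse Mills ratio terms and the second-stage regressor vector can be computed with very little code. The sketch below is only an illustration using the Python standard library: the function names `lambda1`, `lambda0` and `regressor_row` are hypothetical, and the inputs `sigma` and `pi` stand for an agent's equilibrium choice probability and neighborhood score, which in practice come from the first stage.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: provides pdf, cdf and inv_cdf


def lambda1(sigma):
    """Control function for D_i = 1: E[v | v <= Phi^{-1}(sigma)] = -phi(c) / sigma."""
    c = _STD_NORMAL.inv_cdf(sigma)
    return -_STD_NORMAL.pdf(c) / sigma


def lambda0(sigma):
    """Control function for D_i = 0: E[v | v > Phi^{-1}(sigma)] = phi(c) / (1 - sigma)."""
    c = _STD_NORMAL.inv_cdf(sigma)
    return _STD_NORMAL.pdf(c) / (1.0 - sigma)


def regressor_row(x, d, sigma, pi):
    """Build (X_i', lambda_i, pi_i X_i', pi_i lambda_i) for one agent.

    x     : list of covariates, including the constant
    d     : observed treatment status (0 or 1)
    sigma : own equilibrium choice probability, strictly inside (0, 1)
    pi    : neighborhood score (average choice probability of neighbors)
    """
    lam = lambda1(sigma) if d == 1 else lambda0(sigma)
    return list(x) + [lam] + [pi * xk for xk in x] + [pi * lam]


# Sanity check: v has mean zero, so the two truncated means must average out.
sigma = 0.3
print(sigma * lambda1(sigma) + (1.0 - sigma) * lambda0(sigma))  # approximately 0
print(regressor_row([1.0, 2.0], 1, 0.5, 0.4))
```

The check uses the law of total expectation, $\sigma\lambda_1(\sigma) + (1-\sigma)\lambda_0(\sigma) = E[v_i] = 0$, which holds for any $\sigma \in (0,1)$ under standard normality of $v_i$.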
Estimation
We propose a two-stage estimation procedure. In the first stage, we estimate the treatment choice game using a nested fixed point maximum likelihood (NFXP-ML) method. In the second stage, using the first-stage estimates, we estimate regression models of treatment outcomes with generated regressors.
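The inner step of the first stage, solving for the equilibrium choice probabilities at a given parameter value and evaluating the average log-likelihood there, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the three-node network, the parameter values and the function names are hypothetical, and plain successive approximation is assumed to converge because the spillover coefficient is small enough for the mapping to be a contraction.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def solve_equilibrium(index, neighbors, theta3, tol=1e-12, max_iter=10_000):
    """Iterate sigma_i <- Phi(index_i + theta3 * mean_{j in N_i} sigma_j) to a fixed point.

    index     : list with index[i] = X_i'theta1 + theta2 * Z_i (own part of the index)
    neighbors : list of non-empty neighbor lists, neighbors[i] = N_i
    theta3    : spillover coefficient on the neighborhood score
    Returns (sigma, pi) with pi[i] the neighborhood score at the solution.
    """
    n = len(index)
    sigma = [0.5] * n  # arbitrary starting guess
    for _ in range(max_iter):
        pi = [sum(sigma[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(n)]
        new_sigma = [PHI(index[i] + theta3 * pi[i]) for i in range(n)]
        gap = max(abs(a - b) for a, b in zip(new_sigma, sigma))
        sigma = new_sigma
        if gap < tol:
            break
    else:
        raise RuntimeError("fixed-point iteration did not converge")
    pi = [sum(sigma[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(n)]
    return sigma, pi


def avg_log_likelihood(d, sigma):
    """(1/n) * sum_i { D_i ln sigma_i + (1 - D_i) ln(1 - sigma_i) }."""
    return sum(di * math.log(s) + (1 - di) * math.log(1.0 - s)
               for di, s in zip(d, sigma)) / len(d)


# Hypothetical three-household line network: 0 - 1 - 2, intercept-only covariates.
neighbors = [[1], [0, 2], [1]]
z = [1, 0, 1]
theta1, theta2, theta3 = -0.5, 1.0, 1.5  # illustrative values only
index = [theta1 + theta2 * zi for zi in z]

sigma, pi = solve_equilibrium(index, neighbors, theta3)
print(sigma, pi, avg_log_likelihood([1, 0, 1], sigma))
```

In a full NFXP routine an outer optimizer would call these two functions repeatedly, re-solving the fixed point at every trial value of the parameters.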
Recall that the treatment choice model boils down to equation 20 subject to the fixed-point requirement 28. Our sample log-likelihood function is defined as follows:
$$\hat L_n(\theta) = \frac{1}{n}\sum_{i=1}^n \Big\{ D_i \ln \sigma_i^*(S,\theta) + (1 - D_i)\ln\big(1 - \sigma_i^*(S,\theta)\big) \Big\} \tag{26}$$
Our estimator $\hat\theta = (\hat\theta_1, \hat\theta_2, \hat\theta_3)$ is defined as the maximizer of $\hat L_n(\theta)$ subject to the constraint that $\{\sigma_i^*(S,\hat\theta)\}$ satisfies the fixed-point requirement. Formally,
$$\hat\theta = \arg\max_{\theta \in \Theta} \hat L_n(\theta) \tag{27}$$
subject to
$$\sigma_i^*(S,\hat\theta) = \Phi\Big(X_i'\hat\theta_1 + \hat\theta_2 Z_i + \hat\theta_3 \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \sigma_j^*(S,\hat\theta)\Big), \quad \forall i \in \mathcal{N}_n. \tag{28}$$
For computation, we use the nested fixed point (NFXP) algorithm. Specifically, starting with an arbitrary initial guess for $\hat\theta$, we find the fixed point of 28 via contraction iterations (it can be shown that 28 is a contraction mapping when $\lambda < 1$), and then update $\hat\theta$ to $\hat\theta'$ according to, say, Newton's method. We iterate the procedure until the sequence of estimates converges. Our NFXP-ML estimator is taken as its limit.

Let us define the set of regressors as
$$W_i = [X_i', \lambda_i, \pi_i^*(S,\theta) X_i', \pi_i^*(S,\theta)\lambda_i]'$$
where $\lambda_i = D_i\lambda_{1i} + (1 - D_i)\lambda_{0i}$ with $\lambda_{1i} = \lambda_1(\sigma_i^*(S,\theta))$ and $\lambda_{0i} = \lambda_0(\sigma_i^*(S,\theta))$. Our estimators are based on the following moment conditions:
$$E[Y_i \mid D_i = 1, S] = W_i'\gamma_1, \qquad E[Y_i \mid D_i = 0, S] = W_i'\gamma_0,$$
where $\gamma_1 = (\alpha_1, \rho_{u1}, \beta_1, \rho_{e1})'$ and $\gamma_0 = (\alpha_0, \rho_{u0}, \beta_0, \rho_{e0})'$.

This suggests that $\gamma_1$ and $\gamma_0$ can be estimated by regressing $Y_i$ on $W_i$ separately on the subsamples with $D_i = 1$ and $D_i = 0$, respectively. However, since $\lambda_i$ and $\pi_i^*$ are functions of the unknown first-stage parameters $\theta$, we need to replace $\theta$ with $\hat\theta$. Define $\hat\lambda_{1i} = \lambda_1(\sigma_i^*(S,\hat\theta))$ and $\hat\lambda_{0i} = \lambda_0(\sigma_i^*(S,\hat\theta))$. Let $\hat\lambda_i = D_i\hat\lambda_{1i} + (1 - D_i)\hat\lambda_{0i}$.
Similarly, we replace the unknown quantity $\pi_i^*(S,\theta) \equiv \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i}\sigma_j^*(S,\theta)$ with $\hat\pi_i^* = \pi_i^*(S,\hat\theta) = \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i}\sigma_j^*(S,\hat\theta)$. Thus, our generated regressor $\hat W_i$ for $W_i$ is
$$\hat W_i = [X_i', \hat\lambda_i, \hat\pi_i X_i', \hat\pi_i\hat\lambda_i]'.$$
The estimator for $\gamma_1$ is then defined as
$$\hat\gamma_1 = \arg\min_{\gamma} \frac{1}{n}\sum_{i=1}^n D_i\big(Y_i - \hat W_i'\gamma\big)^2 = \Big\{\sum_{i=1}^n D_i \hat W_i \hat W_i'\Big\}^{-1}\sum_{i=1}^n D_i \hat W_i Y_i.$$
Similarly, the estimator for $\gamma_0$ is
$$\hat\gamma_0 = \arg\min_{\gamma} \frac{1}{n}\sum_{i=1}^n (1 - D_i)\big(Y_i - \hat W_i'\gamma\big)^2 = \Big\{\sum_{i=1}^n (1 - D_i)\hat W_i \hat W_i'\Big\}^{-1}\sum_{i=1}^n (1 - D_i)\hat W_i Y_i.$$

For the asymptotic analysis, we consider large-network asymptotics in which the number of individuals connected in a single network goes to infinity. Moreover, for each $n$, we treat $S = (G, X, Z)$ as fixed. This is justified since $S$ is an ancillary statistic, i.e., $S$ does not contain any information on the parameters of interest.

Inference for the first-stage game

We first establish $\sqrt{n}$-consistency and asymptotic normality of the first-stage estimator $\hat\theta$. The true parameter is denoted by $\theta^0$. Therefore our data $\{D_i\}_{i=1}^n$ are assumed to be generated from
$$D_i = 1\{v_i \le X_i'\theta_1^0 + \theta_2^0 Z_i + \theta_3^0 \pi_i^*(S,\theta^0)\}$$
subject to $\sigma_i^*(S,\theta^0) = \Phi\big(X_i'\theta_1^0 + \theta_2^0 Z_i + \theta_3^0 \pi_i^*(S,\theta^0)\big)$ for all $i \in \mathcal{N}_n$.

Theorem 2 (consistency of $\hat\theta$). Under the following assumptions, $\hat\theta - \theta^0 \xrightarrow{p} 0$.
(i) The true parameter $\theta^0 = (\theta_1^0, \theta_2^0, \theta_3^0)$ lies in a compact set $\Theta \subseteq \mathbb{R}^{\dim(\theta)}$ and $|\theta_3^0| < \sqrt{2\pi}$. The support of $X_i$ is a bounded subset of $\mathbb{R}^k$.
(ii) Let $R_i = (X_i', Z_i, \pi_i^*(S,\theta^0))'$. For large enough $n$, $\sum_{i=1}^n R_i R_i'$ is invertible, i.e.,
$$\liminf_{n \to \infty} \det\Big(\sum_{i=1}^n R_i R_i'\Big) > 0.$$
See Appendix B.1 for the proof.
Assumption (i) ensures that there is a unique equilibrium at the true parameter (see Theorem 1) and that each equilibrium probability satisfies $\sigma_i^*(S,\theta^0) \in (0,1)$ for all $i$. Assumption (ii) is the rank condition for identification, which requires that, for all large enough $n$, the moment matrix of regressors has full rank.

We now establish asymptotic normality of $\hat\theta$. Let us define the information matrix as follows:
$$I_n(\theta) = E\Big[\frac{1}{n}\sum_{i=1}^n \nabla_\theta l_i(\theta)\nabla_\theta l_i(\theta)' \,\Big|\, S\Big]$$
where $l_i(\theta) = D_i \ln\sigma_i^*(S,\theta) + (1 - D_i)\ln(1 - \sigma_i^*(S,\theta))$ is the individual log-likelihood function. Therefore $\nabla_\theta l_i(\theta)$ is given by
$$\nabla_\theta l_i(\theta) = D_i\frac{\nabla_\theta\sigma_i^*(S,\theta)}{\sigma_i^*(S,\theta)} + (1 - D_i)\frac{-\nabla_\theta\sigma_i^*(S,\theta)}{1 - \sigma_i^*(S,\theta)}. \tag{29}$$

Theorem 3 (asymptotic normality of $\hat\theta$). In addition to the conditions for Theorem 2, assume
(i) The true parameter $\theta^0$ lies in the interior of the compact set $\Theta \subseteq \mathbb{R}^{\dim(\theta)}$.
(ii) For any $n$, $I_n(\theta^0)$ is nonsingular.
Then
$$\big(I_n^{-1}(\theta^0)\big)^{-1/2}\sqrt{n}\,(\hat\theta - \theta^0) \xrightarrow{d} N(0, I_{\dim(\theta)}) \tag{30}$$
where $I_{\dim(\theta)}$ is the $\dim(\theta) \times \dim(\theta)$ identity matrix.
See Appendix B.2 for proof.

Variance Estimation
The asymptotic variance of $\hat\theta$ can be estimated by $\widehat{Var}(\hat\theta) = \hat I_n^{-1}/n$ where
$$\hat I_n \equiv \frac{1}{n}\sum_{i=1}^n \nabla_\theta l_i(\hat\theta)\nabla_\theta l_i(\hat\theta)'.$$
In order to compute $\nabla_\theta l_i(\hat\theta)$ using equation 29, we need to evaluate $\nabla_\theta\sigma_i^*(S,\hat\theta)$. For this we use a numerical approximation: take $\hat\theta + \epsilon$ for a small perturbation $\epsilon$, then compute the new equilibrium $\{\sigma_i^*(S,\hat\theta + \epsilon)\}_{i=1}^n$ by solving the fixed point. $\nabla_\theta\sigma_i^*(S,\hat\theta)$ is then computed by $(\sigma_i^*(S,\hat\theta + \epsilon) - \sigma_i^*(S,\hat\theta))/\epsilon$.

Next, we establish $\sqrt{n}$-consistency and asymptotic normality of the second-stage estimators $(\hat\gamma_1, \hat\gamma_0)$. Let us denote the true parameters by $(\gamma_1^0, \gamma_0^0)$. We assume that our model is correctly specified, i.e., $Y_i$ satisfies the following conditional moment restrictions:
$$E[Y_i \mid S, D_i = 1] = W_i'\gamma_1^0, \qquad E[Y_i \mid S, D_i = 0] = W_i'\gamma_0^0.$$
We maintain the conditions for $\sqrt{n}$-consistency and asymptotic normality of the first-stage estimator $\hat\theta$.

Theorem 4 (consistency of $(\hat\gamma_1, \hat\gamma_0)$). Under the following assumptions, $\hat\gamma_1 - \gamma_1^0 \xrightarrow{p} 0$ and $\hat\gamma_0 - \gamma_0^0 \xrightarrow{p} 0$.
(i) The true parameter $\gamma_1^0$ lies in a compact set $\Gamma_1 \subseteq \mathbb{R}^{\dim(\gamma_1)}$. Similarly, the true parameter $\gamma_0^0$ lies in a compact set $\Gamma_0 \subseteq \mathbb{R}^{\dim(\gamma_0)}$.
(ii)
$$\liminf_{n \to \infty}\det\Big\{\sum_{i=1}^n E[D_i W_i W_i' \mid S]\Big\} > 0 \quad \text{and} \quad \liminf_{n \to \infty}\det\Big\{\sum_{i=1}^n E[(1 - D_i) W_i W_i' \mid S]\Big\} > 0.$$
See Appendix B.3 for proof.
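The finite-difference gradient and the outer-product estimate of the information matrix used for the first-stage variance can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it uses an intercept-only index with no other covariates, perturbs one coordinate of the parameter vector at a time (one reading of "take $\hat\theta + \epsilon$"), runs a fixed number of contraction iterations, and uses hypothetical names, data and parameter values throughout.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def equilibrium(theta, z, neighbors, iters=500):
    """Solve sigma_i = Phi(t1 + t2 * Z_i + t3 * mean_{j in N_i} sigma_j) by iteration."""
    t1, t2, t3 = theta
    sigma = [0.5] * len(z)
    for _ in range(iters):
        sigma = [PHI(t1 + t2 * z[i] + t3 * sum(sigma[j] for j in nb) / len(nb))
                 for i, nb in enumerate(neighbors)]
    return sigma


def equilibrium_gradient(theta, z, neighbors, step=1e-6):
    """Forward differences: grads[k][i] approximates d sigma_i / d theta_k."""
    base = equilibrium(theta, z, neighbors)
    grads = []
    for k in range(len(theta)):
        bumped = list(theta)
        bumped[k] += step  # perturb one coordinate, then re-solve the fixed point
        perturbed = equilibrium(bumped, z, neighbors)
        grads.append([(p - b) / step for p, b in zip(perturbed, base)])
    return base, grads


def scores(theta, d, z, neighbors, step=1e-6):
    """Individual scores: (D_i / sigma_i - (1 - D_i) / (1 - sigma_i)) * grad sigma_i."""
    base, grads = equilibrium_gradient(theta, z, neighbors, step)
    out = []
    for i, s in enumerate(base):
        weight = d[i] / s - (1 - d[i]) / (1.0 - s)
        out.append([weight * grads[k][i] for k in range(len(theta))])
    return out


def opg_information(score_list):
    """Outer-product estimate (1/n) * sum_i s_i s_i' of the information matrix."""
    n, k = len(score_list), len(score_list[0])
    return [[sum(s[a] * s[b] for s in score_list) / n for b in range(k)]
            for a in range(k)]


# Hypothetical data: four households on a ring, binary subsidy and adoption.
neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]
z = [1, 0, 1, 0]
d = [1, 0, 1, 1]
theta_hat = (-0.5, 1.0, 1.5)  # stands in for a first-stage estimate

info = opg_information(scores(theta_hat, d, z, neighbors))
for row in info:
    print(row)
```

Inverting `info` and dividing by the sample size would give the variance estimate; with only four hypothetical observations the matrix here is for illustration and need not be well conditioned.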
Next, we derive the asymptotic results for the second-step estimators. For compactness, we only report results for $\hat\gamma_1$, as the $\hat\gamma_0$ case can be derived analogously.

Theorem 5 (asymptotic normality of $\hat\gamma_1$). Define
$$\Upsilon_n = E\Big[\frac{1}{n}\sum_{i=1}^n D_i W_i W_i' \,\Big|\, S\Big],$$
$$\Psi_n = E\Big[\frac{1}{n}\sum_{i=1}^n D_i W_i W_i'\epsilon_i^2 \,\Big|\, S\Big] + E\Big[\frac{1}{n}\sum_{i=1}^n D_i W_i \gamma_1^{0\prime}\nabla_\theta W_i(\theta^0) \,\Big|\, S\Big]\, E\Big[\frac{1}{n}\sum_{i=1}^n \nabla_\theta l_i(\theta^0)\nabla_\theta l_i(\theta^0)' \,\Big|\, S\Big]^{-1} E\Big[\frac{1}{n}\sum_{i=1}^n D_i W_i \gamma_1^{0\prime}\nabla_\theta W_i(\theta^0) \,\Big|\, S\Big]'.$$
In addition to the conditions for Theorem 4, assume
(i) The true parameter $\gamma_1^0$ lies in the interior of the compact set $\Gamma_1 \subseteq \mathbb{R}^{\dim(\gamma_1)}$.
(ii) For any $n$, $\Psi_n$ and $\Upsilon_n$ are nonsingular.
Then we have
$$\Lambda_n^{-1/2}\sqrt{n}\,(\hat\gamma_1 - \gamma_1^0) \xrightarrow{d} N(0, I_{\dim(\gamma_1)})$$
where $\Lambda_n = \Upsilon_n^{-1}\Psi_n\Upsilon_n^{-1}$.
See Appendix B.4 for proof.

If we ignore first-stage estimation, the asymptotic variance would be
$$\Upsilon_n^{-1} E\Big[\frac{1}{n}\sum_{i=1}^n D_i W_i W_i'\epsilon_i^2 \,\Big|\, S\Big]\Upsilon_n^{-1},$$
which is smaller, in the positive semi-definite sense, than the correct asymptotic variance $\Upsilon_n^{-1}\Psi_n\Upsilon_n^{-1}$.

Variance Estimation

The asymptotic variance $\Lambda_n$ can be estimated by replacing the population means by their sample counterparts. Specifically,
$$\hat\Upsilon_n = \frac{1}{n}\sum_{i=1}^n D_i \hat W_i \hat W_i',$$
$$\hat\Psi_n = \frac{1}{n}\sum_{i=1}^n D_i \hat W_i \hat W_i'\hat\epsilon_i^2 + \Big(\frac{1}{n}\sum_{i=1}^n D_i \hat W_i \hat\gamma_1'\nabla_\theta W_i(\hat\theta)\Big)\Big(\frac{1}{n}\sum_{i=1}^n \nabla_\theta l_i(\hat\theta)\nabla_\theta l_i(\hat\theta)'\Big)^{-1}\Big(\frac{1}{n}\sum_{i=1}^n D_i \hat W_i \hat\gamma_1'\nabla_\theta W_i(\hat\theta)\Big)'$$
where $\hat\epsilon_i = D_i(Y_i - \hat W_i'\hat\gamma_1)$.

In this section, we illustrate the finite sample properties of our estimators through simulation exercises.
Exogenous Variables
For simulation purposes, we imitate the environment of Dupas (2014). The network $G$ is constructed from the GPS data of Dupas (2014). Specifically, two households $i$ and $j$ are considered connected if they live within a 500-meter radius of each other. After removing isolated nodes, we have a sample size of 538. The instrumental variable $Z$ is also taken from Dupas (2014), where the binary $Z_i$ represents whether $i$ received a high level of subsidy or not. Summary statistics of $(G, Z)$ can be found in the next section. Throughout the simulation replications, $G$ and $Z$ are treated as fixed. We do not consider $X$.

Generating Endogenous Variables
Treatment choices are determined according to the following equation:
$$D_i = 1\{v_i \le \theta_1 + \theta_2 Z_i + \theta_3\pi_i^*\}$$
where $v_i \sim_{iid} N(0,1)$. We set $\theta = (\theta_1, \theta_2, \theta_3) = (-…, …, ….5)$, under which the probability of $D = 1$ is around 0.8. Since $|\theta_3| < 2.5$, there exists a unique equilibrium by Theorem 1. Given our parameter values, we can compute the unique equilibrium $\{\sigma_i^*(G,Z,\theta)\}_{i=1}^n$ by calculating the fixed point of the following system:
$$\sigma_i^*(G,Z,\theta) = \Phi\Big\{\theta_1 + \theta_2 Z_i + \theta_3\frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i}\sigma_j^*(G,Z,\theta)\Big\}, \quad \forall i \in \mathcal{N}_n.$$
$\pi_i^*$ is then computed by $\pi_i^*(G,Z,\theta) = \sum_{j \in \mathcal{N}_i}\sigma_j^*(G,Z,\theta)/|\mathcal{N}_i|$.

Outcomes are realized according to the following rule:
$$Y_i = \begin{cases} \alpha_{1i} + \beta_{1i}\pi_i^* & \text{if } D_i = 1 \\ \alpha_{0i} + \beta_{0i}\pi_i^* & \text{if } D_i = 0. \end{cases}$$
We generate the random coefficients according to
$$\alpha_{1i} \mid v_i \sim_{iid} N(2 + 0.… v_i, …), \qquad \beta_{1i} \mid v_i \sim_{iid} N(1 + 0.… v_i, …),$$
$$\alpha_{0i} \mid v_i \sim_{iid} N(4 + 0.… v_i, …), \qquad \beta_{0i} \mid v_i \sim_{iid} N(3 + 0.… v_i, …),$$
so that $(E[\alpha_{1i}], E[\beta_{1i}], E[\alpha_{0i}], E[\beta_{0i}])$, or $(\alpha_1, \beta_1, \alpha_0, \beta_0)$, is given as $(2, 1, 4, 3)$. The correlations between $(\alpha_{1i}, \beta_{1i}, \alpha_{0i}, \beta_{0i})$ and $v_i$ are given by $(\rho_{\alpha 1}, \rho_{\beta 1}, \rho_{\alpha 0}, \rho_{\beta 0}) = (0.…, ….…, ….…, ….2)$, so that $D_i$ is endogenous with respect to all coefficients.

Table 1: n = 538 with 3000 simulations. Target coverage probability is 0.95.

stage  coeff.       bias     se      cov.prob.
FS     $\theta_1$   …        …       …
FS     $\theta_2$   -0.034   0.181   0.937
FS     $\theta_3$   …        …       …
SS     $\alpha_1$   …        …       …
SS     $\beta_1$    -0.005   0.530   0.979
SS     $\alpha_0$   -0.004   0.333   0.959
SS     $\beta_0$    …        …       …

Table 1 reports the bias, standard errors, and coverage probabilities over 3000 replications. The target coverage probability is 0.95. As we observe from the first column, the bias of our estimators is negligible. Our estimators perform well in terms of coverage probabilities as well.

Malaria is a life-threatening infectious disease responsible for approximately 1-3 million deaths per year. Most of these deaths are in children less than five years of age in rural sub-Saharan Africa. The use of insecticide-treated nets (ITNs) has been shown to be a cost-effective way to control malaria. However, the rate of adoption remains low and many households exhibit low willingness to pay (WTP) for ITNs. In addition, positive health externalities generated from using ITNs render the private adoption level lower than the socially optimal one. For these reasons, public subsidy programs have been proposed to achieve the socially optimal coverage rate.

Table 2 (n = 583)

variable   definition             mean    min    max
degree     number of neighbors    16.41   1.00   38.00
Z          …                      …       …      …
D          …                      …       …      …
Y          …                      …       …      …

While it has been shown that distributing ITNs for free or at highly subsidized prices is effective in increasing adoption in the short run, there have been concerns that short-run, one-time subsidies would lower households' WTP for the product later, and thus reduce the adoption rate in the long run. This could happen, for instance, when there exist reference dependence effects in which households anchor their WTP to previously paid subsidized prices. Consequently, households may be unwilling to pay a higher price for the product later once the subsidies end.

On the other hand, some argue that short-run subsidies would be beneficial for long-run adoption since households could learn the benefits of the product better with prior experience. Such learning effects would increase consumers' future WTP.
Moreover, the adoption process can be facilitated by social learning effects in which households learn the benefits of the product from their neighbors' prior experiences. As a result, one-time subsidies would also be beneficial for the long-run adoption rate and households' WTP.

Since ITNs need to be regularly replaced and re-purchased, understanding the factors determining the short-run and long-run adoption decisions is an important task for sustainable public subsidy schemes. Depending on whether reference dependence or learning effects exist, the subsidy schemes would lead to different predictions for the short-run and long-run demand for ITNs. In this application, therefore, we study the factors affecting the short-run and long-run adoption (purchase) decisions for ITNs. In doing so, we allow for possible spillover effects in both the short-run and long-run adoption decisions. As Dupas (2014) showed, social interactions seem to play an important role in households' bednet purchase decisions. Depending on whether there exist positive or negative peer effects in the short run and in the long run, subsidy effectiveness may vary greatly.

Table 3: estimation results for FS model (n = 583)

variable           estimates   marginal effects   p-value
spillover ($\pi$)  2.308       0.661              0.000
subsidy            0.694       0.199              0.000
female-educ        0.223       0.064              0.026
wealth             0.005       0.001              0.001

Design of Experiment
We use data from a two-stage randomized pricing experimentconducted in Kenya by Dupas (2014). In Phase 1, households within six villages weregiven a voucher for the bednet at the randomly assigned subsidy level varying from 100%to 40% with the corresponding prices varying from 0 to 250 Ksh. In Phase 2, a year later,all study households in four villages were given a second voucher for a bednet. This time,however, all households faced the same subsidy level of 36%.
Data
Let $Z_i$ be a binary indicator representing that household $i$ received a high subsidy (defined as an assigned price of less than Ksh 50) in Phase 1. The treatment variable $D_i$ equals 1 if $i$ purchased a bednet in Phase 1. $Y_i$ is also binary, taking the value 1 if $i$ purchased a bednet in Phase 2. Following Dupas (2014), we may interpret $Y_i$ as a proxy for $i$'s WTP for the future bednet.

Network
Using GPS data, we construct the binarized spatial network. Two households $i$ and $j$ are considered connected (i.e., $G_{ij} = 1$) if they live within a 500-meter radius of each other. We also consider 250-m and 750-m radii. Since the results do not differ much, we only report results for the 500-m radius.

Other Covariates
For household pre-treatment covariates, we consider wealth and the education level of the female head.

Summary statistics of the variables can be found in Table 2. After deleting 25 isolated nodes, we have $n = 538$ observations from four villages.

Results on the short-run adoption
We first estimate the equation for the short-run adoption decision using our game-theoretic model. Table 3 displays the estimates of the coefficients and marginal effects, as well as associated standard errors and p-values. As anticipated, the high subsidy level is associated with higher adoption of the bednet. Education and wealth are also positively associated with the adoption decision in the short run. These variables are all significant at the 5 percent level. Figure 1 shows the estimated plot of $(\hat\sigma_i^*, \hat\pi_i^*)$ by the value of $Z_i$. The plot shows clearly that individual $Z_i$ is relevant for the treatment choice.

(Footnote: Marginal effects are computed as the sample average of conditional effects. For instance, the marginal effect of $Z_i$ is computed as $\frac{1}{n}\sum_{i=1}^n \phi\big(X_i'\hat\theta_1 + \hat\theta_2 Z_i + \hat\theta_3\pi_i^*(S,\hat\theta)\big)\hat\theta_2$.)

[Figure 1: estimated $(\hat\sigma_i^*, \hat\pi_i^*)$ by the value of $Z_i$]

Table 4: estimation results for SS model (n = 583)

D = 1               estimates   p-value     D = 0               estimates   p-value
cons                0.497       0.043       cons                0.128       0.174
female-educ         -0.094      0.530       female-educ         -0.070      0.519
wealth              0.003       0.388       wealth              -0.002      0.325
lambda              0.059       0.767       lambda              0.036       0.841
$\pi$               -0.347      0.324       $\pi$               -0.021      0.940
$\pi$*female-educ   0.031       0.906       $\pi$*female-educ   0.176       0.513
$\pi$*wealth        -0.003      0.610       $\pi$*wealth        0.013       0.098
$\pi$*lambda        0.317       0.375       $\pi$*lambda        -0.063      0.832

Our results show strong evidence of positive spillover effects in the short-run adoption decision. When the average adoption probability of neighbors ($\pi_i^*$) increases by 10 percentage points, $i$'s short-run adoption probability ($\sigma_i^*$) increases by 6.6 percentage points. The resulting conformity effects imply that if we ignore spillover effects in the specification, we would underestimate the full effect of the programs.

Results on the long-run adoption

Table 4 presents the estimates of the effects of own short-run adoption experience ($D_i$) and the average adoption probability of neighbors ($\pi_i^*$) on the long-run adoption decision.
Unfortunately, we have very limited statistical power, except for a few constants, due to the small sample size. However, in terms of magnitudes, the estimated coefficients have implications for the spillover effects in the long-run adoption decision.

Using formulas 16 and 18, we get the following estimated mean response functions:
$$\hat E[Y_i(1,\pi)] = 0.… - 0.…\,\pi, \qquad \hat E[Y_i(0,\pi)] = 0.… - 0.…\,\pi \tag{31}$$
First, let us consider $\hat E[Y_i(1,\pi)]$. Although the coefficient on $\pi$ is not significant, we observe considerable negative spillover effects in terms of magnitude: if $\pi$ increases by 10 percentage points, the second-period adoption probability decreases by 3.4 percentage points. This is contrary to the positive spillovers observed in the first-period adoption decision. One possible explanation for such negative spillovers in the treated response is that they result from positive health spillovers occurring over time. For instance, households with a higher value of $\pi$ would anticipate a higher coverage rate in their area, which would result in lower malaria prevalence in the long run. This might make households less likely to re-invest in the product later. Such results highlight the importance of distinguishing the mechanism of static spillovers from that of dynamic spillovers.

Such effects do not seem to apply to the untreated households, as $\hat E[Y_i(0,\pi)]$ shows. However, the statistical power is very limited.

Average Direct Effect
From 31, the average direct effect (ADE) of own short-run adoption on long-run adoption is computed as follows:

Ê[Y_i(1, π) − Y_i(0, π)] = 0. − . π    (32)

The result suggests that the ADE varies greatly with the value of $\pi$: when $\pi = 0$, treated households are 36.9 percentage points more likely to invest in the second bednet. However, this effect declines with the neighborhood exposure rate $\pi$. When $\pi = 1$, the effect is almost zero. The fact that the ADE is positive for all possible values of $\pi$ points to the existence of learning effects from prior experience, rather than

Footnote: Dupas (2014) also reports similar results from reduced-form regression models. Those results show that adoption in Phase 2 is negatively affected by the share of neighbors who received a high subsidy in Phase 1.
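As a small illustration of how an ADE that is linear in $\pi$ is read off two linear mean response functions, the following sketch evaluates the difference on a grid of exposure rates. The coefficient pairs are hypothetical values chosen for illustration, not the estimates reported in equations 31 and 32, and the function names are ours.

```python
def mean_response(intercept, slope, pi):
    """Linear mean response E[Y_i(d, pi)] = intercept + slope * pi."""
    return intercept + slope * pi


def average_direct_effect(treated, untreated, pi):
    """ADE(pi) = E[Y_i(1, pi)] - E[Y_i(0, pi)] for (intercept, slope) pairs."""
    return mean_response(*treated, pi) - mean_response(*untreated, pi)


# Hypothetical coefficients, for illustration only (NOT the paper's estimates).
treated = (0.40, -0.30)     # (intercept, slope on pi) when D_i = 1
untreated = (0.10, 0.00)    # (intercept, slope on pi) when D_i = 0

pi_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
ade = [average_direct_effect(treated, untreated, p) for p in pi_grid]
for p, effect in zip(pi_grid, ade):
    print(f"pi = {p:.2f}  ADE = {effect:+.3f}")
```

With these made-up numbers the ADE starts positive at $\pi = 0$ and shrinks toward zero as $\pi$ approaches one, which is the qualitative pattern described in the text.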
Bias from ignoring spillovers
Suppose that we falsely ignore spillover effects in responses. Using the conventional Heckit model, we obtain the following estimated average treatment effect (ATE):

Ê[Y_i(1) − Y_i(0)] = 0. .

The above result suggests that the effect of $D$ on $Y$ is very limited. However, as equation 32 shows, there is substantial heterogeneity in the effect of $D$ on $Y$ depending on the value of $\pi$: the effect of $D$ varies from almost 0 to 37 percentage points. Thus, by ignoring the spillover effects, we would draw the misleading conclusion that there is no treatment effect.

Observed heterogeneity in effects
Let us turn to the effect heterogeneity due to the observable covariates, education and wealth. For the treated, the effects of education and wealth on the adoption rate seem to be trivial in magnitude: the coefficients are close to zero and their associated p-values are large. We also compute the estimates without covariates. Their magnitudes resemble those with covariates, so we do not report them here. This also suggests that there is little observed heterogeneity in $E[Y_i(1,\pi)]$ in terms of education and wealth.

On the other hand, for the $D_i = 0$ case, the magnitudes of the estimates on the covariates are much larger than those for the $D_i = 1$ case. Consider education first. The interaction between $\pi$ and education suggests that higher education is associated with a higher spillover effect: one more year of education increases the effect of $\pi$ from $-0.02$ to $-0.02 + 0.17 = 0.15$. Similarly, if the wealth level increases by 1000 units, the effect on $\pi$ increases by 1 .

One advantage of our structural approach is that it allows researchers to simulate counterfactual policies. Suppose that a policy-maker is interested in implementing a means-tested subsidy scheme where $Z$ is determined according to the following rule:

\[ Z_i = \mathbb{1}\{\text{wealth}_i \le \tau\}, \quad \forall i \in \mathcal{N}_n \tag{33} \]

i.e., household $i$ gets the high subsidy only when its wealth level is below some specified threshold $\tau$. The question is: what would be the expected outcome under this new, counterfactual subsidy rule?

This problem is related to the literature on policy-relevant treatment effects (PRTE: Heckman and Vytlacil (2001)). In this framework, each intervention or policy is defined by a manipulation of the exogenous variable $S = (G, X, Z)$. In our setup, we assume that a policy-maker has no means of changing the underlying network structure $G$ or the pre-treatment covariates $X$. Thus, the only way to change $S$ is through changing $Z$. Let us denote the new counterfactual policy as $S^{new} = (G, X, Z^{new})$, where we set the value of $Z$ to $Z = Z^{new}$, which is not in the data. $i$'s expected outcome under the new policy is given by $E[Y_i \mid S = S^{new}]$. Note that for any $S$,

\[ E[Y_i \mid S] = E[Y_i \mid D_i = 1, S]\,\Pr(D_i = 1 \mid S) + E[Y_i \mid D_i = 0, S]\,\Pr(D_i = 0 \mid S) \tag{34} \]

Under our control function specification, $E[Y_i \mid S]$ can be written as follows:

\[
\begin{aligned}
E[Y_i \mid S] ={}& \sigma_i^*(S)\Big[X_i'\alpha^1 + \lambda^1_\alpha(\sigma_i^*(S)) + \big\{X_i'\beta^1 + \lambda^1_\beta(\sigma_i^*(S))\big\}\,\pi_i^*(S)\Big] \\
&+ (1-\sigma_i^*(S))\Big[X_i'\alpha^0 + \lambda^0_\alpha(\sigma_i^*(S)) + \big\{X_i'\beta^0 + \lambda^0_\beta(\sigma_i^*(S))\big\}\,\pi_i^*(S)\Big] \\
={}& E[Y_i \mid X_i, \sigma_i^*(S), \pi_i^*(S)]
\end{aligned}
\]

where the superscripts 1 and 0 index the treated ($D_i = 1$) and untreated ($D_i = 0$) response equations. Note that $E[Y_i \mid S]$ is a function of $S$ only through $(X_i, \sigma_i^*(S), \pi_i^*(S))$; thus we write $E[Y_i \mid X_i, \sigma_i^*(S), \pi_i^*(S)]$.
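The counterfactual calculation in this section can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: a single covariate (wealth) enters the choice index, the fixed point is found by plain iteration of the best-response map (which converges when the map is a contraction), and the conditional mean functions `mu1` and `mu0` are hypothetical stand-ins for the estimated $E[Y_i \mid D_i = d, X_i, \sigma_i^*, \pi_i^*]$. All parameter values below are made up.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def neighbor_mean(values, neighbors, i):
    """Average of `values` over i's neighbors (0.0 if i has none, an assumption)."""
    nbrs = neighbors[i]
    return sum(values[j] for j in nbrs) / len(nbrs) if nbrs else 0.0


def solve_equilibrium(x, z, neighbors, th_x, th_z, th_pi, tol=1e-12, max_iter=10_000):
    """Iterate sigma_i <- Phi(th_x*x_i + th_z*z_i + th_pi * mean_j sigma_j) to a fixed point."""
    sigma = [0.5] * len(z)
    for _ in range(max_iter):
        new = [PHI(th_x * x[i] + th_z * z[i] + th_pi * neighbor_mean(sigma, neighbors, i))
               for i in range(len(z))]
        if max(abs(a - b) for a, b in zip(new, sigma)) < tol:
            return new
        sigma = new
    raise RuntimeError("fixed-point iteration did not converge")


def counterfactual_mean(wealth, neighbors, tau, th_x, th_z, th_pi, mu1, mu0):
    """Average expected outcome under the rule Z_i = 1{wealth_i <= tau}."""
    z_new = [1.0 if w <= tau else 0.0 for w in wealth]
    sigma = solve_equilibrium(wealth, z_new, neighbors, th_x, th_z, th_pi)
    total = 0.0
    for i, s in enumerate(sigma):
        pi_i = neighbor_mean(sigma, neighbors, i)
        # Mixture over own take-up, mirroring the decomposition in equation 34.
        total += s * mu1(wealth[i], s, pi_i) + (1.0 - s) * mu0(wealth[i], s, pi_i)
    return total / len(sigma)


def mu1(x, s, p):
    """Hypothetical treated conditional mean (illustration only)."""
    return 0.5 - 0.3 * p


def mu0(x, s, p):
    """Hypothetical untreated conditional mean (illustration only)."""
    return 0.1


# Toy inputs: four households on a ring network, made-up first-stage parameters.
wealth = [0.2, 0.8, 1.5, 3.0]
neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]
y_bar = counterfactual_mean(wealth, neighbors, tau=1.0,
                            th_x=0.1, th_z=0.8, th_pi=1.0, mu1=mu1, mu0=mu0)
```

Repeating the last call over a grid of thresholds `tau` would trace out a curve of the kind plotted against $\tau$ in the results that follow.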
$i$'s expected outcome under the new policy is then given by $E[Y_i \mid X_i, \sigma_i^*(S^{new}), \pi_i^*(S^{new})]$.

To estimate this, we first need to compute the new equilibrium choice probabilities $\{\sigma_i^*(G, X, Z^{new})\}_{i \in \mathcal{N}_n}$, where $Z^{new}$ is determined according to 33. Under the identified first-stage parameters, this is done by solving for the new fixed point of the best-response functions under the new data set $S^{new} = (G, X, Z^{new})$. We then estimate $\hat{Y}_i \equiv \widehat{E}[Y_i \mid X_i, \sigma_i^*(S^{new}), \pi_i^*(S^{new})]$ for each $i \in \mathcal{N}_n$ using the formula above. The overall impact of policy $S^{new}$ is computed by $\sum_{i=1}^{n} \hat{Y}_i / n$.

Results
See Figure 2. The red line shows the effect of $\tau$ on the overall long-run adoption level when we ignore interference effects. In that case, as $\tau$ increases, the long-run adoption level increases monotonically. This is because as $\tau$ increases, more households get the subsidy and, without interference, treated agents are more likely to adopt in the long run.

In the presence of spillovers, the effect of $\tau$ no longer increases monotonically, as the blue line shows. A higher $\tau$ also induces a higher $\pi_i^*$, which affects long-run adoption negatively. Therefore, a priori, we cannot expect that a higher $\tau$ would give a higher overall long-run adoption rate in the population. In fact, as the blue line shows, the highest long-run adoption rate is achieved under the subsidy scheme targeting the very lowest percentile households.

Figure 2: counterfactual impact of means-tested subsidy on LR-adoption

The result also highlights the complication involved in using subsidies to increase the long-run adoption rate. As the result shows, the highest expected coverage is only 17 percent.

In this paper, we propose a new methodological framework to analyze randomized experiments with spillovers and noncompliance in a general network setup. Using a game-theoretic framework, we allow for spillover effects to occur at two stages: the choice stage and the outcome stage. Potential outcomes are modeled as a random coefficient model to account for general unobserved heterogeneity. We extend the traditional control function estimator of Heckman (1979) to incorporate spillovers. Finally, we illustrate our methods using the Dupas (2014) data and show that our model can be used to evaluate counterfactual policies.

In our treatment choice games, we assumed that private information is independently distributed across agents. Relaxing this assumption to allow for network dependence in private information would be a rewarding task.
Another important issue is multiple equilibria: formalizing the problem of policy evaluation and counterfactual prediction in the presence of multiple equilibria is important for realistic policy design. Finally, we conclude by noting that our model can be used to derive an ex ante optimal treatment assignment rule under interference, especially in settings where a social planner should take possible noncompliance and spillovers into account.

Appendix

A Proof of Theorem 1
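Following Xu (2018), we show this by contradiction. Define $\bar\sigma_i = \sum_{j \in \mathcal{N}_i} \sigma_j / |\mathcal{N}_i|$. Let $\Gamma(X_i, Z_i, \bar\sigma_i, \theta) = \Phi(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \bar\sigma_i)$ be $i$'s best-response function given inputs $(X_i, Z_i, \bar\sigma_i)$ and parameter value $\theta$. Suppose there are two distinct equilibria $\sigma^* = (\sigma_i^*)_{i \in \mathcal{N}_n}$ and $\sigma^+ = (\sigma_i^+)_{i \in \mathcal{N}_n}$. By definition, they satisfy

\[ \sigma_i^* = \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta), \ \forall i \in \mathcal{N}_n \quad\text{and}\quad \sigma_i^+ = \Gamma(X_i, Z_i, \bar\sigma_i^+, \theta), \ \forall i \in \mathcal{N}_n. \]

Taking the difference and applying the mean-value theorem, we have

\[ \sigma_i^* - \sigma_i^+ = \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta) - \Gamma(X_i, Z_i, \bar\sigma_i^+, \theta) = \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^m, \theta)}{\partial \bar\sigma_i}\,(\bar\sigma_i^* - \bar\sigma_i^+) \]

where $\bar\sigma_i^m$ is a mean value between $\bar\sigma_i^*$ and $\bar\sigma_i^+$. Taking absolute values,

\[ |\sigma_i^* - \sigma_i^+| \le \Big|\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^m, \theta)}{\partial \bar\sigma_i}\Big| \cdot |\bar\sigma_i^* - \bar\sigma_i^+| \tag{35} \]
\[ \phantom{|\sigma_i^* - \sigma_i^+|} \le \Big|\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^m, \theta)}{\partial \bar\sigma_i}\Big| \cdot \max_{j \in \mathcal{N}_i} |\sigma_j^* - \sigma_j^+|. \tag{36} \]

From the definition of $\Gamma(\cdot)$, observe that

\[ \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i, \theta)}{\partial \bar\sigma_i} = \frac{\partial \Phi(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \bar\sigma_i)}{\partial \bar\sigma_i} = \phi\big(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \bar\sigma_i\big)\,\theta_3. \]

Thus,

\[ \Big|\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^m, \theta)}{\partial \bar\sigma_i}\Big| \le |\theta_3| \sup_u \phi(u) \equiv \lambda. \tag{37} \]

Therefore we can write 36 as

\[ |\sigma_i^* - \sigma_i^+| \le \lambda \max_{j \in \mathcal{N}_i} |\sigma_j^* - \sigma_j^+|. \]

Taking the maximum over $i \in \mathcal{N}_n$ on both sides gives

\[ \max_{i \in \mathcal{N}_n} |\sigma_i^* - \sigma_i^+| \le \lambda \max_{i \in \mathcal{N}_n} \max_{j \in \mathcal{N}_i} |\sigma_j^* - \sigma_j^+| \le \lambda \max_{k \in \mathcal{N}_n} |\sigma_k^* - \sigma_k^+|, \]

which, since the two equilibria are distinct and the left-hand side is therefore strictly positive, leads to a contradiction when $\lambda < 1$. $\blacksquare$

B Proofs for Asymptotic Results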
B.1 Proof of consistency of first-stage estimators
Let $l_i(\theta) \equiv D_i \ln \sigma_i^*(S,\theta) + (1 - D_i)\ln(1 - \sigma_i^*(S,\theta))$ be the individual log-likelihood function of $i$. Then $\hat{L}_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} l_i(\theta)$. Define

\[ L_n(\theta) = E[\hat{L}_n(\theta) \mid S] \]

where the population objective function, $L_n(\theta)$, depends on $n$ through the public state $S = (G, X, Z)$. Recall that the true parameter is denoted by $\theta^0$. Following Gallant and White (1988) Theorem 3.3, we establish consistency by showing identifiable uniqueness and uniform convergence.

Identifiable Uniqueness
We show that $\liminf_{n\to\infty}\,(L_n(\theta^0) - L_n(\theta)) > 0$ for any $\theta$ such that $|\theta - \theta^0| \ge \epsilon > 0$. Note that

\[
\begin{aligned}
\liminf_{n\to\infty}\,(L_n(\theta^0) - L_n(\theta))
&= \liminf_{n\to\infty} -\frac{1}{n}\sum_{i=1}^{n} E\Big[D_i \ln\frac{\sigma_i^*(S,\theta)}{\sigma_i^*(S,\theta^0)} + (1 - D_i)\ln\frac{1-\sigma_i^*(S,\theta)}{1-\sigma_i^*(S,\theta^0)} \,\Big|\, S\Big] \\
&= \liminf_{n\to\infty} -\frac{1}{n}\sum_{i=1}^{n} \Big[\sigma_i^*(S,\theta^0)\ln\frac{\sigma_i^*(S,\theta)}{\sigma_i^*(S,\theta^0)} + (1-\sigma_i^*(S,\theta^0))\ln\frac{1-\sigma_i^*(S,\theta)}{1-\sigma_i^*(S,\theta^0)}\Big] \\
&\ge \liminf_{n\to\infty} -\frac{1}{n}\sum_{i=1}^{n} \ln\big(\sigma_i^*(S,\theta) + 1 - \sigma_i^*(S,\theta)\big) = 0.
\end{aligned}
\]

The second equality follows from $E[D_i \mid S] = \sigma_i^*(S,\theta^0)$ and the last weak inequality is due to Jensen's inequality. To show that the inequality holds strictly, we need to rule out the case $\liminf_{n\to\infty}\,(L_n(\theta^0) - L_n(\theta)) = 0$. This happens when, for some large enough $n$, $\sigma_i^*(S,\theta) = \sigma_i^*(S,\theta^0)$ for all $i \in \mathcal{N}_n = \{1, 2, \cdots, n\}$, i.e., there exists $n$ that delivers observationally equivalent choice probabilities.

Suppose this is the case. By the fixed point requirement, the following needs to be satisfied for any $\theta$, including the true parameter $\theta^0$:

\[ \Phi^{-1}(\sigma_i^*(S,\theta)) = X_i'\theta_1 + \theta_2 Z_i + \theta_3 \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \sigma_j^*(S,\theta), \quad \forall i \in \mathcal{N}_n \]

and

\[ \Phi^{-1}(\sigma_i^*(S,\theta^0)) = X_i'\theta_1^0 + \theta_2^0 Z_i + \theta_3^0 \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \sigma_j^*(S,\theta^0), \quad \forall i \in \mathcal{N}_n. \]

If $\sigma_i^*(S,\theta) = \sigma_i^*(S,\theta^0)$, $\forall i \in \mathcal{N}_n$, we have

\[ X_i'(\theta_1 - \theta_1^0) + Z_i(\theta_2 - \theta_2^0) + (\theta_3 - \theta_3^0)\frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \sigma_j^*(S,\theta^0) = 0, \quad \forall i \in \mathcal{N}_n. \]

Equivalently, $R_i'(\theta - \theta^0) = 0$, $\forall i \in \mathcal{N}_n$, where $R_i$ is defined as in Theorem 2. It follows that $(\theta - \theta^0)' \sum_{i=1}^{n} R_i R_i' (\theta - \theta^0) = 0$. Given the assumption that $\sum_{i=1}^{n} R_i R_i'$ is positive definite for all large enough $n$, the above equation holds only under $\theta = \theta^0$, leading to a contradiction. $\square$

Next, we verify that $\sup_{\theta \in \Theta} |\hat{L}_n(\theta) - L_n(\theta)| \xrightarrow{p} 0$. We first show that pointwise convergence holds. Uniform convergence then follows from Lipschitz conditions.
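The Jensen-inequality step in the identifiable-uniqueness argument can be checked numerically. The sketch below evaluates the population objective $L_n$ for given vectors of choice probabilities and confirms that the gap $L_n(\theta^0) - L_n(\theta)$ is nonnegative, and zero when the two probability vectors coincide. The probability vectors are hypothetical inputs; in the model they would be equilibrium probabilities $\sigma_i^*(S,\cdot)$ at two parameter values.

```python
import math


def expected_loglik(sigma_true, sigma_model):
    """L_n at a candidate: (1/n) * sum_i [s0*ln(s) + (1 - s0)*ln(1 - s)],
    where s0 is the true choice probability and s the candidate's."""
    terms = (s0 * math.log(s) + (1.0 - s0) * math.log(1.0 - s)
             for s0, s in zip(sigma_true, sigma_model))
    return sum(terms) / len(sigma_true)


# Hypothetical probabilities (illustration only).
sigma0 = [0.2, 0.5, 0.9]      # stands in for sigma*_i(S, theta^0)
sigma_alt = [0.3, 0.5, 0.7]   # stands in for sigma*_i(S, theta) at another theta

gap = expected_loglik(sigma0, sigma0) - expected_loglik(sigma0, sigma_alt)
gap_same = expected_loglik(sigma0, sigma0) - expected_loglik(sigma0, sigma0)
print(gap, gap_same)
```

The gap is an average of Kullback-Leibler divergences between Bernoulli distributions, which is why the remaining work in the proof is to show that equal probabilities for all $i$ force $\theta = \theta^0$.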
Pointwise Convergence
We first show that for any $\theta \in \Theta$, $|\hat{L}_n(\theta) - L_n(\theta)| \xrightarrow{p} 0$. It can be shown that

\[ \hat{L}_n(\theta) - L_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \underbrace{\big(D_i - \sigma_i^*(S,\theta^0)\big)\ln\frac{\sigma_i^*(S,\theta)}{1 - \sigma_i^*(S,\theta)}}_{\zeta_i}. \]

$\{\zeta_i\}_{i=1}^{n}$ is conditionally independent with mean zero given $S$. It is also uniformly bounded due to Lemma 1. Therefore we can apply a LLN for independent observations (e.g., Markov) and the result follows.

Uniform Convergence
Given the pointwise convergence result, uniform convergence follows if we can establish that $\{\hat{L}_n(\theta) - L_n(\theta)\}_n$ is stochastically equicontinuous on $\Theta$ (Theorem 1 in Andrews (1992)). A sufficient condition for this is that the summand in the sample objective function, $\{l_i(\theta)\}$, is Lipschitz (Assumption W-LIP in Andrews (1992)). Note that

\[ \nabla_\theta l_i(\theta) = D_i \frac{\nabla_\theta \sigma_i^*(S,\theta)}{\sigma_i^*(S,\theta)} + (1 - D_i)\frac{-\nabla_\theta \sigma_i^*(S,\theta)}{1 - \sigma_i^*(S,\theta)} \]

which is bounded by

\[ |\nabla_\theta l_i(\theta)| \le \Big|\frac{\nabla_\theta \sigma_i^*(S,\theta)}{\sigma_i^*(S,\theta)}\Big| + \Big|\frac{\nabla_\theta \sigma_i^*(S,\theta)}{1 - \sigma_i^*(S,\theta)}\Big|. \]

By Lemma 1 and Lemma 2, $\sigma_i^*(S,\theta)$ and $\nabla_\theta \sigma_i^*(S,\theta)$ are uniformly bounded. Therefore $\{l_i(\theta)\}$ is Lipschitz continuous and the result follows. $\blacksquare$

B.2 Proof of asymptotic normality of first-stage estimators

$\hat\theta$ should satisfy the first-order condition for maximization: $\nabla_\theta \hat{L}_n(\hat\theta) = 0$. Given that $\hat{L}_n(\theta)$ is smooth, we can apply the mean-value theorem to the first-order condition around the true parameter $\theta^0$:

\[ \nabla_\theta \hat{L}_n(\hat\theta) = \nabla_\theta \hat{L}_n(\theta^0) + \nabla_{\theta\theta} \hat{L}_n(\bar\theta)(\hat\theta - \theta^0) = 0 \tag{38} \]
\[ \iff \sqrt{n}(\hat\theta - \theta^0) = -\big(\nabla_{\theta\theta} \hat{L}_n(\bar\theta)\big)^{-1} \sqrt{n}\,\nabla_\theta \hat{L}_n(\theta^0) \tag{39} \]

where $\bar\theta$ is a mean value on the line joining $\hat\theta$ and $\theta^0$. Define the Hessian matrix as

\[ H_n(\theta) = E\Big[\frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\theta) \,\Big|\, S\Big] \]

and the information matrix as

\[ \mathcal{I}_n(\theta) = E\Big[\frac{1}{n}\sum_{i=1}^{n} \nabla_\theta l_i(\theta) \nabla_\theta l_i(\theta)' \,\Big|\, S\Big]. \]

We first show that $\nabla_{\theta\theta} \hat{L}_n(\bar\theta) - H_n(\theta^0) \xrightarrow{p} 0$ (LLN of the Hessian matrix) and then that $\sqrt{n}\,\mathcal{I}_n^{-1/2}(\theta^0) \nabla_\theta \hat{L}_n(\theta^0) \xrightarrow{d} N(0, I_{\dim(\theta)})$ (CLT on the score).

LLN of the Hessian Matrix
We show that $\nabla_{\theta\theta} \hat{L}_n(\bar\theta) - H_n(\theta^0) \xrightarrow{p} 0$. Note that

\[ \nabla_{\theta\theta} \hat{L}_n(\bar\theta) - H_n(\theta^0) = \underbrace{\frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\bar\theta) - \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\theta^0)}_{A} + \underbrace{\frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\theta^0) - E\Big[\frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta\theta} l_i(\theta^0) \,\Big|\, S\Big]}_{B} \]

First, $A = o_p(1)$ since $\hat\theta - \theta^0 \xrightarrow{p} 0$ and $\nabla_{\theta\theta} l_i(\cdot)$ is continuous as a result of Lemma 3. Next, note that

\[ B = \frac{1}{n}\sum_{i=1}^{n} \underbrace{\Big\{\nabla_{\theta\theta} l_i(\theta^0) - E\big[\nabla_{\theta\theta} l_i(\theta^0) \,\big|\, S\big]\Big\}}_{\xi_i} \]

$\{\xi_i\}$ is independent conditional on $S$ with mean zero. Also, by Lemma 3, it is uniformly bounded. Therefore, by a LLN for independent observations, $B = o_p(1)$.

CLT on the Score
Note that $\sqrt{n}\,\nabla_\theta \hat{L}_n(\theta^0) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \nabla_\theta l_i(\theta^0)$ and that $\{\nabla_\theta l_i(\theta^0)\}$ is independently distributed conditional on $S$ with the uniformly bounded conditional variance $\mathcal{I}_n(\theta^0)$. Therefore we can apply Lyapunov's CLT for independent observations to get $\sqrt{n}\,\mathcal{I}_n^{-1/2}(\theta^0) \nabla_\theta \hat{L}_n(\theta^0) \xrightarrow{d} N(0, I)$.

Combining all these results, we see that equation 39 can be written as

\[ \sqrt{n}(\hat\theta - \theta^0) = -\big(H_n(\theta^0) + o_p(1)\big)^{-1} \mathcal{I}_n(\theta^0)^{1/2}\, \sqrt{n}\,\mathcal{I}_n(\theta^0)^{-1/2} \nabla_\theta \hat{L}_n(\theta^0) \]

By the information matrix equality, when the model is correctly specified, $H_n(\theta^0) = -\mathcal{I}_n(\theta^0)$, so that we have

\[ \sqrt{n}(\hat\theta - \theta^0) = \big(\mathcal{I}_n(\theta^0) + o_p(1)\big)^{-1} \mathcal{I}_n(\theta^0)^{1/2}\, \sqrt{n}\,\mathcal{I}_n(\theta^0)^{-1/2} \nabla_\theta \hat{L}_n(\theta^0) \]

Under the assumption that $\mathcal{I}_n(\theta^0)$ is nonsingular, we get the desired result:

\[ \sqrt{n}\,\big(\mathcal{I}_n^{-1}(\theta^0)\big)^{-1/2}(\hat\theta - \theta^0) \xrightarrow{d} N(0, I_{\dim(\theta)}). \quad\blacksquare \]

B.3 Proof of consistency of second-stage estimators

Our estimators are based on the following moment conditions:

\[ E[Y_i \mid D_i = 1, S] = W_i'\gamma_1, \qquad E[Y_i \mid D_i = 0, S] = W_i'\gamma_0 \]

Let us focus on the $\hat\gamma_1$ case, as the $\hat\gamma_0$ case can be analyzed in an analogous way. Given the moment condition $E[Y_i \mid D_i = 1, S] = W_i'\gamma_1$, we write the equation in error form as

\[ Y_i = W_i'\gamma_1 + \epsilon_i, \qquad E[\epsilon_i \mid D_i = 1, S] = 0. \]
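In computational terms, the estimator analyzed in this subsection is least squares of $Y_i$ on the generated regressors $\hat{W}_i$ using only the units with $D_i = 1$. The sketch below illustrates this for the simplest possible case, assuming $\hat{W}_i = (1, \hat\pi_i)'$ with no covariates or control-function terms; that restriction, the function name, and the toy data are ours and serve only to show the mechanics.

```python
def treated_ols(y, d, pi_hat):
    """Least squares of y on (1, pi_hat) using only units with d == 1.

    Closed-form version of (sum_i D_i W_i W_i')^{-1} sum_i D_i W_i Y_i
    for the two-dimensional regressor W_i = (1, pi_hat_i)'.
    """
    rows = [(p, yi) for yi, di, p in zip(y, d, pi_hat) if di == 1]
    n1 = len(rows)
    p_bar = sum(p for p, _ in rows) / n1
    y_bar = sum(yi for _, yi in rows) / n1
    sxx = sum((p - p_bar) ** 2 for p, _ in rows)
    sxy = sum((p - p_bar) * (yi - y_bar) for p, yi in rows)
    slope = sxy / sxx
    return y_bar - slope * p_bar, slope


# Toy data (made up): treated outcomes lie exactly on 0.5 - 0.3 * pi_hat,
# untreated outcomes are arbitrary and must not influence the fit.
pi_hat = [0.1, 0.4, 0.7, 0.9, 0.2, 0.6]
d = [1, 1, 1, 1, 0, 0]
y = [0.47, 0.38, 0.29, 0.23, 0.9, 0.9]

intercept, slope = treated_ols(y, d, pi_hat)
```

Because $\hat\pi_i$ is itself computed from the first-stage estimate $\hat\theta$, the sampling error in $\hat\theta$ carries over to the second stage, which is what the correction terms in the asymptotic variance account for.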
The estimator for $\gamma_1$ is defined as

\[ \hat\gamma_1 = \arg\min_\gamma \frac{1}{n}\sum_{i=1}^{n} D_i\big(Y_i - \hat{W}_i'\gamma\big)^2 \tag{40} \]
\[ \phantom{\hat\gamma_1} = \arg\min_\gamma \frac{1}{n}\sum_{i=1}^{n} \big(D_i Y_i - D_i \hat{W}_i'\gamma\big)^2 \tag{41} \]
\[ \phantom{\hat\gamma_1} = \Big\{\sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i'\Big\}^{-1} \sum_{i=1}^{n} D_i \hat{W}_i Y_i \tag{42} \]

Note that $D_i Y_i = D_i Y_i(1, \pi_i^*(S,\theta^0)) = D_i(W_i'\gamma_1 + \epsilon_i) = D_i\big(\hat{W}_i'\gamma_1 + \epsilon_i - (\hat{W}_i - W_i)'\gamma_1\big)$. Plugging this into 42 gives

\[ \hat\gamma_1 = \Big(\sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i'\Big)^{-1} \sum_{i=1}^{n} D_i \hat{W}_i \big(\hat{W}_i'\gamma_1 + \epsilon_i - (\hat{W}_i - W_i)'\gamma_1\big) = \gamma_1 + \Big(\sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i'\Big)^{-1} \sum_{i} D_i \hat{W}_i \big(\epsilon_i - (\hat{W}_i - W_i)'\gamma_1\big) \]

so that

\[ \hat\gamma_1 - \gamma_1 = \Big(\underbrace{\frac{1}{n}\sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i'}_{A}\Big)^{-1} \underbrace{\frac{1}{n}\sum_{i} D_i \hat{W}_i \big(\epsilon_i - (\hat{W}_i - W_i)'\gamma_1\big)}_{B} = A^{-1}B. \tag{43} \]

Part A
We show that $\frac{1}{n}\sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i' - E[\frac{1}{n}\sum_{i=1}^{n} D_i W_i W_i' \mid S] = o_p(1)$. Decompose the left-hand side into two parts as follows:

\[ \underbrace{\frac{1}{n}\sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i' - \frac{1}{n}\sum_{i=1}^{n} D_i W_i W_i'}_{(a)} + \underbrace{\frac{1}{n}\sum_{i=1}^{n} D_i W_i W_i' - \frac{1}{n}\sum_{i=1}^{n} E[D_i W_i W_i' \mid S]}_{(b)}. \]

$(a) = o_p(1)$ since $\hat\theta - \theta^0 \xrightarrow{p} 0$ and $W_i(\theta)$ is continuous in $\theta$. For $(b)$, note that the summand $\{D_i W_i W_i' - E[D_i W_i W_i' \mid S]\}$ is conditionally independent given $S$ with mean zero. It is also uniformly bounded. Therefore, by a LLN, $(b) = o_p(1)$. Finally, invertibility of $E[\frac{1}{n}\sum_{i=1}^{n} D_i W_i W_i' \mid S]$ follows from the identification condition.

Part B
Since $\hat{W}_i - W_i = o_p(1)$, we can write $B$ as

\[ \frac{1}{n}\sum_{i=1}^{n} D_i (W_i + o_p(1))(\epsilon_i - o_p(1)) = \frac{1}{n}\sum_{i=1}^{n} D_i W_i \epsilon_i + o_p(1) \]

A similar argument as above shows that

\[ \frac{1}{n}\sum_{i=1}^{n} \Big(D_i W_i \epsilon_i - E[D_i W_i \epsilon_i \mid S]\Big) = o_p(1). \]

It follows from the moment condition $E[\epsilon_i \mid D_i = 1, S] = 0$ that $E[D_i W_i \epsilon_i \mid S] = 0$. Therefore we conclude that

\[ B = \frac{1}{n}\sum_{i=1}^{n} D_i W_i \epsilon_i + o_p(1) = o_p(1). \]

Combining this with the result on part $A$, we conclude that $\hat\gamma_1 - \gamma_1 = o_p(1)$. $\blacksquare$

B.4 Proof of asymptotic normality of second-stage estimators
From 43,

\[ \sqrt{n}(\hat\gamma_1 - \gamma_1) = \Big(\frac{1}{n}\sum_{i=1}^{n} D_i \hat{W}_i \hat{W}_i'\Big)^{-1} \frac{1}{\sqrt{n}}\sum_{i} D_i \hat{W}_i \Big(\epsilon_i - \gamma_1'(\hat{W}_i - W_i)\Big) \tag{44} \]
\[ \phantom{\sqrt{n}(\hat\gamma_1 - \gamma_1)} = \Big(E\Big[\frac{1}{n}\sum_{i=1}^{n} D_i W_i W_i' \,\Big|\, S\Big] + o_p(1)\Big)^{-1} \underbrace{\frac{1}{\sqrt{n}}\sum_{i} D_i \hat{W}_i \Big(\epsilon_i - \gamma_1'(\hat{W}_i - W_i)\Big)}_{C} \tag{45} \]

where the last step has been established in the previous section. Consider the term $\hat{W}_i - W_i$ in $C$. Since the generated regressor depends on the first-stage parameter, the mean-value theorem gives

\[ \hat{W}_i - W_i = W_i(\hat\theta) - W_i(\theta^0) = \nabla_\theta W_i(\bar\theta)(\hat\theta - \theta^0) \implies \sqrt{n}(\hat{W}_i - W_i) = \nabla_\theta W_i(\bar\theta)\,\sqrt{n}(\hat\theta - \theta^0) \]

where $\bar\theta$ is a mean value on the line joining $\hat\theta$ and $\theta^0$. By the asymptotic normality of the first-step estimator $\hat\theta$ as in equation 30, we can show that $\sqrt{n}(\hat\theta - \theta^0)$ is asymptotically linear. Specifically, define the influence function as $\eta_i = E[\frac{1}{n}\sum_{i=1}^{n} \nabla_\theta l_i(\theta^0) \nabla_\theta l_i(\theta^0)' \mid S]^{-1} \nabla_\theta l_i(\theta^0)$; then

\[ \sqrt{n}(\hat\theta - \theta^0) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \eta_i + o_p(1). \]

Therefore the term $C$ in $\sqrt{n}(\hat\gamma_1 - \gamma_1)$ can be written as

\[
\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{i} D_i \hat{W}_i \big(\epsilon_i - \gamma_1'(\hat{W}_i - W_i)\big)
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} D_i \hat{W}_i \epsilon_i - \frac{1}{n}\sum_{i=1}^{n} D_i \hat{W}_i \gamma_1' \sqrt{n}(\hat{W}_i - W_i) \\
&= \underbrace{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} D_i \hat{W}_i \epsilon_i}_{C(a)} - \underbrace{\Big\{\frac{1}{n}\sum_{i=1}^{n} D_i \hat{W}_i \gamma_1' \nabla_\theta W_i(\bar\theta)\Big\}}_{C(b)} \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \eta_i + o_p(1)
\end{aligned}
\]

We first show that $C(a)$ can be replaced by $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} D_i W_i \epsilon_i$ and that $C(b)$ can be replaced by $E[\frac{1}{n}\sum_{i=1}^{n} D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0) \mid S]$.

Part C(a)
We show that

\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \Big(D_i \hat{W}_i \epsilon_i - D_i W_i \epsilon_i\Big) \xrightarrow{p} 0. \]

Note that

\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n} D_i (\hat{W}_i - W_i)\epsilon_i = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} D_i \nabla_\theta W_i(\bar\theta)(\hat\theta - \theta^0)\epsilon_i \tag{46} \]
\[ = \frac{1}{n}\sum_{i=1}^{n} D_i \nabla_\theta W_i(\bar\theta)\,\sqrt{n}(\hat\theta - \theta^0)\,\epsilon_i \tag{47} \]
\[ = \frac{1}{n}\sum_{i=1}^{n} D_i \nabla_\theta W_i(\bar\theta)\Big(\frac{1}{\sqrt{n}}\sum_{j=1}^{n} \eta_j\Big)\epsilon_i + o_p(1) \tag{48} \]
\[ = \Big(\frac{1}{n}\sum_{i=1}^{n} D_i \nabla_\theta W_i(\bar\theta)\epsilon_i\Big)\frac{1}{\sqrt{n}}\sum_{j=1}^{n} \eta_j + o_p(1) \tag{49} \]

It can be shown easily that $\frac{1}{n}\sum_{i=1}^{n} \big(D_i \nabla_\theta W_i(\theta^0)\epsilon_i - E[D_i \nabla_\theta W_i(\theta^0)\epsilon_i \mid S]\big) \xrightarrow{p} 0$ and that $E[D_i \nabla_\theta W_i(\theta^0)\epsilon_i \mid S] = 0$ from the moment condition. Therefore equation 49 becomes $o_p(1) \times O_p(1)$ and the result follows. $\blacksquare$

Part C(b)
We show that

\[ \frac{1}{n}\sum_{i=1}^{n} D_i \hat{W}_i \gamma_1' \nabla_\theta W_i(\bar\theta) - E\Big[\frac{1}{n}\sum_{i=1}^{n} D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0) \,\Big|\, S\Big] = o_p(1). \]

Decompose the LHS as

\[ \underbrace{\frac{1}{n}\sum_{i=1}^{n} D_i \hat{W}_i \gamma_1' \nabla_\theta W_i(\bar\theta) - \frac{1}{n}\sum_{i=1}^{n} D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0)}_{A} + \underbrace{\frac{1}{n}\sum_{i=1}^{n} D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0) - E\Big[\frac{1}{n}\sum_{i=1}^{n} D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0) \,\Big|\, S\Big]}_{B}. \]

$A = o_p(1)$ since $\hat\theta - \theta^0 \xrightarrow{p} 0$. Also, since $\{D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0)\}$ are conditionally independent given $S$ and uniformly bounded, we can apply the Markov LLN to show that $B = o_p(1)$. $\blacksquare$

Combining all the results, the term $C$ can be written as

\[
\begin{aligned}
C &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} D_i W_i \epsilon_i - E\Big[\frac{1}{n}\sum_{i=1}^{n} D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0) \,\Big|\, S\Big]\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \eta_i + o_p(1) \\
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \underbrace{\Big\{D_i W_i \epsilon_i - E\Big[\frac{1}{n}\sum_{i=1}^{n} D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0) \,\Big|\, S\Big]\eta_i\Big\}}_{\zeta_i} + o_p(1).
\end{aligned}
\]

Since $\zeta_i \mid S$ has mean zero and is independently distributed, we can apply a CLT for independent observations and get $\Psi_n^{-1/2}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \zeta_i \xrightarrow{d} N(0, I_{\dim(\gamma)})$, where $\Psi_n = \frac{1}{n}\sum_{i=1}^{n} E[\zeta_i \zeta_i' \mid S]$, which can be simplified as

\[ \frac{1}{n}\sum_{i=1}^{n} E[D_i W_i W_i' \epsilon_i^2 \mid S] + E\Big[\frac{1}{n}\sum_{i=1}^{n} D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0) \,\Big|\, S\Big]\,\frac{1}{n}\sum_{i=1}^{n} E[\eta_i \eta_i' \mid S]\;E\Big[\frac{1}{n}\sum_{i=1}^{n} D_i W_i \gamma_1' \nabla_\theta W_i(\theta^0) \,\Big|\, S\Big]' \]

as the cross-terms cancel due to $E[\epsilon_i \eta_i' \mid S] = 0$, i.e., the first- and second-stage moments are uncorrelated. Finally, from 45, and by defining $\Upsilon_n = E[\frac{1}{n}\sum_{i=1}^{n} D_i W_i W_i' \mid S]$, we have

\[ \Lambda_n^{-1/2}\sqrt{n}(\hat\gamma_1 - \gamma_1) \xrightarrow{d} N(0, I_{\dim(\gamma)}) \]

for $\Lambda_n = \Upsilon_n^{-1}\Psi_n\Upsilon_n^{-1}$, as desired. $\blacksquare$

C Auxiliary Lemmas
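Lemma 1 (uniform boundedness of $\sigma_i^*(S,\theta)$). There exists a constant $C \in (0, 1)$ such that $\sigma_i^*(S,\theta) \ge C$ for any $i$, $S$, $\theta$ and $n$.

(Proof) As in A, let us define the agent's best-response function as $\Gamma(X_i, Z_i, \bar\sigma_i, \theta) = \Phi(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \bar\sigma_i)$. Recall that $\sigma_i^*(S,\theta) = \Phi(X_i'\theta_1 + \theta_2 Z_i + \theta_3 \pi_i^*(S,\theta))$. The result follows since $X_i$ is bounded, $Z_i$ is binary, and $\pi_i^*(S,\theta) \le 1$. $\blacksquare$

Lemma 2 (uniform boundedness of $\nabla \sigma_i$). Suppose $\lambda < 1$. There exists a finite constant $C$ such that

\[ \sup_{i,n,S,\theta,k} \Big|\frac{\partial \sigma_i^*(S,\theta)}{\partial \theta_k}\Big| < C < \infty. \]

(Proof) Recall that $\sigma_i^*(S,\theta) = \Gamma(X_i, Z_i, \bar\sigma_i^*(S,\theta), \theta)$. Differentiating the above equation with respect to $\theta_k$ gives

\[ \frac{\partial \sigma_i^*(S,\theta)}{\partial \theta_k} = \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \theta_k} + \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \bar\sigma_i^*}\frac{\partial \bar\sigma_i^*(S,\theta)}{\partial \theta_k} \]

Equivalently,

\[ \frac{\partial \sigma_i^*(S,\theta)}{\partial \theta_k} = \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \theta_k} + \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \bar\sigma_i^*}\frac{\partial \sigma_j^*(S,\theta)}{\partial \theta_k} \tag{50} \]

which defines $[\partial \sigma_i^*(S,\theta)/\partial \theta_k]_{i \in \mathcal{N}_n}$ implicitly. Let us write 50 in matrix form by defining the following:

• Let $\chi_n$ be the $n \times 1$ vector with $i$th component $\partial \sigma_i^*(S,\theta)/\partial \theta_k$.
• Let $D_n$ be the $n \times n$ matrix with $ij$th element $\frac{1}{|\mathcal{N}_i|}\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \bar\sigma_i^*}$ if $G_{ij} = 1$ and zero if $G_{ij} = 0$.
• Let $\tau_n$ be the $n \times 1$ vector with $i$th component $\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \theta_k}$.

Then we can write the system 50 as $\chi_n = D_n \chi_n + \tau_n$ or, equivalently, $(I_n - D_n)\chi_n = \tau_n$, which is invertible if $\|D_n\|_\infty < 1$, where $\|D_n\|_\infty$ is the maximum of the absolute row sums, i.e.,

\[ \|D_n\|_\infty = \max_{i \in \mathcal{N}_n} \Big|\frac{\partial \Gamma(X_i, Z_i, \bar\sigma_i^*, \theta)}{\partial \bar\sigma_i^*}\Big|. \]

37 implies that $\|D_n\|_\infty \le \lambda$, thus $\|D_n\|_\infty < 1$. Therefore $I_n - D_n$ is invertible and $(I_n - D_n)^{-1} = \sum_{t=0}^{\infty} D_n^t$. It follows that $\chi_n = (\sum_{t=0}^{\infty} D_n^t)\tau_n$. Taking the sup norm gives

\[ \|\chi_n\|_\infty \le \sum_{t=0}^{\infty} \|D_n^t\|_\infty \|\tau_n\|_\infty \le \frac{\|\tau_n\|_\infty}{1 - \lambda} < \frac{C_\tau}{1 - \lambda} \]

Since the RHS does not depend on $(i, n, S, \theta, k)$, we have the desired result. $\blacksquare$

Lemma 3 (uniform boundedness of $\nabla^2 \sigma_i$). Suppose $\lambda < 1$. There exists a finite constant $C$ such that

\[ \Big|\frac{\partial^2 \sigma_i^*(S,\theta)}{\partial \theta_m \partial \theta_k}\Big| < C < \infty \]

for any $i, n, S, \theta, k, m$ a.s.

(Proof) Fix $m$. Differentiating equation 50 w.r.t. $\theta_m$ gives

\[ \frac{\partial^2 \sigma_i}{\partial \theta_m \partial \theta_k} = \frac{\partial^2 \Gamma}{\partial \theta_m \partial \theta_k} + \frac{\partial^2 \Gamma}{\partial \bar\sigma_i \partial \theta_k}\frac{\partial \bar\sigma_i}{\partial \theta_m} + \frac{\partial \Gamma}{\partial \bar\sigma_i}\frac{\partial^2 \bar\sigma_i}{\partial \theta_m \partial \theta_k} + \frac{\partial \bar\sigma_i}{\partial \theta_k}\Big\{\frac{\partial^2 \Gamma}{\partial \bar\sigma_i^2}\frac{\partial \bar\sigma_i}{\partial \theta_m} + \frac{\partial^2 \Gamma}{\partial \theta_m \partial \bar\sigma_i}\Big\}. \]

Let us write it compactly as follows:

\[ \partial_{mk}\sigma_i = \Gamma_{mk} + \Gamma_{\bar\sigma k}\,\partial_m \bar\sigma_i + \Gamma_{\bar\sigma}\,\partial_{mk}\bar\sigma_i + \Gamma_{\bar\sigma\bar\sigma}\,\partial_k \bar\sigma_i\,\partial_m \bar\sigma_i + \Gamma_{\bar\sigma m}\,\partial_k \bar\sigma_i. \tag{51} \]

Write 51 in matrix form by defining the following:

• Let $\tilde\chi_n$ be the $n \times 1$ vector with $i$th component $\partial_{mk}\sigma_i$.
• Let $\tilde\tau_n$ be the $n \times 1$ vector with $i$th component $\Gamma_{mk} + \Gamma_{\bar\sigma k}\,\partial_m \bar\sigma_i + \Gamma_{\bar\sigma\bar\sigma}\,\partial_k \bar\sigma_i\,\partial_m \bar\sigma_i + \Gamma_{\bar\sigma m}\,\partial_k \bar\sigma_i$.

Then 51 can be written as

\[ (I_n - D_n)\tilde\chi_n = \tilde\tau_n. \]

As we have shown before, $I_n - D_n$ is invertible. For any $i \in \mathcal{N}_n$, $|\tilde\tau_i| \le B_{\theta,\theta} + 2B_{\bar\sigma\theta}C_{\partial\sigma} + B_{\bar\sigma,\bar\sigma}C_{\partial\sigma}^2$, so that $\|\tilde\tau_n\|_\infty = \max_i |\tilde\tau_i|$ is uniformly bounded. Therefore,

\[ \|\tilde\chi_n\|_\infty \le \frac{C_{\tilde\tau}}{1 - \lambda} \]

and the result follows. $\blacksquare$

References

Donald W. K. Andrews. Generic uniform convergence.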
Econometric Theory, 8(2):241–257, 1992.

Sarah Baird, J. Aislinn Bohren, Craig McIntosh, and Berk Özler. Optimal design of experiments in the presence of interference. The Review of Economics and Statistics, (5):844–860, 2018.

Patrick Bajari, Han Hong, John Krainer, and Denis Nekipelov. Estimating static models of strategic interactions. Journal of Business & Economic Statistics, 28(4):469–482, 2010. doi: 10.1198/jbes.2009.07264. URL https://doi.org/10.1198/jbes.2009.07264.

Jorge Balat and Sukjin Han. Multiple treatments with strategic interaction. arXiv, 2019.

Christian N. Brinch, Magne Mogstad, and Matthew Wiswall. Beyond LATE with a discrete instrument. Journal of Political Economy, 125(4):985–1039, 2017. doi: 10.1086/692712. URL https://doi.org/10.1086/692712.

William Brock and Steven Durlauf. Identification of binary choice models with social interactions. Journal of Econometrics, 140(1):52–75, 2007. URL https://EconPapers.repec.org/RePEc:eee:econom:v:140:y:2007:i:1:p:52-75.

William A. Brock and Steven N. Durlauf. Discrete choice with social interactions. The Review of Economic Studies, 68(2):235–260, 04 2001. ISSN 0034-6527. doi: 10.1111/1467-937X.00168. URL https://doi.org/10.1111/1467-937X.00168.

Pedro Carneiro, James J. Heckman, and Edward J. Vytlacil. Estimating marginal returns to education. American Economic Review, 101(6):2754–81, October 2011. doi: 10.1257/aer.101.6.2754.

Bruno Crépon, Esther Duflo, Marc Gurgand, Roland Rathelot, and Philippe Zamora. Do labor market policies have displacement effects? Evidence from a clustered randomized experiment. The Quarterly Journal of Economics, 128(2):531–580, 04 2013. ISSN 0033-5533. doi: 10.1093/qje/qjt001. URL https://doi.org/10.1093/qje/qjt001.

Pascaline Dupas. Short-run subsidies and long-run adoption of new health products: Evidence from a field experiment. Econometrica, 82(1):197–228, 2014. doi: https://doi.org/10.3982/ECTA9508. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA9508.

Marc Ferracci, Grégory Jolivet, and Gerard J. van den Berg. Evidence of treatment spillovers within markets. The Review of Economics and Statistics, 95(5):812–823, 2014.

A. Gallant and H. White. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Oxford: Basil Blackwell, 1988.

Susan Godlonton and Rebecca Thornton. Peer effects in learning HIV results. Journal of Development Economics, 97(1):118–129, 2012. ISSN 0304-3878. doi: https://doi.org/10.1016/j.jdeveco.2010.12.003.

Jinyong Hahn and Geert Ridder. Conditional moment restrictions and triangular simultaneous equations. The Review of Economics and Statistics, 93(2):683–689, 2011.

James J. Heckman. Sample selection bias as a specification error. Econometrica, 47(1):153–161, 1979.

James J. Heckman. Micro data, heterogeneity, and the evaluation of public policy: Nobel lecture. Journal of Political Economy, 109(4):673–748, 2001.

James J. Heckman and Edward Vytlacil. Policy-relevant treatment effects. American Economic Review, 91(2):107–111, May 2001. doi: 10.1257/aer.91.2.107.

James J. Heckman, Sergio Urzua, and Edward Vytlacil. Understanding instrumental variables in models with essential heterogeneity. The Review of Economics and Statistics, 88(3):389–432, 2006. doi: 10.1162/rest.88.3.389. URL https://doi.org/10.1162/rest.88.3.389.

Michael G. Hudgens and M. Elizabeth Halloran. Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832–842, 2008. doi: 10.1198/016214508000000292. URL https://doi.org/10.1198/016214508000000292. PMID: 19081744.

Kosuke Imai, Zhichao Jiang, and Anup Malani. Causal inference with interference and noncompliance in two-stage randomized experiments. Journal of the American Statistical Association, 0(0):1–13, 2020. doi: 10.1080/01621459.2020.1775612. URL https://doi.org/10.1080/01621459.2020.1775612.

Guido W. Imbens. Nonadditive Models with Endogenous Regressors, volume 3 of Econometric Society Monographs, pages 17–46. Cambridge University Press, advances in economics and econometrics: theory and applications, ninth world congress edition, 2007.

Guido W. Imbens and Joshua D. Angrist. Identification and estimation of local average treatment effects. Econometrica, 62:467–475, 1994.

Matthew O. Jackson, Zhongjian Lin, and Ning Neil Yu. Adjusting for peer-influence in propensity scoring when estimating treatment effects, 2020.

Brendan Kline and Elie Tamer. Chapter 7 - Econometric analysis of models with social interactions. In Bryan Graham and Áureo de Paula, editors, The Econometric Analysis of Network Data, pages 149–181. Academic Press, 2020. ISBN 978-0-12-811771-2. doi: https://doi.org/10.1016/B978-0-12-811771-2.00013-4.

Natalia Lazzati. Treatment response with social interactions: Partial identification via monotone comparative statics. Quantitative Economics, 6(1):49–83, 2015. doi: https://doi.org/10.3982/QE308. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/QE308.

Lung-Fei Lee. Tests for the bivariate normal distribution in econometric models with selectivity. Econometrica, 52(4):843–863, 1984.

Michael P. Leung. Two-step estimation of network-formation models with incomplete information. Journal of Econometrics, 188(1):182–195, 2015. ISSN 0304-4076. doi: https://doi.org/10.1016/j.jeconom.2015.04.001.

Michael P. Leung. Treatment and spillover effects under network interference. The Review of Economics and Statistics, 102(2):368–380, 2020a.

Michael P. Leung. Causal inference under approximate neighborhood interference. arXiv, 2020b.

Charles F. Manski. Identification of treatment response with social interactions. The Econometrics Journal, 16(1):S1–S23, 2013. doi: https://doi.org/10.1111/j.1368-423X.2012.00368.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1368-423X.2012.00368.x.

Matthew A. Masten and Alexander Torgovitsky. Identification of instrumental variable correlated random coefficients models. The Review of Economics and Statistics, 98(5):1001–1005, 2016.

Daniel McFadden. Econometric analysis of qualitative response models. In Z. Griliches † and M. D. Intriligator, editors, Handbook of Econometrics, volume 2, chapter 24, pages 1395–1457. Elsevier, 1 edition, 1984. URL https://EconPapers.repec.org/RePEc:eee:ecochp:2-24.

Edward Miguel and Michael Kremer. Worms: Identifying impacts on education and health in the presence of treatment externalities. Econometrica, 72(1):159–217, 2004. doi: https://doi.org/10.1111/j.1468-0262.2004.00481.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-0262.2004.00481.x.

Geert Ridder and Shuyang Sheng. Estimation of large network formation games. arXiv, 2020.

D. B. Rubin. Comments on "On the application of probability theory to agricultural experiments. Essay on principles. Section 9" by J. Splawa-Neyman, translated from the Polish and edited by D. M. Dabrowska and T. P. Speed. Statistical Science, 5:472–480, 1990.

Gonzalo Vazquez-Bare. Causal spillover effects using instrumental variables. arXiv, 2020.

Jeffrey M. Wooldridge. Further results on instrumental variables estimation of average treatment effects in the correlated random coefficient model. Economics Letters, 79(2):185–191, 2003. ISSN 0165-1765. doi: https://doi.org/10.1016/S0165-1765(02)00318-X.

Haiqing Xu. Social interactions in large networks: A game theoretic approach. International Economic Review, 59(1):257–284, 2018. doi: https://doi.org/10.1111/iere.12269. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/iere.12269.