Heterogeneous Endogenous Effects in Networks

Abstract

This paper proposes a new method to identify leaders and followers in a network. Prior works use spatial autoregression models (SARs) which implicitly assume that each individual in the network has the same peer effects on others. Mechanically, they conclude the key player in the network to be the one with the highest centrality. However, when some individuals are more influential than others, centrality may fail to be a good measure. I develop a model that allows for individual-specific endogenous effects and propose a two-stage LASSO procedure to identify influential individuals in a network. Under an assumption of sparsity: only a subset of individuals (which can increase with sample size n) is influential, I show that my 2SLSS estimator for individual-specific endogenous effects is consistent and achieves asymptotic normality. I also develop robust inference including uniformly valid confidence intervals. These results also carry through to scenarios where the influential individuals are not sparse. I extend the analysis to allow for multiple types of connections (multiple networks), and I show how to use the sparse group LASSO to detect which of the multiple connection types is more influential. Simulation evidence shows that my estimator has good finite sample performance. I further apply my method to the data in Banerjee et al. (2013) and my proposed procedure is able to identify leaders and effective networks.


Heterogeneous Endogenous Effects in Networks ∗
Sida Peng †
August 5, 2019

Key words: key players, network, endogenous effects, spillovers, high-dimensional models, LASSO, model selection, robust inference

∗ I would like to thank Francesca Molinari, Matthew Backus, David Easley, Marten Wegkamp, Donald Kenkel, Zhuan Pei, Douglas Miller, Joris Pinkse, Peter Hull, Yanlei Ma and participants in seminars and conferences at which this paper was presented. All remaining errors are mine.

† Microsoft Research, [email protected]

Introduction

How an individual's behavior is affected by the behavior of her neighbors in an exogenously given network is an important research question in applied economics. With the increasing availability of detailed data documenting connections among individuals, spatial autoregression models (SARs) have been widely applied in the empirical networks literature to estimate endogenous effects.

In SARs, an individual's behavior depends on the weighted average of other individuals' behaviors (see Anselin, 1988; Kelejian and Prucha, 1998). Standard SARs assume that the peer effects/endogenous effects are the same across individuals in a network. Each individual influences her neighbors at the same rate regardless of who she is. However, in many contexts, some individuals are clearly more influential than others. For example, Mas and Moretti (2009) find that the magnitude of spillovers varies dramatically among workers with different skill levels. Clark and Loheac (2007) also note that popular teenagers in a school have a much stronger influence on their classmates' smoking decisions than their less popular peers.

I propose a novel SAR model which allows for heterogeneous endogenous effects. Each individual in a network simultaneously generates an outcome that takes into account all her neighbors' behaviors. Unlike standard SARs, each individual has an individual-specific effect on her neighbors. As a result, there are as many coefficients for individual-specific endogenous effects as there are individuals in the network. To achieve identification, I assume that "truly-influential" individuals only constitute a small fraction of the total population. In other words, individual-specific coefficients are assumed to be sparse. This assumption allows me to estimate the model via the least absolute shrinkage and selection operator (LASSO). The LASSO procedure penalizes the $\ell_1$ norm of the coefficients of heterogeneous endogenous effects. The geometry of the $\ell_1$ norm enforces the sparsity in the LASSO estimators.
If a coefficient is selected by LASSO (i.e. the estimated coefficient is non-zero), the individual associated with this coefficient can influence all her neighbors at her specific rate. Otherwise the LASSO estimator will indicate that the individual has no influence on her neighbors. With some restrictions on the network structures, I show that the LASSO estimates for heterogeneous endogenous effects have near oracle performance (see Bühlmann and van de Geer, 2011). In other words, the selection of influential individuals is consistent and the convergence rate of non-zero LASSO estimates is the same as the convergence rate that would have been achieved if the truly influential individuals were known.

One challenge in my estimation process is the presence of endogeneity in the spatial lag and error term. As with standard SARs, the dependent variable in my model is used to construct spatial lags as an independent variable. As a result, the regressors are correlated with the error term and estimates would be biased if we were to apply LASSO directly.

First, I propose a set of novel instruments to address the endogeneity. Following Kelejian and Prucha (1998), I express the dependent variable as an infinite sum of functions consisting of exogenous characteristics and an adjacency matrix. I show that the exogenous characteristics of influential individuals can be used as instruments for their neighbors. Then I design a two-stage estimation process for heterogeneous endogenous effects using LASSO at each stage. In the first stage, I use LASSO to estimate the coefficients for the instruments. These estimated coefficients and instruments are then used to create a synthetic dependent variable.

In the second stage, I replace the dependent variable in the spatial lags with the synthetic variable to perform the LASSO estimation.

The next challenge is to construct robust confidence intervals for my LASSO type two-stage estimator. As pointed out in Leeb and Pötscher (2008), it is impossible to estimate the distribution for post-model-selection estimators. Consistent model selection by LASSO is only guaranteed when all non-zero coefficients are large enough to be distinguished from zero in a finite sample (a condition usually named "beta-min"). LASSO may fail to select regressors with very small coefficients, resulting in omitted variable bias in the post-LASSO inference.

I propose a bias correction for my two-stage estimator following the recent LASSO inference literature (see Chernozhukov et al., 2018; Belloni et al., 2015; van de Geer et al., 2014; Javanmard and Montanari, 2014; Zhang and Zhang, 2011; Zhu, 2018). The idea is to correct the first order bias and make the estimators independent of the model selection. Heuristically, shrinkage bias due to the $\ell_1$ penalty in LASSO can be expressed as a function of the LASSO estimators. Normality can still be achieved after adjusting for this bias. I show that this strategy also works in my two-stage estimation process in the presence of spatial errors. I derive the asymptotic normality for my "de-bias" two-stage LASSO estimator and conduct robust inference including confidence intervals.

My model can be extended to allow for more flexible structures of influence. One real world scenario is a network which consists of many local leaders. Local leaders may only influence individuals within their own cliques/groups but have no influence on individuals outside their cliques/groups. Influence from local leaders can be represented by a homogeneous endogenous effect.
Exogenous effects and homophily within cliques may also be present.

To solve the problem in this scenario, I modify my model by bringing back the classical SAR model. I assume that there are both local leaders and global leaders in the network. In contrast to local leaders, global leaders can influence individuals across different cliques. I show that homogeneous endogenous effects across local leaders, exogenous effects, and correlated effects can be identified under assumptions similar to those in Bramoullé et al. (2009). Under a sparsity assumption, the endogenous effects of global leaders, whose influence remains individual-specific, can also be identified under the same set of instruments proposed in my main model.

Another real world scenario is the existence of multiple types of connections among individuals. For example, connections among individuals can be classified as social (e.g. friendship, kinship) or economic (e.g. lending, employment). In epidemiology, infectious disease can be spread through air, insects, or direct contact. It is important to identify which networks are more efficient at transmitting the endogenous effects.

I model different types of connections as multiple networks. I propose the use of sparse group LASSO to estimate a heterogeneous endogenous effects model with multiple networks. The sparse group LASSO penalizes both the $\ell_1$ norm and the $\ell_2$ norm for each coefficient in each type of connection and thus selects both the influential individuals and the effective types of connections that generate spillovers. I derive the convergence rate and prove the consistency of selection for this estimator.
To the best of my knowledge, my paper is the first to show statistical properties for the sparse group LASSO.

I provide simulation evidence for networks of different sizes and different data generating algorithms. The empirical coverage of my proposed estimators is close to the nominal level in all scenarios. Similar results are also found in models with multiple networks and with cliques.

I apply my method to study villagers' decisions to participate in micro-finance programs in rural areas of India as in Banerjee et al. (2013). Instead of simulating the contagion process and reconstructing the data into panel format as in Banerjee et al. (2013), my method allows researchers to directly analyze cross-sectional data under the standard equilibrium assumptions. Among different social and economic networks, my method shows that some networks, such as "visit go-come" and "borrow money", are much more effective at influencing villagers' decisions than other networks such as "go to temple together" and "medical help". I further show that individuals in certain careers such as agricultural workers, Anganwadi teachers and small business owners are more likely to influence villagers.

This paper brings together the literature on spatial autoregression models, LASSO and networks.

SARs:

SARs have been widely applied in empirical studies. For instance, they have been used to study peer effects in labor productivity (see Mas and Moretti, 2009; Guryan et al., 2009; Bandiera et al., 2009), smoking behavior among teenagers (see Krauth, 2005; Clark and Loheac, 2007; Nakajima, 2007), educational achievements among different student groups (see Sacerdote, 2001; Neidell and Waldfogel, 2010), systemic risk in finance (see Bonaldi et al., 2015; Denbee et al., 2015), and the adoption of new agricultural technologies (see Coelli et al., 2002; Conley and Udry, 2010). My paper proposes a novel extension of standard SARs that could be used to identify influential individuals in a given network. My methodology for estimating such a model could easily be adopted in existing empirical SAR analyses to identify influential individuals who influence their peers' productivity, smoking decisions, or financial holdings.

More specifically, my model extends the existing SAR literature by introducing heterogeneous endogenous effects. Until very recently, SARs always assumed a constant rate of dependence for endogenous effects across different individuals. Moreover, row-sum normalization is widely adopted in the estimation, which may further result in misspecification when endogenous effects are heterogeneous (see Cliff and Ord (1973), the first monograph on the topic, and the later studies Upton and Fingleton (1985); Anselin (1988); Cressie (1993); Lee and Liu (2010); Lee and Yu (2010); Jin and Lee (2018)). Recent developments in the social interaction literature incorporate individual characteristics into SARs, which essentially allows for a limited degree of pre-stipulated heterogeneity (see, for example, Pinkse et al., 2002). In contrast, this paper shows that heterogeneous endogenous effects can be identified from individuals' outcomes instead of being pre-specified through individuals' characteristics.
Another article studying individual heterogeneity from a random coefficient perspective is Masten (2018). Masten (2018)'s model assumes each individual has a different rate of receiving influence, while my model assumes each individual has a different rate of influencing her neighbors. The two different models yield two different identification strategies. This paper contributes to the literature by showing that heterogeneous endogenous effects can be directly identified using cross-sectional data.

To estimate the heterogeneous endogenous effects in my model, I propose a methodology that is different from the standard SAR literature. In classic SARs, there is only one endogenous variable and hence it is sufficient to identify the model through only one instrument. In my model, the number of potentially endogenous variables increases as the number of observations increases. As a result, I propose a set of instruments that contains as many instruments as there are individuals. Each instrument is essentially a decomposition of the standard SAR instruments as in Kelejian and Prucha (1998), Lee (2002), Lee (2003) and Lee (2004).

I show that my model can be combined with the SAR model under homogeneous endogenous effects, exogenous effects and correlated effects. To solve the classical "reflection problem" as in Manski (1993), I adopt the same strategy as in Bramoullé et al. (2009) and show that "neighbors' neighbor" instruments can be combined with my set of instruments to identify additional structures in the spillovers.

This paper also contributes to the literature that models multiple networks through SARs. In standard SARs, multiple networks are modeled as higher order spatial lags (see Lee and Liu, 2010). My model allows each individual to have her own specific endogenous effects in each network. This design enables selection at the network level, which implies some networks can be classified as completely irrelevant to decision-making.
Furthermore, as the asymptotics allow the number of networks $q$ to go to infinity, my model can handle cases in which researchers observe a large number of networks ($q > n$).

LASSO:

My paper contributes to the growing literature on endogenous regressors in LASSO estimators. For instance, Belloni et al. (2014) propose the double selection mechanism to study confounded treatment effects. Fan and Liao (2014) propose a GMM type estimator to deal with many endogenous regressors. Gautier and Tsybakov (2014) propose a Self Tuning Instrumental Variables (STIV) estimator. The paper that is closest to mine is Zhu (2018), which studies the statistical properties of the two-stage least squares procedure with high-dimensional endogenous regressors. The two-stage estimator proposed in Zhu (2018) assumes an i.i.d. error term for the first stage. However, this assumption is incompatible with the SAR model, as the structural assumptions in SAR lead to a first stage with correlated unobservables. In this paper, I derive the rate of convergence and consistency of selection for a two-stage LASSO estimator in the presence of spatial errors. I show that a modified "de-bias" LASSO estimator accounting for spatial errors can be constructed for my estimator in a manner similar to Zhang and Zhang (2011), Bühlmann (2013), van de Geer et al. (2014), and Zhu (2018). I derive its asymptotic distribution and show how to perform inference.

This paper also extends the LASSO literature by deriving statistical bounds and consistency of selection for the sparse group LASSO estimator. Yuan and Lin (2006) propose the group LASSO, in which explanatory variables are represented by different groups. The group LASSO assumes that sparsity exists only among groups, i.e. some groups of variables are relevant while other groups are not. Simon et al. (2013) propose the sparse group LASSO, which further allows sparsity within each group, i.e. some regressors within the relevant groups can also be irrelevant. Bunea et al. (2014) derive statistical properties for the square-root group LASSO, which combines group LASSO and square-root LASSO. When estimating a heterogeneous endogenous effects model with multiple networks, I establish both statistical bounds and consistency of selection for the sparse group LASSO estimator. Sparse group LASSO differs from group LASSO by allowing the number of regressors in each group to also go to infinity as the number of groups goes to infinity. To the best of my knowledge, this paper is the first to show asymptotic statistical properties for the sparse group LASSO estimator.

Networks:

My paper shares similar microfoundations with SARs as discussed in Blume et al. (2015), where the individual utility function can be written as a linear summation of private and social components. The private component is a quadratic loss function of the individual's effort. The social component depends on the network structure as well as the efforts of one's neighbors. While the marginal rate of substitution between the private and social components of utility is assumed fixed in SARs, I assume this rate is individual-specific and depends on one's neighbors.

My paper applies and extends LASSO approaches to deal with a high-dimensional problem in networks. The total number of possible edges in a network is of order $n^2$; however, the social interaction networks we often observe are far more sparse. This is an ideal setting in which penalized estimators like LASSO can be applied. Manresa (2013) studies heterogeneous exogenous effects in a network using LASSO. de Paula et al. (2015) explore the use of LASSO to recover network structures. Both papers consider panel data and rely on repeated observations of the same network to identify their models. My model considers cross-sectional data. To identify an individual's endogenous effects, I rely on the variation in her neighbors' outcomes.

My paper also relates to the literature on identifying the key players in a network following Ballester et al. (2006), Calvó-Armengol et al. (2009), and Horrace et al. (2016). Under the framework of SARs, every individual is assumed to have the same endogenous effects. As a result, individuals who are well connected in the network (with a high centrality measure) are considered to be the key players. However, well connected individuals may effectively have zero effect on their neighbors under heterogeneous endogenous effects.
Indeed, as shown in the empirical application, well connected villagers such as tailors, hotel workers, veterans, and barbers are not influential in other villagers' decisions to join the micro-finance program.

The rest of this paper is organized as follows: in Section 2, I introduce the model; in Section 3, I discuss assumptions; in Section 4, I design estimation procedures and derive consistency and asymptotic properties; in Section 5, I show finite sample performance using Monte Carlo simulations; in Section 6, I apply my proposed model to study influential individuals and effective networks in promoting micro-finance programs in rural India; and in Section 7, I conclude.

Models

Let $n$ denote the total number of observed individuals in a network. The outcome of individual $i$ is denoted as $d_i$ and is the variable of interest. Here $d_i$ can represent any outcome associated with individual $i$, such as whether to join a program or $i$'s labor productivity. In the standard SAR model, it is assumed that the outcome of each neighbor of individual $i$ impacts her outcome homogeneously through a constant rate $\lambda$:

$$d_i = \lambda \sum_{j \in N_i} d_j + x_i \beta + \epsilon_i, \tag{1}$$

where the set $N_i$ is defined as individual $i$'s neighbors. The matrix form of this model is expressed as follows:

$$D_n = \lambda M_n D_n + X_n \beta + \epsilon_n, \tag{2}$$

where $D_n = (d_1, d_2, \cdots, d_n)'$ is the $n$-dimensional vector of observable outcomes. The $n$ by $k$ matrix $X_n$ represents the observable exogenous characteristics of individuals. When $\epsilon_n$ is specified as an $n$-dimensional vector of independent and identically distributed disturbances with zero mean and a constant variance $\sigma^2$, equation (2) is also called a mixed regression model.

The spatial weight matrix $M_n$ is of size $n$ by $n$, where the $(i, j)$-th entry represents the connection between individual $i$ and individual $j$. In empirical studies, the spatial weight matrix is often replaced by the adjacency matrix (see Ammermuller and Pischke, 2009; Acemoglu et al., 2012; Banerjee et al., 2013): the $(i, j)$-th entry of the matrix $M_n$ takes value 1 if individual $i$ and individual $j$ are connected and takes value 0 otherwise; the diagonal entries of the matrix $M_n$ are always 0.

In the SAR literature, the spatial weight matrix or adjacency matrix is taken as exogenous. My method follows this assumption and is designed for cross-sectional data. However, it is important to recognize that social networks do change over time, and it is then important to take network formation into account in modeling (see Christakis et al. (2010), Sheng (2016)).

In the mixed regression model, endogenous effects (see Manski, 1993) or network effects (see Bramoullé et al., 2009) are captured by the scalar $\lambda$. An implicit assumption in equation (2) is that $\lambda$, the rate of endogenous effects, is identical across all individuals in the network. Although a limited degree of heterogeneity can be built into the adjacency matrix via pre-specified structural assumptions, the identification potential for heterogeneous endogenous effects has not been fully explored. This limitation has been noted in various studies (see Ammermuller and Pischke, 2009; de Paula et al., 2015). I relax this assumption by proposing a more flexible model that allows and identifies individual-specific endogenous effects as discussed below. I propose the following model:

$$d_i = \sum_{j \in N_i} d_j \eta_j + x_i \beta + \epsilon_i, \tag{3}$$

where $N_i$ represents the set of individual $i$'s neighbors and $\eta_j$ represents the endogenous effect of individual $j$ on the outcome of all her neighbors $i \in N_j$. The model can be rewritten in matrix form as:

$$D_n = \left( M_n \circ D_n \right) \eta + X_n \beta + \epsilon_n, \tag{4}$$

where $\eta = (\eta_1, \eta_2, \cdots, \eta_n)'$ is a parameter vector of size $n$ by 1. The $i$th entry in $\eta$ represents the endogenous effect of individual $i$ on her neighbors. This model allows individual heterogeneity to interact with endogenous effects so that every individual is allowed to have her own coefficient $\eta_i$. My model allows some $\eta_j = 0$. In other words, there are individuals that impose no endogenous effects on their neighbors.
I define those individuals with $\eta_j \neq 0$ as influential.

The operator $\circ$ is defined between an $n$ by $n$ matrix $M_n$ and an $n$ by 1 vector $D_n$ as

$$M_n \circ D_n = M_n \cdot \mathrm{diag}(D_n) = C,$$

where $\mathrm{diag}(\cdot)$ is the diagonalization operator and $C_{i,j} = M_{i,j} d_j$.

Note that, in contrast to the fixed rate $\lambda$ specified in equation (2), each individual is allowed to influence her neighbors at her own rate $\eta_j$, even though each neighbor of individual $j$ is assumed to receive the same influence $d_j \eta_j$ from her.

A more general form of the model replaces $\eta_j$ with $\eta_{ij}$. This generalization can further capture the different rates at which different followers perceive the endogenous effects of the same leader. However, the number of parameters increases from $n$ to $n^2$ and the model becomes too saturated to estimate.

Another direction of modeling is to assume a heterogeneous perceiving rate of endogenous effects while maintaining the assumption of a homogeneous influencing rate, for example

$$d_i = \lambda_i^* \sum_{j \in N_i} d_j + x_i \beta + \epsilon_i$$

[…] $\lambda_i^*$ to be passed to a first stage regression as shown in Proposition 1. As a result, the identification strategies are fundamentally different.

Equation (4) can be derived from a Bayesian Nash equilibrium. Let $(x_i, \epsilon_i)$ denote an individual's type, where $x_i$ is a publicly observed characteristic and $\epsilon_i$ is a private characteristic only observable by $i$. Individual $i$'s utility depends on her own action and characteristics as well as her neighbors' actions. Individual $i$ chooses action $d_i$ to maximize the following utility:

$$U_i(d_i, d_{-i}) = (x_i \beta + \epsilon_i) d_i - \frac{1}{2} d_i^2 + \sum_{j \in N_i} d_j d_i \eta_j.$$

The first order condition with respect to $d_i$ is $x_i \beta + \epsilon_i - d_i + \sum_{j \in N_i} d_j \eta_j = 0$, which yields equation (4). The micro-foundation derived above is similar to the one for SARs as discussed in Blume et al. (2015).
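The algebra behind equation (4) can be checked numerically. The sketch below is illustrative only: the ring network, the choice of influential individuals, and all parameter values are assumptions made for the example, not taken from the paper. It uses the identity $(M_n \circ D_n)\eta = M_n \mathrm{diag}(\eta) D_n$, so that the equilibrium outcomes solve $D_n = (I_n - M_n \mathrm{diag}(\eta))^{-1}(X_n \beta + \epsilon_n)$ whenever the inverse exists.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 2

# Hypothetical ring network: each individual is linked to her two neighbors.
M = np.zeros((n, n))
for i in range(n):
    M[i, (i - 1) % n] = 1.0
    M[i, (i + 1) % n] = 1.0

# Sparse individual-specific effects: only individuals 0 and 3 are influential.
eta = np.zeros(n)
eta[[0, 3]] = [0.4, -0.3]

X = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5])
eps = rng.normal(scale=0.1, size=n)

# (M o D) eta = M diag(D) eta = M diag(eta) D, so equation (4) has the
# reduced form D = (I - M diag(eta))^{-1} (X beta + eps).
A = np.eye(n) - M @ np.diag(eta)
D = np.linalg.solve(A, X @ beta + eps)

# The operator from the text: C = M o D = M diag(D), with C[i, j] = M[i, j] * d[j].
C = M @ np.diag(D)

# The simulated outcomes should satisfy equation (4) up to rounding error.
residual = D - (C @ eta + X @ beta + eps)
print("largest violation of equation (4):", np.max(np.abs(residual)))
```

In this toy network neither neighbor of individual 0 is influential, so her outcome reduces to $x_0 \beta + \epsilon_0$, which is the sense in which non-influential neighbors generate no spillover.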

Peer Eﬀects in Labor Productivity:

Understanding the mechanism and magnitude of the dependence of labor productivity on coworkers is an important question for economists and policy makers. As found in Mas and Moretti (2009), workers respond more to the presence of coworkers with whom they frequently interact. Another modern example is code sharing platforms like GitHub. Programmers contribute effort to an open source project depending on how much other programmers are contributing. In such cases, the influence of each individual on her coworkers is not the same. Equation (4) can be used to incorporate such differences:

$$y_i = \sum_{j \in N_i} y_j \eta_j + x_i \beta + \epsilon_i,$$

where $y_i$ is individual $i$'s productivity and $\eta_j$ represents the size of the influence of coworker $j$: all else being equal, the additional effect on individual $i$'s productivity if individual $j$ becomes her coworker.

Note that if we restrict the parameters $\eta_j$ to be the same across different workers, then we are back to the classical SAR setting as laid out in equation (2). Thus, $\lambda = \frac{1}{n} \sum_{j=1}^{n} \eta_j$ can be interpreted as the averaged spillover effect in the canonical sense. Notice that $\lambda$ converges to 0 when the spillover effects are sparse, which may lead to the conclusion that there is no spillover in the network under such a scenario. Define

$$\lambda^* = \frac{1}{\sum_{j=1}^{n} \mathbf{1}\{\eta_j \neq 0\}} \sum_{j=1}^{n} \eta_j \mathbf{1}\{\eta_j \neq 0\}$$

as the averaged endogenous effect for influential workers. The estimand $\lambda^*$ is arguably better than $\lambda$ as a measure of average endogenous effects, as it excludes individuals who are not influential to their neighbors.

Online Opinion Leaders:

A decision can represent whether to "tweet" a news story seen online. When individuals make such decisions, they are often influenced by several online opinion leaders, in particular by whether those people also "tweet" the news or not. Political figures may have a stronger influence than celebrities on people's tweets about political news, and vice versa for entertainment news. Assume a binary decision (0, 1) is made from a Bayesian Nash equilibrium, such that

$$d_i^* = \sum_{j \in N_i} d_j^* \eta_j + x_i \beta + \epsilon_i,$$

where $d_i^*$ is the probability of individual $i$ playing action 1, and $\sum_{j \in N_i} d_j^* \eta_j$ is the expected endogenous effect from $i$'s neighbors $N_i$. Define

$$S = \{ j : \eta_j \neq 0 \}$$

as the set of truly influential opinion leaders. My method provides an estimate $\hat{S}$ that is asymptotically consistent for $S$. This is an important metric for policy makers and the private sector, as targeting or nudging the influential individuals is usually more efficient than targeting the entire population.

Heterogeneous Endogenous Effects Model with Cliques:

There are two important assumptions for equation (4). The first is the sparsity assumption, which requires non-influential individuals to have exactly zero influence on their neighbors. The second is that it assumes away exogenous and correlated effects. Consider a network composed of many cliques (small groups of connected individuals). Each clique has its local leaders who only influence individuals within their own clique. Figure 1 provides an example of such a network structure.

[Figure 1: Local Leader]

Note that in Figure 1, three of the four labeled nodes represent local leaders who only influence individuals within their own cliques. By contrast, the remaining labeled node represents a global leader who can influence individuals across different cliques. For example, one can think of the local leaders as local news channels and the global leader as the national news channel. Furthermore, within each clique, there might exist exogenous and correlated effects. I propose an extension to my heterogeneous endogenous effects model which could address such concerns. I assume that all local leaders influence their followers at a small but similar rate, while global leaders can have different effects on their audience.

First, I consider the following extension to introduce only homogeneous endogenous effects:

$$d_i = \sum_{j \in N_i} d_j \eta_j + \gamma \sum_{j \in N_i} d_j + x_i \beta + \epsilon_i,$$

which can be represented in matrix form as:

$$D_n = \left( M_n \circ D_n \right) \eta + M_n D_n \gamma + X_n \beta + \epsilon_n, \tag{5}$$

where $\eta = (\eta_1, \eta_2, \cdots, \eta_n)'$.
The new term $\gamma \sum_{j \in N_i} d_j$ captures influence from the local level. Note that this is the same term as the spatial lag in the benchmark spatial autoregression model. The vector $\eta$ captures the heterogeneous endogenous effects of global leaders.

To further include exogenous and correlated effects, consider the following model:

$$d_i = \sum_{j \in N_i} d_j \eta_j + \gamma_{end} \sum_{j \in N_i} d_j + \gamma_{exo} \sum_{j \in N_i} x_j + x_i \beta + \mu_c + \epsilon_i. \tag{6}$$

Besides the first term representing heterogeneous endogenous effects, equation (6) is the same as the model in Manski (1993).

Heterogeneous Endogenous Effects Model with Multiple Networks:

Individuals are often connected with each other through more than one type of network. For example, one's financial network (for borrowing/lending) may be different from her network of relatives or even her friendship network. A common strategy in empirical applications is to "pool" the networks by mixing all types of connections into one network. This approach may increase the noise in the network measurement when the outcome variables only depend on certain types of networks.

To capture different types of connections among the same set of individuals, we can incorporate multiple networks in the heterogeneous endogenous effects model. More specifically, a separate adjacency matrix can be constructed for each type of network. For instance, the $(i, j)$-th entry of the adjacency matrix representing friendship takes value 1 if individual $i$ and individual $j$ are friends and takes value 0 otherwise; that of the matrix representing the borrowing/lending network takes value 1 if individual $i$ and individual $j$ lend money to each other and takes value 0 otherwise.

Let $q$ be the total number of different types of networks. Define $M_n^l$ as the adjacency matrix for the $l$th network. The heterogeneous endogenous effects model with multiple networks is defined as

$$d_i = \sum_{l=1}^{q} \sum_{k \in N_i^l} d_k \eta_k^l + x_i \beta + \epsilon_i, \tag{7}$$

where $N_i^l$ denotes individual $i$'s neighbors in network $l$. Note that in this model, different networks could potentially bear different endogenous effects for the same individual. In equation (7), the coefficient $\eta_k^l$ represents the rate of the endogenous effect of individual $k$ through network $l$. As a result, we have $nq + k$ coefficients to estimate: $nq$ for endogenous effects and $k$ in $\beta$. In addition, I assume endogenous effects from different types of networks are linearly additive. The model can also be rewritten in matrix form as:

$$D_n = \sum_{l=1}^{q} \left( M_n^l \circ D_n \right) \eta^l + X_n \beta + \epsilon_n, \tag{8}$$

where $M_n^l$ is the adjacency matrix for network $l$ and $\eta^l = (\eta_1^l, \eta_2^l, \cdots, \eta_n^l)'$ is an $n$ by 1 vector for $l = 1, 2, \cdots, q$.
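To make the dimensions in equation (8) concrete, the following sketch stacks the $q$ blocks $M_n^l \circ D_n$ into a single $n$ by $nq$ regressor matrix and records which network each column belongs to. The random adjacency matrices, the placeholder outcome vector, and the names `W` and `groups` are all hypothetical; the grouping is the kind of structure a sparse group LASSO penalty would act on, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 5, 3

# Hypothetical adjacency matrices, one per connection type, with zero diagonals.
networks = []
for _ in range(q):
    M_l = (rng.random((n, n)) < 0.4).astype(float)
    np.fill_diagonal(M_l, 0.0)
    networks.append(M_l)

D = rng.normal(size=n)  # placeholder outcomes, used only to show the layout

# Column block l is M^l o D = M^l diag(D); stacking the q blocks side by side
# gives an n x (n q) regressor matrix for the stacked vector (eta^1, ..., eta^q).
W = np.hstack([M_l @ np.diag(D) for M_l in networks])

# groups[c] is the index of the network that column c of W belongs to.
groups = np.repeat(np.arange(q), n)

# Network-level sparsity: if individual 2 is influential only through network 0,
# every column block other than the first drops out of the endogenous term.
eta = np.zeros(n * q)
eta[2] = 0.5
endogenous_part = W @ eta
print(W.shape, groups)
```

Here the endogenous term equals $0.5\, d_2$ for every individual linked to individual 2 in network 0 and zero for everyone else, regardless of how they are connected in the other two networks.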
Define a network $l$ as an efficient network if $\eta_i^l \neq 0$ for at least one individual $i = 1, 2, \cdots, n$. In this specification, leaders can only influence their neighbors through efficient networks, and non-efficient networks are completely independent from the outcome variable.

The assumptions discussed in this section combine standard SAR-type assumptions and LASSO-type assumptions. SAR-type assumptions ensure the existence of valid instruments. LASSO-type assumptions provide sufficient and necessary conditions for valid inference. The identification of influential individuals is achieved through its coincidence with the sparsity pattern in the reduced form estimator.

First recall the benchmark SAR model:
$$ D_n = \lambda M_n D_n + X_n \beta + \epsilon_n. \tag{9} $$
By rearranging the above equation, we can express the endogenous variable $M_n D_n$ as a function of $X_n$, $M_n$ and the error term, since
$$ D_n = J_n^{-1} X_n \beta + J_n^{-1} \epsilon_n, $$
where $I_n$ is the $n$ by $n$ identity matrix and $J_n = I_n - \lambda M_n$. It is straightforward that $J_n^{-1} X_n$ can serve as valid instruments for $M_n D_n$. As a result, the identification and estimation of equation (9) can be achieved through either 2SLS or GMM, as proposed in papers such as Kelejian and Prucha (1995), Kelejian and Prucha (1998), Lee (2002), Lee (2003), and Lee (2004). Following the same strategy, I derive a set of instruments by solving for $D_n$ as a function of the exogenous variables and the adjacency matrix.

Without additional restrictions, equation (4) cannot be estimated by canonical methods, as the number of parameters $n + k$ is greater than the number of observations $n$.

Assumption 1 (Sparsity). Let $S_n \subset \{1, 2, \cdots, n\}$ denote the set of influential individuals (i.e. $\eta_j \neq 0$). Let $s_n = |S_n|$ be the number of elements in $S_n$. Then
$$ s_n = o\left( \frac{\sqrt{n}}{\log n} \right), \quad \text{as } n \to \infty. $$
Assumption 1 is usually referred to as the "sparsity" assumption. The assumption that most individuals in a network are not influential is plausible under many circumstances.
For example, opinion leaders on social media constitute only a very small fraction of internet users. Star programmers who can encourage other programmers to work with them on Github are also a small portion of all Github users. On the other hand, assumption 1 can easily be violated when there exist many local leaders. For example, when studying peer effects in obesity among school children, each class/grade can be treated as a clique, and as the number of cliques increases, the number of leaders may increase at a rate of $O(n)$. In the next subsection, I propose an extension of the current model to partially address this problem.

Assumption 2 (SAR restrictions).
- There exists an $\eta_{max} < 1$ such that $\|\eta\|_\infty \le \eta_{max}$.
- The $\epsilon_j$ are i.i.d. sub-Gaussian random variables with mean 0 and variance $\sigma^2$.
- The regressors $x_i$ in $X_n$ are non-stochastic and uniformly bounded for all $n$. $\lim_{n \to \infty} X_n' X_n / n$ exists and is nonsingular.
- The minimum eigenvalue of $(I_n - M_n \circ \eta)$, $\Lambda_{min}$, is uniformly bounded away from 0.

The restriction on $\eta$ excludes unit root processes and ensures the uniqueness of the equilibrium. It guarantees the invertibility of $(I_n - M_n \circ \eta)$. This is a fundamental assumption required throughout the SAR literature (see Upton and Fingleton (1985); Anselin (1988); Jin and Lee (2018)).

The assumption on the error term currently excludes exogenous effects and correlated effects from my model. The SAR model with this exclusion restriction is known as the mixed regression model (see Lee, 2002). The main challenge in relaxing this assumption is known as the "reflection problem", as pointed out in Manski (1993). I adopt a strategy similar to Bramoullé et al. (2009) to incorporate both exogenous and correlated effects in the next subsection.
I also require the error term to be sub-Gaussian so that concentration inequalities can be derived to bound the empirical process. A sub-Gaussian process is known to have "almost" bounded support due to the fast decay of its tails. This assumption is usually required for high dimensional estimators, as in Belloni et al. (2014), Zhu (2018), etc.

The assumption on the regressors follows the convention in the SAR literature (see Anselin (1988); Jin and Lee (2018)). This deterministic design assumption can be extended to the random design case under my model. In what follows, I focus on the case where $X_n$ is an $n$ by 1 vector and study identification as in Bramoullé et al. (2009).

I also require the minimum eigenvalue of $(I_n - M_n \circ \eta)$ to be uniformly bounded away from 0. This prevents the spatial errors from accumulating too fast. The spatial error term after the matrix inversion is $(I_n - M_n \circ \eta)^{-1} \epsilon$. A similar assumption can be found in Kelejian and Prucha (1995) and Kelejian and Prucha (1998), which require the uniform boundedness of $(I_n - M_n \circ \eta)^{-1}$.

To proceed, recall the definition of the operator "$\circ$" as $M_n \circ D_n = M_n \cdot \mathrm{diag}(D_n)$, where $\mathrm{diag}(\cdot)$ is the diagonalization operator. Note the following property of "$\circ$":
$$ \left( M_n \circ D_n \right) \eta = \left( M_n \circ \eta \right) D_n. $$
If the invertibility of $\left( I_n - M_n \circ \eta \right)$ is guaranteed, then
$$ D_n = \left( M_n \circ D_n \right) \eta + X_n \beta + \epsilon_n \iff D_n = \sum_{i=0}^{\infty} \left( M_n \circ \eta \right)^i (X_n \beta + \epsilon_n). \tag{10} $$
Since $\left( M_n \circ D_n \right) \eta$ is correlated with $\epsilon_n$ and $\eta$ is sparse (i.e. having at most $s_n$ non-zero elements), we need at least $s_n$ instruments to deal with the endogeneity in the model.
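Both the commutation property of "$\circ$" and the series expansion in equation (10) are easy to check numerically. The following sketch does so on a small random network; the network size, link probability, coefficient values and the helper name `circ` are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np

def circ(M, v):
    """M ∘ v = M · diag(v): scale column j of M by v[j]."""
    return M * v[np.newaxis, :]

rng = np.random.default_rng(0)
n = 6
M = (rng.random((n, n)) < 0.4).astype(float)
np.fill_diagonal(M, 0.0)          # no self-links

eta = np.zeros(n)
eta[:2] = 0.1                     # two influential individuals (illustrative)
beta = 3.0
X = rng.standard_normal(n)
eps = rng.standard_normal(n)

A = circ(M, eta)                  # M ∘ eta
# Solve the simultaneous system D = (M ∘ eta) D + X beta + eps.
D = np.linalg.solve(np.eye(n) - A, X * beta + eps)

# Property: (M ∘ D) eta equals (M ∘ eta) D.
lhs = circ(M, D) @ eta
rhs = A @ D

# Series expansion: D = sum_{i >= 0} (M ∘ eta)^i (X beta + eps).
series = np.zeros(n)
term = X * beta + eps
for _ in range(200):
    series += term
    term = A @ term

print(np.allclose(lhs, rhs), np.allclose(series, D))
```

The series converges here because the non-zero entries of $\eta$ are small; this is the role played by the bound on $\|\eta\|_\infty$ in assumption 2.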
Using equation (10), we can express the expectation of $D_n$ as follows:
$$ E(D_n) = X_n \beta + \left( M_n \circ X_n \right) (\beta \eta) + \sum_{i=2}^{\infty} \left( M_n \circ \eta \right)^i X_n \beta. \tag{11} $$
Let $(\cdot)_S$ denote the operator such that $(M_n)_S$ is the sub-matrix of $M_n$ with its columns restricted to those corresponding to the elements of $S$. The first and second terms of equation (11) suggest that $X_n$ and $(M_n \circ X_n)_S$ can serve as valid instruments to point identify $\beta$ and $\eta$.

Proposition 1 (First Stage Equivalence). Under assumptions 1 and 2,
$$ E(D_n) = X_n \beta + \left( M_n \circ X_n \right) \tilde{\eta}, \tag{12} $$
where $\tilde{\eta}_j = \eta_j \, \tilde{g}(\eta, \beta, X_n, M_n)$ for some function $\tilde{g}$ that depends on $\eta$, $\beta$, $X_n$, and $M_n$.

Proposition 1 shows that the sparsity pattern is preserved when solving the simultaneous equations, since $\tilde{\eta}_j = 0$ whenever $\eta_j = 0$. As a result, the sparsity assumption is also satisfied in equation (12), and I can thus estimate equation (12) as the first stage using a LASSO-type estimator.

Define $W_n$ as the projection matrix onto the orthogonal complement of the column space of $X_n$:
$$ W_n = I_n - X_n (X_n' X_n)^{-1} X_n'. $$

Assumption 3 (Independence). $W_n (M_n \circ X_n)_S$ has full column rank.

The linear independence among the columns of $(M_n \circ X_n)_S$ requires that no two influential individuals be connected to exactly the same set of neighbors. Moreover, assumption 3 also requires that the neighbors of an influential individual cannot be a linear combination of the neighbors of several other influential individuals, which rules out network structures such as those depicted in Figure 2.

de Paula et al. (2015) noticed that the sparsity pattern will generally not be preserved under matrix inversion when there is no further restriction on the adjacency matrix. Proposition 1 shows that, with a pre-existing network structure and a homogeneous influence assumption for a given influential individual, the sparsity pattern can still be preserved after matrix inversion.
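Assumption 3 is a rank condition that can be checked directly on a given network. The sketch below is a hypothetical illustration on five individuals: the helper names (`undirected`, `independence_holds`), the edge lists and the values of $X_n$ are all made up. It contrasts a network in which two influential individuals share exactly the same neighbors with one in which each has a neighbor of its own.

```python
import numpy as np

def undirected(n, edges):
    """Build a symmetric 0/1 adjacency matrix from a list of edges."""
    M = np.zeros((n, n))
    for i, j in edges:
        M[i, j] = M[j, i] = 1.0
    return M

def independence_holds(M, X, S):
    """Check whether W (M ∘ X)_S has full column rank, for an n-vector X."""
    n = len(X)
    W = np.eye(n) - np.outer(X, X) / (X @ X)   # projects out X
    instruments = M[:, S] * X[S]               # columns of M ∘ X restricted to S
    return np.linalg.matrix_rank(W @ instruments) == len(S)

X = np.array([1.0, -0.5, 2.0, 0.3, 1.5])
S = [0, 1]                                      # indices of influential individuals

# Individuals 0 and 1 share exactly the same neighbors {2, 3, 4}.
M_bad = undirected(5, [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)])
# Individual 0 has neighbor 2 to itself, individual 1 has neighbor 4 to itself.
M_good = undirected(5, [(0, 2), (0, 3), (1, 3), (1, 4)])

print(independence_holds(M_bad, X, S))
print(independence_holds(M_good, X, S))
```

In the first network the two instrument columns are proportional to each other, so the rank drops to one; in the second, the private neighbors break the collinearity.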
[Figure 3: Examples of networks which satisfy assumption 3. The influence of one influential individual can be identified by comparing the red (right shaded) and grey (plain) groups, while the influence of the other can be identified by comparing the blue (left shaded) and grey (plain) groups. Alternatively, the influence of one can be identified by comparing the green (dotted) and blue (left shaded) groups, while the influence of the other can be identified by comparing the red (right shaded) and green (dotted) groups.]

[Figure 2: Examples of networks which violate assumption 3. (1) Two influential individuals share exactly the same neighbors. (2) The neighbors of one influential individual are a linear combination of the neighbors of two other influential individuals.]

In other words, as long as each influential individual has a neighbor that is not connected with any other influential individual, assumption 3 is satisfied. One can think of the identification here as estimating fixed effects from influential individuals, as illustrated in Figure 3. Collinearity arises when the fixed effects of two influential individuals are imposed on exactly the same observations.

Assumption 3 is essentially a restriction on the topology of the network. It rules out cases like the complete network or cases (1) and (2) in Figure 2. To achieve identification with cross-sectional network data, one has to rely on certain network structures. Similar assumptions can be found in Bramoullé et al. (2009) and de Paula et al. (2015).

At this point, if the set of truly influential individuals $S_n$ were available to us, we would be able to estimate the model using 2SLS or GMM. However, in most cases, $S_n$ is not known beforehand. Notice that the identification of the set $S_n$ in the structural model (4) coincides with the sparsity pattern in the reduced form (12). I propose to use a LASSO-type estimator both to recover the set of influential individuals and to estimate the model. For LASSO to achieve correct recovery, I need the following assumptions:

Assumption 4.
(LASSO restrictions).

(Irrepresentable Condition) There is a $\vartheta \in (0, 1)$ such that
$$ \max_{\|u\|_\infty \le 1} \left\| \mathrm{diag}(f_{S^c}) \cdot \Sigma_{M,21} (\Sigma_{M,11})^{-1} \cdot \mathrm{diag}(f_S)^{-1} \cdot u \right\|_\infty < \vartheta, $$
where
$$ \Sigma := \frac{1}{n} M_n' W_n M_n = \begin{pmatrix} \Sigma_{M,11} & \Sigma_{M,12} \\ \Sigma_{M,21} & \Sigma_{M,22} \end{pmatrix} \quad \text{and} \quad f = (I - M_n \circ \eta)^{-1} X_n \beta. $$

(Beta Min Condition) There exist $N \in \mathbb{N}$ and $m > 0$ such that for all $n \ge N$,
$$ \min (|\eta|)_S \ge m / \sqrt{n}. $$

Here the operator $(\cdot)_S$ denotes the sub-matrix/vector restricted to the columns/entries corresponding to influential individuals. Similarly, $(\cdot)_{S^c}$ denotes the sub-matrix/vector restricted to the columns/entries corresponding to non-influential individuals. Also notice that the invertibility of $\Sigma_{M,11}$ is guaranteed by assumption 3.

We prove in theorem 2 that assumption 4 is a sufficient condition for the LASSO estimator to achieve consistent selection of the set $S_n$ in the second stage. While assumption 3 restricts how influential individuals may connect with their neighbors, assumption 4 restricts how non-influential individuals may connect. For example, the Irrepresentable Condition prevents the neighbors of a non-influential individual from being exactly the same as the neighbors of any influential individual. This is because when two individuals connect with exactly the same neighbors, we cannot distinguish which individual is the true source of influence. This is illustrated in Figure 4 (1). On the other hand, assumption 4 does not require full independence between influential and non-influential individuals. This is illustrated in Figure 4 (2).

[Figure 4: Examples of networks which violate and satisfy assumption 4. In (1), the influence of the influential individual cannot be separately identified from that of the non-influential individual. In (2), the influence of the influential individual can be identified by comparing the red (right shaded) and blue (left shaded) groups.
We can then find that the other individual is non-influential by comparing the blue (left shaded) and green (dotted) groups.]

The Beta Min Condition requires the magnitude of the endogenous effects to be sufficiently strong in order to be detected by LASSO. For example, there does not exist a sequence of individuals whose influence decays to 0 faster than the rate $1/\sqrt{n}$. The Beta Min Condition is restrictive in that it rules out uniform inference and creates additional problems when constructing confidence intervals.

Equation (4) can still be consistently estimated under conditions weaker than the Irrepresentable Condition and the Beta Min Condition. The stronger version is assumed above to ensure selection consistency. As shown in Zhao and Yu (2006), the Irrepresentable Condition together with the Beta Min Condition are necessary and sufficient conditions for LASSO to achieve consistent model selection. The following compatibility condition guarantees valid inference using the "de-bias" estimator proposed in the next section.

Assumption 4-1 (Compatibility Condition). For some $\phi > 0$ (independent of $n$) and for all $\eta$ satisfying $\|\eta_{S^c}\|_1 \le 3 \|\eta_S\|_1$, it holds that
$$ \|\eta_S\|_1^2 \le \left( \eta' (M_n)_S' W_n (M_n)_S \eta \right) s / \phi^2. $$

Assumption 4-1 is referred to as the compatibility condition, as in van de Geer et al. (2014). It also restricts the network topology (i.e. the complete network is still ruled out) but is weaker than assumptions 3 and 4 combined. Under assumption 4-1, we can still construct uniformly valid inference for the de-bias estimator proposed in the next section, even though consistent model selection is not guaranteed.

Heterogeneous Endogenous Eﬀects Model with Cliques:

The heterogeneous endogenous effects model with cliques can be written in matrix form as:
$$ D_n = \left( M_n \circ D_n \right) \eta + M_n D_n \gamma + X_n \beta + \epsilon_n. $$

Assumption 1' (Sparsity with Cliques). Among $n$ individuals in the network, let $S_n \subset \{1, 2, \cdots, n\}$ be the set of global leaders. Let $s_n = |S_n|$ be the number of elements in $S_n$. Assume:
$$ s_n = o\left( \frac{\sqrt{n}}{\log n} \right), \quad \text{as } n \to \infty. $$

Assumption 1' relaxes the exact sparsity in assumption 1 without imposing any restriction on the number of local leaders. For example, it does not rule out situations where everyone is (locally) influential. Local leaders' influence will be captured by $\gamma$, the coefficient of the classical spatial lag. The number of global leaders is restricted to be sparse, and we can identify these leaders in the same way as in the previous case.

To ensure invertibility of the matrix $\left( I_n - M_n \circ \eta - M_n \gamma \right)$, I modify assumption 2 as:

Assumption 2' (SAR Restrictions with Cliques). In addition to assumption 2, there exists an $\eta_{max} < 1$ such that $\|\eta + \gamma\|_\infty \le \eta_{max}$.

Similar to assumption 2, this assumption excludes unit root processes. Since there exists a local level influence $\gamma$ in the network, the global level influence $\eta$ needs to be further bounded away from 1. As a result, equation (5) can be transformed into the following:
$$ E(D_n) = X_n \beta + \left( M_n \circ X_n \right) (\beta \eta) + M_n X_n (\beta \gamma) + \sum_{i=2}^{\infty} \left( M_n \circ \eta + \gamma M_n \right)^i \beta X_n. \tag{13} $$

Proposition 2 (First Stage Equivalence with Cliques). Under assumptions 1' and 2',
$$ E(D_n) = X_n \beta + (M_n \circ X_n) \tilde{\eta}^{*,\infty} + \sum_{i=1}^{\infty} \gamma^i M_n^i X_n \beta + \sum_{i=1}^{\infty} M_n^i (M_n \circ X_n) \tilde{\tilde{\eta}}^{*,(i,\infty)}, $$
where $\tilde{\eta}_j^{*,\infty} = \eta_{0,j} \, \tilde{g}_{k,j}^{\infty}(\eta, \gamma, \beta, X_n, M_n)$ and $\tilde{\tilde{\eta}}_j^{*,(i,\infty)} = \eta_{0,j} \, \tilde{h}_{k,j}^{(i,\infty)}(\eta, \gamma, \beta, X_n, M_n)$ for some functions $\tilde{g}_{k,j}^{\infty}$ and $\tilde{h}_{k,j}^{(i,\infty)}$ that depend on $\beta$, $\gamma$, $\eta$, $X_n$, and $M_n$.

[…] $\gamma$ introduced in equation (5) to be identified. An extra instrument $M_n X_n$ needs to be introduced.
We therefore need to assume additional independence:

Assumption 3' (Independence with Cliques). $\left[ W_n (M_n \circ X_n)_S, \; W_n M_n X_n \right]$ has full column rank.

To further incorporate exogenous and correlated effects, recall that equation (6) can be written in matrix form as
$$ D_n = \left( M_n^* \circ D_n \right) \eta + M_n^* D_n \gamma^{end} + M_n^* X_n \gamma^{exo} + X_n \beta + \mu_c + \epsilon_n. $$
Here, I define $M_n^*$ as the row sum normalized version of $M_n$. The main challenge for identification is known as the "reflection problem" (Manski (1993)). Bramoullé et al. (2009) proposed the "neighbor's neighbor" instruments as a solution. By taking the local difference, we obtain
$$ (I - M_n^*) D_n = (I - M_n^*) \left( M_n^* \circ D_n \right) \eta + (I - M_n^*) M_n^* D_n \gamma^{end} + (I - M_n^*) M_n^* X_n \gamma^{exo} + (I - M_n^*) X_n \beta + (I - M_n^*) \epsilon_n, $$
and inverting the simultaneous equation, we have
$$ E((I - M_n^*) D_n) = \left( I - M_n^* \circ \eta - M_n^* \gamma^{end} \right)^{-1} \left( \beta \cdot I + \gamma^{exo} M_n^* \right) (I - M_n^*) X_n. $$

Proposition 3 (First Stage Equivalence with Cliques + exogenous and correlated effects). Under assumptions 1' and 2', and assuming $M_n^*$ is row sum normalized,
$$ E((I - M_n^*) D_n) = (I - M_n^*) X_n \beta + (\gamma^{exo} + \gamma^{end} \beta) M_n^* (I - M_n^*) X_n + (M_n^* \circ X_n) \tilde{\eta}^{*,\infty} + \sum_{i=1}^{\infty} (\gamma^{end})^i (\gamma^{exo} + \gamma^{end}) M_n^{*(i+1)} (I - M_n^*) X_n \beta + \sum_{i=1}^{\infty} M_n^{*i} (M_n \circ X_n) \tilde{\tilde{\eta}}^{*,(i,\infty)}, $$
where $\tilde{\eta}_j^{*,\infty} = \eta_{0,j} \, \tilde{g}_{2,k,j}^{\infty}(\eta, \gamma^{exo}, \gamma^{end}, \beta, X_n, M_n)$ and $\tilde{\tilde{\eta}}_j^{*,(i,\infty)} = \eta_{0,j} \, \tilde{h}_{2,k,j}^{(i,\infty)}(\eta, \gamma^{exo}, \gamma^{end}, \beta, X_n, M_n)$ for some functions $\tilde{g}_{2,k,j}^{\infty}$ and $\tilde{h}_{2,k,j}^{(i,\infty)}$ that depend on $\beta$, $\gamma^{exo}$, $\gamma^{end}$, $\eta$, $X_n$, and $M_n$.

Assumption 3'-1.

Assume $\gamma^{exo} + \beta \gamma^{end} \neq 0$, that $\left[ W_n (M_n^* \circ X_n)_S, \; W_n M_n^* X_n, \; W_n M_n^{*2} X_n, \; W_n M_n^{*3} X_n \right]$ is full rank, and that $M^*$ is row sum normalizable.

Similar to proposition 4 in Bramoullé et al. (2009), assumption 3'-1 requires the independence between $M_n^* X_n$ and $M_n^{*2} X_n$ in order to use the "neighbor's neighbors" as instruments to address the "reflection" problem. Furthermore, the independence of $M_n^{*3} X_n$ will pin down the identification of correlated effects.

Heterogeneous Endogenous Effects Model with Multiple Networks:

The heterogeneous endogenous effects model with multiple networks can be represented in matrix form as follows:
$$ D_n = \sum_{j=1}^{q} \left( M_n^j \circ D_n \right) \eta^j + X_n \beta + \epsilon_n. $$
The number of coefficients in this model becomes $nq + k$, and the number of observed networks $q$ may also increase as the number of observations $n$ increases. As a result, the sparsity assumption will be imposed on both the influential individuals and the effective networks. I assume that some of the networks are completely irrelevant (i.e. $\eta^j = 0$) and that relevant networks do not necessarily pass influence for everyone (i.e. $\eta^j \neq 0$ but $\eta_i^j = 0$ for some $i$).

Second, to ensure invertibility, note that for any matrix norm $\| \cdot \|$:
$$ \left\| \sum_{j=1}^{q} \left( M_n^j \circ \eta^j \right) \right\| \le \sum_{j=1}^{q} \left\| M_n^j \circ \eta^j \right\| \le \sum_{j=1}^{q} \|\eta^j\|_\infty \left\| M_n^j \right\|. $$
Because $M_n^j$ is the adjacency matrix such that each entry is 0 or 1, $\sum_{j=1}^{q} \|\eta^j\|_\infty <$ […] $I - \sum_{j=1}^{q} \left( M_n^j \circ \eta^j \right)$.

Proposition 4 (First Stage Equivalence – Multiple Networks).
Under assumption 1* and assumption 2*,
$$ E(D_n) = X_n \beta + \sum_{j=1}^{q} \left( M_n^j \circ X_n \right) \tilde{\eta}^j, $$
where $\tilde{\eta}_k^j = \eta_k^j \, \tilde{g}^j(\eta, \beta, X_n, M_n)$ for some function $\tilde{g}^j$ that depends on $\eta^j$, $\beta$, $X_n$, and $M_n^j$.

Third, I require $\left[ X_n, \left( M_n^1 \circ X_n \right)_S, \left( M_n^2 \circ X_n \right)_S, \cdots, \left( M_n^q \circ X_n \right)_S \right]$ to be full rank. Compared with the standard model, this assumption requires the independence condition to hold across different networks. The four assumptions required for multiple networks are listed formally in the appendix as assumptions 1*, 2*, 3*, and 4*.

The proposed estimator is similar to the two-stage least squares method but uses LASSO in both stages.

Two-Stage LASSO Estimator:
- First Stage:
$$ (\tilde{\beta}, \tilde{\eta}) = \arg\min_{\beta, \eta} \left\| D_n - X_n \beta - \left( M_n \circ X_n \right) \eta \right\|_2^2 + \lambda_1 |\eta|_1 \tag{14} $$
Obtain the LASSO fit $\hat{D}_n$:
$$ \hat{D}_n = X_n \tilde{\beta} + \left( M_n \circ X_n \right) \tilde{\eta} $$
- Second Stage:
$$ (\hat{\beta}, \hat{\eta}) = \arg\min_{\beta, \eta} \left\| D_n - \left( M_n \circ \hat{D}_n \right) \eta - X_n \beta \right\|_2^2 + \lambda_2 |\eta|_1 \tag{15} $$

As shown in section 3, $\left( M_n \circ D_n \right)$ is correlated with $\epsilon_n$. Thus equation (4), equation (5) and equation (8) cannot be estimated directly using LASSO or sparse group LASSO. The instruments proposed in section 3 are $[X_n, (M_n \circ X_n)_S]$. We do not observe the set $S$, but $[X_n, (M_n \circ X_n)]$ is a set of regressors that contains the valid instruments.

The estimators $\hat{\beta}$ and $\hat{\eta}$ suffer from LASSO shrinkage bias. Moreover, post model selection inference conditioning on the selected model $\hat{S}_n = \{ i \mid \hat{\eta}_i \neq 0 \}$ suffers from omitted variable bias and thus is not uniformly valid (see Leeb and Pötscher, 2005, 2008, 2009). I construct a "de-bias" estimator under my setting and derive its asymptotic distribution.
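The two stages in equations (14) and (15) can be sketched with a small coordinate descent LASSO written in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the loss is taken to be the unnormalized squared $\ell_2$ norm, $X_n$ is an $n$-vector, $\beta$ is left unpenalized by partialling out $X_n$ with $W_n$, and the function names, network, coefficient values and tuning parameters are all hypothetical.

```python
import numpy as np

def soft_threshold(a, t):
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(Z, y, lam, n_iter=500):
    """Minimise ||y - Z eta||_2^2 + lam * ||eta||_1 by cyclic coordinate descent."""
    p = Z.shape[1]
    eta = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0)
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] < 1e-12:          # skip empty columns (isolated nodes)
                continue
            rho = Z[:, j] @ resid + col_sq[j] * eta[j]
            new = soft_threshold(rho, lam / 2.0) / col_sq[j]
            resid += Z[:, j] * (eta[j] - new)
            eta[j] = new
    return eta

def partialled_lasso(X, Z, y, lam):
    """LASSO in eta with an unpenalised coefficient beta on the n-vector X."""
    W = np.eye(len(y)) - np.outer(X, X) / (X @ X)
    eta = lasso_cd(W @ Z, W @ y, lam)
    beta = X @ (y - Z @ eta) / (X @ X)
    return beta, eta

def two_stage_lasso(M, X, D, lam1, lam2):
    # First stage: regress D on X and M ∘ X, then form the fitted outcome.
    MX = M * X[np.newaxis, :]
    b1, e1 = partialled_lasso(X, MX, D, lam1)
    D_hat = X * b1 + MX @ e1
    # Second stage: use M ∘ D_hat in place of the endogenous M ∘ D.
    return partialled_lasso(X, M * D_hat[np.newaxis, :], D, lam2)

# Illustrative data: 40 individuals, the first three are influential.
rng = np.random.default_rng(0)
n = 40
upper = np.triu(rng.random((n, n)) < 0.1, k=1)
M = (upper | upper.T).astype(float)
eta_true = np.zeros(n)
eta_true[:3] = 0.2
X = rng.standard_normal(n)
eps = rng.standard_normal(n)
D = np.linalg.solve(np.eye(n) - M * eta_true[np.newaxis, :], X * 3.0 + eps)

beta_hat, eta_hat = two_stage_lasso(M, X, D, lam1=5.0, lam2=5.0)
print(beta_hat, np.flatnonzero(eta_hat))
```

Partialling out $X_n$ before running the LASSO is equivalent to leaving $\beta$ out of the penalty, which matches the objective functions above, where only $\eta$ is penalized.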
I propose the following de-bias LASSO estimator:

de-bias 2SLSS Estimator:
$$ \hat{e} = \hat{\eta} + \hat{\Theta} \tilde{X}_n' (M_n \circ \hat{D}_n)' W_n \left( D_n - (M_n \circ D_n) \hat{\eta} \right) / n $$
$$ \hat{b} = \hat{\beta} - \frac{ \left( X_n - (M_n \circ D_n) \hat{\gamma}_\beta \right)' \left( (M_n \circ (\hat{D}_n - D_n)) \hat{\eta} \right) }{ \left( X_n - (M_n \circ D_n) \hat{\gamma}_\beta \right)' X_n } $$
Here $\hat{\beta}$ and $\hat{\eta}$ are the estimators from the 2SLSS. Define $W_n = I - X_n (X_n' X_n)^{-1} X_n'$ and $\tilde{X}_n = \frac{1}{n} (M_n \circ D_n)' W_n (M_n \circ \hat{D}_n)$. $\hat{\Theta}$ is defined by the nodewise regression on $\tilde{X}_n$, as in Meinshausen and Bühlmann (2006), and $\hat{\gamma}_\beta$ is again defined by the nodewise regression between $X_n$ and $(M_n \circ D_n)$. Nodewise regression explores the correlation between the columns of the design matrix $\tilde{X}_n$ by regressing each column on all the remaining columns while penalizing the coefficients. An approximation of the inverse of the matrix $\frac{1}{n} \tilde{X}_n' \tilde{X}_n$ can be constructed based on nodewise regression. Further, define $\hat{S}_n = \{ i \mid \hat{\eta}_i \neq 0 \}$, which represents the LASSO-selected active set. The estimators $(\hat{e}, \hat{b})$ are adjusted for the LASSO shrinkage bias and are consistent estimators for $\eta$ and $\beta$. They are similar to the estimators proposed in van de Geer et al. (2014), but are constructed through a two-stage process. The de-bias estimator also differs from the two-stage estimators proposed in Zhu (2018) due to spatial correlation.

Theorem 1.

Under assumption 1, assumption 2 and assumption 4-1. There exist constant c , c , c and ι : (cid:107) ι (cid:107) < ∞ such that for the ﬁrst stage tuning parameter λ ≥ (cid:113) σ c Λ − min log nn and secondstage tuning parameter λ ≥ (cid:113) σ c Λ − min log nn + 2 c λ + 2 c λ , the de-bias estimator √ n ( ι (cid:48) ˆ e − ι (cid:48) η ) = 1 √ n ι (cid:48) ˆΘ ˜ X (cid:48) n (cid:15) − ι (cid:48) ∆ ∼ N (0 , σ ι (cid:48) ˆΘ ˜ X (cid:48) n Ω n ˜ X n ˆΘ (cid:48) ι ) √ n (ˆ b − β ) = ( X n − ( M n ◦ D n )ˆ γ β ) (cid:48) (cid:15) ( X n − ( M n ◦ D n )ˆ γ β ) (cid:48) X n + ∆ β ∼ N  , σ ( X n − ( M n ◦ D n )ˆ γ β ) (cid:48) ( X n − ( M n ◦ D n )ˆ γ β ) (cid:16) ( X n − ( M n ◦ D n )ˆ γ β ) (cid:48) X n (cid:17)  where (cid:107) ∆ (cid:107) ∞ = o p (1) , (cid:107) ∆ β (cid:107) ∞ = o p (1) , ˜ X n = n ( M n ◦ D n ) (cid:48) W n ( M n ◦ ˆ D n ) , Ω n = n ( M n ◦ ˆ D n ) (cid:48) W n ( M n ◦ ˆ D n ) , ˆΘ are deﬁned by the nodewise regression on ˜ X n and ˆ γ β are deﬁned by the nodewise regressionon between X n and ( M n ◦ D n ) . Theorem 1 shows that the 2SLSS estimator achieves normality at the standard rate √ n . The shifts∆ and ∆ β represent the bias from using nodewise regression and they are shown to be o p (1) withthe proper choice of tuning parameters. Theorem 2.

Under assumption 1-4, there exist ˜ γ > and ˜ ϑ = ϑ − ϑ . For λ ≥ γ ∨ ˜ ϑ ) (cid:113) σ c Λ − min log nn +2 c λ + 2 c λ , lim n →∞ P ( ˆ S n = S ) = 1 ; .2 2SLSS for Two Extensions Two-Stage LASSO Estimator with Homogenous Eﬀects: - First Stage: for a pre-speciﬁed constant k ,( ˜ β, ˜ γ, ˜ η ) = arg min β,γ,η (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) D n − X n β − k (cid:88) i =1 M in X n γ ,i − k (cid:88) i =0 M in (cid:16) M n ◦ X n (cid:17) η ,i (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + λ (cid:32) k (cid:88) i =1 | η ,i | + | γ ,i | (cid:33) Obtain a LASSO ﬁtting ˆ D n ˆ D n = X n ˜ β + M n X n ˜ γ + (cid:16) M n ◦ X n (cid:17) ˜ η - Second Stage:( ˆ β, ˆ γ, ˆ η ) = arg min β,γ,η (cid:107) D n − M n ˆ D n γ − (cid:16) M n ◦ ˆ D n (cid:17) η − X n β (cid:107) + λ ( | η | + | γ | )From proposition 2, the sparsity pattern can not be fully preserved by ( M n ◦ X n ) so additionalstructures like M in ( M n ◦ X n ) need to be included in the ﬁrst stage. Those terms represent theinﬂuence from global leaders but passing i th times through local leaders. By assumption 2’, theinﬂuence represented by M in ( M n ◦ X n ) decreases as i increases.The second stage of 2SLSS with Cliques case can be viewed as a special case of the standard 2SLSSestimator. 
For example, one can rewrite the second stage as
$$ (\tilde{\beta}, \tilde{\gamma}, \tilde{\eta}) = \arg\min_{\beta, \gamma, \eta} \left\| D_n - X_n \beta - \left[ M_n \hat{D}_n, \; \left( M_n \circ \hat{D}_n \right) \right] \cdot \begin{pmatrix} \gamma \\ \eta \end{pmatrix} \right\|_2^2 + \lambda_2 |\eta|_1 + \lambda_2 |\gamma|. $$
As a result, the asymptotics in theorem 1 and theorem 2 follow.

To incorporate the structure of multiple networks, I propose the use of sparse group LASSO.

Two-Stage LASSO Estimator with Multiple Networks:
- First Stage:
$$ (\tilde{\beta}, \tilde{\eta}) = \arg\min_{\beta, \eta} \left\| D_n - X_n \beta - \sum_{j=1}^{q} (M_n^j \circ X_n) \eta^j \right\|_2^2 + \sum_{j=1}^{q} \left( \lambda_1 \|\eta^j\|_1 + \lambda_2 \|\eta^j\|_2 \right) $$
Obtain the LASSO fit $\hat{D}_n$:
$$ \hat{D}_n = X_n \tilde{\beta} + \sum_{j=1}^{q} (M_n^j \circ X_n) \tilde{\eta}^j $$
- Second Stage:
$$ (\hat{\beta}, \hat{\eta}) = \arg\min_{\beta, \eta} \left\| D_n - X_n \beta - \sum_{j=1}^{q} (M_n^j \circ \hat{D}_n) \eta^j \right\|_2^2 + \sum_{j=1}^{q} \left( \lambda_1 \|\eta^j\|_1 + \lambda_2 \|\eta^j\|_2 \right) $$

The sparse group LASSO introduces two tuning parameters, $\lambda_1$ and $\lambda_2$, to penalize both the $\ell_1$ and the $\ell_2$ norm in each network. Similar to the LASSO estimator, the geometric shape of the penalties allows the sparse group LASSO to identify sparsity not only within each network (group) but also among networks (groups). In other words, some networks could be completely irrelevant (i.e. $\eta^j = 0$) and, within relevant networks, some individuals can have no influence on their neighbors (i.e. $\eta^j \neq 0$ but $\eta_i^j = 0$ for some $i$).
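The way the two penalties produce both kinds of sparsity can be seen from the proximal operator of the sparse group LASSO penalty for a single network (group), which is the building block of standard proximal gradient solvers. The sketch below is a generic illustration, not the paper's algorithm: it assumes $\lambda_1$ multiplies the $\ell_1$ norm and $\lambda_2$ the (unsquared) $\ell_2$ norm, and the function name `sgl_prox` and the input values are made up.

```python
import numpy as np

def sgl_prox(v, lam1, lam2):
    """Proximal operator of lam1 * ||x||_1 + lam2 * ||x||_2 for one group.

    Step 1 soft-thresholds each entry (sparsity within the network);
    step 2 shrinks the whole block and sets it to zero when its norm is
    at most lam2 (sparsity across networks).
    """
    u = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    norm = np.linalg.norm(u)
    if norm <= lam2:
        return np.zeros_like(u)
    return (1.0 - lam2 / norm) * u

# A network with only weak signals is dropped entirely.
weak = sgl_prox(np.array([0.05, -0.02, 0.03]), lam1=0.04, lam2=0.1)

# A relevant network is kept, but its weakest individual is zeroed out.
strong = sgl_prox(np.array([2.0, 0.1, -1.0]), lam1=0.5, lam2=0.5)

print(weak, strong)
```

In the second call the block survives because its norm after soft-thresholding exceeds $\lambda_2$, while the middle entry is removed by the $\ell_1$ step alone.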
de-bias 2SLSS Estimator under Multiple Networks:
$$ \hat{e}_M = \hat{\eta}_M + \hat{\Theta}_Z \tilde{Z}_n' \hat{Z}_n' W_n (D_n - Z_n \hat{\eta}_M) / n $$
$$ \hat{b}_M = \hat{\beta} - \frac{ (X_n - Z_n \hat{\gamma}_\beta)' (\hat{Z}_n - Z_n) \hat{\eta}_M }{ (X_n - Z_n \hat{\gamma}_\beta)' X_n } $$
Here $\hat{\beta}$ and $\hat{\eta}_M = (\hat{\eta}^{1\prime}, \hat{\eta}^{2\prime}, \cdots, \hat{\eta}^{q\prime})'$ are the estimators from the 2SLSS estimator with multiple networks. Let $\hat{Z}_n = \left[ (M_n^1 \circ \hat{D}_n), (M_n^2 \circ \hat{D}_n), \cdots, (M_n^q \circ \hat{D}_n) \right]$ and $Z_n = \left[ (M_n^1 \circ D_n), (M_n^2 \circ D_n), \cdots, (M_n^q \circ D_n) \right]$. Define $\tilde{Z}_n = \frac{1}{n} Z_n' W_n \hat{Z}_n$. $\hat{\Theta}_Z$ is defined by the nodewise regression on $\tilde{Z}_n$, as in Meinshausen and Bühlmann (2006), and $\hat{\gamma}_\beta$ is again defined by the nodewise regression between $X_n$ and $Z_n$.

Theorem 3.

Under assumption 1*, assumption 2* and assumption 4-1*. There exists constant , d , d and ι : (cid:107) ι (cid:107) < ∞ such that for the ﬁrst stage tuning parameter λ , ≥ (cid:113) σ d Λ − min (log n +log q ) n and second stage tuning parameter λ , ≥ (cid:113) σ d Λ − min (log n +log q ) n + 2 d λ , + 2 d λ , , the de-biasestimators √ n ( ι (cid:48) ˆ e M − ι (cid:48) η M ) = 1 √ n ι (cid:48) ˆΘ Z ˜ Z (cid:48) n (cid:15) − ι (cid:48) ∆ M ∼ N (0 , σ ι (cid:48) ˆΘ Z ˜ Z (cid:48) n Ω mn ˜ Z n ˆΘ (cid:48) Z ι ) √ n (ˆ b M − β ) = ( X n − Z n ˆ γ β ) (cid:48) (cid:15) ( X n − Z n ˆ γ β ) (cid:48) X n + ∆ M,β ∼ N  , σ ( X n − Z n ˆ γ β ) (cid:48) ( X n − Z n ˆ γ β ) (cid:16) ( X n − Z n ˆ γ β ) (cid:48) X n (cid:17)  where (cid:107) ∆ M (cid:107) ∞ = o p (1) , (cid:107) ∆ M,β (cid:107) ∞ = o p (1) and ˜ Z n = n Z (cid:48) n W n ˆ Z n and Ω mn = n ˆ Z (cid:48) n W n ˆ Z n . ˆΘ Z aredeﬁned by the nodewise regression on ˜ Z n and ˆ γ β are deﬁned by the nodewise regression on between X n and Z n . Theorem 3 requires the rate of convergence for sparse group LASSO at the ﬁrst stage. This isproved in Lemma ?? in the appendix. In group LASSO, λ , controls the convergence of the l λ , to achieve l convergence. However,group LASSO requires the number of regressors in each group to be ﬁnite. In my estimator, thenumber of regressors in each group equals to n . In the next theorem, I show that λ , will need tobe chosen in the same order as λ , in order to achieve consistent selection. Theorem 4.

Let ˜ c = λ , λ , and ˜ ϑ mul = ϑ mul − ϑ mul . Choose λ , and λ , such that ˜ c > ϑ mul − ϑ mul . Underassumption 1*-4* and for λ , ≥ γ ∨ ˜ ϑ mul ) (cid:113) σ d Λ − min (log n +log q ) n + 2 d λ , + 2 d λ , , there exists ˜ γ > , lim n →∞ P ( ˆ S n = S ) = 1It is worth pointing out that assumption ?? is a weaker assumption than assume the full irrepre-sentable condition on the design matrixΣ mul := 1 n [ M n , M n , · · · , M qn ] (cid:48) W n [ M n , M n , · · · , M qn ]As the number of regressors in each group also goes to inﬁnity as the number of groups goesto inﬁnity, the multicollinearity in Σ mul can be severe. Instead, assumption ?? decomposes themulticollinearity into between-group and within-group and thus allows a more ﬂexible dependencestructure. As a result, sparse group LASSO can be used to recover the inﬂuential individuals aswell as the eﬀective networks that generate spillovers.26 Simulations

In this section, I report Monte Carlo simulation results for the models proposed above. My results are robust when applied to networks generated by different algorithms and to networks of different sizes.

First, I use the Erdős-Rényi algorithm to simulate a network of size $n$. Individuals are added to the graph one at a time. When an individual is added to the network, she forms a link with each existing individual independently with probability $p$. I choose p = 0.[…] p = 0.[…] […] $p$ because collinearity among regressors may arise when links become very dense, violating assumption 3 or 4.

I set the first 5 individuals to be influential by letting their coefficients $\eta_j$ be non-zero. To guarantee the existence of endogenous effects, I arbitrarily specify the connections among these five individuals. The adjacency matrix $M_n$ for the five influential individuals is given in the appendix. If the connections among these five individuals were not fixed, there would be a possibility that no connections are formed among them and thus that there is no endogeneity in the network. The results would be too good in such a case. The true parameters are fixed as $\beta = 3$, $\eta_{0,1} = \eta_{0,2} = \eta_{0,3} = \eta_{0,4} = \eta_{0,5} =$ 0.[…], and $\eta_{0,j} = 0$ for $j > 5$. Individual characteristics $X_n$ are generated from a standard normal distribution. Individual outcomes $Y_n$ are then generated as
$$ Y_n = (I - M_n \circ \eta)^{-1} (X_n \beta + \epsilon_n), $$
where $\epsilon_n$ is drawn independently from a standard normal distribution.

I use $(M_n, X_n, Y_n)$ as observations and apply my two-stage LASSO estimator. I construct the de-bias 2SLSS estimator and repeat the above process 200 times, in a manner similar to van de Geer et al. (2014). I report the average coverage probability (Avgcov) and average length (Avglength) of confidence intervals for the coefficients of influential individuals, $\{\eta_1, \cdots, \eta_5\}$, the coefficient for individual characteristics, $\beta$, and the coefficients for non-influential individuals, the $\eta_j$s ($j > 5$). […]
$$ \text{Avgcov}_S = s^{-1} \sum_{j \in S} P[\eta_{0,j} \in CI_j] \tag{16} $$
$$ \text{Avglength}_S = s^{-1} \sum_{j \in S} \text{length}(CI_j) \tag{17} $$
I separately report the average coverage and average length for each of the five influential individuals. As shown in appendix table A1, the coverage is around the nominal 95% level and the length of the confidence intervals decreases as the sample size grows.

Since we can construct confidence intervals for all $n$ coefficients, joint inference can be performed under control of the False Discovery Rate (FDR). As defined in equation (18), the power reported in appendix table A1 represents the average percentage of the active set (i.e. $\{1, 2, 3, 4, 5\}$) that is significant after controlling the FDR at 5% using the Benjamini-Hochberg method. The FDR reported in appendix table A1 represents the average percentage of the non-active set (i.e. $\{6, 7, \cdots, n\}$) that is significant after controlling the FDR at 5% using the Benjamini-Hochberg method. The exact definition is given in equation (19).
$$ \text{Power} = s^{-1} \sum_{j \in S} P[H_{0,j} \text{ is rejected}] \tag{18} $$
$$ \text{FDR} = \sum_{j \in S^c} P[H_{0,j} \text{ is rejected}] \Big/ \sum_{j=1}^{n} P[H_{0,j} \text{ is rejected}] \tag{19} $$
The power varies because the networks change when the sample size increases.
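The data generating process described above can be sketched as follows. The specific numbers in this sketch are assumptions made for illustration: the link probability, the value of the non-zero $\eta_j$ and the ring used to connect the five influential individuals are placeholders (the paper fixes that block with a matrix given in its appendix), and the function names are hypothetical.

```python
import numpy as np

def erdos_renyi(n, p, rng):
    """Symmetric 0/1 adjacency matrix with independent links of probability p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def simulate(n=50, p=0.1, beta=3.0, eta_value=0.2, n_leaders=5, seed=0):
    rng = np.random.default_rng(seed)
    M = erdos_renyi(n, p, rng)

    # Fix the links among the leaders so that endogeneity is guaranteed.
    # A ring is used here purely as a placeholder for the appendix matrix.
    M[:n_leaders, :n_leaders] = 0.0
    for i in range(n_leaders):
        j = (i + 1) % n_leaders
        M[i, j] = M[j, i] = 1.0

    eta = np.zeros(n)
    eta[:n_leaders] = eta_value            # only the first individuals are influential
    X = rng.standard_normal(n)
    eps = rng.standard_normal(n)

    # Y = (I - M ∘ eta)^{-1} (X beta + eps), with M ∘ eta = M · diag(eta).
    Y = np.linalg.solve(np.eye(n) - M * eta[np.newaxis, :], X * beta + eps)
    return M, X, eps, eta, Y

M, X, eps, eta, Y = simulate()
# The simulated outcome satisfies the structural equation exactly.
print(np.allclose(Y, (M * Y[np.newaxis, :]) @ eta + X * 3.0 + eps))
```

Repeating `simulate` with different seeds and applying the estimator to each draw of $(M_n, X_n, Y_n)$ gives the Monte Carlo replications over which coverage and interval length are averaged.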
It is strictly increasing when the network is sparse (i.e. p = 0 . p = 0 . $\lambda$s from both stages as in section 4.1). Moreover, when calculating $\hat{\Theta}$ in the de-bias 2SLSS estimator (section 4.2) using the nodewise regression, one also needs to choose a tuning parameter. I use a benchmark choice of $\lambda_{\text{nodewise}} \gtrsim \sqrt{\log(n)/n}$ and $\lambda \gtrsim \sqrt{\log(n)/n}$ in the nodewise regression and first stage. Then I use cross-validation to pick $\lambda$ in the second stage.

I further increase the number of influential individuals to 10 and report the results in appendix table A2. Again, to guarantee the existence of endogeneity, the adjacency matrix for these ten individuals is set as shown in the appendix. All average coverages and average confidence interval lengths are separately reported for these ten individuals. The choice of the tuning parameters is similar to that used to generate appendix table A1 for networks with 50 and 200 individuals. For networks with 500 individuals, I use the benchmark $\lambda$ to replace cross-validation in the second stage.

As shown in appendix table A2, all coverages are very close to the nominal levels. The average length of the confidence intervals is slightly larger than in appendix table A1. This is due to the increase in the number of influential individuals: it is more difficult to differentiate them from the irrelevant individuals.

Appendix table A3 presents the results when the network is generated using the Watts-Strogatz mechanism, or "small world" network. Define $pN$ (an even number) as the mean degree of each node and a special parameter $\omega = 0.4$. The Watts-Strogatz mechanism works as follows:

- Construct a graph with $N$ nodes, each connected to $pN$ neighbors, with $pN/2$ on each side.
- For each node $n_i$, take every edge $(n_i, n_j)$ with $i < j$ and rewire it with probability $\omega$. Rewiring means replacing $(n_i, n_j)$ with $(n_i, n_k)$, where $k$ is chosen uniformly among all nodes that are not currently connected with $n_i$.

The influential individuals are chosen as the 1st, 5th, 15th, 40th and 50th individuals in the network. As shown in appendix table A3, my estimator is robust under a "small world" algorithm. The nominal level is reached as the size of the network grows, and the length of the confidence intervals is slightly smaller than in the standard case.
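The Erdos-Renyi design and the Benjamini-Hochberg step described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the reported tables: the values passed for `p` and `eta_value`, the ring of links among the influential individuals (standing in for the fixed adjacency block given in the appendix), and the function names are all hypothetical, and $M_n \circ \eta$ is read as scaling column $l$ of $M_n$ by $\eta_l$.

```python
import numpy as np


def simulate_design(n, p, eta_value, beta=3.0, n_influential=5, seed=0):
    """Draw one network and one outcome vector from the simulation design."""
    rng = np.random.default_rng(seed)

    # Erdos-Renyi links: each pair is connected independently with probability p.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    M = (upper | upper.T).astype(float)

    # Fix the links among the influential individuals so that endogeneity is
    # guaranteed. The ring is a hypothetical stand-in for the appendix matrix.
    k = n_influential
    M[:k, :k] = 0.0
    for i in range(k):
        j = (i + 1) % k
        M[i, j] = M[j, i] = 1.0

    # Only the first k individuals have non-zero endogenous effects.
    eta = np.zeros(n)
    eta[:k] = eta_value

    X = rng.standard_normal(n)
    eps = rng.standard_normal(n)

    # Assumed reading of "M o eta": column l of M is scaled by eta[l].
    A = M * eta
    # Reduced form: Y = (I - M o eta)^{-1} (X beta + eps).
    Y = np.linalg.solve(np.eye(n) - A, X * beta + eps)
    return M, X, Y, eta, eps


def benjamini_hochberg(pvalues, q=0.05):
    """Return a boolean array marking the hypotheses rejected at FDR level q."""
    pvals = np.asarray(pvalues, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    # Step-up rule: find the largest rank r with p_(r) <= q * r / m.
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if passed.any():
        last = np.nonzero(passed)[0].max()
        rejected[order[: last + 1]] = True
    return rejected


# Hypothetical parameter values, chosen only for illustration.
M, X, Y, eta, eps = simulate_design(n=50, p=0.1, eta_value=0.2)
```

In a full Monte Carlo run, the p-values passed to `benjamini_hochberg` would come from the de-biased estimates of each $\eta_j$; that estimation step is not reproduced here.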

Appendix table A5 presents results for the heterogeneous endogenous effects model with cliques. The outcome variable $Y_n$ is now generated as $Y_n = (I - M_n \circ \eta - M_n \gamma)^{-1} (X_n \beta + \epsilon_n)$. The coefficient of the homogeneous effects, $\gamma$, is set at 0.05.

The choice of the tuning parameters is similar to that used to generate appendix table A1 for networks with 50 and 200 individuals. For networks with 500 individuals, I use the benchmark $\lambda$ (i.e. $\lambda \gtrsim \sqrt{\log(n)/n}$) to replace cross-validation in the second stage.

The coverage is above the 95% nominal level in all cases. I also report the mean coverage and average length of the confidence interval for the coefficient of the homogeneous effects. My model gives above 95% coverage in all cases. I also report the empirical probability of rejecting a null hypothesis of zero effects at the 95% nominal level. The rejection probability converges to 1 when the sample size grows to 500.

.3 Heterogeneous Endogenous Effects Model with Multiple Networks

In this Monte Carlo exercise, I include two different networks generated by the Erdos-Renyi algorithm, where one is influential and the other is not. I use the two-stage LASSO estimator with multiple networks to estimate the parameters. The sparse group LASSO requires two tuning parameters, one for the $\ell_1$ norm and the other for the $\ell_2$ norm. I set the two parameters to be equal to each other, as the correlations among the columns of the adjacency matrices are very small. The choice of tuning parameters is similar to that used to generate appendix table A1 for networks with 50 and 200 individuals. For networks with 500 individuals, I use the benchmark $\lambda$ instead of cross-validation in the second stage. Appendix table A4 summarizes the results. As in previous results, all coverages exceed the nominal 95% level.

I report the empirical probability that at least one individual is detected in a given network, controlling the FDR at 5% using the Benjamini-Hochberg method. I also report the average number of detections, conditional on at least one individual being detected in a given network. Appendix table A4 shows that network 1, which is the relevant network, is more likely to be detected in all cases than network 2, the irrelevant network. The average number of identified individuals for network 1 is also larger than that for network 2.

I use the proposed estimator to study the importance of different networks in spreading participation in a micro finance program within rural Indian villages. I show that different kinds of networks have different effects on individuals' decisions. I identify the influential individuals in each village. My analysis shows that leaders among agricultural laborers, Anganavadi teachers, construction workers, small business owners and mechanics are very likely to be influential in the villages.

A non-profit organization named Bharatha Swamukti Samsthe (BSS) has been running micro finance programs in rural southern Karnataka, India since 2007. It provides small loan products to poor women and, through them, to their families. The villages covered by the program are geographically isolated and heterogeneous in terms of caste.

When BSS initially introduced a micro finance program to a village, the credit officers of BSS first approached a number of "predefined leaders", such as teachers, shopkeepers and village elders. BSS held a private meeting with these leaders and explained the program. Then these predefined leaders passed the information on to other villagers. Those who were interested in the program and contacted BSS were trained and assigned to groups to receive credit. Each group consisted of 5 borrowers, and group members were jointly liable for loans. Loans were around 10,000 rupees (approximately $200) at an annualized rate of approximately 28%. Note that 74.5 percent of the households in rural areas said the monthly income of their highest earning member is less than 5,000 rupees (source: Socio-Economic Caste Census-2011). This loan had to be repaid within 50 weeks.

In 2006, 75 villages in Karnataka were surveyed 6 months before the initiation of the BSS micro finance program. This survey consisted of a village questionnaire and a detailed follow-up survey conducted among a subsample of villagers. The village questionnaire gathered demographic information on all households in a village, including GPS coordinates, age, gender, number of rooms, whether the house had electricity, and whether the house had a latrine. The data set also contains information on the set of "pre-defined leaders" who helped spread the information to the entire village. The follow-up survey collected data from a villager sample stratified according to age, education level, caste, occupancy, etc. It also asked questions about social network structures along 12 dimensions, including:

- Friends: Name the 4 non-relatives whom you speak to the most.
- Visit-go: In your free time, whose house do you visit?
- Visit-come: Who visits your house in his or her free time?
- Borrow-kerorice: If you needed to borrow kerosene or rice, to whom would you go?
- Lend-kerorice: Who would come to you if he/she needed to borrow kerosene or rice?
- Borrow-money: If you suddenly needed to borrow Rs. 50 for a day, whom would you ask?
- Lend-money: Who do you trust enough that if he/she needed to borrow Rs. 50 for a day you would lend it to him/her?
- Advice-come: Who comes to you for advice?
- Advice-go: If you had to make a difficult personal decision, whom would you ask for advice?
- Medical-help: If you had a medical emergency and were alone at home, whom would you ask for help in getting to a hospital?
- Relatives: Name any close relatives, aside from those in this household, who also live in this village.
- Temple-company: Do you visit a temple/mosque/church? Do you go with anyone else? What are the names of these people?

For the 43 villages where micro finance had been introduced by 2011, BSS also collected information on which villagers joined the program. These survey questions reveal the underlying structure of connections between any two individuals in the network. Figure 5 presents all those connections at the household level in a graph. Each node in the graph represents a household. A black node indicates that the household joined the micro finance program, while a white node indicates that it did not. Bigger nodes represent those households in which at least one family member has been chosen as being among the "pre-defined leaders". An edge between two nodes signifies that the two nodes are connected in at least one of the 12 networks. The darker the color of the edge, the more connections it represents.

This dataset provides an ideal framework for application of the heterogeneous endogenous effects model. First, it allows me to model endogenous effects. An individual may decide to join the micro finance program if her neighbors or friends plan to join. Second, the endogenous effects are individual specific. Given the diversity of the villagers, it is possible that some villagers are more influential than others. Third, it allows me to implement the heterogeneous endogenous effects model with multiple networks. The questions asked regarding multiple dimensions of the network structure allow me to explore which network is most influential.

In this empirical study, I focus on the 38 villages that have been introduced to the micro finance programs by BSS. For each village, I can observe both its social network structure and the villagers' decisions about joining the program. I drop the data for one village (Village 46) that contains incorrect entries on the index of households. Appendix table ?? summarizes the descriptive statistics for each village.

Among the 12 questions about the social network structure, 4 pairs essentially capture the same connections among the villagers. Therefore, I consolidate each pair of questions into one dimension:

- Visit go-come: In your free time, whose house do you visit? Who visits your house in his or her free time?
- Borrow-Lend-kerorice: If you needed to borrow kerosene or rice, to whom would you go? Who would come to you if he/she needed to borrow kerosene or rice?
- Borrow-Lend-money: If you needed to borrow Rs. 50 for a day, whom would you ask? Who do you trust enough that if he/she needed to borrow Rs. 50 for a day you would lend it to him/her?
- Help decision: Who comes to you for advice? If you had to make a difficult personal decision, whom would you ask for advice?

Footnote: Assuming every villager truthfully answers a pair of questions, the adjacency matrices associated with each question are the same. It is also plausible to treat villagers' answers to each question as a separate directed graph.

I restructure all the data at the household level because only women are allowed to apply for the micro finance program: the goal of BSS is to support families through the women in them. As a result, a woman's decision to join or not join the micro finance program becomes her family's decision. A connection between two villagers becomes a connection between two families. A "predefined leader" is a villager selected by BSS to help spread information about the micro finance program to the other villagers. At the household level, I use the term "predefined leader" for a household that contains at least one such villager.
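The two data-preparation steps described above (merging each pair of mirrored questions into one dimension, then moving from villagers to households) can be sketched as follows. This is an illustrative sketch only: the function names, the toy villager labels and the choice to drop links inside a single household are assumptions, not details taken from the survey files.

```python
def consolidate(edges_a, edges_b):
    """Merge two mirrored questions into one set of undirected villager links."""
    merged = set()
    for u, v in list(edges_a) + list(edges_b):
        if u != v:
            merged.add(frozenset((u, v)))
    return merged


def to_household_level(edges, household_of):
    """Turn villager-to-villager links into household-to-household links."""
    links = set()
    for pair in edges:
        u, v = tuple(pair)
        hu, hv = household_of[u], household_of[v]
        # Assumption: a link between two members of one household is dropped.
        if hu != hv:
            links.add(frozenset((hu, hv)))
    return links


def leader_households(leaders, household_of):
    """A household counts as a predefined leader if it contains one."""
    return {household_of[v] for v in leaders}


# Toy data (hypothetical): four villagers living in three households.
household_of = {"a": 1, "b": 1, "c": 2, "d": 3}
visit_go = [("a", "c")]                # a reports visiting c
visit_come = [("c", "a"), ("d", "b")]  # c reports a's visits, d reports b's
merged = consolidate(visit_go, visit_come)
household_links = to_household_level(merged, household_of)
leaders = leader_households(["b"], household_of)
```

Storing each link as a `frozenset` makes the consolidated dimension undirected by construction, which matches the treatment of each pair of questions as one network.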

To demonstrate how my method identifies influential households, I model families' decisions regarding joining the micro finance program as a network game with Bayesian Nash Equilibrium. For household $i$, let $d^*_i$ be the expected probability that $i$ chooses to join the micro finance program. The decision of household $i$ depends on its neighbors' decisions as well as the types of connections between them. The decision also depends on its characteristics $X_i$ and on unobserved information $\epsilon_i$. Formally, it can be written as:

$$d^*_i = \sum_{l \in N_i} d^*_l \Big( \sum_{j=1}^{q} \eta^{j}_{l} \Big) + x_i \beta + \epsilon_i$$

In matrix form:

$$D^*_n = \sum_{j=1}^{q} \big( M^j_n \circ D^*_n \big) \eta^j + X_n \beta + \epsilon_n$$

Footnote (continued): However, these questions do not allow for clear determination of the directions. For example, if villager A visits villager B's house, it is not clear whether villager A influences villager B or vice versa.

I assume that only a small number of households are influential over their neighbors. Leaders and followers are usually observed in those rural villages. Big decisions are often made by the village elders or by the more educated among the villagers. BSS recognized the importance of leaders and gathered a group of predefined leaders, asking them to inform the rest of the villagers about their program. I do not consider local-level influence in these villages, given the size and complexity of the network structures. Households are closely connected by these 8 networks, as shown in Figure 5, and there is no form of clique visible.

Because the villages are considered geographically isolated, I apply my estimator separately to each of the 38 villages. I use the number of rooms per person in a household as the independent variable $X_n$. The number of rooms per person is a proxy for the wealth of the family. As shown in table 1, it is negatively correlated with the decision to join the micro finance program. The richer the family, the less likely it is to participate in the micro-finance program. On the other hand, it is arguably exogenous to other factors (career, education, etc.) that might affect the decision to join the micro-finance program. I further check the robustness of my independent variable by including additional controls. The adjacency matrix $M^j_n$ is constructed from the questions in the survey. Households $i$ and $k$ are connected in network $j$ if either $i$ or $k$ reported the other in question $j$. Finally, $d^*_i$ is replaced with the household's decision.

The instruments are constructed as $\big( M^j_n \circ X_n \big)$ for $j = 1, 2, \cdots, 8$. I use the heterogeneous endogenous effects model with multiple networks to: 1) identify the effective networks affecting a household's decision and 2) identify the households that are leaders in the village and study the association between observable characteristics and leader status. If a new program is going to try to recruit these households, the organizers can target those influential households and try to persuade them to join first.
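The construction of the adjacency matrices, the endogenous regressors and the instruments described above can be sketched as follows. This is a hedged illustration rather than the estimation code: the function names and the toy data are hypothetical, the operator $\circ$ is assumed to scale column $l$ of a matrix by the $l$-th entry of the vector, and stacking the networks side by side is one possible layout. The two-stage LASSO itself is not implemented here.

```python
import numpy as np


def adjacency_from_reports(reports, n):
    """Households i and k are linked if either one named the other."""
    M = np.zeros((n, n))
    for i, k in reports:
        if i != k:
            M[i, k] = M[k, i] = 1.0
    return M


def scale_columns(M, v):
    """Assumed reading of "M o v": entry (i, l) equals M[i, l] * v[l]."""
    return M * np.asarray(v, dtype=float)


def build_design(networks, D, X):
    """Stack regressors M^j o D and instruments M^j o X across networks."""
    regressors = np.hstack([scale_columns(M, D) for M in networks])
    instruments = np.hstack([scale_columns(M, X) for M in networks])
    return regressors, instruments


# Toy example with 3 households and q = 2 networks (all values hypothetical).
M1 = adjacency_from_reports([(0, 1)], n=3)
M2 = adjacency_from_reports([(2, 1), (0, 2)], n=3)
D = [1.0, 0.0, 1.0]  # observed participation decisions
X = [0.5, 1.0, 2.0]  # rooms per person
W, Z = build_design([M1, M2], D, X)
```

Under this layout each household contributes one candidate coefficient per network, so a village with $n$ households and 8 networks yields $8n$ columns in both `W` and `Z`, which is why a sparsity-inducing estimator is needed.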

First, I study how LASSO selects networks. I define a coefficient for a household's endogenous effect in a network as significant according to two different criteria. The first criterion, "Cross-Validation",

Table 1: Predictive Power of Characteristics $X_n$

| | (1) Participate | (2) Participate | (3) Participate |
|---|---|---|---|
| Average Num. rooms x100 | -8.19*** (1.30) | -7.12*** (1.30) | -3.36*** (1.12) |
| Household Size x100 | | 0.45** (0.21) | ** (0.20) |
| Electricity x100 | | | 1.29 (1.26) |
| Latrine x100 | | | -5.66*** (1.38) |
| Average Num. workers x100 | | | 0.64* (0.50) |
| Average age x100 | | | -0.27*** (0.03) |
| Village Fixed Effects | Y | Y | Y |
| n | | | |
| R | | | |

Table 1 provides a robustness check for the variable $X_n$: Average Num. rooms, which is used to construct instruments. Standard errors in parentheses. * p < .1, ** p < .05, *** p < .01. Standard errors are clustered at the village level. The dependent variable is the household's decision on whether to join the micro finance program or not. All designs control for village fixed effects.

| Criterion | Statistic | visit go-come | friendship | borrow-lend kerorice | borrow-lend money | relatives | help decision | medical help | temple company |
|---|---|---|---|---|---|---|---|---|---|
| Cross Validation | probability | 70% | 54% | 38% | 49% | 22% | 30% | 14% | 0% |
| Cross Validation | identified | 101 | 89 | 88 | 86 | 69 | 70 | 80 | 0 |
| De-bias | probability | 54% | 35% | 30% | 19% | 16% | 8% | 5% | 0% |
| De-bias | identified | 11 | 8 | 8 | 9 | 7 | 14 | 8 | 0 |
| De-bias | magnitudes | | | | | | | | |

Table 2 reports the probability of detection for different networks among the 38 villages. A network is detected as influential if at least one leader is detected within this network. 1. Cross Validation represents those networks detected by LASSO using cross-validation. 2. De-bias represents those networks detected by significant de-bias estimators under FDR control. 3. Probability reports the empirical probability that at least one coefficient $\hat{e}_{ji}$ is significant in network $j$. 4. Identified reports the average number of significant $\hat{e}_{ji}$ in network $j$, conditional on the network being detected. 5. Magnitudes is the mean of $|\hat{e}_{ji}|$ and represents the average endogenous effects through network $j$.

Second, I focus on how LASSO selects households. I compare the LASSO selected influential households with the BSS selected "predefined leaders". It is important to point out that these "predefined leaders" are not necessarily influential villagers in a network. Recall that predefined leaders are a set of villagers that BSS selected to help spread the information about the micro finance program. The fact that a villager is selected as a "predefined leader" to pass on information about the micro finance program does not a priori guarantee her or her family's influence: her decision to join the micro finance program may not lead to her neighbors' decisions to join. In the analyses below, I examine how influential villagers are associated with "predefined leaders" and explore their potential differences.

1. Influential Predefined Households

In table 3, I report results indicating that influential households selected by LASSO partly overlap with "predefined leaders". This is intuitive because some "predefined leaders", such as school headmasters and village elders, are highly respected figures in a village. Therefore, their decisions are likely to be followed by others in the village. On average, BSS selected 27 villagers as "predefined leaders" in each village. In comparison, the Cross-Validation criterion selects around 136 villagers and the de-bias criterion selects around 19. Furthermore, on average, 19 out of 136 influential villagers (i.e. 14%) selected by the Cross-Validation criterion are also BSS "predefined leaders"; 3 out of 19 influential villagers (i.e. 17%) selected by the de-bias criterion are also BSS "predefined leaders". Both shares are higher than the percentage of predefined leaders in the entire village (11%). Compared with a random guess of influential individuals, table 3 suggests that the LASSO detected influential individuals are more likely to overlap with the predefined leaders. In Table 6 below, I show that small business owners are more likely to be both influential and selected as "predefined leaders".

2. Influential Non-Predefined Households

In this and the following section, I focus on understanding the differences between the influential households selected by LASSO and the "predefined households" selected by BSS. I investigate how the likelihood that a household is selected by LASSO or by BSS is associated with the careers of its family members. Table 4 and 5 present linear regression results using career dummy variables of family members to explain whether a household is selected as "predefined leader" (Column 1 in table 4), whether a household is selected by LASSO as influential using cross-validation (Column 2 in table 4), and whether a household is selected by LASSO as influential using the de-bias estimator (Column 3 in table 4). The full results of these regressions are reported in appendix table A7.

Table 3: Coverage of predefined leaders

| | % of predefined leaders among LASSO detected | % of predefined leaders among entire village | Average number of discovery |
|---|---|---|---|
| Cross Validation | 14% | 11% | 136 |
| De-bias | 17% | 11% | 19 |

Table 3 depicts the overlap between influential households selected by LASSO and "predefined leaders". Predefined leaders are a set of villagers defined by BSS, who helped spread the information about the micro-finance program. 1. LASSO detected reports the percentage of households detected by LASSO and also selected as "predefined leaders" among all LASSO detected households. 2. Entire village reports the percentage of "predefined leaders" in the entire village. 3. Average number of discovery reports the total number of individuals discovered by LASSO using each method. 4. Cross Validation represents those individuals identified from LASSO using cross-validation. 5. De-bias represents those individuals identified from significant de-biased estimators controlling the FDR. 6. The average number of predefined leaders in one village is 27.

Table 4 summarizes all careers that have a significant impact ( p < . p < .01) on the likelihood of a household being selected by BSS as being among the "predefined leaders". Poojari are Indian priests in those villages and they are very likely to be included as "predefined leaders". However, they are not likely to influence people to join the micro finance program. Other careers, such as tailor, hotel worker, veteran and barber, are included as "predefined leaders" because individuals doing these jobs can spread information quickly in the village. However, LASSO does not find these individuals to be influential.

Table 6 reports the counterfactual study in which the selected leaders all decide to join the micro-finance program. The participation rate for non-leaders in the data is 16%. When all "predefined leaders" decide to join, the participation rate for non-leaders increases to 20%. When all LASSO selected leaders decide to join, the participation rate for non-leaders further increases to 33%.

Table 4: Second Stage: LASSO selected leaders' careers

Columns: (1) Predefined leaders; (2) Selected by LASSO, Cross-validation; (3) Selected by LASSO, De-bias. Standard errors in parentheses.

Agriculture labour: 0.00 0.31*** *** (0.01) (0.01) (0.00)
Anganavadi Teacher: 0.14* *** (0.06) (0.07) (0.04)
Construction/mud worker: 0.00 0.17*** *** (0.02) (0.03) (0.02)
Truck/Tractor Driver: -0.03 0.16*** *** (0.03) (0.03) (0.02)
Factory worker (bricks/stones/mill): -0.00 0.17*** *** (0.02) (0.03) (0.01)
Small business: 0.22*** *** *** (0.02) (0.03) (0.01)
Teacher: 0.05 0.22*** *** (0.04) (0.05) (0.03)
Daily labourer: -0.05* *** *** (0.03) (0.03) (0.02)
Wood cutter: -0.03 0.15* *** (0.06) (0.07) (0.04)
Animal skin business: 0.36 0.62* *** (0.23) (0.28) (0.15)
Control other careers: Y Y Y
Control village fixed effect: Y Y Y

Table 4 summarizes all careers that have a significant impact ( p < . ?? . The first column uses whether one is a predefined leader as the response variable, the second column uses whether one joins the micro-finance program as the response variable and the third column uses whether one is selected by LASSO as the response variable. Standard errors in parentheses. * p < .1, ** p < .01, *** p < .

Columns: (1) Predefined leaders; (2) Selected by LASSO, Cross-validation; (3) Selected by LASSO, De-bias. Standard errors in parentheses.

Small business: 0.22*** *** *** (0.02) (0.03) (0.01)
Tailor Garment worker: 0.08** *** *** ** -0.02 (0.07) (0.09) (0.05)
Poojari: 0.53*** ** -0.03 0.00 (0.32) (0.39) (0.21)
Barber/saloon: 0.41*** -0.01 -0.02 (0.10) (0.12) (0.06)
Control other careers: Y Y Y
Control village fixed effect: Y Y Y

Table 5 summarizes all careers that have a significant impact (( p < . ?? . The first column uses whether one is a predefined leader as the response variable, the second column uses whether one joins the micro-finance program as the response variable and the third column uses whether one is selected by LASSO as the response variable. Standard errors in parentheses. * p < .1, ** p < .01, *** p < .

Table 6: Participation Rate when Targeting Different Leaders

| | In data | Predefined Leaders | LASSO Leaders |
|---|---|---|---|
| Participation Rate (non-leaders) | 16% | 20% | 33% |

Table 6 reports the participation rate of non-leaders when all targeted leaders decide to join. The true participation rate in the data is 16%. If all predefined leaders decide to join, the participation rate increases to 20%. If all LASSO detected leaders decide to join, the participation rate increases to 33%.

Conclusions

In this paper, I propose a novel spatial autoregression model which allows for heterogeneous endogenous effects. Specifically, each individual has an individual-specific endogenous effect on her neighbors. My approach is useful for modeling a network with leaders and followers.

I propose a set of instruments as well as a two-stage LASSO (2SLSS) method to estimate my model. The instruments are constructed as a function of the independent variables and an adjacency matrix. I use a LASSO type estimator to select the valid instruments in the first stage and the influential individuals in the second stage. I propose a bias correction for my two-stage estimator following van de Geer et al. (2014). I derive the asymptotic normality of my "de-bias" two-stage LASSO estimator and conduct robust inference including confidence intervals.

My model can be extended to allow for more flexible structures. To apply LASSO, I assume that the set of influential individuals is sparse. I propose a heterogeneous endogenous effects model with cliques to incorporate locally influential individuals, where the sparsity assumption is only applied to globally influential individuals. My model can also be extended to situations where there are multiple networks. I propose the use of the sparse group LASSO in my 2SLSS process. I derive the convergence rate and prove the consistency of selection for the sparse group LASSO estimator.

I apply my method to study villagers' decisions to participate in micro-finance programs in rural areas of India. I show that leaders in those villages have significant influence over their neighbors' decisions to join the micro-finance program, and I provide rankings for the different social and economic networks among villagers. Based on how effectively each network spreads the impact of influential individuals' decisions, my method shows that some networks such as "visit go-come" and "borrow money" are much more effective in influencing villagers' decisions than other networks such as "temple company" and "medical help". I further show that individuals from certain careers, such as agricultural workers, Anganwadi teachers and small business owners, are more likely to influence other villagers, and that the "predefined leaders" selected by BSS are different from the LASSO detected influential individuals.

References

Acemoglu, D., García-Jimeno, C., and Robinson, J. A. (2012). Finding Eldorado: Slavery and long-run development in Colombia. NBER Working Paper Series.

Ammermuller, A. and Pischke, J.-S. (2009). Peer effects in European primary schools: Evidence from PIRLS. Journal of Labor Economics, 27(3):315–348.

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Boston: Kluwer.

Ballester, C., Calvó-Armengol, A., and Zenou, Y. (2006). Who's who in networks. Wanted: The key player. Econometrica, 74(5):1403–1417.

Bandiera, O., Barankay, I., and Rasul, I. (2009). Social connections and incentives in the workplace: Evidence from personnel data. Econometrica, 77(4):1047–1094.

Banerjee, A., Chandrasekhar, A., Duflo, E., and Jackson, M. (2013). The diffusion of microfinance. Science, 341(6144).

Belloni, A., Chernozhukov, V., and Hansen, C. (2014). Inference on treatment effects after selection amongst high-dimensional controls. The Review of Economic Studies, 81(2):608–650.

Belloni, A., Chernozhukov, V., and Kato, K. (2015). Uniform post selection inference for LAD regression and other Z-estimation problems. Biometrika, 102:77–94.

Blume, L. E., Brock, W. A., Durlauf, S. N., and Jayaraman, R. (2015). Linear social interactions models. Journal of Political Economy, 123(2):444–496.

Bonaldi, P., Hortacsu, A., and Kastl, J. (2015). An empirical analysis of funding costs spillovers in the euro-zone with application to systemic risk. NBER Working Paper.

Bramoullé, Y., Djebbari, H., and Fortin, B. (2009). Identification of peer effects through social networks. Journal of Econometrics, 150(1):41–55.

Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli, 41(2):802–837.

Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer.

Bunea, F., Lederer, J., and She, Y. (2014). The square root group lasso: theoretical properties and fast algorithms. IEEE-Information Theory, 60:1313–1325.

Calvó-Armengol, A., Patacchini, E., and Zenou, Y. (2009). Peer effects and social networks in education. Review of Economic Studies, 76(4):1239–1267.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and causal parameters. Econometrics Journal, 21:C1–C68.

Christakis, N. A., Fowler, J. H., Imbens, G. W., and Kalyanaraman, K. (2010). An empirical model for strategic network formation. NBER Working Paper.

Clark, A. E. and Loheac, Y. (2007). "It wasn't me, it was them!" Social influence in risky behavior by adolescents. Journal of Health Economics, 26:763–784.

Cliff, A. and Ord, J. K. (1973). Spatial autocorrelation. London: Pion.

Coelli, T., Rahman, S., and Thirtle, C. (2002). Technical, allocative, cost and scale efficiencies in Bangladesh rice cultivation: A nonparametric approach. Journal of Agricultural Economics, 53(3):607–626.

Conley, T. G. and Udry, C. R. (2010). Learning about a new technology: Pineapple in Ghana. American Economic Review, 100(1):35–69.

Cressie, N. A. C. (1993). Statistics for Spatial Data. John Wiley & Sons, Inc.

de Paula, A., Rasul, I., and Souza, P. C. (2015). Recovering social networks from panel data: identification, simulations and an application.

Denbee, E., Julliard, C., Li, Y., and Yuan, K. (2015). Network risk and key players: A structural analysis of interbank liquidity.

Fan, J. and Liao, Y. (2014). Endogeneity in high dimensions. The Annals of Statistics, 42(3):872–917.

Gautier, E. and Tsybakov, A. B. (2014). High-dimensional instrumental variables regression and confidence sets. TSE Working Paper.

Guryan, J., Kroft, K., and Notowidigdo, M. J. (2009). Peer effects in the workplace: Evidence from random groupings in professional golf tournaments. American Economic Journal: Applied Economics, 1(4):34–68.

Horrace, W. C., Liu, X., and Patacchini, E. (2016). Endogenous network production functions with selectivity. Journal of Econometrics, 190(2):222–232.

Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research, 15(1):2869–2909.

Jin, F. and Lee, L.-F. (2018). Lasso maximum likelihood estimation of parametric models with singular information matrices. Econometrics, 6(1):1–24.

Kelejian, H. H. and Prucha, I. R. (1995). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review, 40.

Kelejian, H. H. and Prucha, I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics, 17(1):99–121.

Krauth, B. V. (2005). Peer effects and selection effects on smoking among Canadian youth. Canadian Journal of Economics, 38(3):735–757.

Lee, L. (2002). Consistency and efficiency of least squares estimation for mixed regressive, spatial. Econometric Theory, 18(2):252–277.

Lee, L. (2003). Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive. Econometric Reviews, 22(4):305–335.

Lee, L. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric models. Econometrica, 72:1899–1926.

Lee, L. and Liu, X. (2010). Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econometric Theory, 26:187–230.

Lee, L.-f. and Yu, J. (2010). A spatial dynamic panel data model with both time and individual effects. Econometric Theory, 26:564–597.

Leeb, H. and Potscher, B. M. (2005). Model selection and inference: facts and fiction. Econometric Theory, 21(1):21–59.

Leeb, H. and Potscher, B. M. (2008). Can one estimate the unconditional distribution of post-model selection estimators? Econometric Theory, 24(2):338–376.

Leeb, H. and Potscher, B. M. (2009). Model selection. Handbook of Financial Time Series, pages 889–925.

Manresa, E. (2013). Estimating the structure of social interactions using panel data.

Manski, C. (1993). Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 60(3):531–542.

Mas, A. and Moretti, E. (2009). Peers at work. American Economic Review, 99(1):112–145.

Masten, M. A. (2018). Random coefficients on endogenous variables in simultaneous equations models. The Review of Economic Studies, 85(2):1193–1250.

Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462.

Nakajima, R. (2007). Measuring peer effects on youth smoking behaviour. The Review of Economic Studies, 74(3):897–935.

Neidell, M. and Waldfogel, J. (2010). Cognitive and noncognitive peer effects in early education. Review of Economics and Statistics, 92(3):562–576.

Pinkse, J., Slade, M., and Brett, C. (2002). Spatial price competition: a semiparametric approach. Econometrica, 70(3):1111–1153.

Sacerdote, B. (2001). Peer effects with random assignment: Results for Dartmouth roommates. The Quarterly Journal of Economics, 116(2):681–704.

Sheng, S. (2016). A structural econometric analysis of network formation games through subnetworks. Conditional Acceptance at Econometrica.

Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2013). The sparse group lasso. Journal of Computational and Graphical Statistics, 22(2):231–245.

Upton, G. and Fingleton, B. (1985). Spatial data analysis by example. Volume 1: Point pattern and quantitative data. John Wiley and Sons Ltd.

van de Geer, S., Bühlmann, P., Ritov, Y., and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3):1166–1202.

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, B(68):49–67.

Zhang, C.-H. and Zhang, S. S. (2011). Confidence intervals for low-dimensional parameters in high-dimensional linear models. Journal of the Royal Statistical Society, 76(1):217–242.

Zhao, P. and Yu, B. (2006). On model selection consistency of lasso.

Journal of Machine LearningResearch , 7:2541–2563.Zhu, Y. (2018). Sparse linear models and l1regularized 2sls with high-dimensional endogenousregressors and instruments.