Heterogeneous Endogenous Effects in Networks ∗ Sida Peng † August 5, 2019
Abstract
This paper proposes a new method to identify leaders and followers in a network. Prior works use spatial autoregression models (SARs), which implicitly assume that each individual in the network has the same peer effects on others. Mechanically, they conclude the key player in the network to be the one with the highest centrality. However, when some individuals are more influential than others, centrality may fail to be a good measure. I develop a model that allows for individual-specific endogenous effects and propose a two-stage LASSO procedure to identify influential individuals in a network. Under an assumption of sparsity, namely that only a subset of individuals (which can increase with sample size $n$) is influential, I show that my 2SLSS estimator for individual-specific endogenous effects is consistent and achieves asymptotic normality. I also develop robust inference including uniformly valid confidence intervals. These results also carry through to scenarios where the influential individuals are not sparse. I extend the analysis to allow for multiple types of connections (multiple networks), and I show how to use the sparse group LASSO to detect which of the multiple connection types is more influential. Simulation evidence shows that my estimator has good finite sample performance. I further apply my method to the data in Banerjee et al. (2013) and my proposed procedure is able to identify leaders and effective networks.
Key words: key players, network, endogenous effects, spillovers, high-dimensional models, LASSO, model selection, robust inference
∗ I would like to thank Francesca Molinari, Matthew Backus, David Easley, Marten Wegkamp, Donald Kenkel, Zhuan Pei, Douglas Miller, Joris Pinkse, Peter Hull, Yanlei Ma and participants in seminars and conferences at which this paper was presented. All remaining errors are mine.
† Microsoft Research, [email protected]
arXiv [econ.EM]
Introduction
How an individual’s behavior is affected by the behavior of her neighbors in an exogenously given network is an important research question in applied economics. With the increasing availability of detailed data documenting connections among individuals, spatial autoregression models (SARs) have been widely applied in the empirical networks literature to estimate endogenous effects. In SARs, an individual’s behavior depends on the weighted average of other individuals’ behaviors (see Anselin, 1988; Kelejian and Prucha, 1998). Standard SARs assume that the peer effects/endogenous effects are the same across individuals in a network. Each individual influences her neighbors at the same rate regardless of who she is. However, in many contexts, some individuals are clearly more influential than others. For example, Mas and Moretti (2009) find that the magnitude of spillovers varies dramatically among workers with different skill levels. Clark and Loheac (2007) also note that popular teenagers in a school have a much stronger influence on their classmates’ smoking decisions than their less popular peers.
I propose a novel SAR model which allows for heterogeneous endogenous effects. Each individual in a network simultaneously generates an outcome that takes into account all her neighbors’ behaviors. Unlike standard SARs, each individual has an individual-specific effect on her neighbors. As a result, there are as many coefficients for individual-specific endogenous effects as there are individuals in the network. To achieve identification, I assume that “truly-influential” individuals only constitute a small fraction of the total population. In other words, individual-specific coefficients are assumed to be sparse. This assumption allows me to estimate the model via the least absolute shrinkage and selection operator (LASSO). The LASSO procedure penalizes the $\ell_1$ norm of the coefficients of heterogeneous endogenous effects.
The geometry of the $\ell_1$ norm enforces the sparsity in the LASSO estimators. If a coefficient is selected by LASSO (i.e. the estimated coefficient is non-zero), the individual associated with this coefficient can influence all her neighbors at her specific rate. Otherwise the LASSO estimator will indicate that the individual has no influence on her neighbors. With some restrictions on the network structures, I show that the LASSO estimates for heterogeneous endogenous effects have near oracle performance (see Bühlmann and van de Geer, 2011). In other words, the selection of influential individuals is consistent and the convergence rate of non-zero LASSO estimates is the same as the convergence rate that would have been achieved if the truly influential individuals were known.
One challenge in my estimation process is the presence of endogeneity in the spatial lag and error term. As with standard SARs, the dependent variable in my model is used to construct spatial lags as an independent variable. As a result, the regressors are correlated with the error term and estimates would be biased if we were to apply LASSO directly.
First I propose a set of novel instruments to address the endogeneity. Following Kelejian and Prucha (1998), I express the dependent variable as an infinite sum of functions consisting of exogenous characteristics and an adjacency matrix. I show that the exogenous characteristics of influential individuals can be used as instruments for their neighbors. Then I design a two-stage estimation process for heterogeneous endogenous effects using LASSO at each stage. In the first stage, I use LASSO to estimate the coefficients for the instruments. These estimated coefficients and instruments are then used to create a synthetic dependent variable.
In the second stage, I replace the dependent variable in the spatial lags with the synthetic variable to perform the LASSO estimation.
The next challenge is to construct robust confidence intervals for my LASSO type two-stage estimator. As pointed out in Leeb and Pötscher (2008), it is impossible to estimate the distribution for post-model-selection estimators. Consistent model selection by LASSO is only guaranteed when all non-zero coefficients are large enough to be distinguished from zero in a finite sample (a condition usually named “beta-min”). LASSO may fail to select regressors with very small coefficients, resulting in omitted variable bias in the post-LASSO inference.
I propose a bias correction for my two-stage estimator following the recent LASSO inference literature (see Chernozhukov et al., 2018; Belloni et al., 2015; van de Geer et al., 2014; Javanmard and Montanari, 2014; Zhang and Zhang, 2011; Zhu, 2018). The idea is to correct the first order bias and make the estimators independent from the model selection. Heuristically, shrinkage bias due to the $\ell_1$ penalty in LASSO can be expressed as a function of the LASSO estimators. Normality can still be achieved after adjusting for this bias. I show that this strategy also works in my two-stage estimation process under the presence of spatial errors. I derive the asymptotic normality for my “de-bias” two-stage LASSO estimator and conduct robust inference including confidence intervals.
My model can be extended to allow for more flexible structures of influence. One real world scenario is a network which consists of many local leaders. Local leaders may only influence individuals within their own cliques/groups but have no influence on individuals outside their cliques/groups. Influence from local leaders can be represented by a homogeneous endogenous effect.
Exogenous effects and homophily within cliques may also be present.
To solve the problem in this scenario, I modify my model by bringing back the classical SAR model. I assume that there are both local leaders and global leaders in the network. In contrast to local leaders, global leaders can influence individuals across different cliques. I show that homogeneous endogenous effects across local leaders, exogenous effects, and correlated effects can be identified under assumptions similar to those in Bramoullé et al. (2009). Under a sparsity assumption, the endogenous effects of global leaders, whose influence remains individual-specific, can also be identified under the same set of instruments proposed in my main model.
Another real world scenario is the existence of multiple types of connections among individuals. For example, connections among individuals can be classified as social (e.g. friendship, kinship) or economic (e.g. lending, employment). In epidemiology, infectious disease can be spread through air, insects, or direct contact. It is important to identify which networks are more efficient at transmitting the endogenous effects.
I model different types of connections as multiple networks. I propose the use of sparse group LASSO to estimate a heterogeneous endogenous effects model with multiple networks. The sparse group LASSO penalizes both the $\ell_1$ norm and the $\ell_2$ norm for each coefficient in each type of connection and thus selects both the influential individuals and effective types of connections that generate spillover. I derive the convergence rate and prove the consistency of selection for this estimator.
To the best of my knowledge, my paper is the first to show statistical properties for sparse group LASSO.
I provide simulation evidence for networks of different sizes and different data generating algorithms. The empirical coverage of my proposed estimators is close to the nominal level in all scenarios. Similar results are also found in models with multiple networks and with cliques.
I apply my method to study villagers’ decisions to participate in micro-finance programs in rural areas of India as in Banerjee et al. (2013). Instead of simulating the contagion process and reconstructing the data into panel format as in Banerjee et al. (2013), my method allows researchers to directly analyze cross-sectional data under the standard equilibrium assumptions. Among different social and economic networks, my method shows that some networks, such as “visit go-come” and “borrow money”, are much more effective at influencing villagers’ decisions than other networks such as “go to temple together” and “medical help”. I further show that individuals in certain careers such as agricultural workers, Anganwadi teachers and small business owners are more likely to influence villagers.
This paper brings together literature on spatial autoregression models, LASSO and networks.
SARs:
SARs have been widely applied in empirical studies. For instance, they have been used to study peer effects in labor productivity (see Mas and Moretti, 2009; Guryan et al., 2009; Bandiera et al., 2009), smoking behavior among teenagers (see Krauth, 2005; Clark and Loheac, 2007; Nakajima, 2007), educational achievements among different student groups (see Sacerdote, 2001; Neidell and Waldfogel, 2010), systemic risk in finance (see Bonaldi et al., 2015; Denbee et al., 2015), and the adoption of new agricultural technologies (see Coelli et al., 2002; Conley and Udry, 2010). My paper proposes a novel extension of standard SARs that could be used to identify influential individuals in a given network. My methodology for estimating such a model could easily be adopted in existing empirical SAR analyses to identify influential individuals who influence their peers’ productivity, smoking decisions, or financial holdings.
More specifically, my model extends the existing SAR literature by introducing heterogeneous endogenous effects. Until very recently, SARs always assumed a constant rate of dependence for endogenous effects across different individuals. Moreover, row-sum normalization is widely adopted in the estimation, which may further result in misspecification when endogenous effects are heterogeneous (see Cliff and Ord (1973), the first monograph on the topic, and the later studies Upton and Fingleton (1985); Anselin (1988); Cressie (1993); Lee and Liu (2010); Lee and Yu (2010); Jin and Lee (2018)). Recent developments in the social interaction literature incorporate individual characteristics into SARs, essentially allowing for a limited degree of pre-stipulated heterogeneity (see, for example, Pinkse et al., 2002). In contrast, this paper shows that heterogeneous endogenous effects can be identified from individuals’ outcomes instead of being pre-specified through individuals’ characteristics.
Another article studying individual heterogeneity, from a random coefficient perspective, is Masten (2018). Masten (2018)’s model assumes each individual has a different rate of receiving influence, while my model assumes each individual has a different rate of influencing her neighbors. The two different models yield two different identification strategies. This paper contributes to the literature by showing heterogeneous endogenous effects can be directly identified using cross-sectional data.
To estimate the heterogeneous endogenous effects in my model, I propose a methodology that is different from the standard SAR literature. In classic SARs, there is only one endogenous variable and hence it is sufficient to identify the model through only one instrument. In my model, the number of potentially endogenous variables increases as the number of observations increases. As a result, I propose a set of instruments that contains as many instruments as there are individuals. Each instrument is essentially a decomposition of the standard SAR instruments as in Kelejian and Prucha (1998), Lee (2002), Lee (2003) and Lee (2004).
I show that my model can be combined with the SAR model under homogeneous endogenous effects, exogenous effects and correlated effects. To solve the classical “reflection problem” as in Manski (1993), I adopt the same strategy as in Bramoullé et al. (2009) and show that “neighbors’ neighbor” instruments can be combined with my set of instruments to identify additional structures in the spillovers.
This paper also contributes to literature that models multiple networks through SARs. In standard SARs, multiple networks are modeled as higher order spatial lags (see Lee and Liu, 2010). My model allows each individual to have her own specific endogenous effects in each network. This design enables selection at the network level, which implies some networks can be classified as completely irrelevant to decision-making.
Furthermore, as the asymptotics allow the number of networks $q$ to go to infinity, my model can handle cases where researchers observe a large number of networks ($q > n$).
LASSO:
My paper contributes to the growing literature on endogenous regressors in LASSO estimators. For instance, Belloni et al. (2014) propose the double selection mechanism to study confounded treatment effects. Fan and Liao (2014) propose a GMM type estimator to deal with many endogenous regressors. Gautier and Tsybakov (2014) propose a Self Tuning Instrumental Variables (STIV) estimator. The paper that is closest to mine is Zhu (2018), which studies the statistical properties of a two-stage least squares procedure with high-dimensional endogenous regressors. The two-stage estimator proposed in Zhu (2018) assumes an i.i.d. error term for the first stage. However, this assumption is incompatible with the SAR model, as the structural assumptions in SAR lead to a first stage with correlated unobservables. In this paper, I derive the rate of convergence and consistency of selection for a two-stage LASSO estimator under the presence of spatial errors. I show that a modified “de-bias” LASSO estimator accounting for spatial errors can be constructed for my estimator in a manner similar to Zhang and Zhang (2011), Bühlmann (2013), van de Geer et al. (2014), and Zhu (2018). I derive its asymptotic distributions and show how to perform inference.
This paper also extends the LASSO literature by deriving statistical bounds and consistency of selection for the sparse group LASSO estimator. Yuan and Lin (2006) propose the group LASSO, in which explanatory variables are represented by different groups. The group LASSO assumes that sparsity exists only among groups, i.e. some groups of variables are relevant while other groups are not. Simon et al. (2013) propose the sparse group LASSO, which further allows sparsity within each group, i.e. some regressors within the relevant groups can also be irrelevant. Bunea et al. (2014) derive statistical properties for the square-root group LASSO, which combines group LASSO and square-root LASSO. When estimating a heterogeneous endogenous effects model with multiple networks, I establish both statistical bounds and consistency of selection for the sparse group LASSO estimator. Sparse group LASSO differs from group LASSO by allowing the number of regressors in each group to also go to infinity as the number of groups goes to infinity. To the best of my knowledge, this paper is the first to show asymptotic statistical properties for the sparse group LASSO estimator.
Networks:
My paper shares similar microfoundations with SARs as discussed in Blume et al. (2015), where the individual utility function can be written as a linear summation of the private and social components. The private component is a quadratic loss function on the individual’s efforts. The social components depend on the network structure as well as the efforts of one’s neighbors. While the marginal rate of substitution between the private and social components of utility is assumed fixed in SARs, I assume this rate is individual-specific and depends on one’s neighbors.
My paper applies and extends LASSO approaches to deal with a high-dimensional problem in networks. The total number of possible edges in a network is $n^2$; however, the social interaction networks we often observe are far more sparse. This is an ideal setting where penalized estimators like LASSO could be applied. Manresa (2013) studies heterogeneous exogenous effects in a network using LASSO. de Paula et al. (2015) explore the use of LASSO to recover network structures. Both papers consider panel data and rely on repeated observations of the same network to identify their models. My model considers cross-sectional data. To identify an individual’s endogenous effects, I rely on the variations in her neighbors’ outcomes.
My paper also relates to the literature on identifying the key players in the network following Ballester et al. (2006), Calvó-Armengol et al. (2009), and Horrace et al. (2016). Under the framework of SARs, every individual is assumed to have the same endogenous effects. As a result, individuals who are well connected in the network (with a high centrality measure) are considered the key players. However, well connected individuals may effectively have zero effects on their neighbors under heterogeneous endogenous effects.
Indeed, as shown in the empirical application, well connected villagers such as tailors, hotel workers, veterans, and barbers are not influential in other villagers’ decisions to join the micro-finance program.
The rest of this paper is organized as follows: in Section 2, I introduce the model; in Section 3, I discuss assumptions; in Section 4, I design estimation procedures and derive consistency and asymptotic properties; in Section 5, I show finite sample performance using Monte Carlo simulations; in Section 6, I apply my proposed model to study influential individuals and effective networks in promoting micro-finance programs in rural India; and in Section 7, I conclude.
Models
Let $n$ denote the total number of observed individuals in a network. The outcome of individual $i$ is denoted as $d_i$ and is the variable of interest. Here $d_i$ can represent any outcome associated with individual $i$, such as whether to join a program or $i$’s labor productivity. In the standard SAR model, it is assumed that the outcome of each neighbor of individual $i$ impacts her outcome homogeneously through a constant rate $\lambda$:
$$d_i = \lambda \sum_{j \in N_i} d_j + x_i \beta + \epsilon_i, \qquad (1)$$
where $N_i$ is the set of individual $i$’s neighbors. The matrix form of this model is expressed as follows:
$$D_n = \lambda M_n D_n + X_n \beta + \epsilon_n, \qquad (2)$$
where $D_n = (d_1, d_2, \cdots, d_n)'$ is the $n$-dimensional vector of observable outcomes. The $n$ by $k$ matrix $X_n$ represents the observable exogenous characteristics of individuals. When $\epsilon_n$ is specified as an $n$-dimensional vector of independent and identically distributed disturbances with zero mean and a constant variance $\sigma^2$, equation (2) is also called a mixed regression model.
The spatial weight matrix $M_n$ is of size $n$ by $n$, where the $(i, j)$-th entry represents the connection between individual $i$ and individual $j$. In empirical studies, the spatial weight matrix is often replaced by the adjacency matrix (see Ammermuller and Pischke, 2009; Acemoglu et al., 2012; Banerjee et al., 2013): the $(i, j)$-th entry of the matrix $M_n$ takes value 1 if individual $i$ and individual $j$ are connected and takes value 0 otherwise; the diagonal entries of the matrix $M_n$ are always 0.
In the SAR literature, the spatial weight matrix or adjacency matrix is taken as exogenous. My method follows this assumption and is designed for cross-sectional data. However, it is important to recognize that social networks do change over time, and it is then important to take network formation into account in the modeling (see Christakis et al.
(2010), Sheng (2016)).
In the mixed regression model, endogenous effects (see Manski, 1993) or network effects (see Bramoullé et al., 2009) are captured by the scalar $\lambda$. An implicit assumption in equation (2) is that $\lambda$, the rate of endogenous effects, is identical across all individuals in the network. Although a limited degree of heterogeneity can be built into the adjacency matrix via pre-specified structural assumptions, the identification potential for heterogeneous endogenous effects has not been fully explored. This limitation has been noted in various studies (see Ammermuller and Pischke, 2009; de Paula et al., 2015). I relax this assumption by proposing a more flexible model that allows for and identifies individual-specific endogenous effects, as discussed below. I propose the following model:
$$d_i = \sum_{j \in N_i} d_j \eta_j + x_i \beta + \epsilon_i, \qquad (3)$$
where $N_i$ represents the set of individual $i$’s neighbors and $\eta_j$ represents the endogenous effect of individual $j$ on the outcome of all her neighbors $i \in N_j$. The model can be rewritten in matrix form as:
$$D_n = \left( M_n \circ D_n \right) \eta + X_n \beta + \epsilon_n, \qquad (4)$$
where $\eta = (\eta_1, \eta_2, \cdots, \eta_n)'$ is a parameter vector of size $n$ by 1. The $i$-th entry in $\eta$ represents the endogenous effect of individual $i$ on her neighbors. This model allows individual heterogeneity to interact with endogenous effects so that every individual is allowed to have her own coefficient $\eta_i$. My model allows some $\eta_j = 0$. In other words, there are individuals that impose no endogenous effects on their neighbors.
I define those individuals with $\eta_j \neq 0$ as influential.
The operator $\circ$ is defined between an $n$ by $n$ matrix $M_n$ and an $n$ by 1 vector $D_n$ as
$$M_n \circ D_n = M_n \cdot \mathrm{diag}(D_n) = C,$$
where $\mathrm{diag}(\cdot)$ is the diagonalization operator and $C_{i,j} = M_{i,j} d_j$.
Note that in contrast to the fixed rate $\lambda$ specified in equation (2), even though each neighbor of individual $j$ is assumed to receive the same influence $d_j \eta_j$ from her, each individual is allowed to influence her neighbors at her own rate $\eta_j$.
A more general form of the model replaces $\eta_j$ with $\eta_{ij}$. This generalization can further capture different rates at which different followers perceive the endogenous effects of the same leader. However, the number of parameters increases from $n$ to $n^2$ and the model becomes too saturated to estimate.
Another direction of modeling is to assume a heterogeneous perceiving rate of endogenous effects while maintaining the assumption of a homogeneous influencing rate, for example
$$d_i = \lambda_i^* \sum_{j \in N_i} d_j + x_i \beta + \epsilon_i$$
... $\lambda_i^*$ to be passed to a first stage regression as shown in Proposition 1. As a result, the identification strategies are fundamentally different.
Equation (4) can be derived from a Bayesian Nash equilibrium. Let $(x_i, \epsilon_i)$ denote an individual’s type, where $x_i$ is a publicly observed characteristic and $\epsilon_i$ is a private characteristic only observable by $i$. Individual $i$’s utility depends on her own action and characteristics as well as her neighbors’ actions. Individual $i$ chooses action $d_i$ to maximize the following utility:
$$U_i(d_i, d_{-i}) = (x_i \beta + \epsilon_i) d_i - \frac{1}{2} d_i^2 + \sum_{j \in N_i} d_j d_i \eta_j$$
The first order condition, $x_i \beta + \epsilon_i - d_i + \sum_{j \in N_i} d_j \eta_j = 0$, yields equation (3) and hence its matrix form, equation (4). The micro-foundation derived above is similar to the one for SARs as discussed in Blume et al. (2015).
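As a quick numerical check of the notation above, the following sketch builds $M_n \circ D_n$ for a small network and confirms that the matrix form of the spatial lag in equation (4) agrees entry by entry with the sum over neighbors in equation (3). The four-person network, the outcome vector and the sparse $\eta$ are made-up illustrative values, and the helper names `circ` and `matvec` are hypothetical, not part of the paper.

```python
def circ(M, v):
    """Return M ∘ v = M · diag(v): entry (i, j) equals M[i][j] * v[j]."""
    return [[m_ij * v_j for m_ij, v_j in zip(row, v)] for row in M]


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Illustrative adjacency matrix (symmetric, zero diagonal); values are assumptions.
M = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
D = [1.0, 2.0, -1.0, 0.5]        # outcomes d_1, ..., d_4
eta = [0.3, 0.0, 0.0, -0.2]      # sparse: only individuals 1 and 4 are influential

# Matrix form: (M ∘ D) η, the spatial lag term of equation (4).
matrix_form = matvec(circ(M, D), eta)

# Scalar form: sum over neighbors j of d_j * η_j, as in equation (3).
n = len(D)
scalar_form = [
    sum(D[j] * eta[j] for j in range(n) if M[i][j] == 1) for i in range(n)
]

print(matrix_form)
print(scalar_form)
```

Individuals 2 and 3 are the only neighbors of both influential individuals, so only their entries of the lag are non-zero in this toy example.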
Peer Effects in Labor Productivity:
Understanding the mechanism and magnitude of the dependence of labor productivity on coworkers is an important question for economists and policy makers. As found in Mas and Moretti (2009), workers respond more to the presence of coworkers with whom they frequently interact. Another modern example is code sharing platforms like GitHub. Programmers contribute effort to an open source project depending on how much other programmers are contributing. In such cases, the influence of each individual on her coworkers is not the same. Equation (4) can be used to incorporate such differences:
$$y_i = \sum_{j \in N_i} y_j \eta_j + x_i \beta + \epsilon_i,$$
where $y_i$ is individual $i$’s productivity, and $\eta_j$ represents the size of the influence of coworker $j$: all else being equal, the additional effect on individual $i$’s productivity if individual $j$ becomes her coworker.
Note that if we restrict the parameters $\eta_j$ to be the same across different workers, then we are back to the classical SAR setting as laid out in equation (2). Thus, $\lambda = \frac{1}{n} \sum_{j=1}^{n} \eta_j$ can be interpreted as the averaged spillover effect in the canonical sense. Notice that $\lambda$ converges to 0 when the spillover effects are sparse, which may lead to the conclusion that there is no spillover in the network in such a scenario.
Define
$$\lambda^* = \frac{1}{\sum_{j=1}^{n} 1_{\eta_j \neq 0}} \sum_{j=1}^{n} \eta_j 1_{\eta_j \neq 0}$$
as the averaged endogenous effect for influential workers. The estimand $\lambda^*$ is arguably better than $\lambda$ as a measure of average endogenous effects, as it excludes individuals who are not influential to their neighbors.
Online Opinion Leaders:
A decision can represent whether to “tweet” a news story seen online. When individuals make such decisions, they are often influenced by several online opinion leaders, namely by whether those people also “tweet” the news or not. Political figures may have a stronger influence than celebrities on people’s tweets about political news, and vice versa for entertainment news. Assume a binary decision $(0, 1)$ is made in a Bayesian Nash equilibrium, such that
$$d_i^* = \sum_{j \in N_i} d_j^* \eta_j + x_i \beta + \epsilon_i,$$
where $d_i^*$ is the probability of individual $i$ playing action 1, and $\sum_{j \in N_i} d_j^* \eta_j$ is the expected endogenous effect from $i$’s neighbors $N_i$. Define
$$S = \{ j : \eta_j \neq 0 \}$$
as the set of truly influential opinion leaders. My method provides a way to obtain an estimate $\hat{S}$ that is asymptotically consistent for $S$. This is an important metric for policy makers and the private sector, as targeting or nudging the influential individuals is usually more efficient than targeting the entire population.
Heterogeneous Endogenous Effects Model with Cliques:
There are two important assumptions behind equation (4). The first is the sparsity assumption, which requires non-influential individuals to have exactly zero influence on their neighbors. The second is that equation (4) assumes away exogenous and correlated effects. Consider a network composed of many cliques (small groups of connected individuals). Each clique has its local leaders who only influence individuals within their own clique. Figure 1 provides an example of such a network structure.
Note that in Figure 1, three of the four labeled nodes represent local leaders who only influence individuals within their own cliques. In contrast, the fourth labeled node represents a global leader who can influence individuals across different cliques. For example, one can think of the local leaders as local news channels and of the global leader as the national news channel. Furthermore, within each clique, there might exist exogenous and correlated effects. I propose an extension to my heterogeneous endogenous effects model which could address such concerns. I assume that all local leaders influence their followers at a small but similar rate, while global leaders can have different effects on their audience.
Figure 1: Local Leader
First, I consider the following extension to introduce only homogeneous endogenous effects:
$$d_i = \sum_{j \in N_i} d_j \eta_j + \gamma \sum_{j \in N_i} d_j + x_i \beta + \epsilon_i,$$
which can be represented in matrix form as:
$$D_n = \left( M_n \circ D_n \right) \eta + M_n D_n \gamma + X_n \beta + \epsilon_n, \qquad (5)$$
where $\eta = (\eta_1, \eta_2, \cdots, \eta_n)'$.
The new term $\gamma \sum_{j \in N_i} d_j$ captures influence from the local level. Note that this is the same term as the spatial lag in the benchmark spatial autoregression model. The vector $\eta$ captures the heterogeneous endogenous effects of global leaders.
To further include exogenous and correlated effects, consider the following model:
$$d_i = \sum_{j \in N_i} d_j \eta_j + \gamma_{end} \sum_{j \in N_i} d_j + \gamma_{exo} \sum_{j \in N_i} x_j + x_i \beta + \mu_c + \epsilon_i. \qquad (6)$$
Apart from the first term representing heterogeneous endogenous effects, equation (6) is the same as the model in Manski (1993).
Heterogeneous Endogenous Effects Model with Multiple Networks:
Individuals are often connected with each other through more than one type of network. For example, one’s financial network (for borrowing/lending) may be different from her network of relatives or even her friendship network. A common strategy in empirical applications is to “pool” the networks by mixing all types of connections into one network. This approach may increase the noise in the network measurement when the outcome variables only depend on certain types of networks.
To capture different types of connections among the same set of individuals, we can incorporate multiple networks in the heterogeneous endogenous effects model. More specifically, a separate adjacency matrix can be constructed for each type of network. For instance, the $(i, j)$-th entry of the adjacency matrix representing friendship takes value 1 if individual $i$ and individual $j$ are friends and takes value 0 otherwise; that representing the borrowing/lending network takes value 1 if individual $i$ and individual $j$ lend money to each other and takes value 0 otherwise.
Let $q$ be the total number of different types of networks. Define $M_n^l$ as the adjacency matrix for the $l$-th network, and let $N_i^l$ denote individual $i$’s neighbors in that network. The heterogeneous endogenous effects model with multiple networks is defined as
$$d_i = \sum_{l=1}^{q} \sum_{k \in N_i^l} d_k \eta_k^l + x_i \beta + \epsilon_i. \qquad (7)$$
Note that in this model, different networks could potentially bear different endogenous effects for the same individual. In equation (7), the coefficient $\eta_k^l$ represents the rate of the endogenous effect of individual $k$ through network $l$. As a result, we have $nq + k$ coefficients in total, of which $nq$ are endogenous effects. In addition, I assume endogenous effects from different types of networks are linearly additive. The model can also be rewritten in matrix form as:
$$D_n = \sum_{l=1}^{q} \left( M_n^l \circ D_n \right) \eta^l + X_n \beta + \epsilon_n, \qquad (8)$$
where $M_n^l$ is the adjacency matrix for network $l$ and $\eta^l = (\eta_1^l, \eta_2^l, \cdots, \eta_n^l)'$ is an $n$ by 1 vector for $l = 1, 2, \cdots, q$.
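To make the additive structure of equation (8) concrete, the sketch below computes the combined spatial lag $\sum_{l} (M_n^l \circ D_n)\,\eta^l$ for two small networks. The two adjacency matrices (labeled friendship and lending), the outcomes and the coefficient vectors are invented for illustration, and `circ`, `matvec` and `multi_network_lag` are hypothetical helper names.

```python
def circ(M, v):
    """Return M ∘ v = M · diag(v): entry (i, j) equals M[i][j] * v[j]."""
    return [[m_ij * v_j for m_ij, v_j in zip(row, v)] for row in M]


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def multi_network_lag(networks, D, etas):
    """Sum over networks l of (M^l ∘ D) η^l, the lag term in equation (8)."""
    total = [0.0] * len(D)
    for M_l, eta_l in zip(networks, etas):
        lag_l = matvec(circ(M_l, D), eta_l)
        total = [t + x for t, x in zip(total, lag_l)]
    return total


# Two illustrative networks over the same three individuals (assumed values).
friendship = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
lending = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
networks = [friendship, lending]

D = [1.0, 2.0, 3.0]
etas = [
    [0.0, 0.5, 0.0],  # friendship: only individual 2 is influential
    [0.0, 0.0, 0.0],  # lending: no individual has a non-zero coefficient
]

lag = multi_network_lag(networks, D, etas)
n_endogenous_coeffs = len(D) * len(networks)  # n * q coefficients

print(lag)
print(n_endogenous_coeffs)
```

In this toy setup the lending network contributes nothing to the lag because every coefficient attached to it is zero, which is the pattern the group-level penalty is meant to detect.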
Define a network $l$ as an efficient network if $\eta_i^l \neq 0$ for at least one individual $i = 1, 2, \cdots, n$. In this specification, leaders can only influence their neighbors through efficient networks, and non-efficient networks are completely independent of the outcome variable.
The assumptions discussed in this section combine both standard SAR type assumptions and LASSO type assumptions. SAR type assumptions ensure the existence of valid instruments. LASSO type assumptions provide sufficient and necessary conditions for valid inference. The identification of influential individuals is achieved through its coincidence with the sparsity pattern in the reduced form estimator.
First recall the benchmark SAR model:
$$D_n = \lambda M_n D_n + X_n \beta + \epsilon_n. \qquad (9)$$
By rearranging the above equation, we can express the endogenous variable $M_n D_n$ solely as a function of $X_n$ and $M_n$, since:
$$D_n = J_n^{-1} X_n \beta + J_n^{-1} \epsilon_n,$$
where $I_n$ is the $n$ by $n$ identity matrix and $J_n = I_n - \lambda M_n$. It follows that $J_n^{-1} X_n$ can serve as valid instruments for $M_n D_n$. As a result, the identification and estimation of equation (9) can be achieved through either 2SLS or GMM as proposed in papers such as Kelejian and Prucha (1995), Kelejian and Prucha (1998), Lee (2002), Lee (2003), and Lee (2004). Following the same strategy, I derive a set of instruments by solving for $D_n$ as a function of exogenous variables and the adjacency matrix.
Without additional restrictions, equation (4) cannot be estimated by canonical methods, as the number of parameters $n + k$ is greater than the number of observations $n$.
Assumption 1 (Sparsity). Let $S_n \subset \{1, 2, \cdots, n\}$ denote the set of influential individuals (i.e. those with $\eta_j \neq 0$). Let $s_n = |S_n|$ be the number of elements in $S_n$. Then
$$s_n = o\left( \frac{\sqrt{n}}{\log n} \right), \quad \text{as } n \to \infty.$$
Assumption 1 is usually referred to as the “sparsity” assumption.
The assumption that most individuals in a network are not influential is plausible in many circumstances. For example, opinion leaders on social media constitute only a very small fraction of internet users. Star programmers who can encourage other programmers to work with them on GitHub are also a small portion of all GitHub users. On the other hand, assumption 1 can easily be violated when there exist many local leaders. For example, when studying peer effects in obesity among school children, each class or grade can be treated as a clique, and as the number of cliques increases, the number of leaders may increase at a rate of $O(n)$. In the next subsection, I propose an extension of the current model to partially address this problem.

Assumption 2 (SAR restrictions).
- There exists an $\eta_{\max} < 1$ such that $\|\eta\|_\infty \leq \eta_{\max}$.
- The $\epsilon_j$ are i.i.d. sub-Gaussian random variables with mean 0 and variance $\sigma^2$.
- The regressors $x_i$ in $X_n$ are non-stochastic and uniformly bounded for all $n$. $\lim_{n \to \infty} X_n' X_n / n$ exists and is nonsingular.
- The minimum eigenvalue of $(I_n - M_n \circ \eta)$, $\Lambda_{\min}$, is uniformly bounded away from 0.

The restriction on $\eta$ excludes the unit root process and ensures the uniqueness of the equilibrium. It guarantees the invertibility of $(I_n - M_n \circ \eta)$. This is a fundamental assumption in the SAR literature (see Upton and Fingleton (1985); Anselin (1988); Jin and Lee (2018)).

The assumption on the error term currently excludes exogenous effects and correlated effects from my model. The SAR model with this exclusion restriction is known as the mixed regression model (see Lee, 2002). The main challenge in relaxing this assumption is the "reflection problem" pointed out in Manski (1993). I adopt a strategy similar to Bramoullé et al. (2009) to incorporate both exogenous and correlated effects in the next subsection.
I also require the error term to be sub-Gaussian so that concentration inequalities can be derived to bound the empirical process. A sub-Gaussian variable is known to have "almost" bounded support due to the fast decay of its tails. This assumption is usually required for high-dimensional estimators, as in Belloni et al. (2014) and Zhu (2018).

The assumption on the regressors follows the convention in the SAR literature (see Anselin (1988); Jin and Lee (2018)). This deterministic design assumption can be extended to random designs under my model. In what follows, I focus on the case where $X_n$ is an $n \times 1$ vector and study identification as in Bramoullé et al. (2009).

I also require the minimum eigenvalue of $(I_n - M_n \circ \eta)$ to be uniformly bounded away from 0. This prevents the spatial errors from accumulating too fast. The spatial error term after the matrix inversion is $(I_n - M_n \circ \eta)^{-1} \epsilon_n$. A similar assumption can be found in Kelejian and Prucha (1995) and Kelejian and Prucha (1998), which require the uniform boundedness of $(I_n - M_n \circ \eta)^{-1}$.

To proceed, recall the definition of the operator "$\circ$" as $M_n \circ D_n = M_n \cdot \mathrm{diag}(D_n)$, where $\mathrm{diag}(\cdot)$ is the diagonalization operator. Note the following property of "$\circ$":

$$\left( M_n \circ D_n \right) \eta = \left( M_n \circ \eta \right) D_n$$

If the invertibility of $(I_n - M_n \circ \eta)$ is guaranteed, then

$$D_n = \left( M_n \circ D_n \right) \eta + X_n \beta + \epsilon_n \;\Leftrightarrow\; D_n = \sum_{i=0}^{\infty} \left( M_n \circ \eta \right)^i (X_n \beta + \epsilon_n) \qquad (10)$$

Since $(M_n \circ D_n)\eta$ is correlated with $\epsilon_n$ and $\eta$ is sparse (i.e. having at most $s_n$ non-zero elements), we need at least $s_n$ instruments to deal with the endogeneity in the model.
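The operator property and the series representation in equation (10) can be checked numerically. The following is a minimal sketch, assuming a small hypothetical network, two influential individuals, and illustrative parameter values that do not come from the paper; the helper name `circ` is likewise an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Hypothetical toy adjacency matrix: symmetric, zero diagonal.
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
M = (upper + upper.T).astype(float)

eta = np.zeros(n)
eta[:2] = 0.1                      # two influential individuals (illustrative)
beta = 3.0                         # illustrative coefficient on X
X = rng.standard_normal(n)
eps = rng.standard_normal(n)


def circ(M, v):
    """M ∘ v = M · diag(v): column j of M is scaled by v[j]."""
    return M * v                   # broadcasting scales column j by v[j]


# Reduced form: D = (I - M ∘ eta)^{-1} (X beta + eps).
A = circ(M, eta)
D = np.linalg.solve(np.eye(n) - A, X * beta + eps)

# Property of the operator: (M ∘ D) eta equals (M ∘ eta) D.
lhs = circ(M, D) @ eta
rhs = circ(M, eta) @ D

# Truncated version of the series in equation (10).
series = np.zeros(n)
term = X * beta + eps
for _ in range(200):
    series += term
    term = A @ term

print(np.allclose(lhs, rhs), np.allclose(series, D))
```

With these values the spectral radius of `M ∘ eta` is at most 0.1, so the truncated series agrees with the direct solve up to floating-point error.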
Using equation (10), we can express the expectation of $D_n$ as follows:

$$E(D_n) = X_n \beta + \left( M_n \circ X_n \right) (\beta \eta) + \sum_{i=2}^{\infty} \left( M_n \circ \eta \right)^i X_n \beta, \qquad (11)$$

Let $(\cdot)_S$ denote the operator such that $(M_n)_S$ is the submatrix of $M_n$ with its columns restricted to those corresponding to the elements of $S$. The first and second terms of equation (11) suggest that $X_n$ and $(M_n \circ X_n)_S$ can serve as valid instruments to point identify $\beta$ and $\eta$.

Proposition 1 (First Stage Equivalence). Under assumptions 1 and 2,

$$E(D_n) = X_n \beta + \left( M_n \circ X_n \right) \tilde{\eta}, \qquad (12)$$

where $\tilde{\eta}_j = \eta_j \tilde{g}(\eta, \beta, X_n, M_n)$ for some function $\tilde{g}$ that depends on $\eta$, $\beta$, $X_n$, and $M_n$.

Proposition 1 shows that the sparsity pattern is preserved when solving the simultaneous equations, since $\tilde{\eta}_j = 0$ whenever $\eta_j = 0$. As a result, the sparsity assumption is also satisfied in equation (12), and I can thus estimate equation (12) as the first stage using a LASSO-type estimator.

Define $W_n$ as the projection matrix onto the orthogonal complement of the column space of $X_n$:

$$W_n = I_n - X_n (X_n' X_n)^{-1} X_n'$$

Assumption 3 (Independence). $W_n (M_n \circ X_n)_S$ has full column rank.

The linear independence among the columns of $(M_n \circ X_n)_S$ requires that no two influential individuals be connected to identical sets of neighbors. Moreover, assumption 3 also requires that the neighbors of an influential individual cannot be a linear combination of the neighbors of several other influential individuals, which rules out network structures such as those depicted in Figure 2.

Footnote: de Paula et al. (2015) noticed that the sparsity pattern will generally not be preserved during matrix inversion when there is no further restriction on the adjacency matrix. Proposition 1 shows that with a pre-existing network structure and a homogeneous influence assumption for a given influential individual, the sparsity pattern can still be preserved after matrix inversion.
Figure 2: Examples of networks which violate assumption 3. (1) Two influential individuals $S_1$ and $S_2$ share exactly the same neighbors. (2) The neighbors of an influential individual $S_3$ are a linear combination of the neighbors of $S_1$ and $S_2$.

Figure 3: Examples of networks which satisfy assumption 3. The influence of $S_1$ can be identified by comparing the red (right shaded) and grey (plain) groups, while the influence of $S_2$ can be identified by comparing the blue (left shaded) and grey (plain) groups. Alternatively, the influence of $S_1$ can be identified by comparing the green (dotted) and blue (left shaded) groups, while the influence of $S_2$ can be identified by comparing the red (right shaded) and green (dotted) groups.

In other words, as long as each influential individual has a neighbor that is not connected with any other influential individual, assumption 3 is satisfied. One can think of the identification here as estimating fixed effects from influential individuals, as illustrated in Figure 3. Collinearity arises when the fixed effects of two influential individuals are imposed on exactly the same observations.

Assumption 3 is essentially a restriction on the topology of the network. It rules out cases like the complete network or cases (1) and (2) in Figure 2. To achieve identification with cross-sectional network data, one has to rely on certain network structures. Similar assumptions can be found in Bramoullé et al. (2009) and de Paula et al. (2015).

At this point, if the set of truly influential individuals $S_n$ were available to us, we would be able to estimate the model using 2SLS or GMM. However, in most cases, $S_n$ is not known beforehand. Notice that the identification of the set $S_n$ in the structural model (4) coincides with the sparsity pattern in the reduced form (12). I propose to use a LASSO-type estimator to both recover the set of influential individuals and estimate the model.
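The rank condition in assumption 3 can be checked numerically for a candidate set of influential individuals. The sketch below is illustrative only: the two six-node networks, the covariate values, and the helper names `make_network` and `satisfies_assumption3` are assumptions, chosen to mimic case (1) of Figure 2 and a network in which each leader has a neighbor of its own.

```python
import numpy as np


def make_network(n, edges):
    """Build a symmetric 0/1 adjacency matrix from a list of undirected edges."""
    M = np.zeros((n, n))
    for i, j in edges:
        M[i, j] = M[j, i] = 1.0
    return M


def satisfies_assumption3(M, X, S):
    """Return True if W (M ∘ X)_S has full column rank for the leader set S."""
    n = len(X)
    Xc = X.reshape(n, 1)
    # W projects onto the orthogonal complement of the column space of X.
    W = np.eye(n) - Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T)
    Z = (M * X)[:, S]              # columns of M · diag(X) for the leaders
    return np.linalg.matrix_rank(W @ Z) == len(S)


X = np.array([1.0, 2.0, 0.5, -1.0, 1.5, 0.3])   # illustrative covariate
leaders = [0, 1]

# Leaders 0 and 1 share exactly the same neighbors {2, 3, 4}.
shared = make_network(6, [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)])
# Each leader has one neighbor that the other leader does not have.
distinct = make_network(6, [(0, 2), (0, 3), (1, 3), (1, 4)])

print(satisfies_assumption3(shared, X, leaders))
print(satisfies_assumption3(distinct, X, leaders))
```

In the first network the two leader columns are proportional, so the rank drops to one; in the second the columns differ on nodes 2 and 4 and the condition holds.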
For LASSO to achieve correct recovery, I need the following assumptions:

Assumption 4 (LASSO restrictions).
(Irrepresentable Condition) There is a $\vartheta \in (0, 1)$ such that

$$\max_{\|u\|_\infty \leq 1} \left\| \mathrm{diag}(f_{S^c}) \cdot \Sigma_{M,21} (\Sigma_{M,11})^{-1} \cdot \mathrm{diag}(f_S)^{-1} \cdot u \right\|_\infty < \vartheta$$

where

$$\Sigma := \frac{1}{n} M_n' W_n M_n = \begin{pmatrix} \Sigma_{M,11} & \Sigma_{M,12} \\ \Sigma_{M,21} & \Sigma_{M,22} \end{pmatrix}$$

and $f = (I - M_n \circ \eta)^{-1} X_n \beta$.
(Beta Min Condition) There exist $N \in \mathbb{N}$ and an $m > 0$ such that for all $n \geq N$,

$$\min (|\eta|)_S \geq m / \sqrt{n}.$$

Here the operator $(\cdot)_S$ denotes the sub-matrix or sub-vector restricted to the columns or entries corresponding to influential individuals. Similarly, $(\cdot)_{S^c}$ denotes the sub-matrix or sub-vector restricted to the columns or entries corresponding to non-influential individuals. Also notice that the invertibility of $\Sigma_{M,11}$ is guaranteed by assumption 3.

We prove in theorem 2 that assumption 4 is a sufficient condition for the LASSO estimator to achieve consistent selection of the set $S_n$ in the second stage. While assumption 3 restricts how influential individuals may connect with their neighbors, assumption 4 restricts how non-influential individuals may connect. For example, the Irrepresentable Condition prevents the neighbors of a non-influential individual from being exactly the same as the neighbors of any influential individual. This is because when two individuals connect with exactly the same neighbors, we cannot distinguish which individual is the true source of influence. This is illustrated in Figure 4 (1). On the other hand, assumption 4 does not require full independence between influential individuals and non-influential individuals. This is illustrated in Figure 4 (2).

Figure 4: Examples of networks which violate (1) and satisfy (2) assumption 4. In (1), the influence of the influential individual cannot be separately identified from that of the non-influential individual.
In (2), the influence of the influential individual can be identified by comparing the red (right shaded) and blue (left shaded) groups. We can then find that the other individual is non-influential by comparing the blue (left shaded) and green (dotted) groups.

The Beta Min Condition requires the magnitude of the endogenous effects to be sufficiently strong in order to be detected by LASSO. For example, there cannot exist a sequence of individuals whose influence decays to 0 faster than the rate $1/\sqrt{n}$. The Beta Min Condition is restrictive in that it rules out uniform inference and creates additional problems when constructing confidence intervals.

Equation (4) can still be consistently estimated under conditions weaker than the Irrepresentable Condition and the Beta Min Condition. The stronger version is assumed above to ensure selection consistency. As shown in Zhao and Yu (2006), the Irrepresentable Condition together with the Beta Min Condition are necessary and sufficient conditions for LASSO to achieve consistent model selection. The following compatibility condition guarantees valid inference using the "de-bias" estimator proposed in the next section.

Assumption 4-1 (Compatibility Condition). For some $\phi_0 > 0$ (independent of $n$) and for all $\eta$ satisfying $\|\eta_{S^c}\|_1 \leq 3 \|\eta_S\|_1$, it holds that

$$\|\eta_S\|_1^2 \leq \left( \eta' (M_n)_S' W_n (M_n)_S \eta \right) s_0 / \phi_0^2.$$

Assumption 4-1 is referred to as the compatibility condition, as in van de Geer et al. (2014). It also restricts the network topology (i.e. the complete network is still ruled out) but is weaker than assumptions 3 and 4 combined. Under assumption 4-1, we can still construct uniformly valid inference for the de-bias estimator proposed in the next section even though consistent model selection is not guaranteed.
Heterogeneous Endogenous Effects Model with Cliques:
The heterogeneous endogenous effects model with cliques can be written in matrix form as:

$$D_n = \left( M_n \circ D_n \right) \eta + M_n D_n \gamma + X_n \beta + \epsilon_n$$

Assumption 1' (Sparsity with Cliques). Among $n$ individuals in the network, let $S_n \subset \{1, 2, \cdots, n\}$ be the set of global leaders. Let $s_n = |S_n|$ be the number of elements in $S_n$. Assume:

$$s_n = o\left( \frac{\sqrt{n}}{\log n} \right), \quad \text{as } n \to \infty$$

Assumption 1' relaxes the exact sparsity in assumption 1 by imposing no restriction on the number of local leaders. For example, it does not rule out situations where everyone is (locally) influential. Local leaders' influence is captured by $\gamma$, the coefficient on the classical spatial lag. The number of global leaders is restricted to be sparse, and we can identify these leaders as in the previous case.

To ensure invertibility of the matrix $(I_n - M_n \circ \eta - M_n \gamma)$, I modify assumption 2 as:

Assumption 2' (SAR Restrictions with Cliques). In addition to assumption 2, there exists an $\eta_{\max} < 1$ such that $\|\eta + \gamma\|_\infty \leq \eta_{\max}$.

Similar to assumption 2, this assumption excludes unit root processes. Since there exists a local-level influence $\gamma$ in the network, the global-level influence $\eta$ needs to be further bounded away from 1. As a result, equation (5) can be transformed into the following:

$$E(D_n) = X_n \beta + \left( M_n \circ X_n \right) (\beta \eta) + M_n X_n (\beta \gamma) + \sum_{i=2}^{\infty} \left( M_n \circ \eta + \gamma M_n \right)^i \beta X_n \qquad (13)$$

Proposition 2 (First Stage Equivalence with Cliques). Under assumptions 1' and 2',

$$E(D_n) = X_n \beta + (M_n \circ X_n) \tilde{\eta}^{*,\infty} + \sum_{i=1}^{\infty} \gamma^i M_n^i X_n \beta + \sum_{i=1}^{\infty} M_n^i (M_n \circ X_n) \tilde{\tilde{\eta}}^{*,(i,\infty)},$$

where $\tilde{\eta}^{*,\infty}_j = \eta_j \tilde{g}^{\infty}_{k,j}(\eta, \gamma, \beta, X_n, M_n)$ and $\tilde{\tilde{\eta}}^{*,(i,\infty)}_j = \eta_j \tilde{h}^{(i,\infty)}_{k,j}(\eta, \gamma, \beta, X_n, M_n)$ for some functions $\tilde{g}^{\infty}_{k,j}$ and $\tilde{h}^{(i,\infty)}_{k,j}$ that depend on $\beta$, $\gamma$, $\eta$, $X_n$, and $M_n$.

The coefficient $\gamma$ introduced in equation (5) also needs to be identified, so an extra instrument $M_n X_n$ needs to be introduced.
And thus we need to assume additional independence:

Assumption 3' (Independence with Cliques). $\left[ W_n (M_n \circ X_n)_S, \; W_n M_n X_n \right]$ has full column rank.

To further incorporate exogenous and correlated effects, recall that equation (6) can be written in matrix form as

$$D_n = \left( M^*_n \circ D_n \right) \eta + M^*_n D_n \gamma_{end} + M^*_n X_n \gamma_{exo} + X_n \beta + \mu_c + \epsilon_n$$

Here, I define $M^*_n$ as the row-sum-normalized version of $M_n$. The main challenge for identification is known as the "reflection problem" (Manski (1993)). Bramoullé et al. (2009) proposed the "neighbor's neighbor" instruments as a solution. Taking the local difference, we obtain

$$(I - M^*_n) D_n = (I - M^*_n) \left( M^*_n \circ D_n \right) \eta + (I - M^*_n) M^*_n D_n \gamma_{end} + (I - M^*_n) M^*_n X_n \gamma_{exo} + (I - M^*_n) X_n \beta + (I - M^*_n) \epsilon_n,$$

and inverting the simultaneous equations, we have

$$E\left( (I - M^*_n) D_n \right) = \left( I - M^*_n \circ \eta - M^*_n \gamma_{end} \right)^{-1} \left( \beta \cdot I + \gamma_{exo} M^*_n \right) (I - M^*_n) X_n.$$

Proposition 3 (First Stage Equivalence with Cliques + exogenous and correlated effects). Under assumptions 1' and 2', and assuming $M^*_n$ is row-sum normalized,

$$E\left( (I - M^*_n) D_n \right) = (I - M^*_n) X_n \beta + (\gamma_{exo} + \gamma_{end} \beta) M^*_n (I - M^*_n) X_n + (M^*_n \circ X_n) \tilde{\eta}^{*,\infty} + \sum_{i=1}^{\infty} (\gamma_{end})^i (\gamma_{exo} + \gamma_{end}) M^{*(i+1)}_n (I - M^*_n) X_n \beta + \sum_{i=1}^{\infty} M^{*i}_n (M_n \circ X_n) \tilde{\tilde{\eta}}^{*,(i,\infty)}$$

where $\tilde{\eta}^{*,\infty}_j = \eta_j \tilde{g}^{\infty}_{2,k,j}(\eta, \gamma_{exo}, \gamma_{end}, \beta, X_n, M_n)$ and $\tilde{\tilde{\eta}}^{*,(i,\infty)}_j = \eta_j \tilde{h}^{(i,\infty)}_{2,k,j}(\eta, \gamma_{exo}, \gamma_{end}, \beta, X_n, M_n)$ for some functions $\tilde{g}^{\infty}_{2,k,j}$ and $\tilde{h}^{(i,\infty)}_{2,k,j}$ that depend on $\beta$, $\gamma_{exo}$, $\gamma_{end}$, $\eta$, $X_n$, and $M_n$.

Assumption 3'-1.
Assume $\gamma_{exo} + \beta \gamma_{end} \neq 0$, $\left[ W_n (M^*_n \circ X_n)_S, \; W_n M^*_n X_n, \; W_n M^{*2}_n X_n, \; W_n M^{*3}_n X_n \right]$ has full rank, and $M^*$ is row-sum normalizable.

Similar to proposition 4 in Bramoullé et al. (2009), assumption 3'-1 requires the independence between $M^*_n X_n$ and $M^{*2}_n X_n$ in order to use the "neighbor's neighbors" as instruments to address the "reflection" problem. Furthermore, the independence of $M^{*3}_n X_n$ pins down the identification of correlated effects.

Heterogeneous Endogenous Effects Model with Multiple Networks:

The heterogeneous endogenous effects model with multiple networks can be represented in matrix form as follows:

$$D_n = \sum_{j=1}^{q} \left( M^j_n \circ D_n \right) \eta^j + X_n \beta + \epsilon_n$$

The number of coefficients in this model becomes $nq + k$, and the number of observed networks $q$ may also increase as the number of observations $n$ increases. As a result, the sparsity assumption is imposed on both the influential individuals and the effective networks. I assume that some of the networks are completely irrelevant (i.e. $\eta^j = 0$) and that relevant networks do not necessarily pass influence for everyone (i.e. $\eta^j \neq 0$ but $\eta^j_i = 0$ for some $i$).

Second, to ensure invertibility, note that for any matrix norm $\|\cdot\|$:

$$\left\| \sum_{j=1}^{q} \left( M^j_n \circ \eta^j \right) \right\| \leq \sum_{j=1}^{q} \left\| M^j_n \circ \eta^j \right\| \leq \sum_{j=1}^{q} \|\eta^j\|_\infty \left\| M^j_n \right\|$$

Because $M^j_n$ is an adjacency matrix whose entries are each 0 or 1, the condition $\sum_{j=1}^{q} \|\eta^j\|_\infty < 1$ is imposed for the invertibility of $I - \sum_{j=1}^{q} \left( M^j_n \circ \eta^j \right)$.

Proposition 4 (First Stage Equivalence – Multiple Networks).
Under assumption 1* and assumption 2*,

$$E(D_n) = X_n \beta + \sum_{j=1}^{q} \left( M^j_n \circ X_n \right) \tilde{\eta}^j$$

where $\tilde{\eta}^j_k = \eta^j_k \tilde{g}^j(\eta, \beta, X_n, M_n)$ for some function $\tilde{g}^j$ that depends on $\eta^j$, $\beta$, $X_n$, and $M^j_n$.

Third, I require $\left[ X_n, (M^1_n \circ X_n)_S, (M^2_n \circ X_n)_S, \cdots, (M^q_n \circ X_n)_S \right]$ to have full rank. Compared with the standard model, this assumption requires the independence condition to hold across different networks. The four assumptions required for multiple networks are listed formally in the appendix as assumptions 1*, 2*, 3*, and 4*.

The proposed estimator is similar to the two-stage least squares method but uses LASSO in both stages.

Two-Stage LASSO Estimator:
- First Stage:

$$(\tilde{\beta}, \tilde{\eta}) = \arg\min_{\beta, \eta} \left\| D_n - X_n \beta - \left( M_n \circ X_n \right) \eta \right\|_2^2 + \lambda_1 |\eta|_1 \qquad (14)$$

Obtain a LASSO fit $\hat{D}_n$:

$$\hat{D}_n = X_n \tilde{\beta} + \left( M_n \circ X_n \right) \tilde{\eta}$$

- Second Stage:

$$(\hat{\beta}, \hat{\eta}) = \arg\min_{\beta, \eta} \left\| D_n - \left( M_n \circ \hat{D}_n \right) \eta - X_n \beta \right\|_2^2 + \lambda_2 |\eta|_1 \qquad (15)$$

As shown in section 3, $(M_n \circ D_n)$ is correlated with $\epsilon_n$. Thus equation (4), equation (5) and equation (8) cannot be estimated directly using LASSO or sparse group LASSO. The instruments proposed in section 3 are $[X_n, (M_n \circ X_n)_S]$. We do not observe the set $S$, but $[X_n, (M_n \circ X_n)]$ is a set of regressors that contains the valid instruments.

The estimators $\hat{\beta}$ and $\hat{\eta}$ suffer from LASSO shrinkage bias. Moreover, post-model-selection inference conditioning on the selected model $\hat{S}_n = \{ i \mid \hat{\eta}_i \neq 0 \}$ suffers from omitted variable bias and thus is not uniformly valid (see Leeb and Pötscher, 2005, 2008, 2009). I construct a "de-bias" estimator under my setting and derive its asymptotic distribution.
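The two-stage procedure in equations (14) and (15) can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's implementation: $X_n$ is a single column, the unpenalized $\beta$ is profiled out with the projection $W_n$ (so the LASSO runs on $W_n D_n$ and $W_n Z$), the LASSO is solved by a plain coordinate descent, and the function names `soft`, `lasso_cd` and `two_stage_lasso` are hypothetical.

```python
import numpy as np


def soft(x, t):
    """Soft-thresholding operator."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def lasso_cd(y, Z, lam, n_iter=200):
    """Coordinate descent for min_eta ||y - Z eta||_2^2 + lam * |eta|_1."""
    p = Z.shape[1]
    eta = np.zeros(p)
    col_ss = (Z ** 2).sum(axis=0)
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] < 1e-12:              # empty column: nothing to fit
                continue
            resid += Z[:, j] * eta[j]          # remove coordinate j from the fit
            rho = Z[:, j] @ resid
            eta[j] = soft(rho, lam / 2.0) / col_ss[j]
            resid -= Z[:, j] * eta[j]          # add the updated coordinate back
    return eta


def two_stage_lasso(D, X, M, lam1, lam2):
    """Two-stage LASSO with an unpenalized scalar beta (X is one column)."""
    n = len(D)
    Xc = X.reshape(n, 1)
    W = np.eye(n) - Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T)

    def ols_on_x(y):
        return np.linalg.solve(Xc.T @ Xc, Xc.T @ y)

    # First stage: regress D on X and the instruments M ∘ X.
    Z1 = M * X
    eta_tilde = lasso_cd(W @ D, W @ Z1, lam1)
    beta_tilde = ols_on_x(D - Z1 @ eta_tilde)
    D_hat = X * beta_tilde + Z1 @ eta_tilde

    # Second stage: replace M ∘ D by M ∘ D_hat.
    Z2 = M * D_hat
    eta_hat = lasso_cd(W @ D, W @ Z2, lam2)
    beta_hat = ols_on_x(D - Z2 @ eta_hat)
    return beta_hat[0], eta_hat
```

Profiling out $\beta$ is exact here because $\beta$ is not penalized: for any fixed $\eta$, the minimizing $\beta$ is the least squares coefficient of $D_n - Z\eta$ on $X_n$. The tuning parameters `lam1` and `lam2` are left to the caller.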
I propose the following de-bias LASSO estimator:

de-bias 2SLSS Estimator:

$$\hat{e} = \hat{\eta} + \hat{\Theta} \tilde{X}_n' (M_n \circ \hat{D}_n)' W_n \left( D_n - (M_n \circ D_n) \hat{\eta} \right) / n$$

$$\hat{b} = \hat{\beta} - \frac{\left( X_n - (M_n \circ D_n) \hat{\gamma}_\beta \right)' \left( (M_n \circ (\hat{D}_n - D_n)) \hat{\eta} \right)}{\left( X_n - (M_n \circ D_n) \hat{\gamma}_\beta \right)' X_n}$$

$\hat{\beta}$ and $\hat{\eta}$ are the estimators from the 2SLSS. Define $W_n = I - X_n (X_n' X_n)^{-1} X_n'$ and $\tilde{X}_n = \frac{1}{n} (M_n \circ D_n)' W_n (M_n \circ \hat{D}_n)$. $\hat{\Theta}$ is defined by the nodewise regression of Meinshausen and Bühlmann (2006) on $\tilde{X}_n$, and $\hat{\gamma}_\beta$ is again defined by the nodewise regression between $X_n$ and $(M_n \circ D_n)$. Nodewise regression explores the correlation between the columns of the design matrix $\tilde{X}_n$ by regressing each column on all the remaining columns while penalizing the coefficients. An approximation of the inverse of the matrix $\frac{1}{n} \tilde{X}_n' \tilde{X}_n$ can be constructed from the nodewise regression. Further, define $\hat{S}_n = \{ i \mid \hat{\eta}_i \neq 0 \}$, which represents the LASSO-selected active set. The estimators $(\hat{e}, \hat{b})$ are adjusted for the LASSO shrinkage bias and are consistent estimators of $\eta$ and $\beta$. They are similar to the estimators proposed in van de Geer et al. (2014), but are constructed through a two-stage process. The de-bias estimator also differs from the two-stage estimators proposed in Zhu (2018) due to spatial correlation.

Theorem 1.
Under assumption 1, assumption 2 and assumption 4-1. There exist constant c , c , c and ι : (cid:107) ι (cid:107) < ∞ such that for the first stage tuning parameter λ ≥ (cid:113) σ c Λ − min log nn and secondstage tuning parameter λ ≥ (cid:113) σ c Λ − min log nn + 2 c λ + 2 c λ , the de-bias estimator √ n ( ι (cid:48) ˆ e − ι (cid:48) η ) = 1 √ n ι (cid:48) ˆΘ ˜ X (cid:48) n (cid:15) − ι (cid:48) ∆ ∼ N (0 , σ ι (cid:48) ˆΘ ˜ X (cid:48) n Ω n ˜ X n ˆΘ (cid:48) ι ) √ n (ˆ b − β ) = ( X n − ( M n ◦ D n )ˆ γ β ) (cid:48) (cid:15) ( X n − ( M n ◦ D n )ˆ γ β ) (cid:48) X n + ∆ β ∼ N , σ ( X n − ( M n ◦ D n )ˆ γ β ) (cid:48) ( X n − ( M n ◦ D n )ˆ γ β ) (cid:16) ( X n − ( M n ◦ D n )ˆ γ β ) (cid:48) X n (cid:17) where (cid:107) ∆ (cid:107) ∞ = o p (1) , (cid:107) ∆ β (cid:107) ∞ = o p (1) , ˜ X n = n ( M n ◦ D n ) (cid:48) W n ( M n ◦ ˆ D n ) , Ω n = n ( M n ◦ ˆ D n ) (cid:48) W n ( M n ◦ ˆ D n ) , ˆΘ are defined by the nodewise regression on ˜ X n and ˆ γ β are defined by the nodewise regressionon between X n and ( M n ◦ D n ) . Theorem 1 shows that the 2SLSS estimator achieves normality at the standard rate √ n . The shifts∆ and ∆ β represent the bias from using nodewise regression and they are shown to be o p (1) withthe proper choice of tuning parameters. Theorem 2.
Under assumption 1-4, there exist ˜ γ > and ˜ ϑ = ϑ − ϑ . For λ ≥ γ ∨ ˜ ϑ ) (cid:113) σ c Λ − min log nn +2 c λ + 2 c λ , lim n →∞ P ( ˆ S n = S ) = 1 ; .2 2SLSS for Two Extensions Two-Stage LASSO Estimator with Homogenous Effects: - First Stage: for a pre-specified constant k ,( ˜ β, ˜ γ, ˜ η ) = arg min β,γ,η (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) D n − X n β − k (cid:88) i =1 M in X n γ ,i − k (cid:88) i =0 M in (cid:16) M n ◦ X n (cid:17) η ,i (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + λ (cid:32) k (cid:88) i =1 | η ,i | + | γ ,i | (cid:33) Obtain a LASSO fitting ˆ D n ˆ D n = X n ˜ β + M n X n ˜ γ + (cid:16) M n ◦ X n (cid:17) ˜ η - Second Stage:( ˆ β, ˆ γ, ˆ η ) = arg min β,γ,η (cid:107) D n − M n ˆ D n γ − (cid:16) M n ◦ ˆ D n (cid:17) η − X n β (cid:107) + λ ( | η | + | γ | )From proposition 2, the sparsity pattern can not be fully preserved by ( M n ◦ X n ) so additionalstructures like M in ( M n ◦ X n ) need to be included in the first stage. Those terms represent theinfluence from global leaders but passing i th times through local leaders. By assumption 2’, theinfluence represented by M in ( M n ◦ X n ) decreases as i increases.The second stage of 2SLSS with Cliques case can be viewed as a special case of the standard 2SLSSestimator. 
For example, one can rewrite the second stage as

$$(\tilde{\beta}, \tilde{\gamma}, \tilde{\eta}) = \arg\min_{\beta, \gamma, \eta} \left\| D_n - X_n \beta - \left( M_n \hat{D}_n \;\; \left( M_n \circ \hat{D}_n \right) \right) \cdot \begin{pmatrix} \gamma \\ \eta \end{pmatrix} \right\|_2^2 + \lambda_2 |\eta|_1 + \lambda_2 |\gamma|_1$$

As a result, the asymptotics in theorem 1 and theorem 2 follow.

To incorporate the structure of multiple networks, I propose the use of the sparse group LASSO.

Two-Stage LASSO Estimator with Multiple Networks:
- First Stage:

$$(\tilde{\beta}, \tilde{\eta}) = \arg\min_{\beta, \eta} \left\| D_n - X_n \beta - \sum_{j=1}^{q} (M^j_n \circ X_n) \eta^j \right\|_2^2 + \sum_{j=1}^{q} \left( \lambda_1 \|\eta^j\|_1 + \lambda_2 \|\eta^j\|_2 \right)$$

Obtain a LASSO fit $\hat{D}_n$:

$$\hat{D}_n = X_n \tilde{\beta} + \sum_{j=1}^{q} (M^j_n \circ X_n) \tilde{\eta}^j$$

- Second Stage:

$$(\hat{\beta}, \hat{\eta}) = \arg\min_{\beta, \eta} \left\| D_n - X_n \beta - \sum_{j=1}^{q} (M^j_n \circ \hat{D}_n) \eta^j \right\|_2^2 + \sum_{j=1}^{q} \left( \lambda_1 \|\eta^j\|_1 + \lambda_2 \|\eta^j\|_2 \right)$$

The sparse group LASSO introduces two tuning parameters, $\lambda_1$ and $\lambda_2$, to penalize both the $l_1$ and the $l_2$ norm in each network. Similar to the LASSO estimator, the geometric shape of the penalties allows the sparse group LASSO to identify sparsity not only within each network (group) but also among networks (groups). In other words, some networks could be completely irrelevant (i.e. $\eta^j = 0$), and within relevant networks, some individuals can have no influence on their neighbors (i.e. $\eta^j \neq 0$ but $\eta^j_i = 0$ for some $i$).
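The way the two penalties produce sparsity both within and across networks can be seen in the proximal operator of the per-group penalty $\lambda_1 \|\eta^j\|_1 + \lambda_2 \|\eta^j\|_2$. The sketch below shows that operator for one group; it is a standard building block of proximal-gradient solvers and is offered as an illustration, not as the solver used in the paper. The function names are hypothetical.

```python
import numpy as np


def sgl_penalty(eta_groups, lam1, lam2):
    """Sparse group LASSO penalty summed over groups (one group per network)."""
    return sum(lam1 * np.abs(g).sum() + lam2 * np.linalg.norm(g)
               for g in eta_groups)


def sgl_prox(v, lam1, lam2):
    """Minimizer of 0.5*||x - v||_2^2 + lam1*||x||_1 + lam2*||x||_2 for one group.

    Step 1 soft-thresholds each entry (within-group sparsity).
    Step 2 shrinks the whole group and sets it to zero when its norm
    is at most lam2 (sparsity across groups).
    """
    u = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    norm = np.linalg.norm(u)
    if norm <= lam2:
        return np.zeros_like(u)
    return (1.0 - lam2 / norm) * u


# A group with one strong entry survives, with the weak entries set to zero.
relevant = sgl_prox(np.array([3.0, -0.5, 0.2]), lam1=1.0, lam2=1.0)
# A group with only weak entries is removed entirely.
irrelevant = sgl_prox(np.array([1.2, -1.1]), lam1=1.0, lam2=1.0)
print(relevant, irrelevant)
```

The first call corresponds to a relevant network in which only some individuals pass influence; the second corresponds to a network that is dropped as a whole.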
de-bias 2SLSS Estimator under Multiple Networks:

$$\hat{e}_M = \hat{\eta}_M + \hat{\Theta}_Z \tilde{Z}_n' \hat{Z}_n' W_n \left( D_n - Z_n \hat{\eta}_M \right) / n$$

$$\hat{b}_M = \hat{\beta} - \frac{\left( X_n - Z_n \hat{\gamma}_\beta \right)' (\hat{Z}_n - Z_n) \hat{\eta}_M}{\left( X_n - Z_n \hat{\gamma}_\beta \right)' X_n}$$

$\hat{\beta}$ and $\hat{\eta}_M = (\hat{\eta}^{1\prime}, \hat{\eta}^{2\prime}, \cdots, \hat{\eta}^{q\prime})'$ are the estimators from the 2SLSS estimator with multiple networks. Let $\hat{Z}_n = \left[ (M^1_n \circ \hat{D}_n), (M^2_n \circ \hat{D}_n), \cdots, (M^q_n \circ \hat{D}_n) \right]$ and $Z_n = \left[ (M^1_n \circ D_n), (M^2_n \circ D_n), \cdots, (M^q_n \circ D_n) \right]$. Define $\tilde{Z}_n = \frac{1}{n} Z_n' W_n \hat{Z}_n$. $\hat{\Theta}_Z$ is defined by the nodewise regression of Meinshausen and Bühlmann (2006) on $\tilde{Z}_n$, and $\hat{\gamma}_\beta$ is again defined by the nodewise regression between $X_n$ and $Z_n$.

Theorem 3.
Under assumption 1*, assumption 2* and assumption 4-1*. There exists constant , d , d and ι : (cid:107) ι (cid:107) < ∞ such that for the first stage tuning parameter λ , ≥ (cid:113) σ d Λ − min (log n +log q ) n and second stage tuning parameter λ , ≥ (cid:113) σ d Λ − min (log n +log q ) n + 2 d λ , + 2 d λ , , the de-biasestimators √ n ( ι (cid:48) ˆ e M − ι (cid:48) η M ) = 1 √ n ι (cid:48) ˆΘ Z ˜ Z (cid:48) n (cid:15) − ι (cid:48) ∆ M ∼ N (0 , σ ι (cid:48) ˆΘ Z ˜ Z (cid:48) n Ω mn ˜ Z n ˆΘ (cid:48) Z ι ) √ n (ˆ b M − β ) = ( X n − Z n ˆ γ β ) (cid:48) (cid:15) ( X n − Z n ˆ γ β ) (cid:48) X n + ∆ M,β ∼ N , σ ( X n − Z n ˆ γ β ) (cid:48) ( X n − Z n ˆ γ β ) (cid:16) ( X n − Z n ˆ γ β ) (cid:48) X n (cid:17) where (cid:107) ∆ M (cid:107) ∞ = o p (1) , (cid:107) ∆ M,β (cid:107) ∞ = o p (1) and ˜ Z n = n Z (cid:48) n W n ˆ Z n and Ω mn = n ˆ Z (cid:48) n W n ˆ Z n . ˆΘ Z aredefined by the nodewise regression on ˜ Z n and ˆ γ β are defined by the nodewise regression on between X n and Z n . Theorem 3 requires the rate of convergence for sparse group LASSO at the first stage. This isproved in Lemma ?? in the appendix. In group LASSO, λ , controls the convergence of the l λ , to achieve l convergence. However,group LASSO requires the number of regressors in each group to be finite. In my estimator, thenumber of regressors in each group equals to n . In the next theorem, I show that λ , will need tobe chosen in the same order as λ , in order to achieve consistent selection. Theorem 4.
Let ˜ c = λ , λ , and ˜ ϑ mul = ϑ mul − ϑ mul . Choose λ , and λ , such that ˜ c > ϑ mul − ϑ mul . Underassumption 1*-4* and for λ , ≥ γ ∨ ˜ ϑ mul ) (cid:113) σ d Λ − min (log n +log q ) n + 2 d λ , + 2 d λ , , there exists ˜ γ > , lim n →∞ P ( ˆ S n = S ) = 1It is worth pointing out that assumption ?? is a weaker assumption than assume the full irrepre-sentable condition on the design matrixΣ mul := 1 n [ M n , M n , · · · , M qn ] (cid:48) W n [ M n , M n , · · · , M qn ]As the number of regressors in each group also goes to infinity as the number of groups goesto infinity, the multicollinearity in Σ mul can be severe. Instead, assumption ?? decomposes themulticollinearity into between-group and within-group and thus allows a more flexible dependencestructure. As a result, sparse group LASSO can be used to recover the influential individuals aswell as the effective networks that generate spillovers.26 Simulations
In this section, I report Monte Carlo simulation results for the models proposed above. My results are robust when applied to networks generated by different algorithms and to networks of different sizes.
First, I use the Erdős-Rényi algorithm to simulate a network of size $n$. Individuals are added into the graph one at a time. When an individual is added to the network, she forms a link with each existing individual independently with probability $p$. I consider two values of $p$ and keep $p$ small because collinearity among regressors may arise when links become very dense, violating assumption 3 or 4.

I set the first 5 individuals to be influential by letting their coefficients $\eta_j$ be non-zero. To guarantee the existence of endogenous effects, I arbitrarily specify the connections among these five individuals. The adjacency matrix $M_n$ for the five influential individuals is given in the appendix. If the connections among these five individuals were not fixed, there would be a possibility that no connections are formed among them, and thus that there is no endogeneity in the network. The results would be overly favorable in such a case. The true parameters are fixed as $\beta_0 = 3$, a common non-zero value for $\eta_{0,1} = \eta_{0,2} = \eta_{0,3} = \eta_{0,4} = \eta_{0,5}$, and $\eta_{0,j} = 0$ for $j >$
5. Individual characteristics $X_n$ are generated from a standard normal distribution. Individual outcomes $Y_n$ are then generated as $Y_n = (I - M_n \circ \eta_0)^{-1} (X_n \beta_0 + \epsilon_n)$, where $\epsilon_n$ is drawn independently from a standard normal distribution.

I use $(M_n, X_n, Y_n)$ as observations and apply my two-stage LASSO estimator. I construct the de-bias 2SLSS estimator and repeat the above process 200 times in a manner similar to van de Geer et al. (2014). I report the average coverage probability (Avgcov) and average length (Avglength) of confidence intervals for the coefficients of influential individuals, $\{\eta_1, \cdots, \eta_5\}$, the coefficient of individual characteristics, $\beta$, and the coefficients of non-influential individuals, the $\eta_j$'s ($j > 5$):

$$\text{Avgcov}_S = s_0^{-1} \sum_{j \in S} P[\eta_{0,j} \in CI_j] \qquad (16)$$

$$\text{Avglength}_S = s_0^{-1} \sum_{j \in S} \text{length}(CI_j) \qquad (17)$$

I separately report the average coverage and average length for each of the five influential individuals. As shown in appendix table A1, the coverage is around the nominal 95% level and the length of the confidence intervals decreases as the sample size grows.

Since we can construct confidence intervals for all $n$ coefficients, joint inference can be performed under control of the False Discovery Rate (FDR). As shown in equation (18), the power reported in appendix table A1 represents the average percentage of the active set (i.e. $\{1, 2, 3, 4, 5\}$) that is significant after controlling the FDR at 5% using the Benjamini-Hochberg method. The FDR reported in appendix table A1 represents the average percentage of the non-active set (i.e. $\{6, 7, \cdots, n\}$) that is significant after controlling the FDR at 5% using the Benjamini-Hochberg method. The exact definition is given in equation (19).

$$\text{Power} = s_0^{-1} \sum_{j \in S} P[H_{0,j} \text{ is rejected}] \qquad (18)$$

$$\text{FDR} = \sum_{j \in S^c} P[H_{0,j} \text{ is rejected}] \Big/ \sum_{j=1}^{n} P[H_{0,j} \text{ is rejected}] \qquad (19)$$

The power varies because the networks change when the sample size increases.
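The Benjamini-Hochberg step and the per-replication ingredients of equations (18) and (19) can be sketched as follows. This is a generic textbook implementation, assuming one p-value per coefficient (for instance from the de-biased estimates); the function names and the ten illustrative p-values are not from the paper.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean rejection mask controlling the FDR at level q."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    thresholds = q * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()      # largest rank passing its threshold
        reject[order[: k + 1]] = True        # step-up: reject all smaller p-values
    return reject


def power_and_fdp(reject, active):
    """Share of the active set rejected, and share of rejections outside it."""
    active = np.asarray(active, dtype=bool)
    power = reject[active].mean()
    n_rej = reject.sum()
    fdp = reject[~active].sum() / n_rej if n_rej > 0 else 0.0
    return power, fdp


# Illustrative p-values for 10 coefficients; the first 5 form the active set.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
active = np.arange(10) < 5
reject = benjamini_hochberg(pvals, q=0.05)
power, fdp = power_and_fdp(reject, active)
print(reject.sum(), power, fdp)
```

Averaging `power` over Monte Carlo replications gives an estimate of the quantity in equation (18). Equation (19) is a ratio of summed rejection probabilities, so the per-replication false discovery proportion computed here is only its single-draw analogue.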
It is strictly increasing when the network is sparse (i.e. for small $p$).

The estimator requires tuning parameters (the $\lambda$s from both stages, as in section 4.1). Moreover, when calculating $\hat{\Theta}$ in the de-bias 2SLSS estimator (section 4.2) using the nodewise regression, one also needs to choose a tuning parameter. I use a benchmark choice of $\lambda_{nodewise} \gtrsim \sqrt{\log(n)/n}$ and $\lambda_1 \gtrsim \sqrt{\log(n)/n}$ in the nodewise regression and the first stage. Then I use cross-validation to pick $\lambda_2$ in the second stage.

I further increase the number of influential individuals to 10 and report the results in appendix table A2. Again, to guarantee the existence of endogeneity, the adjacency matrix for these ten individuals is set as shown in the appendix. All average coverages and average confidence interval lengths are separately reported for these ten individuals. The choice of the tuning parameters is similar to that used to generate appendix table A1 for networks with 50 and 200 individuals. For networks with 500 individuals, I use the benchmark $\lambda_2$ in place of cross-validation in the second stage.

As shown in appendix table A2, all coverages are very close to the nominal levels. The average lengths of the confidence intervals are slightly larger than in appendix table A1. This is due to the increase in the number of influential individuals; it is more difficult to differentiate them from the irrelevant individuals.

Appendix table A3 presents the results when the network is generated using the Watts-Strogatz mechanism, or "small world" network. Define $pN$ (an even number) as the mean degree of each node and a special parameter ω = 0.
4. The WattsStrogatz mechanism works as follows:- construct a graph with N nodes each connected to pN neighbors, which pN on each side.- For each node n i , take every edge ( n i , n j ) with i < j and rewrite it with probability ω .Rewrite means replace ( n i , n j ) with ( n i , n k ) where k is choosing uniformly among all nodesthat are not currently connected with n i The influential individuals are chosen as the 1st, 5th, 15th, 40th and 50th individuals in the network.As shown in appendix table A3, my estimator is robust under a “small world” algorithm. Nominallevel is reached as the size of the network grows and the length of confidence intervals is slightlysmaller than in the standard case.
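The two steps of the Watts-Strogatz mechanism described above can be sketched as follows. This is only an illustrative sketch, not the code used for appendix table A3: the function name `watts_strogatz`, its arguments and the `seed` parameter are hypothetical, `mean_degree` plays the role of $pN$ and `omega` the role of $\omega$.

```python
import random


def watts_strogatz(n_nodes, mean_degree, omega, seed=0):
    """Return an undirected small-world graph as {node: set of neighbors}.

    Hypothetical helper: mean_degree corresponds to pN (must be even),
    omega is the rewiring probability.
    """
    if mean_degree % 2 != 0 or mean_degree >= n_nodes:
        raise ValueError("mean_degree must be even and smaller than n_nodes")
    rng = random.Random(seed)
    half = mean_degree // 2
    adj = {i: set() for i in range(n_nodes)}

    # Step 1: ring lattice, each node linked to `half` neighbors on each side.
    for i in range(n_nodes):
        for step in range(1, half + 1):
            j = (i + step) % n_nodes
            adj[i].add(j)
            adj[j].add(i)

    # Step 2: for each node i, rewire every edge (i, j) with i < j
    # with probability omega.
    for i in range(n_nodes):
        for j in sorted(adj[i]):  # snapshot, so new edges are not revisited
            if i < j and rng.random() < omega:
                # Candidates: nodes other than i not currently connected to i.
                candidates = [k for k in range(n_nodes)
                              if k != i and k not in adj[i]]
                if not candidates:
                    continue
                k = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(k)
                adj[k].add(i)
    return adj
```

Each rewiring removes one edge and adds one new edge, so the mean degree stays at `mean_degree`; with `omega = 0` the function returns the initial ring lattice unchanged.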
Appendix table A5 presents results for the heterogeneous endogenous effects model with cliques. The outcome variable $Y_n$ is now generated as $Y_n = (I - M_n \circ \eta - M_n \gamma)^{-1}(X_n \beta + \epsilon_n)$. The coefficient of the homogeneous effects $\gamma$ is set at 0.05.

The choice of the tuning parameters is similar to that used to generate appendix table A1 for networks with 50 and 200 individuals. For networks with 500 individuals, I use the benchmark $\lambda$ (i.e. $\lambda \gtrsim \sqrt{\log(n)/n}$) to replace cross-validation in the second stage.

The coverage is above the 95% nominal level in all cases. I also report the mean coverage and average length of the confidence interval for the coefficient of the homogeneous effects. My model gives above 95% coverage in all cases. I also report the empirical probability of rejecting a null hypothesis of zero effects at the 95% nominal level. The probability of rejecting the null converges to 1 when the sample size grows to 500.

Heterogeneous Endogenous Effects Model with Multiple Networks

In this Monte Carlo exercise, I include two different networks generated by the Erdős-Rényi algorithm, where one is influential and the other is not. I use the two-stage LASSO estimator with multiple networks to estimate the parameters. The sparse group LASSO requires two tuning parameters, one for the $\ell_1$ norm and the other for the $\ell_2$ norm. I set the two parameters to be equal to each other as the correlations among the columns of the adjacency matrices are very small. The choice of tuning parameters is similar to that used to generate appendix table A1 for networks with 50 and 200 individuals. For networks with 500 individuals, I use the benchmark $\lambda$ instead of cross-validation in the second stage. Appendix table A4 summarizes the results. As in previous results, all coverages exceed the nominal 95% level.

I report the empirical probabilities that at least one individual is detected in a given network, controlling the FDR at 5% using the Benjamini-Hochberg method. 
I also report the average number of detections conditional on at least one individual being detected in a given network. Appendix table A4 shows that network 1, which is the relevant network, is more likely to be detected in all cases than network 2, the irrelevant network. The average number of identified individuals for network 1 is also larger than that for network 2.

I use the proposed estimator to study the importance of different networks in spreading participation in a micro finance program within rural Indian villages. I show that different kinds of networks have different effects on individuals’ decisions. I identify the influential individuals in each village. My analysis shows that leaders among agricultural laborers, Anganavadi teachers, construction workers, small business owners and mechanics are very likely to be influential in the villages.
A non-profit organization named Bharatha Swamukti Samsthe (BSS) has been running micro finance programs in rural southern Karnataka, India since 2007. It provides small loan products to poor women and, through them, to their families. The villages covered by the program are geographically isolated and heterogeneous in terms of caste.

When BSS initially introduced a micro finance program to a village, the credit officers of BSS first approached a number of “predefined leaders”, such as teachers, shopkeepers and village elders. BSS held a private meeting with these leaders and explained the program. Then these predefined leaders passed the information on to other villagers. Those who were interested in the program and contacted BSS were trained and assigned to groups to receive credit. Each group consisted of 5 borrowers and group members were jointly liable for loans. Loans were around 10,000 rupees (approximately $200) at an annualized rate of approximately 28%. Note that 74.5 percent of the households in rural areas said the monthly income of their highest-earning member is less than 5,000 rupees (source: Socio-Economic Caste Census-2011). This loan had to be repaid within 50 weeks.

In 2006, 75 villages in Karnataka were surveyed 6 months before the initiation of the BSS micro finance program. This survey consisted of a village questionnaire and a detailed follow-up survey conducted among a subsample of villagers. The village questionnaire gathered demographic information on all households in a village including GPS coordinates, age, gender, number of rooms, whether the house had electricity, and whether the house had a latrine. The data set also contains information on the set of “pre-defined leaders” who helped spread the information to the entire village. The follow-up survey collected data from a villager sample stratified according to age, education level, caste, occupancy, etc. 
It also asked questions about social network structures along 12 dimensions, including:
- Friends: Name the 4 non-relatives whom you speak to the most.
- Visit-go: In your free time, whose house do you visit?
- Visit-come: Who visits your house in his or her free time?
- Borrow-kerorice: If you needed to borrow kerosene or rice, to whom would you go?
- Lend-kerorice: Who would come to you if he/she needed to borrow kerosene or rice?
- Borrow-money: If you suddenly needed to borrow Rs. 50 for a day, whom would you ask?
- Lend-money: Who do you trust enough that if he/she needed to borrow Rs. 50 for a day you would lend it to him/her?
- Advice-come: Who comes to you for advice?
- Advice-go: If you had to make a difficult personal decision, whom would you ask for advice?
- Medical-help: If you had a medical emergency and were alone at home, whom would you ask for help in getting to a hospital?
- Relatives: Name any close relatives, aside from those in this household, who also live in this village.
- Temple-company: Do you visit a temple/mosque/church? Do you go with anyone else? What are the names of these people?

For the 43 villages where micro finance was introduced by 2011, BSS also collected information on which villagers have joined the program. These survey questions reveal the underlying structures of connections between any two individuals in the network. Figure 5 presents all those connections at the household level in a graph. Each node in the graph represents a household. A black node indicates that the household joined the micro finance program, while a white node indicates that it did not. Bigger nodes represent those households in which at least one family member has been chosen as being among the “pre-defined leaders”. An edge between two nodes signifies that the two nodes are connected in at least one of the 12 networks. 
The darker the color of the edge, the more connections it represents.

This dataset provides an ideal framework for application of the heterogeneous endogenous effects model. First, it allows me to model endogenous effects. An individual may decide to join the micro finance program if her neighbors or friends plan to join. Second, the endogenous effects are individual specific. Given the diversity of the villagers, it is possible that some villagers are more influential than others. Third, it allows me to implement the heterogeneous endogenous effects model with multiple networks. The questions asked regarding multiple dimensions of the network structure allow me to explore which network is most influential.
In this empirical study, I focus on the 38 villages that have been introduced to the micro finance programs by BSS. For each village, I can observe both its social network structure and the villagers’ decisions about joining the program. I drop the data for one village (Village 46) that contains incorrect entries on the index of households. Appendix table ?? summarizes the descriptive statistics for each village.

Among the 12 questions about the social network structure, 4 pairs essentially capture the same connections among the villagers. (Assuming every villager truthfully answers a pair of questions, the adjacency matrices associated with each question are the same. It is also plausible to treat villagers’ answers to each question as a separate directed graph.) Therefore, I consolidate each pair of questions into one dimension:
- Visit go-come: In your free time, whose house do you visit? Who visits your house in his or her free time?
- Borrow-Lend-kerorice: If you needed to borrow kerosene or rice, to whom would you go? Who would come to you if he/she needed to borrow kerosene or rice?
- Borrow-Lend-money: If you needed to borrow Rs. 50 for a day, whom would you ask? Who do you trust enough that if he/she needed to borrow Rs. 50 for a day you would lend it to him/her?
- Help decision: Who comes to you for advice? If you had to make a difficult personal decision, whom would you ask for advice?

I restructure all the data at the household level, as only women are allowed to apply for the micro finance program because the goal of BSS is to support families through the women in them. As a result, a woman’s decision to join or not join the micro finance program becomes her family’s decision. A connection between two villagers becomes a connection between two families. A “predefined leader” is a villager selected by BSS to help spread information about the micro finance program to the other villagers. At the household level, I use the term “predefined leader” for a household that contains at least one such villager.
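The household-level restructuring described above can be illustrated with a short sketch. It is a hedged example rather than the procedure actually applied to the data: the helper names `household_edges` and `predefined_leader_households`, the `household_of` mapping and the toy identifiers are hypothetical, and dropping links between members of the same household is an assumption made here for simplicity.

```python
def household_edges(villager_links, household_of):
    """Map villager-level links to undirected household-level links.

    villager_links: iterable of (villager_a, villager_b) pairs.
    household_of:   dict mapping each villager to a household id.
    """
    edges = set()
    for a, b in villager_links:
        ha, hb = household_of[a], household_of[b]
        if ha != hb:  # assumption: within-household links are dropped
            edges.add((min(ha, hb), max(ha, hb)))
    return edges


def predefined_leader_households(leader_villagers, household_of):
    """A household is a 'predefined leader' if it contains at least one
    villager selected by BSS."""
    return {household_of[v] for v in leader_villagers}


# Toy example: two villagers in household 1, one each in households 2 and 3.
household_of = {"a1": 1, "a2": 1, "b1": 2, "c1": 3}
links = [("a1", "b1"), ("b1", "a2"), ("a1", "a2"), ("c1", "b1")]
print(household_edges(links, household_of))
print(predefined_leader_households(["a2", "c1"], household_of))
```

Two villager-level links between the same pair of households collapse into a single household-level connection, which matches the statement that a connection between two villagers becomes a connection between two families.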
To demonstrate how my method identifies influential households, I model families’ decisions regarding joining the micro finance program as a network game with Bayesian Nash Equilibrium. For household $i$, let $d^*_i$ be the expected probability that $i$ chooses to join the micro finance program. The decision of household $i$ depends on its neighbors’ decisions as well as the types of connections between them. The decision also depends on its characteristics $x_i$ and on unobserved information $\epsilon_i$. Formally, it can be written as:

$$d^*_i = \sum_{l \in N_i} d^*_l \Big( \sum_{j=1}^{q} \eta^j_l \Big) + x_i \beta + \epsilon_i$$

or, in matrix form,

$$D^*_n = \sum_{j=1}^{q} \left( M^j_n \circ D^*_n \right) \eta^j + X_n \beta + \epsilon_n$$

(The survey questions, however, do not allow for a clear determination of the directions. For example, if villager A visits villager B’s house, it is not clear whether villager A influences villager B or vice versa.)

I assume that only a small number of households are influential over their neighbors. Leaders and followers are usually observed in those rural villages. Big decisions are often made by the village elders or by the more educated among the villagers. BSS recognized the importance of leaders and gathered a group of predefined leaders, asking them to inform the rest of the villagers about their program. I do not consider local level influence in these villages given the size and complexity of the network structures. Households are closely connected by these 8 networks, as shown in Figure 5, and there is no form of clique visible.

Because the villages are considered geographically isolated, I apply my estimator separately to each of the 38 villages. I use the number of rooms per person in a household as the independent variable $X_n$. The number of rooms per person is a proxy for the wealth of the family. As shown in table 1, it is negatively correlated with the decision to join the micro finance program. The richer the family, the less likely the family is to participate in the micro-finance program. 
On the other hand, it is arguably exogenous to other factors (career, education, etc.) that might affect the decision to join the micro-finance program. I further check the robustness of my independent variable by including additional controls. The adjacency matrix $M^j_n$ is constructed from the questions in the survey. Households $i$ and $k$ are connected in network $j$ if either $i$ or $k$ reported the other in question $j$. Finally, $d^*_i$ is replaced with the household’s decision.

The instruments are constructed as $(M^j_n \circ X_n)$ for $j = 1, 2, \cdots, 8$. I use the heterogeneous endogenous effects model with multiple networks to: 1) identify the effective networks affecting a household’s decision and 2) identify the households that are leaders in the village and study the association between observable characteristics and leader status. If a new program is going to try to recruit these households, the organizers can target those influential households and try to persuade them to join first.
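As a rough illustration of the two constructions above, the sketch below builds a symmetric adjacency matrix from reported links and then forms instruments from it. The function names and the toy data are hypothetical. The reading of $M^j_n \circ X_n$ as scaling column $l$ of $M^j_n$ by $x_l$ (one candidate instrument per potentially influential household) is an assumption made for this example, not something this section confirms.

```python
def build_adjacency(n, nominations):
    """Symmetric 0/1 adjacency matrix for one network.

    nominations: iterable of (i, k) meaning household i named household k.
    Households i and k are connected if either one reported the other.
    """
    m = [[0] * n for _ in range(n)]
    for i, k in nominations:
        if i != k:
            m[i][k] = 1
            m[k][i] = 1
    return m


def instruments(m, x):
    """Assumed reading of M o X: Z[i][l] = M[i][l] * x[l]."""
    n = len(m)
    return [[m[i][l] * x[l] for l in range(n)] for i in range(n)]


# Toy example with three households: 0 names 1, and 2 names 1.
m = build_adjacency(3, [(0, 1), (2, 1)])
rooms_per_person = [0.5, 2.0, 1.5]  # hypothetical values of X_n
z = instruments(m, rooms_per_person)
for row in z:
    print(row)
```

With 8 networks, the same two steps would be repeated for each survey question $j$, and the resulting matrices stacked side by side to form the full instrument set.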
First, I study how LASSO selects networks. I define a coefficient for a household’s endogenous effect in a network as significant according to two different criteria. The first criterion, “Cross-Validation”,

Table 1: Predictive Power of Characteristics X_n

                              (1)           (2)           (3)
                              Participate   Participate   Participate
Average Num. rooms x100       -8.19***      -7.12***      -3.36***
                              (1.30)        (1.30)        (1.12)
Household Size x100                         0.45**        **
                                            (0.21)        (0.20)
Electricity x100                                          1.29
                                                          (1.26)
Latrine x100                                              -5.66***
                                                          (1.38)
Average Num. workers x100                                 0.64*
                                                          (0.50)
Average age x100                                          -0.27***
                                                          (0.03)
Village Fixed Effects         Y             Y             Y
n
R

Table 1 provides a robustness check for variable X_n: Average Num. rooms, which is used to construct instruments. Standard errors in parentheses. * p < .1, ** p < .05, *** p < .01. Standard deviation clustered at village level. Dependent variable is households’ decision on whether to join the micro finance program or not. All designs control for village fixed effects.

                                visit     friendship  borrow-lend  borrow-lend  relatives  help      medical  temple
                                go-come               kerorice     money                   decision  help     company
Cross Validation  probability   70%       54%         38%          49%          22%        30%       14%      0%
                  identified    101       89          88           86           69         70        80       0
De-bias           probability   54%       35%         30%          19%          16%        8%        5%       0%
                  identified    11        8           8            9            7          14        8        0
                  magnitudes

Table 2 reports the probability of detection for different networks among the 38 villages. A network is detected as influential if at least one leader is detected within this network. 1. Cross Validation represents those networks detected by lasso using cross validation. 2. De-bias represents those networks detected by significant de-bias estimators under FDR control. 3. Probability reports the empirical probability that at least one coefficient $\hat{e}_{ji}$ is significant in network $j$. 4. Identified reports the averaged number of significant $\hat{e}_{ji}$ in the network $j$ conditioning on the network being detected. 5. Magnitudes is the mean of $|\hat{e}_{ji}|$ and represents the average endogenous effects through network $j$.

Second, I focus on how LASSO selects households. I compare the LASSO selected influential households with the BSS selected “predefined leaders”. It is important to point out that these “predefined leaders” are not necessarily influential villagers in a network. Recall that predefined leaders are a set of villagers that BSS selects to help spread the information about the micro finance program. The fact that a villager is selected as a “predefined leader” to pass information about the micro finance program does not a priori guarantee her or her family’s influence: her decision to join the micro finance program may not lead to her neighbors’ decisions to join. In the analyses below, I will examine how influential villagers are associated with “predefined leaders” and explore their potential differences.
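The de-bias criterion reported in Table 2 relies on FDR control, and the Monte Carlo section names the Benjamini-Hochberg method for this step. The sketch below shows the standard Benjamini-Hochberg step-up rule and how a network could then be flagged as detected. It is a minimal illustration under stated assumptions: the function names are hypothetical, the p-values are taken as given inputs, and the default level `q = 0.05` is borrowed from the simulation design rather than confirmed for the empirical application.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null is rejected
    while controlling the FDR at level q (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value is below its threshold q*rank/m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below the cutoff.
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def network_detection(rejected_by_network):
    """For each network: (detected if at least one coefficient is
    significant, number of significant coefficients)."""
    return {name: (any(flags), sum(flags))
            for name, flags in rejected_by_network.items()}


flags = benjamini_hochberg([0.001, 0.2, 0.012, 0.04, 0.9])
print(flags)
print(network_detection({"visit go-come": flags[:3],
                         "temple company": flags[3:]}))
```

Averaging the detection flag over villages would give an empirical detection probability of the kind reported in Table 2, and averaging the count over villages where the network is detected would give the corresponding number of identified households.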
1. Influential Predefined Households
In table 3, I report results indicating that influential households selected by LASSO partly overlap with “predefined leaders”. This is intuitive because some “predefined leaders” such as school headmasters and village elders are highly respected figures in a village. Therefore, their decisions are likely to be followed by others in the village. On average, BSS selected 27 villagers as “predefined leaders” in each village. In comparison, the Cross-Validation criterion selects around 136 villagers and the de-bias criterion selects around 19. Furthermore, on average, 19 out of 136 influential villagers (i.e. 14%) selected by the Cross-Validation criterion are also BSS “predefined leaders”; 3 out of 19 influential villagers (i.e. 17%) selected by the de-bias criterion are also BSS “predefined leaders”. Under both criteria, the share of predefined leaders among the selected villagers is higher than the percentage of predefined leaders in the entire village (11%). Compared with a random guess of influential individuals, table 3 suggests that the LASSO detected influential individuals are more likely to overlap with the predefined leaders. In Table 6 below, I show that small business owners are more likely to be both influential and selected as “predefined leaders”.
2. Influential Non-Predefined Households
In this and the following section, I focus on understanding the differences between the influential households selected by LASSO and the “predefined households” selected by BSS. I investigate how the likelihood of a household being selected by LASSO or by BSS is associated with the careers of its family members. Tables 4 and 5 present linear regression results using career dummy variables of family members to explain whether a household is selected as a “predefined leader” (Column 1 in table 4), whether a household is selected by LASSO as influential using cross-validation (Column 2 in table 4), and whether a household is selected by LASSO as influential using the de-bias estimator (Column 3 in table 4). The full results of these regressions are reported in appendix table A7.

Table 3: Coverage of predefined leaders

                    % of predefined leaders among:      Average number
                    LASSO detected    entire village    of discovery
Cross Validation    14%               11%               136
De-bias             17%               11%               19

Table 3 depicts the overlap between influential households selected by LASSO and “predefined leaders”. Predefined leaders are a set of villagers defined by BSS, who helped spread the information about the micro-finance program. 1. LASSO detected reports the percentage of households detected by LASSO and also selected as “predefined leaders” in total LASSO detected households. 2. Entire village reports the percentage of “predefined leaders” among the entire village. 3. Average number of discovery reports the total number of individuals discovered by LASSO using each method. 4. Cross Validation represents those individuals identified from lasso using cross validation. 5. De-bias represents those individuals identified from significant de-biased estimators controlling FDR. 6. The average number of predefined leaders in one village is 27.

Table 4 summarizes all careers that have a significant impact ( p < . p < .01) on the likelihood of a household being selected by BSS as being among the “predefined leaders”. Poojari are Indian priests in those villages and they are very likely to be included as “predefined leaders”. However, they are not likely to influence people to join the micro finance program. Other careers such as tailor, hotel workers, veteran and barber are included as “predefined leaders” because individuals doing these jobs can spread information quickly in the village. However, LASSO does not find these individuals to be influential.

Table 6 reports the counterfactual study in which the selected leaders all decide to join the micro-finance program. The participation rate for non-leaders in the data is 16%. When all “predefined leaders” decide to join, the participation rate for non-leaders increases to 20%. And when all LASSO selected leaders decide to join, the participation rate for non-leaders further increases to 33%.

Table 4: Second Stage: LASSO selected leaders’ careers
                                       Predefined   Selected by LASSO
                                       leaders      Cross-validation   De-bias
Agriculture labour                     0.00         0.31***            ***
                                       (0.01)       (0.01)             (0.00)
Anganavadi Teacher                     0.14*        ***
                                       (0.06)       (0.07)             (0.04)
Construction/mud worker                0.00         0.17***            ***
                                       (0.02)       (0.03)             (0.02)
Truck/Tractor Driver                   -0.03        0.16***            ***
                                       (0.03)       (0.03)             (0.02)
Factory worker (bricks/stones/mill)    -0.00        0.17***            ***
                                       (0.02)       (0.03)             (0.01)
Small business                         0.22***      ***                ***
                                       (0.02)       (0.03)             (0.01)
Teacher                                0.05         0.22***            ***
                                       (0.04)       (0.05)             (0.03)
Daily labourer                         -0.05*       ***                ***
                                       (0.03)       (0.03)             (0.02)
Wood cutter                            -0.03        0.15*              ***
                                       (0.06)       (0.07)             (0.04)
Animal skin business                   0.36         0.62*              ***
                                       (0.23)       (0.28)             (0.15)
Control other careers                  Y            Y                  Y
Control village fixed effect           Y            Y                  Y

Table 4 summarizes all careers that have a significant impact ( p < . ?? . The first column uses whether one is a predefined leader as response variable, the second column uses whether one joins the micro-finance program as response variable and the third column uses whether one is selected by lasso as response variable. Standard errors in parentheses. * p < .1, ** p < .01, *** p < .

Table 5

                                       Predefined   Selected by LASSO
                                       leaders      Cross-validation   De-bias
Small business                         0.22***      ***                ***
                                       (0.02)       (0.03)             (0.01)
Tailor Garment worker                  0.08** *** *** ** -0.02
                                       (0.07)       (0.09)             (0.05)
Poojari                                0.53*** ** -0.03 0.00
                                       (0.32)       (0.39)             (0.21)
Barber/saloon                          0.41***      -0.01              -0.02
                                       (0.10)       (0.12)             (0.06)
Control other careers                  Y            Y                  Y
Control village fixed effect           Y            Y                  Y

Table 5 summarizes all careers that have a significant impact (( p < . ?? . The first column uses whether one is a predefined leader as response variable, the second column uses whether one joins the micro-finance program as response variable and the third column uses whether one is selected by lasso as response variable. Standard errors in parentheses. * p < .1, ** p < .01, *** p < .

Table 6: Participation Rate when Targeting Different Leaders
                       In data    Predefined    LASSO
                                  Leaders       Leaders
Participation Rate     16%        20%           33%
(non-leaders)

Table 6 reports the participation rate of non-leaders when all targeted leaders decide to join. The true participation rate in the data is 16%. If all predefined leaders decide to join, the participation rate will increase to 20%. If all LASSO detected leaders decide to join, the participation rate will increase to 33%.

Conclusions
In this paper, I propose a novel spatial autoregression model which allows for heterogeneous endogenous effects. Specifically, each individual has an individual-specific endogenous effect on her neighbors. My approach is useful for modeling a network with leaders and followers.

I propose a set of instruments as well as a two-stage LASSO (2SLSS) method to estimate my model. The instruments are constructed as a function of the independent variables and an adjacency matrix. I use a LASSO type estimator to select the valid instruments in the first stage and the influential individuals in the second stage. I propose a bias correction for my two-stage estimator following van de Geer et al. (2014). I derive the asymptotic normality of my “de-bias” two-stage LASSO estimator and conduct robust inference, including confidence intervals.

My model can be extended to allow for more flexible structures. To apply LASSO, I assume that the set of influential individuals is sparse. I propose a heterogeneous endogenous effects model with cliques to incorporate locally influential individuals, where the sparsity assumption is only applied to globally influential individuals. My model can also be extended to situations where there are multiple networks. I propose the use of the sparse group LASSO in my 2SLSS process. I derive the convergence rate and prove the consistency of selection for the sparse group LASSO estimator.

I apply my method to study villagers’ decisions to participate in micro-finance programs in rural areas of India. I show that leaders in those villages have significant influence over their neighbors’ decisions to join the micro-finance program, and I provide rankings for the different social and economic networks among villagers. Based on how effectively each network spreads the impact of influential individuals’ decisions, my method shows that some networks such as “visit go-come” and “borrow money” are much more effective in influencing villagers’ decisions than other networks such as “temple company” and “medical help”. I further show that individuals from certain careers such as agricultural workers, Anganwadi teachers and small business owners are more likely to influence other villagers and that the “predefined leaders” selected by BSS differ from the LASSO detected influential individuals.
References
Acemoglu, D., García-Jimeno, C., and Robinson, J. A. (2012). Finding eldorado: Slavery and long-run development in colombia. NBER Working Paper Series.
Ammermuller, A. and Pischke, J.-S. (2009). Peer effects in european primary schools: Evidence from pirls. Journal of Labor Economics, 27(3):315–348.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Boston: Kluwer.
Ballester, C., Calvó-Armengol, A., and Zenou, Y. (2006). Who’s who in networks. wanted: The key player. Econometrica, 74(5):1403–1417.
Bandiera, O., Barankay, I., and Rasul, I. (2009). Social connections and incentives in the workplace: Evidence from personnel data. Econometrica, 77(4):1047–1094.
Banerjee, A., Chandrasekhar, A., Duflo, E., and Jackson, M. (2013). The diffusion of microfinance. Science, 341(6144).
Belloni, A., Chernozhukov, V., and Hansen, C. (2014). Inference on treatment effects after selection amongst high-dimensional controls. The Review of Economic Studies, 81(2):608–650.
Belloni, A., Chernozhukov, V., and Kato, K. (2015). Uniform post selection inference for lad regression and other z-estimation problems. Biometrika, 102:77–94.
Blume, L. E., Brock, W. A., Durlauf, S. N., and Jayaraman, R. (2015). Linear social interactions models. Journal of Political Economy, 123(2):444–496.
Bonaldi, P., Hortacsu, A., and Kastl, J. (2015). An empirical analysis of funding costs spillovers in the euro-zone with application to systemic risk. NBER Working Paper.
Bramoullé, Y., Djebbari, H., and Fortin, B. (2009). Identification of peer effects through social networks. Journal of Econometrics, 150(1):41–55.
Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli, 41(2):802–837.
Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer.
Bunea, F., Lederer, J., and She, Y. (2014). The square root group lasso: theoretical properties and fast algorithms. IEEE-Information Theory, 60:1313–1325.
Calvó-Armengol, A., Patacchini, E., and Zenou, Y. (2009). Peer effects and social networks in education. Review of Economic Studies, 76(4):1239–1267.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and causal parameters. Econometrics Journal, 21:C1–C68.
Christakis, N. A., Fowler, J. H., Imbens, G. W., and Kalyanaraman, K. (2010). An empirical model for strategic network formation. NBER working paper.
Clark, A. E. and Loheac, Y. (2007). “it wasn’t me, it was them!” social influence in risky behavior by adolescents. Journal of Health Economics, 26:763–784.
Cliff, A. and Ord, J. K. (1973). Spatial autocorrelation. London: Pion.
Coelli, T., Rahman, S., and Thirtle, C. (2002). Technical, allocative, cost and scale efficiencies in bangladesh rice cultivation: A nonparametric approach. Journal of Agricultural Economics, 53(3):607–626.
Conley, T. G. and Udry, C. R. (2010). Learning about a new technology: Pineapple in ghana. American Economic Review, 100(1):35–69.
Cressie, N. A. C. (1993). Statistics for Spatial Data. John Wiley & Sons, Inc.
de Paula, A., Rasul, I., and Souza, P. C. (2015). Recovering social networks from panel data: identification, simulations and an application.
Denbee, E., Julliard, C., Li, Y., and Yuan, K. (2015). Network risk and key players: A structural analysis of interbank liquidity.
Fan, J. and Liao, Y. (2014). Endogeneity in high dimensions. The Annals of Statistics, 42(3):872–917.
Gautier, E. and Tsybakov, A. B. (2014). High-dimensional instrumental variables regression and confidence sets. TSE Working Paper.
Guryan, J., Kroft, K., and Notowidigdo, M. J. (2009). Peer effects in the workplace: Evidence from random groupings in professional golf tournaments. American Economic Journal: Applied Economics, 1(4):34–68.
Horrace, W. C., Liu, X., and Patacchini, E. (2016). Endogenous network production functions with selectivity. Journal of Econometrics, 190(2):222–232.
Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research, 15(1):2869–2909.
Jin, F. and Lee, L.-F. (2018). Lasso maximum likelihood estimation of parametric models with singular information matrices. Econometrics, 6(1):1–24.
Kelejian, H. H. and Prucha, I. R. (1995). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review, 40.
Kelejian, H. H. and Prucha, I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics, 17(1):99–121.
Krauth, B. V. (2005). Peer effects and selection effects on smoking among canadian youth. Canadian Journal of Economics, 38(3):735–757.
Lee, L. (2002). Consistency and efficiency of least squares estimation for mixed regressive, spatial. Econometric Theory, 18(2):252–277.
Lee, L. (2003). Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive. Econometric Reviews, 22(4):305–335.
Lee, L. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric models. Econometrica, 72:1899–1926.
Lee, L. and Liu, X. (2010). Efficient gmm estimation of high order spatial autoregressive models with autoregressive disturbances. Econometric Theory, 26:187–230.
Lee, L.-f. and Yu, J. (2010). A spatial dynamic panel data model with both time and individual effects. Econometric Theory, 26:564–597.
Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: facts and fiction. Econometric Theory, 21(1):21–59.
Leeb, H. and Pötscher, B. M. (2008). Can one estimate the unconditional distribution of post-model selection estimators? Econometric Theory, 24(2):38–376.
Leeb, H. and Pötscher, B. M. (2009). Model selection. Handbook of Financial Time Series, pages 889–925.
Manresa, E. (2013). Estimating the structure of social interactions using panel data.
Manski, C. (1993). Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 60(3):531–542.
Mas, A. and Moretti, E. (2009). Peers at work. American Economic Review, 99(1):112–145.
Masten, M. A. (2018). Random coefficients on endogenous variables in simultaneous equations models. The Review of Economic Studies, 85(2):1193–1250.
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(1436-1462).
Nakajima, R. (2007). Measuring peer effects on youth smoking behaviour. The Review of Economic Studies, 74(3):897–935.
Neidell, M. and Waldfogel, J. (2010). Cognitive and noncognitive peer effects in early education. Review of Economics and Statistics, 92(3):562–576.
Pinkse, J., Slade, M., and Brett, C. (2002). Spatial price competition: a semiparametric approach. Econometrica, 70(3):1111–1153.
Sacerdote, B. (2001). Peer effects with random assignment: Results for dartmouth roommates. The Quarterly Journal of Economics, 116(2):681–704.
Sheng, S. (2016). A structural econometric analysis of network formation games through subnetworks. Conditional Acceptance at Econometrica.
Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2013). The sparse group lasso. Journal of Computational and Graphical Statistics, 22(2):231–245.
Upton, G. and Fingleton, B. (1985). Spatial data analysis by example. Volume 1: Point pattern and quantitative data. John Wiley and Sons Ltd.
van de Geer, S., Bühlmann, P., Ritov, Y., and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3):1166–1202.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, B(68):49–67.
Zhang, C.-H. and Zhang, S. S. (2011). Confidence intervals for low-dimensional parameters in high-dimensional linear models. Journal of the Royal Statistical Society, 76(1):217–242.
Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. Journal of Machine Learning Research, 7:2541–2563.
Zhu, Y. (2018). Sparse linear models and l1-regularized 2sls with high-dimensional endogenous regressors and instruments.