Weak Identification and Estimation of Social Interaction Models∗

Guy Tchuente†

February 2019
Abstract
The identification of the network effect is based on either group size variation, the structure of the network or the relative position in the network. I provide easy-to-verify necessary conditions for identification of undirected network models based on the number of distinct eigenvalues of the adjacency matrix. Identification of network effects is possible, although in many empirical situations existing identification strategies may require the use of many instruments or instruments that could be strongly correlated with each other. The use of highly correlated instruments or many instruments may lead to weak identification or many-instruments bias. This paper proposes regularized versions of the two-stage least squares (2SLS) estimators as a solution to these problems. The proposed estimators are consistent and asymptotically normal. A Monte Carlo study illustrates the properties of the regularized estimators. An empirical application, assessing a local government tax competition model, shows the empirical relevance of using regularization methods.
Keywords:
High-dimensional models, Social network, Identification, Spatial autoregressive model, 2SLS, Regularization methods.
JEL classification:
C13, C31.

∗Comments from Marine Carrasco, Eric Renault, James G. MacKinnon, Silvia Goncalves, Russell Davidson, Yann Bramoullé, Lynda Khalaf, John Peirson and A. Colin Cameron are gratefully acknowledged. The author also thanks seminar participants at the University of Kent, University of Ottawa, Universidad Pontificia Javeriana, RES 2016, EEA-ESEM 2016, AFES 2016, CIREQ Econometrics Conference 2017, the University of Bristol's Econometrics Study Group 2017, SCSE 2017 and NASM 2018 for their comments. Many thanks to Xiaodong Liu for kindly providing his code for bias-corrected 2SLS methods and to Teemu Lyytikäinen for providing data on Finnish municipalities.

†E-mail: [email protected]. Address: School of Economics, University of Kent, Keynes College, Canterbury, Kent, CT2 7NP. Tel: +44 1227 827249.

Introduction
This paper investigates the estimation of social interaction models with network structures and the presence of endogenous, contextual, correlated and group fixed effects. In his seminal paper on network model estimation, Manski (1993) argues that solving the reflection problem in identifying and estimating the endogenous interaction effects is of significant interest in social interaction models. He shows that the separate identification of the network effects, in a linear-in-means model, is impossible. Following Manski (1993), the literature on identification of network effects has proposed three main identification strategies, based on the variation in the size of the group of peers, the structure through which peers interact, or agents' relative positions in the network. The present paper investigates an estimation strategy that is robust to weak identification. Weak identification can occur in limit cases for all the identification strategies.

The first method for identification was proposed by Lee (2007). He shows that both the endogenous and exogenous interaction effects can be identified if there is sufficient variation in group sizes. However, with large groups, identification can be weak in the sense that the estimator converges in distribution at low rates (Lee (2007)). The low rate of convergence means that we need a larger sample to have enough exogenous variation. Indeed, as the group size increases, the marginal effect of an individual on his peers becomes small and more observations are needed for identification.

In a more general framework, Bramoullé, Djebbari, and Fortin (2009) investigate identification and estimation of network effects. They use the structure of the network to identify the network effect. Their identification strategy relies on the use of spatial lags of friends' (or friends of friends') characteristics as instruments. But, if the network is highly transitive (i.e.
if a friend of my friend is also likely to be my friend), the identification is also weak. Weak identification can also occur if there are too many isolated individuals; in that case, the weak identification corresponds to weak instruments as in Staiger and Stock (1997). This paper focuses its attention on highly transitive networks.

More recently, Liu and Lee (2010) have considered the estimation of a social network where the endogenous effect is given by the aggregate choices of an agent's friends. (In network models, an agent's behavior may be influenced by his peers' choices (the endogenous effect), his peers' exogenous characteristics (the contextual effect), and/or by the common environment of the network (the correlated effect); see Manski (1993) for a description of these models.) They show that different positions of agents in a network, captured by the Bonacich (1987) centrality measure, can be used as additional instrumental variables to improve estimation efficiency. The number of such instruments depends on the number of groups, and can be very large. Liu and Lee (2010) propose two-stage least squares (2SLS) and generalized method of moments (GMM) estimators. The proposed estimators have an asymptotic bias due to the presence of many instruments.

The existing papers in the literature use instrumental variable (IV) methods or the quasi-maximum likelihood method to estimate the network effect. The present paper is interested in the use of IV methods when identification is weak in the sense described above. We will show that, in the estimation of peer effects using IV methods, a highly transitive network or a large group size implies the use of highly correlated instruments (where the set of instrumental variables contains the included and excluded instruments). If the Bonacich (1987) centrality measures are used, the number of instruments increases with the number of groups. In both cases, the structure of the interaction generates a weak identification issue.
The weak identification problem comes from near-perfect collinearity in the first-stage regression.

This paper proposes simple-to-check necessary conditions for identification based on the spectral decomposition of the network adjacency matrix. It shows that identification of the network effects is possible in many cases. However, given that all exogenous variation comes from the system, weak identification may be a concern. I propose regularized 2SLS estimators for network models with spatial autoregressive (SAR) representations. High-dimensional reduction techniques are used to mitigate the finite sample bias of the 2SLS estimators stemming from the use of many or highly correlated instruments. The regularized 2SLS estimators are based on three ways of computing a regularized inverse of the (possibly infinite-dimensional) covariance matrix of the instruments. The regularization methods come from the literature on inverse problems (see Kress (1999) and Carrasco, Florens, and Renault (2007)). The first estimator is based on Tikhonov (ridge) regularization. The Tikhonov (ridge) regularization is known in the machine learning literature for its ability to address near-perfect collinearity problems. The second estimator is based on the iterative Landweber-Fridman method. It has the same regularization properties as the ridge method, with the advantage of being appropriate for larger-scale problems. The third estimator is based on the principal components associated with the largest eigenvalues. The use of principal components is very popular for estimating models with factors. In the presence of many instruments, the use of a few principal components can help reduce the first-stage regression dimension.
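These three regularized inverses can be illustrated numerically. The sketch below is my own (the function names, simulated data and tuning values are not from the paper): each method replaces the unregularized first-stage projection of an endogenous variable w on the instruments Q, and with light regularization all three recover the ordinary projection in a well-conditioned design.

```python
import numpy as np

def tikhonov_fit(Q, w, alpha):
    """Tikhonov (ridge) first-stage fit: Q (Q'Q/n + alpha I)^{-1} Q'w/n."""
    n, m = Q.shape
    coef = np.linalg.solve(Q.T @ Q / n + alpha * np.eye(m), Q.T @ w / n)
    return Q @ coef

def landweber_fridman_fit(Q, w, n_iter):
    """Landweber-Fridman iterations; early stopping acts as the regularization."""
    n, m = Q.shape
    S = Q.T @ Q / n
    c = 1.0 / np.linalg.norm(S, 2)      # step size below 1/(largest eigenvalue of S)
    coef = np.zeros(m)
    for _ in range(n_iter):
        coef += c * (Q.T @ w / n - S @ coef)
    return Q @ coef

def principal_components_fit(Q, w, k):
    """Project w on the k principal components of Q with the largest singular values."""
    U, _, _ = np.linalg.svd(Q, full_matrices=False)
    Uk = U[:, :k]
    return Uk @ (Uk.T @ w)

# In a well-conditioned design, light regularization reproduces the 2SLS projection
rng = np.random.default_rng(0)
Q = rng.normal(size=(200, 5))
w = Q @ np.ones(5) + 0.1 * rng.normal(size=200)
ols = Q @ np.linalg.solve(Q.T @ Q, Q.T @ w)
print(np.allclose(tikhonov_fit(Q, w, 1e-10), ols, atol=1e-5))
print(np.allclose(landweber_fridman_fit(Q, w, 2000), ols, atol=1e-5))
print(np.allclose(principal_components_fit(Q, w, 5), ols, atol=1e-8))
```

With near-collinear instruments the three methods differ from the unregularized fit, which is precisely where their stabilizing effect matters.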
The regularized estimators presented in the paper depend on a tuning parameter; I propose a data-driven method for its selection based on the estimation of an approximation of the mean square error of the estimator.

The regularized 2SLS estimators are consistent, asymptotically normal and unbiased, and they achieve the semiparametric efficiency bound. However, the consistency and asymptotic normality conditions require more regularization than in Carrasco (2012). A Monte Carlo experiment shows that the regularized estimators perform well. In general, the quality of the regularized estimators improves as the density of the network increases.

I demonstrate the empirical relevance of my estimators by estimating a model of tax competition between municipalities in Finland. The size of the tax competition parameter seems larger than what is suggested by Lyytikäinen (2012). However, the regularized estimators are not statistically different from zero. This leaves unchanged the conclusion that tax competition is absent between municipalities in Finland.

The large existing literature on network models focuses on two main issues: identification and the estimation of the network effect. In his seminal work, Manski (1993) shows that linear-in-means specifications suffer from the reflection problem, so endogenous and contextual effects cannot be separately identified. Lee (2007) and Bramoullé, Djebbari, and Fortin (2009) propose identification strategies for a local-average network model based on differences in group sizes and structures. Liu and Lee (2010) show that the Bonacich (1987) centrality measure can also be used as additional instruments to improve identification and estimation efficiency. Lee (2007) and Bramoullé, Djebbari, and Fortin (2009) use the instrumental variables method to estimate the parameter of interest.
Liu and Lee (2010) propose a generalized method of moments (GMM) estimation approach, following Kelejian and Prucha (1998, 1999), who propose 2SLS and GMM approaches for estimating SAR models. The inclusion of the measure of centrality implies the use of many moment conditions (see Donald and Newey (2001), Hansen, Hausman, and Newey (2008) and Hasselt (2010) for some recent developments in this area).

In this paper, I assume that there are many instruments at hand (they are generated by the structure imposed on the data), and therefore use a framework that allows for an infinite number of instruments. (The inference carried out in the empirical example does not account for the effect of regularization.) Thus, this paper contributes to the literature on models for which the number of instruments exceeds the sample size. In a linear model framework without network effects, Carrasco (2012) proposes an estimation procedure that allows for the use of many instruments; the number of instruments may be smaller or larger than the sample size, or even infinite. Moreover, Carrasco and Tchuente (2016) show that these methods can be used to improve identification in weak instrumental variables estimation. Closely related papers also include Kuersteiner (2012), who considers a kernel-weighted GMM estimator; Okui (2011), who uses shrinkage with many instruments; and Bai and Ng (2010) and Kapetanios and Marcellino (2010), who assume that the endogenous regressors depend on a small number of factors that are exogenous. Using estimated factors as instruments, they assume that the number of variables from which the factors are estimated can be larger than the sample size. Belloni, Chen, Chernozhukov, and Hansen (2012) propose an instrumental variables estimator under a first-stage sparsity assumption.
Hansen and Kozbur (2014) propose a ridge-regularized jackknife instrumental variable estimator in the presence of heteroscedasticity, which does not require sparsity and has good size properties.

Another important focus in the instrumental variables estimation literature is on weak instruments or weak identification (see, for example, Chao and Swanson (2005) and Newey and Windmeijer (2009)). In this paper, I assume that the concentration parameter grows at the same rate as the sample size. However, I allow for the possibility of weak identification resulting from near-perfect collinearity in the set of instruments. My framework is similar to Caner and Yıldız (2012), with the difference that the near-singular design does not come from the proliferation of instruments, but from the structure of the social or spatial interaction.

The paper is organized as follows. Section 2 presents the network model. Section 3 discusses identification and estimation in network models. It proposes the regularized 2SLS approach to estimating the model. The selection of the regularization parameter is discussed in Section 4. Monte Carlo evidence on the performance of the proposed estimators for small samples is given in Section 5. An empirical application on local government tax competition is proposed in Section 6. Section 7 concludes.

The Model
The following social interaction model is considered:

Y_r = λ W_r Y_r + X_{1r} β_1 + W_r X_{2r} β_2 + ι_{m_r} γ_r + u_r,   (1)

with u_r = ρ M_r u_r + ε_r and r = 1, ..., r̄, where r̄ is the total number of groups and m_r is the number of individuals in group r.

Y_r = (y_{1r}, ..., y_{m_r r})' is an m_r-dimensional vector that represents the outcomes of interest. y_{ir} is the observation of individual i in group r. The total number of individuals in the sample is n = Σ_{r=1}^{r̄} m_r.

W_r and M_r are m_r × m_r sociomatrices of known constants, and may or may not be the same. λ is a scalar that captures endogenous network effects. I assume that this effect is the same for all individuals and groups. The outcomes of individuals influence those of their successors in the network graph (the successors are usually friends or peers).

In such a linear model, the parameter λ is usually interpreted as the partial effect of a one-unit change in the explanatory variable on the outcome. The explanatory variable in the present case is the product of the sociomatrix W_r and friends' outcomes Y_r. If the sociomatrix W_r is row-normalized, the endogenous network effect captured by λ represents the expected change in the outcome of an individual if all his friends' outcomes were changed by one unit. This corresponds to the "local average" endogenous effect in the terminology of Liu, Patacchini, and Zenou (2014). On the other hand, if W_r is not row-normalized, it is impossible to know which intervention is the source of the exogenous change in W_r Y_r (see Goldsmith-Pinkham and Imbens (2013) and Angrist (2014) for a discussion on the causal interpretation of the network effect). The unit variation in W_r Y_r could come from a change in the allocation of friends, an intervention on friends' outcomes, or both; the change would have to be engineered in a specific manner to obtain a unit change.
Such a situation corresponds to the "local aggregate" endogenous effect in the terminology of Liu, Patacchini, and Zenou (2014).

My model specification allows for the use of both the "local average" and "local aggregate" endogenous effects. Micro-foundations developed in Liu, Patacchini, and Zenou (2014) suggest that the "local average" should be used in situations where the network effect comes from individuals trying to conform to the social norm, and the "local aggregate" for situations where there is leakage.

X_{1r} and X_{2r} are m_r × k_1 and m_r × k_2 matrices, respectively. They represent individuals' exogenous characteristics. β_1 is the parameter measuring the dependence of individuals' outcomes on their own characteristics. The outcomes of individuals may also depend on the characteristics of their predecessors via the exogenous contextual effect, β_2. ι_{m_r} is an m_r-dimensional vector of ones and γ_r represents the unobserved group-specific effect (it is treated as a vector of unknown parameters that will not be estimated). Aside from the group fixed effect, ρ captures unobservable correlated effects between individuals and their connections in the network. ε_r is the m_r-dimensional disturbance vector; the ε_{ir} are i.i.d. with a mean of 0 and variance of σ² for all i and r. I define X_r = (X_{1r}, W_r X_{2r}).

For a sample with r̄ groups, the data is stacked up by defining V = (V_1', ..., V_{r̄}')' for V = Y, X, ε or u. I also define W = D(W_1, W_2, ..., W_{r̄}), M = D(M_1, M_2, ..., M_{r̄}) and ι = D(ι_{m_1}, ι_{m_2}, ..., ι_{m_{r̄}}), where D(A_1, ..., A_K) is a block diagonal matrix in which the diagonal blocks are m_k × n_k matrices A_k, for k = 1, ..., K.

The full sample model is

Y = λWY + Xβ + ιγ + u   (2)

where u = ρMu + ε. I define R(ρ) = (I − ρM). The Cochrane-Orcutt-type transformation of the model is obtained by multiplying equation (2) by R = R(ρ_0), where ρ_0 is the true value of the parameter ρ:

RY = λRWY + RXβ + Rιγ + Ru.
This leads to the following equation:

RY = λRWY + RXβ + Rιγ + ε.   (3)

When the number of groups is large, we have the incidental parameter problem (see Neyman and Scott (1948) and Lancaster (2000) for a discussion of the consequences of this problem). To eliminate unobserved group heterogeneity, I define

J_r = I_{m_r} − (ι_{m_r}, M_r ι_{m_r})[(ι_{m_r}, M_r ι_{m_r})'(ι_{m_r}, M_r ι_{m_r})]^− (ι_{m_r}, M_r ι_{m_r})',

where A^− is the generalized inverse of a square matrix A. In general, J_r represents the projection of an m_r-dimensional vector onto the space orthogonal to the one spanned by ι_{m_r} and M_r ι_{m_r}, if they are linearly independent. Otherwise, J_r = I_{m_r} − (1/m_r) ι_{m_r} ι'_{m_r}, which is the deviation-from-group-mean projector.

The matrix J = D(J_1, J_2, ..., J_{r̄}) is then pre-multiplied by equation (3) to create a model without the unobserved group-effect parameters:

JRY = λJRWY + JRXβ + Jε.   (4)

This is the structural equation, and we are interested in the estimation of λ, β_1, β_2 and ρ. The discussions on the identification and estimation of λ, β_1 and β_2 in this paper will be carried out under the assumption of a consistent estimation of ρ.

I define S(λ) = I − λW. I assume that equation (2) is an equilibrium and that S ≡ S(λ_0) is invertible at the true parameter value. The equilibrium vector Y is given by the reduced-form equation:

Y = S^{-1}(Xβ + ιγ) + S^{-1}R^{-1}ε.   (5)

It follows that WY = WS^{-1}(Xβ + ιγ) + WS^{-1}R^{-1}ε, so WY is correlated with ε. Hence, in general, equation (4) cannot be consistently estimated by ordinary least squares (OLS). Moreover, this model may not be considered as a self-contained system where the transformed variable JRY can be expressed as a function of the exogenous variables and disturbances. Hence, a partial-likelihood-type approach based only on equation (4) may not be feasible. In this paper, I consider the estimation of the parameters of equation (4) using regularized 2SLS.
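The endogeneity of WY in the reduced form can be checked in a small simulation. The sketch below is my own illustration, assuming ρ = 0 and no group effects (so R = I and the γ term drops): since E[(WY)ε'] = σ² WS^{-1}, a strictly positive diagonal of WS^{-1} means every (WY)_i covaries with its own disturbance.

```python
import numpy as np

rng = np.random.default_rng(42)
n, lam, beta = 30, 0.3, 1.0

# Row-normalized circle network: each node's two neighbours get weight 1/2
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

# Reduced form with rho = 0 and gamma = 0: Y = S^{-1} X beta + S^{-1} eps
X = rng.normal(size=n)
eps = rng.normal(size=n)
S_inv = np.linalg.inv(np.eye(n) - lam * W)   # S = I - lam W, invertible since ||lam W|| < 1
Y = S_inv @ (X * beta) + S_inv @ eps

# W Y loads on eps through W S^{-1} eps; E[(W Y)_i eps_i] = sigma^2 (W S^{-1})_{ii}
diag = np.diag(W @ S_inv)
print(diag.min() > 0)   # every diagonal entry is positive: W Y is endogenous
```

This is exactly why OLS on equation (4) is inconsistent and instruments for JRWY are needed.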
An extension would be to estimate the same model using a limited information maximum likelihood (LIML) method (the least variance ratio (LVR)) from Carrasco and Tchuente (2015). In models with independent observations, the LIML estimator can also be derived using the LVR principle (Davidson and MacKinnon (1993)). The LVR estimator is not equivalent to the LIML estimator for the SAR model. This is analogous to the difference between the 2SLS and maximum likelihood estimators for the SAR model (Lee (2004)).

Identification and Estimation of the Network Models
This section presents the identification and estimation of the network model parameters using regularization techniques. It first discusses weak identification in network models. It then proposes regularized 2SLS estimators using three regularization methods (Tikhonov, Landweber-Fridman and principal components). They are presented in a unified framework covering both a finite and an infinite number of instruments. The focus is on estimating endogenous and contextual effects under the assumption of a preliminary estimator of the unobservable correlation between individuals and their connections in the network. I also derive the asymptotic properties of the models' estimated parameters.
The model presented in Equation (2) proposes an underlying structure assumed to have generated the data of the population from which our sample is drawn. The estimation strategy that I propose later aims at making statements about the parameters of this model. To that end, there should not exist many parametrizations compatible with the observed data. Discussing conditions under which a unique parametric characterisation exists is a considerable problem in the estimation of network models (see Bramoullé, Djebbari, and Fortin (2009)) and in econometrics (see Dufour and Hsiao (2010) for a general discussion on identification). The discussion on the identification is done under a number of assumptions.
Assumption 1.
The elements ε_{ir} are i.i.d. with a mean of 0 and variance of σ², and a moment of order higher than the fourth exists.

Assumption 2.
The sequences of matrices {W}, {M}, {S^{-1}} and {R^{-1}} are uniformly bounded (UB), and sup ‖λW‖ < 1.

(Uniform boundedness in row (column) sums of the absolute value of a sequence of square matrices {A} will be abbreviated as UBR (UBC), and uniform boundedness in both row and column sums in absolute value as UB. A sequence of square matrices {A}, where A = [A_{ij}], is said to be UBR (UBC) if the sequence of the row-sum matrix norm of A (column-sum matrix norm of A) is bounded.)

Take ε(ρ_0, δ) = JR(Y − Zδ) = f(δ_0 − δ) + JRWS^{-1}R^{-1}ε(λ_0 − λ) + Jε, with f = JR[WS^{-1}(Xβ_0 + ιγ_0), X], where λ_0, β_0 and γ_0 are the true values of the parameters, δ = (λ, β')' and Z = (WY, X). Under Assumption 2 (i.e. that sup ‖λ_0 W‖ < 1), f can be approximated by a linear combination of (WX, W²X, ...), (Wι, W²ι, ...) and X. This is a typical case where the number of potential instruments is infinite.

I define Q = J[Q*, MQ*], where Q* = [WX, W²X, ..., Wι, W²ι, ..., X] is the infinite-dimensional set of instruments. We can also consider the case where only a finite number of instruments, say m < n, is used. For this case, I define Q_m = J[Q*_m, MQ*_m], where Q*_m = [WX, W²X, ..., W^m X, Wι, W²ι, ..., W^m ι, X].

As discussed in Liu and Lee (2010), δ is identified if Q'_m f has full column rank k + 1. This rank condition requires that f has full rank k + 1. Note that this assumes that Q_m has full column rank (meaning no perfect collinearity between instruments). If instruments are near-perfectly or perfectly collinear, f having full rank k + 1 does not ensure identification. If W_r does not have equal degrees in all its nodes and W_r is not row-normalized, the centrality score of each individual in his group helps to identify δ. This is possible even if β_0 = 0. However, if W_r has constant row sums, then f = JR[WS^{-1}Xβ_0, X] and the identification is impossible for β_0 = 0. Under Assumptions 1 and 2, δ is identified.

The identification in the general case with an infinite number of instruments is possible if the matrix with an infinite number of rows, Q'f, has full column rank. The identification is based on the moment condition E(Q'ε(ρ_0, δ)) = 0 (i.e. Q'f(δ_0 − δ) = 0). For any sample size n, rank(Q) ≤ n. If we assume that rank(QQ') = n, then the full column rank condition only requires that f has full rank k + 1. The same identification conditions as in the finite-dimensional case follow.

(There may be a finite set of instruments when the network effect is very small, such that λ^m → 0 as m → ∞ at a very fast rate. Section 3.1 discusses the effect of near-perfect collinearity on the identification of the network effect. These identification results are from Liu and Lee (2010). My work generalizes the results to an infinite number of instruments.)
(Section 3.2 proposes regularization tools that can ensure the identification of δ with a regularized version of the orthogonality condition.)

The identification of the model parameters relies on the structure of the network through the adjacency matrix W. The adjacency matrix is an n × n matrix. Let τ_1 ≥ τ_2 ≥ ... ≥ τ_n be its n eigenvalues. An eigenvalue could have multiplicity one or k, depending on the number of corresponding eigenvectors. Let ϱ_w denote the number of distinct eigenvalues of the adjacency matrix. The results proposed in Propositions 1 to 3 apply to symmetric spatial and adjacency matrices W. Undirected networks' adjacency matrices are an example of a network structure represented by a symmetric adjacency matrix.

Proposition 1
Consider a network model represented by Equation 2 with ρ = 0. If ϱ_w = 2, then the network effects are not identified.

Proposition 1 implies that the identification of the network effect can be reduced to a spectral analysis of the adjacency matrix. It provides an easy-to-verify condition for identification of the network effects under the assumption of network exogeneity. Indeed, if ϱ_w = 2, using the Cayley-Hamilton theorem, I can show that there exist non-null scalars µ_0 and µ_1 such that W² = µ_0 I + µ_1 W. Then, using Proposition 1 from Bramoullé, Djebbari, and Fortin (2009), the network effects are not identified.

Proposition 2
Consider a network model represented by Equation 2 with ρ = 0 and ε(δ) = J(Y − Zδ) = f(δ_0 − δ) + JWS^{-1}ε(λ_0 − λ) + Jε, with f = J[WS^{-1}(Xβ_0 + ιγ_0), X], where λ_0 and β_0 ≠ 0 are the true values of the parameters, δ = (λ, β')', and Assumptions 1 and 2 hold. Let ϱ_w be the number of distinct eigenvalues of the adjacency matrix W. If [WX, W²X, ..., W^{ϱ_w−1}X, X] has full column rank, the network effects are identified.

Proposition 2 gives a relationship between the identification of the network effects and the spectral decomposition of the adjacency matrix. If ϱ_w = 2, using the definition of X and applying the Cayley-Hamilton theorem leads to the conclusion that [WX, X] does not have full column rank. Thus, JW²X cannot be excluded from the structural equation and, therefore, cannot serve as an instrumental variable for JWY. However, if the number of distinct eigenvalues is strictly greater than 2, identification may be possible. For instance, if ϱ_w = 3, ρ = 0 and [WX, W²X, X] has full column rank, then the network effects are identified. Indeed, JW²X_1 and JW³X_2 serve as excluded instruments for JWY.

The full rank condition can be generalised to a necessary and sufficient condition, under very restrictive assumptions on the set to which the true model's parameters belong. This possibility is discussed in the proof of Proposition 2 in the appendix.

I now consider the case where there is spatial serial correlation. The following proposition generalizes Propositions 1 and 2.

Proposition 3
Consider a network model represented by Equation 2, where β_0 ≠ 0 and Assumptions 1 and 2 hold. Let ϱ_w be the number of distinct eigenvalues of the adjacency matrix W. If Q^{ϱ_w} = [Q_{ϱ_w}, MQ_{ϱ_w}], where Q_{ϱ_w} = [WX, W²X, ..., W^{ϱ_w−1}X, Wι, W²ι, ..., W^{ϱ_w−1}ι, X], has full column rank, the network effects are identified.

A special case of a model with spatial serial correlation is one in which W = M. In such a situation, Proposition 3 becomes similar to Proposition 2. Otherwise, the identification of the network effects could be achieved via the effect of unobserved shocks on peers of peers via M. Having spatial correlation provides a second source of exogenous variation.

The identification of the network effects seems to rest upon the possibility of having a full column rank matrix Q_{ϱ_w} = J[WX, W²X, ..., W^{ϱ_w−1}X, X]. The rank property of Q_{ϱ_w} can be measured by the condition number of the matrix Q_{ϱ_w}Q'_{ϱ_w}. Large values of the condition number correspond to a situation of near-rank deficiency and near-non-identification of the model's network effects. I consider a model with a near-rank-deficient Q_{ϱ_w} matrix as being weakly identified, following the terminology of Dufour and Hsiao (2010). The following subsection provides a discussion of the empirical contexts in which existing network effects identification strategies may become weak.

Since Manski (1993), the identification problem in network models has been a major concern for econometricians. After finding that separately identifying endogenous and exogenous interaction effects in a linear-in-means model is not possible, many subsequent studies have investigated network structures in which identification is possible. The identification of the network effect is achieved through group size variation or by exploiting the structure of the network.
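The spectral conditions above are straightforward to verify numerically. The sketch below is my own illustration (the helper names are not from the paper): it builds a Lee-type block-diagonal sociomatrix, counts its distinct eigenvalues, and shows that with equal group sizes [X, WX, W²X] is rank deficient, consistent with Proposition 1.

```python
import numpy as np

def group_matrix(sizes):
    """Block-diagonal Lee-type sociomatrix: W_ij = 1/(m_k - 1) within group k, zero diagonal."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for m in sizes:
        block = np.full((m, m), 1.0 / (m - 1))
        np.fill_diagonal(block, 0.0)
        W[start:start + m, start:start + m] = block
        start += m
    return W

def n_distinct_eigenvalues(W, tol=1e-8):
    ev = np.sort(np.linalg.eigvalsh(W))   # W symmetric (undirected network)
    return 1 + int((np.diff(ev) > tol).sum())

# Equal group sizes: eigenvalues are only {-1/(m-1), 1} -> not identified (Proposition 1)
print(n_distinct_eigenvalues(group_matrix([5, 5, 5])))      # 2
# Three different sizes: {-1/3, -1/4, -1/5, 1} -> identification possible
print(n_distinct_eigenvalues(group_matrix([4, 5, 6])))      # 4

# With two distinct eigenvalues, W^2 = mu_0 I + mu_1 W, so [X, WX, W^2 X] loses rank
rng = np.random.default_rng(0)
W = group_matrix([5, 5, 5])
X = rng.normal(size=(15, 1))
Z = np.hstack([X, W @ X, W @ W @ X])
print(np.linalg.matrix_rank(Z))                             # 2, not 3
```

The same check, applied to an estimated adjacency matrix, gives a quick diagnostic before choosing an identification strategy.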
It is notable that in all cases, additional information is required to overcome the reflection problem.

Lee (2007) uses variations in group sizes to identify both the endogenous and exogenous interaction effects. His identification relies on having sufficient variation in group size. (The condition number mentioned above is the ratio between the largest and the smallest eigenvalue of a symmetric matrix; see Öztürk and Akdeniz (2000) for the relation between ill-conditioning and multicollinearity.) For example, assume that we have two groups of sizes m_1 and m_2 and consider the adjacency matrix formed as follows: W_ii = 0 and W_ij = 1/(m_k − 1) if i and j belong to the same group k. W can be represented as a block diagonal matrix. Its distinct eigenvalues are τ_1 = −1/(m_1 − 1), τ_2 = −1/(m_2 − 1) and τ_3 = 1. If the group sizes are equal, we have exactly two distinct eigenvalues, and the network effects cannot be identified. Different group sizes lead to more than two distinct eigenvalues. The spectral decomposition of the adjacency matrix leads to the same conclusion as the comments of Bramoullé, Djebbari, and Fortin (2009) on Lee's identification with two groups of different sizes. I can show that with large group sizes there is near-perfect collinearity between WX, W²X, ..., W^{ϱ_w−1}X and X. In other words, with large groups, the identification can be weak.

More precisely, let us consider the model presented in Section 2. To focus the discussion on the possibility of the model's weak identification, we will consider the version of the social interaction model without spatial serial correlation. For an individual in group r, the model above gives

y_{ir} = λ/(m_r − 1) Σ_{j≠i} y_{jr} + x_{1ir} β_1 + 1/(m_r − 1) Σ_{j≠i} x_{2jr} β_2 + γ_r + ε_{ir}.   (6)

The reduced form after a within transformation is given by:

y_{ir} − ȳ_r = (x_{1ir} − x̄_{1r}) (m_r − 1)β_1/(m_r − 1 + λ) − (x_{2ir} − x̄_{2r}) β_2/(m_r − 1 + λ) + (m_r − 1)/(m_r − 1 + λ) (ε_{ir} − ε̄_r)   (7)

where ȳ_r, x̄_{1r}, x̄_{2r} and ε̄_r are the group averages of the variables excluding individual i (see equation 12 in Bramoullé, Djebbari, and Fortin (2009), and equation 2.5 in Lee (2007)). To simplify the discussion, without loss of generality, let us assume that x_{1ir} = x_{2ir} = x_{ir}. Thus,

y_{ir} − ȳ_r = (x_{ir} − x̄_r) ((m_r − 1)β_1 − β_2)/(m_r − 1 + λ) + (m_r − 1)/(m_r − 1 + λ) (ε_{ir} − ε̄_r).   (8)

Each reduced-form equation gives a value for ((m_r − 1)β_1 − β_2)/(m_r − 1 + λ). Identification of the parameters in this model comes from the variation in ((m_r − 1)β_1 − β_2)/(m_r − 1 + λ) across group sizes. Indeed, Bramoullé, Djebbari, and Fortin (2009) show that we need at least three different group sizes to be able to identify β_1, β_2 and λ. The parameters are obtained after solving a system of linear equations; there is a need for at least three distinct equations for a unique solution.

When the group size becomes large, ((m_r − 1)β_1 − β_2)/(m_r − 1 + λ) converges to the constant β_1, which means no or very small variation, as m_r becomes large, in the reduced-form coefficients of all groups. An explanation is that with a large group, the marginal contribution of an additional member of the group is relatively small, which means that the amount of exogenous variation useful for identification vanishes as the group's size increases. This situation is a case of weak identification of the network effects.

The spectral decomposition of the adjacency matrix associated with Lee's model is that of a block diagonal matrix. The distinct eigenvalues are given by τ_r = −1/(m_r − 1), r = 1, ..., r̄, and τ_{r̄+1} = 1. As the group sizes increase, the difference between the eigenvalues τ_r decreases.
The number of distinct eigenvalues becomes nearly equal to two and the model is weakly identified. It can then be said, based on Proposition 1, that if the groups are large, X, WX and W²X will be nearly linearly dependent, leading to weak identification.

Bramoullé, Djebbari, and Fortin (2009) use the structure of the network to identify the network effect. Their work proposes a general framework that incorporates Lee's and Manski's setups as special cases. The identification strategy proposed in their work relies on the use of spatial lags of friends' (i.e. friends of friends') characteristics as instruments. The variables WX, W²X, W³X, ... are used as instruments for
$WY$. The condition for identification is that
$I$, $W$ and $W^2$ (or, as noted in Propositions 1 and 4 of Bramoullé, Djebbari, and Fortin (2009), $I$, $W$, $W^2$ and $W^3$ in the presence of correlated effects) are linearly independent. Variation in group size ensures that $I$, $W$ and $W^2$ are linearly independent. If the network is highly transitive (i.e. a friend of my friend is likely to be my friend too, so that $W^2 \approx W$), identification is also weak. In practice, using $WX, W^2X, W^3X, \dots$ as instruments can lead to near-perfect collinearity, which implies weak identification (Gibbons and Overman (2012)). A near violation of the full-rank condition of Proposition 2 is thus a potential source of weak identification, because it produces near-perfect collinearity in the first-stage regression for the endogenous network effect. The use of regularization methods, such as ridge regression, has been shown to address these problems.

Liu and Lee (2010) also consider the estimation of a social interaction network model. As in Bramoullé, Djebbari, and Fortin (2009), they exploit the structure of the network to identify the network effect. In addition to
$WX, W^2X, W^3X, \dots$, the Bonacich centrality across nodes in a network is used as an instrumental variable to identify network effects and improve estimation efficiency. (The extreme case of a fully connected graph has exactly two distinct eigenvalues; an application of Proposition 1 then implies that the network effects are not identified.) The use of the Bonacich centrality measure usually leads to the use of many instruments. The 2SLS estimates obtained with these instruments are biased because of the large number of instruments used. Liu and Lee (2010) propose a bias-corrected 2SLS method to account for this.

In this paper, I use regularization techniques. These high-dimensional estimation techniques enable the use of all instruments and deliver efficiency with better finite-sample properties (see Carrasco (2012) and Carrasco and Tchuente (2015)). In this case, asymptotic efficiency can be obtained by using many (or all potential) instruments. I use both the Bonacich centrality measure and $WX, W^2X, W^3X, \dots$ as instrumental variables and apply a high-dimensional technique to mitigate the near-perfect collinearity resulting from the network structure and/or the many-instruments bias.
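The near-collinearity of spatial-lag instruments can be diagnosed directly from the data. The following sketch (illustrative networks and a simulated $X$; the variable names and designs are my assumptions, not the paper's) compares the condition number of $[WX, W^2X, W^3X]$ for a fully connected group against a sparse ring network:

```python
import numpy as np

# Diagnostic sketch: for a dense, fully connected group the
# instruments WX, W^2 X, W^3 X are (nearly) collinear, which the
# condition number of the stacked instrument matrix reveals.
rng = np.random.default_rng(0)
n = 30

# Fully connected group: row-normalized W with zero diagonal.
W = (np.ones((n, n)) - np.eye(n)) / (n - 1)
X = rng.standard_normal((n, 1))

Q = np.hstack([W @ X, W @ W @ X, W @ W @ W @ X])
cond_dense = np.linalg.cond(Q)

# Sparse ring network for comparison: each node has one neighbor.
W_ring = np.roll(np.eye(n), 1, axis=1)
Q_ring = np.hstack([W_ring @ X, W_ring @ W_ring @ X,
                    W_ring @ W_ring @ W_ring @ X])
cond_ring = np.linalg.cond(Q_ring)
```

For the fully connected group, every power of $W$ maps $X$ into the span of $X$ and the vector of group sums, so the instrument matrix is numerically rank-deficient and its condition number explodes; the sparse ring keeps the three columns well separated.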
The parameters of interest can be estimated using instrumental variables, with either a finite number of instruments or all potential instruments. As the number of instruments increases, estimation becomes asymptotically more efficient. However, a large number of instruments relative to the sample size creates the many-instruments problem (see, for example, Bekker (1994), Donald and Newey (2001) and Han and Phillips (2006)). The parameter of interest can also be weakly identified when a fixed number of instruments is used but the structure of the interaction does not provide sufficient exogenous variation. In such cases, using a fixed number of instrumental variables will not avoid the bias problem in the estimation. The 2SLS estimator with a fixed number of instrumental variables will be consistent and asymptotically normal, but may be less efficient than an estimator using many instruments. In order to use all potential instruments ($Q$), I use regularization tools. In addition to addressing the many-instruments bias as in Carrasco (2012), my objective is to use regularization to address the weak identification problem.

Let $\varepsilon(\rho,\delta) = JR(Y - Z\delta)$, with $\delta = (\lambda, \beta')'$ and $Z = (WY, X)$. The estimation is based on moments corresponding to the orthogonality condition between $Q$ and $J\varepsilon$:
$$E(Q'\varepsilon(\rho_0,\delta_0)) = 0 \quad (9)$$
The set of instrumental variables is $Q = J[Q_1, MQ_1]$ with $Q_1 = [WX, W^2X, \dots, W\iota, W^2\iota, \dots, X]$; the instruments can be normalized or standardized. My identification results are conditional on $\rho$: I first need a preliminary estimator $\tilde\rho$ of $\rho$, and I take $\tilde R = I - \tilde\rho M$ as an estimator of $R$.

The regularized estimators used in this paper require the definition of some mathematical objects; my notation follows the existing literature on regularization methods. The set of all potential instrumental variables ($Q$) is a countably infinite set, $\pi$ is a positive measure on $\mathbb N$, and $l^2(\pi)$ is the Hilbert space of real sequences that are square-summable with respect to $\pi$.
I define the covariance operator $K$ of the instruments as
$$K : l^2(\pi) \to l^2(\pi), \qquad (Kg)_j = \sum_{k\in\mathbb N} E(Q_{ji}Q_{ki})\, g_k \pi_k,$$
where $Q_{ji}$ is the element of $Q$ in column $j$ and row $i$. Under the assumption that $|Q_{ji}Q_{ki}|$ is uniformly bounded for all $j$, $k$ and $i$, $K$ is a compact operator (see Carrasco, Florens, and Renault (2007) for a definition). Indeed, under Assumption 2, $K$ is a Hilbert-Schmidt operator; I assume that it has non-zero eigenvalues. Let $\nu_j$, $j = 1, 2, \dots$ be the eigenvalues of $K$ (in decreasing order) and $\phi_j$, $j = 1, 2, \dots$ the corresponding orthogonal eigenvectors. $K$ can be estimated by $K_n$, defined as
$$K_n : l^2(\pi) \to l^2(\pi), \qquad (K_n g)_j = \sum_{k\in\mathbb N} \frac{1}{n}\sum_{i=1}^n Q_{ji}Q_{ki}\, g_k \pi_k.$$
In the SAR model, the number of potential moment conditions can be infinite, as in equation (9). Therefore, the inverse of $K_n$ needs to be regularized because it is nearly singular. By definition (see Kress (1999), p. 269), a regularized inverse of an operator $K$ is an operator $R^\alpha : l^2(\pi) \to l^2(\pi)$ such that $\lim_{\alpha\to 0} R^\alpha K\varphi = \varphi$ for all $\varphi \in l^2(\pi)$. (For a detailed discussion of the role and choice of $\pi$, see Carrasco (2012) and Carrasco and Florens (2014), where $\pi$ is a measure on $\mathbb R$; in my model, $\pi$ can, for example, be $\pi_k = \lambda^k / \sum_{k\in\mathbb N}\lambda^k$ with $k \in \mathbb N$. I assume that the elements of $X$ are uniformly bounded.) I consider three regularization schemes, Tikhonov (T), Landweber-Fridman (LF) and principal component (PC), defined as follows:

• Tikhonov (T)
Tikhonov regularization is also known as ridge regularization:
$$(K^\alpha)^{-1} r = (K^2 + \alpha I)^{-1} K r \quad\text{or}\quad (K^\alpha)^{-1} r = \sum_{j=1}^{\infty} \frac{\nu_j}{\nu_j^2 + \alpha}\langle r, \phi_j\rangle \phi_j,$$
where $\alpha > 0$ and $I$ is the identity operator.

• Landweber-Fridman (LF)
Let $0 < c < 1/\|K\|^2$, where $\|K\|$ is the largest eigenvalue of $K$ (which can be estimated by the largest eigenvalue of $K_n$). Then,
$$(K^\alpha)^{-1} r = \sum_{j=1}^{\infty} \frac{1 - (1 - c\nu_j^2)^{1/\alpha}}{\nu_j}\langle r, \phi_j\rangle \phi_j,$$
where $1/\alpha$ is some positive integer.

• Principal component (PC)
This method consists of using the first $1/\alpha$ eigenfunctions:
$$(K^\alpha)^{-1} r = \sum_{j=1}^{1/\alpha} \frac{1}{\nu_j}\langle r, \phi_j\rangle \phi_j,$$
where $1/\alpha$ is some positive integer. Using PC in the first stage is equivalent to projecting on the first principal components of the set of instrumental variables.

In the case of a finite number $m$ of moments, $P_m = Q_m(Q_m'Q_m)^{-1}Q_m'$ is the projection matrix on the space of instruments. The matrix $Q_m'Q_m$ may become nearly singular when $m$ gets large; moreover, when $m > n$, $Q_m'Q_m$ is singular. To address these cases, I consider a regularized version of the inverse of $Q_m'Q_m$. Let $\psi_j$ denote the eigenvectors of the $n \times n$ matrix $Q_mQ_m'/n$ associated with the eigenvalues $\nu_j$. For any vector $e$, the regularized version $P_m^\alpha$ of $P_m$ is
$$P_m^\alpha e = \sum_{j=1}^{n} q(\alpha, \nu_j)\langle e, \psi_j\rangle \psi_j,$$
where $\langle \cdot,\cdot \rangle$ represents the scalar product in $l^2(\pi)$ or in $\mathbb R^n$ (depending on the context). Here, for T, $q(\alpha,\nu_j) = \frac{\nu_j^2}{\nu_j^2+\alpha}$; for LF, $q(\alpha,\nu_j) = 1-(1-c\nu_j^2)^{1/\alpha}$; and for PC, $q(\alpha,\nu_j) = I(j \le 1/\alpha)$.

The network models suggest the use of an infinite number of instruments, which is why instrument selection methods are not used here. Following Carrasco and Florens (2000), I define the counterpart of $P^\alpha$ for an infinite number of instruments as $P^\alpha = G(K_n^\alpha)^{-1}G^*$, where $G : l^2(\pi) \to \mathbb R^n$ with $Gg = (\langle Q_1, g\rangle, \langle Q_2, g\rangle, \dots, \langle Q_n, g\rangle)'$ and $G^* : \mathbb R^n \to l^2(\pi)$ with $G^*v = \frac{1}{n}\sum_{i=1}^n Q_i v_i$, such that $K_n = G^*G$ and $GG^*$ is an $n\times n$ matrix with typical element $\langle Q_i, Q_j\rangle / n$. Let $\phi_j$, with $\nu_1 \ge \nu_2 \ge \dots > 0$, $j = 1, 2, \dots$, be the orthonormalized eigenvectors and eigenvalues of $K_n$, and let $\psi_j$ be the eigenfunctions of $GG^*$; then $G\phi_j = \sqrt{\nu_j}\psi_j$ and $G^*\psi_j = \sqrt{\nu_j}\phi_j$. Note that in this case, for $e \in \mathbb R^n$,
$$P^\alpha e = \sum_{j=1}^{\infty} q(\alpha, \nu_j)\langle e, \psi_j\rangle \psi_j.$$
We can also note that
$$v'P^\alpha w = v'G(K_n^\alpha)^{-1}G^*w = \left\langle (K_n^\alpha)^{-1/2}\frac{1}{n}\sum_{i=1}^n Q_i(\cdot)v_i,\ (K_n^\alpha)^{-1/2}\frac{1}{n}\sum_{i=1}^n Q_i(\cdot)w_i\right\rangle. \quad (10)$$

Our objective is to estimate the parameters of the model. I consider
$$S_n(k) = \frac{1}{n}\sum_{i=1}^n (\check Y_i - \check Z_i\delta)Q_{ik}$$
with $\check Y = \tilde RY$ and $\check Z = \tilde RZ$, and I denote by $(K_n^\alpha)^{-1}$ the regularized inverse of $K_n$ and $(K_n^\alpha)^{-1/2} = ((K_n^\alpha)^{-1})^{1/2}$. The regularized 2SLS estimator of $\delta$ is defined as
$$\hat\delta_{R2sls} = \arg\min_\delta \langle (K_n^\alpha)^{-1/2}S_n(\cdot),\ (K_n^\alpha)^{-1/2}S_n(\cdot)\rangle. \quad (11)$$
Solving the minimization problem, we have
$$\hat\delta_{R2sls} = (Z'\tilde R'P^\alpha \tilde RZ)^{-1}Z'\tilde R'P^\alpha \tilde RY. \quad (12)$$
Equation (12) defines the regularized 2SLS estimator. The regularized 2SLS for SAR is closely related to the regularized 2SLS of Carrasco (2012) and the 2SLS of Liu and Lee (2010). It extends Carrasco (2012) by considering SAR models, and it differs from Liu and Lee (2010) in that the projection matrix $P$ is replaced by its regularized counterpart $P^\alpha$.

The 2SLS estimators proposed in this paper are for cases with spatial serial correlation and homoscedastic errors. Extending the regularization approach to deal with heteroscedasticity is left for future research. Indeed, in a companion paper, I propose regularized GMM estimators allowing the joint estimation of all parameters of the model, with the variance-covariance estimator obtained using an approach similar to Newey and West (1987). The following proposition shows the consistency and asymptotic normality of the regularized 2SLS estimators. The following extra assumptions are needed.
Assumption 3. $H = \lim_{n\to\infty}\frac{1}{n}f'f$ is a finite nonsingular matrix.

Assumption 4. (i) The elements of $X$ are uniformly bounded, $X$ has full rank $k$, $E(\varepsilon|X) = 0$, and $\lim_{n\to\infty}\frac{1}{n}X'X$ exists and is nonsingular. (ii) There is an $\omega \ge 1/2$ such that
$$\sum_{j=1}^{\infty}\frac{\langle E(Z(\cdot,x_i)f_a(x_i)), \phi_j\rangle^2}{\nu_j^{\omega+1}} < \infty.$$

Assumption 4 (ii) ensures that regularization allows us to obtain a good asymptotic approximation of the best instrument, $f$.

Proposition 4
Suppose Assumptions 1-4 hold, $\tilde\rho - \rho = O_p(1/\sqrt n)$ and $\alpha \to 0$. Then the T, LF and PC estimators satisfy:

1. Consistency: $\hat\delta_{R2sls} \to \delta$ in probability as $n$ and $\alpha\sqrt n$ go to infinity.
2. Asymptotic normality: $\sqrt n(\hat\delta_{R2sls} - \delta) \stackrel{d}{\to} N(0, \sigma_\varepsilon^2 H^{-1})$ as $n$ and $\alpha\sqrt n$ go to infinity.

The convergence rates of the regularized 2SLS estimators for SAR differ from those obtained without spatial correlation. For consistency in the SAR model, $\alpha\sqrt n$ must go to infinity, whereas the Carrasco (2012) regularized 2SLS estimator is consistent under the condition that $n\alpha$ goes to infinity. Asymptotic normality also requires $\alpha\sqrt n \to \infty$, which again differs from the Carrasco (2012) condition for 2SLS. The regularization parameter $\alpha$ is thus allowed to go to zero more slowly than in Carrasco (2012): compared with that setting, more regularization is needed to achieve the appropriate asymptotic behavior. The strengthening of these conditions is due to the regularization having to take the spatial structure of the data into account. If the regularization parameter were held constant, the asymptotic variance would be larger; since regularization should not be needed asymptotically, it is reasonable to require $\alpha \to 0$ while $\alpha\sqrt n$ goes to infinity.

The bias of the 2SLS estimator in Liu and Lee (2010) takes the form
$$\sqrt n\, b_{2sls} = \sigma_\varepsilon^2\, tr(P^\alpha RWS^{-1}R^{-1})\,(Z'R'P^\alpha RZ)^{-1}e_1.$$
Using Lemmas 1 and 2 in the Appendix, I show that this bias is of order $O_p(\frac{1}{\alpha\sqrt n})$, which goes to zero as $\alpha\sqrt n$ goes to infinity. The ability to choose the regularization parameter means that we can control the size of $\alpha\sqrt n$; selecting the appropriate regularization parameter is therefore crucial.

The regularization methods presented involve the use of eigenvalues and eigenvectors. The eigenvalues obtained can vary greatly because of differences in the variances of the instrumental variables in the model; for example, $WX$ and $W\iota$ could have different variances.
To account for this difference, I use normalized instruments in the Monte Carlo simulation. One can also standardize the instruments, so that the regularization methods account for differences in both the location and the scale of the instruments. In addition, the regularized estimators presented in this section depend on the regularization parameter, $\alpha$. The choice of this parameter is very important for the estimators' behavior in small samples. In Section 4, I discuss the selection of the regularization parameter.
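As a concrete illustration, equation (12) and the three weighting schemes can be implemented in a few lines. The sketch below is a simplified version on simulated placeholder data (no spatial error, so the transformation $\tilde R$ is the identity); the function names and the toy design are my assumptions, not the paper's code:

```python
import numpy as np

# Sketch of the regularized projection P^alpha and the regularized
# 2SLS of equation (12), via the spectral weights q(alpha, nu):
# Tikhonov: nu^2/(nu^2 + alpha); LF: 1 - (1 - c*nu^2)^(1/alpha);
# PC: keep the first 1/alpha components.
rng = np.random.default_rng(1)

def q_weights(nu, alpha, scheme="T", c=None):
    if scheme == "T":
        return nu**2 / (nu**2 + alpha)
    if scheme == "LF":
        if c is None:
            c = 0.9 / nu.max()**2          # requires 0 < c < 1/||K||^2
        return 1.0 - (1.0 - c * nu**2) ** int(round(1 / alpha))
    if scheme == "PC":
        return (np.arange(len(nu)) < int(round(1 / alpha))).astype(float)
    raise ValueError(scheme)

def regularized_2sls(Y, Z, Q, alpha, scheme="T"):
    # Eigendecomposition of QQ'/n gives the psi_j and nu_j.
    nu, psi = np.linalg.eigh(Q @ Q.T / len(Y))
    order = np.argsort(nu)[::-1]
    nu, psi = nu[order], psi[:, order]
    P = psi @ np.diag(q_weights(nu, alpha, scheme)) @ psi.T
    return np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ Y)

# Toy linear model with instruments, just to exercise the code.
n = 200
Qmat = rng.standard_normal((n, 10))
Z = Qmat[:, :2] @ np.array([[1.0, 0.2], [0.3, 1.0]]) \
    + 0.1 * rng.standard_normal((n, 2))
delta = np.array([0.5, -0.25])
Y = Z @ delta + 0.1 * rng.standard_normal(n)

est = regularized_2sls(Y, Z, Qmat, alpha=0.05, scheme="T")
```

The only difference between T-2SLS, LF-2SLS and PC-2SLS is the weight function $q(\alpha,\nu_j)$ applied to the spectrum of $QQ'/n$.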
This section discusses the selection of the regularization parameter for network models. I first derive an approximation of the mean-squared error (MSE) using a Nagar-type expansion. I then estimate the dominant term of the MSE and select the regularization parameter that minimizes this term.
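Before stating the formal result, the logic of the selection rule can be previewed with a toy trade-off curve: the dominant MSE term behaves like $\alpha^\omega$ plus $1/(n\alpha^2)$ (the rates given in Proposition 5 below), so the criterion has an interior minimum in $\alpha$. The constants in the sketch are illustrative assumptions, not estimates from the paper:

```python
import numpy as np

# Toy preview of the MSE trade-off behind the selection rule: the
# dominant term behaves like  c1 * alpha**omega + c2 / (n * alpha**2).
# The constants c1, c2 and omega are illustrative assumptions.
c1, c2, omega, n = 1.0, 1.0, 1.0, 200

def s_alpha(alpha):
    return c1 * alpha**omega + c2 / (n * alpha**2)

grid = np.logspace(-3, 0, 200)
vals = np.array([s_alpha(a) for a in grid])
alpha_star = grid[vals.argmin()]          # interior minimizer of the trade-off
```

The minimizer sits strictly inside the grid: too little regularization inflates the $1/(n\alpha^2)$ term, too much inflates the $\alpha^\omega$ term.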
The following proposition provides an approximation of the MSE:
Proposition 5
If Assumptions 1 to 4 hold, $\tilde\rho - \rho = O_p(1/\sqrt n)$ and $n\alpha \to \infty$, then for the LF-, PC- and T-regularized 2SLS estimators,
$$n(\hat\delta_{R2sls} - \delta)(\hat\delta_{R2sls} - \delta)' = Q(\alpha) + \hat R(\alpha), \qquad E(Q(\alpha)|X) = \sigma_\varepsilon^2 H^{-1} + S(\alpha), \quad (13)$$
and $r(\alpha)/tr(S(\alpha)) = o_p(1)$, with $r(\alpha) = E(\hat R(\alpha)|X)$ and
$$S(\alpha) = \sigma_\varepsilon^2 H^{-1}\frac{f'(I-P^\alpha)f}{n}H^{-1} + \frac{\sigma_\varepsilon^2}{n}\Big(\sum_j q_j\Big)^2 H^{-1}e_1\iota'D'D\iota e_1'H^{-1}.$$
For LF and PC, $S(\alpha) = O_p\big(\frac{1}{n\alpha^2} + \alpha^\omega\big)$, and for T, $S(\alpha) = O_p\big(\frac{1}{n\alpha^2} + \alpha^{\min(\omega,2)}\big)$, with $D = JRWS^{-1}R^{-1}$ and $e_1$ the first unit (column) vector.

For the selection of $\alpha$, the relevant dominant term $S(\alpha)$ is minimized to achieve the smallest MSE. $S(\alpha)$ accounts for a trade-off between bias and variance: when $\alpha$ goes to zero, the bias term increases while the variance term decreases. The approximation of the regularized 2SLS estimator is similar to that of the Carrasco-regularized 2SLS; however, the expression for the MSE is more complicated because of the spatial correlation.

4.2 Estimation of the MSE

The aim of this subsection is to find the regularization parameter that minimizes the conditional MSE of $\bar\gamma'\hat\delta_{2sls}$ for some arbitrary $(k+1)\times 1$ vector $\bar\gamma$. This conditional MSE is
$$MSE = E[\bar\gamma'(\hat\delta_{2sls}-\delta)(\hat\delta_{2sls}-\delta)'\bar\gamma\,|\,X] \sim \bar\gamma'S(\alpha)\bar\gamma \equiv S_{\bar\gamma}(\alpha).$$
$S_{\bar\gamma}(\alpha)$ involves the function $f$, which is unknown, so $S_{\bar\gamma}$ must be replaced with an estimate. Stacking the observations, the reduced-form equation can be rewritten as $RZ = f + v$. This expression involves $n\times(k+1)$ matrices. We can reduce the dimension by post-multiplying by $H^{-1}\bar\gamma$:
$$RZH^{-1}\bar\gamma = fH^{-1}\bar\gamma + vH^{-1}\bar\gamma \iff RZ_{\bar\gamma} = f_{\bar\gamma} + v_{\bar\gamma} \quad (14)$$
where $v_{\bar\gamma i} = v_i'H^{-1}\bar\gamma$ is a scalar. I use $\tilde\delta$ to denote a preliminary estimator of $\delta$, obtained from a finite number of instruments.
I use $\tilde\rho$ to denote a preliminary estimator of $\rho$, obtained by the method of moments as follows:
$$\tilde\rho = \arg\min_\rho \tilde g(\rho)'\tilde g(\rho)$$
where $\tilde g(\rho) = [M_1\tilde\varepsilon(\rho), M_2\tilde\varepsilon(\rho), M_3\tilde\varepsilon(\rho)]'\tilde\varepsilon(\rho)$, with
$$M_1 = JWJ - tr(JWJ)I/tr(J), \quad M_2 = JMJ - tr(JMJ)I/tr(J), \quad M_3 = JMWJ - tr(JMWJ)I/tr(J),$$
$\tilde\varepsilon(\rho) = JR(\rho)(Y - Z'\tilde\delta)$ and $\tilde\delta = [Z'Q_0(Q_0'Q_0)^{-1}Q_0'Z]^{-1}Z'Q_0(Q_0'Q_0)^{-1}Q_0'Y$, where $Q_0$ is a single instrument. The residual is $\hat\varepsilon(\rho) = JR(\tilde\rho)(Y - Z'\tilde\delta)$.

Let $\hat\sigma_\varepsilon^2 = \hat\varepsilon(\rho)'\hat\varepsilon(\rho)/n$ and $\hat v_{\bar\gamma} = (I - P^{\tilde\alpha})R(\tilde\rho)Z\tilde H^{-1}\bar\gamma$, where $\tilde H$ is a consistent estimate of $H$ and $\tilde\alpha$ is a preliminary value of $\alpha$, and let $\hat\sigma_{v_{\bar\gamma}}^2 = \hat v_{\bar\gamma}'\hat v_{\bar\gamma}/n$. I consider the following goodness-of-fit criteria:

Mallows $C_p$ (Mallows (1973)):
$$\hat\varpi_m(\alpha) = \frac{\hat v_{\bar\gamma}'\hat v_{\bar\gamma}}{n} + 2\hat\sigma_{v_{\bar\gamma}}^2\frac{tr(P^\alpha)}{n}.$$

Generalized cross-validation (Craven and Wahba (1979)):
$$\hat\varpi_{cv}(\alpha) = \frac{\frac{1}{n}\hat v_{\bar\gamma}'\hat v_{\bar\gamma}}{\big(1 - \frac{tr(P^\alpha)}{n}\big)^2}.$$

Leave-one-out cross-validation (Stone (1974)):
$$\hat\varpi_{lcv}(\alpha) = \frac{1}{n}\sum_{i=1}^n \big(\tilde RZ_{\bar\gamma i} - \hat f^\alpha_{\bar\gamma,-i}\big)^2,$$
where $\tilde RZ_{\bar\gamma} = \tilde RZ\tilde H^{-1}\bar\gamma$, $\tilde RZ_{\bar\gamma i}$ is the $i$th element of $\tilde RZ_{\bar\gamma}$ and $\hat f^\alpha_{\bar\gamma,-i} = P^\alpha_{-i}\tilde RZ_{\bar\gamma,-i}$. The matrix $P^\alpha_{-i} = G(K^\alpha_{n-1})^{-1}G^*_{-i}$ is obtained by suppressing the $i$th observation from the sample, and $\tilde RZ_{\bar\gamma,-i}$ is the $(n-1)\times 1$ vector obtained by suppressing the $i$th observation of $\tilde RZ_{\bar\gamma}$.

Using (13), $S_{\bar\gamma}(\alpha)$ can be rewritten as
$$S_{\bar\gamma}(\alpha) = \sigma_\varepsilon^2\frac{f_{\bar\gamma}'(I-P^\alpha)f_{\bar\gamma}}{n} + \frac{\sigma_\varepsilon^2}{n}\Big(\sum_j q_j\Big)^2 e_{1\gamma}\iota'D'D\iota e_{1\gamma}'.$$
Using Li (1986)'s results on $C_p$ and cross-validation procedures, note that $\hat\varpi(\alpha)$ approximates
$$\varpi(\alpha) = \frac{f_{\bar\gamma}'(I-P^\alpha)f_{\bar\gamma}}{n} + \sigma_{v_{\bar\gamma}}^2\frac{tr((P^\alpha)^2)}{n}.$$
Therefore, $S_{\bar\gamma}(\alpha)$ is estimated using the following equation:
$$\hat S_{\bar\gamma}(\alpha) = \hat\sigma_\varepsilon^2\left[\hat\varpi(\alpha) - \hat\sigma_{v_{\bar\gamma}}^2\frac{tr((P^\alpha)^2)}{n}\right] + \frac{\hat\sigma_\varepsilon^2}{n}(tr(P^\alpha))^2 e_{1\gamma}\iota'\tilde D'\tilde D\iota e_{1\gamma}',$$
where $\tilde D$ is a consistent estimator of $D$.
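A minimal sketch of this criterion-based selection (the Mallows $C_p$ form with Tikhonov weights, on simulated placeholder data; the toy design is an assumption, not the paper's):

```python
import numpy as np

# Sketch of the data-driven choice of alpha: evaluate a Mallows-type
# criterion  v'v/n + 2*sigma2*tr(P^alpha)/n  on a grid and keep the
# minimizer. Tikhonov weights; simulated placeholder data.
rng = np.random.default_rng(2)
n = 150
Q = rng.standard_normal((n, 25))
f = Q[:, 0] + 0.5 * Q[:, 1]              # "best instrument" signal
RZ_gamma = f + rng.standard_normal(n)    # observed first-stage target

nu, psi = np.linalg.eigh(Q @ Q.T / n)

def mallows_cp(alpha, sigma2=1.0):
    q = nu**2 / (nu**2 + alpha)                 # Tikhonov weights
    fitted = psi @ (q * (psi.T @ RZ_gamma))     # P^alpha applied to target
    resid = RZ_gamma - fitted
    return resid @ resid / n + 2.0 * sigma2 * q.sum() / n

grid = [10.0 ** k for k in range(-6, 3)]
alpha_hat = min(grid, key=mallows_cp)
```

The GCV and leave-one-out criteria plug into the same grid search by swapping the objective function.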
The optimal regularization parameter is obtained by minimizing $\hat S_{\bar\gamma}(\alpha)$ with respect to $\alpha$. My selection procedure is very similar to that of Carrasco (2012), and its optimality can be established using the results of Li (1986) and Li (1987).

The regularized 2SLS procedure and the selection of the regularization parameter are based on a preliminary estimator of $\rho$. This means that if $\rho$ is not correctly estimated, the estimation of $\delta$ could be biased in an unpredictable direction. Also, the use of a cross-validation-type method to choose the regularization parameter usually influences the quality of inference; this is similar to the inference problem in non-parametric estimation (see Newey, Hsieh, and Robins (1998) and Guerre and Lavergne (2005)). This paper focuses on point estimation of the parameters; post-regularization inference is left for future research.

Monte Carlo Simulations

To investigate the finite-sample performance of the regularized 2SLS estimators, I conduct a simulation study based on the following model:
$$Y = \lambda WY + X\beta_1 + WX\beta_2 + \iota\alpha + u, \qquad u = \rho Mu + \varepsilon.$$
I generate four samples with different numbers of groups ($\bar r$) and group sizes ($m_r$). The first sample contains 30 groups, each with 10 individuals. The second sample contains 60 groups, also with 10 individuals each. To study the effect of group size, I also consider 30 and 60 groups of 15 individuals.

For each group, the sociomatrix $W_r$ is generated as follows. First, for the $i$th row of $W_r$ ($i = 1,\dots,m_r$), $k_{ri}$ is generated uniformly at random from the set of integers [0, 1, 2, 3], [0, 1, ..., 6] or [0, 1, ..., 8]. Allowing for differences in the maximum number of friends helps us study the effect of the density of the network on the estimators. The sociomatrix $W_r$ is then constructed as follows.
First, set the $(i+1)$th, ..., $(i+k_{ri})$th elements of the $i$th row of $W_r$ to 1 and the rest of the elements in that row to 0, if $i + k_{ri} \le m_r$; otherwise, the entries of 1 are wrapped around. If $k_{ri} = 0$, the $i$th row of $W_r$ contains only zeros. $M$ is the row-normalized $W$. $X \sim N(0, I)$, $\alpha_r \sim N(0, \sigma_\alpha^2)$ and $\varepsilon_{r,i} \sim N(0, \sigma_\varepsilon^2)$, with $\beta_1 = \beta_2$ and $\lambda = \rho$ set to fixed values.

The estimation methods considered are:
• 2SLS with $Q_1 = J[X, WX, MX, MWX]$,
• 2SLS with $Q_2 = [Q_1, JW\iota]$, and
• the regularized 2SLS estimators T-2SLS (Tikhonov), LF-2SLS (Landweber-Fridman) and PC-2SLS (principal component), with many instruments $\tilde Q$, where $\tilde Q$ is a matrix of instruments with $Q$'s instruments normalized to unit variance. (As noted by Newey (2013), the choice of the identity matrix for the Tikhonov regularization method does not account for any difference in the location and scale of the instruments.)

For all 2SLS estimators, a preliminary estimator of $\rho$ is obtained by the method of moments,
$$\tilde\rho = \arg\min_\rho \tilde g(\rho)'\tilde g(\rho)$$
where $\tilde g(\rho) = [M_1\tilde\varepsilon(\rho), M_2\tilde\varepsilon(\rho), M_3\tilde\varepsilon(\rho)]'\tilde\varepsilon(\rho)$, $M_1 = JWJ - tr(JWJ)I/tr(J)$, $M_2 = JMJ - tr(JMJ)I/tr(J)$, $M_3 = JMWJ - tr(JMWJ)I/tr(J)$, $\tilde\varepsilon(\rho) = JR(\rho)(Y - Z'\tilde\delta)$ and $\tilde\delta = [Z'Q_0(Q_0'Q_0)^{-1}Q_0'Z]^{-1}Z'Q_0(Q_0'Q_0)^{-1}Q_0'Y$.

The selection of the regularization parameter follows the procedure proposed in Section 4: I minimize the estimated approximate MSE. Before presenting the results of the simulations, it is important to note that the data-generating process in this experiment exhibits a very low transitivity level (there are many non-connected individuals in all groups). Moreover, the reduced-form model is sparse (for example, when the maximum number of friends is 3, $W^q = 0$ for sufficiently large $q$).

The simulation results can be summarized as follows.

1. The additional linear moment conditions reduce the standard deviations of the 2SLS estimators of $\lambda$ and $\beta$.
The 2SLS estimators in the model with a large number of instruments have smaller standard deviations than the 2SLS estimators in the model with a finite number of instruments.

2. The additional instruments in $Q_2$ introduce bias into the 2SLS estimators of $\lambda$ and $\beta$. The 2SLS estimators from the model with a finite number of instruments have mean values closer to the true parameter values than the 2SLS estimators from the model with a large number of instruments.

3. The regularized 2SLS procedures substantially reduce the many-instruments bias of the 2SLS estimators, particularly in large samples. The bias-corrected estimators are similar to the regularized estimators in terms of bias correction for large samples from a denser network, but in small samples the bias of the bias-corrected estimator is smaller than that of the regularized estimators. Relative to the 2SLS estimators from the model with many instruments, the regularized 2SLS estimators reduce the bias and have comparable standard deviations.

4. The performance of the regularized estimators improves with the density of the network and the number of groups. The behavior of the regularized estimators with respect to network density suggests that they are good candidates for improving the asymptotic behavior of the estimator of the network effect when the level of transitivity in the groups is very high.

The large theoretical literature on local government tax competition can be divided into two groups: efficient local taxation (Tiebout (1956)) and tax competition models departing from Tiebout's model (Lyytikäinen (2012)). The departure from Tiebout's model leads to several types of fiscal consequences: benefit spillovers, distorting taxes on a mobile tax base, and political economy considerations and information asymmetries (Lyytikäinen (2012)).
While the causes of local government tax interaction are certainly present in most legislations, the empirical literature has long been divided on how to identify a causal local tax competition (interaction) effect. The identification problem here is a special case of Manski's reflection problem. For municipalities under the same legislation, the network matrix can be represented by the spatial matrix of neighbors. This neighborhood structure of the municipalities can be considered exogenous with respect to the tax level. I propose a model to test the hypothesis of tax competition between municipalities:
$$T_{itr} = \lambda W_r T_{itr} + \beta_1 X_{itr} + \beta_2 W_r X_{itr} + \alpha_r + \varepsilon_{itr}$$
The identification and estimation of the tax competition parameter ($\lambda$) is achieved, in a large part of the empirical literature, via two strategies. The first strategy uses spatial lags (friends of friends' characteristics) as instruments in an instrumental variables approach, while the second uses maximum likelihood estimation, where identification is achieved via model specification. As pointed out by Gibbons and Overman (2012), the causal interpretation of the parameters obtained in these cases is not easy to defend: the validity of the exclusion restriction is not obvious, and the correct specification of the model is not fully testable. As an alternative, Gibbons and Overman (2012) propose using differencing coupled with instrumental variables derived from exogenous policy variations.

Lyytikäinen (2012) estimates a tax competition parameter among Finnish local governments. He uses changes in statutory lower limits on property tax rates as a source of exogenous variation to estimate the tax competition parameter ($\lambda$) in a first-difference model:
$$T_{i1} - T_{i0} = \lambda\sum_{j\ne i} w_{ij}(T_{j1} - T_{j0}) + \beta_1(X_{i1} - X_{i0}) + \beta_2\sum_{j\ne i} w_{ij}(X_{j1} - X_{j0}) + v_i,$$
where $w_{ij} = 1/n_i$, with $n_i$ the number of neighbors of municipality $i$.

The second column of Table 1 replicates the estimate using the instrument from Lyytikäinen (2012), who assumes that $\beta_2 = 0$ and uses only one excluded instrument. (The instrument used in Table 3 of Lyytikäinen (2012) is one of the instruments used, together with the spatial lags of the other exogenous variables; I have augmented this model to account for an exogenous network model.) The other estimations are carried out using spatial lags of the second, third and fourth order, and the regularized estimators. The results in Table 1 suggest that the use of many instruments, obtained by adding more spatial lags, biases the results, and that the use of regularization seems to reduce this bias.

Table 1 (n = 411)

Estimators/IVs | Lyytikäinen (2012) | Spatial lags 2 | Spatial lags 2 and 3 | Spatial lags 2, 3 and 4
2SLS | 0.06 (0.07) | 0.26 (0.28) | -0.02 (0.22) | -0.03 (0.17)
T-2SLS | 0.01 (0.004) | 0.19 (0.30) | 0.185 (0.31) | 0.182 (0.26)
L-2SLS | 0.01 (0.0005) | 0.20 (0.22) | 0.186 (0.33) | 0.182 (0.31)
PC-2SLS | 0.0115 (0.42) | 0.26 (0.28) | -0.027 (0.22) | -0.039 (0.17)
Cond. number ($\nu_{max}/\nu_{min}$) | 2153.8 | 2001 | 17800 | 1.3983e+05

Standard errors are in parentheses. The change in general property taxation between 1999 and 2000 is the dependent variable. The independent variables are changes in neighboring municipalities' tax rates, the municipality's own imposed increase, non-zero own imposed increase, and changes in municipal attributes such as grants from the central government, disposable income per capita, the unemployment rate and age structure (see Table 3 of Lyytikäinen (2012) for more details). The last line of the table shows the condition numbers of the $QQ'$ matrices for the different instrument sets. The values are relatively large, suggesting a near-perfect collinearity problem in small samples.

The simulation results indicate that T-2SLS and L-2SLS are the best methods in terms of bias correction.
The point estimates obtained by these two estimation methods are very similar, which suggests a bias correction relative to the 2SLS. As the number of instruments increases, the standard errors decrease for the 2SLS as well as for the regularized 2SLS. However, the standard errors remain very large, so the tax competition effect is not statistically significantly different from zero.

This empirical example shows how the regularized estimators can be used to improve the estimation of network models. The size of the tax competition parameter appears to be larger than suggested by Lyytikäinen (2012). The estimates are not statistically different from zero. However, the regularized estimators (T-2SLS and L-2SLS) appear to be more stable as the number of instruments increases, which suggests that the weak identification problem may have been solved.
Conclusion

This paper uses regularization methods to estimate network models. It proposes easy-to-check identification conditions based on the number of distinct eigenvalues of the network adjacency matrix. Regularization is proposed as a solution to the weak identification problem in network models. Identification of the network effect can be achieved by using individuals' Bonacich (1987) centrality as instrumental variables; however, the number of instruments then increases with the number of groups, leading to the many-instruments problem. Identification can also be achieved using the friends of friends' exogenous characteristics; however, if the network is very dense or the group size is very large, identification is weakened.

The proposed regularized 2SLS estimators, based on three regularization methods, help address the weak identification and many-moments problems. These estimators are consistent and asymptotically normal, and they achieve the asymptotic efficiency bound. I derive an optimal data-driven selection method for the regularization parameter. An application to the estimation of tax competition among Finnish municipalities shows the empirical relevance of my methods. (Inference using the standard errors of the regularized estimators does not account for regularization and should be interpreted with caution, given the relatively small sample of n = 411 municipalities.)

A Monte Carlo experiment shows that the regularized estimators perform well. The regularized 2SLS procedures substantially reduce the bias of the 2SLS estimators, especially in large samples. Moreover, the regularized estimators become more precise and less biased as the network density and the number of groups increase. These results show that regularization is a valuable solution to the potential weak identification problem in the estimation of network models.

Appendix: Summary of notation
To simplify notation, I use the following:
$P = P^\alpha$ and $q_j = q(\nu_j, \alpha)$;
$tr(A)$ is the trace of matrix $A$;
$e_j$ is the $j$th unit (column) vector, $j = 1,\dots,n$;
$e_f = \frac{1}{n}f'(I-P)f$ and $e_{f2} = \frac{1}{n}f'(I-P)^2f$;
$\Delta_f = tr(e_f)$ and $\Delta_{f2} = tr(e_{f2})$.

B Appendix: Lemmas
Lemma 0 (Lemmas 4 and 5 of Carrasco (2012)):
(i) $tr(P) = \sum_j q_j = O(1/\alpha)$ and $tr(P^2) = \sum_j q_j^2 = o\big((\sum_j q_j)^2\big)$ (Lemma 4 (i) of Carrasco (2012));
(ii) $\Delta_f = O_p(\alpha^\omega)$ for LF and PC, and $O_p(\alpha^{\min(\omega,2)})$ for T, and $f'(I-P)\varepsilon/\sqrt n = O_p(\sqrt{\Delta_f})$ (Lemma 5 (i) and (ii) of Carrasco (2012));
(iii) $u'P\varepsilon = O_p(1/\alpha)$ (Lemma 5 (iii) of Carrasco (2012));
(iv) $E[u'P\varepsilon\varepsilon'Pu|X] = (\sum_j q_j)^2\sigma_{u\varepsilon}\sigma'_{u\varepsilon} + (\sum_j q_j^2)(\sigma_{u\varepsilon}\sigma'_{u\varepsilon} + \sigma_\varepsilon^2\Sigma_u)$ (Lemma 5 (iv) of Carrasco (2012));
(v) $E[f'(I-P)\varepsilon\varepsilon'Pu/n\,|\,X] = O_p(\Delta_f/\sqrt{\alpha n})$ (Lemma 5 (viii) of Carrasco (2012)).

Lemma 1:
(i) $tr(P) = \sum_j q_j = O(1/\alpha)$ and $tr(P^2) = \sum_j q_j^2 = o\big((\sum_j q_j)^2\big)$.
(ii) Suppose that $\{A\}$ is a sequence of $n\times n$ UB matrices. For $B = PA$, $tr(B) = o\big((\sum_j q_j)^2\big)$, $tr(B^2) = o\big((\sum_j q_j)^2\big)$, and $\sum_i B_{ii}^2 = o\big((\sum_j q_j)^2\big)$, where the $B_{ii}$ are the diagonal elements of $B$.

Proof of Lemma 1:
(i) The proof is in Carrasco (2012), Lemma 4 (i).
(ii) By eigenvalue decomposition, $AA' = \Pi\Delta\Pi'$, where $\Pi$ is an orthonormal matrix and $\Delta$ is the eigenvalue matrix. It follows that $PAA'P \le \nu_{max}P^2$, with $\nu_{max}$ the largest eigenvalue, and hence $tr(PAA'P) \le \nu_{max}tr(P^2) = o_p\big((\sum_j q_j)^2\big)$. By the Cauchy-Schwarz inequality, $tr(B) \le [tr(P^2)]^{1/2}[tr(PAA'P)]^{1/2} = o_p\big((\sum_j q_j)^2\big)$. Also by the Cauchy-Schwarz inequality, $tr(B^2) \le tr(BB') = tr(PAA'P) = o\big((\sum_j q_j)^2\big)$.

Lemma 2:
Let $C$ and $D$ be two UB $n\times n$ matrix sequences.
(i) $C'PD = O_p(n/\alpha)$.
(ii) $\varepsilon'C'PD\varepsilon = O_p(1/\alpha)$ and $C'PD\varepsilon = O_p(\sqrt n/\alpha)$.

Proof of Lemma 2:
(i) By the Cauchy-Schwarz inequality, $|e_i'C'P^\alpha De_j| \le \sqrt{e_i'C'Ce_i}\sqrt{e_j'D'PDe_j} = O(n/\alpha)$, which implies that $C'PD = O(n/\alpha)$.
(ii) $E|\varepsilon'C'PD\varepsilon| \le \sqrt{E(\varepsilon'C'PC\varepsilon)}\sqrt{E(\varepsilon'D'PD\varepsilon)} = \sigma^2\sqrt{tr(C'PC)}\sqrt{tr(D'PD)} = O(\frac{1}{\alpha})$. By the Markov inequality, $\varepsilon'C'PD\varepsilon = O_p(\frac{1}{\alpha})$. By the Cauchy-Schwarz inequality, $|e_j'C'PD\varepsilon| \le \sqrt{e_j'C'Ce_j}\sqrt{\varepsilon'D'PD\varepsilon} = O_p(\sqrt n/\alpha)$, thus $C'PD\varepsilon = O_p(\sqrt n/\alpha)$.

Lemma 3:
Suppose $\tilde\rho$ is a consistent estimator of $\rho$ and $\tilde R = R(\tilde\rho)$. Then
$$\frac{1}{n}Z'\tilde R'P\tilde RZ = \frac{1}{n}Z'R'PRZ + O_p[(\tilde\rho-\rho)/\alpha]$$
and
$$\frac{1}{n}Z'\tilde R'P\tilde RR^{-1}\varepsilon = \frac{1}{n}Z'R'P\varepsilon + O_p\Big[(\tilde\rho-\rho)\tfrac{1}{\alpha\sqrt n}\big(1+\tfrac{1}{\alpha\sqrt n}\big)\Big].$$

Proof of Lemma 3:
$\tilde R = R - (\tilde\rho-\rho)M$. Thus,
$$Z'\tilde R'P\tilde RZ/n = Z'R'PRZ/n - (\tilde\rho-\rho)Z'M'PRZ/n - (\tilde\rho-\rho)Z'R'PMZ/n + (\tilde\rho-\rho)^2Z'M'PMZ/n.$$
Let us show that $Z'R'PMZ/n = O_p(1/\alpha)$ and $Z'M'PMZ/n = O_p(1/\alpha)$. Note that $Z = [WS^{-1}(X\beta + \iota\gamma), X] + WS^{-1}R^{-1}\varepsilon e_1'$. Under Assumption 3, $Z'R'PMZ/n = O(1/\alpha) + O_p(1/\sqrt n\alpha) + O_p(1/n\alpha) = O_p(1/\alpha)$, and $Z'M'PMZ/n = O_p(1/\alpha)$ by Lemma 2 (i). Similarly,
$$Z'\tilde R'P\tilde R\varepsilon/n = Z'R'P\varepsilon/n - (\tilde\rho-\rho)Z'M'P\varepsilon/n - (\tilde\rho-\rho)Z'R'PMR^{-1}\varepsilon/n + (\tilde\rho-\rho)^2Z'M'PMR^{-1}\varepsilon/n.$$
Using the same argument as in the previous case, under Assumption 3, $Z'R'PMR^{-1}\varepsilon/n = O_p(1/\sqrt n\alpha + 1/n\alpha) = O_p[\frac{1}{\alpha\sqrt n}(1+\frac{1}{\alpha\sqrt n})]$, and $Z'M'P\varepsilon/n = O_p[\frac{1}{\alpha\sqrt n}(1+\frac{1}{\alpha\sqrt n})]$ and $Z'M'PMR^{-1}\varepsilon/n = O_p[\frac{1}{\alpha\sqrt n}(1+\frac{1}{\alpha\sqrt n})]$ by Lemma 2 (ii).

Lemma 4:
If Assumptions 1-4 are satisfied and α → 0, then
(i) Z′R′PRZ/n = H + o_p(1) if α√n → ∞, and
(ii) Z′R′Pε/√n = f′ε/√n + o_p(1) if α√n → ∞.

Proof of Lemma 4:
Let v = JRWS⁻¹R⁻¹ε, so that JRZ = f + ve′.
(i) (1/n)Z′R′PRZ = (1/n)f′f − (1/n)f′(I − P)f + (1/n)ev′Pve′ + (1/n)f′Pve′ + (1/n)ev′Pf.
Let e_f = (1/n)f′(I − P)f and ē_f = (1/n)f′(I − P)²f, with Δ_f = tr(e_f) and Δ̄_f = tr(ē_f). By the Cauchy-Schwarz inequality,
(1/n)|e_i′f′(I − P)f e_j| ≤ (1/n)√(e_i′f′f e_i) √(e_j′f′(I − P)²f e_j) = O(√Δ_f).
From Carrasco (2012), Lemma 5(i), Δ_f = O_p(α^ω) for LF and SC, and O_p(α^{min(ω,2)}) for T. Thus, Δ_f = o_p(1). By Lemma 2(ii), (1/n)ev′Pve′ + (1/n)f′Pve′ + (1/n)ev′Pf = O_p(1/(nα) + 1/(α√n)) = o_p(1).
(ii) Z′R′Pε/√n = f′ε/√n − f′(I − P)ε/√n + ev′Pε/√n. By Lemma 5(ii) of Carrasco (2012), f′(I − P)ε/√n = O_p(√Δ_f), and by Lemma 2(ii), ev′Pε/√n = O_p(1/(α√n)).

C Appendix: Proofs of Propositions
Proof of Proposition 1:
The Cayley-Hamilton theorem of linear algebra states that every square matrix satisfies its own characteristic polynomial. The adjacency matrix of the network is W, an n × n matrix. Since W is symmetric, it is diagonalizable, so if it has only two distinct eigenvalues its minimal polynomial has degree two. Thus, there exist a_0, a_1 and a_2, with a_2 ≠ 0, such that a_0 I_n + a_1 W + a_2 W² = 0. Hence I_n, W and W² are linearly dependent, and from Proposition 1 of Bramoullé, Djebbari, and Fortin (2009) the network effects are not identified.

Proof of Proposition 2:
Under Assumption 2 (i.e. that sup ||λW|| < 1), f can be approximated by a linear combination of (JWX, JW²X, ..., JW^{̺_w−1}X) and JX. Indeed, using the Cayley-Hamilton theorem and the fact that W has ̺_w distinct eigenvalues, for any natural number q ≥ ̺_w, W^q can be written as a linear combination of I_n, W, ..., W^{̺_w−1}. Thus, WS⁻¹ can be written as a linear combination of I_n, W, ..., W^{̺_w−1}. Therefore, f can be approximated by a linear combination of (JWX, JW²X, ..., JW^{̺_w−1}X) and JX.

Assume first that [WX, W²X, ..., W^{̺_w−1}X, X] has full column rank. Let Q = J[WX, W²X, ..., W^{̺_w−1}X, X] be the set of instrumental variables. The identification of the network effects is based on the moment conditions E(Q′ε(ρ_0, δ_0)) = 0 (i.e. Q′f(δ − δ_0) = 0). The parameters are point identified if the solution to this equation is unique. A necessary and sufficient condition is that Q and f have full column rank. [WX, W²X, ..., W^{̺_w−1}X, X] has full column rank if and only if Q has full column rank. Moreover, if [WX, W²X, ..., W^{̺_w−1}X, X] has full column rank, then f has rank 1 + k.

Assume now that [WX, W²X, ..., W^{̺_w−1}X, X] does not have full column rank. Consider
B = {b ∈ R^{k×̺_w} : Xb_0 + WXb_1 + ... + W^{̺_w−1}Xb_{̺_w−1} = 0}.
Observe that f = [JWS⁻¹(Xβ), JX] is equivalent to f = J[Σ_{k=1}^{̺_w−1} ς_k W^k Xβ, X]. Consider
A = {a = (a_1, a_2) ∈ R^k × R : Xa_1 + a_2 Σ_{k=1}^{̺_w−1} ς_k W^k Xβ = 0}.
f does not have full column rank if and only if A ≠ {0}. In other words, f does not have full column rank if and only if there exists b ∈ B, b ≠ 0, such that b_0 = a_1 and b_k = a_2 ς_k β for all k = 1, ..., ̺_w − 1, with β and the ς_k known constants. The condition for f not having full column rank is therefore very restrictive; however, if such an element of A exists, then f does not have full column rank.

Note that, in general, it is possible to have JWS⁻¹(Xβ + ιγ) linearly independent from JX without [WX, W²X, ..., W^{̺_w−1}X, X] having full column rank. This happens if β, λ and γ are not in the part of the parameter space compatible with the null space of [WX, W²X, ..., W^{̺_w−1}X, X]. The condition that [WX, W²X, ..., W^{̺_w−1}X, X] has full column rank is therefore a necessary but not, in general, a sufficient condition for identification. If, however, we restrict the true value of the parameter to lie in the compatible set, as in Bramoullé, Djebbari, and Fortin (2009), Result 1(2), page 54, the condition is necessary and sufficient.
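The two rank arguments above are easy to probe numerically. The sketch below is my own construction (the networks, regressors and the truncation of the instrument block to three powers of W are illustrative choices, not from the paper): the complete graph has exactly two distinct eigenvalues, so I_n, W, W² are linearly dependent and identification fails, while a generic undirected network has many distinct eigenvalues and a generically full-column-rank instrument block.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2

# Case 1: complete graph -- two distinct eigenvalues (n - 1 and -1), so
# vec(I), vec(W), vec(W^2) are linearly dependent (Proposition 1 situation).
W_c = np.ones((n, n)) - np.eye(n)
M = np.column_stack([np.eye(n).ravel(), W_c.ravel(), (W_c @ W_c).ravel()])
print(np.linalg.matrix_rank(M))  # 2: W^2 is a combination of I and W

# Case 2: a generic undirected network -- count distinct eigenvalues and
# check the column rank of a truncated instrument block [X, WX, W^2X, W^3X].
A = rng.integers(0, 2, size=(n, n))
W = np.triu(A, 1) + np.triu(A, 1).T              # symmetric 0/1 adjacency matrix
X = rng.normal(size=(n, k))
rho_w = len(np.unique(np.round(np.linalg.eigvalsh(W), 6)))
Q = np.hstack([np.linalg.matrix_power(W, j) @ X for j in range(4)])
print(rho_w, np.linalg.matrix_rank(Q) == Q.shape[1])
```

With many distinct eigenvalues, using all relevant powers W, ..., W^{̺_w−1} produces a very large instrument set, which is exactly the many-instruments situation motivating regularization.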
Proof of Proposition 3:
The proof of Proposition 3 is similar to that of Proposition 2, with [WX, W²X, ..., W^{̺_w−1}X, X] replaced by Q_{̺_w} = [WX, W²X, ..., W^{̺_w−1}X, Wι, W²ι, ..., W^{̺_w−1}ι, X]. The identification results in this case are conditional on a consistent preliminary estimator of ρ, as in Liu and Lee (2010).

Proof of Proposition 4:
The regularized 2SLS estimator satisfies δ̂_{R2sls} − δ_0 = (Z′R̃′PR̃Z)⁻¹ Z′R̃′PR̃R⁻¹ε. By Lemmas 3 and 4, Z′R̃′PR̃Z/n = O_p(1) + O_p(1/(α√n)) and Z′R̃′PR̃R⁻¹ε/n = O_p(1/√n) + O_p[1/(nα(1 + α√n))]. Then, δ̂_{R2sls} − δ_0 = o_p(1) as α√n → ∞ and α → 0. This proves the consistency of the regularized 2SLS for SAR with many instruments. Moreover,
√n(δ̂_{R2sls} − δ_0) = (Z′R̃′PR̃Z/n)⁻¹ [Z′R̃′PR̃R⁻¹ε/√n].
Using Lemmas 3 and 4, as well as Slutsky's theorem,
√n(δ̂_{R2sls} − δ_0) →d N(0, σ_ε² H⁻¹) if α√n → ∞ and α → 0.

Proof of Proposition 5:
Let us consider the MSE of the estimated parameters: n(δ̂_{R2sls} − δ_0)(δ̂_{R2sls} − δ_0)′ = Ĥ⁻¹ĥĥ′Ĥ⁻¹, with Ĥ = Z′R̃′PR̃Z/n and ĥ = Z′R̃′PR̃R⁻¹ε/√n. Our objective is to approximate the MSE. To achieve this, I use a Nagar-type approximation in order to concentrate on the largest part of the MSE. By Lemma 3,
Ĥ = Z′R′PRZ/n − (ρ̃ − ρ)Z′M′PRZ/n − (ρ̃ − ρ)Z′R′PMZ/n + (ρ̃ − ρ)²Z′M′PMZ/n.
And Ĥ = Z′R′PRZ/n + O_p((ρ̃ − ρ)/α). By Lemma 4, we have that
Ĥ = (1/n)f′f − (1/n)f′(I − P)f + (1/n)ev′Pve′ + (1/n)f′Pve′ + (1/n)ev′Pf + O_p((ρ̃ − ρ)/α).
Let us define T^H = T_1^H + T_2^H + T_3^H, with T_1^H = −(1/n)f′(I − P)f, T_2^H = (1/n)ev′Pve′ and T_3^H = (1/n)f′Pve′ + (1/n)ev′Pf + O_p((ρ̃ − ρ)/α), such that
Ĥ = (1/n)f′f + T_1^H + T_2^H + T_3^H = H + T^H + o_p(1).
Following similar arguments, we have
ĥ = f′ε/√n − f′(I − P)ε/√n + ev′Pε/√n + O_p[(ρ̃ − ρ)/(α(1 + α√n))].
Let us also define T^h = T_1^h + T_2^h, with T_1^h = −f′(I − P)ε/√n and T_2^h = ev′Pε/√n + O_p[(ρ̃ − ρ)/(α(1 + α√n))]. We therefore have
ĥ = f′ε/√n + T_1^h + T_2^h = h + T^h + o_p(1).
Using a Nagar-type expansion of Ĥ⁻¹,
n(δ̂_{R2sls} − δ_0)(δ̂_{R2sls} − δ_0)′ = H⁻¹[I − T^H H⁻¹][hh′ + hT^{h′} + T^h h′ + T^h T^{h′}][I − H⁻¹T^H]H⁻¹ + o_p(1).
Let us define A(α) = [I − T^H H⁻¹] ℑ(α) [I − H⁻¹T^H], with ℑ(α) = hh′ + hT^{h′} + T^h h′ + T^h T^{h′}. Therefore,
A(α) = ℑ(α) + T^H H⁻¹ ℑ(α) H⁻¹ T^H − T^H H⁻¹ ℑ(α) − ℑ(α) H⁻¹ T^H.
E[ℑ(α) | X] = σ_ε²[H − e_f + (1/n)f′Pve′ + (1/n)ev′Pf + e_f] − E[(1/n)f′(I − P)εε′Pve′ + (1/n)ev′Pεε′(I − P)f | X] + E[(1/n)ev′Pεε′Pve′ | X].
E(T^H H⁻¹ ℑ(α) | X) = −σ_ε² e_f + o_p(1) and E(ℑ(α) H⁻¹ T^H | X) = −σ_ε² e_f + o_p(1).
E(T^H H⁻¹ ℑ(α) H⁻¹ T^H | X) = σ_ε² H O_p([1/(nα) + 1/(α√n) + Δ_f]²) = O_p([1/(nα) + 1/(α√n) + Δ_f]²).
We have
E(A(α) | X) = σ_ε² H + σ_ε² e_f + E[(1/n)ev′Pεε′Pve′ | X] − E[(1/n)f′(I − P)εε′Pve′ + (1/n)ev′Pεε′(I − P)f | X] + σ_ε²[(1/n)f′Pve′ + (1/n)ev′Pf] + O_p([1/(nα) + 1/(α√n) + Δ_f]²).
From Lemma 5(viii) of Carrasco (2012), we have
E[(1/n)f′(I − P)εε′Pve′ + (1/n)ev′Pεε′(I − P)f | X] = O_p(√Δ_f /√(αn))
and (1/n)ev′(P − P²)f = O_p(√Δ_f /√(αn)).
From Lemma 5(iii) of Carrasco (2012), (1/n)f′Pve′ + (1/n)ev′Pf = O_p(1/(nα)).
And, from Lemma 5(iv) of Carrasco (2012),
E[(1/n)ev′Pεε′Pve′ | X] = (1/n)(Σ_j q_j)² σ_ε⁴ e ι′D′Dι e′ + o_p((Σ_j q_j)²/n),
with D = JRWS⁻¹R⁻¹. We can conclude that
n(δ̂_{R2sls} − δ_0)(δ̂_{R2sls} − δ_0)′ = Q(α) + R̂(α),
with
E[Q(α) | X] = H⁻¹σ_ε² + H⁻¹[σ_ε² e_f + (1/n)(Σ_j q_j)² σ_ε⁴ e ι′D′Dι e′]H⁻¹
and
r(α) = E(R̂(α) | X) = o_p((Σ_j q_j)²/n) + O_p([1/(nα) + 1/(α√n) + Δ_f]² + 1/(nα) + √Δ_f /√(αn)).
Define S(α) = H⁻¹[σ_ε² e_f + (1/n)(Σ_j q_j)² σ_ε⁴ e ι′D′Dι e′]H⁻¹. Note that r(α)/tr(S(α)) = o_p(1); my argument is similar to that used in Carrasco (2012). This means that S(α) is the dominant part of the MSE of the estimation of the model using regularized 2SLS.

References

Angrist, J. D. (2014): "The perils of peer effects,"
Labour Economics, 30, 98–108.

Bai, J., and S. Ng (2010): "Instrumental Variable Estimation in a Data Rich Environment," Econometric Theory, 26(6), 1577–1606.

Bekker, P. A. (1994): "Alternative Approximations to the Distributions of Instrumental Variable Estimators," Econometrica, 62(3), 657–681.

Belloni, A., D. Chen, V. Chernozhukov, and C. Hansen (2012): "Sparse models and methods for optimal instruments with an application to eminent domain," Econometrica, 80(6), 2369–2429.

Bonacich, P. (1987): "Power and centrality: A family of measures," American Journal of Sociology, 92(5), 1170–1182.

Bramoullé, Y., H. Djebbari, and B. Fortin (2009): "Identification of peer effects through social networks," Journal of Econometrics, 150(1), 41–55.

Caner, M., and N. Yıldız (2012): "CUE with many weak instruments and nearly singular design," Journal of Econometrics, 170(2), 422–441.

Carrasco, M. (2012): "A regularization approach to the many instruments problem," Journal of Econometrics, 170(2), 383–398.

Carrasco, M., and J.-P. Florens (2000): "Generalization of GMM to a Continuum of Moment Conditions," Econometric Theory, 16(6), 797–834.

Carrasco, M., and J.-P. Florens (2014): "On the asymptotic efficiency of GMM," Econometric Theory, 30(2), 372–406.

Carrasco, M., J.-P. Florens, and E. Renault (2007): "Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization," in Handbook of Econometrics, ed. by J. Heckman and E. Leamer, vol. 6, chap. 77. Elsevier.

Carrasco, M., and G. Tchuente (2015): "Regularized LIML for many instruments," Journal of Econometrics, 186(2), 427–442.

Carrasco, M., and G. Tchuente (2016): "Efficient Estimation with Many Weak Instruments Using Regularization Techniques," Econometric Reviews, 35(8-10), 1609–1637.

Chao, J. C., and N. R. Swanson (2005): "Consistent Estimation with a Large Number of Weak Instruments," Econometrica, 73(5), 1673–1692.

Craven, P., and G. Wahba (1979): "Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, 31, 377–404.

Davidson, R., and J. G. MacKinnon (1993): Estimation and Inference in Econometrics. Oxford University Press.

Donald, S. G., and W. K. Newey (2001): "Choosing the Number of Instruments," Econometrica, 69(5), 1161–1191.

Dufour, J.-M., and C. Hsiao (2010): "Identification," in Microeconometrics, pp. 65–77. Springer.

Gibbons, S., and H. G. Overman (2012): "Mostly pointless spatial econometrics?," Journal of Regional Science, 52(2), 172–191.

Goldsmith-Pinkham, P., and G. W. Imbens (2013): "Social networks and the identification of peer effects," Journal of Business & Economic Statistics, 31(3), 253–264.

Guerre, E., and P. Lavergne (2005): "Data-driven rate-optimal specification testing in regression models," Annals of Statistics, pp. 840–870.

Han, C., and P. C. B. Phillips (2006): "GMM with Many Moment Conditions," Econometrica, 74(1), 147–192.

Hansen, C., J. Hausman, and W. Newey (2008): "Estimation With Many Instrumental Variables," Journal of Business & Economic Statistics, 26(4), 398–422.

Hansen, C., and D. Kozbur (2014): "Instrumental variables estimation with many weak instruments using regularized JIVE," Journal of Econometrics, 182(2), 290–308.

Hasselt, M. v. (2010): "Many instruments asymptotic approximations under nonnormal error distributions," Econometric Theory, 26(2), 633–645.

Kapetanios, G., and M. Marcellino (2010): "Factor-GMM estimation with large sets of possibly weak instruments," Computational Statistics and Data Analysis, 54(11), 2655–2675.

Kress, R. (1999): Linear Integral Equations. Springer.

Kuersteiner, G. (2012): "Kernel-weighted GMM estimators for linear time series models," Journal of Econometrics, 170(2), 399–421.

Lancaster, T. (2000): "The incidental parameter problem since 1948," Journal of Econometrics, 95(2), 391–413.

Lee, L.-F. (2004): "Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models," Econometrica, 72(6), 1899–1925.

Lee, L.-F. (2007): "Identification and estimation of econometric models with group interactions, contextual factors and fixed effects," Journal of Econometrics, 140(2), 333–374.

Li, K.-C. (1986): "Asymptotic optimality of C_L and generalized cross-validation in ridge regression with application to spline smoothing," The Annals of Statistics, 14(3), 1101–1112.

Li, K.-C. (1987): "Asymptotic optimality for C_p, C_L, cross-validation and generalized cross-validation: Discrete index set," The Annals of Statistics, 15(3), 958–975.

Liu, X., and L.-F. Lee (2010): "GMM estimation of social interaction models with centrality," Journal of Econometrics, 159(1), 99–115.

Liu, X., E. Patacchini, and Y. Zenou (2014): "Endogenous peer effects: local aggregate or local average?," Journal of Economic Behavior & Organization, 103, 39–59.

Lyytikäinen, T. (2012): "Tax competition among local governments: Evidence from a property tax reform in Finland," Journal of Public Economics, 96(7), 584–595.

Mallows, C. L. (1973): "Some Comments on C_p," Technometrics, 15(4), 661–675.

Manski, C. F. (1993): "Identification of endogenous social effects: The reflection problem," The Review of Economic Studies, 60(3), 531–542.

Newey, W. K. (2013): "Nonparametric instrumental variables estimation," The American Economic Review, 103(3), 550–556.

Newey, W. K., F. Hsieh, and J. Robins (1998): "Undersmoothing and bias corrected functional estimation."

Newey, W. K., and F. Windmeijer (2009): "Generalized method of moments with many weak moment conditions," Econometrica, 77(3), 687–719.

Neyman, J., and L. Scott (1948): "Consistent estimates based on partially consistent observations," Econometrica, 16(1), 1–32.

Okui, R. (2011): "Instrumental variable estimation in the presence of many moment conditions," Journal of Econometrics, 165(1), 70–86.

Öztürk, F., and F. Akdeniz (2000): "Ill-conditioning and multicollinearity," Linear Algebra and Its Applications, 321(1-3), 295–305.

Staiger, D., and J. H. Stock (1997): "Instrumental Variables Regression with Weak Instruments," Econometrica, 65(3), 557–586.

Stone, M. (1974): "Cross-validatory choice and assessment of statistical predictions," Journal of the Royal Statistical Society, Series B, 36(2), 111–147.

Tiebout, C. M. (1956): "A pure theory of local expenditures," Journal of Political Economy, 64(5), 416–424.

West, K. D., and W. K. Newey (1987): "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55(3), 703–708.

Appendix: Monte Carlo Simulation Results
Means, standard deviations (SD) and root mean square errors (RMSE) of the empirical distributions of the estimates are reported. Each data-generating process uses 500 replications.
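The three summary statistics can be reproduced from a vector of replication estimates as in the sketch below (the draws here are synthetic stand-ins for the 500 replication estimates of one parameter, not the paper's simulated data):

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 0.1
estimates = true_value + 0.05 * rng.normal(size=500)     # stand-in for 500 replication estimates

mean = estimates.mean()
sd = estimates.std(ddof=1)                               # sample standard deviation
rmse = np.sqrt(np.mean((estimates - true_value) ** 2))   # RMSE around the true value
print(f"{mean:.3f} ({sd:.3f}) [{rmse:.3f}]")             # Mean (SD) [RMSE], as in the tables
```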
Table 2: Simulation results with maximum of three connections (1/2)

m = 10; g = 30 and g = 60. True values: λ = 0.1, β1 = 0.2, β2 = 0.2, ρ = 0.1. Entries are Mean (SD) [RMSE].

                     λ                      β1                     β2                     ρ
2SLS (finite iv)     0.104 (0.136) [0.136]  0.203 (0.047) [0.047]  0.204 (0.049) [0.049]  0.116 (0.177) [0.178]
2SLS (large iv)      0.032 (0.081) [0.105]  0.196 (0.046) [0.046]  0.217 (0.043) [0.046]  -
Bias-corrected 2SLS  0.108 (0.099) [0.099]  0.202 (0.047) [0.047]  0.204 (0.045) [0.045]  -
T-2SLS               0.055 (0.088) [0.099]  0.193 (0.051) [0.052]  0.213 (0.046) [0.048]  -
LF-2SLS              0.064 (0.095) [0.101]  0.193 (0.053) [0.054]  0.212 (0.048) [0.049]  -
PC-2SLS              0.064 (0.095) [0.101]  0.193 (0.053) [0.054]  0.212 (0.048) [0.049]  -
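The T-2SLS rows use a Tikhonov-regularized first stage. A minimal version in a plain linear IV model (my own simulation and parameter choices, not the SAR design behind these tables) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)
n, L = 500, 50                                   # many, highly correlated instruments

# Simulated linear IV model: y = delta * z + eps, with z endogenous.
Sigma = 0.9 * np.ones((L, L)) + 0.1 * np.eye(L)  # pairwise instrument correlation 0.9
Q = rng.normal(size=(n, L)) @ np.linalg.cholesky(Sigma).T
u = rng.normal(size=n)
eps = 0.8 * u + rng.normal(size=n)               # endogeneity through the common shock u
z = Q @ np.full(L, 0.1) + u
y = 0.5 * z + eps

def tikhonov_2sls(y, z, Q, alpha):
    """2SLS with a Tikhonov (ridge) first stage: P = Q (Q'Q/n + alpha I)^{-1} Q'/n."""
    G = Q.T @ Q / len(y) + alpha * np.eye(Q.shape[1])
    z_hat = Q @ np.linalg.solve(G, Q.T @ z / len(y))
    return (z_hat @ y) / (z_hat @ z)

delta_hat = tikhonov_2sls(y, z, Q, alpha=0.1)
print(delta_hat)                                 # close to the true delta = 0.5
```

The regularization parameter alpha = 0.1 is an arbitrary illustrative choice here; in the paper it is selected by data-driven criteria.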
Table 3: Simulation results with maximum of three connections (2/2)

m = 15; g = 30 and g = 60. True values: λ = 0.1, β1 = 0.2, β2 = 0.2, ρ = 0.1. Entries are Mean (SD) [RMSE].

                     λ                      β1                     β2                     ρ
2SLS (finite iv)     0.093 (0.103) [0.103]  0.198 (0.037) [0.037]  0.200 (0.039) [0.039]  0.109 (0.143) [0.143]
2SLS (large iv)      0.066 (0.061) [0.070]  0.197 (0.038) [0.038]  0.206 (0.034) [0.035]  -
Bias-corrected 2SLS  0.096 (0.072) [0.072]  0.198 (0.038) [0.038]  0.199 (0.036) [0.036]  -
T-2SLS               0.079 (0.069) [0.072]  0.196 (0.043) [0.043]  0.204 (0.037) [0.037]  -
LF-2SLS              0.082 (0.074) [0.076]  0.196 (0.046) [0.046]  0.203 (0.040) [0.040]  -
PC-2SLS              0.082 (0.074) [0.076]  0.196 (0.046) [0.046]  0.203 (0.040) [0.040]  -

Table 4: Simulation results with maximum of six connections (1/2)

m = 10; g = 30 and g = 60. True values: λ = 0.1, β1 = 0.2, β2 = 0.2, ρ = 0.1. Entries are Mean (SD) [RMSE].

                     λ                      β1                     β2                     ρ
2SLS (finite iv)     0.099 (0.090) [0.090]  0.204 (0.051) [0.051]  0.207 (0.034) [0.035]  0.118 (0.158) [0.159]
2SLS (large iv)      0.053 (0.038) [0.060]  0.193 (0.047) [0.048]  0.209 (0.032) [0.034]  -
Bias-corrected 2SLS  0.099 (0.064) [0.064]  0.203 (0.049) [0.049]  0.205 (0.034) [0.034]  -
T-2SLS               0.066 (0.044) [0.056]  0.184 (0.054) [0.056]  0.205 (0.035) [0.035]  -
LF-2SLS              0.072 (0.049) [0.057]  0.180 (0.058) [0.061]  0.204 (0.036) [0.036]  -
PC-2SLS              0.072 (0.049) [0.057]  0.180 (0.058) [0.061]  0.204 (0.036) [0.036]  -
Table 5: Simulation results with maximum of six connections (2/2)

m = 15; g = 30 and g = 60. True values: λ = 0.1, β1 = 0.2, β2 = 0.2, ρ = 0.1. Entries are Mean (SD) [RMSE].

                     λ                      β1                     β2                     ρ
2SLS (finite iv)     0.103 (0.050) [0.050]  0.200 (0.039) [0.039]  0.200 (0.026) [0.026]  0.108 (0.100) [0.100]
2SLS (large iv)      0.086 (0.030) [0.033]  0.196 (0.039) [0.039]  0.202 (0.024) [0.025]  -
Bias-corrected 2SLS  0.105 (0.035) [0.035]  0.200 (0.039) [0.039]  0.199 (0.025) [0.025]  -
T-2SLS               0.094 (0.035) [0.035]  0.193 (0.043) [0.043]  0.201 (0.027) [0.027]  -
LF-2SLS              0.097 (0.038) [0.039]  0.193 (0.044) [0.044]  0.201 (0.028) [0.028]  -
PC-2SLS              0.097 (0.038) [0.039]  0.193 (0.044) [0.044]  0.201 (0.028) [0.028]  -