Weak Identification and Estimation of Social Interaction Models∗

Guy Tchuente†

February 2019
Abstract
The identification of the network effect is based on either group size variation, the structure of the network or the relative position in the network. I provide easy-to-verify necessary conditions for identification of undirected network models based on the number of distinct eigenvalues of the adjacency matrix. Identification of network effects is possible, although in many empirical situations existing identification strategies may require the use of many instruments or instruments that could be strongly correlated with each other. The use of highly correlated instruments or many instruments may lead to weak identification or many-instruments bias. This paper proposes regularized versions of the two-stage least squares (2SLS) estimators as a solution to these problems. The proposed estimators are consistent and asymptotically normal. A Monte Carlo study illustrates the properties of the regularized estimators. An empirical application, assessing a local government tax competition model, shows the empirical relevance of using regularization methods.
Keywords:
High-dimensional models, Social network, Identification, Spatial autoregressive model, 2SLS, Regularization methods.
JEL classification:
C13, C31.

∗Comments from Marine Carrasco, Eric Renault, James G. MacKinnon, Silvia Goncalves, Russell Davidson, Yann Bramoullé, Lynda Khalaf, John Peirson and A. Colin Cameron are gratefully acknowledged. The author also thanks seminar participants at the University of Kent, University of Ottawa, Universidad Pontificia Javeriana, RES 2016, EEA-ESEM 2016, AFES 2016, CIREQ Econometrics Conference 2017, the University of Bristol's Econometrics Study Group 2017, SCSE 2017 and NASM 2018 for their comments. Many thanks to Xiaodong Liu for kindly providing his code for bias-corrected 2SLS methods and to Teemu Lyytikäinen for providing data on Finnish municipalities.

†E-mail: [email protected]. Address: School of Economics, University of Kent, Keynes College, Canterbury, Kent, CT2 7NP. Tel: +44 1227 827249.

Introduction
This paper investigates the estimation of social interaction models with network structures and the presence of endogenous, contextual, correlated and group fixed effects. In his seminal paper on network model estimation, Manski (1993) argues that solving the reflection problem in identifying and estimating the endogenous interaction effects is of significant interest in social interaction models. He shows that the separate identification of the network effects, in a linear-in-means model, is impossible. Following Manski (1993), the literature on identification of network effects has proposed three main identification strategies, based on the variation in the size of the group of peers, the structure through which peers interact, or agents' relative positions in the network. The present paper investigates an estimation strategy that is robust to weak identification. Weak identification can occur in limit cases for all the identification strategies.

The first method for identification was proposed by Lee (2007). He shows that both the endogenous and exogenous interaction effects can be identified if there is sufficient variation in group sizes. However, with large groups, identification can be weak in the sense that the estimator converges in distribution at low rates (Lee (2007)). The low rate of convergence means that we need a larger sample to have enough exogenous variation. Indeed, as the group size increases, the marginal effect of an individual on his peers becomes small and more observations are needed for identification.

In a more general framework, Bramoullé, Djebbari, and Fortin (2009) investigate identification and estimation of network effects. They use the structure of the network to identify the network effect. Their identification strategy relies on the use of spatial lags of friends' (or friends of friends') characteristics as instruments. But, if the network is highly transitive (i.e.
if a friend of my friend is also likely to be my friend), the identification is also weak. Weak identification can also occur if there are too many isolated individuals; in that case, the weak identification corresponds to weak instruments as in Staiger and Stock (1997). This paper focuses its attention on highly transitive networks.

More recently, Liu and Lee (2010) have considered the estimation of a social network where the endogenous effect is given by the aggregate choices of an agent's friends. (In network models, an agent's behavior may be influenced by his peers' choices (the endogenous effect), his peers' exogenous characteristics (the contextual effect), and/or by the common environment of the network (the correlated effect); see Manski (1993) for a description of these models.) They show that different positions of agents in a network, captured by the Bonacich (1987) centrality measure, can be used as additional instrumental variables to improve estimation efficiency. The number of such instruments depends on the number of groups, and can be very large. Liu and Lee (2010) propose two-stage least squares (2SLS) and generalized method of moments (GMM) estimators. The proposed estimators have an asymptotic bias due to the presence of many instruments.

The existing papers in the literature use instrumental variable (IV) methods or the quasi-maximum likelihood method to estimate the network effect. The present paper is interested in the use of IV methods when identification is weak in the sense described above. We will show that, in the estimation of peer effects using IV methods, a highly transitive network or a large group size implies the use of highly correlated instruments (where the set of instrumental variables contains the included and excluded instruments). If the Bonacich (1987) centrality measures are used, the number of instruments increases with the number of groups. In both cases, the structure of the interaction generates a weak identification issue.
The weak identification problem comes from near-perfect collinearity in the first-stage regression.

This paper proposes simple-to-check necessary conditions for identification based on the spectral decomposition of the network adjacency matrix. It shows that identification of the network effects is possible in many cases. However, given that all exogenous variation comes from the system, weak identification may be a concern. I propose regularized 2SLS estimators for network models with spatial autoregressive (SAR) representations. High-dimensional reduction techniques are used to mitigate the finite sample bias of the 2SLS estimators stemming from the use of many or highly correlated instruments. The regularized 2SLS estimators are based on three ways of computing a regularized inverse of the (possibly infinite-dimensional) covariance matrix of the instruments. The regularization methods come from the literature on inverse problems (see Kress (1999) and Carrasco, Florens, and Renault (2007)). The first estimator is based on Tikhonov (ridge) regularization. The Tikhonov (ridge) regularization is known in the machine learning literature for its ability to address near-perfect collinearity problems. The second estimator is based on the iterative Landweber-Fridman method. It has the same regularization properties as the ridge method, with the advantage of being appropriate for larger-scale problems. The third estimator is based on the principal components associated with the largest eigenvalues. The use of principal components is very popular for estimating models with factors. In the presence of many instruments, the use of a few principal components can help reduce the first-stage regression dimension.
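These three regularized inverses can be illustrated numerically. The sketch below is my own (the function names, simulated data and tuning values are not from the paper): each method replaces the unregularized first-stage projection of an endogenous variable w on the instruments Q, and with light regularization all three recover the ordinary projection in a well-conditioned design.

```python
import numpy as np

def tikhonov_fit(Q, w, alpha):
    """Tikhonov (ridge) first-stage fit: Q (Q'Q/n + alpha I)^{-1} Q'w/n."""
    n, m = Q.shape
    coef = np.linalg.solve(Q.T @ Q / n + alpha * np.eye(m), Q.T @ w / n)
    return Q @ coef

def landweber_fridman_fit(Q, w, n_iter):
    """Landweber-Fridman iterations; early stopping acts as the regularization."""
    n, m = Q.shape
    S = Q.T @ Q / n
    c = 1.0 / np.linalg.norm(S, 2)      # step size below 1/(largest eigenvalue of S)
    coef = np.zeros(m)
    for _ in range(n_iter):
        coef += c * (Q.T @ w / n - S @ coef)
    return Q @ coef

def principal_components_fit(Q, w, k):
    """Project w on the k principal components of Q with the largest singular values."""
    U, _, _ = np.linalg.svd(Q, full_matrices=False)
    Uk = U[:, :k]
    return Uk @ (Uk.T @ w)

# In a well-conditioned design, light regularization reproduces the 2SLS projection
rng = np.random.default_rng(0)
Q = rng.normal(size=(200, 5))
w = Q @ np.ones(5) + 0.1 * rng.normal(size=200)
ols = Q @ np.linalg.solve(Q.T @ Q, Q.T @ w)
print(np.allclose(tikhonov_fit(Q, w, 1e-10), ols, atol=1e-5))
print(np.allclose(landweber_fridman_fit(Q, w, 2000), ols, atol=1e-5))
print(np.allclose(principal_components_fit(Q, w, 5), ols, atol=1e-8))
```

With near-collinear instruments the three methods differ from the unregularized fit, which is precisely where their stabilizing effect matters.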
The regularized estimators presented in the paper depend on a tuning parameter; I propose a data-driven method for its selection based on the estimation of an approximation of the mean square error of the estimator.

The regularized 2SLS estimators are consistent, asymptotically normal and unbiased, and they achieve the semiparametric efficiency bound. However, the consistency and asymptotic normality conditions require more regularization than in Carrasco (2012). A Monte Carlo experiment shows that the regularized estimators perform well. In general, the quality of the regularized estimators improves as the density of the network increases.

I demonstrate the empirical relevance of my estimators by estimating a model of tax competition between municipalities in Finland. The size of the tax competition parameter seems larger than what is suggested by Lyytikäinen (2012). However, the regularized estimators are not statistically different from zero. This leaves unchanged the conclusion that tax competition is absent between municipalities in Finland.

The large existing literature on network models focuses on two main issues: identification and the estimation of the network effect. In his seminal work, Manski (1993) shows that linear-in-means specifications suffer from the reflection problem, so endogenous and contextual effects cannot be separately identified. Lee (2007) and Bramoullé, Djebbari, and Fortin (2009) propose identification strategies for a local-average network model based on differences in group sizes and structures. Liu and Lee (2010) show that the Bonacich (1987) centrality measure can also be used as additional instruments to improve identification and estimation efficiency. Lee (2007) and Bramoullé, Djebbari, and Fortin (2009) use the instrumental variables method to estimate the parameter of interest.
Liu and Lee (2010) propose a generalized method of moments (GMM) estimation approach, following Kelejian and Prucha (1998, 1999), who propose 2SLS and GMM approaches for estimating SAR models. The inclusion of the measure of centrality implies the use of many moment conditions (see Donald and Newey (2001), Hansen, Hausman, and Newey (2008) and Hasselt (2010) for some recent developments in this area).

In this paper, I assume that there are many instruments at hand (they are generated by the structure imposed on the data), and therefore use a framework that allows for an infinite number of instruments. (The inference carried out in the empirical example does not account for the effect of regularization.) Thus, this paper contributes to the literature on models for which the number of instruments exceeds the sample size. In a linear model framework without network effects, Carrasco (2012) proposes an estimation procedure that allows for the use of many instruments; the number of instruments may be smaller or larger than the sample size, or even infinite. Moreover, Carrasco and Tchuente (2016) show that these methods can be used to improve identification in weak instrumental variables estimation. Closely related papers also include Kuersteiner (2012), who considers a kernel-weighted GMM estimator; Okui (2011), who uses shrinkage with many instruments; and Bai and Ng (2010) and Kapetanios and Marcellino (2010), who assume that the endogenous regressors depend on a small number of factors that are exogenous. Using estimated factors as instruments, they assume that the number of variables from which the factors are estimated can be larger than the sample size. Belloni, Chen, Chernozhukov, and Hansen (2012) propose an instrumental variables estimator under a first-stage sparsity assumption.
Hansen and Kozbur (2014) propose a ridge-regularized jackknife instrumental variable estimator in the presence of heteroscedasticity, which does not require sparsity and has good size properties.

Another important focus in the instrumental variables estimation literature is on weak instruments or weak identification (see, for example, Chao and Swanson (2005) and Newey and Windmeijer (2009)). In this paper, I assume that the concentration parameter grows at the same rate as the sample size. However, I allow for the possibility of weak identification resulting from near-perfect collinearity in the set of instruments. My framework is similar to Caner and Yıldız (2012), with the difference that the near-singular design does not come from the proliferation of instruments, but from the structure of the social or spatial interaction.

The paper is organized as follows. Section 2 presents the network model. Section 3 discusses identification and estimation in network models. It proposes the regularized 2SLS approach to estimating the model. The selection of the regularization parameter is discussed in Section 4. Monte Carlo evidence on the performance of the proposed estimators for small samples is given in Section 5. An empirical application on local government tax competition is proposed in Section 6. Section 7 concludes.

The Model
The following social interaction model is considered:

Y_r = λ W_r Y_r + X_{1r} β_1 + W_r X_{2r} β_2 + ι_{m_r} γ_r + u_r,   (1)

with u_r = ρ M_r u_r + ε_r and r = 1, ..., r̄, where r̄ is the total number of groups and m_r is the number of individuals in group r.

Y_r = (y_{1r}, ..., y_{m_r r})' is an m_r-dimensional vector that represents the outcomes of interest. y_{ir} is the observation of individual i in group r. The total number of individuals in the sample is n = Σ_{r=1}^{r̄} m_r.

W_r and M_r are m_r × m_r sociomatrices of known constants, and may or may not be the same. λ is a scalar that captures endogenous network effects. I assume that this effect is the same for all individuals and groups. The outcomes of individuals influence those of their successors in the network graph (the successors are usually friends or peers).

In such a linear model, the parameter λ is usually interpreted as the partial effect of a one-unit change in the explanatory variable on the outcome. The explanatory variable in the present case is the product of the sociomatrix W_r and friends' outcomes Y_r. If the sociomatrix W_r is row-normalized, the endogenous network effect captured by λ represents the expected change in the outcome of an individual if all his friends' outcomes were changed by one unit. This corresponds to the "local average" endogenous effect in the terminology of Liu, Patacchini, and Zenou (2014). On the other hand, if W_r is not row-normalized, it is impossible to know which intervention is the source of the exogenous change in W_r Y_r (see Goldsmith-Pinkham and Imbens (2013) and Angrist (2014) for a discussion on the causal interpretation of the network effect). The unit variation in W_r Y_r could come from a change in the allocation of friends, an intervention on friends' outcomes, or both; the change would have to be engineered in a specific manner to obtain a unit change.
Such a situation corresponds to the "local aggregate" endogenous effect in the terminology of Liu, Patacchini, and Zenou (2014).

My model specification allows for the use of both the "local average" and "local aggregate" endogenous effects. Micro-foundations developed in Liu, Patacchini, and Zenou (2014) suggest that the "local average" should be used in situations where the network effect comes from individuals trying to conform to the social norm, and the "local aggregate" for situations where there is leakage.

X_{1r} and X_{2r} are m_r × k_1 and m_r × k_2 matrices, respectively. They represent individuals' exogenous characteristics. β_1 is the parameter measuring the dependence of individuals' outcomes on their own characteristics. The outcomes of individuals may also depend on the characteristics of their predecessors via the exogenous contextual effect, β_2. ι_{m_r} is an m_r-dimensional vector of ones and γ_r represents the unobserved group-specific effect (it is treated as a vector of unknown parameters that will not be estimated). Aside from the group fixed effect, ρ captures unobservable correlated effects between individuals and their connections in the network. ε_r is the m_r-dimensional disturbance vector; the ε_{ir} are i.i.d. with a mean of 0 and variance of σ² for all i and r. I define X_r = (X_{1r}, W_r X_{2r}).

For a sample with r̄ groups, the data is stacked up by defining V = (V_1', ..., V_{r̄}')' for V = Y, X, ε or u. I also define W = D(W_1, W_2, ..., W_{r̄}), M = D(M_1, M_2, ..., M_{r̄}) and ι = D(ι_{m_1}, ι_{m_2}, ..., ι_{m_{r̄}}), where D(A_1, ..., A_K) is a block diagonal matrix in which the diagonal blocks are m_k × n_k matrices A_k, for k = 1, ..., K.

The full sample model is

Y = λWY + Xβ + ιγ + u   (2)

where u = ρMu + ε. I define R(ρ) = (I − ρM). The Cochrane-Orcutt-type transformation of the model is obtained by multiplying equation (2) by R = R(ρ_0), where ρ_0 is the true value of the parameter ρ:

RY = λRWY + RXβ + Rιγ + Ru.
This leads to the following equation:

RY = λRWY + RXβ + Rιγ + ε.   (3)

When the number of groups is large, we have the incidental parameter problem (see Neyman and Scott (1948) and Lancaster (2000) for a discussion of the consequences of this problem). To eliminate unobserved group heterogeneity, I define

J_r = I_{m_r} − (ι_{m_r}, M_r ι_{m_r})[(ι_{m_r}, M_r ι_{m_r})'(ι_{m_r}, M_r ι_{m_r})]^− (ι_{m_r}, M_r ι_{m_r})',

where A^− is the generalized inverse of a square matrix A. In general, J_r represents the projection of an m_r-dimensional vector onto the space orthogonal to the one spanned by ι_{m_r} and M_r ι_{m_r}, if they are linearly independent. Otherwise, J_r = I_{m_r} − (1/m_r) ι_{m_r} ι'_{m_r}, which is the deviation-from-group-mean projector.

The matrix J = D(J_1, J_2, ..., J_{r̄}) is then pre-multiplied by equation (3) to create a model without the unobserved group-effect parameters:

JRY = λJRWY + JRXβ + Jε.   (4)

This is the structural equation, and we are interested in the estimation of λ, β_1, β_2 and ρ. The discussions on the identification and estimation of λ, β_1 and β_2 in this paper will be carried out under the assumption of a consistent estimation of ρ.

I define S(λ) = I − λW. I assume that equation (2) is an equilibrium and that S ≡ S(λ_0) is invertible at the true parameter value. The equilibrium vector Y is given by the reduced-form equation:

Y = S^{-1}(Xβ + ιγ) + S^{-1}R^{-1}ε.   (5)

It follows that WY = WS^{-1}(Xβ + ιγ) + WS^{-1}R^{-1}ε, so WY is correlated with ε. Hence, in general, equation (4) cannot be consistently estimated by ordinary least squares (OLS). Moreover, this model may not be considered as a self-contained system where the transformed variable JRY can be expressed as a function of the exogenous variables and disturbances. Hence, a partial-likelihood-type approach based only on equation (4) may not be feasible. In this paper, I consider the estimation of the parameters of equation (4) using regularized 2SLS.
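The endogeneity of WY in the reduced form can be checked in a small simulation. The sketch below is my own illustration, assuming ρ = 0 and no group effects (so R = I and the γ term drops): since E[(WY)ε'] = σ² WS^{-1}, a strictly positive diagonal of WS^{-1} means every (WY)_i covaries with its own disturbance.

```python
import numpy as np

rng = np.random.default_rng(42)
n, lam, beta = 30, 0.3, 1.0

# Row-normalized circle network: each node's two neighbours get weight 1/2
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

# Reduced form with rho = 0 and gamma = 0: Y = S^{-1} X beta + S^{-1} eps
X = rng.normal(size=n)
eps = rng.normal(size=n)
S_inv = np.linalg.inv(np.eye(n) - lam * W)   # S = I - lam W, invertible since ||lam W|| < 1
Y = S_inv @ (X * beta) + S_inv @ eps

# W Y loads on eps through W S^{-1} eps; E[(W Y)_i eps_i] = sigma^2 (W S^{-1})_{ii}
diag = np.diag(W @ S_inv)
print(diag.min() > 0)   # every diagonal entry is positive: W Y is endogenous
```

This is exactly why OLS on equation (4) is inconsistent and instruments for JRWY are needed.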
An extension would be to estimate the same model using a limited information maximum likelihood (LIML) method (the least variance ratio (LVR)) from Carrasco and Tchuente (2015). In models with independent observations, the LIML estimator can also be derived using the LVR principle (Davidson and MacKinnon (1993)). The LVR estimator is not equivalent to the LIML estimator for the SAR model. This is analogous to the difference between the 2SLS and maximum likelihood estimators for the SAR model (Lee (2004)).

Identification and Estimation of the Network Models
This section presents the identification and estimation of the network model parameters using regularization techniques. It first discusses weak identification in network models. It then proposes regularized 2SLS estimators using three regularization methods (Tikhonov, Landweber-Fridman and principal components). They are presented in a unified framework covering both a finite and an infinite number of instruments. The focus is on estimating endogenous and contextual effects under the assumption of a preliminary estimator of the unobservable correlation between individuals and their connections in the network. I also derive the asymptotic properties of the models' estimated parameters.
The model presented in Equation (2) proposes an underlying structure assumed to have generated the data of the population from which our sample is drawn. The estimation strategy that I propose later aims at making statements about the parameters of this model. To that end, there should not exist many parametrizations compatible with the observed data. Discussing conditions under which a unique parametric characterisation exists is a considerable problem in the estimation of network models (see Bramoullé, Djebbari, and Fortin (2009)) and in econometrics (see Dufour and Hsiao (2010) for a general discussion on identification). The discussion on the identification is done under a number of assumptions.
Assumption 1.
The elements ε_{ir} are i.i.d. with a mean of 0 and variance of σ², and a moment of order higher than the fourth exists.

Assumption 2.
The sequences of matrices {W}, {M}, {S^{-1}} and {R^{-1}} are uniformly bounded (UB), and sup ‖λW‖ < 1.

(Uniform boundedness in row (column) sums of the absolute value of a sequence of square matrices {A} will be abbreviated as UBR (UBC), and uniform boundedness in both row and column sums in absolute value as UB. A sequence of square matrices {A}, where A = [A_{ij}], is said to be UBR (UBC) if the sequence of the row-sum matrix norm of A (column-sum matrix norm of A) is bounded.)

Take ε(ρ_0, δ) = JR(Y − Zδ) = f(δ_0 − δ) + JRWS^{-1}R^{-1}ε(λ_0 − λ) + Jε, with f = JR[WS^{-1}(Xβ_0 + ιγ_0), X], where λ_0, β_0 and γ_0 are the true values of the parameters, δ = (λ, β')' and Z = (WY, X). Under Assumption 2 (i.e. that sup ‖λ_0 W‖ < 1), f can be approximated by a linear combination of (WX, W²X, ...), (Wι, W²ι, ...) and X. This is a typical case where the number of potential instruments is infinite.

I define Q = J[Q*, MQ*], where Q* = [WX, W²X, ..., Wι, W²ι, ..., X] is the infinite-dimensional set of instruments. We can also consider the case where only a finite number of instruments, say m < n, is used. For this case, I define Q_m = J[Q*_m, MQ*_m], where Q*_m = [WX, W²X, ..., W^m X, Wι, W²ι, ..., W^m ι, X].

As discussed in Liu and Lee (2010), δ is identified if Q'_m f has full column rank k + 1. This rank condition requires that f has full rank k + 1. Note that this assumes that Q_m has full column rank (meaning no perfect collinearity between instruments). If instruments are near-perfectly or perfectly collinear, f having full rank k + 1 does not ensure identification. If W_r does not have equal degrees in all its nodes and W_r is not row-normalized, the centrality score of each individual in his group helps to identify δ. This is possible even if β_0 = 0. However, if W_r has constant row sums, then f = JR[WS^{-1}Xβ_0, X] and the identification is impossible for β_0 = 0. Under Assumptions 1 and 2, δ is identified.

The identification in the general case with an infinite number of instruments is possible if the matrix with an infinite number of rows, Q'f, has full column rank. The identification is based on the moment condition E(Q'ε(ρ_0, δ)) = 0 (i.e. Q'f(δ_0 − δ) = 0). For any sample size n, rank(Q) ≤ n. If we assume that rank(QQ') = n, then the full column rank condition only requires that f has full rank k + 1. The same identification conditions as in the finite-dimensional case follow.

(There may be a finite set of instruments when the network effect is very small, such that λ^m → 0 as m → ∞ at a very fast rate. Section 3.1 discusses the effect of near-perfect collinearity on the identification of the network effect. These identification results are from Liu and Lee (2010). My work generalizes the results to an infinite number of instruments.)
(Section 3.2 proposes regularization tools that can ensure the identification of δ with a regularized version of the orthogonality condition.)

The identification of the model parameters relies on the structure of the network through the adjacency matrix W. The adjacency matrix is an n × n matrix. Let τ_1 ≥ τ_2 ≥ ... ≥ τ_n be its n eigenvalues. An eigenvalue could have multiplicity one or k, depending on the number of corresponding eigenvectors. Let ϱ_w denote the number of distinct eigenvalues of the adjacency matrix. The results proposed in Propositions 1 to 3 apply to symmetric spatial and adjacency matrices W. Undirected networks' adjacency matrices are an example of a network structure represented by a symmetric adjacency matrix.

Proposition 1
Consider a network model represented by Equation 2 with ρ = 0. If ϱ_w = 2, then the network effects are not identified.

Proposition 1 implies that the identification of the network effect can be reduced to a spectral analysis of the adjacency matrix. It provides an easy-to-verify condition for identification of the network effects under the assumption of network exogeneity. Indeed, if ϱ_w = 2, using the Cayley-Hamilton theorem, I can show that there exist non-null scalars µ_0 and µ_1 such that W² = µ_0 I + µ_1 W. Then, using Proposition 1 from Bramoullé, Djebbari, and Fortin (2009), the network effects are not identified.

Proposition 2
Consider a network model represented by Equation 2 with ρ = 0 and ε(δ) = J(Y − Zδ) = f(δ_0 − δ) + JWS^{-1}ε(λ_0 − λ) + Jε, with f = J[WS^{-1}(Xβ_0 + ιγ_0), X], where λ_0 and β_0 ≠ 0 are the true values of the parameters, δ = (λ, β')', and Assumptions 1 and 2 hold. Let ϱ_w be the number of distinct eigenvalues of the adjacency matrix W. If [WX, W²X, ..., W^{ϱ_w−1}X, X] has full column rank, the network effects are identified.

Proposition 2 gives a relationship between the identification of the network effects and the spectral decomposition of the adjacency matrix. If ϱ_w = 2, using the definition of X and applying the Cayley-Hamilton theorem leads to the conclusion that [WX, X] does not have full column rank. Thus, JW²X cannot be excluded from the structural equation and, therefore, cannot serve as an instrumental variable for JWY. However, if the number of distinct eigenvalues is strictly greater than 2, identification may be possible. For instance, if ϱ_w = 3, ρ = 0 and [WX, W²X, X] has full column rank, then the network effects are identified. Indeed, JW²X_1 and JW³X_2 serve as excluded instruments for JWY.

The full rank condition can be generalised to a necessary and sufficient condition, under very restrictive assumptions on the set to which the true model's parameters belong. This possibility is discussed in the proof of Proposition 2 in the appendix.

I now consider the case where there is spatial serial correlation. The following proposition generalizes Propositions 1 and 2.

Proposition 3
Consider a network model represented by Equation 2, where β_0 ≠ 0 and Assumptions 1 and 2 hold. Let ϱ_w be the number of distinct eigenvalues of the adjacency matrix W. If Q^{ϱ_w} = [Q_{ϱ_w}, MQ_{ϱ_w}], where Q_{ϱ_w} = [WX, W²X, ..., W^{ϱ_w−1}X, Wι, W²ι, ..., W^{ϱ_w−1}ι, X], has full column rank, the network effects are identified.

A special case of a model with spatial serial correlation is one in which W = M. In such a situation, Proposition 3 becomes similar to Proposition 2. Otherwise, the identification of the network effects could be achieved via the effect of unobserved shocks on peers of peers via M. Having spatial correlation provides a second source of exogenous variation.

The identification of the network effects seems to rest upon the possibility of having a full column rank matrix Q_{ϱ_w} = J[WX, W²X, ..., W^{ϱ_w−1}X, X]. The rank property of Q_{ϱ_w} can be measured by the condition number of the matrix Q_{ϱ_w}Q'_{ϱ_w}. Large values of the condition number correspond to a situation of near-rank deficiency and near-non-identification of the model's network effects. I consider a model with a near-rank-deficient Q_{ϱ_w} matrix as being weakly identified, following the terminology of Dufour and Hsiao (2010). The following subsection provides a discussion of the empirical contexts in which existing network effects identification strategies may become weak.

Since Manski (1993), the identification problem in network models has been a major concern for econometricians. After finding that separately identifying endogenous and exogenous interaction effects in a linear-in-means model is not possible, many subsequent studies have investigated network structures in which identification is possible. The identification of the network effect is achieved through group size variation or by exploiting the structure of the network.
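The spectral conditions above are straightforward to verify numerically. The sketch below is my own illustration (the helper names are not from the paper): it builds a Lee-type block-diagonal sociomatrix, counts its distinct eigenvalues, and shows that with equal group sizes [X, WX, W²X] is rank deficient, consistent with Proposition 1.

```python
import numpy as np

def group_matrix(sizes):
    """Block-diagonal Lee-type sociomatrix: W_ij = 1/(m_k - 1) within group k, zero diagonal."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for m in sizes:
        block = np.full((m, m), 1.0 / (m - 1))
        np.fill_diagonal(block, 0.0)
        W[start:start + m, start:start + m] = block
        start += m
    return W

def n_distinct_eigenvalues(W, tol=1e-8):
    ev = np.sort(np.linalg.eigvalsh(W))   # W symmetric (undirected network)
    return 1 + int((np.diff(ev) > tol).sum())

# Equal group sizes: eigenvalues are only {-1/(m-1), 1} -> not identified (Proposition 1)
print(n_distinct_eigenvalues(group_matrix([5, 5, 5])))      # 2
# Three different sizes: {-1/3, -1/4, -1/5, 1} -> identification possible
print(n_distinct_eigenvalues(group_matrix([4, 5, 6])))      # 4

# With two distinct eigenvalues, W^2 = mu_0 I + mu_1 W, so [X, WX, W^2 X] loses rank
rng = np.random.default_rng(0)
W = group_matrix([5, 5, 5])
X = rng.normal(size=(15, 1))
Z = np.hstack([X, W @ X, W @ W @ X])
print(np.linalg.matrix_rank(Z))                             # 2, not 3
```

The same check, applied to an estimated adjacency matrix, gives a quick diagnostic before choosing an identification strategy.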
It is notable that in all cases, additional information is required to overcome the reflection problem.

Lee (2007) uses variations in group sizes to identify both the endogenous and exogenous interaction effects. His identification relies on having sufficient variation in group size. (The condition number mentioned above is the ratio between the largest and the smallest eigenvalue of a symmetric matrix; see Öztürk and Akdeniz (2000) for the relation between ill-conditioning and multicollinearity.) For example, assume that we have two groups of sizes m_1 and m_2 and consider the adjacency matrix formed as follows: W_ii = 0 and W_ij = 1/(m_k − 1) if i and j belong to the same group k. W can be represented as a block diagonal matrix. Its distinct eigenvalues are τ_1 = −1/(m_1 − 1), τ_2 = −1/(m_2 − 1) and τ_3 = 1. If the group sizes are equal, we have exactly two distinct eigenvalues, and the network effects cannot be identified. Different group sizes lead to more than two distinct eigenvalues. The spectral decomposition of the adjacency matrix leads to the same conclusion as the comments of Bramoullé, Djebbari, and Fortin (2009) on Lee's identification with two groups of different sizes. I can show that with large group sizes there is near-perfect collinearity between WX, W²X, ..., W^{ϱ_w−1}X and X. In other words, with large groups, the identification can be weak.

More precisely, let us consider the model presented in Section 2. To focus the discussion on the possibility of the model's weak identification, we will consider the version of the social interaction model without spatial serial correlation. For an individual in group r, the model above gives

y_{ir} = λ/(m_r − 1) Σ_{j≠i} y_{jr} + x_{1ir} β_1 + 1/(m_r − 1) Σ_{j≠i} x_{2jr} β_2 + γ_r + ε_{ir}.   (6)

The reduced form after a within transformation is given by:

y_{ir} − ȳ_r = (x_{1ir} − x̄_{1r}) (m_r − 1)β_1/(m_r − 1 + λ) − (x_{2ir} − x̄_{2r}) β_2/(m_r − 1 + λ) + (m_r − 1)/(m_r − 1 + λ) (ε_{ir} − ε̄_r)   (7)

where ȳ_r, x̄_{1r}, x̄_{2r} and ε̄_r are the group averages of the variables excluding individual i (see equation 12 in Bramoullé, Djebbari, and Fortin (2009), and equation 2.5 in Lee (2007)). To simplify the discussion, without loss of generality, let us assume that x_{1ir} = x_{2ir} = x_{ir}. Thus,

y_{ir} − ȳ_r = (x_{ir} − x̄_r) ((m_r − 1)β_1 − β_2)/(m_r − 1 + λ) + (m_r − 1)/(m_r − 1 + λ) (ε_{ir} − ε̄_r).   (8)

Each reduced-form equation gives a value for ((m_r − 1)β_1 − β_2)/(m_r − 1 + λ). Identification of the parameters in this model comes from the variation in ((m_r − 1)β_1 − β_2)/(m_r − 1 + λ) across group sizes. Indeed, Bramoullé, Djebbari, and Fortin (2009) show that we need at least three different group sizes to be able to identify β_1, β_2 and λ. The parameters are obtained after solving a system of linear equations; there is a need for at least three distinct equations for a unique solution.

When the group size becomes large, ((m_r − 1)β_1 − β_2)/(m_r − 1 + λ) converges to the constant β_1, which means no or very small variation, as m_r becomes large, in the reduced-form coefficients of all groups. An explanation is that with a large group, the marginal contribution of an additional member of the group is relatively small, which means that the amount of exogenous variation useful for identification vanishes as the group's size increases. This situation is a case of weak identification of the network effects.

The spectral decomposition of the adjacency matrix associated with Lee's model is that of a block diagonal matrix. The distinct eigenvalues are given by τ_r = −1/(m_r − 1), r = 1, ..., r̄, and τ_{r̄+1} = 1. As the group sizes increase, the difference between the eigenvalues τ_r decreases.
The number of distinct eigenvalues becomes nearly equal to two and the model is weakly identified. It can then be said, based on Proposition 1, that if the groups are large, X, WX and W²X will be nearly linearly dependent, leading to weak identification.

Bramoullé, Djebbari, and Fortin (2009) use the structure of the network to identify the network effect. Their work proposes a general framework that incorporates Lee's and Manski's setups as special cases. The identification strategy proposed in their work relies on the use of spatial lags of friends' (i.e. friends of friends') characteristics as instruments. The variables WX, W²X, W³X, ... are used as instruments for
$WY$. The condition for identification is that
$I$, $W$ and $W^2$ (or, as noted in Propositions 1 and 4 of Bramoullé, Djebbari, and Fortin (2009), $I$, $W$, $W^2$ and $W^3$ in the presence of correlated effects) are linearly independent. Variation in group size ensures that $I$, $W$ and $W^2$ are linearly independent. If the network is highly transitive (i.e. a friend of my friend is likely to be my friend too, so that $W^2 \approx W$), identification is also weak. In practice, using $WX, W^2X, W^3X, \dots$ as instruments can lead to near-perfect collinearity, which implies weak identification (Gibbons and Overman (2012)). A near violation of the full-rank condition of Proposition 2 is thus a potential source of weak identification, because it produces near-perfect collinearity in the first-stage regression for the endogenous network effect. The use of regularization methods, such as ridge regression, has been shown to address these problems.

Liu and Lee (2010) also consider the estimation of a social interaction network model. As in Bramoullé, Djebbari, and Fortin (2009), they exploit the structure of the network to identify the network effect. In addition to
$WX, W^2X, W^3X, \dots$, the Bonacich centrality across nodes in a network is used as an instrumental variable to identify network effects and improve estimation efficiency. (The extreme case of a fully connected graph has exactly two distinct eigenvalues; an application of Proposition 1 then implies that the network effects are not identified.) The use of the Bonacich centrality measure usually leads to the use of many instruments. The 2SLS estimates obtained with these instruments are biased because of the large number of instruments used. Liu and Lee (2010) propose a bias-corrected 2SLS method to account for this.

In this paper, I use regularization techniques. These high-dimensional estimation techniques enable the use of all instruments and deliver efficiency with better finite-sample properties (see Carrasco (2012) and Carrasco and Tchuente (2015)). In this case, asymptotic efficiency can be obtained by using many (or all potential) instruments. I use both the Bonacich centrality measure and $WX, W^2X, W^3X, \dots$ as instrumental variables and apply a high-dimensional technique to mitigate the near-perfect collinearity resulting from the network structure and/or the many-instruments bias.
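The near-collinearity of spatial-lag instruments can be diagnosed directly from the data. The following sketch (illustrative networks and a simulated $X$; the variable names and designs are my assumptions, not the paper's) compares the condition number of $[WX, W^2X, W^3X]$ for a fully connected group against a sparse ring network:

```python
import numpy as np

# Diagnostic sketch: for a dense, fully connected group the
# instruments WX, W^2 X, W^3 X are (nearly) collinear, which the
# condition number of the stacked instrument matrix reveals.
rng = np.random.default_rng(0)
n = 30

# Fully connected group: row-normalized W with zero diagonal.
W = (np.ones((n, n)) - np.eye(n)) / (n - 1)
X = rng.standard_normal((n, 1))

Q = np.hstack([W @ X, W @ W @ X, W @ W @ W @ X])
cond_dense = np.linalg.cond(Q)

# Sparse ring network for comparison: each node has one neighbor.
W_ring = np.roll(np.eye(n), 1, axis=1)
Q_ring = np.hstack([W_ring @ X, W_ring @ W_ring @ X,
                    W_ring @ W_ring @ W_ring @ X])
cond_ring = np.linalg.cond(Q_ring)
```

For the fully connected group, every power of $W$ maps $X$ into the span of $X$ and the vector of group sums, so the instrument matrix is numerically rank-deficient and its condition number explodes; the sparse ring keeps the three columns well separated.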
The parameters of interest can be estimated using instrumental variables, with either a finite number of instruments or all potential instruments. As the number of instruments increases, estimation becomes asymptotically more efficient. However, a large number of instruments relative to the sample size creates the many-instruments problem (see, for example, Bekker (1994), Donald and Newey (2001) and Han and Phillips (2006)). The parameter of interest can also be weakly identified when a fixed number of instruments is used but the structure of the interaction does not provide sufficient exogenous variation. In such cases, using a fixed number of instrumental variables will not avoid the bias problem in the estimation. The 2SLS estimator with a fixed number of instrumental variables will be consistent and asymptotically normal, but may be less efficient than an estimator using many instruments. In order to use all potential instruments ($Q$), I use regularization tools. In addition to addressing the many-instruments bias as in Carrasco (2012), my objective is to use regularization to address the weak identification problem.

Let $\varepsilon(\rho,\delta) = JR(Y - Z\delta)$, with $\delta = (\lambda, \beta')'$ and $Z = (WY, X)$. The estimation is based on moments corresponding to the orthogonality condition between $Q$ and $J\varepsilon$:
$$E(Q'\varepsilon(\rho_0,\delta_0)) = 0 \quad (9)$$
The set of instrumental variables is $Q = J[Q_1, MQ_1]$ with $Q_1 = [WX, W^2X, \dots, W\iota, W^2\iota, \dots, X]$; the instruments can be normalized or standardized. My identification results are conditional on $\rho$: I first need a preliminary estimator $\tilde\rho$ of $\rho$, and I take $\tilde R = I - \tilde\rho M$ as an estimator of $R$.

The regularized estimators used in this paper require the definition of some mathematical objects; my notation follows the existing literature on regularization methods. The set of all potential instrumental variables ($Q$) is a countably infinite set, $\pi$ is a positive measure on $\mathbb N$, and $l^2(\pi)$ is the Hilbert space of real sequences that are square-summable with respect to $\pi$.
I define the covariance operator $K$ of the instruments as
$$K : l^2(\pi) \to l^2(\pi), \qquad (Kg)_j = \sum_{k\in\mathbb N} E(Q_{ji}Q_{ki})\, g_k \pi_k,$$
where $Q_{ji}$ is the element of $Q$ in column $j$ and row $i$. Under the assumption that $|Q_{ji}Q_{ki}|$ is uniformly bounded for all $j$, $k$ and $i$, $K$ is a compact operator (see Carrasco, Florens, and Renault (2007) for a definition). Indeed, under Assumption 2, $K$ is a Hilbert-Schmidt operator; I assume that it has non-zero eigenvalues. Let $\nu_j$, $j = 1, 2, \dots$ be the eigenvalues of $K$ (in decreasing order) and $\phi_j$, $j = 1, 2, \dots$ the corresponding orthogonal eigenvectors. $K$ can be estimated by $K_n$, defined as
$$K_n : l^2(\pi) \to l^2(\pi), \qquad (K_n g)_j = \sum_{k\in\mathbb N} \frac{1}{n}\sum_{i=1}^n Q_{ji}Q_{ki}\, g_k \pi_k.$$
In the SAR model, the number of potential moment conditions can be infinite, as in equation (9). Therefore, the inverse of $K_n$ needs to be regularized because it is nearly singular. By definition (see Kress (1999), p. 269), a regularized inverse of an operator $K$ is an operator $R^\alpha : l^2(\pi) \to l^2(\pi)$ such that $\lim_{\alpha\to 0} R^\alpha K\varphi = \varphi$ for all $\varphi \in l^2(\pi)$. (For a detailed discussion of the role and choice of $\pi$, see Carrasco (2012) and Carrasco and Florens (2014), where $\pi$ is a measure on $\mathbb R$; in my model, $\pi$ can, for example, be $\pi_k = \lambda^k / \sum_{k\in\mathbb N}\lambda^k$ with $k \in \mathbb N$. I assume that the elements of $X$ are uniformly bounded.) I consider three regularization schemes, Tikhonov (T), Landweber-Fridman (LF) and principal component (PC), defined as follows:

• Tikhonov (T)
Tikhonov regularization is also known as ridge regularization:
$$(K^\alpha)^{-1} r = (K^2 + \alpha I)^{-1} K r \quad\text{or}\quad (K^\alpha)^{-1} r = \sum_{j=1}^{\infty} \frac{\nu_j}{\nu_j^2 + \alpha}\langle r, \phi_j\rangle \phi_j,$$
where $\alpha > 0$ and $I$ is the identity operator.

• Landweber-Fridman (LF)
Let $0 < c < 1/\|K\|^2$, where $\|K\|$ is the largest eigenvalue of $K$ (which can be estimated by the largest eigenvalue of $K_n$). Then,
$$(K^\alpha)^{-1} r = \sum_{j=1}^{\infty} \frac{1 - (1 - c\nu_j^2)^{1/\alpha}}{\nu_j}\langle r, \phi_j\rangle \phi_j,$$
where $1/\alpha$ is some positive integer.

• Principal component (PC)
This method consists of using the first $1/\alpha$ eigenfunctions:
$$(K^\alpha)^{-1} r = \sum_{j=1}^{1/\alpha} \frac{1}{\nu_j}\langle r, \phi_j\rangle \phi_j,$$
where $1/\alpha$ is some positive integer. Using PC in the first stage is equivalent to projecting on the first principal components of the set of instrumental variables.

In the case of a finite number $m$ of moments, $P_m = Q_m(Q_m'Q_m)^{-1}Q_m'$ is the projection matrix on the space of instruments. The matrix $Q_m'Q_m$ may become nearly singular when $m$ gets large; moreover, when $m > n$, $Q_m'Q_m$ is singular. To address these cases, I consider a regularized version of the inverse of $Q_m'Q_m$. Let $\psi_j$ denote the eigenvectors of the $n \times n$ matrix $Q_mQ_m'/n$ associated with the eigenvalues $\nu_j$. For any vector $e$, the regularized version $P_m^\alpha$ of $P_m$ is
$$P_m^\alpha e = \sum_{j=1}^{n} q(\alpha, \nu_j)\langle e, \psi_j\rangle \psi_j,$$
where $\langle \cdot,\cdot \rangle$ represents the scalar product in $l^2(\pi)$ or in $\mathbb R^n$ (depending on the context). Here, for T, $q(\alpha,\nu_j) = \frac{\nu_j^2}{\nu_j^2+\alpha}$; for LF, $q(\alpha,\nu_j) = 1-(1-c\nu_j^2)^{1/\alpha}$; and for PC, $q(\alpha,\nu_j) = I(j \le 1/\alpha)$.

The network models suggest the use of an infinite number of instruments, which is why instrument selection methods are not used here. Following Carrasco and Florens (2000), I define the counterpart of $P^\alpha$ for an infinite number of instruments as $P^\alpha = G(K_n^\alpha)^{-1}G^*$, where $G : l^2(\pi) \to \mathbb R^n$ with $Gg = (\langle Q_1, g\rangle, \langle Q_2, g\rangle, \dots, \langle Q_n, g\rangle)'$ and $G^* : \mathbb R^n \to l^2(\pi)$ with $G^*v = \frac{1}{n}\sum_{i=1}^n Q_i v_i$, such that $K_n = G^*G$ and $GG^*$ is an $n\times n$ matrix with typical element $\langle Q_i, Q_j\rangle / n$. Let $\phi_j$, with $\nu_1 \ge \nu_2 \ge \dots > 0$, $j = 1, 2, \dots$, be the orthonormalized eigenvectors and eigenvalues of $K_n$, and let $\psi_j$ be the eigenfunctions of $GG^*$; then $G\phi_j = \sqrt{\nu_j}\psi_j$ and $G^*\psi_j = \sqrt{\nu_j}\phi_j$. Note that in this case, for $e \in \mathbb R^n$,
$$P^\alpha e = \sum_{j=1}^{\infty} q(\alpha, \nu_j)\langle e, \psi_j\rangle \psi_j.$$
We can also note that
$$v'P^\alpha w = v'G(K_n^\alpha)^{-1}G^*w = \left\langle (K_n^\alpha)^{-1/2}\frac{1}{n}\sum_{i=1}^n Q_i(\cdot)v_i,\ (K_n^\alpha)^{-1/2}\frac{1}{n}\sum_{i=1}^n Q_i(\cdot)w_i\right\rangle. \quad (10)$$

Our objective is to estimate the parameters of the model. I consider
$$S_n(k) = \frac{1}{n}\sum_{i=1}^n (\check Y_i - \check Z_i\delta)Q_{ik}$$
with $\check Y = \tilde RY$ and $\check Z = \tilde RZ$, and I denote by $(K_n^\alpha)^{-1}$ the regularized inverse of $K_n$ and $(K_n^\alpha)^{-1/2} = ((K_n^\alpha)^{-1})^{1/2}$. The regularized 2SLS estimator of $\delta$ is defined as
$$\hat\delta_{R2sls} = \arg\min_\delta \langle (K_n^\alpha)^{-1/2}S_n(\cdot),\ (K_n^\alpha)^{-1/2}S_n(\cdot)\rangle. \quad (11)$$
Solving the minimization problem, we have
$$\hat\delta_{R2sls} = (Z'\tilde R'P^\alpha \tilde RZ)^{-1}Z'\tilde R'P^\alpha \tilde RY. \quad (12)$$
Equation (12) defines the regularized 2SLS estimator. The regularized 2SLS for SAR is closely related to the regularized 2SLS of Carrasco (2012) and the 2SLS of Liu and Lee (2010). It extends Carrasco (2012) by considering SAR models, and it differs from Liu and Lee (2010) in that the projection matrix $P$ is replaced by its regularized counterpart $P^\alpha$.

The 2SLS estimators proposed in this paper are for cases with spatial serial correlation and homoscedastic errors. Extending the regularization approach to deal with heteroscedasticity is left for future research. Indeed, in a companion paper, I propose regularized GMM estimators allowing the joint estimation of all parameters of the model, with the variance-covariance estimator obtained using an approach similar to Newey and West (1987). The following proposition shows the consistency and asymptotic normality of the regularized 2SLS estimators. The following extra assumptions are needed.
Assumption 3. $H = \lim_{n\to\infty}\frac{1}{n}f'f$ is a finite nonsingular matrix.

Assumption 4. (i) The elements of $X$ are uniformly bounded, $X$ has full rank $k$, $E(\varepsilon|X) = 0$, and $\lim_{n\to\infty}\frac{1}{n}X'X$ exists and is nonsingular. (ii) There is an $\omega \ge 1/2$ such that
$$\sum_{j=1}^{\infty}\frac{\langle E(Z(\cdot,x_i)f_a(x_i)), \phi_j\rangle^2}{\nu_j^{\omega+1}} < \infty.$$

Assumption 4 (ii) ensures that regularization allows us to obtain a good asymptotic approximation of the best instrument, $f$.

Proposition 4
Suppose Assumptions 1-4 hold, $\tilde\rho - \rho = O_p(1/\sqrt n)$ and $\alpha \to 0$. Then the T, LF and PC estimators satisfy:

1. Consistency: $\hat\delta_{R2sls} \to \delta$ in probability as $n$ and $\alpha\sqrt n$ go to infinity.
2. Asymptotic normality: $\sqrt n(\hat\delta_{R2sls} - \delta) \stackrel{d}{\to} N(0, \sigma_\varepsilon^2 H^{-1})$ as $n$ and $\alpha\sqrt n$ go to infinity.

The convergence rates of the regularized 2SLS estimators for SAR differ from those obtained without spatial correlation. For consistency in the SAR model, $\alpha\sqrt n$ must go to infinity, whereas the Carrasco (2012) regularized 2SLS estimator is consistent under the condition that $n\alpha$ goes to infinity. Asymptotic normality also requires $\alpha\sqrt n \to \infty$, which again differs from the Carrasco (2012) condition for 2SLS. The regularization parameter $\alpha$ is thus allowed to go to zero more slowly than in Carrasco (2012): compared with that setting, more regularization is needed to achieve the appropriate asymptotic behavior. The strengthening of these conditions is due to the regularization having to take the spatial structure of the data into account. If the regularization parameter were held constant, the asymptotic variance would be larger; since regularization should not be needed asymptotically, it is reasonable to require $\alpha \to 0$ while $\alpha\sqrt n$ goes to infinity.

The bias of the 2SLS estimator in Liu and Lee (2010) takes the form
$$\sqrt n\, b_{2sls} = \sigma_\varepsilon^2\, tr(P^\alpha RWS^{-1}R^{-1})\,(Z'R'P^\alpha RZ)^{-1}e_1.$$
Using Lemmas 1 and 2 in the Appendix, I show that this bias is of order $O_p(\frac{1}{\alpha\sqrt n})$, which goes to zero as $\alpha\sqrt n$ goes to infinity. The ability to choose the regularization parameter means that we can control the size of $\alpha\sqrt n$; selecting the appropriate regularization parameter is therefore crucial.

The regularization methods presented involve the use of eigenvalues and eigenvectors. The eigenvalues obtained can vary greatly because of differences in the variances of the instrumental variables in the model; for example, $WX$ and $W\iota$ could have different variances.
To account for this difference, I use normalized instruments in the Monte Carlo simulation. One can also standardize the instruments, so that the regularization methods account for differences in both the location and the scale of the instruments. In addition, the regularized estimators presented in this section depend on the regularization parameter, $\alpha$. The choice of this parameter is very important for the estimators' behavior in small samples. In Section 4, I discuss the selection of the regularization parameter.
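As a concrete illustration, equation (12) and the three weighting schemes can be implemented in a few lines. The sketch below is a simplified version on simulated placeholder data (no spatial error, so the transformation $\tilde R$ is the identity); the function names and the toy design are my assumptions, not the paper's code:

```python
import numpy as np

# Sketch of the regularized projection P^alpha and the regularized
# 2SLS of equation (12), via the spectral weights q(alpha, nu):
# Tikhonov: nu^2/(nu^2 + alpha); LF: 1 - (1 - c*nu^2)^(1/alpha);
# PC: keep the first 1/alpha components.
rng = np.random.default_rng(1)

def q_weights(nu, alpha, scheme="T", c=None):
    if scheme == "T":
        return nu**2 / (nu**2 + alpha)
    if scheme == "LF":
        if c is None:
            c = 0.9 / nu.max()**2          # requires 0 < c < 1/||K||^2
        return 1.0 - (1.0 - c * nu**2) ** int(round(1 / alpha))
    if scheme == "PC":
        return (np.arange(len(nu)) < int(round(1 / alpha))).astype(float)
    raise ValueError(scheme)

def regularized_2sls(Y, Z, Q, alpha, scheme="T"):
    # Eigendecomposition of QQ'/n gives the psi_j and nu_j.
    nu, psi = np.linalg.eigh(Q @ Q.T / len(Y))
    order = np.argsort(nu)[::-1]
    nu, psi = nu[order], psi[:, order]
    P = psi @ np.diag(q_weights(nu, alpha, scheme)) @ psi.T
    return np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ Y)

# Toy linear model with instruments, just to exercise the code.
n = 200
Qmat = rng.standard_normal((n, 10))
Z = Qmat[:, :2] @ np.array([[1.0, 0.2], [0.3, 1.0]]) \
    + 0.1 * rng.standard_normal((n, 2))
delta = np.array([0.5, -0.25])
Y = Z @ delta + 0.1 * rng.standard_normal(n)

est = regularized_2sls(Y, Z, Qmat, alpha=0.05, scheme="T")
```

The only difference between T-2SLS, LF-2SLS and PC-2SLS is the weight function $q(\alpha,\nu_j)$ applied to the spectrum of $QQ'/n$.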
This section discusses the selection of the regularization parameter for network models. I first derive an approximation of the mean-squared error (MSE) using a Nagar-type expansion. I then estimate the dominant term of the MSE and select the regularization parameter that minimizes this term.
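Before stating the formal result, the logic of the selection rule can be previewed with a toy trade-off curve: the dominant MSE term behaves like $\alpha^\omega$ plus $1/(n\alpha^2)$ (the rates given in Proposition 5 below), so the criterion has an interior minimum in $\alpha$. The constants in the sketch are illustrative assumptions, not estimates from the paper:

```python
import numpy as np

# Toy preview of the MSE trade-off behind the selection rule: the
# dominant term behaves like  c1 * alpha**omega + c2 / (n * alpha**2).
# The constants c1, c2 and omega are illustrative assumptions.
c1, c2, omega, n = 1.0, 1.0, 1.0, 200

def s_alpha(alpha):
    return c1 * alpha**omega + c2 / (n * alpha**2)

grid = np.logspace(-3, 0, 200)
vals = np.array([s_alpha(a) for a in grid])
alpha_star = grid[vals.argmin()]          # interior minimizer of the trade-off
```

The minimizer sits strictly inside the grid: too little regularization inflates the $1/(n\alpha^2)$ term, too much inflates the $\alpha^\omega$ term.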
The following proposition provides an approximation of the MSE:
Proposition 5
If Assumptions 1 to 4 hold, $\tilde\rho - \rho = O_p(1/\sqrt n)$ and $n\alpha \to \infty$, then for the LF-, PC- and T-regularized 2SLS estimators,
$$n(\hat\delta_{R2sls} - \delta)(\hat\delta_{R2sls} - \delta)' = Q(\alpha) + \hat R(\alpha), \qquad E(Q(\alpha)|X) = \sigma_\varepsilon^2 H^{-1} + S(\alpha), \quad (13)$$
and $r(\alpha)/tr(S(\alpha)) = o_p(1)$, with $r(\alpha) = E(\hat R(\alpha)|X)$ and
$$S(\alpha) = \sigma_\varepsilon^2 H^{-1}\frac{f'(I-P^\alpha)f}{n}H^{-1} + \frac{\sigma_\varepsilon^2}{n}\Big(\sum_j q_j\Big)^2 H^{-1}e_1\iota'D'D\iota e_1'H^{-1}.$$
For LF and PC, $S(\alpha) = O_p\big(\frac{1}{n\alpha^2} + \alpha^\omega\big)$, and for T, $S(\alpha) = O_p\big(\frac{1}{n\alpha^2} + \alpha^{\min(\omega,2)}\big)$, with $D = JRWS^{-1}R^{-1}$ and $e_1$ the first unit (column) vector.

For the selection of $\alpha$, the relevant dominant term $S(\alpha)$ is minimized to achieve the smallest MSE. $S(\alpha)$ accounts for a trade-off between bias and variance: when $\alpha$ goes to zero, the bias term increases while the variance term decreases. The approximation of the regularized 2SLS estimator is similar to that of the Carrasco-regularized 2SLS; however, the expression for the MSE is more complicated because of the spatial correlation.

4.2 Estimation of the MSE

The aim of this subsection is to find the regularization parameter that minimizes the conditional MSE of $\bar\gamma'\hat\delta_{2sls}$ for some arbitrary $(k+1)\times 1$ vector $\bar\gamma$. This conditional MSE is
$$MSE = E[\bar\gamma'(\hat\delta_{2sls}-\delta)(\hat\delta_{2sls}-\delta)'\bar\gamma\,|\,X] \sim \bar\gamma'S(\alpha)\bar\gamma \equiv S_{\bar\gamma}(\alpha).$$
$S_{\bar\gamma}(\alpha)$ involves the function $f$, which is unknown, so $S_{\bar\gamma}$ must be replaced with an estimate. Stacking the observations, the reduced-form equation can be rewritten as $RZ = f + v$. This expression involves $n\times(k+1)$ matrices. We can reduce the dimension by post-multiplying by $H^{-1}\bar\gamma$:
$$RZH^{-1}\bar\gamma = fH^{-1}\bar\gamma + vH^{-1}\bar\gamma \iff RZ_{\bar\gamma} = f_{\bar\gamma} + v_{\bar\gamma} \quad (14)$$
where $v_{\bar\gamma i} = v_i'H^{-1}\bar\gamma$ is a scalar. I use $\tilde\delta$ to denote a preliminary estimator of $\delta$, obtained from a finite number of instruments.
I use $\tilde\rho$ to denote a preliminary estimator of $\rho$, obtained by the method of moments as follows:
$$\tilde\rho = \arg\min_\rho \tilde g(\rho)'\tilde g(\rho)$$
where $\tilde g(\rho) = [M_1\tilde\varepsilon(\rho), M_2\tilde\varepsilon(\rho), M_3\tilde\varepsilon(\rho)]'\tilde\varepsilon(\rho)$, with
$$M_1 = JWJ - tr(JWJ)I/tr(J), \quad M_2 = JMJ - tr(JMJ)I/tr(J), \quad M_3 = JMWJ - tr(JMWJ)I/tr(J),$$
$\tilde\varepsilon(\rho) = JR(\rho)(Y - Z'\tilde\delta)$ and $\tilde\delta = [Z'Q_0(Q_0'Q_0)^{-1}Q_0'Z]^{-1}Z'Q_0(Q_0'Q_0)^{-1}Q_0'Y$, where $Q_0$ is a single instrument. The residual is $\hat\varepsilon(\rho) = JR(\tilde\rho)(Y - Z'\tilde\delta)$.

Let $\hat\sigma_\varepsilon^2 = \hat\varepsilon(\rho)'\hat\varepsilon(\rho)/n$ and $\hat v_{\bar\gamma} = (I - P^{\tilde\alpha})R(\tilde\rho)Z\tilde H^{-1}\bar\gamma$, where $\tilde H$ is a consistent estimate of $H$ and $\tilde\alpha$ is a preliminary value of $\alpha$, and let $\hat\sigma_{v_{\bar\gamma}}^2 = \hat v_{\bar\gamma}'\hat v_{\bar\gamma}/n$. I consider the following goodness-of-fit criteria:

Mallows $C_p$ (Mallows (1973)):
$$\hat\varpi_m(\alpha) = \frac{\hat v_{\bar\gamma}'\hat v_{\bar\gamma}}{n} + 2\hat\sigma_{v_{\bar\gamma}}^2\frac{tr(P^\alpha)}{n}.$$

Generalized cross-validation (Craven and Wahba (1979)):
$$\hat\varpi_{cv}(\alpha) = \frac{\frac{1}{n}\hat v_{\bar\gamma}'\hat v_{\bar\gamma}}{\big(1 - \frac{tr(P^\alpha)}{n}\big)^2}.$$

Leave-one-out cross-validation (Stone (1974)):
$$\hat\varpi_{lcv}(\alpha) = \frac{1}{n}\sum_{i=1}^n \big(\tilde RZ_{\bar\gamma i} - \hat f^\alpha_{\bar\gamma,-i}\big)^2,$$
where $\tilde RZ_{\bar\gamma} = \tilde RZ\tilde H^{-1}\bar\gamma$, $\tilde RZ_{\bar\gamma i}$ is the $i$th element of $\tilde RZ_{\bar\gamma}$ and $\hat f^\alpha_{\bar\gamma,-i} = P^\alpha_{-i}\tilde RZ_{\bar\gamma,-i}$. The matrix $P^\alpha_{-i} = G(K^\alpha_{n-1})^{-1}G^*_{-i}$ is obtained by suppressing the $i$th observation from the sample, and $\tilde RZ_{\bar\gamma,-i}$ is the $(n-1)\times 1$ vector obtained by suppressing the $i$th observation of $\tilde RZ_{\bar\gamma}$.

Using (13), $S_{\bar\gamma}(\alpha)$ can be rewritten as
$$S_{\bar\gamma}(\alpha) = \sigma_\varepsilon^2\frac{f_{\bar\gamma}'(I-P^\alpha)f_{\bar\gamma}}{n} + \frac{\sigma_\varepsilon^2}{n}\Big(\sum_j q_j\Big)^2 e_{1\gamma}\iota'D'D\iota e_{1\gamma}'.$$
Using Li (1986)'s results on $C_p$ and cross-validation procedures, note that $\hat\varpi(\alpha)$ approximates
$$\varpi(\alpha) = \frac{f_{\bar\gamma}'(I-P^\alpha)f_{\bar\gamma}}{n} + \sigma_{v_{\bar\gamma}}^2\frac{tr((P^\alpha)^2)}{n}.$$
Therefore, $S_{\bar\gamma}(\alpha)$ is estimated using the following equation:
$$\hat S_{\bar\gamma}(\alpha) = \hat\sigma_\varepsilon^2\left[\hat\varpi(\alpha) - \hat\sigma_{v_{\bar\gamma}}^2\frac{tr((P^\alpha)^2)}{n}\right] + \frac{\hat\sigma_\varepsilon^2}{n}(tr(P^\alpha))^2 e_{1\gamma}\iota'\tilde D'\tilde D\iota e_{1\gamma}',$$
where $\tilde D$ is a consistent estimator of $D$.
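A minimal sketch of this criterion-based selection (the Mallows $C_p$ form with Tikhonov weights, on simulated placeholder data; the toy design is an assumption, not the paper's):

```python
import numpy as np

# Sketch of the data-driven choice of alpha: evaluate a Mallows-type
# criterion  v'v/n + 2*sigma2*tr(P^alpha)/n  on a grid and keep the
# minimizer. Tikhonov weights; simulated placeholder data.
rng = np.random.default_rng(2)
n = 150
Q = rng.standard_normal((n, 25))
f = Q[:, 0] + 0.5 * Q[:, 1]              # "best instrument" signal
RZ_gamma = f + rng.standard_normal(n)    # observed first-stage target

nu, psi = np.linalg.eigh(Q @ Q.T / n)

def mallows_cp(alpha, sigma2=1.0):
    q = nu**2 / (nu**2 + alpha)                 # Tikhonov weights
    fitted = psi @ (q * (psi.T @ RZ_gamma))     # P^alpha applied to target
    resid = RZ_gamma - fitted
    return resid @ resid / n + 2.0 * sigma2 * q.sum() / n

grid = [10.0 ** k for k in range(-6, 3)]
alpha_hat = min(grid, key=mallows_cp)
```

The GCV and leave-one-out criteria plug into the same grid search by swapping the objective function.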
The optimal regularization parameter is obtained by minimizing $\hat S_{\bar\gamma}(\alpha)$ with respect to $\alpha$. My selection procedure is very similar to that of Carrasco (2012), and its optimality can be established using the results of Li (1986) and Li (1987).

The regularized 2SLS procedure and the selection of the regularization parameter are based on a preliminary estimator of $\rho$. This means that if $\rho$ is not correctly estimated, the estimation of $\delta$ could be biased in an unpredictable direction. Also, the use of a cross-validation-type method to choose the regularization parameter usually influences the quality of inference; this is similar to the inference problem in non-parametric estimation (see Newey, Hsieh, and Robins (1998) and Guerre and Lavergne (2005)). This paper focuses on point estimation of the parameters; post-regularization inference is left for future research.

Monte Carlo Simulations

To investigate the finite-sample performance of the regularized 2SLS estimators, I conduct a simulation study based on the following model:
$$Y = \lambda WY + X\beta_1 + WX\beta_2 + \iota\alpha + u, \qquad u = \rho Mu + \varepsilon.$$
I generate four samples with different numbers of groups ($\bar r$) and group sizes ($m_r$). The first sample contains 30 groups, each with 10 individuals. The second sample contains 60 groups, also with 10 individuals each. To study the effect of group size, I also consider 30 and 60 groups of 15 individuals.

For each group, the sociomatrix $W_r$ is generated as follows. First, for the $i$th row of $W_r$ ($i = 1,\dots,m_r$), $k_{ri}$ is generated uniformly at random from the set of integers [0, 1, 2, 3], [0, 1, ..., 6] or [0, 1, ..., 8]. Allowing for differences in the maximum number of friends helps us study the effect of the density of the network on the estimators. The sociomatrix $W_r$ is then constructed as follows.
First, set the $(i+1)$th, ..., $(i+k_{ri})$th elements of the $i$th row of $W_r$ to 1 and the rest of the elements in that row to 0, if $i + k_{ri} \le m_r$; otherwise, the entries of 1 are wrapped around. If $k_{ri} = 0$, the $i$th row of $W_r$ contains only zeros. $M$ is the row-normalized $W$. $X \sim N(0, I)$, $\alpha_r \sim N(0, \sigma_\alpha^2)$ and $\varepsilon_{r,i} \sim N(0, \sigma_\varepsilon^2)$, with $\beta_1 = \beta_2$ and $\lambda = \rho$ set to fixed values.

The estimation methods considered are:
• 2SLS with $Q_1 = J[X, WX, MX, MWX]$,
• 2SLS with $Q_2 = [Q_1, JW\iota]$, and
• the regularized 2SLS estimators T-2SLS (Tikhonov), LF-2SLS (Landweber-Fridman) and PC-2SLS (principal component), with many instruments $\tilde Q$, where $\tilde Q$ is a matrix of instruments with $Q$'s instruments normalized to unit variance. (As noted by Newey (2013), the choice of the identity matrix for the Tikhonov regularization method does not account for any difference in the location and scale of the instruments.)

For all 2SLS estimators, a preliminary estimator of $\rho$ is obtained by the method of moments,
$$\tilde\rho = \arg\min_\rho \tilde g(\rho)'\tilde g(\rho)$$
where $\tilde g(\rho) = [M_1\tilde\varepsilon(\rho), M_2\tilde\varepsilon(\rho), M_3\tilde\varepsilon(\rho)]'\tilde\varepsilon(\rho)$, $M_1 = JWJ - tr(JWJ)I/tr(J)$, $M_2 = JMJ - tr(JMJ)I/tr(J)$, $M_3 = JMWJ - tr(JMWJ)I/tr(J)$, $\tilde\varepsilon(\rho) = JR(\rho)(Y - Z'\tilde\delta)$ and $\tilde\delta = [Z'Q_0(Q_0'Q_0)^{-1}Q_0'Z]^{-1}Z'Q_0(Q_0'Q_0)^{-1}Q_0'Y$.

The selection of the regularization parameter follows the procedure proposed in Section 4: I minimize the estimated approximate MSE. Before presenting the results of the simulations, it is important to note that the data-generating process in this experiment exhibits a very low transitivity level (there are many non-connected individuals in all groups). Moreover, the reduced-form model is sparse (for example, when the maximum number of friends is 3, $W^q = 0$ for sufficiently large $q$).

The simulation results can be summarized as follows.

1. The additional linear moment conditions reduce the standard deviations of the 2SLS estimators of $\lambda$ and $\beta$.
The 2SLS estimators in the model with a large number of instruments have smaller standard deviations than the 2SLS estimators in the model with a finite number of instruments.

2. The additional instruments in $Q_2$ introduce bias into the 2SLS estimators of $\lambda$ and $\beta$. The 2SLS estimators from the model with a finite number of instruments have mean values closer to the true parameter values than the 2SLS estimators from the model with a large number of instruments.

3. The regularized 2SLS procedures substantially reduce the many-instruments bias of the 2SLS estimators, particularly in large samples. The bias-corrected estimators are similar to the regularized estimators in terms of bias correction for large samples from a denser network, but in small samples the bias of the bias-corrected estimator is smaller than that of the regularized estimators. Relative to the 2SLS estimators from the model with many instruments, the regularized 2SLS estimators reduce the bias and have comparable standard deviations.

4. The performance of the regularized estimators improves with the density of the network and the number of groups. The behavior of the regularized estimators with respect to network density suggests that they are good candidates for improving the asymptotic behavior of the estimator of the network effect when the level of transitivity in the groups is very high.

The large theoretical literature on local government tax competition can be divided into two groups: efficient local taxation (Tiebout (1956)) and tax competition models departing from Tiebout's model (Lyytikäinen (2012)). The departure from Tiebout's model leads to several types of fiscal consequences: benefit spillovers, distorting taxes on a mobile tax base, and political economy considerations and information asymmetries (Lyytikäinen (2012)).
While the causes of local government tax interaction are certainly present in most legislations, the empirical literature has long been divided on how to identify a causal local tax competition (interaction) effect. The identification problem here is a special case of Manski's reflection problem. For municipalities under the same legislation, the network matrix can be represented by the spatial matrix of neighbors. This neighborhood structure of the municipalities can be considered exogenous with respect to the tax level. I propose a model to test the hypothesis of tax competition between municipalities:
$$T_{itr} = \lambda W_r T_{itr} + \beta_1 X_{itr} + \beta_2 W_r X_{itr} + \alpha_r + \varepsilon_{itr}$$
The identification and estimation of the tax competition parameter ($\lambda$) is achieved, in a large part of the empirical literature, via two strategies. The first strategy uses spatial lags (friends of friends' characteristics) as instruments in an instrumental variables approach, while the second uses maximum likelihood estimation, where identification is achieved via model specification. As pointed out by Gibbons and Overman (2012), the causal interpretation of the parameters obtained in these cases is not easy to defend: the validity of the exclusion restriction is not obvious, and the correct specification of the model is not fully testable. As an alternative, Gibbons and Overman (2012) propose using differencing coupled with instrumental variables derived from exogenous policy variations.

Lyytikäinen (2012) estimates a tax competition parameter among Finnish local governments. He uses changes in statutory lower limits on property tax rates as a source of exogenous variation to estimate the tax competition parameter ($\lambda$) in a first-difference model:
$$T_{i1} - T_{i0} = \lambda\sum_{j\ne i} w_{ij}(T_{j1} - T_{j0}) + \beta_1(X_{i1} - X_{i0}) + \beta_2\sum_{j\ne i} w_{ij}(X_{j1} - X_{j0}) + v_i,$$
where $w_{ij} = 1/n_i$, with $n_i$ the number of neighbors of municipality $i$.

The second column of Table 1 replicates the estimate using the instrument from Lyytikäinen (2012), who assumes that $\beta_2 = 0$ and uses only one excluded instrument. (The instrument used in Table 3 of Lyytikäinen (2012) is one of the instruments used, together with the spatial lags of the other exogenous variables; I have augmented this model to account for an exogenous network model.) The other estimations are carried out using spatial lags of the second, third and fourth order, and the regularized estimators. The results in Table 1 suggest that the use of many instruments, obtained by adding more spatial lags, biases the results, and that the use of regularization seems to reduce this bias.

Table 1 (n = 411)

Estimators/IVs | Lyytikäinen (2012) | Spatial lags 2 | Spatial lags 2 and 3 | Spatial lags 2, 3 and 4
2SLS | 0.06 (0.07) | 0.26 (0.28) | -0.02 (0.22) | -0.03 (0.17)
T-2SLS | 0.01 (0.004) | 0.19 (0.30) | 0.185 (0.31) | 0.182 (0.26)
L-2SLS | 0.01 (0.0005) | 0.20 (0.22) | 0.186 (0.33) | 0.182 (0.31)
PC-2SLS | 0.0115 (0.42) | 0.26 (0.28) | -0.027 (0.22) | -0.039 (0.17)
Cond. number ($\nu_{max}/\nu_{min}$) | 2153.8 | 2001 | 17800 | 1.3983e+05

Standard errors are in parentheses. The change in general property taxation between 1999 and 2000 is the dependent variable. The independent variables are changes in neighboring municipalities' tax rates, the municipality's own imposed increase, non-zero own imposed increase, and changes in municipal attributes such as grants from the central government, disposable income per capita, the unemployment rate and age structure (see Table 3 of Lyytikäinen (2012) for more details). The last line of the table shows the condition numbers of the $QQ'$ matrices for the different instrument sets. The values are relatively large, suggesting a near-perfect collinearity problem in small samples.

The simulation results indicate that T-2SLS and L-2SLS are the best methods in terms of bias correction.
The point estimates obtained by these two estimation methods are very similar, which suggests a bias correction relative to the 2SLS. As the number of instruments increases, the standard errors decrease for the 2SLS as well as for the regularized 2SLS. However, the standard errors remain very large, so the tax competition effect is not statistically significantly different from zero.

This empirical example shows how the regularized estimators can be used to improve the estimation of network models. The size of the tax competition parameter appears to be larger than suggested by Lyytikäinen (2012). The estimates are not statistically different from zero. However, the regularized estimators (T-2SLS and L-2SLS) appear to be more stable as the number of instruments increases, which suggests that the weak identification problem may have been solved.
Conclusion

This paper uses regularization methods to estimate network models. It proposes easy-to-check identification conditions based on the number of distinct eigenvalues of the network adjacency matrix. Regularization is proposed as a solution to the weak identification problem in network models. Identification of the network effect can be achieved by using individuals' Bonacich (1987) centrality as instrumental variables; however, the number of instruments then increases with the number of groups, leading to the many-instruments problem. Identification can also be achieved using the friends of friends' exogenous characteristics; however, if the network is very dense or the group size is very large, identification is weakened.

The proposed regularized 2SLS estimators, based on three regularization methods, help address the weak identification and many-moments problems. These estimators are consistent and asymptotically normal, and they achieve the asymptotic efficiency bound. I derive an optimal data-driven selection method for the regularization parameter. An application to the estimation of tax competition among Finnish municipalities shows the empirical relevance of my methods. (Inference using the standard errors of the regularized estimators does not account for regularization and should be interpreted with caution, given the relatively small sample of n = 411 municipalities.)

A Monte Carlo experiment shows that the regularized estimators perform well. The regularized 2SLS procedures substantially reduce the bias of the 2SLS estimators, especially in large samples. Moreover, the regularized estimators become more precise and less biased as the network density and the number of groups increase. These results show that regularization is a valuable solution to the potential weak identification problem in the estimation of network models.

Appendix: Summary of notation
To simplify notation, I use the following:
$P = P^\alpha$ and $q_j = q(\nu_j, \alpha)$;
$tr(A)$ is the trace of matrix $A$;
$e_j$ is the $j$th unit (column) vector, $j = 1,\dots,n$;
$e_f = \frac{1}{n}f'(I-P)f$ and $e_{f2} = \frac{1}{n}f'(I-P)^2f$;
$\Delta_f = tr(e_f)$ and $\Delta_{f2} = tr(e_{f2})$.

B Appendix: Lemmas
Lemma 0 (Lemmas 4 and 5 of Carrasco (2012)):
(i) $tr(P) = \sum_j q_j = O(1/\alpha)$ and $tr(P^2) = \sum_j q_j^2 = o\big((\sum_j q_j)^2\big)$ (Lemma 4 (i) of Carrasco (2012));
(ii) $\Delta_f = O_p(\alpha^\omega)$ for LF and PC, and $O_p(\alpha^{\min(\omega,2)})$ for T, and $f'(I-P)\varepsilon/\sqrt n = O_p(\sqrt{\Delta_f})$ (Lemma 5 (i) and (ii) of Carrasco (2012));
(iii) $u'P\varepsilon = O_p(1/\alpha)$ (Lemma 5 (iii) of Carrasco (2012));
(iv) $E[u'P\varepsilon\varepsilon'Pu|X] = (\sum_j q_j)^2\sigma_{u\varepsilon}\sigma'_{u\varepsilon} + (\sum_j q_j^2)(\sigma_{u\varepsilon}\sigma'_{u\varepsilon} + \sigma_\varepsilon^2\Sigma_u)$ (Lemma 5 (iv) of Carrasco (2012));
(v) $E[f'(I-P)\varepsilon\varepsilon'Pu/n\,|\,X] = O_p(\Delta_f/\sqrt{\alpha n})$ (Lemma 5 (viii) of Carrasco (2012)).

Lemma 1:
(i) $tr(P) = \sum_j q_j = O(1/\alpha)$ and $tr(P^2) = \sum_j q_j^2 = o\big((\sum_j q_j)^2\big)$.
(ii) Suppose that $\{A\}$ is a sequence of $n\times n$ UB matrices. For $B = PA$, $tr(B) = o\big((\sum_j q_j)^2\big)$, $tr(B^2) = o\big((\sum_j q_j)^2\big)$, and $\sum_i B_{ii}^2 = o\big((\sum_j q_j)^2\big)$, where the $B_{ii}$ are the diagonal elements of $B$.

Proof of Lemma 1:
(i) The proof is in Carrasco (2012), Lemma 4 (i).
(ii) By eigenvalue decomposition, $AA' = \Pi\Delta\Pi'$, where $\Pi$ is an orthonormal matrix and $\Delta$ is the eigenvalue matrix. It follows that $PAA'P \le \nu_{max}P^2$, with $\nu_{max}$ the largest eigenvalue, and hence $tr(PAA'P) \le \nu_{max}tr(P^2) = o_p\big((\sum_j q_j)^2\big)$. By the Cauchy-Schwarz inequality, $tr(B) \le [tr(P^2)]^{1/2}[tr(PAA'P)]^{1/2} = o_p\big((\sum_j q_j)^2\big)$. Also by the Cauchy-Schwarz inequality, $tr(B^2) \le tr(BB') = tr(PAA'P) = o\big((\sum_j q_j)^2\big)$.

Lemma 2:
Let $C$ and $D$ be two UB $n\times n$ matrix sequences.
(i) $C'PD = O_p(n/\alpha)$.
(ii) $\varepsilon'C'PD\varepsilon = O_p(1/\alpha)$ and $C'PD\varepsilon = O_p(\sqrt n/\alpha)$.

Proof of Lemma 2:
(i) By the Cauchy-Schwarz inequality, $|e_i'C'P^\alpha De_j| \le \sqrt{e_i'C'Ce_i}\sqrt{e_j'D'PDe_j} = O(n/\alpha)$, which implies that $C'PD = O(n/\alpha)$.
(ii) $E|\varepsilon'C'PD\varepsilon| \le \sqrt{E(\varepsilon'C'PC\varepsilon)}\sqrt{E(\varepsilon'D'PD\varepsilon)} = \sigma^2\sqrt{tr(C'PC)}\sqrt{tr(D'PD)} = O(\frac{1}{\alpha})$. By the Markov inequality, $\varepsilon'C'PD\varepsilon = O_p(\frac{1}{\alpha})$. By the Cauchy-Schwarz inequality, $|e_j'C'PD\varepsilon| \le \sqrt{e_j'C'Ce_j}\sqrt{\varepsilon'D'PD\varepsilon} = O_p(\sqrt n/\alpha)$, thus $C'PD\varepsilon = O_p(\sqrt n/\alpha)$.

Lemma 3:
Suppose $\tilde\rho$ is a consistent estimator of $\rho$ and $\tilde R = R(\tilde\rho)$. Then
$$\frac{1}{n}Z'\tilde R'P\tilde RZ = \frac{1}{n}Z'R'PRZ + O_p[(\tilde\rho-\rho)/\alpha]$$
and
$$\frac{1}{n}Z'\tilde R'P\tilde RR^{-1}\varepsilon = \frac{1}{n}Z'R'P\varepsilon + O_p\Big[(\tilde\rho-\rho)\tfrac{1}{\alpha\sqrt n}\big(1+\tfrac{1}{\alpha\sqrt n}\big)\Big].$$

Proof of Lemma 3:
$\tilde R = R - (\tilde\rho-\rho)M$. Thus,
$$Z'\tilde R'P\tilde RZ/n = Z'R'PRZ/n - (\tilde\rho-\rho)Z'M'PRZ/n - (\tilde\rho-\rho)Z'R'PMZ/n + (\tilde\rho-\rho)^2Z'M'PMZ/n.$$
Let us show that $Z'R'PMZ/n = O_p(1/\alpha)$ and $Z'M'PMZ/n = O_p(1/\alpha)$. Note that $Z = [WS^{-1}(X\beta + \iota\gamma), X] + WS^{-1}R^{-1}\varepsilon e_1'$. Under Assumption 3, $Z'R'PMZ/n = O(1/\alpha) + O_p(1/\sqrt n\alpha) + O_p(1/n\alpha) = O_p(1/\alpha)$, and $Z'M'PMZ/n = O_p(1/\alpha)$ by Lemma 2 (i). Similarly,
$$Z'\tilde R'P\tilde R\varepsilon/n = Z'R'P\varepsilon/n - (\tilde\rho-\rho)Z'M'P\varepsilon/n - (\tilde\rho-\rho)Z'R'PMR^{-1}\varepsilon/n + (\tilde\rho-\rho)^2Z'M'PMR^{-1}\varepsilon/n.$$
Using the same argument as in the previous case, under Assumption 3, $Z'R'PMR^{-1}\varepsilon/n = O_p(1/\sqrt n\alpha + 1/n\alpha) = O_p[\frac{1}{\alpha\sqrt n}(1+\frac{1}{\alpha\sqrt n})]$, and $Z'M'P\varepsilon/n = O_p[\frac{1}{\alpha\sqrt n}(1+\frac{1}{\alpha\sqrt n})]$ and $Z'M'PMR^{-1}\varepsilon/n = O_p[\frac{1}{\alpha\sqrt n}(1+\frac{1}{\alpha\sqrt n})]$ by Lemma 2 (ii).

Lemma 4:
If Assumptions 1-4 are satisfied and α → 0, then
(i) Z′R′PRZ/n = H + o_p(1) if α√n → ∞, and
(ii) Z′R′Pε/√n = f′ε/√n + o_p(1) if α√n → ∞.

Proof of Lemma 4:
Let v = JRWS⁻¹R⁻¹ε, so that JRZ = f + ve′.
(i) (1/n)Z′R′PRZ = (1/n)f′f − (1/n)f′(I − P)f + (1/n)ev′Pve′ + (1/n)f′Pve′ + (1/n)ev′Pf.
Let e_f = (1/n)f′(I − P)f and ē_f = (1/n)f′(I − P)²f, with Δ_f = tr(e_f) and Δ̄_f = tr(ē_f). By the Cauchy-Schwarz inequality,
(1/n)|e_i′f′(I − P)f e_j| ≤ (1/n)√(e_i′f′f e_i) √(e_j′f′(I − P)²f e_j) = O(√Δ_f).
From Carrasco (2012), Lemma 5(i), Δ_f = O_p(α^ω) for LF and SC, and O_p(α^{min(ω,2)}) for T. Thus, Δ_f = o_p(1). By Lemma 2(ii), (1/n)ev′Pve′ + (1/n)f′Pve′ + (1/n)ev′Pf = O_p(1/(nα) + 1/(α√n)) = o_p(1).
(ii) Z′R′Pε/√n = f′ε/√n − f′(I − P)ε/√n + ev′Pε/√n. By Lemma 5(ii) of Carrasco (2012), f′(I − P)ε/√n = O_p(√Δ_f), and by Lemma 2(ii), ev′Pε/√n = O_p(1/(α√n)).

C Appendix: Proofs of Propositions
Proof of Proposition 1:
The Cayley-Hamilton theorem of linear algebra states that every square matrix satisfies its own characteristic polynomial. The adjacency matrix of the network is W, an n × n matrix. Since W is symmetric, it is diagonalizable, so if it has only two distinct eigenvalues its minimal polynomial has degree two. Thus, there exist a_0, a_1 and a_2, with a_2 ≠ 0, such that a_0 I_n + a_1 W + a_2 W² = 0. Hence I_n, W and W² are linearly dependent, and from Proposition 1 of Bramoullé, Djebbari, and Fortin (2009) the network effects are not identified.

Proof of Proposition 2:
Under Assumption 2 (i.e. that sup ||λW|| < 1), f can be approximated by a linear combination of (JWX, JW²X, ..., JW^{̺_w−1}X) and JX. Indeed, using the Cayley-Hamilton theorem and the fact that W has ̺_w distinct eigenvalues, for any natural number q ≥ ̺_w, W^q can be written as a linear combination of I_n, W, ..., W^{̺_w−1}. Thus, WS⁻¹ can be written as a linear combination of I_n, W, ..., W^{̺_w−1}. Therefore, f can be approximated by a linear combination of (JWX, JW²X, ..., JW^{̺_w−1}X) and JX.

Assume first that [WX, W²X, ..., W^{̺_w−1}X, X] has full column rank. Let Q = J[WX, W²X, ..., W^{̺_w−1}X, X] be the set of instrumental variables. The identification of the network effects is based on the moment conditions E(Q′ε(ρ_0, δ_0)) = 0 (i.e. Q′f(δ − δ_0) = 0). The parameters are point identified if the solution to this equation is unique. A necessary and sufficient condition is that Q and f have full column rank. [WX, W²X, ..., W^{̺_w−1}X, X] has full column rank if and only if Q has full column rank. Moreover, if [WX, W²X, ..., W^{̺_w−1}X, X] has full column rank, then f has rank 1 + k.

Assume now that [WX, W²X, ..., W^{̺_w−1}X, X] does not have full column rank. Consider
B = {b ∈ R^{k×̺_w} : Xb_0 + WXb_1 + ... + W^{̺_w−1}Xb_{̺_w−1} = 0}.
Observe that f = [JWS⁻¹(Xβ), JX] is equivalent to f = J[Σ_{k=1}^{̺_w−1} ς_k W^k Xβ, X]. Consider
A = {a = (a_1, a_2) ∈ R^k × R : Xa_1 + a_2 Σ_{k=1}^{̺_w−1} ς_k W^k Xβ = 0}.
f does not have full column rank if and only if A ≠ {0}. In other words, f does not have full column rank if and only if there exists b ∈ B, b ≠ 0, such that b_0 = a_1 and b_k = a_2 ς_k β for all k = 1, ..., ̺_w − 1, with β and the ς_k known constants. The condition for f not having full column rank is therefore very restrictive; however, if such an element of A exists, then f does not have full column rank.

Note that, in general, it is possible to have JWS⁻¹(Xβ + ιγ) linearly independent from JX without [WX, W²X, ..., W^{̺_w−1}X, X] having full column rank. This happens if β, λ and γ are not in the part of the parameter space compatible with the null space of [WX, W²X, ..., W^{̺_w−1}X, X]. The condition that [WX, W²X, ..., W^{̺_w−1}X, X] has full column rank is therefore a necessary but not, in general, a sufficient condition for identification. If, however, we restrict the true value of the parameter to lie in the compatible set, as in Bramoullé, Djebbari, and Fortin (2009), Result 1(2), page 54, the condition is necessary and sufficient.
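The two rank arguments above are easy to probe numerically. The sketch below is my own construction (the networks, regressors and the truncation of the instrument block to three powers of W are illustrative choices, not from the paper): the complete graph has exactly two distinct eigenvalues, so I_n, W, W² are linearly dependent and identification fails, while a generic undirected network has many distinct eigenvalues and a generically full-column-rank instrument block.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2

# Case 1: complete graph -- two distinct eigenvalues (n - 1 and -1), so
# vec(I), vec(W), vec(W^2) are linearly dependent (Proposition 1 situation).
W_c = np.ones((n, n)) - np.eye(n)
M = np.column_stack([np.eye(n).ravel(), W_c.ravel(), (W_c @ W_c).ravel()])
print(np.linalg.matrix_rank(M))  # 2: W^2 is a combination of I and W

# Case 2: a generic undirected network -- count distinct eigenvalues and
# check the column rank of a truncated instrument block [X, WX, W^2X, W^3X].
A = rng.integers(0, 2, size=(n, n))
W = np.triu(A, 1) + np.triu(A, 1).T              # symmetric 0/1 adjacency matrix
X = rng.normal(size=(n, k))
rho_w = len(np.unique(np.round(np.linalg.eigvalsh(W), 6)))
Q = np.hstack([np.linalg.matrix_power(W, j) @ X for j in range(4)])
print(rho_w, np.linalg.matrix_rank(Q) == Q.shape[1])
```

With many distinct eigenvalues, using all relevant powers W, ..., W^{̺_w−1} produces a very large instrument set, which is exactly the many-instruments situation motivating regularization.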
Proof of Proposition 3:
The proof of Proposition 3 is similar to that of Proposition 2, with [WX, W²X, ..., W^{̺_w−1}X, X] replaced by Q_{̺_w} = [WX, W²X, ..., W^{̺_w−1}X, Wι, W²ι, ..., W^{̺_w−1}ι, X]. The identification results in this case are conditional on a consistent preliminary estimator of ρ, as in Liu and Lee (2010).

Proof of Proposition 4:
The regularized 2SLS estimator satisfies δ̂_{R2sls} − δ_0 = (Z′R̃′PR̃Z)⁻¹ Z′R̃′PR̃R⁻¹ε. By Lemmas 3 and 4, Z′R̃′PR̃Z/n = O_p(1) + O_p(1/(α√n)) and Z′R̃′PR̃R⁻¹ε/n = O_p(1/√n) + O_p[1/(nα(1 + α√n))]. Then, δ̂_{R2sls} − δ_0 = o_p(1) as α√n → ∞ and α → 0. This proves the consistency of the regularized 2SLS for SAR with many instruments. Moreover,
√n(δ̂_{R2sls} − δ_0) = (Z′R̃′PR̃Z/n)⁻¹ [Z′R̃′PR̃R⁻¹ε/√n].
Using Lemmas 3 and 4, as well as Slutsky's theorem,
√n(δ̂_{R2sls} − δ_0) →d N(0, σ_ε² H⁻¹) if α√n → ∞ and α → 0.

Proof of Proposition 5:
Let us consider the MSE of the estimated parameters: n(δ̂_{R2sls} − δ_0)(δ̂_{R2sls} − δ_0)′ = Ĥ⁻¹ĥĥ′Ĥ⁻¹, with Ĥ = Z′R̃′PR̃Z/n and ĥ = Z′R̃′PR̃R⁻¹ε/√n. Our objective is to approximate the MSE. To achieve this, I use a Nagar-type approximation in order to concentrate on the largest part of the MSE. By Lemma 3,
Ĥ = Z′R′PRZ/n − (ρ̃ − ρ)Z′M′PRZ/n − (ρ̃ − ρ)Z′R′PMZ/n + (ρ̃ − ρ)²Z′M′PMZ/n.
And Ĥ = Z′R′PRZ/n + O_p((ρ̃ − ρ)/α). By Lemma 4, we have that
Ĥ = (1/n)f′f − (1/n)f′(I − P)f + (1/n)ev′Pve′ + (1/n)f′Pve′ + (1/n)ev′Pf + O_p((ρ̃ − ρ)/α).
Let us define T^H = T_1^H + T_2^H + T_3^H, with T_1^H = −(1/n)f′(I − P)f, T_2^H = (1/n)ev′Pve′ and T_3^H = (1/n)f′Pve′ + (1/n)ev′Pf + O_p((ρ̃ − ρ)/α), such that
Ĥ = (1/n)f′f + T_1^H + T_2^H + T_3^H = H + T^H + o_p(1).
Following similar arguments, we have
ĥ = f′ε/√n − f′(I − P)ε/√n + ev′Pε/√n + O_p[(ρ̃ − ρ)/(α(1 + α√n))].
Let us also define T^h = T_1^h + T_2^h, with T_1^h = −f′(I − P)ε/√n and T_2^h = ev′Pε/√n + O_p[(ρ̃ − ρ)/(α(1 + α√n))]. We therefore have
ĥ = f′ε/√n + T_1^h + T_2^h = h + T^h + o_p(1).
Using a Nagar-type expansion of Ĥ⁻¹,
n(δ̂_{R2sls} − δ_0)(δ̂_{R2sls} − δ_0)′ = H⁻¹[I − T^H H⁻¹][hh′ + hT^{h′} + T^h h′ + T^h T^{h′}][I − H⁻¹T^H]H⁻¹ + o_p(1).
Let us define A(α) = [I − T^H H⁻¹] ℑ(α) [I − H⁻¹T^H], with ℑ(α) = hh′ + hT^{h′} + T^h h′ + T^h T^{h′}. Therefore,
A(α) = ℑ(α) + T^H H⁻¹ ℑ(α) H⁻¹ T^H − T^H H⁻¹ ℑ(α) − ℑ(α) H⁻¹ T^H.
E[ℑ(α) | X] = σ_ε²[H − e_f + (1/n)f′Pve′ + (1/n)ev′Pf + e_f] − E[(1/n)f′(I − P)εε′Pve′ + (1/n)ev′Pεε′(I − P)f | X] + E[(1/n)ev′Pεε′Pve′ | X].
E(T^H H⁻¹ ℑ(α) | X) = −σ_ε² e_f + o_p(1) and E(ℑ(α) H⁻¹ T^H | X) = −σ_ε² e_f + o_p(1).
E(T^H H⁻¹ ℑ(α) H⁻¹ T^H | X) = σ_ε² H O_p([1/(nα) + 1/(α√n) + Δ_f]²) = O_p([1/(nα) + 1/(α√n) + Δ_f]²).
We have
E(A(α) | X) = σ_ε² H + σ_ε² e_f + E[(1/n)ev′Pεε′Pve′ | X] − E[(1/n)f′(I − P)εε′Pve′ + (1/n)ev′Pεε′(I − P)f | X] + σ_ε²[(1/n)f′Pve′ + (1/n)ev′Pf] + O_p([1/(nα) + 1/(α√n) + Δ_f]²).
From Lemma 5(viii) of Carrasco (2012), we have
E[(1/n)f′(I − P)εε′Pve′ + (1/n)ev′Pεε′(I − P)f | X] = O_p(√Δ_f /√(αn))
and (1/n)ev′(P − P²)f = O_p(√Δ_f /√(αn)).
From Lemma 5(iii) of Carrasco (2012), (1/n)f′Pve′ + (1/n)ev′Pf = O_p(1/(nα)).
And, from Lemma 5(iv) of Carrasco (2012),
E[(1/n)ev′Pεε′Pve′ | X] = (1/n)(Σ_j q_j)² σ_ε⁴ e ι′D′Dι e′ + o_p((Σ_j q_j)²/n),
with D = JRWS⁻¹R⁻¹. We can conclude that
n(δ̂_{R2sls} − δ_0)(δ̂_{R2sls} − δ_0)′ = Q(α) + R̂(α),
with
E[Q(α) | X] = H⁻¹σ_ε² + H⁻¹[σ_ε² e_f + (1/n)(Σ_j q_j)² σ_ε⁴ e ι′D′Dι e′]H⁻¹
and
r(α) = E(R̂(α) | X) = o_p((Σ_j q_j)²/n) + O_p([1/(nα) + 1/(α√n) + Δ_f]² + 1/(nα) + √Δ_f /√(αn)).
Define S(α) = H⁻¹[σ_ε² e_f + (1/n)(Σ_j q_j)² σ_ε⁴ e ι′D′Dι e′]H⁻¹. Note that r(α)/tr(S(α)) = o_p(1); my argument is similar to that used in Carrasco (2012). This means that S(α) is the dominant part of the MSE of the estimation of the model using regularized 2SLS.

References

Angrist, J. D. (2014): "The perils of peer effects,"
Labour Economics, 30, 98–108.

Bai, J., and S. Ng (2010): "Instrumental Variable Estimation in a Data Rich Environment," Econometric Theory, 26(6), 1577–1606.

Bekker, P. A. (1994): "Alternative Approximations to the Distributions of Instrumental Variable Estimators," Econometrica, 62(3), 657–681.

Belloni, A., D. Chen, V. Chernozhukov, and C. Hansen (2012): "Sparse models and methods for optimal instruments with an application to eminent domain," Econometrica, 80(6), 2369–2429.

Bonacich, P. (1987): "Power and centrality: A family of measures," American Journal of Sociology, 92(5), 1170–1182.

Bramoullé, Y., H. Djebbari, and B. Fortin (2009): "Identification of peer effects through social networks," Journal of Econometrics, 150(1), 41–55.

Caner, M., and N. Yıldız (2012): "CUE with many weak instruments and nearly singular design," Journal of Econometrics, 170(2), 422–441.

Carrasco, M. (2012): "A regularization approach to the many instruments problem," Journal of Econometrics, 170(2), 383–398.

Carrasco, M., and J.-P. Florens (2000): "Generalization of GMM to a Continuum of Moment Conditions," Econometric Theory, 16(6), 797–834.

Carrasco, M., and J.-P. Florens (2014): "On the asymptotic efficiency of GMM," Econometric Theory, 30(2), 372–406.

Carrasco, M., J.-P. Florens, and E. Renault (2007): "Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization," in Handbook of Econometrics, ed. by J. Heckman and E. Leamer, vol. 6, chap. 77. Elsevier.

Carrasco, M., and G. Tchuente (2015): "Regularized LIML for many instruments," Journal of Econometrics, 186(2), 427–442.

Carrasco, M., and G. Tchuente (2016): "Efficient Estimation with Many Weak Instruments Using Regularization Techniques," Econometric Reviews, 35(8-10), 1609–1637.

Chao, J. C., and N. R. Swanson (2005): "Consistent Estimation with a Large Number of Weak Instruments," Econometrica, 73(5), 1673–1692.

Craven, P., and G. Wahba (1979): "Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, 31, 377–404.

Davidson, R., and J. G. MacKinnon (1993): Estimation and Inference in Econometrics. Oxford University Press.

Donald, S. G., and W. K. Newey (2001): "Choosing the Number of Instruments," Econometrica, 69(5), 1161–1191.

Dufour, J.-M., and C. Hsiao (2010): "Identification," in Microeconometrics, pp. 65–77. Springer.

Gibbons, S., and H. G. Overman (2012): "Mostly pointless spatial econometrics?," Journal of Regional Science, 52(2), 172–191.

Goldsmith-Pinkham, P., and G. W. Imbens (2013): "Social networks and the identification of peer effects," Journal of Business & Economic Statistics, 31(3), 253–264.

Guerre, E., and P. Lavergne (2005): "Data-driven rate-optimal specification testing in regression models," Annals of Statistics, pp. 840–870.

Han, C., and P. C. B. Phillips (2006): "GMM with Many Moment Conditions," Econometrica, 74(1), 147–192.

Hansen, C., J. Hausman, and W. Newey (2008): "Estimation With Many Instrumental Variables," Journal of Business & Economic Statistics, 26(4), 398–422.

Hansen, C., and D. Kozbur (2014): "Instrumental variables estimation with many weak instruments using regularized JIVE," Journal of Econometrics, 182(2), 290–308.

Hasselt, M. v. (2010): "Many instruments asymptotic approximations under nonnormal error distributions," Econometric Theory, 26(2), 633–645.

Kapetanios, G., and M. Marcellino (2010): "Factor-GMM estimation with large sets of possibly weak instruments," Computational Statistics and Data Analysis, 54(11), 2655–2675.

Kress, R. (1999): Linear Integral Equations. Springer.

Kuersteiner, G. (2012): "Kernel-weighted GMM estimators for linear time series models," Journal of Econometrics, 170(2), 399–421.

Lancaster, T. (2000): "The incidental parameter problem since 1948," Journal of Econometrics, 95(2), 391–413.

Lee, L.-F. (2004): "Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models," Econometrica, 72(6), 1899–1925.

Lee, L.-F. (2007): "Identification and estimation of econometric models with group interactions, contextual factors and fixed effects," Journal of Econometrics, 140(2), 333–374.

Li, K.-C. (1986): "Asymptotic optimality of C_L and generalized cross-validation in ridge regression with application to spline smoothing," The Annals of Statistics, 14(3), 1101–1112.

Li, K.-C. (1987): "Asymptotic optimality for C_p, C_L, cross-validation and generalized cross-validation: Discrete index set," The Annals of Statistics, 15(3), 958–975.

Liu, X., and L.-F. Lee (2010): "GMM estimation of social interaction models with centrality," Journal of Econometrics, 159(1), 99–115.

Liu, X., E. Patacchini, and Y. Zenou (2014): "Endogenous peer effects: local aggregate or local average?," Journal of Economic Behavior & Organization, 103, 39–59.

Lyytikäinen, T. (2012): "Tax competition among local governments: Evidence from a property tax reform in Finland," Journal of Public Economics, 96(7), 584–595.

Mallows, C. L. (1973): "Some Comments on C_p," Technometrics, 15(4), 661–675.

Manski, C. F. (1993): "Identification of endogenous social effects: The reflection problem," The Review of Economic Studies, 60(3), 531–542.

Newey, W. K. (2013): "Nonparametric instrumental variables estimation," The American Economic Review, 103(3), 550–556.

Newey, W. K., F. Hsieh, and J. Robins (1998): "Undersmoothing and bias corrected functional estimation."

Newey, W. K., and F. Windmeijer (2009): "Generalized method of moments with many weak moment conditions," Econometrica, 77(3), 687–719.

Neyman, J., and L. Scott (1948): "Consistent estimates based on partially consistent observations," Econometrica, 16(1), 1–32.

Okui, R. (2011): "Instrumental variable estimation in the presence of many moment conditions," Journal of Econometrics, 165(1), 70–86.

Öztürk, F., and F. Akdeniz (2000): "Ill-conditioning and multicollinearity," Linear Algebra and Its Applications, 321(1-3), 295–305.

Staiger, D., and J. H. Stock (1997): "Instrumental Variables Regression with Weak Instruments," Econometrica, 65(3), 557–586.

Stone, M. (1974): "Cross-validatory choice and assessment of statistical predictions," Journal of the Royal Statistical Society, Series B, 36(2), 111–147.

Tiebout, C. M. (1956): "A pure theory of local expenditures," Journal of Political Economy, 64(5), 416–424.

West, K. D., and W. K. Newey (1987): "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55(3), 703–708.

Appendix: Monte Carlo Simulation Results
Means, standard deviations (SD) and root mean square errors (RMSE) of the empirical distributions of the estimates are reported. Each data-generating process uses 500 replications.
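The three summary statistics can be reproduced from a vector of replication estimates as in the sketch below (the draws here are synthetic stand-ins for the 500 replication estimates of one parameter, not the paper's simulated data):

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 0.1
estimates = true_value + 0.05 * rng.normal(size=500)     # stand-in for 500 replication estimates

mean = estimates.mean()
sd = estimates.std(ddof=1)                               # sample standard deviation
rmse = np.sqrt(np.mean((estimates - true_value) ** 2))   # RMSE around the true value
print(f"{mean:.3f} ({sd:.3f}) [{rmse:.3f}]")             # Mean (SD) [RMSE], as in the tables
```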
Table 2: Simulation results with maximum of three connections (1/2)

m = 10; g = 30 and g = 60. True values: λ = 0.1, β1 = 0.2, β2 = 0.2, ρ = 0.1. Entries are Mean (SD) [RMSE].

                     λ                      β1                     β2                     ρ
2SLS (finite iv)     0.104 (0.136) [0.136]  0.203 (0.047) [0.047]  0.204 (0.049) [0.049]  0.116 (0.177) [0.178]
2SLS (large iv)      0.032 (0.081) [0.105]  0.196 (0.046) [0.046]  0.217 (0.043) [0.046]  -
Bias-corrected 2SLS  0.108 (0.099) [0.099]  0.202 (0.047) [0.047]  0.204 (0.045) [0.045]  -
T-2SLS               0.055 (0.088) [0.099]  0.193 (0.051) [0.052]  0.213 (0.046) [0.048]  -
LF-2SLS              0.064 (0.095) [0.101]  0.193 (0.053) [0.054]  0.212 (0.048) [0.049]  -
PC-2SLS              0.064 (0.095) [0.101]  0.193 (0.053) [0.054]  0.212 (0.048) [0.049]  -
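The T-2SLS rows use a Tikhonov-regularized first stage. A minimal version in a plain linear IV model (my own simulation and parameter choices, not the SAR design behind these tables) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)
n, L = 500, 50                                   # many, highly correlated instruments

# Simulated linear IV model: y = delta * z + eps, with z endogenous.
Sigma = 0.9 * np.ones((L, L)) + 0.1 * np.eye(L)  # pairwise instrument correlation 0.9
Q = rng.normal(size=(n, L)) @ np.linalg.cholesky(Sigma).T
u = rng.normal(size=n)
eps = 0.8 * u + rng.normal(size=n)               # endogeneity through the common shock u
z = Q @ np.full(L, 0.1) + u
y = 0.5 * z + eps

def tikhonov_2sls(y, z, Q, alpha):
    """2SLS with a Tikhonov (ridge) first stage: P = Q (Q'Q/n + alpha I)^{-1} Q'/n."""
    G = Q.T @ Q / len(y) + alpha * np.eye(Q.shape[1])
    z_hat = Q @ np.linalg.solve(G, Q.T @ z / len(y))
    return (z_hat @ y) / (z_hat @ z)

delta_hat = tikhonov_2sls(y, z, Q, alpha=0.1)
print(delta_hat)                                 # close to the true delta = 0.5
```

The regularization parameter alpha = 0.1 is an arbitrary illustrative choice here; in the paper it is selected by data-driven criteria.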
Table 3: Simulation results with maximum of three connections (2/2)

m = 15; g = 30 and g = 60. True values: λ = 0.1, β1 = 0.2, β2 = 0.2, ρ = 0.1. Entries are Mean (SD) [RMSE].

                     λ                      β1                     β2                     ρ
2SLS (finite iv)     0.093 (0.103) [0.103]  0.198 (0.037) [0.037]  0.200 (0.039) [0.039]  0.109 (0.143) [0.143]
2SLS (large iv)      0.066 (0.061) [0.070]  0.197 (0.038) [0.038]  0.206 (0.034) [0.035]  -
Bias-corrected 2SLS  0.096 (0.072) [0.072]  0.198 (0.038) [0.038]  0.199 (0.036) [0.036]  -
T-2SLS               0.079 (0.069) [0.072]  0.196 (0.043) [0.043]  0.204 (0.037) [0.037]  -
LF-2SLS              0.082 (0.074) [0.076]  0.196 (0.046) [0.046]  0.203 (0.040) [0.040]  -
PC-2SLS              0.082 (0.074) [0.076]  0.196 (0.046) [0.046]  0.203 (0.040) [0.040]  -

Table 4: Simulation results with maximum of six connections (1/2)

m = 10; g = 30 and g = 60. True values: λ = 0.1, β1 = 0.2, β2 = 0.2, ρ = 0.1. Entries are Mean (SD) [RMSE].

                     λ                      β1                     β2                     ρ
2SLS (finite iv)     0.099 (0.090) [0.090]  0.204 (0.051) [0.051]  0.207 (0.034) [0.035]  0.118 (0.158) [0.159]
2SLS (large iv)      0.053 (0.038) [0.060]  0.193 (0.047) [0.048]  0.209 (0.032) [0.034]  -
Bias-corrected 2SLS  0.099 (0.064) [0.064]  0.203 (0.049) [0.049]  0.205 (0.034) [0.034]  -
T-2SLS               0.066 (0.044) [0.056]  0.184 (0.054) [0.056]  0.205 (0.035) [0.035]  -
LF-2SLS              0.072 (0.049) [0.057]  0.180 (0.058) [0.061]  0.204 (0.036) [0.036]  -
PC-2SLS              0.072 (0.049) [0.057]  0.180 (0.058) [0.061]  0.204 (0.036) [0.036]  -
Table 5: Simulation results with maximum of six connections (2/2)

m = 15; g = 30 and g = 60. True values: λ = 0.1, β1 = 0.2, β2 = 0.2, ρ = 0.1. Entries are Mean (SD) [RMSE].

                     λ                      β1                     β2                     ρ
2SLS (finite iv)     0.103 (0.050) [0.050]  0.200 (0.039) [0.039]  0.200 (0.026) [0.026]  0.108 (0.100) [0.100]
2SLS (large iv)      0.086 (0.030) [0.033]  0.196 (0.039) [0.039]  0.202 (0.024) [0.025]  -
Bias-corrected 2SLS  0.105 (0.035) [0.035]  0.200 (0.039) [0.039]  0.199 (0.025) [0.025]  -
T-2SLS               0.094 (0.035) [0.035]  0.193 (0.043) [0.043]  0.201 (0.027) [0.027]  -
LF-2SLS              0.097 (0.038) [0.039]  0.193 (0.044) [0.044]  0.201 (0.028) [0.028]  -
PC-2SLS              0.097 (0.038) [0.039]  0.193 (0.044) [0.044]  0.201 (0.028) [0.028]  -