Identifying Network Ties from Panel Data: Theory and an Application to Tax Competition
Áureo de Paula, Imran Rasul, Pedro CL Souza †
April 2020

Abstract
Social interactions determine many economic behaviors, but information on social ties does not exist in most publicly available and widely used datasets. We present results on the identification of social networks from observational panel data that contains no information on social ties between agents. In the context of a canonical social interactions model, we provide sufficient conditions under which the social interactions matrix, endogenous and exogenous social effect parameters are all globally identified. While this result is relevant across different estimation strategies, we then describe how high-dimensional estimation techniques can be used to estimate the interactions model based on the Adaptive Elastic Net GMM method. We employ the method to study tax competition across US states. We find the identified social interactions matrix implies tax competition differs markedly from the common assumption of competition between geographically neighboring states, providing further insights for the long-standing debate on the relative roles of factor mobility and yardstick competition in driving tax setting behavior across states. Most broadly, our identification and application show the analysis of social interactions can be extended to economic realms where no network data exists.
JEL Codes: C31, D85, H71.

∗ We gratefully acknowledge financial support from the ESRC through the Centre for the Microeconomic Analysis of Public Policy (RES-544-28-0001), the Centre for Microdata Methods and Practice (RES-589-28-0001) and the Large Research Grant ES/P008909/1 and from the ERC (SG338187). We thank Edo Airoldi, Luis Alvarez, Oriana Bandiera, Larry Blume, Yann Bramoullé, Stephane Bonhomme, Vasco Carvalho, Gary Chamberlain, Andrew Chesher, Christian Dustmann, Sérgio Firpo, Jean-Pierre Florens, Eric Gautier, Giacomo de Giorgi, Matthew Gentzkow, Stefan Hoderlein, Bo Honoré, Matt Jackson, Dale Jorgensen, Christian Julliard, Maximilian Kasy, Miles Kimball, Thibaut Lamadon, Simon Sokbae Lee, Arthur Lewbel, Tong Li, Xiadong Liu, Elena Manresa, Charles Manski, Marcelo Medeiros, Angelo Mele, Francesca Molinari, Pepe Montiel, Andrea Moro, Whitney Newey, Ariel Pakes, Eleonora Pattachini, Michele Pelizzari, Martin Pesendorfer, Christiern Rose, Adam Rosen, Bernard Salanie, Olivier Scaillet, Sebastien Siegloch, Pasquale Schiraldi, Tymon Sloczynski, Kevin Song, John Sutton, Adam Szeidl, Thiago Tachibana, Elie Tamer, and seminar and conference participants for valuable comments. We also thank Tim Besley and Anne Case for comments and sharing data. Previous versions of this paper were circulated as “Recovering Social Networks from Panel Data: Identification, Simulations and an Application.” All errors remain our own. Simulation and estimation codes are available upon request.

† de Paula: University College London, CeMMAP and IFS, [email protected]; Rasul: University College London and IFS, [email protected]; Souza: Warwick University, [email protected]

Introduction
In many economic environments, behavior is shaped by social interactions between agents. In individual decision problems, social interactions have been key to understanding outcomes as diverse as educational test scores, the demand for financial assets, and technology adoption (Sacerdote, 2001; Bursztyn et al., 2014; Conley and Udry, 2010). In macroeconomics, the structure of firms’ production and credit networks propagates shocks, or helps firms to learn (Acemoglu et al., 2012; Chaney, 2014). In political economy, ties between jurisdictions are key to understanding tax setting behavior (Tiebout, 1956; Shleifer, 1985; Besley and Case, 1994).

Underpinning all these bodies of research is some measurement of the underlying social ties between agents. However, information on social ties does not exist in most publicly available and widely used datasets. To overcome this limitation, studies of social interaction either postulate ties based on common observables or homophily, or elicit data on networks. However, it is increasingly recognized that postulated and elicited networks remain imperfect solutions to the fundamental problem of missing data on social ties, because of econometric concerns that arise with either method, or simply because of the cost of collecting network data.

Two consequences are that: (i) classes of problems in which social interactions occur are understudied, because social networks data is missing or too costly to collect; (ii) there is no way to validate social interactions analysis in contexts where ties are postulated. In this paper we tackle this challenge by deriving sufficient conditions under which global identification of the entire structure of social networks is obtained, using only observational panel data that itself contains no information on network ties.
Our identification results allow the study of social interactions without data on social networks, and the validation of structures of social interaction where social ties have hitherto been postulated. The recovered networks are economically meaningful to explain the effects under study since they are entirely estimated from the data itself, and not driven by ex ante assumptions on how individuals interact.

Footnote: As detailed in de Paula (2017), elicited networks are often self-reported, and can introduce error for the outcome of interest. Network data can be censored if only a limited number of links can feasibly be reported. Incomplete survey coverage of nodes in a network may lead to biased aggregate network statistics. Chandrasekhar and Lewis (2016) show that even when nodes are randomly sampled from a network, partial sampling leads to non-classical measurement error, and biased estimation. Collecting social network data is also a time and resource intensive process. In response to these concerns, a nascent strand of literature explores cost-effective alternatives to full elicitation to recover aggregate network statistics (Breza et al., 2019).

A researcher is assumed to have panel data on individuals $i = 1, \ldots, N$ for instances $t = 1, \ldots, T$. An instance refers to a specific observation for $i$ and need not correspond to a time period (for example, if $i$ refers to a firm, $t$ could refer to market $t$). The outcome of interest for individual $i$ in instance $t$ is $y_{it}$ and is generated according to a canonical structural model of social interactions:

$$y_{it} = \rho_0 \sum_{j=1}^{N} W_{0,ij}\, y_{jt} + \beta_0 x_{it} + \gamma_0 \sum_{j=1}^{N} W_{0,ij}\, x_{jt} + \alpha_i + \alpha_t + \epsilon_{it} \tag{1}$$

Outcome $y_{it}$ depends on the outcome of other individuals to whom $i$ is socially tied, $y_{jt}$, and $x_{jt}$ includes characteristics of those individuals (or lagged values of $y_{it}$). $W_{0,ij}$ measures how the outcome and characteristics of $j$ causally impact the outcome for $i$.
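To make the data generating process in (1) concrete, the following is a minimal simulation sketch, not the authors' code. The function name `simulate_panel`, the three-node network and the parameter values are hypothetical choices for illustration, and Gaussian draws are assumed for the covariate, fixed effects and disturbances.

```python
import numpy as np


def simulate_panel(W, rho, beta, gamma, T, seed=0):
    """Simulate outcomes y and covariates x (each T x N) from model (1).

    Hypothetical helper: all random components are drawn as standard normal.
    """
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    x = rng.normal(size=(T, N))        # characteristics x_it
    eps = rng.normal(size=(T, N))      # disturbances eps_it
    alpha_i = rng.normal(size=N)       # individual effects alpha_i
    alpha_t = rng.normal(size=T)       # common shocks alpha_t
    # Everything on the right-hand side except the endogenous term rho * W y_t.
    rhs = beta * x + gamma * x @ W.T + alpha_i[None, :] + alpha_t[:, None] + eps
    # Solve (I - rho W) y_t = rhs_t for all instances t at once.
    y = np.linalg.solve(np.eye(N) - rho * W, rhs.T).T
    return y, x, alpha_i, alpha_t, eps


# Illustrative line network: node 1 is tied to nodes 0 and 2 (rows sum to one).
W_demo = np.array([[0.0, 1.0, 0.0],
                   [0.5, 0.0, 0.5],
                   [0.0, 1.0, 0.0]])
y_demo, x_demo, a_i, a_t, eps_demo = simulate_panel(W_demo, rho=0.3, beta=0.4,
                                                    gamma=0.5, T=20, seed=1)
```

Because outcomes appear on both sides of (1), the sketch solves the linear system for each instance rather than computing the right-hand side directly.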
As outcomes for all individuals obey equations analogous to (1), the system of equations can be written in matrix notation where the structure of interactions is captured by the adjacency matrix, denoted $W_0$. Our approach allows for unobserved heterogeneity across individuals $\alpha_i$ and common shocks to all individuals $\alpha_t$. This framework encompasses the classic linear-in-means specification of Manski (1993). In his terminology, $\rho_0$ and $\gamma_0$ capture endogenous and exogenous social effects, and $\alpha_t$ captures correlated effects. The distinction between endogenous and exogenous peer effects is critical, as only the former generates social multiplier effects.

Manski’s seminal contribution set out the reflection problem of separately identifying endogenous, exogenous and correlated effects in linear models. However, it has been somewhat overlooked that he also set out another challenge on the identification of the social network in the first place. This is the problem we tackle and so expand the scope of identification beyond $\rho_0$, $\beta_0$ and $\gamma_0$. Our point of departure from much of the literature is therefore to presume $W_0$ is entirely unknown to the researcher. We derive sufficient conditions under which all the entries in $W_0$, and the endogenous and exogenous social effect parameters, $\rho_0$ and $\gamma_0$, are globally identified. By identifying the social interactions matrix $W_0$, our results allow the recovery of aggregate network characteristics such as the degree distribution and patterns of homophily, as well as node-level statistics such as the strength of social interactions between nodes, and the centrality of nodes. This is useful because such aggregate and node-level statistics often map back to underlying models of social interaction (Ballester et al., 2006; Jackson et al., 2017; de Paula, 2017).

The mathematical strategy for our identification result is new and fundamentally different from those employed elsewhere in this nascent literature (and does not rely on requirements on network sparsity).
However, it delivers sufficient conditions that are mild, and relate to existing results on the identification of social effects parameters when $W_0$ is known (Bramoullé et al., 2009; De Giorgi et al., 2010; Blume et al., 2015). Our identification result is also useful in other estimation contexts, such as when a researcher has partial knowledge of $W_0$, or in navigating between priors on reduced-form (later denoted $\Pi$) and structural (later denoted $\theta$) parameters in a Bayesian framework, thus avoiding issues raised in Kline and Tamer (2016).

Footnote: Blume et al. (2015) present micro-foundations based on non-cooperative games of incomplete information for individual choice problems, that result in this estimating equation for a class of social interaction models.

Footnote: Manski (1993) highlights difficulties (and potential restrictions) for identifying $\rho_0$, $\beta_0$ and $\gamma_0$ when all individuals interact with each other, and when this is observed by the researcher. In (1), this corresponds to $W_{0,ij} = N^{-1}$, for $i, j = 1, \ldots, N$. At the same time, he states (p. 536), “I have presumed that researchers know how individuals form reference groups and that individuals correctly perceive the mean outcomes experienced by their supposed reference groups. There is substantial reason to question these assumptions (...) If researchers do not know how individuals form reference groups and perceive reference-group outcomes, then it is reasonable to ask whether observed behavior can be used to infer these unknowns (...) The conclusion to be drawn is that informed specification of reference groups is a necessary prelude to analysis of social effects.”

Global identification is a necessary requirement for consistency of extremum estimators such as those based on GMM (Hansen 1982; Newey and McFadden 1994). Our identification analysis thus provides primitives for this important condition.
To estimate the model, we employ the Adaptive Elastic Net GMM method (Caner and Zhang, 2014) because this allows us to deal with a potentially high-dimensional parameter vector (in comparison to the time dimension in the data) including all the entries of the social interactions matrix $W_0$, though other estimation protocols may also be entertained (e.g. using Bayesian methods or a priori information).

We showcase the method using Monte Carlo simulations based on stylized random network structures as well as real world networks. In each case, we take a fixed network structure $W_0$, and simulate panel data as if the data generating process were given by (1). We then apply the method on the simulated panel data to recover estimates of all elements in $W_0$, as well as the endogenous and exogenous social effect parameters ($\rho_0$, $\gamma_0$). The networks considered vary in size, complexity, and their aggregate and node-level features. Despite this heterogeneity, we find the method to perform well in all simulations. In a reasonable dimension of panel data $T$ and with varying node numbers across simulations ($N$), we find the true network structure $W_0$ is well recovered. For each simulated network, the majority of true links are correctly identified even for $T = 5$, and the proportion of true non-links (zeroes in $W_0$) captured correctly as zeros is over % even when $T = 5$. Both proportions rapidly increase with $T$. The endogenous and exogenous social effects are also correctly captured as $T$ increases. Of course, small-$T$ biases are as expected, being analogous to well-known results for autoregressive time series models. A fortiori, we estimate aggregate and node-level statistics of each network, demonstrating the accurate recovery of key players in networks for example. Furthermore, biases in the estimation of endogenous and exogenous effects ($\hat{\rho}$, $\hat{\gamma}$) fall quickly with $T$ and are close to zero for large sample sizes.

Footnote: One such example is the nascent literature of Aggregate Relational Data (ARD) as in Breza et al. (2019). Another possibility is that individuals are known to belong to subgroups, so $W_0$ is block diagonal. We also interpret this situation generally as partial network knowledge. Partial information on the network is readily incorporated in this setup and may, in some circumstances, greatly reduce the number of parameters to be estimated. For example, Barrenho et al. (2019) study spillovers in the adoption of laparoscopy using consultants in the UK health service. With a sample of N = 1 , consultants and T = 15 years, they restrict links to occur between doctors who have worked in the same hospital.

Footnote: The elastic net was introduced by Zou and Hastie (2005) in part to circumvent difficulties faced by alternative estimation protocols (e.g., LASSO) when the number of parameters, $p$, exceeds the number of observations, $n$ (where $p$ and $n$ follow the notation in that paper). Whereas the theoretical results on the large sample properties of elastic net estimators usually have not exploited sparsity, several articles have demonstrated its performance in data scenarios where this occurs. For example, Zou and Hastie (2005) consider an application to leukemia classification where p = 7 , and n = 72 (see their Section 6) and Zou and Zhang (2009) explore a scenario where p = 1 , and n = 200. The favourable performance of the elastic net in these cases also relates to the literature on the ‘effective number of parameters’ (or ‘effective degrees of freedom’) in the estimation of sparse models (Tibshirani and Taylor, 2012). In Section 3 we provide an informal calculation for the minimum number of time periods such that penalized estimation is feasible in our context.

The practical use of our proposed method is already being demonstrated in a range of applications. For example, Fetzer et al. (2020) study the impact on conflict of the transition of security responsibilities between international and Afghan forces.
Our proposed method is used to control for violation of SUTVA-type hypotheses that might occur because of spillover and displacement effects of insurgent forces across districts. Since the pattern of displacement is unobserved (and, in fact, insurgents have incentives to obfuscate their strategy), the current method is applied to fully recover the network and bound the effects of the end of the military occupation on conflict. Our identification results have also been applied, for instance, by Zhou (2019), who focuses on unobserved networks with grouped heterogeneity, and suggests a nonlinear least squares procedure for estimation on a single network observation. We believe our technical arguments may also be useful in related models (e.g., Vector Auto-Regressions).

In the final part of our analysis, we apply the method to shed new light on a classic real world social interactions problem: tax competition between US states. The literatures in political economy and public economics have long recognized the behavior of state governors might be influenced by decisions made in ‘neighboring’ states. The typical empirical approach has been to postulate the relevant neighbors as being geographically contiguous states. Our approach allows us to infer the set of economic neighbors determining social interactions in tax setting behavior from panel data on outcomes and covariates alone. In this application, the panel data dimensions cover mainland US states, $N = 48$, for years 1962-2015, $T = 53$.

We find the identified network structure of tax competition to differ markedly from the common assumption of competition between geographic neighbors. The identified network has fewer edges than the geography-based network, which is reflected in the far lower clustering coefficient in the identified network than in the geographic network ( . versus . ).
With the recovered social interactions matrix we establish, beyond geography, what covariates correlate to the existence of ties between states and the strength of those ties. We identify non-adjacent states that influence tax setting and, more broadly, we establish that social interactions are highly asymmetric: some states, such as Delaware (a well-known low-tax state), are especially focal in driving tax setting in other jurisdictions. We use all these results to shed new light on the main hypotheses for social interactions in tax setting: factor mobility and yardstick competition (Tiebout, 1956; Shleifer, 1985; Besley and Case, 1994).

Our paper contributes to the literature on the identification of social interactions models. The first generation of papers studied the case where $W_0$ is known, so only the endogenous and exogenous social effects parameters need to be identified. It is now established that if the known $W_0$ differs from the linear-in-means example, $\rho_0$ and $\gamma_0$ can be identified (Bramoullé et al., 2009; De Giorgi et al., 2010). Intuitively, identification in those cases can use peers-of-peers, who are not necessarily connected to individual $i$ and can be used to leverage variation from exclusion restrictions in (1), or can use groups of different sizes within which all individuals interact among each other (Lee, 2007). Bramoullé et al. (2009) show these conditions are met if $I$, $W_0$ and $W_0^2$ are linearly independent, which is shown to hold generically by Blume et al. (2015). However, as made precise in Section 2, the linear algebraic arguments employed in Bramoullé et al. (2009) or Blume et al. (2015) do not apply when $W_0$ is unobserved and other arguments have to be used instead.

Our paper builds on these papers by studying the problem where $W_0$ is entirely unknown to the researcher. In so doing, we open up the study of social interactions to the many realms where complete social network data does not actually exist. Closely related to our work, Blume et al.
(2015) investigate the case when $W_0$ is partially observed. Specifically, Blume et al. (2015, Theorem 6) show that if two individuals are known to not be directly connected, the parameters of interest in a model related to (1) can be identified. An alternative approach is taken in Blume et al. (2011, Theorem 7): they suggest a parameterization of $W_0$ according to a pre-specified distance between nodes. We do not impose such restrictions, but note that partial observability of $W_0$ (as in Blume et al., 2015) or placing additional structure on $W_0$ (as in Blume et al., 2011) is complementary to our approach as it reduces the number of parameters in $W_0$ to be retrieved.

Bonaldi et al. (2015) and Manresa (2016) estimate models like (1) when $W_0$ is not observed, but where $\rho_0$ is restricted to be zero so there are no endogenous social effects. They use sparsity-inducing methods from the statistics literature, but the presence of $\rho_0$ in our case complicates identification non-trivially because it introduces issues of simultaneity that we address.

Rose (2015) also presents related identification results for linear models like (1). Assuming sparsity of the neighborhood structure, Rose (2015) offers identification conditions under rank restrictions on sub-matrices of the reduced form coefficient matrix from a regression of outcomes ($y_{it}$) on covariates ($x_{it}$). Intuitively, given two observationally equivalent systems, sparsity guarantees the existence of pairs that are not connected in either. Since observationally equivalent systems are linked via the reduced-form coefficient matrix, this pair allows one to identify certain parameters in the model. Having identified those parameters, Rose (2015) shows that one can proceed to identify other aspects of the structure (see also Gautier and Rose, 2016). This is related to the ideas in Blume et al. (2015, Theorem 6), who show identification results can be leveraged if individuals are known not to be connected.
Our main identification results do not rely on properties of sparse networks, and make use of plausible and intuitive conditions, whereas the auxiliary rank conditions necessary in Rose (2015) may be computationally complex to verify. More recently, Lewbel et al. (2019) propose an estimation strategy for the parameters $\rho_0$, $\beta_0$ and $\gamma_0$ of model (1) in the absence of network links if many different groups are able to be observed. Battaglini et al. (2019) estimate a structural model specifically for the case of unobserved social connections in the U.S. Congress. Finally, in the statistics literature, Lam and Souza (2019) study the penalized estimation of model (1) when $W_0$ is not observed, assuming the model and social interactions are identified.

Footnote: Alternative identification approaches when $W_0$ is known focus on higher moments (variances and covariances across individuals) of outcomes (de Paula, 2017), and rely on additional restrictions on higher moments of $\epsilon_{it}$. Note that (1) is a spatial autoregressive model. In that literature, $W_0$ is also typically assumed known (Anselin, 2010).

Footnote: Manresa (2016) allows for unit-specific $\beta_0$ parameters. While in many applications those are taken to be homogeneous, we also discuss extensions on how heterogeneity in those parameters can be handled when $\rho_0 \neq 0$.

The statistical literature on graphical models has investigated the estimation of neighborhoods defined by the covariance structure of the random variables at hand (Meinshausen and Buhlmann, 2006). This corresponds to a model where $y_t = (I - \rho_0 W_0)^{-1} \epsilon_t$ is jointly normal (abstracting from covariates). On a graph with $N$ nodes corresponding to the variables in the model, an edge between two nodes (variables) $i$ and $j$ is absent when these two variables are conditionally independent given the other nodes. In this Gaussian model, this corresponds to a zero $ij$ entry in the inverse covariance matrix for $y_t$ (see, e.g., Yuan and Lin, 2007, p. 19).
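A small numerical sketch of the Gaussian setting just described may help fix ideas. It assumes a hypothetical three-node line network, an illustrative value for the endogenous effect, and an identity covariance for the disturbances; none of these values come from the paper. It computes the inverse covariance matrix of $y_t$ numerically and compares its zero pattern with that of the interactions matrix.

```python
import numpy as np

# Hypothetical line network: node 1 is tied to nodes 0 and 2,
# while nodes 0 and 2 are not directly tied to each other.
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
rho = 0.3                 # assumed endogenous effect
sigma_eps = np.eye(3)     # assumed covariance of the disturbances

A = np.linalg.inv(np.eye(3) - rho * W)   # y_t = A eps_t
cov_y = A @ sigma_eps @ A.T              # covariance of y_t
precision = np.linalg.inv(cov_y)         # inverse covariance of y_t

# No direct tie between nodes 0 and 2 in the interactions matrix ...
no_direct_tie = W[0, 2] == 0.0 and W[2, 0] == 0.0
# ... but the corresponding precision entry is non-zero, so the
# conditional-independence graph does have an edge between them.
graphical_edge = abs(precision[0, 2]) > 1e-8
```

In this example the two end nodes share a common neighbor, which is enough to make them conditionally dependent even though neither enters the other's row of the interactions matrix.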
In the model above, the inverse covariance matrix is $(I - \rho_0 W_0)^{\top} \Sigma_{\epsilon}^{-1} (I - \rho_0 W_0)$, where $\Sigma_{\epsilon}$ is the variance-covariance structure for $\epsilon_t$. The discovery of zero entries in this matrix is not equivalent to the identification of $W_0$ as we study, and involves $\Sigma_{\epsilon}$ (as do identification strategies using higher moments when $W_0$ is known). Related studies in the statistics literature also focus on higher moments and define neighborhoods differently (Diebold and Yilmaz, 2015; Rothenhäusler et al., 2015).

Our conclusions discuss how our approach can be modified, and assumptions weakened, to integrate partial knowledge of $W_0$. We also discuss the next steps required to simultaneously identify models of network formation and the structure of social interactions.

The paper is organized as follows. Section 2 presents our core result: the sufficient conditions under which the social interactions matrix, endogenous and exogenous social effects are globally identified. Section 3 describes the high-dimensional estimation techniques used, based on the Adaptive Elastic Net GMM method, and presents simulation results from stylized and real-world networks. Section 4 applies our methods to study tax competition between US states. Section 5 concludes. The Appendix provides proofs and further details on estimation and simulations.

Consider a researcher with panel data covering $i = 1, \ldots, N$ individuals repeatedly observed over $t = 1, \ldots, T$ instances. We consider that the number of individuals $N$ in the network is fixed, but potentially large. The aim is to use this data to identify a social interactions model, with no data on actual social ties being available. For expositional ease, we first consider identification in a simpler version of the canonical model in (1), where we drop individual-specific ($\alpha_i$) and time-constant fixed effects ($\alpha_t$), and assume $x_{it}$ is a one-dimensional regressor for individual $i$ and instance $t$.
Of course, we later extend the analysis to include individual-specific and time-constant fixed effects, and also allow for multidimensional covariates $x_{k,it}$, $k = 1, \ldots, K$. We adopt the subscript “0” to denote the true parameters that generate the data:

$$y_{it} = \rho_0 \sum_{j=1}^{N} W_{0,ij}\, y_{jt} + \beta_0 x_{it} + \gamma_0 \sum_{j=1}^{N} W_{0,ij}\, x_{jt} + \epsilon_{it}. \tag{2}$$

As outcomes for all individuals $i = 1, \ldots, N$ obey equations analogous to (2), the system of equations can be more compactly written in matrix notation as:

$$y_t = \rho_0 W_0 y_t + \beta_0 x_t + \gamma_0 W_0 x_t + \epsilon_t. \tag{3}$$

The vector of outcomes $y_t = (y_{1t}, \ldots, y_{Nt})'$ assembles the individual outcomes in instance $t$; the vector $x_t$ does the same with individual characteristics. $y_t$, $x_t$ and $\epsilon_t$ have dimension $N \times 1$, the social interactions matrix $W_0$ is $N \times N$, and $\rho_0$, $\beta_0$, and $\gamma_0$ are scalar parameters. We do not make any distributional assumptions on $\epsilon_t$ beyond $E(\epsilon_t | x_t) = 0$ (or $E(\epsilon_t | z_t) = 0$ for an appropriate instrumental variable $z_t$ if $x_t$ is also endogenous). We assume the network structure is predetermined and constant, and that the number of individuals $N$ is fixed. The network structure $W_0$ is a parameter to be identified and estimated.

Footnote: In fact, Meinshausen and Buhlmann (2006)’s neighborhood estimates (as also Lam and Souza (2019)’s) rely on (penalized) regressions of $y_{it}$ on $y_{1t}, \ldots, y_{i-1,t}, y_{i+1,t}, \ldots, y_{N,t}$, which do not address the econometric endogeneity in estimating $W_0$.

A regression of outcomes on covariates corresponds, then, to the reduced form for (3),

$$y_t = \Pi_0 x_t + \nu_t, \tag{4}$$

with $\Pi_0 = (I - \rho_0 W_0)^{-1} (\beta_0 I + \gamma_0 W_0)$ and $\nu_t \equiv (I - \rho_0 W_0)^{-1} \epsilon_t$. If $W_0$ is observed, Bramoullé et al. (2009) note that a structure $(\rho, \beta, \gamma)$ that is observationally equivalent to $(\rho_0, \beta_0, \gamma_0)$ is such that $(I - \rho_0 W_0)^{-1} (\beta_0 I + \gamma_0 W_0) = (I - \rho W_0)^{-1} (\beta I + \gamma W_0)$. This equation can be written as a linear equation in $I$, $W_0$ and $W_0^2$, and identification is established if those matrices are linearly independent.
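The reduced form and the known-network condition above can be illustrated numerically. The sketch below is illustrative only: the helper names `reduced_form` and `known_network_rank`, the two three-node networks and the parameter values are hypothetical. It computes the reduced-form matrix for a given structure and checks whether the identity matrix, the network and its square are linearly independent by stacking them as vectors and computing the rank.

```python
import numpy as np


def reduced_form(W, rho, beta, gamma):
    """Reduced-form matrix (I - rho W)^{-1} (beta I + gamma W)."""
    I = np.eye(W.shape[0])
    return np.linalg.solve(I - rho * W, beta * I + gamma * W)


def known_network_rank(W):
    """Rank of {I, W, W^2} stacked as vectors; 3 means linearly independent."""
    I = np.eye(W.shape[0])
    stacked = np.stack([I.ravel(), W.ravel(), (W @ W).ravel()])
    return np.linalg.matrix_rank(stacked)


# Hypothetical line network (node 1 tied to nodes 0 and 2).
W_line = np.array([[0.0, 1.0, 0.0],
                   [0.5, 0.0, 0.5],
                   [0.0, 1.0, 0.0]])
# Linear-in-means network: everyone is tied equally to everyone else.
W_means = (np.ones((3, 3)) - np.eye(3)) / 2.0

Pi_line = reduced_form(W_line, rho=0.3, beta=0.4, gamma=0.5)
rank_line = known_network_rank(W_line)    # independent for the line network
rank_means = known_network_rank(W_means)  # dependent in the linear-in-means case
```

For the linear-in-means network the square of the matrix is a linear combination of the identity and the matrix itself, so the rank drops below three, in line with the reflection problem when the known network takes that form.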
If $W_0$ is not observed, the putative unobserved structure now comprises $W$, and an observationally equivalent parameter vector will instead satisfy $(I - \rho_0 W_0)^{-1} (\beta_0 I + \gamma_0 W_0) = (I - \rho W)^{-1} (\beta I + \gamma W)$. Following the strategy in Bramoullé et al. (2009) would lead to an equation in $I$, $W$, $W_0$ and $W W_0$, and the insights obtained in that paper then do not carry over to the case we study, in which $W_0$ is unknown.

Footnote: A related set of papers instead focuses on the distribution of networks generating the pattern in data and aims to estimate aggregate network effects. Souza (2014) offers several identification and estimation results in this spirit. In particular, he infers the network distribution within a certain class of statistical network formation models from outcome data from many groups, such as classrooms, in few time periods. We instead concentrate on estimating the set of links for one group of size $N$ followed over $t = 1, \ldots, T$ instances.

We establish identification of the structural parameters of the model, including the social interactions matrix $W_0$, from the coefficients matrix $\Pi_0$. Without data on the network $W_0$, we treat it as an additional parameter in an otherwise standard model relating outcomes and covariates. Our identification strategy relies on how changes in covariates $x_{it}$ reverberate through the system and impact $y_{it}$, as well as outcomes for other individuals. These are summarized by the entries
of the coefficient matrix $\Pi_0$, which, in turn, encode information about $W_0$ and $(\rho_0, \beta_0, \gamma_0)$. A non-zero partial effect of $x_{it}$ on $y_{jt}$ indicates the existence of direct or indirect links between $i$ and $j$. When $\rho_0 = 0$ (and $\Pi_0 = \beta_0 I + \gamma_0 W_0$), only direct links would produce such a correlation. When $\rho_0 \neq 0$, both direct and indirect connections may generate a non-zero response but distant connections will lead to a lower response. Our results formally determine sufficient conditions to precisely disentangle these forces.

We first set out five assumptions underpinning our main identification results. Three of these are entirely standard in the social interactions literature. A fourth is a normalization required to separately identify $(\rho_0, \gamma_0)$ from $W_0$, and the fifth is closely related to known results on the identification of $(\rho_0, \gamma_0)$ when $W_0$ is known (Bramoullé et al., 2009). These Assumptions (A1-A5) deliver an identified set of up to two points.

Our first assumption explicitly states that no individual affects himself and is a standard condition in social interaction models:

(A1) $(W_0)_{ii} = 0$, $i = 1, \ldots, N$.

With Assumption (A1), we can omit elements on the diagonal of $W$ from the parameter space. We thus can denote a generic parameter vector as $\theta = (W_{12}, \ldots, W_{N,N-1}, \rho, \gamma, \beta)' \in \mathbb{R}^m$, where $m = N(N-1) + 3$, and $W_{ij}$ is the $(i,j)$-th element of $W$. Reduced-form parameters can be tied back to the structural model (3) by letting $\Pi : \mathbb{R}^m \to \mathbb{R}^{N^2}$ define the relation between structural and reduced-form parameters: $\Pi(\theta) = (I - \rho W)^{-1} (\beta I + \gamma W)$, where $\theta \in \mathbb{R}^m$, and $\Pi_0 \equiv \Pi(\theta_0)$.

As $\epsilon_t$ (and, consequently, $\nu_t$) is mean-independent from $x_t$, $E[\epsilon_t | x_t] = 0$, the matrix $\Pi_0$ can be identified as the linear projection of $y_t$ on $x_t$. We do not impose additional distributional assumptions on the disturbance term, except for conditions that allow us to identify the reduced-form parameters in (4). If $x_t$ is endogenous, i.e. $E[\epsilon_t | x_t] \neq 0$, a vector of instrumental variables $z_t$ may still be used to identify $\Pi_0$. In either case, identification of $\Pi_0$ requires variation of the regressor across individuals $i$ and through instances $t$. In other words, either $E[x_t x_t']$ (if exogeneity holds) or $E[x_t z_t']$ (otherwise) is full-rank.

Our next assumption controls the propagation of shocks and guarantees they die as they reverberate through the network. This provides adequate stability in the system, and is closely related to the concept of stationarity in network models. It implies the maximum eigenvalue norm of $\rho_0 W_0$ is less than one. It also ensures $(I - \rho_0 W_0)$ is a non-singular matrix, and so the variance of $y_t$ exists, the transformation $\Pi(\theta_0)$ is well-defined, and the Neumann expansion $(I - \rho_0 W_0)^{-1} = \sum_{j=0}^{\infty} (\rho_0 W_0)^j$ is appropriate.

(A2) $\sum_{j=1}^{N} |\rho_0 (W_0)_{ij}| < 1$ for every $i = 1, \ldots, N$, $\|W_0\| < C$ for some positive $C \in \mathbb{R}$ and $|\rho_0| < 1$.

We next assume that network effects do not cancel out, another standard assumption.

(A3) $\beta_0 \rho_0 + \gamma_0 \neq 0$.

The need for this assumption can be shown by expanding the expression for $\Pi(\theta_0)$, which is possible by (A2):

$$\Pi(\theta_0) = \beta_0 I + (\rho_0 \beta_0 + \gamma_0) \sum_{k=1}^{\infty} \rho_0^{k-1} W_0^k. \tag{5}$$

If Assumption (A3) were violated, $\beta_0 \rho_0 + \gamma_0 = 0$ and $\Pi_0 = \beta_0 I$, so the endogenous and exogenous effects balance each other out, and network effects are altogether eliminated in the reduced form.

Identification of the social effects parameters $(\rho_0, \gamma_0)$ requires that at least one row of $W_0$ adds to a fixed and known number. Otherwise, $\rho_0$ and $\gamma_0$ cannot be separately identified from $W_0$. Clearly, no such condition would be required if $W_0$ was observed.

(A4) There is an $i$ such that $\sum_{j=1,\ldots,N} (W_0)_{ij} = 1$.

Letting $W_y \equiv \rho_0 W_0$ and $W_x \equiv \gamma_0 W_0$ denote the matrices that summarize the influence of peers’ outcomes (the endogenous social effects) and characteristics on one’s outcome (the exogenous social effects), respectively, the assumption above can be seen as a normalization. In this case, $\rho_0$ and $\gamma_0$ represent the row-sum for individual $i$ in $W_y$ and $W_x$, respectively. In line with the literature, we maintain that the same $W_0$ governs the structure of both endogenous ($W_y$) and exogenous ($W_x$) effects. We later discuss relaxing this assumption when more than one regressor is used.

Our final assumption provides for a specific kind of network asymmetry. We require the diagonal of $W_0^2$ not to be constant as one of our sufficient conditions for identification.

(A5) There exists $l, k$ such that $(W_0^2)_{ll} \neq (W_0^2)_{kk}$, i.e. the diagonal of $W_0^2$ is not proportional to $\iota$, where $\iota$ is the $N \times 1$ vector of ones.

In unweighted networks, the diagonal of the square of the social interactions matrix captures the number of reciprocated links for each individual or, in the case of undirected networks, the popularity of those individuals. Assumption (A5) hence intuitively suggests differential popularity across individuals in the social network.

This assumption is related to the network asymmetry condition proposed elsewhere, such as in Bramoullé et al. (2009).
They show that when $W_0$ is known, the structural model (2) is identified if $I$, $W_0$, and $W_0^2$ are linearly independent. Given the remaining assumptions, this condition is satisfied if (A5) is satisfied, but the converse is not true: one can construct examples in which $I$, $W_0$, and $W_0^2$ are linearly independent when $W_0^2$ has a constant diagonal, so that $\Pi_0$ does not pin down $\theta_0$. The strengthening of this hypothesis is the formal price to pay for the social interactions matrix $W_0$ being unknown to the researcher.

Footnote (on Assumption (A3)): One important case is when networks do not determine outcomes, which we interpret as $\rho_0 = \gamma_0 = 0$ or with $W_0$ representing the empty network. From equation (5), it is clear that if $\Pi(\theta_0)$ is not diagonal with constant entries, then it must be that $(\rho_0 \beta_0 + \gamma_0) \neq 0$, which implies that $\rho_0 \neq 0$ or $\gamma_0 \neq 0$, and also that $W_0$ is non-empty. Taken together, this suggests that the observation that $\Pi(\theta_0)$ is not diagonal is sufficient to ensure that network effects are present and Assumption (A3) is not violated.

Footnote (on Assumption (A4)): An alternative to Assumption (A4) is to impose the normalization on the parameters. For example, one could normalize $\rho^* = 1$ and allow the network to be rescaled accordingly. In this case, $W^* = \rho_0 W_0$ would be identified instead. Also $W_x = \frac{\gamma_0}{\rho_0} W^*$, so $\gamma_0$ would be identified relative to $\rho_0$. $W_y$ and $W_x$ are unchanged.

Footnote (on the condition of Bramoullé et al., 2009): To see the strength of the assumption of Bramoullé et al. (2009) when $W_0$ is known, choose constants $c_1$, $c_2$, and $c_3$ such that $c_1 I + c_2 W_0 + c_3 W_0^2 = 0$. Focusing on diagonal elements of this condition, we see that if the diagonal of $W_0^2$ is not proportional to the diagonal of $I$, then $c_1 = c_3 = 0$ because $\mathrm{diag}(W_0) = 0$. It follows that $c_2 = 0$ if at least one (off-diagonal) element of $W_0$ is non-zero. However, the converse is not true, so that if Assumptions A1-A5 do not hold, one can construct examples where $\Pi_0$ does not pin down $\theta_0$. Take, for instance, $N = 5$ with $\theta_0$ and $\theta$ where $\beta_0 = \beta = 1$, $\rho_0$ = 1. , $\rho$ = 0. , $\gamma_0$ = − . , $\gamma$ = 0. , $W_0$ = . . . . . . . . . . and $W$ = . . . . . . . . . . . Both $W_0$ and $W$ violate (A5) ($(W_0^2)_{kk} = (W^2)_{kk}$ = 0. for any $k$), and $\rho_0$ violates (A2). Nonetheless, $I$, $W_0$ and $W_0^2$ are linearly independent and, likewise, so are $I$, $W$, and $W^2$. In this case, both parameter sets produce $\Pi = (I - \rho_0 W_0)^{-1}(\beta_0 I + \gamma_0 W_0) = (I - \rho W)^{-1}(\beta I + \gamma W)$. This arises even as $W_0$ and $W$ represent very different network structures: any pair connected under $W_0$ is not connected under $W$ and vice versa. If on the other hand, $(W_0)_{ij}$ = 0. , $i \neq j$, in violation of (A5) and all agents were connected, the model would not be identified.

Before proceeding to our formal results, we provide a very simple illustration to shed light on how the assumptions above come together to provide identification. Suppose the observed reduced-form matrix is

$$\Pi_0 = \frac{1}{455} \begin{bmatrix} 275 & 310 & 0 \\ 310 & 275 & 0 \\ 0 & 0 & 182 \end{bmatrix},$$

and that, following (A4), the first row is normalized to one. From the third row and column of $\Pi_0$, we see there is no path of any length connecting the individual in row 3 to or from those in rows 1 or 2, since her outcome is not affected by their covariates and their outcomes are not affected by her covariates. In other words, individual 3 is isolated and $(W_0)_{13} = (W_0)_{23} = (W_0)_{31} = (W_0)_{32} = 0$. On the other hand, individuals 1 and 2 cannot be isolated as their covariates are correlated with the other individual's outcome, reflecting (A5). Due to the row-sum normalization of the first row, $(W_0)_{12} = 1$. Using (A3), it can be seen that $W_0$ is symmetric if $\Pi_0$ is symmetric. We thus find that $(W_0)_{21} = 1$. This and (A1) map all elements of $W_0$, and thus

$$W_0 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

As the third individual is isolated, she will only be affected by her exogenous $x_i$ and not by peer effects, so the $(3,3)$ element of $\Pi_0$ is equal to $\beta_0 = \frac{182}{455} = 0.4$. To find $\rho_0$, note that $(I - \rho_0 W_0)\Pi_0 = \beta_0 I + \gamma_0 W_0$. Hence, focusing on the $(1,1)$ elements of the matrices above, we find that $\frac{275}{455} - \rho_0 \frac{310}{455} = 0.4$, implying $\rho_0 = 0.3$ (complying with (A2)). Finally, $\gamma_0$ is identified from entry $(1,2)$, giving $\gamma_0 = \frac{310}{455} - 0.3 \cdot \frac{275}{455} = 0.5$.

Under the relatively mild assumptions above, we can begin to identify parameters related to the network. These results are then useful for our main identification theorems. Let $\lambda_{0,j}$ denote an eigenvalue of $W_0$ with corresponding eigenvector $v_{0,j}$ for $j = 1, \dots, N$. Assumptions (A2) and (A3) allow us to identify the eigenvectors of $W_0$ directly from the reduced form. As $|\rho_0| < 1$:

$$\Pi_0 v_{0,j} = \beta_0 v_{0,j} + (\rho_0 \beta_0 + \gamma_0) \sum_{k=1}^{\infty} \rho_0^{k-1} W_0^k v_{0,j} = \left[ \beta_0 + (\rho_0 \beta_0 + \gamma_0) \sum_{k=1}^{\infty} \rho_0^{k-1} \lambda_{0,j}^k \right] v_{0,j} = \frac{\beta_0 + \gamma_0 \lambda_{0,j}}{1 - \rho_0 \lambda_{0,j}} \, v_{0,j}. \qquad (6)$$

The infinite sum converges as $|\rho_0 \lambda_{0,j}| < 1$ by (A2). The equation above implies that $v_{0,j}$ is also an eigenvector of $\Pi_0$ with associated eigenvalue $\lambda_{\Pi,j} = \frac{\beta_0 + \gamma_0 \lambda_{0,j}}{1 - \rho_0 \lambda_{0,j}}$. The fact that eigenvectors of $\Pi_0$ are also eigenvectors of $W_0$ has a useful implication: eigencentralities may be identified from the reduced form, even when $W_0$ is not identified. As detailed in de Paula (2017) and Jackson et al. (2017), such eigencentralities often play an important role in empirical work as they allow a mapping back to underlying models of social interaction.

Now let $\Theta \equiv \{\theta \in \mathbb{R}^m : \text{Assumptions (A1)-(A5) are satisfied}\}$ be the structural parameter space of interest. Our first theorem establishes local identification of the mapping. A parameter point $\theta_0$ is locally identifiable if there exists a neighborhood of $\theta_0$ containing no other $\theta$ which is observationally equivalent. Using classical results in Rothenberg (1971), we show that our assumptions are sufficient to ensure that the Jacobian of $\Pi$ relative to $\theta_0$ is non-singular, which, in turn, suffices to establish local identification.

Theorem 1.
Assume (A1)-(A5). $\theta_0 \in \Theta$ is locally identifiable.

An immediate consequence of local identification is that the set $\{\theta \in \Theta : \Pi(\theta) = \Pi(\theta_0)\}$ is discrete (i.e. its elements are isolated points). The following corollary establishes that $\Pi$ is a proper function, i.e. the inverse image $\Pi^{-1}(K)$ of any compact set $K \subset \mathbb{R}^{N^2}$ is also compact (Krantz and Parks, 2013, p. 124). Since it is discrete, the identified set must be finite.

Footnote (on eigencentralities): To identify the eigencentralities, we identify the eigenvector that corresponds to the dominant eigenvalue. If $W_0$ is non-negative and irreducible, this is the (unique) eigenvector with strictly positive entries, by the Perron-Frobenius Theorem for non-negative matrices (see Horn and Johnson, 2013, p. 534).

Corollary 1. Assume (A1)-(A5). Then $\Pi(\cdot)$ is a proper mapping. Moreover, the set $\{\theta : \Pi(\theta) = \Pi(\theta_0)\}$ is finite.

Under additional assumptions, the identified set is at most a singleton in each of the partitioning sets $\Theta_- \equiv \Theta \cap \{\rho\beta + \gamma < 0\}$ and $\Theta_+ \equiv \Theta \cap \{\rho\beta + \gamma > 0\}$. Since $\Theta = \Theta_- \cup \Theta_+$, if the sign of $\rho_0 \beta_0 + \gamma_0$ is unknown, the identified set contains, at most, two elements. In the theorem that follows, we show global identification only for $\theta \in \Theta_+$, since arguments are mirrored for $\theta \in \Theta_-$.

Theorem 2.
Assume (A1)-(A5), then for every $\theta \in \Theta_+$ we have $\Pi(\theta) = \Pi(\theta_0) \Rightarrow \theta = \theta_0$. That is, $\theta_0$ is globally identified with respect to the set $\Theta_+$.

Similar arguments apply if Theorem 2 instead were to be restricted to $\theta \in \Theta_-$. The proof of the corollary below is immediate and therefore omitted.

Corollary 2.
Assume (A1)-(A5). If $\rho_0 \beta_0 + \gamma_0 > 0$, then the identified set contains at most one element, and similarly if $\rho_0 \beta_0 + \gamma_0 < 0$. Hence, if the sign of $\rho_0 \beta_0 + \gamma_0$ is unknown, the identified set contains, at most, two elements.

We now turn our attention to the problem of identifying the sign of $\rho_0 \beta_0 + \gamma_0$ from the observation of $\Pi_0$. This would then allow us to establish global identification using Theorem 2. It is apparent from (5) that if $\rho_0 > 0$ and $(W_0)_{ij} \geq 0$ for all $i, j \in \{1, \dots, N\}$, the off-diagonal elements of $\Pi_0$ identify the sign of $\rho_0 \beta_0 + \gamma_0$.

Corollary 3.
Assume (A1)-(A5). If $\rho_0 > 0$ and $(W_0)_{ij} \geq 0$, the model is globally identified.

Real world applications often suggest endogenous social interactions are positive ($\rho_0 > 0$), in which case global identification is fully established by Corollary 3. On the other hand, if $\rho_0 < 0$ (e.g. if outcomes are strategic substitutes), $\rho_0^{k-1}$ in (5) alternates signs with $k$, and the off-diagonal elements no longer carry the sign of $\rho_0 \beta_0 + \gamma_0$. Nonetheless, if $W_0$ is non-negative and irreducible (i.e., not permutable into a block-triangular matrix or, equivalently, a strongly connected social network), the model is also identifiable without further restrictions on $\rho_0$:

Corollary 4.
Assume (A1)-(A5), $(W_0)_{ij} \geq 0$ and $W_0$ is irreducible. If $W_0$ has at least two real eigenvalues or $|\rho_0|$ < √ / , then the model is globally identified.

[...] $\rho_0$ is appropriately bounded. Since $W_0$ is non-negative, it has at least one real eigenvalue, by the Perron-Frobenius Theorem. If $W_0$ is symmetric, for example, its eigenvalues are all real, and Corollary 4 holds. It also holds if $(W_0)_{ij} \leq 0$, as we can re-write the model as $\rho_0 W_0 = -\rho_0 |W_0|$, where $|W_0|$ is the matrix whose entries are the absolute values of the entries in $W_0$. In any case, the bound on $|\rho_0|$ is sufficient and holds in most (if not all) empirical estimates we are aware of obtained from either elicited or postulated networks, and in our application on tax competition.

Footnote (on the global inversion results): The global inversion results we use are related to, but different from, those used by Komunjer (2012), Lee and Lewbel (2013) and Chiappori et al. (2015). Those authors use variations on a classical inversion result of Hadamard. In contrast, we employ results on the cardinality of the pre-image of a function, relying on less stringent assumptions. Specifically, while the classical Hadamard result requires that the image of the function be simply-connected (Theorem 6.2.8 of Krantz and Parks, 2013), the results we rely on do not.

Footnote (on the mirror image of $\theta_0$): Under some special conditions, the mirror image of $\theta_0$ can be characterized from equation (5). If $-W_0$ satisfies Assumption (A4), we may set $\rho^* = -\rho_0$, $\beta^* = \beta_0$, $\gamma^* = -\gamma_0$ and $W^* = -W_0$. Then $\rho_0 \beta_0 + \gamma_0 = -(\rho^* \beta^* + \gamma^*)$. Also note that $\sum_{k=1}^{\infty} \rho_0^{k-1} W_0^k = -\sum_{k=1}^{\infty} (\rho^*)^{k-1} (W^*)^k$, and so $(\rho_0 \beta_0 + \gamma_0) \sum_{k=1}^{\infty} \rho_0^{k-1} W_0^k = (\rho^* \beta^* + \gamma^*) \sum_{k=1}^{\infty} (\rho^*)^{k-1} (W^*)^k$. It follows that $\Pi(\theta_0) = \Pi(\theta^*)$, where $\theta^* = (\rho^*, \beta^*, \gamma^*, W^*)$.

We observe outcomes for $i = 1, \dots, N$ individuals repeatedly through $t = 1, \dots, T$ instances. If $t$ corresponds to time, it is natural to think of there being unobserved heterogeneity across individuals, $\alpha_i$, to be accounted for when estimating $\Pi_0$.
The structural model (2) is then

$$y_{it} = \rho_0 \sum_{j=1}^{N} W_{0,ij} \, y_{jt} + \beta_0 x_{it} + \gamma_0 \sum_{j=1}^{N} W_{0,ij} \, x_{jt} + \alpha_i + \epsilon_{it},$$

which can be written in matrix form as

$$y_t = \rho_0 W_0 y_t + x_t \beta_0 + W_0 x_t \gamma_0 + \alpha^* + \epsilon_t,$$

where $\alpha^*$ is the vector of fixed effects. Individual-specific and time-constant fixed effects can be eliminated using the standard subtraction of individual time averages. Defining $\bar{y}_t = T^{-1} \sum_{t=1}^{T} y_t$, $\bar{x}_t = T^{-1} \sum_{t=1}^{T} x_t$ and $\bar{\epsilon}_t = T^{-1} \sum_{t=1}^{T} \epsilon_t$,

$$y_t - \bar{y}_t = \rho_0 W_0 (y_t - \bar{y}_t) + (x_t - \bar{x}_t) \beta_0 + W_0 (x_t - \bar{x}_t) \gamma_0 + \epsilon_t - \bar{\epsilon}_t,$$

if $W_0$ does not change with time. Identification from the reduced form follows from previous theorems, since $\Pi_0$ is unchanged when regressing $y_t - \bar{y}_t$ on $x_t - \bar{x}_t$.

Footnote (on exogeneity in the panel): As is the case in panel data, this would require strict exogeneity ($E[\epsilon_s \mid x_t] = 0$ for any $s$ and $t$) or predetermined errors ($E[\epsilon_s \mid x_t] = 0$ for $s \geq t$) so that the matrix $\Pi_0$ can be consistently estimated.

We next allow for unobserved common shocks to all individuals in the network in the same instance $t$. Such correlated effects $\alpha_t$ can confound the identification of social interactions. As we have not placed any distributional assumption on the covariance matrix of the disturbance term, our analysis readily incorporates correlated effects that are orthogonal to $x_t$. When this is not the case, one must account for $\alpha_t$ explicitly. The model then is

$$y_t = \rho_0 W_0 y_t + x_t \beta_0 + \gamma_0 W_0 x_t + \alpha_t \iota + \epsilon_t,$$

where $\alpha_t$ is a scalar capturing shocks in the network common to all individuals. Let $\Pi_{01} = (I - \rho_0 W_0)^{-1}$ and $\Pi_{02} = (\beta_0 I + \gamma_0 W_0)$ such that $\Pi_0 = \Pi_{01} \Pi_{02}$. The reduced-form model is

$$y_t = \Pi_0 x_t + \alpha_t \Pi_{01} \iota + v_t.$$

We propose a transformation to eliminate the correlated effects: exclude the individual-invariant $\alpha_t$ by subtracting the mean of the variables at a given period (global differencing). For this purpose, define $H = \frac{1}{n} \iota \iota'$.
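As an aside on the individual fixed effects case above, the claim that $\Pi_0$ is unchanged by the within transformation can be checked numerically. The following is a minimal sketch, assuming NumPy; the four-node network, the parameter values, and the helper names `reduced_form` and `within_ols` are illustrative assumptions and do not come from the paper. The data are generated without idiosyncratic noise, so the regression of demeaned outcomes on demeaned covariates should return $\Pi_0$ up to floating-point error even though the fixed effects are large.

```python
import numpy as np

def reduced_form(W, rho, beta, gamma):
    """Pi(theta) = (I - rho W)^{-1} (beta I + gamma W)."""
    I = np.eye(W.shape[0])
    return np.linalg.solve(I - rho * W, beta * I + gamma * W)

def within_ols(Y, X):
    """Subtract individual time averages, then regress y_t on x_t.

    Y and X are T x N arrays whose rows are y_t' and x_t'.
    Returns the N x N estimate of Pi.
    """
    Yd = Y - Y.mean(axis=0)
    Xd = X - X.mean(axis=0)
    # Yd = Xd @ Pi', so least squares returns Pi'.
    coef, *_ = np.linalg.lstsq(Xd, Yd, rcond=None)
    return coef.T

rng = np.random.default_rng(0)
N, T = 4, 40
rho, beta, gamma = 0.3, 0.4, 0.5   # hypothetical parameter values

# Hypothetical network with zero diagonal; each row sums to one.
W0 = np.zeros((N, N))
for i in range(N):
    W0[i, (i + 1) % N] = 0.7
    W0[i, (i - 1) % N] = 0.3
Pi0 = reduced_form(W0, rho, beta, gamma)

X = rng.normal(size=(T, N))
alpha = 5.0 * rng.normal(size=N)   # individual fixed effects
# Reduced form with fixed effects: y_t = Pi0 x_t + (I - rho W0)^{-1} alpha
alpha_rf = np.linalg.solve(np.eye(N) - rho * W0, alpha)
Y = X @ Pi0.T + alpha_rf

Pi_hat = within_ols(Y, X)
max_err = np.abs(Pi_hat - Pi0).max()
```

Here `max_err` is expected to be at the level of numerical precision. With noise added to `Y`, the same function would give a consistent estimate only under the exogeneity conditions discussed in the text.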
We note that in empirical and theoretical work it is customary to strengthen Assumption (A4) and require that all rows of $W_0$ sum to one if no individual is isolated (see for example Blume et al., 2015). This strengthened assumption is usually referred to as row-sum normalization, and is stated below:

(A4') For all $i = 1, \dots, N$ we have that $\sum_{j=1,\dots,N} (W_0)_{ij} = 1$.

This can be written compactly as $W_0 \iota = \iota$. In this case, $W_0$ can be interpreted as the normalized adjacency matrix. Under row-sum normalization we have that

$$(I - H) y_t = (I - H)(I - \rho_0 W_0)^{-1}(\beta_0 I + \gamma_0 W_0) x_t + (I - H)(I - \rho_0 W_0)^{-1} \epsilon_t = (I - H) \Pi_0 x_t + (I - H) v_t,$$

because $(I - H)(I - \rho_0 W_0)^{-1} \alpha_t \iota = 0$ if Assumption (A4') holds. It then follows that $\tilde{\Pi}_0 = (I - H) \Pi_0$ is identified. The next proposition shows that, under row-sum normalization of $W_0$, $\Pi_0$ is identified from $\tilde{\Pi}_0$ (and, as a consequence, the previous results immediately apply).

Proposition 1. If $W_0$ is diagonalizable and row-sum normalized, $\Pi_0$ is identified from $\tilde{\Pi}_0$.

Under row-sum normalization of $W_0$, a common group-level shock affects individuals homogeneously since $(I - \rho_0 W_0)^{-1} \alpha_t \iota = \alpha_t (I + \rho_0 W_0 + \rho_0^2 W_0^2 + \cdots) \iota = \frac{\alpha_t}{1 - \rho_0} \iota$, which is a vector with no variation across entries. Consequently, global differencing eliminates correlated effects and $(I - H)(I - \rho_0 W_0)^{-1} \alpha_t \iota = (I - \rho_0 W_0)^{-1} \alpha_t (I - H) \iota = 0$. In the absence of row-sum normalization, global differencing does not ensure that correlated effects are eliminated. To see this, note that $(I - \rho_0 W_0)^{-1}$ is no longer row-sum normalized and, crucially, $\alpha_t (I - \rho_0 W_0)^{-1} \iota$ is not a vector with constant entries.

The next proposition makes this point formally, that the stronger Assumption (A4') is necessary to eliminate group-level shocks, by showing it is not possible to construct a data transformation that eliminates group effects in the absence of row-sum normalization.

Proposition 2. Define $r_{W_0} = (I - \rho_0 W_0)^{-1} \iota$. If in space $\Theta = \{\theta \in \mathbb{R}^m : \text{Assumptions (A1)-(A5) are satisfied}\}$ there are $N$ matrices $W_0^{(1)}, \dots, W_0^{(N)}$ such that $[\, r_{W_0^{(1)}} \cdots r_{W_0^{(N)}} \,]$ has rank $N$, then the only transformation such that $(I - \tilde{H})(I - \rho_0 W_0)^{-1} \iota = 0$ is $\tilde{H} = I$.

It is useful to be able to test for row-sum normalization (A4') as it enables common shocks to be accounted for in the social interactions model. This is possible as

$$\Pi_0 \iota = \beta_0 \iota + (\rho_0 \beta_0 + \gamma_0) \sum_{k=1}^{\infty} \rho_0^{k-1} W_0^k \iota = \left[ \beta_0 + (\rho_0 \beta_0 + \gamma_0) \sum_{k=1}^{\infty} \rho_0^{k-1} \right] \iota = \frac{\beta_0 + \gamma_0}{1 - \rho_0} \, \iota. \qquad (7)$$

The last equality follows from the observation that, under row-normalization of $W_0$, $W_0^k \iota = W_0 \iota = \iota$, $k > 0$. This implies $\Pi_0$ has constant row-sums, which suggests row-sum normalization is testable. In the Appendix we derive a Wald test statistic to do so.

Footnote (on the Wald test): For ease of explanation, in the Appendix we derive the test under the asymptotic distribution of the OLS estimator. The test generally holds with minor adjustments for estimators with known asymptotic distributions.

Next, allowing for multivariate $x_t$ of dimension $N \times K$, the reduced-form model (4) is

$$y_t = \sum_{k=1}^{K} \Pi_{0,k} \, x_{k,t} + \nu_t,$$

where $\Pi_{0,k} = (I - \rho_0 W_0)^{-1}(\beta_{0,k} I + \gamma_{0,k} W_0)$, $x_{k,t}$ refers to the $k$-th column of $x_t$, and $\beta_{0,k}$ and $\gamma_{0,k}$ select the $k$-th element of the $K$-dimensional $\beta_0$ and $\gamma_0$, respectively. The previous identification results then apply sequentially to each $\Pi_{0,k}$, $k = 1, \dots, K$. In fact, we only then need to maintain $W_x = \gamma_0 W_0$ for one covariate. It is therefore possible to allow the structure of endogenous and exogenous social effects to differ for $K - 1$ of the covariates. With $K$ covariates, equation (3) is

$$y_t = \rho_0 W_0 y_t + \sum_{k=1}^{K} \beta_{0,k} \, x_{k,t} + \sum_{k=1}^{K} \gamma_{0,k} W_{0,k} \, x_{k,t} + \epsilon_t.$$

Let $W_{0,k} = W_0$ be the case for $k = 1$. Then, having identified $\rho_0$ and $W_0$ from $\Pi_{0,1}$,

$$(I - \rho_0 W_0) \Pi_{0,k} = \beta_{0,k} I + \gamma_{0,k} W_{0,k}, \qquad k = 2, \dots, K.$$
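Returning briefly to common shocks, equation (7) and the global-differencing argument can be verified numerically. The sketch below assumes NumPy; the random network, the parameter values, and the size of the common shock are hypothetical choices made only for illustration. It checks that (i) every row of $\Pi$ sums to $(\beta + \gamma)/(1 - \rho)$ when $W \iota = \iota$, (ii) the matrix $I - H$ annihilates the common-shock term under row-sum normalization, and (iii) it fails to do so once the rows of $W$ no longer sum to one.

```python
import numpy as np

rng = np.random.default_rng(1)
N, rho, beta, gamma = 6, 0.3, 0.4, 0.5   # hypothetical values

# Hypothetical non-negative network, zero diagonal, rows scaled to sum to one (A4').
W = rng.uniform(size=(N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

I = np.eye(N)
iota = np.ones(N)
Pi = np.linalg.solve(I - rho * W, beta * I + gamma * W)

# Equation (7): Pi iota = (beta + gamma) / (1 - rho) * iota.
row_sums = Pi @ iota
target = (beta + gamma) / (1.0 - rho)

# Global differencing: H = iota iota' / N, so (I - H) subtracts the cross-sectional mean.
H = np.outer(iota, iota) / N
alpha_t = 2.0
shock_normalized = np.linalg.solve(I - rho * W, alpha_t * iota)
residual_normalized = (I - H) @ shock_normalized

# Without row-sum normalization the common shock survives differencing.
W_raw = W * rng.uniform(0.5, 1.5, size=(N, 1))   # rows no longer sum to one
shock_raw = np.linalg.solve(I - rho * W_raw, alpha_t * iota)
residual_raw = (I - H) @ shock_raw
```

In this sketch `residual_normalized` is zero up to rounding, while `residual_raw` has visibly non-zero entries, which is the numerical counterpart of the point made formally in Proposition 2.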
The parameter $\beta_{0,k}$ then corresponds to the diagonal elements of $(I - \rho_0 W_0) \Pi_{0,k}$ and the off-diagonal entries correspond to the off-diagonal elements of $\gamma_{0,k} W_{0,k}$. If Assumption (A4) holds for every $k = 1, \dots, K$, we can identify $\gamma_{0,k}$ and thus $W_{0,k}$ for every $k = 1, \dots, K$.

Footnote (on differing social structures): Blume et al. (2015) also study the case in which the social structure mediating endogenous and exogenous social effects might differ. When $W_x$ is known and there is partial knowledge of the endogenous social interaction matrix $W_0$, they show that the parameters of the model can be identified (their Theorem 6). Analogously, when there are enough unconnected nodes in each of the social interaction matrices represented by $W_x$ and $W_0$, and the identity of those nodes is known, identification is also (generically) possible (their Theorem 7).

While many applications assume $\beta_0$ to be homogeneous across individual units, we here consider possible avenues allowing for heterogeneous coefficients. In a slight abuse of notation, consider for this subsection $\beta_0$ in equation system (3) to be $\mathrm{diag}(\beta_{0,1}, \dots, \beta_{0,N})_{N \times N}$. Instead of a homogeneous scalar, $\beta_0$ is a diagonal matrix with the individual-specific coefficients $\beta_{0,1}, \dots, \beta_{0,N}$ along its diagonal. When $\rho_0 = 0$ as in Manresa (2016), $\Pi_0 = \beta_0 + \gamma_0 W_0$. In this case, under Assumption (A1), $\beta_0$ is identified from the diagonal elements in $\Pi_0$ and $\gamma_0 W_0$ is identified from its off-diagonal elements.

With multiple covariates, as long as coefficients are homogeneous for one of the covariates, one can also identify heterogeneous coefficients on the remaining covariates as done in the previous subsection. For example, let there be $K$ covariates and $\beta_{0,k} = \mathrm{diag}(\beta_{1,k}, \dots, \beta_{N,k})$ for $k = 1, \dots, K$. Suppose $\beta_{1,k} = \cdots = \beta_{N,k}$ for one of these covariates and let this be $k = 1$ without loss of generality. Having identified $\rho_0$ and $W_0$ from $\Pi_{0,1}$,

$$(I - \rho_0 W_0) \Pi_{0,k} = \beta_{0,k} + \gamma_{0,k} W_{0,k},$$

for $k = 2, \dots, K$. Then, under Assumption (A1), $\beta_{0,k}$ is identified from the diagonal elements in $(I - \rho_0 W_0) \Pi_{0,k}$ and $\gamma_{0,k} W_{0,k}$ is identified from its off-diagonal elements for $k = 2, \dots, K$.

Alternatively, when $\gamma_0 = 0$ one can apply traditional simultaneous equation methods to attain identification. For example, let $B \equiv [(I - \rho_0 W_0)', \, -(\beta_0 + \gamma_0 W_0)']'_{2N \times N}$ and $R_{(N-1) \times 2N} = [\, 0_{(N-1) \times (N+1)} \;\; I_{N-1} \,]$. The restriction that $\gamma_0 = 0$ in the first equation in equation system (3) can then be expressed as $R B_{\cdot,1} = 0_{(N-1) \times 1}$, where $B_{\cdot,1}$ is the first column in $B$. The rank condition for the identification of the first equation is then given by

$$RB = \begin{bmatrix} -\beta_{0,2} & \cdots & \\ \vdots & \ddots & \vdots \\ & \cdots & -\beta_{0,N} \end{bmatrix}$$

having rank equal to $N - 1$ (see Theorem 9.2 in Wooldridge, 2002). This will be the case if $\beta_{0,2}, \dots, \beta_{0,N} \neq 0$. Intuitively, this guarantees that individual specific covariates are valid instruments [...] $\beta_{0,1}, \dots, \beta_{0,N}$ are each different from zero, identification from $\Pi_0$ is obtained.

More generally, because there are $N^2$ equations corresponding to the entries in $\Pi_0$ and, allowing for heterogeneity in $\beta_0$ and imposing assumptions (A1)-(A5), there are $N^2 + 1$ parameters, further restrictions (like row-sum normalization) would be necessary. We conjecture that adequate restrictions would deliver positive identification results, but focus on the more conventional setting with homogeneous $\beta_0$.

We now transition from our core identification results to their practical implementation. As this is a high-dimensional estimation problem, our preferred approach makes use of the Adaptive Elastic Net GMM (Caner and Zhang, 2014), which is based on the penalized GMM objective function. Given the identification results presented in Section 2, the population version of the GMM objective function will be uniquely minimized at the true parameter vector.

After setting out the estimation procedure, we showcase the method using Monte Carlo simulations based both on stylized random network structures and on real world networks. In each case, we take a fixed network structure $W_0$, and simulate panel data as if the data generating process were given by (1). We apply the method on the simulated panel data to recover estimates of all elements in $W_0$, as well as the endogenous and exogenous social effect parameters.

The parameter vector to be estimated is high-dimensional: $\theta = (W_{1,2}, \dots, W_{N,N-1}, \rho, \gamma, \beta)' \in \mathbb{R}^m$, where $m = N(N-1) + 3$ and $W_{ij}$ is the $(i,j)$-th element of the $N \times N$ social interactions matrix $W$. To be clear, in a network with $N$ individuals, there are $N(N-1)$ potential interactions because each individual could interact with everyone but herself (a self-link would violate Assumption A1). As a consequence, even with a modest $N$, there are many parameters to estimate and $m$ is large. For example, a network with $N = 50$ implies more than two thousand parameters to estimate. While we consider $N$ (and thus $m$) to be fixed, we still refer to $\theta$ as high-dimensional. OLS estimation requires $m \ll NT$ ($\Rightarrow N \ll T$), so many more time periods than individuals: a requirement often met in finance data sets (van Vliet, 2018) or in other fields (see, e.g., Section 4.2 in Rothenhäusler et al., 2015). Instead, to estimate a large number of parameters with limited data we utilize high-dimensional estimation methods, which are the focus of a rapidly growing literature. However, the identification results presented in Section 2 apply more broadly and irrespective of the estimation procedure.

Sparsity is a key assumption underlying many high-dimensional estimation techniques. In the context of social interactions, we say that $W_0$ is sparse if $\tilde{m}$, the number of non-zero elements of $W_0$, is such that $\tilde{m} \ll NT$. The notion of sparsity thus depends on the number of time periods: although $N$ and $m$ are fixed, $\tilde{m}$ itself can grow with $T$. Sparsity corresponds to assuming that individuals influence or are influenced by a small number of others, relative to the overall size of the potential network and the time horizon in the data. As such, sparsity is typically not a binding constraint in social networks analysis.

In the estimation of sparse models, the "effective number of parameters" (or "effective degrees of freedom") relates to the number of variables with non-zero estimated coefficients (Tibshirani and Taylor, 2012).
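The parameter counts quoted above follow directly from $m = N(N-1) + 3$. The short sketch below makes the arithmetic explicit; the function names and the 10% link density used for $\tilde{m}$ are hypothetical illustrations and are not values reported in the paper.

```python
def n_params(N: int) -> int:
    """Structural parameters: N(N-1) off-diagonal links plus (rho, gamma, beta)."""
    return N * (N - 1) + 3

def n_nonzero_links(N: int, density: float) -> int:
    """Rough count of non-zero links (m-tilde) for an assumed link density."""
    return round(density * N * (N - 1))

m_50 = n_params(50)                    # 50 * 49 + 3
m_30 = n_params(30)                    # 30 * 29 + 3
links_30 = n_nonzero_links(30, 0.10)   # assumed density of 10%
```

With $N = 50$ this gives $m = 2453$, consistent with "more than two thousand parameters", while a sparse network keeps the number of non-zero links far below $m$.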
In the context of the current social network model (and the Elastic Net estimator on which the estimation strategy below builds), this is approximately equivalent to the density of the network times the number of parameters $m$. We then require this number to be smaller than $T$. Implicitly, this calculation provides a rough assessment of the minimum required $T$. For example, with $N = 30$ and a network with % of potential links in place, this implies $T$ should be larger than .

Finally, to reiterate, our identification results themselves do not depend on the sparsity of networks. In particular, Assumptions (A1) to (A5) do not impose restrictions on the number of links in $W_0$, or $\tilde{m}$.

Footnote (on sparse networks): For example, common stylized networks are sparse, such as: (i) star: all individuals receive spillovers from the same individual; (ii) lattice: each individual is a source of spillover only to one other individual; (iii) interactions in pairs or triads or small groups, such as those described by De Giorgi et al. (2010); and (iv) small world networks (Watts, 1999). Prominent real world economic networks are also sparse. For example, in individual-level elicited data from AddHealth on teenage friendships (defined as reciprocated nominations), the density of links is around % of all feasible links. In firm-level data, the density of production networks in the US is less than % of all feasible links (Atalay et al., 2011).

Footnote (on Assumption (A2) and fixed $N$): If $N \to \infty$, Assumption (A2) would imply vanishing $(W_0)_{ij}$ entries. As highlighted previously, we consider $N$ to be fixed, in line with many practical applications. Furthermore, Assumption (A2) is used to represent inverse matrices as Neumann series in our identification results. What is necessary for this to hold is that a sub-multiplicative norm on $\rho W$ be less than one. Here we use a specific norm (i.e., the maximum row sum norm), but other (induced) norms are also possible (i.e., the 2-norm or the 1-norm) (see Horn and Johnson, 2013, Chapter 5.6).

Our preferred approach estimates the interaction matrix in the reduced form while penalizing and imposing sparsity on the structural object $W_0$. We impose sparsity and penalization in the structural-form matrix $W_0$ because this is a weaker requirement than imposing sparsity and penalization in the reduced-form matrix $\Pi_0$.

Footnote (on sparsity of $\Pi_0$): Note that even if $W_0$ is sparse, $\Pi_0$ may not be sparse. In Appendix B.1, we show that $[\Pi_0]_{ij} = 0$ if, and only if, there are no paths between $i$ and $j$ in $W_0$, and so the pair is not connected. So sparsity in $\Pi_0$ is understood as $W_0$ being 'sparsely connected', which is a stronger assumption than sparsity in $W_0$.

To accomplish this, we make use of the Adaptive Elastic Net GMM (Caner and Zhang, 2014), which is based on the penalized GMM objective function

$$G_{NT}(\theta, p) \equiv g_{NT}(\theta)' M_T \, g_{NT}(\theta) + p_1 \sum_{\substack{i,j=1 \\ i \neq j}}^{N} |W_{i,j}| + p_2 \sum_{\substack{i,j=1 \\ i \neq j}}^{N} |W_{i,j}|^2 \qquad (8)$$

where $\theta = (W_{1,2}, \dots, W_{N,N-1}, \rho, \gamma, \beta)'$ with dimension $m = N(N-1) + 3$, and $p_1$ and $p_2$ are the penalization terms. The term $g_{NT}(\theta)' M_T \, g_{NT}(\theta)$ is the unpenalized GMM objective function with $g_{NT}(\theta) = \sum_{t=1}^{T} [\, x_{1t} e_t(\theta)' \cdots x_{Nt} e_t(\theta)' \,]'$ and $e_t(\theta) = y_t - (I - \rho W)^{-1}(\beta I + \gamma W) x_t$. There are $q \equiv N^2$ moment conditions since $x_{it}$ is orthogonal to $e_{jt}$, for each $i, j = 1, \dots, N$. Hence the GMM weight matrix $M_T$ is of dimension $N^2 \times N^2$, symmetric, and positive definite. For simplicity, we use $M_T = I_{N^2 \times N^2}$. Note that if $x_t$ is econometrically endogenous, one can also exploit moment conditions with respect to available instrumental variables.

Given the identification results presented in Section 2, if $\theta \neq \theta_0$ and does not belong to the identified set, then $\Pi(\theta) \neq \Pi(\theta_0)$.
Consequently, the population version of the GMM objective function is uniquely minimized at the true parameter vector $\theta_0$.

The penalization terms in (8) are what makes this different from a standard GMM problem. The first term, $p_1 \sum_{i,j=1, i \neq j}^{N} |W_{i,j}|$, penalizes the sum of the absolute values of $W_{ij}$, i.e. the sum of the strength of links, for all node-pairs. The second term, $p_2 \sum_{i,j=1, i \neq j}^{N} |W_{i,j}|^2$, penalizes the sum of the squares of the parameters. This term has been shown to provide better model-selection properties, especially when explanatory variables are correlated (Zou and Zhang, 2009). The first stage estimate is

$$\tilde{\theta}(p) = (1 + p_2 / T) \cdot \arg\min_{\theta \in \mathbb{R}^m} G_{NT}(\theta, p) \qquad (9)$$

where $(1 + p_2 / T)$ is a bias-correction term also used by Caner and Zhang (2014).

Depending on the choice of $p_1$, some $W_{i,j}$'s will be estimated as exact zeros. A larger share of parameters will be estimated as zeros if $p_1$ increases. The penalization also shrinks non-zero estimates towards zero. A second (adaptive) step provides improvements by re-weighting the penalization by the inverse of the first-step estimates (Zou, 2006):

$$\hat{\theta}(p) = (1 + p_2 / T) \cdot \arg\min_{\theta \in \mathbb{R}^m} \left\{ g_{NT}(\theta)' M_T \, g_{NT}(\theta) + p_1^* \sum_{\{i,j \,:\, \tilde{W}_{ij} \neq 0,\; i \neq j\}} \frac{|W_{i,j}|}{|\tilde{W}_{i,j}|^{\gamma}} + p_2 \sum_{\{i,j \,:\, \tilde{W}_{ij} \neq 0,\; i \neq j\}} |W_{i,j}|^2 \right\}, \qquad (10)$$

where $\tilde{W}_{i,j}$ is the $(i,j)$-th element of the first-step estimate of $W$, and we follow Caner and Zhang (2014) to set $\gamma$ = 2. . Elements $\tilde{W}_{i,j}$ estimated as zeros in the first stage are kept as zero in the second stage, because $\tilde{W}_{i,j} = 0$ implies the effective penalization is infinite. We write $p = (p_1, p_1^*, p_2)$ as the final set of penalization parameters. Conditional on $p$, the estimate of the Adaptive Elastic Net GMM procedure is $\hat{\theta}(p)$. Finally, we update the estimates of $\rho_0$, $\beta_0$ and $\gamma_0$ in a regression using peers-of-peers as instruments, similar to Bramoullé et al. (2009), but using the network as estimated in (10). This final step is not necessary but performs better in small samples. As in Caner and Zhang (2014, p. 35), the penalization parameters $p$ are chosen by the BIC criterion.

Footnote (on the estimation setting): For expositional ease, we describe estimation in the context of the reduced form model (4), thereby abstaining from individual fixed or correlated effects.

Footnote (on endogeneity): As the GMM estimator uses moments between the structural disturbance terms and covariates, this endogeneity is built into the estimation procedure.

In Appendix B.2 we provide further implementation details, including the choice of initial conditions. Of course, other estimation methods are available and our identification results do not hinge on any particular estimator. Our aim is to demonstrate the practical feasibility of using the Adaptive Elastic Net estimator, rather than claim it is the optimal estimator. Indeed, in Appendix B.3 we show how OLS can also be used to estimate $\theta_0$ if $T$ is sufficiently large. This makes precise the benefits of penalized estimation for any given $T$ and highlights that sparsity is not required for our identification results.

We showcase the method using Monte Carlo simulations based both on stylized random network structures and on real world networks. We describe the simulation procedures, results and robustness checks in more detail in the Appendix. Here we just provide a brief overview to highlight how well the method works to recover social networks even in relatively short panels.

For each simulated network, we take a fixed network structure $W_0$, and simulate panel data as if the data generating process were given by (1). We then apply the method on the simulated panel data to recover estimates of all elements in $W_0$, as well as the endogenous and exogenous social effect parameters ($\rho_0$, $\gamma_0$). Our result identifies entries in $W_0$ and so naturally recovers links of varying strength.
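To make the first-stage criterion (8) concrete, the following is a minimal sketch of how the penalized objective could be evaluated, assuming NumPy and the identity weight matrix $M_T = I$. The function names `unpack` and `penalized_gmm_objective`, the ordering of the parameter vector, and the penalty values are assumptions made for illustration; the sketch only evaluates the objective and does not implement the minimization, the adaptive second step (10), or the BIC choice of $p$. The example data use the three-individual network from the illustration in Section 2 with $(\rho, \beta, \gamma) = (0.3, 0.4, 0.5)$, values that reproduce its reduced-form matrix.

```python
import numpy as np

def unpack(theta, N):
    """Split theta = (off-diagonal W entries in row-major order, rho, gamma, beta)."""
    W = np.zeros((N, N))
    W[~np.eye(N, dtype=bool)] = theta[:-3]
    rho, gamma, beta = theta[-3:]
    return W, rho, gamma, beta

def penalized_gmm_objective(theta, Y, X, p1, p2):
    """Elastic-net-penalized GMM objective with identity weight matrix.

    Y, X: T x N arrays whose rows are y_t' and x_t'.
    """
    T, N = Y.shape
    W, rho, gamma, beta = unpack(theta, N)
    I = np.eye(N)
    Pi = np.linalg.solve(I - rho * W, beta * I + gamma * W)
    E = Y - X @ Pi.T               # rows are e_t(theta)'
    g = (X.T @ E).ravel()          # N^2 moments: sum over t of x_it * e_jt
    links = theta[:-3]
    return g @ g + p1 * np.abs(links).sum() + p2 * (links ** 2).sum()

rng = np.random.default_rng(2)
N, T = 3, 20
W0 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]])
theta0 = np.concatenate([W0[~np.eye(N, dtype=bool)], [0.3, 0.5, 0.4]])
Pi0 = np.linalg.solve(np.eye(N) - 0.3 * W0, 0.4 * np.eye(N) + 0.5 * W0)

X = rng.normal(size=(T, N))
Y = X @ Pi0.T                      # noise-free data: moments vanish at theta0

value_at_truth = penalized_gmm_objective(theta0, Y, X, p1=0.1, p2=0.05)
theta_alt = theta0.copy()
theta_alt[-3] = 0.1                # a wrong value of rho
value_unpenalized_alt = penalized_gmm_objective(theta_alt, Y, X, p1=0.0, p2=0.0)
```

At the true parameter vector the moment term is zero in this noise-free example, so `value_at_truth` consists of the penalty alone, whereas the unpenalized objective is strictly positive at the perturbed parameter vector. In practice one would pass such a function to a numerical optimizer, with the caveat that the absolute-value penalty is not differentiable at zero.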
It has long been recognized that link strength might play an important role in social interactions (Granovetter, 1973). Data limitations often force researchers to postulate some ties to be weaker than others (say, based on interaction frequency). In contrast, our approach identifies the continuous strength of ties, $W_{0,ij}$, where $W_{0,ij} > 0$ implies node $j$ influences node $i$.

Footnote (on the BIC criterion): Following Caner and Zhang (2014), the choice of $p$, which we denote as $\hat{p}$, is the one that minimizes

$$\mathrm{BIC}(p) = \log\left[ g_{NT}\big(\hat{\theta}(p)\big)' M_T \, g_{NT}\big(\hat{\theta}(p)\big) \right] + A\big(\hat{\theta}(p)\big) \cdot \frac{\log T}{T}$$

where $A\big(\hat{\theta}(p)\big)$ counts the number of non-zero coefficients among $\{W_{1,2}, \dots, W_{N,N-1}\}$. (See also Zou et al., 2007.)

Footnote (on alternative estimators): For example, Manresa (2016) also relies on a Lasso-related methodology but restricts $\rho_0$ to be zero and so ignores endogenous social effects. If instrumental variables are available, Lam and Souza (2016) propose estimating (1) directly using the Adaptive Lasso and exploiting sparsity of the estimated $W_0$. Gautier and Rose (2016) extend the (identification-robust) Self-Tuning Instrumental Variable estimator in Gautier and Tsybakov (2014).

The stylized networks we consider are a random network, and a political party network in which two groups of nodes each cluster around a central node. The real world networks we consider are the high-school friendship network in Coleman (1964) from a small high school in Illinois, and one of the village networks elicited in Banerjee et al. (2013) from rural Karnataka, India. These networks vary in size, complexity, and their aggregate and node-level features. All four networks are also sparse. For the stylized networks, we first assess the performance of the estimator for a fixed network size, $N = 30$. We simulate the real-world networks using non-isolated nodes in each ($N = 70$ and respectively).
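The simulation design described above (fix a network, check that it satisfies the maintained assumptions, then generate panel data from the model) can be sketched as follows. This is an illustrative sketch assuming NumPy, not the authors' simulation code: the helper names, the network size, the link probability, the noise level, and the parameter values are all hypothetical. For simplicity the reduced form is estimated by OLS with a long panel, the large-$T$ case the text attributes to Appendix B.3, rather than by the Adaptive Elastic Net GMM.

```python
import numpy as np

def erdos_renyi_weights(N, prob, rng):
    """Directed Erdos-Renyi network, zero diagonal, non-isolated rows scaled to sum to one."""
    A = (rng.uniform(size=(N, N)) < prob).astype(float)
    np.fill_diagonal(A, 0.0)
    sums = A.sum(axis=1, keepdims=True)
    return np.divide(A, sums, out=np.zeros_like(A), where=sums > 0)

def check_assumptions(W, rho, beta, gamma):
    """Booleans for (A1)-(A5) as stated in Section 2 (norm bound in (A2) omitted)."""
    W2_diag = np.diag(W @ W)
    return {
        "A1": bool(np.all(np.diag(W) == 0)),
        "A2": bool(np.all(np.abs(rho * W).sum(axis=1) < 1) and abs(rho) < 1),
        "A3": bool(rho * beta + gamma != 0),
        "A4": bool(np.any(np.isclose(W.sum(axis=1), 1.0))),
        "A5": bool(not np.allclose(W2_diag, W2_diag[0])),
    }

def simulate_panel(W, rho, beta, gamma, T, sigma, rng):
    """Draw x_t and y_t from y_t = Pi x_t + (I - rho W)^{-1} eps_t."""
    N = W.shape[0]
    I = np.eye(N)
    A_inv = np.linalg.inv(I - rho * W)
    Pi = A_inv @ (beta * I + gamma * W)
    X = rng.normal(size=(T, N))
    eps = sigma * rng.normal(size=(T, N))
    Y = X @ Pi.T + eps @ A_inv.T
    return Y, X, Pi

rng = np.random.default_rng(3)
W0 = erdos_renyi_weights(N=8, prob=0.3, rng=rng)
checks = check_assumptions(W0, rho=0.3, beta=0.4, gamma=0.5)
Y, X, Pi0 = simulate_panel(W0, 0.3, 0.4, 0.5, T=5000, sigma=0.1, rng=rng)

Pi_ols = np.linalg.lstsq(X, Y, rcond=None)[0].T   # feasible here because T >> N
mad = np.abs(Pi_ols - Pi0).mean()                 # mean absolute deviation
```

The dictionary `checks` reports which assumptions the drawn network satisfies; a draw that fails (A5), for instance, would be discarded in a design like this one. The mean absolute deviation `mad` corresponds to the kind of accuracy metric the text reports for $\hat{\Pi}$, here in a setting with far more periods than the short panels the penalized estimator targets.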
Despite the heterogeneity across networks, the method performs well in all simulations: Figure A1 shows the simulation results. Each Panel presents a different metric as we vary $T$ for each simulated network. Panel A shows that for each network, the proportion of zero entries in $W_0$ correctly estimated as zeros is above % even when $T = 5$. The proportion approaches % as $T$ grows. Conversely, Panel B shows the proportion of non-zero entries estimated as non-zeros is also high for small $T$. It is above % from $T = 5$ for the Erdős-Rényi network, being at least % across networks for $T = 25$, and increasing as $T$ grows. As discussed above, the Adaptive Elastic Net estimator is better at recovering true zero entries because shrinkage estimators are well known to shrink small parameters to zero.

Panels C and D show that for each simulated network, the mean absolute deviation between estimated and true networks for $\hat{W}$ and $\hat{\Pi}$ falls quickly with $T$ and is close to zero for large sample sizes. Finally, Panels E and F show that biases in the endogenous and exogenous social effects parameters, $\hat{\rho}$ and $\hat{\gamma}$, also fall quickly in $T$. The fact that biases are not zero is as expected for small $T$, being analogous to well-known results for autoregressive time series models.

In the Appendix we show the robustness of the simulation results to: (i) varying network sizes and node definitions in the real world network of Banerjee et al. (2013); (ii) alternative parameter choices and a richer structure of shocks across nodes. We demonstrate the gains from using the Adaptive Elastic Net GMM estimator over alternative estimators, such as the Adaptive Lasso estimator and OLS, and also show how incorporating partial knowledge on $W_0$ provides for performance gains.

Our identification result can be used to shed new light on a classic social interactions problem: tax competition between US states (Wilson, 1999).
Since the seminal empirical studies in tax competition between jurisdictions (Case et al., 1989; Case et al., 1993), it has been well-recognized that defining competing ‘neighbors’ is the central empirical challenge, and theory cannot resolve the issue. Two mechanisms have been argued to drive the structure of interactions across jurisdictions: factor mobility and yardstick competition.

On factor mobility, Tiebout (1956) first argued that labor and capital can move in response to differential tax rates across jurisdictions. Factor mobility leads naturally to the postulated social interactions matrix being: (i) geographic neighbors given labor mobility; and (ii) jurisdictions with similar economic or demographic characteristics, given capital mobility (Case et al., 1989).

Footnote: As in Bramoullé et al. (2009), we exclude isolated nodes because they do not conform with row-sum normalization.

Footnote: The bias in spatial auto-regressive models with a small number of observations even when the network is observed is similarly documented by Mizruchi and Neuman (2008), Farber et al. (2009), Smith (2009), Neuman and Mizruchi (2010), and Wang et al. (2014).

Footnote: A body of evidence finds that tax bases are mobile in response to tax differentials (Hines, 1996; Devereux and Griffith, 1998; Kleven et al., 2013, 2014).

A second mechanism occurs through political economy channels (Shleifer, 1985). In particular, yardstick competition between jurisdictions is driven by voters making comparisons between states to learn about their own politician’s quality. Besley and Case (1995) formalize the idea in a model where voters use taxes set by governors in neighboring states to infer their own governor’s quality. This generates informational externalities across jurisdictions, forcing incumbents into yardstick competition, where their tax setting behavior is determined by what other incumbents do. Yardstick competition leads naturally to the postulated interactions matrix corresponding to a matrix of ‘political neighbors’: other states that voters make comparisons to.

This application shows the practical use of our approach to recover social interactions in a setting in which the number of nodes and time periods is relatively low: the data covers mainland US states, $N = 48$, for years 1962-2015, $T = 53$. Our approach identifies the structure of social interactions among ‘economic neighbors’, that we denote $W_{econ}$. We contrast this against a null hypothesis that states are only influenced by their geographic neighbors, $W_{geo}$, as postulated by Besley and Case (1995) and shown in Figure 1A. With $W_{econ}$ recovered, we can establish, beyond geography, what predicts the existence and strength of ties between states. Finally, relative to $W_{geo}$, we conduct simulations using $W_{econ}$ to assess the equilibrium propagation of tax setting shocks across mainland US states. Taken together, this body of evidence allows us to provide novel insights related to the role of factor mobility and yardstick competition in driving tax setting behavior across US states.

We denote state tax liabilities for state $i$ in year $t$ as $\tau_{it}$, covering state taxes collected from real per capita income, sales and corporate taxes. We measure this using a series constructed from data published annually in the Statistical Abstract of the United States. Our series covers mainland states ($N = 48$) for years 1962-2015 ($T = 53$), therefore extending the sample used by Besley and Case (1995), that runs from 1962-1988 ($T = 26$). The outcome considered, $\Delta\tau_{it}$, is the change in tax liabilities between years $t$ and ($t$ − ) because it might take a governor more than a year to implement a tax program. Their model implies a standard social interactions specification for the tax setting behavior of state governors:

$$\Delta\tau_{it} = \rho \sum_{j=1}^{N} W_{0,ij}\,\Delta\tau_{jt} + \gamma \sum_{j=1}^{N} W_{0,ij}\,x_{jt} + \beta x_{it} + \alpha_i + \alpha_t + \epsilon_{it}. \qquad (11)$$

Footnote: Besley and Case (1995) test their political agency model using a two equation set-up: (i) on gubernatorial re-election probabilities; and (ii) on tax setting. Our application focuses on the latter because this represents a social interaction problem. They use two tax series: (i) TAXSIM data (from the NBER) which runs from 1977-88; and (ii) state tax liabilities series constructed from data published annually in the Statistical Abstract of the US that runs from 1962-1988. All their results are robust to either series. We extend the second series.

Specification (11) comprises: (i) endogenous social effects arising through neighbors’ tax changes ($\sum_{j=1}^{N} W_{0,ij}\,\Delta\tau_{jt}$); (ii) exogenous social effects arising through the economic/demographic characteristics of neighbors ($\sum_{j=1}^{N} W_{0,ij}\,x_{jt}$); (iii) state $i$’s characteristics ($x_{it}$), that include income per capita, the unemployment rate, and the proportion of young and elderly. All specifications include state and time effects ($\alpha_i$, $\alpha_t$), so allowing for time-invariant unobserved heterogeneity across states, and for common (macroeconomic) shocks. Due to the inclusion of the time effects $\alpha_t$, we normalize the rows of $W_{econ}$ to one.
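To make the structure of specification (11) concrete, the sketch below solves for equilibrium outcomes in a single period, given an interaction matrix, an endogenous effect and a composite term collecting everything else on the right-hand side (exogenous effects, own characteristics, fixed effects and the shock). It is a stylized illustration under stated assumptions: the matrix `W`, the value of `rho` and the function names are hypothetical, and the fixed-point iteration is only one convenient way to solve the system when $|\rho| < 1$ and rows of $W$ sum to one.

```python
def mat_vec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def equilibrium_outcome(W, rho, c, n_iter=500):
    """Solve y = rho * W y + c by fixed-point iteration.

    c stands in for gamma * W x + beta * x + fixed effects + shock.
    The iteration contracts when |rho| < 1 and W is non-negative
    with rows summing to one.
    """
    y = list(c)
    for _ in range(n_iter):
        Wy = mat_vec(W, y)
        y = [rho * wy + ci for wy, ci in zip(Wy, c)]
    return y

# Hypothetical three-node network with rows normalized to one.
W = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
rho = 0.5

# A common unit shock is scaled up by the social multiplier 1 / (1 - rho).
common = equilibrium_outcome(W, rho, [1.0, 1.0, 1.0])
# A unit shock to node 0 alone spreads to node 1, and through it to node 2.
single = equilibrium_outcome(W, rho, [1.0, 0.0, 0.0])
```

With row-normalized weights, a shock common to all nodes is amplified by the same factor everywhere, whereas a shock to one node propagates unevenly along the directed links.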
Table A7 presents descriptive statistics for the Besley and Case (1995) sample and our extended sample.

Much of the earlier literature focuses on endogenous social effects and ignores exogenous social effects by setting $\gamma = 0$. Our identification result allows us to relax this constraint and thus estimate the full typology of social effects described by Manski (1993). This is important because only endogenous social effects lead to social multipliers, and are crucial to identify as they can lead to a race-to-the-bottom or sub-optimal public goods provision (Brennan and Buchanan, 1980; Wilson, 1986; Oates and Schwab, 1988).

After estimating the neighborhood matrix, we follow Besley and Case (1995) and estimate the model instrumenting for $\Delta\tau_{jt}$ using neighbors’ lagged change in income per capita, and neighbors’ lagged change in unemployment rate. These instruments are in the spirit of using exogenous social effects to instrument for neighbors’ tax changes. However, given our approach allows us to estimate exogenous social effects ($\gamma \neq 0$), these instruments will generally be weaker when estimating the full specification in (11). We thus follow Bramoullé et al. (2009) and De Giorgi et al. (2010), and also instrument neighbors’ tax changes with neighbor-of-neighbor characteristics.

Table 1 presents our preliminary findings and comparison to Besley and Case (1995). Column 1 shows OLS estimates of (11) where the postulated social interactions matrix is based on geographic neighbors, exogenous social effects are ignored so $\gamma = 0$, and the panel includes all mainland states but runs only from 1962-1988 as in Besley and Case (1995). Social interactions influence gubernatorial tax setting behavior: $\hat{\rho}_{OLS}$ = . . Column 2 shows this to be robust to instrumenting neighbors’ tax changes using the instrument set proposed by Besley and Case (1995).
$\hat{\rho}_{2SLS}$ is more than double the magnitude of $\hat{\rho}_{OLS}$, suggesting tax setting behaviors across jurisdictions are strategic complements, and OLS estimates are heavily downward-biased.

Columns 3 and 4 replicate both specifications over the longer sample period, and confirm Besley and Case’s (1995) finding on social interactions to be robust in this longer sample. We again note that $\hat{\rho}_{2SLS}$ is more than double the magnitude of $\hat{\rho}_{OLS}$. The result in Column 4 implies that for every dollar increase in the average tax rates among geographic neighbors, a state increases its own taxes by cents. This is similar to the headline estimate of Besley and Case (1995).

Footnote: Nor is the magnitude very different from earlier work examining fiscal expenditure spillovers. For example, Case et al. (1989) find that US state government levels of per-capita expenditures are significantly impacted by the expenditures of their neighbors, with the size of the impact being that a one dollar increase in neighbors’ expenditures leads to an increase in own-state expenditures by seventy cents.

4.3 Endogenous and Exogenous Social Interactions ($\rho$ and $\gamma$)

We now move beyond much of the earlier political economy and public economics literature to first establish whether there are endogenous and exogenous social interactions in tax setting behavior. We first focus on the endogenous and exogenous social interaction parameters, and in the next subsection we detail the identified social interactions matrix, $\hat{W}_{econ}$. Column 1 of Table 2 shows the initial estimates obtained from the Adaptive Elastic Net procedure where $\gamma = 0$. Columns 2 and 3 show the resulting OLS and 2SLS estimates for $\rho$: $\hat{\rho}_{2SLS}$ = . > $\hat{\rho}_{OLS}$ = . > 0. Columns 4 to 6 estimate the full model in (11). Columns 5 and 6 show the OLS and 2SLS estimates of $\rho$ are smaller, and less precisely estimated when exogenous social effects are allowed. This is not surprising given that the instrument set is based on neighbors’ characteristics, many of which are directly controlled for in (11), thus reducing the effective variation induced by the instrument. Hence, in Column 7, we report 2SLS estimates based on instruments using neighbor-of-neighbor characteristics. This represents our preferred specification: $\hat{\rho}_{2SLS}$ = . (with a standard error of . ). This value also meets the requirements on $\rho$ in Corollaries 3 and 4 for global identification. In short, there is robust evidence of endogenous social interactions in tax setting behavior of governors across states.

Footnote: We report robust standard errors and so do not adjust them for the fact that $\hat{W}_{econ}$ is estimated.

Footnote: Table A8 shows the full set of exogenous social effects (so Columns 1 to 4 refer to the same specifications as Columns 4 to 7 in Table 2). Exogenous social effects operate through economic neighbors’ income per capita and unemployment rate. Demographic characteristics of economic neighbors to state $i$ do not impact its tax rate.

... ($\hat{W}_{econ}$)

Figure 1B shows how the structure of economic ($\hat{W}_{econ}$) and geographic networks ($W_{geo}$) differ, where connected edges imply that two states are linked in at least one direction (either state $i$ causally impacts state taxes in $j$, and/or vice versa). This comparison makes it clear whether all states geographically adjacent to $i$ matter for its tax setting behavior and whether there are non-adjacent states that influence its tax rate.

The left-hand panel of Figure 1B shows the network of geographic neighbors (whose edges are colored blue), onto which we have superimposed the edges that are not identified as links in $\hat{W}_{econ}$; these dropped edges are indicated in red. This first implies that not all geographically adjacent states are relevant for tax setting behavior. The right-hand panel of Figure 1B adds new edges identified in $\hat{W}_{econ}$ that are not part of $W_{geo}$. These represent non-adjacent states through which social interactions occur. This implies the existence of spatially dispersed social interactions between states. The implication is that for tax-setting behavior, economic distance is imperfectly measured if we simply assume that interactions depend only on geographical distance. As detailed ... $W_{geo}$ has edges, while $\hat{W}_{econ}$ has only edges. States are less connected than implied by postulating geographic networks. $\hat{W}_{econ}$ and $W_{geo}$ have edges in common. However, $W_{geo}$ has edges that are absent in $\hat{W}_{econ}$. Hence, while geography remains a key determinant of tax competition, the majority of geographical neighbors ( /214 = 63%) are not relevant for tax setting. There are edges that exist only in $W_{geo}$, so although there are fewer edges in $\hat{W}_{econ}$, the identified social interactions are more spatially dispersed than under the assumption of geographic networks. This is reflected in the far lower clustering coefficient in $\hat{W}_{econ}$ than in $W_{geo}$ ( . versus . ).

Our estimation strategy identifies the continuous strength of ties, $W_{0,ij}$, where $W_{0,ij} > 0$ is interpreted as state $j$ influencing outcomes in state $i$. This is useful because recent developments in tax competition theory, using insights from the social networks literature, suggest links need not be reciprocal or of symmetric strength (Janeba and Osterloh, 2013).

Figure 2A shows the distribution of $W_{0,ij}$’s across edges in $\hat{W}_{econ}$ (conditional on $W_{0,ij} > 0$). The strength of ties between pairs of states varies greatly. The mean strength of ties is . , which is higher than the median strength, . , suggesting many weak ties. At the other end of the distribution, the strongest % of ties have weight above . .

On the reciprocity of ties, Table 3 reveals that only . % of edges in $\hat{W}_{econ}$ are reciprocal (all edges in $W_{geo}$ are reciprocal by construction). Hence, tax competition is both spatially dispersed and highly asymmetric. In most cases where tax setting in state $i$ is influenced by taxes in state $j$, the opposite is not true.

Panels B and C in Figure 2 illustrate this for California, indicating the strength of each tie ($\hat{W}_{econ,CA,j}$). Figure 2B shows the in-network for California: those states causally impacting tax-setting in California. Some geographic neighbors to California influence its tax setting behavior (Nevada and Oregon), although these ties are weak. On the other hand, non-adjacent states influence California (Colorado, Maine), and these in-network ties are stronger than geographically adjacent in-network ties.
Figure 2C shows the out-network for California, again indicating each tie strength ($\hat{W}_{econ,i,CA}$): those states whose taxes are influenced by taxes in California. We see that none of the geographic neighbors to California are influenced by its tax setting behavior, whereas a number of non-adjacent states are influenced (including East Coast states such as Virginia, and Southern states, such as Louisiana). When states are influenced by taxes in California, these ties tend to be relatively strong: $\hat{W}_{econ,i,CA} \geq$ . for all five in-network ties.

Footnote: The clustering coefficient is the frequency of the number of fully connected triplets over the total number of triplets.

Footnote: Other metrics can also be used to provide a scalar comparison of $W_{geo}$ and $\hat{W}_{econ}$. One way to do so is to reshape both matrices as vectors of length (48 × ) and to compute their correlation. Doing so, we obtain a correlation coefficient of . .

Given the time effects $\alpha_t$ in (11), row-sum normalization is required and ensures $\sum_j W_{0,ij} = 1$. Hence, for every state $i$ there will be at least one economic neighbor state $j^*$ that impacts it, so that $W_{0,ij^*} > 0$. This just reiterates that social interactions matter. On the other hand, our procedure imposes no restriction on the derived columns of $\hat{W}_{econ}$. It could be that a state does not affect any other state. Examining this possibility directly in $\hat{W}_{econ}$, we see this occurs for Minnesota, New Jersey, New Mexico, Vermont, and Wisconsin. These states have an out-degree of zero. Their tax rates impact no other states.

Table 3 reports the degree distribution across all nodes (states), splitting for in-networks and out-networks. In $W_{geo}$, the in-degree is by construction equal to the out-degree, as all ties are reciprocal. The greater sparsity of the network of economic neighbors is again reflected in the degree distribution being lower for $\hat{W}_{econ}$ than for $W_{geo}$.
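The network statistics discussed here (in-degree, out-degree and the share of reciprocal ties) follow directly from the convention that a positive entry in row $i$, column $j$ means $j$ influences $i$. The sketch below is illustrative only: the function names and the four-node matrix are hypothetical, and reciprocity is assumed to be measured as the share of linked pairs with ties in both directions, which may differ in detail from the definition behind Table 3.

```python
def in_degree(W, i):
    """Number of nodes j that influence node i (positive entries in row i)."""
    return sum(1 for j, w in enumerate(W[i]) if j != i and w > 0)

def out_degree(W, j):
    """Number of nodes i influenced by node j (positive entries in column j)."""
    return sum(1 for i, row in enumerate(W) if i != j and row[j] > 0)

def reciprocity(W):
    """Share of linked pairs {i, j} whose tie runs in both directions."""
    n = len(W)
    linked = reciprocal = 0
    for i in range(n):
        for j in range(i + 1, n):
            if W[i][j] > 0 or W[j][i] > 0:
                linked += 1
                if W[i][j] > 0 and W[j][i] > 0:
                    reciprocal += 1
    return reciprocal / linked if linked else float("nan")

# Hypothetical row-normalized network: node 1 is highly influential,
# node 3 influences nobody (out-degree zero).
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]

in_degrees = [in_degree(W, i) for i in range(len(W))]
out_degrees = [out_degree(W, j) for j in range(len(W))]
share_reciprocal = reciprocity(W)
```

Row-sum normalization guarantees every node has an in-degree of at least one, while nothing prevents a column of zeros, which is why out-degree can be zero for some nodes.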
In $\hat{W}_{econ}$ the dispersion of in- and out-degree networks is very different (as measured by the standard deviation), being near double for the in-degree. This asymmetry in $\hat{W}_{econ}$ further suggests that some highly focal or influential states drive tax setting behavior in other states.

Figures 3A and 3B show complete histograms for the in- and out-degree across states. The histogram on the left is for in-degree, and shows that states under $\hat{W}_{econ}$ generally have lower in-degree than under $W_{geo}$. The states that are influenced by the highest number of other states are Utah, Pennsylvania and Ohio. The histogram on the right, for out-degree, shows the five states described above that do not impact other states (Wisconsin, Vermont, New Mexico, New Jersey and Minnesota). Delaware is an outlier influential state in its out-degree in determining tax setting in other states: as discussed below, Delaware is a well-known potential tax haven.

We conclude by presenting two strategies to shed light on whether factor mobility and yardstick competition drive these social interactions: (i) exploiting information in the identified social interactions matrix $\hat{W}_{econ}$; (ii) following Besley and Case (1995), using gubernatorial re-election as an indirect test of the relevance of yardstick competition.

In our first strategy we estimate the factors correlated with the existence/strength of links between states $i$ and $j$ in $\hat{W}_{econ}$ using the following dyadic regression:

$$\hat{W}_{econ,ij} = \lambda_0 + \lambda_1 X_{ij} + \lambda_2 X_i + \lambda_3 X_j + u_{ij}.$$

We discretize link strength so $\hat{W}_{econ,ij} \in \{0, 1\}$ and predict the existence of a link using a linear probability model. We then estimate the correlates of link strength $\hat{W}_{econ,ij} \in [0, 1]$ using a Tobit model.

Footnote: Dyreng et al. (2013) find that taxes play an important role in determining whether firms locate subsidiaries in Delaware: a Delaware-based state tax avoidance strategy lowers state effective tax rates by around percentage point.
They also report that in June 2010, Delaware landed at the top of National Geographic magazine’s published list of the most secretive tax havens in the world (ahead of foreign tax havens such as Luxembourg, Switzerland, and the Cayman Islands).

$X_{ij}$, $X_i$, and $X_j$ correspond to characteristics of the pair of states $(i, j)$, of state $i$, and state $j$, respectively. Covariates are time-averaged over the sample period, and robust standard errors are reported. The sample thus corresponds to $N \times (N - 1) = 48 \times 47 = 2256$ potential $ij$ links that could have formed.

Table 4 presents the results. Column 1 controls only for whether states $i$ and $j$ are geographic neighbors. This is highly predictive of a link between them. Columns 2 and 3 show that distance between states also negatively correlates with them being linked, but that when both geographic adjacency and distances are included, the former is more predictive. Hence, we control only for whether $i$ and $j$ are geographic neighbors in the remaining Columns.

The next set of specifications use the insight that economic neighbors are likely to be based on a mixture of similarity in geography, income per capita, and demography (Case et al., 1989). Column 4 thus adds two $X_{ij}$ covariates to capture the economic and demographic homophily between states $i$ and $j$. GDP homophily is the absolute difference in the states’ GDP per capita. Demographic homophily is the absolute difference of the share of young people (aged 5-17) plus the absolute difference of the share of elderly people (aged 65+) across the states. GDP homophily predicts ties, whereas demographic homophily does not.

Columns 5 to 7 then sequentially add in several sets of controls. For labor mobility, we use net state-to-state migration data to control for the net migration flow of individuals from state $i$ to state $j$ (defined as the flow from $i$ to $j$ minus the flow from $j$ to $i$). We then add a political homophily variable between states. For any given year, this is set to one if a pair of states have governors of the same political party. As this is time averaged over our sample, this element captures the share of the sample period in which the states have governors of the same party. Lastly, we include whether state $j$ is considered a tax haven (and so might have disproportionate influence on other states). Based on Findley et al. (2012), the following states are coded as tax havens: Nevada, Delaware, Montana, South Dakota, Wyoming and New York.
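The dyadic set-up described above can be sketched as follows: stack all ordered pairs $(i, j)$ with $i \neq j$, discretize the estimated weight into a link indicator, and regress it on a pair-level covariate. The sketch is a simplified illustration with a single regressor (geographic adjacency) and plain OLS; the matrices, function names and values are hypothetical, and it omits the additional covariates, robust standard errors and Tobit step used in the paper.

```python
def build_dyads(W_econ, W_geo):
    """Return (link, adjacent) for every ordered pair (i, j) with i != j."""
    n = len(W_econ)
    rows = []
    for i in range(n):
        for j in range(n):
            if i != j:
                link = 1.0 if W_econ[i][j] > 0 else 0.0      # discretized tie
                adjacent = 1.0 if W_geo[i][j] > 0 else 0.0   # geographic neighbor
                rows.append((link, adjacent))
    return rows

def simple_ols(y, x):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical four-node example: geography is a line 0-1-2-3.
W_geo = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
W_econ = [[0.0, 1.0, 0.0, 0.0],
          [1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [1.0, 0.0, 0.0, 0.0]]   # node 3 is tied to non-adjacent node 0

dyads = build_dyads(W_econ, W_geo)
links = [d[0] for d in dyads]
adjacency = [d[1] for d in dyads]
intercept, slope = simple_ols(links, adjacency)
```

With a single binary regressor, the linear probability model slope is just the difference in link frequency between adjacent and non-adjacent pairs, which gives a quick sanity check on any richer specification.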
This corroborates earlier evidence in Figure 3B, where Delaware, Wyoming and Nevada were among the states with the highest out-degree.

The specification in Column 7 shows that with this full set of controls, geographic adjacency remains a robust predictor of the existence of links between states. However, the identified economic network highlights additional significant predictors of tax competition between states: political homophily reduces the likelihood of a link, suggesting any yardstick competition driving social interactions occurs when voters compare their governor to those of the opposing party in other states. The tax haven states appear to be especially influential in the tax setting behaviors of other ... can run for re-election.

Taken together, our evidence suggests that both factor mobility (of both labor and capital, as measured through the influence of tax havens), and yardstick competition (occurring through comparisons to governors of the other party), are important mechanisms driving the existence and strength of interactions in tax-setting behavior across US states.

Finally, in the Appendix, we contrast how shocks to tax setting in a given state propagate under the identified interactions matrix $\hat{W}_{econ}$, relative to what would have been predicted under a postulated network structure based on $W_{geo}$. As $\hat{W}_{econ}$ is spatially more dispersed than $W_{geo}$, the general equilibrium effects might be very different under the two network structures. We therefore discuss the implications for tax inequality under $\hat{W}_{econ}$ and the $W_{geo}$ counterfactual.

In a canonical social interactions model, we provide sufficient conditions under which the social interactions matrix, and endogenous and exogenous social effects are all globally identified, even absent information on social links. Our identification strategy is novel, and may bear fruit in other areas.
We describe how high-dimensional estimation techniques can be used to estimate the model based on the Adaptive Elastic Net GMM method. We showcase our method in Monte Carlo simulations using two stylized and two real world networks: these highlight that even in panels as short as $T = 5$, the majority of social ties can be correctly identified. Finally, we employ this estimation strategy to provide novel insights in a classic social interactions problem: tax competition across US states.

Our method is immediately applicable to other classic social interactions problems. For example, in finance a long-standing question has been whether CEOs are subject to relative performance evaluation, and if so, what is the comparison set of firms/CEOs used (Edmans and Gabaix, 2016). Other fields such as macroeconomics, political economy and trade are all obvious areas where social interactions across jurisdictions/countries etc. could drive key outcomes, panel data exist, and the number of nodes is relatively fixed. Our approach can also be applied to new contexts where social interactions determine economic behavior but data on social links is absent. Advances in the availability of administrative data, data from social media or mobile technologies, and high frequency data in finance and from online economic transactions, all offer new possibilities to identify social interactions. For example, van Vliet (2018) studies the interconnectedness between the largest financial institutions during the 2008 financial crisis using readily available market data, in which $N = 13$ and $T = 500$.

Three further directions for future research are of note. First, under partial observability of $W$ (as in Blume et al., 2015), the number of parameters in $W$ to be retrieved falls quickly.
Our approach can then still be applied to complete knowledge of $W$, such as if Aggregate Relational Data is available, and this could be achieved with potentially weaker assumptions for identification, and in even shorter panels. To illustrate possibilities, Figure 4 shows results from a final simulation exercise in which we assume the researcher starts with partial knowledge of $W$. We do so for the Banerjee et al. (2013) village family network, showing simulation results for scenarios in which the researcher knows the social ties of the three (five, ten) households with the highest out-degree. For comparison we also show the earlier simulation results when $W$ is entirely unknown. This clearly illustrates that with partial knowledge of the social network, performance on all metrics improves rapidly for any given $T$.

Second, we have developed our approach in the context of the canonical linear social interactions model (1). This builds on Manski (1993) when $W$ is known to the researcher, and the reflection problem is the main challenge in identifying endogenous and exogenous social effects. However, as established in Blume et al. (2011) and Blume et al. (2015), the reflection problem is functional-form dependent and may not apply to many non-linear models. An important topic for future research is thus to extend the insights gathered here to non-linear social interaction settings.

Finally, our approach has taken the network structure as predetermined and fixed. Clearly, an important part of the social networks literature examines endogenous network formation (Jackson et al., 2017; de Paula, 2017). Our analysis allows us to begin probing the issue in two ways. First, the kind of dyadic regression analysis in Section 4 on the correlates of entries in $W_{0,ij}$ suggests factors ... $\hat{W}_{econ}$ by running the procedure in two subsamples, each with $T = 26$ periods: 1962-88 and 1989-2015. Panels A and B in Figure 5 show the resulting estimated economic networks in each subsample, and Panel C provides network statistics for each subsample panel (as well as for the earlier estimated economic network and the network based on geographic neighbors). This highlights that the network structure of tax competition has changed over time, with the later sample network from 1989-2015 having fewer edges, fewer reciprocated edges, lower clustering and lower degree distribution.

Footnote: Edmans and Gabaix (2016) overview the theory and empirics of executive compensation. Applying the informativeness principle in contract theory to CEO pay suggests peer performance is informative about the degree to which firm value is due to high CEO effort or luck. In a first generation of studies, Aggarwal and Samwick (1999) and Murphy (1999) showed that CEO pay is determined by absolute, rather than relative performance. However, this conclusion has been challenged by others such as Gong et al. (2011) who argue these conclusions arise from identifying relative performance evaluation (RPE) based on an implicit approach, assuming a peer group (e.g. based on industry and/or size). Indeed, when Gong et al. (2011) study the explicit use of RPE, based on the disclosure of peer firms and performance measures mandated by the SEC in 2006, they actually find that % of S&P 1500 firms explicitly use RPE. We are currently working on using our method to provide novel evidence on the matter.

This analysis leads naturally to a broad agenda going forward, to address the challenge of simultaneously identifying and estimating time varying models of network formation and social interaction, all in cases where data on social networks is not required.

References
Acemoglu, D., V. Carvalho, A. Ozdaglar, and A. Tahbaz-Salehi (2012). The NetworkOrigins of Aggregate Fluctuations.
Econometrica , 80, 1977–2016.
Aggarwal, R. K. and A. A. Samwick (1999). Executive Compensation, Strategic Competition,and Relative Performance Evaluation: theory and evidence.
The Journal of Finance , 54.
Ambrosetti, A. and G. Prodi (1972). On the Inversion of Some Differentiable Mappings withSingularities between Banach Spaces.
Annali di Matematica Pura ed Applicata , 93, 231–46.——— (1995).
A Primer of Nonlinear Analysis , Cambridge University Press.
Anselin, L. (2010). Thirty Years of Spatial Econometrics.
Papers in Regional Science , 89, 3–25.
Atalay, E., A. Hortacsu, J. Roberts, and C. Syverson (2011). Network Structure ofProduction.
Proceedings of the American Mathematical Society , 108, 5199–202.
Ballester, C., A. Calvo-Armendol, and Y. Zenou (2006). Who’s Who in Networks.Wanted: The Key Player.
Econometrica , 74, 1403–17.
Banerjee, A., A. G. Chandrasekhar, E. Duflo, and M. O. Jackson (2013). The Diffusionof Microfinance.
Science , 341, 1236498.
Barrenho, E., M. Miraldo, C. Propper, and C. Rose (2019). Peer and Network Effects inMedical Innovation: The Case of Laparoscopic Surgery in the English NHS. University of YorkWorking Paper. 31 attaglini, M., E. Patacchini, and E. Rainone (2019). Endogenous Social Connections inLegislatures.
NBER Working Paper 25988 . Besley, T. and A. Case (1994). Unnatural Experiments? Estimating the Incidence of Endoge-nous Policies.
NBER Working Paper 4956 .——— (1995). Incumbent Behavior: Vote-seeking, Tax-setting, and Yardstick Competition.
Amer-ican Economic Review , 85, 25–45.
Blume, L., W. A. Brock, S. N. Durlauf, and Y. Ioannides (2011). Identification of SocialInteractions. in
Handbook of Social Economics , ed. by J. Behabib, A. Bisin, and M. O. Jackson,Noth-Holland, vol. 1B.
Blume, L. E., W. A. Brock, S. N. Durlauf, and R. Jayaraman (2015). Linear SocialInteractions Models.
Journal of Political Economy , 123, 444–96.
Bonaldi, P., A. Hortacsu, and J. Kastl (2015). An Empirical Analysis of Funding CostsSpillovers in the EURO-zone with Application to Systemic Risk. Princeton University WorkingPaper.
Bramoullé, Y., H. Djebbari, and B. Fortin (2009). Identification of Peer Effects ThroughSocial Networks.
Journal of Econometrics , 150, 41–55.
Brennan, G. and J. Buchanan (1980).
The Power to Tax: Analytical Foundations of a FiscalConstitution , Cambridge University Press.
Breza, E., A. Chandrasekhar, T. McCormick, and M. Pan (2019). Using AggregatedRelational Data to Feasibly Identify Network Structure without Network Data.
American Eco-nomic Review . Bursztyn, L., F. Ederer, B. Ferman, and N. Yuchtman (2014). Understanding Mech-anisms Underlying Peer Effects: Evidence From a Field Experiment on Financial Decisions.
Econometrica , 82, 1273–301.
Caner, M. and H. H. Zhang (2014). Adaptive Elastic Net for Generalized Method of Moments.
Journal of Business and Economic Statistics , 32, 30–47.
Case, A., J. R. Hines, and H. S. Rosen (1989). Copycatting: Fiscal Policies of States andTheir Neighborrs.
NBER Working Paper 3032.
Case, A., H. Rosen, and J. Hines (1993). Budget Spillovers and Fiscal Policy Interdependence: Evidence from the States. Journal of Public Economics, 52, 285–307.
Chandrasekhar, A. and R. Lewis (2016). Econometrics of Sampled Networks. Working Paper.
Chaney, T. (2014). The Network Structure of International Trade. American Economic Review, 104, 3600–34.
Chiappori, P.-A., I. Komunjer, and D. Kristensen (2015). Nonparametric Identification and Estimation of Transformation Models. Journal of Econometrics, 188, 22–39.
Coleman, J. S. (1964). Introduction to Mathematical Sociology, London Free Press Glencoe.
Conley, T. G. and C. R. Udry (2010). Learning About a New Technology: Pineapple in Ghana. American Economic Review, 100, 35–69.
De Giorgi, G., M. Pellizzari, and S. Redaelli (2010). Identification of Social Interactions through Partially Overlapping Peer Groups. American Economic Journal: Applied Economics, 2, 241–75.
de Marco, G., G. Gorni, and G. Zampieri (2014). Global Inversion of Functions: an Introduction. ArXiv:1410.7902v1.
de Paula, A. (2017). Econometrics of Network Models. in Advances in Economics and Econometrics: Theory and Applications, ed. by B. Honore, A. Pakes, M. Piazzesi, and L. Samuelson, Cambridge University Press.
Devereux, M. and R. Griffith (1998). Taxes and the Location of Production: Evidence from a Panel of US Multinationals. Journal of Public Economics, 68, 335–67.
Diebold, F. X. and K. Yilmaz (2015). Financial and Macroeconomic Connectedness: A Network Approach to Measurement and Monitoring, Oxford University Press.
Dyreng, S., B. Lindsey, and J. Thornbock (2013). Exploring the Role Delaware Plays as a Domestic Tax Haven. Journal of Financial Economics, 108, 751–72.
Edmans, A. and X. Gabaix (2016). Executive Compensation: a modern primer. Journal of Economic Literature, 54, 1232–87.
Erdos, P. and A. Renyi (1960). On the Evolution of Random Graphs. Publ. Math. Inst. Hung. Acad. Sci, 5, 17–60.
Farber, S., A. Páez, and E. Volz (2009). Topology and dependency tests in spatial and network autoregressive models. Geographical Analysis, 41, 158–180.
Fetzer, T., O. Eynde, P. Souza, and A. Wright (2020). Security Transitions.
Findley, M., D. Nielson, and J. Sharman (2012). Global Shell Games: Testing Money Launderers’ and Terrorist Financiers’ Access to Shell Companies. Griffith University Working Paper.
Gautier, E. and C. Rose (2016). Inference in Social Effects when the Network is Sparse and Unknown. (in preparation).
Gautier, E. and A. Tsybakov (2014). High-Dimensional Instrumental Variables Regression and Confidence Sets. Working Paper CREST.
Gong, G., L. Y. Li, and J. Y. Shin (2011). Relative Performance Evaluation and Related Peer Groups in Executive Compensation Contracts. The Accounting Review, 86, 1007–43.
Granovetter, M. (1973). The Strength of Weak Ties. American Journal of Sociology, 6, 1360–80.
Hines, J. (1996). Altered States: Taxes and the Location of Foreign Direct Investment in America. American Economic Review, 86, 1076–94.
Horn, R. A. and C. R. Johnson (2013). Matrix Analysis, Cambridge University Press.
Jackson, M., B. Rogers, and Y. Zenou (2017). The Economic Consequences of Social Network Structure. Journal of Economic Literature, 55, 49–95.
Janeba, E. and S. Osterleh (2013). Tax and the City – A Theory of Local Tax Competition. Journal of Public Economics, 106, 89–100.
Kennedy, J. and R. Eberhart (1995). Particle Swarm Optimization. Proceedings of the IEEE International Conference on Neural Networks, IV, 1942–8.
Kleven, H., C. Landais, and E. Saez (2013). Taxation and International Mobility of Superstars: Evidence from the European Football Market. American Economic Review, 103, 1892–924.
--- (2014). Migration and Wage Effects of Taxing Top Earners: Evidence from the Foreigners’ Tax Scheme in Denmark. Quarterly Journal of Economics, 129, 333–78.
Kline, B. and E. Tamer (2016). Bayesian Inference in a Class of Partially Identified Models. Quantitative Economics, 7, 329–366.
Komunjer, I. (2012). Global Identification in Nonlinear Models with Moment Restrictions. Econometric Theory, 28, 719–29.
Krantz, S. G. and H. R. Parks (2013). The Implicit Function Theorem, Birkhauser.
Lam, C. and P. C. Souza (2016). Detection and Estimation of Block Structure in Spatial Weight Matrix. Econometric Reviews, 35, 1347–1376.
--- (2019). Estimation and Selection of Spatial Weight Matrix in a Spatial Lag Model. Journal of Business and Economic Statistics, forthcoming.
Lee, L.-F. (2004). Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models. Econometrica, 72, 25.
--- (2007). Identification and Estimation of Econometric Models with Group Interactions, Contextual Factors and Fixed Effects. Journal of Econometrics, 60, 531–42.
Lee, S. and A. Lewbel (2013). Nonparametric Identification of Accelerated Failure Time Competing Risks Models. Econometric Theory, 29, 905–19.
Lewbel, A., X. Qu, and X. Tan (2019). Social Networks with Misclassified or Unobserved Links. Manuscript.
Manresa, E. (2016). Estimating the Structure of Social Interactions Using Panel Data. Manuscript.
Manski, C. F. (1993). Identification of Endogenous Social Effects: the reflection problem. The Review of Economic Studies, 60, 531–42.
Meinshausen, N. and P. Buhlmann (2006). High-Dimensional Graphs and Variable Selection with the Lasso. The Annals of Statistics, 34, 1436–1462.
Mizruchi, M. S. and E. J. Neuman (2008). The effect of density on the level of bias in the network autocorrelation model. Social Networks, 30, 190–200.
Murphy, K. J. (1999). Executive Compensation. Handbook of Labor Economics, 3, 2485–563.
Neuman, E. J. and M. S. Mizruchi (2010). Structure and bias in the network autocorrelation model. Social Networks, 32, 290–300.
Oates, W. and R. Schwab (1988). Economic Competition Among Jurisdictions: Efficiency-enhancing or Distortion-inducing? Journal of Public Economics, 35, 333–54.
Rose, C. (2015). Essays in Applied Microeconometrics. Ph.D. thesis, University of Bristol.
Rothenberg, T. (1971). Identification in Parametric Models. Econometrica, 39, 577–91.
Rothenhäusler, D., C. Heinze, J. Peters, and N. Meinshausen (2015). BACKSHIFT: Learning causal cyclic graphs from unknown shift interventions. in Advances in Neural Information Processing Systems, 1513–1521.
Sacerdote, B. (2001). Peer Effects with Random Assignment: Results for Dartmouth Roommates. The Quarterly Journal of Economics, 116, 681–704.
Shleifer, A. (1985). A Theory of Yardstick Competition. Rand Journal of Economics, 16.
Smith, T. E. (2009). Estimation bias in spatial models with strongly connected weight matrices. Geographical Analysis, 41, 307–332.
Souza, P. C. (2014). Estimating Network Effects without Network Data. PUC-Rio Working Paper.
Tibshirani, R. J. and J. Taylor (2012). Degrees of freedom in lasso problems. The Annals of Statistics, 40, 1198–1232.
Tiebout, C. (1956). A Pure Theory of Local Expenditures. Journal of Political Economy, 64, 416–24.
van Vliet, W. (2018). Connections as Jumps: Estimating Financial Interconnectedness from Market Data. Manuscript.
Wang, W., E. J. Neuman, and D. A. Newman (2014). Statistical power of the social network autocorrelation model. Social Networks, 38, 88–99.
Watts, D. J. (1999). Networks, Dynamics, and the Small-World Phenomenon. American Journal of Sociology, 105, 493–527.
Wilson, J. (1986). A Theory of Interregional Tax Competition. Journal of Urban Economics, 19, 296–315.
--- (1999). Theories of Tax Competition. National Tax Journal, 52, 269–304.
Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data, The MIT Press.
Yuan, M. and Y. Lin (2007). Model Selection and Estimation in the Gaussian Graphical Model. Biometrika, 94, 19–35.
Zhou, W. (2019). A Network Social Interaction Model with Heterogeneous Links. Economics Letters, 180, 50–53.
Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101, 1418–29.
Zou, H. and T. Hastie (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301–320.
Zou, H., T. Hastie, and R. Tibshirani (2007). On the “Degrees of Freedom” of the LASSO. The Annals of Statistics, 35, 2173–2192.
Zou, H. and H. H. Zhang (2009). On the Adaptive Elastic-net with a Diverging Number of Parameters. Ann. Statist., 37, 1733–51.
A Proofs
Theorem 1
Proof.
The local identification result follows Rothenberg (1971). Under the assumptions in our model, the parameter space $\Theta \subset \mathbb{R}^m$ is an open set (recall that $m = N(N-1) + 3$). This corresponds to Assumption I in Rothenberg (1971).

We have that
$$\frac{\partial \Pi}{\partial W_{ij}} = \rho (I-\rho W)^{-1}\Delta_{ij}(I-\rho W)^{-1}(\beta I+\gamma W) + (I-\rho W)^{-1}\gamma\Delta_{ij},$$
$$\frac{\partial \Pi}{\partial \rho} = (I-\rho W)^{-1}W(I-\rho W)^{-1}(\beta I+\gamma W),$$
$$\frac{\partial \Pi}{\partial \gamma} = (I-\rho W)^{-1}W, \qquad \frac{\partial \Pi}{\partial \beta} = (I-\rho W)^{-1},$$
where $\Delta_{ij}$ is the $N \times N$ matrix with 1 in the $(i,j)$-th position and zero elsewhere. Write the $N^2 \times m$ derivative matrix $\nabla_\Pi \equiv \partial\,\mathrm{vec}(\Pi)/\partial\theta'$. By assumption, row $i$ in matrix $W$ sums up to one, incorporated through the restriction that $\varphi \equiv \sum_{j=1, j\neq i}^{N} W_{ij} - 1 = 0$, for the unit-normalised row $i$. The derivative of the restriction $\varphi$ is the $m$-dimensional vector
$$\nabla_W' \equiv \frac{\partial\varphi}{\partial\theta'} = \begin{bmatrix} e_i' \otimes \iota_{N-1}' & 0_{1\times 3} \end{bmatrix}$$
(where $e_i$ is an $N$-dimensional vector with 1 in the $i$th component and zero otherwise). Following Theorem 6 of Rothenberg (1971), the structural parameters $\theta \in \Theta$ are locally identified if, and only if, the matrix $\nabla \equiv [\nabla_\Pi' \;\; \nabla_W']'$ has rank $m$.

Footnote: For a parameter vector to be locally identified, Rothenberg (1971) requires that the derivative matrix $\nabla$ have rank $m$ at that point and that this vector be (rank-)regular. A (rank-)regular point of the parameter space is one for which there is a neighborhood where the rank of $\nabla$ is constant (see Definition 4 in Rothenberg, 1971). Because we show that the derivative matrix has rank $m$ at every point in the parameter space, this also guarantees that every point in the parameter space is (rank-)regular.

If $\nabla$ does not have rank $m$, there is a nonzero vector $c \equiv (c_{W_{12}}, \ldots, c_{W_{N,N-1}}, c_\rho, c_\gamma, c_\beta)'$ such that $\nabla \cdot c = 0$. This implies that
$$c_{W_{12}}\frac{\partial \Pi}{\partial W_{12}} + \cdots + c_{W_{N,N-1}}\frac{\partial \Pi}{\partial W_{N,N-1}} + c_\rho\frac{\partial \Pi}{\partial \rho} + c_\gamma\frac{\partial \Pi}{\partial \gamma} + c_\beta\frac{\partial \Pi}{\partial \beta} = 0 \qquad (12)$$
and, for the unit-normalized row $i$ (see A4),
$$\sum_{j \neq i,\, j=1,\ldots,N} c_{W_{ij}} = 0. \qquad (13)$$

Premultiplying equation (12) by $(I-\rho W)$ and substituting the derivatives,
$$\sum_{i,j=1,\, i\neq j}^{N} c_{W_{ij}}\left[\rho\Delta_{ij}(I-\rho W)^{-1}(\beta I+\gamma W) + \gamma\Delta_{ij}\right] + c_\rho W(I-\rho W)^{-1}(\beta I+\gamma W) + c_\gamma W + c_\beta I = 0.$$
Define $C \equiv \sum_{i,j=1,\, i\neq j}^{N} c_{W_{ij}}\Delta_{ij}$. Since the spectral radius of $\rho W$ is strictly less than one by A2, one can show (by representing $(I-\rho W)^{-1}$ as a Neumann series, for instance) that $(\beta I+\gamma W)$ and $(I-\rho W)^{-1}$ commute. Then, the expression above is equivalent to
$$\rho C(\beta I+\gamma W)(I-\rho W)^{-1} + \gamma C + c_\rho W(\beta I+\gamma W)(I-\rho W)^{-1} + c_\gamma W + c_\beta I = 0.$$
Post-multiplying by $(I-\rho W)$, we obtain
$$\rho C(\beta I+\gamma W) + \gamma C(I-\rho W) + c_\rho W(\beta I+\gamma W) + c_\gamma W(I-\rho W) + c_\beta(I-\rho W) = 0$$
which, upon rearrangement, yields
$$(\gamma+\rho\beta)C + c_\beta I + (\beta c_\rho - c_\beta\rho + c_\gamma)W + (c_\rho\gamma - \rho c_\gamma)W^2 = 0. \qquad (14)$$
Because $C_{ii} = 0$ and $W_{ii} = 0$ (by A1), we have that $c_\beta + (c_\rho\gamma - \rho c_\gamma)(W^2)_{ii} = 0$ for all $i = 1, \ldots, N$. Since by assumption A5 there is no constant $\kappa$ such that $\mathrm{diag}(W^2) = \kappa\iota$, then $c_\beta = c_\rho\gamma - \rho c_\gamma = 0$. Plugging back in (14), we obtain
$$(\gamma+\rho\beta)C + (\beta c_\rho + c_\gamma)W = 0,$$
which implies that $C = -\frac{\beta c_\rho + c_\gamma}{\gamma+\rho\beta}W$ since $\gamma+\rho\beta \neq 0$ by assumption A3. Taking the sum of the elements in row $i$, we get
$$(\gamma+\rho\beta)\sum_{j\neq i,\, j=1,\ldots,N} c_{W_{ij}} + (\beta c_\rho + c_\gamma) = 0.$$
Note that, by equation (13), $\sum_{j\neq i,\, j=1,\ldots,N} c_{W_{ij}} = 0$. So $\beta c_\rho + c_\gamma = 0$ and $C = -\frac{\beta c_\rho + c_\gamma}{\gamma+\rho\beta}W = 0$. This implies that $c_{W_{ij}} = 0$ for any $i$ and $j$. Combining $\beta c_\rho + c_\gamma = 0$ with $c_\rho\gamma - \rho c_\gamma = 0$ obtained above, we get that $c_\rho(\rho\beta+\gamma) = 0$. Since $\rho\beta+\gamma \neq 0$, then $c_\rho = 0$. Given that $\beta c_\rho + c_\gamma = 0$, it follows that $c_\gamma = 0$. Hence $c = 0$, which contradicts $c$ being nonzero. This shows that $\theta \in \Theta$ is locally identified.

Corollary 1
Proof.
The parameter $\theta_0$ being locally identified (see Theorem 1) implies that the set $\{\theta : \Pi(\theta) = \Pi(\theta_0)\}$ is discrete. If it is also compact, then the set is finite. To establish that, we now show that $\Pi$ is a proper function: the inverse image $\Pi^{-1}(K)$ of any compact set $K \subset \mathbb{R}^m$ is also compact (see Krantz and Parks, 2013, p.124).

Let $A$ be a compact set in the space of $N \times N$ real matrices. Since it is a compact set in a finite dimensional space, it is closed and bounded. Since $\Pi$ is a continuous function of $\theta$, the pre-image of a compact set, which is closed, is also closed. Because $W$ is bounded and $\rho \in (-1, 1)$, their corresponding coordinates in $\theta \in \Pi^{-1}(A)$ are bounded. Suppose the coordinates for $\beta$ or $\gamma$ in $\theta \in \Pi^{-1}(A)$ are not bounded. So one can find a sequence $(\theta_k)_{k=1}^{\infty}$ such that $|\beta_k| \to \infty$ or $|\gamma_k| \to \infty$.

Denote the Frobenius norm of the matrix $A$ as $\|A\|$. By the submultiplicative property $\|AB\| \leq \|A\| \cdot \|B\|$,
$$\|\beta I+\gamma W\| \leq \left\|(\beta I+\gamma W)(I-\rho W)^{-1}\right\| \cdot \|I-\rho W\|.$$
Note that $(I-\rho W)^{-1}$ and $(\beta I+\gamma W)$ commute, and so $\left\|(\beta I+\gamma W)(I-\rho W)^{-1}\right\| = \left\|(I-\rho W)^{-1}(\beta I+\gamma W)\right\| = \|\Pi\|$. It follows that
$$\frac{\|\beta I+\gamma W\|}{\|I-\rho W\|} \leq \|\Pi\|.$$
Given $W$ has zero main diagonal, $\|\beta I+\gamma W\|^2 = \beta^2\|I\|^2 + \gamma^2\|W\|^2 = \beta^2 N + \gamma^2\|W\|^2$. Also, $\|I-\rho W\|^2 = N + \rho^2\|W\|^2 \leq N + \rho^2 C$, for some constant $C \in \mathbb{R}$, since $W$ is bounded by assumption A2. We then have that
$$\frac{\sqrt{\beta^2 N + \gamma^2\|W\|^2}}{\sqrt{N + \rho^2 C}} \leq \|\Pi\|.$$
Since $|\rho| < 1$ by Assumption (A2), the denominator above is bounded. Hence $|\beta_k| \to \infty \Rightarrow \|\Pi(\theta_k)\| \to \infty$.
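The norm bound used in this step can be checked numerically. The following is a minimal sketch and not part of the original argument: the function name `frobenius_bound`, the randomly drawn row-normalized matrix and all parameter values are illustrative assumptions. It evaluates both sides of $\|\beta I+\gamma W\| / \|I-\rho W\| \leq \|\Pi\|$ in the Frobenius norm and lets $\beta$ grow.

```python
import numpy as np

def frobenius_bound(W, rho, beta, gamma):
    """Return (lower_bound, ||Pi||_F) for Pi = (I - rho W)^{-1} (beta I + gamma W).

    The lower bound is ||beta I + gamma W||_F / ||I - rho W||_F, where the
    numerator uses the zero main diagonal of W:
    ||beta I + gamma W||_F^2 = beta^2 N + gamma^2 ||W||_F^2.
    """
    n = W.shape[0]
    eye = np.eye(n)
    Pi = np.linalg.solve(eye - rho * W, beta * eye + gamma * W)
    numerator = np.sqrt(beta**2 * n + gamma**2 * np.linalg.norm(W, "fro") ** 2)
    denominator = np.linalg.norm(eye - rho * W, "fro")
    return numerator / denominator, np.linalg.norm(Pi, "fro")

# Hypothetical example: a random non-negative W with zero diagonal,
# rows normalized to sum to one (values chosen only for illustration).
rng = np.random.default_rng(0)
N = 6
W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

betas = [1.0, 10.0, 100.0, 1000.0]
results = [frobenius_bound(W, rho=0.5, beta=b, gamma=0.3) for b in betas]
for b, (lower, norm_pi) in zip(betas, results):
    print(f"beta={b:8.1f}  lower bound={lower:10.3f}  ||Pi||_F={norm_pi:10.3f}")
```

In this sketch the lower bound increases with $|\beta|$ while staying below $\|\Pi\|$, which is the mechanism the proof relies on to rule out unbounded $\beta$ in the pre-image of a compact set.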
We now use the fact that $\sum_j W_{ij} = 1$ to show that there is a lower bound on $\|W\|$, and so $|\gamma_k| \to \infty \Rightarrow \|\Pi(\theta_k)\| \to \infty$. To see this, note that
$$\min_{\text{s.t. } \sum_j W_{ij} = 1} \|W\|^2 \geq \min_{\text{s.t. } \sum_j W_{ij} = 1} \sum_{j=1}^{N} W_{ij}^2.$$
The Lagrangean for the minimization problem on the right-hand side is
$$\mathcal{L}(W_{i1}, \ldots, W_{i,i-1}, W_{i,i+1}, \ldots, W_{iN}; \mu) = \sum_{j=1}^{N} W_{ij}^2 - \mu\left(\sum_j W_{ij} - 1\right),$$
where $\mu$ is the Lagrangean multiplier for the normalisation constraint. The first-order conditions for this convex minimization problem are:
$$\frac{\partial \mathcal{L}}{\partial W_{ij}} = 2W_{ij} - \mu = 0, \text{ for any } j \neq i, \qquad \frac{\partial \mathcal{L}}{\partial \mu} = \sum_{j=1}^{N} W_{ij} - 1 = 0.$$
The first equation implies that $W_{ij} = \mu/2$ for $j = 1, \ldots, i-1, i+1, \ldots, N$. Using the fact that $W_{ii} = 0$, the second equation implies that $\mu = 2/(N-1)$. We have then that $W_{ij} = \frac{1}{N-1}$, $j \neq i$ and, consequently, $\|W\|^2 \geq \frac{N-1}{(N-1)^2} = \frac{1}{N-1}$. Hence, if $|\gamma_k| \to \infty$, the numerator in the lower bound for $\|\Pi\|$ above also goes to infinity. Consequently, $A$ would not be compact.

Therefore, if $A$ is compact the coordinates in $\theta \in \Pi^{-1}(A)$ corresponding to $\beta$ and $\gamma$ are also bounded. Hence, $\Pi^{-1}(A)$ is bounded (and closed). Consequently it is compact.

For a given reduced form parameter matrix $\Pi(\theta_0)$, the set $\{\theta : \Pi(\theta) = \Pi(\theta_0)\}$ is then compact. Since it is also discrete, it is finite.

The following lemmas are used in proving Theorem 2.

Lemma 1.
Assume (A1)-(A5). If $\gamma_0 = 0$, $W_0$ is such that $(W_0)_{1,2} = (W_0)_{2,1} = 1$ and $(W_0)_{ij} = 0$ otherwise, with $\rho_0 \neq 0$ and $\beta_0 \neq 0$, then $\theta_0 \in \Theta$ is identified.

Proof. Take $\theta = (W_{12}, \ldots, W_{N,N-1}, \rho, \gamma, \beta) \in \Theta$ possibly different from $\theta_0$ such that the models are observationally equivalent, so $\Pi = \Pi_0$. Then $(I-\rho_0 W_0)^{-1}(\beta_0 I+\gamma_0 W_0) = (I-\rho W)^{-1}(\beta I+\gamma W)$. Since $\gamma_0 = 0$ and $(I-\rho W)^{-1}$ and $(\beta I+\gamma W)$ commute (see the proof for Theorem 1), it follows that
$$\Pi_0 = \Pi \Leftrightarrow \beta_0(I-\rho_0 W_0)^{-1} = (\beta I+\gamma W)(I-\rho W)^{-1}$$
or, equivalently, $\beta_0(I-\rho W) = (I-\rho_0 W_0)(\beta I+\gamma W)$. Rearranging,
$$(\beta_0-\beta)I - (\gamma+\beta_0\rho)W + \rho_0\beta W_0 + \rho_0\gamma W_0 W = 0. \qquad (15)$$
We first note that $(W_0 W)_{N,N} = 0$ since $(W_0)_{N,i} = 0$ for any $1 \leq i \leq N$ and, by Assumption (A1), $(W)_{N,N} = (W_0)_{N,N} = 0$. So $\beta = \beta_0$. Taking elements $(i,j)$ such that $i \geq 3$ and $i \neq j$ in equation (15), and using the fact that $\beta = \beta_0$, we find that $-(\gamma+\beta_0\rho)(W)_{ij} = -(\gamma+\beta\rho)(W)_{ij} = 0$ for any $(i,j)$ such that $i \geq 3$ and $i \neq j$. By Assumption (A3), $\gamma+\beta\rho \neq 0$ and it follows that $(W)_{ij} = 0$ for any $(i,j)$ such that $i \geq 3$ and $i \neq j$. In fact, since $(W)_{i,i} = 0$ by Assumption (A1), we get that $(W)_{ij} = 0$ for any $(i,j)$ such that $i \geq 3$.

Using Assumption (A1) and since $\beta = \beta_0$, elements $(1,1)$ and $(2,2)$ in equation (15) imply that $\rho_0\gamma(W)_{2,1} = \rho_0\gamma(W)_{1,2} = 0$. Given that $\rho_0 \neq 0$, we get that $\gamma(W)_{2,1} = \gamma(W)_{1,2} = 0$. From element $(1,2)$ in equation (15) we find that $-(\gamma+\beta_0\rho)(W)_{1,2} + \rho_0\beta = 0$ or, equivalently, $(\rho_0 - \rho(W)_{1,2})\beta - \gamma(W)_{1,2} = 0$. Given that $\gamma(W)_{1,2} = 0$ and $\beta \neq 0$, it must be that $\rho_0 - \rho(W)_{1,2} = 0$. Making the analogous argument for element $(2,1)$, we would also obtain that $\rho_0 - \rho(W)_{2,1} = 0$.

If both $(W)_{1,2}$ and $(W)_{2,1}$ are equal to zero, using the fact that $W_{ij} = 0$ for any $(i,j)$ such that $i \geq 3$, we would then obtain that $W^2$ is equal to
$$\begin{bmatrix} 0 & 0 & (W)_{1,3} & \cdots & (W)_{1,N} \\ 0 & 0 & (W)_{2,3} & \cdots & (W)_{2,N} \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}^2 = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix},$$
which contradicts Assumption (A5). Hence $(W)_{1,2} \neq 0$ or $(W)_{2,1} \neq 0$. If $(W)_{1,2} \neq 0$, using the fact that $\gamma(W)_{1,2} = 0$, we get that $\gamma = 0$. Equivalently, if $(W)_{2,1} \neq 0$, and using the fact that $\gamma(W)_{2,1} = 0$, we again get that $\gamma = 0$. So, in either case, $\gamma = \gamma_0 = 0$.

Taking element $(1,j)$ in equation (15), with $j \geq 3$, we get that $-(\gamma+\rho\beta_0)W_{1,j} + \gamma\rho_0(W)_{2,j} = -\rho\beta_0 W_{1,j} = 0$. Similarly, element $(2,j)$, with $j \geq 3$, implies that $-(\gamma+\rho\beta_0)W_{2,j} + \gamma\rho_0(W)_{1,j} = -\rho\beta_0 W_{2,j} = 0$. Then, from $-\rho\beta_0(W)_{1,j} = -\rho\beta_0(W)_{2,j} = 0$ for $j \geq 3$, it follows that $-\rho(W)_{1,j} = -\rho(W)_{2,j} = 0$ since $\beta_0 \neq 0$.

From $\rho_0 - \rho(W)_{1,2} = 0$, if $(W)_{1,2} \neq 0$, we get that $\rho = \rho_0/(W)_{1,2} \neq 0$. Equivalently, if $(W)_{2,1} \neq 0$, we get that $\rho = \rho_0/(W)_{2,1} \neq 0$. Since $(W)_{1,2} \neq 0$ or $(W)_{2,1} \neq 0$, we obtain that $\rho \neq 0$. Then, because $-\rho(W)_{1,j} = -\rho(W)_{2,j} = 0$ for $j \geq 3$, we have that $(W)_{1,j} = (W)_{2,j} = 0$ for $j \geq 3$.

Given that $\rho_0 - \rho(W)_{1,2} = 0$, $\rho_0 - \rho(W)_{2,1} = 0$ and $\rho \neq 0$, we obtain that $(W)_{1,2} = (W)_{2,1} = \rho_0/\rho$. Since $(W)_{1,j} = 0$ for $j \neq 2$, $(W)_{2,j} = 0$ for $j \neq 1$ and $(W)_{ij} = 0$ for $i \geq 3$, by Assumption (A5) we get that $(W)_{1,2} = (W)_{2,1} = 1$ and $\rho = \rho_0$. Hence, $((W)_{1,2}, \ldots, (W)_{N,N-1}, \rho, \gamma, \beta) = ((W_0)_{1,2}, \ldots, (W_0)_{N,N-1}, \rho_0, \gamma_0, \beta_0)$.

Lemma 2. Assume (A1)-(A2) and (A4)-(A5). The image of $\Pi(\cdot)$, for $\theta \in \Theta_+$, is path-connected and, therefore, connected.

Proof. Take $\theta$ and $\theta^* \in \Theta_+$. Consider first the subvectors corresponding to the adjacency matrices $W$ and $W^*$. Without loss of generality, let $1, \ldots, N$ be ordered such that $(W^2)_{11} > (W^2)_{22}$. Consider the adjacency matrix $W_*$ corresponding to the network of directed connections $\{(1,2), (2,1)\}$ and $\{(3,4), (4,5), \ldots, (N-1,N), (N,3)\}$:
$$W_* = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}.$$
Note that $\mathrm{diag}(W_*^2) = (1, 1, 0, \ldots, 0)$ and this is an admissible adjacency matrix under assumptions (A1)-(A2) and (A4)-(A5). We first show that $W$ is path-connected to $W_*$.

Consider the path given by $W(t) = tW_* + (1-t)W$, which implies that
$$(W(t)^2)_{11} = (1-t)^2(W^2)_{11} + t^2 + (1-t)t(W_{12}+W_{21}),$$
$$(W(t)^2)_{22} = (1-t)^2(W^2)_{22} + t^2 + (1-t)t(W_{21}+W_{12}).$$
Since $(W(t)^2)_{11} - (W(t)^2)_{22} = (1-t)^2[(W^2)_{11} - (W^2)_{22}] > 0$ for $t \in [0,1)$ and $W(1) = W_*$, (A5) is satisfied for any matrix $W(t)$ such that $t \in [0,1]$. Since all rows in $W_*$ sum to one and $(W_*)_{ii} = 0$ for any $i$, it is straightforward to see that $W(t)$ also satisfies (A1) and (A4). Finally, $\sum_{j=1}^{N} |W_{ij}(t)| \leq t\sum_{j=1}^{N} |(W_*)_{ij}| + (1-t)\sum_{j=1}^{N} |W_{ij}| \leq 1$ for every $i = 1, \ldots, N$ and $W(t)$ satisfies Assumption (A2).

If $W^*$ is such that $(W^{*2})_{11} \neq (W^{*2})_{22}$, the convex combination of $W^*$ and $W_*$ is also seen to satisfy (A1)-(A2) and (A4)-(A5) and a path between $W$ and $W^*$ can be constructed via $W_*$. If, on the other hand, $(W^{*2})_{11} = (W^{*2})_{22}$, suppose without loss of generality that $(W^{*2})_{11} \neq (W^{*2})_{33}$. In this case, one can construct a path between $W^*$ and $W_{**}$, where $W_{**}$ represents the network of directed connections $\{(1,3), (3,1)\}$ and $\{(2,4), (4,5), \ldots, (N-1,N), (N,2)\}$:
$$W_{**} = \begin{bmatrix} 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ 1 & 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 1 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$
Like $W(t)$ above, this path can be seen to satisfy assumptions (A1)-(A2) and (A4)-(A5). Now note that a path can also be constructed between $W_*$ and $W_{**}$ as their convex combination also satisfies (A1)-(A2) and (A4)-(A5). For example, note that $\hat{W}(t) = tW_* + (1-t)W_{**}$ is such that $(\hat{W}(t)^2)_{11} = t^2 + (1-t)^2$ and $(\hat{W}(t)^2)_{NN} = 0$, so $(\hat{W}(t)^2)_{11} - (\hat{W}(t)^2)_{NN} > 0$ for any $t \in (0,1)$ and both $\hat{W}(0)$ and $\hat{W}(1)$ satisfy (A5). Hence, we can construct a path $W(t)$ between $W$ and $W^*$ through $W_*$ and $W_{**}$.

Furthermore, $\rho(t) = t\rho^* + (1-t)\rho$, $\beta(t) = (t\rho^*\beta^* + (1-t)\rho\beta)/(t\rho^* + (1-t)\rho)$, $\gamma(t) = t\gamma^* + (1-t)\gamma$ are such that
$$f(t) \equiv \rho(t)\beta(t) + \gamma(t) = t(\rho^*\beta^* + \gamma^*) + (1-t)(\rho\beta+\gamma) > 0,$$
since $\theta^*$ and $\theta \in \Theta_+$. (Note also that $|\rho(t)| < 1$ so Assumption (A2) is satisfied.) These facts taken together imply that
$$\theta(t) \equiv (W(t)_{12}, \ldots, W(t)_{N,N-1}, \rho(t), \gamma(t), \beta(t)) \in \Theta_+.$$
That is, $\Theta_+$ is path-connected and therefore connected. Since $\Pi(\cdot)$ is continuous on $\Theta_+$, $\Pi(\Theta_+)$ is connected.

Theorem 2
Proof.
The proof uses Corollary 1.4 in Ambrosetti and Prodi (1995, p. 46), which we reproduce here with our notation for convenience: Suppose the function $\Pi(\cdot)$ is continuous, proper and locally invertible with a connected image. Then the cardinality of $\Pi^{-1}(\overline{\Pi})$ is constant for any $\overline{\Pi}$ in the image of $\Pi(\cdot)$. (Related results can be found in Ambrosetti and Prodi (1972) and de Marco et al. (2014).)

The mapping $\Pi(\theta)$ is continuous and proper (by Corollary 1), with connected image (Lemma 2), and non-singular Jacobian at any point (as per the proof for Theorem 1), which guarantees local invertibility. Following Corollary 1.4 in Ambrosetti and Prodi (1995, p.46) reproduced above, we obtain that the cardinality of the pre-image of $\Pi(\theta)$ is finite and constant. Take $\theta_0 \in \Theta_+$ such that $\gamma_0 = 0$, $(W_0)_{1,2} = (W_0)_{2,1} = 1$ and $(W_0)_{i,j} = 0$ otherwise, with $\rho_0 \neq 0$ and $\beta_0 \neq 0$. By Lemma 1, that cardinality is one.

Corollary 3
Proof.
Since $\rho_0 \in (0,1)$ and $(W_0)_{ij} \geq 0$, $\sum_{k=1}^{\infty} \rho_0^{k-1} W_0^k$ is a non-negative matrix. By (5), the off-diagonal elements of $\Pi(\theta_0)$ are equal to the off-diagonal elements of $(\rho_0\beta_0+\gamma_0)\sum_{k=1}^{\infty} \rho_0^{k-1} W_0^k$, so the sign of those elements identifies the sign of $\rho_0\beta_0+\gamma_0$. By Theorem 2, the model is identified.

Corollary 4
Proof.
Since $W_0$ is non-negative and irreducible, there is a real eigenvalue equal to the spectral radius of $W_0$ corresponding to the unique eigenvector whose entries can be chosen to be strictly positive (i.e., all the entries share the same sign). A generic eigenvalue of $W_0$, $\lambda_0$, corresponds to an eigenvalue of $\Pi_0$ according to:
$$\lambda_\Pi = \beta_0 + (\rho_0\beta_0+\gamma_0)\frac{\lambda_0}{1-\rho_0\lambda_0}.$$
If $\lambda_0 = a_0 + b_0 i$ where $a_0, b_0 \in \mathbb{R}$ and $i = \sqrt{-1}$, then
$$\lambda_\Pi = \beta_0 + (\rho_0\beta_0+\gamma_0)\frac{a_0(1-\rho_0 a_0) - \rho_0 b_0^2}{(1-\rho_0 a_0)^2 + \rho_0^2 b_0^2} + (\rho_0\beta_0+\gamma_0)\frac{b_0}{(1-\rho_0 a_0)^2 + \rho_0^2 b_0^2}\, i.$$
If the eigenvalue $\lambda_0$ is real, $b_0 = 0$ and the corresponding $\lambda_\Pi$ eigenvalue is also real. Differentiating $Re(\lambda_\Pi)$, the real part of $\lambda_\Pi$, with respect to $Re(\lambda_0) = a_0$, we get:
$$\frac{\partial Re(\lambda_\Pi)}{\partial a_0} = \frac{(1-\rho_0 a_0)^2 - \rho_0^2 b_0^2}{\left[(1-\rho_0 a_0)^2 + \rho_0^2 b_0^2\right]^2} \times (\rho_0\beta_0+\gamma_0). \qquad (16)$$
If the eigenvalue $\lambda_0$ is real, the expression (16) becomes:
$$\frac{\partial Re(\lambda_\Pi)}{\partial a_0} = \frac{\partial \lambda_\Pi}{\partial a_0} = \frac{1}{(1-\rho_0 a_0)^2} \times (\rho_0\beta_0+\gamma_0).$$
The fraction multiplying $\rho_0\beta_0+\gamma_0$ is positive. If $\rho_0\beta_0+\gamma_0 < 0$, the real eigenvalues of $\Pi_0$ are decreasing in the real eigenvalues of $W_0$. Consequently, the eigenvector corresponding to the largest (real) eigenvalue of $W_0$ will be associated with the smallest real eigenvalue of $\Pi_0$. If, on the other hand, $\rho_0\beta_0+\gamma_0 > 0$, the eigenvector corresponding to the largest real eigenvalue of $W_0$ will correspond to the largest real eigenvalue of $\Pi_0$. Since that eigenvector is the unique eigenvector that can be chosen to have strictly positive entries, the sign of $\rho_0\beta_0+\gamma_0$ is identified by the $\lambda_\Pi$ eigenvalue it is associated with and whether it is the largest or smallest real eigenvalue. By Theorem 2, the model is identified.

If there is only one real eigenvalue, note that the denominator in the fraction in (16) is positive. The minimum value of the numerator subject to $|\lambda_0|^2 = a_0^2 + b_0^2 \leq 1$ is given by
$$\min_{a_0, b_0}\; (1-\rho_0 a_0)^2 - \rho_0^2 b_0^2 \quad \text{s.t. } a_0^2 + b_0^2 \leq 1.$$
The Lagrangean for this minimization problem is given by:
$$\mathcal{L}(a_0, b_0; \mu) = (1-\rho_0 a_0)^2 - \rho_0^2 b_0^2 + \mu(a_0^2 + b_0^2 - 1),$$
where $\mu$ is the Lagrange multiplier associated with the constraint $a_0^2 + b_0^2 \leq 1$. The Kuhn-Tucker necessary conditions for the solution $(a_0^*, b_0^*, \mu^*)$ of this problem are given by:
$$(\partial a_0:)\;\; \rho_0(1-\rho_0 a_0^*) - \mu^* a_0^* = 0,$$
$$(\partial b_0:)\;\; (\rho_0^2 - \mu^*)\, b_0^* = 0,$$
$$\mu^*(a_0^{*2} + b_0^{*2} - 1) = 0, \qquad a_0^{*2} + b_0^{*2} \leq 1 \text{ and } \mu^* \geq 0.$$
Let $\rho_0 \neq 0$. (Otherwise, the objective function above is equal to one irrespective of $a_0$ or $b_0$ and the partial derivative is $\rho_0\beta_0+\gamma_0$.) If $\mu^* = 0$, $\partial b_0$ implies that $b_0^* = 0$. Then $\partial a_0$ would have $a_0^* = \rho_0^{-1}$, which violates $a_0^{*2} + b_0^{*2} \leq 1$.

Hence, a solution should have $\mu^* > 0$. In this case, there are two possibilities: $b_0^* = 0$ or $b_0^* \neq 0$. If $b_0^* \neq 0$, condition $\partial b_0$ implies that $\mu^* = \rho_0^2$ and $\partial a_0$ then gives $a_0^* = (2\rho_0)^{-1}$. Because the constraint is binding, $b_0^{*2} = 1 - (4\rho_0^2)^{-1}$. In this case, $a_0^{*2} \leq 1$ and $b_0^{*2} \geq 0$ requires that $|\rho_0| \geq 1/2$. The value of the minimised objective function in this case is $1/2 - \rho_0^2$. This is positive if $|\rho_0| < \sqrt{2}/2$.

The other possibility is to have $b_0 = 0$. Because the constraint is binding, $a_0^2 = 1$ and the objective function takes the value $(1-\rho_0)^2 > 0$. Since $(1-\rho_0)^2 - 1/2 + \rho_0^2 = 2\rho_0^2 - 2\rho_0 + 1/2 \geq 0$, this solution is dominated by the previous one when $|\rho_0| \geq 1/2$.

Consequently, the fraction multiplying $\rho_0\beta_0+\gamma_0$ is non-negative and it can be ascertained that
$$\mathrm{sgn}\left[\frac{\partial Re(\lambda_\Pi)}{\partial a_0}\right] = \mathrm{sgn}[\rho_0\beta_0+\gamma_0]$$
as long as $|\rho_0| < \sqrt{2}/2$.

If $\rho_0\beta_0+\gamma_0 < 0$, the real part of the eigenvalues of $\Pi_0$ is decreasing in the real part of the eigenvalues of $W_0$. Consequently, the eigenvector corresponding to the eigenvalue of $W_0$ with the largest real part will correspond to the eigenvalue of $\Pi_0$ with the smallest real part. If, on the other hand, $\rho_0\beta_0+\gamma_0 > 0$, the eigenvector corresponding to the eigenvalue of $W_0$ with the largest real part will correspond to the eigenvalue of $\Pi_0$ with the largest real part. Since that eigenvector is the unique eigenvector that can be chosen to have strictly positive entries, the sign of $\rho_0\beta_0+\gamma_0$ is identified by the $\lambda_\Pi$ eigenvalue it is associated with.

By Theorem 2, the model is identified.

Proposition 1
Proof.
From equation (6) we observed that $\Pi_0 v_j = \lambda_{\Pi,j} v_j$, where $v_j$ is an eigenvector of both $W_0$ and $\Pi_0$ with corresponding eigenvalue $\lambda_{\Pi,j} = \frac{\beta_0 + \gamma_0\lambda_{0,j}}{1 - \rho_0\lambda_{0,j}}$. Defining $c$ as the row-sum of $\Pi_0$, we also have that
$$\begin{aligned}
\tilde{\Pi}_0(I-H)v_j &= (I-H)\Pi_0(I-H)v_j \\
&= (I-H)\Pi_0 v_j - (I-H)\Pi_0 H v_j \\
&= \lambda_{\Pi,j}(I-H)v_j - (I-H)cHv_j \\
&= \lambda_{\Pi,j}(I-H)v_j - (H-H^2)cv_j \\
&= \lambda_{\Pi,j}(I-H)v_j - (H-H)cv_j \\
&= \lambda_{\Pi,j}(I-H)v_j,
\end{aligned}$$
where the third equality obtains from $\Pi_0 H = cH$ and the fifth equality holds since $H$ is idempotent. So $\tilde{\Pi}_0$ and $\Pi_0$ have common eigenvalues, with corresponding eigenvector $\tilde{v}_j = v_j - \bar{v}_j\iota$ for $\tilde{\Pi}_0$, where $\bar{v}_j = \frac{1}{N}\iota' v_j$, $j = 1, \ldots, N$. Since $\lambda_{\Pi,j}$ and $\tilde{v}_j$ are observed from $\tilde{\Pi}_0$, identification of $\Pi_0$ is equivalent to identification of $\bar{v}_j$ (given diagonalizability).

To establish identification of $\bar{v}_j$, note that $W_0(\tilde{v}_j + \bar{v}_j\iota) = \lambda_{0,j}(\tilde{v}_j + \bar{v}_j\iota)$ since $v_j$ is an eigenvector of $W_0$. Consider an alternative constant $\bar{v}_j^* \neq \bar{v}_j$ that satisfies the previous equation. Then $W_0\iota(\bar{v}_j - \bar{v}_j^*) = \lambda_{0,j}\iota(\bar{v}_j - \bar{v}_j^*)$. Since $W_0\iota = \iota$, $\bar{v}_j$ must satisfy $(1-\lambda_{0,j})(\bar{v}_j - \bar{v}_j^*) = 0$. For $j = 2, \ldots, N$, $|\lambda_{0,j}| < 1$. So $\bar{v}_j = \bar{v}_j^*$, and $\bar{v}_j$ is therefore identified. For $j = 1$, it is known that $\lambda_{0,1} = 1$ with eigenvector $v_1 = \iota$.

Proposition 2
Proof.
Under row-sum normalization and $|\rho_0| < 1$, $(I-\rho_0 W_0)^{-1}\iota = \iota + \rho_0 W_0\iota + \rho_0^2 W_0^2\iota + \cdots = \iota + \rho_0\iota + \rho_0^2\iota + \cdots = \frac{\iota}{1-\rho_0}$, so $\Pi_0 \equiv (I-\rho_0 W_0)^{-1}$ has constant row-sums. If row-sum normalization fails, $\Pi_0$ may not have constant row-sums. Define $h_{ij}$ as the $(ij)$-th element of $\tilde{H}$. The first row of the system $(I-\tilde{H})(I-\rho_0 W_0)^{-1}\iota = (I-\tilde{H})r_{W_0} = 0$ is
$$h_{11}^* r_{W_0,1} - h_{12} r_{W_0,2} - \cdots - h_{1N} r_{W_0,N} = 0,$$
where $h_{11}^* = 1 - h_{11}$ and $r_{W_0,l}$ is the $l$-th element of $r_{W_0}$. If there are $N$ possible $W_0$, $W_0^{(1)}, \ldots, W_0^{(N)}$, such that $[r_{W_0^{(1)}} \cdots r_{W_0^{(N)}}]$ has rank $N$, then $h_{11}^* = h_{12} = \cdots = h_{1N} = 0$. Since the same reasoning applies to all rows, $\tilde{H}$ is the trivial transformation $\tilde{H} = I$.

B Estimation
B.1 Sparsity of $W_0$ and $\Pi_0$

Define $\tilde{M}$ as the number of non-zero elements of $\Pi_0$. We say that $\Pi_0$ is sparse if $\tilde{M} \ll NT$. Denote the number of connected pairs in $W_0$ via paths of any length as $\tilde{m}_c$. Analogously, we say that $W_0$ is "sparse connected" if $\tilde{m}_c \ll NT$. We show that sparsity of $\Pi_0$ is related to sparse connectedness of $W_0$.

Proposition 3. $\Pi_0$ is sparse if, and only if, the number of unconnected pairs in $W_0$ is large.

Proof. For $|\rho_0| < 1$, we have that
$$\Pi_0 = \beta_0 I + (\rho_0\beta_0+\gamma_0)\sum_{k=1}^{\infty} \rho_0^{k-1} W_0^k.$$
Given that $\rho_0\beta_0+\gamma_0 \neq 0$, it follows directly that $[\Pi_0]_{ij} = 0$ if, and only if, there are no paths between $i$ and $j$ in $W_0$. Therefore, sparsity of $\Pi_0$ translates into a large number of $(i,j)$ unconnected pairs in $W_0$.

On the one hand, sparsity does not imply sparse connectedness. A circular graph is clearly sparse, but all nodes connect with all other nodes through a path of length at most $N$. On the other hand, sparse connectedness implies sparsity and is therefore a stronger requirement. To see this, take any arbitrary network $G$ with $\tilde{m}(G)$ non-zero elements and $\tilde{m}_c(G)$ connected pairs. Now consider the operation of “completing” $G$: for every connected $(i,j)$ pair, add a direct link between $(i,j)$ if non-existent in $G$ and denote the resulting matrix as $C(G)$. It is clear that $\tilde{m}(G) \leq \tilde{m}(C(G))$. Yet, $\tilde{m}(C(G)) = \tilde{m}_c(G)$.

B.2 Adaptive Elastic Net
B.2.1 Implementation and Initial Conditions
To make our procedure robust to the choice of initial condition, we use the particle swarm algorithm. This is an optimization algorithm designed to find global optima, and it does not depend on the choice of initial conditions. It works as follows. The procedure starts from a large number of initial conditions covering the parameter space, known as “particles” (Kennedy and Eberhart, 1995). Each particle is iterated independently until convergence. The algorithm returns the optimum calculated across particles.

To ensure compliance with row-sum normalization for each row $i$ of $W$, one non-zero parameter $W_{i,j^*}$ is set to $1 - \sum_{j=1, j\neq j^*}^{N} W_{i,j}$. This avoids making use of constrained optimization routines. We also impose the restriction that $\rho \geq 0$ and $W_{ij} \geq 0$ by minimizing the objective function with respect to $\tilde{\rho}$ with $\rho = \tilde{\rho}^2$ and $\tilde{W}_{ij}$ with $W_{ij} = \tilde{W}_{ij}^2$.

Optimization of (8) starts from the initial condition selected by the particle swarm algorithm and is minimized with respect to the parameters that were neither set to zero nor were chosen to ensure row-sum normalization. Estimates from the first stage are subsequently used to adjust the penalization, as in the Adaptive Elastic Net GMM objective function (10). The steps above are repeated for different combinations of $p = (p_1, p_1^*, p_2)$, selected on a grid. The final estimate is the one that minimizes the BIC criterion.

B.3 OLS
For the purpose of estimation, it is convenient to write the model in the stacked form. Let $x = [x_1, \ldots, x_T]'$ be the $T \times N$ matrix of explanatory variables, $y_i = [y_{i1}, \ldots, y_{iT}]'$ be the $T \times 1$ vector of response variables for individual $i$ and $\pi_i = [\pi_{i1}, \ldots, \pi_{iN}]'$, where $\pi_{ij}$ is a short notation for the $(i,j)$-th element of $\Pi_0$. The concise model is then
$$y_i = x\pi_i + v_i \qquad (17)$$
for each $i = 1, \ldots, N$, where also $v_i = [v_{i1}, \ldots, v_{iT}]'$. Model (17) can then be estimated equation-by-equation. Denote $\pi = [\pi_1', \ldots, \pi_N']'$. Stacking the full set of $N$ equations,
$$y = X\pi + v \qquad (18)$$
where $y = [y_1', \ldots, y_N']'$, $X = I_N \otimes x$, $\pi = \mathrm{vec}(\Pi')$, and $v = [v_1', \ldots, v_N']'$.

Footnotes to Section B.2.1:

We set Caner and Zhang’s (2014) suggestion for the initial condition as one of those particles, with minor modifications. The authors suggest calculating the absolute value of the derivative of the unpenalized GMM objective function evaluated at zero, $\nabla_W$, and setting the parameters for which it is smaller than $p_1$ to zero. The rationale is that if the GMM objective function is invariant with respect to certain parameters, the Elastic Net problem achieves a corner solution (where parameters are set to zero). In our case, allowing only for positive interactions, we set to zero the elements such that $-\nabla_W \leq p_1$. All other elements of $W$ gain equal weights such that row-sum normalization is respected. The derivative $\nabla_W$ is mechanically zero if $\rho = \gamma = 0$. So we set $\rho$ to a fixed value, given that the parameter space is bounded and $\rho \in [0,1)$. The other parameters that enter the derivative are $\hat{\beta}$, estimated from a regression of $y$ on $x$ with the full set of fixed effects, and $\gamma = 0$. We also implemented an additional five particles. Particle 2: like Particle 1 but with size proportional to the magnitude of the derivatives conditional on $-\nabla_W$ being greater than $p_1$; Particle 3: sets to non-zero all positive elements of $-\nabla_W$ with equal weights; Particle 4: selects a top percentage of the highest values of $-\nabla_W$, sets all others to zero, and the non-zero elements gain equal weights; Particle 5: $W$ obtained from the Lasso regression of $y_t$ on the $y_t$ of others with penalization $p_1$; Particle 6: $W$ obtained from the Lasso regression of $y_t$ on the $x_t$ of others with penalization $p_1$. In all cases, weights are rescaled by row-specific constants such that row-sum normalization is complied with.

The remaining particles are uniformly randomly selected by the built-in MATLAB particle swarm algorithm.

At each row, we pick the $j^*$ closest to the main diagonal of $W$.

Note that the Elastic Net penalty $p_1\sum|W_{i,j}|$ is invariant with respect to choices of $W$ if row-sum normalization is imposed. Yet, the penalty affects the initial selection of arguments in which $W_{i,j}$ is restricted to zero if the derivative of the objective function is smaller than $p_1$ in absolute value.

If the number of individuals in the network $N$ is fixed and much smaller than the number of data points available, $N^2 \ll NT$, equation (18) can be estimated via ordinary least squares (OLS). Under suitable regularity conditions, the OLS estimator $\hat{\pi} = (X'X)^{-1}X'y$ is asymptotically normally distributed,
$$\sqrt{NT}(\hat{\pi} - \pi) \xrightarrow{d} \mathcal{N}\left(0, Q^{-1}\Sigma Q^{-1}\right),$$
where $Q_T \equiv \frac{1}{NT}X'X$, $Q \equiv \operatorname{plim}_{T\to\infty} Q_T$, $\Sigma_T \equiv \frac{1}{NT}X'vv'X$ and $\Sigma \equiv \operatorname{plim}_{T\to\infty} \Sigma_T$. The proof is standard and omitted here. As noted above, in typical applications it is customary to row-sum normalize matrix $W$. If no individual is isolated, one obtains that, by equation (5),
$$\Pi_0\iota_N = \beta_0\iota + (\rho_0\beta_0+\gamma_0)\sum_{k=1}^{\infty} \rho_0^{k-1} W_0^k\iota = \frac{\beta_0+\gamma_0}{1-\rho_0}\iota \qquad (19)$$
where $\iota_N$ is the $N$-length vector of ones.
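Equations (17)-(19) can be illustrated with a small simulation. The sketch below is an illustration only and not the authors' code: the helper names, the values of $N$, $T$, $\rho$, $\beta$, $\gamma$, and the Gaussian regressors and errors are all assumptions made for the example. It builds a row-normalized $W$, checks that the implied $\Pi$ has constant row sums equal to $(\beta+\gamma)/(1-\rho)$, and confirms that equation-by-equation OLS on (17) coincides with OLS on the stacked system (18).

```python
import numpy as np

def reduced_form(W, rho, beta, gamma):
    """Pi = (I - rho W)^{-1} (beta I + gamma W)."""
    n = W.shape[0]
    eye = np.eye(n)
    return np.linalg.solve(eye - rho * W, beta * eye + gamma * W)

def ols_equation_by_equation(x, y):
    """x is T x N, column i of y is y_i; returns the N x N estimate of Pi.

    lstsq solves all N equations y_i = x pi_i + v_i at once; column i of the
    solution is pi_i, i.e. row i of Pi, hence the transpose.
    """
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coef.T

# Hypothetical design (values chosen only for illustration).
rng = np.random.default_rng(0)
N, T = 5, 400
rho, beta, gamma = 0.3, 0.4, 0.5

W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)          # row-sum normalization

Pi = reduced_form(W, rho, beta, gamma)
row_sums = Pi.sum(axis=1)                  # equation (19): constant across rows

x = rng.normal(size=(T, N))
v = 0.1 * rng.normal(size=(T, N))
y = x @ Pi.T + v                           # column i is y_i = x pi_i + v_i

Pi_hat = ols_equation_by_equation(x, y)

# Stacked form (18): y = X pi + v with X = I_N kron x and pi = vec(Pi').
X = np.kron(np.eye(N), x)
y_stacked = y.T.reshape(-1)                # [y_1', ..., y_N']'
pi_stacked, *_ = np.linalg.lstsq(X, y_stacked, rcond=None)

print("row sums of Pi:", np.round(row_sums, 6))
print("(beta + gamma) / (1 - rho):", (beta + gamma) / (1 - rho))
```

Because $X = I_N \otimes x$ is block diagonal, the stacked regression decouples into the $N$ separate regressions, which is why the two estimates agree up to numerical precision.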
The last equality follows from the observation that, under row-normalization of $W$, $W^k \iota = W \iota = \iota$ for $k > 0$. Equation (19) implies that $\Pi$ has constant row-sums, so row-sum normalization is, in principle, testable. This suggests a simple Wald statistic applied to the estimates of $\pi$. Under the null hypothesis,

$$\sqrt{NT}\, R\hat\pi \overset{d}{\longrightarrow} N\left(0,\; R Q^{-1} \Sigma Q^{-1} R'\right)$$

where $R = [\, I_{N-1} \otimes \iota_N' \; ; \; -\iota_{N-1} \otimes \iota_N' \,]$, so that $R\pi$ collects the differences between each of the first $N-1$ row-sums of $\Pi$ and its $N$-th row-sum. The Wald statistic is

$$\mathcal{W} = NT\, (R\hat\pi)' \left(R Q^{-1} \Sigma Q^{-1} R'\right)^{-1} (R\hat\pi) \sim \chi^2_{N-1}$$

which is a convenient expression for testing row-sum normalization of $W$. We also note that the asymptotic distribution of $\hat\theta$ can be immediately obtained by the Delta Method,

$$\sqrt{NT}\,(\hat\theta - \theta) \overset{d}{\longrightarrow} N\left(0,\; \nabla_\theta' Q^{-1} \Sigma Q^{-1} \nabla_\theta\right)$$

where $\nabla_\theta$ is the gradient of $\hat\theta$ with respect to $\hat\pi$. The derivation of the Wald statistic for testing row-sum normalization and the asymptotic distribution of $\hat\theta$ do not depend on the OLS implementation, and can be easily adjusted for any estimator for which the asymptotic distribution is known.

C Simulations
C.1 Set-Up
The simulations are based on two stylized random network structures and two real-world networks. These networks vary in their size, complexity, and aggregate and node-level features. All four networks are also sparse. The two stylized networks considered are:

(i) Erdos-Renyi network: we randomly pick exactly one element in each row of $W$ and set that element to . This is a random graph with in-degree equal to for every individual (Erdos and Renyi, 1960). Such a network could be observed in practice if connections are formed independently of one another. With $N = 30$, the resulting density of links is . %.

(ii) Political party network: there are two parties, each with a party leader. The leader directly affects the behavior of half the party members. We assume that one party has twice the number of members as the other. More specifically, we assume individuals i = 1, ..., N are affiliated to Party A and are led by individual ; individuals i = N + 1, ..., N are affiliated to Party B and are led by individual N + 1. This difference in party size allows us to evaluate our ability to recover and identify central leaders, even in the smaller party. To test the procedure further, we add one random link per row to represent ties that are not determined by links to the party leader. We simulate this network for various choices of $N$. If $N$ is not a multiple of three, we round N to the nearest integer. With $N = 30$, this network has a density of . %.

(iii) Coleman's (1964) high school friendship network survey: in / , students in a small high school in Illinois were asked to name "fellows that they go around with most often." A link was considered if the student nominated a peer in either survey wave. The full network has $N = 73$ nodes, of which 70 are non-isolated and so have at least one link to another student. On average, students named just over five friendship peers. This network has density . %. Furthermore, the in-degree distribution shows that most individuals received a small number of links, while a small number received many peer nominations.

(iv) Banerjee et al.'s (2013) village network survey: these authors conducted a census of households in villages in rural Karnataka, India, and the survey questions include several about relationships with other households in the village. To begin with, we use social ties based on family relations (later examining insurance networks). We focus on village that is comprised of $N = 77$ households, and so is similar in size to network (iii). In this village there are 65 non-isolated households, with at least one family link to another household. This network has density . %.

For the stylized networks (i) and (ii), we first assess the performance of the estimator for a fixed network size, $N = 30$. We later show how performance varies with alternative network sizes. We simulate the real-world networks (iii) and (iv) using non-isolated nodes in each (so $N = 70$ and $N = 65$ respectively). As in Bramoullé et al. (2009), we exclude isolated nodes because they do not conform with row-sum normalization.

Our result identifies entries in $W$ and so naturally recovers links of varying strength. Data limitations often force researchers to postulate some ties to be weaker than others (say, based on interaction frequency). This is in sharp contrast to our approach, which identifies the continuous strength of ties, $W_{0,ij}$, where $W_{0,ij} > 0$ implies node $j$ influences node $i$.

To establish the performance of the estimator in capturing variation in link strength, we proceed as follows for each network. First, for each node we randomly assign one of their links to have value W_{0,ij} = . . As the underlying data generating process is assumed to allow for common time effects ($\alpha_t$), we then set the weight on all other links from the node to be equal and such that row-sum normalization (A4') is complied with.
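The weighting scheme just described can be sketched as follows. This is a minimal illustration rather than the authors' simulation code: the function names are hypothetical, and the strong-link weight of 0.7 is an assumed placeholder because the value used in the paper is not recoverable from this text. One randomly chosen link per row receives the strong weight, the remaining links in that row share the rest equally, and a row with a single link receives weight one so that row-sum normalization holds.

```python
import random

def one_link_per_row(n, rng):
    """Adjacency matrix with exactly one off-diagonal link in each row."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        A[i][j] = 1
    return A

def assign_weights(A, strong, rng):
    """Row-normalized weights: one random link per row gets `strong`,
    the other links in the row split 1 - strong equally.
    A row with a single link gets weight 1; a row with no links stays zero."""
    W = []
    for i, row in enumerate(A):
        links = [j for j, a in enumerate(row) if a and j != i]
        w = [0.0] * len(row)
        if len(links) == 1:
            w[links[0]] = 1.0
        elif links:
            chosen = rng.choice(links)
            share = (1.0 - strong) / (len(links) - 1)
            for j in links:
                w[j] = strong if j == chosen else share
        W.append(w)
    return W

rng = random.Random(1)
STRONG = 0.7                      # assumed value, for illustration only

# One link per row: every tie ends up with weight one.
A_single = one_link_per_row(5, rng)
W_single = assign_weights(A_single, STRONG, rng)

# Hand-written adjacency with rows of two, three and one links.
A_mixed = [[0, 1, 1, 0],
           [1, 0, 1, 1],
           [0, 0, 0, 1],
           [1, 0, 0, 0]]
W_mixed = assign_weights(A_mixed, STRONG, rng)
for row in W_mixed:
    print(row)
```

With more links per row, the equal shares `(1 - strong) / (len(links) - 1)` shrink, which is the mechanism by which larger networks produce weaker ties under row-sum normalization.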
As we consider larger networks, we typically expect them to have more non-zero entries in each row of $W$, but row-sum normalization means that each weaker link will be of lower value. This works against the detection of weaker links using estimation methods involving penalization, because they shrink small parameter estimates to zero. Finally, to aid exposition, we set a threshold value for link strength to distinguish 'strong' and 'weak' links. A strong (weak) link is defined as one for which W_{0,ij} > (≤) . .

Summary statistics for each network are presented in Panel A of Table A1. Following Jackson et al. (2017), we consider the following network-wide statistics: number of edges, number of strong and weak edges, number of reciprocated edges, clustering coefficient, number of components, and the size of the maximal component. In addition, we report the standard deviation calculated across elements of the diagonal of W . If this is zero, then the diagonal of W is either zero or proportional to the vector of ones, and Assumption A5 would not be satisfied. We can see that for each case this statistic is well above zero.

Following Jackson et al. (2017), we consider the following node-level statistics: in- and out-degree distribution (mean and standard deviation), and the three most central individuals. The four networks differ in their size, complexity, and the relative importance of strong and weak ties. For example, the Erdos-Renyi network only has strong ties, while the political party network has twice as many strong as weak ties. For the real-world networks, the mean out-degree distributions are higher so the majority of ties are weak, with the high school network having around % of its edges being weak ties.

Footnote: For example, if in a given row of $W$ there are two links, one will be randomly selected to be set to . , and the other set to . . If there are three links one is set to . and the other two each have weight . to maintain row-sum normalization, and so on. For the Erdos-Renyi network, there are thus only strong ties as each node has only one link to another node.

Footnote: Caner and Zhang (2014) state that "local to zero coefficients should be larger than T − to be differentiated from zero."

Panel data for each of the four simulations is generated as

$$y_t = (I - \rho_0 W_0)^{-1} \left(x_t \beta_0 + W_0 x_t \gamma_0 + \alpha_t \iota + \alpha^* + \epsilon_t\right),$$

where $\alpha_t$ is a (scalar) time effect and $\alpha^*$ is an $N \times 1$ vector of fixed effects, drawn respectively from N(1, ) and $N(\iota, I_{N \times N})$ distributions. We consider T = { , , , , , , , , }. The true parameters are set to ρ = . , β = . and γ = . (thus satisfying Assumption A3). The exogenous variable ($x_t$) and error term ($\epsilon_t$) are simulated as standard Gaussian, both generated from $N(0_N, I_{N \times N})$ distributions. This is similar to variance terms set in other papers, e.g., Lee (2004). We later conduct a series of robustness checks to evaluate the sensitivity of the simulations to alternative parameter choices, and the presence of common- and individual-level shocks.

For each combination of parameters, we conduct , simulation runs. On the initial runs, we choose penalization parameters $p$ that minimize the BIC criterion on a grid. This is computationally intensive because it requires running the optimization procedure described in Section 3.1 as many times as the number of points in the grid for $p$. To reduce the computational burden, we do so only in the initial runs and consider these simulation runs as a calibration of $p$.
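The data generating process described above can be sketched as follows. This is an illustration only: the parameter values, the panel length, the unit variance assumed for the time effect and the ring network are all hypothetical choices, and the function names are not taken from the authors' code. Rather than inverting $I - \rho W$ explicitly, the sketch solves $(I - \rho W) y_t = r_t$ by the fixed-point iteration $y \leftarrow r_t + \rho W y$, which converges when $|\rho| < 1$ and $W$ is non-negative and row-normalized.

```python
import random

def matvec(W, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def solve_interaction(W, rho, rhs, iters=500):
    """Solve (I - rho*W) y = rhs by iterating y <- rhs + rho*W*y."""
    y = rhs[:]
    for _ in range(iters):
        Wy = matvec(W, y)
        y = [r + rho * wy for r, wy in zip(rhs, Wy)]
    return y

def simulate_panel(W, T, rho, beta, gamma, rng):
    """Draw x_t and y_t for t = 1..T from the social interactions model."""
    n = len(W)
    alpha_star = [rng.gauss(1.0, 1.0) for _ in range(n)]   # node fixed effects
    xs, ys = [], []
    for _ in range(T):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]        # exogenous variable
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]      # idiosyncratic shock
        alpha_t = rng.gauss(1.0, 1.0)                      # time effect (variance assumed)
        Wx = matvec(W, x)
        rhs = [beta * x[i] + gamma * Wx[i] + alpha_t + alpha_star[i] + eps[i]
               for i in range(n)]
        xs.append(x)
        ys.append(solve_interaction(W, rho, rhs))
    return xs, ys

# Hypothetical example: a four-node ring where node i is influenced by node i+1.
n = 4
W = [[1.0 if j == (i + 1) % n else 0.0 for j in range(n)] for i in range(n)]
rng = random.Random(0)
xs, ys = simulate_panel(W, T=5, rho=0.3, beta=0.4, gamma=0.5, rng=rng)

# Check the solver on a known right-hand side.
rhs = [1.0, 2.0, 3.0, 4.0]
y = solve_interaction(W, 0.3, rhs)
residual = [yi - 0.3 * wy - r for yi, wy, r in zip(y, matvec(W, y), rhs)]
print(max(abs(e) for e in residual))
```

The residual printed at the end should be numerically zero, confirming that the iteration reproduces the solution of the linear system.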
For the remaining iterations, the penalization parameter $p$ is held fixed at the median $p$ computed over the calibration runs. This can only worsen the performance of the estimator, since a sub-optimal $p$ is chosen for the majority of the iterations.

C.2 Results
We evaluate the procedure over varying panel lengths (starting from short panels with $T = 5$), using the following metrics. Given our core contribution is to identify the social interactions matrix, we first examine the proportion of true zero entries in $W$ estimated as zeros, and the proportion of true non-zero entries estimated as non-zeros. A global perspective of the proximity between the true and estimated network can be inferred from the average absolute distance between their elements. This is the mean absolute deviation of $\hat W$ and $\hat\Pi$ relative to their true values, defined as

$$MAD(\hat W) = \frac{1}{N(N-1)} \sum_{i,j,\, i \neq j} |\hat W_{ij} - W_{ij,0}| \quad \text{and} \quad MAD(\hat\Pi) = \frac{1}{N(N-1)} \sum_{i,j,\, i \neq j} |\hat\Pi_{ij} - \Pi_{ij,0}|.$$

The closer these metrics are to zero, the more elements of the true matrix are correctly estimated. Finally, we evaluate the performance of the procedure using averaged estimates of the endogenous and exogenous social effect parameters, $\hat\rho$ and $\hat\gamma$. In keeping with the estimation strategy in our empirical application, we report 'post-Elastic Net' estimates obtained after having estimated the social interactions matrix by the Elastic Net GMM procedure. We use peers-of-peers' covariates from the estimated matrix as instrumental variables.

Footnote: In our simulations, we set the penalization grid to p = [0 , . , . , . , p ∗ = [0 , . , . , . and p = [0 , . , . , . , resulting in $4^3 = 64$ points per run.

Each Panel in Figure A1 shows a different metric as we vary $T$ for each simulated network. Panel A shows that for each network, the proportion of zero entries in $W$ correctly estimated as zeros is above % even when exploiting a small number of time periods ($T = 5$). The proportion approaches % as $T$ grows. Panel B shows the proportion of non-zero entries estimated as non-zeros is also high for small $T$. It is above % from $T = 5$ for the Erdos-Renyi network, being at least % across networks for $T = 25$, and increasing as $T$ grows.
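The recovery metrics defined above can be computed along the following lines. This is a hedged sketch, not the authors' code: the function name, the tolerance used to decide whether an entry counts as zero, and the choice to exclude the diagonal from the zero and non-zero shares (mirroring the $i \neq j$ restriction in the MAD) are assumptions made for illustration, as are the two small matrices.

```python
def recovery_metrics(W_hat, W_true, tol=1e-8):
    """Share of true zeros estimated as zero, share of true non-zeros
    estimated as non-zero, and mean absolute deviation, all off-diagonal."""
    n = len(W_true)
    true_zero = hit_zero = true_nonzero = hit_nonzero = 0
    abs_dev = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                      # diagonal excluded by assumption
            est, tru = W_hat[i][j], W_true[i][j]
            abs_dev += abs(est - tru)
            if abs(tru) <= tol:
                true_zero += 1
                hit_zero += abs(est) <= tol
            else:
                true_nonzero += 1
                hit_nonzero += abs(est) > tol
    return {
        "share_zeros_recovered": hit_zero / true_zero if true_zero else float("nan"),
        "share_nonzeros_recovered": hit_nonzero / true_nonzero if true_nonzero else float("nan"),
        "mad": abs_dev / (n * (n - 1)),
    }

# Hypothetical three-node example.
W_true = [[0.0, 1.0, 0.0],
          [0.5, 0.0, 0.5],
          [0.0, 1.0, 0.0]]
W_hat = [[0.0, 0.9, 0.1],
         [0.6, 0.0, 0.4],
         [0.0, 1.0, 0.0]]
metrics = recovery_metrics(W_hat, W_true)
print(metrics)
```

In this example one of the two true zeros is estimated as non-zero, all four true links are detected, and the four deviations of 0.1 are averaged over the six off-diagonal entries.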
Panels C and D show that the mean absolute deviation of $\hat W$ and $\hat\Pi$ falls quickly with $T$ and is close to zero for large sample sizes. Finally, Panels E and F show that biases in the endogenous and exogenous social effects parameters, $\hat\rho$ and $\hat\gamma$, also fall quickly in $T$ (we do not report the bias in $\hat\beta$ since it is close to zero for all $T$). The fact that biases are not zero is as expected for small $T$, being analogous to well-known results for autoregressive time series models.

Figure A2 provides a visual representation of the simulated and actual networks under $T = 100$ time periods. The network size is set to $N = 30$ in the two stylized networks, $N = 70$ for the high school network, and $N = 65$ for the village household network. In comparing the simulated and true network, Figure A2 distinguishes between three types of edges: kept edges, added edges and removed edges. Kept edges are depicted in blue: these links are estimated as non-zero in at least % of the iterations and are also non-zero in the true network. Added edges are depicted in green: these links are estimated as non-zero in at least % of the iterations but the edge is zero in the true network. Removed edges are depicted in red: these links are estimated as zero in at least % of the iterations but are non-zero in the true network. Figure A2 further distinguishes between strong and weak links: strong links are shown as solid edges (W_{0,ij} > . ), and weak links are shown as dashed edges.

Consider first Panel A of Figure A2, comparing the simulated and true Erdos-Renyi network. All zero and all non-zero links are correctly estimated. All links are thus recovered and no edges are added to the true network (all edges are in blue). For the political party network, Panel B shows that all strong edges are correctly estimated (it also highlights the party leader nodes). However, around half the weak edges are recovered (blue dashed edges) with the others being missed (red dashed edges).
As discussed above, this is not surprising given that shrinkage estimators force small non-zero parameters to zero. Hence, larger $T$ is needed to achieve similar performance as in the other simulated networks in terms of detecting weak links. Again, we never estimate any added edges (no edges are green).

For the more complex and larger real-world networks, Panel C shows that in the high school network, strong edges are all recovered. However, around half the weak edges are missing (red dashed edges) and there are a relatively small number of added edges (green edges): these amount to edges, or approximately . % of the , zero entries in the true high-school network. A similar pattern of results is seen in the village network in Panel D: strong edges are all recovered, and here the majority of weak edges are also recovered. A relatively small share of overall edges are added or missed.

Panel B of Table A1 compares the network- and node-level statistics calculated from the recovered social interactions matrix $\hat W$ to those in Panel A from the true interactions matrix $W$. As Figure A2 showed, the random Erdos-Renyi network is perfectly recovered. For the political party network, the number of recovered edges is slightly lower than in the true network ( vs. ). This is driven by weak edges: while all the strong edges are recovered ( out of ), not all the weak ones are ( vs. ). On node-level statistics, the means of the in- and out-degree distributions are slightly lower in the recovered network, the clustering coefficient is exactly recovered, and all three nodes with the highest out-degree are correctly captured (nodes , and ), which includes both party leaders (individuals and ).

Performance in the two real-world networks is also encouraging. In the high school network, all strong edges are correctly recovered, as are the majority of weak edges.
However, as already noted in Figure A2, because weak edges are not well estimated in the high school network, the average in- and out-degrees are smaller in the recovered network relative to the true network. We recover two out of the three individuals with the highest out-degree (nodes and ). Finally, in the village network, all strong edges are recovered, the majority of weak edges are recovered, the clustering coefficients are similar across recovered and true networks ( . vs. . ) and we recover two out of the three households with the highest out-degree (nodes , , and ).

C.3 Robustness
Table A2 presents results for the recovered stylized networks under varying network sizes, N = { , , }. Differences between the true and estimated networks are fairly constant as $N$ increases: even for small $N = 15$ a large proportion of zeros and non-zeros are correctly estimated. In all cases, biases in $\hat\rho$ and $\hat\gamma$ decrease with larger $T$. We also conduct a counterpart robustness check for one of the real-world networks. More precisely, we use the savings and insurance networks between households in villages identified in Banerjee et al. (2013), which are generally larger than the family networks focused on so far. Table A3 shows descriptive statistics on this true village network (Panel A) and the recovered network (Panel B). Relative to the family network, the savings and insurance network has many more edges, a greater proportion of weak edges, is less clustered, and has nodes with a higher degree distribution. Despite these differences, the recovered network retains good accuracy on many dimensions: % of all edges are recovered, the recovered clustering coefficient is . (relative to an actual coefficient of . ) and the three nodes with the highest out-degree are all still identified.

Table A4 conducts robustness checks on the sensitivity of the estimates to parameter choices. We consider true parameters ρ = { . , . , . , . }, γ = { . , . }, β = { . , . }. We also introduce a common shock in the disturbance variance-covariance matrix by varying $q$ in

$$\epsilon_t \sim N\left(0,\; \begin{bmatrix} 1 & q & \cdots & q \\ q & 1 & \cdots & q \\ \vdots & \vdots & \ddots & \vdots \\ q & q & \cdots & 1 \end{bmatrix}\right)$$

where we consider q = { . , . , . , }. We find the procedure to be robust to the true values of $\rho$, $\beta$, $\gamma$, and $q$. For $\beta = 0$, performance is slightly worse. This is expected as the exogenous variation from $x_t$ no longer affects $y_t$ directly.

We next probe the procedure by enriching the structure of shocks across nodes. First, we introduce a common shock correlated with covariates $x_t$. To do so, we take $x_t$ from a Gaussian distribution with mean . $\alpha_t \iota$ and, as before, variance 1. Second, we implement a version where the shock is constant over time but varies at the individual level. In this case, the mean of $x_t$ is given by . $\alpha^*$. Third, we implement a version mixing the two types of shocks, with the mean of $x_t$ given by . $\alpha^*$ + 0. $\alpha_t \iota$. In each case, we simulate based on the Erdos-Renyi network as the true $W$. The results are shown in Figure A3: for each of the six performance metrics considered, the procedure is highly robust to these richer structures of shocks across individuals and time periods.

The final robustness check demonstrates the gains from using the Adaptive Elastic Net GMM estimator over alternative estimators. Table A5 shows simulation results using Adaptive Lasso estimates of the interaction matrix $\Pi$, so estimating and penalizing the reduced form. The Adaptive Lasso estimator performs relatively worse: the mean absolute deviation between $\hat W$ and $W$ is often two to three times larger than for the corresponding Adaptive Elastic Net estimates. Appendix Table A6 then shows the performance of the procedure based on OLS estimates of $\Pi$. Given OLS requires $m \ll NT$, we use a time dimension ten times larger, T = { , , }, and still find a deterioration in performance compared to the Adaptive Elastic Net GMM estimator.

Taken together, these robustness checks suggest the Adaptive Elastic Net GMM estimator is preferred over Adaptive Lasso and OLS estimators. This procedure does well in recovering true network structures and, a fortiori, network- and node-level statistics. It does so in networks that vary in size and complexity, and as the underlying social interactions model varies in the strength of endogenous and exogenous social effects, and the structure of shocks.

D Application: Counterfactuals
We consider a scenario in which California exogenously increases the change in its taxes per capita, so $\Delta\tau_{it}$ corresponds to an increase of %. We measure the differential change in equilibrium state taxes in state $j$ under the two network structures using the following statistic:

$$\Upsilon_j = \log(\Delta\tau_{jt} \mid \hat W_{econ}) - \log(\Delta\tau_{jt} \mid W_{geo}), \tag{20}$$

so that positive (negative) values imply taxes being higher (lower) under $\hat W_{econ}$ than $W_{geo}$. Figure A4 graphs $\Upsilon_j$ for each mainland US state (including for California itself, the origin of the shock). There is a wide discrepancy between the equilibrium state tax rates predicted under $\hat W_{econ}$ relative to $W_{geo}$: across states $\Upsilon_j$ varies from − . to . . Only in one state is $\Upsilon_j$ close to … % higher under $\hat W_{econ}$. The dispersion of tax rates across states also increases dramatically under $\hat W_{econ}$. Finally, assuming interactions across states are based solely on geographic neighbors, we miss the fact that many states will have relatively small tax increases.

Footnote: We calculate the counterfactual at $\hat\rho_{2SLS}$ = . , the endogenous effect parameter estimated in our preferred specification, Column 7 of Table 2.

Table 1: Geographic Neighbors
Dependent variable: Change in per capita income and corporate taxes
Coefficient estimates, standard errors in parentheses
Columns: (1) OLS; (2) 2SLS; (3) OLS; (4) 2SLS
Column groups: Besley and Case ( ) Sample; Extended Sample
Geographic Neighbors' Tax Change (t-[t- ]): . ***, . ***, . ***, . ***; standard errors ( . ), ( . ), ( . ), ( . )
Period: - , - , - , -
First Stage (F-stat): . , .
Controls: Yes, Yes, Yes, Yes
State and Year Fixed Effects: Yes, Yes, Yes, Yes
Observations: , , , ,

Notes: *** denotes significance at %, ** at %, and * at %. In all specifications, a pair of states are considered neighbors if they share a geographic border. The sample covers mainland US states. In Columns the sample runs from to 1988 (as in Besley and Case ( )). In Columns the sample is extended to run from to 2015. The dependent variable is the change in state i's total taxes per capita in year t. OLS regression estimates are shown in Columns . Columns show 2SLS regressions where each geographic neighbors' tax change is instrumented by lagged neighbor's state income per capita and unemployment rate. At the foot of Columns we report the p-value on the F-statistic from the first stage of the null hypothesis that instruments are jointly equal to zero. All regressions control for state i's income per capita in 1982 US dollars, state i's unemployment rate, the proportion of young (aged 5- ) and elderly (aged 65+) in state i's population, and the state governor's age. All specifications include state and time fixed effects. With the exception of governor's age, all variables are differenced between period t and period t- . Robust standard errors are reported in parentheses.

Table 2: Economic Neighbors
Dependent variable: Change in per capita income and corporate taxes
Coefficient estimates, standard errors in parentheses
Columns: (1) Initial; (2) OLS; (3) 2SLS: IVs are Characteristics of Neighbors; (4) Initial; (5) OLS; (6) 2SLS: IVs are Characteristics of Neighbors; (7) 2SLS: IVs are Characteristics of Neighbors-of-Neighbors
Column groups: No Exogenous Social Effects; Exogenous Social Effects
Economic Neighbors' Tax Change (t-[t- ]): . , . ***, . ***, . , . **, . *, . ***; standard errors ( . ), ( . ), ( . ), ( . ), ( . )
Period: - , -
First Stage (F-stat): . , . , .
Controls: Yes, Yes, Yes, Yes, Yes, Yes, Yes
State and Year Fixed Effects: Yes, Yes, Yes, Yes, Yes, Yes, Yes
Observations: , , , , , , ,

Notes: *** denotes significance at %, ** at %, and * at %. The sample covers mainland US states running from to 2015. The dependent variable is the change in state i's total taxes per capita in year t. We allow for exogenous social effects in Columns to 7. In subsequent OLS and IV regressions, the economic neighbors' effect is calculated as the weighted average of economic neighbors' variables. OLS regression estimates are shown in Column 2, . Columns 3 and 6 show the 2SLS regression where each geographic neighbors' tax change is instrumented by lagged neighbor's state income per capita and unemployment rate. Column 7 shows a 2SLS regression where each geographic neighbors' tax change is instrumented by lagged neighbor-of-neighbor's state income per capita and unemployment rate. At the foot of Columns , we report the p-value on the F-statistic from the first stage of the null hypothesis that instruments are jointly equal to zero. All regressions control for state i's income per capita in 1982 US dollars, state i's unemployment rate, the proportion of young (aged 5- ) and elderly (aged 65+) in state i's population, and the state governor's age. All specifications include state and time fixed effects. With the exception of governor's age, all variables are differenced between period t and period t- . Robust standard errors are reported in parentheses.

Table 3: Geographic Versus Economic Neighbor Networks
Columns: Geographic Neighbor Network; Economic Neighbor Network
Number of Edges:
Edges in Both Networks:
Edges in W-geo only:
Edges in W-econ only:
Clustering: . , .
Reciprocated Edges: % , . %
Degree Distribution Across Nodes (states): out-degree . ( . ); in-degree . ( . ), . ( . )

Notes: This compares statistics derived from the geographic network of US states to those from the estimated economic network among US states. The number of edges, edges in both networks, edges in W-geo only, and edges in W-econ only count the number of edges in those categories. Reciprocated edges is the frequency of in-edges that are reciprocated by out-edges (by construction, this is % for geographic networks). The clustering coefficient is the frequency of the number of fully connected triplets over the total number of triplets. The degree distribution across nodes counts the average number of connections (standard deviation in parentheses): we show this separately for in-degree and out-degree (by construction, these are identical for geographic networks).

Table 4: Predicting Links to Economic Neighbors
Columns 1-7: Linear Probability Model; Column 8: Tobit
Dependent variable (Cols 1-7): = 1 if Economic Link Between States Identified
Dependent variable (Col 8): = Weighted Link Between States
Coefficient estimates, standard errors in parentheses
Column groups: Geography; Economic and Demographic Homophily; Labor Mobility; Political Homophily; Tax Havens; Tobit, Partial Avg Effects
Columns: (1) to (8)
Geographic Neighbor: . ***, . ***, . ***, . ***, . ***, . ***, . ***; standard errors ( . ), ( . ), ( . ), ( . ), ( . ), ( . ), ( . )
Distance: - . ***, - . ; standard errors ( . ), ( . )
Distance sq.: . . *** . ; standard errors ( . ), ( . )
GDP Homophily: . **, . *, . *, . , . ; standard errors ( . ), ( . ), ( . ), ( . ), ( . )
Demographic Homophily: . , . , . , . , . ; standard errors ( . ), ( . ), ( . ), ( . ), ( . )
Net Migration: . *, . *, - . , . ; standard errors ( . ), ( . ), ( . ), ( . )
Political Homophily: - . , - . **, - . *; standard errors ( . ), ( . ), ( . )
Tax Haven Sender: . ***, . ***; standard errors ( . ), ( . )
Adjusted R-squared: . , . , . , . , . , . , . , -
Observations: , , , , , , , ,

Notes: *** denotes significance at %, ** at %, and * at %. The specifications in Columns - are cross-sectional linear probability models where the dependent variable is equal to 1 if two states are linked, and zero otherwise. In Column 8 the dependent variable is the weighted link between states. Column 8 reports the partial average effects from a Tobit model. A pair of states is considered a first-degree geographic neighbor if they share a border. Distance and distance squared are calculated from the centroids of states' capital cities. GDP homophily is the absolute difference of states' GDP per capita. Demographic homophily is the absolute difference of the share of young (aged 5- ) plus the absolute difference of the share of elderly (aged 65+) in states' population. Net migration is based on individual tax returns (Source: Internal Revenue Service, https://.irs.gov/statistics/soi-tax-stats-migration-data). Political homophily is equal to one if a pair of states have governors of the same party in a given year. Nevada, Delaware, Montana, South Dakota, Wyoming and New York are considered tax haven states. Time averages are taken for all explanatory variables. Robust standard errors in parentheses.

Table 5: Gubernatorial Term Limits
Dependent variable: Change in per capita income and corporate taxes
Coefficient estimates, standard errors in parentheses
IVs: Characteristics of Neighbors-of-Neighbors
Columns: (1) OLS; (2) 2SLS; (3) OLS; (4) 2SLS; (5) OLS; (6) 2SLS
Column groups: All Governors; Governor Cannot Run for Re-election; Governor Can Run for Re-election
Economic Neighbors' tax change (t-[t- ]): . **, . ***, . , . *, . **, . **; standard errors ( . ), ( . ), ( . ), ( . ), ( . ), ( . )
Period: - , - , -
First Stage (F-stat): . , . , .
Controls: Yes, Yes, Yes, Yes, Yes, Yes
State and Year Fixed Effects: Yes, Yes, Yes, Yes, Yes, Yes
Observations: , , , ,

Notes: *** denotes significance at %, ** at %, and * at %. The sample in Columns covers mainland US states running from to 2015. In Columns we use the subsample of state-years in which the governor faced term limits in the subsequent gubernatorial election. In Columns we use the subsample of state-years in which the governor did not face term limits in the subsequent gubernatorial election, and so could run for reelection. The dependent variable is the change in state i's total taxes per capita in year t. We first estimate our procedure, which outputs parameters and the network of economic neighbors. We penalize geographic neighbors throughout and also allow for exogenous social effects. OLS regression estimates are shown in Columns , . Columns , show a 2SLS regression where each geographic neighbors' tax change is instrumented by lagged neighbor-of-neighbor's state income per capita and unemployment rate. At the foot of Columns , we report the p-value on the F-statistic from the first stage of the null hypothesis that instruments are jointly equal to zero. All regressions control for state i's income per capita in 1982 US dollars, state i's unemployment rate, the proportion of young (aged 5- ) and elderly (aged 65+) in state i's population, and the state governor's age. All specifications include state and time fixed effects. With the exception of governor's age, all variables are differenced between period t and period t- . Robust standard errors are reported in parentheses.

Figure 1B: Network Graph of US States, Identified Economic Neighbors
Legend: Geographic network edges; Removed (geographic) edges in economic network; New edges added in economic networks
Notes:
Figure 1B represents the continental United States (N=48). The economic network is derived from our preferred specification, where we penalize geographic neighbors to states, and allow for exogenous social effects. A blue edge is drawn between a pair of states if they are geographic neighbors and were estimated as connected. A red edge is drawn between a pair of states if they are geographic neighbors but were not estimated as connected. A green edge is drawn between a pair of states if they are not geographic neighbors and were estimated as connected. The left-hand-side graph shows only red and blue edges. The right-hand-side graph shows all three types of edges. State abbreviations are as used by the US Post Office (http://about.usps.com/who-we-are/postal-history/state-abbreviations.pdf).
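The colour coding described in these notes can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `classify_edges`, the three-state matrices and the directed treatment of pairs are hypothetical and do not come from the paper, which draws edges between pairs of states.

```python
def classify_edges(w_geo, w_econ):
    """Label each ordered pair (i, j), i != j, by comparing two adjacency matrices.

    A nonzero entry means an edge. Pairs are treated as directed here,
    which is an assumption of this sketch.
    """
    n = len(w_geo)
    labels = {"blue": [], "red": [], "green": []}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            geo = w_geo[i][j] != 0
            econ = w_econ[i][j] != 0
            if geo and econ:
                labels["blue"].append((i, j))    # geographic neighbors, estimated as connected
            elif geo:
                labels["red"].append((i, j))     # geographic neighbors, not estimated as connected
            elif econ:
                labels["green"].append((i, j))   # not geographic neighbors, estimated as connected
    return labels


# Toy three-state example (hypothetical data, not from the paper).
w_geo = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
w_econ = [[0, 0.4, 0.2],
          [0, 0, 0.5],
          [0, 0, 0]]
edges = classify_edges(w_geo, w_econ)
```

In this toy example the pair (0, 2) is the only green edge, since states 0 and 2 do not share a border in `w_geo` but are linked in `w_econ`.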
Notes:
Figure 1A represents the continental US states (N=48). An edge is drawn between a pair of states if they share a geographic border. State abbreviations are as used by the US Post Office (http://about.usps.com/who-we-are/postal-history/state-abbreviations.pdf).
Figure 1A: Network Graph of US States, Geographic Neighbors
Legend: Geographic network edges; Removed (geographic) edges in economic network; New edges added in economic networks
Figure 2: Strength of Ties and Reciprocity
Panel A: Histogram of Strength of Ties, Conditional on $W_{ij} > 0$
Panel B: In-network for California
Panel C: Out-network for California
Notes:
Panel A is the histogram of ties in the economic network, conditional on non-zero ties. Panels B and C show the in-network and out-network of California as derived from our preferred specification, where we penalize geographic neighbors to states, and allow for exogenous social effects. The in-network is the set of states that determine tax setting in California. The out-network is the set of states in which taxes are set in direct response to those in California. A blue edge is drawn between a pair of states if they are geographic neighbors and were estimated as connected. A red edge is drawn between a pair of states if they are geographic neighbors but were not estimated as connected. A green edge is drawn between a pair of states if they are not geographic neighbors and were estimated as connected. State abbreviations are as used by the US Post Office (http://about.usps.com/who-we-are/postal-history/state-abbreviations.pdf).

Figure 3: In- and Out-degree Distribution
Panel A: In-degree distribution
Panel B: Out-degree distribution
Notes:
In-degree distribution (Panel A) and out-degree distribution (Panel B). Distribution calculated from the geographic neighbors' network (W-geo) in blue. Distribution calculated from the economic neighbors' network (W-econ) in red. State abbreviations are as used by the US Post Office (http://about.usps.com/who-we-are/postal-history/state-abbreviations.pdf).

Panels: A. % of zeros; B. % of non-zeros; C. Mean Absolute Deviation of $W$; D. Mean Absolute Deviation of $\Pi$; E. Endogenous Social Effect, $\hat{\rho}$; F. Exogenous Social Effect, $\hat{\gamma}$
Notes:
These simulation results are based on the Banerjee et al. (2013) village network, using the Adaptive Elastic Net GMM algorithm, with penalization parameters chosen by BIC, under various assumptions about knowledge of the true network and time periods T=25, 50, 100, 125 and 150. The "Village" case refers to the simulation implemented without knowledge of the true network. "Village (top 3)" refers to the case where all connections of the three households with the highest out-degrees are assumed to be known. "Village (top 5)" and "Village (top 103)" are analogously defined. In all cases, 1000 Monte Carlo iterations were performed. The true parameters are rho-0=.3, beta-0=.4 and gamma-0=.5. In Panel A, the % of zeros refers to the proportion of true zero elements in the social interaction matrix that are estimated as smaller than .05. In Panel B, the % of non-zeros refers to the proportion of true elements greater than .3 in the social interaction matrix that are estimated as non-zeros. In Panels C and D, the Mean Absolute Deviations are the mean absolute error of the estimated network compared to the true network for the social interaction matrix W and the reduced form matrix respectively. In Panels E and F, the true parameter values are marked by horizontal red lines. The recovered parameters are the estimated parameters averaged across iterations. All specifications include time and node fixed effects.
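The recovery metrics defined in these notes (share of true zeros estimated below .05, share of true elements above .3 estimated as non-zero, and mean absolute deviation) can be computed as in the minimal sketch below. The function name `recovery_metrics`, the toy matrices, the use of absolute values in the zero check, and the exclusion of the diagonal are assumptions of this illustration, not details confirmed by the paper.

```python
def recovery_metrics(w_true, w_hat, zero_tol=0.05, strong_cut=0.3):
    """Compare an estimated interaction matrix with the true one (off-diagonal only)."""
    n = len(w_true)
    zeros_total = zeros_hit = 0
    strong_total = strong_hit = 0
    abs_err = 0.0
    count = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: the diagonal is excluded from all metrics
            t, e = w_true[i][j], w_hat[i][j]
            abs_err += abs(e - t)
            count += 1
            if t == 0:
                zeros_total += 1
                zeros_hit += abs(e) < zero_tol   # true zero estimated as (near) zero
            elif t > strong_cut:
                strong_total += 1
                strong_hit += e != 0             # true strong tie estimated as non-zero
    return {
        "pct_true_zeros": zeros_hit / zeros_total if zeros_total else float("nan"),
        "pct_true_nonzeros": strong_hit / strong_total if strong_total else float("nan"),
        "mad": abs_err / count if count else float("nan"),
    }


# Toy example (hypothetical numbers, not simulation output from the paper).
w_true = [[0, 0.5, 0],
          [0, 0, 0.5],
          [0.5, 0, 0]]
w_hat = [[0, 0.4, 0.02],
         [0.1, 0, 0],
         [0.6, 0, 0]]
metrics = recovery_metrics(w_true, w_hat)
```

Averaging these three numbers over Monte Carlo draws would give one point per value of T in panels of the kind described above.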
Figure 4: Simulation Results, Adaptive Elastic Net GMM
Partial Knowledge of W
[Plots omitted: six panels (A-F); Panels C-F concern $W$, $\Pi$, $\hat{\rho}$ and $\hat{\gamma}$ respectively. Horizontal axis in each panel: T = 25, 50, 75, 100, 125, 150.]
Panel B: Econ Network, 1989-2015 subsample
Panel C: Geographic Versus Economic Neighbor Networks

Columns: Geographic Neighbor Network (W-geo); Economic Neighbor Network, Full Sample; Economic Neighbor Network, 1962-1988; Economic Neighbor Network, 1989-2015

                                   W-geo    Full     1962-1988  1989-2015
Number of Edges                    214      144      197        185
Edges in both W-geo and W-econ              79       108        86
Edges in W-econ only                        65       89         99
Edges in W-geo only                         135      106        128
Clustering                         .1936    .0259    .0389      .0394
Reciprocated Edges                 […]
Degree Distribution Across Nodes (states), out-degree   […]
Degree Distribution Across Nodes (states), in-degree    […]
Notes:
This compares statistics derived from the geographic network of US states to those from the estimated economic network among US states, for the three samples ("Full Sample", 1962-2015; 1962-1988; 1989-2015). The rows for the number of edges, edges in both networks, edges in W-geo only, and edges in W-econ only count the number of edges in those categories. Numbers are relative to the W-geo network in the first column. Reciprocated edges is the frequency of in-edges that are reciprocated by out-edges (by construction, this is 100% for geographic networks). The clustering coefficient is the number of fully connected triplets divided by the total number of triplets. The degree distribution across nodes reports the average number of connections (standard deviation in parentheses): we show this separately for in-degree and out-degree (by construction, these are identical for geographic networks).
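As an illustration of the statistics defined in these notes, the sketch below computes edge counts, reciprocity, a clustering coefficient and degree sequences from a weighted adjacency matrix. The function name `network_statistics`, the toy matrix, the row/column convention (row counts as in-degree, column counts as out-degree) and the choice to compute clustering on the undirected version of the graph are all assumptions of this example.

```python
def network_statistics(w):
    """Summary statistics for a weighted, directed adjacency matrix.

    Assumed convention: w[i][j] != 0 means node j influences node i,
    so row counts give in-degrees and column counts give out-degrees.
    """
    n = len(w)
    edge = [[i != j and w[i][j] != 0 for j in range(n)] for i in range(n)]
    n_edges = sum(edge[i][j] for i in range(n) for j in range(n))
    reciprocated = sum(edge[i][j] and edge[j][i] for i in range(n) for j in range(n))
    in_degree = [sum(edge[i]) for i in range(n)]
    out_degree = [sum(edge[i][j] for i in range(n)) for j in range(n)]

    # Clustering on the undirected graph (an assumption of this sketch):
    # share of connected triplets a - v - b whose end nodes a and b are also linked.
    und = [[edge[i][j] or edge[j][i] for j in range(n)] for i in range(n)]
    triplets = closed = 0
    for v in range(n):
        nbrs = [u for u in range(n) if und[v][u]]
        for x in range(len(nbrs)):
            for y in range(x + 1, len(nbrs)):
                triplets += 1
                closed += und[nbrs[x]][nbrs[y]]
    return {
        "edges": n_edges,
        "reciprocity": reciprocated / n_edges if n_edges else 0.0,
        "clustering": closed / triplets if triplets else 0.0,
        "in_degree": in_degree,
        "out_degree": out_degree,
    }


# Toy four-node example (hypothetical weights, not estimates from the paper).
w = [[0, 0.5, 0.5, 0],
     [0.5, 0, 0, 0],
     [0, 0.5, 0, 0],
     [0.2, 0, 0, 0]]
stats = network_statistics(w)
```

For a symmetric matrix such as a border-sharing network, reciprocity is 1 and the in-degree and out-degree lists coincide, which matches the "by construction" remarks in the notes.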
Panel A: Econ Network, 1962-1988 subsample
Figure 5: Network Graph of US States, Identified Economic Neighbors by Subsamples
Notes:
Panels A and B represent the continental United States (N=48). Panel A is estimated with the 1962-1988 subsample. Panel B is estimated with the 1989-2015 subsample. The economic network is derived from our preferred specification, where we penalize geographic neighbors to states, and allow for exogenous social effects. A blue edge is drawn between a pair of states if they are geographic neighbors and were estimated as connected. A red edge is drawn between a pair of states if they are geographic neighbors but were not estimated as connected. A green edge is drawn between a pair of states if they are not geographic neighbors and were estimated as connected. The left-hand-side graph shows only red and blue edges. The right-hand-side graph shows all three types of edges. State abbreviations are as used by the US Post Office (http://about.usps.com/who-we-are/postal-history/state-abbreviations.pdf).
Table A1: True and Recovered Networks

Columns: Erdos-Renyi; Political Party; High school (Coleman (1964)); Village (Banerjee et al. (2013))

                                                  Erdos-Renyi   Political Party   High school    Village
Number of nodes                                   30            30                70             65

A. True Networks
(a) Network-wide statistics
Number of edges                                   30            45                366            240
Number of strong edges                            30            30                70             65
Number of weak edges                              […]
Number of reciprocated edges                      […]
Clustering coefficient                            -             .000              .120           .141
Number of components                              12            11                3              3
Size of maximal component                         10            16                68             51
Standard deviation of the diagonal of squared W   .254          .254              .167           .239
(b) Node-level statistics
In-degree distribution                            […]
Out-degree distribution                           […]
Nodes with highest out-degree                     {7, 11, 26}   {1, 11, 28}       {21, 22, 69}   {16, 35, 57}

B. Recovered Networks
(a) Network-wide statistics
Number of edges                                   30            38                210            194
Number of strong edges                            30            30                70             68
Number of weak edges                              […]
Number of reciprocated edges                      […]
Clustering coefficient                            -             .000              .162           .134
Number of components                              12            11                1              4
Size of maximal component                         10            14                70             48
(b) Node-level statistics
In-degree distribution                            […]
Out-degree distribution                           […]
Nodes with highest out-degree                     {7, 11, 26}   {1, 11, 28}       {21, 48, 69}   {16, 35, 57}

Notes:
Panel A refers to the true networks. Panel B refers to the recovered networks. In each Panel, the summary statistics are divided into network-wide and node-level statistics. Strong edges are defined as those with strength greater than or equal to .3. For the in-degree and out-degree distribution, the mean is shown and the standard deviation is in parentheses. The nodes with the highest out-degree are those with the greatest influence on others, and are calculated as the column-sum of the social interaction matrix. The recovered network statistics are calculated over the average network across simulations with T=100.
Table A2: Simulation Results, Adaptive Elastic Net GMM, Alternative Network Sizes

Panels: A. Erdos-Renyi (N = […], N = […], N = […]); B. Political party (N = […], N = […], N = […]); columns indexed by T = […]
Rows: % True Zeroes; % True Non-Zeroes; MAD($W$); MAD($\Pi$); $\hat{\rho}$; $\hat{\beta}$; $\hat{\gamma}$
[Numeric entries were lost in extraction.]
Notes:
These simulation results are based on the Adaptive Elastic Net GMM algorithm, with penalization parameters chosen by BIC, under various true networks, network sizes and time periods T=[…]. In all cases, […] Monte Carlo iterations were performed. The true parameters are rho-0=[…], beta-0=[…] and gamma-0=[…]. The % of true zeroes refers to the proportion of true zero elements in the social interaction matrix that are estimated as smaller than […]. The % of true non-zeroes refers to the proportion of true elements greater than […] in the social interaction matrix that are estimated as non-zeros. The Mean Absolute Deviations are the mean absolute error of the estimated network compared to the true network for the social interaction matrix W and the reduced form matrix respectively. The recovered parameters are the estimated parameters averaged across iterations. All specifications include time and node fixed effects. Standard errors across iterations are in parentheses.

Table A3: True and Recovered Village Networks
Columns: Village, Family (Banerjee et al. (2013)); Village, Savings and Insurance (Banerjee et al. (2013))

                                                  Family          Savings and Insurance
Number of nodes                                   65              65

A. True Networks
(a) Network-wide statistics
Number of edges                                   240             343
Number of strong edges                            65              47
Number of weak edges                              175             296
Number of reciprocated edges                      240             340
Clustering coefficient                            .141            .073
Number of components                              […]
Size of maximal component                         51              62
Standard deviation of the diagonal of squared W   .239            .159
(b) Node-level statistics
In-degree distribution                            […]
Out-degree distribution                           […]
Nodes with highest out-degree                     {16, 35, 57}    {16, 35, 55}

B. Recovered Networks
(a) Network-wide statistics
Number of edges                                   194             269
Number of strong edges                            68              65
Number of weak edges                              126             204
Number of reciprocated edges                      170             250
Clustering coefficient                            .134            .058
Number of components                              […]
Size of maximal component                         48              62
(b) Node-level statistics
In-degree distribution                            […]
Out-degree distribution                           […]
Nodes with highest out-degree                     {16, 35, 57}    {16, 35, 55}

Notes:
Panel A refers to the true networks. Panel B refers to the recovered networks. In each Panel, the summary statistics are divided into network-wide and node-level statistics. Strong edges are defined as those with strength greater than or equal to .3. For the in-degree and out-degree distribution, the mean is shown and the standard deviation is in parentheses. The nodes with the highest out-degree are those with the greatest influence on others, and are calculated as the column-sum of the social interaction matrix. The recovered network statistics are calculated over the average network across simulations with T=100.
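The two definitions used in these notes, strong edges as those with strength of at least .3 and influence ranked by column sums of the social interaction matrix, can be sketched as follows. The function name `strength_summary`, its parameters and the toy matrix are hypothetical and serve only to illustrate the definitions.

```python
def strength_summary(w, strong_cut=0.3, top_k=3):
    """Count strong and weak edges and rank nodes by column sum (out-degree strength)."""
    n = len(w)
    strong = weak = 0
    for i in range(n):
        for j in range(n):
            if i != j and w[i][j] != 0:
                if w[i][j] >= strong_cut:
                    strong += 1
                else:
                    weak += 1
    # Column j collects the weights that every other node places on node j.
    col_sums = [sum(w[i][j] for i in range(n) if i != j) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: col_sums[j], reverse=True)
    return {"strong": strong, "weak": weak, "top_out": ranked[:top_k]}


# Toy three-node example (hypothetical weights, not from the paper).
w = [[0, 0.3, 0.1],
     [0.6, 0, 0.1],
     [0.6, 0.2, 0]]
summary = strength_summary(w, top_k=2)
```

Here node 0 ranks first because the other two nodes each place a weight of .6 on it, so its column sum is the largest.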
Table A[…]: Simulation Results, Adaptive Elastic Net GMM, Alternative Parameters

Panels: A. Erdos-Renyi; B. Political party; columns indexed by alternative values of $\rho$, $\beta$, $\gamma$
Rows: % True Zeroes; % True Non-Zeroes; MAD($W$); MAD($\Pi$); $\hat{\rho}$; $\hat{\beta}$; $\hat{\gamma}$
[Numeric entries were lost in extraction.]
Notes:
These simulation results are based on the Adaptive Elastic Net GMM algorithm, with penalization parameters chosen by BIC, under various true networks, network sizes, time periods T=[…] and parameter values. In all cases, […] Monte Carlo iterations were performed. The % of true zeroes refers to the proportion of true zero elements in the social interaction matrix that are estimated as smaller than […]. The % of true non-zeroes refers to the proportion of true elements greater than […] in the social interaction matrix that are estimated as non-zeros. The Mean Absolute Deviations are the mean absolute error of the estimated network compared to the true network for the social interaction matrix W and the reduced form matrix respectively. The recovered parameters are the estimated parameters averaged across iterations. All specifications include time and node fixed effects. Standard errors across iterations are in parentheses.

Table A[…]: Simulation Results, Adaptive Lasso

Panels: A. Erdos-Renyi (N = […], N = […], N = […]); B. Political party (N = […], N = […], N = […]); columns indexed by T = […]
Rows: % True Zeroes; % True Non-Zeroes; MAD($W$); MAD($\Pi$); $\hat{\rho}$; $\hat{\beta}$; $\hat{\gamma}$
[Numeric entries were lost in extraction.]
Notes:
These simulation results are based on the Adaptive Lasso algorithm, with penalization parameters chosen by BIC, under various true networks, network sizes and time periods T=[…]. In all cases, […] Monte Carlo iterations were performed. The true parameters are rho-0=[…], beta-0=[…] and gamma-0=[…]. The % of true zeroes refers to the proportion of true zero elements in the social interaction matrix that are estimated as smaller than […]. The % of true non-zeroes refers to the proportion of true elements greater than […] in the social interaction matrix that are estimated as non-zeros. The Mean Absolute Deviations are the mean absolute error of the estimated network compared to the true network for the social interaction matrix W and the reduced form matrix respectively. The recovered parameters are the estimated parameters averaged across iterations. All specifications include time and node fixed effects. Standard errors across iterations are in parentheses.

Table A[…]: Simulation Results, OLS

Panels: A. Erdos-Renyi (N = […], N = […], N = […]); B. Political party (N = […], N = […], N = […]); columns indexed by T = […]
Rows: % True Zeroes; % True Non-Zeroes; MAD($W$); MAD($\Pi$); $\hat{\rho}$; $\hat{\beta}$; $\hat{\gamma}$
[Numeric entries were lost in extraction.]
Notes:
These simulation results are based on OLS estimates, under various true networks, network sizes and time periods T=[…]. In all cases, […] Monte Carlo iterations were performed. The true parameters are rho-0=[…], beta-0=[…] and gamma-0=[…]. The % of true zeroes refers to the proportion of true zero elements in the social interaction matrix that are estimated as smaller than […]. The % of true non-zeroes refers to the proportion of true elements greater than […] in the social interaction matrix that are estimated as non-zeros. The Mean Absolute Deviations are the mean absolute error of the estimated network compared to the true network for the social interaction matrix W and the reduced form matrix respectively. The recovered parameters are the estimated parameters averaged across iterations. All specifications include time and node fixed effects. Standard errors across iterations are in parentheses.

Table A[…]: Summary Statistics, Tax Competition Application

Columns: Obs; Mean; SD; Min; q[…]; Median; q[…]; Max
Panels: A. Besley and Case sample ([…]-[…]); B. Extended sample ([…]-[…])
Rows in each panel: State total tax per capita; State income per capita; Unemployment rate; Proportion of young; Proportion of elderly; State governor's age; Governor term limit dummy
[Numeric entries were lost or fused in extraction.]
Notes:
Summary statistics of variables (in levels) used in subsequent regressions. The Besley and Case sample runs from […] to 1988 and the extended sample until […]. State total tax per capita is the sum of sales, income and corporation tax in thousands of US dollars. State income per capita is in thousands of US dollars. Proportion of young is the proportion of the population between 5 and 17 years. Proportion of elderly is the proportion of the population aged 65 or older. State governor's age is in years. Governor term limit dummy is equal to 1 if the governor faces term limits in the current mandate. Data sources: State total tax per capita, Census of Governments ([…]) and Annual Survey of Government Finances (all other years); State income per capita, Bureau of Economic Analysis; Unemployment rate, Bureau of Labor Statistics; Proportion of young (aged 5-17) and elderly (aged 65+), Census Population & Housing Data; State governor's age and political variables manually sourced from individual governors' web pages on Wikipedia.

Table A[…]: Exogenous Social Effects
Dependent variable: Change in per capita income and corporate taxes
Coefficient estimates, standard errors in parentheses

Columns: (1) Initial; (2) OLS; (3) 2SLS: IVs are Characteristics of Neighbors; (4) 2SLS: IVs are Characteristics of Neighbors-of-Neighbors
[Numeric entries were lost in extraction; only signs and significance stars survive.]
Economic Neighbors' tax change (t-[t-[…]]): no star, **, *, ***
Economic Neighbors' income per capita: no star, ***, ***, ***
Economic Neighbors' unemployment rate: no star, ***, ***, ***
Economic Neighbors' population aged […]-[…]: no stars
Economic Neighbors' population aged […]+: negative in all columns, * in Column 3
Economic Neighbors' governor age: negative in all columns, no stars
Period: 1962-[…]
First Stage (F-stat): […]
Controls: Yes in all four columns
State and Year Fixed Effects: Yes in all four columns
Observations: […]
Notes:
*** denotes significance at […]%, ** at […]%, and * at […]%. The sample covers mainland US states running from 1962 to 2015. The dependent variable is the change in state i's total taxes per capita in year t. In all Columns, we penalize geographic neighbors and allow for exogenous social effects. In OLS and IV regressions, the economic neighbors' effect is calculated as the weighted average of economic neighbors' variables. OLS regression estimates are shown in Column 2. Column 3 shows the 2SLS regression where each geographic neighbor's tax change is instrumented by lagged neighbor's state income per capita and unemployment rate. Column 4 shows a 2SLS regression where each geographic neighbor's tax change is instrumented by lagged neighbor-of-neighbor's state income per capita and unemployment rate. At the foot of Columns […] we report the p-value on the F-statistic from the first stage of the null hypothesis that instruments are jointly equal to zero. All regressions control for state i's income per capita in 1982 US dollars, state i's unemployment rate, the proportion of young (aged 5-17) and elderly (aged 65+) in state i's population, and the state governor's age. All specifications include state and time fixed effects. With the exception of governor's age, all variables are differenced between period t and period t-[…]. Robust standard errors are reported in parentheses.

Table A[…]: General Equilibrium Impacts of California Tax Rise

Columns: Geographic Neighbor Network; Economic Neighbor Network; Ratio
Rows: Average tax increase; Variance tax increase; Tax dispersion; States with tax increase; States with tax increase > […]% (five thresholds)
[Numeric entries were lost in extraction.]
Notes:
This shows the equilibrium impulse responses in taxes set in each state as a result of California increasing its tax change by […]%. The rho coefficient is derived from our preferred specification to estimate the economic network, where we penalize geographic neighbors to states, and allow for exogenous social effects (based on a sample of mainland US states running from 1962 to 2015). We compare these derived tax changes under the identified economic network structure, relative to that assumed under a geographic neighbors structure. The final Column shows the ratio of the same statistic derived under each network.

Panels: A. % of zeros; B. % of non-zeros; C. Mean Absolute Deviation of $W$; D. Mean Absolute Deviation of $\Pi$; E. Endogenous Social Effect, $\hat{\rho}$; F. Exogenous Social Effect, $\hat{\gamma}$
Notes:
These simulation results are based on the Adaptive Elastic Net GMM algorithm, with penalization parameters chosen by BIC, under various true networks and time periods T=5, 10, 15, 25, 50, 100, 125 and 150. In all cases, 1000 Monte Carlo iterations were performed. The true parameters are rho-0=.3, beta-0=.4 and gamma-0=.5. In Panel A, the % of zeros refers to the proportion of true zero elements in the social interaction matrix that are estimated as smaller than .05. In Panel B, the % of non-zeros refers to the proportion of true elements greater than .3 in the social interaction matrix that are estimated as non-zeros. In Panels C and D, the Mean Absolute Deviations are the mean absolute error of the estimated network compared to the true network for the social interaction matrix W and the reduced form matrix respectively. In Panels E and F, the true parameter values are marked by horizontal red lines. The recovered parameters are the estimated parameters averaged across iterations. All specifications include time and node fixed effects.
Figure A1: Simulation Results, Adaptive Elastic Net GMM
[Plots omitted: six panels (A-F); Panels C-F concern $W$, $\Pi$, $\hat{\rho}$ and $\hat{\gamma}$ respectively. Horizontal axis in each panel: T = 5, 10, 15, 25, 50, 75, 100, 125, 150.]
Figure A2: Simulated and True Networks
Notes:
These simulation results are based on the Elastic Net algorithm, with penalization parameters chosen by BIC, under various true networks and time periods T=50, 100 and 150. In the two stylized networks (Erdos-Renyi and political party), we set N=30, and the real-world networks, the high school friendship and village networks, are based on N=70 and N=65 non-isolated nodes respectively. Party leaders in the political party network are marked in black in Panel B. In all cases, 1,000 Monte Carlo iterations were performed. The true parameters are rho-0=.3, beta-0=.4 and gamma-0=.5. All specifications include time and node fixed effects. Kept edges are depicted in blue: these links are estimated as non-zero in at least 5% of the iterations and are also non-zero in the true network. Added edges are depicted in green: these links are estimated as non-zero in at least 5% of the iterations but the edge is zero in the true network. Removed edges are depicted in red: these links are estimated as zero in at least 5% of the iterations but are non-zero in the true network. The figures further distinguish between strong and weak links: strong links (whose strength is greater than or equal to .3) are shown as solid edges, and weak links are shown as dashed edges.
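The kept, added and removed categories in these notes are defined by how often a link is estimated as zero or non-zero across Monte Carlo iterations. A minimal sketch of that rule follows; the function name `classify_across_iterations`, the two toy draws and the directed treatment of links are assumptions of this example rather than the authors' code.

```python
def classify_across_iterations(w_true, w_hats, min_share=0.05):
    """Classify links as kept, added or removed using their frequency across draws."""
    n = len(w_true)
    reps = len(w_hats)
    kept, added, removed = [], [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            nonzero = sum(1 for w in w_hats if w[i][j] != 0)
            zero = reps - nonzero
            truly_linked = w_true[i][j] != 0
            if truly_linked and nonzero / reps >= min_share:
                kept.append((i, j))      # blue: true link, often estimated as non-zero
            if not truly_linked and nonzero / reps >= min_share:
                added.append((i, j))     # green: no true link, often estimated as non-zero
            if truly_linked and zero / reps >= min_share:
                removed.append((i, j))   # red: true link, often estimated as zero
    return kept, added, removed


# Toy example with two Monte Carlo draws (hypothetical numbers).
w_true = [[0, 0.5],
          [0, 0]]
w_hats = [
    [[0, 0.4], [0, 0]],
    [[0, 0], [0.1, 0]],
]
kept, added, removed = classify_across_iterations(w_true, w_hats)
```

Under the rule as stated, the categories are not mutually exclusive: a true link that is recovered in some draws and missed in others, such as (0, 1) here, counts as both kept and removed.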
Panels: A. Erdos-Renyi; B. Political Party; C. High-school; D. Village

Panels: A. % of zeros; B. % of non-zeros; C. Mean Absolute Deviation of $W$; D. Mean Absolute Deviation of $\Pi$; E. Endogenous Social Effect, $\hat{\rho}$; F. Exogenous Social Effect, $\hat{\gamma}$
Figure A3: Simulation Results, Adaptive Elastic Net GMM
Erdos-Renyi graph (N=30)
Notes:
Simulations with common shocks between the exogenous variable and the error term: time-constant and varying at the individual level ("individual"), constant across individuals and varying over time ("time"), and both types of shocks. These simulation results are based on the Adaptive Elastic Net GMM algorithm, with penalization parameters chosen by BIC, under various true networks and time periods T=25, 50, 100, 125 and 150. In all cases, 1000 Monte Carlo iterations were performed. The true parameters are rho-0=.3, beta-0=.4 and gamma-0=.5. In Panel A, the % of zeros refers to the proportion of true zero elements in the social interaction matrix that are estimated as smaller than .05. In Panel B, the % of non-zeros refers to the proportion of true elements greater than .3 in the social interaction matrix that are estimated as non-zeros. In Panels C and D, the Mean Absolute Deviations are the mean absolute error of the estimated network compared to the true network for the social interaction matrix W and the reduced form matrix respectively. In Panels E and F, the true parameter values are marked by horizontal red lines. The recovered parameters are the estimated parameters averaged across iterations. All specifications include time and node fixed effects.
Alternative Structures of Shocks
[Figure panels omitted; horizontal axis: T = 25, 50, 75, 100, 125, 150.]
Figure A: General Equilibrium Impacts of CA Tax Rise Shocks
State's Reaction to % increase in CA taxes
[Map figure omitted; plotted quantity: Log(equilibrium taxes under W-econ) - Log(equilibrium taxes under W-geo).]
Notes: This shows the equilibrium impulse responses in taxes set in each state as a result of California increasing its tax change by %. This is as derived from our preferred specification, where we penalize geographic neighbors to states, and allow for exogenous social effects. We compare these derived tax changes under the identified economic network structure, relative to that assumed under a geographic neighbors structure. We graph the log change in equilibrium taxes under economic neighbors, minus the log change in equilibrium taxes under geographic neighbors. Positive values (red-shaded states) indicate higher equilibrium taxes under economic neighbors than geographic neighbors, and negative values (blue-shaded states) indicate lower equilibrium taxes under economic neighbors than geographic neighbors.