Non-Identifiability in Network Autoregressions
Federico Martellosio∗
November 20, 2020

∗ School of Economics, University of Surrey, [email protected]
Abstract
We study identification in autoregressions defined on a general network. Most identification conditions that are available for these models either rely on repeated observations, are only sufficient, or require strong distributional assumptions. We derive conditions that apply even if only one observation of a network is available, are necessary and sufficient for identification, and require weak distributional assumptions. We find that the models are generically identified even without repeated observations, and analyze the combinations of the interaction matrix and the regressor matrix for which identification fails. This is done both in the original model and after certain transformations in the sample space, the latter case being important for some fixed effects specifications.
Keywords : fixed effects, invariance, networks, quasi maximum likelihood estimation.
JEL Classification : C12, C21.
1 Introduction

A simple way to model interaction on a general network is to use an autoregressive process for an outcome variable, usually conditional on covariates. Models of this type can be traced back at least to Whittle (1954), and have since proved useful in many applications, across many scientific fields. In economics, and the social sciences more generally, they are currently particularly popular in the analysis of peer effects and social networks. The models are known as simultaneous autoregressions in the statistics literature (e.g., Cressie, 1993), spatial autoregressions in the econometrics literature (e.g., LeSage and Pace, 2009), are closely related to linear-in-means models (e.g., Manski, 1993a), and have important connections to linear structural equation models (e.g., Drton et al., 2011). To emphasize their wide applicability, we refer to them as network autoregressions.

This paper analyzes which combinations of the interaction matrix W and the regressor matrix X lead to a failure of identification. To this end, we restrict attention to the case where both W and X are nonstochastic and known, as in Lee (2004). (It would be possible, alternatively, to study identifiability conditional on W and/or X, under suitable exogeneity assumptions (see, e.g., Bramoullé et al., 2009; Gupta, 2019), at the cost of some notational complexity. Allowing for endogeneity of W and/or X would instead require different methods; see Section 6.) We show that identification from the first moment is generally possible, and characterize the cases when it is impossible. We focus on one class of such cases, which is particularly relevant in fixed effects models (for example, the classical linear-in-means model with group fixed effects belongs to this class of cases). In this class, non-identifiability from the first moment is linked to the impossibility of invariant inference; that is, the parameters cannot be identified from any statistic that is invariant with respect to a certain group of transformations under which the model itself is invariant. This fundamental type of non-identifiability occurs despite the fact that the parameters may be identifiable from the second moment of the outcome variable.

Section 2 sets out the framework. Section 3 studies identifiability from the first and second moments of the outcome variable. Identifiability after imposition of invariance is discussed in Section 4, and implications for likelihood inference in Section 5. Section 6 briefly concludes. The appendices contain additional material and all proofs.
Notation. Throughout the paper, ι_n denotes the n × 1 vector of ones, M_A denotes the orthogonal projector onto col⊥(A) (M_A := I_n − A(A′A)^{−1}A′ if A has full column rank), µ_{R^n} denotes the Lebesgue measure on R^n, "a.s." stands for almost surely, with respect to µ_{R^n}, and A ⊕ B denotes the direct sum of the matrices A and B (that is, if A is n × m and B is p × q, A ⊕ B is the (n + p) × (m + q) block diagonal matrix with A as top diagonal block and B as bottom diagonal block).

2 Framework

The model of interest is the network autoregression

    y = λWy + Xβ + σε,   (2.1)

where y is the n × 1 vector of observations on the outcome variable, λ is a scalar parameter, W is an n × n interaction matrix, X is an n × k matrix of regressors with full column rank and with k ≤ n − 1, β ∈ R^k, σ is a positive scale parameter, and ε is an unobserved zero mean n × 1 random vector. Both W and X are taken to be nonstochastic and known. The entries of W are supposed to reflect the pairwise interaction between the observational units; in particular, the (i, j)-th entry of W is zero if unit j is not deemed to be a neighbor of unit i. Some of the columns of X may be spatial lags of some other columns (the spatial lag of a vector x being the vector Wx). That is, in the terminology of social networks, we allow for "contextual effects" or "exogenous spillovers".

When the index set of y has more than one dimension (e.g., individuals and time, or individuals and networks), it is often useful to include in the error term additive unobserved components relative to those dimensions. In that case, we take a fixed effects approach and treat the unobserved effects as parameters to be estimated. Accordingly, for inferential purposes, we incorporate the fixed effects into β and the corresponding dummy variables into X. Two examples of fixed effects specifications that can be nested into the general model (2.1) are given next.

Example 1. There are N individuals, followed over T time periods. Let W̃ be an N × N matrix describing the interaction between the individuals, and X̃ an NT × k̃ regressor matrix. The interaction matrix W̃ is assumed to be constant over time for simplicity. A panel data version of the network autoregression (2.1) is given by

    y_it = λ Σ_j W̃_ij y_jt + x̃′_it β̃ + u_it,

where the W̃_ij are the entries of W̃, and the x̃′_it are the rows of X̃, for i = 1, ..., N and t = 1, ..., T. The error u_it is decomposed into c_i + σε_it (one-way model) or c_i + α_t + σε_it (two-way model), where c_i and α_t are, respectively, individual and time fixed effects, and ε_it is an idiosyncratic error. Following a fixed effects approach (i.e., treating the random components c_i and α_t as parameters to be estimated), the model can be written in the notation of equation (2.1), with W = I_T ⊗ W̃ and, for the two-way model, X = (X̃, ι_T ⊗ I_N, I_T ⊗ ι_N) and β = (β̃′, c′, α′)′, where c and α are the vectors with entries c_i and α_t, respectively. (Obviously, for identification of β one column of the matrix (ι_T ⊗ I_N, I_T ⊗ ι_N) should be omitted from X, or some normalization should be imposed on the fixed effects, and no regressor should be constant over time or over individuals.)

Example 2. There are R networks, with network r having m_r individuals. The model is

    y_r = λW_r y_r + X̃_r γ + α_r ι_{m_r} + σε_r,   r = 1, ..., R,   (2.2)

where W_r is the m_r × m_r interaction matrix of network r, α_r is a network fixed effect, X̃_r is an m_r × k̃ matrix of regressors, and γ is a k̃ × 1 parameter vector. In the notation of equation (2.1), y = (y′_1, ..., y′_R)′, W = ⊕_{r=1}^R W_r, β = (γ′, α′)′, ε = (ε′_1, ..., ε′_R)′, and X = (X̃, ⊕_{r=1}^R ι_{m_r}), with X̃ := (X̃′_1, ..., X̃′_R)′.
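To fix ideas, the following sketch (ours, not part of the original paper) assembles the stacked quantities of Example 2 with numpy/scipy; the helper name stack_networks and the random inputs are illustrative assumptions, with scipy's block_diag playing the role of the direct sum ⊕.

    import numpy as np
    from scipy.linalg import block_diag

    def stack_networks(W_list, Xtilde_list):
        # W = direct sum of the W_r's: block-diagonal interaction matrix
        W = block_diag(*W_list)
        # network fixed-effect dummies: direct sum of the iota_{m_r}'s
        D = block_diag(*[np.ones((Wr.shape[0], 1)) for Wr in W_list])
        Xtilde = np.vstack(Xtilde_list)   # stacks the X_r's: (sum_r m_r) x k_tilde
        return W, np.hstack([Xtilde, D])  # X = (X_tilde, direct sum of intercepts)

    # two networks of sizes 3 and 4, one regressor each (illustrative data)
    rng = np.random.default_rng(0)
    W_list = [rng.random((m, m)) * (1 - np.eye(m)) for m in (3, 4)]
    W_list = [Wr / Wr.sum(axis=1, keepdims=True) for Wr in W_list]  # row-normalize
    X_list = [rng.standard_normal((m, 1)) for m in (3, 4)]
    W, X = stack_networks(W_list, X_list)
    print(W.shape, X.shape)  # (7, 7) (7, 3): k = k_tilde + R = 1 + 2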
We now present an assumption that plays a crucial role throughout the paper.

Assumption 1. There is no real eigenvalue ω of W for which M_X(ωI_n − W) = 0.

Assumption 1 is required to rule out some pathological combinations of W and X. More precisely, we shall see in Section 4 that a failure of Assumption 1 implies a particular type of non-identifiability. A condition equivalent to M_X(ωI_n − W) = 0 is col(ωI_n − W) ⊆ col(X). That is, a pair (X, W) causes Assumption 1 to fail if and only if col(X) contains the subspace col(ωI_n − W), for some real eigenvalue ω of W. Also, note that if, for a given W, Assumption 1 is violated for some X = X_1, then it is also violated for X = (X_1, X_2), for any X_2 (such that X is full rank). It is helpful to look at two examples in which Assumption 1 fails (further examples are given in Appendix A).

Example 3. A particular case of model (2.2), which we refer to as the Group Interaction model, is when all members of a group interact homogeneously, that is, W_r = (m_r − 1)^{−1}(ι_{m_r} ι′_{m_r} − I_{m_r}) =: B_{m_r}, for r = 1, ..., R. Following Manski (1993b), this specific structure has played a central role in the literature on peer effects. We say that the Group Interaction model is balanced if all group sizes m_r are the same. In that case, letting m denote the common group size, W = I_R ⊗ B_m. It is easily verified that, for the matrix W = I_R ⊗ B_m, ω_min = −1/(m − 1) and col(ω_min I_n − W) = col(I_R ⊗ ι_m). Since I_R ⊗ ι_m is the design matrix of the group fixed effects, it follows that the balanced Group Interaction model violates Assumption 1 whenever it includes group fixed effects.
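The failure of Assumption 1 in the balanced case is easy to check numerically; the sketch below (our illustration, with R = 3 groups of size m = 4) verifies that ω_min = −1/(m − 1) and that M_X(ω_min I_n − W) = 0 when X consists of the group dummies.

    import numpy as np

    R, m = 3, 4
    n = R * m
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)   # B_m: exclusive group averaging
    W = np.kron(np.eye(R), B)                     # W = I_R (x) B_m
    X = np.kron(np.eye(R), np.ones((m, 1)))       # group fixed effects I_R (x) iota_m

    w_min = np.linalg.eigvalsh(W).min()
    print(np.isclose(w_min, -1 / (m - 1)))        # True
    M_X = np.eye(n) - X @ np.linalg.pinv(X)       # orthogonal projector onto col(X)^perp
    print(np.allclose(M_X @ (w_min * np.eye(n) - W), 0))  # True: Assumption 1 fails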
Example 4. In a complete bipartite graph the n observational units are partitioned into two groups, of sizes p and q say, with all units within a group interacting with all units in the other group, but with none in their own group. For p = 1 or q = 1 this corresponds to the graph known as a star. The adjacency matrix of a complete bipartite graph is

    A := [ 0_{p×p}  ι_p ι′_q ; ι_q ι′_p  0_{q×q} ].

The associated row-normalized interaction matrix (a row-normalized matrix is obtained by dividing each entry of a matrix by the corresponding row sum, and is therefore a row-stochastic matrix) is

    W = [ 0_{p×p}  q^{−1} ι_p ι′_q ; p^{−1} ι_q ι′_p  0_{q×q} ].   (2.3)

Alternatively, A can be rescaled by its largest eigenvalue √(pq), yielding the symmetric interaction matrix

    W = (pq)^{−1/2} A.   (2.4)

We refer to the network autoregression with interaction matrix (2.3) or (2.4) as, respectively, the row-normalized Complete Bipartite model and the symmetric Complete Bipartite model. It is easily verified that, for both (2.3) and (2.4), col(W) is spanned by the vectors (ι′_p, 0′_q)′ and (0′_p, ι′_q)′. Hence, for both the row-normalized and the symmetric Complete Bipartite models, Assumption 1 is violated (for ω = 0) if col(X) contains (ι′_p, 0′_q)′ and (0′_p, ι′_q)′. This is the case whenever X contains an intercept for each of the two groups, and also in the two following circumstances: (i) X contains an intercept and a contextual effect term Wx_1, for some x_1 ∈ R^n; (ii) X contains two contextual effect terms Wx_1 and Wx_2, for some x_1, x_2 ∈ R^n. This is because, when W is the interaction matrix of a Complete Bipartite model, Wx is in the span of (ι′_p, 0′_q)′ and (0′_p, ι′_q)′, for any x ∈ R^n.
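As a companion check (again our illustration, not part of the paper), the sketch below confirms that, for both versions of the Complete Bipartite model, Wx falls in the span of (ι′_p, 0′_q)′ and (0′_p, ι′_q)′, and that rank(X, WX) collapses to k once X contains an intercept and a contextual effect.

    import numpy as np

    p, q = 3, 5
    n = p + q
    A = np.block([[np.zeros((p, p)), np.ones((p, q))],
                  [np.ones((q, p)), np.zeros((q, q))]])
    W_row = A / A.sum(axis=1, keepdims=True)      # eq. (2.3)
    W_sym = A / np.sqrt(p * q)                    # eq. (2.4)

    u = np.r_[np.ones(p), np.zeros(q)]            # (iota_p', 0_q')'
    v = np.r_[np.zeros(p), np.ones(q)]            # (0_p', iota_q')'
    x = np.random.default_rng(1).standard_normal(n)
    for W in (W_row, W_sym):
        coef = np.linalg.lstsq(np.c_[u, v], W @ x, rcond=None)[0]
        print(np.allclose(W @ x, np.c_[u, v] @ coef))   # True: Wx in span{u, v}
        X = np.c_[np.ones(n), x, W @ x]                 # intercept + contextual effect, k = 3
        print(np.linalg.matrix_rank(np.c_[X, W @ X]))   # 3 = k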
3 Identifiability

This section studies identifiability of (λ, β) from the first two moments of y. Note that in models containing fixed effects one would often consider a transformation of y that removes the fixed effects. We do not discuss, at this stage, identifiability after removal of the fixed effects, which, depending on the specific model and the specific transformation, may be a different question—see Section 4. Instead, this section asks the more primitive question of whether all model parameters, including the fixed effects, are identifiable.

We shall use the following definitions. Consider an observable random vector z ∈ R^n with cumulative distribution function F(z; θ) depending on a parameter θ ∈ Θ ⊆ R^p. A particular value θ̃ ∈ Θ_I ⊆ Θ of θ is said to be identified (from the distribution of z) on a set Θ_I if there is no other θ̄ ∈ Θ_I such that F(z; θ̃) = F(z; θ̄) for all z ∈ R^n. If all values θ̃ ∈ Θ_I are identified on Θ_I, we say that the parameter θ is identified on Θ_I. If all values θ̃ ∈ Θ_I except for those in a µ_{R^p}-null set are identified on Θ_I, we say that the parameter θ is generically identified on Θ_I. Next, the value θ̃ ∈ Θ_I is said to be identified from a moment m(θ) of z on a set Θ_I if there is no other θ̄ ∈ Θ_I such that m(θ̃) = m(θ̄). Clearly, identification from a moment of z is sufficient but not necessary for identification from the distribution of z.

3.1 Identifiability from the first moment

When no distributional assumption other than E(ε) = 0 is imposed on model (2.1), identification can only occur via the first moment of y. To explore this case, we need to be clear about the set over which we wish to identify λ. Letting S(λ) := I_n − λW, rewrite equation (2.1) as S(λ)y = Xβ + σε. In order for y to be uniquely determined, given X and ε, it is necessary that det(S(λ)) ≠ 0, which requires λ ≠ ω^{−1} for any nonzero real eigenvalue ω of W. We refer to the set Λ_u := {λ ∈ R : det(S(λ)) ≠ 0} as the unrestricted parameter space for λ. In practice, the parameter space for λ is usually restricted much further, but, for now, it is convenient to focus on Λ_u. Of course, if λ is identified on Λ_u it is also identified on any subset of Λ_u.

Lemma 3.1 (Identifiability from first moment). In the network autoregression (2.1), (i) if rank(X, WX) > k, the parameter (λ, β) is generically identified on Λ_u × R^k; (ii) if rank(X, WX) = k, no value of the parameter (λ, β) is identified on Λ_u × R^k from E(y).

Lemma 3.1 says that the parameters λ and β are generically identified (from the first moment of y) if the matrices X and W are such that rank(X, WX) > k. Conversely, if rank(X, WX) = k, λ and β cannot be identified, and hence consistently estimated, without distributional assumptions beyond E(ε) = 0. For example, the 2SLS estimators of Kelejian and Prucha (1998) and Lee (2003), which are based on the specification of the first moment of y only, are not defined if rank(X, WX) = k, because in that case no internal instruments are available for the endogenous variable Wy.
The condition rank(X, WX) = k is trivially satisfied when k = 0 (pure model); otherwise, it is typically very strong. Indeed, for any given W, the set of (full rank) n × k matrices X such that rank(X, WX) = k is a µ_{R^{n×k}}-null set. Accordingly, Lemma 3.1 could be stated by saying that identification from the first moment of y is possible for generic parameter values (λ, β) and for generic regressor matrices X. Nevertheless, specific combinations of W and X such that rank(X, WX) = k may arise in some cases of interest, particularly in fixed effects models. Some examples worth mentioning, in which it is easily verified that rank(X, WX) = k, are as follows (a numerical check of one of these cases is given right after the list):

(a) Any network autoregression such that Assumption 1 is violated (because M_X(ωI_n − W) = 0 implies M_X WX = 0, which is equivalent to rank(X, WX) = k).

(b) Some network fixed effects models of the type in Example 2:

(b.i) A Group Interaction model with group specific slope coefficients, group fixed effects, and with at least two groups (R > 1), i.e., X = ⊕_{r=1}^R (X̃_r, ι_{m_r}), where the matrix X̃_r of regressors is m_r × k_r, with 0 ≤ k_r < m_r, so that k = R + Σ_{r=1}^R k_r.

(b.ii) A balanced Group Interaction model with contextual effects, and with at least two groups (R > 1), i.e., X = (X̃, WX̃) for some n × k̃ matrix X̃ of regressors, so that k = 2k̃. (The condition rank(X, WX) = k is also satisfied if, when k̃ = 1, an intercept is added to X, i.e., X = (ι_n, X̃, WX̃). When k̃ > 1, ι_n ∈ col(X̃, WX̃), and therefore an intercept cannot be added to (X̃, WX̃); one could, of course, replace one of the columns of (X̃, WX̃) with an intercept, and this would still give rank(X, WX) = k.)

(b.iii) A model of the form (2.2) with each W_r the symmetric or row-normalized adjacency matrix of a complete bipartite graph, with contextual effects, and with at least two groups (R > 1), i.e., X = (X̃, WX̃, ⊕_{r=1}^R ι_{m_r}) for some n × k̃ matrix X̃ of regressors, with k̃ ≥ 0, so that k = R + 2k̃.

(In case (b.i), Assumption 1 is satisfied for generic matrices X̃_1, ..., X̃_R if the model is unbalanced, and is violated if the model is balanced (see Example 3). In cases (b.ii) and (b.iii), Assumption 1 is satisfied for generic X̃.)

(c) Some models with fixed effects and no regressors (i.e., X contains only the dummies corresponding to the fixed effects):

(c.i) The one-way model of Example 1 with no regressors (i.e., X = ι_T ⊗ I_N), as, for instance, in Robinson and Rossi (2015).

(c.ii) The two-way model of Example 1 with no regressors (i.e., X = (ι_T ⊗ I_N, I_T ⊗ ι_N)) and row-stochastic W̃ (a matrix is said to be row-stochastic if all its row sums are 1).

(c.iii) The network fixed effects model (2.2) with no regressors (i.e., X = ⊕_{r=1}^R ι_{m_r}) and all matrices W_r row-stochastic. Note that, when R = 1, this reduces to an intercept-only network autoregression (2.1) with row-stochastic interaction matrix.

In cases such as those just listed, rank(X, WX) = k and therefore λ and β cannot be identified from E(y).
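For instance, case (c.iii) with R = 1 can be verified in a few lines (our sketch; any row-stochastic W will do):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    W = rng.random((n, n)) * (1 - np.eye(n))
    W /= W.sum(axis=1, keepdims=True)              # row-stochastic, so W iota_n = iota_n
    X = np.ones((n, 1))                            # intercept-only model, k = 1
    print(np.linalg.matrix_rank(np.c_[X, W @ X]))  # 1 = k: no internal instruments for Wy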
As noted earlier, however, the condition rank(X, WX) = k is very strong in general. What might be more relevant in applications is that the condition is close, in some sense, to being satisfied. In such a situation, it is natural to expect that identification from the first moment will be weak. We confirm this with a small simulation experiment. We generate 10,000 replications from model (2.1) with W a row-normalized 2-ahead-2-behind interaction matrix (before row-standardization, this is a matrix with all entries in the two diagonals above and the two diagonals below the main diagonal equal to one, and zero everywhere else), and a single regressor equal to ι_n + bz, where b ∈ R and z ∼ N(0, I_n), with z being generated once and then kept fixed across replications. We set β = 1, σ = 1, and draw the errors independently from either a standard normal distribution or a gamma distribution with shape parameter 1 and scale parameter 1, demeaned by their mean. In this design, λ and β cannot be identified from the first moment if b = 0, because in that case rank(X, WX) = k = 1. Thus, we expect any estimator of λ and β that relies entirely on the specification of the first moment of y to perform poorly if b is close to 0. For illustration, we consider the 2SLS estimator with instruments WX and W²X for Wy (Kelejian and Prucha, 1998), and we compare it with the quasi maximum likelihood estimator (QMLE), which also uses the second moment (the QMLE is the MLE that maximizes the likelihood obtained when ε ∼ N(0, I_n); see Section 5). Table 1 displays the root median square error of the 2SLS and (Q)ML estimators of λ and β. The root median square error is reported rather than the more usual root mean square error because, in the setting we are considering, the variance of the 2SLS estimator does not exist (see Roberts, 1995, Section 7.2.2). For both λ and β, and for both the normal and the gamma distributions, the performance of the 2SLS estimator is good, compared to the (Q)MLE benchmark, when b = 1, but deteriorates rapidly as b gets smaller. Such a deterioration is due to both the bias and the dispersion of the 2SLS estimator growing large as b decreases, for any n. A sketch of the data-generating process and of the 2SLS step is given below the table.

Table 1: Root median square error of the 2SLS and (Q)ML estimators of λ and β, for two sample sizes n.

                        Normal                          Gamma
                 λ               β               λ               β
   n     b    2SLS    QML     2SLS    QML     2SLS    QML     2SLS    QML
   …     1    0.080   0.061   0.067   0.059   0.081   0.062   0.066   …
   …     0.1  0.598   0.095   0.593   0.116   0.593   0.095   0.582   …
   …     0.01 1.658   0.096   1.585   0.118   1.618   0.096   1.598   …
   …     1    0.024   0.019   0.020   0.018   0.024   0.019   0.020   …
   …     0.1  0.189   0.030   0.188   0.036   0.190   0.030   0.189   …
   …     0.01 1.194   0.031   1.191   0.037   1.189   0.030   1.189   …
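The following sketch is a minimal reimplementation of the design under stated assumptions: the circular boundary of the 2-ahead-2-behind matrix, the true value λ = 0.4, and the helper names are ours, as the paper does not report them.

    import numpy as np

    def two_ahead_two_behind(n):
        # circulant 2-ahead-2-behind contiguity, then row-normalized
        W = np.zeros((n, n))
        for i in range(n):
            for s in (-2, -1, 1, 2):
                W[i, (i + s) % n] = 1.0
        return W / W.sum(axis=1, keepdims=True)

    def tsls(y, X, W, Z):
        # 2SLS with regressors (Wy, X) and instruments Z = (X, WX, W^2 X)
        R_ = np.c_[W @ y, X]
        Rhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ R_)
        return np.linalg.solve(Rhat.T @ R_, Rhat.T @ y)  # (lambda_hat, beta_hat)

    rng = np.random.default_rng(3)
    n, b, beta, sigma, lam = 50, 0.1, 1.0, 1.0, 0.4      # lam is an illustrative value
    W = two_ahead_two_behind(n)
    X = (np.ones(n) + b * rng.standard_normal(n))[:, None]  # iota_n + b z
    Z = np.c_[X, W @ X, W @ (W @ X)]
    eps = rng.standard_normal(n)                          # or demeaned gamma draws
    y = np.linalg.solve(np.eye(n) - lam * W, X @ np.array([beta]) + sigma * eps)
    print(tsls(y, X, W, Z))   # increasingly unstable as b -> 0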
In the simulation experiment, b can be interpreted as a measure of the distance from non-identifiability via the first moment. In more complex situations, one could construct a measure of distance by observing that, since rank(X) = k, rank(X, WX) = k is equivalent to col(WX) ⊆ col(X) (i.e., in matrix theoretic language, to col(X) being an invariant subspace of W) or, which is the same, to M_X WX = 0. A distance from the condition rank(X, WX) = k could then be provided by some norm of the matrix M_X WX, as in the sketch below. We do not intend to study this rigorously here, but such a measure should help model users to avoid not only the cases in which inference based on the first moment is impossible (the norm of M_X WX is zero), but also the cases close to these (the norm of M_X WX is close to zero), in which inference is likely to be very challenging without additional distributional assumptions.
It is useful to briefly compare Lemma 3.1 with some related results available in the literature, obtained by two different approaches. First, Lee (2004) studies asymptotic properties of the quasi maximum likelihood estimator based on the Gaussian distribution. The condition rank(X, WX) > k appearing in Lemma 3.1 can be interpreted as the finite sample equivalent of Assumption 8 in Lee (2004). Indeed, under the latter assumption (and other regularity assumptions) the limit of the Gaussian quasi-likelihood has a unique maximum at the true value of the parameters, which is sufficient (and necessary under correct specification of the likelihood) for identification; see Newey and McFadden (1994). Second, in the social network literature, identification of the structural parameters in model (2.1) is typically established by checking that those parameters can be uniquely recovered from the reduced form parameters (e.g., Bramoullé et al., 2009; Blume et al., 2011; Kwok, 2019). Such a strategy obviously relies on the reduced form parameters being identified, which, in the case of a fixed W, would typically require repeated observations of the cross-section, over time or some other dimension. Because of this, identification via reduced form parameters may not be appropriate in applications where a single observation of a network is available. Lemma 3.1 can establish identifiability not only when repeated observations are available (in which case W is block diagonal with identical blocks, as in Example 1), but also when a single observation of the network is available. The following example shows that it is possible that parameters are identified with repeated observations, but not with a single observation.

Example 5. Consider a row-normalized or symmetric Complete Bipartite model with X = (ι_n, x, Wx), for some x ∈ R^n (such that X is full rank). Since the matrices I_n, W, W² are linearly independent, Proposition 1 in Bramoullé et al. (2009) implies that λ and β are identified from an i.i.d. sample of observations from the model. However, as noted in Example 4, Assumption 1 fails, and therefore rank(X, WX) = k. Thus, according to Lemma 3.1, λ and β cannot be identified from a single observation of the model, whatever the value of x.

The applicability to the case of a single observation of a network is the most important difference between Lemma 3.1 and the approach in Bramoullé et al. (2009). With repeated observations, Lemma 3.1 yields results that are similar to those in Bramoullé et al. (2009), but with two less important differences. Firstly, Lemma 3.1 does not restrict attention to the case when X contains contextual effects; our results can be used for that case, but also for the case when no contextual effects are included, or only some contextual effects are included. Secondly, Bramoullé et al. (2009) assume that X is random with E(ε | X) = 0, whereas, for the reasons mentioned in the Introduction, X is nonrandom in Lemma 3.1.

Indeed, identification under repeated observations for the Complete Bipartite model, which is established via Proposition 1 in Bramoullé et al. (2009) in Example 5, can also be established by Lemma 3.1. To see this, note that R observations of the row-normalized or symmetric Complete Bipartite model with X = (ι_n, x, Wx) correspond to a network autoregression with interaction matrix W* = I_R ⊗ W and regressor matrix X* = (ι_{nR}, x*, W*x*) for some x* ∈ R^{nR}. Then one can see that rank(X*, W*X*) > k if and only if R > 1. That is, Lemma 3.1 establishes that identification is achieved if and only if R > 1.
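The rank computation in the preceding paragraph is easy to reproduce (our sketch, with illustrative sizes p = 2, q = 3):

    import numpy as np

    p, q = 2, 3
    n = p + q
    A = np.block([[np.zeros((p, p)), np.ones((p, q))],
                  [np.ones((q, p)), np.zeros((q, q))]])
    W = A / A.sum(axis=1, keepdims=True)               # row-normalized bipartite W
    rng = np.random.default_rng(5)
    for R in (1, 2):
        Ws = np.kron(np.eye(R), W)                     # R stacked observations
        xs = rng.standard_normal(R * n)
        Xs = np.c_[np.ones(R * n), xs, Ws @ xs]        # X* = (iota, x*, W* x*), k = 3
        print(R, np.linalg.matrix_rank(np.c_[Xs, Ws @ Xs]))  # 3 when R = 1, 4 when R = 2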
3.2 Identifiability from the second moment

So far, we have considered identifiability from the first moment of y, under the restriction E(ε) = 0. When identification from the first moment fails, identification may be achieved by imposing further restrictions on the model. The simplest of such restrictions is var(ε) = I_n, in which case identification can occur via the second moment of y. (If X and W were random, the restriction would be imposed on var(ε | W, X), rather than on var(ε).) To see this, it is convenient to focus on a parameter space for λ that is smaller than Λ_u. Consider the case when W has at least one (real) negative eigenvalue and at least one (real) positive eigenvalue. This is typically satisfied in both applications and theoretical studies. (While not needed for Lemma 3.1, this restriction rules out the case when W is a scalar multiple of I_n, which trivially leads to non-identification in Lemma 3.1.) Denote the smallest real eigenvalue of W by ω_min, and, without loss of generality, normalize the largest real eigenvalue to 1. The parameter space for λ is often restricted to the largest interval containing the origin in which S(λ) is nonsingular, that is, Λ := (ω_min^{−1}, 1), or a subset thereof (possibly independent of n) such as (−1, 1), outside which λ is difficult to interpret.

Lemma 3.2 (Identifiability from second moment). Consider a network autoregression (2.1) with var(ε) = I_n, and assume that W has at least one negative eigenvalue and at least one positive eigenvalue. The parameter (λ, σ²) is identified on Λ × (0, ∞).
Of course, once λ is identified, β can be identified from the first moment E(y) = (I_n − λW)^{−1}Xβ, for any W and any (full rank) X. Lemma 3.2 complements two results available in the literature that are concerned with identifiability from var(y) on a different parameter space for λ. (See also Theorem 3.2 in Davezies et al. (2009). Conditions for (λ, σ²) to be identified from var(y) can be seen as finite sample counterparts of Assumption 9 in Lee (2004) (cf. Section 3.2).) Firstly, Lemma 3.2 is an extension of Lemma 4.2 in Preinerstorfer and Pötscher (2017), which establishes identification of (λ, σ²) on (0, 1) × (0, ∞). Secondly, Lemma 4 in Lee and Yu (2016) says that a sufficient condition for (λ, σ²) to be identified from var(y) on Λ_u × (0, ∞) is that the matrices I_n, W + W′ and W′W are linearly independent. The following example considers a case when identification cannot be established by Lemma 4 in Lee and Yu (2016), but can be by Lemma 3.2.

Example 6. Consider a balanced Group Interaction model (see Example 3) with var(ε) = I_n. Identification of (λ, σ²) from var(y) on Λ_u × (0, ∞) cannot be established via Lemma 4 in Lee and Yu (2016), because the matrices I_n, W + W′ and W′W are linearly dependent when W = I_R ⊗ B_m; in fact, (λ, σ²) is not identified on Λ_u × (0, ∞). More precisely, for the variance matrix σ²(S′(λ)S(λ))^{−1} of the balanced Group Interaction model we have σ₁²(S′(λ₁)S(λ₁))^{−1} = σ₂²(S′(λ₂)S(λ₂))^{−1} if and only if σ₂² = m²σ₁²/(2λ₁ + m − 2)² and λ₂ = −((m − 2)λ₁ + 2(1 − m))/(2λ₁ + m − 2); note that λ₂ ∉ Λ if λ₁ ∈ Λ. However, Lemma 3.2 asserts that (λ, σ²) is identified (from var(y)) on Λ × (0, ∞) (and hence on any subset thereof).

It should be noted that the restriction var(ε) = I_n is imposed only for simplicity, and is by no means crucial for identification from var(y). Indeed, one could assume some parametric structure for var(ε), say var(ε) = Σ(η), and study identifiability of the parameter (λ, σ², η) from var(y) = σ²(I_n − λW)^{−1}Σ(η)(I_n − λW′)^{−1}, but we refrain from doing this here.

At this point, it is worth considering the network (or spatial) error model

    y = Xβ + u,   u = λWu + σε,   (3.1)

even though this specification is considerably less popular than model (2.1) in economic applications. The same set of assumptions as in the paragraph after equation (2.1) will be maintained for model (3.1). Lemma 3.2 also applies to the network error model, because equations (2.1) and (3.1) imply the same variance structure for y. On the other hand, in the network error model λ obviously cannot be identified from the first moment Xβ of y. In fact, the result in Lemma 3.1 can be interpreted as saying that λ and β cannot be identified from E(y) in a network autoregression that behaves like a network error model.
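Before making this point precise, we note that the explicit formulas in Example 6 can be confirmed numerically (our sketch; lam1 and sig2_1 are arbitrary test values):

    import numpy as np

    R, m = 2, 4
    n = R * m
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    W = np.kron(np.eye(R), B)
    S = lambda lam: np.eye(n) - lam * W

    lam1, sig2_1 = 0.5, 1.3
    lam2 = -((m - 2) * lam1 + 2 * (1 - m)) / (2 * lam1 + m - 2)
    sig2_2 = m**2 * sig2_1 / (2 * lam1 + m - 2)**2
    V = lambda lam, s2: s2 * np.linalg.inv(S(lam).T @ S(lam))
    print(lam2)                                           # about 1.67: outside Lambda = (-3, 1)
    print(np.allclose(V(lam1, sig2_1), V(lam2, sig2_2)))  # True: same var(y), different parameters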
This point is made precise by the following argument. If rank(X, WX) = k, there exists a unique k × k matrix A such that WX = XA, and hence S^{−1}(λ)X = X(I_k − λA)^{−1}, for any λ such that S(λ) is invertible. (It is easily seen that the eigenvalues of A are eigenvalues of W. Hence, I_k − λA is invertible if S(λ) is.) It follows that, when rank(X, WX) = k, the network autoregression y = S^{−1}(λ)Xβ + σS^{−1}(λ)ε can be written as y = X(I_k − λA)^{−1}β + σS^{−1}(λ)ε, which is a network error model with regression coefficients (I_k − λA)^{−1}β. (According to Lemma C.1 in Appendix C, the model y = X(I_k − λA)^{−1}β + σS^{−1}(λ)ε has the same profile quasi log-likelihood l(λ, σ²) as model (3.1), even though, clearly, the MLE of β will be different in the two models.)

4 Identifiability after imposition of invariance

This section discusses the full identifiability content of Assumption 1. We already know from Section 3.1 that, in the network autoregression, a failure of Assumption 1 precludes identification from the first moment of y, but not from the higher order moments of y. We are now going to show that Assumption 1 is necessary for identification from statistics that are invariant under certain transformations. Similar results to those in this section are obtained in Preinerstorfer and Pötscher (2017) for a general regression model with correlated errors and for the particular case of a network error model with arbitrary W.

We will need some notions of group invariance (e.g., Lehmann and Romano, 2005, Chapter 6). Let G be a group of transformations from the sample space into itself. A statistic is said to be invariant under G (or G-invariant) if it is constant on the orbits of G. It is said to be a maximal invariant under G if it is invariant and takes different values on each orbit. A necessary and sufficient condition for a statistic to be invariant under G is that it depends on the data only through a maximal invariant under G. Lastly, a family of distributions {P_θ, θ ∈ Θ}, where Θ is the parameter space, is said to be invariant under G if every pair g ∈ G, θ ∈ Θ determines a unique element in Θ, denoted by ḡθ, such that when y has distribution P_θ, gy has distribution P_{ḡθ}.

In order to apply the theory of invariance, in this section the network autoregression (2.1) and the network error model (3.1) are regarded as families of distributions {P_θ, θ ∈ Θ} for y, where θ := (λ, β, σ², η), with η being a parameter indexing the distribution of ε, and θ is assumed to be identified (from the distribution of y). For a given regressor matrix X, we will consider the group G_X := {g_{κ,δ} : κ > 0, δ ∈ R^k}, where g_{κ,δ} denotes the transformation y → κy + Xδ, and its subgroup G⁰_X := {g_{1,δ} : δ ∈ R^k}. A maximal invariant under G⁰_X is C_X y, where C_X is an (n − k) × n matrix such that C_X C′_X = I_{n−k} and C′_X C_X = M_X, and a maximal invariant under G_X is v := C_X y/‖C_X y‖ (with the convention that v = 0 if C_X y = 0). We also say that an expectation µ_y(θ) is G-invariant if every pair g ∈ G, θ ∈ Θ determines a unique ḡθ such that µ_{gy}(θ) = µ_y(ḡθ). The non-identifiability result in Lemma 3.1(ii) can be seen as a consequence of the fact that, when rank(X, WX) = k, the mean µ_y(λ, β) := S^{−1}(λ)Xβ is G⁰_X-invariant. (If rank(X, WX) = k then, with A as above, µ_{g_{1,δ}y}(λ, β) = µ_y(ḡ(λ, β)), with ḡ(λ, β) = (λ, β + (I_k − λA)δ).) This type of invariance implies that, when rank(X, WX) = k, the mean can only identify a k-dimensional parameter, not the (k + 1)-dimensional parameter (λ, β). Under the same rank restriction, and with an additional assumption that we now state, the network autoregression (not its mean only) is invariant under G⁰_X, and in fact under G_X.
Assumption 2. The distribution of ε does not depend on the parameters λ, β, and σ².

Let P_θ now denote the distribution of y in the network error model, with θ := (λ, β, σ², η). Under Assumption 2, the network error model is G_X-invariant, because gy has distribution P_{ḡθ}, with ḡθ = (λ, κβ + δ, κ²σ², η), for any g ∈ G_X. For the network autoregression we have the following result.

Lemma 4.1. Suppose Assumption 2 holds. The network autoregression (2.1) is G_X-invariant if and only if rank(X, WX) = k.

We are now in a position to discuss the implications of Assumption 1. The "principle of invariance" asserts that inference in a model should be invariant under any group of transformations under which the model is invariant. Accordingly, under Assumption 2, inference in a network error model should be based on G_X-invariant procedures, whatever X and W are, and inference in a network autoregression should be based on G_X-invariant procedures whenever rank(X, WX) = k, which, as we have seen in Section 3.1, is the case if Assumption 1 is violated. However, the imposition of G_X-invariance causes an identifiability issue when Assumption 1 fails. To see this, observe that if Assumption 1 fails then C_X S(λ) = (1 − λω)C_X, and therefore premultiplying both sides of the network autoregression equation S(λ)y = Xβ + σε by C_X yields

    C_X y = σ/(1 − λω) C_X ε.   (4.1)

Equation (4.1) shows that, when Assumption 2 is satisfied but Assumption 1 is not, (λ, β, σ²) cannot be identified from the distribution of C_X y and hence, since C_X y is a maximal invariant under G⁰_X, cannot be identified from the distribution of any G⁰_X-invariant statistic. Exactly the same conclusion obtains starting from the network error model y = Xβ + σS^{−1}(λ)ε. The result is particularly perverse for the network autoregression: when Assumption 1 fails, and under Assumption 2, the model is G_X-invariant, and yet its parameters cannot be identified from any G_X-invariant statistic.

It is possible to be more precise about the cause of non-identification. Suppose Assumption 1 is violated for some eigenvalue ω of W, and let g_ω be the geometric multiplicity of ω. (Note that, for fixed W and X, the condition M_X(ωI_n − W) = 0 that leads to a violation of Assumption 1 can be satisfied by at most one eigenvalue ω. This is because M_X(ω₁I_n − W) = 0 = M_X(ω₂I_n − W) implies ω₁ = ω₂. Also, note that M_X(ωI_n − W) = 0 implies that ω is real.) Recall from Section 2 that a pair (X, W) causes Assumption 1 to fail if and only if some of the columns of X span the subspace col(ωI_n − W). Observe that this requires k ≥ n − g_ω, because the dimension of col(ωI_n − W) is rank(ωI_n − W) = n − nullity(ωI_n − W) = n − g_ω. Let X_ω be the n × (n − g_ω) matrix containing the columns of X that span col(ωI_n − W), and reorder the columns of X as in X = (X_ω, X_*), where X_* is n × (k − (n − g_ω)), with k − (n − g_ω) ≥ 0. Generalizing the argument leading to equation (4.1), if Assumption 1 fails then C_{X_ω}S(λ) = (1 − λω)C_{X_ω}, and therefore

    C_{X_ω} y = 1/(1 − λω) C_{X_ω} X_* β_* + σ/(1 − λω) C_{X_ω} ε,   (4.2)

where β_* is the component of β corresponding to X_*. This shows that, under Assumption 2, (λ, β, σ²) cannot be identified from the distribution of C_{X_ω}y if Assumption 1 fails. That is, what really causes non-identification when Assumption 1 fails is the imposition of invariance with respect to the subgroup G⁰_{X_ω} of G⁰_X, and what we said above about G⁰_X-invariant statistics applies to the (larger) set of G⁰_{X_ω}-invariant statistics. We summarize this result in the following theorem, and then provide an example.

Theorem 1. Suppose that, in the network autoregression (2.1) or in the network error model (3.1), Assumption 2 is satisfied, but Assumption 1 fails for some eigenvalue ω of W. Then (λ, β, σ²) cannot be identified from the distribution of any G⁰_{X_ω}-invariant statistic.

Theorem 1 says that, for any W, there are matrices of regressors that make invariant inference impossible—these are the matrices leading to a violation of Assumption 1, that is, the matrices whose column space contains one of the subspaces col(ωI_n − W), where ω is an eigenvalue of W. It is worth emphasizing that this result does not require any distributional assumption other than Assumption 2.

Example 7. Consider a balanced Group Interaction model with group fixed effects. We have seen in Example 3 that in this model Assumption 1 fails, because the columns of the fixed effects matrix I_R ⊗ ι_m span col(ω_min I_n − W) (i.e., in the notation introduced just before equation (4.2), X_{ω_min} = I_R ⊗ ι_m). Theorem 1 therefore implies that, under Assumption 2, (λ, β, σ²) cannot be identified from any statistic that is invariant under G⁰_{I_R ⊗ ι_m}, even though the model is invariant under that group.

Since C_X X = 0, imposing invariance with respect to G⁰_{I_R ⊗ ι_m} removes the group fixed effects. (For the use of invariance arguments to solve incidental parameter problems, see also Chamberlain and Moreira (2009).) Thus, Example 7 can be seen as a revisitation of the well-known identification failure that occurs in a balanced Group Interaction model upon removal of the group fixed effects (Lee, 2007).

To conclude this section, it is useful to make a connection with the results obtained in Section 3. Recall that the parameters of a network autoregression can generally be identified by specifying the variance structure of ε, regardless of whether Assumption 1 holds; for example, this is certainly the case if var(ε) = I_n, by Lemma 3.2. According to Theorem 1, however, any result establishing identification from the distribution of y cannot be helpful for invariant inference if Assumption 1 is not satisfied, because in that case identification is lost after imposition of invariance with respect to the group G⁰_{X_ω}. (Consider the model in Example 7. Due to the failure of Assumption 1, any result establishing identification from the distribution of y cannot help to achieve reasonable inference in that model. This is so, for example, for Proposition 2 in de Paula (2017), which establishes identification from the variance of y for the particular case R = 1, when |λ| < 1. Inference based on such a result cannot respect the invariance properties of the model, because the model is invariant under the group G⁰_{ι_n} of transformations y → y + αι_n, α ∈ R, but identification is lost on imposition of invariance under G⁰_{ι_n}.) The next section considers this point from a likelihood perspective.
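The mechanics of Example 7 are easy to verify numerically; the sketch below (ours) builds C_X from an orthonormal basis of col(X)⊥ and checks the identity C_X S(λ) = (1 − λω_min)C_X behind equation (4.1).

    import numpy as np
    from scipy.linalg import null_space

    R, m = 3, 4
    n = R * m
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    W = np.kron(np.eye(R), B)
    X = np.kron(np.eye(R), np.ones((m, 1)))       # group fixed effects
    C = null_space(X.T).T                          # C_X: C C' = I_{n-k}, C'C = M_X
    w_min = -1 / (m - 1)
    for lam in (0.2, 0.5, 0.8):
        S = np.eye(n) - lam * W
        print(np.allclose(C @ S, (1 - lam * w_min) * C))   # True for every lambda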
5 Implications for likelihood inference

We now study the consequences of Theorem 1 for likelihood estimation of the network autoregression. The MLE that is typically used for a network autoregression is the one based on the likelihood that would obtain if ε were distributed as N(0, I_n), often referred to simply as the QMLE (quasi MLE); see, e.g., Lee (2004). It will also be useful to consider the adjusted QMLE, which is obtained from the QMLE by centering the profile score for (λ, σ²) (see Yu et al., 2015). For estimation of (λ, σ²), the adjusted QMLE usually performs better than the unadjusted QMLE when the dimension of β is large with respect to the sample size n (including in fixed effects models, in which case the dimension of β is increasing with n).

Let l(λ, β, σ²) denote the Gaussian quasi log-likelihood for (λ, β, σ²) in a network autoregression or in a network error model, l(λ) the corresponding profile likelihood for λ, and l_a(λ) the adjusted profile likelihood for λ. The precise definitions of these likelihoods are given in Appendix B. We say that a parameter θ is identified on a set Θ_I from a quasi likelihood L(θ) if it is identified on Θ_I from the distribution underlying L(θ) (so, if θ is identified on Θ_I from the quasi likelihood L(θ), then L(θ̃) = L(θ̄) for almost all y ∈ R^n implies θ̃ = θ̄, for any θ̃, θ̄ ∈ Θ_I). Clearly, Lemma 3.2 is sufficient to guarantee identification of (λ, β, σ²) on Λ × R^k × (0, ∞) from l(λ, β, σ²) for any pair X, W, including those pairs such that Assumption 1 is violated. However, a violation of Assumption 1 makes inference based on l(λ, β, σ²) pointless, in the following sense.

Proposition 5.1.
Consider the network autoregression (2.1) or the network error model (3.1). If Assumption 1 is violated, then, for any λ such that det(S(λ)) ≠ 0, and for any y ∉ col(X): (i) the profile score associated with the profile log-likelihood l(λ) does not depend on y; (ii) the adjusted profile log-likelihood function l_a(λ) is flat.

In other words, when Assumption 1 fails, a maximizer of l(λ) (over Λ or any other subset of R), if it exists, is non-random, and l_a(λ) does not contain any identifying information about λ.

Part (ii) of Proposition 5.1 can be linked back to the invariance results of Section 4. By standard arguments (available for instance in Rahman and King, 1997), l_a(λ) corresponds to the density of the maximal invariant v := C_X y/‖C_X y‖ under G_X, for any network autoregression model violating Assumption 1 and for any network error model. Then, the flatness of l_a(λ) can be understood in terms of the distribution of v being free of λ if the distribution of ε is free of λ, which follows from equation (4.1). (It is easily verified that the maximal invariant induced by G_X on the parameter space is λ (it would be (λ, η) in the presence of a parameter η in the distribution of ε). This may seem to contradict one of the fundamental results on invariance, which is usually stated by saying that the distribution of an invariant statistic depends only on a maximal invariant induced on the parameter space (e.g., Lehmann and Romano, 2005, Theorem 6.3.2). The apparent contradiction is due to the non-identification caused by the violation of Assumption 1.) A numerical illustration of Proposition 5.1 is given below.
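A minimal illustration (our sketch, for the balanced Group Interaction model with group dummies of Example 7): the profile log-likelihood differences l(λ′) − l(λ″) are identical across independent draws of y, so the maximizer cannot depend on the data.

    import numpy as np

    R, m = 3, 4
    n = R * m
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    W = np.kron(np.eye(R), B)
    X = np.kron(np.eye(R), np.ones((m, 1)))
    M_X = np.eye(n) - X @ np.linalg.pinv(X)

    def l_profile(lam, y):                       # equation (B.2)
        S = np.eye(n) - lam * W
        s2 = (S @ y) @ M_X @ (S @ y) / n
        return np.linalg.slogdet(S)[1] - n / 2 * np.log(s2)

    rng = np.random.default_rng(6)
    diffs = [l_profile(0.5, y) - l_profile(0.2, y) for y in rng.standard_normal((2, n))]
    print(np.isclose(diffs[0], diffs[1]))        # True: the shape of l(.) is non-random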
6 Conclusion

We have studied identification of an autoregression defined on a general network, under weak distributional assumptions and without requiring repeated observations of the network. In this context, identification is possible for generic parameter values and for generic regressor matrices, whatever the network. However, important cases do exist when identification fails, either in the original sample space or after some transformation (this could be, for instance, a transformation aimed at removing fixed effects). We have shown that in the latter case it is impossible to conduct inference that respects the invariance properties of the model, regardless of whether the parameters are identified from the second moment of the outcome variable.

It should be emphasized that our results have been derived under the assumption that the network is fully known and exogenous, which may be unrealistic in many applications. The study of identification when the network is (partially) unknown and/or endogenous remains a key challenge in the literature (e.g., Blume et al., 2015; de Paula et al., 2020; Lewbel et al., 2019), and we hope that the results obtained in this paper can prove useful in that setting too.
Appendix A Further examples when Assumption 1 fails
Further to Examples 3 and 4, two other simple models in which Assumption 1 fails are as follows.
Example 8. Consider the modification of Example 3 in which exclusive averaging is replaced by inclusive averaging, meaning that each unit interacts not only with all other units in a group but also with itself. If there are R groups, each of size m_r, the interaction matrix is W = ⊕_{r=1}^R (1/m_r) ι_{m_r} ι′_{m_r}. Since col(⊕_{r=1}^R (1/m_r) ι_{m_r} ι′_{m_r}) = col(⊕_{r=1}^R ι_{m_r}), Assumption 1 is violated (at ω = 0) whenever X contains group intercepts. Note that in this case, contrary to the case of exclusive averaging, Assumption 1 fails regardless of whether the model is balanced or not.

Example 9. Example 4 generalizes immediately to complete R-partite graphs, with R ≥ 2 (a complete R-partite graph is a graph in which the n observational units can be divided into R partitions, with all units in a partition interacting with all units in other partitions, but with none in their own partition). In that case, Assumption 1 is violated (at ω = 0) whenever X contains an intercept for each of the R partitions.

Examples 8 and 9 share important similarities, due to the fact that the graphs underlying the two models are complements of each other, in the graph theoretic sense. Indeed, for both models, the condition col(⊕_{r=1}^R ι_{m_r}) ⊆ col(X) leading to a failure of Assumption 1 is also satisfied if: (i) X contains an intercept and R − 1 contextual effect terms Wx_i, for some x_1, ..., x_{R−1} ∈ R^n; (ii) X contains R contextual effect terms Wx_1, ..., Wx_R, for some x_1, ..., x_R ∈ R^n. (In order to be full rank, X can contain at most R − 1 contextual effect terms in case (i), and at most R contextual effect terms otherwise.)

Appendix B The QMLE and the adjusted QMLE
Omitting additive constants, the quasi log-likelihood for ε ∼ N(0, I_n) in the network autoregression (2.1) is

    l(λ, β, σ²) := −(n/2) log(σ²) + log|det(S(λ))| − (1/(2σ²))(S(λ)y − Xβ)′(S(λ)y − Xβ),   (B.1)

for any λ such that S(λ) is nonsingular. To avoid tedious repetitions, we often omit the "quasi-" in front of "log-likelihood". The QMLE in most common use maximizes l(λ, β, σ²) under the condition that λ is in Λ (or in a subset thereof). (This assumes that Λ is well defined. If W did not have a negative (resp., positive) eigenvalue, then the left (resp., right) extreme of Λ could be taken to be −∞ (resp., +∞).) That is, the QMLE of (λ, β, σ²) is

    (λ̂_ML, β̂_ML, σ̂²_ML) := argmax_{β ∈ R^k, σ² > 0, λ ∈ Λ} l(λ, β, σ²).

Maximization with respect to β and σ² gives β̂_ML(λ) := (X′X)^{−1}X′S(λ)y and σ̂²_ML(λ) := (1/n) y′S′(λ)M_X S(λ)y. Thus, λ̂_ML can be conveniently computed by maximizing over Λ the profile likelihood for λ,

    l(λ) := l(λ, β̂_ML(λ), σ̂²_ML(λ)) = −(n/2) log(σ̂²_ML(λ)) + log|det(S(λ))|,   (B.2)

where additive constants have again been omitted.

When the dimension of β is large compared to the sample size, the QMLE of (λ, σ²) may perform poorly. To tackle this problem, the QMLE of (λ, σ²) can be adjusted by recentering the profile score s(λ, σ²) associated to the profile log-likelihood for (λ, σ²), l(λ, σ²) := l(λ, β̂_ML(λ), σ²). Under the assumptions E(ε) = 0 and var(ε) = I_n, E(s(λ, σ²)) is available analytically and does not depend on the nuisance parameter β. Thus, calculation of the adjusted profile score s_a(λ, σ²) := s(λ, σ²) − E(s(λ, σ²)) is straightforward. Given s_a(λ, σ²), one can define the adjusted likelihood l_a(λ, σ²) as the function with gradient equal to s_a(λ, σ²), and hence the adjusted QMLE (λ̂_aML, σ̂²_aML) as the maximizer of l_a(λ, σ²). Also, letting σ̂²_a(λ) be the adjusted QMLE of σ² for given λ, we define the adjusted likelihood for λ only as l_a(λ) := l_a(λ, σ̂²_a(λ)). See Yu et al. (2015) for details on these constructions.
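A direct implementation of the profile-likelihood recipe above (our sketch; the bounded scalar optimization and the helper name are illustrative):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def qmle(y, X, W, lam_bounds):
        # maximize the profile log-likelihood (B.2) over Lambda, then recover
        # beta_hat(lambda) and sigma2_hat(lambda)
        n = len(y)
        M_X = np.eye(n) - X @ np.linalg.pinv(X)
        def neg_l(lam):
            S = np.eye(n) - lam * W
            s2 = (S @ y) @ M_X @ (S @ y) / n
            return -(np.linalg.slogdet(S)[1] - n / 2 * np.log(s2))
        lam = minimize_scalar(neg_l, bounds=lam_bounds, method="bounded").x
        S = np.eye(n) - lam * W
        beta = np.linalg.lstsq(X, S @ y, rcond=None)[0]   # (X'X)^{-1} X' S(lam) y
        s2 = (S @ y) @ M_X @ (S @ y) / n
        return lam, beta, s2

Here lam_bounds stands in for Λ; for example, (1/ω_min + ε, 1 − ε) for a small ε > 0 when the largest eigenvalue of W is normalized to 1.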
Appendix C Proofs

Lemma C.1.
The network autoregression (2.1) and the network error model (3.1) imply the same profile quasi log-likelihood function for (λ, σ²) if and only if rank(X, WX) = k.

Proof of Lemma C.1.
On concentrating the nuisance parameter β out of the likelihood (B.1), the profile quasi log-likelihood for (λ, σ²) in a network autoregression is, up to an additive constant,

    l(λ, σ²) := l(λ, β̂_ML(λ), σ²) = −(n/2) log(σ²) + log|det(S(λ))| − (1/(2σ²)) y′S′(λ)M_X S(λ)y.   (C.1)

Similarly, the profile quasi log-likelihood function for (λ, σ²) in a network error model, based again on the assumption ε ∼ N(0, I_n), is

    −(n/2) log(σ²) + log|det(S(λ))| − (1/(2σ²)) y′S′(λ)M_{S(λ)X} S(λ)y.   (C.2)

The two log-likelihood functions are the same if and only if M_{S(λ)X} = M_X for any λ such that S(λ) is invertible. But, for any λ such that S(λ) is invertible, the condition M_{S(λ)X} = M_X is equivalent to col(S(λ)X) = col(X), and hence to col(WX) ⊆ col(X), which in turn is the same as rank(X, WX) = k.

Proof of Lemma 3.1.
The parameter (λ, β) is identified on Λ_u × R^k from E(y) = S^{−1}(λ)Xβ if S^{−1}(λ̃)Xβ̃ = S^{−1}(λ̄)Xβ̄ implies (λ̃, β̃) = (λ̄, β̄), for any two values (λ̃, β̃), (λ̄, β̄) of (λ, β) in Λ_u × R^k. One immediately has that S^{−1}(λ̃)Xβ̃ = S^{−1}(λ̄)Xβ̄ if and only if

    X(β̃ − β̄) + WX(λ̃β̄ − λ̄β̃) = 0.   (C.3)

We analyze separately three (exhaustive) cases, depending on the rank of the n × 2k matrix (X, WX). Recall that X is assumed to be of full column rank.

(i) rank(X, WX) = 2k. In this case equation (C.3) is equivalent to β̃ = β̄ and λ̃β̄ = λ̄β̃, from which (λ̃, β̃) = (λ̄, β̄) if and only if β̃ = β̄ ≠ 0. That is, (λ, β) is identified on Λ_u × (R^k \ {0}) from E(y).

(ii) k < rank(X, WX) < 2k. Partition X as (X_1, X_2), where X_1 is n × k_1 and X_2 is n × k_2, with 0 < k_1 < k. The case k < rank(X, WX) < 2k may be characterized by assuming rank(X, WX_1) = k + k_1 and WX_2 = XB + WX_1C, for some k × k_2 matrix B and some k_1 × k_2 matrix C, so that rank(X, WX) = k + k_1. Replacing WX_2 with XB + WX_1C in (C.3), and letting (β′_1, β′_2) be the partition of β′ conformable with that of X, we obtain

    X(β̃ − β̄ + B(λ̃β̄_2 − λ̄β̃_2)) + WX_1(λ̃β̄_1 − λ̄β̃_1 + C(λ̃β̄_2 − λ̄β̃_2)) = 0,

which is satisfied if and only if β̃ − β̄ + B(λ̃β̄_2 − λ̄β̃_2) = 0 and λ̃β̄_1 − λ̄β̃_1 + C(λ̃β̄_2 − λ̄β̃_2) = 0. As a linear system in the unknowns λ̄ and β̄, these two equations are

    M(λ̃, β̃) (λ̄, β̄′)′ = (β̃′, 0′_{k_1})′,   (C.4)

where the matrix

    M(λ̃, β̃) := [ Bβ̃_2 , I_k − λ̃(0_{k×k_1}, B) ; β̃_1 + Cβ̃_2 , −λ̃(I_{k_1}, C) ]

is of dimension (k + k_1) × (1 + k). Now, identification of (λ, β) from E(y) is equivalent to (λ̃, β̃) being the unique solution to system (C.4), and this occurs if and only if rank(M(λ̃, β̃)) = 1 + k or, equivalently, det(M(λ̃, β̃)′M(λ̃, β̃)) ≠ 0. But det(M(λ̃, β̃)′M(λ̃, β̃)) is a polynomial in (λ̃, β̃), and hence the set of its zeros is either the whole of R^{k+1} or has zero measure with respect to µ_{R^{k+1}}. The former case is easily ruled out (e.g., M(λ̃, β̃) has rank k + 1 for (λ̃, β̃) = (0, (1′_{k_1}, 0′_{k_2})′)), which means that (λ, β) is generically identified from E(y).

(iii) rank(X, WX) = k. This happens if and only if there is a k × k matrix A such that WX = XA. In that case, equation (C.3) becomes X(β̃ − β̄ + A(λ̃β̄ − λ̄β̃)) = 0, which, since rank(X) = k, is equivalent to β̃ − β̄ + A(λ̃β̄ − λ̄β̃) = 0. Rewrite the last equality as (I_k − λ̄A)β̃ − (I_k − λ̃A)β̄ = 0. Since the eigenvalues of A are eigenvalues of W, I_k − λA is invertible for any λ ∈ Λ_u, and therefore β̃ = (I_k − λ̄A)^{−1}(I_k − λ̃A)β̄. This shows that, for any (λ̄, β̄) ∈ Λ_u × R^k, it is possible to find (λ̃, β̃) ≠ (λ̄, β̄) such that S^{−1}(λ̃)Xβ̃ = S^{−1}(λ̄)Xβ̄.

Summarizing, (λ, β) is generically identified from E(y), and hence generically identified, on Λ_u × R^k in cases (i) and (ii), and not identified from E(y) on Λ_u × R^k in case (iii).

Proof of Lemma 3.2.
This proof is similar to the proof of Lemma 4.2 in Preinerstorfer and Pötscher (2017). Under the assumption that var(ε) = I_n, var(y) = σ²(S′(λ)S(λ))^{−1}. We show that, if σ̄²S′(λ̃)S(λ̃) = σ̃²S′(λ̄)S(λ̄) for any two parameter values (λ̃, σ̃²), (λ̄, σ̄²) ∈ Λ × (0, ∞), then (λ̃, σ̃²) = (λ̄, σ̄²). The maintained assumption that W has at least one negative eigenvalue and at least one positive eigenvalue guarantees the existence of a nonzero vector f ∈ null(W − I_n) and a nonzero vector g ∈ null(W − ω_min I_n). Multiplying both sides of the equality σ̄²S′(λ̃)S(λ̃) = σ̃²S′(λ̄)S(λ̄) by f′ on the left and f on the right gives σ̄²(1 − λ̃)²f′f = σ̃²(1 − λ̄)²f′f. Since 1 − λ > 0 for λ ∈ Λ, and f′f ≠ 0, the last equality is equivalent to σ̄/σ̃ = (1 − λ̄)/(1 − λ̃). Repeating with g in place of f gives σ̄/σ̃ = (1 − λ̄ω_min)/(1 − λ̃ω_min). Thus, we must have (1 − λ̃ω_min)/(1 − λ̃) = (1 − λ̄ω_min)/(1 − λ̄). Since the function λ → (1 − λω_min)/(1 − λ) is strictly increasing on Λ, we have λ̃ = λ̄, and hence σ̃² = σ̄².

Proof of Lemma 4.1.
For any λ such that S(λ) is nonsingular, and under Assumption 2, it is clear from the reduced form y = S^{−1}(λ)Xβ + σS^{−1}(λ)ε that a network autoregression is invariant under G_X if and only if col(S^{−1}(λ)X) = col(X) or, which is the same, col(S(λ)X) = col(X). But this is all that is required because, as noted in the proof of Lemma C.1, the condition col(S(λ)X) = col(X) for any λ such that S(λ) is invertible is equivalent to rank(X, WX) = k.

Proof of Proposition 5.1.
For any λ such that rank(S(λ)) = n, and for any y ∉ null(M_X S(λ)), the profile log-likelihood l(λ) for a network autoregression is given by equation (B.2). Note that equation (B.2) holds a.s. for any fixed λ such that rank(S(λ)) = n, because null(M_X S(λ)) is a µ_{R^n}-null set when rank(S(λ)) = n (since k < n). If Assumption 1 is violated for an eigenvalue ω of W, then M_X(ωI_n − W) = 0 and hence M_X S(λ) = (1 − λω)M_X, which substituted into (B.2) gives

    l(λ) = log|det(S(λ))| − n log|1 − λω| − (n/2) log(n^{−1} y′M_X y),   (C.5)

for any y ∉ col(X). Since a violation of Assumption 1 implies rank(X, WX) = k, equation (C.5) also applies to a network error model, by Lemma C.1. Part (i) of the proposition follows on noting that the terms in (C.5) that contain λ do not contain y. Next, let s(λ) be the profile score associated with l(λ), let s_a(λ) := s(λ) − E(s(λ)) be its adjusted counterpart, and let l*_a(λ) := ∫ s_a(λ) dλ be the likelihood corresponding to s_a(λ). It can be easily verified that l_a(λ) = ((n − k)/n) l*_a(λ) (the adjusted profile likelihood l_a being defined in Appendix B). If Assumption 1 is violated then, from part (i), E(s(λ)) = s(λ), and hence s_a(λ) = 0, which in turn implies that l*_a(λ), and hence l_a(λ), is constant. This completes the proof.

References
Blume, L. E., Brock, W. A., Durlauf, S. N., Ioannides, Y. M., 2011. Identification of social interactions. Vol. 1 of Handbook of Social Economics. North-Holland, pp. 853–964.

Blume, L. E., Brock, W. A., Durlauf, S. N., Jayaraman, R., 2015. Linear social interactions models. Journal of Political Economy 123 (2), 444–496.

Bramoullé, Y., Djebbari, H., Fortin, B., 2009. Identification of peer effects through social networks. Journal of Econometrics 150 (1), 41–55.

Chamberlain, G., Moreira, M. J., 2009. Decision theory applied to a linear panel data model. Econometrica 77 (1), 107–133.

Cressie, N., 1993. Statistics for Spatial Data. Wiley, New York.

Davezies, L., D'Haultfoeuille, X., Fougère, D., 2009. Identification of peer effects using group size variation. The Econometrics Journal 12 (3), 397–413.

de Paula, Á., 2017. Econometrics of Network Models. Vol. 1 of Econometric Society Monographs. Cambridge University Press, pp. 268–323.

de Paula, Á., Rasul, I., Souza, P. C., 2020. Identifying network ties from panel data: Theory and an application to tax competition. Working paper.

Drton, M., Foygel, R., Sullivant, S., 2011. Global identifiability of linear structural equation models. Annals of Statistics 39 (2), 865–886.

Gupta, A., 2019. Estimation of spatial autoregressions with stochastic weight matrices. Econometric Theory 35 (2), 417–463.

Kelejian, H. H., Prucha, I. R., 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17 (1), 99–121.

Kwok, H. H., 2019. Identification and estimation of linear social interaction models. Journal of Econometrics 210 (2), 434–458.

Lee, L.-F., 2003. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews 22 (4), 307–335.

Lee, L.-F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72 (6), 1899–1925.

Lee, L.-F., 2007. Identification and estimation of econometric models with group interactions, contextual factors and fixed effects. Journal of Econometrics 140 (2), 333–374.

Lee, L.-F., Yu, J., 2016. Identification of spatial Durbin panel models. Journal of Applied Econometrics 31 (1), 133–162.

Lehmann, E. L., Romano, J. P., 2005. Testing Statistical Hypotheses, 3rd Edition. Springer Texts in Statistics. Springer, New York.

LeSage, J., Pace, R., 2009. Introduction to Spatial Econometrics. Chapman and Hall/CRC, New York.

Lewbel, A., Qu, X., Tang, X., 2019. Social networks with misclassified or unobserved links. Working paper.

Manski, C. F., 1993a. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies 60 (3), 531–542.

Manski, C. F., 1993b. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies 60 (3), 531–542.

Newey, W. K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R. F., McFadden, D. L. (Eds.), Handbook of Econometrics. Vol. 4. Elsevier, Ch. 36, pp. 2111–2245.

Preinerstorfer, D., Pötscher, B. M., 2017. On the power of invariant tests for hypotheses on a covariance matrix. Econometric Theory 33 (1), 1–68.

Rahman, S., King, M. L., 1997. Marginal-likelihood score-based tests of regression disturbances in the presence of nuisance parameters. Journal of Econometrics 82 (1), 81–106.

Roberts, L. A., 1995. On the existence of moments of ratios of quadratic forms. Econometric Theory 11 (4), 750–774.

Robinson, P. M., Rossi, F., 2015. Refinements in maximum likelihood inference on spatial autocorrelation in panel data. Journal of Econometrics 189 (2), 447–456.

Whittle, P., 1954. On stationary processes in the plane. Biometrika 41 (3/4), 434–449.

Yu, D., Bai, P., Ding, C., 2015. Adjusted quasi-maximum likelihood estimator for mixed regressive, spatial autoregressive model and its small sample bias. Computational Statistics and Data Analysis 87, 116–135.