Non-Identifiability in Network Autoregressions
Federico Martellosio∗
November 20, 2020

∗ School of Economics, University of Surrey, [email protected]
Abstract
We study identification in autoregressions defined on a general network. Most identification conditions that are available for these models either rely on repeated observations, are only sufficient, or require strong distributional assumptions. We derive conditions that apply even if only one observation of a network is available, are necessary and sufficient for identification, and require weak distributional assumptions. We find that the models are generically identified even without repeated observations, and analyze the combinations of the interaction matrix and the regressor matrix for which identification fails. This is done both in the original model and after certain transformations in the sample space, the latter case being important for some fixed effects specifications.
Keywords : fixed effects, invariance, networks, quasi maximum likelihood estimation.
JEL Classification : C12, C21.
1 Introduction

A simple way to model interaction on a general network is to use an autoregressive process for an outcome variable, usually conditional on covariates. Models of this type can be traced back at least to Whittle (1954), and have since proved useful in many applications, across many scientific fields. In economics, and the social sciences more generally, they are currently particularly popular in the analysis of peer effects and social networks. The models are known as simultaneous autoregressions in the statistics literature (e.g., Cressie, 1993), spatial autoregressions in the econometrics literature (e.g., LeSage and Pace, 2009), are closely related to linear-in-means models (e.g., Manski, 1993a), and have important connections to linear structural equation models (e.g., Drton et al., 2011). To emphasize their wide applicability, we refer to them as network autoregressions.

This paper analyzes which combinations of the interaction matrix W and the regressor matrix X lead to a failure of identification. To this end, we restrict attention to the case where both W and X are nonstochastic and known, as in Lee (2004). (It would be possible, alternatively, to study identifiability conditional on W and/or X, under suitable exogeneity assumptions (see, e.g., Bramoullé et al., 2009; Gupta, 2019), at the cost of some notational complexity. Allowing for endogeneity of W and/or X would instead require different methods; see Section 6.) We show that identification from the first moment is generally possible, and characterize the cases when it is impossible. We focus on one class of such cases, which is particularly relevant in fixed effects models (for example, the classical linear-in-means model with group fixed effects belongs to this class of cases). In this class, non-identifiability from the first moment is linked to the impossibility of invariant inference; that is, the parameters cannot be identified from any statistic that is invariant with respect to a certain group of transformations under which the model itself is invariant. This fundamental type of non-identifiability occurs despite the fact that the parameters may be identifiable from the second moment of the outcome variable.

Section 2 sets out the framework. Section 3 studies identifiability from the first and second moments of the outcome variable. Identifiability after imposition of invariance is discussed in Section 4, and implications for likelihood inference in Section 5. Section 6 briefly concludes. The appendices contain additional material and all proofs.
Notation. Throughout the paper, ι_n denotes the n × 1 vector of ones, M_A denotes the orthogonal projector onto col⊥(A) (M_A := I_n − A(A′A)^{−1}A′ if A has full column rank), µ_{R^n} denotes the Lebesgue measure on R^n, "a.s." stands for almost surely, with respect to µ_{R^n}, and A ⊕ B denotes the direct sum of the matrices A and B (that is, if A is n × m and B is p × q, A ⊕ B is the (n + p) × (m + q) block diagonal matrix with A as top diagonal block and B as bottom diagonal block).

2 Framework

The model of interest is the network autoregression

    y = λWy + Xβ + σε,   (2.1)

where y is the n × 1 vector of observations on the outcome variable, λ is a scalar parameter, W is an n × n interaction matrix, X is an n × k matrix of regressors with full column rank and with k ≤ n − 1, β ∈ R^k, σ is a positive scale parameter, and ε is an unobserved zero mean n × 1 random vector. Both W and X are taken to be nonstochastic and known. The entries of W are supposed to reflect the pairwise interaction between the observational units; in particular, the (i, j)-th entry of W is zero if unit j is not deemed to be a neighbor of unit i. Some of the columns of X may be spatial lags of some other columns (the spatial lag of a vector x being the vector Wx). That is, in the terminology of social networks, we allow for "contextual effects" or "exogenous spillovers".

When the index set of y has more than one dimension (e.g., individuals and time, or individuals and networks), it is often useful to include in the error term additive unobserved components relative to those dimensions. In that case, we take a fixed effects approach and treat the unobserved effects as parameters to be estimated. Accordingly, for inferential purposes, we incorporate the fixed effects into β and the corresponding dummy variables into X. Two examples of fixed effects specifications that can be nested into the general model (2.1) are given next.

Example 1. There are N individuals, followed over T time periods. Let W̃ be an N × N matrix describing the interaction between the individuals, and X̃ an NT × k̃ regressor matrix. The interaction matrix W̃ is assumed to be constant over time for simplicity. A panel data version of the network autoregression (2.1) is given by

    y_it = λ Σ_j W̃_ij y_jt + x̃′_it β̃ + u_it,

where the W̃_ij are the entries of W̃, and the x̃′_it are the rows of X̃, for i = 1, ..., N and t = 1, ..., T. The error u_it is decomposed into c_i + σε_it (one-way model) or c_i + α_t + σε_it (two-way model), where c_i and α_t are, respectively, individual and time fixed effects, and ε_it is an idiosyncratic error. Following a fixed effects approach (i.e., treating the random components c_i and α_t as parameters to be estimated), the model can be written in the notation of equation (2.1), with W = I_T ⊗ W̃ and, for the two-way model, X = (X̃, ι_T ⊗ I_N, I_T ⊗ ι_N) and β = (β̃′, c′, α′)′, where c and α are the vectors with entries c_i and α_t, respectively. (Obviously, for identification of β one column of the matrix (ι_T ⊗ I_N, I_T ⊗ ι_N) should be omitted from X, or some normalization should be imposed on the fixed effects, and no regressor should be constant over time or over individuals.)

Example 2. There are R networks, with network r having m_r individuals. The model is

    y_r = λW_r y_r + X̃_r γ + α_r ι_{m_r} + σε_r,   r = 1, ..., R,   (2.2)

where W_r is the m_r × m_r interaction matrix of network r, α_r is a network fixed effect, X̃_r is an m_r × k̃ matrix of regressors, and γ is a k̃ × 1 parameter vector. In the notation of equation (2.1), y = (y′_1, ..., y′_R)′, W = ⊕_{r=1}^R W_r, β = (γ′, α′)′, ε = (ε′_1, ..., ε′_R)′, and X = (X̃, ⊕_{r=1}^R ι_{m_r}), with X̃ := (X̃′_1, ..., X̃′_R)′.
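To fix ideas, the following sketch (ours, not part of the original paper) assembles the stacked quantities of Example 2 with numpy/scipy; the helper name stack_networks and the random inputs are illustrative assumptions, with scipy's block_diag playing the role of the direct sum ⊕.

    import numpy as np
    from scipy.linalg import block_diag

    def stack_networks(W_list, Xtilde_list):
        # W = direct sum of the W_r's: block-diagonal interaction matrix
        W = block_diag(*W_list)
        # network fixed-effect dummies: direct sum of the iota_{m_r}'s
        D = block_diag(*[np.ones((Wr.shape[0], 1)) for Wr in W_list])
        Xtilde = np.vstack(Xtilde_list)   # stacks the X_r's: (sum_r m_r) x k_tilde
        return W, np.hstack([Xtilde, D])  # X = (X_tilde, direct sum of intercepts)

    # two networks of sizes 3 and 4, one regressor each (illustrative data)
    rng = np.random.default_rng(0)
    W_list = [rng.random((m, m)) * (1 - np.eye(m)) for m in (3, 4)]
    W_list = [Wr / Wr.sum(axis=1, keepdims=True) for Wr in W_list]  # row-normalize
    X_list = [rng.standard_normal((m, 1)) for m in (3, 4)]
    W, X = stack_networks(W_list, X_list)
    print(W.shape, X.shape)  # (7, 7) (7, 3): k = k_tilde + R = 1 + 2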
We now present an assumption that plays a crucial role throughout the paper.

Assumption 1. There is no real eigenvalue ω of W for which M_X(ωI_n − W) = 0.

Assumption 1 is required to rule out some pathological combinations of W and X. More precisely, we shall see in Section 4 that a failure of Assumption 1 implies a particular type of non-identifiability. A condition equivalent to M_X(ωI_n − W) = 0 is col(ωI_n − W) ⊆ col(X). That is, a pair (X, W) causes Assumption 1 to fail if and only if col(X) contains the subspace col(ωI_n − W), for some real eigenvalue ω of W. Also, note that if, for a given W, Assumption 1 is violated for some X = X_1, then it is also violated for X = (X_1, X_2), for any X_2 (such that X is full rank). It is helpful to look at two examples in which Assumption 1 fails (further examples are given in Appendix A).

Example 3. A particular case of model (2.2), which we refer to as the Group Interaction model, is when all members of a group interact homogeneously, that is, W_r = (m_r − 1)^{−1}(ι_{m_r} ι′_{m_r} − I_{m_r}) =: B_{m_r}, for r = 1, ..., R. Following Manski (1993b), this specific structure has played a central role in the literature on peer effects. We say that the Group Interaction model is balanced if all group sizes m_r are the same. In that case, letting m denote the common group size, W = I_R ⊗ B_m. It is easily verified that, for the matrix W = I_R ⊗ B_m, ω_min = −1/(m − 1) and col(ω_min I_n − W) = col(I_R ⊗ ι_m). Since I_R ⊗ ι_m is the design matrix of the group fixed effects, it follows that the balanced Group Interaction model violates Assumption 1 whenever it includes group fixed effects.
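The failure of Assumption 1 in the balanced case is easy to check numerically; the sketch below (our illustration, with R = 3 groups of size m = 4) verifies that ω_min = −1/(m − 1) and that M_X(ω_min I_n − W) = 0 when X consists of the group dummies.

    import numpy as np

    R, m = 3, 4
    n = R * m
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)   # B_m: exclusive group averaging
    W = np.kron(np.eye(R), B)                     # W = I_R (x) B_m
    X = np.kron(np.eye(R), np.ones((m, 1)))       # group fixed effects I_R (x) iota_m

    w_min = np.linalg.eigvalsh(W).min()
    print(np.isclose(w_min, -1 / (m - 1)))        # True
    M_X = np.eye(n) - X @ np.linalg.pinv(X)       # orthogonal projector onto col(X)^perp
    print(np.allclose(M_X @ (w_min * np.eye(n) - W), 0))  # True: Assumption 1 fails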
Example 4. In a complete bipartite graph the n observational units are partitioned into two groups, of sizes p and q say, with all units within a group interacting with all units in the other group, but with none in their own group. For p = 1 or q = 1 this corresponds to the graph known as a star. The adjacency matrix of a complete bipartite graph is

    A := [ 0_{p×p}  ι_p ι′_q ; ι_q ι′_p  0_{q×q} ].

The associated row-normalized interaction matrix (a row-normalized matrix is obtained by dividing each entry of a matrix by the corresponding row sum, and is therefore a row-stochastic matrix) is

    W = [ 0_{p×p}  q^{−1} ι_p ι′_q ; p^{−1} ι_q ι′_p  0_{q×q} ].   (2.3)

Alternatively, A can be rescaled by its largest eigenvalue √(pq), yielding the symmetric interaction matrix

    W = (pq)^{−1/2} A.   (2.4)

We refer to the network autoregression with interaction matrix (2.3) or (2.4) as, respectively, the row-normalized Complete Bipartite model and the symmetric Complete Bipartite model. It is easily verified that, for both (2.3) and (2.4), col(W) is spanned by the vectors (ι′_p, 0′_q)′ and (0′_p, ι′_q)′. Hence, for both the row-normalized and the symmetric Complete Bipartite models, Assumption 1 is violated (for ω = 0) if col(X) contains (ι′_p, 0′_q)′ and (0′_p, ι′_q)′. This is the case whenever X contains an intercept for each of the two groups, and also in the two following circumstances: (i) X contains an intercept and a contextual effect term Wx_1, for some x_1 ∈ R^n; (ii) X contains two contextual effect terms Wx_1 and Wx_2, for some x_1, x_2 ∈ R^n. This is because, when W is the interaction matrix of a Complete Bipartite model, Wx is in the span of (ι′_p, 0′_q)′ and (0′_p, ι′_q)′, for any x ∈ R^n.
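As a companion check (again our illustration, not part of the paper), the sketch below confirms that, for both versions of the Complete Bipartite model, Wx falls in the span of (ι′_p, 0′_q)′ and (0′_p, ι′_q)′, and that rank(X, WX) collapses to k once X contains an intercept and a contextual effect.

    import numpy as np

    p, q = 3, 5
    n = p + q
    A = np.block([[np.zeros((p, p)), np.ones((p, q))],
                  [np.ones((q, p)), np.zeros((q, q))]])
    W_row = A / A.sum(axis=1, keepdims=True)      # eq. (2.3)
    W_sym = A / np.sqrt(p * q)                    # eq. (2.4)

    u = np.r_[np.ones(p), np.zeros(q)]            # (iota_p', 0_q')'
    v = np.r_[np.zeros(p), np.ones(q)]            # (0_p', iota_q')'
    x = np.random.default_rng(1).standard_normal(n)
    for W in (W_row, W_sym):
        coef = np.linalg.lstsq(np.c_[u, v], W @ x, rcond=None)[0]
        print(np.allclose(W @ x, np.c_[u, v] @ coef))   # True: Wx in span{u, v}
        X = np.c_[np.ones(n), x, W @ x]                 # intercept + contextual effect, k = 3
        print(np.linalg.matrix_rank(np.c_[X, W @ X]))   # 3 = k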
3 Identifiability

This section studies identifiability of (λ, β) from the first two moments of y. Note that in models containing fixed effects one would often consider a transformation of y that removes the fixed effects. We do not discuss, at this stage, identifiability after removal of the fixed effects, which, depending on the specific model and the specific transformation, may be a different question—see Section 4. Instead, this section asks the more primitive question of whether all model parameters, including the fixed effects, are identifiable.

We shall use the following definitions. Consider an observable random vector z ∈ R^n with cumulative distribution function F(z; θ) depending on a parameter θ ∈ Θ ⊆ R^p. A particular value θ̃ ∈ Θ_I ⊆ Θ of θ is said to be identified (from the distribution of z) on a set Θ_I if there is no other θ̄ ∈ Θ_I such that F(z; θ̃) = F(z; θ̄) for all z ∈ R^n. If all values θ̃ ∈ Θ_I are identified on Θ_I, we say that the parameter θ is identified on Θ_I. If all values θ̃ ∈ Θ_I except for those in a µ_{R^p}-null set are identified on Θ_I, we say that the parameter θ is generically identified on Θ_I. Next, the value θ̃ ∈ Θ_I is said to be identified from a moment m(θ) of z on a set Θ_I if there is no other θ̄ ∈ Θ_I such that m(θ̃) = m(θ̄). Clearly, identification from a moment of z is sufficient but not necessary for identification from the distribution of z.

3.1 Identifiability from the first moment

When no distributional assumption other than E(ε) = 0 is imposed on model (2.1), identification can only occur via the first moment of y. To explore this case, we need to be clear about the set over which we wish to identify λ. Letting S(λ) := I_n − λW, rewrite equation (2.1) as S(λ)y = Xβ + σε. In order for y to be uniquely determined, given X and ε, it is necessary that det(S(λ)) ≠ 0, which requires λ ≠ ω^{−1} for any nonzero real eigenvalue ω of W. We refer to the set Λ_u := {λ ∈ R : det(S(λ)) ≠ 0} as the unrestricted parameter space for λ. In practice, the parameter space for λ is usually restricted much further, but, for now, it is convenient to focus on Λ_u. Of course, if λ is identified on Λ_u it is also identified on any subset of Λ_u.

Lemma 3.1 (Identifiability from first moment). In the network autoregression (2.1), (i) if rank(X, WX) > k, the parameter (λ, β) is generically identified on Λ_u × R^k; (ii) if rank(X, WX) = k, no value of the parameter (λ, β) is identified on Λ_u × R^k from E(y).

Lemma 3.1 says that the parameters λ and β are generically identified (from the first moment of y) if the matrices X and W are such that rank(X, WX) > k. Conversely, if rank(X, WX) = k, λ and β cannot be identified, and hence consistently estimated, without distributional assumptions beyond E(ε) = 0. For example, the 2SLS estimators of Kelejian and Prucha (1998) and Lee (2003), which are based on the specification of the first moment of y only, are not defined if rank(X, WX) = k, because in that case no internal instruments are available for the endogenous variable Wy.
The condition rank(X, WX) = k is trivially satisfied when k = 0 (pure model); otherwise, it is typically very strong. Indeed, for any given W, the set of (full rank) n × k matrices X such that rank(X, WX) = k is a µ_{R^{n×k}}-null set. Accordingly, Lemma 3.1 could be stated by saying that identification from the first moment of y is possible for generic parameter values (λ, β) and for generic regressor matrices X. Nevertheless, specific combinations of W and X such that rank(X, WX) = k may arise in some cases of interest, particularly in fixed effects models. Some examples worth mentioning, in which it is easily verified that rank(X, WX) = k, are as follows (a numerical check of one of these cases is given right after the list):

(a) Any network autoregression such that Assumption 1 is violated (because M_X(ωI_n − W) = 0 implies M_X WX = 0, which is equivalent to rank(X, WX) = k).

(b) Some network fixed effects models of the type in Example 2:

(b.i) A Group Interaction model with group specific slope coefficients, group fixed effects, and with at least two groups (R > 1), i.e., X = ⊕_{r=1}^R (X̃_r, ι_{m_r}), where the matrix X̃_r of regressors is m_r × k_r, with 0 ≤ k_r < m_r, so that k = R + Σ_{r=1}^R k_r.

(b.ii) A balanced Group Interaction model with contextual effects, and with at least two groups (R > 1), i.e., X = (X̃, WX̃) for some n × k̃ matrix X̃ of regressors, so that k = 2k̃. (The condition rank(X, WX) = k is also satisfied if, when k̃ = 1, an intercept is added to X, i.e., X = (ι_n, X̃, WX̃). When k̃ > 1, ι_n ∈ col(X̃, WX̃), and therefore an intercept cannot be added to (X̃, WX̃); one could, of course, replace one of the columns of (X̃, WX̃) with an intercept, and this would still give rank(X, WX) = k.)

(b.iii) A model of the form (2.2) with each W_r the symmetric or row-normalized adjacency matrix of a complete bipartite graph, with contextual effects, and with at least two groups (R > 1), i.e., X = (X̃, WX̃, ⊕_{r=1}^R ι_{m_r}) for some n × k̃ matrix X̃ of regressors, with k̃ ≥ 0, so that k = R + 2k̃.

(In case (b.i), Assumption 1 is satisfied for generic matrices X̃_1, ..., X̃_R if the model is unbalanced, and is violated if the model is balanced (see Example 3). In cases (b.ii) and (b.iii), Assumption 1 is satisfied for generic X̃.)

(c) Some models with fixed effects and no regressors (i.e., X contains only the dummies corresponding to the fixed effects):

(c.i) The one-way model of Example 1 with no regressors (i.e., X = ι_T ⊗ I_N), as, for instance, in Robinson and Rossi (2015).

(c.ii) The two-way model of Example 1 with no regressors (i.e., X = (ι_T ⊗ I_N, I_T ⊗ ι_N)) and row-stochastic W̃ (a matrix is said to be row-stochastic if all its row sums are 1).

(c.iii) The network fixed effects model (2.2) with no regressors (i.e., X = ⊕_{r=1}^R ι_{m_r}) and all matrices W_r row-stochastic. Note that, when R = 1, this reduces to an intercept-only network autoregression (2.1) with row-stochastic interaction matrix.

In cases such as those just listed, rank(X, WX) = k and therefore λ and β cannot be identified from E(y).
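For instance, case (c.iii) with R = 1 can be verified in a few lines (our sketch; any row-stochastic W will do):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    W = rng.random((n, n)) * (1 - np.eye(n))
    W /= W.sum(axis=1, keepdims=True)              # row-stochastic, so W iota_n = iota_n
    X = np.ones((n, 1))                            # intercept-only model, k = 1
    print(np.linalg.matrix_rank(np.c_[X, W @ X]))  # 1 = k: no internal instruments for Wy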
As noted earlier, however, the condition rank(X, WX) = k is very strong in general. What might be more relevant in applications is that the condition is close, in some sense, to being satisfied. In such a situation, it is natural to expect that identification from the first moment will be weak. We confirm this with a small simulation experiment. We generate 10,000 replications from model (2.1) with W a row-normalized 2-ahead-2-behind interaction matrix (before row-standardization, this is a matrix with all entries in the two diagonals above and the two diagonals below the main diagonal equal to one, and zero everywhere else), and a single regressor equal to ι_n + bz, where b ∈ R and z ∼ N(0, I_n), with z being generated once and then kept fixed across replications. We set β = 1, σ = 1, and draw the errors independently from either a standard normal distribution or a gamma distribution with shape parameter 1 and scale parameter 1, demeaned by their mean. In this design, λ and β cannot be identified from the first moment if b = 0, because in that case rank(X, WX) = k = 1. Thus, we expect any estimator of λ and β that relies entirely on the specification of the first moment of y to perform poorly if b is close to 0. For illustration, we consider the 2SLS estimator with instruments WX and W²X for Wy (Kelejian and Prucha, 1998), and we compare it with the quasi maximum likelihood estimator (QMLE), which also uses the second moment (the QMLE is the MLE that maximizes the likelihood obtained when ε ∼ N(0, I_n); see Section 5). Table 1 displays the root median square error of the 2SLS and (Q)ML estimators of λ and β. The root median square error is reported rather than the more usual root mean square error because, in the setting we are considering, the variance of the 2SLS estimator does not exist (see Roberts, 1995, Section 7.2.2). For both λ and β, and for both the normal and the gamma distributions, the performance of the 2SLS estimator is good, compared to the (Q)MLE benchmark, when b = 1, but deteriorates rapidly as b gets smaller. Such a deterioration is due to both the bias and the dispersion of the 2SLS estimator growing large as b decreases, for any n. A sketch of the data-generating process and of the 2SLS step is given below the table.

Table 1: Root median square error of the 2SLS and (Q)ML estimators of λ and β, for two sample sizes n.

                        Normal                          Gamma
                 λ               β               λ               β
   n     b    2SLS    QML     2SLS    QML     2SLS    QML     2SLS    QML
   …     1    0.080   0.061   0.067   0.059   0.081   0.062   0.066   …
   …     0.1  0.598   0.095   0.593   0.116   0.593   0.095   0.582   …
   …     0.01 1.658   0.096   1.585   0.118   1.618   0.096   1.598   …
   …     1    0.024   0.019   0.020   0.018   0.024   0.019   0.020   …
   …     0.1  0.189   0.030   0.188   0.036   0.190   0.030   0.189   …
   …     0.01 1.194   0.031   1.191   0.037   1.189   0.030   1.189   …
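The following sketch is a minimal reimplementation of the design under stated assumptions: the circular boundary of the 2-ahead-2-behind matrix, the true value λ = 0.4, and the helper names are ours, as the paper does not report them.

    import numpy as np

    def two_ahead_two_behind(n):
        # circulant 2-ahead-2-behind contiguity, then row-normalized
        W = np.zeros((n, n))
        for i in range(n):
            for s in (-2, -1, 1, 2):
                W[i, (i + s) % n] = 1.0
        return W / W.sum(axis=1, keepdims=True)

    def tsls(y, X, W, Z):
        # 2SLS with regressors (Wy, X) and instruments Z = (X, WX, W^2 X)
        R_ = np.c_[W @ y, X]
        Rhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ R_)
        return np.linalg.solve(Rhat.T @ R_, Rhat.T @ y)  # (lambda_hat, beta_hat)

    rng = np.random.default_rng(3)
    n, b, beta, sigma, lam = 50, 0.1, 1.0, 1.0, 0.4      # lam is an illustrative value
    W = two_ahead_two_behind(n)
    X = (np.ones(n) + b * rng.standard_normal(n))[:, None]  # iota_n + b z
    Z = np.c_[X, W @ X, W @ (W @ X)]
    eps = rng.standard_normal(n)                          # or demeaned gamma draws
    y = np.linalg.solve(np.eye(n) - lam * W, X @ np.array([beta]) + sigma * eps)
    print(tsls(y, X, W, Z))   # increasingly unstable as b -> 0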
In the simulation experiment, b can be interpreted as a measure of the distance from non-identifiability via the first moment. In more complex situations, one could construct a measure of distance by observing that, since rank(X) = k, rank(X, WX) = k is equivalent to col(WX) ⊆ col(X) (i.e., in matrix theoretic language, to col(X) being an invariant subspace of W) or, which is the same, to M_X WX = 0. A distance from the condition rank(X, WX) = k could then be provided by some norm of the matrix M_X WX, as in the sketch below. We do not intend to study this rigorously here, but such a measure should help model users to avoid not only the cases in which inference based on the first moment is impossible (the norm of M_X WX is zero), but also the cases close to these (the norm of M_X WX is close to zero), in which inference is likely to be very challenging without additional distributional assumptions.
It is useful to briefly compare Lemma 3.1 with some related results available in the literature, obtained by two different approaches. First, Lee (2004) studies asymptotic properties of the quasi maximum likelihood estimator based on the Gaussian distribution. The condition rank(X, WX) > k appearing in Lemma 3.1 can be interpreted as the finite sample equivalent of Assumption 8 in Lee (2004). Indeed, under the latter assumption (and other regularity assumptions) the limit of the Gaussian quasi-likelihood has a unique maximum at the true value of the parameters, which is sufficient (and necessary under correct specification of the likelihood) for identification; see Newey and McFadden (1994). Second, in the social network literature, identification of the structural parameters in model (2.1) is typically established by checking that those parameters can be uniquely recovered from the reduced form parameters (e.g., Bramoullé et al., 2009; Blume et al., 2011; Kwok, 2019). Such a strategy obviously relies on the reduced form parameters being identified, which, in the case of a fixed W, would typically require repeated observations of the cross-section, over time or some other dimension. Because of this, identification via reduced form parameters may not be appropriate in applications where a single observation of a network is available. Lemma 3.1 can establish identifiability not only when repeated observations are available (in which case W is block diagonal with identical blocks, as in Example 1), but also when a single observation of the network is available. The following example shows that it is possible that parameters are identified with repeated observations, but not with a single observation.

Example 5. Consider a row-normalized or symmetric Complete Bipartite model with X = (ι_n, x, Wx), for some x ∈ R^n (such that X is full rank). Since the matrices I_n, W, W² are linearly independent, Proposition 1 in Bramoullé et al. (2009) implies that λ and β are identified from an i.i.d. sample of observations from the model. However, as noted in Example 4, Assumption 1 fails, and therefore rank(X, WX) = k. Thus, according to Lemma 3.1, λ and β cannot be identified from a single observation of the model, whatever the value of x.

The applicability to the case of a single observation of a network is the most important difference between Lemma 3.1 and the approach in Bramoullé et al. (2009). With repeated observations, Lemma 3.1 yields results that are similar to those in Bramoullé et al. (2009), but with two less important differences. Firstly, Lemma 3.1 does not restrict attention to the case when X contains contextual effects; our results can be used for that case, but also for the case when no contextual effects are included, or only some contextual effects are included. Secondly, Bramoullé et al. (2009) assume that X is random with E(ε | X) = 0, whereas, for the reasons mentioned in the Introduction, X is nonrandom in Lemma 3.1.

Indeed, identification under repeated observations for the Complete Bipartite model, which is established via Proposition 1 in Bramoullé et al. (2009) in Example 5, can also be established by Lemma 3.1. To see this, note that R observations of the row-normalized or symmetric Complete Bipartite model with X = (ι_n, x, Wx) correspond to a network autoregression with interaction matrix W* = I_R ⊗ W and regressor matrix X* = (ι_{nR}, x*, W*x*) for some x* ∈ R^{nR}. Then one can see that rank(X*, W*X*) > k if and only if R > 1. That is, Lemma 3.1 establishes that identification is achieved if and only if R > 1.
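The rank computation in the preceding paragraph is easy to reproduce (our sketch, with illustrative sizes p = 2, q = 3):

    import numpy as np

    p, q = 2, 3
    n = p + q
    A = np.block([[np.zeros((p, p)), np.ones((p, q))],
                  [np.ones((q, p)), np.zeros((q, q))]])
    W = A / A.sum(axis=1, keepdims=True)               # row-normalized bipartite W
    rng = np.random.default_rng(5)
    for R in (1, 2):
        Ws = np.kron(np.eye(R), W)                     # R stacked observations
        xs = rng.standard_normal(R * n)
        Xs = np.c_[np.ones(R * n), xs, Ws @ xs]        # X* = (iota, x*, W* x*), k = 3
        print(R, np.linalg.matrix_rank(np.c_[Xs, Ws @ Xs]))  # 3 when R = 1, 4 when R = 2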
3.2 Identifiability from the second moment

So far, we have considered identifiability from the first moment of y, under the restriction E(ε) = 0. When identification from the first moment fails, identification may be achieved by imposing further restrictions on the model. The simplest of such restrictions is var(ε) = I_n, in which case identification can occur via the second moment of y. (If X and W were random, the restriction would be imposed on var(ε | W, X), rather than on var(ε).) To see this, it is convenient to focus on a parameter space for λ that is smaller than Λ_u. Consider the case when W has at least one (real) negative eigenvalue and at least one (real) positive eigenvalue. This is typically satisfied in both applications and theoretical studies. (While not needed for Lemma 3.1, this restriction rules out the case when W is a scalar multiple of I_n, which trivially leads to non-identification in Lemma 3.1.) Denote the smallest real eigenvalue of W by ω_min, and, without loss of generality, normalize the largest real eigenvalue to 1. The parameter space for λ is often restricted to the largest interval containing the origin in which S(λ) is nonsingular, that is, Λ := (ω_min^{−1}, 1), or a subset thereof (possibly independent of n) such as (−1, 1), outside which λ is difficult to interpret.

Lemma 3.2 (Identifiability from second moment). Consider a network autoregression (2.1) with var(ε) = I_n, and assume that W has at least one negative eigenvalue and at least one positive eigenvalue. The parameter (λ, σ²) is identified on Λ × (0, ∞).
Of course, once λ is identified, β can be identified from the first moment E(y) = (I_n − λW)^{−1}Xβ, for any W and any (full rank) X. Lemma 3.2 complements two results available in the literature that are concerned with identifiability from var(y) on a different parameter space for λ. (See also Theorem 3.2 in Davezies et al. (2009). Conditions for (λ, σ²) to be identified from var(y) can be seen as finite sample counterparts of Assumption 9 in Lee (2004) (cf. Section 3.2).) Firstly, Lemma 3.2 is an extension of Lemma 4.2 in Preinerstorfer and Pötscher (2017), which establishes identification of (λ, σ²) on (0, 1) × (0, ∞). Secondly, Lemma 4 in Lee and Yu (2016) says that a sufficient condition for (λ, σ²) to be identified from var(y) on Λ_u × (0, ∞) is that the matrices I_n, W + W′ and W′W are linearly independent. The following example considers a case when identification cannot be established by Lemma 4 in Lee and Yu (2016), but can be by Lemma 3.2.

Example 6. Consider a balanced Group Interaction model (see Example 3) with var(ε) = I_n. Identification of (λ, σ²) from var(y) on Λ_u × (0, ∞) cannot be established via Lemma 4 in Lee and Yu (2016), because the matrices I_n, W + W′ and W′W are linearly dependent when W = I_R ⊗ B_m; in fact, (λ, σ²) is not identified on Λ_u × (0, ∞). More precisely, for the variance matrix σ²(S′(λ)S(λ))^{−1} of the balanced Group Interaction model we have σ₁²(S′(λ₁)S(λ₁))^{−1} = σ₂²(S′(λ₂)S(λ₂))^{−1} if and only if σ₂² = m²σ₁²/(2λ₁ + m − 2)² and λ₂ = −((m − 2)λ₁ + 2(1 − m))/(2λ₁ + m − 2); note that λ₂ ∉ Λ if λ₁ ∈ Λ. However, Lemma 3.2 asserts that (λ, σ²) is identified (from var(y)) on Λ × (0, ∞) (and hence on any subset thereof).

It should be noted that the restriction var(ε) = I_n is imposed only for simplicity, and is by no means crucial for identification from var(y). Indeed, one could assume some parametric structure for var(ε), say var(ε) = Σ(η), and study identifiability of the parameter (λ, σ², η) from var(y) = σ²(I_n − λW)^{−1}Σ(η)(I_n − λW′)^{−1}, but we refrain from doing this here.

At this point, it is worth considering the network (or spatial) error model

    y = Xβ + u,   u = λWu + σε,   (3.1)

even though this specification is considerably less popular than model (2.1) in economic applications. The same set of assumptions as in the paragraph after equation (2.1) will be maintained for model (3.1). Lemma 3.2 also applies to the network error model, because equations (2.1) and (3.1) imply the same variance structure for y. On the other hand, in the network error model λ obviously cannot be identified from the first moment Xβ of y. In fact, the result in Lemma 3.1 can be interpreted as saying that λ and β cannot be identified from E(y) in a network autoregression that behaves like a network error model.
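Before making this point precise, we note that the explicit formulas in Example 6 can be confirmed numerically (our sketch; lam1 and sig2_1 are arbitrary test values):

    import numpy as np

    R, m = 2, 4
    n = R * m
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    W = np.kron(np.eye(R), B)
    S = lambda lam: np.eye(n) - lam * W

    lam1, sig2_1 = 0.5, 1.3
    lam2 = -((m - 2) * lam1 + 2 * (1 - m)) / (2 * lam1 + m - 2)
    sig2_2 = m**2 * sig2_1 / (2 * lam1 + m - 2)**2
    V = lambda lam, s2: s2 * np.linalg.inv(S(lam).T @ S(lam))
    print(lam2)                                           # about 1.67: outside Lambda = (-3, 1)
    print(np.allclose(V(lam1, sig2_1), V(lam2, sig2_2)))  # True: same var(y), different parameters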
This point is made precise by the following argument. If rank(X, WX) = k, there exists a unique k × k matrix A such that WX = XA, and hence S^{−1}(λ)X = X(I_k − λA)^{−1}, for any λ such that S(λ) is invertible. (It is easily seen that the eigenvalues of A are eigenvalues of W. Hence, I_k − λA is invertible if S(λ) is.) It follows that, when rank(X, WX) = k, the network autoregression y = S^{−1}(λ)Xβ + σS^{−1}(λ)ε can be written as y = X(I_k − λA)^{−1}β + σS^{−1}(λ)ε, which is a network error model with regression coefficients (I_k − λA)^{−1}β. (According to Lemma C.1 in Appendix C, the model y = X(I_k − λA)^{−1}β + σS^{−1}(λ)ε has the same profile quasi log-likelihood l(λ, σ²) as model (3.1), even though, clearly, the MLE of β will be different in the two models.)

4 Identifiability after imposition of invariance

This section discusses the full identifiability content of Assumption 1. We already know from Section 3.1 that, in the network autoregression, a failure of Assumption 1 precludes identification from the first moment of y, but not from the higher order moments of y. We are now going to show that Assumption 1 is necessary for identification from statistics that are invariant under certain transformations. Similar results to those in this section are obtained in Preinerstorfer and Pötscher (2017) for a general regression model with correlated errors and for the particular case of a network error model with arbitrary W.

We will need some notions of group invariance (e.g., Lehmann and Romano, 2005, Chapter 6). Let G be a group of transformations from the sample space into itself. A statistic is said to be invariant under G (or G-invariant) if it is constant on the orbits of G. It is said to be a maximal invariant under G if it is invariant and takes different values on each orbit. A necessary and sufficient condition for a statistic to be invariant under G is that it depends on the data only through a maximal invariant under G. Lastly, a family of distributions {P_θ, θ ∈ Θ}, where Θ is the parameter space, is said to be invariant under G if every pair g ∈ G, θ ∈ Θ determines a unique element in Θ, denoted by ḡθ, such that when y has distribution P_θ, gy has distribution P_{ḡθ}.

In order to apply the theory of invariance, in this section the network autoregression (2.1) and the network error model (3.1) are regarded as families of distributions {P_θ, θ ∈ Θ} for y, where θ := (λ, β, σ², η), with η being a parameter indexing the distribution of ε, and θ is assumed to be identified (from the distribution of y). For a given regressor matrix X, we will consider the group G_X := {g_{κ,δ} : κ > 0, δ ∈ R^k}, where g_{κ,δ} denotes the transformation y → κy + Xδ, and its subgroup G⁰_X := {g_{1,δ} : δ ∈ R^k}. A maximal invariant under G⁰_X is C_X y, where C_X is an (n − k) × n matrix such that C_X C′_X = I_{n−k} and C′_X C_X = M_X, and a maximal invariant under G_X is v := C_X y/‖C_X y‖ (with the convention that v = 0 if C_X y = 0). We also say that an expectation µ_y(θ) is G-invariant if every pair g ∈ G, θ ∈ Θ determines a unique ḡθ such that µ_{gy}(θ) = µ_y(ḡθ). The non-identifiability result in Lemma 3.1(ii) can be seen as a consequence of the fact that, when rank(X, WX) = k, the mean µ_y(λ, β) := S^{−1}(λ)Xβ is G⁰_X-invariant. (If rank(X, WX) = k then, with A as above, µ_{g_{1,δ}y}(λ, β) = µ_y(ḡ(λ, β)), with ḡ(λ, β) = (λ, β + (I_k − λA)δ).) This type of invariance implies that, when rank(X, WX) = k, the mean can only identify a k-dimensional parameter, not the (k + 1)-dimensional parameter (λ, β). Under the same rank restriction, and with an additional assumption that we now state, the network autoregression (not its mean only) is invariant under G⁰_X, and in fact under G_X.
Assumption 2. The distribution of ε does not depend on the parameters λ, β, and σ².

Let P_θ now denote the distribution of y in the network error model, with θ := (λ, β, σ², η). Under Assumption 2, the network error model is G_X-invariant, because gy has distribution P_{ḡθ}, with ḡθ = (λ, κβ + δ, κ²σ², η), for any g ∈ G_X. For the network autoregression we have the following result.

Lemma 4.1. Suppose Assumption 2 holds. The network autoregression (2.1) is G_X-invariant if and only if rank(X, WX) = k.

We are now in a position to discuss the implications of Assumption 1. The "principle of invariance" asserts that inference in a model should be invariant under any group of transformations under which the model is invariant. Accordingly, under Assumption 2, inference in a network error model should be based on G_X-invariant procedures, whatever X and W are, and inference in a network autoregression should be based on G_X-invariant procedures whenever rank(X, WX) = k, which, as we have seen in Section 3.1, is the case if Assumption 1 is violated. However, the imposition of G_X-invariance causes an identifiability issue when Assumption 1 fails. To see this, observe that if Assumption 1 fails then C_X S(λ) = (1 − λω)C_X, and therefore premultiplying both sides of the network autoregression equation S(λ)y = Xβ + σε by C_X yields

    C_X y = σ/(1 − λω) C_X ε.   (4.1)

Equation (4.1) shows that, when Assumption 2 is satisfied but Assumption 1 is not, (λ, β, σ²) cannot be identified from the distribution of C_X y and hence, since C_X y is a maximal invariant under G⁰_X, cannot be identified from the distribution of any G⁰_X-invariant statistic. Exactly the same conclusion obtains starting from the network error model y = Xβ + σS^{−1}(λ)ε. The result is particularly perverse for the network autoregression: when Assumption 1 fails, and under Assumption 2, the model is G_X-invariant, and yet its parameters cannot be identified from any G_X-invariant statistic.

It is possible to be more precise about the cause of non-identification. Suppose Assumption 1 is violated for some eigenvalue ω of W, and let g_ω be the geometric multiplicity of ω. (Note that, for fixed W and X, the condition M_X(ωI_n − W) = 0 that leads to a violation of Assumption 1 can be satisfied by at most one eigenvalue ω. This is because M_X(ω₁I_n − W) = 0 = M_X(ω₂I_n − W) implies ω₁ = ω₂. Also, note that M_X(ωI_n − W) = 0 implies that ω is real.) Recall from Section 2 that a pair (X, W) causes Assumption 1 to fail if and only if some of the columns of X span the subspace col(ωI_n − W). Observe that this requires k ≥ n − g_ω, because the dimension of col(ωI_n − W) is rank(ωI_n − W) = n − nullity(ωI_n − W) = n − g_ω. Let X_ω be the n × (n − g_ω) matrix containing the columns of X that span col(ωI_n − W), and reorder the columns of X as in X = (X_ω, X_*), where X_* is n × (k − (n − g_ω)), with k − (n − g_ω) ≥ 0. Generalizing the argument leading to equation (4.1), if Assumption 1 fails then C_{X_ω}S(λ) = (1 − λω)C_{X_ω}, and therefore

    C_{X_ω} y = 1/(1 − λω) C_{X_ω} X_* β_* + σ/(1 − λω) C_{X_ω} ε,   (4.2)

where β_* is the component of β corresponding to X_*. This shows that, under Assumption 2, (λ, β, σ²) cannot be identified from the distribution of C_{X_ω}y if Assumption 1 fails. That is, what really causes non-identification when Assumption 1 fails is the imposition of invariance with respect to the subgroup G⁰_{X_ω} of G⁰_X, and what we said above about G⁰_X-invariant statistics applies to the (larger) set of G⁰_{X_ω}-invariant statistics. We summarize this result in the following theorem, and then provide an example.

Theorem 1. Suppose that, in the network autoregression (2.1) or in the network error model (3.1), Assumption 2 is satisfied, but Assumption 1 fails for some eigenvalue ω of W. Then (λ, β, σ²) cannot be identified from the distribution of any G⁰_{X_ω}-invariant statistic.

Theorem 1 says that, for any W, there are matrices of regressors that make invariant inference impossible—these are the matrices leading to a violation of Assumption 1, that is, the matrices whose column space contains one of the subspaces col(ωI_n − W), where ω is an eigenvalue of W. It is worth emphasizing that this result does not require any distributional assumption other than Assumption 2.

Example 7. Consider a balanced Group Interaction model with group fixed effects. We have seen in Example 3 that in this model Assumption 1 fails, because the columns of the fixed effects matrix I_R ⊗ ι_m span col(ω_min I_n − W) (i.e., in the notation introduced just before equation (4.2), X_{ω_min} = I_R ⊗ ι_m). Theorem 1 therefore implies that, under Assumption 2, (λ, β, σ²) cannot be identified from any statistic that is invariant under G⁰_{I_R ⊗ ι_m}, even though the model is invariant under that group.

Since C_X X = 0, imposing invariance with respect to G⁰_{I_R ⊗ ι_m} removes the group fixed effects. (For the use of invariance arguments to solve incidental parameter problems, see also Chamberlain and Moreira (2009).) Thus, Example 7 can be seen as a revisitation of the well-known identification failure that occurs in a balanced Group Interaction model upon removal of the group fixed effects (Lee, 2007).

To conclude this section, it is useful to make a connection with the results obtained in Section 3. Recall that the parameters of a network autoregression can generally be identified by specifying the variance structure of ε, regardless of whether Assumption 1 holds; for example, this is certainly the case if var(ε) = I_n, by Lemma 3.2. According to Theorem 1, however, any result establishing identification from the distribution of y cannot be helpful for invariant inference if Assumption 1 is not satisfied, because in that case identification is lost after imposition of invariance with respect to the group G⁰_{X_ω}. (Consider the model in Example 7. Due to the failure of Assumption 1, any result establishing identification from the distribution of y cannot help to achieve reasonable inference in that model. This is so, for example, for Proposition 2 in de Paula (2017), which establishes identification from the variance of y for the particular case R = 1, when |λ| < 1. Inference based on such a result cannot respect the invariance properties of the model, because the model is invariant under the group G⁰_{ι_n} of transformations y → y + αι_n, α ∈ R, but identification is lost on imposition of invariance under G⁰_{ι_n}.) The next section considers this point from a likelihood perspective.
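The mechanics of Example 7 are easy to verify numerically; the sketch below (ours) builds C_X from an orthonormal basis of col(X)⊥ and checks the identity C_X S(λ) = (1 − λω_min)C_X behind equation (4.1).

    import numpy as np
    from scipy.linalg import null_space

    R, m = 3, 4
    n = R * m
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    W = np.kron(np.eye(R), B)
    X = np.kron(np.eye(R), np.ones((m, 1)))       # group fixed effects
    C = null_space(X.T).T                          # C_X: C C' = I_{n-k}, C'C = M_X
    w_min = -1 / (m - 1)
    for lam in (0.2, 0.5, 0.8):
        S = np.eye(n) - lam * W
        print(np.allclose(C @ S, (1 - lam * w_min) * C))   # True for every lambda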
5 Implications for likelihood inference

We now study the consequences of Theorem 1 for likelihood estimation of the network autoregression. The MLE that is typically used for a network autoregression is the one based on the likelihood that would obtain if ε were distributed as N(0, I_n), often referred to simply as the QMLE (quasi MLE); see, e.g., Lee (2004). It will also be useful to consider the adjusted QMLE, which is obtained from the QMLE by centering the profile score for (λ, σ²) (see Yu et al., 2015). For estimation of (λ, σ²), the adjusted QMLE usually performs better than the unadjusted QMLE when the dimension of β is large with respect to the sample size n (including in fixed effects models, in which case the dimension of β is increasing with n).

Let l(λ, β, σ²) denote the Gaussian quasi log-likelihood for (λ, β, σ²) in a network autoregression or in a network error model, l(λ) the corresponding profile likelihood for λ, and l_a(λ) the adjusted profile likelihood for λ. The precise definitions of these likelihoods are given in Appendix B. We say that a parameter θ is identified on a set Θ_I from a quasi likelihood L(θ) if it is identified on Θ_I from the distribution underlying L(θ) (so, if θ is identified on Θ_I from the quasi likelihood L(θ), then L(θ̃) = L(θ̄) for almost all y ∈ R^n implies θ̃ = θ̄, for any θ̃, θ̄ ∈ Θ_I). Clearly, Lemma 3.2 is sufficient to guarantee identification of (λ, β, σ²) on Λ × R^k × (0, ∞) from l(λ, β, σ²) for any pair X, W, including those pairs such that Assumption 1 is violated. However, a violation of Assumption 1 makes inference based on l(λ, β, σ²) pointless, in the following sense.

Proposition 5.1.
Consider the network autoregression (2.1) or the network error model (3.1). If Assumption 1 is violated, then, for any λ such that det(S(λ)) ≠ 0, and for any y ∉ col(X): (i) the profile score associated with the profile log-likelihood l(λ) does not depend on y; (ii) the adjusted profile log-likelihood function l_a(λ) is flat.

In other words, when Assumption 1 fails, a maximizer of l(λ) (over Λ or any other subset of R), if it exists, is non-random, and l_a(λ) does not contain any identifying information about λ.

Part (ii) of Proposition 5.1 can be linked back to the invariance results of Section 4. By standard arguments (available for instance in Rahman and King, 1997), l_a(λ) corresponds to the density of the maximal invariant v := C_X y/‖C_X y‖ under G_X, for any network autoregression model violating Assumption 1 and for any network error model. Then, the flatness of l_a(λ) can be understood in terms of the distribution of v being free of λ if the distribution of ε is free of λ, which follows from equation (4.1). (It is easily verified that the maximal invariant induced by G_X on the parameter space is λ (it would be (λ, η) in the presence of a parameter η in the distribution of ε). This may seem to contradict one of the fundamental results on invariance, which is usually stated by saying that the distribution of an invariant statistic depends only on a maximal invariant induced on the parameter space (e.g., Lehmann and Romano, 2005, Theorem 6.3.2). The apparent contradiction is due to the non-identification caused by the violation of Assumption 1.) A numerical illustration of Proposition 5.1 is given below.
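A minimal illustration (our sketch, for the balanced Group Interaction model with group dummies of Example 7): the profile log-likelihood differences l(λ′) − l(λ″) are identical across independent draws of y, so the maximizer cannot depend on the data.

    import numpy as np

    R, m = 3, 4
    n = R * m
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    W = np.kron(np.eye(R), B)
    X = np.kron(np.eye(R), np.ones((m, 1)))
    M_X = np.eye(n) - X @ np.linalg.pinv(X)

    def l_profile(lam, y):                       # equation (B.2)
        S = np.eye(n) - lam * W
        s2 = (S @ y) @ M_X @ (S @ y) / n
        return np.linalg.slogdet(S)[1] - n / 2 * np.log(s2)

    rng = np.random.default_rng(6)
    diffs = [l_profile(0.5, y) - l_profile(0.2, y) for y in rng.standard_normal((2, n))]
    print(np.isclose(diffs[0], diffs[1]))        # True: the shape of l(.) is non-random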
6 Conclusion

We have studied identification of an autoregression defined on a general network, under weak distributional assumptions and without requiring repeated observations of the network. In this context, identification is possible for generic parameter values and for generic regressor matrices, whatever the network. However, important cases do exist when identification fails, either in the original sample space or after some transformation (this could be, for instance, a transformation aimed at removing fixed effects). We have shown that in the latter case it is impossible to conduct inference that respects the invariance properties of the model, regardless of whether the parameters are identified from the second moment of the outcome variable.

It should be emphasized that our results have been derived under the assumption that the network is fully known and exogenous, which may be unrealistic in many applications. The study of identification when the network is (partially) unknown and/or endogenous remains a key challenge in the literature (e.g., Blume et al., 2015; de Paula et al., 2020; Lewbel et al., 2019), and we hope that the results obtained in this paper can prove useful in that setting too.
Appendix A Further examples when Assumption 1 fails
Further to Examples 3 and 4, two other simple models in which Assumption 1 fails are as follows.
Example 8. Consider the modification of Example 3 in which exclusive averaging is replaced by inclusive averaging, meaning that each unit interacts not only with all other units in a group but also with itself. If there are R groups, each of size m_r, the interaction matrix is W = ⊕_{r=1}^R (1/m_r) ι_{m_r} ι′_{m_r}. Since col(⊕_{r=1}^R (1/m_r) ι_{m_r} ι′_{m_r}) = col(⊕_{r=1}^R ι_{m_r}), Assumption 1 is violated (at ω = 0) whenever X contains group intercepts. Note that in this case, contrary to the case of exclusive averaging, Assumption 1 fails regardless of whether the model is balanced or not.

Example 9. Example 4 generalizes immediately to complete R-partite graphs, with R ≥ 2 (a complete R-partite graph is a graph in which the n observational units can be divided into R partitions, with all units in a partition interacting with all units in other partitions, but with none in their own partition). In that case, Assumption 1 is violated (at ω = 0) whenever X contains an intercept for each of the R partitions.

Examples 8 and 9 share important similarities, due to the fact that the graphs underlying the two models are complements of each other, in the graph theoretic sense. Indeed, for both models, the condition col(⊕_{r=1}^R ι_{m_r}) ⊆ col(X) leading to a failure of Assumption 1 is also satisfied if: (i) X contains an intercept and R − 1 contextual effect terms Wx_i, for some x_1, ..., x_{R−1} ∈ R^n; (ii) X contains R contextual effect terms Wx_1, ..., Wx_R, for some x_1, ..., x_R ∈ R^n. (In order to be full rank, X can contain at most R − 1 contextual effect terms in case (i), and at most R contextual effect terms otherwise.)

Appendix B The QMLE and the adjusted QMLE
Omitting additive constants, the quasi log-likelihood for ε ∼ N(0, I_n) in the network autoregression (2.1) is

    l(λ, β, σ²) := −(n/2) log(σ²) + log|det(S(λ))| − (1/(2σ²))(S(λ)y − Xβ)′(S(λ)y − Xβ),   (B.1)

for any λ such that S(λ) is nonsingular. To avoid tedious repetitions, we often omit the "quasi-" in front of "log-likelihood". The QMLE in most common use maximizes l(λ, β, σ²) under the condition that λ is in Λ (or in a subset thereof). (This assumes that Λ is well defined. If W did not have a negative (resp., positive) eigenvalue, then the left (resp., right) extreme of Λ could be taken to be −∞ (resp., +∞).) That is, the QMLE of (λ, β, σ²) is

    (λ̂_ML, β̂_ML, σ̂²_ML) := argmax_{β ∈ R^k, σ² > 0, λ ∈ Λ} l(λ, β, σ²).

Maximization with respect to β and σ² gives β̂_ML(λ) := (X′X)^{−1}X′S(λ)y and σ̂²_ML(λ) := (1/n) y′S′(λ)M_X S(λ)y. Thus, λ̂_ML can be conveniently computed by maximizing over Λ the profile likelihood for λ,

    l(λ) := l(λ, β̂_ML(λ), σ̂²_ML(λ)) = −(n/2) log(σ̂²_ML(λ)) + log|det(S(λ))|,   (B.2)

where additive constants have again been omitted.

When the dimension of β is large compared to the sample size, the QMLE of (λ, σ²) may perform poorly. To tackle this problem, the QMLE of (λ, σ²) can be adjusted by recentering the profile score s(λ, σ²) associated to the profile log-likelihood for (λ, σ²), l(λ, σ²) := l(λ, β̂_ML(λ), σ²). Under the assumptions E(ε) = 0 and var(ε) = I_n, E(s(λ, σ²)) is available analytically and does not depend on the nuisance parameter β. Thus, calculation of the adjusted profile score s_a(λ, σ²) := s(λ, σ²) − E(s(λ, σ²)) is straightforward. Given s_a(λ, σ²), one can define the adjusted likelihood l_a(λ, σ²) as the function with gradient equal to s_a(λ, σ²), and hence the adjusted QMLE (λ̂_aML, σ̂²_aML) as the maximizer of l_a(λ, σ²). Also, letting σ̂²_a(λ) be the adjusted QMLE of σ² for given λ, we define the adjusted likelihood for λ only as l_a(λ) := l_a(λ, σ̂²_a(λ)). See Yu et al. (2015) for details on these constructions.
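A direct implementation of the profile-likelihood recipe above (our sketch; the bounded scalar optimization and the helper name are illustrative):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def qmle(y, X, W, lam_bounds):
        # maximize the profile log-likelihood (B.2) over Lambda, then recover
        # beta_hat(lambda) and sigma2_hat(lambda)
        n = len(y)
        M_X = np.eye(n) - X @ np.linalg.pinv(X)
        def neg_l(lam):
            S = np.eye(n) - lam * W
            s2 = (S @ y) @ M_X @ (S @ y) / n
            return -(np.linalg.slogdet(S)[1] - n / 2 * np.log(s2))
        lam = minimize_scalar(neg_l, bounds=lam_bounds, method="bounded").x
        S = np.eye(n) - lam * W
        beta = np.linalg.lstsq(X, S @ y, rcond=None)[0]   # (X'X)^{-1} X' S(lam) y
        s2 = (S @ y) @ M_X @ (S @ y) / n
        return lam, beta, s2

Here lam_bounds stands in for Λ; for example, (1/ω_min + ε, 1 − ε) for a small ε > 0 when the largest eigenvalue of W is normalized to 1.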
Appendix C Proofs

Lemma C.1.
The network autoregression (2.1) and the network error model (3.1) imply the same profile quasi log-likelihood function for (λ, σ²) if and only if rank(X, WX) = k.

Proof of Lemma C.1.
On concentrating the nuisance parameter β out of the likelihood (B.1), the profile quasi log-likelihood for (λ, σ²) in a network autoregression is, up to an additive constant,

    l(λ, σ²) := l(λ, β̂_ML(λ), σ²) = −(n/2) log(σ²) + log|det(S(λ))| − (1/(2σ²)) y′S′(λ)M_X S(λ)y.   (C.1)

Similarly, the profile quasi log-likelihood function for (λ, σ²) in a network error model, based again on the assumption ε ∼ N(0, I_n), is

    −(n/2) log(σ²) + log|det(S(λ))| − (1/(2σ²)) y′S′(λ)M_{S(λ)X} S(λ)y.   (C.2)

The two log-likelihood functions are the same if and only if M_{S(λ)X} = M_X for any λ such that S(λ) is invertible. But, for any λ such that S(λ) is invertible, the condition M_{S(λ)X} = M_X is equivalent to col(S(λ)X) = col(X), and hence to col(WX) ⊆ col(X), which in turn is the same as rank(X, WX) = k.

Proof of Lemma 3.1.
The parameter (λ, β) is identified on Λ_u × R^k from E(y) = S^{−1}(λ)Xβ if S^{−1}(λ̃)Xβ̃ = S^{−1}(λ̄)Xβ̄ implies (λ̃, β̃) = (λ̄, β̄), for any two values (λ̃, β̃), (λ̄, β̄) of (λ, β) in Λ_u × R^k. One immediately has that S^{−1}(λ̃)Xβ̃ = S^{−1}(λ̄)Xβ̄ if and only if

    X(β̃ − β̄) + WX(λ̃β̄ − λ̄β̃) = 0.   (C.3)

We analyze separately three (exhaustive) cases, depending on the rank of the n × 2k matrix (X, WX). Recall that X is assumed to be of full column rank.

(i) rank(X, WX) = 2k. In this case equation (C.3) is equivalent to β̃ = β̄ and λ̃β̄ = λ̄β̃, from which (λ̃, β̃) = (λ̄, β̄) if and only if β̃ = β̄ ≠ 0. That is, (λ, β) is identified on Λ_u × (R^k \ {0}) from E(y).

(ii) k < rank(X, WX) < 2k. Partition X as (X_1, X_2), where X_1 is n × k_1 and X_2 is n × k_2, with 0 < k_1 < k. The case k < rank(X, WX) < 2k may be characterized by assuming rank(X, WX_1) = k + k_1 and WX_2 = XB + WX_1C, for some k × k_2 matrix B and some k_1 × k_2 matrix C, so that rank(X, WX) = k + k_1. Replacing WX_2 with XB + WX_1C in (C.3), and letting (β′_1, β′_2) be the partition of β′ conformable with that of X, we obtain

    X(β̃ − β̄ + B(λ̃β̄_2 − λ̄β̃_2)) + WX_1(λ̃β̄_1 − λ̄β̃_1 + C(λ̃β̄_2 − λ̄β̃_2)) = 0,

which is satisfied if and only if β̃ − β̄ + B(λ̃β̄_2 − λ̄β̃_2) = 0 and λ̃β̄_1 − λ̄β̃_1 + C(λ̃β̄_2 − λ̄β̃_2) = 0. As a linear system in the unknowns λ̄ and β̄, these two equations are

    M(λ̃, β̃) (λ̄, β̄′)′ = (β̃′, 0′_{k_1})′,   (C.4)

where the matrix

    M(λ̃, β̃) := [ Bβ̃_2 , I_k − λ̃(0_{k×k_1}, B) ; β̃_1 + Cβ̃_2 , −λ̃(I_{k_1}, C) ]

is of dimension (k + k_1) × (1 + k). Now, identification of (λ, β) from E(y) is equivalent to (λ̃, β̃) being the unique solution to system (C.4), and this occurs if and only if rank(M(λ̃, β̃)) = 1 + k or, equivalently, det(M(λ̃, β̃)′M(λ̃, β̃)) ≠ 0. But det(M(λ̃, β̃)′M(λ̃, β̃)) is a polynomial in (λ̃, β̃), and hence the set of its zeros is either the whole of R^{k+1} or has zero measure with respect to µ_{R^{k+1}}. The former case is easily ruled out (e.g., M(λ̃, β̃) has rank k + 1 for (λ̃, β̃) = (0, (1′_{k_1}, 0′_{k_2})′)), which means that (λ, β) is generically identified from E(y).

(iii) rank(X, WX) = k. This happens if and only if there is a k × k matrix A such that WX = XA. In that case, equation (C.3) becomes X(β̃ − β̄ + A(λ̃β̄ − λ̄β̃)) = 0, which, since rank(X) = k, is equivalent to β̃ − β̄ + A(λ̃β̄ − λ̄β̃) = 0. Rewrite the last equality as (I_k − λ̄A)β̃ − (I_k − λ̃A)β̄ = 0. Since the eigenvalues of A are eigenvalues of W, I_k − λA is invertible for any λ ∈ Λ_u, and therefore β̃ = (I_k − λ̄A)^{−1}(I_k − λ̃A)β̄. This shows that, for any (λ̄, β̄) ∈ Λ_u × R^k, it is possible to find (λ̃, β̃) ≠ (λ̄, β̄) such that S^{−1}(λ̃)Xβ̃ = S^{−1}(λ̄)Xβ̄.

Summarizing, (λ, β) is generically identified from E(y), and hence generically identified, on Λ_u × R^k in cases (i) and (ii), and not identified from E(y) on Λ_u × R^k in case (iii).

Proof of Lemma 3.2.
This proof is similar to the proof of Lemma 4.2 in Preinerstorfer and Pötscher (2017). Under the assumption that var(ε) = I_n, var(y) = σ²(S′(λ)S(λ))^{−1}. We show that, if σ̄²S′(λ̃)S(λ̃) = σ̃²S′(λ̄)S(λ̄) for any two parameter values (λ̃, σ̃²), (λ̄, σ̄²) ∈ Λ × (0, ∞), then (λ̃, σ̃²) = (λ̄, σ̄²). The maintained assumption that W has at least one negative eigenvalue and at least one positive eigenvalue guarantees the existence of a nonzero vector f ∈ null(W − I_n) and a nonzero vector g ∈ null(W − ω_min I_n). Multiplying both sides of the equality σ̄²S′(λ̃)S(λ̃) = σ̃²S′(λ̄)S(λ̄) by f′ on the left and f on the right gives σ̄²(1 − λ̃)²f′f = σ̃²(1 − λ̄)²f′f. Since 1 − λ > 0 for λ ∈ Λ, and f′f ≠ 0, the last equality is equivalent to σ̄/σ̃ = (1 − λ̄)/(1 − λ̃). Repeating with g in place of f gives σ̄/σ̃ = (1 − λ̄ω_min)/(1 − λ̃ω_min). Thus, we must have (1 − λ̃ω_min)/(1 − λ̃) = (1 − λ̄ω_min)/(1 − λ̄). Since the function λ → (1 − λω_min)/(1 − λ) is strictly increasing on Λ, we have λ̃ = λ̄, and hence σ̃² = σ̄².

Proof of Lemma 4.1.
For any λ such that S(λ) is nonsingular, and under Assumption 2, it is clear from the reduced form y = S^{−1}(λ)Xβ + σS^{−1}(λ)ε that a network autoregression is invariant under G_X if and only if col(S^{−1}(λ)X) = col(X) or, which is the same, col(S(λ)X) = col(X). But this is all that is required because, as noted in the proof of Lemma C.1, the condition col(S(λ)X) = col(X) for any λ such that S(λ) is invertible is equivalent to rank(X, WX) = k.

Proof of Proposition 5.1.
For any λ such that rank(S(λ)) = n, and for any y ∉ null(M_X S(λ)), the profile log-likelihood l(λ) for a network autoregression is given by equation (B.2). Note that equation (B.2) holds a.s. for any fixed λ such that rank(S(λ)) = n, because null(M_X S(λ)) is a µ_{R^n}-null set when rank(S(λ)) = n (since k < n). If Assumption 1 is violated for an eigenvalue ω of W, then M_X(ωI_n − W) = 0 and hence M_X S(λ) = (1 − λω)M_X, which substituted into (B.2) gives

    l(λ) = log|det(S(λ))| − n log|1 − λω| − (n/2) log(n^{−1} y′M_X y),   (C.5)

for any y ∉ col(X). Since a violation of Assumption 1 implies rank(X, WX) = k, equation (C.5) also applies to a network error model, by Lemma C.1. Part (i) of the proposition follows on noting that the terms in (C.5) that contain λ do not contain y. Next, let s(λ) be the profile score associated with l(λ), let s_a(λ) := s(λ) − E(s(λ)) be its adjusted counterpart, and let l*_a(λ) := ∫ s_a(λ) dλ be the likelihood corresponding to s_a(λ). It can be easily verified that l_a(λ) = ((n − k)/n) l*_a(λ) (the adjusted profile likelihood l_a being defined in Appendix B). If Assumption 1 is violated then, from part (i), E(s(λ)) = s(λ), and hence s_a(λ) = 0, which in turn implies that l*_a(λ), and hence l_a(λ), is constant. This completes the proof.

References
Blume, L. E., Brock, W. A., Durlauf, S. N., Ioannides, Y. M., 2011. Identification of social interactions. Vol. 1 of Handbook of Social Economics. North-Holland, pp. 853–964.

Blume, L. E., Brock, W. A., Durlauf, S. N., Jayaraman, R., 2015. Linear social interactions models. Journal of Political Economy 123 (2), 444–496.

Bramoullé, Y., Djebbari, H., Fortin, B., 2009. Identification of peer effects through social networks. Journal of Econometrics 150 (1), 41–55.

Chamberlain, G., Moreira, M. J., 2009. Decision theory applied to a linear panel data model. Econometrica 77 (1), 107–133.

Cressie, N., 1993. Statistics for Spatial Data. Wiley, New York.

Davezies, L., D'Haultfoeuille, X., Fougère, D., 2009. Identification of peer effects using group size variation. The Econometrics Journal 12 (3), 397–413.

de Paula, Á., 2017. Econometrics of Network Models. Vol. 1 of Econometric Society Monographs. Cambridge University Press, pp. 268–323.

de Paula, Á., Rasul, I., Souza, P. C., 2020. Identifying network ties from panel data: Theory and an application to tax competition. Working paper.

Drton, M., Foygel, R., Sullivant, S., 2011. Global identifiability of linear structural equation models. Annals of Statistics 39 (2), 865–886.

Gupta, A., 2019. Estimation of spatial autoregressions with stochastic weight matrices. Econometric Theory 35 (2), 417–463.

Kelejian, H. H., Prucha, I. R., 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17 (1), 99–121.

Kwok, H. H., 2019. Identification and estimation of linear social interaction models. Journal of Econometrics 210 (2), 434–458.

Lee, L.-F., 2003. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews 22 (4), 307–335.

Lee, L.-F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72 (6), 1899–1925.

Lee, L.-F., 2007. Identification and estimation of econometric models with group interactions, contextual factors and fixed effects. Journal of Econometrics 140 (2), 333–374.

Lee, L.-F., Yu, J., 2016. Identification of spatial Durbin panel models. Journal of Applied Econometrics 31 (1), 133–162.

Lehmann, E. L., Romano, J. P., 2005. Testing Statistical Hypotheses, 3rd Edition. Springer Texts in Statistics. Springer, New York.

LeSage, J., Pace, R., 2009. Introduction to Spatial Econometrics. Chapman and Hall/CRC, New York.

Lewbel, A., Qu, X., Tang, X., 2019. Social networks with misclassified or unobserved links. Working paper.

Manski, C. F., 1993a. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies 60 (3), 531–542.

Manski, C. F., 1993b. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies 60 (3), 531–542.

Newey, W. K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R. F., McFadden, D. L. (Eds.), Handbook of Econometrics. Vol. 4. Elsevier, Ch. 36, pp. 2111–2245.

Preinerstorfer, D., Pötscher, B. M., 2017. On the power of invariant tests for hypotheses on a covariance matrix. Econometric Theory 33 (1), 1–68.

Rahman, S., King, M. L., 1997. Marginal-likelihood score-based tests of regression disturbances in the presence of nuisance parameters. Journal of Econometrics 82 (1), 81–106.

Roberts, L. A., 1995. On the existence of moments of ratios of quadratic forms. Econometric Theory 11 (4), 750–774.

Robinson, P. M., Rossi, F., 2015. Refinements in maximum likelihood inference on spatial autocorrelation in panel data. Journal of Econometrics 189 (2), 447–456.

Whittle, P., 1954. On stationary processes in the plane. Biometrika 41 (3/4), 434–449.

Yu, D., Bai, P., Ding, C., 2015. Adjusted quasi-maximum likelihood estimator for mixed regressive, spatial autoregressive model and its small sample bias. Computational Statistics and Data Analysis 87, 116–135.