Consistent specification testing under spatial dependence
Abhimanyu Gupta †‡   Xi Qu §¶

January 26, 2021
Abstract
We propose a series-based nonparametric specification test for a regression function when data are spatially dependent, the 'space' being of a general economic or social nature. Dependence can be parametric, parametric with increasing dimension, semiparametric or any combination thereof, thus covering a vast variety of settings. These include spatial error models of varying types and levels of complexity. Under a new smooth spatial dependence condition, our test statistic is asymptotically standard normal. To prove the latter property, we establish a central limit theorem for quadratic forms in linear processes in an increasing dimension setting. Finite sample performance is investigated in a simulation study and empirical examples illustrate the test with real-world data.
Keywords:
Specification testing, nonparametric regression, spatial dependence, cross-sectional dependence
JEL Classification:
C21, C55

∗ We thank Swati Chandna, Miguel Delgado, Emmanuel Guerre, Fernando López Hernández (discussant), Hon Ho Kwok, Arthur Lewbel, Daisuke Murakami (discussant), Ryo Okui and Amol Sasane for helpful discussions. We also thank seminar participants at YEAP 2018 (Shanghai University of Finance and Economics), NYU Shanghai, Carlos III Madrid, SEW 2018 (Dijon), Aarhus University, SEA 2018 (Vienna), EcoSta 2018 (Hong Kong), Hong Kong University, AFES 2018 (Cotonou), ESEM 2018 (Cologne), CFE 2018 (Pisa), University of York, Pennsylvania State University, Michigan State University, University of Michigan, Texas A&M University, 1st Southampton Workshop on Econometrics and Statistics and MEG 2019 (Columbus).
† Department of Economics, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK. E-mail: [email protected].
‡ Research supported by ESRC grant ES/R006032/1.
§ Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China, 200052. E-mail: [email protected].
¶ Research supported by the National Natural Science Foundation of China (Project no. 71973097) and Shanghai Institute of International Finance and Economics.

Introduction
Models for spatial dependence have recently become the subject of vigorous research. This burgeoning interest has roots in the needs of practitioners who frequently have access to data sets featuring inter-connected cross-sectional units. Motivated by these practical concerns, we propose a specification test for a regression function in a general setup that covers a vast variety of commonly employed spatial dependence models and permits the complexity of dependence to increase with sample size. Our test is consistent, in the sense that a parametric specification is tested with asymptotically unit power against a nonparametric alternative. The 'spatial' models that we study are not restricted in any way to be geographic in nature; indeed, 'space' can be a very general economic or social space.

Specification testing is an important problem, and this is reflected in a huge literature studying consistent tests. Much of this is based on independent, and often also identically distributed, data. However, data frequently exhibit dependence, and consequently a branch of the literature has also examined specification tests under time series dependence. Our interest centres on dependence across a 'space', which differs quite fundamentally from dependence in a time series context. Time series are naturally ordered and locations of the observations can be observed, or at least the process generating these locations may be modelled. It can be imagined that concepts from time series dependence be extended to settings where the data are observed on a geographic space and dependence can be treated as a decreasing function of distance between observations. Indeed, much work has been done to extend notions of time series dependence in this type of setting, see e.g. Jenish and Prucha (2009, 2012). However, in a huge variety of economics and social science applications agents influence each other in ways that do not conform to such a setting.
For example, farmers affect the demand of farmers in the same village but not in different villages, as in Case (1991). Likewise, price competition among firms exhibits spatial features (Pinkse, Slade, and Brett (2002)), input-output relations lead to complementarities between sectors (Conley and Dupor (2003)), co-author connections form among scientists (Oettl (2012), Mohnen (2020)), R&D spillovers occur through technology and product market spaces (Bloom, Schankerman, and van Reenen (2013)), networks form due to allegiances in conflicts (König, Rohner, Thoenig, and Zilibotti (2017)) and overlapping bank portfolios lead to correlated lending decisions (Gupta, Kokas, and Michaelides (2020)). Such examples cannot be studied by simply extending results developed for time series and require suitable methods.

A popular model for general spatial dependence is the spatial autoregressive (SAR) class, due to Cliff and Ord (1973). The key feature of SAR models, and various generalizations such as SARMA (SAR moving average) and matrix exponential spatial specifications (MESS, due to LeSage and Pace (2007)), is the presence of one or more spatial weight matrices whose elements characterize the links between agents. As noted above, these links may form for a variety of reasons, so 'spatial' represents a very general notion of space, such as social or economic space. Key papers on the estimation of SAR models and their variants include Kelejian and Prucha (1998) and Lee (2004), but research on various aspects of these is active, see e.g. Robinson and Rossi (2015); Hillier and Martellosio (2018a,b); Kuersteiner and Prucha (2020); Han, Lee, and Xu (2021); Hahn, Kuersteiner, and Mazzocco (2021).

Unlike work focusing on independent or time series data, a general drawback of spatially oriented research has been the lack of general unified theory. Typically, individual papers have studied specific special cases of various spatial specifications.
A strand of the literature has introduced the notion of a cross-sectional linear process to help address this problem, and we follow this approach. This representation can accommodate SAR models in the error term (so-called spatial error models (SEM)) as a special case, as well as variants like SARMA and MESS, whence its generality is apparent. The linear-process structure shares some similarities with that familiar from the time series literature (see e.g. Hannan (1970)). Indeed, time series versions may be regarded as very special cases but, as stressed before, the features of spatial dependence must be taken into account in the general formulation. Such a representation was introduced by Robinson (2011) and further examined in other situations by Robinson and Thawornkaiwong (2012) (partially linear regression), Delgado and Robinson (2015) (non-nested correlation testing), Lee and Robinson (2016) (series estimation of nonparametric regression) and Hidalgo and Schafgans (2017) (cross-sectionally dependent panels).

In this paper, we propose a test statistic similar to that of Hong and White (1995), based on estimating the nonparametric specification via series approximations. Assuming an independent and identically distributed sample, their statistic is based on the sample covariance between the residual from the parametric model and the discrepancy between the parametric and nonparametric fitted values. Allowing additionally for spatial dependence through the form of a linear process as discussed above, our statistic is shown to be asymptotically standard normal, consistent and possessing nontrivial power against local alternatives of a certain type. To prove asymptotic normality, we present a new central limit theorem (CLT) for quadratic forms in linear processes in an increasing dimension setting that may be of independent interest.
The setting of Su and Qu (2017) is a very special case of our framework. There has been recent interest in specification testing for spatial models, see for example Lee, Phillips, and Rossi (2020) for a consistent omnibus test.

Our linear process framework permits spatial dependence to be parametric, parametric with increasing dimension, semiparametric or any combination thereof, thus covering a vast variety of settings. A class of models of great empirical interest are 'higher-order' SAR models in the outcome variables, but with spatial dependence structure also in the errors. We initially present the familiar nonparametric regression to clarify the exposition, and then cover this class as the main model of interest. Our theory covers as special cases SAR, SMA, SARMA and MESS models for the error term. These specifications may be of any fixed order, but our theory also covers the case where they are of increasing order. Thus we permit a more complex model of spatial dependence as more data become available, which encourages a more flexible approach to modelling such dependence as stressed by Gupta and Robinson (2015, 2018) in a higher-order SAR context, Huber (1973), Portnoy (1984, 1985) and Anatolyev (2012) in a regression context and Koenker and Machado (1999) for the generalized method of moments setting, amongst others.

Our framework is also extended to the situation where spatial dependence occurs through nonparametric functions of raw distances (these may be economic or social distances, say), as in Pinkse et al. (2002). The case of geographical data is also covered, for example the important classes of Matérn and Wendland (see e.g. Gneiting (2002)) covariance functions. We also introduce a new notion of smooth spatial dependence that provides more primitive, and checkable, conditions for certain properties than extant ones in the literature.
To illustrate the performance of the test in finite samples, we present Monte Carlo simulations that exhibit satisfactory small sample properties. The test is demonstrated in three empirical examples, in some of which it rejects the null hypothesis of a linear regression and in others of which it does not, illustrating its ability to distinguish well between the null and alternative models.

The next section introduces our setup using a nonparametric regression with no SAR structure in responses. We treat this abstraction as a base case, and Section 3 discusses estimation and defines the test statistic, while Section 4 introduces assumptions and the key asymptotic results of the paper. Section 5 examines the most commonly employed higher-order SAR models, while Section 6 deals with nonparametric structures. Sections 7 and 8 contain a study of finite sample performance and the empirical examples respectively. Proofs are contained in appendices.
Setup
To illustrate our approach, we first consider the nonparametric regression

y_i = θ(x_i) + u_i,  i = 1, ..., n,   (2.1)

where θ(·) is an unknown function and x_i is a vector of strictly exogenous explanatory variables with support X ⊂ R^k. Spatial dependence is explicitly modeled via the error term u_i, which we assume is generated by

u_i = Σ_{s=1}^∞ b_{is} ε_s,   (2.2)

where the ε_s are independent random variables with zero mean and identical variance σ₀². Further conditions on the ε_s will be assumed later. The linear process coefficients b_{is} can depend on n, as may the covariates x_i. This is generally the case with spatial models and implies that asymptotic theory ought to be developed for triangular arrays. There are a number of reasons to permit dependence on sample size. The b_{is} can depend on spatial weight matrices, which are usually normalized for both stability and identification purposes. Such normalizations, e.g. row-standardization or division by spectral norm, may be n-dependent. Furthermore, x_i often includes underlying covariates of 'neighbours' defined by spatial weight matrices. For instance, for some n × 1 vector z and spatial weight matrix W ≡ W_n, a component of x_i can be e_i′Wz, where e_i has unity in the i-th position and zeroes elsewhere, which depends on n. Thus, subsequently, any spatial weight matrices will also be allowed to depend on n. Finally, treating triangular arrays permits re-labelling of quantities that is often required when dealing with spatial data, due to the lack of natural ordering, see e.g. Robinson (2011). We suppress explicit reference to this n-dependence of various quantities for brevity, although mention will be made of this at times to remind the reader of this feature.

Introduce three notational conventions for any parameter ν for the rest of the paper: ν ∈ R^{d_ν}, ν₀ denotes the true value of ν and, for any scalar, vector or matrix valued function f(ν), we denote f₀ ≡ f(ν₀).
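As a concrete illustration of the linear process (2.2) in its triangular-array form, the following sketch simulates u = Bε with a finite truncation of the infinite sum; the coefficient matrix B, the truncation point and all numerical choices are hypothetical choices of ours, not taken from the paper.

```python
import numpy as np

# Illustrative sketch (not from the paper): simulate the cross-sectional
# linear process u_i = sum_s b_is * eps_s of (2.2), truncating the infinite
# sum at S terms. In applications B would be implied by a spatial model and
# may depend on n (the triangular-array feature discussed above).
rng = np.random.default_rng(0)
n, S, sigma0 = 5, 200, 1.0

B = rng.normal(size=(n, S)) / np.sqrt(S)   # hypothetical coefficients b_is
eps = rng.normal(0.0, sigma0, size=S)      # independent innovations eps_s
u = B @ eps                                # u = B * eps

Sigma = B @ B.T                            # E(uu') = sigma0^2 * B B', cf. (2.3)
```

By construction Σ is symmetric and positive semi-definite, as a covariance matrix must be.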
Now, assume the existence of a d_γ × 1 vector γ such that b_{is} = b_{is}(γ₀), possibly with d_γ → ∞ as n → ∞, for all i = 1, ..., n and s ≥
1. Let u be the n × 1 vector with typical element u_i, let ε be the infinite dimensional vector with typical element ε_s, and let B be an infinite dimensional matrix with typical element b_{is}. In matrix form,

u = Bε and E(uu′) = σ₀²BB′ = σ₀²Σ₀ ≡ σ₀²Σ(γ₀).   (2.3)

We assume that γ₀ ∈ Γ, where Γ is a compact subset of R^{d_γ}. With d_γ diverging, ensuring Γ has bounded volume requires some care, see Gupta and Robinson (2018) for more detailed analysis. For a known function f(·), our aim is to test

H₀: P[f(x_i, α₀) = θ(x_i)] = 1, for some α₀ ∈ A,   (2.4)

where A ⊂ R^{d_α}, against the global alternative

H₁: P[f(x_i, α) = θ(x_i)] < 1, for all α ∈ A.

We now nest commonly used models for spatial dependence in (2.3). Introduce a set of n × n spatial weight (equivalently network adjacency) matrices W_j, j = 1, ..., m₁ + m₂. Each W_j can be thought of as representing dependence through a particular space. Now, consider models of the form Σ(γ) = A^{-1}(γ)A′^{-1}(γ). For example, with ξ denoting a vector of iid disturbances with variance σ₀², the model with SARMA(m₁, m₂) errors is u = Σ_{j=1}^{m₁} γ_j W_j u + Σ_{j=m₁+1}^{m₁+m₂} γ_j W_j ξ + ξ, with A(γ) = (I_n + Σ_{j=m₁+1}^{m₁+m₂} γ_j W_j)^{-1}(I_n − Σ_{j=1}^{m₁} γ_j W_j), assuming conditions that guarantee the existence of the inverse. Such conditions can be found in the literature, see e.g. Lee and Liu (2010) and Gupta and Robinson (2018). The SEM model is obtained by setting m₂ = 0, while the model with SMA errors has m₁ = 0. The model with MESS(m₁) errors (LeSage and Pace (2007), Debarsy, Jin, and Lee (2015)) is u = exp(Σ_{j=1}^{m₁} γ_j W_j)ξ, with A(γ) = exp(−Σ_{j=1}^{m₁} γ_j W_j).

In some cases the space under consideration is geographic, i.e. the data may be observed at irregular points in Euclidean space. For a generic matrix A, denote ‖A‖ = [ϕ̄(A′A)]^{1/2}, i.e. the spectral norm of A, which reduces to the Euclidean norm if A is a vector.
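The error-covariance families just described are straightforward to construct numerically. The sketch below is our own illustration (the circulant weight matrix and parameter values are hypothetical): it builds Σ(γ) = A^{-1}(γ)A′^{-1}(γ) for SEM and SARMA errors, with SEM and SMA obtained from the SARMA routine by leaving the AR or MA part empty.

```python
import numpy as np

def sigma_sem(W, g):
    """SEM errors u = g W u + xi: A(g) = I - g W, Sigma = A^{-1} A'^{-1}."""
    A = np.eye(W.shape[0]) - g * W
    Ai = np.linalg.inv(A)
    return Ai @ Ai.T

def sigma_sarma(W_ar, W_ma, g_ar, g_ma):
    """SARMA errors: A(g) = (I + sum_j g_ma[j] W_ma[j])^{-1}
    (I - sum_j g_ar[j] W_ar[j]); empty lists give the SEM or SMA cases."""
    n = (W_ar + W_ma)[0].shape[0]
    P = np.eye(n) - sum(g * W for g, W in zip(g_ar, W_ar))
    Q = np.eye(n) + sum(g * W for g, W in zip(g_ma, W_ma))
    Ai = np.linalg.inv(np.linalg.inv(Q) @ P)
    return Ai @ Ai.T

# Hypothetical example: row-normalised 'two nearest neighbours' weights.
n = 6
W = (np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)) / 2
Sigma = sigma_sem(W, 0.4)
```

Row-normalisation keeps ‖W‖ bounded, so I_n − γW is invertible for |γ| < 1 here, in line with the invertibility conditions referenced above.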
Making the identification u_i ≡ U(t_i), t_i ∈ R^d for some d > 1, and assuming covariance stationarity, U(t) is said to follow an isotropic model if, for some function δ on R, the covariance at lag s is r(s) = E[U(t)U(t + s)] = δ(‖s‖). An important class of parametric isotropic models is that of Matérn (1986), which can be parameterized in several ways, see e.g. Stein (1999). Denoting by Γ_f the Gamma function and by K_{γ₁} the modified Bessel function of the second kind (Gradshteyn and Ryzhik (1994)), take δ(‖s‖, γ) = (2^{γ₁−1}Γ_f(γ₁))^{-1}(γ₂^{-1}√γ₁‖s‖)^{γ₁}K_{γ₁}(γ₂^{-1}√γ₁‖s‖), with γ₁, γ₂ > 0, so that d_γ = 2. With d_γ = 3, another model takes δ(‖s‖, γ) = γ₁ exp(−‖s/γ₂‖^{γ₃}), see e.g. De Oliveira, Kedem, and Short (1997), Stein (1999). Fuentes (2007) considers this model with γ₃ = 1, as well as a specific parameterization of the Matérn covariance function.

Estimation and test statistic

We estimate θ(·) via a series approximation. Certain technical conditions are needed to allow for X to have unbounded support. To this end, for a function g(x) on X, define a weighted sup-norm (see e.g. Gallant and Nychka (1987), Chen, Hong, and Tamer (2005), Chen (2007), Lee and Robinson (2016)) by ‖g‖_w = sup_{x∈X} |g(x)|(1 + ‖x‖²)^{-w/2}, for some w > 0. Assume that there exists a sequence of functions ψ_i := ψ(x_i): R^k → R^p, where p → ∞ as n → ∞, and a p × 1 vector β such that

θ(x_i) = ψ_i′β + e(x_i),   (3.1)

where e(·) satisfies:

Assumption R.1.
There exists a constant µ > 0 such that ‖e‖_{w_x} = O(p^{-µ}) as p → ∞, where w_x ≥ 0 is the largest value such that sup_{i=1,...,n} E‖x_i‖^{w_x} < ∞, for all n.

By Lemma 1 in Appendix B of Lee and Robinson (2016), this assumption implies that

sup_{i=1,...,n} E(e(x_i))² = O(p^{-2µ}).   (3.2)

Due to the large number of assumptions in the paper, sometimes with changes reflecting only the various setups we consider, we prefix assumptions with R in this section and the next, to signify 'regression'. In Section 5 the prefix is SAR, for 'spatial autoregression', while in Section 6 we use NPN, for 'nonparametric'.

Let y = (y₁, ..., y_n)′, θ = (θ(x₁), ..., θ(x_n))′ and Ψ = (ψ₁, ..., ψ_n)′. We will estimate γ₀ using a quasi maximum likelihood estimator (QMLE) based on a Gaussian likelihood, although Gaussianity is nowhere assumed. For any admissible values β, σ² and γ, the (multiplied by 2/n) negative quasi log likelihood function based on using the approximation (3.1) is

L(β, σ², γ) = ln(2πσ²) + n^{-1}ln|Σ(γ)| + (nσ²)^{-1}(y − Ψβ)′Σ(γ)^{-1}(y − Ψβ),   (3.3)

which is minimised with respect to β and σ² by

β̄(γ) = (Ψ′Σ(γ)^{-1}Ψ)^{-1}Ψ′Σ(γ)^{-1}y,   (3.4)
σ̄²(γ) = n^{-1}y′C(γ)′M(γ)C(γ)y,   (3.5)

where M(γ) = I_n − C(γ)Ψ(Ψ′Σ(γ)^{-1}Ψ)^{-1}Ψ′C(γ)′ and C(γ) is the n × n matrix such that C(γ)C(γ)′ = Σ(γ)^{-1}. Thus the concentrated likelihood function is

L(γ) = ln(2π) + ln σ̄²(γ) + n^{-1}ln|Σ(γ)|.   (3.6)

We define the QMLE of γ₀ as γ̂ = arg min_{γ∈Γ} L(γ) and the QMLEs of β and σ² as β̂ = β̄(γ̂) and σ̂² = σ̄²(γ̂). For a given (suppressed) x, the series estimate of θ(x) is defined as

θ̂ = ψ(x)′β̂.   (3.7)

Let α̂_n ≡ α̂ denote an estimator consistent for α₀ under H₀, for example the (nonlinear) least squares estimator.
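The profiling steps (3.4)-(3.6) suggest a simple grid-search implementation. The Python sketch below is our own illustrative rendering; the function names and the use of a grid rather than a numerical optimiser are assumptions of ours, not the paper's implementation.

```python
import numpy as np

def profile_qmle(y, Psi, sigma_of_gamma, gamma_grid):
    """Sketch of the QMLE: for each gamma, profile out beta and sigma^2
    as in (3.4)-(3.5), then minimise the concentrated likelihood (3.6).
    sigma_of_gamma maps gamma to the n x n matrix Sigma(gamma)."""
    n = len(y)
    best = None
    for g in gamma_grid:
        Sinv = np.linalg.inv(sigma_of_gamma(g))
        # beta_bar(gamma) = (Psi' Sigma^{-1} Psi)^{-1} Psi' Sigma^{-1} y
        beta = np.linalg.solve(Psi.T @ Sinv @ Psi, Psi.T @ Sinv @ y)
        r = y - Psi @ beta
        s2 = (r @ Sinv @ r) / n                    # sigma_bar^2(gamma)
        _, logdet_Sinv = np.linalg.slogdet(Sinv)   # = -ln|Sigma(gamma)|
        L = np.log(2 * np.pi) + np.log(s2) - logdet_Sinv / n
        if best is None or L < best[0]:
            best = (L, g, beta, s2)
    _, g_hat, beta_hat, s2_hat = best
    return g_hat, beta_hat, s2_hat
```

With Σ(γ) ≡ I_n the procedure collapses to ordinary least squares, which provides a quick sanity check.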
Note that α̂ is consistent only under H₀, so we introduce a general probability limit of α̂, as in Hong and White (1995).

Assumption R.2.
There exists a deterministic sequence α*_n ≡ α* such that α̂ − α* = O_p(1/√n).

Following Hong and White (1995), define the regression error u_i ≡ y_i − f(x_i, α*) and the specification error v_i ≡ θ(x_i) − f(x_i, α*). Our test statistic is based on an appropriately scaled and centred version of

m̂_n = σ̂^{-2}v̂′Σ(γ̂)^{-1}û/n = σ̂^{-2}(θ̂ − f(x, α̂))′Σ(γ̂)^{-1}(y − f(x, α̂))/n,

where f(x, α) = (f(x₁, α), ..., f(x_n, α))′. Precisely, it is defined as

T_n = (n m̂_n − p)/√(2p).   (3.8)

The motivation for such a centering and scaling stems from the fact that, for fixed p, n m̂_n has an asymptotic χ²_p distribution. Such a distribution has mean p and variance 2p, and it is a well-known fact that (χ²_p − p)/√(2p) →_d N(0, 1) as p → ∞. This motivates our use of (3.8) and explains why we aspire to establish a standard normal distribution under the null hypothesis.

Consistency of γ̂

We first provide conditions under which our estimator γ̂ of γ₀ is consistent. Such a property is necessary for the results that follow. Let ϕ̄(A) (respectively ϕ(A)) denote the largest (respectively smallest) eigenvalue of a generic square nonnegative definite matrix A. The following assumption is a rather standard type of asymptotic boundedness and full-rank condition on Σ(γ).

Assumption R.3. lim_{n→∞} sup_{γ∈Γ} ϕ̄(Σ(γ)) < ∞ and lim_{n→∞} inf_{γ∈Γ} ϕ(Σ(γ)) > 0.

Assumption R.4. The u_i, i = 1, ..., n, satisfy the representation (2.2). The ε_s, s ≥ 1, have zero mean, finite third and fourth moments µ₃ and µ₄ respectively and, denoting by σ_{ij}(γ) the (i, j)-th element of Σ(γ) and defining b*_{is} = b_{is}/σ_{ii}^{1/2}, i = 1, ..., n, n ≥ 1, s ≥ 1, we have

lim_{n→∞} sup_{i=1,...,n} Σ_{s=1}^∞ |b*_{is}| + sup_{s≥1} lim_{n→∞} Σ_{i=1}^n |b*_{is}| < ∞.   (4.1)

By Assumption R.3, σ_{ii} is bounded and bounded away from zero, so the normalization of the b_{is} in Assumption R.4 is well defined.
The summability conditions in (4.1) are typical conditions on linear process coefficients that are needed to control dependence; for instance, in the case of stationary time series b*_{is} = b*_{i−s}. The infinite linear process assumed in (2.2) is further discussed by Robinson (2011), who introduced it, and also by Delgado and Robinson (2015).

Because we often need to consider the difference between values of the matrix-valued function Σ(·) at distinct points, it is useful to introduce an appropriate concept of 'smoothness'. This concept occurs in functional analysis and is defined below.

Definition 1.
Let (X, ‖·‖_X) and (Y, ‖·‖_Y) be Banach spaces, L(X, Y) be the Banach space of linear continuous maps from X to Y with norm ‖T‖_{L(X,Y)} = sup_{‖x‖_X ≤ 1} ‖T(x)‖_Y, and U be an open subset of X. A map F: U → Y is said to be Fréchet-differentiable at u ∈ U if there exists L ∈ L(X, Y) such that

lim_{‖h‖_X → 0} ‖F(u + h) − F(u) − L(h)‖_Y / ‖h‖_X = 0.   (4.2)

L is called the Fréchet-derivative of F at u. The map F is said to be Fréchet-differentiable on U if it is Fréchet-differentiable for all u ∈ U.

The above definition extends the notion of a derivative that is familiar from real analysis to functional spaces and allows us to check high-level assumptions that past literature has imposed. To the best of our knowledge, this is the first use of such a concept in the literature on spatial/network models. Denote by M_{n×n} the set of real, symmetric and positive semi-definite n × n matrices. Let Γ_o be an open subset of Γ and consider the Banach spaces (Γ, ‖·‖_g) and (M_{n×n}, ‖·‖), where ‖·‖_g is a generic ℓ_p norm, p ≥
1. Denote by c (C) generic positive constants, independent of p, d_γ and n, and arbitrarily small (big). The following assumption ensures that Σ(·) is a 'smooth' function, in the sense of Fréchet-smoothness.

Assumption R.5.
The map Σ: Γ_o → M_{n×n} is Fréchet-differentiable on Γ_o, with Fréchet-derivative denoted DΣ ∈ L(Γ_o, M_{n×n}). Furthermore, the map DΣ satisfies

sup_{γ∈Γ_o} ‖DΣ(γ)‖_{L(Γ_o, M_{n×n})} ≤ C.   (4.3)

Assumption R.5 is a functional smoothness condition on spatial dependence. It has the advantage of being checkable for a variety of commonly employed models. For example, a first-order SEM has Σ(γ) = A^{-1}(γ)A′^{-1}(γ) with A(γ) = I_n − γW. Corollary B.1 in the appendix shows that (DΣ(γ))(γ†) = γ†A^{-1}(γ)(G′(γ) + G(γ))A′^{-1}(γ) at a given point γ ∈ Γ_o, where G(γ) = WA^{-1}(γ). Then, taking

‖W‖ + sup_{γ∈Γ} ‖A^{-1}(γ)‖ < C   (4.4)

yields Assumption R.5. Condition (4.4) limits the extent of spatial dependence and is very standard in the spatial literature; see e.g. Lee (2004) and numerous subsequent papers employing similar conditions. Fréchet derivatives for higher-order SAR, SMA, SARMA and MESS error structures are computed in Appendix B, in Lemmas B.5-B.6 and Corollaries B.1-B.2. The following proposition is very useful in 'linearizing' perturbations in Σ(·).
If Assumption R.5 holds, then for any γ₁, γ₂ ∈ Γ_o,

‖Σ(γ₁) − Σ(γ₂)‖ ≤ C‖γ₁ − γ₂‖_g.   (4.5)

To illustrate how the concept of Fréchet-differentiability allows us to check high-level assumptions extant in the literature, a consequence of Proposition 4.1 is the following corollary, a version of which appears as an assumption in Delgado and Robinson (2015).

Corollary 4.1.
For any γ* ∈ Γ_o and any η > 0,

lim_{n→∞} sup_{γ∈{γ: ‖γ−γ*‖_g<η}∩Γ_o} ‖Σ(γ) − Σ(γ*)‖ < Cη.   (4.6)

We now introduce regularity conditions needed to establish the consistency of γ̂. For a generic matrix A, let ‖A‖_F = [tr(AA′)]^{1/2}. Define σ²(γ) = n^{-1}σ₀²tr(Σ(γ)^{-1}Σ₀) = n^{-1}σ₀²‖C(γ)′C₀′^{-1}‖²_F, which is nonnegative by definition and bounded by Assumption R.3.

Assumption R.6. c ≤ σ²(γ) ≤ C for all γ ∈ Γ.

Assumption R.7. γ₀ ∈ Γ and, for any η > 0,

lim_{n→∞} inf_{γ∈N̄_{γ₀}(η)} [n^{-1}tr(Σ(γ)^{-1}Σ₀)] / [|Σ(γ)^{-1}Σ₀|^{1/n}] > 1,   (4.7)

where N̄_{γ₀}(η) = Γ \ N_{γ₀}(η) and N_{γ₀}(η) = {γ: ‖γ − γ₀‖_g < η} ∩ Γ.

Assumption R.8. {ϕ(n^{-1}Ψ′Ψ)}^{-1} + ϕ̄(n^{-1}Ψ′Ψ) = O_p(1).

Assumption R.6 is a boundedness condition originally considered in Gupta and Robinson (2018), while Assumptions R.7 and R.8 are identification conditions. Indeed, Assumption R.7 requires that Σ(γ) be identifiable in a small neighbourhood around γ₀. This is apparent on noticing that the ratio in (4.7) is at least one by the inequality between arithmetic and geometric means, and equals one when Σ(γ) = Σ₀. Similar assumptions arise frequently in related literature, see e.g. Lee (2004), Delgado and Robinson (2015). Assumption R.8 is a typical asymptotic boundedness and non-multicollinearity condition, see e.g. Newey (1997) and much other literature on series estimation. By Assumption R.3, it implies sup_{γ∈Γ} {ϕ(n^{-1}Ψ′Σ(γ)^{-1}Ψ)}^{-1} = O_p(1).
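Both the first-order SEM Fréchet-derivative formula quoted from Corollary B.1 and the Lipschitz property of Proposition 4.1 can be verified numerically. The sketch below is our own illustration, with a hypothetical row-normalised weight matrix; the constant in the final bound is an arbitrary choice that suffices on the interval considered.

```python
import numpy as np

def sigma_sem(W, g):
    """First-order SEM covariance Sigma(g) = A^{-1} A'^{-1}, A = I - g W."""
    Ai = np.linalg.inv(np.eye(W.shape[0]) - g * W)
    return Ai @ Ai.T

n = 6
W = (np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)) / 2
g0, t, h = 0.3, 1.0, 1e-6

# (D Sigma(g0))(t) = t * A^{-1} (G' + G) A'^{-1}, G = W A^{-1} (Corollary B.1),
# compared with a one-sided finite-difference approximation.
Ai = np.linalg.inv(np.eye(n) - g0 * W)
G = W @ Ai
analytic = t * Ai @ (G.T + G) @ Ai.T
numeric = (sigma_sem(W, g0 + h * t) - sigma_sem(W, g0)) / h
assert np.allclose(analytic, numeric, atol=1e-4)

# Spectral-norm Lipschitz continuity on a compact interval, cf. (4.5).
gap = np.linalg.norm(sigma_sem(W, 0.35) - sigma_sem(W, 0.30), ord=2)
assert gap <= 100 * abs(0.35 - 0.30)
```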
Under Assumptions R.1-R.8 and p^{-1} + d_γ^{-1} + (d_γ + p)/n → 0 as n → ∞, ‖γ̂ − γ₀‖ →_p 0.

Note that while we explicitly allow d_γ → ∞ in the theorem statement above, this is not necessary for the result, which also holds as a particular case if d_γ is fixed. Write Σ_j(γ) = ∂Σ(γ)/∂γ_j, j = 1, ..., d_γ, the matrix differentiated element-wise.

Assumption R.9. lim_{n→∞} sup_{j=1,...,d_γ} ‖Σ_{j0}‖ < ∞.

We will later consider the sequence of local alternatives

H_{ℓn} ≡ H_ℓ: f(x_i, α*_n) = θ(x_i) + (p^{1/4}/n^{1/2})h(x_i),   (4.8)

where h is square integrable on the support X of the x_i. Under the null H₀, we have h(x_i) = 0.

Assumption R.10.
For each n ∈ N and i = 1, ..., n, the function f: X × A → R is such that f(x_i, α) is measurable for each α ∈ A and f(x_i, ·) is a.s. continuous on A, with f(x_i, ·) ≤ D_n(x_i), where sup_{n∈N} D_n(x_i) is integrable, and sup_{α∈A} ‖∂f(x_i, α)/∂α‖ ≤ D_n(x_i), sup_{α∈A} ‖∂²f(x_i, α)/∂α∂α′‖ ≤ D_n(x_i).

Define V = B′Σ₀^{-1}Ψ(Ψ′Σ₀^{-1}Ψ)^{-1}Ψ′Σ₀^{-1}B, which is symmetric, idempotent and has rank p. We now show that our test statistic is approximated by a quadratic form in ε, weighted by V.

Theorem 4.2.
Under Assumptions R.1-R.10, p^{-1} + d_γ^{-1} + p(p + d_γ)/n + √n/p^{µ+1/2} → 0 as n → ∞, and H₀,

T_n − (σ₀^{-2}ε′Vε − p)/√(2p) = o_p(1).

Denote by ‖A‖_R the maximum absolute row sum of a generic matrix A.

Assumption R.11. lim_{n→∞} ‖Σ₀^{-1}‖_R < ∞.

Because ‖Σ₀^{-1}‖ ≤ ‖Σ₀^{-1}‖_R, this restriction on spatial dependence is somewhat stronger than a restriction on spectral norm but is typically imposed for central limit theorems in this type of setting, cf. Lee (2004), Delgado and Robinson (2015), Gupta and Robinson (2018). The next assumption is needed in our proofs to check a Lyapunov condition.

Assumption R.12.
The ε_s, s ≥ 1, have finite eighth moment.

Assumption R.13. E|ψ(x_i)|⁴ < C.

The next theorem establishes the asymptotic normality of the approximating quadratic form introduced above.
Theorem 4.3.
Under Assumptions R.3, R.4, R.8, R.11-R.13 and p^{-1} + p²/n → 0 as n → ∞,

(σ₀^{-2}ε′Vε − p)/√(2p) →_d N(0, 1).

This is a new type of CLT, integrating both a linear process framework as well as an increasing dimension element. A linear-quadratic form in iid disturbances is treated by Kelejian and Prucha (2001), while a quadratic form in a linear process framework is treated by Delgado and Robinson (2015). However, both results are established in a parametric framework, entailing no increasing dimension aspect of the type we face with p → ∞.

Next, we summarize the properties of our test statistic in a theorem that records its asymptotic normality under the null, consistency and ability to detect local alternatives at the p^{1/4}/n^{1/2} rate. This rate has been found also by De Jong and Bierens (1994) and Gupta (2018). Introduce the quantity κ = (√2 σ₀²)^{-1} plim_{n→∞} n^{-1}h′Σ₀^{-1}h, where h = (h(x₁), ..., h(x_n))′ and h(x_i) is from (4.8).
Under the conditions of Theorems 4.2 and 4.3, (1) T_n →_d N(0, 1) under H₀, (2) T_n is a consistent test statistic (i.e. it has asymptotically unit power under H₁), (3) T_n →_d N(κ, 1) under the local alternatives H_ℓ.

Models with SAR structure in responses
The previous sections provided the ingredients for the study of our main model of interest, viz.

y_i = Σ_{j=1}^{d_λ} λ_j w′_{i,j}y + θ(x_i) + u_i,  i = 1, ..., n,   (5.1)

where W_j, j = 1, ..., d_λ, are known spatial weight matrices with i-th rows denoted w′_{i,j}, as discussed earlier, and the λ_j are unknown parameters measuring the strength of spatial dependence. The error structure remains the same as in (2.2). Here spatial dependence arises not only in errors but also in responses. For example, this corresponds to a situation where agents in a network influence each other both in their observed and unobserved actions. While the model in (5.1) is new in the literature, some related ones are discussed here. Models such as (5.1) but without dependence in the error structure are considered by Su and Jin (2010) and Gupta and Robinson (2015, 2018), but the former consider only d_λ = 1 and the latter only parametric θ(·). A specification test of a linear θ(·) against a nonparametric θ(·) is proposed by Su and Qu (2017). In comparison, our model is much more general and our test can handle more general parametric null hypotheses.

Denoting S(λ) = I_n − Σ_{j=1}^{d_λ} λ_j W_j, the quasi likelihood function based on Gaussianity and conditional on covariates is

L(β, σ², φ) = ln(2πσ²) − (2/n)ln|S(λ)| + n^{-1}ln|Σ(γ)| + (σ²n)^{-1}(S(λ)y − Ψβ)′Σ(γ)^{-1}(S(λ)y − Ψβ),   (5.2)

at any admissible point (β′, φ′, σ²)′ with φ = (λ′, γ′)′, for nonsingular S(λ) and Σ(γ). For given φ = (λ′, γ′)′, (5.2) is minimised with respect to β and σ² by

β̄(φ) = (Ψ′Σ(γ)^{-1}Ψ)^{-1}Ψ′Σ(γ)^{-1}S(λ)y,   (5.3)
σ̄²(φ) = n^{-1}y′S′(λ)C(γ)′M(γ)C(γ)S(λ)y.   (5.4)

The QMLE of φ₀ is φ̂ = arg min_{φ∈Φ} L(φ), where

L(φ) = ln σ̄²(φ) + n^{-1}ln|S′^{-1}(λ)Σ(γ)S^{-1}(λ)|,   (5.5)

and Φ = Λ × Γ is taken to be a compact subset of R^{d_λ+d_γ}.
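The profiling in (5.3)-(5.5) parallels the pure-regression case, now with an inner loop over λ. The Python sketch below is our own illustrative rendering; the grid search and function names are assumptions of ours, not the paper's implementation.

```python
import numpy as np

def sar_profile_qmle(y, Psi, W_list, sigma_of_gamma, lam_grid, gamma_grid):
    """Grid-search sketch of the QMLE of phi = (lambda', gamma')':
    profile out beta and sigma^2 via (5.3)-(5.4) and minimise (5.5)."""
    n = len(y)
    best = None
    for lam in lam_grid:
        lam = np.atleast_1d(lam)
        S = np.eye(n) - sum(l * W for l, W in zip(lam, W_list))
        Sy = S @ y
        _, logdet_S = np.linalg.slogdet(S)
        for g in gamma_grid:
            Sinv = np.linalg.inv(sigma_of_gamma(g))
            beta = np.linalg.solve(Psi.T @ Sinv @ Psi, Psi.T @ Sinv @ Sy)
            r = Sy - Psi @ beta
            s2 = (r @ Sinv @ r) / n
            _, logdet_Sinv = np.linalg.slogdet(Sinv)  # = -ln|Sigma(gamma)|
            # L(phi) = ln sigma_bar^2(phi) + n^{-1} ln|S'^{-1} Sigma S^{-1}|
            L = np.log(s2) - (logdet_Sinv + 2 * logdet_S) / n
            if best is None or L < best[0]:
                best = (L, lam, g, beta, s2)
    return best[1:]   # (lambda_hat, gamma_hat, beta_hat, sigma2_hat)
```

Setting all weight matrices to zero and Σ(γ) ≡ I_n again collapses the procedure to least squares, which gives a quick sanity check.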
The QMLEs of β and σ² are defined as β̄(φ̂) ≡ β̂ and σ̄²(φ̂) ≡ σ̂² respectively. The following assumption controls spatial dependence and is discussed below equation (4.4).

Assumption SAR.1. max_{j=1,...,d_λ} ‖W_j‖ + ‖S₀^{-1}‖ < C.

Writing T(λ) = S(λ)S₀^{-1} and φ = (λ′, γ′)′, define the quantity σ²(φ) = n^{-1}σ₀²tr(T′(λ)Σ(γ)^{-1}T(λ)Σ₀) = n^{-1}σ₀²‖Σ(γ)^{-1/2}T(λ)Σ₀^{1/2}‖²_F, which is nonnegative by definition and bounded by Assumptions R.3 and SAR.1. The assumptions below directly extend Assumptions R.6 and R.7 to the present setup.

Assumption SAR.2. c ≤ σ²(φ) ≤ C for all φ ∈ Φ.

Assumption SAR.3. φ₀ ∈ Φ and, for any η > 0,

lim_{n→∞} inf_{φ∈N̄_{φ₀}(η)} [n^{-1}tr(T′(λ)Σ(γ)^{-1}T(λ)Σ₀)] / [|T′(λ)Σ(γ)^{-1}T(λ)Σ₀|^{1/n}] > 1,   (5.6)

where N̄_{φ₀}(η) = Φ \ N_{φ₀}(η) and N_{φ₀}(η) = {φ: ‖φ − φ₀‖ < η} ∩ Φ.

We now introduce an identification condition that is required in the setup of this section.
Assumption SAR.4. β₀ ≠ 0 and, for any η > 0,

P( lim_{n→∞} inf_{(λ′,γ′)′∈Λ×N̄_{γ₀}(η)} n^{-1}β₀′Ψ′T′(λ)C(γ)′M(γ)C(γ)T(λ)Ψβ₀ / ‖β₀‖² > 0 ) = 1.   (5.7)

Upon performing minimization with respect to β, the event inside the probability in (5.7) is equivalent to the event

lim_{n→∞} min_{β∈R^p} inf_{(λ′,γ′)′∈Λ×N̄_{γ₀}(η)} n^{-1}(Ψβ − T(λ)Ψβ₀)′Σ(γ)^{-1}(Ψβ − T(λ)Ψβ₀) / ‖β₀‖² > 0,

making the identifying nature of the assumption transparent. A similar identifying assumption is used by Gupta and Robinson (2018), and indeed in the context of nonlinear regression by Robinson (1972).

Theorem 5.1.
Under Assumptions R.1-R.5, R.8, SAR.1-SAR.4 and p^{-1} + d_γ^{-1} + (d_γ + p)/n → 0 as n → ∞,

||(φ̂', σ̂²)' − (φ₀', σ₀²)'|| →_p 0 as n → ∞.

Theorem 5.2. Under Assumptions R.1-R.5, R.8-R.10, SAR.1-SAR.4,

p^{-1} + d_γ^{-1} + p(p + d_γ)/n + √n/p^{μ+1/2} + d_γ/p → 0, as n → ∞,

then under H₀,

T_n − (σ₀^{-2} ε'Vε − p)/√(2p) = o_p(1).

Theorem 5.3.
Under the conditions of Theorems 4.3, 5.1 and 5.2, (1) T_n →_d N(0,1) under H₀; (2) T_n is a consistent test statistic (i.e. it has asymptotically unit power under H₁); (3) T_n →_d N(κ, 1) under the local alternatives H_ℓ.

In this section we are motivated by settings where spatial dependence occurs through nonparametric functions of raw distances (the distance may be geographic, social, economic, or of any other type), as is the case in Pinkse et al. (2002), for example. In their kind of setup, d_ij is a raw economic distance between units i and j and the corresponding element of the spatial weight matrix is given by w_ij = ζ(d_ij), where ζ(·) is an unknown nonparametric function. Pinkse et al. (2002) use such a setup in a SAR model like (5.1), but with a linear regression function. In contrast, in keeping with the focus of this paper, we instead model dependence in the errors in this manner. Our formulation is rather general, covering, for example, a specification like w_ij = f(γ, ζ(d_ij)), with f(·) a known function, γ an unknown parameter of possibly increasing dimension, and ζ(·) an unknown nonparametric function. Let Ξ be a compact space of functions, on which we will specify more conditions later. For notational simplicity we abstract away from the SAR dependence in the responses. Thus we consider (2.1), but with linear process coefficients now b_{js}(γ, ζ(z)), with ζ(·) = (ζ₁(·), ..., ζ_{d_ζ}(·))' a fixed-dimensional vector of real-valued nonparametric functions with ζ_i ∈ Ξ for each i = 1, ..., d_ζ, and z a fixed-dimensional vector of data, independent of the ε_s, s ≥
1, with support Z. We base our estimation on approximating each ζ_i(z), i = 1, ..., d_ζ, by the series representation δ_i'ϕ_i(z), where ϕ_i(z) ≡ ϕ_i is an r_i × 1 (r_i → ∞ as n → ∞) vector of basis functions with typical function ϕ_{il}, l = 1, ..., r_i. The set of linear combinations δ_i'ϕ_i(z) forms the sequence of sieve spaces Φ_{r_i} ⊂ Ξ as r_i → ∞, for any i = 1, ..., d_ζ, and

ζ_i(z) = δ_i'ϕ_i + ν_i,  (6.1)

with the following restriction on the function space Ξ:

Assumption NPN.1.
For some scalars κ_i > 0, ||ν_i||_{w_z} = O(r_i^{-κ_i}) as r_i → ∞, i = 1, ..., d_ζ, where w_z ≥ 1 is the largest value such that sup_{z ∈ Z} E||z||^{w_z} < ∞, and

sup_{z ∈ Z} E(ν_i²) = O(r_i^{-2κ_i}),  i = 1, ..., d_ζ.  (6.2)

Thus we now have an infinite-dimensional nuisance parameter ζ(·) and an increasing-dimensional nuisance parameter γ. Writing Σ_{i=1}^{d_ζ} r_i = r and τ = (γ', δ₁', ..., δ_{d_ζ}')', which has increasing dimension d_τ = d_γ + r, define ς(r) = sup_{z ∈ Z; i=1,...,d_ζ} ||ϕ_i||. For any admissible values β, σ² and τ, the redefined (multiplied by 2/n) negative quasi log likelihood function based on using the approximations (3.1) and (6.1) is

L(β, σ², τ) = ln(2πσ²) + (1/n) ln|Σ(τ)| + (1/(nσ²)) (y − Ψβ)' Σ(τ)^{-1} (y − Ψβ),  (6.3)

which is minimised with respect to β and σ² by

β̄(τ) = (Ψ'Σ(τ)^{-1}Ψ)^{-1} Ψ'Σ(τ)^{-1} y,  (6.4)

σ̄²(τ) = n^{-1} y'C(τ)' M(τ) C(τ) y,  (6.5)

where M(τ) = I_n − C(τ)Ψ(Ψ'Σ(τ)^{-1}Ψ)^{-1}Ψ'C(τ)' and C(τ) is the n × n matrix such that C(τ)C(τ)' = Σ(τ)^{-1}. Thus the concentrated likelihood function is

L(τ) = ln(2π) + ln σ̄²(τ) + (1/n) ln|Σ(τ)|.  (6.6)

Again, for compact Γ and sieve coefficient space ∆, the QMLE of τ is τ̂ = arg min_{τ ∈ Γ×∆} L(τ) and the QMLEs of β₀ and σ₀² are β̂ = β̄(τ̂) and σ̂² = σ̄²(τ̂). For a given x, the series estimate of θ(x) is defined as θ̂(x) = ψ(x)'β̂. Define also the product Banach space T = Γ × Ξ^{d_ζ} with norm ||(γ', ζ')'||_{T,w} = ||γ|| + Σ_{i=1}^{d_ζ} ||ζ_i||_w, and consider the map Σ : T^o → M_{n×n}, where T^o is an open subset of T.

Assumption NPN.2.
The map Σ : T^o → M_{n×n} is Fréchet-differentiable on T^o, with Fréchet-derivative denoted DΣ ∈ L(T^o, M_{n×n}). Furthermore, conditional on z, the map DΣ satisfies

sup_{t ∈ T^o} ||DΣ(t)||_{L(T^o, M_{n×n})} ≤ C,  (6.7)

on its domain T^o.

Proposition 6.1.
If Assumption NPN.2 holds, then for any t₁, t₂ ∈ T^o, conditional on z,

||Σ(t₁) − Σ(t₂)|| ≤ Cς(r) ||t₁ − t₂||.  (6.8)

Corollary 6.1. For any t* ∈ T^o and any η > 0, conditional on z,

lim_{n→∞} sup_{t ∈ {t : ||t − t*|| < η} ∩ T^o} ||Σ(t) − Σ(t*)|| < Cς(r)η.  (6.9)

Assumption NPN.3. c ≤ σ²(τ) ≤ C for τ ∈ Γ × ∆, conditional on z.

Denote Σ(τ₀) = Σ₀. Note that this is not the true covariance matrix, which is Σ ≡ Σ(γ₀, ζ₀).

Assumption NPN.4. τ₀ ∈ Γ × ∆ and, for any η > 0, conditional on z,

lim_{n→∞} inf_{τ ∈ N̄_τ(η)} [ n^{-1} tr(Σ(τ)^{-1}Σ) / |Σ(τ)^{-1}Σ|^{1/n} ] > 1,  (6.10)

where N̄_τ(η) = (Γ × ∆) \ N_τ(η) and N_τ(η) = {τ : ||τ − τ₀|| < η} ∩ (Γ × ∆).

Remark 1.
Expressing the identification condition in Assumption NPN.4 in terms of τ implies that identification is guaranteed via the sieve spaces Φ_{r_i}, i = 1, ..., d_ζ. This approach is common in the sieve estimation literature; see e.g. Chen (2007), p. 5589, Condition 3.1.

Theorem 6.1.
Under Assumptions R.1-R.4 (with R.3 and R.4 holding for t ∈ T rather than γ ∈ Γ), R.8, NPN.1-NPN.4 and

p^{-1} + d_γ^{-1} + (min_{i=1,...,d_ζ} r_i)^{-1} + (d_γ + p + max_{i=1,...,d_ζ} r_i)/n → 0

as n → ∞, we have ||τ̂ − τ₀|| →_p 0.

Theorem 6.2.
Under the conditions of Theorems 4.2 and 6.1, but with τ and T replacing γ and Γ in assumptions prefixed with R, p → ∞, d_γ → ∞,

(min_{i=1,...,d_ζ} r_i)^{-1} + p/n + √n/p^{μ+1/2} + p^{1/2} ς(r) (d_γ + max_{i=1,...,d_ζ} r_i)/√n + ( Σ_{i=1}^{d_ζ} r_i^{-2κ_i} )^{1/2} → 0,

as n → ∞, then under H₀,

T_n − (σ₀^{-2} ε'Vε − p)/√(2p) = o_p(1).

Theorem 6.3.
Let the conditions of Theorems 4.3 and 6.2 hold, but with τ and T replacing γ and Γ in assumptions prefixed with R. Then (1) T_n →_d N(0,1) under H₀; (2) T_n is a consistent test statistic (i.e. it has asymptotically unit power under H₁); (3) T_n →_d N(κ, 1) under the local alternatives H_ℓ.

We now examine the finite sample performance of our test using Monte Carlo experiments. To study the size behaviour of our test, we generate the null model by DGP1: θ(x_i) = 1 + x_{1i} + x_{2i} = x_i'α₀, where x_i = (1, x_{1i}, x_{2i})', x_{1i} = (z_{1i} + z_{2i})/2, x_{2i} = (z_{2i} + z_{3i})/2, and z_{1i}, z_{2i}, z_{3i} are iid U[0, π]. A linear model f(x_i, α) = x_i'α is correctly specified for θ(x_i) under DGP1 and misspecified under DGP2, which adds to x_i'α₀ a small multiple of the interaction (z_{1i} − π)(z_{2i} − π), and under DGP3: θ(x_i) = x_i'α₀ + sin(x_i'α₀). To illustrate different spatial structures in the error term, for every specification of θ(x_i) we generate y as

SARSE(m₁, m₂, m₃): y = Σ_{k=1}^{m₁} λ_k W_k y + θ(x) + u,  u = Σ_{l=1}^{m₂} γ_l W_l u + Σ_{l=m₂+1}^{m₂+m₃} γ_l W_l ξ + ξ,

where ξ ∼ N(0, I_n). We use a power series for our test, with p = 9, 14 and n = 100, 300, 500. The spatial weight matrices are generated using LeSage's code make_neighborsw. The five error designs are SARSE(1,1,0), SARSE(1,0,1), SARSE(2,0,1), SARSE(2,1,0) and SARSE(2,2,0), with the spatial autoregressive parameters λ_k and error parameters γ_l set to moderate values (e.g. 0.4 for the single error parameter and 0.1 for the second error parameter in SARSE(2,2,0)). We consider test statistics based on both m̂_n = σ̂^{-2} v̂'Σ(γ̂)^{-1}û/n and m̃_n = σ̂^{-2}(û'Σ(γ̂)^{-1}û − η̂'Σ(γ̂)^{-1}η̂)/n, where η̂ = y − θ̂, i.e. the residual from nonparametric estimation. Analogous to the definition of T_n, define the statistic T_an = (n m̃_n − p)/√(2p). In the case of no spatial autoregressive term, and under the power series, T_an and T_n are numerically identical, as was observed by Hong and White (1995). However, in the SARSE setting a difference arises due to the spatial structure in the response y. We show that T_an − T_n = o_p(1) in Theorem B.1 in the appendix.

Table 1 reports the rejection rates using 500 Monte Carlo simulations at the 5% asymptotic level, which implies the critical value 1.645. We find that size control improves with increasing sample size, as one would expect, and in the majority of cases the asymptotically equivalent T_an performs better. Thus, although the two statistics are asymptotically equivalent, T_an might be preferred in finite samples, especially given its largely comparable power performance relative to T_n. We find that sizes are generally larger for p = 14 than for p = 9, but for the former these are acceptable compared to the nominal 5% when n = 300 or n = 500. For both values of p, power improves as sample size increases for all spatial structures, although it is lowest under DGP2.
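The SARSE designs just described can be simulated directly; the sketch below is illustrative, with the function name, the toy weight matrix in the test of the usage, and the parameter values chosen for exposition rather than taken from the experiments.

```python
import numpy as np

def simulate_sarse(theta_x, W_y, W_u, W_ma, lam, gam_ar, gam_ma, rng):
    """One draw from the SARSE(m1, m2, m3) design:
    y = sum_k lam[k] W_y[k] y + theta(x) + u,
    u = sum_l gam_ar[l] W_u[l] u + sum_l gam_ma[l] W_ma[l] xi + xi,
    with xi ~ N(0, I_n). All weight matrices are taken as given."""
    n = len(theta_x)
    xi = rng.standard_normal(n)
    # moving-average part of the disturbance
    ma = xi + sum(g * (W @ xi) for g, W in zip(gam_ma, W_ma))
    # autoregressive part: solve (I - sum_l gam_ar[l] W_u[l]) u = ma
    A = np.eye(n) - sum(g * W for g, W in zip(gam_ar, W_u))
    u = np.linalg.solve(A, ma)
    # response: solve (I - sum_k lam[k] W_y[k]) y = theta(x) + u
    S = np.eye(n) - sum(l * W for l, W in zip(lam, W_y))
    return np.linalg.solve(S, theta_x + u)
```

A SARSE(1,1,0) draw, for instance, passes a single W for both the response and the error and an empty moving-average list.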
This was also noted by Hong and White (1995), who used DGP2 in their iid setting. However, with the largest sample size (n = 500), power is virtually unity in all cases.

p = 9       SARSE(1,1,0)    SARSE(1,0,1)    SARSE(2,0,1)    SARSE(2,1,0)    SARSE(2,2,0)
n = 100     T_n     T_an    T_n     T_an    T_n     T_an    T_n     T_an    T_n     T_an
DGP1        0.0160  0.0220  0.0180  0.0200  0.0400  0.0860  0.0360  0.0660  0.0600  0.0780
DGP2        0.3000  0.2760  0.2960  0.2880  0.4080  0.4120  0.3860  0.4020  0.3420  0.3680
DGP3        0.8960  0.9220  0.9100  0.9240  0.8200  0.8680  0.7900  0.8460  0.8920  0.9200
n = 300
DGP1        0.0300  0.0300  0.0280  0.0300  0.0220  0.0220  0.0260  0.0340  0.0200  0.0300
DGP2        0.8820  0.8840  0.8840  0.8820  0.9220  0.9060  0.9160  0.8980  0.9400  0.9340
DGP3        1.0000  1.0000  1.0000  1.0000  0.9980  1.0000  0.9960  0.9960  1.0000  1.0000
n = 500
DGP1        0.0180  0.0360  0.0140  0.0260  0.0240  0.0500  0.0260  0.0440  0.0180  0.0700
DGP2        0.9960  0.9860  0.9980  0.9980  1.0000  0.9960  1.0000  0.9920  0.9960  0.9900
DGP3        1.0000  0.9960  1.0000  0.9940  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000

p = 14      SARSE(1,1,0)    SARSE(1,0,1)    SARSE(2,0,1)    SARSE(2,1,0)    SARSE(2,2,0)
n = 100     T_n     T_an    T_n     T_an    T_n     T_an    T_n     T_an    T_n     T_an
DGP1        0.0540  0.0540  0.0440  0.0440  0.0700  0.1200  0.0580  0.0940  0.1040  0.1080
DGP2        0.2880  0.2760  0.2940  0.2720  0.4000  0.3860  0.3640  0.3920  0.3500  0.3840
DGP3        0.9920  0.9920  0.9940  0.9940  0.9660  0.9780  0.9580  0.9720  0.9700  0.9800
n = 300
DGP1        0.0280  0.0300  0.0240  0.0260  0.0360  0.0300  0.0340  0.0440  0.0420  0.0440
DGP2        0.8400  0.8400  0.8400  0.8340  0.8900  0.8720  0.8760  0.8700  0.9060  0.9060
DGP3        1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
n = 500
DGP1        0.0200  0.0500  0.0200  0.0500  0.0340  0.0460  0.0300  0.0520  0.0320  0.0860
DGP2        0.9880  0.9760  0.9920  0.9840  0.9980  0.9920  0.9960  0.9920  0.9940  0.9900
DGP3        1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000

Table 1: Rejection probabilities at 5% asymptotic level, parametric spatial structures

Now we examine finite sample performance in the setting of Section 6. The three DGPs for θ(x) are the same as in the parametric setting, and we generate the n × n matrix W* as w*_ij = Φ(−d_ij) 1(c_ij < 0.05) if i ≠ j, and w*_ii = 0, where Φ(·) is the standard normal cdf and d_ij and c_ij are drawn as iid uniform random variables, so that W* is sparse with no more than 5% of its elements nonzero. Then, y is generated from

y = θ(x) + u,  u = Wu + ξ,

where W is W* scaled so that its spectral radius is strictly less than one, ensuring the existence of (I − W)^{-1}. In estimation, we observe the distance d_ij and the indicator 1(c_ij < 0.05) but not the function of d_ij entering w_ij, so we approximate the elements of W by

ŵ_ij = Σ_{l=0}^{r} a_l d_ij^l 1(c_ij < 0.05) if i ≠ j;  ŵ_ii = 0.

Estimation is carried out using the MLE described in Section 6. Table 2 reports the rejection rates using 500 Monte Carlo simulations at the 5% asymptotic level (critical value 1.645) in this nonparametric error spatial setting, using r = 2, 3, 4 and p = 9, 14. Our smallest sample size is n = 150 rather than n = 100 as earlier because two nonparametric functions must be estimated in the nonparametric spatial setting. We observe a clear pattern of rejection rates approaching the theoretical level as sample size increases. Generally, power is excellent for all DGPs for n ≥ 300 and sizes are acceptable for n = 500, particularly when p = 14. The power against DGP3 is always higher than that against DGP2, as observed in the parametric setup of the previous subsection.

[Table 2 spans r = 2, 3, 4, each with p = 9 and p = 14, for n = 150, 300, 500 and DGP1-DGP3.]

Table 2: Rejection probabilities at 5% asymptotic level, nonparametric spatial structure

Empirical applications
This example is based on a study of how a network of military alliances and enmities affects the intensity of a conflict, conducted by König et al. (2017). They stress that understanding the role of informal networks of military alliances and enmities is important not only for predicting outcomes, but also for designing and implementing policies to contain or put an end to violence. König et al. (2017) obtain a closed-form characterization of the Nash equilibrium and perform an empirical analysis using data on the Second Congo War, a conflict that involves many groups in a complex network of informal alliances and rivalries. To study the fighting effort of each group, the authors use a panel data model with individual fixed effects, where key regressors include the total fighting effort of allies and enemies. They further correct for potential spatial correlation in the error term by using a spatial HAC standard error. We use their data and the main structure of the specification and build a cross-sectional SAR(2) model with two weight matrices, W^A (W^A_ij = 1 if groups i and j are allies, and W^A_ij = 0 otherwise) and W^E (W^E_ij = 1 if groups i and j are enemies, and W^E_ij = 0 otherwise):

y = λ₁W^A y + λ₂W^E y + ι_n β₀ + Xβ + u,

where y is a vector of fighting efforts of each group and X includes the current rainfall, rainfall from the last year, and their squares. To allow for spatial correlation in the error term, we consider both the Error SARMA(1,0) and Error SARMA(0,1) structures. For these, we employ a spatial weight matrix W_d based on the inverse distance between group locations, set to 0 after 150 km, following König et al. (2017). The idea is that geographical spatial correlation dies out as groups become further apart. We also report results using a nonparametric estimator of the spatial weights, as described in Section 6 and studied in simulations in Section 7.
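The alliance and enmity matrices W^A and W^E above are plain 0-1 adjacency matrices; a minimal sketch of their construction from edge lists follows, where the function name and the toy edge lists are illustrative and not taken from the König et al. (2017) data.

```python
import numpy as np

def adjacency(n, pairs):
    """Symmetric 0-1 adjacency matrix: entry (i, j) is 1 when the
    unordered pair {i, j} appears in the relation list; zero diagonal."""
    W = np.zeros((n, n))
    for i, j in pairs:
        W[i, j] = W[j, i] = 1.0
    np.fill_diagonal(W, 0.0)
    return W

# toy example: 4 groups, one alliance and two enmities (illustrative only)
W_A = adjacency(4, [(0, 1)])
W_E = adjacency(4, [(0, 2), (1, 3)])
```

The SAR(2) response then loads y on both W_A @ y and W_E @ y with separate coefficients λ₁ and λ₂.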
For the nonparametric estimator we take r = 2. In the original dataset there are 80 groups, but groups 62 and 63 have the same variables and the same locations, so we drop one group and end up with a sample of 79 groups. We use data from 1998 as an example and further use the pooled data from all years as a robustness check. The column IV of Table 3 is from Table 1 of König et al. (2017), based on their panel IV estimation, which we report for the sake of comparison. H₀ stands for the restricted model where the linear functional form of the regression is imposed, while H₁ stands for the unrestricted model where we use basis functions comprising power series with p = 9. In all our specifications the test statistics are negative, so we cannot reject the null hypothesis that the model is correctly specified. As Table 3 indicates, this failure to reject the null persists when we use pooled data from 13 years, yielding 1027 observations. Thus we conclude that a linear specification is not inappropriate for this setting. Another interesting finding is that allowing for explicit spatial dependence in the disturbances drastically reduces the magnitude of the estimated λ₁, i.e. the coefficient on W^A y, a feature that is not replicated for the estimate of λ₂.

This example is based on the study of the impact of R&D on growth from Bloom et al. (2013). They develop a general framework incorporating two types of spillovers: a positive effect from technology (knowledge) spillovers and a negative 'business stealing' effect from product market rivals. They implement this model using panel data on U.S. firms. We consider the Productivity Equation in Bloom et al. (2013):

ln y = ϕ₁ ln(R&D) + ϕ₂ ln(Sptec) + ϕ₃ ln(Spsic) + ϕ₄ X + error,  (8.1)

where y is a vector of sales, R&D is a vector of R&D stocks, and the regressors in X include the log of capital (Capital), the log of labour (
Labour), R&D, a dummy for missing values in R&D, a price index, and two spillover terms constructed as the log of W_SIC R&D (Spsic) and the log of W_TEC R&D (Sptec), where W_SIC measures product market proximity and W_TEC measures technological proximity. Specifically, they define

W_{SIC,ij} = S_iS_j' / ((S_iS_i')^{1/2} (S_jS_j')^{1/2}),  W_{TEC,ij} = T_iT_j' / ((T_iT_i')^{1/2} (T_jT_j')^{1/2}),

where S_i is the vector whose element S_{ik} is the share of patents of firm i in the four-digit industry k, and T_i is the vector whose element T_{iτ} is the share of patents of firm i in technology class τ. Focussing on a cross-sectional analysis, we use observations from the year 2000 and obtain a sample size of 577. The column FE of Table 4 is from Table 5 of Bloom et al. (2013), based on their panel estimation, and we use it as a baseline for comparison. "SE" is the spatial error model corresponding to the Error SARMA(1,0) in our Monte Carlo setting. We use either W_SIC or W_TEC in the SE setting and both of these matrices in the Error SARMA(2,0), Error SARMA(0,2), and Error MESS(2) models. In all of these specifications the test statistics are larger than 1.645, so we reject the null hypothesis of the linear specification. (The original model does not feature a spatially lagged y, so the two test statistics are numerically equivalent and Table 4 reports one test only.) However, we can say even more, as our estimation also sheds light on spatial effects in the disturbances
in (8.1). As before, H₀ imposes the linear functional form of the regressors, while H₁ uses the nonparametric series estimate employing power series with p = 9. Regardless of the specification of the regression function, the disturbances suggest a strong spatial effect, as the coefficients on W_TEC and W_SIC are large in magnitude.
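The Jaffe-type proximity measures W_SIC and W_TEC used above are uncentered correlations of patent-share vectors. A minimal sketch follows; zeroing the diagonal is our assumption here, and the toy share matrix in the test is invented for illustration.

```python
import numpy as np

def proximity(S):
    """Uncentered correlation W_ij = S_i S_j' / (||S_i|| ||S_j||),
    in the spirit of the Bloom et al. (2013) proximity measures;
    rows of S are firms' patent-share vectors across categories.
    The diagonal is set to zero, as is usual for spatial weights
    (an assumption of this sketch)."""
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    W = (S @ S.T) / (norms * norms.T)
    np.fill_diagonal(W, 0.0)
    return W
```

Firms with identical share vectors get proximity one, and firms with disjoint activity get proximity zero.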
The final example is based on the study of economic growth rates in Ertur and Koch (2007). Knowledge accumulated in one area might depend on knowledge accumulated in other areas, especially its neighbours, implying possible spatial spillover effects and suggesting a natural use of spatial econometric models to capture such technological interdependence. These questions are of interest to economists as well as regional scientists. For example, Autant-Bernard and LeSage (2011) examine spatial spillovers associated with research expenditures for French regions, while Ho, Wang, and Yu (2013) examine the international spillover of economic growth through bilateral trade amongst OECD countries, Cuaresma and Feldkircher (2013) study spatially correlated growth spillovers in the income convergence process of Europe, and Evans and Kim (2014) study the spatial dynamics of growth and convergence in Korean regional incomes.

We wish to test the linear SAR model specification in Ertur and Koch (2007). Their dataset covers a sample of 91 countries over the period 1960-1995, originally from Heston, Summers, and Aten (2002), obtained from the Penn World Tables (PWT version 6.1). The variables used include per worker income in 1960 (y
60) and 1995 (y95), the growth rate of per worker income over this period (g_y), the average investment rate of this period (s), and the average rate of growth of the working-age population (n_p). The dataset can be downloaded from the JAE Data Archive at http://qed.econ.queensu.ca/jae/2007-v22.6/.

Ertur and Koch (2007) consider the model

y = λWy + ι_n β₀ + Xβ + WXθ + ε,  (8.2)

where the dependent variable is log real income per worker ln(y95) and the regressors X = (x₁, x₂) include the log investment rate ln(s) = x₁ and the log physical capital effective rate of depreciation ln(n_p + 0.
05) = x₂, with corresponding subscripted coefficients β₁, β₂, θ₁, θ₂. A restricted regression based on the joint constraints β₂ = −β₁ and θ₂ = −θ₁ (these constraints are implied by economic theory) is also considered in Ertur and Koch (2007). The model (8.2) can be considered a special case of the SAR model with a general regressor X* = (X, WX) and iid errors, so the test derived in Section 5 can be directly applied here. Denoting by d_ij the great-circle distance between the capital cities of countries i and j, one construction of W takes w_ij = d_ij^{-2} while the other takes w_ij = e^{-2d_ij}, following Ertur and Koch (2007). Table 5 presents results based on the unrestricted and restricted models under the two constructions of W, using power series basis functions with p = 9. Using our specification test, we cannot reject linearity of the regression function under the unrestricted model. On the other hand, linearity is rejected under the restricted model, which is the preferred specification of Ertur and Koch (2007). Thus, not only can we conclude that the specification of the model is under suspicion, we can also infer that such doubts are created by imposing constraints from economic theory.

            IV       SARMA(1,0)        SARMA(0,1)        Nonparametric
                     1998     Pooled   1998     Pooled   1998     Pooled
                     H0   H1  H0   H1  H0   H1  H0   H1  H0   H1  H0   H1
W^A y    -0.218   -0.005 -0.003  0.013 0.013   0.001 0.011  0.013 0.013  -0.052 -0.011  0.033 0.033
W_d               -0.159 -0.225 -0.086 -0.086 -0.153 -0.050 -0.086 -0.086
T_n               -1.921        -2.531         -1.763        -2.421        -1.294        -2.314
T_an              -1.918        -2.547         -2.349        -2.423        -1.898        -2.530

Table 3: The estimates and test statistics for the conflict data
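The nonparametric columns in Table 3 rest on the polynomial-in-distance approximation of the weight function from Section 6, ŵ_ij = Σ_{l=0}^{r} a_l d_ij^l on the observed support. A minimal sketch, with the coefficients a_l hypothetical placeholders rather than estimates:

```python
import numpy as np

def approx_weights(D, support, a):
    """Polynomial-in-distance approximation w_ij ~ sum_l a[l] * d_ij**l,
    restricted to the observed support (a 0-1 mask), zero diagonal.
    The coefficient vector a is a placeholder; in estimation it is
    treated as part of the nuisance parameter."""
    W = sum(a_l * D ** l for l, a_l in enumerate(a)) * support
    np.fill_diagonal(W, 0.0)
    return W
```

In the paper's setting the support mask plays the role of the indicator 1(c_ij < 0.05), and the a_l are estimated jointly with the other nuisance parameters by the MLE of Section 6.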
Variables      FE       SE W_SIC         SE W_TEC         SARMA(2,0)       SARMA(0,2)       Error MESS(2)
                        H0      H1       H0      H1       H0      H1       H0       H1      H0      H1
ln(Sptech)     0.191    0.007   0.015    0.008   0.017    0.009   0.018   -0.0002   0.013   0.002   0.014
ln(Spsic)     -0.005    0.006  -0.0001   0.038   0.020    0.044   0.026    0.033    0.017   0.045   0.027
ln(Capital)    0.636    0.572   --       0.571   --       0.573   --       0.565    --      0.569   --
ln(Labor)      0.154    0.336   --       0.318   --       0.315   --       0.334    --      0.323   --
ln(R&D)        0.043    0.814   --       0.082   --       0.081   --       0.076    --      0.077   --

Table 4: The estimates and test statistics for the R&D data

[Table 5 reports, for each construction of W (w*_ij = d_ij^{-2} and w*_ij = e^{-2d_ij} for i ≠ j), estimates and p-values for the unrestricted model (Constant, ln(s), ln(n_p + 0.05), W ln(s), W ln(n_p + 0.05), Wy) together with the statistics T_n and T_an, and for the restricted model (ln(s) − ln(n_p + 0.05), W[ln(s) − ln(n_p + 0.05)], W ln(y)) together with the corresponding T_n and T_an.]

Table 5: The estimates and test statistics of the linear SAR model for the growth data
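The two constructions of W underlying Table 5 are distance-decay weights built from great-circle distances; a minimal sketch, with the decay constant c a placeholder rather than a value from the text:

```python
import numpy as np

def decay_weights(D, kind="power", c=2.0):
    """Distance-decay spatial weights from a symmetric distance matrix D:
    either w_ij = d_ij**(-c) or w_ij = exp(-c * d_ij), with zero diagonal.
    The decay constant c is illustrative, not a calibrated value."""
    off = ~np.eye(len(D), dtype=bool)
    W = np.zeros_like(D)
    if kind == "power":
        W[off] = D[off] ** (-c)
    else:
        W[off] = np.exp(-c * D[off])
    return W
```

Either matrix can then be row-normalized or spectrally scaled before entering the SAR specification, depending on the convention adopted.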
Appendix

A  Proofs of theorems and propositions
Proof of Proposition 4.1:
Because the map Σ : Γ^o → M_{n×n} is Fréchet-differentiable on Γ^o, it is also Gâteaux-differentiable and the two derivative maps coincide. Thus by Theorem 1.8 of Ambrosetti and Prodi (1995),

||Σ(γ₁) − Σ(γ₂)|| ≤ sup_{γ ∈ ℓ[γ₁,γ₂]} ||DΣ(γ)|| ||γ₁ − γ₂||,

where ℓ[γ₁, γ₂] = {tγ₁ + (1 − t)γ₂ : t ∈ [0, 1]}. The claim now follows by (4.3) in Assumption R.8.

Proof of Theorem 4.1.
This is a particular case of the proof of Theorem 5.1 with λ₀ = 0, and so S(λ₀) = I_n.

Proof of Theorem 4.2.
From Corollary 4.1 and Lemma B.2, k Σ ( b γ ) − Σ k = O p ( k b γ − γ k ) = p d γ /n , so we have, from Assumption R.3, (cid:13)(cid:13) Σ ( b γ ) − − Σ − (cid:13)(cid:13) ≤ (cid:13)(cid:13) Σ ( b γ ) − (cid:13)(cid:13) k Σ ( b γ ) − Σ k (cid:13)(cid:13) Σ − (cid:13)(cid:13) = O p ( k b γ − γ k ) = q d γ /n. (A.1)Similarly, (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( b γ ) − Ψ (cid:19) − − (cid:18) n Ψ ′ Σ − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( b γ ) − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13) n Ψ ′ (cid:0) Σ ( b γ ) − − Σ − (cid:1) Ψ (cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ sup γ ∈ Γ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( γ ) − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13) Σ ( b γ ) − − Σ − (cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13) √ n Ψ (cid:13)(cid:13)(cid:13)(cid:13) = O p ( k b γ − γ k ) = q d γ /n. By Assumption R.2, we have b α − α ∗ = O p (1 / √ n ). Denote by θ ∗ ( x ) = ψ ( x ) ′ β ∗ , where β ∗ = arg min β E [ y i − ψ ( x i ) ′ β )] , and set θ ni = θ ( x i ), θ i = θ ( x i ), b θ i = ψ ′ i b β , b f i = f ( x i , b α ), f ∗ i = f ( x i , α ∗ ). Then b u i = y i − f ( x i , b α ) = u i + θ i − b f i . Let θ = ( θ ( x ) , . . . 
, θ ( x n )) ′ asbefore, with similar component-wise notation for the n -dimensional vectors b θ , θ ∗ , b f , and u .As the approximation error is e = θ − θ ∗ = θ − Ψ β ∗ , b θ − θ ∗ = Ψ( b β − β ∗ ) = Ψ (cid:0) Ψ ′ Σ ( b γ ) − Ψ (cid:1) − Ψ ′ Σ ( b γ ) − ( u + θ − Ψ β ∗ )= Ψ (cid:0) Ψ ′ Σ ( b γ ) − Ψ (cid:1) − Ψ ′ Σ ( b γ ) − ( u + e ) , so that n b m n = b σ − b v ′ Σ ( b γ ) − b u = b σ − (cid:16)b θ − b f (cid:17) ′ Σ ( b γ ) − (cid:16) y − b f (cid:17) = b σ − (cid:16)b θ − θ ∗ + θ ∗ − θ + θ − b f (cid:17) ′ Σ ( b γ ) − (cid:16) u + θ − b f (cid:17) = b σ − h Ψ (cid:0) Ψ ′ Σ ( b γ ) − Ψ (cid:1) − Ψ ′ Σ ( b γ ) − ( u + e ) − e + θ − b f i ′ Σ ( b γ ) − (cid:16) u + θ − b f (cid:17) = b σ − u ′ Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − u + b σ − u ′ Σ ( b γ ) − (cid:16) θ − b f (cid:17) − b σ − (cid:16) u + θ − b f (cid:17) ′ Σ ( b γ ) − (cid:0) I − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − (cid:1) e + b σ − (cid:16) θ − b f (cid:17) ′ Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − u + b σ − ( θ − b f ) ′ Σ ( b γ ) − ( θ − b f )= b σ − u ′ Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − u + b σ − ( A + A + A + A ) , say. First, for any vector g comprising of conditioned random variables, E h(cid:0) u ′ Σ( γ ) − g (cid:1) i = g ′ Σ( γ ) − ΣΣ( γ ) − g ≤ sup γ ∈ Γ (cid:13)(cid:13) Σ( γ ) − (cid:13)(cid:13) k Σ k k g k = O p (cid:0) k g k (cid:1) , uniformly in γ ∈ Γ. Similarly, E (cid:20)(cid:16) u ′ Σ( γ ) − Ψ (cid:0) Ψ ′ Σ( γ ) − Ψ (cid:1) − Ψ ′ Σ( γ ) − g (cid:17) (cid:21) = g ′ Σ( γ ) − Ψ (cid:0) Ψ ′ Σ( γ ) − Ψ (cid:1) − Ψ ′ Σ( γ ) − ΣΣ( γ ) − Ψ (cid:0) Ψ ′ Σ( γ ) − Ψ (cid:1) − Ψ ′ Σ( γ ) − g sup γ ∈ Γ (cid:13)(cid:13) Σ( γ ) − (cid:13)(cid:13) k Σ k (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n Ψ (cid:18) n Ψ ′ Σ( γ ) − Ψ (cid:19) − Ψ ′ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) k g k = O p (cid:0) k g k (cid:1) , uniformly and, for any j = 1, . . . 
, d γ , E h(cid:0) u ′ Σ( γ ) − Σ j ( γ ) Σ ( γ ) − g (cid:1) i = g ′ Σ( γ ) − Σ j ( γ ) Σ ( γ ) − ΣΣ( γ ) − Σ j ( γ ) Σ ( γ ) − g ≤ sup γ ∈ Γ (cid:13)(cid:13) Σ( γ ) − (cid:13)(cid:13) k Σ j ( γ ) k k Σ k k g k = O p (cid:0) k g k (cid:1) . Let Ψ k be the k -th column of Ψ, k = 1 , . . . , p . Then, we have k Ψ k / √ n k = O p (1) and for any γ ∈ Γ, E (cid:13)(cid:13)(cid:13)(cid:13) √ n u ′ Σ ( γ ) − Ψ (cid:13)(cid:13)(cid:13)(cid:13) ≤ p X k =1 E (cid:18) u ′ Σ ( γ ) − √ n Ψ k (cid:19) = O p ( p ) ,E (cid:13)(cid:13)(cid:13)(cid:13) √ n u ′ Σ ( γ ) − Σ j ( γ ) Σ ( γ ) − Ψ (cid:13)(cid:13)(cid:13)(cid:13) ≤ p X k =1 E (cid:18) u ′ Σ ( γ ) − Σ j ( γ ) Σ ( γ ) − √ n Ψ k (cid:19) = O ( p ) . Therefore, by Chebyshev’s inequality,sup γ ∈ Γ (cid:13)(cid:13)(cid:13)(cid:13) √ n u ′ Σ ( γ ) − Ψ (cid:13)(cid:13)(cid:13)(cid:13) = O p ( √ p ) and sup γ ∈ Γ (cid:13)(cid:13)(cid:13)(cid:13) √ n u ′ Σ ( γ ) − Σ j ( γ ) Σ ( γ ) − Ψ (cid:13)(cid:13)(cid:13)(cid:13) = O p ( √ p ) . By the decomposition u ′ (cid:0) Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − − Σ − Ψ[Ψ ′ Σ − Ψ] − Ψ ′ Σ − (cid:1) u = u ′ (cid:0) Σ ( b γ ) − + Σ − (cid:1) Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ n X i =1 e in e ′ in ! (cid:0) Σ ( b γ ) − − Σ − (cid:1) u + u ′ Σ − Ψ (cid:0) [Ψ ′ Σ ( b γ ) − Ψ] − − [Ψ ′ Σ − Ψ] − (cid:1) Ψ ′ Σ − u = u ′ (cid:0) Σ ( b γ ) − + Σ − (cid:1) Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ n X i =1 e in e ′ in ! d γ X j =1 (cid:0) Σ ( e γ ) − Σ j ( e γ ) Σ ( e γ ) − (cid:1) × u ( e γ j − γ j ) + u ′ Σ − Ψ (cid:0) [Ψ ′ Σ ( b γ ) − Ψ] − − [Ψ ′ Σ − Ψ] − (cid:1) Ψ ′ Σ − u, where e in is an n × i -th entry one and zeros elsewhere, so P ni =1 e in e ′ in = I n ,and e ′ in (cid:0) Σ ( b γ ) − − Σ − (cid:1) u = d γ X j =1 e ′ in (cid:0) Σ ( e γ ) − Σ j ( e γ ) Σ ( e γ ) − (cid:1) u ( e γ j − γ j )28 e ′ in d γ X j =1 (cid:0) Σ ( e γ ) − Σ j ( e γ ) Σ ( e γ ) − (cid:1) u ( e γ j − γ j )where e γ is a value between b γ and γ due to the mean value theorem. 
We have (cid:12)(cid:12) u ′ (cid:0) Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − − Σ − Ψ[Ψ ′ Σ − Ψ] − Ψ ′ Σ − (cid:1) u (cid:12)(cid:12) ≤ γ ∈ Γ (cid:13)(cid:13)(cid:13)(cid:13) √ n u ′ Σ ( γ ) − Ψ (cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( γ ) − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) d γ X j =1 (cid:13)(cid:13)(cid:13)(cid:13) √ n Ψ ′ (cid:0) Σ ( γ ) − Σ j ( γ ) Σ ( γ ) − (cid:1) u (cid:13)(cid:13)(cid:13)(cid:13) × | e γ j − γ j | + (cid:13)(cid:13)(cid:13)(cid:13) √ n u ′ Σ − Ψ (cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( b γ ) − Ψ (cid:19) − − (cid:18) n Ψ ′ Σ − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = O p ( √ p ) O p ( d γ √ p/ √ n ) + O p ( p ) O p ( p d γ / √ n ) = O p ( d γ p/ √ n ) = o p ( √ p ) , where the last equality holds under the conditions of the theorem.It remains to show that A i = o p (cid:0) p / (cid:1) , i = 1 , . . . , . (A.2)It is convenient to perform the calculations under H ℓ , which covers H as a particular case.Using the mean value theorem and either H or H ℓ , we can express θ i − b f i = f ∗ i − b f i − ( p / /n / ) h i = d α X j =1 ∂f ( x i , e α ) ∂α j ( α ∗ j − e α j ) − p / n / h i , (A.3)where e α j is a value between α ∗ j and b α j . Then, for any j = 1 , . . . 
, d α , (cid:12)(cid:12) α ∗ j − e α j (cid:12)(cid:12) = O p (1 / √ n ).Based onsup γ ∈ Γ (cid:12)(cid:12)(cid:12) u ′ Σ( γ ) − Ψ (cid:0) Ψ ′ Σ( γ ) − Ψ (cid:1) − Ψ ′ Σ( γ ) − g (cid:12)(cid:12)(cid:12) = O p ( k g k ) and sup γ ∈ Γ (cid:12)(cid:12) u ′ Σ( γ ) − g (cid:12)(cid:12) = O p ( k g k )for any γ ∈ Γ and any conditioned vector g , if we take g = ∂f ( x, α ) /∂α j or g = h , then bothsatisfy O p ( k g k ) = O p ( √ n ) and it follows that | A | = (cid:12)(cid:12)(cid:12) u ′ Σ ( b γ ) − (cid:16) θ − b f (cid:17)(cid:12)(cid:12)(cid:12) ≤ sup γ,α d α X j =1 (cid:13)(cid:13)(cid:13)(cid:13) u ′ Σ( γ ) − ∂f ( x, α ) ∂α j (cid:13)(cid:13)(cid:13)(cid:13) (cid:12)(cid:12) α ∗ j − e α j (cid:12)(cid:12) + p / n / sup γ (cid:13)(cid:13) u ′ Σ( γ ) − h (cid:13)(cid:13) = O p ( √ n ) O p (cid:18) √ n (cid:19) + O (cid:18) p / n / (cid:19) O p ( √ n ) = O p ( p / ) = o p ( p / ) . | A | = (cid:12)(cid:12)(cid:12) u ′ Σ ( b γ ) − Ψ (cid:0) Ψ ′ Σ ( b γ ) − Ψ (cid:1) − Ψ ′ Σ ( b γ ) − ( θ − b f ) (cid:12)(cid:12)(cid:12) ≤ sup γ,α d α X j =1 (cid:13)(cid:13)(cid:13)(cid:13) u ′ Σ ( b γ ) − Ψ (cid:0) Ψ ′ Σ ( b γ ) − Ψ (cid:1) − Ψ ′ Σ ( b γ ) − ∂f ( x, α ) ∂α j (cid:13)(cid:13)(cid:13)(cid:13) (cid:12)(cid:12) α ∗ j − e α j (cid:12)(cid:12) + p / n / sup γ (cid:13)(cid:13)(cid:13) u ′ Σ ( b γ ) − Ψ (cid:0) Ψ ′ Σ ( b γ ) − Ψ (cid:1) − Ψ ′ Σ ( b γ ) − h (cid:13)(cid:13)(cid:13) = O p (1) + O p ( p / ) = O p ( p / ) = o p ( p / ) . Also, by Assumptions R.2 and R.10, we have (cid:13)(cid:13)(cid:13) θ − b f (cid:13)(cid:13)(cid:13) ≤ sup α d α X j =1 (cid:13)(cid:13)(cid:13)(cid:13) ∂f ( x, α ) ∂α j (cid:13)(cid:13)(cid:13)(cid:13) (cid:12)(cid:12) α ∗ j − e α j (cid:12)(cid:12) + k h k p / n / = O p ( p / ) . 
(A.4)By (3.2), we have k e k = O ( p − µ n / ) and | A | = (cid:12)(cid:12)(cid:12) ( u + θ − b f ) ′ (cid:0) Σ ( b γ ) − − Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − (cid:1) e (cid:12)(cid:12)(cid:12) ≤ sup γ | u ′ Σ ( γ ) − e | + sup γ (cid:12)(cid:12) u ′ Σ ( γ ) − Ψ[Ψ ′ Σ ( γ ) − Ψ] − Ψ ′ Σ ( γ ) − e (cid:12)(cid:12) + (cid:13)(cid:13)(cid:13) θ − b f (cid:13)(cid:13)(cid:13) sup γ (cid:0)(cid:13)(cid:13) Σ ( γ ) − (cid:13)(cid:13) + (cid:13)(cid:13) Σ ( γ ) − Ψ[Ψ ′ Σ ( γ ) − Ψ] − Ψ ′ Σ ( γ ) − (cid:13)(cid:13)(cid:1) k e k = O p ( p − µ n / ) + O p ( p − µ +1 / n / ) = O p ( p − µ +1 / n / ) = o p ( √ p ) . where the last equality holds under the conditions of the theorem. Finally, under H ℓ , A = (cid:16) θ − b f (cid:17) ′ Σ ( b γ ) − (cid:16) θ − b f (cid:17) = (cid:16) θ − b f (cid:17) ′ Σ − (cid:16) θ − b f (cid:17) + (cid:16) θ − b f (cid:17) ′ (cid:0) Σ ( b γ ) − − Σ − (cid:1) (cid:16) θ − b f (cid:17) = p / n h ′ Σ − h + o p (1) + O p (cid:0) p / d / γ /n / (cid:1) = p / n h ′ Σ − h + o p ( √ p ) . Combining these together, we have n b m n = b σ − b v ′ Σ ( b γ ) − b u = 1 σ ε ′ V ε + p / n h ′ Σ − h + o p ( √ p ) , under H ℓ and the same expression holds with h = 0 under H .30 roof of Theorem 4.3. We would like to establish the asymptotic unit normality of σ − ε ′ V ε − p √ p . (A.5)Writing q = √ p , the ratio in (A.5) has zero mean and variance equal to one, and may bewritten as P ∞ s =1 w s , where w s = σ − q − v ss ( ε s − σ ) + 2 σ − q − ( s ≥ ε s P t
by Assumption R.4. Thus (A.13) is established. Notice that $E(w_s^2 \mid \varepsilon_t, t<s)$ equals $4q^{-2}\sigma^{-4}\big((\mu_4-\sigma^4)v_{ss}^2 + 2\mu_3\,\mathbb{1}(s\ge 2)\sum_t$ $C_n\big) \to 0$, so that consistency follows. (3) Follows from Theorems 4.2 and 4.3.

Proof of Theorem 5.1.
We show b φ p → φ , whence b β p → β and b σ p → σ follow from (5.3) and(5.4) respectively. First note that L ( φ ) −L = log σ ( φ ) /σ − n − log (cid:12)(cid:12) T ′ ( λ )Σ( γ ) − T ( λ )Σ (cid:12)(cid:12) = log σ ( φ ) /σ ( φ ) − log σ /σ +log r ( φ ) , (A.18)where recall that σ ( φ ) = n − σ tr ( T ′ ( λ )Σ( γ ) − T ( λ )Σ) , σ = σ ( φ ) = n − u ′ Σ ′− M Σ − u, using (5.4) and also r ( φ ) = n − tr ( T ′ ( λ )Σ( γ ) − T ( λ )Σ) / | T ′ ( λ )Σ( γ ) − T ( λ )Σ | /n . We have σ ( φ ) = n − (cid:8) S − ′ (Ψ β + u ) (cid:9) ′ S ′ ( λ )Σ( γ ) ′− M ( γ ) Σ( γ ) − S ( λ ) S − (Ψ β + u ) = c ( φ ) +35 ( φ ) + c ( φ ), where c ( φ ) = n − β ′ Ψ ′ T ′ ( λ )Σ( γ ) ′− M ( γ ) Σ( γ ) − T ( λ )Ψ β ,c ( φ ) = n − σ tr (cid:16) T ′ ( λ )Σ( γ ) ′− M ( γ ) Σ( γ ) − T ( λ )Σ (cid:17) ,c ( φ ) = n − tr (cid:16) T ′ ( λ )Σ( γ ) ′− M ( γ ) Σ( γ ) − T ( λ ) (cid:0) uu ′ − σ Σ (cid:1)(cid:17) + 2 n − β ′ Ψ ′ T ′ ( λ )Σ( γ ) ′− M ( γ ) Σ( γ ) − T ( λ ) u. Note that in the particular cases of Theorems 4.1 and 6.1, where T ( λ ) = I n , the c termvanishes because M ( γ ) Σ( γ ) − Ψ = 0 and M ( τ ) Σ( τ ) − Ψ = 0. Proceeding with the current,more general prooflog σ ( φ ) σ ( φ ) = log σ ( φ )( c ( φ ) + c ( φ )) + log c ( φ ) + c ( φ ) σ ( φ )= log (cid:18) c ( φ ) c ( φ ) + c ( φ ) (cid:19) + log (cid:18) c ( φ ) − f ( φ ) σ ( φ ) (cid:19) , where f ( φ ) = n − σ tr (cid:16) Σ ′ T ′ ( λ )Σ( γ ) ′− ( I n − M ( γ )) Σ( γ ) − T ( λ )Σ (cid:17) . Then (A.18) implies P (cid:16)(cid:13)(cid:13)(cid:13) b φ − φ (cid:13)(cid:13)(cid:13) ∈ N φ ( η ) (cid:17) = P inf φ ∈ N φ ( η ) L ( φ ) − L ≤ ! ≤ P log φ ∈ N φ ( η ) (cid:12)(cid:12)(cid:12)(cid:12) c ( φ ) c ( φ ) + c ( φ ) (cid:12)(cid:12)(cid:12)(cid:12)! + (cid:12)(cid:12) log (cid:0) σ /σ (cid:1)(cid:12)(cid:12) ≥ inf φ ∈ N φ ( η ) (cid:18) log (cid:18) c ( φ ) − f ( φ ) σ ( φ ) (cid:19) + log r ( φ ) (cid:19)! , where recall that N φ ( η ) = Φ \N φ ( η ) , N φ ( η ) = { φ : k φ − φ k < η }∩ Φ . 
Because σ /σ p → , the property log (1 + x ) = x + o ( x ) as x → φ ∈ N φ ( η ) (cid:12)(cid:12)(cid:12)(cid:12) c ( φ ) c ( φ ) + c ( φ ) (cid:12)(cid:12)(cid:12)(cid:12) p −→ , (A.19)sup φ ∈ N φ ( η ) (cid:12)(cid:12)(cid:12)(cid:12) f ( φ ) σ ( φ ) (cid:12)(cid:12)(cid:12)(cid:12) p −→ , (A.20) P inf φ ∈ N φ ( η ) (cid:26) c ( φ ) σ ( φ ) + log r ( φ ) (cid:27) > ! −→ . (A.21)36ecause N φ ( η ) ⊆ (cid:8) Λ × N γ ( η/ (cid:9) ∪ n N λ ( η/ × Γ o , we have P inf φ ∈ N φ ( η ) (cid:26) c ( φ ) σ ( φ ) + log r ( φ ) (cid:27) > ! ≥ P min ( inf Λ ×N γ ( η/ c ( φ ) σ ( φ ) , inf N λ ( η/ log r ( φ ) ) > ! ≥ P min ( inf Λ ×N γ ( η/ c ( φ ) C , inf N λ ( η/ log r ( φ ) ) > ! , from Assumption SAR.2, whence Assumptions SAR.3 and SAR.4 imply (A.21). Again usingAssumption SAR.2, uniformly in φ , | f ( φ ) /σ ( φ ) | = O p ( | f ( φ ) | ) and | f ( φ ) | = O p (cid:16) tr (cid:16) Σ ′ T ′ ( λ )Σ( γ ) − Ψ (cid:0) Ψ ′ Σ( γ ) − Ψ (cid:1) − Ψ ′ Σ( γ ) − T ( λ )Σ (cid:17) /n (cid:17) = O p (cid:16) tr (cid:16) Σ ′ T ′ ( λ )Σ( γ ) − ΨΨ ′ Σ( γ ) − T ( λ )Σ (cid:17) /n (cid:17) = O p (cid:18)(cid:13)(cid:13)(cid:13) Ψ ′ Σ( γ ) − T ( λ )Σ /n (cid:13)(cid:13)(cid:13) F (cid:19) = O p (cid:18) k Ψ /n k F ϕ (cid:0) Σ( γ ) − (cid:1) k T ( λ ) k (cid:13)(cid:13)(cid:13) Σ (cid:13)(cid:13)(cid:13) (cid:19) = O p (cid:0) k Ψ /n k F k T ( λ ) k ϕ (Σ) /ϕ (Σ( γ )) (cid:1) = O p (cid:0) k T ( λ ) k /n (cid:1) , (A.22)where we have twice made use of the inequality k AB k F ≤ k A k F k B k (A.23)for generic multiplication compatible matrices A and B . (A.20) now follows by AssumptionSAR.1 and compactness of Λ because T ( λ ) = I n + P d λ j =1 ( λ j − λ j ) G j . Finally consider(A.19). We first prove pointwise convergence. 
For any fixed φ ∈ N φ ( η ) and large enough n , Assumptions SAR.2 and SAR.4 imply { c ( φ ) } − = O p (cid:0) k β k − (cid:1) = O p (1) (A.24) { c ( φ ) } − = O p (1) , (A.25)because n n − σ tr (cid:16) Σ ′ T ′ ( λ )Σ( γ ) − T ( λ )Σ (cid:17)o − = O p (1) and, proceeding like in the boundfor | f ( φ ) | , tr (cid:16) Σ ′ T ′ ( λ )Σ( γ ) ′− ( I − M ( γ )) Σ( γ ) − T ( λ )Σ (cid:17) = O p (cid:0) k T ( λ ) k /n (cid:1) = O p (1 /n ).In fact it is worth noting for the equicontinuity argument presented later that AssumptionsSAR.2 and SAR.4 actually imply that (A.24) and (A.25) hold uniformly over N φ ( η ), aproperty not needed for the present pointwise arguments. Thus c ( φ ) / ( c ( φ ) + c ( φ )) = O p ( | c ( φ ) | ) where, writing B ( φ ) = T ′ ( λ )Σ( γ ) ′− M ( γ ) Σ( γ ) − T ( λ ) with typical element37 rs ( φ ), r, s = 1 , . . . , n , c ( φ ) has mean 0 and variance O p k B ( φ )Σ k F n + P nr,s,t,v =1 b rs ( φ ) b tv ( φ ) κ rstv n + (cid:13)(cid:13)(cid:13) β ′ Ψ ′ B ( φ )Σ (cid:13)(cid:13)(cid:13) n , (A.26)with κ rstv denoting the fourth cumulant of u r , u s , u t , u v , r, s, t, v = 1 , . . . , n . Under the linearprocess assumed in Assumption R.4 it is known that n X r,s,t,v =1 κ rstv = O ( n ) . (A.27)Using (A.23) and Assumptions SAR.1 and R.3, the first term in parentheses in (A.26) is O p (cid:0) k B ( φ ) k F ϕ (Σ) /n (cid:1) = O p (cid:18) k T ( λ ) k F (cid:13)(cid:13)(cid:13) Σ( γ ) − (cid:13)(cid:13)(cid:13) k M ( γ ) k k T ( λ ) k /n (cid:19) = O p (cid:0) k T ( λ ) k /nϕ (Σ( γ )) (cid:1) = O p (cid:0) k T ( λ ) k /n (cid:1) , (A.28)while the second is similarly O p (cid:0) k B ( φ ) k F /n (cid:1) n X r,s,t,v =1 κ rstv /n ! = o p (cid:0) k T ( λ ) k (cid:1) , (A.29)using (A.27). Finally, the third term in parentheses in (A.26) is O p (cid:0) k B ( φ ) k /n (cid:1) = O p (cid:0) k T ( λ ) k /n (cid:1) . 
(A.30)By compactness of Λ and Assumption SAR.1, (A.28), (A.29) and (A.30) are negligible, thuspointwise convergence is established.Uniform convergence will follow from an equicontinuity argument. First, for arbitrary ε > φ ∗ = ( λ ′∗ , γ ′∗ ) ′ , possibly infinitely many, such that the neighbourhoods k φ − φ ∗ k < ε form an open cover of N φ ( η ). Since Φ is compact any open cover has a finitesubcover and thus we may in fact choose finitely many φ ∗ = ( λ ′∗ , γ ′∗ ) ′ , whence it suffices toprove sup k φ − φ ∗ k <ε (cid:12)(cid:12)(cid:12)(cid:12) c ( φ ) c ( φ ) + c ( φ ) − c ( φ ∗ ) c ( φ ∗ ) + c ( φ ∗ ) (cid:12)(cid:12)(cid:12)(cid:12) p −→ . Proceeding as in Gupta and Robinson (2018), we denote the two components of c ( φ ) by38 ( φ ) , c ( φ ) , and are left with establishing the negligibility of | c ( φ ) − c ( φ ∗ ) | c ( φ ) + | c ( φ ) − c ( φ ∗ ) | c ( φ ) + | c ( φ ∗ ) | c ( φ ) c ( φ ∗ ) | c ( φ ∗ ) − c ( φ ) | + | c ( φ ∗ ) | c ( φ ) c ( φ ∗ ) | c ( φ ∗ ) − c ( φ ) | , (A.31)uniformly on k φ − φ ∗ k < ε . By the fact that (A.24) and (A.25) hold uniformly over Φ, wefirst consider only the numerators in the first two terms in (A.31). As in the proof of Theorem1 of Delgado and Robinson (2015), (A.23) implies that E (cid:0) sup k φ − φ ∗ k <ε | c ( φ ) − c ( φ ∗ ) | (cid:1) isbounded by n − (cid:0) E k u k + σ tr Σ (cid:1) sup k φ − φ ∗ k <ε k B ( φ ) − B ( φ ∗ ) k = O p sup k φ − φ ∗ k <ε k B ( φ ) − B ( φ ∗ ) k ! , because E k u k = O ( n ) and tr Σ = O ( n ). 
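The variance calculation in (A.26), and the analogous one in (B.2) below, rest on the classical formula for the variance of a quadratic form in independent innovations: for i.i.d. mean-zero $\varepsilon_s$ with variance $\sigma^2$ and fourth moment $\mu_4$, and symmetric $D$, $\mathrm{var}(\varepsilon'D\varepsilon) = (\mu_4-3\sigma^4)\sum_s d_{ss}^2 + 2\sigma^4\,\mathrm{tr}(D^2)$. A Monte Carlo sketch with Gaussian innovations (so $\mu_4 = 3\sigma^4$ and the first term drops) and an arbitrary symmetric $D$, chosen purely for illustration rather than the $D_j$ of the proofs:

```python
import numpy as np

# Monte Carlo check of var(eps'D eps) = 2*tr(D^2) for Gaussian eps with
# sigma^2 = 1 and a symmetric D.  D is an arbitrary random symmetric
# matrix used only to illustrate the formula.
rng = np.random.default_rng(2)
n, reps = 50, 4000
M = rng.standard_normal((n, n))
D = (M + M.T) / 2

theory = 2.0 * np.trace(D @ D)
draws = rng.standard_normal((reps, n))
qf = ((draws @ D) * draws).sum(axis=1)  # eps'D eps for each replication
assert abs(qf.var() / theory - 1.0) < 0.15
```

With many eigenvalues of comparable size the simulated variance settles close to the theoretical value; the non-Gaussian correction term would be activated by drawing, e.g., centred exponential innovations instead.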
B ( φ ) − B ( φ ∗ ) can be written as( T ( λ ) − T ( λ ∗ )) ′ Σ( γ ) ′− M ( γ )Σ( γ ) − T ( λ ) + T ( λ ∗ ) ′ Σ ′ ( γ ∗ ) M ( γ ∗ )Σ( γ ∗ ) − ( T ( λ ) − T ( λ ∗ ))+ T ′ ( λ ∗ ) (cid:16) Σ( γ ) ′− M ( γ )Σ( γ ) − − Σ( γ ∗ ) ′− M ( γ ∗ )Σ( γ ∗ ) − (cid:17) T ( λ ) , (A.32)which, by the triangle inequality, has spectral norm bounded by k T ( λ ) − T ( λ ∗ ) k (cid:18)(cid:13)(cid:13)(cid:13) Σ( γ ) − (cid:13)(cid:13)(cid:13) k T ( λ ) k + (cid:13)(cid:13)(cid:13) Σ( γ ∗ ) − (cid:13)(cid:13)(cid:13) k T ( λ ∗ ) k (cid:19) + k T ( λ ∗ ) k (cid:13)(cid:13)(cid:13) Σ( γ ) ′− M ( γ )Σ( γ ) − − Σ( γ ∗ ) ′− M ( γ ∗ )Σ( γ ∗ ) − (cid:13)(cid:13)(cid:13) k T ( λ ) k = O p (cid:16) k T ( λ ) − T ( λ ∗ ) k + (cid:13)(cid:13)(cid:13) Σ( γ ) ′− M ( γ )Σ( γ ) − − Σ( γ ∗ ) ′− M ( γ ∗ )Σ( γ ∗ ) − (cid:13)(cid:13)(cid:13)(cid:17) . (A.33)By Assumption SAR.1 the first term in parentheses on the right side of (A.33) is boundeduniformly on k φ − φ ∗ k < ε by d λ X j =1 | λ j − λ ∗ j | k G j k ≤ max j =1 ,...,d λ k G j k k λ − λ ∗ k = O p ( ε ) , (A.34)while because Σ( γ ) ′− M ( γ )Σ( γ ) − = n − Σ( γ ) − Ψ ( n − Ψ ′ Σ( γ ) − Ψ) − Ψ ′ Σ( γ ) − for any γ ∈ Γ, the second one can be decomposed into terms with bounds typified by n − (cid:13)(cid:13) Σ( γ ) − − Σ( γ ∗ ) − (cid:13)(cid:13) k Ψ k (cid:13)(cid:13)(cid:13)(cid:0) n − Ψ ′ Σ( γ ) − Ψ (cid:1) − (cid:13)(cid:13)(cid:13) (cid:13)(cid:13) Σ( γ ) − (cid:13)(cid:13) n − k Σ( γ ) − Σ( γ ∗ ) k k Ψ k (cid:13)(cid:13)(cid:13)(cid:0) n − Ψ ′ Σ( γ ) − Ψ (cid:1) − (cid:13)(cid:13)(cid:13) (cid:13)(cid:13) Σ( γ ) − (cid:13)(cid:13) (cid:13)(cid:13) Σ( γ ∗ ) − (cid:13)(cid:13) = O p ( k Σ( γ ) − Σ( γ ∗ ) k ) = O p ( ε ) , uniformly on k φ − φ ∗ k < ε , by Assumptions R.3 and R.8, Proposition 4.1 and the inequality k A k ≤ k A k F for a generic matrix A , so thatsup k φ − φ ∗ k <ε k B ( φ ) − B ( φ ∗ ) k = O p ( ε ) . (A.35)Thus equicontinuity of the first term in (A.31) follows because ε is arbitrary. 
The equicontinu-ity of the second term in (A.31) follows in much the same way. Indeed sup k φ − φ ∗ k <ε c ( φ ) − c ( φ ∗ ) = 2 n − β ′ Ψ ′ sup k φ − φ ∗ k <ε ( B ( φ ) − B ( φ ∗ )) u = O p (cid:0) sup k φ − φ ∗ k <ε k B ( φ ) − B ( φ ∗ ) k (cid:1) = O p ( ε ), using earlier arguments and (A.35). Because c ( φ ) is bounded and bounded awayfrom zero in probability (see A.24) for sufficiently large n and all φ ∈ N φ ( η ), the thirdterm in (A.31) may be bounded by | c ( φ ∗ ) | /c ( φ ∗ ) (1 + c ( φ ∗ ) /c ( φ )) p −→ , convergencebeing uniform on k φ − φ ∗ k < ε by pointwise convergence of c ( φ ) / ( c ( φ ) + c ( φ )), cf.Gupta and Robinson (2018). The uniform convergence to zero of the fourth term in (A.31)follows in identical fashion, because c ( φ ) is bounded and bounded away from zero (see(A.25)) in probability for sufficiently large n and all φ ∈ N φ ( η ). This concludes the proof. Proof of Theorem 5.2.
Denote by $\theta^*$ the solution of $\min_\theta E\big(y_i - \sum_{j=1}^{d_\lambda}\lambda_jw'_{i,j}y - \theta(x_i)\big)^2$. Put $\theta^*_i = \theta^*(x_i)$, $\theta_i = \theta(x_i)$, $\hat\theta_i = \psi_i'\hat\beta$, $\hat f_i = f(x_i,\hat\alpha)$, $f^*_i = f(x_i,\alpha^*)$. Then $\hat u_i = y_i - \sum_{j=1}^{d_\lambda}\hat\lambda_jw'_{i,j}y - f(x_i,\hat\alpha) = u_i + \theta_i + \sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)w'_{i,j}y - \hat f_i$. Proceeding as in the proof of Theorem 4.2, we obtain $n\hat m_n = \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u + \hat\sigma^{-2}\sum_{j=1}^{7}A_j$. Thus, compared to the test statistic with no spatial lag, cf. the proof of Theorem 4.2, we have the additional terms
$$A_5 = \sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)y'W_j'\,\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)W_jy,$$
$$A_6 = \sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)y'W_j'\,\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\big(u+\theta-\hat f\big),$$
$$A_7 = \Big(\Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+e) - e + \theta - \hat f\Big)'\Sigma(\hat\gamma)^{-1}\sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)W_jy.$$
We now show that $A_\ell = o_p(\sqrt{p})$, $\ell >$
4, so the leading term in $n\hat m_n$ is the same as before. First, $\|y\| = O_p(\sqrt{n})$ from $y = \big(I_n - \sum_{j=1}^{d_\lambda}\lambda_jW_j\big)^{-1}(\theta + u)$. Then, with $\big\|\lambda - \hat\lambda\big\| = O_p\big(\sqrt{d_\gamma/n}\big)$
40y Lemma B.2, we have | A | ≤ (cid:13)(cid:13)(cid:13) λ − b λ (cid:13)(cid:13)(cid:13) d λ X j =1 k W j k sup γ,j (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) Σ ( γ ) − n Ψ (cid:18) n Ψ ′ Σ ( γ ) − Ψ (cid:19) − Ψ ′ Σ ( γ ) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) k y k = O p ( d γ /n ) O p (1) O p ( n ) = O p ( d γ ) = o p ( √ p ) . Uniformly in γ and j , E (cid:0) u ′ S − ′ W ′ j Σ ( γ ) − Ψ[Ψ ′ Σ ( γ ) − Ψ] − Ψ ′ Σ ( γ ) − u (cid:1) = Etr (cid:18) n Ψ ′ Σ ( γ ) − Ψ (cid:19) − n Ψ ′ Σ ( γ ) − Σ S − ′ W ′ j Σ ( γ ) − Ψ ! = O p ( p )and E (cid:0) θ ′ S − ′ W ′ j Σ ( γ ) − Ψ[Ψ ′ Σ ( γ ) − Ψ] − Ψ ′ Σ ( γ ) − u (cid:1) = O p (cid:13)(cid:13) S − (cid:13)(cid:13) sup γ (cid:13)(cid:13) Σ ( γ ) − (cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n Ψ (cid:18) n Ψ ′ Σ ( γ ) − Ψ (cid:19) − Ψ ′ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) sup j k W j k k Σ k k θ k = O p ( n ) . Similarly, θ ′ S − ′ W ′ j Σ ( γ ) − Ψ[Ψ ′ Σ ( γ ) − Ψ] − Ψ ′ Σ ( γ ) − W j θ = O p ( n ) , uniformly. Therefore, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) d λ X j =1 ( λ j − b λ j ) y ′ W ′ j Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − u (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) d λ X j =1 ( λ j − b λ j ) ( θ + u ) ′ S − ′ W ′ j Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − u (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ d λ (cid:13)(cid:13)(cid:13) λ − b λ (cid:13)(cid:13)(cid:13) sup γ,j (cid:12)(cid:12) θ ′ S − ′ W ′ j Σ ( γ ) − Ψ[Ψ ′ Σ ( γ ) − Ψ] − Ψ ′ Σ ( γ ) − u (cid:12)(cid:12) + d λ (cid:13)(cid:13)(cid:13) λ − b λ (cid:13)(cid:13)(cid:13) sup γ,j (cid:12)(cid:12) u ′ S − ′ W ′ j Σ ( γ ) − Ψ[Ψ ′ Σ ( γ ) − Ψ] − Ψ ′ Σ ( γ ) − u (cid:12)(cid:12) = O p (cid:18)q d γ /n (cid:19) O p ( √ n ) + O p (cid:18)q d γ /n (cid:19) O p ( p ) = O p (cid:16)p d γ (cid:17) = o p ( √ p ) , and (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) d λ X j =1 ( λ j − b λ j ) y ′ W ′ j Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − ( θ − b f ) 
(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ d λ (cid:13)(cid:13)(cid:13) λ − b λ (cid:13)(cid:13)(cid:13) k y k sup j k W j k sup γ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n Ψ (cid:18) n Ψ ′ Σ ( γ ) − Ψ (cid:19) − Ψ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) sup γ (cid:13)(cid:13) Σ ( γ ) − (cid:13)(cid:13) (cid:13)(cid:13)(cid:13) θ − b f (cid:13)(cid:13)(cid:13) O p (cid:18)q d γ /n (cid:19) O p (cid:0) √ n (cid:1) O p (cid:0) p / (cid:1) = O p (cid:16)p d γ p / (cid:17) = o p ( √ p ) , so that A = o p ( √ p ). Finally, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) d λ X j =1 ( λ j − b λ j ) y ′ W ′ j Σ ( b γ ) − Ψ[Ψ ′ Σ ( b γ ) − Ψ] − Ψ ′ Σ ( b γ ) − e (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ d λ (cid:13)(cid:13)(cid:13) λ − b λ (cid:13)(cid:13)(cid:13) k y k sup j k W j k sup γ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n Ψ (cid:18) n Ψ ′ Σ ( γ ) − Ψ (cid:19) − Ψ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) sup γ (cid:13)(cid:13) Σ ( γ ) − (cid:13)(cid:13) k e k = O p (cid:18)q d γ /n (cid:19) O p (cid:0) √ n (cid:1) O p (cid:0) p − µ √ n (cid:1) = O p (cid:16)p d γ p − µ √ n (cid:17) = o p ( √ p ) , and (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( e + θ − b f ) ′ Σ ( b γ ) − d λ X j =1 ( λ j − b λ j ) W j y (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ d λ (cid:13)(cid:13)(cid:13) λ − b λ (cid:13)(cid:13)(cid:13) (cid:16) k e k + (cid:13)(cid:13)(cid:13) θ − b f (cid:13)(cid:13)(cid:13)(cid:17) sup γ (cid:13)(cid:13) Σ ( γ ) − (cid:13)(cid:13) sup j k W j k k y k = O p (cid:18)q d γ /n (cid:19) O p (cid:0) p − µ √ n + p / (cid:1) O p (cid:0) √ n (cid:1) = O p (cid:16)p d γ p − µ √ n + p d γ p / (cid:17) = o p ( √ p ) , implying that A = o p ( √ p ) . Proof of Theorem 5.3.
Omitted as it is similar to the proof of Theorem 4.4.
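The matrix-norm inequality (A.23), $\|AB\|_F \le \|A\|_F\|B\|$ with $\|\cdot\|$ the spectral norm, is invoked repeatedly in the preceding proofs. A quick numerical sanity check on arbitrary random matrices (purely illustrative; any conformable pair works):

```python
import numpy as np

# Check ||A B||_F <= ||A||_F * ||B||_2 (Frobenius norm is submultiplicative
# against the spectral norm) on an arbitrary random conformable pair.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((4, 5))

lhs = np.linalg.norm(A @ B, "fro")
rhs = np.linalg.norm(A, "fro") * np.linalg.norm(B, 2)
assert lhs <= rhs + 1e-12
```

The inequality follows from applying $\|Bx\| \le \|B\|\,\|x\|$ to each row of $A$, which is why the Frobenius norm sits on $A$ and the spectral norm on $B$.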
Proof of Proposition 6.1.
Because the map Σ : T o → M n × n is Fr´echet-differentiable on T o ,it is also Gˆateaux-differentiable and the two derivative maps coincide. Thus by Theorem 1.8of Ambrosetti and Prodi (1995), k Σ( t ) − Σ( t ) k ≤ sup t ∈T o k D Σ( t ) k L ( T o , M n × n ) k γ − γ k + d ζ X i =1 (cid:13)(cid:13) ( δ i − δ i ) ′ ϕ i (cid:13)(cid:13) w , (A.36)where d ζ X i =1 (cid:13)(cid:13) ( δ i − δ i ) ′ ϕ i (cid:13)(cid:13) w = d ζ X i =1 sup z ∈Z (cid:12)(cid:12) ( δ i − δ i ) ′ ϕ i (cid:12)(cid:12) (cid:0) k z k (cid:1) − w/ ≤ d ζ X i =1 k δ i − δ i k sup z ∈Z k ϕ i k (cid:0) k z k (cid:1) − w/ Cς ( r ) d ζ X i =1 k δ i − δ i k ≤ Cς ( r ) k t − t k . The claim now follows by (6.7) in Assumption NPN.2, because k γ − γ k ≤ Cς ( r ) k t − t k for some suitably chosen C . Proof of Theorem 6.1.
The proof is omitted as it is entirely analogous to that of Theorem 5.1, with the exception of one difference when proving equicontinuity. In the setting of Section 6, we obtain via Proposition 6.1 $\|\Sigma(\tau) - \Sigma(\tau_*)\| = O_p(\varsigma(r)\varepsilon)$. Because $\varepsilon > 0$ is arbitrary, we may take $\varepsilon = \varepsilon'/\varsigma(r)$, for some arbitrary $\varepsilon' > 0$. Proof of Theorem 6.2.
Writing, δ ( z ) = (cid:16)b δ ′ ϕ ( z ) , . . . , b δ ′ d ζ ϕ d ζ ( z ) (cid:17) ′ and taking t = (cid:16)b γ ′ , ˆ δ ( z ) ′ (cid:17) ′ and t = ( γ ′ , ζ ( z ) ′ ) ′ in Proposition 6.1 implies (we suppress the argument z ) k Σ ( b τ ) − Σ k = O p (cid:16) ς ( r ) (cid:16) k b γ − γ k + (cid:13)(cid:13)(cid:13)b δ − ζ (cid:13)(cid:13)(cid:13)(cid:17)(cid:17) = O p ( ς ( r ) ( k b τ − τ k + k ν k ))= O p ς ( r ) max p d τ /n, vuut d ζ X i =1 r − κ i i , uniformly on Z . Thus we have (cid:13)(cid:13) Σ ( b τ ) − − Σ − (cid:13)(cid:13) ≤ (cid:13)(cid:13) Σ ( b τ ) − (cid:13)(cid:13) k Σ ( b τ ) − Σ k (cid:13)(cid:13) Σ − (cid:13)(cid:13) = O p ς ( r ) max p d τ /n, vuut d ζ X i =1 r − κ i i . And similarly, (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( b τ ) − Ψ (cid:19) − − (cid:18) n Ψ ′ Σ − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( b τ ) − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13) n Ψ ′ (cid:0) Σ ( b τ ) − − Σ − (cid:1) Ψ (cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = O p (cid:0)(cid:13)(cid:13) Σ ( b τ ) − − Σ − (cid:13)(cid:13)(cid:1) = O p ς ( r ) max p d τ /n, vuut d ζ X i =1 r − κ i i . As in the proof of Theorem 4.2, n b m n = b σ − u ′ Σ ( b τ ) − Ψ[Ψ ′ Σ ( b τ ) − Ψ] − Ψ ′ Σ ( b τ ) − u + b σ − P ℓ =1 A ℓ , where γ in the parametric setting is changed to τ in this nonparametric setting.43hen, by the MVT, (cid:12)(cid:12) u ′ (cid:0) Σ ( b τ ) − Ψ[Ψ ′ Σ ( b τ ) − Ψ] − Ψ ′ Σ ( b τ ) − − Σ − Ψ[Ψ ′ Σ − Ψ] − Ψ ′ Σ − (cid:1) u (cid:12)(cid:12) ≤ sup t (cid:13)(cid:13)(cid:13)(cid:13) √ n u ′ Σ ( t ) − Ψ (cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( t ) − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)! 
d τ X j =1 (cid:13)(cid:13)(cid:13)(cid:13) √ n Ψ ′ (cid:0) Σ ( e τ ) − Σ j ( e τ ) Σ ( e τ ) − (cid:1) u (cid:13)(cid:13)(cid:13)(cid:13) × | e τ j − τ j | + 2 sup t (cid:13)(cid:13)(cid:13)(cid:13) √ n u ′ Σ ( t ) − Ψ (cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( t ) − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13) √ n Ψ ′ (Σ − Σ) u (cid:13)(cid:13)(cid:13)(cid:13) + (cid:13)(cid:13)(cid:13)(cid:13) √ n u ′ Σ − Ψ (cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n Ψ ′ Σ ( b τ ) − Ψ (cid:19) − − (cid:18) n Ψ ′ Σ − Ψ (cid:19) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = O p ( √ p ) O p ( d τ √ pς ( r ) / √ n ) + O p ( √ p ) O p √ pς ( r ) vuut d ζ X i =1 r − κ i i + O p ( p ) O p ς ( r ) max p d τ /n, vuut d ζ X i =1 r − κ i i = O p pς ( r ) max d τ / √ n, vuut d ζ X i =1 r − κ i i = o p ( √ p ) , where the last equality holds under the conditions of the theorem. Next, it remains to show A ℓ = o p ( p / ) , ℓ = 1 , . . . ,
4. The order of A ℓ , ℓ ≤
3, is the same as the parametric case: | A | = (cid:12)(cid:12)(cid:12) u ′ Σ ( b τ ) − (cid:16) θ − b f (cid:17)(cid:12)(cid:12)(cid:12) ≤ sup α,t (cid:13)(cid:13)(cid:13)(cid:13) u ′ Σ ( t ) − ∂f ( x, α ) ∂α j (cid:13)(cid:13)(cid:13)(cid:13) (cid:12)(cid:12) α ∗ j − e α j (cid:12)(cid:12) + p / n / sup t (cid:13)(cid:13) u ′ Σ ( t ) − h (cid:13)(cid:13) = O p ( √ n ) O p ( 1 √ n ) + O ( p / n / ) O p ( √ n ) = O p ( p / ) = o p ( p / ) , | A | = (cid:12)(cid:12)(cid:12) ( u + θ − b f ) ′ (cid:0) Σ ( b τ ) − − Σ ( b τ ) − Ψ[Ψ ′ Σ ( b τ ) − Ψ] − Ψ ′ Σ ( b τ ) − (cid:1) e (cid:12)(cid:12)(cid:12) ≤ sup t | u ′ Σ ( t ) − e | + sup t (cid:12)(cid:12) u ′ Σ ( t ) − Ψ[Ψ ′ Σ ( t ) − Ψ] − Ψ ′ Σ ( t ) − e (cid:12)(cid:12) + (cid:13)(cid:13)(cid:13) θ − b f (cid:13)(cid:13)(cid:13) sup t (cid:0)(cid:13)(cid:13) Σ ( t ) − (cid:13)(cid:13) + (cid:13)(cid:13) Σ ( t ) − Ψ[Ψ ′ Σ ( t ) − Ψ] − Ψ ′ Σ ( t ) − (cid:13)(cid:13)(cid:1) k e k = O p ( p − µ n / ) + O p ( p − µ +1 / n / ) = O p ( p − µ +1 / n / ) = o p ( √ p ) , | A | = (cid:12)(cid:12)(cid:12) u ′ Σ ( b τ ) − Ψ (cid:0) Ψ ′ Σ ( b τ ) − Ψ (cid:1) − Ψ ′ Σ ( b τ ) − ( θ − b f ) (cid:12)(cid:12)(cid:12) ≤ sup α,t d α X j =1 (cid:13)(cid:13)(cid:13)(cid:13) u ′ Σ ( t ) − Ψ (cid:0) Ψ ′ Σ ( t ) − Ψ (cid:1) − Ψ ′ Σ ( t ) − ∂f ( x, α ) ∂α j (cid:13)(cid:13)(cid:13)(cid:13) (cid:12)(cid:12) α ∗ j − e α j (cid:12)(cid:12) p / n / sup t (cid:13)(cid:13)(cid:13) u ′ Σ ( t ) − Ψ (cid:0) Ψ ′ Σ ( t ) − Ψ (cid:1) − Ψ ′ Σ ( t ) − h (cid:13)(cid:13)(cid:13) = O p (1) + O p ( p / ) = O p ( p / ) = o p ( p / ) . However, A has a different order. 
Under H ℓ , A = (cid:16) θ − b f (cid:17) ′ Σ ( b γ ) − (cid:16) θ − b f (cid:17) = (cid:16) θ − b f (cid:17) ′ Σ − (cid:16) θ − b f (cid:17) + (cid:16) θ − b f (cid:17) ′ (cid:0) Σ ( b τ ) − − Σ − (cid:1) (cid:16) θ − b f (cid:17) = p / n h ′ Σ − h + o p (1) + O p (cid:0) p / (cid:1) O p ς ( r ) max p d τ /n, vuut d ζ X i =1 r − κ i i = p / n h ′ Σ − h + o p ( √ p ) , where the last equality holds under the conditions of the theorem. Combining these together,we have n b m n = b σ − b v ′ Σ ( b τ ) − b u = σ − ε ′ V ε + (cid:0) p / /n (cid:1) h ′ Σ − h + o p ( √ p ) , under H ℓ and thesame expression holds with h = 0 under H . Proof of Theorem 6.3.
Omitted as it is similar to the proof of Theorem 4.4.
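The proofs above reduce $n\hat m_n$ to the quadratic form $\sigma^{-2}\varepsilon'V\varepsilon$ plus smaller-order terms, and the statistic is then centred at $p$ and scaled by $\sqrt{2p}$ as in (A.5). A Monte Carlo sketch of this leading term, under the simplifying assumptions (ours, for illustration only) that $\Sigma = I_n$, the errors are Gaussian, and $V$ is the projection onto the column space of a regressor matrix $\Psi$, in which case $\varepsilon'V\varepsilon \sim \chi^2_p$:

```python
import numpy as np

# Leading term of the test statistic: with V the rank-p projection onto
# col(Psi), Sigma = I_n and Gaussian errors (simplifying assumptions of
# this sketch), eps'V eps ~ chi^2_p, so (eps'V eps - p)/sqrt(2p) is
# approximately standard normal for moderate p.
rng = np.random.default_rng(1)
n, p, reps = 400, 20, 2000
Psi = rng.standard_normal((n, p))
V = Psi @ np.linalg.solve(Psi.T @ Psi, Psi.T)  # projection, tr(V) = p

eps = rng.standard_normal((reps, n))
qf = ((eps @ V) * eps).sum(axis=1)  # eps'V eps per replication
stats = (qf - p) / np.sqrt(2 * p)

assert abs(stats.mean()) < 0.15
assert abs(stats.var() - 1.0) < 0.2
```

The simulated standardized statistic has mean near 0 and variance near 1; in the paper's setting the same conclusion requires the increasing-dimension central limit theorem rather than exact chi-squared distribution theory.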
B Lemmas
Lemma B.1.
Under the conditions of Theorem 4.1, $c_1(\gamma) = n^{-1}\beta_0'\Psi'C'(\gamma)M(\gamma)C(\gamma)\Psi\beta_0 + o_p(1)$. Proof.
First, $c_1(\gamma) = n^{-1}\beta_0'\Psi'C'(\gamma)M(\gamma)C(\gamma)\Psi\beta_0 + c_{12}(\gamma) + c_{13}(\gamma)$, with $c_{12}(\gamma) = 2n^{-1}e'C'(\gamma)M(\gamma)C(\gamma)\Psi\beta_0$ and $c_{13}(\gamma) = n^{-1}e'C'(\gamma)M(\gamma)C(\gamma)e$. It is readily seen that $c_{12}(\gamma)$ and $c_{13}(\gamma)$ are negligible. Lemma B.2.
Under the conditions of Theorem 4.2 or Theorem 5.2, $\|\hat\gamma - \gamma_0\| = O_p\big(\sqrt{d_\gamma/n}\big)$. Proof.
We show the details for the setting of Theorem 4.2 and omit those for the setting of Theorem 5.2. Write $l = \partial\mathcal{L}(\beta_0,\gamma_0)/\partial\gamma$. By Robinson (1988), we have $\|\hat\gamma-\gamma_0\| = O_p(\|l\|)$. Now $l = (l_1,\ldots,l_{d_\gamma})'$, with $l_j = n^{-1}\mathrm{tr}(\Sigma^{-1}\Sigma_j) - n^{-1}\sigma^{-2}u'\Sigma^{-1}\Sigma_j\Sigma^{-1}u$. Next, $E\|l\|^2 = \sum_{j=1}^{d_\gamma}E(l_j^2)$ and
$$E(l_j^2) = \frac{1}{n^2\sigma^4}\,\mathrm{var}\big(u'\Sigma^{-1}\Sigma_j\Sigma^{-1}u\big) = \frac{1}{n^2\sigma^4}\,\mathrm{var}\big(\varepsilon'B'\Sigma^{-1}\Sigma_j\Sigma^{-1}B\varepsilon\big) = \frac{1}{n^2\sigma^4}\,\mathrm{var}\big(\varepsilon'D_j\varepsilon\big), \qquad (B.1)$$
say. But, writing $d_{j,st}$ for a typical element of the infinite-dimensional matrix $D_j$, we have
$$\mathrm{var}\big(\varepsilon'D_j\varepsilon\big) = \big(\mu_4-3\sigma^4\big)\sum_{s=1}^\infty d_{j,ss}^2 + 2\sigma^4\,\mathrm{tr}\big(D_j^2\big) = \big(\mu_4-3\sigma^4\big)\sum_{s=1}^\infty d_{j,ss}^2 + 2\sigma^4\sum_{s,t=1}^\infty d_{j,st}^2. \qquad (B.2)$$
Next, by Assumptions R.4, R.3 and R.9,
$$\sum_{s=1}^\infty d_{j,ss}^2 = \sum_{s=1}^\infty\big(b_s'\Sigma^{-1}\Sigma_j\Sigma^{-1}b_s\big)^2 \le \Big(\sum_{s=1}^\infty\|b_s\|^4\Big)\big\|\Sigma^{-1}\big\|^4\|\Sigma_j\|^2 = O\Big(\sum_{i=1}^n\Big(\sum_{s=1}^\infty|b^*_{is}|\Big)^4\Big) = O(n). \qquad (B.3)$$
Similarly,
$$\sum_{s,t=1}^\infty d_{j,st}^2 = \sum_{s=1}^\infty b_s'\Sigma^{-1}\Sigma_j\Sigma^{-1}\Big(\sum_{t=1}^\infty b_tb_t'\Big)\Sigma^{-1}\Sigma_j\Sigma^{-1}b_s = \sum_{s=1}^\infty b_s'\Sigma^{-1}\Sigma_j\Sigma^{-1}\Sigma_j\Sigma^{-1}b_s = O(n). \qquad (B.4)$$
Using (B.3) and (B.4) in (B.2) implies that $E(l_j^2) = O(n^{-1})$, by (B.1). Thus we have $E\|l\|^2 = O(d_\gamma/n)$, and thus $\|l\| = O_p\big(\sqrt{d_\gamma/n}\big)$, by Markov's inequality, proving the lemma. Lemma B.3.
Under the conditions of Theorem 4.3, $E\big(\sigma^{-2}\varepsilon'V\varepsilon\big) = p$ and $\mathrm{Var}\big(\sigma^{-2}\varepsilon'V\varepsilon\big)/2p \to 1$.

Proof. As $E\big(\sigma^{-2}\varepsilon'V\varepsilon\big) = \mathrm{tr}\big(E[B'\Sigma^{-1}\Psi(\Psi'\Sigma^{-1}\Psi)^{-1}\Psi'\Sigma^{-1}B]\big) = p$, and
$$\mathrm{Var}\Big(\frac{1}{\sigma^2}\varepsilon'V\varepsilon\Big) = \Big(\frac{\mu_4}{\sigma^4}-3\Big)\sum_{s=1}^\infty E\big(v_{ss}^2\big) + E\big[\mathrm{tr}(VV') + \mathrm{tr}(V^2)\big] = \Big(\frac{\mu_4}{\sigma^4}-3\Big)\sum_{s=1}^\infty v_{ss}^2 + 2p, \qquad (B.5)$$
it suffices to show that
$$(2p)^{-1}\sum_{s=1}^\infty v_{ss}^2 \overset{p}{\to} 0. \qquad (B.6)$$
Because $v_{ss} = b_s'Mb_s$, we have $v_{ss}^2 = \big(\sum_{i,j=1}^n b_{is}b_{js}m_{ij}\big)^2$. Thus, using Assumption R.4 and (A.9), we have
$$\sum_{s=1}^\infty v_{ss}^2 \le \Big(\sup_{i,j}|m_{ij}|\Big)^2\sum_{s=1}^\infty\Big(\sum_{i,j=1}^n|b^*_{is}|\,|b^*_{js}|\Big)^2 = O_p\Big(p^2n^{-2}\sup_s\Big(\sum_{i=1}^n|b^*_{is}|\Big)^2\sum_{i=1}^n\sum_{s=1}^\infty|b^*_{is}|\Big) = O_p\big(p^2n^{-1}\big), \qquad (B.7)$$
establishing (B.6) because $p^3/n \to 0$.

Lemma B.4.
Under the conditions of Theorem 6.2, $\|\hat\tau - \tau_0\| = O_p\big(\sqrt{d_\tau/n}\big)$. Proof.
The proof is similar to that of Lemma B.2 and is omitted.

Denote $H(\gamma) = I_n + \sum_{j=m_1+1}^{m_1+m_2}\gamma_jW_j$ and $K(\gamma) = I_n - \sum_{j=1}^{m_1}\gamma_jW_j$. Let $G_j(\gamma) = W_jK^{-1}(\gamma)$, $j = 1,\ldots,m_1$, $T_j(\gamma) = H^{-1}(\gamma)W_j$, $j = m_1+1,\ldots,m_1+m_2$, and, for a generic matrix $A$, denote $A^{s} = A + A'$. Our final conditions may differ according to whether the $W_j$ are of general form or have 'single nonzero diagonal block structure', see e.g. Gupta and Robinson (2015). To define the latter, denote by $V$ an $n\times n$ block diagonal matrix with $i$-th block $V_i$, an $s_i\times s_i$ matrix, where $\sum_{i=1}^{m_1+m_2}s_i = n$, and for $i = 1,\ldots,m_1+m_2$ obtain $W_i$ from $V$ by replacing each $V_j$, $j \ne i$, by a matrix of zeros. Thus $V = \sum_{i=1}^{m_1+m_2}W_i$. Lemma B.5.
For the spatial error model with SARMA($p,q$) errors, if
$$\sup_{\gamma\in\Gamma^o}\big(\|K^{-1}(\gamma)\| + \|K'^{-1}(\gamma)\| + \|H^{-1}(\gamma)\| + \|H'^{-1}(\gamma)\|\big) + \max_{j=1,\ldots,m_1+m_2}\|W_j\| < C, \qquad (B.8)$$
then
$$(D\Sigma(\gamma))\big(\gamma^\dagger\big) = A^{-1}(\gamma)\Big(\sum_{j=1}^{m_1}\gamma^\dagger_jH^{-1}(\gamma)G_j(\gamma) + \sum_{j=m_1+1}^{m_1+m_2}\gamma^\dagger_jT_j(\gamma)\Big)^{s}A'^{-1}(\gamma).$$
Proof.
We first show that D Σ ∈ L (Γ o , M n × n ). Clearly, D Σ is a linear map and (B.8) (cid:13)(cid:13) ( D Σ( γ )) (cid:0) γ † (cid:1)(cid:13)(cid:13) ≤ C (cid:13)(cid:13) γ † (cid:13)(cid:13) , in the general case and (cid:13)(cid:13) ( D Σ( γ )) (cid:0) γ † (cid:1)(cid:13)(cid:13) ≤ C max j =1 ,...,m + m (cid:12)(cid:12)(cid:12) γ † j (cid:12)(cid:12)(cid:12) , in the ‘single nonzero diagonal block’ case. Thus D Σ is a bounded linear operator betweentwo normed linear spaces, i.e. it is a continuous linear operator.With A ( γ ) = H − ( γ ) K ( γ ), we now show that A − (cid:0) γ + γ † (cid:1) A ′− (cid:0) γ + γ † (cid:1) − A − ( γ ) A ′− ( γ ) − ( D Σ( γ )) (cid:0) γ † (cid:1) k γ † k g → , as (cid:13)(cid:13) γ † (cid:13)(cid:13) g → , (B.9)where k·k g is either the 1-norm or the max norm on Γ. First, note that A − (cid:0) γ + γ † (cid:1) A ′− (cid:0) γ + γ † (cid:1) − A − ( γ ) A ′− ( γ )= A − (cid:0) γ + γ † (cid:1) (cid:0) A − (cid:0) γ + γ † (cid:1) − A − ( γ ) (cid:1) ′ + (cid:0) A − (cid:0) γ + γ † (cid:1) − A − ( γ ) (cid:1) A − ( γ )= − A − (cid:0) γ + γ † (cid:1) A ′− (cid:0) γ + γ † (cid:1) (cid:0) A (cid:0) γ + γ † (cid:1) − A ( γ ) (cid:1) ′ A ′− ( γ )47 A − (cid:0) γ + γ † (cid:1) (cid:0) A (cid:0) γ + γ † (cid:1) − A ( γ ) (cid:1) A − ( γ ) A ′− ( γ ) . (B.10)Next, A (cid:0) γ + γ † (cid:1) − A ( γ ) = H − (cid:0) γ + γ † (cid:1) K (cid:0) γ + γ † (cid:1) − H − ( γ ) K ( γ )= H − (cid:0) γ + γ † (cid:1) (cid:0) K (cid:0) γ + γ † (cid:1) − K ( γ ) (cid:1) + H − (cid:0) γ + γ † (cid:1) (cid:0) H ( γ ) − H (cid:0) γ + γ † (cid:1)(cid:1) H − ( γ ) K ( γ )= − H − (cid:0) γ + γ † (cid:1) m X j =1 γ † j W j + m + m X j = m +1 γ † j W j H − ( γ ) K ( γ ) ! . 
(B.11)Substituting (B.11) in (B.10) implies that A − (cid:0) γ + γ † (cid:1) A ′− (cid:0) γ + γ † (cid:1) − A − ( γ ) A ′− ( γ ) = ∆ (cid:0) γ, γ † (cid:1) + ∆ (cid:0) γ, γ † (cid:1) = ∆ (cid:0) γ, γ † (cid:1) , (B.12)say, where∆ (cid:0) γ, γ † (cid:1) = A − (cid:0) γ + γ † (cid:1) A ′− (cid:0) γ + γ † (cid:1) m X j =1 γ † j W ′ j + K ′ ( γ ) H ′− ( γ ) m + m X j = m +1 γ † j W ′ j ! × H ′− (cid:0) γ + γ † (cid:1) A ′− ( γ ) , ∆ (cid:0) γ, γ † (cid:1) = A − (cid:0) γ + γ † (cid:1) H − (cid:0) γ + γ † (cid:1) m X j =1 γ † j W j + m + m X j = m +1 γ † j W j H − ( γ ) K ( γ ) ! × A − ( γ ) A ′− ( γ ) . From the definitions above and recalling that A ( γ ) = H − ( γ ) K ( γ ), we can write∆ (cid:0) γ, γ † (cid:1) = A − (cid:0) γ + γ † (cid:1) Υ (cid:0) γ, γ † (cid:1) A ′− ( γ ) , (B.13)withΥ (cid:0) γ, γ † (cid:1) = m X j =1 γ † j G ′ j (cid:0) γ + γ † (cid:1) H ′− (cid:0) γ + γ † (cid:1) + A ′− (cid:0) γ + γ † (cid:1) A ′ ( γ ) m + m X j = m +1 γ † j T ′ j (cid:0) γ + γ † (cid:1) + m X j =1 γ † j H − (cid:0) γ + γ † (cid:1) G j ( γ ) + m + m X j = m +1 γ † j T j (cid:0) γ + γ † (cid:1) . A − (cid:0) γ + γ † (cid:1) A ′− (cid:0) γ + γ † (cid:1) − A − ( γ ) A ′− ( γ ) − ( D Σ( γ )) (cid:0) γ † (cid:1) = A − (cid:0) γ + γ † (cid:1) A ′− (cid:0) γ + γ † (cid:1) − A − ( γ ) A ′− ( γ ) − ∆ (cid:0) γ, γ † (cid:1) − ( D Σ( γ )) (cid:0) γ † (cid:1) + ∆ (cid:0) γ, γ † (cid:1) = ∆ (cid:0) γ, γ † (cid:1) − ( D Σ( γ )) (cid:0) γ † (cid:1) , (B.14)so to prove (B.9) it is sufficient to show that∆ (cid:0) γ, γ † (cid:1) − ( D Σ( γ )) (cid:0) γ † (cid:1) k γ † k g → (cid:13)(cid:13) γ † (cid:13)(cid:13) g → . 
(B.15)The numerator in (B.15) can be written as P i =1 Π i (cid:0) γ, γ † (cid:1) A ′− ( γ ) by adding, subtractingand grouping terms, where (omitting the argument (cid:0) γ, γ † (cid:1) )Π = A − (cid:0) γ + γ † (cid:1) m X j =1 γ † j G ′ j (cid:0) γ + γ † (cid:1) H ′− ( γ ) (cid:0) H ( γ ) − H (cid:0) γ + γ † (cid:1)(cid:1) ′ H ′− (cid:0) γ + γ † (cid:1) , Π = A − (cid:0) γ + γ † (cid:1) m X j =1 γ † j H − (cid:0) γ + γ † (cid:1) (cid:0) H ( γ ) − H (cid:0) γ + γ † (cid:1)(cid:1) H − ( γ ) G j ( γ ) , Π = A − (cid:0) γ + γ † (cid:1) m + m X j = m +1 γ † j (cid:0) A − (cid:0) γ + γ † (cid:1) − A − ( γ ) (cid:1) T ′ j (cid:0) γ + γ † (cid:1) , Π = (cid:0) A − (cid:0) γ + γ † (cid:1) − A − ( γ ) (cid:1) m + m X j = m +1 γ † j T j ( γ + γ † ) , Π = A − ( γ ) m + m X j = m +1 γ † j H − ( γ + γ † ) ( H ( γ ) − H ( γ + γ † )) H − ( γ ) W j , Π = ∆ (cid:0) γ, γ † (cid:1) m X j =1 γ † j W ′ j H ′− ( γ ) , Π = (cid:0) A − (cid:0) γ + γ † (cid:1) − A − ( γ ) (cid:1) m X j =1 γ † j H − ( γ ) G j ( γ ) . By (B.8), (B.13) and replication of earlier techniques, we havemax i =1 ,..., sup γ ∈ Γ o (cid:13)(cid:13) Π i (cid:0) γ, γ † (cid:1) A − ( γ ) (cid:13)(cid:13) ≤ C (cid:13)(cid:13) γ † (cid:13)(cid:13) g , (B.16)where the norm used on the RHS of (B.16) depends on whether we are considering the49eneral case or the ‘single nonzero diagonal block’ case. Thus (cid:13)(cid:13) ∆ (cid:0) γ, γ † (cid:1) − ( D Σ( γ )) (cid:0) γ † (cid:1)(cid:13)(cid:13) k γ † k g ≤ C (cid:13)(cid:13) γ † (cid:13)(cid:13) g → (cid:13)(cid:13) γ † (cid:13)(cid:13) g → , proving (B.15) and thus (B.9). Corollary B.1.
For the spatial error model with SAR($p$) errors,
$$(D\Sigma(\gamma))\big(\gamma^\dagger\big) = K^{-1}(\gamma)\Big(\sum_{j=1}^{m_1}\gamma^\dagger_jG_j(\gamma)\Big)^{s}K'^{-1}(\gamma).$$
Proof.
Taking $m_2 = 0$ in Lemma B.5, the elements involving sums from $m_1+1$ to $m_1+m_2$ do not arise and $H(\gamma) = I_n$, proving the claim. Corollary B.2.
For the spatial error model with SMA($m_2$) errors,
$$(D\Sigma(\gamma))\big(\gamma^\dagger\big) = H(\gamma)\Big(\sum_{j=1}^{m_2}\gamma^\dagger_jT_j(\gamma)\Big)^{s}H'(\gamma).$$
Proof.
Taking $m_1 = 0$ in Lemma B.5, the elements involving sums from $1$ to $m_1$ do not arise and $K(\gamma) = I_n$, proving the claim. Lemma B.6.
For the spatial error model with MESS($p$) errors, if
$$\max_{j=1,\ldots,m_1}\big(\|W_j\| + \|W_j'\|\big) < C, \qquad (B.17)$$
then
$$(D\Sigma(\gamma))\big(\gamma^\dagger\big) = \exp\Big(\sum_{j=1}^{m_1}\gamma_j\big(W_j+W_j'\big)\Big)\sum_{j=1}^{m_1}\gamma^\dagger_j\big(W_j+W_j'\big).$$
Proof.
Clearly D Σ ∈ L (Γ o , M n × n ). Next, (cid:13)(cid:13) A − (cid:0) γ + γ † (cid:1) A ′− (cid:0) γ + γ † (cid:1) − A − ( γ ) A ′− ( γ ) − ( D Σ( γ )) (cid:0) γ † (cid:1)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) exp m X j =1 (cid:16) γ j + γ † j (cid:17) (cid:0) W j + W ′ j (cid:1)! − exp m X j =1 γ j (cid:0) W j + W ′ j (cid:1)! − ( D Σ( γ )) (cid:0) γ † (cid:1)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) exp m X j =1 γ j (cid:0) W j + W ′ j (cid:1)! exp m X j =1 γ † j (cid:0) W j + W ′ j (cid:1)! − I n − m X j =1 γ † j (cid:0) W j + W ′ j (cid:1)!(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) exp m X j =1 γ j (cid:0) W j + W ′ j (cid:1)!(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) exp m X j =1 γ † j (cid:0) W j + W ′ j (cid:1)! − I n − m X j =1 γ † j (cid:0) W j + W ′ j (cid:1)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) C (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) I n + p X j =1 γ † j (cid:0) W j + W ′ j (cid:1) + ∞ X k =2 ( m X j =1 γ † j (cid:0) W j + W ′ j (cid:1)) k − I n − m X j =1 γ † j (cid:0) W j + W ′ j (cid:1)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ C (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∞ X k =2 ( m X j =1 γ † j (cid:0) W j + W ′ j (cid:1)) k (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ C ∞ X k =2 m X j =1 (cid:12)(cid:12)(cid:12) γ † j (cid:12)(cid:12)(cid:12) (cid:13)(cid:13)(cid:0) W j + W ′ j (cid:1)(cid:13)(cid:13) k ≤ C ∞ X k =2 (cid:13)(cid:13) γ † (cid:13)(cid:13) kg , (B.18)by (B.17), without loss of generality, and again the norm used in (B.18) depending on whetherwe are in the general or the ‘single nonzero diagonal block’ case. 
Thus
$$\frac{\left\|A^{-1}\left(\gamma+\gamma^\dagger\right)A'^{-1}\left(\gamma+\gamma^\dagger\right) - A^{-1}(\gamma)A'^{-1}(\gamma) - (D\Sigma(\gamma))\left(\gamma^\dagger\right)\right\|}{\left\|\gamma^\dagger\right\|_g} \le C\sum_{k=2}^{\infty}\left\|\gamma^\dagger\right\|_g^{k-1} \to 0 \quad \text{as } \left\|\gamma^\dagger\right\|_g \to 0,$$
proving the claim.
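The linearisation above can be checked numerically. The sketch below is an illustration, not part of the paper: the weight matrix `W`, the dimension, and the use of the spectral norm are all arbitrary choices, and it takes a single weight matrix ($m = 1$) so that $\Sigma(\gamma) = \exp(\gamma(W + W'))$ exactly. The candidate derivative is then $\exp(\gamma(W+W'))\,\gamma^\dagger(W+W')$, and the remainder should vanish faster than $|\gamma^\dagger|$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 6
W = rng.standard_normal((n, n)) / n  # arbitrary illustrative weight matrix
M = W + W.T                          # with m = 1, Sigma(gamma) = expm(gamma * M)
gamma = 0.3

def remainder(dg):
    """Spectral norm of Sigma(gamma+dg) - Sigma(gamma) - expm(gamma*M) @ (dg*M)."""
    R = expm((gamma + dg) * M) - expm(gamma * M) - expm(gamma * M) @ (dg * M)
    return np.linalg.norm(R, 2)

# Frechet differentiability: remainder / |dg| should tend to 0 as dg -> 0.
ratios = [remainder(dg) / dg for dg in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The printed ratios shrink roughly linearly in the step size, matching the quadratic remainder bound (B.18).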
Theorem B.1.
Under the conditions of Theorem 4.4 or 5.3, $\mathcal{T}_n - \mathcal{T}_{an} = o_p(1)$ as $n \to \infty$.

Proof. It suffices to show that $n\tilde m_n = n\hat m_n + o_p(\sqrt p)$. As $\hat\eta = y - \hat\theta$, $\hat u = y - \hat f$ and $\hat v = \hat\theta - \hat f$, we have $\hat u = \hat\eta + \hat v$ and
$$\begin{aligned}
n\tilde m_n &= \hat\sigma^{-2}\left(\hat u'\Sigma(\hat\gamma)^{-1}\hat u - \hat\eta'\Sigma(\hat\gamma)^{-1}\hat\eta\right) = \hat\sigma^{-2}\left(2\hat u'\Sigma(\hat\gamma)^{-1}\hat v - \hat v'\Sigma(\hat\gamma)^{-1}\hat v\right) \\
&= 2n\hat m_n - \hat\sigma^{-2}\left[\Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+e) - e + \theta - \hat f\right]'\Sigma(\hat\gamma)^{-1}\left[\Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+e) - e + \theta - \hat f\right] \\
&= 2n\hat m_n - \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u - \hat\sigma^{-2}\left(\theta-\hat f\right)'\Sigma(\hat\gamma)^{-1}\left(\theta-\hat f\right) \\
&\qquad + \hat\sigma^{-2}\left(\left(\theta-\hat f\right)-e\right)'\Sigma(\hat\gamma)^{-1}\left(I - \Psi\left[\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\right)e - \hat\sigma^{-2}\left(\theta-\hat f\right)'\Sigma(\hat\gamma)^{-1}\Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u \\
&= 2n\hat m_n - \left(n\hat m_n - \hat\sigma^{-2}\left(A_1+A_2+A_3+A_4\right)\right) - \hat\sigma^{-2}A_5 + \hat\sigma^{-2}\left(\left(\theta-\hat f\right)-e\right)'\Sigma(\hat\gamma)^{-1}\left(I - \Psi\left[\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\right)e - \hat\sigma^{-2}A_6 \\
&= n\hat m_n + \hat\sigma^{-2}\left(A_1+A_2+A_3+A_4-A_5-A_6\right) + \hat\sigma^{-2}\left(\left(\theta-\hat f\right)-e\right)'\Sigma(\hat\gamma)^{-1}\left(I - \Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\right)e. \tag{B.19}
\end{aligned}$$
In the proof of Theorem 4.2, we have shown that
$$\left|\left(\theta-\hat f\right)'\Sigma(\hat\gamma)^{-1}\left(I - \Psi\left[\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\right)e\right| = o_p\left(\sqrt p\right)$$
in the process of proving that the corresponding $A$ term is $o_p(\sqrt p)$.
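The first line of the display above is pure algebra: with $\hat u = \hat\eta + \hat v$, the quadratic forms satisfy $\hat u'\Sigma^{-1}\hat u - \hat\eta'\Sigma^{-1}\hat\eta = 2\hat u'\Sigma^{-1}\hat v - \hat v'\Sigma^{-1}\hat v$ for any symmetric $\Sigma$. A quick numerical sanity check, using random vectors and an arbitrary symmetric positive definite matrix standing in for $\Sigma(\hat\gamma)$ (none of these objects are the paper's estimators):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# arbitrary symmetric positive definite matrix standing in for Sigma(gamma-hat)
B = rng.standard_normal((n, n))
Sigma_inv = np.linalg.inv(B @ B.T + n * np.eye(n))
eta = rng.standard_normal(n)   # stands in for eta-hat
v = rng.standard_normal(n)     # stands in for v-hat = theta-hat - f-hat
u = eta + v                    # u-hat = eta-hat + v-hat

lhs = u @ Sigma_inv @ u - eta @ Sigma_inv @ eta
rhs = 2 * u @ Sigma_inv @ v - v @ Sigma_inv @ v
print(abs(lhs - rhs))  # zero up to floating-point error
```

Expanding $(\hat\eta+\hat v)'\Sigma^{-1}(\hat\eta+\hat v)$ shows both sides equal $2\hat\eta'\Sigma^{-1}\hat v + \hat v'\Sigma^{-1}\hat v$, which is what the check confirms; symmetry of $\Sigma^{-1}$ is what lets the two cross terms merge.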
Along with
$$\begin{aligned}
\left|e'\Sigma(\hat\gamma)^{-1}\left(I - \Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\right)e\right| &\le \left|e'\Sigma(\hat\gamma)^{-1}e\right| + \left|e'\Sigma(\hat\gamma)^{-1}\Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}e\right| \\
&\le \left\|e\right\|^2 \sup_{\gamma\in\Gamma}\left\|\Sigma(\gamma)^{-1}\right\| + \left\|e\right\|^2 \sup_{\gamma\in\Gamma}\left\|\Sigma(\gamma)^{-1}\right\|^2 \sup_{\gamma\in\Gamma}\left\|\frac{1}{n}\Psi\left(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\right)^{-1}\Psi'\right\| \\
&= O_p\left(\left\|e\right\|^2\right) = O_p\left(np^{-2\mu}\right) = o_p\left(\sqrt p\right),
\end{aligned}$$
we complete the proof that $n\tilde m_n = n\hat m_n + o_p(\sqrt p)$. In the SAR setting of Section 5,
$$\begin{aligned}
n\tilde m_n &= \hat\sigma^{-2}\left(\hat u'\Sigma(\hat\gamma)^{-1}\hat u - \hat\eta'\Sigma(\hat\gamma)^{-1}\hat\eta\right) = \hat\sigma^{-2}\left(2\hat u'\Sigma(\hat\gamma)^{-1}\hat v - \hat v'\Sigma(\hat\gamma)^{-1}\hat v\right) \\
&= 2n\hat m_n - \hat\sigma^{-2}\left[\Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\left(u+e+\sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)W_j y\right) - e + \theta - \hat f\right]' \\
&\qquad\times\Sigma(\hat\gamma)^{-1}\left[\Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\left(u+e+\sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)W_j y\right) - e + \theta - \hat f\right].
\end{aligned}$$
Compared to the expression in (B.19), we have the additional terms
$$-\hat\sigma^{-2}\left(\sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)W_j y\right)'\Sigma(\hat\gamma)^{-1}\Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\left(\sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)W_j y\right)$$
and
$$-\hat\sigma^{-2}\left(\sum_{j=1}^{d_\lambda}(\lambda_j-\hat\lambda_j)W_j y\right)'\Sigma(\hat\gamma)^{-1}\Psi\left(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\left(u + \theta - \hat f\right).$$
Both terms are $o_p(\sqrt p)$ from the orders of the corresponding $A$ terms in the proof of Theorem 5.2. Hence, in the SAR setting, $n\tilde m_n = n\hat m_n + o_p(\sqrt p)$ also holds.