Efficient closed-form estimation of large spatial autoregressions
Abhimanyu Gupta*†‡

September 14, 2020
Abstract
Newton-step approximations to pseudo maximum likelihood estimates of spatial autoregressive models with a large number of parameters, in the sense that the parameter space grows slowly as a function of sample size, are examined. These have the same asymptotic efficiency properties as maximum likelihood under Gaussianity but are of closed form. Hence they are computationally simple and free from compactness assumptions, thereby avoiding two notorious pitfalls of implicitly defined estimates of large spatial autoregressions. When commencing from an initial least squares estimate, the Newton step can also lead to weaker regularity conditions for a central limit theorem than those extant in the literature. A simulation study demonstrates excellent finite sample gains from Newton iterations, especially in large multiparameter models for which grid search is costly. A small empirical illustration shows improvements in estimation precision with real data.
Keywords:
Spatial autoregression, efficiency, many parameters, networks
JEL Classification:
C21, C31, C33, C36

* I am grateful to Peter Robinson for very helpful comments and Christos Kolympiris for permission to use the data in the empirical illustration.
† Department of Economics, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK. E-mail: [email protected].
‡ Research supported by ESRC grant ES/R006032/1.

1 Introduction
Spatial autoregressive (SAR) models, introduced by Cliff and Ord (1973), are extremely popular tools for modelling cross-sectionally dependent economic data. The pre-eminent feature of such models is the presence of one or more 'spatial weight' matrices, which parsimoniously capture the dependence between units in the sample. Such dependence need not be geographic in nature; indeed, the spatial weight matrix is known by other terms such as 'adjacency matrix', 'network link matrix' and 'sociomatrix'. For $n\times1$ vectors $y_n$ and $u$ of responses and unobserved disturbances, respectively, and an $n\times k$ covariate matrix $X_n$, the SAR model is
$$y_n=\sum_{i=1}^{p}\lambda_{in}W_{in}y_n+X_n\beta_n+u,\qquad(1.1)$$
where the elements of the $n\times n$ spatial weight matrices $W_{in}$ are inverse economic distances and $\lambda_n=(\lambda_{1n},\ldots,\lambda_{pn})'$ and $\beta_n$ are unknown parameter vectors. Subscripting with $n$ permits treatment of triangular arrays, an important issue for spatial models in general (see Robinson (2011)), and for SAR models even more so due to various normalizations of the $W_{in}$ that make them $n$-dependent. This paper justifies computationally straightforward estimation for the parameters of (1.1) with the same asymptotic properties as pseudo maximum likelihood estimates.

SAR models allow dependence to occur across a very generalized notion of space: so long as there exists a mapping from every pair of individuals to the real line, a spatial weight matrix may be constructed. The flexible nature of the SAR model means that it may be used to model a very wide range of phenomena. Thus it has found application in many fields of economics such as development economics (Case (1991), Helmers and Patnam (2014)), industrial organization (Pinkse, Slade, and Brett (2002)), trade (Conley and Dupor (2003)) and peer effects (Hsieh and van Kippersluis (2018)), to name only a few examples.

Estimation of SAR models has long been considered in the regional science literature, see e.g. Anselin (1988). Rigorous asymptotic theory for instrumental variables (IV) estimation was initially provided by Kelejian and Prucha (1998), leading to the present flourishing theoretical literature. Lee (2002) studied ordinary least squares (OLS) estimation of SAR models, stressing the need for lack of sparsity in the spatial weight matrix to establish desirable asymptotic properties such as consistency and efficiency. This was followed by Lee (2004), a seminal contribution that provided a taxonomical asymptotic theory for Gaussian pseudo maximum likelihood estimates (PMLE) of SAR models. Recently Kuersteiner and Prucha (2013, 2020) have provided general theory for such models in a panel data setting.

The flexible nature of SAR modelling is further embellished by the seamless ability to integrate more than one spatial weight matrix in the model (1.1), thus permitting simultaneous connections between units across a number of channels. This is an accurate representation of typical economic situations; e.g. countries are 'connected' by geographical proximity as well as by trade ties. Furthermore, in many economic settings the sample partitions naturally into $p$ clusters or groups, leading to a block diagonal structure for the spatial weight matrix $V_n=\mathrm{diag}(V_{1n},\ldots,V_{pn})$, where $V_{in}$ is $m_i\times m_i$ and $\sum_{i=1}^{p}m_i=n$. To permit the modelling of heterogenous spillover effects across clusters, one may take $W_{in}$ to be the $n\times n$ block diagonal matrix with the $m_i\times m_i$ dimensional $i$-th diagonal block given by $V_{in}$ (a schematic construction is given in the sketch below). This approach has been suggested by Gupta and Robinson (2015, 2018). Here and more generally, the specification (1.1) is termed a 'higher-order' SAR model if $p>1$, see e.g. Blommestein (1983), Lee and Liu (2010), Li (2017), Han, Hsieh, and Lee (2017), Kwok (2019).

In the study of higher-order SAR models, Gupta and Robinson (2015, 2018) have suggested that $p,k$ be allowed to diverge slowly to infinity as functions of sample size. The motivation for such generality is typically threefold: first, it is desirable to permit a richer model as the sample size permits. Second, clustered data as mentioned in the previous paragraph naturally imply asymptotic regimes with increasing $p$. For instance, when $m_i=m$ for each $i=1,\ldots,p$, we have $n=mp$ and the results of Lee (2004) imply that $p\to\infty$ is necessary for consistent estimation, analogous to the problems created in the spatial statistics literature by 'infill asymptotics', see e.g. Lahiri (1996). Finally, a theory that allows the model dimension to grow with sample size provides a more incisive analysis of large models in practice, much as typical asymptotic theory with a fixed parameter space can itself be thought of as providing an approximation in finite samples.

The estimation of such increasing-order SAR models has been studied by Gupta and Robinson (2015, 2018) using IV, OLS and PMLE approaches. The first two methods have the advantage of being in closed form, while even for $p=1$ PMLE (in)famously requires grid search and the inversion of an $n\times n$ matrix in every iteration, leading to many ingenious solutions for faster computation, see e.g. Ord (1975) and Pace and Barry (1997). The computational cost of PMLE in SAR type models is particularly salient as data sets increase in size, as stressed by Zhu, Huang, Pan, and Wang (2020). Modern network data sets are amenable to modelling via SAR techniques and can feature, or accommodate, large parameter spaces, but computation remains a serious challenge. Han, Lee, and Xu (2020) provide a discussion of the problems and propose a Bayesian solution.
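To make the block diagonal cluster construction above concrete, the following minimal sketch embeds cluster-level matrices $V_1,\ldots,V_p$ as higher-order SAR weight matrices, so that $\lambda_i$ captures spillovers within cluster $i$ only; the function name and interface are illustrative, not from the literature cited above.

```python
import numpy as np
from scipy.linalg import block_diag

def cluster_weights(V_blocks):
    """Embed cluster matrices V_1, ..., V_p as higher-order SAR weights.

    W_i is block diagonal with V_i in the i-th diagonal block and zeros
    elsewhere (an illustrative sketch of the construction in the text).
    """
    sizes = [V.shape[0] for V in V_blocks]
    W = []
    for i, V in enumerate(V_blocks):
        blocks = [V if j == i else np.zeros((m, m))
                  for j, m in enumerate(sizes)]
        W.append(block_diag(*blocks))
    return W
```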
These computational problems are naturally exacerbated if $p>1$, with grid search requiring more iterations to converge and each iteration requiring inversion of an $n\times n$ matrix, as well as the risk of convergence to local optima. Furthermore, the requirement of a compact parameter space for $\lambda_n$ can severely restrict the admissible parameter values (see Gupta and Robinson (2018)). On the other hand, under Gaussianity the PMLE becomes the MLE and is efficient. This property is shared by OLS, but under rather delicate and specific conditions even for $p=1$ (see Lee (2002)). Thus the IV/OLS and PMLE approaches each have their advantages and it is desirable to combine the positive properties of both.

One method of obtaining closed-form estimates with the same asymptotic covariance matrix as a target estimate is to use Newton-type iterations commencing from an initial consistent estimator that is straightforward to compute. The approach dates back at least to Fisher (1925) and LeCam (1956). It enjoys the added attraction of avoiding a potentially complicated consistency proof for an implicitly defined estimate, as well as the compactness assumptions this typically entails. As a result, the technique has been used in a vast variety of settings, see e.g. Rothenberg and Leenders (1964) (simultaneous equations), Hartley and Booker (1965) (nonlinear least squares), Janssen, Jureckova, and Veraverbeke (1985) (M-estimation), Rothenberg (1984) (generalized least squares), Hualde and Robinson (2011), Kristensen and Linton (2006), Robinson (2005) (time series and adaptive estimation), Andrews (1997) (generalized method of moments), Kasahara and Shimotsu (2008), Kristensen and Salanie (2017) (structural estimation), De Luca, Magnus, and Peracchi (2018) (generalized linear models) and Frazier and Renault (2017) (efficient two-step estimation), to name just a few.

In this paper we use IV and OLS estimates as initial estimates to form a single Newton-step asymptotic approximation to the Gaussian PMLE with $p=p_n$ and $k=k_n$ allowed to diverge as functions of $n\to\infty$. The approach has been studied in the case of fixed-dimensional SAR models by Robinson (2010) and Lee and Yu (2013), but the previous discussion hints at its particular usefulness when considering large models. One avoids grid search over a high-dimensional parameter space, compactness assumptions on this space and the inversion of a large ($n\times n$) matrix for every search iteration, as well as various headaches related to convergence and local optima. When commencing from IV estimates, this leads to closed-form efficient estimates under Gaussianity. As suggested by the results of Lee (2002) and Gupta and Robinson (2015), commencing iteration from OLS preserves the efficiency property. However, we show that the Newton step approach cancels out certain terms of large stochastic order, which allows for weaker rate conditions than those imposed in these papers.

In a simulation study, we demonstrate that the Newton step can lead to much improved estimates in finite samples, both in terms of bias and efficiency. While a single step is sufficient to establish desirable asymptotic properties, in our simulation study we also explore the finite sample implications of additional Newton steps, reporting results with up to six iterations. We find large finite sample gains in both bias and mean squared error that are robust to heavy-tailed error distributions. We also observe fast convergence of iterations, which conforms to extant theoretical observations.
The gains are particularly notable when the parameter space and sample size are large, a situation in which PMLE becomes computationally onerous. In a small illustration with real-world data, we show that the estimates work well in practice and lead to more precise results.

2 Model and estimates

The ($-2/n$ times) log pseudo Gaussian likelihood function for model (1.1) at any admissible point $\theta=(\lambda',\beta')'$ is given by
$$Q_n\left(\theta,\sigma^2\right)=\log\left(2\pi\sigma^2\right)-\frac{2}{n}\log\left|S_n(\lambda)\right|+\frac{1}{n\sigma^2}\left(S_n(\lambda)y_n-X_n\beta\right)'\left(S_n(\lambda)y_n-X_n\beta\right),\qquad(2.1)$$
where $S_n(\lambda)=I_n-\sum_{i=1}^{p}\lambda_{in}W_{in}$, with $I_n$ denoting the $n\times n$ identity matrix, and $M_n=I_n-X_n(X_n'X_n)^{-1}X_n'$ denotes the projection orthogonal to $X_n$. Henceforth we denote true parameter values with a 0 subscript and suppress the argument for a quantity evaluated at a true parameter value, e.g. $S_n(\lambda_{0n})\equiv S_n$. If $S_n$ is invertible, (1.1) admits the reduced form $y_n=S_n^{-1}X_n\beta_{0n}+S_n^{-1}u$, and we define $R_n=A_n+B_n$, where $A_n=\left(G_{1n}X_n\beta_{0n},\ldots,G_{p_nn}X_n\beta_{0n}\right)$, $B_n=\left(G_{1n}u,\ldots,G_{p_nn}u\right)$, $G_{in}(\lambda)=W_{in}S_n^{-1}(\lambda)$, $i=1,\ldots,p_n$, and so $R_n=\left(W_{1n}y_n,\ldots,W_{p_nn}y_n\right)$. Defining $R_{yn}(\theta)=R_n\lambda+X_n\beta-y_n$, the derivative of (2.1) at any admissible $\left(\theta,\sigma^2\right)$ is
$$\xi_n\left(\theta,\sigma^2\right)=\left(\phi_n'\left(\theta,\sigma^2\right),\ 2\sigma^{-2}n^{-1}R_{yn}'(\theta)X_n\right)',\qquad(2.2)$$
where $\phi_n\left(\theta,\sigma^2\right)=2\sigma^{-2}n^{-1}\left(\sigma^2\,\mathrm{tr}\,G_{1n}(\lambda)+y_n'W_{1n}'R_{yn}(\theta),\ldots,\sigma^2\,\mathrm{tr}\,G_{p_nn}(\lambda)+y_n'W_{p_nn}'R_{yn}(\theta)\right)'$. Because $R_{yn}(\theta_0)=-u$, denoting $\phi_n=\sigma_0^{-2}n^{-1}\left(\sigma_0^2\,\mathrm{tr}\,C_{1n}-u'C_{1n}u,\ldots,\sigma_0^2\,\mathrm{tr}\,C_{p_nn}-u'C_{p_nn}u\right)'$ with $C_{in}=G_{in}+G_{in}'$, we obtain
$$\xi_n\equiv\frac{\partial Q_n}{\partial\theta}=\left(\phi_n',0'\right)'-2\sigma_0^{-2}t_n,\qquad(2.3)$$
with $t_n=n^{-1}[A_n,X_n]'u$. The Hessian at any admissible point in the parameter space is
$$H_n\left(\theta,\sigma^2\right)\equiv\frac{\partial^2Q_n\left(\theta,\sigma^2\right)}{\partial\theta\partial\theta'}=\begin{pmatrix}\frac{2}{n}P_{ji,n}(\lambda)+\frac{2}{n\sigma^2}R_n'R_n&\frac{2}{n\sigma^2}R_n'X_n\\[2pt]\frac{2}{n\sigma^2}X_n'R_n&\frac{2}{n\sigma^2}X_n'X_n\end{pmatrix},\qquad(2.4)$$
where $P_{ji,n}(\lambda)$ is the $p_n\times p_n$ matrix with $(i,j)$-th element given by $\mathrm{tr}\left(G_{jn}(\lambda)G_{in}(\lambda)\right)$.

For a generic matrix $A$ denote $\|A\|=\left(\bar\eta(A'A)\right)^{1/2}$, with $\bar\eta(\cdot)$ and $\underline\eta(\cdot)$ denoting the largest and smallest eigenvalues, respectively, of a symmetric positive semidefinite matrix. Note that if $A$ is a vector then $\|A\|$ is simply its Euclidean norm. Let $Z_n$ be an $n\times r_n$ matrix of instruments, with $r_n\geq p_n$, and define the IV and OLS estimates as
$$\hat\theta_n=\hat Q_n^{-1}\hat K_n'J_n^{-1}\hat k_n,\qquad\hat\sigma_n^2=n^{-1}\left\|y_n-(R_n,X_n)\hat\theta_n\right\|^2,\qquad(2.5)$$
$$\tilde\theta_n=\hat L_n^{-1}\hat l_n,\qquad\tilde\sigma_n^2=n^{-1}\left\|y_n-(R_n,X_n)\tilde\theta_n\right\|^2,\qquad(2.6)$$
respectively, with $\hat Q_n=\hat K_n'J_n^{-1}\hat K_n$, $\hat K_n=n^{-1}[Z_n,X_n]'[R_n,X_n]$, $\hat k_n=n^{-1}[Z_n,X_n]'y_n$, $J_n=n^{-1}[Z_n,X_n]'[Z_n,X_n]$, and $\hat L_n=n^{-1}[R_n,X_n]'[R_n,X_n]$, $\hat l_n=n^{-1}[R_n,X_n]'y_n$. Define the respective 'one-step' estimates $\hat{\hat\theta}_n$ and $\tilde{\tilde\theta}_n$ by the following equations:
$$\hat{\hat\theta}_n=\hat\theta_n-\hat H_n^{-1}\hat\xi_n,\qquad(2.7)$$
$$\tilde{\tilde\theta}_n=\tilde\theta_n-\tilde H_n^{-1}\tilde\xi_n,\qquad(2.8)$$
where for any function $f(\theta)$ and generic estimate $\check\theta$, $\check f\equiv f\left(\check\theta\right)$.

While our theorems below establish the desired asymptotic properties for the one-step estimates, from a practical point of view more iterations may be desirable. In fact, these also improve the statistical rate of convergence to the target PMLE, yielding an even faster statistical counterpart to the famous quadratic numerical rate of convergence of Newton estimates, see for example Theorem 2 of Robinson (1988) and pp. 312-313 of Ortega and Rheinboldt (1970). We examine this issue in more detail in the next section and also in the Monte Carlo study.
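To fix ideas, the following minimal numerical sketch computes the IV estimate (2.5) and the single Newton step (2.7), evaluating the score (2.2) and Hessian (2.4) directly; the function name and interface are illustrative assumptions rather than part of the formal development.

```python
import numpy as np

def one_step_sar(y, X, W_list, Z):
    """One Newton step (2.7) from the IV estimate (2.5); a sketch.

    y: n-vector of responses; X: n x k covariates; W_list: the p spatial
    weight matrices; Z: n x r instrument matrix.
    """
    n, k = X.shape
    p = len(W_list)
    R = np.column_stack([W @ y for W in W_list])      # (W_1 y, ..., W_p y)

    # IV estimate (2.5)
    ZX = np.column_stack([Z, X])
    RX = np.column_stack([R, X])
    J = ZX.T @ ZX / n
    K = ZX.T @ RX / n
    kvec = ZX.T @ y / n
    Q = K.T @ np.linalg.solve(J, K)
    theta = np.linalg.solve(Q, K.T @ np.linalg.solve(J, kvec))
    sigma2 = np.sum((y - RX @ theta) ** 2) / n
    lam, beta = theta[:p], theta[p:]

    # Ingredients of the score (2.2) and Hessian (2.4)
    S = np.eye(n) - sum(l * W for l, W in zip(lam, W_list))
    Sinv = np.linalg.inv(S)
    G = [W @ Sinv for W in W_list]                    # G_i(lambda)
    Ry = R @ lam + X @ beta - y                       # R_yn(theta)

    c = 2.0 / (n * sigma2)
    phi = np.array([c * (sigma2 * np.trace(G[i]) + (W_list[i] @ y) @ Ry)
                    for i in range(p)])
    xi = np.concatenate([phi, c * (X.T @ Ry)])        # score at theta-hat

    P = np.array([[np.sum(G[j] * G[i].T)              # tr(G_j G_i)
                   for j in range(p)] for i in range(p)])
    H = np.block([[2.0 * P / n + c * (R.T @ R), c * (R.T @ X)],
                  [c * (X.T @ R),               c * (X.T @ X)]])

    return theta - np.linalg.solve(H, xi)             # (2.7)
```

The single matrix inversion occurs once, at the initial estimate, in contrast to the repeated $n\times n$ inversions required by grid search over the PMLE objective.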
3 Asymptotic theory

The following assumptions are discussed in Lee (2002, 2004), and Gupta and Robinson (2015, 2018), amongst other spatial papers in which they are routinely employed. Stochastic regressors can easily be accommodated but needlessly complicate the notation, so we opt for simplicity.
Assumption 1. $u=(u_1,\ldots,u_n)'$ has iid elements with zero mean and finite variance $\sigma_0^2$.

Assumption 2. For $i=1,\ldots,p_n$, the elements of $W_{in}$ are uniformly $O(1/h_n)$, where $h_n$ is some positive sequence which may be bounded or divergent, but always bounded away from zero and such that $n/h_n\to\infty$ as $n\to\infty$. The diagonal elements of each $W_{in}$ are zero.

Assumption 3. $S_n$ is non-singular for all sufficiently large $n$.

Let $\|A\|_R$ denote the maximum absolute row sum norm of a generic matrix $A$.

Assumption 4. $\left\|S_n^{-1}\right\|_R$, $\left\|S_n'^{-1}\right\|_R$, $\|W_{in}\|_R$ and $\|W_{in}'\|_R$ are uniformly bounded in $n$ and $i$ for all $i=1,\ldots,p_n$ and sufficiently large $n$.

Assumption 5. The elements of $X_n$ are constants and are uniformly bounded in $n$, in absolute value, for all sufficiently large $n$.

Assumption 6. The elements of $Z_n$ are constants and are uniformly bounded in absolute value, for all sufficiently large $n$.

Assumption 7. $\lim_{n\to\infty}\bar\eta(J_n)<\infty$ and $\lim_{n\to\infty}\underline\eta(K_n'K_n)>0$.

Assumption 8. $\lim_{n\to\infty}\underline\eta(J_n)>0$ and $\lim_{n\to\infty}\bar\eta(K_n'K_n)<\infty$.

Assumption 9. $\lim_{n\to\infty}\bar\eta(L_n)<\infty$.

Assumption 10. $\lim_{n\to\infty}\underline\eta(L_n)>0$.

Assumption 11. $E\left(u_i^4\right)\leq C$ for $i=1,\ldots,n$.

Let $\Psi_n$ be an $s\times(p_n+k_n)$ matrix of constants with full row rank. The claims of the following theorems also hold when $p_n$ and $k_n$ are fixed, but we state and prove the results for the more challenging case when these diverge.

Theorem 3.1. (i) Let Assumptions 1-11 hold along with
$$\frac{1}{p_n}+\frac{1}{r_n}+\frac{1}{k_n}+\frac{p_n^2k_n^2}{n}+\frac{p_n^2k_n}{h_n}\to0\quad\text{as }n\to\infty,\qquad(3.1)$$
and
$$r_n^2/n\ \text{bounded as }n\to\infty.\qquad(3.2)$$
Then
$$\left(\frac{n}{p_n+k_n}\right)^{1/2}\Psi_n\left(\hat{\hat\theta}_n-\theta_{0n}\right)\xrightarrow{d}N\left(0,\ \lim_{n\to\infty}\frac{\sigma_0^2}{p_n+k_n}\Psi_nL_n^{-1}\Psi_n'\right),$$
where the asymptotic covariance matrix exists, and is positive definite, by Assumptions 9 and 10.

(ii) Let Assumptions 1-5 and 9-11 hold. Suppose also that
$$\frac{1}{p_n}+\frac{1}{k_n}+\frac{p_nk_n}{n^{1/2}}+\frac{p_n^2k_n}{h_n}+\frac{n^{1/4}p_n}{h_n}\to0.\qquad(3.3)$$
Then
$$\left(\frac{n}{p_n+k_n}\right)^{1/2}\Psi_n\left(\tilde{\tilde\theta}_n-\theta_{0n}\right)\xrightarrow{d}N\left(0,\ \lim_{n\to\infty}\frac{\sigma_0^2}{p_n+k_n}\Psi_nL_n^{-1}\Psi_n'\right),$$
where the asymptotic covariance matrix exists, and is positive definite, by Assumptions 9 and 10.

In the 'just identified' case $p_n=r_n$, condition (3.2) is implied by (3.1). Theorem 3.1(i) shows that the one-step estimate asymptotically achieves the efficiency bound noted by Lee (2002). On the other hand, Theorem 3.1(ii) yields the same distributional result as for the OLS estimate (Theorem 4.3 of Gupta and Robinson (2015)). This should come as no surprise since Lee (2002) has already established the efficiency of OLS under suitable conditions. Nevertheless, Theorem 3.1(ii) imposes weaker conditions on the relative rates of $h_n$ and $n$ than those extant in the literature.
Indeed, for their result, Gupta and Robinson (2015) assumed $n^{1/2}p_n/h_n\to0$ and $n^{1/2}p_n^2/h_n^2\to0$. The latter is a quantity of smaller order, as $\left(n^{1/2}p_n^2/h_n^2\right)\big/\left(n^{1/2}p_n/h_n\right)=p_n/h_n\to0$. For fixed $p_n$ and $k_n$, our asymptotic normality result relies only on $n^{1/4}/h_n\to0$ as $n\to\infty$. This is a weaker requirement as compared to Lee (2002), who assumed $n^{1/3}/h_n\to0$ as $n\to\infty$. The reason for these favourable outcomes is the cancellation of higher-order terms when using the one-step approximation. The key difference is in the rates $\left\|n^{-1}[B_n,0]'u\right\|=O_p\left(p_n^{1/2}/h_n\right)$ and $\|\phi_n\|=O_p\left(p_n^{1/2}/n^{1/2}h_n^{1/2}\right)$, the latter being sharper since $n/h_n\to\infty$ as $n\to\infty$.

If $h_n$ is bounded as $n\to\infty$, a more complicated analysis is required to establish that one-step estimates achieve the PMLE asymptotic covariance matrix, because the information equality does not hold asymptotically. Denote $\mu_l=E\left(u_i^l\right)$ for natural numbers $l$, and introduce, with $i,j=1,\ldots,p_n$, the $p_n\times p_n$ matrix $\Omega_{\lambda\lambda,n}$ with $(i,j)$-th element $\frac{2\mu_3}{n\sigma_0^4}\sum_{r=1}^{n}c_{rr,in}b_{r,jn}+\frac{\mu_4-3\sigma_0^4}{n\sigma_0^4}\sum_{r=1}^{n}c_{rr,in}c_{rr,jn}$ and the $k_n\times p_n$ matrix $\Omega_{\lambda\beta,n}$ with $i$-th column $\frac{2\mu_3}{n\sigma_0^4}\sum_{r=1}^{n}c_{rr,in}x_{r,n}$, where $c_{pq,in}$ is the $(p,q)$-th element of $C_{in}$, $b_{jn}=G_{jn}X_n\beta_{0n}$ with $t$-th element $b_{t,jn}$ ($j=1,\ldots,p_n$ and $t=1,\ldots,n$) and $x_{p,n}$ is the $p$-th column of $X_n'$. Define
$$\Omega_n=\begin{pmatrix}\Omega_{\lambda\lambda,n}&\Omega_{\lambda\beta,n}'\\\Omega_{\lambda\beta,n}&0\end{pmatrix}.\qquad(3.4)$$
Then $E\left(\xi_n\xi_n'\right)=n^{-1}\left(2\Xi_n+\Omega_n\right)$, where
$$\Xi_n=E(H_n)=\begin{pmatrix}\frac{2}{n}\left(P_{ji,n}+P_{j'i,n}+\sigma_0^{-2}A_n'A_n\right)&\frac{2}{n\sigma_0^2}A_n'X_n\\[2pt]\frac{2}{n\sigma_0^2}X_n'A_n&\frac{2}{n\sigma_0^2}X_n'X_n\end{pmatrix},\qquad(3.5)$$
the $(i,j)$-th element of $P_{j'i,n}$ being $\mathrm{tr}\left(G_{jn}'G_{in}\right)$. When $h_n$ is bounded OLS cannot be consistent (see Lee (2002)), so the following theorem considers only initial IV estimates.
Theorem 3.2. Let Assumptions 1-7 hold. Suppose that $h_n$ is bounded away from zero and that there is a real number $\delta>0$ such that $E|u_i|^{4+\delta}\leq C$ for $i=1,\ldots,n$. In addition, assume that
$$\lim_{n\to\infty}\bar\eta\left(2\Xi_n^{-1}+\Xi_n^{-1}\Omega_n\Xi_n^{-1}\right)<\infty,\quad\lim_{n\to\infty}\underline\eta\left(2\Xi_n^{-1}+\Xi_n^{-1}\Omega_n\Xi_n^{-1}\right)>0,\quad\lim_{n\to\infty}\underline\eta\left(\Xi_n\right)>0.\qquad(3.6)$$
Suppose also that
$$\frac{1}{p_n}+\frac{1}{r_n}+\frac{1}{k_n}+\frac{p_nk_n\left(p_n(r_n+k_n)+k_n\right)}{n}+\frac{(p_nk_n)^{2/\delta}}{n}\to0\quad\text{as }n\to\infty.\qquad(3.7)$$
Then
$$\left(\frac{n}{p_n+k_n}\right)^{1/2}\Psi_n\left(\hat{\hat\theta}_n-\theta_{0n}\right)\xrightarrow{d}N\left(0,\ \lim_{n\to\infty}\frac{1}{p_n+k_n}\Psi_n\left(2\Xi_n^{-1}+\Xi_n^{-1}\Omega_n\Xi_n^{-1}\right)\Psi_n'\right),$$
where the asymptotic covariance matrix exists, and is positive definite, by (3.6).

As indicated earlier, further iterations on the Newton step can improve the rate of statistical convergence to the target as well as finite sample properties. To see this, let $\hat{\hat\theta}_n^{\ell}$ be the $\ell$-th Newton iteration towards the PMLE $\check\theta_n$. By Theorem 2 of Robinson (1988), $\left\|\check\theta_n-\hat{\hat\theta}_n^{\ell+1}\right\|=O_p\left(\left\|\check\theta_n-\hat{\hat\theta}_n^{1}\right\|^{2^\ell}\right)$, an identical bound holding also for $\tilde{\tilde\theta}_n^{\ell+1}$. A factor that depends on $\ell$ is suppressed in the stated stochastic bound, indicating that this is not uniform in $\ell$. Because the results of Gupta and Robinson (2018) and this paper show that one-step Newton estimates and $\check\theta_n$ are $\left(n/(p_n+k_n)\right)^{1/2}$-consistent, we have
$$\left\|\check\theta_n-\hat{\hat\theta}_n^{\ell+1}\right\|=O_p\left(\left(n/(p_n+k_n)\right)^{-2^{\ell-1}}\right),\qquad\left\|\check\theta_n-\tilde{\tilde\theta}_n^{\ell+1}\right\|=O_p\left(\left(n/(p_n+k_n)\right)^{-2^{\ell-1}}\right),$$
thus yielding the rate at which the iterations approximate the target estimate in a statistical sense, pointwise in $\ell$.
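Schematically, the iterates $\hat{\hat\theta}_n^{\ell}$ can be computed by repeating the update (2.7) a fixed small number of times, as in the following sketch; the interface is hypothetical, with score and hessian standing for user-supplied evaluations of (2.2) and (2.4), re-estimating $\sigma^2$ inside them if desired.

```python
import numpy as np

def newton_iterate(theta0, score, hessian, n_steps=3):
    """Compute the n_steps-th Newton iterate towards the PMLE.

    theta0 is an initial IV or OLS estimate; score and hessian are
    callables returning (2.2) and (2.4) at a parameter value.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):      # three steps suffice in Section 4
        theta = theta - np.linalg.solve(hessian(theta), score(theta))
    return theta
```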
4 Monte Carlo study

We examine the finite-sample performance of $\hat{\hat\theta}_n$ in this section, since the IV case entails a change in limiting distribution due to the Newton step. Following Das, Kelejian, and Prucha (2003) and the design in Gupta and Robinson (2015), define $W_{in}^*$ as the symmetric circulant matrix with first row
$$w_{j,in}^*=\begin{cases}0&\text{if }j=1\text{ or }j=i+2,\ldots,n-i;\\1&\text{if }j=2,\ldots,i+1\text{ or }j=n-i+1,\ldots,n,\end{cases}\qquad(4.1)$$
and take $W_{in}^c=\left\|W_{in}^*\right\|^{-1}W_{in}^*$, where $\left\|W_{in}^*\right\|=\bar\eta\left(W_{in}^*\right)=2i$, because $W_{in}^*$ is a symmetric, circulant matrix (see e.g. Davis (1979), p. 73). Thus $W_{in}^c$ is also a symmetric circulant matrix with first row given by $w_{j,in}^*/2i$. This is an example of spatial weight matrices with bounded $h_n$. We now dispense with $n$ subscripts for brevity. Our design generates $y=S^{-1}(X\beta+u)$ for sample sizes $n=200,400,800$ and $k=2$, with elements of $X$ generated as iid replicates from a $U(0,1)$ distribution. We generate the disturbance $u$ using two different distributions: $N(0,1)$ and Student $t$. PMLE becomes MLE under the first, while the second has heavier tails. Our experiments take $p=2,4,6$, with weight matrices $W_i^c$, $i=1,\ldots,p$. Finally, we set the elements of $\beta$ and, for each $p$, values of the $\lambda_i$, with $\lambda_1$ the largest and the remaining $\lambda_i$ equal for $p=4$, and all $\lambda_i$ equal for $p=6$; in every design the choices of $\lambda_i$ satisfy the sufficient condition $\sum_{i=1}^{p}|\lambda_i|<1$ for the invertibility of $S$.

With the aim of comparing initial IV estimates to Newton-step estimates, we report two statistics: the Monte Carlo mean and the relative root Monte Carlo mean squared error, the latter being a straightforward ratio of the root MSE for IV and the iterated estimate. We also examine the use of more than one iteration in finite samples, and for this recall the notation $\hat{\hat\theta}_n^{\ell}$ for the $\ell$-th Newton iteration. Our results are reported for $\ell=1,3,6$. The set of instruments that we use for our initial estimates are the linearly independent columns of $Z=\left(W_1^cX,\ldots,W_p^cX,X\right)$.
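For concreteness, a sketch of the simulation design just described, constructing the circulant matrices (4.1) and generating $y=S^{-1}(X\beta+u)$; the helper names are illustrative, and the normal-error case is shown.

```python
import numpy as np

def W_circulant(n, i):
    """Weight matrix W^c_i of the design (4.1): symmetric circulant
    with zero diagonal, normalised by its spectral norm ||W*_i|| = 2i."""
    row = np.zeros(n)
    row[1:i + 1] = 1.0        # positions j = 2, ..., i+1 in (4.1)
    row[n - i:] = 1.0         # positions j = n-i+1, ..., n
    W_star = np.stack([np.roll(row, s) for s in range(n)])
    return W_star / (2.0 * i)

def simulate_design(lam, beta, n, rng):
    """Generate y = S^{-1}(X beta + u); u could equally be Student t."""
    p, k = len(lam), len(beta)
    W = [W_circulant(n, i) for i in range(1, p + 1)]
    X = rng.uniform(size=(n, k))              # iid U(0,1) regressors
    u = rng.standard_normal(n)
    S = np.eye(n) - sum(l * Wi for l, Wi in zip(lam, W))
    y = np.linalg.solve(S, X @ beta + u)
    return y, X, W
```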
In Tables 4.1 and 4.2, we report the Monte Carlo mean of our estimates for standard normal and $t$ errors, respectively. For standard normal errors, we notice that the initial IV estimate can be heavily biased but Newton iterations improve matters, sometimes spectacularly. Indeed, for $p=6$ and $n=200$ the performance of $\hat\theta_n$ can be appalling, with $\hat\lambda_1<0$. However, after six Newton steps this has improved to 0.1216, and even three iterations lead to a significant improvement. The reduction of bias from Newton iterations is not a universal feature; however, broadly speaking the Newton steps reduce bias in the estimates, even for smaller values of $p$. As the sample size increases the iterations converge substantially, with little to choose typically between $\hat{\hat\theta}_n^3$ and $\hat{\hat\theta}_n^6$ for $n=800$, although convergence is slower for small $n$ combined with large $p$.

With $t$ errors, Table 4.2 paints a similar picture to Table 4.1. Once again, the noticeable 'rogue' estimate is for $\lambda_1$ when $p=6$ and $n=200$. Considering that all our simulations start from the same seed, this outlier may possibly be attributed to a bad draw. As in the normal errors case, results are quite stable for larger $n$ and smaller $p$, and typically show bias reduction due to Newton steps and near convergence after three iterations.

Table 4.1: Monte Carlo mean of parameter estimates. IV and iterated Newton-step estimates with N(0,1) errors.

Table 4.2: Monte Carlo mean of parameter estimates. IV and iterated Newton-step estimates with t errors.

In Tables 4.3 and 4.4, we report the ratio of the Monte Carlo root mean squared error of $\hat\theta_n$ to that of $\hat{\hat\theta}_n^{\ell}$, $\ell=1,3,6$, abbreviating this quantity to RRMSE. An RRMSE of two indicates that the RMSE of the IV estimate is twice that of the Newton iteration it is being compared to. Our results in Table 4.3 show that Newton iterations can lead to tremendous finite sample gains in MSE. These gains are present in 100% of the cases considered, but are generally larger for the spatial parameters $\lambda_i$ than the regression parameters $\beta_i$.

We discuss the spatial parameter estimates first. Note that for greater sample sizes we have greater MSE gains, with the gains often more than doubling from $n=200$ to $n=800$, and sometimes even tripling. As observed for the means in Table 4.1, there is usually not much to choose between the third and sixth iterations. With larger $p$ and $n=800$ we nearly always obtain Newton estimates with RMSE a quarter of that for IV, and occasionally even a fifth of the IV RMSE. In most cases three iterations are enough to achieve these superb gains.

These patterns for the $\lambda_i$ qualitatively repeat themselves when the errors are $t$, as seen in Table 4.4. In this case, when $n=800$ we always achieve RMSE improvements over IV of a factor of three when three iterations are carried out, with factors of four commonly seen and three cases with a fivefold improvement. The factors of efficiency improvement that we observe in our results can dominate similar precedents in other settings. Indeed, the greatest relative root MSE improvement that Robinson (2005) finds in his fractional time series setting is just over 2.

Turning to the regression parameters $\beta_1$ and $\beta_2$, in both Tables 4.3 and 4.4 we see almost universal improvement over IV. The exceptions are four cases out of a total of 54 in Table 4.4, for the $t$ case. These RMSE gains are not as spectacular as for the $\lambda_i$, but are noticeably large as both $n$ and $p$ increase. Indeed, the RMSE for the IV estimate can sometimes be one and a half times as large as that of the Newton iterations when $p=6$ and $n=800$. For $n\geq400$ improvements for $\beta_1$ and $\beta_2$ are observed across the values of $p$, the number of iterations and the error distribution. Thus there is evidence of the usefulness of Newton iterations even for the regression parameters, albeit the gains are greater for the spatial parameters.

Table 4.3: Monte Carlo relative root MSE of IV estimates to iterated Newton-step estimates with N(0,1) errors.

Table 4.4: Monte Carlo relative root MSE of IV estimates to iterated Newton-step estimates with t errors.

5 Empirical illustration

In this small empirical illustration we show that the Newton-step estimates perform well in practice and can lead to more precise estimation. The example is based on Kolympiris, Kalaitzandonakes, and Miller (2011) (KKM), and is also studied in Gupta and Robinson (2015). KKM seek to model the venture capital funding (provided by venture capital firms (VCFs)) for dedicated biotechnology firms (DBFs) with a SAR model. The hypothesis is that the level of VC funding for a DBF increases with the number of VCFs located in close proximity. Denoting by $d_{lk}$ the distance in miles between the $l$-th and $k$-th DBFs, we estimate
$$y=\sum_{i=1}^{p}\lambda_iW_i^by+X\beta+U,\qquad(5.1)$$
where $W_i^b$ is the (row-normalised) weight matrix having off-diagonal $(l,k)$-th element equal to 1 if $i-1<d_{lk}\leq i$, $i=1,\ldots,p$, and also if $d_{lk}=0$ for $i=1$. Thus the matrices are based on each one of $p$ sequential 1-mile rings from the origin DBF. $y$ is the vector of natural logs of the amount of VC funding (million $) received by each of $n=816$ DBFs.

We focus on estimates of the main parameters of interest $\lambda_i$ in (5.1), and omit details regarding the $\beta$. We estimate (5.1) with $p=2,4,6$. As in KKM, the estimates of $\lambda_1$ and $\lambda_2$ are statistically significant at the 1% level, and the magnitude of our parameter estimates is also close to their findings, with our results reported in Table 5.1.

The authors include several explanatory variables in $X_n$; we give a very brief description and refer the reader to KKM for details. The covariates include the number of proximate VCFs and DBFs to capture the effects of being in areas of high VCF or DBF concentration. Firm-specific characteristics include the distance from each DBF to its funding VCFs, the average age of each funding VCF, exposure of VCFs through syndication and an indicator for foreign VCF investment. Variables controlling for DBF-specific factors include firm age, dummies for receiving a grant and being in an R&D tax credit state, a cost of business index for the DBF's home state, distance to the closest university and the number of non-biotech establishments in the DBF's zip code. Two further variables recognize that additional factors can affect the cost of doing business in ways that influence the VC funding levels of a given DBF.
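A sketch of how the ring matrices $W_i^b$ in (5.1) might be built from a matrix of pairwise distances follows; the function name and the handling of rows with no neighbours are assumptions, not details taken from KKM.

```python
import numpy as np

def ring_weights(D, p):
    """Row-normalised ring matrices W^b_i from an n x n distance matrix D.

    W^b_i links firms l and k when i-1 < d_lk <= i (with d_lk = 0 also
    counted for i = 1), as in (5.1); empty rows are left at zero.
    """
    n = D.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    W = []
    for i in range(1, p + 1):
        link = (D > i - 1) & (D <= i) & off_diag
        if i == 1:
            link |= (D == 0) & off_diag   # co-located distinct firms
        link = link.astype(float)
        row_sums = link.sum(axis=1, keepdims=True)
        W.append(np.divide(link, row_sums, out=np.zeros_like(link),
                           where=row_sums > 0))
    return W
```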
The table reports $t$-statistics in parentheses. In square brackets we report for each parameter estimate the ratio of IV standard error to Newton-step standard error, and find that this difference can be as great as 12.53%. Thus the iteration scheme we propose can lead to more accurate inference in practice, as the estimates are more precise.

Table 5.1: IV and single Newton-step estimates of λ_i in model (5.1). t-statistics are in parentheses and the ratio of IV standard errors to Newton-step standard errors are in square brackets.

A Proofs of theorems
Write $a_n=p_n+k_n$, $b_n=r_n+k_n$, $c_n=p_nk_n+k_n$ and $\tau_n=(n/a_n)^{1/2}$.

Proof of Theorem 3.1. (i) By the mean value theorem, (2.7) implies that
$$\hat{\hat\theta}_n-\theta_{0n}=\left(I_{a_n}-\hat H_n^{-1}\bar H_n\right)\left(\hat\theta_n-\theta_{0n}\right)-\hat H_n^{-1}\xi_n=\hat\theta_n-\theta_{0n}-\hat H_n^{-1}\bar H_n\left(\hat\theta_n-\theta_{0n}\right)-\hat H_n^{-1}\xi_n,\qquad(5.2)$$
where $\bar H_n=\partial^2Q_n\left(\bar\theta_n,\hat\sigma_n^2\right)/\partial\theta\partial\theta'$ and $\left\|\bar\theta_n-\theta_{0n}\right\|\leq\left\|\hat\theta_n-\theta_{0n}\right\|$, with each row of the Hessian matrix evaluated at a possibly different $\bar\theta_n$. The latter point is a technical comment that we take as given in the remainder of the paper whenever a mean value theorem is applied to a vector of values. For any $s\times1$ vector $\alpha$ with bounded norm, we can use (5.2) to write
$$\tau_n\alpha'\Psi_n\left(\hat{\hat\theta}_n-\theta_{0n}\right)=\tau_n\alpha'\Psi_n\hat H_n^{-1}\left(\hat H_n-\bar H_n\right)\left(\hat\theta_n-\theta_{0n}\right)-\tau_n\alpha'\Psi_n\hat H_n^{-1}\xi_n,\qquad(5.3)$$
recalling that $\tau_n=(n/a_n)^{1/2}$. The first term on the RHS above has modulus bounded by $\tau_n\|\alpha\|\|\Psi_n\|\left\|\hat H_n^{-1}\right\|\left\|\hat H_n-\bar H_n\right\|\left\|\hat\theta_n-\theta_{0n}\right\|$, where the second factor in norms is $O\left(a_n^{1/2}\right)$, the third is bounded for sufficiently large $n$ by Lemma B.5, by Lemma B.3 the fourth is $O_p\left(\max\left\{p_n^{3/2}b_n^{1/2}/n^{1/2}h_n,\ (b_nc_n/n)^{1/2},\ (p_nb_n/nh_n)^{1/2},\ b_n/n\right\}\right)$ and the fifth is $O_p\left((b_n/n)^{1/2}\right)$ by Theorem 3.1 of Gupta and Robinson (2015). We conclude that the first term on the RHS of (5.3) is $O_p\left(b_n^{1/2}\max\left\{p_n^{3/2}b_n^{1/2}/n^{1/2}h_n,\ (b_nc_n/n)^{1/2},\ (p_nb_n/nh_n)^{1/2},\ b_n/n\right\}\right)$, which is negligible by (3.1) and (3.2), using $b_n\leq C(r_n+k_n)$ and $c_n\leq Cp_nk_n$; for the negligibility of terms involving $b_n/n$ note that $r_n/n=\left(r_n^2/n\right)r_n^{-1}$.

Thus we only need to find the asymptotic distribution of $-\tau_n\alpha'\Psi_n\hat H_n^{-1}\xi_n$. We can write
$$-\tau_n\alpha'\Psi_n\hat H_n^{-1}\xi_n=2\sigma_0^{-2}\tau_n\alpha'\Psi_n\hat H_n^{-1}t_n-\tau_n\alpha'\Psi_n\hat H_n^{-1}\phi_n.\qquad(5.4)$$
We have $E\|\phi_n\|^2\leq\sum_{i=1}^{p_n}E\left(n^{-1}\mathrm{tr}\,C_{in}-n^{-1}\sigma_0^{-2}u'C_{in}u\right)^2=\sum_{i=1}^{p_n}\mathrm{var}\left(n^{-1}\sigma_0^{-2}u'C_{in}u\right)=O\left(p_n/nh_n\right)$ (see (A.20) in the proof of Theorem 3.3 and Lemma B.2 in Gupta and Robinson (2018)), so that
$$\|\phi_n\|=O_p\left(\left(p_n/nh_n\right)^{1/2}\right).\qquad(5.5)$$
Therefore the second term on the right of (5.4) has modulus bounded by $\tau_n$ times
$$\|\alpha\|\|\Psi_n\|\left\|\hat H_n^{-1}\right\|\|\phi_n\|,\qquad(5.6)$$
where the second factor is $O\left(a_n^{1/2}\right)$, the third is bounded for sufficiently large $n$ by Lemma B.5 and the last is $O_p\left((p_n/nh_n)^{1/2}\right)$. Thus (5.6) is $O_p\left((p_na_n/nh_n)^{1/2}\right)$ and the second term on the right of (5.4) is $O_p\left((p_n/h_n)^{1/2}\right)$, which is negligible by (3.1). Then the asymptotic distribution required is that of
$$2\sigma_0^{-2}\tau_n\alpha'\Psi_n\hat H_n^{-1}t_n=\sum_{i=1}^{3}\Upsilon_{in}+\tau_n\alpha'\Psi_nL_n^{-1}t_n,\qquad(5.7)$$
$$\Upsilon_{1n}=2\sigma_0^{-2}\tau_n\alpha'\Psi_n\hat H_n^{-1}\left(\hat H_n-H_n\right)H_n^{-1}t_n,\quad\Upsilon_{2n}=2\sigma_0^{-2}\tau_n\alpha'\Psi_n\Xi_n^{-1}\left(H_n-\Xi_n\right)H_n^{-1}t_n,$$
$$\Upsilon_{3n}=\tau_n\alpha'\Psi_nL_n^{-1}\left(\tfrac{\sigma_0^2}{2}\Xi_n-L_n\right)\left(\tfrac{\sigma_0^2}{2}\Xi_n\right)^{-1}t_n.$$
We will demonstrate that $|\Upsilon_{in}|=o_p(1)$, $i=1,2,3$. First we observe that $|\Upsilon_{1n}|\leq2\sigma_0^{-2}\tau_n\|\alpha\|\|\Psi_n\|\left\|\hat H_n^{-1}\right\|\left\|\hat H_n-H_n\right\|\left\|H_n^{-1}\right\|\|t_n\|$, where the second factor in norms is $O\left(a_n^{1/2}\right)$, the third and fifth are bounded for sufficiently large $n$ by Lemma B.5, the fourth is $O_p\left(\max\left\{p_n^{3/2}b_n^{1/2}/n^{1/2}h_n,\ (b_nc_n/n)^{1/2},\ (p_nb_n/nh_n)^{1/2},\ b_n/n\right\}\right)$ by Lemma B.3, and by (A.13) of Gupta and Robinson (2015) the last is $O_p\left((c_n/n)^{1/2}\right)$. Then $|\Upsilon_{1n}|=O_p\left(c_n^{1/2}\max\left\{p_n^{3/2}b_n^{1/2}/n^{1/2}h_n,\ (b_nc_n/n)^{1/2},\ (p_nb_n/nh_n)^{1/2},\ b_n/n\right\}\right)$, which is negligible by (3.1) and (3.2), again using $b_n\leq C(r_n+k_n)$ and $c_n\leq Cp_nk_n$, the negligibility of the last product having been shown earlier.

Next, $|\Upsilon_{2n}|\leq2\sigma_0^{-2}\tau_n\|\alpha\|\|\Psi_n\|\left\|\Xi_n^{-1}\right\|\left\|H_n-\Xi_n\right\|\left\|H_n^{-1}\right\|\|t_n\|$, where the second factor in norms is $O\left(a_n^{1/2}\right)$, the third and fifth are bounded for sufficiently large $n$ by Lemma B.5, the fourth is $O_p\left((p_nk_n/n)^{1/2}\right)$ by Lemma B.4 and the last is $O_p\left((c_n/n)^{1/2}\right)$ as above. Then $|\Upsilon_{2n}|=O_p\left((p_nk_nc_n/n)^{1/2}\right)$, which is negligible by (3.1) because $p_nk_nc_n/n\leq Cp_n^2k_n^2/n$. Similarly $|\Upsilon_{3n}|=O_p\left((p_nc_n/h_n)^{1/2}\right)$ by Lemma B.4, which is negligible by (3.1) because $p_nc_n/h_n\leq Cp_n^2k_n/h_n$. Then we only need to find the asymptotic distribution of the last term in (5.7), but this is precisely the proof of Theorem 3.3 of Gupta and Robinson (2015). Replicating those arguments leads to the theorem.

(ii) In view of Lemmas B.4, B.6 and B.7, the theorem is proved exactly like Theorem 3.1(i), except for different orders of magnitude of various expressions. In this case two of the orders differ from the analogous ones considered in the proof of Theorem 3.1(i). Indeed, the analogue of the bound for the first term in (5.3) is $O_p\left(\max\left\{\pi_{1n},\pi_{2n},\pi_{3n},\pi_{4n},\pi_{5n},\pi_{6n}\right\}\right)$, where
$$\pi_{1n}=\frac{p_n^{3/2}c_n}{n^{1/2}h_n},\quad\pi_{2n}=\frac{p_nc_n^{1/2}}{h_n},\quad\pi_{3n}=c_n\left(\frac{p_n}{nh_n}\right)^{1/2},\quad\pi_{4n}=\frac{c_n}{n^{1/2}},\quad\pi_{5n}=\frac{n^{1/4}p_n}{h_n},\quad\pi_{6n}=\frac{n^{1/4}\left(p_nc_n\right)^{1/2}}{h_n}.$$
Now $\pi_{5n}$ is assumed to tend to zero by (3.3), while the remaining $\pi_{in}$ are also negligible by (3.3) because, using $c_n\leq Cp_nk_n$, each is bounded by a product of the terms $p_nk_n/n^{1/2}$, $p_n^2k_n/h_n$ and $n^{1/4}p_n/h_n$ appearing there and of quantities tending to zero. The $\Upsilon_{1n}$ bound analogue is $O_p\left(\max\left\{\pi_{1n},\pi_{2n},\pi_{3n},\pi_{4n}\right\}\right)$, which was shown to be negligible under the assumed conditions. All other bounds remain unchanged and will also be negligible under (3.3), as in the proof of Theorem 3.1(i).
Proof of Theorem 3.2. Proceeding as in the proof of Theorem 3.1(i), we can write
$$\tau_n\alpha'\Psi_n\left(\hat{\hat\theta}_n-\theta_{0n}\right)=\tau_n\alpha'\Psi_n\hat H_n^{-1}\left(\hat H_n-\bar H_n\right)\left(\hat\theta_n-\theta_{0n}\right)-\tau_n\alpha'\Psi_n\left(\hat H_n^{-1}-\Xi_n^{-1}\right)\xi_n-\tau_n\alpha'\Psi_n\Xi_n^{-1}\xi_n.\qquad(5.8)$$
As in the proof of Theorem 3.1(i), the first term on the RHS above is negligible by (3.7). Lemma B.5 (for bounded $h_n$) indicates that the second term on the RHS of (5.8) is bounded in modulus by a constant times $\tau_n\|\Psi_n\|\left(\|t_n\|+\|\phi_n\|\right)\left(\left\|\hat H_n-H_n\right\|+\left\|H_n-\Xi_n\right\|\right)$, which is
$$O_p\left(n^{1/2}\max\left\{\left(\frac{c_n}{n}\right)^{1/2},\left(\frac{p_n}{nh_n}\right)^{1/2}\right\}\max\left\{\frac{p_n^{3/2}b_n^{1/2}}{n^{1/2}h_n},\left(\frac{b_nc_n}{n}\right)^{1/2},\left(\frac{p_nb_n}{nh_n}\right)^{1/2},\frac{b_n}{n},\left(\frac{p_nk_n}{n}\right)^{1/2}\right\}\right),$$
using (A.13) of Gupta and Robinson (2015), (5.5) and Lemmas B.3 and B.4(i). This is negligible by (3.7), in a similar way to the preceding proofs. Thus we need to establish the asymptotic distribution of $-\tau_n\alpha'\Psi_n\Xi_n^{-1}\xi_n$, which is established under the assumed conditions in Theorem 3.4 of Gupta and Robinson (2018).

B Lemmas
In the subsequent lemmas the assumptions of the theorems that they are used to prove are taken to hold.
Lemma B.1. (Lemma LS.4 of Gupta and Robinson (2018), Supplementary Material) $\left\|B_n'A_n\right\|=\left\|A_n'B_n\right\|=O_p\left(\left(np_nk_n\right)^{1/2}\right)$.

Lemma B.2. (Lemma LS.4 of Gupta and Robinson (2018), Supplementary Material) $\left\|X_n'B_n\right\|=\left\|B_n'X_n\right\|=O_p\left(\left(np_nk_n\right)^{1/2}\right)$.

Lemma B.3. $\left\|\hat H_n-H_n\right\|$ and $\left\|\hat H_n-\bar H_n\right\|$ are $O_p\left(\max\left\{\frac{p_n^{3/2}b_n^{1/2}}{n^{1/2}h_n},\left(\frac{b_nc_n}{n}\right)^{1/2},\left(\frac{p_nb_n}{nh_n}\right)^{1/2},\frac{b_n}{n}\right\}\right)$.
Proof. By the triangle inequality $\left\|\hat H_n-\bar H_n\right\|\leq\left\|\hat H_n-H_n\right\|+\left\|H_n-\bar H_n\right\|$, and again by the triangle inequality $\left\|\hat H_n-H_n\right\|$ is bounded by
$$\frac{2}{n}\left\|P_{ji,n}(\hat\lambda_n)-P_{ji,n}\right\|+\frac{2}{n}\left|\frac{1}{\hat\sigma_n^2}-\frac{1}{\sigma_0^2}\right|\left(\left\|R_n'R_n\right\|+2\left\|X_n'R_n\right\|+\left\|X_n'X_n\right\|\right).\qquad(5.9)$$
The first term in (5.9) is bounded by the square root of
$$\sum_{i,j=1}^{p_n}\left(\frac{2}{n}\,\mathrm{tr}\left(G_{jn}(\hat\lambda_n)G_{in}(\hat\lambda_n)\right)-\frac{2}{n}\,\mathrm{tr}\left(G_{jn}G_{in}\right)\right)^2.\qquad(5.10)$$
By the MVT, we have $\mathrm{tr}\left(G_{jn}(\hat\lambda_n)G_{in}(\hat\lambda_n)\right)=\mathrm{tr}\left(G_{jn}G_{in}\right)+\zeta_{ij,n}'\left(\hat\lambda_n-\lambda_{0n}\right)$, where $\zeta_{ij,n}$ has elements $\mathrm{tr}\left(G_{in}\left(\bar\lambda_n\right)G_{sn}\left(\bar\lambda_n\right)G_{jn}\left(\bar\lambda_n\right)+G_{sn}\left(\bar\lambda_n\right)G_{in}\left(\bar\lambda_n\right)G_{jn}\left(\bar\lambda_n\right)\right)$, $s=1,\ldots,p_n$, and $\left\|\bar\lambda_n-\lambda_{0n}\right\|\leq\left\|\hat\lambda_n-\lambda_{0n}\right\|$. Thus, the summands in (5.10) are $4n^{-2}\left(\zeta_{ij,n}'\left(\hat\lambda_n-\lambda_{0n}\right)\right)^2\leq4n^{-2}\left\|\zeta_{ij,n}\right\|^2\left\|\hat\lambda_n-\lambda_{0n}\right\|^2$ by the Cauchy-Schwarz inequality, where the first factor in norms on the RHS is $O\left(p_n^{1/2}n/h_n\right)$ by Lemma LS.3 of Gupta and Robinson (2018), supplementary material. The second factor is bounded by $\left\|\hat\theta_n-\theta_{0n}\right\|=O_p\left((b_n/n)^{1/2}\right)$ (see (A.6) of Gupta and Robinson (2015)), so we conclude that the summands in (5.10) are $O_p\left(b_np_n/nh_n^2\right)$ and therefore (5.10) is $O_p\left(p_n^3b_n/nh_n^2\right)$, and it follows that the first term in (5.9) is $O_p\left(p_n^{3/2}b_n^{1/2}/n^{1/2}h_n\right)$. By (A.7) of Gupta and Robinson (2015),
$$\left|\hat\sigma_n^2-\sigma_0^2\right|=O_p\left(\max\left\{\left(\frac{b_nc_n}{n}\right)^{1/2},\left(\frac{p_nb_n}{nh_n}\right)^{1/2},\frac{b_n}{n}\right\}\right),\qquad(5.11)$$
which handles the second factor in the second term in (5.9). We shall now bound the terms inside the parentheses in the second term in (5.9). These are $O_p(n)$ because $n^{-1}\|R_n\|^2=O_p(1)$, $n^{-1}\|X_n\|^2=O(1)$ and $n^{-1}\left\|X_n'R_n\right\|=O_p(1)$, by Assumption 9. From (5.10), (5.11), we conclude that
$$\left\|\hat H_n-H_n\right\|=O_p\left(\frac{p_n^{3/2}b_n^{1/2}}{n^{1/2}h_n}\right)+O_p\left(\max\left\{\left(\frac{b_nc_n}{n}\right)^{1/2},\left(\frac{p_nb_n}{nh_n}\right)^{1/2},\frac{b_n}{n}\right\}\right).$$
Similarly, it may be shown that $\left\|H_n-\bar H_n\right\|$ has the same order, whence the lemma follows.

Lemma B.4. (Lemma B.2 of Gupta and Robinson (2018)) (i) $\left\|H_n-\Xi_n\right\|=O_p\left(\left(p_nk_n/n\right)^{1/2}\right)$ if Assumption 11 holds; (ii) $\left\|L_n-\left(\sigma_0^2/2\right)\Xi_n\right\|=O\left(p_n/h_n\right)$.

Lemma B.5. (Lemma B.3 of Gupta and Robinson (2018)) The following inequalities are satisfied:
$$\mathrm{plim}\left\|\hat H_n^{-1}\right\|\leq C,\quad\mathrm{plim}\left\|\bar H_n^{-1}\right\|\leq C,\quad\lim_{n\to\infty}\left\|\Xi_n^{-1}\right\|\leq C\left(\lim_{n\to\infty}\underline\eta(L_n)\right)^{-1}\leq C.$$
If $h_n$ does not diverge, the above result becomes
$$\mathrm{plim}\left\|\hat H_n^{-1}\right\|\leq C,\quad\mathrm{plim}\left\|\bar H_n^{-1}\right\|\leq C\left(\lim_{n\to\infty}\underline\eta(\Xi_n)\right)^{-1}\leq C,$$
if also $\lim_{n\to\infty}\underline\eta(\Xi_n)>0$.

Lemma B.6. $\left\|\tilde H_n-H_n\right\|$ and $\left\|\tilde H_n-\bar H_n\right\|$ are $O_p\left(\max\left\{\left(\frac{c_n}{n}\right)^{1/2},\left(\frac{p_n}{h_n}\right)^{1/2},\frac{p_n^{3/2}c_n^{1/2}}{n^{1/2}h_n},\left(\frac{p_nc_n}{nh_n}\right)^{1/2}\right\}\right)$.
Proof. The proof is similar to that of Lemma B.3 and we only elaborate on the differences from that proof. In this case we need to bound
$$\frac{2}{n}\left\|P_{ji,n}(\tilde\lambda_n)-P_{ji,n}\right\|+\frac{2}{n}\left|\frac{1}{\tilde\sigma_n^2}-\frac{1}{\sigma_0^2}\right|\left(\left\|R_n'R_n\right\|+2\left\|X_n'R_n\right\|+\left\|X_n'X_n\right\|\right).\qquad(5.12)$$
In the OLS case we have $\left|\tilde\sigma_n^2-\sigma_0^2\right|=O_p\left(\max\left\{\left(c_n/n\right)^{1/2},\left(p_n/h_n\right)^{1/2},\left(p_nc_n/nh_n\right)^{1/2}\right\}\right)$ and $\left\|\tilde\theta_n-\theta_{0n}\right\|=O_p\left(\max\left\{\left(c_n/n\right)^{1/2},\left(p_n/h_n\right)^{1/2}\right\}\right)$, from (A.23) and (A.21) of Gupta and Robinson (2015), respectively. The first term in (5.12) is then $O_p\left(\max\left\{p_n^{3/2}c_n^{1/2}/n^{1/2}h_n,\left(p_n/h_n\right)^{1/2},\left(p_nc_n/nh_n\right)^{1/2}\right\}\right)$ while the second one is $O_p\left(\max\left\{\left(c_n/n\right)^{1/2},\left(p_n/h_n\right)^{1/2},\left(p_nc_n/nh_n\right)^{1/2}\right\}\right)$. We may then argue in a similar way that the Hessian evaluated at the OLS estimate differs from its value at an intermediate point in norm by the same order, to conclude the proof.

Lemma B.7. (Lemma B.3 of Gupta and Robinson (2018))
$$\mathrm{plim}\left\|\tilde H_n^{-1}\right\|\leq C,\quad\mathrm{plim}\left\|\bar H_n^{-1}\right\|\leq C,\quad\lim_{n\to\infty}\left\|\Xi_n^{-1}\right\|\leq\frac{\sigma_0^2}{2}\left(\lim_{n\to\infty}\underline\eta(L_n)\right)^{-1}\leq C.$$

References
Andrews, D. W. K. (1997). A stopping rule for the computation of Generalized Method of Moments estimators. Econometrica 65, 913-931.
Anselin, L. (1988). Spatial Econometrics: Methods and Models, Volume 4. Kluwer Academic Publishers, Boston.
Blommestein, H. J. (1983). Specification and estimation of spatial econometric models: a discussion of alternative strategies for spatial economic modelling. Regional Science and Urban Economics 13, 251-270.
Case, A. C. (1991). Spatial patterns in household demand. Econometrica 59, 953-965.
Cliff, A. D. and J. K. Ord (1973). Spatial Autocorrelation. London: Pion.
Conley, T. G. and B. Dupor (2003). A spatial analysis of sectoral complementarity. Journal of Political Economy 111, 311-352.
Das, D., H. H. Kelejian, and I. R. Prucha (2003). Finite sample properties of estimators of spatial autoregressive models with autoregressive disturbances. Papers in Regional Science 82, 1-26.
Davis, P. J. (1979). Circulant Matrices. Wiley Interscience, New York.
De Luca, G., J. R. Magnus, and F. Peracchi (2018). Weighted-average least squares estimation of generalized linear models. Journal of Econometrics 204, 1-17.
Fisher, R. A. (1925). Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society 22, 700-725.
Frazier, D. T. and E. Renault (2017). Efficient two-step estimation via targeting. Journal of Econometrics 201, 212-227.
Gupta, A. and P. M. Robinson (2015). Inference on higher-order spatial autoregressive models with increasingly many parameters. Journal of Econometrics 186, 19-31.
Gupta, A. and P. M. Robinson (2018). Pseudo maximum likelihood estimation of spatial autoregressive models with increasing dimension. Journal of Econometrics 202, 92-107.
Han, X., C.-s. Hsieh, and L. F. Lee (2017). Estimation and model selection of higher-order spatial autoregressive model: An efficient Bayesian approach. Regional Science and Urban Economics 63, 97-120.
Han, X., L. F. Lee, and X. Xu (2020). Large sample properties of Bayesian estimation of spatial econometric models. Econometric Theory, Firstview, 39pp.
Hartley, H. O. and A. Booker (1965). Nonlinear least squares estimation. Annals of Mathematical Statistics 36, 638-650.
Helmers, C. and M. Patnam (2014). Does the rotten child spoil his companion? Spatial peer effects among children in rural India. Quantitative Economics 5, 67-121.
Hsieh, C.-s. and H. van Kippersluis (2018). Smoking initiation: Peers and personality. Quantitative Economics 9, 825-863.
Hualde, J. and P. M. Robinson (2011). Gaussian pseudo-maximum likelihood estimation of fractional time series models. Annals of Statistics 39, 3152-3181.
Janssen, P., J. Jureckova, and N. Veraverbeke (1985). Rate of convergence of one- and two-step M-estimators with applications to maximum likelihood and Pitman estimators. The Annals of Statistics 13, 1222-1229.
Kasahara, H. and K. Shimotsu (2008). Pseudo-likelihood estimation and bootstrap inference for structural discrete Markov decision models. Journal of Econometrics 146, 92-106.
Kelejian, H. H. and I. R. Prucha (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17, 99-121.
Kolympiris, C., N. Kalaitzandonakes, and D. Miller (2011). Spatial collocation and venture capital in the US biotechnology industry. Research Policy 40, 1188-1199.
Kristensen, D. and O. Linton (2006). A closed-form estimator for the GARCH(1,1) model. Econometric Theory 22, 323-337.
Kristensen, D. and B. Salanie (2017). Higher-order properties of approximate estimators. Journal of Econometrics 198, 189-208.
Kuersteiner, G. M. and I. R. Prucha (2013). Limit theory for panel data models with cross sectional dependence and sequential exogeneity. Journal of Econometrics 174, 107-126.
Kuersteiner, G. M. and I. R. Prucha (2020). Dynamic spatial panel models: networks, common shocks, and sequential exogeneity. Forthcoming, Econometrica.
Kwok, H. H. (2019). Identification and estimation of linear social interaction models. Journal of Econometrics 210, 434-458.
Lahiri, S. N. (1996). On inconsistency of estimators based on spatial data under infill asymptotics. Sankhya: Series A 58, 403-417.
LeCam, L. (1956). On the asymptotic theory of estimation and testing hypotheses. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability 1, 129-156.
Lee, L. F. (2002). Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models. Econometric Theory 18, 252-277.
Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 1899-1925.
Lee, L. F. and X. Liu (2010). Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econometric Theory 26, 187-230.
Lee, L. F. and J. Yu (2013). Near unit root in the spatial autoregressive model. Spatial Economic Analysis 8, 314-351.
Li, K. (2017). Fixed-effects dynamic spatial panel data models and impulse response analysis. Journal of Econometrics 198, 102-121.
Ord, K. (1975). Estimation methods for models of spatial interaction. Journal of the American Statistical Association 70, 120-126.
Ortega, J. and W. Rheinboldt (1970). Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York.
Pace, R. K. and R. Barry (1997). Quick computation of spatial autoregressive estimators. Geographical Analysis 29, 232-247.
Pinkse, J., M. E. Slade, and C. Brett (2002). Spatial price competition: A semiparametric approach. Econometrica 70, 1111-1153.
Robinson, P. M. (1988). The stochastic difference between econometric statistics. Econometrica 56, 531-548.
Robinson, P. M. (2005). Efficiency improvements in inference on stationary and nonstationary fractional time series. Annals of Statistics 33, 1800-1842.
Robinson, P. M. (2010). Efficient estimation of the semiparametric spatial autoregressive model. Journal of Econometrics 157, 6-17.
Robinson, P. M. (2011). Asymptotic theory for nonparametric regression with spatial data. Journal of Econometrics 165, 5-19.
Rothenberg, T. J. (1984). Approximate normality of generalized least squares estimates. Econometrica 52, 811-825.
Rothenberg, T. J. and C. T. Leenders (1964). Efficient estimation of simultaneous equation systems. Econometrica 32, 57-76.
Zhu, X., D. Huang, R. Pan, and H. Wang (2020). Multivariate spatial autoregressive model for large scale social networks. Journal of Econometrics 215, 591-606.