Nonparametric prediction with spatial data
Abhimanyu Gupta ∗† Javier Hidalgo ‡ August 11, 2020
Abstract
We describe a nonparametric prediction algorithm for spatial data. The algorithm is based on a flexible exponential representation of the model characterized via the spectral density function. We provide theoretical results demonstrating that our predictors have desired asymptotic properties. Finite sample performance is assessed in a Monte Carlo study that also compares our algorithm to a rival nonparametric method based on the infinite AR representation of the dynamics of the data. We apply our method to a real data set in an empirical example that predicts house prices in Los Angeles.

Keywords: Spatial data, prediction, nonparametric, spectral density, lattice
∗ Department of Economics, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, U.K. Email: [email protected]
† Research supported by ESRC grant ES/R006032/1.
‡ Economics Department, London School of Economics, Houghton Street, London WC2A 2AE, U.K. Email: [email protected]

Random models for spatial or spatio-temporal data play an important role in many disciplines of economics, such as environmental, urban, development or agricultural economics, as well as economic geography, among others. Examples may be found in the special volume by Baltagi et al. (2007) on the topic or Cressie (1993). Earlier treatments include the pioneering paper by Mercer and Hall (1911) on wheat crop yield data (see also Gao et al. (2006)) or Batchelor and Reed (1918), which were employed as examples and analysed in the celebrated paper by Whittle (1954). Other examples are given in Cressie and Huang (1999), see also Fernandez-Casal et al. (2003). With a view towards applications in environmental and agricultural economics, Mitchell et al. (2005) employed a model of the type studied in this paper to analyse the effect of CO2 on crops, whereas Genton and Koul (2008) examine the yield of barley in the UK. They shed light on how the models can be useful when there is evidence of spatial movement, such as that of pollutants due to winds or ocean currents. When data is collected in time such models are termed ‘noncausal’ and have drawn interest in economics, see for instance Breidt et al. (2001) among others for some early examples.

It is often the case that spatial data are collected on a regular lattice. This may occur as a consequence of some planned experiment or due to a systematic sampling scheme, or when we can regard the (possibly non-gridded) observations as the result of aggregation over a set of covering regions rather than values at a particular site, see e.g. Conley (1999), Conley and Molinari (2007), Bester et al. (2011), Wang et al. (2013), Nychka et al. (2015), Bester et al. (2016). As a result of this ability to map locations to a regular grid, lattice data are frequently studied in the econometrics literature, see for example Roknossadati and Zarepour (2010), Robinson (2011) and Jenish (2016). Nonsystematic patterns may occur, although these might arise as a consequence of missing observations, see Jenish and Prucha (2012) for a study that covers irregular spatial data.

Doubtlessly, one of the ultimate goals when analysing the spatial “dynamics” of the data, i.e. the evolution over space, is to provide the best possible predictions of “future” values, where in the context of spatial data prediction has a broader meaning, as it also refers to interpolation at an unsampled site s, i.e. x(s), as is the case in cartography or in our empirical example. However, in view of the fact that not much work exists on how to predict with spatial data, the main aim of this manuscript is to fill this gap and hence to provide a methodology to predict a “future” observation. More specifically, given a stretch of observations X(n) = {x(s_i)}_{i=1}^{n}, where s_i denotes the site in Z^d, our objective becomes to obtain a function of the data X(n), say p(X(n); s), which minimizes the L2-risk function E(x(s) − p(X(n); s))². Of course, other risk functions might be considered, although the L2-risk is standard and routinely employed with real data sets. It is well known that the solution to the latter problem is given by

p(X(n); s) = E[x(s) | X(n)],   (1.1)

which, under Gaussianity or (2.1) below (see also our Condition C1), is a linear function of the observations. That is,

E[x(s) | X(n)] = Σ_{i=1}^{n} β_i x(s_i)   (1.2)

or

x(s) = Σ_{i=1}^{n} β_i x(s_i) + η(s).

Model (1.
2) is referred to as the conditional autoregressive (CAR) or the Besag (1974) model, as opposed to the Whittle (1954) simultaneous specification, see (2.1) below. So (1.2) suggests that, to obtain accurate predictions, one needs to estimate the coefficients β_i. For that purpose, a typical approach is to assume a particular parameterization of β_i and then to estimate the parameters via least squares, which are a function of the covariance (covariogram) structure of X(n). In this paper, the aim is to provide an estimator of (1.1) without assuming any particular parameterization of the dynamic structure of the data a priori, for instance without assuming any particular functional form for the coefficients β_i in (1.2).

We shall assume that the (spatial) process {x(t)}_{t∈Z^d} admits a multilateral representation given by

x(t) − µ = Σ_{j∈Z^d} ψ_j ε(t − j),  Σ_{j∈Z^d} |ψ_j| < ∞,  ψ_0 = 1,   (2.1)

for some sequence of random variables {ε(t)}_{t∈Z^d} such that E(ε(t)) = 0 and E(ε(s)ε(t)) = σ²_ε I(t = s), where I(·) denotes the indicator function. One important consequence of the multilateral representation of the process {x(t)}_{t∈Z^d} is that the sequence {ε(t)}_{t∈Z^d} loses its interpretation as being the “prediction” error of the model, and thus they can no longer be regarded as innovations, as was first noticed by Whittle (1954). Observe that we have changed the notation for the site s to a more standard one given by t ∈ Z^d. When d = 1, we obtain the so-called noncausal models or, in Whittle’s terminology, linear transect models. These models can be regarded as forward looking and have gained some consideration in economics, see for instance Lanne and Saikkonen (2011), Davis et al. (2013), Lanne and Saikkonen (2013) or Cavaliere et al. (2020).

As we mentioned in the introduction, to provide correct predictors (or interpolation), we need to specify the covariogram function of the sequence {x(t)}_{t∈Z^d}, denoted by {γ(s)}_{s∈Z^d}, where γ(s) = Cov(x(t), x(t + s)). The covariogram γ(s) is related to the spectral density function of {x(t)}_{t∈Z^d}, f(λ), through the expression

γ(s) = ∫_{Π^d} f(λ) e^{−is·λ} dλ,  s ∈ Z^d,

where Π = (−π, π]. Henceforth the notation “j · λ” means the inner product of multi-indices j and λ of dimension d, while for any element a ∈ Z^d (or Π^d), we shall write a = (a[1], ..., a[d]). It is worth observing that under (2.1),
f(λ) can be factorized as

f(λ) = (σ²_ε / (2π)^d) |Ψ(λ)|²,  λ ∈ Π^d,

with Ψ(λ) = Σ_{j∈Z^d} ψ_j e^{−ij·λ}. Thus, Ψ(λ) summarizes the covariogram structure of {x(t)}_{t∈Z^d}, which is the key to obtaining accurate and valid predictions.

One parameterization of (2.1) is the ARMA field model

P(L)(x(t) − µ) = Q(L) ε(t),  P(z) = Σ_{j∈Z^d_1} α_j z^j;  α(0) = 1;  Q(z) = Σ_{j∈Z^d_2} β_j z^j;  β(0) = 1,

where Z^d_1 and Z^d_2 are finite subsets of Z^d and henceforth z^j = Π_{ℓ=1}^{d} z[ℓ]^{j[ℓ]}, with the convention that 0⁰ = 1. For instance, the ARMA(k₁, k₂; ℓ₁, ℓ₂) field is given by

Σ_{j=−k}^{k} α_j (x(t − j) − µ) = Σ_{j=−ℓ}^{ℓ} β_j ε(t − j),  α_0 = β_0 = 1,   (2.2)

which has a spectral density function

f(λ) = (σ²_ε / (2π)^d) | Σ_{j=−ℓ}^{ℓ} β_j e^{ij·λ} |² / | Σ_{j=−k}^{k} α_j e^{ij·λ} |².

Observe that model (2.2) becomes a causal one if ℓ = k = 0. In what follows, we will discuss the case d = 2, as it has all the ingredients and is the most relevant and standard situation with real data sets.

An alternative representation of {x(t)}_{t∈Z²} to that in (2.
1) is the extension to spatial data of the Bloomfield (1973) exponential model, which was introduced by Whittle (1954), Section 6, and also termed the Cepstrum model by Solo (1986), see also McElroy and Holan (2014). This representation of {x(t)}_{t∈Z²} is characterized via the spectral density function f(λ) and is defined as

f(λ) =: exp(−α_0(1)) (2π)^{−2} |A(1; λ)|^{−2} = (2π)^{−2} exp( −α_0(1) + 2 Σ_{0≺ℓ} α_ℓ(1) cos(ℓ·λ) ),   (2.3)

where by definition

A(1; λ) =: exp( −Σ_{0≺ℓ} α_ℓ(1) e^{iℓ·λ} )   (2.4)

and “≺” denotes the lexicographical (dictionary) ordering defined as

s ≺ k ⇔ (s[1] < k[1]) or (s[1] = k[1] and s[2] < k[2]).   (2.5)

It is worth noting that, by the definition of A(1; λ) given in (2.4), the coefficients α_ℓ(1) in (2.4) are given by

α_ℓ(1) = (1/(2π²)) ∫_{Π̃} log(f(λ)) cos(ℓ·λ) dλ,  0 ⪯ ℓ,   (2.6)

where Π̃ = [0, π] × [−π, π], that is λ ∈ Π̃ if λ[1] ∈ [0, π] and λ[2] ∈ [−π, π]. When d = 1, expression (2.3) is known as the canonical decomposition of f(λ), see Whittle (1963), p. 26, or Theorem 3.8.4 in Brillinger (1981). Also, we have that σ²_ε = exp(−α_0(1)), which becomes the one-step prediction error variance in causal models.

Observe that if we allowed s in (2.3) to belong to Z², we would then suffer from some identification problems, as cos(s·λ) = cos(−s·λ) for all λ ∈ Π² and s. Solo (1986) notes that if 0 < f(λ) < M then the representation of the spectral density in (2.3)/(2.4) exists, and indeed in the spatial setting the seminal work of Whittle (1954) established the general existence of unilateral schemes in the space domain, as we discuss below.

However, the representation induced by (2.
4) is one of four possible ones using the “lexicographic” ordering. Indeed, instead of the ordering in (2.5), we might employ

s ≺₂ k ⇔ (s[1] > k[1]) or (s[1] = k[1] and s[2] > k[2]),   (2.7)

so that (2.3) and (2.4) become respectively

f(λ) =: exp(−α_0(2)) (2π)^{−2} |A(2; λ)|^{−2} = (2π)^{−2} exp( −α_0(2) + 2 Σ_{0≺₂ℓ} α_ℓ(2) cos(ℓ·λ) ),   (2.8)

and

A(2; λ) =: exp( −Σ_{0≺₂ℓ} α_ℓ(2) e^{iℓ·λ} ).

Of course, there are two remaining possible orderings, which arise when the roles of “s” and “k” are interchanged in (2.5) and/or (2.7); the ordering one chooses may depend on the site s we were interested to predict. However, it is also very important to emphasize that the spectral density function f(λ) given in (2.3) or (2.8) is the same: the choice of ordering leaves f(λ) unaltered.

The previous representations of the sequence {x(t)}_{t∈Z²} might be thought of as given in the spectral domain, as opposed to that suggested in (2.1). In the space domain, the process admits unilateral representations having the same spectral scheme, although not necessarily of finite order, as is the case when d = 1, and with coefficients that may be intractable functions of the parameters ψ_j in (2.1). As we argue below, however, the ψ_j in (2.1) are not needed, as the expression in (1.2) suggests.

That is, Whittle (1954) showed that the sequence {x(t)}_{t∈Z²} possesses four unilateral representations, as opposed to the multilateral one in (2.1). One of them is

x(t) − µ = Σ_{0⪯₁j} ζ_j(1) ϑ(t − j),  Σ_{0⪯₁j} |ζ_j(1)| < ∞,  ζ_0(1) = 1.   (2.9)

These are called unilateral representations as they model dependence over a ‘half-plane’ of Z². Of course, the representation depends on the chosen lexicographic ordering, in the sense that if one chooses (2.7) instead, we obtain

x(t) − µ = Σ_{0⪯₂j} ζ_j(2) ϑ̇(t − j),  Σ_{0⪯₂j} |ζ_j(2)| < ∞,  ζ_0(2) = 1.

The ‘half-planes’ corresponding to these representations are illustrated in Figure 1. As before, we have two additional representations by changing the roles of ℓ[1] and ℓ[2] in the definitions of 0 ⪯₁ j and 0 ⪯₂ j, which would split the points in Figure 1 horizontally rather than vertically.

Figure 1: Half-plane illustration for d = 2. Circles form the half-plane; the large black solid dot marks the origin.

It might be expected that there is a relationship between the representation in (2.
9) and (2.3)/(2.4), that is, between the coefficients ζ_j(1) and α_ℓ(1). The link among these coefficients turns out to play a crucial role in our prediction algorithm. This is discussed in the next section.

In the context of causal time series sequences, d = 1, prediction algorithms are well understood and they are performed either in the time or in the frequency domain, see Brockwell and Davis (2006), Section 5.6, or Hannan (1970), Chapter 3. In our context, when d = 2, we envisage that similar algorithms should be possible, and one can choose between using a time domain approach or using the spectral density for that purpose. For instance, one can choose the “representation” in (1.2) or the unilateral representation in, say, (3.2) given below. In the situation where no finite parameterization of the model is known, similar to the case when d = 1, we would employ a sieve type of approximation to the true model. For instance, we might approximate the left side of (3.2) by Σ_{0≺j⪯P} a_j(1) x(t − j), which corresponds to the often employed AR(P) approximation when d = 1. However, in this paper the main aim is to extend the methodology in Bhansali (1974) or Hidalgo and Yajima (2002), by describing an algorithm using the spectral representation of {x(t)}_{t∈Z²} to link the “frequency” and “time” domains. For that purpose, we shall examine the relationship between (2.4)/(2.3) and that in (2.9). We focus on the first ordering “⪯₁”, and denote the Fourier coefficients of A(1; λ) by

a_j(1) = (1/(4π²)) ∫_{Π²} A(1; λ) e^{ij·λ} dλ;  0 ≺ j,   (3.1)

and a_0(1) = 1. The latter displayed expression yields that

A(1; λ) = Σ_{0⪯₁j} a_j(1) e^{ij·λ},

which implies that the sequence {x(t)}_{t∈Z²} has a unilateral representation given by

Σ_{0⪯₁j} a_j(1) x(t − j) = ϑ(t),   (3.2)

where {ϑ(t)}_{t∈Z²} is a sequence of zero mean uncorrelated errors with finite fourth moments. So, we observe that the unilateral representation in (3.2) corresponds to that given in (2.9), up to the labelling of the errors ϑ(t). In addition, it is easy to see that (3.2) corresponds to the “autoregressive” representation of {x(t)}_{t∈Z²}, with

f(λ) =: σ²_ϑ (2π)^{−2} |A(1; λ)|^{−2} = σ²_ϑ (2π)^{−2} | Σ_{0⪯₁j} a_j(1) e^{ij·λ} |^{−2}.

Similarly, the sequence {x(t)}_{t∈Z²} has a unilateral MA(∞) representation in terms of the errors {ϑ(t)}_{t∈Z²} given by

x(t) − µ = Σ_{0⪯₁j} ζ_j(1) ϑ(t − j),

where

ζ_j(1) = (1/(4π²)) ∫_{Π²} B(1; λ) e^{ij·λ} dλ,  0 ≺ j;  ζ_0(1) = 1,

and

B(1; λ) := A^{−1}(1; λ) = exp( Σ_{0≺ℓ} α_ℓ(1) e^{iℓ·λ} ).

Due to the rather unusual notation in this paper, we have decided to gather it at this stage for convenience. The numbers 0, π can be either scalars or vectors (of dimension d = 2), which should be clear from the context, whereas π̊ = (0, π)′ and e_ℓ denotes a vector in Z² whose ℓ-th element is one and the other element is zero. Given two vectors a and b, a ∨ b and a ∧ b represent respectively the maximum and the minimum of the two according to the employed lexicographical ordering, whereas a ≥ (≤) b means that a[ℓ] ≥ (≤) b[ℓ] for ℓ = 1, 2. Denote

Π_n = { λ_{k[ℓ]} = πk[ℓ]/ñ[ℓ],  k[ℓ] = 0, ±1, ..., ±ñ[ℓ] =: ±n[ℓ]/2,  ℓ = 1, 2 },

where λ_k = (λ_{k[1]}, λ_{k[2]}) stands for the Fourier frequencies. We define Π̃_n = { λ_k ∈ Π_n : λ_{k[1]} > 0 }. Also

∫⁺_{λ⪯π} = ∫_{λ[1]=0}^{π} ∫_{λ[2]=−π}^{π};  ∫⁻_{λ⪯π} = ∫_{λ[1]=−π}^{0} ∫_{λ[2]=−π}^{π},   (3.3)

∫_{a≤λ≤b} = ∫_{λ[1]=a[1]}^{b[1]} ∫_{λ[2]=a[2]}^{b[2]},

and similarly

Σ⁺_{j⪯J} := Σ_{j[1]=0}^{J[1]} Σ_{j[2]=−J[2]}^{J[2]};  Σ⁻_{j⪯J} := Σ_{j[1]=−J[1]}^{0} Σ_{j[2]=−J[2]}^{J[2]},   (3.4)

Σ_{a≤t≤b} := Σ_{t[1]=a[1]}^{b[1]} Σ_{t[2]=a[2]}^{b[2]}.

It goes without saying that similar notation follows when q = 2, 3, 4. Note that Σ⁺_{j⪯J} + Σ⁻_{j⪯J} =: Σ_{−J≤j≤J}, and likewise ∫⁺_{λ⪯π} + ∫⁻_{λ⪯π} =: ∫_{λ∈Π²}.

Before we describe our prediction algorithm, we shall introduce our set of regularity conditions.

Condition C1 (a) {ε(t)}_{t∈Z²} in (2.
1) is a zero mean independent identically distributed sequence of random variables with variance σ²_ε and finite 4th moments, with κ_ε denoting the fourth cumulant of {ε(t)}_{t∈Z²}.

(b) The multilateral Moving Average representation of {x(t)}_{t∈Z^d} in (2.1) can be written (or it has a representation) as a multilateral Autoregressive model

Σ_{j∈Z²} ξ_j x(t − j) = ε(t),  ξ(0) = 1,   (3.5)

where ξ(j) is the coefficient of z^j in the Fourier expansion of L^{−1}(z), where

L(z) = L(z[1], z[2]) = Σ_{j∈Z²} ψ_j z^j.

Condition C2 n̄ = Π_{ℓ=1}^{2} n[ℓ], where n[ℓ] ≍ n⃗ for ℓ = 1, 2, and “a ≍ b” means that C^{−1} ≤ a/b ≤ C for some finite positive constant C.

We now comment on Conditions C1 and C2. Part (a) of Condition C1 is a standard moment condition. A sufficient condition for (3.5) is that Ψ(z) has no zero for any z[ℓ], ℓ = 1, 2, which simultaneously satisfy |z[1]| = 1 and |z[2]| = 1. The latter condition will guarantee that f(λ) > 0 for all λ ∈ Π̃. Condition C2 could be relaxed to allow the rates at which the n[ℓ] diverge to be different for ℓ = 1, 2. However, for notational simplicity we prefer to leave it as it stands.

We now discuss the flexible exponential algorithm to predict, or to interpolate, “future” values of the data without imposing any specific parametric model for the spectral density function. For that purpose, we extend results in Bhansali (1974, 1977) or Hidalgo and Yajima (2002) to the case when the data is collected on the plane. In addition, as a by-product, we provide an extremely simple estimator of the weights α_ℓ(q) or a_ℓ(q), q = 1, ..., 4.

The definition of A(1; λ) and expression (2.
6) suggest that, to compute an estimator of the coefficients α_ℓ(q) and/or a_ℓ(q), q = 1, ..., 4, all that we need is to obtain an estimator of the spectral density function f(λ). To that end, for a generic sequence {v(t)}_{t=1}^{n}, we shall define the periodogram as

I_v(λ) = (1/n̄) | Σ_{1≤t≤n} v(t) e^{−it·λ} |²;  λ ∈ Π̃,

where in what follows we use the notation that, for any g = (g[1], g[2]), ḡ = g[1]g[2]. In real applications, in order to make use of the fast Fourier transform, the periodogram will be evaluated at the Fourier frequencies, that is at λ_k = (λ_{k[1]}, λ_{k[2]})′.

However, as noted by Guyon (1982), due to non-negligible end effects (the edge effect), the bias of the periodogram does not converge to zero fast enough when d > 1. We therefore employ the tapered periodogram, defined as

I_v^T(λ_j) = |w_v^T(λ_j)|²;  w_v^T(λ_j) = ( Σ_{1≤t≤n} h²(t) )^{−1/2} Σ_{1≤t≤n} h(t) v(t) e^{it·λ_j},   (3.6)

where w_v^T(λ_j) denotes the tapered discrete Fourier transform (DFT). Another possibility to reduce the bias was considered in Robinson and Vidal Sanz (2006), and is helpful when d ≥ 4. However, as we only consider the most common scenario d = 2, it suffices for our results to hold true to employ the tapered periodogram I_v^T(λ_j). One common taper is the cosine-bell (or Hanning) taper, which is defined as

h(t) = (1/4) Π_{ℓ=1}^{2} h_ℓ(t[ℓ]);  h_ℓ(t[ℓ]) = 1 − cos( 2πt[ℓ]/n[ℓ] ),   (3.7)

see Brillinger (1981). It is worth observing that the standard DFT and the cosine-bell tapered DFT are related by the equality

w_v^T(λ_j) = (1/6) Π_{ℓ=1}^{2} [ −w_v(λ_{j[ℓ]−1}) + 2 w_v(λ_{j[ℓ]}) − w_v(λ_{j[ℓ]+1}) ].   (3.8)

In the paper we shall explicitly consider the cosine-bell taper, although the same results follow employing other taper functions such as Parzen or Kolmogorov tapers (Brillinger, 1981).

(Footnote to Condition C1(b): of course, we might have written this condition in terms of the unilateral representation. However, due to the possible (four) representations that we can employ, we have preferred to use the multilateral one.)
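As a concrete illustration, the two-dimensional cosine-bell taper (3.7) and the tapered periodogram (3.6) can be computed directly with FFTs. The following is a minimal sketch assuming NumPy; the function names are ours, and the normalization follows (3.6), with the tapered DFT scaled by (Σ_t h²(t))^{−1/2}.

```python
import numpy as np

def cosine_bell(n1, n2):
    """2-D cosine-bell (Hanning) taper: h(t) = (1/4) * prod_l (1 - cos(2*pi*t[l]/n[l])),
    as in (3.7); vanishes at the lattice edges, damping the edge effect."""
    h1 = 1.0 - np.cos(2.0 * np.pi * np.arange(1, n1 + 1) / n1)
    h2 = 1.0 - np.cos(2.0 * np.pi * np.arange(1, n2 + 1) / n2)
    return 0.25 * np.outer(h1, h2)

def tapered_periodogram(x):
    """Tapered periodogram I^T(lambda_j) = |w^T(lambda_j)|^2 of (3.6), evaluated
    on the full grid of Fourier frequencies via the 2-D FFT."""
    h = cosine_bell(*x.shape)
    w = np.fft.fft2(h * x) / np.sqrt(np.sum(h ** 2))  # tapered DFT w^T
    return np.abs(w) ** 2
```

Evaluating on the full FFT grid corresponds to using the Fourier frequencies λ_k, as the text recommends for computational convenience.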
Condition C3 {h(t)}_{t=1}^{n} is the cosine-bell taper function in (3.7).

We estimate f(λ) by the average (tapered) periodogram

f̂(λ) = (1/(4m̄)) Σ_{−m≤ℓ≤m} I_x^T(λ + λ_ℓ) / (2π)²,   (3.9)

with I_x^T(λ_j) given in (3.6), m̄ = m[1]m[2], and where m[ℓ]/n[ℓ] + m[ℓ]^{−1} = o(1) for ℓ = 1, 2. Next we denote λ̃_k = (λ̃_{k[1]}, λ̃_{k[2]})′, for k[1] = 0, 1, ..., M[1] =: ñ[1]/m[1] and k[2] = 0, ±1, ..., ±M[2] =: ñ[2]/m[2], where λ̃_{k[ℓ]} = 2πk[ℓ]/M[ℓ]; ℓ = 1, 2. We then estimate a_j(q), j = 1, ..., M, as

â_j(q) = (1/(4M̄)) Σ_{−M≤ℓ≤M} Â_ℓ(q) e^{ij·λ̃_ℓ},  0 ≺_q j ⪯ M;  q = 1, ..., 4,   (3.10)

where

Â_ℓ(q) = exp( −Σ_{0≺_q j⪯M⁺} α̂_j(q) e^{−ij·λ̃_ℓ} ),  0 ⪯_q ℓ ⪯ M;  q = 1, ..., 4,

and

α̂_j(q) = (1/(2M̄)) Σ_{0⪯_q k⪯M⁺} cos(j·λ̃_k) log f̂_k,  0 ⪯_q j ⪯ M;  q = 1, ..., 4,   (3.11)

with φ(λ̃_k) abbreviated to φ_k for any generic function φ(·).

It is also worth defining the quantities in (3.10) and (3.11) when evaluated using the spectral density function f(λ) or

f̃(λ) = (1/(4m̄)) Σ_{−m≤ℓ≤m} f(λ + λ_ℓ)

instead of the estimated one f̂(λ). That is,

a_{j,n}(q) = (1/(4M̄)) Σ_{−M≤ℓ≤M} A_{ℓ,n}(q) e^{ij·λ̃_ℓ},  ã_{j,n}(q) = (1/(4M̄)) Σ_{−M≤ℓ≤M} Ã_{ℓ,n}(q) e^{ij·λ̃_ℓ},  0 ≺_q j ⪯ M;  q = 1, ..., 4,

with

A_{ℓ,n}(q) = exp( −Σ_{0≺_q j⪯M⁺} α_{j,n}(q) e^{−ij·λ̃_ℓ} ),  Ã_{ℓ,n}(q) = exp( −Σ_{0≺_q j⪯M⁺} α̃_{j,n}(q) e^{−ij·λ̃_ℓ} ),  0 ⪯_q ℓ ⪯ M;  q = 1, ..., 4,

and

α_{j,n}(q) = (1/(2M̄)) Σ_{0⪯_q k⪯M⁺} cos(j·λ̃_k) log f_k,  α̃_{j,n}(q) = (1/(2M̄)) Σ_{0⪯_q k⪯M⁺} cos(j·λ̃_k) log f̃_k,  0 ⪯_q j ⪯ M;  q = 1, ..., 4.   (3.13)

Then, our prediction/interpolation algorithm is described as follows. Suppose that we are interested to “predict” the value x(s) at the site s = (s[1], s[2]). To that end, we denote the predictor of x(s) using observations at sites t ⪯_q s as

x̂[q; s] = Σ_{0≺_q ℓ⪯M⁺} â_ℓ(q) x̂[q; s − ℓ],  q = 1, ..., 4,   (3.14)

with the convention that x̂[q; s] = x(s) if x(s) is observed. Then, we “predict” or interpolate the value x(s) by

x̂(s) = (1/2) Σ_{q=1}^{4} (∆_q[s]/n̄) x̂[q; s],   (3.15)

where ∆_q[s] = #{t : t ⪯_q s}, q = 1, ..., 4, and ∆₁[s] + ∆₃[s] = ∆₂[s] + ∆₄[s] = n̄, i.e. the total number of observations. The motivation to perform the “average” of the predictors obtained using the 4 different lexicographic orderings comes from the observation that the average makes use of all the available data to obtain the predictor for x(s). This would not be the case if we employed, say, only x̂[1; s]: then the values of x(t) such that s ≺₁ t would not have been used to predict x(s).

Before we give the statistical properties of our predictor for x(s), we introduce one extra condition.

Condition C4 As n[ℓ], m[ℓ] → ∞ for ℓ = 1, 2, with m[ℓ] diverging more slowly than n[ℓ].

Under Conditions C1-C4, the statistical properties of the estimates α̂_ℓ(q) are the same for any q = 1, ..., 4, so without loss of generality we state them for α_ℓ(1), A(1; λ), a_ℓ(1). From here the statistical properties of predictor x̂(s) in (3.15) will follow easily.
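To fix ideas, the passage from a spectral density estimate to cepstral coefficients as in (3.11) and then to the unilateral AR coefficients as in (3.10) can be sketched numerically. This is a simplified sketch, assuming NumPy and a density evaluated on the full grid of Fourier frequencies; the discrete transforms, the half-plane construction and all names are our own simplifications of the paper's estimators, not the paper's code.

```python
import numpy as np

def cepstral_ar_coefficients(f_hat):
    """From a spectral density estimate f_hat on the full 2-D grid of Fourier
    frequencies, compute the cepstral coefficients alpha_j (the Fourier
    coefficients of log f_hat), keep the lexicographic half-plane 0 < j, and
    return the Fourier coefficients a_j of
    A(1; lambda) = exp(-sum_{0<j} alpha_j e^{ij.lambda}),
    i.e. unilateral AR coefficients in the spirit of (3.2)."""
    c = np.fft.ifft2(np.log(f_hat)).real                       # 2-D cepstrum of log f_hat
    M1, M2 = c.shape
    j1 = np.fft.fftfreq(M1, d=1.0 / M1).astype(int)[:, None]   # signed lag indices
    j2 = np.fft.fftfreq(M2, d=1.0 / M2).astype(int)[None, :]
    half = (j1 > 0) | ((j1 == 0) & (j2 > 0))                   # lexicographic half-plane 0 < j
    S = np.fft.fft2(np.where(half, c, 0.0))                    # sum of half-plane cepstral terms
    return np.fft.ifft2(np.exp(-S))                            # Fourier coefficients of A = exp(-S)
```

By construction the zeroth coefficient is (approximately) one, matching the normalization a_0(1) = 1; in the paper, f_hat would be the averaged tapered periodogram (3.9) rather than an exact density.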
Theorem 1. Under C1-C4, for any finite integer J, we have that

(a) n̄^{1/2} ( α̂_j(1) − α̃_{j,n}(1) )_{j=1}^{J} →_d N(0, Ω_α),
(b) α̃_{j,n}(1) − α_{j,n}(1) = O( M̄^{−1} η_j + M̄/n̄ ),

where Ω_α is a diagonal matrix whose jth diagonal element depends on the fourth cumulant κ_ε through δ_j, with δ_j := 1 if j = 0 and := 0 otherwise, and {η_j}_j is a summable sequence.

Theorem 2. Under C1-C4, for any finite integer J, we have that

(a) m̄^{1/2} ( Â_j(1) − Ã_{j,n}(1) )_{j=1}^{J} →_d N_c(0, Ω_A),
(b) Ã_{j,n}(1) − A_{j,n}(1) = O( m̄^{−1/2} ),

where N_c(0, Ω_A) denotes a complex normal random variable and the (j₁, j₂)th element of Ω_A is

Ω_{A,j₁j₂} = (1/2) ( δ_{j₁[1]−j₂[1]} + (1/2) φ_{j₁[1]} φ_{j₂[1]} − i φ_{|j₁−j₂|[1]} ) A_{j₁}(1) A_{j₂}(1),

with φ_k := (1 − cos(kπ))/(kπ) if k ∈ N⁺ and := 1 if k = 0.

Theorem 3. Under C1-C4, for any finite integer J, we have that

n̄^{1/2} ( â_j(1) − a_{j,n}(1) )_{j=1}^{J} →_d N(0, Ω_a),

where the (j₁, j₂)th element of Ω_a is −Σ_{0≺k⪯j₁} a_{j₁−k}(1) a_{j₂−k}(1), with j₂ ⪯ j₁.

Once we have obtained the asymptotic properties of the estimators of a_j(1), j ⪯ M, we are in a position to examine the asymptotic properties of the predictor x̂(s). To that end, denote by {x*(t)}_{t∈Z²} a new independent replicate sequence with the same statistical properties as the original sequence {x(t)}_{t∈Z²}, not used in the estimation of the spectral density. Then let x̂*[q; s] be as in (3.14) but with x̂[q; s − j] being replaced by x*[q; s − j] there, that is,

x̂*[q; s] = Σ_{0≺_q j⪯M⁺} â_j(q) x*[q; s − j].

Theorem 4. Under C1-C4, for q = 1, ..., 4, we have that

lim E( x̂*[q; s] − x*(s) )² = σ²_ϑ.

We examine the finite-sample behaviour of our algorithm in a set of Monte Carlo simulations. As in Robinson and Vidal Sanz (2006) and Robinson (2007) we used the model

x(t) = ε(t) + τ Σ_{−1≤s≤1, s≠0} ε(t − s),   (4.1)

similar to one considered in Haining (1978). Then

f(λ) = (2π)^{−2} { 1 + τ ν(λ) }²,   (4.2)

with ν(λ) = Π_{j=1}^{2} (1 + 2 cos λ[j]) − 1. Robinson and Vidal Sanz (2006) show that a sufficient condition for invertibility of (4.1) is

|τ| < 1/8.   (4.3)

We first generated a 40 ×
41 lattice using (4.1), with τ = 0.075 and the ε(t) drawn independently from three different distributions: a symmetric uniform, N(0, 1) and χ²₉ − 9. The aim of this section is to examine the performance of both prediction algorithms in predicting the (20, 20)th value.

We set n[1] = n* + 1 and n[2] = 2n* + 1, for some positive integer n*, implying n̄ = (2n* + 1)(n* + 1), and generated iid ε(t) from each of the three distributions mentioned in the previous paragraph. In each of the 1000 replications we experimented with several values of τ and with n* = 5, 10, 20 and 40. The choices of τ satisfy (4.3). Note that we increase the bandwidths m[1], m[2] and p₁, p₂ as n* increases. We make the following choices:

m[1] = m[2] = 1; p* = p₁ = p₂ = 1, , when n* = 5,
m[1] = 1, m[2] = 1, p* = p₁ = p₂ = 1, , , when n* = 10,
m[1] = 1, , m[2] = 1, , , , p* = p₁ = p₂ = 1, , , when n* = 20,
m[1] = m[2] = 1, , , , p* = p₁ = p₂ = 1, , , , when n* = 30.

The flexible exponential approach requires a nonparametric estimate of f(λ). Two such estimates are available to use: the first one based on the tapered periodogram described in (3.9), which we denote f̂(λ), and the second based on the autoregressive approach in Gupta (2018). The latter also provides a rival prediction methodology based on a nonparametric algorithm using AR model fitting, extending well established results for d = 1, see Bhansali (1978) and Lewis and Reinsel (1985). The idea is first to obtain a least squares predictor based on a truncated autoregression of order p = (p₁L, p₁U; p₂L, p₂U), for non-negative integers pℓL, pℓU, ℓ = 1, 2, with the truncation allowed to diverge as N → ∞. That is, we approximate the infinite unilateral representation in (3.2) by one of increasing order.

In view of the half-plane representation we can a priori set, say, p₁L = 0 when considering the first ordering, with similar restrictions for the orderings q = 2, 3, 4. If we could observe the AR prediction coefficients a(ℓ), say, a prediction of x[s] based on the half-plane of the qth ordering could be constructed as

x̌[q; s] = Σ_{ℓ∈S[−pL,pU]} a(ℓ) x̌[s − ℓ],   (4.4)

where

S[−pL, pU] = { t ∈ L : −pℓL ≤ tℓ ≤ pℓU, ℓ = 1, 2 } ∩ { t : 0 ⪯_q t }.   (4.5)

This is the spatial version of one-step prediction and again we follow the convention that x̌[s] = x[s] if x[s] is observed. However (4.4) is not feasible and needs to be replaced by an approximate version, as described below.

Writing pℓ = pℓL + pℓU, we assume throughout that n[ℓ] > pℓ for ℓ = 1, 2, and denote n_p = Π_{ℓ=1}^{2} (n[ℓ] − pℓ) and by h(p) the cardinality of S[−pL, pU]. Suppose that the data are observed on {(t₁, t₂) : −n₁L ≤ t₁ ≤ n₁U, −n₂L ≤ t₂ ≤ n₂U}. Define a least squares predictor of order h(p) by

ď_p = argmin_{a(ℓ), ℓ∈S[−pL,pU]} (1/n_p) Σ″_{k(p,n)} ( x[k] − Σ_{ℓ∈S[−pL,pU]} a(ℓ) x[k − ℓ] )²,   (4.6)

where Σ″_{k(p,n)} runs over {(k₁, k₂) : p₁ − n₁L < k₁ ≤ n₁U + 1, p₂ − n₂L < k₂ ≤ n₂U + 1}. We denote the elements of ď_p by ď_p(ℓ), ℓ ∈ S[−pL, pU], and the minimum value by σ̌²_p. Feasible prediction based on a fitted autoregression of order p and the half-plane of the qth ordering is given by

x̌_p[q; s] = Σ_{ℓ∈S[−pL,pU]} ď_p(ℓ) x̌[q; s − ℓ],   (4.7)

and the predicted or interpolated value x̌_p[s] = x̌[s] (we suppress the p subscript for brevity) is given analogously to (3.15). The autoregressive nonparametric spectrum estimate is defined as

f̌(λ) = (σ̌²_p / (2π)²) | 1 − Σ_{ℓ∈S[−pL,pU]} ď_p(ℓ) e^{iℓ′λ} |^{−2}.   (4.8)

A predictor of x[s] based on (3.15) using f̂(λ) (respectively f̌(λ)) is denoted x̂[s] (respectively x̃[s]), while a predictor based on (4.7) is denoted x̌[s] as mentioned above.

Let x⃗_r[s] be a generic predictor of x[s] in replication r, r = 1, ..., 1000, and define the Monte Carlo root mean squared error

RMSE(x⃗[s]) = { (1/1000) Σ_{r=1}^{1000} ( x⃗_r[s] − x[s] )² }^{1/2}.   (4.9)

To fully utilize the information in the sample we report results when the prediction is performed using an average of the predictions as given by (3.15). This combination approach is particularly useful when, as in our case, the missing value lies in the middle of the lattice.
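For concreteness, simulating model (4.1) and fitting the half-plane least squares autoregression (4.6) behind the feasible prediction (4.7) can be sketched as follows. This is a minimal sketch assuming NumPy; the tiny lag set and all names are our illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_field(n1, n2, tau):
    """Simulate model (4.1): x(t) = eps(t) + tau * (sum of eps over the 8 nearest
    neighbours), with iid N(0, 1) errors; |tau| < 1/8 guarantees invertibility (4.3)."""
    pad = rng.standard_normal((n1 + 2, n2 + 2))
    nbr = sum(pad[1 + i:n1 + 1 + i, 1 + j:n2 + 1 + j]
              for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0))
    return pad[1:-1, 1:-1] + tau * nbr

def fit_half_plane_ar(x, lags):
    """Least squares fit, in the spirit of (4.6), of prediction coefficients a(l)
    on a fixed set of half-plane lags (each lag lexicographically succeeding 0)."""
    n1, n2 = x.shape
    m1 = max(l1 for l1, _ in lags)
    m2 = max(abs(l2) for _, l2 in lags)
    rows, y = [], []
    for t1 in range(m1, n1):
        for t2 in range(m2, n2 - m2):
            rows.append([x[t1 - l1, t2 - l2] for l1, l2 in lags])
            y.append(x[t1, t2])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
    return coef

def predict_site(x, s, lags, coef):
    """Feasible one-step half-plane prediction of x at site s, as in (4.7)."""
    return sum(c * x[s[0] - l1, s[1] - l2] for c, (l1, l2) in zip(coef, lags))
```

For instance, with lags [(1, -1), (1, 0), (1, 1), (0, 1)] fitted on a 41 × 41 field simulated with τ = 0.075, predict_site(x, (20, 20), ...) interpolates the middle cell from its already-observed half-plane neighbours; averaging such predictions over the four orderings mimics the combination in (3.15).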
If the missing values lie on the boundaries then the choice of half-plane to use may be obvious. This is the case in our empirical example. Because the practitioner has no way of knowing which half-plane to employ when the missing value is in the middle of the lattice, our practical guidance is to use the combined prediction obtained by averaging across the possible half-planes.

These results are reported in Tables 1-4. We observe an improvement in prediction performance as n* increases, and also as the bandwidths ((m[1], m[2]) and p*) increase as a function of n*. This is as expected from the theory. Nevertheless, even for rather small sample sizes the RMSE is acceptable. For example, with n* = 5 we can obtain predictions with RMSE that are not radically different from the n* = 10 case, even though this change in n* entails a sample that is nearly four times larger (231 against 66). In comparison, the RMSE with the smaller sample size can be quite close to those obtained with more data in some cases, cf. x̌(20, 20) for any error distribution.

For the smallest sample size x̌(20, 20) can outperform x̂(20, 20) and x̃(20, 20), but as n* increases the latter two clearly begin to dominate. An inspection of Tables 1-4 reveals that using x̃(20, 20)

Table 1: Monte Carlo RMSE of prediction from an average of predictions over the four half-plane orderings q = 1, 2, 3, 4, with n* = 5, model (4.1).

A 23 ×
14 grid of square cells is superimposed on Los Angeles, spanning roughly 33°-34° N and 117°-118° W. The grid covers a total of 5259 observations. The average of the median house values for each cell is calculated, and the 322 such observations form our sample. The gridding is shown in Figure 2, in which the 8 empty cells are filled and marked with a cross. We wish to predict the house price for these cells. House price data is not a zero mean process, so we subtract from each cell the sample mean computed over the whole sample.

We proceed in the following way: to obtain the coefficients â(ℓ) and ď(ℓ) in (3.15) and (4.7) we use the 19 × 14 sublattice formed of the first 19 columns of cells. This sublattice contains no missing observations. Once the coefficients are obtained we construct predictions using the remaining 4 × 14 cells.

[Table 2: Monte Carlo RMSE of prediction with n∗ = 10 from an average of predictions from q△, q = 1, 2, 3, 4.]

We first predict (20,8), followed by (21,8) and (22,8), using the half-plane q△. We then predict (21,4), followed by (22,7), (23,9), (23,6) and (23,1). The locations of the missing observations towards the corner suggest that using the half-planes q△ with q = 2, 3, 4 is redundant, if at all feasible.

The predicted values are tabulated for various values of (m[1], m[2]) and (p[1], p[2]) in Tables 5 and 6. When using (3.15) the predicted values are quite stable across the choices (m[1], m[2]) = (2, 2), (2,
3) using either the periodogram or AR spectral estimate. They most closely match those obtained when (p[1], p[2]) = (2, 2), (2, 3) in (4.7). In the latter case we compare the order selection criteria proposed by Gupta (2018), which include the usual FPE and BIC (denoted with a ˆ) as well as corrected versions that account for the spatial case (denoted with ˜ and ¯). The FPE tends to favour longer lag lengths no matter which version is used, as do ˆBIC and ¯BIC. However the latter, as well as ˜FPE, are not monotonically decreasing in lag length, unlike ˆBIC, ˆFPE and ¯FPE. Thus the latter three are likely to overfit and seem undesirable. If we impose a selection rule that picks the lag order at the first instance at which the selection criterion shows an increase with lag length, then we get (p[1], p[2]) = (2, 2) using ˜FPE and ¯BIC, while ˜BIC indicates a choice of (p[1], p[2]) = (1, 2). On balance, (p[1], p[2]) = (2, 2) is a reasonable choice.

[Table 3: Monte Carlo RMSE of prediction with n∗ = 20 from an average of predictions from q△, q = 1, 2, 3, 4, model (4.1).]

[Table 4: Monte Carlo RMSE of prediction with n∗ = 40 from an average of predictions from q△, q = 1, 2, 3, 4, model (4.1).]
In this paper we have dealt with the problem of prediction when the data are collected on a lattice. To do so we have considered the unilateral representation of the sequence, such a representation being possible as observed by Whittle (1954). Our approach does not need any parameterization of the model (the covariance structure of the data) because it is completely nonparametric, and we do not need to worry about the possible lack of simplicity of the unilateral representation as compared to the multilateral one.

[Figure 2: Gridded Los Angeles median house price data.]

[Table 5: Los Angeles house price predictions using (3.15), in '00,000 US Dollars, for the cells (20,8), (21,8), (22,8), (21,4), (22,7), (23,9), (23,6), (23,1).]

(p[1], p[2])   (20,8)   (21,8)   (22,8)   (21,4)   (22,7)   (23,9)   (23,6)   (23,1)
(1,1)   1.6525   1.2876   1.0325   2.4728   1.6663   2.1204   1.6825   1.7266
(1,2)   1.5373   1.2516   0.8473   2.3741   1.7944   1.8452   1.7626   1.5659
(2,1)   1.8138   1.6433   1.1237   2.4881   1.8545   2.3404   1.7547   2.3233
(2,2)   1.8703   1.9858   1.4089   2.6079   2.4100   2.4992   2.0862   2.2252
(3,2)   1.7925   1.9298   1.4400   2.4489   2.6472   2.5962   2.4021   1.9518
(4,3)   2.4465   2.2325   1.9075   2.0319   3.1207   3.5439   2.8234   2.0841

Table 6: Los Angeles house price predictions using (4.7), in '00,000 US Dollars

(p[1], p[2])   ˆBIC     ˜BIC     ¯BIC     ˆFPE     ˜FPE     ¯FPE
(1,1)   0.5979   0.6001   0.5990   0.5472   0.5639   0.5555
(1,2)   0.5973   0.5979   0.5976   0.5144   0.5183   0.5164
(2,1)   0.5953   0.6042   0.5997   0.4766   0.5377   0.5062
(2,2)   0.5946   0.6008   0.5977   0.4351   0.4728   0.4535
(3,2)   0.5943   0.6105   0.6024   0.4022   0.5018   0.4489
(4,3)   0.5824   0.6081   0.5953   0.2129   0.3058   0.2543

Table 7: Order selection criteria, BIC and FPE, for the Los Angeles house price predictions using (4.7)

Mathematical Appendix

A Proofs of theorems

For the sake of notational simplicity, we shall assume that M[1] = M[2] and also that n[1] = n[2], so that n = n[1] and m = m[1], say. Also to simplify the notation we shall write Σ_{j⪯J} instead of Σ⁺_{j⪯J}, and hence

Σ_{0⪯ℓ⪯M} c(ℓ[1], ℓ[2]) = Σ_{ℓ[2]=0}^{M[2]} c(0, ℓ[2]) + Σ_{ℓ[1]=1}^{M[1]} Σ_{ℓ[2]=−M[2]}^{M[2]} c(ℓ[1], ℓ[2]).

Finally, we shall write α̂_j(1) or α̃_{j,n}(1), say, as α̂_j or α̃_{j,n}.

A.1 Proof of Theorem 1
We shall examine part ( a ), since part ( b ) follows by Lemma 2 and standard arguments. By theCram´er-Wold device, it suffices to show that for a finite set of constants ϕ j , j = 1 , ..., J , n / J (cid:88) j =1 ϕ j ( (cid:98) α j − (cid:101) α j,n ) d → N , J (cid:88) j =1 ϕ j (1 + (1 + κ ) δ j ) . (A.1)First, by definitions of (cid:98) α j and (cid:101) α j,n , we have that (cid:98) α j − (cid:101) α j,n = 12 M (cid:88) (cid:22) (cid:96) (cid:22) M log (cid:32) (cid:98) f (cid:96) (cid:101) f (cid:96) (cid:33) cos (cid:16) j · (cid:101) λ (cid:96) (cid:17) . (A.2)Using the inequality sup (cid:96) =1 ,...,p (cid:37) (cid:96) ≤ (cid:80) p(cid:96) =1 (cid:37) (cid:96) , and then Lemma 3,sup (cid:22) (cid:96) (cid:22) M (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:88) (cid:22) (cid:96) (cid:22) M (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O p (cid:18) Mm (cid:19) .So, by Taylor expansion of log ( z ) around z = 1, ( A.2) is then12 M (cid:88) (cid:22) (cid:96) (cid:22) M (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) cos (cid:16) j · (cid:101) λ (cid:96) (cid:17) − M (cid:88) (cid:22) (cid:96) (cid:22) M (cid:32) (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) (cid:33) cos (cid:16) j · (cid:101) λ (cid:96) (cid:17) (1 + o p (1))= 12 M (cid:88) (cid:22) (cid:96) (cid:22) M (cid:98) f (cid:96) − (cid:101) f (cid:96) f (cid:96) cos (cid:16) j · (cid:101) λ (cid:96) (cid:17) + 14 M (cid:88) (cid:22) (cid:96) (cid:22) M (cid:32) (cid:98) f (cid:96) − (cid:101) f (cid:96) f (cid:96) (cid:33) (cid:32) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) (cid:33) cos (cid:16) j · (cid:101) λ (cid:96) (cid:17) + o p (cid:16) n − / (cid:17) , 22ecause by Lemma 3, the second term on the left of the last displayed equality is O p 
(cid:0) m − (cid:1) = o p (cid:0) n − / (cid:1) by Condition C
4. Now, the absolute value of the second term on the right of the lastdisplayed expression is bounded by12 M (cid:88) (cid:22) (cid:96) (cid:22) M (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) f (cid:96) − (cid:101) f (cid:96) f (cid:96) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O (cid:18) Mm / (cid:19) = o (cid:16) n − / (cid:17) by Lemmas 2 and 3 and then Condition C
4. So, we conclude that n / ( (cid:98) α j − (cid:101) α j,n ) = n / M (cid:88) (cid:22) (cid:96) (cid:22) M (cid:98) f (cid:96) − (cid:101) f (cid:96) f (cid:96) cos (cid:16) j · (cid:101) λ (cid:96) (cid:17) + o p (1)= 12 n / (cid:88) (cid:22) (cid:96) (cid:22) n I Tx ( λ (cid:96) ) − f ( λ (cid:96) ) f ( λ (cid:96) ) h j,n ( (cid:96) ) + o p (1) ,where h j,n ( (cid:96) ) = f − p f ( λ (cid:96) ) cos (cid:16) j · (cid:101) λ (cid:96) (cid:17) if (2 p [ q ] − m [ q ] < (cid:96) [ q ] < (2 p [ q ] + 1) m [ q ] and p [ q ] =1 , ..., M − q = 1 ,
2. That is, h j,n ( (cid:96) ) is a step function in (cid:101) Π with jumps equal to cos (cid:16) j · (cid:101) λ (cid:96) (cid:17) .Now, using Lemma 4 of Hidalgo (2009), we have that (cid:88) (cid:22) (cid:96) (cid:22) n (cid:32) I Tx ( λ (cid:96) ) f ( λ (cid:96) ) − (2 π ) d I Tε ( λ (cid:96) ) σ ε (cid:33) h j,n ( (cid:96) ) = o p (cid:16) n / (cid:17) .So, we conclude that the left side of ( A.
1) is n / J (cid:88) j =1 ϕ j ( (cid:98) α j − (cid:101) α j,n ) = J (cid:88) j =1 ϕ j n / (cid:88) (cid:22) (cid:96) (cid:22) n (cid:32) (2 π ) d I Tε ( λ (cid:96) ) σ ε − (cid:33) h j,n ( (cid:96) ) + o p (1) .From here the conclusion is standard and so it is omitted. (cid:4) A.2 Proof of Corollary 1
By delta methods and Theorem 1 part ( a ), we obtain that n / (cid:0)(cid:98) σ ϑ − (cid:101) σ ϑ,n (cid:1) = n / ( (cid:98) α − (cid:101) α ,n ) (2 π ) e α ,n − (cid:101) α ,n + o p (1) d → N (0 , κ ) .The proof of the corollary concludes by observing that by definition σ ε =: 2 πe α and (cid:101) α ,n − α = ( (cid:101) α ,n − α ,n ) + ( α ,n − α )23nd hence Theorem 1 part ( b ) and that α ,n − α =: 12 M (cid:88) (cid:22) (cid:96) (cid:22) M log f (cid:96) − π (cid:90) (cid:101) Π log f ( λ ) dλ → f ( λ ) is a continuous differentiable function. (cid:4) A.3 Proof of Theorem 2
Define (cid:98) d j = log (cid:98) A j , (cid:101) d j,n = log (cid:101) A j,n and d j,n = log A j,n . We begin with part ( b ). By definition of (cid:101) A j,n and A j,n , we have that (cid:101) d j,n − d j,n is (cid:88) (cid:22) k (cid:22) M ( (cid:101) α k,n − α k,n ) e − ik · (cid:101) λ j .By Taylor expansion of log (cid:16) (cid:101) f (cid:96) /f (cid:96) (cid:17) , the last displayed expression is (cid:88) (cid:22) k (cid:22) M M (cid:88) (cid:52) (cid:96) (cid:22) M (cid:32) (cid:101) f (cid:96) − f (cid:96) f x,(cid:96) (cid:33) + (cid:32) (cid:101) f (cid:96) − f (cid:96) f (cid:96) (cid:33) (1 + o (1)) cos (cid:16) k · (cid:101) λ (cid:96) (cid:17) e − ik · (cid:101) λ j = (cid:88) (cid:22) k (cid:22) M M (cid:88) (cid:52) (cid:96) (cid:22) M (cid:32) (cid:101) f (cid:96) − f (cid:96) f (cid:96) (cid:33) cos (cid:16) k · (cid:101) λ (cid:96) (cid:17) e − ik · (cid:101) λ j + O (cid:0) M − η k + M − (cid:1) = (cid:88) (cid:22) k (cid:22) M M (cid:88) (cid:52) (cid:96) (cid:22) M (cid:32) (cid:101) f (cid:96) − f (cid:96) f (cid:96) (cid:33) (cid:16) e ik · (cid:101) λ (cid:96) − j + e ik · (cid:101) λ − (cid:96) − j (cid:17) + O (cid:0) M − (cid:1) = O (cid:0) M − (cid:1) ,where the first equality we have used Lemma 1 and in the second equality Lemmas 2 and 5. Observethat | η k | is a summable sequence. From here we conclude the proof of part ( b ) by Condition C a ). 
By Cram´er-Wold device it suffices to examine, for any set of finiteconstants ϕ j , m / q (cid:88) j = p ϕ j (cid:16) (cid:98) A j − A j,n (cid:17) .By definitions of (cid:98) A j,n and (cid:101) A j,n , we examine the the behaviour oflog (cid:98) A j − log (cid:101) A j,n =: (cid:98) d j − (cid:101) d j,n = − (cid:88) (cid:22) k (cid:22) M ( (cid:98) α k − (cid:101) α k,n ) e − ik · (cid:101) λ j .By definition of (cid:98) α k − (cid:101) α k,n and its proceeding as in the proof of Theorem 1, we have that this term24s − (cid:88) (cid:52) k (cid:22) M M (cid:88) (cid:52) (cid:96) (cid:22) M (cid:32) (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) (cid:33) cos (cid:16) k · (cid:101) λ (cid:96) (cid:17) e − ik · (cid:101) λ j + o p (cid:16) m − / (cid:17) = (cid:88) (cid:52) k (cid:22) M n (cid:88) (cid:52) (cid:96) (cid:52) n ρ (cid:96) h k,n ( (cid:96) ) e − ik · (cid:101) λ j + o p (cid:16) m − / (cid:17) by Condition C.
4, where ρ (cid:96) = (2 π ) σ − ε I Tε ( λ (cid:96) ) and h k,n ( (cid:96) ) was defined in Theorem 1. So, m / (cid:16) (cid:98) d j − (cid:101) d j,n (cid:17) = − m / (cid:88) (cid:52) k (cid:22) M n (cid:88) (cid:52) (cid:96) (cid:22) n ρ (cid:96) h k,n ( (cid:96) ) e − ik · (cid:101) λ j + o p (1)= − n / (cid:88) (cid:52) (cid:96) (cid:22) n ρ (cid:96) ψ (cid:96),M ( n ) ( j ) + o p (1) ,where ψ (cid:96),M ( n ) ( j ) = (2 M ) − / (cid:80) (cid:52) k (cid:52) M h k,n ( (cid:96) ) e − ik · (cid:101) λ j and because n [ q ] = 4 M [ q ] m [ q ] for q = 1 , m / J (cid:88) j =1 ϕ j (cid:16) (cid:98) d j − (cid:101) d j,n (cid:17) = 12 n / (cid:88) (cid:52) (cid:96) (cid:22) n ρ (cid:96) J (cid:88) j =1 ϕ j ψ (cid:96),M ( n ) ( j ) + o p (1) (A.3)and proceeding as in the proof of Theorem 1, we have that it converges to a complex normal randomvariable with variance V = lim n →∞ J (cid:88) j ,j =1 ϕ j ϕ j n (cid:88) (cid:52) (cid:96) (cid:22) n ψ (cid:96),M ( n ) ( j ) ψ (cid:96),M ( n ) ( j )= lim n →∞ J (cid:88) j ,j =1 ϕ j ϕ j M (cid:88) (cid:52) (cid:96) (cid:22) M (cid:88) (cid:52) k ,k (cid:52) M e − k (cid:101) λ j + k (cid:101) λ j cos (cid:16) k (cid:101) λ (cid:96) (cid:17) cos (cid:16) k (cid:101) λ (cid:96) (cid:17) = 2 − J (cid:88) j ,j =1 ϕ j ϕ j (cid:16) δ j [1] − j [2] + 2 − φ j [1] φ j [1] − iφ | j − j | [1] (cid:17) by Lemma 4. Hence, we conclude that m / q (cid:88) j = p ϕ j (cid:16) (cid:98) d j − d j,n (cid:17) d → N (0 , V ) .From here the conclusion of the theorem follows by standard delta arguments. (cid:4) .4 Proof of Theorem 3 The proof is completed if( a ) n / q (cid:88) υ = p ϕ υ ( (cid:98) a υ − (cid:101) a υ,n ) d → N (cid:32) , q (cid:88) υ ,υ = p ϕ υ ϕ υ Ω a,υ ,υ (cid:33) , (A.4)( b ) n / q (cid:88) υ = p ϕ υ ( (cid:101) a υ,n − a υ,n ) P → a ). By definition of (cid:98) a υ − (cid:101) a υ,n and the Taylor expansion of (cid:98) A j − (cid:101) A j,n , a typicalcomponent on the left of ( A.
4) is n / M (cid:88) − M ≤ j ≤ M (cid:16) (cid:98) d j − (cid:101) d j,n (cid:17) (cid:101) A j,n e iυ · (cid:101) λ j + n / M (cid:88) − M ≤ j ≤ M (cid:12)(cid:12)(cid:12) (cid:98) d j − (cid:101) d j,n (cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12) (cid:101) A j,n (cid:12)(cid:12)(cid:12) (1 + o p (1)) . (A.5)First, we show that the second term of ( A.
5) is o p (1). By trivial inequalities and ( A. m (cid:12)(cid:12)(cid:12) (cid:98) d j − (cid:101) d j,n (cid:12)(cid:12)(cid:12) = C n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) (cid:52) (cid:96) (cid:22) n ρ (cid:96) ψ (cid:96),M ( n ) ( j ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + o p (1) .The contribution into the second term of ( A.
5) is o_p(1) by Theorem A of Serfling (1980), p. 14, because by Theorem 2 and the continuous mapping theorem it converges in distribution to a χ² distribution and x(t) is uniformly integrable. So, from the definition of d̂_j − d̃_{j,n}, we have that (A.
5) is − n / M (cid:88) − M ≤ j ≤ M (cid:88) (cid:52) k (cid:22) M ( (cid:98) α k − (cid:101) α k,n ) e − ik · λ j (cid:101) A j,n e − ik · (cid:101) λ j + O p (cid:18) n m (cid:19) (A.6)= − n / M (cid:88) − M ≤ j ≤ M (cid:88) (cid:52) k,(cid:96) (cid:22) M (cid:32) (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) (cid:33) cos (cid:16) k · (cid:101) λ (cid:96) (cid:17) e − ik · (cid:101) λ j (cid:101) A j,n e − iυ · (cid:101) λ j + o p (1)= − n / M (cid:88) (cid:52) (cid:96) (cid:22) M (cid:32) (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) (cid:33) (cid:88) (cid:52) k (cid:22) M cos (cid:16) k · (cid:101) λ (cid:96) (cid:17) (cid:88) − M ≤ j ≤ M A j e − i ( k − υ ) · (cid:101) λ j + o p (1) ,where for the first equality we have used Taylor expansion and that by Proposition 3 of Hi-dalgo and Yajima (2002), sup (cid:52) (cid:96) (cid:52) M (cid:101) f − (cid:96) (cid:12)(cid:12)(cid:12) (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:12)(cid:12)(cid:12) = o p (1) and then that (cid:12)(cid:12)(cid:12)(cid:80) (cid:52) (cid:96) (cid:22) M cos (cid:16) k · (cid:101) λ (cid:96) (cid:17)(cid:12)(cid:12)(cid:12) = O ( M /(cid:96) [1] (cid:96) [2]) and (cid:12)(cid:12)(cid:12)(cid:80) − M ≤ j ≤ M (cid:16) (cid:101) A j,n − A j (cid:17) e − i ( u − υ ) · (cid:101) λ j (cid:12)(cid:12)(cid:12) = O (1) proceeding as in the proof ofLemma 6 and Lemma 3.So, proceeding as with the proof of Theorem 3 in Hidalgo and Yajima (2002), cf. 
their expres-26ions (52) and (53), we can conclude that n / q (cid:88) υ = p ϕ υ ( (cid:98) a υ − a υ,n ) = n / q (cid:88) υ = p ϕ υ (cid:0)(cid:98) a υ − (cid:101) a υ,n (cid:1) + o p (1) , (A.7)where (cid:98) a υ − (cid:101) a υ,n = − n / M (cid:88) (cid:52) (cid:96) (cid:22) M (cid:32) (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:101) f (cid:96) (cid:33) (cid:88) (cid:52) k (cid:22) M cos (cid:16) k · (cid:101) λ (cid:96) (cid:17) a ∗ k − υ ,where (4 M ) − (cid:80) − M ≤ j ≤ M A j e − ir · (cid:101) λ j = a ∗ r , 0 (cid:52) r (cid:52) M . Now Theorems 1 and 2 yield that n / q (cid:88) υ = p ϕ υ (cid:0)(cid:98) a υ − (cid:101) a υ,n (cid:1) d → N (0 , V ) , V = lim M →∞ M (cid:88) (cid:52) (cid:96) (cid:22) M q (cid:88) υ = p ϕ υ (cid:88) (cid:52) k (cid:22) M cos ( k · λ (cid:96) ) a ∗ k − υ .Now a typical component of V is ϕ υ ϕ υ timeslim M →∞ (cid:88) (cid:52) k ,k (cid:52) M a ∗ k − υ a ∗ k − υ M (cid:88) (cid:52) (cid:96) (cid:22) M cos (cid:16) k · (cid:101) λ (cid:96) (cid:17) cos (cid:16) k · (cid:101) λ (cid:96) (cid:17) = lim M →∞ (cid:88) (cid:52) k ,k (cid:22) M a ∗ k − υ a ∗ k − υ M (cid:88) (cid:52) (cid:96) (cid:22) M (cid:16) cos (cid:16) ( k + k ) · (cid:101) λ (cid:96) (cid:17) + cos (cid:16) ( k − k ) · (cid:101) λ (cid:96) (cid:17)(cid:17) = lim M →∞ (cid:88) (cid:52) k (cid:22) υ a ∗ k − υ a ∗ k − υ − M (cid:88) (cid:52) k (cid:54) = k (cid:22) M ; k ± k [1]= odd a ∗ k − υ a ∗ k − υ using expression (65) in Hidalgo and Yajima (2002) and where we have assumed that υ (cid:52) υ with-out loss of generality. From here the conclusion of part ( a ) follows after observing that a k = O (cid:0) k − (cid:1) and that a ∗ k − υ = a k − υ + O (cid:0) M − (cid:1) by Brillinger (1981), p. 15, because A ( λ ) is a continuously dif-ferentiable function. Now part ( b ) follows by identical arguments to those in ( A.
6) and Lemma 2, so that it is omitted. □
A.5 Proof of Theorem 4 ( a ) By definition, (cid:98) x ∗ [1; s ] − x ∗ ( s ) is (cid:98) x ∗ [ q ; s ] = (cid:88) j (cid:52) M + (cid:98) a j ( q ) x ∗ [ q ; s − j ] . ϑ ( s ) − (cid:88) ≺ k/ (cid:52) k (cid:52) M a k x ∗ ( s − k ) + (cid:88) ≺ k (cid:52) M ( (cid:98) a k − a k,n ) x ∗ ( s − k ) + (cid:88) ≺ k (cid:52) M ( a k,n − a k ) x ∗ ( s − k ) . (A.8)27he second moment of the second term of ( A.
8) is σ x (cid:88) ≺ k/ (cid:52) k (cid:52) M a k + 2 (cid:88) ≺ k,j/ (cid:52) k,j (cid:52) M a k a j γ ( | k − j | ) = O (cid:0) M − (cid:1) because a k = O (cid:0) k − (cid:1) and γ ( k ) ∼ Kk − by Condition C
1. Next, denoting ϱ_k = â_k − ã_{k,n}, we have that the third term of (A.
8) is (cid:88) (cid:52) k (cid:52) M (cid:37) k x ∗ ( s − k ) + ( (cid:98) a k − a k,n − (cid:37) k ) x ∗ ( s − k ) .Now using expression ( A. k | (cid:98) a k − a k,n − (cid:37) k | = O p (cid:0) m − M (cid:1) because | (cid:98) a k − a k,n − (cid:37) k | = O p (cid:0) m − (cid:1) , so that the second term of the last displayed expression is O p (cid:0) m − M (cid:1) = o p (1) by Condition C
4. On the other hand, the second moment of the first term is n − σ x (cid:88) (cid:52) k (cid:52) M E (cid:16) n / (cid:37) k (cid:17) + 2 (cid:88) (cid:52) k To simplify the notation, in what follows (cid:80) − m ≤ j ≤ m will be denoted as (cid:80) j . Also, recall that wehave assumed that C − < m [1] /m [2] < C for some finite positive constant C . Lemma 1. Under Conditions C − C we have that (cid:98) f x,(cid:96) = 14 m (cid:88) j f x (cid:16) λ j + (cid:101) λ (cid:96) (cid:17) I Tε (cid:16) λ j + (cid:101) λ (cid:96) (cid:17) + (cid:15) (cid:96) ,uniformly in j , where E sup (cid:96) | (cid:15) (cid:96) | = o (cid:0) m − (cid:1) .Proof. The proof follows easily from Lemma 4 of Hidalgo (2009), and so it is omitted.Denote (cid:101) f x,(cid:96) = 14 m (cid:88) j f x (cid:16) λ j + (cid:101) λ (cid:96) (cid:17) . Lemma 2. Assuming C − C , ( a ) (cid:101) α k,n − α k,n = M − g k + O (cid:0) M − (cid:1) and ( b ) α k,n − α k = O (cid:0) M − (cid:1) . roof. We begin with part ( a ). By definition of (cid:101) f (cid:96) and then Taylor’s expansion of log ( · ), we havethat (cid:101) α k,n − α k,n = 12 M (cid:88) (cid:52) (cid:96) (cid:22) M (cid:32) (cid:101) f (cid:96) − f (cid:96) f (cid:96) (cid:33) + (cid:32) (cid:101) f (cid:96) − f (cid:96) f (cid:96) (cid:33) (1 + o (1)) cos (cid:16) k · (cid:101) λ (cid:96) (cid:17) .So, to complete the proof we need to examine the behaviour of f − (cid:96) (cid:16) (cid:101) f (cid:96) − f (cid:96) (cid:17) . 
Now by definition, (cid:101) f (cid:96) − f (cid:96) f (cid:96) = f − (cid:96) m (cid:88) j ( f ( λ j +2 m(cid:96) ) − f ( λ m(cid:96) ))= f − (cid:96) m (cid:88) j (cid:18) j [1] n [1] f ( λ m(cid:96) ) + j [2] n [2] f ( λ m(cid:96) ) (cid:19) + O (cid:0) M − (cid:1) = M − g ( λ m(cid:96) ) + O (cid:0) M − (cid:1) ,where g ( λ ) is a twice differentiable function and f ( λ ) = ∂ ∂λ [1] f ( λ ) ; f ( λ ) = ∂ ∂λ [2] f ( λ )From here the conclusion is standard after we identify g k as the Fourier coefficients of g ( λ ).Part ( b ) follows using Brillinger (1981), p.15, since by assumption f x ( λ ) is twice continuouslydifferentiable. Lemma 3. Assuming, C − C , for all (cid:96) = 1 , , ...E (cid:16) (cid:101) f − (cid:96) (cid:16) (cid:98) f (cid:96) − (cid:101) f (cid:96) (cid:17)(cid:17) = O (cid:0) m − (cid:1) .Proof. Because (cid:101) f x,(cid:96) = (4 m ) − (cid:80) j f ( λ j +2 m(cid:96) ) > 0, the left side of the last displayed equality, exceptmultiplicative constants, is bonded by2 E m (cid:88) j f ( λ j +2 m(cid:96) ) (cid:18) I Tx ( λ j +2 m(cid:96) ) f ( λ j +2 m(cid:96) ) − I Tε ( λ j +2 m(cid:96) ) (cid:19) +2 E m (cid:88) j f ( λ j +2 m(cid:96) ) (cid:0) I Tε ( λ j +2 m(cid:96) ) − (cid:1) .The first term of the last displayed expression is O (cid:0) m − (cid:1) by Lemma 1, whereas the second termfollows by standard arguments as ε t is an iid sequence with finite fourth moments.29 emma 4. M (cid:88) (cid:52) p (cid:52) M (cid:88) (cid:52) k (cid:52) M cos (cid:16) k · (cid:101) λ p (cid:17) e − ik · (cid:101) λ j (cid:88) (cid:52) k (cid:52) M cos (cid:16) − k · (cid:101) λ p (cid:17) e ik · (cid:101) λ j = 2 − (cid:16) δ j [1] − j [1] + 2 − φ j [1] φ j [1] − iφ ( j − j )[1] (cid:17) + O (cid:0) M − (cid:1) .Proof. 
It follows by obvious extension of Lemma 4 of Hidalgo and Yajima (2002), after we observethat the left side of the last displayed expression is12 M (cid:88) k ,k (cid:52) M e − i ( k · (cid:101) λ j − k · (cid:101) λ j ) (cid:88) p (cid:52) M cos (cid:16) k · (cid:101) λ p (cid:17) cos (cid:16) − k · (cid:101) λ p (cid:17) = 12 M (cid:88) k ,k (cid:52) M e − i ( k · (cid:101) λ j − k · (cid:101) λ j ) (cid:88) p (cid:52) M (cid:110) cos (cid:16) ( k + k ) · (cid:101) λ p (cid:17) + cos (cid:16) ( k − k ) · (cid:101) λ p (cid:17)(cid:111) and then that when p [2] (cid:54) = 0 (cid:88) k (cid:52) M e − ik · (cid:101) λ p = 0whereas 1 M [2] (cid:88) k (cid:52) M e − ik · (cid:101) λ p =: M [1] (cid:88) k [1]=0 e − ik [1] · (cid:101) λ p [1] = (cid:40) M [1] if p [1] = 0 ip [1] (1 − cos ( p [1] π )) if p [1] (cid:54) = 0when p [2] = 0. Lemma 5. Under Condition C , we have that (cid:88) M (cid:52) s α s e − is · (cid:101) λ j = O (cid:0) M − (cid:1) .Proof. The proof is standard because four times continuous differentiability of f ( λ ) implies that σ s = O (cid:0) s − (cid:1) . Lemma 6. Under Condition C , a k,n − a k = O (cid:0) M − (cid:1) .Proof. a k,n − a k is 14 M (cid:88) − M ≤ j ≤ M (cid:0) A j,n − A ∗ j (cid:1) e ik · (cid:101) λ j + 14 M (cid:88) − M ≤ j ≤ M (cid:0) A ∗ j − A j (cid:1) e ik · (cid:101) λ j + 14 M (cid:88) − M ≤ j ≤ M A j e ik · (cid:101) λ j − π (cid:90) Π A ( λ ) e ik · λ dλ , (B.1)where A ∗ j = exp (cid:16) − (cid:80) (cid:52) s (cid:52) M α s e − is · (cid:101) λ j (cid:17) . The third term of ( B. 1) is O (cid:0) M − (cid:1) by Brillinger (1981),p. 15, because A ( λ ) cos ( k · λ ) and A ( λ ) sin ( k · λ ) are continuous differentiable. Now, the second30erm of ( B. 
1) is bounded in absolute value by (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M (cid:88) − M ≤ j ≤ M A j exp (cid:88) s (cid:52) / (cid:52) s (cid:52) M α s e − is · (cid:101) λ j − e ik · (cid:101) λ j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C M (cid:88) − M ≤ j ≤ M | A j | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) s (cid:52) / (cid:52) s (cid:52) M α s e − is · (cid:101) λ j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (1 + O (1)) = O (cid:0) M − (cid:1) ,by Lemma 5 and that (cid:80) − M ≤ j ≤ M | A ( j ) | = O ( M ). Finally, by definition of A j,n and A ∗ ( j ), the firstterm of ( B. 1) is 12 M (cid:88) − M ≤ j ≤ M exp (cid:88) (cid:52) s (cid:52) M ( α s,n − α s ) e − is · (cid:101) λ j − A ∗ ( j ) e ik · (cid:101) λ j .Now, from the definition of α s,n and α s , (cid:80) (cid:52) s (cid:52) M ( α s,n − α s ) e − is · (cid:101) λ j is (cid:88) (cid:52) s (cid:52) M M (cid:88) (cid:52) (cid:96) (cid:52) M log ( f (cid:96) ) cos (cid:16) s · (cid:101) λ (cid:96) (cid:17) − π (cid:90) (cid:101) Π log ( f ( λ )) cos ( s · λ ) dλ e − is · (cid:101) λ j = 12 M (cid:88) (cid:52) s (cid:52) M (log f (0) + cos ( s · π ) f ( π )) e − is · (cid:101) λ j + O (cid:0) M − (cid:1) = M − (cid:16) D (cid:16) − (cid:101) λ j (cid:17) log f (0) + 2 − (cid:16) D (cid:16)(cid:101) λ j + π (cid:17) + D (cid:16)(cid:101) λ j − π (cid:17)(cid:17) log f ( π ) (cid:17) + O (cid:0) M − (cid:1) ,where D ( λ ) = (cid:80) s ≺ M − e is · λ , by Brillinger (1981), p.15, because f ( · ) is twice continuous differen-tiable. From here it is standard to conclude that the first term of ( B. 1) is O (cid:0) M − (cid:1) . References Baltagi, B. H., H. H. Kelejian, and I. R. Prucha (2007). Analysis of spatially dependent data. Journal of Econometrics: Annals Issue .Banerjee, S., A. E. Gelfand, J. R. Knight, and C. F. Sirmans (2004). Spatial modeling of houseprices using normalized distance-weighted sums of stationary processes. 
Journal of Business & Economic Statistics 22, 206–213.

Batchelor, L. and H. Reed (1918). Relation of the variability of yields of fruit trees to the accuracy of field trials. Journal of Agricultural Research XII, 245–283.

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society Series B 36, 192–236.

Bester, C. A., T. G. Conley, and C. B. Hansen (2011). Inference with dependent data using cluster covariance estimators. Journal of Econometrics 165, 137–151.

Bester, C. A., T. G. Conley, C. B. Hansen, and T. J. Vogelsang (2016). Fixed-b asymptotics for spatially dependent robust nonparametric covariance matrix estimators. Econometric Theory 32, 154–186.

Bhansali, R. J. (1974). Asymptotic properties of the Wiener-Kolmogorov predictor. I. Journal of the Royal Statistical Society Series B 36, 61–73.

Bhansali, R. J. (1977). Asymptotic properties of the Wiener-Kolmogorov predictor. II. Journal of the Royal Statistical Society Series B 41, 375–387.

Bhansali, R. J. (1978). Linear prediction by autoregressive model fitting in the time domain. Annals of Statistics 60, 224–231.

Bloomfield, P. (1973). An exponential model for the spectrum of a scalar time series. Biometrika 60, 217–226.

Breidt, F. J., R. A. Davis, and A. A. Trindade (2001). Least absolute deviation estimation for all-pass time series models. Annals of Statistics 29, 919–946.

Brillinger, D. R. (1981). Time Series: Data Analysis and Theory. Holden Day, San Francisco.

Brockwell, P. J. and R. Davis (2006). Time Series: Theory and Methods. Springer.

Cavaliere, G., H. B. Nielsen, and A. Rahbek (2020). Bootstrapping noncausal autoregressions: with applications to explosive bubble modeling. Journal of Business & Economic Statistics 38, 55–67.

Conley, T. G. (1999). GMM estimation with cross sectional dependence. Journal of Econometrics 92, 1–45.

Conley, T. G. and F. Molinari (2007). Spatial correlation robust inference with imperfect distance information. Journal of Econometrics 140, 76–96.

Cressie, N. and H. Huang (1999). Classes of nonseparable, spatio-temporal stationary covariance functions. Journal of the American Statistical Association 94, 1330–1340.

Cressie, N. A. (1993). Statistics for Spatial Data. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons.

Dahlhaus, R. and H. Künsch (1987). Edge effects and efficient parameter estimation for stationary random fields. Biometrika 74, 877–882.

Davis, R. A., C. Klüppelberg, and C. Steinkohl (2013). Statistical inference for max-stable processes in space and time. Journal of the Royal Statistical Society Series B 75, 791–819.

Fernandez-Casal, R., W. Gonzalez-Manteiga, and M. Febrero-Bande (2003). Flexible spatio-temporal stationary variogram models. Statistics and Computing 13, 127–136.

Gao, J., Z. Lu, and D. Tjøstheim (2006). Estimation in semiparametric spatial regression. The Annals of Statistics 34, 1395–1435.

Genton, M. G. and H. L. Koul (2008). Minimum distance inference in unilateral autoregressive lattice processes. Statistica Sinica 18, 617–631.

Gupta, A. (2018). Autoregressive spatial spectral estimates. Journal of Econometrics 203, 80–95.

Guyon, X. (1982). Parameter estimation for a stationary process on a d-dimensional lattice. Biometrika 69, 95–105.

Haining, R. P. (1978). The moving average model for spatial interaction. Transactions of the Institute of British Geographers 3, 202–225.

Hannan, E. J. (1970). Multiple Time Series. John Wiley & Sons.

Helson, H. and D. Lowdenslager (1958). Prediction theory and Fourier series in several variables. Acta Mathematica 99, 165–202.

Helson, H. and D. Lowdenslager (1961). Prediction theory and Fourier series in several variables. II. Acta Mathematica 106, 175–213.

Hidalgo, J. (2009). Goodness of fit for lattice processes. Journal of Econometrics 151, 113–128.

Hidalgo, J. and Y. Yajima (2002). Prediction and signal extraction of strongly dependent processes in the frequency domain. Econometric Theory 18, 584–624.

Iversen Jr, E. (2001). Spatially disaggregated real estate indices. Journal of Business & Economic Statistics 19, 341–357.

Jenish, N. (2016). Spatial semiparametric model with endogenous regressors. Econometric Theory 32, 714–739.

Jenish, N. and I. R. Prucha (2012). On spatial processes and asymptotic inference under near-epoch dependence. Journal of Econometrics 170, 178–190.

Lanne, M. and P. Saikkonen (2013). Noncausal vector autoregression. Econometric Theory 29, 447–481.

Lanne, M. and P. Saikonnen (2011). Noncausal autoregressions for economic time series. Journal of Time Series Econometrics 3, 1–30.

Lewis, R. and G. C. Reinsel (1985). Prediction of multivariate time series by autoregressive model fitting. Journal of Multivariate Analysis 16, 393–411.

Majumdar, A., H. J. Munneke, A. E. Gelfand, S. Banerjee, and C. F. Sirmans (2006). Gradients in spatial response surfaces with application to urban land values. Journal of Business & Economic Statistics 24, 77–90.

McElroy, T. S. and S. H. Holan (2014). Asymptotic theory of cepstral random fields. The Annals of Statistics 42, 64–86.

Mercer, W. B. and A. D. Hall (1911). The experimental errors of field trials. Journal of Agricultural Science IV, 107–132.

Mitchell, M. W., M. G. Genton, and M. L. Gumpertz (2005). Testing for separability of space-time covariances. Environmetrics 16, 819–831.

Nychka, D., S. Bandyopadhyay, D. Hammerling, F. Lindgren, and S. Sain (2015). A multiresolution Gaussian process model for the analysis of large spatial datasets. Journal of Computational and Graphical Statistics 24, 579–599.

Pace, R. K. and R. Barry (1997). Sparse spatial autoregressions. Statistics & Probability Letters 33, 291–297.

Robinson, P. M. (2007). Nonparametric spectrum estimation for spatial data. Journal of Statistical Planning and Inference 137, 1024–1034.

Robinson, P. M. (2011). Asymptotic theory for nonparametric regression with spatial data. Journal of Econometrics 165, 5–19.

Robinson, P. M. and J. Vidal Sanz (2006). Modified Whittle estimation of multilateral models on a lattice. Journal of Multivariate Analysis 97, 1090–1120.

Roknossadati, S. M. and M. Zarepour (2010). M-estimation for a spatial unilateral autoregressive model with infinite variance innovations. Econometric Theory 26, 1663–1682.

Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. John Wiley & Sons.

Solo, V. (1986). Modeling of two-dimensional random fields by parametric Cepstrum. IEEE Transactions on Information Theory 42, 743–750.

Tjøstheim, D. (1978). Statistical spatial series modelling. Advances in Applied Probability 10, 130–154.

Tjøstheim, D. (1983). Statistical spatial series modelling II: Some further results on unilateral lattice processes. Advances in Applied Probability 15, 562–584.

Wang, H., E. M. Iglesias, and J. M. Wooldridge (2013). Partial maximum likelihood estimation of spatial probit models. Journal of Econometrics 172, 77–89.

Whittle, P. (1954). On stationary processes in the plane. Biometrika 41, 434–449.

Whittle, P. (1963).