Asymptotic analysis of maximum likelihood estimation of covariance parameters for Gaussian processes: an introduction with proofs
François Bachoc
Institut de mathématiques, UMR 5219; Université de Toulouse; CNRS, UPS IMT, F-31062 Toulouse Cedex 9, France, [email protected]
September 16, 2020
Abstract
This article provides an introduction to the asymptotic analysis of covariance parameter estimation for Gaussian processes. Maximum likelihood estimation is considered. The aim of this introduction is to be accessible to a wide audience and to present some existing results and proof techniques from the literature. The increasing-domain and fixed-domain asymptotic settings are considered. Under increasing-domain asymptotics, it is shown that in general all the components of the covariance parameter can be estimated consistently by maximum likelihood and that asymptotic normality holds. In contrast, under fixed-domain asymptotics, only some components of the covariance parameter, constituting the microergodic parameter, can be estimated consistently. Under fixed-domain asymptotics, the special case of the family of isotropic Matérn covariance functions is considered. It is shown that only a combination of the variance and spatial scale parameter is microergodic. A consistency and asymptotic normality proof is sketched for maximum likelihood estimators.
1 Introduction

Kriging [55, 47] consists of inferring the values of a (Gaussian) process given observations at a finite set of points. It has become a popular method for a large range of applications, such as geostatistics [43], numerical code approximation [48, 49, 10], calibration [46, 13, 34], global optimization [32], and machine learning [47].

If the mean and covariance function of the Gaussian process are known, then the unknown values of the Gaussian process can be predicted based on Gaussian conditioning [47, 49], and confidence intervals are associated with the predictions. In addition, in the case where the observation points of the Gaussian process can be selected, efficient goal-oriented sequential sampling techniques are available, for instance for optimization [32] or estimation of failure domains [17].

Nevertheless, the mean and covariance functions are typically unknown, so that the above methods are typically carried out based on a mean and covariance function selected by the user, which differ from the true ones. Here we shall consider the case where the mean function is known to be equal to zero and the covariance function is known to belong to a parametric set of covariance functions. In this case, selecting a covariance function amounts to estimating the covariance parameter. Large estimation errors of the covariance parameter can be harmful to the quality of the above methods based on Gaussian processes. Hence, one may hope to obtain theoretical guarantees that estimators of the covariance parameter converge to the true one.

Here we will review some such guarantees in the case of maximum likelihood estimation [47, 55], which is the most standard estimation method for covariance parameters. The two main settings for these guarantees are the increasing-domain and fixed-domain asymptotic frameworks. Under increasing-domain asymptotics, we will show that, generally speaking, the covariance parameter is fully estimable consistently and asymptotic normality holds.
Under fixed-domain asymptotics, only a subcomponent of the covariance parameter, called the microergodic parameter, can be estimated consistently. We will show that the microergodic parameter is estimated consistently by maximum likelihood in the case of the family of isotropic Matérn covariance functions, with asymptotic normality. In both asymptotic settings, we will provide sketches of the proofs. We will also highlight the technical differences between the proofs in the two settings.

The rest of the article is organized as follows. Gaussian processes, estimation of covariance parameters and maximum likelihood are introduced in Section 2. Increasing-domain asymptotics is studied in Section 3. Fixed-domain asymptotics is studied in Section 4. Concluding remarks and pointers to additional references are provided in Section 5. A supplementary material contains the asymptotic normality results for the Matérn model and the expressions of means and covariances of quadratic forms of a Gaussian vector.

2 Gaussian processes, covariance parameters and maximum likelihood

We consider a Gaussian process ξ : R^d → R. We recall that ξ is a stochastic process such that for any m ∈ N and for any u_1, ..., u_m ∈ R^d, the random vector (ξ(u_1), ..., ξ(u_m)) is a Gaussian vector [47]. Here and in the rest of the paper, N is the set of positive integers.

We assume throughout that ξ has mean function zero, that is, E(ξ(u)) = 0 for u ∈ R^d. Thus, the distribution of ξ is characterized by its covariance function

(u, v) ∈ R^d × R^d ↦ cov(ξ(u), ξ(v)).

We assume in all the paper that the covariance function of ξ is stationary, that is, there exists a function k⋆ : R^d → R such that for u, v ∈ R^d,

cov(ξ(u), ξ(v)) = k⋆(u − v).

In a slight abuse of language, we will also refer to k⋆ as the (stationary) covariance function of ξ. The function k⋆ is symmetric because for u ∈ R^d, k⋆(u) = cov(ξ(u), ξ(0)) = cov(ξ(0), ξ(u)) = k⋆(−u). This function is positive definite in the sense of the following definition.
Definition 1.
A function φ : R^d → R is positive definite if for any m ∈ N and for any u_1, ..., u_m ∈ R^d, the m × m matrix [φ(u_i − u_j)]_{i,j=1,...,m} is positive semi-definite.

The function k⋆ is positive definite because the matrices [k⋆(u_i − u_j)]_{i,j=1,...,m} of the form of Definition 1 are covariance matrices (of Gaussian vectors).

We then consider a set of functions {k_θ; θ ∈ Θ}, where Θ ⊂ R^p and where, for θ ∈ Θ, k_θ is a function from R^d to R that is symmetric and positive definite. We also call k_θ a covariance function and θ a covariance parameter, for θ ∈ Θ.

The set {k_θ; θ ∈ Θ} is a set of candidate covariance functions for ξ, that is, this set is known to the statistician who aims at selecting an appropriate parameter θ such that k_θ is as close as possible to k⋆. In the rest of the paper, we will consider that k⋆ belongs to {k_θ; θ ∈ Θ}. Hence, there exists θ_0 ∈ Θ such that k⋆ = k_{θ_0}. This setting is called the well-specified case in [7, 6, 9]. Under this setting, we have a classical parametric statistical estimation problem, where the goal is to estimate the true covariance parameter θ_0.

For q ∈ N and for a vector x in R^q, we let ||x|| be the Euclidean norm of x. A first classical family of covariance functions is composed of the isotropic exponential ones, with Θ ⊂ (0, ∞)² and

k_θ(x) = σ² e^{−α ||x||},

for θ = (σ², α) and x ∈ R^d. A second classical family is composed of the isotropic Gaussian covariance functions, with Θ ⊂ (0, ∞)² and

k_θ(x) = σ² e^{−α² ||x||²},

for θ = (σ², α) and x ∈ R^d. Finally, a third classical family is composed of the isotropic Matérn covariance functions, with Θ ⊂ (0, ∞)³ and

k_θ(x) = σ² (2^{1−ν} / Γ(ν)) (α ||x||)^ν K_ν(α ||x||),   (1)

where Γ is the gamma function and K_ν is the modified Bessel function of the second kind, for θ = (σ², α, ν) and x ∈ R^d.
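As an aside from the text, the three families above can be implemented in a few lines. The following is an illustrative sketch (the function names are ours, not from the article), using scipy for Γ and K_ν; it also checks numerically that the exponential family coincides with the Matérn family at ν = 1/2.

```python
import numpy as np
from scipy.special import kv, gamma  # K_nu and the Gamma function

def k_exponential(x, sigma2, alpha):
    # k_theta(x) = sigma^2 exp(-alpha ||x||)
    return sigma2 * np.exp(-alpha * np.linalg.norm(x))

def k_gaussian(x, sigma2, alpha):
    # k_theta(x) = sigma^2 exp(-alpha^2 ||x||^2)
    return sigma2 * np.exp(-alpha**2 * np.linalg.norm(x)**2)

def k_matern(x, sigma2, alpha, nu):
    # Equation (1): sigma^2 2^{1-nu}/Gamma(nu) (alpha||x||)^nu K_nu(alpha||x||),
    # extended by continuity at zero, where k_theta(0) = sigma^2.
    r = alpha * np.linalg.norm(x)
    if r == 0.0:
        return sigma2
    return sigma2 * 2.0**(1.0 - nu) / gamma(nu) * r**nu * kv(nu, r)

x = np.array([0.7, -0.3])
print(k_exponential(x, 2.0, 1.5))   # sigma^2 e^{-alpha ||x||}
print(k_matern(x, 2.0, 1.5, 0.5))   # same value: Matern with nu = 1/2 is exponential
```

The last two lines illustrate the classical identity K_{1/2}(r) = sqrt(π/(2r)) e^{−r}, which makes the ν = 1/2 Matérn covariance equal to the exponential one.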
These families of covariance functions, and other ones, can be found for instance in [18, 24, 28, 47, 49, 55]. Note that the exponential covariance function coincides with the Matérn covariance function with ν = 1/2. In these three families, k_θ(0) = σ² (in the Matérn case the function is extended at zero by continuity). Hence σ² is called the variance parameter, because if ξ has covariance function k_θ we have var(ξ(u)) = σ² for u ∈ R^d. In these three families of covariance functions, for u, v ∈ R^d, if ξ has covariance function k_θ we have that cov(ξ(u), ξ(v)) depends on α ||u − v||. Hence α is called the spatial scale parameter, because changing α can be interpreted as changing the spatial scale when measuring differences between input locations of ξ. In the three examples, k_θ(x) is a decreasing function of ||x||, thus a large α makes the covariance decrease more quickly with ||x|| and provides a small spatial scale of variation of ξ. Conversely, a small α makes the covariance decrease more slowly and provides a large spatial scale of variation of ξ.

Finally, for the family of Matérn covariance functions, ν is called the smoothness parameter. To interpret this, for θ ∈ Θ, let us call spectral density the function k̂_θ : R^d → R such that for u ∈ R^d,

k_θ(u) = ∫_{R^d} k̂_θ(ω) e^{i ω^⊤ u} dω,

with i² = −1. Under mild regularity assumptions, that hold for the three families above, the function k̂_θ is the Fourier transform of k_θ. When k_θ is a Matérn covariance function, we have

k̂_θ(ω) = σ² Γ(ν + d/2) α^{2ν} / ( Γ(ν) π^{d/2} (α² + ||ω||²)^{ν + d/2} ),   (2)

for ω ∈ R^d [27]. Hence, we see that for larger ν, the Fourier transform k̂_θ(ω) converges to zero faster as ||ω|| → ∞, which implies that the function k_θ is smoother at zero (this function is already infinitely differentiable on R^d \ {0}). This is why ν is called the smoothness parameter.

There is an important body of literature on the interplay between the smoothness of the covariance function of ξ and the smoothness of ξ [2, 3, 5]. In our case, if ξ has an exponential covariance function, then it is continuous and not differentiable (almost surely and in quadratic mean). If ξ has a Gaussian covariance function, then it is infinitely differentiable (almost surely and in quadratic mean). The Matérn covariance functions provide, so to speak, a continuum of smoothness in between these two cases. Indeed, consider ξ with Matérn covariance function with smoothness parameter ν > 0. Then ξ is m times differentiable (almost surely and in quadratic mean) if ν > m.

Consider a sequence (s_i)_{i ∈ N} of spatial locations at which we observe ξ, with s_i ∈ R^d. Assume from now on that the locations (s_i)_{i ∈ N} are two-by-two distinct. Then, for n ∈ N, we consider the Gaussian observation vector

y = (y_1, ..., y_n)^⊤ = (ξ(s_1), ..., ξ(s_n))^⊤.

We consider a family of covariance functions {k_θ; θ ∈ Θ} and assume further that for n ∈ N, the covariance matrix R_θ := [k_θ(s_i − s_j)]_{i,j=1,...,n} is invertible. Then, when ξ has covariance function k_θ, the Gaussian density of y is

L̃_n(θ) = ( 1 / sqrt( |R_θ| (2π)^n ) ) e^{−(1/2) y^⊤ R_θ^{−1} y},

with |R_θ| the determinant of R_θ. The focus of this paper will be on maximum likelihood estimation. A maximum likelihood estimator is a (measurable) estimator of θ_0 that satisfies

θ̂_ML ∈ argmax_{θ ∈ Θ} L̃_n(θ).   (3)

We remark that, in general, there may not be a unique estimator θ̂_ML satisfying (3). Furthermore, the existence of measurable estimators satisfying (3) is not a trivial problem. We refer for instance to [26, 44] on this point. In this paper, we assume that there exists at least one measurable estimator satisfying (3), and the results hold for any choice of such an estimator. A notable particular case is when Θ = (0, ∞), θ = σ² and k_θ = σ² k⋆. In this case, there is a unique estimator satisfying (3) (see also (21) in Section 4). In this special case, we can call θ̂_ML the maximum likelihood estimator. In general, one may rather call it a maximum likelihood estimator.

It is convenient to consider the following decreasing transformation of the logarithm of the likelihood,

L_n(θ) = (1/n) log(|R_θ|) + (1/n) y^⊤ R_θ^{−1} y,   (4)

for θ ∈ Θ. We have

θ̂_ML ∈ argmin_{θ ∈ Θ} L_n(θ).
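As a small numerical aside (ours, not from the article): the criterion (4) can be evaluated stably through a Cholesky factorization of R_θ, and checked against the Gaussian density of y, since (4) equals −2/n times the log-likelihood minus the constant log(2π) (cf. Remark 6 below). A sketch with the isotropic exponential family:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import multivariate_normal

def criterion_Ln(theta, s, y):
    # Criterion (4): L_n(theta) = (1/n) log|R_theta| + (1/n) y^T R_theta^{-1} y,
    # computed through a Cholesky factorization R_theta = c c^T for stability,
    # here for the exponential family k_theta(x) = sigma^2 e^{-alpha ||x||}.
    sigma2, alpha = theta
    c = np.linalg.cholesky(sigma2 * np.exp(-alpha * cdist(s, s)))
    z = np.linalg.solve(c, y)              # so that ||z||^2 = y^T R_theta^{-1} y
    return (2.0 * np.sum(np.log(np.diag(c))) + z @ z) / len(y)

# Consistency check with the Gaussian density of y: the criterion equals
# -2/n times the log-likelihood minus the constant log(2 pi).
rng = np.random.default_rng(0)
n, theta = 50, (2.0, 0.8)
s = np.arange(n, dtype=float).reshape(-1, 1)
y = rng.standard_normal(n)                 # any vector works for this identity
R = theta[0] * np.exp(-theta[1] * cdist(s, s))
loglik = multivariate_normal(mean=np.zeros(n), cov=R).logpdf(y)
print(criterion_Ln(theta, s, y) + 2.0 / n * loglik)   # approx -log(2 pi) = -1.8379
```

The parameter values are illustrative; the identity printed at the end holds exactly, up to floating-point error, for any y.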
The problem of studying the asymptotic properties of θ̂_ML as n → ∞ presents several differences compared to the most standard parametric estimation setting, where the observations are independent and identically distributed [58]. Indeed, in our case the components of the observation vector y are dependent, so the logarithm of the likelihood is not a sum of independent random variables. Furthermore, the likelihood function involves the quantities |R_θ| and R_θ^{−1} for which, often, no explicit expressions exist. Finally, for asymptotic statistics with independent and identically distributed data, there is a single asymptotic setting as n → ∞. Here there exist several possible asymptotic settings, depending on how the spatial locations s_1, ..., s_n behave as n → ∞. The proof techniques and the results obtained strongly depend on the asymptotic setting. We will now review some results under the two main existing asymptotic frameworks: increasing-domain and fixed-domain asymptotics.

3 Increasing-domain asymptotics
In Section 3, we assume that there exists a fixed Δ > 0 such that

inf_{i,j ∈ N, i ≠ j} ||s_i − s_j|| ≥ Δ.   (5)

This assumption is the main assumption considered in the literature for increasing-domain asymptotics (see [8] for instance and see also [9] for one of the few exceptions). This assumption implies that the spatial locations (s_i)_{i ∈ N} are not restricted to a bounded set. The results and proofs that will be presented in Section 3 can mainly be found in [8].

Here the aim is to show that θ̂_ML converges weakly (that is, in probability) to θ_0. We consider a general family of covariance functions {k_θ; θ ∈ Θ}, where Θ is compact, that satisfies

sup_{θ ∈ Θ} |k_θ(x)| ≤ C_sup / ( 1 + ||x||^{d + C_inf} )   (6)

and

max_{s=1,2,3} max_{i_1,...,i_s = 1,...,p} sup_{θ ∈ Θ} | ∂^s k_θ(x) / (∂θ_{i_1} ... ∂θ_{i_s}) | ≤ C_sup / ( 1 + ||x||^{d + C_inf} ),   (7)

where 0 < C_inf and C_sup < ∞ are fixed constants and for x ∈ R^d. We also assume that

(θ, ω) ∈ Θ × R^d ↦ k̂_θ(ω) is continuous and strictly positive.   (8)

The families of isotropic exponential, Gaussian and Matérn covariance functions do satisfy (6) and (7), when Θ is compact, and when ν is fixed in the Matérn case. Indeed, these functions and their partial derivatives with respect to σ² and α are exponentially decaying as ||x|| → ∞, where x is their input. For the exponential and Gaussian covariance functions this can be seen simply, and for the Matérn covariance function this follows from the properties of the modified Bessel functions of the second kind [1]. Also, when Θ is compact, exponentially decaying functions bounding the covariance functions and their partial derivatives can be chosen uniformly over θ ∈ Θ (see again [1] for the Matérn covariance functions). These three families of covariance functions also satisfy (8). The expressions of the Fourier transforms of these covariance functions can be found for instance in [27] and [55].

Lemma 2 below enables us to control the term R_θ^{−1} in (4).
We let λ_inf(M) be the smallest eigenvalue of a symmetric matrix M.

Lemma 2 (Proposition D.4 in [8], Theorem 5 in [14]). Assume that (5), (6) and (8) hold. We have

inf_{n ∈ N} inf_{θ ∈ Θ} λ_inf(R_θ) > 0.

Sketch of proof. We have, for n ∈ N and λ_1, ..., λ_n ∈ R,

Σ_{i,j=1}^n λ_i λ_j (R_θ)_{i,j} = Σ_{i,j=1}^n λ_i λ_j k_θ(s_i − s_j)
= Σ_{i,j=1}^n λ_i λ_j ∫_{R^d} k̂_θ(ω) e^{i ω^⊤ (s_i − s_j)} dω
= ∫_{R^d} k̂_θ(ω) ( Σ_{i,j=1}^n λ_i λ_j e^{i ω^⊤ s_i} e^{−i ω^⊤ s_j} ) dω
= ∫_{R^d} k̂_θ(ω) | Σ_{i=1}^n λ_i e^{i ω^⊤ s_i} |² dω,   (9)

where |z| is the modulus of a complex number z. In (9), k̂_θ(ω) is strictly positive. Furthermore, because s_1, ..., s_n are two-by-two distinct, the family of functions (ω ↦ e^{i ω^⊤ s_i})_{i=1,...,n} is linearly independent. Hence, Σ_{i,j=1}^n λ_i λ_j (R_θ)_{i,j} > 0 unless (λ_1, ..., λ_n) = 0. This shows that λ_inf(R_θ) > 0 for n ∈ N and θ ∈ Θ. Proving that the infimum in the lemma is also strictly positive is also based on (9). We refer to the proofs of Proposition D.4 in [8] or of Theorem 5 in [14].

Lemma 3 below will enable us to control the variance of the likelihood criterion and the order of magnitude of its derivatives.
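Lemma 2, and its failure on a bounded domain (see Section 4), can be observed numerically. The following is a minimal sketch of ours with the exponential covariance in dimension d = 1; the parameter values are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lambda_min_exponential(s, sigma2=1.0, alpha=1.0):
    # Smallest eigenvalue of R_theta = [k_theta(s_i - s_j)]_{i,j=1,...,n}
    # for the exponential covariance k_theta(x) = sigma^2 e^{-alpha ||x||}.
    R = sigma2 * np.exp(-alpha * cdist(s, s))
    return np.linalg.eigvalsh(R)[0]   # eigvalsh returns ascending eigenvalues

for n in (50, 100, 200):
    increasing = np.arange(n, dtype=float).reshape(-1, 1)   # spacing Delta = 1
    fixed = np.linspace(0.0, 1.0, n).reshape(-1, 1)         # compact domain [0, 1]
    print(n, lambda_min_exponential(increasing), lambda_min_exponential(fixed))
# Under the spacing condition (5) the smallest eigenvalue stays bounded away
# from zero as n grows; on the fixed domain [0, 1] it decays towards zero.
```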
Lemma 3.
Assume that (5), (6), (7) and (8) hold. For any θ ∈ Θ, as n → ∞,

var(L_n(θ)) = o(1).

Furthermore,

max_{i=1,...,p} sup_{θ ∈ Θ} | ∂L_n(θ)/∂θ_i | = O_p(1).

Sketch of proof. Using that y is a centered Gaussian vector, we have, with cov(z) the covariance matrix of a random vector z, from Appendix B in the supplementary material,

var(L_n(θ)) = (1/n²) var(y^⊤ R_θ^{−1} y) = (2/n²) tr( R_θ^{−1} cov(y) R_θ^{−1} cov(y) ) = (2/n²) tr( R_θ^{−1} R_{θ_0} R_θ^{−1} R_{θ_0} ).

Let λ_sup(M) be the largest eigenvalue of a symmetric matrix M. From the Gershgorin circle theorem, we have

λ_sup(R_θ) ≤ max_{i=1,...,n} Σ_{j=1}^n |(R_θ)_{i,j}| = max_{i=1,...,n} Σ_{j=1}^n |k_θ(s_i − s_j)| ≤ max_{i=1,...,n} Σ_{j=1}^n C_sup / ( 1 + ||s_i − s_j||^{d + C_inf} ),

using (6) for the last inequality. It is shown in [8] that (5) implies that

max_{i ∈ N} Σ_{j=1}^∞ C_sup / ( 1 + ||s_i − s_j||^{d + C_inf} ) < ∞.

Hence there is a constant A_1 < ∞ such that λ_sup(R_θ) ≤ A_1. Also, from Lemma 2, there is a constant A_2 < ∞ such that sup_{θ ∈ Θ} λ_sup(R_θ^{−1}) ≤ A_2. Hence, we have var(L_n(θ)) ≤ 2 A_1² A_2² / n, which proves the first part of the lemma.

For the second part of the lemma, let ρ_sup(M) be the largest singular value of a matrix M. Using the Gershgorin circle theorem again, together with (7), we show that there is a constant A_3 < ∞ such that

max_{i=1,...,p} sup_{θ ∈ Θ} ρ_sup( ∂R_θ/∂θ_i ) ≤ A_3.

With this, we have

max_{i=1,...,p} sup_{θ ∈ Θ} | ∂L_n(θ)/∂θ_i | = max_{i=1,...,p} sup_{θ ∈ Θ} | (1/n) tr( R_θ^{−1} ∂R_θ/∂θ_i ) − (1/n) y^⊤ R_θ^{−1} (∂R_θ/∂θ_i) R_θ^{−1} y | ≤ A_2 A_3 + A_2² A_3 ||y||² / n.

This last quantity is a O_p(1) because ||y||²/n is non-negative with (bounded) expectation var(ξ(0)).

The consistency result will rely on an asymptotic identifiability assumption, stated below.
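The exact variance formula at the start of the proof can also be evaluated numerically, illustrating that var(L_n(θ)) decays like 1/n under the spacing condition (5). A sketch of ours with the exponential family (the parameter values are illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def var_Ln(theta, theta0, s):
    # Exact variance of the criterion (4) when the true parameter is theta0:
    # var(L_n(theta)) = (2/n^2) tr( R_theta^{-1} R_theta0 R_theta^{-1} R_theta0 ),
    # here for the exponential family k_theta(x) = sigma^2 e^{-alpha ||x||}.
    n = len(s)
    D = cdist(s, s)
    M = np.linalg.solve(theta[0] * np.exp(-theta[1] * D),
                        theta0[0] * np.exp(-theta0[1] * D))  # R_theta^{-1} R_theta0
    return 2.0 / n**2 * np.trace(M @ M)

theta0, theta = (1.0, 1.0), (1.5, 0.7)
for n in (100, 200, 400):
    s = np.arange(n, dtype=float).reshape(-1, 1)   # spacing Delta = 1, d = 1
    print(n, var_Ln(theta, theta0, s))             # decays roughly like 1/n
```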
We assume that for all ε > 0,

lim inf_{n → ∞} inf_{θ ∈ Θ, ||θ − θ_0|| ≥ ε} (1/n) Σ_{i,j=1}^n ( k_θ(s_i − s_j) − k_{θ_0}(s_i − s_j) )² > 0.   (10)

This assumption means that for θ bounded away from θ_0, there is sufficient information in the spatial locations s_1, ..., s_n to distinguish between the two covariance functions k_θ and k_{θ_0}. In [8], an explicit example is provided for which (10) holds.

We remark that, even though there are n² terms in the sum in (10), this sum can be shown to be a O(n) for any fixed θ ∈ Θ, because of (6) (by proceeding as in the proof of Lemma 3). The intuition is that, asymptotically, for many pairs i, j ∈ {1, ..., n}, k_θ(s_i − s_j) and k_{θ_0}(s_i − s_j) are small. This is why the normalization factor is 1/n rather than 1/n² in (10).

With the assumption (10), we can now state the consistency result.

Theorem 4 ([8]). Assume that (5), (6), (7), (8) and (10) hold. As n → ∞,

θ̂_ML →_p θ_0.

Sketch of proof. From Lemma 3 we have, for any θ ∈ Θ,

L_n(θ) − E(L_n(θ)) →_p 0 as n → ∞.

Furthermore one can show, similarly as in Lemma 3,

max_{i=1,...,p} sup_{θ ∈ Θ} | ∂E(L_n(θ))/∂θ_i | = O(1).

Hence, using Lemma 3, we obtain

sup_{θ ∈ Θ} | L_n(θ) − E(L_n(θ)) | = o_p(1).   (11)

Next, it is shown in [8] that there exists a constant A > 0 such that for θ ∈ Θ,

E(L_n(θ)) − E(L_n(θ_0)) ≥ A (1/n) Σ_{i,j=1}^n ( k_θ(s_i − s_j) − k_{θ_0}(s_i − s_j) )².   (12)

From (12) and (10), we then obtain, for ε > 0, with a strictly positive constant A′, for n large enough,

inf_{θ ∈ Θ, ||θ − θ_0|| ≥ ε} ( E(L_n(θ)) − E(L_n(θ_0)) ) ≥ A′.   (13)

Combining (11) and (13) enables to conclude the proof with a standard M-estimator argument (for instance as in the proof of Theorem 5.7 in [58]).

For i ∈ {1, ..., p}, we have seen in the proof of Lemma 3 that the i-th partial derivative of L_n at θ is

∂L_n(θ)/∂θ_i = (1/n) tr( R_θ^{−1} ∂R_θ/∂θ_i ) − (1/n) y^⊤ R_θ^{−1} (∂R_θ/∂θ_i) R_θ^{−1} y.

Since y is a centered Gaussian vector and using Appendix B in the supplementary material, the element i, j of the covariance matrix of the gradient of L_n at θ_0 is thus, for i, j = 1, ..., p,

cov( ∂L_n(θ_0)/∂θ_i , ∂L_n(θ_0)/∂θ_j ) = (2/n²) tr( R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1} R_{θ_0} R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_j) R_{θ_0}^{−1} R_{θ_0} ) = (2/n²) tr( R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_j) ).   (14)

It is shown in [8] that for i, j ∈ {1, ..., p},

E( ∂²L_n(θ_0) / (∂θ_i ∂θ_j) ) = (1/n) tr( R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_j) ).

We will thus need to ensure that the p × p matrix with element i, j equal to

(1/n) tr( R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_j) )

is asymptotically invertible. For this, we assume that for all (λ_1, ..., λ_p) ∈ R^p \ {0},

lim inf_{n → ∞} (1/n) Σ_{i,j=1}^n ( Σ_{m=1}^p λ_m ∂k_{θ_0}(s_i − s_j)/∂θ_m )² > 0.   (15)

This assumption is interpreted as a local identifiability condition around θ_0. In [8], an explicit example is provided for which (15) holds.

We can now state the asymptotic normality result for maximum likelihood estimators.

Theorem 5.
Assume that (5), (6), (7), (8), (10) and (15) hold. Let Σ_{θ_0} be the p × p matrix with element i, j equal to

(1/2) (1/n) tr( R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_j) ).

Then

0 < lim inf_{n → ∞} λ_inf(Σ_{θ_0}) ≤ lim sup_{n → ∞} λ_sup(Σ_{θ_0}) < ∞.   (16)

Furthermore, with M^{−1/2} the unique symmetric matrix square root of M^{−1} for a symmetric strictly positive definite matrix M, we have

√n ( Σ_{θ_0}^{−1} )^{−1/2} ( θ̂_ML − θ_0 ) →_d N(0, I_p) as n → ∞.   (17)

We remark that in Theorem 5, Σ_{θ_0}^{−1} is the asymptotic covariance matrix, but this matrix is not necessarily assumed to converge as n → ∞. This matrix has its eigenvalues bounded away from zero and infinity asymptotically, so that the rate of convergence is √n in Theorem 5.

Remark 6.
Here the element i, j of n Σ_{θ_0} is n²/4 times the covariance between the elements i and j of the gradient of L_n at θ_0, from (14). Note that L_n is −2/n times the log-likelihood (up to a constant not depending on y or θ). Consider now the score vector, that is equal to the gradient of the log-likelihood. Then, we obtain that the covariance between the elements i and j of the score is n²/4 times 4/n² times the element i, j of n Σ_{θ_0}. In other words, n Σ_{θ_0} is the (theoretical) Fisher information matrix. In agreement with this, remark that from Theorem 5 the inverse of n Σ_{θ_0} provides the asymptotic covariance matrix of maximum likelihood estimators as n → ∞.

Sketch of proof of Theorem 5. In [8], it is shown that there exists a strictly positive constant A such that for any λ_1, ..., λ_p with λ_1² + ... + λ_p² = 1, we have

Σ_{i,j=1}^p λ_i λ_j (1/2) (1/n) tr( R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_j) ) ≥ A (1/n) Σ_{i,j=1}^n ( Σ_{m=1}^p λ_m ∂k_{θ_0}(s_i − s_j)/∂θ_m )².

Hence, from (15),

0 < lim inf_{n → ∞} λ_inf(Σ_{θ_0}).

Hence Σ_{θ_0} is invertible for n large enough. Let n be large enough so that this is the case in the rest of the proof. One can show as in the proof of Lemma 3 (see also [8]) that

lim sup_{n → ∞} λ_sup(Σ_{θ_0}) < ∞.

Hence (16) is proved. Let us now prove (17). It is shown in [8] (see also [11]), using a standard M-estimator argument together with techniques similar as above, that

√n ( θ̂_ML − θ_0 ) = − ( [ E( ∂²L_n(θ_0) / (∂θ_i ∂θ_j) ) ]_{i,j=1,...,p} )^{−1} √n ( ∂L_n(θ_0)/∂θ_i )_{i=1,...,p} + o_p(1)
= − (1/2) Σ_{θ_0}^{−1} √n ( ∂L_n(θ_0)/∂θ_i )_{i=1,...,p} + o_p(1).

Hence, to conclude the proof, it is sufficient to show that

( 4 Σ_{θ_0} )^{−1/2} √n ( ∂L_n(θ_0)/∂θ_i )_{i=1,...,p} →_d N(0, I_p) as n → ∞.

Let us show this using linear combinations. Let us write the p × 1 gradient vector as

∂L_n(θ_0)/∂θ = ( ∂L_n(θ_0)/∂θ_i )_{i=1,...,p}.

Let λ = (λ_1, ..., λ_p)^⊤ ∈ R^p be fixed with λ_1² + ... + λ_p² = 1. We have

Σ_{i=1}^p λ_i ( (4 Σ_{θ_0})^{−1/2} √n ∂L_n(θ_0)/∂θ )_i = Σ_{i=1}^p ( (4 Σ_{θ_0})^{−1/2} λ )_i √n ∂L_n(θ_0)/∂θ_i.

Let us now write β_i = ( (4 Σ_{θ_0})^{−1/2} λ )_i. We have

Σ_{i=1}^p λ_i ( (4 Σ_{θ_0})^{−1/2} √n ∂L_n(θ_0)/∂θ )_i = Σ_{i=1}^p β_i √n ∂L_n(θ_0)/∂θ_i
= − (√n / n) ( y^⊤ ( Σ_{i=1}^p β_i R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1} ) y − E( y^⊤ ( Σ_{i=1}^p β_i R_{θ_0}^{−1} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1} ) y ) ),

using for the last equality that the gradient of the logarithm of the likelihood at θ_0 has mean zero. Letting z = (z_1, ..., z_n)^⊤ = R_{θ_0}^{−1/2} y, the negative of the above quantity is equal to

√n ( z^⊤ ( (1/n) Σ_{i=1}^p β_i R_{θ_0}^{−1/2} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1/2} ) z − E( z^⊤ ( (1/n) Σ_{i=1}^p β_i R_{θ_0}^{−1/2} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1/2} ) z ) ).   (18)

Letting ρ_1, ..., ρ_n be the eigenvalues of (1/n) Σ_{i=1}^p β_i R_{θ_0}^{−1/2} (∂R_{θ_0}/∂θ_i) R_{θ_0}^{−1/2} and letting w = (w_1, ..., w_n)^⊤ ∼ N(0, I_n), (18) is equal, in distribution, to

√n Σ_{i=1}^n ( w_i² − 1 ) ρ_i.   (19)

Let us show that (19) converges to a standard Gaussian distribution. We have

var( √n Σ_{i=1}^n ( w_i² − 1 ) ρ_i ) = 2 n Σ_{i=1}^n ρ_i² = var( Σ_{i=1}^p λ_i ( (4 Σ_{θ_0})^{−1/2} √n ∂L_n(θ_0)/∂θ )_i ) = λ^⊤ cov( (4 Σ_{θ_0})^{−1/2} √n ∂L_n(θ_0)/∂θ ) λ = λ^⊤ I_p λ = 1,

using (14) for the second-to-last equality. One can show as in the proof of Lemma 3 (see also [8]) that max_{i=1,...,n} |ρ_i| = O(1/n). Hence, the classical Lindeberg-Feller central limit theorem enables to conclude that (19) converges to a standard Gaussian distribution (see also [31]).
This concludes the proof.

To conclude Section 3: the consistency and asymptotic normality results given here are quite generally applicable to families of stationary covariance functions and to Gaussian processes with zero mean functions. Some extensions to non-zero constant mean functions are discussed in [11]. It would be interesting to provide extensions to non-stationary covariance functions or to unknown non-constant mean functions, with a parametric family of mean functions. It is possible that some of the proof techniques and intermediary results presented in Section 3 and in [8] would be relevant for these extensions. Nevertheless, new arguments would also need to be developed, and appropriate assumptions, on the non-stationary covariance functions and non-constant mean functions, would need to be considered.

4 Fixed-domain asymptotics

Under fixed-domain asymptotics, the spatial locations s_1, ..., s_n are restricted to a compact set D ⊂ R^d. In this case, almost none of the proof techniques above for increasing-domain asymptotics can be applied. Indeed, they are based on the fact that for a given i ∈ {1, ..., n}, ξ(s_i) has a very small covariance with ξ(s_j) for most s_j, j = 1, ..., n. On the contrary, under fixed-domain asymptotics, for instance if k_θ is non-zero on R^d, ξ(s_i) has a non-negligible covariance with all the ξ(s_j), j = 1, ..., n.

In particular, contrary to Lemma 2, if θ ∈ Θ is such that k_θ is continuous at zero, then the smallest eigenvalue of R_θ goes to zero as n → ∞. This is seen by considering a sequence of 2 × 2 submatrices of R_θ, corresponding to pairs of locations s_{i_n}, s_{j_n} with ||s_{i_n} − s_{j_n}|| → 0 as n → ∞. Similarly, the largest eigenvalue of R_θ goes to infinity as n → ∞ for any θ ∈ Θ if k_θ is, for instance, non-zero on R^d.

The conclusion of Section 3 on increasing-domain asymptotics is that the family of stationary covariance functions {k_θ; θ ∈ Θ} can be fairly general to prove the consistency and asymptotic normality of maximum likelihood estimators of θ.
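As a brief numerical illustration of these increasing-domain results (our own sketch, not from the article): maximize the likelihood on a one-dimensional regular grid for the exponential family and compare with the asymptotic standard errors given by Theorem 5, namely the square roots of the diagonal of (n Σ_{θ_0})^{−1}. All names and parameter values below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def criterion(log_theta, s, y):
    # Criterion (4) for the isotropic exponential family, on the log scale
    # so that the optimizer keeps sigma^2 and alpha positive.
    sigma2, alpha = np.exp(log_theta)
    try:
        c = np.linalg.cholesky(sigma2 * np.exp(-alpha * cdist(s, s)))
    except np.linalg.LinAlgError:
        return np.inf                    # reject numerically singular candidates
    z = np.linalg.solve(c, y)
    return (2.0 * np.sum(np.log(np.diag(c))) + z @ z) / len(y)

def sigma_theta0(theta, s):
    # Sigma_theta0 of Theorem 5, with entries
    # (1/2)(1/n) tr(R^{-1} dR/dtheta_i R^{-1} dR/dtheta_j); for this family,
    # dR/dsigma^2 = R / sigma^2 and dR/dalpha = -D * R (elementwise).
    sigma2, alpha = theta
    n, D = len(s), cdist(s, s)
    R = sigma2 * np.exp(-alpha * D)
    A = [np.linalg.solve(R, R / sigma2), np.linalg.solve(R, -D * R)]
    return np.array([[np.trace(A[i] @ A[j]) / (2 * n) for j in range(2)]
                     for i in range(2)])

rng = np.random.default_rng(1)
n, theta0 = 300, (2.0, 0.8)
s = np.arange(n, dtype=float).reshape(-1, 1)      # spacing Delta = 1, d = 1
R0 = theta0[0] * np.exp(-theta0[1] * cdist(s, s))
y = np.linalg.cholesky(R0) @ rng.standard_normal(n)

res = minimize(criterion, np.log([1.0, 1.0]), args=(s, y), method="Nelder-Mead")
theta_hat = np.exp(res.x)
stderr = np.sqrt(np.diag(np.linalg.inv(n * sigma_theta0(theta0, s))))
print(theta_hat)   # typically within a few standard errors of (2.0, 0.8)
print(stderr)      # asymptotic standard errors from (n Sigma_theta0)^{-1}
```

In practice θ_0 is unknown and one would plug θ̂_ML into Σ; θ_0 is used here only to display the theoretical standard errors.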
In particular, under the reasonable conditions (10) and (15), θ can be entirely consistently estimable. We will now see that, in contrast, for a family {k_θ; θ ∈ Θ} of covariance functions under fixed-domain asymptotics, it can regularly be the case that θ is not entirely consistently estimable.

The notion that makes this more precise is that of the equivalence of Gaussian measures [30, 55]. Consider two covariance parameters θ_1, θ_2 ∈ Θ with θ_1 ≠ θ_2. If ξ has covariance function k_{θ_1}, ξ yields a measure M_{θ_1} on the set of functions from D to R, with respect to the cylindrical sigma-algebra. Similarly, if ξ has covariance function k_{θ_2}, ξ yields a measure M_{θ_2}. When D is compact, these two measures can be equivalent (for a set A of functions, M_{θ_1}(A) = 0 if and only if M_{θ_2}(A) = 0) even when the covariance functions k_{θ_1} and k_{θ_2} are different.

The notion of equivalence of Gaussian measures enables to define non-microergodic parameters.

Definition 7.
Let Φ be a function from Θ to R^q for q ∈ N. We say that Φ(θ_0) is non-microergodic if there exists θ_1 ∈ Θ such that Φ(θ_1) ≠ Φ(θ_0) and the measures M_{θ_0} and M_{θ_1} are equivalent.

If a covariance parameter is non-microergodic, it can not be estimated consistently.
Lemma 8.
Let (s_i)_{i ∈ N} be any sequence of points in D. If Φ(θ_0) is non-microergodic, there does not exist a sequence of functions Φ̂_n : R^n → R^q such that, for any θ ∈ Θ, if ξ has covariance function k_θ then Φ̂_n(ξ(s_1), ..., ξ(s_n)) goes to Φ(θ) in probability as n → ∞.

(If Gaussian processes with continuous realizations on compact sets are considered, one can also define the Gaussian measures over the Banach space of continuous functions on a compact set, endowed with the supremum norm and the corresponding Borel sigma-algebra.)

Proof. Let Φ(θ_0) be non-microergodic. Then fix θ_1 ∈ Θ such that Φ(θ_1) ≠ Φ(θ_0) and the measures M_{θ_0} and M_{θ_1} are equivalent. Assume that an estimator sequence Φ̂_n as described in the lemma exists. Then, when ξ has covariance function k_{θ_1}, as n → ∞,

Φ̂_n(ξ(s_1), ..., ξ(s_n)) →_p Φ(θ_1).

Hence there exists a subsequence n′ such that, as n′ → ∞, almost surely,

Φ̂_{n′}(ξ(s_1), ..., ξ(s_{n′})) → Φ(θ_1).

This can be written in the form

M_{θ_1}( { f function from D to R such that Φ̂_{n′}(f(s_1), ..., f(s_{n′})) → Φ(θ_1) as n′ → ∞ } ) = 1.

Then, since the measures M_{θ_0} and M_{θ_1} are equivalent,

M_{θ_0}( { f function from D to R such that Φ̂_{n′}(f(s_1), ..., f(s_{n′})) → Φ(θ_1) as n′ → ∞ } ) = 1.

This means that, when ξ has covariance function k_{θ_0}, the sequence Φ̂_{n′}(ξ(s_1), ..., ξ(s_{n′})) goes almost surely to Φ(θ_1) ≠ Φ(θ_0). Hence the sequence Φ̂_n(ξ(s_1), ..., ξ(s_n)) does not go to Φ(θ_0) in probability as n → ∞. This is a contradiction, which concludes the proof.

Hence, one should not expect to have accurate estimators of non-microergodic parameters under fixed-domain asymptotics. The interpretation of non-microergodic parameters is that, even if Φ(θ_0) and Φ(θ_1) are different, there is not enough information in a single realization of the random function {ξ(s); s ∈ D} (even if this realization was observed continuously) to distinguish between Φ(θ_0) and Φ(θ_1).
This lack of information stems from the boundedness of D.

It is important to remark that there exist results showing that non-microergodic parameters have an asymptotically negligible impact on prediction of unknown values of ξ [51, 52, 54, 62]. In [55], this situation is interpreted as an instance of the following principle, called Jeffreys's law: “things we shall never find much out about cannot be very important for prediction”.

Finally, we can define microergodic parameters.

Definition 9.
Let Φ be a function from Θ to R q for q ∈ N . We saythat Φ( θ ) is microergodic if for any θ ∈ Θ such that Φ( θ ) = Φ( θ ) , themeasures M θ and M θ are orthogonal (i.e. there exists a set of functions A such that M θ ( A ) = 0 and M θ ( A ) = 1). Let us now focus on the family of isotropic Mat´ern covariance functions(1), in the case where the smoothness parameter ν is known. We thusconsider θ = ( σ , α ) ∈ Θ = (0 , ∞ ) × [ α inf , α sup ] with 0 < α inf < α sup < ∞ fixed. We thus have k θ ( x ) = σ − ν Γ( ν ) ( α || x || ) ν K ν ( α || x || ) , (20)for x ∈ R d where 0 < ν < ∞ is fixed and known. We let θ = ( σ , α ). Inthe rest of Section 4, we set the dimension as d ∈ { , , } .Then the parameters σ and α are non-microergodic, while the pa-rameter σ α ν is microergodic. oposition 10 ([62]) . With the family of covariance functions given by (20) , the measures M θ and M θ are equivalent if σ α ν = σ α ν andare orthogonal if σ α ν = σ α ν . Hence, σ α ν is microergodic, and inparticular σ and α are non-microergodic. We remark that Proposition 10 holds for d ∈ { , , } , which is theambient assumption in Section 4.3. When d ≥
5, [4] proved that thefull parameter ( σ , α ) is microergodic (thus in particular σ and α aremicroergodic). At the time of [4], it was mentioned there that the case d = 4 was open, that is, it was not known if σ and α are microergodic inthis case. Currently, this case is still open, to the best of our knowledge.Then, [62] finds a consistent estimator of σ α ν by fixing α to anarbitrary value and by maximizing the likelihood with respect to σ only.Hence, for α ∈ [ α inf , α sup ], letˆ σ ( α ) = argmin σ ∈ (0 , ∞ ) L n ( σ , α ) . We remark that the argmin is unique from (22) in the proof of Theorem11. By canceling the derivative of L n ( σ , α ) with respect to σ , we findˆ σ ( α ) = 1 n y ⊤ Σ − α y, (21)with Σ α = R σ ,α /σ , based on (22) in the proof of Theorem 11. Theorem 11 ([62]) . Let α be any fixed element of [ α inf , α sup ] . As n →∞ , almost surely, ˆ σ ( α ) α ν → σ α ν . sketch. Let σ = σ α ν α ν . Let ǫ >
0. From Proposition 10, the measures M_{(σ_∞^2, α)} and M_{(σ_0^2, α_0)} are equivalent and the measures M_{(σ_0^2, α_0)} and M_{(σ_∞^2 + ε, α)} are orthogonal. Hence, [62], based on [25], obtains that, almost surely,

nL_n(σ_∞^2 + ε, α) − nL_n(σ_∞^2, α) → ∞.

Similarly, we can show that, almost surely,

nL_n(σ_∞^2 − ε, α) − nL_n(σ_∞^2, α) → ∞.

Let Σ_α = R_{σ^2, α} / σ^2. Then

L_n(σ^2, α) = log(σ^2) + (1/n) log(|Σ_α|) + (1 / (σ^2 n)) y^⊤ Σ_α^{−1} y.   (22)

Hence, nL_n(σ^2, α) is a convex function of 1/σ^2 (a convex function that diverges at two points diverges uniformly outside the interval they delimit), and thus by convexity we obtain, as n → ∞,

inf_{σ^2 ∈ (0, ∞), |σ^2 − σ_∞^2| ≥ ε} nL_n(σ^2, α) − nL_n(σ_∞^2, α) → ∞

almost surely. This implies that σ̂^2(α) → σ_∞^2 almost surely as n → ∞, which concludes the proof.

In the supplementary material, still for the Matérn covariance functions, we also provide asymptotic normality results for the estimator σ̂^2(α) α^{2ν} and for the "full" maximum likelihood estimator, where the likelihood is maximized with respect to both σ^2 and α.

We remark that, in general and outside of the Matérn case, consistency results for maximum likelihood under fixed-domain asymptotics are quite scarce. We mention a few such other consistency results at the end of Appendix A in the supplementary material and in Section 5.

5 Conclusion

We have presented some asymptotic results on covariance parameter estimation under increasing and fixed-domain asymptotics. The presentation highlights the strong differences between the two settings. Under increasing-domain asymptotics, with mild identifiability conditions, all the components of the covariance parameter can be estimated consistently, and with asymptotic normality. The proof techniques hold for general families of stationary covariance functions. They are based on the asymptotic independence between most pairs of observations, as n → ∞, which makes it possible to control the logarithm of the likelihood and its gradient and to apply general methods for M-estimators.

In contrast, under fixed-domain asymptotics, typically all pairs of observations have a covariance that is not small.
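This contrast can be checked with a short computation: on a fixed domain the smallest inter-point distances shrink as n grows, so nearest-neighbour correlations tend to one, whereas under increasing-domain sampling they stay bounded away from one. A minimal numerical sketch, using the exponential kernel (the Matérn kernel (20) with ν = 1/2); the values of α and n below are purely illustrative:

```python
import numpy as np

# Exponential correlation (Matern with nu = 1/2): corr(s, t) = exp(-alpha |s - t|).
alpha = 2.0

for n in (10, 100, 1000):
    # Fixed domain: n equispaced points in [0, 1], so the nearest-neighbour
    # distance 1/(n-1) shrinks and the corresponding correlation tends to 1.
    fixed_domain = np.exp(-alpha / (n - 1))
    # Increasing domain: points 1, ..., n with unit spacing, so the
    # nearest-neighbour correlation stays fixed at exp(-alpha) < 1.
    increasing_domain = np.exp(-alpha)
    print(n, fixed_domain, increasing_domain)
```

As n grows, the fixed-domain nearest-neighbour correlation approaches one, while the increasing-domain one stays at exp(−α).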
As a consequence, some components of the covariance parameter cannot be estimated consistently, even if changing the component changes the covariance function. The notion of equivalence of Gaussian measures, yielding the notion of microergodicity, is central. The results and proofs are not general in the current state of the literature. Here we have presented results and proofs related to the family of isotropic Matérn covariance functions in dimension d = 1, 2, 3. The presented proofs rely on the Fourier transforms of these covariance functions (through the results taken from the cited references) and also on the explicit expression of the logarithm of the likelihood as a function of the variance parameter σ^2.

There are many other existing contributions in the literature that we have not presented here. Under increasing-domain asymptotics, earlier results on maximum likelihood were provided by [42], using general results from [56] (the latter not necessarily considering Gaussian processes). Restricted maximum likelihood was then studied in [21]. Cross validation was considered in [8, 9]. Extensions to transformed Gaussian processes were studied in [11]. Pairwise likelihood was studied in [19]. Multivariate processes were considered in [23, 50]. Finally, more generally, the increasing-domain asymptotic framework is investigated in spatial statistics for instance in [29, 35, 36, 37].

Under fixed-domain asymptotics, earlier results for the estimation of the microergodic parameter in the family of exponential covariance functions in dimension one were obtained in [60]. The estimation of parameters for the Brownian motion is addressed in [53]. Various additional results on maximum likelihood are obtained in [38, 39, 57, 61]. Variation-based estimators are studied in [4, 20, 31, 40]. Composite likelihood is addressed in [12].

A Asymptotic normality for the estimation of the microergodic parameter of the isotropic Matérn model
In the case of the Matérn covariance functions, a central limit theorem is proved in [22] for the same estimator σ̂^2(α) α^{2ν} as in [62]. As in Section 4.3, we let d ∈ {1, 2, 3}.

Theorem 12 ([22]). Let α be any fixed element of [α_inf, α_sup]. As n → ∞,

√n (σ̂^2(α) α^{2ν} − σ_0^2 α_0^{2ν}) →_d N(0, 2 (σ_0^2 α_0^{2ν})^2).

Proof (sketch).
In [22], it is shown (after an involved and technical proof that is based in particular on the Fourier transform expression of the Matérn covariance function) that

σ̂^2(α) α^{2ν} − σ̂^2(α_0) α_0^{2ν} = o_p(1/√n).

Hence,

√n (σ̂^2(α) α^{2ν} − σ_0^2 α_0^{2ν}) = √n (σ̂^2(α_0) α_0^{2ν} − σ_0^2 α_0^{2ν}) + o_p(1).   (23)

We have

√n (σ̂^2(α_0) α_0^{2ν} − σ_0^2 α_0^{2ν}) = σ_0^2 α_0^{2ν} (1/√n) (y^⊤ R_{θ_0}^{−1} y − E(y^⊤ R_{θ_0}^{−1} y)).

Since y^⊤ R_{θ_0}^{−1} y is a sum of squares of n independent standard Gaussian variables, we have

√n (σ̂^2(α_0) α_0^{2ν} − σ_0^2 α_0^{2ν}) →_d N(0, 2 (σ_0^2 α_0^{2ν})^2).

Hence, from (23) and Slutsky's lemma,

√n (σ̂^2(α) α^{2ν} − σ_0^2 α_0^{2ν}) →_d N(0, 2 (σ_0^2 α_0^{2ν})^2).

This concludes the proof.

Finally, the above central limit theorem relies on an arbitrary fixed choice of α. Later, this central limit theorem was refined by [33], based on intermediary results from [59], to allow for an arbitrary estimator of α. More precisely, [33] proves the following.

Theorem 13 ([33]). Let (α̂_n)_{n ∈ N} be any sequence of random variables in [α_inf, α_sup]. Then as n → ∞,

√n (σ̂^2(α̂_n) α̂_n^{2ν} − σ_0^2 α_0^{2ν}) →_d N(0, 2 (σ_0^2 α_0^{2ν})^2).

In particular, with a maximum likelihood estimator (σ̂^2_ML, α̂_ML), we have σ̂^2_ML = σ̂^2(α̂_ML) and thus Theorem 13 implies

√n (σ̂^2_ML α̂_ML^{2ν} − σ_0^2 α_0^{2ν}) →_d N(0, 2 (σ_0^2 α_0^{2ν})^2),

which is the asymptotic normality of maximum likelihood estimators of the microergodic parameter in the family of isotropic Matérn covariance functions. Note that σ̂^2_ML and α̂_ML typically do not converge separately to fixed quantities [63].

We remark that the bounds α_inf and α_sup that define maximum likelihood estimators have no impact on Theorem 13 as long as they are finite, non-zero and fixed independently of n.
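Theorem 13 can be illustrated numerically. In the sketch below (all numerical values are illustrative, and `sigma_sq_hat` is a hypothetical helper implementing (21)), we take ν = 1/2 so that the Matérn kernel (20) reduces to the exponential kernel σ^2 exp(−α|s − t|) and the microergodic parameter σ^2 α^{2ν} is simply σ^2 α. The plug-in estimator σ̂^2(α) is computed for several fixed values of α: the estimator itself varies strongly with α, but the product σ̂^2(α) α barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)

# nu = 1/2: the Matern kernel (20) is sigma^2 * exp(-alpha * |s - t|) and the
# microergodic parameter sigma^2 * alpha^{2 nu} is sigma^2 * alpha.
# True parameters (illustrative values): sigma0^2 = 2, alpha0 = 1.5.
n = 400
sigma0_sq, alpha0 = 2.0, 1.5
x = np.sort(rng.uniform(0.0, 1.0, n))        # fixed domain [0, 1]
dist = np.abs(x[:, None] - x[None, :])

# Simulate y ~ N(0, R_{theta0}) via a Cholesky factorization (tiny jitter
# added for numerical stability).
R0 = sigma0_sq * np.exp(-alpha0 * dist)
y = np.linalg.cholesky(R0 + 1e-10 * np.eye(n)) @ rng.standard_normal(n)

def sigma_sq_hat(alpha):
    """Profile maximum likelihood estimator (21): (1/n) y^T Sigma_alpha^{-1} y."""
    Sigma_alpha = np.exp(-alpha * dist)      # correlation matrix at scale alpha
    return y @ np.linalg.solve(Sigma_alpha, y) / n

# sigma_sq_hat(alpha) changes strongly with the (arbitrary) choice of alpha,
# but the microergodic product sigma_sq_hat(alpha) * alpha stays nearly
# constant, close to sigma0^2 * alpha0 = 3.
products = {a: sigma_sq_hat(a) * a for a in (0.5, 1.5, 4.0)}
```

This mirrors the near-flatness of the likelihood along the ridge σ^2 α^{2ν} = constant observed in [62], and the insensitivity to the choice of α guaranteed by Theorem 13.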
The proof techniques of [33] do not allow for α_inf and α_sup depending on n and going to zero or infinity. In practice, one also usually takes bounds 0 < α_inf < α_sup < ∞ for α to implement maximum likelihood. A common practice is then to take α_inf and α_sup of the same orders as the inverses of the maximum and minimum distances between two distinct observation points.

Notice that all the results reviewed here for the Matérn model assume ν to be fixed and known. We are not aware of any consistency results for maximum likelihood estimators of ν under fixed-domain asymptotics. Nevertheless, there exist estimation techniques other than maximum likelihood that are shown to be able to estimate ν consistently, in particular variation-based estimators [20, 31, 40].

We also remark that the results and proof techniques of Section 4 and Appendix A are intrinsically specific to the Matérn covariance functions. Nevertheless, [18] recently managed to extend them to the family of Wendland covariance functions. Finding other families of covariance functions for which similar extensions would be possible is an interesting topic for future research.

B Expectations and covariances of quadratic forms of a Gaussian vector
Let r ∈ N, let V be a centered Gaussian vector of size r × 1 and let A and B be fixed symmetric r × r matrices. Then we have

E(V^⊤ A V) = tr(A cov(V))

and

cov(V^⊤ A V, V^⊤ B V) = 2 tr(A cov(V) B cov(V)),

from, for instance, (A.6) and (A.7) in [45].

References

[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, ninth Dover printing, tenth GPO printing edition, 1964.
[2] R. Adler. The Geometry of Random Fields. Wiley, New York, 1981.
[3] R. J. Adler. An introduction to continuity, extrema, and related topics for general Gaussian processes. IMS, 1990.
[4] E. Anderes. On the consistent separation of scale and variance for Gaussian random fields. Annals of Statistics, 38:870–893, 2010.
[5] J.-M. Azaïs and M. Wschebor. Level Sets and Extrema of Random Processes and Fields. John Wiley & Sons, 2009.
[6] F. Bachoc. Cross validation and maximum likelihood estimations of hyper-parameters of Gaussian processes with model misspecification. Computational Statistics and Data Analysis, 66:55–69, 2013.
[7] F. Bachoc. Parametric estimation of covariance function in Gaussian-process based Kriging models. Application to uncertainty quantification for computer experiments. PhD thesis, Université Paris-Diderot - Paris VII, 2013. Available at https://tel.archives-ouvertes.fr/tel-00881002/document.
[8] F. Bachoc. Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes. Journal of Multivariate Analysis, 125:1–35, 2014.
[9] F. Bachoc. Asymptotic analysis of covariance parameter estimation for Gaussian processes in the misspecified case. Bernoulli, 24(2):1531–1575, 2018.
[10] F. Bachoc, K. Ammar, and J. Martinez. Improvement of code behavior in a design of experiments by metamodeling. Nuclear Science and Engineering, 183(3):387–406, 2016.
[11] F. Bachoc, J. Bétancourt, R. Furrer, and T. Klein. Asymptotic properties of the maximum likelihood and cross validation estimators for transformed Gaussian processes. Electronic Journal of Statistics, 14(1):1962–2008, 2020.
[12] F. Bachoc, M. Bevilacqua, and D. Velandia. Composite likelihood estimation for a Gaussian process under fixed domain asymptotics. Journal of Multivariate Analysis, 174:104534, 2019.
[13] F. Bachoc, G. Bois, J. Garnier, and J.-M. Martinez. Calibration and improved prediction of computer models by universal Kriging. Nuclear Science and Engineering, 176(1):81–97, 2014.
[14] F. Bachoc and R. Furrer. On the smallest eigenvalues of covariance matrices of multivariate spatial processes. Stat, 5(1):102–107, 2016.
[15] F. Bachoc, A. Lagnoux, and A. F. López-Lopera. Maximum likelihood estimation for Gaussian processes under inequality constraints. Electronic Journal of Statistics, 13(2):2921–2969, 2019.
[16] F. Bachoc, A. Lagnoux, and T. M. N. Nguyen. Cross-validation estimation of covariance parameters under fixed-domain asymptotics. Journal of Multivariate Analysis, 160:42–67, 2017.
[17] J. Bect, D. Ginsbourger, L. Li, V. Picheny, and E. Vazquez. Sequential design of computer experiments for the estimation of a probability of failure. Statistics and Computing, 22(3):773–793, 2012.
[18] M. Bevilacqua, T. Faouzi, R. Furrer, and E. Porcu. Estimation and prediction using generalized Wendland covariance functions under fixed domain asymptotics. The Annals of Statistics, 47(2):828–856, 2019.
[19] M. Bevilacqua and C. Gaetan. Comparing composite likelihood methods based on pairs for spatial Gaussian random fields. Statistics and Computing, 25(5):877–892, 2015.
[20] D. Blanke and C. Vial. Global smoothness estimation of a Gaussian process from general sequence designs. Electronic Journal of Statistics, 8(1):1152–1187, 2014.
[21] N. Cressie and S. Lahiri. The asymptotic distribution of REML estimators. Journal of Multivariate Analysis, 45:217–233, 1993.
[22] J. Du, H. Zhang, and V. Mandrekar. Fixed-domain asymptotic properties of tapered maximum likelihood estimators. The Annals of Statistics, 37:3330–3361, 2009.
[23] R. Furrer, F. Bachoc, and J. Du. Asymptotic properties of multivariate tapering for estimation and prediction. Journal of Multivariate Analysis, 149:177–191, 2016.
[24] M. G. Genton and W. Kleiber. Cross-covariance functions for multivariate geostatistics. Statistical Science, 30(2):147–163, 2015.
[25] I. Gikhman and A. Skorokhod. The Theory of Stochastic Processes II. Springer Science & Business Media, 2004.
[26] E. Giné and R. Nickl. Mathematical Foundations of Infinite-Dimensional Statistical Models, volume 40. Cambridge University Press, 2016.
[27] T. Gneiting, W. Kleiber, and M. Schlather. Matérn cross-covariance functions for multivariate random fields. Journal of the American Statistical Association, 105(491):1167–1177, 2010.
[28] T. Gneiting and M. Schlather. Stochastic models that separate fractal dimension and the Hurst effect. SIAM Review, 46(2):269–282, 2004.
[29] M. Hallin, Z. Lu, and K. Yu. Local linear spatial quantile regression. Bernoulli, 22(1):659–686, 2009.
[30] I. Ibragimov and Y. Rozanov. Gaussian Random Processes. Springer-Verlag, New York, 1978.
[31] J. Istas and G. Lang. Quadratic variations and estimation of the local Hölder index of a Gaussian process. Ann. Inst. H. Poincaré Probab. Statist., 33(4):407–436, 1997.
[32] D. Jones, M. Schonlau, and W. Welch. Efficient global optimization of expensive black box functions. Journal of Global Optimization, 13:455–492, 1998.
[33] C. Kaufman and B. Shaby. The role of the range parameter for estimation and prediction in geostatistics. Biometrika, 100:473–484, 2013.
[34] M. C. Kennedy and A. O'Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3):425–464, 2001.
[35] S. Lahiri and P. Robinson. Central limit theorems for long range dependent spatial linear processes. Bernoulli, 22(1):345–375, 2016.
[36] S. N. Lahiri. Central limit theorems for weighted sums of a spatial process under a class of stochastic and fixed designs. Sankhyā: The Indian Journal of Statistics, 65:356–388, 2003.
[37] S. N. Lahiri and K. Mukherjee. Asymptotic distributions of M-estimators in a spatial regression model under some fixed and stochastic spatial sampling designs. Annals of the Institute of Statistical Mathematics, 56:225–250, 2004.
[38] W. Loh. Fixed domain asymptotics for a subclass of Matérn type Gaussian random fields. Annals of Statistics, 33:2344–2394, 2005.
[39] W. Loh and T. Lam. Estimating structured correlation matrices in smooth Gaussian random field models. Annals of Statistics, 28:880–904, 2000.
[40] W.-L. Loh. Estimating the smoothness of a Gaussian random field from irregularly spaced data via higher-order quadratic variations. The Annals of Statistics, 43(6):2766–2794, 2015.
[41] A. F. López-Lopera, F. Bachoc, N. Durrande, and O. Roustant. Finite-dimensional Gaussian approximation with linear inequality constraints. SIAM/ASA Journal on Uncertainty Quantification, 6(3):1224–1255, 2018.
[42] K. Mardia and R. Marshall. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71:135–146, 1984.
[43] G. Matheron. La Théorie des Variables Régionalisées et ses Applications. Fascicule 5 in Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau. École Nationale Supérieure des Mines de Paris, 1970.
[44] I. Molchanov. Theory of Random Sets. Springer, 2005.
[45] M. S. Paolella. Linear Models and Time-Series Analysis: Regression, ANOVA, ARMA and GARCH. John Wiley & Sons, 2018.
[46] R. Paulo, G. Garcia-Donato, and J. Palomo. Calibration of computer models with multivariate output. Computational Statistics and Data Analysis, 56:3959–3974, 2012.
[47] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, 2006.
[48] J. Sacks, W. Welch, T. Mitchell, and H. Wynn. Design and analysis of computer experiments. Statistical Science, 4:409–423, 1989.
[49] T. Santner, B. Williams, and W. Notz. The Design and Analysis of Computer Experiments. Springer, New York, 2003.
[50] B. A. Shaby and D. Ruppert. Tapered covariance: Bayesian estimation and asymptotics. Journal of Computational and Graphical Statistics, 21(2):433–452, 2012.
[51] M. Stein. Asymptotically efficient prediction of a random field with a misspecified covariance function. Annals of Statistics, 16:55–63, 1988.
[52] M. Stein. Bounds on the efficiency of linear predictions using an incorrect covariance function. Annals of Statistics, 18:1116–1138, 1990.
[53] M. Stein. A comparison of generalized cross validation and modified maximum likelihood for estimating the parameters of a stochastic process. Annals of Statistics, 18:1139–1157, 1990.
[54] M. Stein. Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure. Annals of Statistics, 18:850–872, 1990.
[55] M. Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer-Verlag, New York, 1999.
[56] T. Sweeting. Uniform asymptotic normality of the maximum likelihood estimator. Annals of Statistics, 8:1375–1381, 1980.
[57] A. W. Van der Vaart. Maximum likelihood estimation under a spatial sampling scheme. Annals of Statistics, 24(5):2049–2057, 1996.
[58] A. W. Van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 2000.
[59] D. Wang and W.-L. Loh. On fixed-domain asymptotics and covariance tapering in Gaussian random field models. Electronic Journal of Statistics, 5:238–269, 2011.
[60] Z. Ying. Asymptotic properties of a maximum likelihood estimator with data from a Gaussian process. Journal of Multivariate Analysis, 36:280–296, 1991.
[61] Z. Ying. Maximum likelihood estimation of parameters under a spatial sampling scheme. Annals of Statistics, 21:1567–1590, 1993.
[62] H. Zhang. Inconsistent estimation and asymptotically equivalent interpolations in model-based geostatistics. Journal of the American Statistical Association, 99:250–261, 2004.
[63] H. Zhang and D. Zimmerman. Toward reconciling two asymptotic frameworks in spatial statistics. Biometrika, 92:921–936, 2005.