Augmented Gaussian Random Field: Theory and Computation
Sheng Zhang, Xiu Yang∗, Samy Tindel, and Guang Lin†

Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA
Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA 18015, USA
School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, USA
Abstract:
We propose a rigorous theoretical foundation for incorporating data of the observable and its derivatives of any order in a Gaussian-random-field-based surrogate model using tools in real analysis and probability. We demonstrate that under some conditions, the random field representing the derivatives is a Gaussian random field (GRF) given that its structure is derived from the GRF regressing the data of the observable. We propose an augmented Gaussian random field (AGRF) framework to unify these GRFs and calculate the prediction of this surrogate model in a similar manner as the conventional Gaussian process regression method. A prominent advantage of our method is that it can incorporate arbitrary order derivatives and deal with missing data.
Keywords:
Gaussian random field, Gaussian process regression, arbitrary order derivatives, missing data
Gaussian random field (GRF) has been widely used in scientific and engineering studies to construct a surrogate model (also called response surface or metamodel in different areas) of a complex system's observable based on available observations. In particular, its special case, the Gaussian process (GP), has become a powerful tool in applied math, statistics, machine learning, etc. [26]. Although random processes originally refer to one-dimensional random fields [1], e.g., models describing time-dependent systems, the terminology GP is interchangeable with GRF now in most application scenarios that involve high-dimensional systems. Also, in different areas, GRF-based (or GP-based) methods have different names. For example, in geostatistics, GP regression is referred to as Kriging, and it has multiple variants [27, 12].

To enhance the prediction accuracy of the GRF-based surrogate model, one can incorporate all the additional information available, such as gradients, Hessian, multi-fidelity data, physical laws, and empirical knowledge. For example, gradient-enhanced Kriging (GEK) uses gradients in either a direct [18] or indirect [3] way to improve the accuracy; multi-fidelity Cokriging combines a small amount of high-fidelity data with a large amount of low-fidelity data from simplified or reduced models, in order to leverage low-fidelity models for speedup, while using high-fidelity data to establish accuracy and convergence guarantees [11, 9, 15, 23]; physics-informed Kriging takes advantage of well-developed simulation tools to incorporate physical laws in the resulting GRF [32, 31, 33]; GP-based numerical ordinary/partial differential equation solvers intrinsically impose the equations in the structure of the GP, and they are among the most important tools in probabilistic scientific computing [28, 10, 2, 24, 25, 4, 19, 20, 21]; inequality constraints that are not explicitly reflected in governing equations, e.g., positivity, monotonicity, can also be imposed to enhance accuracy and reduce uncertainty in the prediction [6, 16, 17, 22].

Despite the success in applying the aforementioned GRF-based methods to the construction of surrogate models for practical problems, the theory related to the accuracy, convergence, uncertainty, etc. of these methods is not well developed in general. In particular, when using the observable and its derivatives (such as gradient) jointly, e.g., to solve differential equations, one usually assumes that the random field representing the derivative is also a GRF, and that its mean and covariance functions can be calculated by taking derivatives of the mean and covariance functions of the GRF representing the observable. Also, the correlation between the observable and its derivative is calculated accordingly. Intuitively, this is correct, because a linear combination of multiple Gaussian random variables is still a Gaussian random variable. However, taking the limit of a linear combination is a critical step in the definition of the derivative, which makes the theoretical guarantee of using derivatives in GRF-based methods less obvious. To the best of our knowledge, there is no comprehensive theory on the validity of the aforementioned assumption on the random field representing the derivative nor on its relation with the observable.
Most literature uses the linearity of the derivative operator to validate this assumption (e.g., [29]), which is questionable.

In this work, we provide a rigorous theoretical framework to incorporate arbitrary order derivatives in GRF-based methods using tools in real analysis and probability. We demonstrate that under some conditions, the random field representing the observable's derivative, obtained by taking the derivative of the GRF representing the observable, is a GRF, and its structure can be derived from the GRF representing the observable. We propose an augmented Gaussian random field (AGRF) framework to unify these GRFs and calculate the posterior distribution of the observable and its derivatives at new locations in a similar manner as the conventional GP regression. Our theoretical and computational framework is universal as it can incorporate derivatives of arbitrary order. Moreover, our approach can deal with missing data without difficulty, which is very useful in practice.

This paper is organized as follows. The theoretical framework is presented in Section 2, the computational framework is shown in Section 3, the numerical examples are given in Section 4, and the conclusion follows in Section 5.

Theoretical framework

In this section, we briefly review some fundamental definitions and an important theorem for GRF. Then we present our main theoretical framework, i.e., the AGRF, that unifies the GRFs representing the observable and its derivative. Finally, we extend it to the general case to incorporate information of derivatives of arbitrary order. For notational brevity, we consider the system's observable as a univariate function of the physical location or time. The extension to the multivariate case is straightforward. Therefore, our theoretical framework is applicable to GP methods that use the information of the gradient as well as arbitrary order derivatives of the observable.
In this paper, the Euclidean space R refers to R equipped with the Euclidean metric and the Lebesgue measure on Lebesgue-measurable sets. We begin with the definition of random fields.

Definition 2.1. Let (Ω, F, P) be a probability space and X be a set. Suppose f : Ω × X → R is a mapping such that for each x ∈ X, f(·, x) : Ω → R is a random variable (or measurable function). Then f is called a real-valued random field on X.

We note that, in practical problems, X is typically a subset of the d-dimensional Euclidean space R^d, i.e., X ⊆ R^d. Here, X is considered as a general set as in [8]. Next, we define Gaussian random fields as follows:

Definition 2.2. Suppose f is a real-valued random field on X such that for every finite set of indices x_1, . . . , x_p ∈ X, (f(ω, x_1), . . . , f(ω, x_p)) is a multivariate Gaussian random variable, or, equivalently, every linear combination of (f(ω, x_1), . . . , f(ω, x_p)) has a univariate Gaussian (or normal) distribution. Then f is called a Gaussian random field on X.

Here ω is an element in the sample space Ω. A Gaussian random field is characterized by its mean and covariance functions:

Definition 2.3. Given a Gaussian random field f on X, its mean function is defined as the expectation
\[
m(x) = E[f(\omega, x)], \tag{1}
\]
and its covariance function is defined as
\[
k(x, x') = E[(f(\omega, x) - m(x))(f(\omega, x') - m(x'))]. \tag{2}
\]
Here, the covariance function is also called the kernel function. On the other hand, the following theorem provides a perspective in the converse way:

Theorem 2.4 (Kolmogorov consistency theorem). Let X be any set, m : X → R any function, and k : X × X → R such that k is symmetric and nonnegative definite. Then there exists a Gaussian random field on X with mean function m and covariance function k.

Proof. See [13] (p. 176) or [8] (p. 443).
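The finite-dimensional characterization in Definition 2.2 and Theorem 2.4 is also how a GRF is used in computation: restricted to any finite set of points, it is a multivariate Gaussian determined by m and k. A minimal sketch in Python with NumPy (the squared exponential kernel and the hyperparameter values are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

def m(x):
    """Mean function m(x); zero mean is assumed here for illustration."""
    return np.zeros_like(x)

def k(x, xp, sigma=1.0, l=0.2):
    """Squared exponential kernel k(x, x') = sigma^2 exp(-(x - x')^2 / (2 l^2))."""
    return sigma**2 * np.exp(-(x[:, None] - xp[None, :])**2 / (2.0 * l**2))

# On a finite set of indices x_1, ..., x_p, the GRF is a multivariate Gaussian
# with mean vector m(x_i) and covariance matrix k(x_i, x_j).
x = np.linspace(0.0, 1.0, 200)
mean = m(x)
cov = k(x, x) + 1e-10 * np.eye(x.size)   # small jitter for numerical positive definiteness

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=3)   # three realizations f(omega_i, .)
print(samples.shape)   # (3, 200)
```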
We start our theoretical results with a limit in a Gaussian random field related to the derivative of its realization. The following theorem is a classic result with a proof in [5] (p. 84). Here, we adapt the statement into our framework and reorganize the proof.
Theorem 2.5.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. Then there exists a real-valued random field Df on R such that for each fixed x ∈ R,
\[
\frac{f(\omega, x + \delta) - f(\omega, x)}{\delta} \xrightarrow[\delta \to 0]{\mathrm{m.s.}} Df(\omega, x). \tag{3}
\]

Remark 2.6.
Here "m.s." stands for mean-square convergence, i.e., convergence in L². Since mean-square convergence implies convergence in probability and convergence in distribution, the limit in Theorem 2.5 also holds in probability and in distribution.

Proof of Theorem 2.5. Denote
\[
k_{11}(x, x') = \frac{\partial^2}{\partial x \partial x'} k(x, x'). \tag{4}
\]
Let x ∈ R be fixed. For any δ ∈ R \ {0}, define
\[
\xi_\delta = \xi_\delta(\omega) = \frac{f(\omega, x + \delta) - f(\omega, x)}{\delta}. \tag{5}
\]
Suppose δ, τ ∈ R \ {0}. Then
\begin{align}
E[\xi_\delta \xi_\tau]
&= E\left[ \frac{(f(\omega, x+\delta) - f(\omega, x))(f(\omega, x+\tau) - f(\omega, x))}{\delta\tau} \right] \nonumber \\
&= \frac{1}{\delta\tau} \big\{ E[f(\omega, x+\delta) f(\omega, x+\tau)] - E[f(\omega, x+\delta) f(\omega, x)] - E[f(\omega, x) f(\omega, x+\tau)] + E[f(\omega, x) f(\omega, x)] \big\} \nonumber \\
&= \frac{1}{\delta\tau} \big\{ k(x+\delta, x+\tau) + m(x+\delta) m(x+\tau) - k(x+\delta, x) - m(x+\delta) m(x) - k(x, x+\tau) - m(x) m(x+\tau) + k(x, x) + m(x) m(x) \big\} \nonumber \\
&= \frac{1}{\delta\tau} \big\{ k(x+\delta, x+\tau) - k(x+\delta, x) - k(x, x+\tau) + k(x, x) \big\} + \frac{1}{\delta\tau} \big\{ m(x+\delta) m(x+\tau) - m(x+\delta) m(x) - m(x) m(x+\tau) + m(x) m(x) \big\}. \tag{6}
\end{align}
Since m(x) is differentiable and k(x, x') is twice continuously differentiable,
\[
\lim_{\delta, \tau \to 0} E[\xi_\delta \xi_\tau] = k_{11}(x, x) + m'(x) m'(x). \tag{7}
\]
Therefore,
\[
\lim_{\delta, \tau \to 0} E[|\xi_\delta - \xi_\tau|^2] = \lim_{\delta, \tau \to 0} \big\{ E[\xi_\delta \xi_\delta] + E[\xi_\tau \xi_\tau] - 2 E[\xi_\delta \xi_\tau] \big\} = 0. \tag{8}
\]
Choose a sequence τ_i → 0 (i = 1, 2, . . .) such that
\[
E[|\xi_{\tau_{i+1}} - \xi_{\tau_i}|^2] \le \frac{1}{4^i}. \tag{9}
\]
Then
\[
E|\xi_{\tau_{i+1}} - \xi_{\tau_i}| \le \sqrt{E[|\xi_{\tau_{i+1}} - \xi_{\tau_i}|^2]} \le \frac{1}{2^i}. \tag{10}
\]
By the monotone convergence theorem,
\[
E\left[ \sum_{i=1}^{\infty} |\xi_{\tau_{i+1}} - \xi_{\tau_i}| \right] = \sum_{i=1}^{\infty} E|\xi_{\tau_{i+1}} - \xi_{\tau_i}| \le 1 < \infty. \tag{11}
\]
Thus,
\[
P\left( \sum_{i=1}^{\infty} |\xi_{\tau_{i+1}} - \xi_{\tau_i}| < \infty \right) = 1. \tag{12}
\]
So the random variable η = ξ_{τ_1} + ∑_{i=1}^{∞} (ξ_{τ_{i+1}} − ξ_{τ_i}) is well defined, and it is a candidate to be the limit in Eq. (3). Now, by the monotone convergence theorem and the Cauchy-Schwarz inequality, for any j ≥ 1,
\begin{align}
E[|\eta - \xi_{\tau_j}|^2]
&= E\left[ \Big| \sum_{i=j}^{\infty} (\xi_{\tau_{i+1}} - \xi_{\tau_i}) \Big|^2 \right]
\le E\left[ \Big( \sum_{i=j}^{\infty} |\xi_{\tau_{i+1}} - \xi_{\tau_i}| \Big)^2 \right]
= E\left[ \sum_{i=j}^{\infty} \sum_{i'=j}^{\infty} |\xi_{\tau_{i+1}} - \xi_{\tau_i}| \, |\xi_{\tau_{i'+1}} - \xi_{\tau_{i'}}| \right] \nonumber \\
&= \sum_{i=j}^{\infty} \sum_{i'=j}^{\infty} E\big[ |\xi_{\tau_{i+1}} - \xi_{\tau_i}| \, |\xi_{\tau_{i'+1}} - \xi_{\tau_{i'}}| \big]
\le \sum_{i=j}^{\infty} \sum_{i'=j}^{\infty} \sqrt{E[|\xi_{\tau_{i+1}} - \xi_{\tau_i}|^2] \, E[|\xi_{\tau_{i'+1}} - \xi_{\tau_{i'}}|^2]} \nonumber \\
&\le \sum_{i=j}^{\infty} \sum_{i'=j}^{\infty} \sqrt{\frac{1}{4^i} \frac{1}{4^{i'}}}
= \sum_{i=j}^{\infty} \frac{1}{2^i} \sum_{i'=j}^{\infty} \frac{1}{2^{i'}}
= \left( \frac{1}{2^{j-1}} \right)^2. \tag{13}
\end{align}
Since
\[
E[|\eta - \xi_\delta|^2] \le E[2|\eta - \xi_{\tau_j}|^2 + 2|\xi_{\tau_j} - \xi_\delta|^2] = 2 E[|\eta - \xi_{\tau_j}|^2] + 2 E[|\xi_{\tau_j} - \xi_\delta|^2], \tag{14}
\]
we have
\[
\lim_{\delta \to 0} E[|\eta - \xi_\delta|^2] = 0, \tag{15}
\]
or
\[
\xi_\delta \xrightarrow[\delta \to 0]{\mathrm{m.s.}} \eta. \tag{16}
\]
Let Df(ω, x) = η, do the same for every x, and the proof is complete.

Remark 2.7. For δ ∈ R \ {0}, let
\[
f_\delta = f_\delta(\omega, x) = \frac{f(\omega, x + \delta) - f(\omega, x)}{\delta} \tag{17}
\]
be a random field on R. One could also consider the convergence of the family {f_δ | δ ∈ R \ {0}} to Df. We have refrained from getting into this type of consideration for the sake of conciseness.

The next lemma indicates that the limit of a sequence of Gaussian random variables is still a Gaussian random variable. This is a classic result with various statements and proofs. Here, we give our statement and proof that fits into our framework.
Lemma 2.8.
Let (Ω, F, P) be a probability space and ξ_δ (δ ∈ R) be a family of random variables such that the ξ_δ (δ ≠ 0) are Gaussian random variables with mean μ_δ and variance σ_δ², and
\[ \xi_\delta \xrightarrow[\delta \to 0]{\mathrm{m.s.}} \xi_0. \tag{18} \]
Then ξ_0 has a Gaussian distribution with mean μ_0 and variance σ_0², and
\[ \lim_{\delta \to 0} \mu_\delta = \mu_0, \qquad \lim_{\delta \to 0} \sigma_\delta^2 = \sigma_0^2. \tag{19} \]

Proof. Suppose δ, τ ∈ R \ {0}. Since
\[ \lim_{\delta \to 0} E[|\xi_\delta - \xi_0|^2] = 0 \tag{20} \]
and
\[ E[|\xi_\delta - \xi_\tau|^2] \le 2 E[|\xi_\delta - \xi_0|^2] + 2 E[|\xi_\tau - \xi_0|^2], \tag{21} \]
we have
\[ \lim_{\delta, \tau \to 0} E[|\xi_\delta - \xi_\tau|^2] = 0. \tag{22} \]
Thus,
\[ |\mu_\delta - \mu_\tau| = |E[\xi_\delta - \xi_\tau]| \le E|\xi_\delta - \xi_\tau| \le \sqrt{E[|\xi_\delta - \xi_\tau|^2]}, \tag{23} \]
which indicates
\[ \lim_{\delta, \tau \to 0} |\mu_\delta - \mu_\tau| = 0. \tag{24} \]
Hence, there exists μ_0 ∈ R such that
\[ \lim_{\delta \to 0} \mu_\delta = \mu_0. \tag{25} \]
By (22), there exists Δ > 0 such that for all δ (0 < |δ| ≤ Δ), we have
\[ E[|\xi_\delta - \xi_\Delta|^2] \le 1. \tag{26} \]
So
\[ E[\xi_\delta^2] \le 2 E[|\xi_\delta - \xi_\Delta|^2] + 2 E[\xi_\Delta^2] \le 2 + 2 E[\xi_\Delta^2]. \tag{27} \]
Since ξ_Δ is Gaussian, E[ξ_Δ²] is finite. Now, for any δ (0 < |δ| ≤ Δ) and τ (0 < |τ| ≤ Δ), by the Cauchy-Schwarz inequality,
\begin{align}
|\sigma_\delta^2 - \sigma_\tau^2|
&= |(E[\xi_\delta^2] - \mu_\delta^2) - (E[\xi_\tau^2] - \mu_\tau^2)|
\le |E[\xi_\delta^2] - E[\xi_\tau^2]| + |\mu_\delta^2 - \mu_\tau^2| \nonumber \\
&\le E|\xi_\delta^2 - \xi_\tau^2| + |\mu_\delta^2 - \mu_\tau^2|
= E\big[|\xi_\delta - \xi_\tau| \, |\xi_\delta + \xi_\tau|\big] + |\mu_\delta^2 - \mu_\tau^2| \nonumber \\
&\le \sqrt{E[|\xi_\delta - \xi_\tau|^2]} \sqrt{E[|\xi_\delta + \xi_\tau|^2]} + |\mu_\delta^2 - \mu_\tau^2|
\le \sqrt{E[|\xi_\delta - \xi_\tau|^2]} \sqrt{2 E[\xi_\delta^2] + 2 E[\xi_\tau^2]} + |\mu_\delta^2 - \mu_\tau^2| \nonumber \\
&\le \sqrt{E[|\xi_\delta - \xi_\tau|^2]} \sqrt{8 + 8 E[\xi_\Delta^2]} + |\mu_\delta^2 - \mu_\tau^2|. \tag{28}
\end{align}
By Eqs. (22), (25), and (28), we have
\[ \lim_{\delta, \tau \to 0} |\sigma_\delta^2 - \sigma_\tau^2| = 0. \tag{29} \]
Since σ_δ² ≥ 0 for δ ∈ R \ {0}, there exists σ_0² ≥ 0 such that
\[ \lim_{\delta \to 0} \sigma_\delta^2 = \sigma_0^2. \tag{30} \]
If σ_0² = 0, then
\begin{align}
E|\xi_0 - \mu_0|
&\le E|\xi_0 - \xi_\delta| + E|\xi_\delta - \mu_\delta| + |\mu_\delta - \mu_0| \nonumber \\
&\le \sqrt{E[|\xi_0 - \xi_\delta|^2]} + \sqrt{E[|\xi_\delta - \mu_\delta|^2]} + |\mu_\delta - \mu_0|
= \sqrt{E[|\xi_0 - \xi_\delta|^2]} + \sqrt{\sigma_\delta^2} + |\mu_\delta - \mu_0|. \tag{31}
\end{align}
Let δ → 0, then
\[ E|\xi_0 - \mu_0| = 0, \tag{32} \]
or
\[ P(\xi_0 = \mu_0) = 1. \tag{33} \]
Therefore, ξ_0 has a degenerate Gaussian distribution with mean μ_0 and variance 0. If σ_0² > 0, then there exists Δ' > 0 such that for all δ (0 < |δ| ≤ Δ'), we have
\[ \sigma_\delta^2 \ge \sigma_0^2 / 2 > 0. \tag{34} \]
For all δ (|δ| ≤ Δ'), define
\[ \eta_\delta = \frac{\xi_\delta - \mu_\delta}{\sigma_\delta}. \tag{35} \]
Then, when 0 < |δ| ≤ Δ', η_δ has the standard Gaussian distribution. Now,
\begin{align}
E[|\eta_\delta - \eta_0|^2]
&= E\left[ \Big| \frac{\xi_\delta - \mu_\delta}{\sigma_\delta} - \frac{\xi_0 - \mu_0}{\sigma_0} \Big|^2 \right]
\le 2 E\left[ \Big| \frac{\xi_\delta}{\sigma_\delta} - \frac{\xi_0}{\sigma_0} \Big|^2 \right] + 2 \Big| \frac{\mu_\delta}{\sigma_\delta} - \frac{\mu_0}{\sigma_0} \Big|^2 \nonumber \\
&\le 4 E\left[ \Big| \frac{\xi_\delta}{\sigma_\delta} - \frac{\xi_\delta}{\sigma_0} \Big|^2 \right] + 4 E\left[ \Big| \frac{\xi_\delta}{\sigma_0} - \frac{\xi_0}{\sigma_0} \Big|^2 \right] + 2 \Big| \frac{\mu_\delta}{\sigma_\delta} - \frac{\mu_0}{\sigma_0} \Big|^2 \nonumber \\
&= 4 \left( \frac{1}{\sigma_\delta} - \frac{1}{\sigma_0} \right)^2 E[\xi_\delta^2] + \frac{4}{\sigma_0^2} E[|\xi_\delta - \xi_0|^2] + 2 \Big| \frac{\mu_\delta}{\sigma_\delta} - \frac{\mu_0}{\sigma_0} \Big|^2 \nonumber \\
&= 4 \frac{(\sigma_0 - \sigma_\delta)^2}{\sigma_\delta^2 \sigma_0^2} (\sigma_\delta^2 + \mu_\delta^2) + \frac{4}{\sigma_0^2} E[|\xi_\delta - \xi_0|^2] + 2 \Big| \frac{\mu_\delta}{\sigma_\delta} - \frac{\mu_0}{\sigma_0} \Big|^2. \tag{36}
\end{align}
Thus,
\[ \lim_{\delta \to 0} E[|\eta_\delta - \eta_0|^2] = 0. \tag{37} \]
Let
\[ \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt \tag{38} \]
be the cumulative distribution function (CDF) of the standard Gaussian distribution. For any z ∈ R and any ε > 0,
\[ P(\eta_0 \le z) \le P(\eta_\delta \le z + \varepsilon) + P(|\eta_\delta - \eta_0| > \varepsilon) \le \Phi(z + \varepsilon) + \frac{E[|\eta_\delta - \eta_0|^2]}{\varepsilon^2}. \tag{39} \]
Let δ → 0, then
\[ P(\eta_0 \le z) \le \Phi(z + \varepsilon). \tag{40} \]
Let ε → 0, then
\[ P(\eta_0 \le z) \le \Phi(z). \tag{41} \]
On the other hand,
\[ P(\eta_0 > z) \le P(\eta_\delta > z - \varepsilon) + P(|\eta_\delta - \eta_0| > \varepsilon) \le 1 - \Phi(z - \varepsilon) + \frac{E[|\eta_\delta - \eta_0|^2]}{\varepsilon^2}. \tag{42} \]
Let δ → 0, then
\[ P(\eta_0 > z) \le 1 - \Phi(z - \varepsilon). \tag{43} \]
Let ε → 0, then
\[ P(\eta_0 > z) \le 1 - \Phi(z), \tag{44} \]
or
\[ P(\eta_0 \le z) \ge \Phi(z). \tag{45} \]
Combining (41) and (45), we have
\[ P(\eta_0 \le z) = \Phi(z). \tag{46} \]
Thus, η_0 has the standard Gaussian distribution, which implies that ξ_0 has a Gaussian distribution with mean μ_0 and variance σ_0².

Now we define an augmented random field consisting of a Gaussian random field f and the Df associated with it. Then we prove that this augmented random field is a Gaussian random field.

Definition 2.9.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. The real-valued random field Df defined in Theorem 2.5 is called the derivative random field of f. Define a real-valued random field on R × {0, 1}:
\[ \tilde{f} : \Omega \times \mathbb{R} \times \{0, 1\} \to \mathbb{R} \tag{47} \]
such that
\[ \tilde{f}(\omega, x, 0) = f(\omega, x), \qquad \tilde{f}(\omega, x, 1) = Df(\omega, x). \tag{48} \]
We call $\tilde{f}$ the augmented random field of f.

Theorem 2.10. Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. Then the augmented random field $\tilde{f}$ of f is a Gaussian random field on R × {0, 1}.

Proof. For any p, q ∈ N₊ ∪ {0} such that p + q ≥ 1, any x_1, . . . , x_p, y_1, . . . , y_q ∈ R, and any c_1, . . . , c_p, d_1, . . . , d_q ∈ R, we have the linear combination
\begin{align}
\sum_{i=1}^{p} c_i \tilde{f}(\omega, x_i, 0) + \sum_{j=1}^{q} d_j \tilde{f}(\omega, y_j, 1)
&= \sum_{i=1}^{p} c_i f(\omega, x_i) + \sum_{j=1}^{q} d_j Df(\omega, y_j) \nonumber \\
&= \sum_{i=1}^{p} c_i f(\omega, x_i) + \sum_{j=1}^{q} d_j \lim_{\delta_j \to 0} \frac{f(\omega, y_j + \delta_j) - f(\omega, y_j)}{\delta_j} \nonumber \\
&= \lim_{\delta_1 \to 0} \cdots \lim_{\delta_q \to 0} \left[ \sum_{i=1}^{p} c_i f(\omega, x_i) + \sum_{j=1}^{q} d_j \frac{f(\omega, y_j + \delta_j) - f(\omega, y_j)}{\delta_j} \right], \tag{49}
\end{align}
where the limits are taken in the mean-square sense. Since f is a Gaussian random field,
\[ \sum_{i=1}^{p} c_i f(\omega, x_i) + \sum_{j=1}^{q} d_j \frac{f(\omega, y_j + \delta_j) - f(\omega, y_j)}{\delta_j} \tag{50} \]
has a Gaussian distribution for any δ_1, . . . , δ_q ∈ R \ {0}. By Lemma 2.8, the limit has a Gaussian distribution. As this holds for every linear combination, $\tilde{f}$ is a Gaussian random field.

After proving that the augmented Gaussian random field is well defined, we calculate its mean and covariance functions.

Theorem 2.11.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. Then the augmented random field $\tilde{f}$ of f has mean function
\[ \tilde{m} : \mathbb{R} \times \{0, 1\} \to \mathbb{R} \tag{51} \]
such that
\[ \tilde{m}(x, 0) = m(x), \qquad \tilde{m}(x, 1) = \frac{d}{dx} m(x), \tag{52} \]
and covariance function
\[ \tilde{k} : \mathbb{R} \times \{0, 1\} \times \mathbb{R} \times \{0, 1\} \to \mathbb{R} \tag{53} \]
such that
\[ \tilde{k}(x, 0, x', 0) = k(x, x'), \quad \tilde{k}(x, 0, x', 1) = \frac{\partial}{\partial x'} k(x, x'), \quad \tilde{k}(x, 1, x', 0) = \frac{\partial}{\partial x} k(x, x'), \quad \tilde{k}(x, 1, x', 1) = \frac{\partial^2}{\partial x \partial x'} k(x, x'). \tag{54} \]

Proof. By the definition of $\tilde{f}$,
\[ \tilde{m}(x, 0) = E[\tilde{f}(\omega, x, 0)] = E[f(\omega, x)] = m(x). \tag{55} \]
By Theorem 2.5 and Lemma 2.8,
\[ \tilde{m}(x, 1) = E[\tilde{f}(\omega, x, 1)] = E[Df(\omega, x)] = E\left[ \lim_{\delta \to 0} \frac{f(\omega, x+\delta) - f(\omega, x)}{\delta} \right] = \lim_{\delta \to 0} E\left[ \frac{f(\omega, x+\delta) - f(\omega, x)}{\delta} \right] = \lim_{\delta \to 0} \frac{m(x+\delta) - m(x)}{\delta} = \frac{d}{dx} m(x). \tag{56} \]
Similarly, by definition,
\[ \tilde{k}(x, 0, x', 0) = E[(\tilde{f}(\omega, x, 0) - \tilde{m}(x, 0))(\tilde{f}(\omega, x', 0) - \tilde{m}(x', 0))] = E[(f(\omega, x) - m(x))(f(\omega, x') - m(x'))] = k(x, x'). \tag{57} \]
Also,
\begin{align}
\tilde{k}(x, 0, x', 1)
&= E[\tilde{f}(\omega, x, 0) \tilde{f}(\omega, x', 1)] - E[\tilde{f}(\omega, x, 0)]\, E[\tilde{f}(\omega, x', 1)]
= E[f(\omega, x) Df(\omega, x')] - m(x) m'(x') \nonumber \\
&= E\left[ f(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + E\left[ f(\omega, x) \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right] - m(x) m'(x') \nonumber \\
&= E\left[ f(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + \frac{k(x, x'+\delta) + m(x) m(x'+\delta) - k(x, x') - m(x) m(x')}{\delta} - m(x) m'(x') \nonumber \\
&= E\left[ f(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + \frac{k(x, x'+\delta) - k(x, x')}{\delta} + m(x) \left( \frac{m(x'+\delta) - m(x')}{\delta} - m'(x') \right). \nonumber
\end{align}
Since
\[ \left| E\left[ f(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] \right| \le \sqrt{E[|f(\omega, x)|^2]} \sqrt{E\left[ \Big| Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \Big|^2 \right]}, \tag{58} \]
letting δ → 0 gives
\[ \tilde{k}(x, 0, x', 1) = \frac{\partial}{\partial x'} k(x, x'). \tag{59} \]
Similarly,
\[ \tilde{k}(x, 1, x', 0) = \frac{\partial}{\partial x} k(x, x'). \tag{60} \]
Finally,
\begin{align}
\tilde{k}(x, 1, x', 1)
&= E[Df(\omega, x) Df(\omega, x')] - m'(x) m'(x') \nonumber \\
&= E\left[ Df(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + E\left[ Df(\omega, x) \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right] - m'(x) m'(x') \nonumber \\
&= E\left[ Df(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + \frac{\frac{\partial}{\partial x} k(x, x'+\delta) + m'(x) m(x'+\delta) - \frac{\partial}{\partial x} k(x, x') - m'(x) m(x')}{\delta} - m'(x) m'(x') \nonumber \\
&= E\left[ Df(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + \frac{\frac{\partial}{\partial x} k(x, x'+\delta) - \frac{\partial}{\partial x} k(x, x')}{\delta} + m'(x) \left( \frac{m(x'+\delta) - m(x')}{\delta} - m'(x') \right). \nonumber
\end{align}
Letting δ → 0 and using that k(x, x') is twice continuously differentiable,
\[ \tilde{k}(x, 1, x', 1) = \frac{\partial^2}{\partial x \partial x'} k(x, x'). \tag{61} \]

Corollary 2.12.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. Then the derivative random field Df of f is a Gaussian random field on R with mean function m'(x) and covariance function ∂²k(x, x')/∂x∂x'.

Sketch of proof. This is a direct conclusion from Theorem 2.5, Theorem 2.10, and Theorem 2.11.
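To make Theorem 2.11 and Corollary 2.12 concrete, the four blocks of the augmented covariance $\tilde{k}$ can be written out for a specific kernel and used to sample f and Df jointly. A minimal sketch in Python (the squared exponential kernel k(x, x') = σ² exp(−(x − x')²/(2l²)) and all numerical values are illustrative assumptions; the finite-difference comparison at the end is only a sanity check, not part of the theory):

```python
import numpy as np

sigma, l = 1.0, 0.3   # assumed hyperparameters

def k00(x, xp):
    """k(x, x') for the squared exponential kernel."""
    d = x[:, None] - xp[None, :]
    return sigma**2 * np.exp(-d**2 / (2 * l**2))

def k01(x, xp):
    """dk/dx' : covariance between f(x) and Df(x') (Theorem 2.11)."""
    d = x[:, None] - xp[None, :]
    return sigma**2 * (d / l**2) * np.exp(-d**2 / (2 * l**2))

def k10(x, xp):
    """dk/dx : covariance between Df(x) and f(x')."""
    return -k01(x, xp)

def k11(x, xp):
    """d^2 k / dx dx' : covariance of the derivative random field Df (Corollary 2.12)."""
    d = x[:, None] - xp[None, :]
    return sigma**2 * (1.0 / l**2 - d**2 / l**4) * np.exp(-d**2 / (2 * l**2))

# Joint covariance of (f(x_1..p), Df(x_1..p)) on a grid: the augmented field on R x {0, 1}.
x = np.linspace(0.0, 1.0, 200)
K = np.block([[k00(x, x), k01(x, x)],
              [k10(x, x), k11(x, x)]]) + 1e-8 * np.eye(2 * x.size)

rng = np.random.default_rng(1)
z = rng.multivariate_normal(np.zeros(2 * x.size), K)   # zero mean assumed
f_sample, Df_sample = z[:x.size], z[x.size:]

# Sanity check: the Df sample should be close to the finite difference of the f sample.
fd = np.gradient(f_sample, x)
print(np.max(np.abs(fd - Df_sample)))   # small compared to the scale of Df
```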
The definition of the aforementioned augmented Gaussian random field can be extended to more general cases involving higher order derivatives.
Definition 2.13.
Let f be a Gaussian random field on R such that the derivative random field Df is a Gaussian random field on R with a differentiable mean function and a twice continuously differentiable covariance function. By Corollary 2.12, the derivative random field D(Df) of Df is a Gaussian random field on R. Define the second order derivative random field of f as $D^2 f = D(Df)$. Recursively, define the n-th order derivative random field of f as $D^n f = D(D^{n-1} f)$ when $D^{n-1} f$ is a Gaussian random field on R with a differentiable mean function and a twice continuously differentiable covariance function.

Corollary 2.14. Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is n times differentiable and k(x, x') is 2n times continuously differentiable (n ∈ N₊). Then $Df, \ldots, D^n f$ are well-defined and are Gaussian random fields on R.

Sketch of proof. This corollary can be proved by applying Corollary 2.12 recursively.

Now we can define the general augmented Gaussian random field involving higher order derivatives.
Definition 2.15.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is n times differentiable and k(x, x') is 2n times continuously differentiable (n ∈ N₊). Define the n-th order augmented random field of f as
\[ \tilde{f}_n : \Omega \times \mathbb{R} \times \{0, 1, \ldots, n\} \to \mathbb{R} \tag{62} \]
such that
\[ \tilde{f}_n(\omega, x, 0) = f(\omega, x), \quad \tilde{f}_n(\omega, x, 1) = Df(\omega, x), \quad \ldots, \quad \tilde{f}_n(\omega, x, n) = D^n f(\omega, x). \tag{63} \]

Theorem 2.16.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is n times differentiable and k(x, x') is 2n times continuously differentiable (n ∈ N₊). Then the n-th order augmented random field $\tilde{f}_n$ of f is a Gaussian random field on R × {0, 1, . . . , n}.

Sketch of proof. This can be proved in a similar way to the proof of Theorem 2.10.

The following theorem calculates the mean and covariance functions of the n-th order augmented Gaussian random field.

Theorem 2.17. Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is n times differentiable and k(x, x') is 2n times continuously differentiable (n ∈ N₊). Then the n-th order augmented random field $\tilde{f}_n$ of f has mean function
\[ \tilde{m}_n : \mathbb{R} \times \{0, 1, \ldots, n\} \to \mathbb{R}, \qquad \tilde{m}_n(x, i) = \frac{d^i}{dx^i} m(x), \tag{64} \]
and covariance function
\[ \tilde{k}_n : \mathbb{R} \times \{0, 1, \ldots, n\} \times \mathbb{R} \times \{0, 1, \ldots, n\} \to \mathbb{R}, \qquad \tilde{k}_n(x, i, x', j) = \frac{\partial^{i+j}}{\partial x^i \partial x'^j} k(x, x'). \tag{65} \]

Sketch of proof. When i, j ∈ {0, 1}, the formulas hold by Theorem 2.11. The general case then follows by induction, in a similar way to the proof of Theorem 2.11.

Furthermore, we can extend the augmented Gaussian random field to the infinite order case and calculate the mean and covariance functions accordingly.

Definition 2.18.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) and k(x, x') are smooth. Define the infinite order augmented random field of f as
\[ \tilde{f}_\infty : \Omega \times \mathbb{R} \times \mathbb{N} \to \mathbb{R} \tag{66} \]
such that
\[ \tilde{f}_\infty(\omega, x, 0) = f(\omega, x), \quad \tilde{f}_\infty(\omega, x, 1) = Df(\omega, x), \quad \ldots, \quad \tilde{f}_\infty(\omega, x, n) = D^n f(\omega, x), \quad \ldots \tag{67} \]

Theorem 2.19.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) and k(x, x') are smooth. Then the infinite order augmented random field $\tilde{f}_\infty$ of f is a Gaussian random field on R × N.

Sketch of proof. This is proved in a similar way to the proof of Theorem 2.10.
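In computations, the mixed derivatives $\partial^{i+j} k / \partial x^i \partial x'^j$ that appear in Theorem 2.17 (and in Theorem 2.20 below for the infinite order case) rarely need to be derived by hand; they can be generated symbolically once the kernel is fixed. A minimal sketch in Python with SymPy, using the squared exponential kernel as an assumed example (the kernel choice and the lambdify step are illustrative, not prescribed by the text):

```python
import sympy as sp

x, xp, sigma, l = sp.symbols("x x_prime sigma l", real=True, positive=True)

# Assumed base kernel: squared exponential k(x, x') = sigma^2 exp(-(x - x')^2 / (2 l^2)).
k = sigma**2 * sp.exp(-(x - xp)**2 / (2 * l**2))

def k_tilde(i, j):
    """Covariance entry k~(x, i, x', j) = d^{i+j} k / dx^i dx'^j (Theorem 2.17)."""
    expr = k
    for _ in range(i):
        expr = sp.diff(expr, x)
    for _ in range(j):
        expr = sp.diff(expr, xp)
    return sp.simplify(expr)

# For example, the entries needed when the observable and its first two derivatives are used:
for i in range(3):
    for j in range(3):
        print(i, j, k_tilde(i, j))

# A numerical version can be obtained for use in a covariance matrix:
k11_num = sp.lambdify((x, xp, sigma, l), k_tilde(1, 1), "numpy")
print(k11_num(0.2, 0.5, 1.0, 0.3))
```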
Theorem 2.20.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) and k(x, x') are smooth. Then the infinite order augmented random field $\tilde{f}_\infty$ of f has mean function
\[ \tilde{m}_\infty : \mathbb{R} \times \mathbb{N} \to \mathbb{R}, \qquad \tilde{m}_\infty(x, i) = \frac{d^i}{dx^i} m(x), \tag{68} \]
and covariance function
\[ \tilde{k}_\infty : \mathbb{R} \times \mathbb{N} \times \mathbb{R} \times \mathbb{N} \to \mathbb{R}, \qquad \tilde{k}_\infty(x, i, x', j) = \frac{\partial^{i+j}}{\partial x^i \partial x'^j} k(x, x'). \tag{69} \]

Sketch of proof. When i, j ∈ {0, 1}, the formulas hold by Theorem 2.11. The result is then proved by induction, in a similar way to Theorem 2.11.

Computational framework
In this section, we describe the computational framework for the AGRF prediction. It is based on the conventional GP regression. Specifically, in this work, we only consider noiseless observations for demonstration purposes. Noise in the observations can be incorporated by modifying the covariance matrix as in the conventional GP regression (see [26]). Also, since we use a univariate observable in the theoretical framework, we still consider this scenario here to keep consistency. Formulations for the multivariate cases can be deduced without difficulty based on our results and the gradient-enhanced Kriging/Cokriging methods (see, e.g., [30, 14, 7]).
As in the conventional GP regression, we aim to condition the joint Gaussian prior distribution on the observations, so as to provide a posterior joint Gaussian distribution. In our framework, the observations include the collected data of the observable and its derivatives of different orders. Suppose we are given a finite collection of real-valued data pairs:
\[
\begin{array}{ll}
\text{Observable:} & (x_{0,1}, y_{0,1}) \;\; (x_{0,2}, y_{0,2}) \;\; \ldots \;\; (x_{0,p_0}, y_{0,p_0}) \\
\text{First order derivative:} & (x_{1,1}, y_{1,1}) \;\; (x_{1,2}, y_{1,2}) \;\; \ldots \;\; (x_{1,p_1}, y_{1,p_1}) \\
\text{Second order derivative:} & (x_{2,1}, y_{2,1}) \;\; (x_{2,2}, y_{2,2}) \;\; \ldots \;\; (x_{2,p_2}, y_{2,p_2}) \\
\quad\vdots & \\
n\text{th order derivative:} & (x_{n,1}, y_{n,1}) \;\; (x_{n,2}, y_{n,2}) \;\; \ldots \;\; (x_{n,p_n}, y_{n,p_n})
\end{array} \tag{70}
\]
with n ≥ 1 and p_0, p_1, p_2, . . . , p_n ≥ 0. Here the x_{i,j} are locations and the y_{i,j} are the data collected at these locations. We also introduce the notation X = {x_{i,j}} and y = {y_{i,j}}, where i ∈ {0, 1, . . . , n} and j ∈ {1, . . . , p_i}. Of note, here we consider a general case, and it is not necessary that x_{i,j} = x_{i+1,j}. In other words, it is possible that the observable and its derivatives are sampled at different locations. We assume that a mean function and a covariance function are given for the Gaussian random field f that describes the observable:
\[ m : \mathbb{R} \to \mathbb{R} \tag{71} \]
and
\[ k : \mathbb{R} \times \mathbb{R} \to \mathbb{R} \tag{72} \]
such that m is n times differentiable and k is symmetric, nonnegative definite, and 2n times continuously differentiable. By Theorem 2.4 and Theorem 2.16, there exists a Gaussian random field $\tilde{f}_n$ on R × {0, 1, . . . , n} whose mean function and covariance function are given by Theorem 2.17. We use the augmented Gaussian random field $\tilde{f}_n$ to model the data such that
\[ \tilde{f}_n(\omega, x_{i,j}, i) = y_{i,j} \quad \text{for } i \in \{0, 1, \ldots, n\} \text{ and } j \in \{1, \ldots, p_i\}. \tag{73} \]
The prediction of the q-th (0 ≤ q ≤ n) order derivative at any x ∈ R is the posterior mean of the random variable $\tilde{f}_n(\omega, x, q)$, and the uncertainty in the prediction can be described by the confidence interval based on the posterior variance (or the standard deviation).

Since $\tilde{f}_n$ is a Gaussian random field, $\{\tilde{f}_n(\omega, x, q), \tilde{f}_n(\omega, x_{i,j}, i)\}$ satisfies the multivariate Gaussian distribution
\[
\begin{bmatrix}
\tilde{f}_n(\omega, x, q) \\ \tilde{f}_n(\omega, x_{0,1}, 0) \\ \vdots \\ \tilde{f}_n(\omega, x_{n,p_n}, n)
\end{bmatrix}
\sim \mathcal{N}\left(
\begin{bmatrix}
\tilde{m}_n(x, q) \\ \tilde{m}_n(x_{0,1}, 0) \\ \vdots \\ \tilde{m}_n(x_{n,p_n}, n)
\end{bmatrix},
\begin{bmatrix}
\tilde{k}_n(x, q, x, q) & \tilde{k}_n(x, q, x_{0,1}, 0) & \cdots & \tilde{k}_n(x, q, x_{n,p_n}, n) \\
\tilde{k}_n(x_{0,1}, 0, x, q) & \tilde{k}_n(x_{0,1}, 0, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{0,1}, 0, x_{n,p_n}, n) \\
\vdots & \vdots & \ddots & \vdots \\
\tilde{k}_n(x_{n,p_n}, n, x, q) & \tilde{k}_n(x_{n,p_n}, n, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{n,p_n}, n, x_{n,p_n}, n)
\end{bmatrix}
\right). \tag{74}
\]
Then the posterior distribution of $\tilde{f}_n(\omega, x, q)$ given (73) is also a Gaussian distribution:
\[
\left( \tilde{f}_n(\omega, x, q) \,\middle|\, \tilde{f}_n(\omega, x_{i,j}, i) = y_{i,j}, \; i \in \{0, 1, \ldots, n\}, \; j \in \{1, \ldots, p_i\} \right) \sim \mathcal{N}(\mu, \sigma^2), \tag{75}
\]
where
\[
\mu = \tilde{m}_n(x, q) + \begin{bmatrix} \tilde{k}_n(x, q, x_{0,1}, 0) & \cdots & \tilde{k}_n(x, q, x_{n,p_n}, n) \end{bmatrix} K^{-1}
\left( \begin{bmatrix} y_{0,1} \\ \vdots \\ y_{n,p_n} \end{bmatrix} - \begin{bmatrix} \tilde{m}_n(x_{0,1}, 0) \\ \vdots \\ \tilde{m}_n(x_{n,p_n}, n) \end{bmatrix} \right),
\]
\[
\sigma^2 = \tilde{k}_n(x, q, x, q) - \begin{bmatrix} \tilde{k}_n(x, q, x_{0,1}, 0) & \cdots & \tilde{k}_n(x, q, x_{n,p_n}, n) \end{bmatrix} K^{-1}
\begin{bmatrix} \tilde{k}_n(x_{0,1}, 0, x, q) \\ \vdots \\ \tilde{k}_n(x_{n,p_n}, n, x, q) \end{bmatrix},
\]
\[
K = \begin{bmatrix}
\tilde{k}_n(x_{0,1}, 0, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{0,1}, 0, x_{n,p_n}, n) \\
\vdots & \ddots & \vdots \\
\tilde{k}_n(x_{n,p_n}, n, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{n,p_n}, n, x_{n,p_n}, n)
\end{bmatrix}.
\]
Here $\tilde{m}_n$ and $\tilde{k}_n$ are calculated according to Theorem 2.17. Now we have the posterior distribution of the q-th order derivative at x. The posterior mean is usually used as the prediction, and the posterior standard deviation is used to describe the uncertainty (e.g., the confidence interval) in the prediction.

In practice, we usually assume the form of m and k, which involve hyperparameters. We denote these parameters as θ. For example, in the widely used squared exponential covariance function k(x, x') = σ² exp(−(x − x')²/(2l²)), σ and l are hyperparameters. Similarly, the mean function m may include additional hyperparameters. For example, if m is a polynomial as in universal Kriging, the coefficients of the polynomial are hyperparameters. Similar to the standard GP method, the AGRF method identifies the hyperparameters via maximizing the following log likelihood:
\[
\log p(\mathbf{y} \mid \mathbf{X}, \theta) = -\frac{p_0 + \cdots + p_n}{2} \log(2\pi) - \frac{1}{2} \log(\det(K))
- \frac{1}{2} \left( \begin{bmatrix} y_{0,1} \\ \vdots \\ y_{n,p_n} \end{bmatrix} - \begin{bmatrix} \tilde{m}_n(x_{0,1}, 0) \\ \vdots \\ \tilde{m}_n(x_{n,p_n}, n) \end{bmatrix} \right)^{\!\top} K^{-1}
\left( \begin{bmatrix} y_{0,1} \\ \vdots \\ y_{n,p_n} \end{bmatrix} - \begin{bmatrix} \tilde{m}_n(x_{0,1}, 0) \\ \vdots \\ \tilde{m}_n(x_{n,p_n}, n) \end{bmatrix} \right), \tag{76}
\]
where
\[
K = \begin{bmatrix}
\tilde{k}_n(x_{0,1}, 0, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{0,1}, 0, x_{n,p_n}, n) \\
\vdots & \ddots & \vdots \\
\tilde{k}_n(x_{n,p_n}, n, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{n,p_n}, n, x_{n,p_n}, n)
\end{bmatrix}. \tag{77}
\]
After identifying θ, we obtain the posterior distribution in Eq. (75).

Numerical examples

In this section, we present two examples to illustrate the AGRF framework. In both examples, we use the zero mean function
\[ m(x) = 0 \tag{78} \]
and the squared exponential covariance function
\[ k(x, x') = \sigma^2 \exp\left( -\frac{(x - x')^2}{2 l^2} \right) \tag{79} \]
to construct the GRF representing the observable. The hyperparameters to identify are θ = (σ, l). We use the following relative L² error (RLE) to evaluate the accuracy of the prediction by different GRF-based methods: ‖u − ũ‖ / ‖u‖, where u is the exact function and ũ is the prediction by GRF-based methods.

Example 1

Consider the following function:
\[ y(x) = (3x - \cdots)\,\sin(14x - 4) \tag{80} \]
on x ∈ [0, ·]. The hyperparameters θ = (σ, l) are sought within prescribed bounds.
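As an illustration of how the posterior formulas above are assembled in this setting, the sketch below builds the AGRF prediction for a configuration like Case 2 below (observable plus first-derivative data) with the zero mean and squared exponential kernel of (78)-(79). The sampling locations, the data values, and the fixed hyperparameters are all illustrative assumptions (in the paper they come from the test function (80) and from maximizing (76)); the structure of the computation is what matters:

```python
import numpy as np

sigma, l = 1.0, 0.2   # assumed hyperparameters; in practice obtained by maximizing (76)

# Kernel blocks k~(x, i, x', j) = d^{i+j} k / dx^i dx'^j for the squared exponential kernel,
# written out for derivative orders i, j in {0, 1} (Theorem 2.17 / Theorem 2.11).
def k_block(x, i, xp, j):
    d = x[:, None] - xp[None, :]
    e = np.exp(-d**2 / (2 * l**2))
    if (i, j) == (0, 0):
        return sigma**2 * e
    if (i, j) == (0, 1):
        return sigma**2 * (d / l**2) * e
    if (i, j) == (1, 0):
        return -sigma**2 * (d / l**2) * e
    if (i, j) == (1, 1):
        return sigma**2 * (1.0 / l**2 - d**2 / l**4) * e
    raise ValueError("only orders 0 and 1 are implemented in this sketch")

# Assumed training data: observable values at five locations, first-derivative values at
# three (different) locations -- the "missing data" situation AGRF handles naturally.
x_obs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y_obs = np.array([0.2, -0.4, 0.1, 0.5, -0.3])
x_der = np.array([0.2, 0.5, 0.8])
y_der = np.array([1.0, -0.7, 0.4])

X = [(x_obs, 0), (x_der, 1)]                 # locations grouped by derivative order i
y = np.concatenate([y_obs, y_der])           # stacked observation vector (zero mean assumed)

# Covariance matrix K of the training data under the augmented field (Eq. (77)).
K = np.block([[k_block(xa, ia, xb, ib) for (xb, ib) in X] for (xa, ia) in X])
K += 1e-10 * np.eye(K.shape[0])              # jitter for numerical stability
K_inv_y = np.linalg.solve(K, y)

def predict(x_new, q):
    """Posterior mean and standard deviation of the q-th derivative at x_new (Eq. (75))."""
    k_star = np.hstack([k_block(x_new, q, xb, ib) for (xb, ib) in X])
    mean = k_star @ K_inv_y
    var = k_block(x_new, q, x_new, q).diagonal() - np.einsum(
        "ij,ij->i", k_star, np.linalg.solve(K, k_star.T).T)
    return mean, np.sqrt(np.maximum(var, 0.0))

x_grid = np.linspace(0.0, 1.0, 101)
for q in (0, 1):                             # predict the observable and its first derivative
    mu, sd = predict(x_grid, q)
    print(q, mu[:3], sd[:3])
```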
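The hyperparameters θ = (σ, l) themselves can be chosen by maximizing the log likelihood (76). A minimal sketch under the same assumptions (zero mean, squared exponential kernel, observable-only data for brevity; the data values, search bounds, and optimizer settings are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Assumed observable-only data (p_0 = 5, no derivative data) for brevity.
x_obs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y_obs = np.array([0.2, -0.4, 0.1, 0.5, -0.3])

def neg_log_likelihood(theta):
    """Negative of the log likelihood (76) for the zero-mean squared exponential model."""
    sigma, l = theta
    d = x_obs[:, None] - x_obs[None, :]
    K = sigma**2 * np.exp(-d**2 / (2 * l**2)) + 1e-10 * np.eye(x_obs.size)
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y_obs)
    return 0.5 * (x_obs.size * np.log(2 * np.pi) + logdet + y_obs @ alpha)

# Bounded optimization over an assumed search box for (sigma, l).
res = minimize(neg_log_likelihood, x0=np.array([1.0, 0.2]),
               bounds=[(1e-2, 10.0), (1e-2, 2.0)], method="L-BFGS-B")
sigma_hat, l_hat = res.x
print(sigma_hat, l_hat)
```

With derivative data, K is simply assembled from the blocks $\tilde{k}_n(\cdot, i, \cdot, j)$ as in the previous sketch; the rest of the computation is unchanged.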
Case 1: The data include the observable at five sampling locations only. Figure 1 shows y, y′, y″ predicted by AGRF with confidence intervals. The AGRF surrogate model for y is the same as that obtained by the conventional GP regression, because the former is a generalization of the latter. However, the conventional GP regression does not provide the prediction of y′ or y″.
Case 2: The data include the observable at five sampling locations and its first order derivative at three sampling locations. Figure 2 shows that the AGRF prediction matches the exact function and its derivatives better than the results shown in Case 1. This is because the derivative information is incorporated in the model.
Case 3: The data include the observable at five sampling locations and its second order derivative at three sampling locations. Figure 3 shows that, like Case 2, the AGRF prediction imitates the ups and downs that are present in the exact function but not fully reflected in the observable data. The prediction accuracy is enhanced by incorporating the second order derivative, and it is better than the prediction in Case 1.
Case 4: The data include the observable at five sampling locations, its first order derivative at three sampling locations, and its second order derivative at three sampling locations. Figure 4 demonstrates that the AGRF prediction of y almost coincides with the exact function, and the predictions of y′ and y″ are also very accurate in most regions. It is not surprising that by using all available information we can construct the most accurate surrogate model among all four cases.

We can see that AGRF is able to integrate the observable and its derivatives of any order, regardless of the location where they are collected. As one might expect, the AGRF prediction improves when more information is available. Table 1 provides a quantitative comparison of the prediction accuracy in the four cases above, which further verifies our observations in Figures 1-4.

Figure 1: [Example 1] Case 1: The data include the observable at five sampling locations. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) y″ prediction.

Table 1: [Example 1] The RLE of the AGRF prediction of y, y′, y″ on x ∈ [0, ·].

         Data         RLE of y   RLE of y′   RLE of y″
Case 1   y            0.347      0.960       1.
Case 2   y, y′        0.356      0.412       0.
Case 3   y, y″        0.501      0.387       0.
Case 4   y, y′, y″    0.088      0.126       0.

Figure 2: [Example 1] Case 2: The data include the observable at five sampling locations and its first order derivative at three sampling locations. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) y″ prediction.

Figure 3: [Example 1] Case 3: The data include the observable at five sampling locations and its second order derivative at three sampling locations. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) y″ prediction.

Figure 4: [Example 1] Case 4: The data include the observable at five sampling locations, its first order derivative at three sampling locations, and its second order derivative at three sampling locations. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) y″ prediction.

Example 2

Consider the damped harmonic oscillator:
\[
\begin{cases}
F = -ky - cy' \\
F = my''
\end{cases} \tag{81}
\]
where y is the displacement, y′ is the velocity, y″ is the acceleration, −ky is the restoring force, and −cy′ is the frictional force. This system can be simplified to
\[ y'' + 2\zeta\omega_0 y' + \omega_0^2 y = 0, \tag{82} \]
where ω₀ = √(k/m) is the undamped angular frequency and ζ = c/(2√(mk)) is the damping ratio. When ζ < 1, the solution is
\[ y(t) = A \exp(-\zeta\omega_0 t) \sin\left( \sqrt{1 - \zeta^2}\, \omega_0 t + \varphi \right), \tag{83} \]
where the amplitude A and the phase φ determine the behavior needed to match the initial conditions. Now, consider a specific instance of (83) with A = 1 and φ = 0:
\[ y(t) = \exp(-\zeta\omega_0 t) \sin\left( \sqrt{1 - \zeta^2}\, \omega_0 t \right) \tag{84} \]
on t ∈ [0, ·], with fixed values of ζ and ω₀. The hyperparameters θ = (σ, l) are sought within prescribed bounds.
Case 1: We use the conventional GP regression. The data include the observable and its first order derivative at five sampling locations. The observable data are used to predict the displacement and the first order derivative data are used to predict the velocity, respectively. The results are shown in Figure 5. Apparently, the prediction is not accurate.
Case 2: We use GEK. The data include the observable and its first order derivative at five sampling locations. All these data are used jointly in the same random field to predict the displacement and the velocity at the same time. Figure 6 demonstrates that the predictions of y and y′ are much more accurate than those in Case 1. Yet in the phase diagram plot, there is still a significant discrepancy between the prediction and the exact solution over part of the time interval.
Case 3: We use AGRF. The data include the observable, its first order derivative, and its second order derivative at five sampling locations. All the data are used together in the same random field to predict the displacement and the velocity at the same time. The results in Figure 7 show much better accuracy than the corresponding ones in Figures 5 and 6, as the prediction almost coincides with the exact solution and the confidence intervals are very narrow.

As in Example 1, we can see that the prediction is better when the observable and its derivative are used together (Case 2) than when they are used separately (Case 1). Furthermore, the inclusion of the second order derivative can further enhance the prediction (Case 3). Table 2 provides a quantitative comparison of these three cases.

Table 2: [Example 2] The RLE of the prediction of the displacement and the velocity on x ∈ [0, ·].

         Data                   RLE of y   RLE of y′
Case 1   y, y′ (separately)     1.334      1.
Case 2   y, y′ (together)       0.120      0.
Case 3   y, y′, y″ (together)   0.031      0.

Figure 5: [Example 2] Case 1: Prediction of the displacement and the velocity using conventional GP regression. The data include the observable and its first order derivative at five sampling locations. The observable data are used to predict the displacement and the first order derivative data are used to predict the velocity, respectively. The GP prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) phase-space diagram prediction.

Figure 6: [Example 2] Case 2: Prediction of the displacement and the velocity using GEK. The data include the observable and its first order derivative at five sampling locations. All the data are used jointly in the same random field to predict the displacement and the velocity at the same time. The GEK prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) phase-space diagram prediction.

Figure 7: [Example 2] Case 3: Prediction of the displacement and the velocity using AGRF. The data include the observable, its first order derivative, and its second order derivative at five sampling locations. All the data are used together in the same random field to predict the displacement and the velocity at the same time. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) phase-space diagram prediction.

Conclusion

In this work, we provide a comprehensive theoretical foundation for incorporating arbitrary order derivatives in GRF-based methods. We demonstrate that under some conditions, the derivative of each realization of the GRF representing the observable can be considered as a realization of the GRF representing the corresponding derivative of the observable. Combining these GRFs, we propose the augmented Gaussian random field (AGRF), which is a universal framework incorporating the observable and its derivatives. Consequently, the computation of the posterior mean and variance of the observable and its derivatives at new locations can be carried out in a similar way as in the conventional GP regression method. Our numerical results show that, in general, using more information about the system yields better prediction in terms of the accuracy of the posterior mean and the width of the confidence interval, consistent with the literature. The main advantage is that our universal framework provides a natural way to incorporate arbitrary order derivatives and deal with missing data, e.g., when the observation of the observable or its derivative is missing at some sampling locations.

To the best of our knowledge, this is the first systematic work that proves the validity of the intuitive assumption that the derivative of a GRF is still a GRF, which is widely used in probabilistic scientific computing and GRF-based (or GP-based) regression methods. Although we use a one-dimensional system for demonstration, our conclusion can be extended to multi-dimensional systems without difficulty.

Acknowledgements
SZ was supported by the National Science Foundation (NSF) Mathematical Sciences Graduate Internship (MSGI) Program sponsored by the NSF Division of Mathematical Sciences. XY was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (ASCR) as part of Multifaceted Mathematics for Rare, Extreme Events in Complex Energy and Environment Systems (MACSER). GL was supported by the National Science Foundation (DMS-1555072, DMS-1736364, CMMI-1634832 and CMMI-1560834), and Brookhaven National Laboratory Subcontract 382247.
References

[1] Petter Abrahamsen. A review of Gaussian random fields and correlation functions.
[2] Oksana A. Chkrebtii, David A. Campbell, Ben Calderhead, and Mark A. Girolami. Bayesian solution uncertainty quantification for differential equations. Bayesian Analysis, 11(4):1239–1267.
[3] H.-S. Chung and J. Alonso. Using gradients to construct cokriging approximation models for high-dimensional design optimization problems. American Institute of Aeronautics and Astronautics.
[4] Jon Cockayne, Chris J. Oates, T. J. Sullivan, and Mark Girolami. Bayesian probabilistic numerical methods. SIAM Review, 61(3):756–789.
[5] Harald Cramér and M. R. Leadbetter. Stationary and Related Stochastic Processes: Sample Function Properties and Their Applications. Dover Books on Mathematics. Dover Publications, Dover edition.
[6] Sébastien Da Veiga and Amandine Marrel. Gaussian process modeling with inequality constraints. Annales de la faculté des sciences de Toulouse Mathématiques, 21(3):529–555.
[7] Yixiang Deng, Guang Lin, and Xiu Yang. Multifidelity data fusion via gradient-enhanced Gaussian process regression. arXiv:2008.01066 [cs, stat].
[8] R. M. Dudley. Real Analysis and Probability. Number 74 in Cambridge Studies in Advanced Mathematics. Cambridge University Press.
[9] Alexander I. J. Forrester, András Sóbester, and Andy J. Keane. Multi-fidelity optimization via surrogate modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 463(2088):3251–3269.
[10] Philipp Hennig, Michael A. Osborne, and Mark Girolami. Probabilistic numerics and uncertainty in computations. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 471(2179):20150142.
[11] M. Kennedy. Predicting the output from a complex computer code when fast approximations are available. Biometrika, 87(1):1–13.
[12] P. K. Kitanidis. Introduction to Geostatistics: Applications to Hydrogeology. Cambridge University Press.
[13] Leonid Koralov and Yakov G. Sinai. Theory of Probability and Random Processes. Universitext. Springer Berlin Heidelberg.
[14] Luc Laurent, Rodolphe Le Riche, Bruno Soulier, and Pierre-Alain Boucard. An overview of gradient-enhanced metamodels with applications. Archives of Computational Methods in Engineering, 26(1):61–106.
[15] Loic Le Gratiet and Josselin Garnier. Recursive co-Kriging model for design of computer experiments with multiple levels of fidelity. International Journal for Uncertainty Quantification, 4(5):365–386.
[16] L. Lin and D. B. Dunson. Bayesian monotone regression using Gaussian process projection. Biometrika, 101(2):303–317.
[17] Andrés F. López-Lopera, François Bachoc, Nicolas Durrande, and Olivier Roustant. Finite-dimensional Gaussian approximation with linear inequality constraints. SIAM/ASA Journal on Uncertainty Quantification, 6(3):1224–1255.
[18] Max D. Morris, Toby J. Mitchell, and Donald Ylvisaker. Bayesian design and analysis of computer experiments: Use of derivatives in surface prediction. Technometrics, 35(3):243–255.
[19] Houman Owhadi. Bayesian numerical homogenization. Multiscale Modeling & Simulation, 13(3):812–828.
[20] Houman Owhadi. Multigrid with rough coefficients and multiresolution operator decomposition from hierarchical information games. SIAM Review, 59(1):99–149.
[21] Houman Owhadi and Clint Scovel. Operator-Adapted Wavelets, Fast Solvers, and Numerical Homogenization: From a Game Theoretic Approach to Numerical Approximation and Algorithm Design. Cambridge University Press, 1st edition.
[22] Andrew Pensoneault, Xiu Yang, and Xueyu Zhu. Nonnegativity-enforced Gaussian process regression. arXiv:2004.04632 [cs, stat].
[23] P. Perdikaris, M. Raissi, A. Damianou, N. D. Lawrence, and G. E. Karniadakis. Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2198):20160751.
[24] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Machine learning of linear differential equations using Gaussian processes. Journal of Computational Physics, 348:683–693.
[25] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Numerical Gaussian processes for time-dependent and nonlinear partial differential equations. SIAM Journal on Scientific Computing, 40(1):A172–A198.
[26] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press.
[27] Jerome Sacks, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. Design and analysis of computer experiments. Statistical Science, 4(4):409–423.
[28] Michael Schober, David Duvenaud, and Philipp Hennig. Probabilistic ODE solvers with Runge-Kutta means. arXiv:1406.2582 [cs, math, stat].
[29] E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, and Carl E. Rasmussen. Derivative observations in Gaussian process models of dynamic systems. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 1057–1064. MIT Press.
[30] Selvakumar Ulaganathan, Ivo Couckuyt, Francesco Ferranti, Eric Laermans, and Tom Dhaene. Performance study of multi-fidelity gradient enhanced kriging. Structural and Multidisciplinary Optimization, 51(5):1017–1033.
[31] Xiu Yang, David Barajas-Solano, Guzel Tartakovsky, and Alexandre M. Tartakovsky. Physics-informed CoKriging: A Gaussian-process-regression-based multifidelity method for data-model convergence. Journal of Computational Physics, 395:410–431.
[32] Xiu Yang, Guzel Tartakovsky, and Alexandre Tartakovsky. Physics-informed Kriging: A physics-informed Gaussian process regression method for data-model convergence. arXiv:1809.03461 [cs, stat].
[33] Xiu Yang, Xueyu Zhu, and Jing Li. When bifidelity meets CoKriging: An efficient physics-informed multifidelity method.