Augmented Gaussian Random Field: Theory and Computation
Sheng Zhang, Xiu Yang∗, Samy Tindel, and Guang Lin†

Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA
Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA 18015, USA
School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, USA
Abstract:
We propose a rigorous theoretical foundation for incorporating data of the observable and its derivatives of any order in a Gaussian-random-field-based surrogate model using tools in real analysis and probability. We demonstrate that under some conditions, the random field representing the derivatives is a Gaussian random field (GRF) given that its structure is derived from the GRF regressing the data of the observable. We propose an augmented Gaussian random field (AGRF) framework to unify these GRFs and calculate the prediction of this surrogate model in a similar manner as the conventional Gaussian process regression method. A prominent advantage of our method is that it can incorporate arbitrary order derivatives and deal with missing data.
Keywords:
Gaussian random field, Gaussian process regression, arbitrary order derivatives, missing data
Gaussian random field (GRF) has been widely used in scientific and engineering studies to construct a surrogate model (also called response surface or metamodel in different areas) of a complex system's observable based on available observations. In particular, its special case, the Gaussian process (GP), has become a powerful tool in applied math, statistics, machine learning, etc. [26]. Although random processes originally refer to one-dimensional random fields [1], e.g., models describing time-dependent systems, the terminology GP is interchangeable with GRF now in most application scenarios that involve high-dimensional systems. Also, in different areas, GRF-based (or GP-based) methods have different names. For example, in geostatistics, GP regression is referred to as Kriging, and it has multiple variants [27, 12].

To enhance the prediction accuracy of the GRF-based surrogate model, one can incorporate all the additional information available, such as gradients, Hessian, multi-fidelity data, physical laws, and empirical knowledge. For example, gradient-enhanced Kriging (GEK) uses gradients in either a direct [18] or indirect [3] way to improve the accuracy; multi-fidelity Cokriging combines a small amount of high-fidelity data with a large amount of low-fidelity data from simplified or reduced models, in order to leverage low-fidelity models for speedup, while using high-fidelity data to establish accuracy and convergence guarantees [11, 9, 15, 23]; physics-informed Kriging takes advantage of well-developed simulation tools to incorporate physical laws in the resulting GRF [32, 31, 33]; GP-based numerical ordinary/partial differential equation solvers intrinsically impose the equations in the structure of the GP, and they are among the most important tools in probabilistic scientific computing [28, 10, 2, 24, 25, 4, 19, 20, 21]; inequality constraints that are not explicitly reflected in governing equations, e.g., positivity, monotonicity, can also be imposed to enhance accuracy and reduce uncertainty in the prediction [6, 16, 17, 22].

Despite the success in applying the aforementioned GRF-based methods to the construction of surrogate models for practical problems, the theory related to the accuracy, convergence, uncertainty, etc. of these methods is not well developed in general. In particular, when using the observable and its derivatives (such as gradient) jointly, e.g., to solve differential equations, one usually assumes that the random field representing the derivative is also a GRF, and that its mean and covariance functions can be calculated by taking derivatives of the mean and covariance functions of the GRF representing the observable. Also, the correlation between the observable and its derivative is calculated accordingly. Intuitively, this is correct, because a linear combination of multiple Gaussian random variables is still a Gaussian random variable. However, taking the limit of a linear combination is a critical step in the definition of the derivative, which makes the theoretical guarantee of using derivatives in GRF-based methods less obvious. To the best of our knowledge, there is no comprehensive theory on the validity of the aforementioned assumption on the random field representing the derivative nor on its relation with the observable.
Most literature uses the linearity of the derivative operator to validate this assumption (e.g., [29]), which is questionable.

In this work, we provide a rigorous theoretical framework to incorporate arbitrary order derivatives in GRF-based methods using tools in real analysis and probability. We demonstrate that under some conditions, the random field representing the observable's derivative, obtained by taking the derivative of the GRF representing the observable, is a GRF, and its structure can be derived from the GRF representing the observable. We propose an augmented Gaussian random field (AGRF) framework to unify these GRFs and calculate the posterior distribution of the observable and its derivatives at new locations in a similar manner as the conventional GP regression. Our theoretical and computational framework is universal as it can incorporate derivatives of arbitrary order. Moreover, our approach can deal with missing data without difficulty, which is very useful in practice.

This paper is organized as follows. The theoretical framework is presented in Section 2, the computational framework is shown in Section 3, the numerical examples are given in Section 4, and the conclusion follows in Section 5.

Theoretical framework

In this section, we briefly review some fundamental definitions and an important theorem for GRF. Then we present our main theoretical framework, i.e., the AGRF, that unifies the GRFs representing the observable and its derivative. Finally, we extend it to the general case to incorporate information of derivatives of arbitrary order. For notational brevity, we consider the system's observable as a univariate function of the physical location or time. The extension to the multivariate case is straightforward. Therefore, our theoretical framework is applicable to GP methods that use the information of the gradient as well as arbitrary order derivatives of the observable.
In this paper, the Euclidean space R refers to R equipped with the Euclidean metric and the Lebesgue measure on Lebesgue-measurable sets. We begin with the definition of random fields.

Definition 2.1. Let (Ω, F, P) be a probability space and X be a set. Suppose f : Ω × X → R is a mapping such that for each x ∈ X, f(·, x) : Ω → R is a random variable (or measurable function). Then f is called a real-valued random field on X.

We note that, in practical problems, X is typically a subset of the d-dimensional Euclidean space R^d, i.e., X ⊆ R^d. Here, X is considered as a general set as in [8]. Next, we define Gaussian random fields as follows:

Definition 2.2. Suppose f is a real-valued random field on X such that for every finite set of indices x_1, . . . , x_p ∈ X, (f(ω, x_1), . . . , f(ω, x_p)) is a multivariate Gaussian random variable, or, equivalently, every linear combination of (f(ω, x_1), . . . , f(ω, x_p)) has a univariate Gaussian (or normal) distribution. Then f is called a Gaussian random field on X.

Here ω is an element in the sample space Ω. A Gaussian random field is characterized by its mean and covariance functions:

Definition 2.3. Given a Gaussian random field f on X, its mean function is defined as the expectation
\[
m(x) = E[f(\omega, x)], \tag{1}
\]
and its covariance function is defined as
\[
k(x, x') = E[(f(\omega, x) - m(x))(f(\omega, x') - m(x'))]. \tag{2}
\]
Here, the covariance function is also called the kernel function. On the other hand, the following theorem provides a perspective in the converse way:

Theorem 2.4 (Kolmogorov consistency theorem). Let X be any set, m : X → R any function, and k : X × X → R such that k is symmetric and nonnegative definite. Then there exists a Gaussian random field on X with mean function m and covariance function k.

Proof. See [13] (p. 176) or [8] (p. 443).
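The finite-dimensional characterization in Definition 2.2 and Theorem 2.4 is also how a GRF is used in computation: restricted to any finite set of points, it is a multivariate Gaussian determined by m and k. A minimal sketch in Python with NumPy (the squared exponential kernel and the hyperparameter values are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

def m(x):
    """Mean function m(x); zero mean is assumed here for illustration."""
    return np.zeros_like(x)

def k(x, xp, sigma=1.0, l=0.2):
    """Squared exponential kernel k(x, x') = sigma^2 exp(-(x - x')^2 / (2 l^2))."""
    return sigma**2 * np.exp(-(x[:, None] - xp[None, :])**2 / (2.0 * l**2))

# On a finite set of indices x_1, ..., x_p, the GRF is a multivariate Gaussian
# with mean vector m(x_i) and covariance matrix k(x_i, x_j).
x = np.linspace(0.0, 1.0, 200)
mean = m(x)
cov = k(x, x) + 1e-10 * np.eye(x.size)   # small jitter for numerical positive definiteness

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=3)   # three realizations f(omega_i, .)
print(samples.shape)   # (3, 200)
```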
We start our theoretical results with a limit in a Gaussian random field related to the derivative of its realization. The following theorem is a classic result with a proof in [5] (p. 84). Here, we adapt the statement into our framework and reorganize the proof.
Theorem 2.5.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. Then there exists a real-valued random field Df on R such that for each fixed x ∈ R,
\[
\frac{f(\omega, x + \delta) - f(\omega, x)}{\delta} \xrightarrow[\delta \to 0]{\mathrm{m.s.}} Df(\omega, x). \tag{3}
\]

Remark 2.6.
Here "m.s." stands for mean-square convergence, i.e., convergence in L². Since mean-square convergence implies convergence in probability and convergence in distribution, the limit in Theorem 2.5 also holds in probability and in distribution.

Proof of Theorem 2.5. Denote
\[
k_{11}(x, x') = \frac{\partial^2}{\partial x \partial x'} k(x, x'). \tag{4}
\]
Let x ∈ R be fixed. For any δ ∈ R \ {0}, define
\[
\xi_\delta = \xi_\delta(\omega) = \frac{f(\omega, x + \delta) - f(\omega, x)}{\delta}. \tag{5}
\]
Suppose δ, τ ∈ R \ {0}. Then
\begin{align}
E[\xi_\delta \xi_\tau]
&= E\left[ \frac{(f(\omega, x+\delta) - f(\omega, x))(f(\omega, x+\tau) - f(\omega, x))}{\delta\tau} \right] \nonumber \\
&= \frac{1}{\delta\tau} \big\{ E[f(\omega, x+\delta) f(\omega, x+\tau)] - E[f(\omega, x+\delta) f(\omega, x)] - E[f(\omega, x) f(\omega, x+\tau)] + E[f(\omega, x) f(\omega, x)] \big\} \nonumber \\
&= \frac{1}{\delta\tau} \big\{ k(x+\delta, x+\tau) + m(x+\delta) m(x+\tau) - k(x+\delta, x) - m(x+\delta) m(x) - k(x, x+\tau) - m(x) m(x+\tau) + k(x, x) + m(x) m(x) \big\} \nonumber \\
&= \frac{1}{\delta\tau} \big\{ k(x+\delta, x+\tau) - k(x+\delta, x) - k(x, x+\tau) + k(x, x) \big\} + \frac{1}{\delta\tau} \big\{ m(x+\delta) m(x+\tau) - m(x+\delta) m(x) - m(x) m(x+\tau) + m(x) m(x) \big\}. \tag{6}
\end{align}
Since m(x) is differentiable and k(x, x') is twice continuously differentiable,
\[
\lim_{\delta, \tau \to 0} E[\xi_\delta \xi_\tau] = k_{11}(x, x) + m'(x) m'(x). \tag{7}
\]
Therefore,
\[
\lim_{\delta, \tau \to 0} E[|\xi_\delta - \xi_\tau|^2] = \lim_{\delta, \tau \to 0} \big\{ E[\xi_\delta \xi_\delta] + E[\xi_\tau \xi_\tau] - 2 E[\xi_\delta \xi_\tau] \big\} = 0. \tag{8}
\]
Choose a sequence τ_i → 0 (i = 1, 2, . . .) such that
\[
E[|\xi_{\tau_{i+1}} - \xi_{\tau_i}|^2] \le \frac{1}{4^i}. \tag{9}
\]
Then
\[
E|\xi_{\tau_{i+1}} - \xi_{\tau_i}| \le \sqrt{E[|\xi_{\tau_{i+1}} - \xi_{\tau_i}|^2]} \le \frac{1}{2^i}. \tag{10}
\]
By the monotone convergence theorem,
\[
E\left[ \sum_{i=1}^{\infty} |\xi_{\tau_{i+1}} - \xi_{\tau_i}| \right] = \sum_{i=1}^{\infty} E|\xi_{\tau_{i+1}} - \xi_{\tau_i}| \le 1 < \infty. \tag{11}
\]
Thus,
\[
P\left( \sum_{i=1}^{\infty} |\xi_{\tau_{i+1}} - \xi_{\tau_i}| < \infty \right) = 1. \tag{12}
\]
So the random variable η = ξ_{τ_1} + ∑_{i=1}^{∞} (ξ_{τ_{i+1}} − ξ_{τ_i}) is well defined, and it is a candidate to be the limit in Eq. (3). Now, by the monotone convergence theorem and the Cauchy-Schwarz inequality, for any j ≥ 1,
\begin{align}
E[|\eta - \xi_{\tau_j}|^2]
&= E\left[ \Big| \sum_{i=j}^{\infty} (\xi_{\tau_{i+1}} - \xi_{\tau_i}) \Big|^2 \right]
\le E\left[ \Big( \sum_{i=j}^{\infty} |\xi_{\tau_{i+1}} - \xi_{\tau_i}| \Big)^2 \right]
= E\left[ \sum_{i=j}^{\infty} \sum_{i'=j}^{\infty} |\xi_{\tau_{i+1}} - \xi_{\tau_i}| \, |\xi_{\tau_{i'+1}} - \xi_{\tau_{i'}}| \right] \nonumber \\
&= \sum_{i=j}^{\infty} \sum_{i'=j}^{\infty} E\big[ |\xi_{\tau_{i+1}} - \xi_{\tau_i}| \, |\xi_{\tau_{i'+1}} - \xi_{\tau_{i'}}| \big]
\le \sum_{i=j}^{\infty} \sum_{i'=j}^{\infty} \sqrt{E[|\xi_{\tau_{i+1}} - \xi_{\tau_i}|^2] \, E[|\xi_{\tau_{i'+1}} - \xi_{\tau_{i'}}|^2]} \nonumber \\
&\le \sum_{i=j}^{\infty} \sum_{i'=j}^{\infty} \sqrt{\frac{1}{4^i} \frac{1}{4^{i'}}}
= \sum_{i=j}^{\infty} \frac{1}{2^i} \sum_{i'=j}^{\infty} \frac{1}{2^{i'}}
= \left( \frac{1}{2^{j-1}} \right)^2. \tag{13}
\end{align}
Since
\[
E[|\eta - \xi_\delta|^2] \le E[2|\eta - \xi_{\tau_j}|^2 + 2|\xi_{\tau_j} - \xi_\delta|^2] = 2 E[|\eta - \xi_{\tau_j}|^2] + 2 E[|\xi_{\tau_j} - \xi_\delta|^2], \tag{14}
\]
we have
\[
\lim_{\delta \to 0} E[|\eta - \xi_\delta|^2] = 0, \tag{15}
\]
or
\[
\xi_\delta \xrightarrow[\delta \to 0]{\mathrm{m.s.}} \eta. \tag{16}
\]
Let Df(ω, x) = η, do the same for every x, and the proof is complete.

Remark 2.7. For δ ∈ R \ {0}, let
\[
f_\delta = f_\delta(\omega, x) = \frac{f(\omega, x + \delta) - f(\omega, x)}{\delta} \tag{17}
\]
be a random field on R. One could also consider the convergence of the family {f_δ | δ ∈ R \ {0}} to Df. We have refrained from getting into this type of consideration for the sake of conciseness.

The next lemma indicates that the limit of a sequence of Gaussian random variables is still a Gaussian random variable. This is a classic result with various statements and proofs. Here, we give our statement and proof that fits into our framework.
Lemma 2.8.
Let (Ω, F, P) be a probability space and ξ_δ (δ ∈ R) be a family of random variables such that the ξ_δ (δ ≠ 0) are Gaussian random variables with mean μ_δ and variance σ_δ², and
\[ \xi_\delta \xrightarrow[\delta \to 0]{\mathrm{m.s.}} \xi_0. \tag{18} \]
Then ξ_0 has a Gaussian distribution with mean μ_0 and variance σ_0², and
\[ \lim_{\delta \to 0} \mu_\delta = \mu_0, \qquad \lim_{\delta \to 0} \sigma_\delta^2 = \sigma_0^2. \tag{19} \]

Proof. Suppose δ, τ ∈ R \ {0}. Since
\[ \lim_{\delta \to 0} E[|\xi_\delta - \xi_0|^2] = 0 \tag{20} \]
and
\[ E[|\xi_\delta - \xi_\tau|^2] \le 2 E[|\xi_\delta - \xi_0|^2] + 2 E[|\xi_\tau - \xi_0|^2], \tag{21} \]
we have
\[ \lim_{\delta, \tau \to 0} E[|\xi_\delta - \xi_\tau|^2] = 0. \tag{22} \]
Thus,
\[ |\mu_\delta - \mu_\tau| = |E[\xi_\delta - \xi_\tau]| \le E|\xi_\delta - \xi_\tau| \le \sqrt{E[|\xi_\delta - \xi_\tau|^2]}, \tag{23} \]
which indicates
\[ \lim_{\delta, \tau \to 0} |\mu_\delta - \mu_\tau| = 0. \tag{24} \]
Hence, there exists μ_0 ∈ R such that
\[ \lim_{\delta \to 0} \mu_\delta = \mu_0. \tag{25} \]
By (22), there exists Δ > 0 such that for all δ (0 < |δ| ≤ Δ), we have
\[ E[|\xi_\delta - \xi_\Delta|^2] \le 1. \tag{26} \]
So
\[ E[\xi_\delta^2] \le 2 E[|\xi_\delta - \xi_\Delta|^2] + 2 E[\xi_\Delta^2] \le 2 + 2 E[\xi_\Delta^2]. \tag{27} \]
Since ξ_Δ is Gaussian, E[ξ_Δ²] is finite. Now, for any δ (0 < |δ| ≤ Δ) and τ (0 < |τ| ≤ Δ), by the Cauchy-Schwarz inequality,
\begin{align}
|\sigma_\delta^2 - \sigma_\tau^2|
&= |(E[\xi_\delta^2] - \mu_\delta^2) - (E[\xi_\tau^2] - \mu_\tau^2)|
\le |E[\xi_\delta^2] - E[\xi_\tau^2]| + |\mu_\delta^2 - \mu_\tau^2| \nonumber \\
&\le E|\xi_\delta^2 - \xi_\tau^2| + |\mu_\delta^2 - \mu_\tau^2|
= E\big[|\xi_\delta - \xi_\tau| \, |\xi_\delta + \xi_\tau|\big] + |\mu_\delta^2 - \mu_\tau^2| \nonumber \\
&\le \sqrt{E[|\xi_\delta - \xi_\tau|^2]} \sqrt{E[|\xi_\delta + \xi_\tau|^2]} + |\mu_\delta^2 - \mu_\tau^2|
\le \sqrt{E[|\xi_\delta - \xi_\tau|^2]} \sqrt{2 E[\xi_\delta^2] + 2 E[\xi_\tau^2]} + |\mu_\delta^2 - \mu_\tau^2| \nonumber \\
&\le \sqrt{E[|\xi_\delta - \xi_\tau|^2]} \sqrt{8 + 8 E[\xi_\Delta^2]} + |\mu_\delta^2 - \mu_\tau^2|. \tag{28}
\end{align}
By Eqs. (22), (25), and (28), we have
\[ \lim_{\delta, \tau \to 0} |\sigma_\delta^2 - \sigma_\tau^2| = 0. \tag{29} \]
Since σ_δ² ≥ 0 for δ ∈ R \ {0}, there exists σ_0² ≥ 0 such that
\[ \lim_{\delta \to 0} \sigma_\delta^2 = \sigma_0^2. \tag{30} \]
If σ_0² = 0, then
\begin{align}
E|\xi_0 - \mu_0|
&\le E|\xi_0 - \xi_\delta| + E|\xi_\delta - \mu_\delta| + |\mu_\delta - \mu_0| \nonumber \\
&\le \sqrt{E[|\xi_0 - \xi_\delta|^2]} + \sqrt{E[|\xi_\delta - \mu_\delta|^2]} + |\mu_\delta - \mu_0|
= \sqrt{E[|\xi_0 - \xi_\delta|^2]} + \sqrt{\sigma_\delta^2} + |\mu_\delta - \mu_0|. \tag{31}
\end{align}
Let δ → 0, then
\[ E|\xi_0 - \mu_0| = 0, \tag{32} \]
or
\[ P(\xi_0 = \mu_0) = 1. \tag{33} \]
Therefore, ξ_0 has a degenerate Gaussian distribution with mean μ_0 and variance 0. If σ_0² > 0, then there exists Δ' > 0 such that for all δ (0 < |δ| ≤ Δ'), we have
\[ \sigma_\delta^2 \ge \sigma_0^2 / 2 > 0. \tag{34} \]
For all δ (|δ| ≤ Δ'), define
\[ \eta_\delta = \frac{\xi_\delta - \mu_\delta}{\sigma_\delta}. \tag{35} \]
Then, when 0 < |δ| ≤ Δ', η_δ has the standard Gaussian distribution. Now,
\begin{align}
E[|\eta_\delta - \eta_0|^2]
&= E\left[ \Big| \frac{\xi_\delta - \mu_\delta}{\sigma_\delta} - \frac{\xi_0 - \mu_0}{\sigma_0} \Big|^2 \right]
\le 2 E\left[ \Big| \frac{\xi_\delta}{\sigma_\delta} - \frac{\xi_0}{\sigma_0} \Big|^2 \right] + 2 \Big| \frac{\mu_\delta}{\sigma_\delta} - \frac{\mu_0}{\sigma_0} \Big|^2 \nonumber \\
&\le 4 E\left[ \Big| \frac{\xi_\delta}{\sigma_\delta} - \frac{\xi_\delta}{\sigma_0} \Big|^2 \right] + 4 E\left[ \Big| \frac{\xi_\delta}{\sigma_0} - \frac{\xi_0}{\sigma_0} \Big|^2 \right] + 2 \Big| \frac{\mu_\delta}{\sigma_\delta} - \frac{\mu_0}{\sigma_0} \Big|^2 \nonumber \\
&= 4 \left( \frac{1}{\sigma_\delta} - \frac{1}{\sigma_0} \right)^2 E[\xi_\delta^2] + \frac{4}{\sigma_0^2} E[|\xi_\delta - \xi_0|^2] + 2 \Big| \frac{\mu_\delta}{\sigma_\delta} - \frac{\mu_0}{\sigma_0} \Big|^2 \nonumber \\
&= 4 \frac{(\sigma_0 - \sigma_\delta)^2}{\sigma_\delta^2 \sigma_0^2} (\sigma_\delta^2 + \mu_\delta^2) + \frac{4}{\sigma_0^2} E[|\xi_\delta - \xi_0|^2] + 2 \Big| \frac{\mu_\delta}{\sigma_\delta} - \frac{\mu_0}{\sigma_0} \Big|^2. \tag{36}
\end{align}
Thus,
\[ \lim_{\delta \to 0} E[|\eta_\delta - \eta_0|^2] = 0. \tag{37} \]
Let
\[ \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt \tag{38} \]
be the cumulative distribution function (CDF) of the standard Gaussian distribution. For any z ∈ R and any ε > 0,
\[ P(\eta_0 \le z) \le P(\eta_\delta \le z + \varepsilon) + P(|\eta_\delta - \eta_0| > \varepsilon) \le \Phi(z + \varepsilon) + \frac{E[|\eta_\delta - \eta_0|^2]}{\varepsilon^2}. \tag{39} \]
Let δ → 0, then
\[ P(\eta_0 \le z) \le \Phi(z + \varepsilon). \tag{40} \]
Let ε → 0, then
\[ P(\eta_0 \le z) \le \Phi(z). \tag{41} \]
On the other hand,
\[ P(\eta_0 > z) \le P(\eta_\delta > z - \varepsilon) + P(|\eta_\delta - \eta_0| > \varepsilon) \le 1 - \Phi(z - \varepsilon) + \frac{E[|\eta_\delta - \eta_0|^2]}{\varepsilon^2}. \tag{42} \]
Let δ → 0, then
\[ P(\eta_0 > z) \le 1 - \Phi(z - \varepsilon). \tag{43} \]
Let ε → 0, then
\[ P(\eta_0 > z) \le 1 - \Phi(z), \tag{44} \]
or
\[ P(\eta_0 \le z) \ge \Phi(z). \tag{45} \]
Combining (41) and (45), we have
\[ P(\eta_0 \le z) = \Phi(z). \tag{46} \]
Thus, η_0 has the standard Gaussian distribution, which implies that ξ_0 has a Gaussian distribution with mean μ_0 and variance σ_0².

Now we define an augmented random field consisting of a Gaussian random field f and the Df associated with it. Then we prove that this augmented random field is a Gaussian random field.

Definition 2.9.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. The real-valued random field Df defined in Theorem 2.5 is called the derivative random field of f. Define a real-valued random field on R × {0, 1}:
\[ \tilde{f} : \Omega \times \mathbb{R} \times \{0, 1\} \to \mathbb{R} \tag{47} \]
such that
\[ \tilde{f}(\omega, x, 0) = f(\omega, x), \qquad \tilde{f}(\omega, x, 1) = Df(\omega, x). \tag{48} \]
We call $\tilde{f}$ the augmented random field of f.

Theorem 2.10. Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. Then the augmented random field $\tilde{f}$ of f is a Gaussian random field on R × {0, 1}.

Proof. For any p, q ∈ N₊ ∪ {0} such that p + q ≥ 1, any x_1, . . . , x_p, y_1, . . . , y_q ∈ R, and any c_1, . . . , c_p, d_1, . . . , d_q ∈ R, we have the linear combination
\begin{align}
\sum_{i=1}^{p} c_i \tilde{f}(\omega, x_i, 0) + \sum_{j=1}^{q} d_j \tilde{f}(\omega, y_j, 1)
&= \sum_{i=1}^{p} c_i f(\omega, x_i) + \sum_{j=1}^{q} d_j Df(\omega, y_j) \nonumber \\
&= \sum_{i=1}^{p} c_i f(\omega, x_i) + \sum_{j=1}^{q} d_j \lim_{\delta_j \to 0} \frac{f(\omega, y_j + \delta_j) - f(\omega, y_j)}{\delta_j} \nonumber \\
&= \lim_{\delta_1 \to 0} \cdots \lim_{\delta_q \to 0} \left[ \sum_{i=1}^{p} c_i f(\omega, x_i) + \sum_{j=1}^{q} d_j \frac{f(\omega, y_j + \delta_j) - f(\omega, y_j)}{\delta_j} \right], \tag{49}
\end{align}
where the limits are taken in the mean-square sense. Since f is a Gaussian random field,
\[ \sum_{i=1}^{p} c_i f(\omega, x_i) + \sum_{j=1}^{q} d_j \frac{f(\omega, y_j + \delta_j) - f(\omega, y_j)}{\delta_j} \tag{50} \]
has a Gaussian distribution for any δ_1, . . . , δ_q ∈ R \ {0}. By Lemma 2.8, the limit has a Gaussian distribution. As this holds for every linear combination, $\tilde{f}$ is a Gaussian random field.

After proving that the augmented Gaussian random field is well defined, we calculate its mean and covariance functions.

Theorem 2.11.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. Then the augmented random field $\tilde{f}$ of f has mean function
\[ \tilde{m} : \mathbb{R} \times \{0, 1\} \to \mathbb{R} \tag{51} \]
such that
\[ \tilde{m}(x, 0) = m(x), \qquad \tilde{m}(x, 1) = \frac{d}{dx} m(x), \tag{52} \]
and covariance function
\[ \tilde{k} : \mathbb{R} \times \{0, 1\} \times \mathbb{R} \times \{0, 1\} \to \mathbb{R} \tag{53} \]
such that
\[ \tilde{k}(x, 0, x', 0) = k(x, x'), \quad \tilde{k}(x, 0, x', 1) = \frac{\partial}{\partial x'} k(x, x'), \quad \tilde{k}(x, 1, x', 0) = \frac{\partial}{\partial x} k(x, x'), \quad \tilde{k}(x, 1, x', 1) = \frac{\partial^2}{\partial x \partial x'} k(x, x'). \tag{54} \]

Proof. By the definition of $\tilde{f}$,
\[ \tilde{m}(x, 0) = E[\tilde{f}(\omega, x, 0)] = E[f(\omega, x)] = m(x). \tag{55} \]
By Theorem 2.5 and Lemma 2.8,
\[ \tilde{m}(x, 1) = E[\tilde{f}(\omega, x, 1)] = E[Df(\omega, x)] = E\left[ \lim_{\delta \to 0} \frac{f(\omega, x+\delta) - f(\omega, x)}{\delta} \right] = \lim_{\delta \to 0} E\left[ \frac{f(\omega, x+\delta) - f(\omega, x)}{\delta} \right] = \lim_{\delta \to 0} \frac{m(x+\delta) - m(x)}{\delta} = \frac{d}{dx} m(x). \tag{56} \]
Similarly, by definition,
\[ \tilde{k}(x, 0, x', 0) = E[(\tilde{f}(\omega, x, 0) - \tilde{m}(x, 0))(\tilde{f}(\omega, x', 0) - \tilde{m}(x', 0))] = E[(f(\omega, x) - m(x))(f(\omega, x') - m(x'))] = k(x, x'). \tag{57} \]
Also,
\begin{align}
\tilde{k}(x, 0, x', 1)
&= E[\tilde{f}(\omega, x, 0) \tilde{f}(\omega, x', 1)] - E[\tilde{f}(\omega, x, 0)]\, E[\tilde{f}(\omega, x', 1)]
= E[f(\omega, x) Df(\omega, x')] - m(x) m'(x') \nonumber \\
&= E\left[ f(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + E\left[ f(\omega, x) \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right] - m(x) m'(x') \nonumber \\
&= E\left[ f(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + \frac{k(x, x'+\delta) + m(x) m(x'+\delta) - k(x, x') - m(x) m(x')}{\delta} - m(x) m'(x') \nonumber \\
&= E\left[ f(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + \frac{k(x, x'+\delta) - k(x, x')}{\delta} + m(x) \left( \frac{m(x'+\delta) - m(x')}{\delta} - m'(x') \right). \nonumber
\end{align}
Since
\[ \left| E\left[ f(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] \right| \le \sqrt{E[|f(\omega, x)|^2]} \sqrt{E\left[ \Big| Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \Big|^2 \right]}, \tag{58} \]
letting δ → 0 gives
\[ \tilde{k}(x, 0, x', 1) = \frac{\partial}{\partial x'} k(x, x'). \tag{59} \]
Similarly,
\[ \tilde{k}(x, 1, x', 0) = \frac{\partial}{\partial x} k(x, x'). \tag{60} \]
Finally,
\begin{align}
\tilde{k}(x, 1, x', 1)
&= E[Df(\omega, x) Df(\omega, x')] - m'(x) m'(x') \nonumber \\
&= E\left[ Df(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + E\left[ Df(\omega, x) \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right] - m'(x) m'(x') \nonumber \\
&= E\left[ Df(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + \frac{\frac{\partial}{\partial x} k(x, x'+\delta) + m'(x) m(x'+\delta) - \frac{\partial}{\partial x} k(x, x') - m'(x) m(x')}{\delta} - m'(x) m'(x') \nonumber \\
&= E\left[ Df(\omega, x) \left( Df(\omega, x') - \frac{f(\omega, x'+\delta) - f(\omega, x')}{\delta} \right) \right] + \frac{\frac{\partial}{\partial x} k(x, x'+\delta) - \frac{\partial}{\partial x} k(x, x')}{\delta} + m'(x) \left( \frac{m(x'+\delta) - m(x')}{\delta} - m'(x') \right). \nonumber
\end{align}
Letting δ → 0 and using that k(x, x') is twice continuously differentiable,
\[ \tilde{k}(x, 1, x', 1) = \frac{\partial^2}{\partial x \partial x'} k(x, x'). \tag{61} \]

Corollary 2.12.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is differentiable and k(x, x') is twice continuously differentiable. Then the derivative random field Df of f is a Gaussian random field on R with mean function m'(x) and covariance function ∂²k(x, x')/∂x∂x'.

Sketch of proof. This is a direct conclusion from Theorem 2.5, Theorem 2.10, and Theorem 2.11.
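To make Theorem 2.11 and Corollary 2.12 concrete, the four blocks of the augmented covariance $\tilde{k}$ can be written out for a specific kernel and used to sample f and Df jointly. A minimal sketch in Python (the squared exponential kernel k(x, x') = σ² exp(−(x − x')²/(2l²)) and all numerical values are illustrative assumptions; the finite-difference comparison at the end is only a sanity check, not part of the theory):

```python
import numpy as np

sigma, l = 1.0, 0.3   # assumed hyperparameters

def k00(x, xp):
    """k(x, x') for the squared exponential kernel."""
    d = x[:, None] - xp[None, :]
    return sigma**2 * np.exp(-d**2 / (2 * l**2))

def k01(x, xp):
    """dk/dx' : covariance between f(x) and Df(x') (Theorem 2.11)."""
    d = x[:, None] - xp[None, :]
    return sigma**2 * (d / l**2) * np.exp(-d**2 / (2 * l**2))

def k10(x, xp):
    """dk/dx : covariance between Df(x) and f(x')."""
    return -k01(x, xp)

def k11(x, xp):
    """d^2 k / dx dx' : covariance of the derivative random field Df (Corollary 2.12)."""
    d = x[:, None] - xp[None, :]
    return sigma**2 * (1.0 / l**2 - d**2 / l**4) * np.exp(-d**2 / (2 * l**2))

# Joint covariance of (f(x_1..p), Df(x_1..p)) on a grid: the augmented field on R x {0, 1}.
x = np.linspace(0.0, 1.0, 200)
K = np.block([[k00(x, x), k01(x, x)],
              [k10(x, x), k11(x, x)]]) + 1e-8 * np.eye(2 * x.size)

rng = np.random.default_rng(1)
z = rng.multivariate_normal(np.zeros(2 * x.size), K)   # zero mean assumed
f_sample, Df_sample = z[:x.size], z[x.size:]

# Sanity check: the Df sample should be close to the finite difference of the f sample.
fd = np.gradient(f_sample, x)
print(np.max(np.abs(fd - Df_sample)))   # small compared to the scale of Df
```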
The definition of the aforementioned augmented Gaussian random field can be extended to more general cases involving higher order derivatives.
Definition 2.13.
Let f be a Gaussian random field on R such that the derivative random field Df is a Gaussian random field on R with a differentiable mean function and a twice continuously differentiable covariance function. By Corollary 2.12, the derivative random field D(Df) of Df is a Gaussian random field on R. Define the second order derivative random field of f as $D^2 f = D(Df)$. Recursively, define the n-th order derivative random field of f as $D^n f = D(D^{n-1} f)$ when $D^{n-1} f$ is a Gaussian random field on R with a differentiable mean function and a twice continuously differentiable covariance function.

Corollary 2.14. Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is n times differentiable and k(x, x') is 2n times continuously differentiable (n ∈ N₊). Then $Df, \ldots, D^n f$ are well-defined and are Gaussian random fields on R.

Sketch of proof. This corollary can be proved by applying Corollary 2.12 recursively.

Now we can define the general augmented Gaussian random field involving higher order derivatives.
Definition 2.15.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is n times differentiable and k(x, x') is 2n times continuously differentiable (n ∈ N₊). Define the n-th order augmented random field of f as
\[ \tilde{f}_n : \Omega \times \mathbb{R} \times \{0, 1, \ldots, n\} \to \mathbb{R} \tag{62} \]
such that
\[ \tilde{f}_n(\omega, x, 0) = f(\omega, x), \quad \tilde{f}_n(\omega, x, 1) = Df(\omega, x), \quad \ldots, \quad \tilde{f}_n(\omega, x, n) = D^n f(\omega, x). \tag{63} \]

Theorem 2.16.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is n times differentiable and k(x, x') is 2n times continuously differentiable (n ∈ N₊). Then the n-th order augmented random field $\tilde{f}_n$ of f is a Gaussian random field on R × {0, 1, . . . , n}.

Sketch of proof. This can be proved in a similar way to the proof of Theorem 2.10.

The following theorem calculates the mean and covariance functions of the n-th order augmented Gaussian random field.

Theorem 2.17. Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) is n times differentiable and k(x, x') is 2n times continuously differentiable (n ∈ N₊). Then the n-th order augmented random field $\tilde{f}_n$ of f has mean function
\[ \tilde{m}_n : \mathbb{R} \times \{0, 1, \ldots, n\} \to \mathbb{R}, \qquad \tilde{m}_n(x, i) = \frac{d^i}{dx^i} m(x), \tag{64} \]
and covariance function
\[ \tilde{k}_n : \mathbb{R} \times \{0, 1, \ldots, n\} \times \mathbb{R} \times \{0, 1, \ldots, n\} \to \mathbb{R}, \qquad \tilde{k}_n(x, i, x', j) = \frac{\partial^{i+j}}{\partial x^i \partial x'^j} k(x, x'). \tag{65} \]

Sketch of proof. When i, j ∈ {0, 1}, the formulas hold by Theorem 2.11. The general case then follows by induction, in a similar way to the proof of Theorem 2.11.

Furthermore, we can extend the augmented Gaussian random field to the infinite order case and calculate the mean and covariance functions accordingly.

Definition 2.18.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) and k(x, x') are smooth. Define the infinite order augmented random field of f as
\[ \tilde{f}_\infty : \Omega \times \mathbb{R} \times \mathbb{N} \to \mathbb{R} \tag{66} \]
such that
\[ \tilde{f}_\infty(\omega, x, 0) = f(\omega, x), \quad \tilde{f}_\infty(\omega, x, 1) = Df(\omega, x), \quad \ldots, \quad \tilde{f}_\infty(\omega, x, n) = D^n f(\omega, x), \quad \ldots \tag{67} \]

Theorem 2.19.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) and k(x, x') are smooth. Then the infinite order augmented random field $\tilde{f}_\infty$ of f is a Gaussian random field on R × N.

Sketch of proof. This is proved in a similar way to the proof of Theorem 2.10.
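In computations, the mixed derivatives $\partial^{i+j} k / \partial x^i \partial x'^j$ that appear in Theorem 2.17 (and in Theorem 2.20 below for the infinite order case) rarely need to be derived by hand; they can be generated symbolically once the kernel is fixed. A minimal sketch in Python with SymPy, using the squared exponential kernel as an assumed example (the kernel choice and the lambdify step are illustrative, not prescribed by the text):

```python
import sympy as sp

x, xp, sigma, l = sp.symbols("x x_prime sigma l", real=True, positive=True)

# Assumed base kernel: squared exponential k(x, x') = sigma^2 exp(-(x - x')^2 / (2 l^2)).
k = sigma**2 * sp.exp(-(x - xp)**2 / (2 * l**2))

def k_tilde(i, j):
    """Covariance entry k~(x, i, x', j) = d^{i+j} k / dx^i dx'^j (Theorem 2.17)."""
    expr = k
    for _ in range(i):
        expr = sp.diff(expr, x)
    for _ in range(j):
        expr = sp.diff(expr, xp)
    return sp.simplify(expr)

# For example, the entries needed when the observable and its first two derivatives are used:
for i in range(3):
    for j in range(3):
        print(i, j, k_tilde(i, j))

# A numerical version can be obtained for use in a covariance matrix:
k11_num = sp.lambdify((x, xp, sigma, l), k_tilde(1, 1), "numpy")
print(k11_num(0.2, 0.5, 1.0, 0.3))
```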
Theorem 2.20.
Let f be a Gaussian random field on R with mean function m(x) and covariance function k(x, x') such that m(x) and k(x, x') are smooth. Then the infinite order augmented random field $\tilde{f}_\infty$ of f has mean function
\[ \tilde{m}_\infty : \mathbb{R} \times \mathbb{N} \to \mathbb{R}, \qquad \tilde{m}_\infty(x, i) = \frac{d^i}{dx^i} m(x), \tag{68} \]
and covariance function
\[ \tilde{k}_\infty : \mathbb{R} \times \mathbb{N} \times \mathbb{R} \times \mathbb{N} \to \mathbb{R}, \qquad \tilde{k}_\infty(x, i, x', j) = \frac{\partial^{i+j}}{\partial x^i \partial x'^j} k(x, x'). \tag{69} \]

Sketch of proof. When i, j ∈ {0, 1}, the formulas hold by Theorem 2.11. The result is then proved by induction, in a similar way to Theorem 2.11.

Computational framework
In this section, we describe the computational framework for the AGRF prediction. It is based on the conventional GP regression. Specifically, in this work, we only consider noiseless observations for demonstration purposes. Noise in the observations can be incorporated by modifying the covariance matrix as in the conventional GP regression (see [26]). Also, since we use a univariate observable in the theoretical framework, we still consider this scenario here to keep consistency. Formulations for the multivariate cases can be deduced without difficulty based on our results and the gradient-enhanced Kriging/Cokriging methods (see, e.g., [30, 14, 7]).
As in the conventional GP regression, we aim to condition the joint Gaussian prior distribution on the observations, so as to provide a posterior joint Gaussian distribution. In our framework, the observations include the collected data of the observable and its derivatives of different orders. Suppose we are given a finite collection of real-valued data pairs:
\[
\begin{array}{ll}
\text{Observable:} & (x_{0,1}, y_{0,1}) \;\; (x_{0,2}, y_{0,2}) \;\; \ldots \;\; (x_{0,p_0}, y_{0,p_0}) \\
\text{First order derivative:} & (x_{1,1}, y_{1,1}) \;\; (x_{1,2}, y_{1,2}) \;\; \ldots \;\; (x_{1,p_1}, y_{1,p_1}) \\
\text{Second order derivative:} & (x_{2,1}, y_{2,1}) \;\; (x_{2,2}, y_{2,2}) \;\; \ldots \;\; (x_{2,p_2}, y_{2,p_2}) \\
\quad\vdots & \\
n\text{th order derivative:} & (x_{n,1}, y_{n,1}) \;\; (x_{n,2}, y_{n,2}) \;\; \ldots \;\; (x_{n,p_n}, y_{n,p_n})
\end{array} \tag{70}
\]
with n ≥ 1 and p_0, p_1, p_2, . . . , p_n ≥ 0. Here the x_{i,j} are locations and the y_{i,j} are the data collected at these locations. We also introduce the notation X = {x_{i,j}} and y = {y_{i,j}}, where i ∈ {0, 1, . . . , n} and j ∈ {1, . . . , p_i}. Of note, here we consider a general case, and it is not necessary that x_{i,j} = x_{i+1,j}. In other words, it is possible that the observable and its derivatives are sampled at different locations. We assume that a mean function and a covariance function are given for the Gaussian random field f that describes the observable:
\[ m : \mathbb{R} \to \mathbb{R} \tag{71} \]
and
\[ k : \mathbb{R} \times \mathbb{R} \to \mathbb{R} \tag{72} \]
such that m is n times differentiable and k is symmetric, nonnegative definite, and 2n times continuously differentiable. By Theorem 2.4 and Theorem 2.16, there exists a Gaussian random field $\tilde{f}_n$ on R × {0, 1, . . . , n} whose mean function and covariance function are given by Theorem 2.17. We use the augmented Gaussian random field $\tilde{f}_n$ to model the data such that
\[ \tilde{f}_n(\omega, x_{i,j}, i) = y_{i,j} \quad \text{for } i \in \{0, 1, \ldots, n\} \text{ and } j \in \{1, \ldots, p_i\}. \tag{73} \]
The prediction of the q-th (0 ≤ q ≤ n) order derivative at any x ∈ R is the posterior mean of the random variable $\tilde{f}_n(\omega, x, q)$, and the uncertainty in the prediction can be described by the confidence interval based on the posterior variance (or the standard deviation).

Since $\tilde{f}_n$ is a Gaussian random field, $\{\tilde{f}_n(\omega, x, q), \tilde{f}_n(\omega, x_{i,j}, i)\}$ satisfies the multivariate Gaussian distribution
\[
\begin{bmatrix}
\tilde{f}_n(\omega, x, q) \\ \tilde{f}_n(\omega, x_{0,1}, 0) \\ \vdots \\ \tilde{f}_n(\omega, x_{n,p_n}, n)
\end{bmatrix}
\sim \mathcal{N}\left(
\begin{bmatrix}
\tilde{m}_n(x, q) \\ \tilde{m}_n(x_{0,1}, 0) \\ \vdots \\ \tilde{m}_n(x_{n,p_n}, n)
\end{bmatrix},
\begin{bmatrix}
\tilde{k}_n(x, q, x, q) & \tilde{k}_n(x, q, x_{0,1}, 0) & \cdots & \tilde{k}_n(x, q, x_{n,p_n}, n) \\
\tilde{k}_n(x_{0,1}, 0, x, q) & \tilde{k}_n(x_{0,1}, 0, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{0,1}, 0, x_{n,p_n}, n) \\
\vdots & \vdots & \ddots & \vdots \\
\tilde{k}_n(x_{n,p_n}, n, x, q) & \tilde{k}_n(x_{n,p_n}, n, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{n,p_n}, n, x_{n,p_n}, n)
\end{bmatrix}
\right). \tag{74}
\]
Then the posterior distribution of $\tilde{f}_n(\omega, x, q)$ given (73) is also a Gaussian distribution:
\[
\left( \tilde{f}_n(\omega, x, q) \,\middle|\, \tilde{f}_n(\omega, x_{i,j}, i) = y_{i,j}, \; i \in \{0, 1, \ldots, n\}, \; j \in \{1, \ldots, p_i\} \right) \sim \mathcal{N}(\mu, \sigma^2), \tag{75}
\]
where
\[
\mu = \tilde{m}_n(x, q) + \begin{bmatrix} \tilde{k}_n(x, q, x_{0,1}, 0) & \cdots & \tilde{k}_n(x, q, x_{n,p_n}, n) \end{bmatrix} K^{-1}
\left( \begin{bmatrix} y_{0,1} \\ \vdots \\ y_{n,p_n} \end{bmatrix} - \begin{bmatrix} \tilde{m}_n(x_{0,1}, 0) \\ \vdots \\ \tilde{m}_n(x_{n,p_n}, n) \end{bmatrix} \right),
\]
\[
\sigma^2 = \tilde{k}_n(x, q, x, q) - \begin{bmatrix} \tilde{k}_n(x, q, x_{0,1}, 0) & \cdots & \tilde{k}_n(x, q, x_{n,p_n}, n) \end{bmatrix} K^{-1}
\begin{bmatrix} \tilde{k}_n(x_{0,1}, 0, x, q) \\ \vdots \\ \tilde{k}_n(x_{n,p_n}, n, x, q) \end{bmatrix},
\]
\[
K = \begin{bmatrix}
\tilde{k}_n(x_{0,1}, 0, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{0,1}, 0, x_{n,p_n}, n) \\
\vdots & \ddots & \vdots \\
\tilde{k}_n(x_{n,p_n}, n, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{n,p_n}, n, x_{n,p_n}, n)
\end{bmatrix}.
\]
Here $\tilde{m}_n$ and $\tilde{k}_n$ are calculated according to Theorem 2.17. Now we have the posterior distribution of the q-th order derivative at x. The posterior mean is usually used as the prediction, and the posterior standard deviation is used to describe the uncertainty (e.g., the confidence interval) in the prediction.

In practice, we usually assume the form of m and k, which involve hyperparameters. We denote these parameters as θ. For example, in the widely used squared exponential covariance function k(x, x') = σ² exp(−(x − x')²/(2l²)), σ and l are hyperparameters. Similarly, the mean function m may include additional hyperparameters. For example, if m is a polynomial as in universal Kriging, the coefficients of the polynomial are hyperparameters. Similar to the standard GP method, the AGRF method identifies the hyperparameters via maximizing the following log likelihood:
\[
\log p(\mathbf{y} \mid \mathbf{X}, \theta) = -\frac{p_0 + \cdots + p_n}{2} \log(2\pi) - \frac{1}{2} \log(\det(K))
- \frac{1}{2} \left( \begin{bmatrix} y_{0,1} \\ \vdots \\ y_{n,p_n} \end{bmatrix} - \begin{bmatrix} \tilde{m}_n(x_{0,1}, 0) \\ \vdots \\ \tilde{m}_n(x_{n,p_n}, n) \end{bmatrix} \right)^{\!\top} K^{-1}
\left( \begin{bmatrix} y_{0,1} \\ \vdots \\ y_{n,p_n} \end{bmatrix} - \begin{bmatrix} \tilde{m}_n(x_{0,1}, 0) \\ \vdots \\ \tilde{m}_n(x_{n,p_n}, n) \end{bmatrix} \right), \tag{76}
\]
where
\[
K = \begin{bmatrix}
\tilde{k}_n(x_{0,1}, 0, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{0,1}, 0, x_{n,p_n}, n) \\
\vdots & \ddots & \vdots \\
\tilde{k}_n(x_{n,p_n}, n, x_{0,1}, 0) & \cdots & \tilde{k}_n(x_{n,p_n}, n, x_{n,p_n}, n)
\end{bmatrix}. \tag{77}
\]
After identifying θ, we obtain the posterior distribution in Eq. (75).

Numerical examples

In this section, we present two examples to illustrate the AGRF framework. In both examples, we use the zero mean function
\[ m(x) = 0 \tag{78} \]
and the squared exponential covariance function
\[ k(x, x') = \sigma^2 \exp\left( -\frac{(x - x')^2}{2 l^2} \right) \tag{79} \]
to construct the GRF representing the observable. The hyperparameters to identify are θ = (σ, l). We use the following relative L² error (RLE) to evaluate the accuracy of the prediction by different GRF-based methods: ‖u − ũ‖ / ‖u‖, where u is the exact function and ũ is the prediction by GRF-based methods.

Example 1

Consider the following function:
\[ y(x) = (3x - \cdots)\,\sin(14x - 4) \tag{80} \]
on x ∈ [0, ·]. The hyperparameters θ = (σ, l) are sought within prescribed bounds.
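As an illustration of how the posterior formulas above are assembled in this setting, the sketch below builds the AGRF prediction for a configuration like Case 2 below (observable plus first-derivative data) with the zero mean and squared exponential kernel of (78)-(79). The sampling locations, the data values, and the fixed hyperparameters are all illustrative assumptions (in the paper they come from the test function (80) and from maximizing (76)); the structure of the computation is what matters:

```python
import numpy as np

sigma, l = 1.0, 0.2   # assumed hyperparameters; in practice obtained by maximizing (76)

# Kernel blocks k~(x, i, x', j) = d^{i+j} k / dx^i dx'^j for the squared exponential kernel,
# written out for derivative orders i, j in {0, 1} (Theorem 2.17 / Theorem 2.11).
def k_block(x, i, xp, j):
    d = x[:, None] - xp[None, :]
    e = np.exp(-d**2 / (2 * l**2))
    if (i, j) == (0, 0):
        return sigma**2 * e
    if (i, j) == (0, 1):
        return sigma**2 * (d / l**2) * e
    if (i, j) == (1, 0):
        return -sigma**2 * (d / l**2) * e
    if (i, j) == (1, 1):
        return sigma**2 * (1.0 / l**2 - d**2 / l**4) * e
    raise ValueError("only orders 0 and 1 are implemented in this sketch")

# Assumed training data: observable values at five locations, first-derivative values at
# three (different) locations -- the "missing data" situation AGRF handles naturally.
x_obs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y_obs = np.array([0.2, -0.4, 0.1, 0.5, -0.3])
x_der = np.array([0.2, 0.5, 0.8])
y_der = np.array([1.0, -0.7, 0.4])

X = [(x_obs, 0), (x_der, 1)]                 # locations grouped by derivative order i
y = np.concatenate([y_obs, y_der])           # stacked observation vector (zero mean assumed)

# Covariance matrix K of the training data under the augmented field (Eq. (77)).
K = np.block([[k_block(xa, ia, xb, ib) for (xb, ib) in X] for (xa, ia) in X])
K += 1e-10 * np.eye(K.shape[0])              # jitter for numerical stability
K_inv_y = np.linalg.solve(K, y)

def predict(x_new, q):
    """Posterior mean and standard deviation of the q-th derivative at x_new (Eq. (75))."""
    k_star = np.hstack([k_block(x_new, q, xb, ib) for (xb, ib) in X])
    mean = k_star @ K_inv_y
    var = k_block(x_new, q, x_new, q).diagonal() - np.einsum(
        "ij,ij->i", k_star, np.linalg.solve(K, k_star.T).T)
    return mean, np.sqrt(np.maximum(var, 0.0))

x_grid = np.linspace(0.0, 1.0, 101)
for q in (0, 1):                             # predict the observable and its first derivative
    mu, sd = predict(x_grid, q)
    print(q, mu[:3], sd[:3])
```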
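The hyperparameters θ = (σ, l) themselves can be chosen by maximizing the log likelihood (76). A minimal sketch under the same assumptions (zero mean, squared exponential kernel, observable-only data for brevity; the data values, search bounds, and optimizer settings are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Assumed observable-only data (p_0 = 5, no derivative data) for brevity.
x_obs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y_obs = np.array([0.2, -0.4, 0.1, 0.5, -0.3])

def neg_log_likelihood(theta):
    """Negative of the log likelihood (76) for the zero-mean squared exponential model."""
    sigma, l = theta
    d = x_obs[:, None] - x_obs[None, :]
    K = sigma**2 * np.exp(-d**2 / (2 * l**2)) + 1e-10 * np.eye(x_obs.size)
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y_obs)
    return 0.5 * (x_obs.size * np.log(2 * np.pi) + logdet + y_obs @ alpha)

# Bounded optimization over an assumed search box for (sigma, l).
res = minimize(neg_log_likelihood, x0=np.array([1.0, 0.2]),
               bounds=[(1e-2, 10.0), (1e-2, 2.0)], method="L-BFGS-B")
sigma_hat, l_hat = res.x
print(sigma_hat, l_hat)
```

With derivative data, K is simply assembled from the blocks $\tilde{k}_n(\cdot, i, \cdot, j)$ as in the previous sketch; the rest of the computation is unchanged.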
Case 1: The data include the observable at five sampling locations only. Figure 1 shows y, y′, y″ predicted by AGRF with confidence intervals. The AGRF surrogate model for y is the same as that obtained by the conventional GP regression, because the former is a generalization of the latter. However, the conventional GP regression does not provide the prediction of y′ or y″.
Case 2: The data include the observable at five sampling locations and its first order derivative at three sampling locations. Figure 2 shows that the AGRF prediction matches the exact function and its derivatives better than the results shown in Case 1. This is because the derivative information is incorporated in the model.
Case 3: The data include the observable at five sampling locations and its second order derivative at three sampling locations. Figure 3 shows that, like Case 2, the AGRF prediction imitates the ups and downs that are present in the exact function but not fully reflected in the observable data. The prediction accuracy is enhanced by incorporating the second order derivative, and it is better than the prediction in Case 1.
Case 4: The data include the observable at five sampling locations, its first order derivative at three sampling locations, and its second order derivative at three sampling locations. Figure 4 demonstrates that the AGRF prediction of y almost coincides with the exact function, and the predictions of y′ and y″ are also very accurate in most regions. It is not surprising that by using all available information we can construct the most accurate surrogate model among all four cases.

We can see that AGRF is able to integrate the observable and its derivatives of any order, regardless of the location where they are collected. As one might expect, the AGRF prediction improves when more information is available. Table 1 provides a quantitative comparison of the prediction accuracy in the four cases above, which further verifies our observations in Figures 1-4.

Figure 1: [Example 1] Case 1: The data include the observable at five sampling locations. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) y″ prediction.

Table 1: [Example 1] The RLE of the AGRF prediction of y, y′, y″ on x ∈ [0, ·].

         Data         RLE of y   RLE of y′   RLE of y″
Case 1   y            0.347      0.960       1.
Case 2   y, y′        0.356      0.412       0.
Case 3   y, y″        0.501      0.387       0.
Case 4   y, y′, y″    0.088      0.126       0.

Figure 2: [Example 1] Case 2: The data include the observable at five sampling locations and its first order derivative at three sampling locations. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) y″ prediction.

Figure 3: [Example 1] Case 3: The data include the observable at five sampling locations and its second order derivative at three sampling locations. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) y″ prediction.

Figure 4: [Example 1] Case 4: The data include the observable at five sampling locations, its first order derivative at three sampling locations, and its second order derivative at three sampling locations. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) y″ prediction.

Example 2

Consider the damped harmonic oscillator:
\[
\begin{cases}
F = -ky - cy' \\
F = my''
\end{cases} \tag{81}
\]
where y is the displacement, y′ is the velocity, y″ is the acceleration, −ky is the restoring force, and −cy′ is the frictional force. This system can be simplified to
\[ y'' + 2\zeta\omega_0 y' + \omega_0^2 y = 0, \tag{82} \]
where ω₀ = √(k/m) is the undamped angular frequency and ζ = c/(2√(mk)) is the damping ratio. When ζ < 1, the solution is
\[ y(t) = A \exp(-\zeta\omega_0 t) \sin\left( \sqrt{1 - \zeta^2}\, \omega_0 t + \varphi \right), \tag{83} \]
where the amplitude A and the phase φ determine the behavior needed to match the initial conditions. Now, consider a specific instance of (83) with A = 1 and φ = 0:
\[ y(t) = \exp(-\zeta\omega_0 t) \sin\left( \sqrt{1 - \zeta^2}\, \omega_0 t \right) \tag{84} \]
on t ∈ [0, ·], with fixed values of ζ and ω₀. The hyperparameters θ = (σ, l) are sought within prescribed bounds.
Case 1: We use the conventional GP regression. The data include the observable and its first order derivative at five sampling locations. The observable data are used to predict the displacement and the first order derivative data are used to predict the velocity, respectively. The results are shown in Figure 5. Apparently, the prediction is not accurate.
Case 2: We use GEK. The data include the observable and its first order derivative at five sampling locations. All these data are used jointly in the same random field to predict the displacement and the velocity at the same time. Figure 6 demonstrates that the predictions of y and y′ are much more accurate than those in Case 1. Yet in the phase diagram plot, there is still a significant discrepancy between the prediction and the exact solution over part of the time interval.
Case 3: We use AGRF. The data include the observable, its first order derivative, and its second order derivative at five sampling locations. All the data are used together in the same random field to predict the displacement and the velocity at the same time. The results in Figure 7 show much better accuracy than the corresponding ones in Figures 5 and 6, as the prediction almost coincides with the exact solution and the confidence intervals are very narrow.

As in Example 1, we can see that the prediction is better when the observable and its derivative are used together (Case 2) than when they are used separately (Case 1). Furthermore, the inclusion of the second order derivative can further enhance the prediction (Case 3). Table 2 provides a quantitative comparison of these three cases.

Table 2: [Example 2] The RLE of the prediction of the displacement and the velocity on x ∈ [0, ·].

         Data                   RLE of y   RLE of y′
Case 1   y, y′ (separately)     1.334      1.
Case 2   y, y′ (together)       0.120      0.
Case 3   y, y′, y″ (together)   0.031      0.

Figure 5: [Example 2] Case 1: Prediction of the displacement and the velocity using conventional GP regression. The data include the observable and its first order derivative at five sampling locations. The observable data are used to predict the displacement and the first order derivative data are used to predict the velocity, respectively. The GP prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) phase-space diagram prediction.

Figure 6: [Example 2] Case 2: Prediction of the displacement and the velocity using GEK. The data include the observable and its first order derivative at five sampling locations. All the data are used jointly in the same random field to predict the displacement and the velocity at the same time. The GEK prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) phase-space diagram prediction.

Figure 7: [Example 2] Case 3: Prediction of the displacement and the velocity using AGRF. The data include the observable, its first order derivative, and its second order derivative at five sampling locations. All the data are used together in the same random field to predict the displacement and the velocity at the same time. The AGRF prediction is the posterior mean and the shaded region is the posterior mean plus/minus a multiple of the posterior standard deviation. Panels: (a) y prediction, (b) y′ prediction, (c) phase-space diagram prediction.

Conclusion

In this work, we provide a comprehensive theoretical foundation for incorporating arbitrary order derivatives in GRF-based methods. We demonstrate that under some conditions, the derivative of each realization of the GRF representing the observable can be considered as a realization of the GRF representing the corresponding derivative of the observable. Combining these GRFs, we propose the augmented Gaussian random field (AGRF), which is a universal framework incorporating the observable and its derivatives. Consequently, the computation of the posterior mean and variance of the observable and its derivatives at new locations can be carried out in a similar way as in the conventional GP regression method. Our numerical results show that, in general, using more information about the system yields better prediction in terms of the accuracy of the posterior mean and the width of the confidence interval, consistent with the literature. The main advantage is that our universal framework provides a natural way to incorporate arbitrary order derivatives and deal with missing data, e.g., when the observation of the observable or its derivative is missing at some sampling locations.

To the best of our knowledge, this is the first systematic work that proves the validity of the intuitive assumption that the derivative of a GRF is still a GRF, which is widely used in probabilistic scientific computing and GRF-based (or GP-based) regression methods. Although we use a one-dimensional system for demonstration, our conclusion can be extended to multi-dimensional systems without difficulty.

Acknowledgements
SZ was supported by the National Science Foundation (NSF) Mathematical Sciences Graduate Internship (MSGI) Program sponsored by the NSF Division of Mathematical Sciences. XY was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (ASCR) as part of Multifaceted Mathematics for Rare, Extreme Events in Complex Energy and Environment Systems (MACSER). GL was supported by the National Science Foundation (DMS-1555072, DMS-1736364, CMMI-1634832 and CMMI-1560834), and Brookhaven National Laboratory Subcontract 382247.
References

[1] Petter Abrahamsen. A review of Gaussian random fields and correlation functions.
[2] Oksana A. Chkrebtii, David A. Campbell, Ben Calderhead, and Mark A. Girolami. Bayesian solution uncertainty quantification for differential equations. Bayesian Analysis, 11(4):1239–1267.
[3] H.-S. Chung and J. Alonso. Using gradients to construct cokriging approximation models for high-dimensional design optimization problems. American Institute of Aeronautics and Astronautics.
[4] Jon Cockayne, Chris J. Oates, T. J. Sullivan, and Mark Girolami. Bayesian probabilistic numerical methods. SIAM Review, 61(3):756–789.
[5] Harald Cramér and M. R. Leadbetter. Stationary and Related Stochastic Processes: Sample Function Properties and Their Applications. Dover Books on Mathematics. Dover Publications, Dover edition.
[6] Sébastien Da Veiga and Amandine Marrel. Gaussian process modeling with inequality constraints. Annales de la faculté des sciences de Toulouse Mathématiques, 21(3):529–555.
[7] Yixiang Deng, Guang Lin, and Xiu Yang. Multifidelity data fusion via gradient-enhanced Gaussian process regression. arXiv:2008.01066 [cs, stat].
[8] R. M. Dudley. Real Analysis and Probability. Number 74 in Cambridge Studies in Advanced Mathematics. Cambridge University Press.
[9] Alexander I. J. Forrester, András Sóbester, and Andy J. Keane. Multi-fidelity optimization via surrogate modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 463(2088):3251–3269.
[10] Philipp Hennig, Michael A. Osborne, and Mark Girolami. Probabilistic numerics and uncertainty in computations. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 471(2179):20150142.
[11] M. Kennedy. Predicting the output from a complex computer code when fast approximations are available. Biometrika, 87(1):1–13.
[12] P. K. Kitanidis. Introduction to Geostatistics: Applications to Hydrogeology. Cambridge University Press.
[13] Leonid Koralov and Yakov G. Sinai. Theory of Probability and Random Processes. Universitext. Springer Berlin Heidelberg.
[14] Luc Laurent, Rodolphe Le Riche, Bruno Soulier, and Pierre-Alain Boucard. An overview of gradient-enhanced metamodels with applications. Archives of Computational Methods in Engineering, 26(1):61–106.
[15] Loic Le Gratiet and Josselin Garnier. Recursive co-Kriging model for design of computer experiments with multiple levels of fidelity. International Journal for Uncertainty Quantification, 4(5):365–386.
[16] L. Lin and D. B. Dunson. Bayesian monotone regression using Gaussian process projection. Biometrika, 101(2):303–317.
[17] Andrés F. López-Lopera, François Bachoc, Nicolas Durrande, and Olivier Roustant. Finite-dimensional Gaussian approximation with linear inequality constraints. SIAM/ASA Journal on Uncertainty Quantification, 6(3):1224–1255.
[18] Max D. Morris, Toby J. Mitchell, and Donald Ylvisaker. Bayesian design and analysis of computer experiments: Use of derivatives in surface prediction. Technometrics, 35(3):243–255.
[19] Houman Owhadi. Bayesian numerical homogenization. Multiscale Modeling & Simulation, 13(3):812–828.
[20] Houman Owhadi. Multigrid with rough coefficients and multiresolution operator decomposition from hierarchical information games. SIAM Review, 59(1):99–149.
[21] Houman Owhadi and Clint Scovel. Operator-Adapted Wavelets, Fast Solvers, and Numerical Homogenization: From a Game Theoretic Approach to Numerical Approximation and Algorithm Design. Cambridge University Press, 1st edition.
[22] Andrew Pensoneault, Xiu Yang, and Xueyu Zhu. Nonnegativity-enforced Gaussian process regression. arXiv:2004.04632 [cs, stat].
[23] P. Perdikaris, M. Raissi, A. Damianou, N. D. Lawrence, and G. E. Karniadakis. Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2198):20160751.
[24] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Machine learning of linear differential equations using Gaussian processes. Journal of Computational Physics, 348:683–693.
[25] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Numerical Gaussian processes for time-dependent and nonlinear partial differential equations. SIAM Journal on Scientific Computing, 40(1):A172–A198.
[26] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press.
[27] Jerome Sacks, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. Design and analysis of computer experiments. Statistical Science, 4(4):409–423.
[28] Michael Schober, David Duvenaud, and Philipp Hennig. Probabilistic ODE solvers with Runge-Kutta means. arXiv:1406.2582 [cs, math, stat].
[29] E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, and Carl E. Rasmussen. Derivative observations in Gaussian process models of dynamic systems. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 1057–1064. MIT Press.
[30] Selvakumar Ulaganathan, Ivo Couckuyt, Francesco Ferranti, Eric Laermans, and Tom Dhaene. Performance study of multi-fidelity gradient enhanced kriging. Structural and Multidisciplinary Optimization, 51(5):1017–1033.
[31] Xiu Yang, David Barajas-Solano, Guzel Tartakovsky, and Alexandre M. Tartakovsky. Physics-informed CoKriging: A Gaussian-process-regression-based multifidelity method for data-model convergence. Journal of Computational Physics, 395:410–431.
[32] Xiu Yang, Guzel Tartakovsky, and Alexandre Tartakovsky. Physics-informed Kriging: A physics-informed Gaussian process regression method for data-model convergence. arXiv:1809.03461 [cs, stat].
[33] Xiu Yang, Xueyu Zhu, and Jing Li. When bifidelity meets CoKriging: An efficient physics-informed multifidelity method.