Multifidelity Data Fusion via Gradient-Enhanced Gaussian Process Regression
Yixiang Deng (School of Engineering and Division of Applied Mathematics, Brown University, USA), Guang Lin (Department of Mathematics and School of Engineering, Purdue University, USA), and Xiu Yang* (Department of Industrial and Systems Engineering, Lehigh University, USA)

*Corresponding author: [email protected]

August 4, 2020
Abstract
We propose a data fusion method based on the multi-fidelity Gaussian process regression (GPR) framework. This method combines available data on the quantity of interest (QoI) and its gradients at different fidelity levels; that is, it is a gradient-enhanced Cokriging (GE-Cokriging) method. It approximates both the QoI and its gradients simultaneously, with uncertainty estimates. We compare this method with the conventional multi-fidelity Cokriging method, which does not use gradient information, and the results suggest that GE-Cokriging predicts both the QoI and its gradients more accurately. Moreover, GE-Cokriging generalizes better in some cases where Cokriging performs poorly because of a singular (or near-singular) covariance matrix. We demonstrate the application of GE-Cokriging in several practical cases, including reconstructing the trajectory and velocity of an underdamped oscillator with respect to time simultaneously, and investigating the sensitivity of the power factor of a load bus with respect to varying power inputs of a generator bus in a large-scale power system. Although GE-Cokriging incurs a slightly higher computational cost than Cokriging, the accuracy comparison shows that this cost is usually worthwhile.
1 Introduction

The Gaussian process (GP) is one of the most well-studied stochastic processes in probability and statistics. Given its flexible form of data representation, the GP is a powerful tool for classification and regression, and it is widely used in probabilistic scientific computing, engineering design, geostatistics, data assimilation, machine learning, etc. In particular, given a data set comprising input/output pairs of locations and a quantity of interest (QoI), GP regression (GPR), also known as Kriging, provides a prediction along with a mean squared error (MSE) estimate of the QoI at any location. Alternatively, from the Bayesian perspective, GPR identifies a Gaussian random variable at any location with a posterior mean (corresponding to the prediction) and variance (corresponding to the MSE). Generally speaking, the larger the given data set is, the closer the GPR posterior mean is to the ground truth and the smaller the posterior variance is.

In many practical problems, obtaining a large amount of data can be difficult because of limited resources. There are several approaches to augment the data set in different manners. For example, the original Cokriging method exploits the correlation between multiple QoIs in geostatistical studies. The gradient-enhanced Kriging (GE-Kriging) method, also referred to as gradient-based Kriging in some literature, has been widely investigated in areas such as computational fluid dynamics, especially in aerodynamic optimization problems [4, 32, 15, 3]. Incorporating gradient information in different ways, this method consists of direct and indirect approaches: the former uses the gradient information through an augmented covariance matrix [12], while the latter approximates the gradient via finite differences [3, 37]. The gradient-enhanced Cokriging (GE-Cokriging) method in [16] refers to a GE-Kriging method that uses a covariance function between the QoI and its gradients different from that in conventional GE-Kriging. The GE-Cokriging method in [30] combines multi-fidelity information on the QoI and its gradients to predict the QoI only.

Most of the aforementioned works focus on enhancing the accuracy of predicting the QoI; hence, when gradient information is used, the method is a "gradient-enhanced" approach. However, in many applications, both the QoI and its gradient are important. For example, when studying the phase diagram of a dynamical system, one needs accurate predictions of both location and velocity. Another example is the sensitivity analysis of a system, where the gradient information is critical. Therefore, in this work, we propose a comprehensive multifidelity gradient-enhanced Cokriging method that predicts both the QoI and its gradients simultaneously, based on the GE-Cokriging of [30]. This method exploits the QoI and its gradient from models of different fidelities by combining GE-Kriging and Cokriging to improve the prediction accuracy. In terms of predicting the QoI, this method can be considered "gradient-enhanced", while from the perspective of estimating gradients, it can be considered "integral-enhanced". In this work, GE-Cokriging refers to our proposed multi-fidelity method, not to the GE-Cokriging of [16].

In this paper, we first review GPR (Kriging) and its extension to the multi-fidelity setting (Cokriging). Then we describe GE-Kriging and GE-Cokriging, as well as an equivalent integral-enhanced perspective. Finally, we use four examples to demonstrate the efficacy of our approach.

2 Methodology

2.1 GPR (Kriging)

We present a brief review of the GPR method, adopted from [1, 6, 34]. We denote the observation locations as $X = \{x^{(i)}\}_{i=1}^N$ ($x^{(i)} \in D$, $D \subseteq \mathbb{R}^d$) and the observed values of the QoI at these locations as $y = (y^{(1)}, y^{(2)}, \ldots, y^{(N)})^\top$ ($y^{(i)} \in \mathbb{R}$). For simplicity, we assume that the $y^{(i)}$ are scalars.
The GPR method aims to identify a GP $Y(x, \omega): D \times \Omega \to \mathbb{R}$ based on the input/output data set $\{(x^{(i)}, y^{(i)})\}_{i=1}^N$, where $\Omega$ is the sample space of a probability triple. Here, $x$ can be considered as the parameter of this GP, such that $Y(x, \cdot): \Omega \to \mathbb{R}$ is a Gaussian random variable for any $x$ in $D$. A GP $Y(x, \omega)$ is usually denoted as

  $Y(x) \sim \mathcal{GP}(\mu(x), k(x, x')),$  (2.1)

where $\omega$ is not explicitly listed for brevity, and $\mu(\cdot): D \to \mathbb{R}$ and $k(\cdot, \cdot): D \times D \to \mathbb{R}$ are the mean and covariance functions (the latter also called the kernel function), respectively:

  $\mu(x) = \mathbb{E}\{Y(x)\},$  (2.2)
  $k(x, x') = \mathrm{Cov}\{Y(x), Y(x')\} = \mathbb{E}\{(Y(x) - \mu(x))(Y(x') - \mu(x'))\}.$  (2.3)

The variance of $Y(x)$ is $k(x, x)$, and its standard deviation is $\sigma(x) = \sqrt{k(x, x)}$. The covariance matrix, denoted as $C$, is defined as $C_{ij} = k(x^{(i)}, x^{(j)})$. The functions $\mu(x)$ and $k(x, x')$ are obtained by identifying their hyperparameters via maximizing the log marginal likelihood [31]:

  $\ln L = -\frac{1}{2}(y - \mu)^\top C^{-1}(y - \mu) - \frac{1}{2}\ln|C| - \frac{N}{2}\ln 2\pi,$  (2.4)

where $\mu = (\mu(x^{(1)}), \ldots, \mu(x^{(N)}))^\top$ and $|C|$ is the determinant of $C$. For any $x^* \in D$, the GPR posterior mean and variance are

  $\hat{y}(x^*) = \mu(x^*) + c(x^*)^\top C^{-1}(y - \mu),$  (2.5)
  $\hat{s}^2(x^*) = \sigma^2(x^*) - c(x^*)^\top C^{-1} c(x^*),$  (2.6)

where $c(x^*)$ is a vector of covariances: $(c(x^*))_i = k(x^{(i)}, x^*)$. In practice, it is common to use $\hat{y}(x^*)$ as the prediction, and $\hat{s}^2(x^*)$ is also called the mean squared error (MSE) of the prediction because $\hat{s}^2(x^*) = \mathbb{E}\{(\hat{y}(x^*) - Y(x^*))^2\}$ [6]. Consequently, $\hat{s}(x^*)$, the posterior standard deviation, is called the root mean squared error (RMSE). Moreover, to account for observation noise, one can assume that the noise consists of independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and variance $\delta^2$, and replace $C$ with $C + \delta^2 I$. In this study, we assume that the observations $y$ are noiseless. If $C$ is not invertible or its condition number is very large, one can add a small regularization term $\alpha I$ ($\alpha$ is a small positive real number) to $C$, which is equivalent to assuming observation noise. In addition, $\hat{s}^2$ can be used in global optimization, or in greedy algorithms to identify locations for additional observations.

In the widely used ordinary Kriging method, a stationary GP is assumed [14]. Specifically, $\mu$ is set as a constant, $\mu(x) \equiv \mu$, and $k(x, x') = k(\tau)$, where $\tau = x - x'$. Consequently, $\sigma^2(x) = k(x, x) = k(0) = \sigma^2$ is a constant. The kernels most widely used in scientific computing are the Matérn functions, especially their two special cases, the exponential and squared-exponential (Gaussian) kernels. For example, the Gaussian kernel can be written as $k(\tau) = \sigma^2 \exp(-\|x - x'\|_w^2)$, where the weighted norm is defined as $\|x - x'\|_w^2 = \sum_{i=1}^d \left((x_i - x'_i)/l_i\right)^2$. Here the $l_i$ ($i = 1, \ldots, d$), the correlation lengths in each direction, are constants. Given a stationary covariance function, the covariance matrix $C$ can be written as $C = \sigma^2 \Psi$, where $\Psi_{ij} = \exp(-\|x^{(i)} - x^{(j)}\|_w^2)$. The estimators of $\mu$ and $\sigma^2$, denoted as $\hat{\mu}$ and $\hat{\sigma}^2$, are

  $\hat{\mu} = \frac{\mathbf{1}^\top \Psi^{-1} y}{\mathbf{1}^\top \Psi^{-1} \mathbf{1}}, \qquad \hat{\sigma}^2 = \frac{(y - \mathbf{1}\hat{\mu})^\top \Psi^{-1}(y - \mathbf{1}\hat{\mu})}{N},$  (2.7)

where $\mathbf{1}$ is a constant vector consisting of 1s [6]. It is also common to set $\mu = 0$ [31]. The hyperparameters $\sigma^2$ and $l_i$ are identified by maximizing the log marginal likelihood in Eq. (2.4). The terms $\hat{y}(x^*)$ and $\hat{s}^2(x^*)$ in Eqs. (2.5) and (2.6) take the following form:

  $\hat{y}(x^*) = \hat{\mu} + \psi^\top \Psi^{-1}(y - \mathbf{1}\hat{\mu}),$  (2.8)
  $\hat{s}^2(x^*) = \hat{\sigma}^2\left(1 - \psi^\top \Psi^{-1} \psi\right),$  (2.9)

where $\psi = \psi(x^*)$ is a (column) vector consisting of correlations between the observed data and the prediction, i.e., $\psi_i = k(x^{(i)}, x^*)/\sigma^2$.
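To make the ordinary Kriging formulas concrete, the following minimal sketch implements Eqs. (2.7)-(2.9) with a Gaussian kernel. It is an illustration only: the correlation lengths l are fixed by hand instead of being identified by maximizing Eq. (2.4), and the function names are ours, not the paper's.

```python
import numpy as np

def gauss_corr(X1, X2, l):
    """Gaussian correlation exp(-||x - x'||_w^2) with correlation lengths l."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) / l) ** 2
    return np.exp(-d2.sum(axis=-1))

def kriging_predict(X, y, Xs, l, alpha=1e-10):
    """Ordinary Kriging posterior mean and variance, Eqs. (2.7)-(2.9)."""
    N = len(y)
    Psi = gauss_corr(X, X, l) + alpha * np.eye(N)   # regularized correlation matrix
    ones = np.ones(N)
    mu = (ones @ np.linalg.solve(Psi, y)) / (ones @ np.linalg.solve(Psi, ones))  # Eq. (2.7)
    r = y - mu
    sigma2 = (r @ np.linalg.solve(Psi, r)) / N      # Eq. (2.7)
    psi = gauss_corr(Xs, X, l)                      # correlations between x* and the data
    y_hat = mu + psi @ np.linalg.solve(Psi, r)      # Eq. (2.8)
    s2 = sigma2 * (1.0 - np.sum(psi * np.linalg.solve(Psi, psi.T).T, axis=1))  # Eq. (2.9)
    return y_hat, s2

# Usage: X and Xs are arrays of shape (N, d) and (M, d); y has shape (N,).
```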
2.2 Cokriging

Next, we briefly review the formulation of multifidelity Cokriging, using the two-fidelity model for demonstration. Suppose that we have high-fidelity data (e.g., accurate measurements of the QoI) $y_H = (y_H^{(1)}, \ldots, y_H^{(N_H)})^\top$ at locations $X_H = \{x_H^{(i)}\}_{i=1}^{N_H}$, and low-fidelity data (e.g., measurements with lower accuracy or numerical approximations of the QoI) $y_L = (y_L^{(1)}, \ldots, y_L^{(N_L)})^\top$ at locations $X_L = \{x_L^{(i)}\}_{i=1}^{N_L}$, where $y_H^{(i)}, y_L^{(i)} \in \mathbb{R}$ and $x_H^{(i)}, x_L^{(i)} \in D \subseteq \mathbb{R}^d$. We denote $X = \{X_L, X_H\}$ and $\tilde{y} = (y_L^\top, y_H^\top)^\top$. Kennedy and O'Hagan [13] proposed a multifidelity formulation based on the auto-regressive model for the GP $Y_H(\cdot) \sim \mathcal{GP}(\mu_H(\cdot), k_H(\cdot, \cdot))$:

  $Y_H(x) = \rho Y_L(x) + Y_d(x),$  (2.10)

where $Y_L(\cdot) \sim \mathcal{GP}(\mu_L(\cdot), k_L(\cdot, \cdot))$ regresses the low-fidelity data, $\rho \in \mathbb{R}$ is a regression parameter, and $Y_d(\cdot) \sim \mathcal{GP}(\mu_d(\cdot), k_d(\cdot, \cdot))$ models the discrepancy between $Y_H$ and $\rho Y_L$. This model assumes that

  $\mathrm{Cov}\{Y_H(x), Y_L(x') \mid Y_L(x)\} = 0, \quad \text{for all } x' \neq x,\ x, x' \in D.$  (2.11)

The covariance matrix of the observations, $\tilde{C}$, is then given by

  $\tilde{C} = \begin{pmatrix} C_L(X_L, X_L) & \rho\, C_L(X_L, X_H) \\ \rho\, C_L(X_H, X_L) & \rho^2 C_L(X_H, X_H) + C_d(X_H, X_H) \end{pmatrix},$  (2.12)

where $C_L$ and $C_d$ are the covariance matrices computed from $k_L(\cdot, \cdot)$ and $k_d(\cdot, \cdot)$, respectively, i.e.,

  $[C_L(X_L, X_L)]_{ij} = k_L(x_L^{(i)}, x_L^{(j)}), \quad [C_L(X_L, X_H)]_{ij} = k_L(x_L^{(i)}, x_H^{(j)}), \quad [C_L(X_H, X_L)]_{ij} = k_L(x_H^{(i)}, x_L^{(j)}),$
  $[C_L(X_H, X_H)]_{ij} = k_L(x_H^{(i)}, x_H^{(j)}), \quad [C_d(X_H, X_H)]_{ij} = k_d(x_H^{(i)}, x_H^{(j)}).$  (2.13)

One can assume parameterized forms for these kernels (e.g., the Gaussian kernel) and employ the following two-step approach [7, 6] to identify the hyperparameters:

1. Use Kriging to construct $Y_L$ based on $\{X_L, y_L\}$.
2. Denote $y_d = y_H - \rho\, y_L(X_H)$, where $y_L(X_H)$ are the values of $y_L$ at the locations common to $X_H$; then construct $Y_d$ using $\{X_H, y_d\}$ via Kriging.

The posterior mean and variance of $Y_H$ at $x^* \in D$ are given by

  $\hat{y}(x^*) = \mu_H(x^*) + \tilde{c}(x^*)^\top \tilde{C}^{-1}(\tilde{y} - \tilde{\mu}),$  (2.14)
  $\hat{s}^2(x^*) = \rho^2 \sigma_L^2(x^*) + \sigma_d^2(x^*) - \tilde{c}(x^*)^\top \tilde{C}^{-1} \tilde{c}(x^*),$  (2.15)

where $\mu_H(x^*) = \rho\mu_L(x^*) + \mu_d(x^*)$, $\sigma_L^2(x^*) = k_L(x^*, x^*)$, $\sigma_d^2(x^*) = k_d(x^*, x^*)$, and

  $\tilde{\mu} = \begin{pmatrix} \mu_L \\ \mu_H \end{pmatrix} = \begin{pmatrix} (\mu_L(x_L^{(1)}), \ldots, \mu_L(x_L^{(N_L)}))^\top \\ (\mu_H(x_H^{(1)}), \ldots, \mu_H(x_H^{(N_H)}))^\top \end{pmatrix},$  (2.16)

  $\tilde{c}(x^*) = \begin{pmatrix} \rho\, c_L(x^*) \\ c_H(x^*) \end{pmatrix} = \begin{pmatrix} (\rho k_L(x_L^{(1)}, x^*), \ldots, \rho k_L(x_L^{(N_L)}, x^*))^\top \\ (k_H(x_H^{(1)}, x^*), \ldots, k_H(x_H^{(N_H)}, x^*))^\top \end{pmatrix},$  (2.17)

where $k_H(x, x') = \rho^2 k_L(x, x') + k_d(x, x')$. Alternatively, one can simultaneously identify the hyperparameters in $k_L(\cdot, \cdot)$ and $k_d(\cdot, \cdot)$ along with $\rho$ by maximizing the following log marginal likelihood:

  $\ln \tilde{L} = -\frac{1}{2}(\tilde{y} - \tilde{\mu})^\top \tilde{C}^{-1}(\tilde{y} - \tilde{\mu}) - \frac{1}{2}\ln|\tilde{C}| - \frac{N_H + N_L}{2}\ln 2\pi.$  (2.18)
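A minimal sketch of the two-step construction above, reusing gauss_corr and kriging_predict from the previous sketch. For simplicity, the regression parameter ρ is taken as given, although in practice it is estimated together with the kernel hyperparameters; the variance returned here only combines the two Kriging variances, as in the leading terms of Eq. (2.15), and neglects the cross-covariance blocks of the full C̃, so it is an approximation rather than the exact formula.

```python
import numpy as np

def cokriging_two_step(XL, yL, XH, yH, lL, ld, rho):
    """Two-step Cokriging (Sec. 2.2): Kriging on the low-fidelity data,
    then Kriging on the discrepancy y_d = y_H - rho * y_L(X_H)."""
    # Step 1: low-fidelity surrogate evaluated at the high-fidelity locations.
    yL_at_XH, _ = kriging_predict(XL, yL, XH, lL)
    # Step 2: discrepancy model on the high-fidelity locations.
    yd = yH - rho * yL_at_XH
    def predict(Xs):
        mL, s2L = kriging_predict(XL, yL, Xs, lL)
        md, s2d = kriging_predict(XH, yd, Xs, ld)
        return rho * mL + md, rho ** 2 * s2L + s2d
    return predict

# predict = cokriging_two_step(XL, yL, XH, yH, lL=0.2, ld=0.2, rho=2.0)
```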
2.3 GE-Kriging/Cokriging

GE-Kriging uses the fact that, under certain conditions, differentiation in physical space and integration in probability space are interchangeable:

  $\frac{\partial}{\partial x_i}\mu(x) = \frac{\partial}{\partial x_i}\mathbb{E}\{Y(x)\} = \mathbb{E}\left\{\frac{\partial}{\partial x_i}Y(x)\right\},$
  $\frac{\partial}{\partial x_i}k(x, x') = \frac{\partial}{\partial x_i}\mathrm{Cov}\{Y(x), Y(x')\} = \mathrm{Cov}\left\{\frac{\partial}{\partial x_i}Y(x), Y(x')\right\},$
  $\frac{\partial^2}{\partial x_i \partial x'_j}k(x, x') = \mathrm{Cov}\left\{\frac{\partial}{\partial x_i}Y(x), \frac{\partial}{\partial x'_j}Y(x')\right\}.$  (2.19)

These formulas specify the covariance between the QoI and its gradient, as well as the covariance between different components of the gradient. To simplify notation, we use $\partial_i$ and $\partial_{i'}$ to denote $\partial/\partial x_i$ and $\partial/\partial x'_i$, respectively, and $\nabla = (\partial_1, \partial_2, \ldots, \partial_d)^\top$, $\nabla' = (\partial_{1'}, \partial_{2'}, \ldots, \partial_{d'})$. Of note, for a scalar function $z$, $\nabla z$ is a column vector and $\nabla' z$ is a row vector. Since we use a stationary kernel in this work, i.e., $k(x, x') = k(x - x')$, we have

  $\partial_i k(x, x') = -\partial_{i'} k(x, x').$  (2.20)

The analytical forms of $\partial_i k(x, x')$ and $\partial_i \partial_{j'} k(x, x')$ can be found in the appendix of [30] for widely used kernel functions, e.g., Matérn kernels with several specific choices of $\nu$. Subsequently, GE-Kriging follows almost the same procedure as Kriging, with the following modifications [16]:

1. The observation vector is augmented to include gradient data, i.e., $y = (y^{(1)}, y^{(2)}, \ldots, y^{(N)}, (\nabla y^{(1)})^\top, (\nabla y^{(2)})^\top, \ldots, (\nabla y^{(N)})^\top)^\top$.
2. Given a constant posterior mean of the QoI, the posterior mean of the gradient is zero; hence $\mathbf{1}$ is replaced by $(\underbrace{1, \ldots, 1}_{N}, \underbrace{0, \ldots, 0}_{N \times d})^\top$.
3. The covariance matrix is $C = \sigma^2 \Psi$, where the correlation matrix $\Psi$ is expanded to include correlations between the QoI and its gradient as well as correlations between components of the gradient, i.e.,

  $\Psi = \begin{bmatrix} \Psi_{00} & \Psi_{01} \\ \Psi_{10} & \Psi_{11} \end{bmatrix},$  (2.21)

where $[\Psi_{00}]_{lm} = k(x^{(l)}, x^{(m)})/\sigma^2$ is the correlation matrix of the QoI observations, $\Psi_{01} = \nabla' \Psi_{00}$ consists of blocks of first-order derivatives $\partial_{j'} k(x^{(l)}, x^{(m)})/\sigma^2$, $\Psi_{10} = \Psi_{01}^\top$, and $\Psi_{11} = \nabla \nabla' \Psi_{00}$ consists of $d \times d$ blocks $\psi_{lm}$ with entries $[\psi_{lm}]_{ij} = \partial_i \partial_{j'} k(x^{(l)}, x^{(m)})/\sigma^2$.

The posterior mean and variance of the QoI at a new location $x^*$, denoted by $\hat{y}(x^*)$ and $\hat{s}^2(x^*)$, have the same form as in Kriging, i.e., Eqs. (2.8) and (2.9), except that $\psi = \begin{pmatrix} \psi(x^*) \\ \nabla \psi(x^*) \end{pmatrix}$, where $\nabla\psi(x^*)$ stacks the vectors $\nabla k(x^{(l)}, x^*)/\sigma^2$ for $l = 1, \ldots, N$. Furthermore, the posterior mean and variance of the QoI's gradient at $x^*$ are computed as

  $\widehat{\partial_i y}(x^*) = (\partial_{i'} \psi)^\top \Psi^{-1}(y - \mathbf{1}\hat{\mu}),$  (2.22)
  $\hat{s}_i^2(x^*) = \hat{\sigma}^2\left[\partial_i \partial_{i'} k(x^*, x^*)/\hat{\sigma}^2 - (\partial_{i'} \psi)^\top \Psi^{-1} \partial_{i'} \psi\right],$  (2.23)

where $\partial_{i'}\psi = \begin{pmatrix} \partial_{i'}\psi(x^*) \\ \partial_{i'}(\nabla\psi(x^*)) \end{pmatrix}$ and $i = 1, 2, \ldots, d$.
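For a 1D Gaussian kernel, the augmented correlation matrix in Eq. (2.21) can be assembled in closed form; a sketch (in our notation) is below. The derivative formulas follow from $k(\tau) = \sigma^2\exp(-\tau^2/l^2)$ with $\tau = x - x'$, and Eq. (2.20) fixes the sign of the off-diagonal blocks.

```python
import numpy as np

def ge_kriging_corr(x, l):
    """Augmented GE-Kriging correlation matrix, Eq. (2.21), for a 1D Gaussian kernel.
    Observations are stacked as (y(1), ..., y(N), dy(1)/dx, ..., dy(N)/dx)."""
    tau = x[:, None] - x[None, :]                     # pairwise differences x_l - x_m
    K   = np.exp(-(tau / l) ** 2)                     # k(x, x') / sigma^2
    Kd  = (2.0 * tau / l ** 2) * K                    # (d k / d x') / sigma^2
    Kdd = (2.0 / l ** 2 - 4.0 * tau ** 2 / l ** 4) * K  # (d^2 k / dx dx') / sigma^2
    # By Eq. (2.20), d k / d x = -d k / d x', hence the lower-left block is -Kd.
    return np.block([[K, Kd], [-Kd, Kdd]])

# Psi = ge_kriging_corr(np.linspace(0.0, 1.0, 5), l=0.3)  # shape (10, 10), symmetric
```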
Next, we introduce the details of the GE-Cokriging method, which shares a similar construction procedure with Cokriging except for some modifications to incorporate gradient information. The modifications are as follows:

1. The observation vector is augmented to $\tilde{y} = (y_L^\top, y_H^\top, (\nabla y_L)^\top, (\nabla y_H)^\top)^\top$, of length $N_L + N_H + (N_L + N_H)d$.
2. The covariance matrix of the observation data, $\tilde{C}$ in Eq. (2.12), is augmented to include the gradient information as well, i.e.,

  $\tilde{C} = \begin{pmatrix} \tilde{C}_{00} & \tilde{C}_{01} \\ \tilde{C}_{10} & \tilde{C}_{11} \end{pmatrix},$  (2.24)

where $\tilde{C}_{00}$ takes the form of the covariance matrix in Cokriging, see Eq. (2.12), and

  $\tilde{C}_{01} = \begin{bmatrix} \nabla' C_L(X_L, X_L) & \rho\, \nabla' C_L(X_L, X_H) \\ \rho\, \nabla' C_L(X_H, X_L) & \rho^2 \nabla' C_L(X_H, X_H) + \nabla' C_d(X_H, X_H) \end{bmatrix}, \quad \tilde{C}_{10} = \tilde{C}_{01}^\top,$
  $\tilde{C}_{11} = \begin{bmatrix} \nabla \nabla' C_L(X_L, X_L) & \rho\, \nabla \nabla' C_L(X_L, X_H) \\ \rho\, \nabla \nabla' C_L(X_H, X_L) & \rho^2 \nabla \nabla' C_L(X_H, X_H) + \nabla \nabla' C_d(X_H, X_H) \end{bmatrix}.$

Here $\nabla C_L(X_L, X_L)$ is constructed by replacing each element $[C_L(X_L, X_L)]_{ij}$ with its gradient $(\partial_1 [C_L(X_L, X_L)]_{ij}, \ldots, \partial_d [C_L(X_L, X_L)]_{ij})^\top$; $\nabla C_L(X_L, X_H)$, $\nabla C_L(X_H, X_L)$, $\nabla C_L(X_H, X_H)$, and $\nabla C_d(X_H, X_H)$ are constructed analogously from the corresponding matrices in Eq. (2.13). The matrix $\nabla \nabla' C_L(X_L, X_L)$ is constructed by replacing each element $[C_L(X_L, X_L)]_{ij}$ with the $d \times d$ matrix whose $(k, k')$ entry is $\partial_k \partial_{k'} [C_L(X_L, X_L)]_{ij}$; the other submatrices of $\tilde{C}_{11}$ are constructed in the same manner.

3. The posterior mean vector becomes

  $\tilde{\mu} = \left(\mu_L^\top,\ \mu_H^\top,\ \underbrace{0, \ldots, 0}_{N_L \cdot d},\ \underbrace{0, \ldots, 0}_{N_H \cdot d}\right)^\top,$  (2.25)

with $\mu_L$ and $\mu_H$ as in Eq. (2.16).

4. The covariance vector between the new observation location $x^*$ and the existing observation data $\{X_L, X_H\}$, denoted by $\tilde{c}(x^*)$, is given by

  $\tilde{c}(x^*) = \begin{pmatrix} \rho\, c_L(x^*) \\ c_H(x^*) \\ \rho\, \nabla c_L(x^*) \\ \nabla c_H(x^*) \end{pmatrix},$  (2.26)

where $c_L(x^*) = (k_L(x_L^{(1)}, x^*), \ldots, k_L(x_L^{(N_L)}, x^*))^\top$ and $c_H(x^*) = (k_H(x_H^{(1)}, x^*), \ldots, k_H(x_H^{(N_H)}, x^*))^\top$.

The estimators for the posterior mean and variance of the QoI at the new location $x^*$ in GE-Cokriging follow Eqs. (2.14) and (2.15) of the Cokriging method, with the corresponding components updated as above. The posterior mean and variance of the QoI's gradient at $x^*$ are

  $\widehat{\partial_i y}(x^*) = (\partial_{i'} \tilde{c}(x^*))^\top \tilde{C}^{-1}(\tilde{y} - \tilde{\mu}),$  (2.27)
  $\hat{s}_i^2(x^*) = \rho^2 \partial_i \partial_{i'} k_L(x^*, x^*) + \partial_i \partial_{i'} k_d(x^*, x^*) - (\partial_{i'} \tilde{c}(x^*))^\top \tilde{C}^{-1} \partial_{i'} \tilde{c}(x^*),$  (2.28)

where $i = 1, 2, \ldots, d$. The derivations of Eqs. (2.27) and (2.28) follow the same procedure as those of Eqs. (2.14) and (2.15) shown in [13, 6]. In other words, Eqs. (2.27) and (2.28) can be obtained by replacing $Y(x)$ in Eqs. (2.14) and (2.15) with $\partial_i Y(x)$: $\mu_H(x^*)$ is replaced with the mean of $\partial_i Y(x)$ (which is zero), $\tilde{c}$ is replaced with $\partial_{i'}\tilde{c}$, and $\rho^2 \sigma_L^2(x^*) + \sigma_d^2(x^*)$ (i.e., $\rho^2\,\mathrm{Var}\{Y_L(x^*)\} + \mathrm{Var}\{Y_d(x^*)\}$) is replaced with $\rho^2\,\mathrm{Var}\{\partial_i Y_L(x^*)\} + \mathrm{Var}\{\partial_i Y_d(x^*)\} = \rho^2 \partial_i \partial_{i'} k_L(x^*, x^*) + \partial_i \partial_{i'} k_d(x^*, x^*)$.

We note that GE-Cokriging exploits the relation between the QoI and its gradients, and once the hyperparameters of the model are identified, we can compute the posterior mean and variance of the QoI and its gradients simultaneously. It has the potential to improve the prediction accuracy for both the QoI and its gradients compared with predicting them separately. Also, in some cases, this approach can reduce the computational cost compared to, for example, constructing separate Cokriging models for the QoI and its gradients (see Section 3.5).
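Once the augmented vectors and covariance matrix of Eqs. (2.24)-(2.26) are assembled, the joint prediction of the QoI and its gradient reduces to a few linear solves. The sketch below (a hypothetical helper with our argument names) expresses Eqs. (2.14)-(2.15) and (2.27)-(2.28) in that generic form.

```python
import numpy as np

def ge_cokriging_predict(C, y_aug, mu_aug, mu_H_star, c_star, dc_star, var_y, var_dy):
    """Posterior mean/variance of the QoI and one gradient component at x*.
    C: augmented covariance matrix, Eq. (2.24); y_aug, mu_aug: augmented data/mean;
    c_star, dc_star: the covariance vector c~(x*) of Eq. (2.26) and its derivative;
    var_y  = rho^2 k_L(x*, x*) + k_d(x*, x*);
    var_dy = rho^2 d_i d_i' k_L(x*, x*) + d_i d_i' k_d(x*, x*)."""
    w = np.linalg.solve(C, y_aug - mu_aug)
    y_hat  = mu_H_star + c_star @ w                          # Eq. (2.14)
    dy_hat = dc_star @ w                                     # Eq. (2.27); gradient prior mean is 0
    s2_y   = var_y  - c_star  @ np.linalg.solve(C, c_star)   # Eq. (2.15)
    s2_dy  = var_dy - dc_star @ np.linalg.solve(C, dc_star)  # Eq. (2.28)
    return y_hat, dy_hat, s2_y, s2_dy
```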
2.4 An integral-enhanced perspective

In this section, we provide another perspective on using the QoI $f$ and its gradients $\nabla f$ in GPR simultaneously. The aforementioned gradient-enhanced methods first assume a GP model $Y(x)$ for $f$; the GP model for $\nabla f$ can then be constructed by taking (partial) derivatives of the mean and covariance functions of $Y(x)$. Alternatively, one can assume a GP model for $\nabla f$ first, e.g., $\partial_i f$ is modeled by $Y(x)$, and the QoI $f$ is then modeled by $\int Y(x)\,\mathrm{d}x_i$, which is a GP because integration is a linear operator. Here we use a univariate function to illustrate the concept. We model $f'$ with a GP $Y_{f'}(x) \sim \mathcal{GP}(\mu_{f'}(x), k_{f'}(x, x'))$; then, similar to Eqs. (2.19), the integrals in physical space and in probability space are interchangeable:

  $\int \mu_{f'}(x)\,\mathrm{d}x = \int \mathbb{E}\{Y_{f'}(x)\}\,\mathrm{d}x = \mathbb{E}\left\{\int Y_{f'}(x)\,\mathrm{d}x\right\},$
  $\int k_{f'}(x, x')\,\mathrm{d}x = \int \mathrm{Cov}\{Y_{f'}(x), Y_{f'}(x')\}\,\mathrm{d}x = \mathrm{Cov}\left\{\int Y_{f'}(x)\,\mathrm{d}x,\ Y_{f'}(x')\right\},$
  $\iint k_{f'}(x, x')\,\mathrm{d}x\,\mathrm{d}x' = \mathrm{Cov}\left\{\int Y_{f'}(x)\,\mathrm{d}x,\ \int Y_{f'}(x')\,\mathrm{d}x'\right\}.$  (2.29)

These formulas provide the mean and covariance of the GP $Y_f(x) = \int Y_{f'}(x)\,\mathrm{d}x$ as well as the covariance between $Y_f(x)$ and $Y_{f'}(x)$. Of note, we use indefinite integrals here, and the constant associated with the integral needs to be identified via maximizing the log marginal likelihood. This constant does not affect the covariance function, because $\mathrm{Cov}\{\int Y_{f'}(x)\,\mathrm{d}x + a,\ \int Y_{f'}(x')\,\mathrm{d}x' + b\} = \mathrm{Cov}\{\int Y_{f'}(x)\,\mathrm{d}x,\ \int Y_{f'}(x')\,\mathrm{d}x'\}$ for any constants $a$ and $b$.

We can then follow the same procedure as in GE-Kriging (Section 2.3) to construct the covariance matrix $C$ and compute the posterior mean and variance of $f$ and $f'$ at any location $x^*$. Of note, this "integral-enhanced" GPR/Kriging is equivalent to the gradient-enhanced version. For example, if we set the mean of $Y_{f'}(x)$ to zero, then the mean of $Y_f(x)$ is a constant $\mu$, which needs to be identified as in the gradient-enhanced version. The integral-enhanced Kriging is thus equivalent to the gradient-enhanced Kriging if the mean and covariance functions are selected appropriately. For example, if we assume zero mean and set $k_{f'}(x, x') = \partial^2 k_f(x, x')/(\partial x\,\partial x')$ for $Y_{f'}(x)$, where $k_f(x, x')$ is the Gaussian kernel, this integral-enhanced Kriging model is the same as the gradient-enhanced Kriging model that uses a Gaussian kernel and a constant mean for $Y_f(x)$. In most cases, it is easier to compute (partial) derivatives than integrals; therefore, it is more convenient to use the gradient-enhanced setting. A similar argument holds for Cokriging. In this work, we only show results for gradient-enhanced Kriging/Cokriging.

3 Numerical examples

We present four numerical examples to demonstrate the performance of GE-Cokriging. The first two prototype examples show GE-Cokriging's capability of approximating the QoI and its gradients for two 1D functions and a 2D function. The other two examples illustrate the high accuracy of GE-Cokriging in constructing the phase diagram of an underdamped oscillator and in analyzing the sensitivity of the power factor under varying power inputs in a large-scale power grid system. In all these examples, we assume that both the QoI and its gradients are collected at every observation location. The hyperparameters of the GP models are identified by maximizing the associated log marginal likelihood using a genetic algorithm, as in [6]. Lastly, we quantitatively compare the prediction accuracy of Cokriging, GE-Kriging, and GE-Cokriging in each case, as well as their computational cost.
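Before moving to the examples, the stationary-kernel identities that Sections 2.3 and 2.4 rely on can be checked symbolically in a few lines; the sketch below assumes the sympy library and verifies Eq. (2.20) and the mixed derivative of the Gaussian kernel used in the augmented covariance blocks.

```python
import sympy as sp

x, xp, l = sp.symbols('x x_p l', positive=True)
k = sp.exp(-(x - xp) ** 2 / l ** 2)   # Gaussian correlation k(x, x') / sigma^2

# Eq. (2.20): for a stationary kernel, dk/dx = -dk/dx'.
assert sp.simplify(sp.diff(k, x) + sp.diff(k, xp)) == 0

# Mixed derivative used in the Psi_11 / C~_11 blocks:
print(sp.simplify(sp.diff(k, x, xp)))
# -> (2/l**2 - 4*(x - x_p)**2/l**4) * exp(-(x - x_p)**2/l**2)
```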
3.1 1D function

3.1.1 Case 1

In this part, we compare the results of Cokriging and GE-Cokriging in approximating a 1D function. The target function to approximate is

  $f_H(x) = (6x - 2)^2 \sin(12x - 4),$  (3.1)

from which the high-fidelity data are sampled. The low-fidelity data are sampled from the function

  $f_L(x) = A f_H(x) + B(x - 0.5) + C.$  (3.2)

The observation locations of $f_H$ are four points $X_H \subset [0, 1]$, and those of $f_L$ are six points $X_L \subset [0, 1]$; the locations are chosen so that $X_H \subset X_L$. We first consider a well-studied case [6] where the parameters of the low-fidelity function are $A = 0.5$, $B = 10$, $C = -5$, i.e.,

  $f_L(x) = 0.5 f_H(x) + 10(x - 0.5) - 5.$  (3.3)

Of note, we use fewer observation points in $X_L$ than in [6].
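For reference, a sketch of this test problem in code. The exact entries of $X_H$ and $X_L$ are not printed above, so the sampling plans below (a six-point uniform grid and a four-point subset) are assumptions consistent with $X_H \subset X_L$; the analytical gradient is included because the GE methods observe it at every sample location.

```python
import numpy as np

def f_H(x):            # high-fidelity function, Eq. (3.1)
    return (6 * x - 2) ** 2 * np.sin(12 * x - 4)

def df_H(x):           # analytical gradient of f_H, observed by the GE methods
    return 12 * (6 * x - 2) * np.sin(12 * x - 4) \
         + 12 * (6 * x - 2) ** 2 * np.cos(12 * x - 4)

def f_L(x, A=0.5, B=10.0, C=-5.0):   # low-fidelity function, Eqs. (3.2)-(3.3)
    return A * f_H(x) + B * (x - 0.5) + C

X_L = np.linspace(0.0, 1.0, 6)        # assumed low-fidelity sampling plan
X_H = np.array([0.0, 0.4, 0.6, 1.0])  # assumed high-fidelity plan, subset of X_L
```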
The results of Cokriging and GE-Cokriging for reconstructing $f_H$ are shown in Fig. 1. Fig. 1a shows that Cokriging is able to capture $f_H$, as the posterior mean is generally close to the high-fidelity function values. However, $\hat{s}^2$ is large at most prediction locations, indicating that Cokriging yields considerable uncertainty there, whereas this uncertainty is very small at the observation locations because a simple relation between $f_H$ and $f_L$ has been identified from the available data [6]. As a comparison, Fig. 1b illustrates that the posterior mean of GE-Cokriging coincides with $f_H$, and the uncertainty of the prediction is very small on the entire interval, as the grey shaded area is almost invisible.

Figure 1: Prediction of the QoI for the 1D problem, case 1. Posterior mean (black solid line) and standard deviation (grey shaded area) of the QoI $f_H$ by (a) Cokriging and (b) GE-Cokriging. The low-fidelity function $f_L$ is denoted by red solid lines, high-fidelity samples by black diamonds, and low-fidelity samples by red circles. Colored online.

Next, we compare the performance in predicting the gradient of $f_H$, i.e., $\mathrm{d}f_H(x)/\mathrm{d}x$. Fig. 2 shows that Cokriging suffers from the singularity of the covariance matrix in this setup, implied by the sharp turns of the predicted curve between neighboring observations in Fig. 2a and the large standard deviations in Fig. 2b at locations where observations are not available. As for GE-Cokriging, the prediction of the gradient is accurate both in terms of the posterior mean (Fig. 2a) and the standard deviation (Fig. 2b); the prediction uncertainty of Cokriging is almost 10 times larger than that of GE-Cokriging. The performance of Cokriging is poor in this case because the covariance matrix $\tilde{C}$ is close to singular; the reason is that $\mathrm{d}y_L/\mathrm{d}x$ takes similar values at $x = 0.2$ and $x = 0.4$, as well as at $x = 0$ and $x = 0.6$. As we point out in Section 2.1, this singularity issue is common for GPR methods in practice, and the typical remedy is to add a diagonal matrix $\alpha I$ to the covariance matrix, which is equivalent to adding noise to the collected data. In this paper, we set $\alpha$ much smaller than values typically used in practice, to demonstrate that GE-Cokriging can alleviate the singularity issue without sacrificing the accuracy of matching the observation data.

Figure 2: Prediction of the gradient of the QoI for the 1D problem, case 1. (a) Posterior mean by Cokriging (blue solid line) and GE-Cokriging (green solid line); the gradient of the high-fidelity function $\mathrm{d}f_H/\mathrm{d}x$ is denoted by a black solid line, the gradient of the low-fidelity function $\mathrm{d}f_L/\mathrm{d}x$ by a red solid line, high-fidelity samples by black diamonds, and low-fidelity samples by red circles. (b) Standard deviation of the predicted $\mathrm{d}f_H/\mathrm{d}x$ by Cokriging (red solid line) and GE-Cokriging (black solid line). Colored online.

3.1.2 Case 2: shifted $f_L$

Next, we keep the sampling locations $X_H$ and $X_L$ the same as in Section 3.1.1 and only modify the low-fidelity function in Eq. (3.3) by slightly shifting its argument, i.e., replacing $x$ with $x - 0.2$:

  $f_L(x) = 0.5 f_H(x - 0.2) + 10(x - 0.2 - 0.5) - 5.$  (3.4)

The posterior means and standard deviations of Cokriging and GE-Cokriging are shown in Fig. 3. Fig. 3a shows that Cokriging is not able to obtain an accurate prediction of $f_H$, and the resulting uncertainty is large on the entire interval except at the locations in $X_H$. On the contrary, as shown in Fig. 3b, the GE-Cokriging result is much closer to $f_H$ and the uncertainty is very small.

Figure 3: Prediction of the QoI for the 1D problem, case 2. Posterior mean (black solid line) and standard deviation (grey shaded area) of the QoI $f_H$ by (a) Cokriging and (b) GE-Cokriging. The low-fidelity function $f_L$ is denoted by red solid lines, high-fidelity samples by black diamonds, and low-fidelity samples by red circles. Colored online.

We present the gradient predictions of GE-Cokriging and Cokriging in Fig. 4. Similar to the observations from Fig. 2a, Cokriging suffers from the singularity of the covariance matrix in this case, with the posterior mean deviating significantly from $\mathrm{d}f_H/\mathrm{d}x$ (see Fig. 4a) and the standard deviation being of an order comparable to the mean value (see Fig. 4b). In comparison, GE-Cokriging still yields a good result, with the posterior mean close to $\mathrm{d}f_H/\mathrm{d}x$ (see Fig. 4a) and low uncertainty, i.e., small standard deviations (see Fig. 4b). These contrasts between Cokriging and GE-Cokriging suggest that the gradient information from the high- and low-fidelity functions helps to improve the prediction accuracy of not only the QoI but also the corresponding gradients.

Figure 4: Prediction of the gradient of the QoI for the 1D problem, case 2. (a) Posterior mean by Cokriging (blue solid line) and GE-Cokriging (green solid line); the gradient of the high-fidelity function $\mathrm{d}f_H/\mathrm{d}x$ is denoted by a black solid line, the gradient of the low-fidelity function $\mathrm{d}f_L/\mathrm{d}x$ by a red solid line, high-fidelity samples by black diamonds, and low-fidelity samples by red circles. (b) Standard deviation of the predicted $\mathrm{d}f_H/\mathrm{d}x$ by Cokriging (red solid line) and GE-Cokriging (black solid line). Colored online.

3.2 2D function

We extend the application of GE-Cokriging to approximating a 2D function, namely a modified Branin function [6], given by

  $f_H(x, y) = a(\bar{x}_2 - b\bar{x}_1^2 + c\bar{x}_1 - r)^2 + g(1 - p)\cos(\bar{x}_1) + g + qx,$  (3.5)

where $\bar{x}_1 = 15x - 5$, $\bar{x}_2 = 15y$, $x \in [0, 1]$, $y \in [0, 1]$, with $a = 1$, $b = 5.1/(4\pi^2)$, $c = 5/\pi$, $r = 6$, $g = 10$, $p = 1/(8\pi)$, $q = 5$. The low-fidelity function is constructed as

  $f_L(x, y) = A f_H(Bx + (1 - B), Cy),$  (3.6)

where $A = 1.1$, $B = 0.95$, $C = 0.9$. The contour of the modified Branin function $f_H$ that we aim to approximate is shown in Fig. 5a, and the contour of the low-fidelity function $f_L$ is shown in Fig. 5d. Both functions can be implemented as in the sketch below.
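A sketch of the two 2D test functions; the constants follow Eqs. (3.5)-(3.6).

```python
import numpy as np

def f_H(x, y):   # modified Branin function, Eq. (3.5)
    a, b, c, r = 1.0, 5.1 / (4 * np.pi ** 2), 5.0 / np.pi, 6.0
    g, p, q = 10.0, 1.0 / (8 * np.pi), 5.0
    x1, x2 = 15 * x - 5, 15 * y
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 \
         + g * (1 - p) * np.cos(x1) + g + q * x

def f_L(x, y, A=1.1, B=0.95, C=0.9):   # low-fidelity function, Eq. (3.6)
    return A * f_H(B * x + (1 - B), C * y)
```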
The high-fidelity observation locations $X_H$ (black squares in Fig. 5a) and the low-fidelity observation locations $X_L$ (black circles in Fig. 5d) are randomly selected from a uniformly spaced $41 \times 41$ grid on $[0, 1] \times [0, 1] \subset \mathbb{R}^2$. We note that $X_H \subset X_L$ as before.

We first compare the results of reconstructing $f_H$ by Cokriging and GE-Cokriging, shown in Fig. 5. It is clear that the posterior mean of GE-Cokriging (Fig. 5c) is closer to $f_H$ than that of Cokriging (Fig. 5b). The degrees of uncertainty are also distinct, as the posterior standard deviation of Cokriging (Fig. 5e) is one order of magnitude larger than that of GE-Cokriging (Fig. 5f).

Figure 5: The high- and low-fidelity functions of the 2D problem and the posterior predictions of the high-fidelity function. (a) The high-fidelity function, namely the modified Branin function $f_H$ (contour), and observation locations (black squares). Posterior mean of the QoI prediction by (b) Cokriging and (c) GE-Cokriging. (d) Low-fidelity function $f_L$ (contour) and observation locations (black dots). Posterior standard deviation of the QoI by (e) Cokriging and (f) GE-Cokriging. Colored online.

Next, we compare the gradient predictions of Cokriging and GE-Cokriging. Figs. 6a and 6d show contours of the exact $\partial f_H/\partial x$ and $\partial f_H/\partial y$, respectively. For predicting $\partial f_H/\partial x$, GE-Cokriging (Fig. 6c) shows higher accuracy globally, while Cokriging (Fig. 6b) cannot produce an accurate prediction in the lower left corner, where the available observation data are scarce. As for $\partial f_H/\partial y$, since the target function is relatively smooth, both Cokriging (Fig. 6e) and GE-Cokriging (Fig. 6f) are capable of accurate prediction, while GE-Cokriging still outperforms Cokriging in terms of the total RMSE recorded in Tab. 1.

Figure 6: The high-fidelity gradients in the x and y directions of the 2D problem and the corresponding posterior predictions. (a) The gradient of the high-fidelity function in the x direction, $\partial f_H/\partial x$ (contour), and high-fidelity samples (black squares) of the gradient in the x direction. Posterior mean of the gradient prediction in the x direction by (b) Cokriging and (c) GE-Cokriging. (d) The gradient of the high-fidelity function in the y direction, $\partial f_H/\partial y$ (contour), and high-fidelity samples (black squares) of the gradient in the y direction. Posterior mean of the gradient prediction in the y direction by (e) Cokriging and (f) GE-Cokriging. Colored online.

3.3 Underdamped oscillator

We consider a driven harmonic oscillator described by the following second-order ODE:

  $m\ddot{x} + c\dot{x} + kx = F(t), \qquad x(0) = 1, \quad \dot{x}(0) = 0,$  (3.7)

where $m$ is the mass, $c$ is the damping coefficient, $k$ is a constant (e.g., the elasticity coefficient of a spring), and $F(t)$ is the external force. We rewrite the ODE in Eq. (3.7) as

  $\ddot{x} + 2\zeta\omega_0\dot{x} + \omega_0^2 x = F(t)/m,$  (3.8)

where $\omega_0 = \sqrt{k/m}$ is the undamped angular frequency and $\zeta = c/(2\sqrt{mk})$ is the damping ratio. We set $\zeta = 1/\sqrt{2}$ and $\omega_0 = 6\sqrt{1 - \zeta^2}$ in this study. The external force is set as the step response:

  $F(t)/m = \omega_0^2$ for $t \geq 0$, and $0$ for $t < 0$.  (3.9)

The analytical solution of Eq. (3.7) is

  $x_H(t) = e^{-\zeta\omega_0 t}\,\frac{\sin(\sqrt{1 - \zeta^2}\,\omega_0 t + \varphi)}{\sin\varphi}, \qquad \varphi = \arccos\zeta,$  (3.10)

and the velocity is

  $\dot{x}_H(t) = -\frac{\omega_0 e^{-\zeta\omega_0 t}}{\sin\varphi}\left[\zeta\sin(\sqrt{1 - \zeta^2}\,\omega_0 t + \varphi) - \sqrt{1 - \zeta^2}\cos(\sqrt{1 - \zeta^2}\,\omega_0 t + \varphi)\right].$  (3.11)

The low-fidelity model is a simple harmonic oscillator:

  $m\ddot{x} + kx = 0, \qquad x(0) = 1, \quad \dot{x}(0) = 0,$  (3.12)

which is equivalent to setting $\zeta = 0$ and $F(t) = 0$ in Eq. (3.8). The analytical solution of the low-fidelity model is

  $x_L(t) = \cos(\omega_0 t),$  (3.13)

and the velocity is

  $\dot{x}_L(t) = -\omega_0\sin(\omega_0 t).$  (3.14)

The observation locations for the high- and low-fidelity models, $T_H$ and $T_L$, are uniformly spaced grids on $[0, 3]$. We compare the reconstructed trajectory $x(t)$ and velocity $\dot{x}(t)$ on $[0, 3]$ by Cokriging and GE-Cokriging in Fig. 7. Cokriging again shows worse performance, both for the prediction of the QoI (Fig. 7a) and of the gradient (Fig. 7c), marked by significant deviations from the true values as well as large uncertainties at locations distant from the observation locations, while GE-Cokriging reconstructs the trajectory (Fig. 7b) and velocity (Fig. 7d) of the oscillator well, with small standard deviations. The overlap between the trajectory-velocity phase diagram produced by GE-Cokriging and the exact phase diagram (Fig. 7e) emphasizes that GE-Cokriging can provide accurate predictions of the QoI and the corresponding gradients simultaneously, while Cokriging fails to. We also note that Cokriging suffers from the singularity of the covariance matrix again, while GE-Cokriging does not have this concern. The two analytical models can be implemented as in the sketch below.
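A sketch of the two analytical models, Eqs. (3.10)-(3.14), used to generate the training data. Parts of the printed value of $\omega_0$ are lost above, so $\omega_0 = 6\sqrt{1 - \zeta^2}$ below is our reading of it, with $\zeta = 1/\sqrt{2}$; the observation grids are likewise assumed.

```python
import numpy as np

zeta = 1.0 / np.sqrt(2.0)                  # damping ratio
omega0 = 6.0 * np.sqrt(1.0 - zeta ** 2)    # assumed undamped angular frequency
phi = np.arccos(zeta)
wd = np.sqrt(1.0 - zeta ** 2) * omega0     # damped angular frequency

def x_H(t):   # high-fidelity trajectory, Eq. (3.10)
    return np.exp(-zeta * omega0 * t) * np.sin(wd * t + phi) / np.sin(phi)

def v_H(t):   # high-fidelity velocity, Eq. (3.11)
    return -(omega0 * np.exp(-zeta * omega0 * t) / np.sin(phi)) * (
        zeta * np.sin(wd * t + phi) - np.sqrt(1.0 - zeta ** 2) * np.cos(wd * t + phi))

def x_L(t):   # low-fidelity trajectory, Eq. (3.13)
    return np.cos(omega0 * t)

def v_L(t):   # low-fidelity velocity, Eq. (3.14)
    return -omega0 * np.sin(omega0 * t)

T_L = np.linspace(0.0, 3.0, 13)   # assumed uniform observation grids on [0, 3]
T_H = T_L[::2]
```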
3.4 Sensitivity of a power grid system

We now consider the relationship between the power input of a generator bus, denoted as $x$, and the real-time power factor of a load bus, denoted as $f(x)$, in a large-scale power system from the IEEE 118-bus test case [26]. We use MATPOWER [36], which provides a model of the IEEE 118-bus test case, to run simulations and generate sample points. Here, $f_H(x)$ and $f_L(x)$ represent the alternating current (AC) and direct current (DC) models approximating $f(x)$, respectively.

The observation locations for Cokriging and GE-Cokriging consist of 51 low-fidelity samples from the DC model on $X_L = \{20 + 2j\}_{j=0}^{50}$ and five samples from the AC model on $X_H$ (again, $X_H \subset X_L$). In addition to reconstructing $f_H$ accurately, estimating the change of the power factor of a load bus in response to the change of the power input of a generator bus, i.e., the sensitivity of $f$ with respect to $x$, is important for safety and energy-efficiency considerations. This change is reflected by the derivative of $f(x)$, i.e., $\mathrm{d}f(x)/\mathrm{d}x$. Therefore, we aim to approximate both $f_H$ and its derivative. Here we use a finite-difference method with a small step size to obtain $\mathrm{d}f_H/\mathrm{d}x$ and $\mathrm{d}f_L/\mathrm{d}x$ at $X_H$ and $X_L$, respectively.

Cokriging reconstructs $f_H$ with noticeable standard deviations (Fig. 8a), but it fails to reconstruct $\mathrm{d}f_H/\mathrm{d}x$ (Fig. 8c). On the other hand, GE-Cokriging reconstructs both $f_H$ (Fig. 8b) and $\mathrm{d}f_H/\mathrm{d}x$ (Fig. 8d) accurately with rather small uncertainty; the only noticeable discrepancy appears near the left boundary, because that region is far from the available data. Unlike in the other cases, here we notice wiggles in the high-fidelity gradient prediction of GE-Cokriging. This is caused by the aliasing error introduced by the finite-difference approximation of the gradient functions; recall that no wiggles are observed in the previous examples, where the gradients are observed directly. Again, reconstructing the gradient with Cokriging suffers from the singularity of the covariance matrix, as shown in Fig. 8c, whereas GE-Cokriging does not have this concern (see Fig. 8d).
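The gradient observations in this example come from finite differences rather than analytical formulas. A minimal central-difference sketch is below; the paper's exact step size is not recoverable here, so h is an assumption, and run_ac_model stands in for a hypothetical wrapper around the MATPOWER simulation.

```python
import numpy as np

def fd_gradient(f, X, h=0.01):
    """Central-difference approximation of df/dx at the sample locations X.
    The step size h is an assumed value, not the one used in the paper."""
    X = np.asarray(X, dtype=float)
    return (f(X + h) - f(X - h)) / (2.0 * h)

# Example (run_ac_model is a hypothetical wrapper around the MATPOWER AC model):
# dfH_obs = fd_gradient(run_ac_model, X_H)
```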
This is due to the fact that GE-Kriging and GE-Cokriging integrate both QoI data and the corresponding gradient data in the training step and henceprovides prediction of QoI as well as the gradient on the new locations simultaneously in the predictingstep. Whereas, Cokriging requires construction of a model for gradient data separately. Hence, the timefor the prediction of the gradients by GE-Kriging and GE-Cokriging, i.e., the last two columns in Tab. 2,are for prediction only and is relatively short. It is also noticed that the time consumption of GE-Kriging15 ase Cokriging GE-Kriging GE-Cokriging Cokriging ( ∇ ) GE-Kriging ( ∇ ) GE-Cokriging ( ∇ )1D1 0 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . ∗ . ± . . ± . . ± . . ± . . ± . . ± . ∗∗ - - - 0 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . Table 1: Relative mean squared error (mean ± standard deviation) of QoI and the corresponding gradients foreach numerical example averaged over 5 separate runs with random parameters initialization by Cokriging,GE-Kriging and GE-Cokriging. ∗ denotes gradient in x direction and ∗∗ denotes gradient in y direction. ∇ denotes prediction of the gradient of QoI.is smaller than that of GE-Cokriging, recall that GE-Kriging only used high-fidelity information while GE-Cokriging used both high-fidelity and low-fidelity information, which lead to a larger covariance matrix inGE-Cokriging compared to that in GE-Kriging. Although GE-Cokriging generally requires longer time inthe training step, almost doubles Cokriging’s training time, the total time cost of GE-Cokriging in QoI andgradients prediction is almost the same as that of Cokriging. Considering the significant improvement inaccuracy and robustness, we can conclude that GE-Cokriging is an accurate and efficient approach to obtainprediction both QoI and its gradients simultaneously. Case ID Cokriging GE-Kriging GE-Cokriging Cokriging ( ∇ ) GE-Kriging ( ∇ ) GE-Cokriging ( ∇ )1D1 1 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . ∗ . ± . . ± . . ± . . ± . . ± . . ± . ∗∗ - - - 1 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . Table 2: Runtime (mean ± standard deviation) of predicting QoI and its gradients for each numerical exampleaveraged over 5 separate runs with random parameters initialization by Cokriging, GE-Kriging and GE-Cokriging. ∗ denotes gradient in x direction and ∗∗ denotes gradient in y direction. ∇ denotes prediction ofthe gradient of QoI. In this work, we present a comprehensive gradient-enhanced multi-fidelity Cokriging method, namely GE-Cokriging, which incorporates available gradient information of multi-fidelity data, i.e., low-fidelity andhigh-fidelity observation of QoIs and its gradients. We present several numerical examples to study theperformance of GE-Cokriging. Our results show that GE-Cokriging can accurately predict the QoI and itsgradients simultaneously. We compare the performance of GE-Cokriging against GE-Kriging and multi-fidelity Cokriging, two popular GP-based prediction methods, and illustrate that GE-Cokriging is the mostaccurate, robust and efficient among these methods.In particular, our result suggests that GE-Cokriging achieves better accuracy than GE-Kriging, thisis because it exploits the information of the low-fidelity model. 
Also, GE-Cokriging yields more accurate results than applying Cokriging to the QoI and its gradients separately, because it takes advantage of the relation between these two quantities and makes use of the corresponding data jointly. Even when some of the low-fidelity gradient information is misleading, for example, when the gradient of the low-fidelity data is negative while that of the high-fidelity data is positive, the GE-Cokriging method can still be robust enough to predict the target functions accurately, with less uncertainty than Cokriging and GE-Kriging. Moreover, GE-Cokriging helps to alleviate the singularity issue of the covariance matrix, which is quite common in GPR methods. In terms of computational cost, training the GE-Cokriging model, i.e., identifying its hyperparameters, can take longer than training Cokriging for a high-dimensional problem, given that the dimension of the covariance matrix is expanded by the incorporation of gradient samples. However, once these hyperparameters are specified, the QoI and its gradients can be predicted simultaneously. This saves total computational time compared with Cokriging, which requires constructing models for the QoI and its gradients separately, and hence needs to train at least two models. Therefore, the overhead of training a model with a larger covariance matrix in GE-Cokriging is mitigated, and the overall times required to predict both the QoI and its gradients with these three methods are comparable.

We note that our gradient-enhanced framework is also flexible for further extensions. In all of the numerical examples, we apply the commonly used stationary radial-basis-function kernel. Other kernel functions, e.g., Matérn kernels with different smoothness, can be used to solve problems with desired regularity constraints. In addition, non-stationary kernels can be applied in this framework to model heterogeneous systems more accurately. Another extension is to relax the constraints on the sample data to address the situation of missing data. More specifically, in the numerical examples presented, the gradient information is available along with the QoI at each observation location; in practice, it is possible that at some observation locations either the QoI or its gradient is unavailable. In this scenario, modifications to the mean and covariance functions of the GP in our framework are needed. Moreover, we used the linear auto-regression form of multi-fidelity Cokriging from [13], which can be replaced by more general nonlinear auto-regression forms, e.g., the methods used in [23, 9, 17], or even deep neural networks, e.g., [19]. Finally, as we point out in Section 2.4, our framework can also be built from the "integral-enhanced" perspective, which can be useful in specific practical problems.
Acknowledgments
Yixiang Deng was supported by National Science Foundation (NSF) Award No. 1736088. Xiu Yang was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (ASCR) as part of Multifaceted Mathematics for Rare, Extreme Events in Complex Energy and Environment Systems (MACSER). Guang Lin gratefully acknowledges the support from the National Science Foundation (DMS-1555072, DMS-1736364, and CMMI-1634832) and Brookhaven National Laboratory Subcontract 382247.
References

[1] Petter Abrahamsen. A review of Gaussian random fields and correlation functions, 1997.
[2] Giancarlo Alfonsi. Reynolds-averaged Navier-Stokes equations for turbulence modeling. Appl. Mech. Rev., 62(4), 2009.
[3] Hyoung Seog Chung and Juan Alonso. Design of a low-boom supersonic business jet using cokriging approximation models. page 5598, 2002.
[4] Richard Dwight and Zhong-Hua Han. Efficient uncertainty quantification using gradient-enhanced kriging. page 2276, 2009.
[5] Pep Espanol and Patrick Warren. Statistical mechanics of dissipative particle dynamics. Europhys. Lett., 30(4):191, 1995.
[6] Alexander Forrester, Andy Keane, and András Sóbester. Engineering Design via Surrogate Modelling: A Practical Guide. John Wiley & Sons, 2008.
[7] Alexander IJ Forrester, András Sóbester, and Andy J Keane. Multi-fidelity optimization via surrogate modelling. Proc. R. Soc. A, 463(2088):3251-3269, 2007.
[8] Meixia Geng, Danian Huang, Qingjie Yang, and Yinping Liu. 3D inversion of airborne gravity-gradiometry data using cokriging. Geophysics, 79(4):G37-G47, 2014.
[9] Mark Girolami and Mingjun Zhong. Data integration for classification problems employing Gaussian process priors. In Adv. Neural Inf. Process. Syst., pages 465-472, 2007.
[10] Pierre Goovaerts. Ordinary cokriging revisited. Math. Geosci., 30(1):21-42, 1998.
[11] Loic Le Gratiet and Josselin Garnier. Recursive co-kriging model for design of computer experiments with multiple levels of fidelity. Int. J. Uncertain. Quantif., 4(5):365-386, 2014.
[12] Zhong-Hua Han, Stefan Görtz, and Ralf Zimmermann. Improving variable-fidelity surrogate modeling via gradient-enhanced kriging and a generalized hybrid bridge function. Aerosp. Sci. Technol., 25(1):177-189, 2013.
[13] Marc C Kennedy and Anthony O'Hagan. Predicting the output from a complex computer code when fast approximations are available. Biometrika, 87(1):1-13, 2000.
[14] Peter K Kitanidis. Introduction to Geostatistics: Applications in Hydrogeology. Cambridge University Press, 1997.
[15] J Laurenceau, M Meaux, M Montagnac, and P Sagaut. Comparison of gradient-based and gradient-enhanced response-surface-based optimizers. AIAA J., 48(5):981-994, 2010.
[16] Luc Laurent, Rodolphe Le Riche, Bruno Soulier, and Pierre-Alain Boucard. An overview of gradient-enhanced metamodels with applications. Arch. Comput. Methods Eng., 26(1):61-106, 2019.
[17] Seungjoon Lee, Felix Dietrich, George E Karniadakis, and Ioannis G Kevrekidis. Linking Gaussian process regression with data-driven manifold embeddings for nonlinear data fusion. Interface Focus, 9(3):20180083, 2019.
[18] Seungjoon Lee, Ioannis G Kevrekidis, and George Em Karniadakis. A general CFD framework for fault-resilient simulations based on multi-resolution information fusion. J. Comput. Phys., 347:290-304, 2017.
[19] Xuhui Meng and George Em Karniadakis. A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems. J. Comput. Phys., 401:109020, 2020.
[20] Max D Morris, Toby J Mitchell, and Donald Ylvisaker. Bayesian design and analysis of computer experiments: use of derivatives in surface prediction. Technometrics, 35(3):243-255, 1993.
[21] Benjamin Peherstorfer, Karen Willcox, and Max Gunzburger. Survey of multifidelity methods in uncertainty propagation, inference, and optimization. SIAM Rev., 60(3):550-591, 2018.
[22] P Perdikaris, D Venturi, JO Royset, and GE Karniadakis. Multi-fidelity modelling via recursive co-kriging and Gaussian-Markov random fields. Proc. R. Soc. A, 471(2179):20150018, 2015.
[23] Paris Perdikaris, Maziar Raissi, Andreas Damianou, ND Lawrence, and George Em Karniadakis. Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling. Proc. R. Soc. A, 473(2198):20160751, 2017.
[24] Ghanshyam Pilania, James E Gubernatis, and Turab Lookman. Multi-fidelity machine learning models for accurate bandgap predictions of solids. Comput. Mater. Sci., 129:156-163, 2017.
[25] Osborne Reynolds. IV. On the dynamical theory of incompressible viscous fluids and the determination of the criterion. Philos. Trans. R. Soc. Lond. A, (186):123-164, 1895.
[26] Richard Christie. Power systems test case archive, May 1993.
[27] Robert E Rudd and Jeremy Q Broughton. Coarse-grained molecular dynamics and the atomic limit of finite elements. Phys. Rev. B, 58(10):R5893, 1998.
[28] A Stein and LCA Corsten. Universal kriging and cokriging as a regression procedure. Biometrics, pages 575-587, 1991.
[29] A Stein, IG Staritsky, J Bouma, AC Van Eijnsbergen, and AK Bregt. Simulation of moisture deficits and areal interpolation by universal cokriging. Water Resour. Res., 27(8):1963-1973, 1991.
[30] Selvakumar Ulaganathan, Ivo Couckuyt, Francesco Ferranti, Eric Laermans, and Tom Dhaene. Performance study of multi-fidelity gradient enhanced kriging. Struct. Multidiscipl. Optim., 51(5):1017-1033, 2015.
[31] Christopher KI Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 2. MIT Press, Cambridge, MA, 2006.
[32] Ying Xuan, JunHua Xiang, WeiHua Zhang, and YuLin Zhang. Gradient-based kriging approximate model and its application research to optimization design. Sci. China Technol. Sci., 52(4):1117-1124, 2009.
[33] Xiu Yang, David Barajas-Solano, Guzel Tartakovsky, and Alexandre M Tartakovsky. Physics-informed cokriging: A Gaussian-process-regression-based multifidelity method for data-model convergence. J. Comput. Phys., 395:410-431, 2019.
[34] Xiu Yang, Guzel Tartakovsky, and Alexandre Tartakovsky. Physics-informed kriging: A physics-informed Gaussian process regression method for data-model convergence. arXiv preprint arXiv:1809.03461, 2018.
[35] Xiu Yang, Xueyu Zhu, and Jing Li. When bifidelity meets cokriging: An efficient physics-informed multifidelity method. SIAM J. Sci. Comput., 42(1):A220-A249, 2020.
[36] Ray Daniel Zimmerman, Carlos Edmundo Murillo-Sánchez, and Robert John Thomas. MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education. IEEE Trans. Power Syst., 26(1):12-19, 2011.
[37] Ralf Zimmermann. On the maximum likelihood training of gradient-enhanced spatial Gaussian processes. SIAM J. Sci. Comput., 35(6):A2554-A2574, 2013.
Figure 7: Prediction of the trajectory (QoI), velocity (gradient of QoI), and the phase diagram of an underdamped oscillator. Posterior mean (blue solid lines) and standard deviation (grey shaded area) of the trajectory $x_H(t)$ by (a) Cokriging and (b) GE-Cokriging. Posterior mean (blue solid lines) and standard deviation (grey shaded area) of the velocity $\mathrm{d}x_H(t)/\mathrm{d}t$ by (c) Cokriging and (d) GE-Cokriging. (e) Prediction of the phase diagram by Cokriging (blue dashed line) and by GE-Cokriging (black dashed line). Black diamonds denote high-fidelity observations, red circles denote low-fidelity observations, black solid lines denote the high-fidelity models, and red solid lines denote the low-fidelity models. Colored online.

Figure 8: Prediction of the relationship between the power input of a generator bus $x$ and the real-time power factor of a load bus $f_H(x)$ given by an AC model. Posterior mean (blue solid lines) and standard deviation (grey shaded area) of $f_H(x)$ by (a) Cokriging and (b) GE-Cokriging. Posterior mean (blue solid lines) and standard deviation (grey shaded area) of the gradient of the QoI, $\mathrm{d}f_H(x)/\mathrm{d}x$, by (c) Cokriging and (d) GE-Cokriging.