Multivariate Gaussian Process Regression for Multiscale Data Assimilation and Uncertainty Reduction
Confidential manuscript submitted to
Water Resources Research
David A. Barajas-Solano, Alexandre M. Tartakovsky

Pacific Northwest National Laboratory, Richland, WA
Key Points:
• We present a parameter field estimation method based on measurements collected at different scales
• Scales are treated as components of a multivariate Gaussian process trained from multiscale data
• Conditioning on multiscale data can be used to reduce uncertainty in geophysical modeling
Corresponding author: D. A. Barajas-Solano,
[email protected]
Abstract
We present a multivariate Gaussian process regression approach for parameter field reconstruction based on the field's measurements collected at two different scales, the coarse and fine scales. The proposed approach treats the parameter field defined at fine and coarse scales as a bivariate Gaussian process with a parameterized multiscale covariance model. We employ a full bivariate Matérn kernel as the multiscale covariance model, with shape and smoothness hyperparameters that account for the coarsening relation between fine and coarse fields. In contrast to similar multiscale kriging approaches that assume a known coarsening relation between scales, the hyperparameters of the multiscale covariance model are estimated directly from data via pseudo-likelihood maximization. We illustrate the proposed approach with a predictive simulation application for saturated flow in porous media. Multiscale Gaussian process regression is employed to estimate two-dimensional log-saturated hydraulic conductivity distributions from synthetic multiscale measurements. The resulting stochastic model for coarse saturated conductivity is employed to quantify and reduce uncertainty in pressure predictions.
Kriging is a widely used geostatistical tool for estimating spatially distributed quantities from sparse measurements and for quantifying the uncertainty of such estimates. When such estimates are used as parameter fields in a physics model, uncertainty in kriging estimates can be propagated through the model to quantify uncertainty in predictions. Observations of parameter fields are often gathered at different scales (often referred to as support scales). A common example is hydraulic permeability: laboratory experiments performed on field samples yield measurements with small support volumes, whereas field tests correspond to local averages of the measured quantity over much larger support volumes (up to 100 m). Methodologies for incorporating multiscale measurements into stochastic models for spatial parameters are important in order to systematically reduce uncertainty using all available data sources.

In Li et al. [2009], the authors employ multivariate Gaussian process regression, or cokriging, together with the properties of Gaussian processes, to formulate a multiscale simple kriging method for incorporating observations of the field at two scales under the assumption that coarsening is obtained by block averaging. This multiscale simple kriging method treats the fine and coarse scale fields as covariates in bivariate cokriging, with the coarse covariance and coarse-fine cross-covariance computed analytically from the fine scale covariance in terms of the block averaging operator [Vanmarcke, 2010]. While this approach to formulating multiscale simple kriging can be generalized to all linear coarsening operators (e.g. arithmetic moving averages, convolutions, etc.), it is in general not possible to obtain closed-form expressions for the coarse covariance and the coarse-fine cross-covariance. Furthermore, this approach either presumes a priori knowledge of the structure of the coarsening operator and the coarse field support volume, which is in general application-specific and not available, or requires a sufficiently flexible parameterized linear coarsening operator with parameters to be estimated together with all other covariance hyperparameters.

As an alternative, in this work we propose to model the fine and coarse fields as components of a bivariate Gaussian process with a parameterized bivariate covariance model, and to estimate the shape and smoothness parameters of this model directly from multiscale data rather than deriving them from knowledge of the structure of the coarsening operator. Multivariate covariance models are commonly used for cokriging in geostatistics. We point the reader to Genton and Kleiber [2015] for a comprehensive review of multivariate models. For multiscale simple kriging, we propose employing the univariate Matérn kernel as the building block for multiscale covariance models, as it provides sufficient flexibility for capturing the smoothness properties that arise from coarsening. In particular, we employ the full bivariate Matérn covariance kernel proposed in Gneiting et al. [2010]. Our approach can be extended to more than two support scales by employing the generalized construction proposed in Apanasovich et al. [2012]. We illustrate via numerical experiments that the full bivariate Matérn kernel captures the different degrees of smoothness of each scale and, via multiscale simple kriging, produces accurate estimates of the field at fine and coarse scales by assimilating data collected at both scales.

This manuscript is structured as follows. In section 2, we introduce the multiscale simple kriging algorithm based on multivariate Gaussian process regression and outline the model selection approach employed. In section 3, we summarize the block-averaging-based covariance model and introduce the bivariate Matérn model. Numerical experiments illustrating both inference from data and application to saturated flow modeling are presented in section 4. Finally, conclusions are given in section 5, together with possible avenues for future work.
Let Ω ⊂ ℝ^d (d ∈ [1, 3]), and let Y: Ω → ℝ denote a heterogeneous physical property to be estimated from discrete observations. We assume that measurements of Y are available at two different scales: (i) from soil samples (fine scale) with measurement scale η_f, and (ii) from pumping-well tests (coarse scale) with measurement scale η_c. Our goal is to incorporate both sets of measurements to derive estimators of Y at either support scale.

We use y_f to denote Y sampled at the support length-scale η_f, and y_c for Y sampled at the support length-scale η_c. Next, we assume that a "coarsening" operator R_cf exists such that

    y_c(x) ≡ [R_cf y_f](x)  for all x ∈ Ω.    (1)

A total of N_f and N_c observations of y_f and y_c are available at locations {x_i^f ∈ Ω}_{i=1}^{N_f} and {x_i^c ∈ Ω}_{i=1}^{N_c}, respectively. The observation locations are arranged into the N_f × d and N_c × d matrices X_s^f and X_s^c, respectively. Similarly, we arrange the observations into the N_f × 1 and N_c × 1 vectors y_s^f and y_s^c, respectively. Finally, we denote by D_s ≡ (X_s^f, y_s^f, X_s^c, y_s^c) the multiscale data set.

We assume that y_f and y_c are Gaussian processes. For univariate simple kriging, we employ the prior processes

    y_c(x) ~ GP(0, C_c(x, x′ | θ_c)),    y_f(x) ~ GP(0, C_f(x, x′ | θ_f)),

which are updated employing the single-scale data sets (X_s^c, y_s^c) and (X_s^f, y_s^f), respectively [Cressie, 2015]. Here, C_c and C_f denote parameterized covariance functions, with hyperparameter sets θ_c and θ_f, respectively, estimated from the corresponding data sets. For multiscale simple kriging [Li et al., 2009], we assume that the fine and coarse scale fields are components of a bivariate Gaussian process y(x) = [y_c(x), y_f(x)]^⊤ with prior

    y(x) ≡ [y_c(x), y_f(x)]^⊤ ~ GP( 0, C(x, x′ | θ) = [ C_c(x, x′ | θ), C_cf(x, x′ | θ) ; C_fc(x, x′ | θ), C_f(x, x′ | θ) ] )    (2)

to be conditioned on the multiscale data set D_s. Here, C denotes the parameterized multiscale covariance function, with hyperparameter set θ to be estimated from the multiscale data set.

The multiscale simple kriging-based estimation procedure is as follows: first, a "valid" multiscale covariance model C(·, · | θ) is selected, as explained later in this section; second, an estimator θ* of the hyperparameters of the covariance model is computed from the multiscale data set D_s (model selection stage); finally, the estimator θ* is plugged into the covariance model, and the mean and covariance of the conditional bivariate GP y | D_s, θ* are computed (plug-in conditioning stage).

The choice of multiscale covariance model and hyperparameters is not arbitrary. First, the model must be "valid", i.e. the block components C_c, C_f, and C_cf (≡ (C_fc)^⊤) and the hyperparameters θ must be such that C is a symmetric positive definite (SPD) kernel. Second, in a weaker sense, the covariance model and hyperparameters must reflect that y_c is a coarsening of y_f, as given by eq. (1). The approach we pursue in this work is different from the approach of Li et al. [2009] and Cressie [2015], where block averaging with known support is employed as the coarsening operator, from which block components can be derived such that the previous conditions are satisfied. In contrast, in this work we propose a multiscale covariance model that can be used even when the support and/or the structure of the coarsening are unknown a priori.

For the remainder of this manuscript we employ the symbol C(·, · | θ) to denote both covariance matrices and functions, i.e., for N_a × d and N_b × d matrices X_a and X_b of locations in Ω, C(X_a, X_b | θ) denotes the covariance matrix {C_ij ≡ C(x_i^a, x_j^b | θ)}. For given hyperparameters θ, the posterior process conditional on the data D_s is y | D_s, θ ~ GP(μ̂, Ĉ), with conditional mean and covariance given by

    μ̂(x | θ) = C(x, X_s | θ) C_s^{-1}(θ) y_s,    (3)
    Ĉ(x, x′ | θ) = C(x, x′ | θ) − C(x, X_s | θ) C_s^{-1}(θ) C(X_s, x′ | θ),    (4)

where y_s^⊤ = [(y_s^c)^⊤, (y_s^f)^⊤] is the N × 1 vector of observations, N ≡ N_f + N_c, X_s is the corresponding matrix of stacked observation locations, and C_s(θ) ≡ C(X_s, X_s | θ) denotes the covariance matrix of the observations.

From eq. (3) and eq. (4) we can find the conditional mean and covariance for each scale. For the coarse scale, we have y_c | D_s, θ ~ GP(μ̂_c, Ĉ_c), with

    μ̂_c(x | θ) = C_cp(x, X_s | θ) C_s^{-1}(θ) y_s,    (5)
    Ĉ_c(x, x′ | θ) = C_c(x, x′ | θ) − C_cp(x, X_s | θ) C_s^{-1}(θ) C_cp(X_s, x′ | θ),    (6)
    C_cp(x, x′ | θ) = [ C_c(x, x′ | θ)  C_cf(x, x′ | θ) ].    (7)

Similar expressions can be found for the fine scale.

For the plug-in approach employed in this work, we need to compute an estimate θ* of the covariance hyperparameters θ from the data set D_s. Two approaches are commonly used in univariate geostatistics: variogram curve fitting, and (pseudo-)log likelihood maximization.
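The plug-in conditioning step of eqs. (3) and (4) reduces to dense linear algebra once a covariance model is fixed. The following NumPy sketch illustrates it; the names (condition_gp, expcov) and the use of a one-dimensional exponential kernel in place of the multiscale model C(·, · | θ) are our illustrative assumptions, not part of the original method.

```python
import numpy as np

def condition_gp(C_po, C_pp, C_ss, y_s):
    """Plug-in GP conditioning, eqs. (3)-(4):
    mean = C(x, X_s) C_s^{-1} y_s,
    cov  = C(x, x') - C(x, X_s) C_s^{-1} C(X_s, x')."""
    L = np.linalg.cholesky(C_ss)                           # C_s is SPD
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_s))  # C_s^{-1} y_s
    V = np.linalg.solve(L, C_po.T)                         # L^{-1} C(X_s, x)
    return C_po @ alpha, C_pp - V.T @ V

# Toy example; an exponential kernel stands in for C(., . | theta).
def expcov(xa, xb, sigma2=1.0, lam=0.3):
    return sigma2 * np.exp(-np.abs(xa[:, None] - xb[None, :]) / lam)

X_s = np.array([0.1, 0.4, 0.8])     # observation locations (1-D for brevity)
y_s = np.array([0.5, -0.2, 0.1])    # stacked observation vector y_s
x = np.linspace(0.0, 1.0, 5)        # prediction locations
mu, C_hat = condition_gp(expcov(x, X_s), expcov(x, x),
                         expcov(X_s, X_s) + 1e-8 * np.eye(3), y_s)
```

For the bivariate case, X_s and y_s stack the coarse and fine observations and the kernel evaluates the appropriate block of eq. (2); in either case conditioning can only reduce the prior variance at each prediction point.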
In this work we employ likelihood maximization, as experience shows that curve fitting is not adequate for estimating the smoothness of a differentiable process [Stein, 1999], which plays an important role in estimating coarse fields (see section 3.1). Specifically, we take θ* ≡ arg max_θ L(D_s, θ), where L(D_s, θ) is a (pseudo-)log likelihood function.

Two (pseudo-)log likelihood functions are employed in this work. The first one is the marginal likelihood (ML), commonly used in Bayesian model selection, and defined as the log likelihood of the observations given the observation locations and the hyperparameters:

    L_ML(D_s, θ) ≡ log p(y_s | X_s, θ) = −(1/2) y_s^⊤ C_s^{-1}(θ) y_s − (1/2) log |C_s(θ)| − (N/2) log 2π.    (8)

The computational cost of evaluating the ML log likelihood is O(N³), chiefly driven by the inversion of the covariance matrix of the observations. We note that ML estimation is equivalent to the well-known residual maximum likelihood (REML) [Cressie, 2015] estimation for the case of zero-mean priors, as in eq. (2).

The second function is the leave-one-out cross-validation (LOO-CV) pseudo-log likelihood, which consists of the sum of the log predictive probabilities of each observation given all other observations and the hyperparameters. We denote by D_{−i} = (X_{−i}, y_{−i}) the data set excluding the ith observation. The prediction for the ith observation given D_{−i} and θ has distribution p(y_i | D_{−i}, θ) = N(y_i | μ̂_{−i}, σ̂²_{−i}), where μ̂_{−i} and σ̂²_{−i} are given by applying eq. (3) and eq. (4) to D_{−i}, obtaining

    μ̂_{−i}(θ) = C(x_i, X_{−i} | θ) C_{−i}^{-1}(θ) y_{−i},
    σ̂²_{−i}(θ) = C(x_i, x_i | θ) − C(x_i, X_{−i} | θ) C_{−i}^{-1}(θ) C(X_{−i}, x_i | θ),

where C_{−i}(θ) ≡ C(X_{−i}, X_{−i} | θ). The LOO-CV pseudo-log likelihood is therefore given by

    L_LOO(D_s, θ) = Σ_{i=1}^{N} log p(y_i | D_{−i}, θ) = −Σ_{i=1}^{N} { [y_i − μ̂_{−i}(θ)]² / (2 σ̂²_{−i}(θ)) + (1/2) log σ̂²_{−i}(θ) + (1/2) log 2π }.    (9)

We note that computing the LOO-CV pseudo-likelihood does not require applying eq. (3) and eq. (4) N times; instead, all N leave-one-out predictions can be computed with small overhead once C_s^{-1}(θ) is available, so that the computational cost of the LOO-CV pseudo-likelihood is, as for the ML likelihood, O(N³) [Rasmussen and Williams, 2005].
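Both objectives can be evaluated from a single factorization of C_s(θ). Below is a sketch (the function name is ours) that computes eq. (8) via a Cholesky factorization and eq. (9) via the leave-one-out identities of Rasmussen and Williams [2005], which recover all N leave-one-out predictions from C_s^{-1}(θ).

```python
import numpy as np

def log_likelihoods(C_s, y_s):
    """Return (L_ML, L_LOO): the marginal log likelihood, eq. (8), and the
    LOO-CV pseudo-log likelihood, eq. (9), from one O(N^3) factorization."""
    N = y_s.size
    L = np.linalg.cholesky(C_s)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_s))   # C_s^{-1} y_s
    logdet = 2.0 * np.sum(np.log(np.diag(L)))               # log |C_s|
    L_ML = -0.5 * y_s @ alpha - 0.5 * logdet - 0.5 * N * np.log(2.0 * np.pi)

    # LOO identities: sigma2_{-i} = 1/[C_s^{-1}]_{ii},
    #                 mu_{-i} = y_i - [C_s^{-1} y_s]_i * sigma2_{-i}
    Cinv = np.linalg.inv(C_s)
    sigma2 = 1.0 / np.diag(Cinv)
    resid2 = (alpha * sigma2) ** 2                          # (y_i - mu_{-i})^2
    L_LOO = -0.5 * np.sum(resid2 / sigma2 + np.log(sigma2) + np.log(2.0 * np.pi))
    return L_ML, L_LOO

# sanity check on white noise: both reduce to -(N/2) log 2*pi
L_ML, L_LOO = log_likelihoods(np.eye(3), np.zeros(3))
```

The white-noise case is a convenient check: with C_s = I and y_s = 0, both objectives equal −(N/2) log 2π.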
In this section we present bivariate covariance models to be used in multiscale simple kriging. The first model is the block-averaging-based model employed in Cressie [2015] and Li et al. [2009]. The second model is a bivariate Matérn covariance model. Both models require assuming a certain functional form for the fine-scale covariance (with hyperparameters estimated during model selection), but the block-averaging-based model requires a priori knowledge of the structure of the coarsening, whereas the bivariate Matérn covariance model does not.
In this approach, we construct a multiscale covariance model directly from a model for the fine scale covariance C_f and the relation eq. (1). We assume that R_cf is the linear operator H_{η_c} of the form

    y_c(x) ≡ [H_{η_c} y_f](x) ≡ ∫_Ω h_{η_c}(x, z) y_f(z) dz,    (10)

where h_{η_c}(x, z) is an averaging kernel with support η_c satisfying ∫_Ω h_{η_c}(z, z′) dz′ = 1 for any z ∈ Ω. Linearity of the coarsening operator ensures that y_c is a GP. If h_{η_c} is known, the coarse-scale covariance and the fine-coarse-scale cross-covariance functions can be computed as

    C_c(x, x′ | θ) = ∫∫_{Ω×Ω} h_{η_c}(x, z) C_f(z, z′ | θ) h_{η_c}(z′, x′) dz dz′,    (11)
    C_cf(x, x′ | θ) = ∫_Ω h_{η_c}(x, z) C_f(z, x′ | θ) dz.    (12)

In this model, the multiscale covariance hyperparameters are those of the fine scale covariance model. This approach was employed in Li et al. [2009] to formulate the authors' multiscale simple kriging method, with the block averaging kernel

    h_{η_c}(z, z′) ≡ |H(z)|^{-1} if z′ ∈ H(z), and 0 otherwise,
    H(z) = {x ∈ Ω | max{|x₁ − z₁|, …, |x_d − z_d|} ≤ η_c/2} ∩ Ω,    (13)

for which explicit formulae for C_c and C_cf can be found in terms of C_f [see Vanmarcke, 2010, chap. 5–7]. It must be noted that for the block averaging kernel eq. (13), y_c is in general not isotropic, and is only stationary away from ∂Ω.

This approach can be employed when the support length-scale is known and when eq. (10) accurately describes the averaging relation between y_c and y_f. The advantage of this approach is that the number of hyperparameters is lower than for a full multiscale model (as will be seen in section 3.2), thus simplifying the model selection process.

To illustrate this approach, we compute the coarse covariance and coarse-fine cross-covariance using the block averaging kernel eq. (13) for the fine scale isotropic covariance C_f(r) = σ_f² exp(−r/λ_f) in ℝ², and for various values of η_c/λ_f. Fine and coarse variograms and coarse-fine cross-variograms are presented in fig. 1. In this case, the coarse covariance is not isotropic; therefore we present in fig. 1 the pseudo-isotropic variogram γ̃_c(r) ≡ C_c(x, x) − C_c(x, x + r e₁) for any x ∈ ℝ², where e₁ is the unit vector along the first coordinate. A similar pseudo-isotropic variogram is computed from the coarse-fine cross-covariance.

Figure 1. Fine and coarse variograms γ_f and γ̃_c, and coarse-fine cross-variogram γ̃_cf, for fine scale isotropic exponential covariance in ℝ², and two values of the coarse length-scale relative to the fine scale correlation length: (a) λ_f/η_c = 1, (b) λ_f/η_c < 1.

As expected, the coarse covariance exhibits a longer correlation length and smaller variance than the fine scale covariance. We also observe that the coarse correlation length increases and the coarse variance decreases with increasing η_c/λ_f, as noted in Tartakovsky et al. [2017]. Similar behavior is observed for the coarse-fine cross-covariance. Therefore, applying the multiscale simple kriging estimation method described in section 2 to the block-averaging model with fixed η_c when the coarse length scale is not sufficiently well known may result in large estimation errors. This issue can be ameliorated by including η_c as a hyperparameter of the multiscale covariance model to be estimated during the model selection stage. This requires assuming that the coarsening operator can be approximated by the block averaging operator with known averaging kernel.
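When closed-form expressions for eqs. (11) and (12) are unavailable, they can be approximated by discretizing the averaging kernel. A one-dimensional NumPy sketch follows, with illustrative grid size and kernel parameters; H is the discrete, row-stochastic analogue of the block-averaging kernel eq. (13).

```python
import numpy as np

# Discrete analogue of eqs. (11)-(12): with H the matrix of block-averaging
# weights, C_c = H C_f H^T and C_cf = H C_f.
n, lam_f, sigma2_f, eta_c = 200, 0.1, 1.0, 0.2
x = (np.arange(n) + 0.5) / n                     # cell centroids on [0, 1]
Cf = sigma2_f * np.exp(-np.abs(x[:, None] - x[None, :]) / lam_f)

H = (np.abs(x[:, None] - x[None, :]) <= eta_c / 2).astype(float)
H /= H.sum(axis=1, keepdims=True)                # rows integrate to one

Ccf = H @ Cf                                     # coarse-fine cross-covariance
Cc = H @ Cf @ H.T                                # coarse covariance

var_f, var_c = Cf[n // 2, n // 2], Cc[n // 2, n // 2]
```

Consistent with fig. 1, the coarse variance var_c is smaller than the fine variance var_f, and the coarse covariance decays more slowly with distance.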
Another effect of coarsening is the change in smoothness from y_f to y_c, reflected by the order of mean square differentiability of the fields. This is seen visually in fig. 1 in the behavior of γ_f and γ̃_c for r → 0. For the exponential kernel, we have γ_f(r) = σ_f² λ_f^{-1} |r| + O(r²) for r → 0. According to theorem 2.7.2 in Stein [1999], under this condition y_f is not mean square differentiable. On the other hand, dγ̃_c(r)/dr ≈ 0 at r = 0, so that y_c is mean square differentiable. This change in smoothness is due to coarsening and corresponds to high correlation between y_c values at locations close together, with the radius of this high-correlation region increasing with increasing η_c/λ_f. Multiscale covariance models must capture this 'S'-shape in order to ensure that the coarse field estimator has the appropriate degree of smoothness due to coarsening.

As an alternative to the block-averaging-based model presented in the previous section, we propose employing multivariate Matérn kernels [Gneiting et al., 2010] to model the block components of the multiscale covariance function. The univariate Matérn kernel is given by

    M(r | ν, D) = (2^{1−ν}/Γ(ν)) (√(2ν) ‖r‖_D)^ν K_ν(√(2ν) ‖r‖_D),    (14)

where K_ν is the modified Bessel function of the second kind, ν > 0 is the shape parameter, D is a d × d SPD matrix, and ‖r‖_D denotes the so-called Mahalanobis distance, given by

    ‖r‖_D = (r^⊤ D^{-1} r)^{1/2}.    (15)

The matrix D allows us to introduce anisotropy into the model. In the isotropic case, D ≡ λ² I_d, where λ denotes the correlation length. The shape parameter ν controls the smoothness of the process. Specifically, a process with Matérn covariance is m times mean square differentiable if and only if ν > m. The Matérn family includes other well known kernels as special cases: for ν = 1/2, the Matérn kernel is equal to the exponential kernel exp(−‖r‖_D) (not differentiable), while for ν → ∞ we recover the Gaussian or squared-exponential kernel exp(−‖r‖²_D/2) (infinitely differentiable).

In this work we propose employing the so-called full bivariate Matérn kernel as the multiscale covariance model, given by

    C(r) = [ σ_c² M(r | ν_c, λ_c² I_d) + σ_nc² 1_{‖r‖=0} ,  ρ σ_c σ_f M(r | ν_cf, λ_cf² I_d) ;
             ρ σ_c σ_f M(r | ν_cf, λ_cf² I_d) ,  σ_f² M(r | ν_f, λ_f² I_d) + σ_nf² 1_{‖r‖=0} ],    (16)

where ν_c, ν_f, and ν_cf are the shape parameters of the coarse, fine, and coarse-fine components, respectively; λ_c, λ_f, and λ_cf are the correlation lengths of the coarse, fine, and coarse-fine components, respectively; σ_c and σ_f are the coarse and fine scale standard deviations, respectively; and ρ is the collocated correlation coefficient. The nugget terms σ_nc² 1_{‖r‖=0} and σ_nf² 1_{‖r‖=0} are added to the block diagonal components to capture Gaussian measurement noise. By including separate standard deviations, shape parameters, and correlation lengths for the coarse and fine scale components, the full bivariate model provides enough flexibility to capture the effects of coarsening discussed in the previous section, namely the changes in standard deviation, correlation length, and degree of smoothness from the fine to the coarse scale.

We proceed to establish conditions on the hyperparameters that ensure that the multiscale covariance model is valid, that is, that the full bivariate Matérn function eq. (16) is SPD. To simplify the full bivariate model, we consider exclusively the case ν_cf = (ν_c + ν_f)/2. Numerical experiments show that this case provides sufficient flexibility for modeling multiscale covariance functions for sufficient scale separation between y_f and y_c, i.e. for η_c/λ_f ≳ 1. For this case, sufficient conditions for validity are [Gneiting et al., 2010]

    a_cf ≥ (1/2)(a_c + a_f),    a_c = √(2ν_c)/λ_c,    a_f = √(2ν_f)/λ_f,    a_cf = √(2ν_cf)/λ_cf,    (17)

and

    |ρ| ≤ (a_c^{ν_c} a_f^{ν_f} / a_cf^{2ν_cf}) Γ(ν_cf) / [Γ^{1/2}(ν_c) Γ^{1/2}(ν_f)].    (18)

The number of hyperparameters to estimate during the model selection stage is 10 (λ_c, λ_f, λ_cf, ν_c, ν_f, σ_c, σ_f, ρ, σ_nc, and σ_nf). In general, we expect that σ_c < σ_f, ν_c > ν_f, λ_c > λ_f, and ρ > 0. These are not validity conditions; rather, they reflect the fact that y_c is a coarsening of y_f that is smoother and has lower variance, and that y_c and y_f are by definition positively correlated.

The advantage of employing the full bivariate Matérn kernel as a multiscale covariance model is that it does not require a priori knowledge of the coarsening relation (e.g., whether the coarsening is linear, the analytic form of h_{η_c}, the value of η_c, etc.). Instead, the shape of the block variogram components is estimated from data and reflected by the shape parameters ν_c, ν_f, and ν_cf. The disadvantages are that the number of free hyperparameters is larger than for the block averaging model, and that model selection requires constrained minimization of the pseudo-likelihood subject to the nonlinear constraints eq. (17) and eq. (18), as opposed to unconstrained minimization.

In this section we illustrate the proposed multiscale simple kriging approach, which combines cokriging with the full bivariate Matérn kernel. We employ the proposed approach to estimate the two-dimensional distribution of hydraulic conductivity K from synthetic multiscale hydraulic conductivity measurements. We model K at the fine scale as a stationary lognormal process, i.e., K = K_G exp(Y), where K_G [L T⁻¹] is the uniform geometric mean, assumed known, and Y, referred to as the normalized log-conductivity, is a zero-mean Gaussian process with isotropic exponential covariance.

We illustrate the model selection process by considering two sets of values of η_f and η_c, corresponding to two degrees of scale separation between y_f and y_c. For each test case, we obtain hyperparameter estimates using both the ML and LOO-CV approaches from multiscale observations. We compare the estimated coarse and fine variograms against the corresponding empirical variograms computed using the R package RandomFields [Schlather et al., 2017, 2015].
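As a concrete illustration, the sketch below evaluates the univariate Matérn kernel eq. (14) with scipy.special.kv and checks the sufficient validity conditions eqs. (17) and (18); the function names are ours, and this is a sketch of the checks involved, not the authors' MATLAB implementation.

```python
import numpy as np
from scipy.special import kv, gamma

def matern(r, nu, lam):
    """Isotropic univariate Matérn kernel, eq. (14), with D = lam^2 I_d."""
    s = np.maximum(np.sqrt(2.0 * nu) * np.abs(r) / lam, 1e-12)  # avoid kv at 0
    return 2.0 ** (1.0 - nu) / gamma(nu) * s ** nu * kv(nu, s)

def valid_bivariate_matern(nu_c, nu_f, lam_c, lam_f, lam_cf, rho):
    """Sufficient validity conditions, eqs. (17)-(18), for nu_cf = (nu_c + nu_f)/2."""
    nu_cf = 0.5 * (nu_c + nu_f)
    a_c, a_f = np.sqrt(2 * nu_c) / lam_c, np.sqrt(2 * nu_f) / lam_f
    a_cf = np.sqrt(2 * nu_cf) / lam_cf
    rho_max = (a_c ** nu_c * a_f ** nu_f / a_cf ** (2 * nu_cf)
               * gamma(nu_cf) / np.sqrt(gamma(nu_c) * gamma(nu_f)))    # eq. (18)
    return (a_cf >= 0.5 * (a_c + a_f)) and (abs(rho) <= rho_max)       # eq. (17)

r = np.linspace(0.0, 1.0, 11)
# nu = 1/2 recovers the exponential kernel exp(-r/lam)
exp_match = np.allclose(matern(r, 0.5, 0.3), np.exp(-r / 0.3))
# a smoother coarse component, rougher fine component, and modest rho pass the check
ok = valid_bivariate_matern(nu_c=2.5, nu_f=0.5, lam_c=0.1, lam_f=0.05,
                            lam_cf=0.08, rho=0.2)
```

The hyperparameter values in the example are arbitrary; in practice they would come from the constrained pseudo-likelihood maximization described above.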
We also compare against the true fine scale variogram and the coarse and coarse-fine pseudo-isotropic variograms.

For this experiment, the estimation domain is Ω = [0, 2L] × [0, L], where L [L] is the domain length scale. A synthetic reference fine scale log-conductivity field is simulated on a 256 × 128 square grid. We take the fine support scale to be equal to the grid resolution Δx, that is, η_f = Δx. The reference coarse scale log-conductivity field is then computed by block averaging with η_c = mΔx, where m is a positive integer. At each scale, sampling locations are randomly chosen from the set of discretization centroids. White noise with variance σ_ε² is added to the observations at both scales. All parameters and the numbers of measurements are listed in table 1.

Constrained minimization of the negative log pseudo-likelihood is performed employing the MATLAB implementation of the interior point algorithm with a BFGS approximation of the Hessian. Correlation length, shape, and standard deviation hyperparameters are log-transformed to change their support from (0, ∞) to (−∞, ∞). Similarly, the collocated correlation coefficient is transformed as ξ = log[(1 + ρ)/(1 − ρ)] to change its support from (−1, 1) to (−∞, ∞). Initial values for the hyperparameters are chosen by inspection of the empirical variograms, and modified if the constrained minimization algorithm fails to converge.

Table 1.
Model selection tests

                Fine                               Coarse
Test    σ_f    λ_f/L    N_f    σ_ε²               m = η_c/η_f    N_c    σ_ε²
1       1      0.05     50     5 × 10⁻…           …              …      …
2       1      0.10     40     5 × 10⁻…           16             120    5 × 10⁻…

Table 2.
True and estimated full bivariate Matérn parameters for Tests 1 and 2 of table 1. True parameters in italics (σ_c and ρ) are computed from eq. (11) and eq. (12).

Test 1     λ_f      σ_f     ν_f     λ_c      σ_c     ν_c     λ_cf     ρ       σ_nf²          σ_nc²
True       0.05     1       0.5     —        …       —       —        …       …              …
LOO-CV     0.0654   1.05    0.980   0.0877   0.806   2.99    0.0801   0.855   8.… × 10⁻…     …

Test 2
True       0.1      1       0.5     —        …       —       —        …       …              …
LOO-CV     0.108    0.891   0.997   0.124    0.84    1.69    0.117    0.968   4.… × 10⁻…     …

Table 2 presents the estimated full bivariate Matérn parameters for both test cases. For Test 1, fig. 2 shows both the reference fine and coarse scale fields and the observation locations. Similarly, fig. 3 presents the estimated, true, and empirical variograms. For this test we have significant scale separation, as shown by the 46% variance decrease from y_f to y_c. From table 2 and fig. 3, we see that both ML and LOO-CV provide accurate estimates of the fine scale and coarse scale standard deviations and the collocated correlation coefficient, with LOO-CV overestimating the standard deviations. Both ML and LOO-CV overestimate λ_f by approximately 30%, which indicates that more fine-scale observations are necessary to more accurately estimate the fine-scale correlation length. More importantly, both ML and LOO-CV correctly identify the difference in smoothness between scales. Even though ML and LOO-CV overestimate the fine scale shape parameter, both correctly estimate that the fine scale field is not mean-square differentiable. Furthermore, both methods estimate a coarse correlation length and shape parameter larger than their fine scale counterparts, in line with the observations in section 3.

We compute the mean and variance of the fine and coarse fields conditioned on multiscale data by employing the Nyström method presented in appendix A. The results are shown in fig. 4. It can be seen in fig. 4c and fig. 4d how multiscale measurements inform the conditioning for both scales.
The measurements at a given scale reduce the conditional variance to the level of the noise, while the measurements at the other scale reduce the variance to a lesser but still noticeable degree.

Figure 2. Synthetic reference fine and coarse fields, and observation locations, for Test 1. Panels: (a) y_f, (b) y_c.

Figure 3. Fine scale and coarse scale variograms, and coarse-fine cross-variograms for Test 1. True and estimated variograms indicated by continuous and dashed lines, respectively. Empirical variograms indicated with crosses. Panels: (a) fine scale, (b) coarse scale, (c) coarse-fine.

Figure 4. Mean and variance of the fine and coarse scale fields, conditioned on multiscale data, for Test 1. Panels: (a) conditional mean μ̂_f, (b) conditional mean μ̂_c, (c) conditional variance (σ̂_f)², (d) conditional variance (σ̂_c)².

For Test 2 we have less differentiation between scales compared to Test 1. Figure 5 shows both the reference fine and coarse scale fields, and fig. 6 shows the estimated, true, and empirical variograms. Additionally, fig. 7 shows the conditional mean and variance. Due to the lesser degree of separation between scales, model selection is more challenging than for Test 1. Nevertheless, as for Test 1, we find that both ML and LOO-CV provide estimates for the full bivariate Matérn model that accurately resolve the structure of the multiscale reference model. We see that both ML and LOO-CV overestimate the shape parameter of the fine scale field, but correctly identify that the coarse field is smoother and has a longer correlation length. As the degree of separation between scales is smaller, we can see in fig. 7c and fig. 7d that measurements on the fine scale more sharply reduce the conditional variance at the coarse scale, and that measurements on the coarse scale more sharply reduce the conditional variance at the fine scale.

Figure 5.
Synthetic reference fine and coarse fields, and observation locations, for Test 2
We now illustrate how the estimated hyperparameters of the multivariate Mat´ernmodel vary for randomly chosen reference fields and observation locations. For this ex-periment we generate 500 draws of normalized fine scale log-conductivity fields from thesame GP and compute for each one the corresponding coarse scale field via block aver- –11–onfidential manuscript submitted to
Water Resources Research r γ / σ f ExactEmpiricalMLLOO-CV (a) Fine scale r γ / σ f ExactEmpiricalMLLOO-CV (b) Coarse scale r γ / σ f ExactMLLOO-CV (c) Coarse-fine
Figure 6.
Fine scale and coarse scale variograms, and coarse-fine cross-variograms for Test 2.True and estimated variograms indicated in continuous and dashed lines, respectively. Empiricalvariograms indicated with crosses. –12–onfidential manuscript submitted to
Water Resources Research (a) Conditional mean ˆ µ f -202 (b) Conditional mean ˆ µ c (c) Conditional variance (ˆ σ f ) (d) Conditional variance (ˆ σ c ) Figure 7.
Mean and variance of the fine and coarse scale fields, conditioned on multiscaledata, for Test 2 aging with given support η c . For each pair of fields we obtain hyperparameter estimatesusing both the ML and LOO-CV approaches from randomly chosen observations. As inTests 1 and 2, observation locations are randomly taken from the set of discretizationcentroids. The reference field parameters are set as λ f = 0 . ν f = 0 . σ f = 1 . η c = 8, and σ (cid:15) = 5 × − . The estimation domain is Ω = [0 , L × [0 , L , and fieldsare computed on a 64 ×
64 square grid. 150 and 50 observations are taken at the fineand coarse scales, respectively.Results for Test 3 are summarized in fig. 8. These results, as those of Tests 1 and2, indicate that the proposed approach is capable of identifying the different degrees ofsmoothness of each scale. We also see that the true parameters fall either inside or closethe interquartile range of estimated hyperparameters with the exception of λ f . This in-dicates that more fine scale observations are generally required to accurately estimatethe fine scale correlation length. We finally remark that fig. 8 indicates that both MLand LOO-CV result in similar estimate variability. Nevertheless, we recommend the useof LOO-CV estimation as it was more robust with respect to initial guess of hyperpa-rameters.To close this section, for Tests 1 and 2, we compare the fine and coarse scale fieldsestimated from multiscale data and estimated from the corresponding single scale datasets.For this purpose we compute the mean square error (MSE) of the predicted log-conductivity,defined for scale a = { c, f } asMSE a ( D ) = 1 L (cid:90) E (cid:104) ( y a ref ( x ) − y a ( x ) | D ) (cid:105) d x (19)The results are summarized in table 3. It can be seen that, for both the fine and coarsescale fields and for both tests 1 and 2, going from single scale data to multiscale dataresults in a reduction in MSE. This reduction is more pronounced for the coarse scalefield (24% for test 1 and 33% for test 2). In fact, we expect fine scale measurements toinform the coarse field estimate more than vice versa due to the indeterminacy of the –13–onfidential manuscript submitted to Water Resources Research
Figure 8. Hyperparameters of the full bivariate model (λ_c, ν_c, σ_c, λ_f, ν_f, σ_f, λ_cf, σ_cf) estimated via ML and LOO-CV from synthetic multiscale data with λ_f = 0., ν_f = 0., σ_f = 1.0, and λ_f/η_c = 0.4, estimated from 500 sets of randomly generated synthetic multiscale data. Boxes indicate the interquartile range, and whiskers extend to the most extreme value that is not more than 1.5 times the interquartile range from the box. Solid lines indicate the median, and dashed lines indicate the true values.
inverse of the coarsening relation. Similarly, the reduction in MSE is more pronounced for Test 2 than for Test 1, which is to be expected, as we observe less scale separation between the fine and coarse fields in Test 2.
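The MSE metric of eq. (19) is straightforward to estimate from conditional realizations by replacing the expectation with a sample average over realizations and the integral with a midpoint rule over the grid cells. A minimal sketch (the function name and array layout below are our own conventions, not those of the released code):

```python
import numpy as np

def mse_field(y_ref, y_samples, cell_area, domain_area):
    """Monte Carlo / midpoint-rule estimate of eq. (19): spatially
    averaged expected squared error of the conditional field y^a | D
    with respect to the reference field.

    y_ref       : (N,)  reference values at cell centroids
    y_samples   : (K, N) conditional realizations of the field
    cell_area   : (N,)  quadrature weight (area) of each cell
    domain_area : scalar, total area |Omega| (= L^2 here)
    """
    # E[(y_ref - y)^2] at each centroid, approximated by the sample mean
    sq_err = np.mean((y_samples - y_ref) ** 2, axis=0)
    # midpoint-rule quadrature of the integral, normalized by |Omega|
    return np.sum(sq_err * cell_area) / domain_area
```

With uniform cells this reduces to the plain spatial average of the per-cell sample mean squared error.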
Table 3. MSE of the predicted log-conductivity with respect to the reference for the tests of table 1, computed using eq. (19).

              Single-scale data                   Multiscale data
        Fine            Coarse              Fine            Coarse
Test    ML     LOO-CV   ML     LOO-CV       ML     LOO-CV   ML     LOO-CV
1       1.150  1.180    0.812  0.859        1.030  1.050    0.472  0.492
2       0.596  0.590    0.643  0.639        0.495  0.489    0.284  0.280
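The multiscale conditioning behind these MSE reductions can be illustrated with a small simple-kriging sketch in one dimension. For brevity we use the parsimonious bivariate Matérn special case (a common correlation length and a cross-smoothness equal to the average of the marginal smoothnesses) rather than the full bivariate model estimated in the paper; all locations and hyperparameter values below are hypothetical:

```python
import numpy as np

def matern(h, lam, nu):
    """Matérn correlation in closed form for nu in {0.5, 1.5, 2.5}."""
    r = np.abs(h) / lam
    if nu == 0.5:
        return np.exp(-r)
    if nu == 1.5:
        return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)
    if nu == 2.5:
        return (1.0 + np.sqrt(5.0) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5.0) * r)
    raise ValueError("no closed form implemented for this nu")

# Hypothetical 1-D observation and prediction locations
xf = np.array([0.1, 0.35, 0.6, 0.9])    # fine-scale observations
xc = np.array([0.2, 0.7])               # coarse-scale observations
xs = np.linspace(0.0, 1.0, 21)          # fine-scale prediction grid

lam, nu_f, nu_c, rho = 0.2, 0.5, 2.5, 0.3
nu_cf = 0.5 * (nu_f + nu_c)             # cross smoothness (= 1.5)

def K(xa, xb, nu):
    return matern(xa[:, None] - xb[None, :], lam, nu)

# Joint covariance of (fine, coarse) observations and cross-covariance to xs
C = np.block([[K(xf, xf, nu_f),         rho * K(xf, xc, nu_cf)],
              [rho * K(xc, xf, nu_cf),  K(xc, xc, nu_c)]])
c_star = np.hstack([K(xs, xf, nu_f), rho * K(xs, xc, nu_cf)])

y_obs = np.concatenate([np.sin(2 * np.pi * xf), np.sin(2 * np.pi * xc)])
Cn = C + 1e-8 * np.eye(len(y_obs))       # small nugget for stability
y_hat = c_star @ np.linalg.solve(Cn, y_obs)                     # conditional mean
var = 1.0 - np.sum(c_star * np.linalg.solve(Cn, c_star.T).T, axis=1)
```

The fine-scale prediction draws on both the fine and the coarse observations through the cross-covariance blocks, which is what produces the MSE reduction reported in table 3 relative to single-scale kriging.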
We present an application of multivariate GP regression to uncertainty propagation in physical systems. We consider Darcy flow in heterogeneous porous media, where we are interested in predicting spatial hydraulic pressure (h) profiles from multiscale observations of h and of the hydraulic conductivity K, and in estimating the uncertainty in these predictions. We assume hydraulic head observations have a fine-scale support volume corresponding to piezometric observations, whereas hydraulic conductivity observations have fine-scale and coarse-scale support volumes corresponding to laboratory experiments performed on field samples and to cross-pumping tests, respectively.

In order to employ the Darcy flow model to propagate uncertainty in the spatial hydraulic conductivity distribution, both the hydraulic pressure and conductivity fields must be defined with the same support volume. As the h observations are fine-scale, we must construct a probabilistic model for the fine-scale hydraulic conductivity field K^f from multiscale observations of K. We assume there is a data set D_s of multiscale observations of the log hydraulic conductivity Y = log K; in particular, we use the data set of Test 1 in section 4.1.

The Darcy flow problem reads

$$\nabla \cdot \left[ K^f(\mathbf{x}) \nabla h^f(\mathbf{x}) \right] = 0, \quad \mathbf{x} \equiv (x_1, x_2)^\top \in \Omega \ (\equiv [0, 2L] \times [0, L]), \qquad (20)$$

$$h^f(0, x_2) = h^f_L, \quad h^f(2L, x_2) = h^f_R, \qquad (21)$$

$$\frac{\partial}{\partial x_2} h^f(x_1, 0) = \frac{\partial}{\partial x_2} h^f(x_1, L) = 0, \qquad (22)$$

where h^f(x) is the fine-scale hydraulic pressure field. The field h^f(x) is discretized into the vector h^f = (h^f(x_1), ..., h^f(x_{N_h}))^⊤, where {x_i}_{i=1}^{N_h} is a set of discrete nodes. Our goal is to estimate h^f(x) given discrete observations of h^f and the multiscale data set D_s of observations of Y. We assume that observations h^f_s of the discrete hydraulic head field are taken as h^f_s = H h^f + ε_h, where H ∈ R^{N_hs × N_h} is a sampling matrix with N_hs ≪ N_h, and ε_h ∼ N(0, σ²_εh I) is the normally distributed observation error. For this experiment, we take N_hs = 20 head observations at locations randomly chosen from the set of discretization centroids. Furthermore, we assume an observation error standard deviation σ_εh = 5 × 10⁻ (h^f_L − h^f_R).

We model the discrete log hydraulic conductivity y as a GP. Given the uncertainty in y and the observations h^f_s, the minimum mean square error estimator of the hydraulic head field, ĥ (predictive mean), and its covariance Ĉ_h (predictive covariance), are given by

$$\hat{\mathbf{h}} = \bar{\mathbf{h}} + C_h H^\top \left( H C_h H^\top + \sigma^2_{\epsilon h} I \right)^{-1} \left( \mathbf{h}^f_s - H \bar{\mathbf{h}} \right), \qquad (23)$$

$$\hat{C}_h = C_h - C_h H^\top \left( H C_h H^\top + \sigma^2_{\epsilon h} I \right)^{-1} H C_h, \qquad (24)$$

where h̄ and C_h are the prior mean and covariance of h^f given the uncertainty in Y. We estimate h̄ and C_h from 1 × Monte Carlo (MC) simulations of the GP y^f | D_s and the problem eqs. (20) to (22). Results for the profile x_2 = 0. L are presented in fig. 9 and table 4.

Fig. 9a shows the mean and variance along the profile x_2 = 0. L. It can be seen that the variance of h^f due to uncertainty in Y is narrow because of the conditioning of y^f on multiscale data. Table 4 presents the L₂-norm of the prior variance along the profile x_2 = 0.
L obtained when y^f is conditioned on fine-scale data only and on multiscale data. Consistent with our observations in section 4.1, conditioning on multiscale data results in a reduction in uncertainty. Given h̄ and C_h, we apply eqs. (23) and (24) to obtain the predictive mean ĥ and covariance Ĉ_h conditioned on the h^f observations. Fig. 9b shows the mean and variance of the estimated hydraulic head along the profile x_2 = 0. L, together with the reference profile. Good agreement of the predictive mean ĥ with the reference profile is observed, with the reference profile falling inside the 95% confidence interval indicated by the predictive variance. Table 4 shows the L₂-norm of the conditioned variance when the GP model for y^f is conditioned on fine-scale data and on multiscale data. It is again seen that conditioning on multiscale data reduces the uncertainty in the state.

Figure 9. Estimated normalized profile h^f(x_1, x_2 = 0. L) computed via minimum mean square estimation. (a) Mean and 95% confidence interval of the Gaussian prior computed from 1 × realizations of the fine-scale log-conductivity field of Test 1 conditioned on multiscale data. (b) Predictive mean (red) and 95% confidence interval, computed using 20 head observations, and reference (black).

We have presented a multiscale simple kriging approach for parameter field reconstruction that incorporates measurements with two different support volumes (fine and
Table 4. L₂-norm of the variance of the normalized profile h^f(x_1, x_2 = 0. L), prior and after conditioning on h^f observations. The GP model for y^f is trained first on fine-scale data only, and then on multiscale data.

h^f(x_1, x_2 = 0. L) variance    Fine-scale data only    Multiscale data
Prior                             1. × 10⁻                . × 10⁻
Conditioned                       2. × 10⁻                . × 10⁻

coarse scales). Our results show that treating each support scale as a component of a bivariate Gaussian process with a full bivariate Matérn covariance function is a viable strategy for modeling multiscale observations. We find that the full bivariate Matérn model with hyperparameters estimated from data accurately captures the stationary features of the reference fields for scenarios exhibiting moderate to large separation between the fine and coarse scales. In particular, we show that the model selection approaches employed in this work, marginal likelihood maximization and leave-one-out cross-validation, both qualitatively identify the difference in smoothness and correlation lengths between scales. Nevertheless, we recommend the use of LOO-CV estimation, as it was more robust with respect to the initial guess of hyperparameters.

Furthermore, we show that conditioning on multiscale data can be used to reduce uncertainty in the stochastic modeling of physical systems with heterogeneous model parameters sampled over multiple scales.

Future work will extend the proposed approach to more than two spatial scales by employing valid multivariate Matérn models [Apanasovich et al., 2012].
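For concreteness, the linear-Gaussian update of eqs. (23) and (24) can be sketched as follows; the prior ensemble, node count, observation indices, and noise level below are hypothetical stand-ins for the Monte Carlo prior described in section 5:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; a synthetic random-walk ensemble stands in for the
# Monte Carlo head simulations of eqs. (20)-(22)
N_h, N_obs = 50, 5
ens = 0.1 * rng.standard_normal((200, N_h)).cumsum(axis=1)
h_bar = ens.mean(axis=0)                 # prior mean h_bar
C_h = np.cov(ens, rowvar=False)          # prior covariance C_h

# Sampling matrix H picks N_obs of the N_h nodes
idx = np.array([5, 15, 25, 35, 45])
H = np.zeros((N_obs, N_h))
H[np.arange(N_obs), idx] = 1.0

sigma2 = 1e-4                            # observation error variance
h_obs = H @ h_bar + 0.01 * rng.standard_normal(N_obs)   # synthetic data

S = H @ C_h @ H.T + sigma2 * np.eye(N_obs)
G = np.linalg.solve(S, H @ C_h)          # S^{-1} H C_h (S, C_h symmetric)
h_hat = h_bar + G.T @ (h_obs - H @ h_bar)   # eq. (23), predictive mean
C_hat = C_h - (H @ C_h).T @ G               # eq. (24), predictive covariance
```

The update shrinks the predictive variance everywhere the prior covariance couples unobserved nodes to the observed ones, which is the mechanism behind the variance reduction reported in table 4.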
A: Nyström Method for Simulating GPs
In this section we present the Nyström method employed in this work for simulating GPs. Nyström methods approximate the factorization of a large covariance matrix in terms of the factorization of a smaller covariance matrix [Halko et al., 2011].

Let y(x) ∼ GP(0, C(x, x′)). Our goal is to simulate realizations y^(k)_* of y at the N nodes X_*. Simulating y via factorization (e.g., Cholesky, eigendecomposition, etc.) is computationally prohibitive for large N, as factorizations require O(N³) operations and O(N²) storage. Therefore, we aim to compute a low-rank approximation to the covariance matrix of the form C(X_*, X_*) ≈ A B⁻¹ A^⊤, where B is a rank-M SPD matrix with M ≪ N.

By Mercer's theorem, the covariance kernel C(x, x′) admits a representation in terms of its eigenpairs {λ_k, ψ_k}_{k=1}^∞,

$$C(\mathbf{x}, \mathbf{x}') = \sum_{k=1}^{\infty} \lambda_k \psi_k(\mathbf{x}) \psi_k(\mathbf{x}'), \qquad (A.1)$$

where the eigenpairs satisfy the relation

$$\lambda_i \psi_i(\mathbf{x}) = \int_\Omega C(\mathbf{x}, \mathbf{x}') \psi_i(\mathbf{x}') \, \mathrm{d}\mathbf{x}' \qquad (A.2)$$

and the orthogonality relation ∫_Ω ψ_i(x) ψ_j(x) dx = δ_ij. We approximate the integral in eq. (A.2) using an M-node quadrature rule over Ω with nodes X̃ = (x̃_1, ..., x̃_M) and uniform weight w (e.g., midpoint rule, MC sampling, etc.), obtaining

$$\lambda_i \psi_i(\mathbf{x}) \approx w \sum_{k=1}^{M} C(\mathbf{x}, \tilde{\mathbf{x}}_k) \psi_i(\tilde{\mathbf{x}}_k). \qquad (A.3)$$

Furthermore, let C̃ ≡ C(X̃, X̃) denote the covariance matrix of the quadrature nodes, with eigendecomposition C̃ = Ũ Λ̃ Ũ^⊤, Λ̃ = diag(λ̃_1, ..., λ̃_M). Approximating ψ_i(x̃_k) and λ_i in terms of Ũ and Λ̃, we obtain

$$\lambda_i \approx w \tilde{\lambda}_i, \quad \psi_i(\mathbf{x}) \approx \frac{w^{-1/2}}{\tilde{\lambda}_i} \sum_{k=1}^{M} C(\mathbf{x}, \tilde{\mathbf{x}}_k) \tilde{U}_{ki}, \quad i = 1, \ldots, M. \qquad (A.4)$$

The relations in eq. (A.4) can be employed to construct an approximate Karhunen-Loève expansion, which can then be used to simulate the field. In this work we follow an alternative procedure, employed in section 4.2 for simulation and in section 4.1 to compute conditional means and variances. Substituting eq. (A.4) into eq. (A.1), we obtain the following rank-M approximation to the covariance matrix C(X_*, X_*):

$$C(Y, Z) \approx C(Y, \tilde{X}) \left( \tilde{U} \tilde{\Lambda}^{-1} \tilde{U}^\top \right) C(\tilde{X}, Z) = C(Y, \tilde{X}) \, \tilde{C}^{-1} C(\tilde{X}, Z), \qquad (A.5)$$

where Y and Z are two arbitrary sets of nodes.

For prediction, we employ eq. (A.5) to compute C_cp (and C_fp) in eq. (7). For simulation, we note that the covariance matrix C̃ admits the Cholesky factorization C̃ = L̃ L̃^⊤. Substituting this factorization into eq. (A.5) for Y = Z = X_*, we obtain the rectangular factorization

$$C(X_*, X_*) \approx L L^\top, \quad L^\top = \tilde{L}^{-1} C(\tilde{X}, X_*). \qquad (A.6)$$

The realizations are simulated as y^(k)_* = L ξ^(k), where the ξ^(k) are realizations of ξ ∼ N(0, I_M). This procedure requires O(N M²) operations and O(M N) storage.

We employed the centroids of a 128 × 64 (M = 8192) uniform discretization of Ω as quadrature nodes for computing the conditional means and variances in section 4.1 and for the MC simulations presented in section 4.2.

Acknowledgments
All code employed to generate the data and results presented in this manuscript can be found at https://doi.org/10.5281/zenodo.1205644. This research at Pacific Northwest National Laboratory (PNNL) was supported by the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR) as part of the project "Uncertainty quantification for complex systems described by Stochastic Partial Differential Equations." PNNL is operated by Battelle for the DOE under Contract DE-AC05-76RL01830.
References
Apanasovich, T. V., M. G. Genton, and Y. Sun (2012), A valid Matérn class of cross-covariance functions for multivariate random fields with any number of components, Journal of the American Statistical Association, (497), 180–193, doi:10.1080/01621459.2011.643197.

Cressie, N. A. C. (2015), Geostatistics, pp. 27–104, John Wiley & Sons, Inc., doi:10.1002/9781119115151.ch2.

Genton, M. G., and W. Kleiber (2015), Cross-covariance functions for multivariate geostatistics, Statistical Science, (2), 147–163, doi:10.1214/14-STS487.

Gneiting, T., W. Kleiber, and M. Schlather (2010), Matérn cross-covariance functions for multivariate random fields, Journal of the American Statistical Association, (491), 1167–1177, doi:10.1198/jasa.2010.tm09420.

Halko, N., P. G. Martinsson, and J. A. Tropp (2011), Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, (2), 217–288, doi:10.1137/090771806.

Li, C., Z. Lu, T. Ma, and X. Zhu (2009), A simple kriging method incorporating multiscale measurements in geochemical survey, Journal of Geochemical Exploration, (2), 147–154, doi:10.1016/j.gexplo.2008.06.003.

Rasmussen, C. E., and C. K. I. Williams (2005), Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), The MIT Press.

Schlather, M., A. Malinowski, P. J. Menck, M. Oesting, and K. Strokorb (2015), Analysis, simulation and prediction of multivariate random fields with package RandomFields, Journal of Statistical Software, (8), 1–25.

Schlather, M., A. Malinowski, M. Oesting, D. Boecker, K. Strokorb, S. Engelke, J. Martini, F. Ballani, O. Moreva, J. Auel, P. J. Menck, S. Gross, U. Ober, C. Berreth, K. Burmeister, J. Manitz, P. Ribeiro, R. Singleton, B. Pfaff, and R Core Team (2017), RandomFields: Simulation and Analysis of Random Fields, R package version 3.1.50.

Stein, M. L. (1999), Interpolation of Spatial Data: Some Theory for Kriging, Springer Series in Statistics, Springer-Verlag, New York.

Tartakovsky, A. M., M. Panzeri, G. D. Tartakovsky, and A. Guadagnini (2017), Uncertainty quantification in scale-dependent models of flow in porous media, Water Resources Research.

Vanmarcke, E. (2010), Random Fields: Analysis and Synthesis, World Scientific.