Convergence of Probability Densities using Approximate Models for Forward and Inverse Problems in Uncertainty Quantification
T. Butler∗, J. D. Jakeman†, T. Wildey†

July 3, 2018
Abstract
We analyze the convergence of probability density functions utilizing approximate models for both forward and inverse problems. We consider the standard forward uncertainty quantification problem where an assumed probability density on parameters is propagated through the approximate model to produce a probability density, often called a push-forward probability density, on a set of quantities of interest (QoI). The inverse problem considered in this paper seeks a posterior probability density on model input parameters such that the subsequent push-forward density through the parameter-to-QoI map matches a given probability density on the QoI. We prove that the probability densities obtained from solving the forward and inverse problems, using approximate models, converge to the true probability densities as the approximate models converge to the true models. Numerical results are presented to demonstrate optimal convergence of probability densities for sparse grid approximations of parameter-to-QoI maps and standard spatial and temporal discretizations of PDEs and ODEs.
Keywords. inverse problems, uncertainty quantification, density estimation, surrogate modeling, response surface approximations, discretization errors
Assessing modeling uncertainties is essential for credible simulation-based prediction and design. Forward uncertainty quantification (UQ) problems involve estimating uncertainty in model outputs caused by uncertain inputs. Inverse UQ problems involve using (noisy) data associated with (a subset of) model outputs to update prior information on model inputs. Practical UQ studies often require the solution of both an inverse and a forward problem. Unfortunately, evaluating high-fidelity models is often computationally demanding, and methods for solving either forward or inverse UQ problems typically require generating large ensembles of simulation runs evaluated at varying realizations of the random variables used to characterize the model input uncertainty. UQ analyses are also complicated by the simple fact that many of the governing equations used to model physical systems can rarely be solved analytically and so instead must be solved approximately.

In this paper, we investigate how using approximate models affects the probability densities that solve forward and inverse UQ problems. The theory we develop is general. We demonstrate its utility using common forms of approximations, namely, temporal and spatial discretization for the numerical solution of differential equations, and sparse grid surrogate models.

The convergence of certain statistical quantities (e.g., mean and variance) is well-studied for many popular choices of surrogate approximations including generalized polynomial chaos expansions (PCE) [18, 41], sparse grid interpolation [2, 28], and Gaussian process models [33]. However, little attention in the literature is given to the impact of surrogate approximations on probability density functions. For example, PCE approximations for random variables with finite second moments exhibit mean-square convergence [18], and thus a sequence of PCE converges in both probability and distribution. But while Scheffé's theorem states that almost everywhere (a.e.) convergence of probability density functions implies convergence in distribution, the converse is generally not true. For a classical counterexample to the converse, consider the sequence of
random variables (X_n) with densities 1 − cos(2πnx) for x ∈ [0, 1]: the (X_n) converge in distribution to a uniform random variable on [0, 1], but the densities oscillate and fail to converge at almost every point.

[Disclosure: See acknowledgements. Corresponding author: T. Wildey ([email protected]). ∗ CUB. † Optimization and Uncertainty Quantification Department, Sandia National Labs, Albuquerque, NM, USA.]

(Forward Problem) Given a probability density describing the uncertain model inputs, the forward problem seeks to determine the push-forward probability density obtained by propagating the input density through the parameter-to-QoI map.

(Inverse Problem)
Given an observed probability density, the inverse problem seeks a pullback probability density for the model inputs that, when propagated through the parameter-to-QoI map, produces a push-forward density that exactly matches the observed density.
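Before turning to these problems, the introduction's counterexample to the converse of Scheffé's theorem is easy to verify numerically. The sketch below uses the densities 1 − cos(2πnx) on [0, 1]: the CDFs tend to the uniform CDF while the densities remain at sup-distance 1 from the uniform density.

```python
import numpy as np

# Numerical check of the counterexample from the introduction: densities
# f_n(x) = 1 - cos(2*pi*n*x) on [0, 1]. The CDFs converge to the uniform
# CDF (convergence in distribution), but the densities do not converge.
x = np.linspace(0.0, 1.0, 2001)

cdf_errs, pdf_errs = [], []
for n in (1, 10, 100):
    pdf = 1.0 - np.cos(2.0 * np.pi * n * x)
    cdf = x - np.sin(2.0 * np.pi * n * x) / (2.0 * np.pi * n)
    cdf_errs.append(float(np.max(np.abs(cdf - x))))    # -> 0 like 1/(2*pi*n)
    pdf_errs.append(float(np.max(np.abs(pdf - 1.0))))  # stays at exactly 1
print(cdf_errs, pdf_errs)
```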
We prove convergence results for both the forward and inverse problems in the total variation metric (i.e., the so-called "statistical distance" metric). Specifically, we show that, under suitable conditions, sequences of approximate push-forward or pullback densities obtained using approximate models converge at a rate proportional to the rate of convergence of the approximate models to the true model. To our knowledge, this analysis is the first of its kind and exploits a special form of the converse of Scheffé's theorem first proven in [5] and subsequently generalized in [37]. Under more restrictive conditions, namely those necessary for convergence of a standard kernel density estimator, we prove that the rates of convergence are bounded by the error in the kernel density approximation and the L∞-error in the approximate model.

Complementary work on the convergence of classical Bayesian inverse problems using PCE is studied in [29]. In that work, the Kullback-Leibler divergence (KLD) is used to measure the difference between the true and approximate posterior densities (using standard assumptions in classical Bayesian analysis). Convergence of the densities in the sense of the KLD converging to zero is proven. Furthermore, the convergence of the KLD shown in [29] does imply convergence to the classical Bayesian posterior in the total variation metric by application of Pinsker's inequality. However, the analysis provided in [29] does not generalize to either the forward or inverse problems studied in this work. Specifically, we do not restrict approximate models to be defined by PCE surrogates, and, as shown in [11], the classical Bayesian posterior is not designed to give a pullback measure.
Moreover, we allow the observed densities used to define our posteriors (i.e., the pullback measures) to be of a more general class than the Gaussian distributions assumed for the error models in [29]. Additionally, the posteriors we obtain are not simply normalized by a constant as with classical Bayesian posteriors. Subsequently, key inequalities such as (4.11) in [29], used to prove the fundamental lemmas for the convergence of the KLD for a classical Bayesian posterior, simply do not apply to our pullback densities.

The remainder of the paper is organized as follows. In Section 2 we provide a formal definition of the forward problem and discuss the theoretical aspects of its solution using approximate models. We then introduce the inverse problem and prove that the posterior density corresponding to an approximate model converges to the true posterior. In Section 4, we review some important classes of approximate models and use our general theoretical results to provide specific error bounds for the classes of approximate models we consider. For each of these applications, we provide numerical results to complement our theoretical results and highlight important aspects of the forward and inverse problems. We provide concluding remarks in Section 5.

In this section we consider the solution of the forward problem using approximate models. We use the term approximate models in a broad sense to mean any sequence of approximations to the response of the model outputs. Specifically, for a given model, let Λ ⊂ R^k denote a space of inputs to the model that we refer to simply as parameters. Given a set of quantities of interest (QoI), we define the parameter-to-QoI map Q(λ) : Λ → D ⊂ R^m. The range of the QoI map, D := Q(Λ), describes the space of observable data for the QoI that can be predicted by the model. We let (Q_n) denote a sequence of approximate parameter-to-QoI maps defined by the approximate models.
To facilitate the analysis of solutions to the forward problem presented in the introduction, we first formalize the forward problem definition. Let (Λ, B_Λ, μ_Λ) and (D, B_D, μ_D) denote measure spaces with B_Λ and B_D the Borel σ-algebras inherited from the metric topologies on Λ ⊂ R^k and D ⊂ R^m, respectively. The measures μ_Λ and μ_D are the dominating measures with respect to which probability densities (i.e., Radon-Nikodym derivatives of probability measures) are defined on each space.

Definition 2.1 (Forward Problem and Push-Forward Measure). Given a probability measure P_Λ on (Λ, B_Λ) that is absolutely continuous with respect to μ_Λ and admits a density π_Λ, the forward problem is the determination of the push-forward probability measure

    P^Q_D(A) = P_Λ(Q^{-1}(A)), ∀ A ∈ B_D,

on (D, B_D) that is absolutely continuous with respect to μ_D and admits a density π^Q_D.

Here, and in the remainder of the paper, we assume that the parameter-to-QoI map, Q, is a measurable and piecewise smooth map between (Λ, B_Λ) and (D, B_D) so that

    Q^{-1}(A) = {λ ∈ Λ | Q(λ) ∈ A} ∈ B_Λ, and Q(Q^{-1}(A)) = A.

If P_Λ is described in terms of a density π_Λ with respect to μ_Λ (i.e., π_Λ = dP_Λ/dμ_Λ is the Radon-Nikodym derivative of P_Λ), it is not necessarily the case that P^Q_D is absolutely continuous with respect to the Lebesgue measure on D. Following [11], we assume that either the measure μ_D on (D, B_D) is defined as the push-forward of μ_Λ, or the push-forward of μ_Λ is absolutely continuous with respect to a specified μ_D.

In practice, even if an exact parameter-to-QoI map Q(λ) is available, we will often approximate the push-forward of π_Λ using finite sampling and standard density estimation techniques.
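This sampling-plus-density-estimation step is straightforward in practice. The following minimal sketch uses an entirely hypothetical map Q(λ) = λ₁² + λ₂ and a uniform input density on [−1, 1]² (neither taken from the paper), with SciPy's Gaussian KDE standing in for a generic kernel density estimator:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def Q(lam):
    """Hypothetical parameter-to-QoI map Q: Lambda -> D (not from the paper)."""
    return lam[:, 0]**2 + lam[:, 1]

# Sample the input density pi_Lambda (uniform on [-1, 1]^2) and push forward.
M = 5000
lam = rng.uniform(-1.0, 1.0, size=(M, 2))
q = Q(lam)

# Estimate the push-forward density pi_D^Q with a Gaussian KDE.
pf_kde = gaussian_kde(q)
print(pf_kde(np.array([0.0, 0.5])))  # density estimates at q = 0 and q = 0.5
```

The KDE is then a cheap, everywhere-evaluable stand-in for the push-forward density.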
We formalize, in Assumption 1 below, the types of parameter densities π_Λ for which we may reasonably expect to obtain accurate approximations of π^Q_D using Monte Carlo sampling and standard density estimation techniques.

Assumption 1.
For a given Q(λ), π_Λ is chosen so that sup_{q∈D} π^Q_D(q) ≤ B for some B > 0, and π^Q_D is continuous on D except possibly on a set A ⊂ D of zero μ_D-measure.

Generally, for any finite set of samples from any distribution, {q_i}_{i=1}^M, a standard kernel density estimate of a density π^Q_D(q) will produce a bounded approximation that is continuous everywhere and has the form

    π̂^Q_D(q) = (1 / (M h_M^m)) Σ_{i=1}^M K((q − q_i)/h_M),   (2.1)

where h_M is the bandwidth parameter and K(q) is the kernel function. It is common to assume that the kernel is integrable with ∫_D K(q) dμ_D = 1 and is bounded, i.e., there exists a constant κ such that ‖K(q)‖_{L∞(D)} ≤ κ < ∞.

The accuracy of the estimated push-forward density depends on the number of samples M and the dimension m of the space. The following result from [20] gives a rate of convergence in the L∞-norm under certain assumptions on the regularity of the density and the kernel.

Theorem 2.1. If π^Q_D and the s-th order derivatives of π^Q_D are uniformly continuous, K(q) is an s-th order kernel that is bounded and integrable, and h_M satisfies the criteria described in [20], then the error in the kernel density estimate given by (2.1) satisfies

    ‖π^Q_D(q) − π̂^Q_D(q)‖_{L∞(D)} ≤ C (log M / M)^{s/(2s+m)}.

The Gaussian kernel is a popular choice for which s = 2, yielding an O(M^{−2/(4+m)}) rate of convergence in the L∞-norm if one ignores the log factor. The rate of convergence of the KDE using the Gaussian kernel can also be shown to be O(M^{−4/(4+m)}) in the mean-squared error [38] and O(M^{−2/(4+m)}) in the L¹-error [14] under similar assumptions on the kernel, the bandwidth parameter, and the regularity of π^Q_D. Since the rate of convergence is rather slow in M and scales poorly with dimension, KDEs often require a large number of samples to achieve an acceptable level of accuracy, motivating the use of (computationally inexpensive) approximate models.

Let (Q_n(λ)) denote a sequence of approximations to Q(λ). The use of approximate QoI maps introduces an additional error in the estimates of push-forward densities. Two practical requirements are needed to approximate the push-forward densities using any particular Q_n(λ). The first requirement is that the approximate push-forward densities are uniformly bounded if the exact push-forward density is bounded, so that point-wise errors are not allowed to become arbitrarily large, in which case we would not expect convergence at all. The second requirement puts constraints on the continuity of the approximate push-forward density. To formalize the second requirement, we use a generalized notion of equicontinuity to consider functions that may have many points of discontinuity, such as density functions that are only continuous in an a.e. sense.

Definition 2.2.
Using similar notation from [37], we say that a sequence of real-valued functions (u_n) defined on R^k is asymptotically equicontinuous (a.e.c.) at x ∈ R^k if

    ∀ ε > 0, ∃ δ(x, ε) > 0 and n₀(x, ε) s.t. |y − x| < δ(x, ε), n > n₀(x, ε) ⇒ |u_n(y) − u_n(x)| < ε.

If δ(x, ε) = δ(ε) and n₀(x, ε) = n₀(ε), then we say that the sequence is asymptotically uniformly equicontinuous (a.u.e.c.).

Using this definition, and letting π^{Q_n}_D denote the push-forward of the prior density using the map Q_n(λ), we make the following assumption to encode our two practical requirements.

Assumption 2.
Let (Q_n(λ)) denote a sequence of approximations to Q(λ). Then there exists B > 0 such that for any n, sup_{q∈D} π^{Q_n}_D(q) ≤ B. Moreover, for any δ > 0, there exists N_δ ⊂ D such that A ⊂ N_δ, μ_D(N_δ) < δ, and the sequence of approximate push-forward densities is a.u.e.c. on D \ N_δ.

This assumption allows for the construction of an approximate push-forward density that is discontinuous more often than the exact push-forward density, as long as the magnitude of the discontinuities decreases asymptotically, except possibly in a set that can be made arbitrarily small in μ_D-measure and that contains the discontinuities of the exact π^Q_D. Under Assumptions 1 and 2, we can construct push-forward densities using approximate models that converge as the approximate model is refined.

Theorem 2.2 (Convergence of Push-Forward Densities). Let (Q_n(λ)) denote a sequence of approximations to Q(λ) such that Q_n(λ) → Q(λ) in L∞(Λ) as n → ∞, i.e.,

    ∀ δ > 0, ∃ N s.t. n > N ⇒ ‖Q_n(λ) − Q(λ)‖_{L∞(Λ)} < δ.   (2.2)

If Assumptions 1 and 2 hold, then for any ε > 0 there exists N such that n > N implies that both

    ‖π^Q_D(q) − π^{Q_n}_D(q)‖_{L∞(D)} < ε,   (2.3)

and

    ‖π^Q_D(Q(λ)) − π^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} < ε.   (2.4)

In Theorem 2.2, (2.3) implies that the approximate push-forward densities converge in L∞(D), i.e., the densities associated with the forward propagation of densities converge on D when evaluated using exact values of the QoI q.
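The setting of Theorem 2.2 can be illustrated numerically. In the sketch below (hypothetical choices, not from the paper: a uniform input density on [0, 1], an exact map Q(λ) = λ + λ²/4, and perturbed approximations Q_n = Q + sin(nλ)/n so that ‖Q_n − Q‖_{L∞} ≤ 1/n), KDE estimates of the exact and approximate push-forward densities are compared on a grid:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Hypothetical setup (not from the paper): uniform density on Lambda = [0, 1],
# exact map Q(x) = x + x^2/4, approximate maps Q_n with ||Q_n - Q||_inf <= 1/n.
lam = rng.uniform(0.0, 1.0, size=20000)
Q = lambda x: x + 0.25 * x**2
Q_n = lambda x, n: Q(x) + np.sin(n * x) / n

grid = np.linspace(0.1, 1.1, 200)
pf_exact = gaussian_kde(Q(lam))(grid)  # KDE of the exact push-forward density

errs = []
for n in (2, 8, 32):
    pf_approx = gaussian_kde(Q_n(lam, n))(grid)
    errs.append(float(np.max(np.abs(pf_exact - pf_approx))))
print(errs)  # sup-differences shrink as the model error ~1/n shrinks
```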
However, in practice, we evaluate the approximate push-forward density at an approximate QoI value to determine variations in relative likelihoods of the QoI data as parameters are varied. Equation (2.4) states that the approximate push-forward densities evaluated at approximate values of the QoI converge in L∞(Λ).¹

¹Using this definition of equicontinuity, sequences of functions that are either equicontinuous or uniformly equicontinuous in the classical sense are automatically a.e.c. or a.u.e.c., since the definitions coincide if this definition is restricted to sequences of continuous functions.

Before proving Theorem 2.2, we first provide some context for the approach. Certainly, for any p ≥ 1, convergence in L^p implies convergence in probability, which in turn implies convergence in distribution (i.e., weak convergence), so we have that the sequence of push-forward measures associated with Q_n(λ) converges weakly to the push-forward measure of Q(λ). While Scheffé's theorem states that a.e. convergence of densities implies convergence in distribution of the random variables, the converse is generally not true, as mentioned in the introduction.

In [5], a converse to Scheffé's theorem is proven under the conditions that the densities associated with weakly convergent distributions are point-wise bounded and uniformly equicontinuous, from which the classical Arzelà-Ascoli theorem implies uniform convergence of the densities. Subsequently, in [37], this converse to Scheffé's theorem was generalized to classes of densities that are a.u.e.c. for sequences of distributions converging weakly to a distribution with a continuous density. Therefore, in the proof below, we begin by isolating discontinuities in the exact and approximate push-forward densities using Assumptions 1 and 2 in order to apply this converse to Scheffé's theorem on "most" of D.

Proof.
Let ε > 0 and choose δ = ε/(2(B + B)), where B denotes the bounds in Assumptions 1 and 2. Let N_δ denote the associated set with μ_D(N_δ) < δ from Assumption 2. Then,

    ‖π^Q_D(q) − π^{Q_n}_D(q)‖_{L∞(D)} ≤ ‖π^Q_D(q) − π^{Q_n}_D(q)‖_{L∞(N_δ)} + ‖π^Q_D(q) − π^{Q_n}_D(q)‖_{L∞(D\N_δ)}.   (2.5)

By the choice of δ, the first term on the right-hand side of (2.5) is bounded by ε/2. By Theorem 1 in [37], π^{Q_n}_D → π^Q_D uniformly on D \ N_δ. Thus, the second term on the right-hand side of (2.5) can also be bounded by ε/2 for n sufficiently large, which proves (2.3).

To prove (2.4), we first apply a triangle inequality to get

    ‖π^Q_D(Q(λ)) − π^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} ≤ ‖π^Q_D(Q(λ)) − π^Q_D(Q_n(λ))‖_{L∞(Λ)} + ‖π^Q_D(Q_n(λ)) − π^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)}.   (2.6)

By (2.2) and Assumption 1, there exist δ > 0 and N such that n > N implies the first term on the right-hand side of (2.6) is bounded by ε/2. Note that the norm for the second term on the right-hand side of (2.6) is equivalent to the L∞(D) norm since the arguments of the densities are identical. Then, by the above argument, this can be bounded by ε/2 for n sufficiently large, which proves (2.4). □

The next lemma states that the KDE approximation using the sequence of approximate models converges to the KDE approximation using the true model. The proof of Lemma 2.1 is straightforward and is omitted for the sake of brevity.
Lemma 2.1.
Assume that a set of M samples is used to generate KDE approximations using the true model and the approximate model, giving π̂^Q_D and π̂^{Q_n}_D, respectively. If K(q) is Lipschitz continuous, then we have the following bounds on the error in the KDE approximation using the approximate model:

    ‖π̂^Q_D(q) − π̂^{Q_n}_D(q)‖_{L∞(D)} ≤ C ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)},   (2.7)

and

    ‖π̂^Q_D(Q(λ)) − π̂^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} ≤ C ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)}.   (2.8)

It is important to note that Lemma 2.1 does not require the same assumptions as Theorem 2.1, since it only shows that, for a given set of samples, the KDE approximation using the approximate model converges to the KDE approximation using the true model. This is true even if the KDE approximation using the true model does not converge to the true density.
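The bound (2.7) can be sanity-checked numerically. In the sketch below (hypothetical 1D samples, with a perturbation of size exactly 1/n standing in for the model error, and a fixed-bandwidth Gaussian kernel so that the Lipschitz assumption on K holds), the KDE-to-KDE sup-difference scales with the perturbation size:

```python
import numpy as np

rng = np.random.default_rng(2)

def kde(points, centers, h):
    """Gaussian KDE with a fixed bandwidth h (1D), evaluated at `points`."""
    z = (points[:, None] - centers[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

# Hypothetical samples (not from the paper); a shift of size 1/n plays the
# role of the model error ||Q - Q_n||_inf.
q_exact = rng.uniform(0.0, 1.0, size=2000)
grid = np.linspace(-0.2, 1.2, 300)
h = 0.05  # fixed bandwidth: the same Lipschitz kernel for both estimates

diffs = []
for n in (4, 16, 64):
    q_approx = q_exact + 1.0 / n
    d = float(np.max(np.abs(kde(grid, q_exact, h) - kde(grid, q_approx, h))))
    diffs.append(d)
print(diffs)  # consistent with (2.7): the sup-difference shrinks with 1/n
```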
Under the stricter assumptions in Theorem 2.1, i.e., those necessary to prove convergence of the KDE, we prove that the approximation of the push-forward using the KDE and the approximate model converges to the true density at a rate that depends on both the KDE approximation error and the approximate model error. In Section 4, we give corollaries to this result for specific choices of approximate models.
Theorem 2.3 (Convergence of KDE Approximations of Push-Forward Densities). Assume that a KDE approximation is generated using M samples of the approximate model. If π^Q_D, K(q), and the bandwidth parameter satisfy the assumptions in Theorem 2.1, and if K is also Lipschitz continuous, then we have the following bounds on the error in the KDE approximation using the approximate model:

    ‖π^Q_D(q) − π̂^{Q_n}_D(q)‖_{L∞(D)} ≤ C ((log M / M)^{s/(2s+m)} + ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)}),   (2.9)

and

    ‖π^Q_D(Q(λ)) − π̂^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} ≤ C ((log M / M)^{s/(2s+m)} + ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)}).   (2.10)

Proof.
First we prove (2.9). An application of the triangle inequality gives

    ‖π^Q_D(q) − π̂^{Q_n}_D(q)‖_{L∞(D)} ≤ ‖π^Q_D(q) − π̂^Q_D(q)‖_{L∞(D)} + ‖π̂^Q_D(q) − π̂^{Q_n}_D(q)‖_{L∞(D)}.

The first term is bounded using Theorem 2.1 and the second term is bounded using Lemma 2.1. Next, we prove (2.10). Proceeding as before, we apply the triangle inequality twice to obtain

    ‖π^Q_D(Q(λ)) − π̂^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} ≤ ‖π^Q_D(Q(λ)) − π̂^Q_D(Q(λ))‖_{L∞(Λ)} + ‖π̂^Q_D(Q(λ)) − π̂^{Q_n}_D(Q(λ))‖_{L∞(Λ)} + ‖π̂^{Q_n}_D(Q(λ)) − π̂^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)}.

The first term is bounded using Theorem 2.1, and the second and third terms are bounded using Lemma 2.1. □
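The first, KDE-only, term in these bounds can also be observed empirically. The sketch below (an illustrative assumption only: samples from a standard normal, whose density is known exactly) measures the sup-error of a Gaussian KDE on an interior grid as M grows:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(3)

# KDE sup-error against a known density (standard normal), measured on an
# interior grid for increasing sample sizes M.
grid = np.linspace(-2.0, 2.0, 200)
true_pdf = norm.pdf(grid)

errs = []
for M in (100, 1000, 10000):
    q = rng.standard_normal(M)
    errs.append(float(np.max(np.abs(gaussian_kde(q)(grid) - true_pdf))))
print(errs)  # decays slowly in M, in line with a (log M / M)^(s/(2s+m)) rate
```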
In this section we consider the solution of an inverse problem using approximate models.
To facilitate analysis of solutions to the inverse problem presented in the introduction, we first formalize its definition.
Definition 3.1 (Inverse Problem and Consistent Measure). Given a probability measure P_D on (D, B_D) that is absolutely continuous with respect to μ_D and admits a density π_D, the inverse problem is to determine a probability measure P^post_Λ on (Λ, B_Λ) that is absolutely continuous with respect to μ_Λ and admits a probability density π^post_Λ, such that the subsequent push-forward measure induced by the map Q(λ) satisfies

    P^post_Λ(Q^{-1}(A)) = P^Q_D(A) = P_D(A),   (3.1)

for any A ∈ B_D. We refer to any probability measure P^post_Λ that satisfies (3.1) as a consistent solution to the inverse problem.

Clearly, the inverse problem may not have a unique solution, i.e., there may be multiple probability measures that push forward to the observed measure. This is analogous to a deterministic inverse problem where multiple sets of parameters may produce a fixed observed datum. A unique solution may be obtained by imposing additional constraints or structure on the inverse problem. In this paper, such structure is obtained by incorporating prior information to construct a unique solution to the inverse problem, as first proposed in [11].
Given a prior probability measure P_Λ on (Λ, B_Λ) that is absolutely continuous with respect to μ_Λ and admits a probability density π_Λ, we make the following assumption to guarantee existence and uniqueness of a solution in terms of a density that is computable with the standard density approximation techniques necessary for approximation of π^Q_D.

Assumption 3. There exists C > 0 such that π_D(q) ≤ C π^Q_D(q) for a.e. q ∈ D.

Since the observed density and the model are assumed to be fixed, this is only an assumption on the prior. We sometimes refer to this assumption as the Predictability Assumption, since it implies that any output event with non-zero observed probability has a non-zero predicted probability defined by the push-forward of the prior. This assumption is consistent with the convention in Bayesian methods to choose the prior to be as general as possible, because if the prior predicts that the probability of an event that actually occurs is zero, then even exhaustive sampling of the prior will be insufficient for incorporating data associated with this event into the posterior measure.

Given an appropriate prior, constructing a consistent posterior solution is based on the following result.
Theorem 3.1 (Disintegration Theorem [13]). Assume Q : Λ → D is B_Λ-measurable, P_Λ is a probability measure on (Λ, B_Λ), and P_D is the push-forward measure of P_Λ on (D, B_D). There exists a P_D-a.e. uniquely defined family of conditional probability measures {P_q}_{q∈D} on (Λ, B_Λ) such that for any A ∈ B_Λ,

    P_q(A) = P_q(A ∩ Q^{-1}(q)), so P_q(Λ \ Q^{-1}(q)) = 0,

and there exists the following disintegration of P_Λ:

    P_Λ(A) = ∫_D P_q(A) dP_D(q) = ∫_D ( ∫_{A ∩ Q^{-1}(q)} dP_q(λ) ) dP_D(q),   (3.2)

for A ∈ B_Λ.

Assumption 3 automatically gives that P_D is absolutely continuous with respect to μ_D. Thus, writing dP_D(q) = π_D(q) dμ_D(q) and using the prior to define the conditional densities in the iterated integral (3.2), we arrive at the following theorem.

Theorem 3.2 (Existence and Uniqueness [11]). The probability measure P^post_Λ on (Λ, B_Λ) defined by

    P^post_Λ(A) = ∫_D ( ∫_{A ∩ Q^{-1}(q)} π_Λ(λ) [π_D(Q(λ)) / π^Q_D(Q(λ))] dμ_{Λ,q}(λ) ) dμ_D(q), ∀ A ∈ B_Λ,   (3.3)

is a consistent solution to the inverse problem in the sense of (3.1) and is uniquely determined for a given prior probability measure P_Λ on (Λ, B_Λ). The probability density of the consistent solution is given by

    π^post_Λ(λ) = π_Λ(λ) [π_D(Q(λ)) / π^Q_D(Q(λ))], λ ∈ Λ.   (3.4)

Each of the terms in (3.4) has a particular statistical interpretation, and we refer the interested reader to [11] for a detailed discussion. Computing the posterior density (3.4) only requires the construction of the push-forward of the prior, π^Q_D, since the prior and the observed densities are assumed a priori.

When using approximate models to solve the inverse problem, we must make the following assumption, which is analogous to Assumption 3, to guarantee existence and uniqueness of approximate solutions.
Assumption 4.
There exists C > 0 such that π_D(q) ≤ C π^{Q_n}_D(q) for a.e. q ∈ D.

Violation of Assumption 4 implies that, for the chosen π_Λ, the approximate forward map given by Q_n(λ) cannot predict the observed data.
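To make the construction concrete, here is a minimal sketch of evaluating the update (3.4) with an approximate map and entirely hypothetical ingredients (a uniform prior on [−1, 1], map Q_n(λ) = λ³, and a Gaussian observed density, none of which are from the paper), using a KDE for the push-forward of the prior and accept-reject sampling with the resulting density ratio:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(4)

# Hypothetical ingredients (not from the paper): uniform prior on [-1, 1],
# approximate QoI map Q_n(lam) = lam**3, Gaussian observed density pi_D.
lam = rng.uniform(-1.0, 1.0, size=50000)   # samples of the prior pi_Lambda
Qn = lambda x: x**3
obs = norm(loc=0.25, scale=0.1)            # observed density on the QoI

q = Qn(lam)
pf_prior = gaussian_kde(q)                 # KDE of the push-forward of the prior

# Ratio pi_D(Q_n(lam)) / pi_D^{Q_n}(Q_n(lam)) from the update (3.4);
# accept-reject with this ratio converts prior samples to posterior samples.
ratio = obs.pdf(q) / pf_prior(q)
keep = rng.uniform(0.0, 1.0, size=lam.size) < ratio / ratio.max()
posterior_samples = lam[keep]
print(posterior_samples.size)
```

By construction, pushing the accepted samples forward through Q_n reproduces (approximately) the observed density, which is exactly the consistency requirement (3.1).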
Recalling the formal expression for π^post_Λ given by (3.4), we define

    π^{post,n}_Λ(λ) = π_Λ(λ) [π_D(Q_n(λ)) / π^{Q_n}_D(Q_n(λ))] = π_Λ(λ) r_n(λ), with r_n(λ) = π_D(Q_n(λ)) / π^{Q_n}_D(Q_n(λ)).   (3.5)

The error in the total variation metric of the approximate posterior pdf is given by

    ‖π^{post,n}_Λ(λ) − π^post_Λ(λ)‖_{L¹(Λ)} = ∫_Λ π_Λ(λ) |r_n(λ) − r(λ)| dμ_Λ.   (3.6)

Since π^{post,n}_Λ(λ) and π^post_Λ(λ) are both in L¹(Λ), we have that π_Λ(λ) |r_n(λ) − r(λ)| is also in L¹(Λ). By a standard result in measure theory, for each η ∈ (0, 1) there exists a compact set Λ_η ⊂ Λ such that

    ‖π^{post,n}_Λ(λ) − π^post_Λ(λ)‖_{L¹(Λ \ Λ_η)} < η.

This immediately implies that both

    ∫_{Λ_η} π^post_Λ(λ) dμ_Λ ≥ 1 − η, and ∫_{Λ_η} π^{post,n}_Λ(λ) dμ_Λ ≥ 1 − η.

Thus, we can rewrite the error shown in (3.6) as the sum of two integrals. The first integral is over the compact set Λ_η containing "most of the probability" for either the exact or approximate posterior probability measure. The second integral is over the (potentially unbounded) Λ \ Λ_η, where the probabilities of both the exact and approximate posteriors are less than η.

To simplify the analysis to follow, we assume that Λ is precompact (i.e., bounded in R^k), which subsequently implies D has finite μ_D-measure if the QoI map is piecewise smooth and bounded. If Λ is not precompact, then we simply note that the analysis we provide can be used to prove convergence of the approximate posterior pdf on any compact subset of Λ. In Section 5, we briefly discuss how it may also be possible to replace μ_Λ, and subsequently μ_D, by finite measures dominating the probability measures defined on (Λ, B_Λ) and (D, B_D), respectively, to arrive at similar theoretical results on Λ that are not precompact.

Theorem 3.3 (Convergence of Posterior Densities).
Suppose Assumptions 1, 2, 3 and 4 hold, π_D is a Lipschitz continuous function on D, and (Q_n(λ)) is a sequence of approximations to Q(λ) such that Q_n(λ) → Q(λ) in L∞(Λ) as n → ∞. Then, for any ε > 0 and precompact Λ, there exists N such that n > N implies

    ‖π^{post,n}_Λ(λ) − π^post_Λ(λ)‖_{L¹(Λ)} < ε,   (3.7)

where π^{post,n}_Λ is the approximate posterior density obtained using Q_n(λ) and its associated push-forward of the prior density, denoted by π^{Q_n}_D.

Proof.
Let Q_n(λ) be any of the approximations from (Q_n(λ)). We find it convenient to apply the disintegration theorem using the map Q_n(λ) to rewrite (3.6) as

    ‖π^{post,n}_Λ(λ) − π^post_Λ(λ)‖_{L¹(Λ)} = ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) |r_n(λ) − r(λ)| dμ_{Λ,q} dμ_D.   (3.8)

Observe that the difference in ratios given by |r_n(λ) − r(λ)| can be rewritten as

    |r_n(λ) − r(λ)| = | [π_D(Q_n(λ)) π^Q_D(Q(λ)) − π_D(Q(λ)) π^{Q_n}_D(Q_n(λ))] / [π^{Q_n}_D(Q_n(λ)) π^Q_D(Q(λ))] |.

Then, by adding and subtracting π_D(Q(λ)) π^Q_D(Q(λ)) in the numerator, this difference can be decomposed as

    |r_n(λ) − r(λ)| ≤ | [π_D(Q_n(λ)) − π_D(Q(λ))] / π^{Q_n}_D(Q_n(λ)) |  (=: I)
                    + | π_D(Q(λ)) [π^Q_D(Q(λ)) − π^{Q_n}_D(Q_n(λ))] / [π^{Q_n}_D(Q_n(λ)) π^Q_D(Q(λ))] |  (=: II).   (3.9)

Given ε > 0, we show that there exists N such that if Q_n(λ) is chosen from (Q_n(λ))_{n>N}, then

    ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) I dμ_{Λ,q} dμ_D < ε/2, and ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) II dμ_{Λ,q} dμ_D < ε/2.

Since π_D is assumed to be Lipschitz continuous with Lipschitz constant C ≥ 0, we have

    I ≤ C |Q(λ) − Q_n(λ)| / π^{Q_n}_D(Q_n(λ)).   (3.10)

Hölder's inequality then implies

    ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) I dμ_{Λ,q}(λ) dμ_D(q) ≤ C ∫_D ‖Q(λ) − Q_n(λ)‖_{L∞(Λ ∩ Q_n^{-1}(q))} ‖π_Λ(λ)/π^{Q_n}_D(Q_n(λ))‖_{L¹(Λ ∩ Q_n^{-1}(q))} dμ_D(q).

By the disintegration theorem, for a.e. q ∈ D, π_Λ(λ)/π^{Q_n}_D(Q_n(λ)) is a conditional density on Λ ∩ Q_n^{-1}(q), i.e.,

    ‖π_Λ(λ)/π^{Q_n}_D(Q_n(λ))‖_{L¹(Λ ∩ Q_n^{-1}(q))} = 1 for a.e. q ∈ D.

It follows that

    ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) I dμ_{Λ,q}(λ) dμ_D(q) ≤ C μ_D(D) ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)}.   (3.11)

Recall that the assumption of precompactness of Λ implies μ_D(D) < ∞. Then, since Q_n(λ) → Q(λ) in L∞(Λ), this bound can be made smaller than ε/2 for n sufficiently large.

We now choose a new constant C > 0 such that for a.e. q ∈ D,

    π_D(q) / π^Q_D(q) ≤ C.

Using Hölder's inequality and the disintegration theorem as before, we have that

    ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) II dμ_{Λ,q}(λ) dμ_D(q) ≤ C μ_D(D) ‖π^Q_D(Q(λ)) − π^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)}.

By Theorem 2.2, this term can also be bounded by ε/2 for n sufficiently large, which completes the proof. □

Next, we consider the utilization of a KDE to approximate the push-forward of the prior and assess how this affects the approximation of the posterior. We define

    π̂^{post,n}_Λ(λ) = π_Λ(λ) [π_D(Q_n(λ)) / π̂^{Q_n}_D(Q_n(λ))] = π_Λ(λ) r̂_n(λ), with r̂_n(λ) = π_D(Q_n(λ)) / π̂^{Q_n}_D(Q_n(λ)),   (3.12)

which represents the approximation of the posterior using the approximate model and a KDE approximation of the push-forward of the prior through the approximate model.
We emphasize that $\hat{\pi}_\Lambda^{\text{post},n}$ is not a KDE approximation of the posterior. As in previous sections, we need to make a predictability assumption on the KDE approximation of the push-forward of the prior through the approximate model.

Assumption 5.
There exists $\hat{C} > 0$ such that $\pi_D(q) \le \hat{C}\, \hat{\pi}_D^{Q_n}(q)$ for a.e. $q \in D$.

Violation of Assumption 5 implies that, for the chosen $\pi_\Lambda$, the KDE approximation constructed using the approximate forward map $Q_n(\lambda)$ cannot predict the observed data. The only difference between Assumption 4 and Assumption 5 is the incorporation of the kernel density approximation, so we expect these assumptions to be roughly equivalent for large $M$. Under the strict assumptions on $\pi_D^Q$ required for Theorem 2.1, we can prove the following theorem on the rate of convergence of $\hat{\pi}_\Lambda^{\text{post},n}$ to $\pi_\Lambda^{\text{post}}$.
Theorem 3.4 (Convergence of Posterior Densities with KDE Approximation). Assume that a KDE approximation is generated using $M$ samples of the approximate model. Suppose Assumptions 1, 2, 3, 4, and 5 and the assumptions in Theorem 2.1 hold. Furthermore, assume $\Lambda$ is precompact, $\pi_D$ is a Lipschitz continuous function on $D$, and $(Q_n(\lambda))$ is a sequence of approximations to $Q(\lambda)$ such that $Q_n(\lambda) \to Q(\lambda)$ in $L^\infty(\Lambda)$ as $n \to \infty$. Then,
$$\left\| \hat{\pi}_\Lambda^{\text{post},n}(\lambda) - \pi_\Lambda^{\text{post}}(\lambda) \right\|_{L^1(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + \left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)} \right). \quad (3.13)$$

Proof.
The proof of (3.13) follows the proof of Theorem 3.3 very closely since the approximate map $Q_n(\lambda)$ is the same, and the only additional approximation is the value of the push-forward of the prior given by the KDE. For the sake of brevity, we only discuss the arguments that differ. We follow the same arguments to decompose
$$|\hat{r}_n(\lambda) - r(\lambda)| \le \underbrace{\left| \frac{\pi_D(Q_n(\lambda)) - \pi_D(Q(\lambda))}{\hat{\pi}_D^{Q_n}(Q_n(\lambda))} \right|}_{I} + \underbrace{\left| \frac{\pi_D(Q(\lambda)) \left[ \pi_D^Q(Q(\lambda)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda)) \right]}{\hat{\pi}_D^{Q_n}(Q_n(\lambda))\, \pi_D^Q(Q(\lambda))} \right|}_{II}.$$
The first term is bounded similarly to before, except that for a.e. $q \in D$ the ratio $\pi_\Lambda(\lambda)/\hat{\pi}_D^{Q_n}(Q_n(\lambda))$ is only an approximation of a conditional density. However, we can use the fact that $\pi_D^{Q_n}(Q_n(\lambda))$ and $\hat{\pi}_D^{Q_n}(Q_n(\lambda))$ are constant along $\Lambda \cap Q_n^{-1}(q)$, along with Assumptions 4 and 5, to show
$$\left\| \frac{\pi_\Lambda(\lambda)}{\hat{\pi}_D^{Q_n}(Q_n(\lambda))} \right\|_{L^1(\Lambda \cap Q_n^{-1}(q))} = \left| \frac{\pi_D^{Q_n}(Q_n(\lambda))}{\hat{\pi}_D^{Q_n}(Q_n(\lambda))} \right| \left\| \frac{\pi_\Lambda(\lambda)}{\pi_D^{Q_n}(Q_n(\lambda))} \right\|_{L^1(\Lambda \cap Q_n^{-1}(q))} \le \hat{C} C$$
for a.e. $q \in D$, where $C$ and $\hat{C}$ are the constants in Assumptions 4 and 5, respectively. The bound on $II$ follows a similar argument, but uses Theorem 2.3 instead of Theorem 2.2.

Theorem 3.5 (Convergence to a KDE Approximation).
Assume that a set of $M$ samples is used to generate KDE approximations using the true model and the approximate model, giving $\hat{\pi}_\Lambda^{\text{post}}$ and $\hat{\pi}_\Lambda^{\text{post},n}$, respectively. Suppose Assumptions 1, 2, 3, 4, and 5 hold. Then,
$$\left\| \hat{\pi}_\Lambda^{\text{post},n}(\lambda) - \hat{\pi}_\Lambda^{\text{post}}(\lambda) \right\|_{L^1(\Lambda)} \le C\, \left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)}. \quad (3.14)$$
The proof of Theorem 3.5 is almost identical to the proof of Theorem 3.4. The only difference is the use of Lemma 2.1 instead of Theorem 2.3.

In this section we specialize the general theory to two classes of convergent approximate models. Specifically, we consider sparse grid surrogate approximations, discretized partial differential equations, and combinations of the two. For each application, we first describe the sequence of discretized models and recall the known results regarding the convergence of the approximate model. These theoretical results are combined with Theorems 2.3 and 3.4 to give corollaries specific to each class of approximate model. We also provide numerical results to demonstrate the convergence of the push-forward of the prior and the posterior. We consider relatively simple numerical examples to enable the construction of sequences of numerical approximations that demonstrate convergence of the push-forward and posterior densities. For the sake of brevity, we only analyze the contribution of the KDE to the error in the first application; the corresponding results for the other two applications are similar. Before presenting the applications, we discuss some of the numerical considerations that are common throughout the remainder of this paper and some of the diagnostic tools we use to assess whether or not the predictability assumptions are satisfied.
In all the applications to follow, we investigate the convergence of the push-forward and posterior densities as the approximate models converge. For both the exact and approximate models, we use a Gaussian KDE to construct estimates of the push-forward densities. We estimate the $L^\infty$-norm of the error in the push-forward using the samples generated from $P_\Lambda$, that is,
$$\left\| \pi_D^Q(q) - \hat{\pi}_D^{Q_n}(q) \right\|_{L^\infty(D)} \approx \max_{1 \le i \le M} \left| \pi_D^Q(Q(\lambda_i)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda_i)) \right|.$$
Similarly, we estimate the $L^1$-norm of the error in the posterior using the samples generated from the prior:
$$\left\| \pi_\Lambda^{\text{post}}(\lambda) - \hat{\pi}_\Lambda^{\text{post},n}(\lambda) \right\|_{L^1(\Lambda)} = \left\| r(\lambda) - \hat{r}_n(\lambda) \right\|_{L^1(\Lambda;\, P_\Lambda)} \approx \frac{1}{M} \sum_{i=1}^{M} \left| r(Q(\lambda_i)) - \hat{r}_n(Q_n(\lambda_i)) \right|.$$
Approximating the posterior density considered in this work requires solving the forward UQ problem first. Once the solution to the forward UQ problem has been obtained, i.e., once the push-forward $\pi_D^Q$ has been approximated, we can evaluate the posterior density at the samples used to build $\pi_D^Q$ at no extra cost using (3.4). We can then use a standard rejection sampling strategy to accept a subset of these samples as samples of the posterior (see [11]). In traditional Bayesian inference (see, e.g., [36, 25, 4, 34, 17, 24]), very little is known about the posterior measure or density. The situation is slightly different for the approach considered in this work: in our formulation, the posterior is constructed such that the push-forward of the posterior matches the given density on the QoI, which gives us a means to assess the accuracy of the posterior. For example, if the observed density on the QoI is Gaussian, then we can compare the mean and the variance of the push-forward of the posterior with the corresponding values for $\pi_D$.
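These sample-based error estimates are straightforward to implement. The following is a minimal pure-Python sketch; the one-dimensional map $Q$ and the perturbed stand-in for $Q_n$ are illustrative assumptions, not the models used later in this paper:

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return a 1D Gaussian KDE, q -> estimated density, built from samples."""
    M = len(samples)
    norm = 1.0 / (M * bandwidth * math.sqrt(2.0 * math.pi))
    def density(q):
        return norm * sum(math.exp(-0.5 * ((q - s) / bandwidth) ** 2) for s in samples)
    return density

random.seed(0)
M = 1000
prior_samples = [random.uniform(0.0, 1.0) for _ in range(M)]  # samples from P_Lambda

Q  = lambda lam: math.sin(math.pi * lam)         # hypothetical "exact" map
Qn = lambda lam: math.sin(math.pi * lam) + 0.01  # hypothetical approximate map

bw = M ** (-0.2)  # rule-of-thumb bandwidth scaling for a 1D KDE
pf_exact  = gaussian_kde([Q(l) for l in prior_samples], bw)   # stands in for pi_D^Q
pf_approx = gaussian_kde([Qn(l) for l in prior_samples], bw)  # hat pi_D^{Q_n}

# Sample-based estimate of the L-infinity error in the push-forward:
err_inf = max(abs(pf_exact(Q(l)) - pf_approx(Qn(l))) for l in prior_samples)
print(err_inf)
```

Here `pf_exact` is itself a KDE, so `err_inf` estimates the distance between the two density estimates rather than the distance to the true push-forward, which is all that is available in practice when the exact push-forward has no closed form.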
We can also estimate the integral of the posterior,
$$\mathrm{I}(\pi_\Lambda^{\text{post}}) = \int_\Lambda \pi_\Lambda^{\text{post}}(\lambda)\, d\mu_\Lambda = \int_\Lambda \pi_\Lambda(\lambda)\, r(Q(\lambda))\, d\mu_\Lambda = \int_\Lambda r(Q(\lambda))\, dP_\Lambda,$$
and the Kullback-Leibler (KL) divergence [26] between the prior and the posterior,
$$\mathrm{KL}(\pi_\Lambda : \pi_\Lambda^{\text{post}}) = \int_\Lambda \pi_\Lambda^{\text{post}}(\lambda) \log\left( \frac{\pi_\Lambda^{\text{post}}(\lambda)}{\pi_\Lambda(\lambda)} \right) d\mu_\Lambda = \int_\Lambda r(Q(\lambda)) \log r(Q(\lambda))\, dP_\Lambda.$$
In practice, we actually compute $\mathrm{I}(\hat{\pi}_\Lambda^{\text{post},n})$ and $\mathrm{KL}(\pi_\Lambda : \hat{\pi}_\Lambda^{\text{post},n})$. Both of these quantities can be estimated using the model evaluations that were used to estimate the push-forward of the prior, and they are therefore very cheap diagnostic tools [11]. The integral of the posterior is an especially useful diagnostic since it is easy to show that
$$\mathrm{I}(\pi_\Lambda^{\text{post}}) = \int_\Lambda \pi_\Lambda^{\text{post}}(\lambda)\, d\mu_\Lambda = \int_D \pi_D(q)\, d\mu_D = 1.$$
Thus, if the approximation of the posterior using the approximate model and a KDE for the push-forward of the prior does not integrate to one, or at least to a reasonable Monte Carlo estimate of one, then this indicates that Assumption 5 is not satisfied.
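Both diagnostics reduce to Monte Carlo averages of $r$ and $r \log r$ over the prior samples. A minimal sketch under assumed Gaussian densities and a hypothetical identity map (chosen so the push-forward of the prior is known exactly):

```python
import math
import random

def pdf_normal(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

random.seed(1)
M = 5000
prior_samples = [random.gauss(0.0, 1.0) for _ in range(M)]  # prior = N(0, 1)

Q = lambda lam: lam  # hypothetical map; push-forward of the prior is N(0, 1) exactly

def r(lam):
    """Ratio r = pi_D(Q(lam)) / pi_D^Q(Q(lam)) with an assumed pi_D = N(0, 0.25)."""
    q = Q(lam)
    return pdf_normal(q, 0.0, 0.5) / pdf_normal(q, 0.0, 1.0)

rvals = [r(lam) for lam in prior_samples]
I_post = sum(rvals) / M                          # diagnostic: should be near 1
KL = sum(rv * math.log(rv) for rv in rvals) / M  # diagnostic: int r log r dP_Lambda
print(I_post, KL)
```

For this particular pair of Gaussians the KL diagnostic has the closed-form value $\ln 2 + 1/8 - 1/2 \approx 0.318$, so the Monte Carlo estimate should land nearby, while an `I_post` far from one would signal a violated predictability assumption.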
Surrogates of the parameter-to-QoI map are often used to reduce the computational cost of uncertainty quantification. The theory we have developed thus far is quite general and can be applied to any surrogate model; here, however, we focus on sparse grid collocation methods, which are known to provide efficient and accurate approximations of stochastic quantities [22, 27, 30]. For simplicity, we restrict attention to stochastic collocation problems characterized by variables $\lambda$ with finite support, normalized to fit in the domain $\Lambda = [0,1]^k$. However, the technique proposed here can be applied to semi-bounded or unbounded random variables using the methodology outlined in [21].

¹ Unless the map is linear and the prior and noise model are Gaussian, in which case the posterior is also known to be Gaussian.
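For concreteness, the nested Clenshaw-Curtis points used in this section (with the growth rule $P_n = 2^n + 1$) can be generated as follows; mapping the usual $[-1,1]$ abscissas to $[0,1]$ is an assumption made to match the normalized domain above:

```python
import math

def clenshaw_curtis_points(level):
    """Nested Clenshaw-Curtis points on [0, 1] with growth P_n = 2^n + 1."""
    if level == 0:
        return [0.5]  # common convention: a single midpoint at level 0
    P = 2 ** level + 1
    # Chebyshev extrema cos(pi * j / (P - 1)) on [-1, 1], mapped to [0, 1]
    return [0.5 * (1.0 - math.cos(math.pi * j / (P - 1))) for j in range(P)]

pts2 = clenshaw_curtis_points(2)  # 5 points
pts3 = clenshaw_curtis_points(3)  # 9 points, containing all 5 points of pts2
print(pts2)
```

The nestedness is the practically important property: raising the level reuses all previous model evaluations, which keeps the total number of points $N_n$ in the sparse grid manageable.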
Given a univariate interpolation rule with points $\Lambda^{n} = \{\lambda_{n,i} : 1 \le i \le P_n\}$ and basis functions $\phi_{n,i}$, a level-$n$ sparse grid with $N_n$ points [7] approximates $Q(\lambda)$ via a weighted linear combination of tensor products of the univariate rules:
$$Q_n(\lambda) = \sum_{n-k+1 \le |\mathbf{n}| \le n} (-1)^{n-|\mathbf{n}|} \binom{k-1}{n-|\mathbf{n}|} \sum_{i_1=1}^{P_{n_1}} \cdots \sum_{i_k=1}^{P_{n_k}} f(\lambda_{\mathbf{n},\mathbf{i}}) \left( \phi_{n_1,i_1} \otimes \cdots \otimes \phi_{n_k,i_k} \right)(\lambda). \quad (4.1)$$
The samples $\lambda_{\mathbf{n},\mathbf{i}}$ used to construct the sparse grid come from a set of anisotropic grids $\Lambda^{\mathbf{n}} = \Lambda^{n_1} \times \cdots \times \Lambda^{n_k}$ on the domain $\Lambda$, where $\mathbf{n} = (n_1, \ldots, n_k)$ and $\mathbf{i} = (i_1, \ldots, i_k)$ are multi-indices that denote the level and position of a point within each univariate interpolation rule.

Typically, when approximating a $Q(\lambda)$ with smooth dependence on $\lambda$, Lagrange polynomials are the best choice of basis functions and the $\Lambda^{n}$ are chosen to be a set of well-conditioned points such as the nested Clenshaw-Curtis points. The number of points $P_n$ of a one-dimensional grid of a given level, and thus the total number of points $N_n$ in the sparse grid, depends on the growth rate of the quadrature rule chosen. For Clenshaw-Curtis points, $P_n = 2^n + 1$. Under some assumptions on the regularity of $Q$, we recall the following result.

Lemma 4.1 ([31]). For sufficiently smooth $Q(\lambda)$, the isotropic level-$n$ sparse grid (4.1) based on Clenshaw-Curtis abscissas with $N_n$ points satisfies
$$\left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)} \le C(\sigma)\, N_n^{-\mu},$$
where $\mu := \sigma/(1 + \log(2k))$ and the constant $C(\sigma)$ depends on the size of the region of analyticity $\sigma$ of $Q$ but not on the number of points in the sparse grid.

Combining Lemma 4.1 with Theorem 2.3 gives the following result.
Corollary 4.1.
Under the assumptions of Theorem 2.3 and Lemma 4.1, the error in the push-forward of the prior using an isotropic level-$n$ sparse grid approximation based on Clenshaw-Curtis abscissas with $N_n$ points satisfies
$$\left\| \pi_D^Q(q) - \hat{\pi}_D^{Q_n}(q) \right\|_{L^\infty(D)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} \right), \quad (4.2)$$
and
$$\left\| \pi_D^Q(Q(\lambda)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda)) \right\|_{L^\infty(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} \right), \quad (4.3)$$
where $\mu = \sigma/(1 + \log(2k))$.

Combining Lemma 4.1 with Theorem 3.4 gives the following result.
Corollary 4.2.
Under the assumptions of Theorem 3.4, the error in the posterior using an isotropic level-$n$ sparse grid approximation based on Clenshaw-Curtis abscissas with $N_n$ points satisfies
$$\left\| \pi_\Lambda^{\text{post}}(\lambda) - \hat{\pi}_\Lambda^{\text{post},n}(\lambda) \right\|_{L^1(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} \right). \quad (4.4)$$

The goal of this section is to verify the convergence rates in Corollary 4.1. Consider an analytical function of two input parameters, $\lambda \in [0,1]^2$,
$$Q(\lambda) = \sum_{j=1}^{4} w_j \exp\left( -\frac{(\lambda_1 - c_{j,1})^2}{\gamma_{j,1}} - \frac{(\lambda_2 - c_{j,2})^2}{\gamma_{j,2}} \right),$$
a summation of four weighted Gaussian peaks with fixed weights $w_j$, centers $c_j$, and widths $\gamma_j$. We assume a uniform prior probability distribution on the input parameters.
Figure 1: The level-4 sparse grid approximation of the QoI (left), the level-8 sparse grid approximation of the QoI (middle), and the reference solution (right).

Next, we focus on the KDE contribution to the error by fixing the sparse grid at a level-12 approximation (577 model evaluations). We then use relatively small sets of samples in the KDE approximation of the push-forward of the prior, with $M$ ranging from $100$ up to $50{,}000$. In Figure 2 (left), we plot the $L^\infty$ error in the push-forward of the prior as the number of samples used to compute the KDE increases. Recall that the only difference
Figure 2: Convergence of the push-forward of the prior as the KDE is refined and the sparse grid is held fixed at the highest level (left), and as the sparse grid is refined and the KDE is fixed with the maximum number of samples (right).

between (4.2) and (4.3) is whether we evaluate the KDE approximation at the reference QoI values or at the approximate QoI values. Since we are using a fairly accurate response surface approximation, these pointwise errors are relatively small and the difference between the two estimates is negligible.

Next, we fix $M = 50{,}000$ to assess the error in the push-forward of the prior due to the response surface approximation. In Figure 2 (right), we see that the push-forward of the prior converges rapidly as the pointwise error in the response surface approximation decreases. In this case, we observe a difference between the error estimates given by (4.2) and (4.3) due to the difference in where the KDE is evaluated.
The goal of this section is to verify Corollary 4.2. We use the model introduced in Section 4.2.1. To formulate an inverse problem, we assume that $\pi_D \sim N(2.3,\, 0.04)$ and use (3.4) to compute the posterior density. For both the reference solution and each sparse grid approximation, we use the 50,000 samples with a Gaussian KDE to approximate the push-forward of the prior. The corresponding approximations of the posterior for the level-4 and level-8 sparse grid approximations are shown in Figure 3 along with the posterior from the reference solution. The level-4 sparse grid provides a poor approximation of the response surface, and the corresponding posterior contains significant error. The posterior corresponding to the level-8 sparse grid appears to be much closer to the reference solution.

Figure 3: Posterior corresponding to the level-4 sparse grid approximation (left), the level-8 sparse grid approximation (middle), and the reference solution (right).

We use the standard rejection sampling strategy described in [11] to accept a subset of these samples for the posterior. The accepted samples for the level-4 and level-8 sparse grid approximations of the posterior are shown in Figure 4 along with the samples accepted from the reference solution.
Figure 4: Samples from the posterior corresponding to the level-4 sparse grid (left), the level-8 sparse grid approximation (middle), and the reference solution (right).

In Table 1, we show the diagnostic data on the posterior densities obtained using different sparse grid levels. The integral of the posterior clearly indicates that the level-2 sparse grid approximation does not satisfy Assumption 4. Moreover, the mean and variance of the push-forward of the posterior do not match the corresponding values for $\pi_D$. Thus, the level-2 sparse grid cannot be used to solve the inverse problem.

                                                     Level-2    Level-4    Level-8    Level-12   Ref.       Truth
I($\hat{\pi}_\Lambda^{\text{post},n}$)               0.001      0.930      0.983      0.983      0.983      1.000
KL($\pi_\Lambda : \hat{\pi}_\Lambda^{\text{post},n}$) -0.004    2.163      1.981      1.981      1.981      UNKN
Mean PF-post                                         1.616      2.279      2.304      2.303      2.303      2.300
Var. PF-post                                         2.369e-3   2.915e-2   4.168e-2   4.197e-2   4.192e-2   4.000e-2

Table 1: Comparison of the integral of the posterior, the KL divergence from the prior to the posterior, and the mean and variance of the push-forward of the posterior obtained using various sparse grid approximations in Section 4.2.2.

The diagnostic data for the level-4 grid are much better, but still not sufficient to allow this approximate model to be used to solve the inverse problem. On the other hand, the information for the level-8 and level-12 sparse grid approximations indicates that Assumption 4 is satisfied and that these approximate models can be used to solve the inverse problem. In fact, this is true for the level-5 sparse grid and all higher-order approximations; thus, throughout the remainder of this section we only use levels 5-12. We emphasize that while the diagnostic data are useful for assessing the usability of an approximate model, they do not provide any information regarding the accuracy of the posterior.

We first seek to isolate the KDE contribution to the error in the posterior by fixing the sparse grid approximation at level-12 and using smaller subsets of the 50,000 samples as in Section 4.2.1.
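The rejection step used to produce the accepted samples in Figure 4 can be sketched as follows; the one-dimensional map and densities here are hypothetical stand-ins for the actual maps in this section:

```python
import math
import random

def pdf_normal(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

random.seed(2)
M = 20000
prior_samples = [random.uniform(0.0, 1.0) for _ in range(M)]

Q = lambda lam: lam                      # hypothetical map: push-forward of U(0,1) is U(0,1)
pf = lambda q: 1.0                       # exact push-forward density on [0, 1]
obs = lambda q: pdf_normal(q, 0.5, 0.1)  # assumed observed density pi_D

ratios = [obs(Q(l)) / pf(Q(l)) for l in prior_samples]  # r evaluated at each sample
r_max = max(ratios)
# Accept sample i with probability r_i / r_max (standard rejection sampling)
accepted = [l for l, ri in zip(prior_samples, ratios) if random.random() < ri / r_max]

mean_acc = sum(accepted) / len(accepted)
print(len(accepted), mean_acc)
```

Up to Monte Carlo error, the accepted samples are distributed according to the posterior, so their sample mean should sit near 0.5 here; the acceptance fraction is roughly the mean of the ratios divided by their maximum.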
In Figure 5 (left), we plot the error in the posterior density as the number of samples used to compute the KDE increases. Next, we fix $M = 50{,}000$ and assess the accuracy of the posterior as the sparse grid approximation is refined.
Figure 5: Convergence of the posterior in the $L^1$-norm as the KDE approximation is refined (left) and as the sparse grid is refined (right). For reference, we also include the $L^\infty$ error in the response surface approximation in the right image.

As mentioned in the previous section, by utilizing the same set of samples for the reference solution and for each response surface approximation, we are able to isolate the contribution of the sparse grid approximation to the error. In Figure 5 (right), we clearly see that the error in the posterior, measured in the $L^1$-norm, decays at the same rate as the $L^\infty$-error in the response surface approximation.

Consider the following general system of equations,
$$\frac{\partial u}{\partial t} + A(\lambda; u) = 0, \quad (4.5)$$
defined on $\Omega \times (0, T]$, where $\Omega \subset \mathbb{R}^s$, $s = 1, 2, 3$, is a polygonal (polyhedral) and bounded domain with boundary $\partial\Omega$. As throughout the paper, the random parameter $\lambda$ reflects sources of uncertainty, for example, uncertain initial or boundary conditions, forcing, etc. The solution operator's dependence on $\lambda$ implies that both $u := u(\mathbf{x}, t, \lambda)$ and $Q(\lambda) := Q(u(\mathbf{x}, t, \lambda))$ are also uncertain and may be modeled as random processes. For the sake of simplicity, we assume that $Q$ is a bounded continuous linear functional of $u$. In this paper we assume that $A$ is convex and has smooth second derivatives. Specific examples of $A$ and $u$ are given in subsequent sections. We assume that sufficient initial and boundary conditions are provided so that (4.5) is well-posed in the sense that there exists a solution for a.e. $\lambda \in \Lambda$.

Let $\mathcal{T}_h$ be a conforming partition of $\Omega$, composed of $N_h$ closed convex volumes of maximum diameter $h$. We assume that the mesh is regular in the sense of Ciarlet [12] and take $\mathcal{T}_h$ to be a conforming finite element mesh consisting of simplices or parallelepipeds. A fully discrete scheme for any $\lambda \in \Lambda$ can be obtained by letting $I_j = (t_{j-1}, t_j)$ and the time step $\Delta t = \max_j (t_j - t_{j-1})$ denote the discretization of $[0, T]$ as $0 = t_0 < t_1 < \cdots < t_{N_t} = T$. In this paper, we assume that a first-order Euler scheme (either implicit or explicit) is used to discretize in time. To define the sequence of approximate models $(Q_n(\lambda))$, we define sequences of discretizations, $h_1 \ge h_2 \ge \ldots$ and $\Delta t_1 \ge \Delta t_2 \ge \ldots$, where $h_n, \Delta t_n \to 0$ as $n \to \infty$. Then we define $Q_n(\lambda)$ to be the approximate model that uses $h_n$ and $\Delta t_n$, respectively. In cases where a unique and sufficiently regular solution exists, one can obtain the following error bound using duality arguments (see, e.g., [19, 15, 9, 32, 3]):
$$\left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)} \le C(u)\left( h_n^{r+\alpha} + \Delta t_n \right), \quad (4.6)$$
for some $\alpha \in [0, 1]$, where $C(u)$ depends on $u$ but does not depend on $\Delta t_n$ or $h_n$. The parameter $r$ is determined by the regularity of the solution and the order of accuracy of the spatial discretization. We note that $C(u)$ typically depends on $\lambda$ and $\alpha$ depends on the regularity of $u$. Combining (4.6) with Theorem 2.3 gives the following result.

Corollary 4.3.
Under the assumptions of Theorem 2.3, the error in the push-forward of the prior using a discretization of (4.5) satisfies
$$\left\| \pi_D^Q(q) - \hat{\pi}_D^{Q_n}(q) \right\|_{L^\infty(D)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right), \quad (4.7)$$
and
$$\left\| \pi_D^Q(Q(\lambda)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda)) \right\|_{L^\infty(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right). \quad (4.8)$$

Combining (4.6) with Theorem 3.4 gives the following result.

Corollary 4.4.
Under the assumptions of Theorem 3.4, the error in the posterior using a discretization of (4.5) satisfies
$$\left\| \pi_\Lambda^{\text{post}}(\lambda) - \hat{\pi}_\Lambda^{\text{post},n}(\lambda) \right\|_{L^1(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right). \quad (4.9)$$

The goal of this section is to verify the convergence rates in Corollary 4.3. Consider a single-phase incompressible flow model:
$$\begin{aligned} -\nabla \cdot \left( K(\lambda) \nabla u \right) &= 0, && (x, y) \in \Omega = (0,1)^2, \\ u &= 1, && x = 0, \\ u &= 0, && x = 1, \\ K \nabla u \cdot \mathbf{n} &= 0, && y = 0 \text{ and } y = 1. \end{aligned} \quad (4.10)$$
Here $u$ is the pressure field and $K$ is the permeability field, which we assume is a scalar field given by a Karhunen-Loève expansion of the log transformation, $Y = \log K$, with
$$Y(\lambda) = \bar{Y} + \sum_{i=1}^{\infty} \xi_i(\lambda)\, \sqrt{\eta_i}\, f_i(x, y),$$
where $\bar{Y}$ is the mean field and the $\xi_i$ are mutually uncorrelated random variables with zero mean and unit variance [16, 39]. The eigenvalues, $\eta_i$, and eigenfunctions, $f_i$, are computed using an assumed functional form for the covariance [42, 35]. We assume a correlation length of 0.01 in each spatial direction and truncate the expansion at 100 terms. This choice of truncation is purely for the sake of demonstration; in practice, the expansion is truncated once a sufficient fraction of the energy in the eigenvalues is retained [42, 16]. Our quantity of interest is the pressure at a fixed point in $\Omega$. The prior is a multivariate standard normal density, $\pi_\Lambda \sim N(\mathbf{0}, \mathbf{I})$, where $\mathbf{I}$ is the identity matrix.

To approximate solutions to the PDE in (4.10), we use a finite element discretization with continuous piecewise bilinear basis functions defined on a uniform spatial grid. We vary the number of grid points in each spatial direction ($h = 1/10, 1/20, 1/40, 1/80, 1/160$), so each $Q_n$ is associated with a particular value of $h$. The asymptotic value of the quantity of interest is unknown, but for each sample in $\Lambda$ we have the QoI on a sequence of grids, so we use Richardson extrapolation to estimate a reference solution for the QoI. We generate 10,000 samples from the prior and evaluate each discretization of the PDE model for each of these realizations. We use a standard Gaussian KDE to approximate the push-forward of the prior in the one-dimensional output space. In Figure 6 (left), we plot the convergence of the push-forward of the prior as the physical discretization is refined. We do see a significant difference between
Figure 6: Convergence of the push-forward of the prior (left) and convergence of the QoI in the $L^\infty$-norm and the posterior in the $L^1$-norm (right) as the spatial approximation is refined.

(4.7) and (4.8), but the errors eventually converge at approximately the same rate. The difference between the two curves is that (4.7) measures the error when the approximate push-forward density is evaluated at exact values of the QoI $q$, whereas (4.8) measures the error when the approximate push-forward density is evaluated at approximate values of the QoI. The latter evaluation introduces an additional source of error, and thus the error in (4.8) will always be larger than the error in (4.7).

The goal of this section is to verify the convergence rate in Corollary 4.4. We use the model introduced in Section 4.3.1. To formulate an inverse problem, we assume the observed density on the QoI is given by $\pi_D \sim N(0.7,\, 10^{-4})$.

                                                 h=1/10    h=1/20    h=1/40    h=1/80    h=1/160   Ref.      Truth
I($\pi_\Lambda^{\text{post}}$)                   0.993     0.986     0.982     0.978     0.980     0.980     1.000
KL($\pi_\Lambda : \pi_\Lambda^{\text{post}}$)    1.344     1.344     1.326     1.341     1.353     1.357     UNKN
Mean PF-post                                     0.700     0.700     0.700     0.700     0.700     0.700     0.700
Var. PF-post                                     0.997e-4  0.995e-4  0.978e-4  1.012e-4  1.003e-4  1.021e-4  1.000e-4

Table 2: Comparison of the integral of the posterior, the KL divergence from the prior to the posterior, and the mean and variance of the push-forward of the posterior obtained using various spatial approximations in Section 4.3.2.

The diagnostic data in Table 2 indicate that each spatial discretization can be used to solve the inverse problem, while the accuracy of the posterior depends on the accuracy of the approximate model. In Figure 6 (right), we plot the $L^1$-norm of the error in the posterior along with the $L^\infty$-norm of the error in the approximate model. We see that the error in the posterior converges at the same rate as the error in the approximate model.

In this section we consider the common case when two forms of approximation are used to quantify uncertainty.
Specifically, we consider the situation when a sparse grid surrogate of a discretized model is used. In this setting, there are several ways to define the sequence $(Q_n)$. We simply assume that the sequence is defined in such a way that for any $0 < m < n$, we have $h_n \le h_m$, $\Delta t_n \le \Delta t_m$, and $N_m \le N_n$. Combining Lemma 4.1 and (4.6) gives the following result.

Lemma 4.2.
For sufficiently smooth $Q(\lambda)$, the isotropic level-$n$ sparse grid (4.1) based on Clenshaw-Curtis abscissas with $N_n$ points and a discretization of (4.5) satisfies
$$\left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)} \le C(\sigma)\, N_n^{-\mu} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right),$$
where the constant $C(\sigma)$ depends on the size of the region of analyticity $\sigma$ of $Q$ but not on the number of points in the sparse grid, and $C(u)$ depends on the solution $u$ but not on the spatial and temporal resolutions $h_n$ and $\Delta t_n$, respectively.

Combining Lemma 4.2 with Theorem 2.3 gives the following result.
Corollary 4.5.
Under the assumptions of Theorem 2.3 and Lemma 4.2, the error in the push-forward of the prior using an isotropic sparse grid approximation based on Clenshaw-Curtis abscissas with $N_n$ points and a discretization of (4.5) satisfies
$$\left\| \pi_D^Q(q) - \hat{\pi}_D^{Q_n}(q) \right\|_{L^\infty(D)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right), \quad (4.11)$$
and
$$\left\| \pi_D^Q(Q(\lambda)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda)) \right\|_{L^\infty(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right), \quad (4.12)$$
where $\mu = \sigma/(1 + \log(2k))$.

Combining Lemma 4.2 with Theorem 3.4 gives the following result.
Corollary 4.6.
Under the assumptions of Theorem 3.4, the error in the posterior using an isotropic sparse grid approximation based on Clenshaw-Curtis abscissas with $N_n$ points and a discretization of (4.5) satisfies
$$\left\| \pi_\Lambda^{\text{post}}(\lambda) - \hat{\pi}_\Lambda^{\text{post},n}(\lambda) \right\|_{L^1(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right). \quad (4.13)$$

The goal of this section is to verify the convergence rates in Corollary 4.5. Consider the nonlinear system of ordinary differential equations governing a competitive Lotka-Volterra model of the population dynamics of species competing for some common resource. The model is given by
$$\frac{du_i}{dt} = r_i u_i \left( 1 - \sum_{j=1}^{3} \alpha_{ij} u_j \right), \quad t \in (0, 10], \qquad u_i(0) = u_{i,0}, \quad (4.14)$$
for $i = 1, 2,$
3. The initial condition, $u_{i,0}$, and the self-interaction terms, $\alpha_{ii}$, are given, but the remaining interaction parameters, $\alpha_{ij}$ with $i \ne j$, as well as the reproduction parameters, $r_i$, are unknown. Thus, we have a total of 9 uncertain parameters. We assume that these parameters are each uniformly distributed, and our quantity of interest is $u(10)$.

We approximate the solution to (4.14) in time using an explicit Euler method. For the reference solution, we use a very fine time step; the coarsest approximation uses a time step of $\Delta t = 1/10$ and an isotropic sparse grid of level-1 (19 model evaluations). We explore uniformly refining the time step with $\Delta t = 1/20, 1/40, 1/80, 1/160$ and refining the sparse grid to level-2 (181 model evaluations), level-3 (1177 model evaluations), and level-4 (5929 model evaluations).

In Figure 7 (left), we fix $\Delta t = 1/160$ and see that the error in the push-forward of the prior converges as the sparse grid is refined. In Figure 7 (right), we fix the sparse grid at level-4 and assess the convergence of the error in the push-forward of the prior as the temporal discretization is refined.

Figure 7: Convergence of the push-forward of the prior as the sparse grid discretization is refined and the temporal discretization is held fixed at the highest level (left), and as the temporal discretization is refined and the sparse grid discretization is held at the highest level (right).

We clearly see that the dominant contribution to the error comes from the sparse grid approximation; the convergence plots in Figure 7 stagnate once they reach the level-4 sparse grid error. Ideally, we would use an a posteriori error estimation technique (see, e.g., [23, 9, 8, 10, 6]) to decompose the error into its various contributions and adaptively choose which discretization to refine, but that is beyond the scope of this paper. We remark here that goal-oriented approaches for estimating the errors in approximations based upon spatial and temporal discretizations and surrogates were developed in [9] and further generalized in [10]. These approaches were then used in [23, 6] to separate the error into different contributions and adaptively control the error, and were extended in [40] to bound the error in probabilities of rare events. However, none of these works considers the error induced in estimates of push-forward densities.
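A minimal sketch of the explicit Euler discretization of (4.14) used in this section. The growth rates, interaction matrix, and initial condition below are hypothetical placeholders, not the values used in the study:

```python
def lotka_volterra_euler(r, alpha, u0, dt, T):
    """Forward-Euler steps for du_i/dt = r_i u_i (1 - sum_j alpha_ij u_j)."""
    u = list(u0)
    k = len(u)
    for _ in range(round(T / dt)):
        rates = [r[i] * u[i] * (1.0 - sum(alpha[i][j] * u[j] for j in range(k)))
                 for i in range(k)]
        u = [u[i] + dt * rates[i] for i in range(k)]
    return u

# Hypothetical 3-species setup: diagonal (self-interaction) terms fixed, with the
# off-diagonal interactions and growth rates playing the role of the 9 uncertain
# parameters.
r = [0.5, 0.6, 0.7]
alpha = [[1.0, 0.5, 0.5],
         [0.5, 1.0, 0.5],
         [0.5, 0.5, 1.0]]
u0 = [0.3, 0.3, 0.3]

coarse = lotka_volterra_euler(r, alpha, u0, dt=0.1, T=10.0)
fine   = lotka_volterra_euler(r, alpha, u0, dt=0.001, T=10.0)
err = max(abs(c - f) for c, f in zip(coarse, fine))
print(coarse, err)
```

Halving the time step should roughly halve `err`, consistent with the first-order $\Delta t_n$ term in Lemma 4.2.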
The goal of this section is to verify the convergence rate in Corollary 4.6. We use the model introduced in Section 4.4.1. To formulate an inverse problem, we assume the observed density on the QoI is given by $\pi_D \sim N(0.5,\, 10^{-4})$.

                                              Δt=1/10   Δt=1/20   Δt=1/40   Δt=1/80   Δt=1/160  Ref.      Truth
I($\pi_\Lambda^{\text{post}}$)                1.023     1.023     1.023     1.023     1.023     1.017     1.000
KL($\pi_\Lambda : \pi_\Lambda^{\text{post}}$) 2.449     2.453     2.455     2.456     2.457     2.552     UNKN
Mean PF-post                                  0.500     0.500     0.500     0.499     0.500     0.500     0.500
Var. PF-post                                  1.011e-4  1.031e-4  1.040e-4  1.040e-4  1.038e-4  1.029e-4  1.000e-4

Table 3: Comparison of the integral of the posterior, the KL divergence from the prior to the posterior, and the mean and variance of the push-forward of the posterior obtained using various temporal discretizations and a level-1 sparse grid in Section 4.4.2.

                                              Level-1   Level-2   Level-3   Level-4   Ref.      Truth
I($\pi_\Lambda^{\text{post}}$)                1.023     1.012     1.018     1.020     1.017     1.000
KL($\pi_\Lambda : \pi_\Lambda^{\text{post}}$) 2.449     2.544     2.557     2.559     2.552     UNKN
Mean PF-post                                  0.500     0.499     0.499     0.499     0.500     0.500
Var. PF-post                                  0.951e-4  1.022e-4  1.021e-4  1.059e-4  0.994e-4  1.000e-4

Table 4: Comparison of the integral of the posterior, the KL divergence from the prior to the posterior, and the mean and variance of the push-forward of the posterior obtained using various sparse grid approximations with a fixed temporal discretization in Section 4.4.2.

The diagnostic data in Tables 3 and 4 indicate that, even for the coarsest discretization, the approximate models satisfy Assumption 5 and can be used to solve the inverse problem. We also see that the KL divergence using the level-1 sparse grid does not give the same value as the other sparse grid levels or the reference solution. This indicates that while Assumption 5 is satisfied, the approximate model leads to a different posterior. In Figure 8, we plot the $L^1$-norm of the error in the posterior along with the $L^\infty$-norm of the error in the approximate model. We see that the error in the posterior converges at the same rate as the error in the approximate model.
As in Section 4.4.1, the dominant contribution to the error is due to the sparse grid approximation, so the convergence eventually stagnates when refining the temporal discretization.

Conclusions

We developed a theoretical framework for analyzing the convergence of probability density functions computed using approximate models for both forward and inverse problems. Our theoretical results are quite general: any L∞-convergent sequence of approximate models can be considered. We proved that the densities converge under quite reasonable assumptions, and rates of convergence are obtained under more stringent assumptions. The rates of convergence explicitly show the dependence on the error introduced by approximating densities and the error induced by the use of approximate models. We have verified the theoretical results using sequences of approximate models derived from discretized partial and ordinary differential equations as well as from sparse grid approximations.

Acknowledgments

J.D. Jakeman's work was partially supported by DARPA EQUiPS. T. Wildey's work was supported by the Office of Science Early Career Research Program.
Figure 8: Convergence of the posterior in the L²-norm as the sparse grid discretization is refined and the temporal discretization is held fixed at the highest level (left), and as the temporal discretization is refined and the sparse grid discretization is held at the highest level (right).

The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

References

[1] B. M. Adams, L. E. Bauman, W. J. Bohnhoff, K. R. Dalbey, M. S. Ebeida, J. P. Eddy, M. S. Eldred, P. D. Hough, K. T. Hu, J. D. Jakeman, J. A. Stephens, L. P. Swiler, D. M. Vigil, and T. M. Wildey. Dakota, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 6.0 Users Manual. Technical Report SAND2014-4633 (Version 6.6), Sandia National Laboratories, 2017.

[2] V. Barthelmann, E. Novak, and K. Ritter. High dimensional polynomial interpolation on sparse grids.
Advances in Computational Mathematics, 12:273–288, 2000.

[3] R. Becker and R. Rannacher. An optimal control approach to a posteriori error estimation in finite element methods. Acta Numerica, 10:1–102, 2001.

[4] J. M. Bernardo and A. F. M. Smith. Bayesian Theory. Wiley, 1994.

[5] D. Boos. A converse to Scheffé's theorem. The Annals of Statistics, 13(1):423–427, 1985.

[6] C. M. Bryant, S. Prudhomme, and T. Wildey. Error decomposition and adaptivity for response surface approximations from PDEs with parametric uncertainty. SIAM/ASA Journal on Uncertainty Quantification, 3(1):1020–1045, 2015.

[7] H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, 13:147–269, 2004.

[8] T. Butler, P. Constantine, and T. Wildey. A posteriori error analysis of parameterized linear systems using spectral methods. SIAM J. Matrix Anal. Appl., 33:195–209, 2012.

[9] T. Butler, C. Dawson, and T. Wildey. A posteriori error analysis of stochastic spectral methods. SIAM J. Sci. Comput., 33:1267–1291, 2011.
[10] T. Butler, C. Dawson, and T. Wildey. Propagation of uncertainties using improved surrogate models.
SIAM/ASA Journal on Uncertainty Quantification, 1(1):164–191, 2013.

[11] T. Butler, J. Jakeman, and T. Wildey. Combining push-forward measures and Bayes' rule to construct consistent solutions to stochastic inverse problems. Accepted for publication in SIAM J. Sci. Comput., 2017.

[12] P. G. Ciarlet. Basic error estimates for elliptic problems. In Handbook of Numerical Analysis, Vol. II, pages 17–351. North-Holland, Amsterdam, 1991.

[13] C. Dellacherie and P. A. Meyer. Probabilities and Potential. North-Holland Publishing Co., Amsterdam, 1978.

[14] L. Devroye and L. Györfi. Nonparametric Density Estimation: The L1 View. Wiley, New York, 1985.

[15] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson. Computational Differential Equations. Cambridge University Press, Cambridge, 1996.

[16] B. Ganis, H. Klie, M. F. Wheeler, T. Wildey, I. Yotov, and D. Zhang. Stochastic collocation and mixed finite elements for flow in porous media. Computer Methods in Applied Mechanics and Engineering, 197(43):3547–3559, 2008.

[17] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. Bayesian Data Analysis, Third Edition. Chapman and Hall/CRC, 2013.

[18] R. Ghanem and P. Spanos. Stochastic Finite Elements: A Spectral Approach. Springer-Verlag, New York, 2002.

[19] M. B. Giles and E. Süli. Adjoint methods for PDEs: a posteriori error analysis and postprocessing by duality. Acta Numerica, 11:145–236, 2002.

[20] B. E. Hansen. Uniform convergence rates for kernel estimation with dependent data. Econometric Theory, 24(3):726–748, 2008.

[21] J. D. Jakeman, M. Eldred, and D. Xiu. Numerical approach for quantification of epistemic uncertainty. J. Comput. Phys., 229(12):4648–4663, 2010.

[22] J. D. Jakeman and S. G. Roberts. Local and dimension adaptive stochastic collocation for uncertainty quantification. In J. Garcke and M. Griebel, editors, Sparse Grids and Applications, volume 88 of Lecture Notes in Computational Science and Engineering, pages 181–203. Springer, Berlin Heidelberg, 2013.

[23] J. D. Jakeman and T. Wildey. Enhancing adaptive sparse grid approximations and improving refinement strategies using adjoint-based a posteriori error estimates. Journal of Computational Physics, 280:54–71, 2015.

[24] E. T. Jaynes. Probability Theory: The Logic of Science. 1998.

[25] M. C. Kennedy and A. O'Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3):425–464, 2001.

[26] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22:79–86, 1951.

[27] X. Ma and N. Zabaras. An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. J. Comput. Phys., 228:3084–3113, 2009.

[28] X. Ma and N. Zabaras. An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. J. Comput. Phys., 228(8):3084–3113, 2009.
[29] Y. Marzouk and D. Xiu. A stochastic collocation approach to Bayesian inference in inverse problems.
Communications in Computational Physics, 6(1):826–847, 2009.

[30] F. Nobile, R. Tempone, and C. G. Webster. A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal., 46(5):2309–2345, 2008.

[31] F. Nobile, R. Tempone, and C. G. Webster. A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM Journal on Numerical Analysis, 46(5):2309–2345, 2008.

[32] J. T. Oden and S. Prudhomme. Goal-oriented error estimation and adaptivity for the finite element method. Computers & Mathematics with Applications, 41(5):735–756, 2001.

[33] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[34] C. P. Robert. The Bayesian Choice: A Decision-Theoretic Motivation (second ed.). Springer, 2001.

[35] C. Schwab and R. A. Todor. Karhunen–Loève approximation of random fields by generalized fast multipole methods. Journal of Computational Physics, 217(1):100–122, 2006.

[36] A. M. Stuart. Inverse problems: A Bayesian perspective. Acta Numerica, 19:451–559, 2010.

[37] T. J. Sweeting. On a converse to Scheffé's theorem. The Annals of Statistics, 14(3):1252–1256, 1986.

[38] G. R. Terrell and D. W. Scott. Variable kernel density estimation. The Annals of Statistics, 20(3):1236–1265, 1992.

[39] M. F. Wheeler, T. Wildey, and I. Yotov. A multiscale preconditioner for stochastic mortar mixed finite elements. Computer Methods in Applied Mechanics and Engineering, 200(9):1251–1262, 2011.

[40] T. Wildey and T. Butler. Utilizing error estimates for surrogate models to accurately predict probabilities of events. To appear in Int. J. Uncertainty Quantification, 2018.

[41] D. Xiu and G. Karniadakis. The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput., 24:619–644, 2002.

[42] D. Zhang and Z. Lu. An efficient, high-order perturbation approach for flow in random porous media via Karhunen–Loève and polynomial expansions.