Convergence of Probability Densities using Approximate Models for Forward and Inverse Problems in Uncertainty Quantification
T. Butler∗, J. D. Jakeman†, T. Wildey†

July 3, 2018
Abstract
We analyze the convergence of probability density functions utilizing approximate models for both forward and inverse problems. We consider the standard forward uncertainty quantification problem where an assumed probability density on parameters is propagated through the approximate model to produce a probability density, often called a push-forward probability density, on a set of quantities of interest (QoI). The inverse problem considered in this paper seeks a posterior probability density on model input parameters such that the subsequent push-forward density through the parameter-to-QoI map matches a given probability density on the QoI. We prove that the probability densities obtained from solving the forward and inverse problems, using approximate models, converge to the true probability densities as the approximate models converge to the true models. Numerical results are presented to demonstrate optimal convergence of probability densities for sparse grid approximations of parameter-to-QoI maps and standard spatial and temporal discretizations of PDEs and ODEs.
Keywords. inverse problems, uncertainty quantification, density estimation, surrogate modeling, response surface approximations, discretization errors
Assessing modeling uncertainties is essential for credible simulation-based prediction and design. Forward uncertainty quantification (UQ) problems involve estimating uncertainty in model outputs caused by uncertain inputs. Inverse UQ problems involve using (noisy) data associated with (a subset of) model outputs to update prior information on model inputs. Practical UQ studies often require the solution of both an inverse and a forward problem. Unfortunately, evaluating high-fidelity models is often computationally demanding, and methods for solving either forward or inverse UQ problems typically require generating large ensembles of simulation runs evaluated at varying realizations of the random variables used to characterize the model input uncertainty. UQ analyses are also complicated by the simple fact that many of the governing equations used to model physical systems can rarely be solved analytically and so instead must be solved approximately.

In this paper, we investigate how using approximate models affects the probability densities that solve forward and inverse UQ problems. The theory we develop is general. We demonstrate its utility using common forms of approximations, namely, temporal and spatial discretization for the numerical solution of differential equations, and sparse grid surrogate models.

The convergence of certain statistical quantities (e.g., mean and variance) is well-studied for many popular choices of surrogate approximations including generalized polynomial chaos expansions (PCE) [18, 41], sparse grid interpolation [2, 28], and Gaussian process models [33]. However, little attention in the literature is given to the impact of surrogate approximations on probability density functions. For example, PCE approximations for random variables with finite second moments exhibit mean-square convergence [18], and thus a sequence of PCE converges in both probability and distribution. But while Scheffé's theorem states that almost everywhere (a.e.) convergence of probability density functions implies convergence in distribution, the converse is generally not true. For a classical counterexample to the converse, consider the sequence of
random variables (X_n) with densities 1 − cos(2πnx) for x ∈ [0, 1]: the (X_n) converge in distribution to a uniform random variable on [0, 1], but the densities oscillate and fail to converge at almost every point.

[Disclosure: See acknowledgements. Corresponding author: T. Wildey ([email protected]). ∗ CUB. † Optimization and Uncertainty Quantification Department, Sandia National Labs, Albuquerque, NM, USA.]

(Forward Problem) Given a probability density describing the uncertain model inputs, the forward problem seeks to determine the push-forward probability density obtained by propagating the input density through the parameter-to-QoI map.

(Inverse Problem)
Given an observed probability density, the inverse problem seeks a pullback probability density for the model inputs that, when propagated through the parameter-to-QoI map, produces a push-forward density that exactly matches the observed density.
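Before turning to these problems, the introduction's counterexample to the converse of Scheffé's theorem is easy to verify numerically. The sketch below uses the densities 1 − cos(2πnx) on [0, 1]: the CDFs tend to the uniform CDF while the densities remain at sup-distance 1 from the uniform density.

```python
import numpy as np

# Numerical check of the counterexample from the introduction: densities
# f_n(x) = 1 - cos(2*pi*n*x) on [0, 1]. The CDFs converge to the uniform
# CDF (convergence in distribution), but the densities do not converge.
x = np.linspace(0.0, 1.0, 2001)

cdf_errs, pdf_errs = [], []
for n in (1, 10, 100):
    pdf = 1.0 - np.cos(2.0 * np.pi * n * x)
    cdf = x - np.sin(2.0 * np.pi * n * x) / (2.0 * np.pi * n)
    cdf_errs.append(float(np.max(np.abs(cdf - x))))    # -> 0 like 1/(2*pi*n)
    pdf_errs.append(float(np.max(np.abs(pdf - 1.0))))  # stays at exactly 1
print(cdf_errs, pdf_errs)
```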
We prove convergence results for both the forward and inverse problems in the total variation metric (i.e., the so-called "statistical distance" metric). Specifically, we show that, under suitable conditions, sequences of approximate push-forward or pullback densities obtained using approximate models converge at a rate proportional to the rate of convergence of the approximate models to the true model. To our knowledge, this analysis is the first of its kind and exploits a special form of the converse of Scheffé's theorem first proven in [5] and subsequently generalized in [37]. Under more restrictive conditions, namely those necessary for convergence of a standard kernel density estimator, we prove that the rates of convergence are bounded by the error in the kernel density approximation and the L∞-error in the approximate model.

Complementary work on the convergence of classical Bayesian inverse problems using PCE is studied in [29]. In that work, the Kullback-Leibler divergence (KLD) is used to measure the difference between the true and approximate posterior densities (using standard assumptions in classical Bayesian analysis). Convergence of the densities in the sense of the KLD converging to zero is proven. Furthermore, the convergence of the KLD shown in [29] does imply convergence to the classical Bayesian posterior in the total variation metric by application of Pinsker's inequality. However, the analysis provided in [29] does not generalize to either the forward or inverse problems studied in this work. Specifically, we do not restrict approximate models to be defined by PCE surrogates, and, as shown in [11], the classical Bayesian posterior is not designed to give a pullback measure.
Moreover, we allow the observed densities used to define our posteriors (i.e., the pullback measures) to be of a more general class than the Gaussian distributions assumed for the error models in [29]. Additionally, the posteriors we obtain are not simply normalized by a constant as with classical Bayesian posteriors. Subsequently, key inequalities such as (4.11) in [29], used to prove the fundamental lemmas for the convergence of the KLD for a classical Bayesian posterior, simply do not apply to our pullback densities.

The remainder of the paper is organized as follows. In Section 2 we provide a formal definition of the forward problem and discuss the theoretical aspects of its solution using approximate models. We then introduce the inverse problem and prove that the posterior density corresponding to an approximate model converges to the true posterior. In Section 4, we review some important classes of approximate models and use our general theoretical results to provide specific error bounds for the classes of approximate models we consider. For each of these applications, we provide numerical results to complement our theoretical results and highlight important aspects of the forward and inverse problems. We provide concluding remarks in Section 5.

In this section we consider the solution of the forward problem using approximate models. We use the term approximate models in a broad sense to mean any sequence of approximations to the response of the model outputs. Specifically, for a given model, let Λ ⊂ R^k denote a space of inputs to the model that we refer to simply as parameters. Given a set of quantities of interest (QoI), we define the parameter-to-QoI map Q(λ) : Λ → D ⊂ R^m. The range of the QoI map, D := Q(Λ), describes the space of observable data for the QoI that can be predicted by the model. We let (Q_n) denote a sequence of approximate parameter-to-QoI maps defined by the approximate models.
To facilitate the analysis of solutions to the forward problem presented in the introduction, we first formalize the forward problem definition. Let (Λ, B_Λ, μ_Λ) and (D, B_D, μ_D) denote measure spaces with B_Λ and B_D the Borel σ-algebras inherited from the metric topologies on Λ ⊂ R^k and D ⊂ R^m, respectively. The measures μ_Λ and μ_D are the dominating measures with respect to which probability densities (i.e., Radon-Nikodym derivatives of probability measures) are defined on each space.

Definition 2.1 (Forward Problem and Push-Forward Measure). Given a probability measure P_Λ on (Λ, B_Λ) that is absolutely continuous with respect to μ_Λ and admits a density π_Λ, the forward problem is the determination of the push-forward probability measure

    P^Q_D(A) = P_Λ(Q^{-1}(A)), ∀ A ∈ B_D,

on (D, B_D) that is absolutely continuous with respect to μ_D and admits a density π^Q_D.

Here, and in the remainder of the paper, we assume that the parameter-to-QoI map, Q, is a measurable and piecewise smooth map between (Λ, B_Λ) and (D, B_D) so that

    Q^{-1}(A) = {λ ∈ Λ | Q(λ) ∈ A} ∈ B_Λ, and Q(Q^{-1}(A)) = A.

If P_Λ is described in terms of a density π_Λ with respect to μ_Λ (i.e., π_Λ = dP_Λ/dμ_Λ is the Radon-Nikodym derivative of P_Λ), it is not necessarily the case that P^Q_D is absolutely continuous with respect to the Lebesgue measure on D. Following [11], we assume that either the measure μ_D on (D, B_D) is defined as the push-forward of μ_Λ, or the push-forward of μ_Λ is absolutely continuous with respect to a specified μ_D.

In practice, even if an exact parameter-to-QoI map Q(λ) is available, we will often approximate the push-forward of π_Λ using finite sampling and standard density estimation techniques.
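This sampling-plus-density-estimation step is straightforward in practice. The following minimal sketch uses an entirely hypothetical map Q(λ) = λ₁² + λ₂ and a uniform input density on [−1, 1]² (neither taken from the paper), with SciPy's Gaussian KDE standing in for a generic kernel density estimator:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def Q(lam):
    """Hypothetical parameter-to-QoI map Q: Lambda -> D (not from the paper)."""
    return lam[:, 0]**2 + lam[:, 1]

# Sample the input density pi_Lambda (uniform on [-1, 1]^2) and push forward.
M = 5000
lam = rng.uniform(-1.0, 1.0, size=(M, 2))
q = Q(lam)

# Estimate the push-forward density pi_D^Q with a Gaussian KDE.
pf_kde = gaussian_kde(q)
print(pf_kde(np.array([0.0, 0.5])))  # density estimates at q = 0 and q = 0.5
```

The KDE is then a cheap, everywhere-evaluable stand-in for the push-forward density.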
We formalize, in Assumption 1 below, the types of parameter densities π_Λ for which we may reasonably expect to obtain accurate approximations of π^Q_D using Monte Carlo sampling and standard density estimation techniques.

Assumption 1.
For a given Q(λ), π_Λ is chosen so that sup_{q∈D} π^Q_D(q) ≤ B for some B > 0, and π^Q_D is continuous on D except possibly on a set A ⊂ D of zero μ_D-measure.

Generally, for any finite set of samples from any distribution, {q_i}_{i=1}^M, a standard kernel density estimate of a density π^Q_D(q) will produce a bounded approximation that is continuous everywhere and has the form

    π̂^Q_D(q) = (1 / (M h_M^m)) Σ_{i=1}^M K((q − q_i)/h_M),   (2.1)

where h_M is the bandwidth parameter and K(q) is the kernel function. It is common to assume that the kernel is integrable with ∫_D K(q) dμ_D = 1 and is bounded, i.e., there exists a constant κ such that ‖K(q)‖_{L∞(D)} ≤ κ < ∞.

The accuracy of the estimated push-forward density depends on the number of samples M and the dimension m of the space. The following result from [20] gives a rate of convergence in the L∞-norm under certain assumptions on the regularity of the density and the kernel.

Theorem 2.1. If π^Q_D and the s-th order derivatives of π^Q_D are uniformly continuous, K(q) is an s-th order kernel that is bounded and integrable, and h_M satisfies the criteria described in [20], then the error in the kernel density estimate given by (2.1) satisfies

    ‖π^Q_D(q) − π̂^Q_D(q)‖_{L∞(D)} ≤ C (log M / M)^{s/(2s+m)}.

The Gaussian kernel is a popular choice for which s = 2, yielding an O(M^{−2/(4+m)}) rate of convergence in the L∞-norm if one ignores the log factor. The rate of convergence of the KDE using the Gaussian kernel can also be shown to be O(M^{−4/(4+m)}) in the mean-squared error [38] and O(M^{−2/(4+m)}) in the L¹-error [14] under similar assumptions on the kernel, the bandwidth parameter, and the regularity of π^Q_D. Since the rate of convergence is rather slow in M and scales poorly with dimension, KDEs often require a large number of samples to achieve an acceptable level of accuracy, motivating the use of (computationally inexpensive) approximate models.

Let (Q_n(λ)) denote a sequence of approximations to Q(λ). The use of approximate QoI maps introduces an additional error in the estimates of push-forward densities. Two practical requirements are needed to approximate the push-forward densities using any particular Q_n(λ). The first requirement is that the approximate push-forward densities are uniformly bounded if the exact push-forward density is bounded, so that point-wise errors are not allowed to become arbitrarily large, in which case we would not expect convergence at all. The second requirement puts constraints on the continuity of the approximate push-forward density. To formalize the second requirement, we use a generalized notion of equicontinuity to consider functions that may have many points of discontinuity, such as density functions that are only continuous in an a.e. sense.

Definition 2.2.
Using similar notation from [37], we say that a sequence of real-valued functions (u_n) defined on R^k is asymptotically equicontinuous (a.e.c.) at x ∈ R^k if

    ∀ ε > 0, ∃ δ(x, ε) > 0 and n₀(x, ε) s.t. |y − x| < δ(x, ε), n > n₀(x, ε) ⇒ |u_n(y) − u_n(x)| < ε.

If δ(x, ε) = δ(ε) and n₀(x, ε) = n₀(ε), then we say that the sequence is asymptotically uniformly equicontinuous (a.u.e.c.).

Using this definition, and letting π^{Q_n}_D denote the push-forward of the prior density using the map Q_n(λ), we make the following assumption to encode our two practical requirements.

Assumption 2.
Let (Q_n(λ)) denote a sequence of approximations to Q(λ). Then there exists B > 0 such that for any n, sup_{q∈D} π^{Q_n}_D(q) ≤ B. Moreover, for any δ > 0, there exists N_δ ⊂ D such that A ⊂ N_δ, μ_D(N_δ) < δ, and the sequence of approximate push-forward densities is a.u.e.c. on D \ N_δ.

This assumption allows for the construction of an approximate push-forward density that is discontinuous more often than the exact push-forward density, as long as the magnitude of the discontinuities decreases asymptotically, except possibly in a set that can be made arbitrarily small in μ_D-measure and that contains the discontinuities of the exact π^Q_D. Under Assumptions 1 and 2, we can construct push-forward densities using approximate models that converge as the approximate model is refined.

Theorem 2.2 (Convergence of Push-Forward Densities). Let (Q_n(λ)) denote a sequence of approximations to Q(λ) such that Q_n(λ) → Q(λ) in L∞(Λ) as n → ∞, i.e.,

    ∀ δ > 0, ∃ N s.t. n > N ⇒ ‖Q_n(λ) − Q(λ)‖_{L∞(Λ)} < δ.   (2.2)

If Assumptions 1 and 2 hold, then for any ε > 0 there exists N such that n > N implies that both

    ‖π^Q_D(q) − π^{Q_n}_D(q)‖_{L∞(D)} < ε,   (2.3)

and

    ‖π^Q_D(Q(λ)) − π^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} < ε.   (2.4)

In Theorem 2.2, (2.3) implies that the approximate push-forward densities converge in L∞(D), i.e., the densities associated with the forward propagation of densities converge on D when evaluated using exact values of the QoI q.
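The setting of Theorem 2.2 can be illustrated numerically. In the sketch below (hypothetical choices, not from the paper: a uniform input density on [0, 1], an exact map Q(λ) = λ + λ²/4, and perturbed approximations Q_n = Q + sin(nλ)/n so that ‖Q_n − Q‖_{L∞} ≤ 1/n), KDE estimates of the exact and approximate push-forward densities are compared on a grid:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Hypothetical setup (not from the paper): uniform density on Lambda = [0, 1],
# exact map Q(x) = x + x^2/4, approximate maps Q_n with ||Q_n - Q||_inf <= 1/n.
lam = rng.uniform(0.0, 1.0, size=20000)
Q = lambda x: x + 0.25 * x**2
Q_n = lambda x, n: Q(x) + np.sin(n * x) / n

grid = np.linspace(0.1, 1.1, 200)
pf_exact = gaussian_kde(Q(lam))(grid)  # KDE of the exact push-forward density

errs = []
for n in (2, 8, 32):
    pf_approx = gaussian_kde(Q_n(lam, n))(grid)
    errs.append(float(np.max(np.abs(pf_exact - pf_approx))))
print(errs)  # sup-differences shrink as the model error ~1/n shrinks
```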
However, in practice, we evaluate the approximate push-forward density at an approximate QoI value to determine variations in relative likelihoods of the QoI data as parameters are varied. Equation (2.4) states that the approximate push-forward densities evaluated at approximate values of the QoI converge in L∞(Λ).¹

¹Using this definition of equicontinuity, sequences of functions that are either equicontinuous or uniformly equicontinuous in the classical sense are automatically a.e.c. or a.u.e.c., since the definitions coincide if this definition is restricted to sequences of continuous functions.

Before proving Theorem 2.2, we first provide some context for the approach. Certainly, for any p ≥ 1, convergence in L^p implies convergence in probability, which in turn implies convergence in distribution (i.e., weak convergence), so we have that the sequence of push-forward measures associated with Q_n(λ) converges weakly to the push-forward measure of Q(λ). While Scheffé's theorem states that a.e. convergence of densities implies convergence in distribution of the random variables, the converse is generally not true, as mentioned in the introduction.

In [5], a converse to Scheffé's theorem is proven under the conditions that the densities associated with weakly convergent distributions are point-wise bounded and uniformly equicontinuous, from which the classical Arzelà-Ascoli theorem implies uniform convergence of the densities. Subsequently, in [37], this converse to Scheffé's theorem was generalized to classes of densities that are a.u.e.c. for sequences of distributions converging weakly to a distribution with a continuous density. Therefore, in the proof below, we begin by isolating discontinuities in the exact and approximate push-forward densities using Assumptions 1 and 2 in order to apply this converse to Scheffé's theorem on "most" of D.

Proof.
Let ε > 0 and choose δ = ε/(2(B + B)), where B denotes the bounds in Assumptions 1 and 2. Let N_δ denote the associated set with μ_D(N_δ) < δ from Assumption 2. Then,

    ‖π^Q_D(q) − π^{Q_n}_D(q)‖_{L∞(D)} ≤ ‖π^Q_D(q) − π^{Q_n}_D(q)‖_{L∞(N_δ)} + ‖π^Q_D(q) − π^{Q_n}_D(q)‖_{L∞(D\N_δ)}.   (2.5)

By the choice of δ, the first term on the right-hand side of (2.5) is bounded by ε/2. By Theorem 1 in [37], π^{Q_n}_D → π^Q_D uniformly on D \ N_δ. Thus, the second term on the right-hand side of (2.5) can also be bounded by ε/2 for n sufficiently large, which proves (2.3).

To prove (2.4), we first apply a triangle inequality to get

    ‖π^Q_D(Q(λ)) − π^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} ≤ ‖π^Q_D(Q(λ)) − π^Q_D(Q_n(λ))‖_{L∞(Λ)} + ‖π^Q_D(Q_n(λ)) − π^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)}.   (2.6)

By (2.2) and Assumption 1, there exist δ > 0 and N such that n > N implies the first term on the right-hand side of (2.6) is bounded by ε/2. Note that the norm for the second term on the right-hand side of (2.6) is equivalent to the L∞(D) norm since the arguments of the densities are identical. Then, by the above argument, this can be bounded by ε/2 for n sufficiently large, which proves (2.4). □

The next lemma states that the KDE approximation using the sequence of approximate models converges to the KDE approximation using the true model. The proof of Lemma 2.1 is straightforward and is omitted for the sake of brevity.
Lemma 2.1.
Assume that a set of M samples is used to generate KDE approximations using the true model and the approximate model, giving π̂^Q_D and π̂^{Q_n}_D, respectively. If K(q) is Lipschitz continuous, then we have the following bounds on the error in the KDE approximation using the approximate model:

    ‖π̂^Q_D(q) − π̂^{Q_n}_D(q)‖_{L∞(D)} ≤ C ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)},   (2.7)

and

    ‖π̂^Q_D(Q(λ)) − π̂^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} ≤ C ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)}.   (2.8)

It is important to note that Lemma 2.1 does not require the same assumptions as Theorem 2.1, since it only shows that, for a given set of samples, the KDE approximation using the approximate model converges to the KDE approximation using the true model. This is true even if the KDE approximation using the true model does not converge to the true density.
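The bound (2.7) can be sanity-checked numerically. In the sketch below (hypothetical 1D samples, with a perturbation of size exactly 1/n standing in for the model error, and a fixed-bandwidth Gaussian kernel so that the Lipschitz assumption on K holds), the KDE-to-KDE sup-difference scales with the perturbation size:

```python
import numpy as np

rng = np.random.default_rng(2)

def kde(points, centers, h):
    """Gaussian KDE with a fixed bandwidth h (1D), evaluated at `points`."""
    z = (points[:, None] - centers[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

# Hypothetical samples (not from the paper); a shift of size 1/n plays the
# role of the model error ||Q - Q_n||_inf.
q_exact = rng.uniform(0.0, 1.0, size=2000)
grid = np.linspace(-0.2, 1.2, 300)
h = 0.05  # fixed bandwidth: the same Lipschitz kernel for both estimates

diffs = []
for n in (4, 16, 64):
    q_approx = q_exact + 1.0 / n
    d = float(np.max(np.abs(kde(grid, q_exact, h) - kde(grid, q_approx, h))))
    diffs.append(d)
print(diffs)  # consistent with (2.7): the sup-difference shrinks with 1/n
```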
Under the stricter assumptions in Theorem 2.1, i.e., those necessary to prove convergence of the KDE, we prove that the approximation of the push-forward using the KDE and the approximate model converges to the true density at a rate that depends on both the KDE approximation error and the approximate model error. In Section 4, we give corollaries to this result for specific choices of approximate models.
Theorem 2.3 (Convergence of KDE Approximations of Push-Forward Densities). Assume that a KDE approximation is generated using M samples of the approximate model. If π^Q_D, K(q), and the bandwidth parameter satisfy the assumptions in Theorem 2.1, and if K is also Lipschitz continuous, then we have the following bounds on the error in the KDE approximation using the approximate model:

    ‖π^Q_D(q) − π̂^{Q_n}_D(q)‖_{L∞(D)} ≤ C ((log M / M)^{s/(2s+m)} + ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)}),   (2.9)

and

    ‖π^Q_D(Q(λ)) − π̂^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} ≤ C ((log M / M)^{s/(2s+m)} + ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)}).   (2.10)

Proof.
First we prove (2.9). An application of the triangle inequality gives

    ‖π^Q_D(q) − π̂^{Q_n}_D(q)‖_{L∞(D)} ≤ ‖π^Q_D(q) − π̂^Q_D(q)‖_{L∞(D)} + ‖π̂^Q_D(q) − π̂^{Q_n}_D(q)‖_{L∞(D)}.

The first term is bounded using Theorem 2.1 and the second term is bounded using Lemma 2.1. Next, we prove (2.10). Proceeding as before, we apply the triangle inequality twice to obtain

    ‖π^Q_D(Q(λ)) − π̂^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)} ≤ ‖π^Q_D(Q(λ)) − π̂^Q_D(Q(λ))‖_{L∞(Λ)} + ‖π̂^Q_D(Q(λ)) − π̂^{Q_n}_D(Q(λ))‖_{L∞(Λ)} + ‖π̂^{Q_n}_D(Q(λ)) − π̂^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)}.

The first term is bounded using Theorem 2.1, and the second and third terms are bounded using Lemma 2.1. □
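The first, KDE-only, term in these bounds can also be observed empirically. The sketch below (an illustrative assumption only: samples from a standard normal, whose density is known exactly) measures the sup-error of a Gaussian KDE on an interior grid as M grows:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(3)

# KDE sup-error against a known density (standard normal), measured on an
# interior grid for increasing sample sizes M.
grid = np.linspace(-2.0, 2.0, 200)
true_pdf = norm.pdf(grid)

errs = []
for M in (100, 1000, 10000):
    q = rng.standard_normal(M)
    errs.append(float(np.max(np.abs(gaussian_kde(q)(grid) - true_pdf))))
print(errs)  # decays slowly in M, in line with a (log M / M)^(s/(2s+m)) rate
```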
In this section we consider the solution of an inverse problem using approximate models.
To facilitate analysis of solutions to the inverse problem presented in the introduction, we first formalize its definition.
Definition 3.1 (Inverse Problem and Consistent Measure). Given a probability measure P_D on (D, B_D) that is absolutely continuous with respect to μ_D and admits a density π_D, the inverse problem is to determine a probability measure P^post_Λ on (Λ, B_Λ) that is absolutely continuous with respect to μ_Λ and admits a probability density π^post_Λ, such that the subsequent push-forward measure induced by the map Q(λ) satisfies

    P^post_Λ(Q^{-1}(A)) = P^Q_D(A) = P_D(A),   (3.1)

for any A ∈ B_D. We refer to any probability measure P^post_Λ that satisfies (3.1) as a consistent solution to the inverse problem.

Clearly, the inverse problem may not have a unique solution, i.e., there may be multiple probability measures that push forward to the observed measure. This is analogous to a deterministic inverse problem where multiple sets of parameters may produce a fixed observed datum. A unique solution may be obtained by imposing additional constraints or structure on the inverse problem. In this paper, such structure is obtained by incorporating prior information to construct a unique solution to the inverse problem, as first proposed in [11].
Given a prior probability measure P_Λ on (Λ, B_Λ) that is absolutely continuous with respect to μ_Λ and admits a probability density π_Λ, we make the following assumption to guarantee existence and uniqueness of a solution in terms of a density that is computable with the standard density approximation techniques necessary for approximation of π^Q_D.

Assumption 3. There exists C > 0 such that π_D(q) ≤ C π^Q_D(q) for a.e. q ∈ D.

Since the observed density and the model are assumed to be fixed, this is only an assumption on the prior. We sometimes refer to this assumption as the Predictability Assumption, since it implies that any output event with non-zero observed probability has a non-zero predicted probability defined by the push-forward of the prior. This assumption is consistent with the convention in Bayesian methods to choose the prior to be as general as possible, because if the prior predicts that the probability of an event that actually occurs is zero, then even exhaustive sampling of the prior will be insufficient for incorporating data associated with this event into the posterior measure.

Given an appropriate prior, constructing a consistent posterior solution is based on the following result.
Theorem 3.1 (Disintegration Theorem [13]). Assume Q : Λ → D is B_Λ-measurable, P_Λ is a probability measure on (Λ, B_Λ), and P_D is the push-forward measure of P_Λ on (D, B_D). There exists a P_D-a.e. uniquely defined family of conditional probability measures {P_q}_{q∈D} on (Λ, B_Λ) such that for any A ∈ B_Λ,

    P_q(A) = P_q(A ∩ Q^{-1}(q)), so P_q(Λ \ Q^{-1}(q)) = 0,

and there exists the following disintegration of P_Λ:

    P_Λ(A) = ∫_D P_q(A) dP_D(q) = ∫_D ( ∫_{A ∩ Q^{-1}(q)} dP_q(λ) ) dP_D(q),   (3.2)

for A ∈ B_Λ.

Assumption 3 automatically gives that P_D is absolutely continuous with respect to μ_D. Thus, writing dP_D(q) = π_D(q) dμ_D(q) and using the prior to define the conditional densities in the iterated integral (3.2), we arrive at the following theorem.

Theorem 3.2 (Existence and Uniqueness [11]). The probability measure P^post_Λ on (Λ, B_Λ) defined by

    P^post_Λ(A) = ∫_D ( ∫_{A ∩ Q^{-1}(q)} π_Λ(λ) [π_D(Q(λ)) / π^Q_D(Q(λ))] dμ_{Λ,q}(λ) ) dμ_D(q), ∀ A ∈ B_Λ,   (3.3)

is a consistent solution to the inverse problem in the sense of (3.1) and is uniquely determined for a given prior probability measure P_Λ on (Λ, B_Λ). The probability density of the consistent solution is given by

    π^post_Λ(λ) = π_Λ(λ) [π_D(Q(λ)) / π^Q_D(Q(λ))], λ ∈ Λ.   (3.4)

Each of the terms in (3.4) has a particular statistical interpretation, and we refer the interested reader to [11] for a detailed discussion. Computing the posterior density (3.4) only requires the construction of the push-forward of the prior, π^Q_D, since the prior and the observed densities are assumed a priori.

When using approximate models to solve the inverse problem, we must make the following assumption, which is analogous to Assumption 3, to guarantee existence and uniqueness of approximate solutions.
Assumption 4.
There exists C > 0 such that π_D(q) ≤ C π^{Q_n}_D(q) for a.e. q ∈ D.

Violation of Assumption 4 implies that, for the chosen π_Λ, the approximate forward map given by Q_n(λ) cannot predict the observed data.
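To make the construction concrete, here is a minimal sketch of evaluating the update (3.4) with an approximate map and entirely hypothetical ingredients (a uniform prior on [−1, 1], map Q_n(λ) = λ³, and a Gaussian observed density, none of which are from the paper), using a KDE for the push-forward of the prior and accept-reject sampling with the resulting density ratio:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(4)

# Hypothetical ingredients (not from the paper): uniform prior on [-1, 1],
# approximate QoI map Q_n(lam) = lam**3, Gaussian observed density pi_D.
lam = rng.uniform(-1.0, 1.0, size=50000)   # samples of the prior pi_Lambda
Qn = lambda x: x**3
obs = norm(loc=0.25, scale=0.1)            # observed density on the QoI

q = Qn(lam)
pf_prior = gaussian_kde(q)                 # KDE of the push-forward of the prior

# Ratio pi_D(Q_n(lam)) / pi_D^{Q_n}(Q_n(lam)) from the update (3.4);
# accept-reject with this ratio converts prior samples to posterior samples.
ratio = obs.pdf(q) / pf_prior(q)
keep = rng.uniform(0.0, 1.0, size=lam.size) < ratio / ratio.max()
posterior_samples = lam[keep]
print(posterior_samples.size)
```

By construction, pushing the accepted samples forward through Q_n reproduces (approximately) the observed density, which is exactly the consistency requirement (3.1).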
Recalling the formal expression for π^post_Λ given by (3.4), we define

    π^{post,n}_Λ(λ) = π_Λ(λ) [π_D(Q_n(λ)) / π^{Q_n}_D(Q_n(λ))] = π_Λ(λ) r_n(λ), with r_n(λ) = π_D(Q_n(λ)) / π^{Q_n}_D(Q_n(λ)).   (3.5)

The error in the total variation metric of the approximate posterior pdf is given by

    ‖π^{post,n}_Λ(λ) − π^post_Λ(λ)‖_{L¹(Λ)} = ∫_Λ π_Λ(λ) |r_n(λ) − r(λ)| dμ_Λ.   (3.6)

Since π^{post,n}_Λ(λ) and π^post_Λ(λ) are both in L¹(Λ), we have that π_Λ(λ) |r_n(λ) − r(λ)| is also in L¹(Λ). By a standard result in measure theory, for each η ∈ (0, 1) there exists a compact set Λ_η ⊂ Λ such that

    ‖π^{post,n}_Λ(λ) − π^post_Λ(λ)‖_{L¹(Λ \ Λ_η)} < η.

This immediately implies that both

    ∫_{Λ_η} π^post_Λ(λ) dμ_Λ ≥ 1 − η, and ∫_{Λ_η} π^{post,n}_Λ(λ) dμ_Λ ≥ 1 − η.

Thus, we can rewrite the error shown in (3.6) as the sum of two integrals. The first integral is over the compact set Λ_η containing "most of the probability" for either the exact or approximate posterior probability measure. The second integral is over the (potentially unbounded) Λ \ Λ_η, where the probabilities of both the exact and approximate posteriors are less than η.

To simplify the analysis to follow, we assume that Λ is precompact (i.e., bounded in R^k), which subsequently implies D has finite μ_D-measure if the QoI map is piecewise smooth and bounded. If Λ is not precompact, then we simply note that the analysis we provide can be used to prove convergence of the approximate posterior pdf on any compact subset of Λ. In Section 5, we briefly discuss how it may also be possible to replace μ_Λ, and subsequently μ_D, by finite measures dominating the probability measures defined on (Λ, B_Λ) and (D, B_D), respectively, to arrive at similar theoretical results on Λ that are not precompact.

Theorem 3.3 (Convergence of Posterior Densities).
Suppose Assumptions 1, 2, 3 and 4 hold, π_D is a Lipschitz continuous function on D, and (Q_n(λ)) is a sequence of approximations to Q(λ) such that Q_n(λ) → Q(λ) in L∞(Λ) as n → ∞. Then, for any ε > 0 and precompact Λ, there exists N such that n > N implies

    ‖π^{post,n}_Λ(λ) − π^post_Λ(λ)‖_{L¹(Λ)} < ε,   (3.7)

where π^{post,n}_Λ is the approximate posterior density obtained using Q_n(λ) and its associated push-forward of the prior density, denoted by π^{Q_n}_D.

Proof.
Let Q_n(λ) be any of the approximations from (Q_n(λ)). We find it convenient to apply the disintegration theorem using the map Q_n(λ) to rewrite (3.6) as

    ‖π^{post,n}_Λ(λ) − π^post_Λ(λ)‖_{L¹(Λ)} = ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) |r_n(λ) − r(λ)| dμ_{Λ,q} dμ_D.   (3.8)

Observe that the difference in ratios given by |r_n(λ) − r(λ)| can be rewritten as

    |r_n(λ) − r(λ)| = | [π_D(Q_n(λ)) π^Q_D(Q(λ)) − π_D(Q(λ)) π^{Q_n}_D(Q_n(λ))] / [π^{Q_n}_D(Q_n(λ)) π^Q_D(Q(λ))] |.

Then, by adding and subtracting π_D(Q(λ)) π^Q_D(Q(λ)) in the numerator, this difference can be decomposed as

    |r_n(λ) − r(λ)| ≤ | [π_D(Q_n(λ)) − π_D(Q(λ))] / π^{Q_n}_D(Q_n(λ)) |  (=: I)
                    + | π_D(Q(λ)) [π^Q_D(Q(λ)) − π^{Q_n}_D(Q_n(λ))] / [π^{Q_n}_D(Q_n(λ)) π^Q_D(Q(λ))] |  (=: II).   (3.9)

Given ε > 0, we show that there exists N such that if Q_n(λ) is chosen from (Q_n(λ))_{n>N}, then

    ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) I dμ_{Λ,q} dμ_D < ε/2, and ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) II dμ_{Λ,q} dμ_D < ε/2.

Since π_D is assumed to be Lipschitz continuous with Lipschitz constant C ≥ 0, we have

    I ≤ C |Q(λ) − Q_n(λ)| / π^{Q_n}_D(Q_n(λ)).   (3.10)

Hölder's inequality then implies

    ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) I dμ_{Λ,q}(λ) dμ_D(q) ≤ C ∫_D ‖Q(λ) − Q_n(λ)‖_{L∞(Λ ∩ Q_n^{-1}(q))} ‖π_Λ(λ)/π^{Q_n}_D(Q_n(λ))‖_{L¹(Λ ∩ Q_n^{-1}(q))} dμ_D(q).

By the disintegration theorem, for a.e. q ∈ D, π_Λ(λ)/π^{Q_n}_D(Q_n(λ)) is a conditional density on Λ ∩ Q_n^{-1}(q), i.e.,

    ‖π_Λ(λ)/π^{Q_n}_D(Q_n(λ))‖_{L¹(Λ ∩ Q_n^{-1}(q))} = 1 for a.e. q ∈ D.

It follows that

    ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) I dμ_{Λ,q}(λ) dμ_D(q) ≤ C μ_D(D) ‖Q(λ) − Q_n(λ)‖_{L∞(Λ)}.   (3.11)

Recall that the assumption of precompactness of Λ implies μ_D(D) < ∞. Then, since Q_n(λ) → Q(λ) in L∞(Λ), this bound can be made smaller than ε/2 for n sufficiently large.

We now choose a new constant C > 0 such that for a.e. q ∈ D,

    π_D(q) / π^Q_D(q) ≤ C.

Using Hölder's inequality and the disintegration theorem as before, we have that

    ∫_D ∫_{Λ ∩ Q_n^{-1}(q)} π_Λ(λ) II dμ_{Λ,q}(λ) dμ_D(q) ≤ C μ_D(D) ‖π^Q_D(Q(λ)) − π^{Q_n}_D(Q_n(λ))‖_{L∞(Λ)}.

By Theorem 2.2, this term can also be bounded by ε/2 for n sufficiently large, which completes the proof. □

Next, we consider the utilization of a KDE to approximate the push-forward of the prior and assess how this affects the approximation of the posterior. We define

    π̂^{post,n}_Λ(λ) = π_Λ(λ) [π_D(Q_n(λ)) / π̂^{Q_n}_D(Q_n(λ))] = π_Λ(λ) r̂_n(λ), with r̂_n(λ) = π_D(Q_n(λ)) / π̂^{Q_n}_D(Q_n(λ)),   (3.12)

which represents the approximation of the posterior using the approximate model and a KDE approximation of the push-forward of the prior through the approximate model.
We emphasize that $\hat{\pi}_\Lambda^{\text{post},n}$ is not a KDE approximation of the posterior. As in previous sections, we need to make a predictability assumption on the KDE approximation of the push-forward of the prior through the approximate model.

Assumption 5.
There exists $\hat{C} > 0$ such that $\pi_D(q) \le \hat{C}\, \hat{\pi}_D^{Q_n}(q)$ for a.e. $q \in D$.

Violation of Assumption 5 implies that, for the chosen $\pi_\Lambda$, the KDE approximation constructed using the approximate forward map $Q_n(\lambda)$ cannot predict the observed data. The only difference between Assumption 4 and Assumption 5 is the incorporation of the kernel density approximation, so we expect these assumptions to be roughly equivalent for large $M$. Under the strict assumptions on $\pi_D^Q$ required for Theorem 2.1, we can prove the following theorem on the rate of convergence of $\hat{\pi}_\Lambda^{\text{post},n}$ to $\pi_\Lambda^{\text{post}}$.
Theorem 3.4 (Convergence of Posterior Densities with KDE Approximation). Assume that a KDE approximation is generated using $M$ samples of the approximate model. Suppose Assumptions 1, 2, 3, 4, and 5 and the assumptions in Theorem 2.1 hold. Furthermore, assume $\Lambda$ is precompact, $\pi_D$ is a Lipschitz continuous function on $D$, and $(Q_n(\lambda))$ is a sequence of approximations to $Q(\lambda)$ such that $Q_n(\lambda) \to Q(\lambda)$ in $L^\infty(\Lambda)$ as $n \to \infty$. Then,
$$\left\| \hat{\pi}_\Lambda^{\text{post},n}(\lambda) - \pi_\Lambda^{\text{post}}(\lambda) \right\|_{L^1(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + \left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)} \right). \quad (3.13)$$

Proof.
The proof of (3.13) follows the proof of Theorem 3.3 very closely since the approximate map $Q_n(\lambda)$ is the same, and the only additional approximation is the value of the push-forward of the prior given by the KDE. For the sake of brevity, we only discuss the arguments that differ. We follow the same arguments to decompose
$$|\hat{r}_n(\lambda) - r(\lambda)| \le \underbrace{\left| \frac{\pi_D(Q_n(\lambda)) - \pi_D(Q(\lambda))}{\hat{\pi}_D^{Q_n}(Q_n(\lambda))} \right|}_{I} + \underbrace{\left| \frac{\pi_D(Q(\lambda)) \left[ \pi_D^Q(Q(\lambda)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda)) \right]}{\hat{\pi}_D^{Q_n}(Q_n(\lambda))\, \pi_D^Q(Q(\lambda))} \right|}_{II}.$$
The first term is bounded similarly to before, except that for a.e. $q \in D$ the ratio $\pi_\Lambda(\lambda)/\hat{\pi}_D^{Q_n}(Q_n(\lambda))$ is only an approximation of a conditional density. However, we can use the fact that $\pi_D^{Q_n}(Q_n(\lambda))$ and $\hat{\pi}_D^{Q_n}(Q_n(\lambda))$ are constant along $\Lambda \cap Q_n^{-1}(q)$, along with Assumptions 4 and 5, to show
$$\left\| \frac{\pi_\Lambda(\lambda)}{\hat{\pi}_D^{Q_n}(Q_n(\lambda))} \right\|_{L^1(\Lambda \cap Q_n^{-1}(q))} = \left| \frac{\pi_D^{Q_n}(Q_n(\lambda))}{\hat{\pi}_D^{Q_n}(Q_n(\lambda))} \right| \left\| \frac{\pi_\Lambda(\lambda)}{\pi_D^{Q_n}(Q_n(\lambda))} \right\|_{L^1(\Lambda \cap Q_n^{-1}(q))} \le \hat{C} C$$
for a.e. $q \in D$, where $C$ and $\hat{C}$ are the constants in Assumptions 4 and 5, respectively. The bound on $II$ follows a similar argument, but uses Theorem 2.3 instead of Theorem 2.2.

Theorem 3.5 (Convergence to a KDE Approximation).
Assume that a set of $M$ samples is used to generate KDE approximations using the true model and the approximate model, giving $\hat{\pi}_\Lambda^{\text{post}}$ and $\hat{\pi}_\Lambda^{\text{post},n}$, respectively. Suppose Assumptions 1, 2, 3, 4, and 5 hold. Then,
$$\left\| \hat{\pi}_\Lambda^{\text{post},n}(\lambda) - \hat{\pi}_\Lambda^{\text{post}}(\lambda) \right\|_{L^1(\Lambda)} \le C\, \left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)}. \quad (3.14)$$
The proof of Theorem 3.5 is almost identical to the proof of Theorem 3.4. The only difference is the use of Lemma 2.1 instead of Theorem 2.3.

In this section we specialize the general theory to two classes of convergent approximate models. Specifically, we consider sparse grid surrogate approximations, discretized partial differential equations, and combinations of the two. For each application, we first describe the sequence of discretized models and recall the known results regarding the convergence of the approximate model. These theoretical results are combined with Theorems 2.3 and 3.4 to give corollaries specific to each class of approximate model. We also provide numerical results to demonstrate the convergence of the push-forward of the prior and the posterior. We consider relatively simple numerical examples to enable the construction of sequences of numerical approximations that demonstrate convergence of the push-forward and posterior densities. For the sake of brevity, we only analyze the contribution of the KDE to the error in the first application; the corresponding results for the other two applications are similar. Before presenting the applications, we discuss some of the numerical considerations that are common throughout the remainder of this paper and some of the diagnostic tools we use to assess whether or not the predictability assumptions are satisfied.
In all the applications to follow, we investigate the convergence of the push-forward and posterior densities as the approximate models converge. For both the exact and approximate models, we use a Gaussian KDE to construct estimates of the push-forward densities. We estimate the $L^\infty$-norm of the error in the push-forward using the samples generated from $P_\Lambda$, that is,
$$\left\| \pi_D^Q(q) - \hat{\pi}_D^{Q_n}(q) \right\|_{L^\infty(D)} \approx \max_{1 \le i \le M} \left| \pi_D^Q(Q(\lambda_i)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda_i)) \right|.$$
Similarly, we estimate the $L^1$-norm of the error in the posterior using the samples generated from the prior:
$$\left\| \pi_\Lambda^{\text{post}}(\lambda) - \hat{\pi}_\Lambda^{\text{post},n}(\lambda) \right\|_{L^1(\Lambda)} = \left\| r(\lambda) - \hat{r}_n(\lambda) \right\|_{L^1(\Lambda;\, P_\Lambda)} \approx \frac{1}{M} \sum_{i=1}^{M} \left| r(Q(\lambda_i)) - \hat{r}_n(Q_n(\lambda_i)) \right|.$$
Approximating the posterior density considered in this work requires solving the forward UQ problem first. Once the solution to the forward UQ problem has been obtained, i.e., once the push-forward $\pi_D^Q$ has been approximated, we can evaluate the posterior density at the samples used to build $\pi_D^Q$ at no extra cost using (3.4). We can then use a standard rejection sampling strategy to accept a subset of these samples as samples of the posterior (see [11]). In traditional Bayesian inference (see, e.g., [36, 25, 4, 34, 17, 24]), very little is known about the posterior measure or density. The situation is slightly different for the approach considered in this work: in our formulation, the posterior is constructed such that the push-forward of the posterior matches the given density on the QoI, which gives us a means to assess the accuracy of the posterior. For example, if the observed density on the QoI is Gaussian, then we can compare the mean and the variance of the push-forward of the posterior with the corresponding values for $\pi_D$.
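These sample-based error estimates are straightforward to implement. The following is a minimal pure-Python sketch; the one-dimensional map $Q$ and the perturbed stand-in for $Q_n$ are illustrative assumptions, not the models used later in this paper:

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return a 1D Gaussian KDE, q -> estimated density, built from samples."""
    M = len(samples)
    norm = 1.0 / (M * bandwidth * math.sqrt(2.0 * math.pi))
    def density(q):
        return norm * sum(math.exp(-0.5 * ((q - s) / bandwidth) ** 2) for s in samples)
    return density

random.seed(0)
M = 1000
prior_samples = [random.uniform(0.0, 1.0) for _ in range(M)]  # samples from P_Lambda

Q  = lambda lam: math.sin(math.pi * lam)         # hypothetical "exact" map
Qn = lambda lam: math.sin(math.pi * lam) + 0.01  # hypothetical approximate map

bw = M ** (-0.2)  # rule-of-thumb bandwidth scaling for a 1D KDE
pf_exact  = gaussian_kde([Q(l) for l in prior_samples], bw)   # stands in for pi_D^Q
pf_approx = gaussian_kde([Qn(l) for l in prior_samples], bw)  # hat pi_D^{Q_n}

# Sample-based estimate of the L-infinity error in the push-forward:
err_inf = max(abs(pf_exact(Q(l)) - pf_approx(Qn(l))) for l in prior_samples)
print(err_inf)
```

Here `pf_exact` is itself a KDE, so `err_inf` estimates the distance between the two density estimates rather than the distance to the true push-forward, which is all that is available in practice when the exact push-forward has no closed form.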
We can also estimate the integral of the posterior,
$$\mathrm{I}(\pi_\Lambda^{\text{post}}) = \int_\Lambda \pi_\Lambda^{\text{post}}(\lambda)\, d\mu_\Lambda = \int_\Lambda \pi_\Lambda(\lambda)\, r(Q(\lambda))\, d\mu_\Lambda = \int_\Lambda r(Q(\lambda))\, dP_\Lambda,$$
and the Kullback-Leibler (KL) divergence [26] between the prior and the posterior,
$$\mathrm{KL}(\pi_\Lambda : \pi_\Lambda^{\text{post}}) = \int_\Lambda \pi_\Lambda^{\text{post}}(\lambda) \log\left( \frac{\pi_\Lambda^{\text{post}}(\lambda)}{\pi_\Lambda(\lambda)} \right) d\mu_\Lambda = \int_\Lambda r(Q(\lambda)) \log r(Q(\lambda))\, dP_\Lambda.$$
In practice, we actually compute $\mathrm{I}(\hat{\pi}_\Lambda^{\text{post},n})$ and $\mathrm{KL}(\pi_\Lambda : \hat{\pi}_\Lambda^{\text{post},n})$. Both of these quantities can be estimated using the model evaluations that were used to estimate the push-forward of the prior, and they are therefore very cheap diagnostic tools [11]. The integral of the posterior is an especially useful diagnostic since it is easy to show that
$$\mathrm{I}(\pi_\Lambda^{\text{post}}) = \int_\Lambda \pi_\Lambda^{\text{post}}(\lambda)\, d\mu_\Lambda = \int_D \pi_D(q)\, d\mu_D = 1.$$
Thus, if the approximation of the posterior using the approximate model and a KDE for the push-forward of the prior does not integrate to one, or at least to a reasonable Monte Carlo estimate of one, then this indicates that Assumption 5 is not satisfied.
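Both diagnostics reduce to Monte Carlo averages of $r$ and $r \log r$ over the prior samples. A minimal sketch under assumed Gaussian densities and a hypothetical identity map (chosen so the push-forward of the prior is known exactly):

```python
import math
import random

def pdf_normal(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

random.seed(1)
M = 5000
prior_samples = [random.gauss(0.0, 1.0) for _ in range(M)]  # prior = N(0, 1)

Q = lambda lam: lam  # hypothetical map; push-forward of the prior is N(0, 1) exactly

def r(lam):
    """Ratio r = pi_D(Q(lam)) / pi_D^Q(Q(lam)) with an assumed pi_D = N(0, 0.25)."""
    q = Q(lam)
    return pdf_normal(q, 0.0, 0.5) / pdf_normal(q, 0.0, 1.0)

rvals = [r(lam) for lam in prior_samples]
I_post = sum(rvals) / M                          # diagnostic: should be near 1
KL = sum(rv * math.log(rv) for rv in rvals) / M  # diagnostic: int r log r dP_Lambda
print(I_post, KL)
```

For this particular pair of Gaussians the KL diagnostic has the closed-form value $\ln 2 + 1/8 - 1/2 \approx 0.318$, so the Monte Carlo estimate should land nearby, while an `I_post` far from one would signal a violated predictability assumption.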
Surrogates of the parameter-to-QoI map are often used to reduce the computational cost of uncertainty quantification. The theory we have developed thus far is quite general and can be applied to any surrogate model; here, however, we focus on sparse grid collocation methods, which are known to provide efficient and accurate approximations of stochastic quantities [22, 27, 30]. For simplicity, we restrict attention to stochastic collocation problems characterized by variables $\lambda$ with finite support, normalized to fit in the domain $\Lambda = [0,1]^k$. However, the technique proposed here can be applied to semi-bounded or unbounded random variables using the methodology outlined in [21].

¹ Unless the map is linear and the prior and noise model are Gaussian, in which case the posterior is also known to be Gaussian.
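For concreteness, the nested Clenshaw-Curtis points used in this section (with the growth rule $P_n = 2^n + 1$) can be generated as follows; mapping the usual $[-1,1]$ abscissas to $[0,1]$ is an assumption made to match the normalized domain above:

```python
import math

def clenshaw_curtis_points(level):
    """Nested Clenshaw-Curtis points on [0, 1] with growth P_n = 2^n + 1."""
    if level == 0:
        return [0.5]  # common convention: a single midpoint at level 0
    P = 2 ** level + 1
    # Chebyshev extrema cos(pi * j / (P - 1)) on [-1, 1], mapped to [0, 1]
    return [0.5 * (1.0 - math.cos(math.pi * j / (P - 1))) for j in range(P)]

pts2 = clenshaw_curtis_points(2)  # 5 points
pts3 = clenshaw_curtis_points(3)  # 9 points, containing all 5 points of pts2
print(pts2)
```

The nestedness is the practically important property: raising the level reuses all previous model evaluations, which keeps the total number of points $N_n$ in the sparse grid manageable.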
Given a univariate interpolation rule with points $\Lambda^{n} = \{\lambda_{n,i} : 1 \le i \le P_n\}$ and basis functions $\phi_{n,i}$, a level-$n$ sparse grid with $N_n$ points [7] approximates $Q(\lambda)$ via a weighted linear combination of tensor products of the univariate rules:
$$Q_n(\lambda) = \sum_{n-k+1 \le |\mathbf{n}| \le n} (-1)^{n-|\mathbf{n}|} \binom{k-1}{n-|\mathbf{n}|} \sum_{i_1=1}^{P_{n_1}} \cdots \sum_{i_k=1}^{P_{n_k}} f(\lambda_{\mathbf{n},\mathbf{i}}) \left( \phi_{n_1,i_1} \otimes \cdots \otimes \phi_{n_k,i_k} \right)(\lambda). \quad (4.1)$$
The samples $\lambda_{\mathbf{n},\mathbf{i}}$ used to construct the sparse grid come from a set of anisotropic grids $\Lambda^{\mathbf{n}} = \Lambda^{n_1} \times \cdots \times \Lambda^{n_k}$ on the domain $\Lambda$, where $\mathbf{n} = (n_1, \ldots, n_k)$ and $\mathbf{i} = (i_1, \ldots, i_k)$ are multi-indices that denote the level and position of a point within each univariate interpolation rule.

Typically, when approximating a $Q(\lambda)$ with smooth dependence on $\lambda$, Lagrange polynomials are the best choice of basis functions and the $\Lambda^{n}$ are chosen to be a set of well-conditioned points such as the nested Clenshaw-Curtis points. The number of points $P_n$ of a one-dimensional grid of a given level, and thus the total number of points $N_n$ in the sparse grid, depends on the growth rate of the quadrature rule chosen. For Clenshaw-Curtis points, $P_n = 2^n + 1$. Under some assumptions on the regularity of $Q$, we recall the following result.

Lemma 4.1 ([31]). For sufficiently smooth $Q(\lambda)$, the isotropic level-$n$ sparse grid (4.1) based on Clenshaw-Curtis abscissas with $N_n$ points satisfies
$$\left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)} \le C(\sigma)\, N_n^{-\mu},$$
where $\mu := \sigma/(1 + \log(2k))$ and the constant $C(\sigma)$ depends on the size of the region of analyticity $\sigma$ of $Q$ but not on the number of points in the sparse grid.

Combining Lemma 4.1 with Theorem 2.3 gives the following result.
Corollary 4.1.
Under the assumptions of Theorem 2.3 and Lemma 4.1, the error in the push-forward of the prior using an isotropic level-$n$ sparse grid approximation based on Clenshaw-Curtis abscissas with $N_n$ points satisfies
$$\left\| \pi_D^Q(q) - \hat{\pi}_D^{Q_n}(q) \right\|_{L^\infty(D)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} \right), \quad (4.2)$$
and
$$\left\| \pi_D^Q(Q(\lambda)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda)) \right\|_{L^\infty(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} \right), \quad (4.3)$$
where $\mu = \sigma/(1 + \log(2k))$.

Combining Lemma 4.1 with Theorem 3.4 gives the following result.
Corollary 4.2.
Under the assumptions of Theorem 3.4, the error in the posterior using an isotropic level-$n$ sparse grid approximation based on Clenshaw-Curtis abscissas with $N_n$ points satisfies
$$\left\| \pi_\Lambda^{\text{post}}(\lambda) - \hat{\pi}_\Lambda^{\text{post},n}(\lambda) \right\|_{L^1(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} \right). \quad (4.4)$$

The goal of this section is to verify the convergence rates in Corollary 4.1. Consider an analytical function of two input parameters, $\lambda \in [0,1]^2$,
$$Q(\lambda) = \sum_{j=1}^{4} w_j \exp\left( -\frac{(\lambda_1 - c_{j,1})^2}{\gamma_{j,1}} - \frac{(\lambda_2 - c_{j,2})^2}{\gamma_{j,2}} \right),$$
a summation of four weighted Gaussian peaks with fixed weights $w_j$, centers $c_j$, and widths $\gamma_j$. We assume a uniform prior probability distribution on the input parameters.
Figure 1: The level-4 sparse grid approximation of the QoI (left), the level-8 sparse grid approximation of the QoI (middle), and the reference solution (right).

Next, we focus on the KDE contribution to the error by fixing the sparse grid at a level-12 approximation (577 model evaluations). We then use relatively small sets of samples in the KDE approximation of the push-forward of the prior, with $M$ ranging from $100$ up to $50{,}000$. In Figure 2 (left), we plot the $L^\infty$ error in the push-forward of the prior as the number of samples used to compute the KDE increases. Recall that the only difference
Figure 2: Convergence of the push-forward of the prior as the KDE is refined and the sparse grid is held fixed at the highest level (left), and as the sparse grid is refined and the KDE is fixed with the maximum number of samples (right).

between (4.2) and (4.3) is whether we evaluate the KDE approximation at the reference QoI values or at the approximate QoI values. Since we are using a fairly accurate response surface approximation, these pointwise errors are relatively small and the difference between the two estimates is negligible.

Next, we fix $M = 50{,}000$ to assess the error in the push-forward of the prior due to the response surface approximation. In Figure 2 (right), we see that the push-forward of the prior converges rapidly as the pointwise error in the response surface approximation decreases. In this case, we observe a difference between the error estimates given by (4.2) and (4.3) due to the difference in where the KDE is evaluated.
The goal of this section is to verify Corollary 4.2. We use the model introduced in Section 4.2.1. To formulate an inverse problem, we assume that $\pi_D \sim N(2.3,\, 0.04)$ and use (3.4) to compute the posterior density. For both the reference solution and each sparse grid approximation, we use the 50,000 samples with a Gaussian KDE to approximate the push-forward of the prior. The corresponding approximations of the posterior for the level-4 and level-8 sparse grid approximations are shown in Figure 3 along with the posterior from the reference solution. The level-4 sparse grid provides a poor approximation of the response surface, and the corresponding posterior contains significant error. The posterior corresponding to the level-8 sparse grid appears to be much closer to the reference solution.

Figure 3: Posterior corresponding to the level-4 sparse grid approximation (left), the level-8 sparse grid approximation (middle), and the reference solution (right).

We use the standard rejection sampling strategy described in [11] to accept a subset of these samples for the posterior. The accepted samples for the level-4 and level-8 sparse grid approximations of the posterior are shown in Figure 4 along with the samples accepted from the reference solution.
Figure 4: Samples from the posterior corresponding to the level-4 sparse grid (left), the level-8 sparse grid approximation (middle), and the reference solution (right).

In Table 1, we show the diagnostic data on the posterior densities obtained using different sparse grid levels. The integral of the posterior clearly indicates that the level-2 sparse grid approximation does not satisfy Assumption 4. Moreover, the mean and variance of the push-forward of the posterior do not match the corresponding values for $\pi_D$. Thus, the level-2 sparse grid cannot be used to solve the inverse problem.

                                                     Level-2    Level-4    Level-8    Level-12   Ref.       Truth
I($\hat{\pi}_\Lambda^{\text{post},n}$)               0.001      0.930      0.983      0.983      0.983      1.000
KL($\pi_\Lambda : \hat{\pi}_\Lambda^{\text{post},n}$) -0.004    2.163      1.981      1.981      1.981      UNKN
Mean PF-post                                         1.616      2.279      2.304      2.303      2.303      2.300
Var. PF-post                                         2.369e-3   2.915e-2   4.168e-2   4.197e-2   4.192e-2   4.000e-2

Table 1: Comparison of the integral of the posterior, the KL divergence from the prior to the posterior, and the mean and variance of the push-forward of the posterior obtained using various sparse grid approximations in Section 4.2.2.

The diagnostic data for the level-4 grid are much better, but still not sufficient to allow this approximate model to be used to solve the inverse problem. On the other hand, the information for the level-8 and level-12 sparse grid approximations indicates that Assumption 4 is satisfied and that these approximate models can be used to solve the inverse problem. In fact, this is true for the level-5 sparse grid and all higher-order approximations; thus, throughout the remainder of this section we only use levels 5-12. We emphasize that while the diagnostic data are useful for assessing the usability of an approximate model, they do not provide any information regarding the accuracy of the posterior.

We first seek to isolate the KDE contribution to the error in the posterior by fixing the sparse grid approximation at level-12 and using smaller subsets of the 50,000 samples as in Section 4.2.1.
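The rejection step used to produce the accepted samples in Figure 4 can be sketched as follows; the one-dimensional map and densities here are hypothetical stand-ins for the actual maps in this section:

```python
import math
import random

def pdf_normal(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

random.seed(2)
M = 20000
prior_samples = [random.uniform(0.0, 1.0) for _ in range(M)]

Q = lambda lam: lam                      # hypothetical map: push-forward of U(0,1) is U(0,1)
pf = lambda q: 1.0                       # exact push-forward density on [0, 1]
obs = lambda q: pdf_normal(q, 0.5, 0.1)  # assumed observed density pi_D

ratios = [obs(Q(l)) / pf(Q(l)) for l in prior_samples]  # r evaluated at each sample
r_max = max(ratios)
# Accept sample i with probability r_i / r_max (standard rejection sampling)
accepted = [l for l, ri in zip(prior_samples, ratios) if random.random() < ri / r_max]

mean_acc = sum(accepted) / len(accepted)
print(len(accepted), mean_acc)
```

Up to Monte Carlo error, the accepted samples are distributed according to the posterior, so their sample mean should sit near 0.5 here; the acceptance fraction is roughly the mean of the ratios divided by their maximum.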
In Figure 5 (left), we plot the error in the posterior density as the number of samples used to compute the KDE increases. Next, we fix $M = 50{,}000$ and assess the accuracy of the posterior as the sparse grid approximation is refined.
Figure 5: Convergence of the posterior in the $L^1$-norm as the KDE approximation is refined (left) and as the sparse grid is refined (right). For reference, we also include the $L^\infty$ error in the response surface approximation in the right image.

As mentioned in the previous section, by utilizing the same set of samples for the reference solution and for each response surface approximation, we are able to isolate the contribution of the sparse grid approximation to the error. In Figure 5 (right), we clearly see that the error in the posterior, measured in the $L^1$-norm, decays at the same rate as the $L^\infty$-error in the response surface approximation.

Consider the following general system of equations,
$$\frac{\partial u}{\partial t} + A(\lambda; u) = 0, \quad (4.5)$$
defined on $\Omega \times (0, T]$, where $\Omega \subset \mathbb{R}^s$, $s = 1, 2, 3$, is a polygonal (polyhedral) and bounded domain with boundary $\partial\Omega$. As throughout the paper, the random parameter $\lambda$ reflects sources of uncertainty, for example, uncertain initial or boundary conditions, forcing, etc. The solution operator's dependence on $\lambda$ implies that both $u := u(\mathbf{x}, t, \lambda)$ and $Q(\lambda) := Q(u(\mathbf{x}, t, \lambda))$ are also uncertain and may be modeled as random processes. For the sake of simplicity, we assume that $Q$ is a bounded continuous linear functional of $u$. In this paper we assume that $A$ is convex and has smooth second derivatives. Specific examples of $A$ and $u$ are given in subsequent sections. We assume that sufficient initial and boundary conditions are provided so that (4.5) is well-posed in the sense that there exists a solution for a.e. $\lambda \in \Lambda$.

Let $\mathcal{T}_h$ be a conforming partition of $\Omega$, composed of $N_h$ closed convex volumes of maximum diameter $h$. We assume that the mesh is regular in the sense of Ciarlet [12] and take $\mathcal{T}_h$ to be a conforming finite element mesh consisting of simplices or parallelepipeds. A fully discrete scheme for any $\lambda \in \Lambda$ can be obtained by letting $I_j = (t_{j-1}, t_j)$ and the time step $\Delta t = \max_j (t_j - t_{j-1})$ denote the discretization of $[0, T]$ as $0 = t_0 < t_1 < \cdots < t_{N_t} = T$. In this paper, we assume that a first-order Euler scheme (either implicit or explicit) is used to discretize in time. To define the sequence of approximate models $(Q_n(\lambda))$, we define sequences of discretizations, $h_1 \ge h_2 \ge \ldots$ and $\Delta t_1 \ge \Delta t_2 \ge \ldots$, where $h_n, \Delta t_n \to 0$ as $n \to \infty$. Then we define $Q_n(\lambda)$ to be the approximate model that uses $h_n$ and $\Delta t_n$, respectively. In cases where a unique and sufficiently regular solution exists, one can obtain the following error bound using duality arguments (see, e.g., [19, 15, 9, 32, 3]):
$$\left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)} \le C(u)\left( h_n^{r+\alpha} + \Delta t_n \right), \quad (4.6)$$
for some $\alpha \in [0, 1]$, where $C(u)$ depends on $u$ but does not depend on $\Delta t_n$ or $h_n$. The parameter $r$ is determined by the regularity of the solution and the order of accuracy of the spatial discretization. We note that $C(u)$ typically depends on $\lambda$ and $\alpha$ depends on the regularity of $u$. Combining (4.6) with Theorem 2.3 gives the following result.

Corollary 4.3.
Under the assumptions of Theorem 2.3, the error in the push-forward of the prior using a discretization of (4.5) satisfies
$$\left\| \pi_D^Q(q) - \hat{\pi}_D^{Q_n}(q) \right\|_{L^\infty(D)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right), \quad (4.7)$$
and
$$\left\| \pi_D^Q(Q(\lambda)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda)) \right\|_{L^\infty(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right). \quad (4.8)$$

Combining (4.6) with Theorem 3.4 gives the following result.

Corollary 4.4.
Under the assumptions of Theorem 3.4, the error in the posterior using a discretization of (4.5) satisfies
$$\left\| \pi_\Lambda^{\text{post}}(\lambda) - \hat{\pi}_\Lambda^{\text{post},n}(\lambda) \right\|_{L^1(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right). \quad (4.9)$$

The goal of this section is to verify the convergence rates in Corollary 4.3. Consider a single-phase incompressible flow model:
$$\begin{aligned} -\nabla \cdot \left( K(\lambda) \nabla u \right) &= 0, && (x, y) \in \Omega = (0,1)^2, \\ u &= 1, && x = 0, \\ u &= 0, && x = 1, \\ K \nabla u \cdot \mathbf{n} &= 0, && y = 0 \text{ and } y = 1. \end{aligned} \quad (4.10)$$
Here $u$ is the pressure field and $K$ is the permeability field, which we assume is a scalar field given by a Karhunen-Loève expansion of the log transformation, $Y = \log K$, with
$$Y(\lambda) = \bar{Y} + \sum_{i=1}^{\infty} \xi_i(\lambda)\, \sqrt{\eta_i}\, f_i(x, y),$$
where $\bar{Y}$ is the mean field and the $\xi_i$ are mutually uncorrelated random variables with zero mean and unit variance [16, 39]. The eigenvalues, $\eta_i$, and eigenfunctions, $f_i$, are computed using an assumed functional form for the covariance [42, 35]. We assume a correlation length of 0.01 in each spatial direction and truncate the expansion at 100 terms. This choice of truncation is purely for the sake of demonstration; in practice, the expansion is truncated once a sufficient fraction of the energy in the eigenvalues is retained [42, 16]. Our quantity of interest is the pressure at a fixed point in $\Omega$. The prior is a multivariate standard normal density, $\pi_\Lambda \sim N(\mathbf{0}, \mathbf{I})$, where $\mathbf{I}$ is the identity matrix.

To approximate solutions to the PDE in (4.10), we use a finite element discretization with continuous piecewise bilinear basis functions defined on a uniform spatial grid. We vary the number of grid points in each spatial direction ($h = 1/10, 1/20, 1/40, 1/80, 1/160$), so each $Q_n$ is associated with a particular value of $h$. The asymptotic value of the quantity of interest is unknown, but for each sample in $\Lambda$ we have the QoI on a sequence of grids, so we use Richardson extrapolation to estimate a reference solution for the QoI. We generate 10,000 samples from the prior and evaluate each discretization of the PDE model for each of these realizations. We use a standard Gaussian KDE to approximate the push-forward of the prior in the one-dimensional output space. In Figure 6 (left), we plot the convergence of the push-forward of the prior as the physical discretization is refined. We do see a significant difference between
Figure 6: Convergence of the push-forward of the prior (left) and convergence of the QoI in the $L^\infty$-norm and the posterior in the $L^1$-norm (right) as the spatial approximation is refined.

(4.7) and (4.8), but the errors eventually converge at approximately the same rate. The difference between the two curves is that (4.7) measures the error when the approximate push-forward density is evaluated at exact values of the QoI $q$, whereas (4.8) measures the error when the approximate push-forward density is evaluated at approximate values of the QoI. The latter evaluation introduces an additional source of error, and thus the error in (4.8) will always be larger than the error in (4.7).

The goal of this section is to verify the convergence rate in Corollary 4.4. We use the model introduced in Section 4.3.1. To formulate an inverse problem, we assume the observed density on the QoI is given by $\pi_D \sim N(0.7,\, 10^{-4})$.

                                                 h=1/10    h=1/20    h=1/40    h=1/80    h=1/160   Ref.      Truth
I($\pi_\Lambda^{\text{post}}$)                   0.993     0.986     0.982     0.978     0.980     0.980     1.000
KL($\pi_\Lambda : \pi_\Lambda^{\text{post}}$)    1.344     1.344     1.326     1.341     1.353     1.357     UNKN
Mean PF-post                                     0.700     0.700     0.700     0.700     0.700     0.700     0.700
Var. PF-post                                     0.997e-4  0.995e-4  0.978e-4  1.012e-4  1.003e-4  1.021e-4  1.000e-4

Table 2: Comparison of the integral of the posterior, the KL divergence from the prior to the posterior, and the mean and variance of the push-forward of the posterior obtained using various spatial approximations in Section 4.3.2.

The diagnostic data in Table 2 indicate that each spatial discretization can be used to solve the inverse problem, while the accuracy of the posterior depends on the accuracy of the approximate model. In Figure 6 (right), we plot the $L^1$-norm of the error in the posterior along with the $L^\infty$-norm of the error in the approximate model. We see that the error in the posterior converges at the same rate as the error in the approximate model.

In this section we consider the common case when two forms of approximation are used to quantify uncertainty.
Specifically, we consider the situation when a sparse grid surrogate of a discretized model is used. In this setting, there are several ways to define the sequence $(Q_n)$. We simply assume that the sequence is defined in such a way that for any $0 < m < n$, we have $h_n \le h_m$, $\Delta t_n \le \Delta t_m$, and $N_m \le N_n$. Combining Lemma 4.1 and (4.6) gives the following result.

Lemma 4.2.
For sufficiently smooth $Q(\lambda)$, the isotropic level-$n$ sparse grid (4.1) based on Clenshaw-Curtis abscissas with $N_n$ points and a discretization of (4.5) satisfies
$$\left\| Q(\lambda) - Q_n(\lambda) \right\|_{L^\infty(\Lambda)} \le C(\sigma)\, N_n^{-\mu} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right),$$
where the constant $C(\sigma)$ depends on the size of the region of analyticity $\sigma$ of $Q$ but not on the number of points in the sparse grid, and $C(u)$ depends on the solution $u$ but not on the spatial and temporal resolutions $h_n$ and $\Delta t_n$, respectively.

Combining Lemma 4.2 with Theorem 2.3 gives the following result.
Corollary 4.5.
Under the assumptions of Theorem 2.3 and Lemma 4.2, the error in the push-forward of the prior using an isotropic sparse grid approximation based on Clenshaw-Curtis abscissas with $N_n$ points and a discretization of (4.5) satisfies
$$\left\| \pi_D^Q(q) - \hat{\pi}_D^{Q_n}(q) \right\|_{L^\infty(D)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right), \quad (4.11)$$
and
$$\left\| \pi_D^Q(Q(\lambda)) - \hat{\pi}_D^{Q_n}(Q_n(\lambda)) \right\|_{L^\infty(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right), \quad (4.12)$$
where $\mu = \sigma/(1 + \log(2k))$.

Combining Lemma 4.2 with Theorem 3.4 gives the following result.
Corollary 4.6.
Under the assumptions of Theorem 3.4, the error in the posterior using an isotropic sparse grid approximation based on Clenshaw-Curtis abscissas with $N_n$ points and a discretization of (4.5) satisfies
$$\left\| \pi_\Lambda^{\text{post}}(\lambda) - \hat{\pi}_\Lambda^{\text{post},n}(\lambda) \right\|_{L^1(\Lambda)} \le C \left( \left( \frac{\log M}{M} \right)^{\frac{s}{2s+m}} + C(\sigma)\, N_n^{-\mu} + C(u)\left( h_n^{r+\alpha} + \Delta t_n \right) \right). \quad (4.13)$$

The goal of this section is to verify the convergence rates in Corollary 4.5. Consider the nonlinear system of ordinary differential equations governing a competitive Lotka-Volterra model of the population dynamics of species competing for some common resource. The model is given by
$$\frac{du_i}{dt} = r_i u_i \left( 1 - \sum_{j=1}^{3} \alpha_{ij} u_j \right), \quad t \in (0, 10], \qquad u_i(0) = u_{i,0}, \quad (4.14)$$
for $i = 1, 2,$
3. The initial condition, $u_{i,0}$, and the self-interaction terms, $\alpha_{ii}$, are given, but the remaining interaction parameters, $\alpha_{ij}$ with $i \ne j$, as well as the reproduction parameters, $r_i$, are unknown. Thus, we have a total of 9 uncertain parameters. We assume that these parameters are each uniformly distributed, and our quantity of interest is $u(10)$.

We approximate the solution to (4.14) in time using an explicit Euler method. For the reference solution, we use a very fine time step; the coarsest approximation uses a time step of $\Delta t = 1/10$ and an isotropic sparse grid of level-1 (19 model evaluations). We explore uniformly refining the time step with $\Delta t = 1/20, 1/40, 1/80, 1/160$ and refining the sparse grid to level-2 (181 model evaluations), level-3 (1177 model evaluations), and level-4 (5929 model evaluations).

In Figure 7 (left), we fix $\Delta t = 1/160$ and see that the error in the push-forward of the prior converges as the sparse grid is refined. In Figure 7 (right), we fix the sparse grid at level-4 and assess the convergence of the error in the push-forward of the prior as the temporal discretization is refined.

Figure 7: Convergence of the push-forward of the prior as the sparse grid discretization is refined and the temporal discretization is held fixed at the highest level (left), and as the temporal discretization is refined and the sparse grid discretization is held at the highest level (right).

We clearly see that the dominant contribution to the error comes from the sparse grid approximation; the convergence plots in Figure 7 stagnate once they reach the level-4 sparse grid error. Ideally, we would use an a posteriori error estimation technique (see, e.g., [23, 9, 8, 10, 6]) to decompose the error into its various contributions and adaptively choose which discretization to refine, but that is beyond the scope of this paper. We remark here that goal-oriented approaches for estimating the errors in approximations based upon spatial and temporal discretizations and surrogates were developed in [9] and further generalized in [10]. These approaches were then used in [23, 6] to separate the error into different contributions and adaptively control the error, and were extended in [40] to bound the error in probabilities of rare events. However, none of these works considers the error induced in estimates of push-forward densities.
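A minimal sketch of the explicit Euler discretization of (4.14) used in this section. The growth rates, interaction matrix, and initial condition below are hypothetical placeholders, not the values used in the study:

```python
def lotka_volterra_euler(r, alpha, u0, dt, T):
    """Forward-Euler steps for du_i/dt = r_i u_i (1 - sum_j alpha_ij u_j)."""
    u = list(u0)
    k = len(u)
    for _ in range(round(T / dt)):
        rates = [r[i] * u[i] * (1.0 - sum(alpha[i][j] * u[j] for j in range(k)))
                 for i in range(k)]
        u = [u[i] + dt * rates[i] for i in range(k)]
    return u

# Hypothetical 3-species setup: diagonal (self-interaction) terms fixed, with the
# off-diagonal interactions and growth rates playing the role of the 9 uncertain
# parameters.
r = [0.5, 0.6, 0.7]
alpha = [[1.0, 0.5, 0.5],
         [0.5, 1.0, 0.5],
         [0.5, 0.5, 1.0]]
u0 = [0.3, 0.3, 0.3]

coarse = lotka_volterra_euler(r, alpha, u0, dt=0.1, T=10.0)
fine   = lotka_volterra_euler(r, alpha, u0, dt=0.001, T=10.0)
err = max(abs(c - f) for c, f in zip(coarse, fine))
print(coarse, err)
```

Halving the time step should roughly halve `err`, consistent with the first-order $\Delta t_n$ term in Lemma 4.2.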
The goal of this section is to verify the convergence rate in Corollary 4.6. We use the model introduced in Section 4.4.1. To formulate an inverse problem, we assume the observed density on the QoI is given by $\pi_D \sim N(0.5,\, 10^{-4})$.

                                              Δt=1/10   Δt=1/20   Δt=1/40   Δt=1/80   Δt=1/160  Ref.      Truth
I($\pi_\Lambda^{\text{post}}$)                1.023     1.023     1.023     1.023     1.023     1.017     1.000
KL($\pi_\Lambda : \pi_\Lambda^{\text{post}}$) 2.449     2.453     2.455     2.456     2.457     2.552     UNKN
Mean PF-post                                  0.500     0.500     0.500     0.499     0.500     0.500     0.500
Var. PF-post                                  1.011e-4  1.031e-4  1.040e-4  1.040e-4  1.038e-4  1.029e-4  1.000e-4

Table 3: Comparison of the integral of the posterior, the KL divergence from the prior to the posterior, and the mean and variance of the push-forward of the posterior obtained using various temporal discretizations and a level-1 sparse grid in Section 4.4.2.

                                              Level-1   Level-2   Level-3   Level-4   Ref.      Truth
I($\pi_\Lambda^{\text{post}}$)                1.023     1.012     1.018     1.020     1.017     1.000
KL($\pi_\Lambda : \pi_\Lambda^{\text{post}}$) 2.449     2.544     2.557     2.559     2.552     UNKN
Mean PF-post                                  0.500     0.499     0.499     0.499     0.500     0.500
Var. PF-post                                  0.951e-4  1.022e-4  1.021e-4  1.059e-4  0.994e-4  1.000e-4

Table 4: Comparison of the integral of the posterior, the KL divergence from the prior to the posterior, and the mean and variance of the push-forward of the posterior obtained using various sparse grid approximations with a fixed temporal discretization in Section 4.4.2.

The diagnostic data in Tables 3 and 4 indicate that, even for the coarsest discretization, the approximate models satisfy Assumption 5 and can be used to solve the inverse problem. We also see that the KL divergence using the level-1 sparse grid does not give the same value as the other sparse grid levels or the reference solution. This indicates that while Assumption 5 is satisfied, the approximate model leads to a different posterior. In Figure 8, we plot the $L^1$-norm of the error in the posterior along with the $L^\infty$-norm of the error in the approximate model. We see that the error in the posterior converges at the same rate as the error in the approximate model.
As in Section 4.4.1, the dominant contribution to the error is due to the sparse grid approximation, so the convergence eventually stagnates when refining the temporal discretization.

Conclusions

We developed a theoretical framework for analyzing the convergence of probability density functions computed using approximate models for both forward and inverse problems. Our theoretical results are quite general: any L∞-convergent sequence of approximate models can be considered. We proved that the densities converge under quite reasonable assumptions, and rates of convergence are obtained under more stringent assumptions. The rates of convergence explicitly show the dependence on the error introduced by approximating densities and the error induced by the use of approximate models. We have verified the theoretical results using sequences of approximate models derived from discretized partial and ordinary differential equations as well as from sparse grid approximations.

Acknowledgments

J.D. Jakeman's work was partially supported by DARPA EQUiPS. T. Wildey's work was supported by the Office of Science Early Career Research Program.
Figure 8: Convergence of the posterior in the L²-norm as the sparse grid discretization is refined and the temporal discretization is held fixed at the highest level (left), and as the temporal discretization is refined and the sparse grid discretization is held at the highest level (right).

The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

References

[1] B. M. Adams, L. E. Bauman, W. J. Bohnhoff, K. R. Dalbey, M. S. Ebeida, J. P. Eddy, M. S. Eldred, P. D. Hough, K. T. Hu, J. D. Jakeman, J. A. Stephens, L. P. Swiler, D. M. Vigil, and T. M. Wildey. Dakota, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 6.0 Users Manual. Technical Report SAND2014-4633 (Version 6.6), Sandia National Laboratories, 2017.

[2] V. Barthelmann, E. Novak, and K. Ritter. High dimensional polynomial interpolation on sparse grids.
Advances in Computational Mathematics, 12:273–288, 2000.

[3] R. Becker and R. Rannacher. An optimal control approach to a posteriori error estimation in finite element methods. Acta Numerica, 10:1–102, 2001.

[4] J. M. Bernardo and A. F. M. Smith. Bayesian Theory. Wiley, 1994.

[5] D. Boos. A converse to Scheffé's theorem. The Annals of Statistics, 13(1):423–427, 1985.

[6] C. M. Bryant, S. Prudhomme, and T. Wildey. Error decomposition and adaptivity for response surface approximations from PDEs with parametric uncertainty. SIAM/ASA Journal on Uncertainty Quantification, 3(1):1020–1045, 2015.

[7] H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, 13:147–269, 2004.

[8] T. Butler, P. Constantine, and T. Wildey. A posteriori error analysis of parameterized linear systems using spectral methods. SIAM J. Matrix Anal. Appl., 33:195–209, 2012.

[9] T. Butler, C. Dawson, and T. Wildey. A posteriori error analysis of stochastic spectral methods. SIAM J. Sci. Comput., 33:1267–1291, 2011.
[10] T. Butler, C. Dawson, and T. Wildey. Propagation of uncertainties using improved surrogate models.
SIAM/ASA Journal on Uncertainty Quantification, 1(1):164–191, 2013.

[11] T. Butler, J. Jakeman, and T. Wildey. Combining push-forward measures and Bayes' rule to construct consistent solutions to stochastic inverse problems. Accepted for publication in SIAM J. Sci. Comput., 2017.

[12] P. G. Ciarlet. Basic error estimates for elliptic problems. In Handbook of Numerical Analysis, Vol. II, pages 17–351. North-Holland, Amsterdam, 1991.

[13] C. Dellacherie and P. A. Meyer. Probabilities and Potential. North-Holland Publishing Co., Amsterdam, 1978.

[14] L. Devroye and L. Györfi. Nonparametric Density Estimation: The L1 View. Wiley, New York, 1985.

[15] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson. Computational Differential Equations. Cambridge University Press, Cambridge, 1996.

[16] B. Ganis, H. Klie, M. F. Wheeler, T. Wildey, I. Yotov, and D. Zhang. Stochastic collocation and mixed finite elements for flow in porous media. Computer Methods in Applied Mechanics and Engineering, 197(43):3547–3559, 2008.

[17] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. Bayesian Data Analysis, Third Edition. Chapman and Hall/CRC, 2013.

[18] R. Ghanem and P. Spanos. Stochastic Finite Elements: A Spectral Approach. Springer-Verlag, New York, 2002.

[19] M. B. Giles and E. Süli. Adjoint methods for PDEs: a posteriori error analysis and postprocessing by duality. Acta Numerica, 11:145–236, 2002.

[20] B. E. Hansen. Uniform convergence rates for kernel estimation with dependent data. Econometric Theory, 24(3):726–748, 2008.

[21] J. D. Jakeman, M. Eldred, and D. Xiu. Numerical approach for quantification of epistemic uncertainty. J. Comput. Phys., 229(12):4648–4663, 2010.

[22] J. D. Jakeman and S. G. Roberts. Local and dimension adaptive stochastic collocation for uncertainty quantification. In J. Garcke and M. Griebel, editors, Sparse Grids and Applications, volume 88 of Lecture Notes in Computational Science and Engineering, pages 181–203. Springer, Berlin Heidelberg, 2013.

[23] J. D. Jakeman and T. Wildey. Enhancing adaptive sparse grid approximations and improving refinement strategies using adjoint-based a posteriori error estimates. Journal of Computational Physics, 280:54–71, 2015.

[24] E. T. Jaynes. Probability Theory: The Logic of Science. 1998.

[25] M. C. Kennedy and A. O'Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3):425–464, 2001.

[26] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22:79–86, 1951.

[27] X. Ma and N. Zabaras. An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. J. Comput. Phys., 228:3084–3113, 2009.

[28] X. Ma and N. Zabaras. An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. J. Comput. Phys., 228(8):3084–3113, 2009.
[29] Y. Marzouk and D. Xiu. A stochastic collocation approach to Bayesian inference in inverse problems.
Communications in Computational Physics, 6(1):826–847, 2009.

[30] F. Nobile, R. Tempone, and C. G. Webster. A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal., 46(5):2309–2345, 2008.

[31] F. Nobile, R. Tempone, and C. G. Webster. A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM Journal on Numerical Analysis, 46(5):2309–2345, 2008.

[32] J. T. Oden and S. Prudhomme. Goal-oriented error estimation and adaptivity for the finite element method. Computers & Mathematics with Applications, 41(5):735–756, 2001.

[33] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[34] C. P. Robert. The Bayesian Choice: A Decision-Theoretic Motivation (second ed.). Springer, 2001.

[35] C. Schwab and R. A. Todor. Karhunen–Loève approximation of random fields by generalized fast multipole methods. Journal of Computational Physics, 217(1):100–122, 2006.

[36] A. M. Stuart. Inverse problems: A Bayesian perspective. Acta Numerica, 19:451–559, 2010.

[37] T. J. Sweeting. On a converse to Scheffé's theorem. The Annals of Statistics, 14(3):1252–1256, 1986.

[38] G. R. Terrell and D. W. Scott. Variable kernel density estimation. The Annals of Statistics, 20(3):1236–1265, 1992.

[39] M. F. Wheeler, T. Wildey, and I. Yotov. A multiscale preconditioner for stochastic mortar mixed finite elements. Computer Methods in Applied Mechanics and Engineering, 200(9):1251–1262, 2011.

[40] T. Wildey and T. Butler. Utilizing error estimates for surrogate models to accurately predict probabilities of events. To appear in Int. J. Uncertainty Quantification, 2018.

[41] D. Xiu and G. Karniadakis. The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput., 24:619–644, 2002.

[42] D. Zhang and Z. Lu. An efficient, high-order perturbation approach for flow in random porous media via Karhunen–Loève and polynomial expansions.