[PDF] Bayesian model inversion using stochastic spectral embedding

Abstract

In this paper we propose a new sampling-free approach to solve Bayesian model inversion problems that is an extension of the previously proposed spectral likelihood expansions (SLE) method. Our approach, called stochastic spectral likelihood embedding (SSLE), uses the recently presented stochastic spectral embedding (SSE) method for local spectral expansion refinement to approximate the likelihood function at the core of Bayesian inversion problems. We show that, similar to SLE, this approach results in analytical expressions for key statistics of the Bayesian posterior distribution, such as evidence, posterior moments and posterior marginals, by direct post-processing of the expansion coefficients. Because SSLE and SSE rely on the direct approximation of the likelihood function, they are in a way independent of the computational/mathematical complexity of the forward model. We further enhance the efficiency of SSLE by introducing a likelihood specific adaptive sample enrichment scheme. To showcase the performance of the proposed SSLE, we solve three problems that exhibit different kinds of complexity in the likelihood function: multimodality, high posterior concentration and high nominal dimensionality. We demonstrate how SSLE significantly improves on SLE, and present it as a promising alternative to existing inversion frameworks.

Full PDF

BBayesian model inversion using stochastic spectral embedding

P.-R. Wagner, S. Marelli, B. Sudret14.05.2020

Abstract

In this paper we propose a new sampling-free approach to solve Bayesian model inversionproblems that extends the recently introduced spectral likelihood expansions (SLE) method.The latter solves the inverse problem by expanding the likelihood function onto a global poly-nomial basis orthogonal w.r.t. the prior distribution. This gives rise to analytical expressionsfor key statistics of the Bayesian posterior distribution, such as evidence, posterior moments and posterior marginals by simple post-processing of the expansion coeﬃcients.It is well known that in most practically relevant scenarios, likelihood functions have closeto compact support, which causes the global SLE approach to fail due to the high polynomialdegree required for an accurate spectral representation. To solve this problem, we hereinreplace the global polynomial expansion from SLE with a recently proposed method for local spectral expansion reﬁnement called stochastic spectral embedding (SSE). This surrogate-modeling method was developed for functions with high local complexity. To increase theeﬃciency of SSE, we enhance it with an adaptive sample enrichment scheme. We show thatSSE works well for likelihood approximations and retains the relevant spectral properties ofSLE, thus preserving analytical expressions of posterior statistics.To assess the performance of our approach, we include three case studies ranging from low to high dimensional model inversion problems that showcase the superiority of the SSEapproach compared to SLE and present the approach as a promising alternative to existinginversion frameworks. Keywords : Bayesian model inversion, inverse problems, polynomial chaos expansions,spectral likelihood expansions, sampling-free inversion.

Computational models are an invaluable tool for decision making, scientiﬁc advances andengineering breakthroughs. They establish a connection between a set of input parameters a r X i v : . [ s t a t . C O ] M a y nd output quantities with wide-ranging applications. Model inversion uses available exper-imental observations of the output to determine the set of input parameters that maximizethe predictive potential of a model. The importance of eﬃcient and reliable model inversionframeworks can hardly be overstated, considering that they establish a direct connectionbetween models and the real world. Without it, the most advanced model predictions mightlack physical meaning and, consequently, be useless for their intended applications.Bayesian model inversion is one way to formalize this problem (Jaynes, 2003; Gelmanet al., 2014). It is based on Bayesian inference and poses the problem in a probabilisticsetting by capitalizing on Bayes’ theorem. In this setting a so-called prior ( i.e. , beforeobservations) probability distribution about the model parameters is updated to a so-called posterior ( i.e. , after observations) distribution. The posterior distribution is the probabilitydistribution of the input parameters conditioned on the available observations, and the mainoutcome of the Bayesian inversion process.In Bayesian model inversion, the connection between the model output and the obser-vations is established through a probabilistic discrepancy model . This model, which is afunction of the input parameters, leads to the so-called likelihood function. The speciﬁc formof the likelihood function depends on the problem at hand, but typically it has a globalmaximum for the input parameters with the model output that is closest to the availableobservations ( w.r.t. some metric), and rapidly goes to zero with increasing distance to thoseparameters.Analytical expressions for the posterior distribution can only be found in few academicexamples ( e.g. , conjugate priors with a linear forward model, Bishop (2006); Gelman et al.(2014)). In general model inversion problems, such analytical solutions are not availablethough. Instead, it is common practice to resort to sampling methods to generate a sampledistributed according to the posterior distribution. The family of Markov chain Monte Carlo (MCMC) algorithms are particularly suitable for generating such a posterior sample (Beckand Au, 2002; Robert and Casella, 2004).While MCMC and its extensions are extensively used in model inversion, and new al-gorithms are continuously being developed (Haario et al., 2001; Ching and Chen, 2007;Goodman and Weare, 2010; Neal, 2011), it has a few notable shortcomings that hinder itsapplication in many practical cases. It is well known that there are no robust convergencecriteria for MCMC algorithms, and that their performance is particularly sensitive to theirtuning parameters. Additionally, samples generated by MCMC algorithms are often highlycorrelated, thus requiring extensive heuristic post-processing and empirical rules (Gelmanand Rubin, 1992; Brooks and Gelman, 1998). MCMC algorithms are also in general not wellsuited for sampling multimodal posterior distributions.When considering complex engineering scenarios, the models subject to inversion areoften computationally expensive. Because MCMC algorithms usually require a signiﬁcant umber of forward model evaluations, it has been proposed to accelerate the procedure byusing surrogate models in lieu of the original models. These surrogate models are eitherconstructed non-adaptively before sampling from the posterior distribution (Marzouk et al.,2007; Marzouk and Xiu, 2009) or adaptively during the sampling procedure (Li and Marzouk,2014; Birolleau et al., 2014). Adaptive techniques can be of great beneﬁt with posteriordistributions that are concentrated in a small subspace of the prior domain, as the surrogateonly needs to be accurate near high density areas of the posterior distribution. Polynomial chaos expansions (PCE) are a widely used surrogate modelling techniquebased on expanding the forward model onto a suitable polynomial basis (Ghanem and Spanos,1991; Xiu and Karniadakis, 2002). In other words, it provides a spectral representation ofthe computational forward model. Thanks to the introduction of sparse regression (see, e.g.Blatman and Sudret (2011)), its computation has become feasible even in the presence ofcomplex and computationally expensive engineering models. This technique has been suc-cessfully used in conjunction with MCMC to reduce the total computational costs associatedwith sampling from the posterior distribution (Marzouk et al., 2007; Wagner et al., 2020).Alternative approaches to compute the posterior distribution or its statistics includethe

Laplace approximation at a posterior mode (Tierney and Kadane, 1986; Tierney et al.,1989b,a), approximate Bayesian computations (ABC) (Marin et al., 2012; Sisson et al., 2018),optimal transport approaches (El Moselhy and Marzouk, 2012; Parno, 2015; Marzouk et al.,2016) and embarrassingly parallel quasi-Monte Carlo sampling (Dick et al., 2017; Gantnerand Peters, 2018).Recently, Nagel and Sudret (2016) proposed spectral likelihood expansions (SLE), a novelapproach that is closely related to PCE and aims at ﬁnding a spectral expansion of thelikelihood function in an orthogonal basis w.r.t. the prior density function. The advantage ofthis method is that it provides analytical expressions for the posterior marginals and generalposterior moments by post-processing the expansion coeﬃcients. However, the typicallycompact support of likelihood functions generally prevents a sparse functional representation.

Stochastic spectral embedding (SSE) is a metamodelling technique suitable for approxi-mating functions with complex localized features recently developed in Marelli et al. (2020).In this paper we show how this technique, when extended with adaptive sample enrichment,is eﬀective for the approximation of likelihood functions in Bayesian model inversion prob-lems. Furthermore, we show that due to the construction of the basis polynomials, SSEpreserves the most important post-processing properties of SLE.The paper is organized as follows: In Section 2 we establish the basics of Bayesianinference and particularly Bayesian model inversion. We then give an introduction intospectral function decomposition with a focus on polynomial chaos expansions and theirapplication to likelihood functions (SLE) in Section 3. In Section 4 we present the maincontribution of the paper, namely the derivation of Bayesian posterior quantities of interest hen SSE is applied to likelihood functions and the extension of the SSE algorithm withan adaptive sampling strategy. Finally, in Section 5 we showcase the performance of ourapproach on three case studies of varying complexity. The problem of model inversion occurs whenever the predictions of a model are to be broughtinto agreement with available observations or data . This is achieved by properly adjustinga set of input parameters of the model. The goal of inversion can be twofold: on the onehand the inferred input parameters might be used to predict new realizations of the modeloutput. On the other hand, the inferred input parameters might be the main interest. Modelinversion is a common problem in many engineering disciplines, that in some cases is stillroutinely solved manually, i.e. by simply changing the input parameters until some, oftenqualitative, goodness-of-ﬁt criterion is met. More quantitative inversion approaches aim atautomatizing this process, by establishing a metric ( e.g. , L -distance) between the data andthe model response, which is then minimized through suitable optimization algorithms.While such approaches can often be used in practical applications, they tend not to pro-vide measures of the uncertainties associated with the inferred model input or predictions.These uncertainties are useful in judging the accuracy of the inversion, as well as indicatingnon-informative measurements. In fact, the lack of uncertainty quantiﬁcation in the con-text of model inversion can lead to erroneous results that have far-reaching consequences insubsequent applications. One approach to consider uncertainties in inverse problems is the Bayesian framework for model inversion that will be presented hereinafter.

Consider some non-observable parameters X ∈ D X and the observables Y ∈ D Y . Further-more, let Y = { y (1) , . . . , y ( N ) } be a set of N measurements, i.e. , noisy observations of a setof realizations of Y . Statistical inference consists in drawing conclusions about X using theinformation from Y (Gelman et al., 2014). These measurements can be direct observationsof the parameters ( Y = X ) or some quantities indirectly related to X through a functionor model M : D X → D Y . One way to conduct this inference is through Bayes’ theorem ofconditional probabilities, a process known as Bayesian inference .Denoting by π ( · ) a probability density function (PDF) and by π ( ·| x ) a PDF conditionedon x , Bayes’ theorem can be written as π ( x |Y ) = π ( Y| x ) π ( x ) π ( Y ) , (1)where π ( x ) is known as the prior distribution of the parameters, i.e. , the distribution of X before observing the data Y . The conditional distribution π ( Y| x ), known as likelihood , stablishes a connection between the observations Y and a realization of the parameters X = x . For a given realization x , it returns the probability density of observing the data Y .Under the common assumption of independence between individual observations, { y ( i ) } Ni =1 ,the likelihood function takes the form: L : x (cid:55)→ L ( x ; Y ) def = π ( Y| x ) = N (cid:89) i =1 π ( y ( i ) | x ) . (2)The likelihood function is a map D X → R + , and it attains its maximum for the parameterset with the highest probability of yielding Y . With this, Bayes’ theorem from Eq. (1) canbe rewritten as: π ( x |Y ) = L ( x ; Y ) π ( x ) Z , with Z = (cid:90) D X L ( x ; Y ) π ( x ) d x , (3)where Z is a normalizing constant often called evidence or marginal likelihood . On the left-hand side, π ( x |Y ) is the posterior PDF, i.e. , the distribution of X after observing data Y . Inthis sense, Bayes’ theorem establishes a general expression for updating the prior distributionusing a likelihood function to incorporate information from the data. Bayesian model inversion describes the application of the Bayesian inference framework tothe problem of model inversion (Beck and Katafygiotis, 1998; Kennedy and O’Hagan, 2001;Jaynes, 2003; Tarantola, 2005). The two main ingredients needed to infer model parameterswithin the Bayesian framework are a prior distribution π ( x ) of the model parameters and alikelihood function L . In practical applications, prior information about the model param-eters is often readily available. Typical sources of such information are physical parameterconstraints or expert knowledge. Additionally, prior inversion attempts can serve as guide-lines to assign informative prior distributions. In cases where no prior information about theparameters is available, so-called non-informative or invariant prior distributions (Jeﬀreys,1946; Harney, 2016) can also be assigned. The likelihood function serves instead as the linkbetween model parameters X and observations of the model output Y . To connect thesetwo quantities, it is necessary to choose a so-called discrepancy model that gives the relativeprobability that the model response to a realization of X = x describes the observations.One common assumption for this probabilistic model is that the measurements are perturbedby a Gaussian additive discrepancy term E ∼ N ( ε | , Σ ), with covariance matrix Σ . For asingle measurement y ( i ) it reads: y ( i ) = M ( x ) + ε . (4)This discrepancy between the model output M ( X ) and the observables Y can result from measurement error or model inadequacies . By using this additive discrepancy model, thedistribution of the observables conditioned on the parameters Y | x is written as: π ( y ( i ) | x ) = N ( y ( i ) |M ( x ) , Σ ) , (5) here N ( ·| µ , Σ ) denotes the multivariate Gaussian PDF with mean value µ and covariancematrix Σ . The likelihood function L is then constructed using this probabilistic model π ( y ( i ) | x ) and Eq. (2). For a given set of measurements Y it thus reads: L ( x ; Y ) def = N (cid:89) i =1 N ( y ( i ) |M ( x ) , Σ ) . (6)With the fully speciﬁed Bayesian model inversion problem, Eq. (3) directly gives the posteriordistribution of the model parameters π ( x |Y ). In the setting of model inversion, the posteriordistribution represents therefore the state of belief about the true data-generating modelparameters, considering all available information: computational forward model, discrepancymodel and measurement data (Beck and Katafygiotis, 1998; Jaynes, 2003).Often, the ultimate goal of model inversion is to provide a set of inferred parameters,with associate conﬁdence measures/intervals. This is often achieved by computing posteriorstatistics ( e.g. , moments, mode, etc.). Propagating the posterior through secondary modelsis also of interest. So-called quantites of interest (QoI) can be expressed by calculatingthe posterior expectation of suitable functions of the parameters h ( x ) : R M → R , with X |Y ∼ π ( x |Y ), as in: E [ h ( X ) |Y ] = (cid:90) D X |Y h ( x ) π ( x |Y ) d x . (7)Depending on h , this formulation encompasses posterior moments ( h ( x ) = x i or h ( x ) =( x i − E [ X i ]) for the ﬁrst and second moments, respectively), posterior covariance ( h ( x ) = x i x j − E [ X i ] E [ X j ]) or expectations of secondary models ( h ( x ) = M (cid:63) ( x )). To pose a Bayesian inversion problem, the speciﬁcation of a prior distribution and a likelihoodfunction described in the previous section is suﬃcient. Its solution, however, is not availablein closed form in the general case.

Spectral likelihood expansion (SLE) is a recently proposed method that aims at solvingthe Bayesian inversion problem by ﬁnding a polynomial chaos expansion (PCE) of the like-lihood function in a basis orthogonal w.r.t. the prior distribution (Nagel and Sudret, 2016).This representation allows one to derive analytical expressions for the evidence Z , the pos-terior distribution, the posterior marginals, and many types of QoIs, including the posteriormoments.We oﬀer here a brief introduction to regression-based, sparse PCE before introducingSLE, but refer the interested reader to more exhaustive resources on PCE (Ghanem andSpanos, 1991; Xiu and Karniadakis, 2002) and sparse PCE (Xiu, 2010; Blatman and Sudret,2010, 2011).Let us consider a random variable X with independent components { X i , i = 1 , . . . , M } and associated probability density functions π i ( x i ) so that π ( x ) = (cid:81) Mi =1 π i ( x i ) . Assume urther that M : D X = (cid:81) Mi =1 D X i ⊆ R M → R is a scalar function of X which fulﬁllsthe ﬁnite variance condition ( E (cid:2) M ( X ) (cid:3) < + ∞ ). Then it is possible to ﬁnd a so-called truncated polynomial chaos approximation of M such that M ( X ) ≈ M PCE ( X ) def = (cid:88) α ∈A a α Ψ α ( X ) (8)where α is an M -tuple ( α , . . . , α M ) ∈ N M and A ⊂ N M . For most parametric distributions,well-known classical orthonormal polynomials { Ψ α } α ∈ N M satisfy the necessary orthonor-mality condition w.r.t. π ( x ) (Xiu and Karniadakis, 2002). For more general distributions,arbitrary orthonormal polynomials can be constructed numerically through the Stieltjes pro-cedure (Gautschi, 2004). If additionally, A is a sparse subset of N M , the truncated expansionin Eq. (8) is called a sparse PCE .There exist diﬀerent algorithms to produce a sparse PCE in practice, i.e. select asparse basis A and compute the corresponding coeﬃcients. A powerful class of methodsare regression-based approaches that rely on an initial input sample X , called experimentaldesign, and corresponding model evaluations M ( X ) (See, e.g. L¨uthen et al. (2020) for arecent survey). Additionally, it is possible to design adaptive algorithms that choose thetruncated basis size (Blatman and Sudret, 2011; Jakeman et al., 2015).To assess the accuracy of PCE, the so-called generalization error E (cid:2) ( M ( X ) − M PCE ( X )) (cid:3) shall be evaluated. A robust generalization error estimator is given by the leave-one-out (LOO) cross validation technique. This estimator is obtained by ε LOO = 1 K K (cid:88) i =1 (cid:16) M ( x ( i ) ) − M ∼ i PCE ( x ( i ) ) (cid:17) , (9)where M ∼ i PCE is constructed by leaving out the i -th point from the experimental design X .For methods based on linear regression, it can be shown (Chapelle et al., 2002; Blatman andSudret, 2010) that the LOO error is available analytically by post-processing the regressormatrix. The idea of SLE is to use sparse PCE to ﬁnd a spectral representation of the likelihoodfunction L occurring in Bayesian model inversion problems (see Eq. (2)). We present here abrief introduction to the method and the main results of Nagel and Sudret (2016).Likelihood functions can be seen as scalar functions of the input random vector X ∼ π ( x ). In this work we assume priors of the type π ( x ) = (cid:81) Mi =1 π i ( x i ), i.e. with independentmarginals. Additionally, likelihood functions fulﬁll the ﬁnite variance condition (Nagel andSudret, 2016) and therefore admit a spectral decomposition: L ( X ) ≈ L SLE ( X ) def = (cid:88) α ∈A a α Ψ α ( X ) , (10)where the explicit dependence on Y was dropped for notational simplicity. pon computing the basis and coeﬃcients in Eq. (8), the solution to the inverse problemis converted to merely post-processing the coeﬃcients a α . The following expressions can bederived for the individual quantities: Evidence

The evidence emerges as the constant polynomial’s coeﬃcient a Z = (cid:90) D X L ( x ) π ( x ) d x ≈ (cid:104)L SLE , (cid:105) π = a . (11) Posterior

Upon computing the evidence Z , the posterior can be evaluated directly through π ( x |Y ) ≈ L SLE ( x ) π ( x ) Z = π ( x ) a (cid:88) α ∈A a α Ψ α ( x ) . (12) Posterior marginals

Let u and v be two non-empty disjoint index sets such that u ∪ v = { , . . . , M } . We split the random vector X into two vectors X u with components { X i } i ∈ u ∈ D X u and X v with components { X i } i ∈ v ∈ D X u . Denote further by π u ( x u ) def = (cid:81) i ∈ u π i ( x i ) and π v ( x v ) def = (cid:81) i ∈ v π i ( x i ) the prior marginal density functions of X u and X v respectively. The posterior marginals then read: π u ( x u |Y ) = (cid:90) D X v π ( x |Y ) d x v ≈ π u ( x u ) a (cid:88) α ∈A v =0 a α Ψ α ( x u ) , (13)where A v =0 = { α ∈ A : α i = 0 ⇔ i ∈ v } . The series in the above equation constitutesa subexpansion that contains non-constant polynomials only in the directions i ∈ u . Quantities of interest

Finally, it is also possible to analytically compute posterior expec-tations of functions that admit a polynomial chaos expansion in the same basis of theform h ( X ) ≈ (cid:80) α ∈A b α Ψ α ( X ). Eq. (7) then reduces to the spectral product: E [ h ( X ) |Y ] = 1 a (cid:88) α ∈A a α b α . (14)The quality of these results depends only on the approximation error introduced inEq. (10). The latter, in turn, depends mainly on the chosen PCE truncation strategy (Blat-man and Sudret, 2011; Nagel and Sudret, 2016) and the number of points used to computethe coeﬃcients ( i.e. , the experimental design). It is known that likelihood functions typicallyhave quasi-compact supports ( i.e. , L ( X ) ≈ D X ). Such functions requirea very high polynomial degree to be approximated accurately, which in turn can lead to theneed for prohibitively large experimental designs. Stochastic spectral embedding (SSE) is a multi-level approach to surrogate modeling origi-nally proposed in Marelli et al. (2020). It attempts to approximate a given square-integrablefunction M through M ≈ M

SSE ( X ) = (cid:88) k ∈K D k X ( X ) (cid:98) R kS ( X ) , (15) here K ⊆ N is a set of multi-indices with elements k = ( (cid:96), p ) for which (cid:96) = 0 , . . . , L and p = 1 , . . . , P (cid:96) where L is the number of levels and P (cid:96) is the number of subdomains at aspeciﬁc level (cid:96) . We call (cid:98) R kS ( X ) a residual expansion given by (cid:98) R kS ( X ) = (cid:88) α ∈A k a k α Ψ k α ( X ) . (16)In the present paper the term (cid:80) j ∈A k a kj Ψ kj ( X ) denotes a polynomial chaos expansion(see Eq. (8)) constructed in the subdomain D k X , but in principle it can refer to any spectralexpansion ( e.g. , Fourier series). A schematic representation of the summation in Eq. (15) isgiven in Figure 2. The detailed notation and the algorithm to sequentially construct an SSEare given in the sequel. Viewing the likelihood as a function of a random variable X , we can directly use Eq. (15)to write down its SSE representation L ( X ) ≈ L SSE ( X ) def = (cid:88) k ∈K D k X ( X ) (cid:98) R kS ( X ) , (17)where the variable X is distributed according to the prior distribution π ( x ) and, conse-quently, the local basis used to compute (cid:98) R kS ( X ) is orthonormal w.r.t. that distribution.Due to the local spectral properties of the residual expansions, the SSE representation ofthe likelihood function retains all of the post-processing properties of SLE (Section 3.1): Evidence

The normalization constant Z emerges as the sum of the constant polynomialcoeﬃcients weighted by the prior mass: Z = (cid:88) k ∈K (cid:88) α ∈A k a k α (cid:90) D k X Ψ k α ( x ) π ( x ) d x = (cid:88) k ∈K V k a k , where V k = (cid:90) D k X π ( x ) d x . (18) Posterior

This allows us to write the posterior as π ( x |Y ) ≈ L SSE ( x ) π ( x ) Z = π ( x ) (cid:80) k ∈K V k a k (cid:88) k ∈K D k X ( x ) (cid:98) R kS ( x ) . (19) Posterior marginal

Utilizing again the disjoint sets u and v from Eq. (13) it is also possibleto analytically derive posterior marginal PDFs as π u ( x u |Y ) = (cid:90) D X v π ( x |Y ) d x v ≈ π u ( x u ) (cid:80) k ∈K V k a k (cid:88) k ∈K D k X u ( x u ) (cid:98) R kS, u ( x u ) V k v (20)where (cid:98) R kS, u ( x u ) = (cid:88) α ∈A k v =0 a k α Ψ k α ( x u ) and V k v = (cid:90) D k X v π v ( x v ) d x v . (21) (cid:98) R kS, u ( x u ) is a subexpansion of (cid:98) R kS ( x ) that contains only non-constant polynomials inthe directions i ∈ u . Note that, as we assumed that the prior distribution has inde-pendent components, the constants V k and V k v are obtained as products of univariate ntegrals which are available analytically from the prior marginal cumulative distribu-tion functions (CDFs). Quantities of interest

Expected values of function h ( x ) = (cid:80) α ∈A k b k α Ψ k α ( x ) for k ∈ K under the posterior can be approximated by: E [ h ( X ) |Y ] = (cid:90) D X h ( x ) π ( x |Y ) d x = 1 Z (cid:88) k ∈K (cid:88) α ∈A k a k α (cid:90) D k X h ( x )Ψ k α ( x ) π ( x ) d x = 1 Z (cid:88) k ∈K (cid:88) α ∈A k a k α b k α , (22)where b k α are the coeﬃcients of the PCE of h in the card( K ) bases { Ψ k α } α ∈A k . Thiscan also be used for computing posterior moments.These expressions can be seen as a generalization of the ones for SLE detailed in Sec-tion 3.1. For a single-level global expansion ( i.e. , card( K ) = 1) and consequently V (0 , = 1,they are identical. The original algorithm for computing an SSE was presented in Marelli et al. (2020). Itrecursively partitions the input domain D X and constructs truncated expansions of theresidual. We reproduce it below for reference but replace the model M with the likelihoodfunction L . We further simplify the algorithm by choosing a partitioning strategy with N S = 2.1. Initialization: (a) (cid:96) = 0, p = 1(b) D (cid:96),p X = D X (c) R (cid:96) ( X ) = L ( X )2. For each subdomain D (cid:96),p X , p = 1 , · · · , P (cid:96) : (a) Calculate the truncated expansion (cid:98) R (cid:96),pS ( X (cid:96),p ) of the residual R (cid:96) ( X (cid:96),p ) in thecurrent subdomain(b) Update the residual in the current subdomain R (cid:96) +1 ( X (cid:96),p ) = R (cid:96) ( X (cid:96),p ) − (cid:98) R (cid:96),pS ( X (cid:96),p )(c) Split the current subdomain D (cid:96),p X in 2 subdomains D (cid:96) +1 , { s ,s } X based on a parti-tioning strategy(d) If (cid:96) < L , (cid:96) ← (cid:96) + 1, go back to 2a, otherwise terminate the algorithm3. Termination (a) Return the full sequence of D (cid:96),p X and (cid:98) R (cid:96),pS ( X (cid:96),p ) needed to compute Eq. (15).In practice, the residual expansions (cid:98) R (cid:96),pS ( X (cid:96),p ) are computed using a ﬁxed experimentaldesign X and corresponding model evaluations L ( X ). The algorithm then only requires thespeciﬁcation of a partitioning strategy and a termination criterion, as detailed in Marelliet al. (2020). ikelihood functions are typically characterized by a localized behaviour: Close to thedata-generating parameters they peak while quickly decaying to 0 in the remainder of theprior domain. This means that in a majority of the domain the likelihood evaluation isnon-informative. Directly applying the original algorithm is then expected to waste manylikelihood evaluations.We therefore modify the original algorithm by adding an adaptive sampling scheme (Sec-tion 4.2.1) that includes the termination criterion and introducing an improved partitioningstrategy (Section 4.2.2) that is especially suitable for ﬁnding compact support features. Therationale for these modiﬁcations is presented next. The proposed algorithm has two parameters: the experimental design size for the residualexpansions N ref and the ﬁnal experimental design size corresponding to the available com-putational budget N ED . At the initialization of the algorithm N ref points are sampled asa ﬁrst experimental design. At every further iteration, additional points are then sampledfrom the prior distribution. These samples are generated in a space-ﬁlling way ( e.g. throughlatin hypercube sampling) in the newly created subdomains D (cid:96) +1 ,s X to have always exactly N ref points available for constructing the residual expansions. The algorithm is terminated,once the computational budget N ED has been exhausted.At every step, the proposed algorithm chooses a single reﬁnement domain from the setof unsplit, i.e. terminal domains , creates two new subdomains by splitting the reﬁnementdomain and constructs residual expansions after enriching the experimental design. Theselection of this reﬁnement domain is based on the error estimator E k that is deﬁned by E (cid:96) +1 ,s =  E (cid:96) +1 ,s LOO V (cid:96) +1 ,s , if ∃ (cid:98) R (cid:96) +1 ,sS ,E (cid:96),s LOO V (cid:96) +1 ,s , otherwise . (23)This estimator incorporates the subdomain size through the prior mass V (cid:96) +1 ,s , and theapproximation accuracy, through the leave-one-out estimator. The distinction is necessaryto assign an error estimator also to domains that have too few points to construct a residualexpansion, in which case the error estimator of the previous level E (cid:96),s LOO is reused.The algorithm sequentially splits and reﬁnes subdomains with large approximation errors.Because likelihood functions typically have the highest complexity close to their peak, theseregions tend to have larger approximation errors and are therefore predominantly picked forreﬁnement. The proposed way of adaptive sampling then ends up placing more points nearthe likelihood peak, thereby reducing the number of non-informative likelihood evaluations.The choice of a constant N ref is simple and could in principle be replaced by a moreelaborate strategy ( e.g. , based on the approximation error of the current subdomain relativeto the total approximation error). A beneﬁt of this enrichment criterion is that all residual xpansions are computed with experimental designs of the same size. Upon choosing thedomain with the maximum approximation error among the terminal domains, the errorestimators then have a more comparable estimation accuracy. The partitioning strategy determines how a selected reﬁnement domain is split. As describedin Marelli et al. (2020), it is easy to deﬁne the splitting in the uniformly distributed quan-tile space U and map the resulting split domains D (cid:96),p U to the (possibly unbounded) realspace X through an appropriate isoprobabilistic transform ( e.g. , the Rosenblatt transform(Rosenblatt, 1952)).Similar to the original SSE algorithm presented in Marelli et al. (2020), we split thereﬁnement domain in half w.r.t. its prior mass. The original algorithm chooses the splittingdirection based on the partial variance in the reﬁnement domain. This approach is well suitedfor generic function approximation problems. For the approximation of likelihood functions,however, we propose a partitioning strategy that is more apt for dealing with their compactsupport .We propose to pick the split direction along which a split yields a maximum diﬀerencein the residual empirical variance between the two candidate subdomains created by thesplit. This can easily be visualized with an example given by the M = 2 dimensionaldomain D (cid:96),p X in Figure 1(a). Assume this subdomain was selected as the reﬁnement domain.To decide along which dimension to split, we construct the M candidate subdomain pairs {D i, , D i, } i =1 ,...,M and estimate the corresponding { E i split } i =1 ,...,M in those subdomainsdeﬁned by E i split def = (cid:12)(cid:12)(cid:12) Var (cid:104) R (cid:96) +1 ( X i, ) (cid:105) − Var (cid:104) R (cid:96) +1 ( X i, ) (cid:105)(cid:12)(cid:12)(cid:12) . (24)In this expression, X i, and X i, denote subsets of the experimental design X inside thesubdomains D i, and D i, respectively. The occurring variances can be easily estimatedwith the empirical variance of the residuals in the respective candidate subdomains.After computing the residual variance diﬀerences, the split is carried out along the di-mension d = arg max i ∈{ ,...,M } E i split , (25) i.e. , to keep the subdomains D d, and D d, that introduce the largest diﬀerence in variance.For d = 1, the resulting split can be seen in Figure 1(d).The choice of this partitioning strategy can be justiﬁed heuristically with the goal ofapproximating compact support functions. Assume that the likelihood function has com-pact support, this criterion will avoid cutting through its support and instead identify a splitdirection that results in one subdomain with large variance (expected to contain the likeli-hood support) and a subdomain with small variance. In subsequent steps, the algorithm willproceed by cutting away low variance subdomains, until the likelihood support is isolated. a) Reﬁnement domain (b) Split along d = 1 (c) Split along d = 2 (d) Selected pair Figure 1: Partitioning strategy for a 2D example visualized in the quantile space U . Thereﬁnement domain D (cid:96),p U is split into two subdomains D (cid:96) +1 ,s U and D (cid:96) +1 ,s U . The algorithm is presented here with its two parameters N ref , the minimum experimentaldesign size needed to expand a residual, and N ED , the ﬁnal experimental design size. Thesample X (cid:96),p refers to X ∩ D (cid:96),pX , i.e. the subset of X inside D (cid:96),pX . Further, the multi-indexset T ∈ N at each step of the algorithm gathers all indices ( (cid:96), p ) of unsplit subdomains. Itthus denotes the terminal domains: D k X , k ∈ T . For visualization purposes we show the ﬁrstiterations of the algorithm for a two-dimensional example in Figure 2.1. Initialization: (a) D , X = D X (b) Sample from prior distribution X = { x (1) , · · · , x ( N ref ) } (c) Calculate the truncated expansion (cid:98) R , S ( X ) of L ( X ) in the full domain X , ,retrieve its approximation error E , and initialize T = { (0 , } (d) R ( X ) = L ( X ) − (cid:98) R , S ( X )2. For ( (cid:96), p ) = arg max k ∈T E k : (a) Split the current subdomain D (cid:96),p X in 2 sub-parts D (cid:96) +1 , { s ,s } X and update T (b) For each split s = { s , s } i. If |X (cid:96) +1 ,s | < N ref and N ref − |X (cid:96) +1 ,s | < N ED − |X | A. Enrich sample X with N ref − |X (cid:96) +1 ,s | new points inside D (cid:96) +1 ,s X ii. If |X (cid:96) +1 ,s | = N ref A. Create the truncated expansion (cid:98) R (cid:96) +1 ,sS ( X (cid:96) +1 ,s ) of the residual R (cid:96) +1 ( X (cid:96) +1 ,s )in the current subdomain using X (cid:96) +1 ,s B. Update the residual in the current subdomain R (cid:96) +2 ( X (cid:96) +1 ,s ) = R (cid:96) +1 ( X (cid:96) +1 ,s ) − (cid:98) R (cid:96) +1 ,sS ( X (cid:96) +1 ,s )iii. Retrieve the approximation error E (cid:96) +1 ,s from Eq. (23)(c) If no new expansions were created, terminate the algorithm, otherwise go back to23. Termination a) Initialization (b) 1st iteration (c) 3rd iteration Figure 2: Graphical representation of the ﬁrst steps of the adaptive SSE algorithm describedin Section 4.2.3 for a two-dimensional problem with independent prior distribution. Upper row:partitioning in the quantile space; Lower row: partitioning in the unbounded real space with π ( x ) contour lines in dashed blue. Red dots show the adaptive experimental design that has aconstant size of N ref = 5 in each created subdomain. The terminal domains T are highlightedin orange. The splitting direction in each subdomain is determined randomly in this example. (a) Return the full sequence of D (cid:96),p X and (cid:98) R (cid:96),pS ( X (cid:96),p ) needed to compute Eq. (15)The updating of the multi-index set in Step 2a refers to removing the current index ( (cid:96), p )from the set and adding to it the newly created indices ( (cid:96) + 1 , s ) and ( (cid:96) + 1 , s ). To showcase the eﬀectiveness of the proposed adaptive SSE approach, we present three casestudies with increasing dimensionality and diﬀerent model complexity: (i) a one-dimensionalvibration problem for illustrative purposes, (ii) a six-dimensional heat transfer problemthat describes the steady-state heat evolution in a solid body with inclusions and (iii)a 62-dimensional diﬀusion problem modeling the concentration-driven diﬀusion in a one-dimensional domain.For all case studies, we adopt the adaptive sparse-PCE based on LARS approach devel-oped in Blatman and Sudret (2011) through its numerical implementation in UQLab (Marelliand Sudret, 2014, 2019). Each (cid:98) R kS is therefore a degree- and q -norm-adaptive polynomial haos expansion. We further introduce a rank truncation of r = 2, i.e. we limit the max-imum number of input interactions (Marelli and Sudret, 2019) to two variables at a time.The truncation set for each spectral expansion (Eq. (8)) thus reads: A M,p,q,r = { α ∈ N M : || α || q ≤ p, || α || ≤ r } . (26)where || α || q = (cid:32) M (cid:88) i =1 α qi (cid:33) q , q ∈ (0 , || α || = M (cid:88) i =1 { α i > } . (27)The q -norm is adaptively increased between q = { . , · · · , . } while the maximum poly-nomial degree is adaptively increased in the interval p = { , , · · · , p } , where the maximumdegree p = 20 for case study (i) and (ii) and p = 3 for case study (iii) due to its highdimensionality.In case study (ii) and (iii), the performance of SLE (Nagel and Sudret, 2016), the originalnon-adaptive SSE (Marelli et al., 2020) and the proposed adaptive SSE approach presentedin Section 4.2 is compared. The comparison was omitted for case study (i), because onlyadaptive SSE succeeded in solving the problem. For clarity, we henceforth abbreviate theadaptive SSE algorithm to adSSE.To simplify the comparison, the same partitioning strategy employed for adSSE (Sec-tion 4.2.2) was employed for the non-adaptive SSE approach. Also, the same experimentaldesigns were used for the non-adaptive SSE and the SLE approaches. Finally, the sameparameter N ref was used to deﬁne the enrichment samples in adSSE and the terminationcriterion in non-adaptive SSE.To assess the performance of the three algorithms considered, we deﬁne an error measurethat allows to quantitatively compare the similarity of the SSE, adSSE and SLE solutionwith the reference MCMC solution. This comparison is inherently diﬃcult, as a sampling-based approach (MCMC) needs to be compared to a functional approximation (SSE, adSSE,SLE). We proceed to compare the univariate posterior marginals , available analytically inSSE, adSSE and SLE (See Eq. (13) and Eq. (20)), to the reference posterior marginalsestimated with kernel density estimation (KDE, Wand and Jones (1995)) from the MCMCsample. Denoting by ˆ π i ( κ i |Y ) the SSE, adSSE or SLE approximations and by π i ( κ i |Y ) thereference solution, we deﬁne the following error measure η def = 1 M M (cid:88) i =1 JSD (ˆ π i ( κ i |Y ) || π i ( κ i |Y )) (28)where M is the dimensionality of the problem and JSD is the Jensen-Shannon divergence (Lin, 1991), a symmetric and bounded ([0 , log(2)]) distance measure for probability distri-butions based on the Kullback-Leibler divergence.The purpose of the error measure η is to allow for a fair comparison between the diﬀerentmethods investigated. It is not a practical measure for engineering applications because itrelies on the availability of a reference solution, and it its magnitude does not have a clear : Sketch of the linear oscillator quantitative interpretation. However, it is considerably more comprehensive than a puremoment-based error measure. Because it is averaged over all M marginals, it encapsulatesthe approximation accuracy of all univariate posterior marginals in a scalar value.As all algorithms (SSE, adSSE, SLE) depend on a randomly chosen experimental designs,we produce 20 replications for case study (ii) and 5 replications for case study (iii) by runningthem multiple times. This ﬁrst case study serves as an illustrative example of how the proposed adaptive algorithmconstructs an adSSE. In the presented problem the goal is the inference of a single unknownparameter with a bi-modal posterior distribution. This problem is very diﬃcult to solve withstandard MCMC methods due to the probability valley, i.e. low probability region, betweenthe posterior peaks. The considered problem is fabricated and uses synthetic data, but ispresented in the context of a relevant engineering problem.Consider the oscillator displayed in Figure 3 subject to harmonic ( i.e. , sinusoidal) excita-tion. Assume the prior information about its stiﬀness X def = k is that it follows a lognormaldistribution with µ = 0 . N/m and σ = 0 . N/m . Its true value shall be determined us-ing measurements of the oscillation amplitude at the location of the mass m . The knownproperties of the oscillator system are the oscillator mass m = 1 kg , the excitation frequency ω = 1 rad/s and the viscous damping coeﬃcient c = 0 . N s/m . The oscillation amplitudeis measured in ﬁve independent oscillation events and normalized by the forcing amplitudeyielding the measured amplitude ratios Y = { . , . , . , . , . } .This problem is well known in mechanics and in the linear case ( i.e. , assuming smalldeformations and linear material behavior) can be solved analytically with the amplitude ofthe frequency response function . This function returns the ratio between the steady stateamplitude of a linear oscillator and the amplitude of its excitation. It is given by M ( X ) = mω (cid:112) ( X − mω ) + ( cω ) . (29) e assume a discrepancy model with known discrepancy standard deviation σ . In con-junction with the available measurements Y , this leads to the following likelihood function: L ( x ; Y ) = (cid:89) i =1 N ( y ( i ) |M ( x ) , σ ) . (30)We employ the adSSE algorithm to approximate this likelihood function with N ref = 10.A few iterations from the solution process are shown in Figure 4. The top plots show thesubdomains D (cid:96),pX constructed at each reﬁnement step, highlighting the terminal domains T .The middle plots display the residual between the true likelihood and the approximation atthe current iteration, as well as the adaptively chosen experimental design X . The bottomplots display the target likelihood function and its current approximation.The initial global approximation of the ﬁrst iteration in Figure 4(a) is a constant poly-nomial based on the initial experimental design. By the third iteration, the algorithm hasidentiﬁed the subdomain D , X as the one of interest and proceeds to reﬁne it in subsequentsteps. By the 8th iteration both likelihood peaks have been identiﬁed. Finally, by the 10thiteration in Figure 4(d), both likelihood peaks are approximated well by the adSSE approach.The last iteration shows how the algorithm splits domains and adds new sample points.There is a clear clustering of subdomains and sample points near the likelihood peaks at X = 0 .

95 and X = 1 . X |Y . Itis shown together with the true posterior and the original prior distribution in Figure 5.For this case study, non-adaptive experimental design approaches like the standard SSE(Marelli et al., 2020) and the original SLE algorithm (Nagel and Sudret, 2016) will almostsurely fail for the considered experimental design of N ED = 100. In numerous trial runsthese approaches did not manage to accurately reconstruct the likelihood function due to alack of informative samples near the likelihood peaks. This case study was originally presented in Nagel and Sudret (2016) and solved there usingSLE. We again solve the same problem with SSE and compare the performance of SLE (Nageland Sudret, 2016), the original non-adaptive SSE (Marelli et al., 2020) and the proposedadSSE approach presented in Section 4.2.Consider the diﬀusion-driven stationary heat transfer problem sketched in Figure 6(a).It models a 2D plate with a background matrix of constant thermal conductivity κ and6 inclusions with conductivities κ def = ( κ , . . . , κ ) (cid:124) . The diﬀusion driven steady state heatdistribution is described by a heat equation in Euclidean coordinates r def = ( r , r ) (cid:124) of the a) 1st iteration, N ED = 10 (b) 3rd iteration, N ED = 30(c) 8th iteration, N ED = 80 (d) 10th iteration, N ED = 100 Figure 4:

One-dimensional vibration problem : Illustration of the adSSE algorithm approximat-ing the likelihood function L . 18igure 5: One-dimensional vibration problem : Comparison of the true multimodal posterior andits adSSE based approximation. (a) Problem sketch (b) Steady state solution

Figure 6:

Moderate-dimensional heat transfer problem : Model setup and exemplary solution. form ∇ · ( κ ( r ) ∇ ˜ T ( r )) = 0 , (31)where the thermal conductivity ﬁeld is denoted by κ and the temperature ﬁeld by ˜ T . Theboundary conditions of the plate are given by a no-heat-ﬂux Neumann boundary conditionson the left and right sides ( ∂ ˜ T /r = 0), a Neumann boundary condition on the bottom ( κ ∂ ˜ T /r = q ) and a temperature ˜ T Dirichlet boundary condition on the top .We employ a ﬁnite element (FE) solver to solve the weak form of Eq. (31) by discretizingthe domain into approximately 10 triangular elements. A sample solution returned by theFE-solver is shown in Figure 6b.In this example we intend to infer the thermal conductivities κ of the inclusions. Weassume the same problem constants as in Nagel and Sudret (2016) ( i.e. , q = 2 ,

000 W/m ,˜ T = 200 K, κ = 45W/m/K). The forward model M takes as an input the conductivities ofthe inclusions κ , solves the ﬁnite element problem and returns the steady state temperature˜ T def = ( ˜ T , . . . , ˜ T ) (cid:124) at the measurement points, i.e. , M : κ (cid:55)→ ˜ T . o solve the inverse problem, we assume a multivariate lognormal prior distributionwith independent marginals on the inclusion conductivities, i.e. π ( κ ) = (cid:81) i =1 LN ( κ i | µ =30 W/m/K , σ = 6 W/m/K). We further assume an additive Gaussian discrepancy model,which yields the likelihood function L ( κ ; Y ) = 12 πσ exp (cid:18) − σ ( T − M ( κ )) (cid:124) ( T − M ( κ )) (cid:19) , (32)with a discrepancy standard deviation of σ .As measurements, we generate one temperature ﬁeld with ˆ κ def = (32 , , , , , (cid:124) W/m/Kand collect its values at 20 points indicated by black dots in Figure 6(a). We then perturbthese temperature values with additive Gaussian noise and use them as the inversion data Y def = T = ( T , . . . , T ) (cid:124) .We look at two instances of this problem that diﬀer only by the discrepancy parameter σ from Eq. (32). The prior model response has a standard deviation of approximately0 . T i . We therefore solve the problem ﬁrst witha large value σ = 0 .

25 K and second with a small value σ = 0 . N ED = { , , , , , , } .The number of reﬁnement samples is set to N ref = 1 , ,

000 steps and50 parallel chains, requiring a total of N ED = 1 . · likelihood evaluations. Based onnumerous heuristic convergence tests and due to the large number of MCMC steps, theresulting samples can be considered to accurately represent the true posterior distributions.The results of the analyses are summarized in Figure 7, where the error measure η isplotted against the number of likelihood evaluations for the large and small standard devi-ation case. For the large discrepancy standard deviation case, both SSE approaches clearlyoutperform standard SLE w.r.t. the error measure η . This is most signiﬁcant at mid-rangeexperimental designs ( N ED = 5 , , small discrepancy standard deviation, where the limitations of ﬁxed experimental designs becomeobvious. When the likelihood function is nonzero in a small subdomain of the prior, theglobal SLE and non-adaptive SSE approach will fail in practice because of the insuﬃcient a) Large discrepancy , σ = 0 .

25 K(b)

Small discrepancy , σ = 0 . Figure 7:

Moderate-dimensional heat transfer problem : Convergence of the η error measure(Eq. (28)) as a function of the experimental design size N ED in 20 replications for SLE, SSE witha static experimental design and the proposed adSSE approach. We display the two discrepancystandard deviation cases σ = { . , . } K. number of samples placed in the informative regions. The adSSE approach, however, worksvery well in these types of problems. It manages to identify the regions of interest andproduces a likelihood approximation that accurately reproduces the posterior marginals.Tables 1 and 2 show the convergence of the adSSE method’s moment estimate (meanand variance) to the reference solution for a single run. In brackets next to the momentestimates ξ , the relative error (cid:15) def = | ξ MCMC − ξ SSE | /ξ MCMC is also shown. Due to the non-strict positivity of the SSE estimate, one variance estimate computed with Eq. (22) is negativeand is therefore omitted from Table 2.The full posterior marginals obtained from one run of adSSE with N ED = 30 ,

000 are alsocompared to those of the reference MCMC and displayed in Figure 8. The individual plotsshow the univariate posterior marginals ( i.e. π ( x i |Y )) on the main diagonal and the bivariateposterior marginals ( i.e. π ( x ij |Y )) in the i -th row and j -th column. It can be clearly seen that Moderate-dimensional heat transfer problem : adSSE results with large discrepancystandard deviation σ = 0 .

25 K. Relative errors w.r.t.

MCMC reference solution are shown inbrackets. E [ κ i |Y ] κ κ κ κ κ κ adSSE N ED = 1 ,

000 30 . . . . . . . . . . . . N ED = 2 ,

000 31 . . . . . . . . . . . . N ED = 5 ,

000 29 . . . . . . . . . . . . N ED = 10 ,

000 29 . . . . . . . . . . . . N ED = 30 ,

000 29 . . . . . . . . . . . . MCMC 29 .

81 32 .

30 20 .

71 32 .

31 36 .

45 26 . (cid:112) Var [ κ i |Y ] adSSE N ED = 1 ,

000 6 . . . . . . . . . . . . N ED = 2 ,

000 6 . . . . . . . . . . . . N ED = 5 ,

000 2 . . . . . . . . . . . . N ED = 10 ,

000 2 . . . . . . . . . . . . N ED = 30 ,

000 2 . . . . . . . . . . . . MCMC 3 .

22 4 .

24 2 .

48 5 .

08 3 .

83 3 . Table 2:

Moderate-dimensional heat transfer problem : adSSE results with small discrepancystandard deviation σ = 0 . w.r.t. MCMC reference solution are shown inbrackets. Field with an asterisk ( ∗ ) indicates negative variance estimate. E [ κ i |Y ] κ κ κ κ κ κ adSSE N ED = 1 ,

000 30 . . . . . . . . . . . . N ED = 2 ,

000 30 . . . . . . . . . . . . N ED = 5 ,

000 30 . . . . . . . . . . . . N ED = 10 ,

000 30 . . . . . . . . . . . . N ED = 30 ,

000 31 . . . . . . . . . . . . MCMC 31 .

17 33 .

61 18 .

49 31 .

83 38 .

84 25 . (cid:112) Var [ κ i |Y ] adSSE N ED = 1 ,

000 6 . . . . . . . . . . . . N ED = 2 ,

000 6 . . . . . . . . . . . . N ED = 5 ,

000 6 . . . . . . . . . . . . N ED = 10 ,

000 4 . . ∗ ( ∗ ) 1 . . . . . . . . N ED = 30 ,

000 1 . . . . . . . . . . . . MCMC 1 .

63 2 .

40 1 .

30 3 .

54 1 .

98 1 . he posterior characteristics are very well captured. However, the adSSE approach sometimesfails to accurately represent the tails of the distribution. This is especially obvious in thesmall discrepancy case in Figure 8(c) where the tail is sometimes cut oﬀ. We emphasize herethat the SSE marginals are obtained analytically as 1D and 2D surfaces for the univariateand bivariate marginals respectively. For the reference MCMC approach, on the other hand,they need to be approximated with histograms based on the available posterior sample. In practical inference applications, posterior moments are often one of the main quantitiesof interest. An estimator of these moments is readily available at every reﬁnement step ofadSSE through Eq. (22).Tracking the evolution of the posterior moments throughout the adSSE iterations can beused as a heuristic estimator of the convergence of the adSSE algorithm. However, only thestability of the solution can be assessed, without guarantees on the bias. As an example,we now consider the large discrepancy problem and plot the evolution of the posterior meanand standard deviation for every X i as a function of the number of likelihood evaluations inFigure 9. It can be seen that after ∼ ,

000 likelihood evaluations, most moment estimatorsachieve convergence to a value close to the reference solution. This plot also reveals a smallbias of the E [ X |Y ] and (cid:112) Var [ X |Y ] estimators, that was previously highlighted in Table 1. N ref The main hyperparameter of the proposed adSSE algorithm is the number N ref , whichcorresponds to the number of sample points that are required at each PCE constructionstep (see Section 4.2). In Figure 10 we display the eﬀect of diﬀerent N ref values on theconvergence in the small and large discrepancy problems. N ref inﬂuences the accuracy of the two error estimators used inside the adSSE algorithm.They are: (i) the residual expansion accuracy E LOO in Eq. (23) and (ii) the splitting error E split in Eq. (24).Small values of N ref allow to quickly obtain a crude likelihood approximation with limitedexperimental design sizes N ED , but this comes at the cost of lower convergence rates at larger N ED . This behaviour can be partially attributed to the deterioration of residual expansionerror E LOO in Eq. (23). At small experimental design sizes, the overall number of terminaldomains is relatively small and this eﬀect is not as pronounced. At larger experimentaldesigns and higher numbers of subdomains, however, the error estimators high variances canlead to diﬃculties in identifying the true high error subdomains.Large values of N ref lead to slower initial convergence rates because of the smaller numberof overall subdomains. The algorithm stability, however, is increased because both errorestimators have lower variance and thereby allow the algorithm to more reliably identify the a) Large discrepancy , σ = 0 .

25 K, adSSE (b)

Large discrepancy , σ = 0 .

25 K, MCMC(c)

Small discrepancy , σ = 0 . Small discrepancy , σ = 0 . Figure 8:

Moderate-dimensional heat transfer problem : Comparison plots of the posterior dis-tribution marginals computed from adSSE ( N ED = 30 , N ED = 1 . · ). Theprior marginals are shown in red. 24 a) Mean (b) Standard deviation Figure 9:

Moderate-dimensional heat transfer problem : Evolution of the posterior moment es-timation for a typical run of the adSSE algorithm on the small discrepancy problem . The thinlines show the MCMC reference solution. (a)

Large discrepancy , σ = 0 .

25 K(b)

Small discrepancy , σ = 0 . Figure 10:

Moderate-dimensional heat transfer problem : Convergence of the η error measure(Eq. (28)) as a function of the experimental design size N ED in 20 replications for the proposedadSSE approach with diﬀerent N ref parameters. We display the two discrepancy standard devi-ation cases σ = { . , . } K. 25 a) Diﬀusion coeﬃcient (b) Concentration

Figure 11:

High-dimensional diﬀusion problem : 5 independent realizations of X and the result-ing κ and u with the model output u (1) highlighted with a circle. true high error subdomains and choose the split directions that maximize Eq. (25). The last cast study shows that adSSE for Bayesian model inversion remains feasible in highdimensional problems. The considered forward model is often used as a standard benchmarkin UQ computations (Shin and Xiu, 2016; Fajraoui et al., 2017). It represents the 1-Ddiﬀusion along a domain with coordinate ξ ∈ [0 ,

1] given by the following boundary valueproblem: − ∂∂ξ (cid:20) κ ( ξ ) ∂u∂ξ ( ξ ) (cid:21) = 1 , with  u (0) = 0 , ∂u∂ξ (1) = 1 . (33)The concentration ﬁeld u can be used to describe any steady-state diﬀusion driven process( e.g. , heat diﬀusion, concentration diﬀusion, etc.). Assume that the diﬀusion coeﬃcient κ is a log-normal random ﬁeld given by κ ( ξ, ω ) = exp (10 + 3 g ( ξ )) where g is a standardnormal stationary Gaussian random ﬁeld with exponential autocorrelation function ρ ( ξ, ξ (cid:48) ) =exp ( − | ξ (cid:48) − ξ | ). Let g be approximated through a truncated Karhunen-Lo`eve expansion g ( ξ ) ≈ M (cid:88) k =1 X k e k ( ξ ) , (34)with the pairwise uncorrelated random variables X k denoting the ﬁeld coeﬃcients and thereal valued function e k obtained from the solution of the Fredholm equation for ρ (Ghanemand Spanos, 1991). The truncation variable is set to M = 62 to explain 99% of the variance.Some realizations of the random ﬁeld and resulting concentrations are shown in Figure 11.In this example, the random vector of coeﬃcients X = ( X , . . . , X ) shall be inferredusing a single measurement of the diﬀusion ﬁeld at u ( ξ = 1) given by Y = 0 .

16. Theconsidered model therefore takes as an input a realization of that random vector, and returnsthe diﬀusion ﬁeld at ξ = 1, i.e. , M : x (cid:55)→ u (1). e impose a standard normal prior on the ﬁeld coeﬃcients such that π ( x ) = (cid:81) i =1 N ( x i | , σ = 10 − . This yields the likelihood function L ( x ; Y ) = 12 πσ exp (cid:18) − σ (0 . − M ( x )) (cid:19) . (35)We proceed to compare the performance of standard SLE, non-adaptive SSE and theproposed adSSE approach on this example. We solve the problem with a set of maximumlikelihood evaluations N ED = { , , , , , } .In the present high-dimensional case, it is necessary to set N ref to a relatively largenumber ( N ref = 2 , N ref numbers the variance of the estimator of E split inEq. (24) makes it diﬃcult for the algorithm to correctly identify the splitting direction thatmaximizes Eq. (25).To compare the results of the algorithms, they are compared to a reference MCMCsolution obtained with the aﬃne-invariant ensemble sampler (Goodman and Weare, 2010)algorithm with 100 ,

000 steps and 100 parallel chains at a total cost of 10 likelihood evalu-ations.To allow a quantitative comparison, we again use the error measure from Eq. (28) with M = 62. It is plotted for a set of maximum likelihood evaluations in Figure 12. It is clearthat both SSE algorithms outperform SLE, while the adSSE approach manages to improvethe performance of SLE by an order of magnitude.The overall small magnitude of the error η in Figure 12 can be attributed to the low active dimensionality of this problem. Despite its high nominal dimensionality ( M = 62),this problem in fact has only very few active dimensions, as the ﬁrst few variables aresigniﬁcantly more important than the rest. In physical terms, very local ﬂuctuations ofthe conductivity do not inﬂuence the output u (1) which results from an integration of theseﬂuctuations. Therefore, the biggest change between prior and posterior distribution happensin the ﬁrst few parameters (see also Table 3), while the other parameters remain unchangedby the updating procedure. This results in a small value of the Jensen-Shannon divergencefor the inactive dimensions that lower the average value η as deﬁned in Eq. (28).To highlight the results of one adSSE algorithm’s instance with N ED = 10 , { X , X , X } . The remaining posteriorparameters are not signiﬁcantly inﬂuenced by the considered data. The comparative plotsshow a good agreement between the adSSE and the reference solution, especially w.r.t. theinteraction between X and { X , X } . It can also be seen that the data Y has the biggestinﬂuence on the ﬁrst parameter X .For the same instance, we also compute the ﬁrst two posterior moments for all posteriormarginals and compare them to the MCMC reference solution. The resulting values arepresented in Table 3. Keeping in mind that the prior distribution is a multivariate standard High-dimensional diﬀusion problem : Convergence of the η error measure (Eq. (28))over ﬁve replications for SLE, SSE with a static experimental design and the proposed adSSEapproach.Table 3: High-dimensional diﬀusion problem : Posterior mean and standard deviation for allmarginals. The values in brackets are computed from the MCMC reference solution. Theprior is a multivariate standard normal distribution ( E [ X i ] = 0 and (cid:112) Var [ X i |Y ] = 1 for i =1 , . . . , , , , . . . , , X X X X X X E [ X i |Y ] − . − .

89) 0 . .

14) 0 . .

09) 0 . − . − . − . − . . (cid:112) Var [ X i |Y ] 0 . .

14) 1 . .

04) 0 . .

03) 1 . .

00) 1 . .

00) 0 . . X X X X X X E [ X i |Y ] 0 . .

03) 0 . . − . − . − . − . − . .

01) 0 . − . (cid:112) Var [ X i |Y ] 1 . .

01) 1 . .

98) 1 . .

00) 1 . .

02) 1 . .

99) 1 . . normal distribution ( E [ X i ] = 0 and (cid:112) Var [ X i |Y ] = 1 for i = 1 , . . . ,

62) it is obvious fromthis table that the data most signiﬁcantly aﬀects the ﬁrst three parameters.The adSSE approach manages to accurately recover the ﬁrst two posterior moments atthe relatively low cost of N ED = 10 ,

000 likelihood evaluations. The average absolute errorfor E [ X i ] and (cid:112) Var [ X i |Y ] is approximately 0 . Motivated by the recently proposed spectral likelihood expansions (SLE) framework forBayesian model inversion presented in Nagel and Sudret (2016), we showed that the sameanalytical post-processing capabilities can be derived when the novel stochastic spectral em-bedding (SSE) approach from Marelli et al. (2020) is applied to likelihood functions. BecauseSSE is designed for models with complex local characteristics , it was expected to outperformSLE on practically relevant, highly localized likelihood functions. To further improve SSE a) adSSE (b) MCMC Figure 13:

High-dimensional diﬀusion problem : Comparison plots of the posterior distributionmarginals for the ﬁrst 3 parameters { X , X , X } computed from adSSE ( N ED = 10 , N ED = 10 ). The prior marginals are shown in red. performance in Bayesian model inversion applications, we introduced a novel adaptive sam-pling procedure and modiﬁed partitioning strategy.There are a few unsolved shortcomings of the SSE that will be addressed in future works.Namely, the discontinuities at the subdomain boundaries may be a source of error thatshould be addressed. Additionally, for the adaptive SSE algorithm, it is not possible at themoment to specify the optimal N ref parameter a priori. In light of the considerable inﬂuenceof that parameter as shown in Section 5.2.2, it might be necessary to adaptively adjust it,or decouple this parameter from the termination criterion.Approximating likelihood functions through local PCEs prohibits the enforcement ofstrict positivity throughout the function domain. For visualization purposes this is notan issue, as negative predictions can simply be set to 0 in a post-processing step. Whencomputing posterior expectations with Eq. (22), however, this can lead to erroneous resultssuch as negative posterior variances. One way to enforce strict positivity is through an initialtransformation of the likelihood function ( e.g. , log-likelihood log L ≈ (cid:80) k ∈K f PCE k ( X )). Thisis avoided in the present work because it comes at a loss of the desirable analytical post-processing properties.The biggest advantage of SSE, however, is that it poses the challenging Bayesian com-putation in a function approximation setting. This yields an analytical expression of theposterior distribution and preserves the analytical post-processing capabilities of SLE while elivering highly improved likelihood function approximations. As opposed to many existingalgorithms, SSE can even eﬃciently solve Bayesian problems with multiple posterior modes.As shown in the case studies, the proposed adaptive algorithm further capitalizes on thecompact support nature of likelihood functions and leads to signiﬁcant performance gains,especially at larger experimental designs. Acknowledgements

The PhD thesis of the ﬁrst author is supported by ETH grant

References

Beck, J. and S. Au (2002). Bayesian updating of structural models and reliability usingMarkov chain Monte Carlo simulation.

Journal of Engineering Mechanics 128 (4), 380–391.Beck, J. L. and L. S. Katafygiotis (1998). Updating models and their uncertainties. I:Bayesian statistical framework.

Journal of Engineering Mechanics 124 (4), 455–461.Birolleau, A., G. Potte, and D. Lucor (2014). Adaptive Bayesian inference for discontin-uous inverse problems, application to hyperbolic conservation laws.

Communications inComputational Physics 16 (1), 1–34.Bishop, C. M. (2006).

Pattern recognition and machine learning . Information Science andStatistics. Springer.Blatman, G. and B. Sudret (2010). An adaptive algorithm to build up sparse polynomialchaos expansions for stochastic ﬁnite element analysis.

Probabilistic Engineering Mechan-ics 25 , 183–197.Blatman, G. and B. Sudret (2011). Adaptive sparse polynomial chaos expansion based onLeast Angle Regression.

Journal of Computational Physics 230 , 2345–2367.Brooks, S. P. and A. Gelman (1998). General methods for monitoring convergence of iterativesimulations.

Journal of Computational and Graphical Statistics 7 (4), 434–455.Chapelle, O., V. Vapnik, and Y. Bengio (2002). Model selection for small sample regression.

Journal of Machine Learning Research 48 (1), 9–23.Ching, J. and Y. Chen (2007). Transitional Markov chain Monte Carlo method for Bayesianmodel updating, model class selection, and model averaging.

Journal of EngineeringMechanics 133 (7), 816–832. ick, J., R. N. Gantner, Q. T. Le Gia, and C. Schwab (2017). Multilevel higher-orderquasi-Monte Carlo Bayesian estimation. Mathematical Models and Methods in AppliedSciences 27 (5), 953–995.El Moselhy, T. A. and Y. M. Marzouk (2012). Bayesian inference with optimal maps.

Journalof Computational Physics 231 (23), 7815–7850.Fajraoui, N., S. Marelli, and B. Sudret (2017). Sequential design of experiment for sparsepolynomial chaos expansions.

SIAM/ASA Journal on Uncertainty Quantiﬁcation 5 (1),1061–1085.Gantner, R. N. and M. D. Peters (2018). Higher-order quasi-Monte Carlo for Bayesian shapeinversion.

SIAM/ASA Journal on Uncertainty Quantiﬁcation 6 (2), 707–736.Gautschi, W. (2004).

Orthogonal polynomials: Computation and approximation . NumericalMathematics and Scientiﬁc Computation. Oxford University Press.Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin (2014).

Bayesian Data Analysis (3 ed.). Texts in Statistical Science. CRC Press.Gelman, A. and D. B. Rubin (1992). Inference from iterative simulation using multiplesequences.

Statistical Science 7 (4), 457–472.Ghanem, R. G. and P. Spanos (1991).

Stochastic ﬁnite elements – A spectral approach .Springer Verlag, New York. (Reedited by Dover Publications, Mineola, 2003).Goodman, J. and J. Weare (2010). Ensemble samplers with aﬃne invariance.

Communica-tions in Applied Mathematics and Computational Science 5 (1), 65–80.Haario, H., E. Saksman, and J. Tamminen (2001). An adaptive Metropolis algorithm.

Bernoulli 7 (2), 223–242.Harney, H. L. (2016).

Bayesian Inference: Data Evaluation and Decisions (2 ed.). SpringerInternational Publishing.Jakeman, J., M. S. Eldred, and K. Sargsyan (2015). Enhancing (cid:96) -minimization estimates ofpolynomial chaos expansions using basis selection. Journal of Computational Physics 289 ,18–34.Jaynes, E. T. (2003).

Probability theory: The logic of science . Cambridge University Press.Jeﬀreys, H. (1946). An invariant form for the prior probability in estimation problems.

Proceedings of the Royal Society of London A: Mathematical, Physical and EngineeringSciences 186 (1007), 453–461. ennedy, M. C. and A. O’Hagan (2001). Bayesian calibration of computer models. Journalof the Royal Statistical Society: Series B (Statistical Methodology) 63 (3), 425–464.Li, J. and Y. M. Marzouk (2014). Adaptive construction of surrogates for the Bayesiansolution of inverse problems.

SIAM Journal on Scientiﬁc Computing 36 (3), A1163–A1186.Lin, J. (1991). Divergence measures based on the Shannon entropy.

Transactions on Infor-mation Theory 37 (1), 145–151.L¨uthen, N., S. Marelli, and B. Sudret (2020). Sparse polynomial chaos expansions: Re-view and benchmark.

SIAM/ASA Journal on Uncertainty Quantiﬁcation . submitted,https://arxiv.org/abs/2002.01290.Marelli, S. and B. Sudret (2014). UQLab: A framework for uncertainty quantiﬁcation inMatlab. In

Vulnerability, Uncertainty, and Risk (Proceedings of the 2nd InternationalConference on Vulnerability, Risk Analysis and Management (ICVRAM2014), Liverpool,United Kingdom) , pp. 2554–2563.Marelli, S. and B. Sudret (2019). UQLab user manual – Polynomial chaos expansions. Techni-cal report, Chair of Risk, Safety and Uncertainty Quantiﬁcation, ETH Zurich, Switzerland.Report

Statistics and Computing 22 (6), 1167–1180.Marzouk, Y., T. Moselhy, M. Parno, and A. Spantini (2016). Sampling via measure trans-port: An introduction. In R. Ghanem, D. Higdon, and H. Owhadi (Eds.),

Handbook ofUncertainty Quantiﬁcation . Springer International Publishing.Marzouk, Y. M., H. N. Najm, and L. A. Rahn (2007). Stochastic spectral methods foreﬃcient Bayesian solution of inverse problems.

Journal of Computational Physics 224 ,560–586.Marzouk, Y. M. and D. Xiu (2009). A stochastic collocation approach to Bayesian inferencein inverse problems.

Communications in Computational Physics 6 (4), 826–847.Nagel, J. and B. Sudret (2016). Spectral likelihood expansions for Bayesian inference.

Journalof Computational Physics 309 , 267–294.Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. L.Jones, and X.-L. Meng (Eds.),

Handbook of Markov Chain Monte Carlo , Handbooks ofModern Statistical Methods, Chapter 5, pp. 113–162. Chapman & Hall/CRC. arno, M. D. (2015). Transport maps for accelerated Bayesian computation . PhD thesis,Massachusetts Institute of Technology (MIT).Robert, C. P. and G. Casella (2004).

Monte Carlo statistical methods (2 nd Ed.) . SpringerSeries in Statistics. Springer Verlag.Rosenblatt, M. (1952). Remarks on a multivariate transformation.

Annals of MathematicalStatistics 23 (3), 470–472.Shin, Y. and D. Xiu (2016). Nonadaptive quasi-optimal points selection for least squareslinear regression.

SIAM Journal on Scientiﬁc Computing 38 (1), A385–A411.Sisson, S. A., Y. Fan, and M. A. Beaumont (Eds.) (2018).

Handbook of approximate Bayesiancomputation . Chapman and Hall/CRC.Tarantola, A. (2005).

Inverse Problem Theory and Methods for Model Parameter Estimation .Society for Industrial and Applied Mathematics (SIAM).Tierney, L. and J. B. Kadane (1986). Accurate approximations for posterior moments andmarginal densities.

Journal of the American Statistical Association 81 (393), 82–86.Tierney, L., R. E. Kass, and J. B. Kadane (1989a). Approximate marginal densities ofnonlinear functions.

Biometrika 76 (3), 425–433.Tierney, L., R. E. Kass, and J. B. Kadane (1989b). Fully exponential Laplace approximationsto expectations and variances of nonpositive functions.

Journal of the American StatisticalAssociation 84 (407), 710–716.Wagner, P.-R., R. Fahrni, M. Klippel, A. Frangi, and B. Sudret (2020). Bayesian calibrationand sensitivity analysis of heat transfer models for ﬁre insulation panels.

EngineeringStructures 205 (15), 110063.Wand, M. and M. C. Jones (1995).

Kernel smoothing . Chapman and Hall, Boca Raton.Xiu, D. (2010).

Numerical methods for stochastic computations – A spectral method approach .Princeton University press.Xiu, D. and G. E. Karniadakis (2002). The Wiener-Askey polynomial chaos for stochasticdiﬀerential equations.

SIAM Journal on Scientiﬁc Computing 24 (2), 619–644.(2), 619–644.