A Generalized Probabilistic Learning Approach for Multi-Fidelity Uncertainty Propagation in Complex Physical Simulations
Jonas Nitzler, Jonas Biehler, Niklas Fehn, Phaedon-Stelios Koutsourelakis, Wolfgang A. Wall
J. Nitzler | J. Biehler | N. Fehn | P.-S. Koutsourelakis | W. A. Wall
Institute for Computational Mechanics, Technical University of Munich, Germany
Professorship of Continuum Mechanics, Technical University of Munich, Germany
Correspondence: W. A. Wall, Email: [email protected]
Present Address: Boltzmannstrasse 15, 85748 Garching b. München
Abstract
Two of the most significant challenges in uncertainty propagation pertain to the high computational cost for the simulation of complex physical models and the high dimension of the random inputs. In applications of practical interest, both of these problems are encountered and standard methods for uncertainty quantification either fail or are not feasible. To overcome the current limitations, we propose a probabilistic multi-fidelity framework that can exploit lower-fidelity model versions of the original problem in a small data regime. The approach circumvents the curse of dimensionality by learning dependencies between the outputs of high-fidelity models and lower-fidelity models instead of explicitly accounting for the high-dimensional inputs. We complement the information provided by a low-fidelity model with a low-dimensional set of informative features of the stochastic input, which are discovered by employing a combination of supervised and unsupervised dimensionality reduction techniques. The goal of our analysis is an efficient and accurate estimation of the full probabilistic response for a high-fidelity model. Despite the incomplete and noisy information that low-fidelity predictors provide, we demonstrate that accurate and certifiable estimates for the quantities of interest can be obtained in the small data regime, i.e., with significantly fewer high-fidelity model runs than state-of-the-art methods for uncertainty propagation. We illustrate our approach by applying it to challenging numerical examples such as Navier-Stokes flow simulations and monolithic fluid-structure interaction problems.
KEYWORDS:
Uncertainty Quantification; Probabilistic Learning; Multi-Fidelity Schemes; Bayesian Inference; Fluid-Structure Interaction; Discontinuous Galerkin Method; Inverse Problems; High-Dimensions
The analysis of complex real-world systems is usually based on sophisticated, high-fidelity (HF) computer models. Accuracy comes at the cost of computationally expensive simulations, characterized by detailed physical resolution, fine temporal and spatial discretizations, as well as narrow numerical tolerances. A single evaluation of such models, for example large-scale nonlinear and transient biomechanical problems or coupled fluid simulations, can take hours or days even on modern high-performance clusters. Nevertheless, many questions in industry and science require a vast number of accurate computer simulations to understand different system configurations and boundary conditions, perform optimization tasks, or investigate forward and backward
uncertainty propagation. Unfortunately, limitations in available resources render the aforementioned types of analysis infeasible, so that for most practical applications, analysts either avoid such fundamental investigations completely, or fall back to less accurate but cheaper low-fidelity (LF) variants of the original problem to conduct the analysis. One strategy to overcome these problems pertains to multi-fidelity schemes which, by combining information provided by different levels of model sophistication, attempt to decrease the number of high-fidelity model runs required while retaining the same accuracy. Especially sampling-based methods for uncertainty propagation, while often being the only choice for nonlinear problems with large variabilities, become infeasible for costly numerical models. Multi-level Monte Carlo methods (MLMC) were some of the earliest schemes used to accelerate the calculation of the expectation and variance of a quantity of interest (QoI) on complex models, given uncertain inputs 𝒙. Unfortunately, the method necessitates linear dependence between model outputs and can only yield asymptotic error estimates. The estimation of the whole response distribution is restricted to special cases. Other contributions used low-fidelity model versions to identify important regions in the input space, motivating adaptive sampling strategies and multi-fidelity importance sampling schemes, while still requiring costly sampling of the HF model. Similar ideas arose for inverse problems in the form of multi-stage Markov-chain Monte Carlo methods. Promising alternative approaches that recently gained considerable attention are the so-called Bayesian multi-fidelity schemes.
We already demonstrated their superior performance for large-scale numerical problems. Thus far, most state-of-the-art multi-fidelity methods aim to construct an approximation for the high-fidelity (HF) computer simulation 𝑦 HF ( 𝒙 ) in the form of a surrogate model ̂𝑓 ( 𝒙 ) ≈ 𝑦 HF ( 𝒙 ) that is built from a small number of HF simulations. Such multi-fidelity approaches exploit the characteristic that the discrepancy between HF and LF model responses has a simpler mathematical structure than the HF model response 𝑦 HF ( 𝒙 ) itself; hence, the error over 𝒙 can be efficiently learned to yield good HF response predictions from few data. Unfortunately, such approaches face serious problems for applications with high stochastic dimensionality, especially in a small data scenario, which refers to the small number of HF simulations available due to the associated high costs. This contribution expands upon our previous work in presenting a generalized multi-fidelity framework that is well suited for uncertainty propagation in very high stochastic dimensions. The advances proposed lead to a higher accuracy so that we can exploit a wider range of automatically generated low-fidelity versions of the original problem. We provide a general theoretic viewpoint on the topic of Bayesian uncertainty propagation while emphasizing the practical applicability of the proposed techniques towards a broad field of engineering problems. Bayesian multi-fidelity uncertainty propagation is a non-intrusive, data-driven approach that can be used with any numerical solver. In general, only one low-fidelity model is required, but the method can seamlessly integrate several sources of information to improve its performance. The principal result of the method is a full probabilistic description of the quantity of interest (QoI) in the form of an approximation of the high-fidelity probability density 𝑝 ( 𝑦 HF ).
Point estimates such as event probabilities, expectations or maximum likelihood estimates can then be calculated along with credible intervals due to the Bayesian nature of our approach. The paper is structured as follows: In Section 2, we present the theoretical foundation for Bayesian forward and backward uncertainty propagation, as well as the key idea for the multi-fidelity extension of both cases. We then discuss the challenges associated with a small data scenario and introduce a probabilistic learning strategy that leads to accurate predictions of the HF model's output density by exploiting the information provided by the output of a LF model as well as by informative features of the model's input. We demonstrate how such informative features of the model input can be learned based on a combination of supervised and unsupervised dimensionality reduction techniques, at negligible extra cost. The derivations are first formulated for any probabilistic regression model before they are specialized for Gaussian Processes. We demonstrate that our Bayesian multi-fidelity approach can provide credible intervals for the density estimate itself. As we aim at complex physical simulations, we additionally present strategies for the automated generation of efficient LF model versions that ultimately lead to significant computational speed-ups of our multi-fidelity approach, compared to other state-of-the-art methods for uncertainty quantification. The steps of the generalized Bayesian multi-fidelity approach for uncertainty quantification are summarized in a pseudo-algorithm in Section 2.4. Section 3 provides theoretical suggestions for an automated creation of low-fidelity model versions based on numerical relaxation. Afterwards, the framework is discussed and demonstrated for challenging fluid and fluid-structure interaction problems in Section 4.
Apart from the algorithmic aspects, we also focus on the modeling of physically compliant random boundary conditions, random fields, and their numerical realization. We conclude with a discussion of the numerical results and computational performance, and provide an outlook on possible future developments.

Uncertainty quantification (UQ) can be distinguished into forward and backward uncertainty propagation (the latter being known as inverse problems or model calibration problems). Forward UQ aims to propagate the uncertainty of a random input vector 𝒙 with a given density 𝑝 ( 𝒙 ) through a physics-based, high-fidelity, numerical model in order to accurately and efficiently quantify the uncertainty of one or more outputs or quantities of interest (QoIs) 𝒚, for example in the form of their density 𝑝 ( 𝒚 HF ). The random vector 𝒙 can represent uncertainties in model parameters, loads, excitations or boundary and initial conditions. For applications of practical interest, its dimension is very high (in the hundreds or thousands). On the other hand, in backward uncertainty propagation, given a similar mathematical model and, in general, noisy observations 𝑌 obs of the system's output 𝒚, the goal is to estimate a vector of model inputs 𝒙. In the following we denote by 𝑦 HF ( 𝒙 ) the deterministic input-output map implied by a high-fidelity model, which in most cases of practical interest is not available in closed form and is expensive to evaluate (e.g., for each value of 𝒙 the numerical solution of time-dependent, nonlinear PDEs needs to be carried out). We assume that the high-fidelity model is the reference model, i.e., its predictions 𝑦 HF coincide with the QoI. For clarity of the presentation we consider the scalar case, i.e., 𝑦 HF ∶ R^𝑑 → R, where 𝑑 = dim( 𝒙 ). Furthermore, to simplify the notation, we make no distinction between random variables and the values these can take.
In this notation, 𝒙 or 𝑦 HF denote the respective random variables and possible realizations, whereas 𝑦 HF ( 𝒙 ) refers to a deterministic function. Scalar quantities are expressed by plain letters (e.g., 𝑦 HF for a scalar, high-fidelity model output), in contrast to boldface letters (such as the input vector 𝒙 ), which denote vector-valued quantities. We denote with capital letters a data set that can either consist of scalar quantities or vector-valued quantities. Data sets of scalar quantities are written in plain capital letters, such as the vector of row-wise scalar experimental observations 𝑌 obs. On the other hand, vector-valued quantities, such as the matrix of row-wise vector-valued model inputs 𝑿, are written with boldface capital letters. The most important distinction is made between the training data (indicated by a capital letter without further superscripts, e.g., 𝑌 HF ) and test data, which carries an asterisk superscript (e.g., 𝒙 ∗ for one arbitrary test input or the large data set of all test inputs 𝑿 ∗ ). For forward UQ we seek the whole response density 𝑝 ( 𝑦 HF ), which can then be used to calculate any statistic of interest. The resulting output density for the QoI can be expressed as the integral over the conditional distribution 𝑝 ( 𝑦 HF | 𝒙 ) weighted by the density of the input 𝑝 ( 𝒙 ). In the special case of a deterministic function 𝑦 HF ( 𝒙 ), the conditional distribution can be expressed in form of a Dirac distribution 𝑝 ( 𝑦 HF | 𝒙 ) = 𝛿 ( 𝑦 HF − 𝑦 HF ( 𝒙 ) ):

$$
p(y_{\mathrm{HF}}) = \int_{\Omega_{\boldsymbol{x}}} p(y_{\mathrm{HF}}|\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x}
= \int_{\Omega_{\boldsymbol{x}}} \delta\big(y_{\mathrm{HF}} - y_{\mathrm{HF}}(\boldsymbol{x})\big)\, p(\boldsymbol{x})\, d\boldsymbol{x} \qquad (1)
$$

Equation (1) is usually approximated by Monte Carlo methods which, depending on the demanded level of accuracy and especially for the tails of 𝑝 ( 𝑦 HF ) (i.e., rare events), would require a huge number of evaluations of 𝑦 HF ( 𝒙 ). The overall computational cost can render such an approach impracticable or infeasible.
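To make the brute-force Monte Carlo approximation of equation (1) concrete, the following minimal sketch pushes input samples through a hypothetical scalar model; the map `y_hf` below is an illustrative stand-in, not one of the paper's PDE models:

```python
import random
import math

def y_hf(x):
    # hypothetical, cheap stand-in for the expensive map y_HF(x);
    # in the paper this would be a nonlinear, transient PDE solve
    return math.tanh(x[0]) + 0.5 * x[1] ** 2

random.seed(1)
d = 2  # dim(x); realistic applications have hundreds or thousands

# draw x ~ p(x) (standard normal here) and evaluate the model each time
samples = [y_hf([random.gauss(0.0, 1.0) for _ in range(d)])
           for _ in range(50_000)]

# the empirical distribution of the samples approximates p(y_HF);
# any statistic of interest can be read off it
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Every sample costs one full model solve, which is exactly why this direct approach becomes impracticable for expensive HF models and motivates the multi-fidelity reformulation below.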
Alternative strategies have attempted to approximate the map 𝑦 HF ( 𝒙 ) or the conditional density 𝑝 ( 𝑦 HF | 𝒙 ) using a variety of surrogates or emulators, which are generally trained on n train simulation data pairs, i.e., 𝒟 HF = { 𝒙 𝑖 , 𝑦 HF ( 𝒙 𝑖 ) } for 𝑖 = 1 , … , n train. Given the high dimension 𝑑 of 𝒙, this task gives rise to several accuracy and efficiency challenges. Even when the most expressive, modern machine learning tools are deployed (e.g., deep neural networks), the number n train of high-fidelity evaluations needed to achieve an acceptable level of accuracy can render such methods impracticable or infeasible as well. For inverse problems, the answer in the Bayesian framework is provided in the form of the so-called posterior probability density of 𝒙 given the observations, i.e., 𝑝 ( 𝒙 | 𝑌 obs ), which arises as a combination of the prior density 𝑝 ( 𝒙 ) and the likelihood:

$$
\underbrace{p(\boldsymbol{x}|Y_{\mathrm{obs}})}_{\text{Posterior}} \;\propto\; \underbrace{p\big(Y_{\mathrm{obs}}\,|\,y_{\mathrm{HF}}(\boldsymbol{x})\big)}_{\text{Likelihood}} \;\; \underbrace{p(\boldsymbol{x})}_{\text{Prior}} \qquad (2)
$$

From the posterior we can compute point estimates 𝒙 est that represent the vector of calibrated model parameters. The Bayesian interpretation of backward uncertainty propagation provides a natural regularization framework for ill-posed problems. The computationally expensive part is the likelihood 𝑝 ( 𝑌 obs | 𝑦 HF ( 𝒙 ) ), as for each evaluation the HF model must be solved. Usually, the posterior distribution 𝑝 ( 𝒙 | 𝑌 obs ) is approximated by advanced sampling methods such as Markov-chain Monte Carlo (MCMC) sampling or sequential Monte Carlo (SMC) methods. Nevertheless, even modern sampling techniques require an infeasible number of costly HF model evaluations. In this contribution, we focus on an efficient approach for forward uncertainty propagation but also highlight the analogies to Bayesian model calibration. An in-depth discussion of the latter case is, however, outside the scope of this paper.
The previous expressions (1) and (2) involve a computationally expensive high-fidelity computer model implied by 𝑦 HF ( 𝒙 ). In the following, we demonstrate how less expensive lower-fidelity models, in combination with low-dimensional features of 𝒙, can be employed to obtain accurate and certifiable estimates of the aforementioned quantities while requiring only very small numbers of high-fidelity runs. In the simplest version, we presuppose the availability of a lower-fidelity model which provides a potentially very poor approximation of the QoI. In the scalar case, we denote this by 𝑦 LF and the associated deterministic input-output map by 𝑦 LF ( 𝒙 ). In contrast to multi-level Monte Carlo techniques, which also make use of lower-fidelity models in combination with frequentist estimators, we advocate a Bayesian perspective, which we refer to as the Bayesian Multi-Fidelity Monte Carlo (BMFMC) method. The basis of the advocated framework is re-expressing the sought density as:

$$
\begin{aligned}
p(y_{\mathrm{HF}}) &= \int_{\Omega_{\boldsymbol{x}}} \underbrace{p(y_{\mathrm{HF}}|\boldsymbol{x})}_{\text{Dirac: comp. expensive}} \, p(\boldsymbol{x})\, d\boldsymbol{x}
&& \to \text{standard forward UQ, see equation (1)} \\
&= \int_{\Omega_{y_{\mathrm{LF}}}} \int_{\Omega_{\boldsymbol{x}}} p(y_{\mathrm{HF}}, y_{\mathrm{LF}}, \boldsymbol{x})\, d\boldsymbol{x}\, dy_{\mathrm{LF}}
&& \to \text{write as joint density, expanded by the LF response } y_{\mathrm{LF}} \\
&= \int_{\Omega_{y_{\mathrm{LF}}}} \int_{\Omega_{\boldsymbol{x}}} p(y_{\mathrm{HF}}, \boldsymbol{x}\,|\,y_{\mathrm{LF}})\, p(y_{\mathrm{LF}})\, d\boldsymbol{x}\, dy_{\mathrm{LF}}
&& \to \text{condition on the LF model output } y_{\mathrm{LF}} \\
&= \int_{\Omega_{y_{\mathrm{LF}}}} \underbrace{p(y_{\mathrm{HF}}|y_{\mathrm{LF}})}_{\substack{\text{approximate with} \\ \text{little HF data}}} \; \underbrace{p(y_{\mathrm{LF}})}_{\substack{\text{sampling} \\ \text{on LF}}} \, dy_{\mathrm{LF}}
&& \to \text{integrate over the inputs } \boldsymbol{x}
\end{aligned} \qquad (3)
$$

We note that none of the expressions above contains any errors or approximations. Furthermore, the crucial conditional density 𝑝 ( 𝑦 HF | 𝑦 LF ) that must be learned or estimated is independent of the dimension of the input vector 𝒙. The premise of BMFMC is that all the densities above can be estimated at a cost (as measured by the number of high-fidelity solves) which is much less than the alternatives.
In the case of 𝑝 ( 𝑦 LF ) this can be achieved as long as the lower-fidelity model is much cheaper than the high-fidelity reference. To assess the feasibility of this task for 𝑝 ( 𝑦 HF | 𝑦 LF ) and to better understand the role of this conditional density, we consider the following limiting cases:

extreme 1) The lower-fidelity model 𝑦 LF is independent of 𝑦 HF, i.e., 𝑝 ( 𝑦 HF | 𝑦 LF ) = 𝑝 ( 𝑦 HF ). While (3) remains valid, any attempt to estimate 𝑝 ( 𝑦 HF | 𝑦 LF ) will be comparable to a Monte Carlo estimator applied directly to 𝑦 HF. Hence, it is unlikely that any significant efficiency gains could be achieved.

extreme 2) The lower- and high-fidelity models are fully dependent, i.e., there is a function, say 𝑓, such that 𝑦 HF = 𝑓 ( 𝑦 LF ) and 𝑝 ( 𝑦 HF | 𝑦 LF ) = 𝛿 ( 𝑦 HF − 𝑓 ( 𝑦 LF ) ). Any efficiency gains, in this case, would depend on the probabilistic machine learning model postulated for learning the conditional.

Given a physically motivated 𝑦 LF ( 𝒙 ), one would expect the actual 𝑝 ( 𝑦 HF | 𝑦 LF ) to lie between these two extremes, as shown in Figure 1. Errors will only be introduced through the numerical approximation of the densities in equation (3). We therefore distinguish between the following sources of error:

error 1) The primary source of error in the method stems from the probabilistic model (e.g., Gaussian Processes or other probabilistic regression tools) selected to approximate the conditional distribution 𝑝 ( 𝑦 HF | 𝑦 LF ). If the family of approximating densities considered does not include the true one, a modeling error will be introduced. For a Gaussian Process model in Ω 𝑦 HF × Ω 𝑦 LF, such assumptions are, e.g., a normally distributed conditional 𝑝 ( 𝑦 HF | 𝑦 LF ) along with smoothness properties of the random process, imposed by the selected kernel.
error 2) The second error source pertains to the amount of available training data (i.e., pairs of lower- and high-fidelity runs). Even if the true 𝑝 ( 𝑦 HF | 𝑦 LF ) belongs to the selected model class, it is unlikely that it would be recovered exactly with limited data.

BMFMC brings two major advantages for UQ with computationally demanding computer models: Firstly, by exploiting information encoded in computationally cheaper, lower-fidelity (LF) versions of the original computer model, it can drastically reduce the number of costly HF model evaluations and enable forward and backward uncertainty propagation even for very expensive models. Secondly, by learning the statistical dependence between 𝑦 LF and 𝑦 HF, BMFMC circumvents the curse of dimensionality which arises as a result of high-dimensional model inputs 𝒙. As an explicit treatment of the high-dimensional input vector 𝒙 in surrogates is not expedient, especially for a low number of HF data, we instead make use of statistical dependencies between 𝑦 LF and 𝑦 HF.

Figure 1 provides a visualization of the main terms in equation (3). In particular, Figure 1(a) shows examples of response surfaces for a two-dimensional input 𝒙. The upper response surface represents a LF model and the lower one the corresponding HF model response. A red dot marks a function value for the same 𝒙 on both models. The corresponding Dirac density 𝑝 ( 𝑦 HF | 𝒙 ) is shown by a blue arrow, centered on the red dot in Figure 1(b). An indicative conditional density of 𝑦 HF given 𝑦 LF is also shown in Figure 1(b). The vertical red line shows the support for the corresponding conditional density 𝑝 ( 𝑦 HF | 𝑦 LF ), which encodes the knowledge about possible outcomes of 𝑦 HF when only 𝑦 LF is observed (without information on the specific 𝒙 that yielded 𝑦 LF ).

[FIGURE 1: Visualization of HF and LF model dependencies. Left: Example of LF and HF model outputs with two input variables 𝑥₁ and 𝑥₂. Right: Dependence between LF and HF output. The joint density 𝑝 ( 𝑦 LF , 𝑦 HF ) is color-coded. Conditional densities 𝑝 ( 𝑦 HF | 𝒙 ) and 𝑝 ( 𝑦 HF | 𝑦 LF ) are shown as slices of 𝑝 ( 𝑦 LF , 𝑦 HF ) in blue.]

In this contribution, we generalize expression (3) by considering, next to LF models, also informative features 𝜸 ( 𝒙 ) = [ 𝛾 𝑗 ( 𝒙 ) ] of the input 𝒙, with 𝑗 ∈ [1 , 𝑁 𝛾 ] and 𝑁 𝛾 being the number of informative features used in the multi-fidelity approach. We will further elaborate on this in Section 2.4. We denote the combined vector of informative features 𝜸 ( 𝒙 ) and LF responses 𝑦 LF ( 𝒙 ) as 𝒛 LF ( 𝒙 ) = [ 𝑦 LF ( 𝒙 ) , 𝜸 ( 𝒙 )]ᵀ. With 𝒛 LF, we jointly denote the corresponding random vector as well as the values it can take. The basic elements of BMFMC remain unaltered if one employs multiple low-fidelity features, summarized in 𝒛 LF, so that equation (3) becomes:

$$
p(y_{\mathrm{HF}}) = \int_{\Omega_{\boldsymbol{z}_{\mathrm{LF}}}} p(y_{\mathrm{HF}}|\boldsymbol{z}_{\mathrm{LF}})\, p(\boldsymbol{z}_{\mathrm{LF}})\, d\boldsymbol{z}_{\mathrm{LF}} \qquad (4)
$$

We demonstrate in the subsequent sections how the modeling error (i.e., error 1 above) can be reduced and superior estimates can be obtained by an appropriate selection of the input features. In order to show the generality and power of the proposed strategy, we conclude this section with a short outlook on the applicability of the Bayesian multi-fidelity approach for inverse uncertainty propagation, which we derive from
the standard formulation in equation (2):

$$
\begin{aligned}
p(\boldsymbol{x}|Y_{\mathrm{obs}}) &\propto \underbrace{p\big(Y_{\mathrm{obs}}\,|\,y_{\mathrm{HF}}(\boldsymbol{x})\big)}_{\text{Likelihood}} \cdot p(\boldsymbol{x})
&& \to \text{standard Bayesian inverse problem} \\
&= \int_{\Omega_{y_{\mathrm{HF}}}} p(Y_{\mathrm{obs}}|y_{\mathrm{HF}})\, p(y_{\mathrm{HF}}|\boldsymbol{x})\, dy_{\mathrm{HF}} \cdot p(\boldsymbol{x})
&& \to \text{expand the likelihood} \\
&\approx \underbrace{\int_{\Omega_{y_{\mathrm{HF}}}} p(Y_{\mathrm{obs}}|y_{\mathrm{HF}})\, p\big(y_{\mathrm{HF}}|\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x})\big)\, dy_{\mathrm{HF}}}_{p(Y_{\mathrm{obs}}|\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x}))} \cdot\; p(\boldsymbol{x})
&& \to \text{approximation: } p\big(y_{\mathrm{HF}}|\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x})\big) \approx p(y_{\mathrm{HF}}|\boldsymbol{x}) \\
&= \int_{\Omega_{y_{\mathrm{HF}}}} \int_{\Omega_{\boldsymbol{z}_{\mathrm{LF}}}} p(Y_{\mathrm{obs}}|y_{\mathrm{HF}})\, p(y_{\mathrm{HF}}|\boldsymbol{z}_{\mathrm{LF}})\, p(\boldsymbol{z}_{\mathrm{LF}}|\boldsymbol{x})\, d\boldsymbol{z}_{\mathrm{LF}}\, dy_{\mathrm{HF}} \cdot p(\boldsymbol{x}) =: p(\boldsymbol{x}|Y_{\mathrm{obs}})_{\mathrm{MF}}
&& \to \text{expand } p\big(y_{\mathrm{HF}}|\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x})\big)
\end{aligned} \qquad (5)
$$

In the equations above, we approximate the computationally demanding 𝑝 ( 𝑦 HF | 𝒙 ) by 𝑝 ( 𝑦 HF | 𝒛 LF ( 𝒙 ) ). In contrast to the multi-fidelity forward UQ formulation, the multi-fidelity inverse problem thus contains an approximation step. Since, in contrast to 𝒙, the vector 𝒛 LF ( 𝒙 ) is low-dimensional, 𝑝 ( 𝑦 HF | 𝒛 LF ( 𝒙 ) ) can be well approximated with probabilistic learning methods. It is important to point out that in the multi-fidelity scenario, the HF output 𝑦 HF (for each 𝒙 ) becomes an unobserved quantity which, in fully Bayesian fashion, is modeled as a random variable. In the last line of equation (5) we expand the term 𝑝 ( 𝑦 HF | 𝒛 LF ( 𝒙 ) ) to show the analogies to equation (4). The Bayesian multi-fidelity posterior 𝑝 ( 𝒙 | 𝑌 obs ) MF reflects not only the uncertainties in the data but also the uncertainty introduced by using 𝒛 LF to predict 𝑦 HF (instead of solving the HF model directly). In this contribution, the Bayesian multi-fidelity inverse problem serves only to illustrate the potential of our multi-fidelity framework. The approximation of the involved terms brings further technical challenges that are outside the scope of the current paper.
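The forward decomposition in equation (3) can be sketched on a toy problem. The two models below are hypothetical stand-ins, and a simple linear conditional-mean fit stands in for the probabilistic regression model developed in the next section; for brevity, only the mean of 𝑝 ( 𝑦 HF ) is recovered here, not the full density:

```python
import random
import math

random.seed(2)

def y_hf(x):
    # hypothetical expensive model, evaluated only a handful of times
    return math.sin(2 * x) + 0.05 * x ** 3

def y_lf(x):
    # hypothetical cheap, biased low-fidelity version of the same problem
    return math.sin(2 * x)

# cheap sampling on the LF model approximates p(y_LF)
xs = [random.uniform(-2.0, 2.0) for _ in range(20_000)]
ylf_all = [y_lf(x) for x in xs]

# only 10 HF solves provide (y_LF, y_HF) training pairs
train_x = [-2.0 + 4.0 * i / 9 for i in range(10)]
pairs = [(y_lf(x), y_hf(x)) for x in train_x]

# fit the conditional mean y_HF ~ a * y_LF + b (stand-in for a GP)
n = len(pairs)
mx = sum(p[0] for p in pairs) / n
my = sum(p[1] for p in pairs) / n
a = (sum((p[0] - mx) * (p[1] - my) for p in pairs)
     / sum((p[0] - mx) ** 2 for p in pairs))
b = my - a * mx

# push every cheap LF sample through the learned conditional mean
mean_hf = sum(a * v + b for v in ylf_all) / len(ylf_all)
```

Note that the dependence learned between 𝑦 LF and 𝑦 HF never touches the input dimension, which is the mechanism by which BMFMC sidesteps the curse of dimensionality.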
𝑝 ( 𝑦 HF | 𝒛 LF ) : Multi-Fidelity Forward UQ in the Small Data Case

In the preceding formulations, the densities involved were assumed to be known. In real applications, an analytic description of the density terms does not exist; only the small data set of HF model evaluations and the computationally cheaper, large data set of LF features 𝒛 LF are available. Classic density estimation techniques are inaccurate in small data scenarios. Hence, this section focuses on strategies to efficiently learn the multi-fidelity conditional distribution 𝑝 ( 𝑦 HF | 𝒛 LF ), which is the key element of the multi-fidelity approach. Apart from the obvious accuracy requirements, it is essential that the necessary amount of training data 𝒟 𝑓 = { 𝒁 LF , 𝑌 HF }, with 𝒁 LF = 𝒛 LF ( 𝑿 ) and 𝑌 HF = 𝑦 HF ( 𝑿 ), is minimized. As one deterministic regression function 𝑓 ∶ R^{𝑑 𝒛 LF} → R, with 𝑑 𝒛 LF = dim( 𝒛 LF ), would not describe the data set 𝒟 𝑓 adequately, due to the noisy relationship between 𝒛 LF and 𝑦 HF, we follow a probabilistic learning approach: we consider a distribution over possible regression functions 𝑓 that could describe the small data set 𝒟 𝑓 and obtain a density estimate for the HF output. In the Bayesian setting, the multi-fidelity distribution 𝑝 ( 𝑦 HF | 𝒛 LF ) is approximated in two steps: First, we define a prior distribution over possible regression functions, 𝑝 ( 𝑓 | 𝒛 LF ), denoted as the prior probabilistic model, which incorporates our beliefs (e.g., smoothness, dependency) about the relationship of 𝑦 HF and 𝒛 LF. Since 𝑓 is infinite-dimensional, we commit a slight abuse of notation in denoting by 𝑝 ( 𝑓 ) and 𝑝 ( 𝑓 | 𝒟 𝑓 ) the prior and posterior of 𝑓, respectively.
Please note that 𝑝 ( 𝑓 | 𝒟 𝑓 ) implies a random process, e.g., a Gaussian Process 𝒩 ( m 𝑓 ( 𝒛 LF ) , v 𝑓 ( 𝒛 LF )), while 𝑝 ( 𝑓 ∗ | 𝒛 ∗ LF , 𝒟 𝑓 ) describes the univariate conditional distribution for a certain test value 𝒛 ∗ LF, which is a normal distribution 𝒩 ( m 𝑓 ( 𝒛 ∗ LF ) , v 𝑓 ( 𝒛 ∗ LF )) in case a Gaussian Process was deployed as the probabilistic model. We write 𝑓 ∗ to denote the univariate random variable that emerges from the evaluation of the random process at a particular test input 𝒛 ∗ LF. We denote the value 𝑝 ( 𝑦 ∗ HF | 𝒛 ∗ LF , 𝑓 ∗ ), with 𝑝 ( 𝑓 ∗ | 𝒛 ∗ LF , 𝒟 𝑓 ), in contrast to the former expression 𝑝 ( 𝑦 HF | 𝒛 LF ):

$$
p(y_{\mathrm{HF}}|\boldsymbol{z}_{\mathrm{LF}}) \;\to\; p(y^{*}_{\mathrm{HF}}|\boldsymbol{z}^{*}_{\mathrm{LF}}, f^{*}, \mathcal{D}_f) \quad \text{with} \quad p(f^{*}|\boldsymbol{z}^{*}_{\mathrm{LF}}, \mathcal{D}_f) \qquad (6)
$$

The regression function 𝑓 is only needed to construct the probabilistic model and allows for a structured representation of our approach, but we are not interested in particular values of the function itself. In the subsequent predictions for the HF model's output density, we will hence calculate statistics over the variability in 𝑓 ∗ and eliminate the dependency on the latter. The starting point of the multi-fidelity UQ is a sampling procedure on the LF model, resulting in a large LF data set 𝒟 ∗ LF = { 𝒙 ∗ 𝑖 , 𝑦 LF ( 𝒙 ∗ 𝑖 ) } for 𝑖 = 1 , … , N sample, i.e., 𝒟 ∗ LF = { 𝑿 ∗ , 𝑌 ∗ LF }. The sample size N sample depends on the simulation model and the statistics of interest, but is usually at least in the hundreds or thousands. Based on 𝒟 ∗ LF and criteria that will be discussed in Section 2.4, we select a subset of optimal inputs 𝑿 ⊂ 𝑿 ∗ with size n train ≪ N sample and run simulations that yield the corresponding HF outputs 𝑌 HF = 𝑦 HF ( 𝑿 ). Following the procedure described in Section 2.4, we can additionally learn informative features 𝛾 𝑖 without the need for further simulation runs. The tuples of corresponding HF and LF outputs and informative features yield the training data set 𝒟 𝑓 = { 𝒁 LF , 𝑌 HF } = { 𝒛 LF ( 𝑿 ) , 𝑦 HF ( 𝑿 )}.
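The data-generation bookkeeping above can be sketched as follows. The two models are hypothetical stand-ins, and since the paper defers the actual subset-selection criteria to Section 2.4, a simple stand-in criterion (spreading the training points evenly over the observed LF output range) is used here purely for illustration:

```python
import random

random.seed(3)

def y_lf(x):
    # hypothetical cheap low-fidelity model
    return x[0] ** 2 + 0.5 * x[1]

def y_hf(x):
    # hypothetical expensive model, called only n_train times below
    return x[0] ** 2 + 0.5 * x[1] + 0.1 * x[0] * x[1]

# large LF sweep: D*_LF = {x*_i, y_LF(x*_i)}, i = 1..N_sample
N_sample = 5_000
X_star = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(N_sample)]
Y_lf_star = [y_lf(x) for x in X_star]

# stand-in selection criterion: n_train points spread evenly
# over the sorted LF outputs (the paper's criteria follow in Section 2.4)
n_train = 10
order = sorted(range(N_sample), key=lambda i: Y_lf_star[i])
picks = [order[k * (N_sample - 1) // (n_train - 1)] for k in range(n_train)]

# run the HF model only on the selected subset and assemble D_f
D_f = [(Y_lf_star[i], y_hf(X_star[i])) for i in picks]
```

The expensive model is touched exactly n_train times; everything else operates on the cheap LF sweep.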
The extended small data approximation of the multi-fidelity conditional distribution, 𝑝 ( 𝑦 ∗ HF , 𝑓 ∗ | 𝒛 ∗ LF , 𝒟 𝑓 ), can now be plugged into equation (3) to yield the Bayesian multi-fidelity forward uncertainty estimate for the limited data case:

$$
p(y^{*}_{\mathrm{HF}}|f^{*}, \mathcal{D}_f) = \int_{\Omega_{\boldsymbol{z}^{*}_{\mathrm{LF}}}} \underbrace{p(y^{*}_{\mathrm{HF}}|f^{*}, \boldsymbol{z}^{*}_{\mathrm{LF}}, \mathcal{D}_f)}_{\substack{\text{likelihood of} \\ \text{HF observations}}} \cdot \underbrace{p(\boldsymbol{z}^{*}_{\mathrm{LF}})}_{\substack{\text{marginal density:} \\ \text{direct MC on LF model}}} \, d\boldsymbol{z}^{*}_{\mathrm{LF}} \qquad (7)
$$

Among other statistics, we can now calculate the expectation and variance of equation (7) with respect to the random variable 𝑓 ∗. The expectation E 𝑓 ∗ [ 𝑝 ( 𝑦 ∗ HF | 𝑓 ∗ , 𝒟 𝑓 )] serves as an approximation for the HF distribution, E 𝑓 ∗ [ 𝑝 ( 𝑦 ∗ HF | 𝑓 ∗ , 𝒟 𝑓 )] ≈ 𝑝 ( 𝑦 HF ). Our confidence about this prediction can be expressed in the form of the variance with respect to the posterior model distribution at a test input, 𝑝 ( 𝑓 ∗ | 𝒛 ∗ LF , 𝒟 𝑓 ), which results in credible intervals on the density function E 𝑓 ∗ [ 𝑝 ( 𝑦 ∗ HF | 𝑓 ∗ , 𝒟 𝑓 )] itself:

$$
\mathbb{E}_{f^{*}}\big[p(y^{*}_{\mathrm{HF}}|f^{*}, \mathcal{D}_f)\big] = \int_{\Omega_{\boldsymbol{z}^{*}_{\mathrm{LF}}}} \underbrace{\int_{\Omega_{f^{*}}} p(y^{*}_{\mathrm{HF}}|f^{*}, \boldsymbol{z}^{*}_{\mathrm{LF}}, \mathcal{D}_f)\, p(f^{*}|\boldsymbol{z}^{*}_{\mathrm{LF}}, \mathcal{D}_f)\, df^{*}}_{\mathbb{E}_{f^{*}}[p(y^{*}_{\mathrm{HF}}|f^{*}, \boldsymbol{z}^{*}_{\mathrm{LF}}, \mathcal{D}_f)] \;\approx\; p(y_{\mathrm{HF}}|\boldsymbol{z}_{\mathrm{LF}})} \; p(\boldsymbol{z}^{*}_{\mathrm{LF}})\, d\boldsymbol{z}^{*}_{\mathrm{LF}} \qquad (8a)
$$

$$
\begin{aligned}
\mathbb{V}_{f^{*}}\big[p(y^{*}_{\mathrm{HF}}|f^{*}, \mathcal{D}_f)\big] &= \mathbb{E}_{f^{*}}\big[\big(p(y^{*}_{\mathrm{HF}}|f^{*}, \mathcal{D}_f)\big)^{2}\big] - \big(\mathbb{E}_{f^{*}}\big[p(y^{*}_{\mathrm{HF}}|f^{*}, \mathcal{D}_f)\big]\big)^{2} \\
&= \int_{\Omega_{\boldsymbol{z}^{*}_{\mathrm{LF}}}} \int_{\Omega_{\boldsymbol{z}^{*\prime}_{\mathrm{LF}}}} \mathbb{E}_{f^{*}}\big[p(y^{*}_{\mathrm{HF}}|f^{*}, \boldsymbol{z}^{*}_{\mathrm{LF}}, \mathcal{D}_f)\, p(y^{*}_{\mathrm{HF}}|f^{*}, \boldsymbol{z}^{*\prime}_{\mathrm{LF}}, \mathcal{D}_f)\big]\, p(\boldsymbol{z}^{*\prime}_{\mathrm{LF}})\, p(\boldsymbol{z}^{*}_{\mathrm{LF}})\, d\boldsymbol{z}^{*\prime}_{\mathrm{LF}}\, d\boldsymbol{z}^{*}_{\mathrm{LF}} \\
&\quad - \big(\mathbb{E}_{f^{*}}\big[p(y^{*}_{\mathrm{HF}}|f^{*}, \mathcal{D}_f)\big]\big)^{2}
\end{aligned} \qquad (8b)
$$

In this work, we advocate Gaussian Processes (GPs) as the probabilistic regression model in equation (6). GPs are a popular non-parametric Bayesian tool that is well-suited to small data settings.
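The squared-exponential prior covariance that serves as the GP kernel below can be computed in a few lines; the hyper-parameter values are illustrative placeholders (in the paper they are fitted by maximizing the marginal likelihood):

```python
import math

def sq_exp_kernel(z1, z2, sigma2=1.0, length=0.5):
    # squared-exponential covariance
    # k(z, z') = sigma^2 * exp(-||z - z'||^2 / (2 * l^2));
    # sigma2 and length are illustrative, not fitted values
    d2 = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return sigma2 * math.exp(-d2 / (2.0 * length ** 2))

# prior covariance matrix K over a few feature vectors z_LF
Z = [[0.0], [0.25], [1.0]]
K = [[sq_exp_kernel(a, b) for b in Z] for a in Z]
```

The kernel encodes the smoothness belief mentioned above: nearby feature vectors receive strongly correlated function values, distant ones nearly independent values.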
For the inference and learning tasks, we employed GPy. In all examples analyzed, we started with a prior Gaussian Process of the form:

$$
p(f) = \mathcal{GP}\big(\mathrm{m}(\boldsymbol{z}_{\mathrm{LF}}),\, \mathrm{k}(\boldsymbol{z}_{\mathrm{LF}}, \boldsymbol{z}_{\mathrm{LF}}')\big), \qquad (9a)
$$
$$
\mathrm{m}(\boldsymbol{z}_{\mathrm{LF}}) = 0, \qquad (9b)
$$
$$
\mathrm{k}(\boldsymbol{z}_{\mathrm{LF}}, \boldsymbol{z}_{\mathrm{LF}}') = \sigma^{2} \cdot \exp\left[-\frac{\lVert \boldsymbol{z}_{\mathrm{LF}} - \boldsymbol{z}_{\mathrm{LF}}' \rVert^{2}}{2\,\ell^{2}}\right], \qquad (9c)
$$

where m ( 𝒛 LF ) is the prior mean function and k ( 𝒛 LF , 𝒛 LF ′ ) is the prior covariance function, which we choose to be the squared exponential covariance function, with 𝓁 being the characteristic length scale and 𝜎² the signal variance. The prior mean function of the process is usually set to zero, or is set equal to 𝑦 LF if it is assumed that the LF output reflects 𝑦 HF in the mean. Following our notation, capital letters, e.g., Γ ∗ 𝑖, denote the vector of realizations of a random variable, e.g., 𝛾 𝑖, and bold capital letters denote matrices, such as a matrix holding column-wise realizations of vector-valued random variables, e.g., 𝒁 LF, or the covariance matrix 𝐊. The asterisk superscript indicates the large data set derived from the LF Monte Carlo simulation. Furthermore, we model a Gaussian likelihood of the data with noise variance 𝜎 n ²:

$$
p(Y_{\mathrm{HF}}|F, \boldsymbol{Z}_{\mathrm{LF}}, f) = \mathcal{N}\big(F,\, \sigma_{\mathrm{n}}^{2} I\big), \qquad (10)
$$
with 𝐹 being a particular realization of the GP, evaluated at the feature matrix 𝒁 LF, where each column corresponds to a training point. The posterior GP 𝑝 ( 𝑓 | 𝒟 𝑓 ) then follows as:

$$
p(f|\mathcal{D}_f) = \mathcal{GP}\big(\mathrm{m}_{f}(\boldsymbol{z}_{\mathrm{LF}}),\, \mathrm{k}_{f}(\boldsymbol{z}_{\mathrm{LF}}, \boldsymbol{z}_{\mathrm{LF}}')\big) \qquad (11a)
$$
$$
\mathrm{m}_{f}(\boldsymbol{z}_{\mathrm{LF}}) = \mathrm{m}(\boldsymbol{z}_{\mathrm{LF}}) + \mathbf{k}^{T}(\boldsymbol{z}_{\mathrm{LF}})\, \big(\mathbf{K} + \hat{\sigma}_{\mathrm{n}}^{2} I\big)^{-1} \big(Y_{\mathrm{HF}}^{T} - \mathrm{m}(\boldsymbol{Z}_{\mathrm{LF}})\big) \qquad (11b)
$$
$$
\mathrm{k}_{f}(\boldsymbol{z}_{\mathrm{LF}}, \boldsymbol{z}_{\mathrm{LF}}') = \mathrm{k}(\boldsymbol{z}_{\mathrm{LF}}, \boldsymbol{z}_{\mathrm{LF}}') - \mathrm{k}(\boldsymbol{z}_{\mathrm{LF}}, \boldsymbol{Z}_{\mathrm{LF}}) \big[\mathrm{k}(\boldsymbol{Z}_{\mathrm{LF}}, \boldsymbol{Z}_{\mathrm{LF}}) + \hat{\sigma}_{\mathrm{n}}^{2} I\big]^{-1} \mathrm{k}(\boldsymbol{Z}_{\mathrm{LF}}, \boldsymbol{z}_{\mathrm{LF}}') \qquad (11c)
$$

Here, 𝐊 = k ( 𝒁 LF , 𝒁 LF ) and 𝐤 = k ( 𝒛 LF , 𝒁 LF ) are used for compact notation. Point estimates of the hyper-parameters of the model, 𝜽 = { 𝓁 , 𝜎 , 𝜎 n }, are determined by maximizing the marginal likelihood and are denoted by ̂𝜽 in the sequel. In addition, we denote the posterior variance of the GP (i.e., the posterior covariance for 𝒛 LF = 𝒛 LF ′ ) by v 𝑓 ( 𝒛 LF ) = k 𝑓 ( 𝒛 LF , 𝒛 LF ). Given a test input 𝒛 ∗ LF, the predictive posterior of the value of the GP at this point, i.e., 𝑓 ∗ ( 𝒛 ∗ LF ), is given by a normal distribution:

$$
p(f^{*}|\boldsymbol{z}^{*}_{\mathrm{LF}}, \mathcal{D}_f) = \mathcal{N}\big(\mathrm{m}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF}}),\, \mathrm{k}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF}}, \boldsymbol{z}^{*}_{\mathrm{LF}})\big) = \mathcal{N}\big(\mathrm{m}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF}}),\, \mathrm{v}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF}})\big) \qquad (12)
$$

Furthermore, the Gaussian likelihood in equation (10) implies that the predictive distribution for the corresponding value of the HF model's output, denoted by 𝑦 ∗ HF, is:

$$
p(y^{*}_{\mathrm{HF}}|f^{*}, \boldsymbol{z}^{*}_{\mathrm{LF}}) = \mathcal{N}\big(f^{*},\, \hat{\sigma}_{\mathrm{n}}^{2}\big) \qquad (13)
$$

The involved densities are summarized in Table 1:

TABLE 1
Applied models for the densities in equations (7) to (8b):

- 𝑝 ( 𝑓 ∗ | 𝒛 ∗ LF ): 𝒩 ( m ( 𝒛 ∗ LF ) , v ( 𝒛 ∗ LF )), the prior GP evaluated at 𝒛 ∗ LF
- 𝑝 ( 𝑓 ∗ | 𝒛 ∗ LF , 𝒟 𝑓 ): 𝒩 ( m 𝑓 ( 𝒛 ∗ LF ) , v 𝑓 ( 𝒛 ∗ LF )), the posterior GP evaluated at 𝒛 ∗ LF
- 𝑝 ( 𝑦 ∗ HF | 𝑓 ∗ , 𝒛 ∗ LF , 𝒟 𝑓 ): 𝒩 ( 𝑓 ∗ , ̂𝜎 n ² ), the likelihood of the HF data
- 𝑝 ( 𝒛 LF ): only samples available, the LF feature distribution
- 𝑝 ( 𝑦 ∗ HF | 𝒛 ∗ LF , 𝒟 𝑓 ): the multi-fidelity conditional, = ∫ Ω 𝑓 ∗ 𝑝 ( 𝑦 ∗ HF | 𝑓 ∗ , 𝒛 ∗ LF , 𝒟 𝑓 ) ⋅ 𝑝 ( 𝑓 ∗ | 𝒛 ∗ LF , 𝒟 𝑓 ) 𝑑𝑓 ∗ = 𝒩 ( m 𝑓 ( 𝒛 ∗ LF ) , v 𝑓 ( 𝒛 ∗ LF ) + ̂𝜎 n ² )
- 𝑝 ( 𝑦 ∗ HF | 𝑓 ∗ , 𝒟 𝑓 ): the random process for the HF density, = ∫ Ω 𝒛 ∗ LF 𝑝 ( 𝑦 ∗ HF | 𝑓 ∗ , 𝒛 ∗ LF , 𝒟 𝑓 ) 𝑝 ( 𝒛 ∗ LF ) 𝑑 𝒛 ∗ LF
- 𝑝 ( 𝑦 ∗ HF | 𝒟 𝑓 ): the mean estimate for the HF density in the small data regime, = E 𝑓 ∗ [ 𝑝 ( 𝑦 ∗ HF | 𝑓 ∗ , 𝒟 𝑓 )], see equation (14)

Based on the previous results and given the posterior uncertainty of the GP, we can compute the expected value of the density E 𝑓 ∗ [ 𝑝 ( 𝑦 ∗ HF | 𝑓 ∗ , 𝒟 𝑓 )] in equation (8a) by averaging over the posterior of the GP as follows:

$$
\begin{aligned}
\mathbb{E}_{f^{*}}\big[p(y^{*}_{\mathrm{HF}}|f^{*}, \mathcal{D}_f)\big]
&= \int_{\Omega_{\boldsymbol{z}^{*}_{\mathrm{LF}}}} \int_{\Omega_{f^{*}}} \mathcal{N}\big(y^{*}_{\mathrm{HF}};\, f^{*},\, \hat{\sigma}_{\mathrm{n}}^{2}\big)\, \mathcal{N}\big(f^{*};\, \mathrm{m}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF}}),\, \mathrm{v}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF}})\big)\, df^{*}\; p(\boldsymbol{z}^{*}_{\mathrm{LF}})\, d\boldsymbol{z}^{*}_{\mathrm{LF}} \\
&= \int_{\Omega_{\boldsymbol{z}^{*}_{\mathrm{LF}}}} \mathcal{N}\big(y^{*}_{\mathrm{HF}};\, \mathrm{m}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF}}),\, \mathrm{v}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF}}) + \hat{\sigma}_{\mathrm{n}}^{2}\big)\, p(\boldsymbol{z}^{*}_{\mathrm{LF}})\, d\boldsymbol{z}^{*}_{\mathrm{LF}} \\
&\approx \frac{1}{N_{\mathrm{sample}}} \sum_{j=1}^{N_{\mathrm{sample}}} \mathcal{N}\big(y^{*}_{\mathrm{HF}};\, \mathrm{m}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF},j}),\, \mathrm{v}_{f}(\boldsymbol{z}^{*}_{\mathrm{LF},j}) + \hat{\sigma}_{\mathrm{n}}^{2}\big)
\end{aligned} \qquad (14)
$$

A Monte Carlo integration over 𝒛 ∗ LF is used in the last step of equation (14). We note that this integral, respectively its Monte Carlo approximation, depends only on the inexpensive LF samples 𝒁 ∗ LF = [ 𝒛 ∗ LF 𝑖 ] with 𝑖 = 1 , … , N sample, and that no additional HF runs are needed for the posterior mean m 𝑓 ( 𝒛 ∗ LF ) and variance v 𝑓 ( 𝒛 ∗ LF ) of the GP. Similarly, the (posterior) variance of the sought density of 𝑦 HF can be computed from (8b) by substituting the density approximations in Table 1 and considering the posterior uncertainty of the GP.
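Equations (11b), (11c) and the Monte Carlo mixture of equation (14) can be sketched in plain Python (in practice GPy handles this; the training data and kernel hyper-parameters below are illustrative, not fitted by marginal likelihood):

```python
import math

def k(a, b, sig2=1.0, ell=1.0):
    # squared-exponential kernel of equation (9c); sig2, ell illustrative
    return sig2 * math.exp(-(a - b) ** 2 / (2 * ell ** 2))

def solve(A, b):
    # small dense linear solve (Gaussian elimination, partial pivoting)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

noise2 = 1e-4  # stand-in for the fitted noise variance sigma_n^2

# tiny toy training set D_f = {Z_LF, Y_HF}
Z = [-1.0, -0.3, 0.4, 1.0]
Y = [math.sin(2 * z) for z in Z]

Kn = [[k(zi, zj) + (noise2 if i == j else 0.0) for j, zj in enumerate(Z)]
      for i, zi in enumerate(Z)]
alpha = solve(Kn, Y)  # (K + sigma_n^2 I)^{-1} Y_HF, zero prior mean

def posterior(z):
    # m_f and v_f from equations (11b) / (11c)
    kv = [k(z, zi) for zi in Z]
    m = sum(kv[i] * alpha[i] for i in range(len(Z)))
    w = solve(Kn, kv)
    v = k(z, z) - sum(kv[i] * w[i] for i in range(len(Z)))
    return m, v

# equation (14): average predictive normals over many cheap LF samples
z_samples = [-1.5 + 3.0 * j / 199 for j in range(200)]  # stand-in for Z*_LF
post = [posterior(z) for z in z_samples]

def p_hf(y):
    tot = 0.0
    for m, v in post:
        s2 = v + noise2
        tot += math.exp(-(y - m) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return tot / len(post)
```

The resulting `p_hf` is a mixture of normals over the LF sample set and integrates to one, as a density estimate must; no HF runs beyond the four training pairs are involved.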
Here, we calculate the variance with respect to the GP realizations $f^{*}$ at $\boldsymbol{z}^{*}_{\mathrm{LF}}$ and use the arithmetic of Gaussian distributions to find a semi-analytic formulation for the variance expression, up to the integrations over $\boldsymbol{z}^{*}_{\mathrm{LF}}$ and $\boldsymbol{z}^{*\prime}_{\mathrm{LF}}$, respectively. Again, the outer integrals over $\Omega_{\boldsymbol{z}^{*}_{\mathrm{LF}}}$ and $\Omega_{\boldsymbol{z}^{*\prime}_{\mathrm{LF}}}$ have to be solved via Monte Carlo integration due to the non-Gaussian distributions $p(\boldsymbol{z}^{*}_{\mathrm{LF}})$ and $p(\boldsymbol{z}^{*\prime}_{\mathrm{LF}})$. The subtrahend $\big(\mathbb{E}_{f^{*}}[p(y^{*}_{\mathrm{HF}} \,|\, f^{*}, \mathcal{D}_f)]\big)^{2}$ can be reused from the previous computation in equation (14). For the subsequent derivation we define the vectors $\boldsymbol{y}^{*}_{\mathrm{HF}} = [y^{*}_{\mathrm{HF}}, y^{*}_{\mathrm{HF}}]^{T}$ and $\boldsymbol{f}^{*} = [f^{*}, f^{*\prime}]^{T}$ to denote the support of the multivariate normal distributions that arise from the product of two univariate normal densities in the expectation $\mathbb{E}_{f^{*}}\big[\mathcal{N}_{y^{*}_{\mathrm{HF}}}(f^{*}(\boldsymbol{z}^{*}_{\mathrm{LF}}), \hat{\sigma}_{\mathrm{n}}^{2}) \cdot \mathcal{N}_{y^{*}_{\mathrm{HF}}}(f^{*}(\boldsymbol{z}^{*\prime}_{\mathrm{LF}}), \hat{\sigma}_{\mathrm{n}}^{2})\big]$:
$$\begin{aligned}
\mathbb{V}_{f^{*}}\big[p(y^{*}_{\mathrm{HF}} \,|\, f^{*}, \mathcal{D}_f)\big] &= \int_{\Omega_{\boldsymbol{z}^{*}_{\mathrm{LF}}}} \int_{\Omega_{\boldsymbol{z}^{*\prime}_{\mathrm{LF}}}} \mathbb{E}_{f^{*}}\big[\mathcal{N}_{y^{*}_{\mathrm{HF}}}\big(f^{*}(\boldsymbol{z}^{*}_{\mathrm{LF}}), \hat{\sigma}_{\mathrm{n}}^{2}\big) \cdot \mathcal{N}_{y^{*}_{\mathrm{HF}}}\big(f^{*}(\boldsymbol{z}^{*\prime}_{\mathrm{LF}}), \hat{\sigma}_{\mathrm{n}}^{2}\big)\big]\, p(\boldsymbol{z}^{*}_{\mathrm{LF}})\, p(\boldsymbol{z}^{*\prime}_{\mathrm{LF}})\, d\boldsymbol{z}^{*\prime}_{\mathrm{LF}}\, d\boldsymbol{z}^{*}_{\mathrm{LF}} - \big(\mathbb{E}_{f^{*}}[p(y^{*}_{\mathrm{HF}} \,|\, f^{*}, \mathcal{D}_f)]\big)^{2} \\
&= \int_{\Omega_{\boldsymbol{z}^{*}_{\mathrm{LF}}}} \int_{\Omega_{\boldsymbol{z}^{*\prime}_{\mathrm{LF}}}} \int_{\Omega_{\boldsymbol{f}^{*}}} \mathcal{N}_{\boldsymbol{y}^{*}_{\mathrm{HF}}}\!\left(\begin{bmatrix} f^{*}(\boldsymbol{z}^{*}_{\mathrm{LF}}) \\ f^{*\prime}(\boldsymbol{z}^{*\prime}_{\mathrm{LF}}) \end{bmatrix}, \begin{bmatrix} \hat{\sigma}_{\mathrm{n}}^{2} & 0 \\ 0 & \hat{\sigma}_{\mathrm{n}}^{2} \end{bmatrix}\right) \mathcal{N}_{\boldsymbol{f}^{*}}\!\left(\begin{bmatrix} \mathrm{m}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF}}) \\ \mathrm{m}_{\mathcal{D}_f}(\boldsymbol{z}^{*\prime}_{\mathrm{LF}}) \end{bmatrix}, \begin{bmatrix} \mathrm{v}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF}}) & \mathrm{k}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF}}, \boldsymbol{z}^{*\prime}_{\mathrm{LF}}) \\ \mathrm{k}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF}}, \boldsymbol{z}^{*\prime}_{\mathrm{LF}}) & \mathrm{v}_{\mathcal{D}_f}(\boldsymbol{z}^{*\prime}_{\mathrm{LF}}) \end{bmatrix}\right) d\boldsymbol{f}^{*}\; p(\boldsymbol{z}^{*}_{\mathrm{LF}})\, p(\boldsymbol{z}^{*\prime}_{\mathrm{LF}})\, d\boldsymbol{z}^{*\prime}_{\mathrm{LF}}\, d\boldsymbol{z}^{*}_{\mathrm{LF}} - \big(\mathbb{E}_{f^{*}}[\,\cdot\,]\big)^{2} \\
&= \int_{\Omega_{\boldsymbol{z}^{*}_{\mathrm{LF}}}} \int_{\Omega_{\boldsymbol{z}^{*\prime}_{\mathrm{LF}}}} \mathcal{N}_{\boldsymbol{y}^{*}_{\mathrm{HF}}}\!\left(\begin{bmatrix} \mathrm{m}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF}}) \\ \mathrm{m}_{\mathcal{D}_f}(\boldsymbol{z}^{*\prime}_{\mathrm{LF}}) \end{bmatrix}, \begin{bmatrix} \mathrm{v}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF}}) + \hat{\sigma}_{\mathrm{n}}^{2} & \mathrm{k}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF}}, \boldsymbol{z}^{*\prime}_{\mathrm{LF}}) \\ \mathrm{k}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF}}, \boldsymbol{z}^{*\prime}_{\mathrm{LF}}) & \mathrm{v}_{\mathcal{D}_f}(\boldsymbol{z}^{*\prime}_{\mathrm{LF}}) + \hat{\sigma}_{\mathrm{n}}^{2} \end{bmatrix}\right) p(\boldsymbol{z}^{*}_{\mathrm{LF}})\, p(\boldsymbol{z}^{*\prime}_{\mathrm{LF}})\, d\boldsymbol{z}^{*\prime}_{\mathrm{LF}}\, d\boldsymbol{z}^{*}_{\mathrm{LF}} - \big(\mathbb{E}_{f^{*}}[\,\cdot\,]\big)^{2} \\
&\approx \frac{1}{N_{\mathrm{sample}}^{2}} \sum_{i=1}^{N_{\mathrm{sample}}} \sum_{j=1}^{N_{\mathrm{sample}}} \mathcal{N}_{\boldsymbol{y}^{*}_{\mathrm{HF}}}\!\left(\begin{bmatrix} \mathrm{m}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF},i}) \\ \mathrm{m}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF},j}) \end{bmatrix}, \begin{bmatrix} \mathrm{v}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF},i}) + \hat{\sigma}_{\mathrm{n}}^{2} & \mathrm{k}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF},i}, \boldsymbol{z}^{*}_{\mathrm{LF},j}) \\ \mathrm{k}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF},i}, \boldsymbol{z}^{*}_{\mathrm{LF},j}) & \mathrm{v}_{\mathcal{D}_f}(\boldsymbol{z}^{*}_{\mathrm{LF},j}) + \hat{\sigma}_{\mathrm{n}}^{2} \end{bmatrix}\right) - \big(\mathbb{E}_{f^{*}}[p(y^{*}_{\mathrm{HF}} \,|\, f^{*}, \mathcal{D}_f)]\big)^{2} \quad (15)
\end{aligned}$$
In the following section, we discuss the details of the composition of $\boldsymbol{z}_{\mathrm{LF}}$ and additionally provide pseudo-algorithms that summarize all important steps of the Bayesian multi-fidelity approach.

Informative Features $\boldsymbol{\gamma}(\boldsymbol{x})$ and Optimal Training Set $\mathcal{D}_f$

The vector of lower-fidelity features $\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x})$ contains, apart from the low-fidelity model output $y_{\mathrm{LF}}(\boldsymbol{x})$, a few informative feature functions summarized in $\boldsymbol{\gamma}(\boldsymbol{x})$, so that $\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x}) = [y_{\mathrm{LF}}(\boldsymbol{x}), \boldsymbol{\gamma}(\boldsymbol{x})]^{T}$. This section is devoted to the definition and computation of
informative features $\boldsymbol{\gamma}(\boldsymbol{x})$. Former versions of Bayesian multi-fidelity Monte Carlo have employed $\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x}) = y_{\mathrm{LF}}(\boldsymbol{x})$, so that the model inputs $\boldsymbol{x}$ are entirely filtered through the LF model. In the general case, the exclusive use of $\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x}) = y_{\mathrm{LF}}(\boldsymbol{x})$ leads to conditional densities that can still be challenging to approximate sufficiently well with probabilistic machine learning models in a small data scenario, contributing to significant model errors (error 1 in Section 2.1) and inaccurate results. We note that employing the whole input vector $\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x}) \equiv \boldsymbol{x}$, as, e.g., in Perdikaris et al. for nonlinear autoregressive multi-fidelity GP regression (NARGP), is only applicable to low-dimensional $\boldsymbol{x}$. For dynamic problems, Lee et al. showed that it is advantageous to incorporate time $t$ and time derivatives, in the form of time shifts $y_{\mathrm{LF}}(\boldsymbol{x}, t), y_{\mathrm{LF}}(\boldsymbol{x}, t + \Delta t)$ of the LF simulation outputs, as further features. This idea can be integrated directly in our approach, without the necessity to treat $\boldsymbol{x}$ explicitly, by choosing, e.g., $\boldsymbol{z}_{\mathrm{LF}}(\boldsymbol{x}, t) = [y_{\mathrm{LF}}(\boldsymbol{x}, t), t, y_{\mathrm{LF}}(\boldsymbol{x}, t + \Delta t)]^{T}$. The discovery of appropriate $\boldsymbol{\gamma}(\boldsymbol{x})$ that complement the low-fidelity model outputs $y_{\mathrm{LF}}(\boldsymbol{x})$ can be formulated as an unsupervised learning task, for which a wealth of linear and nonlinear dimensionality reduction techniques have been proposed in the past. We emphasize, though, that we do not necessarily seek lower-dimensional representations of $\boldsymbol{x}$, but rather those features of $\boldsymbol{x}$ that are predictive of $y_{\mathrm{HF}}$. Even in cases where $\boldsymbol{x}$ is amenable to (non)linear dimensionality reduction, only a subset of the reduced coordinates may affect the output $y_{\mathrm{HF}}$. Additionally, as $p(y_{\mathrm{HF}} \,|\, y_{\mathrm{LF}})$ is already conditioned on the LF output $y_{\mathrm{LF}}$, it does not make sense to select $\gamma_{i}(\boldsymbol{x})$ that furnish information already contained in $y_{\mathrm{LF}}(\boldsymbol{x})$, so that a naive supervised learning approach would fail.
More specifically, in the context of the BMFMC approach advocated in this paper, our goal is to identify a few $\gamma_{i}(\boldsymbol{x})$ that, in conjunction with $y_{\mathrm{LF}}(\boldsymbol{x})$, reduce the model error source error 1 in Section 2.1 as much as possible. In the following, we do not change the overall probabilistic modeling strategy, in our case Gaussian processes, but rather aim to find a better representation for the multi-fidelity conditional distribution $p(y_{\mathrm{HF}} \,|\, \boldsymbol{z}_{\mathrm{LF}})$ by modifying $\boldsymbol{z}_{\mathrm{LF}}$ through $\gamma_{i}$. As pointed out for error 1, the assumption of a specific conditional distribution is not fulfilled in the general case and hence introduces an error. Additionally, in case of a large variance $\mathbb{V}_{y_{\mathrm{HF}}}[p(y_{\mathrm{HF}} \,|\, \boldsymbol{z}_{\mathrm{LF}})]$ of the random process for $y_{\mathrm{HF}}$, an accurate estimation of this variance in the form of the noise level $\hat{\sigma}_{\mathrm{n}}^{2}$ becomes more challenging. Because feature selection based on $p(y_{\mathrm{HF}} \,|\, y_{\mathrm{LF}})$ is hampered by the low number of HF data points $\{\boldsymbol{X}, Y_{\mathrm{HF}}\}$, we instead choose $\gamma_{i}(\boldsymbol{x})$ that explain most of the LF model's output variance $\mathbb{V}_{y_{\mathrm{LF}}}[p(y_{\mathrm{LF}})]$ and argue that such $\gamma_{i}(\boldsymbol{x})$ are also important features for the model dependency $p(y_{\mathrm{HF}} \,|\, \boldsymbol{z}_{\mathrm{LF}})$, i.e., their explicit treatment in $p(y_{\mathrm{HF}} \,|\, \boldsymbol{z}_{\mathrm{LF}})$ reduces the variance of this expression the most. By explicit treatment we mean the incorporation of $\gamma_{i}$ in $\boldsymbol{z}_{\mathrm{LF}}$, so that the Gaussian process has $\gamma_{i}$ as an additional input variable, rather than implicitly accounting for the effect of $\gamma_{i}$ through its contribution to the characteristics of the density $p(y_{\mathrm{HF}} \,|\, \boldsymbol{z}_{\mathrm{LF}})$, on which we impose a Gaussian noise assumption. On the other hand, if too many informative features $\gamma_{i}$ are added to $p(y_{\mathrm{HF}} \,|\, \boldsymbol{z}_{\mathrm{LF}})$, the resulting space $\Omega_{\boldsymbol{z}_{\mathrm{LF}}}$ becomes too large to be covered sufficiently by $\mathcal{D}_f$, so that error 2 increases due to growing epistemic uncertainty. The effect is schematically depicted in Figure 2.
For a larger amount of training data, more informative features $\gamma_{i}$ can be added to further decrease the approximation error. We start by demanding that $\gamma_{i}(\boldsymbol{x})$ be selected from a lower-dimensional representation $\hat{\boldsymbol{x}}$ of the original input $\boldsymbol{x}$. To this end, the model input $\boldsymbol{x}$ is separated into uncorrelated and correlated dimensions, $\boldsymbol{x} = [\boldsymbol{x}_{\mathrm{uncorr}}, \boldsymbol{x}_{\mathrm{corr},i}]^{T}$, with $i \in \mathbb{N}: [1, n_{\mathrm{corr}}]$. In this notation $\boldsymbol{x}_{\mathrm{corr},i}$ represents the correlated inputs in the form of correlated random variables or random fields. A reduced representation $\hat{\boldsymbol{x}}$ can be achieved by applying unsupervised dimensionality reduction techniques, in our case a truncated Karhunen-Loève expansion (KLE), to $\boldsymbol{x}_{\mathrm{corr},i}$. We emphasize that the dimensionality reduction discussed in this section was not used to generate the realizations of the random fields deployed in the actual simulations; it is a post-processing step used only in the probabilistic learning approach. The starting point is an eigenproblem for the covariance matrix $\mathbf{K}^{*}$ of the random field, defined by the evaluation of its covariance function on the underlying discretization:
$$\mathbf{K}^{*} \boldsymbol{v}_{j,i} = \lambda_{j,i} \boldsymbol{v}_{j,i} \quad (16)$$
The eigenvectors $\boldsymbol{v}_{j}$ define a complete basis in which we can represent $\boldsymbol{x}_{\mathrm{corr},i}$ as a linear combination of the $\boldsymbol{v}_{j}$, so that the coefficients of the expansion follow as:
$$\boldsymbol{c}_{i} \approx \boldsymbol{V}_{\mathrm{trunc},i}^{T} \cdot \big(\boldsymbol{x}_{\mathrm{corr},i} - \boldsymbol{m}_{i}\big), \quad \text{with } i \in \mathbb{N}: [1, n_{\mathrm{corr}}] \quad (17)$$
Here, $\boldsymbol{m}_{i}$ is the mean vector of the $i$-th discretized random field and $\boldsymbol{c}_{i}$ a vector of coefficients for the truncated basis $\boldsymbol{V}_{\mathrm{trunc},i}$. We truncate the series expansion in equation (17) when 95% of the explained variance is reached. The explained variance of the discretized field is defined as:
$$\text{explained variance} := \sum_{j=1}^{n_{\mathrm{trunc}}} \lambda_{j} \Big/ \sum_{j=1}^{d_{\mathrm{corr}}} \lambda_{j}, \quad \text{with } d_{\mathrm{corr}} = \dim(\boldsymbol{x}_{\mathrm{corr},i}) \quad (18)$$

FIGURE 2
Schematic illustration of the error behavior in $p(y_{\mathrm{HF}} \,|\, \boldsymbol{z}_{\mathrm{LF}})$ for an increasing number of features $\gamma_{i}$. The incorporation of features at first decreases the modeling error (error 1), before a larger $\Omega_{\boldsymbol{z}_{\mathrm{LF}}}$ again leads to an increase (error 2). For a larger training data size $n_{\mathrm{train}}$ the error is in general lower and the minimum error (indicated by arrows) lies at a higher number of features. See Figure 10 for a numerical demonstration.

Afterwards, we propose to use the vector of KLE coefficients $\boldsymbol{c}_{i}$ as a low-dimensional feature vector for $\boldsymbol{x}_{\mathrm{corr},i}$. The reduced input vector then follows as:
$$\hat{\boldsymbol{x}} := [\boldsymbol{x}_{\mathrm{uncorr}}, \boldsymbol{c}_{i}]^{T}, \quad \text{with } i \in \mathbb{N}: [1, n_{\mathrm{trunc},i}] \quad (19)$$
Standardization, written in the form of a standardization operator, refers to the individual scaling of the dimensions in $\hat{\boldsymbol{x}}$, so that their underlying input density has zero mean and a standard deviation of one. In a next step, we define a correlation measure $\mathbf{r}$ between the individual dimensions of the reduced input vector $\hat{\boldsymbol{x}}$ and the LF simulation output $y_{\mathrm{LF}}$, using the projection of the corresponding reduced input matrix $\hat{\boldsymbol{X}}^{*}$ on the LF output vector $Y^{*}_{\mathrm{LF}}$:
$$\mathbf{r} = \big| \hat{\boldsymbol{X}}^{*T} \cdot Y^{*}_{\mathrm{LF}} \big| \quad (20)$$
Definition (20) can be understood as the absolute value of the scaled Pearson correlation coefficient, calculated for each dimension of $\hat{\boldsymbol{x}}$ and the LF output $y_{\mathrm{LF}}$. Input dimensions $\hat{x}_{j}$ with high values of $r_{j}$ are informative about $y_{\mathrm{LF}}$, and approximately also about $y_{\mathrm{HF}}$, leading to an efficient reduction of $\mathbb{V}_{y_{\mathrm{HF}}}[p(y_{\mathrm{HF}} \,|\, y_{\mathrm{LF}}, \gamma_{i})]$. We select the $\hat{x}_{i}$ that correspond to the $n_{\boldsymbol{\gamma}}$ highest entries of $\boldsymbol{r}$ as informative features $\gamma_{i}$ of the input. We found that in the small data regime two additional LF features, $\gamma_{1}$ and $\gamma_{2}$, gave the best results.
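The feature discovery chain described above (truncated KLE of the correlated inputs, standardization, correlation scores and selection of the highest-scoring dimensions) can be sketched compactly. The 95% threshold follows the text; the data layout, function name and sample-covariance estimate of $\mathbf{K}^{*}$ are our assumptions:

```python
import numpy as np

def discover_features(X_corr, X_uncorr, y_lf, n_gamma=2, var_frac=0.95):
    """Sketch: truncated KLE (eqs. 16-18), reduced standardized input
    (eq. 19), correlation measure (eq. 20), top-n_gamma feature selection."""
    # Truncated KLE: eigendecomposition of the sample covariance of the field
    Xc = X_corr - X_corr.mean(axis=0)
    lam, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    lam, V = lam[::-1], V[:, ::-1]                       # descending eigenvalues
    n_trunc = np.searchsorted(np.cumsum(lam) / lam.sum(), var_frac) + 1
    C = Xc @ V[:, :n_trunc]                              # KLE coefficients c_i
    # Reduced, standardized input \hat{x} = [x_uncorr, c]
    X_hat = np.hstack([X_uncorr, C])
    X_hat = (X_hat - X_hat.mean(0)) / X_hat.std(0)
    y_std = (y_lf - y_lf.mean()) / y_lf.std()
    # Correlation measure r = |X_hat^T y_LF| (eq. 20), pick the top entries
    r = np.abs(X_hat.T @ y_std) / len(y_std)
    top = np.argsort(r)[::-1][:n_gamma]
    return X_hat[:, top], r
```

Because both sides are standardized, each entry of `r` is the absolute Pearson correlation of one reduced input dimension with the LF output, matching the interpretation given after equation (20).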
For a higher number of LF features, $\Omega_{\boldsymbol{z}_{\mathrm{LF}}}$ becomes too large for the small dataset $\mathcal{D}_f$, so that the approximation error increases again. For an optimal selection of training points $\mathcal{D}_f$, we propose the following procedure: Given the large LF dataset $\mathcal{D}^{*}_{\mathrm{LF}} := \{\boldsymbol{X}^{*}, Y^{*}_{\mathrm{LF}}\}$, we select a subset of training inputs $\boldsymbol{X} \subset \boldsymbol{X}^{*}$ for which we run HF simulations, according to $Y_{\mathrm{HF}} = y_{\mathrm{HF}}(\boldsymbol{X})$. The training data set is then defined as $\mathcal{D}_f = \{Y_{\mathrm{HF}}, Y_{\mathrm{LF}}, \boldsymbol{Z}_{\mathrm{LF}}\}$. What matters in the following is the definition of the aforementioned subset $\boldsymbol{X}$, i.e., which $\boldsymbol{x}_{i}$ we should select to compute the training data. Usually, the initial training data selection for surrogate models of computer experiments aims for a space-filling design in the input space, e.g., a Latin hypercube design, to explore the input space efficiently. We also opt for a space-filling training data set $\mathcal{D}_f$, but pay closer attention to the meaning of the term space-filling in this context, as we do not intend to learn a classical deterministic surrogate model but rather the conditional distribution $p(y_{\mathrm{HF}} \,|\, \boldsymbol{z}_{\mathrm{LF}})$ with low error. The noisy relationship between $y_{\mathrm{HF}}$ and $\boldsymbol{z}_{\mathrm{LF}}$ is caused by neglecting the explicit dependency on $\boldsymbol{x}$. To get a good estimate of the noise structure, we need to select training points with space-filling properties in $\Omega_{\boldsymbol{x}}$. As the input space is assumed to be large, it is advantageous to focus on the important part of $\Omega_{\boldsymbol{x}}$, represented by $\boldsymbol{\gamma}^{+} = [\gamma_{i}]$, with $i \in \mathbb{N}: [1, n_{\boldsymbol{\gamma}^{+}}]$ and $n_{\boldsymbol{\gamma}^{+}}$ being the number of input features used in the extended vector $\boldsymbol{\gamma}^{+}$. We note that $n_{\boldsymbol{\gamma}^{+}} > n_{\boldsymbol{\gamma}}$, i.e., we demand space-filling properties in more dimensions $\gamma_{i}$ than are actually used to define $\boldsymbol{z}_{\mathrm{LF}}$. This is necessary to yield a training design that is also representative of the noise structure of $p(y_{\mathrm{HF}} \,|\, \boldsymbol{z}_{\mathrm{LF}})$. In the extended space $\Omega_{\boldsymbol{\gamma}^{+}}$, the sampling data from the LF model $\mathcal{D}^{*}_{\mathrm{LF}} := \{\boldsymbol{X}^{*}, Y^{*}_{\mathrm{LF}}\}$ corresponds to the LF feature data set $\boldsymbol{\Gamma}^{*,+}$.
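A simple greedy maximin rule already yields a space-filling subset in $\Omega_{\boldsymbol{\gamma}^{+}}$ and can stand in for the dedicated diverse-subset algorithm cited below; this is our sketch, with the first point fixed arbitrarily:

```python
import numpy as np

def diverse_subset(G, n_train):
    """Greedy maximin selection of a space-filling subset of the rows of G
    (the LF feature data in the extended space Omega_{gamma+}); a simple
    stand-in for a dedicated diverse-subset algorithm."""
    chosen = [0]                                        # arbitrary first point
    d_min = np.linalg.norm(G - G[0], axis=1)            # distance to chosen set
    for _ in range(n_train - 1):
        nxt = int(np.argmax(d_min))                     # farthest remaining point
        chosen.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(G - G[nxt], axis=1))
    return np.array(chosen)
```

Each iteration adds the point farthest from the already-chosen set, so clusters of near-duplicate LF samples contribute at most one training point, which is exactly the space-filling property demanded of the design.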
Given the training data size $n_{\mathrm{train}}$, we then choose a space-filling subset $\boldsymbol{\Gamma}^{+} \subset \boldsymbol{\Gamma}^{*,+}$ from the large sampling data
set of size $N_{\mathrm{sample}}$, with $n_{\mathrm{train}} \ll N_{\mathrm{sample}}$. In the numerical implementation we used diverse-subset algorithms based on Wessing and Salomon. Given $\boldsymbol{Z}^{+}_{\mathrm{LF}}$ and the corresponding $\boldsymbol{X}$, we can then run the HF simulations according to $Y_{\mathrm{HF}} = y_{\mathrm{HF}}(\boldsymbol{X})$. We found that $n_{\boldsymbol{\gamma}^{+}} \in \mathbb{N}: [3, \ldots]$ is a good choice in the small data regime. Finally, we summarize the necessary steps of the Bayesian multi-fidelity uncertainty quantification approach in the form of the pseudo-code in Algorithm 1. The sub-algorithms 2 and 3 show the details of the determination of $\boldsymbol{z}_{\mathrm{LF}}$ discussed in this section and the implementation of the posterior statistics from equations (14) and (15).

Algorithm 1
Pseudo-code for Bayesian multi-fidelity UQ (BMFMC)
Require: $p(\boldsymbol{x})$, $y_{\mathrm{HF}}(\boldsymbol{x})$, $y_{\mathrm{LF}}(\boldsymbol{x})$, $n_{\mathrm{train}}$, $N_{\mathrm{sample}}$, $\boldsymbol{y}_{\mathrm{HF,support}}$
  $\boldsymbol{X}^{*}$ = Generate($p(\boldsymbol{x})$, $N_{\mathrm{sample}}$)  // draw $N_{\mathrm{sample}}$ Monte Carlo samples $\boldsymbol{X}^{*}$ from $p(\boldsymbol{x})$
  $Y^{*}_{\mathrm{LF}} \leftarrow y_{\mathrm{LF}}(\boldsymbol{X}^{*})$  // run the LF model for the $N_{\mathrm{sample}}$ Monte Carlo samples
  $p(f \,|\, \boldsymbol{z}_{\mathrm{LF}})$ = DesignPriorGP($Y^{*}_{\mathrm{LF}}$)
  $\mathcal{D}_f$ = TrainingData($\boldsymbol{X}^{*}$, $Y^{*}_{\mathrm{LF}}$, $n_{\mathrm{train}}$, $\boldsymbol{y}_{\mathrm{HF,support}}$, $N_{\mathrm{sample}}$)  // see Algorithm 2
  $f \,|\, \mathcal{D}_f \leftarrow$ TrainGP($\mathcal{D}_f$)  // train the GP model on $\mathcal{D}_f$
  return $\boldsymbol{p}^{*}_{y_{\mathrm{HF}},\mathrm{E}}$, $\boldsymbol{p}^{*}_{y_{\mathrm{HF}},\mathrm{V}}$ = PosteriorStatistics($f \,|\, \mathcal{D}_f$, $\boldsymbol{Z}^{*}_{\mathrm{LF}}$, $N_{\mathrm{sample}}$, $\boldsymbol{y}_{\mathrm{HF,support}}$)

Algorithm 2
TrainingData($\boldsymbol{X}^{*}$, $Y^{*}_{\mathrm{LF}}$, $n_{\mathrm{train}}$, $\boldsymbol{y}_{\mathrm{HF,support}}$, $N_{\mathrm{sample}}$)
  // Unsupervised dimensionality reduction for the input matrix $\boldsymbol{X}^{*}$
  $[\boldsymbol{X}^{*}_{\mathrm{uncorr}}, \boldsymbol{X}^{*}_{\mathrm{corr},i}] \leftarrow \boldsymbol{X}^{*}$  // with $i \in \mathbb{N}: [1, n_{\mathrm{corr}}]$; distinguish correlated and uncorrelated inputs
  for $i$ = 1 to $n_{\mathrm{corr}}$ do  // number of random fields: $n_{\mathrm{corr}}$
    $\hat{\boldsymbol{X}}^{*}_{i}$ = TruncatedKLE($\boldsymbol{X}^{*}_{\mathrm{corr},i}$)  // dim. reduction of the random fields by truncated KLE
  end for
  $\hat{\boldsymbol{X}}^{*} \leftarrow [\boldsymbol{X}^{*}_{\mathrm{uncorr}}, \hat{\boldsymbol{X}}^{*}_{i}]$  // construct the reduced input matrix
  $\hat{\boldsymbol{X}}^{*}$ = Standardize($\hat{\boldsymbol{X}}^{*}$)  // standardize for the correlation measure below
  // Define $\boldsymbol{\gamma}^{+}$ (supervised)
  $\mathbf{r} = |\hat{\boldsymbol{X}}^{*T} \cdot Y^{*}_{\mathrm{LF}}|$  // correlation coefficient for every dimension of $\hat{\boldsymbol{x}}$, equation (20)
  for $i$ = 1 to $n_{\boldsymbol{\gamma}^{+}}$ do  // $n_{\boldsymbol{\gamma}^{+}} \in \mathbb{N}: [3, \ldots]$ proved a good heuristic
    idx = ReturnIndexMax($\mathbf{r}$)  // dimension of $\hat{\boldsymbol{x}}$ with maximum correlation to $y_{\mathrm{LF}}$
    $\Gamma^{*}_{i}$ = SelectColumn(idx, $\hat{\boldsymbol{X}}^{*}$)  // select the corresponding column of $\hat{\boldsymbol{X}}^{*}$
    $\mathbf{r}$ = SetMaxZero($\mathbf{r}$)
  end for
  $\boldsymbol{\Gamma}^{*,+} \leftarrow [\Gamma^{*}_{i}]$, $i \in \mathbb{N}: [1, n_{\boldsymbol{\gamma}^{+}}]$  // construct the extended feature space
  // Select $\boldsymbol{X}$ and $Y_{\mathrm{HF}}$
  $\boldsymbol{\Gamma}^{+}$ = SelectDiverseSubset($\boldsymbol{\Gamma}^{*,+}$, $n_{\mathrm{train}}$)  // space-filling subset in $\Omega_{\boldsymbol{\gamma}^{+}}$
  $\boldsymbol{X}$ = GetCorrespondingInput($\boldsymbol{\Gamma}^{+}$)
  $Y_{\mathrm{HF}} = y_{\mathrm{HF}}(\boldsymbol{X})$  // run the HF model for the training inputs $\boldsymbol{X}$
  // Select $\boldsymbol{z}_{\mathrm{LF}}$ and $\mathcal{D}_f$
  $\boldsymbol{z}_{\mathrm{LF}} \leftarrow [\boldsymbol{y}_{\mathrm{LF}}, \gamma_{i}]$  // with $i \in \mathbb{N}: [1, n_{\boldsymbol{\gamma}}]$; we recommend $n_{\boldsymbol{\gamma}} = 2$ for $n_{\mathrm{train}} \in \mathbb{N}: [50, \ldots]$
  $\boldsymbol{Z}_{\mathrm{LF}} \leftarrow \boldsymbol{z}_{\mathrm{LF}}$
  $\mathcal{D}_f \leftarrow [\boldsymbol{Z}_{\mathrm{LF}}, Y_{\mathrm{HF}}]$  // training set for the GP, of size $n_{\mathrm{train}} \ll N_{\mathrm{sample}}$
  return $\mathcal{D}_f$

Algorithm 3
PosteriorStatistics($f \,|\, \mathcal{D}_f$, $\boldsymbol{Z}^{*}_{\mathrm{LF}}$, $N_{\mathrm{sample}}$, $\boldsymbol{y}_{\mathrm{HF,support}}$)
  // Calculate $\mathbb{E}_{f^{*}}[p(y_{\mathrm{HF}} \,|\, f^{*}, \mathcal{D}_f)]$, equation (14)
  $\boldsymbol{m}^{*}, \boldsymbol{v}^{*}$ = EvaluateGP($f \,|\, \mathcal{D}_f$, $\boldsymbol{Z}^{*}_{\mathrm{LF}}$)  // evaluate the posterior GP at $\boldsymbol{Z}^{*}_{\mathrm{LF}}$
  $\boldsymbol{P}^{*}_{y_{\mathrm{HF}}}$ = NormalPDF($\boldsymbol{y}_{\mathrm{HF,support}}$, $\boldsymbol{m}^{*}$, $\boldsymbol{v}^{*}$)  // matrix with column-wise normal distribution evaluations
  $\boldsymbol{p}^{*}_{y_{\mathrm{HF}},\mathrm{E}} = \frac{1}{N_{\mathrm{sample}}} \cdot$ SumColumns($\boldsymbol{P}^{*}_{y_{\mathrm{HF}}}$)  // average the normal densities on the common support
  // Calculate $\mathbb{V}_{f^{*}}[p(y_{\mathrm{HF}} \,|\, f^{*}, \mathcal{D}_f)]$, equation (15)
  $\boldsymbol{K}_{f^{*}}$ = PosteriorCovariance($f \,|\, \mathcal{D}_f$, $\boldsymbol{Z}^{*}_{\mathrm{LF}}$)
  $\hat{\sigma}_{\mathrm{n}}^{2}$ = GetNoiseGP($f \,|\, \mathcal{D}_f$)
  $\boldsymbol{Y}_{\mathrm{HF,V}} = [\boldsymbol{y}_{\mathrm{HF,support}}, \boldsymbol{y}_{\mathrm{HF,support}}]$  // bivariate support points $[y_{\mathrm{HF},l}, y_{\mathrm{HF},l}]^{T}$, with $l \in \mathbb{N}: [1, n_{\mathrm{support}}]$
  for $\mu_{j}, v_{j}$ in $\boldsymbol{m}^{*}, \boldsymbol{v}^{*}$ do
    for $\mu_{i}, v_{i}$ in $\boldsymbol{m}^{*}, \boldsymbol{v}^{*}$ do
      $k = \boldsymbol{K}_{f^{*}}(i, j)$
      $\boldsymbol{\Sigma} = [[v_{i} + \hat{\sigma}_{\mathrm{n}}^{2}, k]^{T}, [k, v_{j} + \hat{\sigma}_{\mathrm{n}}^{2}]^{T}]$  // covariance matrix of the bivariate normal
      $\boldsymbol{\mu} = [\mu_{i}, \mu_{j}]^{T}$  // mean vector of the bivariate normal
      $\boldsymbol{p}^{*}_{y_{\mathrm{HF}},\mathrm{V}}$ += NormalPDF($\boldsymbol{Y}_{\mathrm{HF,V}}$, $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$)  // evaluate the PDF at $\boldsymbol{Y}_{\mathrm{HF,V}}$ and accumulate
    end for
  end for
  $\boldsymbol{p}^{*}_{y_{\mathrm{HF}},\mathrm{V}} = \boldsymbol{p}^{*}_{y_{\mathrm{HF}},\mathrm{V}} / N_{\mathrm{sample}}^{2} - (\boldsymbol{p}^{*}_{y_{\mathrm{HF}},\mathrm{E}})^{2}$  // normalize and subtract the squared mean prediction
  return $\boldsymbol{p}^{*}_{y_{\mathrm{HF}},\mathrm{E}}$, $\boldsymbol{p}^{*}_{y_{\mathrm{HF}},\mathrm{V}}$

This section is devoted to simple and general application-oriented procedures to generate low-cost numerical approximations of the original HF model. A HF model is an umbrella term for a complex simulation code that captures detailed physical effects, narrow solver tolerances and an overall fine numerical discretization of the temporal and spatial domain. LF models could then, for example, be simpler physical models. A physically simpler version of the original problem is, however, not always easily available, so that we investigate theoretical aspects based on pure numerical relaxation of Galerkin methods (finite element method, discontinuous Galerkin method). We first investigate the factors that influence the computational cost of approximating the solution of a nonlinear system of partial differential equations with Galerkin-based discretization methods for transient problems, with an iterative solution of the nonlinear systems of equations within each time step. Afterwards, we motivate a numerical relaxation strategy for the automated generation of LF model versions based on these considerations. Computational cost can be decomposed into contributions from the number of degrees of freedom (DoFs), i.e., the number of unknowns arising from the numerical discretization, the necessary number of iterations until convergence, as well as the efficiency of the implementation, which depends on the computational complexity of the chosen numerical algorithm and on the optimization level of a specific code. Thus we obtain:
$$\text{cost} \propto \frac{\text{DoFs} \cdot \text{time steps} \cdot \text{iterations}}{\text{efficiency of implementation}} \quad (21)$$
We describe the general case of high-order discontinuous Galerkin methods, as these methods contain the widely used finite element and finite volume methods as special cases. The spatial dimensionality of the investigated problem is abbreviated by $d$, the polynomial degree of the Ansatz functions by $k$ and the measure for the element size by $h$. The spatial discretization results in $N_{\mathrm{ele}}$ elements with $(k+1)^{d}$ degrees of freedom per element (and similarly $k^{d}$ for a continuous finite element space), assuming hexahedral elements and scalar fields. The number of elements is inversely proportional to $h$ to the power of the dimension of the problem: $N_{\mathrm{ele}} \propto h^{-d}$. The number of degrees of freedom (DoFs) can then be summarized in equation (22):
$$\text{DoFs} \propto N_{\mathrm{ele}} \cdot (k+1)^{d} \propto \left(\frac{k+1}{h}\right)^{d} \quad (22)$$
The cost associated with time stepping is inversely proportional to the time step size (we assume a mean time step for the cost considerations). Besides accuracy demands, the maximal possible time step size is constrained by the stability limits of the deployed solver. As a general discussion of the stability theory for arbitrary solvers is not expedient, we confine the analysis to solvers for transient fluid dynamics applications, where the time step is selected according to the CFL condition (*), resulting in
$$\text{time steps} \propto \frac{1}{\Delta t} \overset{(*)}{\propto} \underbrace{\frac{k^{\gamma}}{h}}_{\text{CFL relationship}}, \quad \text{with } \gamma \in [1, \ldots], \quad (23)$$
and argue that the resulting time step size is small enough to ensure a time-accurate solution for many problems. Computational costs related to the iterative solution of the systems of equations for the unknown degrees of freedom depend on the solver tolerances $\boldsymbol{\epsilon}_{\mathrm{solver}}$, and in general also on the element size $h$ as well as on the polynomial degree $k$ of the Ansatz functions. For robust solvers (**), e.g., multigrid, one can assume that the spatial discretization $(h, k)$ does not influence iteration counts.
A dependency of the computational costs on the solver tolerance according to $-\log \boldsymbol{\epsilon}_{\mathrm{solver}}$ provides a good general model for solvers with optimal complexity, so that we write:
$$\text{iterations} = f(\boldsymbol{\epsilon}_{\mathrm{solver}}, h, k) \overset{(**)}{\propto} -\log \boldsymbol{\epsilon}_{\mathrm{solver}} \quad (24)$$
By efficiency of implementation (in DoFs computed per second) we mean the speed at which certain elementary operations of a PDE solver can be performed on a given computer hardware with a given implementation. This factor is summarized in $g(h, k)$ regarding the serial performance of a code, as well as effects of parallel scalability that we summarize in the coefficient of parallel efficiency $\eta_{\mathrm{parallel}}(h, k)$. Furthermore, we introduce the speed-up through floating-point precision $\mathfrak{p} \in \{1, 2\}$, where $\mathfrak{p} = 1$ for (standard) double precision and $\mathfrak{p} = 2$ in case the solver uses single floating-point precision. The factor $g(h, k)$ mainly depends on the implementation variant used for the solver; we focus here on implementation strategies that have optimal complexity (***) w.r.t. the polynomial degree $k$ and mesh size $h$, so that we can assume the serial performance to be almost independent of these parameters for moderate $k$, and we further assume optimal parallel scalability (***):
$$\text{efficiency of implementation} \propto g(h, k) \cdot \eta_{\mathrm{parallel}}(h, k) \cdot \mathfrak{p} \overset{(***)}{\propto} \mathfrak{p}, \quad \text{with } \mathfrak{p} = \begin{cases} 1, & \text{for double precision} \\ 2, & \text{for single precision} \end{cases} \quad (25)$$
The speed-up $f_{\mathrm{HF/LF}}$ achieved by numerical relaxation can then be expressed as:
$$f_{\mathrm{HF/LF}}(k, d, h, \mathfrak{p}) := \frac{\text{costs}_{\mathrm{HF}}}{\text{costs}_{\mathrm{LF}}} = \frac{\left(\frac{k_{\mathrm{HF}}+1}{h_{\mathrm{HF}}}\right)^{d} \cdot \frac{k_{\mathrm{HF}}^{\gamma}}{h_{\mathrm{HF}}} \cdot \left(-\log \boldsymbol{\epsilon}_{\mathrm{solver,HF}}\right) \cdot \mathfrak{p}_{\mathrm{LF}}}{\left(\frac{k_{\mathrm{LF}}+1}{h_{\mathrm{LF}}}\right)^{d} \cdot \frac{k_{\mathrm{LF}}^{\gamma}}{h_{\mathrm{LF}}} \cdot \left(-\log \boldsymbol{\epsilon}_{\mathrm{solver,LF}}\right) \cdot \mathfrak{p}_{\mathrm{HF}}} \quad (26)$$
Figure 3 illustrates potential speed-ups due to numerical relaxation of the original problem. Element or mesh coarsening is always shown in combination with a change of the time step size according to the CFL condition. The figure illustrates the effectiveness of higher-order methods (change of polynomial degree) in terms of the achieved speed-up factor.
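The cost model of equations (22) to (26) is easy to evaluate numerically. In the sketch below, the exponent $\gamma = 1.5$, the tolerance values and the HF reference configuration are illustrative assumptions:

```python
import numpy as np

def relaxation_speedup(k_hf, k_lf, h_ratio=1.0, d=3, gamma=1.5,
                       p_lf=1.0, eps_hf=1e-6, eps_lf=1e-6):
    """Theoretical HF/LF cost ratio in the spirit of equation (26):
    DoFs ((k+1)/h)^d, CFL time-step scaling k^gamma/h and a -log(eps)
    iteration model; gamma = 1.5 is an illustrative assumption.
    h_ratio > 1 coarsens the LF mesh, p_lf = 2 means single precision."""
    def cost(k, h, p, eps):
        dofs = ((k + 1) / h) ** d            # equation (22)
        steps = k ** gamma / h               # equation (23), CFL relationship
        iters = -np.log10(eps)               # equation (24)
        return dofs * steps * iters / p      # equation (25): p = 2 halves the cost
    return cost(k_hf, 1.0, 1.0, eps_hf) / cost(k_lf, h_ratio, p_lf, eps_lf)
```

Lowering the polynomial degree, coarsening the mesh and switching the LF model to single precision all enter multiplicatively, which is why combined relaxations can reach large speed-up factors.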
A change of $k$ or $h$ changes the number of DoFs according to equation (22). We do not explicitly show theoretical error estimates for the relaxation procedure, as they are not of primary importance for the multi-fidelity scheme. It should be noted that the absolute error or deviation between the LF and HF model is not relevant for the proposed method, only the structure of the error itself (the noise structure). From a practical perspective, especially the change of the polynomial degree $k$ of the Ansatz functions of the Galerkin approximation can lead to large speed-ups, even for the same mesh. In contrast, the relaxation of couplings or tolerances has a smaller impact (logarithmic expression). An additional speed-up factor of two can be achieved when we relax the floating-point precision to single precision. Even though the presented speed-ups are theoretical values, one can expect tremendous efficiency gains without the need for a completely new computational model.

FIGURE 3
Example of theoretical speed-ups for a LF model generated by numerical relaxation. Left: speed-up over polynomial degree $k$ and mesh-coarsening factor $h / h_{0}$; the HF reference uses $k = 7$, $\epsilon_{\mathrm{solver},0} = 10^{-6}$. Right: speed-up over tolerance factor $\epsilon_{\mathrm{solver}} / \epsilon_{\mathrm{solver},0}$ and mesh-coarsening, with a HF reference of $k = 2$, $\epsilon_{\mathrm{solver},0} = 10^{-6}$. Along with the spatial coarsening, a temporal relaxation is carried out according to the CFL constraint.

The overall cost of the multi-fidelity framework BMFMC is composed of the cost of sampling the LF model and the cost of the HF simulations in $\mathcal{D}_f$. To demonstrate the efficiency of BMFMC, we compare the computational cost of the proposed multi-fidelity approach with the cost associated with a classic Monte Carlo sampling strategy. We define the following speed-up expression for
BMFMC:
$$\text{speed-up}_{\mathrm{MF}} := \frac{N_{\mathrm{MC,HF}} \cdot \text{costs}_{\mathrm{HF}}}{N_{\mathrm{MC,LF}} \cdot \text{costs}_{\mathrm{LF}} + n_{\mathrm{train}} \cdot \text{costs}_{\mathrm{HF}}} = \frac{N_{\mathrm{MC,HF}} \cdot f_{\mathrm{HF/LF}}}{N_{\mathrm{MC,LF}} + n_{\mathrm{train}} \cdot f_{\mathrm{HF/LF}}} \quad (27)$$
Figure 4 shows the theoretical speed-ups of BMFMC for different LF model speed-ups $f_{\mathrm{HF/LF}}$, as well as different training data sizes $n_{\mathrm{train}}$ and Monte Carlo sample sizes $N_{\mathrm{MC}}$. The multi-fidelity approach becomes especially powerful if a high number of Monte Carlo evaluations of the HF model would have been necessary to estimate a statistic of interest with high accuracy.
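Equation (27) translates directly into code; the sketch below makes the assumption $N_{\mathrm{MC,HF}} = N_{\mathrm{MC,LF}}$ explicit:

```python
def bmfmc_speedup(n_mc, n_train, f_hf_lf):
    """Overall BMFMC speed-up of equation (27), assuming the same number of
    Monte Carlo samples on the LF model as a plain HF Monte Carlo study
    would use on the HF model."""
    return n_mc * f_hf_lf / (n_mc + n_train * f_hf_lf)

# e.g. 10000 LF samples, 150 HF training runs, LF model 100x cheaper:
print(bmfmc_speedup(10000, 150, 100.0))  # -> 40.0
```

Note that the speed-up saturates at $N_{\mathrm{MC}} / n_{\mathrm{train}}$ for arbitrarily cheap LF models, so the number of HF training runs is the ultimately limiting factor.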
FIGURE 4
Overall speed-up of the proposed Bayesian multi-fidelity Monte Carlo (BMFMC) approach for UQ for different training data sizes $n_{\mathrm{train}}$, Monte Carlo sample sizes $N_{\mathrm{MC}}$ and HF/LF model speed-ups.

In the following numerical examples, we demonstrate the accuracy and efficiency of the proposed generalized Bayesian multi-fidelity Monte Carlo framework. The LF models deployed are automatically generated by numerical relaxation of the corresponding HF model, as described in Section 3. The generalized multi-fidelity approach for uncertainty quantification
BMFMC was implemented in
QUEENS (Quantification of Uncertainties in Engineering and Science) , a software platform for uncer-tainty quantification, physics-informed machine learning, Bayesian optimization, inverse problems and simulation analytics. QUEENS is capable of interacting with a variety of commercial, open-source and in-house simulation engines and enables thefully automatic set-up of all required simulations on high-performance computing (HPC) clusters, workstations and desktopcomputers. The first numerical demonstration of a stochastic flow past a cylinder was calculated on a workstation with IntelCore i7-8000K CPUs running at 3.7 GHz and the second demonstration of a stochastic fluid-structure interaction problem wascomputed on an HPC cluster with Intel Xeon E5-2680v3 "Haswell" CPU running at 2.5 GHz.
For the first numerical demonstration we investigate uncertainty propagation for a widely used benchmark in computational fluid dynamics: the flow past a cylinder for incompressible flows, as defined by Schäfer and Turek. Similar setups have also been discussed in the UQ community, see Perdikaris et al. The geometry of the two-dimensional domain is a rectangular channel with height $H = 0.41$ and length $L = 2.2$, as depicted in Figure 5. We modify the original benchmark problem at $Re = 100$ to a stochastic flow problem for our investigations on efficient uncertainty propagation: a circular cylinder with uncertain radius $\tilde{R} \sim \mathcal{N}_{R}(\,\cdot\,, \,\cdot\,)$ is placed in the channel at position $x_{c} = 0.2$ in the streamwise direction. The cylinder's distance to the bottom channel wall is also a univariate random variable, $\tilde{y}_{c} \sim \mathcal{N}_{y}(\,\cdot\,, \,\cdot\,)$. No-slip boundary conditions are imposed on $\Gamma_{D,0}$, defined by the cylinder surface and the channel walls (marked in green in Figure 5). At the outflow boundary $\Gamma_{N}$ (shown in magenta) a Neumann boundary condition is prescribed. Additionally, the kinematic viscosity $\tilde{\nu}$ of the fluid is uncertain and modeled as a random variable with the univariate distribution $\mathcal{N}_{\nu}(\,\cdot \times 10^{-4},\ \cdot \times 10^{-3})$. Furthermore, we assume a transient and stochastic Dirichlet boundary condition on the inflow section $\Gamma_{D,u}$ (shown in blue), in the form of a random field in space with a sinusoidal ramp over time:
$$\Gamma_{D,u}: \quad \mathbf{u} = \begin{bmatrix} \tilde{u}_{x}(y, t) \\ 0 \\ 0 \end{bmatrix}, \quad \text{with } \tilde{u}_{x}(y, t) = \Big[ \underbrace{U_{\mathrm{m}} \frac{4 y (H - y)}{H^{2}}}_{\text{mean function for } t = T/2} + \underbrace{\mathcal{N}\big(0,\ \tilde{k}(y, y')\big)}_{\text{non-stat. random process}} \Big] \cdot \underbrace{\sin(\pi t / T)}_{\text{transient ramping}} \quad (28)$$
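Realizations of the random inflow boundary condition in equation (28) can be drawn with a Cholesky factorization of the kernel matrix on a set of boundary points. In this sketch, the 5% relative amplitude of the non-stationary process, the correlation length and the mean peak velocity are assumptions for illustration:

```python
import numpy as np

def sample_inflow(y, t, T=8.0, U_m=1.0, H=0.41, length=0.05, n_samples=5, seed=0):
    """Draw realizations of the random inflow u_x(y, t) of equation (28):
    parabolic mean profile plus a non-stationary GP whose standard deviation
    is 5% of the mean (assumed amplitude), ramped by sin(pi t / T)."""
    rng = np.random.default_rng(seed)
    mu = U_m * 4.0 * y * (H - y) / H**2                   # parabolic mean profile
    k_se = np.exp(-((y[:, None] - y[None, :]) ** 2) / length**2)
    K = np.outer(0.05 * mu, 0.05 * mu) * k_se             # non-stationary kernel
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(y)))    # jitter keeps K SPD
    g = L @ rng.standard_normal((len(y), n_samples))      # correlated field draws
    return (mu[:, None] + g) * np.sin(np.pi * t / T)
```

Because the signal variance is proportional to the squared mean profile, the realizations automatically satisfy the no-slip condition at the channel walls, where the mean profile vanishes.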
Setup of the stochastic flow past a cylinder problem. Random inputs are written in boxes and have a tilde superscript.The quantity of interest is the maximal lift coefficient 𝐶 𝐿, max , due to the lift force 𝐿 on the cylinder (shown in red) in they-direction. In the uncertainty propagation problem, we want to infer the distribution 𝑝 ( 𝐶 𝐿, max ) as a stochastic response tothe uncertain boundary condition and parameters, whose distribution we abbreviate by 𝑝 ( 𝒙 ) . The random process is modeledas a Gaussian Process with a non-stationary kernel function ̃𝑘 ( 𝑦, 𝑦 ′ ) , so that of the inflow field’s realizations (equal totwo times the standard deviation of the process) 𝑢 𝑥 have less than deviation from the mean 𝜇 𝑢 ( 𝑦, 𝑡 ) . The non-stationarycovariance function is formulated as a stationary squared exponential covariance function k 𝑢 ( 𝑦, 𝑦 ′ ) with space dependent signal-variance ( 𝜎 𝑢 ( 𝑦 ) ) : ̃𝑘 ( 𝑦, 𝑦 ′ ) = ( . ⋅ 𝜇 𝑢 ( 𝑦 ) ) ⏟⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏟ ( 𝜎 𝑢 ( 𝑦 ) ) ⋅ exp ( − ( 𝑦 − 𝑦 ′ ) 𝓁 ) ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ k 𝑢 ( 𝑦,𝑦 ′ ) . (29)Expression (28) and (29) can be rewritten for easier implementation with available software packages such as GPy : ̃𝑢 𝑥 ( 𝑦, 𝑡 ) ∼ 𝜇 𝑢 ( 𝑦 ) ⋅ sin( 𝜋𝑡 ∕ 𝑇 ) + 0 . ⋅ 𝜇 𝑢 ( 𝑦 ) ⋅ ( , k 𝑢 ( 𝑦, 𝑦 ′ )) ⋅ sin( 𝜋𝑡 ∕ 𝑇 )= 𝜇 𝑢 ( 𝑦 ) ( . ⋅ ( , k 𝑢 ( 𝑦, 𝑦 ′ ))) ⋅ sin( 𝜋𝑡 ∕ 𝑇 ) (30)A summary of important properties for the stochastic flow problem is given in Table 2. Discrete realizations of the random TABLE 2
Properties used in the simulations (random properties are printed in bold):

- Channel height: $H = 0.41$; channel length: $L = 2.2$
- Lateral cylinder position: $\tilde{y}_{c} \sim \mathcal{N}_{y}(\,\cdot\,, \,\cdot\,)$
- Cylinder radius: $\tilde{R} \sim \mathcal{N}_{R}(\,\cdot\,, \,\cdot\,)$
- Kinematic viscosity: $\tilde{\nu} \sim \mathcal{N}_{\nu}(\,\cdot \times 10^{-4},\ \cdot \times 10^{-3})$
- Inflow BC: $\tilde{u}_{x}(y, t) = \mu_{u}(y)\big[1 + 0.05 \cdot \mathcal{N}(0, \mathrm{k}_{u_{x}}(y, y'))\big] \cdot \sin(\pi t / T)$
- Mean function at $t = T/2$: $\mu_{u}(y) = U_{\mathrm{m}}\, 4 y (H - y) / H^{2}$
- Mean maximal velocity: $U_{\mathrm{m}}$
- Correlation length: $\ell = \,\cdot\, \cdot H$
inflow field can be computed using standard pseudo-random number generators. We define a vector of points $Y_{\Gamma}$ on $\Gamma_{D,u}$ at which we evaluate the random inflow BC to yield the velocity vector $U_{\Gamma}$. The Dirichlet boundary condition can then be imposed by a Galerkin projection step:
$$\mathbf{u}_{\Gamma} = (\mathbf{m} + \mathbf{g}) \cdot \sin(\pi t / T), \quad \text{with } \mathbf{m} = \mu_{u}(Y_{\Gamma}) \text{ and } \mathbf{g} = \mathbf{L} \cdot \mathbf{r} \sim \mathcal{N}\big(0, \mathrm{k}_{u}(y, y')\big), \quad \text{with } \mathbf{r} \sim \mathcal{N}_{\mathbf{r}}(0, \mathbf{I}) \text{ and } \mathbf{L} \cdot \mathbf{L}^{T} = \mathbf{K}^{*} \quad (31)$$
In equation (31) the matrix $\mathbf{L}$ denotes the Cholesky factorization of the covariance matrix $\mathbf{K}^{*} = \mathrm{k}_{u}(Y_{\Gamma}, Y_{\Gamma}')$. The normally distributed vector $\mathbf{r}$ has the dimension of $Y_{\Gamma}$, which in our case was discretized by 200 points. The resulting stochastic dimension (before dimensionality reduction) was $d = 203$. For subsequent parts of our analysis (as put forth in Section 3) we can directly calculate a low-dimensional representation $\hat{\boldsymbol{X}}^{*}$ of the high-dimensional inputs $\boldsymbol{X}^{*}$ by computing a truncated Karhunen-Loève expansion (KLE) (unsupervised dimensionality reduction) of the random field. The random variables $\tilde{R}, \tilde{\nu}$ and $\tilde{y}_{c}$ are independent; hence, their dimension cannot be reduced further. The right side of Figure 6 shows realizations of the random inflow profile for $t = T/2$ (solid lines) along with their truncated KLE approximations of order six (dashed lines). The bar chart (left) shows the explained variance over the KLE truncation order. We decided to truncate the expansion at order 10 and store the reduced input data of the truncated inflow field and the three random variables as $\hat{\boldsymbol{X}}^{*} \in \mathbb{R}^{N_{\mathrm{sample}} \times 13}$, in contrast to the original input data set $\boldsymbol{X}^{*} \in \mathbb{R}^{N_{\mathrm{sample}} \times 203}$.

FIGURE 6
Left: cumulative percentage of explained variance for a Karhunen-Loève expansion of the random inflow field. Right: example samples (solid lines) of the random inflow field $\tilde{u}_{x}(y)$ for $t = T/2$ along with their Karhunen-Loève approximations (dashed) of order six. The mean function of the random inflow is printed in light grey.

The unsteady problem for a sample realization of the random input $\boldsymbol{x} \sim p(\boldsymbol{x})$ is simulated over a time interval $0 \leq t \leq T = 8$, with a zero velocity field in the domain at the initial time $t = 0$. We solve the uncertainty propagation problem using two fidelity levels of a high-order discontinuous Galerkin ($L^{2}$-conforming) discretization developed in earlier work. On quadrilateral/hexahedral elements, the solution is approximated by tensor-product Lagrange polynomials of degree $k \geq 2$ for the velocity unknowns, and degree $k_{p} = k - 1$ for the pressure unknowns, for reasons of inf-sup stability. From a practical perspective, the flexibility to vary the polynomial degree $k$ of the shape functions and the mesh resolution $h$ independently, as a means to increase the spatial approximation properties of the discretization, is attractive, as one does not have to generate several meshes. We exploit this property for the multi-fidelity approach and define a high-fidelity model version with polynomial degree $k = 6$ and a low-fidelity version of the benchmark using $k = 3$ to resolve the velocity field. For efficient time integration, the method used in the present work relies on well-known projection methods that segregate the solution of the velocity and pressure unknowns; here, the second-order accurate dual splitting scheme with an explicit treatment of the convective term is used. In all simulations, the same parameterized mesh according to Fehn et al. was used.
The simulation domain and the mesh for one sample random input realization 𝒙 ∼ p(𝒙) are shown in Figure 7 for the HF and the LF model version, respectively. The reduction of the polynomial degree from k = 6 to k = 3 leads to a speed-up of roughly eight, which is in agreement with equation (26). The procedure for the multi-fidelity uncertainty propagation follows Algorithm 1. We run simulations for

FIGURE 7
Example snapshot of the velocity magnitude for a low-fidelity simulation with k = 3 (top) and a high-fidelity model simulation with k = 6 (bottom). Both simulations used identical inputs 𝒙 and are shown for the same simulation time t = T/2.

the N_sample = 10000 input realizations of p(𝒙) stored in 𝑿*, with 𝒙 = [ũ_x(y), ν̃, R̃, ỹ_c]^T, on the LF version of the cylinder flow problem and obtain a vector Y*_LF of corresponding LF model responses for C_{L,max}. The necessary number of sample points for the LF model is problem-dependent and strongly relies on the goal of the analysis. As we are interested in the entire density, the number of sample points necessary for an accurate approximation is significantly higher than for point estimates such as the mean value or the variance. The convergence of the density estimate or statistic of interest can be investigated over an increasing number of sample points. Error estimates exist only for point estimators, such as Monte-Carlo mean or variance estimators, but they can be used for an initial orientation for the sample size N_sample. The standard error of the Monte-Carlo estimator for the mean yields:

\[
\sigma_\mathrm{E} \approx \frac{\hat{\sigma}}{\sqrt{N_\text{sample}}} \tag{32}
\]

where σ_E is the standard deviation of the Monte-Carlo error, σ̂ is the estimate of the standard deviation of the QoI and N_sample is the number of sample points. We select an initial sample size of N_sample = 10000 for the LF model, so that the relative error σ_E/σ̂ in the mean estimate is 1%. Afterwards, we successively compute features γ_i and choose five features to calculate an Ω_{γ_i×y_LF}-filling subset [y_LF(𝑿), γ_i(𝑿)]^T ⊂ [y_LF(𝑿*), γ_i(𝑿*)]^T, with i ∈ {1, …, 5}. We choose a data set of size n_train = 150, corresponding to 150 HF model simulations, to train the Gaussian Process model. In all problems we investigated, a choice of n_train in this range offered a good balance between accuracy and performance.
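Equation (32) can be inverted to pick the Monte-Carlo sample size for a target relative error of the mean estimate; a minimal sketch:

```python
import math

# Relative standard error of the Monte-Carlo mean estimator,
# sigma_E / sigma_hat = 1 / sqrt(N_sample), per equation (32).
def relative_standard_error(n_sample: int) -> float:
    return 1.0 / math.sqrt(n_sample)

def samples_for_relative_error(target: float) -> int:
    # Smallest N with 1 / sqrt(N) <= target.
    return math.ceil(1.0 / target**2)

print(relative_standard_error(10_000))   # 1% relative error for N_sample = 10000
print(samples_for_relative_error(0.01))  # N_sample needed for a 1% target
```

This is exactly the reasoning used in the text: a 1% relative error on the mean corresponds to N_sample = 10000 LF evaluations.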
Figure 8 shows the HF and LF model dependency in Ω_{y_HF×y_LF} along with the GP-based probabilistic model that would result without γ_i. The Gaussian Process model in Ω_{y_LF×y_HF}, shown in Figure 8, does not sufficiently explain the complex, non-Gaussian nature of the Monte-Carlo reference, shown by grey dots (normally not available). The introduction of a further dimension γ_i leads to a higher-dimensional space in which the data can be better explained by a GP. The transition from Ω_{y_HF×y_LF} to Ω_{y_HF×y_LF×γ} did not require any further HF model evaluations and follows the procedure described in Section 2.4, so that a reduction in the overall approximation error is possible without further computational effort. The resulting approximation for the HF response p(y_HF | f) is shown in Figure 9a) to d). Figure 9a) shows the best BMFMC prediction for the HF output density, using n_train = 150 and two additional features γ_1 and γ_2. The training points f were selected by choosing a diverse subset in Ω_{𝜸+} with n_{𝒛_LF+} = 5. The credible intervals on the densities were computed using

(To obtain a flow solver that is computationally efficient at high polynomial degrees k, it is essential to implement the method with optimal computational complexity by the use of matrix-free evaluation techniques.)
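The probabilistic model of Figure 8 is a Gaussian Process regression of y_HF on y_LF with posterior mean and credible intervals. The sketch below reproduces the mechanics on synthetic data; the kernel, hyperparameters, noise level, and the toy tanh relationship are assumptions for illustration (the paper trains its GPs, e.g., with GPy):

```python
import numpy as np

def rbf(a, b, ell=1.0, sf=1.0):
    # Squared-exponential kernel (illustrative choice with fixed hyperparameters).
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

# Toy training tuples (y_LF, y_HF), standing in for the red crosses in Figure 8.
rng = np.random.default_rng(1)
y_lf = np.sort(rng.uniform(-2.0, 2.0, 30))
y_hf = np.tanh(y_lf) + 0.05 * rng.standard_normal(30)

sigma_n = 0.05                                   # assumed noise standard deviation
K = rbf(y_lf, y_lf) + sigma_n**2 * np.eye(30)
L = np.linalg.cholesky(K)

# GP posterior on a prediction grid: mean m_f(y_LF) and +-2 sigma credible band.
y_star = np.linspace(-2.0, 2.0, 200)
K_s = rbf(y_lf, y_star)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_hf))
mean = K_s.T @ alpha
v = np.linalg.solve(L, K_s)
var = np.diag(rbf(y_star, y_star)) - np.sum(v * v, axis=0)
std = np.sqrt(np.maximum(var, 0.0))
upper, lower = mean + 2 * std, mean - 2 * std
```

With only the scalar y_LF as input, such a GP imposes a unimodal Gaussian conditional p(y_HF | y_LF); the additional feature dimensions γ_i discussed above relax exactly this restriction.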
FIGURE 8
LF and HF output tuples of the flow past a cylinder problem. The posterior Gaussian Process for the approximation of p(f* | f) is shown in form of its mean function m_f(y_LF) and associated credible intervals. The training data f is marked by red crosses.

equation (15) (respectively its square root for the standard deviation), and provide an estimate for the uncertainty in the density prediction. The LF density (red line) of the maximum lift coefficient shows a bimodal characteristic that cannot be found in the HF reference density. The BMFMC prediction for n_train = 150 without informative LF features in Figure 9b) already resulted in very good predictions. The addition of two informative features in Figure 9a) gave even better predictions for the tails of the distribution, resulted in slightly lower predictive variance (narrower credible intervals) and was in almost perfect agreement with the Monte-Carlo density estimate (dashed black line) that used N_sample = 10000 HF evaluations. Figures 9c) and d) used a different strategy to select the training data set f: the outcomes of the LF Monte-Carlo simulation Y*_LF were separated into 25 bins, and an equal number of training candidates was then randomly selected from each bin to define f. Even though this strategy covers y_LF efficiently, the input 𝒙 was not sufficiently covered by the training data, resulting in worse predictions with higher predictive variance.
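Below, the quality of such density predictions is quantified by a Kullback-Leibler divergence between kernel density estimates. A numerical sketch, where the Gaussian samples and the integration grid are illustrative assumptions:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
samples_ref = rng.normal(0.0, 1.0, 5000)     # stand-in for HF Monte-Carlo draws
samples_pred = rng.normal(0.05, 1.05, 5000)  # stand-in for a predicted density

p = gaussian_kde(samples_ref)    # Gaussian KDE, as used for p(y_HF) in the paper
q = gaussian_kde(samples_pred)

# D_KL[p || q] on a finite grid, with a floor against log(0) in the tails.
y = np.linspace(-6.0, 6.0, 2001)
py = np.maximum(p(y), 1e-300)
qy = np.maximum(q(y), 1e-300)
kld = trapezoid(py * np.log(py / qy), y)
print(f"KLD: {kld:.4f}")
```

Identical densities yield a value of zero; any discrepancy yields a positive value, and the measure is asymmetric in its two arguments.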
Figure 9d) demonstrates that the predictive variance (credible intervals on the HF density estimates) serves as a good indicator for a too small training data size, giving larger credible intervals for the smaller training data size of n_train = 50. To measure the overall accuracy of the predictive HF distribution E_{f*}[p(y*_HF | f*, f)], we define an absolute error measure in terms of the Kullback-Leibler divergence (KLD) in equation (33) towards the HF Monte-Carlo density estimate p(y_HF), which was calculated with a Gaussian kernel density estimation with bandwidth optimization, using N_sample = 10000:

\[
\epsilon_\text{abs} := D_\text{KL}\Big[\, p(y_\text{HF}) \,\big\|\, \mathbb{E}_{f^*}\big[p(y^*_\text{HF} \,|\, f^*, f)\big] \Big] = \int_{-\infty}^{\infty} p(y_\text{HF}) \ln\!\left( \frac{p(y_\text{HF})}{\mathbb{E}_{f^*}\big[p(y^*_\text{HF} \,|\, f^*, f)\big]} \right) dy_\text{HF} \tag{33}
\]

The KLD can be understood as an asymmetric measure of similarity between two probability densities: two identical distributions result in a KLD value of zero, and a discrepancy between the densities results in KLD values greater than zero.

a) f: diverse subset, two features, n_train = 150; b) f: diverse subset, no features, n_train = 150; c) f: random from bins, no features, n_train = 150; d) f: as in c) but with n_train = 50

FIGURE 9
Comparison of the output distributions p(y) = p(C_{L,max}) for the maximal lift coefficient in the flow around a cylinder problem. The high-fidelity Monte-Carlo reference p(y_HF) is shown as a black dashed line, the low-fidelity solution p(y_LF) is given in red, and the BMFMC mean prediction E_{f*}[p(y*_HF | f*, f)] is shown in green, along with ±2 standard deviation credible intervals of the prediction, shown in grey.

Figure 10 shows the performance of the Bayesian multi-fidelity approach, measured by the KLD, over an increasing number of features γ_i and two different training sizes n_train. To give a reference for the KLDs of the BMFMC solution, we additionally provide the KLD for Monte-Carlo based density estimates using a lower number of sample points (horizontal dashed lines), compared to the Monte-Carlo reference using N_sample = 10000. The KLD for the BMFMC predictions with zero to two additional LF features, which only required 50 HF model simulations, lies far below the KLD that was reached with a density estimate using 5000 HF model evaluations. For comparison, we also plotted the reference for a density estimate using only 50 HF simulations, resulting in considerably worse predictions than BMFMC
FIGURE 10
Kullback-Leibler divergence between the Monte-Carlo reference solution p(y_HF) and the Bayesian prediction E_{f*}[p(y*_HF | f*, f)] for different numbers of features 𝜸 in 𝒛_LF (logarithmic scale). The horizontal dashed lines mark the KL-divergence of one exemplary Monte-Carlo estimate with fewer points towards the Monte-Carlo estimate with N_sample = 10000. The Monte-Carlo density estimates yield a significantly higher KLD than the BMFMC estimates.

(please note the logarithmic scale). The best BMFMC predictions were made with only one additional LF feature γ_1. The use of one or two additional features leads to a significant reduction in the KLD without any additional computational cost. For the small training data sets f with n_train = 50 and n_train = 150, the introduction of more than two additional features led to a strong increase in the approximation error due to the curse of dimensionality. The diverse subset strategy for training point selection resulted in considerably better estimates than a random training point selection. In conclusion, a computational cost comparison of BMFMC with a standard Monte-Carlo procedure, as presented in equation (27), resulted in an overall speed-up factor of roughly 7.1 for the problem at hand, using N_MC = 10000, n_train = 150 and f_HF/LF = 8. In the case of n_train = 50, the speed-up factor was roughly 7.7. The deployed LF model, which was created by changing the degree of the polynomial ansatz functions, is only to be understood as a proof of concept. Much higher speed-ups are possible when further strategies of Section 3 are combined or even a simplified physical model is applied.

In the second numerical example, we are interested in uncertainty propagation for a 3D fluid-structure interaction (FSI) problem of a bending wall in a channel flow. The benchmark is motivated by work on monolithic FSI solvers and is depicted in Figure 11.
The fluid domain Ω is given by a flow channel with a rectangular cross-section of width b = 1., height h = 0. and length l = 3., while the structure domain Ω is represented by an elastic wall of thickness t = 0., width b = 0. and height h = 0.. We assume the flexible wall to be clamped to the channel floor at y = −h. The distance between the wall's centerline and the left boundary of the fluid domain is l_in = 0.. The problem is modeled using a hyper-elastic neo-Hookean constitutive law for the structural domain and an incompressible Newtonian fluid in the fluid domain. The fluid-structure interaction problem is efficiently solved with a monolithic coupling scheme implemented in our in-house multi-physics finite element code BACI. The interested reader is referred to the literature for further details on n-field monolithic solvers. In our configuration, the fluid field
Fluid-structure interaction problem of an elastic wall in a channel flow subject to a random inflow boundary condition and uncertain wall elasticity. Random fields and random variables are shown in boxes and carry an additional tilde. A no-slip boundary condition is present at Γ_{D,0}. The quantity of interest is the wall deflection in the x-direction at point Q (shown in red). Dirichlet boundaries for the fluid and the structure domain are shown in blue and green, respectively.

was chosen as the master field in the dual mortar formulation for the interface coupling. The Dirichlet boundary conditions are formulated on Γ_D = Γ_{D,u} ∪ Γ_{D,0} as follows:

\[
\Gamma_{D,u}: \quad \mathbf{u} = \begin{bmatrix} u_x(y,z) \\ 0 \\ 0 \end{bmatrix} \ \text{(inflow B.C.)}, \qquad
\Gamma_{D,0}: \quad \mathbf{u} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \ \text{(no-slip B.C.)} \tag{34}
\]

An overview of detailed material and fluid properties is provided in Table 3. The uncertainty propagation problem is then stated as follows: the model is subject to two sources of input uncertainty, a random inflow boundary condition ũ_x(y,z) (random field with high stochastic dimension) and an uncertain Young's modulus Ẽ (random variable) for the elastic wall. We are interested in the distribution of the x-displacement of point Q on top of the elastic wall. To be compliant with our notation in previous sections, we summarize the distribution of the inputs by p(𝒙), and the response distribution for the QoI of the high-fidelity computer model is denoted by p(y_HF). Before we investigate the Bayesian multi-fidelity framework for the problem at hand, we discuss the probabilistic models for p(𝒙) in more detail. The Young's modulus E of the elastic wall is modeled as a random variable with a log-normal distribution p(E) = ℒ𝒩(μ_E, σ_E) to constrain realizations to ℝ⁺. The distribution parameters μ_E and σ_E are chosen so that the Young's modulus has a mean value of 600 with a standard deviation of 7% of the mean.
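The underlying normal parameters μ_E and σ_E for the stated mean of 600 and 7% relative standard deviation follow from the standard log-normal moment relations; a short sketch:

```python
import math
import numpy as np

# Log-normal with mean 600 and standard deviation 7% of the mean (values from the text):
# sigma^2 = ln(1 + cv^2), mu = ln(mean) - sigma^2 / 2 for the underlying normal.
mean_E, cv = 600.0, 0.07
sigma2 = math.log(1.0 + cv**2)
mu = math.log(mean_E) - 0.5 * sigma2

rng = np.random.default_rng(3)
E = rng.lognormal(mu, math.sqrt(sigma2), 100_000)
print(E.mean(), E.std(), E.min())   # sample mean near 600, std near 42, all positive
```

The positivity constraint on ℝ⁺ is automatic, since every log-normal sample is the exponential of a normal draw.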
The random inflow ũ_x(y,z) is realized as a non-stationary Gaussian random field with a parabolic mean function on Γ_{D,u}, analogously to the previous numerical example. Again, we can factorize the process to:

\[
\tilde{u}_x(y,z) \sim \mathcal{N}\big(\mu_u(y,z), \tilde{k}(y,y')\big) = \mu_u(y,z) + \mathcal{N}\big(0, \tilde{k}(y,y')\big) \tag{35}
\]

The parabolic mean function was taken from the deterministic problem in the literature and is given in Table 3 along with further properties of the stochastic problem. Densities of the Young's modulus and the resulting random field for the uncertain inflow are visualized
TABLE 3
Material and fluid properties used in the simulations (uncertain properties are printed in bold face)

Property                          Variable           Value
Poisson ratio                     ν
Density                           ρ
Young's modulus                   Ẽ                  ℒ𝒩(μ_E, σ_E), μ_E = . , σ_E = .
Dynamic viscosity                 μ
Density (fluid)                   ρ
Inflow field                      ũ_x(y, z)          μ_u(𝒙) + [ . ⋅ 𝒩(0, k_{u_x}(y, y′)) ]
Mean function                     μ_u(y, z)          0. ⋅ ((h) y) ⋅ ((b) z)
Stationary covariance function    k_{u_x}(y, y′)     exp[ −(y − y′)² / 𝓁² ]
Correlation length scale          𝓁                  . ⋅ h

in Figure 12. In the numerical implementation, the realizations of the random inflow were discretized at 200 points, so that the stochastic dimension of the problem was dim(𝒙) = 201. For the stochastic FSI problem, we investigated three different low-fidelity model versions besides the HF model, regarding their impact on the overall prediction quality.

FIGURE 12
Log-normal density function of the Young's modulus (left) and visualization of a 2D cross-section at z = 0 of the 3D random inflow field (right).

The problem was solved with the continuous finite element method: the high-fidelity model used, for the fluid domain Ω, finite elements with residual-based stabilization. The structure domain Ω of the HF model used 1536 HEX-8 F-Bar finite elements. The three low-fidelity model variants were constructed using pure numerical relaxation as described in Section 3:

LF 1
The first LF model (LF 1) was designed with 100-times looser solver tolerances for the fluid-structure coupling and a two-times larger time step size, resulting in an overall speed-up factor of four.
LF 2
The second LF model (LF 2) was constructed by spatial coarsening to 2838 fluid elements and 192 structural elements,while leaving other numerical settings untouched, leading to a speed-up factor of ten.
LF 3
Finally, the third LF model (LF 3) combined the relaxations of LF 1 and LF 2 and led to a speed-up factor of roughly 28.

The procedure for the Bayesian multi-fidelity uncertainty propagation scheme was then conducted as in the previous example. We first created an input sample matrix 𝑿* ∼ p(𝒙) with N_sample = 7000 samples, using random number generators for the log-normal distribution and a Cholesky decomposition as in equation (31) to generate sample functions for the random inflow boundary condition. Compared to the previous demonstration, a smaller sample size was chosen, as the smaller variance of the response distribution led to faster convergence. Afterwards, we ran the realizations 𝒙 on all LF models to get the three response vectors Y*_LF. Features and inputs 𝑿 were chosen by the same procedure as in the previous numerical demonstration, with the difference that the number of training points was further reduced to n_train = 50 HF simulations to demonstrate the capabilities of the multi-fidelity approach even for a very small number of high-fidelity simulations. In case the predictive statistics for the HF output density show too high a variance according to equation (15), more training points can be calculated. From a practical perspective, we suggest constructing subsets of space-filling training designs 𝑿_1 ⊂ 𝑿_2 ⊂ … ⊂ 𝑿_m following the procedure in Section 2.4. Afterwards, we can start with the smallest subset (𝑿_1) and choose larger sets, reusing the previous simulations, in case the predictive variance of the HF density is still too high.

Figure 13a) shows the stochastic dependency between the HF model and LF 1, created by relaxation of the time discretization and coupling settings. The dependency has a non-Gaussian noise structure and strong nonlinearities. The pure spatial relaxation in LF 2 is displayed in Figure 13b).
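As an aside on the training design mentioned above: one simple way to realize nested space-filling subsets over [y_LF, 𝜸] is a greedy maximin (farthest-point) selection. The following sketch is an illustrative heuristic under that assumption, not necessarily the exact algorithm of Section 2.4:

```python
import numpy as np

def diverse_subset(z, n_train, seed=0):
    """Greedy farthest-point (maximin) selection of a space-filling subset.

    Each step adds the candidate with the largest distance to the points
    already chosen, so prefixes of the returned index list form nested designs.
    """
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    chosen = [int(rng.integers(n))]
    d = np.linalg.norm(z - z[chosen[0]], axis=1)
    for _ in range(n_train - 1):
        nxt = int(np.argmax(d))                       # farthest from current subset
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(z - z[nxt], axis=1))
    return np.array(chosen)

# Example: 7000 LF outcomes with two informative features (synthetic data).
rng = np.random.default_rng(4)
z_lf = rng.standard_normal((7000, 3))                 # columns: y_LF, gamma_1, gamma_2
idx = diverse_subset(z_lf, n_train=50)
print(idx.shape)
```

Because each prefix of the selection is itself space-filling, enlarging the training set from 𝑿_1 to 𝑿_2 reuses all previously computed HF simulations, as suggested in the text.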
The data points lie very close to the identity y_HF = y_LF (shown in green), along with some scattered points for which the LF simulation deviated from the HF simulation. The combination of the relaxations from LF 1 in Figure 13a) and LF 2 in Figure 13b) is shown as LF 3 in Figure 13c). Despite having the highest numerical relaxation and therefore the highest computational speed-up, the LF 3 model results in a less noisy model dependency structure in Ω_{y_HF×y_LF} when compared to the LF 1 model. A possible explanation for this effect could be the smoothing property of the coarser mesh, which might damp the model discrepancy over the input space. A detailed investigation of the effect is, however, outside the scope of this paper. Figure 14 presents the resulting density predictions for the HF output along with the MC reference p(y_HF)_MC and the output densities for LF 1 in Figure 14a) and LF 3 in Figure 14b), respectively. Even though the LF 1 model led to a very noisy and non-Gaussian model dependency with the HF model, as shown in Figure 13a), the multi-fidelity approach BMFMC was able to predict the HF output density nearly perfectly with as little as 50 HF simulation runs. Additionally, we present the prediction quality of BMFMC without the use of additional informative features (green dashed line), which results in considerably worse predictions due to a high modeling error when assuming a Gaussian noise structure between HF and LF 1 in Ω_{y_LF×y_HF}. In fact, without the use of 𝜸 the modeling error of the GP is so high that the HF reference density did not lie within the predicted credible intervals, even for an increase in training data, as demonstrated in Figure 14c). The introduction of only one additional feature γ_1 removed this problem, and the reference solution then always lies within the credible intervals. Figure 14 shows the superior result for two additional features γ_1 and γ_2, which also gave tighter credible intervals than the prediction with only γ_1.
In the case of the LF 3 model, which combined spatial and temporal relaxation, BMFMC predicted the reference density nearly perfectly with or without additional features.

Finally, we investigate the computational speed-ups reached with BMFMC in comparison to a standard Monte-Carlo strategy. We use the speed-up definitions in equations (26) and (27) to calculate the speed-up factor. In our simulations we set N_HF = N_LF, and the resulting speed-up factors are summarized in Table 4. Based on pure numerical relaxation alone, BMFMC performed roughly 23 times faster than the Monte-Carlo benchmark while reaching the same accuracy. We want to emphasize that we did not even fully exhaust the potential of the numerical relaxation, as, e.g., the floating-point precision of the simulation was kept untouched. Furthermore, a huge speed-up potential is still available in terms of simplified physics for the LF model, so that the discussed problems do not represent the full potential of
BMFMC.
a) LF 1: temporal and coupling relaxation; b) LF 2: spatial and coupling relaxation; c) LF 3: temporal, spatial and coupling relaxation
FIGURE 13
Stochastic fluid-structure interaction problem: HF and LF model outputs for the wall deflection, demonstrated for the three low-fidelity models LF 1, LF 2 and LF 3. The data of the Monte-Carlo reference (usually not available) is shown as grey dots. Training points (n_train = 50) for the Gaussian Process are shown by red crosses. The posterior mean function m_f(y_LF) and the ±1σ standard deviation of the trained Gaussian Process are shown in blue.

TABLE 4
Comparison of computational speed-ups for the generalized BMFMC approach with a standard Monte-Carlo approach for uncertainty quantification

LF model    f_HF/LF    N_MC     n_train    speed-up_MF
LF 1        4.5        7000     50         4.4
LF 2        10         7000     50         9.3
LF 3        28         7000     50         23.3

a) LF 1: temporal and coupling relaxation, n_train = 50; b) LF 3: temporal, spatial and coupling relaxation, n_train = 50; c) LF 1: temporal and coupling relaxation, n_train = 150; d) LF 3: temporal, spatial and coupling relaxation, n_train = 150

FIGURE 14
Comparison of the output densities p(y) for the bending wall in a channel flow problem. The quantity of interest is the x-deflection of point Q. The high-fidelity Monte-Carlo reference density p(y_HF) is shown as a black dashed line, the low-fidelity density p(y_LF) is given in red, and the Bayesian predictions E_{f*}[p(y*_HF | f*, f)] are shown in green, along with their credible intervals, shown in grey. Figure a) shows the output densities when LF 1 was deployed in the multi-fidelity approach and Figure b) when LF 3 was deployed, each for n_train = 50. Figures c) and d) show the predictions for n_train = 150.
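The speed-up factors of Table 4 are consistent with reading equations (26)/(27) as the cost of N_MC HF runs divided by the cost of N_MC LF runs plus n_train HF runs. This reading is an assumption (the equations are not reproduced here), but it matches the tabulated values:

```python
def speedup(f_hf_lf: float, n_mc: int, n_train: int) -> float:
    # Assumed form: N_MC HF-run cost over (N_MC LF runs + n_train HF runs),
    # with the cost of one HF run normalized to 1.
    return n_mc / (n_mc / f_hf_lf + n_train)

for name, f in [("LF 1", 4.5), ("LF 2", 10.0), ("LF 3", 28.0)]:
    print(name, round(speedup(f, 7000, 50), 1))
```

The same expression also reproduces the speed-ups of roughly 7.1 and 7.7 reported for the cylinder flow example (f_HF/LF = 8, N_MC = 10000, n_train = 150 or 50).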
In this contribution, we introduced a generalized formulation for Bayesian multi-fidelity forward and backward uncertainty propagation. In the case of uncertainty quantification, we call the approach Bayesian multi-fidelity Monte-Carlo, or short
BMFMC. We presented a formulation for the generalized approach that can be used with any probabilistic machine learning model and gave the specific numerical implementation under the use of Gaussian Processes. We demonstrated that the generalized
BMFMC formulation contains other state-of-the-art methods for uncertainty quantification as special cases, but shows superior performance, especially for computationally expensive UQ problems with high stochastic dimension. Given a very limited amount of HF training data f and a potentially very high stochastic input dimension of 𝒙, both classical and multi-fidelity surrogate-based approaches for UQ that consider 𝒙 explicitly show serious accuracy problems. The reason is an insufficient coverage of Ω_{𝒛_LF×y_HF} by f, with 𝒛_LF = [y_LF, 𝒙]^T, in this scenario. At the other extreme, if we only consider 𝒛_LF = y_LF, a sufficient space coverage of Ω_{y_LF} by the training data f is possible, but the dependency between the HF and the LF model might still be too complex in Ω_{y_LF×y_HF}. An approximation of the density p(y_HF | y_LF) for the small data case is hence prone to modeling errors. Instead, we showed that additional features 𝜸 of the input can be learned at no extra cost, so that 𝒛_LF = [y_LF, 𝜸]^T defines a space Ω_{𝒛_LF×y_HF} in which the statistical dependence with the HF model response can be learned in a small data regime and with a low approximation error. We furthermore demonstrated that our Bayesian approach provides credible intervals for the density estimate itself, so that it is possible to assess the quality of the predictions. We aimed at applicability of BMFMC to complex physical models with actual engineering relevance and demonstrated the capabilities and generality of the method on two challenging stochastic fluid and fluid-structure interaction problems with high stochastic input dimensions. We compared the performance of
BMFMC with a Monte-Carlo sampling strategy, which is currently the only reliable alternative for high-dimensional stochastic problems, and demonstrated significant speed-up factors of over 23 when
BMFMC was used. The speed-ups were achieved by using only simple numerical relaxation of the original problem. We want to emphasize that higher performance gains are possible by, e.g., using simplified physics in the LF model, a simpler geometrical model, or combinations of the numerical relaxations discussed in this paper. In future work, the presented method will be extended to the full solution field of an HF model, using more flexible probabilistic modeling approaches. Furthermore, an in-depth investigation of the Bayesian multi-fidelity approach for backward uncertainty propagation should yield significant computational speed-ups. Another interesting application of the method is to multi-scale problems, where the different physical scales can be interpreted as fidelity levels, keeping the central idea of learning p(y_HF | 𝒛_LF) identical to the discussed UQ problems.

ACKNOWLEDGMENTS
The research presented in this paper was partly funded by the German Research Foundation (DFG) under the priority program SPP 1886 "Polymorphic uncertainty modeling for the numerical design of structures", with additional contributions funded by the DFG priority program "Software for Exascale Computing" (SPPEXA). The software QUEENS was provided by courtesy of AdCo Engineering GW GmbH, which is gratefully acknowledged. We finally want to thank Peter Münch for his support on the side of the DG Navier-Stokes solver, Sebastian Brandstäter for his helpful contributions to the software framework
QUEENS and Gil Robalo Rei for his support in model preparation.
References
1. Biehler J, Mäck M, Nitzler J, Hanss M, Koutsourelakis PS, Wall WA. Multi-Fidelity Approaches for Uncertainty Quantification.
Surveys for Applied Mathematics and Mechanics (GAMM Mitteilungen)
SIAM Review
4. Giles MB, Nagapetyan T, Ritter K. Adaptive Multilevel Monte Carlo Approximation of Distribution Functions. arXiv preprint arXiv:1706.06869
Operations Research
Acta Numerica
Journal of Computational Physics
Computer Methods in Applied Mechanics and Engineering
International Journal for Numerical Methods in Engineering
SIAM Journal on Scientific Computing
Journal of Computational and Graphical Statistics
Statistics in Medicine
The Art and Science of Bayesian Image Analysis
SIAM Journal on Scientific Computing
Inverse Problems
International Journal for Uncertainty Quantification
SIAM Journal on Uncertainty Quantification
International Journal for Numerical Methods in Biomedical Engineering
Journal of Computational Physics. arXiv preprint arXiv:1809.10784
Journal of The Royal Society Interface
22. Biehler J, Gee MW, Wall WA. Towards efficient uncertainty quantification in complex and large-scale biomechanical problems based on a Bayesian multi-fidelity scheme.
Biomechanics and Modeling in Mechanobiology
Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
SIAM/ASA Journal on Uncertainty Quantification
Biometrika
Advances in Neural Information Processing Systems
Machine Learning: A Probabilistic Perspective. The MIT Press. 2012. 29. Williams CK, Rasmussen CE.
Gaussian processes for machine learning. 2. MIT Press, Cambridge, MA. 2006. 30. GPy. GPy: A Gaussian process framework in Python. http://github.com/SheffieldML/GPy; since 2012. 31. Bertsekas DP, Tsitsiklis JN.
Introduction to probability. 1. Athena Scientific, Belmont, MA. 2002. 32. Lee S, Dietrich F, Karniadakis GE, Kevrekidis IG. Linking Gaussian Process regression with data-driven manifold embeddings for nonlinear data fusion. arXiv preprint arXiv:1812.06467
Science
Journal of Machine Learning Research
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Chemometrics and Intelligent Laboratory Systems
Journal of Chemometrics
Computers and Geotechnics
Quality and Reliability Engineering International
Technometrics
Two-stage methods for multimodal optimization. PhD thesis. Universitätsbibliothek Dortmund, 2015.
43. Salomon S, Avigad G, Goldvard A, Schütze O. PSA: a new scalable space partition based selection algorithm for MOEAs. In: EVOLVE: A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation II. Springer. 2013 (pp. 137–151). 44. Fehn N, Wall WA, Kronbichler M. Efficiency of high-performance discontinuous Galerkin spectral element methods for under-resolved turbulent incompressible flows.
International Journal for Numerical Methods in Fluids
Nodal discontinuous Galerkin methods: algorithms, analysis, and applications. Springer. 2007. 46. Kronbichler M, Kormann K. Fast matrix-free evaluation of discontinuous Galerkin finite element operators.
ACM Transactions on Mathematical Software
AdCo Engineering GW. Journal of Computational Physics
Journal of Computational Physics
EECS Department, The University of Michigan
International Journal for Numerical Methods in Engineering
Munich, Germany: Institute for Computational Mechanics, Technical University of Munich
Computer Methods in Applied Mechanics and Engineering
Computer Methods in Applied Mechanics and Engineering
International Journal of Solids and Structures