Bayesian Surrogate Analysis and Uncertainty Propagation with Explicit Surrogate Uncertainties and Implicit Spatio-temporal Correlations
Sascha Ranftl∗ and Wolfgang von der Linden

January 12, 2021
Abstract
We introduce Bayesian Probability Theory to investigate uncertainty propagation based on meta-models. We approach the problem from the perspective of data analysis, with a given (however almost-arbitrary) input probability distribution and a given "training" set of computer simulations. While proven mathematically to be the unique consistent probability calculus, the subject of this paper is not to demonstrate beauty but usefulness. We explicitly list all propositions and lay open the general structure of any uncertainty propagation based on meta-models. The former allows rigorous treatment at any stage, while the latter allows us to quantify the interaction of the surrogate uncertainties with the usual parameter uncertainties. Additionally, we show a simple way to implicitly include spatio-temporal correlations. We then apply the framework to a family of generalized linear meta-models that includes Polynomial Chaos Expansions as a special case. While we assume a Gaussian surrogate uncertainty, we do not assume its scale to be known, which leads to a Student-t distribution. We end up with semi-analytic formulas for surrogate uncertainties and uncertainty propagation.
A comprehensive collection of reviews on UQ, from the point of view of computational engineering and applied mathematics, can be found in [1]. In [2, 3], a statistician's perspective is discussed. Here, we will use Bayesian Probability Theory [4]. In this contribution, we discuss a Bayesian perspective on uncertainty quantification as well as propagation. In contrast to previous work, we derive a coherent Bayesian theory and do not need ad-hoc assumptions on parameter statistics. The uncertainty propagation demands a large number of simulations, i.e. a prohibitively large computational effort. A common work-around is to run the simulation for a few selected parameter values and use that data to "learn" a surrogate model. The simulation uncertainties are then inferred from this surrogate model instead of the original simulation model, at a significantly reduced computational effort. Popular surrogate models are Polynomial Chaos Expansion [5–8] and Gaussian Process Regression [9, 10], the latter of which has recently had its renaissance within the machine learning community. The reduction in computational effort, however, involves a trade-off in accuracy. The obvious weak point of this procedure is the credibility of the surrogate model. Our Bayesian approach, in particular, allows us to also include the uncertainties of this surrogate model itself. Other related work is found in [3, 11–18].
We first briefly introduce the reader to the fundamental rules of probability calculus in Sec. 2.1 and infer the basic structure of uncertainty propagation based on meta-models constructed via expansion in any basis, explicitly taking into account the uncertainties of the surrogate itself. In Sec. 2.2, we analyze the surrogate model and its uncertainties. In Sec. 2.3 we provide a semi-analytical formula for subsequent simulation uncertainties.
∗ Institute of Theoretical and Computational Physics, Graz University of Technology, Petersgasse 16/II, 8010 Graz, Austria. Corresponding author: Sascha Ranftl ([email protected])

It is proven mathematically in [19], and in a more modern presentation in [20], that Bayesian Probability Theory (BPT) is unique and the only consistent calculus for partial truths, or rather uncertainty quantification. Its fundamental rules are the marginalization rule

p(d) = \int p(d | a) \, p(a) \, da ,    (1)

and Bayes' theorem

p(a, d) = p(d | a) \, p(a) = p(a | d) \, p(d) .    (2)

The \nu-th moment of a is defined by

\langle a^\nu \rangle = \int a^\nu \, p(a) \, da ,    (3)

and the standard deviation by

\Delta a = \sqrt{\langle a^2 \rangle - \langle a \rangle^2} .    (4)

The goal in this paper is to quantify the uncertainties of the FEM simulation results for the observable z^{(x)} at different measurement points x in the simulation domain. The latter depends on unknown parameters a, which are generally inferred from the data d_exp obtained in experiments. Based on these data, Bayes' theorem allows us to determine the posterior pdf

p(a | d_exp, I) .    (5)

This object will be assumed to be (almost) arbitrary but given in the following considerations; it is, however, always the result of a particular data analysis of the foregoing experiment. The uncertainty of the model parameters a entails an uncertainty in the simulated observable z^{(x)}, described by

p(z^{(x)} | d_exp, I) = \int p(z^{(x)} | a, \cancel{d_exp}, I) \, p(a | d_exp, I) \, dV_a .    (6)

In the first pdf we have struck out d_exp because knowledge of a suffices to perform the FEM simulation and obtain z^{(x)}. More details follow in the next section. If a consists of only one or two parameters, then the numerical evaluation of the integral over the model parameters a will typically require a few dozens to hundreds of FEM simulations.
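For such low-dimensional a, the integral in eq. (6) can be evaluated directly by Monte Carlo over the posterior. A minimal sketch, where toy_simulator is a cheap stand-in for an FEM run and a Gaussian is a stand-in for p(a | d_exp, I) (both are assumptions for illustration, not from the paper):

```python
# Sketch of eq. (6): push posterior samples of a through the simulator and
# summarize the resulting distribution of the observable z.
import numpy as np

rng = np.random.default_rng(0)

def toy_simulator(a):
    """Stand-in for an expensive FEM run: returns the observable z for parameters a."""
    return np.sin(a[0]) + 0.5 * a[1] ** 2

# Samples from the (given) posterior p(a | d_exp, I); here a toy Gaussian.
a_samples = rng.normal(loc=[0.3, 1.0], scale=[0.1, 0.2], size=(2000, 2))

# Eq. (6) as a sample average: each posterior draw is pushed through the model.
z_samples = np.array([toy_simulator(a) for a in a_samples])
z_mean = z_samples.mean()
z_std = z_samples.std()
```

Each posterior draw costs one full simulator run, which is exactly the expense that motivates the surrogate construction below.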
The uncertainty propagation is then done and no meta-model is needed. However, this is the trivial case, and usually a will consist of many more parameters. Let us assume that a consists of, e.g., four parameters. That would imply the need for performing on the order of at least 10^4 FEM simulations, which is far too CPU-expensive for most real problems. This can be avoided if the FEM simulations are replaced by a surrogate model that approximates the observable z by a suitably parametrized surrogate function z_sur = g(a | c), where c are yet unknown parameters. In an FEM simulation the observable z^{(x)} is available at different sites x in the domain. Clearly, the parameters of the surrogate model will also depend on those positions. So we actually have

z^{(x)}_sur = g(a | c^{(x)}) .    (7)

The unknown parameters will be inferred from suitable training data. To this end, FEM simulations are performed for a finite set of model parameters A_s = \{a_s^{(i)}\}_{i=1}^{N_s} and the corresponding observables Z_s are computed and combined in D_sim = \{A_s, Z_s\}. Details will follow in due time. Now we use the background information \tilde{I}, suggesting that we take the observable z entering the integral in eq. (6) from the surrogate model eq. (7) rather than from the expensive FEM simulation. More precisely, instead of eq. (6) we now have

p(z^{(x)} | d_exp, D_sim, \tilde{I}) = \int dV_a \, p(z^{(x)} | a, D_sim, \cancel{d_exp}, \tilde{I}) \, p(a | \cancel{D_sim}, d_exp, \tilde{I}) .    (8)

As far as the second pdf, for the model parameters, is concerned, we can omit the information on the training set D_sim, as it does not tell us anything about the model parameters. This pdf is actually the same as that in eq. (5), i.e. p(a | d_exp, \tilde{I}) = p(a | d_exp, I), as it makes no difference for the model parameters how we solve the equations underlying the simulation.
In the first pdf, we can omit the information on the experiment d_exp, as we only need the simulation data D_sim to fix the surrogate model, which in turn defines the observable z. The first pdf can be further specified by the marginalization rule upon introducing the surrogate parameters C:

p(z^{(x)} | a, D_sim, \tilde{I}) = \int dV_C \, p(z^{(x)} | C, a, \cancel{D_sim}, \tilde{I}) \, p(C | \cancel{a}, D_sim, \tilde{I}) .

In the first pdf, the observable is fully determined by a and C, hence D_sim is superfluous. Similarly in the second pdf, where C is inferred from the training data, additional model parameters without the corresponding observables' values z are useless. In summary we have

p(z^{(x)} | d_exp, D_sim, \tilde{I}) = \iint dV_a \, dV_C \, p(z^{(x)} | C, a, \tilde{I}) \, p(C | D_sim, \tilde{I}) \, p(a | d_exp, I) .    (9)

The first pdf is rather simple. According to the background information \tilde{I} we will determine the observable via the surrogate model. Since the necessary parameters c^{(x)} \in C and a are part of the conditional complex, the surrogate model allows only one value

z^{(x)}_sur = g(a | c^{(x)})    (10)

for the observable. That means p(z^{(x)} | C, a, \tilde{I}) is equivalent to the probability density for z^{(x)} given z^{(x)} = g(a | c^{(x)}). Hence the pdf is a Dirac delta distribution

p(z^{(x)} | C, a, \tilde{I}) = \delta\big[ z^{(x)} - g(a | c^{(x)}) \big] .    (11)

Finally, we have

p(z^{(x)} | d_exp, D_sim, \tilde{I}) = \iint dV_a \, dV_C \, \delta\big[ z^{(x)} - g(a | c^{(x)}) \big] \, p(C | D_sim, \tilde{I}) \, p(a | d_exp, I) .    (12)

Before we can evaluate these integrals, we first need to determine the two ingredients, which have their own independent significance. The last term is the result of a data analysis of a specific foregoing experiment and will therefore not be treated here. As mentioned earlier, we will suppress the background information in the following, as ambiguities should no longer occur. We recall eq.
(12), which allows us to determine the pdf for the observable based on the pdf for the model parameters and the pdf for the parameters of the surrogate model, p(C | D_sim), which we will determine now. To this end we have to specify the form of the surrogate model. We use the expansion

z_sur = \sum_{\nu=1}^{N_b} c_\nu \, \phi_\nu(a) ,    (13)

in terms of basis functions \phi_\nu(a) and expansion coefficients c_\nu. No further specification is needed at this point. W.l.o.g., we will use multivariate Legendre polynomials for the mock data analysis. This expansion is very similar to the frequently used generalized Polynomial Chaos Expansion [6], where the polynomials \phi_\nu(a) are orthogonal with respect to the L^2 inner product with integration measure p(a). However, here we actually consider a posterior p(a | d_exp) that generally has no standard form, for which no standard orthogonal polynomial basis is known, and for which conditional independence of the model parameters a does not hold. The procedure has been extended to arbitrary probability measures [21] and dependent parameters [22]; in the present context, however, these polynomials are not of primary interest and would only complicate the numerical evaluation. As outlined in section 2.1, N_s FEM simulations are performed for a set of model parameters A_s = \{a_s^{(i)}\}_{i=1}^{N_s} and the corresponding observables are computed. The theory is so far agnostic to the experimental design of these simulations, and it is therefore not of concern here. Now, we want to determine the pdf for the surrogate parameters C, which are combined in a matrix with elements C_{\nu,x}, where \nu enumerates the surrogate basis functions and x enumerates the measurement positions in the domain for which the observables are computed.
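As a concrete, purely illustrative instance of eq. (13), the design matrix (M_s)_{i,\nu} = \phi_\nu(a_s^{(i)}) for a tensor-product Legendre basis can be assembled as follows; the function name and the degree choice are assumptions for this sketch, not prescribed by the paper:

```python
# Build a multivariate Legendre design matrix: one column per multi-index
# (d_1, ..., d_dim), each column the product of 1-D Legendre polynomials.
import numpy as np
from numpy.polynomial import legendre
from itertools import product

def legendre_design_matrix(A, degree):
    """A: (N_s, dim) training parameters scaled to [-1, 1]; returns (N_s, N_b)."""
    N_s, dim = A.shape
    # all multi-indices with each per-dimension degree up to 'degree'
    multi_indices = list(product(range(degree + 1), repeat=dim))
    cols = []
    for m in multi_indices:
        col = np.ones(N_s)
        for k, d in enumerate(m):
            coef = np.zeros(d + 1)
            coef[d] = 1.0                      # selects the degree-d Legendre P_d
            col = col * legendre.legval(A[:, k], coef)
        cols.append(col)
    return np.column_stack(cols)

A_s = np.random.default_rng(1).uniform(-1, 1, size=(50, 2))
M_s = legendre_design_matrix(A_s, degree=3)    # N_b = (3+1)^2 = 16 basis functions
```

The first column is the constant polynomial, so it is identically one; any other sufficiently 'nice' basis could be substituted without changing the structure of the theory.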
We abbreviate the simulation data by the quantity D_sim = \{A_s, Z_s\}, where the matrix Z_s has the elements (Z_s)_{i,x}, which represent the observable z^{(x)} at position x corresponding to the model parameter vector a_s^{(i)}. The sought-for pdf follows from Bayes' theorem

p(C | D_sim) = p(C | Z_s, A_s) \propto p(Z_s | C, A_s) \, p(C | A_s) \propto p(Z_s | C, A_s) .    (14)

The proportionality constant is not required in the ensuing considerations, and we have assumed an ignorant, uniform prior for the coefficients c, i.e. p(C | I) = const. We note that this is also the transformation-invariant Riemann prior (see Appendix B). However, any prior that is conjugate to the likelihood will retain analytical tractability. For the likelihood we need the total misfit, which is given by

\chi^2 = \sum_{i=1}^{N_s} \sum_{x=1}^{N_x} \Big( (Z_s)_{i,x} - \sum_{\nu=1}^{N_b} (M_s)_{i,\nu} (C)_{\nu,x} \Big)^2 = \sum_{i,x} \big( Z_s - M_s C \big)_{i,x}^2 = \mathrm{tr}\big\{ (Z_s - M_s C)^T (Z_s - M_s C) \big\}    (15)

with (M_s)_{i,\nu} = \Phi_\nu(a_s^{(i)}) and N_{sx} = N_s \cdot N_x. We assume a Gaussian type of likelihood, i.e.

p(Z_s | A_s, C, \Delta) = \frac{1}{\Delta^{N_{sx}} \, Z} \exp\Big\{ -\frac{\chi^2}{2 \Delta^2} \Big\} .    (16)

We have mentioned the \Delta-dependence of the normalization explicitly, while the rest of the normalization is irrelevant in the present context. Usually, the misfit entering the likelihood comes from the noise of the data. In the present case, however, there is no noise (merely a tiny numerical error), but the surrogate model is presumably not an exact description of the FEM data, and \Delta covers the corresponding uncertainty. However, the uncertainty level \Delta is not known and has to be marginalized over. Along with the appropriate Jeffreys prior (Appendix B), the integration over \Delta yields

p(C | Z_s, A_s) = \frac{1}{Z'} \big( \chi^2 \big)^{-N_{sx}/2} .
(17)

In Appendix A we derive the mean and covariance

\langle C \rangle_a = H_s^{-1} M_s^T Z_s ,    (18)

\langle \Delta C_{\nu x} \, \Delta C_{\nu' x'} \rangle_a = \frac{\chi_0^2}{(N_s - N_b) N_x - 2} \, \big( H_s^{-1} \big)_{\nu,\nu'} \, \delta_{xx'} ,    (19)

and we argue that the prefactor of H_s^{-1} is the Bayesian estimate for \Delta^2, the variance of the Gaussian in eq. (16). Now that we have determined the ingredients of eq. (12), we can determine the pdf for the observables in the light of the experimental data and the simulation results of the training set. The form in eq. (9) allows an easy evaluation of the mean value

\langle z^{(x)} \rangle = \iint dV_a \, dV_C \, g(a | c^{(x)}) \, p(C | D_sim, \tilde{I}) \, p(a | d_exp, I) = \sum_\nu \int dV_a \, \Phi_\nu(a) \, \langle C_{\nu x} \rangle_a \, p(a | d_exp, I) .    (20)

In Appendix A we show that

\langle C \rangle_a = H_s^{-1} M_s^T Z_s .    (21)

Similarly we obtain

\langle z^{(x)} z^{(x')} \rangle = \iint dV_a \, dV_C \, g(a | c^{(x)}) \, g(a | c^{(x')}) \, p(C | D_sim, \tilde{I}) \, p(a | d_exp, I)
= \sum_{\nu\nu'} \int dV_a \, \Phi_\nu(a) \, \Phi_{\nu'}(a) \, \langle C_{\nu x} C_{\nu' x'} \rangle_a \, p(a | d_exp, I)
= \sum_{\nu\nu'} \int dV_a \, \Phi_\nu(a) \, \Phi_{\nu'}(a) \, \Big( \langle C_{\nu x} \rangle_a \langle C_{\nu' x'} \rangle_a + \langle \Delta C_{\nu x} \Delta C_{\nu' x'} \rangle_a \Big) \, p(a | d_exp, I) .    (22)

The covariance again follows from

\langle \Delta z^{(x)} \Delta z^{(x')} \rangle = \langle z^{(x)} z^{(x')} \rangle - \langle z^{(x)} \rangle \langle z^{(x')} \rangle .    (23)

Note: if we neglected the uncertainty of the surrogate, i.e.

p(C | D_sim) = \delta( C - \hat{C} ) ,    (24)
\hat{C} = \langle C \rangle_a ,    (25)

then we would retain the widely known special case of 'perfect' surrogates,

\langle z^{(x)} z^{(x')} \rangle = \sum_{\nu\nu'} \int dV_a \, \Phi_\nu(a) \, \Phi_{\nu'}(a) \, \langle C_{\nu x} \rangle_a \langle C_{\nu' x'} \rangle_a \, p(a | d_exp, I) .    (26)

Thus, the first part in the integral above is the uncertainty of the observable due to experimental uncertainties given the surrogate model, while the second term adds the uncertainty of the surrogate itself.
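The chain eqs. (18)–(23) can be sketched end-to-end in a few lines: fit the surrogate on a toy training set, build the coefficient covariance of eq. (19), and propagate a toy posterior p(a | d_exp) by Monte Carlo. The training data, the monomial basis phi, and the Gaussian posterior for a are all illustrative assumptions, not from the paper:

```python
# Minimal end-to-end sketch of eqs. (18)-(23) for one scalar parameter and one
# measurement site (N_x = 1).
import numpy as np

rng = np.random.default_rng(3)

def phi(a):
    """Basis vector (1, a, a^2); an illustrative stand-in for Phi_nu(a)."""
    return np.array([1.0, a, a * a])

# 'FEM' training runs: z(a) = sin(3a), deliberately not in the basis span,
# so the residual misfit chi0^2 is nonzero and the surrogate is imperfect.
a_train = np.linspace(-1.0, 1.0, 20)
M_s = np.vstack([phi(a) for a in a_train])        # design matrix, (N_s, N_b)
Z_s = np.sin(3.0 * a_train)[:, None]              # observables, (N_s, N_x=1)

H_s = M_s.T @ M_s
C_hat = np.linalg.solve(H_s, M_s.T @ Z_s)         # posterior mean, eq. (18)
chi0_sq = np.sum((Z_s - M_s @ C_hat) ** 2)        # residual misfit
N_s, N_b, N_x = M_s.shape[0], M_s.shape[1], 1
cov_C = chi0_sq / ((N_s - N_b) * N_x - 2) * np.linalg.inv(H_s)   # eq. (19)

# Propagate a toy posterior p(a | d_exp) by Monte Carlo.
a_post = rng.normal(0.2, 0.1, size=5000)
Phi = np.vstack([phi(a) for a in a_post])
z_mean = (Phi @ C_hat[:, 0]).mean()               # eq. (20)
var_param = (Phi @ C_hat[:, 0]).var()             # first term of eq. (22)
var_surr = np.mean(np.einsum('ni,ij,nj->n', Phi, cov_C, Phi))  # second term
z_var = var_param + var_surr                      # total, via eq. (23)
ratio = var_surr / var_param                      # surrogate-trust diagnostic
```

For training data lying exactly in the basis span, chi0_sq and hence var_surr would vanish; here the sin(3a) residual keeps it finite, so the surrogate's own uncertainty contributes visibly to z_var.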
This suggests a measure for the trustworthiness of the surrogate model:

\frac{ \sum_{\nu\nu'} \int dV_a \, \Phi_\nu(a) \Phi_{\nu'}(a) \, \langle \Delta C_{\nu x} \Delta C_{\nu' x'} \rangle_a \, p(a | d_exp, I) }{ \sum_{\nu\nu'} \int dV_a \, \Phi_\nu(a) \Phi_{\nu'}(a) \, \langle C_{\nu x} \rangle_a \langle C_{\nu' x'} \rangle_a \, p(a | d_exp, I) } < \epsilon .    (27)

If the surrogate uncertainties are smaller than the parameter uncertainties by a few orders of magnitude, then they may be neglected. A sketch example of how to implement this in a computationally efficient way via vectorization in parameter space (for low parameter dimensions) can be found under https://github.com/Sranf/BayesianSurrogate_sketch.git

Conclusion

We show how to quantify and explicitly incorporate surrogate uncertainties in a generic uncertainty propagation problem. The results are valid for sufficiently 'nice' expansion bases and linear expansion coefficients. Together with a Gaussian / Student-t parametrization of the surrogate's likelihood, this yields semi-analytic formulas. Other parametrizations might be useful if further information is available, such as bounds on the observable (Gamma or Beta likelihood). The result immediately suggests a novel diagnostic for the quality of the surrogate, quantifying the ratio of the surrogate uncertainty to the uncertainty one would obtain if the surrogate were perfect. This will be particularly useful if 'convergence', in the sense of finding the coefficients of e.g. a Polynomial Chaos Expansion, is doubtful or not achievable due to a limited computational budget. Typical FEM simulations usually feature measurements at different locations. The spatial correlation of the observables between two such locations was quantified and is implicitly accounted for in the uncertainty propagation. The structure is easily adapted to temporal correlations by a simple re-interpretation of the corresponding superscript, and generalized to spatio-temporal correlations by introducing a compound index, effectively re-ordering spatial and temporal indices into a single sequence.
What remains is applying the theory to a simulation data set where this is actually the case.
Acknowledgement
This work was funded by Graz University of Technology (TUG) through the LEAD Project "Mechanics, Modeling, and Simulation of Aortic Dissection" (biomechaorta.tugraz.at) and supported by GCCE: Graz Center of Computational Engineering.
Conflict of Interest
The authors declare no conflict of interest.
Data Accessibility
All information is contained in the manuscript. Code sketches are available here: https://github.com/Sranf/BayesianSurrogate_sketch.git
References

[1] Roger G. Ghanem, Houman Owhadi, and David Higdon. Handbook of Uncertainty Quantification. Springer, 2017. doi: 10.1007/978-3-319-12385-1.
[2] Anthony O'Hagan. Bayesian analysis of computer code outputs: A tutorial. Reliability Engineering and System Safety, 91(10-11):1290–1300, 2006. doi: 10.1016/j.ress.2005.11.025.
[3] Anthony O'Hagan, Marc C. Kennedy, and Jeremy E. Oakley. Uncertainty analysis and other inference tools for complex computer codes. Bayesian Statistics 6, pages 503–524, 1999.
[4] Wolfgang von der Linden, Volker Dose, and Udo von Toussaint. Bayesian Probability Theory: Applications in the Physical Sciences. Cambridge University Press, 2014. ISBN 978-1-107-03590-4.
[5] Norbert Wiener. The homogeneous chaos. American Journal of Mathematics, 60(4):897–936, 1938.
[6] Dongbin Xiu and George Em Karniadakis. The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput., 27(3):1118–1139, 2005.
[7] Anthony O'Hagan. Polynomial chaos: a tutorial and critique from a statistician's perspective. Available: http://tonyohagan.co.uk/academic/pdf/Polynomial-chaos.pdf, accessed 25.06.2019, 2013.
[8] Thierry Crestaux, Olivier P. Le Maître, and Jean-Marc Martinez. Polynomial chaos expansion for sensitivity analysis. Reliability Engineering and System Safety, 94(7):1161–1172, 2009. doi: 10.1016/j.ress.2008.10.008.
[9] Anthony O'Hagan. Curve fitting and optimal design for prediction. Journal of the Royal Statistical Society, Series B (Methodological), 40(1):1–42, 1978. doi: 10.2307/2984861.
[10] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
[11] Ihab Sraj, Olivier P. Le Maître, Omar M. Knio, and Ibrahim Hoteit. Coordinate transformation and Polynomial Chaos for the Bayesian inference of a Gaussian process with parametrized prior covariance function. Computer Methods in Applied Mechanics and Engineering, 298:205–228, 2016. doi: 10.1016/j.cma.2015.10.002.
[12] Marc C. Kennedy and Anthony O'Hagan. Predicting the output from a complex computer code when fast approximations are available. Biometrika, 87(1):1–13, 2000. doi: 10.1093/biomet/87.1.1.
[13] M. Arnst, Roger G. Ghanem, and Christian Soize. Identification of Bayesian posteriors for coefficients of chaos expansions. Journal of Computational Physics, 229(9):3134–3154, 2010. doi: 10.1016/j.jcp.2009.12.033.
[14] Reza Madankan, Puneet Singla, Tarunraj Singh, and Peter D. Scott. Polynomial-chaos-based Bayesian approach for state and parameter estimations. Journal of Guidance, Control, and Dynamics, 36(4):1058–1074, 2013. doi: 10.2514/1.58377.
[15] Georgios Karagiannis and Guang Lin. Selection of polynomial chaos bases via Bayesian model uncertainty methods with applications to sparse approximation of PDEs with stochastic inputs. Journal of Computational Physics, 259:114–134, 2014. doi: 10.1016/j.jcp.2013.11.016.
[16] Fei Lu, Matthias Morzfeld, Xuemin Tu, and Alexandre J. Chorin. Limitations of polynomial chaos expansions in the Bayesian solution of inverse problems. Journal of Computational Physics, 282:138–147, 2015. doi: 10.1016/j.jcp.2014.11.010.
[17] Matthias Hwai Yong Tan. Sequential Bayesian polynomial chaos model selection for estimation of sensitivity indices. SIAM/ASA Journal on Uncertainty Quantification, 3:146–168, 2015. doi: 10.1137/130931175.
[18] R. Schöbi, Bruno Sudret, and Stefano Marelli. Rare event estimation using Polynomial-Chaos Kriging. ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, 3(2):D4016002, 2016. doi: 10.1061/AJRUA6.0000870.
[19] R. T. Cox. Probability, frequency and reasonable expectation. American Journal of Physics, 14(1), 1946. doi: 10.1119/1.1990764.
[20] Devinderjit Sivia and John Skilling. Data Analysis: A Bayesian Tutorial. OUP Oxford, 2006.
[21] S. Oladyshkin and W. Nowak. Data-driven uncertainty quantification using the arbitrary polynomial chaos expansion. Reliability Engineering and System Safety, 106:179–190, 2012. doi: 10.1016/j.ress.2012.05.002.
[22] John D. Jakeman, Fabian Franzelin, Akil Narayan, Michael Eldred, and Dirk Pflüger. Polynomial chaos expansions for dependent random variables. Computer Methods in Applied Mechanics and Engineering, 351:643–666, 2019. doi: 10.1016/j.cma.2019.03.049.

A Mathematical proofs
Here we want to determine the norm, mean, and covariance of the marginalized Gaussian (Student-t distribution) in eq. (17), which is

p(C | Z_s, A_s) = \frac{1}{Z'} \big( \chi^2 \big)^{-N_{sx}/2} ,    (28)
\chi^2 = \mathrm{tr}\big\{ (Z_s - M_s C)^T (Z_s - M_s C) \big\} .    (29)

First we bring the misfit into a form that elucidates the C-dependence:

\chi^2 = \chi_0^2 + \mathrm{tr}\big\{ (C - \hat{C})^T H_s (C - \hat{C}) \big\} ,    (30a)
H_s = M_s^T M_s ,    (30b)
\hat{C} = H_s^{-1} M_s^T Z_s ,    (30c)
\chi_0^2 = \mathrm{tr}\big\{ Z_s^T \big( 1 - M_s H_s^{-1} M_s^T \big) Z_s \big\} .    (30d)

Now, the first moment is easily obtained. Along with the variable transformation under the integral

C \to \hat{C} + X    (31)

we obtain

\langle C \rangle = \frac{1}{Z} \int dV_C \, C \, \Big( \mathrm{tr}\big\{ (C - \hat{C})^T H (C - \hat{C}) \big\} + \chi_0^2 \Big)^{-N_{sx}/2} = \hat{C} + \underbrace{ \int dV_X \, X \, \Big( \mathrm{tr}\{ X^T H X \} + \chi_0^2 \Big)^{-N_{sx}/2} }_{=0} = \hat{C} ,    (32)

where the remaining integral vanishes because its integrand is odd in X. Next, we transform the expression for the normalization based on eq. (31):

Z_{N_{sx}} = \int dV_X \, \Big( \mathrm{tr}\{ X^T H X \} + \chi_0^2 \Big)^{-N_{sx}/2} = \int dV_X \, \Big( \sum_{\tilde{\nu} \tilde{\nu}', \tilde{x} \tilde{x}'} \tilde{H}_{(\tilde{\nu} \tilde{x}),(\tilde{\nu}' \tilde{x}')} X_{\tilde{\nu} \tilde{x}} X_{\tilde{\nu}' \tilde{x}'} + \chi_0^2 \Big)^{-N_{sx}/2} ,    (33)

where we have introduced the augmented array

\tilde{H}_{(\tilde{\nu} \tilde{x}),(\tilde{\nu}' \tilde{x}')} := ( H_s )_{\tilde{\nu} \tilde{\nu}'} \, \delta_{\tilde{x} \tilde{x}'} ,    (34)

which will become handy in the computation of the covariance. Now we combine the double indices (\nu, x) into a single index l, which turns the matrix X into a vector x of dimension N_{bx} = N_b \cdot N_x and the array \tilde{H} into an N_{bx} \times N_{bx} matrix H. In this representation we have

Z_{N_{sx}} = \int dV_x \, \big( x^T H x + \chi_0^2 \big)^{-N_{sx}/2} \;\overset{x \to H^{-1/2} y}{=}\; |H|^{-1/2} \int dV_y \, \big( y^T y + \chi_0^2 \big)^{-N_{sx}/2} .    (35)

Next we introduce hyper-spherical coordinates, which leads to

Z_{N_{sx}} = \Omega_{N_{bx}} \, |H|^{-1/2} \int_0^\infty d\rho \, \rho^{N_{bx}-1} \big( \rho^2 + \chi_0^2 \big)^{-N_{sx}/2} ,

where \Omega_{N_{bx}} is the solid angle in N_{bx} dimensions.
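The completing-the-square step of eq. (30) can be verified numerically: for arbitrary matrices, eq. (29) must equal \chi_0^2 plus the quadratic form in (C − \hat{C}). A self-contained check with random test matrices (the dimensions are arbitrary):

```python
# Numerical sanity check of the decomposition in eq. (30):
# chi^2 = chi0^2 + tr{(C - C_hat)^T H_s (C - C_hat)} for any Z_s, M_s, C.
import numpy as np

rng = np.random.default_rng(4)
N_s, N_b, N_x = 12, 4, 3
Z_s = rng.normal(size=(N_s, N_x))
M_s = rng.normal(size=(N_s, N_b))
C = rng.normal(size=(N_b, N_x))

H_s = M_s.T @ M_s                               # eq. (30b)
C_hat = np.linalg.solve(H_s, M_s.T @ Z_s)       # eq. (30c)
chi_sq = np.trace((Z_s - M_s @ C).T @ (Z_s - M_s @ C))                              # eq. (29)
chi0_sq = np.trace(Z_s.T @ (np.eye(N_s) - M_s @ np.linalg.inv(H_s) @ M_s.T) @ Z_s)  # eq. (30d)
lhs_rhs_gap = chi_sq - (chi0_sq + np.trace((C - C_hat).T @ H_s @ (C - C_hat)))
```

The gap is zero up to floating-point round-off, confirming that the cross terms cancel exactly at C = \hat{C}.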
Finally, based on the substitution \rho = t \cdot \sqrt{\chi_0^2}, we obtain

Z_{N_{sx}} = \Omega_{N_{bx}} \, |H|^{-1/2} \, \big( \chi_0^2 \big)^{-\frac{N_{sx} - N_{bx}}{2}} \, \frac{ \Gamma\big( \frac{N_{bx}}{2} \big) \, \Gamma\big( \frac{N_{sx} - N_{bx}}{2} \big) }{ 2 \, \Gamma\big( \frac{N_{sx}}{2} \big) } .

This result is valid only for N_{sx} > N_{bx}, which is fulfilled in the present application. For future use we rewrite this as

Z_{N_{sx}} = Z_{(N_{sx}-2)} \cdot \big( \chi_0^2 \big)^{-1} \cdot \frac{ N_{sx} - N_{bx} - 2 }{ N_{sx} - 2 } .    (36)

Finally, we calculate the covariance, based also on the compound index l = (\nu, x) and the variable transformation in eq. (31):

\langle \Delta C_l \, \Delta C_{l'} \rangle = \frac{1}{Z_{N_{sx}}} \int dV_x \, x_l x_{l'} \, \big( x^T H x + \chi_0^2 \big)^{-N_{sx}/2}
= -\frac{2}{N_{sx}-2} \cdot \frac{1}{Z_{N_{sx}}} \cdot \frac{\partial}{\partial H_{l,l'}} \int dV_x \, \big( x^T H x + \chi_0^2 \big)^{-(N_{sx}-2)/2}
= -\frac{2}{N_{sx}-2} \cdot \frac{ \chi_0^2 \, (N_{sx}-2) }{ N_{sx} - N_{bx} - 2 } \cdot \frac{\partial}{\partial H_{ll'}} \ln Z_{(N_{sx}-2)}
= -\frac{ 2 \chi_0^2 }{ N_{sx} - N_{bx} - 2 } \cdot \frac{\partial}{\partial H_{ll'}} \ln |H|^{-1/2}
= \frac{ \chi_0^2 }{ N_{sx} - N_{bx} - 2 } \, \big( H^{-1} \big)_{ll'} .    (37)

In the last step we have used that H is a symmetric matrix. This is a very reasonable result, because if the variance \Delta^2 of the Gaussian in eq. (16) were known, then the covariance would be \Delta^2 H^{-1}. Consequently, the prefactor represents the Bayesian estimate for the variance \Delta^2 based on the data. Now we go back to the original meaning of the compound index and obtain

\big( H^{-1} \big)_{ll'} \to \big( \tilde{H}^{-1} \big)_{(\nu,x),(\nu',x')} = \delta_{xx'} \, ( H_s^{-1} )_{\nu\nu'} ,    (38)

which is easily proven by

\big( H^{-1} \cdot H \big)_{ll'} \to \big( \tilde{H}^{-1} \cdot \tilde{H} \big)_{(\nu,x),(\nu',x')} = \sum_{\nu'',x''} ( H_s^{-1} )_{\nu\nu''} \, \delta_{xx''} \, ( H_s )_{\nu''\nu'} \, \delta_{x''x'} = \delta_{\nu\nu'} \, \delta_{xx'} .    (39)

The final result is therefore

\langle \Delta C_{\nu,x} \, \Delta C_{\nu',x'} \rangle = \frac{ \chi_0^2 }{ (N_s - N_b) N_x - 2 } \, \big( H_s^{-1} \big)_{\nu,\nu'} \, \delta_{xx'} .    (40)

B The transformation invariant prior