Boosting MonteCarlo sampling with a non-Gaussian fit
Luca Amendola⋆ and Adrià Gómez-Valent†
Institut für Theoretische Physik, Ruprecht-Karls-Universität Heidelberg, Philosophenweg 16, D-69120 Heidelberg, Germany
⋆ [email protected]
† [email protected]
ABSTRACT
We propose a new method, called MonteCarlo Posterior Fit, to boost the MonteCarlo sampling of likelihood (posterior) functions. The idea is to approximate the posterior function by an analytical multidimensional non-Gaussian fit. The many free parameters of this fit can be obtained by a smaller sampling than is needed to derive the full numerical posterior. In the examples that we consider, based on supernovae and cosmic microwave background data, we find that one needs an order of magnitude smaller sampling than in the standard algorithms to achieve comparable precision. This method can be applied to a variety of situations and is expected to significantly improve the performance of the MonteCarlo routines in all the cases in which sampling is very time-consuming. Finally, it can also be applied to Fisher matrix forecasts, and can help solve various limitations of the standard approach.
Key words: cosmological parameters – data analysis
1 INTRODUCTION

A large fraction of today's cosmology consists in finding the probability distribution of parameters (called posterior in Bayesian language) given a set of observational data. Since both the cosmological models and the data points are an ever growing set, the development of fast algorithms to sample the posterior is a clearly perceived need. A typical MonteCarlo (MC) sampling for Cosmic Microwave Background (CMB) data might require hundreds of thousands of points to cover the dozens of parameters (nuisance plus cosmological) that adequately model the theory and the experiment systematics. This, in turn, might require days of computation, which should be repeated for every competing model. If the comparison with theory involves very time-consuming operations, like for instance N-body simulations, this task could quickly become unfeasible.

In this paper we propose an alternative approach that, under certain circumstances, may reduce the computational times by one or more orders of magnitude. The idea is extremely simple: instead of sampling the posterior with as many points as possible to reconstruct it faithfully, we propose to sample the posterior a limited number of times and use these points to fit it with a flexible analytical distribution based on a higher-order expansion over a multivariate Gaussian. The sampling of the fitted distribution can be done with the same MonteCarlo algorithms usually employed, but it can be evaluated much more rapidly than the original (exact) posterior in the cases of interest, since it is fully analytical. The great advantage of our method is that, as we show in this paper, the overall computational time spent in the total sampling process can be one order of magnitude smaller than with the traditional method, with a minimal loss of precision in estimating the confidence regions. We call our method MonteCarlo Posterior Fit. For other interesting approaches, based on Gaussian Processes, see (Leclercq 2018, McClintock & Rozo 2019, Pellejero-Ibañez et al. 2019) and references therein.

Although we perform in this work the sampling with the Metropolis-Hastings algorithm (Metropolis et al. 1953, Hastings 1970) and a minimal variant of it as explained in Sec. 4, our method can also be used with other (sometimes more efficient) sampling techniques in order to boost the calculations even more. Some possibilities are: Nested Sampling (Skilling 2006), Affine Invariant MonteCarlo Markov Chain (Goodman & Weare 2010), Hamiltonian MonteCarlo (Duane et al. 1987), Density estimation likelihood-free inference (Fan et al. 2012; Papamakarios & Murray 2016), or Approximate Bayesian Computation (Ishida et al. 2015; Akeret et al. 2015), among others.

Our method is very general and by no means limited to cosmological data, although here we only discuss illustrative examples based on cosmology, in particular Supernovae Ia (SnIa) and CMB. The code that implements our method, MCPostFit, is publicly available at https://github.com/adriagova/MCPostFit .

The method we propose finds another application as a generalization of the Fisher matrix (FM) approach. The Fisher matrix is a simple way to approximate the posterior, mostly employed for forecasting the performance of future observations given an expected theoretical model (see for example many applications in Amendola et al. 2018). In this case, the best fit parameters are known in advance, and coincide with the assumed model.
There are however three main problems with the standard FM approach. First, the FM, being given by the second derivative around the best fit point, is insensitive to the features away from the peak and is therefore by construction a good approximation only in its vicinity (the non-Gaussian generalization explored by Sellentin 2015 and Sellentin, Quartin & Amendola 2014 improves upon this but remains a local approximation). Secondly, the FM assumes that the best fit is a well-behaved peak, with zero first derivatives and no flat directions. Thirdly, the derivatives often have to be taken numerically through small differences, not analytically, and small numerical errors might get amplified and distort the results. Our method, as we will show, solves all three problems. In a sense, our method fills the gap between the Fisher matrix method, which is based on a few points around the peak, and the full MonteCarlo sampling based on hundreds of thousands of points across the parameter space.

The crucial ingredient of our method is that the non-Gaussian fit to the posterior is characterized by a (typically) large number of parameters (we call this a "second-parametrization", to distinguish the posterior fit parameters from the theoretical ones) which, however, appear linearly in the fit function. This makes it possible to obtain a simple analytical solution to the fitting problem, without any need of an expensive search through the very high-dimensional space of second-parameters. As we will show, the number of second-parameters might well be of the order of thousands for a typical CMB application, yet without any significant overhead in computational time.
2 THE GAUSSIAN FIT

The central idea of our method is to set up an approximation of the posterior obtained through a fit with a limited number of sampling points. In this section the fit will be taken to be Gaussian, while in the next section we generalize to a non-Gaussian function. Notice that we never require either the data or the posterior to be Gaussian.

Let us denote with $L(\theta)$ a posterior function (for simplicity and with a slight abuse of language we sometimes refer to this function as the likelihood) that depends on a vector of theoretical parameters $\theta$ living in an $N$-dimensional space. Let us then generate a number $N_r$ of random $\theta_i$ vectors, and evaluate $L_i \equiv L(\theta_i)$, through a MonteCarlo Markov Chain (MCMC) algorithm. We denote with $\hat{L}$ the peak of the posterior obtained for the best fit $\hat{\theta}$. We assume for now that the peak has been accurately determined, but later on we remove this assumption. We can then find a Gaussian fit to the posterior by minimizing the quantity
$$\chi^2_M=\sum_{i=1}^{N_r}\left[\log L_i-\log L_M(\theta_i)\right]^2 \qquad (1)$$
with respect to the symmetric and constant matrix $M_{\alpha\beta}$, where (sum over repeated indexes)
$$L_M(\theta)=\hat{L}\,\exp\left[-\tfrac{1}{2}(\theta-\hat{\theta})_\alpha M_{\alpha\beta}(\theta-\hat{\theta})_\beta\right]. \qquad (2)$$
We have then
$$\log L_M=\log\hat{L}-\tfrac{1}{2}(\theta-\hat{\theta})_\alpha M_{\alpha\beta}(\theta-\hat{\theta})_\beta\,. \qquad (3)$$
(Any dependence of $M_{\alpha\beta}$ on $\theta$ would break the Gaussian nature of $L_M$, of course. We study such dependence in Sec. 3, where we introduce perturbatively some non-Gaussian corrections.) We need then to solve the equation
$$\frac{\partial\chi^2_M}{\partial M_{\mu\nu}}=-2\sum_{i=1}^{N_r}\left(\log L_i-\log L_{M,i}\right)\frac{\partial\log L_{M,i}}{\partial M_{\mu\nu}}=0\,, \qquad (4)$$
where $L_{M,i}=L_M(\theta_i)$ and where $M_{\alpha\beta}$ are our $N(N+1)/2$ second-parameters. Since $\partial\log L_{M,i}/\partial M_{\mu\nu}$ is proportional to $(\theta_i-\hat{\theta})_\mu(\theta_i-\hat{\theta})_\nu$, $\chi^2_M$ is extremized for
$$0=\sum_{i=1}^{N_r}\left(\log L_i-\log L_{M,i}\right)(\theta_i-\hat{\theta})_\mu(\theta_i-\hat{\theta})_\nu\,, \qquad (5)$$
or, equivalently,
$$\sum_{i=1}^{N_r}D_{\mu\nu,i}\log L_{M,i}=\sum_{i=1}^{N_r}D_{\mu\nu,i}\log L_i\,, \qquad (6)$$
where $D_{\mu\nu,i}\equiv(\theta_i-\hat{\theta})_\mu(\theta_i-\hat{\theta})_\nu$. That is,
$$M_{\alpha\beta}\sum_{i=1}^{N_r}D_{\alpha\beta,i}D_{\mu\nu,i}=-2\sum_{i=1}^{N_r}D_{\mu\nu,i}\log\left(\frac{L_i}{\hat{L}}\right). \qquad (7)$$
Let us now collect the pairs $\alpha,\beta$ into a single vector of dimension $N(N+1)/2$. E.g., if $\alpha,\beta$ run over $1,2,3$, we define $a=\{11,22,33,12,13,23\}$. Then we can rewrite $M_{\alpha\beta}$ as $M_a$ (where for the off-diagonal components $M_a=2M_{\alpha\beta}$) and $D_{\mu\nu}$ as $D_a$. Therefore,
$$M_{\alpha\beta}\sum_{i=1}^{N_r}D_{\alpha\beta,i}D_{\mu\nu,i}=M_a\sum_{i=1}^{N_r}D_{a,i}D_{b,i}\equiv M_a\Delta_{ab}\,. \qquad (8)$$
Then the solution is
$$M_a=2\,\Delta^{-1}_{ab}\sum_{i=1}^{N_r}D_{b,i}\log\left(\frac{\hat{L}}{L_i}\right). \qquad (9)$$
Notice that we do not need to normalize the likelihood. The number $N_r$ of points must be larger than $N(N+1)/2$ (otherwise the matrix $\Delta_{ab}$ is singular) and one should reach an $N_r$ large enough that the solution $M_a$ converges, i.e., it no longer changes significantly upon further increase of $N_r$ (we discuss this point further below). This number of posterior evaluations, $N_r$, should be compared to the number of evaluations of a typical MCMC algorithm. In all the applications below, we find that $N_r$ can be one order of magnitude, or more, smaller than a standard MCMC run, without appreciable loss of precision.
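In code, the solution (9) amounts to a single linear solve over the reduced indexes. The following is a minimal NumPy sketch under the conventions above; the function name and the triangular-index bookkeeping are ours for illustration, not part of the released MCPostFit code:

```python
import numpy as np

def gaussian_fit(theta, logL, theta_hat, logL_hat):
    """Solve Eq. (9) for M_a from N_r samples.

    theta  : (N_r, N) array of sampled parameter vectors
    logL   : (N_r,) array of log-posterior values
    theta_hat, logL_hat : best fit point and its log-posterior
    Returns the symmetric matrix M of Eq. (2).
    """
    Nr, N = theta.shape
    d = theta - theta_hat                      # (N_r, N)
    iu = np.triu_indices(N)                    # one permutation per pair
    D = d[:, iu[0]] * d[:, iu[1]]              # D_{a,i}, shape (N_r, N(N+1)/2)
    Delta = D.T @ D                            # Delta_{ab} of Eq. (8)
    rhs = D.T @ (logL_hat - logL)              # sum_i D_{b,i} log(Lhat/L_i)
    Ma = 2.0 * np.linalg.solve(Delta, rhs)     # Eq. (9)
    # Unfold M_a into the symmetric matrix M_{alpha beta}
    # (off-diagonal components satisfy M_a = 2 M_{alpha beta})
    M = np.zeros((N, N))
    M[iu] = Ma
    M[iu[1], iu[0]] = Ma
    off = iu[0] != iu[1]
    M[iu[0][off], iu[1][off]] /= 2.0
    M[iu[1][off], iu[0][off]] /= 2.0
    return M
```

Since $\Delta_{ab}$ is assembled once from the $N_r$ samples, the cost of the fit itself is negligible compared to the posterior evaluations.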
However, the posterior peak $\hat{L}$ and the best fit point $\hat{\theta}$ obtained with a limited number of samplings might not be very accurate. A better estimation can be obtained e.g. through the standard Newton-Raphson method (cf. Appendix A), but here instead we proceed in a much more efficient way, by generalizing our method to include the best fit parameters in our second-parametrization, still assuming that a Gaussian approximation to the posterior is sufficient.

We consider again (2), but now we minimize (1) not only w.r.t. the matrix elements $M_{\alpha\beta}$, but also w.r.t. the components of the vector $\hat{\theta}$. Doing so however mixes the unknowns in a non-linear way and the system can no longer be solved analytically. We perform therefore a transformation that makes the minimization procedure analytical again. Let us consider the following change of variables: $\hat{\theta}\to\tilde{\theta}+\Delta\theta$, with $\Delta\theta$ being the shift from the best fit vector in our Markov chain, $\tilde{\theta}$, to the optimal location of the fitting Gaussian's peak, $\hat{\theta}$. They will not coincide in general. We obtain
$$\chi^2_M=\sum_{i=1}^{N_r}\left[\log\left(\frac{L_i}{\hat{L}}\right)+\frac{1}{2}M_{\alpha\beta}\Delta\theta_\alpha\Delta\theta_\beta-M_{\alpha\beta}\Delta\theta_\alpha(\theta_i-\tilde{\theta})_\beta+\frac{1}{2}M_{\alpha\beta}(\theta_i-\tilde{\theta})_\alpha(\theta_i-\tilde{\theta})_\beta\right]^2, \qquad (10)$$
which can be rewritten as follows,
$$\chi^2_M=\sum_{i=1}^{N_r}\left[\log\left(\frac{L_i}{\tilde{L}}\right)+\frac{1}{2}P_\alpha(\theta_i-\tilde{\theta})_\alpha+\frac{1}{2}M_{\alpha\beta}(\theta_i-\tilde{\theta})_\alpha(\theta_i-\tilde{\theta})_\beta\right]^2, \qquad (11)$$
with
$$\tilde{L}=\hat{L}\,\exp\left[-\tfrac{1}{2}M_{\alpha\beta}\Delta\theta_\alpha\Delta\theta_\beta\right] \quad\text{and}\quad P_\alpha=-2M_{\alpha\beta}\Delta\theta_\beta\,. \qquad (12)$$
In practice, we have rewritten (2) just as
$$L_M(\theta)=\bar{L}\,\exp\left[-\tfrac{1}{2}\left(C+P_\alpha(\theta-\tilde{\theta})_\alpha+M_{\alpha\beta}(\theta-\tilde{\theta})_\alpha(\theta-\tilde{\theta})_\beta\right)\right], \qquad (13)$$
where $\bar{L}\,e^{-C/2}=\tilde{L}$. With this simple change of variables we gain a lot, since now we can minimize $\chi^2_M$ w.r.t. $C$ ($\bar{L}$ can be fixed e.g. to $L(\tilde{\theta})$, which is known), the elements of the $N$-dimensional vector $P$ and, of course, the elements of the matrix $M$, solving equations that are fully linear in the second-parameters. The $N+1$ degrees of freedom that were contained in $\hat{L}$ and $\hat{\theta}$ are now contained in $C$ and $P$. The addition of the independent and linear terms to (2) allows us to correct the position and height of the peak without the need of any iterative routine, just applying the same formalism presented before. Minimizing (11) w.r.t. $C$, $P$, and $M$ we obtain
$$\sum_{i=1}^{N_r}\left(C+P_\alpha D_{\alpha,i}+M_{\alpha\beta}D_{\alpha\beta,i}\right)=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right) \qquad (14)$$
$$\sum_{i=1}^{N_r}\left(C+P_\alpha D_{\alpha,i}+M_{\alpha\beta}D_{\alpha\beta,i}\right)D_{\mu,i}=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right)D_{\mu,i} \qquad (15)$$
$$\sum_{i=1}^{N_r}\left(C+P_\alpha D_{\alpha,i}+M_{\alpha\beta}D_{\alpha\beta,i}\right)D_{\mu\nu,i}=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right)D_{\mu\nu,i}\,, \qquad (16)$$
where we have used the definitions
$$D_\alpha=(\theta-\tilde{\theta})_\alpha \qquad (17)$$
$$D_{\alpha\beta}=(\theta-\tilde{\theta})_\alpha(\theta-\tilde{\theta})_\beta\,. \qquad (18)$$
Then we can write the three equations (14)-(16) as
$$A\,\mathbb{M}=B\,, \qquad (19)$$
with
$$\mathbb{M}=\begin{pmatrix} C\\ P_\alpha\\ M_a \end{pmatrix} \qquad (20)$$
$$A=\sum_{i=1}^{N_r}\begin{pmatrix} 1 & D_{\alpha,i} & D_{a,i}\\ D_{\beta,i} & D_{\alpha,i}D_{\beta,i} & D_{a,i}D_{\beta,i}\\ D_{b,i} & D_{\alpha,i}D_{b,i} & D_{a,i}D_{b,i} \end{pmatrix} \qquad (21)$$
$$B=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right)\begin{pmatrix} 1\\ D_{\beta,i}\\ D_{b,i} \end{pmatrix}. \qquad (22)$$
The solution is then
$$\mathbb{M}=A^{-1}B\,. \qquad (23)$$
One can use (12) and (23) to compute the corrected position of the peak,
$$\hat{\theta}_\alpha=\tilde{\theta}_\alpha-\tfrac{1}{2}M^{-1}_{\alpha\beta}P_\beta\,. \qquad (24)$$
Even adding $\hat{L}$ and $\hat{\theta}$ among the second-parameters, the fit remains of course Gaussian. In the next section we will extend the method to take into account the possible non-Gaussianity of the posterior, by adding higher-order terms to the second-parametrization.

3 THE NON-GAUSSIAN FIT

The method can be directly generalized to improve upon the quadratic approximation. In fact, the only crucial requirement is that the second-parametrization remains linear in the second-parameters. Clearly, the more second-parameters we introduce, the more accurate the fit becomes, provided there are more sampling points than second-parameters (but of course still less than is needed for the traditional MC sampling, otherwise the advantages of the method vanish). As an illustration of this linearity, a sketch of the linear-fitting strategy is given below.
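The following hedged NumPy sketch implements the improved Gaussian fit of Eqs. (19)-(24); the same design-matrix pattern extends directly to the cubic and quartic terms introduced in this section. The helper name is hypothetical, and $\bar{L}$ is assumed fixed to $L(\tilde{\theta})$ as suggested above:

```python
import numpy as np

def improved_gaussian_fit(theta, logL, theta_tilde, logL_bar):
    """Solve the linear system (19) for (C, P_alpha, M_a) and return
    the corrected peak position of Eq. (24)."""
    Nr, N = theta.shape
    d = theta - theta_tilde
    iu = np.triu_indices(N)
    # Design matrix: one column for C, N columns for D_alpha and
    # N(N+1)/2 columns for D_a; the fit is linear in all of them.
    X = np.hstack([np.ones((Nr, 1)), d, d[:, iu[0]] * d[:, iu[1]]])
    y = 2.0 * (logL_bar - logL)                # 2 log(Lbar / L_i)
    A = X.T @ X                                # Eq. (21)
    B = X.T @ y                                # Eq. (22)
    sol = np.linalg.solve(A, B)                # Eq. (23)
    C, P, Ma = sol[0], sol[1:N + 1], sol[N + 1:]
    # Rebuild symmetric M (off-diagonal components: M_a = 2 M_{alpha beta})
    M = np.zeros((N, N))
    M[iu] = Ma
    M[iu[1], iu[0]] = Ma
    off = iu[0] != iu[1]
    M[iu[0][off], iu[1][off]] /= 2.0
    M[iu[1][off], iu[0][off]] /= 2.0
    theta_hat = theta_tilde - 0.5 * np.linalg.solve(M, P)   # Eq. (24)
    return C, P, M, theta_hat
```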
Figure 1. Number of second-parameters contained in each term of Eq. (34) as a function of $N$, when we include all the possible elements in each of these terms. We also show in red the total number of second-parameters contained in the vector $\mathbb{M}$, which is equal to the dimension of the square matrix $A$ (35).

We could then adopt the following higher-order non-Gaussian expression
$$L_M=\bar{L}\,\exp\left[-\tfrac{1}{2}\left(C+P_\alpha D_\alpha+M_{\alpha\beta}D_{\alpha\beta}+S_{\alpha\beta\gamma}D_{\alpha\beta\gamma}+K_{\alpha\beta\gamma\delta}D_{\alpha\beta\gamma\delta}\right)\right], \qquad (25)$$
where the first two terms, already introduced in the previous section, help to improve the best fit, while the last two terms, $S$ (for skewness) and $K$ (for kurtosis), model possible deviations from Gaussianity. Besides (17)-(18), we also defined
$$D_{\alpha\beta\gamma}=(\theta-\tilde{\theta})_\alpha(\theta-\tilde{\theta})_\beta(\theta-\tilde{\theta})_\gamma \qquad (26)$$
$$D_{\alpha\beta\gamma\delta}=(\theta-\tilde{\theta})_\alpha(\theta-\tilde{\theta})_\beta(\theta-\tilde{\theta})_\gamma(\theta-\tilde{\theta})_\delta\,. \qquad (27)$$
In other words, we replaced the quadratic Gaussian exponent with a multivariate polynomial of order four; clearly, this can be extended to arbitrarily higher order, paying the price of increasing complexity. In practice, we find that the third or fourth order expansion is sufficient in all the applications we considered. This includes the fitting of typical banana-shape posteriors and other highly non-Gaussian shapes (cf. Secs. 5 and 6, and Appendix B). It is important to remark though that Eq. (25) is only able to fit posteriors within some limits, for instance with just one or two peaks. Dealing with more general multimodal posteriors would obviously require the use of higher-order terms in the expansion. Eq. (25) can also have problems describing very strong non-Gaussianities or small structures, but higher-order corrections are not needed in the vast majority of current cosmological studies.

The same procedure of minimization of Eq. (1) with respect to $C,P,M,S,K$ gives now the five equations
$$\sum_{i=1}^{N_r}\left(C+P_\alpha D_{\alpha,i}+M_{\alpha\beta}D_{\alpha\beta,i}+S_{\alpha\beta\gamma}D_{\alpha\beta\gamma,i}+K_{\alpha\beta\gamma\delta}D_{\alpha\beta\gamma\delta,i}\right)=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right) \qquad (28)$$
$$\sum_{i=1}^{N_r}\left(C+P_\alpha D_{\alpha,i}+M_{\alpha\beta}D_{\alpha\beta,i}+S_{\alpha\beta\gamma}D_{\alpha\beta\gamma,i}+K_{\alpha\beta\gamma\delta}D_{\alpha\beta\gamma\delta,i}\right)D_{\mu,i}=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right)D_{\mu,i} \qquad (29)$$
$$\sum_{i=1}^{N_r}\left(C+P_\alpha D_{\alpha,i}+M_{\alpha\beta}D_{\alpha\beta,i}+S_{\alpha\beta\gamma}D_{\alpha\beta\gamma,i}+K_{\alpha\beta\gamma\delta}D_{\alpha\beta\gamma\delta,i}\right)D_{\mu\nu,i}=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right)D_{\mu\nu,i} \qquad (30)$$
$$\sum_{i=1}^{N_r}\left(C+P_\alpha D_{\alpha,i}+M_{\alpha\beta}D_{\alpha\beta,i}+S_{\alpha\beta\gamma}D_{\alpha\beta\gamma,i}+K_{\alpha\beta\gamma\delta}D_{\alpha\beta\gamma\delta,i}\right)D_{\mu\nu\sigma,i}=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right)D_{\mu\nu\sigma,i} \qquad (31)$$
$$\sum_{i=1}^{N_r}\left(C+P_\alpha D_{\alpha,i}+M_{\alpha\beta}D_{\alpha\beta,i}+S_{\alpha\beta\gamma}D_{\alpha\beta\gamma,i}+K_{\alpha\beta\gamma\delta}D_{\alpha\beta\gamma\delta,i}\right)D_{\mu\nu\sigma\tau,i}=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right)D_{\mu\nu\sigma\tau,i}\,. \qquad (32)$$
Extending the approach of the previous section, we organize now the indexes as $\alpha\beta\to a=N(\alpha-1)+\beta$, $\alpha\beta\gamma\to\bar{a}=N^2(\alpha-1)+N(\beta-1)+\gamma$, and $\alpha\beta\gamma\delta\to\tilde{a}=N^3(\alpha-1)+N^2(\beta-1)+N(\gamma-1)+\delta$.
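For bookkeeping, the independent entries of each symmetric block and the permutation multiplicities they absorb (discussed after Eq. (37) below) can be enumerated with standard combinatorics. A small Python sketch, with names of our own choosing:

```python
from itertools import combinations_with_replacement
from math import comb, factorial
from collections import Counter

def sym_indices(N, k):
    """Independent index combinations of a fully symmetric rank-k tensor
    over N parameters, each with the number of distinct permutations it
    absorbs (cf. 2 M_12 -> M_a, 6 S_123 -> S_abar below)."""
    combos = []
    for c in combinations_with_replacement(range(1, N + 1), k):
        mult = factorial(k)
        for rep in Counter(c).values():
            mult //= factorial(rep)            # e.g. 1122 -> 4!/2!/2! = 6
        combos.append((c, mult))
    assert len(combos) == comb(N + k - 1, k)   # (N+k-1)!/k!/(N-1)!
    return combos

# Block sizes for N = 30 first-parameters: P, M, S, K and the full A matrix
sizes = [comb(30 + k - 1, k) for k in (1, 2, 3, 4)]
print(sizes, 1 + sum(sizes))                   # [30, 465, 4960, 40920] 46376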
With this ordering we can write the five equations again as
$$A\,\mathbb{M}=B\,, \qquad (33)$$
where
$$\mathbb{M}=\begin{pmatrix} C\\ P_\alpha\\ M_a\\ S_{\bar{a}}\\ K_{\tilde{a}} \end{pmatrix} \qquad (34)$$
$$A=\sum_{i=1}^{N_r}\begin{pmatrix} 1 & D_{\alpha,i} & D_{a,i} & D_{\bar{a},i} & D_{\tilde{a},i}\\ D_{\beta,i} & D_{\alpha,i}D_{\beta,i} & D_{a,i}D_{\beta,i} & D_{\bar{a},i}D_{\beta,i} & D_{\tilde{a},i}D_{\beta,i}\\ D_{b,i} & D_{\alpha,i}D_{b,i} & D_{a,i}D_{b,i} & D_{\bar{a},i}D_{b,i} & D_{\tilde{a},i}D_{b,i}\\ D_{\bar{b},i} & D_{\alpha,i}D_{\bar{b},i} & D_{a,i}D_{\bar{b},i} & D_{\bar{a},i}D_{\bar{b},i} & D_{\tilde{a},i}D_{\bar{b},i}\\ D_{\tilde{b},i} & D_{\alpha,i}D_{\tilde{b},i} & D_{a,i}D_{\tilde{b},i} & D_{\bar{a},i}D_{\tilde{b},i} & D_{\tilde{a},i}D_{\tilde{b},i} \end{pmatrix} \qquad (35)$$
$$B=2\sum_{i=1}^{N_r}\log\left(\frac{\bar{L}}{L_i}\right)\begin{pmatrix} 1\\ D_{\beta,i}\\ D_{b,i}\\ D_{\bar{b},i}\\ D_{\tilde{b},i} \end{pmatrix} \qquad (36)$$
(each entry in $\mathbb{M}$ and $B$ is a vector, each entry in $A$ is a block matrix). The solution takes then the same form as in the Gaussian case:
$$\mathbb{M}=A^{-1}B\,. \qquad (37)$$

The index organization is such that one considers only one permutation of indexes (e.g. only $M_{12}$, and not $M_{21}$) and every entry in $M,S,K$ acquires a factor equal to the number of distinct permutations for that entry's indexes (e.g. in 112, the permutations of the 1s are not distinct). For instance, a combination like 12 takes a factor of $2!$ and therefore $2M_{12}\to M_a$; a combination like 123 a factor of $3!=6$ (i.e. $6S_{123}\to S_{\bar{a}}$); a combination like 1122, a factor of $4!/2!/2!=6$ (i.e. $6K_{1122}\to K_{\tilde{a}}$). The ordering of the reduced indexes is arbitrary, but clearly once established it must be respected in every matrix. There are $(N+k-1)!/k!/(N-1)!$ independent entries for $k=1,2,3,4$ in $P_\alpha$, $M_a$, $S_{\bar{a}}$, $K_{\tilde{a}}$, respectively.

The inversion of the matrix $A$ could become prohibitive for a large number of first-parameters, although there are nowadays very efficient algorithms to solve linear algebraic equations that avoid direct inversion. For instance, in a typical CMB analysis one could have around 30 free parameters (nuisance plus cosmological). The full $A$ matrix would then have dimensions of $46376\times 46376$, with most of the second-parameters coming from the $K$ term (see Fig. 1). However, there is really no need to include the same number of first-parameters in $P,M,S$ and $K$. For example, one could include all of them in $P,M$ and $S$ but a smaller subset (for instance only the cosmological ones) in $K$, thereby drastically reducing the complexity. We will experiment with these possibilities later on, in Sec. 6.

Figure 2. Example of 1D application of the MonteCarlo Posterior Fit. Black solid lines: exact distribution, a weighted sum of two Gaussians, one of them displaced from the origin. Black dotted lines: histograms built from the $N_r$ random points generated in the MonteCarlo, for 6 different values of $N_r$. For the proposal distribution to perform the jumps of the Metropolis-Hastings algorithm (Metropolis et al. 1953, Hastings 1970) we use a Gaussian with standard deviation $\sigma=0.1$. Orange line: Gaussian fit (13); green line: non-Gaussian fit (25). We have included in both cases the correction of the position of the peak, as described in Sec. 2. See also the related comments in the main text of Sec. 3.

In Fig. 2 we present a simple artificial 1D application, using an underlying distribution which is highly non-Gaussian. It clearly shows the good ability of the method to reconstruct the posterior even when a small number of MonteCarlo sampling points is employed in the fit, especially when non-Gaussian corrections are also considered in the analysis (25). For such a small number of sampling points, our method provides much better results than the standard MonteCarlo. Notice that the convergence is very fast. Already with only ~3% of the total number of points that are needed to get a good estimate of the posterior with the usual MonteCarlo approach we are able to obtain results that basically match those obtained from the full MonteCarlo sample. In Sections 5 and 6 we tackle more realistic cases. A self-contained toy version of this 1D exercise is sketched below.
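The following Python toy loosely mirrors the setup of Fig. 2; the bimodal target and all constants are our own choices, not those used in the figure. It shows the whole pipeline in 1D: a short Metropolis-Hastings run, followed by a quartic fit of the log-posterior that is linear in the second-parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def logp(x):
    # Toy bimodal target: two displaced Gaussians of equal width
    return np.log(np.exp(-8.0 * x**2) + np.exp(-8.0 * (x - 1.5)**2))

# --- first MC: short Metropolis-Hastings run (proposal sigma = 0.1) ---
x, chain = 0.0, []
for _ in range(4000):
    y = x + 0.1 * rng.normal()
    if np.log(rng.uniform()) < logp(y) - logp(x):
        x = y
    chain.append(x)
xs = np.array(chain)

# --- quartic fit, linear in the second-parameters (C, P, M, S, K) ---
# log L_M(x) = log L(x_tilde) - (1/2)(C + P d + M d^2 + S d^3 + K d^4)
x_tilde = xs[np.argmax(logp(xs))]              # best point visited so far
d = xs - x_tilde
X = np.vstack([np.ones_like(d), d, d**2, d**3, d**4]).T
coef, *_ = np.linalg.lstsq(X, -2.0 * (logp(xs) - logp(x_tilde)), rcond=None)

# Reconstructed (analytical) log-posterior, inside the sampled box only
grid = np.linspace(xs.min(), xs.max(), 400)
g = grid - x_tilde
Xg = np.vstack([np.ones_like(g), g, g**2, g**3, g**4]).T
logp_fit = logp(x_tilde) - 0.5 * (Xg @ coef)
```

Once `coef` is known, `logp_fit` can be evaluated anywhere inside the sampled region at negligible cost, which is the property exploited in the marginalization step of the next section.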
4 MARGINALIZATION

Once we have the full solution Eq. (25) we still need to marginalize over the nuisance parameters and over various subsets of parameters in order to produce 1D or 2D confidence regions. This can be obtained by sampling (25) with a standard Metropolis-Hastings algorithm (Metropolis et al. 1953, Hastings 1970), which however now is extremely fast since the posterior (25) is analytical, which is not usually the case for the original distribution. Moreover, notice that, using the notation introduced earlier, Eq. (25) can be rewritten in the more compact form
$$L_M=\bar{L}\,\exp\left[-\tfrac{1}{2}\left(C+P_\alpha D_\alpha+M_a D_a+S_{\bar{a}}D_{\bar{a}}+K_{\tilde{a}}D_{\tilde{a}}\right)\right], \qquad (38)$$
which can be evaluated much faster than (25). We denote this sampling step as "marginalization MC" in order to clearly distinguish it from the first MC, which is carried out using the exact likelihood. It is important to keep in mind, though, that in general the fitted distribution $L_M$ will only describe accurately the true underlying distribution $L$ in the regions of parameter space explored in the first MC. Far away from these regions the reconstructed distribution can even diverge and the differences with respect to $L$ can be huge. In fact, there is no guarantee that the exponent of the likelihood is positive definite; in this case, there will always be regions far from the peak in which $L_M$ diverges. Fortunately, this problem can be controlled, minimizing its impact on the marginalization MCMC. One can start by restricting the generation of points in the marginalization MC inside the $N$-dimensional box set by the lowest and largest values of the parameters in each dimension. There can still be regions inside this box with a relatively low density of points in which the reconstructed distribution begins to diverge. We can then evaluate the maximum value of $L_M$ around $\tilde{\theta}$, $L_M(\hat{\theta})$, and restrict the marginalization MC to those regions in which $L_M<L_M(\hat{\theta})$ (in the Gaussian case we can easily compute $L_M(\hat{\theta})$ making use of (12); in the non-Gaussian case there is no analytical expression for it, but we can estimate it by generating a list of random points around $\tilde{\theta}$ and selecting the resulting maximum value, a computation that only takes a few minutes and is thus completely feasible). If we take a point that does not fulfill this condition then we can force the MC to go back to the good region, by selecting randomly one of the $N_r$ sampling points obtained in the first MC. To control the jumping process we can employ a multivariate Gaussian, with the covariance matrix computed directly from the $N_r$ sampling points. It is also useful to shorten the steps by dividing the standard deviations by a certain factor, e.g. 10. In this way we can remain more time inside the region where the fitted distribution is well-behaved and improve the efficiency of the method. We have checked that this approach works very well (cf. Sec. 6). The marginalization MC can be stopped as usual, once the well-known Gelman-Rubin convergence diagnostic (Gelman & Rubin 1992), $\hat{R}-1$, is below the desired threshold. From the generated MonteCarlo Markov chain we can then build the corresponding histograms and compute the confidence regions. A minimal sketch of this guarded sampling is given at the end of this section.

Other algorithms can be implemented to force the marginalization procedure to perform the additional sampling only within the good, well-sampled region. For instance, the marginalization sampling could be performed along segments connecting nearby points of the first sampling; this and other alternative techniques will be explored in future work.
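A minimal sketch of the guarded marginalization MC described above, assuming a callable `logLM` for the fitted log-posterior of Eq. (38). The bound on the peak is estimated crudely here from the first-MC points, and all names are hypothetical:

```python
import numpy as np

def marginalization_mc(logLM, samples, n_steps, shrink=10.0, seed=0):
    """Metropolis-Hastings sampling of the fitted posterior of Eq. (38),
    restricted to the region explored by the first MC (cf. Sec. 4)."""
    rng = np.random.default_rng(seed)
    lo, hi = samples.min(axis=0), samples.max(axis=0)   # N-dimensional box
    cov = np.cov(samples, rowvar=False) / shrink**2     # shortened steps
    cho = np.linalg.cholesky(cov)
    vals = np.array([logLM(p) for p in samples])
    log_bound = vals.max()          # crude stand-in for log L_M(theta_hat)
    x, lx, chain = samples[vals.argmax()].copy(), vals.max(), []
    for _ in range(n_steps):
        y = x + cho @ rng.normal(size=x.size)
        inside = np.all(y >= lo) and np.all(y <= hi)
        ly = logLM(y) if inside else -np.inf
        if ly > log_bound:
            # The fit is diverging here: jump back to a first-MC point
            x = samples[rng.integers(len(samples))].copy()
            lx = logLM(x)
        elif np.log(rng.uniform()) < ly - lx:
            x, lx = y, ly
        chain.append(x.copy())
    return np.asarray(chain)
```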
5 APPLICATION TO SNIA DATA

As a first cosmological application of the MonteCarlo Posterior Fit explained in the preceding sections, we study its performance in the case in which the wCDM parametrization (Turner & White 1997) of the dark energy (DE) equation of state (EoS) is fitted to the data of SnIa from the Pantheon+MCT compilation (Scolnic et al. 2018, Riess et al. 2018). We use the compressed likelihood built from the original distance moduli, which constrains the Hubble rate $E(z)=H(z)/H_0$ at six different redshifts, see Table 6 and also Fig. 3 of (Riess et al. 2018). The only two cosmological parameters that enter the fit are the normalized matter density, $\Omega_m$, and the constant DE EoS parameter, $w$.
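For reference, in a flat wCDM model the normalized Hubble rate entering this compressed likelihood is $E^2(z)=\Omega_m(1+z)^3+(1-\Omega_m)(1+z)^{3(1+w)}$ (neglecting radiation). A minimal sketch of the exact posterior evaluation follows; the six $E(z)$ data points and their inverse covariance are assumed to be supplied externally through the hypothetical names `z_data`, `E_data`, `inv_cov`:

```python
import numpy as np

def E_wcdm(z, Om, w):
    """Normalized Hubble rate E(z) = H(z)/H0 for flat wCDM (no radiation)."""
    z = np.asarray(z)
    return np.sqrt(Om * (1 + z)**3 + (1 - Om) * (1 + z)**(3 * (1 + w)))

def chi2(Om, w, z_data, E_data, inv_cov):
    """Gaussian chi^2 of the compressed E(z) likelihood, log L = -chi^2/2."""
    r = E_data - E_wcdm(z_data, Om, w)
    return r @ inv_cov @ r
```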
We have obtained a MonteCarlo Markov chain containing $N^{\rm full}_r=2\cdot 10^4$ points using the MC sampler MontePython (Audren et al. 2013; http://baudren.github.io/montepython.html), and computing $E(z)$ with the Einstein-Boltzmann system solver CLASS (Blas, Lesgourgues & Tram 2011; http://lesgourg.github.io/class_public/class.html). Using $N^{\rm full}_r$ points, the Gelman-Rubin diagnostic (Gelman & Rubin 1992) for the two parameters indicates convergence. Then we have split the full chain in subsamples containing $N_r<N^{\rm full}_r$ points in order to fit them with both the Gaussian (13) and the non-Gaussian (25) distributions. As explained earlier, the latter introduces cubic and fourth order corrections to the pure Gaussian fit. Once the second-parameters are computed following the procedure explained in Secs. 2 and 3, we have to carry out the marginalization over $\Omega_m$ and $w$ in order to obtain the 1D posteriors. This is analytical in the case of (13) (at least for positive-definite matrices), but for the non-Gaussian distribution we need to carry out a marginalization MC, as described in Sec. 4. Then we can build the corresponding histograms, from which one can later on draw not only the 1D distributions but also the contour plots in the $(\Omega_m,w)$-plane.

Figure 3. In each row of this figure we show the marginalized one-dimensional posterior distributions for the normalized current matter density parameter $\Omega_m$ (left column) and the dark energy EoS parameter $w$ (middle column), together with the 1σ and 2σ confidence contours in the $(\Omega_m,w)$-plane (right column), all obtained from the analysis of the SnIa data from the Pantheon+MCT compilation (Scolnic et al. 2018, Riess et al. 2018). We plot: the posterior obtained from the usual MonteCarlo with $N^{\rm full}_r=2\cdot 10^4$ sampling points (solid black lines); the posterior obtained also with the traditional MonteCarlo, but using a number of sampling points $N_r\leq N^{\rm full}_r$, as indicated in each row (dashed black lines); the Gaussian fit (13) to the latter (in orange); and the non-Gaussian fit (25) (in green). See the main text in Sec. 5 for further details.

Our results are shown in Fig. 3. Some comments are in order. First, one can see therein that the distribution in the $(\Omega_m,w)$-plane is affected by clear non-Gaussianities, which can be duly quantified using Eq. (C3), giving $\tau=1.11$ (cf. Appendix C for details). Second, the ability of the method to describe the underlying distribution is really good, even when a very small number of points is employed in the fitting analysis needed to extract the second-parameters. For instance, from the first two rows of Fig. 3 it is clear that even employing $N_r=350$ or $N_r=1000$ (corresponding to fractions of ~2% and ~5% of the full sample, respectively) we obtain results that basically match those obtained from $N^{\rm full}_r$ sampling points. The differences in the means of the two parameters are only a small fraction of $1\sigma$ when we employ $N_r=1000$. The convergence of the method is quite fast. Third, as we have mentioned earlier, in general we can rely on the good description of the underlying distribution only in those regions of parameter space containing sampling points. This is why for the non-Gaussian fit we only show the results in these regions.

Of course, here we have generated $N^{\rm full}_r$ sampling points just for illustrative purposes. In a real situation we want to make use of the MonteCarlo Posterior Fit to save computational time, so it is convenient to define a criterion that allows us to stop the MC sampling once the fitting distribution (13) remains stable under the addition of more sampling points. This will mean that it has already converged to the final result. Many criteria are possible. For instance, one can apply the method repeatedly in parallel to the main MonteCarlo and then compute e.g. the maximum and the 68.3% c.l. intervals of the marginalized 1D distributions for the desired parameters using (13). One can decide to stop
the main MonteCarlo once the relative differences between the values obtained for these quantities in two consecutive evaluations are below a certain threshold, say e.g. 5%. This criterion is not very efficient, though, since it requires the marginalization of the fitted distribution. For more practical criteria which do not require any marginalization step cf. Appendix D and the results of Sec. 6.

The advantages of the MonteCarlo Posterior Fit will become more evident in the next section, where we use CMB data and of course consider a much bigger parameter space. These facts clearly increase the complexity of the problem. The evaluation of the exact SnIa likelihood is already quite fast, and the absolute gain in computational time due to the use of (13) is not very substantial. This example, though, has served us to nicely illustrate the ability of our fit to capture very precisely the non-Gaussian features of the underlying distribution even with a small sampling, and to show that the convergence is also quite fast.

6 APPLICATION TO CMB DATA

Figure 4. As in Fig. 3, but for the six ΛCDM parameters $(\omega_b,\omega_{\rm cdm},\ln(10^{10}A_s),100\theta_s,\tau_{\rm reio},n_s)$. These constraints have been obtained using the Planck 2018 TT,TE,EE+lowE likelihood (Planck Collaboration 2018), and carrying out the corresponding marginalizations, see Sec. 6 for details. We show: the posterior obtained from the usual MonteCarlo with $N^{\rm full}_r$ sampling points (solid black lines); the same, but using a much smaller number of sampling points $N_r$ (dashed black lines, only shown on the 1D plots); the Gaussian fit (13) using the latter (in orange); and the non-Gaussian fit (25) (in green), up to cubic order.

We start applying the MonteCarlo Posterior Fit to the chain obtained for the ΛCDM and using the Planck 2018 TT,TE,EE+lowE likelihood (Planck Collaboration 2018). In this case we have the 6 usual cosmological parameters of the standard model, i.e. $(\omega_b,\omega_{\rm cdm},\ln(10^{10}A_s),100\theta_s,\tau_{\rm reio},n_s)$, and also the 21 nuisance parameters that enter Planck's likelihood. Thus, we deal with a parameter space of $N=27$ dimensions. This means that the square matrix $A$ (35) would now have dimensions of $31465\times 31465$ if all the terms up to the quartic one were included,
and it may become very expensive to obtain the inverted version of this matrix, something that is needed to obtain the second-parameters through (37). In order to reduce the size of $A$ we can opt, as mentioned before, to consider e.g. only the elements of the matrix $K$ associated to the main cosmological parameters, or just stick to the cubic correction of the Gaussian fit. Later on we will analyze another example considering some elements of the $K$ matrix for illustrative purposes, but in the current case we just consider the third order correction, since it already provides excellent results, as we explicitly show in Fig. 4.

Figure 5. 1D exact and reconstructed posteriors obtained for the seven parameters $(\omega_b,\omega_{\rm cdm},\ln(10^{10}A_s),H_0,\tau_{\rm reio},n_s,\Omega_k)$ of the ΛCDM model with spatial curvature. $H_0$ is given in km/s/Mpc. In the right plot of the last row, we also present the 1σ and 2σ confidence contours in the $(H_0,\Omega_k)$-plane. These correspond to the constraints obtained using the Planck 2018 TT,TE,EE+lowE likelihood (Planck Collaboration 2018), cf. Fig. 26 therein. We show: the posterior obtained from the usual MonteCarlo (solid black lines); the same, but using a much smaller number of sampling points $N_r$ (dashed black lines); the Gaussian fit (13) using the latter (in orange); and the non-Gaussian fit (25) (in green). We include all the 28 parameters in the $C,P,M$ terms, but only the seven cosmological parameters plus 10 nuisance parameters in the $S,K$ terms. See the main text for further details.

To obtain such plot we have basically proceeded as in the preceding section. Now we have generated a full chain of $N^{\rm full}_r$ sampling points, from which we can draw 1D posteriors and 2D contour plots which are very close to the ones of the exact underlying distribution, since $\hat{R}-1$ is well below threshold for all the parameters. These results correspond to the solid black lines in Fig. 4. On the other hand, we have taken a sub-chain containing only 4.3% of the points contained in the full sample, and we have applied the Gaussian and non-Gaussian fits (including the third order correction through the matrix $S$). We have plotted the output in orange and green, respectively. One can see that in this case the Gaussian fit already provides pretty good results, e.g. the peaks of the one-dimensional marginalized fitted distributions are very close to the exact ones, the difference being in all cases lower or much lower than 1σ depending on the parameter. Of course, this will not necessarily hold in general, so one needs to check how much improvement there is with the higher-order terms. In this case, we find that the improvement introduced by the non-Gaussian corrections is quite remarkable. The corresponding results are almost indistinguishable from the "exact" ones, obtained from the full chain with $N^{\rm full}_r$ points. We could have produced a larger sample, taking advantage of the efficiency of our method, but here we opt to generate $N^{\rm full}_r$ points to compare our results with the standard approach on equal levels of sampling noise. The results are much better than the ones obtained from the usual MonteCarlo with $N_r$ points, which are plotted with dashed black lines. The relative gain in terms of computational time offered by the MC Posterior
Fit method when compared with the usual approach is also important. The former is ~10 times faster, due to the fact that making use of the fitted distribution one can skip the expensive calculation of the Einstein-Boltzmann equations with CLASS in the marginalization MC, something that cannot be avoided in the standard method. The improvement in efficiency provided by the MC Posterior Fit depends also on the exact form of the fitting distribution under consideration and other technical aspects, as e.g. the typical length of the steps in the marginalization MC. In this example we have used as proposal distribution a multivariate Gaussian with the covariance matrix computed from the $N_r$ sampling points of the original MC, but without shortening the steps. Many points are generated outside the $N$-dimensional box described in Sec. 4, which slows the routine down a little. Further gain in efficiency can be obtained by shortening the steps, especially when the level of non-Gaussianity is not small, as in the next example.

We also apply the method to an even more complex case, with one additional parameter and a higher degree of non-Gaussianity: the ΛCDM model with spatial curvature, again under the Planck 2018 TT,TE,EE+lowE likelihood (Planck Collaboration 2018). The results are shown in Fig. 5. For the non-Gaussian fit we include all the 28 parameters in the $C,P,M$ terms, but only the seven cosmological parameters plus 10 nuisance parameters in the $S,K$ terms of Eq. (25) (we have explicitly checked that the choice of the set of 10 nuisance parameters included in the $S,K$ terms is not important in this case; the results remain completely stable under different combinations). We employ $N_r$ sampling points in the first MC. In Fig. 6 we show that with this number of sampling points we fulfill the convergence criteria explained in Appendix D, which means that we can already perform a good reconstruction of the underlying distribution with the MC Posterior Fit, see the comments in the caption. Then we compute the second-parameters and subsequently obtain one million points through the marginalization MC. This is done quite efficiently, since one single evaluation of $L_M$ is ~40 times faster than the evaluation of $L$. Here we have employed shorter steps in the Metropolis-Hastings algorithm of the marginalization MC, by dividing the standard deviations of all the parameters in the covariance matrix of the proposal multivariate Gaussian by 10. The results are excellent. We have checked that the means and corresponding uncertainties match those provided by the Planck collaboration (see Sec. 15.3 of https://wiki.cosmos.esa.int/planck-legacy-archive/images/4/43/Baseline_params_table_2018_68pc_v2.pdf): for the curvature and Hubble parameters we obtain values fully compatible with the Planck results $\Omega_k=-0.044^{+0.018}_{-0.015}$ and $H_0=(54.4^{+3.3}_{-4.0})$ km/s/Mpc. To obtain a comparable degree of accuracy with the standard approach we would have needed a considerably larger total number of sampling points obtained with CLASS+MontePython. The overall computational time with the MC Posterior Fit is reduced by a factor ~6, given that the first MC can already be stopped at the value of $N_r$ employed here (cf. again Fig. 6).

Figure 6. Convergence criteria as a function of the number of sampled points. We can see that the Mahalanobis distance $D_M$ (D1) and the criterion $d$ (D2) are below 1 for sufficiently large $N_{r,2}$, which indicates that for such number of sampling points we can already stop the first MonteCarlo and perform the marginalization of the fitted distribution. We have chosen here the thresholds described in Appendix D, with $\tilde{f}=0.04$. The values of the criteria at the locations marked by the dots have been computed using the values of $N_{r,2}$ indicated in the x-axis and a smaller $N_{r,1}<N_{r,2}$.

7 APPLICATION TO FORECASTS

As mentioned in the Introduction, the formalism above can be used also in another context, namely when we want to forecast the performance of future observations assuming a specific theoretical model (e.g. ΛCDM). In this case we know from the start the vector of best fit (maximum likelihood) parameters $\hat{\theta}_\alpha$. Then, given a posterior that depends on $N$ parameters $\theta_\alpha$, the usual Fisher matrix is obtained as
$$F_{\alpha\beta}=-\left.\frac{\partial^2\log L}{\partial\theta_\alpha\partial\theta_\beta}\right|_{\theta=\hat{\theta}}\,. \qquad (39)$$
The FM is essentially a simple way to approximate a generic likelihood with a Gaussian near its maximum,
$$L_F(\theta)=\hat{L}\,\exp\left[-\tfrac{1}{2}(\theta-\hat{\theta})_\alpha F_{\alpha\beta}(\theta-\hat{\theta})_\beta\right], \qquad (40)$$
where $\hat{L}=L(\hat{\theta})$ is the maximum of the likelihood.
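In practice, $F_{\alpha\beta}$ is often obtained from finite differences of the log-posterior, which is where the numerical fragility mentioned in the Introduction enters. A minimal central-difference sketch of Eq. (39) follows, assuming a callable log-posterior; the step sizes `h` must be tuned per parameter, and poor choices directly distort the result:

```python
import numpy as np

def fisher_matrix(loglike, theta_hat, h):
    """Fisher matrix of Eq. (39) from central finite differences of the
    log-posterior around the best fit; h is a vector of step sizes."""
    N = len(theta_hat)
    F = np.zeros((N, N))
    f = lambda dx: loglike(theta_hat + dx)
    f0 = f(np.zeros(N))
    for a in range(N):
        ea = np.eye(N)[a] * h[a]
        F[a, a] = -(f(ea) - 2.0 * f0 + f(-ea)) / h[a]**2
        for b in range(a + 1, N):
            eb = np.eye(N)[b] * h[b]
            Fab = -(f(ea + eb) - f(ea - eb) - f(-ea + eb) + f(-ea - eb)) \
                  / (4.0 * h[a] * h[b])
            F[a, b] = F[b, a] = Fab
    return F
```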
Deviations from Gaussianity can be modeled by extending to higher-order derivatives, as proposed in (Sellentin 2015, Sellentin, Quartin & Amendola 2014). The great advantage of the Fisher matrix is that it just requires evaluating the posterior at a handful of points (of the order of the square of the number of parameters) near the best fit values, in order to evaluate the Hessian $F_{\alpha\beta}$. However, as already mentioned, in this way the approximation is only good near the maximum (and only if a well-defined local maximum with zero first derivatives exists) and can be quite bad as soon as one moves away from it in parameter space. The full exploration of the posterior through a MonteCarlo Markov Chain, as already mentioned, might however be very demanding. Our method tries to find an optimal balance between these two extremes.

The application to the forecast problem is very much the same as we have already seen. The main difference is that now one needs to generate a mock dataset assuming a fiducial model in order to build the likelihood. Then one generates a number of points in parameter space, either by employing a MCMC algorithm or by any other way of sampling the space (e.g. through a regular or irregular grid if $N$ is low enough) and evaluates the posterior corresponding to those points. Then one adopts either the simplest Gaussian form (2), or the improved Gaussian (13), or finally the non-Gaussian one (25), depending on the degree of accuracy required, and finds the second-parameters using e.g. Eq. (37). Even when using the Gaussian form, one is advised to keep the linear terms in $C$ and $P$ in order to deal with cases in which the posterior does not have a maximum with flat first derivatives (perhaps because the maximum happens to lie on the border of a sharp prior).

Figure 7. Reconstruction of the 1D posterior of the sum of the neutrino masses obtained from the analysis of the Planck 2018 TT,TE,EE+lowE likelihood (Planck Collaboration 2018), assuming the ΛCDM and considering as in that reference the case of three degenerate neutrinos with equal mass. We have of course marginalized over the 6 ΛCDM parameters and 21 nuisance parameters of the Planck likelihood. The various curves represent: the posterior obtained from the standard MonteCarlo with $N_{r,1}$ and $N_{r,2}<N_{r,1}$ sampling points (solid and dashed black lines, respectively); the Gaussian fit (13) to the $N_{r,2}$ points (in orange); the Fisher matrix approximation (in blue).

In Figure 7, we compare the performance of the standard FM with the MonteCarlo Posterior Gaussian fit in a concrete example, in which we reconstruct the one-dimensional posterior distribution of the sum of the neutrino masses, i.e. $\sum_\nu m_\nu$, obtained from the analysis of the Planck 2018 TT,TE,EE+lowE likelihood (Planck Collaboration 2018). We assume the non-minimal extension of the ΛCDM model and consider $\sum_\nu m_\nu>0$. This is precisely a situation in which the posterior maximum can lie at the border of the prior region, where the standard FM approximation is expected to struggle (cf. the blue curve in Fig. 7).

8 CONCLUSIONS

In this paper we presented a novel method, called MonteCarlo Posterior Fit, which improves the efficiency of the sampling of likelihoods offered by the usual MC approaches. We use a multivariate non-Gaussian function to fit a much
Thesecond-parameters that characterize the fitting function canbe easily computed by solving a simple linear system of equa-tions, and then a marginalization MC is needed to samplethe fitted distribution and to subsequently obtain the 1Dposteriors and 2D confidence regions for the various parame-ters of the model under study. The advantage of this methodbecomes important in those cases in which the evaluation ofthe fitted distribution is much faster than the original likeli-hood. This is clearly the case e.g. when the use of Einstein-Boltzmann codes is required to compute the observables thatenter the likelihood.We have discussed some examples of applications basedon data on supernovae of Type Ia and CMB, which clearlyillustrate the power of the method. In the CMB case, themethod is able to reduce the computational time by a fac-tor from six to ten. The MonteCarlo Posterior Fit can bealso very useful to boost the efficiency of Bayesian analysesthat involve N -body simulations. In this case the reductionin computational time can be even more substantial. Thisdeserves a dedicated study, which we leave for a future work.Another very interesting application of our method isfound in the elaboration of forecasts. In order to avoid thelengthy exploration of the whole parameter space one usu-ally makes use of the Fisher matrix method, which approx-imates the exact likelihood by a multivariate Gaussian andrequires only the estimation of its second derivatives at themaximum. This requires only few evaluations of the likeli-hood around the peak. As we have discussed in this paper,the Fisher matrix has some important drawbacks, though.For instance, it only describes correctly the underlying dis-tribution near the peak, and can fail once we depart from it;or can lead to very bad results in the case in which the exactlikelihood has no flat maximum, for instance because trun-cated by the prior boundaries. The MC Posterior Fit cancorrect all these deficiencies of the Fisher matrix method,bridging the gap between it and the time-consuming fullMonteCarlo approach. ACKNOWLEDGEMENTS
AGV is funded by the Deutsche Forschungsgemeinschaft (DFG) - Project number 415335479. We thank Viviana Niro for her help in the configuration of MontePython in the case studied in Fig. 7.
DATA AVAILABILITY
We provide the references of all the data sources in the article. The code in which we have implemented the MonteCarlo Posterior Fit method, MCPostFit, is publicly available at https://github.com/adriagova/MCPostFit .

REFERENCES
Akeret J., Refregier A., Amara A., Seehars S., Hasner C., 2015, J. Cosmology Astropart. Phys., 2015, 043
Amendola L. et al., 2018, Living Reviews in Relativity, 21, 2
Audren B., Lesgourgues J., Benabed K., Prunet S., 2013, J. Cosmol. Astropart. Phys., 1302, 001
Blas D., Lesgourgues J., Tram T., 2011, J. Cosmol. Astropart. Phys., 1107, 034
Duane S., Kennedy A., Pendleton B. J., Roweth D., 1987, Physics Letters B, 195, 216
Fan Y., Nott D. J., Sisson S. A., 2012, arXiv:1212.1479
Gelman A., Rubin D., 1992, Statist. Sci., 7, 457
Goodman J., Weare J., 2010, Comm. App. Math. and Comp. Sci., 5, 65
Hastings W., 1970, Biometrika, 57, 97
Ishida E. E. O., et al., 2015, Astronomy and Computing, 13, 1
Leclercq F., 2018, Phys. Rev. D, 98, 063511
Levenberg K., 1944, Q. Appl. Math., 2, 164
Mahalanobis P., 1936, Proceedings of the National Institute of Science of India, 2, 49
Marquardt D., 1963, J. Soc. Indust. Appl. Math., 11, 431
McClintock T., Rozo E., 2019, MNRAS, 489, 4155
Metropolis N., Rosenbluth A., Rosenbluth M., Teller A., Teller E., 1953, Journal of Chemical Physics, 21, 1087
Papamakarios G., Murray I., 2016, arXiv:1605.06376
Pellejero-Ibañez M., Angulo R. E., Aricó G., Zennaro M., Contreras S., Stücker J., 2019, arXiv:1912.08806
Planck Collaboration, Aghanim N. et al., 2018, arXiv:1807.06209
Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P., 2007, Numerical Recipes: The Art of Scientific Computing. Cambridge Univ. Press, Cambridge
Riess A. G. et al., 2018, ApJ, 853, 126
Scolnic D. M. et al., 2018, ApJ, 859, 101
Sellentin E., 2015, MNRAS, 453, 893
Sellentin E., Quartin M., Amendola L., 2014, MNRAS, 441, 1831
Skilling J., 2006, Bayesian Anal., 1, 833
Turner M. S., White M. J., 1997, Phys. Rev. D, 56, R4439
APPENDIX A: CORRECTING THE PEAK'S LOCATION WITH NEWTON-RAPHSON
It is also possible to apply an iterative routine in order to look for the correction of the position and height of the peak of our fitting distribution, for instance one based on the Newton-Raphson method (see e.g. Press et al. 2007). Let us focus at the moment on the pure Gaussian fit, i.e. let us consider
$$L_M(\theta)=\bar{L}\,\exp\left[-\tfrac{1}{2}\left(C+(\theta-\tilde{\theta})_\alpha M_{\alpha\beta}(\theta-\tilde{\theta})_\beta\right)\right]. \qquad (A1)$$
In this case, one can start computing (9) at a given $\tilde{\theta}^{(1)}$ (for instance, using the vector $\theta$ with largest $L$ in the Markov chain) to obtain the initial matrix $M^{(1)}$ (notice that we can take $C^{(1)}=0$), and then solve iteratively the following system of equations until the cost function (1) is minimized,
$$\left.\frac{\partial\chi^2_M}{\partial\tilde{\theta}_\alpha}\right|_n+\left.\frac{\partial^2\chi^2_M}{\partial\tilde{\theta}_\alpha\partial\tilde{\theta}_\beta}\right|_n\left(\tilde{\theta}^{(n+1)}_\beta-\tilde{\theta}^{(n)}_\beta\right)+\left.\frac{\partial^2\chi^2_M}{\partial\tilde{\theta}_\alpha\partial M_{\beta\mu}}\right|_n\left(M^{(n+1)}_{\beta\mu}-M^{(n)}_{\beta\mu}\right)+\left.\frac{\partial^2\chi^2_M}{\partial\tilde{\theta}_\alpha\partial C}\right|_n\left(C^{(n+1)}-C^{(n)}\right)=0\,, \qquad (A2)$$
$$\left.\frac{\partial\chi^2_M}{\partial C}\right|_n+\left.\frac{\partial^2\chi^2_M}{\partial C^2}\right|_n\left(C^{(n+1)}-C^{(n)}\right)+\left.\frac{\partial^2\chi^2_M}{\partial C\,\partial M_{\mu\nu}}\right|_n\left(M^{(n+1)}_{\mu\nu}-M^{(n)}_{\mu\nu}\right)+\left.\frac{\partial^2\chi^2_M}{\partial\tilde{\theta}_\mu\partial C}\right|_n\left(\tilde{\theta}^{(n+1)}_\mu-\tilde{\theta}^{(n)}_\mu\right)=0\,, \qquad (A3)$$
$$\left.\frac{\partial\chi^2_M}{\partial M_{\alpha\beta}}\right|_n+\left.\frac{\partial^2\chi^2_M}{\partial M_{\alpha\beta}\partial M_{\mu\nu}}\right|_n\left(M^{(n+1)}_{\mu\nu}-M^{(n)}_{\mu\nu}\right)+\left.\frac{\partial^2\chi^2_M}{\partial\tilde{\theta}_\mu\partial M_{\alpha\beta}}\right|_n\left(\tilde{\theta}^{(n+1)}_\mu-\tilde{\theta}^{(n)}_\mu\right)+\left.\frac{\partial^2\chi^2_M}{\partial C\,\partial M_{\alpha\beta}}\right|_n\left(C^{(n+1)}-C^{(n)}\right)=0\,, \qquad (A4)$$
where all the derivatives are evaluated at $(\tilde{\theta}^{(n)},C^{(n)},M^{(n)})$. This system can be written in the standard Newton-Raphson form by using again the compact notation $M_{\mu\nu}\to M_a$, as in Secs. 2 and 3, and building a vector of $d=N+1+N(N+1)/2=1+N(N+3)/2$ dimensions, $\vec{\xi}=(\tilde{\theta}_\alpha,C,M_a)$. The result reads
$$\vec{\xi}^{\,(n+1)}=\vec{\xi}^{\,(n)}-H^{-1}_\xi\,\vec{\nabla}_\xi\chi^2_M\Big|_{\vec{\xi}^{\,(n)}}\,, \qquad (A5)$$
with $\vec{\nabla}_\xi$ and $H_\xi$ being the gradient and the Hessian matrix associated to $\vec{\xi}$, respectively.

Figure A1. 1σ and 2σ confidence contours associated to the analytical posteriors (B1)-(B3) (in black) and their reconstructed shapes obtained with the MC Posterior Fit method (in green), using the quartic non-Gaussian fitting distribution (25). See the comments in Appendix B.

It is important to recall that the original Newton-Raphson method does not always lead to a minimum. It can also lead to a saddle point or even produce divergent results when the Hessian is not positive definite. It is possible to modify this method slightly to solve this issue. We have done so by implementing the so-called Levenberg-Marquardt correction (Levenberg 1944, Marquardt 1963), which basically consists in substituting the Hessian by $H\to H+\mu\,\mathbb{I}_{d\times d}$ (with $\mathbb{I}_{d\times d}$ being the identity matrix in $d$ dimensions) and changing $\mu$ properly at each iteration step to ensure the positive definiteness of the resulting matrix. We can stop the iterative process e.g. when the relative change $|\xi^{(n+1)}_i/\xi^{(n)}_i-1|$ is below a desired value, which can be different for every $i$. The lower this value, the higher the precision we demand in our calculation.

A straightforward generalization of this method can also be employed for the non-Gaussian fit (25) (cf. Sec. 3), considering equation (A5) and working now with the vector $\vec{\xi}=(\tilde{\theta}_\alpha,C,M_a,S_{\bar{a}},K_{\tilde{a}})$ and the corresponding gradient and Hessian. This iterative approach (and more concretely, the calculation of the Hessian matrix) can be, though, quite expensive in terms of computational time, especially if we are working with a high-dimensional parameter space, as in Sec. 6.
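A compact sketch of one such damped update; the `grad` and `hess` callables, returning $\vec{\nabla}_\xi\chi^2_M$ and $H_\xi$, are assumed to be supplied, and the starting value and growth factor of $\mu$ are our own choices:

```python
import numpy as np

def lm_step(grad, hess, xi, mu=1e-3, mu_max=1e8):
    """One Newton-Raphson update of xi (Eq. A5), with the
    Levenberg-Marquardt damping H -> H + mu*I increased until the
    damped Hessian is positive definite."""
    H, g = hess(xi), grad(xi)
    eye = np.eye(len(xi))
    while mu < mu_max:
        try:
            np.linalg.cholesky(H + mu * eye)   # positive-definiteness test
        except np.linalg.LinAlgError:
            mu *= 10.0
            continue
        return xi - np.linalg.solve(H + mu * eye, g)
    raise RuntimeError("could not make the Hessian positive definite")
```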
The procedure we have explained in the main body of the paper corrects the position of the peak in a much more efficient way. We have implemented the alternative iterative method just to cross-check the results that we have presented in the paper. We have verified that both methods lead to the same results, as expected.

APPENDIX B: FIT TO STARRY/BOXY POSTERIORS

In this appendix we study the ability of the quartic non-Gaussian distribution (25) to fit posteriors with starry and boxy shapes, which are not encountered in the cosmological applications of Secs. 5 and 6. We are not concerned here with the MC optimization, but just with shape flexibility. For the sake of simplicity we stick to two-dimensional parameter spaces and consider three analytical posterior distributions, $h(x,y)$, $f(x,y)$ and $g(x,y)$ (Eqs. B1-B3), each of them built as the weighted sum of two correlated bivariate Gaussians with different degrees of correlation between $x$ and $y$, and, in the case of $f(x,y)$, with one of the two components displaced from the origin.

The results are shown in Fig. A1. The black contours are obtained from the exact distributions (B1)-(B3) and the green ones from the corresponding non-Gaussian posterior fits (25). The leftmost plot clearly shows that if the posterior is extremely starry, then Eq. (25) is unable to provide an accurate description, capturing only some of its features. If the degree of "starriness" of the underlying distribution is lower, with more boxy contours, then our method works very well, as is evident from the second and third plots of Fig. A1.

APPENDIX C: ASSESSING THE DEGREE OF NON-GAUSSIANITY
One can assess whether the non-Gaussian terms are important by evaluating the covariance matrix with and without the non-Gaussian correction, again focusing only on the fourth-order term. We discuss this test briefly here as a base for further study, although we only used it in the SnIa application discussed in Sec. 5.

Assuming the deviation from Gaussianity is not excessive and considering in (25) the $S$- and $K$-terms associated to all the parameters, we can build the following approximation (the $S$-term drops out of the second moment by parity, and the prefactors follow from Wick-contracting the Gaussian integrals):
$$\langle(\theta-\hat{\theta})_\mu(\theta-\hat{\theta})_\nu\rangle=\tilde{L}\int d^N\theta\, D_{\mu\nu}\exp\left[-\tfrac{1}{2}\left(M_{\alpha\beta}D_{\alpha\beta}+S_{\alpha\beta\gamma}D_{\alpha\beta\gamma}+K_{\alpha\beta\gamma\delta}D_{\alpha\beta\gamma\delta}\right)\right]$$
$$\approx\tilde{L}\int d^N\theta\, D_{\mu\nu}\exp\left[-\tfrac{1}{2}M_{\alpha\beta}D_{\alpha\beta}\right]\left(1-\tfrac{1}{2}S_{\alpha\beta\gamma}D_{\alpha\beta\gamma}-\tfrac{1}{2}K_{\alpha\beta\gamma\delta}D_{\alpha\beta\gamma\delta}\right)$$
$$=M^{-1}_{\mu\nu}-\tfrac{1}{2}K_{\alpha\beta\gamma\delta}\,\tilde{L}\int d^N\theta\, D_{\mu\nu}D_{\alpha\beta\gamma\delta}\exp\left[-\tfrac{1}{2}M_{\alpha\beta}D_{\alpha\beta}\right]=M^{-1}_{\mu\nu}+\kappa_{\mu\nu}\,, \qquad (C1)$$
where $\tilde{L}$ is here fixed by the normalization of the Gaussian measure, and
$$\kappa_{\mu\nu}=-\tfrac{1}{2}K_{\alpha\beta\gamma\delta}\left(M^{-1}_{\mu\nu}M^{-1}_{\alpha\beta}M^{-1}_{\gamma\delta}+\text{14 distinct permutations}\right). \qquad (C2)$$
Then we can finally compactify the deviation into a single number,
$$\tau=|\det(M\kappa)|\,, \qquad (C3)$$
with the matrix inside the determinant taking the following form:
$$(M\kappa)_{\mu\nu}=-\tfrac{3}{2}\,\delta_{\mu\nu}K_{\alpha\beta\gamma\theta}M^{-1}_{\alpha\beta}M^{-1}_{\gamma\theta}-6\,K_{\mu\beta\gamma\theta}M^{-1}_{\nu\beta}M^{-1}_{\gamma\theta}\,. \qquad (C4)$$
Although clearly a single number cannot fully characterize the degree of multi-dimensional non-Gaussianity, we can expect larger values of $\tau$ to signal stronger deviations from Gaussianity.
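A direct transcription of Eqs. (C2)-(C4); note that the combinatorial prefactors above are our Wick-contraction reconstruction, so this sketch should be cross-checked against the released MCPostFit code:

```python
import numpy as np

def tau_indicator(M, K):
    """Non-Gaussianity number tau of Eq. (C3), built from the quadratic
    matrix M and the fully symmetrized quartic tensor K through the
    contractions of Eqs. (C2)-(C4)."""
    Minv = np.linalg.inv(M)
    trace_part = np.einsum('abcd,ab,cd->', K, Minv, Minv)
    cross_part = np.einsum('mbcd,nb,cd->mn', K, Minv, Minv)
    MK = -1.5 * trace_part * np.eye(len(M)) - 6.0 * cross_part
    return abs(np.linalg.det(MK))
```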
APPENDIX D: CRITERIA TO STOP THE FIRST MC

We suggest two convergence criteria that can be used to decide whether we have reached enough sampling points $N_r$. Both make use of the Gaussian fit (13) and measure the stability of the mean and the covariance matrix under the increase of $N_r$. Imagine we want to compare the situation when we have $N_{r,1}$ and $N_{r,2}>N_{r,1}$ sampling points. The fitted Gaussians obtained from these samples will be centered at $\hat{\theta}_{s1}$ and $\hat{\theta}_{s2}$ and will have associated matrices $M_{s1}$ and $M_{s2}$, respectively. In order to evaluate the stability of the mean values, we can compute the so-called Mahalanobis distance (Mahalanobis 1936) between them, $D_M(\hat{\theta}_{s1},\hat{\theta}_{s2})$, using $M_{s2}$ for the metric of the parameter space. It is given by the following simple formula,
$$D_M(\hat{\theta}_{s1},\hat{\theta}_{s2})=\sqrt{(\hat{\theta}_{s2}-\hat{\theta}_{s1})^T M_{s2}\,(\hat{\theta}_{s2}-\hat{\theta}_{s1})}\,. \qquad (D1)$$
This criterion provides a good measure of the stability of the mean (cf. Sec. 6). The Mahalanobis distance is useful because it automatically normalizes the parameters ($D_M$ is dimensionless) and also because it incorporates the information of the existing correlations between them. We can decide to stop the first MC once the distance (D1) is lower than some threshold, i.e. once $D_M\lesssim f\sqrt{N}$, with $N$ being the number of parameters and $f$ a number which we can choose to be of order $0.1$ or below, depending on the desired precision. The second criterion is based on the eigenvalues of $M_{s1}$ and $M_{s2}$. Let us call them $\lambda_i$ and $\beta_i$, respectively, for $i\in[1,N]$. Then we can compute the sum of the absolute values of their relative differences and demand this quantity to be lower, again, than some threshold,
$$d(M_{s1},M_{s2})\equiv\sum_{i=1}^{N}\left|\frac{\lambda_i}{\beta_i}-1\right|\lesssim\tilde{f}N\,. \qquad (D2)$$
We can take, again, $\tilde{f}$ of order $0.1$ or below. We have explicitly applied these criteria in one of the analyses of Sec. 6. See therein for details.
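Both criteria reduce to a few lines of linear algebra. A sketch follows, in which the eigenvalues are matched by sorted order (the pairing convention is our assumption) and the default thresholds are taken from Fig. 6 and the ranges quoted above:

```python
import numpy as np

def stop_first_mc(theta1, M1, theta2, M2, f=0.1, f_tilde=0.04):
    """Stability tests (D1)-(D2) between the Gaussian fits obtained with
    N_{r,1} < N_{r,2} sampling points; returns (passed_D1, passed_D2)."""
    N = len(theta1)
    dtheta = theta2 - theta1
    D_M = np.sqrt(dtheta @ M2 @ dtheta)       # Mahalanobis distance, Eq. (D1)
    lam = np.sort(np.linalg.eigvalsh(M1))     # eigenvalues, paired by sorting
    beta = np.sort(np.linalg.eigvalsh(M2))
    d = np.abs(lam / beta - 1.0).sum()        # Eq. (D2)
    return D_M < f * np.sqrt(N), d < f_tilde * N
```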