Bayesian evidence for the tensor-to-scalar ratio r and neutrino masses m_ν: Effects of uniform vs logarithmic priors
Lukas T. Hergt, Will J. Handley, Michael P. Hobson, Anthony N. Lasenby
Astrophysics Group, Cavendish Laboratory, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
Kavli Institute for Cosmology, Madingley Road, Cambridge, CB3 0HA, UK
(Dated: February 24, 2021)

We review the effect that the choice of a uniform or logarithmic prior has on the Bayesian evidence and hence on Bayesian model comparisons when data provide only a one-sided bound on a parameter. We investigate two particular examples: the tensor-to-scalar ratio r of primordial perturbations and the mass of individual neutrinos m_ν, using the cosmic microwave background temperature and polarisation data from Planck 2018 and the NuFIT 5.0 data from neutrino oscillation experiments. We argue that the Kullback–Leibler divergence, also called the relative entropy, mathematically quantifies the Occam penalty. We further show how the Bayesian evidence stays invariant upon changing the lower prior bound of an upper-constrained parameter. While a uniform prior on the tensor-to-scalar ratio disfavours the r-extension compared to the base ΛCDM model with odds of about 1 : 20, switching to a logarithmic prior renders both models essentially equally likely. ΛCDM with a single massive neutrino is favoured over an extension with variable neutrino masses with odds of 20 : 1 in the case of a uniform prior on the lightest neutrino mass, which decreases to roughly 2 : 1 for a logarithmic prior. For both prior options we get only a very slight preference for the normal over the inverted neutrino hierarchy, with Bayesian odds of about 3 : 2 at most.

I. INTRODUCTION
The “principle of insufficient reason” (Bernoulli [1]) or “principle of indifference” (as renamed by Keynes [2]) states that in the event of multiple, mutually exclusive, possible outcomes and in the absence of any relevant evidence, we should assign the same probability to all outcomes [3]. In a Bayesian analysis, this is generalised to continuous parameters in the form of uninformative priors. Complete prior ignorance about a location parameter is represented by assigning a uniform distribution to the prior. Ignorance about a scale parameter, on the other hand, is represented by assigning a logarithmic prior, i.e. a uniform distribution on the logarithm of the parameter [3]. However, it is not always clear whether a parameter should be treated as a location or a scale parameter. This is quite commonly discussed when faced with a strictly positive parameter, such as a mass or an amplitude, that is very small yet still unconstrained. In general, the decision whether to use a uniform or logarithmic prior has effects on credibility bounds and on the Bayesian evidence, i.e. on both levels of Bayesian inference: parameter estimation and model comparison. Under the reasoning that one can set the lower bound to zero and thus incorporate all possible small values, the uniform prior is often preferred, whereas the logarithmic prior is criticised for a lack of an unambiguous lower bound, and because the ultimate choice of the lower bound might affect a 95 % credibility bound and the Bayesian evidence.

In this paper we show that the very last statement is typically not true and that the choice of a lower bound for such a logarithmic prior is less problematic than commonly assumed.
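The sensitivity of a percentile-based upper bound to the prior choice is easy to demonstrate numerically. The following sketch is our own illustration, not part of the paper's analysis pipeline: it takes a toy one-sided exponential likelihood (the shape that will reappear in the mock example of Section III C) and computes the 95 % upper credible bound under a uniform prior and under logarithmic priors with two different lower cut-offs. The width µ = 0.06 is chosen merely to be of the order of current tensor-to-scalar constraints.

```python
import numpy as np

def upper_bound_uniform(mu, q=0.95, amax=1.0):
    """q-quantile of the posterior for L(a) = exp(-a/mu) with a uniform prior on [0, amax]."""
    a = np.linspace(0.0, amax, 200_001)
    post = np.exp(-a / mu)                 # unnormalised posterior density in a
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return a[np.searchsorted(cdf, q)]

def upper_bound_log(mu, bmin, q=0.95, bmax=0.0):
    """q-quantile (converted back to a = 10**b) for a uniform prior on b = log10(a)."""
    b = np.linspace(bmin, bmax, 200_001)
    post = np.exp(-10.0**b / mu)           # unnormalised posterior density in b
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return 10.0**b[np.searchsorted(cdf, q)]

mu = 0.06
u = upper_bound_uniform(mu)                # ~ 3*mu, since the 95% point of Exp(mu) is mu*ln(20)
l5 = upper_bound_log(mu, -5.0)
l10 = upper_bound_log(mu, -10.0)
print(u, l5, l10)                          # the bound tightens as the lower cut-off decreases
```

Lowering the prior cut-off from 10⁻⁵ to 10⁻¹⁰ adds posterior mass to the unconstrained plateau and drags the 95 % percentile to smaller values, which is exactly the instability this paper addresses (and resolves at the level of the evidence rather than the percentile).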
To that end we will look at two cosmological examples in particular: the tensor-to-scalar ratio r of primordial perturbations as well as the neutrino masses m_ν, where both uniform and logarithmic priors have been applied historically (for the tensor-to-scalar ratio, see e.g. [4–9]; and for the neutrino masses see e.g. [7, 10–18]).

The best constraints on the tensor-to-scalar ratio, r ≲ 0.06, come from joint analyses of cosmic microwave background (CMB) data, CMB lensing, and baryon acoustic oscillations (BAO) [7, 8], where a uniform prior on r was adopted. A common goal of upcoming CMB experiments such as the Simons Observatory [19], the LiteBIRD satellite [20] and the next-generation “Stage-4” ground-based CMB experiment (CMB-S4) [21] is to push to a tensor-to-scalar ratio of r ∼ 10⁻³. In pushing to such small values of r, the question of whether to adopt a uniform or logarithmic prior in one’s analysis becomes more pertinent.

Since neutrino oscillation experiments measure non-zero mass differences, we can conclude that two or more neutrinos must have mass. However, the absolute scale of the individual neutrino masses m_i cannot be measured by the oscillation experiments, but only the mass-squared splittings ∆m²_ij = m²_i − m²_j. The strongest bound on the absolute neutrino mass scales is currently provided again by combined CMB and BAO data, limiting the sum of the neutrino masses to Σ m_ν ≲ 0.12 eV at 95 % confidence [17] (see also [15, 18] for other recent analyses). When investigating the three discrete neutrino mass eigenstates, the question of uniform vs logarithmic priors arises again. Note, however, that given the known mass splittings from oscillation experiments, the three neutrino mass scales are linked. If one mass scale is known, then the others can be inferred from the mass-squared splittings. Hence, only one mass scale is truly unknown, and assuming scale invariant (i.e. logarithmic) priors on all three neutrino masses simultaneously would unduly favour smaller neutrino masses and thus a normal neutrino hierarchy (NH) with m₁ < m₂ ≪ m₃ compared to an inverted neutrino hierarchy (IH) with m₃ ≪ m₁ < m₂ (for more on this see also discussions in [10, 11]).

This paper is structured as follows: In Section II we start by giving a brief description of our Bayesian analysis framework, including the data and base cosmological model used, as well as the means of computing the Bayesian evidence. In Section III we apply this to the tensor-to-scalar ratio r and compare to a theoretical mock example. In Section IV we perform the equivalent analysis for the neutrino masses and contrast the results for the two neutrino hierarchies. We conclude in Section V.

II. METHODS

A. Bayesian inference
There are two levels to Bayesian inference: parameter estimation and model comparison (see e.g. [3, 22]). Both these levels are based on Bayes’ theorem, which relates inference inputs (likelihood and prior) to yielded outputs (posterior and evidence):

    Pr(θ | D, M) × Pr(D | M) = Pr(D | θ, M) × Pr(θ | M),
    Posterior × Evidence = Likelihood × Prior,
    P_M(θ) × Z_M = L_M(θ) × π_M(θ).    (1)

The posterior P is the main quantity of interest in a parameter estimation, representing our state of knowledge about the parameters θ in a given model M, inferred from our prior information π and the likelihood L of the parameters under the data D. The evidence Z is pivotal for model comparisons.

Were we interested only in parameter estimation, then it would be sufficient to care only about the proportionality of the posterior to the product of likelihood and prior, and the Bayesian evidence could be neglected as a mere normalisation factor. However, for the comparison of, say, two models A and B, the evidence becomes important, with the posterior odds ratio of the two models given by:

    Pr(B | D) / Pr(A | D) = [Pr(B) / Pr(A)] × Z_B / Z_A.    (2)

Typically models are assigned the same prior preference, such that the first term on the right-hand side becomes unity, leaving simply the evidence ratio Z_B / Z_A, which can be interpreted as betting odds for the two models. We typically quote this in terms of the log-difference of evidences between two models, ∆ ln Z = ln(Z_B / Z_A).

The evidence is the marginal likelihood

    Z_M = ∫ L_M(θ) π_M(θ) dθ = ⟨L_M⟩_π,    (3)

and can be numerically approximated with Laplace’s method [23], estimated from a posterior distribution attained e.g. from a Markov chain Monte Carlo (MCMC) run via the Savage–Dickey density ratio (SDDR) [24–27] or via a nearest-neighbour approach [28, 29], or computed more directly with nested sampling, which can additionally estimate the corresponding numerical uncertainty [30–36].

If the posterior distribution and the evidence have both been determined, then as a byproduct one can also compute the Kullback–Leibler (KL) divergence, also called the relative entropy:

    D_KL,M = ∫ P_M(θ) ln(P_M(θ) / π_M(θ)) dθ = ⟨ln(P_M / π_M)⟩_P,    (4)

which quantifies the overall compression from prior to posterior distribution.

B. Kullback–Leibler divergence and Occam’s razor
It should be noted that the Bayesian evidence naturally incorporates the so-called Occam’s razor that penalises models for unnecessary complexity. It can be formulated as the principle to “Accept the simplest explanation that fits the data” [22]. This can be neatly demonstrated using a Gaussian likelihood with mean µ and variance σ² having a single parameter x ∈ [x_min, x_max] with a uniform prior (see e.g. [22, 37]). The Bayesian evidence decomposes into two terms:

    Z = L(µ) × σ√(2π) / (x_max − x_min).    (5)

The first term on the right-hand side is the maximum likelihood point. With additional parameters, this term would only increase and therefore can only favour the given model. The second term incorporates the ratio of posterior to prior uncertainty. Since the posterior uncertainty σ is generally smaller than the prior uncertainty (x_max − x_min), this term penalises the given model for each of its parameters and thus embodies its Occam penalty. Note that the posterior and prior uncertainties appear inversely in the normalisation factors of the actual distributions.

More generally, the KL-divergence can actually be used as an estimator of the Occam penalty, which becomes clearer when rewriting the log-evidence according to*:

    ln(∫ L_M π_M dθ) = ∫ P_M ln L_M dθ − ∫ P_M ln(P_M / π_M) dθ,
    (log-)evidence = parameter fit − Occam penalty,
    ln Z_M = ⟨ln L_M⟩_P − D_KL,M,    (6)

where we have dropped the dependence on θ to save space. Analogous to the example from Eq. (5), the first term on the right-hand side encapsulates the fit of the model, while the KL-divergence is the average log-ratio of posterior to prior distribution (see also the last equality in Eq. (4)), thus identifying it as the Occam penalty. This equation appears in passing in the appendix of [38] in the calculation of tension metrics, and it has been intuitively applied e.g. in the third figure in [39], but as far as the authors are aware, Eq. (6) is the first time this analytical form is explicitly connected to the trade-off between parameter fit and model complexity.

While known to experts, a widely unappreciated fact is that the evidence stays unaffected by an unconstrained parameter, i.e. when the data provide no information for that parameter [40]. In terms of Eq. (6) this is reflected in an invariant likelihood term ⟨ln L⟩_P and a zero KL-divergence, D_KL(π = P) = 0. Using the alternative labelling we can rephrase this: Adding an unconstrained parameter does not affect the fit, but also does not incur an additional Occam penalty and hence also leaves the evidence unaffected.

A popular measure for an effective number of constrained parameters is the Bayesian model complexity [41]. However, this quantity relies on the use of a point estimator such as the posterior mean or mode, which is why we prefer using the Bayesian model dimensionality d in the following sections (see [42] for a more detailed discussion on Bayesian complexities/dimensionalities). The Bayesian model dimensionality can be computed straightforwardly from the posterior distribution as the posterior variance of the log-likelihood:

    d_M / 2 = ∫ P_M(θ) (ln(P_M(θ) / π_M(θ)) − D_KL,M)² dθ    (7)
            = ⟨(ln L_M)²⟩_P − ⟨ln L_M⟩²_P.    (8)

Note the connection to Eq. (6), where we used the posterior average of the log-likelihood. As such, these two quantities provide an interesting additional perspective to that of the (ln Z, D_KL) pair. The posterior average of the log-likelihood informs us about the parameter fit, and the posterior variance of the log-likelihood measures the model’s complexity in the form of the number of constrained parameters.

* Note that proving Eq. (6) becomes surprisingly straightforward when going from right to left and making use of Bayes’ theorem (1).
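These identities can be verified numerically on the Gaussian example of Eq. (5). The sketch below is our own illustration (grid quadrature in one dimension rather than sampling): it computes Z, D_KL and d for a single Gaussian-constrained parameter and compares them to the closed forms; for one constrained parameter the dimensionality comes out as d ≈ 1.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal quadrature (written out to avoid NumPy-version differences)."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

# Toy model of Eq. (5): uniform prior on [x_min, x_max], Gaussian likelihood
# normalised so that L(mu) = 1, hence everything is known in closed form.
x_min, x_max, mu, sigma = 0.0, 10.0, 3.0, 0.1
x = np.linspace(x_min, x_max, 400_001)
logL = -(x - mu)**2 / (2 * sigma**2)
prior = np.full_like(x, 1.0 / (x_max - x_min))

Z = integrate(np.exp(logL) * prior, x)               # Eq. (3): marginal likelihood
post = np.exp(logL) * prior / Z                      # Bayes' theorem, Eq. (1)

logL_P = integrate(post * logL, x)                   # posterior average of log-likelihood
D_KL = logL_P - np.log(Z)                            # Occam penalty, rearranged Eq. (6)
d = 2 * (integrate(post * logL**2, x) - logL_P**2)   # Eqs. (7)-(8)

Z_exact = sigma * np.sqrt(2 * np.pi) / (x_max - x_min)        # Eq. (5) with L(mu) = 1
D_KL_exact = np.log((x_max - x_min) / (sigma * np.sqrt(2 * np.pi * np.e)))
print(Z, Z_exact, D_KL, D_KL_exact, d)               # d is close to 1: one constrained parameter
```

The decomposition makes the trade-off explicit: widening the prior leaves the fit term ⟨ln L⟩_P untouched but increases D_KL, lowering the evidence by exactly the extra Occam penalty.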
TABLE I. Cosmological parameters of the base ΛCDM cosmology as they are sampled in our Bayesian analysis. The second column shows their corresponding prior ranges. The third column lists their mean and 68 % limits from our base ΛCDM nested sampling run with TT,TE,EE+lowE data from Planck 2018 and is in almost perfect agreement with table 2 in [7].

Parameter        Prior range       68 % limits
ω_b = Ω_b h²     [0.019, 0.025]    …
ω_c = Ω_c h²     [0.025, 0.471]    …
100 θ_s          [1.03, 1.05]      …
τ_reio           [0.01, 0.40]      …
ln(10¹⁰ A_s)     [2.5, 3.7]        …
n_s              [0.885, 1.040]    …

C. Cosmological models
In the following sections we perform Bayesian model comparisons on one-parameter extensions to the ΛCDM model (a universe dominated today by a cosmological constant Λ and by cold dark matter), which we parametrise with the six standard cosmological parameters listed in Table I with their corresponding prior ranges.

In Section III we extend the ΛCDM model by the tensor-to-scalar ratio r of primordial perturbations, which is set to r = 0 in ΛCDM. In Section IV we extend the base model by allowing for three distinct neutrino masses. In the ΛCDM model these are typically fixed to two massless neutrinos and a single massive neutrino with m_ν = 0.06 eV.
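Given the lightest mass, the other two masses follow from the measured mass-squared splittings. The sketch below is our own illustration of that reconstruction; the splitting values are rounded, NuFIT-like numbers for orientation only, not the exact inputs used in this paper.

```python
import numpy as np

# Approximate mass-squared splittings (eV^2), of the order measured by oscillation
# experiments; rounded illustrative values, not the exact NuFIT 5.0 inputs.
DM21 = 7.4e-5    # "solar" splitting,  m2^2 - m1^2
DM3L = 2.5e-3    # "atmospheric" splitting (magnitude)

def masses(m_lightest, hierarchy="NH"):
    """Individual neutrino masses (eV) from the lightest mass and the splittings."""
    if hierarchy == "NH":                    # normal hierarchy: m1 < m2 << m3
        m1 = m_lightest
        m2 = np.sqrt(m1**2 + DM21)
        m3 = np.sqrt(m1**2 + DM3L)
    else:                                    # inverted hierarchy: m3 << m1 < m2
        m3 = m_lightest
        m2 = np.sqrt(m3**2 + DM3L)
        m1 = np.sqrt(m2**2 - DM21)
    return m1, m2, m3

# Minimal sums: roughly 0.06 eV (NH) and 0.10 eV (IH), which is why the ΛCDM
# baseline fixes a single massive neutrino to 0.06 eV.
print(sum(masses(0.0, "NH")), sum(masses(0.0, "IH")))
```

Because only one mass scale is free once the splittings are fixed, this is also the sense in which the three masses are "linked" as discussed in the introduction.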
D. Data
We use the 2018 temperature and polarisation data from the Planck satellite [43], which we abbreviate as “TT,TE,EE+lowE”. Note that this is the same abbreviation as in the corresponding Planck publication itself. The specific use of “lowE” but lack of “lowT” might lead to the conclusion that only E-mode and no temperature data were used at low multipoles. However, this is not the case. Both high-ℓ and low-ℓ temperature auto-correlation data are implied in that abbreviation.

In Section IV we additionally use the NuFIT 5.0 (2020) data from neutrino oscillation experiments [44–46] to set Gaussian priors on the mass-squared splittings δm² and ∆m².

E. Statistical and cosmological software
We explore the posterior distributions of cosmological and nuisance parameters using Cobaya [47], which provides both the MCMC sampler developed for CosmoMC [48, 49] with a “fast dragging” procedure described in [50] and also the nested sampling code PolyChord [35, 36], tailored for high-dimensional parameter spaces, which can simultaneously determine the Bayesian evidence alongside its numerical uncertainty. Both samplers are interfaced with the cosmological Boltzmann code CLASS [51–53], which computes the theoretical CMB power spectra for temperature and polarisation modes.

We use GetDist [54] to generate the data tables of marginalised parameter values. The post-processing of the nested sampling output for the computation of Bayesian evidence, KL-divergence and Bayesian model dimensionality, as well as the plotting functionality for posterior contours, is performed using the python module anesthetic [55]. All inference products required to compute the results presented in this paper are available for download from Zenodo [56].

FIG. 1. Stability of the cosmological parameters for the tensor-to-scalar ratio extension of the base ΛCDM cosmology with different priors on r: uniform in blue, logarithmic with lower bound 10⁻⁵ in orange and 10⁻¹⁰ in red. For each parameter we show the mean and the extent from quantile 0.16 to 0.84, i.e. the inner 68 % limits.
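As a complement to these software references, the following self-contained toy is our own sketch of the core idea behind nested sampling (it is not PolyChord's algorithm): live points compress the prior volume geometrically, and the evidence integral is accumulated from the discarded points. The direct sampling of the likelihood-constrained prior below exploits the unimodal 1D setup and is a luxury real samplers do not have.

```python
import numpy as np

# Toy problem: 1D Gaussian likelihood, uniform prior on [0, 10].
rng = np.random.default_rng(0)
x_min, x_max, mu, sigma = 0.0, 10.0, 3.0, 0.1

def loglike(x):
    return -(x - mu)**2 / (2 * sigma**2)

n_live = 500
live = rng.uniform(x_min, x_max, n_live)
live_logL = loglike(live)

logZ = -np.inf
for i in range(1, 10_000):
    k = np.argmin(live_logL)
    # prior volume shrinks as X_i ~ exp(-i/n_live); dead point gets weight X_{i-1} - X_i
    logw = -(i - 1) / n_live + np.log(1.0 - np.exp(-1.0 / n_live))
    logZ = np.logaddexp(logZ, live_logL[k] + logw)
    # replace the dead point by a prior sample with higher likelihood; here the
    # constrained region {x : logL(x) > logL_min} is just an interval around mu
    r = sigma * np.sqrt(-2.0 * live_logL[k])
    live[k] = rng.uniform(max(x_min, mu - r), min(x_max, mu + r))
    live_logL[k] = loglike(live[k])

# spread the remaining live points over the final prior volume
m = live_logL.max()
logZ = np.logaddexp(logZ, m + np.log(np.mean(np.exp(live_logL - m))) - 9_999 / n_live)

logZ_true = np.log(sigma * np.sqrt(2 * np.pi) / (x_max, x_min)[0] if False else sigma * np.sqrt(2 * np.pi) / (x_max - x_min))
print(logZ, logZ_true)   # agree to within ~ sqrt(D_KL / n_live)
```

The scatter of such runs around the true value is what produces the sampling-error distributions shown for ln Z in Fig. 4; the statistical uncertainty scales roughly as √(D_KL / n_live).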
III. TENSOR-TO-SCALAR RATIO
The tensor-to-scalar ratio r quantifies what fraction of primordial perturbations is in the form of gravitational waves, produced e.g. during cosmic inflation and potentially detectable via their contribution to CMB B-modes. So far, the major experiments probing the contribution of tensor modes to the CMB power spectrum have adopted a uniform prior on r [7, 8]. However, the common target of r ∼ 10⁻³ for many upcoming CMB experiments such as the Simons Observatory, the LiteBIRD satellite or CMB-S4 warrants the question as to whether a scale invariant prior might be better suited to handle such low values. This question frequently brings up arguments about the ambiguity of the lower bound of a logarithmic prior and its potential effect on the Bayesian evidence.

A. Tensor-to-scalar ratio: Posteriors
Figure 1 gives an overview of the stability of the cosmological base parameters across the different priors for r and compares them to the ΛCDM base model by showing their mean and 68 % ranges. In addition to the ΛCDM base run, we have taken nested sampling runs with both a uniform prior on the tensor-to-scalar ratio, r ∼ U(0, 1), and with two logarithmic priors with different lower bounds, log r ∼ U(−5, 0) and log r ∼ U(−10, 0). Figure 2 focuses on the spectral index n_s and the tensor-to-scalar ratio r (or log r) in particular by showing their one-dimensional marginalised posterior distributions. Figure 3 shows the corresponding two-dimensional joint probability contours of the 68 % and 95 % levels for n_s and r (or log r). We have included shaded histograms in the 1d plots and scatter points in the 2d plots to give a notion of the prior distributions.

As already expected from Fig. 1, the marginalised posterior for the spectral index is near identical, irrespective of the prior on r. The tensor-to-scalar ratio in the right panel of Fig. 2 drops off exponentially from r = 0 to larger values, thereby significantly compressing the prior, which spans up to unity. When sampling logarithmically, the posterior levels off towards small scales and shows a step-like behaviour at the upper bound.

We have included the kernel density estimate from the uniform r-samples in the log r plot and vice versa (dotted lines). This allows us to compare more directly what sort of numerical values were actually used in those two cases. At a first naive glance one might be concerned that the dotted blue line actually indicates a lower bound; however, looking at the blue shaded histogram in the 1d plot or the blue scatter points in the 2d plot, it becomes clear that this is entirely prior driven and reflects that uniform sampling of r does not reach such low values (see also [57] on a related discussion about the importance of adjusting the density when setting the x-scale to ‘log’). With a target of r ∼ 10⁻³ this highlights how the parameter space is sampled rather inefficiently at those low values of interest when applying a uniform prior, which would be an argument for adopting a logarithmic prior in the future.

FIG. 2. Normalised one-dimensional posterior distributions for Planck 2018 TT,TE,EE+lowE data for the spectral index n_s and the tensor-to-scalar ratio r of primordial perturbations, contrasting the difference between using a uniform (blue) or logarithmic (orange and red) prior on r. The shaded histograms illustrate the prior distributions. Note that the dotted lines show the inferred parameters r and log r in the respective opposite domain. This is done only to provide a more direct visual comparison. However, these dotted contours are not data-driven parameter constraints. In particular, the blue dotted line results purely from a lack of small prior samples when sampling uniformly over r, and does not in fact constitute a lower bound on the tensor-to-scalar ratio.

One problem to be aware of with the unconstrained posteriors from a logarithmic prior is that upper bounds in the form of e.g. 95 % limits will change with the lower bound on the logarithmic parameter: the smaller the lower prior bound, the smaller also the upper posterior bound. This lack of a stable posterior bound is a result of the definition via percentiles, a notion inspired by a normal distribution. For other types of distributions, such as the step-like posteriors seen in the middle panel of Fig. 2, percentiles of that sort are not the ideal measure for an upper bound. For such a step-like posterior a better alternative would be to quantify the position of the step directly, e.g. where the posterior drops to some fraction of its plateau value. In the case that an exponential distribution provides a good fit to the non-logarithmic parameter (see the mock example in the following Section III C), the parameter value where the posterior is 1/e times its maximum turns out to be a stable choice, which corresponds to the mean of the exponential distribution. Indeed, using that 1/e measure for the step position we get roughly the same upper bound on the tensor-to-scalar ratio for all prior options:

    r < … ,    log r < − … .    (9)

Note that these are not the habitually quoted 95 % upper bounds on the tensor-to-scalar ratio. For the uniform sampling run of r, this limit is closer to roughly an 80 % upper bound. Note further that the choice of the 1/e fraction provides a particularly stable bound, because of its connection to the mean of the exponential distribution.

FIG. 3. Two-dimensional version of Fig. 2 showing the 68 % and 95 % levels of the posterior contours for Planck 2018 TT,TE,EE+lowE data for spectral index n_s and tensor-to-scalar ratio r, where again a uniform (blue) or logarithmic (orange) prior on r was used. The scattered dots give a notion of the prior distribution. Note that the dotted lines are not true constraints, as explained in Fig. 2. The thin black line divides the n_s–r parameter space into regions of convex and concave inflationary potentials.

B. Tensor-to-scalar ratio: Evidence and Kullback–Leibler divergence
Nested sampling provides us with distributions for the log-evidence ln Z, KL-divergence D_KL and Bayesian model dimensionality d in the same way as for the posterior of free model parameters, which can be calculated straightforwardly using anesthetic’s analysis tools for nested sampling output [55]. Figure 4 shows the contours for those quantities in a triangle plot. We have normalised all quantities with respect to the base ΛCDM model, such that e.g. for the log-evidence we have:

    ∆ ln Z = ln Z − ln Z_ΛCDM.    (10)

Table II lists the summary statistics for the quantities from Fig. 4.

FIG. 4. Effect of uniform vs logarithmic priors on Bayesian model comparison for the tensor-to-scalar ratio r: log-evidence ∆ ln Z, Kullback–Leibler divergence D_KL (in nats), Bayesian model dimensionality d, and posterior average of the log-likelihood ⟨ln L⟩_P = ln Z + D_KL. The probability distributions represent errors arising from the nested sampling process. In the limit of infinite live points these distributions would become point statistics, in contrast to posterior distributions. We normalise with respect to the ΛCDM model without r (i.e. with r = 0). Note how switching from uniform to logarithmic sampling of r (i.e. from blue to orange/red) moves the contours along their (ln Z, D_KL) degeneracy line, i.e. relative entropy is traded in for evidence. Note further, by comparison of the orange and red lines, how changing the lower bound of the logarithmic sampling interval (by 5 log-units) barely affects the contours (bar some expected statistical fluctuation due to the sampling error).

TABLE II. Mean and standard deviation of the log-evidence ln Z, Kullback–Leibler divergence D_KL and Bayesian model dimensionality d of the base ΛCDM cosmology and its r extension from Planck 2018 TT,TE,EE+lowE data [43]. The ∆ indicates normalisation with respect to the base ΛCDM model.

Model                    ln Z        D_KL          d        ∆ ln Z      ∆ D_KL       ∆ d
ΛCDM                     … ± 0.20    38.… ± 0.20   17.…     … ± 0.20    0.… ± 0.20   0.…
ΛCDM + r ∼ U(0, 1)       … ± 0.23    40.… ± 0.23   18.…     … ± 0.23    2.… ± 0.23   0.…
ΛCDM + log r ∼ U(−5, 0)  … ± 0.23    39.… ± 0.23   17.…     … ± 0.23    0.… ± 0.23   0.…
ΛCDM + log r ∼ U(−10, 0) … ± 0.23    38.… ± 0.22   17.…     … ± 0.23    0.… ± 0.22   0.…

The marginalised plot for the difference in log-evidence (topmost panel), with ∆ ln Z ≈ −3 ± 0.23 for the r-extension of ΛCDM (corresponding to the odds of about 1 : 20 quoted above), shows that it is considerably disfavoured when applying a uniform prior. However, switching from a uniform to a logarithmic prior negates the difference in log-evidence completely, such that the log r extension ends up almost on par with the base ΛCDM model.

Changing the lower bound for the logarithmic prior, on the other hand, barely affects the evidence value at all. We have performed a run with a lower bound of log r = −10 instead of log r = −5, i.e. five orders of magnitude difference in r. Despite this large difference in the lower bound, the corresponding log-evidence ln Z changes only very little, such that the distributions significantly overlap one another. As explained in Section II A, this is due to log r being unconstrained below a certain threshold and the Bayesian evidence picking up only on constrained parameters. This can seem counter-intuitive, since the Bayesian evidence is generally understood to automatically penalise additional parameters. The key point is that the Occam penalty essentially enters into the Bayesian evidence in the form of the ratio of posterior to prior volume. If both volumes are the same, then they divide out and do not contribute to the Occam penalty.

The last point becomes clearer by also taking into account the KL-divergence and recalling Eq. (6), where we identified D_KL as a measure of the Occam penalty. Looking at the correlation plot between log-evidence and KL-divergence makes it clear that there is a trade-off between those two quantities when switching between uniform and logarithmic priors. While the evidence increases for the logarithmic prior, the KL-divergence decreases, as expected from the posterior plots in Fig. 2, which show how the change from prior to posterior happens only at about log r ≳ −2. This is further reflected in the Bayesian model dimensionality d, which shows a clear growing trend from about d = 17 for the base ΛCDM model via a log r extension to about d = 18 for the r extension, reflecting the one additional sampling parameter. Note that the total number of sampled parameters consists of 6 base cosmological parameters (+1 for the r extension) and 21 nuisance parameters from the Planck likelihood.

Because of the trade-off between log-evidence and KL-divergence, it is interesting also to look at their sum, which from Eq. (6) we know turns out to be the posterior average of the log-likelihood:

    ln Z + D_KL = ⟨ln L⟩_P.    (11)

This makes for an interesting pairing with the Bayesian model dimensionality, since d/2 is the posterior variance of the log-likelihood. As such, these two quantities provide an alternative perspective to that of the evidence and KL-divergence: the posterior average and variance of the log-likelihood are a measure of the fit and complexity respectively. ⟨ln L⟩_P is shown in the last panel in Fig. 4, where we indeed see that the line for uniform sampling of r has moved much closer to the other lines, which is to be expected, since r and log r are fundamentally the same parameter and therefore lead to a similar goodness of fit.

This behaviour can also be understood analytically, which we explore in the following section in a one-dimensional mock example, simulating the r vs log r result.

C. Mock example
To illustrate further the role of a uniform vs a loga-rithmic prior on a Bayesian model comparison, we pro-pose the following mock example, which is loosely basedon the pedagogical example by Sivia and Skilling [37]explaining the effect of an additional (although in thatcase constrained ) parameter, which we already outlinedin Section II A.Here, we will not assume a Gaussian likelihood thatultimately fully constrains a parameter, but rather wewill assume an exponential distribution as our likelihoodon a strictly positive parameter (which is the maximum-entropy distribution when only a mean is known): L ( a ) = P e − a/µ , (12)where P = Pr( D | a = 0) is the maximum likelihoodvalue for the data D at a = 0 and where µ is the meanof the likelihood distribution describing the data. Thus,the likelihood is constrained only on one side, providingan upper bound, as shown in the left panel of Fig. 5.We will assume a model A , where we sample the pa-rameter a uniformly in the interval [ a , a ]. Furthermore,we will assume a model B , where we uniformly samplethe parameter b = log a in the interval [ b , b ], corre-sponding to logarithmically sampling the parameter a .Since both models are fundamentally governed by thesame quantity and will use the same likelihood, any dif-ference in Bayesian inference quantities will be purelyprior driven.We will make the following assumptions on the order-ing of the prior limits:0 = a < b (cid:28) µ (cid:28) b = a = 1 . (13)This ordering is motivated as follows: For the upper limitwe require that the likelihood has essentially droppedto zero. Hence, without loss of generality, we can set . . . . . . r ΛCDM + r L ( r ) = P e − r/µ r µ r ≈ . − − − − − r ΛCDM + log r L (log r ) = P e − r/µ r log µ r ≈ − . FIG. 5. Exponential likelihood distribution from our mock example in Eq. 
(12) compared to Planck 2018 temperature and polarisation data (TT,TE,EE+lowE) on the tensor-to-scalar ratio r, with uniform sampling of r on the left and logarithmic sampling of r on the right. Note how the mean μ_r fulfils the ordering required by Eq. (13) and how the lower limit on log r is well into the saturation plateau of posterior/likelihood.

the upper limit to one and require μ ≪ 1. The lower limit for the positive parameter a can be explicitly set to zero when sampling uniformly. However, when sampling logarithmically we need to pick some finite lower limit, which we require to lie in the region 10^{b_min} ≪ μ, where the likelihood has essentially saturated with respect to b (see right panel in Fig. 5). The dependence of Bayesian quantities such as the evidence Z or the Kullback–Leibler divergence D_KL on the prior choice on the one hand, and on this lower limit b_min on the other, is the goal of this mock example.

The corresponding priors for models A and B can thus be written as

    π_A(a) = Θ(a − a_min) Θ(a_max − a) / (a_max − a_min),    (14)
    π_B(b) = Θ(b − b_min) Θ(b_max − b) / (b_max − b_min),    (15)

where Θ(x) is the Heaviside step function. We can compute the evidence and Kullback–Leibler divergence for models A and B as

    Z_A = ∫ L(a) π_A(a) da
        = (P μ / (a_max − a_min)) (e^{−a_min/μ} − e^{−a_max/μ}),    (16)

    Z_B = ∫ L(10^b) π_B(b) db
        = (P / ((b_max − b_min) ln 10)) [Ei(−10^{b_max}/μ) − Ei(−10^{b_min}/μ)],    (17)

    D_KL,A = ∫ (L(a) π_A(a) / Z_A) ln(L(a)/Z_A) da
           = ln(P/Z_A) − 1 − (P / (Z_A (a_max − a_min))) [a_min e^{−a_min/μ} − a_max e^{−a_max/μ}],    (18)

    D_KL,B = ∫ (L(10^b) π_B(b) / Z_B) ln(L(10^b)/Z_B) db
           = ln(P/Z_B) − (P / (Z_B (b_max − b_min) ln 10)) [e^{−10^{b_min}/μ} − e^{−10^{b_max}/μ}],    (19)

where Ei refers to the exponential integral. With the ordering from Eq. (13) we can then approximate these to give

    Δln Z_A ≈ ln μ ∼ −3,    (20)
    Δln Z_B ≈ ln((log₁₀ μ − b_min) / (b_max − b_min)) ∼ 0,    (21)
    ΔD_KL,A ≈ −Δln Z_A − 1 ∼ 2,    (22)
    ΔD_KL,B ≈ −Δln Z_B ∼ 0,    (23)

where we normalise with respect to a base model O with a = 10^b = 0 fixed, such that Z_O = P and D_KL,O = 0. The numerical values assume μ ≈ 0.06, which is roughly the posterior mean of the tensor-to-scalar ratio under uniform sampling in the preceding section. Hence, we can compare these zeroth-order numerical approximations to the results in Fig. 4 and Table II, which indeed match.

FIG. 6. Dependence of the log-evidence on the lower prior bound b_min: comparison of the results in Eqs. (16) and (17) for the one-dimensional mock example (solid lines) to the nested sampling results from Table II (dots with error bars). The vertical dotted line corresponds to the mean used in the mock likelihood distribution (cf. Fig. 5). (Legend: mock example with r = 0, r ∼ U(0, 1) and log r ∼ U(b_min, 0) as lines; Planck 2018 runs for ΛCDM (r = 0), ΛCDM + r ∼ U(0, 1) and ΛCDM + log r ∼ U(b_min, 0) as dots.)

Figure 6 makes this comparison more thorough, comparing the results from our one-dimensional mock example in Eqs. (16) and (17) with the nested sampling results from Table II for a variable lower bound b_min of the logarithmic prior. The mean ln Z of the base model with r = 0, both for the mock example and for the base ΛCDM nested sampling run, is zero by definition of our normalisation. It serves only as calibration for the models with uniform (blue) and logarithmic (orange) priors. All three nested sampling runs agree well with the prediction from the mock example within their margins of error.
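The closed-form results above are straightforward to check numerically. The following sketch is our own illustration, assuming P = 1, μ = 0.06, a ∈ [0, 1] and b_max = 0 as in the setup above; it evaluates Eqs. (16) to (19) with SciPy's exponential integral and scans the lower bound b_min:

```python
import math
from scipy.special import expi  # exponential integral Ei

P, mu = 1.0, 0.06                 # likelihood amplitude and mean (mu ~ posterior mean of r)
a_min, a_max = 0.0, 1.0           # uniform-prior bounds on a
b_min, b_max = -5.0, 0.0          # logarithmic-prior bounds, b = log10(a)

# Eq. (16): evidence under the uniform prior
Z_A = P * mu / (a_max - a_min) * (math.exp(-a_min / mu) - math.exp(-a_max / mu))

# Eq. (17): evidence under the logarithmic prior
Z_B = P / ((b_max - b_min) * math.log(10)) * (expi(-10**b_max / mu) - expi(-10**b_min / mu))

# Eqs. (18) and (19): Kullback-Leibler divergences
D_A = math.log(P / Z_A) - 1 - P / (Z_A * (a_max - a_min)) * (
    a_min * math.exp(-a_min / mu) - a_max * math.exp(-a_max / mu))
D_B = math.log(P / Z_B) - P / (Z_B * (b_max - b_min) * math.log(10)) * (
    math.exp(-10**b_min / mu) - math.exp(-10**b_max / mu))

# Normalise against the base model O (Z_O = P, D_KL,O = 0), Eqs. (20)-(23)
dlnZ_A, dlnZ_B = math.log(Z_A / P), math.log(Z_B / P)
print(f"dlnZ_A = {dlnZ_A:.2f} ~ ln(mu) = {math.log(mu):.2f}")
print(f"dlnZ_B = {dlnZ_B:.2f} ~ {math.log((math.log10(mu) - b_min) / (b_max - b_min)):.2f}")
print(f"D_A = {D_A:.2f} ~ {-dlnZ_A - 1:.2f},  D_B = {D_B:.2f}")

# Sensitivity to the lower bound: ln Z_B changes ever more slowly as b_min decreases
for bm in (-2, -4, -6, -8, -10):
    Zb = P / ((b_max - bm) * math.log(10)) * (expi(-10**b_max / mu) - expi(-10**bm / mu))
    print(f"b_min = {bm:4d}:  ln Z_B = {math.log(Zb):.3f}")
```

In this sketch the shift in ln Z_B between b_min = −5 and b_min = −10 is only about 0.2, comparable to typical nested sampling errors, in line with the saturation behaviour discussed around Fig. 6.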
Figure 6 illustrates how the evidence levels off with regard to the choice of the lower bound of the logarithmic prior (orange line), which is also reflected in the near-equal evidences of the nested sampling runs with lower prior bounds of −5 and −10 respectively. Note that the good agreement between mock example and data in Fig. 6 is due to the fact that the tensor-to-scalar ratio is almost completely uncorrelated with the other cosmological parameters, with the biggest (yet still small) correlation coming from the spectral index n_s (cf. Fig. 3).

IV. NEUTRINO MASSES
In Planck's baseline cosmology, the neutrinos are assumed to comprise two massless neutrinos and one massive neutrino with mass m_ν = 0.06 eV, with the effective number of neutrino species set slightly larger than 3, to N_eff = 3.046 [7, 58, 59].

Upcoming CMB experiments such as the Simons Observatory, LiteBIRD or CMB-S4 and large-scale structure (LSS) experiments such as Euclid will allow us to fully constrain the sum of neutrino masses Σm_ν. However, even under the most optimistic assumptions, it will not be possible to disentangle the individual contributions of the three neutrino flavours with cosmological data alone [16]. To achieve that, we need additional data from solar, atmospheric, reactor and accelerator experiments as summarised in NuFIT 5.0 (2020) [45, 46], which provide us with the mass squared splittings:

    δm² = 7.42^{+0.21}_{−0.20} × 10⁻⁵ eV²   (NH & IH),    (24)

    Δm² = 2.517^{+0.026}_{−0.028} × 10⁻³ eV²   (NH),
          2.498^{+0.028}_{−0.028} × 10⁻³ eV²   (IH),      (25)

where δm² is the smaller squared mass splitting, between the light and the medium neutrino mass for the normal neutrino hierarchy (NH) and between the medium and the heavy neutrino mass for the inverted neutrino hierarchy (IH), and Δm² is the larger squared mass splitting, between the light and the heavy neutrino mass in both cases.

With the knowledge of the two squared mass splittings, the remaining uncertainty lies mostly with the scale of the lightest neutrino. In the following Bayesian analysis we therefore apply Gaussian priors according to Eqs. (24) and (25) and vary the lightest neutrino mass.

A. Neutrino masses: Posteriors
We have performed nested sampling runs for an extension of the base ΛCDM cosmology with three individual neutrino masses, where we have used both a uniform prior m_light ∼ U(0, 1) and logarithmic priors with different lower bounds, log m_light ∼ U(−5, 0) and log m_light ∼ U(−10, 0), with masses in eV. The medium and heavy neutrino masses are derived from m_light together with δm² and Δm² from Eqs. (24) and (25):

    m²_medium = m²_light + δm²           (NH),
                m²_light + Δm² − δm²     (IH),    (26)

    m²_heavy  = m²_light + Δm².                   (27)
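The mapping from the light neutrino mass to the derived masses is simple to evaluate. The following sketch is our own, using the NuFIT 5.0 central values δm² = 7.42 × 10⁻⁵ eV² and Δm² = 2.517 (2.498) × 10⁻³ eV² for NH (IH):

```python
import math

# NuFIT 5.0 central values of the mass squared splittings, in eV^2
dm2 = 7.42e-5                            # smaller splitting (NH & IH)
Dm2 = {"NH": 2.517e-3, "IH": 2.498e-3}   # larger splitting

def neutrino_masses(m_light, hierarchy):
    """Derived medium and heavy neutrino masses in eV, following Eqs. (26) and (27)."""
    m2 = m_light**2
    m_heavy = math.sqrt(m2 + Dm2[hierarchy])          # Eq. (27)
    if hierarchy == "NH":
        m_medium = math.sqrt(m2 + dm2)                # Eq. (26), NH branch
    else:
        m_medium = math.sqrt(m2 + Dm2["IH"] - dm2)    # Eq. (26), IH branch
    return m_medium, m_heavy

# Lower limits on the derived masses in the limit m_light -> 0
for h in ("NH", "IH"):
    m_med, m_heavy = neutrino_masses(0.0, h)
    print(f"{h}: m_medium >= {m_med:.3f} eV, m_heavy >= {m_heavy:.3f} eV, "
          f"sum >= {m_med + m_heavy:.3f} eV")
```

With m_light → 0 this reproduces the rough lower limits indicated by the dotted lines in Fig. 8: about 0.009 eV and 0.050 eV (Σ ≈ 0.059 eV) for NH, and about 0.049 eV and 0.050 eV (Σ ≈ 0.099 eV) for IH.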
FIG. 7. Stability of the cosmological parameters (Ω_b h², Ω_c h², θ_s, τ_reio, log(10¹⁰ A_s), n_s) for the 3-neutrino extension of the base ΛCDM cosmology for different priors on m_light: uniform in blue, logarithmic with lower bounds of −5 and −10 in red. The darker set of colours corresponds to the normal neutrino hierarchy (NH) and the lighter set to the inverted hierarchy (IH). For each parameter we show the mean and the extent from quantile 0.16 to 0.84, i.e. the inner 68 % limits.

Figure 7 gives an overview of the stability of the cosmological base parameters across the different priors on m_light and compares them to the ΛCDM base model by showing their mean and 68 % ranges. Compared to Fig. 1 for the tensor-to-scalar ratio there are some small parameter shifts visible relative to the base ΛCDM model, but all shifts stay well within the 68 % bounds.

Figure 8 shows the one-dimensional marginalised posterior distributions for the three individual neutrino masses m_light, m_medium, and m_heavy, as well as the sum of all three, Σm_ν, for both the normal and the inverted hierarchy. We have included shaded histograms to give a notion of the prior distributions. The vertical black dotted lines indicate roughly the lower bound on the medium and heavy neutrino mass as determined from the mass squared splittings under the assumption that the light neutrino mass is zero.

When looking at the lightest neutrino mass in the first row, the picture is very similar to that for the tensor-to-scalar ratio before, and most of what we have said in Section III A applies here, too. One has an almost exponential drop-off from zero when sampling uniformly over the mass (left column), significantly compressing the prior, which turns into a more step-like behaviour with respect to the logarithm of the mass when sampling the mass logarithmically (right column).

Note that the medium and heavy mass from rows 2 and 3, as well as the sum of all masses in the bottom row, are derived quantities and therefore do not show the same prior behaviour visible for the light neutrino mass. This is not so apparent when sampling uniformly over the light neutrino mass, although one can see a slight step in the histogram of the prior for the heavy mass in the NH case and for both the medium and heavy mass in the IH case. When sampling logarithmically over the light neutrino mass, however, the picture is much clearer.
The probability density for the medium and heavy neutrino mass bulks up around the rough lower limit set by the smaller and larger mass squared splitting, respectively.

There are two perspectives that one can adopt here. On the one hand, one could criticise the choice of a logarithmic prior for being ultimately too prior (or theory) driven and not reflective of the data. On the other hand, one could say that this is the natural result of our state of knowledge of the mass squared splittings and our true ignorance about the scale of the lightest neutrino mass. One might contest this last statement, e.g. by arguing that we should expect the lightest neutrino mass to be of a magnitude similar to that of the medium neutrino mass in the NH. However, there is no precedent for such an expectation: looking at the other set of leptons, the electron, muon and tauon, their masses are separated by roughly two orders of magnitude [60].

Comparing the two hierarchies with one another, we can see that the major difference lies in the medium neutrino mass (and therefore also the sum of all neutrino masses), which is restricted to larger masses in the inverted hierarchy compared to the normal hierarchy, as expected from the mass squared splitting (black dotted lines).

B. Neutrino masses: Evidence and Kullback–Leibler divergence
FIG. 8. One-dimensional posterior distributions of neutrino masses with normal hierarchy (NH) in the top panel and with inverted hierarchy (IH) in the bottom panel for TT,TE,EE+lowE data from Planck 2018 and neutrino oscillation data on the mass squared splittings from NuFIT 5.0 (2020). The vertical black dotted lines give the rough lower limit on medium and heavy mass that is set by the mass squared splittings δm² and Δm²; for the inverted hierarchy these dotted lines appear almost on top of each other. The rows show the posteriors for the light, medium, and heavy neutrino mass and the sum of all neutrino masses, respectively. The columns contrast the difference between using a uniform (blue, left) or logarithmic (orange and red, right) prior on the light neutrino mass m_light. The shaded histograms give a notion of that prior distribution. (Panel annotations: NH: m_medium ≳ √δm² ≈ 0.009 eV, m_heavy ≳ √Δm² ≈ 0.050 eV, Σm_ν ≳ √δm² + √Δm² ≈ 0.059 eV; IH: m_medium ≳ √(Δm² − δm²) ≈ 0.049 eV, m_heavy ≳ √Δm² ≈ 0.050 eV, Σm_ν ≳ 0.099 eV.)

FIG. 9. Effect of uniform vs logarithmic priors on the light neutrino mass m_light for Bayesian model comparison: log-evidence Δln Z, Kullback–Leibler divergence D_KL, Bayesian model dimensionality d, and posterior average of the log-likelihood ⟨ln L⟩_P = ln Z + D_KL. The probability distributions represent errors arising from the nested sampling process; in the limit of infinite live points these distributions would become point statistics, in contrast to posterior distributions. We normalise with respect to the ΛCDM model with two massless and only one massive neutrino with m_ν = 0.06 eV. Note how switching from uniform to logarithmic sampling of m_light moves the contours along their ln Z, D_KL degeneracy line, i.e. relative entropy is traded in for evidence. Note further, by comparison of the orange and red lines, how changing the lower bound of the logarithmic sampling interval (by 5 log-units!) barely affects the contours (bar some expected statistical fluctuation due to the sampling error). (Legend: Planck 2018 TT,TE,EE+lowE (+ NuFIT 5.0); for each hierarchy m_light ∼ U(0, 1), log m_light ∼ U(−5, 0) and log m_light ∼ U(−10, 0), with the ΛCDM reference m_ν = 0.06 eV.)

In Fig. 9 we show the results from our nested sampling runs for the log-evidence ln Z, KL-divergence D_KL, Bayesian model dimensionality d and posterior average of the log-likelihood ⟨ln L⟩. We again normalise with respect to the base ΛCDM model. Table III lists the summary statistics for these quantities. As was already the case for the posterior, the picture here is similar to that for the tensor-to-scalar ratio in Section III B.

The distributions for the log-evidence (topmost diagonal panel) show that the addition of the neutrino parameters with uniform sampling over the light neutrino mass (either hierarchy) is disfavoured by over 3 log-units compared to the base ΛCDM model with a single massive neutrino of fixed mass (and two massless ones). Since the mass squared splittings enter on the prior level in our analysis and remain essentially unconstrained by the cosmological data, any change to the evidence is almost entirely driven by the light neutrino mass parameter. Hence, it is not surprising that upon switching to a logarithmic prior on m_light the log-evidence increases again while the KL-divergence drops close to the level of the ΛCDM model. We need to keep in mind that since this is an extension of the ΛCDM model, it has in principle a better chance of fitting the data, such that any difference in the Bayesian evidence can be attributed to an Occam penalty, which the shift between uniform and logarithmic sampling confirms.

As expected from our investigations for the tensor-to-scalar ratio, and especially with regard to our mock example from Section III C, changing the lower bound for the logarithmic prior does not affect the Bayesian evidence. We have again performed runs with two different lower bounds of log m_light = −5 and log m_light = −10. For the posterior average of the log-likelihood ⟨ln L⟩_P = ln Z + D_KL we again roughly confirm

    Δln Z_uni + ΔD_KL,uni ≈ −1,    (28)
    Δln Z_log + ΔD_KL,log ≈ 0,     (29)

matching our mock results from Eqs. (20) to (23), independent of the mock parameter μ.

TABLE III. Mean and standard deviation of the log-evidence ln Z, Kullback–Leibler divergence D_KL and Bayesian model dimensionality d of the base ΛCDM cosmology and its 3-neutrino extension from Planck 2018 TT,TE,EE+lowE data [43]. The second block of rows shows the results for the normal neutrino hierarchy and the third block for the inverted hierarchy. The Δ columns indicate normalisation with respect to the base ΛCDM model.

Model                             ln Z         D_KL         d          Δln Z        ΔD_KL       Δd
ΛCDM                              −… ± 0.19    38.… ± 0.19  17.… ± …   −0.… ± 0.19  0.… ± 0.19  0.… ± …
normal hierarchy:
ΛCDM + m_light ∼ U(0, 1)          −… ± 0.27    40.… ± 0.27  18.… ± …   −… ± 0.27    2.… ± 0.27  1.… ± …
ΛCDM + log m_light ∼ U(−5, 0)     −… ± 0.22    38.… ± 0.22  17.… ± …   −0.… ± 0.22  0.… ± 0.22  0.… ± …
ΛCDM + log m_light ∼ U(−10, 0)    −… ± 0.22    39.… ± 0.22  17.… ± …   −0.… ± 0.22  0.… ± 0.22  0.… ± …
inverted hierarchy:
ΛCDM + m_light ∼ U(0, 1)          −… ± 0.27    40.… ± 0.27  18.… ± …   −… ± 0.27    2.… ± 0.27  1.… ± …
ΛCDM + log m_light ∼ U(−5, 0)     −… ± 0.23    39.… ± 0.23  17.… ± …   −0.… ± 0.23  0.… ± 0.23  0.… ± …
ΛCDM + log m_light ∼ U(−10, 0)    −… ± 0.22    38.… ± 0.22  17.… ± …   −0.… ± 0.22  0.… ± 0.22  0.… ± …

C. Neutrino hierarchy
A Bayesian model comparison of the normal vs the inverted neutrino hierarchy is beyond the scope of this paper and has been performed before with more stringent data [17, 61, 62]. However, with posteriors and evidences at hand, we shall briefly discuss the situation here.

There have been claims of a strong preference for the normal over the inverted neutrino hierarchy [10]; however, such strong evidence can typically be traced back to prior volume effects [11], i.e. the effect of a reduced sampling space for the inverted hierarchy. In other words, we need to distinguish carefully to what extent any Bayesian preference is assigned already at the prior level and to what extent that preference is indeed driven by the data.

In our analysis both hierarchies start out on an equal footing. With the same prior on the light neutrino mass and equivalent Gaussian priors on the mass squared splittings from neutrino oscillation experiments, the prior volume for both hierarchies is essentially the same. Note that although the means for the larger mass squared splitting Δm² are slightly different in the two hierarchies, its standard deviations are essentially the same.

For all prior options there is a slight tendency towards a better fit of the normal compared to the inverted neutrino hierarchy. However, with an evidence difference of less than one log-unit (odds of at most 2 : 1) any preference for the normal hierarchy is meagre at best, especially when also accounting for the sampling error (see Fig. 9). It should be noted, though, that we have used only CMB temperature and polarisation data here. Adding data from CMB lensing or baryon acoustic oscillations would have further shrunk the constraints on the sum of neutrino masses and thereby possibly strengthened the case for the normal hierarchy.

V. DISCUSSION
We demonstrate how switching between a uniform and a logarithmic prior on a single-bounded model parameter results in a trade-off between Bayesian evidence and Kullback–Leibler divergence (or relative entropy). The common scenario is that of insufficient data sensitivity, leading to a one-sided bound on a parameter. For a location parameter this typically causes an exponential drop-off, which translates to a step-like behaviour when turned into the corresponding scale parameter. We show that the ambiguity of the lower bound of the scale parameter does not affect a Bayesian model comparison, provided the lower bound is chosen sufficiently far into the likelihood plateau.

We demonstrate this behaviour for two cases of parameter extensions to the ΛCDM model of cosmology, namely for the tensor-to-scalar ratio of primordial perturbations and for the case of three non-degenerate neutrino masses. In both cases we confirm that switching from a uniform prior to a logarithmic prior gets rid of (most of) the Occam penalty associated with that parameter, since unconstrained parameters do not affect the Bayesian evidence. Thus the Bayesian evidence is roughly on par with that of the un-extended (base) model, with the only difference in the form of an uninformative parameter. Furthermore, and for the same reason, the exact choice of the lower bound for the logarithmic prior does not change the Bayesian evidence: when the likelihood levels off, e.g. due to insufficient sensitivity in the data, then so does the Bayesian evidence.
ACKNOWLEDGMENTS

[1] J. Bernoulli, Ars conjectandi (Thurnisiorum, Basileae, 1713).
[2] J. M. Keynes, A Treatise On Probability (Macmillan And Co., 1921).
[3] D. S. Sivia and J. Skilling, Data analysis: a Bayesian tutorial (Oxford University Press, 2006) pp. 1–246.
[4] K. Lau, J.-Y. Tang, and M.-C. Chu, Cosmic microwave background constraints on the tensor-to-scalar ratio, Research in Astronomy and Astrophysics, 635 (2014).
[5] G. Barenboim and W.-I. Park, On the tensor-to-scalar ratio in large single-field inflation models (2015), arXiv:1509.07132.
[6] P. Creminelli, S. Dubovsky, D. López Nacir, M. Simonović, G. Trevisan, G. Villadoro, and M. Zaldarriaga, Implications of the scalar tilt for the tensor-to-scalar ratio, Physical Review D, 123528 (2015).
[7] Planck Collaboration, Planck 2018 results. VI. Cosmological parameters, Astronomy & Astrophysics, A6 (2020).
[8] The Keck Array and BICEP2 Collaborations, Constraints on Primordial Gravitational Waves Using Planck, WMAP, and New BICEP2/Keck Observations through the 2015 Season, Physical Review Letters, 221301 (2018).
[9] K. Hirano, Inflation with very small tensor-to-scalar ratio (2019), arXiv:1912.12515.
[10] F. Simpson, R. Jimenez, C. Pena-Garay, and L. Verde, Strong Bayesian evidence for the normal neutrino hierarchy, Journal of Cosmology and Astroparticle Physics (06), 029.
[11] T. Schwetz, K. Freese, M. Gerbino, E. Giusarma, S. Hannestad, M. Lattanzi, O. Mena, and S. Vagnozzi, Comment on "Strong Evidence for the Normal Neutrino Hierarchy" (2017), arXiv:1703.04585.
[12] A. Caldwell, A. Merle, O. Schulz, and M. Totzauer, Global Bayesian analysis of neutrino mass data, Physical Review D, 073001 (2017).
[13] F. Capozzi, E. Di Valentino, E. Lisi, A. Marrone, A. Melchiorri, and A. Palazzo, Global constraints on absolute neutrino masses and their ordering, Physical Review D, 096014 (2017).
[14] A. Loureiro, A. Cuceu, F. B. Abdalla, B. Moraes, L. Whiteway, M. McLeod, S. T. Balan, O. Lahav, A. Benoit-Lévy, M. Manera, R. P. Rollins, and H. S. Xavier, Upper Bound of Neutrino Masses from Combined Cosmological Observations and Particle Physics Experiments, Physical Review Letters, 081301 (2019).
[15] F. Capozzi, E. Di Valentino, E. Lisi, A. Marrone, A. Melchiorri, and A. Palazzo, Addendum to "Global constraints on absolute neutrino masses and their ordering", Physical Review D, 116013 (2020).
[16] M. Archidiacono, S. Hannestad, and J. Lesgourgues, What will it take to measure individual neutrino mass states using cosmology? (2020), arXiv:2003.03354.
[17] S. R. Choudhury and S. Hannestad, Updated results on neutrino mass and mass hierarchy from cosmology with Planck 2018 likelihoods, Journal of Cosmology and Astroparticle Physics (07), 037.
[18] P. Stöcker, C. Balázs, S. Bloor, T. Bringmann, T. E. Gonzalo, W. Handley, S. Hotinli, C. Howlett, F. Kahlhoefer, J. J. Renk, P. Scott, A. C. Vincent, and M. White (The GAMBIT Cosmology Workgroup), Strengthening the bound on the mass of the lightest neutrino with terrestrial and cosmological experiments (2020), arXiv:2009.03287.
[19] The Simons Observatory Collaboration, The Simons Observatory: science goals and forecasts, Journal of Cosmology and Astroparticle Physics (02), 056, arXiv:1808.07445.
[20] The LiteBIRD Collaboration, Astro2020 APC White Paper: LiteBIRD: an all-sky cosmic microwave background probe of inflation, Bulletin of the AAS (2019).
[21] CMB-S4 Collaboration, CMB-S4 Science Case, Reference Design, and Project Plan (2019), arXiv:1907.04473.
[22] D. J. C. MacKay, Model Comparison and Occam's Razor, in Information Theory, Inference and Learning Algorithms (Cambridge University Press, 2003) Chap. 28, pp. 343–355.
[23] D. J. C. MacKay, Laplace's Method, in Information Theory, Inference and Learning Algorithms (Cambridge University Press, 2003) Chap. 27, pp. 341–342.
[24] J. M. Dickey, The Weighted Likelihood Ratio, Linear Hypotheses on Normal Location Parameters, The Annals of Mathematical Statistics, 204 (1971).
[25] R. Trotta, Applications of Bayesian model selection to cosmological parameters, Monthly Notices of the Royal Astronomical Society, 72 (2007).
[26] R. Trotta, Forecasting the Bayes factor of a future observation, Monthly Notices of the Royal Astronomical Society, 819 (2007).
[27] L. Verde, S. M. Feeney, D. J. Mortlock, and H. V. Peiris, (Lack of) Cosmological evidence for dark radiation after Planck, Journal of Cosmology and Astroparticle Physics (09), 013.
[28] A. Heavens, Y. Fantaye, A. Mootoovaloo, H. Eggers, Z. Hosenie, S. Kroon, and E. Sellentin, Marginal Likelihoods from Monte Carlo Markov Chains (2017), arXiv:1704.03472.
[29] A. Heavens, Y. Fantaye, E. Sellentin, H. Eggers, Z. Hosenie, S. Kroon, and A. Mootoovaloo, No Evidence for Extensions to the Standard Cosmological Model, Physical Review Letters, 101301 (2017).
[30] J. Skilling, Nested sampling for general Bayesian computation, Bayesian Analysis, 833 (2006).
[31] D. S. Sivia and J. Skilling, Nested sampling, in Data analysis: a Bayesian tutorial (Oxford University Press, 2006) Chap. 9, pp. 181–208.
[32] F. Feroz and M. P. Hobson, Multimodal nested sampling: an efficient and robust alternative to Markov Chain Monte Carlo methods for astronomical data analyses, Monthly Notices of the Royal Astronomical Society, 449 (2008).
[33] F. Feroz, M. P. Hobson, and M. Bridges, MultiNest: an efficient and robust Bayesian inference tool for cosmology and particle physics, Monthly Notices of the Royal Astronomical Society, 1601 (2009).
[34] F. Feroz, M. P. Hobson, E. Cameron, and A. N. Pettitt, Importance Nested Sampling and the MultiNest Algorithm, The Open Journal of Astrophysics (2019).
[35] W. J. Handley, M. P. Hobson, and A. N. Lasenby, PolyChord: nested sampling for cosmology, Monthly Notices of the Royal Astronomical Society: Letters, L61 (2015), arXiv:1502.01856.
[36] W. J. Handley, M. P. Hobson, and A. N. Lasenby, PolyChord: next-generation nested sampling, Monthly Notices of the Royal Astronomical Society, 4385 (2015), arXiv:1506.00171.
[37] D. S. Sivia and J. Skilling, Model selection, in Data analysis: a Bayesian tutorial (Oxford University Press, 2006) Chap. 4, pp. 78–102.
[38] C. Heymans, T. Tröster, M. Asgari, C. Blake, H. Hildebrandt, B. Joachimi, K. Kuijken, C.-A. Lin, A. G. Sánchez, J. L. van den Busch, A. H. Wright, A. Amon, M. Bilicki, J. de Jong, M. Crocce, A. Dvornik, T. Erben, M. C. Fortuna, F. Getman, B. Giblin, K. Glazebrook, H. Hoekstra, S. Joudaki, A. Kannawadi, F. Köhlinger, C. Lidman, L. Miller, N. R. Napolitano, D. Parkinson, P. Schneider, H. Shan, E. Valentijn, G. V. Kleijn, and C. Wolf, KiDS-1000 Cosmology: Multi-probe weak gravitational lensing and spectroscopic galaxy clustering constraints, arXiv (2020), arXiv:2007.15632.
[39] W. Handley, Curvature tension: evidence for a closed universe (2019), arXiv:1908.09139.
[40] W. Handley and P. Lemos, Quantifying tensions in cosmological parameters: Interpreting the DES evidence ratio, Physical Review D, 043504 (2019).
[41] D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. van der Linde, Bayesian measures of model complexity and fit, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 583 (2002).
[42] W. Handley and P. Lemos, Quantifying dimensionality: Bayesian cosmological model complexities, Physical Review D, 023512 (2019).
[43] Planck Collaboration, Planck 2018 results. V. CMB power spectra and likelihoods, Astronomy & Astrophysics, A5 (2020).
[44] I. Esteban, M. C. Gonzalez-Garcia, A. Hernandez-Cabezudo, M. Maltoni, and T. Schwetz, Global analysis of three-flavour neutrino oscillations: synergies and tensions in the determination of θ₂₃, δ_CP, and the mass ordering, Journal of High Energy Physics, 106 (2019).
[45] I. Esteban, M. C. Gonzalez-Garcia, M. Maltoni, T. Schwetz, and A. Zhou, The fate of hints: updated global analysis of three-flavor neutrino oscillations (2020), arXiv:2007.14792.
[46] NuFIT 5.0 (2020), accessed: 2020-08-13.
[47] J. Torrado and A. Lewis, Cobaya: Code for Bayesian Analysis of hierarchical physical models (2020), arXiv:2005.05290.
[48] A. Lewis and S. Bridle, Cosmological parameters from CMB and other data: A Monte Carlo approach, Physical Review D, 103511 (2002).
[49] A. Lewis, Efficient sampling of fast and slow cosmological parameters, Physical Review D, 103529 (2013).
[50] R. M. Neal, Taking Bigger Metropolis Steps by Dragging Fast Variables (2005), arXiv:math/0502099.
[51] J. Lesgourgues, The Cosmic Linear Anisotropy Solving System (CLASS) I: Overview (2011), arXiv:1104.2932.
[52] D. Blas, J. Lesgourgues, and T. Tram, The Cosmic Linear Anisotropy Solving System (CLASS). Part II: Approximation schemes, Journal of Cosmology and Astroparticle Physics (07), 034.
[53] J. Lesgourgues and T. Tram, The Cosmic Linear Anisotropy Solving System (CLASS) IV: efficient implementation of non-cold relics, Journal of Cosmology and Astroparticle Physics (09), 032.
[54] A. Lewis, GetDist: a Python package for analysing Monte Carlo samples (2019), arXiv:1910.13970.
[55] W. Handley, anesthetic: nested sampling visualisation, Journal of Open Source Software, 1414 (2019).
[56] L. T. Hergt, Bayesian evidence for the tensor-to-scalar ratio and neutrino masses: Effects of uniform vs logarithmic priors (supplementary inference products), https://doi.org/10.5281/zenodo.4556360 (2021).
[57] J. C. Forbes, A PDF PSA, or Never gonna set xscale again – guilty feats with logarithms (2020), arXiv:2003.14327.
[58] G. Mangano, G. Miele, S. Pastor, T. Pinto, O. Pisanti, and P. D. Serpico, Relic neutrino decoupling including flavour oscillations, Nuclear Physics B, 221 (2005).
[59] P. F. de Salas and S. Pastor, Relic neutrino decoupling with flavour oscillations revisited, Journal of Cosmology and Astroparticle Physics (07), 051.
[60] Particle Data Group, Review of Particle Physics, Progress of Theoretical and Experimental Physics (2020).
[61] S. Hannestad and T. Schwetz, Cosmology and the neutrino mass ordering, Journal of Cosmology and Astroparticle Physics (11), 035.
[62] A. F. Heavens and E. Sellentin, Objective Bayesian analysis of neutrino masses and hierarchy, Journal of Cosmology and Astroparticle Physics (2018)