Bayesian evidence for the tensor-to-scalar ratio r and neutrino masses m_ν: Effects of uniform vs logarithmic priors
Lukas T. Hergt, Will J. Handley, Michael P. Hobson, Anthony N. Lasenby
Astrophysics Group, Cavendish Laboratory, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
Kavli Institute for Cosmology, Madingley Road, Cambridge, CB3 0HA, UK
(Dated: February 24, 2021)

We review the effect that the choice of a uniform or logarithmic prior has on the Bayesian evidence and hence on Bayesian model comparisons when data provide only a one-sided bound on a parameter. We investigate two particular examples: the tensor-to-scalar ratio r of primordial perturbations and the mass of individual neutrinos m_ν, using the cosmic microwave background temperature and polarisation data from Planck 2018 and the NuFIT 5.0 data from neutrino oscillation experiments. We argue that the Kullback–Leibler divergence, also called the relative entropy, mathematically quantifies the Occam penalty. We further show how the Bayesian evidence stays invariant upon changing the lower prior bound of an upper-constrained parameter. While a uniform prior on the tensor-to-scalar ratio disfavours the r-extension compared to the base ΛCDM model with odds of about 1 : 20, switching to a logarithmic prior renders both models essentially equally likely. ΛCDM with a single massive neutrino is favoured over an extension with variable neutrino masses with odds of 20 : 1 in the case of a uniform prior on the lightest neutrino mass, which decreases to roughly 2 : 1 for a logarithmic prior. For both prior options we get only a very slight preference for the normal over the inverted neutrino hierarchy, with Bayesian odds of about 3 : 2 at most.

I. INTRODUCTION
The “principle of insufficient reason” (Bernoulli [1]) or “principle of indifference” (as renamed by Keynes [2]) states that in the event of multiple, mutually exclusive, possible outcomes and in the absence of any relevant evidence, we should assign the same probability to all outcomes [3]. In a Bayesian analysis, this is generalised to continuous parameters in the form of uninformative priors. Complete prior ignorance about a location parameter is represented by assigning a uniform distribution to the prior. Ignorance about a scale parameter, on the other hand, is represented by assigning a logarithmic prior, i.e. a uniform distribution on the logarithm of the parameter [3]. However, it is not always clear whether a parameter should be treated as a location or a scale parameter. This is quite commonly discussed when faced with a strictly positive parameter, such as a mass or an amplitude, that is very small yet still unconstrained. In general, the decision whether to use a uniform or logarithmic prior has effects on credibility bounds and on the Bayesian evidence, i.e. on both levels of Bayesian inference: parameter estimation and model comparison. Under the reasoning that one can set the lower bound to zero and thus incorporate all possible small values, the uniform prior is often preferred, whereas the logarithmic prior is criticised for a lack of an unambiguous lower bound, and because the ultimate choice of the lower bound might affect a 95 % credibility bound and the Bayesian evidence.

In this paper we show that the very last statement is typically not true and that the choice of a lower bound for such a logarithmic prior is less problematic than commonly assumed.
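The sensitivity of a percentile-based upper bound to the prior choice is easy to demonstrate numerically. The following sketch is our own illustration, not part of the paper's analysis pipeline: it takes a toy one-sided exponential likelihood (the shape that will reappear in the mock example of Section III C) and computes the 95 % upper credible bound under a uniform prior and under logarithmic priors with two different lower cut-offs. The width µ = 0.06 is chosen merely to be of the order of current tensor-to-scalar constraints.

```python
import numpy as np

def upper_bound_uniform(mu, q=0.95, amax=1.0):
    """q-quantile of the posterior for L(a) = exp(-a/mu) with a uniform prior on [0, amax]."""
    a = np.linspace(0.0, amax, 200_001)
    post = np.exp(-a / mu)                 # unnormalised posterior density in a
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return a[np.searchsorted(cdf, q)]

def upper_bound_log(mu, bmin, q=0.95, bmax=0.0):
    """q-quantile (converted back to a = 10**b) for a uniform prior on b = log10(a)."""
    b = np.linspace(bmin, bmax, 200_001)
    post = np.exp(-10.0**b / mu)           # unnormalised posterior density in b
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return 10.0**b[np.searchsorted(cdf, q)]

mu = 0.06
u = upper_bound_uniform(mu)                # ~ 3*mu, since the 95% point of Exp(mu) is mu*ln(20)
l5 = upper_bound_log(mu, -5.0)
l10 = upper_bound_log(mu, -10.0)
print(u, l5, l10)                          # the bound tightens as the lower cut-off decreases
```

Lowering the prior cut-off from 10⁻⁵ to 10⁻¹⁰ adds posterior mass to the unconstrained plateau and drags the 95 % percentile to smaller values, which is exactly the instability this paper addresses (and resolves at the level of the evidence rather than the percentile).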
To that end we will look at two cosmological examples in particular: the tensor-to-scalar ratio r of primordial perturbations as well as the neutrino masses m_ν, where both uniform and logarithmic priors have been applied historically (for the tensor-to-scalar ratio, see e.g. [4–9]; and for the neutrino masses see e.g. [7, 10–18]).

The best constraints on the tensor-to-scalar ratio, r ≲ 0.06, come from joint analyses of cosmic microwave background (CMB) data, CMB lensing, and baryon acoustic oscillations (BAO) [7, 8], where a uniform prior on r was adopted. A common goal of upcoming CMB experiments such as the Simons Observatory [19], the LiteBIRD satellite [20] and the next-generation “Stage-4” ground-based CMB experiment (CMB-S4) [21] is to push to a tensor-to-scalar ratio of r ∼ 10⁻³. In pushing to such small values of r, the question of whether to adopt a uniform or logarithmic prior in one’s analysis becomes more pertinent.

Since neutrino oscillation experiments measure non-zero mass differences, we can conclude that two or more neutrinos must have mass. However, the absolute scale of the individual neutrino masses m_i cannot be measured by the oscillation experiments, but only the mass-squared splittings ∆m²_ij = m²_i − m²_j. The strongest bound on the absolute neutrino mass scales is currently provided again by combined CMB and BAO data, limiting the sum of the neutrino masses to Σ m_ν ≲ 0.12 eV at 95 % confidence [17] (see also [15, 18] for other recent analyses). When investigating the three discrete neutrino mass eigenstates, the question of uniform vs logarithmic priors arises again. Note, however, that given the known mass splittings from oscillation experiments, the three neutrino mass scales are linked. If one mass scale is known, then the others can be inferred from the mass-squared splittings. Hence, only one mass scale is truly unknown, and assuming scale invariant (i.e. logarithmic) priors on all three neutrino masses simultaneously would unduly favour smaller neutrino masses and thus a normal neutrino hierarchy (NH) with m₁ < m₂ ≪ m₃ compared to an inverted neutrino hierarchy (IH) with m₃ ≪ m₁ < m₂ (for more on this see also discussions in [10, 11]).

This paper is structured as follows: In Section II we start by giving a brief description of our Bayesian analysis framework, including the data and base cosmological model used, as well as the means of computing the Bayesian evidence. In Section III we apply this to the tensor-to-scalar ratio r and compare to a theoretical mock example. In Section IV we perform the equivalent analysis for the neutrino masses and contrast the results for the two neutrino hierarchies. We conclude in Section V.

II. METHODS

A. Bayesian inference
There are two levels to Bayesian inference: parameter estimation and model comparison (see e.g. [3, 22]). Both these levels are based on Bayes’ theorem, which relates inference inputs (likelihood and prior) to yielded outputs (posterior and evidence):

    Pr(θ | D, M) × Pr(D | M) = Pr(D | θ, M) × Pr(θ | M),
    Posterior × Evidence = Likelihood × Prior,
    P_M(θ) × Z_M = L_M(θ) × π_M(θ).    (1)

The posterior P is the main quantity of interest in a parameter estimation, representing our state of knowledge about the parameters θ in a given model M, inferred from our prior information π and the likelihood L of the parameters under the data D. The evidence Z is pivotal for model comparisons.

Were we interested only in parameter estimation, then it would be sufficient to care only about the proportionality of the posterior to the product of likelihood and prior, and the Bayesian evidence could be neglected as a mere normalisation factor. However, for the comparison of, say, two models A and B, the evidence becomes important, with the posterior odds ratio of the two models given by:

    Pr(B | D) / Pr(A | D) = [Pr(B) / Pr(A)] × Z_B / Z_A.    (2)

Typically models are assigned the same prior preference, such that the first term on the right-hand side becomes unity, leaving simply the evidence ratio Z_B / Z_A, which can be interpreted as betting odds for the two models. We typically quote this in terms of the log-difference of evidences between two models, ∆ ln Z = ln(Z_B / Z_A).

The evidence is the marginal likelihood

    Z_M = ∫ L_M(θ) π_M(θ) dθ = ⟨L_M⟩_π,    (3)

and can be numerically approximated with Laplace’s method [23], estimated from a posterior distribution attained e.g. from a Markov chain Monte Carlo (MCMC) run via the Savage–Dickey density ratio (SDDR) [24–27] or via a nearest-neighbour approach [28, 29], or computed more directly with nested sampling, which can additionally estimate the corresponding numerical uncertainty [30–36].

If the posterior distribution and the evidence have both been determined, then as a byproduct one can also compute the Kullback–Leibler (KL) divergence, also called the relative entropy:

    D_KL,M = ∫ P_M(θ) ln(P_M(θ) / π_M(θ)) dθ = ⟨ln(P_M / π_M)⟩_P,    (4)

which quantifies the overall compression from prior to posterior distribution.

B. Kullback–Leibler divergence and Occam’s razor
It should be noted that the Bayesian evidence naturally incorporates the so-called Occam’s razor that penalises models for unnecessary complexity. It can be formulated as the principle to “Accept the simplest explanation that fits the data” [22]. This can be neatly demonstrated using a Gaussian likelihood with mean µ and variance σ² having a single parameter x ∈ [x_min, x_max] with a uniform prior (see e.g. [22, 37]). The Bayesian evidence decomposes into two terms:

    Z = L(µ) × σ√(2π) / (x_max − x_min).    (5)

The first term on the right-hand side is the maximum likelihood point. With additional parameters, this term would only increase and therefore can only favour the given model. The second term incorporates the ratio of posterior to prior uncertainty. Since the posterior uncertainty σ is generally smaller than the prior uncertainty (x_max − x_min), this term penalises the given model for each of its parameters and thus embodies its Occam penalty. Note that the posterior and prior uncertainties appear inversely in the normalisation factors of the actual distributions.

More generally, the KL-divergence can actually be used as an estimator of the Occam penalty, which becomes clearer when rewriting the log-evidence according to*:

    ln(∫ L_M π_M dθ) = ∫ P_M ln L_M dθ − ∫ P_M ln(P_M / π_M) dθ,
    (log-)evidence = parameter fit − Occam penalty,
    ln Z_M = ⟨ln L_M⟩_P − D_KL,M,    (6)

where we have dropped the dependence on θ to save space. Analogous to the example from Eq. (5), the first term on the right-hand side encapsulates the fit of the model, while the KL-divergence is the average log-ratio of posterior to prior distribution (see also the last equality in Eq. (4)), thus identifying it as the Occam penalty. This equation appears in passing in the appendix of [38] in the calculation of tension metrics, and it has been intuitively applied e.g. in the third figure in [39], but as far as the authors are aware, Eq. (6) is the first time this analytical form is explicitly connected to the trade-off between parameter fit and model complexity.

While known to experts, a widely unappreciated fact is that the evidence stays unaffected by an unconstrained parameter, i.e. when the data provide no information for that parameter [40]. In terms of Eq. (6) this is reflected in an invariant likelihood term ⟨ln L⟩_P and a zero KL-divergence, D_KL(π = P) = 0. Using the alternative labelling we can rephrase this: Adding an unconstrained parameter does not affect the fit, but also does not incur an additional Occam penalty and hence also leaves the evidence unaffected.

A popular measure for an effective number of constrained parameters is the Bayesian model complexity [41]. However, this quantity relies on the use of a point estimator such as the posterior mean or mode, which is why we prefer using the Bayesian model dimensionality d in the following sections (see [42] for a more detailed discussion on Bayesian complexities/dimensionalities). The Bayesian model dimensionality can be computed straightforwardly from the posterior distribution as the posterior variance of the log-likelihood:

    d_M / 2 = ∫ P_M(θ) (ln(P_M(θ) / π_M(θ)) − D_KL,M)² dθ    (7)
            = ⟨(ln L_M)²⟩_P − ⟨ln L_M⟩²_P.    (8)

Note the connection to Eq. (6), where we used the posterior average of the log-likelihood. As such, these two quantities provide an interesting additional perspective to that of the (ln Z, D_KL) pair. The posterior average of the log-likelihood informs us about the parameter fit, and the posterior variance of the log-likelihood measures the model’s complexity in the form of the number of constrained parameters.

* Note that proving Eq. (6) becomes surprisingly straightforward when going from right to left and making use of Bayes’ theorem (1).
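These identities can be verified numerically on the Gaussian example of Eq. (5). The sketch below is our own illustration (grid quadrature in one dimension rather than sampling): it computes Z, D_KL and d for a single Gaussian-constrained parameter and compares them to the closed forms; for one constrained parameter the dimensionality comes out as d ≈ 1.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal quadrature (written out to avoid NumPy-version differences)."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

# Toy model of Eq. (5): uniform prior on [x_min, x_max], Gaussian likelihood
# normalised so that L(mu) = 1, hence everything is known in closed form.
x_min, x_max, mu, sigma = 0.0, 10.0, 3.0, 0.1
x = np.linspace(x_min, x_max, 400_001)
logL = -(x - mu)**2 / (2 * sigma**2)
prior = np.full_like(x, 1.0 / (x_max - x_min))

Z = integrate(np.exp(logL) * prior, x)               # Eq. (3): marginal likelihood
post = np.exp(logL) * prior / Z                      # Bayes' theorem, Eq. (1)

logL_P = integrate(post * logL, x)                   # posterior average of log-likelihood
D_KL = logL_P - np.log(Z)                            # Occam penalty, rearranged Eq. (6)
d = 2 * (integrate(post * logL**2, x) - logL_P**2)   # Eqs. (7)-(8)

Z_exact = sigma * np.sqrt(2 * np.pi) / (x_max - x_min)        # Eq. (5) with L(mu) = 1
D_KL_exact = np.log((x_max - x_min) / (sigma * np.sqrt(2 * np.pi * np.e)))
print(Z, Z_exact, D_KL, D_KL_exact, d)               # d is close to 1: one constrained parameter
```

The decomposition makes the trade-off explicit: widening the prior leaves the fit term ⟨ln L⟩_P untouched but increases D_KL, lowering the evidence by exactly the extra Occam penalty.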
TABLE I. Cosmological parameters of the base ΛCDM cosmology as they are sampled in our Bayesian analysis. The second column shows their corresponding prior ranges. The third column lists their mean and 68 % limits from our base ΛCDM nested sampling run with TT,TE,EE+lowE data from Planck 2018 and is in almost perfect agreement with table 2 in [7].

Parameter        Prior range       68 % limits
ω_b = Ω_b h²     [0.019, 0.025]    …
ω_c = Ω_c h²     [0.025, 0.471]    …
100 θ_s          [1.03, 1.05]      …
τ_reio           [0.01, 0.40]      …
ln(10¹⁰ A_s)     [2.5, 3.7]        …
n_s              [0.885, 1.040]    …

C. Cosmological models
In the following sections we perform Bayesian model comparisons on one-parameter extensions to the ΛCDM model (a universe dominated today by a cosmological constant Λ and by cold dark matter), which we parametrise with the six standard cosmological parameters listed in Table I with their corresponding prior ranges.

In Section III we extend the ΛCDM model by the tensor-to-scalar ratio r of primordial perturbations, which is set to r = 0 in ΛCDM. In Section IV we extend the base model by allowing for three distinct neutrino masses. In the ΛCDM model these are typically fixed to two massless neutrinos and a single massive neutrino with m_ν = 0.06 eV.
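Given the lightest mass, the other two masses follow from the measured mass-squared splittings. The sketch below is our own illustration of that reconstruction; the splitting values are rounded, NuFIT-like numbers for orientation only, not the exact inputs used in this paper.

```python
import numpy as np

# Approximate mass-squared splittings (eV^2), of the order measured by oscillation
# experiments; rounded illustrative values, not the exact NuFIT 5.0 inputs.
DM21 = 7.4e-5    # "solar" splitting,  m2^2 - m1^2
DM3L = 2.5e-3    # "atmospheric" splitting (magnitude)

def masses(m_lightest, hierarchy="NH"):
    """Individual neutrino masses (eV) from the lightest mass and the splittings."""
    if hierarchy == "NH":                    # normal hierarchy: m1 < m2 << m3
        m1 = m_lightest
        m2 = np.sqrt(m1**2 + DM21)
        m3 = np.sqrt(m1**2 + DM3L)
    else:                                    # inverted hierarchy: m3 << m1 < m2
        m3 = m_lightest
        m2 = np.sqrt(m3**2 + DM3L)
        m1 = np.sqrt(m2**2 - DM21)
    return m1, m2, m3

# Minimal sums: roughly 0.06 eV (NH) and 0.10 eV (IH), which is why the ΛCDM
# baseline fixes a single massive neutrino to 0.06 eV.
print(sum(masses(0.0, "NH")), sum(masses(0.0, "IH")))
```

Because only one mass scale is free once the splittings are fixed, this is also the sense in which the three masses are "linked" as discussed in the introduction.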
D. Data
We use the 2018 temperature and polarisation data from the Planck satellite [43], which we abbreviate as “TT,TE,EE+lowE”. Note that this is the same abbreviation as in the corresponding Planck publication itself. The specific use of “lowE” but lack of “lowT” might lead to the conclusion that only E-mode and no temperature data were used at low multipoles. However, this is not the case. Both high-ℓ and low-ℓ temperature auto-correlation data are implied in that abbreviation.

In Section IV we additionally use the NuFIT 5.0 (2020) data from neutrino oscillation experiments [44–46] to set Gaussian priors on the mass-squared splittings δm² and ∆m².

E. Statistical and cosmological software
We explore the posterior distributions of cosmological and nuisance parameters using Cobaya [47], which provides both the MCMC sampler developed for CosmoMC [48, 49] with a “fast dragging” procedure described in [50] and also the nested sampling code PolyChord [35, 36], tailored for high-dimensional parameter spaces, which can simultaneously determine the Bayesian evidence alongside its numerical uncertainty. Both samplers are interfaced with the cosmological Boltzmann code CLASS [51–53], which computes the theoretical CMB power spectra for temperature and polarisation modes.

We use GetDist [54] to generate the data tables of marginalised parameter values. The post-processing of the nested sampling output for the computation of Bayesian evidence, KL-divergence and Bayesian model dimensionality, as well as the plotting functionality for posterior contours, is performed using the python module anesthetic [55]. All inference products required to compute the results presented in this paper are available for download from Zenodo [56].

FIG. 1. Stability of the cosmological parameters for the tensor-to-scalar ratio extension of the base ΛCDM cosmology with different priors on r: uniform in blue, logarithmic with lower bound 10⁻⁵ in orange and 10⁻¹⁰ in red. For each parameter we show the mean and the extent from quantile 0.16 to 0.84, i.e. the inner 68 % limits.
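As a complement to these software references, the following self-contained toy is our own sketch of the core idea behind nested sampling (it is not PolyChord's algorithm): live points compress the prior volume geometrically, and the evidence integral is accumulated from the discarded points. The direct sampling of the likelihood-constrained prior below exploits the unimodal 1D setup and is a luxury real samplers do not have.

```python
import numpy as np

# Toy problem: 1D Gaussian likelihood, uniform prior on [0, 10].
rng = np.random.default_rng(0)
x_min, x_max, mu, sigma = 0.0, 10.0, 3.0, 0.1

def loglike(x):
    return -(x - mu)**2 / (2 * sigma**2)

n_live = 500
live = rng.uniform(x_min, x_max, n_live)
live_logL = loglike(live)

logZ = -np.inf
for i in range(1, 10_000):
    k = np.argmin(live_logL)
    # prior volume shrinks as X_i ~ exp(-i/n_live); dead point gets weight X_{i-1} - X_i
    logw = -(i - 1) / n_live + np.log(1.0 - np.exp(-1.0 / n_live))
    logZ = np.logaddexp(logZ, live_logL[k] + logw)
    # replace the dead point by a prior sample with higher likelihood; here the
    # constrained region {x : logL(x) > logL_min} is just an interval around mu
    r = sigma * np.sqrt(-2.0 * live_logL[k])
    live[k] = rng.uniform(max(x_min, mu - r), min(x_max, mu + r))
    live_logL[k] = loglike(live[k])

# spread the remaining live points over the final prior volume
m = live_logL.max()
logZ = np.logaddexp(logZ, m + np.log(np.mean(np.exp(live_logL - m))) - 9_999 / n_live)

logZ_true = np.log(sigma * np.sqrt(2 * np.pi) / (x_max, x_min)[0] if False else sigma * np.sqrt(2 * np.pi) / (x_max - x_min))
print(logZ, logZ_true)   # agree to within ~ sqrt(D_KL / n_live)
```

The scatter of such runs around the true value is what produces the sampling-error distributions shown for ln Z in Fig. 4; the statistical uncertainty scales roughly as √(D_KL / n_live).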
III. TENSOR-TO-SCALAR RATIO
The tensor-to-scalar ratio r quantifies what fraction of primordial perturbations is in the form of gravitational waves, produced e.g. during cosmic inflation and potentially detectable via their contribution to CMB B-modes. So far, the major experiments probing the contribution of tensor modes to the CMB power spectrum have adopted a uniform prior on r [7, 8]. However, the common target of r ∼ 10⁻³ for many upcoming CMB experiments such as the Simons Observatory, the LiteBIRD satellite or CMB-S4 warrants the question as to whether a scale invariant prior might be better suited to handle such low values. This question frequently brings up arguments about the ambiguity of the lower bound of a logarithmic prior and its potential effect on the Bayesian evidence.

A. Tensor-to-scalar ratio: Posteriors
Figure 1 gives an overview of the stability of the cosmological base parameters across the different priors for r and compares them to the ΛCDM base model by showing their mean and 68 % ranges. In addition to the ΛCDM base run, we have taken nested sampling runs with both a uniform prior on the tensor-to-scalar ratio, r ∼ U(0, 1), and with two logarithmic priors with different lower bounds, log r ∼ U(−5, 0) and log r ∼ U(−10, 0). Figure 2 focuses on the spectral index n_s and the tensor-to-scalar ratio r (or log r) in particular by showing their one-dimensional marginalised posterior distributions. Figure 3 shows the corresponding two-dimensional joint probability contours of the 68 % and 95 % levels for n_s and r (or log r). We have included shaded histograms in the 1d plots and scatter points in the 2d plots to give a notion of the prior distributions.

As already expected from Fig. 1, the marginalised posterior for the spectral index is near identical, irrespective of the prior on r. The tensor-to-scalar ratio in the right panel of Fig. 2 drops off exponentially from r = 0 to larger values, thereby significantly compressing the prior, which spans up to unity. When sampling logarithmically, the posterior levels off towards small scales and shows a step-like behaviour at the upper bound.

We have included the kernel density estimate from the uniform r-samples in the log r plot and vice versa (dotted lines). This allows us to compare more directly what sort of numerical values were actually used in those two cases. At a first naive glance one might be concerned that the dotted blue line actually indicates a lower bound; however, looking at the blue shaded histogram in the 1d plot or the blue scatter points in the 2d plot, it becomes clear that this is entirely prior driven and reflects that uniform sampling of r does not reach such low values (see also [57] on a related discussion about the importance of adjusting the density when setting the x-scale to ‘log’). With a target of r ∼ 10⁻³ this highlights how the parameter space is sampled rather inefficiently at those low values of interest when applying a uniform prior, which would be an argument for adopting a logarithmic prior in the future.

FIG. 2. Normalised one-dimensional posterior distributions for Planck 2018 TT,TE,EE+lowE data for the spectral index n_s and the tensor-to-scalar ratio r of primordial perturbations, contrasting the difference between using a uniform (blue) or logarithmic (orange and red) prior on r. The shaded histograms illustrate the prior distributions. Note that the dotted lines show the inferred parameters r and log r in the respective opposite domain. This is done only to provide a more direct visual comparison. However, these dotted contours are not data-driven parameter constraints. In particular, the blue dotted line results purely from a lack of small prior samples when sampling uniformly over r, and does not in fact constitute a lower bound on the tensor-to-scalar ratio.

One problem to be aware of with the unconstrained posteriors from a logarithmic prior is that upper bounds in the form of e.g. 95 % limits will change with the lower bound on the logarithmic parameter: the smaller the lower prior bound, the smaller also the upper posterior bound. This lack of a stable posterior bound is a result of the definition via percentiles, a notion inspired by a normal distribution. For other types of distributions, such as the step-like posteriors seen in the middle panel of Fig. 2, percentiles of that sort are not the ideal measure for an upper bound. For such a step-like posterior a better alternative would be to quantify the position of the step directly, e.g. where the posterior drops to some fraction of its plateau value. In the case that an exponential distribution provides a good fit to the non-logarithmic parameter (see the mock example in the following Section III C), the parameter value where the posterior is 1/e times its maximum turns out to be a stable choice, which corresponds to the mean of the exponential distribution. Indeed, using that 1/e measure for the step position we get roughly the same upper bound on the tensor-to-scalar ratio for all prior options:

    r < … ,    log r < − … .    (9)

Note that these are not the habitually quoted 95 % upper bounds on the tensor-to-scalar ratio. For the uniform sampling run of r, this limit is closer to roughly an 80 % upper bound. Note further that the choice of the 1/e fraction provides a particularly stable bound, because of its connection to the mean of the exponential distribution.

FIG. 3. Two-dimensional version of Fig. 2 showing the 68 % and 95 % levels of the posterior contours for Planck 2018 TT,TE,EE+lowE data for spectral index n_s and tensor-to-scalar ratio r, where again a uniform (blue) or logarithmic (orange) prior on r was used. The scattered dots give a notion of the prior distribution. Note that the dotted lines are not true constraints, as explained in Fig. 2. The thin black line divides the n_s–r parameter space into regions of convex and concave inflationary potentials.

B. Tensor-to-scalar ratio: Evidence and Kullback–Leibler divergence
Nested sampling provides us with distributions for the log-evidence ln Z, KL-divergence D_KL and Bayesian model dimensionality d in the same way as for the posterior of free model parameters, which can be calculated straightforwardly using anesthetic’s analysis tools for nested sampling output [55]. Figure 4 shows the contours for those quantities in a triangle plot. We have normalised all quantities with respect to the base ΛCDM model, such that e.g. for the log-evidence we have:

    ∆ ln Z = ln Z − ln Z_ΛCDM.    (10)

Table II lists the summary statistics for the quantities from Fig. 4.

FIG. 4. Effect of uniform vs logarithmic priors on Bayesian model comparison for the tensor-to-scalar ratio r: log-evidence ∆ ln Z, Kullback–Leibler divergence D_KL (in nats), Bayesian model dimensionality d, and posterior average of the log-likelihood ⟨ln L⟩_P = ln Z + D_KL. The probability distributions represent errors arising from the nested sampling process. In the limit of infinite live points these distributions would become point statistics, in contrast to posterior distributions. We normalise with respect to the ΛCDM model without r (i.e. with r = 0). Note how switching from uniform to logarithmic sampling of r (i.e. from blue to orange/red) moves the contours along their (ln Z, D_KL) degeneracy line, i.e. relative entropy is traded in for evidence. Note further, by comparison of the orange and red lines, how changing the lower bound of the logarithmic sampling interval (by 5 log-units) barely affects the contours (bar some expected statistical fluctuation due to the sampling error).

TABLE II. Mean and standard deviation of the log-evidence ln Z, Kullback–Leibler divergence D_KL and Bayesian model dimensionality d of the base ΛCDM cosmology and its r extension from Planck 2018 TT,TE,EE+lowE data [43]. The ∆ indicates normalisation with respect to the base ΛCDM model.

Model                    ln Z        D_KL          d        ∆ ln Z      ∆ D_KL       ∆ d
ΛCDM                     … ± 0.20    38.… ± 0.20   17.…     … ± 0.20    0.… ± 0.20   0.…
ΛCDM + r ∼ U(0, 1)       … ± 0.23    40.… ± 0.23   18.…     … ± 0.23    2.… ± 0.23   0.…
ΛCDM + log r ∼ U(−5, 0)  … ± 0.23    39.… ± 0.23   17.…     … ± 0.23    0.… ± 0.23   0.…
ΛCDM + log r ∼ U(−10, 0) … ± 0.23    38.… ± 0.22   17.…     … ± 0.23    0.… ± 0.22   0.…

The marginalised plot for the difference in log-evidence (topmost panel), with ∆ ln Z ≈ −3 ± 0.23 for the r-extension of ΛCDM (corresponding to the odds of about 1 : 20 quoted above), shows that it is considerably disfavoured when applying a uniform prior. However, switching from a uniform to a logarithmic prior negates the difference in log-evidence completely, such that the log r extension ends up almost on par with the base ΛCDM model.

Changing the lower bound for the logarithmic prior, on the other hand, barely affects the evidence value at all. We have performed a run with a lower bound of log r = −10 instead of log r = −5, i.e. five orders of magnitude difference in r. Despite this large difference in the lower bound, the corresponding log-evidence ln Z changes only very little, such that the distributions significantly overlap one another. As explained in Section II A, this is due to log r being unconstrained below a certain threshold and the Bayesian evidence picking up only on constrained parameters. This can seem counter-intuitive, since the Bayesian evidence is generally understood to automatically penalise additional parameters. The key point is that the Occam penalty essentially enters into the Bayesian evidence in the form of the ratio of posterior to prior volume. If both volumes are the same, then they divide out and do not contribute to the Occam penalty.

The last point becomes clearer by also taking into account the KL-divergence and recalling Eq. (6), where we identified D_KL as a measure of the Occam penalty. Looking at the correlation plot between log-evidence and KL-divergence makes it clear that there is a trade-off between those two quantities when switching between uniform and logarithmic priors. While the evidence increases for the logarithmic prior, the KL-divergence decreases, as expected from the posterior plots in Fig. 2, which show how the change from prior to posterior happens only at about log r ≳ −2. This is further reflected in the Bayesian model dimensionality d, which shows a clear growing trend from about d = 17 for the base ΛCDM model via a log r extension to about d = 18 for the r extension, reflecting the one additional sampling parameter. Note that the total number of sampled parameters consists of 6 base cosmological parameters (+1 for the r extension) and 21 nuisance parameters from the Planck likelihood.

Because of the trade-off between log-evidence and KL-divergence, it is interesting also to look at their sum, which from Eq. (6) we know turns out to be the posterior average of the log-likelihood:

    ln Z + D_KL = ⟨ln L⟩_P.    (11)

This makes for an interesting pairing with the Bayesian model dimensionality, since d/2 is the posterior variance of the log-likelihood. As such, these two quantities provide an alternative perspective to that of the evidence and KL-divergence: the posterior average and variance of the log-likelihood are a measure of the fit and complexity respectively. ⟨ln L⟩_P is shown in the last panel in Fig. 4, where we indeed see that the line for uniform sampling of r has moved much closer to the other lines, which is to be expected, since r and log r are fundamentally the same parameter and therefore lead to a similar goodness of fit.

This behaviour can also be understood analytically, which we explore in the following section in a one-dimensional mock example, simulating the r vs log r result.

C. Mock example
To illustrate further the role of a uniform vs a loga-rithmic prior on a Bayesian model comparison, we pro-pose the following mock example, which is loosely basedon the pedagogical example by Sivia and Skilling [37]explaining the effect of an additional (although in thatcase constrained ) parameter, which we already outlinedin Section II A.Here, we will not assume a Gaussian likelihood thatultimately fully constrains a parameter, but rather wewill assume an exponential distribution as our likelihoodon a strictly positive parameter (which is the maximum-entropy distribution when only a mean is known): L ( a ) = P e − a/µ , (12)where P = Pr( D | a = 0) is the maximum likelihoodvalue for the data D at a = 0 and where µ is the meanof the likelihood distribution describing the data. Thus,the likelihood is constrained only on one side, providingan upper bound, as shown in the left panel of Fig. 5.We will assume a model A , where we sample the pa-rameter a uniformly in the interval [ a , a ]. Furthermore,we will assume a model B , where we uniformly samplethe parameter b = log a in the interval [ b , b ], corre-sponding to logarithmically sampling the parameter a .Since both models are fundamentally governed by thesame quantity and will use the same likelihood, any dif-ference in Bayesian inference quantities will be purelyprior driven.We will make the following assumptions on the order-ing of the prior limits:0 = a < b (cid:28) µ (cid:28) b = a = 1 . (13)This ordering is motivated as follows: For the upper limitwe require that the likelihood has essentially droppedto zero. Hence, without loss of generality, we can set . . . . . . r ΛCDM + r L ( r ) = P e − r/µ r µ r ≈ . − − − − − r ΛCDM + log r L (log r ) = P e − r/µ r log µ r ≈ − . FIG. 5. Exponential likelihood distribution from our mock example in Eq. 
(12) compared to Planck 2018 temperature and polarisation data (TT,TE,EE+lowE) on the tensor-to-scalar ratio r, with uniform sampling of r on the left and logarithmic sampling of r on the right. Note how the mean μ_r fulfils the ordering required by Eq. (13) and how the lower limit on log r is well into the saturation plateau of posterior/likelihood.

the upper limit to one and require μ ≪ 1. The lower limit for the positive parameter a can be explicitly set to zero when sampling uniformly. However, when sampling logarithmically we need to pick some finite lower limit, which we require to lie in the region 10^{b_min} ≪ μ, where the likelihood has essentially saturated with respect to b (see right panel in Fig. 5). The dependence of Bayesian quantities such as the evidence Z or the Kullback–Leibler divergence D_KL on the prior choice on the one hand, and on this lower limit b_min on the other, is the goal of this mock example.

The corresponding priors for models A and B can thus be written as

    π_A(a) = Θ(a − a_min) Θ(a_max − a) / (a_max − a_min),    (14)
    π_B(b) = Θ(b − b_min) Θ(b_max − b) / (b_max − b_min),    (15)

where Θ(x) is the Heaviside step function. We can compute the evidence and Kullback–Leibler divergence for models A and B as

    Z_A = ∫ L(a) π_A(a) da
        = (P μ / (a_max − a_min)) (e^{−a_min/μ} − e^{−a_max/μ}),    (16)

    Z_B = ∫ L(10^b) π_B(b) db
        = (P / ((b_max − b_min) ln 10)) [Ei(−10^{b_max}/μ) − Ei(−10^{b_min}/μ)],    (17)

    D_KL,A = ∫ (L(a) π_A(a) / Z_A) ln(L(a)/Z_A) da
           = ln(P/Z_A) − 1 − (P / (Z_A (a_max − a_min))) [a_min e^{−a_min/μ} − a_max e^{−a_max/μ}],    (18)

    D_KL,B = ∫ (L(10^b) π_B(b) / Z_B) ln(L(10^b)/Z_B) db
           = ln(P/Z_B) − (P / (Z_B (b_max − b_min) ln 10)) [e^{−10^{b_min}/μ} − e^{−10^{b_max}/μ}],    (19)

where Ei refers to the exponential integral. With the ordering from Eq. (13) we can then approximate these to give

    Δln Z_A ≈ ln μ ∼ −3,    (20)
    Δln Z_B ≈ ln((log₁₀ μ − b_min) / (b_max − b_min)) ∼ 0,    (21)
    ΔD_KL,A ≈ −Δln Z_A − 1 ∼ 2,    (22)
    ΔD_KL,B ≈ −Δln Z_B ∼ 0,    (23)

where we normalise with respect to a base model O with a = 10^b = 0 fixed, such that Z_O = P and D_KL,O = 0. The numerical values assume μ ≈ 0.06, which is roughly the posterior mean of the tensor-to-scalar ratio under uniform sampling in the preceding section. Hence, we can compare these zeroth-order numerical approximations to the results in Fig. 4 and Table II, which indeed match.

FIG. 6. Dependence of the log-evidence on the lower prior bound b_min: comparison of the results in Eqs. (16) and (17) for the one-dimensional mock example (solid lines) to the nested sampling results from Table II (dots with error bars). The vertical dotted line corresponds to the mean used in the mock likelihood distribution (cf. Fig. 5). (Legend: mock example with r = 0, r ∼ U(0, 1) and log r ∼ U(b_min, 0) as lines; Planck 2018 runs for ΛCDM (r = 0), ΛCDM + r ∼ U(0, 1) and ΛCDM + log r ∼ U(b_min, 0) as dots.)

Figure 6 makes this comparison more thorough, comparing the results from our one-dimensional mock example in Eqs. (16) and (17) with the nested sampling results from Table II for a variable lower bound b_min of the logarithmic prior. The mean ln Z of the base model with r = 0, both for the mock example and for the base ΛCDM nested sampling run, is zero by definition of our normalisation. It serves only as calibration for the models with uniform (blue) and logarithmic (orange) priors. All three nested sampling runs agree well with the prediction from the mock example within their margins of error.
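The closed-form results above are straightforward to check numerically. The following sketch is our own illustration, assuming P = 1, μ = 0.06, a ∈ [0, 1] and b_max = 0 as in the setup above; it evaluates Eqs. (16) to (19) with SciPy's exponential integral and scans the lower bound b_min:

```python
import math
from scipy.special import expi  # exponential integral Ei

P, mu = 1.0, 0.06                 # likelihood amplitude and mean (mu ~ posterior mean of r)
a_min, a_max = 0.0, 1.0           # uniform-prior bounds on a
b_min, b_max = -5.0, 0.0          # logarithmic-prior bounds, b = log10(a)

# Eq. (16): evidence under the uniform prior
Z_A = P * mu / (a_max - a_min) * (math.exp(-a_min / mu) - math.exp(-a_max / mu))

# Eq. (17): evidence under the logarithmic prior
Z_B = P / ((b_max - b_min) * math.log(10)) * (expi(-10**b_max / mu) - expi(-10**b_min / mu))

# Eqs. (18) and (19): Kullback-Leibler divergences
D_A = math.log(P / Z_A) - 1 - P / (Z_A * (a_max - a_min)) * (
    a_min * math.exp(-a_min / mu) - a_max * math.exp(-a_max / mu))
D_B = math.log(P / Z_B) - P / (Z_B * (b_max - b_min) * math.log(10)) * (
    math.exp(-10**b_min / mu) - math.exp(-10**b_max / mu))

# Normalise against the base model O (Z_O = P, D_KL,O = 0), Eqs. (20)-(23)
dlnZ_A, dlnZ_B = math.log(Z_A / P), math.log(Z_B / P)
print(f"dlnZ_A = {dlnZ_A:.2f} ~ ln(mu) = {math.log(mu):.2f}")
print(f"dlnZ_B = {dlnZ_B:.2f} ~ {math.log((math.log10(mu) - b_min) / (b_max - b_min)):.2f}")
print(f"D_A = {D_A:.2f} ~ {-dlnZ_A - 1:.2f},  D_B = {D_B:.2f}")

# Sensitivity to the lower bound: ln Z_B changes ever more slowly as b_min decreases
for bm in (-2, -4, -6, -8, -10):
    Zb = P / ((b_max - bm) * math.log(10)) * (expi(-10**b_max / mu) - expi(-10**bm / mu))
    print(f"b_min = {bm:4d}:  ln Z_B = {math.log(Zb):.3f}")
```

In this sketch the shift in ln Z_B between b_min = −5 and b_min = −10 is only about 0.2, comparable to typical nested sampling errors, in line with the saturation behaviour discussed around Fig. 6.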
Figure 6 illustrates how the evidence levels off with regard to the choice of the lower bound of the logarithmic prior (orange line), which is also reflected in the near-equal evidences of the nested sampling runs with lower prior bounds of −5 and −10 respectively. Note that the good agreement between mock example and data in Fig. 6 is due to the fact that the tensor-to-scalar ratio is almost completely uncorrelated with the other cosmological parameters, with the biggest (yet still small) correlation coming from the spectral index n_s (cf. Fig. 3).

IV. NEUTRINO MASSES
In Planck's baseline cosmology, the neutrinos are assumed to comprise two massless neutrinos and one massive neutrino with mass m_ν = 0.06 eV, with the effective number of neutrino species set slightly larger than 3, to N_eff = 3.046 [7, 58, 59].

Upcoming CMB experiments such as the Simons Observatory, LiteBIRD or CMB-S4 and large-scale structure (LSS) experiments such as Euclid will allow us to fully constrain the sum of neutrino masses Σm_ν. However, even under the most optimistic assumptions, it will not be possible to disentangle the individual contributions of the three neutrino flavours with cosmological data alone [16]. To achieve that, we need additional data from solar, atmospheric, reactor and accelerator experiments as summarised in NuFIT 5.0 (2020) [45, 46], which provide us with the mass squared splittings:

    δm² = 7.42^{+0.21}_{−0.20} × 10⁻⁵ eV²   (NH & IH),    (24)

    Δm² = 2.517^{+0.026}_{−0.028} × 10⁻³ eV²   (NH),
          2.498^{+0.028}_{−0.028} × 10⁻³ eV²   (IH),      (25)

where δm² is the smaller squared mass splitting, between the light and the medium neutrino mass for the normal neutrino hierarchy (NH) and between the medium and the heavy neutrino mass for the inverted neutrino hierarchy (IH), and Δm² is the larger squared mass splitting, between the light and the heavy neutrino mass in both cases.

With the knowledge of the two squared mass splittings, the remaining uncertainty lies mostly with the scale of the lightest neutrino. In the following Bayesian analysis we therefore apply Gaussian priors according to Eqs. (24) and (25) and vary the lightest neutrino mass.

A. Neutrino masses: Posteriors
We have performed nested sampling runs for an extension of the base ΛCDM cosmology with three individual neutrino masses, where we have used both a uniform prior m_light ∼ U(0, 1) and logarithmic priors with different lower bounds, log m_light ∼ U(−5, 0) and log m_light ∼ U(−10, 0), with masses in eV. The medium and heavy neutrino masses are derived from m_light together with δm² and Δm² from Eqs. (24) and (25):

    m²_medium = m²_light + δm²           (NH),
                m²_light + Δm² − δm²     (IH),    (26)

    m²_heavy  = m²_light + Δm².                   (27)
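The mapping from the light neutrino mass to the derived masses is simple to evaluate. The following sketch is our own, using the NuFIT 5.0 central values δm² = 7.42 × 10⁻⁵ eV² and Δm² = 2.517 (2.498) × 10⁻³ eV² for NH (IH):

```python
import math

# NuFIT 5.0 central values of the mass squared splittings, in eV^2
dm2 = 7.42e-5                            # smaller splitting (NH & IH)
Dm2 = {"NH": 2.517e-3, "IH": 2.498e-3}   # larger splitting

def neutrino_masses(m_light, hierarchy):
    """Derived medium and heavy neutrino masses in eV, following Eqs. (26) and (27)."""
    m2 = m_light**2
    m_heavy = math.sqrt(m2 + Dm2[hierarchy])          # Eq. (27)
    if hierarchy == "NH":
        m_medium = math.sqrt(m2 + dm2)                # Eq. (26), NH branch
    else:
        m_medium = math.sqrt(m2 + Dm2["IH"] - dm2)    # Eq. (26), IH branch
    return m_medium, m_heavy

# Lower limits on the derived masses in the limit m_light -> 0
for h in ("NH", "IH"):
    m_med, m_heavy = neutrino_masses(0.0, h)
    print(f"{h}: m_medium >= {m_med:.3f} eV, m_heavy >= {m_heavy:.3f} eV, "
          f"sum >= {m_med + m_heavy:.3f} eV")
```

With m_light → 0 this reproduces the rough lower limits indicated by the dotted lines in Fig. 8: about 0.009 eV and 0.050 eV (Σ ≈ 0.059 eV) for NH, and about 0.049 eV and 0.050 eV (Σ ≈ 0.099 eV) for IH.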
FIG. 7. Stability of the cosmological parameters (Ω_b h², Ω_c h², θ_s, τ_reio, log(10¹⁰ A_s), n_s) for the 3-neutrino extension of the base ΛCDM cosmology for different priors on m_light: uniform in blue, logarithmic with lower bounds of −5 and −10 in red. The darker set of colours corresponds to the normal neutrino hierarchy (NH) and the lighter set to the inverted hierarchy (IH). For each parameter we show the mean and the extent from quantile 0.16 to 0.84, i.e. the inner 68 % limits.

Figure 7 gives an overview of the stability of the cosmological base parameters across the different priors on m_light and compares them to the ΛCDM base model by showing their mean and 68 % ranges. Compared to Fig. 1 for the tensor-to-scalar ratio there are some small parameter shifts visible relative to the base ΛCDM model, but all shifts stay well within the 68 % bounds.

Figure 8 shows the one-dimensional marginalised posterior distributions for the three individual neutrino masses m_light, m_medium, and m_heavy, as well as the sum of all three, Σm_ν, for both the normal and the inverted hierarchy. We have included shaded histograms to give a notion of the prior distributions. The vertical black dotted lines indicate roughly the lower bound on the medium and heavy neutrino mass as determined from the mass squared splittings under the assumption that the light neutrino mass is zero.

When looking at the lightest neutrino mass in the first row, the picture is very similar to that for the tensor-to-scalar ratio before, and most of what we have said in Section III A applies here, too. One has an almost exponential drop-off from zero when sampling uniformly over the mass (left column), significantly compressing the prior, which turns into a more step-like behaviour with respect to the logarithm of the mass when sampling the mass logarithmically (right column).

Note that the medium and heavy mass from rows 2 and 3, as well as the sum of all masses in the bottom row, are derived quantities and therefore do not show the same prior behaviour visible for the light neutrino mass. This is not so apparent when sampling uniformly over the light neutrino mass, although one can see a slight step in the histogram of the prior for the heavy mass in the NH case and for both the medium and heavy mass in the IH case. When sampling logarithmically over the light neutrino mass, however, the picture is much clearer.
The probability density for the medium and heavy neutrino mass bulks up around the rough lower limit set by the smaller and larger mass squared splitting, respectively.

There are two perspectives that one can adopt here. On the one hand, one could criticise the choice of a logarithmic prior for being ultimately too prior (or theory) driven and not reflective of the data. On the other hand, one could say that this is the natural result of our state of knowledge of the mass squared splittings and our true ignorance about the scale of the lightest neutrino mass. One might contest this last statement, e.g. by arguing that we should expect the lightest neutrino mass to be of a magnitude similar to that of the medium neutrino mass in the NH. However, there is no precedent for such an expectation: looking at the other set of leptons, the electron, muon and tauon, their masses are separated by roughly two orders of magnitude [60].

Comparing the two hierarchies with one another, we can see that the major difference lies in the medium neutrino mass (and therefore also the sum of all neutrino masses), which is restricted to larger masses in the inverted hierarchy compared to the normal hierarchy, as expected from the mass squared splitting (black dotted lines).

B. Neutrino masses: Evidence and Kullback–Leibler divergence
FIG. 8. One-dimensional posterior distributions of neutrino masses with normal hierarchy (NH) in the top panel and with inverted hierarchy (IH) in the bottom panel for TT,TE,EE+lowE data from Planck 2018 and neutrino oscillation data on the mass squared splittings from NuFIT 5.0 (2020). The vertical black dotted lines give the rough lower limit on medium and heavy mass that is set by the mass squared splittings δm² and Δm²; for the inverted hierarchy these dotted lines appear almost on top of each other. The rows show the posteriors for the light, medium, and heavy neutrino mass and the sum of all neutrino masses, respectively. The columns contrast the difference between using a uniform (blue, left) or logarithmic (orange and red, right) prior on the light neutrino mass m_light. The shaded histograms give a notion of that prior distribution. (Panel annotations: NH: m_medium ≳ √δm² ≈ 0.009 eV, m_heavy ≳ √Δm² ≈ 0.050 eV, Σm_ν ≳ √δm² + √Δm² ≈ 0.059 eV; IH: m_medium ≳ √(Δm² − δm²) ≈ 0.049 eV, m_heavy ≳ √Δm² ≈ 0.050 eV, Σm_ν ≳ 0.099 eV.)

FIG. 9. Effect of uniform vs logarithmic priors on the light neutrino mass m_light for Bayesian model comparison: log-evidence Δln Z, Kullback–Leibler divergence D_KL, Bayesian model dimensionality d, and posterior average of the log-likelihood ⟨ln L⟩_P = ln Z + D_KL. The probability distributions represent errors arising from the nested sampling process; in the limit of infinite live points these distributions would become point statistics, in contrast to posterior distributions. We normalise with respect to the ΛCDM model with two massless and only one massive neutrino with m_ν = 0.06 eV. Note how switching from uniform to logarithmic sampling of m_light moves the contours along their ln Z, D_KL degeneracy line, i.e. relative entropy is traded in for evidence. Note further, by comparison of the orange and red lines, how changing the lower bound of the logarithmic sampling interval (by 5 log-units!) barely affects the contours (bar some expected statistical fluctuation due to the sampling error). (Legend: Planck 2018 TT,TE,EE+lowE (+ NuFIT 5.0); for each hierarchy m_light ∼ U(0, 1), log m_light ∼ U(−5, 0) and log m_light ∼ U(−10, 0), with the ΛCDM reference m_ν = 0.06 eV.)

In Fig. 9 we show the results from our nested sampling runs for the log-evidence ln Z, KL-divergence D_KL, Bayesian model dimensionality d and posterior average of the log-likelihood ⟨ln L⟩. We again normalise with respect to the base ΛCDM model. Table III lists the summary statistics for these quantities. As was already the case for the posterior, the picture here is similar to that for the tensor-to-scalar ratio in Section III B.

The distributions for the log-evidence (topmost diagonal panel) show that the addition of the neutrino parameters with uniform sampling over the light neutrino mass (either hierarchy) is disfavoured by over 3 log-units compared to the base ΛCDM model with a single massive neutrino of fixed mass (and two massless ones). Since the mass squared splittings enter on the prior level in our analysis and remain essentially unconstrained by the cosmological data, any change to the evidence is almost entirely driven by the light neutrino mass parameter. Hence, it is not surprising that upon switching to a logarithmic prior on m_light the log-evidence increases again while the KL-divergence drops close to the level of the ΛCDM model. We need to keep in mind that since this is an extension of the ΛCDM model, it has in principle a better chance of fitting the data, such that any difference in the Bayesian evidence can be attributed to an Occam penalty, which the shift between uniform and logarithmic sampling confirms.

As expected from our investigations for the tensor-to-scalar ratio, and especially with regard to our mock example from Section III C, changing the lower bound for the logarithmic prior does not affect the Bayesian evidence. We have again performed runs with two different lower bounds of log m_light = −5 and log m_light = −10. For the posterior average of the log-likelihood ⟨ln L⟩_P = ln Z + D_KL we again roughly confirm

    Δln Z_uni + ΔD_KL,uni ≈ −1,    (28)
    Δln Z_log + ΔD_KL,log ≈ 0,     (29)

matching our mock results from Eqs. (20) to (23), independent of the mock parameter μ.

TABLE III. Mean and standard deviation of the log-evidence ln Z, Kullback–Leibler divergence D_KL and Bayesian model dimensionality d of the base ΛCDM cosmology and its 3-neutrino extension from Planck 2018 TT,TE,EE+lowE data [43]. The second block of rows shows the results for the normal neutrino hierarchy and the third block for the inverted hierarchy. The Δ columns indicate normalisation with respect to the base ΛCDM model.

Model                             ln Z         D_KL         d          Δln Z        ΔD_KL       Δd
ΛCDM                              −… ± 0.19    38.… ± 0.19  17.… ± …   −0.… ± 0.19  0.… ± 0.19  0.… ± …
normal hierarchy:
ΛCDM + m_light ∼ U(0, 1)          −… ± 0.27    40.… ± 0.27  18.… ± …   −… ± 0.27    2.… ± 0.27  1.… ± …
ΛCDM + log m_light ∼ U(−5, 0)     −… ± 0.22    38.… ± 0.22  17.… ± …   −0.… ± 0.22  0.… ± 0.22  0.… ± …
ΛCDM + log m_light ∼ U(−10, 0)    −… ± 0.22    39.… ± 0.22  17.… ± …   −0.… ± 0.22  0.… ± 0.22  0.… ± …
inverted hierarchy:
ΛCDM + m_light ∼ U(0, 1)          −… ± 0.27    40.… ± 0.27  18.… ± …   −… ± 0.27    2.… ± 0.27  1.… ± …
ΛCDM + log m_light ∼ U(−5, 0)     −… ± 0.23    39.… ± 0.23  17.… ± …   −0.… ± 0.23  0.… ± 0.23  0.… ± …
ΛCDM + log m_light ∼ U(−10, 0)    −… ± 0.22    38.… ± 0.22  17.… ± …   −0.… ± 0.22  0.… ± 0.22  0.… ± …

C. Neutrino hierarchy
A Bayesian model comparison of the normal vs the inverted neutrino hierarchy is beyond the scope of this paper and has been performed before with more stringent data [17, 61, 62]. However, with posteriors and evidences at hand, we shall briefly discuss the situation here.

There have been claims of a strong preference for the normal over the inverted neutrino hierarchy [10]; however, such strong evidence can typically be traced back to prior volume effects [11], i.e. the effect of a reduced sampling space for the inverted hierarchy. In other words, we need to distinguish carefully to what extent any Bayesian preference is assigned already at the prior level and to what extent that preference is indeed driven by the data.

In our analysis both hierarchies start out on an equal footing. With the same prior on the light neutrino mass and equivalent Gaussian priors on the mass squared splittings from neutrino oscillation experiments, the prior volume for both hierarchies is essentially the same. Note that although the means for the larger mass squared splitting Δm² are slightly different in the two hierarchies, its standard deviations are essentially the same.

For all prior options there is a slight tendency towards a better fit of the normal compared to the inverted neutrino hierarchy. However, with an evidence difference of less than one log-unit (odds of at most 2 : 1) any preference for the normal hierarchy is meagre at best, especially when also accounting for the sampling error (see Fig. 9). It should be noted, though, that we have used only CMB temperature and polarisation data here. Adding data from CMB lensing or baryon acoustic oscillations would have further shrunk the constraints on the sum of neutrino masses and thereby possibly strengthened the case for the normal hierarchy.

V. DISCUSSION
We demonstrate how switching between a uniform and a logarithmic prior on a single-bounded model parameter results in a trade-off between Bayesian evidence and Kullback–Leibler divergence (or relative entropy). The common scenario is that of insufficient data sensitivity, leading to a one-sided bound on a parameter. For a location parameter this typically causes an exponential drop-off, which translates to a step-like behaviour when turned into the corresponding scale parameter. We show that the ambiguity of the lower bound of the scale parameter does not affect a Bayesian model comparison, provided the lower bound is chosen sufficiently far into the likelihood plateau.

We demonstrate this behaviour for two cases of parameter extensions to the ΛCDM model of cosmology, namely for the tensor-to-scalar ratio of primordial perturbations and for the case of three non-degenerate neutrino masses. In both cases we confirm that switching from a uniform prior to a logarithmic prior gets rid of (most of) the Occam penalty associated with that parameter, since unconstrained parameters do not affect the Bayesian evidence. Thus the Bayesian evidence is roughly on par with that of the un-extended (base) model, with the only difference in the form of an uninformative parameter. Furthermore, and for the same reason, the exact choice of the lower bound for the logarithmic prior does not change the Bayesian evidence: when the likelihood levels off, e.g. due to insufficient sensitivity in the data, then so does the Bayesian evidence.
ACKNOWLEDGMENTS

[1] J. Bernoulli, Ars conjectandi (Thurnisiorum, Basileae, 1713).
[2] J. M. Keynes, A Treatise On Probability (Macmillan And Co., 1921).
[3] D. S. Sivia and J. Skilling, Data analysis: a Bayesian tutorial (Oxford University Press, 2006) pp. 1–246.
[4] K. Lau, J.-Y. Tang, and M.-C. Chu, Cosmic microwave background constraints on the tensor-to-scalar ratio, Research in Astronomy and Astrophysics, 635 (2014).
[5] G. Barenboim and W.-I. Park, On the tensor-to-scalar ratio in large single-field inflation models (2015), arXiv:1509.07132.
[6] P. Creminelli, S. Dubovsky, D. López Nacir, M. Simonović, G. Trevisan, G. Villadoro, and M. Zaldarriaga, Implications of the scalar tilt for the tensor-to-scalar ratio, Physical Review D, 123528 (2015).
[7] Planck Collaboration, Planck 2018 results. VI. Cosmological parameters, Astronomy & Astrophysics, A6 (2020).
[8] The Keck Array and BICEP2 Collaborations, Constraints on Primordial Gravitational Waves Using Planck, WMAP, and New BICEP2/Keck Observations through the 2015 Season, Physical Review Letters, 221301 (2018).
[9] K. Hirano, Inflation with very small tensor-to-scalar ratio (2019), arXiv:1912.12515.
[10] F. Simpson, R. Jimenez, C. Pena-Garay, and L. Verde, Strong Bayesian evidence for the normal neutrino hierarchy, Journal of Cosmology and Astroparticle Physics (06), 029.
[11] T. Schwetz, K. Freese, M. Gerbino, E. Giusarma, S. Hannestad, M. Lattanzi, O. Mena, and S. Vagnozzi, Comment on "Strong Evidence for the Normal Neutrino Hierarchy" (2017), arXiv:1703.04585.
[12] A. Caldwell, A. Merle, O. Schulz, and M. Totzauer, Global Bayesian analysis of neutrino mass data, Physical Review D, 073001 (2017).
[13] F. Capozzi, E. Di Valentino, E. Lisi, A. Marrone, A. Melchiorri, and A. Palazzo, Global constraints on absolute neutrino masses and their ordering, Physical Review D, 096014 (2017).
[14] A. Loureiro, A. Cuceu, F. B. Abdalla, B. Moraes, L. Whiteway, M. McLeod, S. T. Balan, O. Lahav, A. Benoit-Lévy, M. Manera, R. P. Rollins, and H. S. Xavier, Upper Bound of Neutrino Masses from Combined Cosmological Observations and Particle Physics Experiments, Physical Review Letters, 081301 (2019).
[15] F. Capozzi, E. Di Valentino, E. Lisi, A. Marrone, A. Melchiorri, and A. Palazzo, Addendum to "Global constraints on absolute neutrino masses and their ordering", Physical Review D, 116013 (2020).
[16] M. Archidiacono, S. Hannestad, and J. Lesgourgues, What will it take to measure individual neutrino mass states using cosmology? (2020), arXiv:2003.03354.
[17] S. R. Choudhury and S. Hannestad, Updated results on neutrino mass and mass hierarchy from cosmology with Planck 2018 likelihoods, Journal of Cosmology and Astroparticle Physics (07), 037.
[18] P. Stöcker, C. Balázs, S. Bloor, T. Bringmann, T. E. Gonzalo, W. Handley, S. Hotinli, C. Howlett, F. Kahlhoefer, J. J. Renk, P. Scott, A. C. Vincent, and M. White (The GAMBIT Cosmology Workgroup), Strengthening the bound on the mass of the lightest neutrino with terrestrial and cosmological experiments (2020), arXiv:2009.03287.
[19] The Simons Observatory Collaboration, The Simons Observatory: science goals and forecasts, Journal of Cosmology and Astroparticle Physics (02), 056, arXiv:1808.07445.
[20] The LiteBIRD Collaboration, Astro2020 APC White Paper: LiteBIRD: an all-sky cosmic microwave background probe of inflation, Bulletin of the AAS (2019).
[21] CMB-S4 Collaboration, CMB-S4 Science Case, Reference Design, and Project Plan (2019), arXiv:1907.04473.
[22] D. J. C. MacKay, Model Comparison and Occam's Razor, in Information Theory, Inference and Learning Algorithms (Cambridge University Press, 2003) Chap. 28, pp. 343–355.
[23] D. J. C. MacKay, Laplace's Method, in Information Theory, Inference and Learning Algorithms (Cambridge University Press, 2003) Chap. 27, pp. 341–342.
[24] J. M. Dickey, The Weighted Likelihood Ratio, Linear Hypotheses on Normal Location Parameters, The Annals of Mathematical Statistics, 204 (1971).
[25] R. Trotta, Applications of Bayesian model selection to cosmological parameters, Monthly Notices of the Royal Astronomical Society, 72 (2007).
[26] R. Trotta, Forecasting the Bayes factor of a future observation, Monthly Notices of the Royal Astronomical Society, 819 (2007).
[27] L. Verde, S. M. Feeney, D. J. Mortlock, and H. V. Peiris, (Lack of) Cosmological evidence for dark radiation after Planck, Journal of Cosmology and Astroparticle Physics (09), 013.
[28] A. Heavens, Y. Fantaye, A. Mootoovaloo, H. Eggers, Z. Hosenie, S. Kroon, and E. Sellentin, Marginal Likelihoods from Monte Carlo Markov Chains (2017), arXiv:1704.03472.
[29] A. Heavens, Y. Fantaye, E. Sellentin, H. Eggers, Z. Hosenie, S. Kroon, and A. Mootoovaloo, No Evidence for Extensions to the Standard Cosmological Model, Physical Review Letters, 101301 (2017).
[30] J. Skilling, Nested sampling for general Bayesian computation, Bayesian Analysis, 833 (2006).
[31] D. S. Sivia and J. Skilling, Nested sampling, in Data analysis: a Bayesian tutorial (Oxford University Press, 2006) Chap. 9, pp. 181–208.
[32] F. Feroz and M. P. Hobson, Multimodal nested sampling: an efficient and robust alternative to Markov Chain Monte Carlo methods for astronomical data analyses, Monthly Notices of the Royal Astronomical Society, 449 (2008).
[33] F. Feroz, M. P. Hobson, and M. Bridges, MultiNest: an efficient and robust Bayesian inference tool for cosmology and particle physics, Monthly Notices of the Royal Astronomical Society, 1601 (2009).
[34] F. Feroz, M. P. Hobson, E. Cameron, and A. N. Pettitt, Importance Nested Sampling and the MultiNest Algorithm, The Open Journal of Astrophysics (2019).
[35] W. J. Handley, M. P. Hobson, and A. N. Lasenby, PolyChord: nested sampling for cosmology, Monthly Notices of the Royal Astronomical Society: Letters, L61 (2015), arXiv:1502.01856.
[36] W. J. Handley, M. P. Hobson, and A. N. Lasenby, PolyChord: next-generation nested sampling, Monthly Notices of the Royal Astronomical Society, 4385 (2015), arXiv:1506.00171.
[37] D. S. Sivia and J. Skilling, Model selection, in Data analysis: a Bayesian tutorial (Oxford University Press, 2006) Chap. 4, pp. 78–102.
[38] C. Heymans, T. Tröster, M. Asgari, C. Blake, H. Hildebrandt, B. Joachimi, K. Kuijken, C.-A. Lin, A. G. Sánchez, J. L. van den Busch, A. H. Wright, A. Amon, M. Bilicki, J. de Jong, M. Crocce, A. Dvornik, T. Erben, M. C. Fortuna, F. Getman, B. Giblin, K. Glazebrook, H. Hoekstra, S. Joudaki, A. Kannawadi, F. Köhlinger, C. Lidman, L. Miller, N. R. Napolitano, D. Parkinson, P. Schneider, H. Shan, E. Valentijn, G. V. Kleijn, and C. Wolf, KiDS-1000 Cosmology: Multi-probe weak gravitational lensing and spectroscopic galaxy clustering constraints, arXiv (2020), arXiv:2007.15632.
[39] W. Handley, Curvature tension: evidence for a closed universe (2019), arXiv:1908.09139.
[40] W. Handley and P. Lemos, Quantifying tensions in cosmological parameters: Interpreting the DES evidence ratio, Physical Review D, 043504 (2019).
[41] D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. van der Linde, Bayesian measures of model complexity and fit, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 583 (2002).
[42] W. Handley and P. Lemos, Quantifying dimensionality: Bayesian cosmological model complexities, Physical Review D, 023512 (2019).
[43] Planck Collaboration, Planck 2018 results. V. CMB power spectra and likelihoods, Astronomy & Astrophysics, A5 (2020).
[44] I. Esteban, M. C. Gonzalez-Garcia, A. Hernandez-Cabezudo, M. Maltoni, and T. Schwetz, Global analysis of three-flavour neutrino oscillations: synergies and tensions in the determination of θ₂₃, δ_CP, and the mass ordering, Journal of High Energy Physics, 106 (2019).
[45] I. Esteban, M. C. Gonzalez-Garcia, M. Maltoni, T. Schwetz, and A. Zhou, The fate of hints: updated global analysis of three-flavor neutrino oscillations (2020), arXiv:2007.14792.
[46] NuFIT 5.0 (2020), accessed: 2020-08-13.
[47] J. Torrado and A. Lewis, Cobaya: Code for Bayesian Analysis of hierarchical physical models (2020), arXiv:2005.05290.
[48] A. Lewis and S. Bridle, Cosmological parameters from CMB and other data: A Monte Carlo approach, Physical Review D, 103511 (2002).
[49] A. Lewis, Efficient sampling of fast and slow cosmological parameters, Physical Review D, 103529 (2013).
[50] R. M. Neal, Taking Bigger Metropolis Steps by Dragging Fast Variables (2005), arXiv:math/0502099.
[51] J. Lesgourgues, The Cosmic Linear Anisotropy Solving System (CLASS) I: Overview (2011), arXiv:1104.2932.
[52] D. Blas, J. Lesgourgues, and T. Tram, The Cosmic Linear Anisotropy Solving System (CLASS). Part II: Approximation schemes, Journal of Cosmology and Astroparticle Physics (07), 034.
[53] J. Lesgourgues and T. Tram, The Cosmic Linear Anisotropy Solving System (CLASS) IV: efficient implementation of non-cold relics, Journal of Cosmology and Astroparticle Physics (09), 032.
[54] A. Lewis, GetDist: a Python package for analysing Monte Carlo samples (2019), arXiv:1910.13970.
[55] W. Handley, anesthetic: nested sampling visualisation, Journal of Open Source Software, 1414 (2019).
[56] L. T. Hergt, Bayesian evidence for the tensor-to-scalar ratio and neutrino masses: Effects of uniform vs logarithmic priors (supplementary inference products), https://doi.org/10.5281/zenodo.4556360 (2021).
[57] J. C. Forbes, A PDF PSA, or Never gonna set xscale again – guilty feats with logarithms (2020), arXiv:2003.14327.
[58] G. Mangano, G. Miele, S. Pastor, T. Pinto, O. Pisanti, and P. D. Serpico, Relic neutrino decoupling including flavour oscillations, Nuclear Physics B, 221 (2005).
[59] P. F. de Salas and S. Pastor, Relic neutrino decoupling with flavour oscillations revisited, Journal of Cosmology and Astroparticle Physics (07), 051.
[60] Particle Data Group, Review of Particle Physics, Progress of Theoretical and Experimental Physics (2020).
[61] S. Hannestad and T. Schwetz, Cosmology and the neutrino mass ordering, Journal of Cosmology and Astroparticle Physics (11), 035.
[62] A. F. Heavens and E. Sellentin, Objective Bayesian analysis of neutrino masses and hierarchy, Journal of Cosmology and Astroparticle Physics (2018)