Interpreting Internal Consistency of DES Measurements
MNRAS 000, 000–000 (2020) Preprint 1 October 2020 Compiled using MNRAS LaTeX style file v3.0
V. Miranda, P. Rogozenski, and E. Krause
Steward Observatory, Department of Astronomy, University of Arizona, Tucson, Arizona, 85721, USA
Department of Physics, University of Arizona, Tucson, Arizona, 85721, USA
Accepted XXX. Received YYY; in original form ZZZ
ABSTRACT
Bayesian evidence ratios are widely used to quantify the statistical consistency between different experiments. However, since the evidence ratio is prior dependent, the precise translation between its value and the degree of concordance/discordance requires additional information. The most commonly adopted metric, the Jeffreys scale, can falsely suggest agreement between datasets when priors are chosen to be sufficiently wide (Raveri & Hu 2019; Handley & Lemos 2019). In this work, we examine evidence ratios in a DES-Y1 simulated analysis, focusing on the internal consistency between weak lensing and galaxy clustering. We study two scenarios using simulated data in controlled experiments. First, we calibrate the expected evidence ratio distribution given noise realizations around the best-fit DES-Y1 ΛCDM cosmology. Second, we show the behavior of evidence ratios for noiseless fiducial data vectors simulated using a modified gravity model, which generates internal tension in the ΛCDM analysis. We show that the choice of prior could conceal the discrepancies between weak lensing and galaxy clustering induced by such models and that the evidence ratio in a DES-Y1 study is, indeed, biased towards agreement.
Key words: cosmological parameters – theory – large-scale structure of the Universe
Since the discovery of the accelerating expansion of the universe (Riess et al. 1998; Perlmutter et al. 1999), various surveys have been designed to measure the background expansion and structure formation of the Universe with increasing precision. The Dark Energy Task Force (DETF) (Albrecht et al. 2006) classifies these surveys from stage I to stage IV according to their ability to increase the figure-of-merit (Albrecht et al. 2009) of the $w_0$–$w_a$ parameterization for the dark energy equation of state (Linder 2003; Chevallier & Polarski 2001). The community is currently analyzing the stage III surveys, while stage IV surveys such as DESI (Levi et al. 2019), the Nancy Grace Roman Space Telescope (Akeson et al. 2019), CMB-S4 (Abazajian et al. 2016) and the Vera Rubin Observatory Legacy Survey of Space and Time (LSST) (The LSST Dark Energy Science Collaboration et al. 2018) will start collecting data in the next few years with the potential to significantly expand our knowledge about the early and late-time cosmos.

Ongoing stage III surveys, such as the Dark Energy Survey (DES) (Abbott et al. 2005), constrain the parameters of the standard model (ΛCDM) with unprecedented precision. These constraints encompass measurements of the Cosmic Microwave Background (CMB) (Planck Collaboration et al. …

… $H_0$ (Riess et al. 2019) is a good example of a tension that may require new physics to be fully resolved (Knox & Millea 2019; Verde et al. 2019).

The Dark Energy Survey uses the combination of weak lensing and galaxy clustering to break degeneracies between dark energy and other parameters. For example, the DES year one (DES-Y1) error bars from the cosmic shear investigation on the dark energy equation of state are reduced by ∼ … in the combined analysis (Abbott et al. 2018b; Troxel et al. 2018). The joint analysis is only permitted, however, if the datasets are statistically consistent. In Abbott et al. (2018b), consistency was ascertained by the Bayesian evidence ratio, $R$, utilizing the Jeffreys scale. However, analytical examples show that the Jeffreys scale should not be used as a universal scale (Nesseris & Garcia-Bellido 2013), given that priors can always be chosen to be wide enough to enable consistency (Marshall et al. 2006).

In order to make meaningful statements about the consistency of datasets it is important to investigate how the Bayesian evidence ratio $R$ is affected by the priors under consideration. These investigations are particularly relevant when tension with modest statistical significance is detected, e.g., the disagreement between Planck data and weak lensing surveys over the value of the $S_8 \equiv \sigma_8 \Omega_m^{1/2}$ parameter (Abbott et al. 2018b; Heymans et al. 2020; Hikage et al. 2019).

Given the demanding computational costs associated with Bayesian evidence computation (Handley et al. 2015), calibrating survey data concordance with simulated data is not always feasible. Alternative metrics with reduced prior dependence have been suggested (Handley & Lemos 2019; Seehars et al. 2016). In simple cases (e.g. multivariate Gaussians), these alternatives can be prior independent.
However, in more general cases, the interpretation of alternative metrics still requires careful scale calibration using simulated data. Yet another approach to reduce prior dependencies is to adopt approximations, such as the validity of the Gaussian linear model (GLM), which allows Bayesian estimators to be computed either analytically or from Monte Carlo Markov Chains (Raveri & Hu 2019).

In this paper we examine the Bayesian evidence ratio in the context of quantifying consistency between cosmic shear, galaxy-galaxy lensing, and galaxy clustering in DES-Y1 data. In particular, we want to quantify whether cosmic shear and the combination of galaxy clustering and galaxy-galaxy lensing (so-called 2x2pt) can be combined into a so-called 3x2pt analysis. We test how this metric responds to noise drawn from the DES-Y1 covariance around the best-fit cosmology at varying confidence intervals in $\vec{\chi}^2$ space. This first test demonstrates how 'real' survey noise at known deviations from the best-fit cosmology propagates into Bayesian estimators. We then explore how the evidence ratio behaves when data vectors generated from an underlying modified gravity theory are fit with the standard model. When confined to the standard model, these modified gravity based data vectors naturally induce a tension between weak lensing and galaxy clustering.

This manuscript is structured as follows: In Sect. 2 we define the tension metrics studied in this paper. In Sect. 3 we explain the theoretical modeling and aspects of our simulated analyses. Section 4 describes our findings about Bayesian evidence ratios and other tension metrics when considering noisy ΛCDM data vectors that are analyzed with a ΛCDM model. This scenario corresponds to the case where realistic noise in a data vector might be misinterpreted as a physical tension. In Sect. 5 we consider a noise-free modified gravity data vector that is analyzed with a ΛCDM model. This scenario mimics the case where an actual physical tension between the clustering and weak lensing parts of the data vector exists. Four appendices offer further explanation of the details that are only summarized in this section. We conclude in Sect. 6.
In this section we briefly review tension metrics and establish consistent notation. We start by defining the posterior probability for a set of parameters $\vec{\theta}$ in a given model $\mathcal{H}$ and observed dataset $d$ as $P(\vec{\theta}|d,\mathcal{H})$. The posterior is related to the likelihood, $P(d|\vec{\theta},\mathcal{H})$, via Bayes' theorem

$$P(\vec{\theta}|d,\mathcal{H}) = \frac{P(d|\vec{\theta},\mathcal{H})\,P(\vec{\theta}|\mathcal{H})}{P(d|\mathcal{H})}. \tag{1}$$

The prior, $P(\vec{\theta}|\mathcal{H})$, describes the a priori probability distribution of the parameters $\vec{\theta}$ within the assumed model $\mathcal{H}$. The normalization factor, $P(d|\mathcal{H})$, is called the Bayesian evidence (Marshall et al. 2006).

The Bayesian evidence of $M$ datasets $\vec{d} = (d_1, \ldots, d_M)$ given a model $\mathcal{H}$ of $N$ parameters $\vec{\theta} = (\theta_1, \ldots, \theta_N)$ is given by

$$P(\vec{d}|\mathcal{H}) = \int d\vec{\theta}\, P(\vec{d}|\vec{\theta},\mathcal{H})\,P(\vec{\theta}|\mathcal{H}). \tag{2}$$

In order to evaluate the probability that experiments $d_1$ and $d_2$ are in agreement, we evaluate the odds of hypothesis $\mathcal{H}_0$, that we can model both datasets with a single set of parameters, against the alternative hypothesis $\mathcal{H}_1$, that modeling each dataset with a different set of parameters is preferable. These odds are defined as $P(\mathcal{H}_0|d_1,d_2)/P(\mathcal{H}_1|d_1,d_2)$, and their relation to the evidences $P(d_1,d_2|\mathcal{H}_0)$ and $P(d_1,d_2|\mathcal{H}_1)$ can be readily seen when applying Bayes' theorem

$$\frac{P(\mathcal{H}_0|d_1,d_2)}{P(\mathcal{H}_1|d_1,d_2)} = \frac{P(d_1,d_2|\mathcal{H}_0)}{P(d_1,d_2|\mathcal{H}_1)} \cdot \frac{P(\mathcal{H}_0)}{P(\mathcal{H}_1)}, \tag{3}$$

where $P(\mathcal{H}_{i=\{0,1\}})$ are the prior probabilities of models $\mathcal{H}_{i=\{0,1\}}$. The first ratio on the right-hand side of Eq. 3 is known as the Bayesian evidence ratio, $R$. If the datasets are independent, we may express it as

$$R = \frac{P(d_1,d_2|\mathcal{H}_0)}{P(d_1|\mathcal{H}_1)\,P(d_2|\mathcal{H}_1)}. \tag{4}$$

The Bayesian evidence ratio generally implies agreement between datasets when $R \gg 1$, while $R \ll 1$ flags the opposite. The ratio changes as a function of prior range, which can mimic consistency even in the presence of tension.
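To make the prior dependence of $R$ concrete, it may help to work through a toy case that is not part of the DES-Y1 analysis. The following sketch assumes a single parameter $\theta$ with a flat prior of width $W$ and two independent datasets whose likelihoods are Gaussian in $\theta$, with peaks $\mu_1, \mu_2$, widths $\sigma_1, \sigma_2$ and $\theta$-independent amplitudes $\ell_1, \ell_2$. All of these symbols, as well as the shorthand $Z_1$, $Z_2$, $Z_{12}$ for the two single-dataset evidences and the joint evidence, are introduced here for illustration only. Assuming further that the prior is wide enough to contain both likelihoods, Eqs. 2 and 4 give:

```latex
\begin{align}
Z_i &= \int \frac{d\theta}{W}\, \ell_i\, e^{-(\theta-\mu_i)^2/(2\sigma_i^2)}
     = \ell_i\, \frac{\sqrt{2\pi}\,\sigma_i}{W}, \qquad i = 1, 2, \\
Z_{12} &= \ell_1 \ell_2\, \frac{\sqrt{2\pi}\,\sigma_{12}}{W}\,
     \exp\left[-\frac{(\mu_1-\mu_2)^2}{2(\sigma_1^2+\sigma_2^2)}\right],
     \qquad \sigma_{12}^{-2} = \sigma_1^{-2} + \sigma_2^{-2}, \\
\ln R &= \ln\frac{Z_{12}}{Z_1 Z_2}
     = \ln W - \frac{1}{2}\ln\left[2\pi\left(\sigma_1^2+\sigma_2^2\right)\right]
       - \frac{(\mu_1-\mu_2)^2}{2\left(\sigma_1^2+\sigma_2^2\right)}.
\end{align}
```

In this toy case the last term penalizes a shift between the two datasets, while the first term grows logarithmically with the prior width, so any fixed shift can be compensated by choosing a sufficiently large $W$.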
$\Delta\bar{\chi}^2$ statistic

The $\bar{\chi}^2$ value is a statistic related to the average log-likelihood of a chain marginalized over the posterior. Given the weights $w_i$ of each sample $i$ of a chain of length $N$, we calculate the statistic directly as

$$\bar{\chi}^2_j = -2\left\langle \ln P(\vec{d}_j|\vec{\theta},\mathcal{H}) \right\rangle = -2\,\frac{\sum_i^N w_i \ln P_i(\vec{d}_j|\vec{\theta},\mathcal{H})}{\sum_i^N w_i}, \tag{5}$$

where the sample weights are defined as the ratio of the sample posterior over the maximum sampled posterior of the chain. We define a statistic similar to the delta chi-squared statistic of Marshall et al. (2006) as the difference between the $\bar{\chi}^2$ values of the joint and independent datasets:

$$\Delta\bar{\chi}^2 = \bar{\chi}^2_{12} - \left(\bar{\chi}^2_1 + \bar{\chi}^2_2\right). \tag{6}$$

The Generalized Parameter Distance estimates the departure from the fiducial vector (in this case determined by the DES-Y1 best-fit cosmology). It is determined by calculating the covariance of a chain, $\hat{\Sigma}$, then taking the difference, in parameter space, of the fiducial data vector, $\vec{\mu}$, and the best-fit data vector of the samples, $\vec{\theta}$, as

$$\Delta \equiv \sqrt{\left(\vec{\theta} - \vec{\mu}\right)^t \hat{\Sigma}^{-1} \left(\vec{\theta} - \vec{\mu}\right)}. \tag{7}$$

As an alternative to the evidence ratio, the Kullback-Leibler (KL) Divergence, also known as the relative entropy, determines how parameters are constrained by the data compared to the prior constraints (Kullback & Leibler 1951). Defined as

$$D_i = \int d\vec{\theta}\, P(\vec{\theta}|d_i,\mathcal{H}) \ln\left[\frac{P(\vec{\theta}|d_i,\mathcal{H})}{P(\vec{\theta}|\mathcal{H})}\right], \tag{8}$$

the KL Divergence is invariant under model reparameterization and can be interpreted as measuring the information gain when going from the prior distribution to the posterior. Similar to entropy, $D_i \geq 0$.
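As a quick illustration of Eq. 8, the sketch below evaluates the KL Divergence for a one-dimensional Gaussian posterior relative to a Gaussian prior, once with the standard closed-form expression and once by direct integration on a grid. The Gaussian prior, the function names, and the numerical values are assumptions made for this toy example and are not taken from the DES-Y1 setup.

```python
import numpy as np

def kl_gaussian_1d(mu_post, sig_post, mu_prior, sig_prior):
    """Closed-form D(posterior || prior) for two 1D Gaussians, in nats."""
    return (np.log(sig_prior / sig_post)
            + (sig_post**2 + (mu_post - mu_prior)**2) / (2.0 * sig_prior**2)
            - 0.5)

def kl_numeric_1d(mu_post, sig_post, mu_prior, sig_prior, n=200001):
    """Direct evaluation of the integral in Eq. 8 with a Riemann sum."""
    lo = min(mu_post - 12 * sig_post, mu_prior - 12 * sig_prior)
    hi = max(mu_post + 12 * sig_post, mu_prior + 12 * sig_prior)
    theta = np.linspace(lo, hi, n)
    dtheta = theta[1] - theta[0]
    # Work with log-densities so that the logarithm of the ratio never sees a zero.
    log_post = (-0.5 * ((theta - mu_post) / sig_post) ** 2
                - np.log(sig_post * np.sqrt(2.0 * np.pi)))
    log_prior = (-0.5 * ((theta - mu_prior) / sig_prior) ** 2
                 - np.log(sig_prior * np.sqrt(2.0 * np.pi)))
    return np.sum(np.exp(log_post) * (log_post - log_prior)) * dtheta

# Hypothetical example: a unit-width prior tightened to a posterior of width 0.2.
d_exact = kl_gaussian_1d(0.5, 0.2, 0.0, 1.0)
d_grid = kl_numeric_1d(0.5, 0.2, 0.0, 1.0)
print(d_exact, d_grid)
```

The divergence vanishes when the posterior equals the prior and grows as the data shrink or shift the posterior, which is the sense in which it measures information gain.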
The KL Divergence can also measure the information gain of augmented datasets by taking $P(\vec{\theta}|\mathcal{H}) \rightarrow P(\vec{\theta}|d_i,\mathcal{H})$ and $P(\vec{\theta}|d_i,\mathcal{H}) \rightarrow P(\vec{\theta}|d_i + d_{\rm new},\mathcal{H})$. The relative entropy between datasets is the basis of a tension metric called Surprise (Seehars et al. 2014, 2016). Computing either the KL Divergence or the Surprise is non-trivial outside the Gaussian case, which limits their applicability as a check for statistical consistency.

Suspiciousness is a tension metric that aims to alleviate the prior dependence exhibited in the evidence ratio (Handley & Lemos 2019). This metric is defined as

$$\ln S \equiv \ln R - \ln I, \tag{9}$$

where $\ln I$ is defined as the information ratio

$$\ln I \equiv D_1 + D_2 - D_{12}. \tag{10}$$

In restricted cases (e.g. the case of flat priors imposed on a multivariate Gaussian likelihood), the prior dependence in the metric is completely eliminated. For this particular case, a generalization to correlated datasets has been found (Lemos et al. 2019). Details on the numerical evaluation of suspiciousness, as well as the evidence, in a nested sampling run are shown in Appendix D.

Table 1. Priors for the cosmological and nuisance parameters, similar to the priors adopted in DES-Y1. In addition, we applied flat (…, …) priors on $\Omega_b h^2$ for minimal compatibility with BBN constraints in CosmoLike (see Appendix A for further details).

Parameter                                   Prior

Cosmology
$\Omega_m$                                  flat (…, …)
$A_s \times 10^{-\ldots}$                   flat (…, …)
$n_s$                                       flat (0.87, 1.07)
$\Omega_b$                                  flat (0.03, 0.07)
$H_0$                                       flat (55.0, 91.0)
$m_\nu$                                     flat (…, …)

Lens Galaxy Bias
$b^i$ ($i = 1, 5$)                          flat (0.8, 3.0)

Intrinsic Alignment, $A_{\rm IA}(z) = A_{\rm IA}\,[(1+z)/\ldots]^{\eta_{\rm IA}}$
$A_{\rm IA}$                                flat (−…, …)
$\eta_{\rm IA}$                             flat (−…, …)

Lens photo-z shift
$\Delta^1_{z,g}$                            Gauss (…, …)
$\Delta^2_{z,g}$                            Gauss (…, …)
$\Delta^3_{z,g}$                            Gauss (…, …)
$\Delta^4_{z,g}$                            Gauss (…, …)
$\Delta^5_{z,g}$                            Gauss (…, …)

Source photo-z shift
$\Delta^1_{z,\kappa}$                       Gauss (…, …)
$\Delta^2_{z,\kappa}$                       Gauss (…, …)
$\Delta^3_{z,\kappa}$                       Gauss (…, …)
$\Delta^4_{z,\kappa}$                       Gauss (…, …)

Shear calibration
$m^i$ ($i = 1, 4$)                          Gauss (…, …)

The theoretical modeling and covariance computation and validation for the DES-Y1 3x2pt analysis are described in detail in Krause et al. (2017). We summarize the main modeling details briefly below.
The DES 3x2pt data vector consists of the angular galaxy clustering statistic $w^i(\theta)$ of galaxies in redshift bin $i$, the galaxy–galaxy lensing statistic $\gamma_t^{ij}(\theta)$ for galaxies in redshift bin $i$ and shape measurements for source galaxies in redshift bin $j$, and the cosmic shear two-point correlation functions $\xi_\pm^{ij}(\theta)$ of shape measurements for source galaxies in redshift bins $i, j$. The galaxy sample used in the clustering measurement, which also constitutes the "lens" sample for galaxy-galaxy lensing, is selected using the redMaGiC algorithm (Rozo et al. 2016). Details on the DES-Y1 sample selection and redshift calibration are described in Elvin-Poole et al. (2018); Cawthon et al. (2018). For the weak lensing galaxy sample, we adopt the DES-Y1 metacal source galaxy sample, for which the sample selection from the DES-Y1 gold catalog (Drlica-Wagner et al. 2018) and the shear catalog are described in Zuntz et al. (2018), and the source redshift estimates are described in Hoyle et al. (2018).

We denote the redshift distribution of the redMaGiC/metacal source galaxy sample in tomography bin $i$ as $n^i_{g/\kappa}(z)$, and the angular number densities of galaxies in this redshift bin as

$$\bar{n}^i_{g/\kappa} = \int dz\, n^i_{g/\kappa}(z). \tag{11}$$

Assuming a flat ΛCDM universe, we write the radial weight function for clustering in terms of the comoving radial distance $\chi$ as

$$q^i_{\delta_g}(k,\chi) = b^i(k,z(\chi))\, \frac{n^i_g(z(\chi))}{\bar{n}^i_g}\, \frac{dz}{d\chi}, \tag{12}$$

with $b^i(k,z(\chi))$ the galaxy bias of the redMaGiC galaxies in tomography bin $i$, and the lensing efficiency

$$q^i_\kappa(\chi) = \frac{3 H_0^2 \Omega_m}{2 c^2}\, \frac{\chi}{a(\chi)} \int d\chi'\, \frac{n^i_\kappa(z(\chi'))\, dz/d\chi'}{\bar{n}^i_\kappa}\, \frac{\chi' - \chi}{\chi'}, \tag{13}$$

where $H_0$ is the Hubble constant, $c$ the speed of light, and $a$ the scale factor. The angular power spectra for cosmic shear, galaxy-galaxy lensing, and galaxy clustering are calculated using the Limber approximation

$$C^{ij}_{\kappa\kappa}(l) = \int d\chi\, \frac{q^i_\kappa(\chi)\, q^j_\kappa(\chi)}{\chi^2}\, P_{\rm NL}\!\left(\frac{l+1/2}{\chi}, z(\chi)\right),$$
$$C^{ij}_{\delta_g\kappa}(l) = \int d\chi\, \frac{q^i_{\delta_g}\!\left(\frac{l+1/2}{\chi},\chi\right) q^j_\kappa(\chi)}{\chi^2}\, P_{\rm NL}\!\left(\frac{l+1/2}{\chi}, z(\chi)\right),$$
$$C^{ij}_{\delta_g\delta_g}(l) = \int d\chi\, \frac{q^i_{\delta_g}\!\left(\frac{l+1/2}{\chi},\chi\right) q^j_{\delta_g}\!\left(\frac{l+1/2}{\chi},\chi\right)}{\chi^2}\, P_{\rm NL}\!\left(\frac{l+1/2}{\chi}, z(\chi)\right), \tag{14}$$

where $P_{\rm NL}(k,z)$ is the non-linear matter power spectrum at wave vector $k$ and redshift $z$ computed via Halofit (Takahashi et al. 2012).

The angular correlation functions are calculated from the angular power spectra as

$$\xi^{ij}_{+/-}(\theta) = \int \frac{dl\, l}{2\pi}\, J_{0/4}(l\theta)\, C^{ij}_{\kappa\kappa}(l),$$
$$\gamma^{ij}_t(\theta) = \int \frac{dl\, l}{2\pi}\, J_2(l\theta)\, C^{ij}_{\delta_g\kappa}(l),$$
$$w^i(\theta) = \sum_l \frac{2l+1}{4\pi}\, P_l(\cos\theta)\, C^{ii}_{\delta_g\delta_g}(l), \tag{15}$$

with $J_n(x)$ the $n$-th order Bessel function of the first kind, and $P_l(x)$ the Legendre polynomial of order $l$.

The DES-Y1 baseline model includes nuisance parameters to account for uncertainties in astrophysical and observational systematic effects, summarized below. Prior distributions of our parameters are given in Table 1, similar to those in DES-Y1 analyses. Parameters with Gaussian priors (i.e. the lens photo-z shifts, the source photo-z shifts, and the shear calibrations) are prior-dominated. A detailed validation of these parameterizations can be found in Elvin-Poole et al. (2018); Krause et al. (2017) and Troxel et al. (2018).

Photometric redshift uncertainties
The uncertainty in the redshift distribution $n$ is modeled through shift parameters $\Delta_z$,

$$n^i_x(z) = \hat{n}^i_x\!\left(z - \Delta^i_{z,x}\right), \quad x \in \{g, \kappa\}, \tag{16}$$

where $\hat{n}$ denotes the estimated redshift distribution. We marginalize over one parameter for each source and lens redshift bin (nine parameters in total), using the priors derived in Hoyle et al. (2018); Cawthon et al. (2018).

Multiplicative shear calibration is marginalized using one parameter $m^i$ per redshift bin, which affects cosmic shear and galaxy–galaxy lensing correlation functions via

$$\xi^{ij}_\pm(\theta) \longrightarrow (1 + m^i)(1 + m^j)\, \xi^{ij}_\pm(\theta),$$
$$\gamma^{ij}_t(\theta) \longrightarrow (1 + m^j)\, \gamma^{ij}_t(\theta), \tag{17}$$

with Gaussian priors as determined in Troxel et al. (2018); Zuntz et al. (2018).

Galaxy bias

The DES-Y1 baseline model assumes an effective linear galaxy bias ($b$) using one parameter per galaxy redshift bin, $b^i(k,z) = b^i$, i.e. five parameters, which are marginalized over conservative flat priors.

Intrinsic galaxy alignments (IA) are modeled using a power spectrum shape and amplitude $A(z)$, assuming the non-linear linear alignment (NLA) model (Hirata & Seljak 2004; Bridle & King 2007) for the IA power spectrum. The impact of this specific IA power spectrum model can be written as

$$q^i_\kappa(\chi) \longrightarrow q^i_\kappa(\chi) - A(z(\chi))\, \frac{n^i_\kappa(z(\chi))}{\bar{n}^i_\kappa}\, \frac{dz}{d\chi}. \tag{18}$$

The IA amplitude is modeled as a power-law scaling in $(1+z)$ with normalization $A_{\rm IA}$ and power-law slope $\eta_{\rm IA}$ (see Table 1), which are both marginalized using conservative priors.

ΛCDM DATA VECTORS
In this section, we analyze the distribution of Bayesian evidence ratios for a set of realistic noise realizations of the DES-Y1 data vectors around the DES-Y1 best-fit ΛCDM cosmology. We aim to examine which of these noise realizations of ΛCDM can be flagged as tension according to the Jeffreys scale. We also investigate whether noise realizations at the one σ level are more or less likely to be classified as tension by the Jeffreys scale compared to three and five σ events.

In the following two sections we run multiple simulated DES-Y1 likelihood analyses to explore the distribution of Bayesian evidence ratios as a function of different input data vectors. The input data vectors computed in Sect. 4.2 resemble realistic noise realizations of the DES-Y1 survey assuming the DES-Y1 best-fit cosmology. The input data vectors in Sect. 5.1 are computed from a modified gravity model, thereby inducing a physical tension between the weak lensing and the galaxy clustering part of the data vector.

Throughout this paper we assume that the likelihood function ($L$) of our data vector ($D$) is well approximated by a multivariate Gaussian

$$L \propto \exp\left(-\frac{1}{2}\left[\left(D - M(\vec{\theta})\right)^t C^{-1} \left(D - M(\vec{\theta})\right)\right]\right), \tag{19}$$

where $M$ denotes the theory prediction or model vector. As Lin et al. (2019) demonstrate, the Gaussian functional form is an acceptable approximation, at least for ongoing and future cosmic shear surveys.

Figure 1. The distribution of $\vec{\chi}^2$ for cosmic shear, $\chi^2_{\rm shear}$, and the 2x2pt (galaxy-galaxy lensing and galaxy clustering), $\chi^2_{\rm 2x2pt}$, generated using the DES-Y1 joint covariance matrix. We compute the one, three, and five σ confidence intervals from the generation of hundreds of millions of noise realizations, smooth the contours, and define confidence intervals using a KDE. The data vectors are chosen along these contours and are represented as colored points. The color-code denotes the log-evidence ratio of the 3x2pt evidence to the 2x2pt and shear evidences (c.f. Eq. 4). Our selected points sample the confidence limits in all radial directions and we do not find radial or angular trends of the evidence ratio.

We use CosmoLike (Krause & Eifler 2017) with CLASS (Lesgourgues 2011a; Blas et al. 2011; Lesgourgues 2011b; Lesgourgues & Tram 2011) to compute the fiducial data vector and covariance. We sample the parameter space with the PolyChord (Handley et al. 2015) nested sampler, with an interface implemented in the Cobaya framework (Torrado & Lewis 2020), assuming the CAMB (Lewis et al. 2000; Howlett et al. 2012) Boltzmann code. We perform extensive tests of our pipeline that merged CosmoLike and Cobaya, further described in Appendices A and C.
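The Gaussian likelihood of Eq. 19 can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the CosmoLike or Cobaya implementation: the function name `gaussian_loglike` is hypothetical, and the two-point data vector and covariance are toy values chosen so that the result can be checked by hand.

```python
import numpy as np

def gaussian_loglike(data, model, cov):
    """Return ln L up to an additive constant: -1/2 (D - M)^t C^{-1} (D - M)."""
    resid = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    # Factor C = L L^t and solve L y = resid, so that y.y = resid^t C^{-1} resid.
    # This avoids forming the explicit inverse of the covariance matrix.
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))
    y = np.linalg.solve(chol, resid)
    return -0.5 * float(y @ y)

# Toy example (placeholder numbers, not DES-Y1 values).
toy_cov = np.array([[4.0, 1.0],
                    [1.0, 2.0]])
toy_data = np.array([1.0, 2.0])
toy_model = np.array([0.0, 0.0])
lnl = gaussian_loglike(toy_data, toy_model, toy_cov)
print(lnl)
```

For these toy numbers the quadratic form equals 2, so the log-likelihood is $-1$. In a sampler the model vector would be recomputed at every point in parameter space while the data vector and covariance stay fixed.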
The DES-Y1 covariance matrix for cosmic shear, galaxy-galaxy lensing, and galaxy clustering and the noiseless fiducial data vector are evaluated at the DES-Y1 best-fit cosmology using CosmoLike. We use the DES-Y1 covariance matrix to generate hundreds of millions of (Gaussian) noise realizations around the noiseless fiducial DES-Y1 ΛCDM best-fit data vector. The generation of a large sample of noise realizations densely populates the $\vec{\chi}^2 = (\chi^2_{\rm shear}, \chi^2_{\rm 2x2pt})$ space around our fiducial data vector. We then apply a Kernel Density Estimator (KDE) to define, from the samples, confidence intervals of agreement. Based on these confidence regions we select 68 data vectors that lie at the one σ, three σ, and five σ confidence intervals with approximate angular uniformity in $\vec{\chi}^2$ space.

Figure 2. Histogram of evaluated log-evidence ratios at the one, three, and five σ confidence intervals. For comparison, we include the log-evidence ratios of our noiseless fiducial data vector and the official DES-Y1 analysis. The mean log-evidence ratio of each confidence interval is represented as a dotted line, with the mean and scatter explicitly given for each interval in the top-right key. The histogram reveals that the points on each contour all have similar log-evidence ratio distributions. The histogram also shows that the observed DES-Y1 evidence ratio is rather typical and does not point to an unusual level of agreement between the datasets, whereas the Jeffreys scale declares the DES-Y1 log-evidence ratio to be decisive agreement. Key: DES Y1: 6.39; Noiseless: 6.61; 5σ: 5.8 ± 3.16; 3σ: 5.9 ± 4.52; 1σ: 4.74 ± 2.59.

The KDE method, implemented with the help of GetDist (Lewis 2019) routines, approximates the probability distribution of a continuum of values for $\vec{\chi}^2$ from $N$ generated samples $\vec{\chi}^2_{i=1,\cdots,N}$ as follows

$$P(\vec{\chi}^2) = \frac{1}{N} \sum_{i=1}^{N} K_f\!\left(\vec{\chi}^2 - \vec{\chi}^2_i\right), \tag{20}$$

where $K_f$ is a multivariate Gaussian kernel with zero mean and covariance $f \times \hat{C}$, where $\hat{C}$ is the sample covariance of the $\vec{\chi}^2$. We found that, given our large sample of computed data vectors, $f \sim$ … is a good choice to balance smoothing and noise features in the $P(\vec{\chi}^2)$ contours. Figure 1 shows the final selection of data vectors as seen in $\vec{\chi}^2$ space and displays the 1-5σ confidence intervals as determined by our selected KDE. The angular distribution of the selected noise realizations nicely covers all quadrants. Figure 1 also illustrates the evidence ratios of the selected data vector realizations; specifically, the color bar shows the natural-log ratio of the data vector's 3x2pt evidence to its 2x2pt and shear evidences as defined in Eq. 4.
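The procedure described above (drawing Gaussian noise realizations from a joint covariance, computing the two $\chi^2$ values, and smoothing their distribution with the kernel estimator of Eq. 20) can be sketched as follows. This is a toy illustration, not the GetDist-based pipeline used in the paper. The covariance is random rather than the DES-Y1 matrix, the block sizes and the kernel factor `f=0.1` are placeholders, the helper names are hypothetical, and each $\chi^2$ is assumed to be computed with the corresponding sub-block of the joint covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint covariance: 3 "shear" points followed by 4 "2x2pt" points (placeholder sizes).
n_shear, n_2x2 = 3, 4
n_tot = n_shear + n_2x2
a = rng.normal(size=(n_tot, n_tot))
cov = a @ a.T + n_tot * np.eye(n_tot)   # symmetric positive definite by construction
fiducial = np.zeros(n_tot)              # stands in for the noiseless fiducial data vector

# Gaussian noise realizations around the fiducial data vector.
n_real = 20000
data = rng.multivariate_normal(fiducial, cov, size=n_real)

def block_chi2(resid, cov_block):
    """chi^2 = r^t C^{-1} r for every row r of resid."""
    return np.einsum("ni,ij,nj->n", resid, np.linalg.inv(cov_block), resid)

resid = data - fiducial
chi2_shear = block_chi2(resid[:, :n_shear], cov[:n_shear, :n_shear])
chi2_2x2 = block_chi2(resid[:, n_shear:], cov[n_shear:, n_shear:])
chi2_vec = np.column_stack([chi2_shear, chi2_2x2])

def kde_density(points, samples, f=0.1):
    """Eq. 20 in 2D: mean of Gaussian kernels with covariance f * sample covariance."""
    kcov = f * np.cov(samples, rowvar=False)
    kinv = np.linalg.inv(kcov)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(kcov)))
    diff = points[:, None, :] - samples[None, :, :]
    expo = -0.5 * np.einsum("pni,ij,pnj->pn", diff, kinv, diff)
    return norm * np.exp(expo).mean(axis=1)

# Evaluate the smoothed density at a subset of points, using a subset of samples as kernels.
density = kde_density(chi2_vec[:500], chi2_vec[:2000])
print(chi2_shear.mean(), chi2_2x2.mean(), density[:3])
```

In this toy setup each $\chi^2$ scatters around the number of data points in its block, and contours of constant `density` would play the role of the confidence regions from which data vectors are selected.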
Figure 3. Correlation between Bayesian evidence ratios and $\Delta\bar{\chi}^2$ (left panel), and between Bayesian evidence ratios and suspiciousness (right panel). In both cases, the fit parameters of the slope are similar for one, three, and five σ noise realizations. For $\Delta\bar{\chi}^2$, the slope of the fit is close to that predicted for multivariate Gaussian posteriors. Fits, left panel: Combined: −1.95 ln R + 10.43; 1σ: −1.82 ln R + 9.9; 3σ: −1.96 ln R + 10.68; 5σ: −2.0 ln R + 10.49. Fits, right panel (ln S): Combined: 1.0 ln R − 12.52; 1σ: 0.94 ln R − 12.35; 3σ: 1.0 ln R − 12.6; 5σ: 1.01 ln R − 12.5.

Using the data vectors as generated in Sect. 4.2, we now investigate whether statistical fluctuations in the DES-Y1 data vector have a high probability of causing tension (as defined by the Jeffreys scale).

Figure 1 shows that there is no radial or angular dependency in the value of the evidence ratio as a function of the $\chi^2$ values in cosmic shear and 2x2pt. Similarly, Fig. 2 shows no differences in the evidence ratio distribution associated with one, three, and five σ noise realizations; the histograms of evidence ratios are all centered on large positive values as predicted by Raveri & Hu (2019) and Handley & Lemos (2019) for wide uninformative priors.

The comparison between the evidence ratio and suspiciousness (c.f. Fig. 3) shows that broad priors significantly increase the number of noise fluctuations that are not flagged as internal tension by evidence ratios, but that would be flagged by suspiciousness. It is however not clear that a prior independent metric, such as suspiciousness, is necessarily more objective. While Bayesian evidence tends to hide tensions if broad priors are chosen, it is important to note that tensions in data are inevitably connected to our prior understanding of the situation. Handley & Lemos (2019) argue that some known tensions in cosmology would have been interpreted differently had they been observed decades ago, when our prior beliefs encompassed a broader range.

It is difficult to estimate which tension estimator is a better choice. In Fig. 3 (right panel), we present a comparison and relative calibration between evidence ratios and suspiciousness (for the specific DES-Y1 case considered in this paper).
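The different prior sensitivity of the two metrics can be illustrated with a one-parameter toy problem, evaluated by brute-force integration of Eqs. 2 and 8 on a grid and combined according to Eqs. 4, 9 and 10. Everything in this sketch is an assumption made for illustration (a flat prior centered on zero, two Gaussian likelihoods of width 0.5 separated by 2 in the parameter, and the hypothetical helper names); it is not the nested-sampling evaluation used for the DES-Y1 chains.

```python
import numpy as np

def log_evidence_and_kl(loglike, theta, prior_width):
    """Evidence (Eq. 2) and KL divergence (Eq. 8) for a flat prior on a uniform grid."""
    dtheta = theta[1] - theta[0]
    prior = 1.0 / prior_width
    like = np.exp(loglike)
    evidence = np.sum(like * prior) * dtheta
    post = like * prior / evidence
    mask = post > 0   # skip grid points where the posterior underflows to zero
    kl = np.sum(post[mask] * np.log(post[mask] / prior)) * dtheta
    return np.log(evidence), kl

def tension_metrics(mu1, sig1, mu2, sig2, prior_width, n=400001):
    """Return (ln R, ln S) for two Gaussian likelihoods sharing one parameter."""
    theta = np.linspace(-prior_width / 2.0, prior_width / 2.0, n)
    ll1 = -0.5 * ((theta - mu1) / sig1) ** 2
    ll2 = -0.5 * ((theta - mu2) / sig2) ** 2
    lnz1, d1 = log_evidence_and_kl(ll1, theta, prior_width)
    lnz2, d2 = log_evidence_and_kl(ll2, theta, prior_width)
    lnz12, d12 = log_evidence_and_kl(ll1 + ll2, theta, prior_width)
    ln_r = lnz12 - lnz1 - lnz2    # Eq. 4
    ln_i = d1 + d2 - d12          # Eq. 10
    return ln_r, ln_r - ln_i      # Eq. 9

# Same pair of datasets analyzed with a narrow and a ten times wider prior.
results = {w: tension_metrics(-1.0, 0.5, 1.0, 0.5, w) for w in (20.0, 200.0)}
for width, (ln_r, ln_s) in results.items():
    print(width, ln_r, ln_s)
```

For these toy numbers $\ln R$ moves from roughly $-1.6$ to roughly $+0.7$ when the prior is widened by a factor of ten, while $\ln S$ stays at about $-3.5$ in both cases. On the Jeffreys scale the sign change of $\ln R$ would be read as a change from mild tension to mild agreement, even though the data are identical.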
Our results show how metrics that rely, at least for Gaussian likelihoods, solely on the likelihood of the data differ from tension estimators that take the DES-Y1 prior beliefs into account.

Figure 2 shows that the observed DES-Y1 evidence ratio does not point towards an exceptional level of agreement between the datasets as would be inferred by the Jeffreys scale. Generally speaking, we do not find a significant difference in the evidence ratio's mean or variance between data vectors drawn from the 1-σ, 3-σ, and 5-σ noise levels (also c.f. Fig. 3, left panel). In addition, we also find that a noisy DES-Y1 data realization from the 1-σ confidence region can have a negative log-evidence ratio, which would point towards a significant discrepancy. These findings make it difficult to motivate the DES-Y1 Bayesian evidence ratio as a strong indicator for significant agreement between cosmic shear and 2x2pt.

In the case of correlated Gaussians, the evidence ratio and $\Delta\chi^2 = \chi^2_{12} - \chi^2_1 - \chi^2_2$ (i.e. the maximum log-likelihoods) are linearly correlated. In our DES-Y1 posteriors, we however find that a linear combination of the posterior-averaged log-likelihoods, defined as $\Delta\bar{\chi}^2$ (Eq. 6), is correlated with the evidence ratio. No correlation was found when comparing evidence ratios against generalized parameter distances.

Figure 4. Comparison between the evidence ratio, $R$, for models with $\Sigma_0 = \{\ldots\}$ and the evidence ratio for the ΛCDM model ($\Sigma_0 = 0$), shown as the logarithm of the ratio of the two. Black diamonds are chains with the DES-Y1 covariance (noiseless), while blue squares and red triangles are noiseless chains with covariances that were divided by 20 and 50, respectively. For DES-Y1 chains, the posteriors of many parameters are pressed against the prior boundaries before inconsistencies between cosmic shear and 2x2pt become important, which explains the unexpected behavior of the evidence ratio going up as a function of $\Sigma_0$.

Figure 5. The posterior distribution of selected parameters for cosmic shear (dashed) and 2x2pt (solid) analyses, and for the default DES-Y1 covariance (yellow) against the case where the covariance was reduced by a factor of 50 (blue). While it is true that $\Sigma_0 \neq 0$ predicts inconsistencies between the cosmological parameters in ΛCDM, it is difficult to see them in DES-Y1 chains. Not only are the error bars larger in DES-Y1, but the posteriors are also being squeezed against the prior boundaries.

In this section we investigate the evidence ratio's behavior when assuming a μ-Σ modified gravity scenario (as studied in Abbott et al. (2019b), Ade et al. (2016), Aghanim et al. (2018), and Simpson et al. (2012)) that induces tension between the weak lensing and the galaxy clustering parts of the 3x2pt data vector. Recall that $\Sigma_0 \neq 0$ only affects cosmic shear and galaxy-galaxy lensing.

Following the definitions in Ferreira & Skordis (2010), the Poisson and lensing equations in Newtonian gauge are altered in the μ-Σ model as:

$$k^2 \Psi = -4\pi G a^2 \left(1 + \mu(a)\right) \rho\delta, \tag{21}$$
$$k^2 \left(\Psi + \Phi\right) = -8\pi G a^2 \left(1 + \Sigma(a)\right) \rho\delta. \tag{22}$$

Similar to the ΛCDM case (c.f. Sect. 4.2), we compute the μ-Σ data vector at the DES-Y1 best-fit parameter values. Specifically, we set $\mu(a) = \mu_0\, \Omega_\Lambda(z)/\Omega_\Lambda$ and $\Sigma(a) = \Sigma_0\, \Omega_\Lambda(z)/\Omega_\Lambda$, with $\Omega_\Lambda(z)$ being the redshift dependent dark energy density over the critical density. No noise is added to the modified gravity data vectors. Similar to the ΛCDM cases, we apply Halofit (Takahashi et al. 2012) to compute the nonlinear matter power spectrum in the μ-Σ case. The fact that Halofit does not correctly describe the nonlinear physics of μ-Σ gravity is not a significant concern for this paper since it is not our goal to analyze actual data. Instead our goal is to examine changes in the evidence ratio when the data vector is computed from a different underlying physics than the model that is assumed in the analysis.

We now investigate induced internal tensions in the case where a data vector originating from μ-Σ gravity (see Sect. 5.1 for definitions) is evaluated in the DES-Y1 pipeline for a ΛCDM cosmology. We have generated fiducial data vectors with fixed $\mu_0 = \ldots$ and $\Sigma_0$ ranging over $\ldots \leq \Sigma_0 \leq \ldots$. We have not added noise realizations from DES-Y1 covariances; the modified gravity data vector is noise free. Figure 4 presents a surprising behavior of evidence ratios: the log-evidence ratio of the noiseless modified gravity data vector, relative to that of our fiducial noiseless ΛCDM data vector, increases as a function of $\Sigma_0$ (black diamonds). This means that the physical tension introduced by the modified gravity parameters in the galaxy clustering, galaxy-galaxy lensing, and cosmic shear parts of the data vector is not identified as such by the Bayesian evidence ratio.

Such unexpected behavior of the evidence ratio can be better understood by looking at Fig. 5. We see that several parameters are pushing against the prior boundaries. This boundary effect reduces differences between the cosmological parameters that fit cosmic shear and 2x2pt at the expense of making the goodness of fit between theory and data worse. To check that prior boundaries are indeed responsible for the unusual behavior of the evidence ratio, we re-examine the log-evidence ratio of the noiseless modified gravity data vector and our fiducial noiseless ΛCDM data vector, but this time we rescale the covariance matrices by factors of twenty (c.f. Fig. 4 blue squares) and fifty (c.f. Fig. 4 red triangles). This rescaling procedure significantly reduces the posterior volume, which reduces or even removes the prior boundary effects. Indeed, the evidence ratio now decreases as a function of $\Sigma_0$ as expected. This type of behavior exemplifies the difficulties in interpreting tension metrics in realistic examples without extensive validation via simulated analyses.

Tension metrics are an important aspect of multi-probe analyses; they will be used increasingly to determine whether probes can be combined or whether tension across probes needs to be further explored. However, tension metrics themselves need to be calibrated by simulated analyses for each dataset in order to define levels of discordance.

In this work we study the properties of several tension metrics for the specific case of the DES-Y1 3x2pt analysis. In Abbott et al. (2018b) the individual analyses of 1) cosmic shear and 2) galaxy-galaxy lensing plus galaxy clustering (so-called 2x2pt) were compared and ultimately combined into a so-called 3x2pt analysis. Both data vectors, cosmic shear and 2x2pt, were deemed consistent under an assumed ΛCDM model. Consistency was demonstrated by computing the Bayesian evidence ratio, with the result $\ln R = 6.39$, and interpreted using the Jeffreys scale. Bayesian evidence ratios, however, are known to be prior dependent and it is important to calibrate the computed numbers through a large suite of simulated analyses.

In this paper we calibrate the distribution of evidence ratios for a large set of noise realizations around the DES-Y1 best-fit ΛCDM cosmology. The noisy data vectors are drawn from the DES-Y1 data covariance, not from the parameter covariance. While the data covariance and parameter covariance are closely related, noise realizations drawn from the low-dimensional parameter covariance map onto smooth modulations in the 457-dimensional data space with little scatter from the fiducial data vector.
Our data covariance includes Gaussian cosmic variance, shot/shape noise (for clustering/weak lensing, respectively), and non-Gaussian contributions to the covariance from the connected four-point function of the matter density field as well as super-sample covariance (SSC) (Takada & Hu 2013). As the Gaussian cosmic variance terms and shape/shot noise are caused, respectively, by the limited number of independent Fourier modes sampled in each angular bin and the limited number of galaxies sampled in the power spectrum measurement, noise realizations drawn from the data covariance are nearly uncorrelated between different Fourier modes and provide "noisy" scatter with little noticeable bias from the fiducial data vector.

We run multiple simulated likelihood analyses for a DES-Y1 cosmic shear, 2x2pt, and 3x2pt data vector and find that the Bayesian evidence ratio obtained by DES-Y1 (6.39) is rather typical. We then explore evidence ratios where noiseless data vectors computed from a µ − Σ modified gravity model are analyzed with a pipeline that assumes a ΛCDM model. Under these assumptions, a physical tension is induced between the weak lensing and galaxy clustering parts of the 3x2pt data vector, and we explore the behavior of the Bayesian evidence ratio as a function of the strength of the modified gravity model (increasing Σ). We demonstrate that prior boundary effects can efficiently hide tensions between the weak lensing and galaxy clustering parts of the 3x2pt data vector. When significantly increasing the constraining power, by dividing the covariance by factors of 20 and 50, we show that such boundary effects are significantly reduced and the expected tension appears.

Our findings confirm that the evidence ratio, as measured by the Jeffreys scale, is biased towards compatibility between the datasets due to DES-Y1's adopted priors. These wide priors were intentionally chosen conservatively and did not take into account prior knowledge from other experiments. Such wide priors have the potential to hide tensions between probes. In the near future DES data quality will be superseded by stage IV experiments, in particular Rubin Observatory's LSST (Ivezić et al. 2019), SPHEREx (Bock & SPHEREx Science Team 2018), Euclid (Masters et al. 2017), and the Roman Space Telescope (Spergel et al. 2015; Eifler et al. 2020). These experiments will provide an unprecedented amount of high-quality data that will enable not just 3x2pt analyses, as considered in this paper, but a large variety of other cosmological probes as well. Exploring tensions between probes of the same dataset and (even more interesting) between datasets will be a critical part of the data analysis of these missions, throughout which simulated analyses to calibrate tension metrics should become a standard tool in precision cosmology.

ACKNOWLEDGMENTS
We want to thank T. Eifler for the thorough review and extensive suggestions that improved the presentation of our results. We would also like to thank P. Lemos and M. Raveri for fruitful discussions. VM is supported by NASA ROSES 16-ADAP16-0116 and NASA ROSES ATP 16-ATP16-0084. PR and EK are supported by Department of Energy grant DE-SC0020247.
APPENDIX A: PIPELINE VALIDATION
This appendix focuses on the technical aspects of the pipeline calibration. As shown in the main manuscript, the DES posteriors are non-Gaussian in some dimensions, while the DES priors are partially informative in several directions where the likelihood is weakly constraining. Such properties affect the required calibration of sampler hyperparameters, such as Multinest's efficiency (Feroz et al. 2013), given that the entire volume of the parameter space needs to be well sampled. Indeed, regions in parameter space with low but non-negligible likelihood can contribute to the Bayesian evidence as long as there is enough prior volume where the likelihood has similar values.

The default Multinest configuration on DES-Y1 is: number of live points n_live = , tolerance = . and efficiency = . . Figure A1 reveals biases in the evidence values with such settings. For other hyperparameters, such as the number of live points, changes in the reported evidence are compatible with the quoted error bars. These statements are valid for both the shear-only and the 3x2pt analyses. One prominent feature of Figure A1 is the constant slope of the evidence bias as a function of Multinest's efficiency in the case of the 3x2pt analysis. There is no guarantee, therefore, that even efficiencies of the order of − would provide reliable results, and such settings raise the computational cost of the evidence by one order of magnitude in comparison to the hyperparameter values adopted on DES-Y1. We emphasize that no conclusions on the general applicability of Multinest can be drawn from our analysis; results are specific to DES-Y1. Figure A1 also does not imply that there are no settings where Multinest provides unbiased evidence ratios.

We also checked whether the biases detected in the evidences reported by Multinest could have been identified through features in the posterior by-product, that is, whether the posterior would have drawn attention as being flagrantly corrupted. Figure A2 shows no substantial deviations in the posterior as a function of the efficiency parameter, except for a slight enlargement of the two-sigma contours, and we have run similar chains using the Emcee (Foreman-Mackey et al. 2013) sampler to confirm this statement. Comparisons between Multinest and Emcee require robust calibration of both samplers, as one could argue that a direct comparison could instead point to problems in Emcee.

To double-check that convergence of Emcee has been achieved, we have run extremely long chains to check the consistency of our results. In addition, Figure A3 compares Emcee against a third sampler, Metropolis-Hasting, for which the well-established and reliable Gelman-Rubin criterion (Gelman & Rubin 1992) for convergence can be applied. This comparison also cross-checks our code development, which unites the Cosmolike and Cobaya pipelines. In our new code, Cosmolike receives distances, parameter values, and the matter power spectrum as a function of redshift and wavenumber, and returns the DES-Y1 data vector. This merging allowed us to use both the Polychord and Metropolis-Hasting samplers with the fast-slow decomposition commonly adopted in CMB analyses (Neal 2005; Lewis 2013), while the Emcee and Multinest chains employ the original standalone Cosmolike.

It is unclear how much Multinest's biases might have affected the official DES-Y1 results, and an in-depth analysis of the official DES-Y1 chains is beyond the scope of this article. We do, however, believe that the Cobaya-Cosmolike code combines the pipeline validation effort that has been performed on Cosmolike with samplers that are more robust than Multinest in evaluating Bayesian evidence ratios. Cobaya-Cosmolike also provides Metropolis-Hasting with fast-slow decomposition, which possesses robust convergence criteria; convergence is hard to assess in Emcee. Indeed, the posterior comparison between Metropolis-Hasting and Polychord shows perfect agreement, as seen in Figure A4. Moreover, Figures A5 and A6 show that Polychord's evidence and posterior are robust against variations in the adopted values of its hyperparameters.

One additional issue emerged from the comparison between the CAMB and CLASS Boltzmann codes. While the original Cosmolike is directly integrated with CLASS, the Cobaya framework provided, at the time we ran our simulations, full support only for CAMB. Differences between CAMB and CLASS should have been negligible, but we did detect an extra factor in the Halofit formula implemented by CLASS. We then modified CAMB to match the CLASS choices, and we discuss this issue in greater depth in Appendix C. In addition, CLASS has limitations on the Ω_b h^2 range when dealing with BBN constraints, and because of that Cosmolike assumes the prior . < Ω_b h^2 < . . We therefore applied the same prior choice in the Cobaya-Cosmolike joint pipeline. We do not expect such minor choices to affect the qualitative conclusions of this work.
APPENDIX B: GAUSSIAN APPROXIMATION
There is a significant difference in computational costs between running MCMC for parameter estimation and evaluating Bayesian evidence with nested sampling algorithms. The possibility of assessing evidence ratios using MCMC samples could, therefore, incentivize a more widespread use of such metrics as well as make the recalibration of the Jeffreys scale a lot simpler. Such inference is, however, generically challenging in high-dimensional spaces (see Heavens et al. (2017) and references therein). Recently, Raveri & Hu (2019) proposed a Gaussian approximation to the posterior that can provide an estimate for the evidence ratio. For DES-only chains, some partially constrained parameters are prior limited, which is an indication that the Gaussian approximation may fail. Nevertheless, we tested this approximation on a few data vectors, given the potential reward such a method could have brought to the ongoing DES-Y3 analysis and to this work.

We have followed Raveri & Hu (2019) closely, implementing the Gaussian approximation around either the best fit or the median of the MCMC chain. Initially, we tested this scheme on two noise realizations generated using an approximate DES-Y3 covariance (see Table A1). The use of a DES-Y3 covariance matrix represents a best-case scenario, given that more constraining data should make the Gaussian expansion work better. For shear only, the approximation does not provide accurate Bayesian evidence ratios. Results were more encouraging for the 2x2pt and 3x2pt analyses, and we further examined such cases in eight additional noise realizations. Results are shown in Figure B1. Unfortunately, there are order-unity biases that make the adoption of this approximation in our work unfeasible even for the most constraining 3x2pt analysis.

https://github.com/CosmoLike/cocoa
https://github.com/CobayaSampler/cobaya/issues/46

Figure A1. MultiNest evidence bias as a function of the sampling efficiency (left panel), number of live points (middle panel) and evidence tolerance factor (right panel). As a simplifying assumption, the evidence evaluated from the chain with either the lowest efficiency, the highest number of live points, or the lowest tolerance factor has zero bias by construction. The error bars reflect MultiNest's claimed uncertainties, and no error propagation was applied to take into account the error bars in the value of the unbiased evidence.

Sampler          3x2pt DV0  3x2pt DV1  2x2pt DV0  2x2pt DV1  cosmic shear DV0  cosmic shear DV1  R DV0  R DV1
GLM - Mean       -306.4     -204.0     -172.4     -116.3     -154.5            -110.89           20.5   23.2
GLM - Chain BF   -307.5     -204.6     -176.4     -117.7     -142.1            -91.7             11     4.8
GLM - MKL        -306.4     -204.6     -176.4     -117.7     -154.5            -110.89           24.5   23.9
Polychord        − . ± .    − . ± .    − . ± .    − . ± .    − . ± .           − . ± .

Table A1. Comparison between the predicted Bayesian evidence evaluated using MultiNest, PolyChord, and Gaussian Linear Modeling (GLM) of Metropolis-Hasting chains around either the median of the parameters or the chain best fit. MKL stands for Minimum Kullback-Leibler divergence (Kullback & Leibler 1951); in that row, we select the Gaussian approximation from the two previous cases by minimizing the KL divergence against the full posterior (Raveri & Hu 2019). In all cases, the additional constraint . < Ω_b h^2 < . was applied as an additional top-hat likelihood. DV0 and DV1 represent distinct noise realizations of the best-fit data vector.

Sampler          n_live  Efficiency  Tolerance  n_repeats
Multinest (MN)   256 0 . . –
Polychord        – . 05 3 × dim

Table A2. Default values assumed for the internal parameters of the sampler codes analyzed in our appendix. For MultiNest, tolerance corresponds to the evidence tolerance factor, efficiency is the sampling efficiency (the variable efr), and n_live is the number of live points. In addition, we set to False the boolean variable that enables the constant efficiency mode. For PolyChord, clustering was turned off by default, and n_repeats matches the variable num_repeats. Emcee runs consume a fixed amount of computer resources to ensure that chains contain no fewer than 5 million samples. On the other hand, Metropolis-Hasting samples were run until reaching convergence according to the Gelman-Rubin criterion, where we find the mean and standard deviation of the Gelman-Rubin criterion to be 0.02 and 0.2, respectively.
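To illustrate why prior-limited parameters can undermine a Gaussian evidence estimate of the kind discussed in this appendix, the toy sketch below compares, in one dimension, the exact evidence for a Gaussian-shaped likelihood with a flat top-hat prior against a Laplace-style estimate that integrates the Gaussian over the whole real line. This is not the Gaussian Linear Model of Raveri & Hu (2019); the function names and numerical values are hypothetical and chosen only for illustration.

```python
import math


def log_evidence_exact(mu, sigma, lo, hi):
    """ln E for L(theta) = exp(-(theta - mu)^2 / (2 sigma^2)), flat prior on [lo, hi]."""

    def cdf(x):
        # Cumulative distribution of a Gaussian with mean mu and width sigma.
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    mass = cdf(hi) - cdf(lo)  # fraction of the Gaussian inside the prior range
    return math.log(sigma * math.sqrt(2.0 * math.pi) * mass / (hi - lo))


def log_evidence_gaussian(mu, sigma, lo, hi):
    """Laplace-style estimate: same integral, but ignoring the prior boundaries."""
    return math.log(sigma * math.sqrt(2.0 * math.pi) / (hi - lo))


def log_evidence_bias(mu, sigma, lo, hi):
    """Bias of the Gaussian estimate in ln E (estimate minus exact)."""
    return log_evidence_gaussian(mu, sigma, lo, hi) - log_evidence_exact(mu, sigma, lo, hi)


# Likelihood peak well inside the prior: the approximation is essentially exact.
bias_interior = log_evidence_bias(mu=0.0, sigma=0.1, lo=-1.0, hi=1.0)
# Likelihood peak sitting on the prior edge: half of the Gaussian is cut away.
bias_edge = log_evidence_bias(mu=1.0, sigma=0.1, lo=-1.0, hi=1.0)

print(f"ln-evidence bias, interior peak: {bias_interior:.3f}")
print(f"ln-evidence bias, peak on prior edge: {bias_edge:.3f}")
```

In this toy setup a single parameter pressed against its prior boundary already shifts the log-evidence by ln 2; with several such parameters the shifts can add up, which is qualitatively in line with the biases reported for the prior-limited DES chains.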
[Figure A2 legend: MN, Efficiency: 0.3; MN, Efficiency: 0.0005; EMCEE]

Figure A2. The panel presents the posterior predicted by Multinest as a function of the adopted efficiency hyperparameter. Table A2 shows the values of the additional Multinest settings. The comparison against the Emcee sampler confirms that chains with high efficiency do predict posteriors that are quite close to the truth. Indeed, no posterior feature stands out as an outlier that would signal the need for a lower efficiency, even though the high-efficiency runs show order-unity bias in the evidence (see Figure A1).
[Figure A3 legend: EMCEE: 3x2 DV0; EMCEE: 3x2 DV1; COBAYA-MH: 3x2 DV0; COBAYA-MH: 3x2 DV1]

Figure A3. The figure compares the predicted posterior for the cosmological parameters given by the Emcee and Metropolis-Hasting samplers. Blue shades on the two-dimensional panels correspond to dashed blue lines on the 1D posterior plots. The two 3x2pt data vectors, DV0 and DV1, were generated with noise drawn from a simulated DES-Y3 covariance. The agreement between the two samplers is a good cross-check, considering that the pipelines are somewhat different: the linear power spectrum for Emcee was evaluated with CLASS (default CosmoLike pipeline), while for Metropolis-Hasting we merged Cobaya and CosmoLike and used CAMB to calculate the matter power spectrum.
[Figure A4 legend: COBAYA-PC: 3x2 DV0; COBAYA-PC: 3x2 DV1; COBAYA-MH: 3x2 DV0; COBAYA-MH: 3x2 DV1]

Figure A4. The figure compares the predicted posterior for the cosmological parameters given by Polychord against Metropolis-Hasting. Shades on the 2D panels correspond to dashed lines on the 1D posterior plots. The two 3x2pt data vectors, DV0 and DV1, were generated with noise drawn from a simulated DES-Y3 covariance. In both cases, the matter power spectrum was evaluated using CAMB (without removing the extra Halofit factor shown in Eq. C2).
[Figure A5 legend: Shear, n_repeats: n_DIM; Shear, n_repeats: 3 × n_DIM; Shear, n_repeats: 20 × n_DIM]

Figure A5. The figure compares the predicted posterior for the cosmological parameters given by Polychord as a function of the hyperparameter n_repeats, written in units of the number of parameters in the chain (n_DIM). Blue shades on the two-dimensional panels correspond to dashed blue lines on the 1D posterior plots. On shear-shear, the posterior shows uncertain behavior in the case n_repeats = n_DIM, while no appreciable changes were seen in the range < n_repeats / n_DIM < . This is not necessarily the case for 3x2pt data vectors, where setting n_repeats = n_DIM is acceptable for posteriors.
Figure A6. Polychord evidence bias as a function of the n_repeats parameter (left panel), number of live points (middle panel) and precision criterion (right panel). As a simplifying assumption, the evidence evaluated from the chain with the highest n_repeats (left panel), the highest number of live points (middle panel), or the lowest precision criterion factor (right panel) has zero bias by construction. The parameter n_repeats on the left panel is shown in units of the parameter dimension, n_DIM. The error bars reflect Polychord's claimed uncertainties, and no error propagation was applied to take into account the error bars in the value of the unbiased evidence. Computational costs, the main bottleneck of our chains, scale as O(n_repeats) (Handley et al. 2015), so we have adopted n_repeats = × n_DIM as a middle ground between accuracy and computational costs.

Figure B1. The panels present the comparison between the Bayesian evidence calculated using the Gaussian approximation and Polychord's results. Bias is defined as the difference in the natural logarithm of the Bayesian evidence. The left panel assumes the 3x2pt data vector, while the right panel restricts the analysis to galaxy-galaxy lensing and galaxy clustering. The data vectors were randomly generated using a simulated DES-Y3 covariance. Blue triangles with thick error bars show the results when the Gaussian approximation is made around the median of the chain (legend: Median), while black round points show the results for the Gaussian approximation around the chain sample with the best likelihood (legend: Max Like). The error bars reflect Polychord's claimed uncertainties.
APPENDIX C: HALOFIT
One practical issue that emerged in our sampler comparison is related to implementation differences between the CAMB and CLASS codes. The Cobaya pipeline version adopted in this work had only partial support for CLASS, while CosmoLike is incompatible with CAMB. Therefore, the Metropolis-Hasting and Polychord chains employed CAMB to evaluate the background comoving distances and the nonlinear matter power spectrum, while the Multinest and Emcee chains used CLASS. We consequently tested the compatibility between these Boltzmann codes and spotted discrepancies in the Halofit formula.

The original Takahashi Halofit formula for the nonlinear matter power spectrum, $\Delta^2(k) = k^3 P(k)/(2\pi^2)$, is given by

$\Delta^2(k) = \Delta^2_Q(k) + \Delta^2_H(k)$. (C1)

The specific expressions for $\Delta^2_Q(k)$ and $\Delta^2_H(k)$ can be found in Takahashi et al. (2012). Both CLASS and CAMB include updates to the Takahashi formula that aim to provide better agreement for cosmologies with massive neutrinos. We were unable to find references in peer-reviewed journals for such updates. One of the new terms is, in CLASS, the following:

$\Delta^2_Q(k) \to \Delta^2_Q(k) \{ + f_\nu [ . - . \times (\Omega_m - . ) ] \}$, (C2)

with $f_\nu \equiv \Omega_\nu / \Omega_m$. In CAMB, on the other hand, the term proportional to $(\Omega_m - . )$ does not exist; the impact of this factor is shown in Figure C1.

APPENDIX D: NESTED SAMPLING
Evaluation of the Bayesian evidence is possible with nested sampling algorithms (Skilling 2006), and we briefly review them in this appendix. Let $P(\vec{\theta}|H)$ be the prior distribution of the parameters $\vec{\theta}$ within a model $H$, $L$ be the likelihood distribution $P(\vec{d}|\vec{\theta}, H)$, and $E$ be the evidence $P(\vec{d}|H)$. We define $X(\lambda)$ to be the fraction of the prior volume contained within the isolikelihood contour given by $P(\vec{d}|\vec{\theta}, H) = \lambda$, as shown below:

$X(\lambda) = \int_{L > \lambda} d\vec{\theta} \, P(\vec{\theta}|H)$. (D1)

Nested sampling algorithms evaluate evidences via the one-dimensional integral

$E = \int L(X) \, dX$. (D2)

This integration is performed by maintaining a set of $n_{\rm live}$ live points that samples a sequence of exponentially contracting volumes respecting the hard boundary $L > L_i$ at iteration $i + 1$. The $L_i$ value corresponds to the worst likelihood of all live points at iteration $i$; this point is subsequently discarded and replaced by another point with $L > L_i$. Making this replacement efficient is the technically challenging part of the algorithm (see Feroz et al. (2013) and Handley et al. (2015) for specific implementations). The discarded points are named dead points, and the discretization of the one-dimensional evidence integral above is given by

$E \approx \sum_{i \in {\rm dead}} (X_{i-1} - X_i) \times L_i$. (D3)

The precise $X_i$ volumes are unknown, but can be probabilistically estimated. To reconstruct the prior volume at the $i$th iteration, the algorithm samples $n_{\rm live}$ times the uniform distribution spanning from 0 to $X_{i-1}$ and retrieves the maximum prior volume (Skilling 2006).

The same procedure can also be used to calculate the KL divergence

$D \approx \sum_{i \in {\rm dead}} (X_{i-1} - X_i) \times \frac{L_i}{E} \ln\left(\frac{L_i}{E}\right)$. (D4)

This expression allows us to evaluate suspiciousness using the same nested sampling runs used to calculate evidence, and we have cross-checked our numerical results for the KL divergence against the anesthetic package (run on the same chains) (Handley 2019). Finally, this section also shows why the evaluation of the Surprise metric is challenging. The calculation of the relative entropy between datasets would require additional nested sampling runs where the "prior" would be one of the dataset's posteriors.

CAMB commit on the official GitHub repository https://github.com/cmbant/CAMB .
CLASS commit on the official GitHub repository https://github.com/lesgourg/class_public

[Figure C1 legend: CAMB Halofit: 3x2 DV0; CAMB Halofit: 3x2 DV1; Class Halofit: 3x2 DV0; Class Halofit: 3x2 DV1]

Figure C1. This figure compares the impact of the additional term that CLASS implements in Halofit with the expression that CAMB assumes for the nonlinear completion of the matter power spectrum. All MCMC chains adopted the Metropolis-Hasting sampler and the CAMB code. Shades on the two-dimensional panels correspond to dashed lines on the one-dimensional posterior plots. The two 3x2pt data vectors, DV0 and DV1, were randomly generated around the default cosmology using a simulated DES-Y3 covariance. As expected, the posteriors differ the most in the region of parameter space associated with high values of the sum of neutrino masses. Such discrepancy is also non-negligible in the one-dimensional Ω_m and H_0 marginalized posteriors.

REFERENCES
Abazajian K. N., et al., 2016. (arXiv:1610.02743)
Abbott T., et al., 2005. (arXiv:astro-ph/0510346)
Abbott T. M. C., et al., 2018a, Mon. Not. Roy. Astron. Soc., 480, 3879
Abbott T. M. C., et al., 2018b, Phys. Rev., D98, 043526
Abbott T. M. C., et al., 2019a, ApJ, 872, L30
Abbott T. M. C., et al., 2019b, Phys. Rev., D99, 123505
Ade P. A. R., et al., 2016, Astronomy & Astrophysics, 594, A14
Aghanim N., et al., 2018. (arXiv:1807.06209)
Akeson R., et al., 2019. (arXiv:1902.05569)
Alam S., et al., 2017, Mon. Not. Roy. Astron. Soc., 470, 2617
Albrecht A., et al., 2006. (arXiv:astro-ph/0609591)
Albrecht A., et al., 2009. (arXiv:0901.0721)
Asgari M., et al., 2020. (arXiv:2007.15633)
Austermann J. E., et al., 2012, SPTpol: an instrument for CMB polarization measurements with the South Pole Telescope. p. 84521E, doi:10.1117/12.927286
Blas D., Lesgourgues J., Tram T., 2011, J. Cosmology Astropart. Phys., 2011, 034
Bock J., SPHEREx Science Team 2018, in American Astronomical Society Meeting Abstracts
(arXiv:1306.2144)
Ferreira P. G., Skordis C., 2010, Physical Review D, 81
Foreman-Mackey D., Hogg D. W., Lang D., Goodman J., 2013, PASP, 125, 306
Gelman A., Rubin D. B., 1992, Statist. Sci., 7, 457
Handley W., 2019. (arXiv:1905.04768), doi:10.21105/joss.01414
Handley W., Lemos P., 2019, Phys. Rev., D100, 043504
Handley W. J., Hobson M. P., Lasenby A. N., 2015, Mon. Not. R. Astron. Soc., 453, 4384
Heavens A., Fantaye Y., Mootoovaloo A., Eggers H., Hosenie Z., Kroon S., Sellentin E., 2017. (arXiv:1704.03472)
Heymans C., et al., 2020, arXiv e-prints, p. arXiv:2007.15632
Hikage C., et al., 2019, Publ. Astron. Soc. Jap., 71, 43, https://doi.org/10.1093/pasj/psz010
Hildebrandt H., et al., 2018. (arXiv:1812.06076)
Hirata C. M., Seljak U., 2004, Phys. Rev. D, 70, 063526
Howlett C., Lewis A., Hall A., Challinor A., 2012, J. Cosmology Astropart. Phys., 1204, 027
Hoyle B., et al., 2018, Mon. Not. R. Astron. Soc., 478, 592
Ivanov M. M., Simonović M., Zaldarriaga M., 2020, Phys. Rev. D, 101, 083504
Ivezić Ž., et al., 2019, ApJ, 873, 111
Knox L., Millea M., 2019. (arXiv:1908.03663)
Krause E., Eifler T., 2017, Mon. Not. R. Astron. Soc., 470, 2100
Krause E., et al., 2017. (arXiv:1706.09359)
Kullback S., Leibler R. A., 1951, Ann. Math. Statist., 22, 79
Lemos P., Köhlinger F., Handley W., Joachimi B., Whiteway L., Lahav O., 2019. (arXiv:1910.07820)
Lesgourgues J., 2011a, arXiv e-prints, p. arXiv:1104.2932
Lesgourgues J., 2011b, arXiv e-prints, p. arXiv:1104.2934
Lesgourgues J., Tram T., 2011, J. Cosmology Astropart. Phys., 2011, 032
Levi M. E., et al., 2019. (arXiv:1907.10688)
Lewis A., 2013, Phys. Rev., D87, 103529
Lewis A., 2019. (arXiv:1910.13970)
Lewis A., Challinor A., Lasenby A., 2000, ApJ, 538, 473
Lin C.-H., Harnois-Déraps J., Eifler T., Pospisil T., Mandelbaum R., Lee A. B., Singh S., 2019, arXiv e-prints, p. arXiv:1905.03779
Linder E. V., 2003, Phys. Rev. Lett., 90, 091301
Liske J., et al., 2015, Mon. Not. R. Astron. Soc., 452, 2087
Marshall P., Rajguru N., Slosar A., 2006, Phys. Rev., D73, 067302
Masters D. C., Stern D. K., Cohen J. G., Capak P. L., Rhodes J. D., Castander F. J., Paltani S., 2017, ApJ, 841, 111
Neal R. M., 2005. (arXiv:math/0502099)
Nesseris S., Garcia-Bellido J., 2013, JCAP, 1308, 036
Perlmutter S., et al., 1999, Astrophys. J., 517, 565
Planck Collaboration et al., 2018. (arXiv:1807.06205)
Prakash A., et al., 2016, Astrophys. J. Suppl., 224, 34
Raveri M., Hu W., 2019, Phys. Rev., D99, 043506
Riess A. G., et al., 1998, Astron. J., 116, 1009
Riess A. G., Casertano S., Yuan W., Macri L. M., Scolnic D., 2019, Astrophys. J., 876, 85
Rozo E., et al., 2016, Mon. Not. R. Astron. Soc., 461, 1431
Scolnic D. M., et al., 2018, Astrophys. J., 859, 101
Seehars S., Amara A., Refregier A., Paranjape A., Akeret J., 2014, Phys. Rev., D90, 023533
Seehars S., Grandis S., Amara A., Refregier A., 2016, Phys. Rev., D93, 103507
Simpson F., et al., 2012, Monthly Notices of the Royal Astronomical Society, 429, 2249–2263
Skilling J., 2006, Bayesian Anal., 1, 833
Spergel D., et al., 2015, arXiv e-prints, p. arXiv:1503.03757
Takada M., Hu W., 2013, Phys. Rev. D, 87, 123504
Takahashi R., Sato M., Nishimichi T., Taruya A., Oguri M., 2012, Astrophys. J., 761, 152
Tegmark M., Eisenstein D. J., Hu W., Kron R. G., 1998a
Tegmark M., Eisenstein D. J., Hu W., 1998b, in 33rd Rencontres de Moriond: Fundamental Parameters in Cosmology. pp 355–358 (arXiv:astro-ph/9804168)
The LSST Dark Energy Science Collaboration et al., 2018. (arXiv:1809.01669)
Thornton R. J., et al., 2016, ApJS, 227, 21
Torrado J., Lewis A., 2020
Troxel M. A., et al., 2018, Phys. Rev., D98, 043528
Verde L., Treu T., Riess A. G., 2019, in Nature Astronomy 2019. (arXiv:1907.10625), doi:10.1038/s41550-019-0902-0
Zuntz J., et al., 2018, Mon. Not. R. Astron. Soc., 481, 1149