On the measurement of cosmological parameters

Abstract

We have catalogued and analysed cosmological parameter determinations and their error bars published between the years 1990 and 2010. Our study focuses on the number of measurements, their precision and their accuracy. The accuracy of past measurements is gauged by comparison with the WMAP7 results. The 637 measurements in our study are of 12 different parameters and we place the techniques used to carry them out into 12 different categories. We find that the number of published measurements per year in all 12 cases except for the dark energy equation of state parameter w_0 peaked between 1995 and 2004. Of the individual techniques, only BAO measurements were still rising in popularity at the end of the studied time period. The fractional error associated with most measurements has been declining relatively slowly, with several parameters, such as the amplitude of mass fluctuations sigma_8 and the Hubble constant H_0, remaining close to the 10% precision level for a 10-15 year period. The accuracy of recent parameter measurements is generally what would be expected given the quoted error bars, although before the year 2000, the accuracy was significantly worse, consistent with an average underestimate of the error bars by a factor of ~2. When used as a complement to traditional forecasting techniques, our results suggest that future measurements of parameters such as fNL and w_a will have been informed by the gradual improvement in understanding and treatment of systematic errors and are likely to be accurate. However, care must be taken to avoid the effects of confirmation bias, which may be affecting recent measurements of dark energy parameters. For example, of the 28 measurements of Omega_Lambda in our sample published since 2003, only 2 are more than 1 sigma from the WMAP results. Wider use of blind analyses in cosmology could help to avoid this.


Quarterly Physics Review No 1 (2015) pp 1-14


Rupert A.C. Croft *, Matthew Dailey

Abstract

We have catalogued and analysed cosmological parameter determinations and their error bars published between the years 1990 and 2010. Our study focuses on the popularity of measurements, their precision and their accuracy. The accuracy of past measurements is gauged by comparison with the WMAP results of Komatsu et al. (2011). The 637 measurements in our study are of 12 different parameters and we place the techniques used to carry them out into 12 different categories. We find that the popularity of parameter measurements (published measurements per year) in all 12 cases except for the dark energy equation of state parameter w_0 peaked between 1995 and 2004. Of the individual techniques, only Baryon Oscillation measurements were still rising in popularity at the end of the studied time period. The quoted precision (fractional error) of most measurements has been declining relatively slowly, with several parameters, such as the amplitude of mass fluctuations sigma_8 and the Hubble constant H_0, remaining close to the 10% precision level for a 10-15 year period. The accuracy of recent parameter measurements is generally what would be expected given the quoted error bars, although before the year 2000, the accuracy was significantly worse, consistent with an average underestimate of the error bars by a factor of ~2. When used as a complement to traditional forecasting techniques, our results suggest that future measurements of parameters such as fNL and w_a will have been informed by the gradual improvement in understanding and treatment of systematic errors and are likely to be accurate. However, care must be taken to avoid the effects of confirmation bias, which may be affecting recent measurements of dark energy parameters. For example, of the 28 measurements of Omega_Lambda in our sample published since 2003, only 2 are more than 1 sigma from the WMAP results. Wider use of blind analyses in cosmology could help to avoid this.

Keywords

Cosmology: observations

McWilliams Center for Cosmology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Dept. of Physics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
* Corresponding author: [email protected]


1. Introduction

Modern cosmological parameters have been measured since Hubble’s (1929) discovery of the expansion of the Universe. The number of model parameters increased during the late 1980s with the introduction of what is often referred to as the “Standard cosmological model” (e.g. Dodelson 2005). The idea of “Precision cosmology” emerged more recently, and by the present time, many of the parameters in this model are well known (see e.g., Komatsu et al. 2011, hereafter WMAP7). This presents us with an interesting opportunity: by comparing the past measurements of parameters and their error bars with the currently known values, we can evaluate how well the measurements were carried out in the past, how realistic the quoted uncertainties were, and which methods gave the most statistically reliable results. We can also study how both their precision and accuracy have varied with time. Such research will help us in our quest to make critical evaluations of what will be possible in the future, and by working with past data serves as a complement to more conventional future extrapolations of technology and techniques (e.g., the report of the Dark Energy Task Force, hereafter DETF, Albrecht et al. 2006). In the present paper we make a first attempt at such a study, by compiling published parameter values taken from the NASA Astrophysics Data System over the years 1990-2010.

Previous studies of cosmological parameter determinations have tended to focus on the Hubble constant, H_0, for which there is a longer than 80 year baseline for analysis. Several papers have used the comprehensive database compiled by John Huchra to generate their dataset, such as the study of the non-Gaussian error distribution in those measurements (Chen et al. 2003). Gott et al. (2001) used median statistics in a meta-analysis of these H_0 measurements to find the most probable value (and also analysed early measurements of Omega_Lambda).
This median statistics approach has also been used by Chen and Ratra (2003) to combine individual estimates of Omega_m, the present mean mass density in non-relativistic matter. In the present paper we do not seek to combine the measurements from various works into best determinations of parameters. Instead we start from the assumption that the parameters we look at have been well measured (and their correct values are close to the WMAP7 values) and see what this implies about past measurements.

We therefore will be starting with the assumption that LambdaCDM is the correct cosmological model. This should be borne in mind when interpreting our results. Even if the true cosmology turns out in the future to be something else, we expect that the effective values of the LambdaCDM parameters are not likely to be very different (given the good fits to current data), so that our approach will have some value even then. Parameters which at the moment are unknown, or very poorly constrained, such as the non-Gaussianity parameter fNL (e.g., Slosar et al. 2008), or the time derivatives of the dark energy equation of state parameter w (e.g., Chevallier & Polarski 2001), can obviously not be studied at present with our approach. Instead we hope that the general lessons from the past about the reliability of error bars, methods and achievable precision and accuracy can usefully inform future efforts to measure those parameters.

The DETF report explains how four different techniques are being used and will be used in the future to constrain dark energy parameters. These techniques (gravitational lensing, baryon oscillations, galaxy cluster surveys and supernova surveys) all have a history and have been involved in a large number of previous measurements of different parameters. It is interesting to see how they have performed in the past, and evaluate them based on this data.
By looking over the published record, we can also show how measurement precision has changed, in terms of the quoted fractional error bars, and see how this compares with predicted future trends. One can ask whether, for example, the earlier error bars were unrealistically small, so that the quoted precision of measurements has not changed much. This should have consequences for the accuracy of measurements, which we will define and measure. In general, our motivation for this study can be summarized by the idea that once cosmological parameter measurements are published, for the most part they are ignored when future work arrives. The dataset left behind can instead become a valuable resource to inform future work.

Our plan for the paper is as follows. In Section 2, we detail the source for the cosmological parameter estimates and how the data were collated. We explain the different categorizations of measurements and methods and the standardization that was carried out. In Section 3 we outline the steps involved in our analysis of the data, and present results including historical trends in some individual parameters and the precision and accuracy of measurements. In Section 4 we summarize our findings and discuss our results.

2. Data

We have made use of the NASA Astrophysics Data System (http://adsabs.harvard.edu/) to generate our dataset by carrying out an automated search of publication abstracts for the years 1990-2010. We limited the search to published papers which include cosmological parameter values and their error bars in the paper abstract itself. It is of course possible to carry out a more extensive analysis by searching the main text of each paper, and we estimate from a random sampling that approximately 40% of parameter estimates are missed by our abstract-only technique. We make the assumption that this does not bias our sample. The total number of parameter measurements in the 20 year period shown is 637.

The search we use in the ADS abstract query form is a search for the following terms: “sigma8”, “H0”, “Omega”, “Lambda”, “m nu”, “baryon”. We also restrict our search to the following journals: MNRAS, Astrophysical Journal, ApJ Letters, ApJ Supplement, and Physical Review Letters. This parameter search query appears restrictive, but enables results for 12 different parameters to be found, including associated parameters. These 12 are:

1. Omega_M, the ratio of the present matter density to the critical density,
2. Omega_Lambda, the cosmological constant as a fraction of the critical density,
3. H_0, the Hubble constant,
4. sigma_8, the amplitude of mass fluctuations,
5. Omega_b, the baryon density as a fraction of the critical density,
6. n, the primordial spectral index,
7. beta, equivalent to Omega_m^0.6 / b where b is the galaxy bias,
8. m_nu, the neutrino mass,
9. Gamma, equivalent to Omega_m H_0 / (100 km s^-1 Mpc^-1),
10. Omega_m^0.6 sigma_8, a combination that arises in peculiar velocity and lensing measurements,
11. Omega_k, the curvature,
12. w_0, the equation of state parameter for Dark Energy.

The measurements are generally quoted with 1 sigma errors on the parameters but 7% have 2 sigma errors. In this case, in order to have a uniform sample, we halve the 2 sigma error bars.
We have tested the effect of excluding these 7% of measurements on our results (Section 3.3) and find that our conclusions are insensitive to this. Some of the measurements are also quoted with separate systematic and statistical error bars (6% of the sample). In this case we sum the statistical and systematic errors to make a total error bar. We also test the effect of adding them in quadrature, or ignoring the systematic part altogether (see Section 3.4).

Given that our approach is to assume that the WMAP7 results are correct within their quoted errors and that the LambdaCDM model describes the observations well, we use the LambdaCDM model values to convert combinations of published parameters into those listed above. For example, when measurements of Omega_b h^2 are given we convert these into a value for Omega_b using the WMAP7 value of h = . LambdaCDM (e.g. w = -

Table 1. Fiducial values for the cosmological parameters used in this paper. These values are used when computing the accuracy of past measurements. All parameters are taken from the last column of Table 1 in the WMAP7 paper, which are the means of the posterior distribution of the combined WMAP+BAO+H_0 measurements (we have also tried the maximum likelihood parameters, with no difference in our results), except parameters (ii), (viii), (xi) and (xii), for which we have assumed that an exactly flat LambdaCDM model holds with m_nu = ± . eV. The quoted error bars are derived from the WMAP7 error bars, with the exception of parameter (vii), for which an error bar of 0.1 is used to approximately take into account differences in galaxy bias between different samples. We explore the effect of adding these error bars in quadrature to the error bars of past measurements in Section 3.4.

Parameter                     Central value       1 sigma error bar
(i)    Omega_M
(ii)   Omega_Lambda
(iii)  H_0                    km s^-1 Mpc^-1      km s^-1 Mpc^-1
(iv)   sigma_8
(v)    Omega_b
(vi)   n
(vii)  beta
(viii) m_nu
(ix)   Gamma
(x)    Omega_m^0.6 sigma_8
(xi)   Omega_k
(xii)  w_0                    -1.0                0.0

For each published measurement, we also choose a category based on the type of data and method used to extract the cosmological parameter. There are obviously many different possible choices of categorization, with different coarseness. We choose the following 12 categories in order to have a reasonable number of measurements in each (the mean is 53):

1. Cosmic Microwave Background (CMB), specifically measurement of primary anisotropies,
2. Large-Scale Structure (LSS), which includes clustering of galaxies and galaxy clusters (BAO measurements and redshift distortions are considered separately), the Ly-alpha forest, quasar absorption lines, and quasars,
3. Peculiar velocities, which includes measurements of galaxy peculiar velocities inferred from distance measurements and redshifts, and the cosmic dipole,
4. Supernovae, which includes techniques that use supernova distance measurements,
5. Lensing, which includes constraints from the number of strong gravitational lenses, weak lensing shear, and gravitational lens time delay,
6. Big Bang Nucleosynthesis (BBN),
7. Clusters of galaxies, including their abundance and their masses. Includes Sunyaev-Zeldovich measurements,
8. Baryonic Acoustic Oscillation measurements from large-scale structure of galaxies and clusters,
9. The Integrated Sachs Wolfe effect (ISW),
10. z distortions, redshift distortions of clustering,
11. Other, includes Tully Fisher distance estimates, galaxy ages and/or colours, globular cluster distances, internal structure of galaxies, cepheid distances, surface brightness fluctuations, reverberation mapping, radio source size, and Gamma Ray Burst distances,
12. Combined, includes measurements that result from a combination of techniques or past measurements, without the addition of new measurements.

In Figure 1 we show a scatter plot of method vs parameter for our dataset. We can see that the most popular parameter/method combination is Omega_M measured using galaxy clusters, but that in general there is a fairly wide selection of method and parameter, with just over half (76 out of 144) of the combinations covered by at least one published abstract.

Figure 1. Scatter plot of method vs. parameter. We plot as a point each of the 637 published measurements, with the y-axis representing the 12 method bins of Section 2.2 and the x-axis the 12 parameters of Section 2.1. In order to make the points visible we have added a random offset of a fraction of the bin width to each point. The red and black colours are used solely to enhance differentiation between the bins.
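The error bar standardization described in this section (halving quoted 2 sigma errors, and summing statistical and systematic errors, with quadrature addition or dropping the systematic part as alternatives) can be illustrated with a minimal sketch. The function name, its arguments and the choice to rescale the combined error by the quoted confidence level are assumptions made for illustration, not the code used for the analysis.

```python
import math


def standardized_error(stat_err, syst_err=0.0, n_sigma=1, combine="sum"):
    """Return a single 1 sigma error bar for one published measurement.

    Hypothetical helper (not from the paper):
      stat_err, syst_err -- quoted statistical and systematic errors
      n_sigma            -- confidence level of the quoted errors (1 or 2)
      combine            -- "sum" (default treatment in the text),
                            "quadrature" or "stat_only" (the tested variants)
    """
    if combine == "sum":
        total = stat_err + syst_err
    elif combine == "quadrature":
        total = math.hypot(stat_err, syst_err)
    elif combine == "stat_only":
        total = stat_err
    else:
        raise ValueError(f"unknown combine mode: {combine!r}")
    # A quoted 2 sigma error bar is halved to give a 1 sigma error bar.
    return total / n_sigma


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the dataset.
    print(standardized_error(0.08, n_sigma=2))
    print(standardized_error(3.0, 4.0))
    print(standardized_error(3.0, 4.0, combine="quadrature"))
```

Keeping the combination rule as a parameter makes it straightforward to repeat the analysis under each of the three treatments and check that the conclusions do not depend on the choice.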

3. Analysis

Our analysis is in two parts, the first being a study of general trends in the number of parameter measurements and popularity of different methods by year, as well as a look at the measurement value vs year for a subset of parameters (Sections 3.1 and 3.2). In the second part (Sections 3.3 and 3.4), we compute the precision and accuracy of the measurements and see how these have varied with time.
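The counts of measurements per year examined in the first part of the analysis are averaged in bins of 3 years. A minimal sketch of such a binning is given here; the bin edges (starting at 1990), the function name and the example years are assumptions chosen for illustration, since the text specifies only the bin width.

```python
from collections import Counter


def measurements_per_year(years, start=1990, stop=2010, width=3):
    """Average number of measurements per year in bins of `width` years.

    Hypothetical helper: returns {first year of bin: measurements per year}.
    Bin edges starting at `start` are an assumption, not from the paper.
    """
    counts = Counter((y - start) // width for y in years if start <= y <= stop)
    rates = {}
    n_bins = (stop - start) // width + 1
    for b in range(n_bins):
        lo = start + b * width
        hi = min(lo + width - 1, stop)  # last bin may be truncated at `stop`
        rates[lo] = counts.get(b, 0) / (hi - lo + 1)
    return rates


if __name__ == "__main__":
    # Made-up publication years for a single parameter.
    example_years = [1990, 1991, 1991, 2000, 2010]
    for first_year, rate in measurements_per_year(example_years).items():
        print(first_year, round(rate, 2))
```

The same function can be applied to the publication years of a single parameter or of a single method to produce one curve at a time.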

In Figure 2 we show how the number of parameter measurements per year has varied with time. The results are shown averaged in bins of 3 years.

It is immediately noticeable that nearly all of the parameters have a peak in the number of measurements around the years 2000-2003, and then a decline in the post-WMAP1 (Spergel et al. 2003) era. Exceptions to this are measurements of w_0, which are still increasing in number, and constraints on m_nu. Of course this historical trend is largely guaranteed by our selection of the parameter set, which in large part consists of parameters considered to have been well measured already. Other parameters such as fNL, w_a, or the modified gravity parameter E_G (see e.g., Reyes et al. 2011) would still be increasing on such a plot. Figure 2 can also be viewed as a measure of the extent to which parameters are considered to be well measured. For individual parameters such as sigma_8, there are still many measurements published even at the current time, but the decline is still there.

Another way to present the data is shown in Figure 3, where the popularity of different methods with time can be examined. Here it can be seen that “combined” methods are the exception to the general post-WMAP1 decline. In overall number, galaxy clusters have proven the most popular cosmological probe, with a sharp start in the early 1990s. Supernovae and Large-Scale Structure measurements have remained fairly constant since 2000, and the popularity of gravitational lensing per year has not been much different from that of galaxy clusters, except lagging behind by about 8 years. BAO measurements are the only technique still on the rise, reflecting the current and future large-scale structure surveys targeted at BAO (e.g., Eisenstein et al. 2011, Blake et al. 2011, Schlegel et al. 2011).

Figure 2. Number of cosmological parameter measurements published per year, with curves representing the 12 different parameters listed in Section 2.1. Bins of width 3 years were used to compute the curves.

Figure 3. Number of cosmological parameter measurements published per year, with curves representing the 12 different measurement methods listed in Section 2.2. Bins of width 3 years were used to compute the curves.

Figure 4.

Individual published values of the Hubble constant, H_0, as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details).

It is instructive to study the distribution of data points and their error bars as a function of time, and we do this for a subset of parameters in Figures 4 through 9. In each case we show the WMAP7 best fit value for the parameter as a horizontal line. This type of plot is most familiar from the studies of Huchra for the Hubble constant, where the initial values reported by Hubble were over 5 times the currently accepted values.

In Figure 4 we show how Hubble constant determinations have changed over the last 20 years, with the beginning of this time period overlapping with the end of the ~20 year timeframe during which measurements of H_0 were largely divided into two groups, one group closer to 50 km s^-1 Mpc^-1 (e.g., Sandage & Tammann 1975), and one closer to 100 km s^-1 Mpc^-1 (e.g., de Vaucouleurs et al. 1979). These two camps can be seen prior to 1995 in Figure 4, where it is also obvious that their error bars are largely not compatible with each other, or indeed with the eventual currently favoured value of H_0 = 70 km s^-1 Mpc^-1. The HST Key Project (hereafter KP) to measure the extragalactic distance scale published its first results in 1994 (Freedman et al. 1994), and final results in 2001 (Freedman et al. 2001). The main contribution of the project was to extend the Cepheid-based rung of the distance ladder to cosmological distances. Freedman et al. (2001) combined this with other datasets (Type Ia and Type II SN, the galaxy Tully-Fisher relation, surface brightness fluctuations and the galaxy fundamental plane) to yield different measurements which were all consistent with H_0 = 72 ± 8 km s^-1 Mpc^-1, meeting the goal of a ~10% measurement of H_0.

This post-1994 period of activity related to the KP is immediately apparent in Figure 4. It can also be seen that different methods have produced results which were somewhat divergent at first but which eventually became consistent with the final result by the end of the 1994-2001 KP period. An example of this is the determination from Type Ia supernovae, where it can be seen that the green points representing these track steadily upwards from 1993 onwards. A large cluster of gravitational lensing time delay measurements also exhibits a similar trend, and indeed some other measures such as galaxy cluster Sunyaev-Zeldovich measurements are somewhat lower than H_0 = 70 km s^-1 Mpc^-1. This set of lower results largely disappears by 2003, which is when the next sudden tightening of determinations occurs, coincident with the WMAP1 data release. The WMAP1 best fit value of H_0 was 72 ± 5 km s^-1 Mpc^-1, and after that date essentially all measurements are consistent with it. Of course the WMAP1 result was strikingly similar to the KP result even though it involved radically different physics. The evidence of Figure 4 is that the combination of the two sets of measurements was enough to convince most researchers that the measurement goal had been reached. In the future, a measurement of H_0 to even higher accuracy will be needed to make truly accurate constraints on dark energy parameters (see e.g. the DETF report).

There are a few obviously discordant points, for example, Leith et al. (2008) find H_0 = 61.7 (+1.2, -1.1) km s^-1 Mpc^-1 from a combined analysis of several datasets. Their analysis is not in the context of the LambdaCDM model, but in one in which cosmological averaging can be used to understand the acceleration of the Universe (Wiltshire 2007).

Figure 5. Individual published values of the density parameter, Omega_m, as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details). 1 (2) sigma upper and lower limits are shown using single (double) arrows.

In Figure 5 we show the history of measurements of Omega_m, the most frequently measured parameter in our dataset. In this case we can see that before 1999 approximately 1/3 of the measurements were consistent with high values of Omega_m ~ . - .0, and that the most popular technique in this early period involved the use of galaxy peculiar velocities. The error bars were large, although there are several points which are not consistent with the eventual WMAP7 value of Omega_m = . ± . Omega_m end of parameter space. As with the H_0 results, a second significant tightening of published values around the final range took place in the years 2004-2005, shortly after the WMAP1 results.

Figure 6. Individual published values of the amplitude of mass fluctuations, sigma_8, as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details).

The amplitude of mass fluctuations, sigma_8, is examined in Figure 6. In this case we can see that the abundance of galaxy clusters is easily the most popular method used to measure this parameter, and the effort started in earnest around 1995. The cluster measurements of sigma_8 are roughly evenly spread around the WMAP7 value of sigma_8 = . ± .024 until after the WMAP1 release, when low values (below sigma_8 ~ . sigma_8 seemed to favour high values, sigma_8 ~ Omega_b in Figure 7, we can see that the measurements are mainly concentrated in an 8 year period between 1996 and 2004. Over this time span two features can be clearly seen, the first being the steady rise in Omega_b measured using BBN, and the other being the start of CMB measurements around 2000. Because a high Deuterium to Hydrogen ratio (easier to see) implies a low value of Omega_b, this may account for the difficulty encountered in early BBN measurements. Both CMB and BBN were consistent, however, well before the WMAP tightening which occurred around 2003-2004, as with the other parameters.

Figure 7. Individual published values of the baryon density as a fraction of the critical density, Omega_b, as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details). 1 (2) sigma upper and lower limits are shown using single (double) arrows.

Figure 8. Individual published values of the vacuum density parameter Omega_Lambda as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details). 1 (2) sigma upper and lower limits are shown using single (double) arrows.

Figure 9. Individual published values of the dark energy equation of state parameter w_0 as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details). 1 (2) sigma upper and lower limits are shown using single (double) arrows.

In Figure 8 we plot the measurements of Omega_Lambda. In this case, many of the early points are upper limits which were just consistent with the eventually measured value. The first Type Ia supernova results showing acceleration appeared at the end of this era of upper limits. The probably WMAP-related tightening of results around 2003 is especially pronounced in this plot, where one can see the published error bar sizes immediately dropping. It is interesting to note that after 2002, almost all measurements of Omega_Lambda are consistent with the fiducial value from Table 1. Of the most recent 28 measurements shown in Figure 8 (these are those that contribute to the last 2 points in the Omega_Lambda accuracy plot, Figure 12, in Section 3.4), only 2 are more than 1 sigma from the “correct” value.
The sum of chi^2 values, comparing each data point to the Omega_Lambda value from Table 1, is 22.7 for these 28 measurements, which does not sound very small. However, this includes the measurement of Cabre et al. (2006), which is 4.0 sigma from the Table 1 value. Without this outlier, the chi^2 per data point is only 0.26. This could be a signature of overestimation of the error bar size, or perhaps of “confirmation bias”. We will return to this in Section 4.

The final parameter for which we examine the individual measurements is the dark energy equation of state parameter w_0, which we show in Figure 9. In this case there are no measurements or limits before the SN measurements of the acceleration of the Universe in the late 1990s (Perlmutter et al. 1999, Riess et al. 1998). At around the time of WMAP1 the first measurements rather than limits on w_0 started to be published, and since then SN have continued to be the most popular probe of this parameter. A trend more apparent in this more recently measured parameter is the large number of points from “combined” measurements. Although one could argue that w_0 is not at present as well known as some of the other parameters, we have plotted the fiducial value on this graph as w_0 = sigma level.

One of the common themes which has emerged in the past few years is that we are now in the era of “precision cosmology”. It is instructive to study what the data reveal about how we reached this point and what precision is currently achievable for the different parameters and using the different techniques. We quantify the precision of measurements to be the size of the 1 sigma error bar as a percentage of the fiducial (WMAP7) value for each parameter. We have also tried using the error bar size as a percentage of the quoted central value of each measurement, finding no significant difference in our results (except for the case of Omega_Lambda, for which the latter is not a useful way of examining earlier data). In the case of the neutrino mass, m_nu, for which only upper limits are available, we have taken the precision to be the limit in m_nu divided by the value of m_nu required for the closure density (i.e. Omega_m = m_nu = h eV. For Omega_k we divide the measurement error by 1 . Omega_k = m_nu, we have not included any upper or lower limits on parameters in this plot, only published measurements of values with error bars.

It is apparent from the general appearance of Figure 10 that the precision of most measurements has not increased very steeply. The log scale of the y-axis is partly responsible for this impression, but even so, of the 12 parameters shown, 6 have a mean precision in the latest bin which is compatible (within 1 sigma) with that in the earliest bin. It is possible that this situation has arisen because of greater understanding of the role of possible systematic errors as time has gone on. The value of sigma_8 is now known to better than 10% for an average measurement, for example, after a long period in which the precision did not improve. Currently the most precisely known parameters are the curvature Omega_k and the primordial spectral index n, which are both known to about 1%. A large group of parameters are currently known to about 10% precision, from Omega_M (17%), through Omega_b, Omega_Lambda, sigma_8 and H_0 (7%).

Figure 10. The quoted precision of measurements as a function of year for our different cosmological parameters. The precision is defined to be the size of the 1 sigma error bar as a percentage of the fiducial parameter value in Table 1. Error bars are Poisson errors computed from the number of measurements in each bin.

In Figure 11, we show the precision of measurements as a function of the technique used. As many of the techniques are used to measure several different parameters, it is worth bearing in mind that decreases in precision with time could be related to the switch to a less well measured parameter.
We can see that this may indeed be happening insome cases, or else that again systematic errors are being con-fronted more as time goes in. We can differentiate betweenthese possibilities by considering the averaged accuracy ofmeasurements, which we do in the next Section. For now,we can see that lensing, redshift distortion and peculiar veloc-ity measurements have exhibited no improvement in quotedprecision with time. The CMB on the other has improvedby about an order of magnitude over the 20 year period, andcluster measurements by about a factor of 3. Supernova mea-surements are also more precise now than they were in thelate 1990s by a factor of 2. Our assumption that the “correct” values of the different cos-mological parameter values are available allows us to com-pute a potentially powerful statistic, the accuracy of measure-ments. We deﬁne this to be the absolute value of the differ-ence between a measured value of a parameter and our ﬁd-

Figure 11.

The quoted precision of measurements as afunction of year for the different measurement techniques.The precision is deﬁned to be the size of the 1 s error bar asa percentage of the ﬁducial parameter value in Table 1. Eachpanel therefore includes measurements made of manydifferent parameters. Error bars are Poisson errors computedfrom the number of measurements in each bin.cuial value for that parameter (as listed in Table 1), dividedby the quoted 1 s error bar for that measurement. The accu-racy can therefore be written as N s , the average number ofstandard deviations measurements are from the correct value.We note that for a Normal distribution of errors, the averagevalue of N s =

1. Values smaller than 1 indicate that the er-ror bars have been overestimated, and for values larger than1 the error bars have been underestimated. Alternatively, val-ues smaller than 1 may also indicate evidence for “conﬁrma-tion bias”, in which values closer to the expected ones arefavoured (not necessarily consciously). We have chosen touse N s as our statistic rather than the c as it is more ro-bust to outliers (not being dependent on the square of thedifference between a measurement and the true value). Qual-itatively similar conclusions would result if we did use the c of measurements with respect to the “known” values as ameasure of accuracy, however.When computing the accuracy, one must decide how theuncertainty on the true values of the parameters affects theresults. Two choices which approximately bracket the rangeof potential effects are to either add the 1 s error bars on thevalues in Table 1 in quadrature to the error bars on each pub-lished measurement, or else to assume no additional uncer-tainty beyond the quoted error bar for each published mea-surement. We have tried both, ﬁnding almost imperceptiblequantitative differences which do not affect any of our conclu-sions. This can be understood from the fact that the error barsin Table 1 are much smaller than those on past measurements. n the measurement of cosmological parameters — 9/14Figure 12. The accuracy of measurements as a function ofyear for the different parameters. The accuracy is deﬁned tobe the difference between the quoted measurement and thethe ﬁducial parameter value in Table 1 in units of the quotedmeasurement 1 s error bar. 
Error bars are Poisson errors computed from the number of measurements in each bin. The dashed line is the expectation for Gaussian statistics, N_σ = 1.

[…] Γ = Ωh.

Turning to individual parameters in Figure 12, we can see that the accuracy of measurements of H_0 and σ_8 has improved over the last 20 years, so that the most recent measurements appear to have realistic error bars. The error bars on Ω_Λ appear to be overestimated, as do the most recent error bars on Ω_M. From this it would appear that significant work successfully understanding the overall levels of measurement uncertainty has been carried out for H_0 and σ_8, but that this has not happened for some of the other parameters. We return to this topic in Section 4.2.

If the varying accuracy is more tightly related to the choice of technique than parameter, then we can expect the plot of accuracy for different techniques (Figure 13) to be more in[…]

Figure 13. The accuracy of measurements as a function of year for the different measurement techniques. The accuracy is defined to be the difference between the quoted measurement and the fiducial parameter value in Table 1 in units of the quoted measurement 1σ error bar. Error bars are Poisson errors computed from the number of measurements in each bin. The dashed line is the expectation for Gaussian statistics, N_σ = 1.

[…] with N_σ = […] ± […]07 and 0.[…] ± […]12 for the most recent two points. This is an improvement, indicating that the SN systematic error bars may well be too conservative. It is still an underestimate, but now of similar magnitude to the differences seen between the accuracy = 1 line and some data points on the “other”, “combined” and “LSS” panels. If we allow for the possibility that the Poisson error bars on our data points in Figure 13 are underestimates, and that there may be correlations between measurements in different years, then this may go some way to reconciling the measurements and their hoped-for accuracy. We return to this point in our discussion below (Section 4.2).

Figure 14. The accuracy of measurements as a function of year. The accuracy is defined to be the difference between the quoted measurement and the fiducial parameter value in Table 1 in units of the quoted measurement 1σ error bar. Error bars are Poisson errors computed from the number of measurements in each bin. We show separately the accuracy for measurements from the two journals with the most published measurements.

One question which is not easy to answer from the multi-panel Figures 12 and 13 is how the overall accuracy of measurements is changing by year. Are cosmological measurements improving as both theoretical knowledge and expertise in dealing with experimental uncertainties improve? We can see that this does appear to be the case by considering Figure 14, which plots accuracy by year for results published in the two main journals, MNRAS and ApJ (including ApJL and ApJS). These account for 35% and 55% of all results in our compilation, respectively. The results before the year ∼[…] N_σ = […] N_σ (inaccurate) and high citations than other corners of the plot. This leads to a Pearson correlation coefficient of r = −[…] […]tions and accuracy, in that papers with higher accuracy (lower N_σ) have more citations. A set of points with no correlation would give such a result 11% of the time, so the evidence for this is marginal, however.

Figure 15. The accuracy of measurements published in a paper as a function of the number of citations to it. The accuracy is defined to be the difference between the quoted measurement and the fiducial parameter value in Table 1 in units of the quoted measurement 1σ error bar.

We note that there does exist a significant correlation between the precision of measurements and the number of citations (not plotted). We find a correlation coefficient of r = −0.134 (smaller fractional error results are more cited) and probability p = […] r = −0.748, p = […] × 10^−[…]), which is just due to the overall trend in improving measurements, and a correlation between year of publication and citations (r = […] p = […] N_σ = […] N_σ values in Figure 16 along with the Gaussian curve. The data is fairly similar to the Gaussian curve for the low end of the N_σ range where the majority of the data resides, showing that in general error bars are only slightly underestimated (we have seen this already in Figure 14, for example). There is however a long tail extending to high N_σ values, with some measurements being 8 or even 10σ away from their fiducial values. Of course with a Gaussian distribution the chance of such events occurring would be minuscule. We can quantify this further by computing the fraction of measurements which are greater than 2σ away from the correct value. We find that 19% of measurements are like this, rather than the 5% expected for a Gaussian distribution.

Figure 16. The distribution of measurement errors in units of the quoted standard deviation. For each measurement we divide the difference between the quoted value and the fiducial value in Table 1 by the quoted 1σ error bar. The results are shown as a histogram. We also show the expected curve for a Gaussian distribution of errors (smooth line).
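The accuracy statistic and the 2σ tail fraction discussed in this section can be illustrated with a short Python sketch. It computes N_σ for a handful of measurements and compares the fraction lying beyond 2σ with the Gaussian expectation. The measurement values, the fiducial value of 70 and the function names are hypothetical choices made only for this example; they are not taken from the compilation or from Table 1.

```python
import math


def n_sigma(value, error, fiducial):
    """Distance of a measurement from the fiducial value, in units of its quoted 1-sigma error."""
    return abs(value - fiducial) / error


def gaussian_tail_fraction(k):
    """Two-sided probability that a Gaussian variable lies more than k sigma from its mean."""
    return math.erfc(k / math.sqrt(2.0))


def observed_tail_fraction(n_sigmas, k):
    """Fraction of measurements lying more than k quoted sigma from the fiducial value."""
    return sum(1 for n in n_sigmas if n > k) / len(n_sigmas)


# Hypothetical (value, quoted 1-sigma error) pairs; the fiducial value is also an assumption.
fiducial = 70.0
measurements = [(72.0, 8.0), (55.0, 5.0), (68.0, 4.0), (85.0, 5.0), (71.0, 2.0)]

ns = [n_sigma(value, error, fiducial) for value, error in measurements]
print("N_sigma per measurement:", ns)
print("observed fraction beyond 2 sigma:", observed_tail_fraction(ns, 2.0))
print("Gaussian expectation beyond 2 sigma:", gaussian_tail_fraction(2.0))
```

The Gaussian expectation beyond 2σ evaluates to roughly 5%, the reference value against which the 19% found in the compilation is compared.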

4. Summary and Discussion

We have compiled cosmological parameter measurements published between 1990 and 2010 and the techniques used to measure them. Using this data we have carried out an analysis of historical trends in popularity, precision and accuracy. The accuracy of past measurements has been estimated by assuming that WMAP7 parameter values of Komatsu et al. (2011) (combined with ΛCDM standard values for e.g. w_0) are the correct ones. Our findings can be summarised as follows:

(1) The number of published measurements for different parameters peaks between 1995 and 2004 for all cases, except for w_0 for which the number was still rising in 2010.
(2) Of all techniques used to measure the parameters, only baryon oscillation and “combined” measurements were still rising in terms of publications per year by 2010.
(3) The quoted precision of measurements has been declining relatively slowly for most parameters, with several (e.g. σ_8, H_0) remaining flat for 10-15 years.
(4) The accuracy of recent parameter measurements is generally what should be expected based on the quoted error bars, i.e. the error bars overall are neither underestimated nor overestimated (an accuracy, N_σ = 1.0, within the Poisson uncertainty on the measurement). Before 2000, the accuracy N_σ was closer to 2, indicating underestimation of the error bars by a factor of 2. Overall, there is a small non-Gaussian tail to the error distributions (we find that 20% of measurements are more than 2σ away from the true values).
(5) The accuracy of most methods has become consistent with N_σ = 1.0, with the historically most inaccurate parameter measurement technique being the use of galaxy peculiar velocities. Measurements of Ω_M and particularly Ω_Λ made since 2000 tend to have accuracy N_σ significantly less than 1.0, indicating “confirmation bias” and/or an overestimation of error bar sizes.

Over the 20 year period covered in this study, it is apparent that many of the parameters in what is now the concordance CDM cosmological model went from the status of no information or only limits to being known at the 10% level or better. It is also apparent from Figure 2 that there was a “golden age” of parameter measurements between ∼[…] ΛCDM parameters were known by the time of the first WMAP results is sufficient, and many of the reasons for pinning down the model better had diminished after that.

This said, however, the exception to this rule, measurements of w_0 (which are still rising in terms of number per year at the end of our study), seems to point to a coming new era in parameter measurement. Certainly, the motivation for the large number of ongoing and future large-scale structure, lensing and other surveys is to hunt for the signatures of dynamical dark energy and modified gravity, and given the number of researchers carrying out these studies it is likely that measurements will continue to rise. Many parameters which we have not catalogued are now within reach of quantitative study. These include the modified gravity parameter, E_G (Reyes et al. 2010) and the time derivative of the equation of state parameter, w_a. Measurements of such parameters involve searching for deviations from the concordance CDM model and fall into a different category from most of the parameters we have studied in this paper. Inflationary parameters such as the non-Gaussianity fNL, or tensor to scalar ratio r will be pinned down with higher precision in the future, and these should also represent a growth area.
The motivation for most future measurements being largely framed in terms of a quest for fundamental physics, it would be logical to assume that they will continue until the cause of the Universe’s acceleration is better understood. Likewise, parameters describing the dark matter particle should be added to this category.

Possible behaviours for the precision of future parameter measurements can be predicted by looking at the past results (Figure 10). There is a very wide range, but most parameters improve slowly, with a factor of 10 improvement in precision over the 20 years representing the extreme (2 out of 12 parameters). The precision of some parameters has remained relatively flat for the whole period, so this is a possibility for future, so far unconstrained, parameters. An argument against this slow progress, however, is the fact that many new surveys (such as of Baryon Oscillations) are targeted primarily at measures of specific parameters, and this aggressive approach (for example including specific precisions to be obtained at a given time in survey proposal documents) could lead to faster progress.

Our investigation of the accuracy of results could potentially lead to some of the most interesting findings. We have seen that in the earlier half of our studied time period there is evidence that the error bars were significantly underestimated, but that this has changed over time.

When discussing the accuracy, we should be aware that it was not possible in our analysis to take into account several factors which have the potential to affect our conclusions. For example, we do not keep track of the priors that people have assumed in their measurements, and in many of the later cases, this may include the WMAP results as priors. That this is happening is likely to be responsible for much of the post WMAP1 tightening of constraints seen in Figures 4-9.
When computing the error bars on the mean accuracies of measurements (Figs 12 and 13) we have used Poisson errors based on the number of measurements in each bin. This will tend to underestimate the uncertainty on the accuracy because some of those measurements could be using the same underlying data, or be using similar priors, or a combination of the two. There will therefore be correlations between the error bars so that our estimates of the accuracy will be affected. Equivalently, the chi-square of the fiducial result compared to the data points will be incorrectly determined to be low because the correlations are not included.

Bearing the above points in mind, we return to the panels in Figures 12 and 13 where the accuracy seems to be significantly below N_σ = 1. This is most obvious in the second panel (Ω_Λ) of Figure 12. Such a result could be a sign that either the error bars have been significantly overestimated, or else that researchers have been influenced by prior results (“confirmation bias”), or a combination of the two. If we return to the data points which led to the last two bins of panel two of Figure 12, we find something especially striking. Of those 28 measurements, only 2 are more than 1σ from the fiducial results of Table 1. These 28 measurements were carried out by approximately 11 separate groups (as determined by authorship lists) using several different techniques.

This closeness of published results to the “correct” ones is somewhat worrying for future measurements. One can interpret this as coming partly from error bars being overestimated by cautious cosmologists, for example by including possible systematic errors in the error bars which are not actually present to such a large degree, or, in a related point, authors marginalizing over parameters which are actually better known than was assumed. We note that including or excluding the actually quoted systematic error bars (Section 3.4) has little effect on this result. An additional question is why some parameters have N_σ < […] σ_8). The relatively low number statistics of our whole dataset preclude us from making any strong statements about this issue. If it does partly result from confirmation bias, one can also wonder how observers knew which value of Ω_Λ (for example) would be the “correct” one, given that our fiducial (mostly WMAP7) results from Table 1 were published in 2011. If this bias is present, it is probably related to the mean level for Ω_Λ resulting from several prior measurements.
For example in Figure 8 and others, the value of the parameter seems to be pretty well determined at least by 2003.

If we look at the techniques which are often associated with dark energy measurements, SN and BAO, we can see in Figure 13 that these two have low N_σ for recent measurements. Of the 23 measurements which were included in the last bins of the SN panel of Figure 13, only 2 are more than 1σ from the fiducial result. We note that this fiducial result from Table 1 does include BAO measurements, but not SN estimates of dark energy. In the case of SN, however, only 4 measurements of Ω_Λ are included in these bins, and only 2 separate groups of researchers, so that for that subset of data, statistical fluctuations may well be responsible for the low N_σ seen. If confirmation bias is present, on the other hand, one could argue about who is confirming whom: certainly the first SN results on dark energy predate those from BAO and from most other techniques. These sorts of questions might be addressed by a more detailed look at the published measurements, including details of priors, jointly used datasets and analysis techniques. Then again, small number statistics probably would not allow firm conclusions to be drawn. These hints should instead serve as a warning that care and perhaps concrete steps be taken to avoid any confirmation bias in the future.

As we have seen from Section 3.1, the number of published measurements per year has already peaked for many parameters. There are however certain techniques, such as BAO, which were still rising in popularity at the end of the study period.
In this subsection, we speculate on the basis of our study what could be the medium term (5-20 year) future of a newer probe of cosmology, the measurement of General Relativistic (GR) effects in large-scale structure, and in particular the gravitational redshift.

In the weak field limit, the gravitational redshift, z_g, of photons with wavelength λ emitted in a gravitational potential φ and observed at infinity is given by $z_g = \frac{\Delta\lambda}{\lambda} \simeq \frac{\Delta\phi}{c^2}$. Measurement of z_g is one of the fundamental tests of GR. First measured more than 50 years ago for the Earth’s gravity in a laboratory setting (Pound & Rebka 1959), subsequent determinations have been made in the solar system (Lopresto et al. 1991) and from spectral line shifts in white dwarf stars (e.g., Greenstein et al. 1971). In cosmology, theoretical predictions and attempts to measure the effect have a long history. Light emitted in dense regions (such as galaxies in clusters or superclusters) should be redshifted with respect to other galaxies in less dense regions. Early studies of the redshift differences between galaxy clusters of different masses (Nottale 1983) found no effect, as did measurements made from brightest cluster galaxies (e.g. Cappi 1995, Kim & Croft 2004).

The first successful measurements of galaxy gravitational redshifts were made by Wojtak et al. (2011), by stacking the redshift profiles of 8000 galaxy clusters from the Sloan Digital Sky Survey. The measurement had a stated precision of 36% (it was a 2.8σ detection). It was followed by a number of other measurements with similar precision (Dominguez Romero et al. 2012, Sadeh et al. 2015, Jimeno et al. 2015), in the fashion that might be expected if the precision vs. time curve follows the slow descent seen in many of the parameters in Figure 10.

At present a cosmological parameter relevant to quantifying the deviations from GR has not been uniformly used in the literature on gravitational redshifts.
The modified gravity parameter E_G (see e.g., Reyes et al. 2010) has emerged as a possible contender, but gravitational redshift measurements are still at the level of quoting a detection significance. The difference in signal amplitude compared to GR for some popular alternative theories of gravity is also approximately 30% (Wojtak et al. 2011).

Theoretical predictions for GR effects have increased in number and in precision alongside the first detections. These range from early work of Cappi (1995), Broadhurst et al. (2000), Kim & Croft (2004) to more recent studies which include a range of other GR galaxy clustering effects of similar magnitude to the gravitational redshift, and which need to be taken into account at the same time (McDonald 2009, Yoo et al. 2012, Zhao et al. 2013, Kaiser 2013, Bonvin et al. 2014).

Based on the results in Figures 10 and 11, the ∼30% precision quoted for published measurements is likely to hold steady for the next 10 years or so. Towards the end of this time period, it is expected that major galaxy redshift surveys such as MS-DESI and Euclid will allow measurements to rapidly reach the precision of a few percent (Croft 2013).
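As a rough numerical illustration of the weak-field relation z_g ≃ Δφ/c² and of the link between the 2.8σ detection significance and the 36% precision quoted above, consider the following Python sketch. The tower height of 22.5 m and g = 9.81 m/s² are illustrative assumptions for a laboratory-scale experiment, the function names are hypothetical, and treating the fractional precision as the inverse of the detection significance is a simplifying assumption.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s


def gravitational_redshift(delta_phi):
    """Weak-field gravitational redshift z_g ~ delta_phi / c^2 (delta_phi in m^2/s^2)."""
    return delta_phi / C_LIGHT**2


def fractional_precision(significance):
    """Fractional 1-sigma error on an amplitude detected at the given number of sigma."""
    return 1.0 / significance


# Assumed laboratory set-up: potential difference g * h over a tower of height h.
g = 9.81   # m/s^2, assumed surface gravity
h = 22.5   # m, assumed tower height
z_lab = gravitational_redshift(g * h)

precision = fractional_precision(2.8)

print(f"laboratory-scale z_g ~ {z_lab:.2e}")
print(f"precision of a 2.8 sigma detection ~ {100 * precision:.0f}%")
```

The second number shows why a 2.8σ detection and a 36% precision are two statements of the same measurement quality.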

In conclusion, we have seen that huge progress has been made in the 20 year period covered by our study. Important questions have been resolved (e.g., is the Universe open?, do massive neutrinos contribute substantially to the dark matter density?), a model has been found which agrees with essentially all observational data so far (ΛCDM), and the parameters of that model have been pinned down at the 1−[…] ΛCDM than statistically likely. These may be explainable by correlations between measurements which we have not included. On the other hand, this may serve as a sign that, as cosmology collaboration sizes increase, carrying out more blind analyses (as in particle physics) may be a good idea.
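The Ω_Λ count discussed in Section 4.2 (28 measurements, of which only 2 lie more than 1σ from the fiducial value) can be put in context with a simple binomial estimate. The Python sketch below assumes independent measurements with Gaussian errors, which is an idealization: as noted above, correlations between measurements are not included and would change the result. The function names are hypothetical.

```python
import math


def prob_beyond(k_sigma):
    """Two-sided Gaussian probability of lying more than k_sigma from the true value."""
    return math.erfc(k_sigma / math.sqrt(2.0))


def prob_at_most(m, n, p):
    """Binomial probability of at most m 'successes' in n independent trials."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(m + 1))


n_measurements = 28  # Omega_Lambda measurements in the last two bins
n_outliers = 2       # of which this many are more than 1 sigma from the fiducial value

p_out = prob_beyond(1.0)
expected_outliers = n_measurements * p_out
p_two_or_fewer = prob_at_most(n_outliers, n_measurements, p_out)

print(f"expected number beyond 1 sigma: {expected_outliers:.1f}")
print(f"probability of {n_outliers} or fewer: {p_two_or_fewer:.1e}")
```

Under these idealized assumptions roughly 9 of the 28 measurements would be expected to lie beyond 1σ, so finding only 2 is improbable unless the error bars are overestimated, the measurements are correlated, or confirmation bias is present.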

Acknowledgments

We acknowledge partial support from NSF award AST 1412966. RACC acknowledges useful discussions with Bill Holzapfel, Saul Perlmutter, Jeff Peterson, Anze Slosar and Martin White. This research has made use of NASA’s Astrophysics Data System Bibliographic Services.

References

[1] Albrecht, A., et al., 2006, Report of the Dark Energy Task Force, eprint arXiv:astro-ph/0609591
[2] Blake, C., et al., 2011, MNRAS, in press, arXiv:1108.2635
[3] Bonvin, C., Hui, L., & Gaztanaga, E., 2014, Phys. Rev. D, 89, 3535
[4] Broadhurst, T. & Scannapieco, E., 2000, ApJL, 533, 93
Cabre, A., Gaztañaga, R., Manera, M., Fosalba, P., & Castander, F., 2006, MNRAS, 372, 23
[5] Cappi, A., 1995, A&A, 301, 6
[6] Chen, G., Gott, J. R., III, & Ratra, B., 2003, PASP, 115, 1269
[7] Chen, G. & Ratra, B., 2003, PASP, 115, 1143
[8] Chevallier, M., & Polarski, D., 2001, IJMPD, 10, 213
[9] Croft, R.A.C., 2013, 434, 3008
[10] de Vaucouleurs, G., 1979, ApJ, 227, 729
[11] Dodelson, S., 2003, “Modern Cosmology”, Academic Press
[12] Dominguez Romero, M., Garcia Lambas, D., & Muriel, H., 2012, MNRAS, 427, L6
[13] Eisenstein, D.J., et al., 2011, Astron. J., 142, 72
[14] Freedman, W.L., Hughes, S. M., Madore, B. F., Mould, J. R., Lee, M. G., Stetson, P., Kennicutt, R. C., Turner, A., Ferrarese, L., Ford, H., Graham, J. A., Hill, R., Hoessel, J. G., Huchra, J., & Illingworth, G. D., 1994, ApJ, 427, 628
[15] Freedman, W.L., et al., 2001, ApJ, 553, 47
[16] Gott, J.R., Vogeley, M.S., Podariu, S., & Ratra, B., 2001, ApJ, 549, 1
[17] Greenstein, J. L., Oke, J. B., & Shipman, H. L., 1971, 169, 563
[18] Hu, W., & Dodelson, S., 2002, ARAA, 40, 171
[19] Hubble, E., 1929, PNAS, 15, 168
[20] Jain, B., & Zhang, P., PRD, 79, 3503
[21] Jimeno, P., Broadhurst, T., Coupon, J., Umetsu, K., & Lazkoz, R., 2015, MNRAS, 448, 1999
[22] Kaiser, N., 2013, MNRAS, 435, 1278
[23] Kim, Y.-R. & Croft, R.A.C., 2004, ApJ, 607, 164
[24] Leith, B. M., Ng, S. C. C., & Wiltshire, D. L., 2008, ApJL, 672, 91
[25] Linder, E.V., 2003, Phys. Rev. Lett., 90, 91301
[26] Lopresto, J. C., Schrader, C., & Pierce, A. K., 1991, 376, 757
[27] McDonald, P., 2009, JCAP, 11, 026
[28] Nottale, L., 1983, A&A, 118, 85
[29] Perlmutter, S., Aldering, G., Goldhaber, G., Knop, R. A., Nugent, P., Castro, P. G., Deustua, S., Fabbro, S., Goobar, A., Groom, D. E., Hook, I. M., Kim, A. G., Kim, M. Y., Lee, J. C., Nunes, N. J., Pain, R., Pennypacker, C. R., Quimby, R., Lidman, C., Ellis, R. S., Irwin, M., McMahon, R. G., Ruiz-Lapuente, P., Walton, N., Schaefer, B., Boyle, B. J., Filippenko, A. V., Matheson, T., Fruchter, A. S., Panagia, N., Newberg, H. J. M., & Couch, W. J., 1999, ApJ, 517, 565
[30] Pound, R.V., & Rebka, G.A., 1959, PRL, 3, 439
[31] Reyes, R., Mandelbaum, R., Seljak, U., Baldauf, T., Gunn, J. E., Lombriser, L., & Smith, R. E., 2010, Nature, 464, 256
[32] Riess, A. G., Filippenko, A. V., Challis, P., Clocchiatti, A., Diercks, A., Garnavich, P. M., Gilliland, R. L., Hogan, C. J., Jha, S., Kirshner, R. P., Leibundgut, B., Phillips, M. M., Reiss, D., Schmidt, B. P., Schommer, R. A., Smith, R. C., Spyromilio, J., Stubbs, C., Suntzeff, N. B., & Tonry, J., 1998, AJ, 116, 1009
[33] Sadeh, I., Feng, L.L., & Lahav, O., 2015, PRL, 114, 1103
[34] Sandage, A., & Tammann, G.A., 1975, ApJ, 196, 313
[35] Schlegel, D., et al., 2011, NOAO proposal, eprint arXiv:1106.1706
[36] Slosar, A., Hirata, C., Seljak, U., Ho, S., & Padmanabhan, N., 2008, JCAP, 08, 31
[37] Spergel, D. N., Verde, L., Peiris, H. V., Komatsu, E., Nolta, M. R., Bennett, C. L., Halpern, M., Hinshaw, G., Jarosik, N., Kogut, A., Limon, M., Meyer, S. S., Page, L., Tucker, G. S., Weiland, J. L., Wollack, E., & Wright, E. L., 2003, ApJS, 148, 175
[38] Wiltshire, D., NJP, 9, 377
[39] Wojtak, R., Hansen, S.H., & Hjorth, J., 2011, Nature, 477, 567
[40] Yoo, J., Hamaus, N., Seljak, U., & Zaldarriaga, M., 2012, Phys. Rev. D, 86, 3514
[41]