Eur. Phys. J. C manuscript No. (will be inserted by the editor)

Alternative to the application of PDG scale factors
Jens Erler a,1,2, Rodolfo Ferro-Hernández b,1

1 Departamento de Física Teórica, Instituto de Física, Universidad Nacional Autónoma de México, 04510 CDMX, México
2 PRISMA+ Cluster of Excellence and Helmholtz Institute Mainz, Johannes Gutenberg-Universität, 55099 Mainz, Germany
Abstract
The Particle Data Group recommends a set of procedures to be applied when discrepant data are to be combined. We introduce an alternative method based on a more general and solid statistical framework, providing a robust way to include possible unknown systematic effects interfering with experimental measurements or their theoretical interpretation. The limit of large data sets and practical cases of interest are discussed in detail.
Keywords
Particle Data Group · Bayesian Data Analysis · Hierarchical Models · Parameter Estimation
In any field of science, it is often the case that a number of data points or data sets need to be combined in order to achieve a greater overall precision. Now, data naturally fluctuate, and it is not uncommon that one or several data points may appear discrepant or outlying with respect to the bulk of the data. This is not necessarily a concern, e.g., if the results of the individual measurements or observations are known to be dominated by the statistical uncertainty, or even in the presence of significant systematic effects, as long as their associated uncertainties can be reliably estimated. On the other hand, if the observed discrepancies are suspiciously large or plentiful, one may worry that some unknown systematic effect or unjustified but hidden assumption might have moved the central value of one or more observations. In that latter case, a more conservative handling of the data and their combination would be called for.

a erler@fisica.unam.mx
b [email protected]

Of course, it is impossible to know independently which of the aforementioned situations one is facing (larger than expected random fluctuations, unknown systematic effects, or both), or which of the individual data (sub)sets could be at fault. As a remedy, the Particle Data Group (PDG) [1] proposed a set of rules according to which the uncertainty of an average is to be enlarged by a scale factor S, while the central values are to remain unchanged by fiat. Assuming Gaussian errors, in a first step the reduced χ² is computed as twice the negative log-likelihood at its minimum divided by N_eff, where N_eff is the effective number of degrees of freedom, given by the number of observations (data points), N, minus the number of independent fit parameters. Thus, for the most common case of a simple average of one parameter, N_eff = N − 1.

1. If the reduced χ² is smaller than unity, the results are accepted and there is no scaling of errors.
2.
If the reduced χ² is larger than unity, and the experiments are of comparable precision, then all errors are re-scaled by a common factor S, given by the square root of the reduced χ², i.e., S = √(χ²/N_eff).
3. If some of the individual errors are much smaller than others, then S is computed from only the most precise experiments. The criterion for these is given with reference to an ad hoc cutoff value.

Given that the rationale for a procedure such as this one is to err on the conservative side, one immediate objection is that if there is only one data point, then no conservative scaling will be applied, even though in this case one is most exposed to a potential problem, as there is no control measurement.

The PDG collects, evaluates, averages and fits particle physics data world-wide and assesses their implications and interpretations in a large number of dedicated reviews.

Another problem is that the set of individual data points is not well-defined. In principle, one may combine certain data subsets first, such as from different data taking periods or different decay channels obtained by the same experimental apparatus, or combine identical channels obtained by different detectors, and average these in a second step. Conversely, one could split up the available results into more but less precise individual entries. While this has no impact on ordinary maximum likelihood analyses, it will generally dilute or enlarge the reduced χ² value on which the S factors are based. In fact, if one applies PDG scale factors to data points of which some have already undergone the scale factor treatment (typically, by the experimental collaboration), then this kind of iteration does generally change the central value of the combination.
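For concreteness, rules 1 and 2 of the PDG procedure can be sketched in a few lines of Python. This is our own illustrative implementation, not PDG code; the cutoff handling of rule 3 is omitted for brevity:

```python
import math

def pdg_average(values, errors):
    """Weighted average with the PDG scale factor applied (rules 1-2).

    Returns (mean, error, S) for uncorrelated Gaussian inputs.
    """
    weights = [1.0 / e**2 for e in errors]
    mean = sum(w * y for w, y in zip(weights, values)) / sum(weights)
    err = 1.0 / math.sqrt(sum(weights))
    n_eff = len(values) - 1                 # one fitted parameter (the average)
    chi2 = sum((y - mean)**2 / e**2 for y, e in zip(values, errors))
    S = math.sqrt(chi2 / n_eff) if n_eff > 0 else 1.0
    if S > 1.0:                             # rule 1: never shrink the error
        err *= S                            # rule 2: enlarge it by S
    return mean, err, S
```

Note that for a single data point (n_eff = 0) no scaling is applied, which is exactly the objection raised above.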
Also note that the prescription according to which reduced χ² values greater and smaller than unity are being treated differently generates an unnecessary dichotomy.

In this paper we present an alternative which shares some of the features of the PDG recommendation while improving on others. The framework is a hierarchical model within Bayesian parameter inference [2]. The basic idea is that individual data points are not considered independently and identically distributed (iid), but rather independently and similarly distributed, in the sense that the parent distributions are permitted to vary to some extent to allow for unknown effects that may or may not be different from one data point (measurement) to another. Thus, we propose a hierarchical model where each measurement is assumed to determine a different parameter, each considered as having arisen as a random draw from a common parent distribution, described in turn in terms of hyper-parameters.

A similar approach is widely used in the biological sciences when estimating treatment effects by combining several studies performed under similar but not identical conditions [3,4], in what is often referred to as meta-analysis [5,6,7]. In these cases the experimental conditions can vary slightly, so that the individual studies may be affected by different unknown biases. Several authors within the physics community introduced attempts to incorporate the effects of unknown error sources when combining data. For example, Ref. [8] finds results similar to the ones in our work, but within a frequentist approach. Ref. [9] models the probability of underestimating the experimental error by including a different scale factor for each measurement, which is in turn randomly drawn from a prior distribution. Very recently it was shown [10] that it is even possible to test the shape of the prior distribution, and not just to constrain the values of its parameters. We leave this kind of more complete analysis for the future.
In the next section we summarize the formalism of Bayesian hierarchical modeling using the notation of Ref. [2]. The rest of the paper introduces our approach, illustrated by a number of examples and reference cases.

Assume that we want to determine a parameter θ from an experimental measurement or observation, and to be specific, that the likelihood for the outcome y of such an experiment can be described as a Gaussian with central value θ and standard deviation σ,

p(y | θ, σ) = N(y | θ, σ), (1)

where,

N(y | θ, σ) ≡ (1/(√(2π) σ)) e^{−(y−θ)²/(2σ²)}. (2)

The posterior distribution for the parameter θ can be obtained through Bayes' theorem,

p(θ | y, σ) ∝ p(y | θ, σ) p(θ), (3)

where p(θ) is the prior probability distribution of θ. It is very convenient to assume p(θ) to be a conjugate prior, which means that the posterior distribution will fall within the same family of functions as the prior. Thus, in our case we adopt the prior,

θ ∼ N(μ̃, τ̃), (4)

yielding the posterior,

p(θ | y, σ, μ̃, τ̃) = (1/(√(2π) σ_τ̃)) e^{−(θ−θ_τ̃)²/(2σ_τ̃²)}, (5)

where,

1/σ_τ̃² ≡ 1/σ² + 1/τ̃² (6)

is the sum of precisions of the prior and the experimental result, while

θ_τ̃ ≡ (1/σ² + 1/τ̃²)⁻¹ (y/σ² + μ̃/τ̃²) (7)

is the precision averaged central value. Clearly, if the experiment has a small error, σ ≪ τ̃, it will dominate θ_τ̃. In the limit τ̃ → ∞, the prior is called non-informative.

Now, let us include further such experiments with central values y_i and total errors σ_i, all measuring the same quantity θ, as illustrated in Fig. 1. For simplicity, we assume that the σ_i are mutually uncorrelated.
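As a numerical illustration of the update rules in Eqs. (6) and (7), consider the following minimal sketch (function name and numbers are our own, purely for illustration):

```python
import math

def gaussian_update(y, sigma, mu_prior, tau_prior):
    """Posterior (mean, std) of theta given one Gaussian measurement
    y +- sigma and a conjugate Gaussian prior N(mu_prior, tau_prior):
    precisions add (Eq. 6), central values are precision-weighted (Eq. 7)."""
    precision = 1.0 / sigma**2 + 1.0 / tau_prior**2
    mean = (y / sigma**2 + mu_prior / tau_prior**2) / precision
    return mean, 1.0 / math.sqrt(precision)

# With a nearly non-informative prior (tau_prior very large), the
# posterior simply reproduces the measurement itself:
theta_mean, theta_err = gaussian_update(10.0, 0.5, 0.0, 1.0e6)
```

Feeding the posterior of one measurement back in as the prior for the next reproduces the combined result of Eqs. (8) and (9) for uncorrelated data, which is the Bayesian version of the ordinary weighted average.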
Fig. 1 Ordinary averaging. We assume that the y_i are random outcomes of measurements of the same parameter θ.

The posterior distribution p(θ | y_i, σ_i, μ̃, τ̃) is again given by Eq. (5), but now with

1/σ_τ̃² = Σ_{i=1}^N 1/σ_i² + 1/τ̃², (8)

and

θ_τ̃ = σ_τ̃² (Σ_{i=1}^N y_i/σ_i² + μ̃/τ̃²). (9)

Obviously, the uncertainty σ_τ̃ in θ decreases strictly monotonically with the inclusion of more experiments. Nevertheless, if one or several of the experiments were subject to a number of systematic effects that were neither corrected for, nor accounted for in the individual uncertainties σ_i, then the experiments are (effectively) not measuring the same quantity, and σ_τ̃ would be underestimated. In other words, each experiment can be viewed as measuring a different parameter θ_i. These are, however, not entirely independent of each other, since after all, the experiments were supposed to constrain the same θ. We will now review hierarchical Bayesian modeling, and propose it as a systematic method to interpolate between the extreme and rarely realistic cases of all θ_i being either equal or else entirely independent of each other.

2.2 The hierarchical model

This is achieved by considering each θ_i to be the result of a random draw from a parent distribution,

p(θ_i) = ∫ p(θ_i | μ, τ) p(μ, τ) dμ dτ, (10)

where p(μ, τ) is the hyper-prior distribution for what are now called the hyper-parameters μ and τ. We sketch this model in Fig. 2. Note that Eq. (10) implies the property of exchangeability between the θ_i, i.e., symmetry under θ_i ↔ θ_j. From Bayes' theorem one has,

p(θ_i, μ, τ | y_i, σ_i) ∝ p(y_i | θ_i, σ_i) p(θ_i | μ, τ) p(μ, τ), (11)

Fig. 2
Hierarchical model. Each experimental parameter θ_i arises from a random draw from a parent distribution with hyper-parameters μ and τ, and each experimental central value y_i is then considered to be the result of a random draw from a Gaussian distribution with central value θ_i and error σ_i.

and explicitly in the Gaussian case,

p(θ_i, μ, τ | y_i, σ_i) ∝ ∏_{i=1}^N N(y_i | θ_i, σ_i) N(θ_i | μ, τ) p(μ, τ). (12)

Marginalizing over the θ_i one finds the "master" equation,

p(μ, τ | y_i, σ_i) ∝ ∏_{i=1}^N N(μ | y_i, √(σ_i² + τ²)) p(μ, τ). (13)

We will use it to compute the posterior distribution of the hyper-parameters, once a hyper-prior is chosen. For example, assuming a flat prior for μ and τ, we can integrate over μ to find,

p(τ | y_i) ∝ (Σ_{i=1}^N 1/(σ_i² + τ²))^{−1/2} ∏_{i=1}^N N(μ̂ | y_i, √(σ_i² + τ²)), (14)

where,

μ̂ = (Σ_{i=1}^N 1/(σ_i² + τ²))⁻¹ Σ_{i=1}^N y_i/(σ_i² + τ²). (15)

The parameter τ quantifies general differences in the θ_i. If τ = 0, the experiments measure the same parameter, i.e., θ_i = θ_j. For τ → ∞, each one measures a completely independent parameter θ_i.

From the master equation one can see that the parameter of interest is μ. If τ = 0, the posterior distribution for μ reduces to the ordinary likelihood for parameter estimation given in Eq. (5) with τ̃ → ∞. The full posterior distribution for μ can be obtained by integrating Eq. (13) numerically over τ. If there are large unknown systematic effects, then the most likely values of τ will differ from zero, which leads to the important result of increasing the error in μ.

In the following, we choose a hyper-prior that is μ-independent, i.e., p(μ, τ) = p(τ), and that interpolates smoothly between a flat and a sharply peaked τ distribution,

p(τ) dτ ∝ ∏_{i=1}^N [1/(σ_i² + τ²)]^{α/(2N)} dτ. (16)
This form will prove to be useful due to the simple interpretation of α in terms of the number of degrees of freedom, and the possibility to obtain closed analytical formulas for the posterior distribution of μ. We remark that in Bayesian methods one needs to specify a prior that cannot be determined from first principles. Here we have chosen a prior with a simple analytical form interpolating between a flat prior and τ = 0. Very interestingly, while this prior is only one of many possible choices, it turns out that it coincides with Jeffreys' prior in a certain limit. We will return to this at the end of Section 6.

It is interesting to study the effect of this kind of prior on the tails of the posterior density of μ. Integrating Eq. (13) over τ produces the posterior density of μ given the data,

p(μ | y_i) ∝ ∫_0^∞ ∏_{i=1}^N (σ_i² + τ²)^{−(1+α/N)/2} e^{−(μ−y_i)²/(2(σ_i²+τ²))} dτ. (17)

For large μ, the exponential suppression factor favors large values of τ, so that,

p(μ | y_i) ∼ ∫^∞ τ^{−(N+α)} e^{−Nμ²/(2τ²)} dτ, (18)

and after a change of variables, u ≡ μ²/τ²,

p(μ | y_i) ∼ μ^{−(N+α−1)}. (19)

We observe that the usual exponential suppression of μ in the tails has turned into a milder power law suppression, which increases with the effective number of degrees of freedom, i.e., in our case the number of measurements, ν ≡ N + α − 1.

When all errors are equal, σ_i = σ_j ≡ σ, we obtain an analytical formula which illustrates how the PDG scale factor re-emerges for large data sets. The master equation reads in this case,

p(μ, τ | y_i) ∝ (σ² + τ²)^{−(ν+1)/2} exp[ −Σ_{i=1}^N (ȳ_i − μ)²/(2(σ² + τ²)) ],

Fig. 3 Scale factor versus the square root of the reduced χ². We employed α = 0.

or simply,

p(μ | y_i) ∝ ∫_0^∞ (σ² + τ²)^{−(ν+1)/2} exp[ −σ²χ²/(2(σ² + τ²)) ] dτ, (20)

where we defined,

χ² ≡ χ²(μ) ≡ Σ_{i=1}^N (μ − ȳ_i)²/σ², (21)

which is the usual χ² function. Changing variables,

u ≡ σ²χ²(μ)/(2(τ² + σ²)), (22)

we obtain,

p(μ | y_i) ∝ (χ²)^{−ν/2} ∫_0^{χ²/2} u^{ν/2−1} e^{−u} du ∝ (χ²)^{−ν/2} F_ν(χ²), (23)

which is the master formula in this case, in terms of the cumulative distribution function F_ν of a χ² distribution with ν degrees of freedom. This equation implies an interesting result. Since p(μ | y_i) depends on μ only through χ²(μ), we have

dp(μ | y_i)/dμ = [dp(μ | y_i)/dχ²] dχ²/dμ, (24)

so that the mode of the distribution is the same as in the usual case, i.e., at the value of μ where χ²′(μ) = 0. Thus,

for σ_i = σ_j the posterior distributions of the hierarchical and non-hierarchical models peak at the same location.
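The boxed statement can be checked numerically. The sketch below (our own code, with an ad hoc grid, made-up data, and a flat hyper-prior, i.e., α = 0) marginalizes the master equation (13) over τ by a simple Riemann sum for equal-error data, and locates the mode of the resulting density of μ:

```python
import math

def marginal_mu(mu, data, sigma, taus, dtau):
    """Unnormalized p(mu | y_i): Eq. (13) with a flat hyper-prior,
    marginalized over tau by a simple Riemann sum."""
    total = 0.0
    for tau in taus:
        var = sigma**2 + tau**2
        logp = sum(-0.5 * (mu - y)**2 / var - 0.5 * math.log(var) for y in data)
        total += math.exp(logp) * dtau
    return total

data = [9.2, 9.8, 10.1, 10.4, 11.0]          # made-up equal-error measurements
sigma = 0.3
dtau = 0.04
taus = [k * dtau for k in range(250)]        # tau grid from 0 to 10

mus = [9.0 + 0.004 * k for k in range(501)]  # mu grid from 9 to 11
dens = [marginal_mu(m, data, sigma, taus, dtau) for m in mus]
mode = mus[dens.index(max(dens))]
naive_mean = sum(data) / len(data)           # weighted mean (equal errors)
```

The mode coincides with the ordinary weighted mean, as the boxed result states, while the spread of the marginal density exceeds the non-hierarchical error σ/√N, reflecting the inflated uncertainty for discrepant data.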
Fig. 4 Scale factor versus the square root of the reduced χ² for the case N = 10.

From Eq. (23), we can also obtain the scale factor, which we define here as the ratio of the sizes of the 68% highest confidence intervals of the hierarchical and non-hierarchical models. In Figs. 3 and 4, we show the scale factor for several values of α and N, from which one can see the similarity to the PDG scale factor for large N. We now turn to the case of a large number of degrees of freedom and the Gaussian approximation.

3.1 Large number of degrees of freedom

We rewrite Eq. (23) by another change of variables,

r ≡ 2u/χ², (25)

so that

p(μ | y_i) ∝ ∫_0^1 exp[ −((ν−1)/2)(r χ²_{ν−1} − ln r) ] dr, (26)

where we defined χ²_{ν−1} ≡ χ²/(ν−1). Large values of ν suppress the integrand exponentially. Depending on the value r₀ = (χ²_{ν−1})⁻¹ where r χ²_{ν−1} − ln r has a minimum, we have two cases:

(1) For r₀ > 1, the integral is dominated by r near 1, which gives

p(μ | y_i) ∝ [e^{−χ²/2}/(1 − χ²_{ν−1})] [1 − e^{−((ν−1)/2)(1 − χ²_{ν−1})}] ∼ e^{−χ²/2}. (27)

We recognize this as the usual likelihood for parameter inference without scaling. Thus,

for σ_i ≈ σ_j, ν → ∞, and χ²_{ν−1}(μ) < 1, the hierarchical model implies no scaling of the errors.

Fig. 5 Comparison of the exact result with the approximate formula for α = 0.

(2) For r₀ < 1, the integral is dominated by r near r₀. After some algebra,

p(μ | y_i) ∝ [1 + N(μ − μ₀)²/((ν−1) σ² χ²_ν(μ₀))]^{−ν/2}, (28)

where μ₀ is the location of the minimum of χ²(μ) and χ²_ν ≡ χ²/ν. This is proportional to the Student-t distribution for ν − 1 degrees of freedom. For large ν it can be further approximated by a Gaussian,

p(μ | y_i) = t_{ν−1}(μ₀, σ²χ²_ν/N) ∼ N(μ₀, σ²χ²_ν/N). (29)

This yields another important result,

for σ_i ≈ σ_j, ν → ∞, and χ²_{ν−1}(μ₀) > 1, the hierarchical model implies a re-scaling of the overall error by σ → σ √(χ²_ν(μ₀)).

It is amusing to note that for large ν we recovered the PDG scale factor prescription. On the other hand, for low values of ν our model implies larger scalings than recommended by the PDG. In the next subsection we approximate the distribution of μ as a Gaussian, so as to obtain an analytical formula for the scale factor in terms of ν and the value of χ².

3.2 Gaussian approximation

To do so, we expand the logarithm of the posterior distribution p = p(μ | y_i) in powers of μ around μ₀,

ln p = C + (d ln p/dμ)|_{μ₀} (μ − μ₀) + (1/2)(d² ln p/dμ²)|_{μ₀} (μ − μ₀)² + ···

The second term on the right hand side is zero because we are expanding around the maximum. The third term
Fig. 6 The blue points with identical errors originate from a Gaussian distribution centered at 10. The last blue point has the same precision as the combination of the previous 10 points, but deviates by about 5σ. The red point is the ordinary weighted average after PDG scaling. The black point is obtained using our Bayesian method.

can be compared to the corresponding term of the expansion of a Gaussian distribution, which gives

1/σ²_Bayes ≈ −(d² ln p/dμ²)|_{μ₀} = −(2N/σ²)(d ln p/dχ²)|_{χ₀²}. (30)

Using Eq. (23) we have,

−(d ln p/dχ²)|_{χ₀²} = ν/(2χ₀²) − (χ₀²/2)^{(ν−2)/2} e^{−χ₀²/2} / (2γ(ν/2, χ₀²/2)), (31)

where γ is the lower incomplete Gamma function, defined by

γ(s, x) ≡ ∫_0^x t^{s−1} e^{−t} dt. (32)

As we mentioned before, the scale factor S_Bayes is defined as the ratio of the sizes of the 68% highest confidence intervals of the hierarchical and non-hierarchical models. In the Gaussian approximation we find,

S_Bayes ≈ √N σ_Bayes/σ ≈ √( (χ₀²/ν) [1 + (Σ_{k=1}^∞ (χ₀²)^k ν!!/(ν+2k)!!)⁻¹] ), (33)

where we have used the power series expansion of the incomplete Gamma function,

γ(s, x) = x^s Γ(s) e^{−x} Σ_{k=0}^∞ x^k/Γ(s+k+1). (34)

In Fig. 5 we compare the approximate formula with the exact result. As expected, the approximation improves for larger values of ν. We are now ready to discuss the general case of unequal errors, σ_i ≠ σ_j.

To understand this case, we fix the value of τ in Eq. (13). The distribution of μ is then Gaussian, with total error,

1/σ_t² = Σ_{i=1}^N 1/(σ_i² + τ²), (35)

and central value,

μ₀ = (Σ_{i=1}^N 1/(σ_i² + τ²))⁻¹ Σ_{i=1}^N y_i/(σ_i² + τ²). (36)

Thus, experiments with smaller errors are more sensitive to τ than less precise ones. Suppose that M of the experiments have an error σ_M, and that σ_M is much smaller than the error σ of the rest of the experiments. Then, for σ_M ≃ τ ≪ σ, the scaling will mainly affect the experiments with small errors. Since we were unable to find an analytical formula for the peak or mean of τ, we proceed with a numerical analysis.

As a first example, we randomly generated eleven fictitious measurement points from a Gaussian with standard deviation σ = 1 centered at the value of 10. The last point is from a Gaussian centered at 10 + 5/√10, with an error σ_M = 1/√
10, which is chosen so that its precision is the same as the combined precision of the other ten. The results are shown in Fig. 6. The red point denotes the ordinary weighted average with PDG scaling applied, and is pulled away from the horizontal line as a result of the deviating 11th measurement. The black point, on the other hand, is the average obtained as the result of our Bayesian hierarchical model (here we use α = 10 to specify our prior). It is closer to the bulk of the data than to the measurement with the smaller error. This is a reasonable property, since it is less likely that all the measurements in the bulk had a systematic error in the same direction.

In Fig. 7 we show how the two kinds of averages change when we move the central value of the 11th measurement (in blue) while leaving the other 10 unchanged. Just for orientation, the gray band represents the ordinary average (non-hierarchical) of the bulk of measurements with the same error. As in Fig. 6, the red points are the usual PDG-scaled averages, while the black points are the hierarchical averages. Clearly, as we approach the bulk, the combined error shrinks.

There is an interesting discrepancy between the two types of experiments measuring the lifetime of the neutron. For a state of the art review of both types and

Fig. 7
The measurement points with small error are shown in blue, the usual averages with the PDG scaling in red, and the hierarchical averages in black. The labels at the horizontal axis show by how many σ_M the blue points deviate from the gray point. The gray band represents the ordinary weighted averages of the bulk of measurements in Fig. 6.

more details, see Ref. [11]. The first type are beam experiments [12,13,14], which measure the number of protons or electrons from decays of cold neutrons in a beam passing through a magnetic or electric trap. After the beam has passed the trap, some of the neutrons are deposited in a foil at the end of the beam path. The neutron lifetime is proportional to the rate of neutrons deposited and inversely proportional to the rate of decays detected.

The other type of experiment uses bottles [15,16,17,18,19,20,21] containing ultra-cold neutrons with a kinetic energy of less than 100 neV. Neutrons with such a low kinetic energy can be confined due to the effective Fermi potential between neutrons and atomic nuclei in many materials. Gravitational forces and magnetic fields can also be used to confine the neutrons within the container. The idea is simply to count the number of surviving neutrons after some time and to deduce the lifetime.

We now apply our method with α = 6 to the results of these experiments, which are shown in Fig. 8. The PDG χ² scaling (S_PDG = 1.96) gives τ_n = 879. ± .78 s, while the Bayesian method (black point to the left) gives τ_n^Bayes = 880. +0. −0. s. We find that our Bayesian hierarchical method increases the central value when the beam experiments are included. Even when only bottle experiments are considered, our method still gives a slightly larger average value, τ_n^Bayes = 879. +0. −0. s, than the PDG method, τ_n = 879. ± .64 s, where S_PDG = 1.56. This is due to the bulk of the bottle experiments that prefer lifetimes longer than 880 s. It is important to recall that the tails of the Bayesian hierarchical model do not fall as fast as a Gaussian, so that there is still a non-negligible probability for τ_n to be lower.
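The qualitative behavior of such a two-group discrepancy can be reproduced with a toy model (synthetic numbers chosen by us, not the real neutron data; flat hyper-prior and Riemann-sum marginalization, as a rough sketch): a consistent, more precise "bottle-like" bulk plus two discrepant, less precise "beam-like" points. The hierarchical average shifts toward the discrepant points relative to the ordinary weighted mean, and its uncertainty is inflated:

```python
import math

def hier_density(mu, data, taus, dtau):
    """Unnormalized marginal posterior of mu from the master equation (13),
    flat hyper-prior, numerical integration over tau."""
    total = 0.0
    for tau in taus:
        logp = 0.0
        for y, s in data:
            var = s**2 + tau**2
            logp += -0.5 * (mu - y)**2 / var - 0.5 * math.log(var)
        total += math.exp(logp) * dtau
    return total

bulk = [(879.4, 0.8), (879.6, 0.8), (880.1, 0.8), (879.9, 0.8)]  # toy "bottle" points
disc = [(888.0, 2.0), (889.2, 2.0)]                              # toy discrepant "beam" points
data = bulk + disc

wsum = sum(1.0 / s**2 for _, s in data)
naive_mean = sum(y / s**2 for y, s in data) / wsum    # ordinary weighted average
naive_err = 1.0 / math.sqrt(wsum)

dtau = 0.1
taus = [k * dtau for k in range(300)]                 # tau grid from 0 to 30
mus = [874.0 + 0.05 * k for k in range(401)]          # mu grid from 874 to 894
dens = [hier_density(m, data, taus, dtau) for m in mus]
norm = sum(dens)
hier_mean = sum(m * d for m, d in zip(mus, dens)) / norm
hier_sd = math.sqrt(sum(d * (m - hier_mean)**2 for m, d in zip(mus, dens)) / norm)
```

Because the posterior prefers τ values comparable to the spread between the groups, the additive σ_i² + τ² form de-weights the precise bulk relative to the less precise discrepant points, which is exactly the drawback discussed in the conclusions.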
Fig. 8 Neutron lifetime measurements. The green points are the results of bottle experiments, and the blue ones of beam experiments. The discrepancy can easily be seen. The black point to the left is the Bayesian average of the full data, while the first red point is the usual average with the PDG scaling. Similarly for the right black and red points, but restricted to the bottle results. The PDG scaling for beam plus bottle experiments is S_PDG = 1.96, while for bottle only it is S_PDG = 1.56.

While this paper was being written, two interesting papers related to our work appeared. The first one [22] discusses the kaon mass in the context of a skeptical combination of experiments, scaling each experimental error independently but in a correlated way. The second one [23] studies the discrepancy that arises when the PDG scaling is applied to sub-sets of experiments and then to the combination of the sets, vs. (for example) applying it to the whole data at the same time. The conclusion is that

the √(χ²/ν) prescription used to enlarge the standard deviation does not hold sufficiency.

This means that the scaling is not sufficient to properly describe the full probability distribution. Our model would have had the same problem had we used the marginalized (over τ) distribution of μ. This is because the "correlations" that emerge through τ would be absent. But it is clear from Eq. (13) that if we use the posterior distribution of μ and τ of a subset of experiments as the prior for the remaining subset, then the updated posterior will be the same as combining the whole data set simultaneously.

Another interesting point made in Ref. [23] is the fact that the PDG scaling treats any value of N equally, while for fixed χ²/N the p-value decreases with N. In other words, since the probability distribution of the reduced χ² function peaks around one as the number of degrees of freedom increases, the scaling (given a discrepant value of the reduced χ²) should be larger when more experiments are included in the average. This is not the case for the PDG prescription, because the scaling only depends on the reduced χ² value and not on the number of degrees of freedom.

Fig. 9 Scaling for α = 6.

Now, it is clear from Fig. 3 that in the hierarchical model with α chosen close to zero this problem would be aggravated, i.e., for any given value of the reduced χ², there is more scaling for low N. However, we can use the freedom to choose a value of α to improve on this issue. First we demand the variance of the τ distribution to be finite, which corresponds to α >
6. In Fig. 9 we show the scaling versus the reduced χ² with α = 6 + ε (where ε is an infinitesimal), from which one can see that for large values of the reduced χ² the scaling is reduced as N gets smaller. This is just the desired effect. On the other hand, we still have more scaling for small values of the reduced χ². This is a natural consequence of the fact that for a low number of experiments τ cannot be constrained too strongly, which translates into an enlarged error for μ.

One can also consider Jeffreys' prior. E.g., if we specialize to the case of uncertainties of equal magnitude, σ_i = σ_j = σ, then Jeffreys' prior reduces precisely to Eq. (16) with α = 3. This would lead to a plot very similar to the one shown in Fig. 9.

We proposed a Bayesian hierarchical model as a strategy to compute averages of several uncorrelated experimental measurements, specifically with the possibility in mind that unaccounted-for systematic effects might be present, leading to underestimates of the quoted uncertainties. We should stress that the point is not that (some part of) the systematic error has been underestimated or assessed too aggressively. If this is suspected, then a strategy should be developed to increase the systematic error component(s), which would imply, among other things, that statistics-limited measurements would not be questioned.

In the case of a distribution with several parameters (in our case μ and τ), Jeffreys' prior is defined as the square root of the determinant of Fisher's information matrix, which in turn is defined as the average (over y_i) of the Hessian of the log-likelihood N(y_i | μ, √(τ² + σ_i²)).
Here, we rather addressed the generic situation in which unknown effects or human errors may be present, and which therefore could affect even ostensibly clean determinations.

We have shown that our methodology resembles the recommendation of the Particle Data Group whenever the number of degrees of freedom (data points) is large. Our approach connects smoothly to cases with fewer degrees of freedom, though. Another important advantage is that it makes the underlying assumptions in the averaging process transparent. E.g., a large value of the parameter α appearing in our proposed form of the prior implies a strong belief that the experiments do not have an unknown systematic error, while a small value corresponds to a more agnostic point of view. Our method can be extended to experiments with correlated errors, but we leave this generalization for the future.

Due to the additive form, σ_i² + τ², of the denominator in the exponential part of the distribution, our model has the drawback that it tends to penalize experiments with high precision more strongly. This relative issue is already seen in the τ_n example, where the most recent beam measurement, which has a larger error than most bottle experiments and a higher central value, tends to push the combined value up. On the other hand, the natural power suppressed tails of the posterior distribution help to mitigate possible strong shifts in the central value.

We also would like to point out that before applying our method to the PDG averages, it has to be studied, discussed and compared with other approaches in more detail, to confirm that it can be used within the PDG framework.

In closing, we remark that we also envision an application of this model in the context of new physics searches within the Standard Model Effective Field Theory (SMEFT) framework [24,25], in which thousands of a priori independent operator (Wilson) coefficients need to be determined.
Yet, many of these operators are almost certainly generated at some common energy scale, and are consequently not entirely independent. Thus, the idea is to assume that (classes of) the Wilson coefficients are random samples generated at a common ultra-violet energy scale, lending itself to a hierarchical approach. This can be particularly useful when estimating the sensitivity of a hypothetical future experiment to physics beyond the Standard Model. This is another direction for future work.

We are happy to thank Glen Cowan and Giulio D'Agostini for discussions and comments, and Marumi Kado for pointing us to relevant references. This work was supported by CONACyT (Mexico) project 252167–F, and also by the German-Mexican research collaboration grants SP 778/4–1 (DFG) and 278017 (CONACyT).
References
1. M. Tanabashi et al. (Particle Data Group), Phys. Rev. D 98, 030001 (2018).
2. A. Gelman et al., Bayesian Data Analysis, Chapman and Hall/CRC (2013).
3. R. E. Tarone, Biometrics, 215 (1982).
4. A. P. Dempster, M. R. Selwyn and B. J. Weeks, J. Am. Stat. Assoc., 221 (1983).
5. L. V. Hedges and I. Olkin, Statistical Methods for Meta-Analysis, Academic Press (1985).
6. T. Friede, C. Röver, S. Wandel and B. Neuenschwander, Res. Synth. Methods, 79 (2017).
7. T. Friede, C. Röver, S. Wandel and B. Neuenschwander, Biom. J., 658 (2017).
8. G. Cowan, Eur. Phys. J. C 79, 133 (2019).
9. G. D'Agostini, Sceptical combination of experimental results: General considerations and application to ε′/ε, arXiv:hep-ex/9910036.
10. S. Mukhopadhyay and D. Fletcher, Sci. Rep., 9983 (2018).
11. F. E. Wietfeldt, Atoms 6(4) (2018).
12. P. E. Spivak, JETP, 1735 (1988).
13. J. Byrne et al., Europhys. Lett., 187 (1996).
14. A. T. Yue et al., Phys. Rev. Lett. 111, 222501 (2013).
15. A. Pichlmaier et al., Phys. Lett. B 693, 221 (2010).
16. A. Steyerl et al., Phys. Rev. C, 065503 (2012).
17. A. P. Serebrov et al., Phys. Rev. C, 055503 (2018).
18. A. Serebrov et al., Phys. Lett. B 605, 72 (2005).
19. V. F. Ezhov et al., JETP Lett., 671 (2018).
20. R. W. Pattie, Jr. et al., Science 360, 627 (2018).
21. S. Arzumanov et al., Phys. Lett. B 745, 79 (2015).
22. G. D'Agostini, Skeptical combination of experimental results using JAGS/rjags with application to the K± mass determination, arXiv:2001.03466.
23. G. D'Agostini, On a curious bias arising when the √(χ²/ν) scaling prescription is first applied to a sub-sample of the individual results, arXiv:2001.07562.
24. W. Buchmüller and D. Wyler, Nucl. Phys. B 268, 621 (1986).
25. B. Grzadkowski, M. Iskrzynski, M. Misiak and J. Rosiek, JHEP 1010, 085 (2010).