Model selection in the average of inconsistent data: an analysis of the measured Planck-constant values
G Mana, E Massa and M Predescu
INRIM – Istituto Nazionale di Ricerca Metrologica, str. delle Cacce 91, 10135 Torino, Italy
Politecnico di Torino, corso Duca degli Abruzzi 24, 10129 Torino, Italy
Abstract.
When the data do not conform to the hypothesis of a known sampling variance, the fitting of a constant to a set of measured values is a long-debated problem. Given the data, fitting requires finding the most trustworthy measurand value. Bayesian inference is here reviewed in order to assign probabilities to the possible measurand values. Different hypotheses about the data variance are tested by Bayesian model comparison. Eventually, model selection is exemplified by deriving an estimate of the Planck constant.
Submitted to:
Metrologia
PACS numbers: 06.20.Jr, 07.05.Kf, 02.50.Cw, 06.20.Dk
E-mail: [email protected]
1. Introduction
Given a set of measured values of a constant and the associated uncertainties, the Gauss-Markov theorem states that the weighted arithmetic mean is the unbiased minimum-variance estimator of the measurand [1]. The uncertainty of the mean, which is smaller than the smallest data uncertainty, does not depend on the data spread. This is a consequence of the assumption that the variance of the sampling distribution of each datum is exactly known. In practice, this hypothesis is often false, and the inconsistency of the data – quantified, for example, by the χ² or the Birge-ratio values – suggests that the uncertainties associated with the data are merely lower bounds to the standard deviations of their sampling distributions. In this case, the Gauss-Markov theorem is of no help and the choice of an optimal measurand value is a long-debated issue. To average inconsistent data, a number of approaches have been considered; for an overview the reader can refer to [2, 3, 4, 5, 6, 7, 8, 9] and the included references. Data inconsistency suggests many different hypotheses; for instance, that a term is missing in the error budget or that the uncertainties are lower bounds to the standard deviations. Hence, it is necessary to identify the best data model.

Hypothesis testing is an important part of data analysis. In the averaging of inconsistent data, we are interested in quantifying and comparing the degree of reliability of, for instance, Gaussian sampling having known variance, a missing term in the error budget, or a sampling-variance underestimate. Traditional tests assess the mismatch between model and data, for instance, via a χ² calculation. Although a small model likelihood corresponds to a large χ² value, to test a model from the data we must calculate the probability that the model is correct, not the misfit.

Decision theory, probability calculus, and Bayesian inference help to deal with the discrepancy between the quoted uncertainties and the data scatter.
Given a data model, the first step is to find the post-data probability density of the measurand values, which is summarized by the product of the data sampling distribution and the pre-data distribution of the measurand values. The normalizing constant of this product, which is named the evidence, measures how much the data support the underlying model. Different models can thus be compared via the evidence calculation. Once the best model has been selected, the optimal choice of the measurand value minimizes the expected loss over the probabilities of the measurand values. Bayesian inference allows different hypotheses to be compared; it makes it no longer necessary to exclude the data disagreeing with the majority, or to arbitrarily scale the uncertainties to make the data consistent.

After a review of data averaging, the assumption that the data are sampled from independent Gaussian distributions having known variance is compared against alternative hypotheses, for example, that the uncertainties are only lower bounds to the standard deviations or that there are undiscovered errors. As an example, hypothesis testing is carried out by using the measured values of the Planck constant. The best data model, of those considered, is identified and a Planck-constant value is estimated.
2. Problem statement
Let us consider a set of N measured values x_i of a measurand h, where u_i is the uncertainty associated with the x_i datum. The data are judged to be mutually inconsistent when combined in a least-squares analysis. Hence, they are supposed independently sampled from Gaussian distributions having unknown variances σ_i². The problem is to find optimal estimates of the measurand value and of the confidence intervals.

We cannot rely on the weighted mean, because the actual variances of the sampling distributions are not known. On the other hand, we cannot go back to the arithmetic mean, because it leaves out the significant information given by the data uncertainties.

Different hypotheses can be made about the variance of the data. The purpose is to choose between the hypotheses and to find an optimal value of the measurand. There is no constraint on the hypotheses, nor on their number. As an illustration we shall consider: H0) the data uncertainties are the standard deviations of the sampling distributions; H1) the standard deviations are larger than the data uncertainties by a common scale factor [2, 8]; H2) the standard deviations are larger than the data uncertainties by a common additional contribution [3]; H3) the data uncertainties are lower bounds for the standard deviations of the relevant sampling distributions [9]. In the same way, we can test the extent to which the data support any other model.
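As a minimal numerical sketch of these quantities (Python, with made-up values; the arrays x and u are hypothetical, not the data analysed in section 6), the weighted mean, its standard deviation, and the Birge ratio that diagnoses inconsistency can be computed as:

```python
import numpy as np

# Illustrative data only: five hypothetical measured values x_i of a
# constant and their quoted uncertainties u_i.
x = np.array([10.1, 9.8, 10.6, 9.2, 10.9])
u = np.array([0.2, 0.3, 0.2, 0.4, 0.3])

w = 1.0 / u**2                        # least-squares weights
var_mean = 1.0 / w.sum()              # variance of the weighted mean
x_bar = var_mean * (w * x).sum()      # weighted arithmetic mean
chi2 = (w * (x - x_bar)**2).sum()     # sum of squared residuals
birge = np.sqrt(chi2 / (len(x) - 1))  # Birge ratio; ~1 for consistent data

print(x_bar, np.sqrt(var_mean), birge)
```

A Birge ratio well above 1, as here, signals that the quoted u_i understate the observed scatter.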
3. Hypothesis testing
Let us assume a basic knowledge of probability calculus and of Bayesian inference; introductions can be found in [9, 10, 11, 12]. To choose from different data models, we must compare the probabilities of each of them being true, given the {x_i} data set. To find the post-data probability Prob(H|{x_i}), where H indicates the hypothesis being tested, we start from the product rule

Prob(A, B) = Prob(A|B) Prob(B) = Prob(B|A) Prob(A),   (1)

where A and B are propositions – e.g., affirming that a measurement result is x_i, that the measurand value is h, or that the hypothesis H is true – and, in A|B, the bar indicates that B is assumed to be true. Hence,

Prob(H, {x_i}) = Prob(H|{x_i}) W({x_i}) = Z({x_i}|H) Prob(H),   (2)

where

W({x_i}) = Σ_H Prob(H, {x_i}) = Σ_H Z({x_i}|H) Prob(H)   (3)

is the probability density of the data regardless of any hypothesis and Prob(H) is the probability of H regardless of the data. The post-data probability of H,

Prob(H|{x_i}) = Z({x_i}|H) Prob(H) / W({x_i}),   (4)

is determined by two factors: Prob(H), the pre-data probability of H, and the data-dependent term Z_H({x_i}) = Z({x_i}|H), which is named the evidence for H. The evidence depends on both the data and H: for a fixed model, Z({x_i}|H) is the data distribution regardless of the model parameters; for fixed data, it is the evidence for the data model. If we are only interested in comparing H against another hypothesis, W({x_i}), which is a normalizing factor, can be ignored. In addition, on the assumption that, before the data are available, the probabilities of the different hypotheses are the same, Prob(H) can also be ignored. Therefore, to find the most probable hypothesis, we can identify Z({x_i}|H) with Prob(H|{x_i}), the calculation of which is thus central.

The evidence calculation is again based on Bayesian analysis. To examine this point, let us consider the general case where H = H(h, λ) is a class of hypotheses having h and λ as parameters.
The post-data probability density Θ_H(h, λ; {x_i}) of the model parameters is obtained by application of

P(h, λ, {x_i}|H) = Θ_H(h, λ; {x_i}) Z_H({x_i}) = L_H(h, λ; {x_i}) π(h, λ),   (5)

where Z_H is the sought evidence for H, π(h, λ) is the pre-data probability density of the parameter values, and L_H is a data-dependent term. For fixed h and λ, L_H({x_i}; h, λ) is the data sampling distribution; for fixed data, L_H(h, λ; {x_i}) is the likelihood of the parameter values. Here and in the following, in the function argument, the semicolon separates the independent variables from the fixed parameters.

Model selection within the H class is equivalent to an estimation problem. From (5), for a fixed λ, the evidence for the H(λ) model,

Z_H(λ) = ∫_{−∞}^{+∞} L_H(h, λ; {x_i}) π(h, λ) dh,   (6)

is the marginalization of L_H(h, λ; {x_i}) π(h, λ) with respect to h, that is, the non-normalized post-data distribution of the λ values, regardless of the measurand value. Similarly, the evidence for the whole H class,

Z_H = ∫∫_{−∞}^{+∞} L_H(h, λ; {x_i}) π(h, λ) dh dλ = ∫_{−∞}^{+∞} Z_H(λ) dλ,   (7)

is the normalizing factor of L_H(h, λ; {x_i}) π(h, λ). If we only wish to choose an optimal measurand value, we can forget Z_H; in model selection problems, however, it is of the utmost importance.
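With equal pre-data probabilities Prob(H), eq. (4) reduces to normalizing the evidences; a sketch with purely illustrative evidence values:

```python
# Hypothetical evidences Z({x_i}|H) for three competing data models;
# with equal pre-data probabilities Prob(H), the posterior model
# probabilities are just the normalized evidences.
evidences = {"H0": 1.2e-8, "H1": 4.5e-7, "H2": 3.9e-7}

total = sum(evidences.values())  # plays the role of W({x_i})
posterior = {H: Z / total for H, Z in evidences.items()}
best = max(posterior, key=posterior.get)

print(best, round(posterior[best], 3))
```

Because only ratios of evidences matter, the (common) normalizing factor W({x_i}) never needs to be computed separately.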
4. Measurand estimate
Given the data and the associated uncertainties, the Bayesian solution to the estimation problem requires assigning probabilities to the h values and using these probabilities to minimize a given loss function [10, 11, 12]. The post-data distribution Θ_H(h; {x_i}) is central to this analysis. Given the loss L(ĥ − h) due to a wrong estimate ĥ, the optimal choice of the ĥ value minimizes

⟨L(ĥ − h)⟩ = ∫_{−∞}^{+∞} L(ĥ − h) Θ_H(h) dh.   (8)

With a quadratic loss, the optimal estimate is the mean; with a linear loss, it is the median; with a loss independent of the error magnitude, it is the mode. Confidence intervals are easily calculable from Θ_H(h).

When H is a class of hypotheses, the result of the Bayesian analysis is the joint probability density Θ_H(h, λ; {x_i}), where λ is the class parameter. In this case, the usual approach to measurand prediction uses the probability density Θ_H(h, λ; {x_i}) with a fixed λ value, e.g., the most probable one. More accurate predictions are obtained by marginalization,

Θ_H(h) = ∫_{−∞}^{+∞} Θ_H(h, λ) dλ,   (9)

which takes account of the uncertainty of the λ value.
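The correspondence between losses and estimators can be illustrated by sampling a hypothetical skewed posterior (the gamma shape chosen here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Samples standing in for a hypothetical, right-skewed post-data density of h
samples = rng.gamma(shape=3.0, scale=1.0, size=100_000)

mean_est = samples.mean()        # minimizes the quadratic loss
median_est = np.median(samples)  # minimizes the linear (absolute) loss
# crude histogram-based mode estimate (minimizes the 0-1 loss)
counts, edges = np.histogram(samples, bins=200)
k = np.argmax(counts)
mode_est = 0.5 * (edges[k] + edges[k + 1])

print(mode_est < median_est < mean_est)  # True for a right-skewed density
```

For a symmetric posterior, such as the Gaussians of the H0 and H1 models below, the three estimates coincide.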
5. Model selection

H0 hypothesis
If the data uncertainties are the standard deviations, that is, if the data are independently sampled from Gaussian distributions having variances σ_i² = u_i², L_H0(h; {x_i}) is the Gaussian likelihood

L_H0(h) = exp[−Σ_{i=1}^N (x_i − h)²/(2u_i²)] / [√((2π)^N) ∏_{i=1}^N u_i]   (10)
        = exp(−χ²/2) exp[−(x̄ − h)²/(2σ_x̄²)] / [√((2π)^N) ∏_{i=1}^N u_i],

where

x̄ = σ_x̄² Σ_{i=1}^N x_i/u_i²   (11a)

is the weighted arithmetic mean of the data,

σ_x̄² = 1 / Σ_{i=1}^N (1/u_i²)   (11b)

is the x̄ variance, and

χ² = Σ_{i=1}^N x_i²/u_i² − x̄²/σ_x̄² = Σ_{i=1}^N (x_i − x̄)²/u_i²   (11c)

is the sum of the squared residuals.

To calculate the post-data probability density of the h value and the evidence for the H0 hypothesis, we must assign pre-data probabilities to the h values. In the absence of any additional information, we suppose these pre-data probabilities independent of the unit-scale origin, which implies the uniform distribution

π_h(h) = If(−Ξ/2 < h < Ξ/2) / Ξ,   (12)

where If(A) gives 1 if A is true and 0 if it is not. The interval [−Ξ/2, Ξ/2] includes the unknown measurand value. In the following, we shall always use this pre-data measurand distribution. Accordingly, the evidence for the H0 model is

Z_H0 = (1/Ξ) ∫_{−Ξ/2}^{+Ξ/2} L_H0(h) dh = σ_x̄ exp(−χ²/2) / [Ξ √((2π)^{N−1}) ∏_{i=1}^N u_i],   (13)

where Ξ has been chosen large enough that the integration limits can be extended to infinity. The post-data probability density of the h values is the Gaussian distribution

Θ_H0(h) = exp[−(h − x̄)²/(2σ_x̄²)] / (√(2π) σ_x̄).   (14)

The mean and most probable h values are both equal to the weighted mean (11a), and σ_x̄² is the h variance. In this case, the Bayesian analysis replicates the least-squares analysis. The novelty is the calculation of the evidence for the σ_i = u_i hypothesis: it allows the probability of this assumption to be compared against alternatives.
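The closed form (13) is easy to check by brute force; a sketch with a hypothetical reduced data set and prior range Ξ (the function name is ours):

```python
import numpy as np

def h0_log_evidence(x, u, Xi):
    """Log of eq. (13): Z_H0 = sigma_xbar * exp(-chi2/2)
    / (Xi * sqrt((2*pi)**(N-1)) * prod(u_i))."""
    N = len(x)
    var_mean = 1.0 / (1.0 / u**2).sum()
    x_bar = var_mean * (x / u**2).sum()
    chi2 = (((x - x_bar) / u)**2).sum()
    return (0.5 * np.log(var_mean) - 0.5 * chi2 - np.log(Xi)
            - 0.5 * (N - 1) * np.log(2 * np.pi) - np.log(u).sum())

x = np.array([0.10, -0.05, 0.00])   # hypothetical reduced data
u = np.array([0.10, 0.15, 0.20])
print(h0_log_evidence(x, u, Xi=2.0))
```

The same number is obtained by integrating L_H0(h)π_h(h) over h numerically, which is how the evidences of the non-Gaussian models considered below have to be computed.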
H1 hypothesis

The H1 hypothesis corresponds to σ_i = λu_i, where 0 < λ < Λ. Since H1 is a hypothesis class, we must specify the joint pre-data probability density π(h, λ) = π_h(h)π_λ(λ) of both the h and λ parameters. This density summarizes the knowledge about h and λ before considering the data. Unlike Dose – who investigated this case extensively in [8] – in the absence of any information about the λ magnitude prior to considering the data, we use the bounded uniform density

π_λ(λ) = If(0 < λ < Λ) / Λ.   (15)

Our choice is motivated by the H1 data model including the u_i values; the uncertainties associated with the data are therefore supposed available before the data themselves and fix the scale of the problem. For this reason, rather than π_λ(λ) invariance with respect to scale transformations, namely, λ = aλ′ and 0 < λ′ < Λ/a, we suppose invariance with respect to translations, namely, λ = λ* + λ′ and −λ* < λ′ < Λ − λ*. The first invariance implies the Jeffreys prior distribution used in [8, 9]; the second implies (15). We will use (12) and (15) throughout the paper.

The parameter likelihood,

L_H1(h, λ) = exp[−χ²/(2λ²)] exp[−(x̄ − h)²/(2λ²σ_x̄²)] / [√((2π)^N) λ^N ∏_{i=1}^N u_i],   (16a)

and the λ evidence,

Z_H1(λ) = σ_x̄ If(0 < λ < Λ) exp[−χ²/(2λ²)] / [ΞΛ √((2π)^{N−1}) λ^{N−1} ∏_{i=1}^N u_i],   (16b)

are obtained from (10) and (13), where Ξ has been chosen large enough that the integration limits can be extended to infinity.

Equation (16b) expresses the relative probability of the different λ values; the most probable one,

R_B = √[χ²/(N − 1)],   (17)

is the Birge ratio. If R_B is greater than 1, the data are inconsistent: the measurement uncertainties have been underestimated and the most probable standard deviations, R_B u_i, differ from the uncertainties associated with the data. The data are also inconsistent if R_B < 1; in this case, the measurement uncertainties have been overestimated.

The Bayesian derivation of (17) allows an expression of the Birge ratio to be found also when the data are correlated. By observing that the χ² value is invariant with respect to orthogonal transformations of the x = [x_1, x_2, ... x_N]^T data set, if the data are correlated, the sum of the squared residuals is χ² = x^T C_xx^{−1} x, where C_xx is the covariance matrix. Hence, in order to make the data consistent, the covariance matrix must be scaled to R_B² C_xx, where R_B is still given by (17).

With a fixed λ, say, λ = R_B, the post-data distribution of the measurand values is

Θ_H1(h; R_B) = N(h; x̄, σ_B²) = exp[−(h − x̄)²/(2σ_B²)] / (√(2π) σ_B),   (18)

where σ_B = R_B σ_x̄ and N(h; x̄, σ_B²) is a normal distribution having mean x̄ and variance σ_B². According to (7), the evidence for the H1 class is

Z_H1 = ∫_0^Λ Z_H1(λ) dλ = σ_x̄ Γ(N/2 − 1) 2^{(N−4)/2} / [ΞΛ √((2π)^{N−1}) (χ²)^{(N−2)/2} ∏_{i=1}^N u_i],   (19)

where Λ is large enough that the limit of integration can be extended to infinity and N > 2. Consequently, by combining (16a) and (19), the joint post-data distribution of h and λ is

Θ_H1(h, λ) = (χ²)^{(N−2)/2} exp[−χ²/(2λ²)] exp[−(h − x̄)²/(2λ²σ_x̄²)] / [√(2^{N−3}π) σ_x̄ Γ(N/2 − 1) λ^N],   (20)

where Ξ, Λ → ∞ and N > 2. The marginal distribution of the measurand regardless of λ is then

Θ_H1(h) = (χ²)^{(N−2)/2} Γ[(N − 1)/2] / {√π σ_x̄ Γ(N/2 − 1) [χ² + (h − x̄)²/σ_x̄²]^{(N−1)/2}},   (21)

where the space of the h values has been extended to infinity and N > 2. The measurand value can now be predicted in two ways. The first relies on (18): the h mean is x̄ and the 68% confidence interval is [x̄ − σ_B, x̄ + σ_B]. This approach assumes that the standard deviations of the data are R_B u_i with certainty, but R_B u_i are only the most probable values. The second way solves this ambiguity: it relies on (21) and accounts for the lack of knowledge about the standard deviations of the data. In this case, though the h mean is still x̄, the 68% confidence interval must be calculated numerically from the cumulative distribution function of (21).

A last consideration concerns the presence in the evidence (19) of the ranges Ξ and Λ of the h and λ pre-data distributions. They are Occam's factors penalizing H1 for having the free parameters h and λ [11]. The larger these ranges – i.e., the model freedom to explain the data – the smaller is the evidence given to H1 by the data. Since we shall consider the same pre-data distributions, these Occam's factors are of no relevance and will be omitted when comparing the H1, H2, and H3 models.

H2 hypothesis

The H2 class corresponds to σ_i² = u_i² + λ²u_0², where 0 < λ < Λ, u_0 is a reference u value, and the pre-data distributions of h and λ are (12) and (15). The parameter likelihood, model evidences, and post-data probability densities are obtained from (10), (13), and (14), where √(u_i² + λ²u_0²) substitutes for u_i. It is not possible to give them a concise form, but this does not prevent numerical computations.
H3 hypothesis

We now suppose that the data uncertainties are lower bounds for the standard deviations, that is, u_i ≤ σ_i ≤ max(u_i, λu_0), where 0 < λ < Λ and u_0 is a reference u value. A variant is u_i ≤ σ_i ≤ max(u_i, λu_i). In both cases, the pre-data distributions of h and λ are (12) and (15). Subsequently, we must assign probabilities to the unknown σ_i values, for which only the u_i lower bounds are known.

In the absence of any information about the σ_i magnitude before considering the data, but with the uncertainties u_i being already known, we use the translation-invariant uniform distributions

P(σ; u, λ) = If(u < σ < λu_0) / (λu_0 − u),   λu_0 > u,   (22a)

if the model is u ≤ σ ≤ max(u, λu_0), and

P(σ; u, λ) = If(u < σ < λu) / [(λ − 1)u],   λ > 1,   (22b)

if the model is u ≤ σ ≤ max(u, λu); the i subscript has been dropped.

P(σ; u, λ) being stated, the sampling distribution of each x datum can be marginalized to eliminate the unknown variance. Hence, if u ≤ σ ≤ max(u, λu_0),

Q(x; h, u, λ) = ∫_u^{λu_0} N(x; h, σ²) P(σ; u, λ) dσ
             = {Γ[0, (x − h)²/(2λ²u_0²)] − Γ[0, (x − h)²/(2u²)]} / [√(8π) (λu_0 − u)]   if λu_0 > u,
             = N(x; h, u²)   if λu_0 ≤ u,   (23a)

where Γ(a, z) is the incomplete Euler gamma function. If u ≤ σ ≤ max(u, λu), the marginalized sampling distribution is

Q(x; h, u, λ) = {Γ[0, (x − h)²/(2λ²u²)] − Γ[0, (x − h)²/(2u²)]} / [√(8π) (λ − 1)u]   if λ > 1,
             = N(x; h, u²)   if λ ≤ 1.   (23b)

For fixed λ values, the post-data probability density of the measurand is

Θ_H3(h; λ) = π_h(h) π_λ(λ) L_H3(h; λ) / Z_H3(λ),   (24a)

where

L_H3(h; λ) = ∏_{i=1}^N Q(x_i; h, u_i, λ),   (24b)

Q is given in (23a-b), x_i and u_i are the i-th datum and its uncertainty, and the evidence for the H3(λ) model is

Z_H3(λ) = π_λ(λ) ∫_{−∞}^{+∞} π_h(h) L_H3(h; λ) dh.   (24c)

It is not possible to obtain an analytical form of (24c), which must be evaluated numerically.
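A quadrature sketch of this numerical evaluation, for the variant u ≤ σ ≤ max(u, λu) and with hypothetical function names and data, is:

```python
import numpy as np

SQRT2PI = np.sqrt(2.0 * np.pi)

def normal_pdf(x, h, sigma):
    return np.exp(-0.5 * ((x - h) / sigma)**2) / (SQRT2PI * sigma)

def Q(x, h, u, lam, n=400):
    """Sampling density marginalized over sigma, uniform on
    [u, max(u, lam*u)]; midpoint rule instead of the closed form."""
    if lam <= 1.0:
        return normal_pdf(x, h, u)
    sig = u + (np.arange(n) + 0.5) * (lam - 1.0) * u / n  # midpoint nodes
    return normal_pdf(x, h, sig).mean()  # average over the uniform prior

def log_evidence_h3(xs, us, lam, Xi, Lam, n=4001):
    """Z_H3(lambda): midpoint quadrature over the measurand h,
    with the uniform priors pi_h = 1/Xi and pi_lambda = 1/Lam."""
    hgrid = -0.5 * Xi + (np.arange(n) + 0.5) * Xi / n
    logL = np.array([sum(np.log(Q(x, h, u, lam)) for x, u in zip(xs, us))
                     for h in hgrid])
    m = logL.max()
    return m + np.log(np.exp(logL - m).mean()) - np.log(Lam)
```

For λ ≤ 1 the model collapses onto H0, so Z_H3(λ ≤ 1) must equal Z_H0/Λ, which is a convenient numerical check.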
6. Planck constant estimate
As an example of application, let us consider the choice of a Planck constant value on the basis of the measurement results listed in table 1 [13, 14]. The Planck constant links energy and momentum to the frequency and the wavelength of the wave function. Therefore, the h determinations match energy (momentum) and frequency (wavelength) measurements. According to whether an electric or a mechanical system is considered, the measurement result is a value of the h/e or h/m ratio, where e and m are the electron charge and a mass.

The electric determinations are based on the measurements of h/e or h/(2e) by Ag → Ag⁺ + e⁻ electrolysis, the high-field spin-flip frequency, γ_hi, of protons in samples of H₂O, and the tunneling of Cooper pairs in Josephson junctions. The watt-balance experiments give direct access to the h/m_K ratio, where m_K is the mass of the international kilogram prototype. The measurement of the Avogadro constant N_A opened the way towards comparing the watt-balance measurements of h with the h/m quotients, where m is the mass of a particle or of an atom. The h/m(n) ratio was determined by time-of-flight measurements of monochromatic neutrons; low-field magnetic resonance, optical spectroscopy, and atom interferometry were instrumental to determine h/m(e), h/m(Cs), and h/m(Rb). Nuclear spectroscopy of Si and S, after capture of a thermal neutron, allowed the N_A h product to be determined by comparing the mass defects of Si and S with the frequencies of the γ photons emitted in the cascades from the capture state to the ground state.

Figure 1 shows the input data used in the analysis. The values calculated from the measurements of h/m(e), h/m(Cs), and h/m(Rb) have been averaged, because their uncertainty is mainly affected by the N_A determination and, therefore, they are highly correlated. The values obtained from nuclear spectroscopy have been averaged because they were obtained by repetitions of the same experiment.
For the sake of numerical simplicity, in the subsequent analyses, we set u_0 = 10^{−…} h̄ and use the reduced data 10^{…}(x_i − h̄)/h̄ and uncertainties 10^{…} u_i/h̄, where h̄ = 6.… × 10⁻³⁴ J s is the weighted arithmetic mean of the data.
Table 1.
Values of the Planck constant. The main reference is [13]; when relevant, more specific references are listed in the table. The values determined from the h/m(n), h/m(e), h/m(Cs), h/m(Rb), h/Δm(Si), and h/Δm(S) ratios depend on the same N_A value.

method | laboratory, reference | 10³⁴ h / J s

measurement of the Faraday constant
  h/e | NIST 1980 | 6.…
N_A and γ_lo measurement
  h/m(e) | NIST 1989, IAC 2011 | 6.…
  h/m(e) | NIM 1995, IAC 2011 | 6.…
  h/m(e) | KRISS/VNIIM 1998, IAC 2011 | 6.…
γ_hi measurement
  h/e | NPL 1979 | 6.…
  h/e | NIM 1995 | 6.…
K_J measurement
  h/(2e) | NMI 1989 | 6.…
  h/(2e) | PTB 1991 | 6.…
watt balance
  h/m(K) | NPL 1990 | 6.…
  h/m(K) | NIST 1998 | 6.…
  h/m(K) | NIST 2007 | 6.…
  h/m(K) | METAS 2011 | 6.…
  h/m(K) | NPL 2012 | 6.…
  h/m(K) | NRC 2012 | 6.…
N_A and quotient of h and the neutron mass
  h/m(n) | PTB 1998, IAC 2011 | 6.…
N_A and quotient of h and the mass of the electron or an atom
  h/m(e) | CODATA 2006, IAC 2011 | 6.…
  h/m(Cs) | Stanford 2002, IAC 2011 | 6.…
  h/m(Rb) | LKB 2011, IAC 2011 | 6.…
N_A and quotient of h and nuclear mass defects
  h/Δm(Si) | MIT/ILL 2005, IAC 2011 | 6.…
  h/Δm(S) | MIT/ILL 2005, IAC 2011 | 6.…
NIST: National Institute of Standards and Technology (USA); IAC: International Avogadro Coordination; NIM: National Institute of Metrology (People's Republic of China); NPL: National Physical Laboratory (UK); NMI: National Metrology Institute (Australia); PTB: Physikalisch-Technische Bundesanstalt (Germany); METAS: Federal Office of Metrology (Switzerland); NRC: National Research Council (Canada); Stanford: Stanford University (USA); LKB: Laboratoire Kastler-Brossel (France); MIT: Massachusetts Institute of Technology (USA); ILL: Institut Laue-Langevin (France)
Figure 1.
Values of the Planck constant given in table 1. The label N_A indicates the weighted mean of the values determined from the h/m(e), h/m(Cs), and h/m(Rb) ratios. The label h/Δm indicates the weighted mean of the values determined from the h/Δm(Si) and h/Δm(S) ratios. The reference value is the weighted mean, h̄ = 6.… × 10⁻³⁴ J s. The double error bars (blue and red) indicate the lower and upper bounds to the standard deviation, according to the most probable data model.

H0 hypothesis

When supposing the data sampled from independent Gaussian distributions whose standard deviations are the data uncertainties, the most probable h value is the weighted arithmetic mean. The evidence for this hypothesis is Z_H0 = 9.… × 10^{−…} Λ V. The extremely small numerical value of Z_H0 is due to the fact that it is the normalizing factor of L_H0(h; {x_i})π_h(h). Its dimensions are (J s)^{−N}, and the V = 10^{…(N−1)}/(ΛΞ h̄^{N−1}) coefficient originates from the normalization of the data and uncertainties used in the calculation.

H1 hypothesis
Though none of the data is totally out of scale, with χ² = 85.69 for ν = N − 1 = 16 degrees of freedom and R_B = √(χ²/ν) = 2.31, the data are nevertheless inconsistent. In order to make the data consistent, we supposed that the quoted uncertainties are wrong by a common scale factor λ. The evidence for this hypothesis is shown in Fig. 2; the most probable λ value is the Birge ratio. By multiplying the data uncertainties by R_B, the χ² of the weighted arithmetic mean is 16 and the Birge ratio is 1; this removes the discrepancy between the data spread and the uncertainties.

The evidence for the whole of H1 is Z_H1 = 2.… × 10^{−…} V. Without fixing the extension Λ of the parameter space, we cannot compare the H0 and H1 hypotheses, but the H1(λ = 1) model replicates the H0 hypothesis. Hence, the comparison of the Z_H1(λ = 1) = 9.… × 10^{−…} V evidence against Z_H1(λ = R_B) = 1.… × 10^{−…} V confirms that the odds are severely against the hypothesis of Gaussian sampling having known variance.

Figure 2. Evidences for the σ = λu (top left), σ² = u² + λ²u_0² (top right), u < σ < max(u, λu) (bottom left), and u < σ < max(u, λu_0) (bottom right) hypotheses. The red dots indicate the uncertainties associated with the data.

H2 hypothesis
This hypothesis conjectures a missing uncertainty contribution λu_0 common to all the data. The evidence for this hypothesis is shown in Fig. 2. The most probable λ value is λ̂ = 0.17, with Z_H2(λ̂) = 6.… × 10^{−…} V. The evidence for the whole of H2 is Z_H2 = 8.… × 10^{−…} V. The comparison against the H0 hypothesis can be made by observing that H2(λ = 0) replicates H0 and that Z_H2(λ = 0) = 9.… × 10^{−…} V. As before, the odds are against the hypothesis of Gaussian sampling having known variance.

H3 hypothesis
When the data uncertainties are supposed to be the lower bounds of unknown standard deviations, we compared two hypotheses. In the H3a case, the standard-deviation upper bounds are proportional to the uncertainty associated with each datum, i.e., u_i ≤ σ_i ≤ max(u_i, λu_i). In the H3b case, the same λu_0 value bounds the greatest standard deviation of each datum, i.e., u_i ≤ σ_i ≤ max(u_i, λu_0). The evidences for these hypotheses are shown in Fig. 2. Also in these cases, H3a(λ = 1) and H3b[λ = min(u_i/u_0)] replicate H0.

The H3 hypothesis could be thought pessimistic: the uncertainties u_i set only lower bounds for the standard deviations of the sampling distributions, which are supposed very large, if at all possible. However, as shown in Fig. 2, an infinite standard deviation is not supported by the data and finite upper bounds exist. In the H3a class, the optimal upper bound is 4.… u_i, with Z_H3a(4.… u_i) = 6.… × 10^{−…} V. In the H3b class, the optimal upper bound is 0.… × 10^{−…} h̄, with Z_H3b(0.… × 10^{−…} h̄) = 5.… × 10^{−…} V. This eliminates the worry of a too pessimistic view. The evidences for the whole of these hypothesis classes are given in table 2.

Table 2. Medians h̃ of the Planck constant values calculated according to the marginal distributions Θ_H(h; λ̂) and Θ_H(h), where λ̂ is the most probable value of the model parameter. The reference h̄ value is the weighted mean, h̄ = 6.… × 10⁻³⁴ J s, of the data. The lower and upper bounds are the 16% and 84% quantiles. Z_H is the evidence for the model, which is proportional (through the same proportionality factor) to the probability Prob(H|{x_i}) of being true.

model | λ̂ | (h̃ − h̄)/h̄ from Θ_H(h; λ̂) | (h̃ − h̄)/h̄ from Θ_H(h) | ΞΛ h̄^{N−1} Z_H | Prob(H|{x_i})
H0  | –    | …^{+18}_{−…}  | –             | …×10^{−…} | …
H1  | 2.3  | 0^{+42}_{−42} | 0^{+45}_{−45} | …×10^{−…} | …
H2  | 0.17 | …^{+62}_{−…}  | …^{+66}_{−…}  | …×10^{−…} | …
H3a | …    | …^{+46}_{−…}  | …^{+48}_{−…}  | …×10^{−…} | …
H3b | 0.…  | …^{+61}_{−…}  | …^{+65}_{−…}  | …×10^{−…} | …

H0: σ = u; H1: σ = λu; H2: σ² = u² + λ²u_0²; H3a: u < σ < max(u, λu); H3b: u < σ < max(u, λu_0).

Predictions of the h value were made from both the marginal distributions Θ_H(h; λ̂), where only the most probable λ = λ̂ value was taken into account, and Θ_H(h), where all the possible λ values have been considered. The median h values and 68% confidence intervals are given in table 2 and Fig. 3, together with the evidences for the different models. The weighted arithmetic mean of the data, which is also the median of the marginal distributions based on the H0 and H1 models, has been taken as the reference value.

To choose the best data model, we must compare the probabilities of each of them being true. These probabilities are proportional to the evidences given in table 2.
Of those considered, the hypothesis most favoured by the data is H3b. It postulates that the data uncertainties are lower bounds to the standard deviations of the sampling distributions and that a common upper bound exists for all of them. The most probable upper bound, 0.… × 10^{−…} h̄, is relatively small; only some data have a smaller uncertainty – in Fig. 1 they are indicated by the double error bars. The evidence for hypothesis H2, which postulates that a common contribution is missing in the uncertainty budget, is almost as high as the evidence for H3b. Besides, H2 and H3b differ only in the greater freedom allowed by H3b to the magnitude of the missing contribution.

Birge's hypothesis of a missing scale factor common to all the uncertainties, both in the original and extended versions H1 and H3a, though more supported than the hypothesis of Gaussian sampling having standard deviations equal to the data uncertainties, is not sustained by the data.

At the end of this analysis we have different estimates related to the different hypotheses about the uncertainties associated with the data. The variability of the data model can be eliminated by marginalization. Hence,

Θ(h) = Σ_H Θ_H(h) Prob(H|{x_i}) = Σ_H Θ_H(h) Z_H / Σ_H Z_H.   (25)

The h estimates are given in table 2 and Fig. 3. The posterior density Θ(h), which accounts for all the data models here considered, is shown in Fig. 4.

Figure 3. Predictions of the Planck constant values derived from the marginal distributions Θ_H(h; λ̂) (lower line) and Θ_H(h) (upper line). The last value (All) takes into account the ignorance of the data model by weighing the H1, H2, H3a, and H3b values according to the relevant evidences. The reference h̄ value is the weighted mean, h̄ = 6.… × 10⁻³⁴ J s, of the data. The asymmetric error bars indicate 68% confidence intervals.

Figure 4. Post-data distribution of the Planck constant values given the measurement results listed in table 1. Marginalization has been carried out with respect to all the models H1, H2, H3a, and H3b.
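The model averaging of eq. (25) is a one-liner once the evidences Z_H and the densities Θ_H(h) are tabulated on a common grid; a sketch with purely illustrative numbers (Gaussian stand-ins, not the posteriors of table 2):

```python
import numpy as np

h = np.linspace(-1.0, 1.0, 4001)  # grid of reduced measurand values

def gauss(h, mu, s):
    return np.exp(-0.5 * ((h - mu) / s)**2) / (np.sqrt(2 * np.pi) * s)

# Hypothetical evidences Z_H and post-data densities Theta_H(h)
models = {
    "H1":  (2.1e-7, gauss(h, 0.00, 0.10)),
    "H2":  (8.4e-7, gauss(h, 0.02, 0.15)),
    "H3b": (9.0e-7, gauss(h, 0.03, 0.14)),
}

Z_tot = sum(Z for Z, _ in models.values())
theta = sum(Z * th for Z, th in models.values()) / Z_tot  # eq. (25)
```

theta is again a normalized density; its median and 16%/84% quantiles give the model-averaged estimate and confidence interval.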
7. Conclusions
The probability calculus offers a rigorous solution to the problem of fitting a constant to a set of measured values, no matter whether the data are consistent or not. When we suspect the presence of outliers [9] or data non-conformity [4], the post-data distribution of the measurand values depends on the model supposed for the unknown variance of the data.

Different data models can be considered, and the one most supported by the data can be identified by calculating and comparing the probabilities of each model being true. Once the most probable model is found, a measurand value can be optimally chosen according to the measurand post-data probabilities. The ignorance about the data model can also be taken into account by weighing the measurand estimates according to the model probabilities; this corresponds to marginalizing the post-data distribution of the measurand values over the data models. Our analysis clarifies that a hypothesis cannot be accepted or rejected absolutely, but only relative to the alternatives against which its probability is compared. From this viewpoint, hypothesis testing and model selection are nothing other than Bayesian inference procedures and do not differ from parameter estimation.

Our analysis also proved that, on the assumption that the uncertainties associated with the data are in error by an unknown scale factor, the Birge ratio can be interpreted as the most probable value of that scale factor. In addition, the analysis suggested how the Birge ratio can be generalized to the case when the data are correlated.
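The correlated-data generalization of the Birge ratio mentioned above can be sketched with generalized least squares (hypothetical data and function name; in the uncorrelated case it reduces to R_B = √[χ²/(N − 1)]):

```python
import numpy as np

def birge_ratio(x, C):
    """Most probable scale factor for the covariance matrix C under an
    H1-type model: chi2 = r^T C^{-1} r, with r the residuals about the
    generalized-least-squares estimate of the constant."""
    Cinv = np.linalg.inv(C)
    w = Cinv.sum(axis=1)
    x_bar = (w @ x) / w.sum()  # GLS estimate of the constant
    r = x - x_bar
    chi2 = r @ Cinv @ r
    return np.sqrt(chi2 / (len(x) - 1))

x = np.array([1.00, 1.40, 0.80])  # hypothetical data
u = np.array([0.10, 0.10, 0.20])  # uncorrelated case: C = diag(u^2)
print(birge_ratio(x, np.diag(u**2)))
```

Making the data consistent then amounts to rescaling the covariance matrix to R_B²C.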
Acknowledgments
This work was jointly funded by the European Metrology Research Programme(EMRP) participating countries within the European Association of NationalMetrology Institutes (EURAMET) and the European Union.
References

[1] Luenberger D G 1969 Optimization by Vector Space Methods (New York: John Wiley & Sons)
[2] Birge R T 1929 Probable values of the general physical constants Rev. Mod. Phys. …
[3]–[9] … (J. Res. Natl. Bur. Stand., Meas. Sci. Technol., Metrologia, Metrologia, Metrologia, Meas. Sci. Technol.)
[10] Jaynes E T 2003 Probability Theory: The Logic of Science (Cambridge: Cambridge University Press)
[11] MacKay D J C 2003 Information Theory, Inference, and Learning Algorithms (Cambridge: Cambridge University Press)
[12] Gregory P C 2005 Bayesian Logical Data Analysis for the Physical Sciences (Cambridge: Cambridge University Press)
[13] Mohr P J, Taylor B N and Newell D B 2008 CODATA recommended values of the fundamental physical constants: 2006 Rev. Mod. Phys. …
[14] Mohr P J, Taylor B N and Newell D B 2012 CODATA recommended values of the fundamental physical constants: 2010 Rev. Mod. Phys. submitted; preprint http://arxiv.org/abs/1203.5425; http://physics.nist.gov/cuu/Constants
[15] Bower V E and Davis R S 1980 The electrochemical equivalent of pure silver – a value of the Faraday J. Res. Natl. Bur. Stand. …
[16] … Si crystal for a new kilogram definition Metrologia … S1–S13
[17] Williams E R, Jones G R, Ye S, Liu R, Sasaki H, Olsen P T, Phillips W D and Layer H P 1989 A low field determination of the proton gyromagnetic ratio in water IEEE Trans. Instrum. Meas. …
[18]–[28] … (γ′_p and 2e/h at NIM Acta Metrologia Sinica; J. Korean Phys. Soc.; Metrologia; IEEE Trans. Instrum. Meas.; Phys. Rev. Lett.; Metrologia … L8–L10)
[29] Krueger E, Nistler W and Weirauch W 1998 Determination of the fine-structure constant by a precise measurement of h/m_n: the final result Metrologia …
[30]–[34] … (h/m_n Metrologia; Phys. Scr.; A Preliminary Measurement of the Fine Structure Constant Based on Atom Interferometry … T102; Phys. Rev. Lett.; E = mc² Nature 438 …)