Interlaboratory consensus building challenge
G MANA

1. The challenge
The challenge is about an interlaboratory comparison which involved eleven metrology institutes [1]. It comprises four tasks:
• deriving a consensus value from these results;
• evaluating the associated standard uncertainty;
• producing a coverage interval that, with 95% confidence, is believed to include the true value of which the consensus value is an estimate;
• suggesting how the measurement result from NIST may be compared with the consensus value.

2. Input data
The input data are the measured values of the iron-59 activity, $x_i$, which is positive by definition, and the associated uncertainties, $u_i$. No information is given about correlations, the degrees of freedom of the uncertainty estimates, or the range of the possible measurand values. The data distributions that encode the available information without introducing uncontrolled assumptions are independent Gaussians with a common (positive) mean $\mu$ and, to avoid neglecting dark uncertainties, standard deviations $\sigma_i$ greater than or equal to the associated uncertainties.

3. Proposed solution
To explain the data, I considered the following set of random-error models. For some data (maybe none, maybe all) the identity $\sigma_i = u_i$ holds; the others are affected by dark uncertainties. In the first case, $x_i \sim N(x_i|\mu, u_i)$. In the second, $x_i \sim N(x_i|\mu, \sigma_i)$, where $\sigma_i \ge u_i$. The hypothesis space contains as many (mutually exclusive) models as the $2^{11} = 2048$ subsets of the measured values, the empty set and its complement included. Each subset identifies the results whose associated uncertainty is the standard deviation of the sampling distribution.

Since every data model is uncertain, the sought solution must allow for comparisons, so that evidence can be offered that a model explains the results or that it does not. This desideratum requires that the marginal likelihood (also termed evidence) is independent of the chosen distribution parameters (e.g., the mean or standardised mean and the standard deviation or variance). Consequently, it requires that the prior distributions of the different parameterisations are proper and comply with the change-of-variable rule.
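The hypothesis space described above can be sketched in a few lines. This is only an illustration: the function name `enumerate_models` and the use of integer indices in place of laboratory names are assumptions of the sketch, not part of the original analysis.

```python
def enumerate_models(n):
    """Yield every subset of {0, ..., n-1}; each subset lists the 'normal' data,
    i.e. the results for which sigma_i = u_i is assumed to hold."""
    for mask in range(1 << n):
        # bit i of the mask says whether datum i belongs to the subset
        yield frozenset(i for i in range(n) if (mask >> i) & 1)

# eleven laboratories give 2**11 mutually exclusive models
models = list(enumerate_models(11))
print(len(models))
```

The empty subset corresponds to the no-Gaussian-datum model and the full subset to the all-Gaussian-data model.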
Since testable information is not given, Jeffreys' prior, which is proportional to the volume element of the $N(x_i|\mu,\sigma_i)$ manifold equipped with a local Kullback-Leibler metric, can do the work [2]. It is

$$\mu, \sigma_i \sim \pi(\mu,\sigma_i|u_i) = \frac{u_i}{V_\mu \sigma_i^2}, \tag{1}$$

where $u_i \le \sigma_i$, $0 < \mu$, and $V_\mu$ is the "volume" of the $\mu$ subspace. The sampling distribution of $x_i$, given the mean and $u_i$ and with the unknown $\sigma_i$ integrated out, is

$$x \sim L(x|\mu,u) = u \int_u^{+\infty} \frac{N(x|\mu,\sigma)}{\sigma^2}\,\mathrm{d}\sigma = \frac{\left[1 - e^{-\frac{(x-\mu)^2}{2u^2}}\right] u}{\sqrt{2\pi}\,(x-\mu)^2}, \tag{2}$$

where I dropped the $i$ subscript.

The data likelihood, given the model $A$, is

$$\mathbf{x} \sim Q(\mathbf{x}|\mu,\mathbf{u},A) = \prod_{i\in A,\ j\in\bar{A}} N(x_i|\mu,u_i)\,L(x_j|\mu,u_j), \tag{3}$$

where $A$ is a subset of normal data and $\bar{A}$ is its complement. The marginal likelihood and the posterior distribution of the mean are

$$Z(\mathbf{x}|\mathbf{u},A) = \frac{1}{V_\mu}\int_{-\infty}^{+\infty} Q(\mathbf{x}|\mu,\mathbf{u},A)\,\mathrm{d}\mu \tag{4}$$

and

$$\mu \sim P(\mu|\mathbf{x},\mathbf{u},A) = \frac{Q(\mathbf{x}|\mu,\mathbf{u},A)}{V_\mu Z(\mathbf{x}|\mathbf{u},A)}, \tag{5}$$

where the $V_\mu$ support and $\mu$ value are large enough to allow extending the integration to the reals for all practical purposes.

The probabilities of the models $A_i$ (see Fig. 1) are

$$\mathrm{Prob}(A_i|\mathbf{x},\mathbf{u}) = \frac{Z(\mathbf{x}|\mathbf{u},A_i)}{\sum_j Z(\mathbf{x}|\mathbf{u},A_j)}, \tag{6}$$

where, in the absence of additional information, I assumed equiprobable $A_i$, which corresponds to the maximum-entropy prior.

All the information about the measurand is encoded in its posterior probability density (5) averaged over all the models,

$$\mu \sim \sum_i P(\mu|\mathbf{x},\mathbf{u},A_i)\,\mathrm{Prob}(A_i|\mathbf{x},\mathbf{u}). \tag{7}$$

For the sake of simplicity, I picked the most probable model, $A_{\mathrm{mx}}$ (see Figs. 1 and 2). Hence,

$$\mu \sim P(\mu|\mathbf{x},\mathbf{u},A_{\mathrm{mx}}) \tag{8}$$

and

$$Z(\mathbf{x}|\mathbf{u},A_{\mathrm{mx}}) = (61 \times 10^{-\dots}\ \mathrm{kBq}^{-\dots})/V_\mu. \tag{9}$$

Other models are possible to explain the data. Therefore, the evidence (9) of $A_{\mathrm{mx}}$ is given as a courtesy to those who may wish to check the $A_{\mathrm{mx}}$ explanation through ratios of evidence values, without having to redo the calculations.
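Equations (2) to (6) can be checked numerically on a toy data set. The sketch below is illustrative only: the four data values and uncertainties are hypothetical (they are not the comparison results), the function names are my own, the $\mu$ integral of Eq. (4) is truncated to a finite grid, and $V_\mu$ is left out because it cancels in the ratio (6).

```python
import math
from itertools import product

SQRT_2PI = math.sqrt(2.0 * math.pi)

def normal_pdf(x, mu, sigma):
    """Gaussian density N(x | mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (SQRT_2PI * sigma)

def dark_pdf(x, mu, u):
    """Closed form of Eq. (2): sigma >= u integrated out under the prior u / sigma^2."""
    d2 = (x - mu) ** 2
    if d2 == 0.0:
        return 1.0 / (2.0 * SQRT_2PI * u)  # limit of Eq. (2) for x -> mu
    return -math.expm1(-d2 / (2.0 * u * u)) * u / (SQRT_2PI * d2)

def dark_pdf_numeric(x, mu, u, n=2000):
    """Same integral by the midpoint rule after substituting t = u / sigma, 0 < t <= 1."""
    return sum(normal_pdf(x, mu, u / ((k + 0.5) / n)) for k in range(n)) / n

def evidence_times_volume(x, u, is_normal, grid):
    """V_mu * Z of Eq. (4): trapezoidal integral of Q over a uniform mu grid."""
    def q(mu):  # data likelihood, Eq. (3)
        value = 1.0
        for xi, ui, normal in zip(x, u, is_normal):
            value *= normal_pdf(xi, mu, ui) if normal else dark_pdf(xi, mu, ui)
        return value
    values = [q(mu) for mu in grid]
    step = grid[1] - grid[0]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))

def model_probabilities(x, u, grid):
    """Eq. (6) with equiprobable models; V_mu cancels in the ratio."""
    models = list(product([True, False], repeat=len(x)))
    z = [evidence_times_volume(x, u, m, grid) for m in models]
    total = sum(z)
    return {m: zi / total for m, zi in zip(models, z)}

# HYPOTHETICAL data in arbitrary units; the last value is a deliberate outlier.
x = [10.0, 10.2, 9.9, 12.5]
u = [0.2, 0.3, 0.2, 0.2]
grid = [5.0 + 0.01 * k for k in range(1301)]  # mu from 5 to 18
probs = model_probabilities(x, u, grid)
best = max(probs, key=probs.get)  # tuple of flags: True means sigma_i = u_i
print(best, probs[best])
```

Each model is encoded as a tuple of flags, one per datum. With these made-up inputs, the outlier is the datum one would expect the most probable model to flag as affected by dark uncertainty.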
The evidence value (9) makes (8) future-proof, in that competing explanations can be compared with $A_{\mathrm{mx}}$. As an example, the evidence of the no-Gaussian-datum model is $(6.\dots \times 10^{-\dots}\ \mathrm{kBq}^{-\dots})/V_\mu$, whereas that of the all-Gaussian-data one is $(0.\dots \times 10^{-\dots}\ \mathrm{kBq}^{-\dots})/V_\mu$.

Figure 1. Posterior probabilities of the subsets of normal data sorted in decreasing order. The inset shows the first 20 values. The horizontal lines are the posterior probabilities of the no-Gaussian-datum (green) and all-Gaussian-data (red) subsets. [Vertical axis: model probability / %.]

Figure 2. Most probable posterior probability density for the activity of iron-59. $\bar{A}_e(^{59}\mathrm{Fe}) = 14{,}631$ kBq is the arithmetic mean of the data. The dots are the measured values; the lines represent the associated uncertainties. Green: Gaussian data, red: data affected by dark uncertainty. [Horizontal axis: $A_e(^{59}\mathrm{Fe}) - \bar{A}_e(^{59}\mathrm{Fe})$ / kBq; vertical axis: probability density. Laboratory labels: PTB, NIST, NPL, ANSTO, LNE, BKFH, IAEA, CMI, NMIJ, BARC, KRISS.]

The choice of a consensus value is a matter of decision theory. The posterior mean, mode, and median are all equal to 14,620 kBq. The posterior standard deviation is 16 kBq. The coverage interval that, with 95% confidence, is believed to include the true activity is [14,… […] the $A_i$ are mutually exclusive, this probability (see Fig. 3) is

$$\mathrm{Prob}(\sigma_k = u_k) = p_k = \sum_{i,\ x_k \in A_i} \mathrm{Prob}(A_i|\mathbf{x},\mathbf{u}). \tag{10}$$

Figure 3. Posterior probability that the standard deviation of the measurement result is equal to the associated uncertainty. [Vertical axis: probability of $\sigma_i = u_i$ / %. Laboratory labels: BKFH, IAEA/RCC, PTB, NIST, NPL, ANSTO, CMI-IIR, LNE-LNHB, NMIJ, BARC, KRISS.]
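The sum in Eq. (10) amounts to adding up the probabilities of every model whose subset of normal data contains the $k$-th result. A minimal sketch follows; the three-datum probability table is hypothetical and the function name is my own.

```python
def prob_sigma_equals_u(model_probs, k):
    """Eq. (10): total probability of the models whose normal subset contains datum k."""
    return sum(p for subset, p in model_probs.items() if k in subset)

# HYPOTHETICAL posterior probabilities of the 2**3 = 8 models for three data;
# each key is the subset of data for which sigma_i = u_i.
model_probs = {
    frozenset(): 0.05,
    frozenset({0}): 0.05,
    frozenset({1}): 0.10,
    frozenset({2}): 0.05,
    frozenset({0, 1}): 0.40,
    frozenset({0, 2}): 0.05,
    frozenset({1, 2}): 0.10,
    frozenset({0, 1, 2}): 0.20,
}
p = [prob_sigma_equals_u(model_probs, k) for k in range(3)]
print(p)
```

Because the models are mutually exclusive, the probabilities simply add; the complement $1 - p_k$ is the probability that datum $k$ is affected by dark uncertainty.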
Figure 4. Predictive sampling distribution of future NIST measurement results, given the data explanation $A_{\mathrm{mx}}$. The filled area is the 95% confidence interval. The dot is the NIST measured value; the bar is the associated uncertainty. [Horizontal axis: $\xi_{\mathrm{NIST}} - \bar{A}_e(^{59}\mathrm{Fe})$ / kBq; vertical axis: probability density.]

For instance, the probability that the standard deviation of the NIST measurement result is equal to the associated uncertainty is 66%.

The $k$-th measurement result may also be compared with the consensus value (which is an estimate of the measurand value) via the predictive sampling distribution (given the $A_{\mathrm{mx}}$ model, see Fig. 4)

$$\xi\,|\,\mathbf{x},\mathbf{u},A_{\mathrm{mx}} \sim \int_{-\infty}^{+\infty} \left[ p_k N(\xi|\mu,u_k) + (1-p_k) L(\xi|\mu,u_k) \right] P(\mu|\mathbf{x},\mathbf{u},A_{\mathrm{mx}})\,\mathrm{d}\mu, \tag{11}$$

where $\xi$ is a possible result of a measurement by laboratory $k$.

For example, the coverage interval that, with 95% confidence, is believed to include the future NIST measurement results is from 14,293 kBq to 14,946 kBq. This interval is nearly three times that expected from

$$\xi \sim N(\xi|\mu,u_{\mathrm{NIST}}), \tag{12}$$

that is, 653 kBq vs. 235 kBq. The difference is due to the residual 34% probability of dark uncertainty, which is inferred from the assumed models and comparison results.
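The widening caused by the dark-uncertainty component of Eq. (11) can be illustrated with a rough numerical sketch. Several inputs are assumptions rather than values taken from the analysis: the posterior of $\mu$ is replaced by a Gaussian with the reported mean (14,620 kBq) and standard deviation (16 kBq), and the NIST uncertainty is set to 60 kBq, a value back-derived here from the 235 kBq Gaussian interval (235 / (2 × 1.96) ≈ 60). Only $p_k = 0.66$ is quoted directly from the text. Function names are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def normal_pdf(x, mu, sigma):
    """Gaussian density N(x | mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (SQRT_2PI * sigma)

def dark_pdf(x, mu, u):
    """Eq. (2): Gaussian with sigma >= u integrated out under the prior u / sigma^2."""
    d2 = (x - mu) ** 2
    if d2 == 0.0:
        return 1.0 / (2.0 * SQRT_2PI * u)
    return -math.expm1(-d2 / (2.0 * u * u)) * u / (SQRT_2PI * d2)

def predictive_pdf(xi, p, u, nodes, weights):
    """Eq. (11) with the mu integral replaced by a weighted sum over nodes."""
    return sum(w * (p * normal_pdf(xi, mu, u) + (1.0 - p) * dark_pdf(xi, mu, u))
               for mu, w in zip(nodes, weights))

def central_width(p, u, post_mean, post_sd, level=0.95, step=1.0):
    """Width of the symmetric interval around post_mean that holds `level` of Eq. (11)."""
    # ASSUMPTION: Gaussian stand-in for P(mu | x, u, A_mx), discretised on +/- 6 sd.
    nodes = [post_mean + post_sd * (-6.0 + 0.25 * k) for k in range(49)]
    raw = [normal_pdf(mu, post_mean, post_sd) for mu in nodes]
    total = sum(raw)
    weights = [r / total for r in raw]
    half, mass = 0.0, 0.0
    prev = predictive_pdf(post_mean, p, u, nodes, weights)
    while mass < level:
        half += step
        cur = predictive_pdf(post_mean + half, p, u, nodes, weights)
        mass += step * (prev + cur)  # trapezoid on one side, doubled by symmetry
        prev = cur
    return 2.0 * half

U_NIST = 60.0  # kBq, ASSUMPTION back-derived from the 235 kBq Gaussian interval
width_mixture = central_width(0.66, U_NIST, 14620.0, 16.0)  # p_k = 66% from the text
width_gauss = central_width(1.0, U_NIST, 14620.0, 16.0)     # no dark uncertainty
print(width_mixture, width_gauss)
```

Under these stand-in inputs the mixture interval comes out markedly wider than the purely Gaussian one, because the $1/(\xi-\mu)^2$ tails of $L$ carry most of the probability outside the central region. The exact figures depend on the assumed posterior and on `U_NIST`, so the sketch should not be read as a reproduction of the reported interval.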
References
[1] Possolo A 2020 Anal. Bioanal. Chem.
[2] Bayesian Inference: Data Evaluation and Decisions (Springer International Publishing)