The Proton Radius from Bayesian Inference
Krzysztof M. Graczyk∗ and Cezary Juszczak
Institute for Theoretical Physics, University of Wrocław, pl. M. Borna 9, 50-204 Wrocław, Poland
The methods of Bayesian statistics are used to extract the value of the proton radius from the elastic ep scattering data in a model-independent way. To achieve that goal, a large number of parametrizations (equivalent to neural network schemes) are considered and ranked by their conditional probability P(parametrization | data) instead of the minimal-error criterion. As a result, the most probable proton radii (r_E^p = 0.899 ± 0.003 fm, r_M^p = 0.879 ± 0.007 fm) are obtained, and the systematic error due to the freedom in the choice of parametrization is estimated. Correcting the data for the two-photon exchange effect leads to a smaller difference between the extracted values of r_E^p and r_M^p. The results disagree with recent muonic atom measurements.

PACS numbers: 13.40.Gp, 25.30.Bf, 14.20.Dh, 84.35.+i
Keywords: proton radius, proton form factors, two-photon exchange correction
The problem of the proton radius has been discussed in particle and atomic physics for many years, but recently it has received even more attention due to the new results coming from the Lamb shift measurements of the muonic hydrogen atom (μp) [1]. They provide a very accurate estimate of the proton radius, one that is, however, inconsistent with the CODATA value (a compilation of the measurements in the hydrogen atom and analyses of the electron-proton scattering data) [2]. This disagreement, called "the proton radius puzzle", has not been explained yet. There are many proposals to resolve the problem, including considering different types of interaction for ep and μp [3] or "weakening the assumptions of perturbative formulation of quantum electrodynamics within the proton" [4]. In this paper we introduce a novel approach to the extraction of the proton radius from the ep scattering data, which allows one to control the model dependence of the result within an objective Bayesian algorithm.

The electromagnetic (E-M) structure of the proton is encoded in the electric G_E and magnetic G_M form factors, which are also a crucial input for the atomic (hydrogen-like) calculations [5]. In the low-Q² limit in the Breit frame they are related to the distributions of the electric charge and the magnetization inside the nucleon [6]. In particular, the nucleon radius is expressed by the slope of the form factors at vanishing Q²:

    (r_{E,M}^p)² ≡ −(6/G_{E,M}(0)) dG_{E,M}(Q²)/dQ² |_{Q²=0}.    (1)

The value of r_{E,M}^p can be obtained from the scattering data (mainly elastic ep). Recent results include [7-11], and a more complete list can be found in Refs. [3, 12, 13]. These analyses have been performed in the spirit of frequentist statistics.

∗ Electronic address: [email protected]
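As a quick numerical cross-check of Eq. (1) (an illustration only, not part of the paper's analysis; the helper names and the finite-difference step are our own choices), the slope of the standard dipole form factor reproduces the familiar dipole radius of about 0.811 fm:

```python
import numpy as np

HBARC = 0.19733  # hbar*c in GeV*fm, converts GeV^-1 to fm

def G_dipole(Q2, MV2=0.71):
    """Standard dipole form factor; Q2 and MV2 in GeV^2."""
    return (1.0 + Q2 / MV2) ** -2

def radius_from_slope(G, eps=1e-7):
    """Eq. (1): r^2 = -(6/G(0)) * dG/dQ^2 at Q^2 = 0 (finite difference)."""
    slope = (G(eps) - G(0.0)) / eps   # dG/dQ^2 in GeV^-2
    r2 = -6.0 * slope / G(0.0)        # r^2 in GeV^-2
    return HBARC * np.sqrt(r2)        # radius in fm

# For the dipole, r = sqrt(12/MV2)*hbar*c ≈ 0.811 fm
print(round(radius_from_slope(G_dipole), 3))
```

The same finite-difference recipe applies to any smooth form factor parametrization, which is how the slope definition connects to the fits discussed below.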
FIG. 1: (Color online) The scheme of the neural network (with H = 4 hidden units) used in the analysis; for reference see Eq. (6). Every line (connection) corresponds to one parameter w_i. Gray circles denote the sigmoid functions, open circles are constants.

The values of the proton radius (related to the electric charge distribution) obtained by different authors during the last fifty years range from about 0.8 fm to about 0.9 fm [14]. Indeed, as was demonstrated in [8, 9], the results depend on the choice of the form factor parametrization, but the collection of data sets used in the analysis obviously also matters. Even the right choice of the number of parameters within a given class of parametrizations can be a challenge. More parameters in the fit can always reduce the χ² error, but at some point the fit starts to reproduce statistical fluctuations of the experimental measurements. Such a model overfits the data, i.e. it describes the existing points with unrealistic precision but has no predictive power: introducing new data points drastically increases the χ² error and spoils the quality of the fit. This is connected with the so-called bias-variance trade-off. An attempt to resolve this problem was made in Ref. [9], where a conformal mapping was used in order to exploit the analytic properties of the form factors.

Our philosophy is different. In contrast to the frequentist statistical methods used in the above papers, we use Bayesian statistics, which is well suited to model comparison. Within this framework, adapted for neural networks, we select the optimal model (according to the data) and extract the value of the proton radius.

The extraction of the nucleon form factors in [15] was our first use of the Bayesian neural network framework. In the next paper [16] the approach was significantly improved to obtain the two-photon exchange (TPE) correction to the unpolarized elastic ep cross section. However, because of some physical assumptions, those analyses were not dedicated to the low-Q² region. In the present paper we overcome this difficulty because: (i) the TPE contribution (elastic and inelastic, described by box diagrams with the nucleon and the Δ(1232) resonance as intermediate hadronic states) calculated in [17] is subtracted from the data, and only the E-M form factors are extracted from the cross section and polarization transfer data; (ii) the form factors are re-parametrized so as to automatically fulfil G_E(0) = G_M(0)/μ_p = 1 (μ_p is the proton magnetic moment); (iii) our software has been significantly improved in terms of accuracy and efficiency.

FIG. 2: (Color online) Logarithm of the evidence for the best neural networks in each scheme. Solid/dashed lines correspond to the analyses of the data corrected/not corrected for the TPE effect. Points mark the best models for each analysis.

Here we briefly outline the Bayesian framework; for more details see [15-17]. The main idea is to consider a large class of models. By a model we mean the function N used to fit the data D together with two conditional probabilities: the prior P({w_i}|N), which accommodates the initial assumptions about the {w_i}, and the likelihood P(D|{w_i},N); {w_i} is the set of parameters of the function N. The models are then ranked by the conditional probability P(N|D), which can be replaced by the evidence P(D|N) if the data are the same and no model is preferred at the beginning of the analysis.

For any given model we denote by {w_i}_MP the configuration of the parameters which maximizes the posterior

    P({w_i}|D,N) = P(D|{w_i},N) P({w_i}|N) / P(D|N).    (2)

Finding {w_i}_MP is one of the steps of the analysis. Usually the posterior is sharply peaked at {w_i}_MP, and the evidence simplifies to the likelihood at the maximum multiplied by the Occam factor, which makes too complex models less likely [18]:

    P(D|N) ≈ P(D|{w_i}_MP, N) (2π)^{W/2} (det A)^{−1/2},    (3)

where (2π)^{W/2} (det A)^{−1/2} is the Occam factor, W is the number of parameters {w_i}, and A_ij = −∇_{w_i}∇_{w_j} ln P({w_i}|D,N) |_{{w_i}={w_i}_MP}.

FIG. 3: (Color online) The form factor G_E/G_D (G_D = 1/(1 + Q²/0.71 GeV²)²). Red, blue and magenta lines correspond to the best models in the analyses with the data limited by Q² = 1, 3 and 10 GeV², respectively. The areas colored with magenta, blue and red denote the 1σ uncertainty due to the change in the fit parameters (calculated within the Hessian approximation). The gray lines show models which are best for neural networks with a definite number of hidden units, but are not globally best.

For the function N we take feed-forward neural networks with one hidden layer of units (Fig. 1), which correspond to linear combinations of sigmoid functions. This class of functions can approximate any continuous function with arbitrary precision [19].

The prior for the neural network parameters {w_i} has the standard form P({w_i}|α,N) ∼ exp(−α E_w), where E_w = Σ_{i=1}^W w_i². The parameter α is the regularizer, which determines the width of the initial Gaussian distribution of the parameters. In principle it should be treated as another parameter of the model, and its optimal value α_MP is also obtained during the analysis. The likelihood is defined by the χ² distribution

    P(D|{w_i},α,N) ∼ exp(−χ²_cr.(D,{w_i}) − χ²_PT(D,{w_i})).    (4)

The χ²_cr. term includes the unpolarized elastic ep scattering cross section data, which are assumed to be described by the one-photon exchange formula σ_R = (Q²/4M_p²) G_M² + ε G_E² (M_p is the proton mass, ε denotes the virtual-photon polarization parameter). Each independent set of cross section measurements is characterized by its own systematic normalization uncertainty, so we introduce a separate normalization parameter for every data set. The values of these parameters are obtained by an iterative algorithm described in [16].
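The model-ranking machinery above can be illustrated with a toy example. The sketch below is our own construction, not the paper's fit: the data, the prior width α and the noise precision β are invented. It applies the same Laplace-approximated evidence to polynomial models with a Gaussian weight prior, a case in which the Hessian approximation of Eq. (3) is exact, and shows that the evidence peaks near the true model complexity rather than at the lowest χ²:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a quadratic "truth" plus Gaussian noise of known variance.
N = 30
sigma = 0.05
beta = 1.0 / sigma**2               # noise precision
x = np.linspace(-1.0, 1.0, N)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(0.0, sigma, N)

def log_evidence(degree, alpha=1.0):
    """ln P(D|N) for a degree-`degree` polynomial with Gaussian weight prior.

    For models linear in the weights the Laplace (Hessian) approximation
    underlying Eqs. (3) and (5) is exact."""
    Phi = np.vander(x, degree + 1, increasing=True)   # design matrix
    W = Phi.shape[1]                                  # number of parameters
    A = beta * Phi.T @ Phi + alpha * np.eye(W)        # Hessian of S
    w_mp = beta * np.linalg.solve(A, Phi.T @ y)       # most probable weights
    S = 0.5 * beta * np.sum((y - Phi @ w_mp) ** 2) + 0.5 * alpha * w_mp @ w_mp
    return (0.5 * W * np.log(alpha) + 0.5 * N * np.log(beta) - S
            - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * N * np.log(2.0 * np.pi))

scores = {d: log_evidence(d) for d in range(7)}
best = max(scores, key=scores.get)
# chi^2 alone keeps improving with degree; the evidence penalizes the
# extra parameters (Occam factor) and peaks near the true complexity.
print("degree ranked best by evidence:", best)
```

Here β plays the role of the χ² weighting and α of the regularizer; in the paper both the network weights and α_MP are instead optimized iteratively, and the Hessian approximation is only approximate.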
The χ²_PT term includes the form factor ratio μ_p G_E/G_M data from the polarization transfer measurements. We consider the same selection of data as in [16], with one additional data set from [20], and make the prior assumption that all data sets are equally relevant. The MAMI data [7] are not included in the analysis because of the difficulties described in [21].

The optimal configuration {α, w_i}_MP is found by an iterative procedure based on the fact that the maximum of the posterior corresponds to the minimum over {w_i} of the error function S(D,{w_i}) ≡ χ²_cr.(D,{w_i}) + χ²_PT(D,{w_i}) + α_MP E_w, together with the condition ∂P(D|α,N)/∂α |_{α=α_MP} = 0, where

    P(D|α,N) = ∫ dw P(D|{w_i},α,N) P({w_i}|α,N).

Eventually the logarithm of the evidence for the model N (in the Hessian approximation) reads

    ln P(D|N) ≈ −S(D,{w_i}_MP) + (W/2) ln α_MP − (1/2) ln det A − (1/2) ln Σ_{i=1}^W λ_i/(λ_i + α_MP),    (5)

where A_kj = ∇_{w_k}∇_{w_j} S(D,{w_i}) and the λ_k are the eigenvalues of the matrix ∇_{w_i}∇_{w_j}(χ²_cr.(D,{w_i}) + χ²_PT(D,{w_i})).

As the form factors G_E and G_M describe the properties of the same object, they must be related, so we parametrize them with a single feed-forward neural network with one input (Q²) and two outputs, o_E and o_M (see Fig. 1):

    G_E = (1 − Q² o_E) G_D,    G_M = μ_p (1 − Q² o_M) G_D,    (6)

where G_D(Q²) = (1 + Q²/M_V²)^{−2} and M_V² = 0.71 GeV². Then the value of the proton radius is given by

    (r_{E,M}^p)² = (r_dipole^p)² + 6 o_{E,M}(Q² = 0),    (7)

where (r_dipole^p)² = 12/M_V². We see that the network outputs at Q² = 0 directly contribute to the deviation of the proton radius from the dipole form factor value.

In Refs. [14, 22] it was shown that correcting the cross section data for the Coulomb distortion (CD) is important in the extraction of the proton radius. On the other hand, the CD is a part of the TPE effect, which has to be subtracted from the data to obtain form factors which agree with the polarization transfer measurements [26]. (The relevance of the TPE effect and of higher-order Born corrections in the proton radius extraction is also discussed in Refs. [23] and [24, 25], respectively.) Similarly as in [27], we correct the cross section data by the TPE contribution (calculated in [17] in a similar way as in [28, 29]).

TABLE I: The values of the proton radii obtained, with the 1σ uncertainty due to variation in the parameter space. H is the number of hidden units of the best model.

Q² cutoff (GeV²)   r_M^p (fm)        r_E^p (fm)        H
1                  0.879 ± 0.007     0.899 ± 0.003     13
3                  … ± 0.007         … ± 0.003         4
10                 … ± 0.065         … ± 0.005         6

The set of all possible parametrizations is infinite. In our approach the models are grouped according to the number of units in the hidden layer. For each group we find the {w_i, α} which maximize the evidence. The dependence of the maximal evidence on the number of hidden units is plotted in Fig. 2. For every analysis the evidence reaches a peak (which defines the best model) and then starts to fall. Because of this we do not consider networks with more than 40 hidden units. The total number of neural network parametrizations obtained exceeded half a million.

It needs to be underlined that the models which maximize the evidence are not those which minimize the χ². This is true within each class of models and also globally. The minimal χ² decreases with the number of parameters, possibly saturating at some point. But the models with the lowest χ² tend to overfit the data, which is one of the reasons why the minimal χ² is not a suitable criterion for ranking the models. Other problems of χ²-based criteria are discussed in [3]. Objective mathematical methods for model comparison are provided by Bayesian statistics.

Fig. 3 shows how the form factor plots depend on the choice of the parametrization. The gray lines correspond to the best models in each group, while the globally best model for each analysis is shown in color. Certainly the choice of the form factor parametrization also has a strong impact on the obtained proton radius value (Fig. 4). Hence it is crucial to have a criterion for finding the model which is most favored by the measurements.
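The parametrization of Eqs. (6) and (7) is easy to sketch in code. The snippet below is a minimal illustration with placeholder, untrained weights (all names are ours): it builds the one-input, two-output sigmoid network of Fig. 1 and checks that with vanishing network outputs the radii reduce to the dipole value, while G_E(0) = 1 and G_M(0) = μ_p hold automatically:

```python
import numpy as np

HBARC = 0.19733          # hbar*c in GeV*fm
MV2 = 0.71               # dipole mass squared, GeV^2
MU_P = 2.793             # proton magnetic moment

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network_outputs(Q2, Wh, bh, Wo, bo):
    """One-hidden-layer feed-forward net (Fig. 1): one input Q2,
    H sigmoid hidden units, two linear outputs (o_E, o_M)."""
    h = sigmoid(Wh * Q2 + bh)        # hidden activations, shape (H,)
    return Wo @ h + bo               # shape (2,): o_E, o_M

def form_factors(Q2, params):
    """Eq. (6): G_E(0) = 1 and G_M(0) = mu_p hold for any weights."""
    oE, oM = network_outputs(Q2, *params)
    GD = (1.0 + Q2 / MV2) ** -2
    return (1.0 - Q2 * oE) * GD, MU_P * (1.0 - Q2 * oM) * GD

def radii(params):
    """Eq. (7): r^2 = 12/MV2 + 6*o(Q2=0), converted to fm."""
    oE, oM = network_outputs(0.0, *params)
    return tuple(HBARC * np.sqrt(12.0 / MV2 + 6.0 * o) for o in (oE, oM))

# Placeholder (untrained) weights: with all zeros both network outputs
# vanish and the radii reduce to the dipole value of about 0.811 fm.
H = 4
params = (np.zeros(H), np.zeros(H), np.zeros((2, H)), np.zeros(2))
rE, rM = radii(params)
```

In the actual analysis the weights are of course fitted to the data, and many such networks (one per hidden-layer size) are compared through their evidence.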
On the other hand, each model can be true with probability P(N|D) (proportional to the evidence), so it is possible to calculate the expected value of the proton radius and the systematic uncertainty due to the choice of the model. In practice the expected value is very close to the most probable value (see Tables I and II).

In Fig. 4 we show the proton radii obtained from the analyses in which data points with Q² above Q²_cut = 1, 3 and 10 GeV² are rejected (for the data uncorrected for the TPE we considered only Q²_cut = 1 and 3 GeV²). The choice of the cutoff value has a small impact on the extraction of the proton radii. For the result of our analysis we take the values from the lowest cut, namely r_E^p = 0.899 ± 0.003 fm and r_M^p = 0.879 ± 0.007 fm.

TABLE II: The expected values of the proton radii r_M^p and r_E^p for the three Q² cutoffs (1, 3 and 10 GeV²).

On the other hand, the r_E^p values obtained with the minimal-error criterion depend on the cutoff choice. These results are smaller than those indicated by the evidence and have larger uncertainties; see Fig. 4. We leave for future studies within the Bayesian framework the quantitative investigation of the dependence of the results on the data set selection and the Q²_cut value. Eventually, it can be seen in Fig. 4 that using the data uncorrected for the TPE effect leads to larger uncertainties of r_E^p and r_M^p and to a larger difference r_E^p − r_M^p (0.02 fm with TPE and 0.09 fm without).

We would like to emphasize that, to our knowledge, the present work is the first extraction of the proton radius based on Bayesian methods. The extracted value of the proton radius r_E^p agrees with previous results [9, 14, 27] but disagrees with the muonic atom measurements [1]. The subtraction of the TPE correction from the cross section data plays an important role in the analysis.

The calculations have been carried out at the Wrocław Centre for Networking and Supercomputing, grant No. 268.

[1] R. Pohl, A. Antognini, F. Nez, F. D. Amaro, F. Biraben, et al., Nature 466, 213 (2010).
[2] P. J. Mohr, B. N. Taylor, and D. B. Newell, Rev. Mod. Phys. 84, 1527 (2012), arXiv:1203.5425 [physics.atom-ph].
[3] E. Kraus, K. Mesick, A. White, R. Gilman, and S. Strauch, (2014), arXiv:1405.4735 [nucl-ex].
[4] K. Pachucki and K. A. Meissner, (2014), arXiv:1405.6582 [hep-ph].
[5] K. Pachucki, Phys. Rev. A53, 2092 (1996).
[6] F. Ernst, R. Sachs, and K. Wali, Phys. Rev. 119, 1105 (1960).
[7] J. Bernauer et al. (A1 Collaboration), Phys. Rev. Lett. 105, 242001 (2010), arXiv:1007.5076 [nucl-ex].
[8] J. Bernauer et al. (A1 Collaboration), (2013), 10.1103/PhysRevC.90.015206, arXiv:1307.6227 [nucl-ex].
[9] R. J. Hill and G. Paz, Phys. Rev.
D82, 113005 (2010), arXiv:1008.4619 [hep-ph].
[10] I. Lorenz, H.-W. Hammer, and U.-G. Meissner, Eur. Phys. J. A48, 151 (2012), arXiv:1205.6628 [hep-ph].

FIG. 4: (Color online) The proton radii corresponding to the models which are the best within a particular data selection and neural network scheme. The results for the data corrected for the TPE are marked with black △, □ and ○ for Q² = 1, 3 and 10 GeV², respectively. The results for the uncorrected data are marked with blue ∇ and ◇ for Q² = 1 and 3 GeV², respectively. The filled points mark the best result for each case. The three leftmost red points are the best results according to the minimum of the error function; for clarity their x-coordinate has been changed.

[14] I. Sick, Phys. Lett. B576, 62 (2003), arXiv:nucl-ex/0310008 [nucl-ex].
[15] K. M. Graczyk, P. Plonski, and R. Sulej, JHEP 09, 053 (2010), arXiv:1006.0342 [hep-ph].
[16] K. M. Graczyk, Phys. Rev. C84, 034314 (2011), arXiv:1106.1204 [hep-ph].
[17] K. M. Graczyk, Phys. Rev. C88, 065205 (2013), arXiv:1306.5991 [hep-ph].
[18] D. MacKay, Bayesian Methods for Adaptive Models, Ph.D. thesis, California Institute of Technology (1991).
[19] G. Cybenko, Math. Control Signals Systems 2, 303 (1989).
[20] X. Zhan, K. Allada, D. Armstrong, J. Arrington, W. Bertozzi, et al., Phys. Lett. B705, 59 (2011), arXiv:1102.0318 [nucl-ex].
[21] J. Arrington, Phys. Rev. Lett. 107, 119101 (2011), arXiv:1108.3058 [nucl-ex].
[22] R. Rosenfelder, Phys. Lett. B479, 381 (2000), arXiv:nucl-th/9912031 [nucl-th].
[23] M. Gorchtein, (2014), arXiv:1406.1612 [nucl-th].
[24] J. Arrington, J. Phys. G40, 115003 (2013), arXiv:1210.2677 [nucl-ex].
[25] J. Arrington and I. Sick, Phys. Rev. C70, 028203 (2004), arXiv:nucl-ex/0406014 [nucl-ex].
[26] J. Arrington, W. Melnitchouk, and J. Tjon, Phys. Rev. C76, 035205 (2007), arXiv:0707.1861 [nucl-ex].
[27] P. G. Blunden and I. Sick, Phys. Rev. C72, 057601 (2005), arXiv:nucl-th/0508037 [nucl-th].
[28] P. Blunden, W. Melnitchouk, and J. Tjon, Phys. Rev. C72, 034612 (2005), arXiv:nucl-th/0506039 [nucl-th].
[29] S. Kondratyuk, P. Blunden, W. Melnitchouk, and J. Tjon, Phys. Rev. Lett. 95