Comparison of methods to extract an asymmetry parameter from data
aa r X i v : . [ phy s i c s . d a t a - a n ] S e p Comparison of methods to extract anasymmetry parameter from data
J¨org Pretz ∗ Physikalisches Institut, Universit¨at Bonn, 53115 Bonn, Germany
Abstract
Several methods to extract an asymmetry parameter in an event distribution func-tion are discussed and compared in terms of statistical precision and applicability.These methods are: simple counting rate asymmetries, event weighting proceduresand the unbinned extended maximum likelihood method. It is known that weight-ing methods reach the same figure of merit (FOM) as the likelihood method in thelimit of vanishing asymmetries. This article presents an improved weighting proce-dure reaching the FOM of the likelihood method for arbitrary asymmetries. Caseswhere the maximum likelihood method is not applicable are also discussed.
Key words: event weighting, minimal variance bound, Cram´er-Rao inequality,asymmetry extraction, optimal observables, parameter determination, maximumlikelihood
PACS:
We consider two differential event distributions n ± ( x ) following the functionalform n ± ( x ) = α ( x )(1 ± β ( x ) A ) . (1)In a typical experimental situation encountered in particle physics α ( x ) in-cludes a flux and acceptance factor and β ( x ) is an analyzing power. Bothdepend on a set of kinematic variables here denoted by x . Concrete examples ∗ corresponding author Email address: [email protected] (J¨org Pretz).
Preprint submitted to Elsevier 29 October 2018 re spin cross section asymmetries and muon decay. In the latter case theasymmetry corresponds to the muon polarization. The two data sets (+ and − ) are for example obtained by changing the sign of a polarization. The goal isto extract the parameter A by measuring the event distributions n ± ( x ). Thiswould be an easy task if both α ( x ) and β ( x ) were known. However in manyapplications the factor α ( x ) is not known, or not accurately enough knownand only β ( x ) is given.Section 2 presents various methods to extract the parameter A : The simplestmethod, based on counting rate asymmetries, the more efficient extended un-binned maximum likelihood (EML) method and finally methods based onevent weighting are discussed. While in Section 2 we assume that the factor α ( x ) is the same for both data sets, Section 3 extends the discussion to thecase where one has two different factors. Up to Section 3 we assume that thenumber of observed events is large enough so that averages can be replaced bythe corresponding expectation values. Effects occurring at low statistics eventsamples are discussed in Section 4. A summary and conclusions are given inSection 5. A A from counting rate asymmetry The expectation value of the number of events for the two data sets reads D N ± E = Z n ± ( x )d x = (1 ± h β i A ) Z α ( x )d x (2)with h β i = R αβ d x/ R α d x . The integrals run over the kinematic range of x .The asymmetry A can be extracted without the knowledge of R α ( x )d x : A = 1 h β i h N + i − h N − ih N + i + h N − i . (3)Eq. (3) leads to following estimator for A :ˆ A cnt = N + + N − P + β i + P − β i N + − N − N + + N − = N + − N − P + β i + P − β i where β i ≡ β ( x i ), N + and N − are the numbers of observed events. The sums P + and P − run over all events in the corresponding data set (+ or − ).2s shown in Section 2.3 and App. A, the figure of merit (FOM), i.e. the inverseof the variance on ˆ A cnt , isFOM ˆ A cnt = N h β i − A h β i (4)where N = N + + N − denotes the total number of events.This figure of merit may be increased if a cut is set to remove some data withlow values of β . However, it will not reach the FOM of the unbinned extendedlikelihood method discussed now, unless β ( x ) is constant. We now turn to the unbinned extended maximum likelihood (EML) method[1,2]which is known to reach the Cram´er-Rao limit of the lowest possible statisticalerror. The log-likelihood function derived from Eq. (1) reads l = X + ln ( α i (1 + β i A )) − h N + i ( A ) + X − ln ( α i (1 − β i A )) − h N − i ( A ) . Using the expression in Eq. (2) for the expectation values h N ± i results in l = X + ln (1 + β i A ) + X − ln (1 − β i A ) − Z α ( x )d x − X + , − ln α i . (5)The last two terms do not depend on A and can be ignored in the likelihoodmaximization. The asymmetry A can thus be determined without knowledgeof α ( x ). For small values of βA one can even derive an analytic expression forˆ A LH which readsˆ A LH = P + β i − P − β i P + β i + P − β i . (6)For arbitrary asymmetries the maximization has to be done numerically. Note,that this requires CPU intensive sums over all events in the maximizationprocedure.The figure of merit (FOM) is given byFOM ˆ A LH = − ∂ l∂A = X + β i (1 + β i A ) + X − β i (1 − β i A ) . ˆ A LH = Z α (1 + βA ) β (1 + βA ) d x + Z α (1 − βA ) β (1 − βA ) d x = Z αβ − β A d x . (7)Noting, that for an arbitrary function f ( x ) the average is defined by h f i = R f ( x )( n + ( x ) + n − ( x ))d x R n + ( x ) + n − ( x )d x = R α ( x ) f ( x )d x R α ( x )d x , the figure of merit can be written asFOM ˆ A LH = N * β − β A + . (8) Next, consider the following estimatorˆ A w = P + w i − P − w i P + w i β i + P − w i β i (9)where w i ≡ w ( x i ) is a, for the moment arbitrary, weight factor assigned toevery event. The expectation value of ˆ A w equals A independently of the weightfunction w ( x ) used. App. A shows thatFOM ˆ A w = N h wβ i h w (1 − A β ) i . (10)Two cases are of interest:1.) Setting w = 1 corresponds to the counting rate asymmetry discussed inSection 2.1 and proves Eq. (4) for the FOM.2.) In the case w = β the FOM isFOM ˆ A w = β = N h β i − A h β ih β i . A comparison with Eq. (8) indicates that FOM ˆ A w = β coincides with the FOMof the likelihood method for vanishing A . Actually, in this case, the two es-4imators are identical as can be seen by comparing Eq. (9) with w ≡ β andEq. (6). Note that the estimator ˆ A w can be applied for arbitrary asymmetriesas well, accepting a decrease of the FOM compared the EML estimator asdiscussed in Section 2.5.Such a weighting procedure has been used for example in Refs. [3,4] to extractspin asymmetries in the case where h β i A ≪
1. In Ref. [5] a weighting methodis discussed to simultaneously extract signal and background asymmetries.The fact that a weighting procedure reaches the same FOM as the EMLmethod was first discussed in Ref. [7] in the context of signal and backgroundextractions. The next section shows that one can find a weight factor reachingthe FOM of the EML method even in the case of non-vanishing asymmetries.
Variational calculus shows (s. App. B) that the maximum FOM is reachedusing a weight factor w = β − β A . (11)Here, A is a first estimate of the asymmetry A obtained for example fromthe weighting method presented in Section 2.3. The weighting factor definedin Eq. (11) leads to the following estimator (the index iw stands for improvedweight)ˆ A iw = P + β i − β i A − P − β i − β i A P + β i − β i A + P − β i − β i A . (12)Eq. (10) givesFOM ˆ A iw = N D β − β A E D β − β A (1 − β A ) E . (13)Thus given a good estimate A ≈ A , we get FOM ˆ A iw = FOM ˆ A LH , i.e. theimproved weighting method reaches the same FOM as the EML method forarbitrary asymmetries as well with the advantage that no CPU consumingmaximization procedure is needed. 5 ounting rate weighting Likelihood Improved weightingasymmetry method method methodFigure of merit N h β i − A h β i h β i − A h β ih β i D β − β A E Table 1The ratio FOM/ N for various methods discussed. Before we move to a comparison of the different methods, we note that theestimatorˆ A = P + β + i − P − β − i P + ( β + i ) + P − ( β − i ) + A with β ± = β ± βA (14)reaches as well the FOM of the EML for A ≈ A , In contrast to ˆ A iw itsexpectation value only equals A if A ≈ A . For τ decays the optimal weightfactor in Eq. (14) is discussed in Ref. [6]. Tab. 1 summarizes the FOM of the various estimators proposed. Fig. 1 showsthe figure of merit of the different estimators vs. A for the choice α ( x ) = const. = 2500 and β ( x ) = x, < x < . (15)The curves are analytic calculations. The points are results of simulations.For each value of the asymmetry 10000 configurations with α = 2500, whichcorresponds on average to 5000 events, were simulated. One configurationconsists of a plus and minus data set used to evaluate an asymmetry. For eachof the 10000 configurations simulated, the asymmetries were calculated usingthe estimators discussed above. The FOM was determined from the RMS ofthe asymmetry distributions.The results are in perfect agreement with the analytic calculations. The sta-tistical errors of the simulations are of the order of the size of the points.Note, that for all methods no bias was found for the asymmetry. The questionof bias and the range of validity of the expressions for the FOM will be dis-cussed in more detail in Section 4. The weighting methods are superior to the6 F O M counting rateweightingimproved weighingLikelihood Fig. 1. Figure of merit for different methods as a function of the asymmetry A . method using simply the counting rates. As expected the improved weightingor the EML method reach a higher FOM than the simple weighting methodthe larger the asymmetry A . The results depend of course on the shape of α ( x ) and β ( x ). The gain in FOM using weighted events compared to countingrates depends on the spread of β ( x ). For A = 0 for example it is h β i / h β i as can be derived from Eq. (10).Fig. 2 shows the influence of the choice of A on the FOM in the improvedweighting method for the factors α and β as given in Eq. (15) and an asym-metry A = 0 .
8. Choosing A in a range 0.7–0.86, one reaches at least 99% ofFOM ˆ A LH . The normal weighting method corresponds to A = 0. We now turn to the case where the acceptance and flux factor α is not thesame in the two data sets. We assume that they differ by a known factor c which is independent of x . In this case the differential event distributions aregiven by n + ( x ) = 2 c c α ( x )(1 + β ( x ) A ) and (16) n − ( x ) = 21 + c α ( x )(1 − β ( x ) A ) . (17)7 A0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 i w A / F O M L H A F O M Fig. 2. FOM ˆ A iw / FOM ˆ A LH for A = 0 . A . The factor 2 / (1+ c ) has been introduced in order to normalize the distributionsto the same number of events for all values of c at A = 0. The log likelihoodfunction reads l = X + ln (cid:18) c c α i (1 + β i A ) (cid:19) − D N + E ( A ) + X − ln (cid:18)
21 + c α i (1 − β i A ) (cid:19) − h N − i ( A )= X + ln (1 + β i A ) + X − ln (1 − β i A )+ X + ln 2 c c α i + X − ln 21 + c α i − Z α d x − A c −
11 + c Z αβ d x . (18)Here the last term cannot be ignored because it contains the parameter A andthus the likelihood method cannot be applied without knowledge of the factor R αβ d x . The weighting method on the other hand can be applied with a smallmodification:ˆ A w,c = P + w i − c P − w i P + w i β i + c P − w i β i . (19)The expectation value of ˆ A w,c equals again A . The figure of merit reads (deriva-8 F O M c=1c=2c=3 Fig. 3. Figure of merit for three different values of c as a function of the asymmetry A , assuming the same number of events in the case A = 0. tion see s. App. C)FOM ˆ A w,c = N c (1 + c ) h wβ i D w (1 − β A ) (cid:16) − βA − c c (cid:17)E . (20)The FOM is shown in Fig. 3 for different values of c for the improved weightingmethod. As in Fig. 1 the lines correspond to an analytic calculation usingEq. (20), the points are results of simulations. Note, that for arbitrary c theweight reaching the highest FOM is w = β (1 − β A ) (cid:16) − βA − c c (cid:17) . The EML method could be used for c = 1, if one uses an estimate for R αβ d x ≈ P + β ( x i ) + c P − β ( x i ) from data. Simulations showed that the FOM of thismodified EML method equals the one of the improved weighting method. In this section we discuss the validity of the equations derived for the variousFOMs and possible biases of the estimators. The formulas were derived us-9 ime+ - + - + - + - + - + - + - + - + - + - + - + - + -smallconfig.asymmetry larger configurationasymmetry
Fig. 4. Combining data in different configurations. The boxes denote the plus andminus data sets. ing usual error propagation and can thus only be approximations which arethe better the higher N . In the simulations presented in Section 2.5 for everyvalue of the asymmetry 10000 configurations were simulated with on average5000 events (corresponding to α = 2500 in Eq. (15)). In each of these config-urations the asymmetry was determined using the various estimators. In thissection we discuss effects occurring if the asymmetries are extracted in smallerconfigurations, as indicated in Fig. 4.For lower number of events in one configuration one reaches a point where dueto statistical fluctuations the estimated asymmetry in a given configurationcan be larger than 1 or smaller than −
1. In this case the EML method is nomore applicable since the term (1 ± βA ) can get negative. For α = 250 thishappens in about 1% of the configurations for an asymmetry of A = 0 . β having atmost one event in a bin. In this case the remaining estimators are identical:ˆ A cnt ≡ ˆ A w = β ≡ ˆ A iw = ± /β i . The sign depends whether the event occurredin the plus or the minus data set. One finds D ˆ A iw E = A and D ˆ A iw E = 1 /β i ,thus the FOM for this event readsFOM i = 1 D ˆ A iw E − D ˆ A iw E = β i − β i A . Note that A is not the estimated asymmetry from a single event but rathertaken from a larger event sample. Combining all the asymmetries determinedon single events leads to P i ± β i · FOM i P i FOM i = P + 1 β i β i − β i A − P − β i β i − β i A P + β i − β i A + P − β i − β i A . (21)10ssuming A = A , Eq. (21) is nothing but the estimator of the improvedweighting method, ˆ A iw , defined in Eq. (12). In other words, for the improvedweighting method it makes no differences whether the data are analyzed inone large configuration or in many small ones. The improved weighting isalso equivalent to using an infinite number of bins in β and evaluating theasymmetries in every bin and then combining the results. The advantage ofthe improved weighting method is that this binning has not to be performed.The observations discussed above are confirmed by simulations. In total 10 configurations with α = 0 .
025 were simulated. The simulated data were ana-lyzed as follows. First the asymmetries were calculated in the approximately5% of the configurations actually containing at least one event. Then theweighted average of these asymmetries is calculated. The same data were an-alyzed in a different way by combining 10 configurations and calculating theasymmetries in these larger configurations corresponding to α = 0 .
25. Thisprocedure was repeated until reaching 10 configurations with α = 2500. Fig. 4illustrates the procedure. These simulations were performed for an asymmetryof 0.8 generating events in the range 0 . < β ( x ) < .
99 .Fig. 5 shows the mean value of the asymmetries and the statistical error forthe various estimators for the different values of α . As expected the estimatorˆ A iw gives the same result independent of α . No bias is observed within thestatistical error which is of the order of 10 − . The asymmetry for the EMLmethod is only shown for α = 2500 since at lower values, as explained above,the EML method is no more applicable.Fig. 6 shows the distribution of the asymmetry ˆ A iw for different values of α .The entries in the histograms in Fig. 6 are weighted by their correspondingFOM. In the case α = 2500 this would not be necessary because all entrieshave essentially the same FOM for a given method since the relative variationof the number of events N and the averages like P ± β i /N entering the FOMvary only very little from configuration to configuration. At lower values of α ,however, this is no more the case. Taking again the extreme case where theasymmetries is calculated from single events, the FOM depends on the valueof β for this event. This explains why the number of entries is smaller than1 in some bins of the histograms. At α = 0 .
025 the number of configurationsis 10 . The corresponding histogram has only approximately 4 . · entriesreflecting the fact that in most of the configuration there is no event.Finally, Fig. 7 shows the FOM/ N calculated from the RMS of the asymmetrydistributions presented in Fig. 6 for different values of α . The three linescorrespond to FOM ˆ A iw /N , FOM ˆ A w = β /N and FOM ˆ A cnt /N calculated using theexpressions given in Tab. 1. For α ≥
25 there is good agreement with theFOM derived in Section 2 since the points coincide with the correspondinglines. At lower values of α the FOM of the weighting and the counting rate11 -2 -1
10 1 10 asy mm e t r y A counting rateweightingimproved weighingLikelihood Fig. 5. Results for the asymmetry of the simulations as a function ofthe average number of events in one configuration. The points are at α = 0 . , . , . , , , α they areslightly displaced on the horizontal axis for better readability. Note, that valuesat different values of α are correlated since the same data were used. method start to increase and finally reach as expected FOM ˆ A iw at α = 0 . We presented several estimators to extract an asymmetry parameter A ina number density function. These estimators were based on counting rates,event weighting and the unbinned extended maximum likelihood method. Aweighting procedure was derived that reaches the same figure of merit as theunbinned maximum likelihood method, known to reach the minimal variancebound. This weighting estimator is given as an analytic expression, whereas inthe EML method the maximization of the likelihood function has to be donenumerically. Moreover this estimator can be used (with a small modification)12 w= A-100 -50 0 50 100 nb . o f e n t r i es
10 =0.025 α Entries = 4.88e+07Mean = 0.7999RMS = 1.3111 β w= A-100 -50 0 50 100 nb . o f e n t r i es -1
10 =0.25 α Entries = 3.93e+07Mean = 0.7999RMS = 1.1776 β w= A-100 -50 0 50 100 nb . o f e n t r i es -2
10 =2.5 α Entries = 9.93e+06Mean = 0.7999RMS = 0.5915 β w= A-1-0.5 0 0.5 1 1.5 2 2.5 3 nb . o f e n t r i es
10 =25 α Entries = 1000000Mean = 0.7999RMS = 0.1877 β w= A0.2 0.4 0.6 0.8 1 1.2 1.4 nb . o f e n t r i es
10 =250 α Entries = 100000Mean = 0.7999RMS = 0.0595 β w= A0.7 0.75 0.8 0.85 0.9 nb . o f e n t r i es
10 =2500 α Entries = 10000Mean = 0.7999RMS = 0.0187
Fig. 6. Distributions of the estimated asymmetries ˆ A iw for different values of α . α -2 -1
10 1 10 F O M / N counting rateweightingimproved weighingLikelihood Fig. 7. Figure of merit per event for different methods as a function of α for anasymmetry A = 0 .
8. The lines show the expectation calculated from the expressionsgiven in Tab. 1.
13n cases where the EML method cannot be applied because of an incompleteknowledge of event distribution function.Acknowledgments: I am grateful to Jean-Marc Le Goff for numerous discus-sions on the subject, verifying the calculations and for carefully reading themanuscript. 14
Figure of merit of ˆ A w To calculate the FOM of ˆ A w defined in Eq. (9), one needs σ ( P w i ), σ ( P w i β i )and cov( P w i , P w i β i ). For two arbitrary quantities f and g the covariancebetween P f i and P g i iscov( X i f i , X j g j ) (A.1)= *X i = j f i g i + X i = j f i g j + − *X i f i + *X j g j + (A.2)= h N i h f g i + h N ( N − i h f i h g i − h N i h f i h g i (A.3)= h N i h f g i + (cid:16)D N E − h N i − h N i (cid:17) h f i h g i . (A.4)If the number of events N is Poisson distributed, i.e. h N i − h N i − h N i = 0,one findscov( X i f i , X j g j ) = h N i h f g i ≈ X i f i g i . (A.5)Setting f = g = w , f = g = wβ and f = w, g = wβ results in σ ( X w i ) = h N i D w E , (A.6) σ ( X w i β i ) = h N i D ( wβ ) E , (A.7)cov( X w i , X w i β i ) = h N i D w β E . (A.8)Simple error propagation in Eq. (9) finally leads to Eq. (10) for the figure ofmerit. B Optimal weight
Denoting the weight factor which maximizes the FOM by w , we considersmall deviation from this optimum by w ( x ) = w ( x ) + ǫ η ( x ) (B.1)where η ( x ) is arbitrary and ǫ ≪
1. 15nserting Eq. (B.1) in Eq. (10) keeping terms of 1st order in ǫ one findsFOM = ( h w β i + ǫ h ηβ i ) h ( w + 2 ǫw η ) (1 − β A ) i . (B.2)The condition ∂ FOM /∂ǫ = 0 gives w = β − β A . C FOM for the case c = 1The error for the estimator defined in Eq. (19) is obtained by simple errorpropagation taking into account the correlations between P wβ and P w . (cid:16) FOM ˆ A w,c (cid:17) − = ~v T C ~v with ~v T = ∂ ˆ A w,c ∂ ( P + w i ) , ∂ ˆ A w,c ∂ ( P − w i ) , ∂ ˆ A w,c ∂ ( P + w i β i ) , ∂ ˆ A w,c ∂ ( P − w i β i ) , ! = 1 P + w i β i + c P − w i β i (1 , − c, − A, − cA ) (C.1)and C = P + w i P + w i β i P − w i P − w i β i P + w i β i P + ( w i β i ) P − w i β i P − ( w i β i ) . (C.2)This leads to Eq. (20). References [1] R. J. Barlow, Statistics, Wiley 1989[2] R. J. Barlow, Nucl. Instrum. Meth. A (1990) 496.
3] D. Adams et al. [Spin Muon Collaboration (SMC)], Phys. Rev. D (1997)5330[4] E. S. Ageev et al. [COMPASS Collaboration], Phys. Lett. B (2005) 154[5] J. Pretz and J. M. Le Goff, Nucl. Instrum. Meth. A (2009) 594[6] M. Davier, L. Duflot, F. Le Diberder and A. Rouge, Phys. Lett. B (1993)411.[7] R. J. Barlow, J. Comput. Phys. (1987) 202.(1987) 202.