Combined Neyman–Pearson Chi-square: An Improved Approximation to the Poisson-likelihood Chi-square
Xiangpan Ji∗, Wenqiang Gu, Xin Qian, Hanyu Wei, Chao Zhang∗∗
Physics Department, Brookhaven National Laboratory, Upton, NY, USA
∗ Corresponding author. Email: [email protected]
∗∗ Corresponding author. Email: [email protected]
Abstract
We describe an approximation to the widely used Poisson-likelihood chi-square using a linear combination of Neyman's and Pearson's chi-squares, namely the "combined Neyman–Pearson chi-square" (χ²_CNP). Through analytical derivations and toy model simulations, we show that χ²_CNP leads to a significantly smaller bias on the best-fit model parameters compared to those obtained using either Neyman's or Pearson's chi-square. When the computational cost of using the Poisson-likelihood chi-square is high, χ²_CNP provides a good alternative given its natural connection to the covariance matrix formalism.

Keywords: test statistics, Poisson-likelihood chi-square, Neyman's chi-square, Pearson's chi-square
1. Introduction
In high-energy physics experiments, it is often convenient to bin the data into a histogram with n bins. The number of measured events M_i in each bin typically follows a Poisson distribution with the mean value μ_i(θ) predicted by a set of model parameters θ = (θ_1, ..., θ_N). The likelihood function of this Poisson histogram can be written as:

\[ \mathcal{L}(\mu(\theta); M) = \prod_{i=1}^{n} \frac{e^{-\mu_i}\,\mu_i^{M_i}}{M_i!}. \]  (1)

A maximum-likelihood estimator (MLE) of θ can be constructed by maximizing the likelihood ratio [1, 2]

\[ \lambda(\theta) = \frac{\mathcal{L}(\mu(\theta); M)}{\max_{\mu'} \mathcal{L}(\mu'; M)} = \frac{\mathcal{L}(\mu(\theta); M)}{\mathcal{L}(M; M)}, \]  (2)

where the denominator is a model-independent constant that maximizes the likelihood of the data without any restriction on the model.¹ Maximizing this likelihood ratio is equivalent to minimizing the Poisson-likelihood chi-square function [3, 4]:

\[ \chi^2_{\mathrm{Poisson}} = -2\ln\lambda(\theta) = 2\sum_{i=1}^{n}\left( \mu_i(\theta) - M_i + M_i \ln\frac{M_i}{\mu_i(\theta)} \right). \]  (3)

The MLE is commonly used in high-energy physics, as it is generally an asymptotically unbiased estimator and has the advantage of being consistent and efficient [5].

At large statistics, the Poisson distribution above can be approximated by a normal (or Gaussian) distribution with mean μ_i(θ) and variance σ_i² = μ_i(θ). The likelihood then becomes:

\[ \mathcal{L}_{\mathrm{Gauss}}(\mu(\theta); M) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\,\mu_i(\theta)}} \exp\left( -\frac{(\mu_i(\theta) - M_i)^2}{2\,\mu_i(\theta)} \right). \]  (4)

The Gauss-MLE can be similarly constructed through a likelihood ratio:

\[ \lambda_{\mathrm{Gauss}}(\theta) = \frac{\mathcal{L}_{\mathrm{Gauss}}(\mu(\theta); M)}{\max_{\mu'} \mathcal{L}_{\mathrm{Gauss}}(\mu'; M)}, \]  (5)

where the denominator is the maximum of L_Gauss without any restriction on the model, and can be derived by solving ∂L_Gauss/∂μ'_i = 0. Maximizing λ_Gauss(θ) is equivalent to minimizing the Gauss-likelihood chi-square function

\[ \chi^2_{\mathrm{Gauss}} = -2\ln\lambda_{\mathrm{Gauss}}(\theta) = \sum_{i=1}^{n}\left( \frac{(\mu_i(\theta) - M_i)^2}{\mu_i(\theta)} + \ln\frac{\mu_i(\theta)}{\mu'_i} - \frac{(\mu'_i - M_i)^2}{\mu'_i} \right), \quad \text{with } \mu'_i = \sqrt{1/4 + M_i^2} - 1/2. \]  (6)

While the Gauss-likelihood chi-square is relatively well known (see e.g. [6, 7]),² it is, interestingly, not widely used in high-energy physics experiments. Instead, a direct chi-square test statistic, namely Pearson's chi-square, is constructed through:

\[ \chi^2_{\mathrm{Pearson}} = \sum_{i=1}^{n} \frac{(\mu_i(\theta) - M_i)^2}{\mu_i(\theta)}. \]  (7)

Comparing with Eq. (6), we see that χ²_Pearson consists of only the first term in χ²_Gauss. These two chi-squares become asymptotically equivalent when M_i is large.

In practice, the variance σ_i² is often approximated by the measured value M_i, which is independent of the model parameters. This leads to another popular chi-square test statistic in high-energy physics experiments, namely Neyman's chi-square:

\[ \chi^2_{\mathrm{Neyman}} = \sum_{i=1}^{n} \frac{(\mu_i(\theta) - M_i)^2}{M_i}. \]  (8)

¹ While the estimation of the model parameters θ does not depend on the denominator of the likelihood ratio, a chi-square test statistic constructed in this way, such as that in Eq. (3), can be used to examine the data-model compatibility with a goodness-of-fit test.
² We further provide some relevant formulas for the Gauss-likelihood chi-square in Appendix D.
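As a concrete illustration of Eqs. (3), (7), and (8), here is a minimal Python sketch (our own addition, not part of the original text; all function names are ours) that evaluates the three chi-squares for a binned histogram:

```python
import numpy as np

def chi2_poisson(mu, M):
    """Poisson-likelihood chi-square, Eq. (3); the M*ln(M/mu) term is zero when M = 0."""
    mu, M = np.asarray(mu, float), np.asarray(M, float)
    log_term = np.where(M > 0, M * np.log(np.where(M > 0, M, 1.0) / mu), 0.0)
    return 2.0 * np.sum(mu - M + log_term)

def chi2_pearson(mu, M):
    """Pearson's chi-square, Eq. (7): the variance is approximated by the prediction mu."""
    mu, M = np.asarray(mu, float), np.asarray(M, float)
    return np.sum((mu - M) ** 2 / mu)

def chi2_neyman(mu, M):
    """Neyman's chi-square, Eq. (8): the variance is approximated by the measurement M."""
    mu, M = np.asarray(mu, float), np.asarray(M, float)
    return np.sum((mu - M) ** 2 / M)

# Example: data fluctuating around a flat prediction of 15 counts per bin.
rng = np.random.default_rng(1)
M = rng.poisson(15.0, size=10)
mu = np.full(10, 15.0)
print(chi2_poisson(mu, M), chi2_pearson(mu, M), chi2_neyman(mu, M))
```

At large counts the three values nearly coincide; the differences that appear at low counts are the origin of the biases discussed below.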
χ²_Neyman and χ²_Pearson are commonly used in physics data analysis, partly because of their close connection to the covariance-matrix formalism:

\[ \chi^2 = (M - \mu(\theta))^T \cdot V^{-1} \cdot (M - \mu(\theta)), \]  (9)

where V_ij = cov[μ_i, μ_j] is the covariance matrix of the prediction, which can often be calculated through Monte Carlo methods based on the statistical and systematic uncertainties of the experiment prior to the minimization of the χ². In situations where many nuisance parameters [5] are required in the likelihood function L, as in Eq. (1), the covariance matrix format of Eq. (9) has the natural advantage of reducing the number of nuisance parameters, and thus leads to a faster minimization of the χ² function.

It is well known, however, that both Neyman's and Pearson's chi-squares lead to biased estimators of the model parameters [8, 9]. One method to remove the bias of the estimator from χ²_Neyman is through an iteration of the weighted least-squares fit, where the variance in one round of χ²_Neyman minimization is replaced by the prediction from the best-fit value in the previous round of the iteration [10, 11, 12]. Several modified chi-square test statistics have also been proposed in the past literature to mitigate the bias issue. For example, χ²_Gauss defined in Eq. (6) is a good replacement for χ²_Pearson when the number of measurements is large. Similarly, the χ²_γ proposed by Mighell [13] is a good alternative to χ²_Neyman when the number of measurements is large. Both χ²_Gauss and χ²_γ, however, still lead to biases when the number of measurements is small. Redin proposed a solution by including a cubic term in χ²_Neyman and χ²_Pearson [14], or by reporting a weighted average of the fitting results from χ²_Neyman and χ²_Pearson [15].

In this paper, we propose a new method based on the construction of a chi-square test statistic (χ²_CNP) from a linear combination of Neyman's and Pearson's chi-squares. As an improved approximation to the Poisson-likelihood chi-square with respect to either Neyman's or Pearson's chi-square, χ²_CNP significantly reduces the bias while keeping the advantage of the covariance matrix formalism. This paper is organized as follows. The construction of χ²_CNP and its covariance matrix format is described in Sec. 2. Three toy examples are presented in Sec. 3 to illustrate the features and advantages of χ²_CNP. Finally, we summarize the recommended usage in the data analysis of counting experiments in Sec. 4.
2. Combined Neyman–Pearson Chi-square (χ²_CNP)

The bias in the estimator of the model parameters θ using Neyman's or Pearson's chi-square can be traced back to their different definitions in approximating the Poisson-likelihood chi-square. To illustrate this, we start with a simple example. A set of n independent counting experiments were performed to measure a common expected value μ. Each experiment measured M_i events. The three chi-square functions in this case are:³

\[ \chi^2_{\mathrm{Poisson}} = 2\sum_{i=1}^{n}\left(\mu - M_i + M_i\ln\frac{M_i}{\mu}\right), \quad \chi^2_{\mathrm{Neyman}} = \sum_{i=1}^{n}\frac{(\mu - M_i)^2}{M_i}, \quad \chi^2_{\mathrm{Pearson}} = \sum_{i=1}^{n}\frac{(\mu - M_i)^2}{\mu}. \]  (10)

μ̂ (the estimator of μ) can be calculated through the minimization of Eq. (10), ∂χ²/∂μ = 0. We obtain:

\[ \hat\mu_{\mathrm{Poisson}} = \frac{\sum_{i=1}^{n} M_i}{n}, \qquad \hat\mu_{\mathrm{Neyman}} = \frac{n}{\sum_{i=1}^{n} 1/M_i}, \qquad \hat\mu_{\mathrm{Pearson}} = \sqrt{\frac{\sum_{i=1}^{n} M_i^2}{n}}. \]  (11)

Given Eq. (11), it is straightforward to show that μ̂_Neyman ≤ μ̂_Poisson ≤ μ̂_Pearson, where the equal sign holds only when all values of M_i are the same. Since μ̂_Poisson is unbiased in this simple example, we see that μ̂_Pearson and μ̂_Neyman are biased in opposite directions.

We further examine the difference in chi-square values. Assuming that M_i and μ are reasonably large so that M_i is close to μ, a Taylor expansion of χ²_Poisson yields:

\[ \chi^2_{\mathrm{Poisson}} = 2\sum_{i=1}^{n}\left(\mu - M_i - M_i\ln\left(1 + \frac{\mu - M_i}{M_i}\right)\right) \approx \sum_{i=1}^{n}\left[\frac{(\mu - M_i)^2}{M_i} - \frac{2}{3}\frac{(\mu - M_i)^3}{M_i^2} + O\!\left(\frac{(\mu - M_i)^4}{M_i^3}\right)\right]. \]  (12)

From Eq. (12), it is straightforward to deduce:

\[ \chi^2_{\mathrm{Pearson}} - \chi^2_{\mathrm{Poisson}} \approx -\frac{1}{3}\sum_{i=1}^{n}\frac{(\mu - M_i)^3}{M_i^2}, \qquad \chi^2_{\mathrm{Neyman}} - \chi^2_{\mathrm{Poisson}} \approx \frac{2}{3}\sum_{i=1}^{n}\frac{(\mu - M_i)^3}{M_i^2}. \]  (13)

Naturally, we can define a new chi-square function as a linear combination of Neyman's and Pearson's chi-squares such that the third-order terms cancel:

\[ \chi^2_{\mathrm{CNP}} \equiv \frac{1}{3}\left(\chi^2_{\mathrm{Neyman}} + 2\chi^2_{\mathrm{Pearson}}\right) = \sum_{i=1}^{n}\frac{(\mu - M_i)^2}{3\Big/\left(\frac{1}{M_i} + \frac{2}{\mu}\right)}, \]  (14)

which approximates χ²_Poisson up to O((μ − M_i)⁴/M_i³), better than either χ²_Neyman or χ²_Pearson alone. In this example, the estimator μ̂ from minimizing χ²_CNP can be derived as:

\[ \hat\mu_{\mathrm{CNP}} = \sqrt[3]{\frac{\sum_{i=1}^{n} M_i^2}{\sum_{i=1}^{n} 1/M_i}} = \sqrt[3]{\hat\mu^2_{\mathrm{Pearson}}\cdot\hat\mu_{\mathrm{Neyman}}}, \]  (15)

which is the geometric mean of two μ̂_Pearson and one μ̂_Neyman. Since the biases of μ̂_Pearson and μ̂_Neyman are in opposite directions, it is easy to see that μ̂_CNP has a reduced bias.

More generally, when model parameters and systematic uncertainties are included, χ²_CNP can be written as:

\[ \chi^2_{\mathrm{CNP}} = \sum_{i=1}^{n}\frac{(\mu_i(\theta, \eta) - M_i)^2}{3\Big/\left(\frac{1}{M_i} + \frac{2}{\mu_i(\theta, \eta)}\right)} + \sum_{m=1}^{K}\frac{\eta_m^2}{\sigma_m^2}, \]  (16)

where θ = {θ_k | k = 1, ..., N} are model parameters, and η = {η_m | m = 1, ..., K} are nuisance parameters representing systematic uncertainties constrained by their corresponding standard deviations σ_m. As an improved approximation to χ²_Poisson, χ²_CNP in Eq. (16) naturally leads to a smaller bias in estimating the model parameters θ, such as the normalization or the shape of the histograms, than using χ²_Neyman or χ²_Pearson.

It is worth noting that in χ²_CNP the variance of the Gaussian distribution for the i-th bin is approximated as 3/(1/M_i + 2/μ_i), while for χ²_Neyman and χ²_Pearson it is M_i and μ_i, respectively.

³ The treatment for bins where M_i = 0 is described in Appendix A.
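The closed-form estimators in Eqs. (11) and (15) make the bias reduction straightforward to verify numerically. The following sketch is our own (the settings μ_true = 15 and n = 10 are chosen to match the toy study of Sec. 3.1):

```python
import numpy as np

rng = np.random.default_rng(42)
mu_true, n, n_toys = 15.0, 10, 100_000

est = {"Poisson": [], "Neyman": [], "Pearson": [], "CNP": []}
for _ in range(n_toys):
    M = rng.poisson(mu_true, size=n).astype(float)
    if np.any(M == 0):   # negligible at mu_true = 15; see Appendix A for the proper treatment
        continue
    est["Poisson"].append(M.mean())                                     # Eq. (11)
    est["Neyman"].append(n / np.sum(1.0 / M))                           # Eq. (11)
    est["Pearson"].append(np.sqrt(np.sum(M**2) / n))                    # Eq. (11)
    est["CNP"].append((np.sum(M**2) / np.sum(1.0 / M)) ** (1.0 / 3.0))  # Eq. (15)

for name, vals in est.items():
    print(f"{name:8s} relative bias = {(np.mean(vals) - mu_true) / mu_true:+.2%}")
```

With these settings the relative biases should come out close to the values reported later in Fig. 1: several percent (with opposite signs) for Neyman and Pearson, and only a few per mille for CNP.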
From this variance we can further deduce the covariance matrix format of χ²_CNP. Following Ref. [16], when μ_i can be approximated as being linearly dependent on the nuisance parameters, μ_i = μ_i⁰ + Σ_{m=1}^{K} η_m s_mi, the chi-square format with pull terms (e.g. Eq. (16)) is equivalent to the chi-square in the covariance matrix format (Eq. (9)). In this case, the covariance matrix V can be written as V_ij = V^stat_ij + V^syst_ij, with

\[ V^{\mathrm{syst}}_{ij} = \sum_{m=1}^{K}\sigma_m^2\, s_{mi}\, s_{mj}. \]  (17)

Therefore, the covariance matrix format of χ²_CNP becomes:

\[ \left(\chi^2_{\mathrm{CNP}}\right)_{\mathrm{cov}} = (M - \mu(\theta))^T\cdot\left(V^{\mathrm{stat}}_{\mathrm{CNP}}(\theta) + V^{\mathrm{syst}}\right)^{-1}\cdot(M - \mu(\theta)), \]  (18)

where

\[ V^{\mathrm{stat}}_{\mathrm{CNP}}(\theta)_{ij} \equiv \frac{3}{1/M_i + 2/\mu_i(\theta)}\,\delta_{ij}. \]  (19)

Note that in Eq. (19) we have approximated μ_i(θ, η) ≈ μ_i(θ) by fixing the nuisance parameters at their externally constrained (i.e. nominal) values. This is necessary because the above derivation requires that the uncertainties be independent of the nuisance parameters η [16].

While the biases of Neyman's and Pearson's chi-squares are well known [4, 8, 9], the construction of χ²_CNP is, interestingly, new. This could be partially explained by the fact that in low-statistics experiments, where the use of χ²_Neyman or χ²_Pearson leads to a large bias, the Poisson-likelihood chi-square is generally used instead. χ²_CNP, however, provides certain advantages in situations where either the number of nuisance parameters is too high, or the likelihood function is analytically difficult to write. In the next section, we demonstrate the features and advantages of χ²_CNP with three toy examples of increasing complexity. Before that, we briefly discuss the expected performance of χ²_CNP regarding two other common properties of a test statistic: the goodness of fit and the interval estimation.

2.1. Goodness of fit

In a goodness-of-fit test, the test statistic (e.g. χ²_CNP) is evaluated at the estimator μ̂ (i.e. the best-fit value of μ). Assuming its distribution follows a chi-square distribution with the corresponding number of degrees of freedom, a p-value can be calculated to evaluate the compatibility between the data and the model. Although χ²_CNP can be used to perform such a test, it does not hold a particular advantage over the preferred choice of χ²_Pearson [6]. As shown in Fig. 3 in Sec. 3.1, the distributions of χ²_Poisson, χ²_Gauss, χ²_Neyman, χ²_Pearson, and χ²_CNP all deviate from the ideal chi-square distribution at low values of μ_true, while χ²_Pearson deviates the least. In addition, the mean of the χ²_Pearson distribution equals the number of degrees of freedom at all μ_true's. Therefore, following Ref. [6], we recommend using χ²_Pearson together with the least-biased estimator μ̂ (from e.g. χ²_Poisson or χ²_CNP) to perform the goodness-of-fit test.
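This recommended recipe (χ²_Pearson evaluated at a less-biased estimator) is compact to implement. A minimal sketch of our own, for the common-expectation example, assuming SciPy is available:

```python
import numpy as np
from scipy.stats import chi2

def gof_pvalue(M):
    """Goodness-of-fit test for a common expectation mu: Pearson's chi-square
    evaluated at the CNP estimator, with ndf = n - 1 for the one fitted parameter."""
    M = np.asarray(M, float)
    mu_hat = (np.sum(M**2) / np.sum(1.0 / M)) ** (1.0 / 3.0)   # Eq. (15)
    chi2_pearson = np.sum((mu_hat - M) ** 2 / mu_hat)          # Eq. (7)
    return chi2.sf(chi2_pearson, df=len(M) - 1)

print(gof_pvalue([14, 18, 12, 16, 15, 13, 17, 19, 11, 15]))
```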
2.2. Interval estimation

It is well known that the construction of confidence intervals in the frequentist approach depends not only on the choice of the test statistic T, but also on the actual procedure. Within the high-energy physics community, there are two popular procedures for setting confidence intervals, which we describe below.

The first procedure is based on Wilks' theorem [17]. The confidence interval is set by placing a certain threshold c on the distribution of ΔT(μ) = T(μ) − T_min, where μ, T(μ), and T_min are the parameter of interest, the test statistic evaluated at μ, and the global minimum of T(μ) over all model parameters, respectively. Under the conditions that i) the two hypotheses are nested, ii) the parameters of the larger hypothesis (e.g. T(μ)) are all uniquely defined in the smaller hypothesis (e.g. T_min) and not on the limits of the allowed region, and iii) the data are asymptotic, Wilks proved that the negative-two-log-likelihood-ratio test statistic ΔT follows a chi-square distribution and the estimator μ̂ follows a normal distribution centered around the true value μ_true. Consequently, the threshold c can be conveniently calculated. For instance, the thresholds c for the 68%, 95%, and 99.7% confidence intervals are 1, 4, and 9, respectively, assuming ΔT follows a chi-square distribution with one degree of freedom. With this procedure, the correctness of the confidence interval coverage depends on the validity of Wilks' theorem. As demonstrated in Eq. (13) and Eq. (14), χ²_CNP is an improved approximation to the negative-two-log-likelihood-ratio of the Poisson distribution (i.e. χ²_Poisson), and it leads to a reduced bias in the estimator μ̂ compared to those from χ²_Neyman and χ²_Pearson. Therefore, the conditions of Wilks' theorem are better met with χ²_CNP, which means the chi-square distribution is a better approximation to the ΔT distribution from χ²_CNP. Fig. 5 in Sec. 3.1 shows one such example. Consequently, we expect a more proper coverage of the confidence interval using χ²_CNP than those using χ²_Neyman or χ²_Pearson under this procedure.

The second procedure is commonly referred to as the Feldman–Cousins approach [18] in the high-energy physics community. In this procedure, the construction of the confidence interval strictly follows a frequentist definition (Neyman construction) with an ordering principle based on the value of the likelihood-ratio test statistic (i.e. ΔT(μ) = T(μ) − T_min with T = χ²_Poisson for counting experiments) to ensure proper frequentist coverage. Sec. 3.1 shows an example of this procedure with a toy experiment. Similarly, the procedure can be defined with an ordering principle based on other ΔT test statistics (e.g. T = χ²_Neyman, χ²_Pearson, or χ²_CNP), and the constructed confidence intervals would also have proper coverage in general. In this case, while all of the coverages are proper, a better test statistic is expected to yield a confidence interval that is smaller in size (or area, or volume). As shown in Table 1 of Sec. 3.1, the confidence interval constructed using χ²_CNP is smaller than those using χ²_Neyman and χ²_Pearson. This is partially caused by the reduced bias in the estimator μ̂ using χ²_CNP, as will be further discussed in Sec. 3.1.

We should note that there are other procedures to set confidence intervals that are less affected by certain properties of the test statistics. For example, since the bias δμ of an estimator μ̂ can be evaluated with a Monte Carlo method, one can define an alternative test statistic ΔT'(μ) = T(μ + δμ) − T_min. Naturally, the confidence interval constructed using ΔT', with either the thresholding approach based on Wilks' theorem or the Feldman–Cousins approach, would be less affected by the bias and perform better than that of ΔT, at the cost of increased computation.
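For the first procedure, a threshold-based interval with the CNP test statistic can be sketched in a few lines (our own illustration; the scan range and grid density are arbitrary choices):

```python
import numpy as np

def cnp_chi2(mu, M):
    """CNP chi-square for a common expectation mu, Eq. (14)."""
    var = 3.0 / (1.0 / M + 2.0 / mu)
    return np.sum((mu - M) ** 2 / var)

def wilks_interval(M, threshold=1.0):
    """68% interval from the Delta-chi2 = 1 rule over a scan of mu."""
    scan = np.linspace(1.0, 60.0, 20_000)
    chi2_scan = np.array([cnp_chi2(mu, M) for mu in scan])
    delta = chi2_scan - chi2_scan.min()
    inside = scan[delta < threshold]
    return inside[0], inside[-1]

M = np.random.default_rng(7).poisson(15.0, size=10).astype(float)
print(wilks_interval(M))
```

The coverage of such an interval is only as good as the chi-square approximation to the ΔT distribution, which motivates the comparisons in Sec. 3.1.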
3. Performance of χ²_CNP

In this section, we compare the performance of χ²_CNP, χ²_Poisson, χ²_Neyman, and χ²_Pearson with three toy examples. While we focus on the issue of bias, we also provide comparison results for the goodness-of-fit test and the interval estimation in the first example, to support the discussion in Sec. 2.1 and Sec. 2.2. For completeness of the discussion, we add χ²_Gauss, which has a similar performance to χ²_CNP in certain scenarios, to the comparison in the first example.

3.1. Example 1: independent measurements of a common expectation

The first example is similar to the one introduced in Sec. 2. In each toy experiment, a set of n independent counting measurements were performed to measure a common expected value μ. The χ² curves of one simulated toy experiment with n = 10 and μ_true = 15 are shown in the left panel of Fig. 1. The location of the minimum of each χ² curve represents the estimator μ̂. It is clear that μ̂_Neyman < μ̂_CNP ≈ μ̂_Poisson ≈ μ̂_Gauss < μ̂_Pearson, and that the CNP chi-square curve closely resembles the Poisson-likelihood chi-square, as demonstrated in the previous section.

The relative biases of μ̂ using χ²_Poisson, χ²_Gauss, χ²_Neyman, χ²_Pearson, and χ²_CNP are shown in the right panel of Fig. 1, obtained with 10 million toy experiments. The bias using χ²_Poisson is zero. The biases using χ²_Neyman and χ²_Pearson have opposite signs, and the magnitude of the mean bias using χ²_Neyman is about twice that using χ²_Pearson. The bias using χ²_CNP is an order of magnitude smaller than those using χ²_Neyman and χ²_Pearson, and the bias using χ²_Gauss is similar to that of χ²_CNP. The variance of μ̂_Neyman is notably larger than those of the other four test statistics, which are similar to each other.

In Fig. 2, we further study the biases of μ̂ for different values of μ_true and different numbers of measurements n. The biases using χ²_Poisson are always zero, as expected from an unbiased estimator in this simple example.
Figure 1: (Left) The χ² curves of the five test statistics χ²_Poisson, χ²_Gauss, χ²_Neyman, χ²_Pearson, and χ²_CNP for one toy experiment with n = 10 and μ_true = 15. (Right) Distributions of the relative difference between μ̂ and μ_true for the five test statistics, using 10 million toy experiments. The second and third columns in the legend show the relative bias in percent and the root-mean-square of the relative bias distribution: Poisson 0, 8.17; Gauss −0.33, 8.28; Neyman −6.37, 8.83; Pearson 2.95, 8.28; CNP −0.29, 8.21.

The biases using χ²_Neyman, χ²_Pearson, and χ²_Gauss become larger as the number of measurements n increases. This behavior may not be intuitive, but it is well known, and a proof is provided in Appendix B. As μ and n increase, the biases of μ̂_Pearson and μ̂_Neyman approach 1/2 and −1, respectively. Besides these observations, the general features of the biases stay the same as discussed previously. Most importantly, χ²_CNP yields a much smaller bias than χ²_Neyman or χ²_Pearson on all occasions.

Figure 2 also shows the performance of χ²_Gauss, which is another way to mitigate the bias issue. Similar to χ²_CNP, χ²_Gauss performs much better than χ²_Neyman or χ²_Pearson. We note that the bias of μ̂_Gauss is less dependent on μ and becomes smaller when n increases. This is expected from the central limit theorem, which states that the sum of a large number of identically distributed random variables follows a normal distribution. Therefore, when the number of measurements is large, χ²_Gauss provides the better performance even when μ is small. On the other hand, when the number of measurements is not large, χ²_CNP shows the better performance.

Next, we compare the performance on the goodness-of-fit test. The left panel of Fig. 3 shows the distributions of the five test statistics evaluated at μ_true = 15 in the n = 10 setting with 10 million toy experiments. The ideal chi-square distribution with 10 degrees of freedom is also shown for comparison. All five test statistics deviate from the ideal chi-square distribution, with χ²_Pearson being the closest and χ²_Neyman deviating the most. The mean of χ²_Pearson is exactly 10, while the mean of χ²_Neyman is the largest. The right panel of Fig. 3 shows the relative deviation of the mean from the number of degrees of freedom (ndf = 10 in all toy experiments) for the five test statistics as a function of μ_true. It is clear that, except for χ²_Pearson, the four other test statistics are not ideal in this metric when μ_true is less than a few tens, with χ²_Neyman being the worst. Ref. [6] provides a good discussion of this behavior.

In practice, μ_true is unknown, and experiments often report χ²_min (evaluated at μ̂) as a metric for the goodness-of-fit test. The left panel of Fig. 4 shows the results of this test for the same setting of n = 10 as in Fig. 3. Note that when χ² is evaluated at μ̂, the number of degrees of freedom is reduced to n − 1 = 9.
Figure 2: The absolute biases of μ̂ as a function of the number of measurements n for μ_true = 15 and μ_true = 150. The left panel shows the biases using χ²_Neyman and χ²_Pearson. The right panel shows the biases using χ²_Gauss and χ²_CNP. Each point is obtained with 10 million toy experiments.

Figure 3: (Left) The distributions of the five test statistics χ²_Poisson, χ²_Gauss, χ²_Neyman, χ²_Pearson, and χ²_CNP evaluated at μ_true = 15 in the n = 10 setting, using 10 million toy experiments. The ideal chi-square distribution with 10 degrees of freedom, χ²(10), is also shown for comparison. The second column in the legend shows the mean of each distribution: Poisson 10.12; Gauss 10.53; Neyman 11.71; Pearson 10.00; CNP 10.57. (Right) The relative deviation of the distribution's mean from the number of degrees of freedom as a function of μ_true for these five test statistics. Each point is obtained with 10 million toy experiments in the n = 10 setting.

As shown in the left panel of Fig. 4, the means of χ²_min for all five test statistics deviate from ndf = 9 when
μ_true is less than ∼10, indicating large deviations from the chi-square distribution in those cases. On the other hand, inspired by Fig. 3, we can use χ²_Pearson to perform the goodness-of-fit test but evaluate it at a μ̂ obtained from a different test statistic. The right panel of Fig. 4 shows the results. We see that when χ²_Pearson is evaluated at a less-biased estimator μ̂ (e.g. μ̂_Poisson, μ̂_Gauss, or μ̂_CNP), it results in a better metric for the goodness-of-fit test, which confirms our recommendation in Sec. 2.1.

Figure 4: (Left) Similar to the right panel of Fig. 3, but with each test statistic evaluated at its own estimator μ̂; correspondingly, the resulting χ² value is at its minimum χ²_min. (Right) Similar to the left panel, but using Pearson's chi-square evaluated at a μ̂ obtained from a different test statistic. Each point is obtained with 10 million toy experiments in the n = 10 setting. The number of degrees of freedom in all cases is n − 1 (ndf = 9).
Figure 5: The distributions of Δχ² for the five test statistics χ²_Poisson, χ²_Gauss, χ²_Neyman, χ²_Pearson, and χ²_CNP with n = 10 and μ_true = 15, using 10 million toy experiments. The ideal chi-square distribution with one degree of freedom, χ²(1), is also shown for comparison.

To compare the performance on the interval estimation, Fig. 5 shows the Δχ² distributions in the n = 10 and μ_true = 15 setting with 10 million toy experiments, where Δχ² = χ²(μ = μ_true) − χ²(μ = μ̂). As discussed in Sec. 2.2, when the conditions of Wilks' theorem [17] are met, it is expected that Δχ² in this example follows the chi-square distribution with one degree of freedom. However, except for χ²_Poisson, the other four test statistics all clearly deviate from the ideal χ²(1) distribution, leading to improper coverage when using the Δχ² = 1 rule to set the 68% confidence intervals. Therefore, we follow the Feldman–Cousins approach [18] to construct the 68% confidence interval instead. First, a scan of μ values is performed. Setting each test μ as the true value, many toy experiments are generated to obtain its Δχ² distribution. Then, from each Δχ² distribution, a critical value Δχ²_c(μ) can be determined such that the region below it contains 68% of the toy experiments. For example, given the distributions shown in Fig. 5, the critical Δχ²_c(15) values for χ²_Gauss, χ²_Neyman, χ²_Pearson, and χ²_CNP are larger than one, which is the result of their biases in μ̂. Finally, returning to the original toy experiments with the μ_true = 15 setting, for each toy experiment we can set its confidence interval by comparing its Δχ² value with the critical Δχ²_c value at each test μ value. The 68% confidence interval is constructed to contain all the test μ values that have Δχ² < Δχ²_c(μ). For each of the 10 million toy experiments, this procedure is repeated to obtain its 68% confidence interval. The reported lower limit μ̂_{−σ} and upper limit μ̂_{+σ} of the 68% confidence interval are the median values over all toy experiments and are tabulated in Table 1. As shown, χ²_CNP and χ²_Gauss have similar (average) interval sizes, both larger than that of χ²_Poisson but considerably smaller than those of χ²_Neyman and χ²_Pearson. There are two reasons for the larger interval sizes of χ²_Neyman and χ²_Pearson. First, μ̂_Neyman has a notably larger variance, as shown in Fig. 1. Second, since μ_true is always contained in the ensemble median of the confidence intervals (but not necessarily near the center) by construction,⁴ the larger biases of μ̂_Pearson and μ̂_Neyman also contribute to their larger interval sizes.
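The Feldman–Cousins construction described above can be condensed into the following sketch (our own simplified implementation: coarse grids and far fewer toy experiments than the 10 million used here, with zero-count bins handled per Appendix A):

```python
import numpy as np

rng = np.random.default_rng(3)
scan = np.linspace(5.0, 40.0, 351)              # grid of test values of mu

def cnp_chi2_scan(M):
    """CNP chi-square of data M on the whole mu grid at once;
    zero-count bins contribute 2*mu, following Appendix A."""
    mu = scan[:, None]
    Msafe = np.where(M > 0, M, 1)               # dummy value, masked on the next line
    var = 3.0 / (1.0 / Msafe[None, :] + 2.0 / mu)
    terms = np.where(M[None, :] > 0, (mu - M[None, :]) ** 2 / var, 2.0 * mu)
    return terms.sum(axis=1)

def delta_chi2_at(M, i):
    """Delta-chi2 of the test value scan[i], using the grid minimum as T_min."""
    c = cnp_chi2_scan(M)
    return c[i] - c.min()

def critical_value(i, n=10, n_toys=500, cl=0.68):
    """Quantile of Delta-chi2 over toys generated with scan[i] as the truth."""
    d = [delta_chi2_at(rng.poisson(scan[i], n), i) for _ in range(n_toys)]
    return np.quantile(d, cl)

M_obs = rng.poisson(15.0, size=10)              # one observed "experiment"
accepted = [scan[i] for i in range(0, len(scan), 10)
            if delta_chi2_at(M_obs, i) < critical_value(i)]
print(f"approximate 68% FC interval: [{min(accepted):.1f}, {max(accepted):.1f}]")
```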
Table 1: Comparison of the median 68% confidence intervals for the five test statistics χ²_Poisson, χ²_Gauss, χ²_Neyman, χ²_Pearson, and χ²_CNP. 10 million toy experiments are generated with the n = 10 and μ_true = 15 setting. For each toy experiment, a 68% confidence interval is obtained using the Feldman–Cousins approach. The reported lower limit μ̂_{−σ} and upper limit μ̂_{+σ} of the 68% confidence interval are the median values over all toy experiments.

  Test statistic   Median 68% confidence interval (μ̂_{−σ}, μ̂_{+σ})   Interval size μ̂_{+σ} − μ̂_{−σ}
  χ²_Poisson       (13.839, 16.226)                                    2.387
  χ²_Gauss         (13.744, 16.221)                                    2.478
  χ²_Neyman        (12.236, 15.706)                                    3.471
  χ²_Pearson       (14.153, 16.800)                                    2.647
  χ²_CNP           (13.745, 16.196)                                    2.451

⁴ In a frequentist definition of the 68% confidence interval (C.I.), if one performs a large number of similar experiments, the interval would contain μ_true in 68% of the cases. This means the lower limit of the 68% C.I. would be lower than μ_true in at least 68% of the experiments; therefore the median of the lower limit of the 68% C.I., μ̂_{−σ}, is always lower than μ_true. Similarly, the median of the upper limit of the 68% C.I., μ̂_{+σ}, is always higher than μ_true.

Next, we show two more examples with increasing complexity, inspired by real experiments. Since χ²_Gauss generally has a similar performance to χ²_CNP and can also benefit from the covariance matrix formalism, we restrict our comparisons of χ²_CNP to χ²_Poisson, χ²_Neyman, and χ²_Pearson. The following studies will focus on the bias of the point estimation of the model parameters, since the performance on the goodness-of-fit test and the interval estimation is similar to the first example.

3.2. Example 2: a segmented detector with many nuisance parameters

In this section, we introduce a more realistic example, which is inspired by the PROSPECT reactor neutrino experiment [19] searching for a light sterile neutrino [20]. One of the unique features of PROSPECT is that the detector consists of many segmented sub-detectors, and the number of events in each sub-detector is not high (∼ a few hundred). Since each sub-detector has a different baseline to the reactor, it is desirable to treat each sub-detector separately in the spectrum fitter to increase the physics sensitivity to the energy- and baseline-dependent oscillation effect caused by a hypothetical light sterile neutrino.

In our toy experiment, we assume 100 sub-detectors, each measuring a common energy spectrum with 16 energy bins. The expected spectrum is assumed to be flat, with an unknown normalization bias factor ε to be determined.⁵ In the i-th bin of the d-th detector, μ_id signal events and b_id background events are expected, and M_id total events are measured. The background shape is also assumed to be flat, and the expected background b_id is assumed to be half of the expected signal μ_id in size. The experiment also measured B_id background events in a signal-off period, which provides an external constraint on the background. For simplicity, we assume the length of the signal-off period is the same as that of the signal-on period.
We consider one systematic uncertainty, the relative normalization uncertainty ε_d among detectors, which is assumed to be constrained to 2%. Therefore, in this example there is one model parameter (ε) and 1700 nuisance parameters (b_id, ε_d) to be estimated.

The Poisson-likelihood chi-square function for this toy experiment can be written as:

\[ \chi^2_{\mathrm{Poisson}} = 2\sum_{d=1}^{100}\sum_{i=1}^{16}\left( \mu_{id}(1 + \varepsilon + \varepsilon_d) + b_{id} - M_{id} + M_{id}\ln\frac{M_{id}}{\mu_{id}(1 + \varepsilon + \varepsilon_d) + b_{id}} \right) + 2\sum_{d=1}^{100}\sum_{i=1}^{16}\left( b_{id} - B_{id} + B_{id}\ln\frac{B_{id}}{b_{id}} \right) + \sum_{d=1}^{100}\left(\frac{\varepsilon_d}{0.02}\right)^2, \]  (20)

and the CNP chi-square:

\[ \chi^2_{\mathrm{CNP}} = \sum_{d=1}^{100}\sum_{i=1}^{16}\frac{\left(\mu_{id}(1 + \varepsilon + \varepsilon_d) + b_{id} - M_{id}\right)^2}{3\Big/\left(\frac{1}{M_{id}} + \frac{2}{\mu_{id}(1 + \varepsilon + \varepsilon_d) + b_{id}}\right)} + \sum_{d=1}^{100}\sum_{i=1}^{16}\frac{\left(b_{id} - B_{id}\right)^2}{3\Big/\left(\frac{1}{B_{id}} + \frac{2}{b_{id}}\right)} + \sum_{d=1}^{100}\left(\frac{\varepsilon_d}{0.02}\right)^2. \]  (21)

χ²_Neyman and χ²_Pearson can be constructed similarly by changing the denominators of the first and second terms in Eq. (21).

The number of nuisance parameters can be effectively reduced by solving ∂χ²/∂b_id = 0. In this simple example, since the nuisance parameters are independent of each other, this equation is linear for χ²_Neyman, quadratic for χ²_Poisson, quartic for χ²_Pearson, and quintic for χ²_CNP. The solutions to these equations can be found either analytically (≤ 4th order) or numerically (> 4th order).

One hundred thousand toy experiments are simulated, assuming the nominal signal μ_id = 30 and background b_id = 15 in each bin. The normalization bias factor ε is fitted for each experiment, where the true value of ε is set to zero. The results of using χ²_Poisson, χ²_Neyman, χ²_Pearson, and χ²_CNP are shown in Fig. 6. Despite being small, the bias of χ²_Poisson is non-zero. This is caused by the introduction of penalty terms in Eq. (20) (see Appendix C for an explanation). One can see that the bias of χ²_CNP is again much smaller than those of χ²_Neyman and χ²_Pearson, representing a much better approximation to χ²_Poisson.

⁵ Appendix E shows an example where the shape of the histogram is also a model parameter.
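For the Poisson-likelihood chi-square of Eq. (20), the profiling condition ∂χ²/∂b_id = 0 reduces, bin by bin, to the quadratic 2b² + (2s − M − B)·b − B·s = 0, where s = μ_id(1 + ε + ε_d) is the predicted signal. A minimal sketch of this step (our own derivation and code; the numerical values are arbitrary):

```python
import numpy as np

def profiled_background(s, M, B):
    """Positive root of 2*b**2 + (2*s - M - B)*b - B*s = 0, i.e. the background
    nuisance parameter that minimizes one bin of the Poisson chi-square, Eq. (20)."""
    p = 2.0 * s - M - B
    return (-p + np.sqrt(p * p + 8.0 * B * s)) / 4.0

# Cross-check against a brute-force scan of the per-bin chi-square.
s, M, B = 31.0, 47.0, 14.0
b = np.linspace(1e-3, 60.0, 200_000)
chi2 = 2 * (s + b - M + M * np.log(M / (s + b))) + 2 * (b - B + B * np.log(B / b))
print(profiled_background(s, M, B), b[np.argmin(chi2)])
```

The same per-bin reduction applies to the other chi-squares, with the polynomial orders listed above.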
Figure 6: Distributions of the best-fit values of the normalization bias factor ε using χ²_Poisson, χ²_Neyman, χ²_Pearson, and χ²_CNP. One hundred thousand toy experiments are simulated. Each experiment has 100 detectors and 16 energy bins. The nominal signal and background in each bin are assumed to be 30 and 15, respectively. The numbers in the legend show the mean of each distribution (in %): Poisson 0.01; Neyman −1.70; Pearson 0.74; CNP −0.01.

3.3. Example 3: covariance matrix formalism

In many physics experiments, a covariance matrix is used to model complicated systematic uncertainties when either a direct nuisance parameter implementation is difficult or there are too many nuisance parameters to minimize. In this section, we show how χ²_CNP can be implemented in a covariance matrix format.

We introduce a slight complication to the previous example, so that analytic or numerical methods to find the best-fit values become prohibitively difficult in the minimization. In this example, we assume the detector response changed between the signal-on and the signal-off periods, so that in order to interpolate the expected background b_id from the signal-off period to the signal-on period, a transfer matrix R is needed such that (b_id)_on = Σ_j R_ijd b_jd. For simplicity, 10 sub-detectors are used in this example, and the transfer matrix R does a simple smearing in energy bins such that for each detector R_ijd = 0.5 when i = j, R_ijd = 0.25 when i = j ± 1, and R_ijd = 0 everywhere else. The χ²_CNP in this example becomes:

\[ \chi^2_{\mathrm{CNP}} = \sum_{d=1}^{10}\sum_{i=1}^{16}\frac{\left(\mu_{id}(1+\varepsilon+\varepsilon_d) + \sum_j R_{ijd}\, b_{jd} - M_{id}\right)^2}{3\Big/\left(\frac{1}{M_{id}} + \frac{2}{\mu_{id}(1+\varepsilon) + \sum_j R_{ijd}\, b_{jd}}\right)} + \sum_{d=1}^{10}\sum_{i=1}^{16}\frac{\left(b_{id} - B_{id}\right)^2}{3\Big/\left(\frac{1}{B_{id}} + \frac{2}{b_{id}}\right)} + \sum_{d=1}^{10}\left(\frac{\varepsilon_d}{0.02}\right)^2. \]  (22)

In this case, solving for the nuisance parameters through ∂χ²/∂b_id = 0 leads to a set of coupled quintic equations, which is difficult to solve either analytically or numerically. Following Sec. 2, the covariance matrix format of Eq. (22) is:

\[ \left(\chi^2_{\mathrm{CNP}}\right)_{\mathrm{cov}} = \left(\mu(1+\varepsilon) + R\cdot b - M\right)^T\cdot\left(V^{\mathrm{stat}}_{\mathrm{CNP}} + V^{\mathrm{syst}}\right)^{-1}\cdot\left(\mu(1+\varepsilon) + R\cdot b - M\right) + (b - B)^T\cdot\left(V^{\mathrm{bkg}}_{\mathrm{CNP}}\right)^{-1}\cdot(b - B), \]  (23)

where M_id, μ_id, b_id, and B_id are ordered into single 160-element vectors M, μ, b, and B, respectively. V^stat_CNP is the covariance matrix corresponding to the statistical uncertainty, which is diagonal with its elements being the corresponding values in the denominator of the first term of Eq. (22). Similarly, V^bkg_CNP is the covariance matrix corresponding to the background statistical uncertainty, with the diagonal elements defined by the denominator of the second term in Eq. (22). V^syst is the covariance matrix corresponding to the systematic uncertainty ε_d, which can be calculated either analytically or from toy Monte Carlo simulations by randomly fluctuating the number of events according to ε_d and its constraint.
Following the same procedure, covariance matrix formats can be constructed for χ²_Neyman and χ²_Pearson by replacing the statistical uncertainty terms in the covariance matrix in Eq. (23), V^stat_CNP and V^bkg_CNP, with their corresponding values in χ²_Neyman and χ²_Pearson. We note that there is no equivalent covariance matrix format for the Poisson-likelihood chi-square. One hundred thousand toy experiments are simulated, assuming the nominal signal μ_id = 30 and background b_id = 15 in each bin. The normalization bias factor ε is fitted for each experiment, where the true value of ε is set to zero. The results are shown in the left panel of Fig. 7. We see that in the covariance format, the bias of (χ²_CNP)_cov is again more than an order of magnitude smaller than those of (χ²_Neyman)_cov and (χ²_Pearson)_cov.

We emphasize that in the (χ²_CNP)_cov defined in Eq. (23), both the free parameter ε and the nuisance parameters b_id need to be minimized. This is due to the nature of the Poisson statistical uncertainty of the background and how it is treated in the CNP chi-square. It is tempting to further reduce the number of nuisance parameters by absorbing them into a fixed covariance matrix. In order to do so, we need to approximate the expected b_id by their measured values B_id. In this case, Eq. (22) and Eq. (23) are replaced by

\[ \chi'^2_{\mathrm{CNP}} = \sum_{d=1}^{10}\sum_{i=1}^{16}\frac{\left(\mu_{id}(1+\varepsilon+\varepsilon_d) + \sum_j R_{ijd}\, b_{jd} - M_{id}\right)^2}{3\Big/\left(\frac{1}{M_{id}} + \frac{2}{\mu_{id}(1+\varepsilon) + \sum_j R_{ijd}\, B_{jd}}\right)} + \sum_{d=1}^{10}\sum_{i=1}^{16}\frac{\left(b_{id} - B_{id}\right)^2}{B_{id}} + \sum_{d=1}^{10}\left(\frac{\varepsilon_d}{0.02}\right)^2 \]  (24)

and

\[ \left(\chi'^2_{\mathrm{CNP}}\right)_{\mathrm{cov}} = \left(\mu(1+\varepsilon) + R\cdot B - M\right)^T\cdot\left(V'^{\mathrm{stat}}_{\mathrm{CNP}} + V'^{\mathrm{bkg}} + V'^{\mathrm{syst}}\right)^{-1}\cdot\left(\mu(1+\varepsilon) + R\cdot B - M\right), \]  (25)

where the nuisance parameters b_id are absorbed into V'^bkg. After these approximations, only the one free parameter ε in (χ'²_CNP)_cov, instead of the 161 fitting parameters in Eq. (23), needs to be minimized, and the computational cost is largely reduced. Similar approximations can be used for (χ'²_Pearson)_cov, and the fitting results are shown in the right panel of Fig. 7. We see that although the approximation leads to a much reduced number of fitting parameters, the bias of the normalization factor ε becomes significantly larger, in particular for the CNP chi-square. It is therefore crucial to indicate clearly how the χ² is defined and what approximations are implied in the construction of the covariance matrix when reporting results.
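A compact sketch of the fully absorbed covariance format of Eq. (25) is given below (our own illustration, not the authors' code; it assumes NumPy/SciPy, uses the 0.5/0.25 smearing matrix from the text, and fixes b = B so that only ε is minimized):

```python
import numpy as np
from scipy.optimize import minimize_scalar

nd, nb = 10, 16                                   # detectors, energy bins
mu0, b0, sig_eps = 30.0, 15.0, 0.02

# Tridiagonal energy-smearing transfer matrix, block-diagonal over detectors.
R1 = 0.5 * np.eye(nb) + 0.25 * (np.eye(nb, k=1) + np.eye(nb, k=-1))
R = np.kron(np.eye(nd), R1)

rng = np.random.default_rng(11)
mu = np.full(nd * nb, mu0)
b_true = np.full(nd * nb, b0)
eps_d = rng.normal(0.0, sig_eps, nd).repeat(nb)   # detector normalizations
M = rng.poisson(mu * (1.0 + eps_d) + R @ b_true)  # signal-on data (true eps = 0)
B = rng.poisson(b_true)                           # signal-off data

# Systematic covariance of the 2% detector normalizations (fully correlated within a detector).
V_syst = np.kron(np.eye(nd), sig_eps**2 * np.full((nb, nb), mu0 * mu0))
# Background statistical covariance: Var(B) = B, propagated through R (b fixed to B).
V_bkg = R @ np.diag(B.astype(float)) @ R.T

def chi2_cov(eps):
    pred = mu * (1.0 + eps) + R @ B               # Eq. (25): b replaced by B
    V_stat = np.diag(3.0 / (1.0 / M + 2.0 / pred))
    r = pred - M
    return r @ np.linalg.solve(V_stat + V_bkg + V_syst, r)

print(minimize_scalar(chi2_cov, bounds=(-0.2, 0.2), method="bounded").x)
```

Keeping b free as in Eq. (23) would simply add the 160 background amplitudes to the minimizer's parameter list, at a correspondingly higher computational cost.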
Figure 7: (Left) Distributions of the best-fit values of the normalization factor ε from (χ²_Neyman)_cov, (χ²_Pearson)_cov, and (χ²_CNP)_cov in the third example, simulated using one hundred thousand toy experiments with 10 sub-detectors and μ_id = 30 and b_id = 15. The numbers in the legend show the mean of each distribution (in %): Neyman −2.72; Pearson 1.29; CNP −0.06. (Right) Similar to the left plot, but after the further approximation that absorbs the background term into the covariance matrix as in Eq. (24) and Eq. (25); legend means (in %): Neyman −2.72; Pearson 2.75; CNP 0.91.
4. Discussions
Through the examples in the previous section, we have compared various chi-square construction methods and different minimization strategies. In the following, we provide some recommendations on when to use them in the data analysis of counting experiments:

• When the computational cost is not a concern (e.g. the number of nuisance parameters is small), a direct minimization of the Poisson-likelihood chi-square (with nuisance parameters implemented through pull terms) should be used.

• When the computational cost of a direct minimization is high, one should first look for analytic or numerical solutions, which can effectively reduce the number of nuisance parameters without making any approximations. For example, the number of nuisance parameters of the Poisson-likelihood chi-square in the example described in Sec. 3.2 can be reduced by solving a set of independent quadratic equations.

• When analytic or numerical solutions are not available, approximations may become necessary to reduce the computational cost. In this case, the covariance matrix formalism is a common tool for reducing the number of nuisance parameters. However, before approximating the Poisson-likelihood chi-square by Neyman's, Pearson's, Gauss-likelihood, or CNP chi-squares, one can examine whether it is sufficient to apply a covariance matrix only to the pull terms of the systematic uncertainties. For example, the rate-plus-shape oscillation fit described in Ref. [21] used a covariance matrix in the pull term for reactor-related uncertainties. In this approach, the statistical part of the chi-square function can still use the Poisson-likelihood format.

• When the Poisson-likelihood chi-square has to be replaced, the iterative approach with the weighted least squares as described in Refs. [10, 11, 12] can be an option to eliminate the bias in the estimator. An alternative approach is the CNP or the Gauss-likelihood chi-square, both of which lead to a much reduced bias in estimating model parameters compared to either Neyman's or Pearson's chi-square. As shown in Fig. 2 of Sec. 3.1, the CNP or the Gauss-likelihood chi-square could be the better choice of test statistic depending on the number of measurements. In addition, improved confidence intervals (smaller in size or with more proper coverage) often accompany the reduced bias, as discussed in Sec. 2.2 and shown in Sec. 3.1. Similarly, analytic or numerical solutions should be explored before applying a covariance matrix approach, since additional approximations are necessary in the latter case. As shown in Sec. 2, the derivation of the covariance matrix formula assumes that i) the variance describing the statistical fluctuations is independent of any nuisance parameters, and ii) the predicted counts have only a linear dependence on the nuisance parameters. For example, the approximation made in the right panel of Fig. 7 leads to a significant bias.

We emphasize that since there are many different ways to make approximations in defining the chi-square test statistics, it is extremely important for experiments to clearly report how their test statistics are constructed.

In summary, we have proposed a linear combination of Neyman's and Pearson's chi-squares, χ²_CNP, as an improved approximation to the widely used Poisson-likelihood chi-square in counting experiments. With three examples, we show that the bias in parameter estimation using the CNP chi-square is much smaller than those using Neyman's or Pearson's chi-square alone.
On occasions where the computational cost of using the Poisson-likelihood chi-square is high, the CNP chi-square with its covariance matrix format provides a good alternative.

Acknowledgments

We thank Maxim Gonchar and Mike Shaevitz for suggesting the comparison of the CNP chi-square with the Gauss-likelihood chi-square. This work is supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics, and Early Career Research Program under contract number DE-SC0012704.
Appendix A. Treatment of bins with zero observed events
Experiments can often have bins with zero counts when the expected signal is small. In this case, the Neyman's chi-square definition, Eq. (8), breaks down, since the measured number of events appears in the denominator; so do the CNP and Gauss-likelihood chi-square definitions. Practical approximations are often made in experiments by either ignoring bins with zero observation or assigning a statistical uncertainty of 1 to zero-count bins (e.g. the "modified Neyman's chi-square" [6]). Here we adopt the Poisson-likelihood chi-square definition for zero-count bins:

\[ \left(\chi^2_{\mathrm{Poisson},\,i}\right)_{M_i=0} = 2\left(\mu_i(\theta) - M_i + M_i\ln\frac{M_i}{\mu_i(\theta)}\right)_{M_i=0} = 2\mu_i(\theta). \]  (A.1)

Eq. (A.1) can be rewritten in a weighted least-squares format:

\[ \left(\chi^2_{\mathrm{Poisson},\,i}\right)_{M_i=0} = 2\mu_i(\theta) = \frac{(\mu_i(\theta) - M_i)^2}{\mu_i(\theta)/2}. \]  (A.2)

Compared with Pearson's chi-square, we see that the effective variance for zero-count bins, μ_i(θ)/2, is half of that of χ²_Pearson. The covariance matrix element corresponding to a zero-count bin follows:

\[ \left(V^{\mathrm{stat}}(\theta)_{ij}\right)_{M_i=0} = \frac{\mu_i(\theta)}{2}\,\delta_{ij}. \]  (A.3)

In this paper, we use Eq. (A.1) and Eq. (A.3) on all occasions when zero-count bins are encountered.
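In code, this prescription amounts to replacing the per-bin variance with μ_i/2 wherever M_i = 0; a minimal sketch of our own for the CNP case:

```python
import numpy as np

def cnp_variance(mu, M):
    """Per-bin variance for the CNP chi-square, with the Poisson-likelihood
    motivated value mu/2 for zero-count bins (Eqs. (A.1)-(A.3))."""
    mu, M = np.asarray(mu, float), np.asarray(M, float)
    Msafe = np.where(M > 0, M, 1.0)               # dummy value, masked on the next line
    return np.where(M > 0, 3.0 / (1.0 / Msafe + 2.0 / mu), mu / 2.0)

M = np.array([0, 2, 5, 0, 7])
mu = np.full(5, 3.0)
print(np.sum((mu - M) ** 2 / cnp_variance(mu, M)))   # zero-count bins contribute 2*mu_i
```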
Appendix B. Bias of the estimators μ̂_Neyman and μ̂_Pearson versus the number of measurements

Here we prove that the biases of μ̂_Neyman and μ̂_Pearson increase as the number of measurements n increases, as shown in Fig. 2. We make use of the relations

\[ \mathrm{Var}(x) = E(x^2) - E^2(x), \qquad E\!\left(\frac{1}{x}\right) \approx \frac{1}{E(x)} + \frac{\mathrm{Var}(x)}{E^3(x)}. \]  (B.1)

For μ̂_Neyman we have:

\[ E\!\left(\frac{1}{\hat\mu_{\mathrm{Neyman}}}\right) = E\!\left(\frac{\sum_{i=1}^{n} 1/M_i}{n}\right) = E\!\left(\frac{1}{M_i}\right) \approx \frac{1}{E(M_i)} + \frac{\mathrm{Var}(M_i)}{E^3(M_i)} = \frac{1}{\mu} + \frac{1}{\mu^2}, \]  (B.2)

where E(M_i) = Var(M_i) = μ, since M_i follows a Poisson distribution. The expected bias then becomes:

\[ E(\hat\mu_{\mathrm{Neyman}} - \mu) = E\!\left(\frac{1}{1/\hat\mu_{\mathrm{Neyman}}}\right) - \mu \approx \frac{1}{\frac{1}{\mu} + \frac{1}{\mu^2}} - \mu + \frac{\mathrm{Var}(1/M_i)/n}{\left(\frac{1}{\mu} + \frac{1}{\mu^2}\right)^3}, \]  (B.3)

which deviates further from zero as n increases.⁶ The bias approaches −1 when n and μ become large. Similarly, for μ̂_Pearson we have:

\[ E(\hat\mu_{\mathrm{Pearson}}) = E\!\left(\sqrt{\frac{\sum_i M_i^2}{n}}\right) = \sqrt{E(M_i^2) - \mathrm{Var}(\hat\mu_{\mathrm{Pearson}})} = \sqrt{\mu^2 + \mu - \mathrm{Var}(\hat\mu_{\mathrm{Pearson}})}, \]  (B.4)

therefore:

\[ E(\hat\mu_{\mathrm{Pearson}} - \mu) = \mu\left(\sqrt{1 + \frac{1}{\mu} - \frac{\mathrm{Var}(\hat\mu_{\mathrm{Pearson}})}{\mu^2}} - 1\right), \]  (B.5)

which also becomes larger at larger n, since the variance of μ̂_Pearson becomes smaller at larger n. The bias approaches 1/2 when n and μ become large.

⁶ Note that for the dependence on μ, Eq. (B.3) is only asymptotically correct when n and μ are large, due to the approximation made in Eq. (B.1). The actual dependence on μ when n → ∞ can only be written as an infinite summation (e.g. E(μ̂_Neyman) = (e^μ − 1)/Σ_{k=1}^{∞} μ^k/(k · k!)). One derivation can be found in Ref. [13].

Appendix C. Bias of χ²_Poisson when pull terms are included

In this appendix, we provide an explanation of the non-zero bias of ε from χ²_Poisson when pull terms are included, for example in Eq. (20). Let us consider a simplified example. One experiment measured m events, which follow a Poisson distribution with mean value μ. There is one systematic uncertainty ε on the normalization of μ, which is constrained with standard deviation σ. Following the maximum-likelihood principle, the Poisson-likelihood chi-square with the constraint on ε is:

\[ \chi^2_{\mathrm{Poisson}} = 2\left(\mu(1+\varepsilon) - m + m\ln\frac{m}{\mu(1+\varepsilon)}\right) + \left(\frac{\varepsilon}{\sigma}\right)^2. \]  (C.1)

The estimator of ε (ε̂) can be derived through the minimization of the chi-square, ∂χ²_Poisson/∂ε = 0:

\[ \hat\varepsilon = \frac{1 + \mu\sigma^2}{2}\left(-1 + \sqrt{1 - \frac{4\sigma^2}{(1+\mu\sigma^2)^2}(\mu - m)}\right). \]  (C.2)

Defining x = 4σ²(μ − m)/(1 + μσ²)² and assuming |x| ≪ 1, we can perform a Taylor expansion of Eq. (C.2) and obtain:

\[ \hat\varepsilon \approx \frac{1 + \mu\sigma^2}{2}\left(-\frac{x}{2} - \frac{x^2}{8} - O(x^3)\right). \]  (C.3)

Ignoring higher-order terms, the expectation of ε̂ is

\[ E(\hat\varepsilon) \approx \frac{1 + \mu\sigma^2}{2}\left(-\frac{E(x)}{2} - \frac{E(x^2)}{8}\right). \]  (C.4)

Given that E(x) is zero and E(x²) is non-zero, we see that in this example ε̂ is a biased estimator; ε̂ becomes unbiased only asymptotically, under large statistics [5].
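Both the growth of the bias with n and the two n → ∞ expressions (the leading-order approximation of Eq. (B.3) and the exact series quoted in the footnote) can be checked numerically. A small sketch of our own, with μ = 15:

```python
import math
import numpy as np

mu, rng = 15.0, np.random.default_rng(5)

# Monte Carlo: the bias of the Neyman estimator grows with n (cf. Fig. 2).
for n in (2, 10, 100):
    est = []
    while len(est) < 20_000:
        M = rng.poisson(mu, n)
        if np.all(M > 0):                 # zero-count bins need Appendix A; skipped here
            est.append(n / np.sum(1.0 / M))
    print(f"n = {n:4d}: Monte Carlo bias = {np.mean(est) - mu:+.3f}")

# n -> infinity limits.
series, term = 0.0, 1.0
for k in range(1, 200):
    term *= mu / k                        # term = mu**k / k!
    series += term / k                    # accumulates the sum of mu**k / (k * k!)
print("leading-order limit, Eq. (B.3):", 1.0 / (1.0 / mu + 1.0 / mu**2) - mu)
print("exact limit from the footnote: ", (math.exp(mu) - 1.0) / series - mu)
```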
Appendix D. Bias and covariance matrix formulas for the Gauss-likelihood chi-square

In this appendix, we provide formulas for the bias of μ̂_Gauss from the Gauss-likelihood chi-square χ²_Gauss, as well as the covariance matrix format of χ²_Gauss. For the simple model described in Sec. 2, μ̂_Gauss can be obtained through the minimization of Eq. (6), ∂χ²_Gauss/∂μ = 0, yielding

\[ \hat\mu_{\mathrm{Gauss}} = \sqrt{\frac{\sum_{i=1}^{n} M_i^2}{n} + \frac{1}{4}} - \frac{1}{2}. \]  (D.1)

Using the covariance matrix formalism, the likelihood function in Eq. (4) becomes:

\[ \mathcal{L}_{\mathrm{Gauss}}(\mu(\theta); M) = \frac{1}{\sqrt{(2\pi)^d\,|V|}}\cdot\exp\left[-\frac{1}{2}(M - \mu(\theta))^T\cdot V^{-1}\cdot(M - \mu(\theta))\right], \]  (D.2)

where d and |V| are the dimension and determinant of the covariance matrix V, respectively. Therefore, we have

\[ \chi^2_{\mathrm{Gauss}} = -2\ln\lambda_{\mathrm{Gauss}}(\theta) = \ln|V| + (M - \mu(\theta))^T\cdot V^{-1}\cdot(M - \mu(\theta)) + C, \]  (D.3)

with C being a model-independent constant, which does not play a role in estimating the model parameters.
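For the same simple model, Eq. (D.1) gives μ̂_Gauss in closed form, which can be compared directly with μ̂_CNP from Eq. (15); a short numerical sketch of our own:

```python
import numpy as np

M = np.random.default_rng(9).poisson(15.0, size=10).astype(float)
mu_gauss = np.sqrt(np.sum(M**2) / len(M) + 0.25) - 0.5      # Eq. (D.1)
mu_cnp = (np.sum(M**2) / np.sum(1.0 / M)) ** (1.0 / 3.0)    # Eq. (15)
print(f"mu_Gauss = {mu_gauss:.3f}, mu_CNP = {mu_cnp:.3f}")
```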
Appendix E. Improvement on model parameters other than normalization

Although in our examples in Sec. 3 only one normalization parameter is considered (i.e. the shape of the histogram is fixed), since the CNP chi-square is a better approximation to the Poisson-likelihood chi-square for counting statistics, we expect the improvement to be general for any binned histogram with a model including one or more parameters. Below we show an example where the shape of the histogram is linear, with the slope (p₁) and the y-intercept (p₀) being two free model parameters in the fit. The example is defined as follows:

\[ n_i = p_0 + p_1 x_i, \]  (E.1)

where n_i is the number of counts in the i-th bin and x_i is the value of the bin center. n_i is assumed to follow a Poisson distribution. Ten bins are considered in this example, and x_i ranges from 0.1 to 1 with a step of 0.1. The true values of p₀ and p₁ are assumed to be 8 and 20, respectively. Ten million toy experiments are generated according to this setting. The distributions of the best-fit values of p₀ and p₁ are shown in Fig. E.8. While the relative bias in p₁ (shape) is generally smaller than that in p₀ (normalization) for a given test statistic, the CNP chi-square yields smaller biases in both parameters, as expected.
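A minimal sketch of the two-parameter CNP fit used in this appendix (our own code; zero-count bins are handled per Appendix A, and the starting point of the minimizer is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

x = np.arange(1, 11) * 0.1                    # bin centers 0.1, 0.2, ..., 1.0
p0_true, p1_true = 8.0, 20.0                  # y-intercept and slope
rng = np.random.default_rng(21)
M = rng.poisson(p0_true + p1_true * x)

def chi2_cnp(p):
    """CNP chi-square for the linear model of Eq. (E.1)."""
    mu = p[0] + p[1] * x
    if np.any(mu <= 0):
        return np.inf                         # keep the prediction physical
    Msafe = np.where(M > 0, M, 1)
    var = np.where(M > 0, 3.0 / (1.0 / Msafe + 2.0 / mu), mu / 2.0)
    return np.sum((mu - M) ** 2 / var)

print(minimize(chi2_cnp, x0=[10.0, 10.0], method="Nelder-Mead").x)
```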
Figure E.8: Distributions of the best-fit values of p₀ (left) and p₁ (right) for the example in Appendix E using χ²_Poisson, χ²_Neyman, χ²_Pearson, and χ²_CNP. The true values of p₀ and p₁ are 8 and 20, respectively. Ten million toy experiments are simulated. The numbers in the legend show the relative biases of the best-fit values (left: Poisson 0, Neyman −9.07%, Pearson 4.34%, CNP −0.51%; right: Poisson 0, Neyman −0.97%, Pearson 0.43%, CNP 0.10%).

References

[1] J. Neyman and E. S. Pearson, On the use and interpretation of certain test criteria for purposes of statistical inference: Part I, Biometrika 20A (1928) 175–240.
[2] J. Neyman and E. S. Pearson, On the problem of the most efficient tests of statistical hypotheses, Philosophical Transactions of the Royal Society of London, Series A 231 (1933) 289–337.
[3] W. Cash, Parameter estimation in astronomy through application of the likelihood ratio, Astrophys. J. 228 (1979) 939–947.
[4] S. Baker and R. D. Cousins, Clarification of the use of chi-square and likelihood functions in fits to histograms, Nucl. Instrum. Meth. 221 (1984) 437–442.
[5] Particle Data Group collaboration, M. Tanabashi et al., Review of particle physics: Chapter 39, Statistics, Phys. Rev. D 98 (2018) 030001.
[6] T. Hauschild and M. Jentschel, Comparison of maximum likelihood estimation and chi-square statistics applied to counting experiments, Nucl. Instrum. Meth. A457 (2001) 384–401.
[7] X. Qian, A. Tan, J. J. Ling, Y. Nakajima and C. Zhang, The Gaussian CLs method for searches of new physics, Nucl. Instrum. Meth. A827 (2016) 63–78, [arXiv:1407.5052].
[8] F. James, Statistical Methods in Experimental Physics, World Scientific, Hackensack, USA (2006).
[9] P. J. Humphrey, W. Liu and D. A. Buote, Chi-square and Poissonian data: biases even in the high-count regime and how to avoid them, Astrophys. J. 693 (2009) 822, [arXiv:0811.2796].
[10] J. A. Nelder and R. W. M. Wedderburn, Generalized linear models, J. R. Statist. Soc. A135 (1972) 370–384.
[11] A. Charnes, E. L. Frome and P. L. Yu, The equivalence of generalized least squares and maximum likelihood estimates in the exponential family, J. Am. Stat. Assoc. 71 (1976) 169–171.
[12] H. Dembinski, M. Schmelling and R. Waldi, Application of the iterated weighted least-squares fit to counting experiments, [arXiv:1807.07911].
[13] K. J. Mighell, Parameter estimation in astronomy with Poisson-distributed data. I. The χ²_γ statistic, Astrophys. J. 518 (1999) 380, [arXiv:astro-ph/9903093].
[14] L. Lyons, R. P. Mount and R. Reitmeyer, eds., Statistical Problems in Particle Physics, Astrophysics and Cosmology: Proceedings, PHYSTAT 2003, Stanford, USA, September 8–11, 2003 (2003).
[15] G. Bennett et al., Statistical equations and methods applied to the precision muon (g−2) experiment at BNL, Nucl. Instrum. Meth. A579 (2007) 1096–1116.
[16] L. Demortier, Equivalence of the best-fit and covariance-matrix methods for comparing binned data with a model in the presence of correlated systematic uncertainties, CDF Note 8661 (1999).
[17] S. S. Wilks, The large-sample distribution of the likelihood ratio for testing composite hypotheses, The Annals of Mathematical Statistics 9 (1938) 60–62.
[18] G. J. Feldman and R. D. Cousins, A unified approach to the classical statistical analysis of small signals, Phys. Rev. D57 (1998) 3873–3889, [arXiv:physics/9711021].
[19] PROSPECT collaboration, J. Ashenfelter et al., The PROSPECT reactor antineutrino experiment, [arXiv:1808.00097].
[20] PROSPECT collaboration, J. Ashenfelter et al., First search for short-baseline neutrino oscillations at HFIR with PROSPECT, Phys. Rev. Lett. 121 (2018) 251802, [arXiv:1806.02784].
[21] Daya Bay collaboration, F. P. An et al., Spectral measurement of electron antineutrino oscillation amplitude and frequency at Daya Bay, Phys. Rev. Lett. 112 (2014) 061801, [arXiv:1310.6732].