Chi-squared Test for Binned, Gaussian Samples
Nicholas R. Hutzler
Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA 91125
E-mail: [email protected]
June 2019
Abstract.
We examine the χ² test for binned, Gaussian samples, including effects due to the fact that the experimentally available sample standard deviation and the unavailable true standard deviation have different statistical properties. For data formed by binning Gaussian samples with bin size n, we find that the expected value and standard deviation of the reduced χ² statistic is

  \frac{n-1}{n-3} \pm \frac{n-1}{n-3}\sqrt{\frac{n-2}{n-5}}\sqrt{\frac{2}{N-1}},  (1)

where N is the total number of binned values. This is strictly larger in both mean and standard deviation than the value of 1 ± (2/(N−1))^{1/2} reported in standard treatments, which ignore the distinction between true and sample standard deviation.
1. Introduction
Precision measurements of physical quantities typically require a very large number of individual measurements of the same quantity, often taken under varying conditions, such as drifting signal-to-noise or many experimental configurations with different signal sizes. For this reason, as well as for simplification of data analysis and reduction of computational requirements, the data are typically binned together such that measurements in the same bin were taken within a time during which the conditions were similar. In order to check whether the binning is susceptible to the varying conditions, as well as to search for unknown sources of noise, a χ² test [1, 2, 3] is commonly used. Regardless of whether or not it is an ideal choice of statistic for this case, it is fairly intuitive as a measure of whether the assigned error bars are correctly capturing the statistics of the data. However, some of the simplifying assumptions used to construct the standard χ² statistic can give results with a significant bias for large data sets. We discuss why the standard treatment underestimates both the mean and variance of the χ² statistic, and then determine the appropriate correction factors.
2. Chi-squared test for binned, Gaussian samples
Consider a quantity that is measured N_x ≫ 1 times, yielding values x_i without any assigned uncertainties. Say that the measurements are normally distributed with constant, true mean µ that is not known to the experimenter. We shall not assume that the data has a constant variance. Let us gather these data sequentially into groups G_j with n consecutive points each. Now compute the usual sample mean, standard deviation, and standard error of each group of points:

  y_j = \frac{1}{n}\sum_{x_i \in G_j} x_i, \qquad s_j = \sqrt{\frac{1}{n-1}\sum_{x_i \in G_j}(x_i - y_j)^2}, \qquad s_{y,j} = \frac{s_j}{\sqrt{n}}.  (2)

We have now binned our data into a smaller set of N = N_x/n ≫ 1 points with values y_j and uncertainties s_{y,j}. As a check to see whether the assigned uncertainties are correctly capturing the statistical fluctuations of the data, we can perform a χ² test as outlined in many standard texts [1, 2, 3]. We will test the hypothesis that the y_j are normally distributed about a constant ȳ (though this approach is easily extended to models with more degrees of freedom), and that the uncertainties correctly describe the statistical fluctuations of the data about the mean. The reduced χ² value of the data set is

  \chi^2_{red} = \frac{1}{N-1}\sum_{j=1}^{N}\left(\frac{y_j - \bar{y}}{\sigma_{y,j}}\right)^2 \equiv \frac{1}{N-1}\sum_{j=1}^{N}\chi_j^2,  (3)

where ȳ = (Σ_j y_j/s_{y,j}²)/(Σ_j 1/s_{y,j}²) is the weighted mean of the y data, and σ_{y,j} is the true (unknown) standard deviation of the points {x_i ∈ G_j}, which need not be constant over different values of j. If the fluctuations in the data are Gaussian in nature, and correctly accounted for by the uncertainties, then we have the usual result

  E[\chi^2_{red}] = 1, \qquad Std[\chi^2_{red}] = \sqrt{\frac{2}{N-1}}.  (4)

However, the experimenter does not know the true standard deviation, and therefore actually computes the statistic

  \tilde{\chi}^2_{red} = \frac{1}{N-1}\sum_{j=1}^{N}\left(\frac{y_j - \bar{y}}{s_{y,j}}\right)^2 \equiv \frac{1}{N-1}\sum_{j=1}^{N}\tilde{\chi}_j^2,  (5)

using s_{y,j} as an estimator for σ_{y,j}. We wish to find the statistical properties of this quantity, which we shall find differ from χ²_red.
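As a concrete illustration, the binned statistic of Eqs. (2), (3), and (5) can be sketched in Python with NumPy (a minimal sketch, not the author's code; the function name `reduced_chi2_tilde` is our own):

```python
import numpy as np

def reduced_chi2_tilde(x, n):
    """Bin x into consecutive groups of n points and compute the reduced
    chi^2 statistic of Eq. (5), using the sample standard errors s_{y,j}
    in place of the unknown true standard deviations sigma_{y,j}."""
    N = len(x) // n                                  # number of bins
    groups = x[:N * n].reshape(N, n)
    y = groups.mean(axis=1)                          # bin means, Eq. (2)
    s_y = groups.std(axis=1, ddof=1) / np.sqrt(n)    # standard errors, Eq. (2)
    w = 1.0 / s_y**2
    ybar = np.sum(w * y) / np.sum(w)                 # weighted mean
    return np.sum(((y - ybar) / s_y) ** 2) / (N - 1)

rng = np.random.default_rng(1)
val = reduced_chi2_tilde(rng.normal(size=200_000), n=10)
```

For Gaussian input with n = 10, `val` clusters near (n−1)/(n−3) = 9/7 ≈ 1.29 rather than 1, anticipating the result derived below.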
Intuitively, the sample standard deviation is computed from a finite number of measurements and therefore has some uncertainty associated with it, and that uncertainty should be propagated through when examining the χ̃²_red statistic. This is a well-known effect when estimating parameters from finite data sets and has been previously explored in a number of contexts, for example Poisson distributions, counting experiments, weighted means, and histogram fitting [4, 5, 6, 7, 8, 9, 10].

More specifically, while χ_j ∼ N(0, 1) is normally distributed, χ̃_j is not:

  \tilde{\chi}_j \equiv \frac{y_j - \bar{y}}{s_{y,j}} \approx \frac{y_j - \mu}{s_{y,j}} \sim t(n-1),  (6)

a Student's t-distribution with n − 1 degrees of freedom, which has heavier tails than a normal distribution. Notice that we are treating ȳ = µ as a constant, which is valid in the limit N ≫ 1, though for smaller N the statistical properties of the weighted mean cannot be ignored [9, 11, 12, 13, 14, 15]. In particular, the weighted mean also has correction factors due to the difference between true and sample standard deviation, and has a non-trivial variance, both of which will impact the χ̃²_red statistic. A good discussion of these complexities can be found in reference [15].

The square of χ̃_j is therefore distributed as χ̃²_j ∼ F(1, n−1), an F-distribution with (1, n−1) degrees of freedom, which has

  E[F(1, n-1)] = \frac{n-1}{n-3}, \qquad Var[F(1, n-1)] = 2\left(\frac{n-1}{n-3}\right)^2 \frac{n-2}{n-5}.  (7)

This is as opposed to the χ²_j statistic, which has (appropriately) a χ² distribution. χ̃²_red is therefore distributed as a sum of F-distributions, which is complicated [16]. However, the expectation value and variance are straightforward to calculate,

  E[\tilde{\chi}^2_{red}] = \frac{N}{N-1} E[\tilde{\chi}_j^2] = \frac{n-1}{n-3} + O(N^{-1}),  (8)

  Var[\tilde{\chi}^2_{red}] = \frac{N}{(N-1)^2} Var[\tilde{\chi}_j^2] = \frac{2}{N-1}\left(\frac{n-1}{n-3}\right)^2 \frac{n-2}{n-5} + O(N^{-2}).  (9)

This implies that the mean and standard deviation of the χ̃²_red statistic are larger than those of the χ²_red statistic by

  \frac{E[\tilde{\chi}^2_{red}]}{E[\chi^2_{red}]} = \frac{n-1}{n-3}, \qquad \frac{Std[\tilde{\chi}^2_{red}]}{Std[\chi^2_{red}]} = \frac{n-1}{n-3}\sqrt{\frac{n-2}{n-5}},  (10)

up to further corrections of order O(N⁻¹). A plot of these correction factors is shown in Figure 1. In the limit n → ∞ we recover the usual result, but for finite n we will always expect larger values for both mean and standard deviation. We can also see that choosing n ≤ 5 is problematic: for n ≤ 5 the variance of χ̃²_j does not exist, and for n ≤ 3 neither does its mean.
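The distributional claims in Eqs. (6) and (7) are easy to verify numerically. A quick check using scipy.stats (our own verification sketch, not from the paper):

```python
import numpy as np
from scipy import stats

n = 10  # bin size

# Eq. (7): moments of F(1, n-1), cross-checked against scipy's F distribution
mean_F, var_F = stats.f.stats(dfn=1, dfd=n - 1, moments="mv")
assert np.isclose(mean_F, (n - 1) / (n - 3))
assert np.isclose(var_F, 2 * ((n - 1) / (n - 3)) ** 2 * (n - 2) / (n - 5))

# Eq. (6): (y_j - mu)/s_{y,j} for Gaussian bins follows t(n-1), not N(0,1)
rng = np.random.default_rng(0)
bins = rng.normal(size=(50_000, n))       # true mean mu = 0
t_stat = bins.mean(axis=1) / (bins.std(axis=1, ddof=1) / np.sqrt(n))
# Kolmogorov-Smirnov comparison against the t(n-1) CDF
ks_result = stats.kstest(t_stat, stats.t(df=n - 1).cdf)
```

The empirical second moment of `t_stat` lands near (n−1)/(n−3) ≈ 1.29, the bias at the heart of Eq. (8).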
Figure 1. Correction factors to the mean and standard deviation of χ̃²_red.
3. Conclusion
In summary, we find that the standard χ² statistic computed from binning finite data sets underestimates the mean and variance for binned Gaussian samples, and we derive simple, closed-form expressions for the biases. For very large data sets with finite bin sizes, such as those commonly found in precision physics measurements, these corrections can be significant and should not be neglected.

Acknowledgments.
I would like to acknowledge helpful discussions with David Watson, and many helpful discussions with the ACME Collaboration, in particular David DeMille, John M. Doyle, and Brendon O'Leary.
Appendix: A simple example
We can see how the "usual" chi-squared statistic gives an incorrect result by performing a simple numerical test on some simulated data. Generate 1,000,000 points x_i ∼ N(0, 1), bin them in groups of n = 10, and then compute the means y_j, standard errors s_{y,j}, and the reduced chi-squared statistic χ̃²_red (as described in the main text) for the resulting 100,000 binned points.

Nx = 1000000;                        % Number of x values
nbin = 10;                           % Number of points to bin
for j = 1:(Nx/nbin)                  % Step over bins
    x = randn(1,nbin);               % Generate nbin normally distributed points
    y(j) = mean(x);                  % Means
    sigmayi(j) = std(x)/sqrt(nbin);  % Standard errors
end
ybar = sum(y./sigmayi.^2)/sum(1./sigmayi.^2)  % Weighted mean
chi = (y-ybar)./sigmayi;             % chi
chi2 = sum(chi.^2);                  % chi^2
dof = length(y)-1;                   % Degrees of freedom
redchi2 = chi2/dof                   % Reduced chi^2
redchi2sigma = sqrt(2/dof)           % "Usual" uncertainty of reduced chi^2
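For readers without MATLAB/Octave, the snippet above can be sketched line-for-line in Python with NumPy (our translation, not the author's code; variable names mirror the original):

```python
import numpy as np

Nx = 1_000_000                                   # number of x values
nbin = 10                                        # number of points to bin
rng = np.random.default_rng()
x = rng.normal(size=(Nx // nbin, nbin))          # all bins at once
y = x.mean(axis=1)                               # means
sigmayi = x.std(axis=1, ddof=1) / np.sqrt(nbin)  # standard errors
ybar = np.sum(y / sigmayi**2) / np.sum(1 / sigmayi**2)  # weighted mean
chi = (y - ybar) / sigmayi                       # chi
chi2 = np.sum(chi**2)                            # chi^2
dof = len(y) - 1                                 # degrees of freedom
redchi2 = chi2 / dof                             # reduced chi^2
redchi2sigma = np.sqrt(2 / dof)                  # "usual" uncertainty
print(redchi2, redchi2sigma)
```

Note `ddof=1` in `np.std`, matching MATLAB's default sample (n−1) normalization; typical runs give `redchi2` near (n−1)/(n−3) ≈ 1.29, far outside the naive 1 ± 0.0045 band.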
If we run this piece of code, we will find redchi2 = 1.2868 and redchi2sigma = 0.0045 (though of course the former will differ slightly from run to run due to the random nature of the calculation). This value differs considerably from the naive expectation of 1 ± 0.0045, but is consistent with the corrected expectation derived in the main text, (n−1)/(n−3) ± (n−1)/(n−3)√((n−2)/(n−5))√(2/(N−1)) ≈ 1.286 ± 0.007.

References

[1] Press W H, Teukolsky S A, Vetterling W T and Flannery B P 2007 Numerical Recipes
[2] Data Reduction and Error Analysis for the Physical Sciences
[3] An Introduction to Error Analysis
[4] Nucl. Instruments Methods Phys. Res.
[5] Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip.
[6] Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip.
[7] Astrophys. J.
[8] Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip.
[9] Metrologia
[10] Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip.
[11] A New Limit on the Electron Electric Dipole Moment Ph.D. thesis Harvard University
[12] Cochran W G 1937 Suppl. to J. R. Stat. Soc.
[13] Biometrics
[14] Biometrical J.
[15] Statistical Meta-Analysis with Applications (Wiley-Interscience)
[16] Morrison D F 1971 J. Am. Stat. Assoc. 66