Harmonizing discovery thresholds and reporting two-sided confidence intervals: a modified Feldman & Cousins method

Abstract

When searching for new physics effects, collaborations will often wish to publish upper limits and intervals with a lower confidence level than the threshold they would set to claim an excess or a discovery. However, confidence intervals are typically constructed to provide constant coverage, that is, a constant probability to contain the true value, with possible over-coverage if the observed random variable is discrete. In particular, this means that when no signal is present, the confidence interval will contain the zero-signal case with a frequency equal to the confidence level. This paper details a modification to the Feldman-Cousins method that allows a different, higher excess-reporting significance than the interval confidence level.


Prepared for submission to JINST

Harmonizing discovery thresholds and reporting two-sided confidence intervals: a modified Feldman & Cousins method

K. D. Morå

Oskar Klein Centre, Department of Physics, Stockholm University, Albanova University Center, SE-10691 Stockholm, Sweden

E-mail: [email protected]

Abstract: When searching for new physics effects, collaborations will often wish to publish upper limits and intervals with a lower confidence level than the threshold they would set to claim an excess or a discovery. In this paper a modification to the Feldman-Cousins method is proposed that allows for a transition from one-sided upper confidence limits for null results to two-sided confidence intervals for non-null results at any specified threshold chosen to define the observation of a signal, while maintaining exact coverage.

Keywords: Analysis and statistical methods, Dark Matter detectors (WIMPs, axions, etc.)

Many physics experiments, in particular searches for new physics, look for very low event rates where the asymptotic methods of constructing frequentist confidence intervals do not work. Confidence intervals are required to have coverage: $1-\alpha$ confidence-level intervals should contain the true value in a fraction $1-\alpha$ of repeated experiments. However, the actual coverage of a statistical method may vary with the true signal properties. For example, an asymptotic 0.68 confidence-level interval for a counting experiment observing $n$ events, $[n-\sqrt{n},\, n+\sqrt{n}]$, will cover the true expectation value $\mu$ with the nominal probability only asymptotically as $\mu \to \infty$, and may cover as little as 0.55 and as much as 1 depending on $\mu$.

A method that provides confidence intervals with exact coverage has been known since 1937 as the Neyman construction [1]. The Neyman construction starts by constructing a confidence belt for each possible true value of the parameter of interest $s$:

$$1-\alpha = \int_a^b f(x \mid s)\,dx \tag{1.1}$$

where $f(x \mid s)$ is the probability density function for the observed quantity $x$, which may depend on $s$, and $[a, b]$ denote the limits of the confidence belt. The confidence interval on $s$ can then be found by constructing $a(s)$ and $b(s)$, which express the lower and upper ends of the range in which $x$ would fall $1-\alpha$ of the time if the true parameter of interest is $s$. Inverting these functions yields the Neyman construction limits for an observation $x$:

$$[a^{-1}(x),\, b^{-1}(x)] \tag{1.2}$$

The condition for the confidence belt provided in equation 1.1 is not unique, and the limits of the confidence belt have to be set by an additional boundary condition. This condition has traditionally consisted in either the desire to set upper or lower limits (for example in the absence of a signal) or in reporting (symmetric or asymmetric) two-sided intervals in the case of a measurement of a physical parameter. In some cases, such as in searches for a new particle, experiments may wish to report upper limits if they do not observe a discovery significance exceeding a set threshold, and a two-sided confidence interval otherwise. However, Feldman & Cousins [2] noted that switching between Neyman constructions based on the experimental outcome may lead to under-coverage, even if the individual constructions provide coverage. The suggested remedy (hereafter referred to as the "FC method") is to construct confidence intervals by a single Neyman construction that provides both upper limits and two-sided intervals, depending on the experimental result. The FC confidence belt, reviewed in section 2, uses the log-likelihood ratio to decide which regions of observable space to include first.
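As a numerical check of the coverage variation of the $[n-\sqrt{n},\, n+\sqrt{n}]$ interval discussed above, the following minimal Python sketch sums the Poisson probabilities of all $n$ whose interval contains $\mu$. The function names `poisson_pmf` and `coverage_sqrt_n`, and the truncation of the sum, are illustrative assumptions and not part of the original analysis.

```python
import math

def poisson_pmf(n, mu):
    """Poisson probability of observing n events for expectation mu > 0."""
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))

def coverage_sqrt_n(mu, n_max=None):
    """Exact probability that [n - sqrt(n), n + sqrt(n)] contains mu."""
    if n_max is None:
        # Assumed truncation: far enough into the tail for the sum to converge.
        n_max = int(mu + 20.0 * math.sqrt(mu) + 50)
    total = 0.0
    for n in range(n_max + 1):
        half_width = math.sqrt(n)
        if n - half_width <= mu <= n + half_width:
            total += poisson_pmf(n, mu)
    return total

if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0, 5.0, 10.0, 100.0):
        print(f"mu = {mu:6.1f}  coverage = {coverage_sqrt_n(mu):.3f}")
```

For $\mu = 1$, only $n = 1$ and $n = 2$ yield intervals containing $\mu$, so the coverage is $e^{-1}(1 + 1/2) \approx 0.55$, while for $\mu = 100$ it is close to the nominal 0.68.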
Figure 1 shows the upper and lower limits for a Gaussian observable $x$ with known background $b$ and standard deviation 1 for the FC construction in blue, and for an experiment that switches from upper limits to two-sided intervals if the discovery significance exceeds $3\sigma$ in green. Since this switch moves only the upper limit line for, for example, $s = 2$, this approach will under-cover for this signal. The modification suggested in this paper, which is constructed to maintain coverage, is shown in orange, and may be interpreted as a coverage-conserving interpolation between a one- and a two-sided Neyman construction.

Conventionally, upper limits are reported with confidence levels of 95% or less, and two-sided intervals are presented only in the case of discovery or at least some reasonably significant indication. The (one-sided) p-value of an indication is usually much smaller than the 5% or more $\alpha$ implied by the $1-\alpha$ confidence interval. While statistically, presenting a two-sided interval and not claiming a discovery does not pose a problem (the fact that the confidence interval excludes the non-existence of a signal at some confidence level should not be confused with a discovery claim), in practice experimenters are reluctant to present a two-sided interval even if the FC method provides it. A common remedy is to report only the upper edge of the interval provided. This leads to a signal-dependent over-coverage, or, equivalently, some confidence intervals or upper limits could be more constraining without violating coverage. In this paper, we suggest a modification of the FC method that will provide two-sided intervals only at a desired discovery threshold, while still providing a unified confidence interval calculation method and improving the coverage. The paper is organized as follows: in section 2 we briefly review the FC method and the procedure for assessing the existence of an excess or a discovery. In section 3 we introduce the modified version of the FC method, and we illustrate the method with a line-search example in section 4.

For an experiment where one measures some data $\vec{x}$ with a probability distribution $f(\vec{x} \mid s)$ that depends on a parameter $s$, the likelihood is given as $\mathcal{L}(s) = f(\vec{x} \mid s)$.
The method proposed by Feldman and Cousins uses the log-ratio $R$ between the likelihood at the value $\hat{s}$ that maximizes it and the likelihood given $s$:

$$R(s) = 2 \cdot \log\left[\mathcal{L}(\hat{s}) / \mathcal{L}(s)\right] \tag{2.1}$$

to decide which $\vec{x}$ to include. Either constructing the confidence belt from equation 1.1 with the constraint that the $\vec{x}$ with the lowest $R(s)$ are included first, or constructing the confidence belt directly in the $R(s)$ parameter:

$$1-\alpha = \int_0^{R_{\max,\alpha}(s)} f(R \mid s)\,dR \tag{2.2}$$

for each value of $s$, will yield the FC construction. The confidence interval, whether one- or two-sided, will be the region where $R(s) < R_{\max,\alpha}(s)$. Note that the threshold likelihood ratio $R_{\max,\alpha}(s)$ also depends on the parameter of interest. In the case that an experiment is looking to constrain a parameter $s$ that has a null hypothesis and lower bound, $s_0$, the method has to give confidence intervals that do not contain $s_0$ in a fraction $\alpha$ of the cases when $s_0$ is true. For example, in searches for the production cross-section of an unknown particle, $\alpha$ of confidence intervals will exclude the no-signal null hypothesis.

The log-likelihood ratio $R(s_0)$ is typically also used to assess discovery significance with respect to the null hypothesis $s = s_0$. The p-value of an observed $R_{\mathrm{result}}(s_0)$ under the null hypothesis is:

$$p_{\mathrm{result}}\left(R_{\mathrm{result}}(s_0)\right) = \int_{R_{\mathrm{result}}(s_0)}^{\infty} f(R \mid s_0)\,dR \tag{2.3}$$

This may also be inverted to yield discovery thresholds; $p^{-1}_{\mathrm{result}}(\alpha)$ is the discovery threshold for an $\alpha$ excess. Note that this equation shows that at the null hypothesis $s = s_0$, the FC threshold for inclusion in the confidence interval, $R_{\max,\alpha}(s_0)$, implies a p-value of $\alpha$, and that a confidence interval that does not include $s_0$ implies a p-value below $\alpha$. Typical choices for upper limits, and thus for the FC construction, are $\alpha = 0.1, 0.05, 0.01$. Using the FC method consistently will report two-sided intervals at those same thresholds. However, a conventional discovery threshold in particle physics is $5\sigma$, or $p \approx 3 \cdot 10^{-7}$, and experiments may not wish to publish measurements of excesses lower than, say, $3\sigma$, or $p \approx 1.3 \cdot 10^{-3}$. A pragmatic solution is to report only the upper edge of the confidence interval as an upper limit until the discovery significance has exceeded the required discovery threshold. This will lead to over-coverage, as one extends a confidence interval constructed to cover with a $1-\alpha$ frequency.

The aim of the modified method is to provide a construction with a desired discovery significance threshold, different from what the confidence level would imply, in addition to maintaining the constant coverage of the pure FC construction. In figure 1, this modification is indicated with an orange line, showing that the modification does not change the FC construction at higher signals, while approaching the one-sided Neyman construction upper limit for low signals. In this illustration, the modification does not reach the median signal-free result of $x - b = 0$, but for higher discovery thresholds, such as $4\sigma$, even the median upper limit will be affected by the modification, as shown in the coverage plots in figure 6 for the example in section 4.

We wish to include all results where the discovery significance is less than the reporting threshold in our Neyman confidence belt, while maintaining coverage for all signals. To accomplish this, we will treat upwards and downwards fluctuations separately, and include all upwards fluctuations that do not rise to the reporting threshold in our band. This will require constricting the confidence band for downwards fluctuations to conserve coverage. To distinguish between upwards and downwards fluctuations, the proposed modification to the FC method uses an idea very similar to that used by the ATLAS Higgs search [3], where the ordering ratio $R$ is multiplied by the sign of $\hat{s} - s$:

$$R'(s) = \mathrm{sgn}(\hat{s} - s) \cdot R(s) \tag{3.1}$$

Figure 1: Illustrations of three constructions of upper and lower limits for a Gaussian observable $x$, with known background $b$. The green lines show the upper and lower limits as a function of $x - b$ for an experiment that sets 90% upper limits for discovery significances below $3\sigma$ and uses a two-sided interval above. The blue lines show the FC upper and lower limits. The orange lines show the modified FC method, which like the FC method provides coverage for all true signals, but switches between a one- and a two-sided limit when the threshold significance of $3\sigma$ is reached. This leads the upper limit for this construction to approach the one-sided limit construction for low $x - b$.

The sign in equation 3.1 separates the cases where the data prefer a lower or a higher signal than the tested hypothesis. Close to a boundary, say a requirement that $s \ge s_0$, $R'(s)$ at $s = s_0$ can only be non-negative, and for slightly larger $s$, the distribution of $R'(s)$ will still be asymmetric between upwards and downwards fluctuations. The switch of sign of $R'(s)$ occurs as $\hat{s}$ approaches $s$, which is also where $R(s)$ approaches 0. Examples of the distributions of $R'(s)$ for the line search detailed in the next section are shown in figure 2, including a blue line indicating the range of $R'(s)$ corresponding to a 90% confidence level FC interval. Orange lines show that the modified FC method band shifts to include more of the positive $R'(s)$ in order to avoid excluding excesses below the discovery thresholds indicated.

We will construct the edge of the confidence belt corresponding to upwards fluctuations, $R_+(s)$, first. We denote the confidence level of the interval $1-\alpha$, and the p-value threshold for reporting a two-sided excess $\gamma$. To ensure that our confidence intervals exclude the null hypothesis when the discovery significance exceeds the reporting threshold, $R_+(s_0)$ must correspond to the discovery threshold $R_{\max,\gamma}(s_0)$ defined by equation 2.3.
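For the Gaussian example of figure 1, the signed ordering parameter of equation 3.1 and the discovery threshold at $s_0 = 0$ have closed forms. The sketch below assumes a unit-width Gaussian observable $x - b$ with the best fit restricted to $\hat{s} \ge 0$; the function names are hypothetical and chosen for illustration only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # unit Gaussian describing the observable x - b

def signed_ratio(x, s):
    """R'(s) = sgn(s_hat - s) * R(s) for x ~ N(s, 1) with s_hat = max(x, 0)."""
    s_hat = max(x, 0.0)
    # R(s) = 2 log[L(s_hat) / L(s)], with L(s) proportional to exp(-(x - s)^2 / 2)
    ratio = (x - s) ** 2 - (x - s_hat) ** 2
    if s_hat > s:
        return ratio
    if s_hat < s:
        return -ratio
    return 0.0

def p_value_from_sigma(z):
    """One-sided p-value of a z-sigma excess."""
    return STD_NORMAL.cdf(-z)

def discovery_threshold(gamma):
    """R_max,gamma(s_0) at s_0 = 0 for gamma < 0.5.

    Under the null, R = x^2 for x > 0 and 0 otherwise, so the threshold
    is the square of the one-sided Gaussian quantile.
    """
    return STD_NORMAL.inv_cdf(1.0 - gamma) ** 2
```

In this model a $3\sigma$ reporting threshold corresponds to $\gamma \approx 1.35 \cdot 10^{-3}$ and $R_{\max,\gamma}(s_0) = 9$.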
At large signals, we wish $R_+(s)$ to approach the FC edge $R_{\max,\alpha}(s)$. We accomplish this by interpolating between the two thresholds:

$$R_+(s) = w(s) \cdot R_{\max,\gamma}(s_0) + \left(1 - w(s)\right) \cdot R_{\max,\alpha}(s) \tag{3.2}$$

where $w(s)$ is a weighting function that monotonically decreases from 1 at $s = s_0$ to 0 as $s$ increases. The FC threshold function $R_{\max,\alpha}(s)$ is defined by equation 2.2. The freedom to choose $w(s)$ reflects the original freedom in the Neyman construction. However, we wish the confidence band to rapidly approach the FC band with increasing $s$. In some simple cases, such as a single Gaussian-distributed variable with known standard deviation, an observation with a discovery significance exactly equal to the threshold will have an $R'(s)$ curve that exactly divides observations below and above the discovery threshold, and the $R_+(s)$ curve may be constructed as the maximum of this curve and the FC threshold. This corresponds to the vertical line at $x - b = 3$ in figure 1. In more general cases, we take observations (for example toy-Monte Carlo realizations) with a discovery significance below the threshold, evaluate $R'_i(s)$ for all these observations, labelled with index $i$, and compute the maximum value at each signal, $R_{\mathrm{envelope}}(s) = \sup_i\left(R'_i(s)\right)$. Finally, we set the threshold $R_+(s)$ to be the greater of $R_{\mathrm{envelope}}(s)$ and $R_{\max,\alpha}(s)$. This construction is shown in figure 3.

Figure 2: Histograms of $R'(s)$ computed for toy-Monte Carlo simulations for 0 and 3 expected signal events in the upper and lower panel, respectively, according to the example in section 4. The best-fit signal rate $\hat{s}$ is constrained to be non-negative in the signal model and fit. The sharp boundary at $R'(s) = 0$ arises because the constraint $0 \le \hat{s}$ is applied to the best fit. Blue and orange bands show the 90% confidence band for the FC method and the modified FC method, respectively, with the latter shown both for a $3\sigma$ and a $4\sigma$ discovery threshold.

The lower edge of the interval, $R_-(s)$, is then defined so that for any signal, a fraction $1-\alpha$ of the $R'(s)$ values are contained between $R_-(s)$ and $R_+(s)$:

$$1-\alpha = \int_{R_-(s)}^{R_+(s)} f(R' \mid s)\,dR' \tag{3.3}$$

At $s = s_0$, the above equation would indicate a coverage of $1-\alpha$.
However, at the border of the domain for $s$, the distribution of $R'(s)$ will be peaked towards 0, as shown for $s = 0$ in figure 2, and by defining the confidence interval lower threshold $R_-(s_0) = -\epsilon$, where $\epsilon$ is an infinitesimally small positive number such that no $R'(s_0)$ are lower than $R_-(s_0)$, the coverage at the boundary can be arranged to be $1-\gamma$.
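A minimal sketch of the full construction for the Gaussian example of figure 1 is given below. Because $R'(s)$ is a monotonically non-decreasing function of $x - b$ for $s > 0$ in this model, the belt is expressed directly as an interval in $x - b$ rather than in $R'$; this simplification, the bisection solver and all function names are assumptions made for illustration, not the implementation used for the figures.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # x - b ~ N(s, 1), with the signal bounded by s >= 0

def fc_belt(s, cl=0.90):
    """Edges (lo, hi) in x - b of the Feldman-Cousins acceptance region at signal s."""
    def edges(c):
        # Region R(s) <= c, with R = (x - s)^2 for x >= 0 and R = s^2 - 2*x*s for x < 0.
        root = c ** 0.5
        if root <= s:
            lo = s - root
        elif s > 0:
            lo = (s * s - c) / (2.0 * s)
        else:
            lo = float("-inf")  # at s = 0 every x < 0 has R = 0
        return lo, s + root

    def content(c):
        lo, hi = edges(c)
        below = 0.0 if lo == float("-inf") else STD_NORMAL.cdf(lo - s)
        return STD_NORMAL.cdf(hi - s) - below

    c_lo, c_hi = 0.0, 100.0
    for _ in range(200):  # bisection for the threshold R_max,alpha(s)
        c_mid = 0.5 * (c_lo + c_hi)
        if content(c_mid) < cl:
            c_lo = c_mid
        else:
            c_hi = c_mid
    return edges(c_hi)

def modified_fc_belt(s, cl=0.90, z_disc=3.0):
    """Belt that contains every x - b below the z_disc discovery threshold."""
    fc_lo, fc_hi = fc_belt(s, cl)
    if fc_hi >= z_disc:
        return fc_lo, fc_hi  # identical to FC at large signals
    hi = z_disc
    # Raise the lower edge so that the belt still contains exactly cl (equation 3.3).
    lo = s + STD_NORMAL.inv_cdf(STD_NORMAL.cdf(hi - s) - cl)
    return lo, hi

if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(s, fc_belt(s), modified_fc_belt(s))
```

At exactly $s = 0$ the sketch returns a finite lower edge in $x - b$, whereas in $R'$ all observations with $x - b < 0$ are degenerate at $R' = 0$; that is the boundary case treated with $R_-(s_0) = -\epsilon$ in the text.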


Figure 3: Construction of the modified FC threshold function, using curves of $R'(s)$ for multiple toy-Monte Carlo realizations, colored according to whether they exceed the $3\sigma$ discovery significance threshold (green) or not (gray). The thick blue curve shows the threshold corresponding to a 90% FC construction, while the orange curve, showing $R_+(s)$, is constructed to be equal to the discovery threshold, indicated with a dashed green line, at 0 signal, and to be greater than or equal to both the likelihood-ratio curves with discovery significance less than $3\sigma$ and the FC threshold.

Confidence intervals are constructed as in the FC case, as intersections between $R'(s)$ and $R_+(s)$ for lower limits, and between $R'(s)$ and $R_-(s)$ for upper limits. For increasing discovery thresholds, the confidence belt must move to include higher positive values of $R'(s)$, and $R_-(s)$ must increase correspondingly. Close to $s_0$, where the shift is the largest, $R_-(s)$ will approach the Neyman construction boundary for an upper-limit-only construction, which can provide empty confidence intervals for strong but finite downwards fluctuations of the background. The FC method yields higher limits in the downwards-fluctuation regime, as illustrated in figure 1 for the Gaussian example, with the upper limit approaching zero asymptotically when the downwards fluctuation approaches negative infinity. Some experiments setting upper limits have adopted the CLs method [4], which penalizes the p-value to yield a signal-dependent over-coverage at low signal-background discrimination, approaching 1 for signals approaching $s_0$. Others have used a power constraint [5], where upper limits are not placed below a signal for which the experiment has a certain discovery power. Direct detection experiments using the two-sided FC method [6, 7] have applied a power constraint corresponding to a $-1\sigma$ downwards fluctuation of the upper limit.
The coverage properties of the power constraint applied to the modified FC confidence intervals have a simpler form than for the CLs method, with the coverage being $1 - \beta(s)$ below the critical discovery power, where $\beta(s)$ is the discovery power, and 0.[…]

Figure 4: Background distribution (blue line), and signal distributions for $m = 5E_0$ and $m = 60E_0$ (orange and green), together with a histogram showing an example data-set drawn from the background-only distribution. Axes: counts versus $E/E_0$.

As an example, we consider an experiment that observes events with energies $E_i$, and searches for a Gaussian signal line with a certain mass in the presence of a power-law background. The probability distribution function $f(E \mid s)$ has the form:

$$f(E \mid s) = \frac{s}{s+b}\, f_s(E) + \frac{b}{s+b}\, f_b(E) \tag{4.1}$$

$$f_s(E) \equiv \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(E-m)^2}{2\sigma^2}} \tag{4.2}$$

$$f_b(E) \equiv E^{\Gamma} \cdot \left[\int_{E_0}^{E_1} E^{\Gamma}\, dE\right]^{-1} \tag{4.3}$$

Here, $s$ is the signal expectation value, and $m$, $\sigma = m/[…]$ are the mean and standard deviation of the Gaussian signal $f_s(E)$. The background power-law $f_b(E)$ has expectation value $b = […]$ and index $\Gamma = -2$. Both distributions are normalized between $E_0$ and $E_1 = […]E_0$. The signal expectation value is not allowed to be negative in the fit, $0 \le \hat{s}$. In this example, the nuisance parameters affecting the distribution shapes and the background expectation are fixed. For most experiments, the likelihood will include a number of nuisance parameters. In that case, the ordering parameter $R$ may be based on the profiled likelihood instead, yielding the profile construction [8], where coverage is not ensured by construction, but must be investigated. The toy-Monte Carlo methods used to construct the confidence belt and to investigate coverage properties are identical to the ones used for the profile construction.

The extended unbinned likelihood for the observation of $N$ energies with values $E_i$ can be written:

$$\mathcal{L}(s \mid N, \vec{E}) = \mathrm{Pois}(N \mid s+b) \cdot \prod_{i=1}^{N} f(E_i \mid s) \tag{4.4}$$

Here, $\mathrm{Pois}(N \mid s+b)$ is the Poisson probability to observe $N$ events given an expectation value of $s+b$. The distribution of the signed ordering parameter $R'(s)$, defined from equations 2.1 and 3.1, is shown in figure 2, for a signal mass $m = […]E_0$, and for two different true signal expectations $s = 0, 3$. As an example of using toy-Monte Carlo methods to determine the interpolation function $w(s)$, figure 3 shows multiple curves of $R'(s)$ from toy-MC simulations with true signals ranging from 0 to 40, divided by whether the discovery significance, assessed with $R'(0)$, is above or below a $3\sigma$ discovery threshold. The weighting function $w(s)$ is chosen so that the modified FC threshold is equal to or greater than all the $R'(s)$ curves below the discovery threshold for all signals, until the $R_+(s)$ curve, in orange, meets the FC belt in blue. The 90% confidence belts derived from this construction are also shown in figure 2 as bands for $3\sigma$ and $4\sigma$ discovery thresholds.

Figure 5: Illustration of confidence interval constructions using the FC (blue bands) and the modified FC (orange bands) methods, for two example data-sets in panels (a) and (b). Intersections between the $R'(s)$ curve, in black, and the thresholds define the upper and lower limits of the interval. For comparison, the FC construction is shown with the dashed black line, with confidence interval boundaries marked by blue dots.

The confidence interval construction for the power-law example is shown in figure 5, showing both a case where an excess with a p-value below 10% gives a two-sided interval, and a case where both constructions yield upper limits. The confidence interval consists of the signal range where $R'(s)$ is contained between the $R_+(s)$ and $R_-(s)$ curves.

The coverage of the FC and modified FC 90% confidence intervals is shown in figure 6 for $m = 5, 60$. The pure FC method, shown with a blue line, provides the expected coverage. The green curves show the over-coverage of experiments using the FC construction with a threshold for reporting the lower limit of either a $3\sigma$ or a $4\sigma$ discovery significance. The modified FC method exhibits the desired coverage of 0.9 […] lower threshold shown in figure 5 can be seen, with a greater change seen for the larger, $4\sigma$ discovery threshold.

Figure 6: Coverage as a function of signal expectation for the FC method (blue), the FC method including a $3\sigma$ or $4\sigma$ discovery threshold (green), and the modified FC method (orange), for two line masses. Median upper limits are indicated with dashed lines for the FC (blue) and modified FC (orange) cases. Panels: (a) $m = 5$, $3\sigma$ threshold; (b) $m = 5$, $4\sigma$ threshold; (c) $m = 60$, $3\sigma$ threshold; (d) $m = 60$, $4\sigma$ threshold.

This paper proposes a modified FC method in which the discovery significance threshold is different from the confidence level of the upper limits and intervals. For an example case, the coverage at 0 signal corresponds to the discovery significance, and moves to the required confidence level for all signals larger than 0. This allows experiments to avoid the over-coverage that results from expanding the standard FC intervals, and simplifies reporting or discussion of coverage properties. The intervals approach the one-sided upper limit for under-fluctuations of the data, motivating the application of a power-constraint lower signal threshold, or similar, to the confidence interval construction outlined in this paper. This will result in discrete coverage regimes that depend on the true signal size, and allow the experiment to directly set the discovery threshold and minimal discovery power independently of the confidence level of the interval.

Acknowledgements

The author would like to thank Jan Conrad and Jelle Aalbers for fruitful discussions and suggestions. This research was supported by a grant of the Knut and Alice Wallenberg Foundation, PI: J. Conrad.

References

[1] J. Neyman. Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability. Phil. Trans. Roy. Soc. Lond., A236(767):333–380, 1937. doi:10.1098/rsta.1937.0005.
[2] Gary J. Feldman and Robert D. Cousins. A Unified approach to the classical statistical analysis of small signals. Phys. Rev., D57:3873–3889, 1998. doi:10.1103/PhysRevD.57.3873.
[3] Georges Aad and others (ATLAS Collaboration). Combined search for the Standard Model Higgs boson in pp collisions at √s = 7 TeV with the ATLAS detector. Phys. Rev., D86:032003, 2012. doi:10.1103/PhysRevD.86.032003.
[4] A. L. Read. Modified frequentist analysis of search results (the CLs method). (CERN-OPEN-2000-205), 2000. URL http://cds.cern.ch/record/451614.
[5] Glen Cowan, Kyle Cranmer, Eilam Gross, and Ofer Vitells. Power-Constrained Limits. pre-print, 2011. [physics.data-an/1105.3166].
[6] E. Aprile and others (XENON Collaboration). Dark Matter Search Results from a One Ton-Year Exposure of XENON1T. Phys. Rev. Lett., 121(11):111302, 2018. doi:10.1103/PhysRevLett.121.111302.
[7] D. S. Akerib and others (LUX collaboration). Results from a search for dark matter in the complete LUX exposure. Phys. Rev. Lett., 118(2):021303, 2017. doi:10.1103/PhysRevLett.118.021303.
[8] M. Tanabashi and others (PDG). Review of particle physics. Phys. Rev. D, 98:030001, Aug 2018. doi:10.1103/PhysRevD.98.030001. URL https://link.aps.org/doi/10.1103/PhysRevD.98.030001.