Criteria for projected discovery and exclusion sensitivities of counting experiments

Abstract

The projected discovery and exclusion capabilities of particle physics and astrophysics/cosmology experiments are often quantified using the median expected $p$-value or its corresponding significance. We argue that this criterion leads to flawed results, which for example can counterintuitively project lessened sensitivities if the experiment takes more data or reduces its background. We discuss the merits of several alternatives to the median expected significance, both when the background is known and when it is subject to some uncertainty. We advocate for standard use of the "exact Asimov significance" $Z^{\rm A}$ detailed in this paper.


Prudhvi N. Bhattiprolu, Stephen P. Martin, and James D. Wells
Northern Illinois University, DeKalb IL 60115, USA
University of Michigan, Ann Arbor, MI 48109, USA
(Dated: September 16, 2020)

Introduction.

Consider the problem of assessing the efficacy of a planned experiment that will measure event counts that could be ascribed either to a new physics signal or a standard physics background. The criteria for discovery or exclusion of the signal can be quantified in terms of the $p$-value. In general, for a given experimental result, $p$ is the probability of obtaining a result of equal or greater incompatibility with a null hypothesis $H_0$. In high-energy physics searches, for example, the one-sided $p$-value results are usually reported in terms of the significance

$$Z = \sqrt{2}\, {\rm erfc}^{-1}(2p), \tag{1}$$

and the criteria for discovery and exclusion have often been taken, somewhat arbitrarily, as $Z > 5$ ($p < 2.867 \times 10^{-7}$) and $p < 0.05$ ($Z > 1.645$), respectively.

We take the signal and background to be described by Poisson means $s$ and $b$ respectively, where $s$ is known and $b$ may be subject to some uncertainty. For assessing the prospects for discovery, one simulates many equivalent pseudo-experiments with data generated under the assumption $H_{\rm data} = H_{s+b}$ that both signal and background are present, obtaining observed events $n_1, n_2, n_3, \ldots$. One then calculates the $p$-value for each of those simulated experiments ($p_1, p_2, p_3, \ldots$) with respect to the null hypothesis $H_0 = H_b$ that only background is present. For exclusion, the roles of the two hypotheses are reversed; the pseudo-experiment data is generated under the assumption $H_{\rm data} = H_b$ that only background is present, and the null hypothesis $H_0 = H_{s+b}$ is that both signal and background are present, so that a different set of $p$-values is obtained. The challenge is to synthesize the results in the limit of a very large number of pseudo-experiments into a significance estimate $Z_{\rm disc}$ or $Z_{\rm excl}$. There is no agreement on this step, which is the primary focus of this letter.

A common measure [1] of the power of an experiment is the median expected significance $Z^{\rm med}$ for discovery or exclusion of some important signal (i.e., the median of $Z(p_1), Z(p_2), Z(p_3), \ldots$ for the simulated $p$-values). A reason to use the median (rather than the mean) is that eq. (1) is non-linear, so that the mean of a set of $Z$-values is not the same as the $Z$-value of the corresponding mean of $p$-values.

However, $Z^{\rm med}$ has a counter-intuitive flaw, which is most prominent when $s$ and $b$ are not too large, and especially for exclusion. As we show below, for a given fixed $s$, $Z^{\rm med}$ can actually significantly increase as $b$ increases. Similarly, for a given fixed $b$, $Z^{\rm med}$ can decrease as $s$ is increased. This leads to the paradoxical situation that an experiment could be judged worse, according to the $Z^{\rm med}$ criterion, if it acquires more data, or if it reduces its background. In this letter, we discuss this problem, and consider some alternatives to $Z^{\rm med}$.
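The conversion in eq. (1) and the remark about its non-linearity can be illustrated with a short sketch. The helper names `z_from_p` and `p_from_z` and the two example $p$-values are ours and purely illustrative; the inverse normal CDF from Python's standard library is used in place of $\sqrt{2}\,{\rm erfc}^{-1}(2p)$, to which it is mathematically equivalent.

```python
import math
from statistics import NormalDist, mean

_NORMAL = NormalDist()  # standard normal distribution: mean 0, sigma 1


def z_from_p(p):
    """Significance for a one-sided p-value, Z = sqrt(2) * erfcinv(2 p).

    Requires 0 < p < 1.
    """
    return -_NORMAL.inv_cdf(p)


def p_from_z(z):
    """Inverse relation, p = erfc(Z / sqrt(2)) / 2."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Conventional discovery and exclusion thresholds.
print(f"p(Z = 5)    = {p_from_z(5.0):.3e}")
print(f"Z(p = 0.05) = {z_from_p(0.05):.3f}")

# Non-linearity: averaging Z-values is not the same as converting the
# average p-value. The two p-values below are made up for illustration.
p_values = [0.001, 0.2]
mean_of_z = mean([z_from_p(p) for p in p_values])
z_of_mean_p = z_from_p(mean(p_values))
print(f"mean of Z = {mean_of_z:.3f},  Z(mean p) = {z_of_mean_p:.3f}")
```

For $p$-values below 0.5 the map $p \to Z$ is convex, so the mean of the $Z$-values is never smaller than the $Z$-value of the mean $p$.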
Known background case.

The Poisson probability of observing $n$ events, given a mean $\mu$, is

$$P(n|\mu) = e^{-\mu} \mu^n / n!. \tag{2}$$

Consider first the idealized case that the signal and background Poisson means $s$ and $b$ are both known exactly. One can then generate pseudo-experiment results for $n$, using $\mu = s + b$ for the discovery case, and $\mu = b$ for the exclusion case. A large number of simulated pseudo-experiments can be generated randomly via Monte Carlo, as described in the Introduction. However, for all cases in this letter, it is equivalent but much more efficient and accurate to consider exactly once each result $n$ that can contribute non-negligibly, and then weight the results according to the probability of occurrence.

The $p$-value for discovery, if $n$ events are observed, is

$$p_{\rm disc}(n, b) = \sum_{k=n}^{\infty} P(k|b) = \gamma(n, b)/\Gamma(n), \tag{3}$$

while that for exclusion is

$$p_{\rm excl}(n, b, s) = \sum_{k=0}^{n} P(k|s+b) = \Gamma(n+1, s+b)/\Gamma(n+1), \tag{4}$$

where $\Gamma(x)$, $\gamma(x, y)$, and $\Gamma(x, y)$ are the ordinary, lower incomplete, and upper incomplete gamma functions, respectively. The median $p$-value among the pseudo-experiments can now be converted, using eq. (1), to obtain $Z^{\rm med}_{\rm disc}(s, b)$ and $Z^{\rm med}_{\rm excl}(s, b)$.

Some typical results for $Z^{\rm med}_{\rm disc}$ and $Z^{\rm med}_{\rm excl}$ as a function of $b$ are shown in Figure 1. They each have a "sawtooth" shape, rather than being monotonic as one might perhaps expect. This illustrates the unfortunate feature mentioned in the Introduction that the median expected $Z$ can increase with increasing $b$. As noted in [2, 3] for $Z^{\rm med}_{\rm disc}$, the underlying reason is that the allowed values of $n$ are discrete (integers), causing the median to get "stuck" instead of varying continuously in response to changes in $s$ or $b$. We emphasize that this sawtooth behavior is exactly reproducible for any sufficiently large number of pseudo-experiments, and has nothing to do with randomness from insufficient sampling. It is more prominent for exclusion than for discovery, because the number of events relevant for the median pseudo-experiment is smaller. Also, note that for larger $b$, the sawteeth get closer together as the integer $n$ of the median gets larger, but the height of the sawtooth envelope remains significant. This is effectively a sort of practical randomness in $Z^{\rm med}$, as tiny changes in $s$ or $b$ will move one between the top and the bottom of the sawtooth envelope.

We now consider several alternatives to $Z^{\rm med}$. First, one can take the arithmetic mean of the $Z$-values directly, which we call $Z^{\rm mean}$. (In computing $Z^{\rm mean}_{\rm disc}$, we use $Z = 0$ for no observed events, $n = 0$. A reasonable alternative definition for both $Z^{\rm mean}_{\rm disc}$ and $Z^{\rm mean}_{\rm excl}$ would be to use $Z = 0$ for all outcomes $n$ that give a negative $Z$. That would give slightly larger values for $Z^{\rm mean}$, but usually negligibly so except when $Z^{\rm mean}$ is uninterestingly small anyway.) Second, one can take the arithmetic mean of the $p$-values, and then convert it to a $Z$-value, which we call $Z^{p\,{\rm mean}}$. Third, one can consider the $Z$-value obtained for the mean $n$ (i.e., the average over the simulated $n_1, n_2, n_3, \ldots$); the mean data has been used for computing the expected significance in [5, 6] and [2, 3], and was called the Asimov data in the latter references. Refs. [2, 3] obtained an Asimov approximation to $Z^{\rm med}_{\rm disc}$:

$$Z^{\rm CCGV}_{\rm disc} = \sqrt{2\,[(s + b) \ln(1 + s/b) - s]}, \tag{5}$$

and ref. [4] gave a similar result for exclusion:

$$Z^{\rm KM}_{\rm excl} = \sqrt{2\,[s - b \ln(1 + s/b)]}. \tag{6}$$

These are both based on a likelihood ratio method approximation (valid in the limit of a large event sample) for $Z$ given in [7] in the context of $\gamma$-ray astronomy.

In this letter, we propose instead to simply use for the Asimov approximation the exact $p$-values in eqs. (3) and (4) with $n$ replaced by its expected means:

$$\langle n_{\rm disc} \rangle = s + b, \qquad \langle n_{\rm excl} \rangle = b, \tag{7}$$

so that

$$p^{\rm Asimov}_{\rm disc} = \gamma(s + b, b)/\Gamma(s + b), \tag{8}$$

$$p^{\rm Asimov}_{\rm excl} = \Gamma(b + 1, s + b)/\Gamma(b + 1), \tag{9}$$

which can be readily converted to $Z$-values using eq. (1). We call this the "exact Asimov significance" and denote it by $Z^{\rm A}$.

Along with $Z^{\rm med}$, Figure 1 also shows $Z^{\rm mean}$ and $Z^{\rm A}$ for the discovery and exclusion cases, together with $Z^{\rm CCGV}_{\rm disc}$ and $Z^{\rm KM}_{\rm excl}$, as a function of $b$, for fixed $s = 3$, 6, and 12. Both $Z^{\rm mean}$ and $Z^{\rm A}$ lie within the $Z^{\rm med}$ sawtooth envelopes, but decrease monotonically with $b$. We conclude that they are both sensible measures of the expected significance. In the discovery case, $Z^{\rm mean}$ is generally slightly more conservative than $Z^{\rm A}$, and the reverse is true for the exclusion case. The previously known Asimov approximations $Z^{\rm CCGV}_{\rm disc}$ and $Z^{\rm KM}_{\rm excl}$ of refs. [2, 3] and [4] are considerably less conservative, lying near the upper edges of the $Z^{\rm med}$ sawtooth envelopes.

Not shown in Fig. 1 is $Z^{p\,{\rm mean}}$, which we find is much lower than all of the others, due to being dominated by unlikely outcomes with large $p$-values, and therefore not a reasonable measure of the expected significance. (Although we do not recommend its use, we note the amusing fact $Z^{p\,{\rm mean}}_{\rm disc} = Z^{p\,{\rm mean}}_{\rm excl}$, the proof of which does not rely on the assumed probability distribution, and so also holds exactly in the case of an uncertain background discussed below.) One sometimes sees $s/\sqrt{b}$ used as an estimate, but this is much larger than the $Z$'s shown in Fig. 1, and, as is well-known, is not a good estimate of the expected significance except when $b$ is large.

Uncertain background case.
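Before background uncertainties are introduced, it may help to have the known-background quantities of eqs. (3)-(9) in executable form. The following is a minimal standard-library sketch, not the authors' code: all function names are ours, the incomplete gamma ratio is summed from its power series (an implementation choice made here for self-containment), and the median is taken as the $Z$-value at the median count, which is equivalent because the $p$-values are monotonic in $n$.

```python
import math
from statistics import NormalDist


def z_from_p(p):
    """Eq. (1): Z = sqrt(2) * erfcinv(2 p); requires 0 < p < 1."""
    return -NormalDist().inv_cdf(p)


def reg_lower_gamma(a, x):
    """gamma(a, x) / Gamma(a) for a > 0, summed from its power series.

    Illustrative only: adequate for the modest arguments used here,
    not tuned for extreme values of a or x.
    """
    if x <= 0.0:
        return 0.0
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total, k = term, 0
    while term > 1e-17 * total or k < x:
        k += 1
        term *= x / (a + k)
        total += term
    return total


def p_disc(n, b):
    """Eq. (3); n may be non-integer. n = 0 gives p = 1."""
    return 1.0 if n <= 0 else reg_lower_gamma(n, b)


def p_excl(n, b, s):
    """Eq. (4): Gamma(n + 1, s + b) / Gamma(n + 1)."""
    return 1.0 - reg_lower_gamma(n + 1.0, s + b)


def poisson_median(mu):
    """Smallest n whose cumulative Poisson probability reaches 1/2."""
    n, pmf = 0, math.exp(-mu)
    cdf = pmf
    while cdf < 0.5:
        n += 1
        pmf *= mu / n
        cdf += pmf
    return n


def z_med_disc(s, b):
    return z_from_p(p_disc(poisson_median(s + b), b))


def z_med_excl(s, b):
    return z_from_p(p_excl(poisson_median(b), b, s))


def z_asimov_disc(s, b):   # eq. (8)
    return z_from_p(p_disc(s + b, b))


def z_asimov_excl(s, b):   # eq. (9)
    return z_from_p(p_excl(b, b, s))


def z_ccgv_disc(s, b):     # eq. (5)
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def z_km_excl(s, b):       # eq. (6)
    return math.sqrt(2.0 * (s - b * math.log(1.0 + s / b)))


s = 6.0
for b in (0.7, 1.6):
    print(f"b = {b}: Z_med_excl = {z_med_excl(s, b):.2f}, "
          f"Z_A_excl = {z_asimov_excl(s, b):.2f}, "
          f"Z_KM_excl = {z_km_excl(s, b):.2f}")
```

With $s = 6$, raising $b$ from 0.7 to 1.6 leaves the median count at $n = 1$, so in this sketch $Z^{\rm med}_{\rm excl}$ rises even though the background has grown, while $Z^{\rm A}_{\rm excl}$ falls.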

More realistically, the expected mean number of background counts can be subject to uncertainties of various sorts. In high-energy physics, the background uncertainty for a future experiment is often dominated by limitations in perturbative theoretical calculations or systematic effects, both of which are unknown (and indeed difficult to rigorously define) but can be roughly estimated or conjectured. There are also statistical uncertainties that will arise from a limited number of events in control or sideband regions. Here, we will consider, in part as a proxy for other types of uncertainties, the "on-off problem" (see for example [7–12]), in which the background is estimated by a measurement of $m$ Poisson events in a supposed background-only (off) region. The ratio of the background Poisson mean in this region to the background mean in the signal (on) region is assumed to be a known number $\tau$. The point estimates for the Poisson mean and the uncertainty of the background in the signal region are then

$$\hat b = m/\tau, \qquad \Delta_{\hat b} = \sqrt{m}/\tau. \tag{10}$$

While this Poisson variance is certainly not a rigorous model for systematic or perturbative calculation uncertainties, we propose that it can also be used as a rough proxy for them, in the sense that a proposed estimate for $\hat b$ and $\Delta_{\hat b}$ can be traded for $(m, \tau)$ in the on-off problem.

FIG. 1. Expected significances for discovery (left) and exclusion (right), for signal means $s = 3$, 6, and 12, as functions of the background mean $b$. Shown are $Z^{\rm med}$, $Z^{\rm mean}$, $Z^{\rm A}$, and the approximations $Z^{\rm CCGV}$ and $Z^{\rm KM}$ from refs. [2, 3] and [4]. The median expected significances show a sawtooth behavior, rather than decreasing monotonically with $b$.

We now assign probabilities $\Delta P$ to each possible count outcome $n$ in the on region, given $m$ events in the off region, following a hybrid Bayesian-frequentist approach by averaging [10–14] over the possible background means using a Bayesian posterior with a flat prior,

$$P(b|m, \tau) = \tau (\tau b)^m e^{-\tau b}/m!, \tag{11}$$

(normalized so that $\int_0^\infty db\, P(b|m, \tau) = 1$), from which we then find

$$\begin{aligned}
\Delta P(n, m, \tau, s) &= \int_0^\infty db\, P(b|m, \tau)\, \frac{e^{-(s+b)} (s+b)^n}{n!} \\
&= \frac{\tau^{m+1} e^{-s}}{\Gamma(m+1)\,\Gamma(n+1)} \int_0^\infty db\, b^m (s+b)^n e^{-b(\tau+1)} \\
&= \frac{\tau^{m+1} e^{-s}}{\Gamma(m+1)} \sum_{k=0}^{n} \frac{s^k}{k!\,(n-k)!}\, \frac{\Gamma(n-k+m+1)}{(\tau+1)^{n-k+m+1}}.
\end{aligned} \tag{12}$$

Note that here the true background mean $b$ appears only as an integration variable, and that $\sum_{n=0}^{\infty} \Delta P(n, m, \tau, s) = 1$, for any $m, \tau, s$. The limit $\lim_{\tau \to \infty} \Delta P(n, m, \tau, s)$, with $m/\tau = \hat b$ held fixed, recovers the Poisson distribution $P(n|s + \hat b)$. In the second equality of eq. (12), we have written a form valid for non-integer $n$ and $m$, both to define $Z^{\rm A}$ below and to account for the fact that an estimated $\hat b$ and $\Delta_{\hat b}$ may correspond to non-integer $m$. The third equality is more useful when $n$ is an integer, and also in the case $s = 0$ where only the $k = 0$ term survives and one can replace $n!$ by $\Gamma(n + 1)$.

The $p$-value for discovery has two equivalent forms,

$$p_{\rm disc}(n, m, \tau) = \sum_{k=n}^{\infty} \Delta P(k, m, \tau, 0) = B(1/(\tau+1), n, m+1)/B(n, m+1), \tag{13}$$

where the first form was given in [10–13] and the second (involving the ordinary and incomplete beta functions) was obtained in a frequentist approach by [8, 9]. Despite appearances, these two forms are equivalent [11, 12], justifying the choice made in eq. (11).

For exclusion, we find

$$\begin{aligned}
p_{\rm excl}(n, m, \tau, s) &= \sum_{k=0}^{n} \Delta P(k, m, \tau, s) \\
&= \sum_{k=0}^{n} \frac{\tau^{m+1}}{(\tau+1)^{k+m+1}}\, \frac{\Gamma(k+m+1)\,\Gamma(n-k+1, s)}{k!\,\Gamma(m+1)\,\Gamma(n-k+1)} \\
&= \frac{\tau^{m+1}}{\Gamma(n+1)\,\Gamma(m+1)} \int_0^\infty db\, e^{-\tau b}\, b^m\, \Gamma(n+1, s+b) \\
&= \left[ \Gamma(n+1, s) - e^{-s} \int_0^\infty db\, e^{-b} (s+b)^n\, \Gamma(m+1, \tau b)/\Gamma(m+1) \right] \Big/ \Gamma(n+1),
\end{aligned} \tag{14}$$

where the first form (following directly from the definition) involves a double sum, the second single-sum form is more efficient if $n$ is an integer, while the last two forms are valid for non-integer $n, m$, have differing ease of numerical evaluation depending on the inputs, and follow from each other by integration by parts.

We can now consider the expected significances in the case that $\hat b$ and $\Delta_{\hat b}$ have been fixed, corresponding either to a calculation of the background with limited accuracy, or to a measurement of $m$ for a given $\tau$. This is done by generating pseudo-experiments for $n$, distributed according to the probabilities $\Delta P(n, m, \tau, s)$ for discovery and $\Delta P(n, m, \tau, 0)$ for exclusion, and then evaluating the $p$-values according to eq. (13) for discovery and eq. (14) for exclusion. As before, we consider $Z^{\rm med}$, $Z^{\rm mean}$, and $Z^{\rm A}$ obtained from the allowed pseudo-experiment data, each as functions of $s, \hat b, \Delta_{\hat b}$. Here, $Z^{\rm A}$ is obtained by replacing $n$ by its mean expected values. For the discovery and exclusion cases respectively, we find these are

$$\langle n_{\rm disc} \rangle = s + \tilde b, \tag{15}$$

$$\langle n_{\rm excl} \rangle = \tilde b, \tag{16}$$

where

$$\tilde b = (m + 1)/\tau = \hat b + \Delta_{\hat b}^2/\hat b. \tag{17}$$

Then

$$p^{\rm Asimov}_{\rm disc}(s, \hat b, \Delta_{\hat b}) = p_{\rm disc}(\langle n_{\rm disc} \rangle, m, \tau), \tag{18}$$

$$p^{\rm Asimov}_{\rm excl}(s, \hat b, \Delta_{\hat b}) = p_{\rm excl}(\langle n_{\rm excl} \rangle, m, \tau, s), \tag{19}$$

which are converted to $Z^{\rm A}_{\rm disc}$ and $Z^{\rm A}_{\rm excl}$ as usual.

Note that the mean expected event count in the absence of signal, $\tilde b$, is distinct from, and larger than, the measured background estimate, $\hat b = m/\tau$. The fact that $\tilde b > \hat b$ can be understood heuristically as the statement that, for finite $\tau$, a given $m$ is more likely to have been a downward rather than upward fluctuation. As an extreme example, if $m = 0$, this could be a downward fluctuation of a non-zero true background, but obviously it could not be an upward one. Given $(m, \tau)$, depending on the experimental situation there may be other justifiable probability density functions besides eq. (11), and the subsequent discussion carries through similarly for any other choice. If we had chosen a different Bayesian distribution in eq. (11), then the expression for $\tilde b$ (in terms of $m$ and $\tau$) would change. For this reason, we prefer to give results directly in terms of the independent variable $\hat b = m/\tau$ corresponding to the direct measurement (or calculation) of the background, rather than $\tilde b$.

Refs. [3] and [4] had earlier provided Asimov approximations to the median discovery and exclusion significances, respectively.
[Equations (5) and (6) above are the limits as $\Delta_{\hat b} \to 0$.] However, these cannot be compared directly with our results for $\Delta_{\hat b} \neq 0$, since they take the (unknown) true background mean $b$ as input, rather than the point estimate $\hat b = m/\tau$ as we do here. If one ignores the distinction and considers $b = \hat b$, then $Z^{\rm A}_{\rm disc}$ and $Z^{\rm A}_{\rm excl}$ as defined in this letter give more conservative significances than those obtained from [3, 4].

Results for $Z^{\rm med}$, $Z^{\rm mean}$, and $Z^{\rm A}$ for discovery and exclusion are shown in Figure 2 for $\Delta_{\hat b}/\hat b = 0.2$, this time for $s$ and $\hat b$ both taken proportional to an integrated luminosity factor $\int L\, dt$ which represents the temporal progress of the experiment. We consider fixed ratios $s/\hat b$ as labeled in the figure ($s/\hat b = 2$, 10, and 100 for discovery; 5 and a second ratio below 1 for exclusion). The sawtooth behavior of $Z^{\rm med}$ is evident, while $Z^{\rm mean}$ and $Z^{\rm A}$ both lie within or near its envelope, and can be taken as reasonable and monotonic measures of the expected discovery and exclusion capabilities. Note that $Z^{\rm A}_{\rm excl}$ is more conservative than $Z^{\rm med}_{\rm excl}$ or $Z^{\rm mean}_{\rm excl}$ for higher integrated luminosities, while $Z^{\rm mean}$ is slightly more conservative for discovery. As before, $Z^{p\,{\rm mean}}_{\rm disc} = Z^{p\,{\rm mean}}_{\rm excl}$, not shown, gives far smaller values and cannot be recommended. In Fig. 3, we show $Z^{\rm A}_{\rm disc}$ and $Z^{\rm A}_{\rm excl}$ for $\Delta_{\hat b}/\hat b = 0$, an intermediate value, and 0.5. [...] $s/\hat b$ is smaller.

Conclusion.

In this letter, we have critically examined the use of median expected significance $Z^{\rm med}$ and possible alternatives. We find that either $Z^{\rm mean}$ or $Z^{\rm A}$ as defined and evaluated above would be reasonable measures of the discovery and exclusion capabilities of counting experiments with known or uncertain backgrounds. They both give results that are similar to $Z^{\rm med}$, but are monotonic in the expected way with respect to changes in background and signal means and background uncertainties. They are also considerably more conservative than previous Asimov approximations, especially when the background is small. The exclusion case with low event counts, where the sawtooth behavior of $Z^{\rm med}_{\rm excl}$ is particularly prominent and problematic, is noteworthy, as the success of the Standard Model of particle physics suggests the future importance of limit-setting capabilities for experimental signals with small rates including rare decays, non-standard interactions, new heavy particle production, and dark matter searches.

In comparing $Z^{\rm mean}$ and $Z^{\rm A}$, we note that there is no "correct" measure of the expected significance, since the various $Z$ definitions are simply different answers to different questions. The $Z^{\rm A}$ measure is typically slightly less conservative in evaluating discovery, and more conservative for exclusion prospects, than $Z^{\rm mean}$. It may be simpler to extend $Z^{\rm A}$ to the case of experiments that feature more complex statistics than just integer counts of events. Also, the $Z^{\rm A}$ measure, based on the means of the data distributions, is often simpler to evaluate; in the counting experiments considered here, this requires only directly plugging into eqs. (8)-(9) for a known background, or eqs. (10) and (13)-(19) for an uncertain background. For these reasons, we advocate that $Z^{\rm A}$ be the standard significance measure for projected exclusions and discovery sensitivities in counting experiments.

FIG. 2. The median, mean, and exact Asimov expected significances for discovery and exclusion, for fixed ratios $s/\hat b$ as labeled, as a function of $s$, for $\Delta_{\hat b}/\hat b = 0.2$. Here $s$ and $\hat b$ are assumed to be proportional to their respective cross-sections multiplied by the integrated luminosity $\int L\, dt$ of the experiment. (Panels: $Z_{\rm disc}$ and $Z_{\rm excl}$ versus $s = \sigma_s \int L\, dt$.)

FIG. 3. The expected significance measures $Z^{\rm A}_{\rm disc}$ and $Z^{\rm A}_{\rm excl}$, for fixed ratios $s/\hat b$ as labeled, as a function of $s = \sigma_s \int L\, dt$, for $\Delta_{\hat b}/\hat b = 0$, an intermediate value, and 0.5, as labeled. For discovery the panel shows $s/\hat b = 1$, 10, and 100; for exclusion it shows $s/\hat b = 5$ and a second ratio below 1.

Acknowledgments.

This work is supported in part by the National Science Foundation under grant numbers 1719273 and 2013340. This work was supported in part by the Department of Energy (DE-SC0007859).

[1] G. Cowan, "Statistics," in M. Tanabashi et al. (Particle Data Group), Phys. Rev. D 98, 030001 (2018).
[2] G. Cowan, K. Cranmer, E. Gross and O. Vitells, "Asymptotic formulae for likelihood-based tests of new physics," Eur. Phys. J. C 71, 1554 (2011) [arXiv:1007.1727 [physics.data-an]].
[3] G. Cowan, "Two developments in tests for discovery: use of weighted Monte Carlo events and an improved measure," Progress on Statistical Issues in Searches, SLAC, June 4-6, 2012.
[4] N. Kumar and S. P. Martin, "Vectorlike Leptons at the Large Hadron Collider," Phys. Rev. D 92, no.11, 115018 (2015) [arXiv:1510.03456 [hep-ph]].
[5] V. Bartsch and G. Quast, "Expected Signal Observability at Future Experiments," CERN-CMS-NOTE-2005-004.
[6] G. Aad et al. [ATLAS], "Expected Performance of the ATLAS Experiment - Detector, Trigger and Physics," [arXiv:0901.0512 [hep-ex]].
[7] Ti-pei Li and Yu-qian Ma, "Analysis methods for results in gamma-ray astronomy," Astrophysical Journal 272 (1983) 317.
[8] N. Gehrels, "Confidence limits for small numbers of events in astrophysical data," Astrophys. J. 303, 336-346 (1986) doi:10.1086/164079
[9] S.N. Zhang, D. Ramsden, "Statistical Data Analysis for Gamma-Ray Astronomy," Experimental Astronomy 1 (1990) 145-163. doi:10.1007/BF00462037
[10] D.E. Alexandreas et al., "Point source search techniques in ultra high energy gamma ray astronomy," Nucl. Instrum. Meth. A 328, no.3, 570-577 (1993) doi:10.1016/0168-9002(93)90677-A
[11] J. T. Linnemann, "Measures of significance in HEP and astrophysics," Proceedings of PhyStat 2003: Statistical Problems in Particle Physics, Astrophysics, and Cosmology (SLAC, Stanford, California USA, Sept. 8-11, 2003) eConf C030908, MOBT001 (2003) [arXiv:physics/0312059 [physics.data-an]].
[12] R. D. Cousins, J. T. Linnemann and J. Tucker, "Evaluation of three methods for calculating statistical significance when incorporating a systematic uncertainty into a test of the background-only hypothesis for a Poisson process," Nucl. Instrum. Meth. A 595, no.2, 480-501 (2008) [arXiv:physics/0702156 [physics.data-an]].
[13] T. J. Haines, "An experimental search for evidence of a nucleon decay signal above neutrino backgrounds," UC Irvine PhD thesis (1986) UMI-87-03928.
[14] R. D. Cousins and V. L. Highland, "Incorporating systematic uncertainties into an upper limit," Nucl. Instrum. Meth. A 320, 331-335 (1992) doi:10.1016/0168-9002(92)90794-5

Supplementary Material
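As a companion to eqs. (12)-(14) and (17) of the main text, the sketch below evaluates $\Delta P(n, m, \tau, s)$ from the third form of eq. (12) and checks its normalization, its mean $s + \tilde b$, and its Poisson limit numerically. It is an illustration under stated assumptions rather than the authors' implementation: the function names are ours, the $p$-values are restricted to integer $n$ (non-integer $n$ would need the incomplete beta and gamma forms), and the example values of $m$, $\tau$, and $s$ are arbitrary.

```python
import math


def delta_p(n, m, tau, s):
    """Eq. (12), third form: probability of n on-region counts (integer n).

    m may be non-integer; each term is assembled in log space to avoid
    overflow in the gamma functions and powers.
    """
    log_norm = (m + 1.0) * math.log(tau) - s - math.lgamma(m + 1.0)
    total = 0.0
    for k in range(n + 1):
        if s == 0.0 and k > 0:
            break                      # only the k = 0 term survives for s = 0
        log_sk = k * math.log(s) if k > 0 else 0.0
        j = n - k                      # counts attributed to background
        total += math.exp(
            log_norm + log_sk
            - math.lgamma(k + 1.0) - math.lgamma(j + 1.0)
            + math.lgamma(j + m + 1.0)
            - (j + m + 1.0) * math.log(tau + 1.0)
        )
    return total


def p_disc(n, m, tau):
    """Eq. (13), first form, integer n: P(count >= n) with no signal."""
    return 1.0 - sum(delta_p(k, m, tau, 0.0) for k in range(n))


def p_excl(n, m, tau, s):
    """Eq. (14), first form, integer n: P(count <= n) with signal present."""
    return sum(delta_p(k, m, tau, s) for k in range(n + 1))


def poisson(n, mu):
    """Eq. (2), evaluated in log space."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1.0))


# Example: b_hat = 4 with Delta_b = 2, i.e. m = 4 and tau = 1, plus s = 3.
m, tau, s = 4.0, 1.0, 3.0
probs = [delta_p(n, m, tau, s) for n in range(200)]   # tail beyond 200 is negligible
norm = sum(probs)
mean_n = sum(n * p for n, p in enumerate(probs))
b_tilde = (m + 1.0) / tau                              # eq. (17)
print(f"sum = {norm:.6f}, <n> = {mean_n:.4f}, s + b_tilde = {s + b_tilde:.4f}")

# Large tau at fixed m / tau = 5 approaches the Poisson distribution P(n | s + b_hat).
near_poisson = delta_p(7, 5000.0, 1000.0, 2.0)
print(f"tau = 1000: {near_poisson:.5f}  vs  Poisson: {poisson(7, 7.0):.5f}")
```

Because the example gives integer means $\langle n_{\rm disc} \rangle = 8$ and $\langle n_{\rm excl} \rangle = 5$, the Asimov $p$-values of eqs. (18) and (19) could be read off here as `p_disc(8, m, tau)` and `p_excl(5, m, tau, s)`.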

As noted above, in the case where the background estimate is determined by the method of measuring $m$ in the "off region" and translating it to the "on region" through $\tau$, it is possible to consider different Bayesian priors for the true background mean $b$, rather than the flat prior chosen in the main text. For a simple two-parameter class of examples, consider

$${\rm Prior}(b) \propto b^q e^{-\theta b}, \tag{20}$$

where $q = \theta = 0$ recovers the choice made in the main text. Then one finds a normalized Bayesian posterior distribution for the background, in place of eq. (11):

$$P(b|m, \tau) = (\tau + \theta)^{m+q+1}\, b^{m+q}\, e^{-b(\tau+\theta)}/\Gamma(m + q + 1). \tag{21}$$

The calculations of $\Delta P$, $p_{\rm disc}$, and $p_{\rm excl}$ would then go through as before with the replacements $\tau \to \tau + \theta$ and $m \to m + q$, with the results still expressible in terms of the independent variables $\hat b$ and $\Delta_{\hat b}$ as defined by eq. (10). In particular, one would have $\tilde b = (m + q + 1)/(\tau + \theta) = \hat b\, [1 + (q + 1) \Delta_{\hat b}^2/\hat b^2] / [1 + \theta \Delta_{\hat b}^2/\hat b]$ in that case. However, in the absence of a compelling reason to the contrary, we consider the simple flat prior $q = \theta = 0$ to be preferred, as it successfully reproduces the frequentist result eq. (13) for $p_{\rm disc}$, as shown in [11, 12]. In any case, the $Z^{\rm mean}$ and $Z^{\rm A}$ measures can be defined as above with any suitable choice of prior as dictated by realistic considerations.

We now show some further results supplementary to our main discussion. In Fig. 4, we first show the probabilities $\Delta P(n, m, \tau, s)$ for discovery (left panel) and $\Delta P(n, m, \tau, 0)$ for exclusion (right panel), for a fixed $\hat b = m/\tau$, as a function of event count $n$ in the signal (on) region, for various values of $\tau$. The lines for $\tau = \infty$ in both panels correspond to the Poisson distribution $P(n|\mu)$ with $\mu = s + \hat b$ for the discovery case, and $\mu = \hat b$ for the exclusion case. For a fixed $\hat b$, as $\tau$ gets larger, the $\Delta P$ distribution approaches the Poisson distribution, as expected.

FIG. 4. The distributions $\Delta P(n, m, \tau, s)$, for $s = 5$, $\hat b = m/\tau = 5$ (left panel) and $s = 0$, $\hat b = m/\tau = 10$ (right panel), for $\tau = 1$, 3, and $\infty$. In each case, the result for $\tau = \infty$ is the Poisson distribution $P(n|s + \hat b) = P(n|10)$.

Intuitively, we also expect the discovery and exclusion significance measures to decrease dramatically when the background uncertainty gets larger. From Fig. 5, we see that the median expected significance, once again, suffers from the sawtooth behavior. However, the expected significances $Z^{\rm mean}$ and $Z^{\rm A}$ behave as we expect, and, as argued above, can be taken as reasonable measures of the expected discovery and exclusion significances. Also, it is evident from the figure that the $(\Delta_{\hat b}, \hat b) \to (0, b)$ limit works out smoothly.

FIG. 5. The median, mean, and exact Asimov expected significances for discovery with $s = 24$, $\hat b = 10$ (left) and exclusion with $s = 12$, $\hat b = 10$ (right), as a function of $\Delta_{\hat b}/\hat b$.

One can consider other measures as alternatives to the median, mean, or Asimov expected $Z$. For a large number of pseudo-experiments simulated for the discovery case, we can also count the number of these experiments in which we have greater than $5\sigma$ discovery, and thus obtain a probability $P(Z_{\rm disc} > 5)$. Fig. 6 shows $P(Z_{\rm disc} > 5)$ for $\Delta_{\hat b}/\hat b = 0$ (left panel), and 0.5 (right panel). As we expect, $P(Z_{\rm disc} > 5)$ decreases, more drastically for smaller $s/\hat b$, as the background uncertainty increases. However, this measure also shows a sawtooth behavior, rather than increasing monotonically with $s = \sigma_s \int L\, dt$. Similarly, Fig. 7 shows the probability of obtaining greater than 95% CL exclusion in a large number of pseudo-experiments simulated for the exclusion case, $P(Z_{\rm excl} > 1.645)$, for $\Delta_{\hat b}/\hat b = 0$ (left panel), and 0.5 (right panel). Once again, increasing the background uncertainty reduces $P(Z_{\rm excl} > 1.645)$ [...] $s/\hat b$. And, as was the case with $P(Z_{\rm disc} > 5)$, this measure does not vary monotonically with $s = \sigma_s \int L\, dt$.

FIG. 6. The probability of obtaining a significance $Z_{\rm disc} > 5$, corresponding to greater than $5\sigma$ discovery, in a large number of pseudo-experiments generated for the discovery case, for fixed ratios $s/\hat b = 2$, 5, 10, and 50, as a function of $s = \sigma_s \int L\, dt$, for $\Delta_{\hat b}/\hat b = 0$ (left) and $\Delta_{\hat b}/\hat b = 0.5$ (right).

FIG. 7. The probability of obtaining a significance $Z_{\rm excl} > 1.645$, corresponding to greater than 95% CL exclusion, in a large number of pseudo-experiments generated for the exclusion case, for three fixed ratios $s/\hat b$ as labeled (a value below 1, 1, and 10), as a function of $s = \sigma_s \int L\, dt$, for $\Delta_{\hat b}/\hat b = 0$ (left) and $\Delta_{\hat b}/\hat b = 0.5$ (right).

Finally, Fig. 8 shows the probability of obtaining a significance greater than a certain $Z$ in a large number of pseudo-experiments simulated for both discovery (left panel) and exclusion (right panel) cases, for fixed $(s, \hat b)$ and $\Delta_{\hat b}/\hat b = 0$, 0.5, as a function of $Z$. As expected, both $P(Z_{\rm disc} > Z)$ and $P(Z_{\rm excl} > Z)$ decrease with increasing $Z$, and with increasing background uncertainty. However, for smaller $s/\hat b$, background uncertainty does not have much impact on the results.

FIG. 8. The probability of obtaining a discovery significance $Z_{\rm disc} > Z$ (left) and an exclusion significance $Z_{\rm excl} > Z$ (right) in a large number of pseudo-experiments, for various $(s, \hat b)$ as labeled, for $\Delta_{\hat b}/\hat b = 0$ and 0.5.

A Python implementation of various significance measures for projected exclusions and discovery sensitivities in counting experiments examined in this letter, including the advocated $Z^{\rm A}$, is made available in a code repository Zstats at https://github.com/prudhvibhattiprolu/Zstats . To illustrate the usage of the code, the repository also has short programs that produce the data in each of the figures in this paper. More information about all functions in this package can also be accessed using the Python help function.
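The probability measure $P(Z_{\rm disc} > 5)$ discussed above can be sketched for the known-background case $\Delta_{\hat b} = 0$ using only the Poisson tail of eq. (3). This is an independent illustration, not the Zstats interface: the function names, the scan range, and the step size are our own choices, and the fixed ratio $s/b = 2$ is taken from the Fig. 6 labels.

```python
import math

P_FIVE_SIGMA = 0.5 * math.erfc(5.0 / math.sqrt(2.0))   # p-value at Z = 5


def poisson_tail(n, mu):
    """P(N >= n) for N ~ Poisson(mu), summed upward from n for accuracy."""
    if n <= 0:
        return 1.0
    term = math.exp(n * math.log(mu) - mu - math.lgamma(n + 1.0))
    total, k = term, n
    while term > 1e-17 * total or k < mu:
        k += 1
        term *= mu / k
        total += term
    return total


def n_crit(b):
    """Smallest integer count whose discovery p-value, eq. (3), gives Z > 5."""
    n = 1
    while poisson_tail(n, b) >= P_FIVE_SIGMA:
        n += 1
    return n


def prob_5sigma(s, b):
    """P(Z_disc > 5) for known b: chance that Poisson(s + b) reaches n_crit(b)."""
    return poisson_tail(n_crit(b), s + b)


ratio = 2.0                                    # fixed s / b
scan = [5.0 + 0.1 * i for i in range(351)]     # s from 5 to 40
probs = [prob_5sigma(s, s / ratio) for s in scan]

# Count the scan steps at which the probability goes down as s goes up.
drops = sum(1 for lo, hi in zip(probs, probs[1:]) if hi < lo)
print(f"{drops} downward steps among {len(scan)} scan points")
```

Each time the growing background pushes the critical count up by one, the probability falls abruptly before resuming its rise, which is the sawtooth behavior described in the text.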