Resolving Histogram Binning Dilemmas with Binless and Binfull Algorithms
Abram Krislock and Nathan Krislock

May 21, 2014

Department of Physics, Stockholm University – Oskar Klein Centre, AlbaNova, SE-106 91, Stockholm, Sweden
Department of Mathematical Sciences, Northern Illinois University, DeKalb, Illinois 60115, USA
Abstract
The histogram is an analysis tool in widespread use within many sciences, with high energy physics as a prime example. However, there exists an inherent bias in the choice of binning for the histogram, with different choices potentially leading to different interpretations. This paper aims to eliminate this bias using two "debinning" algorithms. Both algorithms generate an observed cumulative distribution function from the data, and use it to construct a representation of the underlying probability distribution function. The strengths and weaknesses of these two algorithms are compared and contrasted. The applicability and future prospects of these algorithms are also discussed.
1 Introduction

High energy physics (HEP) research makes common use of the histogram as a data analysis tool. A great deal of particle physics data, both experimental [1] and phenomenological [2], is analyzed with histograms. Many other areas of scientific research involving statistical analysis also commonly use the histogram. The techniques described in this paper apply to any such similar analysis. For concreteness, in this paper we will restrict our discussion to that of HEP research.

The histogram is a way of representing the data in such a way as to have it look like and be comparable to the underlying probability distribution function (UPDF) of the particle physics reality or model prediction. We will refer to the histogram, or any other representation, of a given data set as an observed probability distribution function (OPDF). The UPDF depends on some set of parameters, and so a set of bins for the OPDF histogram are chosen in terms of those parameters.

However, the choice of these bins involves an inherent bias [3]. The analysis of histogrammed data can be highly dependent upon the set of bins chosen. This is particularly true for the case of performing a fit of a histogram with a theoretical or parameterized UPDF; the results obtained from different choices of bin sets may differ more than the reported uncertainty in the fit.

We demonstrate this problem with a simple example from HEP phenomenology: Early searches for supersymmetry at the Large Hadron Collider were expecting to find kinks or endpoints within kinematical distributions such as invariant masses [4], or the m_T2 variable [5, 6]. We show such a possible signal in Fig. 1.1, which shows two histograms of the same data but with different bin sets. The two histograms are then fit with the following function:

    y = { m_1 (x − x_kink) + b ,  if x < x_kink ,
        { m_2 (x − x_kink) + b ,  if x ≥ x_kink .     (1.1)

This equation describes a bent line with a kink at x = x_kink.
The slopes of the lines on either side of the kink are m_1 and m_2, and b is the value of the function at the kink. The location of the kink is the only parameter of interest. Fig. 1.1 shows the two histograms fit with this function, as well as the fit result and uncertainty for the parameter x_kink. As the figure shows, the locations of x_kink from the two fits do not agree. If we treat the fit results from the figure as normal distributions, the probability for a sample x_kink2 to be greater than a sample x_kink1 is 0.002.
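The bent-line fit of Eq. (1.1) is straightforward to reproduce. The sketch below is ours, not the authors' code; the function name and parameter ordering are our own choices.

```python
import numpy as np

def kink_line(x, m1, m2, b, x_kink):
    """Bent line of Eq. (1.1): slope m1 below the kink, slope m2 above it,
    with value b at x = x_kink, so the two pieces join continuously."""
    return np.where(x < x_kink,
                    m1 * (x - x_kink) + b,
                    m2 * (x - x_kink) + b)
```

In practice one would hand this function to a least-squares fitter such as scipy.optimize.curve_fit, using the histogram bin centers and counts as the data; repeating the fit for two different bin sets reproduces the kind of disagreement shown in Fig. 1.1.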
Figure 1.1: The same data is used to fill two histograms with different bin sets. Eq. (1.1) is used to fit each of these from their peak bin to their last non-zero bin. The green (light gray) histogram has offset bins, with the first bin edge beginning at x = 3 and a bin width of 10. The purple (gray) histogram has a bin width of 5, beginning at the origin. The fit results for these two histograms are shown as green (gray) and purple (dark gray) lines, respectively. The locations of the kinks, which are shown in the figure legend, disagree at the 99.8% level.

The effect of such binning choices on scientific results is generally not discussed. Thus, our goal for this paper is to avoid such bias by constructing OPDF representations other than regular histograms. Our new representations must avoid the above mentioned binning dilemmas. In this paper, we will present two such "debinning" algorithms, both of which involve the relations between the OPDF and the observed Cumulative Distribution Function (OCDF). These algorithms were inspired by the method used in [3]. The "binless" algorithm never bins the data at all, instead determining the OPDF as the smoothed, numerical derivative of the OCDF. In contrast, the "binfull" algorithm uses the OCDF as a Monte Carlo generator for the OPDF. A smoothing function is applied during the generation of a very large number of points. These points are then used to create a histogram which is full of very small bins. Each of these two methods has its strengths and weaknesses, which we will also describe.

Currently, both of these algorithms assume an intimate understanding of the backgrounds inherent in a given HEP study. Our algorithms both consider the comparison between the debinned background-only OPDF and the background UPDF. We run each algorithm multiple times over the background data, scanning to find the best-fit value of the smoothing parameter. We then use the resulting smoothing parameter to generate the ultimate debinned full-data OPDF.

The remainder of this paper is organized as follows. In Sec. 2, we describe the binless algorithm by deriving its equations, describing their implementation, and presenting example plots. The binfull algorithm is described in Sec. 3, where we describe the Monte Carlo generator, smoothing function, and implementation, as well as presenting more example plots. We conclude, compare these two methods, and comment on future work in Sec. 4. Lastly, we have included Appx. A to describe the generation of the data used for testing these debinning algorithms.
Additional plots, animations, and all of our code can be found on our debinning webpage [7].
2 The Binless Algorithm

As stated in the introduction, the binless algorithm first forms the OCDF from the data. This is followed by taking the smoothed numerical derivative of it to obtain the binless OPDF.

The method of constructing an OCDF is as follows [3]. First, the data is sorted by value into an array. Thus we have, say, N data values, x_1, ..., x_N, where x_i > x_j if i > j. Then, the OCDF, z(x), is constructed from this data simply as

    z(x) = { 0 ,  if x < x_1 ,
           { i ,  if x_i ≤ x < x_{i+1} , for i ∈ [1, N − 1] ,
           { N ,  if x_N ≤ x .     (2.1)

This function z thus looks like a staircase with stairs of non-uniform length.

Next, we take the derivative of z to arrive at our binless OPDF. The reason it must be a smoothed numerical derivative is that taking a naive numerical derivative of z results in a large amount of noise in the resulting binless OPDF [8]. Thus, to obtain a binless OPDF which is even remotely useful, we must have a more sophisticated numerical differentiation algorithm. This algorithm must be able to smooth the noise introduced by numerical differentiation.

We use the method of numerical differentiation presented in [8], which is a least squares minimization problem including a Total Variation (TV) penalization term [9]. The problem is defined as follows: We are working with real functions over some domain Ω = [0, L], using the inner product ⟨u, v⟩ = ∫_Ω u(x) v(x) dx. We are trying to find the function, u(x), which describes the derivative of some data, z(x), while ensuring that u(x) does not vary too much due to the noise in z(x). The plan is to minimize the following functional:

    f[u] = (1/2) ‖Au − z‖_2² + α ‖u′‖_1 .     (2.2)

Note that the 2-norm is defined to be ‖v‖_2 = √⟨v, v⟩ = ( ∫_Ω [v(x)]² dx )^{1/2} and the 1-norm is defined as ‖v‖_1 = ∫_Ω |v(x)| dx.

The functional in Eq. (2.2) contains two terms.
The first term, the least squares term, penalizes the difference between the anti-derivative of u(x), where [Au](x) ≡ ∫_0^x u(w) dw, and z(x). The second term is the TV term which minimizes the total amount of variation of u(x). The parameter α controls the relative influence of the TV term compared to the least squares term.

In order to find the u that minimizes the functional f[u] in Eq. (2.2), we first need to determine the gradient, or functional derivative, of f. However, since the TV term in f[u] is not differentiable when u′ = 0, we will instead minimize

    f_β[u] = (1/2) ‖Au − z‖_2² + α ∫_Ω √( (u′(x))² + β ) dx ,     (2.3)

where β is a small positive parameter.

We remind the reader that the functional derivative can be defined as follows:

    ∂f[u]/∂u(x) = lim_{h→0} ( f[u + h δ(· − x)] − f[u] ) / h .     (2.4)

(In this article, whenever its meaning is unambiguous, an apostrophe or "prime" in any equation will be understood to be a regular derivative, such that u′(x) ≡ du/dx.)

Here, δ is the Dirac delta function [10]. For our purposes, the relevant properties of the Dirac delta function are

    ∫_Ω v(x) δ(x − y) dx = v(y) ,
    ∫_Ω v(x) δ′(x − y) dx = −v′(y) ,
    ∫_Ω v(x) ( ∫_0^x δ(w − y) dw ) dx = ∫_y^L v(x) dx ,     (2.5)

for any function v which is defined over Ω.

Computing the gradient of Eq. (2.3), using Eq. (2.5), is left as an exercise for the reader. The result is

    g[u] ≡ ∂f_β[u]/∂u = [ A‡A + α L(u) ] u − A‡ z ,     (2.6)

where the L operator is given by

    L(u) v ≡ −( v′ / √( (u′)² + β ) )′ ,     (2.7)

and A‡ is the adjoint of the A anti-derivative operator, and is given by [A‡v](x) ≡ ∫_x^L v(w) dw.
Note that A‡ satisfies ⟨Au, v⟩ = ⟨u, A‡v⟩.

We define our algorithm by recognizing that the minimum of the least squares functional is found by setting the gradient to zero, and iterating the resulting equation, which is

    [ A‡A + α L(u^(k)) ] u^(k+1) = A‡ z .     (2.8)

In this equation, u^(k) is the result of the k-th iteration, starting from some initial guess, u^(0). We can define the step of our algorithm to be s^(k) ≡ u^(k+1) − u^(k). Then the algorithm equation becomes

    H^(k) s^(k) = −g[u^(k)] ,  u^(k+1) = u^(k) + s^(k) ,     (2.9)

where H^(k) ≡ [ A‡A + α L(u^(k)) ]. Since H^(k) is the Hessian, or second derivative, of f_β, this is Newton's method for minimizing f_β.

2.2 Implementation for a General Derivative

Unlike [8], our x values are not uniformly spaced for our data z(x). Thus, some care has to be taken in defining the various operators which go into the algorithm equation, Eq. (2.9). Since our data z(x) is a "staircase" function, we treat it as a column vector, z, associated with a column vector, x. Both of these vectors have length N, the number of data points. The operators we need are then defined as square matrices which act on these vectors.

The derivative matrix is based upon the discretization of a simple derivative:

    dz/dx (x) = lim_{h→0} ( z(x) − z(x − h) ) / h  ⇒  [dz/dx]_i ≈ (z_i − z_{i−1}) / (x_i − x_{i−1}) .     (2.10)

The derivative is then defined in terms of a forward difference matrix, D, which has 1 along its diagonal and −1 along its subdiagonal:

    D = [  1   0  ⋯   0  −1 ]
        [ −1   1  ⋯   0   0 ]
        [  ⋮       ⋱       ⋮ ]
        [  0   0  ⋯  −1   1 ] .     (2.11)

(The −1 in the upper right corner imposes periodic boundary conditions.) The derivative operator, ∇_x, is then given as

    ∇_x = (Δx)^{−1} D ,     (2.12)

where Δx is a diagonal matrix, with i-th entry [Δx]_ii = x_i − x_{i−1} (where x_0 ≡ 0), and (Δx)^{−1} is also diagonal, with i-th entry [(Δx)^{−1}]_ii = ([Δx]_ii)^{−1}.

The A and A‡ operators are similarly based upon the discretization of a simple integral. We discretize our integral using trapezoidal areas:

    [Az]_i = ∫_0^{x_i} z(w) dw ≈ Σ_{j=1}^{i} (1/2)(x_j − x_{j−1})(z_j + z_{j−1}) .     (2.13)

Here, it is again understood that x_0 ≡ z_0 ≡ 0. Thus, A is defined as

    A = (1/2) T Δx B ,     (2.14)

where T is the lower-triangular matrix of ones (so that [Tv]_i = Σ_{j=1}^{i} v_j) and B has 1 along both its diagonal and subdiagonal (so that [Bz]_j = z_j + z_{j−1}). Since

    [A‡z]_i = ∫_{x_i}^{L} z(w) dw ≈ Σ_{j=i+1}^{N} (1/2)(x_j − x_{j−1})(z_j + z_{j−1}) ,     (2.15)

we find that A‡ is similarly defined as

    A‡ = (1/2) U Δx B ,     (2.16)

where U is the strictly upper-triangular matrix of ones (so that [Uv]_i = Σ_{j=i+1}^{N} v_j).

Finally, care also must be taken in constructing the L(u^(k)) operator. Since it depends on the iteration guess u^(k), it must be re-initialized for each step of the algorithm. Considering Eq. (2.7), we see that derivatives appear in both the numerator and denominator within L. We can approximate this "derivative ratio" with a "ratio of forward difference matrix operations". Thus, the discretization of L is

    L(u^(k)) = −∇_x ( Δ̃u^(k) )^{−1} D ,     (2.17)

where Δ̃u^(k) is a diagonal matrix, such that [Δ̃u^(k)]_ii = √( [Du^(k)]_i² + β ).

The main work in the binless algorithm is solving the linear system in Eq. (2.9) each iteration. Since we would like to apply the binless algorithm to very large data sets, simply forming the matrix H^(k) and storing it will require vast amounts of time and memory. Therefore, we need to solve this linear system using an iterative matrix-free method. Such methods do not require the full matrix H^(k), but instead just need a method for applying H^(k) to a vector (i.e., given a vector v, the method returns H^(k) v).

Our binless Python module is a collection of simple functions which, given a data set x, provide the discretized calculus operators described in Sec. 2.2 as either sparse matrices or matrix-free methods. The derivative (and related) operators are very sparse matrices, so the functions which provide them simply return them as SciPy [11] linked-list sparse matrices. On the other hand, the anti-derivative operators are not sparse, but we can represent them using efficient matrix-free methods, such as using cumulative summation for the left-most matrix in the description of A in Eq.
(2.14). We found the iterative solver LGMRES (loose generalized minimum residual algorithm) [12] to be particularly effective for solving the linear system in Eq. (2.9). However, we also found that it was necessary to use a well-chosen preconditioner P. A good preconditioner P will approximate the matrix H^(k) but will also be easy to invert. Instead of solving the linear system H^(k) s^(k) = −g[u^(k)], we solve the equivalent system:

    ( H^(k) P^{−1} ) y = −g[u^(k)] ,  s^(k) = P^{−1} y .

Recall that H^(k) = A‡A + α L(u^(k)). We found that the following preconditioner was very effective:

    P = Diag(diag(A‡A)) + α L(u^(k)) ,

where diag(M) returns the diagonal of a square matrix M and Diag(v) returns a diagonal matrix with the vector v along its diagonal. Using LGMRES with this preconditioner makes the binless algorithm very fast. This allows us to run it multiple times using the background-only data to find the best-fit smoothing parameter α within a short amount of time.

The default run of our runBinless.py script generates multiple binless OPDFs as follows:

1. Loop over the sample signal plus background UPDFs defined in utilities/sampleFunctions.py.
2. Generate (or load) N = 1000 data points for each. Background-only data is also generated.
3. Iteratively run the binless algorithm on the background data and compare each result to the background-only UPDF to find the best-fit smoothing parameter, α.
4. Using the best-fit α, run the binless algorithm on the full signal plus background data.
5. Save (if necessary) the UPDF data and create the binless plots.
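The pieces above can be assembled into a compact sketch. The version below is ours, not the authors' binless module: it builds the operators of Sec. 2.2 as dense NumPy arrays and solves each Newton step with a direct least-squares solve rather than preconditioned LGMRES, which is only sensible for small N.

```python
import numpy as np

def tv_derivative(x, z, alpha, beta=1e-8, iters=20):
    """Sketch of the binless smoothing step: iterate Eq. (2.9) to find the
    TV-regularized derivative u of the data z(x). Dense operators only."""
    N = len(x)
    dx = np.diff(np.concatenate(([0.0], x)))       # [Delta x]_ii, with x_0 = 0
    D = np.eye(N) - np.eye(N, k=-1)                # forward difference, Eq. (2.11)
    D[0, -1] = -1.0                                # periodic boundary condition
    grad = np.diag(1.0 / dx) @ D                   # nabla_x, Eq. (2.12)
    T = np.tril(np.ones((N, N)))                   # running sum
    B = np.eye(N) + np.eye(N, k=-1)                # (Bz)_j = z_j + z_{j-1}
    A = 0.5 * T @ np.diag(dx) @ B                  # trapezoidal A, Eq. (2.14)
    Adj = 0.5 * np.triu(np.ones((N, N)), k=1) @ np.diag(dx) @ B  # A-adjoint, Eq. (2.16)
    u = grad @ z                                   # initial guess: naive derivative
    for _ in range(iters):
        w = 1.0 / np.sqrt((D @ u) ** 2 + beta)     # TV weights
        L = -grad @ np.diag(w) @ D                 # Eq. (2.17)
        H = Adj @ A + alpha * L                    # Hessian of f_beta
        g = H @ u - Adj @ z                        # gradient, Eq. (2.6)
        u = u + np.linalg.lstsq(H, -g, rcond=None)[0]  # Newton step, Eq. (2.9)
    return u
```

For large data sets this dense form is impractical; as described above, the workable route is the matrix-free operators together with LGMRES (scipy.sparse.linalg.lgmres) and the preconditioner P = Diag(diag(A‡A)) + αL(u^(k)).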
Figure 2.1: A comparison between the OPDFs and UPDF for the "easy endpoint" (Appx. A.1). The easy endpoint UPDF is shown as an orange (gray) line. The regular histogram is filled with 1000 data points, and is shown with green (light gray) filled bars. The binless OPDF is shown as 1000 very densely packed circular points. The best-fit smoothing parameter α is given within the legend (log α = 3.858).

We present the results of this process for two sample functions here. The results for the UPDFs described in Appx. A.1 and Appx. A.2 are shown in Fig. 2.1 and Fig. 2.2, respectively. The former shows an excellent agreement between the binless result and the UPDF. The latter shows the limitation of using periodic boundary conditions within the binless algorithm.

3 The Binfull Algorithm

The binfull algorithm utilizes a Monte Carlo generator which is based upon the OCDF and can effectively regenerate the original data. To start, we construct the OCDF as z(x) from Eq. (2.1). To regenerate a data value, a number, r, between zero and one is pulled from a pseudo-random number generator. We multiply by the total number of data values, N, and then find the smallest z(x_i) value of the OCDF which is greater than or equal to rN. The result is x_i.
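A minimal sketch of this generator (our own illustration, not the paper's binfull module) uses the sorted data directly: since z(x_i) = i, finding the smallest i with i ≥ rN is just an index lookup.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # assumed seed, for reproducibility

def ocdf_values(data):
    """Sort the data; the OCDF of Eq. (2.1) then jumps to i at x_i."""
    return np.sort(np.asarray(data, dtype=float))

def regenerate(xs, n_out):
    """Draw r uniform in (0, 1), scale by N, and return the x_i belonging
    to the smallest OCDF value z(x_i) = i with i >= r*N."""
    N = len(xs)
    r = rng.random(n_out) * N
    idx = np.searchsorted(np.arange(1, N + 1), r)  # smallest i with i >= rN
    return xs[idx]
```

Applying a smoothing function, as described below, then amounts to returning xs[idx] plus a random deviate instead of the raw values.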
Figure 2.2: A comparison between the OPDFs and UPDF for "the line" (Appx. A.2). The line UPDF is shown as an orange (gray) line. The regular histogram is filled with 1000 data points, and is shown with green (light gray) filled bars. The binless OPDF is shown as 1000 very densely packed circular points. The points near x = 0 and x = 200 look particularly bad due to the periodic boundary conditions built into the derivative operators. The best-fit smoothing parameter α is given within the legend (log α = 2.128).

Like the naive numerical derivative of the OCDF, there is not much use to this Monte Carlo method without data smoothing. A simple example of applying a smoothing function is to add a random deviate, ε(σ), which is pulled from a Gaussian distribution of width σ. Thus, instead of filling our binfull histogram with x_i, we fill it with x_i + ε(σ). We can run this smoothed Monte Carlo generator an arbitrarily large number of times. The more points it generates, the smaller we can make the bins in our binfull histogram.

The binfull module contains utility functions, classes representing different smoothing functions, and a class to contain the binfull histogram resulting from this algorithm. Like a regular histogram, the binfull histogram stores data as a set of bins and bin contents. (Storing all of the raw binfull data turned out to be a memory disaster.)

The default run for runBinfull.py is as follows:

1. Loop over the sample signal plus background UPDFs defined in utilities/sampleFunctions.py.
2. Generate (or load) N = 1000 data points for each. Background-only data is also generated.
3. If binfull data already exists, load it and skip the next two steps.
4. Iteratively run the binfull algorithm on the background data and compare each result to the background-only UPDF to find the best-fit smoothing parameter, σ, for each smoothing function.
5. Using the best-fit σ for each smoothing function, run the binfull algorithm on the full signal plus background data.
6. Save (if necessary) the UPDF and binfull data, and create the binfull plots.

We now show the binfull results for the same data sets as we used in Sec. 2.3. They are shown in Fig. 3.1 and Fig. 3.2, respectively. For these plots, we use a Gaussian smoothing function whose width grows linearly every time the same x_i is generated by the Monte Carlo. Thus, as each point x_i is generated, we fill our binfull histogram with x_i + ε(σ × N_i / N_binfull), where N_i is the number of x_i's given by the Monte Carlo thus far, and N_binfull is the default total number of points generated by the binfull algorithm.

4 Conclusions

In this paper we have examined two algorithms for generating representations of OPDFs other than histograms. The binless algorithm determines an OPDF function as the smoothed derivative of the OCDF. The binfull algorithm creates an OPDF histogram which is so full of small bins that it may as well not have bins. Each of these methods has its own strengths and weaknesses.

The binless algorithm is incredibly fast and well suited for larger data sets. The memory requirements of the binless algorithm are small, since for an input data array of N data points, only arrays of length N and N × N
matrices are involved. These matrices are either sparse, or represented as matrix-free methods, each of which returns the operation of the matrix on an array. The result of the binless algorithm is also just an array of length N, which is no more memory demanding than the input. Thus, the entire result can be stored for future use.

Figure 3.1: A comparison between the OPDFs and UPDF for the "easy endpoint" (Appx. A.1). The easy endpoint UPDF is shown as a brown (dark gray) line. The regular histogram is filled with 1000 data points, and is shown with green (light gray) filled bars. The binfull histogram is shown as a purple (gray) filled region. The best-fit smoothing parameter σ is given within the legend.

However, the current implementation of the binless algorithm is only useful for OPDFs which tend towards zero at either end of the domain Ω = [0, L]. This is due to the periodic boundary conditions which we built into the derivative matrices. Additionally, the binless algorithm suffers from a smoothing parameter which is not easy to interpret. It is not at all intuitive as to how α affects the minimization algorithm used to determine the binless OPDF.

The weaknesses of the binless algorithm tend to be the strengths of the binfull algorithm and vice versa. For instance, the smoothing of the binfull algorithm is highly customizable, since one may program their own smoothing function. Because of this, the smoothing parameter used for the binfull
algorithm is very intuitive. Also, the binfull algorithm can easily handle OPDFs of any shape, including OPDFs which tend to large values at only one end of the domain.

Figure 3.2: A comparison between the OPDFs and UPDF for "the line" (Appx. A.2). The line UPDF is shown as a brown (dark gray) line. The regular histogram is filled with 1000 data points, and is shown with green (light gray) filled bars. The binfull histogram is shown as a purple (gray) filled region. The best-fit smoothing parameter σ is given within the legend.

However, the binfull algorithm is much slower and requires much more memory than the binless algorithm. The default runBinless.py script is roughly ten times faster than runBinfull.py. The binfull algorithm may be sped up by keeping each and every point of data which it generates within the speedy NumPy arrays. Unfortunately, it then becomes a memory disaster, potentially freezing up the user's computer. Thus, by default, binfull histograms retain only information about their bins and bin contents.

Ultimately, these two methods are quite complementary, each one making up for the weaknesses of the other. Together, they certainly overcome the binning bias inherent in regular histograms. The smoothing parameter they each use is determined blindly, chosen as the value which best reproduces the well-understood background UPDFs.

We even view the shortcomings of these algorithms instead as opportunities for further study. For instance, the determination of smoothing parameters is currently dependent upon the assumption that the background UPDF is known. A data driven method to determine the smoothing parameter or smoothing function would be more ideal.
Also, it would be nice to reformulate the problem or the code in order to overcome some of the individual weaknesses of each algorithm, such as flexibility, speed, or memory issues. Lastly, it is very important to understand the proper way to statistically interpret the results of these algorithms. This is well understood for regular histograms, and crucial for understanding the physics of the UPDF. We are eager to pursue all of these goals in future studies.

Acknowledgements
Abram Krislock would like to thank Teruki Kamon and Jan Conrad for useful discussions and follow-up ideas, and Maria Teresa Reynolds for ongoing support and motivation.
References

[1] J. Beringer et al. (Particle Data Group), "Review of Particle Physics," Phys. Rev. D (2012) 010001.
[2] Due to the breadth and variety of high energy particle physics phenomenology, a comprehensive citation list is too long to reference here. The particular subset of papers which inspired this work is [4–6, 13–27].
[3] B. A. Berg, "Display of probability densities for data from a continuous distribution," Phys. Procedia (2011) 17, arXiv:1105.0696.
[4] I. Hinchliffe, F. Paige, M. Shapiro, J. Soderqvist, and W. Yao, "Precision SUSY measurements at CERN LHC," Phys. Rev. D (1997) 5520, hep-ph/9610544.
[5] C. Lester and D. Summers, "Measuring masses of semiinvisibly decaying particles pair produced at hadron colliders," Phys. Lett. B (1999) 99, hep-ph/9906349.
[6] W. Cho, K. Choi, Y. Kim, and C. Park, "Measuring superparticle masses at hadron collider using the transverse mass kink," JHEP (2008) 035, arXiv:0711.4526.
[7] A. Krislock and N. Krislock (2014–), debinning, https://debinning.hepforge.org.
[8] R. Chartrand, "Numerical Differentiation of Noisy, Nonsmooth Data," ISRN Appl. Math. (2011) 164564.
[9] C. Vogel and M. Oman, "Iterative Methods for Total Variation Denoising," SIAM J. Sci. Comput., 1 (1996) 227.
[10] E. Weisstein, "Delta Function," From MathWorld – A Wolfram Web Resource, http://mathworld.wolfram.com/DeltaFunction.html.
[11] E. Jones, T. Oliphant, P. Peterson, et al., "SciPy: Open source scientific tools for Python," (2001–).
[12] A. H. Baker, E. R. Jessup, and T. Manteuffel, "A Technique for Accelerating the Convergence of Restarted GMRES," SIAM Journal on Matrix Analysis and Applications, 4 (2005) 962.
[13] D. Curtin, "Mixing It Up With MT2: Unbiased Mass Measurements at Hadron Colliders," Phys. Rev. D (2012) 075004, arXiv:1112.1095.
[14] R. Arnowitt et al., "Determining the Dark Matter Relic Density in the mSUGRA χ̃⁰₁–τ̃ Co-Annihilation Region at the LHC," Phys. Rev. Lett. (2008) 231802, arXiv:0802.2968.
[15] B. Dutta et al., "Supersymmetry Signals of Supercritical String Cosmology at the Large Hadron Collider," Phys. Rev. D (2009) 055002, arXiv:0808.1372.
[16] B. Dutta, T. Kamon, A. Krislock, N. Kolev, and Y. Oh, "Determination of Non-Universal Supergravity Models at the Large Hadron Collider," Phys. Rev. D (2010) 115009, arXiv:1008.3380.
[17] B. Dutta, T. Kamon, N. Kolev, and A. Krislock, "Bi-Event Subtraction Technique at Hadron Colliders," Phys. Lett. B (2011) 475, arXiv:1104.2508.
[18] B. Dutta, T. Kamon, A. Krislock, K. Sinha, and K. Wang, "Diagnosis of Supersymmetry Breaking Mediation Schemes by Mass Reconstruction at the LHC," Phys. Rev. D (2012) 115007, arXiv:1112.3966.
[19] R. Allahverdi, B. Dutta, T. Kamon, and A. Krislock, "Lepton Flavor Violation at the Large Hadron Collider," Phys. Rev. D (2012) 015026, arXiv:1203.3276.
[20] G. Aad et al. (ATLAS Collaboration), "Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC," Phys. Lett. B, 1 (2012) 1.
[21] S. Chatrchyan et al. (CMS Collaboration), "Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC," Phys. Lett. B, 1 (2012) 30.
[22] C. Weniger, "A Tentative Gamma-Ray Line from Dark Matter Annihilation at the Fermi Large Area Telescope," JCAP (2012) 007, arXiv:1204.2797.
[23] A. Rajaraman, T. Tait, and D. Whiteson, "Two Lines or Not Two Lines? That is the Question of Gamma Ray Spectra," JCAP (2012) 003, arXiv:1205.4723.
[24] D. Tovey, "Measuring the SUSY mass scale at the LHC," Phys. Lett. B (2001) 1, hep-ph/0006276.
[25] K. Agashe, R. Franceschini, and D. Kim, "A simple, yet subtle 'invariance' of two-body decay kinematics," Phys. Rev. D (2013) 057701, arXiv:1209.0772.
[26] K. Agashe, R. Franceschini, D. Kim, and K. Wardlow, "Using Energy Peaks to Count Dark Matter Particles in Decays," Phys. Dark Univ. (2013) 72, arXiv:1212.5230.
[27] K. Agashe, R. Franceschini, and D. Kim, "Using Energy Peaks to Measure New Particle Masses," arXiv:1309.4776.
[28] E. Weisstein, "Heaviside Step Function," From MathWorld – A Wolfram Web Resource, http://mathworld.wolfram.com/HeavisideStepFunction.html.

A Data Generation
We generate our test data using a simple Monte Carlo method, much like the one used to generate the binfull OPDF in Sec. 3. For any UPDF, defined on any range of x values, the data generation is as follows. First, an array of x values spanning the range is generated, with a small (0.01) step between adjacent values. The UPDF is then evaluated for each of these x values to form an array of y values. Next, the CDF is constructed from the y values using the A operator given by Eq. (2.14). Thus, z_CDF = Ay, where, within A, Δx = 10^{−2}.

With the CDF generated in this way, we can use it as a Monte Carlo generator of data based upon the UPDF. The generator is the very same as the one described in Sec. 3, except that no smoothing function is applied. This generator can, with infinite statistics, reproduce the shape of the UPDF as given by the x and y arrays.

In this paper, we use the following two phenomenologically inspired UPDFs. For examples of other such UPDFs, please see [7].

A.1 Easy Endpoint
The "easy endpoint" UPDF is inspired by (highly optimistic) endpoint searches of Supersymmetry cascade decays within the context of the Large Hadron Collider or other collider experiments [4]. The easy endpoint UPDF is constructed as a cubic background piece,

    y_BG = p_0 (x − p_1)(x − p_2)² ,     (A.1)

plus a triangular shaped signal piece,

    y_signal = { p_3 x − p_4 ,    if x < p_5 ,
               { m† (x − p_6) ,  if x ≥ p_5 ,     (A.2)

where the p_i denote the seven parameters of the function, and m† = −(p_3 p_5 − p_4)/(p_6 − p_5). These pieces are added together to form one function, but only if they are each positive. Thus, the overall easy endpoint UPDF can be written as

    y = y_signal H(y_signal) + y_BG H(y_BG) ,     (A.3)

where H(y) is the Heaviside step function [28]. The set of seven parameters we use for the easy endpoint is

    p = (0. , . , . , . , . , . , . ) .     (A.4)

We would like to emphasize that the parameter of interest is the location of the "endpoint," which is the maximum x-value where the signal piece meets the background. This occurs at x = p_6 = 140.

A.2 The Line
The second UPDF we use in this paper is "the line," which is inspired both by the discovery of the Higgs boson at the LHC [20, 21] and the recent apparent gamma ray line signal within the Fermi-LAT data [22]. The line UPDF is constructed as an exponential background plus a Gaussian peak signal:

    y = p_0 exp(−p_1 x) + p_2 exp( −(x − p_3)² / p_4 ) .     (A.5)

The set of parameters we use for the line is

    p = (1000. , . , . , . , . ) .     (A.6)

In this case, the parameter of interest is the peak location of the Gaussian, which is x = p_3.
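As a sketch of this appendix's generator (our own code, not the paper's; the parameter values below are illustrative placeholders, not the ones used for the paper's figures):

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # assumed seed

def line_updf(x, p):
    """'The line' of Eq. (A.5): exponential background plus Gaussian peak."""
    return p[0] * np.exp(-p[1] * x) + p[2] * np.exp(-(x - p[3]) ** 2 / p[4])

def sample_updf(updf, lo, hi, n, step=0.01):
    """Tabulate the UPDF on a fine grid, build its CDF by trapezoidal
    integration (the A operator of Eq. (2.14)), then draw from it by the
    inverse-CDF method of Sec. 3, with no smoothing applied."""
    x = np.arange(lo, hi, step)
    y = updf(x)
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * step * (y[1:] + y[:-1]))))
    r = rng.random(n) * cdf[-1]
    return x[np.searchsorted(cdf, r)]

# Illustrative parameters: background scale/slope, then peak height/location/width.
p = (1000.0, 0.05, 50.0, 100.0, 200.0)
data = sample_updf(lambda x: line_updf(x, p), 0.0, 200.0, 5000)
```

With infinite statistics this generator reproduces the tabulated UPDF shape; the finite samples used in the paper (N = 1000) are what the debinning algorithms must then smooth.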