A practical way to regularize unfolding of sharply varying spectra with low data statistics
aa r X i v : . [ phy s i c s . d a t a - a n ] J un FERMILAB-PUB-19-262-PPD June 5, 2019
A practical way to regularize unfolding of sharplyvarying spectra with low data statistics
Andrei Gaponenko
Fermi National Accelerator Laboratory,Batavia, IL, USA
E-mail: [email protected]
Abstract:
Unfolding is a well-established tool in particle physics. However, a naive ap-plication of the standard regularization techniques to unfold the momentum spectrum ofprotons ejected in the process of negative muon nuclear capture led to a result exhibit-ing unphysical artifacts. A finite data sample limited the range in which unfolding canbe performed, thus introducing a cutoff. A sharply falling “true” distribution led to lowdata statistics near the cutoff, which exacerbated the regularization bias and produced anunphysical spike in the resulting spectrum. An improved approach has been developed toaddress these issues and is illustrated using a toy model. The approach uses full Poissonlikelihood of data, and produces a continuous, physically plausible, unfolded distribution.The new technique has a broad applicability since spectra with similar features, such assharply falling spectra, are common. ontents
The procedure of extracting a “truth level” physics distribution that can be directly com-pared to a theoretical model from measured quantities affected by finite detector resolutionis called unfolding [1]. The mathematical problem of unfolding is known to be ill-posed:truth level spectra that are significantly different from each other can map into detectordistributions that have only infinitesimally small differences [1–4]. The best possible unbi-ased solution of an unfolding problem would have an unacceptably large variance [1]. It hasbeen shown that approximate solutions to unfolding problems can be obtained by using aregularization procedure [5–7], which reduces the variance of the result at the price of in-troducing a bias. Implementations of unfolding algorithms for particle physics applications,such as RUN/TRUEE [8] and TUnfold [9] exist. However they are based on the Gaussianapproximation of the log-likelihood function, and regularized unfolding using the completePoisson likelihood is still listed in the “ideas” section in this year’s conference talk [10].The current work was performed in the context of measuring momentum spectrumof charged particles emitted in the process of negative muon capture on atomic nuclei atrest [11]. The median number of data entries in non-empty bins of a reconstructed 2-dimensional distribution was about 10, necessitating the use of Poisson likelihood in theanalysis. The spectrum varied by more than an order of magnitude in the unfolding region.A straightforward application of standard regularization techniques, introduced in section 2to a toy model, defined in section 3, yielded unfolded spectra with undesirable artifacts, asdescribed in section 4 below. Section 5 presents modifications to the unfolding procedurethat allowed us extract the result without unphysical features. Section 6 discusses thechoice of regularization strength, and 7 summarizes the findings.– 1 –
Regularized unfolding
The formulation of the unfolding problem involves an experimental observable x , truthlevel variable y with unknown distribution f ( y ) , which we would like to determine, anddetector response R . Both experimental observables and truth level variables are in generalmultidimensional. For example, in the capture measurement [11] truth level informationcomprises particle species and its true momentum, while experimental observables includemeasured track momentum and its range in the detector.We consider the case when the experimental spectrum is binned. Detector response R i ( y ) is the expectation value of the number of reconstructed events in bin i given atrue event occurring at y . It describes all the detector effects: acceptance, efficiency, andresolution—but is independent of the physics spectrum that is being measured. Detectorresponse is usually determined from a Monte-Carlo simulation, which forces a discretizationin the y space: R R i ( y ) f ( y ) dy −→ P j R ij f j where f i is the integral of f ( y ) over bin j . Thebin size in the y space has to be much smaller than the experimental resolution in order forthe simulation-derived R ij to be independent of the particular truth level spectrum shapeused in the simulation. Small bin size in y leads to a large number of unknowns f j . Thislarge number of unknowns is purely technical and is not related to the number of effectivedegrees of freedom of the problem, which scales with the size of the dataset [12]. However itcan make non-linear numerical minimization not feasible. To reduce the number of degreesof freedom to a physically appropriate value one can approximate the unknown functionswith splines [2], as is illustrated later in this paper.The expected number of data events in bin i , µ i , can be written as µ i = N true X j R ij f j + b i (2.1)where N true is the true number of events of interest in the dataset, and b i is the backgroundcontribution. A maximum likelihood estimator for f j is formed by minimizing − log L ( d | µ { f } ) = − X i ( d i log µ i − µ i ) (2.2)where d i is the observed number of data events in bin i . However the unfolding problem isill-posed and must be regularized to obtain a useful solution. Regularized unfolding can beperformed by minimizing a combination of the log likelihood of data and a regularizationfunctional S { f } [5–7]. F = − log L ( d | µ { f } ) − αS { f } (2.3)where α is the regularization parameter.A widely used Tikhonov [1, 5–7] regularization imposes a “smoothness” requirementon the spectrum by penalizing the second derivative of the solution. It therefore biasesthe result towards a linear function. Another well established regularization, the maximumentropy (or “MaxEnt”) approach [1], is based on the entropy of a probability distribution[13]: S MaxEnt = − X j q j ln( q j ) , q j ≡ f j / X k f k (2.4)– 2 – c o un t s / b i n / c) f ( p )initial sampleefficiency appliedfinal sample(a)00 . . . / c) ǫ ( p )(b) Figure 1 . (a) Toy model momentum spectrum f ( p ) and a random distribution of events drawnfrom it (initial sample), the distribution modified by detector acceptance times efficiency, and thefinal distribution after smearing. See text for more details. (b) Toy model detector acceptancetimes efficiency vs momentum. It biases unfolding result towards a constant.Unfolding with Tikhonov regularization can be implemented in a computationally ef-ficient way when χ minimization is used. However this advantage is lost when Poissonlikelihood is needed. On the other hand, MaxEnt guarantees that the unfolded spectrumis positive, as is required for a particle emission spectrum, whereas Tikhonov with a largeregularization strength α pulls the solution towards a straight line, which can cause someof f j to be negative. The present work uses the MaxEnt regularization term. Unfolding issues will be illustrated using a one-dimensional toy model that demonstratessome features first observed in the real life application of the technique. The model is basedon the spectrum of protons ejected in the process of negative muon nuclear capture. Thespectrum is known to follow an exponential distribution in kinetic energy for large proton– 3 –nergies, and to have a low energy threshold due to the Coulomb barrier [14]. We use theempirical functional shape and parameters proposed in [15], and convert the distributionfrom kinetic energy to momentum space: f ( p ) = C p p p + m × (cid:18) − . MeV T ( p ) (cid:19) . × exp (cid:26) − T ( p )3 . MeV (cid:27) (3.1)where m = 938 . MeV /c is the proton mass, C is a normalization constant, and T ( p ) = p p + m − m is the kinetic energy of the proton. The distribution is shown in Fig. 1(a).Detector efficiency times acceptance is modeled as ǫ ( p ) = , p < = p (cid:18) pp − (cid:19) . × (cid:18) pp (cid:19) , p > p (3.2)with p = 80 MeV / c, illustrated in Fig. 1(b). The momentum resolution of the toy detectormodel as a Gaussian with σ = 10 MeV / c.A sample of 5000 momentum values was drawn from the f ( p ) distribution (the “initialsample” in Fig. 1(a)). Some of the “events” were randomly dropped following the ǫ ( p ) curve,then each remaining momentum smeared with the Gaussian resolution to form the “finalsample” of 1569 events used for the unfolding tests below. The response matrix for thetests was computed analytically and contains no statistical fluctuations, corresponding tothe limit of infinite MC statistics. The toy model contains no background. To implement the approach outlined in section 2 we need to define an unfolding intervaland select a set of splines on that interval to approximate the distribution being unfolded.Cubic B -splines [16] provide a convenient basis for modeling smooth continuous physicsdistributions. Figure 2(a) shows a set of cubic B -splines obtained by placing 3 internal knotsthat split the interval MeV / c < p < MeV / c into 4 equal parts, and locating all othernecessary knots at the end points [16]. Figure 2(b) illustrates how a linear combination ofthese splines can approximate the function f ( p ) from Eq. 3.1: f ( p ) ≈ X w i B i ( p ) . (4.1)where B i ( p ) are the basis splines, and the w i are coefficients.The unfolding is performed by minimizing Eq. 2.3 with respect to w i for a fixed value of α . The choice of a starting point is critical for the success of a nonlinear multi-dimensionalminimization. Our implementation starts with f j = const being an exact minimum of (2.3)for α → ∞ , and minimizes the target functional for a large finite value of α . Then log α isreduced by a small amount, and the minimization is re-run by using the previous minimumas the starting point. As log α is further reduced, each new minimization starts at a pointthat is linearly interpolated from the two previously found mimima. The process is repeateduntil the desired value of regularization strength is reached.– 4 – . . / c) B ( p ) B ( p ) B ( p ) B ( p ) B ( p ) B ( p ) B ( p )(a) 100 150 200p (MeV / c) a r b i t r a r y un i t s f ( p )spline(b) Figure 2 . (a) A set of B-splines. (b) f ( p ) approximated by a linear combination of the splines. Figure 3 shows results for several settings of regularization strength α . As it is reduced,the solution changes from an almost constant function for α = 10 , dominated by theentropy term S { f } , to curves that are influenced by the likelihood of “data” log L ( d | µ { f } ) .The f ( p ) spectrum used to produce the toy MC sample is also shown figure 3. One cansee that α = 5 × is still too large, and the corresponding curve does not reach f ( p ) inboth its peak and tail regions. On the other hand, it already develops a unphysical risingbehavior at the end of the unfolding range. Using a lower value α = 52 produces a spectrumthat oscillates about the ideal result and has a pronounced rise at the end of the range.The toy model example illustrates a typical behavior observed in a real life applicationsof the unfolding technique. In some cases the procedure does not yield a satisfactory resultfor any value of α . The result spikes at the end, and if one moves the upper boundaryof the unfolding interval the spike moves with it. There are two effects that “pull up”the distribution at the end of the unfolding region: the S { f } regularization term, andthe effect of “overflows” (i.e. reconstructed events that originated outside of the unfoldinginterval). The regularization term bias is exacerbated due to the fact that a constant is nota good approximation for the rapidly falling true distribution function. A generalization ofthe MaxEnt approach, cross entropy regularization [1, 17], allows to bias to an arbitraryreference distribution instead of a constant. The distortion due to overflow events can beaddressed by treating the part of the signal distribution outside of the unfolding regionas a fixed shape background, as is done in e.g. [9]. In that approach the model of thesignal distribution is not continuous, because the resulting distribution in the unfoldingregion does not generally match the a priori “background” distribution at the interval ends.Instead of trying to guess the steepness of the “true” distribution for the cross entropy and– 5 –
00 150 200 250p (MeV / c)10203040506070 y i e l d ( c / M e V ) f ( p ) α = 1 × α = 5 . × α = 5 × α = 5 × α = 52 Figure 3 . Unfolding results. overflow background priors, we suggest to fit it from data, as is detailed below.
The main ideas to improve on the results of the previous section are: • Inside the unfolding region, bias towards a physically motivated function instead ofa constant, with parameters of the function included in the fit. For the spectrumof protons from muon capture example an exponential in kinetic energy was chosen,because the spectrum is know to approach this shape at high energies. Note that thetrue distribution in the toy model (Eq. 3.1) is not a simple exponential, however anexponential is a much better approximation for it in the unfolding interval than aconstant. • Include the “overflow” region in the minimization, and fit not just the normalizationbut also the exponential slope in that region.– 6 –
00 150 200 250p (MeV / c)1020304050607080 y i e l d ( c / M e V ) f ( p ) α = 1 × α = 3 . × α = 37 α = 3 . α = 0 .
34 150 200 250p (MeV / c)10 − y i e l d ( c / M e V ) Figure 4 . Results for the improved technique. • Require that the distribution is continuous and has two continuous derivatives. Thisrequirement connects the unfolded distribution to the overflow tail in a way thatprevents the unphysical spike at the boundary.Specifically, we represent f ( p ) = A p p p + m exp {− γT ( p ) } × ( φ ( p ) p min < p ≤ p max p max < p (5.1)where p min and p max determine the limit of the unfolding region, m is the mass of theparticle and T ( p ) its kinetic energy, A and γ are parameters pertaining to the exponentialbehavior of the spectrum, and φ ( p ) is an arbitrary function to be determined from theunfolding. The regularization term has the form (2.4) but now acts on φ instead of f : S MaxEnt = − X j ˜ q j ln(˜ q j ) , ˜ q j ≡ (1 + φ j ) / X k (1 + φ k ) (5.2)– 7 – .
16 2 .
17 2 . ρ ( α ) α = 0 . α = 4 . α = 3 . × α = 1 × α = 37 (max curvature) − . − . − . − l og η ( α ) Figure 5 . L-curve for the improved technique.
The function φ ( p ) is approximated by a linear combinations of cubic basis splines B l [16] φ ( p ) = n X l w l B l ( p ) , p min < p ≤ p max (5.3)Here w l are the spline coefficients determined from the unfolding process. We require thatthe resulting spectrum has a continuous second derivative, leading to φ ( p max ) = φ ′ ( p max ) = φ ′′ ( p max ) = 0 , which is provided by having a single-fold spline knot at the endpoint p max .There are no continuity constraints at p min , therefore a 4-fold knot should be used at thatpoint to support the most general cubic spline shape.To illustrate the modified technique, we use the same unfolding interval MeV / c
MeV / c and < p < MeV / c, while the curve for a larger α = 3 . × deviates more in the peak region p ≈ MeV / c. The proposed method combines unfolding to an arbitrary function shape in a phase spaceregion with sufficient data statistics and a parametric fit in the low statistics tail. Thewhole distribution is required to be twice continuously differentiable, which guaranteesa physically reasonable behavior of the result. Factoring out the exponential part of asharply varying spectrum and applying the regularization to just the deviation from thepure exponent reduces the bias. The use of the L-curve approach for finding the optimalregularization strength has been demonstrated for Poisson likelihood fit to data with theMaxEnt regularization term.
Acknowledgments
The author thanks Richard Mischke, Art Olin, Glen Marshall, Alexander Grossheim, andAnthony Hillairet, who worked with me on the muon capture analysis and provided en-couragement vital for the completion of this study. In addition, Richard and Art providedvaluable feedback on the text of this article.The numerical minimization code used for the study utilized the GNU Scientific Library[20]. The figures were prepared with Asymptote [21].This document was prepared by the author using the resources of the Fermi NationalAccelerator Laboratory (Fermilab), a U.S. Department of Energy, Office of Science, HEPUser Facility. Fermilab is managed by Fermi Research Alliance, LLC (FRA), acting underContract No. DE-AC02-07CH11359.
References [1] G. Cowan,
Statistical Data Analysis . Clarendon Press, Oxford, New York, 1998. – 9 –
2] V. Blobel,
Unfolding Methods in High-energy Physics Experiments , in
Proceedings, CERNSchool of Computing: Aiguablava, Spain, September 9-22 1984 , 1984.[3] G. Cowan,
A survey of unfolding methods for particle physics , Conf. Proc.
C0203181 (2002)248.[4] H. B. Prosper and L. Lyons, eds.,
Proceedings, PHYSTAT 2011 Workshop on StatisticalIssues Related to Discovery Claims in Search Experiments and Unfolding, CERN,Geneva,Switzerland 17-20 January 2011 , (Geneva), CERN, CERN, 2011. 10.5170/CERN-2011-006.[5] A. N. Tikhonov,
Solution of incorrectly formulated problems and the regularization method , Soviet Mathematics Dokl. (1963) 1035–1038.[6] A. N. Tikhonov, Regularization of ill-posed problems , Soviet Mathematics Dokl. (1963)1624.[7] D. L. Phillips, A technique for the numerical solution of certain integral equations of the firstkind , J. Assoc. Comput. Mach. (1962) 84.[8] N. Milke, M. Doert, S. Klepser, D. Mazin, V. Blobel and W. Rhode, Solving inverse problemswith the unfolding program TRUEE: Examples in astroparticle physics , Nucl. Instrum. Meth.
A697 (2013) 133 [ ].[9] S. Schmitt,
TUnfold: an algorithm for correcting migration effects in high energy physics , JINST (2012) T10003 [ ].[10] S. Schmitt, The collider experience with unfolding , in
PHYSTAT-nu 2019 workshop, January22–25, CERN , 2019, https://indico.cern.ch/event/735431/contributions/3137825.[11] TWIST Collaboration, “Charged particle spectra from µ − capture on aluminum.” (inpreparation).[12] V. M. Panaretos, A Statistician’s View on Deconvolution and Unfolding , in
Proceedings,PHYSTAT 2011 Workshop on Statistical Issues Related to Discovery Claims in SearchExperiments and Unfolding, CERN,Geneva, Switzerland 17-20 January 2011 , (Geneva),pp. 229–239, CERN, CERN, 2011, DOI.[13] C. E. Shannon,
A mathematical theory of communication , Bell Syst. Tech. J. (1948) 379.[14] D. Measday, The nuclear physics of muon capture , Phys.Rept. (2001) 243.[15] E. V. Hungerford,
Comment on proton emission after muon capture , Tech. Rep. 034, MECOCollaboration, available as arXiv:1803.08403, 1999.[16] C. de Boor,
A Practical Guide to Splines . Springer Verlag, New York, 1978.[17] M. Schmelling,
The Method of reduced cross entropy: A General approach to unfoldprobability distributions , Nucl. Instrum. Meth.
A340 (1994) 400.[18] P. C. Hansen,
Analysis of discrete ill-posed problems by means of the L-curve , SIAM Review (1992) 561.[19] P. C. Hansen and D. F. O’Leary, The use of the L-curve in the regularization of discreteill-posed problems , SIAM J. Sci. Comput.