Fully Bayesian Unfolding with Regularization
JJanuary 17, 2020
Fully Bayesian Unfolding with Regularization
P. B
ARON
Regional Centre of Advanced Technologies and Materials, Joint Laboratory of Optics ofPalacký University and Institute of Physics AS CR, Faculty of Science, Palacký University,17. listopadu 12, 771 46 Olomouc, Czech [email protected]
Fully Bayesian Unfolding differs from other unfolding methods by provid-ing the full posterior probability of unfolded spectra for each bin. We extendedthe method for the feature of regularization which could be helpful for unfoldingnon-smooth, over-binned or generally non-standard shaped spectra. To decreasethe computation time, the iteration process is presented.PRESENTED AT th International Workshop on Top Quark PhysicsBeijing, China, September 22–27, 2019 a r X i v : . [ phy s i c s . d a t a - a n ] J a n Introduction
Unfolding is the process of correcting measured spectra for finite resolution and efficiencyeffects in high energy physics from the detector to the particle level which is an experiment-independent result. Currently, mostly used methods are the Bayes (D’Agostini) and SVDmethods as implemented in the RooUnfold package [1].This study aims to provide an automated usage of the modern Fully Bayesian Unfolding(FBU), improved by an automatic iteration over the parameters phase space to establishlimits for faster convergence, to add implementation of regularization so far missing inavailable FBU implementations, and study appropriate sampling, e.g. using No-U-Turnsampling in Hamiltonian Monte Carlo algorithm [2].The procedure is tested on spectra of top quarks in proton-proton collisions at √ s = 14 TeV generated by the MadGraph generator [3], showered by Pythia8 [4] to providethe particle level, and finally with the detector level simulated by Delphes [5]. Conclusionsrelevant to the usage of FBU in current high energy experiments at LHC are drawn.
The schematic equation of the unfolding can be written as p = 1 (cid:15) · M − · η · ( D − B ); (1)where p is the unfolded particle-level spectrum, M − is the symbol for a given unfoldingmethod with the migration matrix M , (cid:15) and η are efficiency and acceptance correctionsrespectively D is data spectrum from which the background B is subtracted.The Fully Bayesian Unfolding method is based on the conditional probability and Bayestheorem. Its advantage compared to other methods is the possibility of choosing a priorprobability called prior π ( T ) . P ( T | D ) is then the probability density that the unfoldedspectrum T is inferred and can be written as P ( T | D ) = P ( D | T ) · π ( T )Norm . (2)using the given data D . Norm . is the normalization constant. Ideally, the unfolded spectrum T is equal to particle spectrum P .The probability density P ( D | T ) is proportional to the likelihood function L ( D | T ) andthe prior P ( T | D ) ∝ L ( D | T ) · π ( T ) == n =bins (cid:89) i =1 (cid:15) i (cid:32) n =bins (cid:80) j =1 M ij T j (cid:33) [ η i ( D i − B i )] [ η i ( D i − B i )]! e − (cid:32) n =bins (cid:80) j =1 M ij T j (cid:33) e − τS ( T ) . (3) he practical formula is P ( T | D ) ∼ n =bins (cid:89) i =1 (cid:15) i (cid:118)(cid:117)(cid:117)(cid:116) π (cid:32) n =bins (cid:80) j =1 M ij T j (cid:33) e − (cid:34) η i ( D i − B i ) − (cid:32) n =bins (cid:80) j =1 M ij T j (cid:33)(cid:35) e − τS ( T ) ; (4) where e − τS ( T ) is the prior. If regularization strength parameter τ is zero then the prior isflat ( π ( T ) = 1 ). However, if τ is positive the regularization is applied according to theregularization function S ( T ) . The corrections (cid:15) = P particle, proj. from M P level ; η = D data, proj. from M ˜ D (5)are called efficiency and acceptance corrections respectively, where P particle, proj. from M and D data, proj. from M represent spectra obtained by making particle resp. detector level projectionsfrom the migration matrix. P level and ˜ D are the original particle and detector level spectrataken from simulation. For the test spectra the process of top anti-top quark production at 14 TeV in (cid:96) +jets channelis simulated using Madgraph and Delphes. The resolved topology (a scenario when all thefinal state objects needed to reconstruct the top quark are reconstructed seperately in thedetector) is studied. The detection of the particles was simulated using ATLAS Delphescard as a part of the package. t,hadT p0246810121416182022 · E v en t s DataParticle (a) t,hadT
Detector level p0100200300400500600700800 [ G e V ] t, had T P a r t i c l e l e v e l p Normalized Migration Matrix (b) t,hadT p00.10.20.30.40.50.60.7 h Acceptance corr. ˛ Efficinecy corr. (c)
Figure 1: FBU ingredients. a) Detector-level (blue) and particle-level (red) spectra. b) Migrationmatrix between particle and detector levels. c) Efficiency (blue) and acceptance (red) corrections as afunction of transverse momentum of hadronically decaying top quark.
The unfolded spectrum is derived from posteriors which are calculated for each bin i bymarginalization p i ( T i | D ) = (cid:90) (cid:90) P ( T | D ) dT ...dT i − dT i +1 ...dT N (6)2he unfolded spectra are taken as the fitted mean of the fit Gauss function and the uncertaintyis taken as posterior σ gauss standard deviation. t,hadT p050001000015000200002500030000 E v en t s Truth value (Simulation)Unfolded data
Posteroir in bin 1
Unfolded Spectra, No regularization used
Posteroir in bin 2
Posteroir in bin 3
Posteroir in bin 4
Posteroir in bin 5
140 160 180200 220 24000.20.40.60.81
Posteroir in bin 6
Posteroir in bin 7
Posteroir in bin 8
Posteroir in bin 9
Posteroir in bin 10 (a)
Posteroir in bin 1, Reg. par. = 0.000000
Posteroir in bin 2, Reg. par. = 0.000000
Posteroir in bin 3, Reg. par. = 0.000000
Posteroir in bin 4, Reg. par. = 0.000000
480 500 520 540 560 580 600 620Truth Value in bin 500.20.40.60.81
Posteroir in bin 5, Reg. par. = 0.000000
140 160 180 200 220 240Truth Value in bin 600.20.40.60.81
Posteroir in bin 6, Reg. par. = 0.000000
80 90 100 110 120 130 140 150 160Truth Value in bin 700.20.40.60.81
Posteroir in bin 7, Reg. par. = 0.000000
Posteroir in bin 8, Reg. par. = 0.000000
Posteroir in bin 9, Reg. par. = 0.000000
Posteroir in bin 10, Reg. par. = 0.000000 (b)
Figure 2: Unfolded spectrum of transverse momentum of hadronically decaying top quark (a) derivedfrom the posteriors (b).
In order to decrease the computation power the phase space limits need to be estimated. Inthis study the estimation of phase space for each i -th bin is given as { min i ; max i } = (cid:26) . (cid:18) D i · D i data, proj. from M P i particle, proj. from M (cid:19) ; 1 . (cid:18) D i · D i data, proj. from M P i particle, proj. from M (cid:19)(cid:27) . (7) However this estimation does not need to be universal for all the bins which causes problemshown in the last two bins in Figure 3 (a). In this case the posteriors are fitted by a Gaussfunction g ( µ, σ ) and the unfolding algorithm is launched again with new phase space givenas { min i ; max i } = { µ i − · σ i ; µ i + 4 · σ i } ; (8)where µ and σ stands for the mean and one standard deviation of the Gauss function obtainedby fitting the posteriors. Figures 3 (b) and 3 (c) show the result of two more iterations. (a) (b) (c) Figure 3: Estimation of the phase space of unfolding after a) one b) two and c) three iterations. FBU with regularization
The regularization in the Fully Bayesian Unfolding can be introduced in a very natural wayand is represented by the prior π ( T ) = e − τS ( T ) . In this study the regularization function S ( T ) is chosen as the curvature of the truth pseudo experiment TS ( T ) = N − (cid:88) t =2 (∆ t +1 ,t − ∆ t,t − ) ; (9)where ∆ t ,t = T t − T t (10) i.e. using the sum of second derivatives of the truth spectrum. The unfolding equation thenreads P ( T | D ) ∝ L ( D | T ) · π ( T ) ∼∼ n =bins (cid:89) i =1 (cid:15) i (cid:118)(cid:117)(cid:117)(cid:116) π (cid:32) n =bins (cid:80) j =1 M ij T j (cid:33) e − (cid:34) η i ( D i − B i ) − (cid:32) n =bins (cid:80) j =1 M ij T j (cid:33)(cid:35) · e − τ · N − (cid:80) t =2 (∆ t +1 ,t − ∆ t,t − ) . (11) However, in order to save computation power, log ( P ( T | D )) is computed and for the taskof finding the maximum the likelihood is converted into a hypothetical movement of afree particle in the n -dimensional ( n = number of bins) space with potential given by thelikelihood. This method is called Hamiltonian Monte Carlo [2].To reduce the computation time, the mapping of the phase space in our code is imple-mented using Leapfrog and BuildTree functions described in [2].The regularization might be useful in cases of non-smooth, over-binned or generallynon-standard shaped spectra. In this study the pseudorapidity spectrum of the top quark pairwas chosen due to its non-trivial shape double peak structure to show possible applications. As a result the unfolded spectra of the top quarks pair pseudorapidity are presented afterone and two iteration. The significant improvement between iterations is obvious from thedecrease of the χ / NDF between each unfolded spectrum and the particle-level spectrumvalue in the plots (a) and (b) in the Figure 4.Regularized results with the regularization strength τ = 0 . are consistent within thestatistical uncertainty with the unfolded spectra without using regularization (Figure 4).4 - - - h · E v en t s Particle = 1.39 c FBU this study, = 1.29 c = 0.01, t FBU this study Reg. = 0.19 c FBU ref. [6], - - - h P a r t i c l e U n f o l ded (a) - - - h · E v en t s Particle = 0.43 c FBU this study, = 0.41 c = 0.01, t FBU this study Reg. = 0.18 c FBU ref. [6], - - - h P a r t i c l e U n f o l ded (b) Figure 4: Unfolded pseudorapidity spectrum of the top quark pair. The green line is the unfoldedspectrum with the use of regularization with parameter τ = 0 . in our implementation. The redline is the unfolded spectrum using our implementation without regularization. The blue line is theunfolded spectrum with use of the FBU unfolding code [6] using the mean of the posteriors. Plot (a)shows unfolded spectra after one iteration and plot (b) after two iterations. An iterative method was designed to improve unfolding results and to speed up the computa-tion time.Fully Bayesian Unfolding with regularization can be helpful in a specific kind of spectra.In the case of our implementation, the result of the pseudorapidity spectrum of the topanti-top quark pair shows that applying regularization we can obtain a better agreement inthe first iteration.However, applying the second iteration the differences between regularized and non-regularized spectrum vanishes.In further analysis, the author would like to implement an algorithm to estimate theregularization strength τ . ACKNOWLEDGEMENTS
The author gratefully acknowledges the support from the project IGA_PrF_2019_008 ofPalacky University as well as grants of MSMT, Czech Republic, GACR 19-21484S andLTT-17018. 5 eferences [1] H. B. Prosper and L. Lyons, “Proceedings, PHYSTAT 2011 Workshop on StatisticalIssues Related to Discovery Claims in Search Experiments and Unfolding, CERN,Geneva, Switzerland 17-20 January 2011,” doi:10.5170/CERN-2011-006[2] Matthew D. Hoffman and Andrew Gelman, The No-U-Turn Sampler: AdaptivelySetting Path Lengths in Hamiltonian Monte Carlo, Journal of Machine Learning Re-search, 2014, Volume 15, pages 1593-1623, http://jmlr.org/papers/v15/hoffman14a.html [3] J. Alwall et al. , “The automated computation of tree-level and next-to-leading orderdifferential cross-sections, and their matching to parton shower simulations,” JHEP (2014) 079 doi:10.1007/JHEP07(2014)079 [arXiv:1405.0301 [hep-ph]].[4] T. Sjöstrand et al. , “An Introduction to PYTHIA 8.2,” Comput. Phys. Commun. (2015) 159 doi:10.1016/j.cpc.2015.01.024 [arXiv:1410.3012 [hep-ph]].[5] J. de Favereau et al. [DELPHES 3 Collaboration], “DELPHES 3, A modular frame-work for fast simulation of a generic collider experiment,” JHEP (2014) 057doi:10.1007/JHEP02(2014)057 [arXiv:1307.6346 [hep-ex]].[6] Davide Gerbaudo, Fully Bayesian Unfolding, 2018, GitHub repository, https://github.com/gerbaudo/fbuhttps://github.com/gerbaudo/fbu