Automated detector simulation and reconstruction parametrization using machine learning
AANL-HEP-158616
D. Benjamin, S. Chekanov, W. Hopkins, Y. Li, and J. R. Love
High Energy Physics Division, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439, USA
Computational Science Division, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439, USA
(Dated: February 27, 2020)
Abstract
Rapidly applying the effects of detector response to physics objects (e.g. electrons, muons, showers of particles) is essential in high energy physics. Currently available tools for the transformation from truth-level physics objects to reconstructed detector-level physics objects involve manually defining resolution functions. These resolution functions are typically derived in bins of variables that are correlated with the resolution (e.g. pseudorapidity and transverse momentum). This process is time consuming, requires manual updates when detector conditions change, and can miss important correlations. Machine learning offers a way to automate the process of building these truth-to-reconstructed object transformations and can capture complex correlations for any given set of input variables. Such machine learning algorithms, with sufficient optimization, could have a wide range of applications: improving phenomenological studies by using a better detector representation, allowing for more efficient production of Geant4 simulation by only simulating events within an interesting part of phase space, and studies on future experimental sensitivity to new physics.

I. INTRODUCTION

A cornerstone of particle collision experiments is the Monte Carlo (MC) simulation of physics processes resulting from collisions of high-energy particles, followed by the simulation of detector responses and object reconstruction. The MC simulation produces objects (jets, electrons, muons, etc.) with properties (four-momenta, particle types) which depend entirely on the physics processes occurring. These objects are commonly referred to as "truth" objects. These objects are altered by interactions with the detector and are reconstructed with experimental algorithms.
Such objects, which have undergone a transformation due to detector interactions and reconstruction, will be referred to as "reco" objects in this paper. With the increased complexity of high-energy collider experiments, such as those at the Large Hadron Collider (LHC), the detector simulations become increasingly complex and time consuming. Parameterized detector simulations, such as Delphes [1], have proven to be a vital tool for physics performance and phenomenological studies (i.e. to estimate the sensitivity of an experiment to a new physics model). An approximation of the detector response and experimental object reconstruction can, however, also be performed by a neural network (NN) trained using Geant4-based simulations that have gone through an experiment's reconstruction algorithm. Such an NN could then rapidly transform truth MC objects (jets and other identified particles) into objects modified by a detector and experimental reconstruction algorithms.

The main advantage of a detector parametrization based on machine learning (ML), as compared to a manually-constructed analytic parametrization such as Delphes, is that a neural network can automatically learn the features introduced by detailed full simulations, avoiding the need to handcraft parameters to represent resolutions and inefficiencies. An NN trained using realistic detector simulation could learn the transformation from the truth to the reco quantities without dedicated studies of resolution functions. Another advantage is that the NN approach can incorporate a complex interdependence of variables, which is currently difficult to implement in parameterized simulations. Finally, since the underlying libraries used for ML (e.g. Keras [2], PyTorch [3], etc.) are optimized for a wide range of hardware, an NN-based truth-to-reco transformation would be able to run efficiently on heterogeneous hardware resources (resources that use a varied set of processors, such as GPUs and CPUs).
As a first step towards parameterized detector simulations with ML, it is instructive to investigate how a transformation from truth to reco objects can be performed, leaving aside the question of introducing objects that are created by misreconstruction or objects that are lost due to inefficiencies.
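Schematically, once such an NN has been trained, the truth-to-reco transformation amounts to drawing a resolution value from a learned, binned PDF for each truth object. A minimal sketch of that sampling step is given below; the hand-made Gaussian-shaped `pdf`, the bin range, and all names are illustrative stand-ins for an actual trained network's output.

```python
import numpy as np

# Hypothetical stand-in for a trained NN's softmax output: one binned
# resolution PDF per truth jet, over N_BINS bins spanning an assumed
# range of (pT_reco - pT_truth) / pT_truth.
N_BINS = 400
RES_LO, RES_HI = -0.5, 0.5                       # assumed resolution range
bin_edges = np.linspace(RES_LO, RES_HI, N_BINS + 1)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

def smear_pt(pt_truth, pdf, rng):
    """Sample one resolution value from a binned PDF and apply it."""
    bin_idx = rng.choice(N_BINS, p=pdf)          # draw a bin ~ PDF
    r = bin_centers[bin_idx]                     # (reco - truth) / truth
    return pt_truth * (1.0 + r)

rng = np.random.default_rng(42)
# Fake NN output: a narrow peaked PDF centred at zero resolution.
pdf = np.exp(-0.5 * (bin_centers / 0.1) ** 2)
pdf /= pdf.sum()                                 # normalize to a PDF

pt_truth = 50.0                                  # GeV (illustrative)
pt_reco = smear_pt(pt_truth, pdf, rng)
print(pt_reco)
```

Because the sampling is a single categorical draw per object, the same machinery applies unchanged whichever input variables the PDF was conditioned on.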
II. TRADITIONAL PARAMETERIZED FAST SIMULATIONS
In abstract terms, a typical variable $\xi^{\mathrm{reco}}_1$ that characterizes a reconstructed particle or jet, such as transverse momentum ($p_T^{\mathrm{reco}}$) or pseudorapidity ($\eta^{\mathrm{reco}}$), can be viewed as the result of a multivariate transform, $F$, of the original variable $\xi^{\mathrm{truth}}_1$ at truth level:

$\xi^{\mathrm{reco}}_1 = F(\xi^{\mathrm{truth}}_1, \xi^{\mathrm{truth}}_2, \xi^{\mathrm{truth}}_3, \ldots, \xi^{\mathrm{truth}}_N)$.

Generally, such a transform depends on several other variables $\xi^{\mathrm{truth}}_2 \ldots \xi^{\mathrm{truth}}_N$ characterizing this (or other) objects at truth level. For example, the extent to which jet transverse momentum, $p_T$, is modified by a detector depends on the original truth-level transverse momentum ($\xi^{\mathrm{truth}}_1 = p_T^{\mathrm{truth}}$), pseudorapidity ($\xi^{\mathrm{truth}}_2 = \eta^{\mathrm{truth}}$), and other effects that can be inferred from truth quantities. Similarly, if particular detector regions in the azimuthal angle ($\phi$) have low efficiency, this would introduce an additional dependence of this transform on $\phi$.

Typical parameterized simulations ignore the full range of correlations between the truth-level variables. In most cases, the above transform is reduced to a single variable, or two (as in the case of Delphes simulations, where the energy resolution of clusters depends on the original energies of particles and their positions in $\eta$). In order to take into account correlations between multiple parameters characterizing the transformation to reconstructed objects, a grid in the hypercube with $N_b^N$ cells, where $N_b$ is the number of histogram bins for the distributions $(\xi^{\mathrm{reco}} - \xi^{\mathrm{truth}})/\xi^{\mathrm{truth}}$ representing the "resolution", must be created. This methodology results in a large number of histograms when there are many correlated variables that affect the resolution. It should be pointed out that the calculation cost for parameterized simulations of one variable that depends on $N$ other variables at the truth level is proportional to $N_b^N$, since each object at the truth level must be placed inside the grid defined by the $N_b$ bins.
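The scaling of this traditional binned approach can be made concrete with a short sketch; the bin count, the number of variables, and the example truth values below are illustrative. The number of resolution histograms to derive and store grows as $N_b^N$, and smearing a jet requires digitizing each of its truth variables into the grid.

```python
import numpy as np

# Sketch of the traditional approach: one resolution histogram per cell
# of a grid binned in N truth-level variables.  With N_b bins per
# variable, the number of cells (and histograms) grows as N_b**N.
N_b = 10                        # bins per truth variable (illustrative)

for N in (1, 2, 3, 4):          # number of correlated truth variables
    n_cells = N_b ** N
    print(f"N = {N}: {n_cells} resolution histograms needed")

# Lookup for one jet: each truth variable is digitized into its bin,
# and the histogram in the matching cell would be used for the smearing.
edges = np.linspace(0.0, 1.0, N_b + 1)          # per-variable bin edges
truth_vars = np.array([0.37, 0.81])             # e.g. scaled (pT, eta)
cell = tuple(np.clip(np.digitize(truth_vars, edges) - 1, 0, N_b - 1))
print(cell)
```

The quick loop makes the combinatorics explicit: already at $N = 4$ correlated variables with 10 bins each, ten thousand resolution histograms would have to be derived and maintained.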
Therefore, complex parameterisations of resolutions and efficiencies for large $N$ quickly become computationally impractical.
III. JET TRUTH-TO-RECO TRANSFORMATION WITH ML
To test the viability of using ML to transform truth objects to reco objects, we studied the truth-to-reco transformation for jets. Jet truth-level quantities, such as jet $p_T$, $\eta$, $\phi$ and jet mass ($m$), are used as training inputs to an NN, while the output is an array of nodes that represents the binned probability density function (PDF) of the resolution for a single variable (such as jet $p_T$). Additional inputs can consist of any variable that can influence the resolution of a jet, such as jet flavor at the truth level, jet radius, etc. Figure 1 shows a schematic representation of the NN architecture for modelling the detector response for a single output variable. The aim is to have the NN learn the shape of the resolution PDF, for example for the $p_T$, depending on other input variables such as the $\eta$ of the object. A binned output (multi-categorization) was used so that the precision of the resolution PDF modelling can be chosen.

FIG. 1. A schematic representation of the NN architecture for modelling the detector response and the effect of reconstruction algorithms on truth-level input variables. The output nodes of this NN represent a binned PDF for the resolution of a single variable, e.g. $(p_T^{\mathrm{reco}} - p_T^{\mathrm{truth}})/p_T^{\mathrm{truth}}$, while other variables such as $\eta$, $\phi$, and mass ($m$) are auxiliary variables that affect $(p_T^{\mathrm{reco}} - p_T^{\mathrm{truth}})/p_T^{\mathrm{truth}}$.

IV. MONTE CARLO SIMULATED EVENT SAMPLES
Monte Carlo events used for this analysis were produced using the Madgraph generator [4]. The simulated processes are a combination of equal event samples of top pair production ($t\bar{t}$) and photons produced in association with jets ($\gamma$+jets), which give a high rate of jets in different environments. Hadronic jets were reconstructed with the FastJet package [5] using the anti-$k_t$ algorithm [6] with a distance parameter of 0.4. The detector simulation was performed with the Delphes package with a detector geometry similar to the ATLAS geometry. The event samples used for the following study are available from the HepSim database [7]. In this paper, only the transformation of $p_T$ from truth jets (which have truth particle constituents) to reconstructed jets (which have calorimeter cell constituents) was performed. However, the methodology should be object and parameter agnostic. Only truth jets which are matched to a reconstructed Delphes jet are considered in this study. For the matching criterion, the reconstructed jet that has the smallest $\Delta R = \sqrt{\Delta\phi^2 + \Delta\eta^2}$, where $\Delta\phi = \phi^{\mathrm{truth}} - \phi^{\mathrm{reco}}$ and $\Delta\eta = \eta^{\mathrm{truth}} - \eta^{\mathrm{reco}}$, with respect to the truth jet is chosen. If this minimum $\Delta R$ is greater than 0.2, the truth jet is discarded. No other requirements are made on truth and reconstructed Delphes jets other than the $p_T >$
15 GeV requirement in Delphes. Only matched jets are used, since the aim of the study is to test whether an NN can learn changes in detector resolution as a function of the kinematic properties of the jet (e.g. $p_T$, $\eta$, $\phi$, $m$). The final number of training jets used is two million, while 500,000 jets were used as an independent test sample. The distributions of the quantities used as input to the NN, $p_T$, $\eta$, $\phi$, and $m$, are shown in Figure 2.

To facilitate gradient descent in all directions of the input variable space, the input variables are scaled to be in the range [0,1]. This prevents the $p_T$ and the mass from having a disproportionate effect on the training of the NN. The output variable, $(p_T^{\mathrm{reco}} - p_T^{\mathrm{truth}})/p_T^{\mathrm{truth}}$, is also scaled to have values between 0 and 1. Only jets that are within the 1st and 99th percentiles of the $(p_T^{\mathrm{reco}} - p_T^{\mathrm{truth}})/p_T^{\mathrm{truth}}$ distribution are considered.

V. NEURAL NETWORK STRUCTURES
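The configuration described in this section — four scaled inputs, five hidden layers of 100 rectified linear units, and a softmax output over resolution bins — can be sketched as a forward pass. The paper's implementation uses Keras with a TensorFlow backend; the bare-NumPy version below, with random untrained weights, is only meant to illustrate the shapes involved, and all names are illustrative.

```python
import numpy as np

# Shape-level sketch of the architecture described in the text:
# 4 inputs (scaled pT, eta, phi, m) -> five hidden layers of 100 ReLU
# units -> 400-node softmax output representing a binned resolution PDF.
rng = np.random.default_rng(0)

layer_sizes = [4] + [100] * 5 + [400]
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Map one scaled (pT, eta, phi, m) vector to a binned resolution PDF."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)            # ReLU hidden layers
    logits = h @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())             # numerically stable softmax
    return e / e.sum()

pdf = forward(np.array([0.2, 0.5, 0.5, 0.1]))    # one scaled truth jet
print(pdf.shape, pdf.sum())
```

The softmax guarantees that the 400 output nodes form a normalized, non-negative vector, which is what lets them be interpreted directly as a binned PDF.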
An NN is trained with the four input parameters, the scaled $p_T$, $\eta$, $\phi$, and $m$, and consists of five hidden layers with 100 nodes each, with each node having a rectified linear unit (ReLU) activation function.

FIG. 2. Distributions of the input variables for truth (red) and reco (black) quantities.

FIG. 3. Resolution of the $p_T$ (relative difference between truth and reco $p_T$).

Several output layer configurations with 100, 200, 300, 400, and 500 output nodes were tested, all with a softmax activation function. The configuration with 400 output nodes resulted in the best performance, measured by how well the NN could mimic the Delphes $p_T$ spectrum and resolution (see below for details), with the smallest number of total NN parameters.

In an attempt to optimize the NN training, several batch-size and number-of-epoch combinations were used to improve the sensitivity to a small subsample (the forward jets) of the training sample. The number of backpropagation updates ($N_{bp}$) was held constant by keeping the ratio of the number of epochs ($N_e$) to the batch size ($N_b$) constant, since $N_{bp} = (N_t/N_b)\,N_e$, where $N_t$ is the number of training jets. Batch sizes and numbers of epochs of 5, 10, 20, 100, 200, and 1000 were tested, resulting in similar performance of the NN.

Finally, the NN is trained using the Adam [8] optimizer with a learning rate of 10− and is implemented using Keras with a TensorFlow [9] backend.

VI. RESULTS
After the NN has been trained to learn the PDF of $(p_T^{\mathrm{reco}} - p_T^{\mathrm{truth}})/p_T^{\mathrm{truth}}$, the resulting learned PDF is compared to the Delphes PDF using the test sample in Figure 4a. Good agreement is observed between the Delphes and NN PDFs, showing that the NN has learned the bulk distribution.

FIG. 4. NN-generated jet $(p_T^{\mathrm{reco}} - p_T^{\mathrm{truth}})/p_T^{\mathrm{truth}}$ compared to Delphes reco jet $(p_T^{\mathrm{reco}} - p_T^{\mathrm{truth}})/p_T^{\mathrm{truth}}$ (a). Representative values of the NN output after training for three randomly selected truth jets which have different input values (b).

The NN output represents a binned PDF for each jet based on its input parameters (i.e. $p_T$, $\phi$, $\eta$, and $m$). The PDFs for a set of three randomly selected jets are shown in Figure 4b, which features the shapes expected for a typical resolution function, with variations due to changes in the jet input parameters. These PDFs are then randomly sampled to produce an NN jet that mimics the reco jet. A comparison of the NN-generated and Delphes jet $p_T$ distributions for the test sample is shown in Figure 5. The NN reproduces the jet $p_T$ distribution of Delphes within 5% for reconstructed jets with $p_T >$ 20 GeV.

FIG. 5. Delphes and NN-generated jet $p_T$ distributions for a wide (a) and narrow (b) $p_T$ range.

To test whether the NN learned correlations between the input parameters and the $p_T$ resolution, the jets were divided into central ($|\eta| < 3.2$) and forward ($|\eta| > 3.2$) jets. The $p_T$ resolution is then compared between the two regions for both the Delphes jets as well as the NN-generated jets. These two regions in the detector simulation have different calorimeter responses, which results in different jet $p_T$ resolutions in the two $|\eta|$ regions. The resulting resolutions for both regions are shown in Figure 6 using the training sample. The training sample was chosen for this comparison because forward jets make up a small subsample of all jets, as can be seen in Figure 2.

FIG. 6. Jet $p_T$ resolution for the training sample for both the central and forward regions.

The mean and standard deviation of the resolution (demonstrated in Figure 4) as a function of $p_T$ are shown in Figure 7. The mean of the resolution for the NN is systematically higher than the resolution for Delphes, but this effect is small when considering the width of the resolution. The standard deviations of the resolutions, however, are the same for the NN and Delphes across the $p_T$ range, showing that the NN accurately predicts the resolutions for a large range in $p_T$.

FIG. 7. The mean (a) and standard deviation (b) of the jet $p_T$ resolution for Delphes and NN-generated jets as a function of truth jet $p_T$.

In order to produce an NN with optimal performance and a minimal number of parameters, a genetic algorithm (GA) is used to optimize the NN hyperparameters. The number of layers, number of neurons in each layer, and choice of the learning rate are scanned to find the optimal configuration. The GA we utilize is an evolutionary algorithm that mimics the process of natural selection and which was previously used in the determination of the parameters of a complex force field [10, 11]. Due to the small scale of our problem, the initial configuration was found to be equivalent to the optimal configuration found by the GA.

VII. CONCLUSION
A truth-level to reconstruction-level transformation using a multi-categorizing NN is presented. This approach does not require the determination of analytic resolution functions, since an NN can automatically learn the resolutions during the training procedure. The NN implementation presented effectively learned the truth-to-reconstruction transformation without requiring manual binning to capture the differences in resolution of particular subsamples (i.e. central and forward jets). The automatic learning of correlations between the input variables and the resolution is one of the attractive features of using an ML-based transformation, allowing for rapid deployment of detector parametrizations.

Additional improvements could probably be made by including more information about the objects (e.g. whether a $b$-quark is present in a jet, or kinematic information from other objects in the event), making this method more robust. This method should be easily extendable to additional reconstructed quantities and could be used to model the ATLAS and CMS detectors. The method described in this paper allows for automated detector parametrization, which can facilitate phenomenological studies, efficient truth event selection, and upgrade studies.

ACKNOWLEDGMENTS
We gratefully acknowledge the computing resources provided on a high-performance computing cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory. The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (Argonne). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). Argonne National Laboratory's work was funded by the U.S. Department of Energy, Office of High Energy Physics under Contract DE-AC02-06CH11357.

[1] J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi. DELPHES 3, A modular framework for fast simulation of a generic collider experiment.
JHEP, 02:057, 2014.
[2] François Chollet et al. Keras. https://keras.io, 2015.
[3] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In
Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
[4] J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, et al. The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations.
JHEP, 07:079, 2014.
[5] Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. FastJet User Manual.
Eur. Phys. J. C, 72:1896, 2012.
[6] Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. The anti-$k_t$ jet clustering algorithm. JHEP, 04:063, 2008.
[7] S.V. Chekanov. HepSim: a repository with predictions for high-energy physics experiments.
Advances in High Energy Physics, 2015:136093, 2015.
[8] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv e-prints, page arXiv:1412.6980, Dec 2014.
[9] Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[10] Ying Li, Hui Li, Frank C. Pickard, Badri Narayanan, Fatih G. Sen, Maria K. Y. Chan, Subramanian K. R. S. Sankaranarayanan, Bernard R. Brooks, and Benoît Roux. Machine learning force field parameters from ab initio data.
Journal of Chemical Theory and Computation, 13(9):4492–4503, 2017. PMID: 28800233.
[11] Md Mahbubul Islam, Grigory Kolesov, Toon Verstraelen, Efthimios Kaxiras, and Adri C. T. van Duin. eReaxFF: A pseudoclassical treatment of explicit electrons within reactive force field simulations.
Journal of Chemical Theory and Computation, 12(8):3463–3472, 2016. PMID: 27399177.