DeepEfficiency - optimal efficiency inversion in higher dimensions at the LHC
Mikael Mieskolainen∗
Department of Physics, University of Helsinki and Helsinki Institute of Physics, P.O. Box 64, FI-00014 Helsinki, Finland
∗ [email protected]
(Dated: September 16, 2018)

We introduce a new high dimensional algorithm for efficiency corrected, maximally Monte Carlo event generator independent fiducial measurements at the LHC and beyond. The approach is driven probabilistically using a Deep Neural Network on an event-by-event basis, trained using detector simulation and even only pure phase space distributed events. This approach also gives a glimpse into the future of high energy physics, where experiments publish new types of measurements in a radically multidimensional way.
High energy physics data need to be corrected for experimental effects induced by the kinematically non-uniform response of the detector and reconstruction algorithms. These corrections are usually dubbed efficiency times acceptance corrections, often implemented by dividing the measured number of background corrected events in a histogram bin by the efficiency value obtained from simulations. In addition, unfolding of histograms of observables is often done to account for distortions of resolution (variance) and absolute value (bias) induced by the detector. Unfolding is usually treated with a stochastic smearing matrix, obtained via simulations. It is the art of unfolding and optimization algorithms, with suitable regularization and possible additional physical boundary conditions, which then "inverts" this response matrix in an algebraic or probabilistic way. An important part is the uncertainty estimation, both in purely statistical terms and in systematic ways. It is well known that many of the naive bin-by-bin methods are severely biased towards the "Monte Carlo truth" [1].

A crucial step of the measurement is the proper definition of the fiducial phase space. Ideally, this should be defined in terms of proper final state observables, such as the transverse momentum and pseudorapidity of charged particles, within the geometrically visible and electrically active detector volume. A fiducial measurement, by its very definition, minimizes extrapolations of the data and thus also minimizes the model dependence. In this work, we go further and propose a maximally
Monte Carlo event generator independent way to implement fiducial efficiency corrections. To the best of our knowledge, this philosophy and approach, which we call
DeepEfficiency, based on Deep Neural Networks (DNN), is a new one. However, in other contexts such networks are well known in high energy physics, especially in signal-background separation. For a recent review see [2], or for the pioneering studies [3].

For concreteness, let us define our observable degrees of freedom at the level of individual final state particles. Our final state consists of N charged particles (tracks), with the 3-momentum p of each being measured. This final state spans a real valued vector space R^{3N}, from which one can construct a set of Lorentz scalars and other non-scalar observable distributions, such as transverse momenta or angular ones. The detector has a finite probability P of seeing the N-track event, given the trigger and tracking efficiencies. This expected probability is represented with an efficiency mapping E : R^{3N} → P. Clearly, by using the space-time symmetries of interest, one can often do dimensional reduction. However, treating the abstract detector space maximally forbids us from doing so. That is, we want to take into account all the correlations that the detector may induce in terms of efficiency losses. A simplification, a dimensional reduction but also a loss of information, would occur if one were, for example, to factorize all tracks and obtain the full event detection efficiency as E_tot = E_1 E_2 ⋯ E_N. However, for very high dimensional problems, one may do this. We point out here that the efficiency function E can also be used within the context of Matrix Element Method (MEM) type likelihood analyses, in addition to the fiducial measurements which are of our interest here. For typical approximations within MEM applications, see [4].

What we now simply do is to learn this high dimensional detector efficiency function E using a fully connected, multilayer DNN in a regression mode. That is, we simply use the neural network as a high dimensional function approximator and interpolator, as a mathematical hammer. The exact architecture, the cost function and its regularization, and the gradient descent methods are art forms each of their own; we return to these in extended descriptions. Because we use the fully differential observable kinematic information of the final states, we can even use a pure phase space Monte Carlo event generator as an input to the detector simulation. Perhaps the most intuitive proof of the vanishing dependence on the event generator is based on a multidimensional histogram division picture with hyperbin volumes approaching zero. A practical requirement for the generator is, naturally, that large enough event statistics are produced continuously in all corners of the phase space of interest. For final states containing jets, a pure phase space Monte Carlo is not feasible, thus in practice one proceeds as usual with a realistic MC generator of interest. One may do this also in other cases, naturally. Here we do not touch the highly interesting but complex topic of optimally matching parton level, hadron level and detector level objects and sub-structures, which is basically always non-trivial, and sometimes ill-posed, with hard or soft QCD processes.

In optimizing (training) the network parameter set Θ, we minimize the so-called cross entropy cost function suitable for our probabilistic inference problem

\[
\mathcal{L}(\Theta, S) = -\frac{1}{|S|} \sum_{i \in S} \left[ R_i \ln \mathcal{E}_i + (1 - R_i) \ln(1 - \mathcal{E}_i) \right], \tag{1}
\]

where the sum is over all events in the simulation sample S. The known response R is 0 for efficiency lost events, and 1 for selected events, both within the fiducial phase space.
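To make the training step concrete, the following is a minimal sketch of learning the efficiency map E with a fully connected network and the cross entropy cost of Eq. (1). The framework (PyTorch), the layer widths, the optimizer settings and all names (EfficiencyNet, train) are illustrative assumptions, not a description of the implementation used here. The sigmoid output keeps the prediction in [0, 1], and minimizing the binary cross entropy drives the output towards the conditional selection probability P(R = 1 | {p}).

```python
# Minimal sketch (assumed architecture and names): learn the per-event
# efficiency E({p}) by regressing the 0/1 detector response R with the
# binary cross entropy cost of Eq. (1).
import torch
import torch.nn as nn


class EfficiencyNet(nn.Module):
    """Fully connected map E: R^{3N} -> [0, 1] for an N-track final state."""

    def __init__(self, n_tracks: int, hidden: int = 64):
        super().__init__()
        d = 3 * n_tracks  # (px, py, pz) per measured track
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # output constrained to [0, 1]
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def train(model, kinematics, response, epochs=50, lr=1e-3):
    """kinematics: (n_events, 3N) float tensor of fiducial, detector-simulated events.
    response: (n_events,) float tensor, 1 if the event was triggered/reconstructed, else 0."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()  # Eq. (1), averaged over the simulation sample S
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = bce(model(kinematics), response)
        loss.backward()
        optimizer.step()
    return model
```

In practice one would also monitor the bias and variance of the resulting corrections on a held-out simulated sample, as discussed below.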
By construction, one must not include events outside the fiducial phase space in the training sample. A cutoff regularization is used for the rare E → 0 cases, and the cost function regularization is typically ℓ1-norm (sparsity, Laplace) or ℓ2-norm (smoothness, Gaussian) based, in order to avoid overfitting. Overfitting is also effectively avoided by choosing the minimal network architecture which results in efficiency corrections with minimal bias and variance on simulated samples. Also, the training sample size needs to be high enough, given that deep network architectures can contain a very large number of free parameters.

After training, the efficiency inversion of data and arbitrary observables of interest is obtained with

\[
\frac{dN}{d\mathcal{O}} = h_{\mathcal{O}}(\{\vec{p}\}) \odot \left[ \mathcal{E}(\{\vec{p}\}) \right]^{-1}, \tag{2}
\]

where h_O is a probability distribution estimator operator, typically a bin width ΔO normalized histogram. That is, the discretization of observables enters only at this point. The point-wise operator ⊙ is defined as an integral (sum) over the event sample, and the weight E({p}) is obtained from the neural network. Simply put, event-by-event, one constructs the observable of interest and calls a weighted histogram fill with a weight E^{-1}. Thus, in an extraordinarily smooth way, one can simultaneously efficiency correct an arbitrary number of one- or multidimensional histograms of observables using the same weight. For a related mathematical discussion, see the Horvitz-Thompson unbiased estimator [6]. Differential cross sections are obtained in the standard way by normalizing with the integrated luminosity. A straightforward way to obtain statistical uncertainties is via Efron's bootstrap re-sampling. The possible algorithmic (network) bias should be empirically tested using simulations, observable by observable, which is easily automated.

The performance was numerically evaluated using the full ALICE detector simulation, with the input being low-mass, low-p_T central exclusive QCD diffraction, which is a prominent "glueball production process" at the LHC, decaying in this case to charged meson final states (π+π−, K+K−). We obtained solid results with a 5-layer network with ∼
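Returning to the inversion weighting of Eq. (2), the sketch below illustrates how the same per-event weight 1/E({p}) corrects an arbitrary observable histogram, together with an Efron bootstrap estimate of the statistical uncertainty. The NumPy-based implementation, the function names and the numerical value of the E → 0 cutoff are assumptions made purely for illustration.

```python
# Minimal sketch (assumed names and cutoff value) of the Eq. (2) weighting:
# every observable histogram is filled with the same per-event weight 1/E({p}).
import numpy as np


def corrected_histogram(observable, efficiency, bins, eps=1e-3):
    """observable: per-event values O({p}); efficiency: per-event E({p}) from the DNN.
    eps regularizes the rare E -> 0 cases (the value used here is an assumption)."""
    weights = 1.0 / np.clip(efficiency, eps, 1.0)
    counts, edges = np.histogram(observable, bins=bins, weights=weights)
    return counts / np.diff(edges), edges  # bin-width normalized dN/dO estimate


def bootstrap_uncertainty(observable, efficiency, bins, n_boot=200, eps=1e-3, rng=None):
    """Statistical uncertainty via Efron's bootstrap: resample events with replacement
    and take the per-bin standard deviation of the corrected histograms."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(observable)
    replicas = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        h, _ = corrected_histogram(observable[idx], efficiency[idx], bins, eps)
        replicas.append(h)
    return np.std(replicas, axis=0)
```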