DeepEfficiency - optimal efficiency inversion in higher dimensions at the LHC
Mikael Mieskolainen∗
Department of Physics, University of Helsinki and Helsinki Institute of Physics, P.O. Box 64, FI-00014 Helsinki, Finland
∗ [email protected]
(Dated: September 16, 2018)

We introduce a new high dimensional algorithm for efficiency corrected, maximally Monte Carlo event generator independent fiducial measurements at the LHC and beyond. The approach is driven probabilistically using a Deep Neural Network on an event-by-event basis, trained using detector simulation and even only pure phase space distributed events. This approach also gives a glimpse into the future of high energy physics, where experiments publish new types of measurements in a radically multidimensional way.
High energy physics data need to be corrected for experimental effects induced by the kinematically non-uniform response of the detector and reconstruction algorithms. These corrections are usually dubbed efficiency times acceptance corrections, often implemented by dividing the measured number of background corrected events in a histogram bin by the efficiency value obtained from simulations. In addition, unfolding of histograms of observables is often done to account for distortions of resolution (variance) and absolute value (bias) induced by the detector. Unfolding is usually treated with a stochastic smearing matrix, obtained via simulations. It is the art of unfolding and optimization algorithms, with suitable regularization and possible additional physical boundary conditions, which then "inverts" this response matrix in an algebraic or probabilistic way. An important part is the uncertainty estimation, both in purely statistical terms and in systematic ways. It is well known that many of the naive bin-by-bin methods are severely biased towards the "Monte Carlo truth" [1].

A crucial step of the measurement is the proper definition of the fiducial phase space. Ideally, this should be defined in terms of proper final state observables, such as the transverse momentum and pseudorapidity of charged particles, within the geometrically visible and electrically active detector volume. A fiducial measurement, by its very definition, minimizes extrapolations of the data and thus also minimizes the model dependence. In this work, we go further and propose a maximally
Monte Carlo event generator independent way to implement fiducial efficiency corrections. To the best of our knowledge, this philosophy and approach, which we call
DeepEfficiency, based on Deep Neural Networks (DNN), is a new one. However, in other contexts such networks are well known in high energy physics, especially in signal-background separation. For a recent review see [2], or for the pioneering studies [3].

For concreteness, let us define our observable degrees of freedom at the level of individual final state particles. Our final state consists of N charged particles (tracks), with the 3-momentum p of each being measured. This final state spans a real valued vector space R^{3N}, from which one can construct a set of Lorentz scalars and other non-scalar observable distributions, such as transverse momenta or angular ones. The detector has a finite probability P of seeing the N-track event, given the trigger and tracking efficiencies. This expected probability is represented with an efficiency mapping E : R^{3N} → P. Clearly, by using the space-time symmetries of interest, one can often do dimensional reduction. However, treating the abstract detector space maximally forbids us from doing so. That is, we want to take into account all the correlations that the detector may induce in terms of efficiency losses. A simplification, a dimensional reduction but also a loss of information, would occur if one were, for example, to factorize all tracks and obtain the full event detection efficiency as E_tot = E_1 E_2 ⋯ E_N. However, for very high dimensional problems, one may do this. We point out here that the efficiency function E can also be used within the context of Matrix Element Method (MEM) type likelihood analyses, in addition to the fiducial measurements which are of our interest here. For typical approximations within MEM applications, see [4].

What we now simply do is to learn this high dimensional detector efficiency function E using a fully connected, multilayer DNN in a regression mode. That is, we simply use the neural network as a high dimensional function approximator and interpolator, as a mathematical hammer. The exact architecture, the cost function and its regularization, and the gradient descent methods are art forms each of their own; we return to these in extended descriptions. Because we use the fully differential observable kinematic information of the final states, we can even use a pure phase space Monte Carlo event generator as an input to the detector simulation. Perhaps the most intuitive proof of the vanishing dependence on the event generator is based on a multidimensional histogram division picture with hyperbin volumes approaching zero. A practical requirement for the generator is, naturally, that large enough event statistics are produced continuously in all corners of the phase space of interest. For final states containing jets, a pure phase space Monte Carlo is not feasible, thus in practice one proceeds as usual with a realistic MC generator of interest. One may do this also in other cases, naturally. Here we do not touch the highly interesting but complex topic of optimally matching parton level, hadron level and detector level objects and sub-structures, which is basically always non-trivial, and sometimes ill-posed, with hard or soft QCD processes.

In optimizing (training) the network parameter set Θ, we minimize the so-called cross entropy cost function suitable for our probabilistic inference problem

\[
\mathcal{L}(\Theta, S) = -\frac{1}{|S|} \sum_{i \in S} \left[ R_i \ln \mathcal{E}_i + (1 - R_i) \ln(1 - \mathcal{E}_i) \right], \tag{1}
\]

where the sum is over all events in the simulation sample S. The known response R is 0 for efficiency lost events, and 1 for selected events, both within the fiducial phase space.
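To make the training step concrete, the following is a minimal sketch of learning the efficiency map E with a fully connected network and the cross entropy cost of Eq. (1). The framework (PyTorch), the layer widths, the optimizer settings and all names (EfficiencyNet, train) are illustrative assumptions, not a description of the implementation used here. The sigmoid output keeps the prediction in [0, 1], and minimizing the binary cross entropy drives the output towards the conditional selection probability P(R = 1 | {p}).

```python
# Minimal sketch (assumed architecture and names): learn the per-event
# efficiency E({p}) by regressing the 0/1 detector response R with the
# binary cross entropy cost of Eq. (1).
import torch
import torch.nn as nn


class EfficiencyNet(nn.Module):
    """Fully connected map E: R^{3N} -> [0, 1] for an N-track final state."""

    def __init__(self, n_tracks: int, hidden: int = 64):
        super().__init__()
        d = 3 * n_tracks  # (px, py, pz) per measured track
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # output constrained to [0, 1]
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def train(model, kinematics, response, epochs=50, lr=1e-3):
    """kinematics: (n_events, 3N) float tensor of fiducial, detector-simulated events.
    response: (n_events,) float tensor, 1 if the event was triggered/reconstructed, else 0."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()  # Eq. (1), averaged over the simulation sample S
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = bce(model(kinematics), response)
        loss.backward()
        optimizer.step()
    return model
```

In practice one would also monitor the bias and variance of the resulting corrections on a held-out simulated sample, as discussed below.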
By construction, one must not include events outside the fiducial phase space in the training sample. A cutoff regularization is used for the rare E → 0 cases, and the cost function regularization is typically ℓ1-norm (sparsity, Laplace) or ℓ2-norm (smoothness, Gaussian) based, in order to avoid overfitting. Overfitting is also effectively avoided by choosing the minimal network architecture which results in efficiency corrections with minimal bias and variance on simulated samples. Also, the training sample size needs to be high enough, given that deep network architectures can contain a very large number of free parameters.

After training, the efficiency inversion of data and arbitrary observables of interest is obtained with

\[
\frac{dN}{d\mathcal{O}} = h_{\mathcal{O}}(\{\vec{p}\}) \odot \left[ \mathcal{E}(\{\vec{p}\}) \right]^{-1}, \tag{2}
\]

where h_O is a probability distribution estimator operator, typically a bin width ΔO normalized histogram. That is, the discretization of observables enters only at this point. The point-wise operator ⊙ is defined as an integral (sum) over the event sample, and the weight E({p}) is obtained from the neural network. Simply put, event-by-event, one constructs the observable of interest and calls a weighted histogram fill with a weight E^{-1}. Thus, in an extraordinarily smooth way, one can simultaneously efficiency correct an arbitrary number of one- or multidimensional histograms of observables using the same weight. For a related mathematical discussion, see the Horvitz-Thompson unbiased estimator [6]. Differential cross sections are obtained in the standard way by normalizing with the integrated luminosity. A straightforward way to obtain statistical uncertainties is via Efron's bootstrap re-sampling. The possible algorithmic (network) bias should be empirically tested using simulations, observable by observable, which is easily automated.

The performance was numerically evaluated using the full ALICE detector simulation, with the input being low-mass, low-p_T central exclusive QCD diffraction, which is a prominent "glueball production process" at the LHC, decaying in this case to charged meson final states (π+π−, K+K−). We obtained solid results with a 5-layer network with ∼
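Returning to the inversion weighting of Eq. (2), the sketch below illustrates how the same per-event weight 1/E({p}) corrects an arbitrary observable histogram, together with an Efron bootstrap estimate of the statistical uncertainty. The NumPy-based implementation, the function names and the numerical value of the E → 0 cutoff are assumptions made purely for illustration.

```python
# Minimal sketch (assumed names and cutoff value) of the Eq. (2) weighting:
# every observable histogram is filled with the same per-event weight 1/E({p}).
import numpy as np


def corrected_histogram(observable, efficiency, bins, eps=1e-3):
    """observable: per-event values O({p}); efficiency: per-event E({p}) from the DNN.
    eps regularizes the rare E -> 0 cases (the value used here is an assumption)."""
    weights = 1.0 / np.clip(efficiency, eps, 1.0)
    counts, edges = np.histogram(observable, bins=bins, weights=weights)
    return counts / np.diff(edges), edges  # bin-width normalized dN/dO estimate


def bootstrap_uncertainty(observable, efficiency, bins, n_boot=200, eps=1e-3, rng=None):
    """Statistical uncertainty via Efron's bootstrap: resample events with replacement
    and take the per-bin standard deviation of the corrected histograms."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(observable)
    replicas = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        h, _ = corrected_histogram(observable[idx], efficiency[idx], bins, eps)
        replicas.append(h)
    return np.std(replicas, axis=0)
```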