End-to-End Physics Event Classification with CMS Open Data: Applying Image-Based Deep Learning to Detector Data for the Direct Classification of Collision Events at the LHC
Michael Andrews, Manfred Paulini, Sergei Gleyzer, Barnabas Poczos
Computing and Software for Big Science manuscript
July 26, 2019
Abstract
This paper describes the construction of novel end-to-end image-based classifiers that directly leverage low-level simulated detector data to discriminate signal and background processes in pp collision events at the Large Hadron Collider at CERN. To better understand what end-to-end classifiers are capable of learning from the data, and to address a number of associated challenges, we distinguish the decay of the standard model Higgs boson into two photons from its leading background sources using high-fidelity simulated CMS Open Data. We demonstrate the ability of end-to-end classifiers to learn from the angular distribution of the photons recorded as electromagnetic showers, their intrinsic shapes, and the energy of their constituent hits, even when the underlying particles are not fully resolved. In such cases, this delivers a clear advantage over purely kinematics-based classifiers.
Keywords end-to-end · detector images · machine learning · deep learning · CNN · ResNet · photon ID · event classification · mass sculpting · LHC · CMS · open data · Higgs boson

M. Andrews, M. Paulini: Department of Physics, Carnegie Mellon University, Pittsburgh, USA
S. Gleyzer: Department of Physics, University of Florida, Gainesville, USA
B. Poczos: Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA
An important aspect of searches for physics beyond the standard model (SM) of particle physics at the CERN Large Hadron Collider (LHC) is distinguishing signal events from their corresponding backgrounds. At the Compact Muon Solenoid (CMS) experiment [1], this task is accomplished in two stages. The low-level detector data are first reconstructed into progressively more physically motivated quantities [2], until arriving at tabular-like particle-level data. Traditional analysis approaches [3,4] then use these condensed inputs to construct an event classifier that capitalizes on the decay structure or topology of the processes involved. Such approaches have been widely successful in understanding the SM. However, they potentially lose information in the process and limit more exhaustive searches for physics beyond the standard model (BSM).

In this paper, we propose a new approach for particle physics event classification that directly leverages low-level detector data as input: an end-to-end event classifier. Recent machine learning advances, in particular in the field of computer vision, have led to breakthrough applications of convolutional neural networks (CNNs) to scientific challenges whenever the data can be expressed as an image or series of images [5,6]. By using low-level data representations, it is possible to construct high-fidelity classifiers that are able to generalize across feature scales and event topologies. At the same time, such classifiers can serve as robust, general event classifiers, since their construction is independent of the event topology. This makes them well suited to merged and variable decay structures.

The full potential of end-to-end classifiers in high-energy physics lies in probing challenging BSM models. Here, however, we first choose a simple but illustrative process. This lets us gain more insight into what such classifiers are able to learn, and address some of the challenges involved in applying them to searches for new physics. Related techniques have been used for similar tasks: CNNs for jet classification [7,8] and for event classification on entire detector images [9,10], and recursive neural networks on particle-level data [11]. This, however, is the first application of an end-to-end classifier for event classification using high-quality LHC detector data. We explore the decay of the SM Higgs boson to two photons using the 2012 CMS Simulated Open Data, which utilize the highest grade of detector simulation.

This paper is organized as follows. In Section 2, we introduce the data samples and event selection. Section 3 describes the CMS geometry. In Section 4, we discuss the detector image construction, and in Section 5, we outline our network and training procedure. Results for end-to-end particle identification are presented in Section 6. The classification of full high-energy physics events is discussed in Section 7, and our conclusions are summarized in Section 8.
The 2012 CMS Open Data provides high-quality, simulated CMS data events, which we use to evaluate the end-to-end approach. The CMS Open Data contains the highest grade of detector simulation available. It uses Geant4 [12] to model the interaction of particles with the detector material, together with the most detailed geometry model of the CMS detector.
Datasets.
For our signal sample, we choose the gluon fusion Higgs to diphoton dataset [13], gg → H → γγ, with a Higgs mass of m_H = 125 GeV. For the background samples, we choose the two leading processes according to their production cross section:

- quark fusion to prompt diphotons [14], qq̄ → γγ, or so-called Born diphoton production;
- γ + jet production [15].

The γγ background is irreducible, as it also contains two photons in the final state and differs from the H → γγ photons only in its kinematics. In the γ + jet background, the jet is electromagnetically enriched. It therefore deposits its energy primarily in the electromagnetic calorimeter, via a neutral meson decaying to two merged photons, and appears as a single photon-like cluster.

There are other backgrounds to the Higgs to diphoton decay. The chosen ones, however, are representative of the most challenging types: kinematically differentiated decays (γγ) and shower-differentiated decays due to unresolved objects (γ + jet). All the above samples account for multi-parton interactions from the underlying event as well as pile-up (PU) [16]. The PU distributions depend on the run era, with peak average values of ⟨PU⟩ = 18–21.

Event selection.
We categorize the samples based on pseudorapidity η = −ln[tan(θ/2)], where θ is the polar angle with respect to the beam axis. The central sample is restricted to |η| < 1.44. The central+forward sample extends up to |η| < 2.3, excluding the region around the electromagnetic calorimeter barrel-endcap boundary, 1.44 < |η| < 1.54.

For both categories, we require two reconstructed photons, each with transverse momentum p_T > 20 GeV. The number of events is limited and unbalanced between datasets, with the γγ dataset having the fewest, so we apply no further photon quality requirements. We do, however, require the reconstructed mass of the diphoton system to satisfy m_γγ > 90 GeV.

With these selections, we obtain 63,052 and 135,602 events in the γγ dataset for the central and central+forward categories, respectively. The selected events are broken down by run era in Table 1. For each remaining dataset, we take the first N_i events from each run era i, so that the per-era breakdown matches Table 1. This minimizes learning based on differences in pile-up.

Table 1: Number of selected events by run era, per |η| category, per dataset.
Category          Run2012AB   Run2012C   Run2012D
Central           16308       24538      22206
Central+forward   35141       47885      52576

The Compact Muon Solenoid (CMS) detector is arranged as a series of concentric cylindrical sections, a barrel section and circular endcap sections, that enclose a central interaction point where the LHC proton beams collide. Each cylindrical detector section, or subdetector, specializes in measuring one or more aspects of the particles produced in the collision. Together, the information from the different subdetectors is used to re-create as complete a picture as possible of the collision event, or event for short.

3.1 Geometry
We focus on the three subdetectors most relevant for this study: the inner tracking system (Tracker), the electromagnetic calorimeter (ECAL), and the hadronic calorimeter (HCAL).

The Tracker is the innermost cylindrical part of CMS. It detects the hits left by charged particles as they fly outward from the interaction point. For this purpose, it uses fine silicon segments that provide precise spatial resolution but no practical energy measurement. In 2012, the Tracker was composed of 13 barrel layers and 14 endcap layers. To avoid particles slipping through cracks in the layers, the barrel and endcap layers of the Tracker overlap in pseudorapidity in a non-trivial way. Each layer is composed of fine strip- or pixel-like silicon segments that provide the spatial localization. The barrel and endcap sections are also segmented differently. In cylindrical coordinates with the beamline as the axis, the barrel is segmented in axial position and azimuthal angle (z, φ), and the endcap in radius and azimuthal angle (ρ, φ). The dimensions of the segments change with layer.

Surrounding the CMS Tracker system is the ECAL subdetector. The ECAL measures the energy deposits of electrons and photons by capturing almost all their energy in scintillating lead tungstate crystals. In the barrel section (EB), which spans |η| < 1.479, the crystals are segmented in pseudorapidity (iη_EB) and azimuthal angle (iφ_EB), giving a 170 × 360 crystal arrangement. This gives the EB an average granularity of Δη_EB × Δφ_EB = 0.0174 × 0.0174. In the endcap sections (EE), which span 1.479 < |η| < 3.0, the crystals are instead arranged in a Cartesian grid (iX, iY), with 7324 crystals per endcap. For reference, most electrons and photons deposit more than 90% of their energy within a 3 × 3 block of crystals.

Surrounding the ECAL is the HCAL. Its barrel (HB) and endcap (HE) sections are segmented in pseudorapidity (iη_HCAL), azimuthal angle (iφ_HCAL), and readout depth (d_HCAL). The depth segmentation varies with |η| but is uniform in φ. Combined, the HB and HE span the range |η| < 3, with the boundary between the two at |η| = 1.3. The HCAL has a coarser granularity of Δη_HCAL × Δφ_HCAL = 0.087 × 0.087, corresponding to 5 × 5 EB crystals. For |iη_HCAL| > 20, the φ granularity in the HE becomes coarser still, with Δφ_HCAL = 0.174. Note also that iφ_HCAL = 1 does not correspond to the same plane as iφ_ECAL = 1, so the HCAL coordinates must be shifted accordingly. To avoid particles slipping through cracks undetected, none of the barrel-endcap boundaries of the Tracker, ECAL, and HCAL coincide.

3.2 Reconstruction
Below, we briefly describe how the particle interactions with the detector are used to reconstruct the detector hits. These hits form the basis of the low-level data used in the end-to-end approach. For reference, we also give an overview of how these low-level detector data are used to form the higher-level particle data conventionally used for physics analyses.
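As a quick consistency check on the φ segmentations described above, each azimuthal width is simply 2π divided by the number of segments. This sketch relies only on segment counts that appear in the text: 360 EB crystals and 72 HCAL towers in φ, matching the 170 × 360 and 56 × 72 images, with the HCAL φ granularity halved above |iη_HCAL| > 20. It does not model the η segmentation, which is not uniform in the endcaps.

```python
import math

# Azimuthal segment counts taken from the image sizes used in this paper.
N_PHI_EB = 360    # EB crystals in phi (170 x 360 barrel image)
N_PHI_HCAL = 72   # HCAL towers in phi (56 x 72 HCAL image)


def phi_width(n_segments: int) -> float:
    """Width in radians of one of n_segments equal slices of the full azimuth."""
    return 2.0 * math.pi / n_segments


dphi_eb = phi_width(N_PHI_EB)                  # ~0.01745 rad per EB crystal
dphi_hcal = phi_width(N_PHI_HCAL)              # ~0.08727 rad per HCAL tower
dphi_he_coarse = phi_width(N_PHI_HCAL // 2)    # ~0.1745 rad for |i_eta_HCAL| > 20
crystals_per_tower = round(dphi_hcal / dphi_eb)  # EB crystals per HCAL tower in phi

if __name__ == "__main__":
    print(f"EB: {dphi_eb:.4f} rad, HCAL: {dphi_hcal:.4f} rad, "
          f"coarse HE: {dphi_he_coarse:.4f} rad, ratio: {crystals_per_tower}")
```

The ratio of 5 is the same factor later used to upsample the 56 × 72 HCAL image to 280 × 360.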
Calorimeter Hit Reconstruction.
Since both the ECAL and HCAL are scintillating calorimeters, they use similar strategies to reconstruct the energy of calorimeter deposits, or hits [1]. When an electromagnetic (hadronic) particle enters an ECAL crystal (HCAL tower), it produces an electromagnetic (hadronic) shower. The shower is detected as a light pulse, which is digitized into a series of amplitude readings over time. This amounts to a short video of the shower evolution in the ECAL crystal (HCAL tower). Fitting a pulse shape to these digitized amplitudes determines the energy and timing of the deposit. These values are then calibrated to give a final reconstructed energy and time per ECAL crystal (HCAL tower), known as the reconstructed hit.

Tracker Hit & Track Reconstruction.
In contrast, charged particles deposit very little energy in the silicon as they pass through the finely segmented Tracker subdetector. The Tracker hits therefore provide precise position information for charged particle track reconstruction, but no practical energy information. A combinatorial Kalman-filter pattern recognition algorithm [1] uses the hits recorded in the different Tracker layers to iteratively fit charged particle tracks, starting from the seed layer. From these reconstructed track fits, various track parameters can be obtained. These include, in particular, the track's position at the point of closest approach to the beamline (perigee), and its transverse momentum from its bending in the magnetic field of the CMS solenoid.
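A toy calculation illustrates how bending gives transverse momentum. First, fit a circle through three hits in the transverse plane. Then convert its radius R into p_T using p_T [GeV] ≈ 0.3 · B [T] · R [m] for a unit-charge particle. This is a sketch, not the CMS Kalman-filter tracking: it ignores energy loss, multiple scattering, and hit uncertainties. It also assumes the nominal 3.8 T field of the CMS solenoid, a value not stated in this paper.

```python
import math

B_FIELD_T = 3.8             # assumed nominal CMS solenoid field in tesla
GEV_PER_T_M = 0.299792458   # p_T [GeV] = 0.2998 * B [T] * R [m] for unit charge


def circumradius(p1, p2, p3):
    """Radius in metres of the circle through three (x, y) hits given in metres."""
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    # Twice the triangle area, from the 2D cross product of two edge vectors.
    twice_area = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p3[0] - p1[0]) * (p2[1] - p1[1]))
    if twice_area == 0.0:
        return math.inf     # collinear hits: straight track, infinite radius
    return a * b * c / (2.0 * twice_area)   # R = abc / (4 * area)


def pt_from_radius(radius_m, b_field_t=B_FIELD_T):
    """Transverse momentum in GeV of a unit-charge track with bending radius radius_m."""
    return GEV_PER_T_M * b_field_t * radius_m


if __name__ == "__main__":
    hits = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0)]   # three points on a 2 m circle
    r = circumradius(*hits)
    print(f"R = {r:.3f} m, pT = {pt_from_radius(r):.3f} GeV")
```

Stiffer (higher-p_T) tracks have larger radii. This is why the bending measurement degrades as the hits approach a straight line.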
High-level Particle Reconstruction.
The reconstructed tracks and calorimeter hits are the basic inputs to the rule-based CMS Particle Flow algorithm [2]. This algorithm constructs intermediate-level data before producing the final, high-level particle data. These data include attributes such as probable particle identity, kinematics, and shower shape features, and serve as the primary inputs to most event classifiers used in CMS analyses. In the end-to-end approach, by contrast, the inputs are the reconstructed tracks and calorimeter hits. Because Tracker hits are not currently available in the CMS Open Data, we use the reconstructed tracks rather than the low-level hits, similar to the approach in [10].
The CMS Open Data contains information about the reconstructed hits for the ECAL (HCAL) subdetectors. This makes it possible to construct calorimeter images whose pixels correspond exactly to physical crystals (towers). Such exact correspondence matters because not all crystals (towers) have the same dimensions, so images created using averaged dimensions incur some distortion. This level of accuracy would not be possible with intermediate-level data such as calorimeter towers, which have an HCAL-like granularity. Nor would it be possible with particle-level data, which are no longer expressed in detector coordinates.
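Building a calorimeter image whose pixels correspond one-to-one with crystals amounts to mapping each crystal index to a pixel and depositing the hit energy there. The sketch below does this for the EB. It assumes the usual CMS barrel indexing, which this paper does not spell out: iη runs from −85 to 85 excluding 0, and iφ runs from 1 to 360. The (iη, iφ, energy) hit-list format is likewise hypothetical.

```python
N_IETA_EB, N_IPHI_EB = 170, 360   # EB image size used in the text


def eb_pixel(ieta, iphi):
    """Map an EB crystal (ieta in [-85, 85] without 0, iphi in [1, 360]) to (row, col)."""
    if ieta == 0 or abs(ieta) > 85 or not 1 <= iphi <= N_IPHI_EB:
        raise ValueError(f"not an EB crystal: ieta={ieta}, iphi={iphi}")
    # Close the gap at ieta = 0 so that the 170 rows are contiguous.
    row = ieta + 85 if ieta < 0 else ieta + 84
    return row, iphi - 1


def eb_image(hits):
    """Accumulate (ieta, iphi, energy) hits into a 170 x 360 list-of-lists image."""
    image = [[0.0] * N_IPHI_EB for _ in range(N_IETA_EB)]
    for ieta, iphi, energy in hits:
        row, col = eb_pixel(ieta, iphi)
        image[row][col] += energy
    return image


if __name__ == "__main__":
    img = eb_image([(-85, 1, 1.5), (1, 360, 20.0), (85, 180, 3.0)])
    print(img[0][0], img[85][359], img[169][179])
```

Because every pixel is one crystal, no interpolation or averaged crystal size enters the image.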
Combining Images.
The main challenge in combining subdetector images arises not from differences in granularity. It comes from differences in segmentation, combined with the fact that regions of dissimilar segmentation overlap. Subdetector sections that do not spatially overlap, such as the ECAL barrel and the ECAL endcap, are kept as separate images. Sections that do overlap, however, such as the ECAL barrel and the HCAL barrel calorimeters, must be combined at the input level, or the depth information will be compromised. In 2-dimensional CNNs, convolutions are not performed along the depth (channel) axis, but the activations are still summed over that axis. Stacking overlapping images as channels of a common grid therefore lets the network combine them from the first layer.

To investigate the trade-off between detector fidelity and image integration, we therefore experiment with different geometry strategies. We choose a subdetector S to represent with the highest fidelity and project all other subdetectors S′ onto the segmentation and boundaries of S. Procedures for constructing ECAL- and HCAL-centric geometries are described below and visualized for a single γ + jet event in Figure 1.

ECAL Images.
The ECAL image is defined by reconstructed hit energies and ECAL crystal coordinates. These coordinates differ between the EB and the EEs, which have different segmentation (see Section 3.1).

- ECAL-centric geometry: For the EB, we construct an unrolled rectangular 170 × 360 image. For the EEs, we inscribe each circular EE section in a square 100 × 100 image.
- HCAL-centric geometry: We construct a contiguous ECAL image by projecting the (iX, iY)-segmented EEs onto an EB-like (iη, iφ) segmentation. These projections are then stitched to the ends of the EB image to form a single 280 × 360 image that spans the same η range as the HCAL. Since this results in sparse showers in the endcap regions, we smear out each hit over a 2 × 2 pixel area.

HCAL Images.
The HCAL image is defined in terms of reconstructed hit energies versus HCAL tower coordinates, which are shared by the HB and the HEs due to their similar segmentation. Since most events (≈ …) …, we sum over the depth d for a given (iη_HCAL, iφ_HCAL). In addition, some towers overlap in physical η; these are summed as well, to give consistent alignment with the ECAL image. For |iη_HCAL| > 20, where the φ granularity is halved (see Section 3.1), we share the energy across two iφ_HCAL towers. This yields a single, contiguous 56 × 72 image for the combined HB and HE.

- HCAL-centric geometry: The 56 × 72 image is upsampled by a factor of 5, without loss of information, to produce a 280 × 360 HCAL image.
- ECAL-centric geometry: The portions of the image that overlap with the EB are left untouched. Those that overlap with the EEs are detached and projected from their native (iη, iφ) segmentation onto an EE-like (iX, iY) segmentation, giving a 100 × 100 image per endcap.
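The HCAL image steps described above can be sketched as follows: summing over depth, sharing the energy of the coarser towers between two iφ columns, and upsampling 56 × 72 to 280 × 360. The input format is hypothetical. Hits are assumed to be already mapped to (row, column) on the 56 × 72 grid, with a flag marking towers at |iη_HCAL| > 20. Two further choices are assumptions, since the text does not specify how the energy is shared or replicated: the equal 50/50 split and the nearest-neighbour upsampling.

```python
N_ROWS, N_COLS = 56, 72   # combined HB + HE image before upsampling
UPSAMPLE = 5              # 56 x 72 -> 280 x 360


def hcal_image(hits):
    """hits: iterable of (row, col, depth, energy, coarse_phi) on the 56 x 72 grid.

    Energies at different depths of the same (row, col) are summed. Towers with
    coarse_phi=True (|i_eta_HCAL| > 20) span two i_phi columns, so half of their
    energy goes to col and half to the next column, wrapping around in phi.
    """
    image = [[0.0] * N_COLS for _ in range(N_ROWS)]
    for row, col, _depth, energy, coarse_phi in hits:
        if coarse_phi:
            image[row][col] += 0.5 * energy
            image[row][(col + 1) % N_COLS] += 0.5 * energy
        else:
            image[row][col] += energy
    return image


def upsample(image, factor=UPSAMPLE):
    """Nearest-neighbour upsampling: each tower becomes a factor x factor block."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    towers = [(0, 0, 1, 2.0, False), (0, 0, 2, 3.0, False), (10, 71, 1, 4.0, True)]
    big = upsample(hcal_image(towers))
    print(len(big), len(big[0]))   # 280 360
```

Replicating each tower value into a 5 × 5 block keeps pixel intensities comparable before and after upsampling, but it multiplies the image sum by 25. If energy conservation per tower were preferred, each block would instead be divided by factor².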
Tracker Images.
Because Tracker hits are not available in the CMS Open Data, the Tracker image is constructed as a 2D histogram of the reconstructed tracks' (η, φ) positions at perigee, in either the ECAL- or HCAL-centric geometry. Each track entry is weighted by its transverse momentum, which helps discriminate against the numerous pile-up tracks. Only high-purity tracks, i.e. tracks with the highest level of fit quality, are used.

Fig. 1 (a): Barrel section of composite image in ECAL-centric geometry. Image resolution: 170 × 360.

For the central category, we use only the subdetector images that overlap with the EB (Figure 1a), giving image inputs of resolution 170 × 360. For the central+forward category, the image inputs have resolutions of 170 × 360 and 100 × 100 in the ECAL-centric geometry, and 280 × 360 in the HCAL-centric one. Finally, the event selection described in Section 2 applies η cuts on candidate photons, but no such cuts are applied when constructing the detector images in this paper. Applying them remains an option for future work.

At the heart of the end-to-end classifier is a CNN. In this section, we describe how these deep learning networks extract information from the various subdetector images (see Section 4) in a way that best complements each subdetector's knowledge of the event. We then discuss some of the challenges of using end-to-end classifiers, how we train them, and how we evaluate their performance in this study.

5.1 Network Architecture
For all image-based classifiers, we use Residual Net-type networks (ResNet-15), chosen for their simplicity and their scalability with image size and network depth [17]. A representative network is illustrated in Figure 2a. Since image pixel intensities carry information about the energy scale, the best performance is obtained with MaxPooling rather than AveragePooling operations, and without batch normalization. For samples in the central category, we use a single ResNet-15. For samples in the central+forward category with ECAL-centric geometry, we use a separate ResNet-15 for each of the barrel, endcap−, and endcap+ images. Their outputs are concatenated after their final GlobalMaxPooling layers and fed to a fully connected network (FCN), as illustrated in Figure 2c. In the central+forward HCAL-centric geometry, we use
a single ResNet-15. The various end-to-end classifier models are summarized in Table 2.

Fig. 1 (c): Composite image in HCAL-centric geometry. Extent of EB indicated by minor ticks on y-axis. Image resolution: 280 × 360.
Fig. 1: Composite images of a single γ + jet event in different geometry strategies: separate Barrel (1a) and Endcaps (1b) for the ECAL-centric geometry, and stitched together (1c) for the HCAL-centric geometry. Tracks are in yellow log scale, ECAL hits in blue log scale, and HCAL hits in gray linear scale. Additional zero suppression is applied for clarity. Note the photon at around (iη = 70, iφ = 130), which is free of HCAL hits or tracks. In contrast, the jet at around (iη = −…, iφ = 340) shows contributions from all three subdetectors. Only the Barrel images (1a) are used for classification in the central category (see Section 2).

Within the available statistics, the end-to-end results do not benefit from deeper networks. Nor, in the case of a single ResNet-15, do they benefit from adding an FCN. We also tried other ways of concatenating the networks for multiple images, but these performed worse.

As a reference for conventional event classifiers, we train a separate 3-layer, 256-node FCN on the reconstructed 4-momenta of the two candidate photons in each event. We call this the 4-momentum classifier. Its inputs are:

- the transverse momenta of the photons divided by the diphoton mass, p_T,i/m_γγ, where i = 1, 2;
- their pseudorapidities η_i;
- the cosine of their azimuthal separation, cos(φ_1 − φ_2).

Dividing by m_γγ ensures the classifier is not correlated with the mass of the Higgs boson [3]. Note that this classifier serves as a purely kinematics-based reference and takes no account of the shape of the photon showers.
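The inputs of the 4-momentum classifier can be computed from the two photons' (p_T, η, φ). The following minimal sketch assumes massless photons, so that m_γγ² = 2 p_T,1 p_T,2 [cosh(η_1 − η_2) − cos(φ_1 − φ_2)]. It also assumes the photons are ordered by decreasing p_T, since the text does not state the ordering.

```python
import math


def diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles from (pT, eta, phi)."""
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))


def four_momentum_features(photon_a, photon_b):
    """Return [pT1/m, pT2/m, eta1, eta2, cos(phi1 - phi2)] for two (pT, eta, phi) tuples.

    Photons are ordered by decreasing pT (an assumption of this sketch).
    """
    (pt1, eta1, phi1), (pt2, eta2, phi2) = sorted(
        (photon_a, photon_b), key=lambda p: p[0], reverse=True)
    m = diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2)
    return [pt1 / m, pt2 / m, eta1, eta2, math.cos(phi1 - phi2)]


if __name__ == "__main__":
    # Two back-to-back 50 GeV photons at eta = 0 have m_gg = 100 GeV.
    print(four_momentum_features((50.0, 0.0, 0.0), (50.0, 0.0, math.pi)))
```

Each p_T is divided by m_γγ, so scaling both photon momenta by a common factor leaves every feature unchanged. The classifier therefore cannot read off the diphoton mass directly.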
The 4-momentum classifier results were not sensitive to the depth and width of the FCN.

5.2 Preprocessing and Mass De-correlation
Preprocessing plays a major role not just in improving the network optimization process but also in controlling the physics content of the inputs themselves. In particular, for an event classifier intended for a resonance search, the classifier's output should not be correlated with the mass of the signal resonance. One typically applies a cut on the classifier score to obtain a signal-enriched sample. If the score is uncorrelated with mass, this cut does not risk sculpting a false peak in the background.

Mass sculpting, as it is commonly called, is especially an issue for irreducible backgrounds that differ only in kinematics, and where the classifier inputs provide good mass resolution. To measure it, we use the Cramér-von Mises (CVM) metric suggested in [18] and implemented in approximate form in [19]. We calculate this metric on the classifier's signal score vs. the reconstructed diphoton mass for true γγ events, as illustrated in Figure 4a.

Table 2: Summary of end-to-end models used in this paper. *Models from the central category only use the barrel portion of the subdetector images (cf. Figure 1a).
Model    Category      Architecture          Inputs
EB       Central       ResNet-15             ECAL*
CMS-B    Central       ResNet-15             Tracker, ECAL, HCAL*
ECAL     Central+Fwd   3 × ResNet-15, FCN    ECAL
CMS-I    Central+Fwd   3 × ResNet-15, FCN    Tracker, ECAL, HCAL
CMS-II   Central+Fwd   ResNet-15             Tracker, ECAL, HCAL

Fig. 2: The Residual Net (ResNet) architecture, as used for single (2a) and multiple (2c) image inputs. (a) ResNet-15: Conv2D, 7×7, 16, /2; MaxPool, /2; Residual Block, 16; Residual Block, 32, /2; Residual Block, 32; GlobalMaxPool. (b) The residual block with skip connection: two Conv2D 3×3 layers with ReLU activations. (c) Concatenation of multiple ResNet-15 networks from separate barrel and endcap inputs ([Endcap−], [Barrel], [Endcap+]), followed by FC, 128 and FC, 128.

To achieve mass de-correlation, we divide each image by the reconstructed diphoton mass for that event. To first approximation, this transforms the energy scale of the diphoton system to have unit invariant mass for both signal and background events. Using the scalar sum of the photons' p_T instead (i.e. neglecting the angular component) is also effective. In fact, any quantity that maps the diphoton invariant mass for both signal and background events to a similar distribution has a similar effect.

In the image-based approach, however, this only delays the onset of mass sculpting rather than eliminating it. We suspect the photon shower profile gives the network an alternative way to learn the energy of the shower. In practice, one could use early stopping to halt training before the mass is learned. A more robust and definitive guard against mass sculpting is nonetheless desirable. We obtain one using an additional loss term proportional to the CVM metric itself, as outlined in Subsection 5.3.

Finally, because the energy deposits in the HCAL tend to be significantly lower in magnitude, we rescale the pixel intensities in the HCAL image by a constant to improve training.

5.3 Training
We train for three-class classification (H → γγ vs. γγ vs. γ + jet), with the normalized classification score for each class label given by the softmax function. We use the ADAM adaptive learning rate optimizer [20] to minimize the cross-entropy loss. In general, we use an initial learning rate of 5 × 10^−…, a batch size of 360, and early stopping if no improvement is seen for 5 epochs. The breakdown of the training and test sets is shown in Table 3; the test set doubles as the validation set.
These limited statistics give a slight advantage to the 4-momentum classifier, which has fewer weights to train. Both training and validation sets contain balanced samples of the three classes. All training was done using the
PyTorch [21] software library running on a single NVIDIA Titan X GPU.

To achieve more robust mass de-correlation, we also minimize, alongside the cross-entropy loss, an additional loss term proportional to the CVM metric [18] of the training batch. Specifically, we implement the k-nearest-neighbors version of the CVM gradient with k = 12. The strength of this term is λ = 0.15 for the central category and 0.… for the central+forward category. This yields CVM < 0.002 for the central category and CVM < 0.004 for the central+forward category, without the need for early stopping. A detailed description is beyond the scope of this paper and will be given in a separate paper.

Table 3: Number of events in the training and test sets for each class. Due to limited statistics, the test set doubles as the validation set. The total training and test sets contain a balanced number of samples from each class.
Category          Training events per class   Test events per class
Central           51200                       11800
Central+forward   120000                      15600
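The CVM flatness metric compares the distribution of the classifier's signal score in slices of diphoton mass with its overall distribution. The paper uses the k-nearest-neighbour formulation of [18,19]. The sketch below is instead a simpler binned variant, written for illustration, that uses equal-population mass bins and empirical CDFs. Its numerical scale is not guaranteed to match the CVM thresholds quoted above. Because it is built from sorting and counting, it is also not differentiable and could not be used directly as a loss term.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical CDF of sorted_values evaluated at x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def binned_cvm(scores, masses, n_bins=10):
    """Binned Cramer-von Mises flatness of `scores` with respect to `masses`.

    For each equal-population mass bin, average the squared difference between the
    bin's score CDF and the global score CDF over all events, then weight the bin
    by its share of events. Returns 0 when every bin has the global score distribution.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: masses[i])
    global_sorted = sorted(scores)
    total = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        bin_sorted = sorted(scores[i] for i in idx)
        sq_diff = sum((ecdf(bin_sorted, s) - ecdf(global_sorted, s)) ** 2
                      for s in global_sorted)
        total += (len(idx) / n) * (sq_diff / n)
    return total


if __name__ == "__main__":
    masses = [90.0 + i for i in range(30)]
    flat = [0.1, 0.5, 0.9] * 10              # same score pattern in every mass bin
    sculpted = [i / 30 for i in range(30)]   # score rises with mass
    print(binned_cvm(flat, masses), binned_cvm(sculpted, masses))
```

A score that tracks the mass yields a large value, while a score distributed identically in every mass slice yields zero. This is the property that the CVM penalty in the training loss pushes towards.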
Note that, in this paper, we prioritize a broad and consistent survey of end-to-end classifiers over the optimization of individual classifiers. Hyper-parameter tuning of individual classifiers was therefore kept to a minimum. Nevertheless, we found the above training parameters to be robust across the different end-to-end classifiers.

5.4 Evaluation
Our main figure of merit is the area under the curve (AUC) of the normalized Receiver Operating Characteristic (ROC). As is common in high-energy physics, the ROC curve is expressed as signal sample efficiency (true positive rate) vs. background sample rejection (true negative rate). To evaluate the multi-class results, we define a per-class (1-vs-Rest) ROC. We then select the classifier with the best ROC AUC in the signal label, subject to constraints on the CVM metric.

To better understand the multi-class classifier's performance against each individual background, we also present, for each background, the (1-vs-1) signal vs. single-background component of the ROC in the signal label. This gives a sense of the performance against each background, but multi-class classification is an inherently coupled problem, as physics event classification is by nature.

Lastly, as noted in Table 3, the evaluation uses a balanced mix of class samples. Real data are not necessarily balanced, but this choice allows an unbiased assessment of classifier performance in this simplified context.
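The 1-vs-Rest and 1-vs-1 AUCs described above can both be computed from the signal-class score alone. The ROC AUC equals the probability that a randomly chosen positive event scores higher than a randomly chosen negative one, with ties counted as one half. For clarity, the sketch below uses an O(N_pos · N_neg) pairwise count. The label encoding (0 = signal) and the toy scores are assumptions.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def signal_aucs(signal_scores, labels, signal_label=0):
    """1-vs-Rest AUC and per-background 1-vs-1 AUCs, all using the signal-class score."""
    pos = [s for s, y in zip(signal_scores, labels) if y == signal_label]
    rest = [s for s, y in zip(signal_scores, labels) if y != signal_label]
    one_vs_one = {}
    for bkg in sorted(set(labels) - {signal_label}):
        neg = [s for s, y in zip(signal_scores, labels) if y == bkg]
        one_vs_one[bkg] = roc_auc(pos, neg)
    return roc_auc(pos, rest), one_vs_one


if __name__ == "__main__":
    # Labels: 0 = H -> gg (signal), 1 = gg, 2 = g + jet; the scores are a hypothetical toy.
    scores = [0.9, 0.5, 0.6, 0.4, 0.2, 0.1]
    labels = [0, 0, 1, 1, 2, 2]
    print(signal_aucs(scores, labels))
```

With balanced background classes, as in Table 3, the 1-vs-Rest AUC equals the average of the 1-vs-1 AUCs, because the pairwise count is linear in the negative events. In the toy example, (0.75 + 1.0)/2 = 0.875.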
For reference, we recap our earlier results [22] in end-to-end classification of electromagnetic showers, i.e. particle identification in the ECAL. Using the image construction techniques described in Section 4, we successfully discriminated simulated electron- (e−) vs. photon- (γ) induced showers in the ECAL barrel. This is not a practical task once track information is taken into account. When the ECAL information is taken in isolation, however, electron- and photon-induced showers appear nearly identical. They differ only through higher-order effects, such as bremsstrahlung from the electron's interaction with the Tracker material and bending in the field of the CMS solenoid. These effects make the electron shower slightly smeared and asymmetric in φ, a difference that is practically impossible to discern by eye.

In Table 4, we present the best-in-category results for three architectures: CNN-based networks (CNN), convolutional long short-term memory-based (Conv-LSTM) recurrent neural networks [23], and fully connected neural networks (FCN). Each was applied to 32 × 32 ECAL images centered on the shower maximum, constructed from various low-level data inputs. Our results suggest a preference for convolution-based architectures. They also suggest that reconstructed hit energies (see Section 3.2) suffice to attain the best results. Moreover, the range of these scores shows how sensitive end-to-end classifiers are to low-level detector information, even when the showers appear indistinguishable to the naked eye.

Table 4: Best-in-category results of e− vs. γ shower classification on 32 × 32 ECAL barrel images. Energy inputs correspond to reconstructed hit energies, while digis correspond to the series of digitized amplitudes vs. time (see Section 3.2).
Category   Network, input     ROC AUC
CNN        VGG, energy        0.807
LSTM       Conv-LSTM, digis   0.799
FCN        3-layers, digis    0.770
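The 32 × 32 shower crops used for Table 4 are centered on the shower maximum. The minimal sketch below produces such a crop for an image stored as a list of rows (iη) by columns (iφ). Two edge conventions are assumptions, since neither is specified in the text: the window wraps around in φ, which is periodic, and rows beyond the η edges of the image are zero-padded.

```python
def crop_around_max(image, size=32):
    """Return a size x size window of `image` (rows = i_eta, columns = i_phi) centred
    on its largest pixel. Columns wrap around in phi; rows outside the image are zero-padded."""
    n_rows, n_cols = len(image), len(image[0])
    r_max, c_max = max(
        ((r, c) for r in range(n_rows) for c in range(n_cols)),
        key=lambda rc: image[rc[0]][rc[1]],
    )
    half = size // 2
    offsets = range(-half, size - half)   # -16 ... 15 for size = 32
    crop = []
    for dr in offsets:
        r = r_max + dr
        if 0 <= r < n_rows:
            crop.append([image[r][(c_max + dc) % n_cols] for dc in offsets])
        else:
            crop.append([0.0] * size)
    return crop


if __name__ == "__main__":
    eb = [[0.0] * 360 for _ in range(170)]
    eb[100][2] = 5.0      # shower maximum near the phi wrap-around
    eb[101][359] = 1.0    # neighbouring hit across the phi boundary
    crop = crop_around_max(eb)
    print(len(crop), len(crop[0]), crop[16][16], crop[17][13])
```

With an even crop size, the maximum lands at index 16, one pixel past the geometric centre. Any fixed convention works as long as it is applied consistently to all showers.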
As a further step, we take the whole EB image instead of just the shower crop. The e− vs. γ results are shown in Table 5. We observe only a minimal loss in performance, owing to the CNN's ability to learn features in a translationally invariant manner. More importantly, we see a marked improvement in the classification of uncorrelated particle gun pairs (e+e− vs. γγ). This is because the CNN learns that the shower pairs must be either both electron-like or both photon-like. It suggests that topological complexity works in favor of the end-to-end approach, and that the greatest gains may come from more challenging topologies.

Table 5: Results of shower classification on full ECAL barrel images.
Classification   Network, input   ROC AUC
e− vs. γ         ResNet, energy   0.788
e+e− vs. γγ      ResNet, energy   0.997
In sum, these results illustrate the ability of end-to-end classifiers to discern fine shower details in granular detectors like the CMS ECAL. Particles from real physics decays, however, add a further complexity: kinematics, which we turn to next.
In a real physics process, energy and momentum conservation constrain the allowed kinematics of the produced particles. The Higgs boson decay and its related backgrounds are no exception. For the γγ background, the shower types are identical to those of the H → γγ decay, so any differences are entirely due to kinematics. For the γ + jet background, one of the particles is also of a different type. This creates an opportunity to exploit differences in particle shower shape in addition to kinematic differences, as seen in Section 6. Moreover, in a realistic physics scenario, the classifier must discriminate between multiple decay processes simultaneously. In this section, we therefore classify H → γγ vs. γγ vs. γ + jet.

The end-to-end event classification results are divided by pseudorapidity category (see Section 2). The results for the central (central+forward) category are given in Table 6 (Table 7), with the corresponding ROC curves in Figure 3 (Figure 5). The classifiers are labeled as follows:

- EB (ECAL): the ECAL-only classifier;
- CMS-B (CMS-I): the Tracks+ECAL+HCAL classifier in the ECAL-centric geometry;
- CMS-II: the HCAL-centric classifier, included for the central+forward region only.

In each category, we plot the signal vs. combined background ROC (1-vs-Rest), as well as the signal vs. single-background ROC component (1-vs-1) (see Section 5.4). For context, we also include two further classifiers: the (mass de-correlated) 4-momentum-only classifier (4-mom), and the mass-aware ECAL-only classifier (EB/ECAL, mass-aware). Note that the results are evaluated on balanced class samples (see Section 5.4). The limited training statistics give an edge to the 4-momentum classifier. The following results should therefore not be taken as indications of the ultimate performance of end-to-end classifiers.

7.1 Central Pseudorapidity Region
To interpret these results, we first focus on the central category, which uses only detector images from the barrel section of CMS. The 1-vs-Rest plot (Figure 3a) shows that, overall, image-based classifiers perform substantially better than purely kinematics-based classifiers. This is expected given a shower-differentiated background, but it confirms that the end-to-end classifier is working as intended.

We also note that the EB and CMS-B classifiers perform comparably: including additional subdetectors brings only a negligible advantage. This is expected, since the γγ background is reconstructed exclusively in the ECAL, and the γ + jet background deposits most of its energy there. However, the event image in Figure 1 shows that the other subdetectors carry significant noise from pile-up and the underlying event. It is thus not a priori evident that the CMS-B classifier should perform as well as the EB under these circumstances. The absence of any sizable degradation from adding these noisy subdetectors shows that the end-to-end classifier can effectively suppress irrelevant features in the additional images.

Lastly, the highest performance, by a substantial margin, is achieved when the end-to-end classifier is allowed to learn the Higgs boson mass, i.e. when it is mass-aware (see Table 6). To gain further insight into this behavior, we next turn to the individual background components.

Looking at the binary H → γγ vs.
γγ component(Figure 3b), the end-to-end classifier seems to under-perform compared to the classifier. We attributethis to the limited statistics available in training theend-to-end classifier. Additional studies performed withhigher statistics on private samples of similar detectorfidelity show similar or better performance for the end-to-end classifier compared to the 4-momentum-only clas-sifier. An indication of this statistical limitation is de-scribed in the following subsection for the Central+Forwardcategory which contains more training statistics. This Table 6: Multi-class Event Classification Results, central | η | < .
44 region.
Metric 4-mom EB, mass-aware EB CMS-B
ROC AUC, 1-vs-Rest H → γγ γγ γ + jet 0.803 0.954 0.942 0.959 ROC AUC, 1-vs-1 H → γγ vs γγ H → γγ vs γ + jet 0.773 0.968 0.937 0.956 CVM demonstrates that, at least for this study, we have paidno penalty in using a general classifier trained on low-level detector data over a specialized kinematics-basedclassifier that relied on our ability to reconstruct theevent. The similarity in performance to the (mass de-correlated) 4-momentum classifier suggests the kine-matic information is manifested in the detector imagein two ways: the angular distribution of the photonshowers and the absolute energy of the photon shower’sconstituent hits. While mass de-correlation is clearly alossy operation, it preserves the angular informationeven though it removes the absolute energy scale, al-lowing for residual discrimination against irreduciblebackgrounds.Turning now to the H → γγ vs. γ + jet component(Figure 3c), we see that this particular background isprimarily responsible for the end-to-end advantage overthe kinematics-only approach. This is expected becausethe jet (typically a merged π → γγ ) is not fully re-solved and is instead reconstructed as a single photon inthe 4-momentum case, which is not supplemented withadditional shower shape information. Despite register-ing as a single photon-like cluster, the jet appears in theECAL image as a differentiated shower, which, on occa-sion, is discernible by eye (see Figure 1). As reviewed inSection 6, end-to-end classifiers are highly sensitive todifferences in particle shower shapes even when no dis-tinguishing kinematic information is present. Moreover,the γ +jet decay exhibits similar non-resonant kinemat-ics to the γγ background and thus, to the 4-momentumclassifier, the two should look alike. This is confirmedby their similar 4-momentum results (c.f. Figure 3b vs.3c). Owing to strong shower differentiation, the effect ofmass de-correlation on the γ + jet background is muchreduced. 
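The 1-vs-Rest and 1-vs-1 ROC AUCs discussed above can both be obtained from a single classifier output. The sketch below is illustrative only and is not the authors' evaluation code: it assumes that each ROC uses the classifier output in the signal label, uses hypothetical class tags `"hgg"`, `"gg"`, and `"gjet"`, and computes the AUC through the rank-sum (Mann-Whitney U) identity, with tied scores counted as one half.

```python
from typing import Dict, Sequence

SIGNAL = "hgg"  # hypothetical tag for the H -> gamma gamma class (assumption)


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) identity; tied scores get average ranks."""
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative score")
    ranked = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg])
    rank_sum_pos = 0.0
    i = 0
    while i < len(ranked):
        # Find the block [i, j] of entries sharing the same score.
        j = i
        while j + 1 < len(ranked) and ranked[j + 1][0] == ranked[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        rank_sum_pos += avg_rank * sum(label for _, label in ranked[i:j + 1])
        i = j + 1
    n_pos, n_neg = len(pos), len(neg)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def signal_aucs(labels: Sequence[str], p_signal: Sequence[float]) -> Dict[str, float]:
    """1-vs-Rest and per-background 1-vs-1 AUCs, all using the signal-label score."""
    sig = [p for y, p in zip(labels, p_signal) if y == SIGNAL]
    rest = [p for y, p in zip(labels, p_signal) if y != SIGNAL]
    aucs = {"1-vs-Rest": roc_auc(sig, rest)}
    for bkg in sorted(set(labels) - {SIGNAL}):
        bkg_scores = [p for y, p in zip(labels, p_signal) if y == bkg]
        aucs[f"1-vs-1 ({bkg})"] = roc_auc(sig, bkg_scores)
    return aucs


if __name__ == "__main__":
    # Toy, balanced sample: two events per class (made-up scores for illustration).
    labels = ["hgg", "hgg", "gg", "gg", "gjet", "gjet"]
    p_signal = [0.9, 0.6, 0.7, 0.4, 0.2, 0.1]
    print(signal_aucs(labels, p_signal))
    # -> {'1-vs-Rest': 0.875, '1-vs-1 (gg)': 0.75, '1-vs-1 (gjet)': 1.0}
```

Because the combined background is a mixture of the two components, the 1-vs-Rest AUC equals the average of the 1-vs-1 AUCs weighted by each background's share of the sample. With balanced classes, as in the toy above, this is the plain average, 0.875 = (0.75 + 1.0)/2; with unbalanced classes, the 1-vs-Rest value would lean toward the more populous background.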
The impact of mass de-correlation depends strongly on the relative importance of kinematics, in particular of the energy scale, versus shower differentiation. For backgrounds predominantly differentiated by kinematics, the effect of de-correlation will be substantial, while for primarily shower-differentiated backgrounds, the effect will be minimal. In Figure 4, we plot the classifier output in the signal label vs. the diphoton mass for the two backgrounds to illustrate the impact of mass de-correlation.

7.2 Central and Forward Pseudorapidity Region

In this category, we have included the endcap images either in ECAL-centric (ECAL, CMS-I) or HCAL-centric (CMS-II) fashion (see Sections 4 and 5.1). In general, we find that the qualitative conclusions from the central category also hold for the central+forward category. This informs us about the scalability of end-to-end network architectures and their ability to withstand the increased pile-up of the forward detector regions. The most notable difference is the significantly improved performance of the end-to-end classifiers in H → γγ vs. γγ discrimination (cf. Figure 3b vs. 5b), due to the larger statistics available for this category (see Section 5.3). This is to be contrasted with the 4-momentum classifier, whose performance has mostly plateaued, and with the H → γγ vs. γ + jet discrimination (cf. Figure 3c vs. 5c), which has mostly converged for both types of classifiers. This provides additional grounds for our claim that the end-to-end results for kinematic discrimination are statistically limited. In addition, we note that the ECAL-centric ECAL and CMS-I classifiers tend to outperform the HCAL-centric CMS-II classifier for γγ discrimination even though they have larger networks to train. This is, however, expected given the transformation applied to the ECAL endcaps (see Section 4) and the role that spatial resolution plays in measuring particle kinematics for this background.

In sum, we find the biggest gains in discriminating backgrounds that have subtle shower differences, as these maximally exploit both the full fidelity of the ECAL detector-level data and the CNN's ability to learn patterns at the level of individual image pixels.

Table 7: Multi-class Event Classification Results, central+forward |η| < … region.

Metric                                                      | 4-mom | ECAL, mass-aware | ECAL  | CMS-I | CMS-II
ROC AUC, 1-vs-Rest (H → γγ, γγ, γ + jet)                    | 0.810 | 0.955            | 0.939 | 0.954 | 0.956
ROC AUC, 1-vs-1 (H → γγ vs γγ, H → γγ vs γ + jet)           | 0.788 | 0.972            | 0.936 | 0.954 | 0.953
CVM                                                         |       |                  |       |       |

In this paper, we described the construction of general, end-to-end, image-based event classifiers, using high-fidelity simulated low-level CMS detector data as input. The use of end-to-end classifiers is not restricted to any particular topology and does not rely on the ability to fully reconstruct the event kinematics. They can be applied to arbitrarily complex event topologies and are particularly relevant to cases where traditional reconstruction approaches are difficult, for example, the highly boosted and merged topologies that arise in many BSM models [24]. To combine overlapping subdetector images of dissimilar segmentation, we chose one subdetector to render faithfully and projected all other subdetectors onto its segmentation. While these classifiers are best suited to challenging decays, we have applied them in a simplified manner to the SM H → γγ decay to highlight their key features and challenges. Through the irreducible γγ background, we were able to infer that such classifiers are able to learn about the angular distribution of the photon showers as well as the absolute energy of their constituent hits. We showed that we can definitively de-correlate the event classifier from the reconstructed diphoton mass while still preserving the angular information by using a CVM-based loss penalty. We found that such classifiers can learn about the photon shower shapes, giving them an exceptionally strong advantage over a purely kinematics-based classifier in suppressing the reducible γ + jet background. Finally, we demonstrated the scalability and flexibility of the end-to-end classifiers when dealing with multiple detector images and networks, where they exhibited robustness against the presence of underlying event and pile-up.

Acknowledgements
We thank the entire CMS Collaboration for successfully recording LHC proton-proton collision data as well as producing and releasing high quality simulated data used in this paper. We also congratulate all members in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to CMS analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector.

We would like to thank the CERN Open Data group for releasing their simulated data under an open access policy. We strongly support initiatives to provide such high-quality simulated datasets that can encourage the development of novel but also realistic algorithms, especially in the area of machine learning. We believe their continued availability will be of great benefit to the high energy physics community in the long run. Finally, M.A. and M.P. are supported by the Office of High Energy Physics of the U.S. Department of Energy (DOE) under award DE-SC0010118.
References
1. S. Chatrchyan, et al., The CMS Experiment at the CERN LHC, JINST 3, S08004 (2008). DOI 10.1088/1748-0221/3/08/S08004
2. A.M. Sirunyan, et al., Particle-flow reconstruction and global event description with the CMS detector, JINST 12(10), P10003 (2017). DOI 10.1088/1748-0221/12/10/P10003
3. V. Khachatryan, et al., Observation of the diphoton decay of the Higgs boson and measurement of its properties, Eur. Phys. J. C 74(10), 3076 (2014). DOI 10.1140/epjc/s10052-014-3076-z
4. P. Baldi, P. Sadowski, D. Whiteson, Searching for Exotic Particles in High-Energy Physics with Deep Learning, Nature Commun. 5, 4308 (2014). DOI 10.1038/ncomms5308
5. A. Krizhevsky, I. Sutskever, G.E. Hinton, in Advances in Neural Information Processing Systems 25, ed. by F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (Curran Associates, Inc., 2012), pp. 1097–1105
6. A. Esteva, et al., Dermatologist-level classification of skin cancer with deep neural networks, Nature 542, 115 (2017). DOI 10.1038/nature21056
7. L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, A. Schwartzman, Jet-images deep learning edition, JHEP 07, 069 (2016). DOI 10.1007/JHEP07(2016)069
8. G. Kasieczka, T. Plehn, M. Russell, T. Schell, Deep-learning Top Taggers or The End of QCD?, JHEP 05, 006 (2017). DOI 10.1007/JHEP05(2017)006
9. A. Aurisano, et al., A Convolutional Neural Network Neutrino Event Classifier, JINST 11(09), P09001 (2016). DOI 10.1088/1748-0221/11/09/P09001
10. W. Bhimji, et al., Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC, J. Phys. Conf. Ser. 1085(4), 042034 (2018). DOI 10.1088/1742-6596/1085/4/042034
11. G. Louppe, K. Cho, C. Becot, K. Cranmer, QCD-Aware Recursive Neural Networks for Jet Physics, JHEP 01, 057 (2019). DOI 10.1007/JHEP01(2019)057
12. S. Agostinelli, et al., GEANT4: A Simulation toolkit, Nucl. Instrum. Meth. A 506, 250 (2003). DOI 10.1016/S0168-9002(03)01368-8
13. CMS Collaboration, Simulated dataset GluGluHToGG_M-125_8TeV-pythia6 in AODSIM format for 2012 collision data (2017). CERN Open Data Portal. DOI 10.7483/OPENDATA.CMS.WQ7P.BZP3
14. CMS Collaboration, Simulated dataset DiPhotonBorn_Pt-25To250_8TeV_ext-pythia6 in AODSIM format for 2012 collision data (2017). CERN Open Data Portal. DOI 10.7483/OPENDATA.CMS.WV7J.8GN0
15. CMS Collaboration, Simulated dataset GJet_Pt40_doubleEMEnriched_TuneZ2star_8TeV_ext-pythia6 in AODSIM format for 2012 collision data (2017). CERN Open Data Portal. DOI 10.7483/OPENDATA.CMS.2W51.W8AT
16. M. Cacciari, G.P. Salam, Pileup subtraction using jet areas, Phys. Lett. B 659, 119 (2008). DOI 10.1016/j.physletb.2007.09.077
17. K. He, X. Zhang, S. Ren, J. Sun, in Proceedings, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Las Vegas, NV, USA, June 27-30, 2016 (2016), pp. 770–778. DOI 10.1109/CVPR.2016.90
18. A. Rogozhnikov, A. Bukva, V.V. Gligorov, A. Ustyuzhanin, M. Williams, New approaches for boosting to uniformity, JINST 10(03), T03002 (2015). DOI 10.1088/1748-0221/10/03/T03002
19. Yandex Data School, Flavours of Physics Challenge Evaluation (2017). https://github.com/yandexdataschool/flavours-of-physics-start/blob/master/evaluation.py
20. D.P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization (2014)
21. A. Paszke, et al., in Proceedings, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017)
22. M. Andrews, M. Paulini, S. Gleyzer, B. Poczos, End-to-End Event Classification of High-Energy Physics Data, J. Phys. Conf. Ser. 1085(4), 042022 (2018). DOI 10.1088/1742-6596/1085/4/042022
23. X. Shi, et al., Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting (2015)
24. B.A. Dobrescu, G.L. Landsberg, K.T. Matchev, Higgs boson decays to CP odd scalars at the Tevatron and beyond, Phys. Rev. D 63, 075003 (2001). DOI 10.1103/PhysRevD.63.075003

Fig. 3: Multi-class Event Classification ROC curves, central |η| < 1.44 region. (a) H → γγ vs. Rest. (b) H → γγ vs. γγ component. (c) H → γγ vs. γ + jet component.

Fig. 4: Central CMS-B classifier output in signal label vs. reconstructed diphoton mass for true γγ (4a) and γ + jet (4b) events, with and without mass de-correlation. The impact of de-correlation is more severe if the background lacks shower differentiation. (a) True γγ events, without mass de-correlation (left) and with (right). (b) True γ + jet events, without mass de-correlation (left) and with (right).

Fig. 5: Multi-class Event Classification ROC curves, central+forward |η| < … region. (a) H → γγ vs. Rest. (b) H → γγ vs. γγ component. (c) H → γγ vs. γ + jet component.