End-to-End Physics Event Classification with CMS Open Data: Applying Image-Based Deep Learning to Detector Data for the Direct Classification of Collision Events at the LHC

Abstract

This paper describes the construction of novel end-to-end image-based classifiers that directly leverage low-level simulated detector data to discriminate signal and background processes in pp collision events at the Large Hadron Collider at CERN. To better understand what end-to-end classifiers are capable of learning from the data and to address a number of associated challenges, we distinguish the decay of the standard model Higgs boson into two photons from its leading background sources using high-fidelity simulated CMS Open Data. We demonstrate the ability of end-to-end classifiers to learn from the angular distribution of the photons recorded as electromagnetic showers, their intrinsic shapes, and the energy of their constituent hits, even when the underlying particles are not fully resolved, delivering a clear advantage in such cases over purely kinematics-based classifiers.


Computing and Software for Big Science manuscript No. (will be inserted by the editor)


M. Andrews, M. Paulini, S. Gleyzer, B. Poczos

July 26, 2019


Keywords end-to-end · detector images · machine learning · deep learning · CNN · ResNet · photon ID · event classification · mass sculpting · LHC · CMS · open data · Higgs boson

M. Andrews, M. Paulini
Department of Physics, Carnegie Mellon University, Pittsburgh, USA

S. Gleyzer
Department of Physics, University of Florida, Gainesville, USA

B. Poczos
Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA

An important aspect of searches for physics beyond the standard model (SM) of particle physics at the CERN Large Hadron Collider (LHC) is the identification of signal events from their corresponding backgrounds. At the Compact Muon Solenoid (CMS) experiment [1], this task is accomplished by first reconstructing the low-level detector data into progressively more physically-motivated quantities [2] until arriving at tabular-like particle-level data. Traditional analysis approaches [3,4] use these condensed inputs to construct an event classifier that capitalizes on the decay structure or topology of the processes involved. While such approaches have been widely successful in understanding the SM, they potentially lose information in the process and limit more exhaustive searches for physics beyond the standard model (BSM).

In this paper, we propose a new approach for particle physics event classification that directly leverages low-level detector data as input: an end-to-end event classifier. Recent machine learning advances, in particular in the field of computer vision, have led to breakthrough applications of convolutional neural networks (CNNs) to scientific challenges, if the data can be expressed as an image or series of images [5,6]. By using low-level data representations, it is possible to construct high-fidelity classifiers that are able to generalize across feature scales and event topologies. At the same time, such classifiers can be robust, general event classifiers as their construction is event topology-independent, making them well-suited to merged and variable decay structures.

While the full potential of end-to-end classifiers in high-energy physics lies in probing challenging BSM models, we first choose a simple but illustrative process to gain more insight into what such classifiers are able to learn, and to address some of the challenges involved in their application to searches for new physics. While CNNs have been used in the context of jet classification [7,8], event classification on entire detector images [9,10], and recursive neural networks on particle-level data [11] for similar tasks, this is the first application of an end-to-end classifier for event classification using high-quality LHC detector data. We explore the decay of the SM Higgs boson to two photons using the 2012 CMS Simulated Open Data, which utilize the highest grade of detector simulation.

This paper is organized as follows: in Section 2 we introduce the data samples and event selection. Section 3 describes the CMS geometry. In Section 4 we discuss the detector image construction, while we outline our network and training procedure in Section 5. Results for end-to-end particle identification are presented in Section 6. The classification of full high-energy physics events is discussed in Section 7 and our conclusions are summarized in Section 8.

The 2012 CMS Open Data provides high-quality, simulated CMS data events that we utilize to evaluate the end-to-end approach. The CMS Open Data contains the highest grade of detector simulation available, using Geant4 [12] to model the interaction of particles with the detector material and the most detailed geometry model of the CMS detector.

Datasets.

For our signal sample, we choose the gluon fusion Higgs to diphoton dataset [13], gg → H → γγ, with a Higgs mass of m_H = 125 GeV. For the background samples, we choose the two leading processes according to their production cross-section: quark fusion to prompt diphoton [14], q q̄ → γγ, the so-called Born diphoton production, and γ + jet production [15]. The γγ background is an irreducible background as it also contains two photons in the final state, differing from the H → γγ photons only in their kinematics. In the γ + jet background, the jet is electromagnetically enriched to deposit its energy primarily in the electromagnetic calorimeter via a neutral meson decaying to two merged photons. The jet thus appears as a single photon-like cluster. While there are other backgrounds involved in the Higgs to diphoton decay, the chosen backgrounds are representative of the most challenging types: kinematically-differentiated decays (γγ) and particle shower-differentiated decays due to unresolved objects (γ + jet).

All the above samples account for the multi-parton interactions from the underlying event as well as pile-up (PU) [16]. The PU distributions are run era dependent, with a peak average PU in the range ⟨PU⟩ = 18-21.

Event selection.

We categorize the samples based on pseudorapidity η, where η = −ln[tan(θ/2)] and θ is the spherical polar angle with respect to the beam axis. The central sample is restricted to |η| < 1.44 and the central+forward sample ranges up to |η| < 2.3, with the region around the electromagnetic calorimeter barrel-endcap boundary, 1.44 < |η| < 1.54, excluded. For both categories, we require two reconstructed photons, each with transverse momentum p_T > 20 GeV. Since the number of events is limited and unbalanced between datasets, with the lowest number coming from the γγ dataset, we apply no further photon quality requirements. We require, however, that the reconstructed mass of the diphoton system is m_γγ > 90 GeV. With these selections, we obtain 63,502 and 135,602 events in the γγ dataset for the central and central+forward categories, respectively. The selected events are broken down by run era in Table 1. For the remaining datasets, we take the first N_i events fulfilling the same era i breakdown, to minimize learning based on differences in pile-up.

Table 1: Number of selected events by run era, per |η| category, per dataset.

Category          Run2012AB   Run2012C   Run2012D
Central           16308       24538      22206
Central+forward   35141       47885      52576

The Compact Muon Solenoid (CMS) detector is arranged as a series of concentric cylindrical sections, including a barrel section and circular endcap sections, that enclose a central interaction point where the LHC proton beams collide. Each cylindrical detector section or subdetector specializes in measuring one or more aspects of the particles produced in the collision. Together, the information from the different subdetectors is used to re-create as complete a picture as possible of the collision event, or event for short.

3.1 Geometry

We focus on the three subdetectors most relevant for this study: the inner tracking system (Tracker), the electromagnetic calorimeter (ECAL), and the hadronic calorimeter (HCAL). The Tracker is the innermost cylindrical part of CMS and is responsible for detecting the hits associated with the tracks left by charged particles as they fly outward from the interaction point. This is reflected in the use of fine silicon segments that provide precise spatial resolution but no practical energy measurement. In 2012, the Tracker was composed of 13 barrel layers and 14 endcap layers. To avoid particles slipping through cracks in the layers, the barrel and endcap layers of the Tracker overlap in pseudorapidity in a non-trivial way. Each layer is composed of fine strip- or pixel-like silicon segments that provide the spatial localization. Moreover, the barrel and endcap sections of the Tracker are segmented differently: in cylindrical coordinates, with the beamline as the axis, they are segmented in axial position and azimuthal angle (z, φ) in the barrel and in radius and azimuthal angle (ρ, φ) in the endcap, with the dimensions of the segments changing with layer.

Surrounding the CMS Tracker system is the ECAL subdetector. The ECAL measures the energy deposits of electrons and photons by capturing almost all their energy using scintillating lead tungstate crystals. In the barrel section (EB), which spans |η| < 1.479, the crystals are arranged in a grid of pseudorapidity (iη_EB) and azimuthal angle (iφ_EB), giving a 170 × 360 crystal arrangement, respectively. This gives the EB an average granularity of Δη_EB × Δφ_EB = 0.0174 × 0.0174. In the endcap sections (EE), which span 1.479 < |η| < 3.0, the crystals are arranged in a Cartesian grid (iX, iY) with 7324 crystals per endcap. For reference, most electrons/photons will deposit > 90% of their energy within a 3 × 3 window of crystals.

Surrounding the ECAL is the HCAL, which measures the energy deposits of hadrons in its barrel (HB) and endcap (HE) sections. Its towers are segmented in pseudorapidity (iη_HCAL), azimuthal angle (iφ_HCAL), and readout depth (d_HCAL). The depth segmentation varies with |η| but is uniform in φ. Combined, the HB and HE span the range |η| < 3, with the boundary between the two occurring at |η| = 1.3. The HCAL is considerably coarser than the ECAL, with a granularity of Δη_HCAL × Δφ_HCAL = 0.087 × 0.087, or 5 × 5 ECAL crystals per tower. For |iη_HCAL| > 20, the φ granularity in the HE becomes more coarse still, with Δφ_HCAL = 0.174. Note also that iφ_HCAL = 1 does not correspond to the same plane as iφ_ECAL = 1, and thus must be shifted accordingly.

To avoid particles slipping through cracks undetected, none of the barrel-endcap boundaries between the Tracker, ECAL, and HCAL overlap.

3.2 Reconstruction

Below, we briefly describe how the particle interactions with the detector are used to reconstruct the detector hits which form the basis of the low-level data used in the end-to-end approach. For reference, we also provide an overview of how these low-level detector data are used to form the higher-level particle data conventionally used for physics analyses.
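As an illustration of the pseudorapidity definition and the EB segmentation described above, the following is a minimal sketch, not the authors' code. It assumes the nominal EB coverage |η| < 1.479 and a uniform crystal pitch in η and φ, which only approximates the real detector; the helper names `pseudorapidity` and `eb_pixel` are hypothetical.

```python
import math

# Assumed nominal values: 170 x 360 crystals covering |eta| < 1.479.
EB_ETA_MAX = 1.479
N_IETA, N_IPHI = 170, 360


def pseudorapidity(theta):
    """eta = -ln[tan(theta/2)], with theta the polar angle (radians)
    measured with respect to the beam axis."""
    return -math.log(math.tan(theta / 2.0))


def eb_pixel(eta, phi):
    """Map (eta, phi) to a zero-based (row, col) pixel of an unrolled
    170 x 360 EB image. Hypothetical helper assuming uniform pitch.
    Returns None for positions outside the barrel."""
    if abs(eta) >= EB_ETA_MAX:
        return None
    d_eta = 2.0 * EB_ETA_MAX / N_IETA   # about 0.0174 per crystal
    d_phi = 2.0 * math.pi / N_IPHI      # about 0.0175 per crystal
    row = int((eta + EB_ETA_MAX) / d_eta)
    col = int((phi % (2.0 * math.pi)) / d_phi)
    # Guard against floating-point spill-over at the upper edges.
    return min(row, N_IETA - 1), min(col, N_IPHI - 1)
```

The real crystal indices (iη_EB, iφ_EB) follow the detector's own numbering convention, so an offset and sign convention would still need to be applied on top of this zero-based pixel position.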

Calorimeter Hit Reconstruction.

Since both the ECAL and HCAL are scintillating calorimeters, they share similar strategies for the energy reconstruction of calorimeter deposits or hits [1]. As an electromagnetic (hadronic) particle enters an ECAL crystal (HCAL tower), an electromagnetic (hadronic) shower is produced. This is detected as a light pulse which is digitized into a series of amplitude readings over time, amounting to a short video of the shower evolution in the ECAL crystal (HCAL tower). By fitting a pulse shape onto these digitized amplitudes, the energy and timing associated with this deposit can then be determined. These values are then calibrated to give a final reconstructed energy and time per ECAL crystal (HCAL tower), leading to what is known as the reconstructed hit.

Tracker Hit & Track Reconstruction.

In contrast, as charged particles pass through the finely segmented Tracker subdetector, they deposit very little energy in the silicon. As such, the Tracker hits provide precise position information for charged particle track reconstruction but no practical energy information. Using the hits recorded in the different layers of the Tracker, a combinatorial Kalman-filter pattern recognition algorithm [1] is used to iteratively fit charged particle tracks through the Tracker hits starting from the seed layer. From these reconstructed track fits, various track parameters can be obtained, in particular, the track's position at the point of closest approach to the beamline (perigee), and its transverse momentum from its bending in the magnetic field of the CMS solenoid.
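The link between bending and transverse momentum can be made concrete with the textbook relation p_T [GeV] ≈ 0.3 |q| B [T] R [m] for a charged particle on a circular path of radius R in a uniform field B. The sketch below shows only this relation, not the Kalman-filter fit itself; the 3.8 T default is the nominal CMS solenoid field and is an assumption here, and the function names are hypothetical.

```python
def pt_from_radius(radius_m, b_field_t=3.8, charge=1.0):
    """Transverse momentum in GeV of a particle with the given charge
    (in units of e) bending with radius radius_m (metres) in a uniform
    field b_field_t (tesla). The constant 0.2998 is c in units that
    give GeV from T*m."""
    return 0.2998 * abs(charge) * b_field_t * radius_m


def radius_from_pt(pt_gev, b_field_t=3.8, charge=1.0):
    """Inverse relation: bending radius in metres for a given p_T."""
    return pt_gev / (0.2998 * abs(charge) * b_field_t)
```

Under these assumptions a 20 GeV track bends with a radius of roughly 17.6 m, far larger than the Tracker itself, which is why fine spatial resolution is needed to measure the curvature.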


High-level Particle Reconstruction.

The reconstructed tracks and calorimeter hits are the basic inputs to the rule-based CMS Particle Flow algorithm [2] that constructs intermediate-level data before producing final, high-level particle data. These include attributes such as probable particle identity, kinematics, and shower shape features. They serve as the primary inputs to most event classifiers used in CMS analyses. In contrast, in the end-to-end approach, the inputs are the reconstructed tracks and calorimeter hits. Due to the present unavailability of Tracker hits in the CMS Open Data, we use the reconstructed tracks rather than the low-level hits, similar to the approach in [10].

The CMS Open Data contains information about the reconstructed hits for the ECAL (HCAL) subdetectors, making it possible to construct calorimeter images whose pixels correspond exactly to physical crystals (towers). This is important because not all crystals (towers) have the exact same dimensions and images created using averaged dimensions will incur some distortion. Such a level of accuracy would not be possible with intermediate-level data like calorimeter towers (which have an HCAL-like granularity) or the particle-level data which are no longer expressed in detector coordinates.

Combining Images.

The main challenge in combining subdetector images arises not from differences in granularity but from differences in segmentation and the fact that regions of dissimilar segmentation overlap. For subdetector sections which do not spatially overlap (e.g. the ECAL barrel and the ECAL endcap) these images are kept separate. However, for subdetector sections which do overlap, such as the ECAL barrel and the HCAL barrel calorimeters, the depth information will be compromised if the images are not combined at the input level. Even though, in 2-dimensional CNNs, convolutions are not performed along the depth axis, the activations along the depth axis are still being summed over.

Therefore, to investigate the trade-off between detector fidelity and image integration, we experiment with different geometry strategies: we choose a subdetector S to represent with the highest fidelity and project all other subdetectors S′ to the segmentation and boundaries of S. Procedures for constructing ECAL- and HCAL-centric geometries are described below and visualized for a single γ + jet event in Figure 1.

ECAL Images.

The ECAL image is defined by reconstructed hit energies and ECAL crystal coordinates. These are distinct for the EB and the EEs since they have different segmentation (see Section 3.1). For the EB, we construct an unrolled rectangular 170 × 360 image. For the EEs, we inscribe each circular EE section in a square 100 × 100 image. These define the ECAL-centric geometry. Alternatively, for the HCAL-centric geometry, we construct a contiguous ECAL image by projecting the (iX, iY)-segmented EEs onto an EB-like (iη, iφ) segmentation. These are then stitched to the ends of the EB image to form a single 280 × 360 image that spans the same η range as the HCAL. Since this results in sparse showers in the endcap regions, we smear out each hit over a 2 × [...]

HCAL Images.

The HCAL image is defined in terms of reconstructed hit energies versus HCAL tower coordinates. These are shared by the HB and the HEs due to their similar segmentation. Since most events (≈ [...]) [...], the hits are summed over depth d for a given (iη_HCAL, iφ_HCAL). In addition, some towers overlap in physical η and are summed over as well to provide consistent alignment with the ECAL image. For |iη_HCAL| > 20, where the φ granularity is halved (see Section 3.1), we share the energy across two iφ_HCAL towers. We can thus construct a single, contiguous 56 × 72 image for the combined HB and HE. Without loss of information, this image is upsampled by a factor of 5 to produce a 280 × 360 HCAL image. This defines the HCAL-centric geometry. For the ECAL-centric geometry, the portions of this image which overlap with EB are left untouched while those which overlap with the EEs are detached and projected from their native (iη, iφ) segmentation onto an EE-like (iX, iY) segmentation, giving a 100 × 100 image per endcap.
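The factor-of-5 upsampling from 56 × 72 to 280 × 360 can be sketched as plain pixel replication, as below. This is an illustrative sketch rather than the authors' implementation: the text does not say whether tower energies are copied or divided among the 25 sub-pixels, so the `conserve_energy` flag is an assumption that exposes both choices, and the function name `upsample` is hypothetical.

```python
def upsample(image, factor=5, conserve_energy=False):
    """Nearest-neighbour upsampling of a 2D list-of-lists image.

    Each input pixel becomes a factor x factor block. With
    conserve_energy=True the value is divided by factor**2 so the
    image total is unchanged (an assumed convention, not taken from
    the paper); otherwise the value is simply replicated.
    """
    scale = 1.0 / (factor * factor) if conserve_energy else 1.0
    out = []
    for row in image:
        # Repeat every value `factor` times along the column axis.
        wide = [value * scale for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times along the row axis.
        out.extend(list(wide) for _ in range(factor))
    return out
```

Replication keeps every tower boundary aligned with a 5 × 5 block of ECAL-sized pixels, which is what allows the HCAL image to be stacked with the 280 × 360 ECAL image.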

Tracker Images.

Because of the lack of Tracker hits in the CMS Open Data, the tracker image is constructed as a 2D histogram of the reconstructed tracks' (η, φ) positions at perigee in either ECAL- or HCAL-centric geometry. To help discriminate against the numerous pile-up tracks, each track entry is weighted by its transverse momentum. Only high-purity tracks, or tracks with the highest level of fit quality, are used.

For the central category, we use only the subdetector images which overlap with the EB (Figure 1a), giving image inputs of resolution 170 × 360. For the central+forward category, we use the full subdetector images, giving image inputs of resolution 170 × 360 and 100 × 100 for the ECAL-centric geometry, and 280 × 360 for the HCAL-centric one. Lastly, while the event selection described in Section 2 applies η cuts on candidate photons, no such cuts are applied in the construction of the actual detector images in this paper, although this remains an option for future work.

Fig. 1: Composite images of a single γ + jet event in different geometry strategies: separate Barrel (1a) and Endcaps (1b) for the ECAL-centric geometry, and stitched together (1c) for the HCAL-centric. (a) Barrel section of composite image in ECAL-centric geometry. Image resolution: 170 × 360. (c) Composite image in HCAL-centric geometry. Extent of EB indicated by minor ticks on y-axis. Image resolution: 280 × 360. Tracks are in yellow log scale, ECAL hits in blue log scale, and HCAL hits in gray linear scale. Additional zero suppression applied for clarity. Note the photon at around (iη = 70, iφ = 130) which is free of HCAL hits or Tracks. In contrast, the jet at around (iη = −[...], iφ = 340) shows contributions from all three subdetectors. Only the Barrel images (1a) are used for classification in the central category (see Section 2).

At the heart of the end-to-end classifier is a CNN. In this Section, we describe how these deep learning networks are applied in order to extract information from the various subdetector images (see Section 4) in a way that best complements each subdetector's knowledge of the event. Afterwards, we discuss some of the challenges associated with using end-to-end classifiers, how we train them, and how we evaluate their performance in this study.

5.1 Network Architecture

For all image-based classifiers, Residual Net-type networks (ResNet-15) are used due to their simplicity and scalability with image size and network depth [17]. A representative network is illustrated in Figure 2a. Since image pixel intensities carry information about energy scale, the best performance is obtained when using MaxPooling rather than AveragePooling operations, with no batch normalization. For samples in the central category, we use a single ResNet-15. For those in the central+forward category with ECAL-centric geometry, we use a separate ResNet-15 for each of the barrel, endcap−, and endcap+ images. They are concatenated at the output of their final GlobalMaxPooling layer before being fed to a Fully-Connected Network (FCN), as illustrated in Figure 2c. In the central+forward HCAL-centric geometry, we use a single ResNet-15. The various end-to-end classifier models are summarized in Table 2.

Within the available statistics, the end-to-end results do not benefit from deeper networks or the inclusion of an FCN in the case of a single ResNet-15. Other variations on concatenating the networks for multiple images were attempted but were found to be less performant.

To serve as a reference for conventional event classifiers, we train a separate 3-layer, 256-node FCN on the reconstructed 4-momenta of the two candidate photons in each event, which we denote as the 4-momentum classifier. Specifically, these are trained on the transverse momenta of the photons divided by the diphoton mass, p_T,i/m_γγ, their pseudorapidities η_i, and the cosine of their azimuthal separation, cos(φ_1 − φ_2), where i = 1, 2. Dividing by m_γγ ensures the classifier is not correlated with the mass of the Higgs boson [3]. Note that this classifier serves as a purely kinematics-based reference and does not take into account information about the shape of the photon showers.
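The kinematic inputs of the reference classifier can be sketched as follows. This is a hedged illustration rather than the authors' code: it computes the diphoton mass from the standard massless two-body formula m² = 2 p_T1 p_T2 [cosh(η_1 − η_2) − cos(φ_1 − φ_2)], whereas the paper uses the reconstructed mass, and the function names and feature ordering are assumptions.

```python
import math


def diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles from (pT, eta, phi)."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative rounding errors


def kinematic_features(pt1, eta1, phi1, pt2, eta2, phi2):
    """Feature vector [pT1/m, pT2/m, eta1, eta2, cos(phi1 - phi2)].

    Dividing each pT by the diphoton mass removes the overall energy
    scale, so the mass itself is not directly available to the network.
    Assumes the two photons are not collinear (m > 0).
    """
    m = diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2)
    return [pt1 / m, pt2 / m, eta1, eta2, math.cos(phi1 - phi2)]
```

For two back-to-back central photons of 62.5 GeV each, the mass evaluates to 125 GeV and the scaled momenta are both 0.5, regardless of the absolute energy scale.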
The 4-momentum classifier results were not sensitive to the depth and width of the FCN.

5.2 Preprocessing and Mass De-correlation

Preprocessing plays a major role not just in improving the network optimization process but in controlling the physics content of the inputs themselves. In particular, for an event classifier intended for a resonance search, it is desirable for the classifier's output to not be correlated with the mass of the signal resonance. Since one typically applies a cut on the classifier score to obtain a signal-enriched sample, this mitigates the risk of sculpting a false peak in the background.

Mass-sculpting, as it is commonly called, is especially an issue for irreducible backgrounds that differ only by kinematics and where good mass resolution is available from the classifier inputs. To measure it, we use the Cramér-von Mises (CVM) metric suggested in [18] and implemented in approximate form in [19]. We calculate this on the classifier's signal score vs. reconstructed diphoton mass for true γγ events, as illustrated in Figure 4a.

Table 2: Summary of end-to-end models used in this paper. *NOTE: Models from the central category only use the barrel portion of the subdetector images (c.f. Figure 1a).

Model    Category      Architecture         Inputs
EB       Central       ResNet-15            ECAL*
CMS-B    Central       ResNet-15            Tracker, ECAL, HCAL*
ECAL     Central+Fwd   3 x ResNet-15, FCN   ECAL
CMS-I    Central+Fwd   3 x ResNet-15, FCN   Tracker, ECAL, HCAL
CMS-II   Central+Fwd   ResNet-15            Tracker, ECAL, HCAL

Fig. 2: The Residual Net (ResNet) architecture, as used for single (2a) and multiple (2c) image inputs. (a) ResNet-15; layers shown: Conv2D 7x7, 16, /2; MaxPool, /2; Residual Block, 16; Residual Block, 32, /2; Residual Block, 32; GlobalMaxPool. (b) The Residual block with skip connection; layers shown: Conv2D 3x3, ReLU, Conv2D 3x3, ReLU. (c) Concatenation of multiple ResNet-15 networks from separate barrel and endcap inputs; elements shown: [Endcap −], [Barrel], and [Endcap +] inputs, one ResNet-15 each, Concatenate, FC 128, FC 128.

To achieve mass de-correlation, we divide each image by the reconstructed diphoton mass for that event. To first approximation, this has the effect of transforming the energy scale of the diphoton system to have unit invariant mass for both signal and background events. Using the scalar sum of the diphoton p_T values (i.e. neglecting the angular component) is also effective. In fact, any quantity that maps the diphoton invariant mass for both signal and background events to a similar distribution achieves a similar effect. However, in the image-based approach, this only delays the onset of mass-sculpting but does not completely eliminate it; we suspect the photon shower profile provides an alternate avenue for learning the energy of the shower. In practice, while one can implement early stopping to intercept the training before the mass is learned, it is desirable to have a more robust and definitive guard against mass-sculpting. We accomplish this using an additional loss term proportional to the CVM metric itself, as outlined in the following Subsection 5.3.

Finally, because the energy deposits in the HCAL tend to be significantly lower in magnitude, we rescale the pixel intensities in the HCAL image by a constant to improve training.

5.3 Training

We train for three-class (H → γγ vs. γγ vs. γ + jet) classification with the normalized classification scores for each class label given by the softmax function. We use the ADAM adaptive learning rate optimizer [20] to minimize the cross-entropy loss. In general, we use an initial learning rate of 5 × 10^(−[...]), batch size of 360, and implement early stopping if no progress is seen beyond 5 epochs. The breakdown of training and test set, which doubles as the validation set, is shown in Table 3.
These limited statistics provide a slight advantage to the 4-momentum classifier which has fewer weights to train. Both training and validation sets contain balanced samples of the three classes. All training was done using the PyTorch [21] software library running on a single NVIDIA Titan X GPU.

To achieve more robust mass de-correlation, in addition to the cross-entropy loss, we minimize an additional loss term proportional to the CVM metric [18] of the training batch. Specifically, we implement the k-Nearest Neighbors version of the CVM gradient with k = 12 and a strength of λ = 0.15 (central) or 0.[...] [...] CVM < 0.002 for the central category and CVM < 0.004 for the central+forward category without the need for early stopping. A detailed description of this matter is outside the scope of this paper and will be treated in a separate paper.

Table 3: Number of events in training and test sets for each class. Test set doubles as validation set due to limited statistics. The total training and test sets contain a balanced number of class samples.

Category          Training Events   Test Events
                  per class         per class
Central           51200             11800
Central+forward   120000            15600
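For readers less familiar with the loss used in Section 5.3, the softmax scores and cross-entropy loss for the three-class problem can be sketched in a few lines. This is a generic illustration and not the authors' PyTorch training code; the class ordering (0 = H → γγ, 1 = γγ, 2 = γ + jet) is an assumption, and the additional CVM term is deliberately not reproduced because its details are not given in the text.

```python
import math


def softmax(logits):
    """Normalized class scores; subtracting the maximum avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log of the softmax score assigned to the true class."""
    return -math.log(softmax(logits)[label])


def batch_loss(batch_logits, batch_labels):
    """Mean cross-entropy over a batch of (logits, label) pairs."""
    losses = [cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)
```

An untrained network that outputs equal logits for all three classes yields a loss of ln 3 ≈ 1.10, which is a useful baseline when judging early training progress.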

Note that, in this paper, priority is given to presenting a broad and consistent survey of end-to-end classifiers over individual classifier optimization. As such, individual classifier hyper-parameter tuning was kept to a minimum, although we found the above training parameters to be robust across the different end-to-end classifiers.

5.4 Evaluation

We use the area under the curve (AUC) of the normalized Receiver Operating Characteristic (ROC) as the main figure of merit in this paper. As is common in High Energy Physics, the ROC curve is interpreted in terms of the signal sample efficiency (true positive rate) vs. background sample rejection (true negative rate). To evaluate the multi-class classification results, we define a per-class (1-vs-Rest) ROC and select the classifier with the best ROC AUC score in the signal label, subject to constraints on the CVM metric. To better understand the performance between the individual backgrounds in the multi-class classifier, for each background, we also present the (1-vs-1) signal vs. single background component of the ROC in the signal label. While the latter helps to give a sense of the individual background performance, it should be noted that multi-class classification is an inherently coupled problem, as is the nature of physics event classification. Lastly, as noted in Table 3, the evaluation is done on a balanced mix of class samples. While this is not necessarily the case in reality, it allows for an unbiased assessment of classifier performance in this simplified context.
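The 1-vs-Rest and 1-vs-1 figures of merit can be sketched as below. This is an illustrative sketch, not the evaluation code used for the paper: it computes the AUC through its rank interpretation (the probability that a random signal event receives a higher signal score than a random background event), uses a simple quadratic-time loop that suits small samples only, and assumes that the signal class has index 0.

```python
def roc_auc(signal_scores, background_scores):
    """AUC as the fraction of (signal, background) pairs in which the
    signal event has the higher score; ties count as one half."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


def one_vs_rest_auc(scores, labels, signal_label=0):
    """Signal vs. all backgrounds combined, using the signal-class score."""
    sig = [row[signal_label] for row, y in zip(scores, labels) if y == signal_label]
    bkg = [row[signal_label] for row, y in zip(scores, labels) if y != signal_label]
    return roc_auc(sig, bkg)


def one_vs_one_auc(scores, labels, background_label, signal_label=0):
    """Signal vs. a single background, still using the signal-class score."""
    sig = [row[signal_label] for row, y in zip(scores, labels) if y == signal_label]
    bkg = [row[signal_label] for row, y in zip(scores, labels) if y == background_label]
    return roc_auc(sig, bkg)
```

Note that both variants rank events by the same signal-class score; only the set of background events entering the comparison changes, which is why the 1-vs-1 curves are components of the 1-vs-Rest curve rather than independent classifiers.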

For reference, we recap our earlier results [22] in end-to-end classification of electromagnetic showers, or particle identification in the ECAL. Using the same image construction techniques described in Section 4, we successfully discriminated simulated electron- (e−) vs. photon- (γ) induced showers in the ECAL Barrel. While not a practical task when track information is taken into account, when the ECAL information is taken in isolation, electron- and photon-induced showers appear nearly identical. It is only through higher-order effects, such as bremsstrahlung due to the electron's interaction with the Tracker material and bending from the CMS solenoid, that the electron shower becomes slightly smeared and asymmetric in φ, an effect that is practically impossible to discern by eye.

In Table 4, we present the best-in-category results of using CNN-based (CNN), convolutional long short-term memory-based (Conv-LSTM) recurrent neural networks [23], and fully-connected neural networks (FCN) on 32 × 32 ECAL images centered on the shower maximum constructed out of various low-level data inputs. Our results suggest a preference for convolutional-based architectures, and that it is sufficient to use reconstructed hit energies (see Section 3.2) to attain best results. Moreover, the range of these scores serves to illustrate how sensitive end-to-end classifiers are at processing low-level detector information even when the showers appear indistinguishable to the naked eye.

Table 4: Best-in-Category results of e− vs. γ shower classification on 32 × 32 ECAL Barrel images. Energy inputs correspond to reconstructed hit energies, while digis correspond to the series of digitized amplitudes vs. time (see Section 3.2).

Category   Network, Input     ROC AUC
CNN        VGG, energy        0.807
LSTM       Conv-LSTM, digis   0.799
FCN        3-layers, digis    0.770
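A 32 × 32 crop centered on the shower maximum, as used for the results in Table 4, can be sketched as follows. This is a minimal illustration and not the authors' code: taking the single highest-energy pixel as the shower maximum, wrapping around in φ, and zero-padding beyond the η edges of the barrel are all assumptions, and the function name is hypothetical.

```python
def crop_around_max(image, size=32):
    """Return a size x size window of a 2D list-of-lists image centered
    on its highest-energy pixel.

    Rows are treated as the eta direction (zero-padded past the edges)
    and columns as the phi direction (periodic, so indices wrap around).
    The peak lands at crop[size // 2][size // 2].
    """
    n_rows, n_cols = len(image), len(image[0])
    peak_row, peak_col = max(
        ((r, c) for r in range(n_rows) for c in range(n_cols)),
        key=lambda rc: image[rc[0]][rc[1]],
    )
    half = size // 2
    crop = []
    for r in range(peak_row - half, peak_row + half):
        if 0 <= r < n_rows:
            crop.append([image[r][c % n_cols]
                         for c in range(peak_col - half, peak_col + half)])
        else:
            crop.append([0.0] * size)  # outside the barrel in eta
    return crop
```

The periodic treatment of φ matters for showers near the seam of the unrolled barrel image, where a naive slice would cut the shower in two.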

As a further step, we take the whole EB image instead of just the shower crop. The e− vs. γ results are shown in Table 5. We observe a minimal loss in performance, owing to the CNN's ability to learn features in a translationally-invariant manner. More importantly, we see a marked improvement in the classification of uncorrelated particle gun pairs (e+e− vs. γγ) due to the CNN's having learned that the shower pairs must be either both electron-like or both photon-like. This suggests that topological complexity works in favor of the end-to-end approach and that the greatest gains may come from more challenging topologies.

Table 5: Results of shower classification on full ECAL Barrel images.

Classification   Network, Input   ROC AUC
e− vs. γ         ResNet, energy   0.788
e+e− vs. γγ      ResNet, energy   0.997

In sum, these results illustrate the ability of end-to-end classifiers to discern fine shower details in granular detectors like the CMS ECAL. However, when dealing with particles from real physics decays, there is the additional complexity introduced by kinematics, which we turn our attention to next.

In a real physics process, energy and momentum conservation impose physical constraints on the allowed kinematics of the produced particles. The case of the Higgs boson decay and its related backgrounds is no exception. For the γγ background, the shower types are, in fact, identical to those of the H → γγ decay and any differences are entirely due to kinematics. For the γ + jet background, in addition to kinematic differences, one of the particles is of a different type, and the opportunity exists to exploit differences in the particle shower shape, as we have already seen in Section 6. Moreover, in a realistic physics scenario, the classifier must also simultaneously discriminate between multiple decay processes. In this section, we therefore attempt to classify H → γγ vs. γγ vs. γ + jet backgrounds.

The end-to-end event classification results are divided by pseudorapidity (see Section 2), with the results for the central (central+forward) category given in Table 6 (Table 7), and corresponding ROC curves in Figure 3 (Figure 5). The ECAL-only classifier is labeled EB (ECAL) and the Tracks+ECAL+HCAL classifier in the ECAL-centric geometry is labeled CMS-B (CMS-I). For the central+forward region, we also include the results of the HCAL-centric classifier (CMS-II). In each category, we plot the signal vs. combined background ROC (1-vs-Rest), as well as the signal vs. single background ROC component (1-vs-1) (see Section 5.4). For context, we also include the results of the (mass de-correlated) 4-momentum-only classifier (4-mom) and the mass-aware ECAL-only classifier (EB/ECAL, mass-aware). Note that the results represent evaluations on balanced class samples (see Section 5.4). Due to the limited training statistics available, which gives an edge to the 4-momentum classifier, the following results should not be taken as indications of ultimate end-to-end classifier performance.

7.1 Central Pseudorapidity Region

To interpret these results, we first focus on the central category where we only use detector images from the barrel section of CMS. From the 1-vs-Rest plot (Figure 3a), we see that, overall, image-based classifiers deliver substantially better performance than purely kinematics-based classifiers. This is expected in the presence of a shower-differentiated background but serves to confirm that the end-to-end classifier is performing as expected. We also note that the EB and CMS-B classifiers perform comparably, with only negligible advantage to including additional subdetectors. Since the γγ background is a signature exclusively reconstructed in the ECAL, and the γ + jet background deposits the majority of its energy in the ECAL, this is expected. However, looking back at the event image in Figure 1, we see that other subdetectors carry significant noise from collision event pile-up and underlying event. It is thus not a priori evident that the CMS-B classifier should perform as well as the EB under these circumstances. That no sizable degradation in performance is seen from the inclusion of additional noisy subdetectors indicates the ability of the end-to-end classifier to effectively screen out extraneous image features. Lastly, we note that the highest performance, by a substantial margin, is achieved when the end-to-end classifier is allowed to learn the Higgs boson mass, or is mass-aware (see Table 6). To gain further insight into this behaviour, we next turn to the individual background components.

Looking at the binary H → γγ vs.
γγ component (Figure 3b), the end-to-end classifier seems to underperform compared to the 4-mom classifier. We attribute this to the limited statistics available in training the end-to-end classifier. Additional studies performed with higher statistics on private samples of similar detector fidelity show similar or better performance for the end-to-end classifier compared to the 4-momentum-only classifier. An indication of this statistical limitation is described in the following subsection for the central+forward category, which contains more training statistics. This demonstrates that, at least for this study, we have paid no penalty in using a general classifier trained on low-level detector data over a specialized kinematics-based classifier that relied on our ability to reconstruct the event. The similarity in performance to the (mass de-correlated) 4-momentum classifier suggests the kinematic information is manifested in the detector image in two ways: the angular distribution of the photon showers and the absolute energy of the photon shower's constituent hits. While mass de-correlation is clearly a lossy operation, it preserves the angular information even though it removes the absolute energy scale, allowing for residual discrimination against irreducible backgrounds.

Table 6: Multi-class Event Classification Results, central |η| < 1.44 region.

Metric                                               4-mom   EB, mass-aware   EB      CMS-B
ROC AUC, 1-vs-Rest (H → γγ, γγ, γ + jet)             0.803   0.954            0.942   0.959
ROC AUC, 1-vs-1 (H → γγ vs γγ, H → γγ vs γ + jet)    0.773   0.968            0.937   0.956
CVM

Turning now to the H → γγ vs. γ + jet component (Figure 3c), we see that this particular background is primarily responsible for the end-to-end advantage over the kinematics-only approach. This is expected because the jet (typically a merged π0 → γγ) is not fully resolved and is instead reconstructed as a single photon in the 4-momentum case, which is not supplemented with additional shower shape information. Despite registering as a single photon-like cluster, the jet appears in the ECAL image as a differentiated shower, which, on occasion, is discernible by eye (see Figure 1). As reviewed in Section 6, end-to-end classifiers are highly sensitive to differences in particle shower shapes even when no distinguishing kinematic information is present. Moreover, the γ + jet decay exhibits similar non-resonant kinematics to the γγ background and thus, to the 4-momentum classifier, the two should look alike. This is confirmed by their similar 4-momentum results (cf. Figure 3b vs. 3c). Owing to strong shower differentiation, the effect of mass de-correlation on the γ + jet background is much reduced.
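The degree to which a classifier output still depends on the diphoton mass can be quantified with a Cramér-von Mises style flatness measure, which is how we read the CVM entry in the results tables. The sketch below is written in the spirit of the uniformity measures of Ref. [18] and is not a reproduction of the evaluation code of Ref. [19]. The function names, the binning, and the example values are assumptions made for illustration. It compares the distribution of classifier scores inside each mass bin with the distribution over all events. A value of zero means the score distribution is the same in every bin.

```python
import bisect


def empirical_cdf(sorted_values, x):
    """Fraction of `sorted_values` that are <= x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def cvm_flatness(scores, masses, bin_edges):
    """Event-weighted mean over mass bins of the squared distance between
    the in-bin score CDF and the global score CDF (0 = no mass dependence)."""
    global_sorted = sorted(scores)
    total = 0.0
    n_used = 0
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = sorted(s for s, m in zip(scores, masses) if low <= m < high)
        if not in_bin:
            continue  # empty mass bins carry no weight
        # Average the squared CDF difference over the global score sample.
        sq_diff = sum(
            (empirical_cdf(in_bin, s) - empirical_cdf(global_sorted, s)) ** 2
            for s in global_sorted
        ) / len(global_sorted)
        total += len(in_bin) * sq_diff
        n_used += len(in_bin)
    return total / n_used if n_used else 0.0


if __name__ == "__main__":
    masses = [100.0, 100.0, 120.0, 120.0]   # hypothetical diphoton masses
    edges = [90.0, 110.0, 130.0]            # hypothetical mass bins
    print(cvm_flatness([0.2, 0.8, 0.2, 0.8], masses, edges))  # mass-independent
    print(cvm_flatness([0.2, 0.2, 0.8, 0.8], masses, edges))  # mass-dependent
```

A penalty of this kind added to the training loss pushes the classifier toward outputs whose distribution does not vary with the reconstructed mass.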
The impact of mass de-correlation depends strongly on the importance of kinematics, in particular of the energy scale, relative to shower differentiation. For decays predominantly differentiated by kinematics, the effect of de-correlation will be substantial, while for primarily shower-differentiated decays, the effect will be minimal. In Figure 4, we plot the classifier output in the signal label vs. diphoton mass for the two backgrounds to illustrate the impact of mass de-correlation.

7.2 Central and Forward Pseudorapidity Region

In this category, we have included the Endcap images either in ECAL-centric (ECAL, CMS-I) or HCAL-centric (CMS-II) fashion (see Sections 4 and 5.1). In general, we find the qualitative conclusions from the central category to be also relevant for the central+forward category. This informs us about the scalability of end-to-end network architectures and their ability to withstand the increased pile-up of the forward detector regions. The most notable difference is the significantly improved performance of the end-to-end classifiers in H → γγ vs. γγ discrimination (cf. Figure 3b vs. 5b) due to the larger available statistics for this category (see Section 5.3). This is to be contrasted with the 4-mom classifier, whose performance has mostly plateaued, and the H → γγ vs. γ + jet discrimination (cf. Figure 3c vs. 5c), which has mostly converged for both types of classifiers. This provides additional grounds for our claim that the end-to-end results for kinematic discrimination are statistically limited. In addition, we note that the ECAL-centric (ECAL, CMS-I) classifiers tend to outperform the HCAL-centric

CMS-II classifier for γγ discrimination even though they have larger networks to train. This is, however, expected given the transformation applied to the ECAL endcaps (see Section 4) and the role that spatial resolution plays in measuring particle kinematics for this background.

In sum, we find the biggest gains in discriminating backgrounds which have subtle shower differences, as these maximally exploit both the full fidelity of the ECAL detector-level data and the CNN's ability to learn patterns at the level of individual image pixels.

Table 7: Multi-class Event Classification Results, central+forward |η| < .

Metric                                               4-mom   ECAL, mass-aware   ECAL    CMS-I   CMS-II
ROC AUC, 1-vs-Rest (H → γγ, γγ, γ + jet)             0.810   0.955              0.939   0.954   0.956
ROC AUC, 1-vs-1 (H → γγ vs γγ, H → γγ vs γ + jet)    0.788   0.972              0.936   0.954   0.953
CVM

In this paper, we described the construction of general, end-to-end, image-based event classifiers, using high-fidelity simulated low-level CMS detector data as input. The use of end-to-end classifiers is not restricted to any particular topology and does not rely on the ability to fully reconstruct the event kinematics. It can be applied to arbitrarily complex event topologies and may be particularly relevant to cases where traditional reconstruction approaches are difficult, for example, for highly boosted and merged topologies that arise in many BSM models [24]. To combine overlapping subdetector images of dissimilar segmentation, we chose one subdetector to render faithfully and projected all other subdetectors to its segmentation. While these classifiers are best suited to challenging decays, we have applied them in a simplified manner to the SM H → γγ decay to highlight their key features and challenges. Through the irreducible γγ background, we were able to infer that such classifiers are able to learn about the angular distribution of the photon showers as well as the absolute energy of their constituent hits. We showed that we can definitively de-correlate the event classifier from the reconstructed diphoton mass while still preserving the angular information by using a CVM-based loss penalty. We found that such classifiers can learn about the photon shower shapes, giving them an exceptionally strong advantage over a purely kinematics-based classifier in suppressing the reducible γ + jet background. Finally, we demonstrated the scalability and flexibility of the end-to-end classifiers when dealing with multiple detector images and networks, where they exhibited robustness against the presence of underlying event and pile-up.

Acknowledgements

We thank the entire CMS Collaboration for successfully recording LHC proton-proton collision data as well as producing and releasing high quality simulated data used in this paper. We also congratulate all members in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to CMS analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector.

We would like to thank the CERN Open Data group for releasing their simulated data under an open access policy. We strongly support initiatives to provide such high-quality simulated datasets that can encourage the development of novel but also realistic algorithms, especially in the area of machine learning. We believe their continued availability will be of great benefit to the high energy physics community in the long run. Finally, M.A. and M.P. are supported by the Office of High Energy Physics of the U.S. Department of Energy (DOE) under award DE-SC0010118.

References

1. S. Chatrchyan, et al., The CMS Experiment at the CERN LHC, JINST, S08004 (2008). DOI 10.1088/1748-0221/3/08/S08004
2. A.M. Sirunyan, et al., Particle-flow reconstruction and global event description with the CMS detector, JINST (10), P10003 (2017). DOI 10.1088/1748-0221/12/10/P10003
3. V. Khachatryan, et al., Observation of the diphoton decay of the Higgs boson and measurement of its properties, Eur. Phys. J. C74 (10), 3076 (2014). DOI 10.1140/epjc/s10052-014-3076-z
4. P. Baldi, P. Sadowski, D. Whiteson, Searching for Exotic Particles in High-Energy Physics with Deep Learning, Nature Commun., 4308 (2014). DOI 10.1038/ncomms5308
5. A. Krizhevsky, I. Sutskever, G.E. Hinton, in Advances in Neural Information Processing Systems 25, ed. by F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (Curran Associates, Inc., 2012), pp. 1097–1105
6. A. Esteva, et al., Dermatologist-level classification of skin cancer with deep neural networks, Nature, 115 (2017). DOI 10.1038/nature21056
7. L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, A. Schwartzman, Jet-images deep learning edition, JHEP, 069 (2016). DOI 10.1007/JHEP07(2016)069
8. G. Kasieczka, T. Plehn, M. Russell, T. Schell, Deep-learning Top Taggers or The End of QCD?, JHEP, 006 (2017). DOI 10.1007/JHEP05(2017)006
9. A. Aurisano, et al., A Convolutional Neural Network Neutrino Event Classifier, JINST (09), P09001 (2016). DOI 10.1088/1748-0221/11/09/P09001
10. W. Bhimji, et al., Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC, J. Phys. Conf. Ser. (4), 042034 (2018). DOI 10.1088/1742-6596/1085/4/042034
11. G. Louppe, K. Cho, C. Becot, K. Cranmer, QCD-Aware Recursive Neural Networks for Jet Physics, JHEP, 057 (2019). DOI 10.1007/JHEP01(2019)057
12. S. Agostinelli, et al., GEANT4: A Simulation toolkit, Nucl. Instrum. Meth. A506, 250 (2003). DOI 10.1016/S0168-9002(03)01368-8
13. CMS Collaboration. Simulated dataset GluGluHToGG M-125 8TeV-pythia6 in AODSIM format for 2012 collision data (2017). CERN Open Data Portal. DOI 10.7483/OPENDATA.CMS.WQ7P.BZP3
14. CMS Collaboration. Simulated dataset DiPhotonBorn Pt-25To250 8TeV ext-pythia6 in AODSIM format for 2012 collision data (2017). CERN Open Data Portal. DOI 10.7483/OPENDATA.CMS.WV7J.8GN0
15. CMS Collaboration. Simulated dataset GJet Pt40 doubleEMEnriched TuneZ2star 8TeV ext-pythia6 in AODSIM format for 2012 collision data (2017). CERN Open Data Portal. DOI 10.7483/OPENDATA.CMS.2W51.W8AT
16. M. Cacciari, G.P. Salam, Pileup subtraction using jet areas, Phys. Lett. B, 119 (2008). DOI 10.1016/j.physletb.2007.09.077
17. K. He, X. Zhang, S. Ren, J. Sun, in Proceedings, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Las Vegas, NV, USA, June 27-30, 2016 (2016), pp. 770–778. DOI 10.1109/CVPR.2016.90
18. A. Rogozhnikov, A. Bukva, V.V. Gligorov, A. Ustyuzhanin, M. Williams, New approaches for boosting to uniformity, JINST (03), T03002 (2015). DOI 10.1088/1748-0221/10/03/T03002
19. Yandex Data School. Flavours of Physics Challenge Evaluation (2017). https://github.com/yandexdataschool/flavours-of-physics-start/blob/master/evaluation.py
20. D.P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization (2014)
21. A. Paszke, et al., in Proceedings, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017)
22. M. Andrews, M. Paulini, S. Gleyzer, B. Poczos, End-to-End Event Classification of High-Energy Physics Data, J. Phys. Conf. Ser. (4), 042022 (2018). DOI 10.1088/1742-6596/1085/4/042022
23. X. Shi et al., Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting (2015)
24. B.A. Dobrescu, G.L. Landsberg, K.T. Matchev, Higgs boson decays to CP odd scalars at the Tevatron and beyond, Phys. Rev. D, 075003 (2001). DOI 10.1103/PhysRevD.63.075003

Fig. 3: Multi-class Event Classification ROC curves, central |η| < 1.44 region. (a) H → γγ vs. Rest. (b) H → γγ vs. γγ component. (c) H → γγ vs. γ + jet component.

Fig. 4: Central CMS-B classifier output in signal label vs. reconstructed diphoton mass for true γγ (4a) and γ + jet (4b) events, with and without mass de-correlation. The impact of de-correlation is more severe if the background lacks shower differentiation. (a) True γγ events, without mass de-correlation (left) and with (right). (b) True γ + jet events, without mass de-correlation (left) and with (right).

Fig. 5: Multi-class Event Classification ROC curves, central+forward |η| < . (a) H → γγ vs. Rest. (b) H → γγ vs. γγ component. (c) H → γγ vs. γ + jet component.
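The 1-vs-Rest and 1-vs-1 ROC figures of merit shown in Figures 3 and 5 can be sketched as follows. This is a minimal illustration rather than the evaluation code used for this paper. It assumes the area under the ROC curve is computed from its rank-statistic definition, and the class labels, function names, and scores are hypothetical.

```python
def roc_auc(signal_scores, background_scores):
    """Probability that a random signal event outscores a random background
    event (ties count one half). This equals the area under the ROC curve."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


def signal_aucs(scores_by_class, signal="H->gg"):
    """Return the 1-vs-Rest and 1-vs-1 AUCs of the signal-class score.

    `scores_by_class` maps each true class label to the list of signal-class
    scores that the classifier assigned to events of that class.
    """
    sig = scores_by_class[signal]
    backgrounds = {k: v for k, v in scores_by_class.items() if k != signal}
    # 1-vs-Rest: pool every background class into one sample.
    rest = [s for values in backgrounds.values() for s in values]
    result = {"1-vs-Rest": roc_auc(sig, rest)}
    # 1-vs-1: compare the signal against each background class separately.
    for name, bkg in backgrounds.items():
        result["1-vs-1 " + name] = roc_auc(sig, bkg)
    return result


if __name__ == "__main__":
    # Hypothetical signal-class scores on a balanced three-class sample.
    toy_scores = {
        "H->gg": [0.9, 0.8, 0.6, 0.4],
        "gg": [0.7, 0.5, 0.3, 0.2],
        "g+jet": [0.1, 0.2, 0.3, 0.05],
    }
    for metric, value in signal_aucs(toy_scores).items():
        print(metric, value)
```

With equally sized background samples, as in a balanced evaluation, the 1-vs-Rest value defined this way is the average of the 1-vs-1 values, so a single well-separated background can raise the combined figure considerably.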