MLPF: Efficient machine-learned particle-flow reconstruction using graph neural networks
Joosep Pata, Javier Duarte, Jean-Roch Vlimant, Maurizio Pierini, Maria Spiropulu
Joosep Pata a,1,2, Javier Duarte b,3, Jean-Roch Vlimant c,2, Maurizio Pierini d,4, Maria Spiropulu e,2
1 National Institute of Chemical Physics and Biophysics (NICPB), Rävala pst 10, 10143 Tallinn, Estonia
2 California Institute of Technology, Pasadena, CA 91125, USA
3 University of California San Diego, La Jolla, CA 92093, USA
4 European Center for Nuclear Research (CERN), CH-1211 Geneva 23, Switzerland
Received: date / Accepted: date
Abstract
In general-purpose particle detectors, the particle-flow algorithm may be used to reconstruct a coherent particle-level view of the event by combining information from the calorimeters and the trackers, significantly improving the detector resolution for jets and the missing transverse momentum. In view of the planned high-luminosity upgrade of the CERN Large Hadron Collider, it is necessary to revisit existing reconstruction algorithms and ensure that both the physics and computational performance are sufficient in a high-pileup environment. Recent developments in machine learning may offer a prospect for efficient event reconstruction based on parametric models. We introduce MLPF, an end-to-end trainable machine-learned particle-flow algorithm for reconstructing particle-flow candidates based on parallelizable, computationally efficient, scalable graph neural networks and a multi-task objective. We report the physics and computational performance of the MLPF algorithm on a synthetic dataset of tt events in HL-LHC running conditions, including the simulation of multiple interaction effects, and discuss potential next steps and considerations towards ML-based reconstruction in a general-purpose particle detector.
Reconstruction algorithms at general-purpose high-energy particle detectors aim to provide a coherent, well-calibrated physics interpretation of the collision event. Variants of the particle-flow (PF) algorithm have been used at the PETRA [1], ALEPH [2], CMS [3] and ATLAS [4] experiments to reconstruct a particle-level interpretation of high-multiplicity hadronic collision events, given individual detector elements such as tracks and calorimeter clusters from a multi-layered, heterogeneous, irregular-geometry detector. The PF algorithm generally correlates tracks and calorimeter clusters from detector layers such as the electromagnetic calorimeter (ECAL), hadron calorimeter (HCAL) and others to reconstruct charged and neutral hadron candidates, as well as photons, electrons, and muons, with an optimized efficiency and resolution. Existing PF reconstruction implementations are optimized using simulation for each specific experiment, because detailed detector characteristics and geometry must be considered for the best possible physics performance.

Recently, there has been significant interest in adapting the PF reconstruction approach for future high-luminosity experimental conditions at the CERN Large Hadron Collider (LHC), as well as for proposed future collider experiments like the Future Circular Collider (FCC). While reconstruction algorithms are often based on an imperative, rule-based approach, the use of supervised machine learning (ML) to define reconstruction parametrically based on data and simulation samples may improve the physics reach of the experiments, while offering a modern computing solution that could scale better with the expected progress on ML-specific computing infrastructures, e.g., at high-performance computer centers.

a [email protected]. Corresponding author. This work was partially carried out at Caltech.
b [email protected]
c [email protected]
d [email protected]
e [email protected]
In addition to potentially improving the physics performance, one of the motivations for developing ML-based reconstruction is an improved computational performance over standard algorithms in a high-luminosity configuration, which would ultimately allow a more detailed reconstruction to be deployed at a fixed computing budget, as ML algorithms are well-suited to emerging highly parallel computing architectures.

ML-based reconstruction approaches have been proposed for various tasks, including PF [5]. The clustering of energy deposits in a realistic, irregular-geometry detector using graph neural networks (GNNs) was first proposed in Ref. [6]. The ML-based reconstruction of overlapping signals without a regular grid was further developed in Ref. [7], where an optimization scheme for reconstructing a variable number of particles based on a potential function, the object condensation approach, was proposed. The clustering of energy deposits from particle decays with potential overlaps is an essential input to PF reconstruction. In Ref. [8], various ML models including GNNs and computer-vision models have been studied for reconstructing neutral hadrons from multi-layered granular calorimeter images and tracking information. In particle gun samples, the ML-based approaches achieved a significant improvement in neutral hadron energy resolution over the default algorithm, an important step towards a fully parametric, simulation-driven reconstruction using ML.

In this paper, we build on these previous ML-based reconstruction approaches by extending the ML-based PF algorithm to reconstruct particle candidates in events with a large number of simultaneous pileup (PU) collisions. In Section 2, we propose a benchmark dataset that has the main components for a particle-level reconstruction of charged and neutral hadrons with PU.
In Section 3, we build on the existing ML-based reconstruction and propose a GNN-based machine-learned particle-flow (MLPF) algorithm whose runtime scales approximately linearly with the input size. Furthermore, in Section 4, we characterize the performance of the MLPF model on the benchmark dataset in terms of hadron reconstruction efficiency, fake rate and resolution, comparing it to the baseline PF reconstruction, while also demonstrating on synthetic data that MLPF reconstruction can be computationally efficient and scalable. Finally, in Section 5 we discuss some potential issues and next steps for ML-based PF reconstruction.
We use PYTHIA to generate the physics events and DELPHES to emulate the detector response and baseline reconstruction, using the rule-based PF candidates from DELPHES for additional validation. The DELPHES model corresponds to a CMS-like detector with a multi-layered charged particle tracker, an electromagnetic and a hadron calorimeter. Although this simplified simulation does not include important physics effects such as pair production, bremsstrahlung, nuclear interactions, electromagnetic showering or a detailed detector simulation, it allows the study of overall per-particle reconstruction properties for charged and neutral hadrons in a high-PU environment. Different reconstruction approaches can be developed and compared on this simplified dataset, where the expected performance is straightforward to assess, including from the aspect of computational complexity.

The inputs to PF are charged particle tracks and calorimeter clusters. We use these high-level detector inputs (elements), rather than low-level tracker hits or unclustered calorimeter hits, to closely follow how PF is implemented in existing reconstruction chains, where successive reconstruction steps are decoupled, such that each step can be optimized and characterized individually. In this toy dataset, tracks are characterized by transverse momentum (p_T), charge, and the pseudorapidity and azimuthal angle coordinates on the inner (η, φ) and outer (η_outer, φ_outer) surfaces of the tracker. The track η and φ coordinates are additionally smeared with a 1% Gaussian resolution to model a finite tracker resolution. Calorimeter clusters are characterized by electromagnetic or hadron energy E and η, φ coordinates. In this simulation, an event has several thousand detector inputs on average.

The targets for PF reconstruction are stable generator-level particles that are associated to at least one detector element, as particles that leave no detector hits are not reconstructable. Generator particles are characterized by a particle identification (PID) which may take one of the following categorical values: charged hadron, neutral hadron, photon, electron, or muon.
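To make the input format concrete, the following sketch (Python/NumPy; the numeric type codes, helper names, and exact smearing convention are illustrative assumptions, not part of the dataset specification) assembles track and cluster feature vectors, zero-filling the entries that do not apply to an element type and applying the 1% Gaussian smearing to the track coordinates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical feature layout for this sketch (see also Eq. (2)):
# [type, pT, E_ECAL, E_HCAL, eta, phi, eta_outer, phi_outer, q]
TRACK, CLUSTER = 1.0, 2.0  # numeric type codes (an assumption)

def make_track(pt, eta, phi, eta_outer, phi_outer, q, res=0.01):
    """Track input vector: eta and phi are smeared with a 1% (relative)
    Gaussian resolution to model the finite tracker resolution;
    calorimeter-only entries stay zero."""
    eta_s = eta + res * abs(eta) * rng.standard_normal()
    phi_s = phi + res * abs(phi) * rng.standard_normal()
    return np.array([TRACK, pt, 0.0, 0.0, eta_s, phi_s, eta_outer, phi_outer, q])

def make_cluster(e_ecal, e_hcal, eta, phi):
    """Calorimeter-cluster input vector: track-only entries stay zero."""
    return np.array([CLUSTER, 0.0, e_ecal, e_hcal, eta, phi, 0.0, 0.0, 0.0])
```

Both helpers return fixed-length vectors, so tracks and clusters can be stacked into a single per-event input matrix despite living in different feature spaces.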
In case multiple generator particles deposit their energy completely into a single calorimeter cluster, we treat them as reconstructable only in aggregate. In this case, the generator particles are merged by adding the momenta and assigning the PID of the highest-energy subparticle. In addition, charged hadrons outside the tracker acceptance are indistinguishable from neutral hadrons, therefore we label generated charged hadrons beyond the tracker η coverage as neutral hadrons. Generated neutral hadrons are further required to pass an energy threshold matching the DELPHES rule-based PF reconstruction, ignoring neutral hadrons that do not pass this threshold. A single event from the dataset is visualized in Fig. 1, demonstrating the input multiplicity and particle distribution in the event. We show the differential distributions of the generator-level particles in the simulated dataset in Fig. 2.

As common for collider physics, we use a Cartesian coordinate system with the z axis oriented along the beam axis, the x axis on the horizontal plane, and the y axis oriented upward. The x and y axes define the transverse plane, while the z axis identifies the longitudinal direction. The azimuthal angle φ is computed with respect to the x axis. The polar angle θ is used to compute the pseudorapidity η = −log(tan(θ/2)). The transverse momentum (p_T) is the projection of the particle momentum on the (x, y) plane. We fix units such that c = ℏ = 1.
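The coordinate definitions above can be written out directly; a minimal sketch (the function names are illustrative):

```python
import numpy as np

def pseudorapidity(theta):
    """eta = -log(tan(theta/2)), with theta the polar angle
    measured with respect to the beam (z) axis."""
    return -np.log(np.tan(theta / 2.0))

def transverse_momentum(px, py):
    """p_T: projection of the momentum on the (x, y) plane."""
    return np.hypot(px, py)
```

A particle at theta = pi/2 (perpendicular to the beam) has eta = 0, and eta grows without bound as the direction approaches the beam axis, which is why a tracker acceptance is naturally stated as an |eta| cut.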
Fig. 1 A simulated tt event from the MLPF dataset with 200 PU interactions. The input tracks are shown in gray, with the trajectory curvature being defined by the inner and outer η, φ coordinates. Electromagnetic (hadron) calorimeter clusters are shown in blue (orange), with the size corresponding to cluster energy for visualization purposes. We also show the locations of the generator particles (all types) with red cross markers. The radii, and thus the x, y coordinates, of the tracker, ECAL and HCAL surfaces are arbitrary for visualization purposes.

We also store the PF candidates reconstructed by DELPHES for comparison purposes. The DELPHES rule-based PF algorithm is described in detail in Ref. [11], identifying charged and neutral hadrons based on track and calorimeter cluster overlaps and energy subtraction. Photons, electrons and muons are identified by DELPHES based on the generator particle associated to the corresponding track or calorimeter cluster. Each event is then fully characterized by the set of generator particles Y = {y_i} (target vectors) and the set of detector inputs X = {x_i} (input vectors), with

y_i = [PID, p_T, E, η, φ, q],  (1)
x_i = [type, p_T, E_ECAL, E_HCAL, η, φ, η_outer, φ_outer, q],  (2)
PID ∈ {charged hadron, neutral hadron, γ, e±, μ±},  (3)
type ∈ {track, cluster}.  (4)

For input tracks, only the type, p_T, η, φ, η_outer, φ_outer, and q features are filled. Similarly, for input clusters, only the type, E_ECAL, E_HCAL, η and φ entries are filled. Unfilled features for both tracks and clusters are set to zero. In future iterations of MLPF, it may be beneficial to represent input elements of different types with separate data matrices to improve the computational efficiency of the model.

Functionally, the detector is modelled in simulation by a function S(Y) = X that produces a set of detector signals from the generator-level inputs for an event. Reconstruction imperfectly approximates the inverse of that function, R(X) ≈ S⁻¹(X) = Y. In the following section, we approximate the reconstruction as set-to-set translation and implement a baseline MLPF reconstruction using graph neural networks.
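The event representation of Eqs. (1)-(4) can be sketched as a pair of matrices, with unfilled features set to zero and inputs without an associated generator particle given an all-zero target row, playing the role of the "no particle" class used for the training targets (the helper names and the index-based association map are illustrative):

```python
import numpy as np

N_X, N_Y = 9, 6  # feature counts of x_i (Eq. 2) and y_i (Eq. 1)

def build_event(inputs, matched):
    """Stack the detector inputs into a matrix X and build a target matrix Y
    of the same length. `matched` maps an input index to its associated
    generator particle [PID, pT, E, eta, phi, q]; rows of Y without an
    associated particle stay all-zero ("no particle")."""
    X = np.stack(inputs).astype(float)
    Y = np.zeros((len(inputs), N_Y))
    for i, y in matched.items():
        Y[i] = y
    return X, Y
```

Because Y has the same number of rows as X by construction, predicting a variable number of particles reduces to a fixed-size per-element prediction.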
Fig. 2 The p_T (upper) and η (lower) distributions of the generator particles in the simulated dataset, split by particle type (charged hadrons, neutral hadrons, photons, electrons, muons).

For a given set of detector inputs X, we want to predict a set of particle candidates Y′ that closely approximates the target generator particle set Y. The target and predicted sets may have a different number of elements, depending on the quality of the prediction. For use in ML with gradient descent, this requires a computationally efficient set-to-set metric ||Y − Y′|| ∈ ℝ to be used as the loss function.

We simplify the problem numerically by first zero-padding the target set Y such that |Y| = |X|. This turns the problem of predicting a variable number of particles into a multi-classification prediction by adding an additional "no particle" class to the classes already defined by the target PID, and is based on Ref. [7]. Since the target set now has a predefined size, we may compute a loss function which approximates the reconstruction quality element by element:

||Y − Y′|| ≡ Σ_{i ∈ event} L(y_i, y′_i),  (5)
L(y_i, y′_i) ≡ CLS(c_i, c′_i) + α REG(p_i, p′_i),  (6)

where the target values and predictions y_i = [c_i; p_i] are decomposed such that the multi-classification is encapsulated in the scores and one-hot encoded classes c_i, while the momentum and charge regression values are in p_i. We use CLS to denote the multi-classification loss (e.g., categorical cross-entropy), while REG denotes the regression loss (e.g., mean-squared error) for the momentum components, weighted appropriately by a coefficient α. This per-particle loss function serves as a baseline optimization target for the ML training. Further physics improvements may be reached by extending the loss to take into account event-level quantities, either by using an energy flow distance as proposed in Ref.
[13–15], or by using a generative adversarial network (GAN) setup, optimizing the reconstruction network in tandem with a classifier that is trained to distinguish between the target and reconstructed events, given the detector inputs.

Furthermore, for PF reconstruction, the target generator particles are often geometrically and energetically close to well-identifiable detector inputs. In physics terms, a charged hadron is reconstructed based on a track, while a neutral hadron candidate can always be associated to at least one primary source cluster, with additional corrections taken from other nearby detector inputs. Therefore, we may choose to preprocess the inputs such that for a given arbitrary ordering of the detector inputs X = [. . . , x_i, . . .] (sets of vectors are represented as matrices with some arbitrary ordering for ML training), the target set Y is arranged such that if a target particle can be associated to a single detector input, it is placed at the same location in the sequence. This data preprocessing step speeds up model convergence, but does not introduce any additional assumptions into the model.

3.1 Graph neural network implementation

Given the set of detector inputs for the event X = {x_i}, we adopt a message passing approach for reconstructing the PF candidates Y = {y_i}. First, we need to construct a trainable graph adjacency matrix F(X|w) = A for the given set of input elements, represented by the graph building block in Fig. 3. The input set is heterogeneous, containing elements of different types (tracks, ECAL clusters, HCAL clusters) in different feature spaces. Therefore, defining a static neighborhood graph in the feature space in advance is not straightforward. A generic approach to learnable graph construction using kNN in an embedding space, known as GravNet,
has been proposed in Ref. [6], where the authors demonstrated that a learnable, dynamically generated graph structure significantly improves the physics performance of an ML-based reconstruction algorithm for calorimeter clustering. However, naive kNN graph implementations have O(n²) time complexity: for each of the n = |X| set elements, we must rank the other n − 1 elements by distance and keep the k closest. For reconstruction, given equivalent physics performance, both computational efficiency (a low overall runtime) and scalability (subquadratic time and memory scaling with the input size) are desirable.

Fig. 3 Functional overview of the end-to-end trainable MLPF setup with GNNs. The event is represented as a set of detector elements x_i. The set is transformed into a graph by the graph building step, which is implemented here using an LSH approximation of kNN. The graph nodes are then encoded using a message passing step, implemented using graph convolutional nets. The encoded elements are decoded to the output feature vectors y_i using pointwise feedforward networks.

We build on the GravNet approach [6] by using an approximate kNN graph construction algorithm based on LSH to improve the time complexity of the graph building algorithm. The LSH approach has recently been proposed [16] for approximating, and thus speeding up, ML models that take into account element-to-element relations using an optimizable n × n matrix known as self-attention [17]. The method divides the input into bins using a hash function, such that nearby elements are likely to be assigned to the same bin. The bins contain only a small number of elements, such that constructing a kNN graph within a bin is fast.

In the kNN+LSH approach, the n input elements x_i are projected into a d_K-dimensional embedding space by a trainable, elementwise feed-forward network FFN(x_i|w) = z_i ∈ ℝ^{d_K}. As in Ref. [16], we then assign each element to one of d_B bins indexed by integers b_i using h(z_i) = b_i ∈ [1, . . .
, d_B], where h(x) is a hash function that assigns nearby x to the same bin with a high probability. We define the hash function as h(x) = argmax[xP; −xP], where [u; v] denotes the concatenation of two vectors u and v, and P is a random projection matrix of size [d_K, d_B/2] drawn from the normal distribution at initialization.

We then build d_B kNN graphs from the embedded elements z_i, one in each of the LSH bins, such that the full sparse graph adjacency A_ij on the input set X is defined by the sum of the subgraphs. The embedding function can be optimized with backpropagation and gradient descent using the values of the nonzero elements of A_ij. Overall, this graph building approach has O(n log n) time complexity and does not require the allocation of an n × n matrix at any point. The LSH step generates d_B disjoint subgraphs in the full event graph. This is motivated by physics, as we expect subregions of the detector to be reconstructable approximately independently. The existing PF algorithm in the CMS detector employs a similar approach by producing disjoint PF blocks as an intermediate step of the algorithm [3].

Having built the graph dynamically, we use a variant of message passing [18] to create hidden encoded states G(x_i, A_ij|w) = h_i of the input elements, taking into account the graph structure. As a first baseline, we use a variant of the graph convolutional network (GCN) that combines local and global node-level information [19–21]. This choice is motivated by implementation and evaluation efficiency in establishing a baseline. This message passing step is represented in Fig. 3 by the GCN block.
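The LSH+kNN graph building described above can be sketched as follows (NumPy; illustrative only: the trainable embedding FFN is omitted, a fixed random projection stands in for the trained hash, and brute-force kNN is run only inside each bin, following h(z) = argmax([zP; −zP]) with P of size [d_K, d_B/2]):

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_bins(Z, d_B):
    """Hash each embedded element z_i into one of d_B bins with
    h(z) = argmax([zP; -zP]), P a random [d_K, d_B/2] projection,
    so that nearby z land in the same bin with high probability."""
    P = rng.standard_normal((Z.shape[1], d_B // 2))
    ZP = Z @ P
    return np.argmax(np.concatenate([ZP, -ZP], axis=1), axis=1)

def knn_graph_in_bins(Z, bins, k):
    """Brute-force kNN separately inside each LSH bin; the union of the
    per-bin edges forms the sparse adjacency, without ever allocating
    an n x n distance matrix over the full event."""
    rows, cols = [], []
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        d = np.linalg.norm(Z[idx, None, :] - Z[None, idx, :], axis=-1)
        order = np.argsort(d, axis=1)[:, 1:k + 1]  # column 0 is the element itself
        for r, neigh in zip(idx, idx[order]):
            rows += [r] * len(neigh)
            cols += list(neigh)
    return np.asarray(rows), np.asarray(cols)
```

With roughly equal-sized bins, each per-bin distance matrix is small and the overall cost is dominated by the hashing and sorting, which is how the subquadratic scaling arises.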
Finally, we decode the encoded nodes H = {h_i} to the target outputs with an elementwise feed-forward network that combines the hidden state with the original input element, D(x_i, h_i|w) = y′_i, using a skip connection.

We have a joint graph building step, but separate graph convolution and decoding layers for the multi-classification and the momentum and charge regression subtasks. This allows each subtask to be retrained separately, in addition to a combined end-to-end training, should the need arise. The classification and regression losses are combined with constant empirical weights such that they make approximately equal contributions to the full training loss. It may be beneficial to use dedicated multi-task training strategies such as gradient surgery [22] to further improve the performance across all subtasks.

The multi-classification prediction outputs for each node are converted to particle probabilities with the softmax operation. We choose the PID with the highest probability for the reconstructed particle candidate, while requiring that the probability exceeds a threshold that matches a fake rate working point defined by the baseline DELPHES PF reconstruction algorithm.

The predicted graph structure is an intermediate step in the model and is not used in the loss function explicitly; we only optimize the model with respect to reconstruction quality. However, using the graph structure in the loss function when a known ground truth is available may further improve the optimization process. In addition, access to the predicted graph structure may be helpful in evaluating the interpretability of the model.

The set of networks for graph building, message passing and decoding has been implemented with TENSORFLOW, processing events of n = 6,400 elements, with the LSH bin size chosen to be 128 such that the number of bins is d_B = 50 and the number of nearest neighbors is k = 16. We use two hidden layers for each encoding and decoding net with 256 units each, with two successive graph convolutions between the encoding and decoding steps. Exponential linear unit (ELU) [23] activations were used between hidden layers, and linear activations were used for the outputs. Overall, the model has approximately 1.5 million trainable weights and 25,000 constant weights for the random projections. For optimization, we used the Adam [24] algorithm for 300 epochs, training over 4 × 10^4 events, with 10^4 events used for testing. The events are processed in minibatches of five simultaneous events per graphics processing unit (GPU); we train for approximately 24 hours using five RTX 2070S GPUs with data parallelism. We report the results of the multi-task learning problem in the next section. The code and dataset to reproduce the training are made available on the Zenodo platform [25, 26].

In the model assessment, we focus on the charged and neutral hadron performance in simulation events that were not used for training. In typical PF reconstruction, charged hadrons are reconstructed based on tracking information, while neutral hadrons are reconstructed from HCAL clusters not matched to tracks. In Fig. 4, we see that both the baseline rule-based PF in
DELPHES and the MLPF model generally predict the charged and neutral particle multiplicity with a high degree of correlation, suggesting that the multi-classification model is appropriate for reconstructing variable-multiplicity events. We note that the particle multiplicities from the MLPF model generally correlate better with the generator-level target than those of the rule-based PF.

In Fig. 5, we compare the per-particle multi-classification confusion matrix for both reconstruction methods. We see an overall similar classification performance, with the neutral hadron identification efficiency being around 0.90 for both, while the MLPF algorithm is slightly more efficient (0.95 for MLPF versus 0.90 for the rule-based PF). Improved Monte Carlo generation, subsampling, or weighting may further improve reconstruction performance for particles or kinematic configurations that occur rarely in a physical simulation. In this set of results, we apply no weighting on the events or particles in the event.

In Fig. 6, we see that the η-dependent charged hadron efficiency (true positive rate) for the MLPF model is somewhat higher than for the rule-based PF baseline, while the fake rate (false positive rate) is equivalently zero, as the DELPHES simulation includes no fake tracks. From Fig. 7,
we observe a similar result for the energy-dependent efficiency and fake rate of neutral hadrons. Both algorithms exhibit a turn-on at low energies and a constant behaviour at high energies, with MLPF being comparable to or slightly better than the rule-based PF baseline. Furthermore, we see in Figs. 8 and 9 that the energy (p_T) and angular resolutions of the MLPF algorithm are generally comparable to the baseline for neutral (charged) hadrons.

Fig. 4 True and predicted particle multiplicity for MLPF and DELPHES PF for charged hadrons (upper: rule-based PF r = 0.9963, σ = 0.0087; MLPF r = 0.9994, σ = 0.0036) and neutral hadrons (lower: rule-based PF r = 0.9267, σ = 0.0324; MLPF r = 0.9694, σ = 0.0134). Both models show a high degree of correlation (r) between the generated and predicted particle multiplicity, with the MLPF model reconstructing the charged and neutral particle multiplicity with better resolution (σ).
Fig. 5 Particle identification confusion matrices with generator-level particles as the ground truth, for the baseline DELPHES PF (upper) and MLPF (lower). The rows have been normalized to unit probability, corresponding to normalizing the dataset according to the generated PID.
Overall, these results demonstrate that formulating PF reconstruction as a multi-task ML problem of simultaneously identifying charged and neutral hadrons in a high-PU environment and predicting their momenta may offer comparable or improved physics performance over hand-written algorithms, given sufficient simulation samples and careful optimization. The performance characteristics of the baseline and the proposed MLPF model are summarized in Table 1.
Fig. 6 The efficiency of reconstructing charged hadron candidates as a function of the generator particle pseudorapidity η. The MLPF model has uniformly higher efficiency. The fake rate is zero for both models, since the simulation does not contain fake tracks.

We also characterize the computational performance of the GNN-based MLPF algorithm. In Fig. 10, we see that the average inference time scales roughly linearly with the input size, which is necessary for scalable reconstruction at high PU. We also note that the GNN-based MLPF algorithm runs natively on a GPU, with the current runtime at around 50 ms/event on a consumer-grade GPU for a full 200 PU event. The algorithm may be relatively simple to port efficiently to any computing architecture that supports common ML frameworks like TENSORFLOW without significant investment. This includes GPUs and potentially even field-programmable gate arrays (FPGAs) or ML-specific processors such as the Graphcore intelligence processing units (IPUs) [27] through specialized ML compilers [28–30]. These coprocessing accelerators can be integrated into existing CPU-based experimental software frameworks as a scalable service that grows to meet the transient demand [31–33].
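As a rough illustration of how such scaling behaviour can be checked, the following toy benchmark (a stand-in for the actual MLPF model, with hypothetical sizes; the function names are ours) times a bin-then-kNN graph-building step for increasing event sizes:

```python
import time
import numpy as np

rng = np.random.default_rng(1)

def toy_build(X, d_B=50, k=16):
    """Stand-in for LSH+kNN graph building: random-projection binning,
    then brute-force kNN only within each bin."""
    bins = np.argmax(X @ rng.standard_normal((X.shape[1], d_B)), axis=1)
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        d = np.linalg.norm(X[idx, None, :] - X[None, idx, :], axis=-1)
        np.argsort(d, axis=1)[:, 1:k + 1]  # k nearest neighbors in the bin
    return bins

def avg_runtime(n, d=9, repeats=3):
    """Average wall-clock time of toy_build for an event with n elements."""
    X = rng.standard_normal((n, d))
    start = time.perf_counter()
    for _ in range(repeats):
        toy_build(X)
    return (time.perf_counter() - start) / repeats

for n in (1600, 3200, 6400):  # roughly 40-, 80-, and 200-PU-sized events
    print(n, f"{avg_runtime(n):.4f} s")
```

Doubling n should roughly double the measured time when the bin occupancy stays balanced, in contrast to the fourfold growth expected from a dense n × n construction.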
Metric               Charged hadrons           Neutral hadrons
                     Rule-based PF   MLPF      Rule-based PF   MLPF
Efficiency           0.903           –         –               –
Fake rate            0               0         0.191           –
p_T (E) resolution   0.211           0.14      0.35            0.32
η resolution         0.25            0.25      0.05            0.06
N resolution         0.009           0.0036    0.0324          0.0134

Table 1 Particle reconstruction efficiency and fake rate, and multiplicity (N), p_T (E) and η resolutions for charged (neutral) hadrons, comparing the rule-based PF baseline and the proposed MLPF method.
Fig. 7 The efficiency (upper) and fake rate (lower) of reconstructing neutral hadron candidates as a function of the generator particle energy. The MLPF model shows comparable performance to the DELPHES PF benchmark, with a somewhat lower fake rate at a similar efficiency.
Fig. 8 The p_T (upper) and η (lower) resolution of the DELPHES PF benchmark and the MLPF model for charged hadrons (p_T: rule-based PF μ = −0.01, σ = 0.21; MLPF μ = 0.03, σ = 0.14; η: rule-based PF μ = −0.00, σ = 0.25; MLPF μ = 0.00, σ = 0.25). The p_T resolution is comparable for both algorithms, with the angular resolution being driven by the smearing of the track (η, φ) coordinates.

Fig. 9 The energy (upper) and η (lower) resolution of the DELPHES PF benchmark and the MLPF model for neutral hadrons (E: rule-based PF μ = 0.03, σ = 0.35; MLPF μ = 0.05, σ = 0.32; η: rule-based PF μ = −0.02, σ = 0.05; MLPF μ = −0.01, σ = 0.06). Both reconstruction algorithms show comparable performance.

We have proposed an algorithm for machine-learned particle-flow (MLPF) reconstruction in a high-pileup environment for a general-purpose multilayered particle detector, based on transforming input sets of detector elements to the output set of reconstructed particles. The MLPF implementation with graph neural networks (GNNs) is based on graph building with a locality sensitive hashing (LSH) approximation for k-nearest neighbors (kNN), dubbed LSH+kNN, and on message passing using graph convolutions. Based on a benchmark particle-level dataset generated using PYTHIA and DELPHES 3, the MLPF GNN reconstruction offers comparable physics performance for charged and neutral hadrons to the baseline rule-based particle-flow (PF) algorithm in DELPHES, demonstrating that a purely parametric machine learning (ML)-based PF reconstruction can reach the physics performance of existing reconstruction algorithms, while allowing for greater portability across various computing architectures at a possibly reduced cost. The inference time empirically scales approximately linearly with the input size, which is useful for efficient evaluation in the high-luminosity phase of the CERN Large Hadron Collider (LHC). In addition, the ML-based reconstruction model may offer useful features for downstream physics analysis, like per-particle probabilities for different reconstruction interpretations, uncertainty estimates, and optimizable particle-level reconstruction for rare processes including displaced signatures.
Fig. 10 Average runtime of the MLPF GNN model with a varying input event size (upper) and the inference time reduction with increasing batch size (lower), for events with 40, 80 and 200 PU interactions. For a simulated event equivalent to 200 PU collisions, we see a runtime of around 50 ms, which scales approximately linearly with respect to the input event size. We see a weak dependence on batch size, with batching having a minor positive effect for low-pileup events. The runtime for each event size is averaged over 100 randomly generated events over three independent runs. The timing tests were done using an Nvidia RTX 2060S GPU and a 3.7 GHz Intel CPU. We assume a linear scaling between PU and the number of detector elements.
The MLPF model can be further improved with a more physics-motivated optimization criterion, i.e., a loss function that takes into account event-level differences in addition to particle-level ones. While we have shown that a per-particle loss function already converges to an adequate physics performance overall, improved event-based losses such as the object condensation approach or energy flow may be useful. In addition, an event-based loss may be defined using an adversarial classifier that is trained to distinguish the target particles from the reconstructed particles.

Reconstruction algorithms need to adapt to changing experimental conditions. This may be addressed in MLPF by periodic retraining on simulation that includes up-to-date running-condition data such as the beam-spot location, dead channels, and the latest calibrations. In a realistic MLPF training, care must be taken that the reconstruction quality of rare particles and of particles in the low-statistics tails of distributions is not adversely affected, and that the reconstruction performance remains uniform. This may be addressed with detailed simulations and weighting schemes. In addition, for a reliable physics result, the interpretability of the reconstruction is essential. The reconstructed graph structure can provide information about causal relations between the input detector elements and the reconstructed particle candidates.

In order to develop a usable ML-based PF reconstruction algorithm, a realistic high-pileup simulated dataset that includes detailed interactions with the detector material needs to be used for the ML model optimization.
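As a concrete reference point for the per-particle baseline discussed above, a multi-task loss combining particle-type classification with kinematic regression can be sketched as below. The feature layout, class convention (0 meaning "no particle"), and loss weights are illustrative assumptions, not the exact configuration used in the paper:

```python
import numpy as np

def per_particle_loss(true_ids, pred_logits, true_kin, pred_kin,
                      w_cls=1.0, w_reg=1.0):
    """Per-particle multi-task loss: categorical cross-entropy on the
    particle type plus masked mean-squared error on the kinematics.

    true_ids:    (n,) integer class labels, 0 = no particle
    pred_logits: (n, n_classes) raw classification outputs
    true_kin:    (n, n_kin) target kinematics, e.g. (pT, eta, phi, E)
    pred_kin:    (n, n_kin) predicted kinematics
    """
    # Numerically stable softmax cross-entropy over particle classes.
    z = pred_logits - pred_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    cls_loss = -log_probs[np.arange(len(true_ids)), true_ids].mean()

    # Regress kinematics only where a true particle exists.
    mask = (true_ids != 0).astype(float)[:, None]
    reg_loss = (mask * (pred_kin - true_kin) ** 2).mean()

    return w_cls * cls_loss + w_reg * reg_loss

# Illustrative toy inputs: three slots, confident and correct predictions.
true_ids = np.array([0, 1, 2])
pred_logits = np.eye(3)[true_ids] * 5.0
true_kin = np.zeros((3, 4))
loss = per_particle_loss(true_ids, pred_logits, true_kin, true_kin)
```

An event-level term (e.g., object condensation, an energy-flow distance, or an adversarial discriminator score) would be added to this sum with its own weight.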
To evaluate the reconstruction performance, efficiencies, fake rates, and resolutions for all particle types need to be studied in detail as a function of particle kinematics and detector conditions. Furthermore, high-level derived quantities such as pileup-dependent jet and missing transverse momentum resolutions must be assessed for a more complete characterization of the reconstruction performance. With ongoing work in ML-based track and calorimeter-cluster reconstruction upstream of PF, and ML-based reconstruction of high-level objects, including jets and jet classification probabilities, downstream of PF, care must be taken that the various steps are optimized and interfaced coherently.

Finally, the MLPF algorithm is inherently parallelizable and can take advantage of hardware acceleration of GNNs via graphics processing units (GPUs), field-programmable gate arrays (FPGAs), or emerging ML-specific processors. Current experimental software frameworks can easily integrate coprocessing accelerators as a scalable service. By harnessing heterogeneous computing and parallelizable, efficient ML, the burgeoning computing demand for event reconstruction tasks in the high-luminosity LHC era can be met while maintaining or even surpassing the current physics performance.
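The per-type efficiency and fake-rate bookkeeping suggested above can be sketched with simple index-aligned arrays, as below. This is a minimal illustration with hypothetical inputs; a real study would bin these quantities in pT and η and apply the experiment's truth-matching criteria rather than assuming slot alignment:

```python
import numpy as np

def efficiency_and_fake_rate(true_ids, pred_ids, particle_type):
    """Per-type reconstruction efficiency and fake rate from
    index-aligned truth and reconstructed particle arrays,
    where class 0 denotes 'no particle'.

    efficiency = true particles of this type also reconstructed as it
                 / all true particles of this type
    fake rate  = reconstructed particles of this type with no truth match
                 / all reconstructed particles of this type
    """
    is_true = true_ids == particle_type
    is_pred = pred_ids == particle_type
    eff = (is_true & is_pred).sum() / max(is_true.sum(), 1)
    fake = (is_pred & (true_ids == 0)).sum() / max(is_pred.sum(), 1)
    return float(eff), float(fake)

# Hypothetical event: truth has two particles of type 1, one of type 2,
# and one empty slot; the reconstruction misidentifies slots 1 and 3.
true_ids = np.array([1, 1, 2, 0])
pred_ids = np.array([1, 2, 2, 2])
eff2, fake2 = efficiency_and_fake_rate(true_ids, pred_ids, particle_type=2)
# eff2 = 1.0 (the one true type-2 particle is found),
# fake2 = 1/3 (one of three type-2 candidates has no truth particle)
```

The same function evaluated in bins of particle kinematics yields the efficiency and fake-rate curves needed to characterize the reconstruction.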
Acknowledgements
We thank our colleagues in the CMS Collaboration, including Josh Bendavid, Kenichi Hatakeyama, Lindsey Gray, and Jan Kieseler, for helpful feedback on this work.

J. P. was supported by the Prime NSF Tier2 award 1624356 and the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics under Award No. DE-SC0011925 while at Caltech, and is currently supported by the Mobilitas Pluss Grant No. MOBTP187 of the Estonian Research Council. J. D. is supported by the DOE, Office of Science, Office of High Energy Physics Early Career Research program under Award No. DE-SC0021187 and by the DOE, Office of Advanced Scientific Computing Research under Award No. DE-SC0021396 (FAIR4HEP). M. P. is supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant Agreement No. 772369). J-R. V. is partially supported by the same ERC grant and by the DOE, Office of Science, Office of High Energy Physics under Award Nos. DE-SC0011925, DE-SC0019227, and DE-AC02-07CH11359.

We are grateful to Caltech and the Kavli Foundation for their support of undergraduate student research in cross-cutting areas of machine learning and domain sciences. This work was mainly conducted at "iBanks," the AI GPU cluster at Caltech, and on the NICPB GPU resources, supported by the European Regional Development Fund through the CoE program grant TK133. We acknowledge Nvidia, SuperMicro, and the Kavli Foundation for their support of iBanks. Part of this work was also performed using the Pacific Research Platform Nautilus HyperCluster supported by NSF awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, the University of California Office of the President, and the University of California San Diego's California Institute for Telecommunications and Information Technology/Qualcomm Institute. Thanks to CENIC for the 100 Gbps networks.
References
1. CELLO Collaboration, "An Analysis of the Charged and Neutral Energy Flow in e+e− Hadronic Annihilation at 34 GeV, and a Determination of the QCD Effective Coupling Constant", Phys. Lett. B (1982) 427, doi:10.1016/0370-2693(82)90778-X.
2. ALEPH Collaboration, "Performance of the ALEPH detector at LEP", Nucl. Instrum. Meth. A (1995) 481, doi:10.1016/0168-9002(95)00138-7.
3. CMS Collaboration, "Particle-flow reconstruction and global event description with the CMS detector", JINST (2017) P10003, doi:10.1088/1748-0221/12/10/P10003, arXiv:1706.04965.
4. ATLAS Collaboration, "Jet reconstruction and performance using particle flow with the ATLAS Detector", Eur. Phys. J. C (2017) 466, doi:10.1140/epjc/s10052-017-5031-2, arXiv:1703.10485.
5. J. Duarte and J.-R. Vlimant, "Graph Neural Networks for Particle Tracking and Reconstruction", in Artificial Intelligence for Particle Physics. World Scientific Publishing, 2020. arXiv:2012.01249. Submitted to Int. J. Mod. Phys. A.
6. S. R. Qasim, J. Kieseler, Y. Iiyama, and M. Pierini, "Learning representations of irregular particle-detector geometry with distance-weighted graph networks", Eur. Phys. J. C (2019) 608, doi:10.1140/epjc/s10052-019-7113-9, arXiv:1902.07987.
7. J. Kieseler, "Object condensation: one-stage grid-free multi-object reconstruction in physics detectors, graph and image data", Eur. Phys. J. C (2020) 886, doi:10.1140/epjc/s10052-020-08461-2, arXiv:2002.03605.
8. F. A. Di Bello et al., "Towards a Computer Vision Particle Flow", arXiv:2003.08863.
9. T. Sjöstrand, S. Mrenna, and P. Z. Skands, "PYTHIA 6.4 Physics and Manual", JHEP (2006) 026, doi:10.1088/1126-6708/2006/05/026, arXiv:hep-ph/0603175.
10. T. Sjöstrand, S. Mrenna, and P. Z. Skands, "A Brief Introduction to PYTHIA 8.1", Comput. Phys. Commun. (2008) 852, doi:10.1016/j.cpc.2008.01.036, arXiv:0710.3820.
11. DELPHES 3 Collaboration, "DELPHES 3, A modular framework for fast simulation of a generic collider experiment", JHEP (2014) 057, doi:10.1007/JHEP02(2014)057, arXiv:1307.6346.
12. S. Chekanov, "HepSim: a repository with predictions for high-energy physics experiments", Adv. High Energy Phys. (2015) 136093, doi:10.1155/2015/136093, arXiv:1403.1886.
13. P. T. Komiske, E. M. Metodiev, and J. Thaler, "Energy Flow Networks: Deep Sets for Particle Jets", JHEP (2019) 121, doi:10.1007/JHEP01(2019)121, arXiv:1810.05165.
14. P. T. Komiske, E. M. Metodiev, and J. Thaler, "Metric Space of Collider Events", Phys. Rev. Lett. (2019) 041801, doi:10.1103/PhysRevLett.123.041801, arXiv:1902.02346.
15. M. C. Romao et al., "Use of a Generalized Energy Mover's Distance in the Search for Rare Phenomena at Colliders", arXiv:2004.09360.
16. N. Kitaev, Ł. Kaiser, and A. Levskaya, "Reformer: The Efficient Transformer", 2020. arXiv:2001.04451.
17. A. Vaswani et al., "Attention Is All You Need", in Advances in Neural Information Processing Systems, I. Guyon et al., eds., volume 30, p. 5998. Curran Associates, Inc., 2017. arXiv:1706.03762.
18. J. Gilmer et al., "Neural message passing for quantum chemistry", in Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh, eds., volume 70, p. 1263. PMLR, 2017. arXiv:1704.01212.
19. T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks", 2017. arXiv:1609.02907.
20. F. Wu et al., "Simplifying Graph Convolutional Networks", in Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., volume 97, p. 6861. PMLR, 2019. arXiv:1902.07153.
21. X. Xin, A. Karatzoglou, I. Arapakis, and J. M. Jose, "Graph Highway Networks", arXiv:2004.04635.
22. T. Yu et al., "Gradient surgery for multi-task learning", in Advances in Neural Information Processing Systems, H. Larochelle et al., eds., volume 33. 2020. arXiv:2001.06782.
23. D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)", 2016. arXiv:1511.07289.
24. D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", Y. Bengio and Y. LeCun, eds. 2015. arXiv:1412.6980.
25. J. Pata, J. M. Duarte, and A. Tepper, "jpata/particleflow: MLPF DELPHES paper software release". https://github.com/jpata/particleflow, 2021. doi:10.5281/zenodo.4452542.
26. J. Pata et al., "Simulated particle-level dataset of tt with PU 200 using PYTHIA and DELPHES", doi:10.5281/zenodo.4452283.
27. L. R. M. Mohan et al., "Studying the potential of Graphcore IPUs for applications in Particle Physics", arXiv:2008.09210.
28. J. Duarte et al., "Fast inference of deep neural networks in FPGAs for particle physics", JINST (2018), no. 07, P07027, doi:10.1088/1748-0221/13/07/P07027, arXiv:1804.06913.
29. Y. Iiyama et al., "Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle Reconstruction in High Energy Physics", Front. Big Data (2021) 44, doi:10.3389/fdata.2020.598927, arXiv:2008.03601.
30. A. Heintz et al., "Accelerated charged particle tracking with graph neural networks on FPGAs", 2020. arXiv:2012.01563.
31. J. Duarte et al., "FPGA-accelerated machine learning inference as a service for particle physics computing", Comput. Softw. Big Sci. (2019) 13, doi:10.1007/s41781-019-0027-2, arXiv:1904.08986.
32. J. Krupa et al., "GPU coprocessors as a service for deep learning inference in high energy physics", arXiv:2007.10359. Submitted to Mach. Learn.: Sci. Technol.
33. D. S. Rankin et al., "FPGAs-as-a-Service Toolkit (FaaST)", 2020. arXiv:2010.08556. doi:10.1109/H2RC51942.2020.00010.