[PDF] ABCNet: An attention-based method for particle tagging

Abstract

In high energy physics, graph-based implementations have the advantage of treating the input data sets in a similar way as they are collected by collider experiments. To expand on this concept, we propose a graph neural network enhanced by attention mechanisms called ABCNet. To exemplify the advantages and flexibility of treating collider data as a point cloud, two physically motivated problems are investigated: quark-gluon discrimination and pileup reduction. The former is an event-by-event classification while the latter requires each reconstructed particle to receive a classification score. For both tasks ABCNet shows an improved performance compared to other algorithms available.

Full PDF

AABCNet: An attention-based method for particle tagging

V. Mikuni and F. Canelli University of Zurich Winterthurerstrasse 190. CH-8057 Zurich.

Abstract.

In high energy physics, graph-based implementations have the advantage of treating the inputdata sets in a similar way as they are collected by collider experiments. To expand on this concept, wepropose a graph neural network enhanced by attention mechanisms called ABCNet. To exemplify the ad-vantages and ﬂexibility of treating collider data as a point cloud, two physically motivated problems areinvestigated: quark-gluon discrimination and pileup reduction. The former is an event-by-event classiﬁca-tion while the latter requires each reconstructed particle to receive a classiﬁcation score. For both tasksABCNet shows an improved performance compared to other algorithms available.

One of the main goals in modern machine learning is to be able to extract the maximum amount of informationavailable from a data set. Successful implementations take advantage of the data structure for model building. In highenergy physics (HEP), particle collisions in experiments are reconstructed by combining the energy deposits left byparticles after crossing diﬀerent parts of a detector. The information provided by sub-detectors can be further combinedto give a full description of each particle produced. At the Large Hadron Collider (LHC) [1], jets are ubiquitous objectsproduced in proton-proton collisions. Jets are the byproducts of the hadronisation of quarks and gluons, resulting inan often collimated spray of particles. After each collision, O (1000) or more particles can be produced, making thetask of identifying the original hard scattering objects challenging. The luminosity increase at the LHC will alsoincrease the amount of multiple interactions per bunch crossing (pileup). For instance, event collisions recorded thusfar by the ATLAS [2] and CMS [3] detectors at LHC measured an average of about 30 extraneous interactions. Withthe future upgrade, up to 200 pileup events per bunch crossing are expected, requiring new methods for particleidentiﬁcation and pileup suppression. In this paper a new method for event classiﬁcation in HEP is introduced. Theattention-based cloud net (ABCNet) takes into account the data structure recorded by particle collision experiments,treating each interaction as an unordered set of points that deﬁnes a point cloud. This description is advantageoussince the byproducts of each particle collision are treated in a similar fashion as they are collected by particle detectors.To enhance the extraction of local information, an attention mechanism is used, following closely the implementationdeveloped in [4]. Attention mechanisms have proved to boost performance for diﬀerent applications in machine learningby giving local and global context to the learning procedure. To show the performance and ﬂexibility of the model,two critical problems are investigated: quark-gluon discrimination and pileup mitigation. The main novelties introduced by ABCNet are the treatment of particle collision data as a set of permutation invariantobjects, enhanced by attention mechanisms to ﬁlter out the particles that are not relevant for the tasks we want toaccomplish. The usage of graph-based machine learning implementations is still a new concept in particle physics.Nevertheless, new implementations have already been proposed with promising results. ParticleNet [5] uses a similarapproach, using point clouds for jet identiﬁcation. The main diﬀerence between ABCNet and ParticleNet is thatABCNet takes advantage of attention mechanisms to enhance the local feature extraction, allowing for a more compactand eﬃcient architecture. A theory-inspired approach was also developed in the framework of Deep Sets [6] using aninfrared and collinear safe basis, developed in the context of Energy Flow Networks [7]. A message-passing approachfor jet-tagging discussed in [8]. Interaction networks were also studied in the context of high-mass particle decays withJEDI-net [9]. Other graph-based implementations have also been presented in the context of signal and background : a r X i v : . [ phy s i c s . d a t a - a n ] J un V. Mikuni, F. Canelli: ABCNet: An attention-based method for particle tagging classiﬁcation [10,11], particle track reconstruction [12], and particle reconstruction on irregular calorimeters. [13]. Inthe context of pileup rejection, the GGNN implementation [14] shows promising results by combining graph nodeswith GRU cells.

ABCNet follows closely the implementation described for GAPNet [4], with key diﬀerences to adapt the implemen-tation to our problems of interest. For clarity, the description of the essential aspects of the implementation aredescribed. The key aspect of GAPNet is the development of a graph attention pooling layer (GAPLayer) using theedge convolution operation proposed in [15], which deﬁnes a convolution-like operation on point clouds together withattention mechanisms to operate on graph-structured data described in [16]. The point cloud is ﬁrst represented as agraph with vertices represented by the points themselves. The edges are constructed by connecting the points to theirk-nearest neighbours, while the edge features, y ij = ( x i − x ij ) , are taken as the diﬀerence between features of eachpoint x i and its k-neighbours x ij . A GAPLayer is constructed by ﬁrst encoding each point and edge to a higher-levelfeature space of dimension F using a single-layer neural network (NN), with learnable parameters θ , in the followingform: x (cid:48) i = h ( x i , θ i , F ) y (cid:48) ij = h ( y ij , θ ij , F ) Where h() denotes the single-layer neural network operation. Self- and local-coeﬃcients are created by passing thetransformed points and edges to a single-layer NN with output dimension of size one. Finally, the attention coeﬃcients c ij are created by combining the newly created coeﬃcients in the following way: c ij = LeakyRelu( h ( x (cid:48) i , θ (cid:48) i ,

1) + h ( y (cid:48) ij , θ (cid:48) ij , (1)where the non-linear LeakyRelu operation is applied to the output of the sum. To align the attention coeﬃcientsbetween diﬀerent points, a Softmax normalisation is applied to the coeﬃcients c ij . At this moment, each point isassociated to k attention coeﬃcients. To compute a single attention feature for each point, a linear combination witha non-linear activation function is deﬁned as ˆ x i = Relu (cid:88) j c ij y (cid:48) ij  . (2)To enhance the stability of the determination of the coeﬃcients ˆ x i , a multi-head mechanism can be used. A M-headprocess repeats the same procedure described above, determining ˆ x i M times, diﬀering only on the random weightinitialisation. The M results are combined by taking the maximum of the M diﬀerent ˆ x i . The outputs of each GAPLayerconsist of attention features ( ˆ x i ) and graph features ( y (cid:48) ij ). The graph features are further aggregated in the form: y maxij = max ( y (cid:48) ij ) . Due to stackability properties, a GAPlayer output can be further used as an input to a subsequent GAPLayer ormultilayer perceptron (MLP).

Quark-gluon tagging refers to the task of identifying the origin of a jet as produced from the hadronisation of a gluonor a quark. The data set used for the studies are available from [7]. It consists of stable particles clustered into jets,excluding neutrinos, using the anti- k T algorithm [17] with R=0.4. The quark-initiated sample (signal) is generatedusing a Z( νν ) + ( u, d, s ) while the gluon-initiated data (background) are generated using Z( νν ) + g processes. Bothsamples are generated using Pythia8 [18] without detector eﬀects. Jets are required to have transverse momentum p T ∈ [500 , GeV and rapidity | y | < . for the reconstruction. For the training, testing and evaluation of themethod, the recommended splitting is used with 1.6M/200k/200k events respectively. For every reconstructed jet,up to 200 constituents are saved. Each constituent contains the four momentum and the expected particles type(electron, muon, photon, or charged/neutral hadrons). A typical jet has O (10) to O (100) particles. To simplify theimplementation, ABCNet uses the ﬁrst 100 constituents sorted by p T from highest to lowest. If the jet has less than100 constituents, the event is padded with zeros, if there are more than 100 constituents, the event is truncated.To enhance the non-local information extraction, global features can also be added to ABCNet. The approachis similar to the one described in [19], where global information is used to parameterise the network, improving thegeneralisation and performance as a function of the global parameters.The features used to describe each constituent are listed in Table 1. . Mikuni, F. Canelli: ABCNet: An attention-based method for particle tagging 3 Table 1.

Description of each feature used to deﬁne a point in the point cloud implementation for quark-gluon classiﬁcation.The latter two features are the global information added to the networkVariable Description ∆η Diﬀerence between the pseudo-rapidity of the constituent and the jet ∆φ Diﬀerence between the azimuthal angle of the constituent and the jet log p T logarithm of the constituent’s p T log E logarithm of the constituent’s E log p T p T (jet) logarithm of the ratio between the constituent’s p T and the jet p T log EE(jet) logarithm of the ratio between the constituent’s E and the jet E ∆ R Distance in the η − φ space between the constituent and the jetPID particle type identiﬁer as described in [20].m(jet) Jet mass p T (jet) Jet transverse momentum Aggregation

GAP layer {32} (k = 10, H = 1)Input cloud (Nx8)Attention features

GAP layer {64} (k = 10, H = 2)

Attention features Graph featuresGraph features

Global features (Nx2)

Fully connected {16}Fully connected {128}Fully connected {128}Fully connected {128}Fully connected {128} F u ll y c o nn ec t e d { } F u ll y c o nn ec t e d { } F u ll y c o nn ec t e d { } D r o p o u t { . } D r o p o u t { . } S o f t m a x { } A v e r a g e p oo li n g F u ll y c o nn ec t e d { } Fig. 1.

ABCNet architecture used for quark-gluon tagging. Fully connected layer and encoding node sizes are denoted inside“{}”. For each GAPLayer, the number of k-nearest neighbours (k) and heads (H) are given.

The network layout used is shown in Fig. 1. The ﬁrst step is to calculate the distances between the constituents inthe pseudorapidity-azimuth ( η − φ ) space of the form ∆R = (cid:112) ∆η + ∆φ . From the distances, we create the ﬁrstGAPLayer by associating each particle to its nearest 10 neighbours. While diﬀerent choices for k were tested, the overallperformance did not improve with the addition of more neighbours. The encoding channel size of the GAPLayer F isselected to be 32 with a 1-head. The attention features created by the GAPLayer are then passed through two MLPswith node sizes (128,128). The distances used for the second GAPLayer are calculated using the full-feature spaceproduced in the output of the last MLP, allowing the network to learn distances in the transformed feature space.To achieve a robust estimation, the encoding channel size is selected to be 64 with the number of heads determinedto be two. The newly created attention features are passed through two MLPs of node sizes each of 128. In parallel,ABCNet also takes additional global inputs in the form of the jet mass and transverse momenta. The global inputs areﬁrst transformed by means of a single-layer MLP with small node size of 16. The two graph features and the outputof each MLP are concatenated with the transformed global features and fed to a MLP of node size 128. An averagepulling is applied and the result is further passed to 2 additional MLPs of node sizes (128,256) interleaved by twodropout layers. A Softmax operation is applied to the output result. V. Mikuni, F. Canelli: ABCNet: An attention-based method for particle tagging

The performance of ABCNet is compared to the methods implemented in [5] and [7], using the same data set.The ﬁgures of merit used for the comparison are: – Accuracy: Ratio between the number of correct predictions over the total number of test examples. – AUC: Integral of the area under the receiver operating characteristic distribution. – (cid:15) B : One over the background eﬃciency for a ﬁxed value of the signal eﬃciency (50% or 30%) – Parameters: Number of trainable weights for the model.The results of the comparisons are listed in Table 2. Even though the accuracy obtained by ABCNet is numericallythe same as the one reported by ParticleNet, ABCNet excels on the other ﬁgures of merit, improving the backgroundrejection, at 30% signal eﬃciency, by 15 - 20%. The use of attention coeﬃcients allow the model complexity of ABCNetto be reduced, having 40% less parameters compared to ParticleNet.

Table 2.

Comparison between the performance achieved with ABCNet and diﬀerent available implementations. The uncertaintyquoted corresponds to the standard deviation of nine trainings with diﬀerent random weight initialisation. If the uncertainty isnot quoted then the variation is negligible compared to the expected value.Acc AUC 1/ (cid:15) B ( (cid:15) S = 0 . ) 1/ (cid:15) B ( (cid:15) S = 0 . ) ParametersResNeXt-50 0.821 0.9060 30.9 80.8 1.46MP-CNN 0.827 0.9002 34.7 91.0 348kPFN - 0.9005 34.7 ± ParticleNet ± ± ± ± A simple way to check what ABCNet is learning is to look at the self-coeﬃcients of each point of the point cloud.First, we pre-processes the images in a similar fashion as [21], using the following steps: – Centre: All jet images are translated in the η − φ space to a common centre at (0,0). The centre of the jet is takenas its p T -weighted centroid. – Particle scale: Each particle constituent has its transverse momentum scaled such that (cid:80) jet p T , i = 1 , where i isthe i-th constituent of the jet. – Overall scale: The ﬁnal image is created by superimposing the individual event images and dividing the resultingdistribution by the number of events in the test sample.Other steps were adopted in [21], however since the goal is to have a simple visual cue, they were not used. The resultingjet images are shown in Fig. 2 for quark- and gluon-initiated jets on the upper and lower rows, respectively. Theleftmost images correspond to the jets after the pre-processing. The subsequent columns show the same distribution,but only considering particles whose self-attention coeﬃcients, resulting from the ﬁrst (middle column) and second(right column) GAPLayers, are higher than a certain value. This value is chosen such that only the ﬁrst 5 % of allparticles with the largest self-attention coeﬃcients are selected. The self-coeﬃcients from the ﬁrst GAPLayer have theeﬀect of giving higher attention to high- p T particles while soft-QCD with large angular variation has less importance.The second GAPLayer, where nearest-neighbours are calculated in the feature space, have diﬀerent distributions forquark-initiated and gluon-initiated jets. Quark initiated jets have the highest coeﬃcients in a conﬁned radius with ∼ ∆R = 0 . around the centre, while gluon initiated coeﬃcients spam a bigger area around the centre with ∼ ∆R = 0 . .That behaviour is expected since gluon-jets have a larger colour factor compared to quark jets, typically resulting ina broader angular distribution compared to quark jets. Another crucial problem in particle physics is how to identify the particles originated from high- p T collisions, andseparate them from unwanted additional interactions. Two traditional methods to accomplish this task are the Softkiller[22] and the Pileup Per Particle Identiﬁcation (PUPPI) [23] algorithms. These two algorithms are chosen since they . Mikuni, F. Canelli: ABCNet: An attention-based method for particle tagging 5 - - fD - - hD - - - - - - - - -

10 0.4 - - fD - - hD - - - - - - - -

10 0.4 - - fD - - hD - - - - - - - - fD - - hD - - - - - - - - -

10 0.4 - - fD - - hD - - - - - - -

10 0.4 - - fD - - hD - - - - - - Fig. 2.

Distribution of the p T -scaled distribution of the jet constituents averaged over all images in the test sample. The leftmostimages are the quark (top) and gluon (bottom) jet averages after the pre-processing. The ﬁrst 5 % of the jet constituents withthe highest self-attention coeﬃcients for the ﬁrst and second GAPLayers are shown on the images in the centre and right,respectively. represent the most common algorithms for pileup mitigation at the LHC. To test the performance of ABCNet in thiscontext, we change the scope of a single jet classiﬁer to a particle-by-particle classiﬁcation (part segmentation). In thiscase, a probability is estimated for object, determining how likely it is for each particle to originate from the leadingvertex (LV). The sample used for this study is available from [24], containing a set of q ¯ q light-quark-initiated jetscoming from the decay of a scalar particle with mass m φ =

500 GeV. The samples were generated using

Pythia8 at √ s = 13 TeV. The pileup events were generated by overlaying soft QCD processes onto each event. Stable particlesare clustered into jets, excluding neutrinos, using the anti- k T algorithm with R=0.4. At parton level, a p T requirementof at least 95 GeV was applied. Only jets satisfying p T >100 GeV and η ∈ [-2.5,2.5] are considered. For each event, upto two leading jets as ordered in p T are stored. Two thousand events are generated, each with a diﬀerent number ofpileup interactions (NPU) ranging from 0 to 180. For the training and testing samples, events are randomly selectedfrom the generated samples according to a Poisson distribution with average pileup = 140, motivated by theexpected pileup levels for future collisions at the LHC. The training and evaluation are done with 80% and 10% of theevents with = 140, respectively. For testing, two samples are created: one corresponding to the remaining10% of the events and = 140 and the other a sample with independent events generated at diﬀerent NPUlevels. For each event, up to 500 particles are stored as long as they are matched to one of the two leading jets. Thefeatures used to deﬁne each particle are described in Table 3. The feature choice is similar to the ones used for theclassiﬁcation task. The main diﬀerence is that for this sample, the PID information is not available, but replaced by aﬂag that identiﬁes if a particle is charged or not. Since more than one jet can be reconstructed, a global zero is usedfor all events, instead of using the jet axis as a reference point. While no selection is applied to the particles used inABCNet, the PUPPI weights and the SoftKiller decision ﬂag are also used as input features. The global informationadded to the parameterisation is NPU and the number of reconstructed particles associated to jets. The network architecture for the part segmentation problem is similar to the setup used previously. The main diﬀerencesare: – Number of considered neighbours increased from 10 to 50. – Additional MLPs after the attention features and after the pooling layer. – Usage of only 1-head GAPLayers.The increase in expansion of neighbours and MLPs are chosen to increase the model’s capacity to cover the largeramount of points per event. The architecture is shown in Fig. 3.

V. Mikuni, F. Canelli: ABCNet: An attention-based method for particle taggingVariable Description η Particle’s pseudo-rapidity. φ Particle’s azimuthal angle. log p T logarithm of the particle p T . Q boolean ﬂag identifying if the particle is charged. log p T p T (jet) logarithm of the ratio between the particle p T and the associated jet p T . log EE(jet) logarithm of the ratio between the particle E and the associated jet E. w PUPPI

PUPPI weight for the particle. w SoftKiller boolean ﬂag identifying if the particle passes the SoftKiller p T requirement.NPU number of pileup interactions.NPART number of reconstructed particles associated to jets. Table 3.

Variable description for each feature used to deﬁne a point in the point cloud implementation for the pileup mitigationproblem. The latter two features are the global information added to the network.

Aggregation

GAP layer {32} (k = 50, H = 1)Input cloud (Nx8)Attention features

GAP layer {64} (k = 50, H = 1)

Attention features Graph featuresGraph features

Global features (Nx2)

Fully connected {16}Fully connected {64}Fully connected {128}Fully connected {128}Fully connected {128} F u ll y c o nn ec t e d { } F u ll y c o nn ec t e d { } F u ll y c o nn ec t e d { } D r o p o u t { . } D r o p o u t { . } S o f t m a x { N X } A v e r a g e p oo li n g F u ll y c o nn ec t e d { } Fully connected {64}Fully connected {128} F u ll y c o nn ec t e d { N x } Fig. 3.

ABCNet architecture used for pileup identiﬁcation. Fully connected layer and encoding node sizes are denoted inside“{}”. For each GAPLayer, the number of k-nearest neighbours (k) and heads (H) are given.

The performance of ABCNet is compared to the performance achieved using PUPPI and SoftKiller. The defaultparameters for those methods are the same as the ones used in [24]: R = R min = 0.02, w cut = 0.1, p T cut (NPU) =0.1 + 0.007 × NPU (PUPPI), grid size = 0.4 (SoftKiller). First, the jet mass is reconstructed with the =140evaluation sample, applying the diﬀerent mitigation algorithms. Inspired by PUPPI, the output probabilities fromABCNet are used to reweight the four-momentum of each particle. The reconstructed dijet mass and the dijet massresolution are shown in Fig. 4. The resolution is deﬁned as: mass resolution = m reco − m true m true . In Table 4, the width of the jet mass resolution, extracted by ﬁtting the distributions in Fig. 4 (right) with a Gaussianfunction, is also listed.ABCNet improves jet mass resolution compared to both PUPPI and SoftKiller by 75% and 83%, respectively. Therobustness of each algorithm is also tested by comparing The Pearson linear correlation coeﬃcient (PCC) between thetrue jet mass and corrected jet masses for diﬀerent NPU is generated. Figure 5 shows the result of the comparison usingthe test sample with NPU from 0 to 180. To investigate the power of ABCNet to generalise, a training sample with = 20 is created and trained using the same architecture described previously. For both trainings, ABCNet . Mikuni, F. Canelli: ABCNet: An attention-based method for particle tagging 7

200 400 600 800 1000 1200

Dijet mass [GeV] N o r m a li z ed E v en t s / b i n No mitigationABCNetPUPPITrueSoftKiller - - - dijet mass resolution N o r m a li z ed E v en t s / b i n ABCNetPUPPISoftKiller

Fig. 4.

Distribution of the dijet mass using the diﬀerent pileup mitigation algorithms (left) and the jet mass resolution (right).A narrower resolution peak means better performance. All distributions are normalised to unit.Algorithm Resolution widthSoftKiller 0.022PUPPI 0.021ABCNet

Resolution width for diﬀerent pileup mitigation strategies. The resolution width is extracted by ﬁtting the distributionsshown in Fig. 4 (right) with a Gaussian function. shows a superior performance compared to PUPPI and SoftKiller for the entire NPU range. Furthermore, ABCNet isalso remarkably robust for pileup variations outside the training region due to the addition of the global parametersto the method.

ABCNet is implemented using Tensorﬂow v1.4 [25]. A Nvidia GTX 1080 Ti graphics card is used for the trainingand evaluation steps. For all tasks described in this paper, the Adam optimiser [26] is used. The learning rate startsfrom 0.001 and decreases linearly by a factor 10 every seven epochs, until reaching a minimum of 1e-7. The training isperformed with a mini batch size of 64 to a maximum number of 50 epochs. The epoch with the highest accuracy onthe evaluation is saved in the case of the quark-gluon classiﬁcation task. For the pileup identiﬁcation, the epoch withthe lowest loss is stored.

In this document, a new machine learning implementation for data classiﬁcation in HEP is introduced. The attention-based cloud net (ABCNet) takes advantage of the data structure commonly found in particle colliders to create apoint cloud interpretation. An attention mechanism is implemented to enhance the local information extraction andprovide a simple way to investigate what the method is learning. To capture the global information, direct connectionsfor global input features can be directly added. ABCNet can be used for event-by-event classiﬁcation problems orgeneralised to particle-by-particle classiﬁcation. To exemplify the architecture ﬂexibility, two example problems areinvestigated: quark-gluon classiﬁcation and pileup mitigation. For both problems, ABCNet achieved an improvedperformance compared to other available methods. By using a graph architecture and interpreting each point in apoint cloud as a particle, ABCNet can be readily adapted to other applications in HEP like jet-ﬂavour tagging, boostedjet identiﬁcation, or particle-track reconstruction.

V. Mikuni, F. Canelli: ABCNet: An attention-based method for particle tagging

NPU J e t m a ss c o rr e l a t i on c oe ff i c i en t ABCNet trained on =140PUPPISoftKillerABCNet trained on =20

Fig. 5.

PCC for each pileup mitigation algorithm for diﬀerent NPU. ABCNet is trained on =140 (blue) or =20(orange).

This research was supported in part by the Swiss National Science Foundation (SNF) under contract No. 200020-182037. The authors would like to thank Loukas Gouskos and Ben Kilminster for the valuable suggestions regardingthe development and clarity of this document.

References

1. Lyndon Evans and Philip Bryant. LHC Machine.

JINST , 3:S08001, 2008.2. ATLAS Collaboration. The ATLAS Experiment at the CERN Large Hadron Collider.

JINST , 3:S08003, 2008.3. CMS Collaboration. The CMS Experiment at the CERN LHC.

JINST , 3:S08004, 2008.4. Can Chen, Luca Zanotti Fragonara, and Antonios Tsourdos. GAPNet: Graph Attention based Point Neural Network forExploiting Local Feature of Point Cloud. arXiv e-prints , page arXiv:1905.08705, May 2019.5. Huilin Qu and Loukas Gouskos. ParticleNet: Jet Tagging via Particle Clouds. arXiv e-prints , page arXiv:1902.08570,February 2019.6. Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola.Deep sets. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,

Advances in Neural Information Processing Systems 30 , pages 3391–3401. Curran Associates, Inc., 2017.7. Patrick T. Komiske, Eric M. Metodiev, and Jesse Thaler. Energy ﬂow networks: deep sets for particle jets.

Journal of HighEnergy Physics , 2019(1), Jan 2019.8. A. Lister J. Pearkes S. Egan, W. Fedorko and C. Gay. Neural Message Passing for Jet Physics. In

Proceedings of the DeepLearning for Physical Sciences Workshop at NIPS (2017) , 2017.9. Eric A. Moreno, Olmo Cerri, Javier M. Duarte, Harvey B. Newman, Thong Q. Nguyen, Avikar Periwal, Maurizio Pierini,Aidana Serikova, Maria Spiropulu, and Jean-Roch Vlimant. JEDI-net: a jet identiﬁcation algorithm based on interactionnetworks. 2019.10. Murat Abdughani, Jie Ren, Lei Wu, and Jin Min Yang. Probing stop pair production at the LHC with graph neuralnetworks.

JHEP , 08:055, 2019.11. Nicholas Choma, Federico Monti, Lisa Gerhardt, Tomasz Palczewski, Zahra Ronaghi, Prabhat, Wahid Bhimji, Michael M.Bronstein, Spencer R. Klein, and Joan Bruna. Graph neural networks for icecube signal classiﬁcation.

CoRR ,abs/1809.06166, 2018.12. Steven Farrell et al. Novel deep learning methods for track reconstruction. In , 2018.. Mikuni, F. Canelli: ABCNet: An attention-based method for particle tagging 913. Shah Rukh Qasim, Jan Kieseler, Yutaro Iiyama, and Maurizio Pierini. Learning representations of irregular particle-detectorgeometry with distance-weighted graph networks.

The European Physical Journal C , 79(7):608, Jul 2019.14. J. Arjona Martínez, O. Cerri, M. Spiropulu, J. R. Vlimant, and M. Pierini. Pileup mitigation at the large hadron colliderwith graph neural networks.

The European Physical Journal Plus , 134(7):333, Jul 2019.15. Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph CNNfor learning on point clouds.

CoRR , abs/1801.07829, 2018.16. Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attentionnetworks, 2017.17. Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. The anti- k T jet clustering algorithm. JHEP , 04:063, 2008.18. TorbjÃűrn Sjöstrand, Stefan Ask, Jesper R. Christiansen, Richard Corke, Nishita Desai, Philip Ilten, Stephen Mrenna,Stefan Prestel, Christine O. Rasmussen, and Peter Z. Skands. An Introduction to PYTHIA 8.2.

Comput. Phys. Commun. ,191:159–177, 2015.19. Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, and Daniel Whiteson. Parameterized neural networks forhigh-energy physics.

Eur. Phys. J. , C76(5):235, 2016.20. M. Tanabashi et al. Review of Particle Physics.

Phys. Rev. , D98(3):030001, 2018.21. Patrick T. Komiske, Eric M. Metodiev, and Matthew D. Schwartz. Deep learning in color: towards automated quark/gluonjet discrimination.

JHEP , 01:110, 2017.22. Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. SoftKiller, a particle-level pileup removal method.

Eur. Phys. J. ,C75(2):59, 2015.23. Daniele Bertolini, Philip Harris, Matthew Low, and Nhan Tran. Pileup Per Particle Identiﬁcation.

JHEP , 10:059, 2014.24. Patrick T. Komiske, Eric M. Metodiev, Benjamin Nachman, and Matthew D. Schwartz. Pileup Mitigation with MachineLearning (PUMML).

JHEP , 12:051, 2017.25. M. Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available fromtensorﬂow.org.26. Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv e-printsarXiv e-prints