Point Cloud Transformers applied to Collider Physics
V. Mikuni, F. Canelli
University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
E-mail: [email protected]
Abstract:
Methods for processing point cloud information have seen great success in collider physics applications. One recent breakthrough in machine learning is the use of Transformer networks to learn semantic relationships between sequences in language processing. In this work, we apply a modified Transformer network called Point Cloud Transformer as a method to incorporate the advantages of the Transformer architecture to an unordered set of particles resulting from collision events. To compare the performance with other strategies, we study jet-tagging applications for highly-boosted particles.

The interactions between elementary particles are described by the Standard Model (SM) of particle physics. Particle colliders are used to study these interactions by comparing experimental signatures to SM predictions. At every collision, hundreds of particles can be created, detected, and reconstructed by particle detectors. Extracting relevant physics quantities from this space is a challenging task that is often accomplished through the use of physics-motivated summary statistics that reduce the data dimensionality to manageable quantities. A recent approach is to interpret the set of reconstructed particles as points in a point cloud. Point clouds represent a set of unordered objects, described in a well-defined space, and are often used for applications in self-driving vehicles, robotics, and augmented reality, to name a few. With this approach, information from each bunch crossing in a particle collider is interpreted as a point cloud, where the goal is to use this high-dimensional set of reconstructed particles to extract relevant information. However, extracting information from point clouds can itself be challenging.
One novel approach is to use the Transformer architecture [1] to learn the semantic relationship between objects. Transformers have yielded great success in recent years when applied to natural language processing (NLP), often showing superior performance compared to previous well-established methods. The advantage of this architecture is the capability of learning semantic affinities between objects without losing information over long sequences. Transformers are also easily parallelizable, a huge computational advantage over sequential architectures like gated recurrent [2] and long short-term memory [3] neural networks. The Transformer network has already been applied to problems outside NLP, with examples in image recognition [4, 5].

The Transformer architecture is not readily applicable to point clouds. Since point clouds are intrinsically unordered, the Transformer structure has to be modified to define a self-attention operation that respects the data symmetries, such as permutation invariance. A recent approach introduced in [6] addresses these issues through the development of Point Cloud Transformers (PCT). In this work, we first introduce the key features developed for PCT and use a modified version, applied to a high energy physics task in the form of jet-tagging. Results are compared with other approaches using three different public datasets.
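The permutation-symmetry requirement above can be made concrete with a small NumPy sketch (illustrative only, not the PCT implementation): plain dot-product attention is permutation-equivariant, and an order-insensitive pooling of its output yields a permutation-invariant summary.

```python
import numpy as np

def attention(x, wq, wk, wv):
    """Plain dot-product self-attention over a set of N points."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T                                 # (N, N) pairwise affinities
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    a = scores / scores.sum(axis=1, keepdims=True)   # row-wise softmax
    return a @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))                          # 5 points, 4 features each
wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))

perm = rng.permutation(5)
out, out_p = attention(x, wq, wk, wv), attention(x[perm], wq, wk, wv)

# Equivariance: permuting the inputs permutes the outputs the same way.
assert np.allclose(out[perm], out_p)
# Invariance: averaging over points removes the dependence on the order.
assert np.allclose(out.mean(axis=0), out_p.mean(axis=0))
```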
Neural network architectures that treat collision events as point clouds have recently grown in number, given their state-of-the-art performance when applied to different collider physics problems. A few examples of such applications are jet-tagging [7, 8], secondary vertex finding [9], event reconstruction [10–13], and jet parton assignment [14]. A comprehensive review of the different methods is given in [15].

Two applications in particular are relevant for the following discussion of the PCT implementation: the ParticleNet [16] and ABCNet [17] architectures. The former introduces the EdgeConv operation, initially developed in [3]. This operation uses a k-nearest neighbors approach to create local patches inside a point cloud. The local information is then used to create high-level features for each point that retain the information of the local neighborhood. ABCNet, on the other hand, uses the local information to define an attention mechanism, first introduced in [18] and applied in [19]. A similar concept of attention mechanisms is defined for PCT, where a self-attention layer is used to provide the relationship importance between all particles in the set.

Jet-tagging is a common task used to benchmark different algorithms applied to collider physics. While a number of algorithms have been proposed in recent years, special attention will be given to algorithms with results on public datasets. In [16], results are presented for both quark-gluon and top-quark datasets, while [20] introduces a multiclassification sample containing five different jet categories. Each dataset is described in Sec. 5.
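The k-nearest-neighbors locality used by EdgeConv, picking for each particle its k closest neighbors in the pseudorapidity-azimuth plane, can be sketched as follows (a NumPy illustration with made-up inputs, not the ParticleNet or ABCNet code):

```python
import numpy as np

def knn_indices(eta, phi, k):
    """Indices of the k nearest neighbours of each particle in (eta, phi).

    The azimuthal difference is wrapped into [-pi, pi] so that particles
    on either side of the phi = +/-pi boundary are treated as close.
    """
    deta = eta[:, None] - eta[None, :]
    dphi = phi[:, None] - phi[None, :]
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi
    dr2 = deta**2 + dphi**2               # squared Delta R distance
    np.fill_diagonal(dr2, np.inf)         # a point is not its own neighbour
    return np.argsort(dr2, axis=1)[:, :k]

eta = np.array([0.00, 0.05, 0.40, 0.42, -0.30])
phi = np.array([0.00, 0.02, 3.10, -3.10, 0.50])
# Particles 2 and 3 sit on opposite sides of the phi boundary, yet the
# wrapped distance correctly makes them mutual nearest neighbours.
assert 3 in knn_indices(eta, phi, k=2)[2]
```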
The Transformer implementation applied to point clouds requires two main building blocks: the feature extraction and the self-attention (SA) layers. The feature extractor is used to map the input point cloud F_in ∈ R^{N×d_in} to a higher-dimensional representation F_e ∈ R^{N×d_out}. This step is used to achieve a higher level of abstraction for each point present in the point cloud. In this work, two different strategies are compared: an architecture consisting of stacked one-dimensional convolutional (Conv1D) layers, and a second option based on EdgeConv blocks. The EdgeConv block consists of an EdgeConv operation [3] followed by two two-dimensional convolutional (Conv2D) layers and an average pooling operation. The EdgeConv operation uses a k-nearest neighbors approach to define a vicinity for each point in the point cloud. This enhances the ability of the network to extract information from the local neighborhood around each point. The first strategy is referred to as simple PCT (SPCT), while the second is referred to as just PCT.

The second main building block is the usage of an offset attention defined as a self-attention (SA) layer. The output of the feature extractor F_e is used as the input of the first SA layer. The goal of the SA layer is to determine the relationship between all particles of the point cloud through an offset attention mechanism. This approach differs from the one taken in ABCNet, where self-attention and neighboring attention coefficients are defined for a neighborhood of each particle. In the same terms defined in the original Transformer work [1], three different matrices are built from linear transformations of the original inputs. These matrices are called query (Q), key (K), and value (V). The linear transformations are accomplished through the usage of Conv1D layers such that:

Q, K, V = F_e · (W_q, W_k, W_v), (3.1)
Q, K ∈ R^{N×d_a}, V ∈ R^{N×d_out}. (3.2)

The matrices (W_q, W_k, W_v) contain the trainable linear coefficients introduced by the convolutional operation. The attention weights A are then calculated by first multiplying the query matrix with the transpose of the key matrix:

A = Q · K^T, A ∈ R^{N×N}. (3.3)

A softmax operation is then applied to each row of A to normalize the coefficients for all points. The last step is to define the offset attention. First, the attention weights are multiplied by the value matrix, resulting in the self-attention F_sa with

F_sa = A · V, F_sa ∈ R^{N×d_out}. (3.4)

The difference between F_e and F_sa is passed through a Conv1D layer with the same output dimension d_out. The result of this layer is the offset added to the original inputs F_e. Different levels of abstraction can be achieved by stacking multiple SA layers, using the output of each SA layer as the input for the next.

To complete the general architecture, the SA layers are combined through a simple concatenation followed by an average pooling operation, leaving the entire architecture invariant under permutations of the input points. The output of this operation is passed through fully connected layers before reaching the output layer, normalized through a softmax operation. All convolutional and fully connected layers are followed by the nonlinear ReLU activation function, with the exception of the convolutional operations inside the SA layers and the output layer.

The general PCT network and the main building blocks are shown in Fig. 1. The training details are explained in Sec. 4.
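In index-free NumPy form, one SA layer following Eqs. (3.1)-(3.4) can be sketched as below. Since the Conv1D layers here have kernel size 1, they act as per-point linear maps and plain matrix products stand in for them; the weights are random placeholders, not trained values.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax, normalizing the attention coefficients per point."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sa_layer(f_e, w_q, w_k, w_v, w_off):
    """One offset self-attention layer, NumPy sketch of Eqs. (3.1)-(3.4)."""
    q, k, v = f_e @ w_q, f_e @ w_k, f_e @ w_v   # Eqs. (3.1)-(3.2)
    a = softmax_rows(q @ k.T)                   # Eq. (3.3) + row softmax
    f_sa = a @ v                                # Eq. (3.4)
    offset = (f_e - f_sa) @ w_off               # Conv1D on the difference
    return f_e + offset                         # offset added back to F_e

rng = np.random.default_rng(1)
n, d_out, d_a = 30, 8, 4                        # N particles, feature dims
f_e = rng.normal(size=(n, d_out))
w_q, w_k = rng.normal(size=(d_out, d_a)), rng.normal(size=(d_out, d_a))
w_v, w_off = rng.normal(size=(d_out, d_out)), rng.normal(size=(d_out, d_out))

out = sa_layer(f_e, w_q, w_k, w_v, w_off)
assert out.shape == (n, d_out)   # same shape as the input, so SA layers stack
```

The convolutions inside the SA layer carry no activation, matching the exception noted in the text, and the shape-preserving output is what allows several SA layers to be chained and later concatenated.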
Figure 1. General network architecture (left), feature extractor (middle), and self-attention layer (right). d_in, d_out, and d_c represent the input feature, output feature, and fully connected layer sizes.

The PCT implementation is done using TensorFlow v1.14 [21]. An NVIDIA GTX 1080 Ti graphics card is used for the training and evaluation steps. For all architectures, the Adam optimizer [22] is used, with a learning rate starting from 0.001 and decreasing by a factor of 2 every 20 epochs, to a minimum of 10^-6. The training is performed with a mini-batch size of 64 and a maximum of 200 epochs. If no improvement is observed in the evaluation set for 15 consecutive epochs, the training is stopped. The epoch with the lowest classification loss on the test set is stored for further evaluation.
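The schedule above can be written down framework-agnostically; the helpers below reproduce the step decay and the early-stopping rule described in the text, with the surrounding training loop left to the framework of choice (a sketch, not the original training script).

```python
def learning_rate(epoch, base=1e-3, drop_every=20, factor=0.5, floor=1e-6):
    """Step-decay schedule: halve the rate every `drop_every` epochs."""
    return max(base * factor ** (epoch // drop_every), floor)

class EarlyStopper:
    """Stop when the validation loss has not improved for `patience` epochs."""
    def __init__(self, patience=15):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# The rate starts at 1e-3, is halved every 20 epochs, and is floored at 1e-6.
assert learning_rate(0) == 1e-3
assert learning_rate(25) == 5e-4
assert learning_rate(200) == 1e-6

# With patience 2, two consecutive non-improving epochs trigger the stop.
stop = EarlyStopper(patience=2)
history = [stop.should_stop(loss) for loss in (1.0, 2.0, 2.5)]
assert history == [False, False, True]
```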
Different performance metrics are compared for (S)PCT applied to a jet classification task on different public datasets. Jets are collimated sprays of particles resulting from the hadronization and fragmentation of energetic partons. Jets can show distinguishing features depending on the elementary particle that initiated the jet. Traditional methods use this information to define physics-motivated observables [23] that can distinguish different jet categories.

The PCT architecture uses two EdgeConv blocks, each defining the k nearest neighbors of each point with k = 20. The initial distances are calculated in the pseudorapidity-azimuth (η-φ) space, of the form ΔR = √(Δη² + Δφ²). The distances used for the second EdgeConv block are calculated using the full feature space produced in the output of the previous EdgeConv block.

Besides the feature extractor, PCT uses three SA layers while SPCT uses two. The outputs of all SA layers are concatenated for both PCT and SPCT. However, the output of the last EdgeConv block is also added during concatenation with a skip connection. The detailed architectures used during training for PCT and SPCT are shown in Fig. 2.
Figure 2. SPCT (left) and PCT (middle) architectures used for all jet-tagging classification tasks. The EdgeConv block structure used in PCT is shown on the right. Numbers in parentheses represent layer sizes.
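Putting the pieces together, a forward pass through the SPCT branch of Fig. 2 can be sketched in NumPy. The weights are random placeholders and the layer sizes are an illustrative reading of the figure, not the trained model; the attention width d_a = 16 is likewise an assumption for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda z: np.maximum(z, 0.0)

def softmax_rows(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sa_layer(f, d_a=16):
    """Offset self-attention with random placeholder weights."""
    d = f.shape[1]
    w_q, w_k = rng.normal(size=(d, d_a)), rng.normal(size=(d, d_a))
    w_v, w_off = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    a = softmax_rows((f @ w_q) @ (f @ w_k).T)
    return f + (f - a @ (f @ w_v)) @ w_off

n, d_in = 100, 16                        # up to 100 particles, 16 features
x = rng.normal(size=(n, d_in))

# Feature extractor: two per-point Conv1D layers (kernel size 1 -> matmul).
h = relu(x @ rng.normal(size=(16, 128)))
h = relu(h @ rng.normal(size=(128, 64)))

# Two stacked SA layers; their outputs are concatenated.
s1 = sa_layer(h)
s2 = sa_layer(s1)
h = relu(np.concatenate([s1, s2], axis=1) @ rng.normal(size=(128, 128)))

# Average pooling over particles gives a permutation-invariant vector.
pooled = h.mean(axis=0)                  # (128,)

# Fully connected head ending in a softmax over N_categories = 5.
logits = relu(pooled @ rng.normal(size=(128, 64))) @ rng.normal(size=(64, 5))
probs = np.exp(logits - logits.max())
probs /= probs.sum()
assert probs.shape == (5,) and np.isclose(probs.sum(), 1.0)
```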
PCT and SPCT receive as inputs the particles found inside the jets. The input features vary between applications, depending on the available content of each public dataset. For all comparisons, up to 100 particles per jet are used. If more particles are found inside a jet, the list is truncated; otherwise it is zero-padded up to 100.
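The fixed-length input described above, truncate to 100 particles or zero-pad, might look like this (NumPy sketch; the 100-particle cap is from the text, and the 16 features per particle match the first dataset below):

```python
import numpy as np

def pad_or_truncate(particles, max_part=100):
    """Return a (max_part, n_features) array: truncated or zero-padded."""
    particles = np.asarray(particles, dtype=float)
    n, n_feat = particles.shape
    out = np.zeros((max_part, n_feat))
    out[:min(n, max_part)] = particles[:max_part]
    return out

jet = np.random.default_rng(3).normal(size=(37, 16))  # 37 particles, 16 features
assert pad_or_truncate(jet).shape == (100, 16)
assert np.all(pad_or_truncate(jet)[37:] == 0.0)       # zero-padded tail
```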
For this study, samples containing simulated jets originating from W bosons, Z bosons, top quarks, light quarks, and gluons produced in √s = 13 TeV proton-proton collisions are used. The samples are available at [24]. This dataset is created using a parametric description of a generic LHC detector, described in [25, 26]. The jets are clustered with the anti-k_T algorithm [27] with radius parameter R = 0.8, while also requiring that the jet p_T is around 1 TeV, which ensures that most of the decay products of the generated particles are found inside a single jet. The training and testing sets contain 567k and 63k jets, respectively. The performance comparison is reported using the official evaluation set, containing 240k jets. For each particle, a set of 16 kinematic features is used. These distributions were chosen to match the particle features used in [20] to facilitate the comparison.

The area under the curve (AUC) for each evaluation is calculated by taking each jet category as signal while the remaining categories are treated as background. The results are shown in Tab. 1.

Table 1. Area under the curve for each jet category reported on the HLS4ML LHC jet dataset. Results for all methods are taken as the average of 10 trainings with random network initialization. If the uncertainty is not quoted, the variation is negligible compared to the expected value. Bold results represent the algorithm with the highest performance.
Algorithm               Gluon    Light quark   W boson   Z boson   Top quark
DNN [20]                0.9384   0.9026        0.9537    0.9459    0.9620
GRU [20]                0.9040   0.8962        0.9192    0.9042    0.9350
CNN [20]                0.8945   0.9007        0.9102    0.8994    0.9494
JEDI-net [20]           0.9529   0.9301        0.9739    0.9679    0.9683
JEDI-net with ΣO [20]   0.9528   0.9290        0.9695    0.9649    0.9677
SPCT                    0.9585   0.9370        0.9767    0.9799    0.9730
PCT                     …        …             …         …         …

The top tagging dataset consists of jets containing the hadronic decay products of top quarks (treated as signal) together with jets generated through QCD dijet events (treated as background). The samples are available at [28]. The events are generated with Pythia8 [29], with detector simulation done through Delphes [30]. The jets are clustered with the anti-k_T algorithm with radius parameter R = 0.8. Only jets with transverse momentum p_T ∈ [550, 650] GeV and rapidity |y| < 2 are kept. The official training, testing, and evaluation splitting is used, containing 1.2M/400k/400k events, respectively. For each particle, a set of 7 input features is used. These distributions are the same ones used in [16] to facilitate the comparison between algorithms. The AUC and background rejection power, defined as the inverse of the background efficiency for a fixed signal efficiency, are listed in Tab. 2, with a reduced number of algorithms as reported in [16]. A more complete, although slightly outdated, list is available at [31].

Table 2. Comparison between the performance reported for different classification algorithms on the top tagging dataset. The uncertainty quoted corresponds to the standard deviation of nine trainings with different random weight initialization. If the uncertainty is not quoted, the variation is negligible compared to the expected value. Bold results represent the algorithm with the highest performance.

Algorithm               Acc      AUC      1/εB (εS = 0.5)   1/εB (εS = 0.3)
ResNeXt-50 [16]         0.936    0.9837   302 ± 5           1147 ± 58
JEDI-net [20]           0.9263   0.9786   -                 590.4
JEDI-net with ΣO [20]   0.9300   0.9807   -                 774.6
SPCT                    0.931    0.9813   230 ± 10          851 ± 12
PCT                     …        …        …                 1287 ± …

The dataset used for these studies is available from [32]. It consists of stable particles, excluding neutrinos, clustered into jets using the anti-k_T algorithm with R = 0.4. The quark-initiated sample (treated as signal) is generated using Z(νν) + (u, d, s) processes, while the gluon-initiated data (treated as background) are generated using Z(νν) + g processes. Both samples are generated using Pythia8 [29] without detector effects. Jets are required to have p_T ∈ [500, 550] GeV and rapidity |y| < 1.7 for the reconstruction. For the training, testing, and evaluation, the recommended splitting is used, with 1.6M/200k/200k events, respectively. Each particle contains the four-momentum and the expected particle type (electron, muon, photon, or charged/neutral hadron). For each particle, a set of 13 kinematic features is used. These features are chosen to match the ones used in [16, 17]. The AUC and background rejection power are listed in Tab. 3.

Table 3. Comparison between the performance reported for different classification algorithms on the quark and gluon dataset. The uncertainty quoted corresponds to the standard deviation of nine trainings with different random weight initialization. If the uncertainty is not quoted, the variation is negligible compared to the expected value. Bold results represent the algorithm with the highest performance.
Algorithm         Acc     AUC      1/εB (εS = 0.5)   1/εB (εS = 0.3)
ResNeXt-50 [16]   0.821   0.9060   30.9              80.8
P-CNN [16]        0.827   0.9002   34.7              91.0
PFN [32]          -       0.9005   34.7 ± …          …
SPCT              0.824   0.899    34.4 ± …          …
PCT               …       …        …                 …

Besides the algorithm performance, the computational cost is also an important figure of merit. To compare the amount of computational resources required to evaluate each model, the number of trainable weights and the number of floating point operations (FLOPs) are computed. The comparison of these quantities for different algorithms is shown in Tab. 4.
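Both figures of merit are simple to estimate from the layer shapes alone. The sketch below counts trainable weights from a list of weight-tensor shapes and approximates the FLOPs of a per-point linear layer; the shapes used are hypothetical, not the exact (S)PCT graphs.

```python
import numpy as np

def n_trainable(weight_shapes):
    """Total trainable parameters from a list of weight-tensor shapes."""
    return int(sum(np.prod(s) for s in weight_shapes))

def matmul_flops(n_points, d_in, d_out):
    """Approximate FLOPs of a kernel-size-1 Conv1D: one multiply and one
    add per weight, applied independently to each of the n_points."""
    return 2 * n_points * d_in * d_out

# Hypothetical per-point Conv1D stack: (in, out) kernels plus bias vectors.
shapes = [(16, 128), (128,), (128, 64), (64,)]
assert n_trainable(shapes) == 10432

# The first layer alone, applied to 100 particles:
assert matmul_flops(100, 16, 128) == 409600
```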
Table 4. Number of trainable weights and floating point operations (FLOPs) for each model under consideration.
Algorithm               Weights   FLOPs
ResNeXt-50 [16]         1.46M     -
P-CNN [16]              348k      -
PFN [32]                82k       -
ParticleNet-Lite [16]   26k       -
ParticleNet [16]        366k      -
ABCNet [17]             230k      -
DNN [20]                14.7k     27k
GRU [20]                15.6k     46k
CNN [20]                205.5k    400k
JEDI-net [20]           33.6k     116M
JEDI-net with ΣO [20]   8.8k      458M
SPCT                    55.4k     20M
PCT                     153.9k    381M

While PCT shows a better overall AUC compared to SPCT, the improvement in performance from the usage of EdgeConv blocks comes with a cost in computational complexity. SPCT, on the other hand, provides a good balance between performance and computational cost, requiring almost 20 times fewer FLOPs and 3 times fewer trainable weights compared to PCT.

The SA module defines the relative importance between all points in the set through the attention weights. We can use this information to identify the regions inside a jet that have high importance for a chosen particle. To visualize the particle importance, the HLS4ML LHC jet dataset is used to create a pixelated image of a jet in the transverse plane. The average jet image of 100k examples in the evaluation set is used. For each image, a simple preprocessing strategy is applied to align the different images. First, the whole jet is translated such that the particle with the highest transverse momentum in the jet is centered at (0,0). This particle is also used as the reference particle from which the attention weights are shown. Next, the full jet image is rotated, making the second most energetic particle aligned with the positive y-coordinate. Lastly, the image is flipped in the x-coordinate if the third most energetic particle is located on the negative x-axis; otherwise the image is left as is. These transformations are also used in other jet image studies such as [17, 33]. The pixel intensity for each jet image is taken from the attention weights after the softmax operation is applied, expressing the particle importance with respect to the most energetic particle in the event. A comparison of the extracted images for each SA layer and for each jet category is shown in Fig. 3.
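The alignment steps described above, centre on the hardest particle, rotate the second hardest onto the positive y-axis, and flip so the third hardest has non-negative x, can be sketched as follows (NumPy, acting on (x, y) coordinates in the translated plane; an illustration of the procedure, not the exact analysis code):

```python
import numpy as np

def align(points, pt):
    """Translate/rotate/flip a jet so its three hardest particles line up."""
    order = np.argsort(pt)[::-1]          # hardest particle first
    pts = points - points[order[0]]       # hardest particle at the origin
    # Rotate the second-hardest particle onto the positive y-axis.
    x2, y2 = pts[order[1]]
    theta = np.arctan2(x2, y2)            # angle measured from the +y axis
    c, s = np.cos(theta), np.sin(theta)
    pts = pts @ np.array([[c, -s], [s, c]]).T
    # Flip in x if the third-hardest particle fell on the negative x side.
    if pts[order[2], 0] < 0:
        pts[:, 0] *= -1
    return pts

pt = np.array([100.0, 50.0, 20.0])        # hypothetical transverse momenta
pts = np.array([[0.1, 0.2], [0.3, 0.5], [-0.2, 0.1]])
out = align(pts, pt)
assert np.allclose(out[0], 0.0)           # hardest particle at the origin
assert abs(out[1, 0]) < 1e-9 and out[1, 1] > 0   # second on the +y axis
assert out[2, 0] >= 0                     # third on the non-negative x side
```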
Figure 3. Average jet image for each jet category (columns) and for each self-attention layer (rows). The pixel intensities represent the overall particle importance compared to the most energetic particle in the jet.
The different SA layers are able to extract different information for each jet. In particular, the jet substructure is exploited, resulting in an increased relevance of harder subjets in the case of Z boson, W boson, and top quark initiated jets. On the other hand, light quark and gluon initiated jets have a more homogeneous radiation pattern, resulting in a more homogeneous picture.
In this work, a new method based on the Transformer architecture was applied to a high energy physics application. The Point Cloud Transformer (PCT) modifies the usual Transformer architecture to be applicable to the set of unordered points present in a point cloud. This method has the advantage of extracting semantic affinities between the points through a self-attention mechanism. We evaluate the performance of this architecture on several jet-tagging datasets by testing two different implementations: one that exploits neighborhood information through EdgeConv operations, and a simpler form, called simple PCT (SPCT), that connects all points through convolutional layers. Both approaches have shown state-of-the-art performance compared to other publicly available results. While the classification performance of SPCT is slightly lower than that of the standard PCT, the number of floating point operations required to evaluate the model decreases by almost a factor of 20. This reduced computational complexity can be exploited in environments with limited computing resources or in applications that require fast inference responses.

A different advantage of (S)PCT is the visualization of the self-attention coefficients to understand which points have a greater importance for the classification task. Traditional methods often define physics-motivated observables to distinguish the different types of jets. PCT, on the other hand, exploits subjet information by learning affinities on a particle-by-particle basis, resulting in images with distinct features for jets of different decay modes.
The authors would like to thank Jean-Roch Vlimant for helpful comments during the development of this work. This research was supported in part by the Swiss National Science Foundation (SNF) under contract No. 200020-182037 and by the Forschungskredit of the University of Zurich, grant No. FK-20-097.
References

[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, CoRR abs/1706.03762 (2017) [arXiv:1706.03762].
[2] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, CoRR abs/1412.3555 (2014) [arXiv:1412.3555].
[3] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, Dynamic graph CNN for learning on point clouds, CoRR abs/1801.07829 (2018) [arXiv:1801.07829].
[4] B. Wu, C. Xu, X. Dai, A. Wan, P. Zhang, Z. Yan, M. Tomizuka, J. Gonzalez, K. Keutzer, and P. Vajda, Visual transformers: Token-based image representation and processing for computer vision, 2020.
[5] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, 2020.
[6] M.-H. Guo, J.-X. Cai, Z.-N. Liu, T.-J. Mu, R. R. Martin, and S.-M. Hu, PCT: Point cloud transformer, 2020.
[7] P. T. Komiske, E. M. Metodiev, and J. Thaler, Energy Flow Networks: Deep Sets for Particle Jets, JHEP (2019) 121, [arXiv:1810.05165].
[8] M. J. Dolan and A. Ore, Equivariant Energy Flow Networks for Jet Tagging, arXiv:2012.00964.
[9] J. Shlomi, S. Ganguly, E. Gross, K. Cranmer, Y. Lipman, H. Serviansky, H. Maron, and N. Segol, Secondary Vertex Finding in Jets with Neural Networks, arXiv:2008.02831.
[10] M. J. Fenton, A. Shmakov, T.-W. Ho, S.-C. Hsu, D. Whiteson, and P. Baldi, Permutationless Many-Jet Event Reconstruction with Symmetry Preserving Attention Networks, arXiv:2010.09206.
[11] J. Duarte and J.-R. Vlimant, Graph Neural Networks for Particle Tracking and Reconstruction, arXiv:2012.01249.
[12] J. Pata, J. Duarte, J.-R. Vlimant, M. Pierini, and M. Spiropulu, MLPF: Efficient machine-learned particle-flow reconstruction using graph neural networks, arXiv:2101.08578.
[13] X. Ju et al., Graph Neural Networks for Particle Reconstruction in High Energy Physics detectors, 2020, [arXiv:2003.11603].
[14] J. S. H. Lee, I. Park, I. J. Watson, and S. Yang, Zero-Permutation Jet-Parton Assignment using a Self-Attention Network, arXiv:2012.03542.
[15] J. Shlomi, P. Battaglia, and J.-R. Vlimant, Graph Neural Networks in Particle Physics, arXiv:2007.13681.
[16] H. Qu and L. Gouskos, Jet tagging via particle clouds, Phys. Rev. D (Mar, 2020) 056019.
[17] V. Mikuni and F. Canelli, ABCNet: An attention-based method for particle tagging, Eur. Phys. J. Plus (2020), no. 6 463, [arXiv:2001.05311].
[18] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, Graph attention networks, 2017.
[19] C. Chen, L. Z. Fragonara, and A. Tsourdos, GAPNet: Graph Attention based Point Neural Network for Exploiting Local Feature of Point Cloud, arXiv:1905.08705.
[20] E. A. Moreno, O. Cerri, J. M. Duarte, H. B. Newman, T. Q. Nguyen, A. Periwal, M. Pierini, A. Serikova, M. Spiropulu, and J.-R. Vlimant, JEDI-net: a jet identification algorithm based on interaction networks, arXiv:1908.05318.
[21] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[22] D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, arXiv e-prints (Dec, 2014) [arXiv:1412.6980].
[23] J. Thaler and K. Van Tilburg, Identifying Boosted Objects with N-subjettiness, JHEP (2011) 015, [arXiv:1011.2268].
[24] M. Pierini, J. M. Duarte, N. Tran, and M. Freytsis, HLS4ML LHC jet dataset (100 particles), Jan., 2020.
[25] E. Coleman, M. Freytsis, A. Hinzmann, M. Narain, J. Thaler, N. Tran, and C. Vernieri, The importance of calorimetry for highly-boosted jet substructure, JINST (2018), no. 01 T01003, [arXiv:1709.08705].
[26] J. Duarte et al., Fast inference of deep neural networks in FPGAs for particle physics, JINST (2018), no. 07 P07027, [arXiv:1804.06913].
[27] M. Cacciari, G. P. Salam, and G. Soyez, The anti-k_T jet clustering algorithm, JHEP (2008) 063, [arXiv:0802.1189].
[28] G. Kasieczka, T. Plehn, J. Thompson, and M. Russel, Top quark tagging reference dataset, Mar., 2019.
[29] T. Sjöstrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C. O. Rasmussen, and P. Z. Skands, An Introduction to PYTHIA 8.2, Comput. Phys. Commun. (2015) 159-177, [arXiv:1410.3012].
[30] DELPHES 3 Collaboration, J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi, DELPHES 3, A modular framework for fast simulation of a generic collider experiment, JHEP (2014) 057, [arXiv:1307.6346].
[31] A. Butter et al., The Machine Learning Landscape of Top Taggers, SciPost Phys. (2019) 014, [arXiv:1902.09914].
[32] P. T. Komiske, E. M. Metodiev, and J. Thaler, Energy flow networks: deep sets for particle jets, Journal of High Energy Physics (Jan, 2019).
[33] P. T. Komiske, E. M. Metodiev, and M. D. Schwartz, Deep learning in color: towards automated quark/gluon jet discrimination, JHEP (2017) 110, [arXiv:1612.01551].