Unsupervised clustering for collider physics
Submitted to Physical Review D
V. Mikuni and F. Canelli
University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
E-mail: [email protected]
Abstract:
We propose a new method for unsupervised clustering for collider physics, named UCluster, where information in the embedding space created by a neural network is used to categorise collision events into different clusters that share similar properties. We show how this method can be developed into an unsupervised multiclass classification of different processes and applied in anomaly detection of events to search for new physics phenomena at colliders.

1 Introduction

The Standard Model (SM) of particle physics has been successful so far at describing the interaction of fundamental particles in high energy physics (HEP). The ATLAS [1] and CMS [2] Collaborations have tested the SM extensively using particle collision events at the CERN Large Hadron Collider (LHC), while also looking for deviations from the SM that could point to physics beyond the SM (BSM). Since the underlying nature of the new physics is not known, new methods designed to be model independent have proliferated in recent years. These strategies aim at finding deviations or detecting anomalies while using only SM events, avoiding any dependence on BSM signals. For a short review of recent approaches see [3].

For measurements of SM parameters, a fully unsupervised multiclass classification method would be advantageous, particularly for precision measurements. Simulations are often needed to describe the properties of the different processes produced in LHC collisions. However, simulated events are not always precise for all physics processes. This can be caused either by a lack of simulated events compared to the data expectation, or by the need for corrections that are beyond the accuracy of the approximations used in the simulation. Further precision might be computationally prohibitive to achieve, or beyond the capability of our current methods. To mitigate these issues, different data-driven methods often replace the event simulations.
See [4–7] for recent examples.

When two or more processes are not well modelled, the common approach is to design multiple control regions, often defined using high-level distributions, to create a high-purity sample that allows a data-driven estimation and modelling of the process. However, since it is not always straightforward to define each of these regions without relying on simulations, an unsupervised multiclass classification approach could be used instead.

In this paper, we introduce a method for unsupervised clustering (UCluster). The main idea of UCluster is to use a neural network (NN) to reduce the data dimensionality while retaining the main properties of the data set. In this reduced representation, a clustering objective is added to the training to encourage points embedded in this space to be close together when they share similar properties and far apart otherwise. We test the performance of UCluster on two different tasks: unsupervised multiclass classification of three different SM processes and unsupervised anomaly detection.
2 Related work

Recently, different and innovative strategies have been proposed for unsupervised training in HEP, mostly in the context of event classification. A few examples of methods exploiting anomaly detection signatures as over-densities are [8, 9] and, more recently, [3]. In these approaches, anomalous events are identified as localised excesses in some distribution, where machine learning is then used to enhance the local significance of the new physics process.

While many strategies focus on unsupervised anomaly detection, other methods have also been proposed to better understand SM processes without relying on simulation, like the work developed in [10] for quark and gluon classification with jet topics and the methods developed in [11], employing latent Dirichlet allocation to build a data-driven top-quark tagger. In order to create an unsupervised and model-independent approach, the majority of the strategies rely on binary classification, where the main goal is to test whether an event (or a group of events) resulting from a particle collision is compatible with one of two competing hypotheses. Approaches applied to mixed samples with more than two components were also studied in [12, 13], where prior knowledge of the label proportion of each component in the mixed sample is required to achieve good performance.

In this work, we propose an unsupervised method for multiclass classification whose only requirement is the expected number of different components in a mixed sample. The same method is applied to anomalous event detection, where the data are partitioned into clusters that isolate the anomaly from the backgrounds.
3 UCluster

UCluster consists of two components: a classification step to ensure that events with similar properties are close in the embedding space created by a NN, and a clustering step, where the network learns to cluster embedded events with similar properties. These two tasks are accomplished by means of a combined loss function containing independent components for each of the described steps.

The classification loss (L_focal), applied to the output nodes of the NN, is defined by the focal loss [14]. The focal loss improves the classification performance for unbalanced labels, which is the case for the classification tasks introduced in the following sections. The expression for the focal loss is:

$$\mathcal{L}_{\mathrm{focal}} = -\frac{1}{N}\sum_{j}^{N}\sum_{m}^{M} y_{j,m}\,\big(1 - p_{\theta,m}(x_j)\big)^{\gamma}\,\log\big(p_{\theta,m}(x_j)\big), \tag{3.1}$$

where p_{θ,m}(x_j) is the network's confidence for event x_j, with trainable parameters θ, to be classified as class m. The term y_{j,m} is 1 if class m is the correct assignment for event j and 0 otherwise. In this work, we fix the hyperparameter of the focal loss to γ = 2.

The clustering loss (L_cluster) is defined similarly to the loss developed in [15]:

$$\mathcal{L}_{\mathrm{cluster}} = \frac{1}{N}\sum_{k}^{K}\sum_{j}^{n} \lVert f_{\theta}(x_j) - \mu_k \rVert^2\, \pi_{jk}, \tag{3.2}$$

where the distance between each event j and each cluster centroid µ_k is calculated in the embedding space f_θ of the neural network with trainable parameters θ. The function π_{jk} weighs the importance of each event and takes the form:

$$\pi_{jk} = \frac{e^{-\alpha \lVert f_{\theta}(x_j) - \mu_k \rVert^2}}{\sum_{k'} e^{-\alpha \lVert f_{\theta}(x_j) - \mu_{k'} \rVert^2}}, \tag{3.3}$$

with the hyperparameter α identified as an inverse temperature. Since L_cluster is differentiable, stochastic gradient descent can be used to jointly optimise the trainable parameters θ and the centroid positions µ_k.

The combined loss to be minimised is:

$$\mathcal{L} = \mathcal{L}_{\mathrm{focal}} + \beta\,\mathcal{L}_{\mathrm{cluster}}. \tag{3.4}$$

The hyperparameter β controls the relative importance of the two losses.
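As a concrete illustration, the combined objective of Eqs. 3.1–3.4 can be sketched in NumPy as below. The paper's implementation uses TensorFlow with ABCNet; the function names and this NumPy formulation are ours, for illustration only.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Eq. 3.1: focal loss averaged over N events.
    p: (N, M) predicted class probabilities; y: (N, M) one-hot labels."""
    eps = 1e-9  # numerical guard for log(0)
    return -np.mean(np.sum(y * (1.0 - p) ** gamma * np.log(p + eps), axis=1))

def soft_assignments(f, mu, alpha=1.0):
    """Eq. 3.3: pi_jk, a softmax over clusters of -alpha * squared distance.
    f: (N, E) embeddings; mu: (K, E) cluster centroids."""
    d2 = np.sum((f[:, None, :] - mu[None, :, :]) ** 2, axis=-1)  # (N, K)
    w = np.exp(-alpha * d2)
    return d2, w / np.sum(w, axis=1, keepdims=True)

def clustering_loss(f, mu, alpha=1.0):
    """Eq. 3.2: soft k-means loss in the embedding space f_theta."""
    d2, pi = soft_assignments(f, mu, alpha)
    return np.sum(d2 * pi) / f.shape[0]

def combined_loss(p, y, f, mu, beta=10.0, gamma=2.0, alpha=1.0):
    """Eq. 3.4: L = L_focal + beta * L_cluster."""
    return focal_loss(p, y, gamma) + beta * clustering_loss(f, mu, alpha)
```

In the paper, both θ and the centroids µ_k are updated by gradient descent on this combined loss; the functions above only evaluate it.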
For these studies, we fix β = 10 to ensure that both components have the same order of magnitude. Since L_cluster requires an initial guess for the centroid positions, we pre-train the model using only L_focal for 10 epochs. After the pre-training, the k-means algorithm [16] is applied to the object embeddings to initialise the cluster centroids. The full training is then carried out with the combined loss defined in Eq. 3.4. To allow the cluster centres to change, the inverse temperature α has a starting value of 1 and increases linearly by 2 in each following epoch.

4 ABCNet

The implementation of UCluster is done using ABCNet [17], a graph-based neural network where each reconstructed particle is taken as a node in a graph. The importance of each node is then learned by the model through attention mechanisms. The embedding space for the clustering loss in Eq. 3.2 is taken as the output of an aggregation (max-pooling) layer.

Figure 1. ABCNet architecture used in UCluster for a batch size N, F input features, and embedding space of size E. Fully connected layers and encoding node sizes are denoted inside "{}". For each GAPLayer, the number of k-nearest neighbours (k) and heads (H) are given. The additional components used only for anomaly detection are shown in red.

For the following studies, the 10 nearest neighbours of each particle are used to calculate the GAPLayers [18]. The initial distances are calculated in the pseudorapidity–azimuth (η–φ) space using the distance $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$. The second GAPLayer uses the Euclidean distances in the space created by the subsequent fully connected layers. The architectures used for multiclass classification and anomaly detection are depicted in Fig. 1. Besides the output classification size, both tasks share almost identical architectures. The model used for anomaly detection uses additional high-level distributions and additional skip connections after the pooling layer to improve the classification performance. In both cases the batch size is set to 1024 and the training is stopped after 100 epochs.

ABCNet is implemented in TensorFlow v1.14 [19]. An Nvidia GTX 1080 Ti graphics card is used for the training and evaluation steps. For all tasks described in this paper, the Adam optimiser [20] is used. The learning rate starts from 0.001 and decreases by a factor of 2 every three epochs, until reaching a minimum of 1e-5.

5 Multiclass classification

The applicability of UCluster is demonstrated on an important problem in high energy physics: unsupervised multiclass classification. To achieve good performance, we require a task that results in a suitable embedding space. This task should be such that events stemming from the same physics process are found close together in the embedding space as compared to events from different physics processes. Here, a jet mass classification task is chosen in order to provide meaningful event embeddings.
Given a set of particles belonging to a jet, we ask our model to correctly identify the invariant mass of the jet. This task is inspired by the correlation between jet substructure observables and the invariant mass of a jet [21, 22]. The goal is to have our machine learning method learn to extract relevant information about the different jet substructures by first learning how to correctly identify the mass of a jet. The simplest solution to this problem could be achieved by the four-vector sum of all the jet's constituents, leading to an embedding space with no separation power for different types of jets. To alleviate this issue, we instead define jet mass labels by taking 20 equidistant steps from 10 to 200 GeV, as shown in Fig. 2. The task is then to identify the correct mass interval a jet belongs to, instead of the specific mass value. The input distributions used for the training are listed in Table 1.
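The mass-interval labels can be built, for example, by binning each jet mass into one of the 20 intervals. The bin edges below follow the 10–200 GeV range quoted in the text; the helper name is ours.

```python
import numpy as np

# 20 equidistant mass intervals from 10 to 200 GeV -> 21 bin edges
EDGES = np.linspace(10.0, 200.0, 21)

def mass_label(jet_mass):
    """Index (0-19) of the mass interval the jet falls into; jets outside
    the range are clipped to the first or last interval."""
    return int(np.clip(np.digitize(jet_mass, EDGES) - 1, 0, 19))
```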
Figure 2. Normalised distribution of the jet mass for each category used in the unsupervised multiclass classification task. The bin boundaries represent the boundaries used to define the jet mass labels.
For this study, a sample containing simulated jets originating from W bosons, Z bosons, and top quarks produced in √s = 13 TeV proton-proton collisions is used. This data set is created and configured using a parametric description of a generic LHC detector, described in [24, 25].

Table 1. Description of each feature used to define a point in the ABCNet implementation for unsupervised multiclass classification.

Variable: Description
∆η: Difference between the pseudorapidity of the constituent and the jet
∆φ: Difference between the azimuthal angle of the constituent and the jet
log pT: Logarithm of the constituent's pT
log E: Logarithm of the constituent's E
log pT/pT(jet): Logarithm of the ratio between the constituent's pT and the jet pT
log E/E(jet): Logarithm of the ratio between the constituent's E and the jet E
∆R: Distance in the η–φ space between the constituent and the jet
PID: Particle type identifier as described in [23]

The jets are clustered with the anti-kt algorithm [26] with radius parameter R = 0.8, while also requiring the jet pT to be around 1 TeV, ensuring that most of the decay products of the generated particles are found inside a single jet. The samples are available at [27]. For each jet, up to 100 particles are stored: if more particles are found inside a jet, the list is truncated; otherwise it is zero-padded up to 100. The training set contains 300,000 jets, while the validation sample consists of 140,000 jets.
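The per-constituent features of Table 1 can be computed from the constituent and jet kinematics roughly as follows. This is a sketch with names of our choosing; the categorical PID feature of [23] is omitted.

```python
import numpy as np

def constituent_features(c_eta, c_phi, c_pt, c_e, jet_eta, jet_phi, jet_pt, jet_e):
    """First seven features of Table 1 for a single jet constituent."""
    deta = c_eta - jet_eta
    # wrap the azimuthal difference into (-pi, pi]
    dphi = (c_phi - jet_phi + np.pi) % (2.0 * np.pi) - np.pi
    return np.array([
        deta,                     # Delta eta
        dphi,                     # Delta phi
        np.log(c_pt),             # log pT
        np.log(c_e),              # log E
        np.log(c_pt / jet_pt),    # log pT / pT(jet)
        np.log(c_e / jet_e),      # log E / E(jet)
        np.hypot(deta, dphi),     # Delta R in the eta-phi space
    ])
```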
Figure 3. t-SNE visualisation of the embedding space after the pre-training and before the full training for multiclass classification with 1000 jets. The true label information is shown on the left, while the initial cluster labels obtained with a k-means approach are shown on the right.
To visualise the embedding space, the t-SNE method [28] is applied to 1000 jets taken just after the pre-training with only the classification loss, and compared to the space created after the full training is performed. After the pre-training, the initial label assignment is taken from a k-means approach, shown in Fig. 3 (right), while the true labels are shown in Fig. 3 (left). At this stage, the clustering accuracy, calculated using the Hungarian algorithm [29], is 51%. After the full training is performed, the trained labels are shown in Fig. 4 (right), with a clustering accuracy of 81% compared to the true label assignment in Fig. 4 (left).

Figure 4. t-SNE visualisation of the embedding space after the full training, created for multiclass classification with 1000 jets. The true label information is shown on the left, while the trained clusters are shown on the right.

To inspect the quality of the embedding space further, a supervised KNN is trained using only the embedding features as inputs. Its performance is then compared to a separate KNN with the same setup, but using only the jet mass as input. The supervised KNNs are trained to determine class membership given the labels of the 30 nearest neighbours. The training uses 35k events and is tested on an independent sample of 15k events. The one-vs-all performance is compared using receiver operating characteristic (ROC) curves in Fig. 5, where one category is considered the signal of interest while the others are considered background. The area under the curve (AUC) for each process is also shown. The resulting AUC for the supervised training using the event embeddings is higher than for the jet mass alone in all categories. Top quark classification shows a particularly large improvement from using the embedding space information. We attribute this improvement to jets containing a top quark having a broader mass distribution compared to W and Z bosons, resulting in a worse invariant-mass separation, as seen in Fig. 2. UCluster is able to learn other jet properties beyond the invariant mass, improving the overall performance.

To estimate an upper bound on the UCluster performance, a fully supervised model using the full ABCNet architecture is also trained.
The ABCNet architecture is used to train a classifier with the real class labels as targets, achieving an accuracy of 92%. The comparable results between the fully supervised approach and the KNN trained on the event embeddings demonstrate how the method is able to reduce the dimensionality of the input data while retaining relevant information. The accuracies achieved with full supervision and the other approaches are summarised in Tab. 2.
Table 2. Supervised and unsupervised clustering accuracy of UCluster when using only the embedding space features.

Algorithm: Accuracy
Pre-training k-means: 51%
UCluster: 81%
Supervised KNN: 89%
Supervised training: 92%
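The clustering accuracies above correspond to the best one-to-one matching between clusters and true classes. A minimal version using SciPy's Hungarian-algorithm solver is sketched below; the helper name is ours, and the paper cites the algorithm as [29].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, cluster_labels):
    """Accuracy after optimally matching cluster indices to class labels."""
    k = int(max(true_labels.max(), cluster_labels.max())) + 1
    counts = np.zeros((k, k), dtype=int)
    for t, c in zip(true_labels, cluster_labels):
        counts[c, t] += 1
    # maximise the number of matched events (minimise the negated counts)
    rows, cols = linear_sum_assignment(-counts)
    return counts[rows, cols].sum() / len(true_labels)
```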
Figure 5. ROC curves of signal efficiency versus other-jets efficiency for each jet category (top, W, and Z) when considering the other jet categories as background, for a KNN trained on the jet mass alone and a KNN trained on the embeddings. The AUCs are 0.94 vs. 0.98 (top), 0.95 vs. 0.97 (W), and 0.95 vs. 0.97 (Z) for the jet-mass and embedding KNNs, respectively.
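The KNN probe on the embedding features can be sketched without any ML library: class scores are simply vote fractions among the 30 nearest neighbours. The function below is our illustration, not the paper's code; the resulting per-class fractions can be used as scores for one-vs-all ROC curves.

```python
import numpy as np

def knn_proba(train_x, train_y, test_x, k=30, n_classes=3):
    """Fraction of each class among the k nearest training points."""
    # squared Euclidean distances between test and training points
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]           # indices of the k nearest
    votes = train_y[nn]                          # (n_test, k) neighbour labels
    return np.stack([(votes == c).mean(axis=1) for c in range(n_classes)], axis=1)
```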
6 Anomaly detection
UCluster can also be applied to anomaly detection. Here, we show an example where anomalous events, created by an unknown physics process, are found close together in the embedding space created from a suitable classification task. This technique is motivated by the fact that, irrespective of the underlying physics model, events created by the same physics process carry similar event signatures.

To create a suitable embedding space, we modify the approach described in Sec. 5 to take into account all the particles created in a collision event rather than a single jet. To do so, the classification task is changed to a part-segmentation task. We consider all particles associated to a clustered jet. Each particle then receives a label proportional to the mass of the jet it was clustered into. For this task, we require the model not only to learn the mass of the jet a particle belongs to, but also to learn which particles should belong to the same jet. This approach is motivated by the fact that jet substructure often contains useful information for distinguishing different physics processes, as studied in the previous section.

The mass labels are created by defining 20 equidistant intervals from 10 to 1000 GeV. For simplicity, only the two heaviest jets are considered per event. A simplified example of the label definition is shown in Fig. 6.
Figure 6. Schematic of the labels for anomaly detection. Each particle associated to a clustered jet receives a mass label proportional to the respective jet mass. The larger the number, the more massive the associated jet.
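This per-particle label construction can be sketched as follows. The names are ours; the bin edges follow the 10–1000 GeV range quoted in the text.

```python
import numpy as np

# 20 equidistant mass intervals from 10 to 1000 GeV
EDGES = np.linspace(10.0, 1000.0, 21)

def particle_labels(jet_masses, jet_index_per_particle):
    """Each particle inherits the mass-bin index (0-19) of the jet it was
    clustered into; only the two heaviest jets are kept per event."""
    bins = np.clip(np.digitize(jet_masses, EDGES) - 1, 0, 19)
    return bins[jet_index_per_particle]
```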
To perform these studies, we use the R&D data set created for the LHC Olympics 2020 [30]. The data set consists of one million quantum chromodynamics (QCD) dijet events simulated with Pythia 8 [31] without pile-up or multiple parton interactions. The BSM signal consists of a hypothetical W' boson with mass m_W' = 3.5 TeV that decays into X and Y bosons with masses m_X = 500 GeV and m_Y = 100 GeV, respectively. The X and Y bosons, in turn, decay promptly into quarks. The detector simulation is performed with Delphes 3.4.1 [32], and particle-flow objects are clustered into jets using the FastJet [33] implementation of the anti-kt algorithm with jet radius R = 1.0. Events are required to have at least one jet with pT > 1.3 TeV. The number of generated signal events is set to 1% of the total number of events. From this data set, 300k events are randomly selected for training, 150k for testing, and 300k events are used to evaluate the clustering performance.

The distributions used as input for ABCNet are described in Tab. 3. To improve the clustering performance, a set of high-level variables is added to the network. The goal of these additional distributions is to parameterize the model performance, as described in [34]. We also point out that, even though a proxy of the jet masses is given as an input, the trivial solution is still not reached, since the model also has to identify which particles belong to which jets. To quantify the performance of UCluster, we start by considering only two clusters with an embedding space of the same dimension. Figure 7 shows the resulting embedding space, without any transformation, for 1000 random events.

Table 3. Description of each feature used to define a point in the point-cloud implementation for anomaly detection. The last two lines are the global information added to parameterize the network.
Variable: Description
∆η: Pseudorapidity difference between the constituent and the associated jet
∆φ: Azimuthal angle difference between the constituent and the associated jet
log pT: Logarithm of the constituent's pT
log E: Logarithm of the constituent's E
log pT/pT(jet): Logarithm of the ratio between the constituent's pT and the associated jet pT
log E/E(jet): Logarithm of the ratio between the constituent's E and the associated jet E
∆R: Distance in the η–φ space between the constituent and the associated jet
log mJ(1,2): Logarithm of the masses of the two heaviest jets in the event
τ21(1,2): Ratio of τ2 to τ1 for the two heaviest jets in the event, with τN defined in [35]

Figure 7. Visualisation of the embedding space created for anomaly detection using 1000 events. Since the embedding space is already two-dimensional, no additional transformation is applied. The true labels are shown on the left, while the clusters created by UCluster are shown on the right.

Most of the BSM events are found in the same trained cluster, confirming the assumption that the signal events end up close together in the embedding space. However, because of the large QCD background contamination present in the same cluster, the signal-to-background (S/B) ratio remains low, increasing only from 1% to 2.5%. If the proximity assumption holds, the cluster S/B ratio can be further enhanced by partitioning the events into more clusters. Indeed, if the classification loss favours an embedding space where signal events remain close together, increasing the number of clusters will decrease the QCD contamination in the signal clusters, whose properties differ from the signal events. To test this assumption, the number of clusters is varied while keeping all other network parameters fixed. The maximum S/B ratio found in a cluster for different numbers of clusters is shown in Fig. 8 (left). The S/B ratio steadily increases with the number of clusters, reaching an average of around 28%. To test how the performance changes with the number of events, different training sample sizes were used while keeping the model fixed, the signal fraction fixed to 1%, and the number of clusters fixed to 30. The result of each training is then evaluated on an independent sample of the same size as the training sample. The resulting approximate significance (S/√B) is shown in Fig. 8 (right). For an initial significance in the range 2–6, we observe enhancements by factors of 3–4.

The uncertainties in Fig. 8 show the standard deviation of five independent trainings with different random initial weights. When many clusters are used, the clustering stability starts to decrease, as evidenced by the larger error bars. This behaviour is expected, since a large cluster multiplicity requires clusters to target more specific event properties that might differ between trainings.

The dijet mass distribution for all events (left) and for the cluster with the highest S/B ratio (right) are shown in Fig. 9.

In order to relate the clusters in the embedding space to physical observables, four high-level features were added to the anomaly detection model: the invariant mass and τ21 of the two heaviest jets in the event. To visualise the physical properties of the clusters, histograms of these four observables are shown in Fig. 10, with the stacked contributions of the individual clusters shown for UCluster with 5 clusters. From these distributions, there is a sharp separation between the cluster boundaries for the mass of the heaviest jet in the dijet event. The sharp separation in jet mass is also related to the separation observed in the heaviest jet τ21. As pointed out in [22], QCD jets show a more distinctive two-prong structure when they have a larger mass; therefore, heavier jets tend to have lower values of τ21. This correlation between jet mass and jet substructure is why the jet mass classification task leads to clusters where jets within a cluster have similar substructure.
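Given per-event cluster assignments and truth flags (available in this benchmark study), the per-cluster S/B and approximate significance can be computed as below. The helper is ours, for illustration.

```python
import numpy as np

def best_cluster_metrics(cluster_ids, is_signal):
    """Maximum S/B and S/sqrt(B) over all clusters."""
    best_sb, best_sig = 0.0, 0.0
    for c in np.unique(cluster_ids):
        sel = cluster_ids == c
        s = float(np.count_nonzero(is_signal[sel]))
        b = float(np.count_nonzero(~is_signal[sel]))
        if b > 0.0:
            best_sb = max(best_sb, s / b)
            best_sig = max(best_sig, s / np.sqrt(b))
    return best_sb, best_sig
```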
Figure 8. Maximum signal-to-background ratio found for different numbers of clusters (left), and maximum significance found for UCluster trained and evaluated on different numbers of events with the number of clusters fixed to 30 (right). The uncertainty shows the standard deviation of the results from five trainings with different random weight initialisations.
Figure 9. Dijet mass distribution of the events prior to clustering (left) and for the cluster with the highest S/B ratio (right), found when the data are partitioned into 60 clusters.
Figure 10. Distributions of the four high-level features (heaviest and second-heaviest jet mass and τ21) used to parameterize the performance of UCluster trained with 5 clusters. Events belonging to the same cluster receive the same colour; the stacked contributions of all clusters are shown.

7 Conclusions

In this work, we presented UCluster, a new method to perform unsupervised clustering of collision events in high energy physics. We explored two potential applications of this method: unsupervised multiclass classification and anomaly detection.

The ability of the embedding space to separate different processes is directly connected to the secondary task used in conjunction with the clustering objective. We proposed a classification task motivated by the observed correlation between the jet mass and jet substructure observables, which is often useful for jet tagging. By learning to classify the mass of a jet, UCluster created an embedding space that was shown to have better separation power for all the class components in the data set compared to the jet mass alone.

UCluster was also studied for unsupervised anomaly detection. In this context, the classification task on jet masses was expanded to cover the entire event topology. Using this method, we were able to increase the signal-to-background ratio in a given cluster from an initial value of 1% up to 28%, while also observing stable performance even for a large cluster multiplicity.

We remark that tasks other than the ones proposed in this work can also be used to create meaningful embeddings. In particular, recent advances in auto-encoders applied to particle physics [36, 37] are strong candidates for a summary statistic that can encapsulate the event information in a lower-dimensional representation suitable for clustering.

Compared to [12, 13], we relax the requirements on the label proportions of the different components in a mixed sample. One interesting point to notice is that, as presented in [38], the clustering assignment problem can instead be interpreted as an optimal transport problem. This insight is particularly interesting when the label proportions are known a priori.
In this case, the additional knowledge of the label proportions can be directly added to the model as a regularisation term of the form:

$$\mathcal{L}_{\mathrm{reg.\,cluster}} = \min_{\pi} \sum_{k}^{K}\sum_{j}^{n} \lVert f_{\theta}(x_j) - \mu_k \rVert^2\,\pi_{jk} + \alpha\,\pi_{jk}\big(\log(\pi_{jk}) - 1\big). \tag{7.1}$$

This approach requires the term π_{jk} to be solved numerically, subject to:

$$\pi\,\mathbf{1}_K = \frac{1}{n}\,\mathbf{1}_N, \tag{7.2}$$
$$\pi^{T}\,\mathbf{1}_N = w, \tag{7.3}$$

where w represents the vector of label proportions.

Furthermore, we considered an application where the initial number of mixed components was known. This condition was necessary to select a suitable number of clusters. However, this requirement could also be relaxed, as shown for example in [39, 40], where the clustering model is able to identify the optimal number of partitions given the properties of a data set.

Finally, UCluster can also be used in conjunction with other anomaly detection approaches, where a set of interesting clusters is first identified and then further inspected by other methods.

Acknowledgements

The authors would like to thank Kyle James Read Cormier for the valuable suggestions regarding the development and clarity of this document. This research was supported in part by the Swiss National Science Foundation (SNF) under contract No. 200020-182037.

References
[1] ATLAS Collaboration, G. Aad et al., The ATLAS Experiment at the CERN Large Hadron Collider, JINST (2008) S08003.
[2] CMS Collaboration, S. Chatrchyan et al., The CMS Experiment at the CERN LHC, JINST (2008) S08004.
[3] B. Nachman and D. Shih, Anomaly detection with density estimation, Phys. Rev. D (2020) 075042.
[4] ATLAS Collaboration, M. Aaboud et al., Search for new phenomena with large jet multiplicities and missing transverse momentum using large-radius jets and flavour-tagging at ATLAS in 13 TeV pp collisions, JHEP (2017) 034, [arXiv:1708.02794].
[5] CMS Collaboration, A. M. Sirunyan et al., Measurement of the ttbb production cross section in the all-jet final state in pp collisions at √s = 13 TeV, Phys. Lett. B (2020) 135285, [arXiv:1909.05306].
[6] CMS Collaboration, A. M. Sirunyan et al., Search for high mass dijet resonances with a new background prediction method in proton-proton collisions at √s = 13 TeV, JHEP (2020) 033, [arXiv:1911.03947].
[7] ATLAS Collaboration, G. Aad et al., Dijet resonance search with weak supervision using √s = 13 TeV pp collisions in the ATLAS detector, arXiv:2005.02983.
[8] E. M. Metodiev, B. Nachman, and J. Thaler, Classification without labels: Learning from mixed samples in high energy physics, JHEP (2017) 174, [arXiv:1708.02949].
[9] J. H. Collins, K. Howe, and B. Nachman, Extending the search for new resonances with machine learning, Phys. Rev. D (2019), no. 1, 014038, [arXiv:1902.02634].
[10] E. M. Metodiev and J. Thaler, Jet topics: Disentangling quarks and gluons at colliders, Phys. Rev. Lett. (2018).
[11] B. M. Dillon, D. A. Faroughy, and J. F. Kamenik, Uncovering latent jet substructure, Phys. Rev. D (2019) 056002.
[12] N. Quadrianto, A. J. Smola, T. S. Caetano, and Q. V. Le, Estimating labels from label proportions, Journal of Machine Learning Research (2009), no. 82, 2349–2374.
[13] G. Patrini, R. Nock, P. Rivera, and T. Caetano, (Almost) no label no cry, in Advances in Neural Information Processing Systems 27 (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds.), pp. 190–198, Curran Associates, Inc., 2014.
[14] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, Focal loss for dense object detection, CoRR abs/1708.02002 (2017), [arXiv:1708.02002].
[15] M. M. Fard, T. Thonet, and É. Gaussier, Deep k-means: Jointly clustering with k-means and learning representations, CoRR abs/1806.10069 (2018), [arXiv:1806.10069].
[16] J. A. Hartigan and M. A. Wong, Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society, Series C (Applied Statistics) (1979), no. 1, 100–108.
[17] V. Mikuni and F. Canelli, ABCNet: An attention-based method for particle tagging, Eur. Phys. J. Plus (2020), no. 6, 463, [arXiv:2001.05311].
[18] C. Chen, L. Zanotti Fragonara, and A. Tsourdos, GAPNet: Graph attention based point neural network for exploiting local feature of point cloud, arXiv e-prints (2019), [arXiv:1905.08705].
[19] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[20] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv e-prints (2014), [arXiv:1412.6980].
[21] J. Dolen, P. Harris, S. Marzani, S. Rappoccio, and N. Tran, Thinking outside the ROCs: Designing decorrelated taggers (DDT) for jet substructure, JHEP (2016) 156, [arXiv:1603.00027].
[22] P. T. Komiske, E. M. Metodiev, and J. Thaler, Metric space of collider events, Phys. Rev. Lett. (2019), no. 4, 041801, [arXiv:1902.02346].
[23] M. Tanabashi et al., Review of particle physics, Phys. Rev. D98 (2018), no. 3, 030001.
[24] E. Coleman, M. Freytsis, A. Hinzmann, M. Narain, J. Thaler, N. Tran, and C. Vernieri, The importance of calorimetry for highly-boosted jet substructure, JINST (2018), no. 01, T01003, [arXiv:1709.08705].
[25] J. Duarte et al., Fast inference of deep neural networks in FPGAs for particle physics, JINST (2018), no. 07, P07027, [arXiv:1804.06913].
[26] M. Cacciari, G. P. Salam, and G. Soyez, The anti-kt jet clustering algorithm, JHEP (2008) 063, [arXiv:0802.1189].
[27] M. Pierini, J. M. Duarte, N. Tran, and M. Freytsis, HLS4ML LHC jet dataset (100 particles), Jan. 2020.
[28] L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research (2008) 2579–2605.
[29] H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly (1955), no. 1-2, 83–97.
[30] G. Kasieczka, B. Nachman, and D. Shih, R&D dataset for LHC Olympics 2020 anomaly detection challenge, Apr. 2019.
[31] T. Sjöstrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C. O. Rasmussen, and P. Z. Skands, An introduction to PYTHIA 8.2, Comput. Phys. Commun. (2015) 159–177, [arXiv:1410.3012].
[32] DELPHES 3 Collaboration, J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi, DELPHES 3, a modular framework for fast simulation of a generic collider experiment, JHEP (2014) 057, [arXiv:1307.6346].
[33] M. Cacciari, G. P. Salam, and G. Soyez, FastJet user manual, Eur. Phys. J. C (2012) 1896, [arXiv:1111.6097].
[34] P. Baldi, K. Cranmer, T. Faucett, P. Sadowski, and D. Whiteson, Parameterized neural networks for high-energy physics, Eur. Phys. J. C (2016), no. 5, 235, [arXiv:1601.07913].
[35] J. Thaler and K. Van Tilburg, Identifying boosted objects with N-subjettiness, JHEP (2011), no. 3, 15.
[36] ATLAS Collaboration, Deep generative models for fast shower simulation in ATLAS, Tech. Rep. ATL-SOFT-PUB-2018-001, CERN, Geneva, Jul. 2018.
[37] T. Cheng, J.-F. Arguin, J. Leissner-Martin, J. Pilette, and T. Golling, Variational autoencoders for anomalous jet tagging, arXiv:2007.01850.
[38] A. Genevay, G. Dulac-Arnold, and J.-P. Vert, Differentiable deep clustering with cluster size constraints, arXiv:1910.09036 (2019).
[39] Y. Ren, N. Wang, M. Li, and Z. Xu, Deep density-based image clustering, 2018.
[40] C. Patil and I. Baidari, Estimating the optimal number of clusters k in a dataset using data depth, Data Science and Engineering (2019), no. 2, 132–140.