Clustering of Electromagnetic Showers and Particle Interactions with Graph Neural Networks in Liquid Argon Time Projection Chambers Data
Francois Drielsma,∗ Qing Lin, Pierre Côte de Soux, Laura Dominé, Ran Itay, Dae Heun Koh, Bradley J. Nelson, Kazuhiro Terao, Ka Vang Tsang, and Tracy L. Usher (on behalf of the DeepLearnPhysics Collaboration)

SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
ICME, Stanford University, Stanford, CA 94305, USA
Stanford University, Stanford, CA 94305, USA

∗ [email protected]
Liquid Argon Time Projection Chambers (LArTPCs) are a class of detectors that produce high-resolution images of charged particles within their sensitive volume. In these images, the clustering of distinct particles into superstructures is of central importance to the current and future neutrino physics program. Electromagnetic (EM) activity typically exhibits spatially detached fragments of varying morphology and orientation that are challenging to efficiently assemble using traditional algorithms. Similarly, particles that are spatially removed from each other in the detector may originate from a common interaction. Graph Neural Networks (GNNs) were developed in recent years to find correlations between objects embedded in an arbitrary space. The Graphical Particle Aggregator (GrapPA) first leverages GNNs to predict the adjacency matrix of EM shower fragments and to identify the origin of showers, i.e. primary fragments. On the PILArNet public LArTPC simulation dataset, the algorithm achieves a shower clustering accuracy characterized by a mean adjusted Rand index (ARI) of 97.8 % and a primary identification accuracy of 99.8 %. It yields a relative shower energy resolution of (4.1 + …/√E (GeV)) % and a shower direction resolution of (2.…/√E (GeV))° (the trailing digits were lost in extraction). The optimized algorithm is then applied to the related task of clustering particle instances into interactions and yields a mean ARI of 99.2 % for an interaction density of O(1) m^{-3}.

I. INTRODUCTION
In recent years, accelerator-based neutrino oscillation experiments in the United States have been designed to use Liquid Argon Time Projection Chambers (LArTPCs) as their central neutrino detection technology [1]. Charged particles that traverse these detectors ionize the noble liquid. The electrons so produced are drifted in a uniform electric field towards a readout plane. The location of the electrons collected on the anode, combined with their arrival time, offers mm-scale resolution images of charged particle interactions [2]. This level of tracking precision, coupled to the detailed calorimetric information that a totally active detector provides, is believed to be the key to resolving some of the ambiguities observed in previous experiments and to extending their energy reach to probe the MeV-scale physics sector [3, 4].

The Short Baseline Neutrino Program (SBN) [5] aims to clarify an anomalous signal observed by the MiniBooNE experiment [6]. It will eventually make use of three LArTPCs of varying sizes: the Short Baseline Near Detector (SBND, 112 t), MicroBooNE (90 t) [7] and ICARUS (600 t) [8]. The DUNE experiment [9] will use the LArTPC technology to measure long-baseline neutrino oscillations with unprecedented precision. It will consist of a near detector (105 t) and a far detector (40 kt). The main signal for these physics endeavors is the unambiguous appearance of electron neutrinos, which manifest themselves as electron showers, in a beam of muon neutrinos. Their success thus centrally depends upon the accurate reconstruction of showers and specifically of their initial positions, directions, and energies. Both experiments also face the substantial challenge of assembling particles into complete neutrino interaction events, which are often accompanied by unrelated activity. Detectors located close to the surface, such as those of the SBN program, suffer from a high rate of cosmic rays, while the future DUNE near detector will observe a high rate of pileup events, with up to twenty neutrino interactions per beam pulse.

Electromagnetic (EM) showers exhibit an incoherent branching tree structure in LArTPCs. As an electron propagates through the dense detector medium, it loses energy through ionization and stochastically emits photons until it comes to a stop. The emitted photons propagate through the noble liquid with a mean free path of 15–30 cm, for energies in the range 10–1000 MeV, before they either produce an electron-positron pair or eject a Compton electron [10]. A single electron or photon creates a cascade of spatially distinct EM particles, referred to as fragments in this study, that may be far removed from one another in the image. Assembling these fragments into coherent shower objects has been a persistent challenge in LArTPCs that has not yet been fully resolved using traditional programming techniques.

Graph Neural Networks (GNNs) became popular in recent years as a way to leverage the concept of receptive field developed in the context of Convolutional Neural Networks and generalize it to arbitrary objects [11]. The receptive field is no longer exclusively determined by a square neighborhood of pixels in an image but rather defined by an adjacency matrix whose elements indicate whether objects are connected by an edge in a graph. This development is ideally suited to the clustering of EM showers in LArTPCs, as each shower may be represented by a collection of shower fragments (the graph nodes) that are connected by invisible photons (the graph edges). The task is then to identify the edges that connect fragments within a shower and to tag the fragments that initiate showers, i.e. the primary fragments.

In the case of interaction clustering, distinct sources of activity in the detector can be clustered into separate groups using a GNN by building a graph in which particles are nodes and edges represent correlations between particles. The task is then to identify the edges that connect particles that belong to the same interaction.

The study presented in this paper is reproducible using a Singularity [12] software container (https://singularity-hub.org/containers/11757), implementations available in the lartpc_mlreco3d repository (https://github.com/DeepLearnPhysics/lartpc_mlreco3d) and a public simulation sample [13] (https://osf.io/6gvf4/) made available by the DeepLearnPhysics collaboration.

Section II presents the architecture of the reconstruction chain, as applied to the shower clustering task, from the LArTPC image input to the inference stage. Section III outlines the studies that were conducted to optimize the chain. Section IV shows a detailed analysis of the shower reconstruction performance on this sample. Section V describes how the algorithm was adapted to the particle interaction clustering task and provides some performance metrics.

FIG. 1. Architecture of the Graphical Particle Aggregator (GrapPA) for shower clustering and primary identification. The input set of voxels associated with electromagnetic showers is passed through a density-based clustering algorithm that forms dense shower fragments. Each fragment is encoded into a set of node features in a graph connected by arbitrary edges carrying edge features. Edge and node features are updated through a series of message passing steps composed of edge and node updaters. The updated edge features are used to constrain the connectivity graph and the updated node features to identify primaries.
FIG. 2. Example image from the simulated LArTPC input dataset. The colors correspond to the semantic type of the particle that deposited the energy: ‘EM’ stands for electromagnetic, ‘track’ for protons, pions and muons, ‘Michel’ for muon decay electrons, ‘Delta’ for delta electrons and ‘LE’ for low energy scatters (low energy EM and nuclear activity).
II. RECONSTRUCTION CHAIN

A. Data
The Graphical Particle Aggregator (GrapPA) is schematically illustrated for the shower clustering task in figure 1. The input LArTPC dataset, PILArNet [13], consists of rasterized 3D energy deposition images of simulated ionizing particle interactions in liquid argon. An image corresponds to a ∼12 m³ cubic volume of liquid argon with an edge length of 768 voxels (1 voxel = (3 mm)³). Each image includes a multiple-particle vertex, i.e. a set of particles originating from a common vertex, overlaid with randomly located track-like cosmic muon trajectories and shower-like instances. An example image is shown in figure 2. This image contains three tracks and two showers originating from the vertex in addition to two muon tracks and a detached shower.

Figure 3 shows the distributions of the number of particles coming from a common vertex in each image, the number of overlaid particles and their respective type compositions. These data emulate a particle density above that expected in an SBN program LArTPC detector. The data is split in two samples: a training set of 125480 images and a validation set of 22439 images.
FIG. 3. Distribution of the number of particles, showers and tracks in each image, originating from a common vertex (blue), randomly scattered (orange) and combined (green).
For the shower clustering task, the set of input voxels is constrained to those associated with electromagnetic (EM) activity. In the scope of this paper, the particle type of each voxel is assumed to be known, as it has been demonstrated that a semantic segmentation neural network can identify EM voxels with a 99.4 % voxel-wise accuracy [14]. A similarly designed neural network has been shown to work on real data from the MicroBooNE LArTPC detector [15].
B. Fragment clustering
A shower object encompasses all energy depositions associated with a single primary electron or photon (a primary electron or photon does not have an EM parent and is neither a delta ray, a Michel electron nor a deexcitation photon) and all its subsequent EM daughters. A shower fragment is defined as a spatially dense subset of voxels of a shower instance such that each voxel is in the Moore neighborhood of at least one other voxel in the fragment, i.e. at least ‘touches’ it diagonally. As the ground-truth fragments (in the field of computer vision, ground truth refers to the data labels, i.e. a predefined target for the reconstruction) are not known a priori, EM voxels are clustered using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm [16] with a distance scale set to 1.9. DBSCAN cannot break ground-truth fragments, by definition, as any two touching voxels are merged by this algorithm. It can, however, merge two or more fragments that belong to separate shower instances into one. Purity is defined as the maximum fraction of voxels in a predicted fragment that belongs to a single instance. In this dataset, fragments reconstructed with DBSCAN contain more than one ground-truth label in 0.2 % of cases; figure 4 shows the purity of these overlapping fragments.

FIG. 4. Distribution of the purity of overlapping shower fragments built using DBSCAN with a distance scale of 1.9. In the top box plot, the blue diamond represents the mean, the orange line the median, the box the IQR and the whiskers span from the 10th to the 90th percentiles.

Fragments strictly smaller than 10 voxels are not included in the input to the clustering task as they have no clear directionality. To characterize the impact of this selection on the shower energy resolution, primary showers that are at least 95 % contained in the image volume are selected. The fraction of the total shower energy deposited in fragments of size 10 and above is represented as a function of shower energy in figure 5. The shower completeness is on average ∼82 % and, as the shower energy increases, the fraction of energy deposited in small fragments decreases slightly. This selection introduces an average relative uncertainty of 4.2 % on the final energy reconstruction.

FIG. 5. Fraction of the energy deposited by a shower in fragments of size 10 voxels and above. The orange markers on the top pad represent the mean and their error bars the RMS; the latter is shown on its own in the bottom pad. The green line is a constant fit, ⟨E_f/E⟩ ≈ 82 %, to the markers above 100 MeV.
Figure 6 shows the fragments constructed upon the shower voxels of an image using the DBSCAN algorithm. The goal of the clustering stage is to assemble these fragments into complete shower objects.
FIG. 6. Image of the EM shower voxels with a color scale that represents the DBSCAN cluster ID.
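As a concrete illustration, the fragment-building step above can be reproduced with an off-the-shelf DBSCAN implementation. The following is a minimal sketch assuming scikit-learn; min_samples=1 is an assumption chosen so that no voxel is discarded as noise, since the paper only specifies the distance scale of 1.9.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def build_fragments(voxels):
        """Group EM voxels into dense shower fragments.

        voxels: (N, 3) array of voxel indices for EM depositions.
        With eps=1.9 in voxel units, any two voxels that touch within
        their Moore neighborhood (diagonal distance sqrt(3) ~ 1.73)
        end up in the same fragment.
        """
        # min_samples=1 (assumption): every EM voxel gets a fragment ID.
        return DBSCAN(eps=1.9, min_samples=1).fit_predict(voxels)

    def drop_small_fragments(voxels, labels, min_size=10):
        """Remove fragments strictly smaller than min_size voxels, as
        done before the clustering stage in the paper."""
        ids, counts = np.unique(labels, return_counts=True)
        keep = np.isin(labels, ids[counts >= min_size])
        return voxels[keep], labels[keep]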
C. Graph Representation
A graph G(V, E) is a collection of nodes V and edges E ⊆ V × V. Each shower fragment represents a node in the graph. There is an arbitrary number of ways to build a graph between the nodes; the optimal choice of input edges is discussed in section III C. Each node is encoded as a vector of F_v features and each edge as a vector of F_e features. Multiple ways of extracting these features are studied and optimized in section III B.

D. Message Passing
Message passing is used to communicate information within a graph [11, 17]. During the information propagation process, at step s + 1, edge attributes are updated by combining the features coming from the nodes an edge connects together with its own through

e_{ij}^{s+1} = \psi_\Theta(x_i^s, x_j^s, e_{ij}^s),   (1)

with x_i, x_j the node feature vectors associated with nodes i and j, respectively, e_{ij} the features of the edge connecting i to j and \psi_\Theta a differentiable function such as a Multi-Layer Perceptron (MLP) [18]. In order to update the node features, the message coming from node j communicated to node i at step s + 1 is defined as

m_{ji}^{s+1} = \phi_\Theta(x_j^s, e_{ji}^{s+1}),   (2)

with \phi_\Theta a differentiable function such as an MLP. The messages coming from the neighborhood N(i) of node i (the set of nodes that share an edge with node i in the input graph) are then aggregated with its own at each step to update its features following

x_i^{s+1} = \chi_\Theta(x_i^s, \square_{j \in N(i)} m_{ji}^{s+1}),   (3)

with \chi_\Theta a differentiable function such as an MLP and \square an aggregation function such as sum, mean or max. The specific implementation of the differentiable functions and the number of message passing steps are studied and optimized in section III D.
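The update rules of equations (1)–(3) can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch implementation with MLPs standing in for ψ, φ and χ and mean aggregation for the □ operator; the feature sizes and single-layer MLPs are simplifying assumptions, not the configuration used in the paper.

    import torch
    import torch.nn as nn

    class MessagePassingStep(nn.Module):
        """One step of eqs. (1)-(3): psi updates edges, phi builds
        messages, chi updates nodes from the mean incoming message."""
        def __init__(self, f_v=64, f_e=64):
            super().__init__()
            self.psi = nn.Sequential(nn.Linear(2 * f_v + f_e, f_e), nn.ReLU())
            self.phi = nn.Sequential(nn.Linear(f_v + f_e, f_v), nn.ReLU())
            self.chi = nn.Sequential(nn.Linear(2 * f_v, f_v), nn.ReLU())

        def forward(self, x, edge_index, e):
            # x: (N, f_v) node features, e: (E, f_e) edge features,
            # edge_index: (2, E) directed source/target node indices.
            src, dst = edge_index
            e = self.psi(torch.cat([x[src], x[dst], e], dim=1))  # eq. (1)
            m = self.phi(torch.cat([x[src], e], dim=1))          # eq. (2)
            # Mean-aggregate the incoming messages per target node.
            agg = torch.zeros_like(x).index_add_(0, dst, m)
            deg = torch.zeros(len(x), 1).index_add_(
                0, dst, torch.ones(len(dst), 1)).clamp(min=1)
            x = self.chi(torch.cat([x, agg / deg], dim=1))       # eq. (3)
            return x, e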
Downstream of the message passing steps, two fullyconnected linear layers reduce the edge and node featuresseparately to two channels each. The outputs are passedthrough the softmax function and the second channel isused to create a vector of N v primary scores for nodes, s v , and a vector of N e adjacency scores for edges, s e . The The neighborhood of node i , N ( i ), is the set of nodes which areadjacent to node i in the input graph, i.e. that share an edgewith node i . binary cross-entropy loss is then applied to node and edgescores alike as follows: L v = − N v (cid:88) i y i ln( s vi ) + (1 − y j ) ln(1 − s vi ) , (4) L e = − N e (cid:88) ( i,j ) ∈ E a ij ln( s eij ) + (1 − a ij ) ln(1 − s eij ) , (5)with y i the primary label of node i and a ij the adjacencylabel of the edge connecting node i to node j . The pri-mary label, y i , is 1 if the fragment initiated the shower, 0otherwise. The definition of the target adjacency matrix, A , which determines the adjacency labels, is discussed insection III E. The total loss is defined as L = L v + L e . F. Inference
F. Inference

The network predicts an edge score matrix, S^e, which tries to replicate the predefined ground-truth adjacency matrix, A. In a graph partition problem, A should be designed such that, if a_{ij} = 1, then nodes i and j belong to the same group. The converse statement does not have to hold, as nodes i and j may not be connected directly as long as they are linked through an indirect path.

Figure 7 schematically illustrates how the score matrix is converted into a node partition prediction. At the inference stage, one has to find the optimal node partition, \hat{g}, such that, if s_{ij}^e is close to 1, nodes i and j are encouraged to be put in the same group. Mathematically, this corresponds to minimizing the partition cross-entropy loss defined as

L(S^e | g) = -\frac{1}{N_e} \sum_{(i,j) \in E} \left[ \delta_{g_i,g_j} \ln(s_{ij}^e) + (1 - \delta_{g_i,g_j}) \ln(1 - s_{ij}^e) \right],   (6)

for g, i.e. \hat{g} = \mathrm{argmin}_g L(S^e | g), with \delta the Kronecker delta. If an edge is not in the input graph, it does not contribute to the grouping optimization loss.

The cardinality of the set of all possible partitions of a set of n nodes, G, corresponds to the Bell number B_n. This number grows quickly with the number of nodes to prohibitively large values that rule out brute-force optimization. Instead, edges are considered sequentially for addition to a predicted adjacency matrix, \hat{A}, in order of decreasing edge score. At each step, the partition is evaluated by running the Union-Find algorithm [19] on the predicted matrix. The edge is permanently added to \hat{A} if the new partition improves the loss defined in equation 6. The optimizer stops when the next available edge has a score below 0.5.

Given the predicted partition of the graph nodes, \hat{g}, the primary nodes are identified by picking those with the highest primary score in each group.
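The greedy edge selection described above can be sketched as follows, with a hand-rolled Union-Find in place of a library implementation; the small tolerance constant is an assumption added for numerical safety.

    import numpy as np

    def find(parent, i):
        """Union-Find root lookup with path halving."""
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def partition_loss(scores, edges, group):
        """Partition cross-entropy of equation (6) for a node grouping."""
        same = group[edges[:, 0]] == group[edges[:, 1]]
        eps = 1e-9  # numerical guard, not in the paper
        return -np.mean(np.where(same, np.log(scores + eps),
                                 np.log(1.0 - scores + eps)))

    def greedy_partition(n_nodes, edges, scores):
        """Insert edges in order of decreasing score; keep an edge only
        if merging the two groups lowers the partition loss. Stop when
        the next edge score drops below 0.5."""
        parent = np.arange(n_nodes)
        group = parent.copy()
        best = partition_loss(scores, edges, group)
        for k in np.argsort(-scores):
            if scores[k] < 0.5:
                break
            i, j = edges[k]
            ri, rj = find(parent, i), find(parent, j)
            if ri == rj:
                continue  # already in the same group
            trial = parent.copy()
            trial[rj] = ri
            new_group = np.array([find(trial, v) for v in range(n_nodes)])
            new_loss = partition_loss(scores, edges, new_group)
            if new_loss < best:
                parent, group, best = trial, new_group, new_loss
        return group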
G. Metrics

Three clustering metrics are used to systematically characterize the performance of the clustering algorithms in this paper: efficiency, purity and adjusted Rand index (ARI) [20]. The efficiency and purity are defined as:

\mathrm{Efficiency} = \frac{1}{N} \sum_{i=1}^{N_t} \max_j |c_j \cap t_i|,   (7)

\mathrm{Purity} = \frac{1}{N} \sum_{i=1}^{N_p} \max_j |c_i \cap t_j|,   (8)

with N_t the true number of showers, N_p the predicted number of showers, N the total number of voxels in the image, t_k the k-th true cluster and c_k the k-th predicted cluster. The Rand index (RI) is defined as the accuracy of the binary edge classification between any pair of voxels; the ARI adjusts for random chance by shifting this measure with respect to the average RI obtained for all possible permutations of the predicted labels, i.e.

\mathrm{ARI} = \frac{\mathrm{RI} - E(\mathrm{RI})}{\max(\mathrm{RI}) - E(\mathrm{RI})},   (9)

with E the expectation value. Note that if one of the partitions contains a single cluster, a single voxel mistake yields an ARI of 0, as permutations do not affect the RI.
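For reference, the three metrics can be computed per image as in the sketch below; the efficiency and purity follow equations (7) and (8) directly, and the ARI is delegated to scikit-learn's adjusted_rand_score.

    import numpy as np
    from sklearn.metrics import adjusted_rand_score

    def clustering_metrics(true_labels, pred_labels):
        """Voxel-wise efficiency (eq. 7), purity (eq. 8) and ARI (eq. 9)
        for one image; the inputs are per-voxel cluster IDs."""
        n = len(true_labels)
        eff = sum(max(np.sum((true_labels == t) & (pred_labels == c))
                      for c in np.unique(pred_labels))
                  for t in np.unique(true_labels)) / n
        pur = sum(max(np.sum((true_labels == t) & (pred_labels == c))
                      for t in np.unique(true_labels))
                  for c in np.unique(pred_labels)) / n
        ari = adjusted_rand_score(true_labels, pred_labels)
        return eff, pur, ari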
III. OPTIMIZATION

A. Training regimen

In this section, the reconstruction steps are optimized to maximize the clustering accuracy in a model that uses ground-truth shower fragments as an input and does not attempt to predict shower primaries. Variations are studied with respect to a baseline model in which:

• the input node and edge features are geometric;
• the input graph is a complete graph;
• the number of message passing steps is 3;
• the ground-truth adjacency matrix corresponds to a cluster graph built upon groups;
• the batch size is 128;
• the Adam optimizer [21] is used with a learning rate of 0.0025.

The reconstruction chain is trained for 25 epochs of the training set for each configuration under study. The edge classification accuracy and cross-entropy loss are cross-validated with 10000 events from the validation set at regular intervals.

FIG. 7. Schematics of the edge selection mechanism at the inference stage. The partition loss defined in equation 6 is first calculated for an empty graph in which each node forms its own group. Edges are sequentially added in order of decreasing score only if the new partition they form decreases the partition loss. The edge with score 0.6 is not added to the graph because it would put the nodes connected by the edge with score 0.1 in the same group and increase the partition loss.
B. Feature extraction
Each shower fragment has to be encoded into a set of node features and each edge in the input graph can be provided with a set of features of its own. Two methods of feature extraction have been considered in the context of this paper and are presented and compared in this subsection: geometric and CNN-based.

Geometric node features are a list of summary statistics of the distribution of fragment voxels in Euclidean space. They include the following 22 features:

• normalized covariance matrix (9 features);
• normalized principal axis (3 features);
• centroid (3 features);
• number of voxels (1 feature);
• initial point (3 features);
• normalized initial direction (3 features).

In this study, the initial point of a fragment is acquired by picking the center of the voxel, v_s, closest to the true simulated particle's first energy deposition. In a realistic setting, it will be reconstructed using the Point Proposal Network (PPN) [22]. Figure 8 shows the distribution of the distance between the point with the highest PPN score in a given fragment and the true initial fragment point. The point is closer than 3 voxels from the true point in 96.3 % of cases.

FIG. 8. Distance between the point with the highest PPN score within a fragment and the true initial point of the fragment. In the top box plot, the blue diamond represents the mean, the orange line the median, the box the IQR and the whiskers span from the 10th to the 90th percentiles.

The initial direction, \hat{d}, is estimated by calculating the normalized mean direction from the initial point, v_s, to all the other fragment voxels within a neighborhood distance R_n of the initial point, i.e.

\hat{d} = \langle v \rangle / |\langle v \rangle|,  \langle v \rangle = \frac{1}{N_n} \sum_{\{i \,|\, d_{si} \le R_n\}} (v_i - v_s),   (10)

with d_{si} the Euclidean distance between v_i and v_s and N_n = |\{i \,|\, d_{si} \le R_n\}| the number of voxels in the neighborhood. The radius was optimized to R_n = 5 by minimizing the spread of the angle between this direction estimate and the true normalized particle momentum, d = p/|p|, as shown for multiple radial cut values in figure 9. In the following, the importance of the initial point and direction for the clustering task is studied by training the model without them (referred to as NI).

FIG. 9. Boxplot of the angle between the reconstructed direction of a shower fragment, \hat{d}, and the normalized true particle momentum, d, as a function of the neighborhood cut, R_n. The blue diamonds represent the means, the orange lines the medians, the boxes the IQRs and the whiskers extend at most 150 % of the IQR on either side of the box.

The geometric edge features include 19 components:

• closest points of approach (CPAs) (6 features);
• displacement between the CPAs (3 features);
• outer product of the displacement (9 features);
• length of the displacement (1 feature).

The CNN feature extractor treats each fragment as an individual node image and each pair of fragments as an edge image. The images are passed through a sparse Convolutional Neural Network (CNN) which consists of alternating ResNet blocks [23] and strided convolutions that progressively reduce the spatial size of each image while increasing the number of features in each channel. In this study, the kernel size is set to 5, the number of strided convolutions to 8 and the input number of filters to 32. The features of the most spatially compressed (3³ voxels) image are average pooled to form a vector of 64 features per node and 64 features per edge.

Figure 10 shows the training and validation curves for each type of feature extractor, produced using the training and validation datasets, respectively. Removing the initial point and initial direction (NI) from the fragments measurably reduces the edge prediction accuracy.

FIG. 10. Edge score loss and edge prediction accuracy for the different encoders under consideration. The training curves are represented as lines and the validation points as round markers with statistical error bars.
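A minimal sketch of the geometric node feature extraction described above; the exact normalization conventions (trace-normalized covariance, eigenvector ordering) are assumptions, as the paper only lists the feature names.

    import numpy as np

    def geometric_node_features(voxels, start_point, r_n=5.0):
        """22 geometric node features of a fragment.
        voxels: (N, 3) fragment voxel positions; start_point: (3,)
        initial point (true in this study, PPN-predicted otherwise)."""
        centroid = voxels.mean(axis=0)
        centered = voxels - centroid
        cov = centered.T @ centered / len(voxels)
        cov = cov / max(np.trace(cov), 1e-9)    # normalized covariance
        _, eigvecs = np.linalg.eigh(cov)
        principal = eigvecs[:, -1]              # normalized principal axis
        # Eq. (10): mean direction to the neighbors within radius r_n.
        d = np.linalg.norm(voxels - start_point, axis=1)
        nbr = (voxels - start_point)[d <= r_n]
        mean_dir = nbr.mean(axis=0)
        mean_dir = mean_dir / max(np.linalg.norm(mean_dir), 1e-9)
        return np.concatenate([cov.ravel(),     # 9 features
                               principal,       # 3 features
                               centroid,        # 3 features
                               [len(voxels)],   # 1 feature
                               start_point,     # 3 features
                               mean_dir])       # 3 features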
C. Input graph
The input set of fragments is partitioned into groups based on an adjacency score matrix. An edge is given a score only if it appears in the input graph. Several graph construction methods were studied to find the optimal receptive field for the nodes (a construction sketch is given below):

• complete graph (all possible pairwise edges);
• Delaunay graph (edges in the spatial Delaunay triangulation of the input voxels only);
• MST graph (edges in the spatial minimum spanning tree of the input voxels only);
• kNN graph (edges from each node to its k nearest neighbors, with k = 5, labeled 5NN).

The graphs are directed: if an edge exists from node i to node j, its reciprocal path exists as well. The network is trained to activate both reciprocal paths if two nodes belong to the same group.

Restricting the number of edges in the input graph can potentially simplify the clustering task by allowing the nodes to focus only on messages coming from nodes that are adjacent in the input graph. To be suitable, an input graph must include at least one essential path from each node to at least one other node that belongs to the same group, without passing through a node that does not. Figure 11 shows the fraction of essential edges that appear in the input graph, i.e. the fraction of nodes that are reachable through an essential path. It shows that the MST graph is the most inefficient proposed network, on average missing a prohibitively large fraction of the essential edges.
FIG. 11. Fraction of the edges necessary to make perfect clustering predictions that appear in an image for the different input graphs under study. In the top box plot, the blue diamonds represent the means, the orange lines the medians and the whiskers span from the 10th to the 90th percentiles.

Figure 12 shows the training and validation curves for the aforementioned input graph structures. Figure 13 shows the adjusted Rand index clustering metric on the test set for each configuration. The complete graph, which is the one that includes all possible message passing routes, performs best. Delaunay graphs perform similarly, at a much greater computational cost, while the other input graphs fail to yield a similar precision. This demonstrates the ability of the network to prioritize messages purely based on the features that it is provided with.
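The four graph constructions compared above can be sketched with standard SciPy and scikit-learn utilities. The sketch below builds undirected candidate edges between fragment centroids; using centroids rather than the full voxel sets is a simplification of the paper's construction.

    import numpy as np
    from scipy.spatial import Delaunay, distance_matrix
    from scipy.sparse.csgraph import minimum_spanning_tree
    from sklearn.neighbors import kneighbors_graph

    def input_edges(centroids, mode="complete", k=5):
        """Candidate edge list ((E, 2) array of node index pairs) for
        the graph constructions compared above."""
        n = len(centroids)
        if mode == "complete":
            return np.array([(i, j) for i in range(n) for j in range(i + 1, n)])
        if mode == "delaunay":
            # 3D Delaunay simplices are tetrahedra; collect their edges.
            tri = Delaunay(centroids)
            edges = {tuple(sorted((s[a], s[b])))
                     for s in tri.simplices
                     for a in range(4) for b in range(a + 1, 4)}
            return np.array(sorted(edges))
        if mode == "mst":
            mst = minimum_spanning_tree(distance_matrix(centroids, centroids))
            return np.array(mst.nonzero()).T
        if mode == "knn":
            return np.array(kneighbors_graph(centroids, k).nonzero()).T
        raise ValueError(mode)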
FIG. 12. Edge score loss and edge prediction accuracy for the different input graphs under consideration. The training curves are represented as lines and the validation points as round markers with statistical error bars.

FIG. 13. Adjusted Rand index (ARI) distributions on the test dataset for the four input graphs under consideration. In the top box plot, the blue diamonds represent the means, the orange lines the medians, the boxes the IQRs and the whiskers span from the 10th to the 90th percentiles.

D. Message passing
At a message passing step s + 1, the edge features are updated by an edge updater which maps the 2F_v^s + F_e^s features coming from the nodes an edge connects and its own features to F_e^{s+1} features, with F_v^s the number of node features at step s and F_e^s the number of edge features at step s. In this study, regardless of the step number, F_e^{s+1} is set to 64. The edge updater MLP consists of three linear layers, each preceded by a 1D batch normalization layer and followed by a LeakyReLU layer of leakiness 0.1. The first linear layer brings the number of features to 64, the other two maintain that number.

The nodes are updated from F_v^s features to F_v^{s+1} features by a function of the neighboring nodes and the edge attributes that connect them, as summarized by equations 2 and 3. Five PyTorch Geometric [24] layers were studied in the context of this paper: MetaLayer, NNConv, EdgeConv, GATConv and AGNNConv.

The MetaLayer [17] uses two successive MLPs to combine the edge features, the node features and their neighbor features. The first MLP combines the F_v^s source node features together with the F_e^{s+1} edge features using three linear layers, each preceded by a 1D batch normalization and followed by a LeakyReLU layer of leakiness 0.1. This produces one message per edge e_{ij}, m_{ij}^{s+1}, each containing F_v^{s+1} features. The second MLP combines the F_v^s target node features with the averaged F_v^{s+1} message features using the same architecture as the first MLP. At each message passing step, F_v^{s+1} is set to 64.

The NNConv [25] layer is defined as

x_i^{s+1} = \Theta x_i^s + \sum_{j \in N(i)} x_j^s \cdot h_\Theta(e_{j,i}^{s+1}),   (11)

with \Theta an F_v^{s+1} × F_v^s matrix of weights and h_\Theta an MLP that maps the F_e^{s+1} edge features to an F_v^{s+1} × F_v^s matrix. In this study, the MLP is composed of three layers of a batch normalization, a linear layer and a LeakyReLU function of leakiness 0.1. The first layer increases the number of features from F_e to F_v^{s+1} × F_v^s and the following two keep it constant.

The EdgeConv [26] layer is defined as

x_i^{s+1} = \sum_{j \in N(i)} h_\Theta(x_i^s \,||\, x_j^s - x_i^s),   (12)

with || the concatenation operator and h_\Theta an MLP that maps the 2F_v^s concatenated features to F_v^{s+1} features. The MLP uses an implementation identical to that of the NNConv layer.

The GATConv [27] layer uses the concept of attention:

x_i^{s+1} = \alpha_{ii} \Theta x_i^s + \sum_{j \in N(i)} \alpha_{ij} \Theta x_j^s,
\alpha_{ij}^s = \frac{\exp(\mathrm{LR}(a^T [\Theta x_i^s \,||\, \Theta x_j^s]))}{\sum_{j \in N(i) \cup \{i\}} \exp(\mathrm{LR}(a^T [\Theta x_i^s \,||\, \Theta x_j^s]))},   (13)

with LR = LeakyReLU and a the attention weight vector of size 2F_v^s. The weight of the message coming from each node in the neighborhood of i is explicitly learned as a function of the node features.

The AGNNConv [28] node updater uses a slightly different attention mechanism:

X^{s+1} = P^s X^s,
P_{ij}^s = \frac{\exp(\beta \cos(x_i^s, x_j^s))}{\sum_{j \in N(i) \cup \{i\}} \exp(\beta \cos(x_i^s, x_j^s))},   (14)

with \beta a learnable parameter. Note that for both of the attention-based layers and the EdgeConv layer, the node features do not take the edge features into account explicitly.

Figure 14 shows the training and validation curves for the functions under study and for three iterations of message passing. The MetaLayer node updater performs best; NNConv is comparable but overtrains faster at a large number of epochs. For this task, the added complexity through the use of an MLP is evidently useful. All layers that explicitly include the edge features, or the difference between node features as a substitute in the EdgeConv function, perform similarly. Figure 15 shows the training and validation curves using the MetaLayer node updater with a number of message passing steps varying between 1 and 5. Adding more than three message passing steps does not measurably improve the edge classification accuracy.

FIG. 14. Edge score loss and edge prediction accuracy for the different node updater architectures. The training curves are represented as lines and the validation points as round markers with statistical error bars.
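As an illustration, the MetaLayer-style update can be assembled with PyTorch Geometric as below. The MLPs here are single-layer for brevity (the paper uses three layers per MLP) and the messages are sum-aggregated rather than averaged; both are simplifying assumptions.

    import torch
    import torch.nn as nn
    from torch_geometric.nn import MetaLayer

    class EdgeModel(nn.Module):
        """Edge updater: concatenates source, target and edge features."""
        def __init__(self, f_v=64, f_e=64):
            super().__init__()
            self.mlp = nn.Sequential(nn.BatchNorm1d(2 * f_v + f_e),
                                     nn.Linear(2 * f_v + f_e, f_e),
                                     nn.LeakyReLU(0.1))
        def forward(self, src, dst, edge_attr, u, batch):
            return self.mlp(torch.cat([src, dst, edge_attr], dim=1))

    class NodeModel(nn.Module):
        """Node updater: combines each node with its incoming messages."""
        def __init__(self, f_v=64, f_e=64):
            super().__init__()
            self.msg = nn.Sequential(nn.Linear(f_v + f_e, f_v), nn.LeakyReLU(0.1))
            self.upd = nn.Sequential(nn.Linear(2 * f_v, f_v), nn.LeakyReLU(0.1))
        def forward(self, x, edge_index, edge_attr, u, batch):
            src, dst = edge_index
            m = self.msg(torch.cat([x[src], edge_attr], dim=1))
            agg = torch.zeros_like(x).index_add_(0, dst, m)  # sum aggregation
            return self.upd(torch.cat([x, agg], dim=1))

    layer = MetaLayer(EdgeModel(), NodeModel())
    # Usage: x, e, _ = layer(x, edge_index, e)
    # with x: (N, 64) nodes, edge_index: (2, E), e: (E, 64) edges.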
E. Ground-truth
The ground-truth adjacency matrix may be defined in different ways. The only requirement is that it forms at least a tree within each group of nodes, so that the true partition can be predicted on an edge basis. One possibility is to set the value of edges that connect two nodes within the same group to 1 and all others to 0. This encourages the network to build a cluster graph, i.e. a disjoint union of complete graphs.

The second possibility uses the score predictions to define a ground-truth. A tree of n − 1 edges is built within each true group of n nodes so as to maximize the sum of adjacency scores of the tree edges. The CE loss is then only applied to those edges in the trees and those separating different node groups. This allows the network freedom as to which edges to turn on, as long as it builds a forest, i.e. a disjoint union of trees. In the following discussion, the first ground-truth is referred to as the cluster target and the second as the forest target; a sketch of both constructions follows figure 16.

FIG. 15. Edge score loss and edge prediction accuracy for the different numbers of message passing steps. The training curves are represented as lines and the validation points as round markers with statistical error bars.

Figure 16 shows the training edge accuracy and edge loss for the two ground-truths defined above. Figure 17 shows their adjusted Rand index (ARI) [20] distributions on the test set. Both targets yield very similar results with a small lead for the baseline cluster target. Note that the forest target does not require the post-processing described in section II F, as not all edges connecting nodes within a group are trained to be activated. This also means that a single edge incorrectly turned on in a forest introduces mistakes at the inference stage.

FIG. 16. Edge score loss and edge prediction accuracy for the different ground-truths under consideration. The training curves are represented as lines and the validation points as round markers with statistical error bars.
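A sketch of the two target definitions, assuming the edges are given as an (E, 2) integer array: the cluster target labels every intra-group edge, while the forest target keeps only a maximum-score spanning tree within each group, computed here by running SciPy's minimum spanning tree on 1 − score weights (an implementation choice, not necessarily the paper's).

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import minimum_spanning_tree

    def cluster_target(edges, groups):
        """Cluster target: label 1 iff both endpoints share a group
        (the target is a disjoint union of complete graphs)."""
        return (groups[edges[:, 0]] == groups[edges[:, 1]]).astype(int)

    def forest_target(edges, groups, scores, n_nodes):
        """Forest target: label only the maximum-score spanning tree
        edges within each group. Minimizing weight 1 - score is
        equivalent to maximizing the summed adjacency scores."""
        labels = np.zeros(len(edges), dtype=int)
        same = np.flatnonzero(groups[edges[:, 0]] == groups[edges[:, 1]])
        w = np.zeros((n_nodes, n_nodes))
        for k in same:
            i, j = edges[k]
            w[i, j] = w[j, i] = 1.0 - scores[k] + 1e-6  # keep weights > 0
        # Only intra-group entries are set, so the MST is a forest with
        # one tree per true group.
        tree = minimum_spanning_tree(csr_matrix(w)).toarray()
        for k in same:
            i, j = edges[k]
            labels[k] = int(tree[i, j] != 0 or tree[j, i] != 0)
        return labels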
FIG. 17. Adjusted Rand index (ARI) distributions for the different ground-truths under consideration. In the top box plot, the blue diamonds represent the means, the orange lines the medians and the whiskers span from the 10th to the 90th percentiles.

IV. RESULTS

A. Training
The baseline model is trained for 25 epochs on the edge and node prediction tasks using the DBSCAN-formed shower fragments and achieves the loss and accuracy summarized in figure 18. The edge classification accuracy is affected neither by the addition of the node classification task nor by the use of DBSCAN. The primary classification accuracy is close to 1 when initial points are known.
FIG. 18. Cross-entropy loss and classification accuracy of edges and nodes for the full shower reconstruction model, with and without initial points. The training curves are represented as lines and the validation points as round markers with statistical error bars.
B. Shower grouping
Figure 31 shows example outputs of the shower reconstruction algorithm for the four events that contain the most fragments in the test set. All four events have a high clustering accuracy, only missing or merging small fragments incorrectly. The top event highlights the importance of the partition loss optimization at the inference stage: some edges connect two separate showers together, but the loss minimization allows for the recovery of an almost perfect partition. The bottom left group of the fourth event originates from a single shower whose primary is outside of the image volume. The network correctly associated its fragments together but would not be penalized for making a mistake there. Also note that there is no primary identification mistake in these examples.

Figure 19 shows the clustering metrics associated with the baseline model applied to the test set. The rare cases with ARI = 0 are discussed in section IV F; an event with a single true or predicted shower can have an ARI of 0 while being mostly accurate. Figure 20 compares the predicted and true shower counts when only shower instances with over 100 voxels (∼60 MeV) are included. In 95.03 % of events, the estimated shower count is exact.
FIG. 19. Distributions of the three clustering metrics for the shower clustering task. In the top box plot, the blue diamonds represent the means, the orange lines the medians and the whiskers span from the 10th to the 90th percentiles.

C. Primary identification
Figure 21 shows the distributions of primary scores for ground-truth primary and secondary nodes in the test set. This study shows that 0.08 % of secondary fragments have a score larger than 0.5.

FIG. 20. Number of predicted showers as a function of the number of true showers in an event. Only shower instances with over 100 voxels are included.

Primaries are identified within each predicted group by selecting the node with the highest primary score. This scheme yields a group-wise primary identification accuracy of 99.77 % for showers consisting of two or more fragments. This task is relatively trivial given an understanding of the direction of travel of the shower. The prior knowledge of fragment initial points helps, but even without them, the primary identification accuracy is still 98.94 %.
FIG. 21. Fragment primary scores of ground-truth primary nodes and ground-truth secondary nodes. In the top box plot, the blue diamonds represent the mean scores and the orange lines the median scores.
D. Shower energy resolution
For each ground-truth shower instance in the dataset, the total amount of energy that it deposits inside the image volume is integrated, including the fragments smaller than 10 voxels that are not in the input to the reconstruction chain, to form the ground-truth shower energy, E. For each true shower, the reconstructed cluster with the highest overlap is selected and the energy of its voxels is summed to form an energy estimate, \hat{E}. This estimate is multiplied by a fudge factor of 1.211 to compensate for the energy lost in small fragments measured in figure 5. In order to assess the importance of the calorimetric information on the shower energy resolution, the energy is also estimated using the voxel count alone divided by a factor of 1.69 voxels/MeV, obtained by fitting the relation between the true number of EM voxels and the energy deposition, as shown in figure 22.

FIG. 22. Number of shower voxels contained in shower fragments of size 10 and above as a function of the total energy deposited by the shower. The fitted relation is N ≈ 1.69 E (MeV).
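The two energy estimators then reduce to simple arithmetic on the matched cluster; the helper names below are hypothetical, and applying no 1.211 correction to the count-based estimate is an assumption based on the figure 22 caption (the fit already maps the selected-fragment voxel count to the total deposited energy).

    def shower_energy_calorimetric(voxel_values_mev):
        """Calorimetric estimate: summed voxel energies of the matched
        cluster, scaled by 1.211 to compensate for the energy lost in
        fragments below 10 voxels (figure 5)."""
        return 1.211 * sum(voxel_values_mev)

    def shower_energy_from_count(n_voxels):
        """Count-based estimate: the fit of figure 22 relates the voxel
        count of size-10+ fragments to the total deposited energy."""
        return n_voxels / 1.69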
Figure 23 shows a distribution of the relative shower energy residuals for the showers that are at least 95 % contained inside the images of the test dataset. The residuals are provided with and without leveraging the calorimetric information. These results show that the uncertainty on the shower energy is mostly driven by the prior fragment size selection. Figure 24 shows the 1σ energy resolution as a function of the shower energy. The uncertainty decreases as the shower energy increases and reaches an accuracy around 5 % at 1 GeV.

FIG. 23. Reconstructed relative shower energy residual distribution. In the top box plot, the blue diamonds represent the means, the orange lines the medians, the boxes the IQRs and the whiskers span from the 10th to the 90th percentiles. ‘True’ uses the true energy of true clusters, ‘Calorimetry’ the true energy depositions of reconstructed clusters and ‘Count’ a constant factor applied to the reconstructed voxel count.

FIG. 24. Reconstructed relative shower energy residual distribution as a function of the shower energy. The orange markers on the top pad represent the means and their error bars the RMS; the latter is shown on its own in the bottom pad, along with a fit of the form σ_E/E = 0.041 + b/√(E (GeV)).

E. Shower angular resolution
For each ground-truth shower in the dataset, its true direction is obtained by normalizing its primary momentum vector to its norm, d = p/|p|. In practice, the fitted direction, \hat{d}, is estimated by taking the mean direction from the predicted primary initial point to the primary points within a neighborhood radius R_n, as shown in equation 10. Setting R_n to a constant is suboptimal because the geometry of primary fragments can vary significantly from one shower to another. Smaller radii are preferable for fragments that curve or branch out a lot, while larger radii are advantageous for mostly linear showers.

The radius is optimized in order to minimize the relative deviation of the points from a straight line. The primary fragment points are ordered from closest to farthest from the initial point, \{v_i\}_{i=1}^n. The mean, \bar{v}_k, and covariance matrix, \Sigma_k, are first evaluated for the closest three points (as the covariance matrix is undefined for fewer) and updated iteratively for k = 4, ..., n following

\bar{v}_k = \frac{1}{k} \left( (k-1)\,\bar{v}_{k-1} + v_k \right),   (15)

\Sigma_k = \frac{k-1}{k} \Sigma_{k-1} + \frac{1}{k-1} (v_k - \bar{v}_k)(v_k - \bar{v}_k)^T.   (16)

For each combination of points, k, the ordered eigenvalues of the covariance matrix, \{\lambda_{k,i}\}_{i=1}^3, are evaluated. The optimal neighborhood of points minimizes the spread around the principal axis, i.e.

k^* = \mathrm{argmin}_k \frac{\lambda_{k,1} + \lambda_{k,2}}{\lambda_{k,3}}.   (17)

Figure 25 shows the angular distribution between the true direction and the estimate, θ = arccos(\hat{d} · d), for all the true showers in the dataset. The residual angle distribution has a mean of ∼6° and a median of ∼3° when using the optimized neighborhood, R_n^*. The distributions for fixed neighborhood radii perform significantly worse. A radius of 5, while on average optimal for all fragments as shown in figure 9, carries a large uncertainty when used for primary fragments. Figure 26 shows the angular resolution as a function of the shower energy. The resolution improves significantly with energy.

FIG. 25. Reconstructed shower direction residual distribution. In the top box plot, the blue diamonds represent the means, the orange lines the medians, the boxes the IQRs and the whiskers span from the 10th to the 90th percentiles.

FIG. 26. Reconstructed shower direction residual distribution as a function of the shower energy. The orange markers on the top pad represent the means and their error bars the RMS; the former is shown on its own in the bottom pad.
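A compact sketch of the adaptive direction estimate of equations (15)–(17); for clarity the covariance is recomputed from scratch at each k with NumPy instead of using the incremental updates.

    import numpy as np

    def optimal_direction(points, start):
        """Direction estimate with the adaptive neighborhood of
        equations (15)-(17). points: (N, 3) primary fragment voxels
        (N >= 4 assumed); start: (3,) initial point."""
        order = np.argsort(np.linalg.norm(points - start, axis=1))
        v = points[order]  # closest to farthest from the initial point
        best_k, best_spread = None, np.inf
        for k in range(4, len(v) + 1):
            lam = np.linalg.eigvalsh(np.cov(v[:k].T))  # ascending order
            spread = (lam[0] + lam[1]) / max(lam[2], 1e-12)  # eq. (17)
            if spread < best_spread:
                best_k, best_spread = k, spread
        # Eq. (10) applied to the optimal neighborhood of points.
        mean_dir = (v[:best_k] - start).mean(axis=0)
        return mean_dir / max(np.linalg.norm(mean_dir), 1e-12)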
F. Mistakes analysis

A study of the events with low clustering purity reveals that the algorithm occasionally merges showers when the direction vector of one of its fragments can clearly be back-propagated to another shower fragment from a distinct instance. This may stem from an inconsistency between the true photon momentum and the local direction estimate of the fragment. The top row of figure 32 shows an event with a purity of 0.51 in the test set; events with such a low purity are rare. The second row shows an event with an efficiency of 0.54 in the test set; such low-efficiency events are equally rare. As noted in section II G, an event may have an ARI of 0 but be mostly accurate if the number of true or predicted showers is one.

There is a total of 119 showers in the whole test set that have a misidentified primary. The majority of those mistakes stem from an incorrect partition of the nodes. The remaining mistakes can be attributed to ambiguous shower starts that do not have a fragment clearly upstream of the others. An example is provided in figure 27. The network picks the leftmost fragment but the one directly to its right is the labeled shower start. The network shows its uncertainty by giving scores of 0.76 and 0.13 to the left and right fragments, respectively.
FIG. 27. Primary scores for an ambiguous shower start, with a color scale that ranges from 0 (black) to 1 (bright yellow).
V. INTERACTION CLUSTERING

A. Modifications
Interaction clustering is defined as the association of particle instances into groups that share a common particle ancestor. In this study, the individual particle instances are assumed to be known a priori from the previous reconstruction steps. In the full reconstruction chain, the shower instances will be provided by the reconstruction algorithm described in the previous sections, while tracks, Michel electrons and delta rays are to be clustered by a separate algorithm such as DBSCAN [16] or a CNN-based dense clustering algorithm [29].

Images simulated for this dataset only contain a single interaction vertex and multiple stray tracks and showers. In order to teach the network to separate multiple interaction vertices, several of these images may be stacked together. At the training stage, the number of images that are stacked together before being fed to the network follows a Poisson distribution of mean 2. Figure 28 shows an example of two stacked images and their particle labels.
FIG. 28. Particle labels of a superimposition of two events inthe test set.
This task utilizes a reconstruction chain identical to that used for the shower clustering. The input to the chain consists of particle instances instead of shower fragments and the target is interaction instances. The edge features are identically defined, while the node features are extended by adding the number-encoded particle class (0–3 corresponding to shower, track, Michel and delta rays, respectively), the mean and RMS energy depositions and the end point of tracks (other classes do not have well-defined end points and are given a duplicate of the initial point instead); a sketch of this extension is given after figure 29.

Downstream of the message passing stage, the updated node features are not explicitly used to make any prediction. The edge features are the basis for an adjacency matrix prediction, while the groups are extracted by using the method described in section II F. Figure 29 shows the training and validation edge classification loss and accuracy for the interaction clustering task. These metrics are evaluated with and without the end point information; adding the end points to the particle instances measurably increases the edge classification accuracy.

FIG. 29. Edge score loss and edge prediction accuracy for the different interaction clustering models. The training curves are represented as lines and the validation points as round markers with statistical error bars.
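A minimal sketch of the node feature extension for interaction clustering; the feature ordering and the use of the standard deviation for the RMS energy deposition are assumptions.

    import numpy as np

    def interaction_node_features(geo_features, particle_class,
                                  voxel_values, start_point, end_point=None):
        """Extend the geometric node features with the number-encoded
        class (0-3), the mean and RMS energy deposition, and a track
        end point (other classes duplicate the start point)."""
        values = np.asarray(voxel_values)
        if end_point is None:  # non-track particles have no end point
            end_point = start_point
        return np.concatenate([geo_features,
                               [particle_class],
                               [values.mean(), values.std()],
                               end_point])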
B. Performance
Figure 33 shows the output of the interaction clustering algorithm for four randomly selected events in the test set, with one to four interaction vertices stacked together. All four events have a high clustering accuracy, only missing or merging small particles incorrectly.

The metrics described in section II G are used to systematically characterize the performance of the reconstruction chain applied to the interaction clustering task. Figure 30 shows the clustering performance as a function of the number of images that are superimposed.

As shown in figure 3, each image contains one neutrino-like interaction overlaid with several randomly located particles. Given the image volume of ∼12 m³, this corresponds to an interaction density of O(1) interactions/m³, which increases linearly with the number of superimposed images. The density observed in a single image, for instance, is equivalent to ∼120 interactions in a single ICARUS image, far above the expected rate [5]. A stack of two images contains two neutrino-like interactions, which corresponds to ∼18 such interactions in the DUNE-ND volume, close to the maximum expected rate [9], overlaid with ∼30 cosmic rays, which is unrealistically large. This demonstrates that this algorithm should easily deal with the expected rate of interactions in the foreseeable future.
FIG. 30. Interaction clustering metrics as a function of the number of interactions in the image. The diamonds represent the means, the lines the medians, the boxes the IQRs and the whiskers span from the 10th to the 90th percentiles.

C. Mistakes analysis
Figure 34 shows the three events that are reconstructed with the lowest purity, efficiency and ARI on the first three rows, respectively, and an event with an ARI of 0. The first event exhibits a purity of 56.1 % due to a cosmic muon crossing and overlapping one of the vertex tracks. Events with similarly low purity or efficiency represent only a small fraction of the test set.

VI. CONCLUSION
Graph Neural Networks (GNNs) are an ideally suited method to tackle the clustering of spatially detached objects in Liquid Argon Time Projection Chambers (LArTPCs). A GNN-based reconstruction chain was developed to cluster electromagnetic showers and particle interactions. This paper studied its performance on a generic 3D sample of particle interactions in liquid argon and demonstrated a clustering efficiency and purity well above 99 % for both tasks. A good shower energy resolution is a core requirement for the upcoming SBN program and DUNE experiment to reach their scientific goals. The reconstruction of the shower direction will be of central importance when matching neutral pion decay showers together or when back-propagating photons to a vertex. The clustering of particles into interactions will become essential for future high-rate LArTPCs, and the GNN algorithm developed here shows that this can be achieved. The algorithm described in this paper will be part of an end-to-end, machine-learning-based reconstruction chain developed at SLAC for all LArTPCs.
VII. ACKNOWLEDGEMENT
This work is supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics, and Early Career Research Program under Contract DE-AC02-76SF00515.

[1] B. Baller et al., J. Instrum., T05005 (2014).
[2] C. Rubbia, CERN-EP-INT-77-08 (1977).
[3] R. Acciarri et al. (ArgoNeuT Collaboration), Phys. Rev. D, 072005 (2017).
[4] R. Acciarri et al. (ArgoNeuT Collaboration), Phys. Rev. D, 012002 (2019).
[5] M. Antonello et al. (MicroBooNE, LAr1-ND, ICARUS-WA104 Collaborations), (2015), arXiv:1503.01520 [physics.ins-det].
[6] A. A. Aguilar-Arevalo et al. (MiniBooNE Collaboration), Phys. Rev. Lett., 221801 (2018).
[7] R. Acciarri et al. (MicroBooNE), J. Instrum., P02017 (2017).
[8] S. Amerio et al. (ICARUS), Nucl. Instrum. Methods Phys. Res. A, 329 (2004).
[9] R. Acciarri et al. (DUNE), (2016), arXiv:1601.02984 [physics.ins-det].
[10] C. Adams et al., J. Instrum., P02007 (2020).
[11] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and M. Sun, CoRR abs/1812.08434 (2018), arXiv:1812.08434.
[12] V. V. Sochat, C. J. Prybol, and G. M. Kurtzer, PLoS ONE (2017).
[13] C. Adams, K. Terao, and T. Wongjirad, “PILArNet: Public dataset for particle imaging liquid argon detectors in high energy physics,” (2020), arXiv:2006.01993 [physics.ins-det].
[14] L. Dominé and K. Terao (DeepLearnPhysics Collaboration), Phys. Rev. D, 012005 (2020).
[15] C. Adams et al. (MicroBooNE Collaboration), Phys. Rev. D, 092001 (2019).
[16] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96 (AAAI Press, 1996) pp. 226–231.
[17] P. W. Battaglia et al., CoRR abs/1806.01261 (2018), arXiv:1806.01261.
[18] C. M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, Inc., USA, 1995).
[19] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed. (The MIT Press, 2009).
[20] W. M. Rand, Journal of the American Statistical Association, 846 (1971).
[21] D. P. Kingma and J. Ba, CoRR abs/1412.6980 (2015).
[22] L. Dominé et al., (2020), arXiv:2006.14745 [hep-ex].
[23] K. He, X. Zhang, S. Ren, and J. Sun, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) pp. 770–778.
[24] M. Fey and J. E. Lenssen, in ICLR Workshop on Representation Learning on Graphs and Manifolds (2019).
[25] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, in Proceedings of the 34th International Conference on Machine Learning, ICML’17 (2017) pp. 1263–1272.
[26] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, CoRR abs/1801.07829 (2018), arXiv:1801.07829.
[27] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, (2017), arXiv:1710.10903 [stat.ML].
[28] K. K. Thekumparampil, C. Wang, S. Oh, and L.-J. Li, ArXiv (2018), arXiv:1803.03735 [stat.ML].
[29] D. H. Koh et al., (2020), arXiv:2007.03083 [hep-ex].

FIG. 31. Shower clustering predictions for the four events with the highest number of shower fragments in the test dataset (one event per row). Left: ground-truth shower labels (color) and edges representing the true fragment parentage. Middle: primary node scores represented as a node color ranging from 0 (blue) to 1 (red) and edges with an adjacency score above 0.5.

FIG. 32. Shower clustering predictions with the largest mistakes in three categories and one with an ARI of 0 (one event per row). Left: ground-truth shower labels (color) and edges representing the true fragment parentage. Middle: primary node scores represented as a node color ranging from 0 (blue) to 1 (red) and edges with an adjacency score above 0.5.
FIG. 33. Interaction clustering predictions on four randomly picked events with 1, 2, 3 and 4 randomly merged images (from top to bottom). Left: ground-truth interaction labels (color) and ground-truth cluster graph edges. Middle: edges with an adjacency score above 0.5.

FIG. 34. Interaction clustering predictions with the largest mistakes in three categories and one with an ARI of 0 (one event per row). Left: ground-truth interaction labels (color) and ground-truth cluster graph edges. Middle: edges with an adjacency score above 0.5.