An updated hybrid deep learning algorithm for identifying and locating primary vertices

Simon Akar, Thomas J. Boettcher, Sarah Carl, Henry F. Schreiner, Michael D. Sokoloff, Marian Stahl, Constantin Weisser, Mike Williams
On behalf of the LHCb Real Time Analysis project

University of Cincinnati, Cincinnati, OH, United States
Massachusetts Institute of Technology, Cambridge, MA, United States
Princeton University, Princeton, NJ, United States

Proceedings of CTD 2020, PROC-CTD2020-52, July 3, 2020
ABSTRACT

We present an improved hybrid algorithm for vertexing that combines deep learning with conventional methods. Although the algorithm is a generic approach to vertex finding, we focus here on its application as an alternative Primary Vertex (PV) finding tool for the LHCb experiment. In the transition to Run 3 in 2021, LHCb will undergo a major luminosity upgrade, going from 1.1 to 5.6 expected visible PVs per event, and it will adopt a purely software trigger. We use a custom kernel to transform the sparse 3D space of hits and tracks into a dense 1D dataset, and then apply deep learning techniques to find PV locations, using proxy distributions to encode the truth in training data. Last year we reported that training networks on our kernels using several Convolutional Neural Network layers yielded better than 90% efficiency with no more than 0.2 False Positives (FPs) per event. Modifying several elements of the algorithm, we now achieve better than 94% efficiency with a significantly lower FP rate. While our studies to date have been made using toy Monte Carlo (MC), we have begun to study KDEs produced from complete LHCb Run 3 MC data, including full tracking in the vertex locator rather than proto-tracking.

PRESENTED AT
Connecting the Dots Workshop (CTD 2020)
April 20-30, 2020

1 Introduction

The LHCb experiment is currently being upgraded for the planned start of Run 3 of the LHC in 2021 to facilitate recording proton-proton collision data at √s = 14 TeV with an instantaneous luminosity of 2 × 10^33 cm^-2 s^-1 [1]. Building on the success of Run 2 [2], the experiment will continue moving towards a real-time-analysis approach and consequently remove the Level 0 hardware trigger in favor of a pure software trigger system. In this process the entire software stack is being refactored, not only to reflect the new hardware components, but also to improve the performance of reconstruction and selection.
One of the algorithms most affected by the increased instantaneous luminosity is Primary Vertex (PV) finding. The increased luminosity raises the expected average number of visible PVs per event from 1.1 in Run 2 to 5.6 in Run 3, which will degrade the overall performance of PV reconstruction. A new fast PV finding algorithm has been put forward and is currently proposed as the baseline solution [3].

As an alternative we propose a hybrid algorithm, established in Ref. [4], available on GitLab [5], and shown schematically in Fig. 1, that approaches this challenge with machine learning techniques in the cluster search of PV finding, the part of the algorithm that drives the PV reconstruction efficiency. In the context of PV finding at LHCb, the hybrid algorithm starts with tracks that have been reconstructed from hits in the Vertex Locator (Velo). Their location, direction, and covariance matrix information is used to reduce the problem from three dimensions to one by calculating "kernels" in bins along z, i.e. the beam direction. The z positions of the primary vertices are closely related to peaks in this kernel histogram, which is then used as input to a Convolutional Neural Network (CNN) that carries out the cluster search and predicts the PV locations. The output probabilities from the CNN are converted to a list of PV candidate z-locations using a simple peak finding algorithm. A toy simulation of the LHCb detector is used to train, develop and test the algorithm. In parallel, the algorithm has been deployed into the LHCb software stack, where it can be used for PV finding.

Figure 1: Schematic workflow of the hybrid deep learning algorithm for vertex finding.

2 Kernel generation

The kernel generation step converts sparse 3D data into a feature-rich 1D histogram. It consists of 4000 bins along the z-direction (beamline), each 100 µm wide, covering the active area of the Velo around the interaction point.
Each z bin of the histogram is filled with the maximum kernel value in x and y, where the kernel is defined by

  K(x, y, z) = \left( \sum_{\text{tracks}} G(\mathrm{IP}(x, y) \mid z) \right)^{2} - \sum_{\text{tracks}} G(\mathrm{IP}(x, y) \mid z)^{2} .   (1)

In Eq. (1), G(IP(x, y) | z) is a Gaussian function, centered at x = y = 0 and evaluated at the impact parameter IP(x, y): the distance of closest approach of a track projection to a hypothesized vertex at position x, y for a given z. The width/covariance of G is given by the IP(x, y) uncertainty/covariance matrix. Finding the maximum of K(x, y, z) is a two-step process: kernel values are first computed on a coarse 8 × 8 grid in x, y; the parameters of that search are then taken as starting points for a MINUIT minimization to find the maximum kernel value.
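The structure of Eq. (1) and the coarse first stage of the maximum search can be sketched in a few lines of Python. The track model, the fixed heuristic width, and the grid extent below are illustrative assumptions (straight-line tracks, a purely transverse "IP", a hypothetical sigma); only the algebraic form of K and the 8 × 8 coarse grid follow the text.

```python
import numpy as np

def track_ip(x0, y0, z0, tx, ty, x, y, z):
    # Illustrative straight-line track model: the track position at z is
    # (x0 + tx*(z - z0), y0 + ty*(z - z0)); "IP" here is simply the transverse
    # distance of that point to the hypothesized vertex (x, y).
    dx = x0 + tx * (z - z0) - x
    dy = y0 + ty * (z - z0) - y
    return np.hypot(dx, dy)

def kernel_value(tracks, x, y, z, sigma=0.05):
    # G(IP(x, y) | z): Gaussian centered at IP = 0; sigma stands in for the
    # heuristic IP uncertainty (here a fixed width in mm).
    g = np.array([np.exp(-0.5 * (track_ip(*t, x, y, z) / sigma) ** 2)
                  for t in tracks])
    # Eq. (1): (sum G)^2 - sum G^2 keeps only the cross terms between pairs
    # of tracks, so a single track never produces a peak on its own.
    return g.sum() ** 2 - (g ** 2).sum()

def coarse_max(tracks, z, half_width=0.4, n=8):
    # First stage of the two-step search: evaluate K on a coarse 8 x 8 grid
    # in x, y; the best point would then seed the MINUIT maximization.
    xs = np.linspace(-half_width, half_width, n)
    return max((kernel_value(tracks, x, y, z), x, y)
               for x in xs for y in xs)
```

For two tracks crossing at the origin, `kernel_value` evaluated there returns 2 (each Gaussian is 1 and only the cross term survives), while a single track gives exactly 0, reflecting the pair-counting nature of the kernel.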
Equation (1) is computed from the location, direction, and covariance matrix of the input tracks. This information can be provided by a standalone toy Monte Carlo proto-tracking [4] or by the proposed LHCb Run 3 production Velo tracking [6]. While the former uses a heuristic Gaussian width to estimate the IP(x, y) uncertainty, the latter makes use of the measured covariance matrix of the track state closest to the beam to compute the IP(x, y) covariance. The difference between kernel histograms using the heuristic and the measured IP(x, y) uncertainty/covariance is shown as an example in Fig. 2. It can be seen that the kernels are more pronounced around PVs, which suggests that, once re-trained, the CNN performance should increase. Further improvement with LHCb simulation data is expected from centering G at the actual beamline position at a given z, and from using Kalman-fitted Velo tracks as input to the kernel generation.
Figure 2: Comparison of kernel histograms using the heuristic (blue) and the measured (red) IP(x, y) uncertainty/covariance. A random single event with 6 reconstructible primary vertices is shown, whose z positions are marked by black dots. Here, a simulated primary vertex is called reconstructible if more than four simulated tracks within the Velo acceptance emerge from that vertex. Note that there are 2 PVs at about +15 mm, illustrating the issue of PV proximity that the cluster search faces frequently.

3 Cluster search with a convolutional neural network

The cluster search in our algorithm is conducted by a Convolutional Neural Network (CNN) consisting of several one-dimensional convolution layers. We use
PyTorch [7] for training and development, and
TorchScript [8] as a
C++ inference engine in the LHCb software stack.

Before being able to train a CNN, we need to define what it should learn, i.e. give it a target. The CNN in our algorithm is designed to output a 4000-bin target histogram, binned like the input kernel histogram, in which essentially all "noise" is removed. A first approach to creating such a target histogram is to define a Gaussian with unit area and a width of one bin (100 µm) centered at the simulated PV position. These simple target histograms have since been updated to take the PV resolution as a function of track multiplicity into account, which affects the width and area of the Gaussians. The resulting performance improvement is plotted in Fig. 3. Further, regions around PVs that do not pass the criterion of being reconstructible, i.e. have fewer than five detectable associated tracks, are "masked". Masked regions are effectively hidden during training, so that discoveries in them are neither penalized nor rewarded.

Our best-performing network to date is composed of 6 convolutional layers with leaky ReLU activation functions between hidden layers and a softmax activation for the output layer. The width and padding of each convolution kernel were chosen by visual inspection of the data; the number of channels was increased until adding channels brought no further noticeable benefit. The training is carried out on GPUs using mini-batch gradient descent, the Adam optimizer and dropout regularization.

A custom cost function, similar to cross-entropy, has been defined; it was found that its initial symmetric form favored small false positive rates at the expense of efficiency. Therefore, a single-parameter asymmetry term has been added to the cost function, serving as a powerful control for selecting the tradeoff between false positive rate and efficiency [4]. To stabilize the training process in early epochs, the last convolution layer is replaced by a Fully Connected (FC) layer.
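The proxy-target construction described above can be sketched in numpy. This is a minimal version under stated assumptions: the histogram is taken to be centered on z = 0, every reconstructible PV gets a fixed unit-area Gaussian of width one bin (the multiplicity-dependent width and area are omitted), and the ±5-bin masking window is a hypothetical choice, not the value used in the paper.

```python
import numpy as np

NBINS, BIN_W = 4000, 0.1      # 4000 bins of 100 micron each; z in mm
Z_MIN = -NBINS / 2 * BIN_W    # assumption: histogram centered on z = 0

def target_histogram(pv_z, pv_ntracks, min_tracks=5, mask_halfwidth=5):
    """Proxy target: a unit-area Gaussian of width one bin per reconstructible
    PV; a boolean mask hides regions around non-reconstructible PVs."""
    centers = Z_MIN + BIN_W * (np.arange(NBINS) + 0.5)
    target = np.zeros(NBINS)
    mask = np.ones(NBINS, dtype=bool)   # True = bin contributes to the cost
    for z, n in zip(pv_z, pv_ntracks):
        if n >= min_tracks:             # reconstructible PV: add a Gaussian
            g = np.exp(-0.5 * ((centers - z) / BIN_W) ** 2)
            target += g / g.sum()       # normalize to unit area
        else:                           # hide the region around this PV
            mask &= np.abs(centers - z) > mask_halfwidth * BIN_W
    return target, mask
```

During training, only bins where the mask is True would enter the cost, so a response in a masked region is neither penalized nor rewarded.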
After several training epochs with this architecture, the weights of the convolutional layers are fixed, the FC layer is replaced by a convolutional layer, and the network is trained for a few more epochs. Then all weights are floated and the CNN is trained in its final architecture.

Recently, we improved the CNN performance by adding x and y position information, found by maximizing the kernel (Eq. (1)), in a perturbative manner. Adding the information perturbatively is important because the CNN would otherwise overestimate the importance of these variables compared to the z information. The perturbative addition works as follows: on top of the original network, another CNN with 3 convolutional layers that solely processes x and y position information is trained independently, but in parallel to the original network, such that the responses of the two CNNs are multiplied at the end of each training epoch. The CNN with x and y information contributes values close to 1 in most cases, but can veto kernel peaks with large x, y gradients, which have been observed in data and can lead to false positives. The performance improvement from the perturbation network, together with the addition of one convolutional layer, is shown in Fig. 3.

Figure 3: Performance improvements with respect to Ref. [4] achieved by modifying the target histograms, adding layers to the CNN and adding x, y position information perturbatively. Details are described in the text.

4 Results

The proof of principle that the hybrid deep learning algorithm proposed here is an efficient vertex finding tool was established in Ref. [4]. Since then, the algorithm's performance has been further improved by the changes described in Sec. 3, as shown in Fig. 3. We want to highlight that, for a fixed PV finding efficiency of 94%, the false positive rate per event could be reduced by a factor of 2.
These numbers have been obtained on toy simulation using metrics that do not fully reflect the standard LHCb definitions, and are thus not directly comparable to those given in Ref. [3]. However, first studies using official LHCb simulation in place of the toy simulation agree, under our metrics, with the results from pure toy simulation. We look forward to re-training the CNNs with LHCb simulation data and expect further improvements in the PV finding performance.
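The last two steps of the pipeline described above, the perturbative combination of the two network responses and the conversion of the combined probability histogram into PV candidates, can be sketched in numpy. The threshold value and the run-based peak definition below are hypothetical stand-ins for the simple peak finding algorithm mentioned earlier, not its actual implementation.

```python
import numpy as np

BIN_W, Z_MIN = 0.1, -200.0   # assumption: 4000 bins of 100 micron, centered on z = 0

def find_pvs(p_z, p_xy, threshold=0.01):
    # Perturbative combination: the x,y-network response multiplies the main
    # response bin by bin; values near 1 leave peaks untouched, values near 0
    # veto them.
    p = p_z * p_xy
    # Hypothetical peak search: each contiguous run of bins above threshold
    # becomes one candidate, located at the probability-weighted mean z.
    above = np.append(p > threshold, False)   # sentinel closes a trailing run
    candidates, start = [], None
    for i, a in enumerate(above):
        if a and start is None:
            start = i
        elif not a and start is not None:
            z = Z_MIN + BIN_W * (np.arange(start, i) + 0.5)
            candidates.append(np.average(z, weights=p[start:i]))
            start = None
    return candidates
```

A vetoing x, y response (values near 0) removes a candidate even when the main response alone shows a clear peak, which is exactly the false-positive suppression the perturbation network provides.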
5 Conclusion and outlook

We have presented a hybrid deep learning vertexing algorithm and its application as a PV finding algorithm in the LHCb Run 3 software trigger. The algorithm has been privately deployed in the LHCb CPU software stack, and its performance has been further improved using toy data.

Future milestones of the project are well defined. We plan to deploy the algorithm in Allen, the High Level Trigger application on GPUs for LHCb [9]. Further, it is of great importance to have a one-to-one comparison with the currently proposed LHCb Run 3 production PV finding algorithm [3]. This means that we need to refactor the current implementation of our algorithm in the LHCb software stack to be able to run the same performance benchmark tests. In parallel, we need to re-train our CNN with official LHCb simulation data in place of toy data. We expect further performance improvements from doing so, in particular from using the measured covariance matrix of (Kalman-fitted) Velo tracks in the generation of kernel histograms. Currently, the generation of these kernel histograms is the throughput bottleneck of our algorithm due to the usage of
MINUIT. We are planning to replace the kernel generation with a fast machine learning algorithm, which could be merged with the cluster search CNN into a single algorithm that predicts PV positions from the output of the Velo track reconstruction in one step. We are also investigating pruning techniques to speed up inference. Moreover, we plan to associate Velo tracks with found PV candidates probabilistically to reduce the false positive rate. In an adjacent step, we plan to probabilistically identify secondary vertices and their associated tracks.
ACKNOWLEDGEMENTS
This work was supported by the National Science Foundation under Cooperative Agreements OAC-1836650, OAC-1739772, and OAC-1740102. It was also supported by the University of Cincinnati Women in Science and Engineering program.
References

[1] LHCb Collaboration, "LHCb Trigger and Online Upgrade Technical Design Report," CERN-LHCC-2014-016, LHCb-TDR-2014-016.
[2] R. Aaij et al. [LHCb], "Design and performance of the LHCb trigger and full real-time reconstruction in Run 2 of the LHC," JINST, no. 04, P04013 (2019); [arXiv:1812.10790 [hep-ex]].
[3] F. Reiß, "Fast parallel Primary Vertex reconstruction for the LHCb Upgrade," talk given at this conference; LHCb-TALK-2020-044. Publication in preparation.
[4] R. Fang, H. F. Schreiner, M. D. Sokoloff, C. Weisser and M. Williams, "A hybrid deep learning approach to vertexing," [arXiv:1906.08306 [physics.ins-det]].
[5] pv-finder repository: https://gitlab.cern.ch/LHCb-Reco-Dev/pv-finder
[6] A. Hennequin, B. Couturier, V. Gligorov, S. Ponce, R. Quagliani and L. Lacassagne, "A fast and efficient SIMD track reconstruction algorithm for the LHCb Upgrade 1 VELO-PIX detector," [arXiv:1912.09901 [physics.ins-det]].
[7] A. Paszke et al. [The PyTorch team], "PyTorch: An Imperative Style, High-Performance Deep Learning Library," Advances in Neural Information Processing Systems 32, 8024 (2019).
[8] The PyTorch team, "TorchScript," https://pytorch.org/docs/stable/jit.html
[9] R. Aaij et al., "Allen: A High-Level Trigger on GPUs for LHCb," Comput. Softw. Big Sci. 4.