An updated hybrid deep learning algorithm for identifying and locating primary vertices

Simon Akar, Thomas J. Boettcher, Sarah Carl, Henry F. Schreiner, Michael D. Sokoloff, Marian Stahl, Constantin Weisser, Mike Williams
On behalf of the LHCb Real Time Analysis project

University of Cincinnati, Cincinnati, OH, United States
Massachusetts Institute of Technology, Cambridge, MA, United States
Princeton University, Princeton, NJ, United States

Proceedings of CTD 2020, PROC-CTD2020-52, July 3, 2020
ABSTRACT

We present an improved hybrid algorithm for vertexing that combines deep learning with conventional methods. Although the algorithm is a generic approach to vertex finding, we focus here on its application as an alternative Primary Vertex (PV) finding tool for the LHCb experiment. In the transition to Run 3 in 2021, LHCb will undergo a major luminosity upgrade, going from 1.1 to 5.6 expected visible PVs per event, and it will adopt a purely software trigger. We use a custom kernel to transform the sparse 3D space of hits and tracks into a dense 1D dataset, and then apply deep learning techniques to find PV locations, using proxy distributions to encode the truth in training data. Last year we reported that training networks on our kernels using several Convolutional Neural Network layers yielded better than 90% efficiency with no more than 0.2 False Positives (FPs) per event. Modifying several elements of the algorithm, we now achieve better than 94% efficiency with a significantly lower FP rate. While our studies to date have been made using toy Monte Carlo (MC), we have begun to study KDEs produced from complete LHCb Run 3 MC data, including full tracking in the vertex locator rather than proto-tracking.

PRESENTED AT
Connecting the Dots Workshop (CTD 2020)
April 20-30, 2020

1 Introduction

The LHCb experiment is currently being upgraded for the planned start of Run 3 of the LHC in 2021 to facilitate recording proton-proton collision data at √s = 14 TeV with an instantaneous luminosity of 2 × 10^33 cm^-2 s^-1 [1]. Building on the success of Run 2 [2], the experiment will continue moving towards a real-time-analysis approach and consequently remove the Level 0 hardware trigger in favor of a pure software trigger system. In this process the entire software stack is being refactored, not only to reflect the new hardware components, but also to improve the performance of reconstruction and selection.
One of the algorithms most affected by the increased instantaneous luminosity is Primary Vertex (PV) finding. The increased luminosity raises the expected average number of visible PVs per event from 1.1 in Run 2 to 5.6 in Run 3, which will degrade the overall performance of PV reconstruction. A new fast PV finding algorithm has been put forward and is currently proposed as the baseline solution [3].

As an alternative we propose a hybrid algorithm, established in Ref. [4], available on GitLab [5], and shown schematically in Fig. 1, that approaches this challenge with machine learning techniques in the cluster search of PV finding, the part of the algorithm that drives the PV reconstruction efficiency. In the context of PV finding at LHCb, the hybrid algorithm starts with tracks that have been reconstructed from hits in the Vertex Locator (Velo). Their location, direction, and covariance matrix information is used to reduce the problem from three dimensions to one by calculating "kernels" in bins along z, i.e. the beam direction. The z positions of the primary vertices are closely related to peaks in this kernel histogram, which is then used as input to a Convolutional Neural Network (CNN) that carries out the cluster search and predicts the PV locations. The output probabilities from the CNN are converted to a list of PV candidate z-locations using a simple peak finding algorithm. A toy simulation of the LHCb detector is used to train, develop and test the algorithm. In parallel, the algorithm has been deployed into the LHCb software stack, where it can be used for PV finding.

Figure 1: Schematic workflow of the hybrid deep learning algorithm for vertex finding.

2 Kernel generation

The kernel generation step converts sparse 3D data into a feature-rich 1D histogram. It consists of 4000 bins along the z-direction (beamline), each 100 µm wide, covering the active area of the Velo around the interaction point.
Each z bin of the histogram is filled with the maximum kernel value in x and y, where the kernel is defined by

  K(x, y, z) = \left( \sum_{\text{tracks}} G(\mathrm{IP}(x, y) \mid z) \right)^{2} - \sum_{\text{tracks}} G(\mathrm{IP}(x, y) \mid z)^{2} .   (1)

In Eq. (1), G(IP(x, y) | z) is a Gaussian function, centered at x = y = 0 and evaluated at the impact parameter IP(x, y): the distance of closest approach of a track projection to a hypothesized vertex at position x, y for a given z. The width/covariance of G is given by the IP(x, y) uncertainty/covariance matrix. Finding the maximum of K(x, y, z) is a two-step process: kernel values are first computed on a coarse 8 × 8 grid in x, y; the parameters of that search are then taken as starting points for a MINUIT minimization to find the maximum kernel value.
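The structure of Eq. (1) and the coarse first stage of the maximum search can be sketched in a few lines of Python. The track model, the fixed heuristic width, and the grid extent below are illustrative assumptions (straight-line tracks, a purely transverse "IP", a hypothetical sigma); only the algebraic form of K and the 8 × 8 coarse grid follow the text.

```python
import numpy as np

def track_ip(x0, y0, z0, tx, ty, x, y, z):
    # Illustrative straight-line track model: the track position at z is
    # (x0 + tx*(z - z0), y0 + ty*(z - z0)); "IP" here is simply the transverse
    # distance of that point to the hypothesized vertex (x, y).
    dx = x0 + tx * (z - z0) - x
    dy = y0 + ty * (z - z0) - y
    return np.hypot(dx, dy)

def kernel_value(tracks, x, y, z, sigma=0.05):
    # G(IP(x, y) | z): Gaussian centered at IP = 0; sigma stands in for the
    # heuristic IP uncertainty (here a fixed width in mm).
    g = np.array([np.exp(-0.5 * (track_ip(*t, x, y, z) / sigma) ** 2)
                  for t in tracks])
    # Eq. (1): (sum G)^2 - sum G^2 keeps only the cross terms between pairs
    # of tracks, so a single track never produces a peak on its own.
    return g.sum() ** 2 - (g ** 2).sum()

def coarse_max(tracks, z, half_width=0.4, n=8):
    # First stage of the two-step search: evaluate K on a coarse 8 x 8 grid
    # in x, y; the best point would then seed the MINUIT maximization.
    xs = np.linspace(-half_width, half_width, n)
    return max((kernel_value(tracks, x, y, z), x, y)
               for x in xs for y in xs)
```

For two tracks crossing at the origin, `kernel_value` evaluated there returns 2 (each Gaussian is 1 and only the cross term survives), while a single track gives exactly 0, reflecting the pair-counting nature of the kernel.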
Equation (1) is computed from the location, direction, and covariance matrix of the input tracks. This information can be provided by a standalone toy Monte Carlo proto-tracking [4] or by the proposed LHCb Run 3 production Velo tracking [6]. While the former uses a heuristic Gaussian width to estimate the IP(x, y) uncertainty, the latter makes use of the measured covariance matrix of the track state closest to the beam to compute the IP(x, y) covariance. The difference between kernel histograms using the heuristic and the measured IP(x, y) uncertainty/covariance is shown as an example in Fig. 2. It can be seen that the kernels are more pronounced around PVs, which suggests that, once re-trained, the CNN performance should increase. Further improvement with LHCb simulation data is expected from centering G at the actual beamline position at a given z, and from using Kalman-fitted Velo tracks as input to the kernel generation.
Figure 2: Comparison of kernel histograms using the heuristic (blue) and the measured (red) IP(x, y) uncertainty/covariance. A random single event with 6 reconstructible primary vertices is shown, whose z positions are marked by black dots. Here, a simulated primary vertex is called reconstructible if more than four simulated tracks within the Velo acceptance emerge from that vertex. Note that there are 2 PVs at about +15 mm, illustrating the issue of PV proximity that the cluster search faces frequently.

3 Cluster search with a convolutional neural network

The cluster search in our algorithm is conducted by a Convolutional Neural Network (CNN) consisting of several one-dimensional convolution layers. We use
PyTorch [7] for training and development, and
TorchScript [8] as a
C++ inference engine in the LHCb software stack.

Before being able to train a CNN, we need to define what it should learn, i.e. give it a target. The CNN in our algorithm is designed to output a 4000-bin target histogram, binned like the input kernel histogram, in which essentially all "noise" is removed. A first approach to creating such a target histogram is to define a Gaussian with unit area and a width of one bin (100 µm) centered at the simulated PV position. These simple target histograms have since been updated to take the PV resolution as a function of track multiplicity into account, which affects the width and area of the Gaussians. The resulting performance improvement is plotted in Fig. 3. Further, regions around PVs that do not pass the criterion of being reconstructible, i.e. have fewer than five detectable associated tracks, are "masked". Masked regions are effectively hidden during training, so that discoveries in them are neither penalized nor rewarded.

Our best-performing network to date is composed of 6 convolutional layers with leaky ReLU activation functions between hidden layers and a softmax activation for the output layer. The width and padding of each convolution kernel were chosen by visual inspection of the data; the number of channels was increased until adding channels brought no further noticeable benefit. The training is carried out on GPUs using mini-batch gradient descent, the Adam optimizer and dropout regularization.

A custom cost function, similar to cross-entropy, has been defined; it was found that its initial symmetric form favored small false positive rates at the expense of efficiency. Therefore, a single-parameter asymmetry term has been added to the cost function, serving as a powerful control for selecting the tradeoff between false positive rate and efficiency [4]. To stabilize the training process in early epochs, the last convolution layer is replaced by a Fully Connected (FC) layer.
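The proxy-target construction described above can be sketched in numpy. This is a minimal version under stated assumptions: the histogram is taken to be centered on z = 0, every reconstructible PV gets a fixed unit-area Gaussian of width one bin (the multiplicity-dependent width and area are omitted), and the ±5-bin masking window is a hypothetical choice, not the value used in the paper.

```python
import numpy as np

NBINS, BIN_W = 4000, 0.1      # 4000 bins of 100 micron each; z in mm
Z_MIN = -NBINS / 2 * BIN_W    # assumption: histogram centered on z = 0

def target_histogram(pv_z, pv_ntracks, min_tracks=5, mask_halfwidth=5):
    """Proxy target: a unit-area Gaussian of width one bin per reconstructible
    PV; a boolean mask hides regions around non-reconstructible PVs."""
    centers = Z_MIN + BIN_W * (np.arange(NBINS) + 0.5)
    target = np.zeros(NBINS)
    mask = np.ones(NBINS, dtype=bool)   # True = bin contributes to the cost
    for z, n in zip(pv_z, pv_ntracks):
        if n >= min_tracks:             # reconstructible PV: add a Gaussian
            g = np.exp(-0.5 * ((centers - z) / BIN_W) ** 2)
            target += g / g.sum()       # normalize to unit area
        else:                           # hide the region around this PV
            mask &= np.abs(centers - z) > mask_halfwidth * BIN_W
    return target, mask
```

During training, only bins where the mask is True would enter the cost, so a response in a masked region is neither penalized nor rewarded.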
After several training epochs with this architecture, the weights of the convolutional layers are fixed, the FC layer is replaced by a convolutional layer, and the network is trained for a few more epochs. Then all weights are floated and the CNN is trained in its final architecture.

Recently, we improved the CNN performance by adding x and y position information, found by maximizing the kernel (Eq. (1)), in a perturbative manner. Adding the information perturbatively is important because the CNN would otherwise overestimate the importance of these variables compared to the z information. The perturbative addition works as follows: on top of the original network, another CNN with 3 convolutional layers that solely processes x and y position information is trained independently, but in parallel to the original network, such that the responses of the two CNNs are multiplied at the end of each training epoch. The CNN with x and y information contributes values close to 1 in most cases, but can veto kernel peaks with large x, y gradients, which have been observed in data and can lead to false positives. The performance improvement from the perturbation network, together with the addition of one convolutional layer, is shown in Fig. 3.

Figure 3: Performance improvements with respect to Ref. [4] achieved by modifying the target histograms, adding layers to the CNN and adding x, y position information perturbatively. Details are described in the text.

4 Results

The proof of principle that the hybrid deep learning algorithm proposed here is an efficient vertex finding tool was established in Ref. [4]. Since then, the algorithm's performance has been further improved by the changes described in Sec. 3, as shown in Fig. 3. We want to highlight that, for a fixed PV finding efficiency of 94%, the false positive rate per event could be reduced by a factor of 2.
These numbers have been obtained on toy simulation using metrics that do not fully reflect the standard LHCb definitions, and are thus not directly comparable to those given in Ref. [3]. However, first studies using official LHCb simulation in place of the toy simulation agree, under our metrics, with the results from pure toy simulation. We look forward to re-training the CNNs with LHCb simulation data and expect further improvements in the PV finding performance.
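The last two steps of the pipeline described above, the perturbative combination of the two network responses and the conversion of the combined probability histogram into PV candidates, can be sketched in numpy. The threshold value and the run-based peak definition below are hypothetical stand-ins for the simple peak finding algorithm mentioned earlier, not its actual implementation.

```python
import numpy as np

BIN_W, Z_MIN = 0.1, -200.0   # assumption: 4000 bins of 100 micron, centered on z = 0

def find_pvs(p_z, p_xy, threshold=0.01):
    # Perturbative combination: the x,y-network response multiplies the main
    # response bin by bin; values near 1 leave peaks untouched, values near 0
    # veto them.
    p = p_z * p_xy
    # Hypothetical peak search: each contiguous run of bins above threshold
    # becomes one candidate, located at the probability-weighted mean z.
    above = np.append(p > threshold, False)   # sentinel closes a trailing run
    candidates, start = [], None
    for i, a in enumerate(above):
        if a and start is None:
            start = i
        elif not a and start is not None:
            z = Z_MIN + BIN_W * (np.arange(start, i) + 0.5)
            candidates.append(np.average(z, weights=p[start:i]))
            start = None
    return candidates
```

A vetoing x, y response (values near 0) removes a candidate even when the main response alone shows a clear peak, which is exactly the false-positive suppression the perturbation network provides.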
5 Conclusion and outlook

We have presented a hybrid deep learning vertexing algorithm and its application as a PV finding algorithm in the LHCb Run 3 software trigger. The algorithm has been privately deployed in the LHCb CPU software stack, and its performance has been further improved using toy data.

Future milestones of the project are well defined. We plan to deploy the algorithm in Allen, the High Level Trigger application on GPUs for LHCb [9]. Further, it is of great importance to have a one-to-one comparison with the currently proposed LHCb Run 3 production PV finding algorithm [3]. This means that we need to refactor the current implementation of our algorithm in the LHCb software stack to be able to run the same performance benchmark tests. In parallel, we need to re-train our CNN with official LHCb simulation data in place of toy data. We expect further performance improvements from doing so, in particular from using the measured covariance matrix of (Kalman-fitted) Velo tracks in the generation of kernel histograms. Currently, the generation of these kernel histograms is the throughput bottleneck of our algorithm due to the usage of
MINUIT. We are planning to replace the kernel generation with a fast machine learning algorithm, which could be merged with the cluster search CNN into a single algorithm that predicts PV positions from the output of the Velo track reconstruction in one step. We are also investigating pruning techniques to speed up inference. Moreover, we plan to associate Velo tracks with found PV candidates probabilistically to reduce the false positive rate. In an adjacent step, we plan to probabilistically identify secondary vertices and their associated tracks.
ACKNOWLEDGEMENTS
This work was supported by the National Science Foundation under Cooperative Agreements OAC-1836650, OAC-1739772, and OAC-1740102. It was also supported by the University of Cincinnati Women in Science and Engineering program.
References

[1] LHCb Collaboration, "LHCb Trigger and Online Upgrade Technical Design Report," CERN-LHCC-2014-016, LHCb-TDR-2014-016.
[2] R. Aaij et al. [LHCb], "Design and performance of the LHCb trigger and full real-time reconstruction in Run 2 of the LHC," JINST, no. 04, P04013 (2019); [arXiv:1812.10790 [hep-ex]].
[3] F. Reiß, "Fast parallel Primary Vertex reconstruction for the LHCb Upgrade," talk given at this conference; LHCb-TALK-2020-044. Publication in preparation.
[4] R. Fang, H. F. Schreiner, M. D. Sokoloff, C. Weisser and M. Williams, "A hybrid deep learning approach to vertexing," [arXiv:1906.08306 [physics.ins-det]].
[5] pv-finder repository: https://gitlab.cern.ch/LHCb-Reco-Dev/pv-finder
[6] A. Hennequin, B. Couturier, V. Gligorov, S. Ponce, R. Quagliani and L. Lacassagne, "A fast and efficient SIMD track reconstruction algorithm for the LHCb Upgrade 1 VELO-PIX detector," [arXiv:1912.09901 [physics.ins-det]].
[7] A. Paszke et al. [The PyTorch team], "PyTorch: An Imperative Style, High-Performance Deep Learning Library," Advances in Neural Information Processing Systems 32, 8024 (2019).
[8] The PyTorch team, "TorchScript," https://pytorch.org/docs/stable/jit.html
[9] R. Aaij et al., "Allen: A High-Level Trigger on GPUs for LHCb," Comput. Softw. Big Sci. 4.