Muon Identification Using Deep Neural Networks with the Muon Telescope Detector at STAR
J. D. Brandenburg∗,a,b, Frank Geurts,a
a Rice University, b Brookhaven National Lab
Abstract
The installation of the Muon Telescope Detector opened new possibilities for studying dimuon production at STAR. However, backgrounds from hadron punch-through and weak decays of pions and kaons make the identification of primary muons challenging. In this paper we present a study of shallow and deep neural networks trained as classifiers for the purpose of muon identification using information from the Muon Telescope Detector at STAR. The performance of shallow neural networks is presented as a function of the number of neurons in their hidden layer. A hyperparameter optimization for determining the optimal deep neural network classifier architecture is presented. The optimized deep neural network is compared with shallow neural networks, boosted decision trees, likelihood ratios, and traditional cut-based PID techniques. The superiority of the deep neural network based muon identification technique is demonstrated and compared with traditional PID through the measurement of the φ meson and the ψ(2S) in p+p collisions at √s = 200 GeV. The deep neural network based PID simultaneously provides higher signal efficiency, signal-to-background ratio, and significance of the φ peak compared to traditional PID techniques. Finally, a deep neural network assisted technique for measuring the muon purity in data is presented and discussed.

Keywords: muon identification, shallow neural networks, deep neural networks, multivariate classifiers, STAR, Muon Telescope Detector
Preprint submitted to Nucl. Instrum. Meth. A, August 16, 2019
∗ Corresponding author.
E-mail address: [email protected] (J. D. Brandenburg).
1. Introduction
In 2014 the Solenoidal Tracker at RHIC (STAR) completed its installation of the Muon Telescope Detector (MTD). The MTD has made muon identification over a large momentum range possible for the first time at STAR. However, even with the MTD, identification of pure muons can be challenging due to backgrounds from hadron punch-through. The identification of dimuon pairs is further obscured by secondary muons originating from the weak decays π → µ + ν and K → µ + ν. We are motivated to explore the possible improvements over traditional techniques in single muon identification and muon pair identification that can be obtained by employing modern supervised learning algorithms.

In this paper we explore classification techniques using artificial neural networks (ANN) for improving muon identification using the information provided by the MTD at STAR. In Sect. 2, a brief description of the relevant STAR subsystems is provided and the variables used for muon identification are defined. In Sect. 3, the dataset details are provided and the procedure used to generate the training samples is described. In Sect. 4, the use of ANN classifiers for muon identification is explored. Both shallow and deep neural networks are compared, and the techniques used to determine the optimal deep neural network architecture are discussed and presented. In Sect. 5, the performance of the DNN based muon identification is compared with that of traditional techniques in p+p collisions at √s = 200 GeV. In this section, the use of the trained DNN for data-driven muon purity measurements is also presented. Finally, a summary is presented in Sect. 6.

2. STAR Detector

The STAR detector is a multi-purpose detector designed with large, uniform acceptance in 0 < φ < 2π and |η| < 1. The relevant STAR subsystems used for this study are the Time Projection Chamber (TPC), the magnet system, the Time-of-Flight (TOF) detector, and the Muon Telescope Detector (MTD) [1–3]. The TPC provides charged-particle tracking and particle identification via ionization energy loss (dE/dx) measurements. The TPC sits within a 0.5 T magnetic field, allowing the charge (q) and transverse momenta (p_T) of tracks to be measured from the curvature of their trajectories. The TPC covers 2π in azimuth and approximately |η| < 1 for collisions at the center of the detector, and provides a momentum measurement with percent-level resolution at mid-rapidity.

The Time-of-Flight (TOF) detector is installed outside the TPC at a radius of 210 cm and provides precise timing information, with a timing resolution of ∼90 ps in heavy-ion collisions [4]. The TOF detector covers 2π in azimuth and approximately |η| < 0.9. The MTD is installed outside the magnet steel and covers ∼45% in the azimuthal direction for |η| < 0.5. It measures the timing (σ ≈ 100 ps) and position of hits from tracks that penetrate the magnet steel.

Figure 1: A schematic of an MTD module. The strips are 87 cm long and run along the local z axis. Each module contains 12 strips along the local y axis. Each strip is 3.6 cm wide with a space of 0.6 cm between strips.

Double-ended readout allows the local Z position of hits to be measured via the difference in time between the two ends of a strip. Within each module, the local Y position of a hit is measured by determining which of the 12 strips within the module registered the hit. ∆Z and ∆Y are the residuals between the measured local positions and the projected positions in the local Z and Y directions, respectively. Figure 1 shows a schematic of the local MTD module coordinates and the ∆Z and ∆Y calculation. The full list of variables used in this study for muon identification is:

• ∆TOF – the difference between the time-of-flight calculated using a muon hypothesis and the time-of-flight measured by the MTD.
• ∆Z – the difference between the local Z position calculated using a muon hypothesis and the position measured by the MTD.
• ∆Y – the difference between the local Y position calculated using a muon hypothesis and the position measured by the MTD, taken from the center of the matched strip.
• cell – the geometric strip index, ranging from 0 to 11, with 0 and 11 at the outside edges of each module. The average amount of steel between the interaction point and the MTD module is lowest at the edges.
• module – the geometric module index, ranging from 0 to 4.
• backleg – the geometric backleg index, ranging from 0 to 29. The amount of material between the interaction point and the MTD backlegs varies as a function of backleg, since the detector is not fully symmetric in the φ direction.
• nσπ – the dE/dx information measured by the TPC. For simplicity, the value normalized by the expectation for the π and corrected for detector resolution is used; the value of nσπ for muons is on average ∼+0.5.
• DCA – the distance of closest approach of the track to the primary collision vertex.
• p_T – the transverse momentum of the track. The ∆TOF, ∆Y, and ∆Z resolutions depend strongly on p_T.
• q – the track charge, measured from the curvature of the trajectory.

These variables are used as the inputs when training the neural network classifiers in Sect. 4.
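As a concrete illustration, the variables above can be packed into a fixed-order feature vector before being handed to any classifier. The sketch below is a minimal Python example; the field names and the example values are our own stand-ins, not taken from the STAR software.

```python
import numpy as np

# Hypothetical single reconstructed track; field names mirror the PID
# variables listed above (the values are illustrative only).
track = {
    "dTOF": 0.12,    # ns: muon-hypothesis TOF minus measured MTD TOF
    "dZ": 3.5,       # cm: local-Z residual
    "dY": -2.1,      # cm: local-Y residual
    "cell": 5,       # strip index, 0-11
    "module": 2,     # module index, 0-4
    "backleg": 17,   # backleg index, 0-29
    "nSigmaPi": 0.6, # normalized dE/dx
    "dca": 0.8,      # cm
    "pT": 1.9,       # GeV/c
    "charge": +1,
}

FEATURES = ["dTOF", "dZ", "dY", "cell", "module", "backleg",
            "nSigmaPi", "dca", "pT", "charge"]

def feature_vector(trk):
    """Order the PID variables into the fixed input layout a classifier expects."""
    return np.array([trk[name] for name in FEATURES], dtype=float)

x = feature_vector(track)
```

Keeping a single canonical feature ordering like this avoids silent mismatches between the training and evaluation stages.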
3. Dataset and Training Samples
The data used for this study were collected by the STAR detector from p+p collisions at √s = 200 GeV during the 2015 RHIC run. The events were selected using the dimuon trigger, which requires that at least two MTD signals be measured within a timing window. The primary vertex of each event was required to be within ±100 cm of the center of the detector along z. In total, the dimuon trigger recorded 300M events, corresponding to a total sampled luminosity of 122 pb⁻¹ [5].

Figure 2: Simulated MTD cell (a) and ∆Z (b) distributions for signal and background sources. The effect of varying amounts of steel in the φ direction can be clearly seen in the cell distribution: hadrons are significantly more likely to punch through the steel guarding the edge cells (at 0 and 11, respectively) than the central cells.

Muon candidate tracks were required to have a minimum p_T, to have a distance of closest approach (DCA) to the collision vertex below a maximum value, and to have a sufficient number of TPC hits in the dE/dx measurement to ensure a reasonable dE/dx resolution. Finally, muon candidate tracks were required to project to an active MTD volume and to be matched to MTD hits that fired the trigger.
In Sect. 4, the training and use of ANNs to perform a two-class classification distinguishing signal muons from various types of backgrounds is discussed. This type of ANN based classification is an example of supervised learning and therefore requires labeled datasets for the training phase. A Monte Carlo (MC) simulation procedure is used to generate the labeled signal and background datasets needed to train the supervised learning algorithms discussed in Sect. 4. We define our signal class as primary muon tracks, i.e. those originating from the primary interaction vertex. In contrast, the background class includes all other sources of tracks that match to a hit in the MTD and result in a reconstructed track in the tracker. The main sources of background are:

• punch-through hadrons: e.g., π±, K±, and p/p̄
• charged-pion weak decays: π → µ + ν
• charged-kaon weak decays: K → µ + ν

The procedure used to forward model the signal and backgrounds consists of three main steps: a kinematic event generator, a simulation of the STAR detector, and a full event reconstruction. First, events are generated with kinematics approximating a p+p collision at √s = 200 GeV. Each track in the event is randomly chosen to be a µ, π, K, or p. The kinematics of each particle are sampled from flat distributions in p_T, |η| < 0.8, and −π < φ < π. The particle species and kinematics are then fed into a GEANT3 [7] based simulation of the full STAR geometry. The GEANT3 simulation performs decays of unstable particles and models the energy loss of particles traversing media and their interactions with detector materials. Finally, full event reconstruction is performed on the result of the GEANT3-based simulation. This step performs charged-particle reconstruction using the simulated hits in the TPC, determines the event's primary interaction vertex, and computes the dE/dx of the reconstructed tracks. After tracking is complete, the tracks are matched to the simulated MTD hits. The result of this simulation is a set of the PID variables for each of the signal and background processes. Examples of the MTD cell and ∆Z variables are shown for signal and background in Figs. 2a and 2b, respectively.

∆TOF distributions from Data
A data-driven approach is employed to determine the MTD ∆TOF distributions separately for the signal and background classes. For this procedure, 1D cuts are applied to all PID variables except the ∆TOF. With the cuts listed in Table 1, a relatively pure J/ψ sample can be obtained. Figure 3a shows the unlike-sign and like-sign distributions near the J/ψ mass after applying the cuts listed in Table 1. Daughter tracks from the J/ψ are used to extract the ∆TOF probability distribution function (PDF) for signal. Specifically, the signal PDF is extracted from the J/ψ mass peak (3.0 < M < 3.2 GeV/c²), with the background under the peak estimated using the like-sign pairs in the same mass region. The ∆TOF from the like-sign background is properly scaled and subtracted from the peak region to remove background contributions. The signal ∆TOF PDF is shown in Fig. 4. The background ∆TOF PDF is extracted from tracks passing an inverted set of cuts meant to exclude all signal muons. These cuts are shown in the right-hand column of Table 1.

Table 1: Cuts used for determining the signal and background ∆Time-of-Flight PDFs. The J/ψ selection includes 3.0 < M_µµ < 3.2 GeV/c², a DCA cut, −1 < nσπ < 3, |∆Y| < 3σ (+0.5σ for p_T > 3 GeV/c), |∆Z| < 3σ (+0.5σ for p_T > 3 GeV/c), and p_T^leading > 1.5 GeV/c.

The background ∆TOF distribution is further separated into the contributions from π, K, and p using timing information from the TOF detector. The sub-sample of tracks that match to both the MTD and TOF is used to extract the MTD ∆TOF distribution for π, K, and p separately. The β⁻¹ = c/v distribution measured by the TOF detector is shown in Fig. 3b for all background tracks matched to both the MTD and TOF. In this figure there are clear β⁻¹ bands corresponding to pions, kaons, and protons. The MTD ∆TOF distributions for these three species were extracted by selecting around a given β⁻¹ band.

Figure 3: The invariant mass distribution for unlike-sign and like-sign pairs near the J/ψ mass (a), obtained by cutting on all MTD PID variables except the ∆TOF distribution; a p_T^leading > 1.5 GeV/c cut is applied to further improve the purity in the J/ψ mass region (signal region: 3.0 < M < 3.2 GeV/c²). The β⁻¹ vs. momentum distribution for all tracks passing basic QA cuts that are matched to hits in the MTD and the BTOF detectors (b); the β⁻¹ calculated from the BTOF information shows clear contributions from π, K, and p/p̄.

Figure 4: The ∆TOF distributions for µ± from J/ψ, π±, K±, and p/p̄ in Run 15 p+p collisions at √s = 200 GeV.

K_S → π+π− and φ → K+K− Decays
Selecting K_S → π+π− decays in data provides a π±-enhanced sample that can be used to test the validity of the MC simulation procedure for the π± background sources. The selection of K_S candidates is carried out by applying the topological selection cuts listed in Table 2. In order to increase the available statistics for the comparison, only one of the K_S daughters is required to have a matching hit in the MTD. Figure 5a shows the π+π− invariant mass distribution near the K_S mass used to select π± daughter tracks. The π± ∆Y, ∆Z, and cell distributions are computed using the unlike-sign distribution minus the scaled like-sign distribution for each variable in the K_S mass region (497 ± 25 MeV/c²).

Distributions with an enhanced kaon yield can be selected from the daughters of φ → K+K− decays. The K+K− invariant mass distribution around M_φ is shown in Fig. 5b for the case in which one track is matched to an MTD hit. The K± ∆Y, ∆Z, and cell distributions are computed using the unlike-sign distribution minus the scaled like-sign distribution for each variable in the φ mass region around M_φ = 1.019 GeV/c².

Table 2: Cuts used to select K_S → π+π− decays, including the pair mass window (starting at 0.472 < M_ππ), a minimum decay length, a p_T-dependent pointing requirement, and a |nσπ| cut. The daughter pions provide a π-enhanced sample that can be compared to the π Monte Carlo simulation.

Figure 5: The M_π+π− distribution near the K_S mass for the case in which only one track is matched to an MTD hit (a), and the M_K+K− distribution near the φ mass for the same matching requirement (b); the unlike-sign and scaled like-sign distributions define the signal and background regions.

The comparisons between the ∆Y, ∆Z, and MTD cell distributions from MC and data for π± and K± tracks are shown in Figs. 6a and 6b. The data/simulation ratios show that the ∆Y, ∆Z, and MTD cell distributions agree well within the precision of the comparison.
4. Training and Evaluation of Neural Networks
Figure 6: The ∆Y (circles) and ∆Z (stars) data/simulation ratios for both π± (open) and K± (closed) (a). The MTD cell data/simulation ratio for both π± and K± (b).

In this section, dense multilayer perceptrons (MLP), a type of feed-forward ANN, are trained as continuous classifiers for the purpose of muon identification. First, shallow artificial neural networks (SNN) are discussed. A shallow artificial neural network is defined by the presence of a single hidden layer of neurons between the input and output layers. The universal approximation theorem [8, 9] states that a feed-forward ANN with certain activation functions and at least one hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of Rⁿ. However, the universal approximation theorem makes no claim about the size of the hidden layer required to approximate a given function. In practice, the number of neurons in the hidden layer (N_H) may need to be intractably large to approximate the desired function with acceptable error. In addition, with an increasing number of neurons, the risk of over-training can increase, resulting in a model capable of representing the input data with small error but with very poor generalization performance.

In this section, an exploration of the performance of a large set of SNNs as a function of the number of neurons in their hidden layer is presented. The models are trained using the Toolkit for Multivariate Data Analysis with ROOT (TMVA) [10]. Table 3 lists the parameters used in the training phase for all models. Each model is trained on a random subset of 100K signal events and 100K background events. A disjoint testing sample is drawn from 250K signal and background events.

Figure 7: An example of a dense multilayer perceptron neural network architecture, with inputs p_T, charge, DCA, nσπ, MTD ∆Y, MTD ∆Z, MTD ∆TOF, MTD cell, MTD backleg, and MTD module, plus a bias term. The shallow neural networks have only a single hidden layer of neurons between the input and output layers; the deep neural networks have two or more. Bias neurons in the hidden layers are marked with a "B".
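To make the architecture concrete, the forward pass of such a dense MLP, using the "sum" input function and tanh activation listed in Table 3, can be sketched in a few lines. The weights below are random placeholders, not a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights, biases):
    """Forward pass of a dense MLP: each layer computes tanh(W @ a + b),
    i.e. a weighted sum followed by a tanh activation. The output layer
    is squashed the same way, so the response lies in (-1, 1)."""
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(W @ a + b)
    return a

n_inputs, n_hidden = 10, 20
# One hidden layer of 20 neurons -> a shallow network; appending more
# (W, b) pairs to these lists would make the network deep.
weights = [rng.normal(size=(n_hidden, n_inputs)) * 0.1,
           rng.normal(size=(1, n_hidden)) * 0.1]
biases = [np.zeros(n_hidden), np.zeros(1)]

x = rng.normal(size=n_inputs)  # one stand-in feature vector
response = mlp_forward(x, weights, biases)
```

The bias vectors play the role of the "B" neurons in Fig. 7: they shift each weighted sum before the activation is applied.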
Figure 8: The signal vs. background rejection power as a function of the number of neurons (N_H) in the hidden layer of a shallow neural network. The performance of the SNNs is quantified using the AUC, the area under the background-rejection vs. signal-efficiency curve (see Sect. 5.1). The points are the mean values of 10 models trained with different random samples. The uncertainties show the ±1σ spread of the models assuming a Gaussian variance.

For each value of N_H, 10 models were trained with different randomized training and testing samples. The performance of each trained SNN is quantified using the area under the curve (AUC) of the background rejection versus signal efficiency distribution (higher is better). The results of the SNN scan are summarized in Fig. 8, where the AUC is shown as a function of N_H. Each point shows the mean response of 10 models, with uncertainties that show the 1σ variation between the responses of the 10 models assuming a Gaussian variance. The background rejection power of the SNN shows clear improvement as N_H is increased up to N_H ≈ 30. Above N_H ≈ 30, adding more neurons provides progressively smaller improvements in the background rejection power.
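A scan like this can be reproduced in spirit with any ML toolkit. Below is a small sketch using scikit-learn in place of TMVA, on a synthetic two-Gaussian dataset standing in for the labeled signal and background samples; the AUC values it prints are illustrative only, not the values in Fig. 8.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Synthetic stand-in for the labeled samples: two overlapping Gaussian
# blobs in a 10-dimensional feature space (background=0, signal=1).
n = 2000
X = np.vstack([rng.normal(0.0, 1.0, size=(n, 10)),
               rng.normal(0.7, 1.0, size=(n, 10))])
y = np.r_[np.zeros(n), np.ones(n)]

# Shuffle, then hold out a disjoint testing sample.
idx = rng.permutation(2 * n)
X_train, y_train = X[idx[:3000]], y[idx[:3000]]
X_test, y_test = X[idx[3000:]], y[idx[3000:]]

for n_hidden in (2, 10, 30):
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), activation="tanh",
                        max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"N_H = {n_hidden:3d}: AUC = {auc:.3f}")
```

Averaging several such trainings per N_H value, as done for Fig. 8, additionally quantifies the run-to-run spread from the random initialization and sampling.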
Deep neural networks (DNN), in contrast to SNNs, which contain only a single hidden layer, contain two or more hidden layers. The additional hidden layers can allow a network to learn complex relationships between input features with far fewer neurons and connections than a shallow network would need. Depending on the application, it is also common for DNNs to combine various types of layers, such as convolutional layers, to promote the learning of specific types of relationships.

Table 3: Parameters used in the training phase for the shallow and deep neural networks.

Neuron activation function: tanh
Estimator type: mean square
Neuron input function: sum
Training method: back-propagation
Learning rate: 0.02
Decay rate: 0.01
Learning mode: sequential

The optimal DNN architecture was determined by scanning the hyperparameters against three criteria:

• signal vs. background rejection power;
• prefer the simplest NN architecture (fewer neurons is better and fewer hidden layers is better);
• prefer a monotonically increasing S/B as a function of NN response.

These three criteria are considered to determine the optimal set of DNN hyperparameters. Each DNN was trained using the parameters listed in Table 3, with only the architecture-related parameters varying. Training DNNs can require significantly more time and larger labeled samples compared to SNNs to reach convergence. The DNNs were trained with 1M signal and 1M background events and took between 10 and 100 times longer to train than the set of SNNs, depending on the specific architecture. However, the time cost required to train DNNs can be greatly reduced by employing modern libraries like TensorFlow that have been heavily optimized for parallelized network training on GPUs [13].
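For comparison, a deep architecture with a few small hidden layers (e.g. three layers of 14 neurons, as quoted for the DNN in Fig. 9) can be trained the same way as a shallow one. This is again a scikit-learn sketch on synthetic data, not the TMVA training actually used for the paper's results.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)

# Synthetic labeled sample standing in for the simulated tracks.
n = 2500
X = np.vstack([rng.normal(0.0, 1.0, size=(n, 10)),
               rng.normal(0.6, 1.2, size=(n, 10))])
y = np.r_[np.zeros(n), np.ones(n)]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# A shallow 1x20 network versus a deep 3x14 network; tanh matches the
# activation listed in Table 3.
for name, layers in [("shallow 1x20", (20,)), ("deep 3x14", (14, 14, 14))]:
    clf = MLPClassifier(hidden_layer_sizes=layers, activation="tanh",
                        max_iter=600, random_state=0)
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

On real, correlated PID features the deep network's advantage is typically larger than on a toy dataset like this, since the extra layers can build composite features out of the raw inputs.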
5. Results and Applications
In the previous section, neural networks were trained as classifiers for the purpose of separating signal muons from various background sources. The performance of the neural network based classifiers is compared using modified receiver operating characteristic (ROC) curves in Fig. 9, plotting the background rejection power (1 − ε_bg) vs. the signal efficiency (ε_sig). The performance of a classifier can be succinctly summarized with the area under the curve (AUC) of the background rejection vs. signal efficiency curve. An ideal classifier is able to reject 100% of the background while providing 100% signal efficiency and has an AUC of 1. On the other hand, a random-guess classifier has a 50/50 chance of correctly guessing the class and has an AUC of 0.5.

Figure 9: The background rejection (1 − ε_bg) versus the signal efficiency (ε_sig) for several different multivariate classifiers and traditional 1D cuts: ideal classifier (AUC = 1.0), traditional 1D cuts (AUC = 0.661), 1D likelihood ratios (AUC = 0.826), shallow neural network with HL = 20 (AUC = 0.907), boosted decision trees with N = 250 (AUC = 0.948), and deep neural network with HL = 3×14 (AUC = 0.969).

The neural network classifiers shown in Fig. 9 are also compared with classifiers employing optimized 1D cuts, 1D likelihood ratios, and boosted decision trees (BDTs). The cuts used in the 1D cut classifier were optimized on the J/ψ peak in p+p collisions at √s = 200 GeV. Both the 1D likelihood ratio classifier and the BDTs were trained using the TMVA package. The 1D likelihood ratio classifier was trained with default parameters, using spline interpolation when building the feature PDFs. The track p_T and charge (q) variables were removed from the 1D likelihood classifiers since they should not be used directly for muon identification. Additionally, since 1D likelihoods cannot properly incorporate the p_T dependence of the ∆TOF, ∆Y, and ∆Z features, the 1D likelihood classifier was evaluated only for tracks in a narrow p_T range around 1.4 GeV/c. A more thorough look at using likelihood ratios for muon identification with the MTD can be found in [3]. The BDT classifier was trained with NTrees = 250 and MaxDepth = 5, with all other parameters set to the defaults.
Figure 10: Raw yield extraction of the φ meson using optimized traditional 1D PID techniques, for Run 15 p+p collisions at √s = 200 GeV with |y_µµ| < 0.5, |η_µ| < 0.5, and p_T^µ > 1.1 GeV/c: N_raw^φ ≈ 244.8, mass = 1.015 GeV/c², width = 0.014 GeV/c², S/B = 0.191, S/√(S+B) = 6.270.

Figure 11: Raw yield extraction of the φ meson using the DNN-based PID, for the same selection: N_raw^φ ≈ 281.3, mass = 1.016 GeV/c², width = 0.014 GeV/c², S/B = 0.336, S/√(S+B) = 8.407.

5.2. Muon Identification in Data

The DNN classifier out-performed the other multivariate classifiers investigated in Sect. 4, based on an analysis of the background rejection power vs. signal efficiency evaluated on a testing sample of simulated events. We can further test the performance of the DNN classifier by applying it to the dimuon data collected from p+p collisions at √s = 200 GeV. The decays of resonances to muons, like the φ → µ+µ− decay, provide a self-analyzing set of data for testing muon identification techniques. Muon pairs are selected in the data by first evaluating the DNN response for all muon candidates in an event. Pairs are then formed from oppositely charged muons. Signal pairs are selected based on the pair DNN response r_pair:

r_pair = √(r_a² + r_b²)    (1)

where r_a and r_b are the DNN responses for the paired muons a and b, respectively. The DNN was specifically optimized to promote a response of r ≈ 1 for signal and r ≈ 0 for background, so the maximum response for a µ+µ− pair is r_pair ≈ √2. The optimal r_pair cut for selecting φ → µ+µ− decays was determined by maximizing the φ significance (S/√(S+B)) in steps of r_pair = 0.01. The signal and background contributions were extracted by fitting the raw µ+µ− invariant mass spectrum in the φ mass region (M_µµ > 0.85 GeV/c²). A 4th-order polynomial was used to model the background and a Gaussian was used for the φ meson peak. The optimal cut was found to be r_pair > 1.36, which provides a φ meson significance of ∼8.4 and an S/B ratio of 0.33. Figures 10 and 11 show the raw φ meson yield extraction fits using the traditional 1D cuts optimized on the J/ψ and using the DNN-based muon identification, respectively. The DNN-based muon identification simultaneously provides higher
S/B ratio, significance, and signal efficiency compared to the optimized 1D muon identification. In Fig. 12, the raw µ+µ− invariant mass spectrum is shown for optimized 1D cut-based muon identification and compared with the DNN-based muon identification.

Figure 12: Comparison of the raw M_µµ invariant mass distribution using optimized 1D cut-based muon identification versus the DNN-based muon identification, for Run 15 p+p collisions at √s = 200 GeV with p_T^µ > 1.0 GeV/c, |η_µ| < 0.5, and |y_µµ| < 0.5 (STAR preliminary). The distributions are scaled to each other in an intermediate-mass window starting at M_µµ = 1.5 GeV/c² to make the comparison easier.

In addition to improving the S/B and significance of the ω and φ mesons, the DNN-based muon identification allows the ψ(2S) to become visible.

Since no individual feature among the set of PID features clearly separates signal from background contributions, it is not possible to fit any one of the features in order to extract the muon purity of tracks in data. Given the signal and background PDFs for each of the 8 PID features (neglecting p_T and q), one could in principle conduct a simultaneous fit to all 8 distributions in order to extract the yields of the signal and background contributions. Since each distribution would need to be fit with µ, π, K, and p contributions, this would require simultaneously fitting 8 distributions with 32 templates constrained by 4 free yield parameters. While possible, in practice a simultaneous fit with so many distributions and templates is technically challenging and often proves unstable.

Instead, the complexity of the problem can be greatly reduced by simply fitting the DNN response for muon candidates with the template shapes for the signal and background components. Since the DNN combines all PID features into a single response, only a single distribution needs to be fit, with the 4 template shapes for the signal and background species each carrying a free yield parameter. Figure 13 shows the result of this procedure applied to muon candidate tracks in the range 1.5 < p_T < 1.55 GeV/c. The template for each component is computed by evaluating the DNN on simulated tracks in the same kinematic region as those in the data. The data/fit ratio shown in the lower panel of Fig. 13 shows that the fit is capable of describing the DNN response for muon candidates to within ∼20% over the entire range of DNN responses.

After determining the yield of each signal and background contribution, the DNN response can be projected back onto all of the 8 PID features to verify that the DNN is properly combining the information from all variables. Ensuring that the projection onto each PID feature results in a good description of the data is a strong demonstration that the DNN is not over-training on artifacts in the training samples. Projections onto the ∆Z and DCA features are shown in Figs. 14a and 14b. This technique allows the increased signal vs. background separation power provided by the DNN-based muon identification to be leveraged for data-driven muon purity measurements. At the same time, the ability to project the muon purity fit results back onto the PID features provides a data-driven strategy to test for over-training and poor model generalization.
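A template fit of this kind can be prototyped compactly. The sketch below uses non-negative least squares on binned histograms as a simple stand-in; the template shapes, the yields, and the choice of `nnls` are all illustrative assumptions, not the fit actually performed on the STAR data (a binned Poisson-likelihood fit would be the more rigorous choice).

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
nbins = 50
edges = np.linspace(0.0, 1.0, nbins + 1)

def template(loc, scale):
    """Build a unit-area DNN-response template from toy simulated tracks.
    The locations/widths below are made up for illustration."""
    samples = np.clip(rng.normal(loc, scale, 200_000), 0.0, 1.0)
    h, _ = np.histogram(samples, bins=edges)
    return h / h.sum()

# Hypothetical templates: signal muons peak near a response of 1,
# hadron backgrounds pile up near 0.
T = np.column_stack([template(0.90, 0.15),   # mu
                     template(0.15, 0.15),   # pi
                     template(0.30, 0.20),   # K
                     template(0.20, 0.15)])  # p

# Toy "data": a Poisson fluctuation around a known yield mixture.
true_yields = np.array([3300.0, 5300.0, 900.0, 400.0])
data = rng.poisson(T @ true_yields)

# Solve min ||T @ yields - data|| subject to yields >= 0.
fitted, _ = nnls(T, data.astype(float))
print(dict(zip(["mu", "pi", "K", "p"], np.round(fitted))))
```

Because the templates are normalized to unit area, the fitted parameters come out directly as per-species yields; dividing by their sum gives the purities. Note that strongly overlapping background templates (as for π and p here) can trade yield against each other, which is why projecting the result back onto the individual PID features is a valuable cross-check.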
Figure 13: The top panel shows the DNN response for muon candidates in the range 1.5 < p_T < 1.55 GeV/c. A template fit is conducted to extract the contributions from µ (red), π (blue), K (orange), and p (magenta); the fitted yields are µ: 0.334 ± 0.017, π: 0.532 ± 0.014, K: 0.086 ± 0.004, and p: 0.037, with χ²/NDF = 244.46/106 = 2.31. The lower panel shows the ratio of the data over the sum of the contributions.

Figure 14: The result of the DNN response fit for the µ, π, K, and p contributions projected back onto the ∆Z (a) and DCA (b) distributions. The ratio of fit over data is shown in the lower panels of each figure.

6. Summary

The installation of the Muon Telescope Detector has made muon identification possible at STAR over a large p_T range. With only a single layer of steel acting as a hadron absorber, backgrounds from hadron punch-through and weak decays make primary muon identification challenging. Several quantities measured by the STAR tracker and MTD are used to train shallow and deep neural network classifiers for the purpose of muon identification. The deep neural network classifier out-performed the other multivariate classifiers investigated in Sect. 4, based on an analysis of the background rejection power vs. signal efficiency evaluated on a testing sample of simulated events. When applied to dimuon-triggered p+p collisions at √s = 200 GeV, the DNN-based PID simultaneously provides higher S/B ratio, significance, and efficiency for the φ-meson yield extraction. At higher masses, the DNN-based muon identification makes the ψ(2S) state significantly more visible in the raw M_µµ distribution compared to optimized 1D cut-based muon identification. Finally, an application of the trained DNN for data-driven muon purity measurements is presented.
7. Acknowledgements
We thank the STAR Collaboration for the use of the experimental data shown in this paper and for the operation of this system during RHIC running periods as part of STAR standard shift crew operations. This work was funded by the U.S. DOE Office of Science under contract No. DE-FG02-10ER41666.
References

[1] K. H. Ackermann et al. (STAR Collaboration), Nucl. Instr. and Meth. A 499 (2003) 624.
[2] M. Anderson et al. (STAR Collaboration), Nucl. Instr. and Meth. A 499 (2003) 659–678.
[3] T. Huang, R. Ma, B. Huang, et al., Nucl. Instr. and Meth. A (2016) 88–93.
[4] W. Llope, F. Geurts, J. Mitchell, et al., Nucl. Instr. and Meth. A (2004) 252–273.
[5] T. Todoroki, Nucl. Phys. A 967 (2017) 572–575.
[6] C. Yang, X. J. Huang, C. M. Du, et al., Nucl. Instr. and Meth. A (2014) 1–6.
[7] R. Brun, A. C. McPherson, P. Zanarini, et al., GEANT3, CERN Program Library Long Writeup W5013.
[8] C. Debao, Approx. Theory its Appl. (1993) 17–28.
[9] K. Hornik, M. Stinchcombe, H. White, Neural Networks 2 (1989) 359–366.
[10] J. Therhaag, AIP Conf. Proc. (2012) 1013–1016.
[11] B. Efron, J. Am. Stat. Assoc. (1987) 171–185.
[12] B. Efron, Ann. Stat. 7 (1979) 1–26.
[13] M. Abadi, A. Agarwal, P. Barham, et al., TensorFlow: Large-scale machine learning on heterogeneous systems.