A neural network classifier for electron identification on the DAMPE experiment
David Droz, Andrii Tykhonov, Xin Wu, Francesca Alemanno, Giovanni Ambrosi, Enrico Catanzani, Margherita Di Santo, Dimitrios Kyratzis, Stephan Zimmer
Prepared for submission to JINST
D. Droz,ᵃ A. Tykhonov,ᵃ X. Wu,ᵃ F. Alemanno,ᵇ,ᶜ G. Ambrosi,ᵈ E. Catanzani,ᵈ,ᵉ M. Di Santo,ᶠ,ᵍ D. Kyratzis,ᵇ,ᶜ S. Zimmerᵃ

ᵃ University of Geneva, CH-1205 Geneva, Switzerland
ᵇ Gran Sasso Science Institute (GSSI), Via Iacobucci 2, I-67100 L'Aquila, Italy
ᶜ Istituto Nazionale di Fisica Nucleare (INFN) - Laboratori Nazionali del Gran Sasso, I-67100 Assergi, L'Aquila, Italy
ᵈ Istituto Nazionale di Fisica Nucleare (INFN) - Sezione di Perugia, I-06123 Perugia, Italy
ᵉ Dipartimento di Fisica e Geologia, Università di Perugia, I-06123 Perugia, Italy
ᶠ Dipartimento di Matematica e Fisica "E. De Giorgi", Università del Salento, I-73100 Lecce, Italy
ᵍ Istituto Nazionale di Fisica Nucleare (INFN) - Sezione di Lecce, I-73100 Lecce, Italy
E-mail: [email protected], [email protected]

Abstract: The Dark Matter Particle Explorer (DAMPE) is a space-borne particle detector and cosmic ray observatory in operation since 2015, designed to probe electrons and gamma rays from a few GeV to 10 TeV in energy, as well as cosmic protons and nuclei up to 100 TeV. Among its main scientific objectives is the precise measurement of the cosmic electron+positron flux, which, due to the very large proton background in orbit, requires a powerful particle identification method. In the past decade, the field of machine learning has provided the needed tools. This paper presents a neural network based approach to cosmic electron identification and proton rejection, and showcases its performance on simulated Monte Carlo data. The neural network reaches a significantly lower background than the classical, cut-based method for the same detection efficiency, especially at the highest energies. A good match between simulations and real data completes the picture.

Keywords: Particle identification methods

Introduction

The DArk Matter Particle Explorer (DAMPE, also known as Wukong in China) is a satellite-based cosmic ray observatory and gamma ray telescope [1]. It can measure cosmic electrons up to an energy of 10 TeV, and protons and heavier nuclei up to a hundred TeV. It consists of four subdetectors, from top to bottom: a plastic scintillator (PSD) for absolute charge measurement; a silicon-tungsten tracker-converter (STK) for precise direction measurement and for enabling photon pair production; a Bismuth Germanium Oxide imaging calorimeter (BGO) of about 32 radiation lengths, made of 308 hodoscopically arranged bars in 14 layers, used for energy measurement, particle identification, and triggering [2]; and a neutron detector (NUD) for improving the identification of hadronic showers [1].
The DAMPE satellite was launched into a 500-km Sun-synchronous orbit in December 2015, and has been in stable operation since then [3].
Figure 1. Layout of the DAMPE detector system. Figure from [4].

Among the main scientific objectives of DAMPE is the study of cosmic ray electrons plus positrons (CREs) in the TeV range. Precise measurement of the CRE spectrum is crucial for understanding the mechanisms of cosmic ray production and propagation [5–7]. CREs represent a perfect probe of the most energetic processes in the nearby Galaxy, such as supernova explosions [8], and may enable the observation of phenomena such as dark matter decay or annihilation [9, 10]. The CRE spectrum was first measured by DAMPE in 2017 up to an energy of 4.6 TeV, with the direct detection of a spectral break in the TeV region [11]. This result was obtained using a classical, cut-based technique for electron/proton discrimination. That method, while powerful enough to reject the proton background up to a few TeV with acceptable uncertainty, could not fully exploit the particle identification capability of the detector. After more than four years of operation, and thus much larger accumulated data statistics, DAMPE is capable of measuring the CRE spectrum up to an unprecedented energy of 10 TeV with the best achieved energy resolution. However, the standard identification technique [11] does not have sufficient electron/proton discrimination power to be used at such extreme energies. A new method is therefore required, and we propose one coming from the ever-growing toolbox of data science and artificial intelligence.
The field of data science underwent incredible developments in the past decade. The exponential growth of computing power, the advent of the big data era, and improvements in existing algorithms are allowing machine learning and artificial intelligence to conquer a variety of fields [12], where they may even outperform human experts: computer vision [13], self-driving cars [14], or, notoriously, playing Go [15].
In your everyday life, you most likely use systems powered by machine learning algorithms, from web searches [16] to emails [17] and to your phone's virtual assistant [18].
Behind these advances and successes is a set of techniques known as deep learning and deep neural networks [19, 20]. The various flavours of neural networks nowadays represent the state of the art in most of data science. They have found their way into high energy physics [21, 22], where the large size and high dimensionality of the data make them much welcome additions, and into astrophysics [23, 24], where their pattern recognition power allows the extraction of additional information from telescope images. However, neural networks have only seldom been used in cosmic ray physics, and never exploited for CRE direct detection experiments at multi-TeV energies.
In this paper we propose to use neural networks for the problem of electron identification on the DAMPE experiment. In section 2 we develop the electron identification technique based on a neural network classifier. In section 3 we demonstrate the performance of the developed classifier on Monte Carlo simulations and compare it with the standard cut-based technique. In section 4 we validate the developed classifier with four years of DAMPE orbit data and demonstrate that the new method allows a substantial enhancement of electron identification, enabling the first CRE spectrum measurement with DAMPE in the 10 TeV energy domain. Finally, the work and the results are summarised in section 5.
Measuring cosmic electrons in orbit first requires identifying them among the many sources of background [25]. Helium and heavier nuclei can be rejected by measuring the absolute electric charge, which is done in DAMPE using the signal deposited inside the plastic scintillator (PSD). The PSD cut is complemented by a charge measurement in the silicon tracker (STK), allowing the rejection of heavy nuclei outside of the PSD fiducial region. Gamma rays are another source of background; however, their flux is orders of magnitude weaker than the CRE flux, especially at energies above several hundred GeV. They are rejected thanks to their electric charge of zero: a gamma ray penetrating the PSD will not leave any signal, as opposed to other particle species.
Protons, however, have the same absolute electric charge as electrons, meaning they cannot be rejected with these methods. Moreover, protons are the most abundant cosmic ray species, with a flux orders of magnitude higher than that of electrons, making the latter a rare signal drowned in a sea of background events. The exploitable difference between these two particles lies in the physical processes at play when they interact with matter: protons produce a wide and deep hadronic shower, while electrons produce narrower electromagnetic showers, resulting in different signal topologies in the BGO calorimeter.
Classically, one can build observables that quantify the shower shape inside the calorimeter, and use them to reject protons. The first DAMPE cosmic electron spectrum measurement [11] is based on a single observable named 𝜁 that combines the maximum shower depth and the shower horizontal spread [26].
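To illustrate the kind of shower-shape observable involved, the following sketch computes an energy-weighted lateral RMS in a single calorimeter layer. The bar positions and energies here are hypothetical, purely illustrative values, not DAMPE data or the actual 𝜁 definition:

```python
import numpy as np

def layer_rms(bar_positions, bar_energies):
    """Energy-weighted RMS of the hit-bar positions in one calorimeter layer.

    A narrow electromagnetic shower concentrates its energy in a few bars
    (small RMS); a wide hadronic shower spreads it out (large RMS).
    """
    pos = np.asarray(bar_positions, dtype=float)
    e = np.asarray(bar_energies, dtype=float)
    mean = np.average(pos, weights=e)
    return float(np.sqrt(np.average((pos - mean) ** 2, weights=e)))

# Hypothetical bar centres (mm) and deposited energies (arbitrary units):
bars = [-50.0, -25.0, 0.0, 25.0, 50.0]
narrow = [0.5, 10.0, 100.0, 12.0, 0.5]   # electron-like deposit
wide = [20.0, 30.0, 35.0, 30.0, 25.0]    # proton-like deposit

print(layer_rms(bars, narrow) < layer_rms(bars, wide))  # True
```

Observables of this kind, computed layer by layer, feed both the classical 𝜁 variable and the neural network described below.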
While the variable proved powerful, it does not use the entirety of the information available in the detector, including the possibility to exploit the strong correlations between topological variables used to describe the shower development in the imaging calorimeter. Furthermore, its rejection power is limited above the TeV range, where the topological difference between hadronic and electromagnetic showers in the detector is less pronounced. A more powerful method is therefore required, and is presented in this paper.

Neural networks are a new-old technique. First developed in the 1950s–1960s [27], they were quickly abandoned due to their very high requirements in computing power, despite their capabilities as a universal classifier [28]. Neural networks were brought back to life in the 21st century and found their way to the very front of machine learning development, thanks to the abundance of data of all sorts and the capabilities of modern computers.
The classifier we propose is based on a vanilla neural network, composed of a stack of densely connected layers (see below). Such algorithms are sometimes named multilayer perceptrons, feedforward neural networks, or simply artificial neural networks. Other techniques were also studied in the DAMPE collaboration prior to this work: convolutional neural networks (CNNs), suited for pattern recognition and image identification, showed promising performance. While CNNs demonstrated potential for further improvement with respect to the feedforward network, we opted for the latter technique due to the better understood systematics of this type of network, reflected in particular in the data to Monte Carlo agreement of the classifier score distribution [29]. Another technique studied by the collaboration is boosted decision trees, commonly used in high energy physics.
They showed some improvement over the classical method, though optimised for the lower energy domain, around 10 to 100 GeV [30].
An artificial neural network is a stack of densely connected layers of so-called neurons. A neuron is a mathematical unit that applies a non-linear function to a linear combination of its inputs. The function output is then used as input by all the neurons in the next layer, in the case of fully connected networks. Mathematically, if a neuron receives as input a set {𝑋ᵢ}, then its output 𝑦 is:

𝑦 = 𝑓(∑ᵢ 𝑤ᵢ𝑋ᵢ + 𝑏)    (2.1)

where 𝑓 is the non-linear activation function, 𝑤ᵢ are the weights, and 𝑏 is the bias. The activation function 𝑓, as well as the number of neurons and layers, are characteristics of the model decided by the human programmer (section 2.2). The values of 𝑤ᵢ and 𝑏 are set by the machine during the training procedure: the network is exposed to a set of labelled data (training data), where each observation/event is associated to a class (e.g. signal/background), and tries to minimise a given error metric on the set. The minimisation usually follows so-called gradient descent or one of its flavours. In the case of a classification problem, such as the discrimination between protons and electrons, the canonical error metric is the cross-entropy [20].
An artificial neural network takes as input a one-dimensional set of variables {𝑋ᵢ}. In a binary classification, it outputs a value between 0 and 1, which can be interpreted as the probability for an electron (signal) to produce {𝑋ᵢ}. This value is obtained as the output of a sigmoid or sigmoid-like function used as the activation of the very last layer.
In-depth reviews of neural networks can be found in references [19, 20].
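As a concrete illustration of equation 2.1 and of the cross-entropy metric, here is a minimal numpy sketch; the weights are arbitrary random stand-ins, not the trained classifier:

```python
import numpy as np

def neuron(x, w, b, f=np.tanh):
    """Equation 2.1: activation of the weighted sum of the inputs plus a bias."""
    return f(np.dot(w, x) + b)

def dense_layer(x, W, b, f=np.tanh):
    """A fully connected layer: every neuron receives every input."""
    return f(W @ x + b)

def binary_cross_entropy(y_true, y_pred):
    """The canonical error metric minimised during training."""
    p = np.clip(y_pred, 1e-12, 1.0 - 1e-12)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

rng = np.random.default_rng(0)
x = rng.normal(size=5)                      # 5 input variables {X_i}
W, b = rng.normal(size=(3, 5)), rng.normal(size=3)
h = dense_layer(x, W, b)                    # a hidden layer of 3 neurons
print(h.shape)                              # (3,)

# A prediction closer to the true label yields a lower cross-entropy:
print(binary_cross_entropy(np.array([1.0]), np.array([0.99])) <
      binary_cross_entropy(np.array([1.0]), np.array([0.5])))  # True
```

Training then consists of adjusting `W` and `b` by gradient descent so that the cross-entropy over the labelled training set decreases.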
Two critical aspects of the procedure are the choice and preparation of the input data, and the parameters of the neural network architecture (the so-called hyperparameters).
The data available for training and testing the model are Monte Carlo (MC) simulated events, created using the Geant4 package [31] interfaced with the DAMPE software [32] to emulate the detector response to the various cosmic ray species in orbit. The training data was prepared with a set of cleaning cuts to replicate the analysis chain applied to real data: this includes shower containment and fiducial volume selection criteria, as well as cuts to remove Helium nuclei and gamma rays [11]. In this way, the only remaining background for electron identification is protons. The cut flow is completed by selecting only events with 𝜁 < 100, where 𝜁 is the classical classifier from reference [11]. This conservative criterion eliminates events with an obvious proton-like topology while retaining 99.95% of electrons. The motivation behind this final cut is to get rid of easily identified events, such that the neural network can focus on the more complex ones. This preselection yields a neural network with higher discrimination power.
Out of this cleaned-up data, we built a training set of 140,000 events, balanced 50/50 between electrons and protons. The data is normalised by dividing each input variable by its maximum value through the dataset, therefore scaling them to the [
0; 1] range.
We selected the input variables based on several criteria. First, the features had to be descriptive enough of the shower topology to provide as much information as possible to the neural network. Following published results on deep learning for particle physics [21], we opted mostly for low-level variables to maximise said information, but we also added high-level variables, such as the 𝜁 classifier, which we know provides an already powerful proton rejection. The idea is to give the network our best expert guess, and then the low-level ingredients to improve on it. Adding 𝜁 significantly improved the sub-TeV performance. Finally, we required the input variables to be well simulated. We observed that, depending on the set of features, the network could produce slightly different results between MC simulations and real data. We therefore followed an extensive campaign of empirical testing to find and remove the quantities that resulted in such differences. As a result, the selected features include the energy deposited and its RMS distribution in 12 out of the 14 layers of the BGO calorimeter (excluding the top two), the reconstructed energy, the angle of the trajectory, the energy deposited within one Molière radius of an STK track (STK cluster energy), and the classical 𝜁 classifier.

Figure 2. Schematic of the neural network model used in this work. The hidden layers use the ReLU activation function. The logistic sigmoid in the output layer is removed after training.

Along with optimising the set of input variables, we also researched and optimised the architecture of the neural network itself (the model). Building a model indeed requires several parameter choices: the number of neurons and layers, the activation function, the optimiser and regularisers, etc. The optimisation of these hyperparameters is somewhat of an art, and a common practice is to conduct a random gridsearch [33]. We decided to follow this philosophy and tested hundreds of models against each other. The winner of this computing battle royale is a model consisting of 4 layers with 300, 150, 75 and 1 neuron, respectively, regularised with a 10–20% dropout (a technique consisting of randomly turning off neurons during training) [34]. The hidden layers use the Rectified Linear Unit (ReLU) [35] activation function, and the output layer uses the logistic sigmoid function to map the network output to the [
0; 1] range, as is common in binary classification problems. Finally, the model is optimised using the Adam gradient descent algorithm [36] against the cross-entropy metric. The architecture is represented in figure 2 and table 1.
The extensive training campaigns were conducted on the Baobab computing cluster of the University of Geneva, using Nvidia Titan X GPUs. On the software side, we used Nvidia cuDNN [37], Keras [38] with Theano [39] as a backend, and Scikit-Learn [40]. Google's TensorFlow [41] was considered as well, but internal benchmarks with our models and data showed no gain in performance for a longer computing time.

Layer          Neurons   Parameters
Dense          300       9000
Dropout, 20%   –         0
Dense          150       45150
Dropout, 10%   –         0
Dense          75        11325
Dropout, 10%   –         0
Dense          1         76
Total parameters: 65,551
Trainable parameters: 65,551
Non-trainable parameters: 0

Table 1. Layer-per-layer summary of the neural network used.

A feature we noticed during the early stages of our optimisation procedure is that the neural network output values are either very close to (or exactly equal to) 0.0 or 1.0, with only very few events classified in between. This holds true for false positives and false negatives as well: figure 3 (left) shows a peak of misclassified proton events at an output score of 1.0.¹ The overall distribution is therefore non-monotonic, which introduces complications for the estimation of the background in a cosmic ray electron measurement. For example, this behaviour prevents any sort of baseline background extrapolation.
The cause is a feature of neural networks for classification: the very last operation is a logistic sigmoid function that maps the output to the [0; 1] range:

𝑓(𝑥) = 1 / (1 + 𝑒⁻ˣ)    (2.2)

Values 𝑥 ≫ 0 give 𝑓(𝑥) ≃ 1 and values 𝑥 ≪ 0 give 𝑓(𝑥) ≃ 0, effectively compressing the output into a limited, finite space. Computer floating point accuracy also has an influence: for a 16-bit float, the sigmoid evaluates to exactly 1.0 or 0.0 once |𝑥| ≳ 10. Our solution is to remove the sigmoid from the last layer after training and use its unbounded input as the classifier score (figure 3, right); since the sigmoid is monotonic, the classification performance is strictly unchanged (figure 4). Without this trick, proton events are mapped onto the whole space [0; 1], meaning that even the tightest possible cut at 1.0 will still have a non-zero false positive rate.

¹ These events are likely protons that transfer most of their energy to one or several 𝜋⁰, starting electromagnetic showers while the remaining energy yields a very small hadronic contribution. They are therefore the most difficult background to distinguish from electron-induced showers.
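The layer structure of table 1 and the post-training sigmoid removal can be sketched in a few lines of numpy. The weights here are random stand-ins; only the architecture (300–150–75–1, ReLU hidden layers, sigmoid output) follows the text, and the 29-dimensional input is our inference from the 9000 parameters of the first dense layer under the usual (n_in + 1) × n_out counting, not a figure stated in the paper:

```python
import numpy as np

relu = lambda v: np.maximum(v, 0.0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

# Architecture of table 1; 29 inputs inferred from (29 + 1) * 300 = 9000
sizes = [29, 300, 150, 75, 1]
params = [(n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]
print(params, sum(params))  # [9000, 45150, 11325, 76] 65551

rng = np.random.default_rng(1)
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(scale=0.1, size=n_out) for n_out in sizes[1:]]

def score(x, with_sigmoid=True):
    """Forward pass; dropout is only active during training, so it is absent here."""
    for W, c in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + c)                  # hidden layers: ReLU
    z = (weights[-1] @ x + biases[-1])[0]    # unbounded pre-activation score
    return sigmoid(z) if with_sigmoid else z

events = [rng.normal(size=29) for _ in range(50)]
scores_on = np.array([score(e, True) for e in events])
scores_off = np.array([score(e, False) for e in events])

# The sigmoid is strictly monotonic: removing it reorders no events, so any
# efficiency-based cut, and hence the ROC curve, is unaffected.
print(np.array_equal(np.argsort(scores_on), np.argsort(scores_off)))  # True

# Saturation: in 16-bit floats the sigmoid collapses to exactly 1.0 by x = 12
print(np.float16(sigmoid(12.0)) == np.float16(1.0))  # True
```

The parameter counts reproduce the 65,551 trainable parameters of table 1, and the last two checks demonstrate why the sigmoid can be dropped after training without any loss.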
Figure 3. Histogram of the neural network output on electron and proton Monte Carlo: (left) vanilla model with a sigmoid activation function at the output of the last layer, with a peak of false positive outcomes in the proton Monte Carlo highlighted with a circle; (right) the same model after removing (post-training) the sigmoid at the last layer.

Figure 4. Sample Receiver Operating Characteristic (ROC) curves of a model with a sigmoid activation function at its output versus a model without. The curves compare the signal efficiency (true positive rate) versus the remaining background (false positive rate) at varying discrimination threshold. The perfectly superimposed curves prove that the performances are exactly the same.
We report in figure 5 several Receiver Operating Characteristic (ROC) curves for our neural networks, in comparison with the classical 𝜁 method, covering the energy range from 24 GeV to 10.4 TeV. ROC curves are obtained by computing classification metrics at various discrimination thresholds on the test sample. In our case we choose to plot the signal efficiency against the remaining background. Both quantities are defined as the ratio of passing events over total events, for electrons and protons respectively:

Signal efficiency = N_e,pass / N_e
Remaining background = N_p,pass / N_p

These two metrics have the advantage of being independent of the relative abundance of electrons with respect to protons. A good classifier is one that maximises the first metric and/or minimises the second. This translates into a lower curve in figure 5: classifiers with the lowest curves have the smallest background for a set efficiency. The figure shows that the neural network significantly outperforms the classical method in the lowest and highest energy ranges, while the performances appear roughly comparable at intermediate energies.

Figure 5. Sample ROC curves on Monte Carlo for the neural network classifier and the classical 𝜁 classifier, from 24 GeV to 10.4 TeV. The lowest curves have the lowest background for a fixed efficiency. The coloured band shows the statistical uncertainty from Monte Carlo sampling. Logarithmic y-scale.

Figure 6. Energy dependency of the surviving background fraction for a fixed signal efficiency of 85% (left) and 95% (right), for the neural networks and the classical 𝜁 method.

Note that for both classifiers, N_e and N_p are taken after the 𝜁 <
100 cut, for a fair comparison.
The performances are thus energy-dependent. To see this dependence and to better quantify the performance of both methods, we report in figure 6 the remaining background when the discrimination threshold is set so as to obtain an 85% or 95% signal efficiency, as a function of the energy reconstructed by the BGO calorimeter. The comparison involves an uncertainty due to the efficiencies of the two classifiers not being perfectly equal. In the figure, the error bars associated with 𝜁 show the statistical uncertainty from Monte Carlo sampling, the darker band shows the uncertainty associated with the choice of threshold to obtain a compatible efficiency, and the lighter band is the combination of both. The blue band associated with the neural networks is purely statistical. Figure 6 confirms the previous observation that the gains of the neural networks are significant at both ends of the energy range, in the high efficiency regime. From a few hundred GeV to 2 TeV, the performances are within uncertainty of each other. Above 5 TeV, the proton rejection is improved by a factor of at least 2.
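The two metrics above translate directly into code. A self-contained sketch on toy Gaussian scores (illustrative values, not the actual Monte Carlo samples) computes both metrics across thresholds, and shows how fixing the signal efficiency, as in figure 6, amounts to picking a quantile of the signal score distribution:

```python
import numpy as np

def roc_points(signal_scores, background_scores, thresholds):
    """Signal efficiency and remaining background at each threshold.

    Both are ratios of passing over total events, so they do not depend on
    the relative abundance of electrons and protons in the sample.
    """
    sig = np.asarray(signal_scores)
    bkg = np.asarray(background_scores)
    eff = np.array([(sig >= t).mean() for t in thresholds])  # N_e,pass / N_e
    rem = np.array([(bkg >= t).mean() for t in thresholds])  # N_p,pass / N_p
    return eff, rem

rng = np.random.default_rng(2)
electrons = rng.normal(loc=3.0, size=10_000)   # toy signal scores (sigmoid removed)
protons = rng.normal(loc=-3.0, size=10_000)    # toy background scores

thresholds = np.linspace(-8.0, 8.0, 200)
eff, rem = roc_points(electrons, protons, thresholds)

# A fixed 85% signal efficiency corresponds to the 15% quantile of the
# signal scores; the remaining background is then read off at that threshold.
t85 = np.quantile(electrons, 1.0 - 0.85)
print(round((electrons >= t85).mean(), 2))  # 0.85
```

Plotting `rem` against `eff` reproduces a ROC curve of the kind shown in figure 5; scanning `t85` over energy bins yields the curves of figure 6.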
Figure 7. Comparison of the neural network output distribution between simulated Monte Carlo and real data, in six energy bins from 24 GeV to 10.6 TeV.
Figure 8. Comparison of the neural network output distribution between simulated Monte Carlo and real data, in six energy bins from 24 GeV to 10.6 TeV. Logarithmic y-scale.
Model validation
Raw performance on Monte Carlo is only half of the picture. The other half is whether the classifier can be trusted on real data, that is, whether its output on simulations is compatible with its output on real data. The distributions for both simulations and real (flight) data are shown in figures 7 and 8, in six logarithmically spaced energy bins. Both samples have been cleaned such that contributions from other cosmic species, notably Helium nuclei, are negligible (section 2). A sum of proton and electron Monte Carlo templates was fitted to the data in the following way: the proton sample was scaled to the data in a proton-dominated region of low classifier score, while the electron MC was scaled to the data in an electron-dominated region of high classifier score.

Conclusion

To tackle the problem of high energy cosmic electron identification and measurement with DAMPE, we developed a four-layer deep feed-forward neural network classifier, the output of which we transformed to suit the needs of a particle physics analysis. On simulated data, the new classifier shows a strong background rejection power at a high signal efficiency of 95%, and thus outperforms the more traditional cut-based electron identification technique in the energy ranges where the latter shows its limits, thanks to a better exploitation of the information contained within the DAMPE calorimeter and tracker. In particular, the proton rejection improves by a factor of at least 2 above 5 TeV.

Acknowledgments
The DAMPE mission was funded by the strategic priority science and technology projects in space science of the Chinese Academy of Sciences. In China, the data analysis was supported in part by the National Key Research and Development Program of China (no. 2016YFA0400200), the National Natural Science Foundation of China (nos. 11525313, 11622327, 11722328, U1738205, U1738207, and U1738208), the strategic priority science and technology projects of the Chinese Academy of Sciences (no. XDA15051100), the 100 Talents Program of the Chinese Academy of Sciences, and the Young Elite Scientists Sponsorship Program. In Europe, the activities and the data analysis were supported by the Swiss National Science Foundation (SNSF), Switzerland, the National Institute for Nuclear Physics (INFN), Italy, and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 851103).
The computations presented in this document were performed at the University of Geneva on the Baobab cluster, with significant help from computer engineer Y. Meunier and from the HPC team. Simulations were performed on the INFN CNAF and ReCaS clusters, Italy, and on the Swiss National Supercomputing Centre (CSCS) Piz Daint (project s979).
The corresponding author would like to acknowledge SARS-CoV-2 for making this work significantly harder than it should have been.
References

[1] J. Chang et al. The DArk Matter Particle Explorer mission.
Astropart. Phys. , 95:6–24, 2017.[2] Zhiyong Zhang, Yunlong Zhang, Jianing Dong, Sicheng Wen, Changqing Feng, Chi Wang, YifengWei, Xiaolian Wang, Zizong Xu, and Shubin Liu. Design of a high dynamic range photomultiplierbase board for the bgo ecal of dampe.
Nuclear Instruments and Methods in Physics Research SectionA: Accelerators, Spectrometers, Detectors and Associated Equipment , 780:21–26, 2015.[3] G Ambrosi, Q An, R Asfandiyarov, P Azzarello, P Bernardini, MS Cai, M Caragiulo, J Chang,DY Chen, HF Chen, et al. The on-orbit calibration of dark matter particle explorer.
AstroparticlePhysics , 106:18–34, 2019.[4] A Tykhonov, G Ambrosi, R Asfandiyarov, P Azzarello, P Bernardini, B Bertucci, A Bolognini,F Cadoux, A D’Amone, A De Benedittis, et al. Internal alignment and position resolution of thesilicon tracker of dampe determined with orbit data.
Nuclear Instruments and Methods in PhysicsResearch Section A: Accelerators, Spectrometers, Detectors and Associated Equipment , 893:43–56,2018.[5] Peter Meyer. Cosmic rays in the galaxy.
Annual review of astronomy and astrophysics , 7(1):1–38,1969.[6] Igor V Moskalenko and AW Strong. Production and propagation of cosmic-ray positrons andelectrons.
The Astrophysical Journal , 493(2):694, 1998.[7] Yi-Zhong Fan, Bing Zhang, and Jin Chang. Electron/positron excesses in the cosmic ray spectrumand possible interpretations.
International Journal of Modern Physics D , 19(13):2011–2058, 2010.[8] T Kobayashi, Y Komori, K Yoshida, and J Nishimura. The most likely sources of high-energycosmic-ray electrons in supernova remnants.
The Astrophysical Journal , 601(1):340, 2004.[9] Gianfranco Bertone, Dan Hooper, and Joseph Silk. Particle dark matter: Evidence, candidates andconstraints.
Physics reports , 405(5-6):279–390, 2005.[10] Jonathan L Feng. Dark matter candidates from particle physics and methods of detection.
Annual Review of Astronomy and Astrophysics, 48:495–545, 2010.
[11] G. Ambrosi et al. Direct detection of a break in the teraelectronvolt cosmic-ray spectrum of electrons and positrons.
Nature , 552:63–66, 2017.[12] Stephen Marsland.
Machine learning: an algorithmic perspective . Chapman and Hall/CRC, 2014.[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassinghuman-level performance on imagenet classification. In
Proceedings of the IEEE internationalconference on computer vision , pages 1026–1034, 2015.[14] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, PrasoonGoyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning forself-driving cars. arXiv preprint arXiv:1604.07316 , 2016.[15] Jim X Chen. The evolution of computing: Alphago.
Computing in Science & Engineering , 18(4):4,2016.[16] Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. Learning deepstructured semantic models for web search using clickthrough data. In
Proceedings of the 22nd ACMinternational conference on Information & Knowledge Management , pages 2333–2338. ACM, 2013.[17] Thiago S Guzella and Walmir M Caminhas. A review of machine learning approaches to spamfiltering.
Expert Systems with Applications , 36(7):10206–10222, 2009.[18] Veton Kepuska and Gamal Bohouta. Next-generation of virtual personal assistants (microsoftcortana, apple siri, amazon alexa and google home). In , pages 99–103. IEEE, 2018.[19] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. Deep learning.
Nature , 521(7553):436–444,2015.[20] Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Deep Learning . MIT Press, 2016. .[21] Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for Exotic Particles in High-EnergyPhysics with Deep Learning.
Nature Commun. , 5:4308, 2014.[22] Dan Guest, Kyle Cranmer, and Daniel Whiteson. Deep learning and its application to lhc physics.
Annual Review of Nuclear and Particle Science , 68:161–181, 2018.[23] C Schaefer, M Geiger, T Kuntzer, and J-P Kneib. Deep convolutional neural networks as stronggravitational lens detectors.
Astronomy & Astrophysics , 611:A2, 2018.[24] Daniel George and EA Huerta. Deep neural networks to enable real-time multimessengerastrophysics.
Physical Review D , 97(4):044039, 2018.[25] Claus Grupen.
Astroparticle physics . Springer Science & Business Media, 2005.[26] J. Chang et al. Resolving electrons from protons in ATIC.
Adv. Space Res. , 42:431–436, 2008.[27] Jürgen Schmidhuber. Deep learning in neural networks: An overview.
Neural networks , 61:85–117,2015.[28] G. Cybenko. Approximation by superpositions of a sigmoidal function.
Mathematics of Control, Signals and Systems, 2(4):303–314, Dec 1989.
[29] D. Droz, A. Tykhonov, and X. Wu. Neural networks for electron identification with DAMPE. In , volume 36, 2019.
[30] Hao Zhao, Wen-Xi Peng, Huan-Yu Wang, Rui Qiao, Dong-Ya Guo, Hong Xiao, and Zhao-Min Wang. A machine learning method to separate cosmic ray electrons from protons from 10 to 100 GeV using DAMPE data.
Research in Astronomy and Astrophysics , 18(6):071, 2018.[31] Sea Agostinelli, John Allison, K al Amako, John Apostolakis, H Araujo, P Arce, M Asai, D Axen,S Banerjee, G 2 Barrand, et al. Geant4—a simulation toolkit.
Nuclear instruments and methods inphysics research section A: Accelerators, Spectrometers, Detectors and Associated Equipment ,506(3):250–303, 2003.[32] Chi Wang, Dong Liu, Yifeng Wei, Zhiyong Zhang, Yunlong Zhang, Xiaolian Wang, Zizong Xu,Guangshun Huang, Andrii Tykhonov, Xin Wu, et al. Offline software for the dampe experiment.
Chinese Physics C , 41(10):106201, 2017.[33] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization.
Journal ofMachine Learning Research , 13(Feb):281–305, 2012.[34] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov.Dropout: a simple way to prevent neural networks from overfitting.
The journal of machine learningresearch , 15(1):1929–1958, 2014.[35] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In
Proceedings of the 27th international conference on machine learning (ICML-10) , pages 807–814,2010.[36] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprintarXiv:1412.6980 , 2014.[37] Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro,and Evan Shelhamer. cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 ,2014.[38] F. Chollet et al. Keras. https://keras.io , 2015.[39] Theano Development Team. Theano: A Python framework for fast computation of mathematicalexpressions. arXiv e-prints , abs/1605.02688, May 2016.[40] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python.
Journal of MachineLearning Research , 12:2825–2830, 2011.[41] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S.Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, AndrewHarp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, ManjunathKudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah,Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, VincentVanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg,Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning onheterogeneous systems, 2015. Software available from tensorflow.org., 12:2825–2830, 2011.[41] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S.Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, AndrewHarp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, ManjunathKudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah,Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, VincentVanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg,Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning onheterogeneous systems, 2015. Software available from tensorflow.org.