Demonstration of background rejection using deep convolutional neural networks in the NEXT experiment
NEXT Collaboration, M. Kekic, C. Adams, K. Woodruff, J. Renner, E. Church, M. Del Tutto, J.A. Hernando Morata, J.J. Gomez-Cadenas, V. Alvarez, L. Arazi, I.J. Arnquist, C.D.R Azevedo, K. Bailey, F. Ballester, J.M. Benlloch-Rodriguez, F.I.G.M. Borges, N. Byrnes, S. Carcel, J.V. Carrion, S. Cebrian, C.A.N. Conde, T. Contreras, G. Diaz, J. Diaz, M. Diesburg, J. Escada, R. Esteve, R. Felkai, A.F.M. Fernandes, L.M.P. Fernandes, P. Ferrario, A.L. Ferreira, E.D.C. Freitas, J. Generowicz, S. Ghosh, A. Goldschmidt, D. Gonzalez-Diaz, R. Guenette, R.M. Gutierrez, J. Haefner, K. Hafidi, J. Hauptman, C.A.O. Henriques, P. Herrero, V. Herrero, Y. Ifergan, B.J.P. Jones, L. Labarga, A. Laing, P. Lebrun, N. Lopez-March, M. Losada, R.D.P. Mano, J. Martin-Albo, A. Martinez, G. Martinez-Lema, M. Martinez-Vara, A.D. McDonald, Z. E. Meziani, F. Monrabal, C.M.B. Monteiro, F.J. Mora, J. Muñoz Vidal, P. Novella, D.R. Nygren, B. Palmeiro, A. Para, J. Perez, M. Querol, A.B. Redwine, L. Ripoll, Y. Rodriguez Garcia, J. Rodriguez, L. Rogers, B. Romeo, C. Romo-Luque, F.P. Santos, J.M.F. dos Santos, A. Simon, C. Sofka, M. Sorel, T. Stiegler, J.F. Toledo, J. Torrent, A. Uson, J.F.C.A. Veloso, R. Webb, R. Weiss-Babai, J.T. White, N. Yahlali
Prepared for submission to JHEP
Department of Physics and Astronomy, Iowa State University, 12 Physics Hall, Ames, IA 50011-3160, USA
Argonne National Laboratory, Argonne, IL 60439, USA
Department of Physics, University of Texas at Arlington, Arlington, TX 76019, USA
Institute of Nanostructures, Nanomodelling and Nanofabrication (i3N), Universidade de Aveiro, Campus de Santiago, Aveiro, 3810-193, Portugal
Fermi National Accelerator Laboratory, Batavia, IL 60510, USA
Nuclear Engineering Unit, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva, 8410501, Israel
Nuclear Research Center Negev, Beer-Sheva, 84190, Israel
Lawrence Berkeley National Laboratory (LBNL), 1 Cyclotron Road, Berkeley, CA 94720, USA
Ikerbasque, Basque Foundation for Science, Bilbao, E-48013, Spain
Centro de Investigación en Ciencias Básicas y Aplicadas, Universidad Antonio Nariño, Sede Circunvalar, Carretera 3 Este No. 47 A-15, Bogotá, Colombia
Department of Physics, Harvard University, Cambridge, MA 02138, USA
Laboratorio Subterráneo de Canfranc, Paseo de los Ayerbe s/n, Canfranc Estación, E-22880, Spain
LIBPhys, Physics Department, University of Coimbra, Rua Larga, Coimbra, 3004-516, Portugal
LIP, Department of Physics, University of Coimbra, Coimbra, 3004-516, Portugal
Department of Physics and Astronomy, Texas A&M University, College Station, TX 77843-4242, USA
Donostia International Physics Center (DIPC), Paseo Manuel Lardizabal, 4, Donostia-San Sebastian, E-20018, Spain
Escola Politècnica Superior, Universitat de Girona, Av. Montilivi, s/n, Girona, E-17071, Spain
Departamento de Física Teórica, Universidad Autónoma de Madrid, Campus de Cantoblanco, Madrid, E-28049, Spain
Instituto de Física Corpuscular (IFIC), CSIC & Universitat de València, Calle Catedrático José Beltrán, 2, Paterna, E-46980, Spain
Pacific Northwest National Laboratory (PNNL), Richland, WA 99352, USA
Instituto Gallego de Física de Altas Energías, Univ. de Santiago de Compostela, Campus sur, Rúa Xosé María Suárez Núñez, s/n, Santiago de Compostela, E-15782, Spain
Instituto de Instrumentación para Imagen Molecular (I3M), Centro Mixto CSIC - Universitat Politècnica de València, Camino de Vera s/n, Valencia, E-46022, Spain
Centro de Astropartículas y Física de Altas Energías (CAPA), Universidad de Zaragoza, Calle Pedro Cerbuna, 12, Zaragoza, E-50009, Spain

a NEXT Co-spokesperson (J.J. Gómez-Cadenas, D.R. Nygren).
b Now at Weizmann Institute of Science, Israel (G. Martínez-Lema).
c Now at University of Texas at Austin, USA (C. Sofka).
d On leave from Soreq Nuclear Research Center, Yavneh, Israel (R. Weiss-Babai).
e Deceased (J.T. White).
Abstract:
Convolutional neural networks (CNNs) are widely used state-of-the-art computer vision tools that are becoming increasingly popular in high energy physics. In this paper, we attempt to understand the potential of CNNs for event classification in the NEXT experiment, which will search for neutrinoless double-beta decay in ¹³⁶Xe. To do so, we demonstrate the usage of CNNs for the identification of electron-positron pair production events, which exhibit a topology similar to that of a neutrinoless double-beta decay event. These events were produced in the NEXT-White high-pressure xenon TPC using 2.6 MeV gamma rays from a ²²⁸Th calibration source. We train a network on Monte Carlo simulated events and show that, by applying on-the-fly data augmentation, the network can be made robust against differences between simulation and data. The use of CNNs offers a significant improvement in signal efficiency and background rejection when compared to previous non-CNN-based analyses.
Keywords:
Neutrinoless double beta decay; TPC; high-pressure xenon chambers; NEXT experiment; CNN; event classification
Machine learning techniques have recently captured the interest of researchers in various scientific fields, including particle physics, and are now being employed in search of improved solutions to a variety of problems. In this study, we show that deep convolutional neural networks (CNNs) trained on Monte Carlo simulation can be used to classify, with a high degree of accuracy, events containing particular topologies of ionization tracks acquired from a high-pressure xenon (HPXe) time projection chamber (TPC). Because CNNs trained on simulation are known to be difficult to apply directly to data, owing to the challenge of producing a Monte Carlo that perfectly matches the experiment, we also present methods for extending the domain of application of a CNN trained on simulated events to include real events. We claim that our use of these methods, both in adapting CNNs to the experimental domain and in verifying their performance, is novel in the field.

Event classification is of critical importance in experiments searching for rare physics, as the successful rejection of background events can lead to significant improvements in overall sensitivity. The NEXT (Neutrino Experiment with a Xenon TPC) experiment is searching for neutrinoless double-beta decay (0νββ) in ¹³⁶Xe at the Canfranc Underground Laboratory in Spain. In the ongoing first phase of the experiment, the 5 kg-scale TPC NEXT-White [1] has demonstrated excellent energy resolution [2] and the ability to reconstruct high-energy (O(2) MeV) ionization tracks and to distinguish between the topological signatures of two-electron and one-electron tracks [3]. It has also been used to perform a detailed measurement of the background distribution and is expected to be capable of measuring the 2νββ mode in ¹³⁶Xe with 3.5σ sensitivity after 1 year of data-taking [4]. The next phase of the experiment, the 100 kg-scale detector NEXT-100, will search for the 0νββ mode at Q_ββ, around 2.5 MeV. New techniques such as CNNs, which analyze the topology of an event near Q_ββ and aim to eliminate background events, are becoming increasingly relevant and essential to reaching the best possible sensitivity.

Machine learning techniques have seen many recent applications in physics [5]. In neutrino physics in particular, CNNs have been applied to particle identification in sampling calorimeters in the NOvA experiment [6]. The MicroBooNE experiment has also employed CNNs for event classification and localization [7] and track segmentation [8] in liquid argon TPCs. IceCube has applied graph neural networks to perform neutrino event classification [9], and Daya Bay identified antineutrino events in gadolinium-doped liquid scintillator detectors using CNNs and convolutional autoencoders [10]. Experiments searching for 0νββ decay have also employed CNNs: EXO has studied the use of CNNs to extract event energy and position from raw waveform information in a liquid xenon TPC [11], and PandaX-III has performed simulation studies demonstrating the use of CNNs for background rejection in an HPXe gas TPC with a Micromegas-based readout [12].
Further simulation studies in HPXe TPCs with a charge readout scheme ("Topmetal") allowing for detailed 3D track reconstruction have also shown the potential of CNNs for background rejection in 0νββ searches [13]. NEXT has also presented an initial simulation study [14] of the use of CNNs for background rejection. In this study we show that CNNs can be applied to real NEXT data, using electron-positron pair production to generate events with a two-electron "ββ-like" topology and studying how the energy distribution of such events changes when varying an acceptance cut on the classification prediction of a CNN.

The paper is organized as follows: section 2 describes the topological signature of a signal event. Section 3 explains the data acquisition and reconstruction. Section 4 describes the CNN and the training procedure, as well as its evaluation on MC and data. Finally, conclusions are drawn in section 5.

In a fully-contained 0νββ event recorded by an HPXe TPC, two energetic electrons produce ionization tracks emanating from a common vertex. Though the fraction of the energy Q_ββ carried by each individual electron may differ event-by-event, the general pattern observed is similar for the majority of events and consists of an extended track capped on both ends by two "blobs", or regions of relatively higher ionization density. These regions are present due to the increase in stopping power experienced by electrons in xenon gas as they slow to lower energies. They provide a distinct signature for 0νββ decay, as measured tracks of similar energy produced by single electrons, for example from photoelectric interactions of background gamma radiation, contain only one such "blob". The use of this signature, illustrated in Fig. 1, in performing background rejection is an essential part of the NEXT approach to maximizing sensitivity to 0νββ decay. Events with multiple tracks are easier to reject, simply by counting the number of isolated energy depositions.
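Counting isolated depositions amounts to counting connected groups of occupied voxels. The sketch below is a minimal illustration of that idea, not NEXT's track reconstruction: it assumes a dense voxel array, treats voxels touching by a face, edge, or corner as connected (26-connectivity), and uses a hypothetical helper name `count_tracks`.

```python
from collections import deque

import numpy as np


def count_tracks(voxels: np.ndarray, threshold: float = 0.0) -> int:
    """Count connected groups of voxels whose energy exceeds `threshold`.

    Voxels that touch by a face, edge, or corner (26-connectivity)
    are considered part of the same track. Hypothetical helper.
    """
    active = voxels > threshold
    visited = np.zeros_like(active, dtype=bool)
    # The 26 neighbour offsets of a voxel in 3D.
    offsets = [(dx, dy, dz)
               for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
               if (dx, dy, dz) != (0, 0, 0)]
    n_tracks = 0
    for start in zip(*np.nonzero(active)):
        if visited[start]:
            continue
        n_tracks += 1              # found a group not reached before
        visited[start] = True
        queue = deque([start])     # breadth-first flood fill of this group
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                nb = (x + dx, y + dy, z + dz)
                inside = all(0 <= c < s for c, s in zip(nb, active.shape))
                if inside and active[nb] and not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
    return n_tracks
```

For example, a grid with one three-voxel segment and a separate two-voxel segment far away yields `count_tracks(grid) == 2`; an event with more than one group would fail a single-track requirement. The connectivity rule and threshold are assumptions; in practice they set how close two depositions may be before they are merged into one track.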
[Figure 1 panels: SIGNAL (left) and BACKGROUND (right); axes X (mm) and Y (mm).]
Figure 1: Energy depositions from trajectories in a Monte Carlo simulation of a 0νββ event, showing its distinct two-electron topological signature (left), compared with that of a single-electron event (right) of the same energy (figure from [15]).

In order to demonstrate this approach experimentally, a reliable source of events with a similar topological signature is necessary. Electron-positron pair production by high-energy gammas, followed by the escape from the active volume of both 511 keV gamma rays produced in positron annihilation ("double escape"), leaves a two-blob track formed by the electron and positron emitted from a common vertex, similar to the track that would be left by a 0νββ event. In this study, we use gamma rays of energy 2614.5 keV from ²⁰⁸Tl (provided by a ²²⁸Th calibration source, see Fig. 2) and observe the events in the double-escape peak at 1592 keV. This peak lies on top of an exponential background of single-electron tracks from Compton scattering of the calibration gamma rays and other background radiation. Experimentally, then, we have a sample containing both 0νββ-like events and background-like events. By evaluating these events with a Monte-Carlo-trained neural network and studying the resulting distribution of accepted events, we can demonstrate, using real data acquired with the NEXT-White TPC, the potential performance of such a network when employed in a 0νββ search. These results can be compared to a similar, non-CNN-based analysis published in [3].
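The position of the double-escape peak follows from energy conservation: the electron and positron share the photon energy minus the rest-mass energy of the pair, and after the positron annihilates, both 511 keV photons leave the active volume. As a worked check (neglecting the small nuclear recoil):

$$E_{\mathrm{DE}} = E_\gamma - 2\,m_e c^2 = 2614.5\ \mathrm{keV} - 2 \times 511.0\ \mathrm{keV} = 1592.5\ \mathrm{keV}$$

which matches the 1592 keV peak quoted above.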
The NEXT-White TPC measures both the primary scintillation and the ionization produced by a charged particle traversing its active volume of high-pressure xenon gas. The main detector components are housed in a cylindrical stainless steel pressure vessel lined with copper shielding and include two planes of photosensors, one at each end, and several semi-transparent wire meshes to which voltages are applied, defining key regions of the detector (see Fig. 2). The two planes of photosensors are organized into an energy plane, containing 12 PMTs (photomultiplier tubes, Hamamatsu model R11410-10) behind the cathode, and a tracking plane, containing a grid of 1792 SiPMs (silicon photomultipliers, SensL series-C, spaced at a 10 mm pitch) behind the anode. These sensors observe the scintillation produced in the active volume of the detector by ionizing radiation, including primary scintillation produced by excitations of the xenon atoms during the creation of the ionization track and secondary scintillation produced by electroluminescence (EL) of the ionization electrons. Note that in practice only the PMTs observe a consistently measurable primary scintillation signal, while EL is observed by both the PMTs and the SiPMs.

[Figure 2 schematic labels: PMT (energy) plane, cathode, 50 cm drift region, gate, electroluminescent (EL) gap (6 mm), quartz plate (anode), SiPM (tracking) plane, copper shielding, ²²⁸Th and ¹³⁷Cs sources.]
Figure 2: Schematic of the NEXT-White TPC, showing the positioning of the calibration sources (¹³⁷Cs and ²²⁸Th) present during data acquisition for this study (figure derived from [2]).

EL occurs after the electrons of the ionization track are drifted through the active region by an electric field (of order 400 V/cm), created by applying high voltage to the cathode (-30 kV) and gate (-7.6 kV) meshes, and arrive at the EL gap, a narrow (6 mm) region defined by the gate mesh and a grounded quartz plate on which a conductive indium tin oxide (ITO) coating has been deposited. The large voltage drop across the narrow gap between the gate and the grounded plate creates an electric field high enough to accelerate the electrons to energies sufficient to excite the xenon without producing further ionization, allowing for better energy resolution than in charge-avalanche detectors [16]. The subsequent decay of these excitations leads to EL scintillation, yielding of order 500-1000 photons per electron traversing the EL gap. These photons, produced just in front of the tracking plane, cast a pattern of light on the SiPMs that can be used to reconstruct the (x, y) location of the ionization. The PMTs located in the energy plane on the opposite side of the detector see a more uniform distribution of light, including EL photons that have undergone a number of reflections in the detector, and record a greater total number of photons, giving a more precise measurement of the energy. The time difference between the observation of the primary scintillation (called S1) and the secondary EL scintillation (called S2) gives the distance drifted by the ionization electrons before arriving at the EL region, corresponding to the z location at which this ionization was produced.

The data used in this study consisted of events with total energy near 1.6 MeV, including electron-positron events produced in pair production interactions of 2.6 MeV gamma rays (see section 2) and background events, mostly due to Compton scattering of the same 2.6 MeV gamma rays (environmental radioactivity is negligible compared to the calibration source). The acquired signals for each event consisted of 12 PMT waveforms sampled at 25 ns intervals and 1792 SiPM waveforms sampled at 1 µs intervals, for a total duration per readout greater than the TPC maximum drift time (approximately 500 µs). The ADC counts per unit time in each waveform were converted to photoelectrons per unit time via conversion factors established by periodic calibration using LEDs installed inside the detector, a standard procedure in NEXT-White operation. The calibrations were performed by driving these LEDs with short pulses and measuring the integrated ADC counts corresponding to a single photoelectron (pe).

The analysis of the acquired data was similar to that of [3]. The 12 PMT waveforms were summed, weighted by their calibrated gains, to produce a single waveform in which scintillation pulses were identified and classified as S1 or S2 according to their shape and location within the waveform. Events containing a single S1 pulse and at least one S2 pulse were selected, and for these events the S2 information was used to reconstruct the ionization track. To do this, the S2 signals were integrated into time bins of width 2 µs in both the PMTs and the SiPMs. To eliminate dark noise, SiPM samples with less than 1 pe were not included in the integration.

For each time bin, one or more energy depositions ("hits") were reconstructed, and the pattern of signals observed on the SiPMs was used to determine the number of hits for that time bin and their corresponding (x, y) coordinates. A hit was assigned to the location of every SiPM with an observed signal greater than a given threshold, and the total energy measured by the PMTs in that time bin was redistributed among the hits according to their relative SiPM signals.

The energy of each hit as measured by the PMTs was then corrected, hit by hit, by two multiplicative factors, one accounting for geometric variations in the light response in the EL plane and the other for electron attachment due to the finite electron lifetime in the gas. These correction factors were mapped out over the active volume by simultaneously acquiring events from decays of ⁸³ᵐKr, which was injected into the xenon gas and provided uniformly distributed point-like depositions of energy 41.5 keV [17]. The z-coordinate of each hit in the time bin was obtained from the time difference between the S1 and S2 pulses, assuming an electron drift velocity of 0.91 mm/µs, as extracted from an analysis of the ⁸³ᵐKr events. A residual dependence of the event energy on the length of the event along the z-axis is observed, and a linear correction is applied to model this effect, which is not observed in simulation and remains to be fully understood. For details on this "axial length" effect, see [2].

The detector volume surrounding the reconstructed hits was then partitioned into 3D voxels of size 10×10×5 mm³, and the energy of all hits falling within each voxel was integrated. The X and Y dimensions of the voxels were chosen based on the 1 cm SiPM pitch, while the Z dimension was chosen to account for most of the longitudinal diffusion (1σ spread at maximum drift length is ∼ …).

Figure 3: Reconstructed hits (left) and voxels (right) of a background Monte Carlo event. The volume within a tight bounding box encompassing the reconstructed hits is divided into 10×10×5 mm³ voxels to produce the voxelized track.
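The hit-building and voxelization steps described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the NEXT reconstruction code: the helper names `make_hits` and `voxelize`, the array layouts, and the use of the time-bin value as the S2 time are assumptions, and the geometric, lifetime, and axial-length corrections are omitted. The final division by the total energy mirrors the per-event normalization applied to the CNN input in section 4.1.

```python
import numpy as np

DRIFT_VELOCITY = 0.91                      # mm/us, value quoted in the text
VOXEL_SIZE = np.array([10.0, 10.0, 5.0])   # mm, voxel edges in x, y, z


def make_hits(sipm_xy, sipm_q, pmt_energy, t_bin, t_s1, q_threshold):
    """Build hits for one 2 us time slice (hypothetical helper).

    sipm_xy: (N, 2) SiPM positions in mm; sipm_q: (N,) charges in pe;
    pmt_energy: energy seen by the PMT sum in this slice; t_bin, t_s1 in us.
    Returns an (M, 4) array with columns x, y, z, energy.
    """
    sipm_q = np.asarray(sipm_q, dtype=float)
    above = sipm_q > q_threshold           # one hit per SiPM above threshold
    if not above.any():
        return np.empty((0, 4))
    q = sipm_q[above]
    energy = pmt_energy * q / q.sum()      # share PMT energy by SiPM charge
    z = DRIFT_VELOCITY * (t_bin - t_s1)    # drift time -> z position (mm)
    xy = np.asarray(sipm_xy, dtype=float)[above]
    return np.column_stack([xy, np.full(len(q), z), energy])


def voxelize(hits):
    """Sum hit energies into 10x10x5 mm voxels of a tight bounding box,
    then normalize so that the voxel energies of the event add up to 1."""
    pos, energy = hits[:, :3], hits[:, 3]
    origin = pos.min(axis=0)
    idx = np.floor((pos - origin) / VOXEL_SIZE).astype(int)
    grid = np.zeros(tuple(idx.max(axis=0) + 1))
    np.add.at(grid, tuple(idx.T), energy)  # accumulates repeated indices
    return grid / grid.sum()
```

`np.add.at` is used instead of `grid[idx] += energy` because several hits can fall into the same voxel, and the unbuffered form adds each of them rather than keeping only the last.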
To generate the events used in training the neural network, a full Monte Carlo (MC) of the detector, including the pressure vessel, internal copper shielding, and sensor planes, was constructed using Nexus [18], a simulation package for NEXT based on GEANT4 [19] (version geant4.10.02.p01). The ²²⁸Th calibration source decay and the resulting interactions of the decay products were simulated by GEANT4, up to and including the production of the ionization track. Events in the energy range of 1.4-1.8 MeV were selected, and the subsequent electron drift, diffusion, electroluminescence, photon detection, and electronic readout processes were simulated outside of GEANT4 to produce for each event a set of sensor waveforms corresponding to those acquired in NEXT-White. The analysis of data waveforms described in section 3.2 could then be applied to these MC waveforms to produce voxelized tracks (see Fig. 3). MC events that were fully contained in the active detector volume were used in the training set. To ensure that the classification was based only on the track topology, the energy of each voxel was scaled by the total event energy (the sum of voxel intensities for a given event was normalized to 1), such that the training data did not contain event energy information. Those events containing an electron and a positron in the MC truth information, with no additional energy deposited by the two 511 keV gamma rays produced upon annihilation of the positron (i.e. a true "double escape"), were tagged as "signal" events, and all others were tagged as "background".

[Figure 4 panels: event count vs. energy (MeV). Left: all events and signal events. Right: all events and sidebands.]

Figure 4: Left: energy distribution of all MC events (dashed-line histogram) and of the chosen signal events (solid histogram). Right: energy distribution of experimental data events, showing the selected sideband events. The sidebands are 100 keV in width, with each band starting 45 keV from either side of the double-escape peak. The same procedure is also used to select the sidebands in MC.

In [3], an additional single-track selection cut is made, and for a fair comparison with this previous result we also apply the same cut (obtained from the standard track reconstruction; for details see [3]) to the test data only, for both MC and experimental data. As a reference, inside the peak energy range, the efficiency of the single-track cut was ∼ … ∼ … 0νββ search, for which we do not have a confirmed understanding of the underlying physics, nor would it be justified to make predictions on the same events used in optimizing the network.

Therefore, we develop a general paradigm (as described in section 4.3) that could be applied at 0νββ energies and that, in evaluating the performance of the network on the data domain, uses events outside the energy range within which we intend to make predictions. Namely, before applying the CNN to the peak itself, we evaluate its performance on the peak sidebands (see Fig. 4), where the sample composition is known and we expect the CNN predictions to be similar in data and MC.
The underlying assumption is that the domain shift between MC and data is not correlated with the type of event, i.e. we expect that if a network is robust to MC/data differences on the sidebands, it will be robust to MC/data differences in the peak region as well. In [3] it was shown that the track length difference between data and MC is consistent across a wide energy range, giving us confidence that the differences indeed come from the detector simulation and reconstruction (which should affect signal and background events in the same way), rather than from incorrectly simulated physical processes; this justifies the sideband-testing approach.

In this study we embedded our network architecture within the Submanifold Sparse Convolutional Networks (SCN) framework [20], implemented in PyTorch. SCN is well suited to sparse input data, making the linear algebra far more efficient than with dense techniques. Further, in SCN the convolution rules allow only nonzero voxels in the initial layers to hold nonzero output, thus conserving input sparsity. SCN is appropriate for our detector, in which the large majority of voxels have zero charge. Such networks have already been used in high energy physics analyses [21]; their main advantage is that they occupy less memory and allow for larger input volumes and/or larger batch sizes. All of the results shown here were obtained using this framework, but we obtained similar results using the standard implementation of dense convolutions in Keras/TensorFlow.

We employed a residual [22] 3D CNN to perform the topological classification task. The network architecture is summarized in Fig. 5. The network consisted of two initial convolutional layers and a set of pre-activated ResNet block layers [23], followed by two consecutive dense layers with a dropout layer before each.
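For readers who want to experiment, here is a small dense PyTorch sketch in the spirit of Fig. 5 (the text notes that dense convolutions gave results similar to SCN). Only the overall layout comes from the text: two initial convolutions, pre-activated residual blocks, two dense layers each preceded by dropout, a 40×40×110 input, and a 2-element output. The channel widths, kernel sizes, the stride-2 second convolution, the number of blocks, the adaptive pooling before flattening, the hidden width, the dropout rate, and the class names are all assumptions, so this is not the authors' network.

```python
import torch
import torch.nn as nn


class PreActBlock(nn.Module):
    """Pre-activated residual block: (BN -> ReLU -> Conv) twice, plus identity skip."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm3d(channels), nn.ReLU(),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class TopologyNet(nn.Module):
    """Hypothetical dense stand-in for the classifier summarized in Fig. 5."""

    def __init__(self, channels=(8, 16), hidden=64, dropout=0.5):
        super().__init__()
        c1, c2 = channels
        self.features = nn.Sequential(
            nn.Conv3d(1, c1, kernel_size=3, padding=1),             # initial conv 1
            nn.Conv3d(c1, c2, kernel_size=3, stride=2, padding=1),  # initial conv 2
            PreActBlock(c2), PreActBlock(c2),
            nn.BatchNorm3d(c2), nn.ReLU(),
            nn.AdaptiveAvgPool3d(4),   # fixed 4x4x4 size for the dense layers
            nn.Flatten(),              # the "first flattened layer" (feature vector)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout), nn.Linear(c2 * 4 ** 3, hidden), nn.ReLU(),
            nn.Dropout(dropout), nn.Linear(hidden, 2),
        )

    def forward(self, x):
        # x: (batch, 1, 40, 40, 110) voxel energies; returns 2 logits per event.
        return self.classifier(self.features(x))


def class_weights(n_signal: int, n_total: int) -> torch.Tensor:
    """Inverse-frequency weights [background, signal] for nn.CrossEntropyLoss."""
    n_background = n_total - n_signal
    return torch.tensor([n_total / (2 * n_background), n_total / (2 * n_signal)])
```

Applying `torch.softmax(model(x), dim=1)` gives the 2-element probability vector per event. With the 200k signal events out of about 500k quoted below, `nn.CrossEntropyLoss(weight=class_weights(200_000, 500_000))` is one possible way to weight the loss by the signal/background ratio; the inverse-frequency form of the weights is our assumption. On two classes, cross-entropy over the softmax output is equivalent to binary cross-entropy.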
The input dimensions were 40×40×110, with each input element corresponding to one voxel, therefore covering a volume of 40×40×55 cm³, essentially the entire active volume of the detector. The output was a 2-element probability vector.

A total of about 500k simulated fiducial events were used as a training set, of which 200k were signal events, and an additional ∼30k events with a similar signal proportion were used as a validation sample. A batch size of 1024 was chosen, and binary cross-entropy, weighted according to the signal/background ratio of the entire data set, was used as the loss function. To avoid overfitting, L2 weight regularization and dropout were employed, as well as on-the-fly data augmentation [24] (on-the-fly means that the augmentation is performed during code execution and the augmented dataset is not stored on disk), including translations, dilation or "zooming" (scaling all 3 axes independently), flipping in x and y, and varying SiPM charge cuts, as detailed in Fig. 6. We note that the augmentation procedures used here are explicitly designed to be "label preserving", in that they do not change the single- or double-blob nature of events, but they do reduce the significance of data/simulation differences.

[Figure 5 panels: (a) ResNet architecture; (b) ResNet pre-activated block.]

Figure 5: (a) Summary of the neural network architecture used in this analysis, with (b) details of each ResNet block architecture.

As noted in section 4.1, since CNNs are highly nonlinear models, their application outside the training domain cannot be assumed to be reliable, and before applying the network to events in the peak we compare extracted "features" of MC and data events on the sidebands. It is common to consider convolutional layers as feature extractors (each one extracting higher-level features), and consecutive dense layers as a classifier. We chose the first flattened layer as a representative feature vector and applied a two-sample test, i.e. a test to determine whether independent random samples of ℝ^d-valued random vectors are drawn from the same underlying distribution, for which we chose the energy test statistic [25, 26]. The energy distance between two sets A and B is given by

$$\varepsilon_{A,B} = \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\lVert x_i - y_j\rVert - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lVert x_i - x_j\rVert - \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\lVert y_i - y_j\rVert \qquad (4.1)$$

where x_i and y_j are the n and m samples drawn from the two sets, respectively. In [26], it was proven that this quantity is non-negative and equal to zero if and only if the x_i and y_j are identically distributed. The p-value, or probability of observing a value equal to or more extreme than the measured one, for rejecting the null hypothesis (in this case, that the samples come from the same distribution) can be calculated via the permutation test [27]. Namely, the nominal energy distance is computed, and the pooled x_i and y_j are then randomly re-split many times (1000 in our case) into two groups of sizes n and m. The energy distance is computed again for each of these arrangements, each of which corresponds to one permutation. The p-value is given by the fraction of permutations in which the energy distance was larger than the nominal one.

The training and validation losses, which measure the disagreement between the CNN predictions and the true labels, are given in Fig. 7 for the networks trained with and without data augmentation. The overfitting apparent in the case of training without augmentation sets in early and is manifested in the divergence of the training and validation losses, meaning that the network is beginning to memorize the training dataset and is not generalizing well. In Fig. 8 we show that data augmentation also reduces the distance (eq. 4.1) between the data and MC feature distributions, giving us more confidence that the performance on data will be similar to the performance on MC. As the distances are always calculated on MC and data events directly (without applying any data augmentation transformations), this technique does not directly correct the MC but rather makes the model more robust to data/MC differences. The final model is chosen by varying the regularization parameters and selecting the training iteration step that gives the minimal classification loss on the MC
The final model is chosen by varying regularization parametersand selecting the training iteration step that gives minimal classification loss on the MC For validation/testing, the SiPM cut was fixed at 20 photoelectrons, while in the augmentation it wasallowed to vary ±
10 around this value. In machine learning language, features are sets of numbers extracted from input data that are relevantto the task being learned. – 10 –alidation sample, ensuring that the corresponding p-value of energy test statistics is notlarger than 5%. xy original different SiPM charge translation x-flipping zoom x z y z Figure 6 : Example of on-the-fly data augmentation used during training on a selectedsignal event, projected on three planes for easier visualization.
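The augmentation steps (translations, independent per-axis zoom, x/y flips, and a SiPM charge cut varied by ±10 pe around 20 pe) can be sketched as a function applied to an event's hits each time the event is drawn for a training batch. This is an illustrative sketch only: the hit layout, the helper name `augment_hits`, the translation and zoom ranges, flipping about the event centroid, and the global energy rescaling after the charge cut are assumptions rather than details given in the text.

```python
import numpy as np


def augment_hits(hits, rng, base_cut=20.0, cut_spread=10.0,
                 max_shift=(20.0, 20.0, 20.0), zoom_range=(0.9, 1.1)):
    """Return a randomly augmented copy of one event's hits (hypothetical helper).

    hits: (N, 5) array with columns x, y, z [mm], energy, SiPM charge [pe].
    """
    h = np.array(hits, dtype=float)                  # copy; input stays untouched
    # 1) SiPM charge cut drawn uniformly in [base_cut - spread, base_cut + spread].
    cut = rng.uniform(base_cut - cut_spread, base_cut + cut_spread)
    kept = h[h[:, 4] > cut]
    if len(kept) == 0:                               # cut removed every hit:
        kept = h[[np.argmax(h[:, 4])]]               # keep the brightest one
    kept[:, 3] *= h[:, 3].sum() / kept[:, 3].sum()   # preserve total energy
    center = kept[:, :3].mean(axis=0)
    # 2) Independent zoom of each axis around the event centre.
    xyz = center + (kept[:, :3] - center) * rng.uniform(*zoom_range, size=3)
    # 3) Random flips in x and y around the event centre.
    for axis in (0, 1):
        if rng.random() < 0.5:
            xyz[:, axis] = 2.0 * center[axis] - xyz[:, axis]
    # 4) Random translation of the whole event.
    xyz += rng.uniform(-1.0, 1.0, size=3) * np.asarray(max_shift)
    kept[:, :3] = xyz
    return kept
```

Each call, e.g. `augment_hits(hits, np.random.default_rng())`, gives a different variant of the same event, and the result is then voxelized as usual. None of these transformations splits or merges blobs, which is what keeps them label preserving. Because voxel energies are normalized to 1 per event before entering the CNN, the energy rescaling only affects bookkeeping.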
[Figure 7 panels: loss vs. training iterations for the training and validation sets.]

Figure 7: Training and validation losses without (left) and with (right) the application of data augmentation to the training set. Overfitting in the left-hand plot is visible only after 1000 iterations. As the augmentation procedure is only relevant to the training phase, it was not applied to the validation set. The network makes correct predictions more easily for events unaltered by data augmentation, which explains why the loss is higher for the training set than for the validation set in the right-hand plot.
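Equation (4.1) and the permutation test translate directly into a few lines of NumPy. The sketch below uses hypothetical function names and is not the analysis code. Averaging over all n×m pairs reproduces the 2/(nm) prefactor, and averaging over all n² pairs (the i = j terms contribute zero) reproduces the 1/n² prefactor. It materializes every pairwise difference, which is fine for a few thousand points but would need chunking (or, e.g., `scipy.spatial.distance.cdist`) for large feature vectors.

```python
import numpy as np


def _mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (a_i, b_j)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()


def energy_distance(x, y):
    """Energy distance of eq. (4.1) between samples x (n, d) and y (m, d)."""
    return (2.0 * _mean_pairwise_distance(x, y)
            - _mean_pairwise_distance(x, x)
            - _mean_pairwise_distance(y, y))


def permutation_pvalue(x, y, n_perm=1000, rng=None):
    """Return (nominal distance, p-value) from random re-splittings of the pooled sample."""
    rng = np.random.default_rng() if rng is None else rng
    nominal = energy_distance(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    larger = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))      # one random split into n and m
        if energy_distance(pooled[perm[:n]], pooled[perm[n:]]) > nominal:
            larger += 1
    return nominal, larger / n_perm
```

Here `x` and `y` would hold the flattened-layer features of data and MC events from one sideband. A large p-value means the permuted splits routinely look as different as the real one, i.e. no evidence that the two feature distributions differ.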
[Figure 8 panels: energy distance vs. training iterations, with and without augmentation.]

Figure 8: Energy distance between data and MC features during training, on the left sideband (left) and the right sideband (right), for training with and without augmentation. The corresponding p-value for the chosen model with augmentation at the chosen iteration step was ∼ …

In an ideal test of the trained network, we would have a data sample of only e⁺e⁻ events at the energy of interest acquired from our detector, and another sample of single-electron events at the same energy. However, because background events are always present, in particular due to Compton scattering of the high-energy gamma rays used to produce the e⁺e⁻ events with the topology of interest, an exactly labeled test set of detector data is impossible. We therefore make an assumption about the shape of the energy spectrum near the energy of interest and extract the number of signal and background events present, following the procedure explained in [3].

First, we select only fiducial events passing a single-track cut, as explained in section 4.1. Note that the single-track cut was not applied to the training set, but we do apply it to the test set to allow for an exact comparison with the previous analysis. We then assume that the signal events produce a Gaussian peak (as would indeed be the case for events occurring at a precise energy), and that the background in the region of the peak, consisting of Compton electrons, can be characterized by an exponential distribution. The peak energy region is fixed to 1.570-1.615 MeV (as in [3]), a region that contains more than 99.5% of the Gaussian peak for both data and MC.
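The spectrum model assumed here (a Gaussian peak on an exponential background, fitted over 1.45-1.75 MeV as described next) can be written as an extended unbinned likelihood. The sketch below is illustrative: the parametrization (yields `n_sig`, `n_bkg`, mean `mu`, width `sigma`, exponential slope `lam`) and the function names are assumptions, and minimizing `extended_nll` would be left to a generic optimizer such as `scipy.optimize.minimize`. `window_counts` performs the integration over the peak region that yields s₀ and b₀.

```python
import math

import numpy as np

FIT_RANGE = (1.45, 1.75)       # MeV, fit range quoted in the text
PEAK_WINDOW = (1.570, 1.615)   # MeV, peak energy region


def gauss_integral(a, b, mu, sigma):
    """Integral of a unit-area Gaussian from a to b."""
    s = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((b - mu) / s) - math.erf((a - mu) / s))


def expo_integral(a, b, lam, lo=FIT_RANGE[0]):
    """Integral of exp(-lam * (E - lo)) from a to b (lam != 0)."""
    return (math.exp(-lam * (a - lo)) - math.exp(-lam * (b - lo))) / lam


def model_density(e, n_sig, n_bkg, mu, sigma, lam):
    """Expected events per MeV; each component is normalized on FIT_RANGE."""
    lo, hi = FIT_RANGE
    e = np.asarray(e, dtype=float)
    g = np.exp(-0.5 * ((e - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    g = g / gauss_integral(lo, hi, mu, sigma)
    x = np.exp(-lam * (e - lo)) / expo_integral(lo, hi, lam)
    return n_sig * g + n_bkg * x


def extended_nll(params, energies):
    """Extended unbinned negative log-likelihood (energies inside FIT_RANGE)."""
    n_sig, n_bkg = params[0], params[1]
    return (n_sig + n_bkg) - np.sum(np.log(model_density(energies, *params)))


def window_counts(n_sig, n_bkg, mu, sigma, lam, window=PEAK_WINDOW):
    """Expected signal (s0) and background (b0) inside the peak window."""
    lo, hi = FIT_RANGE
    s0 = n_sig * gauss_integral(*window, mu, sigma) / gauss_integral(lo, hi, mu, sigma)
    b0 = n_bkg * expo_integral(*window, lam) / expo_integral(lo, hi, lam)
    return s0, b0
```

Repeating the fit on the events above each CNN threshold and calling `window_counts` with the new parameters gives the accepted s and b.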
Then, we apply an unbinned fit of the sum of the two curves (Gaussian + exponential) to the energy spectrum over the larger range 1.45–1.75 MeV¹, which keeps the fits stable, and obtain the parameters defining the two curves. Integrating the fitted Gaussian and exponential curves over the peak energy range gives the estimate of the initial number of signal events s₀ (from the Gaussian) and the initial number of background events b₀ (from the exponential). This procedure is then repeated using the spectra of events with a network classification greater than a varying threshold, in each case obtaining the number of accepted signal events s and accepted background events b.

¹ Note the slightly narrower energy range compared to the one specified in section 4.1, which is used to pre-select MC events based on the true energy deposited in the detector. This choice is made to remove artificial disturbances at the end points of the selected energy range.

Figure 9 illustrates this fit procedure for three different threshold values on the CNN prediction output, using data from NEXT-White and a set of Monte Carlo simulated events that were not present in the training set. Varying the classification threshold traces out a curve in the space of signal acceptance s/s₀ vs. background rejection 1 − b/b₀ (see Fig. 10). To obtain optimal sensitivity in a 0νββ search, one must maximize the ratio of accepted signal to the square root of the accepted background [15]; we therefore also construct the figure of merit F = (s/s₀)/√(b/b₀) for the various classification thresholds. For comparison, we show the non-CNN-based result obtained in [3]. In Monte Carlo, we find a maximum figure of merit of F = 2.20, with signal acceptance s/s₀ = 0.70 and background rejection 1 − b/b₀ = 0.90. In data, fixing the CNN cut to the one giving the best Monte Carlo figure of merit, we find F = 2.21, with signal acceptance s/s₀ = 0.65 and background rejection 1 − b/b₀ ≈ 0.9.
[Figure 9 panels: Events/bin vs. Energy (MeV) for Monte Carlo (top row) and data (bottom row) at cut = 0.0, 0.5 and 0.85, each showing the data with the fitted Gaussian and exponential components.]

Figure 9: Fits of the energy spectra of Monte Carlo simulation (top) and data (bottom) near the double-escape peak of ²⁰⁸Tl at 1592 keV (see text for details). The fits are shown using events passing a neural network classification cut of 0.0 (to obtain the total numbers of signal and background events), 0.5 (the cut usually used in binary classification problems) and 0.85 (the cut that was found to yield the optimal f.o.m.).

[Figure 10 panels: left, signal efficiency vs. background rejection; right, f.o.m. vs. CNN prediction threshold. Curves: MC true labels (CNN), MC fit (CNN) and data fit (CNN); the left panel also shows MC fit (standard) and data fit (standard), and the right panel marks the data (standard) maximum.]

Figure 10: The signal acceptance vs. background rejection (left) and the figure of merit (right). The curves labeled "fit" are traced out by varying the neural network classification threshold and determining the fraction of accepted signal and rejected background using the fit procedure described in the text, while the MC true-label curves are obtained using MC labels as described in section 4.1.
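For the curves obtained with MC true labels, the accepted fractions can be counted directly from the labels. The sketch below, with hypothetical function names and toy inputs, scans the classification threshold, computes the signal acceptance s/s₀, the background rejection 1 − b/b₀ and the figure of merit taken as F = (s/s₀)/√(b/b₀), and picks the threshold that maximizes F. For data, where labels are unavailable, s and b would instead come from the Gaussian + exponential fits described in the text. As a check of the arithmetic, s/s₀ = 0.70 with 1 − b/b₀ = 0.90 gives F = 0.70/√0.10 ≈ 2.21, consistent with the Monte Carlo value F = 2.20 quoted in the text given rounding of the quoted fractions.

```python
import numpy as np


def figure_of_merit(sig_eff, bkg_eff):
    """F = (s/s0) / sqrt(b/b0) from signal and background acceptances."""
    return sig_eff / np.sqrt(bkg_eff)


def scan_thresholds(pred, is_signal, thresholds):
    """Rows of (threshold, s/s0, 1 - b/b0, F) computed from true MC labels."""
    pred = np.asarray(pred, dtype=float)
    is_signal = np.asarray(is_signal, dtype=bool)
    s0 = is_signal.sum()
    b0 = (~is_signal).sum()
    rows = []
    for t in thresholds:
        accepted = pred > t  # events whose network output exceeds the cut
        sig_eff = (accepted & is_signal).sum() / s0
        bkg_eff = (accepted & ~is_signal).sum() / b0
        fom = figure_of_merit(sig_eff, bkg_eff) if bkg_eff > 0 else np.nan
        rows.append((t, sig_eff, 1.0 - bkg_eff, fom))
    return np.array(rows)


def best_threshold(rows):
    """Threshold with the largest finite figure of merit."""
    return rows[np.nanargmax(rows[:, 3]), 0]


# Toy usage with made-up network outputs (1 = e+e- signal, 0 = background).
toy_pred = [0.9, 0.8, 0.2, 0.1, 0.6, 0.3]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_rows = scan_thresholds(toy_pred, toy_labels, np.linspace(0.0, 0.95, 20))
toy_cut = best_threshold(toy_rows)
```

Thresholds at which no background survives give F = NaN here and are skipped when choosing the optimum, since the figure of merit is undefined there.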
Conclusions
We have presented the first data-based evaluation of track classification with neural networks in HPXe TPCs. The results confirm the potential of the method shown in previous simulation-based studies and demonstrate that neural networks trained on a detailed Monte Carlo can be employed to make predictions on real data. The present results show that the background contamination can be reduced to approximately 10% while maintaining a signal efficiency of about 65%. These results are likely to be conservative, as this demonstration was performed at an energy of 1592 keV, while at the same pressure tracks with energy Qββ are longer, and their topological features should therefore be more pronounced.

Furthermore, we have shown that, with the application of appropriate domain regularization techniques to the training set, our model performs similarly on detector data and simulation in the extraction of the signal events of interest.

Acknowledgments
This study used computing resources from Artemisa, co-funded by the European Union through the 2014-2020 FEDER Operative Programme of the Comunitat Valenciana, project IDIFEDER/2018/048. The NEXT Collaboration acknowledges support from the following agencies and institutions: the European Research Council (ERC) under the Advanced Grant 339787-NEXT; the European Union's Framework Programme for Research and Innovation Horizon 2020 (2014-2020) under the Grant Agreements No. 674896, 690575 and 740055; the Ministerio de Economía y Competitividad and the Ministerio de Ciencia, Innovación y Universidades of Spain under grants FIS2014-53371-C04, RTI2018-095979, the Severo Ochoa Program grants SEV-2014-0398 and CEX2018-000867-S, and the María de Maeztu Program MDM-2016-0692; the GVA of Spain under grants PROMETEO/2016/120 and SEJI/2017/011; the Portuguese FCT under project PTDC/FIS-NUC/2525/2014 and under projects UID/FIS/04559/2020 to fund the activities of LIBPhys-UC; the U.S. Department of Energy under contracts number DE-AC02-06CH11357 (Argonne National Laboratory), DE-AC02-07CH11359 (Fermi National Accelerator Laboratory), DE-FG02-13ER42020 (Texas A&M) and DE-SC0019223 / DE-SC0019054 (University of Texas at Arlington); and the University of Texas at Arlington. DGD acknowledges Ramon y Cajal program (Spain) under contract number RYC-2015-18820. JM-A acknowledges support from Fundación Bancaria la Caixa (ID 100010434), grant code LCF/BQ/PI19/11690012. We also warmly acknowledge the Laboratori Nazionali del Gran Sasso (LNGS) and the Dark Side collaboration for their help with TPB coating of various parts of the NEXT-White TPC. Finally, we are grateful to the Laboratorio Subterráneo de Canfranc for hosting and supporting the NEXT experiment.
References

[1] NEXT Collaboration, F. Monrabal et al., The Next White (NEW) detector, JINST (2018) P12010, [arXiv:1804.02409].
[2] NEXT Collaboration, J. Renner et al., Energy calibration of the NEXT-White detector with 1% resolution near Qββ of ¹³⁶Xe, JHEP (2019) 230, [arXiv:1905.13110].
[3] NEXT Collaboration, P. Ferrario et al., Demonstration of the event identification capabilities of the NEXT-White detector, JHEP (2019) 052, [arXiv:1905.13141].
[4] NEXT Collaboration, P. Novella et al., Radiogenic backgrounds in the NEXT double beta decay experiment, JHEP (2019) 051, [arXiv:1905.13625].
[5] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Machine learning and the physical sciences, Rev. Mod. Phys. (2019) 045002.
[6] A. Aurisano, A. Radovic, D. Rocco, A. Himmel, M. Messier, E. Niner, G. Pawloski, F. Psihas, A. Sousa, and P. Vahle, A convolutional neural network neutrino event classifier, JINST (2016), no. 09 P09001, [arXiv:1604.01444].
[7] MicroBooNE Collaboration, R. Acciarri et al., Convolutional neural networks applied to neutrino events in a liquid argon time projection chamber, JINST (2017), no. 03 P03011, [arXiv:1611.05531].
[8] MicroBooNE Collaboration, C. Adams et al., Deep neural network for pixel-level electromagnetic particle identification in the MicroBooNE liquid argon time projection chamber, Phys. Rev. D (May, 2019) 092001, [arXiv:1808.07269].
[9] N. Choma, F. Monti, L. Gerhardt, T. Palczewski, Z. Ronaghi, P. Prabhat, W. Bhimji, M. M. Bronstein, S. R. Klein, and J. Bruna, Graph neural networks for IceCube signal classification, in , pp. 386–391, 2018.
[10] E. Racah, S. Ko, P. Sadowski, W. Bhimji, C. Tull, S. Oh, P. Baldi, and Prabhat, Revealing fundamental physics from the Daya Bay neutrino experiment using deep neural networks, in , pp. 892–897, 2016, arXiv:1601.07621.
[11] EXO Collaboration, S. Delaquis et al., Deep neural networks for energy and position reconstruction in EXO-200, JINST (2018), no. 08 P08023, [arXiv:1804.09641].
[12] H. Qiao, C. Lu, X. Chen, K. Han, X. Ji, and S. Wang, Signal-background discrimination with convolutional neural networks in the PandaX-III experiment using MC simulation, Science China Physics, Mechanics & Astronomy (2018), no. 10 101007, [arXiv:1905.13141].
[13] P. Ai, D. Wang, G. Huang, and X. Sun, Three-dimensional convolutional neural networks for neutrinoless double-beta decay signal/background discrimination in high-pressure gaseous time projection chamber, JINST (2018), no. 08 P08015, [arXiv:1803.01482].
[14] NEXT Collaboration, J. Renner et al., Background rejection in NEXT using deep neural networks, JINST (2017), no. 01 T01004, [arXiv:1609.06202].
[15] NEXT Collaboration, J. Martín-Albo et al., Sensitivity of NEXT-100 to neutrinoless double beta decay, JHEP (2016) 159, [arXiv:1511.09246].
[16] D. Nygren, High-pressure xenon gas electroluminescent TPC for 0-νββ-decay search, Nucl. Instrum. Meth. A603 (2009) 337–348.
[17] NEXT Collaboration, G. Martínez-Lema et al., Calibration of the NEXT-White detector using ⁸³ᵐKr decays, JINST (2018) P10014, [arXiv:1804.01780].
[18] J. Martín-Albo, The NEXT experiment for neutrinoless double beta decay searches. PhD thesis, Valencia U., IFIC, 2015.
[19] GEANT4 Collaboration, S. Agostinelli et al., GEANT4: A Simulation toolkit, Nucl. Instrum. Meth. A506 (2003) 250–303.
[20] B. Graham and L. van der Maaten, Submanifold sparse convolutional networks, ArXiv abs/1706.01307 (2017).
[21] L. Dominé and K. Terao, Scalable deep convolutional neural networks for sparse, locally dense liquid argon time projection chamber data, ArXiv abs/1903.05663 (2019).
[22] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, arXiv:1512.03385.
[23] K. He, X. Zhang, S. Ren, and J. Sun, Identity mappings in deep residual networks, in ECCV, 2016.
[24] C. Shorten and T. M. Khoshgoftaar, A survey on image data augmentation for deep learning, Journal of Big Data (2019) 1–48.
[25] G. Szekely and M. Rizzo, Testing for equal distributions in high dimension, InterStat (11, 2004).
[26] G. Szekely and M. Rizzo, Energy statistics: A class of statistics based on distances, Journal of Statistical Planning and Inference (08, 2013).
[27] R. Fisher, The Design of Experiments. Oliver and Boyd, 1935.
[28] NEXT Collaboration, V. Álvarez et al., Initial results of NEXT-DEMO, a large-scale prototype of the NEXT-100 experiment, JINST (2013) P04002, [arXiv:1211.4838].
[29] NEXT Collaboration, V. Álvarez et al., Operation and first results of the NEXT-DEMO prototype using a silicon photomultiplier tracking array, JINST (2013) P09011, [arXiv:1306.0471].