Demonstration of background rejection using deep convolutional neural networks in the NEXT experiment

Abstract

Convolutional neural networks (CNNs) are widely used state-of-the-art computer vision tools that are becoming increasingly popular in high energy physics. In this paper, we attempt to understand the potential of CNNs for event classification in the NEXT experiment, which will search for neutrinoless double-beta decay in ¹³⁶Xe. To do so, we demonstrate the usage of CNNs for the identification of electron-positron pair production events, which exhibit a topology similar to that of a neutrinoless double-beta decay event. These events were produced in the NEXT-White high-pressure xenon TPC using 2.6 MeV gamma rays from a ²²⁸Th calibration source. We train a network on Monte Carlo-simulated events and show that, by applying on-the-fly data augmentation, the network can be made robust against differences between simulation and data. The use of CNNs offers significant improvement in signal efficiency/background rejection when compared to previous non-CNN-based analyses.


Prepared for submission to JHEP


NEXT collaboration

M. Kekic, C. Adams, K. Woodruff, J. Renner, E. Church, M. Del Tutto, J.A. Hernando Morata, J.J. Gómez-Cadenas (a), V. Álvarez, L. Arazi, I.J. Arnquist, C.D.R Azevedo, K. Bailey, F. Ballester, J.M. Benlloch-Rodríguez, F.I.G.M. Borges, N. Byrnes, S. Cárcel, J.V. Carrión, S. Cebrián, C.A.N. Conde, T. Contreras, G. Díaz, J. Díaz, M. Diesburg, J. Escada, R. Esteve, R. Felkai, A.F.M. Fernandes, L.M.P. Fernandes, P. Ferrario, A.L. Ferreira, E.D.C. Freitas, J. Generowicz, S. Ghosh, A. Goldschmidt, D. González-Díaz, R. Guenette, R.M. Gutiérrez, J. Haefner, K. Hafidi, J. Hauptman, C.A.O. Henriques, P. Herrero, V. Herrero, Y. Ifergan, B.J.P. Jones, L. Labarga, A. Laing, P. Lebrun, N. López-March, M. Losada, R.D.P. Mano, J. Martín-Albo, A. Martínez, G. Martínez-Lema (b), M. Martínez-Vara, A.D. McDonald, Z.-E. Meziani, F. Monrabal, C.M.B. Monteiro, F.J. Mora, J. Muñoz Vidal, P. Novella, D.R. Nygren (a), B. Palmeiro, A. Para, J. Pérez, M. Querol, A.B. Redwine, L. Ripoll, Y. Rodríguez García, J. Rodríguez, L. Rogers, B. Romeo, C. Romo-Luque, F.P. Santos, J.M.F. dos Santos, A. Simón, C. Sofka (c), M. Sorel, T. Stiegler, J.F. Toledo, J. Torrent, A. Usón, J.F.C.A. Veloso, R. Webb, R. Weiss-Babai (d), J.T. White (e), N. Yahlali

Department of Physics and Astronomy, Iowa State University, 12 Physics Hall, Ames, IA 50011-3160, USA
Argonne National Laboratory, Argonne, IL 60439, USA
Department of Physics, University of Texas at Arlington, Arlington, TX 76019, USA
Institute of Nanostructures, Nanomodelling and Nanofabrication (i3N), Universidade de Aveiro, Campus de Santiago, Aveiro, 3810-193, Portugal
Fermi National Accelerator Laboratory, Batavia, IL 60510, USA
Nuclear Engineering Unit, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva, 8410501, Israel
Nuclear Research Center Negev, Beer-Sheva, 84190, Israel
Lawrence Berkeley National Laboratory (LBNL), 1 Cyclotron Road, Berkeley, CA 94720, USA
Ikerbasque, Basque Foundation for Science, Bilbao, E-48013, Spain
Centro de Investigación en Ciencias Básicas y Aplicadas, Universidad Antonio Nariño, Sede Circunvalar, Carretera 3 Este No. 47 A-15, Bogotá, Colombia
Department of Physics, Harvard University, Cambridge, MA 02138, USA
Laboratorio Subterráneo de Canfranc, Paseo de los Ayerbe s/n, Canfranc Estación, E-22880, Spain
LIBPhys, Physics Department, University of Coimbra, Rua Larga, Coimbra, 3004-516, Portugal
LIP, Department of Physics, University of Coimbra, Coimbra, 3004-516, Portugal
Department of Physics and Astronomy, Texas A&M University, College Station, TX 77843-4242, USA
Donostia International Physics Center (DIPC), Paseo Manuel Lardizabal, 4, Donostia-San Sebastian, E-20018, Spain
Escola Politècnica Superior, Universitat de Girona, Av. Montilivi, s/n, Girona, E-17071, Spain
Departamento de Física Teórica, Universidad Autónoma de Madrid, Campus de Cantoblanco, Madrid, E-28049, Spain
Instituto de Física Corpuscular (IFIC), CSIC & Universitat de València, Calle Catedrático José Beltrán, 2, Paterna, E-46980, Spain
Pacific Northwest National Laboratory (PNNL), Richland, WA 99352, USA
Instituto Gallego de Física de Altas Energías, Univ. de Santiago de Compostela, Campus sur, Rúa Xosé María Suárez Núñez, s/n, Santiago de Compostela, E-15782, Spain
Instituto de Instrumentación para Imagen Molecular (I3M), Centro Mixto CSIC - Universitat Politècnica de València, Camino de Vera s/n, Valencia, E-46022, Spain
Centro de Astropartículas y Física de Altas Energías (CAPA), Universidad de Zaragoza, Calle Pedro Cerbuna, 12, Zaragoza, E-50009, Spain

(a) NEXT Co-spokesperson.
(b) Now at Weizmann Institute of Science, Israel.
(c) Now at University of Texas at Austin, USA.
(d) On leave from Soreq Nuclear Research Center, Yavneh, Israel.
(e) Deceased.



Keywords:

Neutrinoless double beta decay; TPC; high-pressure xenon chambers; NEXT experiment; CNN; event classification

Machine learning techniques have recently captured the interest of researchers in various scientific fields, including particle physics, and are now being employed in search of improved solutions to a variety of problems. In this study, we show that deep convolutional neural networks (CNNs) trained on Monte Carlo simulation can be used to classify, to a high degree of accuracy, events containing particular topologies of ionization tracks acquired from a high-pressure xenon (HPXe) time projection chamber (TPC). As CNNs trained on simulation are known to be difficult to apply directly to data due to the challenges associated with producing a Monte Carlo that perfectly matches experiment, we also present methods for extending the domain of application of a CNN trained on simulated events to include real events. We claim that our use of these methods in adapting CNNs to the experimental domain and verifying their performance is novel to the use of CNNs in the field.

Event classification is of critical importance in experiments searching for rare physics, as the successful rejection of background events can lead to significant improvements in overall sensitivity. The NEXT (Neutrino Experiment with a Xenon TPC) experiment is searching for neutrinoless double-beta decay (0νββ) in ¹³⁶Xe at the Canfranc underground laboratory in Spain. In the ongoing first phase of the experiment, the 5 kg-scale TPC NEXT-White [1] has demonstrated excellent energy resolution [2] and the ability to reconstruct high-energy (O(2) MeV) ionization tracks and distinguish between the topological signatures of two-electron and one-electron tracks [3]. It has also been used to perform a detailed measurement of the background distribution and is expected to be capable of measuring the 2νββ mode in ¹³⁶Xe with 3.5σ sensitivity after 1 year of data-taking [4]. The next phase of the experiment, the 100 kg-scale detector NEXT-100, will search for the 0νββ mode at Qββ, around 2.5 MeV. New techniques such as CNNs, which analyze the topology of an event near Qββ and aim to eliminate background events, are becoming more relevant and essential to reaching the best possible sensitivity.

Machine learning techniques have seen many recent applications in physics [5]. In neutrino physics in particular, CNNs have been applied to particle identification in sampling calorimeters in the NOvA experiment [6]. The MicroBooNE experiment has also employed CNNs for event classification and localization [7] and track segmentation [8] in liquid argon TPCs. IceCube has applied graph neural networks to perform neutrino event classification [9], and DayaBay identified antineutrino events in gadolinium-doped liquid scintillator detectors using CNNs and convolutional autoencoders [10]. Experiments searching for 0νββ decay have also employed CNNs: EXO has studied the use of CNNs to extract event energy and position from raw waveform information in a liquid xenon TPC [11], and PandaX-III has performed simulation studies demonstrating the use of CNNs for background rejection in an HPXe gas TPC with a micromegas-based readout [12]. Further simulation studies in HPXe TPCs with a charge readout scheme (“Topmetal”) allowing for detailed 3D track reconstruction have also shown the potential of CNNs for background rejection in 0νββ searches [13]. NEXT has also presented an initial simulation study [14] of the use of CNNs for background rejection. In this study we show that CNNs can be applied to real NEXT data, using electron-positron pair production to generate events with a two-electron “ββ-like” topology and studying how the energy distribution of such events changes when varying an acceptance cut on the classification prediction of a CNN.

The paper is organized as follows: section 2 describes the topological signature of a signal event. In section 3 the data acquisition and reconstruction are explained. A description of the CNN and training procedure, as well as the evaluation on MC and data, is given in section 4. Finally, conclusions are drawn in section 5.

In a fully-contained 0νββ event recorded by an HPXe TPC, two energetic electrons produce ionization tracks emanating from a common vertex. Though the fraction of the energy Qββ carried by each individual electron may differ event-by-event, the general pattern observed is similar for the majority of events, and consists of an extended track capped on both ends by two “blobs”, or regions of relatively higher ionization density. These regions are present due to the increase in stopping power experienced by electrons in xenon gas as they slow to lower energies. They provide a distinct signature for 0νββ decay, as measured tracks with similar energy produced by single electrons, for example in photoelectric interactions of background gamma radiation, contain only one such “blob”. (Events with multiple tracks are easier to reject, simply by counting the number of isolated depositions.) The use of this signature, illustrated in Fig. 1, in performing background rejection is an essential part of the NEXT approach to maximizing sensitivity to 0νββ decay.
[Figure 1 panels: SIGNAL (left) and BACKGROUND (right); axes X (mm) and Y (mm).]

Figure 1: Energy depositions from trajectories in a Monte Carlo simulation of a 0νββ event, showing its distinct two-electron topological signature (left), compared with that of a single-electron event (right) of the same energy (figure from [15]).

In order to demonstrate this approach experimentally, a reliable source of events with a similar topological signature is necessary. Electron-positron pair production by high-energy gammas, followed by the subsequent escape from the active volume of the two 511 keV gamma rays produced in positron annihilation (“double-escape”), leaves a two-blob track formed by the electron and positron emitted from a common vertex, similar to the track that would be left by a 0νββ event. In this study, we use gamma rays of energy 2614.5 keV from ²⁰⁸Tl (provided by a ²²⁸Th calibration source, see Fig. 2) and observe the events in the double-escape peak at 1592 keV. This peak lies on top of an exponential background of single-electron tracks from Compton scattering of the calibration gamma rays and other background radiation. Experimentally, then, we have a sample containing 0νββ-like events and background-like events. By evaluating these events with a Monte-Carlo-trained neural network and studying the resulting distribution of accepted events, we can demonstrate, using real data acquired with the NEXT-White TPC, the potential performance of such a network when employed in a 0νββ search. These results can be compared to a similar, non-CNN-based analysis published in [3].
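As a quick check of the peak position quoted above, the double-escape energy is the gamma energy minus the energy carried away by the two escaping annihilation photons. The minimal sketch below does this arithmetic; the function name is illustrative only, and the rounded 511 keV photon energy is the value used in the text.

```python
def double_escape_energy_kev(gamma_kev, annihilation_kev=511.0):
    """Energy left in the detector when both annihilation photons escape.

    Illustrative helper (not part of any NEXT software): the deposited
    energy is the gamma energy minus two annihilation photons.
    """
    return gamma_kev - 2.0 * annihilation_kev


# 2614.5 keV gamma ray from 208Tl, as quoted in the text.
peak_kev = double_escape_energy_kev(2614.5)
print(f"double-escape peak: {peak_kev:.1f} keV")
```

With the rounded photon energy this gives a value within about 1 keV of the 1592 keV peak position used throughout the paper.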

The NEXT-White TPC measures both the primary scintillation and ionization produced by a charged particle traversing its active volume of high-pressure xenon gas. The main detector components are housed in a cylindrical stainless steel pressure vessel lined with copper shielding and include two planes of photosensors, one at each end, and several semi-transparent wire meshes to which voltages are applied, defining key regions of the detector (see Fig. 2). The two planes of photosensors are organized into an energy plane, containing 12 PMTs (photomultiplier tubes, Hamamatsu model R11410-10) behind the cathode, and a tracking plane containing a grid of 1792 SiPMs (silicon photomultipliers, SensL series-C, spaced at a 10 mm pitch) behind the anode. These sensors observe the scintillation produced in the active volume of the detector by ionizing radiation, including primary scintillation produced by excitations of the xenon atoms during the creation of the ionization track and secondary scintillation produced by electroluminescence (EL) of the ionization electrons. Note that in practice only the PMTs observe a consistently measurable primary scintillation signal, while EL is observed by both the PMTs and the SiPMs.

[Figure 2 labels: PMT (energy) plane, cathode, 50 cm drift region, gate, electroluminescent (EL) gap (6 mm), quartz plate (anode), SiPM (tracking) plane, copper shielding, ²²⁸Th source, ¹³⁷Cs source.]

Figure 2: Schematic of the NEXT-White TPC, showing the positioning of the calibration sources (¹³⁷Cs and ²²⁸Th) present during data acquisition for this study (figure derived from [2]).

EL occurs after the electrons of the ionization track are drifted through the active region by an electric field (of order 400 V/cm) created by application of high voltage to the cathode (-30 kV) and gate (-7.6 kV) meshes and arrive at the EL gap, a narrow (6 mm) region defined by the gate mesh and a grounded quartz plate on which a conductive indium tin oxide (ITO) coating has been deposited. The large voltage drop over the narrow gap between the gate and the grounded plate creates an electric field high enough to accelerate the electrons to energies sufficient to excite the xenon without producing further ionization, allowing for better energy resolution compared to charge-avalanche detectors [16]. The subsequent decay of these excitations leads to EL scintillation, yielding of order 500-1000 photons per electron traversing the EL gap. These photons, produced just in front of the tracking plane, cast a pattern of light on the SiPMs which can be used to reconstruct the (x, y) location of the ionization. The PMTs located in the energy plane on the opposite side of the detector see a more uniform distribution of light, including EL photons that have undergone a number of reflections in the detector, and record a greater total number of photons for a more precise measurement of the energy. The time difference between the observation of the primary scintillation (called S1) and the secondary EL scintillation (called S2) gives the distance drifted by the ionization electrons before arriving at the EL region, corresponding to the z location at which this ionization was produced.

The data used in this study consisted of events with total energy near 1.6 MeV, including electron-positron events produced in pair production interactions from a 2.6 MeV gamma ray (see section 2) and background events, mostly due to Compton scattering of the same 2.6 MeV gamma rays (environmental radioactivity is negligible compared to that of the source). The acquired signals for each event consisted of 12 PMT waveforms sampled at 25 ns intervals and 1792 SiPM waveforms sampled at 1 µs intervals, for a total duration per read-out greater than the TPC maximum drift time (approximately 500 µs). The ADC counts per unit of time in each waveform were converted to photoelectrons per unit time via conversion factors established by periodic calibration, a standard procedure in NEXT-White operation. The calibrations were performed by driving LEDs installed inside the vessel with short pulses and measuring the integrated ADC counts corresponding to a single photoelectron (pe).

The analysis of the acquired data was similar to that of [3]. The 12 PMT waveforms were summed, weighted by their calibrated gains, to produce a single waveform in which scintillation pulses were identified and classified as S1 or S2 according to their shape and location within the waveform. Events containing a single S1 pulse and at least one S2 pulse were selected, and for these events, the S2 information was used to reconstruct the ionization track. To do this, the S2 information was integrated into time bins of width 2 µs in both the PMTs and SiPMs. Note that to eliminate dark noise, SiPM samples with less than 1 pe were not included in the integration.

For each time bin, one or more energy depositions (“hits”) were reconstructed, and the pattern of signals observed on the SiPMs was used to determine the number of hits for a specific time bin and their corresponding (x, y) coordinates. A hit was assigned to the location of all SiPMs with an observed signal greater than a given threshold, and the total energy measured by the PMTs in that time bin was redistributed among the hits according to their relative SiPM signals.

The energy of each hit as measured by the PMTs was then corrected, hit-by-hit, by two multiplicative factors, one accounting for geometric variations in the light response in the EL plane and the other for electron attachment due to a finite electron lifetime in the gas. These correction factors were mapped out over the active volume by simultaneously acquiring events from decays of ⁸³ᵐKr, which was injected into the xenon gas and provided uniformly distributed point-like depositions of energy 41.5 keV [17]. The z-coordinate of each hit in the time bin was obtained from the time difference between the S1 and S2 pulses, assuming an electron drift velocity of 0.91 mm/µs, as extracted from an analysis of the ⁸³ᵐKr events. A residual dependence of the event energy on the length of the event along the z-axis is observed, and a linear correction is performed to model this effect, which is not observed in simulation and remains to be fully understood. For details on this “axial length” effect, see [2].

The detector volume surrounding the reconstructed hits was then partitioned into 3D voxels of size 10×10×5 mm³, and the energy of all hits that fell within each voxel was integrated. The X and Y dimensions of the individual voxels were chosen based on the 1 cm SiPM pitch, while the Z dimension was chosen to account for most of the longitudinal diffusion (the 1σ spread at maximum drift length is ∼ […]).

Figure 3: Reconstructed hits (left) and voxels (right) of a background Monte Carlo event. The volume within a tight bounding box encompassing the reconstructed hits is divided into 10×10×5 mm³ voxels to produce the voxelized track.
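The drift-time-to-z conversion and the voxelization step described above can be illustrated with a minimal sketch. It is not the NEXT reconstruction code: the function names, the hit tuple layout, and the choice of the bounding-box corner as voxel origin are assumptions made for illustration, while the drift velocity and voxel size are the values quoted in the text.

```python
import math
from collections import defaultdict

DRIFT_VELOCITY_MM_PER_US = 0.91     # drift velocity quoted in the text
VOXEL_SIZE_MM = (10.0, 10.0, 5.0)   # voxel dimensions quoted in the text


def drift_time_to_z(t_s1_us, t_s2_us, velocity=DRIFT_VELOCITY_MM_PER_US):
    """z position (mm) from the S1-S2 time difference (microseconds)."""
    return (t_s2_us - t_s1_us) * velocity


def voxelize(hits, voxel_size=VOXEL_SIZE_MM):
    """Sum hit energies into voxels of a tight bounding box.

    hits: iterable of (x_mm, y_mm, z_mm, energy) tuples (hypothetical layout).
    Returns a sparse dict mapping (ix, iy, iz) voxel indices to summed energy.
    """
    hits = list(hits)
    if not hits:
        return {}
    # Assumption: voxel indices are counted from the minimum corner of the hits.
    origin = tuple(min(hit[k] for hit in hits) for k in range(3))
    voxels = defaultdict(float)
    for x, y, z, energy in hits:
        index = tuple(
            int(math.floor((coord - start) / size))
            for coord, start, size in zip((x, y, z), origin, voxel_size)
        )
        voxels[index] += energy
    return dict(voxels)


# Toy hits (made-up numbers): two close depositions and one 25 mm away in x.
z0 = drift_time_to_z(0.0, 100.0)
toy_hits = [(0.0, 0.0, z0, 1.0), (3.0, 4.0, z0 + 1.0, 0.5), (25.0, 0.0, z0, 0.25)]
toy_voxels = voxelize(toy_hits)
print(toy_voxels)
```

Keeping the result as a sparse dictionary of occupied voxels mirrors the fact that most voxels in the detector volume carry no charge.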
To generate the events used in training the neural network, a full Monte Carlo (MC) of the detector, including the pressure vessel, internal copper shielding, and sensor planes, was constructed using Nexus [18], a simulation package for NEXT based on GEANT4 [19] (version geant4.10.02.p01). The ²²⁸Th calibration source decay and the resulting interactions of the decay products were simulated by GEANT4, up to and including the production of the ionization track. Events in the energy range of 1.4-1.8 MeV were selected, and the subsequent electron drift, diffusion, electroluminescence, photon detection, and electronic readout processes were simulated outside of GEANT4 to produce for each event a set of sensor waveforms corresponding to those acquired in NEXT-White. The analysis of data waveforms described in section 3.2 could then be applied to these MC waveforms to produce voxelized tracks (see Fig. 3). MC events that were fully contained in the active detector volume were used in the training set. To ensure the classification was done only based on the track topology, the energy of each voxel was scaled by the total event energy (the sum of voxel intensities for a given event was normalized to 1) such that the training data did not contain event energy information. Those events containing an electron and a positron registered in the MC true information, with no additional energy deposited by the two 511 keV gamma rays produced upon annihilation of the positron (i.e. a true “double-escape”), were tagged as “signal” events and all others were tagged as “background”.

[Figure 4 panels: event count vs. Energy (MeV); legends: all events, signal events (left); all events, sidebands (right).]

Figure 4: Left: Energy distribution of all MC events (dashed line histogram) and of chosen signal events (solid histogram). Right: Energy distribution of experimental data events showing selected sideband events. The sidebands are 100 keV in width, with each band starting 45 keV from either side of the double-escape peak. The same procedure is also used to select the sidebands in MC.

In [3], an additional single-track selection cut is made, and for a fair comparison with this previous result we also apply the same cut (obtained from the standard track reconstruction; for details see [3]) on test data only, for both MC and experimental data. As a reference, inside the peak energy range, the efficiency of the single-track cut was ∼ […]

[…] 0νββ search for which we do not have a confirmed understanding of the underlying physics, nor would it be justified to make predictions on the same events used in optimizing the network.

Therefore, we develop a general paradigm (as described in section 4.3) that could be applied at 0νββ energies and, in evaluating the performance of the network on the data domain, uses events outside the energy range within which we intend to make predictions. Namely, before applying the CNN to the peak itself, we evaluate the performance on the peak sidebands (see Fig. 4), where the sample composition is known, and we expect the CNN predictions to be similar in data and MC.
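The sideband selection can be sketched as follows. This is a hedged illustration only: it assumes the 45 keV offset is measured from the nominal 1592 keV peak energy (the text does not state whether the offset is taken from the peak centre or from the edges of the peak window), and the function names are hypothetical.

```python
PEAK_MEV = 1.592    # nominal double-escape peak energy from the text
GAP_MEV = 0.045     # distance of each sideband from the peak (assumed from its centre)
WIDTH_MEV = 0.100   # sideband width quoted in the Figure 4 caption


def sideband_windows(peak=PEAK_MEV, gap=GAP_MEV, width=WIDTH_MEV):
    """Return the (low, high) energy windows of the left and right sidebands."""
    left = (peak - gap - width, peak - gap)
    right = (peak + gap, peak + gap + width)
    return left, right


def in_sideband(energy_mev, windows):
    """True if the event energy falls inside any of the sideband windows."""
    return any(low <= energy_mev < high for low, high in windows)


windows = sideband_windows()
# Toy event energies in MeV (made-up values for illustration).
toy_energies = [1.46, 1.50, 1.592, 1.60, 1.70]
selected = [energy for energy in toy_energies if in_sideband(energy, windows)]
print(windows, selected)
```

Under this assumption the two windows are roughly 1.447-1.547 MeV and 1.637-1.737 MeV, so that events in the peak region itself are never used when checking the data/MC agreement.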
The underlying assumption is that the domain shift between MC and data is not correlated with the type of event, i.e. we expect that if a network is robust to MC/data differences on the sidebands, it will be robust to MC/data differences in the peak region as well. In [3] it was shown that the track length difference between data and MC is consistent across a wide energy range, giving us confidence that the differences are indeed coming from the detector simulation and reconstruction (which should have the same effect on both signal and background events), rather than from incorrectly simulated physical processes, justifying the sidebands-testing approach.

In this study we embedded our network architecture within the Submanifold Sparse Convolutional Networks (SCN) framework [20], implemented in PyTorch. SCN is highly suitable for sparse input data, making the linear algebra far more efficient than with non-sparse techniques. Further, in SCN the convolution rules allow only nonzero voxels in initial layers to hold nonzero output, thus conserving input sparsity. SCN is appropriate for our detector, in which the large majority of voxels have zero charge. Such networks have already been used in high energy physics analysis [21], and the main advantage of these types of network is that they occupy less memory and allow for larger input volumes and/or larger batch sizes. All of the results shown here were obtained using this framework, but we obtained similar results using the standard implementation of dense convolutions in Keras/TensorFlow.

We employed a residual [22] 3D CNN in performing the topological classification task. The network architecture is summarized in Fig. 5. The network consisted of two initial convolutional layers and a set of pre-activated ResNet block layers [23], followed by two consecutive dense layers with a dropout layer before each.
The input dimensions were 40×40×110, with each input element corresponding to one voxel, therefore covering a volume of 40×40×55 cm³, essentially the entire active volume of the detector. The output was a 2-element probability vector.

[Figure 5 panels: (a) ResNet architecture; (b) ResNet pre-activated block.]

Figure 5: a) Summary of the neural network architecture used in this analysis, with b) details of each ResNetBlock architecture.

A total of about 500k simulated fiducial events were used as a training set, of which 200k were signal events, and an additional ∼30k events were used as a validation sample with a similar signal proportion. A batch size of 1024 was chosen, and binary cross entropy, weighted according to the signal/background ratio of the entire data set, was used as the loss function. To avoid overfitting, L2 weight regularization and dropout were employed, as well as on-the-fly data augmentation [24], including translations, dilation or “zooming” (scaling all 3 axes independently), flipping in x and y, and varying SiPM charge cuts as detailed in Fig. 6. (On-the-fly means that the augmentation is done during the code execution and the augmented dataset is not stored on disk. For validation/testing, the SiPM cut was fixed at 20 photoelectrons, while in the augmentation it was allowed to vary by ±10 around this value.) We note that the augmentation procedures used here are explicitly designed to be “label preserving” in that they do not change the single- or double-blob nature of events, but do reduce the significance of differences in data/simulation.

[Figure 6 panels: original, different SiPM charge, translation, x-flipping, zoom; projections xy, xz, yz.]

Figure 6: Example of on-the-fly data augmentation used during training on a selected signal event, projected on three planes for easier visualization.

As noted in section 4.1, since CNNs are highly nonlinear models, their application outside the training domain cannot be assumed to be reliable, and before applying the network to events in the peak we compare extracted “features” of MC and data events on the sidebands. (In machine learning language, features are sets of numbers extracted from input data that are relevant to the task being learned.) It is common to consider convolutional layers as feature extractors (each one extracting higher level features), and consecutive dense layers as a classifier. We chose the first flattened layer as a representative feature vector and applied a two-sample test, i.e. a test to determine whether independent random samples of $\mathbb{R}^d$-valued random vectors are drawn from the same underlying distribution, for which we chose energy test statistics [25, 26]. The energy distance between two sets A, B is given by

$$\epsilon_{A,B} = \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\lVert x_i - y_j\rVert - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lVert x_i - x_j\rVert - \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\lVert y_i - y_j\rVert \tag{4.1}$$

where $x_i$ and $y_j$ are the $n$ and $m$ samples drawn from the two sets. In [26], it was proven that this quantity is non-negative and equal to zero only if $x_i$ and $y_j$ are identically distributed. The p-value, or probability of observing an equal or more extreme value than the measured value, for rejecting the null hypothesis (in this case, that the samples come from the same distribution) can be calculated via the permutation test [27]. Namely, the nominal energy distance is computed, and the $x_i$ and $y_j$ are then divided into many (1000 in our case) possible arrangements of two groups of size $n$ and $m$. The energy distance is computed again for each of these arrangements, each of which corresponds to one permutation. The p-value is given by the fraction of permutations in which the energy distance was larger than the nominal one.

The training and validation losses, which are measures of disagreement between the CNN predictions and true labels, are given in Fig. 7 for the networks trained with and without data augmentation. The overfitting apparent in the case of training without augmentation is prompt and is manifested in the divergence of the training and validation losses, meaning that the network is beginning to memorize the training dataset and is not generalizing well. In Fig. 8 we show that the data augmentation also reduces the data/MC features distribution distance (eq. 4.1), giving us more confidence that the performance on data will be similar to the performance on MC. As the distances are always calculated on MC and data events directly (without applying any data augmentation transformations), this technique does not directly correct MC but rather makes the model more robust to the data/MC differences.

The final model is chosen by varying regularization parameters and selecting the training iteration step that gives minimal classification loss on the MC validation sample, ensuring that the corresponding p-value of energy test statistics is not larger than 5%.
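The energy distance of eq. 4.1 and its permutation-test p-value can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the analysis code: the function names are hypothetical, the feature vectors below are random toy data, and the quadratic-cost double loops are only practical for small samples.

```python
import math
import random


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (p in a, q in b)."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """Energy distance between two samples of feature vectors (eq. 4.1)."""
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Fraction of random regroupings with a larger distance than the nominal one."""
    rng = random.Random(seed)
    nominal = energy_distance(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    larger = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random split into groups of size n and m
        if energy_distance(pooled[:n], pooled[n:]) > nominal:
            larger += 1
    return larger / n_perm


# Toy 2D "features" (made-up): two clearly separated clouds of points.
toy_rng = random.Random(1)
mc_like = [(toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)) for _ in range(20)]
data_like = [(toy_rng.gauss(5, 1), toy_rng.gauss(5, 1)) for _ in range(20)]
print(energy_distance(mc_like, data_like))
print(permutation_p_value(mc_like, data_like, n_perm=200))
```

For samples drawn from the same distribution the nominal distance is typically exceeded by a sizeable fraction of the permutations, while for clearly different samples, as in the toy example, almost no permutation exceeds it and the p-value is close to zero.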

[Figure 7 panels: Loss vs. Iterations; legends: train, validation.]

Figure 7: Training and validation losses without (left) and with (right) the application of data augmentation to the training set. Overfitting in the left-hand plot is visible only after 1000 iterations. As the augmentation procedure is only relevant to the training phase, it was not applied to the validation set. The ability of the network to make correct predictions is improved for events unaltered by data augmentation, which explains why the loss is higher for the training set than for the validation set in the right-hand plot.
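Two of the label-preserving augmentations used in training, flipping in x and y and translation, can be sketched on a sparse voxel representation as below. This is a hedged illustration rather than the training code: the dictionary layout, the flip probability of 0.5, and the maximum shift of 5 voxels are assumptions, and the zoom and SiPM-charge-cut augmentations are not included.

```python
import random

GRID_SHAPE = (40, 40, 110)  # network input dimensions quoted in the text


def flip(voxels, axis, shape=GRID_SHAPE):
    """Mirror the event along one axis (0 = x, 1 = y)."""
    flipped = {}
    for index, energy in voxels.items():
        new_index = list(index)
        new_index[axis] = shape[axis] - 1 - index[axis]
        flipped[tuple(new_index)] = energy
    return flipped


def translate(voxels, shift, shape=GRID_SHAPE):
    """Shift the event by whole voxels; keep it unchanged if it would leave the grid."""
    moved = {
        tuple(i + s for i, s in zip(index, shift)): energy
        for index, energy in voxels.items()
    }
    inside = all(0 <= i < n for index in moved for i, n in zip(index, shape))
    return moved if inside else dict(voxels)


def augment(voxels, rng, max_shift=5):
    """Apply random x/y flips and a random translation (assumed parameters)."""
    out = dict(voxels)
    for axis in (0, 1):
        if rng.random() < 0.5:
            out = flip(out, axis)
    shift = tuple(rng.randint(-max_shift, max_shift) for _ in range(3))
    return translate(out, shift)


# Toy normalized event (made-up): two voxels whose energies sum to 1.
toy_event = {(10, 10, 50): 0.6, (11, 10, 50): 0.4}
toy_augmented = augment(toy_event, random.Random(0))
print(toy_augmented)
```

Because each transformation only relabels voxel positions, the number of occupied voxels and their energies are unchanged, which is the sense in which such augmentations preserve the single- or double-blob label.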

[Figure 8 panels: Energy distance vs. Iterations for the left and right sidebands; legends: no augmentation, augmentation.]

Figure 8: Energy distance between data and MC features during the training on the left sideband (left) and right sideband (right) for training with and without the augmentation. The corresponding p-value for the chosen model with augmentation at the chosen iteration step was ∼ […]

In an ideal test of the trained network, we would have a data sample of only e+e− events at the energy of interest acquired from our detector, and another sample of single-electron events at the same energy. However, as we will always have background events, in particular due to Compton scattering of the high-energy gamma rays used in producing the e+e− events with the topology of interest, an exactly-labeled test set of detector data is impossible. Therefore we make an assumption about the characteristics of the energy spectrum near the energy of interest and attempt to extract the number of signal and background events present, following the procedure explained in [3].

First, we select only fiducial events passing a single-track cut as explained in section 4.1. Note that the single-track cut was not applied to the training set, but we do apply it to the test set to allow for exact comparison with the previous analysis. We then assume that the signal events produce a Gaussian peak (as indeed would be the case for events occurring at a precise energy), and that the background, consisting of Compton electrons, in the region of the peak can be characterized by an exponential distribution. The peak energy region is fixed to 1.570-1.615 MeV (as in [3]), a region that contains more than 99.5% of the Gaussian peak for both data and MC.
Then, we apply an unbinned fit of the sum of two curves (Gaussian + exponential) to the full energy spectrum in the larger energy range 1.45-1.75 MeV in order to keep the fits stable, obtaining the parameters defining the two curves. Integrating over the theoretical Gaussian and exponential curves in the peak energy range gives us the estimate of the initial number of signal events $s_0$ (from the Gaussian) and the initial number of background events $b_0$ (from the exponential). This procedure is then repeated using the spectra obtained from events with network classification greater than a varying threshold, in each case obtaining the number of accepted signal events $s$ and accepted background events $b$.

Footnote: Note the slightly narrower energy range compared to the one specified in section 4.1, which is used to pre-select MC events based on true energy deposited in the detector. This choice is made to remove artificial disturbances at the end points of the selected energy range.

Figure 9 illustrates this fit procedure for three different threshold values on the CNN prediction output using data from NEXT-White and a set of Monte Carlo simulated events which were not present in the training set. Varying the classification threshold traces out a curve in the space of signal acceptance $s/s_0$ vs. background rejection $1 - b/b_0$ (see Fig. 10). To obtain optimal sensitivity in a $0\nu\beta\beta$ search, one must maximize the ratio of accepted signal to the square root of the accepted background [15], and therefore we also construct the figure of merit $F = s/\sqrt{b}$ for the various classification thresholds. We show for comparison the non-CNN-based result obtained in [3]. In Monte Carlo, we find a maximum figure of merit of $F = 2.20$ with signal acceptance $s/s_0 = 0.70$ and background rejection $1 - b/b_0 = 0.90$. In data, fixing the CNN cut to the one giving the best Monte Carlo figure of merit, we find $F = 2.21$, with signal acceptance $s/s_0 = 0.65$ and background rejection $1 - b/b_0 = 0.$[…]
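The fit-and-integrate step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the collaboration's implementation: the function names (`nll`, `estimate_s_b`, `toy_spectrum`), the non-extended likelihood, the Nelder-Mead minimizer and every toy-spectrum parameter (event counts, 8 keV peak width, exponential slope) are hypothetical choices made here. Only the fit range (1.45-1.75 MeV) and the peak window (1.570-1.615 MeV) come from the text.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

FIT_LO, FIT_HI = 1.45, 1.75       # fit range in MeV (from the text)
PEAK_LO, PEAK_HI = 1.570, 1.615   # peak window in MeV (from the text)


def gauss_area(lo, hi, mu, sigma):
    """Area of a unit-normalised Gaussian between lo and hi."""
    return norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)


def expo_area(lo, hi, lam):
    """Area of exp(-lam * (E - FIT_LO)) between lo and hi."""
    return (np.exp(-lam * (lo - FIT_LO)) - np.exp(-lam * (hi - FIT_LO))) / lam


def nll(params, energies):
    """Unbinned negative log-likelihood of the Gaussian + exponential mixture."""
    f_sig, mu, sigma, lam = params
    if not (0.0 < f_sig < 1.0 and FIT_LO < mu < FIT_HI and sigma > 0.0 and lam > 0.0):
        return np.inf  # reject unphysical parameter values
    # Both components are normalised to unit area on the fit range.
    sig_pdf = norm.pdf(energies, mu, sigma) / gauss_area(FIT_LO, FIT_HI, mu, sigma)
    bkg_pdf = np.exp(-lam * (energies - FIT_LO)) / expo_area(FIT_LO, FIT_HI, lam)
    return -np.sum(np.log(f_sig * sig_pdf + (1.0 - f_sig) * bkg_pdf))


def estimate_s_b(energies, start=(0.3, 1.592, 0.010, 5.0)):
    """Fit the spectrum, then integrate each component over the peak window."""
    e = np.asarray(energies, dtype=float)
    e = e[(e >= FIT_LO) & (e <= FIT_HI)]
    res = minimize(nll, start, args=(e,), method="Nelder-Mead",
                   options={"maxiter": 2000})
    f_sig, mu, sigma, lam = res.x
    s = (len(e) * f_sig * gauss_area(PEAK_LO, PEAK_HI, mu, sigma)
         / gauss_area(FIT_LO, FIT_HI, mu, sigma))
    b = (len(e) * (1.0 - f_sig) * expo_area(PEAK_LO, PEAK_HI, lam)
         / expo_area(FIT_LO, FIT_HI, lam))
    return s, b


def toy_spectrum(n_sig=2000, n_bkg=6000, seed=0):
    """Hypothetical toy data: Gaussian peak on a truncated exponential."""
    rng = np.random.default_rng(seed)
    sig = rng.normal(1.592, 0.008, n_sig)
    lam, width = 4.0, FIT_HI - FIT_LO
    u = rng.random(n_bkg)
    # Inverse-CDF sampling of an exponential truncated to the fit range.
    bkg = FIT_LO - np.log(1.0 - u * (1.0 - np.exp(-lam * width))) / lam
    return np.concatenate([sig, bkg])


s_hat, b_hat = estimate_s_b(toy_spectrum())
print(f"estimated signal in peak window:     {s_hat:.0f}")
print(f"estimated background in peak window: {b_hat:.0f}")
```

Calling `estimate_s_b` once on all selected events gives the initial yields, and calling it again on the events that pass a given network threshold gives the accepted yields for that threshold.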

[Figure 9 panels: Monte Carlo (top row) and data (bottom row), each at cut = 0.0, 0.5 and 0.85; x-axis: Energy (MeV); y-axis: Events/bin; curves: data, gaussian, exponential]

Figure 9: Fits of the energy spectra of Monte Carlo simulation (top) and data (bottom) near the double-escape peak of $^{208}$Tl at 1592 keV (see text for details). The fits are shown using events passing a neural network classification cut of 0.0 (to obtain the total number of signal and background events), 0.5 (the cut usually used in binary classification problems) and 0.85 (the cut that was found to yield the optimal f.o.m.).

[Figure 10 panels: left, y-axis: Signal efficiency; curves: MC true labels (CNN), MC fit (CNN), data fit (CNN), MC fit (standard), data fit (standard). Right, x-axis: CNN prediction threshold; y-axis: f.o.m.; curves: data (standard) maximum, MC true labels (CNN), MC fit (CNN), data fit (CNN)]

Figure 10: The signal acceptance vs. background rejection (left) and the figure of merit (right). The curves labeled "fit" are traced out by varying the neural network classification threshold and determining the fraction of accepted signal and rejected background using the fit procedure described in the text, while the MC true labels are obtained using MC labels as described in section 4.1.
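The threshold scan behind Figure 10 can be sketched as follows. This is a hedged illustration rather than the analysis code: the function name `scan_thresholds` and the yields are hypothetical, with the yields invented so that the cut at 0.85 mirrors the rounded Monte Carlo values quoted in the text (acceptance 0.70, rejection 0.90). The sketch also assumes that the figure of merit is evaluated on the fractions of surviving signal and background relative to the no-cut fit, an assumption made here because it gives 0.70/√0.10 ≈ 2.21, close to the quoted 2.20.

```python
import numpy as np


def scan_thresholds(s_counts, b_counts):
    """Turn fitted yields per threshold into acceptance, rejection and f.o.m.

    Index 0 must hold the no-cut yields (threshold 0.0), which serve as the
    initial signal and background counts.
    """
    s = np.asarray(s_counts, dtype=float)
    b = np.asarray(b_counts, dtype=float)
    acceptance = s / s[0]            # fraction of signal kept
    bkg_fraction = b / b[0]          # fraction of background kept
    rejection = 1.0 - bkg_fraction   # fraction of background removed
    fom = acceptance / np.sqrt(bkg_fraction)
    return acceptance, rejection, fom


thresholds = np.array([0.0, 0.5, 0.85])
s_counts = [1000.0, 900.0, 700.0]    # hypothetical fitted signal yields
b_counts = [5000.0, 1500.0, 500.0]   # hypothetical fitted background yields

acceptance, rejection, fom = scan_thresholds(s_counts, b_counts)
best = int(np.argmax(fom))
print(f"best threshold: {thresholds[best]}")
print(f"acceptance = {acceptance[best]:.2f}, "
      f"rejection = {rejection[best]:.2f}, f.o.m. = {fom[best]:.2f}")
```

With this normalization the figure of merit equals 1 when no cut is applied, so values above 1 indicate that a cut improves on the uncut sample.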

Conclusions

We have demonstrated the first data-based evaluation of track classification in HPXe TPCs with neural networks. The results confirm the potential of the method demonstrated in previous simulation-based studies and show that neural networks trained using a detailed Monte Carlo can be employed to make predictions on real data. The present results show that the background contamination can be reduced to approximately 10% while maintaining a signal efficiency of about 65%. In fact, these results are likely to be conservative, as this demonstration was performed at an energy of 1592 keV, while at the same pressure, tracks with energy $Q_{\beta\beta}$ are longer, and therefore their topological features should be more pronounced.

Furthermore, we have shown that, with the application of appropriate domain regularization techniques to the training set, our model performs similarly on detector data and simulation in the extraction of the signal events of interest.

Acknowledgments

This study used computing resources from Artemisa, co-funded by the European Union through the 2014-2020 FEDER Operative Programme of the Comunitat Valenciana, project IDIFEDER/2018/048. The NEXT Collaboration acknowledges support from the following agencies and institutions: the European Research Council (ERC) under the Advanced Grant 339787-NEXT; the European Union’s Framework Programme for Research and Innovation Horizon 2020 (2014-2020) under the Grant Agreements No. 674896, 690575 and 740055; the Ministerio de Economía y Competitividad and the Ministerio de Ciencia, Innovación y Universidades of Spain under grants FIS2014-53371-C04, RTI2018-095979, the Severo Ochoa Program grants SEV-2014-0398 and CEX2018-000867-S, and the María de Maeztu Program MDM-2016-0692; the GVA of Spain under grants PROMETEO/2016/120 and SEJI/2017/011; the Portuguese FCT under project PTDC/FIS-NUC/2525/2014 and under projects UID/FIS/04559/2020 to fund the activities of LIBPhys-UC; the U.S. Department of Energy under contracts number DE-AC02-06CH11357 (Argonne National Laboratory), DE-AC02-07CH11359 (Fermi National Accelerator Laboratory), DE-FG02-13ER42020 (Texas A&M) and DE-SC0019223 / DE-SC0019054 (University of Texas at Arlington); and the University of Texas at Arlington. DGD acknowledges the Ramon y Cajal program (Spain) under contract number RYC-2015-18820. JM-A acknowledges support from Fundación Bancaria la Caixa (ID 100010434), grant code LCF/BQ/PI19/11690012. We also warmly acknowledge the Laboratori Nazionali del Gran Sasso (LNGS) and the Dark Side collaboration for their help with TPB coating of various parts of the NEXT-White TPC. Finally, we are grateful to the Laboratorio Subterráneo de Canfranc for hosting and supporting the NEXT experiment.

References

[1] NEXT Collaboration, F. Monrabal et al., The Next White (NEW) detector, JINST (2018) P12010, [arXiv:1804.02409].
[2] NEXT Collaboration, J. Renner et al., Energy calibration of the NEXT-White detector with 1% resolution near $Q_{\beta\beta}$ of $^{136}$Xe, JHEP (2019) 230, [arXiv:1905.13110].
[3] NEXT Collaboration, P. Ferrario et al., Demonstration of the event identification capabilities of the NEXT-White detector, JHEP (2019) 052, [arXiv:1905.13141].
[4] NEXT Collaboration, P. Novella et al., Radiogenic backgrounds in the NEXT double beta decay experiment, JHEP (2019) 051, [arXiv:1905.13625].
[5] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Machine learning and the physical sciences, Rev. Mod. Phys. (2019) 045002.
[6] A. Aurisano, A. Radovic, D. Rocco, A. Himmel, M. Messier, E. Niner, G. Pawloski, F. Psihas, A. Sousa, and P. Vahle, A convolutional neural network neutrino event classifier, JINST (2016), no. 09 P09001, [arXiv:1604.01444].
[7] MicroBooNE Collaboration, R. Acciarri et al., Convolutional neural networks applied to neutrino events in a liquid argon time projection chamber, JINST (2017), no. 03 P03011, [arXiv:1611.05531].
[8] MicroBooNE Collaboration, C. Adams et al., Deep neural network for pixel-level electromagnetic particle identification in the MicroBooNE liquid argon time projection chamber, Phys. Rev. D (May, 2019) 092001, [arXiv:1808.07269].
[9] N. Choma, F. Monti, L. Gerhardt, T. Palczewski, Z. Ronaghi, P. Prabhat, W. Bhimji, M. M. Bronstein, S. R. Klein, and J. Bruna, Graph neural networks for IceCube signal classification, in […], pp. 386–391, 2018.
[10] E. Racah, S. Ko, P. Sadowski, W. Bhimji, C. Tull, S. Oh, P. Baldi, and Prabhat, Revealing fundamental physics from the Daya Bay neutrino experiment using deep neural networks, in […], pp. 892–897, 2016. arXiv:1601.07621.
[11] EXO Collaboration, S. Delaquis et al., Deep neural networks for energy and position reconstruction in EXO-200, JINST (2018), no. 08 P08023, [arXiv:1804.09641].
[12] H. Qiao, C. Lu, X. Chen, K. Han, X. Ji, and S. Wang, Signal-background discrimination with convolutional neural networks in the PandaX-III experiment using MC simulation, Science China Physics, Mechanics & Astronomy (2018), no. 10 101007, [arXiv:1905.13141].
[13] P. Ai, D. Wang, G. Huang, and X. Sun, Three-dimensional convolutional neural networks for neutrinoless double-beta decay signal/background discrimination in high-pressure gaseous time projection chamber, JINST (2018), no. 08 P08015, [arXiv:1803.01482].
[14] NEXT Collaboration, J. Renner et al., Background rejection in NEXT using deep neural networks, JINST (2017), no. 01 T01004, [arXiv:1609.06202].
[15] NEXT Collaboration, J. Martín-Albo et al., Sensitivity of NEXT-100 to neutrinoless double beta decay, JHEP (2016) 159, [arXiv:1511.09246].
[16] D. Nygren, High-pressure xenon gas electroluminescent TPC for 0-ν ββ-decay search, Nucl. Instrum. Meth. A603 (2009) 337–348.
[17] NEXT Collaboration, G. Martínez-Lema et al., Calibration of the NEXT-White detector using $^{83m}$Kr decays, JINST (2018) P10014, [arXiv:1804.01780].
[18] J. Martín-Albo, The NEXT experiment for neutrinoless double beta decay searches. PhD thesis, Valencia U., IFIC, 2015.
[19] GEANT4 Collaboration, S. Agostinelli et al., GEANT4: A Simulation toolkit, Nucl. Instrum. Meth. A506 (2003) 250–303.
[20] B. Graham and L. van der Maaten, Submanifold sparse convolutional networks, ArXiv abs/1706.01307 (2017).
[21] L. Dominé and K. Terao, Scalable deep convolutional neural networks for sparse, locally dense liquid argon time projection chamber data, ArXiv abs/1903.05663 (2019).
[22] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, arXiv:1512.03385.
[23] K. He, X. Zhang, S. Ren, and J. Sun, Identity mappings in deep residual networks, in ECCV, 2016.
[24] C. Shorten and T. M. Khoshgoftaar, A survey on image data augmentation for deep learning, Journal of Big Data (2019) 1–48.
[25] G. Szekely and M. Rizzo, Testing for equal distributions in high dimension, InterStat (11, 2004).
[26] G. Szekely and M. Rizzo, Energy statistics: A class of statistics based on distances, Journal of Statistical Planning and Inference (08, 2013).
[27] R. Fisher, The Design of Experiments. Oliver and Boyd, 1935.
[28] NEXT Collaboration, V. Álvarez et al., Initial results of NEXT-DEMO, a large-scale prototype of the NEXT-100 experiment, JINST (2013) P04002, [arXiv:1211.4838].
[29] NEXT Collaboration, V. Álvarez et al., Operation and first results of the NEXT-DEMO prototype using a silicon photomultiplier tracking array, JINST (2013) P09011, [arXiv:1306.0471].