A Review on Machine Learning for Neutrino Experiments
Fernanda Psihas, Micah Groh, Christopher Tunnell, Karl Warburton
International Journal of Modern Physics A
© World Scientific Publishing Company
Fernanda Psihas
Neutrino Division, Fermi National Accelerator Laboratory, Batavia, Illinois, United States of America
[email protected]
Micah Groh
Department of Physics, Indiana University, Bloomington, IN, United States of America
[email protected]
Christopher Tunnell
Department of Physics and Astronomy, Rice University, Houston, Texas, United States of America
[email protected]
Karl Warburton
Department of Physics and Astronomy, Iowa State University, Ames, Iowa, United States of America
[email protected]
Submitted to IJMPA 3 August 2020

Neutrino experiments study the least understood of the Standard Model particles by observing their direct interactions with matter or searching for ultra-rare signals. The study of neutrinos typically requires overcoming large backgrounds, elusive signals, and small statistics. The introduction of state-of-the-art machine learning tools to solve analysis tasks has made major impacts on these challenges in neutrino experiments across the board. Machine learning algorithms have become an integral tool of neutrino physics, and their development is of great importance to the capabilities of next generation experiments. An understanding of the roadblocks, both human and computational, and the challenges that still exist in the application of these techniques is critical to their proper and beneficial utilization for physics applications. This review presents the current status of machine learning applications for neutrino physics in terms of the challenges and opportunities that are at the intersection between these two fields.
Keywords: Neutrinos; Machine Learning; Deep Learning; Review.
1. Introduction
The nature of neutrinos and their masses is one of the main science drivers of particle physics today.1, 2
Not only are neutrinos the least understood particle in the Standard Model, they may be linked to the explanation of the matter/antimatter asymmetry in the Universe through the process of leptogenesis. Neutrinos exhibit unexpected oscillations between their mass states, a behavior which indicates that other new physical phenomena beyond the Standard Model might be possible. Specifically, it raises the question of the mechanism through which they acquire the non-zero mass required by oscillations, as well as the possibility that neutrinos engage in charge-parity (CP) violating processes, both linked directly to leptogenesis. The answers to questions about the mass mechanism and CP violation can provide a deeper understanding of the early Universe through the study of neutrinos.

In the aftermath of the solar neutrino problem, resolved by the discovery of oscillations,5, 6 a large number of experiments have set out to answer the remaining questions of neutrino physics, taking advantage of the best particle detection technology available to them. The low cross-sections of neutrino interactions and the background suppression required by many of these experiments make the study of neutrinos technically challenging and subject to statistical limitations. The optimization of signal and background separation, detection threshold, and physics reconstruction are all key factors in the technology design for a particular experiment.

The software tools used for analysis and reconstruction of detector data are often overlooked as key components of experimental technology. These tools are not only used for analysis and final results, but also form an integral part of the conception and design of new projects. Improvements in reconstruction and analysis technologies have enhanced our ability to extract information from data and translate it into physics quantities. The study and development of reconstruction and analysis tools is, thus, of critical importance to the capabilities of particle physics experiments. The tools of machine learning, also broadly referred to as artificial intelligence, have been at the center of analysis techniques for several decades.

Beyond appealing to the personal interest of the reader in machine learning, it is clear from the abundance of applications of these tools that a basic knowledge of machine learning is central to the understanding of experimental data analysis in neutrino physics today. In this manuscript, we review the algorithms, developments, and evolution of machine learning tools for neutrino experiments with a focus on deep learning. We discuss the obstacles that still challenge the standing of these tools as trusted parts of the experimentalist's arsenal.
We first define some essential language and formalism for this discussion and introduce the basic components of machine learning algorithms and concepts of deep learning. The current status and prospects in the field will be discussed in terms of the roadblocks, both human and computational, and the challenges and opportunities that still exist in the application of these techniques.
2. Machine Learning and Deep Learning
The term machine learning is an umbrella term for all algorithms where inference is used to perform a task and that have the ability to improve with experience, though terms like deep learning are commonly used depending on the complexity of these algorithms. Within the past decade, deep learning algorithms have gained significant popularity in neutrino experiments and have enabled large improvements in the performance and physics reach of the analyses where they are employed.

A common entry point to the understanding of machine learning algorithms is the description of an artificial neural network, or ANN. ANNs are interconnected series of elementary functions called neurons, somewhat analogous to the biological system of the brain. Artificial neurons employ mathematical functions called activation functions to produce an output, mimicking the action potential which determines the production of electrical signals in brain cells. The connections between artificial neurons, where the output of each one is passed as input to others, allow the network to combine these simple units to perform the complex task of learning. Similarly, connecting a large number of artificial neurons results in interesting macroscopic behavior, as discussed in the following sections.

In the mathematical representation, each neuron receives one or more inputs which are individually weighted. The output is determined by the activation function, which is a nonlinear transformation of the inputs. In ANNs, neurons are placed into connected layers, often as seen in Figure 1. The multi-layer perceptron (MLP) depicted in the figure is a type of ANN in which the neurons are organized into “hidden layers” between the input and the output. In this type of network, also called a feedforward network, the outputs of each layer are fed to the next layer, and so on. The number of hidden layers and their interconnected array of neurons allows the network to perform complex tasks.
For example, a network can be used to reproduce a mapping F of an input vector x to an output vector y. The input data used by the network to learn F are a collection of “ground truth” examples for which the input x and the exact output y = F(x) are known or have been simulated.

The process of learning occurs iteratively by first constructing output estimates for given values of x using a set of initial weights for each component of the network. Then, the differences between these estimates F′(x) computed by the network and the desired target are minimized by introducing changes to the weights of each neuron. How the differences, or losses, are quantified and the choice of minimization function vary by application. The iterations are repeated until the ability of the network to approximate the function's behavior no longer improves.

Note that while the task of the network is to reproduce the output of the target function given the same set of inputs, it need not know or approximate the exact form of the function to accomplish that task. For example, a network trained to reproduce the invariant mass of an initial state does not need to know or learn decay kinematics, but instead it learns to reproduce the same principles by training on final state vectors with corresponding invariant mass values. Therefore, it is possible for a network to reproduce the behavior of a mapping without ever learning its explicit form.

Fig. 1. Schematic of a multi-layer perceptron. Each artificial neuron computes the activation f(Σ_i w_i x_i + b) of its weighted inputs plus a bias.
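The iterative training loop described above can be sketched in a few lines of Python. The network (a single hidden layer), the target mapping, and all hyperparameters below are illustrative choices made for this sketch, not taken from any experiment discussed in this review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "ground truth" mapping F: here y = sin(x), known exactly for every input.
x = rng.uniform(-np.pi, np.pi, size=(256, 1))
y = np.sin(x)

# One hidden layer of 16 tanh neurons (sizes and initial scales are arbitrary).
W1, b1 = rng.normal(0.0, 1.0, (1, 16)), np.zeros(16)
W2, b2 = rng.normal(0.0, 0.1, (16, 1)), np.zeros(1)
lr = 0.05  # learning rate

loss = None
for step in range(10_000):
    # Forward pass: each neuron applies f(sum_i w_i x_i + b).
    h = np.tanh(x @ W1 + b1)      # hidden activations
    y_hat = h @ W2 + b2           # network estimate F'(x)

    # Quadratic loss between the estimate and the ground truth.
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: propagate the loss gradient to every weight.
    g_out = 2.0 * (y_hat - y) / len(x)
    gW2, gb2 = h.T @ g_out, g_out.sum(0)
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    gW1, gb1 = x.T @ g_h, g_h.sum(0)

    # Update the weights to reduce the loss, then iterate.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"final mean squared error: {loss:.4f}")
```

The network never sees the analytic form of sin(x); it only adjusts its weights until its outputs match the ground-truth examples, mirroring how a physics network learns from labeled or simulated events.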
Deep Learning
Deep learning is differentiated from machine learning by the complexity of the algorithms used. Deeper networks involve more non-linear operations, such that the mapping from results back to the input variables is more challenging to track. Deep learning algorithms have gained popularity in the last two decades due to breakthroughs in their performance, largely enabled by the rapid development of hardware such as graphics processing units (GPUs). The field of computer vision has been the primary driver of these innovations, which are used to solve pattern recognition tasks.

Deep neural networks are sophisticated and more computationally expensive techniques which are able to tackle problems of higher complexity than other machine learning tools. Increasing network depth by adding additional layers, for instance, allows for the approximation of increasingly complicated functions.
Fig. 2. The structure of a convolutional neural network. The convolution layers use image kernels to extract features from the input. The pooling layers downsample the image. The final set of features are connected to a fully connected, artificial neural network.
One of the most common deep learning algorithms employed for pattern recognition is the Convolutional Neural Network (CNN). CNNs are a class of MLP which learn to extract features from an input in addition to training for the intended task. These features are identified using the spatial relationship between neighboring regions in the image. The key components of CNNs are kernels, or image filters, which are matrices that scan an input image and output an image with highlighted features. A convolution layer consists of operating one or more kernels across an input image.

In reality, the inputs to CNNs are tensors typically containing the pixel-by-pixel RGB values of an image. The dimensions and content of the input tensors can be altered for different applications, but most developments in image recognition naturally use the image-to-RGB tensor strategy. Because convolutions output the effect of a kernel on the input tensor with translational invariance, they are especially useful for image and pattern recognition, where the features of interest are topological characteristics.

Figure 2 shows the basic structure of a CNN. As seen in the figure, the fully connected layers of a CNN are notably similar to the basic MLP, whereas the initial convolutional layers serve the purpose of feature extraction. The kernel values are learned during the training process to extract the features that are most useful for the desired task. Convolution layers are often interlaced with pooling layers which downsample the image to reduce the computations needed deeper in the network and promote translational or rotational invariance. The final layers of the network then perform the classification or regression task using the extracted features as input.

The task of identifying signals and reconstructing physical characteristics of interactions in particle detectors is often analogous to that of pattern recognition in images.
Thus, much of the recent development in applications in neutrino physics involves usage or adaptations of deep learning networks developed for image recognition. Within the past decade, deep learning algorithms have gained significant popularity in neutrino experiments and have enabled drastic improvements in the performance and sensitivity of the analyses where they have been employed. Deep CNNs have now demonstrated state-of-the-art performance on many tasks and are one of the most common tools used in neutrino physics. The advantages and motivation to use CNNs in neutrino experiments are largely applicable to other deep neural networks used in neutrino experiments as well. Similarly, the advantages and challenges discussed in the following section apply to CNNs.

Fig. 3. First demonstration of neural networks for neutrino physics. The network was trained to separate charged current and neutral current neutrino interactions for the SNO experiment. The table shows the performance of the network as a classification matrix.
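The kernel operation at the heart of a convolution layer can be illustrated with a short sketch. The toy "detector image" and the hand-written vertical-edge kernel below are invented for illustration; in a real CNN the kernel values are learned during training rather than chosen by hand.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over an image (valid padding) and return the feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Each output pixel is the kernel multiplied elementwise with
            # the image patch under it, then summed.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# A toy detector image: a bright vertical "track" on an empty background.
image = np.zeros((6, 6))
image[:, 2] = 1.0

# A hand-written vertical-edge kernel.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

fmap = convolve2d(image, kernel)
print(fmap)   # every row reads [3, 0, -3, 0]: the track's two edges are highlighted

# Pooling: downsample the 4x4 feature map by taking the max of each 2x2 block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
```

Because the same kernel is applied at every position, the response is translationally invariant: the track's edges would be highlighted in the same way wherever the track appeared in the image.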
3. Applications in Neutrino Experiments
Particle physics experiments, including neutrino experiments, are endeavours which require the analysis of large data sets, sophisticated modeling, and statistics. In the past two decades, both neutrino physics and machine learning have been experiencing a renaissance with the discovery of neutrino oscillations and the advent of deep learning, respectively. The initial exploration of neural networks in particle physics began in the 1990s. The SNO experiment was the first to explore the use of neural networks in neutrino physics, using feedforward networks, a type of artificial neural network, to classify events based on hit pattern features, as shown in Figure 3. While these neural networks did not outperform other statistical techniques at first, they demonstrated the capabilities of these techniques for event classification in neutrino detector data. As expertise grew regarding the impact of sample preparation and feature choices on network performance, machine learning techniques not only surpassed traditional reconstruction, but would grow to be one of the most widely used analysis techniques in the field.

Machine learning has played a role in nearly every particle physics discovery and measurement since. Common analysis frameworks designed for particle physics have natively supported the use of these tools for almost two decades. The role of machine learning in physics analyses has only grown in scope, taking advantage of several opportunities specific to our problem set which will be discussed in the next section. These first applications are now commonplace in our field, typically using tools like feedforward networks, MLPs, and more recently boosted decision trees, to name a few. Most common applications start with input variables which have been pre-extracted and selected by the analyzer.
This continued as the main strategy until the introduction of deep learning tools.

The tasks that deep learning algorithms have been applied to in the last decade span the full extent of the experimental analysis workflow, including design, hardware triggers, energy estimation, reconstruction, and signal selection. Many applications exist which have greatly simplified and improved the performance of experiments and their physics reach when compared to the standard tools they have replaced. The performance achieved by these tools is the prime motivation for their implementation to solve physics problems, despite the computational complications which will be described later in this section.

Even more significant than the improvements themselves are the implications of the usage of these tools in our experiments. The interplay between neutrino physics and deep learning is rich in both challenges and opportunities for both fields. The current status of the field is presented in this section, in the context of these challenges and opportunities. Rather than providing an exhaustive list of applications in a rapidly growing field, those that are notable are highlighted when relevant to the item discussed.

Challenge 1 — Adaptability of the Methods
The most frequently used deep learning algorithms in neutrino experiments are those developed or commonly used for image recognition. Given that some experimental setups closely resemble or can be mapped into 2-dimensional images, this is a natural starting point for many studies to apply the tools of image recognition.

However analogous, the problems solved for image or pattern recognition have important differences from particle physics. Some adaptation is usually required for the usage of these algorithms. Adaptation can be as simple as converting detector data into image-like tensor inputs or as complicated as a complete network redesign for the new task. The trade-off between simple adaptations and those where the inputs and network are more tuned to the particular task can be significant in terms of performance improvement.

An example of a deep learning network used with different adaptations in neutrino experiments is the GoogLeNet CNN architecture. GoogLeNet was the first creatively non-sequential implementation of convolutional layers in CNNs, which brought significant accuracy and performance improvements with respect to its competitors. Following the success of GoogLeNet, many neutrino experiments explored its utilization with minimal or no modifications as a starting point for their own classification studies. Despite the differences between images and neutrino data, out-of-the-box approaches yielded important successes over traditional methods.

A successful out-of-the-box application of GoogLeNet is the NEXT experiment's background rejection network. The NEXT detectors are cylindrical time projection chambers with photon detection and charge detection at each end, respectively.
Fig. 4. Input data for the NEXT CNN classifier. Top: Example event in NEXT with 10 mm voxelization. Bottom: Example event in NEXT with 2 mm voxelization. Columns are the xy, yz, and xz views of the event. The distinguishing feature of the track projection that identifies these as signal events is the presence of a larger energy deposition (Bragg peak) at the end of each track. This feature is mostly lost in the 10 mm voxelization.

Photomultiplier tubes collect a light signal and silicon photomultipliers (SiPMs) collect an electroluminescence signal from drifted charges inside the detector. For the training inputs to resemble 2D images of the particle tracks, the granularity of the data from the SiPM readout is reduced to 3D voxels, of dimensions x, y (spatial) and z (drift time), which are used as the RGB channels of the CNN input tensor. The inputs for different voxel sizes are shown in Figure 4. Equal numbers of simulated neutrinoless double beta decay signal and radioactive background events are used for the training. This simple implementation was found to outperform the traditional reconstruction by a factor of between 1.2 and 1.6, depending on the reconstruction resolution.

The many differences between 2D images and detector data provide an opportunity to improve algorithm performance by making thoughtful modifications to the original networks. In many cases, large improvements have been attained from enhancing useful features of the data by making changes to the algorithms and the structure of the inputs.

Such is the case for the Convolutional Visual Network, a CNN classifier designed for application on NOvA data. The readout from the two orthogonal views of the NOvA detectors is already very image-like and naturally depicts 2-dimensional projections of energy depositions.
However, the decoupled nature of the two views makes a simple conversion to a single RGB tensor unideal, because a conversion of the xz and yz views into RGB channels of the same image tensor would result in an unnatural overlap of unrelated features. Rather than artificially overlapping the
orthogonal views, the authors employed a Siamese network structure, allowing independence in the learning from each detector view to identify neutrino interaction flavor. Figure 5 shows NOvA's detection technology and the architecture used for neutrino identification. This, among other modifications to a GoogLeNet-inspired architecture, produced large accuracy improvements, increasing the effective exposure of the experiment by 30%. This network was the first to be used in a published physics result, and it demonstrated the significance and impact of adapting both the tool and the inputs to the detector technology.

Fig. 5. Left: An example readout from the NOvA detector. The planes are arranged in alternating orientations to give two orthogonal views of the event. Right: Siamese tower structure of NOvA's Convolutional Visual Network, based on GoogLeNet, for neutrino flavor classification. The two towers independently operate on each view of the event. The features from each tower are concatenated in the final layers of the network.

In addition to detector technology and readout, the geometry of the detector is also relevant to the adaptability challenge. In some cases, thoughtfully considering modifications based on detector geometry can boost performance significantly. In other cases, this consideration could be essential for applying the tools with any success. Additionally, careful consideration of how to map detector readout to inputs compatible with the network of choice should not be overlooked.

One notable application is the use of spherical CNNs for analysis of data from the close-to-spherical KamLAND-Zen detector. This currently ongoing work incorporates a modification to best fit the needs of the detector geometry and has already demonstrated improvements in early stages. Given the nearly spherical shape of their detector, the authors of this work seek to correct a distortion created by the projection of the detector readout into a 2D pattern, as seen in Figure 6.
They employ spherical CNNs for the task of signal-background classification. In a spherical CNN, the kernel covers the entire phase space by scanning in Euler angles rather than projecting the readout into 2D planes. Indeed, the use of spherical CNNs achieves background rejection of 71%, compared to 61% for their original CNN.

Fig. 6. The KamLAND-Zen experiment uses spherical convolutions for signal and background separation. The detector is nearly spherical, so a traditional mapping to 2D would cause abnormal distortions in the data.

In some cases, other technologies already match the experimental needs substantially better than CNNs. This is the case for the networks used for analysis of data from the IceCube Neutrino Observatory, whose detector spatial sparsity and non-uniformity make the data less than ideal for CNNs. Their deep learning application uses Graph Neural Networks (GNNs) as a way to mitigate the effect of these features. This is because GNNs are capable of dealing both with irregular geometry and with graphs of different sizes, a feature which is seen in many of their events. GNNs are designed to classify graphs, where the graph nodes define some element of the detector and the graph edges show some connection between elements.

IceCube's GNN separates neutrino-induced muons (their signal) from cosmic-ray shower-induced muons (their background), and the authors compared the efficiency of the network to that of their standard reconstruction. This GNN was able to identify 6.3 times more signal events and provide a signal-to-noise rate 3 times larger. A comparison with a CNN, which gave similar results to the traditional reconstruction, demonstrated that GNNs offer significant benefits in this application.

There are many examples of successfully overcoming this challenge of adapting deep learning tools for neutrino data analysis. However, the approach taken for each application will continue to encounter different obstacles and considerations unique to the data and the tools chosen for analysis. Careful consideration of these modifications continues to show substantial improvements over the direct application of image recognition technologies.
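The message-passing idea behind graph networks can be sketched in a few lines. The toy graph, node features, edge list, and random weights below are invented purely for illustration; they are not IceCube's actual architecture or data format.

```python
import numpy as np

# A toy detector graph: 4 hit sensors (nodes) in an irregular layout.
# Each node carries an invented feature vector, e.g. (charge, time).
features = np.array([[1.0, 0.2],
                     [0.5, 0.4],
                     [2.0, 0.1],
                     [0.1, 0.9]])

# Edges connect nearby sensors; unlike a fixed image grid, this
# connectivity (and the number of nodes) can vary event by event.
edges = [(0, 1), (1, 2), (2, 3)]

# Symmetric adjacency matrix with self-loops, row-normalised so that
# each node averages over itself and its neighbours.
n = len(features)
A = np.eye(n)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A /= A.sum(axis=1, keepdims=True)

# One message-passing layer: aggregate neighbour features, then apply a
# learned linear map and nonlinearity (weights are random stand-ins here).
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
h = np.maximum(A @ features @ W, 0.0)   # ReLU activation

# Graph-level readout: pool the node embeddings into one vector, which a
# classifier head would turn into e.g. a signal/background score.
graph_embedding = h.mean(axis=0)
```

Because the aggregation is defined by the adjacency matrix rather than a fixed pixel grid, the same layer handles sparse, non-uniform detector geometries and graphs of any size.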
Challenge 2 — Quantifying Bias and Uncertainties
One of the risks of applying machine learning is the possibility that the algorithms will learn information from the training data beyond what is intended. The risks associated with these techniques are neither new nor specific to the applications in our field. Furthermore, these challenges are starting to receive more attention in the broader community, in industry applications, and in government regulations around the world.

The principal danger is that a dataset used for training contains information or implies underlying structure which may incorrectly bias the results, yet is learned by the network. This risk is true for all machine learning algorithms, but is particularly noteworthy with algorithms which perform feature extraction, such as CNNs. While feature extraction is the main advantage of these algorithms, special attention is needed to mitigate this risk.

Given that the features are abstract, their association with the physical traits of the data is largely unknown. In addition, the networks used for detector data analysis are typically trained on simulated datasets of the events of interest. These simulations carry models and assumptions of the detector performance, the particle interactions, and other physical processes. The challenge of quantifying and mitigating biases is particularly important to guarantee robust physical conclusions which are also model and generator independent.

Apart from deliberate simulation choices, any effects introduced or distributions sculpted in the selection of the training data, as well as any mismatch between simulated and real events coming from detection performance, particle interactions, calibration, or other effects, have the possibility of propagating throughout the learning process undetected or unquantified.
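One simple, and by no means complete, diagnostic for such data-simulation mismatch is to compare the distributions of a reconstructable quantity in the two samples before training, for instance with a two-sample Kolmogorov-Smirnov statistic. The samples below are random stand-ins with an invented calibration offset, not real experimental data.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum distance between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(0)
# Invented stand-ins for a reconstructed quantity (e.g. a hit charge)
# in simulation and in data, with a small calibration offset between them.
sim = rng.normal(1.00, 0.10, 10_000)
data = rng.normal(1.05, 0.10, 10_000)

print(f"KS distance: {ks_statistic(sim, data):.3f}")
```

A large KS distance flags an input variable whose mismodeling could be learned by the network; it says nothing, however, about multi-dimensional correlations, which is part of why mitigating these biases remains an open challenge.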
While some existing applications have devised tailored approaches to mitigating bias, standard and complete approaches to this important challenge are yet to be achieved.

Minimizing known bias can be achieved through careful choices in the construction of input datasets. An example of this concept is the charge-only energy reconstruction CNN used by EXO-200. This network is used to discriminate between single-site and multi-site events and is found to outperform the traditional reconstruction which had been used in previous publications. The network was initially trained using a simulated Th source. When a systematic study was performed with arbitrary resolution, disproportionately large improvements in resolution were found for events in the Tl peak with respect to other classes of events. In order to correct this, the network was instead trained on calibration gamma ray source data, which acts as a proxy for various backgrounds, in the center of the detector. The CNN is tested on numerous samples, including simulation as well as Co, Tl, Ra, and Tl calibration sources at a range of source locations. Having implemented this change to the training data, improved performance was found in the relevant energy range.

It is also possible that biases exist but are unknown to the developer, for instance, when an unintended artifact arises in the data. In contrast with the example above, where the training sample composition was kept flat across different backgrounds, other characteristics might not be known to be skewed in unphysical ways. It is not possible to correct or quantify bias which is not yet known, but techniques can be designed to either minimize such biases or look for them in the data.

The application of Domain Adversarial Neural Networks by the MINERvA experiment is a prime example of unknown bias reduction. Here, the network is applied to the task of classification using a CNN, but a technique of bias reduction is employed. The network is trained on both simulation and data. The domain network, whose purpose is to distinguish between the data domain and the simulation domain, is attached to the CNN. It is expected to find features that result from errors or inconsistencies in the simulation. As shown in Figure 7, the interplay between the two components discourages the classification task from learning from any features that behave differently between the two domains.

Fig. 7. The basic structure of a domain adversarial neural network. The feature extractor (green) and label predictor (blue) function just as a CNN. However, the domain classifier (pink) attempts to determine which of two domains the input is from. The gradient reversal layer discourages the network from classifying events using features that are unique to one of the two domains.

Another difficulty arises in designing tests that can reveal hidden biases learned by deep learning algorithms. Ideally, this would look at the performance of the algorithm on data, but without a method of knowing the true nature of a data event, this is impossible (if such a method existed, we wouldn't need these algorithms in the first place!). Instead, we must compare reconstructable quantities between data and simulations and look for signs of bias between the two, but which biases to look for is not obvious. In addition, many experiments begin creating reconstruction algorithms before data taking has begun, and others perform blind analyses where the algorithms must be optimized and validated without comparison to data. While methods exist for constructing systematic uncertainties that address possible biases for many quantities, how to apply these methods to machine learning algorithms is not clear.

One example of a technique that uses both real and simulated data to search for bias is the muon-removed electron-added (MRE) technique used by the NOvA experiment.
Two samples are created by overlaying simulations on real data events (MRE-on-data) and simulations on simulated events (MRE-on-simulation). For each sample, an identified muon is removed from selected νμ charged current events, leaving only the hadronic components of the interaction. The muon is substituted by a simulated electron of equivalent momentum overlaid on the events. Effectively, a comparison of the network performance between the MRE-on-data and MRE-on-simulation samples provides a measure of the bias-related effects introduced by data-simulation discrepancies in the hadronic component. The resulting difference in selection efficiency between the data overlay and the simulated overlay is less than 0.5%.

Another data-based technique is to consider human-labeled datasets. While we can't know the true identity of a real data event, a trained physicist can often identify it with reasonable accuracy. Comparisons of error patterns between humans and deep neural networks have shown differences between the two, which suggests that the unknown biases of humans differ from those of neural networks. This is nevertheless a useful technique to search for large, unexpected biases in the outcome of the networks. The MicroBooNE experiment uses a liquid argon time projection chamber (LArTPC) detector to observe neutrino interactions. They created a human-labeled dataset for validating a semantic segmentation network, a technique for classifying individual pixels in an image, trained on simulated neutrino interactions. The disagreement between the performance of the network and humans was less than 2% in the misclassification of pixels.

Finally, we consider uncertainties. Quantifying the uncertainties associated with any measurement is an integral part of physics analyses. Traditional neural networks, by design, output a single value. In some cases, they output high confidence scores on events that are well outside the phase space of the samples they were trained on.
While the output may be sensible in such a case, it should carry a large uncertainty. Bayesian neural networks are designed to address this concern. They replace the fixed-value weights in the network with probability distribution functions, as shown in Figure 8. The resulting output is thus also a probability distribution function, which can be interpreted as a most probable value with some uncertainty. This approach to including uncertainties has recently gained attention in the neutrino community, with initial implementations currently being explored.

Challenge 3 — Network Interpretability
As machine learning models grow deeper, there is often a trade-off between the performance of the algorithm and our ability to interpret its results. Boosted decision trees, for example, are relatively simple machine learning models. They can often inform the user of the relative importance of each input to the model, but may not reach the accuracy that can be achieved with deeper models. CNNs, on the other hand, have achieved state-of-the-art performance on many tasks, but the features extracted by the convolutional layers are abstract and challenging to interpret. Some individual kernels can be connected to specific tasks, such as edge detection. However, the features resulting from multiple convolutions are difficult to connect to topological characteristics or physical interpretations of the events.

Fig. 8. A depiction of a Bayesian neural network. The fixed-value weights in an artificial neural network are replaced by probability distribution functions. Thus, the output of each neuron is a probability distribution function with some most probable value and an uncertainty.

This is particularly problematic in physics, where relating network features back to the underlying physics problem is important and sometimes necessary for a complete understanding of the physical models. A better conceptual understanding of the physical features used by the network could tell us much about the physical processes which produced those features. In addition, that understanding could help minimize or correct inefficiencies in the performance of the algorithm.

A common method for interpreting the features extracted by a network is to perform dimensionality reduction. The Daya Bay Reactor Neutrino Experiment is designed to detect anti-neutrinos produced by two nearby nuclear reactors. The experiment employs a CNN to separate inverse beta decay (IBD) events, the signal of interest, from noise within the detector.
The features extracted by the network are transformed into two dimensions using t-Distributed Stochastic Neighbor Embedding (t-SNE). The t-SNE method uses a non-linear transformation to reduce the dimensionality of data in a way that preserves the distances between points that are local to one another. Figure 9 shows the result of this technique. Class separation in this two-dimensional space relates to topological information that the network has used to distinguish the different classes.

Another method of dimensionality reduction is Principal Component Analysis (PCA). PCA is a linear change of basis in which the new basis vectors lie along the directions of maximum variation in the data. Often only a few of the basis vectors are needed to explain most of the variation in the data. Critically, the new basis vectors are orthogonal, meaning each makes a unique contribution to the variation in the data. PCA is often performed on the input data to a network to reduce the number of inputs needed to a smaller set of independent values which are most important to
the task. PCA can also be performed on the network-extracted features to reduce their dimensionality for visualization, in a similar way to t-SNE.

Fig. 9. The t-SNE embedding produced from the Daya Bay CNN used to separate the IBD signal from background noise. The t-SNE is a non-linear transformation used for dimensionality reduction. Visual separation in this space relates to separation in the high-dimensional feature space created by the network. Each point is labeled by its true identity.

In addition, some qualitative methods try to determine which features of the input are most relevant to the output. This is particularly important for CNNs performing image recognition, where we want to determine which topological features of the input are most important to the network output. One method of doing this is to occlude regions of the input image and measure how the various output scores change. This technique is often called an occlusion test. Another technique is to use the network itself to determine these valuable features. Saliency maps compute the gradient of the output score of the network with respect to each of the input pixels. These maps can show where the network is "looking" to construct its features. Interestingly, they sometimes show that CNNs do not look at the primary object in an image, but instead at the surrounding context. If some objects are commonly found in the same context, then the context can be used as the primary discriminator to classify that object.
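The occlusion test described above can be sketched in a few lines: zero out a small patch at each position and record how much the output score drops. Here the "network" is a stand-in function so the example stays self-contained (the image size, patch size, and scoring function are illustrative):

```python
import numpy as np

def occlusion_map(image, score_fn, patch=2):
    """Slide a zeroed patch over the image and record how much the
    model score drops at each position; large drops mark important regions."""
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros((h - patch + 1, w - patch + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0
            heat[i, j] = base - score_fn(occluded)
    return heat

# A stand-in "network" whose score is just the summed charge in a
# fixed region of interest (purely illustrative).
def toy_score(img):
    return img[2:4, 2:4].sum()

event = np.zeros((6, 6))
event[2:4, 2:4] = 1.0   # a small "track"
heat = occlusion_map(event, toy_score)
# The heat map peaks exactly where occlusion destroys the score.
```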
Challenge 4 — Computational & System Constraints
As mentioned in Section 2.1, the latest developments in deep learning are largely driven by improvements in GPU technology, where the many computations needed for large networks can be done in parallel. A single evaluation of a deep neural network can require billions of floating point operations. This is compounded by the amount of data collected in particle physics experiments. Modern neutrino experiments record billions of events which require evaluation by various reconstruction and analysis algorithms. Many experiments perform these evaluations on large-scale computing grids on CPUs.

While neural networks have expanded the capabilities of many neutrino experiments, this computing limitation is a bottleneck to the widespread use of very deep neural networks. Here we consider three methods to alleviate this concern. One potential solution is to expand the availability of GPUs. Small GPU clusters used for training neural networks are becoming more common. However, these are not enough to match the production needs of many experiments. Larger availability of GPU clusters would enhance the ability of experiments to utilize large neural-network-based algorithms.

Another possibility is to enhance the physics output from these algorithms. As discussed throughout this manuscript, machine learning based methods often show significant improvements over traditional methods. One way to improve performance is to optimize an algorithm for its primary task, but the implementation of multi-task algorithms could be a promising way to enhance the total physics output of an individual algorithm. The Deep Underground Neutrino Experiment (DUNE) is a future neutrino oscillation experiment currently in the R&D stages for its LArTPC detector. The DUNE experiment employs a CNN for identification of neutrino interaction flavor in its detector, which achieves more than 85% efficiency for νe charged-current events in the energy range of interest.
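The multi-task idea can be sketched as several inexpensive linear heads sharing one feature extractor, so the costly feature computation is done once per event. The head names, sizes, and random weights below are illustrative stand-ins, not DUNE's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Shared feature-extractor output for one event (in a real network this
# comes from the convolutional layers; here it is a random stand-in).
features = rng.normal(size=16)

# Several illustrative heads, each a cheap linear layer on the SAME
# feature vector: flavor, neutrino sign, interaction type, and counts
# of final-state particle species.
head_sizes = {"flavor": 4, "sign": 2, "interaction": 4,
              "n_proton": 4, "n_pion": 4, "n_pizero": 4, "n_neutron": 4}
heads = {name: rng.normal(size=(n, 16)) for name, n in head_sizes.items()}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Each extra output costs only one small matrix-vector product,
# while the expensive feature extraction is done exactly once.
outputs = {name: softmax(W @ features) for name, W in heads.items()}
```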
In addition to flavor classification, the algorithm also outputs the sign of the neutrino, the type of interaction, and the number of each particle species in the final state. In total, the network has seven outputs at very little additional computational cost, since each output uses the same set of features extracted by the network.

Reducing the computational cost of the algorithms is another option, which would reduce the total computational need of experiments. Using smaller networks is one possibility, but this comes at the cost of performance. Instead, consideration can be given to the type of data acquired by experiments. LArTPC detectors, such as those used by DUNE or the Short Baseline Neutrino program, have very low occupancy, the fraction of the detector readout that is active in an event. These events are globally sparse, with less than 1% of the readout active, but locally dense in the region of the detector where the event occurred. An example of an event recorded in a LArTPC is shown in Figure 10.
This means that typical CNNs will waste much computation time multiplying or summing zeros. It has been shown that using submanifold sparse convolutional networks can reduce the inference time of these networks by a factor of 30 and the memory cost by more than a factor of 300. These sparse convolutional networks are designed for use with sparse data and only perform convolution operations in regions with activity.

Finally, we consider the use of open datasets in algorithm development. Open datasets are commonly used to benchmark algorithm performance in data science applications. Despite increasing efforts from a handful of experiments to provide such datasets for analysis, there are still many restrictions surrounding data sharing in the field. The lack of available datasets negatively impacts the ability of researchers to develop and publish improved machine learning techniques specific to particle physics applications and significantly hinders progress in developments requiring real data, such as bias assessment. Open datasets would not only enable these advancements to be developed further, but would also encourage beneficial multi-disciplinary collaboration, which would surely improve the quality of the physics of our experiments.
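The submanifold rule described above (compute outputs only at sites that are already active, gathering only active neighbors) can be sketched directly on a coordinate map. A toy numpy sketch, with an invented event, kernel, and detector size:

```python
import numpy as np

def submanifold_conv(active, kernel):
    """Sparse 2D convolution: `active` maps (row, col) -> value for
    nonzero pixels only. Outputs are computed ONLY at already-active
    sites (the submanifold rule), so empty detector regions cost nothing."""
    k = kernel.shape[0] // 2
    out = {}
    for (r, c) in active:
        total = 0.0
        for dr in range(-k, k + 1):
            for dc in range(-k, k + 1):
                v = active.get((r + dr, c + dc))
                if v is not None:
                    total += kernel[dr + k, dc + k] * v
        out[(r, c)] = total
    return out

# A 1000x1000 "readout" with a short track: occupancy far below 1%.
track = {(500, 500 + i): 1.0 for i in range(5)}
smoothing = np.full((3, 3), 1.0 / 9.0)
result = submanifold_conv(track, smoothing)
# Work scales with the 5 active pixels, not the 10^6 total pixels.
```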
4. Opportunities going forward
The use of machine learning and, more recently, deep learning algorithms for the analysis of neutrino experiment data is on the rise. We have presented an overview of the impact of these techniques on the field through a description of the challenges and opportunities associated with their usage.
Opportunity 1 — Impact to Physics and Technology
The application of machine learning tools to neutrino physics is also relevant to the process of experiment design and proposal, which brings about opportunities to further impact the capabilities of future experiments. The next generation of neutrino experiments will introduce needs and challenges beyond what the field has encountered. Massive detectors designed to measure neutrino oscillations will redefine the challenges of data rates and data management and will continue to look for ways to expand their physics programs.51,52 Neutrino-less double beta decay experiments at and beyond the ton scale will require exceptional rejection of radioactive backgrounds, beyond what has ever been achieved. The emerging field of multi-messenger astronomy will further encourage experiments to expand their sensitivity to signals beyond their current reach.

Much like previous generations, this generation of experiments will only be possible by pushing technological frontiers. This presents opportunities for the fields of particle physics and machine learning, which could cement the synergy between the two in mutually beneficial ways.

An interesting example of research and development (R&D) involving machine learning is its application to the hardware trigger being developed for the DUNE experiment. The large data rates expected in DUNE detectors currently constrain the energy range available for analysis. Figure 11 shows a single DUNE data frame. The majority of the electronics noise, as well as the radioactive backgrounds, lie safely below the energies of the accelerator neutrinos DUNE is designed to study. However, there are also interesting signals in the MeV-scale energy range which could potentially be studied, such as supernova neutrinos, solar neutrinos, and neutrino-less double beta decay. Unfortunately, it is possible that the currently available hardware for data acquisition systems will require the elimination of much of the low-energy noise from DUNE's data stream at the trigger level in order to maintain manageable data rates. However, if the physics reach of DUNE could be extended to study low-energy signals, it could produce world-leading measurements of solar neutrino oscillations.

In order to enable a DUNE low-energy program, data acquisition hardware will need to sustain high data rates with low downtime. Research into the applications of deep learning to hardware triggers and data acquisition for DUNE is ongoing to resolve this issue.
Hardware acceleration, as well as optimal implementation of deep learning algorithms on FPGAs and GPUs, is being explored. This work is exploring the possibility of online data analysis capable of processing up to tens of terabits per second, aided by the capability of CNNs to tackle high-rate image processing. The capabilities of this trigger may well define whether low-energy signals will be available to explore at DUNE. Thus, the usage of machine learning algorithms might significantly contribute not just to improving the performance of existing analyses, but to expanding the physics program that is available to experiments. While there is community consensus on some of the challenges machine learning will need
to address going forward, we are only starting to recognize that machine learning development is an integral part of neutrino physics research. The continued active pursuit of R&D involving machine learning applications might significantly change the neutrino physics landscape in the coming decades.

Fig. 11. Left: A high-energy atmospheric neutrino interaction in the DUNE LArTPC. Right: A low-energy supernova neutrino interaction in the DUNE LArTPC.
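A caricature of the triggering problem makes the data-rate argument concrete: keep only small windows around above-threshold samples and discard the rest. The threshold, window, and waveform below are invented for illustration and bear no relation to DUNE's actual trigger design:

```python
import numpy as np

rng = np.random.default_rng(3)

ADC_THRESHOLD = 5.0   # illustrative cut, not a real detector parameter
WINDOW = 4            # samples kept on either side of each hit

def trigger(waveform):
    """Keep only windows around samples above threshold; everything
    else is dropped to tame the data rate."""
    hits = np.flatnonzero(waveform > ADC_THRESHOLD)
    keep = np.zeros(waveform.size, dtype=bool)
    for h in hits:
        keep[max(0, h - WINDOW):h + WINDOW + 1] = True
    return keep

# Low-amplitude noise with one large pulse buried in it.
wave = rng.normal(0.0, 1.0, size=10_000)
wave[6000:6003] += 50.0
mask = trigger(wave)
reduction = wave.size / mask.sum()   # data-volume reduction factor
```

The tension the text describes is visible even here: raising the threshold shrinks the data volume but silently discards any physics living below it.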
Opportunity 2 — What Physics Can Contribute to Machine Learning
The unique nature of the problem set and the analysis strategies of neutrino physics (and particle physics) experiments brings the potential to contribute new knowledge and applications to the field of computer science. Two aspects drive this opportunity:

1. Quantitative results and careful statistical analysis. Statistical precision is one of the hallmarks of particle physics experiments. Carefully quantifying results and uncertainties becomes even more important as neutrino experiments move into the precision era. As we develop tools and techniques to address the challenges of bias assessment and uncertainty quantification for our own needs, these developments will surely inform the broader picture of secure, ethical, and responsible treatment of machine learning beyond scientific applications.57,58
2. Customizable simulated datasets corresponding to real physical data. The majority of industry applications of machine learning are developed, tested, and applied on real-world datasets. Training usually employs labeled data of the same type as that to which the network will be applied. In contrast, neutrino experiments usually construct and train most of their analysis infrastructure on simulated data that resembles the expected data. The detail to which these simulations are tunable is especially relevant to the study of machine learning algorithms. It provides the opportunity to study their behavior under controlled modifications of the training samples, which could greatly contribute to the challenge of explainability in and
outside the field.

Fig. 12. One view of a neutral current event with a π decay in the NOvA detector. Left: just one of the particles produced by the decay. Right: the same particle with the context from the rest of the event (grey) shown. The knowledge from the context, such as the particle's separation from the vertex, makes it clear that this is a photon shower.

Opportunity 3 — Innovations
There are also opportunities in the areas of overlap between the problem sets of neutrino physics and machine learning. It is no surprise that we are starting to develop machine-learning-inspired tools which can be applicable outside neutrino physics.

For example, applications of machine learning to NOvA detector data have been further explored, specifically targeting single particle identification within clusters of particles. As shown in Figure 12, each cluster of energy depositions in an interaction needs to be further analyzed to identify its producer. In this case, knowledge of the single-particle cluster is useful, but there is much to be gained from providing some context to the classification network. In a recent publication, the authors demonstrate a technique to add context information to a CNN input and show how to implement the Siamese concept to take advantage of a "particle-only" as well as a "context" view of the inputs. This technique is the first to employ a Siamese architecture for the addition of context. As such, it is a contribution to both fields. In the neutrino physics application, it was found that adding context to the inputs improved the identification efficiency of particles by up to 11%.

Opportunities for synergy between neutrino physics and machine learning are plentiful. A deeper appreciation for the complexity and overlap of their respective problem sets may continue to yield enhanced advances for both fields.
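The core of a Siamese design is a single set of weights applied to both input views before their features are combined. A minimal numpy sketch (the view sizes, the shared layer, and the construction of the "context" view are illustrative assumptions, not the published architecture):

```python
import numpy as np

rng = np.random.default_rng(4)

# ONE set of weights shared by both branches: the defining feature
# of a Siamese architecture (sizes here are illustrative).
W_shared = rng.normal(size=(8, 36)) / 6.0

def branch(view):
    """Shared-weight feature extractor applied to one input view."""
    return np.maximum(0.0, W_shared @ view.ravel())   # ReLU features

particle_only = rng.normal(size=(6, 6))               # the cluster by itself
# Same cluster with surrounding event activity added (a toy "context" view).
with_context = particle_only + rng.normal(scale=0.1, size=(6, 6))

# Features from both views are concatenated before classification,
# so the downstream classifier sees the particle AND its context.
combined = np.concatenate([branch(particle_only), branch(with_context)])
```

Because the two branches share `W_shared`, features extracted from the two views live in the same space and remain directly comparable, which is the design rationale for Siamese networks.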
5. Acknowledgements
The authors thank Justin Vasel for his helpful review of this manuscript. The authors would like to acknowledge Georgia Karagiorgi for productive conversations about the exciting directions of R&D for accelerated hardware, as well as Taritree Wongjirad, Kazuhiro Terao, and Tingjun Yang for their useful insight into the challenges of LArTPC applications. Fermilab is operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

References
1. S. Ritz et al. Building for Discovery: Strategic Plan for U.S. Particle Physics in the Global Context. U.S. Department of Energy and National Science Foundation, https://science.energy.gov/~/media/hep/hepap/pdf/May-2014/FINAL_P5_Report_Interactive_060214.pdf, 2014.
2. APPEC. European Astroparticle Physics Strategy 2017-2026. 2017.
3. S. Pascoli et al. Connecting low energy leptonic CP-violation to leptogenesis. Phys. Rev., D75:083511, 2007.
4. M. E. Berbenni Bitsch and A. Vancura. Neutrino oscillations and the solar neutrino problem. European Journal of Physics, 10(4):243-253, Oct 1989.
5. Y. Fukuda et al. Evidence for oscillation of atmospheric neutrinos. Phys. Rev. Lett., 81:1562-1567, 1998.
6. B. Aharmim et al. Electron energy spectra, fluxes, and day-night asymmetries of B-8 solar neutrinos from measurements with NaCl dissolved in the heavy-water detector at the Sudbury Neutrino Observatory. Phys. Rev., C72:055502, 2005.
7. Chigozie Nwankpa, Winifred Ijomah, Anthony Gachagan, and Stephen Marshall. Activation functions: Comparison of trends in practice and research for deep learning, 2018.
8. Y. Guo et al. Deep learning for visual understanding: A review. Neurocomputing, 187, Nov 2015.
9. Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, Jan 2015.
10. Keiron O'Shea and Ryan Nash. An introduction to convolutional neural networks, 2015.
11. Md Zahangir Alom et al. The history began from AlexNet: A comprehensive survey on deep learning approaches, 2018.
12. Leif Lonnblad, Carsten Peterson, and Thorsteinn Rognvaldsson. Finding gluon jets with a neural trigger. Phys. Rev. Lett., 65:1321-1324, 1990.
13. S. Brice. Results of a neural network statistical event class analysis.
14. R. Brun and F. Rademakers. ROOT: An object oriented data analysis framework. Nucl. Instrum. Meth. A, 389:81-86, 1997.
15. C. Szegedy et al. Going deeper with convolutions, 2014.
16. O. Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211-252, 2015.
17. J. Renner et al. Background rejection in NEXT using deep neural networks. Journal of Instrumentation, 12(01):T01004, Jan 2017.
18. V. Alvarez et al. NEXT-100 Technical Design Report (TDR): Executive Summary. JINST, 7:T06001, 2012.
19. A. Aurisano et al. A convolutional neural network neutrino event classifier. Journal of Instrumentation, 11(09):P09001, Sep 2016.
20. Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. Signature verification using a "siamese" time delay neural network. In Proceedings of the 6th International Conference on Neural Information Processing Systems, NIPS'93, pages 737-744, San Francisco, CA, USA, 1993. Morgan Kaufmann Publishers Inc.
21. P. Adamson et al. Constraints on oscillation parameters from νe appearance and νµ disappearance in NOvA. Phys. Rev. Lett., 118(23):231801, 2017.
22. A. Gando et al. Search for Majorana neutrinos near the inverted mass hierarchy region with KamLAND-Zen. Phys. Rev. Lett., 117(8):082503, 2016. [Addendum: Phys. Rev. Lett. 117, 109903 (2016)].
23. Zhenghao Fu. Detection of cosmic muon spallation background in LS-detector using machine learning. 29th International Conference on Neutrino Physics and Astrophysics, 2020.
24. A. Li, A. Elagin, S. Fraker, C. Grant, and L. Winslow. Suppression of cosmic muon spallation backgrounds in liquid scintillator detectors using convolutional neural networks. Nucl. Instrum. Meth. A, 947:162604, Dec 2019.
25. Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications, 2018.
26. N. Choma et al. Graph neural networks for IceCube signal classification, 2018.
27. N. Mehrabi et al. A survey on bias and fairness in machine learning, 2019.
28. The White House Office of Science and Technology. American Artificial Intelligence Initiative: Year one report. 2020.
29. Nicol Turner Lee. Detecting racial bias in algorithms and machine learning. J. Inf. Commun. Ethics Soc., 16:252-260, 2018.
30. S. Delaquis et al. Deep neural networks for energy and position reconstruction in EXO-200. Journal of Instrumentation, 13(08):P08023, Aug 2018.
31. G. Anton et al. Search for neutrinoless double-β decay with the complete EXO-200 dataset. Phys. Rev. Lett., 123:161802, Oct 2019.
32. G. Perdue et al. Reducing model bias in a deep learning classifier using domain adversarial neural networks in the MINERvA experiment. Journal of Instrumentation, 13(11):P11020, Nov 2018.
33. Y. Ganin et al. Domain-adversarial training of neural networks, 2015.
34. Kanika Sachdev. Muon Neutrino to Electron Neutrino Oscillation in NOνA. PhD thesis, Minnesota U., 2015.
35. Robert Geirhos, David H. J. Janssen, Heiko H. Schütt, Jonas Rauber, Matthias Bethge, and Felix A. Wichmann. Comparing deep neural networks against humans: object recognition when the signal gets weaker, 2017.
36. Bonnie Fleming. The MicroBooNE Technical Design Report. 2012.
37. C. Adams et al. Deep neural network for pixel-level electromagnetic particle identification in the MicroBooNE liquid argon time projection chamber. Physical Review D, 99(9), May 2019.
38. Tom Charnock, Laurence Perreault-Levasseur, and François Lanusse. Bayesian neural networks, 2020.
39. Aashwin Mishra. Uncertainty estimation for deep learning in neutrino physics. Neutrino Physics and Machine Learning, 2020.
40. Jun Cao and Kam-Biu Luk. An overview of the Daya Bay reactor neutrino experiment. Nuclear Physics B, 908:62-73, Jul 2016.
41. E. Racah et al. Revealing fundamental physics from the Daya Bay neutrino experiment using deep neural networks. 2016.
42. L. J. P. van der Maaten and G. E. Hinton. Visualizing high-dimensional data using t-SNE. 2008.
43. I. T. Jolliffe. Principal Component Analysis and Factor Analysis, pages 115-128. Springer New York, New York, NY, 1986.
44. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps, 2013.
45. Babak Abi et al. Deep Underground Neutrino Experiment (DUNE), Far Detector Technical Design Report, Volume I: Introduction to DUNE. 2020.
46. B. Abi et al. Neutrino interaction classification with a convolutional neural network in the DUNE far detector, 2020.
47. Pedro A. N. Machado, Ornella Palamara, and David W. Schmitz. The short-baseline neutrino program at Fermilab. Annual Review of Nuclear and Particle Science, 69(1):363-387, Oct 2019.
48. Laura Dominé and Kazuhiro Terao. Scalable deep convolutional neural networks for sparse, locally dense liquid argon time projection chamber data. Physical Review D, 102(1), Jul 2020.
49. J. Deng et al. ImageNet: A large-scale hierarchical image database. In CVPR09, 2009.
50. Deep Learn Physics public dataset. http://deeplearnphysics.org/DataChallenge/. Accessed: 2020-07-29.
51. Babak Abi et al. Deep Underground Neutrino Experiment (DUNE), Far Detector Technical Design Report, Volume I: Introduction to DUNE. 2020.
52. K. Abe et al. Hyper-Kamiokande Design Report. 2018.
53. Francesco Capozzi, Shirley Weishi Li, Guanying Zhu, and John F. Beacom. DUNE as the next-generation solar neutrino experiment. Physical Review Letters, 123(13), Sep 2019.
54. J. Zennamo et al. DUNE-beta: Can we expand DUNE's physics program to search for neutrino-less double beta decay? The XXIX International Conference on Neutrino Physics and Astrophysics, 2020.
55. Y. Jwa, G. Di Guglielmo, L. P. Carloni, and G. Karagiorgi. Accelerating deep neural networks for real-time data selection for high-resolution imaging particle detectors. In , pages 1-10, 2019.
56. Kim Albertsson et al. Machine learning in high energy physics community white paper. J. Phys. Conf. Ser., 1085(2):022008, 2018.
57. Anna Jobin, Marcello Ienca, and Effy Vayena. The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9):389-399, 2019.
58. B. Mittelstadt et al. The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2), 2016.
59. F. Psihas et al. Context-enriched identification of particles with a convolutional network for neutrino events.