A Review on Machine Learning for Neutrino Experiments
Fernanda Psihas, Micah Groh, Christopher Tunnell, Karl Warburton
International Journal of Modern Physics A
© World Scientific Publishing Company
Fernanda Psihas
Neutrino Division, Fermi National Accelerator Laboratory, Batavia, Illinois, United States of America
[email protected]
Micah Groh
Department of Physics, Indiana University, Bloomington, IN, United States of America
[email protected]
Christopher Tunnell
Department of Physics and Astronomy, Rice University, Houston, Texas, United States of America
[email protected]
Karl Warburton
Department of Physics and Astronomy, Iowa State University, Ames, Iowa, United States of America
[email protected]
Submitted to IJMPA 3 August 2020

Neutrino experiments study the least understood of the Standard Model particles by observing their direct interactions with matter or searching for ultra-rare signals. The study of neutrinos typically requires overcoming large backgrounds, elusive signals, and small statistics. The introduction of state-of-the-art machine learning tools to solve analysis tasks has made major impacts on these challenges in neutrino experiments across the board. Machine learning algorithms have become an integral tool of neutrino physics, and their development is of great importance to the capabilities of next generation experiments. An understanding of the roadblocks, both human and computational, and the challenges that still exist in the application of these techniques is critical to their proper and beneficial utilization for physics applications. This review presents the current status of machine learning applications for neutrino physics in terms of the challenges and opportunities that are at the intersection between these two fields.
Keywords: Neutrinos; Machine Learning; Deep Learning; Review.
1. Introduction
The nature of neutrinos and their masses is one of the main science drivers of particle physics today.1, 2
Not only are neutrinos the least understood particle in the Standard Model, they may be linked to the explanation of the matter/antimatter asymmetry in the Universe through the process of leptogenesis. Neutrinos exhibit unexpected oscillations between their mass states, a behavior which indicates that other new physical phenomena beyond the Standard Model might be possible. Specifically, it raises the question of the mechanism through which they acquire the non-zero mass required by oscillations, as well as the possibility that neutrinos engage in charge-parity (CP) violating processes, both linked directly to leptogenesis. The answers to questions about the mass mechanism and CP violation can provide a deeper understanding of the early Universe through the study of neutrinos.

In the aftermath of the solar neutrino problem, resolved by the discovery of oscillations,5, 6 a large number of experiments have set out to answer the remaining questions of neutrino physics, taking advantage of the best particle detection technology available to them. The low cross-sections of neutrino interactions and the background suppression required by many of these experiments make the study of neutrinos technically challenging and subject to statistical limitations. The optimization of signal and background separation, detection threshold, and physics reconstruction are all key factors in the technology design for a particular experiment.

The software tools used for analysis and reconstruction of detector data are often overlooked as key components of experimental technology. These tools are not only used for analysis and final results, but also form an integral part of the conception and design of new projects. Improvements in reconstruction and analysis technologies have enhanced our ability to extract information from data and translate it into physics quantities. The study and development of reconstruction and analysis tools is, thus, of critical importance to the capabilities of particle physics experiments. The tools of machine learning, also broadly referred to as artificial intelligence, have been at the center of analysis techniques for several decades.

Beyond appealing to the personal interest of the reader in machine learning, it is clear from the abundance of applications of these tools that a basic knowledge of machine learning is central to the understanding of experimental data analysis in neutrino physics today. In this manuscript, we review the algorithms, developments, and evolution of machine learning tools for neutrino experiments with a focus on deep learning. We discuss the obstacles that still challenge the standing of these tools as trusted parts of the experimentalist's arsenal.
We first define some essential language and formalism for this discussion and introduce the basic components of machine learning algorithms and concepts of deep learning. The current status and prospects in the field will be discussed in terms of the roadblocks, both human and computational, and the challenges and opportunities that still exist in the application of these techniques.
2. Machine Learning and Deep Learning
The term machine learning is an umbrella term for all algorithms where inference is used to perform a task and that have the ability to improve with experience, though terms like deep learning are commonly used depending on the complexity of these algorithms. Within the past decade, deep learning algorithms have gained significant popularity in neutrino experiments and have enabled large improvements in the performance and physics reach of the analyses where they are employed.

A common entry point to the understanding of machine learning algorithms is the description of an artificial neural network, or ANN. ANNs are interconnected series of elementary functions called neurons, somewhat analogous to the biological system of the brain. Artificial neurons employ mathematical functions called activation functions to produce an output, mimicking the action potential which determines the production of electrical signals in brain cells. The connections between artificial neurons, where the output of each one is passed as input to others, allow the network to combine these simple units to perform the complex task of learning. Similarly, connecting a large number of artificial neurons results in interesting macroscopic behavior, as discussed in the following sections.

In the mathematical representation, each neuron receives one or more inputs which are individually weighted. The output is determined by the activation function, which is a nonlinear transformation of the inputs. In ANNs, neurons are placed into connected layers, often as seen in Figure 1. The multi-layer perceptron (MLP) depicted in the figure is a type of ANN in which the neurons are organized into “hidden layers” between the input and the output. In this type of network, also called a feedforward network, the outputs of each layer are fed to the next layer, and so on. The number of hidden layers and their interconnected array of neurons allows the network to perform complex tasks.
For example, a network can be used to reproduce a mapping F of an input vector x to an output vector y. The input data used by the network to learn F are a collection of “ground truth” examples for which the input x and the exact output y = F(x) are known or have been simulated.

The process of learning occurs iteratively by first constructing output estimates for given values of x using a set of initial weights for each component of the network. Then, the differences between these estimates F′(x) computed by the network and the desired target are minimized by introducing changes to the weights of each neuron. How the differences, or losses, are quantified and the choice of minimization function vary by application. The iterations are repeated until the ability of the network to approximate the function's behavior no longer improves.

Note that while the task of the network is to reproduce the output of the target function given the same set of inputs, it need not know or approximate the exact form of the function to accomplish that task. For example, a network trained to reproduce the invariant mass of an initial state does not need to know or learn decay kinematics, but instead it learns to reproduce the same principles by training on final state vectors with corresponding invariant mass values. Therefore, it is possible for a network to reproduce the behavior of a mapping without ever learning its explicit form.

Fig. 1. Schematic of a multi-layer perceptron. Each artificial neuron computes the activation f(Σ_i w_i x_i + b) of its weighted inputs plus a bias.
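The iterative training loop described above can be sketched in a few lines of Python. The network (a single hidden layer), the target mapping, and all hyperparameters below are illustrative choices made for this sketch, not taken from any experiment discussed in this review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "ground truth" mapping F: here y = sin(x), known exactly for every input.
x = rng.uniform(-np.pi, np.pi, size=(256, 1))
y = np.sin(x)

# One hidden layer of 16 tanh neurons (sizes and initial scales are arbitrary).
W1, b1 = rng.normal(0.0, 1.0, (1, 16)), np.zeros(16)
W2, b2 = rng.normal(0.0, 0.1, (16, 1)), np.zeros(1)
lr = 0.05  # learning rate

loss = None
for step in range(10_000):
    # Forward pass: each neuron applies f(sum_i w_i x_i + b).
    h = np.tanh(x @ W1 + b1)      # hidden activations
    y_hat = h @ W2 + b2           # network estimate F'(x)

    # Quadratic loss between the estimate and the ground truth.
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: propagate the loss gradient to every weight.
    g_out = 2.0 * (y_hat - y) / len(x)
    gW2, gb2 = h.T @ g_out, g_out.sum(0)
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    gW1, gb1 = x.T @ g_h, g_h.sum(0)

    # Update the weights to reduce the loss, then iterate.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"final mean squared error: {loss:.4f}")
```

The network never sees the analytic form of sin(x); it only adjusts its weights until its outputs match the ground-truth examples, mirroring how a physics network learns from labeled or simulated events.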
Deep Learning
Deep learning is differentiated from machine learning by the complexity of the algorithms used. Deeper networks involve more non-linear operations, such that the mapping from results back to the input variables is more challenging to track. Deep learning algorithms have gained popularity in the last two decades due to breakthroughs in their performance, largely enabled by the rapid development of hardware such as graphics processing units (GPUs). The field of computer vision has been the primary driver of these innovations, which are used to solve pattern recognition tasks.

Deep neural networks are sophisticated and more computationally expensive techniques which are able to tackle problems of higher complexity than other machine learning tools. Increasing network depth by adding additional layers, for instance, allows for the approximation of increasingly complicated functions.
Fig. 2. The structure of a convolutional neural network. The convolution layers use image kernels to extract features from the input. The pooling layers downsample the image. The final set of features are connected to a fully connected, artificial neural network.
One of the most common deep learning algorithms employed for pattern recognition is the Convolutional Neural Network (CNN). CNNs are a class of MLP which learn to extract features from an input in addition to training for the intended task. These features are identified using the spatial relationship between neighboring regions in the image. The key components of CNNs are kernels, or image filters, which are matrices that scan an input image and output an image with highlighted features. A convolution layer consists of operating one or more kernels across an input image.

In reality, the inputs to CNNs are tensors typically containing the pixel-by-pixel RGB values of an image. The dimensions and content of the input tensors can be altered for different applications, but most developments in image recognition naturally use the image-to-RGB tensor strategy. Because convolutions output the effect of a kernel on the input tensor with translational invariance, they are especially useful for image and pattern recognition, where the features of interest are topological characteristics.

Figure 2 shows the basic structure of a CNN. As seen in the figure, the fully connected layers of a CNN are notably similar to the basic MLP, whereas the initial convolutional layers serve the purpose of feature extraction. The kernel values are learned during the training process to extract the features that are most useful for the desired task. Convolution layers are often interlaced with pooling layers which downsample the image to reduce the computations needed deeper in the network and promote translational or rotational invariance. The final layers of the network then perform the classification or regression task using the extracted features as input.

The task of identifying signals and reconstructing physical characteristics of interactions in particle detectors is often analogous to that of pattern recognition in images.
Thus, much of the recent development in applications in neutrino physics involves usage or adaptations of deep learning networks developed for image recognition. Within the past decade, deep learning algorithms have gained significant popularity in neutrino experiments and have enabled drastic improvements in the performance and sensitivity of the analyses where they have been employed. Deep CNNs have now demonstrated state-of-the-art performance on many tasks and are one of the most common tools used in neutrino physics. The advantages and motivation to use CNNs in neutrino experiments are largely applicable to other deep neural networks used in neutrino experiments as well. Similarly, the advantages and challenges discussed in the following section apply to CNNs.

Fig. 3. First demonstration of neural networks for neutrino physics. The network was trained to separate charged current and neutral current neutrino interactions for the SNO experiment. The table shows the performance of the network as a classification matrix.
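The kernel operation at the heart of a convolution layer can be illustrated with a short sketch. The toy "detector image" and the hand-written vertical-edge kernel below are invented for illustration; in a real CNN the kernel values are learned during training rather than chosen by hand.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over an image (valid padding) and return the feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Each output pixel is the kernel multiplied elementwise with
            # the image patch under it, then summed.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# A toy detector image: a bright vertical "track" on an empty background.
image = np.zeros((6, 6))
image[:, 2] = 1.0

# A hand-written vertical-edge kernel.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

fmap = convolve2d(image, kernel)
print(fmap)   # every row reads [3, 0, -3, 0]: the track's two edges are highlighted

# Pooling: downsample the 4x4 feature map by taking the max of each 2x2 block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
```

Because the same kernel is applied at every position, the response is translationally invariant: the track's edges would be highlighted in the same way wherever the track appeared in the image.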
3. Applications in Neutrino Experiments
Particle physics experiments, including neutrino experiments, are endeavours which require the analysis of large data sets, sophisticated modeling, and statistics. In the past two decades, both neutrino physics and machine learning have been experiencing a renaissance with the discovery of neutrino oscillations and the advent of deep learning, respectively. The initial exploration of neural networks in particle physics began in the 1990s. The SNO experiment was the first to explore the use of neural networks in neutrino physics, using feedforward networks, a type of artificial neural network, to classify events based on hit pattern features, as shown in Figure 3. While these neural networks did not outperform other statistical techniques at first, they demonstrated the capabilities of these techniques for event classification in neutrino detector data. As expertise grew regarding the impact of sample preparation and feature choices on network performance, machine learning techniques not only surpassed traditional reconstruction, but would grow to be one of the most widely used analysis techniques in the field.

Machine learning has played a role in nearly every particle physics discovery and measurement since. Common analysis frameworks designed for particle physics have natively supported the use of these tools for almost two decades. The role of machine learning in physics analyses has only grown in scope, taking advantage of several opportunities specific to our problem set which will be discussed in the next section. These first applications are now commonplace in our field, typically using tools like feedforward networks, MLPs, and more recently boosted decision trees, to name a few. Most common applications start with input variables which have been pre-extracted and selected by the analyzer.
This continued as the main strategy until the introduction of deep learning tools.

The tasks that deep learning algorithms have been applied to in the last decade span the full extent of the experimental analysis workflow, including design, hardware triggers, energy estimation, reconstruction, and signal selection. Many applications exist which have greatly simplified and improved the performance of experiments and their physics reach when compared to the standard tools they have replaced. The performance achieved by these tools is the prime motivation for their implementation to solve physics problems, despite the computational complications which will be described later in this section.

Even more significant than the improvements themselves are the implications of the usage of these tools in our experiments. The interplay between neutrino physics and deep learning is rich in both challenges and opportunities for both fields. The current status of the field is presented in this section, in the context of these challenges and opportunities. Rather than providing an exhaustive list of applications in a rapidly growing field, those that are notable are highlighted when relevant to the item discussed.

Challenge 1 — Adaptability of the Methods
The most frequently used deep learning algorithms in neutrino experiments are those developed or commonly used for image recognition. Given that some experimental setups closely resemble or can be mapped into 2-dimensional images, this is a natural starting point for many studies to apply the tools of image recognition.

However analogous, the problems solved for image or pattern recognition have important differences from particle physics. Some adaptation is usually required for the usage of these algorithms. Adaptation can be as simple as converting detector data into image-like tensor inputs or as complicated as a complete network redesign for the new task. The trade-off between simple adaptations and those where the inputs and network are more tuned to the particular task can be significant in terms of performance improvement.

An example of a deep learning network used with different adaptations in neutrino experiments is the GoogLeNet CNN architecture. GoogLeNet was the first creatively non-sequential implementation of convolutional layers in CNNs, which brought significant accuracy and performance improvements with respect to its competitors. Following the success of GoogLeNet, many neutrino experiments explored its utilization with minimal or no modifications as a starting point for their own classification studies. Despite the differences between images and neutrino data, out-of-the-box approaches yielded important successes over traditional methods.

A successful out-of-the-box application of GoogLeNet is the NEXT experiment's background rejection network. The NEXT detectors are cylindrical time projection chambers with photon detection and charge detection at each end, respectively.
Fig. 4. Input data for the NEXT CNN classifier. Top: Example event in NEXT with 10 mm voxelization. Bottom: Example event in NEXT with 2 mm voxelization. Columns are the xy, yz, and xz views of the event. The distinguishing feature of the track projection that identifies these as signal events is the presence of a larger energy deposition (Bragg peak) at the end of each track. This feature is mostly lost in the 10 mm voxelization.

Photomultiplier tubes collect a light signal and silicon photomultipliers (SiPMs) collect an electroluminescence signal from drifted charges inside the detector. For the training inputs to resemble 2D images of the particle tracks, the granularity of the data from the SiPM readout is reduced to 3D voxels, of dimensions x, y (spatial) and z (drift time), which are used as the RGB channels of the CNN input tensor. The inputs for different voxel sizes are shown in Figure 4. Equal numbers of simulated neutrinoless double beta decay signal and radioactive background events are used for the training. This simple implementation was found to outperform the traditional reconstruction by a factor of between 1.2 and 1.6, depending on the reconstruction resolution.

The many differences between 2D images and detector data provide an opportunity to improve algorithm performance by making thoughtful modifications to the original networks. In many cases, large improvements have been attained from enhancing useful features of the data by making changes to the algorithms and the structure of the inputs.

Such is the case for the Convolutional Visual Network, a CNN classifier designed for application on NOvA data. The readout from the two orthogonal views of the NOvA detectors is already very image-like and naturally depicts 2-dimensional projections of energy depositions.
However, the decoupled nature of the two views makes a simple conversion to a single RGB tensor unideal, because a conversion of the xz and yz views into RGB channels of the same image tensor would result in an unnatural overlap of unrelated features. Rather than artificially overlapping the
orthogonal views, the authors employed a Siamese network structure, allowing independence in the learning from each detector view to identify neutrino interaction flavor. Figure 5 shows NOvA's detection technology and the architecture used for neutrino identification. This, among other modifications to a GoogLeNet-inspired architecture, produced large accuracy improvements, increasing the effective exposure of the experiment by 30%. This network was the first to be used in a published physics result, and it demonstrated the significance and impact of adapting both the tool and the inputs to the detector technology.

Fig. 5. Left: An example readout from the NOvA detector. The planes are arranged in alternating orientations to give two orthogonal views of the event. Right: Siamese tower structure of NOvA's Convolutional Visual Network, based on GoogLeNet, for neutrino flavor classification. The two towers independently operate on each view of the event. The features from each tower are concatenated in the final layers of the network.

In addition to detector technology and readout, the geometry of the detector is also relevant to the adaptability challenge. In some cases, thoughtfully considering modifications based on detector geometry can boost performance significantly. In other cases, this consideration could be essential for applying the tools with any success. Additionally, careful consideration of how to map detector readout to inputs compatible with the network of choice should not be overlooked.

One notable application is the use of spherical CNNs for analysis of data from the close-to-spherical KamLAND-Zen detector. This currently ongoing work incorporates a modification to best fit the needs of the detector geometry and has already demonstrated improvements in early stages. Given the nearly spherical shape of their detector, the authors of this work seek to correct a distortion created by the projection of the detector readout into a 2D pattern, as seen in Figure 6.
They employ spherical CNNs for the task of signal-background classification. In a spherical CNN, the kernel covers the entire phase space by scanning in Euler angles rather than projecting the readout into 2D planes. Indeed, the use of spherical CNNs achieves background rejection of 71%, compared to 61% for their original CNN.

Fig. 6. The KamLAND-Zen experiment uses spherical convolutions for signal and background separation. The detector is nearly spherical, so a traditional mapping to 2D would cause abnormal distortions in the data.

In some cases, other technologies already match the experimental needs substantially better than CNNs. This is the case for the networks used for analysis of data from the IceCube Neutrino Observatory, whose detector spatial sparsity and non-uniformity make the data less than ideal for CNNs. Their deep learning application uses Graph Neural Networks (GNNs) as a way to mitigate the effect of these features. This is because GNNs are capable of dealing both with irregular geometry and with graphs of different sizes, a feature which is seen in many of their events. GNNs are designed to classify graphs, where the graph nodes define some element of the detector and the graph edges show some connection between elements.

IceCube's GNN separates neutrino-induced muons (their signal) from cosmic-ray shower-induced muons (their background), and the authors compared the efficiency of the network to that of their standard reconstruction. This GNN was able to identify 6.3 times more signal events and provide a signal-to-noise rate 3 times larger. A comparison with a CNN, which gave similar results to the traditional reconstruction, demonstrated that GNNs offer significant benefits in this application.

There are many examples of successfully overcoming this challenge of adapting deep learning tools for neutrino data analysis. However, the approach taken for each application will continue to encounter different obstacles and considerations unique to the data and the tools chosen for analysis. Careful consideration of these modifications continues to show substantial improvements over the direct application of image recognition technologies.
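The message-passing idea behind graph networks can be sketched in a few lines. The toy graph, node features, edge list, and random weights below are invented purely for illustration; they are not IceCube's actual architecture or data format.

```python
import numpy as np

# A toy detector graph: 4 hit sensors (nodes) in an irregular layout.
# Each node carries an invented feature vector, e.g. (charge, time).
features = np.array([[1.0, 0.2],
                     [0.5, 0.4],
                     [2.0, 0.1],
                     [0.1, 0.9]])

# Edges connect nearby sensors; unlike a fixed image grid, this
# connectivity (and the number of nodes) can vary event by event.
edges = [(0, 1), (1, 2), (2, 3)]

# Symmetric adjacency matrix with self-loops, row-normalised so that
# each node averages over itself and its neighbours.
n = len(features)
A = np.eye(n)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A /= A.sum(axis=1, keepdims=True)

# One message-passing layer: aggregate neighbour features, then apply a
# learned linear map and nonlinearity (weights are random stand-ins here).
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
h = np.maximum(A @ features @ W, 0.0)   # ReLU activation

# Graph-level readout: pool the node embeddings into one vector, which a
# classifier head would turn into e.g. a signal/background score.
graph_embedding = h.mean(axis=0)
```

Because the aggregation is defined by the adjacency matrix rather than a fixed pixel grid, the same layer handles sparse, non-uniform detector geometries and graphs of any size.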
Challenge 2 — Quantifying Bias and Uncertainties
One of the risks of applying machine learning is the possibility that the algorithms will learn information from the training data beyond what is intended. The risks associated with these techniques are neither new nor specific to the applications in our field. Furthermore, these challenges are starting to receive more attention in the broader community, in industry applications, and in government regulations around the world.

The principal danger is that a dataset used for training contains information or implies underlying structure which may incorrectly bias the results, yet is learned by the network. This risk is true for all machine learning algorithms, but is particularly noteworthy with algorithms which perform feature extraction, such as CNNs. While feature extraction is the main advantage of these algorithms, special attention is needed to mitigate this risk.

Given that the features are abstract, their association with the physical traits of the data is largely unknown. In addition, the networks used for detector data analysis are typically trained on simulated datasets of the events of interest. These simulations carry models and assumptions of the detector performance, the particle interactions, and other physical processes. The challenge of quantifying and mitigating biases is particularly important to guarantee robust physical conclusions which are also model and generator independent.

Apart from deliberate simulation choices, any effects introduced or distributions sculpted in the selection of the training data, as well as any mismatch between simulated and real events coming from detection performance, particle interactions, calibration, or other effects, have the possibility of propagating throughout the learning process undetected or unquantified.
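One simple, and by no means complete, diagnostic for such data-simulation mismatch is to compare the distributions of a reconstructable quantity in the two samples before training, for instance with a two-sample Kolmogorov-Smirnov statistic. The samples below are random stand-ins with an invented calibration offset, not real experimental data.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum distance between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(0)
# Invented stand-ins for a reconstructed quantity (e.g. a hit charge)
# in simulation and in data, with a small calibration offset between them.
sim = rng.normal(1.00, 0.10, 10_000)
data = rng.normal(1.05, 0.10, 10_000)

print(f"KS distance: {ks_statistic(sim, data):.3f}")
```

A large KS distance flags an input variable whose mismodeling could be learned by the network; it says nothing, however, about multi-dimensional correlations, which is part of why mitigating these biases remains an open challenge.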
While some existing applications have devised tailored approaches to mitigating bias, standard and complete approaches to this important challenge are yet to be achieved.

Minimizing known bias can be achieved through careful choices in the construction of input datasets. An example of this concept is the charge-only energy reconstruction CNN used by EXO-200. This network is used to discriminate between single-site and multi-site events and is found to outperform the traditional reconstruction which had been used in previous publications. The network was initially trained using a simulated Th source. When a systematic study was performed with arbitrary resolution, disproportionately large improvements in resolution were found for events in the Tl peak with respect to other classes of events. In order to correct this, the network was instead trained on calibration gamma ray source data, which acts as a proxy for various backgrounds, in the center of the detector. The CNN is tested on numerous samples, including simulation as well as Co, Tl, Ra, and Tl calibration sources at a range of source locations. Having implemented this change to the training data, improved performance was found in the relevant energy range.

It is also possible that biases exist but are unknown to the developer, for instance, when an unintended artifact arises in the data. In contrast with the example above, where the training sample composition was kept flat across different backgrounds, other characteristics might not be known to be skewed in unphysical ways. It is not possible to correct or quantify bias which is not yet known, but techniques can be designed to either minimize such biases or look for them in the data.

The application of Domain Adversarial Neural Networks by the MINERvA experiment is a prime example of unknown bias reduction. Here, the network is applied to the task of classification using a CNN, but a technique of bias reduction is employed. The network is trained on both simulation and data. The domain network, whose purpose is to distinguish between the data domain and the simulation domain, is attached to the CNN. It is expected to find features that result from errors or inconsistencies in the simulation. As shown in Figure 7, the interplay between the two components discourages the classification task from learning from any features that behave differently between the two domains.

Fig. 7. The basic structure of a domain adversarial neural network. The feature extractor (green) and label predictor (blue) function just as a CNN. However, the domain classifier (pink) attempts to determine which of two domains the input is from. The gradient reversal layer discourages the network from classifying events using features that are unique to one of the two domains.

Another difficulty arises in designing tests that can reveal hidden biases learned by deep learning algorithms. Ideally, this would look at the performance of the algorithm on data, but without a method of knowing the true nature of a data event, this is impossible (if such a method existed, we wouldn't need these algorithms in the first place!). Instead, we must compare reconstructable quantities between data and simulations and look for signs of bias between the two, but which biases to look for is not obvious. In addition, many experiments begin creating reconstruction algorithms before data taking has begun, and others perform blind analyses where the algorithms must be optimized and validated without comparison to data. While methods exist for constructing systematic uncertainties that address possible biases for many quantities, how to apply these methods to machine learning algorithms is not clear.

One example of a technique that uses both real and simulated data to search for bias is the muon-removed electron-added (MRE) technique used by the NOvA experiment.
Two samples are created by overlaying simulations on real data events (MRE-on-data) and simulations on simulated events (MRE-on-simulation). For each sample, an identified muon is removed from selected νμ charged current events, leaving only the hadronic components of the interaction. The muon is substituted by a simulated electron of equivalent momentum overlaid on the events. Effectively, a comparison of the network performance between the MRE-on-data and MRE-on-simulation samples provides a measure of the bias-related effects introduced by data-simulation discrepancies in the hadronic component. The resulting difference in selection efficiency between the data overlay and the simulated overlay is less than 0.5%.

Another data-based technique is to consider human-labeled datasets. While we can't know the true identity of a real data event, a trained physicist can often identify it with reasonable accuracy. Comparisons of error patterns between humans and deep neural networks have shown differences between the two, which suggests that the unknown biases of humans differ from those of neural networks. This is nevertheless a useful technique to search for large, unexpected biases in the outcome of the networks. The MicroBooNE experiment uses a liquid argon time projection chamber (LArTPC) detector to observe neutrino interactions. They created a human-labeled dataset for validating a semantic segmentation network, a technique for classifying individual pixels in an image, trained on simulated neutrino interactions. The disagreement between the performance of the network and humans was less than 2% in the misclassification of pixels.

Finally, we consider uncertainties. Quantifying the uncertainties associated with any measurement is an integral part of physics analyses. Traditional neural networks, by design, output a single value. In some cases, they output high confidence scores on events that are well outside the phase space of the samples they were trained on.
While the output may be sensible in such a case, it should carry a large uncertainty. Bayesian neural networks are designed to address this concern. They replace the fixed-value weights in the network with probability distribution functions, as shown in Figure 8. The resulting output is thus also a probability distribution function, which can be interpreted as a most probable value with some uncertainty. This approach to including uncertainties has recently gained attention in the neutrino community, with initial implementations currently being explored.

Challenge 3 — Network Interpretability
As machine learning models grow deeper, there is often a trade-off between the performance of the algorithm and our ability to interpret its results. Boosted decision trees, for example, are relatively simple machine learning models. They can often inform the user of the relative importance of each input to the model, but may not reach the accuracy that can be achieved with deeper models. CNNs, on the other hand, have achieved state-of-the-art performance on many tasks, but the features extracted by the convolutional layers are abstract and challenging to interpret. Some individual kernels can be connected to specific tasks, such as edge detection. However, the features resulting from multiple convolutions are difficult to connect to topological characteristics or physical interpretations of the events.

Fig. 8. A depiction of a Bayesian neural network. The fixed-value weights in an artificial neural network are replaced by probability distribution functions. Thus, the output of each neuron is a probability distribution function with some most probable value and an uncertainty.

This is particularly problematic in physics, where relating network features back to the underlying physics problem is important and sometimes necessary for a complete understanding of the physical models. A better conceptual understanding of the physical features used by the network could tell us much about the physical processes which produced those features. In addition, that understanding could help minimize or correct inefficiencies in the performance of the algorithm.

A common method for interpreting the features extracted by a network is to perform dimensionality reduction. The Daya Bay Reactor Neutrino Experiment is designed to detect anti-neutrinos produced by two nearby nuclear reactors. The experiment employs a CNN to separate inverse beta decay (IBD) events, the signal of interest, from noise within the detector.
The features extracted by the network are transformed into two dimensions using t-Distributed Stochastic Neighbor Embedding (t-SNE). The t-SNE method uses a non-linear transformation to reduce the dimensionality of data in a way that preserves the distances between points that are local to one another. Figure 9 shows the result of this technique. Class separation in this two-dimensional space relates to topological information that the network has used to distinguish the different classes.

Another method of dimensionality reduction is Principal Component Analysis (PCA). PCA is a linear change of basis in which the new basis vectors lie along the directions of maximum variation in the data. Often only a few of the basis vectors are needed to explain most of the variation in the data. Critically, the new basis vectors are orthogonal, meaning each makes a unique contribution to the variation in the data. PCA is often performed on the input data to a network to reduce the number of inputs needed to a smaller set of independent values which are most important to
the task. PCA can also be performed on the network-extracted features to reduce their dimensionality for visualization, in a similar way to t-SNE.

Fig. 9. The t-SNE embedding produced from the Daya Bay CNN used to separate the IBD signal from background noise. The t-SNE is a non-linear transformation used for dimensionality reduction. Visual separation in this space relates to separation in the high-dimensional feature space created by the network. Each point is labeled by its true identity.

In addition, some qualitative methods try to determine which features of the input are most relevant to the output. This is particularly important for CNNs performing image recognition, where we want to determine which topological features of the input are most important to the network output. One method of doing this is to occlude regions of the input image and measure how the various output scores change. This technique is often called an occlusion test. Another technique is to use the network itself to determine these valuable features. Saliency maps compute the gradient of the output score of the network with respect to each of the input pixels. These maps can show where the network is "looking" to construct its features. Interestingly, they sometimes show that CNNs do not look at the primary object in an image, but instead at the surrounding context. If some objects are commonly found in the same context, then the context can be used as the primary discriminator to classify that object.
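The occlusion test described above can be sketched in a few lines: zero out a small patch at each position and record how much the output score drops. Here the "network" is a stand-in function so the example stays self-contained (the image size, patch size, and scoring function are illustrative):

```python
import numpy as np

def occlusion_map(image, score_fn, patch=2):
    """Slide a zeroed patch over the image and record how much the
    model score drops at each position; large drops mark important regions."""
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros((h - patch + 1, w - patch + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0
            heat[i, j] = base - score_fn(occluded)
    return heat

# A stand-in "network" whose score is just the summed charge in a
# fixed region of interest (purely illustrative).
def toy_score(img):
    return img[2:4, 2:4].sum()

event = np.zeros((6, 6))
event[2:4, 2:4] = 1.0   # a small "track"
heat = occlusion_map(event, toy_score)
# The heat map peaks exactly where occlusion destroys the score.
```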
Challenge 4 — Computational & System Constraints
As mentioned in Section 2.1, the latest developments in deep learning are largely driven by improvements in GPU technology, where the many computations needed for large networks can be done in parallel. A single evaluation of a deep neural network can require billions of floating point operations. This is compounded by the amount of data collected in particle physics experiments. Modern neutrino experiments record billions of events which require evaluation by various reconstruction and analysis algorithms. Many experiments perform these evaluations on large-scale computing grids on CPUs.

While neural networks have expanded the capabilities of many neutrino experiments, this computing limitation is a bottleneck to the widespread use of very deep neural networks. Here we consider three methods to alleviate this concern. One potential solution is to expand the availability of GPUs. Small GPU clusters used for training neural networks are becoming more common. However, these are not enough to match the production needs of many experiments. Larger availability of GPU clusters would enhance the ability of experiments to utilize large neural-network-based algorithms.

Another possibility is to enhance the physics output from these algorithms. As discussed throughout this manuscript, machine learning based methods often show significant improvements over traditional methods. One way to improve performance is to optimize an algorithm for its primary task, but the implementation of multi-task algorithms could be a promising way to enhance the total physics output of an individual algorithm. The Deep Underground Neutrino Experiment (DUNE) is a future neutrino oscillation experiment currently in the R&D stages for its LArTPC detector. The DUNE experiment employs a CNN for identification of neutrino interaction flavor in its detector, which achieves more than 85% efficiency for νe charged-current events in the energy range of interest.
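The multi-task idea can be sketched as several inexpensive linear heads sharing one feature extractor, so the costly feature computation is done once per event. The head names, sizes, and random weights below are illustrative stand-ins, not DUNE's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Shared feature-extractor output for one event (in a real network this
# comes from the convolutional layers; here it is a random stand-in).
features = rng.normal(size=16)

# Several illustrative heads, each a cheap linear layer on the SAME
# feature vector: flavor, neutrino sign, interaction type, and counts
# of final-state particle species.
head_sizes = {"flavor": 4, "sign": 2, "interaction": 4,
              "n_proton": 4, "n_pion": 4, "n_pizero": 4, "n_neutron": 4}
heads = {name: rng.normal(size=(n, 16)) for name, n in head_sizes.items()}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Each extra output costs only one small matrix-vector product,
# while the expensive feature extraction is done exactly once.
outputs = {name: softmax(W @ features) for name, W in heads.items()}
```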
In addition to flavor classification, the algorithm also outputs the sign of the neutrino, the type of interaction, and the number of each particle species in the final state. In total, the network has seven outputs at very little additional computational cost, since each output uses the same set of features extracted by the network.

Reducing the computational cost of the algorithms is another option, which would reduce the total computational need of experiments. Using smaller networks is one possibility, but this comes at the cost of performance. Instead, consideration can be given to the type of data acquired by experiments. LArTPC detectors, such as those used by DUNE or the Short Baseline Neutrino program, have very low occupancy, the fraction of the detector readout that is active in an event. These events are globally sparse, with less than 1% of the readout active, but locally dense in the region of the detector where the event occurred. An example of an event recorded in a LArTPC is shown in Figure 10.
This means that typical CNNs will waste much computation time multiplying or summing zeros. It has been shown that using submanifold sparse convolutional networks can reduce the inference time of these networks by a factor of 30 and the memory cost by more than a factor of 300. These sparse convolutional networks are designed for use with sparse data and only perform convolution operations in regions with activity.

Finally, we consider the use of open datasets in algorithm development. Open datasets are commonly used to benchmark algorithm performance in data science applications. Despite increasing efforts from a handful of experiments to provide such datasets for analysis, there are still many restrictions surrounding data sharing in the field. The lack of available datasets negatively impacts the ability of researchers to develop and publish improved machine learning techniques specific to particle physics applications and significantly hinders progress in developments requiring real data, such as bias assessment. Open datasets would not only enable these advancements to be developed further, but would also encourage beneficial multi-disciplinary collaboration, which would surely improve the quality of the physics of our experiments.
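The submanifold rule described above (compute outputs only at sites that are already active, gathering only active neighbors) can be sketched directly on a coordinate map. A toy numpy sketch, with an invented event, kernel, and detector size:

```python
import numpy as np

def submanifold_conv(active, kernel):
    """Sparse 2D convolution: `active` maps (row, col) -> value for
    nonzero pixels only. Outputs are computed ONLY at already-active
    sites (the submanifold rule), so empty detector regions cost nothing."""
    k = kernel.shape[0] // 2
    out = {}
    for (r, c) in active:
        total = 0.0
        for dr in range(-k, k + 1):
            for dc in range(-k, k + 1):
                v = active.get((r + dr, c + dc))
                if v is not None:
                    total += kernel[dr + k, dc + k] * v
        out[(r, c)] = total
    return out

# A 1000x1000 "readout" with a short track: occupancy far below 1%.
track = {(500, 500 + i): 1.0 for i in range(5)}
smoothing = np.full((3, 3), 1.0 / 9.0)
result = submanifold_conv(track, smoothing)
# Work scales with the 5 active pixels, not the 10^6 total pixels.
```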
4. Opportunities going forward
The use of machine learning and, more recently, deep learning algorithms for the analysis of neutrino experiment data is on the rise. We have presented an overview of the impact of these techniques on the field through a description of the challenges and opportunities associated with their usage.
Opportunity 1 — Impact to Physics and Technology
The application of machine learning tools to neutrino physics is also relevant to the process of experiment design and proposal, which brings about opportunities to further impact the capabilities of future experiments. The next generation of neutrino experiments will introduce needs and challenges beyond what the field has encountered. Massive detectors designed to measure neutrino oscillations will redefine the challenges of data rates and data management and will continue to look for ways to expand their physics programs.51,52 Neutrino-less double beta decay experiments at and beyond the ton scale will require exceptional rejection of radioactive backgrounds, beyond what has ever been achieved. The emerging field of multi-messenger astronomy will further encourage experiments to expand their sensitivity to signals beyond their current reach.

Much like previous generations, this generation of experiments will only be possible by pushing technological frontiers. This presents opportunities for the fields of particle physics and machine learning, which could cement the synergy between the two in mutually beneficial ways.

An interesting example of research and development (R&D) involving machine learning is its application to the hardware trigger being developed for the DUNE experiment. The large data rates expected in DUNE detectors currently constrain the energy range available for analysis. Figure 11 shows a single DUNE data frame. The majority of the electronics noise, as well as the radioactive backgrounds, lie safely below the energies of the accelerator neutrinos DUNE is designed to study. However, there are also interesting signals in the MeV-scale energy range which could potentially be studied, such as supernova neutrinos, solar neutrinos, and neutrino-less double beta decay. Unfortunately, it is possible that the currently available hardware for data acquisition systems will require the elimination of much of the low-energy noise from DUNE's data stream at the trigger level in order to maintain manageable data rates. However, if the physics reach of DUNE could be extended to study low-energy signals, it could produce world-leading measurements of solar neutrino oscillations.

In order to enable a DUNE low-energy program, data acquisition hardware will need to sustain high data rates with low downtime. Research into the applications of deep learning to hardware triggers and data acquisition for DUNE is ongoing to resolve this issue.
Hardware acceleration, as well as optimal implementation of deep learning algorithms on FPGAs and GPUs, is being explored. This work is exploring the possibility of online data analysis capable of processing up to tens of terabits per second, aided by the capability of CNNs to tackle high-rate image processing. The capabilities of this trigger may well define whether low-energy signals will be available to explore at DUNE. Thus, the usage of machine learning algorithms might significantly contribute not just to improving the performance of existing analyses, but to expanding the physics program that is available to experiments. While there is community consensus on some of the challenges machine learning will need
to address going forward, we are only starting to recognize that machine learning development is an integral part of neutrino physics research. The continued active pursuit of R&D involving machine learning applications might significantly change the neutrino physics landscape in the coming decades.

Fig. 11. Left: A high-energy atmospheric neutrino interaction in the DUNE LArTPC. Right: A low-energy supernova neutrino interaction in the DUNE LArTPC.
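A caricature of the triggering problem makes the data-rate argument concrete: keep only small windows around above-threshold samples and discard the rest. The threshold, window, and waveform below are invented for illustration and bear no relation to DUNE's actual trigger design:

```python
import numpy as np

rng = np.random.default_rng(3)

ADC_THRESHOLD = 5.0   # illustrative cut, not a real detector parameter
WINDOW = 4            # samples kept on either side of each hit

def trigger(waveform):
    """Keep only windows around samples above threshold; everything
    else is dropped to tame the data rate."""
    hits = np.flatnonzero(waveform > ADC_THRESHOLD)
    keep = np.zeros(waveform.size, dtype=bool)
    for h in hits:
        keep[max(0, h - WINDOW):h + WINDOW + 1] = True
    return keep

# Low-amplitude noise with one large pulse buried in it.
wave = rng.normal(0.0, 1.0, size=10_000)
wave[6000:6003] += 50.0
mask = trigger(wave)
reduction = wave.size / mask.sum()   # data-volume reduction factor
```

The tension the text describes is visible even here: raising the threshold shrinks the data volume but silently discards any physics living below it.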
Opportunity 2 — What Physics Can Contribute to Machine Learning
The unique nature of the problem set and the analysis strategies of neutrino physics (and particle physics) experiments brings the potential to contribute new knowledge and applications to the field of computer science. Two aspects drive this opportunity:

1. Quantitative results and careful statistical analysis. Statistical precision is one of the hallmarks of particle physics experiments. Carefully quantifying results and uncertainties becomes even more important as neutrino experiments move into the precision era. As we develop tools and techniques to address the challenges of bias assessment and uncertainty quantification for our own needs, these developments will surely inform the broader picture of secure, ethical, and responsible treatment of machine learning beyond scientific applications.57,58
2. Customizable simulated datasets corresponding to real physical data. The majority of industry applications of machine learning are developed, tested, and applied on real-world datasets. Training usually employs labeled data of the same type as that to which the network will be applied. In contrast, neutrino experiments usually construct and train most of their analysis infrastructure on simulated data that resembles the expected data. The detail to which these simulations are tunable is especially relevant to the study of machine learning algorithms. It provides the opportunity to study their behavior under controlled modifications of the training samples, which could greatly contribute to the challenge of explainability in and
outside the field.

Fig. 12. One view of a neutral current event with a π decay in the NOvA detector. Left: just one of the particles produced by the decay. Right: the same particle with the context from the rest of the event (grey) shown. The knowledge from the context, such as the particle's separation from the vertex, makes it clear that this is a photon shower.

Opportunity 3 — Innovations
There are also opportunities in the areas of overlap between the problem sets of neutrino physics and machine learning. It is no surprise that we are starting to develop machine-learning-inspired tools which can be applicable outside neutrino physics.

For example, applications of machine learning to NOvA detector data have been further explored, specifically targeting single particle identification within clusters of particles. As shown in Figure 12, each cluster of energy depositions in an interaction needs to be further analyzed to identify its producer. In this case, knowledge of the single-particle cluster is useful, but there is much to be gained from providing some context to the classification network. In a recent publication, the authors demonstrate a technique to add context information to a CNN input and show how to implement the Siamese concept to take advantage of a "particle-only" as well as a "context" view of the inputs. This technique is the first to employ a Siamese architecture for the addition of context. As such, it is a contribution to both fields. In the neutrino physics application, it was found that adding context to the inputs improved the identification efficiency of particles by up to 11%.

Opportunities for synergy between neutrino physics and machine learning are plentiful. A deeper appreciation for the complexity and overlap of their respective problem sets may continue to yield enhanced advances for both fields.
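The core of a Siamese design is a single set of weights applied to both input views before their features are combined. A minimal numpy sketch (the view sizes, the shared layer, and the construction of the "context" view are illustrative assumptions, not the published architecture):

```python
import numpy as np

rng = np.random.default_rng(4)

# ONE set of weights shared by both branches: the defining feature
# of a Siamese architecture (sizes here are illustrative).
W_shared = rng.normal(size=(8, 36)) / 6.0

def branch(view):
    """Shared-weight feature extractor applied to one input view."""
    return np.maximum(0.0, W_shared @ view.ravel())   # ReLU features

particle_only = rng.normal(size=(6, 6))               # the cluster by itself
# Same cluster with surrounding event activity added (a toy "context" view).
with_context = particle_only + rng.normal(scale=0.1, size=(6, 6))

# Features from both views are concatenated before classification,
# so the downstream classifier sees the particle AND its context.
combined = np.concatenate([branch(particle_only), branch(with_context)])
```

Because the two branches share `W_shared`, features extracted from the two views live in the same space and remain directly comparable, which is the design rationale for Siamese networks.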
5. Acknowledgements
The authors thank Justin Vasel for his helpful review of this manuscript. The authors would like to acknowledge Georgia Karagiorgi for productive conversations about the exciting directions of R&D for accelerated hardware, as well as Taritree Wongjirad, Kazuhiro Terao, and Tingjun Yang for their useful insight into the challenges of LArTPC applications. Fermilab is operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

References
1. S. Ritz et al. Building for Discovery: Strategic Plan for U.S. Particle Physics in the Global Context. U.S. Department of Energy and National Science Foundation, https://science.energy.gov/~/media/hep/hepap/pdf/May-2014/FINAL_P5_Report_Interactive_060214.pdf, 2014.
2. APPEC. European Astroparticle Physics Strategy 2017-2026. 2017.
3. S. Pascoli et al. Connecting low energy leptonic CP-violation to leptogenesis. Phys. Rev., D75:083511, 2007.
4. M. E. Berbenni Bitsch and A. Vancura. Neutrino oscillations and the solar neutrino problem. European Journal of Physics, 10(4):243-253, Oct 1989.
5. Y. Fukuda et al. Evidence for oscillation of atmospheric neutrinos. Phys. Rev. Lett., 81:1562-1567, 1998.
6. B. Aharmim et al. Electron energy spectra, fluxes, and day-night asymmetries of B-8 solar neutrinos from measurements with NaCl dissolved in the heavy-water detector at the Sudbury Neutrino Observatory. Phys. Rev., C72:055502, 2005.
7. Chigozie Nwankpa, Winifred Ijomah, Anthony Gachagan, and Stephen Marshall. Activation functions: Comparison of trends in practice and research for deep learning, 2018.
8. Y. Guo et al. Deep learning for visual understanding: A review. Neurocomputing, 187, Nov 2015.
9. Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, Jan 2015.
10. Keiron O'Shea and Ryan Nash. An introduction to convolutional neural networks, 2015.
11. Md Zahangir Alom et al. The history began from AlexNet: A comprehensive survey on deep learning approaches, 2018.
12. Leif Lonnblad, Carsten Peterson, and Thorsteinn Rognvaldsson. Finding gluon jets with a neural trigger. Phys. Rev. Lett., 65:1321-1324, 1990.
13. S. Brice. Results of a neural network statistical event class analysis.
14. R. Brun and F. Rademakers. ROOT: An object oriented data analysis framework. Nucl. Instrum. Meth. A, 389:81-86, 1997.
15. C. Szegedy et al. Going deeper with convolutions, 2014.
16. O. Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211-252, 2015.
17. J. Renner et al. Background rejection in NEXT using deep neural networks. Journal of Instrumentation, 12(01):T01004, Jan 2017.
18. V. Alvarez et al. NEXT-100 Technical Design Report (TDR): Executive Summary. JINST, 7:T06001, 2012.
19. A. Aurisano et al. A convolutional neural network neutrino event classifier. Journal of Instrumentation, 11(09):P09001, Sep 2016.
20. Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. Signature verification using a "siamese" time delay neural network. In Proceedings of the 6th International Conference on Neural Information Processing Systems, NIPS'93, pages 737-744, San Francisco, CA, USA, 1993. Morgan Kaufmann Publishers Inc.
21. P. Adamson et al. Constraints on oscillation parameters from νe appearance and νµ disappearance in NOvA. Phys. Rev. Lett., 118(23):231801, 2017.
22. A. Gando et al. Search for Majorana neutrinos near the inverted mass hierarchy region with KamLAND-Zen. Phys. Rev. Lett., 117(8):082503, 2016. [Addendum: Phys. Rev. Lett. 117, 109903 (2016)].
23. Zhenghao Fu. Detection of cosmic muon spallation background in LS-detector using machine learning. 29th International Conference on Neutrino Physics and Astrophysics, 2020.
24. A. Li, A. Elagin, S. Fraker, C. Grant, and L. Winslow. Suppression of cosmic muon spallation backgrounds in liquid scintillator detectors using convolutional neural networks. Nucl. Instrum. Meth. A, 947:162604, Dec 2019.
25. Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications, 2018.
26. N. Choma et al. Graph neural networks for IceCube signal classification, 2018.
27. N. Mehrabi et al. A survey on bias and fairness in machine learning, 2019.
28. The White House Office of Science and Technology. American Artificial Intelligence Initiative: Year one report. 2020.
29. Nicol Turner Lee. Detecting racial bias in algorithms and machine learning. J. Inf. Commun. Ethics Soc., 16:252-260, 2018.
30. S. Delaquis et al. Deep neural networks for energy and position reconstruction in EXO-200. Journal of Instrumentation, 13(08):P08023, Aug 2018.
31. G. Anton et al. Search for neutrinoless double-β decay with the complete EXO-200 dataset. Phys. Rev. Lett., 123:161802, Oct 2019.
32. G. Perdue et al. Reducing model bias in a deep learning classifier using domain adversarial neural networks in the MINERvA experiment. Journal of Instrumentation, 13(11):P11020, Nov 2018.
33. Y. Ganin et al. Domain-adversarial training of neural networks, 2015.
34. Kanika Sachdev. Muon Neutrino to Electron Neutrino Oscillation in NOνA. PhD thesis, Minnesota U., 2015.
35. Robert Geirhos, David H. J. Janssen, Heiko H. Schütt, Jonas Rauber, Matthias Bethge, and Felix A. Wichmann. Comparing deep neural networks against humans: object recognition when the signal gets weaker, 2017.
36. Bonnie Fleming. The MicroBooNE Technical Design Report. 2012.
37. C. Adams et al. Deep neural network for pixel-level electromagnetic particle identification in the MicroBooNE liquid argon time projection chamber. Physical Review D, 99(9), May 2019.
38. Tom Charnock, Laurence Perreault-Levasseur, and François Lanusse. Bayesian neural networks, 2020.
39. Aashwin Mishra. Uncertainty estimation for deep learning in neutrino physics. Neutrino Physics and Machine Learning, 2020.
40. Jun Cao and Kam-Biu Luk. An overview of the Daya Bay reactor neutrino experiment. Nuclear Physics B, 908:62-73, Jul 2016.
41. E. Racah et al. Revealing fundamental physics from the Daya Bay neutrino experiment using deep neural networks. 2016.
42. L. J. P. van der Maaten and G. E. Hinton. Visualizing high-dimensional data using t-SNE. 2008.
43. I. T. Jolliffe. Principal Component Analysis and Factor Analysis, pages 115-128. Springer New York, New York, NY, 1986.
44. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps, 2013.
45. Babak Abi et al. Deep Underground Neutrino Experiment (DUNE), Far Detector Technical Design Report, Volume I: Introduction to DUNE. 2020.
46. B. Abi et al. Neutrino interaction classification with a convolutional neural network in the DUNE far detector, 2020.
47. Pedro A. N. Machado, Ornella Palamara, and David W. Schmitz. The short-baseline neutrino program at Fermilab. Annual Review of Nuclear and Particle Science, 69(1):363-387, Oct 2019.
48. Laura Dominé and Kazuhiro Terao. Scalable deep convolutional neural networks for sparse, locally dense liquid argon time projection chamber data. Physical Review D, 102(1), Jul 2020.
49. J. Deng et al. ImageNet: A large-scale hierarchical image database. In CVPR09, 2009.
50. Deep Learn Physics public dataset. http://deeplearnphysics.org/DataChallenge/. Accessed: 2020-07-29.
51. Babak Abi et al. Deep Underground Neutrino Experiment (DUNE), Far Detector Technical Design Report, Volume I: Introduction to DUNE. 2020.
52. K. Abe et al. Hyper-Kamiokande Design Report. 2018.
53. Francesco Capozzi, Shirley Weishi Li, Guanying Zhu, and John F. Beacom. DUNE as the next-generation solar neutrino experiment. Physical Review Letters, 123(13), Sep 2019.
54. J. Zennamo et al. DUNE-beta: Can we expand DUNE's physics program to search for neutrino-less double beta decay? The XXIX International Conference on Neutrino Physics and Astrophysics, 2020.
55. Y. Jwa, G. Di Guglielmo, L. P. Carloni, and G. Karagiorgi. Accelerating deep neural networks for real-time data selection for high-resolution imaging particle detectors. In , pages 1-10, 2019.
56. Kim Albertsson et al. Machine learning in high energy physics community white paper. J. Phys. Conf. Ser., 1085(2):022008, 2018.
57. Anna Jobin, Marcello Ienca, and Effy Vayena. The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9):389-399, 2019.
58. B. Mittelstadt et al. The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2), 2016.
59. F. Psihas et al. Context-enriched identification of particles with a convolutional network for neutrino events.