Computational Techniques for the Analysis of Small Signals in High-Statistics Neutrino Oscillation Experiments
IceCube Collaboration, M. G. Aartsen, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, M. Ahrens, I. Al Samarai, D. Altmann, K. Andeen, T. Anderson, I. Ansseau, G. Anton, C. Argüelles, T. C. Arlen, J. Auffenberg, S. Axani, H. Bagherpour, X. Bai, A. Balagopal V., J. P. Barron, I. Bartos, S. W. Barwick, V. Baum, R. Bay, J. J. Beatty, J. Becker Tjus, K.-H. Becker, S. BenZvi, D. Berley, E. Bernardini, D. Z. Besson, G. Binder, D. Bindig, E. Blaufuss, S. Blot, C. Bohm, M. Bohmer, M. Börner, F. Bos, S. Böser, O. Botner, E. Bourbeau, J. Bourbeau, F. Bradascio, J. Braun, M. Brenzke, H.-P. Bretz, S. Bron, J. Brostean-Kaiser, A. Burgman, R. S. Busse, T. Carver, E. Cheung, D. Chirkin, A. Christov, K. Clark, L. Classen, G. H. Collin, J. M. Conrad, P. Coppin, P. Correa, D. F. Cowen, R. Cross, P. Dave, M. Day, J. P. A. M. de André, C. De Clercq, J. J. DeLaunay, H. Dembinski, S. De Ridder, P. Desiati, K. D. de Vries, G. de Wasseige, M. de With, T. DeYoung, J. C. Díaz-Vélez, V. di Lorenzo, H. Dujmovic, J. P. Dumm, M. Dunkman, M. A. DuVernois, E. Dvorak, B. Eberhardt, T. Ehrhardt, B. Eichmann, P. Eller, R. Engel, J. J. Evans, P. A. Evenson, S. Fahey, A. R. Fazely, J. Felde, K. Filimonov, C. Finley, S. Flis, A. Franckowiak, E. Friedman, A. Fritz, T. K. Gaisser, et al. (272 additional authors not shown)
Abstract
The current and upcoming generation of Very Large Volume Neutrino Telescopes—collecting unprecedented quantities of neutrino events—can be used to explore subtle effects in oscillation physics, such as (but not restricted to) the neutrino mass ordering. The sensitivity of an experiment to these effects can be estimated from Monte Carlo simulations. With the high number of events that will be collected, there is a trade-off between the computational expense of running such simulations and the inherent statistical uncertainty in the determined values. In such a scenario, it becomes impractical to produce and use adequately-sized sets of simulated events with traditional methods, such as Monte Carlo weighting. In this work we present a staged approach to the generation of binned event distributions in order to overcome these challenges. By combining multiple integration and smoothing techniques that address the limited statistics of the simulation, this approach arrives at reliable analysis results using modest computational resources.
Keywords:
Data Analysis, Monte Carlo, MC, Statistics, Smoothing, KDE, Neutrino, Neutrino Mass Ordering, Detector, VLVνT
1. Introduction
By virtue of their multi-megaton effective mass paired with the magnitude of the atmospheric neutrino flux, the next generation of Very Large Volume Neutrino Telescopes (VLVνTs) dedicated to neutrino oscillation physics, such as the IceCube Upgrade, PINGU, and ORCA [1, 2, 3, 4], will record tens of thousands of GeV-scale neutrino interactions. These large-scale water or ice Cherenkov detectors do not have the ability to unambiguously distinguish between neutrino flavors and interaction types on an event-by-event basis. Even so, their high-statistics data samples can be used to explore effects that are small compared to the background, such as the tau neutrino appearance rate, the ordering of the neutrino mass eigenstates (NMO), or potential neutrino physics beyond the Standard Model.

All such physics analyses are carried out by comparing the observed event distributions with predictions (hereafter referred to as templates) obtained from Monte Carlo (MC) simulations. The physical phenomena listed above will appear as statistical (in)compatibilities of templates with differences in event counts as small as a few percent. An inherent problem when trying to quantify these deviations in high-statistics data sets is that the templates must be described with an accuracy better than the magnitude of the effect being investigated. A limiting factor to the accuracy is the amount of MC simulation available, which is in turn constrained by the availability of computing resources. This particularly applies during the design optimization phase of a planned experiment, which entails performance assessments of multiple detector variants.

With an adequate machinery at hand to produce templates, extracting the relevant physical and systematic parameters typically proceeds via maximizing the likelihood of obtaining the observed data under a given hypothesis. A common feature of all statistical methods is that the templates need to be generated for a multitude of parameter combinations, often thousands or even millions. This process needs to be accurate, but also fast, which typically prohibits the reproduction of the full MC sample for each template.
In this article, we present an approach that allows for the fast creation of accurate templates even from MC sets that are several orders of magnitude smaller than those necessary when using simpler methods. An alternative approach that does not remove template inaccuracies but rather mitigates their impact on statistical inference is the inclusion of the inherent MC uncertainty in the fit statistic; recent overviews can be found in [5, 6]. Our approach was used to calculate the expected sensitivities for atmospheric neutrino oscillation analyses with the proposed PINGU experiment [2, 3], and a similar approach was taken in low-energy sensitivity studies for the KM3NeT design [4]. Throughout this article, we will use the NMO analysis for a generic VLVνT as an example to illustrate our methods, though it is applicable in a wider range of atmospheric neutrino oscillation analyses, and, in parts and with limitations, to other experiments. Section 2 details the computational challenge at hand, followed by a brief introduction of the example NMO analysis in Section 3. Our approach to overcome this challenge is presented in Section 4 and Section 5, followed by a discussion of the validity of the approach in Section 6. The performance is compared to other typical analysis methods in Section 7, while the computational burden is discussed in Section 8. Section 9 concludes with a brief summary of the article. Finally, in Appendix A we provide details about the VLVνT toy model that we use to benchmark the performance of all considered analysis approaches.
2. Computational Challenge
The statistical comparison between experimental data and parametric or MC-based predictions allows inference of the values of physics parameters under study. It typically proceeds via a likelihood analysis. We first discuss its most general concepts and variants, then detail the computational requirements on MC generation, and finally outline two standard methods of mitigating these computational burdens.
Different types of likelihood analyses in particle physics share common features. An experiment records data which are used to reconstruct any observables expected to carry the imprint of the physical phenomenon under study. A selection (triggering, filtering, etc.) is applied in order to enhance the sought signal. Before performing statistical inference, we need a theoretical model of the observable distributions to compare to the data. Often this includes complicated processes like particle interactions and detector response that require the use of MC methods. Hence, not only the data, but also the model is subject to statistical fluctuations. However, once an appropriate amount of MC events is available, the data x_i can be compared to templates—theoretical distributions—for different physics parameter values θ via a likelihood function, L(x_1, x_2, ..., x_n | θ) = ∏_i P(x_i | θ), where P(x_i | θ) is the probability to observe the data x_i assuming that θ corresponds to given values of the physics parameters. See, for example, [7] for a more complete overview. If the total number of measurements, n, is also a random quantity, the likelihood function can be extended to include the distribution of n [8]. Statistical inference then typically proceeds via the maximum likelihood estimate (MLE) θ̂, i.e., the parameter values which maximize L.

The methods presented in this paper depend on a likelihood function applied to binned data. One usually employs either a Poisson likelihood or a χ² approximation thereof (see for example [8]); our methods are independent of this choice, but we use the latter in the example presented in this article. Binning the data hides physics signatures smaller than the bin size and thus introduces a loss in sensitivity. This can be mitigated by reducing bin sizes, but smaller bins come at the cost of reduced—and possibly insufficient—MC statistics in each bin.

Apart from the physics parameters of interest, a model often comes with nuisance parameters that are also included in the likelihood function. This further increases the dimensionality of the MLE search, which relies on numerical routines for multidimensional optimization problems. For the NMO studies, we use the L-BFGS-B algorithm [9] in a D = 8 dimensional parameter space (see Table 1). The number of steps necessary for the optimization to converge depends on the particular analysis and model being used (i.e., the details of the resulting likelihood landscape); in the case of our toy example, an average of ∼10² templates (one per realization of θ) were needed to converge.

The problems associated with generating such a large number of templates are exacerbated when estimating the median sensitivity of an experiment. The above process needs to be applied to an ensemble of random toy MC pseudo-experiments of size N_p. (Each pseudo-experiment corresponds to a statistical fluctuation of the expected experimental outcome as predicted by MC events. For certain problems, the generation of pseudo-experiments can be skipped by applying the Asimov approximation [10, 3].) The comparison of test statistic distributions T (see Section 3 for details) can be used to estimate a significance value n_σ at which one hypothesis is preferred over the alternative. If T is Gaussian distributed (while not a prediction from the model, a near-Gaussian distribution of the test statistic is observed in most NMO studies [3, 4, 11]), the uncertainty Δn_σ to which n_σ can be determined depends upon the number of pseudo-experiments N_p as (see Appendix B for details):

Δn_σ = (1/√N_p) · √(1 + n_σ²/2). (1)

With an absolute uncertainty Δn_σ at the 1 % level, determining the sensitivity of an experiment at a confidence level of 99.7 %
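As a numerical illustration of Equation (1), the following minimal Python sketch (our own helper, not part of any analysis framework) inverts the relation to estimate the required ensemble size:

```python
import math

def required_pseudo_experiments(n_sigma, delta_n_sigma):
    """Invert Eq. (1): number of pseudo-experiments N_p needed to pin
    down a significance n_sigma to an absolute uncertainty delta_n_sigma,
    assuming a Gaussian-distributed test statistic T."""
    return math.ceil((1.0 + 0.5 * n_sigma**2) / delta_n_sigma**2)

# Determining a 3 sigma sensitivity to an absolute precision of 0.03
# already calls for several thousand pseudo-experiments:
print(required_pseudo_experiments(3.0, 0.03))  # -> 6112
```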
(n_σ = 3) requires O(10⁴) pseudo-experiments.

Finally, the event count expectations, μ, for all bins in the templates must be determined at the same level as the physics effects being investigated, which requires at least N = 10⁴ MC events per bin to study sub-percent variations arising in a comparison of the two NMO realizations. At the same time, the number of bins used in any histograms must be sufficiently large:
O(10³) bins are required to resolve the distinct features of the NMO signature; otherwise the analysis cannot exploit the full potential of the experiment. Therefore, the brute-force approach to our example case requires a very large number of neutrino events to be simulated: O(10⁷) events for each of the O(10²) values of θ probed during the optimization process for each of O(10⁴) pseudo-experiments—a grand total of O(10¹³) events. Even if the time to simulate and reconstruct a single event is 1 s (a very optimistic estimate for our experiment), full fits to all pseudo-experiments under the two ordering hypotheses would require O(10⁹) CPU-core-hours—i.e., a single analysis would keep 10⁴ CPU cores busy for 30 years, assuming the algorithm can be parallelized perfectly—a restriction clearly prohibitive to performing any study.

Various state-of-the-art methods are employed to mitigate the high computational costs. In the remainder of this section, we briefly present the main ideas behind these methods and give a conceptual introduction to how they are embedded in the approach we introduce in this article.

The standard event-by-event MC weighting technique avoids repeated simulation and reconstruction of events every time a value of a nuisance parameter is changed. This is possible, first, because the physics processes of neutrino production in the atmosphere (flux), their propagation involving flavor oscillation, and their detection and reconstruction are independent. Each of these processes, therefore, can be treated separately.

For a process that has an a priori known parametric form (the parameter values of which are not necessarily known), the outcome of that process can be predicted by directly evaluating the parametrization at a set of input values. In our case, both the neutrino flux prediction and flavor oscillations fall into this category. The second category of processes are those that require MC simulation. Predictions of the detection and reconstruction of neutrinos fall into this category because we do not have a complete characterization of the detector's response.

This leads to the standard event-by-event reweighting scheme, which estimates the expected final-level event counts due to all processes by simulating a set of MC neutrinos (capturing the effects of detection and reconstruction), assigning to each a weight derived from flux and oscillation calculations, and binning the events' weights in some set of observable dimensions, as illustrated in the top row of Figure 1.

In detail: each MC neutrino—generated with a flavor β and a set of true observables θ_ν^true—is assigned a posteriori the weight w_β corresponding to the sum over the atmospheric fluxes Φ_α(θ_flux; θ_ν^true) of all initial flavors α including the probabilities P^osc_α→β(θ_osc; θ_ν^true) to oscillate into a neutrino of the flavor β:

w_β ∝ Σ_α Φ_α(θ_flux; θ_ν^true) × P^osc_α→β(θ_osc; θ_ν^true).
Figure 1: Operating principles of direct histogramming (top row) and direct KDE (bottom row), which both follow the same weighting scheme for MC events but arrive at the template differently, as explained in the text.
In the above, θ_flux and θ_osc are nuisance parameters affecting neutrino fluxes and oscillation probabilities. For a given realization of non-parametric nuisance parameters θ_det affecting the detector response, applying the detector response simulation (including event reconstruction, classification, etc.) to each incident neutrino results in a set of reconstructed observables θ_ν^reco, whose distribution can be compared to real data. In practice, techniques dealing with the discrete nature of detector nuisance parameters may be required. Here, however, we consider only a single realization of the detector parameters (θ_det fixed)—a simplification without any loss of the general applicability of the methods discussed.

Since the process of oscillation is decoupled from the detector simulation, only a single MC set is required to generate the templates for the different hypotheses under test (e.g., the two mass orderings); only the weights w_β must be recomputed. This eliminates statistical fluctuations between the otherwise disjoint MC samples. However, even with a single MC set, an undersampling of the phase space of the model can result in a bias.

Binning the weights in (a relevant subset of) θ_ν^reco corresponds to performing MC integration of the experiment's event distribution. While the convergence rate of this approach does not depend on the dimensionality of the integral, errors of the estimates scale as 1/√N, where N is the number of MC events that fall in a bin.

As it is often infeasible to generate enough MC events to obtain sufficient accuracy in the MC integration process, smoothing of the final event distributions is a common practice. This, however, can be computationally slow and can introduce artificial features which may incorrectly reduce or enhance the signal. One such smoothing technique is kernel density estimation (KDE) [12]. Specifically, we apply adaptive-bandwidth KDE directly to the weighted MC to compare a state-of-the-art version of this method to the methods we introduce in this paper in Section 4. Here, a Gaussian kernel with a width calculated as described in [13] is centered at each MC event's reconstruction information. A weighted sum over the kernels of all events then delivers the smoothed distribution as shown in the bottom row of Figure 1, which will be compared to the distribution our method yields.

Shortcomings of the direct application of the two techniques discussed above—the first is the weighting method alone (labeled direct histogramming), while the second applies additional smoothing using adaptive kernel density estimates (labeled direct KDE)—can be overcome using the staged approach. Before providing an overview of the staged approach in Section 4, we briefly introduce the key points of the example NMO analysis used to illustrate the benefits of the approach with respect to the standard techniques.
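To make the direct histogramming scheme of Figure 1 concrete, the following sketch bins per-event weights computed from flux and oscillation functions; the names and signatures are hypothetical stand-ins chosen for illustration, and production code would handle all flavor channels and nuisance parameters:

```python
import numpy as np

def direct_histogram(events, flux, p_osc, theta_flux, theta_osc, bin_edges):
    """Weight each MC event by the sum over initial flavors of
    (atmospheric flux) x (oscillation probability), then bin the
    weights in the reconstructed observables (cf. Figure 1, top row).

    events: dict of per-event float arrays ('e_true', 'cosz_true',
            'e_reco', 'cosz_reco', 'flavor') plus a generation
            weight 'w_gen'; flux and p_osc are assumed callables.
    """
    w = np.zeros_like(events['e_true'])
    for alpha in ('nu_e', 'nu_mu'):  # atmospheric nu_tau flux is negligible
        w += (flux(alpha, theta_flux, events['e_true'], events['cosz_true'])
              * p_osc(alpha, events['flavor'], theta_osc,
                      events['e_true'], events['cosz_true']))
    w *= events['w_gen']
    template, _ = np.histogramdd(
        np.column_stack([events['e_reco'], events['cosz_reco']]),
        bins=bin_edges, weights=w)
    return template
```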
3. NMO Analysis
The observation of neutrino oscillations and the demonstration of the neutrinos' non-zero masses [14, 15] represented a major step forward in the field of particle physics. While current experimental techniques have not yet allowed for a direct measurement of the tiny masses, the magnitudes of their relative differences (mass splittings) are well known.

The ordering of these neutrino mass states (neutrino mass ordering, NMO) presents a difficult challenge. A powerful method to determine this ordering is the observation of matter effects on neutrinos. Owing to the high electron density of the Sun, observations of solar neutrinos have shown the second mass state to be heavier than the first [16]. It remains an open question, however, whether the third state is the most or least massive. The former scenario is referred to as the normal ordering (NO), while the second is called inverted ordering (IO). There is currently no experimental evidence decisively excluding either of the two scenarios [17, 18, 19, 20].

The study of oscillations of atmospheric neutrinos provides a promising route toward a decisive measurement of the NMO [21, 2, 3, 4]. The path length (or baseline) varies between 20 km for vertically downward going and 12 700 km for straight upward going atmospheric neutrinos, with the latter crossing the full diameter of the Earth. With energies ranging from the MeV up to the TeV scale, combinations of baselines and energies varying over several orders of magnitude are probed. For the longest baseline, the very pronounced first oscillation maximum of muon neutrinos occurs at a neutrino energy of around 25 GeV, followed by subsequent maxima at lower energies.

The electron neutrinos' coupling to electrons (coherent forward scattering) in the Earth creates an effective matter potential which leads to resonant behavior of the transition probabilities at energies around 5 GeV, known as matter resonances [22, 23, 24]. Furthermore, the Earth's specific density profile encountered by the neutrinos can also parametrically enhance their oscillations [25]. This enhancement with respect to oscillations proceeding in vacuum occurs for neutrinos if the NMO is normal, otherwise for anti-neutrinos.

The NMO measurement potential of VLVνTs is based on this asymmetry. Two major aspects are obstructive, however. The first is the inability of VLVνTs to differentiate between neutrinos and anti-neutrinos. This reduces the effect to the respective difference in atmospheric fluxes and interaction cross sections. Energy and directional resolutions of the experiment present the second hurdle. Both are typically prohibitive to resolving the fast variations of the oscillation pattern at the relevant energies. As a consequence, the observable effect is reduced to at most a few percent over the relevant energy and zenith range (see Figure 2), requiring neutrino telescopes with effective masses on the order of megatons to achieve sufficient event statistics.

Proponents of various VLVνTs in ice and water have performed studies confirming this idea, finding that a 3σ (median) sensitivity to the NMO can be achieved within five years of exposure time even in less favorable regions of the neutrino oscillation parameter space [2, 4, 26].
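The baselines quoted above follow directly from the geometry of the Earth; a short sketch (Earth radius and production height are rounded values assumed here for illustration):

```python
import numpy as np

R_EARTH = 6371.0  # km, mean Earth radius (assumed round number)
H_ATM = 20.0      # km, assumed atmospheric production height

def baseline_km(cos_zenith):
    """Path length from the production point in the atmosphere to a
    detector at the Earth's surface, for a given cosine of the zenith
    angle (+1: vertically downward going, -1: straight upward going)."""
    c = np.asarray(cos_zenith, dtype=float)
    r = R_EARTH + H_ATM
    return np.sqrt(r**2 - R_EARTH**2 * (1.0 - c**2)) - R_EARTH * c

print(baseline_km([1.0, -1.0]))  # ~[20., 12762.] km
```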
Figure 2: Expected fractional event rate difference between nominal NO and IO inputs (from Table 1) for the toy model. Cascades are shown on the left, tracks on the right.

As the oscillation probabilities directly depend on neutrino energy E_true, oscillation baseline (∝ cos ϑ_true), and flavor, we split our data into bins of log E_reco, cos ϑ_reco, and event class. (The subscript "true" is used to specify the true variables of the neutrinos and to distinguish these from the reconstructed variables, denoted with a subscript "reco", which will be introduced in Section 4.1.) It is important to choose a binning fine enough to resolve the NMO signature, while coarse enough to retain a sufficient amount of MC statistics per bin, as motivated in Section 2. We have found the division into (40 × 40 × 2) bins to be suitable, covering a range of E_reco from 1 GeV to 80 GeV, the whole sky (cos ϑ_reco from −1 to 1), and the two event classes cascades and tracks. Using this binning, for our toy detector introduced in Appendix A, Figure 2 shows the expected fractional event rate difference (R_NO − R_IO)/R_NO, where R_NO(IO) is the expected event rate for true NO (IO), based on the two sets of nominal model parameter values given in Table 1.

As the most powerful test statistic for distinguishing two simple hypotheses [27], the logarithm of the likelihood ratio

T = −2 log [ max_{θ∈NO} L(n | μ(θ)) / max_{θ∈IO} L(n | μ(θ)) ] (2)

is also useful in assessing the ability of an experiment to discriminate between the two (composite) NMO hypotheses at a given confidence level. It is representative of the degree at which observing the data n under the NO hypothesis is favored over observing it under the alternate IO hypothesis. The observed spectrum at the detector, n, however, is a convolution of the atmospheric neutrino flux, the effects of neutrino oscillations that bear the NMO signature, the neutrino interaction and detection processes, and the event reconstruction and classification procedure. Each one of these effects is accompanied by systematic uncertainties. As their impact on the predicted spectrum μ is modeled, the systematic uncertainties directly feed in to the likelihood L of the observation.

Figure 3: Example distributions of Equation (3). The distribution on the left (solid line) represents the case of NO pseudo-data, while the distribution on the right (dashed) is obtained when the pseudo-data is taken from the IO. Here, 1 − p corresponds to the confidence level at which the IO is correctly rejected with a probability of 50 %.

For this study, we limit ourselves to a simplified treatment using χ² statistics and the Asimov dataset. In this approach, the projected median sensitivity is calculated from the average experimental outcomes under the two possible NMO hypotheses, as opposed to performing extensive ensemble tests with randomly fluctuated pseudo-experiments. The log-likelihood expression is a simple χ², and Equation (2) can be rewritten as the difference

Δχ² = χ²_NO − χ²_IO. (3)

Here, χ²_NO is the minimum χ² between model predictions and data, with all nuisance parameters profiled out using NO priors (χ²_IO follows analogously).

An illustration of example distributions of (3) for the two different NMO hypotheses is shown in Figure 3. The goal is to obtain a p-value p which quantifies the statistical compatibility between the hypothesis that is tested and the one assumed to be true. In the ensemble approach, the two distributions would need to be built up by fitting pseudo-experiments. In the Asimov approach, however, certain assumptions about the distribution of (3) allow adopting the expression √|Δχ²| as a sensitivity proxy [11], determining the significance at which the wrong ordering can be excluded without the need for pseudo-experiments.

For the profiling of the nuisance parameters (any free model parameters), a numerical algorithm minimizes the χ² metric. Whenever external constraints are applied to such parameters, we add those to the χ² value as penalty terms (priors).
While the presence of these penalty terms is meant to illustrate a typical approach to problems of this sort, their sizes do not follow any precise physical motivation. Table 1 gives an overview of all used model parameters, their nominal values for NO and IO, and priors (where applied).

Parameter            Nominal value (NO)   Nominal value (IO)   Prior
ν_e/ν_μ flux ratio   1.0                  1.0                  Gaussian
ν/ν̄ flux ratio       1.0                  1.0                  Gaussian
θ_13 (°)             8.5                  8.5                  Gaussian
θ_23 (°)             42.3                 49.5                 non-Gaussian [28, 29]
Δm²_31 (eV²)         0.00246              −0.00237             Gaussian [28, 29]

Table 1: Summary of model parameters in the example NMO analysis, including their nominal values for the two NMO hypotheses and the ±1σ bounds used as external constraints (priors). The flux parameters are applied to atmospheric neutrino flux predictions from [30], following the procedure laid out in Section 5.1. The values of the oscillation parameters are based on a recent global fit [28, 29].
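A compact sketch of the fit metric just described—a binned χ² with Gaussian penalty terms, profiled under each ordering—might look as follows. The function and parameter names (template_fn, priors, x0) are ours and purely illustrative; the actual analysis minimizes over the full parameter set of Table 1 using the staged template machinery of Sections 4 and 5:

```python
import numpy as np
from scipy.optimize import minimize

def chi2_with_priors(params, data, template_fn, priors):
    """Binned chi^2 between data and the template mu(theta), plus
    Gaussian penalty terms for externally constrained parameters.
    priors: {parameter index: (mean, sigma)} for constrained entries."""
    mu = template_fn(params)
    value = np.sum((data - mu) ** 2 / mu)
    value += sum(((params[i] - m) / s) ** 2 for i, (m, s) in priors.items())
    return value

def asimov_sensitivity(data, template_no, template_io, x0, priors):
    # Profile the nuisance parameters under each ordering hypothesis
    # and use sqrt(|delta chi^2|) as the sensitivity proxy of Eq. (3).
    fit_no = minimize(chi2_with_priors, x0,
                      args=(data, template_no, priors), method='L-BFGS-B')
    fit_io = minimize(chi2_with_priors, x0,
                      args=(data, template_io, priors), method='L-BFGS-B')
    return np.sqrt(abs(fit_no.fun - fit_io.fun))
```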
4. Overview of the Staged Approach
The method to obtain templates we describe in this article is divided into four independent parts, referred to as stages. The four stages (flux, oscillation, detection, and reconstruction) and how they are used to obtain event templates are summarized in this section, while more technical descriptions of each stage follow in Section 5.
Templates for our example case of an NMO analysis using a VLVνT are produced efficiently and accurately using the following four stages. Each stage represents a collection of related physical effects. Beginning with the flux computed by the initial stage, each subsequent stage applies a transformation to the output of the previous stage.

1. Flux

The expected unoscillated atmospheric neutrino fluxes are taken from an external model [30]. Flux values from this model are provided in the form of tables with discrete steps in both neutrino energy, E_true, and direction, here the cosine of the zenith angle, cos ϑ_true. Therefore, an interpolation must be performed for values between those tabulated. Crucially, these tables give the integrated flux across the bins, which does not necessarily coincide with the flux value at the bin center. Accordingly, we use an integral-preserving (IP) interpolation. In general, atmospheric flux models require external inputs including primary cosmic ray measurements, atmospheric density models, and hadronic interaction measurements. Many associated uncertainties are known [31, 32] and need to be included as nuisance parameters in an analysis.

Figure 4: Illustration of the staged approach for obtaining event templates, here for simplicity using a characterization in one dimension (energy) only. Steps 1, 2, and 3 are in true energy (E_true); the product of these yields the expected event distribution (lower left). Smearing this spectrum with energy-dependent energy resolution functions (step 4) gives the reconstructed event rate spectrum (lower right). Note that the dotted green line in step 2 shows a hypothetical change of oscillation parameters, affecting only stage 2. Smoothing can now directly be applied to the distributions in steps 3 and 4, instead of the fully weighted MC as in the direct KDE method.

2. Oscillation
Flavor oscillations of neutrinos traversing the Earth modify the flavor content of the original flux in a manner that depends on the energies and path lengths (derived from the direction) of the neutrinos. Additional intrinsic neutrino properties determine the standard flavor oscillation probabilities: three mixing angles and two independent mass-squared splittings, as well as a possible non-zero CP-violating phase. In addition, matter effects induce modifications in the flavor transition probabilities compared to vacuum [23, 22, 33], which makes up the basis of the NMO measurement capability of VLVνTs. In [33], the authors present an analytical expression for the neutrino flavor transition amplitude in a layer of uniform-density matter, which in turn was later implemented in, for example, the
Prob3++ software [34]. Here, the Earth density profile [35] is approximated by a finite number of homogeneous layers and the total transition amplitude is represented by a matrix product of the amplitudes in the individual layers. The main challenge for this stage, which in contrast to the other stages does not require any MC simulation, is to keep the burden of these computationally expensive calculations to a minimum, while retaining sufficient accuracy in the modeling of the neutrinos' propagation.
3. Detection

The number of observed events is determined by the (oscillated) flux as well as a quantity known as the effective area (alternatively, the effective mass). (In contrast, high energy physics experiments often calculate an acceptance instead, which is also based on simulation.) This incorporates the probability that a given neutrino interacts within the detector, is detected, and passes the given data selection criteria. We obtain the eight effective areas (ν_e,μ,τ and ν̄_e,μ,τ charged current (CC) and ν and ν̄ neutral current (NC) interactions) from simulated MC events that are run through the same selection criteria as the real data. In general, each of these effective areas will depend on the energy and arrival direction of the neutrinos. Depending on the detector geometry, certain symmetries can be exploited to reduce the number of parameters on which the effective areas depend. Here we assume azimuthal symmetry and therefore only extract effective areas as a function of E_true and cos ϑ_true.

Figure 5: Flow of neutrino flavors and interaction types through the stages, here shown for neutrinos only (with an analogue counterpart for anti-neutrinos). Neutral current events of all flavors are indistinguishable and can therefore be conveniently added together. The reconstruction stage not only maps from (E_true × cos ϑ_true)-space to (E_reco × cos ϑ_reco)-space, but also classifies the events into the cascade and track categories, indicated by the orange and blue color, respectively.

4. Reconstruction
The process referred to as reconstruction translates the raw signals recorded by a detector into estimates of the physical properties of events. Uncertainties in these estimates manifest as statistical fluctuations, with respect to the true properties, which can be described by probability density functions we refer to as resolution functions. We estimate the resolution functions from the same MC events as used in the detection stage, for which we know the true energy, zenith angle, and interaction type on an event-by-event level. The reconstruction stage uses these estimated resolution functions to build smearing kernels (ensembles of resolution functions) that map the event rates from the space of true variables into the space of reconstructed observables. Additionally—since most VLVνTs cannot exactly distinguish the different neutrino flavors and interaction types—the events are classified by their signature in the detector. Here, event classes are tracks and cascades, based on whether the event seems to contain the expected signature of a starting muon track. This process will separate ν_μ CC and ν̄_μ CC interactions from all others, albeit with limited efficiency and purity. For the example NMO analysis, three observables are needed: the primary neutrino's reconstructed energy (E_reco), zenith angle (ϑ_reco), and event classification.

Note that there is no universal prescription for identifying the set of stages appropriate for any given physics analysis or detector. Instead, stages are chosen to exploit valid simplifications for the task at hand. For example, atmospheric neutrino flux and oscillation calculations depend on readily available tabulated spectra and analytic formulae, respectively. Cosmic ray observatories or high energy particle colliders, by contrast, might require complex stages to describe particle showers, which in turn might depend on high-dimensional, analysis-specific tables. Any physics scenario resulting in multi-particle final states adds further complexity.

In essence, the specific problem and analysis at hand determine to which extent MC sampling is necessary and whether the staged approach is applicable. If the latter is indeed the case, care must be taken concerning the choice of appropriate stages and their specific implementations. In the remainder of this article, we study in detail the staged approach we have found particularly effective for an NMO analysis using a VLVνT.

In order to produce the final-level event templates that are ultimately compared to the data, the four stages are combined as depicted in Figure 4: integration of the product of the first three stages (flux, oscillation probability, and effective area) over E_true and cos ϑ_true yields the event rate in the space of true variables. The event rate in the space of reconstructed observables is then obtained by a convolution of the true event rate with the reconstruction resolution functions. Finally, multiplication by detector exposure time results in an event count, which can be compared directly to observed data or different templates (a schematic code sketch of this combination follows Table 2 below).

Throughout the stages, different combinations of neutrino flavor and interaction type (channels) need to be treated separately, as depicted in Figure 5. Starting with the atmospheric flux, the neutrinos can undergo flavor change via oscillation. Since ν_τ production in the atmosphere is expected to be negligible at the energies relevant here, this flavor only appears through oscillation [36].
The detection rate varies between CC and NC interactions [21]. Finally, after applying the reconstruction resolutions and event classification, event counts are summed to get the final-level templates for events classified as tracks and cascades separately. Where not mentioned explicitly, the same treatment is also applied to anti-neutrinos. The final templates are the sum over both neutrinos and anti-neutrinos. (While not shown here, it is possible to extend the model with more parameters or stages to describe additional effects, such as the modeling of systematic uncertainties.)

Since the transformations computed by individual stages are independent of one another, a parameter change affecting one stage does not affect the transformations used by the other three stages, and in particular not the result of the previous stages. Therefore, we include caching functionality that reduces the overall computational expense when a number of successive templates are retrieved while changing one parameter at a time.

The transformations performed by the individual stages are dependent on the neutrino's energy and zenith angle, and therefore must be computed and applied differentially. All stages are evaluated on a grid of points distributed over E_true and cos ϑ_true, with the final templates output in E_reco, cos ϑ_reco, and event class. Points in energy are logarithmically spaced in the domain 1 GeV to 80 GeV, while points in cosine-zenith are linearly spaced between −1 and 1. The grid points chosen for each stage are listed in Table 2.

Stage           Input grid   Output grid
Flux            —            400 E_true × 400 cos ϑ_true
Oscillation     400 × 400    400 E_true × 400 cos ϑ_true
Detection       400 × 400    200 E_true × 200 cos ϑ_true
Reconstruction  200 × 200    40 E_reco × 40 cos ϑ_reco × 2 classes

Table 2: Grid points chosen for the staged approach in this work. The output of one stage is the input to the next stage, and the result of the detection transformation is downsampled from (400 × 400) to (200 × 200) points. The first three stages are computed in the domain E_true ∈ (1, 80) GeV and cos ϑ_true ∈ (−1, 1), while the output of reconstruction is in the domain E_reco ∈ (1, 80) GeV, cos ϑ_reco ∈ (−1, 1), and class ∈ {track, cascade}. Within their respective domains, points in energy are logarithmically spaced while points in cosine-zenith are linearly spaced.

For each of the twelve channels of neutrino type and interaction (e.g., ν_e CC, ν̄_μ NC), we draw an identical number of events from the toy model. This number, one twelfth of the total number of events constituting a given random sample of the toy model, is referred to as the sample size. A given sample is used together with the event-by-event MC weighting technique to generate templates for all possible values of θ, that is, to calculate the associated expected counts in all bins of each final-level template.
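Schematically, combining the four stages on the grids of Table 2 reduces to a few array operations. The sketch below uses reduced grid sizes and random stand-ins for the stage outputs; the real implementation additionally handles the twelve channels, the downsampling step, and the per-stage caching:

```python
import numpy as np

def make_template(flux, p_osc, a_eff, reco_kernel, t_exp):
    """Multiply the first three stages on the true-variable grid and
    smear the result into (E_reco, cos zenith_reco, class) with the
    reconstruction kernels, then scale by the exposure time."""
    rate_true = flux * p_osc * a_eff                  # stages 1 x 2 x 3
    counts = np.einsum('ij,ijklm->klm', rate_true, reco_kernel)  # stage 4
    return counts * t_exp                             # expected counts

# Reduced shapes for the sketch (the analysis uses the grids of Table 2):
rng = np.random.default_rng(42)
stages = [rng.random((20, 20)) for _ in range(3)]     # flux, osc, aeff
kernel = rng.random((20, 20, 10, 10, 2))              # smearing kernels
print(make_template(*stages, kernel, t_exp=3.2e7).shape)  # (10, 10, 2)
```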
Figure 6: Operating principles of the different staged approach modes, which differ in how we generate the transformations of the last two stages. The staged approach without smoothing is employed for validation purposes in Section 6.1. See text for details.
A complete overview of the different operation modes of the staged approach is given in Figure 6, which highlights the stages at which these differ in the template generation process.
5. Technical Implementation of Stages
The stages within our approach, as summarized in Section 4 and illustrated in Figure 4, are subject to different technical and computational challenges due to the physics effects captured by each one. In this section we examine specific implementation details which highlight how each stage balances performance and precision requirements—even in the presence of low MC statistics.

5.1. Flux
In order to preserve the integral of a tabulated set of data, a spline is fit to the integral of the data (here, a cumulative sum of the bin values multiplied by the respective bin widths) rather than to the values themselves. Interpolated values in the initial space are then found by evaluating the derivative of these splines. We refer to this method as integral-preserving (IP) interpolation.

For the NMO example analysis, the tabulated data of interest are the atmospheric neutrino flux predictions from [30], provided as a function of both E_true and cos ϑ_true. To simplify the problem, the integration is performed along one dimension at a time. Consider the case with fluxes tabulated at M × N points in (E_true, cos ϑ_true). To retrieve the flux at an arbitrary (E_true, cos ϑ_true) point, say (x, y), first one spline of the integrated flux as a function of cos ϑ_true is created for each of the M E_true locations. The derivative of each of these splines is evaluated at y, yielding M flux values. The integral of these values is then interpolated with a spline, and finally this spline's derivative is evaluated at x. This concept generalizes to higher dimensions, but can quickly become computationally intensive as the number of splines grows. While the splines used in the provided example are of one-dimensional cubic type, other spline variants or interpolation techniques can be used, as long as these allow for differentiation. For the example analysis of this article, IP interpolation is approximately an order of magnitude slower than two-dimensional cubic spline interpolation.

Figure 7: The top part of the figure shows three different interpolation methods (linear spline, cubic spline, and integral-preserving interpolation) applied to the same set of data points from [30] (the atmospheric prediction at E_true = 3.98 GeV, as a function of cos ϑ_true), while the bottom portion shows the fractional deviation of the integral (the area under the curve) for these three methods. The deviations of the integral-preserving method presented in this paper are by far the smallest.

The IP method improves upon standard interpolation techniques in that it correctly models the turnover of the flux at the horizon (cos ϑ_true = 0) and the behavior in the most upgoing and downgoing regions (cos ϑ_true ∼ ±1) of (E_true, cos ϑ_true)-space. More detailed information on the IP method can be found in [37].
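In one dimension, the IP procedure maps onto standard spline tools. The following sketch is our own minimal illustration, not the implementation of [37]:

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

def ip_interpolator(bin_edges, bin_averages):
    """Integral-preserving interpolation: spline the cumulative integral
    of the tabulated (bin-averaged) values at the bin edges, then return
    the spline's derivative as the point-value interpolant."""
    cumulative = np.concatenate(
        [[0.0], np.cumsum(bin_averages * np.diff(bin_edges))])
    return InterpolatedUnivariateSpline(bin_edges, cumulative, k=3).derivative()

# Toy tabulated "flux" on a coarse cosine-zenith grid:
edges = np.linspace(-1.0, 1.0, 21)
centers = 0.5 * (edges[:-1] + edges[1:])
table = 4.0 + np.cos(2.0 * centers)   # bin-averaged values (toy numbers)
flux = ip_interpolator(edges, table)
print(flux(0.13))                     # flux estimate at an arbitrary point
```

By construction, integrating the returned interpolant over any original bin reproduces the tabulated bin integral, which is exactly the property the flux tables require.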
5.2. Oscillation

The oscillation library that we employ is an extension of the code described in [38], where the authors ported some of the core functions of Prob3++ to a graphics processing unit (GPU) via the CUDA C API [39]—an application programming interface to perform general purpose computations on GPUs. We then added back in the ability to handle an arbitrary number of constant-density layers of matter, allowing for highly parallel calculations of three-flavor oscillation probabilities of neutrinos that encounter a realistic radial Earth density profile, with fine-grained control over its characteristics. We implemented the oscillation calculations with floating point precision selectable to either single (32 bits, or FP32) or double (64 bits, or FP64) precision. With our code run in double precision and compared with Prob3++ on a 100 × 100 grid of neutrino energies E_true ranging from 1 GeV to 80 GeV and cos ϑ_true values spanning the region between −1 and 1, we find that our GPU code and the Prob3++ code produce consistent results, with tiny residual differences due to differing hardware implementations of the same mathematical operations. Switching from double to single precision on the GPU, we find that the magnitudes of the differences remain small for all oscillation channels. Single precision is desirable from a performance point of view, since most GPUs comprise a larger number of single precision than double precision arithmetic units, and these extra units can be exploited by the parallelism in our code.

To evaluate the effects of an approximated Earth density profile using a limited number of constant-density layers and a constant atmospheric production height—both approximations that our code makes—we compare the oscillation probabilities from our implementation of Prob3++ against a reference model. The latter is calculated by nuCraft [40], which is written in Python and solves the Schrödinger equation numerically. The nuCraft library also supports a realistic variation of the oscillation baselines according to the distribution of atmospheric neutrino production heights described in [41] and uses an interpolated radial density profile of the Earth.

Figure 8: Deviation of ν_μ survival probabilities computed with Prob3++ compared to nuCraft. The left panel uses a fixed production height of 20 km for both codes and twelve constant-density layers for Prob3++. In the right panel the values from nuCraft are the average probabilities for a range of neutrino production heights across the atmosphere.

To this effect, we first fix the atmospheric neutrino production height to h = 20 km for both codes, and quantify the deviations arising from the coarser Earth model by calculating the ν_μ survival probability residuals on a fine grid in cosine-zenith and energy. When approximating the Earth's density profile with only four layers (one for each of the upper and lower mantle, and the outer and inner core), differences of up to 15 % to the output of nuCraft are seen. These differences decrease to below 5 % when using 12 density layers (see left panel of Figure 8). Using an even more detailed model with 59 layers results in differences smaller than 1 %.

Comparing the fixed-production-height Prob3++ probabilities to those obtained under the assumption of a more realistic distributed atmospheric production height in nuCraft highlights further discrepancies between the outputs of the two codes (see right panel of Figure 8). However, the largest differences (∼ ±10 %) appear for near-horizontal trajectories, while the residuals for upgoing trajectories are considerably smaller.
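The structural core of the layered matter calculation (Section 4, stage 2) is an ordered product of per-layer transition amplitudes. A minimal sketch of that bookkeeping follows; the construction of each layer's 3 × 3 amplitude from the mixing parameters and matter density is deliberately omitted here:

```python
import numpy as np

def total_amplitude(layer_amplitudes):
    """Total three-flavor transition amplitude for a trajectory crossing
    several constant-density layers: the ordered matrix product of the
    individual 3 x 3 (complex, unitary) layer amplitudes."""
    total = np.eye(3, dtype=complex)
    for amp in layer_amplitudes:   # ordered along the neutrino path
        total = amp @ total
    return total

# Oscillation probabilities then follow as P(alpha -> beta) = |A|^2:
# prob = np.abs(total_amplitude(amps)) ** 2
```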
5.3. Detection

As a reminder, the effective areas are quantities used to translate an incoming flux to the event rates in the detector. These effective areas are calculated from a limited number of MC events, hence they can suffer from statistical fluctuations which can be a non-negligible contribution to the total uncertainty of the final physics result. At the same time, effective areas are typically well-behaved quantities in energy and zenith angle (under some realistic assumptions, e.g., that no discontinuous selection cuts are applied and no gaps exist in the detector acceptance). Therefore, smoothing techniques can be applied to alleviate the unwanted uncertainty contributions from statistical fluctuations.

For charged current interactions, we compute the effective area separately for each neutrino flavor. In contrast, we do not distinguish between flavors for neutral current (NC) interactions, since their cross sections are identical. Neutrinos and anti-neutrinos are handled independently, accounting for a total of eight independent effective area functions. For convenience we include the multiplication by detector exposure time (t_exp) in the same step, which means that this stage outputs event counts (N_events) instead of rates,

N_events = Φ [m⁻² s⁻¹] · A_eff [m²] · t_exp [s], (4)

for some input flux (Φ).

In our staged approach we first evaluate the effective areas on a fine grid in (E_true, cos ϑ_true) using the MC events via MC integration, where, when generating events, the sampling is chosen to provide a relatively uniform coverage across all grid points. For our example case study, we use a uniform sampling across cos ϑ_true and a power-law spectrum in energy, chosen to closely follow actual IceCube oscillation analyses. (Note that an optimization of the sampling choices would benefit both the staged approach and the reference methods.) Still, for small sample sizes, some grid points may have no associated events, leading to gaps in the distribution. We remove these by applying a simple Gaussian smearing along the two-dimensional grid. In a second step, cubic splines are employed to perform smoothing. Here, first, splines are created along the E_true dimension individually for every cos ϑ_true bin, and evaluated to obtain new values for every grid point. Then, this splining procedure is repeated along the cos ϑ_true dimension.
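The two-step smoothing just described maps onto standard tools. In outline (the kernel width and spline smoothing factors below are placeholders, not tuned values from the analysis):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.ndimage import gaussian_filter

def smooth_grid(values, log_e, cos_z):
    """Fill empty grid points with a mild Gaussian smearing, then smooth
    with cubic splines along E_true (per cos-zenith column) and along
    cos zenith_true (per energy row), as described in the text."""
    out = gaussian_filter(values.astype(float), sigma=1.0)  # placeholder
    for j in range(out.shape[1]):            # splines along energy axis
        spl = UnivariateSpline(log_e, out[:, j], k=3, s=len(log_e))
        out[:, j] = spl(log_e)
    for i in range(out.shape[0]):            # repeat along cos-zenith axis
        spl = UnivariateSpline(cos_z, out[i, :], k=3, s=len(cos_z))
        out[i, :] = spl(cos_z)
    return out
```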
Figure 9 shows the truth template of ν_µ CC events on a grid with n_bins = 40 × 40 points, together with the fractional deviations that arise when the same template is obtained from MC samples of different sizes using direct histogramming versus the smoothing method described above. We use ν_µ CC events as an example here and below. Table 3 gives the average (binwise, i.e., per degree of freedom) χ² values, defined as

\langle \chi^2 \rangle = \frac{\chi^2}{n_\mathrm{bins}} = \frac{1}{n_\mathrm{bins}} \sum_{i=1}^{n_\mathrm{bins}} \frac{\left(\mu'_i - \mu_i^\mathrm{ref}\right)^2}{\mu_i^\mathrm{ref}}\,, \qquad (5)

and the maximal χ² values, defined as

\chi^2_\mathrm{max} = \max_{1 \le i \le n_\mathrm{bins}} \left[ \frac{\left(\mu'_i - \mu_i^\mathrm{ref}\right)^2}{\mu_i^\mathrm{ref}} \right]\,, \qquad (6)

by which the templates from our method and from direct histogramming deviate from truth (with bin counts µ_i^ref).

The χ² values provide direct insight into how the accuracy of the template description compares to the statistical uncertainty of the real data that would be observed. Since the observed data is subject to Poisson fluctuations, it has an average deviation from truth of χ² = 1 per bin. An analysis of real data, however, can only test templates based on MC. These exhibit their own statistical uncertainties, resulting in finite χ² deviations from truth, shown in Table 3 as a function of MC sample size. It is essential that these inaccuracies inherent to the template generation process are considerably smaller than the statistical fluctuations in data in order to ensure accurate statistical inference.

Applying our method, we find deviations that are lower than those from direct histogramming by a factor of about 40 for the smallest MC set, and by a factor of about 13 for the largest. It is noteworthy that the maximum deviation (χ²_max) across all bins decreases monotonically with MC sample size, confirming that the smoothing method does not introduce any observable bias.
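In code, the accuracy metrics of Equations (5) and (6) are a few lines of numpy; the stand-in truth template below is arbitrary and serves only to show that a Poisson-fluctuated estimate indeed averages to χ² ≈ 1 per bin, the scale quoted in the text:

```python
import numpy as np

def template_chi2(mu_test, mu_ref):
    """Eqs. (5) and (6): average and worst-case per-bin chi^2 of a test
    template with respect to reference bin counts mu_ref."""
    chi2_bins = (mu_test - mu_ref) ** 2 / mu_ref
    return chi2_bins.mean(), chi2_bins.max()

rng = np.random.default_rng(0)
mu_ref = rng.uniform(5.0, 50.0, size=(40, 40))   # arbitrary stand-in truth
mu_hist = rng.poisson(mu_ref).astype(float)      # Poisson-fluctuated estimate
print(template_chi2(mu_hist, mu_ref))            # average is ~1 per bin
```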
Figure 9: Parametric reference distribution after the first three stages (flux, oscillation, and detection) for the ν_µ CC channel in (cos ϑ_true, E_true) (left panel), and relative residuals (|N − N_true|/N_true) for direct histogramming (right panel, top row) and our proposed method (right panel, bottom row) on a 40 × 40 grid, for MC sample sizes of 10^3, 10^4, and 10^5 events, respectively. The samples are drawn from the unbinned toy model distributions of Appendix A.

The usual way to obtain templates in the space of reconstructed variables is to place each individual MC event into the final-level distributions according to the reconstruction information that the event carries. This is the case for both methods used for comparison, direct histogramming and direct KDE, the only difference between these being how the final-level distributions are estimated. While this approach correctly takes into account joint dependencies of the event reconstruction on the involved variables, it is particularly sensitive to small MC sample sizes due to the potentially high dimensionality of the space of reconstructed variables. In contrast, the staged approach uses the available MC simulation to construct detector resolution functions, which we integrate to form a transformation that maps a template in true variables (such as that shown on the left in Figure 9) onto the space of reconstructed variables; the result is what we refer to as the final-level template.

In the case study of the NMO analysis, the mapping of true variables (E_true and cos ϑ_true) to reconstructed variables (E_reco, cos ϑ_reco, and event class) is extracted from the MC as a "migration" tensor of order five, M_ijklm. It maps the histogram of event counts in the two-dimensional space of true variables, h_ij, to the observed histogram of event counts in the space of reconstructed variables, h'_klm:

h'_{klm} = \sum_{i,j} M_{ijklm}\, h_{ij}\,. \qquad (7)
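Applied in code, Equation (7) is a single tensor contraction. The following numpy sketch uses hypothetical dimension sizes and a random, normalized migration tensor purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binning: 40x40 in (E_true, cos_zen_true), mapped onto
# 40 E_reco bins x 40 cos_zen_reco bins x 2 event classes.
h_true = rng.uniform(0.0, 10.0, size=(40, 40))      # h_ij
M = rng.uniform(size=(40, 40, 40, 40, 2))           # M_ijklm (random stand-in)
M /= M.sum(axis=(2, 3, 4), keepdims=True)           # here: unit weight per true bin

# Eq. (7): h'_klm = sum_ij M_ijklm h_ij
h_reco = np.einsum('ijklm,ij->klm', M, h_true)
assert np.isclose(h_reco.sum(), h_true.sum())       # this normalization conserves counts
```

A real migration tensor need not conserve counts (selection efficiencies can remove events); the normalization above is only for the illustration.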
Sample size              10^3     10^4     10^5     10^6
Direct hist.   ⟨χ²⟩      215      22.5     2.07     0.201
               χ²_max    …        …        …        …
Staged         ⟨χ²⟩      …        …        …        …
               χ²_max    460      17.2     2.27     0.975

Table 3: Average χ² per bin and the worst-case bin's χ² value, comparing templates on a 40 × 40 grid in (E_true, cos ϑ_true)-space (i.e., before applying reconstruction resolutions), generated by direct histogramming (top) and the smoothed staged approach (bottom), with the toy model's reference template. Shown are values obtained for independent input MC samples of various sizes (from 10^3 up to 10^6 events per flavor/interaction type).

The reconstruction transform in general has to be computed as a five-dimensional transform, as all five dimensions can depend on one another, i.e., they can be correlated. Studying the correlations among the dimensions in our particular MC revealed, however, that E_reco only depends on event class and E_true, cos ϑ_reco depends on event class and both input dimensions, and event class only depends on E_true. For each of the three reconstruction variables, we subdivide the MC in the quantity's dependent dimensions to the point that correlations are no longer visible, so that all events in a subdivision can be assumed to be samples from the same one-dimensional distribution, which is the resolution function we generate.

There is a trade-off in how finely to subdivide the MC when producing these resolution functions. Since the resolution changes as a function of a dependent dimension, sufficiently narrow subdivisions in that dimension group together MC events drawn from essentially the same distribution. Subdivisions that are too wide will group together events drawn from different distributions, and the resulting resolution functions will be erroneous. However, narrower subdivisions admit fewer MC events each, and so lead to greater statistical variations in the estimated resolution functions (i.e., their shapes will be more affected by random fluctuations in the MC).

To balance these competing factors, we devised the following heuristic. For the quantity being characterized, we divide each dependent dimension evenly, except event class, which is binary. E_true is divided evenly in log-space to help ensure that subdivisions group together events with similar energy resolution, as this quantity changes more rapidly at low E_true than at high E_true. We allow each subdivision of E_true to separately expand enough to capture at least 100 events, and each subdivision of cos ϑ_true to capture at least 500 events. If expansion is performed, a subdivision's upper and lower edges are expanded by the same factor (up to the limits of the dimension); a sketch of this edge-expansion logic is shown below. The captured events are then used to produce resolution functions.

The remaining parameters that require tuning in this heuristic are the numbers of subdivisions to use for each dependent dimension of each quantity being characterized. For these, we visually inspect the two-dimensional distributions of each characterized quantity as a function of each dependent dimension and require that the events in each subdivision do not display a strong dependence on that dimension.

If the functional form of the resolution functions were known, a parametric model of this form fit to the MC would yield the most accurate and lowest-variance reconstruction transform. However, as we do not know the form of these functions, a non-parametric density estimation technique is used to approximate them. In particular, we chose adaptive KDE [42], with bandwidths scaled uniformly such that the narrowest is the one found by the Improved Sheather-Jones (ISJ) algorithm [43].
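Before turning to the density estimation itself, the edge-expansion heuristic referenced above can be sketched as follows; the growth step and the event thresholds are illustrative choices, not tuned values:

```python
import numpy as np

def expand_to_capture(values, lo, hi, min_events, limits):
    """Symmetrically expand the subdivision [lo, hi] about its center until it
    captures at least min_events of `values`, clipped to `limits`. The 10%
    growth step per iteration is an illustrative choice."""
    center, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    factor = 1.0
    while True:
        a = max(center - factor * half, limits[0])
        b = min(center + factor * half, limits[1])
        selected = values[(values >= a) & (values <= b)]
        if len(selected) >= min_events or (a, b) == limits:
            return selected, (a, b)
        factor *= 1.1  # expand both edges by the same factor

# e.g., >= 100 events per E_true subdivision (here in log10(E_true/GeV)):
# events, edges = expand_to_capture(log10_e_true, 0.3, 0.4, 100, (0.0, 2.0))
```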
KDE works by placing a kernel function (we use a Gaussian) centered at the value of each event's variable to be described and then summing over all kernels. Adaptive-bandwidth KDE uses a different width for each kernel, where the bandwidths are inversely proportional to the density of points near the location of the kernel. The ISJ bandwidth selection algorithm used to normalize the kernel widths improves on predecessor algorithms (e.g., [44, 13]) in that it does not assume that the quantity being estimated is drawn from a Gaussian distribution. In our experience, this outperforms fixed-bandwidth KDE by not underestimating the heavy-tailed distributions we encounter, but it bears repeating that other density estimation techniques can yield better or worse results depending on the specifics of the MC in question. An example of two resolution functions (one each for energy and zenith angle) estimated using the adaptive KDE method is shown in Figure 10.

Figure 10: Example energy (ln(E_reco/E_true)) and cosine-zenith-angle (cos ϑ_reco − cos ϑ_true) resolution distributions for ν_µ CC events classified as cascades, estimated with histograms and with adaptive KDE. The energy resolution is shown for 100 events with E_true in a narrow band starting near 26 GeV, and the cosine-zenith resolution for 100 events with E_true of a few GeV. The samples used to construct the histogram and KDE are shown by vertical lines beneath the histograms.

Figure 11 again demonstrates that templates obtained from our KDE-based reconstruction stage deviate much less from the parametric reference template after reconstruction than templates obtained from direct histogramming of reconstructed MC events.
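The following is a minimal sketch of the adaptive (Abramson square-root-law [42]) KDE idea; for brevity the pilot bandwidth is chosen by scipy's Silverman rule rather than by the ISJ algorithm of [43], and none of the settings reproduce the tuned analysis configuration:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def adaptive_kde(samples, grid):
    """Abramson-style adaptive KDE: per-sample bandwidths scale inversely with
    the square root of a pilot density estimate. Pilot bandwidth via scipy's
    Silverman rule for brevity (the text uses ISJ instead)."""
    pilot = gaussian_kde(samples, bw_method='silverman')
    dens = pilot(samples)                                # pilot density at samples
    h0 = pilot.factor * samples.std(ddof=1)              # global pilot bandwidth
    lam = np.sqrt(np.exp(np.mean(np.log(dens))) / dens)  # sqrt law, geometric-mean pivot
    h = h0 * lam                                         # narrow kernels in dense regions
    # Sum one Gaussian kernel per sample, each with its own width.
    return norm.pdf(grid[:, None], loc=samples[None, :], scale=h[None, :]).mean(axis=1)
```

Evaluated on a grid in, e.g., ln(E_reco/E_true), this yields smooth, heavy-tail-friendly resolution functions of the kind shown in Figure 10.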
6. Validation and Comparison of Templates
This section more closely examines the templates generated with the staged approach and compares them, along with those generated by the other two methods (histograms and KDE), to the parametric reference distributions of the toy detector model. This validation is split into two parts. The first examines the grid of points used to numerically approximate the integral over the first three stages, whereas the effect of smoothing is investigated in the second.
In order to demonstrate the validity of our choice of grid points shown in Table 2, as well as the equivalence between the staged approach and traditional MC weighting as the grid point spacing in E_true and cos ϑ_true is reduced, we compare the staged approach without smoothing (i.e., using raw histograms as transforms in place of smoothing functions and KDEs) to direct histogramming. The specific comparison done here without smoothing serves solely to validate the principle of stages versus direct histograms.
Figure 11: Same as Figure 9, but comparing final-level templates after all four stages are applied, in (cos ϑ_reco, E_reco). Note that the residuals in the 1k-samples plot for direct histogramming reach up to 31, but the color scale is clipped at 10.
Table 4: Average (⟨χ²⟩) and maximal (χ²_max) χ² deviations per bin between the final 40 × 40 × 2 templates produced by the staged approach and by direct histogramming, for grid densities (M × N) in (E_true, cos ϑ_true) ranging from 40 × 40 up to 1280 × 1280 in the first three stages, using the same fixed-size MC sample for both methods. The last (reconstruction) stage uses a reduced binning, as described in the text.

Table 4 shows the χ² difference (cf. Equations (5) and (6)) between the final templates obtained from the staged approach (with bin counts µ'_i) and from direct histogramming (µ_i^ref) for a variety of grid point densities in E_true and cos ϑ_true, using the same MC set for both methods. These templates are output with a binning of 40 × 40 × 2 in E_reco, cos ϑ_reco, and event class. The relative decrease in the average χ² value roughly scales with the inverse of the relative grid density increase, thus confirming that the two methods agree to arbitrary precision in the asymptotic limit. In the following, for practical reasons, we limit ourselves to the specific case summarized in Table 2.

To validate the final templates with smoothing applied at each stage, we compare them directly to truth. For reference, we also show the agreement resulting from both the direct histogramming and the direct KDE methods. While Table 5 quantifies the deviations from the reference distributions, again in terms of χ² and as a function of MC sample size, Figure 12 displays the final-level templates for each of the aforementioned methods using a single MC sample.

Figure 12: Final-level templates used for the example data analysis, shown in (cos ϑ_reco, E_reco) separately for the cascade and track event classes. The reference distributions (truth), obtained directly from the toy detector model parameterizations, are shown in panel (a). Given the same MC sample, the estimated distributions using histograms are shown in panel (b), using KDEs in panel (c), and using the staged approach in panel (d).

The staged approach outperforms the two alternatives in terms of χ² values by more than one order of magnitude for all sample sizes studied here. Furthermore, the inaccuracies of the templates from the staged approach scale with the inverse of the sample size almost as fast as those of the templates from direct histogramming. In addition, it is noteworthy that the KDE method shows comparably slow convergence, i.e., it performs worse than direct histogramming at the larger sample sizes.

While for the experimental data (or pseudo-data) one expects statistical fluctuations on the order of χ² = 1 per bin, for the staged approach the average χ² deviation from truth (using the same χ² definition as for data) is only about 30% of what is expected just from statistical fluctuations in data, while far larger samples would be necessary to achieve the same average χ² using direct histogramming or KDE (see Table 5 for details). To reach equal accuracy, samples two or more orders of magnitude larger are therefore needed for histogramming or KDE compared to the staged approach. The next section illustrates the implications for running a data analysis.
Sample size                        10^3     10^4     10^5     10^6
Direct histogramming   ⟨χ²⟩        468      42.6     4.27     0.458
                       χ²_max      …        906      138      10.5
Direct KDE             ⟨χ²⟩        …        …        …        …
                       χ²_max      245      90.2     50.3     25.3
Staged approach        ⟨χ²⟩        …        …        …        …
                       χ²_max      …        …        …        …

Table 5: Average and maximal χ² deviations per bin of the final 40 × 40 × 2 templates from the parametric reference distributions, for direct histogramming, direct KDE, and the staged approach, and for independent input MC samples of various sizes.
7. Example Analysis Results
To illustrate the impact of the sample size, we show the resulting √(∆χ²) as an estimate of the sensitivity to the NMO for our example analysis in Figure 13. For reference, the true result is derived directly from the exact templates based on the parametric toy detector model and lies at √(∆χ²) = 5.75. For the three methods discussed throughout this paper, the statistical uncertainty of the obtained sensitivity is indicated by error bars in the figure. This uncertainty is computed from several statistically independent MC sets and indicates the central 68% quantile of each ensemble; each MC set is used together with the staged approach to generate one Asimov toy data template and a large number of "test" templates. In particular, as the sensitivity proxy does not take into account the MC uncertainty [5, 6], this range is not, a priori, expected to reflect any sensitivity bias of the three methods.

The uncertainty reveals that the methods exhibit quite different intrinsic fluctuations of their respective sensitivity estimates, as well as different scaling behavior of the variance with sample size. As the sample size decreases, direct histogramming without any smoothing applied results in an increasing overestimation of a VLVνT's ability to exclude the wrong neutrino mass ordering. In the most extreme case shown here (corresponding to the smallest sample size), the sensitivity is estimated to be more than one order of magnitude greater than the actual capability of the experiment. Only for the largest sample size does direct histogramming indeed give reliable results. This is expected from the simple rule of thumb (cf. Section 2.2) of O(10) events per bin times O(10^3) bins.

Illustratively, an undersampling of the detector response distributions due to low MC statistics is highly likely to lead to an overestimation of the experiment's sensitivity, because the NMO signature that is present in the space of true variables is carried over to random bins in the reconstructed observables with reduced cancellation. For example, if a bin in the final-level template is populated solely by (unweighted) MC neutrinos and no anti-neutrinos, or vice versa, it will contribute artificially strongly to the overall NMO sensitivity due to the missing summation over both event types (cf. Section 3).
Figure 13: Estimated sensitivity (√(∆χ²)) to the NMO vs. sample size for direct histogramming, direct KDE, and the proposed staged approach, applied to multiple (between 50 and 200) statistically independent toy MC sets. Vertical lines indicate the central 68% quantiles of the ensembles. The dashed horizontal line shows the significance obtained from the truth templates based on the parametric toy detector model. The staged approach outperforms the other methods, both in terms of bias and variance, for all but the largest sample sizes, where direct histogramming catches up. Note that no data points exist for direct KDE at the largest sample sizes, as the computational processing times become impractically large. Also note that direct histogramming is off-scale high for the smallest sample sizes (the corresponding mean values, 4987 and 135, are indicated to the right of the markers).

Applying KDE smoothing to the weighted events instead of histogramming them (i.e., direct KDE) reduces the overestimation of the sensitivity at small sample sizes, but is not able to eliminate the bias entirely for the tested sample sizes. For the largest sample sizes, the direct KDE method is too computationally expensive to deliver results within a reasonable time (for more details on timing, see Section 8).

The estimated sensitivity using the staged approach is statistically compatible with the true sensitivity across the whole range of sample sizes considered. Within the limits of our testing, it shows no bias and a lower variance in predicting the physics sensitivity than the other methods.
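For orientation, a bare-bones version of such an Asimov sensitivity proxy, without the minimization over nuisance parameters that a real fit performs, can be written as:

```python
import numpy as np

def asimov_delta_chi2(mu_true_order, mu_wrong_order):
    """Poisson Asimov chi^2 between the expected template under the true mass
    ordering and a template under the wrong ordering (Gaussian approximation
    to the Poisson likelihood ratio, bin by bin). A sketch only: the real
    analysis also minimizes over systematics before quoting sensitivity."""
    return np.sum((mu_true_order - mu_wrong_order) ** 2 / mu_true_order)

# sensitivity proxy in "sigma", as plotted in Figure 13:
# n_sigma = np.sqrt(asimov_delta_chi2(mu_NO, mu_IO))
```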
8. Benchmarks
Whether a given analysis method is useful in a realistic setting depends not only on its numerical reliability, but also on how long it takes to compute the quantity of interest (note that this duration is in addition to the initial time needed to generate the MC). For reference, we benchmarked the template generation times in the course of a typical analysis process; the results are compiled in Figure 14. Timings were obtained on a computer with an Intel Xeon E5-1660 v3 3.0 GHz CPU and an NVIDIA GeForce GTX Titan X GPU.

While direct histogramming is the fastest method for the smaller sample sizes, it is unusable there owing to the large fluctuations associated with the templates it produces, which in turn result in the grossly overestimated sensitivities shown in Figure 13. Direct KDE only proves competitive when used in conjunction with the smallest datasets; the faster-than-linear scaling of its computational needs with sample size quickly renders it impractical to use. Our proposed method is independent of the sample size by construction (excluding initial start-up costs), but becomes more expensive if a finer grid point spacing is desired.

The timing difference between the CPU and GPU implementations of the staged approach is not as large as for the other methods, since the staged approach only uses the GPU to parallelize the calculation of the neutrino oscillation weights (whereas the other methods make use of the GPU more extensively).
Figure 14: Average template generation time during a typical analysis for input datasets of varying size, shown for direct histogramming, direct KDE, and the staged approach. Solid lines represent timings based on (partial) GPU acceleration, whereas dashed lines are for CPU-only calculations.
9. Summary
The search for small physics effects in high-statistics neutrino oscillation experiments requires careful treatment and use of simulated data. Statistical fluctuations within distributions obtained from Monte Carlo simulations can severely distort an analysis, rendering derived constraints or sensitivities essentially meaningless.

The staged approach we have presented serves two main purposes. Firstly, computational expense is reduced by sampling physics and detector response distributions on a discrete grid instead of computing a weight for every individual Monte Carlo event. In this respect, we have demonstrated that our method of breaking down the template generation into independent stages converges to the MC weighting scheme when using a grid of high enough, albeit feasible, density. For a fixed number of grid points, the template generation time has been shown to be independent of the input sample size. Secondly, the staged approach allows the application of smoothing techniques to a detector's response functions. In the specific example shown, the detection stage characterizes the detector's effective area by integrating weighted MC events on a grid and smoothing the resulting histogram, after which the event reconstruction stage applies adaptive KDE smoothing to the resolution functions in order to arrive at final-level templates. This has proven superior to smoothing the final event distributions, since it is faster and, even more importantly, yields more accurate and robust results. The presented choice of smoothing techniques works sufficiently well for our purposes, but this choice is neither unique nor do we claim it to be optimal, and it depends on the wider experimental context. Beyond this choice, our overall approach may prove particularly useful when a fast assessment of the physics potential of various detector designs is desired, or when analysis methodologies are optimized. Any final-level analysis will likely rely on large quantities of MC to guarantee precise and accurate modelling of the experiment.

In the example neutrino mass ordering analysis that we conducted to benchmark and compare the different approaches, we found that direct histogramming of events leads to a gross overestimation of sensitivities when used in conjunction with small numbers of events in our toy model. Conversely, the proposed staged approach leads to correct results that are largely unaffected by the sample size across the tested range, and the variance of its results is small above a moderate sample size. This means that the necessary amount of simulated events is reduced significantly (by about two orders of magnitude in our example), an important aspect especially since Monte Carlo event simulation and reconstruction times can represent major hurdles to progress in the field of neutrino oscillation experiments.

Acknowledgments
The IceCube collaboration gratefully acknowledges the support from the following agencies and institutions: USA – U.S. National Science Foundation-Office of Polar Programs, U.S. National Science Foundation-Physics Division, Wisconsin Alumni Research Foundation, Center for High Throughput Computing (CHTC) at the University of Wisconsin-Madison, Open Science Grid (OSG), Extreme Science and Engineering Discovery Environment (XSEDE), U.S. Department of Energy-National Energy Research Scientific Computing Center, Particle astrophysics research computing center at the University of Maryland, Institute for Cyber-Enabled Research at Michigan State University, and Astroparticle physics computational facility at Marquette University; Belgium – Funds for Scientific Research (FRS-FNRS and FWO), FWO Odysseus and Big Science programmes, and Belgian Federal Science Policy Office (Belspo); Germany – Bundesministerium für Bildung und Forschung (BMBF), Deutsche Forschungsgemeinschaft (DFG), Helmholtz Alliance for Astroparticle Physics (HAP), Initiative and Networking Fund of the Helmholtz Association, Deutsches Elektronen Synchrotron (DESY), and High Performance Computing cluster of the RWTH Aachen; Sweden – Swedish Research Council, Swedish Polar Research Secretariat, Swedish National Infrastructure for Computing (SNIC), and Knut and Alice Wallenberg Foundation; Australia – Australian Research Council; Canada – Natural Sciences and Engineering Research Council of Canada, Calcul Québec, Compute Ontario, Canada Foundation for Innovation, WestGrid, and Compute Canada; Denmark – Villum Fonden, Danish National Research Foundation (DNRF), Carlsberg Foundation; New Zealand – Marsden Fund; Japan – Japan Society for Promotion of Science (JSPS) and Institute for Global Prominent Research (IGPR) of Chiba University; Korea – National Research Foundation of Korea (NRF); Switzerland – Swiss National Science Foundation (SNSF); United Kingdom – Department of Physics, University of Oxford.
Appendix A. Toy Data Model
In the following we provide the parametric toy detector model used to transform the oscillated atmospheric fluxes into event counts. The functions we use either serve as direct inputs (truth) to the various stages of the simulation chain laid out in Section 4, or as sampling distributions from which toy MC samples are drawn. We point out here that these are entirely empirically motivated, and should only be seen as proxies of the performance indicators of next-generation VLVνTs (such as the IceCube Upgrade [1], PINGU [2, 3], or KM3NeT/ORCA [4]). Simplifications or limitations of the model do not affect the computational analysis techniques themselves. Rather, the goal in the following is to capture the most essential features of the expected detector response: threshold effects in detection, the finite accuracy and skew of reconstruction resolution functions, as well as limited flavor and charge identification capabilities. This does not invalidate the conclusions drawn from comparing the various analysis approaches.
Appendix A.1. Detection Efficiency
We assume a detector of fiducial mass M_fid = 10 megatons, with a neutrino detection energy threshold of E_th = 1 GeV for all neutrino flavors and interaction channels apart from ν_τ charged current (CC) interactions, where the intrinsic interaction threshold is higher, at E_th ≈ 3.5 GeV. The effective mass M^α_eff = ρ_ice V^α_eff for a given combination α of flavor and interaction type, where ρ_ice is the ice density and V^α_eff the detector's corresponding effective volume, exhibits a phenomenological dependence on the true neutrino energy E_true, asymptotically approaching M_fid according to an exponential function:

M^\alpha_\mathrm{eff}(E_\mathrm{true}) = M_\mathrm{fid} \times \left(1 - e^{-k_\alpha\,(E_\mathrm{true}/\mathrm{GeV} - E_\mathrm{th}/\mathrm{GeV})}\right) \quad \text{for } E_\mathrm{true} > E_\mathrm{th}\,. \qquad (A.1)

We include three effective masses to cover all the neutrino interaction channels: one for ν_e, ν̄_e, ν_µ, and ν̄_µ CC, one for ν_τ and ν̄_τ CC, and one for all NC channels. For the CC channels we choose k_α = 0.4, while for the NC channels the function rises more slowly, with k_α = 0.1. The left panel of Figure A.15 shows these dependencies for neutrino energies up to E_true = 80 GeV. The detector can be roughly considered fully efficient (M_eff = M_fid) for all detection channels above E_true ≈ 50 GeV.

The ν-ν̄ asymmetry, which is required to make the NMO measurement, will be introduced through differences in flux and cross sections, i.e., it will become apparent in the detector's effective area. The latter we obtain from the effective mass via the conversion

A^\alpha_\mathrm{eff}(E_\mathrm{true}) = \sigma^\alpha(E_\mathrm{true}) \times n_\mathrm{ice}/\rho_\mathrm{ice} \times M^\alpha_\mathrm{eff}(E_\mathrm{true})\,, \qquad (A.2)

where σ^α is the total neutrino-nucleon cross section for a given flavor-interaction channel α, n_ice ≈ 5.5 × 10^23 cm^−3 is the nucleon density in ice, and ρ_ice ≈ 0.92 g cm^−3 the mass density.

We also make some simplifying assumptions about the cross sections used in Equation (A.2), in that we take the ν_e and ν_µ (ν̄_e and ν̄_µ) CC cross sections to be the same, as well as all ν_x (ν̄_x) NC cross sections. In addition, we model all the mentioned cross sections as rising strictly linearly with E_true above E_true = 1 GeV [45]:

\sigma^\alpha(E_\mathrm{true})/E_\mathrm{true} = c_\alpha \times 10^{-38}\,\mathrm{cm}^2\,\mathrm{GeV}^{-1}\,, \qquad (A.3)

where we set

c_{\nu_{e,\mu}\,\mathrm{CC}} = 2\, c_{\bar\nu_{e,\mu}\,\mathrm{CC}} = 0.70\,, \qquad (A.4)
c_{\nu_x\,\mathrm{NC}} = 2\, c_{\bar\nu_x\,\mathrm{NC}} = 0.25\,. \qquad (A.5)

To obtain the ν_τ (ν̄_τ) CC effective areas, we interpolate the corresponding neutrino-nucleon cross-section curves given in [46]. All resulting effective areas as a function of the neutrino energy are depicted in the right panel of Figure A.15. We take these to be invariant in azimuth, but universally introduce an arbitrary, smooth polynomial modification with the zenith-angle dependence

M(x) = 120\,\left(-x^{p} + x^{q} - x^{r}\right) + 1\,, \quad x \equiv \cos\vartheta_\mathrm{true}\,, \qquad (A.6)

which we normalize to unit area, such that A_eff(E_true) is the average over the full sky, cos ϑ_true ∈ [−1, +1].

Figure A.15: Effective masses (left) and areas (right) as a function of the true neutrino energy for a generic toy detector with a fiducial mass of 10 Mt. The dependence of the effective masses on energy is given in Equation (A.1). Cross sections are from Equation (A.3), except for ν_τ and ν̄_τ interactions, which are interpolated from [46]. Effective masses are the same for neutrinos and anti-neutrinos. See text for details.
Appendix A.2. Reconstruction Resolutions

Neutrino zenith resolutions with respect to cos ϑ are represented by single Gaussian distributions, whose parameters are taken as functions of E_true only. For each flavor and interaction channel, we assign a mean µ_{∆cosϑ}(E_true) = 0 across all energies and a standard deviation of the form

\sigma_{\Delta\cos\vartheta}(E_\mathrm{true}) = \frac{a}{\sqrt{E_\mathrm{true}/\mathrm{GeV}}} + b\,, \quad 0 < a, b < 1\,.

Energy residuals ∆E_ν are modeled by right-skewed Gumbel distributions, parameterized by a location µ' and a scale σ' with respect to their standard form via the transformation x → (x − µ')/σ'. These parameters again only depend on E_true. The CC distributions are assumed identical for all flavors and are shown in Figure A.16:

\mu'^{\,\mathrm{CC}}_{\Delta E_\nu}(E_\mathrm{true}) = 0\,, \qquad \sigma'^{\,\mathrm{CC}}_{\Delta E_\nu}(E_\mathrm{true}) = \left(\frac{a'}{\sqrt{E_\mathrm{true}/\mathrm{GeV}}} + b'\right) \times E_\mathrm{true}\,. \qquad (A.7)

For NC interactions, we take a spread that scales with E_true in the same way σ'^CC_{∆E_ν} does, but assume a negative mean proportional to −E_true, due to the undetected energy carried away by the outgoing neutrino. Note that each energy and cosine-zenith residual distribution is renormalized such that its integral over the physical region (∆E_ν + E_true ≥ 0 and −1 ≤ ∆cos ϑ + cos ϑ_true ≤ +1) equals unity.
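The resolution model can be exercised with a few lines of numpy. Note that the numeric constants below are illustrative placeholders (the analysis's exact values are given above only in functional form), and that clipping is used as a crude stand-in for the renormalization over the physical region:

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_reco(e_true, cos_true, a=0.05, b=0.02, a_e=0.3, b_e=0.05):
    """Draw toy reconstructed values: Gaussian cos(zenith) smearing and a
    right-skewed Gumbel energy residual, following Appendix A.2. All
    constants here are placeholders, not the paper's values."""
    e_true = np.asarray(e_true, dtype=float)
    sig_cz = a / np.sqrt(e_true) + b                   # zenith resolution width
    cos_reco = np.clip(rng.normal(cos_true, sig_cz), -1.0, 1.0)
    sig_e = (a_e / np.sqrt(e_true) + b_e) * e_true     # Eq. (A.7)-style scale
    de = rng.gumbel(loc=0.0, scale=sig_e)              # right-skewed residual
    e_reco = np.maximum(e_true + de, 0.0)              # clip to physical region
    return e_reco, cos_reco
```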
Appendix A.3. Event Classification

Correctly identifying few-GeV CC muon neutrino interactions with relatively sparsely instrumented neutrino telescopes in water/ice is difficult mainly for two reasons. First, the track length of a near minimum-ionizing muon is only on the order of a few meters, comparable to the extent of an electromagnetic cascade of the same energy. Second, photon scattering lengths similar to the horizontal spacing between the photomultiplier tubes smear out the Cherenkov ring around the muon direction, which is otherwise observed at a specific angle with respect to the muon direction in the medium.

We take into account the muon neutrino CC ("track") identification efficiency p^{µ,CC}_track improving with the reconstructed neutrino energy E_reco by setting

p^{\mu,\mathrm{CC}}_\mathrm{track}(E_\mathrm{reco}) = p_0 \times \left(1 - e^{-\lambda\,(E_\mathrm{reco}/\mathrm{GeV} + \epsilon)}\right)\,, \qquad (A.8)

with constants 0 < p_0, λ, ε < 1. This reflects the track length of the secondary muon increasing linearly with its energy, but also the possible production of a low-energy muon that cannot be distinguished from the accompanying hadronic cascade even for higher-energy muon neutrino CC interactions. All other (in)efficiencies are assumed to be constant:

p^{e,\mathrm{CC}}_\mathrm{track}(E_\mathrm{reco}) = p^{\mathrm{NC}}_\mathrm{track}(E_\mathrm{reco}) = 0.15\,, \qquad (A.9)
p^{\tau,\mathrm{CC}}_\mathrm{track}(E_\mathrm{reco}) = 0.25\,. \qquad (A.10)

These are shown in Figure A.17. The probability to identify any event as "cascade-like" at a given reconstructed energy is just the complementary probability to that of the track identification. When a toy MC event is subject to this classification, we assign it one of two discrete numbers, representative of identification as either track or cascade, with the above probabilities.

Figure A.16: Example energy resolution functions (right-skewed Gumbel) used for all CC interactions, as given by Equation (A.7), shown for E_true between 2 GeV and 20 GeV. The modes of the corresponding NC resolution functions are shifted downward, in proportion to E_true, with respect to the distributions depicted here.

Figure A.17: Event classification efficiencies implemented as functions of the reconstructed neutrino energy. Shown is the probability to identify an event of a given type as "track-like". Events are identified as "cascade-like" with the complementary probabilities.
Appendix B. Uncertainty in Significance
Under the assumption that the test statistic T under two hypotheses H_0 and H_1 is normally distributed (with means µ_0 and µ_1 and with identical standard deviation σ), the number of standard deviations (n_σ) separating the two hypotheses can be written as n_σ = |µ_1 − µ_0|/σ (corresponding to a one-sided hypothesis test and a one-sided conversion from the p-value). Sampling each distribution with N_p pseudo-experiments results in the following uncertainties on the mean and the standard deviation (see for example [47]):

\Delta\mu = \frac{\sigma}{\sqrt{N_p}}\,, \qquad (B.1)

\Delta\sigma = \frac{\sigma}{\sqrt{2(N_p - 1)}}\,. \qquad (B.2)

Since the combination of the quantities is linear, we can perform simple error propagation, so that the relative uncertainty in the significance becomes (with ⊕ denoting a sum in quadrature)

\frac{\Delta n_\sigma}{n_\sigma} = \frac{\Delta\sigma}{\sigma} \oplus \frac{\Delta|\mu_1 - \mu_0|}{|\mu_1 - \mu_0|}\,. \qquad (B.3)

Using

\Delta|\mu_1 - \mu_0| = \Delta\mu_0 \oplus \Delta\mu_1 = \sqrt{\frac{2}{N_p}}\,\sigma\,, \qquad (B.4)

the second term simplifies to

\frac{\Delta|\mu_1 - \mu_0|}{|\mu_1 - \mu_0|} = \sqrt{\frac{2}{N_p}}\,\frac{\sigma}{|\mu_1 - \mu_0|} = \sqrt{\frac{2}{N_p}}\,\frac{1}{n_\sigma}\,. \qquad (B.5)

Substituting Equations (B.2) and (B.5) into Equation (B.3) yields

\frac{\Delta n_\sigma}{n_\sigma} = \frac{1}{\sqrt{2(N_p - 1)}} \oplus \sqrt{\frac{2}{N_p}}\,\frac{1}{n_\sigma} = \sqrt{\frac{1}{2(N_p - 1)} + \frac{2}{N_p\, n_\sigma^2}}\,. \qquad (B.6)

The absolute error on the number of standard deviations, and its approximation for large N_p, then follow immediately as

\Delta n_\sigma = \sqrt{\frac{n_\sigma^2}{2(N_p - 1)} + \frac{2}{N_p}} \;\xrightarrow{\;N_p \gg 1\;}\; \frac{1}{\sqrt{N_p}} \sqrt{\frac{n_\sigma^2}{2} + 2}\,. \qquad (B.7)
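Equation (B.7) is straightforward to verify empirically; the following sketch draws ensembles of pseudo-experiments and compares the observed scatter of the estimated significance with the prediction:

```python
import numpy as np

rng = np.random.default_rng(3)
N_P, N_SIGMA, TRIALS = 200, 5.0, 2000

# Scatter of the estimated significance across ensembles of N_p
# pseudo-experiments per hypothesis, vs. the prediction of Eq. (B.7).
est = []
for _ in range(TRIALS):
    t0 = rng.normal(0.0, 1.0, N_P)          # H0 pseudo-experiments
    t1 = rng.normal(N_SIGMA, 1.0, N_P)      # H1 pseudo-experiments
    est.append(abs(t1.mean() - t0.mean()) / t0.std(ddof=1))
predicted = np.sqrt(N_SIGMA**2 / (2 * (N_P - 1)) + 2.0 / N_P)
print(np.std(est), predicted)               # both ~0.27 for these settings
```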
References

[1] A. Ishihara, The IceCube Upgrade - Design and Science Goals, in: 36th International Cosmic Ray Conference (ICRC 2019), Madison, Wisconsin, USA, July 24 - August 1, 2019. arXiv:1907.11699.
[2] M. G. Aartsen, et al., PINGU: a vision for neutrino and particle physics at the South Pole, J. Phys. G44 (5) (2017) 054006. arXiv:1607.02671, doi:10.1088/1361-6471/44/5/054006.
[3] M. G. Aartsen, et al., Letter of intent: the Precision IceCube Next Generation Upgrade (PINGU), (2017). arXiv:1401.2046v2.
[4] S. Adrian-Martinez, et al., Letter of intent for KM3NeT 2.0, J. Phys. G43 (8) (2016) 084001. arXiv:1601.07459, doi:10.1088/0954-3899/43/8/084001.
[5] T. Glüsenkamp, A unified perspective on modified Poisson likelihoods for limited Monte Carlo data. arXiv:1902.08831.
[6] C. A. Argüelles, A. Schneider, T. Yuan, A binned likelihood for stochastic models, JHEP 06 (2019) 030. arXiv:1901.04645, doi:10.1007/JHEP06(2019)030.
[7] R. Barlow, Introduction to statistical issues in particle physics, in: Proceedings, PHYSTAT 2003: Statistical problems in particle physics, astrophysics and cosmology, Stanford, USA, September 8-11, 2003, C030908 (2003) MOAT002. arXiv:physics/0311105.
[8] G. Cowan, Statistical Data Analysis, Oxford University Press, 1998.
[9] R. H. Byrd, P. Lu, J. Nocedal, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput. 16 (1995) 1190-1208.
[10] G. Cowan, K. Cranmer, E. Gross, O. Vitells, Asymptotic formulae for likelihood-based tests of new physics, Eur. Phys. J. C71 (2011) 1554, [Erratum: Eur. Phys. J. C73 (2013) 2501]. arXiv:1007.1727, doi:10.1140/epjc/s10052-011-1554-0.
[11] M. Blennow, P. Coloma, P. Huber, T. Schwetz, Quantifying the sensitivity of oscillation experiments to the neutrino mass ordering, JHEP 2014 (3) (2014) 28. doi:10.1007/JHEP03(2014)028.
[12] K. S. Cranmer, Kernel estimation in high-energy physics, Comput. Phys. Commun. 136 (2001) 198-207. arXiv:hep-ex/0011057, doi:10.1016/S0010-4655(00)00243-5.
[13] D. W. Scott, On optimal and data-based histograms, Biometrika 66 (3) (1979) 605. doi:10.1093/biomet/66.3.605.
[14] Y. Fukuda, et al., Evidence for oscillation of atmospheric neutrinos, Phys. Rev. Lett. 81 (1998) 1562-1567. arXiv:hep-ex/9807003, doi:10.1103/PhysRevLett.81.1562.
[15] Q. R. Ahmad, et al., Measurement of the rate of ν_e + d → p + p + e⁻ interactions produced by ⁸B solar neutrinos at the Sudbury Neutrino Observatory, Phys. Rev. Lett. 87 (2001) 071301. arXiv:nucl-ex/0106015, doi:10.1103/PhysRevLett.87.071301.
[16] B. Aharmim, et al., Combined analysis of all three phases of solar neutrino data from the Sudbury Neutrino Observatory, Phys. Rev. C88 (2013) 025501. arXiv:1109.0763, doi:10.1103/PhysRevC.88.025501.
[17] F. Capozzi, E. Lisi, A. Marrone, A. Palazzo, Current unknowns in the three neutrino framework, Prog. Part. Nucl. Phys. 102 (2018) 48-72. arXiv:1804.09678, doi:10.1016/j.ppnp.2018.05.005.
[18] P. F. De Salas, S. Gariazzo, O. Mena, C. A. Ternes, M. Tórtola, Neutrino Mass Ordering from Oscillations and Beyond: 2018 Status and Future Prospects, Front. Astron. Space Sci. 5 (2018) 36. arXiv:1806.11051, doi:10.3389/fspas.2018.00036.
[19] …
[20] I. Esteban, M. C. Gonzalez-Garcia, A. Hernandez-Cabezudo, M. Maltoni, T. Schwetz, Global analysis of three-flavour neutrino oscillations: synergies and tensions in the determination of θ₂₃, δ_CP, and the mass ordering, JHEP 01 (2019) 106. arXiv:1811.05487, doi:10.1007/JHEP01(2019)106.
[21] C. Patrignani, et al., Review of particle physics, Chin. Phys. C40 (10) (2016) 100001. doi:10.1088/1674-1137/40/10/100001.
[22] L. Wolfenstein, Neutrino oscillations in matter, Phys. Rev. D17 (1978) 2369.
[23] S. P. Mikheev, A. Y. Smirnov, Resonant amplification of neutrino oscillations in matter and solar neutrino spectroscopy, Nuovo Cim. C9 (1986) 17-26.
[24] S. T. Petcov, S. Toshev, Three neutrino oscillations in matter: analytical results in the adiabatic approximation, Phys. Lett. B187 (1987) 120-126. doi:10.1016/0370-2693(87)90083-9.
[25] E. K. Akhmedov, M. Maltoni, A. Yu. Smirnov, Oscillations of high energy neutrinos in matter: precise formalism and parametric resonance, Phys. Rev. Lett. 95 (2005) 211801. arXiv:hep-ph/0506064, doi:10.1103/PhysRevLett.95.211801.
[26] K. Abe, et al., Letter of intent: the Hyper-Kamiokande experiment - detector design and physics potential. arXiv:1109.3262.
[27] J. Neyman, E. S. Pearson, On the problem of the most efficient tests of statistical hypotheses, Philosophical Transactions of the Royal Society of London A 231 (694-706) (1933) 289-337. doi:10.1098/rsta.1933.0009.
[28] M. C. Gonzalez-Garcia, M. Maltoni, T. Schwetz, Updated fit to three neutrino mixing: status of leptonic CP violation, JHEP 2014 (11) (2014) 52. doi:10.1007/JHEP11(2014)052.
[29] M. Honda, et al., Atmospheric neutrino flux calculation using the NRLMSISE-00 atmospheric model, Phys. Rev. D92 (2015) 023004. arXiv:1502.03916, doi:10.1103/PhysRevD.92.023004.
[30] …
[31] G. D. Barr, T. K. Gaisser, S. Robbins, T. Stanev, Uncertainties in atmospheric neutrino fluxes, Phys. Rev. D74 (2006) 094009. arXiv:astro-ph/0611266, doi:10.1103/PhysRevD.74.094009.
[32] J. Evans, D. G. Gamez, S. D. Porzio, S. Söldner-Rembold, S. Wren, Uncertainties in atmospheric muon-neutrino fluxes arising from cosmic-ray primaries, Phys. Rev. D95 (2) (2017) 023012. arXiv:1612.03219, doi:10.1103/PhysRevD.95.023012.
[33] V. Barger, K. Whisnant, S. Pakvasa, R. J. N. Phillips, Matter effects on three-neutrino oscillations, Phys. Rev. D22 (1980) 2718-2726. doi:10.1103/PhysRevD.22.2718.
[34] R. Wendell, Prob3++ software for computing three flavor neutrino oscillation probabilities, 2012.
[35] A. M. Dziewonski, D. L. Anderson, Preliminary reference Earth model, Physics of the Earth and Planetary Interiors 25 (4) (1981) 297-356. doi:10.1016/0031-9201(81)90046-7.
[36] A. Bulmahn, M. H. Reno, Secondary atmospheric tau neutrino production, Phys. Rev. D82 (2010) 057302. arXiv:1007.4989, doi:10.1103/PhysRevD.82.057302.
[37] S. Wren, Neutrino Mass Ordering Studies with IceCube-DeepCore, Ph.D. thesis, Manchester U. (2018).
[38] R. G. Calland, A. C. Kaboth, D. Payne, Accelerated event-by-event neutrino oscillation reweighting with matter effects on a GPU, JINST 9 (2014) P04016. arXiv:1311.7579, doi:10.1088/1748-0221/9/04/P04016.
[39] J. Nickolls, I. Buck, M. Garland, K. Skadron, Scalable parallel programming with CUDA, 2008 IEEE Hot Chips 20 Symposium (HCS) (2008) 1-2.
[40] M. Wallraff, C. Wiebusch, Calculation of oscillation probabilities of atmospheric neutrinos using nuCraft, Comput. Phys. Commun. 197 (2015) 185-189. arXiv:1409.1387, doi:10.1016/j.cpc.2015.07.010.
[41] T. K. Gaisser, T. Stanev, Path length distributions of atmospheric neutrinos, Phys. Rev. D57 (1998) 1977-1982. doi:10.1103/PhysRevD.57.1977.
[42] I. S. Abramson, On bandwidth variation in kernel estimates - a square root law, Ann. Statist. 10 (4) (1982) 1217-1223. doi:10.1214/aos/1176345986.
[43] Z. I. Botev, J. F. Grotowski, D. P. Kroese, Kernel density estimation via diffusion, Ann. Statist. 38 (5) (2010) 2916-2957. doi:10.1214/10-AOS799.
[44] S. J. Sheather, M. C. Jones, A reliable data-based bandwidth selection method for kernel density estimation, Journal of the Royal Statistical Society, Series B (Methodological) 53 (3) (1991) 683-690.
[45] J. A. Formaggio, G. P. Zeller, From eV to EeV: neutrino cross sections across energy scales, Rev. Mod. Phys. 84 (2012) 1307-1341. arXiv:1305.7513, doi:10.1103/RevModPhys.84.1307.
[46] A. Gazizov, M. Kowalski, K. S. Kuzmin, V. A. Naumov, C. Spiering, Neutrino-nucleon cross sections at energies of megaton-scale detectors, EPJ Web Conf. 116 (2016) 08003. arXiv:1604.02092, doi:10.1051/epjconf/201611608003.
[47] D. Sivia, Data Analysis: A Bayesian Tutorial, Oxford University Press, 2006.