Computational Techniques for the Analysis of Small Signals in High-Statistics Neutrino Oscillation Experiments
IceCube Collaboration, M. G. Aartsen, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, M. Ahrens, I. Al Samarai, D. Altmann, K. Andeen, T. Anderson, I. Ansseau, G. Anton, C. Argüelles, T. C. Arlen, J. Auffenberg, S. Axani, H. Bagherpour, X. Bai, A. Balagopal V., J. P. Barron, I. Bartos, S. W. Barwick, V. Baum, R. Bay, J. J. Beatty, J. Becker Tjus, K.-H. Becker, S. BenZvi, D. Berley, E. Bernardini, D. Z. Besson, G. Binder, D. Bindig, E. Blaufuss, S. Blot, C. Bohm, M. Bohmer, M. Börner, F. Bos, S. Böser, O. Botner, E. Bourbeau, J. Bourbeau, F. Bradascio, J. Braun, M. Brenzke, H.-P. Bretz, S. Bron, J. Brostean-Kaiser, A. Burgman, R. S. Busse, T. Carver, E. Cheung, D. Chirkin, A. Christov, K. Clark, L. Classen, G. H. Collin, J. M. Conrad, P. Coppin, P. Correa, D. F. Cowen, R. Cross, P. Dave, M. Day, J. P. A. M. de André, C. De Clercq, J. J. DeLaunay, H. Dembinski, S. De Ridder, P. Desiati, K. D. de Vries, G. de Wasseige, M. de With, T. DeYoung, J. C. Díaz-Vélez, V. di Lorenzo, H. Dujmovic, J. P. Dumm, M. Dunkman, M. A. DuVernois, E. Dvorak, B. Eberhardt, T. Ehrhardt, B. Eichmann, P. Eller, R. Engel, J. J. Evans, P. A. Evenson, S. Fahey, A. R. Fazely, J. Felde, K. Filimonov, C. Finley, S. Flis, A. Franckowiak, E. Friedman, A. Fritz, T. K. Gaisser, et al. (272 additional authors not shown)
Abstract
The current and upcoming generation of Very Large Volume Neutrino Telescopes—collecting unprecedented quantities of neutrino events—can be used to explore subtle effects in oscillation physics, such as (but not restricted to) the neutrino mass ordering. The sensitivity of an experiment to these effects can be estimated from Monte Carlo simulations. With the high number of events that will be collected, there is a trade-off between the computational expense of running such simulations and the inherent statistical uncertainty in the determined values. In such a scenario, it becomes impractical to produce and use adequately-sized sets of simulated events with traditional methods, such as Monte Carlo weighting. In this work we present a staged approach to the generation of binned event distributions in order to overcome these challenges. By combining multiple integration and smoothing techniques that address the limited statistics of the simulation, this approach arrives at reliable analysis results using modest computational resources.
Keywords:
Data Analysis, Monte Carlo, MC, Statistics, Smoothing, KDE, Neutrino, Neutrino Mass Ordering, Detector, VLVνT
1. Introduction
By virtue of their multi-megaton effective mass paired with the magnitude of the atmospheric neutrino flux, the next generation of Very Large Volume Neutrino Telescopes (VLVνTs) dedicated to neutrino oscillation physics, such as the IceCube Upgrade, PINGU, and ORCA [1, 2, 3, 4], will record tens of thousands of GeV-scale neutrino interactions. These large-scale water or ice Cherenkov detectors do not have the ability to unambiguously distinguish between neutrino flavors and interaction types on an event-by-event basis. Even so, their high-statistics data samples can be used to explore effects that are small compared to the background, such as the tau neutrino appearance rate, the ordering of the neutrino mass eigenstates (NMO), or potential neutrino physics beyond the Standard Model.

All such physics analyses are carried out by comparing the observed event distributions with predictions (hereafter referred to as templates) obtained from Monte Carlo (MC) simulations. The physical phenomena listed above will appear as statistical (in)compatibilities of templates with differences in event counts as small as a few percent. An inherent problem when trying to quantify these deviations in high-statistics data sets is that the templates must be described with an accuracy better than the magnitude of the effect being investigated. A limiting factor to the accuracy is the amount of MC simulation available, which is in turn constrained by the availability of computing resources. This particularly applies during the design optimization phase of a planned experiment, which entails performance assessments of multiple detector variants.

With an adequate machinery at hand to produce templates, extracting the relevant physical and systematic parameters typically proceeds via maximizing the likelihood of obtaining the observed data under a given hypothesis. A common feature of all statistical methods is that the templates need to be generated for a multitude of parameter combinations, often thousands or even millions. This process needs to be accurate, but also fast, which typically prohibits the reproduction of the full MC sample for each template.
In this article, we present an approach that allows for the fast creation of accurate templates even from MC sets that are several orders of magnitude smaller than those necessary when using simpler methods. An alternative approach that does not remove template inaccuracies but rather mitigates their impact on statistical inference is the inclusion of the inherent MC uncertainty in the fit statistic; recent overviews can be found in [5, 6]. Our approach was used to calculate the expected sensitivities for atmospheric neutrino oscillation analyses with the proposed PINGU experiment [2, 3], and a similar approach was taken in low-energy sensitivity studies for the KM3NeT design [4]. Throughout this article, we will use the NMO analysis for a generic VLVνT as an example to illustrate our methods, though it is applicable in a wider range of atmospheric neutrino oscillation analyses, and, in parts and with limitations, to other experiments. Section 2 details the computational challenge at hand, followed by a brief introduction of the example NMO analysis in Section 3. Our approach to overcome this challenge is presented in Section 4 and Section 5, followed by a discussion of the validity of the approach in Section 6. The performance is compared to other typical analysis methods in Section 7, while the computational burden is discussed in Section 8. Section 9 concludes with a brief summary of the article. Finally, in Appendix A we provide details about the VLVνT toy model that we use to benchmark the performance of all considered analysis approaches.
2. Computational Challenge
The statistical comparison between experimental data and parametric or MC-based predictions allows inference of the values of physics parameters under study. It typically proceeds via a likelihood analysis. We first discuss its most general concepts and variants, then detail the computational requirements on MC generation, and finally outline two standard methods of mitigating these computational burdens.
Different types of likelihood analyses in particle physics share common features. An experiment records data which are used to reconstruct any observables expected to carry the imprint of the physical phenomenon under study. A selection (triggering, filtering, etc.) is applied in order to enhance the sought signal. Before performing statistical inference, we need a theoretical model of the observable distributions to compare to the data. Often this includes complicated processes like particle interactions and detector response that require the use of MC methods. Hence, not only the data, but also the model is subject to statistical fluctuations. However, once an appropriate amount of MC events is available, the data x_i can be compared to templates—theoretical distributions—for different physics parameter values θ via a likelihood function, L(x_1, x_2, ..., x_n | θ) = ∏_i P(x_i | θ), where P(x_i | θ) is the probability to observe the data x_i assuming that θ corresponds to given values of the physics parameters. See, for example, [7] for a more complete overview. If the total number of measurements, n, is also a random quantity, the likelihood function can be extended to include the distribution of n [8]. Statistical inference then typically proceeds via the maximum likelihood estimate (MLE) θ̂, i.e., the parameter values which maximize L.

The methods presented in this paper depend on a likelihood function applied to binned data. One usually employs either a Poisson likelihood or a χ² approximation thereof (see for example [8]); our methods are independent of this choice, but we use the latter in the example presented in this article. Binning the data hides physics signatures smaller than the bin size and thus introduces a loss in sensitivity. This can be mitigated by reducing bin sizes, but smaller bins come at the cost of reduced—and possibly insufficient—MC statistics in each bin.

Apart from the physics parameters of interest, a model often comes with nuisance parameters that are also included in the likelihood function. This further increases the dimensionality of the MLE search, which relies on numerical routines for multidimensional optimization problems. For the NMO studies, we use the L-BFGS-B algorithm [9] in a D = 8 dimensional parameter space (see Table 1). The number of steps necessary for the optimization to converge depends on the particular analysis and model being used (i.e., the details of the resulting likelihood landscape); in the case of our toy example, an average of ∼10² templates (one per realization of θ) were needed to converge.

The problems associated with generating such a large number of templates are exacerbated when estimating the median sensitivity of an experiment. The above process needs to be applied to an ensemble of random toy MC pseudo-experiments of size N_p. (Each pseudo-experiment corresponds to a statistical fluctuation of the expected experimental outcome as predicted by MC events. For certain problems, the generation of pseudo-experiments can be skipped by applying the Asimov approximation [10, 3].) The comparison of test statistic distributions T (see Section 3 for details) can be used to estimate a significance value n_σ at which one hypothesis is preferred over the alternative. If T is Gaussian distributed (while not a prediction from the model, a near-Gaussian distribution of the test statistic is observed in most NMO studies [3, 4, 11]), the uncertainty Δn_σ to which n_σ can be determined depends upon the number of pseudo-experiments N_p as (see Appendix B for details):

Δn_σ = (1/√N_p) · √(1 + n_σ²/2). (1)

With an absolute uncertainty Δn_σ at the 1 % level, determining the sensitivity of an experiment at a confidence level of 99.7 %
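As a numerical illustration of Equation (1), the following minimal Python sketch (our own helper, not part of any analysis framework) inverts the relation to estimate the required ensemble size:

```python
import math

def required_pseudo_experiments(n_sigma, delta_n_sigma):
    """Invert Eq. (1): number of pseudo-experiments N_p needed to pin
    down a significance n_sigma to an absolute uncertainty delta_n_sigma,
    assuming a Gaussian-distributed test statistic T."""
    return math.ceil((1.0 + 0.5 * n_sigma**2) / delta_n_sigma**2)

# Determining a 3 sigma sensitivity to an absolute precision of 0.03
# already calls for several thousand pseudo-experiments:
print(required_pseudo_experiments(3.0, 0.03))  # -> 6112
```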
(n_σ = 3) requires O(10⁴) pseudo-experiments.

Finally, the event count expectations, μ, for all bins in the templates must be determined at the same level as the physics effects being investigated, which requires at least N = 10⁴ MC events per bin to study sub-percent variations arising in a comparison of the two NMO realizations. At the same time, the number of bins used in any histograms must be sufficiently large:
O(10³) bins are required to resolve the distinct features of the NMO signature; otherwise the analysis cannot exploit the full potential of the experiment. Therefore, the brute-force approach to our example case requires a very large number of neutrino events to be simulated: O(10⁷) events for each of the O(10²) values of θ probed during the optimization process for each of O(10⁴) pseudo-experiments—a grand total of O(10¹³) events. Even if the time to simulate and reconstruct a single event is 1 s (a very optimistic estimate for our experiment), full fits to all pseudo-experiments under the two ordering hypotheses would require O(10⁹) CPU-core-hours—i.e., a single analysis would keep 10⁴ CPU cores busy for 30 years, assuming the algorithm can be parallelized perfectly—a restriction clearly prohibitive to performing any study.

Various state-of-the-art methods are employed to mitigate the high computational costs. In the remainder of this section, we briefly present the main ideas behind these methods and give a conceptual introduction to how they are embedded in the approach we introduce in this article.

The standard event-by-event MC weighting technique avoids repeated simulation and reconstruction of events every time a value of a nuisance parameter is changed. This is possible, first, because the physics processes of neutrino production in the atmosphere (flux), their propagation involving flavor oscillation, and their detection and reconstruction are independent. Each of these processes, therefore, can be treated separately.

For a process that has an a priori known parametric form (the parameter values of which are not necessarily known), the outcome of that process can be predicted by directly evaluating the parametrization at a set of input values. In our case, both the neutrino flux prediction and flavor oscillations fall into this category. The second category of processes are those that require MC simulation. Predictions of the detection and reconstruction of neutrinos fall into this category because we do not have a complete characterization of the detector's response.

This leads to the standard event-by-event reweighting scheme, which estimates the expected final-level event counts due to all processes by simulating a set of MC neutrinos (capturing the effects of detection and reconstruction), assigning to each a weight derived from flux and oscillation calculations, and binning the events' weights in some set of observable dimensions, as illustrated in the top row of Figure 1.

In detail: each MC neutrino—generated with a flavor β and a set of true observables θ_ν^true—is assigned a posteriori the weight w_β corresponding to the sum over the atmospheric fluxes Φ_α(θ_flux; θ_ν^true) of all initial flavors α including the probabilities P^osc_α→β(θ_osc; θ_ν^true) to oscillate into a neutrino of the flavor β:

w_β ∝ Σ_α Φ_α(θ_flux; θ_ν^true) × P^osc_α→β(θ_osc; θ_ν^true).
Figure 1: Operating principles of direct histogramming (top row) and direct KDE (bottom row), which both follow the same weighting scheme for MC events but arrive at the template differently, as explained in the text.
In the above, θ_flux and θ_osc are nuisance parameters affecting neutrino fluxes and oscillation probabilities. For a given realization of non-parametric nuisance parameters θ_det affecting the detector response, applying the detector response simulation (including event reconstruction, classification, etc.) to each incident neutrino results in a set of reconstructed observables θ_ν^reco, whose distribution can be compared to real data. In practice, techniques dealing with the discrete nature of detector nuisance parameters may be required. Here, however, we consider only a single realization of the detector parameters (θ_det fixed)—a simplification without any loss of the general applicability of the methods discussed.

Since the process of oscillation is decoupled from the detector simulation, only a single MC set is required to generate the templates for the different hypotheses under test (e.g., the two mass orderings); only the weights w_β must be recomputed. This eliminates statistical fluctuations between the otherwise disjoint MC samples. However, even with a single MC set, an undersampling of the phase space of the model can result in a bias.

Binning the weights in (a relevant subset of) θ_ν^reco corresponds to performing MC integration of the experiment's event distribution. While the convergence rate of this approach does not depend on the dimensionality of the integral, errors of the estimates scale as 1/√N, where N is the number of MC events that fall in a bin.

As it is often infeasible to generate enough MC events to obtain sufficient accuracy in the MC integration process, smoothing of the final event distributions is a common practice. This, however, can be computationally slow and can introduce artificial features which may incorrectly reduce or enhance the signal. One such smoothing technique is kernel density estimation (KDE) [12]. Specifically, we apply adaptive-bandwidth KDE directly to the weighted MC to compare a state-of-the-art version of this method to the methods we introduce in this paper in Section 4. Here, a Gaussian kernel with a width calculated as described in [13] is centered at each MC event's reconstruction information. A weighted sum over the kernels of all events then delivers the smoothed distribution as shown in the bottom row of Figure 1, which will be compared to the distribution our method yields.

Shortcomings of the direct application of the two techniques discussed above—the first is the weighting method alone (labeled direct histogramming), while the second applies additional smoothing using adaptive kernel density estimates (labeled direct KDE)—can be overcome using the staged approach. Before providing an overview of the staged approach in Section 4, we briefly introduce the key points of the example NMO analysis used to illustrate the benefits of the approach with respect to the standard techniques.
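To make the direct histogramming scheme of Figure 1 concrete, the following sketch bins per-event weights computed from flux and oscillation functions; the names and signatures are hypothetical stand-ins chosen for illustration, and production code would handle all flavor channels and nuisance parameters:

```python
import numpy as np

def direct_histogram(events, flux, p_osc, theta_flux, theta_osc, bin_edges):
    """Weight each MC event by the sum over initial flavors of
    (atmospheric flux) x (oscillation probability), then bin the
    weights in the reconstructed observables (cf. Figure 1, top row).

    events: dict of per-event float arrays ('e_true', 'cosz_true',
            'e_reco', 'cosz_reco', 'flavor') plus a generation
            weight 'w_gen'; flux and p_osc are assumed callables.
    """
    w = np.zeros_like(events['e_true'])
    for alpha in ('nu_e', 'nu_mu'):  # atmospheric nu_tau flux is negligible
        w += (flux(alpha, theta_flux, events['e_true'], events['cosz_true'])
              * p_osc(alpha, events['flavor'], theta_osc,
                      events['e_true'], events['cosz_true']))
    w *= events['w_gen']
    template, _ = np.histogramdd(
        np.column_stack([events['e_reco'], events['cosz_reco']]),
        bins=bin_edges, weights=w)
    return template
```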
3. NMO Analysis
The observation of neutrino oscillations and the demonstration of the neutrinos' non-zero masses [14, 15] represented a major step forward in the field of particle physics. While current experimental techniques have not yet allowed for a direct measurement of the tiny masses, the magnitudes of their relative differences (mass splittings) are well known.

The ordering of these neutrino mass states (neutrino mass ordering, NMO) presents a difficult challenge. A powerful method to determine this ordering is the observation of matter effects on neutrinos. Owing to the high electron density of the Sun, observations of solar neutrinos have shown the second mass state to be heavier than the first [16]. It remains an open question, however, whether the third state is the most or least massive. The former scenario is referred to as the normal ordering (NO), while the second is called inverted ordering (IO). There is currently no experimental evidence decisively excluding either of the two scenarios [17, 18, 19, 20].

The study of oscillations of atmospheric neutrinos provides a promising route toward a decisive measurement of the NMO [21, 2, 3, 4]. The path length (or baseline) varies between 20 km for vertically downward going and 12 700 km for straight upward going atmospheric neutrinos, with the latter crossing the full diameter of the Earth. With energies ranging from the MeV up to the TeV scale, combinations of baselines and energies varying over several orders of magnitude are probed. For the longest baseline, the very pronounced first oscillation maximum of muon neutrinos occurs at a neutrino energy of around 25 GeV, followed by subsequent maxima at lower energies.

The electron neutrinos' coupling to electrons (coherent forward scattering) in the Earth creates an effective matter potential which leads to resonant behavior of the transition probabilities at energies around 5 GeV, known as matter resonances [22, 23, 24]. Furthermore, the Earth's specific density profile encountered by the neutrinos can also parametrically enhance their oscillations [25]. This enhancement with respect to oscillations proceeding in vacuum occurs for neutrinos if the NMO is normal, otherwise for anti-neutrinos.

The NMO measurement potential of VLVνTs is based on this asymmetry. Two major aspects are obstructive, however. The first is the inability of VLVνTs to differentiate between neutrinos and anti-neutrinos. This reduces the effect to the respective difference in atmospheric fluxes and interaction cross sections. Energy and directional resolutions of the experiment present the second hurdle. Both are typically prohibitive to resolving the fast variations of the oscillation pattern at the relevant energies. As a consequence, the observable effect is reduced to at most a few percent over the relevant energy and zenith range (see Figure 2), requiring neutrino telescopes with effective masses on the order of megatons to achieve sufficient event statistics.

Proponents of various VLVνTs in ice and water have performed studies confirming this idea, finding that a 3σ (median) sensitivity to the NMO can be achieved within five years of exposure time even in less favorable regions of the neutrino oscillation parameter space [2, 4, 26].
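The baselines quoted above follow directly from the geometry of the Earth; a short sketch (Earth radius and production height are rounded values assumed here for illustration):

```python
import numpy as np

R_EARTH = 6371.0  # km, mean Earth radius (assumed round number)
H_ATM = 20.0      # km, assumed atmospheric production height

def baseline_km(cos_zenith):
    """Path length from the production point in the atmosphere to a
    detector at the Earth's surface, for a given cosine of the zenith
    angle (+1: vertically downward going, -1: straight upward going)."""
    c = np.asarray(cos_zenith, dtype=float)
    r = R_EARTH + H_ATM
    return np.sqrt(r**2 - R_EARTH**2 * (1.0 - c**2)) - R_EARTH * c

print(baseline_km([1.0, -1.0]))  # ~[20., 12762.] km
```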
Figure 2: Expected fractional event rate difference between nominal NO and IO inputs (from Table 1) for the toy model. Cascades are shown on the left, tracks on the right.

As the oscillation probabilities directly depend on neutrino energy E_true, oscillation baseline (∝ cos ϑ_true), and flavor, we split our data into bins of log E_reco, cos ϑ_reco, and event class. (The subscript "true" is used to specify the true variables of the neutrinos and to distinguish these from the reconstructed variables, denoted with a subscript "reco", which will be introduced in Section 4.1.) It is important to choose a binning fine enough to resolve the NMO signature, while coarse enough to retain a sufficient amount of MC statistics per bin, as motivated in Section 2. We have found the division into (40 × 40 × 2) bins to be suitable, covering a range of E_reco from 1 GeV to 80 GeV, the whole sky (cos ϑ_reco from −1 to 1), and the two event classes cascades and tracks. Using this binning, for our toy detector introduced in Appendix A, Figure 2 shows the expected fractional event rate difference (R_NO − R_IO)/R_NO, where R_NO(IO) is the expected event rate for true NO (IO), based on the two sets of nominal model parameter values given in Table 1.

As the most powerful test statistic for distinguishing two simple hypotheses [27], the logarithm of the likelihood ratio

T = −2 log [ max_{θ∈NO} L(n | μ(θ)) / max_{θ∈IO} L(n | μ(θ)) ] (2)

is also useful in assessing the ability of an experiment to discriminate between the two (composite) NMO hypotheses at a given confidence level. It is representative of the degree at which observing the data n under the NO hypothesis is favored over observing it under the alternate IO hypothesis. The observed spectrum at the detector, n, however, is a convolution of the atmospheric neutrino flux, the effects of neutrino oscillations that bear the NMO signature, the neutrino interaction and detection processes, and the event reconstruction and classification procedure. Each one of these effects is accompanied by systematic uncertainties. As their impact on the predicted spectrum μ is modeled, the systematic uncertainties directly feed in to the likelihood L of the observation.

Figure 3: Example distributions of Equation (3). The distribution on the left (solid line) represents the case of NO pseudo-data, while the distribution on the right (dashed) is obtained when the pseudo-data is taken from the IO. Here, 1 − p corresponds to the confidence level at which the IO is correctly rejected with a probability of 50 %.

For this study, we limit ourselves to a simplified treatment using χ² statistics and the Asimov dataset. In this approach, the projected median sensitivity is calculated from the average experimental outcomes under the two possible NMO hypotheses, as opposed to performing extensive ensemble tests with randomly fluctuated pseudo-experiments. The log-likelihood expression is a simple χ², and Equation (2) can be rewritten as the difference

Δχ² = χ²_NO − χ²_IO. (3)

Here, χ²_NO is the minimum χ² between model predictions and data, with all nuisance parameters profiled out using NO priors (χ²_IO follows analogously).

An illustration of example distributions of (3) for the two different NMO hypotheses is shown in Figure 3. The goal is to obtain a p-value p which quantifies the statistical compatibility between the hypothesis that is tested and the one assumed to be true. In the ensemble approach, the two distributions would need to be built up by fitting pseudo-experiments. In the Asimov approach, however, certain assumptions about the distribution of (3) allow adopting the expression √|Δχ²| as a sensitivity proxy [11], determining the significance at which the wrong ordering can be excluded without the need for pseudo-experiments.

For the profiling of the nuisance parameters (any free model parameters), a numerical algorithm minimizes the χ² metric. Whenever external constraints are applied to such parameters, we add those to the χ² value as penalty terms (priors).
While the presence of these penalty terms is meant to illustrate a typical approach to problems of this sort, their sizes do not follow any precise physical motivation. Table 1 gives an overview of all used model parameters, their nominal values for NO and IO, and priors (where applied).

Parameter            Nominal value (NO)   Nominal value (IO)   Prior
ν_e/ν_μ flux ratio   1.0                  1.0                  Gaussian
ν/ν̄ flux ratio       1.0                  1.0                  Gaussian
θ_13 (°)             8.5                  8.5                  Gaussian
θ_23 (°)             42.3                 49.5                 non-Gaussian [28, 29]
Δm²_31 (eV²)         0.00246              −0.00237             Gaussian [28, 29]

Table 1: Summary of model parameters in the example NMO analysis, including their nominal values for the two NMO hypotheses and the ±1σ bounds used as external constraints (priors). The flux parameters are applied to atmospheric neutrino flux predictions from [30], following the procedure laid out in Section 5.1. The values of the oscillation parameters are based on a recent global fit [28, 29].
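A compact sketch of the fit metric just described—a binned χ² with Gaussian penalty terms, profiled under each ordering—might look as follows. The function and parameter names (template_fn, priors, x0) are ours and purely illustrative; the actual analysis minimizes over the full parameter set of Table 1 using the staged template machinery of Sections 4 and 5:

```python
import numpy as np
from scipy.optimize import minimize

def chi2_with_priors(params, data, template_fn, priors):
    """Binned chi^2 between data and the template mu(theta), plus
    Gaussian penalty terms for externally constrained parameters.
    priors: {parameter index: (mean, sigma)} for constrained entries."""
    mu = template_fn(params)
    value = np.sum((data - mu) ** 2 / mu)
    value += sum(((params[i] - m) / s) ** 2 for i, (m, s) in priors.items())
    return value

def asimov_sensitivity(data, template_no, template_io, x0, priors):
    # Profile the nuisance parameters under each ordering hypothesis
    # and use sqrt(|delta chi^2|) as the sensitivity proxy of Eq. (3).
    fit_no = minimize(chi2_with_priors, x0,
                      args=(data, template_no, priors), method='L-BFGS-B')
    fit_io = minimize(chi2_with_priors, x0,
                      args=(data, template_io, priors), method='L-BFGS-B')
    return np.sqrt(abs(fit_no.fun - fit_io.fun))
```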
4. Overview of the Staged Approach
The method to obtain templates we describe in this article is divided into four independent parts, referred to as stages. The four stages (flux, oscillation, detection, and reconstruction) and how they are used to obtain event templates are summarized in this section, while more technical descriptions of each stage follow in Section 5.
Templates for our example case of an NMO analysis using a VLVνT are produced efficiently and accurately using the following four stages. Each stage represents a collection of related physical effects. Beginning with the flux computed by the initial stage, each subsequent stage applies a transformation to the output of the previous stage.

1. Flux

The expected unoscillated atmospheric neutrino fluxes are taken from an external model [30]. Flux values from this model are provided in the form of tables with discrete steps in both neutrino energy, E_true, and direction, here the cosine of the zenith angle, cos ϑ_true. Therefore, an interpolation must be performed for values between those tabulated. Crucially, these tables give the integrated flux across the bins, which does not necessarily coincide with the flux value at the bin center. Accordingly, we use an integral-preserving (IP) interpolation. In general, atmospheric flux models require external inputs including primary cosmic ray measurements, atmospheric density models, and hadronic interaction measurements. Many associated uncertainties are known [31, 32] and need to be included as nuisance parameters in an analysis.

Figure 4: Illustration of the staged approach for obtaining event templates, here for simplicity using a characterization in one dimension (energy) only. Steps 1, 2, and 3 are in true energy (E_true); the product of these yields the expected event distribution (lower left). Smearing this spectrum with energy-dependent energy resolution functions (step 4) gives the reconstructed event rate spectrum (lower right). Note that the dotted green line in step 2 shows a hypothetical change of oscillation parameters, affecting only stage 2. Smoothing can now directly be applied to the distributions in steps 3 and 4, instead of the fully weighted MC as in the direct KDE method.

2. Oscillation
Flavor oscillations of neutrinos traversing the Earth modify the flavor content of the original flux in a manner that depends on the energies and path lengths (derived from the direction) of the neutrinos. Additional intrinsic neutrino properties determine the standard flavor oscillation probabilities: three mixing angles and two independent mass-squared splittings, as well as a possible non-zero CP-violating phase. In addition, matter effects induce modifications in the flavor transition probabilities compared to vacuum [23, 22, 33], which makes up the basis of the NMO measurement capability of VLVνTs. In [33], the authors present an analytical expression for the neutrino flavor transition amplitude in a layer of uniform-density matter, which in turn was later implemented in, for example, the
Prob3++ software [34]. Here, the Earth density profile [35] is approximated by a finite number of homogeneous layers and the total transition amplitude is represented by a matrix product of the amplitudes in the individual layers. The main challenge for this stage, which in contrast to the other stages does not require any MC simulation, is to keep the burden of these computationally expensive calculations to a minimum, while retaining sufficient accuracy in the modeling of the neutrinos' propagation.
3. Detection

The number of observed events is determined by the (oscillated) flux as well as a quantity known as the effective area (alternatively, the effective mass). (In contrast, high energy physics experiments often calculate an acceptance instead, which is also based on simulation.) This incorporates the probability that a given neutrino interacts within the detector, is detected, and passes the given data selection criteria. We obtain the eight effective areas (ν_e,μ,τ and ν̄_e,μ,τ charged current (CC) and ν and ν̄ neutral current (NC) interactions) from simulated MC events that are run through the same selection criteria as the real data. In general, each of these effective areas will depend on the energy and arrival direction of the neutrinos. Depending on the detector geometry, certain symmetries can be exploited to reduce the number of parameters on which the effective areas depend. Here we assume azimuthal symmetry and therefore only extract effective areas as a function of E_true and cos ϑ_true.

Figure 5: Flow of neutrino flavors and interaction types through the stages, here shown for neutrinos only (with an analogue counterpart for anti-neutrinos). Neutral current events of all flavors are indistinguishable and can therefore be conveniently added together. The reconstruction stage not only maps from (E_true × cos ϑ_true)-space to (E_reco × cos ϑ_reco)-space, but also classifies the events into the cascade and track categories, indicated by the orange and blue color, respectively.

4. Reconstruction
The process referred to as reconstruction translates the raw signals recorded by a detector into estimates of the physical properties of events. Uncertainties in these estimates manifest as statistical fluctuations, with respect to the true properties, which can be described by probability density functions we refer to as resolution functions. We estimate the resolution functions from the same MC events as used in the detection stage, for which we know the true energy, zenith angle, and interaction type on an event-by-event level. The reconstruction stage uses these estimated resolution functions to build smearing kernels (ensembles of resolution functions) that map the event rates from the space of true variables into the space of reconstructed observables. Additionally—since most VLVνTs cannot exactly distinguish the different neutrino flavors and interaction types—the events are classified by their signature in the detector. Here, event classes are tracks and cascades, based on whether the event seems to contain the expected signature of a starting muon track. This process will separate ν_μ CC and ν̄_μ CC interactions from all others, albeit with limited efficiency and purity. For the example NMO analysis, three observables are needed: the primary neutrino's reconstructed energy (E_reco), zenith angle (ϑ_reco), and event classification.

Note that there is no universal prescription for identifying the set of stages appropriate for any given physics analysis or detector. Instead, stages are chosen to exploit valid simplifications for the task at hand. For example, atmospheric neutrino flux and oscillation calculations depend on readily available tabulated spectra and analytic formulae, respectively. Cosmic ray observatories or high energy particle colliders, by contrast, might require complex stages to describe particle showers, which in turn might depend on high-dimensional, analysis-specific tables. Any physics scenario resulting in multi-particle final states adds further complexity.

In essence, the specific problem and analysis at hand determine to which extent MC sampling is necessary and whether the staged approach is applicable. If the latter is indeed the case, care must be taken concerning the choice of appropriate stages and their specific implementations. In the remainder of this article, we study in detail the staged approach we have found particularly effective for an NMO analysis using a VLVνT.

In order to produce the final-level event templates that are ultimately compared to the data, the four stages are combined as depicted in Figure 4: integration of the product of the first three stages (flux, oscillation probability, and effective area) over E_true and cos ϑ_true yields the event rate in the space of true variables. The event rate in the space of reconstructed observables is then obtained by a convolution of the true event rate with the reconstruction resolution functions. Finally, multiplication by detector exposure time results in an event count, which can be compared directly to observed data or different templates (a schematic code sketch of this combination follows Table 2 below).

Throughout the stages, different combinations of neutrino flavor and interaction type (channels) need to be treated separately, as depicted in Figure 5. Starting with the atmospheric flux, the neutrinos can undergo flavor change via oscillation. Since ν_τ production in the atmosphere is expected to be negligible at the energies relevant here, this flavor only appears through oscillation [36].
The detection rate varies between CC and NC interactions [21]. Finally, after applying the reconstruction resolutions and event classification, event counts are summed to get the final-level templates for events classified as tracks and cascades separately. Where not mentioned explicitly, the same treatment is also applied to anti-neutrinos. The final templates are the sum over both neutrinos and anti-neutrinos. (While not shown here, it is possible to extend the model with more parameters or stages to describe additional effects, such as the modeling of systematic uncertainties.)

Since the transformations computed by individual stages are independent of one another, a parameter change affecting one stage does not affect the transformations used by the other three stages, and in particular not the result of the previous stages. Therefore, we include caching functionality that reduces the overall computational expense when a number of successive templates are retrieved while changing one parameter at a time.

The transformations performed by the individual stages are dependent on the neutrino's energy and zenith angle, and therefore must be computed and applied differentially. All stages are evaluated on a grid of points distributed over E_true and cos ϑ_true, with the final templates output in E_reco, cos ϑ_reco, and event class. Points in energy are logarithmically spaced in the domain 1 GeV to 80 GeV, while points in cosine-zenith are linearly spaced between −1 and 1. The grid points chosen for each stage are listed in Table 2.

Stage           Input grid   Output grid
Flux            —            400 E_true × 400 cos ϑ_true
Oscillation     400 × 400    400 E_true × 400 cos ϑ_true
Detection       400 × 400    200 E_true × 200 cos ϑ_true
Reconstruction  200 × 200    40 E_reco × 40 cos ϑ_reco × 2 classes

Table 2: Grid points chosen for the staged approach in this work. The output of one stage is the input to the next stage, and the result of the detection transformation is downsampled from (400 × 400) to (200 × 200) points. The first three stages are computed in the domain E_true ∈ (1, 80) GeV and cos ϑ_true ∈ (−1, 1), while the output of reconstruction is in the domain E_reco ∈ (1, 80) GeV, cos ϑ_reco ∈ (−1, 1), and class ∈ {track, cascade}. Within their respective domains, points in energy are logarithmically spaced while points in cosine-zenith are linearly spaced.

For each of the twelve channels of neutrino type and interaction (e.g., ν_e CC, ν̄_μ NC), we draw an identical number of events from the toy model. This number, one twelfth of the total number of events constituting a given random sample of the toy model, is referred to as the sample size. A given sample is used together with the event-by-event MC weighting technique to generate templates for all possible values of θ, that is, to calculate the associated expected counts in all bins of each final-level template.
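Schematically, combining the four stages on the grids of Table 2 reduces to a few array operations. The sketch below uses reduced grid sizes and random stand-ins for the stage outputs; the real implementation additionally handles the twelve channels, the downsampling step, and the per-stage caching:

```python
import numpy as np

def make_template(flux, p_osc, a_eff, reco_kernel, t_exp):
    """Multiply the first three stages on the true-variable grid and
    smear the result into (E_reco, cos zenith_reco, class) with the
    reconstruction kernels, then scale by the exposure time."""
    rate_true = flux * p_osc * a_eff                  # stages 1 x 2 x 3
    counts = np.einsum('ij,ijklm->klm', rate_true, reco_kernel)  # stage 4
    return counts * t_exp                             # expected counts

# Reduced shapes for the sketch (the analysis uses the grids of Table 2):
rng = np.random.default_rng(42)
stages = [rng.random((20, 20)) for _ in range(3)]     # flux, osc, aeff
kernel = rng.random((20, 20, 10, 10, 2))              # smearing kernels
print(make_template(*stages, kernel, t_exp=3.2e7).shape)  # (10, 10, 2)
```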
Figure 6: Operating principles of the different staged approach modes, which differ in how we generate the transformations of the last two stages. The staged approach without smoothing is employed for validation purposes in Section 6.1. See text for details.
A complete overview of the different operation modes of the staged approach is given in Figure 6, which highlights the stages at which these differ in the template generation process.
5. Technical Implementation of Stages
The stages within our approach, as summarized in Section 4 and illustrated in Figure 4, are subject to different technical and computational challenges due to the physics effects captured by each one. In this section we examine specific implementation details which highlight how each stage balances performance and precision requirements—even in the presence of low MC statistics.

5.1. Flux
In order to preserve the integral of a tabulated set of data, a spline is fit to the integral of the data (here, a cumulative sum of the bin values multiplied by the respective bin widths) rather than to the values themselves. Interpolated values in the initial space are then found by evaluating the derivative of these splines. We refer to this method as integral-preserving (IP) interpolation.

For the NMO example analysis, the tabulated data of interest are the atmospheric neutrino flux predictions from [30], provided as a function of both E_true and cos ϑ_true. To simplify the problem, the integration is performed along one dimension at a time. Consider the case with fluxes tabulated at M × N points in (E_true, cos ϑ_true). To retrieve the flux at an arbitrary (E_true, cos ϑ_true) point, say (x, y), first one spline of the integrated flux as a function of cos ϑ_true is created for each of the M E_true locations. The derivative of each of these splines is evaluated at y, yielding M flux values. The integral of these values is then interpolated with a spline, and finally this spline's derivative is evaluated at x. This concept generalizes to higher dimensions, but can quickly become computationally intensive as the number of splines grows. While the splines used in the provided example are of one-dimensional cubic type, other spline variants or interpolation techniques can be used, as long as these allow for differentiation. For the example analysis of this article, IP interpolation is approximately an order of magnitude slower than two-dimensional cubic spline interpolation.

Figure 7: The top part of the figure shows three different interpolation methods (linear spline, cubic spline, and integral-preserving interpolation) applied to the same set of data points from [30] (the atmospheric prediction at E_true = 3.98 GeV, as a function of cos ϑ_true), while the bottom portion shows the fractional deviation of the integral (the area under the curve) for these three methods. The deviations of the integral-preserving method presented in this paper are by far the smallest.

The IP method improves upon standard interpolation techniques in that it correctly models the turnover of the flux at the horizon (cos ϑ_true = 0) and the behavior in the most upgoing and downgoing regions (cos ϑ_true ∼ ±1) of (E_true, cos ϑ_true)-space. More detailed information on the IP method can be found in [37].
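In one dimension, the IP procedure maps onto standard spline tools. The following sketch is our own minimal illustration, not the implementation of [37]:

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

def ip_interpolator(bin_edges, bin_averages):
    """Integral-preserving interpolation: spline the cumulative integral
    of the tabulated (bin-averaged) values at the bin edges, then return
    the spline's derivative as the point-value interpolant."""
    cumulative = np.concatenate(
        [[0.0], np.cumsum(bin_averages * np.diff(bin_edges))])
    return InterpolatedUnivariateSpline(bin_edges, cumulative, k=3).derivative()

# Toy tabulated "flux" on a coarse cosine-zenith grid:
edges = np.linspace(-1.0, 1.0, 21)
centers = 0.5 * (edges[:-1] + edges[1:])
table = 4.0 + np.cos(2.0 * centers)   # bin-averaged values (toy numbers)
flux = ip_interpolator(edges, table)
print(flux(0.13))                     # flux estimate at an arbitrary point
```

By construction, integrating the returned interpolant over any original bin reproduces the tabulated bin integral, which is exactly the property the flux tables require.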
5.2. Oscillation

The oscillation library that we employ is an extension of the code described in [38], where the authors ported some of the core functions of Prob3++ to a graphics processing unit (GPU) via the CUDA C API [39]—an application programming interface to perform general purpose computations on GPUs. We then added back in the ability to handle an arbitrary number of constant-density layers of matter, allowing for highly parallel calculations of three-flavor oscillation probabilities of neutrinos that encounter a realistic radial Earth density profile, with fine-grained control over its characteristics. We implemented the oscillation calculations with floating point precision selectable to either single (32 bits, or FP32) or double (64 bits, or FP64) precision. With our code run in double precision and compared with Prob3++ on a 100 × 100 grid of neutrino energies E_true ranging from 1 GeV to 80 GeV and cos ϑ_true values spanning the region between −1 and 1, we find that our GPU code and the Prob3++ code produce consistent results, with tiny residual differences due to differing hardware implementations of the same mathematical operations. Switching from double to single precision on the GPU, we find that the magnitudes of the differences remain small for all oscillation channels. Single precision is desirable from a performance point of view, since most GPUs comprise a larger number of single precision than double precision arithmetic units, and these extra units can be exploited by the parallelism in our code.

To evaluate the effects of an approximated Earth density profile using a limited number of constant-density layers and a constant atmospheric production height—both approximations that our code makes—we compare the oscillation probabilities from our implementation of Prob3++ against a reference model. The latter is calculated by nuCraft [40], which is written in Python and solves the Schrödinger equation numerically. The nuCraft library also supports a realistic variation of the oscillation baselines according to the distribution of atmospheric neutrino production heights described in [41] and uses an interpolated radial density profile of the Earth.

Figure 8: Deviation of ν_μ survival probabilities computed with Prob3++ compared to nuCraft. The left panel uses a fixed production height of 20 km for both codes and twelve constant-density layers for Prob3++. In the right panel the values from nuCraft are the average probabilities for a range of neutrino production heights across the atmosphere.

To this effect, we first fix the atmospheric neutrino production height to h = 20 km for both codes, and quantify the deviations arising from the coarser Earth model by calculating the ν_μ survival probability residuals on a fine grid in cosine-zenith and energy. When approximating the Earth's density profile with only four layers (one for each of the upper and lower mantle, and the outer and inner core), differences of up to 15 % to the output of nuCraft are seen. These differences decrease to below 5 % when using 12 density layers (see left panel of Figure 8). Using an even more detailed model with 59 layers results in differences smaller than 1 %.

Comparing the fixed-production-height Prob3++ probabilities to those obtained under the assumption of a more realistic distributed atmospheric production height in nuCraft highlights further discrepancies between the outputs of the two codes (see right panel of Figure 8). However, the largest differences (∼ ±10 %) appear for near-horizontal trajectories, while the residuals for upgoing trajectories are considerably smaller.
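The structural core of the layered matter calculation (Section 4, stage 2) is an ordered product of per-layer transition amplitudes. A minimal sketch of that bookkeeping follows; the construction of each layer's 3 × 3 amplitude from the mixing parameters and matter density is deliberately omitted here:

```python
import numpy as np

def total_amplitude(layer_amplitudes):
    """Total three-flavor transition amplitude for a trajectory crossing
    several constant-density layers: the ordered matrix product of the
    individual 3 x 3 (complex, unitary) layer amplitudes."""
    total = np.eye(3, dtype=complex)
    for amp in layer_amplitudes:   # ordered along the neutrino path
        total = amp @ total
    return total

# Oscillation probabilities then follow as P(alpha -> beta) = |A|^2:
# prob = np.abs(total_amplitude(amps)) ** 2
```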
5.3. Detection

As a reminder, the effective areas are quantities used to translate an incoming flux to the event rates in the detector. These effective areas are calculated from a limited number of MC events, hence they can suffer from statistical fluctuations which can be a non-negligible contribution to the total uncertainty of the final physics result. At the same time, effective areas are typically well-behaved quantities in energy and zenith angle (under some realistic assumptions, e.g., that no discontinuous selection cuts are applied and no gaps exist in the detector acceptance). Therefore, smoothing techniques can be applied to alleviate the unwanted uncertainty contributions from statistical fluctuations.

For charged current interactions, we compute the effective area separately for each neutrino flavor. In contrast, we do not distinguish between flavors for neutral current (NC) interactions, since their cross sections are identical. Neutrinos and anti-neutrinos are handled independently, accounting for a total of eight independent effective area functions. For convenience we include the multiplication by detector exposure time (t_exp) in the same step, which means that this stage outputs event counts (N_events) instead of rates,

N_events = Φ [m⁻² s⁻¹] · A_eff [m²] · t_exp [s], (4)

for some input flux (Φ).

In our staged approach we first evaluate the effective areas on a fine grid in (E_true, cos ϑ_true) using the MC events via MC integration, where, when generating events, the sampling is chosen to provide a relatively uniform coverage across all grid points. For our example case study, we use a uniform sampling across cos ϑ_true and a power-law spectrum in energy, chosen to closely follow actual IceCube oscillation analyses. (Note that an optimization of the sampling choices would benefit both the staged approach and the reference methods.) Still, for small sample sizes, some grid points may have no associated events, leading to gaps in the distribution. We remove these by applying a simple Gaussian smearing along the two-dimensional grid. In a second step, cubic splines are employed to perform smoothing. Here, first, splines are created along the E_true dimension individually for every cos ϑ_true bin, and evaluated to obtain new values for every grid point. Then, this splining procedure is repeated along the cos ϑ_true dimension.
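The two-step smoothing just described maps onto standard tools. In outline (the kernel width and spline smoothing factors below are placeholders, not tuned values from the analysis):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.ndimage import gaussian_filter

def smooth_grid(values, log_e, cos_z):
    """Fill empty grid points with a mild Gaussian smearing, then smooth
    with cubic splines along E_true (per cos-zenith column) and along
    cos zenith_true (per energy row), as described in the text."""
    out = gaussian_filter(values.astype(float), sigma=1.0)  # placeholder
    for j in range(out.shape[1]):            # splines along energy axis
        spl = UnivariateSpline(log_e, out[:, j], k=3, s=len(log_e))
        out[:, j] = spl(log_e)
    for i in range(out.shape[0]):            # repeat along cos-zenith axis
        spl = UnivariateSpline(cos_z, out[i, :], k=3, s=len(cos_z))
        out[i, :] = spl(cos_z)
    return out
```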
Figure 9 shows the truth template of ν_µ CC events on a grid with n_bins = 40 × 40 points, together with the fractional deviations that arise when the same template is obtained from MC samples of different sizes using direct histogramming versus the smoothing method described above. We use ν_µ CC events as an example here and below. Table 3 gives the average (binwise, i.e., per degree of freedom) χ² values, defined as

\langle \chi^2 \rangle = \frac{\chi^2}{n_\mathrm{bins}} = \frac{1}{n_\mathrm{bins}} \sum_{i=1}^{n_\mathrm{bins}} \frac{\left(\mu'_i - \mu_i^\mathrm{ref}\right)^2}{\mu_i^\mathrm{ref}}\,, \qquad (5)

and the maximal χ² values, defined as

\chi^2_\mathrm{max} = \max_{1 \le i \le n_\mathrm{bins}} \left[ \frac{\left(\mu'_i - \mu_i^\mathrm{ref}\right)^2}{\mu_i^\mathrm{ref}} \right]\,, \qquad (6)

by which the templates from our method and from direct histogramming deviate from truth (with bin counts µ_i^ref).

The χ² values provide direct insight into how the accuracy of the template description compares to the statistical uncertainty of the real data that would be observed. Since the observed data is subject to Poisson fluctuations, it has an average deviation from truth of χ² = 1 per bin. An analysis of real data, however, can only test templates based on MC. These exhibit their own statistical uncertainties, resulting in finite χ² deviations from truth, shown in Table 3 as a function of MC sample size. It is essential that these inaccuracies inherent to the template generation process are considerably smaller than the statistical fluctuations in data in order to ensure accurate statistical inference.

Applying our method, we find deviations that are lower than those from direct histogramming by a factor of about 40 for the smallest MC set, and by a factor of about 13 for the largest. It is noteworthy that the maximum deviation (χ²_max) across all bins decreases monotonically with MC sample size, confirming that the smoothing method does not introduce any observable bias.
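In code, the accuracy metrics of Equations (5) and (6) are a few lines of numpy; the stand-in truth template below is arbitrary and serves only to show that a Poisson-fluctuated estimate indeed averages to χ² ≈ 1 per bin, the scale quoted in the text:

```python
import numpy as np

def template_chi2(mu_test, mu_ref):
    """Eqs. (5) and (6): average and worst-case per-bin chi^2 of a test
    template with respect to reference bin counts mu_ref."""
    chi2_bins = (mu_test - mu_ref) ** 2 / mu_ref
    return chi2_bins.mean(), chi2_bins.max()

rng = np.random.default_rng(0)
mu_ref = rng.uniform(5.0, 50.0, size=(40, 40))   # arbitrary stand-in truth
mu_hist = rng.poisson(mu_ref).astype(float)      # Poisson-fluctuated estimate
print(template_chi2(mu_hist, mu_ref))            # average is ~1 per bin
```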
Figure 9: Parametric reference distribution after the first three stages (flux, oscillation, and detection) for the ν_µ CC channel in (cos ϑ_true, E_true) (left panel), and relative residuals (|N − N_true|/N_true) for direct histogramming (right panel, top row) and our proposed method (right panel, bottom row) on a 40 × 40 grid, for MC sample sizes of 10^3, 10^4, and 10^5 events, respectively. The samples are drawn from the unbinned toy model distributions of Appendix A.

The usual way to obtain templates in the space of reconstructed variables is to place each individual MC event into the final-level distributions according to the reconstruction information that the event carries. This is the case for both methods used for comparison, direct histogramming and direct KDE, the only difference between these being how the final-level distributions are estimated. While this approach correctly takes into account joint dependencies of the event reconstruction on the involved variables, it is particularly sensitive to small MC sample sizes due to the potentially high dimensionality of the space of reconstructed variables. In contrast, the staged approach uses the available MC simulation to construct detector resolution functions, which we integrate to form a transformation that maps a template in true variables (such as that shown on the left in Figure 9) onto the space of reconstructed variables; the result is what we refer to as the final-level template.

In the case study of the NMO analysis, the mapping of true variables (E_true and cos ϑ_true) to reconstructed variables (E_reco, cos ϑ_reco, and event class) is extracted from the MC as a "migration" tensor of order five, M_ijklm. It maps the histogram of event counts in the two-dimensional space of true variables, h_ij, to the observed histogram of event counts in the space of reconstructed variables, h'_klm:

h'_{klm} = \sum_{i,j} M_{ijklm}\, h_{ij}\,. \qquad (7)
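Applied in code, Equation (7) is a single tensor contraction. The following numpy sketch uses hypothetical dimension sizes and a random, normalized migration tensor purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binning: 40x40 in (E_true, cos_zen_true), mapped onto
# 40 E_reco bins x 40 cos_zen_reco bins x 2 event classes.
h_true = rng.uniform(0.0, 10.0, size=(40, 40))      # h_ij
M = rng.uniform(size=(40, 40, 40, 40, 2))           # M_ijklm (random stand-in)
M /= M.sum(axis=(2, 3, 4), keepdims=True)           # here: unit weight per true bin

# Eq. (7): h'_klm = sum_ij M_ijklm h_ij
h_reco = np.einsum('ijklm,ij->klm', M, h_true)
assert np.isclose(h_reco.sum(), h_true.sum())       # this normalization conserves counts
```

A real migration tensor need not conserve counts (selection efficiencies can remove events); the normalization above is only for the illustration.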
Sample size              10^3     10^4     10^5     10^6
Direct hist.   ⟨χ²⟩      215      22.5     2.07     0.201
               χ²_max    …        …        …        …
Staged         ⟨χ²⟩      …        …        …        …
               χ²_max    460      17.2     2.27     0.975

Table 3: Average χ² per bin and the worst-case bin's χ² value, comparing templates on a 40 × 40 grid in (E_true, cos ϑ_true)-space (i.e., before applying reconstruction resolutions), generated by direct histogramming (top) and the smoothed staged approach (bottom), with the toy model's reference template. Shown are values obtained for independent input MC samples of various sizes (from 10^3 up to 10^6 events per flavor/interaction type).

The reconstruction transform in general has to be computed as a five-dimensional transform, as all five dimensions can depend on one another, i.e., they can be correlated. Studying the correlations among the dimensions in our particular MC revealed, however, that E_reco only depends on event class and E_true, cos ϑ_reco depends on event class and both input dimensions, and event class only depends on E_true. For each of the three reconstruction variables, we subdivide the MC in the quantity's dependent dimensions to the point that correlations are no longer visible, so that all events in a subdivision can be assumed to be samples from the same one-dimensional distribution, which is the resolution function we generate.

There is a trade-off in how finely to subdivide the MC when producing these resolution functions. Since the resolution changes as a function of a dependent dimension, sufficiently narrow subdivisions in that dimension group together MC events drawn from essentially the same distribution. Subdivisions that are too wide will group together events drawn from different distributions, and the resulting resolution functions will be erroneous. However, narrower subdivisions admit fewer MC events each, and so lead to greater statistical variations in the estimated resolution functions (i.e., their shapes will be more affected by random fluctuations in the MC).

To balance these competing factors, we devised the following heuristic. For the quantity being characterized, we divide each dependent dimension evenly, except event class, which is binary. E_true is divided evenly in log-space to help ensure that subdivisions group together events with similar energy resolution, as this quantity changes more rapidly at low E_true than at high E_true. We allow each subdivision of E_true to separately expand enough to capture at least 100 events, and each subdivision of cos ϑ_true to capture at least 500 events. If expansion is performed, a subdivision's upper and lower edges are expanded by the same factor (up to the limits of the dimension); a sketch of this edge-expansion logic is shown below. The captured events are then used to produce resolution functions.

The remaining parameters that require tuning in this heuristic are the numbers of subdivisions to use for each dependent dimension of each quantity being characterized. For these, we visually inspect the two-dimensional distributions of each characterized quantity as a function of each dependent dimension and require that the events in each subdivision do not display a strong dependence on that dimension.

If the functional form of the resolution functions were known, a parametric model of this form fit to the MC would yield the most accurate and lowest-variance reconstruction transform. However, as we do not know the form of these functions, a non-parametric density estimation technique is used to approximate them. In particular, we chose adaptive KDE [42], with bandwidths scaled uniformly such that the narrowest is the one found by the Improved Sheather-Jones (ISJ) algorithm [43].
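Before turning to the density estimation itself, the edge-expansion heuristic referenced above can be sketched as follows; the growth step and the event thresholds are illustrative choices, not tuned values:

```python
import numpy as np

def expand_to_capture(values, lo, hi, min_events, limits):
    """Symmetrically expand the subdivision [lo, hi] about its center until it
    captures at least min_events of `values`, clipped to `limits`. The 10%
    growth step per iteration is an illustrative choice."""
    center, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    factor = 1.0
    while True:
        a = max(center - factor * half, limits[0])
        b = min(center + factor * half, limits[1])
        selected = values[(values >= a) & (values <= b)]
        if len(selected) >= min_events or (a, b) == limits:
            return selected, (a, b)
        factor *= 1.1  # expand both edges by the same factor

# e.g., >= 100 events per E_true subdivision (here in log10(E_true/GeV)):
# events, edges = expand_to_capture(log10_e_true, 0.3, 0.4, 100, (0.0, 2.0))
```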
KDE works by placing a kernel function (we use a Gaussian) centered at the value of each event's variable to be described and then summing over all kernels. Adaptive-bandwidth KDE uses a different width for each kernel, where the bandwidths are inversely proportional to the density of points near the location of the kernel. The ISJ bandwidth selection algorithm used to normalize the kernel widths improves on predecessor algorithms (e.g., [44, 13]) in that it does not assume that the quantity being estimated is drawn from a Gaussian distribution. In our experience, this outperforms fixed-bandwidth KDE by not underestimating the heavy-tailed distributions we encounter, but it bears repeating that other density estimation techniques can yield better or worse results depending on the specifics of the MC in question. An example of two resolution functions (one each for energy and zenith angle) estimated using the adaptive KDE method is shown in Figure 10.

Figure 10: Example energy (ln(E_reco/E_true)) and cosine-zenith-angle (cos ϑ_reco − cos ϑ_true) resolution distributions for ν_µ CC events classified as cascades, estimated with histograms and with adaptive KDE. The energy resolution is shown for 100 events with E_true in a narrow band starting near 26 GeV, and the cosine-zenith resolution for 100 events with E_true of a few GeV. The samples used to construct the histogram and KDE are shown by vertical lines beneath the histograms.

Figure 11 again demonstrates that templates obtained from our KDE-based reconstruction stage deviate much less from the parametric reference template after reconstruction than templates obtained from direct histogramming of reconstructed MC events.
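The following is a minimal sketch of the adaptive (Abramson square-root-law [42]) KDE idea; for brevity the pilot bandwidth is chosen by scipy's Silverman rule rather than by the ISJ algorithm of [43], and none of the settings reproduce the tuned analysis configuration:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def adaptive_kde(samples, grid):
    """Abramson-style adaptive KDE: per-sample bandwidths scale inversely with
    the square root of a pilot density estimate. Pilot bandwidth via scipy's
    Silverman rule for brevity (the text uses ISJ instead)."""
    pilot = gaussian_kde(samples, bw_method='silverman')
    dens = pilot(samples)                                # pilot density at samples
    h0 = pilot.factor * samples.std(ddof=1)              # global pilot bandwidth
    lam = np.sqrt(np.exp(np.mean(np.log(dens))) / dens)  # sqrt law, geometric-mean pivot
    h = h0 * lam                                         # narrow kernels in dense regions
    # Sum one Gaussian kernel per sample, each with its own width.
    return norm.pdf(grid[:, None], loc=samples[None, :], scale=h[None, :]).mean(axis=1)
```

Evaluated on a grid in, e.g., ln(E_reco/E_true), this yields smooth, heavy-tail-friendly resolution functions of the kind shown in Figure 10.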
6. Validation and Comparison of Templates
This section more closely examines the templates generated with the staged approach and compares them, along with those generated by the other two methods (histograms and KDE), to the parametric reference distributions of the toy detector model. This validation is split into two parts. The first examines the grid of points used to numerically approximate the integral over the first three stages, whereas the effect of smoothing is investigated in the second.
In order to demonstrate the validity of our choice of grid points shown in Table 2, as well as the equivalence between the staged approach and traditional MC weighting as the grid point spacing in E_true and cos ϑ_true is reduced, we compare the staged approach without smoothing (i.e., using raw histograms as transforms in place of smoothing functions and KDEs) to direct histogramming. The specific comparison done here without smoothing serves solely to validate the principle of stages versus direct histograms.
Figure 11: Same as Figure 9, but comparing final-level templates after all four stages are applied, in (cos ϑ_reco, E_reco). Note that the residuals in the 1k-samples plot for direct histogramming reach up to 31, but the color scale is clipped at 10.
Table 4: Average (⟨χ²⟩) and maximal (χ²_max) χ² deviations per bin between the final 40 × 40 × 2 templates produced by the staged approach and by direct histogramming, for grid densities (M × N) in (E_true, cos ϑ_true) ranging from 40 × 40 up to 1280 × 1280 in the first three stages, using the same fixed-size MC sample for both methods. The last (reconstruction) stage uses a reduced binning, as described in the text.

Table 4 shows the χ² difference (cf. Equations (5) and (6)) between the final templates obtained from the staged approach (with bin counts µ'_i) and from direct histogramming (µ_i^ref) for a variety of grid point densities in E_true and cos ϑ_true, using the same MC set for both methods. These templates are output with a binning of 40 × 40 × 2 in E_reco, cos ϑ_reco, and event class. The relative decrease in the average χ² value roughly scales with the inverse of the relative grid density increase, thus confirming that the two methods agree to arbitrary precision in the asymptotic limit. In the following, for practical reasons, we limit ourselves to the specific case summarized in Table 2.

To validate the final templates with smoothing applied at each stage, we compare them directly to truth. For reference, we also show the agreement resulting from both the direct histogramming and the direct KDE methods. While Table 5 quantifies the deviations from the reference distributions, again in terms of χ² and as a function of MC sample size, Figure 12 displays the final-level templates for each of the aforementioned methods using a single MC sample.

Figure 12: Final-level templates used for the example data analysis, shown in (cos ϑ_reco, E_reco) separately for the cascade and track event classes. The reference distributions (truth), obtained directly from the toy detector model parameterizations, are shown in panel (a). Given the same MC sample, the estimated distributions using histograms are shown in panel (b), using KDEs in panel (c), and using the staged approach in panel (d).

The staged approach outperforms the two alternatives in terms of χ² values by more than one order of magnitude for all sample sizes studied here. Furthermore, the inaccuracies of the templates from the staged approach scale with the inverse of the sample size almost as fast as those of the templates from direct histogramming. In addition, it is noteworthy that the KDE method shows comparably slow convergence, i.e., it performs worse than direct histogramming at the larger sample sizes.

While for the experimental data (or pseudo-data) one expects statistical fluctuations on the order of χ² = 1 per bin, for the staged approach the average χ² deviation from truth (using the same χ² definition as for data) is only about 30% of what is expected just from statistical fluctuations in data, while far larger samples would be necessary to achieve the same average χ² using direct histogramming or KDE (see Table 5 for details). To reach equal accuracy, samples two or more orders of magnitude larger are therefore needed for histogramming or KDE compared to the staged approach. The next section illustrates the implications for running a data analysis.
Sample size                        10^3     10^4     10^5     10^6
Direct histogramming   ⟨χ²⟩        468      42.6     4.27     0.458
                       χ²_max      …        906      138      10.5
Direct KDE             ⟨χ²⟩        …        …        …        …
                       χ²_max      245      90.2     50.3     25.3
Staged approach        ⟨χ²⟩        …        …        …        …
                       χ²_max      …        …        …        …

Table 5: Average and maximal χ² deviations per bin of the final 40 × 40 × 2 templates from the parametric reference distributions, for direct histogramming, direct KDE, and the staged approach, and for independent input MC samples of various sizes.
7. Example Analysis Results
To illustrate the impact of the sample size, we show the resulting √(∆χ²) as an estimate of the sensitivity to the NMO for our example analysis in Figure 13. For reference, the true result is derived directly from the exact templates based on the parametric toy detector model and lies at √(∆χ²) = 5.75. For the three methods discussed throughout this paper, the statistical uncertainty of the obtained sensitivity is indicated by error bars in the figure. This uncertainty is computed from several statistically independent MC sets and indicates the central 68% quantile of each ensemble; each MC set is used together with the staged approach to generate one Asimov toy data template and a large number of "test" templates. In particular, as the sensitivity proxy does not take into account the MC uncertainty [5, 6], this range is not, a priori, expected to reflect any sensitivity bias of the three methods.

The uncertainty reveals that the methods exhibit quite different intrinsic fluctuations of their respective sensitivity estimates, as well as different scaling behavior of the variance with sample size. As the sample size decreases, direct histogramming without any smoothing applied results in an increasing overestimation of a VLVνT's ability to exclude the wrong neutrino mass ordering. In the most extreme case shown here (corresponding to the smallest sample size), the sensitivity is estimated to be more than one order of magnitude greater than the actual capability of the experiment. Only for the largest sample size does direct histogramming indeed give reliable results. This is expected from the simple rule of thumb (cf. Section 2.2) of O(10) events per bin times O(10^3) bins.

Illustratively, an undersampling of the detector response distributions due to low MC statistics is highly likely to lead to an overestimation of the experiment's sensitivity, because the NMO signature that is present in the space of true variables is carried over to random bins in the reconstructed observables with reduced cancellation. For example, if a bin in the final-level template is populated solely by (unweighted) MC neutrinos and no anti-neutrinos, or vice versa, it will contribute artificially strongly to the overall NMO sensitivity due to the missing summation over both event types (cf. Section 3).
Figure 13: Estimated sensitivity (√(∆χ²)) to the NMO vs. sample size for direct histogramming, direct KDE, and the proposed staged approach, applied to multiple (between 50 and 200) statistically independent toy MC sets. Vertical lines indicate the central 68% quantiles of the ensembles. The dashed horizontal line shows the significance obtained from the truth templates based on the parametric toy detector model. The staged approach outperforms the other methods, both in terms of bias and variance, for all but the largest sample sizes, where direct histogramming catches up. Note that no data points exist for direct KDE at the largest sample sizes, as the computational processing times become impractically large. Also note that direct histogramming is off-scale high for the smallest sample sizes (the corresponding mean values, 4987 and 135, are indicated to the right of the markers).

Applying KDE smoothing to the weighted events instead of histogramming them (i.e., direct KDE) reduces the overestimation of the sensitivity at small sample sizes, but is not able to eliminate the bias entirely for the tested sample sizes. For the largest sample sizes, the direct KDE method is too computationally expensive to deliver results within a reasonable time (for more details on timing, see Section 8).

The estimated sensitivity using the staged approach is statistically compatible with the true sensitivity across the whole range of sample sizes considered. Within the limits of our testing, it shows no bias and a lower variance in predicting the physics sensitivity than the other methods.
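For orientation, a bare-bones version of such an Asimov sensitivity proxy, without the minimization over nuisance parameters that a real fit performs, can be written as:

```python
import numpy as np

def asimov_delta_chi2(mu_true_order, mu_wrong_order):
    """Poisson Asimov chi^2 between the expected template under the true mass
    ordering and a template under the wrong ordering (Gaussian approximation
    to the Poisson likelihood ratio, bin by bin). A sketch only: the real
    analysis also minimizes over systematics before quoting sensitivity."""
    return np.sum((mu_true_order - mu_wrong_order) ** 2 / mu_true_order)

# sensitivity proxy in "sigma", as plotted in Figure 13:
# n_sigma = np.sqrt(asimov_delta_chi2(mu_NO, mu_IO))
```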
8. Benchmarks
Whether a given analysis method is useful in a realistic setting depends not only on its numerical reliability, but also on how long it takes to compute the quantity of interest (note that this duration is in addition to the initial time needed to generate the MC). For reference, we benchmarked the template generation times in the course of a typical analysis process; the results are compiled in Figure 14. Timings were obtained on a computer with an Intel Xeon E5-1660 v3 3.0 GHz CPU and an NVIDIA GeForce GTX Titan X GPU.

While direct histogramming is the fastest method for the smaller sample sizes, it is unusable there owing to the large fluctuations associated with the templates it produces, which in turn result in the grossly overestimated sensitivities shown in Figure 13. Direct KDE only proves competitive when used in conjunction with the smallest datasets; the faster-than-linear scaling of its computational needs with sample size quickly renders it impractical to use. Our proposed method is independent of the sample size by construction (excluding initial start-up costs), but becomes more expensive if a finer grid point spacing is desired.

The timing difference between the CPU and GPU implementations of the staged approach is not as large as for the other methods, since the staged approach only uses the GPU to parallelize the calculation of the neutrino oscillation weights (whereas the other methods make use of the GPU more extensively).
Figure 14: Average template generation time during a typical analysis for input datasets of varying size, shown for direct histogramming, direct KDE, and the staged approach. Solid lines represent timings based on (partial) GPU acceleration, whereas dashed lines are for CPU-only calculations.
9. Summary
The search for small physics effects in high-statistics neutrino oscillation experiments requires careful treatment and use of simulated data. Statistical fluctuations within distributions obtained from Monte Carlo simulations can severely distort an analysis, rendering derived constraints or sensitivities essentially meaningless.

The staged approach we have presented serves two main purposes. Firstly, computational expense is reduced by sampling physics and detector response distributions on a discrete grid instead of computing a weight for every individual Monte Carlo event. In this respect, we have demonstrated that our method of breaking down the template generation into independent stages converges to the MC weighting scheme when using a grid of high enough, albeit feasible, density. For a fixed number of grid points, the template generation time has been shown to be independent of the input sample size. Secondly, the staged approach allows the application of smoothing techniques to a detector's response functions. In the specific example shown, the detection stage characterizes the detector's effective area by integrating weighted MC events on a grid and smoothing the resulting histogram, after which the event reconstruction stage applies adaptive KDE smoothing to the resolution functions in order to arrive at final-level templates. This has proven superior to smoothing the final event distributions, since it is faster and, even more importantly, yields more accurate and robust results. The presented choice of smoothing techniques works sufficiently well for our purposes, but this choice is neither unique nor do we claim it to be optimal, and it depends on the wider experimental context. Beyond this choice, our overall approach may prove particularly useful when a fast assessment of the physics potential of various detector designs is desired, or when analysis methodologies are optimized. Any final-level analysis will likely rely on large quantities of MC to guarantee precise and accurate modelling of the experiment.

In the example neutrino mass ordering analysis that we conducted to benchmark and compare the different approaches, we found that direct histogramming of events leads to a gross overestimation of sensitivities when used in conjunction with small numbers of events in our toy model. Conversely, the proposed staged approach leads to correct results that are largely unaffected by the sample size across the tested range, and the variance of its results is small above a moderate sample size. This means that the necessary amount of simulated events is reduced significantly (by about two orders of magnitude in our example), an important aspect especially since Monte Carlo event simulation and reconstruction times can represent major hurdles to progress in the field of neutrino oscillation experiments.

Acknowledgments
The IceCube collaboration gratefully acknowledges the support from the following agencies and institutions: USA – U.S. National Science Foundation-Office of Polar Programs, U.S. National Science Foundation-Physics Division, Wisconsin Alumni Research Foundation, Center for High Throughput Computing (CHTC) at the University of Wisconsin-Madison, Open Science Grid (OSG), Extreme Science and Engineering Discovery Environment (XSEDE), U.S. Department of Energy-National Energy Research Scientific Computing Center, Particle astrophysics research computing center at the University of Maryland, Institute for Cyber-Enabled Research at Michigan State University, and Astroparticle physics computational facility at Marquette University; Belgium – Funds for Scientific Research (FRS-FNRS and FWO), FWO Odysseus and Big Science programmes, and Belgian Federal Science Policy Office (Belspo); Germany – Bundesministerium für Bildung und Forschung (BMBF), Deutsche Forschungsgemeinschaft (DFG), Helmholtz Alliance for Astroparticle Physics (HAP), Initiative and Networking Fund of the Helmholtz Association, Deutsches Elektronen Synchrotron (DESY), and High Performance Computing cluster of the RWTH Aachen; Sweden – Swedish Research Council, Swedish Polar Research Secretariat, Swedish National Infrastructure for Computing (SNIC), and Knut and Alice Wallenberg Foundation; Australia – Australian Research Council; Canada – Natural Sciences and Engineering Research Council of Canada, Calcul Québec, Compute Ontario, Canada Foundation for Innovation, WestGrid, and Compute Canada; Denmark – Villum Fonden, Danish National Research Foundation (DNRF), Carlsberg Foundation; New Zealand – Marsden Fund; Japan – Japan Society for Promotion of Science (JSPS) and Institute for Global Prominent Research (IGPR) of Chiba University; Korea – National Research Foundation of Korea (NRF); Switzerland – Swiss National Science Foundation (SNSF); United Kingdom – Department of Physics, University of Oxford.
Appendix A. Toy Data Model
In the following we provide the parametric toy detector model used to transform the oscillated atmospheric fluxes into event counts. The functions we use either serve as direct inputs (truth) to the various stages of the simulation chain laid out in Section 4, or as sampling distributions from which toy MC samples are drawn. We point out here that these are entirely empirically motivated, and should only be seen as proxies of the performance indicators of next-generation VLVνTs (such as the IceCube Upgrade [1], PINGU [2, 3], or KM3NeT/ORCA [4]). Simplifications or limitations of the model do not affect the computational analysis techniques themselves. Rather, the goal in the following is to capture the most essential features of the expected detector response: threshold effects in detection, the finite accuracy and skew of reconstruction resolution functions, as well as limited flavor and charge identification capabilities. This does not invalidate the conclusions drawn from comparing the various analysis approaches.
Appendix A.1. Detection Efficiency
We assume a detector of fiducial mass M_fid = 10 megatons, with a neutrino detection energy threshold of E_th = 1 GeV for all neutrino flavors and interaction channels apart from ν_τ charged current (CC) interactions, where the intrinsic interaction threshold is higher, at E_th ≈ 3.5 GeV. The effective mass M^α_eff = ρ_ice V^α_eff for a given combination α of flavor and interaction type, where ρ_ice is the ice density and V^α_eff the detector's corresponding effective volume, exhibits a phenomenological dependence on the true neutrino energy E_true, asymptotically approaching M_fid according to an exponential function:

M^\alpha_\mathrm{eff}(E_\mathrm{true}) = M_\mathrm{fid} \times \left(1 - e^{-k_\alpha\,(E_\mathrm{true}/\mathrm{GeV} - E_\mathrm{th}/\mathrm{GeV})}\right) \quad \text{for } E_\mathrm{true} > E_\mathrm{th}\,. \qquad (A.1)

We include three effective masses to cover all the neutrino interaction channels: one for ν_e, ν̄_e, ν_µ, and ν̄_µ CC, one for ν_τ and ν̄_τ CC, and one for all NC channels. For the CC channels we choose k_α = 0.4, while for the NC channels the function rises more slowly, with k_α = 0.1. The left panel of Figure A.15 shows these dependencies for neutrino energies up to E_true = 80 GeV. The detector can be roughly considered fully efficient (M_eff = M_fid) for all detection channels above E_true ≈ 50 GeV.

The ν-ν̄ asymmetry, which is required to make the NMO measurement, will be introduced through differences in flux and cross sections, i.e., it will become apparent in the detector's effective area. The latter we obtain from the effective mass via the conversion

A^\alpha_\mathrm{eff}(E_\mathrm{true}) = \sigma^\alpha(E_\mathrm{true}) \times n_\mathrm{ice}/\rho_\mathrm{ice} \times M^\alpha_\mathrm{eff}(E_\mathrm{true})\,, \qquad (A.2)

where σ^α is the total neutrino-nucleon cross section for a given flavor-interaction channel α, n_ice ≈ 5.5 × 10^23 cm^−3 is the nucleon density in ice, and ρ_ice ≈ 0.92 g cm^−3 the mass density.

We also make some simplifying assumptions about the cross sections used in Equation (A.2), in that we take the ν_e and ν_µ (ν̄_e and ν̄_µ) CC cross sections to be the same, as well as all ν_x (ν̄_x) NC cross sections. In addition, we model all the mentioned cross sections as rising strictly linearly with E_true above E_true = 1 GeV [45]:

\sigma^\alpha(E_\mathrm{true})/E_\mathrm{true} = c_\alpha \times 10^{-38}\,\mathrm{cm}^2\,\mathrm{GeV}^{-1}\,, \qquad (A.3)

where we set

c_{\nu_{e,\mu}\,\mathrm{CC}} = 2\, c_{\bar\nu_{e,\mu}\,\mathrm{CC}} = 0.70\,, \qquad (A.4)
c_{\nu_x\,\mathrm{NC}} = 2\, c_{\bar\nu_x\,\mathrm{NC}} = 0.25\,. \qquad (A.5)

To obtain the ν_τ (ν̄_τ) CC effective areas, we interpolate the corresponding neutrino-nucleon cross-section curves given in [46]. All resulting effective areas as a function of the neutrino energy are depicted in the right panel of Figure A.15. We take these to be invariant in azimuth, but universally introduce an arbitrary, smooth polynomial modification with the zenith-angle dependence

M(x) = 120\,\left(-x^{p} + x^{q} - x^{r}\right) + 1\,, \quad x \equiv \cos\vartheta_\mathrm{true}\,, \qquad (A.6)

which we normalize to unit area, such that A_eff(E_true) is the average over the full sky, cos ϑ_true ∈ [−1, +1].

Figure A.15: Effective masses (left) and areas (right) as a function of the true neutrino energy for a generic toy detector with a fiducial mass of 10 Mt. The dependence of the effective masses on energy is given in Equation (A.1). Cross sections are from Equation (A.3), except for ν_τ and ν̄_τ interactions, which are interpolated from [46]. Effective masses are the same for neutrinos and anti-neutrinos. See text for details.
Appendix A.2. Reconstruction Resolutions

Neutrino zenith resolutions with respect to cos ϑ are represented by single Gaussian distributions, whose parameters are taken as functions of E_true only. For each flavor and interaction channel, we assign a mean µ_{∆cosϑ}(E_true) = 0 across all energies and a standard deviation of the form

\sigma_{\Delta\cos\vartheta}(E_\mathrm{true}) = \frac{a}{\sqrt{E_\mathrm{true}/\mathrm{GeV}}} + b\,, \quad 0 < a, b < 1\,.

Energy residuals ∆E_ν are modeled by right-skewed Gumbel distributions, parameterized by a location µ' and a scale σ' with respect to their standard form via the transformation x → (x − µ')/σ'. These parameters again only depend on E_true. The CC distributions are assumed identical for all flavors and are shown in Figure A.16:

\mu'^{\,\mathrm{CC}}_{\Delta E_\nu}(E_\mathrm{true}) = 0\,, \qquad \sigma'^{\,\mathrm{CC}}_{\Delta E_\nu}(E_\mathrm{true}) = \left(\frac{a'}{\sqrt{E_\mathrm{true}/\mathrm{GeV}}} + b'\right) \times E_\mathrm{true}\,. \qquad (A.7)

For NC interactions, we take a spread that scales with E_true in the same way σ'^CC_{∆E_ν} does, but assume a negative mean proportional to −E_true, due to the undetected energy carried away by the outgoing neutrino. Note that each energy and cosine-zenith residual distribution is renormalized such that its integral over the physical region (∆E_ν + E_true ≥ 0 and −1 ≤ ∆cos ϑ + cos ϑ_true ≤ +1) equals unity.
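The resolution model can be exercised with a few lines of numpy. Note that the numeric constants below are illustrative placeholders (the analysis's exact values are given above only in functional form), and that clipping is used as a crude stand-in for the renormalization over the physical region:

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_reco(e_true, cos_true, a=0.05, b=0.02, a_e=0.3, b_e=0.05):
    """Draw toy reconstructed values: Gaussian cos(zenith) smearing and a
    right-skewed Gumbel energy residual, following Appendix A.2. All
    constants here are placeholders, not the paper's values."""
    e_true = np.asarray(e_true, dtype=float)
    sig_cz = a / np.sqrt(e_true) + b                   # zenith resolution width
    cos_reco = np.clip(rng.normal(cos_true, sig_cz), -1.0, 1.0)
    sig_e = (a_e / np.sqrt(e_true) + b_e) * e_true     # Eq. (A.7)-style scale
    de = rng.gumbel(loc=0.0, scale=sig_e)              # right-skewed residual
    e_reco = np.maximum(e_true + de, 0.0)              # clip to physical region
    return e_reco, cos_reco
```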
Appendix A.3. Event Classification

Correctly identifying few-GeV CC muon neutrino interactions with relatively sparsely instrumented neutrino telescopes in water/ice is difficult mainly for two reasons. First, the track length of a near minimum-ionizing muon is only on the order of a few meters, comparable to the extent of an electromagnetic cascade of the same energy. Second, photon scattering lengths similar to the horizontal spacing between the photomultiplier tubes smear out the Cherenkov ring around the muon direction, which is otherwise observed at a specific angle with respect to the muon direction in the medium.

We take into account the muon neutrino CC ("track") identification efficiency p^{µ,CC}_track improving with the reconstructed neutrino energy E_reco by setting

p^{\mu,\mathrm{CC}}_\mathrm{track}(E_\mathrm{reco}) = p_0 \times \left(1 - e^{-\lambda\,(E_\mathrm{reco}/\mathrm{GeV} + \epsilon)}\right)\,, \qquad (A.8)

with constants 0 < p_0, λ, ε < 1. This reflects the track length of the secondary muon increasing linearly with its energy, but also the possible production of a low-energy muon that cannot be distinguished from the accompanying hadronic cascade even for higher-energy muon neutrino CC interactions. All other (in)efficiencies are assumed to be constant:

p^{e,\mathrm{CC}}_\mathrm{track}(E_\mathrm{reco}) = p^{\mathrm{NC}}_\mathrm{track}(E_\mathrm{reco}) = 0.15\,, \qquad (A.9)
p^{\tau,\mathrm{CC}}_\mathrm{track}(E_\mathrm{reco}) = 0.25\,. \qquad (A.10)

These are shown in Figure A.17. The probability to identify any event as "cascade-like" at a given reconstructed energy is just the complementary probability to that of the track identification. When a toy MC event is subject to this classification, we assign it one of two discrete numbers, representative of identification as either track or cascade, with the above probabilities.

Figure A.16: Example energy resolution functions (right-skewed Gumbel) used for all CC interactions, as given by Equation (A.7), shown for E_true between 2 GeV and 20 GeV. The modes of the corresponding NC resolution functions are shifted downward, in proportion to E_true, with respect to the distributions depicted here.

Figure A.17: Event classification efficiencies implemented as functions of the reconstructed neutrino energy. Shown is the probability to identify an event of a given type as "track-like". Events are identified as "cascade-like" with the complementary probabilities.
Appendix B. Uncertainty in Significance
Under the assumption that the test statistic T under two hypotheses H_0 and H_1 is normally distributed (with means µ_0 and µ_1 and with identical standard deviation σ), the number of standard deviations (n_σ) separating the two hypotheses can be written as n_σ = |µ_1 − µ_0|/σ (corresponding to a one-sided hypothesis test and a one-sided conversion from the p-value). Sampling each distribution with N_p pseudo-experiments results in the following uncertainties on the mean and the standard deviation (see for example [47]):

\Delta\mu = \frac{\sigma}{\sqrt{N_p}}\,, \qquad (B.1)

\Delta\sigma = \frac{\sigma}{\sqrt{2(N_p - 1)}}\,. \qquad (B.2)

Since the combination of the quantities is linear, we can perform simple error propagation, so that the relative uncertainty in the significance becomes (with ⊕ denoting a sum in quadrature)

\frac{\Delta n_\sigma}{n_\sigma} = \frac{\Delta\sigma}{\sigma} \oplus \frac{\Delta|\mu_1 - \mu_0|}{|\mu_1 - \mu_0|}\,. \qquad (B.3)

Using

\Delta|\mu_1 - \mu_0| = \Delta\mu_0 \oplus \Delta\mu_1 = \sqrt{\frac{2}{N_p}}\,\sigma\,, \qquad (B.4)

the second term simplifies to

\frac{\Delta|\mu_1 - \mu_0|}{|\mu_1 - \mu_0|} = \sqrt{\frac{2}{N_p}}\,\frac{\sigma}{|\mu_1 - \mu_0|} = \sqrt{\frac{2}{N_p}}\,\frac{1}{n_\sigma}\,. \qquad (B.5)

Substituting Equations (B.2) and (B.5) into Equation (B.3) yields

\frac{\Delta n_\sigma}{n_\sigma} = \frac{1}{\sqrt{2(N_p - 1)}} \oplus \sqrt{\frac{2}{N_p}}\,\frac{1}{n_\sigma} = \sqrt{\frac{1}{2(N_p - 1)} + \frac{2}{N_p\, n_\sigma^2}}\,. \qquad (B.6)

The absolute error on the number of standard deviations, and its approximation for large N_p, then follow immediately as

\Delta n_\sigma = \sqrt{\frac{n_\sigma^2}{2(N_p - 1)} + \frac{2}{N_p}} \;\xrightarrow{\;N_p \gg 1\;}\; \frac{1}{\sqrt{N_p}} \sqrt{\frac{n_\sigma^2}{2} + 2}\,. \qquad (B.7)
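Equation (B.7) is straightforward to verify empirically; the following sketch draws ensembles of pseudo-experiments and compares the observed scatter of the estimated significance with the prediction:

```python
import numpy as np

rng = np.random.default_rng(3)
N_P, N_SIGMA, TRIALS = 200, 5.0, 2000

# Scatter of the estimated significance across ensembles of N_p
# pseudo-experiments per hypothesis, vs. the prediction of Eq. (B.7).
est = []
for _ in range(TRIALS):
    t0 = rng.normal(0.0, 1.0, N_P)          # H0 pseudo-experiments
    t1 = rng.normal(N_SIGMA, 1.0, N_P)      # H1 pseudo-experiments
    est.append(abs(t1.mean() - t0.mean()) / t0.std(ddof=1))
predicted = np.sqrt(N_SIGMA**2 / (2 * (N_P - 1)) + 2.0 / N_P)
print(np.std(est), predicted)               # both ~0.27 for these settings
```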
References

[1] A. Ishihara, The IceCube Upgrade - Design and Science Goals, in: 36th International Cosmic Ray Conference (ICRC 2019), Madison, Wisconsin, USA, July 24 - August 1, 2019. arXiv:1907.11699.
[2] M. G. Aartsen, et al., PINGU: a vision for neutrino and particle physics at the South Pole, J. Phys. G44 (5) (2017) 054006. arXiv:1607.02671, doi:10.1088/1361-6471/44/5/054006.
[3] M. G. Aartsen, et al., Letter of intent: the Precision IceCube Next Generation Upgrade (PINGU), (2017). arXiv:1401.2046v2.
[4] S. Adrian-Martinez, et al., Letter of intent for KM3NeT 2.0, J. Phys. G43 (8) (2016) 084001. arXiv:1601.07459, doi:10.1088/0954-3899/43/8/084001.
[5] T. Glüsenkamp, A unified perspective on modified Poisson likelihoods for limited Monte Carlo data. arXiv:1902.08831.
[6] C. A. Argüelles, A. Schneider, T. Yuan, A binned likelihood for stochastic models, JHEP 06 (2019) 030. arXiv:1901.04645, doi:10.1007/JHEP06(2019)030.
[7] R. Barlow, Introduction to statistical issues in particle physics, in: Proceedings, PHYSTAT 2003: Statistical problems in particle physics, astrophysics and cosmology, Stanford, USA, September 8-11, 2003, C030908 (2003) MOAT002. arXiv:physics/0311105.
[8] G. Cowan, Statistical Data Analysis, Oxford University Press, 1998.
[9] R. H. Byrd, P. Lu, J. Nocedal, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput. 16 (1995) 1190-1208.
[10] G. Cowan, K. Cranmer, E. Gross, O. Vitells, Asymptotic formulae for likelihood-based tests of new physics, Eur. Phys. J. C71 (2011) 1554, [Erratum: Eur. Phys. J. C73 (2013) 2501]. arXiv:1007.1727, doi:10.1140/epjc/s10052-011-1554-0.
[11] M. Blennow, P. Coloma, P. Huber, T. Schwetz, Quantifying the sensitivity of oscillation experiments to the neutrino mass ordering, JHEP 2014 (3) (2014) 28. doi:10.1007/JHEP03(2014)028.
[12] K. S. Cranmer, Kernel estimation in high-energy physics, Comput. Phys. Commun. 136 (2001) 198-207. arXiv:hep-ex/0011057, doi:10.1016/S0010-4655(00)00243-5.
[13] D. W. Scott, On optimal and data-based histograms, Biometrika 66 (3) (1979) 605. doi:10.1093/biomet/66.3.605.
[14] Y. Fukuda, et al., Evidence for oscillation of atmospheric neutrinos, Phys. Rev. Lett. 81 (1998) 1562-1567. arXiv:hep-ex/9807003, doi:10.1103/PhysRevLett.81.1562.
[15] Q. R. Ahmad, et al., Measurement of the rate of ν_e + d → p + p + e⁻ interactions produced by ⁸B solar neutrinos at the Sudbury Neutrino Observatory, Phys. Rev. Lett. 87 (2001) 071301. arXiv:nucl-ex/0106015, doi:10.1103/PhysRevLett.87.071301.
[16] B. Aharmim, et al., Combined analysis of all three phases of solar neutrino data from the Sudbury Neutrino Observatory, Phys. Rev. C88 (2013) 025501. arXiv:1109.0763, doi:10.1103/PhysRevC.88.025501.
[17] F. Capozzi, E. Lisi, A. Marrone, A. Palazzo, Current unknowns in the three neutrino framework, Prog. Part. Nucl. Phys. 102 (2018) 48-72. arXiv:1804.09678, doi:10.1016/j.ppnp.2018.05.005.
[18] P. F. De Salas, S. Gariazzo, O. Mena, C. A. Ternes, M. Tórtola, Neutrino Mass Ordering from Oscillations and Beyond: 2018 Status and Future Prospects, Front. Astron. Space Sci. 5 (2018) 36. arXiv:1806.11051, doi:10.3389/fspas.2018.00036.
[19] …
[20] I. Esteban, M. C. Gonzalez-Garcia, A. Hernandez-Cabezudo, M. Maltoni, T. Schwetz, Global analysis of three-flavour neutrino oscillations: synergies and tensions in the determination of θ₂₃, δ_CP, and the mass ordering, JHEP 01 (2019) 106. arXiv:1811.05487, doi:10.1007/JHEP01(2019)106.
[21] C. Patrignani, et al., Review of particle physics, Chin. Phys. C40 (10) (2016) 100001. doi:10.1088/1674-1137/40/10/100001.
[22] L. Wolfenstein, Neutrino oscillations in matter, Phys. Rev. D17 (1978) 2369.
[23] S. P. Mikheev, A. Y. Smirnov, Resonant amplification of neutrino oscillations in matter and solar neutrino spectroscopy, Nuovo Cim. C9 (1986) 17-26.
[24] S. T. Petcov, S. Toshev, Three neutrino oscillations in matter: analytical results in the adiabatic approximation, Phys. Lett. B187 (1987) 120-126. doi:10.1016/0370-2693(87)90083-9.
[25] E. K. Akhmedov, M. Maltoni, A. Yu. Smirnov, Oscillations of high energy neutrinos in matter: precise formalism and parametric resonance, Phys. Rev. Lett. 95 (2005) 211801. arXiv:hep-ph/0506064, doi:10.1103/PhysRevLett.95.211801.
[26] K. Abe, et al., Letter of intent: the Hyper-Kamiokande experiment - detector design and physics potential. arXiv:1109.3262.
[27] J. Neyman, E. S. Pearson, On the problem of the most efficient tests of statistical hypotheses, Philosophical Transactions of the Royal Society of London A 231 (694-706) (1933) 289-337. doi:10.1098/rsta.1933.0009.
[28] M. C. Gonzalez-Garcia, M. Maltoni, T. Schwetz, Updated fit to three neutrino mixing: status of leptonic CP violation, JHEP 2014 (11) (2014) 52. doi:10.1007/JHEP11(2014)052.
[29] M. Honda, et al., Atmospheric neutrino flux calculation using the NRLMSISE-00 atmospheric model, Phys. Rev. D92 (2015) 023004. arXiv:1502.03916, doi:10.1103/PhysRevD.92.023004.
[30] …
[31] G. D. Barr, T. K. Gaisser, S. Robbins, T. Stanev, Uncertainties in atmospheric neutrino fluxes, Phys. Rev. D74 (2006) 094009. arXiv:astro-ph/0611266, doi:10.1103/PhysRevD.74.094009.
[32] J. Evans, D. G. Gamez, S. D. Porzio, S. Söldner-Rembold, S. Wren, Uncertainties in atmospheric muon-neutrino fluxes arising from cosmic-ray primaries, Phys. Rev. D95 (2) (2017) 023012. arXiv:1612.03219, doi:10.1103/PhysRevD.95.023012.
[33] V. Barger, K. Whisnant, S. Pakvasa, R. J. N. Phillips, Matter effects on three-neutrino oscillations, Phys. Rev. D22 (1980) 2718-2726. doi:10.1103/PhysRevD.22.2718.
[34] R. Wendell, Prob3++ software for computing three flavor neutrino oscillation probabilities, 2012.
[35] A. M. Dziewonski, D. L. Anderson, Preliminary reference Earth model, Physics of the Earth and Planetary Interiors 25 (4) (1981) 297-356. doi:10.1016/0031-9201(81)90046-7.
[36] A. Bulmahn, M. H. Reno, Secondary atmospheric tau neutrino production, Phys. Rev. D82 (2010) 057302. arXiv:1007.4989, doi:10.1103/PhysRevD.82.057302.
[37] S. Wren, Neutrino Mass Ordering Studies with IceCube-DeepCore, Ph.D. thesis, Manchester U. (2018).
[38] R. G. Calland, A. C. Kaboth, D. Payne, Accelerated event-by-event neutrino oscillation reweighting with matter effects on a GPU, JINST 9 (2014) P04016. arXiv:1311.7579, doi:10.1088/1748-0221/9/04/P04016.
[39] J. Nickolls, I. Buck, M. Garland, K. Skadron, Scalable parallel programming with CUDA, 2008 IEEE Hot Chips 20 Symposium (HCS) (2008) 1-2.
[40] M. Wallraff, C. Wiebusch, Calculation of oscillation probabilities of atmospheric neutrinos using nuCraft, Comput. Phys. Commun. 197 (2015) 185-189. arXiv:1409.1387, doi:10.1016/j.cpc.2015.07.010.
[41] T. K. Gaisser, T. Stanev, Path length distributions of atmospheric neutrinos, Phys. Rev. D57 (1998) 1977-1982. doi:10.1103/PhysRevD.57.1977.
[42] I. S. Abramson, On bandwidth variation in kernel estimates - a square root law, Ann. Statist. 10 (4) (1982) 1217-1223. doi:10.1214/aos/1176345986.
[43] Z. I. Botev, J. F. Grotowski, D. P. Kroese, Kernel density estimation via diffusion, Ann. Statist. 38 (5) (2010) 2916-2957. doi:10.1214/10-AOS799.
[44] S. J. Sheather, M. C. Jones, A reliable data-based bandwidth selection method for kernel density estimation, Journal of the Royal Statistical Society, Series B (Methodological) 53 (3) (1991) 683-690.
[45] J. A. Formaggio, G. P. Zeller, From eV to EeV: neutrino cross sections across energy scales, Rev. Mod. Phys. 84 (2012) 1307-1341. arXiv:1305.7513, doi:10.1103/RevModPhys.84.1307.
[46] A. Gazizov, M. Kowalski, K. S. Kuzmin, V. A. Naumov, C. Spiering, Neutrino-nucleon cross sections at energies of megaton-scale detectors, EPJ Web Conf. 116 (2016) 08003. arXiv:1604.02092, doi:10.1051/epjconf/201611608003.
[47] D. Sivia, Data Analysis: A Bayesian Tutorial, Oxford University Press, 2006.