Bayesian Analysis of a Future Beta Decay Experiment's Sensitivity to Neutrino Mass Scale and Ordering
A. Ashtari Esfahani, M. Betancourt, Z. Bogorad, S. Böser, N. Buzinsky, R. Cervantes, C. Claessens, L. de Viveiros, M. Fertl, J. A. Formaggio, L. Gladstone, M. Grando, M. Guigue, J. Hartse, K. M. Heeger, X. Huyan, J. Johnston, A. M. Jones, K. Kazkaz, B. H. LaRoque, A. Lindman, R. Mohiuddin, B. Monreal, J. A. Nikkel, E. Novitski, N. S. Oblath, M. Ottiger, W. Pettus, R. G. H. Robertson, G. Rybka, L. Saldaña, M. Schram, V. Sibille, P. L. Slocum, Y.-H. Sun, P. T. Surukuchi, J. R. Tedeschi, A. B. Telles, M. Thomas, T. Thümmler, L. Tvrznikova, B. A. VanDevender, T. E. Weiss, T. Wendler, E. Zayas, A. Ziegler
Affiliations:
Center for Experimental Nuclear Physics and Astrophysics and Department of Physics, University of Washington, Seattle, WA 98195, USA
Symplectomorphic, LLC, New York, NY 10026, USA
Laboratory for Nuclear Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Institut für Physik, Johannes Gutenberg-Universität Mainz, 55128 Mainz, Germany
Department of Physics, Pennsylvania State University, University Park, PA 16802, USA
Department of Physics, Case Western Reserve University, Cleveland, OH 44106, USA
Pacific Northwest National Laboratory, Richland, WA 99354, USA
Laboratoire de Physique Nucléaire et de Hautes Énergies, Sorbonne Université, Université de Paris, CNRS/IN2P3, Paris, France
Wright Laboratory, Department of Physics, Yale University, New Haven, CT 06520, USA
Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
Department of Physics, Indiana University, Bloomington, IN 47405, USA
Institute for Astroparticle Physics, Karlsruhe Institute of Technology, 76021 Karlsruhe, Germany
Bayesian modeling techniques enable sensitivity analyses that incorporate detailed expectations regarding future experiments. A model-based approach also allows one to evaluate inferences and predicted outcomes, by calibrating (or measuring) the consequences incurred when certain results are reported. We present procedures for calibrating predictions of an experiment's sensitivity to both continuous and discrete parameters. Using these procedures and a new Bayesian model of the β-decay spectrum, we assess a high-precision β-decay experiment's sensitivity to the neutrino mass scale and ordering, for one assumed design scenario. We find that such an experiment could measure the electron-weighted neutrino mass within ∼40 meV after 1 year (90% credibility). Neutrino masses >500 meV could be measured still more precisely. Combining β-decay and external reactor neutrino data, we find that next-generation β-decay experiments could potentially constrain the mass ordering using a two-neutrino spectral model analysis. By calibrating mass ordering results, we identify reporting criteria that can be tuned to suppress false ordering claims. In some cases, a two-neutrino analysis can reveal that the mass ordering is inverted, an unobtainable result for the traditional one-neutrino analysis approach.

I. INTRODUCTION
Model-based simulation is a standard tool for informing the design of physics experiments and predicting their outcomes [1]. Such model-based approaches allow one to incorporate detailed expectations regarding future data by performing pseudo-experiments that reflect the span of possible experimental and physical parameter values. In Bayesian sensitivity studies, specifically, those parameter values are weighted by prior probabilities. By contrast, computing and reporting predicted outcomes for "best guess" values ignores information by excluding regions of parameter space.

a [email protected]
b Present address: Waymo LLC, Mountain View, CA 94043, USA
c [email protected]
d Present address: Pacific Northwest National Laboratory, Richland, WA 99354, USA
Moreover, inferential models lend themselves to procedures for investigating the consequences of assumptions made during analysis. Bayesian methods, in particular, illuminate the effects of assumptions underlying inference (i.e., extracting information from data) and decision making (i.e., claiming results based on inferences) by decoupling the two processes. Thus, when assessing an experiment's sensitivity, one can quantify, or calibrate, the expected success or accuracy of procedures that one plans to use to both analyze data and report results in a certain format. It is also possible to perform conditional Bayesian calibration by fixing one or more parameters before simulating data [2-5].

Here, we employ Bayesian modeling to perform a sensitivity study for a physics experiment. Among physicists, sensitivity typically denotes the level of precision with which experimenters can expect to resolve a parameter of interest, assuming a reasonably accurate measurement. (We adopt that usage here, though among statisticians, sensitivity can refer to how a decision making process's accuracy depends on model parameters [2, 3].) For physics experiments, in particular, Bayesian sensitivity methods allow researchers to capitalize on their often extensive knowledge of experimental configurations, physical processes, and expected uncertainties to construct priors. More broadly, model-based analyses offer potential tools for physicists to collectively interpret results and judge whether discovery claims are warranted [2, 7] (see Section II). These tools thus provide possible alternatives to a 5σ confidence requirement.

To assess sensitivity, we develop a model of an experiment's measurement process, then employ that model to repeatedly generate and analyze pseudo-data, where "analyze" means "infer posterior distributions." Parameters assumed for data generation are sampled from priors. Next, expectations and intervals are computed from the posteriors, yielding sensitivity results. Finally, we calculate how often these results reflect "true" values underlying the generated data (a calibration). In doing so, we quantify the consequences of our modeling and reporting assumptions. For relevant statistical term definitions, see Table I.

TABLE I. Definitions are consistent with Particle Data Group descriptions [1] with the exception of the two definitions of "sensitivity," which capture a common but less standard usage. The last row describes how "sensitivity" is used in this paper.

Credibility: Fraction of Bayesian posterior probability mass that falls within a reported interval. (Result of a single real or simulated experiment.)
Coverage: Fraction of likely experiments for which the reported interval contains the true parameter value, within model assumptions. (Result of multiple simulated experiments.)
Confidence interval: Interval constructed to have a coverage that equals or exceeds a chosen probability (or "confidence level"). (Frequentist term; not used in this analysis.)
Sensitivity analysis: Study of how result precision and accuracy change under reasonable variation of all parameters, within model assumptions. (Requires simulated experiments, i.e., pseudo-data.)
Sensitivity (to a very small parameter): Upper limit on a parameter, to some confidence level. (Usage by the KATRIN experiment [6].)
Sensitivity (to a parameter of any magnitude): Width of a posterior interval with a chosen credibility. (Definition used in this paper.)

The above procedure is applied here to assess sensitivity to the neutrino mass scale and ordering. Neutrinos are produced in one of three flavor states, each of which interacts with electrons, muons or tau leptons. The discovery of neutrino oscillations demonstrated that each flavor state can be represented as a superposition of mass states with eigenvalues m_1, m_2 and m_3, at least two of which are nonzero [8-10].
While nuclear and particle physics experiments as well as cosmological models have placed upper bounds on the masses and measured the squared mass differences [1], the absolute neutrino mass scale is unknown. In addition, two orderings of the mass spectrum are possible: if m_1 < m_2 < m_3, the masses are said to obey a normal ordering, while if m_3 < m_1 < m_2, they follow an inverted ordering. Although recent data are beginning to shed light on the ordering question, it remains unanswered to date. Sensitivity to the ordering in oscillation experiments is discussed in Qian et al. [11].

A promising approach to resolving the mass scale involves analyzing the shape of the electron spectrum produced when nuclei β-decay. This "direct mass measurement" method is so named because it depends chiefly on decay kinematics imposed by energy conservation. Direct mass experiments probe the electron-weighted neutrino mass

m_\beta = \sqrt{\sum_{i=1}^{3} |U_{ei}|^2 m_i^2}

(hereafter "neutrino mass"), where U_ei are Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix elements. The size of m_β corresponds to a shift in the electron's maximal kinetic energy and causes a distortion in the β spectrum shape. A precise m_β measurement would determine the mass scale, and as a by-product, it could constrain the ordering at masses ≲ 48 meV, the 95% lower limit on the inverted-ordering mass [1]. Furthermore, if a β-decay experiment is sensitive to the fractional contributions of individual neutrino masses to the full spectral shape, that information might enable a clearer mass ordering determination [12]. By modeling the shape of a β spectrum, one can thus assess a direct mass experiment's sensitivity to the ordering, accounting for both the magnitude of m_β and finer spectral features (see Figure 1).

In this paper, we develop a β-decay spectral model suited to Bayesian inference. The model uses a two-neutrino approximation (motivated by the fact that the solar mass splitting is much smaller than the atmospheric splitting, Δm²_21 ≪ |Δm²_31|) and formulates the mass ordering question in terms of a parameter η, the fractional contribution of the lighter mass to the spectrum. Constraints on η are most directly accessible via reactor neutrino experiments. Thus, for a β-decay experiment to potentially resolve the mass ordering, the only external data needed for the analysis are reactor data. Current as well as future direct mass experiments could employ this spectral model to examine their sensitivity to the neutrino mass scale and ordering.

(Footnote: For either neutrino mass ordering, m_β equals the lightest neutrino mass to 1% accuracy for masses ≳ 0.05 eV. Hence, with knowledge of the ordering and splittings, an m_β measurement determines all three masses.)
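To make the definition above concrete, m_β can be evaluated directly from the mass eigenvalues and the squared PMNS elements. The sketch below uses illustrative |U_ei|² values, not inputs assumed anywhere in this analysis:

```python
import math

# Illustrative squared PMNS elements |U_e1|^2, |U_e2|^2, |U_e3|^2 (sum to 1);
# these are placeholder numbers, not the oscillation-fit values used in the paper.
U_E_SQ = [0.68, 0.30, 0.02]

def m_beta(masses_eV, u_e_sq=U_E_SQ):
    """Electron-weighted neutrino mass: m_beta = sqrt(sum_i |U_ei|^2 m_i^2)."""
    return math.sqrt(sum(u * m ** 2 for u, m in zip(u_e_sq, masses_eV)))

# In the quasi-degenerate regime all m_i share a common scale, so m_beta
# approaches that scale (since the |U_ei|^2 sum to 1):
print(m_beta([0.1, 0.1, 0.1]))  # -> 0.1
```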
FIG. 1. A comparison of atomic tritium β-decay spectra for the two allowed neutrino mass orderings shows how a spectral shape analysis could be sensitive to the mass ordering. A background of 10^− /eV is assumed.

As a case study, we use the model to assess sensitivity to these neutrino mass parameters for one possible design scenario of the Project 8 experiment, a high-precision β-decay experiment [13, 14].

II. A MODEL-BASED APPROACH TO CALIBRATING SENSITIVITY RESULTS
Predictive analyses project whether, given some expected data, one will be able to report a particular result, for example, "m_β falls between 0.05 and 0.09 eV with 90% credibility" or "the mass ordering is normal." In Bayesian analysis, the decision of whether to claim a particular result occurs after the process of inference. Bayesian inference produces posterior distributions π(θ|y) for parameters θ given data y. Such inference exploits Bayes' rule,

\pi(\theta \mid y) \propto \pi(y \mid \theta) \cdot \pi(\theta),

where π(y|θ) is the likelihood of y given θ, and π(θ) are prior probability distributions on θ. Experimenters make claims about the physics underlying their data by computing expectations (e.g., means and intervals) from posteriors.

In practice, there is no guarantee that the process by which one decides to claim a scientific result will perform well when faced with real data. To provide some assurance of the decision making process's good performance, it is necessary to calibrate the process by evaluating it with respect to possible "model configurations," i.e., combinations of true parameter values.

Decision-making procedures are, in this context, cost-benefit analyses. A calibration requires an inferential loss (or utility) function that expresses the relative loss L incurred when an experimenter makes different reporting choices. Given multiple reporting options, that experimenter should select the option with the smallest average loss over a group of pseudo-experiments. (For example, this enables a decision of whether to report highest density or quantile credible intervals, as discussed in Section II A.) There is no one correct loss function for a given model, but the function should quantify the agreement or discrepancy between inputted and reported values.
Given some loss function, model-based calibration then involves computing how often one reports accurate results, across many pseudo-experiments with likely model configurations [2, 7, 15].

Frequentist calibration entails finding the worst-case loss over all model configurations. Such calibration requires tools like the Feldman-Cousins method, which addresses the fact that typical, Gaussian confidence intervals are inaccurate for bounded parameters, such as the positive neutrino mass [16, 17]. This approach is too time-consuming to implement fully, as it requires that likelihood functions be computed and integrated for all reasonable parameter values (or a fine grid). While asymptotic approximations can make frequentist calibration computationally viable, they do not fully hold for the complex statistical models used in modern analyses [18, 19].

Bayesian calibration, on the other hand, does not require that one determine the worst-case loss; instead, it entails finding the expected loss with respect to the prior distribution. This is a probabilistic calculation that can be readily implemented with sampling methods. In a Bayesian analysis, it is not necessary to consider all possible truths, only enough to accurately estimate expected losses [19]. Here, we lay out Bayesian calibration procedures for sensitivity to the electron-weighted neutrino mass and mass ordering.
A. Calibrating Neutrino Mass Sensitivity Claims
The Bayesian result of a physics experiment will often be a posterior credible window: the window within which some fraction of a parameter's posterior probability mass falls. This reporting scheme is sensible for continuous-domain parameters. If a posterior on m_β is inferred from a β spectrum, experimenters can present their result as a credible window of neutrino masses (in eV). We call the width of this window "sensitivity to the neutrino mass." The reported mass window may consist of either an upper limit with a lower bound at zero, or a credible interval with upper and lower bounds. If posteriors are inferred for a large number of pseudo-data sets, one may predict an experiment's sensitivity by computing an expectation value (e.g., mean width or median width) from these credible windows. For a discussion of the frequentist and Bayesian perspectives underlying the use of confidence and credible intervals, respectively, see [2].

In the continuous-domain case, the loss function provides a method for computing the proportion of likely data sets for which a posterior interval contains the true parameter value. For an analysis of sensitivity to m_β, a calibration involves computing the fraction of pseudo-data sets C ≡ 1 − L̄ for which the credible window includes the true neutrino mass m̃_β, where L̄ is the average loss for an ensemble of pseudo-experiments. (L̄ serves to estimate the expected loss with respect to the prior distribution.) The fraction C is known as the Bayesian "model coverage," and it estimates the expected accuracy of a sensitivity prediction. We find C by repeatedly generating and analyzing data given an appropriate distribution of inputted m̃_β values [2].

The calibration procedure is as follows:

1. Develop a generation model of the data, and if necessary, a second analysis model. The latter may be approximate but is believed to adequately describe the data. Both models depend on a set of parameters θ (which includes m_β).

2. Select "true" values θ̃ by sampling from priors π(θ), which incorporate as much external knowledge as is reasonable.

3. Generate spectral data ỹ using the generation model, with θ̃ as inputs.

4. Use the analysis model to infer the posterior π(m_β | ỹ).

5. Determine the posterior values ϑ that contain some fraction (credibility) α of the posterior probability mass on m_β. For a credible interval, calculate the loss function

L_{m_\beta} \equiv \begin{cases} 0, & \tilde{m}_\beta \in \left[\vartheta_{(1-\alpha)/2},\, \vartheta_{(1+\alpha)/2}\right] \\ 1, & \text{otherwise}, \end{cases}  (1)

where upper and lower posterior bounds ϑ_(1±α)/2 are computed so that

\int_0^{\vartheta_{(1\pm\alpha)/2}} dm_\beta\, \pi(m_\beta \mid \tilde{y}) = \frac{1 \pm \alpha}{2}.  (2)

That is, a fraction (1 ± α)/2 of the posterior probability mass on m_β lies below the mass value ϑ_(1±α)/2. For a limit, the credible window is [0, ϑ_α].

6. Repeat steps 2-5 N_trial times. Each repetition constitutes a "pseudo-experiment."

7. Compute C by subtracting the mean over the resulting L_{m_β} values from 1. Potentially, adjust α to obtain a satisfying coverage, that is, to achieve an acceptable number of true and false positive results.

(Footnote: Note that credible intervals do not guarantee any frequentist coverage. Constructing confidence intervals and computing frequentist coverages would require analyzing an ensemble of pseudo-experiments for a multi-dimensional grid of input parameter configurations. This becomes impractical in many dimensions, where the number of configurations on any reasonably sized grid grows exponentially fast [18].)

C may not equal α for all α; the relationship between these two values depends on the model and priors. The uncertainty on C is √(C(1 − C)/N_trial), assuming the number of true positive results is binomially distributed. A calibrated sensitivity result then consists of a projected (e.g., mean or median) credible window and its coverage.

It is necessary to sample all input values from priors before generation (step 2). This creates an ensemble of many realistic data sets, where the probabilities of possible model configurations are weighted appropriately. If a model-based sensitivity analysis uses fixed generation inputs (or a grid of inputs, unweighted by prior probabilities), it risks biasing results and under- or over-estimating coverages.
It is also crucial to generate pseudo-data that is as realistic as possible, so that the coverage will reflect the potential consequences of all known assumptions made when devising the analysis model or choosing how to report results [2].

Note that expected fluctuations in the data itself (i.e., statistical uncertainties) are incorporated into priors used for both data generation and analysis (steps 3 and 4). By contrast, uncertainties representing a lack of clarity in one's knowledge of fixed parameters (i.e., systematic uncertainties) are incorporated into pre-generation sampling and analysis priors (steps 2 and 4).

Eq. 2 in step 5 of the above procedure does not uniquely define a credible window, because the equation fails to specify the window's central value. A straightforward choice of window is the quantile interval, which contains an equal amount of probability mass above and below the posterior median. For asymmetric posteriors, however, highest density intervals (HDIs) may be preferable. An HDI is computed by finding all credible intervals for a given α and selecting the narrowest interval. For a continuous posterior, this is equivalent to lowering a horizontal line over the posterior until the outermost intersection points between the line and curve contain a fraction α of posterior probability mass [20]. For a particular ensemble of posteriors, assuming both of these interval types are qualitatively sensible, one can decide which to adopt by computing and comparing coverages for each.

When measuring a continuous parameter like m_β, physicists are often concerned not only with precision, but also with "discovery potential": the probability that the parameter is nonzero.
While neutrinos have been found to be massive through oscillation experiments, a β-decay result distinguishing m_β from zero with high confidence or credibility would provide strong verification of physicists' interpretation of these oscillation data [6]. Here, we claim a continuous parameter is nonzero if its highest density credible interval does not intersect zero. (In practice, the m_β prior affects the outcome; see the end of Section IV A.)

To verify that a scheme for assessing discovery potential is sound, a second calibration is required. This involves inputting a "true" mass value of zero for an ensemble of pseudo-experiments, then constructing HDIs with some credibility. Next, one confirms that the interval credibility approximately equals the fraction (coverage) of experiments for which the interval contains zero.

B. Calibrating Mass Ordering Sensitivity Claims
It is similarly possible to calibrate the process of claiming that the neutrino masses obey one ordering. This process is an example of result reporting for a discrete-domain parameter. In that case, we follow the above procedure through step 4, replacing m_β with a parameter that encodes mass ordering information. For our β spectral model, that parameter is η, the lighter mass's contribution to the spectrum. For normal and inverted orderings, respectively, η tends toward precisely known values η_N and η_I (see Section III). We claim a hypothetical ordering result when the posterior π(η|ỹ) clusters near the predicted value for one ordering.

Specifically, as a suggested decision making scheme, we report a normal (inverted) ordering result when a posterior interval on η with credibility κ contains η_N (η_I) but not η_I (η_N). For a credible interval T on η, the associated loss functions for each ordering are

L_\mathrm{N} \equiv \begin{cases} 0, & (\eta_\mathrm{N} \in T)\ \text{and}\ (\eta_\mathrm{I} \notin T) \\ 1, & \text{otherwise}, \end{cases}

L_\mathrm{I} \equiv \begin{cases} 0, & (\eta_\mathrm{I} \in T)\ \text{and}\ (\eta_\mathrm{N} \notin T) \\ 1, & \text{otherwise}, \end{cases}

T = \left[\varphi_{(1-\kappa)/2},\, \varphi_{(1+\kappa)/2}\right],  (3)

where the posterior bounds on η are computed so that

\int_0^{\varphi_{(1\pm\kappa)/2}} d\eta\, \pi(\eta \mid \tilde{y}) = \frac{1 \pm \kappa}{2}.

These bounds may be selected using either a quantile or a highest density approach, depending on which yields higher coverage. If L_N = L_I = 1 (the interval contains both predicted values or neither), no ordering is strongly favored and nothing can be claimed.

For each "true" mass ordering, given a series of pseudo-experiments, we then compute the rates at which we report correct and incorrect mass ordering results (see Section IV B). These true and false claim rates enable experimenters to select a credibility κ, i.e., to decide how stringent to make their reporting criterion. As in the continuous parameter case, this calibration of sensitivity to the mass ordering should be performed for a large number of model configurations sampled from priors.
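The decision rule above can be sketched directly from posterior samples of η. The predicted values and posterior below are illustrative placeholders, not the η_N and η_I derived in Section III:

```python
import numpy as np

def ordering_claim(eta_samples, eta_N, eta_I, kappa=0.95):
    """Report 'normal', 'inverted', or None from posterior samples of eta.

    A quantile credible interval T with credibility kappa is built on eta;
    an ordering is claimed only when T contains exactly one predicted value.
    """
    lo, hi = np.quantile(eta_samples, [(1 - kappa) / 2, (1 + kappa) / 2])
    in_T = lambda v: lo <= v <= hi
    if in_T(eta_N) and not in_T(eta_I):
        return "normal"    # loss L_N = 0
    if in_T(eta_I) and not in_T(eta_N):
        return "inverted"  # loss L_I = 0
    return None            # L_N = L_I = 1: nothing is claimed

# A posterior tightly clustered near eta_N yields a normal-ordering claim:
rng = np.random.default_rng(1)
eta_N, eta_I = 0.92, 0.08  # illustrative predicted values only
post = rng.normal(0.91, 0.01, size=5000)
print(ordering_claim(post, eta_N, eta_I))  # -> normal
```

Running many such pseudo-experiments for each true ordering, and tallying correct versus incorrect claims, gives the true and false claim rates used to tune κ.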
A similar calibration procedure would apply to accelerator, atmospheric and reactor experiments seeking to resolve the ordering [21], given an η-like parameter expressing mass ordering information.

We implement the above two procedures using the Stan software platform for Bayesian inference, which estimates posteriors by exploring a probability density parameter space using Markov Chain Monte Carlo methods (specifically, Hamiltonian Monte Carlo [22, 23]). Stan is a valuable predictive analysis tool because it deals well with high dimensional problems and allows users to focus on modeling systems instead of developing computational architecture [24, 25]. Along with Stan, we employ morpho, a Python-based tool we developed to organize information input to and output from Stan. Morpho facilitates a Stan workflow involving convergence checks and analysis of posteriors, and it is designed to suit general Stan users [26].

III. MODEL FORMALISM FOR A BETA DECAY EXPERIMENT
The differential spectrum predicted for beta decay has a well understood analytic distribution, especially for superallowed transitions. The rate at which electrons are ejected as a function of their total energies is described by the equation

\frac{dN}{dE_e} = \left[\frac{G_F^2 |V_{ud}|^2}{2\pi^3}\, |M_\mathrm{nuc}|^2\, F(Z, p_e)\, p_e E_e\right] \times \left[\sum_{i=1}^{3} |U_{ei}|^2\, \epsilon_\nu \sqrt{\epsilon_\nu^2 - m_i^2}\; \Theta(\epsilon_\nu - m_i)\right].  (4)

In the electron phase space term (first bracketed term), G_F is the Fermi coupling constant, V_ud is the Cabibbo mixing matrix element, M_nuc is the nuclear matrix element, E_e (p_e) is the outgoing electron energy (momentum), and F(Z, p_e) is the Fermi function, for a daughter nucleus with charge Z. In the neutrino phase space term (second bracketed term), U_ei are the electron neutrino mixing matrix elements, ε_ν and √(ε_ν² − m_i²) represent the total energy and momentum of the released neutrino, and Θ is the Heaviside step function. We also define the kinetic energy of the electron, K_e = E_e − m_e.

In this section, we first justify our choice to hold the electron phase space term constant with respect to energy, allowing us to model spectral data by focusing on the second, neutrino-specific term. We then approximate and re-parameterize the neutrino phase space, producing an analytic spectral form that both incorporates expected features of a real data set and is suitable for Bayesian modeling.

A. Approximations to the Beta Spectrum
For this model, we consider an eV-scale energy region near the high-energy end of a spectrum produced by β-decay. For tritium decay, only superallowed transitions occur, so M_nuc is simply the sum of the vector (g_V) and axial-vector (g_A) coupling constants: |M_nuc|² = g_V² + 3g_A². M_nuc is therefore independent of electron energy.

The relativistic correction to the Fermi function is negligible at these energies, so the non-relativistic form is used:

F(Z, p_e) = \frac{2\pi\alpha Z/\beta}{1 - e^{-2\pi\alpha Z/\beta}},  (5)

where α is the fine structure constant and β ≡ p_e/E_e is the electron's velocity. Since we confine our analysis to a region of width δK_e ∼ 10 eV, the variation in β across that region is negligible, and β can be approximated as constant. Given Eq. 5, then, F(Z, p_e) ≃ F(Z, p_e^max). Similarly, we treat p_e E_e ≃ p_e^max E_e^max as constant, given that δK_e ≪ E_e^max, m_e. Thus, we can define a constant

A \equiv \frac{G_F^2 |V_{ud}|^2}{2\pi^3}\, |M_\mathrm{nuc}|^2\, F(Z, p_e^\mathrm{max})\, p_e^\mathrm{max} E_e^\mathrm{max},

representing the electron phase space.

In addition, the spectrum's neutrino-dependent term can be expressed in terms of the kinetic energy of the electron K_e. The neutrino phase space strongly depends on the final state of the daughter. When multiple final state configurations are possible (for example, in molecular tritium decay), all possible final state configurations need to be taken into account. In this case, however, we focus solely on atomic tritium (T) decay to singly-ionized ³He⁺ (the process of interest for the Project 8 experiment [14]).

Assuming the decaying source is composed of nearly pure T, we need only consider a transition to one final state configuration of the helium-3 nucleus. Energy conservation then allows us to define ε_ν as

\epsilon_\nu \simeq (Q + m_e - E_\mathrm{recoil} - E_e) \equiv (Q_\mathrm{T} - K_e),

Q \equiv M_i - M_f - m_e - \delta b,

E_\mathrm{recoil}^\mathrm{max} \simeq \frac{Q(Q + 2m_e)}{2 M_f},

where M_i (M_f) is the parent (daughter) nucleus mass, δb is the difference in binding energy between the parent and daughter atoms, and E_recoil is the recoil energy of the decay nucleus (with maximum E_recoil^max). The recoil energy varies only slightly across our narrow analysis window, allowing us to treat E_recoil as constant near the end of the spectrum [27]. This allows us to write the β spectrum in terms of an endpoint energy parameter that is assumed not to differ from decay to decay: Q_T ≡ Q − E_recoil^max. For atomic tritium, Q has an experimentally determined mean value of 18566.66 eV, and E_recoil^max is 3.41 eV [27].

Putting this together with the constant electron phase space approximation, we formulate a spectral model P:

P(K_e) \equiv A \sum_i |U_{ei}|^2 (Q_\mathrm{T} - K_e) \sqrt{(Q_\mathrm{T} - K_e)^2 - m_i^2}\; \Theta(Q_\mathrm{T} - K_e - m_i) \equiv \sum_i |U_{ei}|^2 P_i(K_e).  (6)

(Footnote: Second-order effects are small compared with the overall spectral shape in and around our narrow analysis window. Hence, our analytic model ignores second-order corrections, including terms that account for finite nuclear radii and radiative corrections.)

B. One- and Two-Neutrino Spectral Models with Finite Energy Resolution
We must transform the function P(K_e) so that it includes features seen in experimental data, including an energy resolution, background events, and kinetic energy bounds. In performing these transformations, P(K_e) must meet two conditions to be suitable for Bayesian inference. First, we require that the function be normalizable, because Bayesian models are formulated as probability density functions (PDFs). Specifically, in Stan, one specifies features of a likelihood space by adding log PDFs to a total log probability. While, strictly, the function's normalization need not be analytic because Stan provides for 1D integration, inference with analytic PDFs is less computationally expensive. By incorporating smearing from an experimental energy resolution, we are able to formulate an analytically normalized version of P. Second, to assess sensitivity to the mass ordering, our model must include a parameter η, as described in Section II B, or more generally, a variable that strongly depends on the ordering.

We consider two experimental factors: the uncertainty associated with reconstructing an energy spectrum and the presence of background events. As opposed to considering an integrating spectrometer (like the one used by KATRIN), we focus on differential spectrometers (used by Project 8, ECHo and HOLMES) capable of measuring individual electron kinetic energies [28]. This allows us to assume that true kinetic energies are normally distributed around the reconstructed energy K. The mapping distribution is N(K_e | K, σ) for a standard deviation, that is, an energy resolution, σ.

The convolution of the neutrino phase space term with N is not analytically integrable. We address this issue by expanding each neutrino mass term P_i within P (Eq. 6) to first order in m_i²:

P_i(K_e) \simeq A \left[(Q_\mathrm{T} - K_e)^2 - \frac{m_i^2}{2}\right] \Theta(Q_\mathrm{T} - K_e - m_i).

This expansion is justified for m_i² ≪ (Q_T − K_e)², which holds for all data points except those very close to the endpoint.
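A quick numerical check of this expansion (a sketch with an illustrative mass value and the constant A set to 1) confirms that the relative error is tiny away from the endpoint and grows only as Q_T − K_e approaches m_i:

```python
import math

def P_exact(u, m):
    """Exact single-mass kernel, with u = Q_T - K_e (in eV); constant A omitted."""
    return u * math.sqrt(u * u - m * m) if u > m else 0.0

def P_approx(u, m):
    """First-order expansion of the kernel: u^2 - m^2/2, on the same support."""
    return u * u - m * m / 2.0 if u > m else 0.0

# Illustrative m_i = 0.05 eV; compare across a range of distances from the endpoint.
m_i = 0.05
for u in (10.0, 1.0, 0.1):
    rel = abs(P_exact(u, m_i) - P_approx(u, m_i)) / P_exact(u, m_i)
    print(u, rel)  # relative error grows as u approaches m_i
```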
Moreover, once the spectral shape is smeared by convolving it with N, the exact and approximated curves appear very similar even near the endpoint, as seen in Figure 2. When analyzing a full spectral shape, the expansion holds except for large quantities of data. (The count number at which the approximation breaks down depends on the analysis window and binning, among other factors.)

Given the expansion in m_i², we can define and integrate a reconstructed energy spectrum F_i:

F_i(K \mid Q_\mathrm{T}, K_\mathrm{min}, m_i, \sigma) \equiv F_i(K) \propto \int P_i(K_e)\, N(K_e \mid K, \sigma)\, \Theta(K_e - K_\mathrm{min})\, dK_e

\Rightarrow \frac{dN}{dK} = \mathcal{N}(m_i, Q_\mathrm{T} - K_\mathrm{min}) \left[\xi(K \mid Q_\mathrm{T}, m_i, \sigma, m_i) - \xi(K \mid Q_\mathrm{T}, m_i, \sigma, Q_\mathrm{T} - K_\mathrm{min})\right],  (7)

\xi(K \mid Q_\mathrm{T}, m_i, \sigma, t) = (Q_\mathrm{T} - K + t)\, \sigma^2\, N(Q_\mathrm{T} - K \mid t, \sigma) + \frac{1}{2}\left[(Q_\mathrm{T} - K)^2 - \frac{m_i^2}{2} + \sigma^2\right] \mathrm{Erfc}\!\left(\frac{t - Q_\mathrm{T} + K}{\sqrt{2}\,\sigma}\right).

This model describes signal data in a kinetic energy window [K_min, Q_T]. Its normalization term 𝒩 is defined based on the size δK_e of this window:

\mathcal{N}(m_i, \delta K_e) = \frac{6}{2(\delta K_e)^3 - 3 m_i^2\, \delta K_e + m_i^3}.

The practical need to filter out events below some energy motivates our choice to include a minimum energy parameter. Because of the uncertainty σ associated with the reconstruction of K_min, this lower bound is soft.
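The normalization term can be checked numerically. The sketch below assumes the reconstructed form 𝒩(m_i, δK_e) = 6 / (2 δK_e³ − 3 m_i² δK_e + m_i³) and verifies that it integrates the expanded, unsmeared phase space (u² − m_i²/2) to unity over u = Q_T − K_e ∈ [m_i, δK_e]:

```python
def norm_term(m, dK):
    """Assumed normalization: N(m_i, dK) = 6 / (2 dK^3 - 3 m^2 dK + m^3)."""
    return 6.0 / (2.0 * dK ** 3 - 3.0 * m * m * dK + m ** 3)

def normalized_integral(m, dK, n=200_000):
    """Midpoint-rule integral of norm_term * (u^2 - m^2/2) over u in [m, dK]."""
    h = (dK - m) / n
    total = sum((m + (k + 0.5) * h) ** 2 - m * m / 2.0 for k in range(n)) * h
    return norm_term(m, dK) * total

print(normalized_integral(0.05, 10.0))  # -> 1.0 to numerical precision
```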
FIG. 2. Approximated spectral model (Eq. 7) superimposed on a numerical convolution of a Gaussian with the exact T spectrum (Eq. 4) and one year of data generated with the exact model. The signal activity is 1.× /yr in the analysis window, m_β = 8.5 meV, K_min = Q_T − m_β − 10 eV, and σ = 54 meV (see Section IV A).

The background is assumed to be uniform in kinetic energy. If we include a smeared (i.e., convolved with N) background B, the normalized spectral model for a single neutrino mass m_i is given by the master equation:

M_i(K) = f_s · F_i(K) + (1 − f_s) · B(K | K_min, K_max, σ)   (8)

B(K | K_min, K_max, σ) = [Erf((K_max − K)/(√2 σ)) − Erf((K_min − K)/(√2 σ))] / [2(K_max − K_min)].

Here, f_s is the signal fraction of a data set. Since M_i(K) is analytic and normalized, it can be formulated as a PDF and thus used for statistical inference in Stan. Moreover, Eq. 8 allows us to assess an experiment's sensitivity to the mass scale. Specifically, the calibration procedure in Section II A can be applied for posteriors on masses m_i inferred using this model.

Experimentally, K is constructed from some observed variable v_o—for example, in Project 8's case, an electron cyclotron frequency (see Section IV). The energy resolution derives in large part from statistical uncertainties on the quantities used to map v_o → K. While these quantities and their errors are expected to be well known, a mapping bias that shifts the overall energy scale is possible. We model this bias by constructing a prior on K_min which allows the minimum energy to shift slightly relative to Q_T.

We bin data to reduce computation time, though unbinned analyses are possible in Stan. Details in the spectral shape on the order of a few meV only inform the neutrino mass measurement if they occur in the last ≈1 eV of the spectrum. Binning data coarsely in this region biases m_β posteriors upward relative to inputs, due to the changing slope of the spectrum within each bin. To address this, we derive the cumulative distribution function G_i^CDF(K) corresponding to the PDF model in Eq. 7, then set the number of events in a bin [K_n, K_{n+1}] equal to G_i^CDF(K_n) − G_i^CDF(K_{n+1}).
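Because M_i(K) must behave as a PDF, its construction can be spot-checked numerically. The following sketch (illustrative assumed values throughout, except Q_T, which is the Table II prior mean) implements ξ, F_i, and B as written above and verifies that the master equation integrates to one and that the analytic signal matches a direct numerical convolution:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

# Sketch of the analytic smeared-spectrum PDF (Eqs. 7-8).  Q_T is the
# Table II prior mean; all other values are illustrative assumptions.
Q_T, m, sigma = 18563.25, 0.05, 0.054           # eV
K_min, K_max = Q_T - 10.0, Q_T + 10.0           # assumed analysis window
delta = Q_T - K_min                              # signal window size

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def xi(K, t):
    u = Q_T - K
    return ((u + t) * sigma**2 * gauss(u, t, sigma)
            + 0.5 * (u**2 - m**2 / 2 + sigma**2)
            * erfc((t - u) / (np.sqrt(2.0) * sigma)))

norm = 6.0 / (2 * delta**3 - 3 * m**2 * delta + m**3)

def F(K):                                        # smeared signal PDF (Eq. 7)
    return norm * (xi(K, m) - xi(K, delta))

def B(K):                                        # smeared uniform background
    return (erfc((K_min - K) / (np.sqrt(2.0) * sigma))
            - erfc((K_max - K) / (np.sqrt(2.0) * sigma))) / (2 * (K_max - K_min))

f_s = 0.8                                        # assumed signal fraction
M = lambda K: f_s * F(K) + (1 - f_s) * B(K)      # master equation (Eq. 8)

# The PDF should integrate to one over the full (smeared) support...
total, _ = quad(M, K_min - 8 * sigma, K_max + 8 * sigma,
                points=[K_min, Q_T - m, K_max], limit=200)

# ...and the analytic signal should match a direct numerical convolution of
# the expanded phase space term with the Gaussian resolution.
Kt = Q_T - 2.0
direct, _ = quad(lambda Ke: norm * ((Q_T - Ke)**2 - m**2 / 2) * gauss(Ke, Kt, sigma),
                 Kt - 1.0, Kt + 1.0)
print(total, F(Kt), direct)
```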
The CDF is provided in Appendix A.

To report mass ordering results based on inferred posteriors, we modify the spectral model in a second way. If one considers the smaller mass splitting (∆m²_21 ≡ m_2² − m_1²) to be negligible, the signal (Eq. 7) can be written in terms of only two neutrino masses, m_L and m_H. Here, m_H > m_L, with a splitting ∆m²_ee ≡ m_H² − m_L² ≃ |∆m²_31| ≃ |∆m²_32|. The signal is then simply a weighted sum of two spectra, corresponding to the two mass scales:

F′(K) = η · F_L(K | Q_T, K_min, m_L, σ) + (1 − η) · F_H(K | Q_T, K_min, m_H, σ)   (9)

As indicated previously, η is the fractional contribution of the lighter mass term to the spectral shape. Since ∆m²_ee is always positive, η is the only parameter in this model that depends on the ordering. Specifically, η should tend toward one value (η_N) if the ordering is normal and another (η_I) if it is inverted, where

η_N ≡ |U_e1|² + |U_e2|² = cos²(θ_13)
η_I = 1 − η_N = |U_e3|² = sin²(θ_13).

The ordering question can thus be formulated solely in terms of the large mass splitting and θ_13, both of which are measured by reactor antineutrino disappearance experiments. Hence, the above model enables a mass ordering determination using only a β spectrum and reactor experiment results.

To perform a mass ordering sensitivity study, we substitute F_i(K) → F′(K) in Eq. 8. Then, by implementing the decision making scheme in Section II B for posteriors on η, we can calibrate the analysis by estimating the expected accuracy of reporting different ordering results based on β spectra. Consequently, we have here developed a probability distribution that serves two key purposes: it acts as a likelihood function for Bayesian modeling, and it can be used to assess a direct mass experiment's sensitivity to the mass ordering.

IV. RESULTS
Our analysis seeks to determine how experimental parameters such as energy resolution and number of β-decay events affect sensitivity to m_β as well as the mass ordering. To construct concrete, realistic priors that reflect what parameter values an experiment might see, we incorporate information related to the Project 8 experiment. The Project 8 Collaboration developed the technique of Cyclotron Radiation Emission Spectroscopy (CRES) for obtaining a β spectrum at high precision, as originally proposed in [13]. CRES involves measuring the cyclotron frequencies of electrons in a magnetic field, then computing corresponding energies. In its final stage, Project 8 aims to measure the neutrino mass scale by analyzing a spectrum produced by atomic tritium β-decay. The Collaboration is working to reach a neutrino mass sensitivity of about 40 meV [14].

A. Sensitivity to Absolute Neutrino Mass Scale
1. Pseudo-Data Generation and Analysis
This study follows the procedure for calibrating sensitivity claims described in Section II A. We perform 220 pseudo-experiments (that is, repetitions of steps 2-5 in the procedure), assuming a runtime ∆t = 1 yr. For each experiment, data is generated with a β-spectrum model that is much more detailed than the inferential model, to reveal any biases arising from analysis assumptions. The generation model includes an energy-dependent relativistic Fermi function, as well as correction terms stemming from atomic physics phenomena. These terms account for the emitted electron's recoiling charge distribution, radiative effects from real and virtual photons, three-body recoil effects from weak-magnetism and V−A interference, 1s-orbital electron interactions with the β and screening of the ³He⁺ Coulomb field, and the ³He⁺ nucleus' structure. The formulae for these corrections are taken from [29]. In this subsection, we generate data with a one-neutrino mass model and call that mass m_β.

Parameter | Prior                              | Model | Prior Source
Q_T       | N([18563.25, 0.07] eV)             | 1, 2  | Measured
σ_dopp    | γ(59.82, 2868 eV⁻¹)                | 1, 2  | Measured
σ_inst    | N(µ_inst, δ_inst)                  | 1, 2  | Design
K_min     | N([Q_T − m_{β,L} − 10, 0.01] eV)   | 1, 2  | Design
A_b       | lognorm(−27.31, 0.5678)            | 1, 2  | Design
N_atoms   | lognorm(44.07, 0.5677)             | 1, 2  | Design
m_β       | γ(1.135, 2.302 eV⁻¹)               | 1     | Measured
∆m²_ee    | γ(314.5, 122700 eV⁻²)              | 2     | Measured
m_L       | γ(2.186, 126.1 eV⁻¹)               | 2     | N/A

TABLE II. Priors for data generation and analysis using one- and two-neutrino models, denoted by "1" and "2," respectively. "Design" quantities reflect goals for Project 8, while "measured" ones derive from past experiments. Prior functions are defined in Appendix B.

To compose a full data generation model, the detailed β-spectrum is broadened by numerically convolving it with a Gaussian of width σ. A nearly flat background (Eq. 8) is then added to the spectrum. Before convolution, the data is confined within a ≈
20 eV window centered on the mean energy at which the spectrum vanishes: Q_T − m_β, where Q_T is the mean T endpoint. The window's width varies modestly from spectrum to spectrum because its lower bound is sampled from a prior, as discussed below.

In Stan, we implement the one-neutrino spectral model M from Eq. 8, for m_i → m_β. Each pseudo-spectrum is analyzed using this model. The data is histogrammed with 300 bins covering the 1 eV directly below the endpoint and nine ≈1-eV-wide signal bins below that. For each bin [K_n, K_{n+1}], we model the number of counts as a value sampled from a Poisson distribution with rate M((K_n + K_{n+1})/2) × (K_{n+1} − K_n). For the 9 wider signal bins, since the β-spectrum decreases monotonically, the signal Poisson rate can be approximated as G^CDF(K_n) − G^CDF(K_{n+1}) (see Appendix A). To test the effect of bin size near the endpoint, a small analysis (40 pseudo-experiments) was performed with 500 bins in the eV below the endpoint. It yielded median m_β sensitivities and coverages consistent with those presented in Table III, within statistical uncertainty.
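A toy illustration of why CDF differences are preferable to midpoint rates where the spectrum curves sharply (a sketch with illustrative numbers, using the σ → 0 quadratic shape of the signal rather than the full model):

```python
import numpy as np

# Toy sketch of the binning bias: for a sharply curved spectrum, a
# midpoint-rate approximation mis-states bin contents, while differences of
# the (survival-form) CDF are exact.  Here p(K) is proportional to (Q - K)^2
# over the last w = 1 eV, i.e., the sigma -> 0 limit of the signal shape.
Q, w = 18563.25, 1.0
survival = lambda K: ((Q - K) / w) ** 3          # fraction of events above K
pdf = lambda K: 3.0 * (Q - K) ** 2 / w**3

edges = np.linspace(Q - w, Q, 301)               # 300 bins in the last eV
exact = survival(edges[:-1]) - survival(edges[1:])
mid = pdf(0.5 * (edges[:-1] + edges[1:])) * np.diff(edges)

# The mismatch is worst in the final bin, where the midpoint rule captures
# only 75% of the true content for this quadratic shape.
print(exact.sum(), mid.sum(), mid[-1] / exact[-1])
```

Because the integrand is convex, the midpoint rule systematically undercounts every bin; the effect is negligible far from the endpoint but large in the last few bins, which is where the neutrino mass information lives.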
2. Selection of Priors
Each model parameter requires an associated prior, both for sampling "true" values (generator inputs) and for inferring posteriors from data. By sampling from these priors repeatedly, creating an ensemble of model configurations, we can approach an analysis that accounts for the full range of possible spectra—given anticipated statistical and systematic errors. To construct priors, we select functional forms with boundary conditions that accord with physical limits on parameters. For positive quantities, we therefore generally chose log-normal or gamma (γ) distributions—the former when likely values span multiple orders of magnitude, and the latter otherwise. See Table II for a summary of priors.

The one-neutrino model includes parameters m_β, Q_T, σ, K_min, and f_s. A γ prior on m_β was constructed so that 1% of its probability mass would fall below 0.008 eV, reflecting the lower bound from mass splitting measurements [1]. (This bound is not strict because of small uncertainties on those measurements.) Ten percent of the prior mass on m_β falls above 1.1 eV, the 90% confidence upper bound reported by KATRIN in 2019 [30].

We employ a normal prior on Q_T but define the parameter as positive in Stan, truncating a negligible negative portion of the normal distribution. The mean of the prior is the extrapolated tritium endpoint minus the electron mass, as calculated by Bodine et al. [27]. The largest contribution to the Q_T uncertainty is from the T−³He mass difference, which has been measured in Penning traps [31]. That quantity serves as the Q_T prior standard deviation.

We consider two energy resolution effects, summed in quadrature to yield the total resolution σ: 1) Doppler broadening σ_dopp from translational motion of tritium atoms, and 2) instrumental broadening σ_inst from the process of reconstructing kinetic energies.
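The quantile targets quoted above can be checked against the γ(1.135, 2.302 eV⁻¹) prior listed in Table II; a sketch using scipy's shape/scale parameterization (scale = 1/rate):

```python
from scipy.stats import gamma

# Check (sketch) that the gamma(shape=1.135, rate=2.302/eV) m_beta prior from
# Table II meets the quantile targets stated in the text: ~1% of its mass
# below 0.008 eV and ~10% above 1.1 eV.
shape, rate = 1.135, 2.302              # rate in 1/eV
prior = gamma(a=shape, scale=1.0 / rate)

print(prior.cdf(0.008), prior.sf(1.1))  # lower-tail and upper-tail masses
```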
To select a γ prior on σ_dopp, we devised a Stan model that extracts posteriors for the mean expected energy spread due to thermal broadening (µ_dopp) and the uncertainty on that spread (δ_dopp), using the formulae in [27]. We set the mean (√variance) of the σ_dopp prior equal to the mean of a Gaussian fit to the µ_dopp (δ_dopp) posterior, inferred for an assumed contamination level of the source.

The primary two expected contributions to the instrumental resolution are A) a cyclotron frequency measurement error and B) an uncertainty on the field value in the frequency-to-energy conversion. We construct a σ_inst prior assuming that the field error ∆B/B is the larger contribution [14]. In this case, σ_inst ∼ 0.05 eV. As Project 8 is considering multiple energy calibration schemes, the uncertainty on σ_inst could reasonably fall anywhere in a large range. The σ_inst prior's "true" mean and standard deviation (µ_inst, δ_inst) are sampled from distributions before data generation, then fixed to their sampled values during inference. The σ_inst prior is then N(µ_inst, δ_inst). The µ_inst distribution for pre-generation sampling is γ(25.0, 5 × 10² eV⁻¹), with mean 0.05 eV and √variance = 0.01 eV. The δ_inst distribution is γ(1.…, .…), selected so that 5% of its probability mass would fall below (above) 2.…× 10⁻… eV (5 × 10⁻… eV). Combining the two sources of broadening, the mean σ is 0.054 eV.

Experimenters can select K_min before analysis by filtering out events above some cyclotron frequency. If the conversion v_o → K were known exactly, K_min could be fixed during inference at a value computed from that frequency. Instead, to allow for a systematic shift in K on the order of 0.01 eV, we employ a normal prior on K_min with that standard deviation.

We also incorporated priors associated with the spectral signal fraction. While external information does not directly inform a prior on f_s, it pertains more directly to the signal and background activities A_s and A_b. Here, A_s (A_b) is the number of events per second generated by F(K) (B(K)) in the window [K_min, Q_T] ([K_min, K_max]). We thus model the signal fraction as f_s = S/(S + B), where S = ∆t · A_s and B = ∆t · A_b are the expected signal and background event counts.

To inform the prior on A_s, a possible expected signal activity in the unconvolved spectrum's last electronvolt can be expressed in terms of both experiment-specific quantities (atomic source density n; effective source volume V_eff) and physical parameters (the T half-life τ_{1/2}; the fraction f_eV of counts between Q_T − m_β − 1 eV and Q_T − m_β, for σ → 0):

A_s in the last eV = n · V_eff · (ln(2)/τ_{1/2}) · f_eV.   (10)

Here, the fraction of counts f_eV in the last eV takes into account that all events observed in the last electronvolt are produced by decays to the ³He⁺ electronic ground state [27], which comprise 70.06% of the total tritium decay width [33]. The detailed spectral model we developed for data generation enabled a new, precise calculation of f_eV, a quantity that has historically been central to projecting the activities of tritium-based neutrino mass experiments [6, 32]. Assuming m_β = 0, we find f_eV = 1.…× 10⁻… for T and 2.…× 10⁻… for T₂.

For a number density n = 10¹⁸ atoms/m³ and V_eff = 10 m³, target values for an experimental design scenario considered by the Project 8 Collaboration [14], the experiment would detect on the order of 10⁵ events per year in the last electronvolt below Q_T − m_β, within an analysis window extending down to Q_T − m_β − 10 eV. We employed a log-normal prior on N_atoms ≡ n · V_eff for this scenario, setting its mode and standard deviation equal to 10¹⁹ atoms. For a given apparatus, this allows for some variation in source density and detection efficiency. A_s was then computed from N_atoms.

The A_b prior is informed by the Project 8 Collaboration's goal for its dominant source of background to be cosmic rays passing through the tritium gas. Since the expected cosmic ray activity is approximately 10⁻¹²/eV/s for the n and V_eff values assumed above, and the activity varies with those parameters [14], the A_b prior distribution is chosen to have mode and standard deviation equal to 10⁻¹²/s for each 1-eV-wide bin of data.
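The scale of Eq. 10 can be sketched with the design numbers above. This is an order-of-magnitude estimate only: the f_eV value below is an assumed round number of roughly the computed magnitude, not the precise value from the detailed spectral model.

```python
import math

# Order-of-magnitude sketch of Eq. 10.  N_atoms = n * V_eff uses the mode of
# the Table II log-normal prior; the tritium half-life is 12.32 yr; f_eV is
# an assumed placeholder of roughly the right magnitude.
N_atoms = 1e19                      # atoms (n * V_eff)
tau_half = 12.32 * 3.156e7          # tritium half-life in seconds
f_eV = 1.5e-13                      # assumed fraction of decays in the last eV

A_s = N_atoms * math.log(2) / tau_half * f_eV   # signal decays/s in the last eV
per_year = A_s * 3.156e7
print(A_s, per_year)
```

With these assumptions the last-eV signal rate lands near 10⁵ events per year, consistent with the scale quoted in the text.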
3. Neutrino Mass Scale Sensitivity Results
A close correspondence between "true" neutrino masses and m_β posteriors indicates that each β-spectrum strongly informs a neutrino mass determination (see Figure 3). Each posterior standard deviation on m_β is at least 22 times smaller than the corresponding prior spread. See [2, 34] for more information on posterior shrinkage and evaluating model performance.

Highest density credible intervals (C.I.s) were computed for α = 0.9. Posterior means of Q_T, σ_inst, σ_dopp, K_min, A_s and A_b track with input values. During all 220 analyses, the five Stan convergence diagnostics—R̂, effective sample size ratio, E-BFMI, tree depth, and divergences [23, 35, 36]—showed no signs of pathological behavior. Moreover, the coverage of 90% credible intervals is between 85% and 99% for all parameters. Uncertainties on coverages C are estimated as √(C · (1 − C)/N_trial).

The left plot of Figure 4 shows that mass sensitivity depends weakly on σ_inst, because the scenario considered here is relatively statistics-limited and the range in σ_inst is small. However, for this scenario, smaller uncertainties on σ_inst noticeably improve sensitivity (see Section IV A 4 for an instance of this). We would also expect increasing the effective volume to improve neutrino mass sensitivity. Indeed, for an ensemble with fixed energy resolution and a wide range in V_eff values, the widths of m_β credible intervals depend strongly on V_eff, as seen in the right plot in Figure 4. These results could inform how future direct mass experiments prioritize their efforts to improve the expected energy resolution, resolution uncertainty, and statistical yield of an apparatus design.
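The coverage calibration can be illustrated with a toy ensemble (a sketch only; N_trial matches the 220 pseudo-experiments above, while the unit-width Gaussian posterior is an assumed stand-in for real Stan output):

```python
import numpy as np

# Toy sketch of the coverage calibration: for a well-calibrated analysis,
# alpha-credible intervals contain the truth in a fraction ~alpha of
# pseudo-experiments, with binomial uncertainty sqrt(C*(1-C)/N_trial).
rng = np.random.default_rng(0)
N_trial, alpha = 220, 0.90
z = 1.6449                          # half-width of the central 90% of a unit Gaussian

truth = rng.normal(size=N_trial)
post_mean = truth + rng.normal(size=N_trial)    # posterior center, unit width
covered = np.abs(post_mean - truth) < z         # truth inside the 90% interval
C = covered.mean()
err = np.sqrt(C * (1.0 - C) / N_trial)
print(C, err)
```

For C ≈ 0.9 and 220 trials the binomial uncertainty is ≈2%, which sets the precision with which statements like "coverage between 85% and 99%" can be made.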
4. Claiming m_β is Inconsistent With Zero

FIG. 3. Neutrino mass posterior means and 90% credible intervals as a function of inputted m_β, for a one-neutrino model and the assumed experimental design. Interval widths ("sensitivities") decrease with m_β before asymptoting.

Interval  | Sensitivity (eV): Median | Mean   | Maximum | Coverage
90% C.I.  | 0.0071                   | 0.0112 | 0.0493  | (90.…± .…)%

TABLE III. Sensitivity to m_β after 1 yr, with coverages of credible intervals.

We also evaluate the ability of an experiment with the design described here to distinguish the electron-weighted neutrino mass from zero. As introduced in II A, for a given β-spectrum, it is possible to claim that the neutrino mass is nonzero with credibility α if the lower bound of a posterior highest density α-credible interval exceeds zero. The m_β prior in Table II is in conflict with this test, as that prior assumes that it is highly improbable for the mass to be zero, considering the lower bound from oscillations measurements. When Project 8 analyzes real data, its main mass scale analysis can include an m_β prior with an oscillations-based lower bound. However, to assess consistency with zero, the data will need to be re-analyzed with an oscillations-bound-free prior.

As an example sensitivity study, we perform 75 pseudo-experiments with 10% of the neutrino mass prior probability falling below 0.005 eV and 10% above 0.1 eV. Resulting posterior credible intervals on m_β are shown in Figure 5. The neutrino mass can be distinguished from zero with 90% credibility in 65 of these analyses. It is possible to claim the mass is inconsistent with zero for true m_β ≳ 0.
04 eV, with two outliers caused by an underestimation of the true mass, combined with poor m_β precision due to large inputted uncertainties (i.e., prior widths) on σ_inst.

How can one be confident that this method will not produce frequent false claims? We may perform another calibration: for β-spectra produced given a true neutrino mass of zero, we should rarely claim that m_β is distinguishable from zero. Indeed, when we analyze 150 such spectra, the mass is judged to be consistent with zero
FIG. 4. Dependence of mass sensitivity (width of 90% credible intervals) on σ_inst and volume × efficiency × time. The left plot assumes the design scenario described in this section. The right plot shows a larger range in signal exposure, for an alternate scenario with different values of n (3.…× 10… m⁻³) and σ (115 ± … meV), illustrating the m_β uncertainty given a trade-off between frequency reconstruction error and exposure. The right plot "pessimistically" assumes m_β = 0.008 eV.
93% of the time (α = 0.9).

FIG. 5. Mass posterior means and 90% credible intervals for inputted m_β near zero. It is possible to distinguish the mass from zero for true m_β ≳ 0.04 eV, with outliers characterized by large uncertainties σ_inst (energy broadening standard dev.).

B. Sensitivity to Neutrino Mass Ordering
The analysis in this section follows the procedure described in Section II B for calibrating sensitivity claims to discrete parameters. Pseudo-data is generated with the same detailed spectral model as in Section IV A, but with two neutrino masses instead of one. Similarly, for inference in Stan, we now employ a two-neutrino model—Eq. 8, with a spectral signal F′(K) (Eq. 9)—to analyze data in the approximate window [Q_T − m_L − 1 eV, Q_T − m_L + 10 eV]. This region extends only 1 eV below the endpoint so that the likelihood will be strongly informed by fine-grained mass ordering-dependent features near Q_T. To help constrain the overall mass scale, data in the next eV below Q_T − m_L − 1 eV is analyzed with the one-neutrino m_β model, with the requirement m_β² = η · m_L² + (1 − η) · m_H².

We repeat this two-neutrino analysis for ∆t = 1 yr and 2 yrs with at least 170 pseudo-experiments per runtime, producing coverage uncertainties of 1-5%. Again, data are binned after generation, then analyzed assuming Poisson-distributed events. The Stan model includes the same priors on parameters Q_T, σ_dopp, σ_inst, N_atoms and A_b as in the one-neutrino case. The prior on K_min is similar, with its mean dependent on m_L instead of m_β. We also constructed priors on ∆m²_ee and m_L (see Table II), while m_H required no prior, as it was modeled by transforming those parameters.

A γ prior on ∆m²_ee was formulated by extracting a 90% confidence interval from a global fit of three reactor

To avoid non-invertible transforms and the need for Jacobian adjustments, in Stan, we define a "positive_ordered" transformed parameter m, with m[1] = m_L and m[2] = √(m_L² + ∆m²_ee) (see Section 22 of [25]). The entries of m then serve as inputs to the spectral log probability density function.

FIG. 6. For one pseudo-experiment, example posterior probability density plots and 2D-histograms (in both contour and scatter plot form) for parameters in the two-neutrino spectral model.
Posteriors were obtained by analyzing data (∆t = 1 yr) that was generated assuming a normal mass ordering.

neutrino experiments: [2.…, .…] × 10⁻³ eV² [37]. At the time when we began the analysis, this was the most up-to-date global fit of reactor data. As these bounds differ slightly according to mass ordering, to be conservative, we selected each bound (either the normal or inverted ordering limit) so as to obtain a wider prior. Ten percent of the prior mass on ∆m²_ee falls outside each bound. In addition, before generation, either η_N or η_I was sampled from a Gaussian prior, depending on the "true" ordering. Prior parameters were determined based on the mean of cos²θ_13 (0.979) and the error on that mixing parameter (0.001), as measured by reactor experiments [37]. Posteriors extracted from one of the two-neutrino model fits are shown in Fig. 6.

For the prior on m_L, we avoided computing soft bounds using current limits on the mass scale from particle physics experiments, as those constraints do not translate easily to bounds on individual masses [1, 38]. Instead, we envision a scenario in which m_L is restricted below ≈0.05 eV, potentially based on future cosmological constraints on the sum of the three neutrino masses. Specifically, the prior for pre-generation sampling and inference is γ-shaped with 10% of its mass below 5 meV and 5% above 40 meV, resulting in m_L < 0.08 eV for all pseudo-experiments. (The distribution peaks near zero, since there is no oscillations-based lower bound on m_L for the inverted ordering and a very small lower bound for the normal case.) For true masses above 0.
08 eV, one rarely claims to have resolved the mass ordering using our reporting scheme. Hence, by choosing a prior localized in a low-mass region, we proportionally inflate true and false ordering claim rates. This makes the process of selecting ideal reporting criteria κ based on claim rates more statistically reliable than it would be for a wider m_L prior.

As in the one-neutrino case, posterior means track with input values for all parameters. During analysis of most spectra, no Stan MCMC diagnostics indicated a failure to converge. However, a quarter of runs exhibited signs of incomplete convergence [35]: 15% showed a small number of diverging iterations (1-10 of 15,000), and 10% failed at least one other check. Mass ordering sensitivity results are robust despite this, since observing and minimizing false positive rates ultimately validates the analysis. Still, a more consistently converging model might improve sensitivity.

Table IV summarizes results for calibration of sensitivity to the mass ordering. Uncertainties on 0% claim rates represent 68.3% confidence limits derived from a binomial probability law. (Given the ensemble's finite size, the actual probability of a false claim is not exactly zero.) The loss functions L_N and L_I in Eq. 3 dictated whether an ordering result should be reported for each pseudo-experiment. That is, a normal (inverted) ordering claim was made if a posterior interval on η of credibility κ contained η_N = cos²θ_13 (η_I = 1 − η_N) but not η_I (η_N) (see Figure 7). Given the small experimental error on cos²θ_13, we assumed a known value η_N = 0.979. κ acts as a reporting criterion, and modifying κ affects the rates at which we correctly and incorrectly claim to
FIG. 7. For 2 yrs of data assuming normal (dots) and inverted (triangles) orderings, posterior means and intervals on η as a function of potential sensitivity to m_L, defined as C.I. width.

∆t = 2 yrs, m_L ≲ 0.05 eV | Claim N       | Claim I       | Optimal κ
Truth: N                  | ≈87% ± …      | 0.0% (+1.3%)  | 0.985
Truth: I                  | 0.0% (+1.5%)  | ≈22% ± …      | 0.855

∆t = 1 yr, m_L ≲ 0.05 eV  | Claim N       | Claim I       | Optimal κ
Truth: N                  | …% ± …        | 0.0% (+1.3%)  | 0.925
Truth: I                  | 0.0% (+1.2%)  | …% ± …        | 0.875

TABLE IV. Assuming either a normal or inverted true ordering, percentages of pseudo-experiments for which the three possible reporting outcomes ("normal," "inverted," or "no claim") occur. To minimize false claims, different reporting criteria κ are used for each ensemble and observed ordering.

have resolved the neutrino mass ordering (see Figure 8). We recommend an "optimal κ" by selecting the value for which the relevant correct claim rate is maximized, given a minimal incorrect rate—which can be zero, in this study. Values of κ are considered in 0.5% increments. We observe that, for both 1 yr and 2 yrs of data, false inverted claims begin to occur for κ values above a lower number than do false normal claims. In fact, Figure 8 shows that false normal claims are never made for ∆t = 2 yrs, for these pseudo-data sets. Using that knowledge, for real data, it is possible to boost the probability of a correct ordering claim without increasing the risk of a false claim by applying the following procedure:

A) Check what result would be reported using the optimal κ for normal ordering true/false claims (as predicted with pseudo-experiments)—here, 0.925 (0.985) for 1 (2) yr(s).
FIG. 8. Mass ordering reporting frequencies for ∆t = 2 yrs as a function of κ, the credibility of the η interval (see Eq. 3). To obtain the rates in Table IV, different κ values are chosen depending on whether the initially favored result is normal or inverted. For the upper plot, this adjustment enables one to reduce the incorrect claim rate.

B) If the result is "normal" or "no claim," report it.

C) If the result is "inverted," it could be a false positive. Reduce κ to the inverted optimal value—0.875 (0.855) for 1 (2) yr(s)—to determine whether to report "inverted" or nothing.

This procedure accounts for the fact that it is easier to claim a normal than an inverted ordering result for our model. The procedure enables false claim rates of 0% for the pseudo-experiments performed here, with true rates reaching 87% (22%) for the normal (inverted) ordering after 2 yrs. We see here that as statistical power improves over time, sensitivity to the normal ordering improves.

FIG. 9. Distribution of mass ordering results with respect to m_L and mass sensitivity (∆t = 2 yrs). The normal (inverted) ordering was reported when a 98.5% (85.5%) η credible interval excluded one ordering and was consistent with the other.

These results indicate that a Project 8-like neutrino mass experiment could resolve the mass ordering for various likely combinations of physical and experimental parameter values. If the neutrinos obey a normal ordering and the lightest mass is constrained below ≈0.05 eV, this analysis predicts there is a high chance of resolving the ordering after 2 yrs of data taking. We observe that a direct mass experiment would resolve the normal ordering especially often in the low-m_L, small mass sensitivity region (see Figure 9).

For this study, we chose to employ a two-neutrino spectral model, as opposed to constraining the ordering based on a single mass measurement, for which m_β ≲ 48 meV rules out an inverted ordering. While it would be impossible to resolve the inverted ordering using a one-neutrino model, a two-neutrino analysis can enable an inverted ordering determination. In other words, the process of inference is sensitive to fine structure near the endpoint of the spectrum produced by individual neutrino mass eigenstates.
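The two-step reporting procedure above can be sketched as follows (a toy stand-in for Stan posteriors; central quantile intervals are used instead of highest-density intervals for simplicity, and the η values and κ criteria follow this section):

```python
import numpy as np

# Sketch of the ordering-reporting scheme.  eta_N and eta_I follow the
# reactor-experiment values quoted in the text; the posterior samples are
# assumed Gaussian toys, not real Stan output.
eta_N, eta_I = 0.979, 0.021

def report(eta_samples, kappa):
    # Claim an ordering only when the kappa-credible interval on eta
    # contains one of eta_N, eta_I and excludes the other.
    lo = np.quantile(eta_samples, (1.0 - kappa) / 2.0)
    hi = np.quantile(eta_samples, 1.0 - (1.0 - kappa) / 2.0)
    in_N = lo <= eta_N <= hi
    in_I = lo <= eta_I <= hi
    if in_N and not in_I:
        return "normal"
    if in_I and not in_N:
        return "inverted"
    return "no claim"

def two_step(eta_samples, kappa_N=0.985, kappa_I=0.855):
    # Steps A-C: test at the normal-ordering criterion first; re-test an
    # "inverted" outcome at the tighter inverted-ordering criterion.
    first = report(eta_samples, kappa_N)
    if first != "inverted":
        return first
    return report(eta_samples, kappa_I)

rng = np.random.default_rng(1)
samples_N = rng.normal(0.97, 0.02, size=15_000)     # posterior peaked near eta_N
samples_wide = rng.normal(0.50, 0.50, size=15_000)  # uninformative posterior
print(two_step(samples_N), two_step(samples_wide))
```

A sharply peaked posterior near η_N yields a "normal" claim, while a broad posterior covering both hypotheses yields "no claim"; only posteriors that isolate η_I survive the second, stricter test and produce an "inverted" claim.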
V. CONCLUSIONS
In this paper, we presented a Bayesian approach to analyzing sensitivity to the neutrino mass scale and ordering. That approach included a calibration, which quantified the performance of two processes: inferring information and reporting results. Our sensitivity and calibration procedures are applicable to any experiment that produces information regarding the mass scale and ordering. These procedures also serve as templates for sensitivity studies by other physics experiments—whether they measure continuous or discrete parameters. As design planning for Project 8's final phase advances, future work will include a detailed analysis of systematic features to inform more precise priors in a Project 8-specific study.

Using the β spectrum model developed here, and given the experimental expectations in Section IV A, we find that a high-precision direct mass experiment could resolve the electron-weighted neutrino mass m_β in a 90% credible interval, with a "true claim rate," or coverage, of (90.…± .…)%. For small m_β, the width of this interval approaches 40 meV, and for m_β ≳ 0.04 eV the mass can be distinguished from zero using the β spectrum.

This study also investigates the tritium β-decay technique's sensitivity to the neutrino mass ordering. We emphasize that, by using a utility function to judge whether to report an ordering result, it is possible not only to predict the probability of a false ordering claim, but also to determine a reporting tolerance (here, the η interval credibility) that minimizes the risk of false claims. For the experimental parameters assumed here and a two-year runtime, we would recommend reporting a normal ordering result when a 98.5% posterior credible interval on the light-mass fraction η contains |U_e1|² + |U_e2|² but not |U_e3|². To report an inverted ordering determination, the opposite should hold for an 85.5% interval around η. Those reporting criteria enable the normal (inverted) ordering to be resolved ≈87% (22%) of the time, with a ≈0% false claim rate. It is also possible to infer posteriors on individual neutrino masses. When sensitivity to the lightest mass is better than 0.03 eV, it is nearly always possible to resolve the mass ordering.

These results demonstrate that we can access more information by modeling the full spectral shape than would be possible using a one-neutrino model in terms of m_β. As more events are detected, the spectral shape method becomes increasingly sensitive to count rate kinks that inform inferences about individual neutrino masses and their ordering. Direct mass experiments thus offer a unique potential probe of individual |U_ei|² matrix elements, complementary to oscillations-based probes of their products.

ACKNOWLEDGMENTS
The authors would like to thank André de Gouvêa for insight and discussions. This material is based upon work supported by the following sources: the U.S. Department of Energy Office of Science, Office of Nuclear Physics, under Award No. DE-SC0020433 to Case Western Reserve University (CWRU), under Award No. DE-SC0011091 to the Massachusetts Institute of Technology (MIT), under the Early Career Research Program to Pacific Northwest National Laboratory (PNNL), a multiprogram national laboratory operated by Battelle for the U.S. Department of Energy under Contract No. DE-AC05-76RL01830, under Early Career Award No. DE-SC0019088 to Pennsylvania State University, under Award No. DE-FG02-97ER41020 to the University of Washington, and under Award No. DE-SC0012654 to Yale University; the National Science Foundation under Award No. PHY-1205100 to MIT; the Cluster of Excellence "Precision Physics, Fundamental Interactions, and Structure of Matter" (PRISMA+ EXC 2118/1) funded by the German Research Foundation (DFG) within the German Excellence Strategy (Project ID 39083149); the Laboratory Directed Research and Development (LDRD) Program 18-ERD-028 at Lawrence Livermore National Laboratory (LLNL), prepared by LLNL under Contract DE-AC52-07NA27344, LLNL-JRNL-817667; the LDRD program at PNNL; the University of Washington Royalty Research Foundation; Yale University; and the Karlsruhe Institute of Technology (KIT) Center Elementary Particle and Astroparticle Physics (KCETA). A portion of the research was performed using the Engaging cluster at the MGHPCC facility. [1]
[1] Particle Data Group Collaboration, P. A. Zyla et al., "Review of particle physics," Prog. Theor. Exp. Phys. no. 083C01 (2020). For statistics definitions and best practices, see Ch. 40, "Statistics"; for neutrino physics, see Ch. 14, "Neutrino masses, mixing and oscillations."
[2] M. Betancourt, "Calibrating Model-Based Inferences and Decisions," arXiv:1803.08393 [stat.ME] (2018).
[3] R. J. Little, "Calibrated Bayes: A Bayes/frequentist roadmap," The American Statistician (2006) 212–223.
[4] T. Sellke, M. J. Bayarri, and J. O. Berger, "Calibration of p values for testing precise null hypotheses," The American Statistician (2001) 62–71.
[5] S. Lacoste-Julien, F. Huszár, and Z. Ghahramani, "Approximate inference for the loss-calibrated Bayesian," Proceedings of Machine Learning Research (2011) 416–424.
[6] KATRIN Collaboration, "KATRIN Design Report," 2004.
[7] P. Gustafson and S. Greenland, "Interval estimation for messy observational data," Statistical Science no. 3 (2009) 328–342.
[8] Super-Kamiokande Collaboration, K. Abe et al., "Solar neutrino results in Super-Kamiokande-III," Phys. Rev. D (2011) 052010.
[9] SNO Collaboration, B. Aharmim et al., "Low Energy Threshold Analysis of the Phase I and Phase II Data Sets of the Sudbury Neutrino Observatory," Phys. Rev. C 81 (2010) 055504, arXiv:0910.2984 [nucl-ex].
[10] KamLAND Collaboration, K. Eguchi et al., "First results from KamLAND: Evidence for reactor antineutrino disappearance," Phys. Rev. Lett. (2003) 021802.
[11] X. Qian et al., "Statistical evaluation of experimental determinations of neutrino mass hierarchy," Phys. Rev. D (2012).
[12] J. A. Formaggio, "Direct neutrino mass measurements after PLANCK," Phys. Dark Univ. (2014) 75–80.
[13] B. Monreal and J. Formaggio, "Relativistic Cyclotron Radiation Detection of Tritium Decay Electrons as a New Technique for Measuring the Neutrino Mass," Phys. Rev. D 80 (2009) 051301(R).
[14] Project 8 Collaboration, A. Ashtari Esfahani et al., "Determining the neutrino mass with cyclotron radiation emission spectroscopy—Project 8," J. Phys. G 44, no. 5 (2017) 054004, arXiv:1703.02037 [physics.ins-det].
[15] S. Talts et al., "Validating Bayesian inference algorithms with simulation-based calibration," arXiv:1804.06788 [stat.ME].
[16] G. J. Feldman and R. D. Cousins, "Unified approach to the classical statistical analysis of small signals," Phys. Rev. D (1998) 3873–3889.
[17] J. Neyman and H. Jeffreys, "Outline of a theory of statistical estimation based on the classical theory of probability," Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences no. 767 (1937) 333–380.
[18] M. J. Bayarri and J. O. Berger, "The interplay of Bayesian and frequentist analysis," Statistical Science (2004) 58–80.
[19] M. Betancourt, "Probabilistic modeling and statistical inference," 2019. https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html. Retrieved from https://github.com/betanalpha/knitr-case-studies/tree/master/modeling_and_inference, commit b474ec1a5a79347f7c9634376c866fe3294d657a.
[20] R. Hyndman et al., "Computing and graphing highest density regions," The American Statistician (1996) 120–126.
[21] X. Qian and P. Vogel, "Neutrino Mass Hierarchy," Prog. Part. Nucl. Phys. (2015) 113011, arXiv:1505.01891 [hep-ex].
[22] R. Neal, "MCMC using Hamiltonian dynamics," in Handbook of Markov Chain Monte Carlo, edited by Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng, 116–162. Chapman and Hall/CRC, 2011.
[23] M. Betancourt, "A Conceptual Introduction to Hamiltonian Monte Carlo," arXiv:1701.02434 [stat.ME] (2017).
[24] B. Carpenter, A. Gelman, M. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell, "Stan: A probabilistic programming language," Journal of Statistical Software no. 1 (2017) 1–32.
[25] Stan Development Team, "Stan modeling language users guide and reference manual, version 2.25," 2020. https://mc-stan.org/users/documentation/.
[26] M. Guigue, J. Formaggio, T. Weiss, J. Johnston, N. Oblath, and B. LaRoque, "morpho," 2020. GitHub package.
[27] L. I. Bodine, D. S. Parno, and R. G. H. Robertson, "Assessment of molecular effects on neutrino mass measurements from tritium β decay," Phys. Rev. C (2015) 035505.
[28] M. Fertl, "Review of absolute neutrino mass measurements," Hyperfine Interact. (2018) 52.
[29] M. Kleesiek et al., "β-Decay Spectrum, Response Function and Statistical Model for Neutrino Mass Measurements with the KATRIN Experiment," Eur. Phys. J. C (2019).
[30] KATRIN Collaboration, M. Aker et al., "An improved upper limit on the neutrino mass from a direct kinematic method by KATRIN," 2019.
[31] E. G. Myers, A. Wagner, H. Kracke, and B. A. Wesson, "Atomic masses of tritium and helium-3," Phys. Rev. Lett. (2015) 013003.
[32] Project 8 Collaboration, P. J. Doe et al., "Project 8: Determining neutrino mass from tritium beta decay using a frequency-based method," arXiv:1309.7093 [nucl-ex] (2013).
[33] R. D. Williams and S. E. Koonin, "Atomic final-state interactions in tritium decay," Phys. Rev. C (1983) 1815–1817.
[34] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis. Chapman and Hall/CRC, 3rd ed., 2013.
[35] M. Betancourt, "Toward a principled Bayesian workflow (PyStan)," 2018. Retrieved from https://github.com/betanalpha/jupyter_case_studies/tree/master/principled_bayesian_workflow, commit 2580fdeac38f77859b2d1c60f9c4a37237864e63.
[36] M. Betancourt, "Markov chain Monte Carlo," 2020. https://betanalpha.github.io/assets/case_studies/markov_chain_monte_carlo.html. Retrieved from https://github.com/betanalpha/knitr-case-studies/tree/master/markov_chain_monte_carlo, commit b474ec1a5a79347f7c9634376c866fe3294d657a.
[37] Valencia Neutrino Global Fit Collaboration, "Reactor neutrino experiments (2018)." https://globalfit.astroparticles.es/2018/02/20/reactor-experiments/.
[38] K. N. Abazajian et al., "Cosmological and Astrophysical Neutrino Mass Measurements," Astropart. Phys. (2011) 177–184, arXiv:1103.5083 [astro-ph.CO].

Appendix A: Approximate spectral model
The approximate β spectral model for Bayesian inference in Eq. 7 has a corresponding cumulative distribution function. It is given by

$$
G^{\mathrm{CDF}}_i(K) = \int_K^{\infty} F_i(K')\,dK' = \Big[ G_A(K \mid m_i, Q_T, \sigma) - G_B(K \mid m_i, Q_T, \sigma, K_{\mathrm{min}}) \Big] / C,
$$

where

$$
G_A = 2\sigma^2\, N(Q_T - K \mid m_i, \sigma)\, \Big[ \sigma^2 - m_i^2 + 2 m_i (Q_T - K) + 2 (Q_T - K)^2 \Big]
+ \mathrm{Erfc}\!\left( \frac{m_i - Q_T + K}{\sqrt{2}\,\sigma} \right) \Big[ m_i^3 + (Q_T - K)\big( \sigma^2 - m_i^2 + 2 (Q_T - K)^2 \big) \Big],
$$

$$
G_B = 2\sigma^2\, N(Q_T - K \mid Q_T - K_{\mathrm{min}}, \sigma)\, \Big[ \sigma^2 - m_i^2 + 2\big( (Q_T - K_{\mathrm{min}})^2 - (Q_T - K_{\mathrm{min}})(Q_T - K) + (Q_T - K)^2 \big) \Big]
+ \mathrm{Erfc}\!\left( \frac{K - K_{\mathrm{min}}}{\sqrt{2}\,\sigma} \right) \Big[ (Q_T - K_{\mathrm{min}})\big( m_i^2 - (Q_T - K_{\mathrm{min}})^2 \big) + (Q_T - K)\big( 6\sigma^2 - m_i^2 + 2 (Q_T - K)^2 \big) \Big],
$$

$$
C = \Big[ G_{\mathrm{high}}(K) - G_{\mathrm{low}}(K) \Big] \Big|_{K_{\mathrm{min}}}^{\infty}.
$$

We implemented this function in Stan and employed it to analyze fake spectra.
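Whatever the exact closed form, an analytic survival function of this kind is easy to mistranscribe into Stan, and a brute-force numerical cross-check is cheap. The sketch below is plain Python with toy, illustrative parameters (endpoint, mass, and resolution values are assumptions, not the inputs or code of this analysis): it builds a Gaussian-smeared near-endpoint spectrum by direct convolution on a grid and accumulates it from the high-energy end, producing a survival function that an analytic G_i^CDF implementation can be compared against bin by bin.

```python
import math

Q, M, SIGMA = 18575.0, 0.2, 0.5  # toy endpoint, mass, resolution (eV); illustrative only

def bare_spectrum(K):
    """Unsmeared near-endpoint shape (Q-K)*sqrt((Q-K)^2 - M^2), zero beyond Q - M."""
    eps = Q - K
    return eps * math.sqrt(eps * eps - M * M) if eps > M else 0.0

def gaussian(x, sigma):
    """Normalized Gaussian resolution kernel."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def smeared_survival(grid):
    """Convolve the bare spectrum with the Gaussian resolution on a uniform grid
    (convolution truncated to the analysis window), then accumulate from the
    high-energy end to approximate the normalized tail integral of F."""
    dK = grid[1] - grid[0]
    smeared = [sum(bare_spectrum(Kp) * gaussian(K - Kp, SIGMA) for Kp in grid) * dK
               for K in grid]
    total = sum(smeared) * dK
    tail, survival = 0.0, []
    for f in reversed(smeared):
        tail += f * dK
        survival.append(tail / total)
    return survival[::-1]  # ~1 at the low-K edge of the window, falling to 0 past Q

grid = [18570.0 + 0.02 * i for i in range(500)]  # ~10 eV window around the endpoint
S = smeared_survival(grid)
assert all(S[i] >= S[i + 1] for i in range(len(S) - 1))  # monotone non-increasing
```

Agreement of such a quadrature with the closed form across the analysis window is a useful unit test before running full inference.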
Appendix B: Prior distribution definitions
The prior distributions used in this paper are defined as follows. Each distribution is implemented via a Stan function that outputs the log of the probability density of a parameter y [25].

1. Normal distribution:
$$
N(\mu, \sigma) \equiv N(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{1}{2} \left( \frac{y - \mu}{\sigma} \right)^{2} \right)
$$
2. Gamma distribution:
$$
\gamma(\alpha, \beta) \equiv \gamma(y \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha - 1} \exp(-\beta y), \qquad \text{where } \Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\, dx
$$
3. Log-normal distribution:
$$
\mathrm{lognorm}(\mu, \sigma) \equiv \mathrm{lognorm}(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma y} \exp\!\left( -\frac{1}{2} \left( \frac{\log y - \mu}{\sigma} \right)^{2} \right)
$$
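For spot-checking that a model's priors match these definitions, the three densities can be mirrored as log-density functions in plain Python, in the spirit of Stan's *_lpdf convention (the function names here are hypothetical helpers, not this paper's code):

```python
import math

def normal_lpdf(y, mu, sigma):
    """log N(y | mu, sigma)"""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def gamma_lpdf(y, alpha, beta):
    """log gamma(y | alpha, beta), shape-rate parameterization (y > 0)"""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(y) - beta * y)

def lognormal_lpdf(y, mu, sigma):
    """log lognorm(y | mu, sigma) (y > 0)"""
    return (-0.5 * ((math.log(y) - mu) / sigma) ** 2
            - math.log(y * sigma * math.sqrt(2.0 * math.pi)))

# Each density should integrate to ~1; check the normal case by simple quadrature.
dx = 1e-3
total = sum(math.exp(normal_lpdf(-5.0 + i * dx, 0.0, 1.0)) * dx for i in range(10000))
assert abs(total - 1.0) < 1e-3
```

Exponentiating and numerically integrating each density, as the final lines do for the normal case, is a quick sanity check that the normalization constants are right before the same expressions are trusted inside a sampler.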