Gaussian Process Accelerated Feldman-Cousins Approach for Physical Parameter Inference
Lingge Li, Nitish Nayak, Jianming Bian, Pierre Baldi
University of California, Irvine
(Dated: October 2, 2019)
Abstract
The unified approach of Feldman and Cousins allows for exact statistical inference of small signals that commonly arise in high energy physics. It has gained widespread use, for instance, in measurements of neutrino oscillation parameters in long-baseline experiments. However, the approach relies on the Neyman construction of the classical confidence interval and is computationally intensive as it is typically done in a grid-based fashion over the entire parameter space. In this letter, we propose an efficient algorithm for the Feldman-Cousins approach using Gaussian processes to construct confidence intervals iteratively. We show that in the neutrino oscillation context, one can obtain confidence intervals 5 times faster in one dimension and 10 times faster in two dimensions, while maintaining an accuracy above 99%.

I. INTRODUCTION

Constructing classical confidence intervals for physical parameters with boundary conditions is challenging when dealing with small signals. The challenge is especially evident when studying neutrino oscillations because of the low event counts and multiple competing effects on the energy spectrum. The low event counts are primarily caused by the extremely low interaction cross-section of neutrinos, arising from the fact that they interact via the weak nuclear force. In order to extract meaningful statistical conclusions, one has to resort to means other than the asymptotic properties of Poisson data. The gold standard is the so-called unified approach outlined by Feldman and Cousins [1]. It builds upon the Neyman construction of classical confidence intervals by specifying an ordering principle based on likelihood ratios and is known for providing correct coverage.

The Feldman-Cousins approach is firmly grounded in statistical theory and widely used in neutrino experiments, for example Refs. [2][3][4]. However, it comes at a heavy computational cost, which in some cases such as Ref. [2] renders it infeasible for multi-dimensional confidence intervals. For the 1 − α confidence interval, the Feldman-Cousins approach includes all the values in the parameter space where the likelihood ratio test fails to reject at the α level. However, it doesn't provide a prescription for how to sample that parameter space. Therefore, one is forced to sample it in its entirety in a grid-based fashion. Moreover, at each point one has to perform a large number of Monte Carlo simulations in order to calculate the p-value for the likelihood ratio test.

To accelerate the Feldman-Cousins approach, we propose approximating the function of p-values over the parameter space with Gaussian processes. Instead of performing a large number of Monte Carlo simulations, we start with just a small number of them at several parameter values to get noisy estimates of the p-values. We then train a Gaussian process model to interpolate over these estimates. Iteratively, we perform more Monte Carlo simulations to refine the Gaussian process approximation. We can control the p-value approximation error so that it does not change the likelihood ratio test decisions and the confidence interval. Meanwhile, the Monte Carlo simulations can be allocated intelligently in the parameter space to achieve substantial savings in computation.

The proposed algorithm is rooted in the framework of Bayesian optimization [5], which was originally designed to find the extremal points of an objective function that is unknown a priori.
In the Feldman-Cousins approach, the function of p-values over the parameter space is unknown. We adapt Bayesian optimization to locate a set of points in the parameter space that lie on the boundary of the desired confidence intervals. By side-stepping points that are estimated to be either inside or outside the confidence interval with high probability, we can thus reduce the computational cost while producing the same result. We show that in the context of neutrino oscillation experiments, one can accelerate the construction of one-dimensional and two-dimensional confidence intervals by a factor of 5 and 10 respectively, without sacrificing the accuracy of the Feldman-Cousins approach.

II. STATISTICAL INFERENCE FOR NEUTRINO OSCILLATIONS

A. Neutrino Oscillations
Neutrino oscillations demonstrate that neutrinos have mass and that the neutrino mass eigenstates are different from their flavor eigenstates. In the three-flavor framework, the transformation of the mass eigenstates (ν₁, ν₂, ν₃) into the flavor eigenstates (ν_e, ν_µ, ν_τ) is described by the 3 × 3 unitary matrix U_PMNS [6], which is parameterized by three mixing angles θ₁₂, θ₂₃ and θ₁₃, and a CP violation phase δ_CP. The probability of oscillations between different neutrino flavor states of given energy E_ν over a propagation distance (baseline) L depends on the U_PMNS parameters and the differences of the squared masses of the eigenstates, ∆m²₂₁ and ∆m²₃₂.

The mixing angles θ₁₂ and θ₁₃ along with the squared-mass splitting ∆m²₂₁ have been measured to relatively high accuracy by several experiments, for example, Refs. [7][8][9]. One can then infer the remaining parameters, θ₂₃, δ_CP, and ∆m²₃₂, by measuring the probabilities P(ν_µ → ν_µ) and P(ν_µ → ν_e). Of particular interest are: (1) the sign of ∆m²₃₂, positive indicating a "Normal Hierarchy" (NH) and negative indicating an "Inverted Hierarchy" (IH) of neutrino mass states; (2) whether δ_CP ≠ 0, π, indicating Charge-Parity (CP) violation in the lepton sector; (3) whether the mixing angle is in fact maximal, i.e. θ₂₃ = 45°. The neutrino mass hierarchy has important implications for current and future neutrino experiments [10] involved in measuring the absolute neutrino mass and investigating the possible Majorana nature of the neutrino. Leptonic CP violation could be important to deduce the origin of the predominance of matter in the universe.

To infer neutrino oscillation parameters θ, a typical long-baseline neutrino oscillation experiment sends a beam of ν_µ neutrinos into a detector and observes a handful of oscillated ν_e neutrinos along with the ν_µ neutrinos that survive over the baseline. As the oscillation probability is a function of neutrino energy, the observed neutrinos are binned by their energy. The neutrino oscillation parameters are inferred by comparing the observed neutrino energy spectra with the expected spectra for different oscillation parameters as shown in Fig. 1.

FIG. 1: An illustration of a toy neutrino oscillation experiment setup with the ν_µ → ν_e channel on the left and the ν_µ → ν_µ channel on the right. Expectations for different oscillation parameters are compared to mock observations in order to find maximum likelihood estimates. The likelihood of the observed data is maximized using the extended likelihood function. The fit is performed in both channels simultaneously.
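For orientation, the parameter dependence sketched above can be made explicit in the standard two-flavor vacuum approximation for the muon neutrino survival probability; this is a textbook formula quoted here for context, not a result of this analysis:

\[ P(\nu_\mu \to \nu_\mu) \;\approx\; 1 - \sin^2(2\theta_{23})\, \sin^2\!\left(\frac{1.267\, \Delta m^2_{32}\,[\mathrm{eV}^2]\; L\,[\mathrm{km}]}{E_\nu\,[\mathrm{GeV}]}\right), \]

so the position of the dip in the ν_µ spectrum constrains ∆m²₃₂ while its depth constrains sin²(2θ₂₃).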
B. Feldman-Cousins Approach

Denote the random variable for the neutrino count in the i-th energy bin by X_i. Further, assume that each X_i follows an independent Poisson distribution with mean λ_i. For a given θ, the expectations λ⃗ are also influenced by systematic uncertainties in the beam configuration and the interaction model among others, which we parameterize by δ. For given oscillation and nuisance parameters (θ, δ), the expectations λ⃗ are obtained through simulations as they are analytically intractable. Denote the implicit mapping between λ⃗ and (θ, δ) by v. The extended log-likelihood of (θ, δ) is given by

\[ \log L(\theta, \delta) = \sum_{i \in I} \log \mathrm{Pois}\big(x_i;\, v(\theta, \delta)_i\big) + \log \mathrm{Pois}\Big(\sum_{i \in I} x_i;\, \sum_{i \in I} v(\theta, \delta)_i\Big) - \delta^2, \]

where −δ² is a penalty term for the systematic error [11].

For a unified treatment of constructing classical confidence intervals for both null and non-null observations, an ordering principle based on likelihood ratios was introduced by Feldman and Cousins in 1997. The unified approach provides correct coverage even at parameter boundaries and has the highest statistical power as a result of the Neyman-Pearson lemma. In essence, a particular parameter value θ₀ is included in the 1 − α confidence interval if the likelihood ratio test fails to reject the null hypothesis θ = θ₀ at the α level. The likelihood ratio test statistic is given by

\[ -2 \log \frac{L(\theta_0)}{\max_{\theta} L(\theta)} \]

and has an asymptotic χ² distribution by Wilks' theorem.

FIG. 2: In the context of neutrino oscillations, the likelihood ratio test statistic distribution changes across the parameter space. Here the parameter is δ_CP and ranges from 0 to 2π. The solid blue line indicates the 68th percentile of Monte Carlo simulated distributions while the dashed black line is the 68th percentile of the asymptotic χ² distribution.

In the context of neutrino oscillations, the asymptotic distribution is unreliable because of the small sample size in neutrino data and physical boundaries on the oscillation parameters. The reference distribution of the likelihood ratio test statistic can vary drastically as a function of θ; Fig. 2 shows several distributions at different θ values and comparisons of their critical values in particular. Therefore, for any given θ, Monte Carlo experiments are used to simulate the reference distribution and calculate the p-value for the likelihood ratio test. Since the parameter space is bounded, the simulations are performed on a grid for a large number of θ values and the computational cost adds up quickly.
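To make the computational burden concrete, the following minimal Python sketch computes the Feldman-Cousins p-value at a single grid point for a binned Poisson model. The bin expectations, the grid of alternative predictions, and the omission of nuisance-parameter profiling are all simplifying assumptions of ours, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def loglik(counts, mu):
    # Poisson log-likelihood up to constants (factorials cancel in ratios);
    # assumes all expectations mu are strictly positive
    return np.sum(counts * np.log(mu) - mu)

def lr_statistic(counts, mu_theta, predictions):
    # -2 log [ L(theta) / max_theta' L(theta') ], maximized over a grid of
    # predicted spectra; predictions should contain mu_theta itself so that
    # the statistic is non-negative
    best = max(loglik(counts, mu) for mu in predictions)
    return -2.0 * (loglik(counts, mu_theta) - best)

def fc_pvalue(observed, mu_theta, predictions, n_mc=2000):
    # Simulate the reference distribution of the statistic at this grid point
    t_obs = lr_statistic(observed, mu_theta, predictions)
    t_mc = np.array([lr_statistic(rng.poisson(mu_theta), mu_theta, predictions)
                     for _ in range(n_mc)])
    return np.mean(t_mc >= t_obs)  # fraction of pseudo-experiments above t_obs
```

Repeating this at every grid point, with n_mc large enough for a stable tail estimate, is what makes the brute-force construction expensive.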
III. GAUSSIAN PROCESS ALGORITHM

A. Gaussian Process Regression

A Gaussian process (GP) is a stochastic process where any finite collection of points is jointly Gaussian with mean µ and covariance Σ. A GP can be interpreted as an infinite-dimensional extension of the multivariate Gaussian; it can be thought of as a distribution in function space where each draw from the distribution is a curve. Typically, a zero-mean GP is used for modeling, but it is still impossible to specify an infinite-dimensional covariance matrix explicitly. Instead, we can parameterize a zero-mean GP with a kernel function κ that defines the pairwise covariance. Let f ∼ GP(0, κ(·, ·)). Then for any pair x and x′ we have

\[ \begin{pmatrix} f(x) \\ f(x') \end{pmatrix} \sim N\!\left( 0, \begin{bmatrix} \kappa(x, x) & \kappa(x, x') \\ \kappa(x, x') & \kappa(x', x') \end{bmatrix} \right). \]

Given a finite set of observed data \(\vec{x}_{obs}\), we can write down the multivariate Gaussian likelihood in this fashion and maximize it with respect to the kernel parameters ω. Conveniently, at a new point x∗ we can obtain the closed-form predictive distribution:

\[ f(x^*) \mid f(\vec{x}_{obs}) \sim N\Big( \kappa(x^*, \vec{x}_{obs})\, \kappa(\vec{x}_{obs}, \vec{x}_{obs})^{-1} f(\vec{x}_{obs}),\;\; \kappa(x^*, x^*) - \kappa(x^*, \vec{x}_{obs})\, \kappa(\vec{x}_{obs}, \vec{x}_{obs})^{-1} \kappa(\vec{x}_{obs}, x^*) \Big). \]

Since a GP is uniquely characterized by its kernel, different kernels produce distinct behaviors. A commonly used kernel is the squared exponential κ(x₁, x₂) = exp(−(x₁ − x₂)²/l²), where l is called the length scale. Intuitively, the length scale determines the distance over which the GP interpolates between points. The squared exponential kernel is infinitely differentiable and functions drawn from such a GP are smooth. However, this smoothness assumption might not be appropriate for some applications; a more general kernel is the Matérn kernel. The Matérn kernel has an additional parameter ν that controls the smoothness, and the squared exponential kernel is the special case where ν → ∞. Fig. 3 shows some GP examples; please refer to Ref. [12] for more details on Gaussian processes.

FIG. 3: Sampled paths from Gaussian processes with different kernels (left). With ν = 1.5, the Matérn kernel produces functions that are only once differentiable. Sampled paths from a Gaussian process prior and posterior with the squared exponential kernel (right). The posterior paths, representing curves drawn from the predictive distribution, are better aligned with the observed data points in solid blue.

Different kernels can be combined to compose a GP as long as the new kernel covariance matrix is still positive semi-definite. With the squared exponential kernel alone, the covariance implies that the observed data have no error. To account for error in the data, a diagonal matrix σ²I is usually added to model constant variance across observations. In many situations such as ours, there exists heteroskedasticity, which means that different observations have different errors. When we iteratively perform Monte Carlo simulations to calculate p-values, the errors in the estimates also vary based on the number of simulations. We can model the p-value errors as a diagonal matrix and add it to the GP covariance.
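The closed-form posterior above, with per-point Monte Carlo variances on the diagonal, is short enough to write out directly. Below is a minimal NumPy sketch for one-dimensional inputs; the kernel follows the definition in the text, while the function and variable names are ours:

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale):
    # kappa(x1, x2) = exp(-(x1 - x2)^2 / l^2), as defined in the text
    return np.exp(-((a[:, None] - b[None, :]) / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, noise_var, x_new, length_scale):
    """Predictive mean and standard deviation of a zero-mean GP, with
    heteroskedastic noise variances noise_var added to the diagonal."""
    K_oo = sq_exp_kernel(x_obs, x_obs, length_scale) + np.diag(noise_var)
    K_no = sq_exp_kernel(x_new, x_obs, length_scale)
    K_nn = sq_exp_kernel(x_new, x_new, length_scale)
    mean = K_no @ np.linalg.solve(K_oo, y_obs)
    cov = K_nn - K_no @ np.linalg.solve(K_oo, K_no.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```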
B. Monte Carlo Error Estimation

In the Feldman-Cousins approach, a large number of Monte Carlo simulations is required in order to make the error in the p-value calculation negligible. When the Monte Carlo error in the p-value calculation is not negligible, we should try to quantify it. Since the p-value is the quantile of the observed likelihood ratio statistic under the reference distribution, we can use a binomial proportion confidence interval as the p-value error estimate, as outlined below [13]. As shown in Fig. 4, the Monte Carlo error only slowly approaches zero as the number of simulations increases to 10,000.
FIG. 4: Monte Carlo error in terms of the p-value as a function of the number of experiments (left). Example of non-parametric quantile interval construction using the Binomial distribution (right). In a sample with 100 draws, the 85th and 95th order statistics form a 95% confidence interval for the 90th quantile of the unknown distribution.

Suppose X₁, ..., X_n are independent draws from an unknown distribution F whose q-th quantile is denoted by F⁻¹(q). Each draw X_i falls below F⁻¹(q) with probability q. Consequently, M, the number of X_i's less than or equal to F⁻¹(q), has a Binomial(n, q) distribution. We can obtain a confidence interval for F⁻¹(q) from the sample statistics X₍l₎, X₍u₎ (the l-th and u-th ordered draws) with 1 ≤ l ≤ u ≤ n such that

\[ B(u - 1;\, n, q) - B(l - 1;\, n, q) \geq 1 - \alpha, \]

where B(·; n, q) denotes the Binomial(n, q) cumulative distribution function. B(u − 1; n, q) − B(l − 1; n, q) is the probability that M is between l and u − 1. Thus, (l, u) would form a confidence interval for M. Correspondingly, (X₍l₎, X₍u₎) would form a confidence interval for F⁻¹(q). Our goal, however, is to estimate F(x∗) for an arbitrary point x∗ rather than F⁻¹(q) for a fixed q, where, in our context, x∗ is the observed likelihood ratio test statistic and F(x∗) is the p-value. This can be done by inverting the quantile confidence interval: we vary q until the confidence intervals for F⁻¹(q_l) and F⁻¹(q_u) no longer contain x∗. Then (q_l, q_u) forms a confidence interval for F(x∗).
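A hedged SciPy sketch of the order-statistic construction above; the equal-tailed split of the miscoverage is our choice (other splits satisfy the same inequality):

```python
from scipy.stats import binom

def quantile_ci_indices(n, q, conf=0.95):
    """Order-statistic indices (l, u), 1-based, such that (X_(l), X_(u))
    covers the q-th quantile of an unknown distribution with probability
    >= conf, i.e. B(u-1; n, q) - B(l-1; n, q) >= conf."""
    tail = (1.0 - conf) / 2.0
    l = int(binom.ppf(tail, n, q))            # ensures B(l-1; n, q) < tail
    u = int(binom.ppf(1.0 - tail, n, q)) + 1  # ensures B(u-1; n, q) >= 1 - tail
    return l, u

# For n = 100 draws and the 90th quantile, this yields indices close to the
# (85, 95) pair quoted in Fig. 4; the exact pair depends on the tail split.
print(quantile_ci_indices(100, 0.9))
```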
C. Proposed Algorithm

Bayesian optimization can be used to find the extremum of a black-box function h when h is expensive to evaluate, so that a grid search is too computationally intensive. Bayesian optimization is an iterative procedure; in each iteration, h is evaluated at a number of points to update an approximation of h. The approximation usually starts from a zero-mean Gaussian process prior GP(0, κ(·, ·)). After each iteration, the GP model yields a posterior distribution, hence Bayesian. Based on the approximation posterior, the points in the next iteration are proposed by an acquisition function a. The acquisition function aims to balance "exploration", reducing approximation uncertainty, and "exploitation", reaching the extremum.

In our context, the expensive black-box function is the function of p-values over the parameter space. Denote the grid points in the parameter space where Monte Carlo simulations are performed by \(\vec{\theta}_o\), the simulated p-values at these points by \(y(\vec{\theta}_o)\), and the independent simulation errors by \(\sigma^2(\vec{\theta}_o)\). The GP predictive posterior distribution of the unobserved p-values at \(\vec{\theta}_u\), conditional on the obtained p-values \(y(\vec{\theta}_o)\) at points \(\vec{\theta}_o\), is then given by

\[ f(\vec{\theta}_u) \mid y(\vec{\theta}_o) \sim N\Big( K_{uo}\big(K_{oo} + \mathrm{diag}(\sigma^2(\vec{\theta}_o))\big)^{-1} y(\vec{\theta}_o),\;\; K_{uu} - K_{uo}\big(K_{oo} + \mathrm{diag}(\sigma^2(\vec{\theta}_o))\big)^{-1} K_{ou} \Big), \]

where K_oo, K_ou, K_uo, K_uu denote the covariance matrices between the points \(\vec{\theta}_o\) and \(\vec{\theta}_u\).

Different from typical Bayesian optimization, we do not simply wish to find the minimum or maximum p-value. Instead, we want to find the points where the p-value is equal to α so that they enclose the confidence interval. Moreover, we want to be able to find multiple intervals at different confidence levels. Therefore, we choose our acquisition function to be

\[ a(\theta) = \sum_{\alpha_i} \left| \frac{f(\theta) - \alpha_i}{\sigma_f(\theta)} \right|^{-1}, \]

where f(θ) is the GP approximated p-value (posterior mean) at θ and σ_f(θ) is the GP posterior standard deviation at θ.

Algorithm 1: GP iterative confidence interval construction

    for each iteration t = 1, 2, ... do
        Propose points in the parameter space: arg max_θ a(θ)
        for each proposed point θ′ do
            Simulate the likelihood ratio statistic distribution:
            for k = 1, 2, ... do
                Perform a pseudo-experiment
                Maximize the likelihood with respect to (θ, δ)
                Maximize the likelihood with the constraint θ = θ′
                Calculate the likelihood ratio statistic
            end for
            Calculate the p-value based on the simulated distribution
        end for
        Train the GP approximation f(θ) for the p-values
        Update confidence intervals
    end for
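A minimal sketch of the acquisition rule above on a one-dimensional grid, given GP posterior means and standard deviations; the target α levels for 68% and 90% intervals and the small guard constant are our choices:

```python
import numpy as np

def acquisition(mean, sd, levels=(0.32, 0.10)):
    # a(theta) = sum_i |(f(theta) - alpha_i) / sigma_f(theta)|^(-1);
    # large where the posterior mean is close to a target level alpha_i
    # relative to the posterior uncertainty, i.e. near an interval boundary
    return sum(sd / (np.abs(mean - a) + 1e-12) for a in levels)

def propose(grid, mean, sd, k=5):
    # Pick the k highest-priority grid points for the next batch of MC
    return grid[np.argsort(-acquisition(mean, sd))[:k]]
```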
Iteratively, the GP algorithm will seek out points on the boundary of the confidence intervals, about which it is unsure. Points far from the boundary, which have p-values much greater or less than the α_i, are probabilistically "ruled out." At these points, we will end up performing fewer Monte Carlo experiments or skipping them altogether. Every point on the grid is either included or rejected with some uncertainty based on the GP posterior. With more iterations, the uncertainty diminishes so that the approximated confidence intervals converge to the ones produced by a full grid search. Fig. 5 illustrates the proposed algorithm on a 1-dimensional example.

FIG. 5: An illustration of our construction for the 68% and 90% confidence intervals for δ_CP, which consist of points lying underneath the dashed horizontal lines. From a few initial points with high variance, the GP learns a rough approximation of the true curve (left). Based on the approximation, more points are proposed around the interval boundary, shown in dark blue, and the GP improves itself (right). The shade of blue represents the number of simulations used to calculate the p-value and the error bars are for the p-value.

Here we use the squared exponential kernel with the Monte Carlo errors added to the covariance diagonal, along with a small amount of white noise as is often done in a regression setting [12]. Point estimates of the GP kernel parameters are obtained by optimizing the log marginal likelihood

\[ -\frac{1}{2}\, y(\vec{\theta}_o)^T \big(K_{oo} + \mathrm{diag}(\sigma^2(\vec{\theta}_o))\big)^{-1} y(\vec{\theta}_o) - \frac{1}{2} \log \big| K_{oo} + \mathrm{diag}(\sigma^2(\vec{\theta}_o)) \big| - \frac{n}{2} \log 2\pi. \]
There are constraints on the kernel parameters that should be incorporated. For instance, the length scale l should be greater than the grid resolution and less than the grid range.
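The paper implements the GP with scikit-learn [18]; the hedged sketch below shows one way to encode the bounded length scale, the white-noise term, and the heteroskedastic Monte Carlo variances. The grid, the p-value estimates, and their variances are placeholder values:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical 1D grid over delta_CP in [0, 2*pi) with 20 points
grid = np.linspace(0.0, 2 * np.pi, 20, endpoint=False)
res, span = grid[1] - grid[0], grid[-1] - grid[0]

# p-value estimates y and per-point MC variances sigma2 at sampled points
# (placeholders; in practice these come from the pseudo-experiments)
theta_o = grid[::4]
y = np.array([0.9, 0.6, 0.2, 0.4, 0.8])
sigma2 = np.array([0.01, 0.02, 0.005, 0.02, 0.01])

# Length scale constrained between the grid resolution and the grid range,
# plus a small white-noise term, as described in the text
kernel = RBF(length_scale=3 * res, length_scale_bounds=(res, span)) \
         + WhiteKernel(noise_level=1e-4, noise_level_bounds=(1e-6, 1e-2))
gp = GaussianProcessRegressor(kernel=kernel, alpha=sigma2)  # heteroskedastic errors
gp.fit(theta_o.reshape(-1, 1), y)  # optimizes the log marginal likelihood
mean, sd = gp.predict(grid.reshape(-1, 1), return_std=True)
```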
IV. NUMERICAL STUDIES

By way of illustration, we set up a toy long-baseline neutrino oscillation experiment in order to construct confidence intervals for the oscillation parameters. A flux distribution of ν_µs is modeled as a Landau function over neutrino energies up to 4.5 GeV with the location parameter at 2 GeV, as shown in Fig. 6. The normalisation uncertainty is taken to be 10% and is applied as a nuisance parameter. The ν_µ distribution is then oscillated into ν_es using the PMNS model for a toy baseline of 810 km through the Earth. Corrections from matter interactions [14] are applied assuming a constant matter density. The setup is similar to NOvA [3], an accelerator-based long-baseline experiment at Fermilab. The oscillated ν_es are then "observed" with a toy interaction cross-section distribution, similar in shape to Ref. [15]; the cross-section increases as a function of neutrino energy from 0 GeV up to 1 GeV and decreases slowly until the maximum neutrino energy of 4.5 GeV. The cross-section is multiplied with the oscillated ν_e distribution to get an energy spectrum expectation in uniform energy bins, and the same procedure is applied to the ν_µ → ν_µ channel. However, in order to expedite the computation, the 2-flavor oscillation probability approximation is used. The reactor mixing angle θ₁₃ and the solar parameters θ₁₂ and ∆m²₂₁ are fixed at the values given in Ref. [16]. A mock data set is obtained by applying Poisson variations on the expected spectrum at oscillation parameter values given by NOvA.

FIG. 6: The distributions for the ν_e interaction cross-section (left) and the ν_µ flux (right) are shown along with a normalization systematic error of 10%.

We then use this setup to construct 1-dimensional confidence intervals for δ_CP and 2-dimensional confidence intervals for sin²θ₂₃ vs δ_CP with the two algorithms, a standard grid-search implementation of Feldman-Cousins and the GP-based algorithm. |∆m²₃₂| is treated as a nuisance parameter, and so is sin²θ₂₃ in the case of the 1-dimensional interval for δ_CP. The likelihood function is integrated over the nuisance parameters assuming a flat prior over a bounded range for |∆m²₃₂| and for sin²θ₂₃, similar to Ref. [2]. The prior on the nuisance parameters for the systematic uncertainties in the toy model is assumed to be a standard normal distribution. The toy model and parameter fitting routine are implemented in ROOT [17] while the Gaussian process algorithm is implemented with scikit-learn [18].
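As a concrete illustration of this kind of toy model, the sketch below builds a binned ν_µ survival spectrum from a Landau-like flux (approximated here by SciPy's Moyal distribution) and the 2-flavor survival probability. The bin count, flux scale, and normalization are placeholder choices, not the paper's exact configuration:

```python
import numpy as np
from scipy.stats import moyal

def survival_prob(e_nu, sin2_2th23, dm2_32, baseline=810.0):
    # Two-flavor vacuum approximation:
    # P = 1 - sin^2(2*theta23) * sin^2(1.267 * dm2 [eV^2] * L [km] / E [GeV])
    return 1.0 - sin2_2th23 * np.sin(1.267 * dm2_32 * baseline / e_nu) ** 2

def expected_numu_spectrum(sin2_2th23, dm2_32, norm=500.0, n_bins=16):
    """Toy expected nu_mu counts per energy bin: Landau-like flux shape
    (Moyal, peaked near 2 GeV) times the survival probability."""
    edges = np.linspace(0.5, 4.5, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    flux = moyal.pdf(centers, loc=2.0, scale=0.4)
    return norm * flux * survival_prob(centers, sin2_2th23, dm2_32) * np.diff(edges)

# A mock data set applies Poisson fluctuations to the expectation:
rng = np.random.default_rng(1)
observed = rng.poisson(expected_numu_spectrum(0.95, 2.5e-3))
```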
A. 1-dimensional Confidence Intervals

To make inferences on δ_CP, a significance curve is usually drawn under different assumptions of the mass hierarchy, as shown in Fig. 7. The portion of the significance curve below a certain value gives us the confidence interval at that level. We can observe that the NH curves from both the standard FC and GP algorithms have the same intersections with the 1σ horizontal line, which implies that the 1σ confidence intervals are the same. Though there are slight discrepancies, the shape of the GP significance curve is mostly correct.

FIG. 7: Example significance curves obtained with the standard Feldman-Cousins and Gaussian process algorithms mostly overlap, especially when the significance is close to the 68% and 90% confidence levels, as desired. In this case, the inverted hierarchy (IH) is rejected at more than the 1σ level and the normal hierarchy (NH) has the same 1σ confidence interval.

To evaluate the performance of the GP algorithm, we perform the same inference procedure on 200 different data sets to find the 68% and 90% confidence intervals. First, with the standard FC results as ground truth, we consider the accuracy of the GP algorithm for classifying whether or not each grid point is included in the confidence intervals. As the GP algorithm is iterative, we can calculate the accuracy at the end of each iteration with fixed computation. When the computation reaches 20% of that required by standard FC, we stop the algorithm and calculate the absolute error as the difference in confidence interval endpoints. Fig. 8 shows that the median accuracy reaches 1 with less than 20% of the computation and that the error is no more than 0.1π for most data sets. As δ_CP ranges from 0 to 2π and there are only 20 grid points, an error of 0.1π is just one grid point. With a finer grid, we expect the performance of the GP algorithm to improve.

FIG. 8: Relative accuracy of the confidence intervals in terms of correctly included grid points as a function of computation (left) and the distribution of absolute errors for both normal and inverted hierarchies (right).
B. 2-dimensional Confidence Contours
To find the 2-dimensional confidence contours under hierarchy constraints, the GP algorithm approximates the p-value surface on the parameter grid as shown in Fig. 9 and specifically prioritizes points on the contour boundaries. Grid points below a certain value are included in the confidence contour at that level. To make the final smooth contours in Fig. 10, we use Fourier smoothing to draw the closest elliptical curves. We can observe that the FC and GP contours overlap in the same areas. In fact, the area difference between the contours stays on the same order of magnitude with Fourier smoothing.

FIG. 9: GP approximated percentile (1 − p-value) on the 20 × 20 grid for sin²θ₂₃ vs δ_CP (left) and the priority for sampling points from the grid (right). Notice that the points near 68% and 90% have the highest priority.

Similarly, we use both algorithms on 200 different data sets to find the 68% and 90% confidence contours and calculate the grid point classification accuracy after each iteration, up to 10% of the standard FC computation. A concern is that contours with a larger area could require more computation to achieve the same accuracy, as there are more points along the boundary. We address this concern by stratifying contours by area quartile and plotting the median accuracy as a function of computation. Fig. 11 shows that the median accuracy reaches 1 with less than 10% of the computation and that the contour area does not have an effect. The reason is that while larger contours have more points on the boundary, smaller contours are more difficult to locate precisely. Overall, it takes roughly the same computation to probe the p-value surface accurately, so the GP algorithm should have similar performance regardless of the contour size.

FIG. 10: Confidence contours for the same data constrained to the normal (left) and inverted hierarchies (right). The true (dashed) and approximated (transparent) contours are almost indistinguishable.

FIG. 11: Relative accuracy of the confidence contours as a function of computation (left) and median accuracy stratified by area as a function of computation (right).
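The paper does not spell out its Fourier smoothing procedure; a common variant, sketched below under that assumption, treats the boundary points as a periodic complex sequence and keeps only the lowest Fourier modes, which with very few modes yields a near-elliptical closed curve:

```python
import numpy as np

def fourier_smooth(points, n_modes=3):
    """Smooth a closed contour (array of shape (n, 2)) by truncating the
    Fourier series of its boundary, treated as z = x + iy; keeping only a
    few modes draws the closest near-elliptical curve."""
    z = points[:, 0] + 1j * points[:, 1]
    coeffs = np.fft.fft(z)
    keep = np.zeros_like(coeffs)
    keep[:n_modes + 1] = coeffs[:n_modes + 1]  # DC term and low positive modes
    keep[-n_modes:] = coeffs[-n_modes:]        # matching negative modes
    smooth = np.fft.ifft(keep)
    return np.column_stack([smooth.real, smooth.imag])
```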
Lastly, we are interested in where the computational savings come from. We keep track of the number of grid points explored by the GP algorithm and the number of simulations at each point for the 200 data sets. Fig. 12 shows that the algorithm explores about half of the total grid points and that, on average, only about 300 Monte Carlo simulations are done at each, instead of 2000 in standard FC. We conclude that most of the computational savings come from performing fewer Monte Carlo simulations; skipping grid points nearly doubles the computational savings. Roughly, a factor of 2000/300 ≈ 7 from fewer simulations per point, times a factor of about 2 from visiting half the grid, accounts for the overall speedup. As mentioned earlier, the advantage of the GP algorithm could be greater on a finer grid.

FIG. 12: Distribution of the number of points explored on the grid (left) and distribution of the average number of Monte Carlo experiments simulated at a point (right).
V. DISCUSSION
The proposed algorithm significantly accelerates the Feldman-Cousins approach, wherein experiments have to devote enormous computational resources in order to estimate uncertainties in neutrino oscillation parameters [19]. This could also prove useful in estimating confidence intervals from a combined fit of neutrino oscillation results from different experiments when the respective likelihood functions are available. While we designed the GP-based construction in the neutrino oscillation context, the GP approximation does not have a particular parametric form. The same idea can therefore be applied to many other scenarios where the confidence interval construction for a continuous parameter over a bounded region normally proceeds via the unified approach.

[1] G. J. Feldman and R. D. Cousins. Unified approach to the classical statistical analysis of small signals. Physical Review D, 57(7):3873, 1998.
[2] K. Abe et al. Measurement of neutrino and antineutrino oscillations by the T2K experiment including a new additional sample of ν_e interactions at the far detector. Physical Review D, 96(9):092006, 2017.
[3] M. A. Acero et al. New constraints on oscillation parameters from ν_e appearance and ν_µ disappearance in the NOvA experiment. Physical Review D, 98(3):032012, 2018.
[4] P. Adamson et al. Search for sterile neutrinos in MINOS and MINOS+ using a two-detector fit. Physical Review Letters, 122(9):091803, 2019.
[5] J. Močkus. On Bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference, pages 400-404. Springer, 1975.
[6] B. Pontecorvo. Mesonium and antimesonium. Zhur. Eksptl'. i Teoret. Fiz., 33, 1957.
[7] Q. R. Ahmad et al. Direct evidence for neutrino flavor transformation from neutral-current interactions in the Sudbury Neutrino Observatory. Physical Review Letters, 89(011301), 2002.
[8] Y. Fukuda et al. Evidence for oscillation of atmospheric neutrinos. Physical Review Letters, 81(1562), 1998.
[9] F. P. An et al. Measurement of electron antineutrino oscillation based on 1230 days of operation of the Daya Bay experiment. Physical Review D, 95(072006), 2017.
[10] G. Drexlin, V. Hannen, S. Mertens, and C. Weinheimer. Current direct neutrino mass experiments. Advances in High Energy Physics, 2013, 2013.
[11] R. Barlow. Extended maximum likelihood. Nuclear Instruments and Methods in Physics Research, 293(3), 1990.
[12] C. E. Rasmussen. Gaussian processes in machine learning. In Advanced Lectures on Machine Learning, pages 63-71. Springer, 2004.
[13] G. J. Hahn and W. Q. Meeker. Statistical Intervals: A Guide for Practitioners, volume 92. John Wiley & Sons, 2011.
[14] A. Y. Smirnov. The MSW effect and matter effects in neutrino oscillations. Physica Scripta, T121:57-64, 2005.
[15] J. A. Formaggio and G. P. Zeller. From eV to EeV: Neutrino cross sections across energy scales. Reviews of Modern Physics, 84(3):1307, 2012.
[16] Particle Data Group. Neutrino masses, mixing and oscillations. 2017.
[17] R. Brun and F. Rademakers. ROOT: An object oriented data analysis framework. Nuclear Instruments and Methods in Physics Research Section A, 389(1-2):81-86, 1997.
[18] F. Pedregosa, G. Varoquaux, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825-2830, 2011.
[19] A. Sousa, N. Buchanan, S. Calvez, P. Ding, D. Doyle, A. Himmel, B. Holzman, J. Kowalkowski, A. Norman, and T. Peterka. Implementation of Feldman-Cousins corrections and oscillation calculations in the HPC environment for the NOvA experiment. In Proceedings of the 23rd International Conference on Computing in High-Energy and Nuclear Physics (CHEP 2018), Sofia, Bulgaria, July 9-13, 2018. 2019, in press.