Robust test statistics for data sets with missing correlation information
Lukas Koch
University of Oxford
Abstract
Not all experiments publish their results with a description of the correlations between the data points. This makes it difficult to do hypothesis tests or model fits with that data, since just assuming no correlation can lead to an over- or underestimation of the resulting uncertainties. This work presents robust test statistics that can be used with data sets with missing correlation information. They are exact in the case of no correlation and either guaranteed to be conservative – i.e. the uncertainty is never underestimated – in the presence of correlations, or they are also exact in the degenerate case of perfect correlation between the data points.
1. Introduction
Some data sets are published without a full covariance matrix describing the correlations between the data points of the result. The implied assumption in these data sets is that the correlation in the uncertainties is 0, i.e. the data is uncorrelated. This is not always the case though (if, e.g., the data points vary smoothly within their error bands or when they are supposed to describe a “shape-only” uncertainty, it is clear that there is a correlation there), and users of the data are put in the unenviable situation of having to use correlated results without knowing what the correlations actually are. Usually one would use the fully correlated Mahalanobis distance or its square,

D² = ∆ᵀ S⁻¹ ∆,

to judge how well a certain model fits the data. Here ∆ is the difference between the data and the model, and S is the covariance matrix describing the uncertainty of the result. Just ignoring the correlations and applying a “naive”, uncorrelated Mahalanobis distance to compare a model to the data can lead to plainly wrong results.

Email address: [email protected] (Lukas Koch)

Consider multivariate normal distributed data with ten dimensions. In fact, unless otherwise stated, let us assume that the examples in this paper are all multivariate normal distributed and that we know the correct diagonal elements of the respective covariance matrices. The only problem we will address here is the missing information regarding the correlations between the variables. The “naive” test statistic would consist of just summing up the squared z-scores, i.e. the residuals normalised by the uncertainty:

naive(x | µ, s) = Σ_i (x_i − µ_i)² / S_ii,   (1)

where x is the data result, µ is a prediction of the expectation value from some model, and S_ii = s_i² is the variance of the data points.
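As a concrete illustration, the naive statistic of Equation 1 and its associated p-value under the no-correlation assumption can be sketched as follows (a minimal sketch; the function name and toy numbers are mine, not from the paper):

```python
import numpy as np
from scipy import stats

def naive(x, mu, s):
    """Naive test statistic of Eq. 1: sum of squared z-scores.

    x: measured data points, mu: model prediction, s: variances S_ii.
    """
    x, mu, s = map(np.asarray, (x, mu, s))
    return np.sum((x - mu) ** 2 / s)

# Toy example: three data points with unit variances.
t = naive([1.2, -0.4, 0.7], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])

# Under the (possibly wrong) assumption of no correlations, t is
# chi-square distributed with N = 3 degrees of freedom:
p_value = stats.chi2.sf(t, df=3)
```

The p-value obtained this way is only trustworthy if the no-correlation assumption actually holds, which is the central caveat of this section.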
This test statistic will be chi-square distributed if there are no correlations present in the data, but using it in the presence of correlations leads to under- or overestimation of the uncertainty, depending on the correlation and the actual value of the statistic. Figure 1 shows this for 10-dimensional toy data sets thrown with different levels of correlation. The diagonal terms of the covariance matrix are kept constant at 1, while the off-diagonals are set to the values 0, 0.5, 0.9, and 0.99 in the different sets.

The left plot shows the cumulative distribution functions (CDFs) of the expected distribution of the test statistic in the absence of correlations, as well as the actual CDFs of the different toy data sets. The different distributions affect which significance level (or p-value) a certain value of the test statistic corresponds to. The right plot shows how an assumed significance level (as calculated with the expected CDF) translates to the actual significance level (as calculated from the actual CDFs). To put it another way: the x-axis shows how often one would like to make a Type-I error (rejecting a true hypothesis), while the y-axis shows how often one actually makes a Type-I error, given the different levels of correlation in the data. If the actual significance level is larger than assumed, one rejects a true hypothesis more often than intended. In terms of error bars or confidence regions, this means that the size of the uncertainties is effectively underestimated.

This behaviour is clearly undesirable. If the data is (suspected to be) strongly correlated, it would be better to use a different test statistic that is able to perform consistently under different levels of correlation in the data. For such a test statistic, the following properties would be desirable:

1. Exact in the case of no correlations.
2. Guaranteed to be conservative when not exact.
3. Low deviations from exactness when not exact.
4. Exact in the case of 100% correlation.
5. Exact at every possible level of correlation.

Some of these properties are contained in one another. The naive use of the uncorrelated Mahalanobis distance has property 1 but none of the others. The following sections will describe some test statistics that have more of these properties. Finally, they will be compared in section 4.

Preprint submitted to TBD, February 12, 2021
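For reference, toy data with the constant-correlation covariance used for Figure 1 can be generated as below (a sketch; the seed, sample size, and function name are mine):

```python
import numpy as np

rng = np.random.default_rng(42)

def constant_correlation_cov(n, rho):
    """Covariance with unit variances and a constant correlation rho
    between all pairs of bins, as used in the toy studies."""
    return np.full((n, n), rho) + (1.0 - rho) * np.eye(n)

# 10-dimensional toys with off-diagonal correlation 0.9:
cov = constant_correlation_cov(10, 0.9)
toys = rng.multivariate_normal(np.zeros(10), cov, size=1000)

# Naive statistic (Eq. 1) of each toy, with mu = 0 and S_ii = 1:
naive_values = np.sum(toys ** 2, axis=1)
```

Comparing the empirical distribution of `naive_values` against a chi-square distribution with 10 degrees of freedom reproduces the miscoverage shown in Figure 1.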
2. Fitting the covariance to the data
The first considered test statistic arises from treating the off-diagonal elements of the covariance matrix as nuisance parameters of the statistical model. For any given prediction in the N-dimensional parameter space µ and a given sample (i.e. the data) x, it is possible to choose the off-diagonal elements of S in a way to minimise the resulting squared Mahalanobis distance D² = (x − µ)ᵀ S⁻¹ (x − µ). This is different from maximising the likelihood of the data, since the probability density of a multivariate normal distribution also depends on the determinant of the covariance matrix:

L = (2π)^{−N/2} det(S)^{−1/2} exp( −½ (x − µ)ᵀ S⁻¹ (x − µ) ).   (2)

A minimal Mahalanobis distance is only equivalent to a maximal likelihood if the determinant of the covariance matrix is constant. This is not the case here. Furthermore, for N ≥ 3 the supremum of the likelihood over the possible covariance matrices is infinite, rendering it useless as a test statistic. This will be shown below.

To minimise the Mahalanobis distance, we can transform the variable space to make the last variable x_N independent of the others:

y = ( 1   −S_N/S_NN ) x,   (3)
    ( 0       1     )

i.e. y_i = x_i − (S_iN/S_NN) x_N for i < N, and y_N = x_N.   (4)

Here S_NN is the variance of the Nth variable, and S_N is the vector of covariances between the first N−1 variables and the Nth:

S_N = (S_1N, …, S_(N−1)N)ᵀ.   (5)

The covariance and expectation values for y are then:

S_y = (∇yᵀ)ᵀ S (∇yᵀ)   (6)
    = ( S_/N − S_N S_Nᵀ / S_NN    0    )   (9)
      ( 0                         S_NN )

µ_y = ( µ_/N − S_N µ_N / S_NN ),   (10)
      ( µ_N                   )

where S_/N is the original covariance matrix of the remaining N−1 variables and µ_/N the corresponding expectation values. The contribution of y_N = x_N to the total Mahalanobis distance is fixed, since it only depends on S_NN, which has a given constant value. The contribution of the remaining variables y_/N can be minimised to 0 by choosing the off-diagonal elements of the covariance such that their expectation value µ_{y/N} is equal to the actual value:

µ_{y/N}  !=  y_/N   (11)

µ_/N − (S_N/S_NN) µ_N = x_/N − (S_N/S_NN) x_N   (12)

Figure 1: CDFs (left) for the “naive” test statistic for different levels of correlations in the data. When using the uncorrelated CDF to calculate the assumed significance level (or p-value) of a value of the statistic, the actual level will differ from the assumption depending on the correlations (right). As the correlation increases, the distribution of the naive test statistic approaches that of a χ²-distributed variable with one degree of freedom, multiplied by the number of bins (in this case 10).

S_N = S_NN (x_/N − µ_/N) / (x_N − µ_N)   (13)

    = ( (∆_1/√S_11) / (∆_N/√S_NN) · √(S_11 S_NN)                       )
      ( ⋮                                                              )   (14)
      ( (∆_(N−1)/√S_(N−1)(N−1)) / (∆_N/√S_NN) · √(S_(N−1)(N−1) S_NN)  )

Here ∆_i/√S_ii = (x_i − µ_i)/√S_ii is the (positive or negative) z-score of the ith variable, and √(S_ii S_NN) is the maximum allowed absolute value of the covariance between the ith and Nth variable. The latter arises from the fact that the correlation coefficients S_iN/√(S_ii S_NN) must be within [−1, 1]. The elements of S_N are thus proportional to the ratios of the z-scores of the first N−1 variables to that of the Nth, and these ratios are within [−1, 1] exactly if the Nth variable is the one with the largest absolute z-score.

Thus, by eliminating the contribution of the other variables as shown above, the minimal achievable Mahalanobis distance under variation of the off-diagonal elements of the covariance matrix is equal to the largest absolute z-score of the single variables. This behaviour is illustrated in Figure 2 for two dimensions. Note that we only need N−1 of the N(N−1)/2 off-diagonal elements for this. The remaining elements can be used to make det(S) arbitrarily small, leading to the infinite supremum of the likelihood maximisation over the covariance elements for N ≥ 3.

Let us call b the largest absolute z-score, and we can define the “fitted” test statistic as:

fitted(∆ | s) = b² = max_i ( ∆_i² / S_ii ).   (15)

It is straightforward to derive the expected distribution of this test statistic in the case of no correlations.
The CDF of b, F_b(b′), is just the probability of the absolute values of all z-scores being smaller than or equal to b′:

F_b(b′) = P(b ≤ b′) = ∫₋ᵦ′^{+b′} ··· ∫₋ᵦ′^{+b′} f(z_1, …, z_N) dz_1 … dz_N,   (16)

where f(z) denotes the probability density function (PDF) of the potentially negative z-scores. With uncorrelated, standard normal distributed z-scores this evaluates to

F_b(b′) = erf^N( b′ / √2 ).   (17)

With this we can write down the CDF of b² as

F_{b²}(y) = F_b(√y),   (18)

and we call the resulting distribution the “Bee-square” distribution:

fitted(∆ | s) ∼ Bee²_N   (19)

for uncorrelated normal distributed ∆_i. A Python implementation of the distribution can be found in Listing 1 in the appendix.

Figure 2: Minimum achievable Mahalanobis distance for two dimensions. The minimum achievable Mahalanobis distance when varying the off-diagonal covariance element is equal to the largest absolute z-score of the single variables. The surface where the Mahalanobis distance is equal to 1 is an ellipse contained within the square with its edges at ∆_i/√S_ii = ±1. Varying the off-diagonal element of the covariance matrix does not rotate the principal axes of the ellipse, but it changes where it touches the edges of the square. When chosen correctly, the ellipse touches the edge of the box at the point where the data is projected onto it. Because of the linearity of the Mahalanobis distance, this means that the total distance of the data point is then simply the largest z-score. This two-dimensional minimisation can be done for all marginal projections of pairs of variables in N-dimensional problems. To achieve the minimal total Mahalanobis distance, only the pairs involving the overall largest absolute z-score need to be specified like this, but applying the scheme to all pairs ensures a valid covariance matrix.

Figure 3 shows how this test statistic fares for different levels of correlation in the toy data. For no correlations, the distribution follows the expectation. With increasing correlations, the distribution deviates more and more, approaching a chi-square distribution with one degree of freedom. Compared to the naive approach, we can see that the deviation from an exact statistic has been decreased for a wide range of significance levels. But more importantly, the fitted test statistic is conservative for all significance levels and all correlation strengths. The real significance level of a result is always equal to or lower than the assumed significance that was evaluated using the expected Bee-square distribution.
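The fitted statistic and its Bee-square CDF (Equations 15, 17 and 18) are straightforward to implement. This is a minimal sketch; the function names are mine, and the paper's own reference implementation is the one in Listing 1:

```python
import numpy as np
from scipy.special import erf

def fitted(delta, s):
    """Fitted test statistic of Eq. 15: the largest squared z-score."""
    delta, s = np.asarray(delta), np.asarray(s)
    return np.max(delta ** 2 / s)

def bee_square_cdf(y, n):
    """CDF of the fitted statistic for n uncorrelated data points,
    F(y) = erf(sqrt(y/2))**n (Eqs. 17 and 18)."""
    return erf(np.sqrt(np.asarray(y, dtype=float) / 2.0)) ** n
```

For n = 1 the Bee-square distribution reduces to a chi-square distribution with one degree of freedom, which matches the strong-correlation limit described in the text.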
3. Asymptotically invariant test statistics
The fitted test statistic described above is “safe” to use in the sense that it is always conservative: the probability of a result at least as extreme as the observed one is actually lower than what the assumed distribution suggests, i.e. the uncertainty is overestimated. Unfortunately, it gets more and more conservative with increasing correlations in the data, since the z-scores approach being equal in all cases as the correlation increases, and the maximum z-score is then distributed like a single standard normal distributed variable. It would be advantageous if the test statistic was exact at all levels of correlations, or at least at both no correlations and (in the limit of) 100% correlations. To achieve the latter, it is useful to view the problem in the “CDF space” of the data points.

Instead of the distribution of ∆, let us consider the CDFs of the squares of the single variables:

y_i = F_χ²(∆_i² / S_ii),   (20)

where F_χ² is the CDF of ∆_i²/S_ii, since we assume ∆_i to be normal distributed with a variance of S_ii. Since y is a function of a random variable, it is itself a random variable. Also, by definition, y_i is uniformly distributed between 0 and 1:

y_i ∼ U(0, 1).   (21)

If the variables are uncorrelated, the combined PDF of y, f_y(y), will also be uniform within the N-cube defined by the N unit vectors:

f_y(y) = { 1 if 0 ≤ y_i ≤ 1 ∀ i
         { 0 else.   (22)

If, on the other hand, the variables are perfectly correlated, the values of all y_i will be identical in each random sampling. This means the combined PDF must be zero wherever the y_i are not identical. In this case, the combined probability density will be a delta function that concentrates all probability on the main diagonal of the hypercube:

f′_y(y) = { ∏_{i=1}^{N−1} δ(y_i − y_{i+1}) if 0 ≤ y_1 ≤ 1
          { 0 else.   (23)

Note that this does not affect the marginal distributions of the single variables. Marginalising out all but one variable leads again to a uniform distribution in that variable. If we can define a function z(y) that is identically distributed under both assumptions, we can use it to define a test statistic that is exact in both cases.

Let us demand that a low value of z indicates a good agreement between data and model, while high values indicate tension between the two. Within the N-dimensional hypercube, this means that z should be low towards the corner at y = 0 and increase towards the corner at y = 1. For z to be identically distributed with no correlations and with 100% correlations, the surface defined by the implicit function z(y) = z′ = const. must enclose the same amount of probability in both cases, as this defines the CDF of z:

F_z(z′) = ∫_{z(y) ≤ z′} f^{(′)}_y(y) d^N y.   (24)

Figure 3: CDFs (left) for the “fitted” test statistic for different levels of correlations in the data. When using the uncorrelated CDF to calculate the assumed significance level (or p-value) of a value of the statistic, the actual level will differ from the assumption depending on the correlations (right). In the presence of correlations, the real significance is consistently higher (the significance level is lower) than the assumption. This means the uncertainties are overestimated and the statistic behaves conservatively. As the correlations increase, the distribution of the fitted test statistic approaches the χ² distribution with one degree of freedom.

In the case of no correlation, F_z(z′) is the volume A of the part of the N-cube enclosed by the implicit function. In the case of perfect correlation, it is equal to the single (identical) y_i coordinate where the surface intersects with the diagonal. Figure 4a illustrates this in the case of two dimensions. Let us call this coordinate x and let us also demand that the function z(y) = x at that point. We then get the following condition:

A = ∫_{z(y) ≤ x} d^N y  !=  x,   (25)

where the integral is understood to be confined to the inside of the N-cube. The challenge is now to find functions z(y) which fulfil this condition for any number of dimensions.

The simplest way to fulfil Equation 25 in two dimensions is to draw straight lines from the points on the diagonal to the “off-diagonal” corners of the square, as shown in Figure 4b. The function z can be easily calculated for any given point, as each point can be seen as lying on a straight diagonal line starting at the lower or left edge of the square (y_s) and ending on the right or top edge (y_e). The fractional distance along this line is the desired z and evaluates to

z(y) = y_min / (y_min + (1 − y_max)),   (26)

where y_min and y_max are the minimum and maximum of the elements of y respectively.
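Equation 26 and the subsequent transformation back to a chi-square variable can be sketched as follows (the helper names are mine; I assume the per-bin transform of Equation 20 with one degree of freedom):

```python
import numpy as np
from scipy import stats

def z_invariant1(y):
    """Eq. 26: fractional distance along the diagonal line through y."""
    y = np.asarray(y)
    ymin, ymax = y.min(), y.max()
    return ymin / (ymin + (1.0 - ymax))

def invariant1(delta, s):
    """Map squared z-scores to CDF space (Eq. 20), combine them with
    Eq. 26, and transform the uniform result back to a chi-square
    variable with one degree of freedom."""
    y = stats.chi2.cdf(np.asarray(delta) ** 2 / np.asarray(s), df=1)
    return stats.chi2.ppf(z_invariant1(y), df=1)
```

On the diagonal of the CDF space (identical y_i), z reduces to the common coordinate, which is the construction's exactness condition for perfect correlation.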
This also directly applies to the N-dimensional case without change. Now, it would be possible to use z as the test statistic directly when the data is used on its own. When the data is intended to be used in conjunction with other data sets though, e.g. in a global fit, it is useful to use a test statistic that is (approximately) chi-square distributed. To this end, we can simply apply another function to z which is chosen so that the result is chi-square distributed if z is uniformly distributed. This function is just the inverse of the CDF of the chi-square distribution, F_χ²⁻¹. Finally we get the first of the “invariant” test statistics:

invariant_1(∆ | s) = F_χ²⁻¹( z( F_χ²(∆_1²/S_11), …, F_χ²(∆_N²/S_NN) ) ),   (27)

with z as defined in Equation 26.

Note that we could have chosen a different number of degrees of freedom for the transformation back to a chi-square distribution. This makes no difference when using the test statistic on a single data set alone, but it changes the relative weight a data set has in a global fit with other data when computing the total chi-square. If one is confident that the correlations in the data set are weak, it might be better to use the actual number of data points. In the presence of medium to strong correlations it could be argued though that there is actually less information in the data than the number of data points suggests, or rather, we are losing “degrees of freedom” by having to make up for the unknown correlations.

Figure 4: Illustration of the test statistics in CDF space. Panels: (a) illustration of the condition z = const. = x; (b) “invariant 1”, z = y_min/(y_min + (1 − y_max)); (c) “invariant 2”, z = max(g(y_max), y_min); (d) “invariant 3”, z = max(h_α(y_max), y_min/(αy_min + (1 − αy_max))).
For z to be identically distributed at both no correlations (the PDF of y is uniform over the square) and at 100% correlations (the PDF of y is uniform along the diagonal), the area A enclosed by the implicit function z = const. must be equal to the position x where the function crosses the diagonal.

Another simple solution in two dimensions is to add identical rectangles to the two inside-facing sides of the x² square, as shown in Figure 4c. Let l be the length of the added rectangles; in two dimensions the condition A = x² + 2xl != x gives

l = (1 − x)/2.   (28)

In N dimensions, the shape of A becomes a hypercube H_1 with an edge length of x + l, minus another hypercube H_2 at the inside diagonal corner with an edge length of l:

A = (x + l)^N − l^N  !=  x.   (29)

Expressed in the total width of the larger cube, d = x + l, we get:

d^N − l^N  !=  d − l.   (30)

This equation will always have a solution of l ∈ (0, d) for any d ∈ (N^{−1/(N−1)}, 1). Let l(d) be that solution, so we can define the function g: [0, 1] → [0, 1],

g(d) = { 0 if d ≤ N^{−1/(N−1)}
       { d − l(d) if N^{−1/(N−1)} < d < 1
       { 1 if d = 1,   (31)

which calculates x from a given (possible) edge length d of H_1.

To calculate the z of any given point, we can use that every point on the surface of (H_1 \ H_2) is either on one of the “outer” faces of H_1 (where at least one coordinate is equal to d) or on the “inner” faces of H_2 (where at least one coordinate is equal to x). In the former case, the edge length of H_1 is given by the maximum of the y-coordinates, while in the latter case the position of the “inner” diagonal corner of H_2 is given by the minimum of the coordinates. Since that inner corner is at y_i = x ∀ i by construction, we can write z as

z(y) = max( g(y_max), y_min ).   (32)

With this z we can then define the invariant_2(∆ | s) test statistic just like in Equation 27. Its performance can be seen in Figure 6. It is conservative for all strengths of correlation and significance levels, and shows the expected limit of exactness at no and very strong correlations.

Finally, Figure 4d shows an intermediate shape between “invariant 1” and “invariant 2”. It can be interpreted as the shape of “invariant 1”, but instead of connecting the diagonal to the off-diagonal corners of the unit square, it is connected to the respective corners of a larger square with edge length 1/α, with the shape parameter α ∈ (0, 1]. To ensure that A is equal to x, the shape is cut off at the edges of a cube with edge length d. In N dimensions, the volume of such a body is:

A = d^N − (d − x)^N / (1 − αx)^{N−1}  !=  x.   (33)

This equation has one solution of x ∈ (0, d) if (N + (N−1)αd) > d^{1−N}. Let x(d) be that solution and, like before, we can define a function h_α: (0, 1] → [0, 1],

h_α(d) = { 0 if (N + (N−1)αd) ≤ d^{1−N}
         { x(d) if (N + (N−1)αd) > d^{1−N} and d < 1
         { 1 if d = 1.   (34)

This determines the value of z for points on the surface of the hypercube. The value of z for points on the “cut-off” corner can be calculated just like in the case of “invariant 1”. Only the scaling of the containing cube needs to be taken into account. The total z function is then again the maximum of the two values:

z(y) = max( h_α(y_max), y_min / (αy_min + (1 − αy_max)) ).   (35)

The final test statistic “invariant 3” is then built according to Equation 27. A Python implementation of this test statistic is provided in Listing 2 in the appendix.

This test statistic has one free shape parameter α. It determines the opening angle of the iso-z surface where it meets the main diagonal of the hypercube. In the 2D case, this angle is constant at 90° for α →
0, making it identical to the “invariant 2” statistic.

Figure 5: CDFs (left) for the “invariant 1” test statistic for different levels of correlations in the data. When using the uncorrelated CDF to calculate the assumed significance level (or p-value) of a value of the statistic, the actual level will differ from the assumption depending on the correlations (right). The effect of the correlations is weaker than for the naive test statistic, but it is not consistently conservative like the fitted one. As the correlations increase, the distribution of the test statistic approaches the uncorrelated expectation again.

Figure 6: CDFs (left) for the “invariant 2” test statistic for different levels of correlations in the data. When using the uncorrelated CDF to calculate the assumed significance level (or p-value) of a value of the statistic, the actual level will differ from the assumption depending on the correlations (right). Like the fitted test statistic, this one is consistently conservative. As the correlations increase, the distribution of the test statistic approaches the uncorrelated expectation again.

For α > 0, the angle starts at 90° at x = 0 and then opens up with increasing x. The value of α determines where the angle reaches 180°:

x_{180°} = 1 / (2α).   (36)

E.g. for α = 1, corresponding to the “invariant 1” case, the opening angle is 180° at x = 0.5. An opening angle < 180° means that the surface “protrudes” into parts of the CDF space that should “belong” to a higher value of x, suggesting a conservative statistic. Conversely, an opening angle > 180° means that the CDF space perpendicular to the diagonal is only covered with higher x, suggesting a coverage that is actually lower than the expectation. At α = 0.5, the opening angle stays below 180° for all x (as x is always < 1). There is no single obviously optimal choice of α though. Figure 7 shows the performance of the “invariant 3” test statistic with α = 2/3. Out of all considered test statistics, it shows the smallest deviations from exactness, and it is conservative for the considered significance levels and correlations.
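The implicit condition of Equation 30 has no closed form for general N, but the nontrivial root l(d) is easy to bracket numerically: f(l) = d^N − l^N − d + l is negative at l = 0 and has its maximum at l = N^{−1/(N−1)}. A sketch of the resulting g function and the "invariant 2" z (Equations 31 and 32); the function names are mine, and h_α of Equation 34 can be obtained with an analogous root search:

```python
import numpy as np
from scipy.optimize import brentq

def g(d, n):
    """Eq. 31: solve d**n - l**n = d - l (Eq. 30) for l in (0, d)
    and return x = d - l.  Below the threshold d <= n**(-1/(n-1))
    the only solution is l = d, i.e. x = 0."""
    if d >= 1.0:
        return 1.0
    lstar = n ** (-1.0 / (n - 1))  # maximum of f(l) = d**n - l**n - d + l
    if d <= lstar:
        return 0.0
    # f changes sign between 0 and lstar, so brentq can find the root.
    l = brentq(lambda l: d ** n - l ** n - d + l, 0.0, lstar)
    return d - l

def z_invariant2(y):
    """Eq. 32: z is the larger of g(y_max) and y_min."""
    y = np.asarray(y)
    return max(g(float(y.max()), len(y)), float(y.min()))
```

In two dimensions the condition is solvable by hand (l = 1 − d, hence x = 2d − 1 above the threshold d = 1/2), which provides a convenient cross-check of the numerical solution.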
4. Comparison
It is quite clear that among the considered test statistics, “invariant 3” performs the most consistently under many different levels of correlations. It suffers from a kind of arbitrariness though when trying to combine it in larger fits with other data sets. The transformation to a chi-square distribution with one degree of freedom is a conservative choice that allows it to be combined in least-squares or likelihood fits. Aside from the case of perfect correlations, it will under-estimate the amount of information compared to the other data sets though.

The “fitted” statistic is quite a bit more conservative and it gets more conservative the stronger the correlations are. It has the advantage though that it corresponds to an “actual Mahalanobis distance” when considering the correlations in the data as nuisance parameters. It should thus easily be included in least-squares fits in combination with other data sets. Wilks’ theorem does not hold for it though, except for very strong correlations. Only then is it distributed like a chi-square with one degree of freedom. With no correlations present, it follows the “Bee-square” distribution.

Figure 8 shows the shape of the resulting confidence regions for the different test statistics in two dimensions. The “invariant” statistics all have a cross shape at low confidence levels and then get progressively more square. The confidence regions of the “invariant 1” statistic extend all the way to ±∞ along the variable axes. This is due to the fact that all lines of the construction in the CDF space go to the edges of the hypercube. For a normal distributed variable a CDF of 1 means a variable value of ±∞. The coverage is still correct in the case of no correlations, but in principle this means that one would have to include a point in the 1σ region that is arbitrarily far away on one axis, as long as it is close enough to 0 in the other axis. This is clearly counter-intuitive and could very well lead to strange behaviour in fits.
Combined with the fact that it is not conservative for all confidence levels, the “invariant 1” statistic should probably not be used.

It is worth stressing that the actual coverage behaviour of the test statistics will depend on the actual correlations present in the data sets. The examples shown here were very simple, with constant correlation coefficients between all data points. Figure 9 shows the performance of the “invariant 3” test statistic for a data set with a more complicated correlation structure. The data consists of N = 100 data points with a covariance matrix of the form

S = ( 1        c_1      c_2      …  c_{N−1} )
    ( c_1      1        c_1      …  c_{N−2} )
    ( c_2      c_1      1        …  c_{N−3} )   (37)
    ( ⋮        ⋮        ⋮        ⋱  ⋮       )
    ( c_{N−1}  c_{N−2}  c_{N−3}  …  1       )

Here c_1 is the maximum correlation as specified in the plot labels, and c_{N−1} = −c_1/2. All intermediate c_i are linearly equidistant, so c_i − c_{i+1} = const. The correlations are thus stronger for “neighbouring” data points and there is some negative correlation as well. Since the total covariance never reaches the limit of “perfect correlation”, the limiting exactness is not as efficient in this data set and the performance is closer to that of the “fitted” test statistic as shown in Figure 10. Since the “fitted” test statistic is considerably easier to calculate, it might be worth choosing it over an “invariant” one, depending on the reasonably expected correlations and computation requirements.

Figure 7: CDFs (left) for the “invariant 3” test statistic for different levels of correlations in the data. When using the uncorrelated CDF to calculate the assumed significance level (or p-value) of a value of the statistic, the actual level will differ from the assumption depending on the correlations (right). The behaviour of this statistic depends on the parameter α. For α = 1 it is identical to the “invariant 1” statistic, while for α → 0 it approaches the “invariant 2” statistic. Shown here is α = 2/3.
5. Application to real neutrino cross-section data
As a test, let us apply the “invariant 3” test statistic to some real data and compare its results with the “naive” approach. We will use the double-differential, charged-current quasi-elastic cross-section measurements of (anti-)muon neutrinos by MiniBooNE [1, 2]. This data is presented in the form of a set of differential cross sections with a “shape error” for each bin, plus a relative “normalisation error” common to all bins. The publications provide no information about the correlations between the bins, or between the shape and the normalisation error.

Model predictions for the measurements were generated with NUISANCE [3]. The generators and models considered in these studies are:

1. GENIE [4] v3.00.06 tune G18 10a 02 11a
2. GENIE v3.00.06 tune G18 10b 00 000
3. NEUT [5] v5.4.1
4. NuWro [6] v19.02.2
5. GENIE v3.00.06 with SuSAv2 [7]

To make them comparable to the data, they are split into a shape and a normalisation part as described in [2]. The shape part is scaled to the total cross section of the data, so it can be compared directly to the published data points:

x^{data/MC,norm} = Σ_i x_i^{data/MC} w_i,   (38)

x_i^{MC,shape} = x_i^{MC} · x^{data,norm} / x^{MC,norm},   (39)

with the 2D bin area w_i, which is constant for all bins in this case. Figure 11 shows the comparison of the data and the model predictions. Despite the lack of covariance information and the implied claim that the uncertainties are independent, it is clear that the “shape errors” must be correlated in some way. Not only does the cross section vary too smoothly from bin to bin, but any variation of only the shape must yield a set of points with a constant sum, meaning the uncertainties of the points cannot be uncorrelated. Furthermore, it is reasonable to assume that there could be correlations between the normalisation and the shape.
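The split of Equations 38 and 39 can be sketched as follows (a minimal illustration; the function and variable names are mine, and the bin areas w are taken as given):

```python
import numpy as np

def norm_part(x, w):
    """Eq. 38: total cross section as the bin-area-weighted sum."""
    return np.sum(np.asarray(x) * np.asarray(w))

def shape_part(x_mc, w, data_norm):
    """Eq. 39: scale the MC prediction to the total cross section of
    the data, so the shapes can be compared bin by bin."""
    x_mc = np.asarray(x_mc)
    return x_mc * data_norm / norm_part(x_mc, w)
```

By construction, the scaled prediction has the same weighted sum as the data, so only the shape information remains in the bin-by-bin comparison.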
Such correlationswould easily arise from scaling uncertainties thataffect some bins more than others.Given the data and the model predictions, it iseasy to calculate p-values with the “naive” and the“invariant 3” test statistics: p naive = 1 − F χ N (cid:0) naive (cid:0) ∆ | s data (cid:1)(cid:1) , (40) p invar = 1 − F χ (cid:0) invariant (cid:0) ∆ | s data , α = 2 / (cid:1)(cid:1) ,(41)with ∆ = x data − x MC , and the number of datapoints N . Both the vectors x and N explicitly in-clude the added “normalisation bin” when compar-ing both shape and normalisation.Table 1 shows the p-values resulting from thecomparisons using the two test statistics, with andwithout the normalisation bin. In most cases, thenaive chi-square statistic suggest a much strongerdisagreement between the data and the model thanthe invariant statistic. This is consistent with the11 / S / S / S / S / S / S / S / S Figure 8: Shape of confidence regions of the different statistis in two dimensions. The confidence levels correspond to a 0 . σ , 1 σ ,2 σ , and 3 σ deviation of a one-dimensional normal distributed variable. The shown test statistics are: top left, ”naive” (solid)and ”invariant 3” (dashdotdot); top right, ”naive” (solid), ”fitted” (dashdot), ”invariant 1” (dashed); bottom left, ”naive”(solid), ”invariant 1” (dashed), ”invariant 2” (dotted); bottom right, ”naive” (solid), ”invariant 1” (dashed), ”invariant 3( α = 2 / x )0.20.30.40.50.60.70.80.91.0 C D F Uncorr.Corr.50Corr.90Corr.99expected 10 Assumed significance level 10 R e a l s i g i n i f i c a n c e l e v e l
1 2 3 Uncorr.Corr.50Corr.90Corr.99
Figure 9: CDFs (left) for the “invariant 3” test statistic for different levels of correlations in data with a more complicatedcorrelation structure (see text). When using the uncorrelated CDF to calculate the significance level (or p-value) of a value ofthe statistic, the actual level depends on the correlations (right). x )0.00.20.40.60.81.0 C D F Uncorr.Corr.50Corr.90Corr.99expected Assumed significance level 10 R e a l s i g i n i f i c a n c e l e v e l
Figure 10: CDFs (left) for the "fitted" test statistic for different levels of correlation in data with a more complicated correlation structure (see text). When using the uncorrelated CDF to calculate the significance level (or p-value) of a value of the statistic, the actual level depends on the correlations (right).

Table 1: P-values from the comparison of models and MiniBooNE data.
                        Genie 10a  Genie 10b  Neut      NuWro     SuSAv2
ν    Shape    naive     1.89e-03   2.29e-03   5.24e-10  5.14e-06  2.19e-24
     & norm.  invariant 2.76e-04   2.94e-03   1.82e-02  1.31e-01  2.48e-03
     Shape    naive     3.50e-02   2.37e-02   5.66e-09  2.27e-05  3.40e-24
     only     invariant 6.89e-02   1.40e-01   1.81e-02  1.30e-01  2.46e-03
ν̄    Shape    naive     6.09e-26   5.58e-25   8.67e-15  1.94e-26  5.12e-01
     & norm.  invariant 5.12e-05   7.13e-05   5.61e-04  4.27e-05  8.46e-01
     Shape    naive     9.92e-24   3.71e-23   2.82e-14  7.32e-26  4.96e-01
     only     invariant 5.05e-05   7.05e-05   5.54e-04  4.22e-05  8.42e-01
[Figure 11 panels: "Shape & norm comparison, nu mode" (d²σ/(dT_µ d cos θ_µ) [cm²/GeV] per neutron) and "Shape & norm comparison, antinu mode" (per proton), showing MiniBooNE total and shape errors; legend p-values as in Table 1.]
Figure 11: Comparison of shape and normalisation between MiniBooNE data and several model predictions. The cosine of the muon angle increases bin-by-bin, the muon's kinetic energy increases block-by-block. The error bars show the "shape error" of the data. The data point on the right shows the common "normalisation error". The model predictions have been scaled to the same total cross section as the data for the shape comparison, i.e. the sum over all bins is identical for all models and the data. The points on the right show the actual total cross section of the model predictions. The p-values were calculated using all bins and the normalisation. See Table 1 for shape-only p-values.

Q_ij = Σ_{k,l} J_ki M_kl J_lj .   (42)

Here M is the original covariance matrix and J is the Jacobian matrix of the combined shape and norm vector:

J_ij = δ_ij / x_tot − x_i w_j / x_tot²   if 0 ≤ i < N ,
J_ij = w_j                               if i = N ,   (43)

with the number of original cross-section bins N, the bin widths w_j, and the total cross section x_tot = Σ_k w_k x_k.

The correlation matrices before and after the decomposition are shown in Figure 13. As intended, the decomposition reduces the amount of positive correlation in the data, but considerable positive and negative correlations remain.

Figure 14 shows the model comparisons with different test statistics in this decomposed data. Following MiniBooNE's approach, the shape errors are scaled up by the total cross section in the data for plotting purposes. Again, the "invariant 3" test statistic is consistently conservative compared to the full Mahalanobis distance, while the "naive" test statistic is not. Note that for the calculation of the Mahalanobis distance, the pseudo-inverse of the covariance matrix was used. Since shape and norm together have one additional dimension compared to the original cross section, their covariance matrix is not positive definite in general.

Table 2 summarises the p-values from the different model comparisons to the MINERvA data.
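A minimal sketch of this error propagation under the stated conventions (shape bins s_i = x_i/x_tot with x_tot = Σ_j w_j x_j and bin widths w_j). The helper name `shape_norm_jacobian` is hypothetical, and the Jacobian is written here with rows indexed by the new shape+norm components, so Eq. (42) reads Q = J M Jᵀ in matrix form:

```python
import numpy as np

def shape_norm_jacobian(x, w):
    """Jacobian of the (shape, total) vector w.r.t. the original bins.

    Shape bins: s_i = x_i / x_tot, so ds_i/dx_j = delta_ij/x_tot - x_i w_j/x_tot**2.
    Norm bin:   x_tot = sum_j w_j x_j, so dx_tot/dx_j = w_j.
    """
    n = len(x)
    xtot = np.sum(w * x)
    J = np.empty((n + 1, n))
    for i in range(n):
        for j in range(n):
            J[i, j] = (1.0 if i == j else 0.0) / xtot - x[i] * w[j] / xtot**2
    J[n, :] = w
    return J

# propagate an original covariance M into the shape+norm space, Eq. (42)
x = np.array([1.0, 2.0, 3.0])   # toy cross-section values
w = np.ones(3)                  # toy bin widths
M = np.diag([0.1, 0.2, 0.3])    # toy covariance
J = shape_norm_jacobian(x, w)
Q = J @ M @ J.T
# Q is (N+1)x(N+1) but at most rank N, hence not positive definite;
# a Mahalanobis distance in this space needs np.linalg.pinv(Q)
```

Because the shape bins are invariant under a common rescaling of x, their Jacobian rows are orthogonal to x, which is exactly why Q is rank-deficient and the pseudo-inverse is needed.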
The "full" p-values using the correlated Mahalanobis distance in the shape and norm case are different from the original case, despite the fact that they should contain the same information. This is caused by the different parameterisations of the problem space. Since the decomposition into shape and norm is not a linear transformation, the likelihood surfaces described by the covariance matrices cannot be identical. It is to be expected that this will lead to differing p-values, especially in the low-probability tails. But even for the comparatively well-fitting models with p-values in the order of a few percent, the difference is surprisingly strong. The NuWro prediction of the antineutrino data has a p-value of ∼13% in the original parameterisation, but only 0.03% in the decomposed view. This would make a huge difference in the interpretation of the data with respect to this model! Note also that the invariant test statistic in the decomposed case is much closer to the original full p-values in those cases.

Since we have the covariance matrices, we can create toy data that is distributed according to them and check the coverage properties of the different test statistics, just like we did in the previous sections. The results of these studies are shown in Figure 15. The top plots show the performance of the full, naive, and invariant statistics when applied to data distributed according to MINERvA's original covariance matrix. The middle plots show the same for data distributed according to the decomposed covariance.
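Such a toy coverage check can be sketched as follows; the equicorrelated covariance used here is a made-up stand-in for the actual MINERvA matrices:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, rho, ntoys = 10, 0.5, 200_000

# toy covariance: unit variances, uniform correlation rho
S = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
toys = rng.multivariate_normal(np.zeros(n), S, size=ntoys)

# "naive" statistic ignores the correlations (here S_ii = 1)
D_naive = np.sum(toys**2, axis=1)
p_naive = chi2.sf(D_naive, df=n)

# full Mahalanobis distance uses the true inverse covariance
D_full = np.einsum("ni,ij,nj->n", toys, np.linalg.inv(S), toys)
p_full = chi2.sf(D_full, df=n)

# real significance level at an assumed 5% level
real_naive = np.mean(p_naive < 0.05)  # comes out well above 0.05
real_full = np.mean(p_full < 0.05)    # stays at ~0.05 (exact coverage)
```

With positive correlations the naive statistic has a heavier tail than the assumed χ²_N, so it rejects the true model far too often, while the full Mahalanobis distance keeps exact coverage.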
[Figure 12 panels: "As-is comparison, nu mode" and "As-is comparison, antinu mode"; y-axis d²σ/(dp_T dp_∥) per nucleon; legend shows the MINERvA data and the model predictions with their p_full, p_naive, and p_invar values (see Table 2, "As is" rows).]

Figure 12: Comparison between MINERvA data and several model predictions. The muon's parallel momentum p_∥ increases bin-by-bin, its transverse momentum p_T block-by-block. The error bars show the diagonal elements of the full covariance of the data.
[Figure 13 panels: "Full correlation, nu mode", "Full correlation, antinu mode", "Shape + total correlation, nu mode", "Shape + total correlation, antinu mode".]

Figure 13: Correlation matrix of the MINERvA experiment as reported (top) and after decomposition into shape and normalisation bin (bottom) for both neutrino (left) and antineutrino (right) modes. The decomposition reduces the strong positive correlations in the data, but considerable positive and negative correlations remain.
[Figure 14 panels: "Shape & norm comparison, nu mode" and "Shape & norm comparison, antinu mode"; y-axis d²σ/(dp_T dp_∥) per nucleon; legend shows MINERvA total and shape errors and the model predictions with their p_full, p_naive, and p_invar values (see Table 2, "Shape & norm." rows).]
Figure 14: Comparison between MINERvA data and several model predictions. The muon’s parallel momentum p || increasesbin-by-bin, its transverse momentum p T block-by-block. The error bars show the “shape error” of the data. The data pointon the right shows the common “normalisation error”. The model predictions have been scaled to the same total cross sectionas the data for the shape comparison, i.e. the sum over all bins weighted by the 2D bin widths is identical for all models andthe data. The points on the right show the actual total cross section of the model predictions. The p-values were calculatedusing all shape bins and the normalisation. See Table 2 for shape-only p-values.
6. Conclusions
It was shown that the use of the "naive" uncorrelated Mahalanobis distance in the presence of unknown correlations leads to wrong coverage properties. Alternative test statistics were presented that perform more robustly under varying degrees of correlation in the data.

The "fitted" test statistic is motivated by treating the correlations as nuisance parameters. Its distribution in the absence of correlations is known (the "Bee-square distribution") and it is conservative in their presence. It could be used when it is important to have an "actual" Mahalanobis distance for the combination with other data sets (the usefulness of this is probably limited though, as it will not be chi-square distributed in general) and the overestimation of the error in the presence of correlations is acceptable. It also has the advantage of being incredibly easy to calculate, as it is just the maximum squared z-score among the variables. Depending on the actual correlations in the data, its performance is actually comparable to the "invariant" test statistics.

The "invariant 3" test statistic performs the best across varying levels of correlations. Its level of conservativeness can be tuned with the shape parameter α. A value of 0.5 seems to be a safe choice, while the best performance in the toy data sets in the presented studies was achieved at α = 2/3. As α goes to 0, the test statistic becomes equivalent to the "invariant 2" statistic, which can be seen as the most conservative version of "invariant 3". The "invariant 3" statistic could be used when it is important to not over-estimate the uncertainties by too much. If necessary, the relative weight of the statistic can be tuned by transforming it to a chi-square distribution with a different number of degrees of freedom N′. The choice of N′ is somewhat arbitrary though.

The application to real data from MiniBooNE and MINERvA shows that the "invariant 3" test statistic performs as expected. It is consistently conservative, while the "naive" uncorrelated Mahalanobis distance overestimates the strength of the discrepancy between data and model in multiple instances. When the MINERvA data is decomposed into a shape and a normalisation part, the invariant statistic even has better coverage properties than a full Mahalanobis distance that uses a linearly propagated covariance matrix. This is most likely due to the non-linear nature of the decomposition into shape and norm, which distorts the shape of the distribution. If it was multivariate Gaussian originally, the covariance in the shape and norm space can only ever be an approximation. Since the invariant statistic does not use information about the correlations between the data points, it is not affected as much by this as the fully correlated Mahalanobis distance.

Thanks
I would like to thank Callum Wilkinson and Stephen Dolan for generating the NUISANCE files that were used in the MiniBooNE and MINERvA studies, as well as for providing a space for discussions, feedback, and venting.

Figure 15: Assumed vs actual significances with different test statistics in toy data distributed according to the provided MINERvA covariance (top) and the covariance of the decomposition into shape and total cross section (middle) for both the neutrino (left) and antineutrino data (right). The bottom plots show the calculated and true significance levels when generating toy data according to the original covariance and applying the decomposition afterwards. The nonlinear transformation of the decomposition causes a significant distortion in the distribution of the Mahalanobis distance, even when using the full covariance matrix. The invariant test statistic seems to be less sensitive to these distortions than the full and naive ones.

Table 2: P-values from the comparison of models and MINERvA data.

                         Genie 10a  Genie 10b  Neut      NuWro     SuSAv2
ν    As is     full      8.61e-41   1.02e-43   1.90e-17  7.82e-22  7.87e-37
               naive     1.84e-47   1.87e-41   5.06e-08  3.24e-14  4.00e-18
               invariant 1.53e-20   4.31e-21   1.72e-05  1.04e-06  1.94e-06
     Shape     full      9.66e-62   4.99e-58   3.87e-26  2.09e-37  6.86e-27
     & norm.   naive     9.89e-84   2.28e-78   7.58e-21  1.38e-32  3.15e-23
               invariant 4.44e-28   1.09e-26   1.44e-07  3.29e-10  2.61e-05
     Shape     full      9.37e-62   4.32e-58   3.11e-26  1.96e-37  7.36e-27
     only      naive     8.78e-84   1.43e-78   6.60e-21  1.63e-32  2.54e-23
               invariant 4.41e-28   1.08e-26   1.43e-07  3.27e-10  2.60e-05
ν̄    As is     full      8.10e-03   1.49e-02   5.95e-03  9.45e-02  5.02e-03
               naive     3.77e-12   3.56e-09   1.37e-08  3.70e-04  1.73e-01
               invariant 2.47e-02   6.73e-02   5.31e-02  2.47e-01  4.79e-01
     Shape     full      1.43e-09   8.95e-08   2.94e-09  1.41e-04  7.07e-06
     & norm.   naive     5.08e-07   5.54e-07   2.33e-05  6.36e-03  1.25e-04
               invariant 2.68e-04   4.21e-02   4.70e-04  2.19e-02  3.76e-02
     Shape     full      6.04e-08   2.71e-06   1.28e-08  2.78e-04  9.13e-06
     only      naive     3.10e-06   2.19e-06   8.74e-05  1.12e-02  1.54e-04
               invariant 2.64e-04   4.14e-02   4.62e-04  2.15e-02  3.69e-02
Thanks also go to Louis Lyons for providing feedback and clarifying discussions about the concepts discussed in this paper. This work was supported by a grant from the Science and Technology Facilities Council.
Listing 1: Python implementation of the Bee-square distribution.

import numpy as np
from scipy.stats import rv_continuous
from scipy.special import erf, erfinv


class Bee(rv_continuous):
    """Distribution of the maximum |z| of df independent standard normals."""

    def _cdf(self, x, df):
        return erf(x / np.sqrt(2)) ** df

    def _pdf(self, x, df):
        ret = df * erf(x / np.sqrt(2)) ** (df - 1)
        return ret * np.sqrt(2 / np.pi) * np.exp(-x**2 / 2)

    def _ppf(self, x, df):
        return erfinv(x ** (1 / df)) * np.sqrt(2)


bee = Bee(a=0)


class Bee2(rv_continuous):
    """Bee-square distribution: the maximum squared z-score."""

    def _cdf(self, x, df):
        return bee.cdf(np.sqrt(x), df)

    def _pdf(self, x, df):
        ret = df * erf(np.sqrt(x / 2)) ** (df - 1)
        return ret / np.sqrt(2 * np.pi * x) * np.exp(-x / 2)

    def _ppf(self, x, df):
        return bee.ppf(x, df) ** 2


bee2 = Bee2(a=0)

Listing 2: Python implementation of the "invariant 3" test statistic. Data must be provided in z-scores, i.e. normalised by the variance. Uses caching for improved performance and works with survival functions instead of CDFs for improved numerical accuracy. Several details were lost in the extraction of this listing; the commented values below are reconstructions.

from functools import lru_cache

import numpy as np
from scipy.optimize import root_scalar
from scipy.stats import chi2


@np.vectorize
@lru_cache(10000)
def _yfrommax_scalar(b, df=2, alpha=0.5):
    """(1 - diagonal coordinate) from (1 - max) of accepted region.

    Solves (1 - b)**df - (y - b)**df / (1 - alpha + alpha*y)**(df - 1) - y = 0.
    """
    beta = 1.0 - alpha
    q = (1.0 - b) ** df
    dfm = df - 1

    def f(y):
        return q - ((y - b) ** df) / ((beta + alpha * y) ** dfm) - y

    if b <= 0:
        return 1.0  # boundary value (assumed; not recoverable from the source)
    if b >= 1:
        return 0.0  # boundary value (assumed; not recoverable from the source)
    return root_scalar(f, x0=b, x1=b * 0.5).root  # second secant point assumed


def yfrommax(b, df=2, alpha=0.5):
    """Buffer and interpolate values to speed things up."""
    step = 0.0001
    b0 = np.floor(b / step) * step
    b1 = b0 + step
    delta = (b - b0) / step
    x0 = _yfrommax_scalar(b0, df=df, alpha=alpha)
    x1 = _yfrommax_scalar(b1, df=df, alpha=alpha)
    return x0 + (x1 - x0) * delta


def invariant3(x, alpha=0.5, fast=False):
    """Return test statistic given vector of normalised values."""
    if fast:
        sf = 1 - chi2.cdf(x**2, df=1)
    else:
        sf = chi2.sf(x**2, df=1)
    a = np.min(sf, axis=-1)
    b = 1.0 - np.max(sf, axis=-1)
    yfm = yfrommax(b, df=x.shape[-1], alpha=alpha)  # df argument assumed
    yfc = (alpha * (a - b) + b) / (1.0 + alpha * (a - b))
    y = np.minimum(yfc, yfm)
    y = np.maximum(y, 0)
    return chi2.isf(y, df=1)  # map to a chi-square(1) scale, cf. Eq. (41) (assumed)
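As a usage sketch of the ideas behind Listing 1 (the closed-form CDF is restated inline so the snippet is self-contained; the z-scores are toy numbers, not real data):

```python
import numpy as np
from scipy.special import erf
from scipy.stats import chi2

def bee2_cdf(x, df):
    """CDF of the maximum squared z-score of df independent standard normals."""
    return erf(np.sqrt(x / 2.0)) ** df

# "fitted" test statistic: the maximum squared z-score of the residuals
z = np.array([0.5, -1.2, 2.1])  # toy z-scores
fitted = np.max(z**2)
p_fitted = 1.0 - bee2_cdf(fitted, df=len(z))

# sanity check: for df = 1 the Bee-square distribution is an ordinary chi-square(1)
```

For a single data point the maximum squared z-score is just the squared z-score itself, so the Bee-square distribution with df = 1 coincides with χ²(1), which makes for a convenient cross-check of the implementation.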