A semi-analytical solution to the maximum likelihood fit of Poisson data to a linear model using the Cash statistic
OORIGINAL RESEARCH ARTICLE
A semi–analytical solution to the maximum–likelihood fit of Poissondata to a linear model using the Cash statistic
Massimiliano Bonamente a and David Spence a a Department of Physics and Astronomy, University of Alabama in Huntsville, Huntsville AL35899 (U.S.A)
ARTICLE HISTORY
Compiled September 18, 2020
ABSTRACT
The
Cash statistic, also known as the C statistic, is commonly used for theanalysis of low–count Poisson data, including data with null counts for certain valuesof the independent variable. The use of this statistic is especially attractive for low–count data that cannot be combined, or re–binned, without loss of resolution. Thispaper presents a new maximum–likelihood solution for the best–fit parameters of alinear model using the Poisson–based Cash statistic.The solution presented in this paper provides a new and simple method to mea-sure the best–fit parameters of a linear model for any Poisson–based data, includingdata with null counts. In particular, the method enforces the requirement that thebest–fit linear model be non–negative throughout the support of the independentvariable. The method is summarized in a simple algorithm to fit Poisson countingdata of any size and counting rate with a linear model, by–passing entirely the useof the traditional χ statistic. KEYWORDS
Probability; Statistics; Maximum–likelihood methods; Cash statistic; Parameterestimation
1. Introduction
The maximum–likelihood modelling of integer–valued Poisson data can be accom-plished with the use of the
Cash , or C statistic, first proposed by [8]. The Cash statis-tic applies to a variety of counting data in use across the sciences. One example isthe counting of photons as a function of energy or wavelength, as commonly done byphoton–counting detectors used in astronomy [e.g., 5]. Another example is the num-ber or percentage of votes for a candidate in different precincts or polling stations.In counting experiments such as these, the collected data are in the form of inde-pendent integer–valued variables. The behavior of these variables as a function of anindependent variable (such as photon energy or number of voters in a precinct) canbe modelled with the aid of the Cash statistic, which is obtained from the logarithmof the likelihood of the data with a model for the distribution of the counts.It is well established that the asymptotic distribution of the C statistic, in the large–count limit, is a χ distribution [e.g. 8, 13]. This limit is a result of the asymptotic The author gratefully acknowledges support of
NASA Chandra grant AR6-17018X. a r X i v : . [ s t a t . M E ] S e p onvergence of a Poisson distribution with a Gaussian distribution of same mean andvariance, which occurs for large values of the Poisson mean. It is straightforward tostudy the distribution of the C statistic as a function of the parent mean µ , when theparent mean is specified a priori , e.g., when the fitting model has no free parameters.Such calculations are reported in [4, 7, 13], showing, among other results, that theexpectation of C is significantly lower than the expectation of a χ distribution withthe same number of degrees of freedom.The use of the C statistic for integer–valued, counting Poisson data is to be preferredto the use of the more common χ distribution. First, even in the large–count limit,use of the χ fit statistic leads to a bias in the best–fit parameters, due to the approxi-mation of the Gaussian variance with the measured Poisson counts [e.g., 12]. Second,use of the χ statistic often requires the combination of datapoints, often referred toas binning of the independent variable, to reach a sufficient number of counts in eachindependent data point. Such binning may result in an undesirable reduction in theresolution of the data, especially in the presence of sharp features in narrow intervalsof the independent variable, such as emission or absorption lines. The C statistic, onthe other hand, can be used on unbinned data that make use of the full resolution ofthe data.Use of the C statistic also comes with a number of challenges. First, it is notknown exactly how free model parameters affect the distribution of C min – i.e., the C statistic obtained when optimizing variable model parameters – especially in thelow–count regime. A study of the distribution of C min for a simple one–parameterconstant model was reported by [4], although those results do not directly apply tomore complex models such as the linear model.Another challenge is the numerical complexity of the Poisson distribution and theassociated C statistic, which limits the ability to obtain analytical solutions for thebest–fit parameters via the maximum–likelihood criterion. A key illustration of thischallenge is that the simple linear model, which has an analytical solution for its best–fit model parameters and their covariance matrix when using the χ statistic [e.g. 2, 3],does not have an equally simple solution when using the C statistic.This paper addresses the latter problem by presenting a new semi–analytical methodto identify the maximum–likelihood solution of the parameters of a linear model usingthe C statistic. The method consists of the numerical solution of a simple analyticalequation that determines the best–fit value of one of the two parameters, and the useof an analytical function to calculate the other parameter. It is also shown that notall Poisson data sets can be fit to an unconstrained linear model, since the resultingbest–fit model may become negative, and therefore not usable for calculation of thePoisson–based C statistic. In those cases, a simple generalization of the linear modelis proposed that enforces the non–negative requirement for the best–fit model. Such ageneralization ensures that data of all sizes and counting rates can be fit with a linearmodel, and that such model is unique. The results presented in this paper thereforeease the challenges presented by the numerical complexity of the Poisson distribution,by providing a simple semi–analytical method to find the best–fit parameters of thelinear model, and making it possible to study the distribution of C min in the low–countregime.This paper is structured as follows. Section 2 introduces the maximum–likelihoodmethod and the Poisson–based C statistic, Section 3 presents the equations for themodel parameters and Section 4 the solution of those equations. Section 5 discussesconditions for the non–negativity of the best–fit linear model and Section 6 presentsthe extended linear model that ensures an acceptable model for all datasets. Finally,2ection 7 contains a discussion and conclusions.
2. Methods for the maximum–likelihood analysis of Poisson data C statistic The data model considered in this paper is N independent integer–valued measure-ments y i , each Poisson–distributed and measured at a fixed value x i of the independentvariable x . The data can also be viewed as originating from M = (cid:80) y i independentevents that are sorted into N independent ”bins”, each of size ∆ x i and centered at x i with y i counts. This is a common type of data for the physical sciences; for example,the independent variable x may be the wavelength of collected photons, and y thenumber of photons collected in a given wavelength range, binned according to theresolution of the instrument. The data can therefore be summarized as a collection of N independent two–dimensional variables { x i , y i } , with y i ∼ Poiss( µ i ) , i = 1 , . . . , N where µ i is the unknown parent mean of the Poisson distribution of the counts y i ,collected in a fixed range x i ± ∆ x i / y = f ( x ), where f ( x ) is an analytical function with m adjustable parameters a j , j = 1 , . . . , m . The likelihood of the data with the model f ( x ) is given by L = N (cid:89) i =1 e − µ i µ y i i y i !where the adjustable parameters in f ( x ) are optimized so that the likelihood L ismaximised, for the given dataset. Instead of maximizing L directly, it is convenient tominimize the function C ≡ − L − B = 2 N (cid:88) i =1 ( µ i − y i + y i ln( y i /µ i )) = N (cid:88) i =1 C i (1)where B = 2 N (cid:88) i =1 ( y i − y i ln y i + ln y i ! )is a quantity that is independent of the model, and C i = 2 ( µ i − y i + y i ln( y i /µ i )) . The statistic C is known as the Cash or C statistic, originally proposed by [8] and[1] to model and analyze X–ray observations of astronomical sources. For a modelwith m free parameters, the minimization of the C statistic yields m equations, in3eneral non–linear, that need to be solved to obtain the maximum–likelihood best–fitestimates of the m model parameters. The linear model is a simple and commonly used relationship between two variables.The customary parameterization of a linear relationship between two variables x and y is f ( x ) = a + bx , with a and b as the two adjustable parameters. Another convenientand equivalent parameterization, suggested by [17], is of the form f ( x ) = λ (1 + a ( x − x A )) (2)with λ and a as the two adjustable parameters, and x A a fixed fiducial value of the x variable. As will be shown, this parameterization is convenient when taking thederivative of the terms ln f ( x i ) in Equation 1, since it leads to a separation betweenthe two parameters λ and a . This is the parameterization used in this paper .The continuous function f ( x ) is related to the parent mean µ i of each Poissonvariable y i via an integral over the length of the i –th bin, y ( x i ) = µ i = (cid:90) x i +∆ x i / x i − ∆ x i / f ( x ) dx = f ( x i )∆ x i , (3)where y ( x i ) is a step–wise function describing the Poisson mean for each bin, andthe last equality applies because of the linearity of f ( x ). It is therefore recognizedthat f ( x ) is a non–negative density function in units of counts per unit x (i.e., not just counts ), while its integral over a range ∆ x i is the predicted Poisson mean y ( x i ) = µ i inthat bin, in units of counts . Therefore, the two functions f ( x ) and y ( x i ) will vary fromeach other according to the size of the bins, which is allowed to be non–uniform. It isimportant to stress that the function f ( x ) must be non–negative, since it would not bemeaningful to have a Poisson variable with a negative parent mean. The requirement f ( x ) ≥ Integer–valued count data of the type considered in this paper can be modelled withalternative statistics that afford more flexibility than the single–parameter Poissondistribution, in particular with regards to over– or under–dispersion of the data [e.g.,6, 11, 18, 19]. Moreover, the regression with one or several independent variables mayoften require more complex models than a simple linear model. Generalized linearmodels and vector–generalized linear models provide a comprehensive and flexibleframework for the regression of data, including a natural way to account for non–negative Poisson means via a suitable link function that relates the Poisson mean tothe data and model parameters [e.g. 9, 14, 16, 21, 22]. Within the context of generalizedlinear models, a convenient link function would be in the form of log µ = a + bx , wherethe logarithm of the Poisson mean µ , instead of the mean itself, is modelled with a A maximum–likelihood solution for the standard form of the linear model with the C statistic is reported in[3]. It leads to a set of two non–linear coupled equations, whose numerical solution can be challenging. igure 1. Linear model according to Equation 2. In this illustration, the functions f ( x ), in units of countsper unit x , and y ( x i ), in units of total counts in the bin, follow one another closely because a bin size value of∆ x = 1 was used. linear function [as in, e.g. 10]. Such log–linear models would ensure that the Poissonmean is always positive (e.g., see chapter 6 of [14]).There are two main reasons for the present investigation of a simple linear regressionusing the standard Poisson distribution, instead of more versatile models or distribu-tions. First is the goal to obtain an analytical, and therefore computationally efficient,solution for the best–fit parameters of the linear model for count data. Currently, theonly available analytical solution for the maximum–likelihood fit of the linear modelis for Gaussian–distributed variables, and this method is not accurate for low–countinteger–valued data [e.g. 4]. Second, there may be scientific reasons to prefer a simplelinear model versus, e.g., a log–linear model or other more complex models, partic-ularly when the data or an underlying parent model suggest a direct linear relationbetween the dependent and independent variables [e.g. 15, 20]. The combination ofthese practical and scientific reasons make it interesting to seek an analytical solutionof the basic linear regression with Poisson data.
3. Maximum likelihood solutions for the parameters of the linear model C statistic for the linear model The linear model of Equation 2 is illustrated in Figure 1. To evaluate the C statisticof Equation 1 with the linear model of Equation 2, start with N (cid:88) i =1 µ i = (cid:90) x B x A f ( x ) dx = λR (cid:18) a R (cid:19) (4)5here R = x B − x A is the range of the x variable, and it is assumed that the datacovers the entire range R . There are situations where measurements of the dependentvariable y are missing for certain intervals of the independent variable x , for examplebecause measurements are not possible or because they are ignored in the analysis.An interval of the independent variable where data are missing, or are otherwise notused in the analysis, will be referred to as a gap . If the data contain gaps in the x variable, the limits of integration of Equation 4 will change. Section 6.3 describes thesimple modification required to analyze data that contain such gaps.The second term of the C statistic is simply N (cid:88) i =1 y i = M, where M is the total number of counts, and the final term is N (cid:88) i =1 ( y i ln( y i ) − y i ln µ i ) = N (cid:88) i =1 y i ln( y i ) − N (cid:88) i =1 y i ln f ( x i ) − N (cid:88) i =1 y i ln ∆ x i , where Equation 3 was used. The C statistic for the linear model of Equation 2 istherefore (5) C = 2 λR (cid:18) aR (cid:19) − M ln λ − N (cid:88) i =1 y i ln(1 + a ( x i − x A )) + D where D = (cid:32) N (cid:88) i =1 y i ln y i − M − N (cid:88) i =1 y i ln ∆ x i (cid:33) , is a model–independent term that does not have an effect in the subsequent minimiza-tion of the statistic. Note that the binning of the data is not required to be uniform,as will be illustrated in subsequent examples. Minimization of the C statistic in Equation 5 is obtained via ∂C∂λ = 2 R + aR − Mλ = 0 , which leads to λ ( a ) = MR (cid:18) a R (cid:19) (6)and 6 C∂a = λR − N (cid:88) i =1 y i ( x i − x A )1 + a ( x i − x A ) = 0 . Substituting Equation 6 to eliminate λ leads to M R a R − N (cid:88) i =1 y i ( x i − x A )1 + a ( x i − x A ) = 0and finally to a = − R + M (cid:80) Ni =1 y i ( x i − x A )1 + a ( x i − x A ) , (7)which is the equation to solve for the values of the a parameter. Equation 7 may berearranged as 1 + a R − M R g ( a ) = 1 + R (cid:18) a − Mg ( a ) (cid:19) = 0;It is thus convenient to define F ( a ) ≡ R (cid:18) a − Mg ( a ) (cid:19) (8)as the function whose zeros are solutions of Equation 7, with g ( a ) defined as g ( a ) ≡ N (cid:88) i =1 y i ( x i − x A )1 + a ( x i − x A ) . (9)The problem of finding solutions for the parameter a has therefore been cast as findingthe zeros of a function F ( a ), which uses the function g ( a ) of Equation 9. One of the keyproperties to find the zeros of F ( a ) is that the zeros of g ( a ) are points of singularityfor F ( a ), as will be shown in detail in Section 4.In summary, F ( a ) = 0 and Equation 6 are the two equations to solve to find themaximum–likelihood estimators a and λ of the linear model of Equation 2. The twoequations are uncoupled, and therefore the burden is limited to finding a solutionof F ( a ) = 0. Then, Equation 6 is used to find λ = λ ( a ). Notice that, in derivingEquation 6 and 7, no constraints were enforced to ensure that the best–fit model isnon–negative, which is necessary for the applicability of the Poisson statistics and forthe calculation C min . These constraints are presented in Section 5. It is useful to point out that a log–linear model, as obtained for example using a logarithmic link functionwithin the context of generalized linear models (see Section 2.3), would have led to coupled equations involvingthe exponential of the parameters, in place of Equations 6 and 7. . Analytical properties of the maximum–likelihood solution of the linearmodel The maximum–likelihood estimate of the parameters λ and a are obtained by firstfinding the zeros of the function F ( a ) defined by Equation 8. For this purpose, itis necessary to establish a few analytical properties of the functions F ( a ) and g ( a ).These properties will be used to study the location and properties of the zeros of thefunction F ( a ). It is necessary to discuss explicitly the simple case of data with M = 1count before presenting general results for M ≥
2. The case of M = 0 counts is notinteresting, since it represents a dataset with no positive measurements throughoutthe range of the independent variable. M = 1 When M = 1, or just one event ( y j = 1) recorded in a bin centered at x j , one obtains g ( a ) = x j − x A a ( x j − x A )leading to F ( a ) = 1 + a R − R (1 + a ( x j − x A ))2( x j − x A ) = 1 − R x j − x A )which is constant independent of a . The conclusion is that F ( a ) = 0 has no solutionswhen the data have only one count, and it is therefore not possible to find a maximum–likelihood solution for the linear model with M = 1. A simple interpretation for thisfinding is that it is not possible to constrain a two–parameter model with just onenon–null data point. Further discussion is provided in Section 7. M ≥ It is possible to find certain properties of g ( a ) and F ( a ) that apply in general. Theproperties will lead to a general criterion to identify solutions of F ( a ) = 0. Property 4.1 (Properties of the function g ( a )) . The function g ( a ), according toEquation 9, is the sum of n ≤ M terms of type g j ( a ) = y j ( x j − x A )1 + a ( x j − x A ) , where n is the number of bins with y j (cid:54) = 0, and j = 1 , . . . , n represents the bins withnon–null counts; when bins have no more than one count, then n = M . Therefore, thefunction g ( a ) has n points of singularity a j = − x j − x A )8hat fall in the range ( − / ∆ x , − / ( R − ∆ x N / a → a − j g ( a ) = −∞ lim a → a + j g ( a ) = + ∞ . It is also immediate to see that g (cid:48) ( a ) < g ( a ) decreases monotonically from + ∞ to −∞ between twoconsecutive points of singularity. Moreover, the asymptotic limits arelim a →±∞ g ( a ) = 0 . As a result, the function g ( a ) has n − g ( a ) arepoints of singularity for F ( a ).In particular, when M = 2 with y j = y k = 1 and no counts in any of the other bins,then the zero of g ( a ) can be calculated analytically as a c = − (cid:18) x j − x A + 1 x k − x A (cid:19) < M = 2 only) . For the general case of
M >
2, the zeros of g ( a ) must be calculated numerically, asexplained in the following. Property 4.2 (Behavior of F ( a ) near the points of singularity) . Since g ( a ) is contin-uous with g (cid:48) ( a ) < n points of singularity, g ( a ) < g ( a ) > a s any of the n − F ( a ), this implies that lim a → a − s F ( a ) = −∞ lim a → a + s F ( a ) = + ∞ . Property 4.3 (Asymptotic limit of F ( a )) . The asymptotic limit of F ( a ) at ±∞ ,defined as F ∞ ≡ lim a →±∞ F ( a ) , can be evaluated via the De L’Hospital rule for the associated function a − M/g ( a ),which is an indeterminate form of the type 0 / a →±∞ (cid:18) ag ( a ) − Mg ( a ) (cid:19) = lim a →±∞ (cid:18) g ( a ) + ag (cid:48) ( a ) g (cid:48) ( a ) (cid:19) = − M N (cid:88) i =1 y i x i − x A = − M M (cid:88) j =1 x j − x A . This property can be proven with a few steps algebra, by turning the sum over the N bins to an equivalent sum over the individual M counts. Therefore the asymptotic9 igure 2. (Left:) Function F ( a ) for a representative data set with M = 2, with 100 data points x i = i − . i = 1 , . . . , y = y = 1. (Right:) Function g ( a ) for the same data set. limit of F ( a ) becomeslim a →±∞ F ( a ) = F ∞ = 1 − R (cid:32) M N (cid:88) i =1 y i x i − x A (cid:33) . (10) Property 4.4 (Sign of F (cid:48) ( a )) . The derivative F (cid:48) ( a ) is calculated according to F (cid:48) ( a ) = R M R g (cid:48) ( a ) g ( a ) . (11)For M ≥ g (cid:48) ( a ) g ( a ) = − z + · · · + z M ( z + · · · + z M ) , with z j ≡ x j − x A a ( x j − x A )in which the sum is over all individual events, with some z j identical to each other ifthere are bins with more than one count. Using the Cauchy–Schwartz inequality1 M M (cid:88) j =1 z j ≤ M (cid:88) j =1 z j leads to g (cid:48) ( a ) g ( a ) ≤ − M . (12)10 igure 3. (Left:) Function F ( a ) for a representative data set with M = 3, with 100 data points x i = i − . i = 1 , . . . , y = y = y = 1. (Right:) Function f ( a ) for the same data set. Finally, using Equation 12 into Equation 11 leads to the conclusion that F (cid:48) ( a ) ≤ F ( a ).These properties of the function F ( a ) are also illustrated in Example 4.2 and Fig-ures 2 and 3. The properties of F ( a ) and g ( a ) can be used to state a general criterionto locate all solutions of the equation F ( a ) = 0. Lemma 4.1 (Location of zeros of F ( a )) . The function F ( a ) has n − zeros, where n ≤ M is the number of bins with non–null counts. Of these, n − zeros are foundbetween the n − points of singularity of F ( a ) , also zeros of g ( a ) . The remaining zerois found either to the left of the smallest point of singularity, if the asymptotic limit is F ∞ > , or to the right of the largest singularity, if F ∞ < . Proof.
The result is a direct consequence of the presence of n points of singular-ity for F ( a ) (property 4.1), the negative sign of F (cid:48) ( a ) between points of singularity(property 4.4) and the asymptotic limit of F ( a ) at the points of singularity (prop-erty 4.2).Properties of g ( a ) and F ( a ) are illustrated in the following example, which examinesthe behavior of the two functions for two simple datasets with M = 2 and M = 3. Example 4.2 (Two datasets with M = 2 and M = 3) . Two sample datasets with M = 2 and M = 3 are shown respectively in Figure 2 and 3. For the M = 2 dataset,with n = 2 bins with non–null counts, the function F ( a ) has just one point of discon-tinuity for F ( a ), also the zeros of g ( a ). This point of discontinutiy divides the domainof a into two intervals, with F ( a ) monotonically decreasing within these intervals, asshown in Figure 2. For the M = 3 dataset, with n = 3 bins with non–null counts, F ( a ) has n − g ( a ),which were in turn found between the n = 3 points of singularity of g ( a ), as shown inFigure 3. 11he n − F ( a ) are all possible solutions of the maximum–likelihood methodfor the linear model. The following section discusses if these solutions are acceptable,in the sense that they produce a model f ( x ) that is non–negative throughout thedomain of the x variable.
5. Acceptable solutions for the best–fit parameters of the linear model
In Section 4 it was shown that there are several possible solutions for the maximum–likelihood parameters of the linear model of Equation 2. In particular, data with M total counts, distributed over n ≤ M of the N available bins, have n − a that are a solution of the maximum–likelihood equation F ( a ) = 0, with thecorresponding value of λ provided by Equation 6. This section addresses the additionalrequirement that a model be non–negative in all bins, i.e., that a solution be acceptable,so that Poisson statistics apply and the C statistic can be calculated. Definition 5.1 (Acceptable solution of F ( a ) = 0) . A solution of the F ( a ) = 0 equa-tion is said to be acceptable if it leads to a non–negative model throughout the supportof the independent variable. Specifically, the function must satisfy the condition that y ( x i ) ≥ , i = 1 , . . . , N , so that the parent mean of the Poisson distributions is alwaysnon–negative.It will be shown in this section that there is at most one acceptable solution for anyPoisson data set. Cases without an acceptable solution will be examined in Section 6,where a simple generalization of the linear model is provided that ensure one and onlyone acceptable solution for the fit of any data set to a linear model. Given that the model is linear, the condition of acceptability is satisfied by simplyrequiring that the Poisson mean for the first and last bins, y ( x ) and y ( N ), are bothnon–negative, (cid:40) λ · (1 + a ∆ x / ≥ λ · (1 + a ( R − ∆ x N / ≥ . Notice how the model f ( x ) may still become negative in a portion of either the first orthe last bin, but the linearity of the model simply requires that f ( x ) at the mid–pointof the bin be non–negative, in order to ensure that the Poisson mean for the bin isnon–negative.Substituting Equation 6, the conditions become a function of a alone, MR (cid:18) a R (cid:19) (1 + a ∆ x / ≥ MR (cid:18) a R (cid:19) (1 + a ( R − ∆ x N / ≥ , (13)This equation can be used to find a range of the variable a that contains acceptable12olutions. Therefore, a solution of F ( a ) = 0 is acceptable if and only if it satisfiesEquations 13. This property leads to the following result regarding acceptable solu-tions: Lemma 5.2 (Necessary and sufficient condition for the acceptability of a solution of F ( a ) = 0) . A solution of F ( a ) = 0 is acceptable if and only if it is found outside ofthe interval (cid:18) − x , − R − ∆ x N / (cid:19) . Proof.
The conditions of Equations 13 can be used to find values of the variable a that are acceptable solutions of F ( a ) = 0. For a < − /R , i.e., when the denominatorsin Equation 13 are negative, the two conditions are satisfied when a < − / ∆ x , since∆ x < R − ∆ x N /
2. Likewise, for a > − /R , the two conditions are satisfied when a ≥ − / ( R − ∆ x N / (cid:26) a < − / ∆ x (and thus λ < a > − / ( R − ∆ x N /
2) (and thus λ > a ∈ ( − / ∆ x , − / ( R − ∆ x N /
2) lead to a model that becomes neg-ative in some of the bins, and therefore they are not acceptable. Figure 4 shows thefunction λ ( a ) and illustrates the range of acceptable values for the parameters. Example 5.3 ( M = 2 data with no acceptable solution) . The acceptability of themaximum–likelihood solution is illustrated with the data used for Figure 2. The onlysingularity of the F ( a ) function is a c = − (cid:18) x ax + x ax (cid:19) = − . a c > − /R , and with a positive asymptotic value of F ∞ = 0 . F ( a ) = 0 solution is a = − . λ ( a ) = − .
007 results in a best–fit model that isnegative in some of the initial bins, e.g., y ( x ) = λ (1 + a ∆ x/
2) = − . C statistic, and thereforecannot be accepted as a maximum–likelihood solution. Lemma 5.4 (Necessary condition for the acceptability of F ( a ) = 0 solutions) . Solu-tions of F ( a ) = 0 within points of singularity of F ( a ) are always unacceptable. Proof.
This condition applies to data with
M > n > F ( a ) has n − ≥ n − n − F ( a ) = 0 are found between the n − F ( a ),which are the zeros of the function g ( a ). According to property 4.1, the zeros of g ( a )13 igure 4. (Left:) Parameter λ as a function of the parameter a , and range of acceptability of a . For valuesof − / ∆ x < a < − / ( R − ∆ x N /
2) the parameters ( a, λ ) result in a linear model that becomes negative insome of the bin, and therefore not acceptable for use with the Poisson distribution. The smallest and largestpoints of singularity of g ( a ) also correspond to the boundaries of this range, according to Property 4.1. Thezeros of g ( a ), also points of singularity for F ( a ), are therefore inside this range. The x axis was plotted withthe symlog option that allows a near–logarithmic scaling across a value of zero. are located between the n points of singularity of g ( a ), given by a j = − x j − x A ∈ (cid:18) − x , − R − ∆ x N / (cid:19) , (15)where x j is the coordinate of each of the n ≤ M unique bins where non–zero countsare recorded. According to lemma 5.2, these n − F ( a ) = 0 fall in theinterval on unacceptability.Lemma 5.4 states that all zeros of F ( a ) that are within points of singularity may bediscareded as unacceptable. The only possibility for an acceptable solution is the zerothat is located outside of the range of the points of singularity, although such zero isnot guaranteed to be acceptable. Accordingly, the following definition is made: Definition 5.5 (External solution of F ( a ) = 0) . A solution of F ( a ) = 0 is said to be external if it falls outside of the range of the points of singularity of F ( a ).An external solution is therefore found either to the left of the first point of singu-larity of F ( a ), if the asymptotic value F ∞ >
0, or to the right of the last, if F ∞ < M = 2, this is the only solution of F ( a ) = 0, and the point of singularity of F ( a )is calculated according to the equation provided at the end of Property 4.1. Lemma 5.6 (Necessary and sufficient conditions for the acceptability of an external F ( a ) = 0 solutions) . An external solution of F ( a ) = 0 is acceptable if and only if thefollowing conditions are met, according to the sign of the asymptotic value F ∞ : F (cid:18) − x (cid:19) < , if F ∞ > F (cid:18) − R − ∆ x N / (cid:19) > , if F ∞ < . (16) Proof.
This property is simply based on the continuity of the function F ( a ) betweenpoints of singularity, and on lemma 5.2, which established that acceptable values ofthe parameter a are outside of the interval ( − / ∆ x , − / ( R − ∆ x N / F ∞ >
0, the solution of F ( a ) = 0 is to the left of the point of singularity. Giventhat F (cid:48) ( a ) <
0, the solution will fall in the range of acceptability, viz., a < − /R , if F ( − / ∆ x ) < F ∞ <
0, the solution is to the right of the point of singularity, andthe solution is acceptable if F ( − / ( R − ∆ x N / > F ( − / ∆ x ) < F ∞ >
0, then the zero will be in the region of unacceptabil-ity.This necessary and sufficient condition can be immediately applied to data thathave non–zero counts, and thus points of singularity of g ( a ), at the extremes of therange. Corollary 5.7 (Sufficient conditions for data with non–zero counts in first or lastbin) . If a data set with M ≥ satisfies either of the two conditions (cid:40) y ≥ and F ∞ > y N ≥ and F ∞ < then the external solution of the F ( a ) = 0 equation is acceptable. Proof.
According to Equation 8, F ( a ) = 1 + a · R/ g ( a ).A non-zero count in the first bin leads to a singularity of g ( a ) at a j = − ∆ x /
2, andtherefore F ( − / ∆ x ) <
0. Therefore, according to lemma 5.6, if F ∞ >
0, there is anacceptable solution to the left of − / ∆ x . Similar considerations are applicable to thecase of a non-zero count in the last bin, where a singularity of g ( a ) occurs instead at a j = − / ( R − ∆ x N / F ( − / ( R − ∆ x N / >
0. In this case, if F ∞ <
0, theexternal solution of F ( a ) = 0 to the right of the last singularity of g ( a ) is acceptable,again according to lemma 5.6.Notice how corollary 5.7 does not ensure an acceptable solution simply if either thelast or first bin have non–null counts. In fact, the presence of an acceptable solution isconditioned also on the sign of F ∞ . For example, a data set with a non–null last binbut with a positive F ∞ will not have an acceptable solution to the right of the last15 igure 5. Functions F ( a ) and g ( a ) for the dataset presented in Example 5.9. singularity. Finally, the two earlier lemmas can be used to state the uniqueness of themaximum–likelihood solution for the linear model. Lemma 5.8 (Uniqueness of an acceptable solution of F ( a ) = 0) . If there is an ac-ceptable solution of F ( a ) = 0 , this solution is unique. Proof.
This property is an immediate consequence of the fact that, of the n − F ( a ) = 0, the n − Example 5.9 (Example of data with M = 5 and an acceptable solution) . Figure 5shows the F ( a ) and g ( a ) functions for a dataset with M = 5 counts in 5 equally–spaced bins ( x i = 9 . , . , . , . , . n = 5. The function g ( a )has n = 5 points of singularity and n − F ( a ). There are also 4 zeros of the function F ( a ), of which n − F ∞ <
0, and thereforethe remaining external zero is to the right of the last singularity. At the end point a of the range of acceptability, the function is F ( a ) >
0, and therefore the last zeroleads to an acceptable solution.The data and all models are shown in Figure 6. Notice how the model correspondingto the solutions of F ( a ) = 0 that are not acceptable lead to a model that becomesnegative; these models cannot be used for the C statistic, and need to be rejected.The only acceptable model is shown as a solid line, and the corresponding values ofthe C statistic for each bin are shown in the right panel.The results presented in this section can be summarized by a simple algorithmthat can be used to determine whether there is an acceptable solution of the equation F ( a ) = 0, and to calculate it, when it exists. Remark 1 (Algorithm to determine acceptable best–fit parameters of Equation 2) . igure 6. (Left) Best–fit linear models for the M = 5 data presented in Example 5.9. There are 4 solutions of F ( a ) = 0, of which the first three lead to models that become negative somewhere in the x range; the acceptablemodel corresponds to the largest solution. (Right) Contributions to C statistic for each of the N = 200 bins,for a total of C min = 29 . Consider a dataset with N bins, a range R of the independent variable between x A and x B , a total number of integer–valued counts M with a number n ≤ M bins withnon–null counts. The existence and value of the best–fit parameters { a, λ } for thelinear model of Equation 2 can be determined according to the following steps:(1) If n ≤
1, there is no acceptable solution.(2) Calculate the n ≥ g ( a ), given analytically by Equa-tion 15.(3) Numerically calculate the n − g ( a ) between points of singularity. Thesezeros are points of singularity for F ( a ).(4) Calculate the asymptotic value F ∞ , given analytically by Equation 10.(5) (Optional) Numerically calculate the n − F ( a ), found between pointsof singularity of F ( a ) (also zeros of g ( a ). These zeros always lead to unacceptablesolutions.(6) Numerically calculate the remaining external zero of F ( a ), either to the left ofthe first point of singularity (if F ∞ > F ∞ < outside of the range ( − / ∆ x , − / ( R − ∆ x N / acceptable . The corresponding value of λ can be calculate accordingto Equation 6.b. If this value is inside the range ( − / ∆ x , − / ( R − ∆ x N / g ( a ) = 0 and F ( a ) = 0 are facilitated by thecontinuity of the two functions between the known points of singularity, or between17he last point of singularity and ±∞ . An efficient and accurate numerical routineis provided, for example, by python ’s root_scalar , with the brentq method. Themethod requires the specification of an interval, or bracket , where the solution issought. This is either an interval between the two adjacent points of singularity, oran open interval either below the first singularity, or above the last singularity. Forexample, a zero of g ( a ) can be sought in the range [ a j + (cid:15), a j +1 − (cid:15) ], where a j and a j +1 are two consecutive points of singularity of g ( a ). This bracket requires a small value (cid:15) , to be determined according to the separation of the data points, to ensure that thefunction g ( a ) at the two extremes of the bracket has opposite signs. This section examines when data sets with a large number of counts have an acceptablesolution. It will be shown that, when the counts are distributed uniformly acrossthe support, data with large M will always have an acceptable solution. In general,however, it is possible to find datasets with large M that do not have an acceptablesolution, depending on the distribution of counts. This observation will lead to ageneralization of the simple model of Equation 2, presented in Section 6. First, it isnecessary to investigate how the asymptotic value of F ( a ) is affected by the distributionof the detected counts. Property 5.1 (Properties of F ∞ ) . The asymptotic value of F ( a ) is given by Equa-tion 10, and it is negative if1 M N (cid:88) i =1 y i x i − x A = 1 M M (cid:88) i =1 x i − x A > R (18)and positive otherwise. Given that 0 < x i − x A < R , each term 1 / ( x i − x A ) has avalue < /R if x i is above the midpoint of the range, and a value > /R if x i is belowthe midpoint. The left–hand side of Equation 18 is the sample mean of the variable1 / ( x i − x A ).It can now be established that, for data with a large number of bins and a uniformdistribution of the counts, the asymptotic value of F ( a ) is negative. Moreover, whenthe number of counts M is also large, the external solution to the right of the lastsingularity will be acceptable. Lemma 5.10.
For a large number of bins N and a uniform distribution of counts, E (cid:20) x i − x A (cid:21) > R , where E [ ] is the expectation based on a parent distribution for the position x i of the i –th count. Moreover, when M is large, the asymptotic value of F ( a ) is negative. Proof.
Assuming bins of uniform width, the range is R = ∆ x · N . The distance ofthe i –th count from the initial point of the range is x i − x A ∈ (∆ x/ , R − ∆ x/ x i − x A = ∆ x f ( R − ∆ x ) , f ∈ (0 , N is large and the counts are uniformly distributed in the N bins, it is possible to treat f as a continuous and uniformly distributed random variablein the range (0 , x i − x A can be approximated as E (cid:20) x i − x A (cid:21) unif (cid:39) (cid:90) df ∆ x/ f ( R − ∆ x ) = ln( R − ∆ x/ − ln ∆ x/ R − ∆ x (cid:39) ln NR Therefore the expectation is asymptotically larger than 2 /R for large N . Moreover, fora large number of counts M , the law of large numbers ensures that the sample averageof 1 / ( x i − x A ) tends to its expectation. Therefore, as M increases, the asymptoticvalue of F ( a ) tends to be negative, and the external solution of F ( a ) = 0 will be foundto the right of the last point of singularity for F ( a ).It is now possible to state a sufficient condition that applies to uniformly distributedcounts in the large–count regime. Lemma 5.11 (Sufficient condition for an acceptable solution) . For large M and N with uniformly distributed counts and a non–null count in the last bin, the externalsolution of F ( a ) = 0 is acceptable and it is found to the right of the last singularity. Proof.
For data with non–null counts in the last bin, i.e., y N ≥
1, the last singularityof g ( a ) occurs at a j = − / ( R − ∆ x N / g ( a ), F ( a j ) =1 + R/ · a j , according to Equation 8, and therefore F ( a j ) >
0. Notice that the point a j marks the boundary of the region of acceptability for the solutions of F ( a ) = 0,according to lemma 5.2.The last singularity for F ( a ), also a zero of g ( a ), will thus occur at a point a s whichis to the left of a j , and the continuity of F ( a ) to the right of the last singularityensures that F ( a ) remains positive between a + s and a j . Also, lemma 5.10 ensures thatthe asymptotic value of F ( a ) is negative. Therefore there is a zero of F ( a ) to the rightof a j , and this external zero is acceptable, according to lemma 5.2.These asymptotic results apply to a uniform distribution of counts, which is a veryrestrictive condition. When the distribution of counts is not uniform, even large– M datasets may not have an acceptable model. It goes beyond the scope of this paper toseek additional sufficient conditions for covergence, given the number of variables atplay (in particular, the number of counts M , the number of bins N and their size andlocation, and the distribution of counts), and the fact that necessary and sufficientconditions for convergence have been provided earlier in this section. Instead, selectednumerical simulations are presented to quantify the fraction of Poisson datasets thatdo not have a non–negative best–fit linear and to illustrate a few representative cases.For this purpose, 100 data sets were simulated for various values of the total numberof counts M , initially assuming that the M counts were uniformly distributed among N = 100 equally spaced bins, following the same pattern of bins along the x axis asin Figures 2 and 3. As expected, based on the asymptotic results of this section, for M ≥
50, all datasets have an acceptable model (Figure 7, red curve). Then, the samesimulations were repeated for a distribution of counts that is either linearly increasingor decreasing towards larger values of x , i.e., with samples drawn respectively from19 igure 7. (Left:) Fraction of datasets with available best–fit non–negative linear model, as function of thenumber of counts M . (Right:) Fraction of datasets with negative asymptotic value F ∞ . the probability distributions functions h ( x ) = x − x A ) R R − x − x A ) R with x ∈ [0 , R ]. For these cases, the simulations show that the number of accept-able models remains smaller even for large values of M . The right panel of Figure 7also illustrates the fraction of data with a negative F ∞ . As expected according tolemma 5.10, uniformly distributed data (red curve) have a negative asymptotic F ∞ for large M ; moreover, the same applies for data distributed with a negative slope(blue curve). This is explained according to property 5.1, since data points below themid–point of the range drive the average of 1 / ( x i − x A ) to values greater than 2 /R , andthat in turn causes F ∞ to be negative. For data with a positive slope, even for large M there is a large fraction of data with a positive asymptotic limit of F ( a ). Thesesimulations can be used as examples of large– M data that do not have an acceptablesolution using the linear model of Equation 2. Random samples from these distributions are readily obtained by simulating the associated normalizedlinear variables in y ∈ (0 ,
1) (with distributions of 2 y and 2 − y , respectively for an increasing and decreasingdistribution). Samples of x are then obtained by rescaling samples of y to the range R = x B − x A via a lineartransformation with y = ( x − x A ) /R . Simulations of the normalized distributions for y are easily accomplishedwith the aid of a uniform variable u in (0 , F − ( p ) = y , where F is the cumulative distribution of y (respectively F ( y ) = y and F ( y ) = 2 y − y for the two linear models), the variable y is simulated as y = F − ( u ) (see, e.g., Section 4.8of [3]). This means that random samples of the normalized increasing and decreasing distributions are obtainedrespectively via y = √ u and y = 1 − √ u , where u are samples from a uniform distribution in (0 , . An extended linear model with a non–negative solution The paper has identified cases where the maximum–likelihood equations do not yieldan acceptable solution for the parameters of the linear model. In particular, this is truefor all data with only one count ( M = 1) and null counts in N − N availablebins. This can simply be viewed as the inability to constrain two free parameters withjust one non–zero data point. In such case, it may be sufficient to model the data witha simple constant model, with a best–fit model equal to the sample average of thecounts in all the bins (see, e.g., [3] and [4]). There are also other data sets with M ≥ M = 2. Section 5 also illustrated data with large M that do not have an acceptable solution (see, e.g., Figure 7).Motivated by the need to have a linear model that is applicable to any situation,this section proposes a simple generalization of the linear model of Equation 2 that en-sures an acceptable maximum–likelihood solution using the C statistic for any Poissondataset. Definition 6.1 (The extended non–negative linear model) . The proposed non–negative linear model is given by:(1) the standard linear model of Equation 2, when such model has an acceptablesolution; otherwise,(2) the model is parameterized as one of the following three functions:(A) A one–parameter linear model pivoted to zero at the initial point x A : f A ( x ) = λ A ( x − x A ) , (19)for which y A ( x A ) = 0, and with a positive adjustable parameter λ A ≥ x B : f B ( x ) = λ B (cid:18) − x − x A R (cid:19) (20)for which y B ( x B ) = 0, with an adjustable parameter λ B ≥ f C ( x ) = λ C . (21)It will be shown that the three models of Equations 19, 20 and 21 have simpleanalytical solutions for their maximum–likelihood best–fit parameters (respectively λ A , λ B and λ C ), and therefore it is always possible to use one of these models as anacceptable linear model for any dataset. For the linear model pivoted at x A , Equation 19 is used to evaluate the C statistic,Equation 1. Assuming that the data covers the range R continuously, as also assumed21or Equation 4, the term N (cid:88) i =1 µ i = (cid:90) x B x A f A ( x ) dx = λ A R C A = λ A R − M ln λ A + D A (23)where D A ≡ (cid:32) − M + 2 N (cid:88) i =1 y i ln y i − N (cid:88) i =1 y i ln ∆ x i − N (cid:88) i =1 y i ln( x i − x A ) (cid:33) (24)is a term that is independent of the model, and therefore plays no role in the mini-mization of the C statistic. The best–fit parameter is given by ∂C A /∂λ A = 0, leadingto the simple analytical solution λ A = 2 MR > . (25)For the linear model pivoted at x B , use of Equation 20 into Equation 1 leads to N (cid:88) i =1 µ i = (cid:90) x B x A f B ( x ) dx == λ B R C B = λ B R − M ln λ B + D B (27)with D B ≡ (cid:32) − M + 2 N (cid:88) i =1 y i ln y i − N (cid:88) i =1 y i ln ∆ x i − N (cid:88) i =1 y i ln (cid:18) − x i − x A R (cid:19)(cid:33) . (28)The best–fit parameter is therefore given by λ B = 2 MR > . (29)Finally, the best–fit constant model has a C statistic of C C = 2 λ C R − M ln λ c + D C (30)with D C ≡ (cid:32) − M + 2 N (cid:88) i =1 y i ln y i − N (cid:88) i =1 y i ln ∆ x i (cid:33) . (31)22his leads to a best–fit parameter λ C = MR , (32)which is equivalent to the sample average of the data when multiplied by a uniform∆ x , as found in [3]. As already remarked after Equation 4, the equations developed inthis section apply to data that cover continuously the range x A to x B . Data with gapsin the x variable require a simple modification to these equations that is presented inSection 6.3. Equation 2 in combination with the extensions provided by Equations 19, 20, and 21are to be used according to the following method, which defines the solution of theextended model.
Definition 6.2 (Solution of the extended non–negative linear model) . Solution of theextended non–negative linear model is given by:(1) the solution with the standard linear model of Equation 2, if that solution is ac-ceptable. As shown in Section 5 and specifically lemma 5.8, this solution is guaranteedto be unique, when it exists.(2) If a solution with the standard linear model is not available, the solution is givenby the best–fit model that gives the lowest value of the C statistic, among the threeoptions provided by Equations 19, 20 and 21. Lemma 6.3 (Existence and uniqueness of solution for the extended non–negativelinear model) . There exists one and only one maximum–likelihood solution for theextended non–negative linear model fit to any Poisson–distributed data.
Proof.
The proof is a direct consequence of the fact that there is at most one non–negative solution for the model of Equation 2 (lemma 5.8), and of definition 6.2 forthe solution of the extended model.
Remark 2 (Expanded algorithm for the extended non–negative linear model) . Thealgorithm presented in Remark 1 can be extended to the non–negative linear model.When the linear model of Equation 2 fails to produce an acceptable solution, thefollowing two additional steps must be added:(8) Calculate the three additional best–fit linear models (pivoted at A, pivoted at Band constant) and their C statistic, using the analytical formulas 19, 20 and 21.(9) Accept as the best–fit model the one with the lowest C statistic. Notice that ifthe original linear model of Equation 2 is acceptable, its value of the C statisticwill be lower than that of the other three linear models.The use of the extended non–negative linear model is illustrated in the two followingexamples. Example 6.4 (Use of the extended non–negative model for data with no acceptablestandard linear model) . In the left panel of Figure 8 are shown the results for thesame M = 2 data of Figure 2, for which a non–negative linear model according toEquation 2 could not be found. The data can be fit with the pivoted and constant23 igure 8. Pivoted and constant linear models for the data of Figure 2 (with M = 2, left) and Figure 3( M = 3, right). linear models, which yield best–fit C statistic values of C A = 15 . C B = 18 .
141 and C C = 15 . C statistic indicate that the linear model pivoted at x A is the most accurate representation of these data, and should be regarded as thebest–fit linear model. Example 6.5 (Use of the extended non–negative model for data with an acceptablestandard linear model) . The right panel of Figure 8 shows the results for the M = 3model of Figure 3, for which a best–fit non–negative model with the ‘standard’ linearmodel was in fact available, for a C statistic value of C = 20 . C A = 23 . C B = 22 .
413 and C C = 21 . C statistic. There are in factinfinitely many such models, e.g., by fixing an arbitrary intercept of the x = x A axis.The choices made by the three simple extensions discussed in this paper are intendedto provide simple alternatives to the full linear model that have a simple interpretationand likewise simple analytical solutions. The methods of analysis presented in this paper can be applied to data with anybinning, including data with non–uniform bin sizes. The bin sizes, however, will havean effect on the best–fit model, as can be seen by the fact that the function F ( a ) is afunction of x j − x A , there x j is the center coordinate of the j –th bin. When Poissondata are collected on an event–by–event basis, the choice of bin size must be made24ased on considerations on the methods of collection of the data and the instrumentsused for the collection.In Equations 4, 22 and 26 it was assumed that the range of integration of the x variable was continuous, therefore implying that the data covers the x A to x B rangewithout any gaps or missing data. It is possible to provide a simple generalization tothose equations to include gaps in the data. This is in fact a situation of practicalimportance, since certain regions of the independent variable may be without data fora variety of reasons. A common situation is the exclusion of portions of the x variablebecause of poor calibration of the instrument (e.g., the exclusion of a wavelength rangebecause of detector inefficiencies), or because an instrument was not operating duringcertain time intervals. In these cases, one cannot just assign a value of zero countsto that range of the independent variable, but rather the intervals must be explicitlyremoved from the data, therefore creating gaps in an otherwise continuous variable. Definition 6.6 (Gaps in the data) . A gap in the data is defined as a continuousinterval of the independent variable between x a and x b , of length R G = x b − x a , thatis not covered by any of the bins. A Poisson data set may have g non–overlappinggaps between x a,j and x b,j , j = 1 , . . . , g , with x G,j = ( x b,j + x a,j ) / R G,j = x b,j − x a,j . The length of all gaps in the independent variable x is R G = (cid:80) R G,j .The following lemmas summarize the changes that need to be made to analyze datathat contain gaps in the independent variable
Lemma 6.7 (Modifications to the C statistic and to the functions F ( a ) and λ ( a ) forgaps in the data) . When the data have gaps, the C statistic becomes C = 2 λR (cid:18) aR (cid:19) − λ g (cid:88) j =1 R G,j (1 + a ( x G,j − x A )) − M ln λ − N (cid:88) i =1 y i ln(1 + a ( x i − x A )) + D. (33) Moreover, the function whose zero provides the best–fit value of a becomes F ( a ) = 1 + a R m − M R m g ( a ) (34) where R is replaced by a modified R m given by R m ≡ R − S G R − R G S G ≡ (cid:80) gj =1 R G,j ( x G,j − x A ) , (35) and the best–fit solution for the parameter λ is λ ( a ) = MR (cid:18) a R (cid:19) − ( R G + aS G ) . (36)25 roof. The modification to the C statistic to account for the presence of gaps isprovided by changing equation 4 to (37) N (cid:88) i =1 µ i = (cid:90) x B x A f ( x ) dx − g (cid:88) j =1 (cid:90) x b,j x a,j f ( x ) dx = λR (cid:18) a R (cid:19) − λ g (cid:88) j =1 R G,j (1 + a ( x G,j − x A ))The use of Equation 37 in place of Equation 4 leads to the C statistic of Equation 33in place of the original equation 5. Taking the derivatives of C with respect to a and λ and setting them to zero leads to ∂C∂λ = 2 R (cid:18) a R (cid:19) − g (cid:88) j =1 R G,j (1 + a ( x G,j − x A )) − Mλ = 0 , and ∂C∂a = λR − λ g (cid:88) j =1 R G,j ( x G,j − x A ) − N (cid:88) i =1 y i ( x i − x A )1 + a ( x i − x A ) = 0 . Notice that g (cid:88) j =1 R G,j (1 + a ( x G,j − x A )) = R G + a g (cid:88) j =1 R G,j ( x G,j − x A )where R G is the combined length of all (non–overlapping) gaps. Defining S G ≡ g (cid:88) j =1 R G,j ( x G,j − x A ) (38)leads to R (cid:18) a R (cid:19) − R G − aS G − Mλ = 0 , thus proving Equation 36, and λR − λS G − g ( a ) = 0where g ( a ) is the usual function as defined in Equation 9. Simple algebraic modifica-tions and elimination of λ lead to1 + a R m − M R m g ( a ) = 026here R m ≡ R − S G R − R G , thus proving Equation 34.Lemma 6.7 shows that, when there are gaps in the independent variable, the methodof analysis to find a solution for a and λ proceeds in the same way as when there areno gaps, provided the function F ( a ) uses the R m parameter in place of R . Once thebest–fit value of a is found, λ can be calculated analytically by making a change in thedenominator of the function λ ( a ) to account for the gap R G , according to equation 36. Lemma 6.8 (Modifications to the C statistic and to the best–fit parameters of thepivoted and constant models for gaps in the data) . When the data have gaps, the C statistic for the pivoted and constant models become C A = λ A R − λ A S A − M log λ A + D A C B = λ B R − λ B S B − M ln λ B + D B C C = 2 λ C ( R − R G ) − M ln λ C + D C (39) with S A ≡ (cid:80) gj =1 x b,j − x a,j S B ≡ (cid:80) gj =1 R G,j R ( x B − x G,j ) . (40) The best–fit model parameters become λ A = 2 MR − S G λ B = 2 MR − S B λ C = MR − R G . (41) Proof.
For the model pivoted at A , equation 22 is modified by the presence of gapsas (42) N (cid:88) i =1 µ i = (cid:90) x B x A f A ( x ) dx − g (cid:88) j =1 (cid:90) x b,j x a,j f A ( x ) dx = λ A (cid:18) x B − x A − Rx A (cid:19) − λ A g (cid:88) j =1 x b,j − x a,j − g (cid:88) j =1 x A ( x b,j − x a,j ) . Defining S A ≡ g (cid:88) j =1 ( x b,j − x a,j ) (43)27nd noticing that g (cid:88) j =1 x A ( x b,j − x a,j ) = x A R G leads to C A = λ A ( x B − x A ) − λ A S A − λ A x A ( R − R G ) − M ln λ A + D A . Since S A − x A R G = 2 S G and ( x B − x A ) − x A R = R , it follows that C A = λ A ( R − S G ) − M ln λ A + D A . Then, taking a derivative of C A with respect to λ A and setting it to zero completesthe proof for the model pivoted at A.For the model pivoted at B , (44) N (cid:88) i =1 µ i = (cid:90) x B x A f B ( x ) dx − g (cid:88) j =1 (cid:90) x b,j x a,j f B ( x ) dx = λ B (cid:18) R + x A − x B − x A R (cid:19) − λ B g (cid:88) j =1 (cid:16) x A R (cid:17) R G,j − (cid:32) x b,j − x a,j R (cid:33) = λ B (cid:18) R − S B (cid:19) where S B ≡ g (cid:88) j =1 R G,j (cid:18) − x G,j − x A R (cid:19) = g (cid:88) j =1 R G,j R ( x B − x G,j ) . From this, the equations for C B and λ B follow after a few simple algebra steps.The results for the constant model follow immediately from the constancy of thefunction f C ( x ) = λ C .Lemma 6.8 shows that the pivoted and constant models retain a simple analyticalsolution even in the presence of gaps in the data. An application of the fit to Poissondata with non–uniform bin sizes and with a gap in the data is provided in the followingexample. Example 6.9 (Data with non–uniform bin sizes and a gap in the data) . The datachosen for this example span a range of the independent variable between x A = 0 and x B = 9, with a gap between x a = 3 and x b = 6. All nine measurements have a value of y i = 1, with bin sizes of ∆ x i = 1 for the first three data points, and ∆ x i = / for theother six data points, as shown in Figure 9. The data have an acceptable solution forthe standard linear model (in black) with a = 0 . λ = 0 . C c min = 0 . f ( x )(black continuous line, in units of counts–per–bin–size) differs from the best–fit model y ( x i ) (black step–wise curve, in units of counts or counts–per–bin). The constant model28 igure 9. Best–fit linear models for data with non–uniform bins and with a gap in the data. The dot–dashedcurve are the density functions and the solid step–wise curves are the models y ( x i ) for the integer data. (yellow) has a best–fit parameter of λ C = 1 .
5, according to equation 41 with M = 9, R = 9 and R G = 3, with C C = 1 . A has λ A = 0 . C A = 2 . B has λ B = 3 , and C B = 14 . Remark 3 (Algorithm to implement changes in the analysis when gaps in the dataare present) . This algorithm details the additions and modifications required for al-gorithms 1 and 2 when there are data gaps present, following the same enumeration.(0) (Additional step) Calculate the location x a,j , x b,j and range R G,j of each gap,the total gap length R G , R m and S G according to equation 35.(4) Hereafter replace R with R m in the definition of F ( a ).(7) Use equation 36 instead of 6 to calculate the value λ ( a ) corresponding to anacceptable solution a .(8) For the calculation of the C statistics and best–fit parameters of the constantand pivoted models, use respectively Equations 39 (instead of Equations 23, 27and 30) and Equations 41 (instead of Equations 25, 29 and 32). C statistic It is well known that, in the large–count limit, the C min statistic – i.e., the C statisticevaluated for the best–fit linear model – is expected to be distributed like a χ dis-tribution with N − N is the number of bins and 2 is thenumber of adjustable free parameters of the linear model (e.g., [8] and [3]). Moreover,properties of the C statistic for a fixed model with no free parameters is also knownaccurately for any value of the parent Poisson mean [4, 13]. What remains to be an-alyzed in further detail is the effect of free parameters on the distribution of C min , in29he low–count regime. The purpose of this paper is to present a method to evaluatethe best–fit parameters of the linear model, precisely with the intent to further studythe distribution of C min via numerical simulations that rely on this method of analysis.For a significant number of data sets, and especially for data with a small number ofcounts, the only non–negative linear model is one of the three extensions – all of themwith just one adjustable parameter, instead of two of the traditional linear model. Thisrequirement that the model be non–negative was introduced by the use of the Poissondistribution, and did not enter the discussion of Gaussian–distributed datasets thatcan be fit with the χ distribution. It is likely that such new requirement will resultin differences between the distributions of χ min and C min for the linear model in thelow–count regime, with implications for hypothesis testing and confidence intervalson the best–fit parameters. The distribution of the C min for the linear model will bepresented in a separate paper.
7. Discussion and conclusions
This paper has presented a new semi–analytical method to find the best–fit parametersof a linear model for the fit to integer–valued counting data, using the Poisson–based C statistic. The method consists first of finding a solution for the non-linear equation F ( a ) = 0, where a is one of the two parameters of the model. The other parameter λ is then calculated analytically via a simple analytical function λ = λ ( a ). The twoparameters a and λ must be such that the linear model is non–negative in each bin, inorder to ensure the applicability of the Poisson distribution. The analysis presented inthis paper shows that such requirement leads, in fact, to the uniqueness of the best–fitmodel, when such solution is available. This is clearly a very desirable property ofthe method, and a necessary condition for the use of this method to analyze Poisson–distributed data.This paper has identified cases where low–count Poisson data do not have a suit-able non–negative best–fit linear model according to the standard parameterizationof Equation 2. For this reason, an extended linear model was proposed that guaran-tees a unique non–negative solution for any Poisson data set. This is accomplished bypivoting the linear model to either end of the range of the independent variable orby using a simple constant linear model, when the traditional linear model leads toan unsuitable solution. Thanks to simple analytical solutions for the best–fit param-eter of these extensions, the use of the extended non–negative linear model remainsstraightforward.The availability of a simple method to identify the best–fit parameters of a linearmodel for Poisson data of any number of counts makes it possible to further our under-standing of the C statistic. In particular, it is now possible to study the distribution ofthe C min statistic for one of the most commonly used models with adjustable param-eters, i.e., the linear model, especially in the low–count regime where its distributionis not known exactly. References [1] S. Baker and R.D. Cousins,
Clarification of the use of CHI-square and likelihood functionsin fits to histograms , Nuclear Instruments and Methods in Physics Research 221 (1984),pp. 437–442.
2] P.R. Bevington and D.K. Robinson,
Data reduction and error analysis for the physicalsciences , McGraw Hill, Third Edition, 2003.[3] M. Bonamente,
Statistics and Analysis of Scientific Data , Springer, Graduate Texts inPhysics, Second Edition, 2017.[4] M. Bonamente,
Distribution of the c statistic with applications to the sample mean ofpoisson data , Journal of Applied Statistics 0 (2019), pp. 1–22. Available at https://doi.org/10.1080/02664763.2019.1704703 .[5] M. Bonamente,
Probability models of chance fluctuations in spectra of astronomicalsources with applications to x-ray absorption lines , Journal of Applied Statistics 46 (2019),pp. 1129–1154. Available at https://doi.org/10.1080/02664763.2018.1531976 .[6] W.H. Bonat, B. Jorgensen, C.C. Kohonendji, J. Hinde, and C. Demetrio,
Extendedpoisson-tweedie: Properties and regression models for count data , Statistical Modeling18(1) (2018), pp. 24–49.[7] W. Cash,
Generation of Confidence Intervals for Model Parameters in X-ray Astronomy , Astronomy and Astrophysics
52 (1976), p. 307.[8] W. Cash,
Parameter estimation in astronomy through application of the likelihood ratio , The Astrophysical Journal
228 (1979), p. 939. Available at http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1979ApJ...228..939C&db_key=AST .[9] A. Dobson and A. Barnett,
An Introduction to Generalized Linear Models , CRC Press,Fourth Edition, 2018.[10] G.M. El-Sayyad,
Bayesian and classical analysis of poisson regression , Journal of theRoyal Statistical Society. Series B (Methodological) 35 (1973), pp. 445–451. Available at .[11] H. Haselimashhadi, V. Vinciotti, and K. Yu,
A novel bayesian regression model for countswith an application to health data , Journal of Applied Statistics 45 (2018), pp. 1085–1105.Available at https://doi.org/10.1080/02664763.2017.1342782 .[12] P.J. Humphrey, W. Liu, and D.A. Buote, χ and Poissonian Data: Biases Even in theHigh-Count Regime and How to Avoid Them , The Astrophysical Journal
693 (2009), pp.822–829.[13] J.S. Kaastra,
On the use of C-stat in testing models for X-ray spectra , Astronomy andAstrophysics
605 (2017), A51.[14] P. McCullagh and J. Nelder,
Generalized Linear Models , Chapman & Hall/CRC, SecondEdition, 1989.[15] D.M. Mock, M. N.I., Z. S., and et al.,
Red blood cell (rbc) survival determined in humansusing rbcs labeled at multiple biotin densities. , Transfusion 51(5) (2011), pp. 1047–1057.[16] J.A. Nelder and R.W.M. Wedderburn,
Generalized linear models , Journal of the RoyalStatistical Society. Series A (General) 135 (1972), pp. 370–384. Available at .[17] J.D. Scargle, J.P. Norris, B. Jackson, and J. Chiang,
Studies in Astronomical Time SeriesAnalysis. VI. Bayesian Block Representations , The Astrophysical Journal
764 (2013), 167.[18] K.F. Sellers and G. Shmueli,
A flexible regression model for count data , The Annals ofApplied Statistics 4 (2010), pp. 943–961. Available at .[19] G. Shmueli, T.P. Minka, J.B. Kadane, S. Borle, and P. Boatwright,
A useful distributionfor fitting discrete data: Revival of the conway-maxwell-poisson distribution , Journal of theRoyal Statistical Society. Series C (Applied Statistics) 54 (2005), pp. 127–142. Availableat .[20] S. Valenti, D.A. Howell, M.D. Stritzinger, M.L. Graham, G. Hosseinzadeh, I. Arcavi, L.Bildsten, A. Jerkstrand, C. McCully, A. Pastorello, A.L. Piro, D. Sand, S.J. Smartt, G.Terreran, C. Baltay, S. Benetti, P. Brown, A.V. Filippenko, M. Fraser, D. Rabinowitz, M.Sullivan, and F. Yuan,
The diversity of Type II supernova versus the similarity in theirprogenitors , Monthly Notices of the Royal Astronomical Society
459 (2016), pp. 3939–3962.[21] T. Yee,
Vector Generalized Linear and Additive Models , Springer, 2015.[22] T.W. Yee and C.J. Wild,
Vector generalized additive models , Journal of the Royal Sta- istical Society. Series B (Methodological) 58 (1996), pp. 481–493. Available at ..