A Closed Form Approximation of Moments of New Generalization of Negative Binomial Distribution
Sudip Roy ∗, Ram C. Tripathi † and N. Balakrishnan ‡
Department of Management Science and Statistics, University of Texas at San Antonio, One UTSA Circle, San Antonio, Texas 78249
Department of Mathematics and Statistics, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada L8S 4L8

Abstract
In this paper, we propose a closed form approximation to the mean and variance of a new generalization of the negative binomial (NGNB) distribution arising from the Extended COM-Poisson (ECOMP) distribution developed by Chakraborty and Imoto (2016) [4]. The NGNB is a special case of the ECOMP distribution and was named so by these authors. This distribution is more flexible in terms of dispersion index than its ordinary counterparts. It approaches the COM-Poisson distribution (Shmueli et al. 2005) [11] under suitable limiting conditions. The NGNB can also be obtained from the COM-Negative Hypergeometric distribution (Roy et al. 2019) [10] as a limiting distribution. We present closed form approximations for the mean and variance of the NGNB distribution. These approximations can be viewed as the mean and variance of a convolution of independent and identically distributed negative binomial populations. The proposed closed form approximations of the mean and variance will be helpful in building the link function for the generalized negative binomial regression model based on the NGNB distribution and in other extended applications, hence resulting in enhanced applicability of this model.
Keywords: Generalized Negative Binomial; COM-Poisson; Dispersion Index; Failure Rate; Log-convexity; Kemp's Family of Distributions; Closed Form Approximation to Mean

∗ Corresponding author

arXiv preprint [math.ST], April

1 Introduction
In this paper, we present the probability function (pf) of the NGNB model (Chakraborty and Imoto 2016) [4] and propose closed form approximations for its mean and variance. The approximate expression for the mean can be used to develop a link function for the new generalized negative binomial regression model. We plan to develop and report this in a separate paper. This will enhance the applicability of the NGNB model. The NGNB is a particular case of the ECOMP distribution when three of its parameters are equal. The ECOMP distribution has many important characteristics. It approaches the COM-Poisson distribution under suitable limiting conditions. The NGNB distribution is also quite versatile. Its dispersion index is flexible and hence it can accommodate over-dispersed and under-dispersed data. It can be log-concave or log-convex depending on the values of its parameters. Hence, it can have either an increasing failure rate (IFR) or a decreasing failure rate (DFR).

The paper is organized as follows. In Section 2, we provide the probability function of the NGNB distribution including its parameters and briefly present its moments and probability generating function. We show that the NGNB model can be regarded as a member of Kemp's family of distributions. In Section 3, we present the closed form approximations of its mean and variance and investigate their behavior through a numerical study. We assess the closeness of the proposed approximations for the mean and variance to their true values by analyzing error, bias and mean square error for different combinations of the parameters. In the appendix, we define log-concavity (log-convexity) and use these to characterize the IFR (DFR) properties of the NGNB failure rate for various ranges of values of its parameters. The theoretical derivations of these characteristics are also provided.
The negative binomial (NB) distribution arises by sampling with replacement from a population consisting of two types of items, say S: Success and F: Failure, until we get $k$ successes, where the probability of success is $p$ and the probability of failure is $q = 1 - p$ for each independent trial. Let $X$ denote the number of failures before getting $k$ successes. Then $X$ is said to have a negative binomial distribution with parameters $k$ and $p$. Its pf is given by

$$P(X = x) = \binom{x+k-1}{x} p^{k} q^{x}, \qquad x = 0, 1, 2, \ldots \tag{2.1}$$

This distribution is always over-dispersed (variance greater than the mean) and belongs to the family of power series distributions. The mean, variance and the probability generating function (pgf) of the NB model are given by

$$E(X) = \frac{kq}{p}, \qquad Var(X) = \frac{kq}{p^{2}}, \qquad G(s) = E(s^{X}) = p^{k}(1 - qs)^{-k}.$$

The NGNB distribution is obtained from (2.1) by raising the binomial coefficient to the power $\gamma$, which acts as the shape parameter, and by normalizing the resulting expression. The NGNB random variable is denoted by $Y$, with its pf defined as

$$P(Y = y) = \frac{\binom{y+k-1}{y}^{\gamma} q^{y}}{\sum_{j=0}^{\infty} \binom{j+k-1}{j}^{\gamma} q^{j}}, \tag{2.2}$$

for $y = 0, 1, 2, \ldots$, with $\gamma \in \mathbb{R}$, $k > 0$ and $0 < q < 1$. We denote the normalizing constant by

$$Z(\gamma, k, q) = \sum_{j=0}^{\infty} \binom{j+k-1}{j}^{\gamma} q^{j}.$$

Whenever necessary, we will denote the distribution by $NGNB(\gamma, k, p)$. The NGNB model provides a more flexible alternative to the negative binomial, since the additional shape parameter $\gamma$ makes the new model more or less over-dispersed than the negative binomial. When $\gamma = 0$, the normalizing constant is $Z(\gamma, k, q) = 1/(1 - q)$, and the NGNB reduces to the geometric distribution with probability function $p(y) = p q^{y}$. The NGNB model is a member of the more general class called ECOMP studied by Chakraborty and Imoto (2016) [4]. The NGNB was independently studied by Roy (2016) [9] in a Ph.D. thesis, where it was called COMP-NB. In this section, we investigate additional statistical properties of the NGNB (COMP-NB) model.

We discuss some important characteristics of the NGNB model. These include a characterization of the monotonicity of its failure rate based on the log-convexity and log-concavity of its pf. We also show that when $\gamma$ is a positive integer, its pgf can be written in terms of the generalized hypergeometric function. This establishes the NGNB as a member of the Kemp family of distributions and makes it easier to compute its moments.

The monotonicity of the failure rate of a discrete model plays an important role in modeling the failure time of components measured in number of cycles until failure occurs. As established in Gupta, Gupta and Tripathi (1997) [5], the log-concavity (log-convexity) of the pf of a distribution determines the monotonicity of its failure rate. In the appendix, we characterize the failure rate of the NGNB in terms of the log-concavity (log-convexity) of its pf. We show in Theorem A.1 that the pf is log-concave for positive $\gamma$ when $k > 1$ and log-convex for $\gamma < 0$ when $k > 1$. Hence, the NGNB has increasing (decreasing) failure rate for $\gamma$ greater (less) than 0, when $k > 1$. In Theorem A.2, we show that the NGNB approaches the COM-Poisson distribution with parameters $\lambda$ and $\gamma$ in the limit as $k \to \infty$ and $q \to 0$ such that $\lambda = k^{\gamma} q$ remains fixed. This parallels the similar limiting result relating the NB and the Poisson distributions.

2.1.2 Moments and probability generating function (for $\gamma$ a positive integer)

In this section, we express the pgf of the NGNB in the form of a generalized hypergeometric function when $\gamma$ is a positive integer. We use this representation to find the factorial moments and hence other moments of the distribution in this special case.

Let us first introduce some functions and notations which will be used throughout:

• Pochhammer symbol or rising factorial: $(b)_{0} = 1$, $(b)_{k} = b(b+1)\cdots(b+k-1)$, $k \ge 1$.

• Generalized hypergeometric function:

$${}_{p}F_{q}(a_{1}, a_{2}, \ldots, a_{p}; b_{1}, b_{2}, \ldots, b_{q}; z) = \sum_{n=0}^{\infty} \frac{(a_{1})_{n} (a_{2})_{n} \cdots (a_{p})_{n}}{(b_{1})_{n} (b_{2})_{n} \cdots (b_{q})_{n}} \frac{z^{n}}{n!}.$$

The following theorem gives an expression for the pgf of the NGNB distribution in terms of the generalized hypergeometric function.
Theorem 2.1.
Let $Y$ denote the random variable with the $NGNB(\gamma, k, p)$ distribution. When $\gamma$ is a positive integer, the pgf of $Y$ can be written in terms of the hypergeometric series as follows:

$$G(\gamma, s) = E(s^{Y}) = \frac{{}_{\gamma}F_{\gamma-1}(k, k, \ldots, k; 1, 1, \ldots, 1; qs)}{{}_{\gamma}F_{\gamma-1}(k, k, \ldots, k; 1, 1, \ldots, 1; q)}. \tag{2.3}$$

Proof.
Consider the pgf of the NGNB distribution

$$G(\gamma, s) = E(s^{Y}) = \frac{\sum_{y=0}^{\infty} \binom{y+k-1}{y}^{\gamma} (sq)^{y}}{\sum_{j=0}^{\infty} \binom{j+k-1}{j}^{\gamma} q^{j}}.$$

The numerator in the above can be expressed as a generalized hypergeometric series if $\gamma$ is a positive integer, as seen below:

$$\sum_{y=0}^{\infty} \binom{y+k-1}{y}^{\gamma} (sq)^{y} = \sum_{y=0}^{\infty} \left( \frac{(y+k-1)!}{y!\,(k-1)!} \right)^{\gamma} (sq)^{y} = \sum_{y=0}^{\infty} \left( \frac{(k)_{y}\,(k-1)!}{(1)_{y}\,(k-1)!} \right)^{\gamma} (sq)^{y} = \sum_{y=0}^{\infty} \frac{(k)_{y}^{\gamma}}{(1)_{y}^{\gamma-1}} \frac{(sq)^{y}}{y!} = {}_{\gamma}F_{\gamma-1}(k, k, \ldots, k; 1, 1, \ldots, 1; qs).$$

Then, the pgf can be written as

$$G(\gamma, s) = \frac{{}_{\gamma}F_{\gamma-1}(k, k, \ldots, k; 1, 1, \ldots, 1; qs)}{{}_{\gamma}F_{\gamma-1}(k, k, \ldots, k; 1, 1, \ldots, 1; q)}. \tag{2.4}$$

Hence the proof.

Mean, variance and higher order moments of NGNB:

For the $NGNB(\gamma, k, p)$, when $\gamma$ is a positive integer, the expectation $E(Y)$ can be obtained from (2.4) and is expressed as

$$E(Y) = \frac{q k^{\gamma} \sum_{j=0}^{\infty} \binom{k+j}{j}^{\gamma} \frac{q^{j}}{(j+1)^{\gamma-1}}}{\sum_{j=0}^{\infty} \binom{k+j-1}{j}^{\gamma} q^{j}} = q k^{\gamma}\, \frac{{}_{\gamma}F_{\gamma-1}(k+1, k+1, \ldots, k+1; 2, 2, \ldots, 2; q)}{{}_{\gamma}F_{\gamma-1}(k, k, \ldots, k; 1, 1, \ldots, 1; q)}.$$

It can be seen that $E(Y)$ exists and is finite because the hypergeometric series has radius of convergence 1 and hence converges for $|q| < 1$. Both series are nearly-poised hypergeometric series of the first kind. The $r$th order factorial moment, and hence the variance of the NGNB, can be obtained from (2.4) by evaluating its $r$th derivative at $s = 1$ when $\gamma$ is a positive integer. As seen above, the mean, variance and higher order moments cannot be evaluated in closed form. It would be useful if some approximations could be developed for the mean and variance of the NGNB model in some special cases. This is addressed in Section 3 below.

NGNB as a member of the Kemp family when $\gamma$ is a Positive Integer

The broad family of generalized hypergeometric probability distributions in the Kemp family [13] is known for many useful properties. The most common distributions which are members of the Kemp family are the Poisson, binomial, negative binomial, hypergeometric and negative hypergeometric distributions. The distributions in this family have their pgf as

$$G(z) = \frac{{}_{p}F_{q}((a); (b); \lambda z)}{{}_{p}F_{q}((a); (b); \lambda)},$$

where ${}_{p}F_{q}((a); (b); z)$ is the generalized hypergeometric series with $p$ numerator parameters $(a_{1}, a_{2}, \ldots, a_{p})$ and $q$ denominator parameters $(b_{1}, b_{2}, \ldots, b_{q})$. For the Kemp family of distributions the ratio of successive probabilities is given by

$$\frac{p(y+1)}{p(y)} = \frac{(a_{1}+y)(a_{2}+y)\cdots(a_{p}+y)}{(b_{1}+y)(b_{2}+y)\cdots(b_{q}+y)} \cdot \frac{\lambda}{1+y}. \tag{2.5}$$

Theorem 2.2.
The ratio of successive probabilities for the NGNB distribution can be written in the form of the ratio of the probabilities of the Kemp Type 1A(i) families of distributions.

Proof. The ratio of successive probabilities for the NGNB distribution can be written in the form of the Kemp Type 1A(i) families of distributions as follows:

$$\frac{p(y+1)}{p(y)} = \frac{\binom{y+1+k-1}{y+1}^{\gamma} q^{y+1}}{\binom{y+k-1}{y}^{\gamma} q^{y}} = \left( \frac{y+k}{y+1} \right)^{\gamma} q = \frac{(k+y)^{\gamma}}{(1+y)^{\gamma-1}} \cdot \frac{q}{1+y},$$

which has $\gamma$ numerator factors, $((k+y), (k+y), \ldots, (k+y))$, and $\gamma - 1$ denominator factors, $((1+y), (1+y), \ldots, (1+y))$ (see equation (2.5)). Hence the proof.

In the following section, we develop approximate expressions for the mean and variance of the NGNB model.

There are no closed form expressions for the mean and variance of the NGNB distribution. It would be of interest to develop approximate expressions for the mean and variance which may help in building the link function for NGNB regression models. With this in mind, we develop approximate expressions for the mean and variance of the NGNB model and investigate the accuracy of these expressions. We also compare the values of the approximate expressions with their exact values for a certain range of the parameters and assess the approximations. These approximations are developed based on observing patterns in the tabulated values of the mean and variance for a range of parameter values. In Tables 1-3, we present values of the mean and variance obtained by numerical calculation for a range of values of the parameters. After a close examination of these values, especially for large $\gamma$, a pattern seems to emerge for approximating the mean and variance. Based on our observations, we propose the following approximations:

$$E(Y) \approx \frac{k \gamma q}{1 - q}, \qquad Var(Y) \approx \frac{k \gamma q}{(1 - q)^{2}}.$$

The closed form approximation of the mean of the NGNB is related to the mean of a convolution of independent and identically distributed negative binomial populations.
Thus, the approximate mean and variance of $Y$ can be interpreted as the mean and variance of the sum of $\gamma$ independent and identically distributed populations following the negative binomial distribution in (2.1), with mean and variance respectively

$$E(W) = \frac{kq}{1 - q}, \qquad Var(W) = \frac{kq}{(1 - q)^{2}},$$

where $W$ is the corresponding NB random variable with the pf (2.1). Chakraborty and Imoto (2016) [4] also discuss, in their Section 2.8, the asymptotic mean for $ECOMP(\gamma, p, \alpha, \beta)$, which is given by

$$p^{1/(\alpha-\beta)} + \frac{1 - \alpha + (2\gamma - 1)\beta}{2(\alpha - \beta)}. \tag{3.1}$$

As discussed in their paper, the NGNB distribution is derived from the ECOMP when three of its parameters are the same, i.e., $\alpha = \beta = \gamma$. However, under this condition $\alpha - \beta = 0$, so the ECOMP mean in equation (3.1) does not exist for the NGNB distribution. Thus, the closed form approximation developed here extends the known properties of the NGNB distribution.

We now compare the exact values of the mean and variance of the NGNB with their proposed approximations for a range of its parameter values. We also present an analysis of the errors committed when approximating the exact values by the above proposed expressions.

Analysis of errors in approximating $E(Y)$ and $Var(Y)$

Tables 1, 2 and 3 provide exact and approximate values of $E(Y)$ and $Var(Y)$ for a range of values of $\gamma$, $q$ and $k$. From Tables 1 and 2, it can be seen that when $\gamma \in [0.3, 0.9]$, $q \in [0.1, 0.9]$ and $k$ takes integer values from 5 to 9, the average value of the error $E(Y) - k\gamma q/(1 - q)$ is $-$[…] with MSE = […]. From Table 3, when $\gamma > 1$ ($\gamma \in [1.2, 2]$), $q \in [0.2, 0.8]$ and $k$ takes integer values from 5 to 9, the average value of the error $E(Y) - k\gamma q/(1 - q)$ is 0.[…] with MSE = […].

Similarly, when $\gamma \in [0.3, 0.9]$, $q \in [0.1, 0.9]$ and $k$ takes integer values from 5 to 9, the average value of the error $Var(Y) - k\gamma q/(1 - q)^{2}$ is $-$[…] with MSE = […]. When $\gamma > 1$ ($\gamma \in [1.2, 2]$), $q \in [0.2, 0.8]$ and $k$ takes integer values from 5 to 9, the average value of the error $Var(Y) - k\gamma q/(1 - q)^{2}$ is $-$[…] with MSE = […].

On comparing the average and the MSE of the errors in approximating the mean, we observe that for $\gamma < 1$, the approximate expression overestimates the true value of the mean. For $\gamma > 1$ […] $\gamma < 1$. The values of the MSE remain relatively similar for the two ranges of $\gamma$ considered.

On comparing the average and the MSE of the errors in approximating the variance, we observe that the approximate expression overestimates the true variance for all the values of $\gamma$ considered here. The approximate expression is closer to the true value of the variance for $\gamma < 1$ than for $\gamma > 1$. The value of the MSE is smaller for the case when $\gamma < 1$.

In this paper, we have presented closed form approximations of the mean and variance of the NGNB distribution. In particular, we have shown that the NGNB approaches the COM-Poisson distribution under certain conditions. We have investigated some characteristics of the NGNB model, such as its probability generating function, moments, and limiting properties. We have also investigated the log-concavity and log-convexity of the pf of this distribution and used them to characterize the monotonicity of its failure rate. In a future paper, we plan to formulate a generalized regression model based on the proposed approximate form of the mean of the NGNB model.
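The comparison of exact and approximate moments carried out in Section 3 can be sketched numerically. The Python snippet below is only a minimal illustration under assumptions made here, not the authors' code: the function names `ngnb_moments` and `approx_moments` and the truncation point `n_terms` are hypothetical choices. It evaluates the pf (2.2) on a truncated support, computes the mean and variance from it, and sets them beside $k\gamma q/(1-q)$ and $k\gamma q/(1-q)^{2}$.

```python
import math


def ngnb_moments(gamma, k, q, n_terms=2000):
    """Mean and variance of the NGNB pf (2.2), with the infinite sums
    truncated at n_terms (an assumption of this sketch)."""
    # log of C(j+k-1, j)^gamma * q^j, computed via log-gamma for stability
    logw = [
        gamma * (math.lgamma(j + k) - math.lgamma(j + 1) - math.lgamma(k))
        + j * math.log(q)
        for j in range(n_terms)
    ]
    shift = max(logw)  # subtract the maximum before exponentiating
    w = [math.exp(v - shift) for v in logw]
    z = sum(w)  # truncated (rescaled) normalizing constant
    mean = sum(j * wj for j, wj in enumerate(w)) / z
    second = sum(j * j * wj for j, wj in enumerate(w)) / z
    return mean, second - mean ** 2


def approx_moments(gamma, k, q):
    """Proposed approximations k*gamma*q/(1-q) and k*gamma*q/(1-q)^2."""
    return k * gamma * q / (1 - q), k * gamma * q / (1 - q) ** 2


if __name__ == "__main__":
    # Illustrative parameter values taken from the ranges of Tables 1-3
    for gamma in (0.5, 0.9, 1.2, 2.0):
        for k in (5, 9):
            q = 0.4
            m, v = ngnb_moments(gamma, k, q)
            am, av = approx_moments(gamma, k, q)
            print(f"gamma={gamma} k={k} q={q}: "
                  f"mean error={m - am:.4f}, variance error={v - av:.4f}")
```

At $\gamma = 1$ the NGNB coincides with the NB distribution in (2.1), so both approximations are exact there; this gives a simple check that the truncation is adequate.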
APPENDIX
A.1 Log-concavity and log-convexity of the NGNB distribution
The failure rate function for a discrete random variable $Y$ with pf $p(y)$ is defined as

$$r(y) = \frac{P(Y = y)}{P(Y \ge y)}.$$

We characterize the failure rate of the NGNB model in terms of the log-convexity and log-concavity of its pf, which we investigate below.

We use the ratio of two consecutive probabilities to determine the log-concavity and log-convexity of a distribution. This is used for determining the monotonicity of failure rates. We will use the following results in this regard from Johnson, Kemp and Kotz (2005) [6], page 217: a distribution with pf $p(y)$ is log-convex if

$$\frac{p(y)\, p(y+2)}{p(y+1)^{2}} > 1, \tag{A.1}$$

and log-concave if

$$\frac{p(y)\, p(y+2)}{p(y+1)^{2}} < 1. \tag{A.2}$$

The following theorem characterizes the failure rate of the NGNB in terms of its log-convexity and log-concavity for various values of $\gamma$:

Theorem A.1.
The NGNB has increasing (decreasing) failure rate for $\gamma$ greater (less) than 0, when $k > 1$.

Proof. For the NGNB model, we have

$$\frac{p(y+2)}{p(y+1)} = \frac{\binom{y+2+k-1}{y+2}^{\gamma} q^{y+2}}{\binom{y+1+k-1}{y+1}^{\gamma} q^{y+1}} = \left( \frac{y+k+1}{y+2} \right)^{\gamma} q, \qquad \frac{p(y+1)}{p(y)} = \frac{\binom{y+1+k-1}{y+1}^{\gamma} q^{y+1}}{\binom{y+k-1}{y}^{\gamma} q^{y}} = \left( \frac{y+k}{y+1} \right)^{\gamma} q.$$

Hence

$$\frac{p(y)\, p(y+2)}{p(y+1)^{2}} = \left( \frac{(y+k+1)(y+1)}{(y+k)(y+2)} \right)^{\gamma}. \tag{A.3}$$

For the expression (A.3) to be less than 1, we need

$$((y+k+1)(y+1))^{\gamma} < ((y+k)(y+2))^{\gamma}$$
$$\Rightarrow (y+k+1)(y+1) < (y+k)(y+2) \quad \text{for positive } \gamma$$
$$\Rightarrow (y+k)(y+1) + (y+1) < (y+k)(y+1) + (y+k)$$
$$\Rightarrow y + 1 < y + k \Rightarrow k > 1.$$

Hence, the pf of the NGNB is log-concave for positive $\gamma$ when $k > 1$. Similarly, it is log-convex for $\gamma < 0$ when $k > 1$.

A.2 Characterization of the failure rate of NGNB

When $\gamma > 0$, the pf of the NGNB is log-concave for $k > 1$, hence it has increasing failure rate. When $\gamma < 0$ and $k > 1$, the pf of the NGNB is log-convex, hence it has decreasing failure rate. Figure 1 shows the failure rate curves. It shows that the NGNB has IFR for positive values of $\gamma$ and $k > 1$; by increasing the value of positive $\gamma$, we are able to lower the failure rate with respect to its occurrence. The failure rate curves of the NGNB, for $\gamma < 0$ and $k > 1$, […] as $\gamma$ increases.

Figure 1: Failure rate plot for NGNB (q = 0.2, k = 3) for positive and negative values of $\gamma$, showing IFR and DFR behavior respectively.

A.3 Convergence of NGNB to COM-Poisson
As is well known, the negative binomial distribution approaches the Poisson distribution under certain limiting conditions. In this section, we show that a similar relationship holds between the NGNB (COM-Negative Binomial) and COM-Poisson distributions.
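This limiting relationship can be illustrated numerically before it is stated formally. The sketch below is an illustration under assumptions made here rather than part of the original derivation: the function names, the truncation point `n_terms`, and the choice $\lambda = 2$, $\gamma = 1.5$ are all hypothetical. It fixes $\lambda$, sets $q = \lambda / k^{\gamma}$, and tracks the largest pointwise difference between the NGNB pf (2.2) and the COM-Poisson pf $\lambda^{y}/(y!)^{\gamma}$ (normalized) as $k$ grows.

```python
import math


def normalise(log_weights):
    """Turn unnormalised log-weights into probabilities that sum to 1."""
    shift = max(log_weights)
    w = [math.exp(v - shift) for v in log_weights]
    total = sum(w)
    return [x / total for x in w]


def ngnb_pmf(gamma, k, q, n_terms=400):
    """NGNB pf (2.2) on the truncated support 0..n_terms-1."""
    return normalise([
        gamma * (math.lgamma(j + k) - math.lgamma(j + 1) - math.lgamma(k))
        + j * math.log(q)
        for j in range(n_terms)
    ])


def com_poisson_pmf(lam, gamma, n_terms=400):
    """COM-Poisson pf proportional to lam^j / (j!)^gamma, truncated."""
    return normalise([
        j * math.log(lam) - gamma * math.lgamma(j + 1)
        for j in range(n_terms)
    ])


def max_abs_diff(lam, gamma, k, n_terms=400):
    """Largest pointwise gap between the two pfs when q = lam / k^gamma."""
    q = lam / k ** gamma
    if not 0.0 < q < 1.0:
        raise ValueError("need 0 < q < 1; increase k or decrease lam")
    a = ngnb_pmf(gamma, k, q, n_terms)
    b = com_poisson_pmf(lam, gamma, n_terms)
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, max_abs_diff(2.0, 1.5, k))
```

The gap shrinks as $k$ increases, in line with the factor $\prod_{i=1}^{y-1}(1 + i/k)^{\gamma} \to 1$ that appears in the proof of Theorem A.2.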
Theorem A.2.
NGNB($\gamma$, $k$, $q$) approaches COM-Poisson($\lambda$) when $k \to \infty$ and $q \to 0$, such that $\lambda = k^{\gamma} q$ remains fixed.

Proof. For the NGNB distribution in (2.2), the numerator can be written as

$$\left( \frac{(y+k-1)!}{y!\,(k-1)!} \right)^{\gamma} q^{y} = \frac{q^{y}}{(y!)^{\gamma}} \left( \frac{(y+k-1)(y+k-2)\cdots(y+k-1-(y-1))\,(k-1)!}{(k-1)!} \right)^{\gamma}$$
$$= \frac{q^{y} (k^{y})^{\gamma}}{(y!)^{\gamma}} \left( \left(1 + \frac{y-1}{k}\right)\left(1 + \frac{y-2}{k}\right) \cdots \left(1 + \frac{y-y}{k}\right) \right)^{\gamma}$$
$$= \frac{(q k^{\gamma})^{y}}{(y!)^{\gamma}} \left( \left(1 + \frac{y-1}{k}\right)\left(1 + \frac{y-2}{k}\right) \cdots \left(1 + \frac{y-y}{k}\right) \right)^{\gamma} \to \frac{\lambda^{y}}{(y!)^{\gamma}}$$

as $k \to \infty$ and $q \to 0$, such that $\lambda = k^{\gamma} q$ remains fixed. The same approach can be used to write the denominator in a similar form. Hence,

$$\frac{\binom{y+k-1}{y}^{\gamma} q^{y}}{\sum_{j=0}^{\infty} \binom{j+k-1}{j}^{\gamma} q^{j}} \to \frac{\lambda^{y} / (y!)^{\gamma}}{\sum_{j=0}^{\infty} \lambda^{j} / (j!)^{\gamma}}.$$

This is the pf of the COM-Poisson distribution. Hence the proof.

Table 1: Exact and approximate closed-form values of the mean and variance for various NGNB parameters ($\gamma$, q, k): $\gamma$ = (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), q = (0.1, 0.2, 0.3, 0.4, 0.5), k = (5, 6, 7, 8, 9, 10). Rows are indexed by q and k; for each $\gamma$ the columns are E(Y), $k\gamma q/(1-q)$, Var(Y), $k\gamma q/(1-q)^{2}$.

Table 2: Exact and approximate closed-form values of the mean and variance for various NGNB parameters ($\gamma$, q, k): $\gamma$ = (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), q = (0.6, 0.7, 0.8, 0.9), k = (5, 6, 7, 8, 9, 10). Rows are indexed by q and k; for each $\gamma$ the columns are E(Y), $k\gamma q/(1-q)$, Var(Y), $k\gamma q/(1-q)^{2}$.

Table 3: Exact and approximate closed-form values of the mean and variance for various NGNB parameters ($\gamma$, q, k): $\gamma$ = (1.2, 1.4, 1.5, 1.6, 1.8, 2), q = (0.2, 0.4, 0.5, 0.6, 0.7, 0.8), k = (5, 6, 7, 8, 9, 10). Rows are indexed by q and k; for each $\gamma$ the columns are E(Y), $k\gamma q/(1-q)$, Var(Y), $k\gamma q/(1-q)^{2}$.

References

[1] Abramowitz, M., and Stegun, I.A. 1972. Handbook of Mathematical Functions, Dover, New York.
[2] Borges, P., Rodrigues, J., Balakrishnan, N., and Bazan, J. 2014. A COM-Poisson type generalization of the binomial distribution and its properties and application,
Statistics and Probability Letters, 87, 158-166.
[3] Chakraborty, S. and Ong, S.H. 2016. A COM-Poisson type generalization of the negative binomial distribution, Communications in Statistics, Theory and Methods, 45:14, 4117-4135.
[4] Chakraborty, S. and Imoto, T. 2016. Extended Conway-Maxwell-Poisson distribution and its properties and applications, Journal of Statistical Distributions and Applications, 3:5.
[5] Gupta, P.L., Gupta, R.C., and Tripathi, R.C. 1997. On the monotonic properties of discrete failure rates, Journal of Statistical Planning and Inference, 65, 255-268.
[6] Johnson, N.L., Kemp, A.W., and Kotz, S. 2005. Univariate Discrete Distributions, 3rd Edition, Wiley, New York.
[7] Kokonendji, C.C., Mizere, D., and Balakrishnan, N. 2007. Connections of the Poisson weight function to over-dispersion and under-dispersion, Journal of Statistical Planning and Inference, 138, 1287-1296.
[8] Neyman, J., and Pearson, E.S. 1928. On the use and interpretation of certain test criteria for purposes of statistical inference, Biometrika, 20, 175-240.
[9] Roy, S. 2016. COM-type generalizations of hypergeometric, negative hypergeometric and negative binomial distributions. PhD diss., The University of Texas at San Antonio.
[10] Roy, S., Tripathi, R.C. and Balakrishnan, N. 2019. A Conway Maxwell Poisson Type Generalization of the Negative Hypergeometric Distribution, Communications in Statistics, Theory and Methods, DOI 10.1080/03610926.2019.1576885.
[11] Shmueli, G., Borle, S., Minka, T.P., Kadane, J.B., and Boatwright, P. 2005. A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution, Journal of the Royal Statistical Society, Series C, 54, 127-142.
[12] Tripathi, R.C. 1985. Negative binomial distribution, Encyclopedia of Statistical Sciences, Vol. 8, 169-177.
[13] Tripathi, R.C., and Gurland, J. 1977. A general family of discrete distributions with hypergeometric probabilities, Journal of the Royal Statistical Society. Series B (Methodological), 39, 349-356.
[14] Xie, M., Gaudoin, O., and Bracquemond, C. 2002. Redefining failure rate function for discrete distributions,