Diagnostic tools for a multivariate negative binomial model for fitting correlated data with overdispersion
Lizandra Castilho Fabio, Cristian Villegas, Jalmar M. F. Carrasco, Mário de Castro
DDiagnostic tools for a multivariate negative binomial model forfitting correlated data with overdispersion
Lizandra Castilho FabioDepartment of StatisticsFederal University of BahiaBrazil Cristian VillegasDepartment of Exact SciencesUniversity of S˜ao PauloBrazilJalmar M. F. Carrasco ∗ Department of StatisticsFederal University of BahiaBrazil M´ario de CastroInstituto de Ciˆencias Matem´aticas e de Computa¸c˜aoUniversidade de S˜ao PauloBrazil
Abstract
We focus on the development of diagnostic tools and an R package called MNB for a mul-tivariate negative binomial (MNB) regression model for detecting atypical and influentialsubjects. The MNB model is deduced from a Poisson mixed model in which the randomintercept follows the generalized log-gamma (GLG) distribution. The MNB model for corre-lated count data leads to an MNB regression model that inherits the features of a hierarchicalmodel to accommodate the intraclass correlation and the occurrence of overdispersion simul-taneously. The asymptotic consistency of the dispersion parameter estimator depends on theasymmetry of the GLG distribution. Inferential procedures for the MNB regression modelare simple, although it can provide inconsistent estimates of the asymptotic variance whenthe correlation structure is misspecified. We propose the randomized quantile residual forchecking the adequacy of the multivariate model, and derive global and local influence mea-sures from the multivariate model to assess influential subjects. Finally, two applications arepresented in the data analysis section. The code for installing the
MNB package and the codeused in the two examples is exhibited in the Appendix. keywords:
Count data; Overdispersion; Multivariate negative binomial distribution;
MNB package.
Hierarchical models have been suggested for analyzing correlated count data due to theirfeature of allowing one to model the intraclass correlation and accommodate the occurrence ofoverdispersion simultaneously. This is handled through the inclusion of random effects in thesystematic component of generalized linear models. Using this approach, it is possible to relaxthe assumption about the distribution of the random effects, taking into account the empiricaldistributions of the data or individual profiles (Lee et al. , 2006, Molenberghs et al. , 2007 andFabio et al. , 2012). Several methods have been proposed for inferential procedures in hierar-chical models due to the intractable integrals involved in inference functions. The latter facthas been an obstacle in the development of diagnostic tools, leading to completely numerical ∗ Corresponding author. Email: [email protected] a r X i v : . [ s t a t . M E ] F e b rocedures being used. In such cases, a multivariate model deduced from the random effectsapproach (Molenberghs and Verbeke, 2010) can be an alternative to a hierarchical model forfitting correlated count data with extra variability. Fabio et al. (2012) proposed the randomintercept Poisson mixed regression model by assuming that the random effects follow a gener-alized log-gamma (GLG) distribution (Lawless, 1987). This distribution can be skewed to theright or skewed to the left, with the normal distribution as a particular case. Thus, the randomintercept Poisson-GLG model reduces to a multivariate negative binomial (MNB) model when itis assumed that the scale and shape parameters of the GLG distribution are equal. The randomintercept of a Poisson-GLG model is able to accommodate intraclass correlation and handleoverdispersion, due to the several degrees of asymmetry that the GLG distribution can assume.The MNB inherits these features, with the overdispersion parameter φ − = λ and simpler corre-lation structures. In the literature, the MNB regression (MNBR) model has been used for mod-eling correlated count data from several areas, for instance, health (Solis-Trapala and Farewell,2005), spatial (Moller and Rubak, 2010) and economics (Sung and Lee, 2018) data.We focus on developing diagnostic tools and an R (R Core Team, 2017) package for theMNBR model under overdispersion, which are essential for detecting outliers and for checkingthe adequacy of the model (Cook and Weisberg, 1983). Simulation studies are also conductedto evaluate the asymptotic properties of the maximum likelihood (ML) estimator and to analyzehow these properties’ are impacted on the random effect distribution misspecification. As notedin Solis-Trapala and Farewell (2005), the MNBR model can provide inconsistent estimates of theasymptotic variance of the regression coefficient when the covariance matrix of the random effectis misspecified. We use randomized quantile residuals (Dunn and Smyth, 1996) for checkingthe adequacy of the MNBR model. Following Cook (1977, 1986) approach, global and localinfluence measures are developed from the MNBR model to detect influential subjects. Thesemethodologies are useful to assess the impact on the estimation procedure by removing thesubject from the data set and by proposing perturbation models, respectively. The total localinfluence measure suggested by Lesaffre and Verbeke (1998) is also extended for the MNBRmodel. These measures are helpful for identifying outlying subjects that matter in the MNBRmodel and interpret them according to the application. The inferences and diagnostic analysisfrom MNBR are performed by using the MNB package developed by the authors.The paper is organized as follows: In Section 2, we provide some background about theMNBR model approach by Fabio et al. (2012). In Section 3, we discuss diagnostic analysismethodologies for the MNBR model. In Section 4, we perform a simulation study to evaluatethe asymptotic behavior of the ML estimator. In Section 5, the diagnostic analysis is applied totwo real data sets, using the
MNB package. The code for installing the
MNB package is presentedin the Appendix. Finally, in Section 6, we provide some discussions.
Let y ij denote the j th measurement taken on the i th subject or cluster, for j = 1 , . . . , m i and i = 1 , . . . , n . Further, let b i be random effects that follow the GLG distribution (Lawless, 2002).Assuming that y ij | b i are independent outcomes with probability mass function represented bya Poisson distribution, Fabio et al. (2012) proposed a random intercept Poisson-GLG modelwith the following hierarchical structure: ( i ) y ij | b i ind ∼ Poisson( u ij ), ( ii ) u ij = µ ij exp( b i ) , and( iii ) b i iid ∼ GLG(0 , σ, λ ) , where µ ij = exp ( x (cid:62) ij β ), with x ij = ( x ij , . . . , x ijp ) (cid:62) containing thevalues of explanatory variables, β = ( β , . . . , β p ) (cid:62) the vector of regression coefficients, and σ > λ ∈ R the scale and shape parameters of the GLG distribution. In general, in the randomeffects approach, the marginal distribution of the i th subject does not have an explicit form2Molenberghs and Verbeke, 2010 pp. 259-266). The ML estimates are obtained by integratingout the random effect and maximizing the log-likelihood function. Fabio et al. (2012) showedthat the integral can be solved analytically when the scale and shape parameters of the GLGdistribution are equal ( σ = λ , with λ ∈ R + ). When φ = λ − , the MNB distribution is deducedfrom the random intercept Poisson-GLG model of the following form: f ( y i ; β , φ ) = Γ( φ + y i + ) φ φ (cid:16)(cid:81) m i j =1 y ij ! (cid:17) Γ( φ ) exp (cid:16)(cid:80) m i j =1 y ij log( µ ij ) (cid:17) ( φ + µ i + ) φ + y i + , (1)where y i = ( y i , . . . , y im i ) (cid:62) is the vector of measurements available for the i th subject, φ isthe dispersion parameter, Γ( · ) is the gamma function, y i + = (cid:80) m i j =1 y ij , and µ i + = (cid:80) m i j =1 µ ij .According to Johnson et al. (1997), the MNB distribution belongs to the discrete multivariateexponential family of distributions, and its marginals f ( y ij ) are negative binomial distributionswith E( y ij ) = µ ij and Var( y ij ) = µ ij + µ ij /φ , for i = 1 , . . . , n and j = 1 , . . . , m i . The covariancesCov( y ij , y ij (cid:48) ) = µ ij µ ij (cid:48) /φ and intraclass correlations Corr( y ij , y ij (cid:48) ) = √ µ ij µ ij (cid:48) / (cid:16)(cid:112) φ + µ ij (cid:113) φ + µ ij (cid:48) (cid:17) for j (cid:54) = j (cid:48) are always positive. When φ − indicates the amount of excess correlation in the data(Hilbe, 2011), the parameter φ is also called an overdispersion parameter. For large values of φ ,the marginals of the MNB distribution behave approximately as independent Poisson distribu-tions with mean µ ij . By y i ∼ MNB( µ i , φ ) , we denote independent vectors of random outcomesthat follow the probability function given in (1), with µ i = ( µ i , . . . , µ im i ) (cid:62) and φ > i ) y i ind ∼ MNB( µ i , φ ) and ( ii ) log( µ ij ) = x (cid:62) ij β .Letting y = ( y (cid:62) , . . . , y (cid:62) n ) (cid:62) be the vector containing all the measured outcomes for the i thsubject, the log-likelihood function is given by (cid:96) ( θ ) = n (cid:88) i =1 log (cid:26) Γ( φ + y i + )Γ( φ ) (cid:27) − n (cid:88) i =1 m i (cid:88) j =1 log ( y ij !) + nφ log( φ ) − φ n (cid:88) i =1 log( φ + µ i + ) + n (cid:88) i =1 m i (cid:88) j =1 y ij log (cid:26) µ ij φ + µ i + (cid:27) , (2)where θ = ( β (cid:62) , φ ) (cid:62) . The ML estimates (cid:98) θ of θ are computed by using the quasi-Newton ( BFGS )method. Alternatively, we can solve the nonlinear equation obtained by setting the componentsof the score vector equal to zero, that is, U θ = ( U (cid:62) β , U φ ) (cid:62) = (details about the calculationsare in the Appendix). For interval estimation and hypothesis tests on the model parameters,the expected or observed Fisher information is required. Under standard regularity conditions, (cid:98) θ − θ is asymptotically distributed as a multivariate normal distribution with mean andcovariance matrix equal to the inverse of the Fisher information matrix of the MNBR model(see Fabio et al. , 2012). This section is devoted to diagnostic tools for the MNBR model. In what follows, we willdiscuss the residual analysis, and the global and local influence methodologies.
Let y i = ( y i , y i , . . . , y im i ) (cid:62) be a random vector that follows a MNB distribution. Accord-ing to Tsui (1986), the distribution of y i + = (cid:80) m i j =1 y ij is negative binomial with probability3istribution function given by f ( y i + ) = Γ( y i + + φ )( y i + )!Γ( φ ) q φ (1 − q ) y i + , y i + = 0 , , , ..., where q = φ/ ( φ + µ i + ). Then, the randomized quantile residuals (Dunn and Smyth, 1996),which follow a standard normal distribution, can be used to assess departures from the MNBRmodel. If F ( y i + ; µ, φ ) is the cumulative distribution of f ( y i + ), a i = lim y ↑ y i + F ( y ; (cid:98) µ i , (cid:98) φ ), and b i = F ( y i + ; (cid:98) µ i , (cid:98) φ ), then the randomized quantile residuals for y i are given by r q,i = Φ − ( u i ),where Φ( · ) is the cumulative distribution function of the standard normal and u i is a uniformrandom variable on the interval ( a i , b i ]. Based on the case-deletion approach (Cook, 1977), we propose the generalized Cook’s dis-tance as a global influence measure to assess the impact on the ML estimates of the MNBRmodel when the i th subject (or cluster) is removed from the data set. Let y ( i ) be a vector ofrandom outcomes after deleting the i th subject. Let (cid:98) θ ( i ) be the ML estimate of θ computedfrom (cid:96) ( i ) ( θ ), where (cid:96) ( i ) ( θ ) is obtained from (2) after removing the i th subject. The generalizedCook’s distance is defined as the standardized norm of the distance between (cid:98) θ ( i ) and (cid:98) θ , givenby the expression GD i ( θ ) = ( (cid:98) θ ( i ) − (cid:98) θ ) (cid:62) [¨ (cid:96) θθ ] − ( (cid:98) θ ( i ) − (cid:98) θ ) , where ¨ (cid:96) θθ (see the Appendix) is the Fisher information matrix of the MNBR model. Largevalues of GD i indicate that the ML estimates are strongly influenced by deleting the i th subject.Another popular measure of the difference between (cid:98) θ ( i ) and (cid:98) θ is the likelihood displacement givenby LD ( i ) ( θ ) = 2 { (cid:96) ( (cid:98) θ ) − (cid:96) ( (cid:98) θ ( i ) ) } . The local influence methodology was proposed by Cook (1986) for assessing the sensitivityof the parameters estimated when small perturbations are introduced in the model. Considerthe perturbation vector ω = ( ω , . . . , ω v ) (cid:62) , varying in some open subset Ω ⊂ R v . Let (cid:96) ( θ | ω )denote the log-likelihood function of the perturbed model. It is assumed there exists ω ∈ Ωsuch that (cid:96) ( θ | ω ) = (cid:96) ( θ ) for all θ . The influence of the minor perturbation on the ML estimate (cid:98) θ may be assessed by the likelihood displacement LD ( ω ) = 2 { (cid:96) ( (cid:98) θ ) − (cid:96) ( (cid:98) θ ω ) } , where (cid:98) θ ω denotesthe ML estimate from the perturbed model. A plot of LD ( ω ) versus ω contains essentialinformation about the influence of a perturbation scheme. Cook’s idea consists of selectinga unit direction d and evaluating the plot LD ( ω + a d ), where a ∈ R . This plot is calledlifted line, and each fitted line can be obtained by considering the normal curvature defined by C d ( θ ) = 2 | d (cid:62) ∆ (cid:62) ¨ (cid:96) ( θ ) − ∆ d | around a = 0, where ¨ (cid:96) ( θ ) = ∂ (cid:96) ( θ ) /∂ θ ∂ θ (cid:62) is evaluated at (cid:98) θ and ∆ = ∂ (cid:96) ( θ | ω ) /∂ θ ∂ ω (cid:62) is evaluated at (cid:98) θ and ω . Large values of C d ( θ ) indicate the sensitivityinduced by perturbation schemes in direction d . Cook (1986) suggests the local influence measure C d max ( θ ), evaluated in the direction d max corresponding to the eigenvector with maximal normalcurvature. If the i th component of d max is relatively large, this indicates that perturbations maylead to substantial changes in the ML estimates. Based on this approach, Lesaffre and Verbeke(1998) suggested the total local curvature corresponding to the i th element. This local influencemeasure is obtained by taking the direction d i , an n × i thposition. The total local curvature in direction d i assumes the form C i ( θ ) = 2 | ∆ (cid:62) i ¨ (cid:96) ( θ ) − ∆ i | ,where ∆ i = ∂ (cid:96) i ( θ | ω ) /∂ θ ∂ ω (cid:62) denotes the i th row of ∆ , such that (cid:96) i ( θ | ω ) is the contribution4f the i th individual to the log-likelihood function. ∆ i is evaluated at (cid:98) θ and ω , and it isrecommended to look at the index plot of C i ( θ ) to assess influential subjects. Usually, in theliterature, three types of perturbation schemes are considered for count data: case weights,explanatory variable, and dispersion parameter perturbation schemes. Case weights perturbation for the i th subject The log-likelihood function of the perturbed model takes the form (cid:96) ( θ | ω ) = n (cid:88) i =1 ω i (cid:110) log (cid:16) Γ( φ + y i + ) (cid:17) − log (cid:16) Γ( φ ) (cid:17) − m i (cid:88) j =1 log( y ij !) + φ log( φ ) − φ log( φ + µ i + ) + m i (cid:88) j =1 y ij log( µ ij ) − m i (cid:88) j =1 y ij log( φ + µ i + ) (cid:111) . The ∆ matrix in C d ( θ ) is given by ∆ (cid:62) = ( ∆ (cid:62) β , ∆ (cid:62) φ ) (cid:62) , with ∆ (cid:62) β = ( ∆ (cid:62) β , ∆ (cid:62) β , . . . , ∆ (cid:62) β p ) and ∆ β k = (∆ k , ∆ k , . . . , ∆ ki , . . . , ∆ kn ) , where∆ ki = m i (cid:88) j =1 (cid:40) y ij − ( (cid:98) φ + y i + )( (cid:98) φ + (cid:98) µ i + ) exp( x (cid:62) ij (cid:98) β ) (cid:41) x ijk = m i (cid:88) j =1 ( y ij − (cid:98) a i (cid:98) µ ij ) x ijk , and ∆ (cid:62) φ = (∆ φ , ∆ φ , . . . , ∆ φi , . . . , ∆ φn ), with∆ φi = ψ ( (cid:98) φ + y i + ) − ψ ( (cid:98) φ ) + log (cid:32) (cid:98) φ (cid:98) φ + (cid:98) µ i + (cid:33) + (cid:32) (cid:98) µ i + − y i + (cid:98) φ + (cid:98) µ i + (cid:33) , where ψ ( · ) is the digamma function, (cid:98) µ i + = (cid:80) m i j =1 (cid:98) µ ij , and (cid:98) µ ij = exp( x (cid:62) ij (cid:98) β ), for j = 1 , . . . , m i and i = 1 , . . . , n . Case weights perturbation for the j th measurement of the i th subject For this perturbation scheme, the log-likelihood function can be expressed as (cid:96) ( θ | ω ) = n (cid:88) i =1 m i (cid:88) j =1 ω ij (cid:110) m i log Γ( φ + y i + ) − m i log Γ( φ ) − log( y ij !)+ 1 m i φ log( φ ) − m i φ log( φ + µ i + ) + y ij log( µ ij ) − y ij log( φ + µ i + ) (cid:111) . The matrix ∆ = ( ∆ (cid:62) β , ∆ (cid:62) φ ) (cid:62) , in which element ij of ∆ (cid:62) β and ∆ (cid:62) φ , respectively, is given by∆ kij = y ij x ijk − (cid:98) φ/m i + y ij (cid:98) φ + (cid:98) µ i + m i (cid:88) j =1 (cid:98) µ ij x ijk , k = 1 , . . . , p and∆ φij = 1 m i ( ψ ( (cid:98) φ + y i + ) − ψ ( (cid:98) φ )) + 1 m i (1 + log( (cid:98) φ ) − log( (cid:98) φ + (cid:98) µ i + )) − (cid:98) φ/m i + y ij (cid:98) φ + (cid:98) µ i + . xplanatory variable perturbation We now consider an additive perturbation on a particular continuous explanatory variable, x ijk , j = 1 , . . . , m i , i = 1 , . . . , n, k = 1 , . . . , p , by setting x ∗ ijk = x ijk + ω i S x , where S x is ascale factor and ω i ∈ R . This perturbation scheme leads to the following expression for thelog-likelihood function: (cid:96) ( θ | ω ) = n (cid:88) i =1 (cid:40) log (cid:16) Γ( φ + y i + ) (cid:17) − log (cid:16) Γ( φ ) (cid:17) − m i (cid:88) j =1 log( y ij !) + φ log( φ ) − φ log( φ + µ ∗ i + ) + m i (cid:88) j =1 y ij log( µ ∗ ij ) − m i (cid:88) j =1 y ij log( φ + µ ∗ i + ) (cid:41) , where µ ∗ i + = (cid:80) m i j =1 µ ∗ ij , µ ∗ ij = exp( x ∗(cid:62) ij β ), x ∗(cid:62) ij β = β + β x ij + . . . + β k ( x ijk + ω i S x )+ . . . + β p x ijp ,and ω = (0 , . . . , (cid:62) . For k = 1 , . . . , p , if t = k , ∆ t is given by∆ ti = (cid:98) β t S x (cid:98) φ exp( x (cid:62) ij (cid:98) β )( (cid:98) φ + (cid:98) µ i + ) (cid:32) − (cid:98) µ i + (cid:98) φ + (cid:98) µ i + (cid:33) x ijt + S x exp( x (cid:62) ij (cid:98) β ) (cid:98) φ + (cid:98) µ i + + m i (cid:88) j =1 y ij (cid:104) S x + (cid:16) (cid:98) β t S x − (cid:98) µ ij (cid:17) x ijt (cid:105) + m i (cid:88) j =1 y ij exp( x (cid:62) ij (cid:98) β ) (cid:98) φ + (cid:98) µ i + × (cid:104) S x + ( (cid:98) β t S x − x ijt (cid:105) . If t (cid:54) = k , ∆ ti = (cid:98) β t S x (cid:98) φ exp( x (cid:62) ij (cid:98) β )( (cid:98) φ + (cid:98) µ i + ) (cid:32) − (cid:98) µ i + (cid:98) φ + (cid:98) µ i + (cid:33) x ijt + m i (cid:88) j =1 y ij x ijt × (cid:16) (cid:98) β t S x − (cid:98) µ ij (cid:17) + m i (cid:88) j =1 ( (cid:98) β t S x − (cid:98) φ + (cid:98) µ i + ) exp( x (cid:62) ij (cid:98) β ) y ij x ijt , and the i th element of ∆ (cid:62) φ is∆ φ i = (cid:98) β t S x (cid:98) µ i + ( (cid:98) φ + (cid:98) µ i + ) (cid:16) (cid:98) φ (cid:98) φ + (cid:98) µ i + − (cid:17) + (cid:98) β t S x m i (cid:88) j =1 y ij (cid:98) µ i + ( (cid:98) φ + (cid:98) µ i + ) . Dispersion parameter perturbation
Let φ ∗ i = ω i × φ be the dispersion parameter perturbation, where ω i ∈ R + . The log-likelihoodfunction of the perturbed model takes the form (cid:96) ( θ | ω ) = n (cid:88) i =1 (cid:40) log (cid:16) Γ( φ ∗ i + y i + ) (cid:17) − log (cid:16) Γ( φ ∗ i ) (cid:17) − m i (cid:88) j =1 log( y ij !) + φ ∗ i log( φ ∗ i ) − φ ∗ i log( φ ∗ i + µ i + ) + m i (cid:88) j =1 y ij log( µ ij ) − m i (cid:88) j =1 y ij log( φ ∗ i + µ i + ) (cid:41) . = ( ∆ (cid:62) β , ∆ (cid:62) φ ) (cid:62) , where the i -th elements of ∆ (cid:62) β and ∆ (cid:62) φ , respectively, are given by∆ k i = (cid:98) φ ∗ i (cid:98) µ i + ( y i + − (cid:98) µ i + )( (cid:98) φ ∗ i + (cid:98) µ i + ) x ijk and∆ φ i = ψ ( (cid:98) φ ∗ i + y i + ) − ψ ( (cid:98) φ ∗ i ) + (cid:98) φ ∗ i [ ψ (cid:48) ( (cid:98) φ ∗ i + y i + ) − ψ (cid:48) ( (cid:98) φ ∗ i )] + 1+ log (cid:32) (cid:98) φ ∗ i (cid:98) φ ∗ i + (cid:98) µ i + (cid:33) + (cid:98) µ i + (cid:98) φ ∗ i + (cid:98) µ i + + y i + (cid:98) µ i + ( (cid:98) φ ∗ i + (cid:98) µ i + ) + (cid:98) φ ∗ i (2 (cid:98) µ i + + (cid:98) φ ∗ i )( (cid:98) φ ∗ i + (cid:98) µ i + ) . For all the perturbation schemes, the ¨ (cid:96) ( θ ) matrix in C d ( θ ) is described in the Appendix. First, a simulation study is conducted to evaluate the asymptotic behavior of the ML estima-tor, (cid:98) θ . The vector y = ( y (cid:62) , . . . , y (cid:62) n ) (cid:62) , where y i ∼ MNB( µ i , φ ), is generated from a random in-tercept Poisson-GLG model with ( i ) y ij | b i ind ∼ Poisson( u ij ), ( ii ) log( u ij ) = β + β x ij + β x ij + b i ,and ( iii ) b i iid ∼ GLG(0 , λ, λ ) , where x ij ∼ N (0 ,
1) and x ij is a dummy variable with two levels,for i = 1 , . . . , n and j = 1 , ,
3. Further, it is assumed that β = (1 . , . , . (cid:62) and λ = φ − / inthe GLG distribution. The Bias, root mean square error (RMSE) and coverage probabilities forthe 95% confidence level are obtained from R = 10 ,
000 Monte Carlo replications performed fora sample size of n = 50, 100, 150, and 200 and for three different values of the dispersion pa-rameter, namely, φ = 3, 5 and 7. The estimates of the Bias and RMSE for θ are obtained from:Bias( θ s ) = θ s − θ s and RMSE( θ s ) = (cid:113)(cid:80) Rr =1 ( (cid:98) θ ( r ) s − θ s ) /R , respectively, where θ s = (cid:80) Rr =1 (cid:98) θ ( r ) s /R and (cid:98) θ ( r ) s the estimate from s th parameter obtained in r th Monte Carlo replication. These simu-lation results are presented in Table 1. As expected, the Bias and RMSE values of the regressioncoefficients estimates decrease as the sample size and values of the φ parameter increase. Thecoverage probabilities are close to 0.95. Moreover, the results show that the dispersion parame-ter exhibits a desirable asymptotic behavior when φ assumes small values. Since φ = λ − , it ispossible to affirm that the asymptotic properties of the dispersion parameter estimator are asso-ciated with the asymmetry of the GLG distribution. A second simulation study is performed toevaluate the impact of misspecifying the random effect distribution on the ML estimates of theMNBR model. It is assumed that the random intercept is normally distributed and negativelycorrelated. For sample size n = 50 , , and 150, the vector y = ( y (cid:62) , . . . , y (cid:62) n ) (cid:62) is generatedfrom a random intercept Poisson-Normal distribution with the following hierarchical structure:( i ) y ij | b i ind ∼ Poisson( u ij ), ( ii ) log( u ij ) = β + β x ij + b i and ( iii ) b i iid ∼ N(0 , σ ), σ = φ − and ( i ) y ij | b ij ind ∼ Poisson( u ij ), ( ii ) log( u ij ) = β + β x ij + z ij b ij , and ( iii ) b ∼ N ( , Σ ) , where( β , β ) (cid:62) = (1 . , . x ij ∼ U (0 , Σ = (cid:34) φ − -0.5 -0.1-0.5 φ − -1.0-0.1 -1.0 φ − (cid:35) , and φ − = 4 .
0. In Table 2, we present the Bias and root mean square error (RMSE), calculatedby fitting the MNBR model on 1,000 Monte Carlo replications for different sample sizes. Basedon the assumption that b i iid ∼ N(0 , σ ), we observe that the parameter estimates of φ and β arebiased. To summarize, we conclude that the MNBR model provides inconsistent estimates ofthe asymptotic variance of the ML estimators when the covariance matrix of the random effectis misspecified. 7able 1: Bias, root mean square error (RMSE), and coverage (%) of 95% confidence intervals. φ ( λ ) Measure n φ β β β A third simulation study is conducted to evaluate the variance-to-mean ratio, VMR =Var( y ) / E( y ). Considering three different values for the dispersion parameter, φ = 0 . , . φ , are (122 . , . . , .
35) and (5 . , . φ increases the VMR statistic decreasesand assumes values more than one. Thus, it is possible to conclude that the MNBR model isan overdispersion model. When the dispersion parameter is more than 50 similar results areobtained. 8able 2: Bias and root mean square error (RMSE) when the covariance matrix of the randomresponse variable is misspecified. Variance n Measure φ β β σ
50 Bias -0.4285 -1.8818 0.0561RMSE 0.4584 1.9805 1.0931100 Bias -0.3799 -1.9408 0.0497RMSE 0.3968 2.0068 0.9257150 Bias -0.3633 -1.9282 -0.0129RMSE 0.3750 1.9754 0.6798 Σ
50 Bias -0.4965 -1.8623 0.0068RMSE 0.5301 1.9669 1.0656100 Bias -0.4375 -1.9014 -0.0315RMSE 0.4560 1.9601 0.8784150 Bias -0.4196 -1.9228 -0.0372RMSE 0.4353 1.9953 0.8282
In this section, we present two examples illustrating the diagnostic tools for the MNBRmodel. The data sets and programs can be found in the
MNB package available on the R platform. Seizures data
The data set described in Diggle et al. (2013) refers to an experiment in which 59 epilepticpatients were randomly assigned to one of two treatment groups: treatment (progabide drug)and placebo groups. The number of seizures experienced by each patient during the baselineperiod (week eight) and the four consecutive periods (every two weeks) was recorded. The maingoal of this application is to analyze the drug effect with respect to the placebo. Two dummycovariates are considered in this study; Group which assumes values equal to 1 if the patientbelongs to treatment group and 0 otherwise, and Period which assumes values equal to 1 ifthe number of seizures are recorded during the treatment and 0 if are measured in the baselineperiod. Taking into account the irregular measurement of rate seizures during the time, thevariable Time is considered as an offset for fitting the data, where Time assumes values equal8 if the number of seizures is observed in the baseline period and 2 otherwise. The individualprofiles of the patients belonging to the placebo and progabide groups are shown in Figure 1 (a)and (b), respectively. The atypical individual profiles correspond to patient (cid:93) , , ,
25) ofthe placebo group, who presented a high number of seizures in the third visit compared to otherclinic visits, and patient (cid:93) , , ,
63) of the progabide group, who suffered a high numberof seizures in every clinic visit, indicating the ineffectiveness of the drug in patients with complexseizures. In both groups, it is possible to see the right skewness of the empirical distributionsof individual profiles. Figure 1(c) shows a normal probability plot with simulated envelope forthe Pearson residual (Faraway, 2016, p.135) computed by fitting the Poisson regression modelto seizures data. We confirm the occurrence of the overdispersion phenomenon. Thus, basedon Figure 1(a) - (c), which shows the asymmetric behavior of the empirical distribution ofindividual profiles, the MNBR model is proposed for modeling this behavior according to thefollowing structure: ( i ) y i ind ∼ MNB( µ i , φ ) and ( ii ) log( µ ij ) = β + β Group i + β Period ij + β (Group i × Period ij ) + log(Time ij ) , where y i = ( y i , y i , y i , y i ) (cid:62) , µ i = ( µ i , µ i , µ i , µ i ) (cid:62) , for i = 1 , . . . , β is the logarithm of the ratio of the average rate of the treatment groupto the placebo group at baseline, β is the logarithm of the ratio of the seizure mean aftertreatment period to before treatment period for the placebo group, and exp( β ) is the treatment9able 3: Bias, root mean square error (RMSE), coverage (%) of 95% confidence intervals. φ Measure n φ β β β VMR0.1 Bias 50 0.0093 -0.1769 -0.0006 0.0575 (122.39,220.24)100 0.0051 -0.1201 -0.0004 0.0347150 0.0027 -0.0838 -0.0003 0.0310200 0.0020 -0.0331 -0.0001 0.0226RMSE 50 0.0299 0.7048 0.0707 0.9794100 0.0198 0.5105 0.0411 0.6742150 0.0151 0.3891 0.0298 0.5531200 0.0130 0.3203 0.0253 0.4468Coverage 50 95.4 91.3 95.0 91.1100 95.0 92.8 94.2 92.8150 95.3 93.7 94.9 92.9200 95.5 94.5 95.1 94.71.0 Bias 50 0.0979 -0.0201 0.0017 -0.0026 (19.38,29.35)100 0.0390 -0.0059 0.0000 -0.0189150 0.0317 -0.0045 0.0000 -0.0160200 0.0164 -0.0039 0.0000 -0.0006RMSE 50 0.2706 0.2188 0.0552 0.3046100 0.1627 0.1550 0.0380 0.2132150 0.1351 0.1249 0.0291 0.1712200 0.1065 0.1069 0.0244 0.1477Coverage 50 95.7 93.9 94.7 92.2100 95.4 94.3 95.6 93.5150 95.3 94.9 95.2 94.1200 95.1 95.2 94.9 94.750 Bias 50 36.1329 -0.0034 0.0000 0.0022 (5.85,16.87)100 17.2463 -0.0030 0.0000 0.0015150 11.0381 -0.0009 0.0000 0.0011200 9.1020 -0.0008 0.0000 0.0021RMSE 50 73.4328 0.0621 0.0332 0.0739100 45.1901 0.0433 0.0272 0.0519150 30.6868 0.0367 0.0213 0.0454200 25.7042 0.0312 0.0206 0.0374Coverage 50 95.0 95.0 93.2 95.3100 94.2 95.5 94.1 95.4150 94.0 93.9 94.0 94.1200 95.7 93.4 94.1 94.5 effect, and it is the ratio of post- to pre-treatment mean seizure ratios between treatment andplacebo groups. The parameter estimates obtained by using the fit.MNB function from the
MNB package are shown in Table 4. It is not observed evidence of the treatment effect. The shapeparameter estimate indicates that the variability of individual profiles with respect their averageis asymmetric to the left with dispersion equal to 1.607 ( λ = 1 / √ φ = 0 . (cid:93)
49 is atypical. The qMNB and envelope.MNB functions from the
MNB package isused to display the graphics of the randomized quantile residual.The global influential graphics presented in Figure 3(a) and (b) reveal that patients (cid:93) (cid:93)
25 can impact the ML estimates of the MNBR model when they are removed from themodel. The local and total local influential graphics exhibited in Figure 4(a) and (b) indicate thesensitivity of the estimate associated with patient (cid:93)
49 when the minor perturbation is induced10 imes S e i z u r e s c oun t (a) Times S e i z u r e s c oun t (b) −3 −2 −1 0 1 2 3 N(0,1) quantile P ea r s on r e s i dua l (c)Figure 1: Individual profiles of the patients in (a) the placebo group, (b) the progabide group,and (c) the simulated envelope plot of the Pearson residuals.Table 4: Parameter estimates with their respective approximate standard errors (Std. error),z-values, and p -values for the MNBR model fitted to the epileptic data. Parameter Estimate Std. error z-value p -value φ − − β < . β β β -0.105 0.065 -1.610 0.107 λ − Index R ando m i z ed quan t il e r e s i dua l (a) −2 −1 0 1 2 − − N(0,1) quantile R ando m i z ed quan t il e r e s i dua l (b)Figure 2: Index plot of the randomized quantile residuals (a) and (b) the simulated envelopeplot of the randomized quantile residuals. 11
10 20 30 40 50 60 . . . . Index G ene r a li z ed C oo k D i s t an c e (a) Index | L i k e li hood d i s p l a c e m en t |
25 49 (b)Figure 3: Index plot of (a) the generalized Cook distance and (b) the likelihood displacement. . . . . . . . Index | d m a x | (a) Index C i (b)Figure 4: Index plot of | d max | for the case weight perturbation scheme: (a) local influence and(b) total local influence on the i th patient.in the directions of d max and d i , respectively. According to Figure 5(a) and (b), the number ofepileptic seizures of patient (cid:93)
49 recorded on the first and last clinic visits of the trial are moreinfluential. The dispersion perturbation scheme did not show evidence of any possible influentialobservations.Finally, the percentage relative deviations
P RD =[( (cid:98) θ − (cid:98) θ ∗ ) / (cid:98) θ ] × (cid:98) θ ∗ is the estimatorof θ obtained after deleting one or more atypical subjects, are calculated and the results are12
50 100 150 200 250 300 . . . . . . Index | d m a x | (49,1) (a) Index C i (49,1) (b)Figure 5: Index plot of | d max | for case weight perturbation scheme: (a) local influence and (b)total influence of the j th measurement of each patient.presented in Table 5. We can observe significatively changes are associate with the estimates of β and β . Noting that the signal of the estimate β is indicating that the treatment decreasesthe number of seizures, its descriptive-level keeps nonsignificant though. For the parameter β we observe that the interaction effect incresed 50% and that its descriptive-level becomessignificant.Table 5: Parameter estimates, standard errors (Std. error), p -values, and percentage relativedeviations ( P RD ). Dropping Parameter Estimate Std. error p -value P RD (%) φ − -28.21 β < .
000 0.00 (cid:93) β -0.107 0.189 0.573 487.95 β β -0.302 0.070 < .
000 -188.73
Accident data
Karlis (2003) provides a data set which refers to the number of car accidents in 24 centralroads in Athens, for a time period of 1987 to 1991. In this paper, we are interested in modelingthe number of car accidents per road length during the period of 5 years. Thus, the dummycovariate years is considered for fitting the data including the road length as an offset. Figure6(a) presents the individual profiles of the central roads for the number of car accidents recordedon each road over time. In general, the highest number of car accidents occurred in 1989 andthe smallest in 1991. The empirical distribution of individual profiles is skewed to the right.Figure 6(b) reveals that
P atision and
P eiraios roads are considered as outliers and evidence ofvariability between the individual profiles and across the years is suggested in Figure 6(c).13 ears N u m be r o f a cc i den t s (a) Years N u m be r o f a cc i den t s PatisionPeiraios Patision (b) −2 −1 0 1 2 − N(0,1) quantile P ea r s on r e s i dua l (c)Figure 6: (a) Individual profiles, (b) box-plots of the number of car accidents on central Athensroads during the 5 years, and (c) simulated envelope plot of Pearson residuals.In the fitting the Poisson regression model to the accidents data, Figure 6(c), we observe theoccurrence of the overdispersion phenomenon. Based on Figure 6(a) - (b) showing the asymmet-ric behavior of the empirical distribution of individual profiles, we propose the MNBR modelwith the following structure for modeling the data: ( i ) y i ind ∼ MNB( µ i , φ ) and ( ii ) log( µ ij ) = β + β t year j + log(length i ), where y i = ( y i , y i , y i , y i , y i ) (cid:62) , µ i = ( µ i , µ i , µ i , µ i , µ i ) (cid:62) , t = 1 , , , β t is the logarithm of the ratio of the average car accidents per road length for year t to year 1987. The ML estimates for the MNBR model are presented in Table 6. We observethe estimates are significant and that the small difference in the logarithm of the rate averageof the number of car accidents per length of the roads occurred between the years the 1987 and1991.Table 6: Parameter estimates, standard errors (Std. error), z-values, and p -values for the MNBRmodel fitted to the accidents data.Parameter Estimate Std. error z-value p -value φ − β < . β < . β < . β < . β -0.322 0.063 -5.112 < . Leof. Kavalas , Leof. Athinon and
P anepistimiou can impact the ML estimates when they are removed.
Leof. Kavalas (4,6,12,9,3) and
P anepistimiou (24, 58, 40, 36, 05) are small roads, 2.0 and 1.1 kilometerslong, respectively.
Leof. Athinon (15,11,16,21,28) is a large road, 6.1 kilometers long. Fur-thermore, two roads that can impact the ML estimates are showed in Figure 8(b),
P atision (80,108,114,113,86) and
Katehaki (2,3,27,24,7), which are 4.1 and 1.4 kilometers long, respec-tively. In Figure 9(a) and (b) are showed influential roads,
P eraios (86 89 109 90 49) and
Aharnon (44, 79, 91, 88, 33), which are 8.0 and 5.5 kilometers long, respectively.14
10 15 20 − − Index R ando m i z ed quan t il e r e s i dua l (a) −2 −1 0 1 2 − − − N(0,1) quantile R ando m i z ed quan t il e r e s i dua l (b)Figure 7: Index plot of the randomized quantile residuals (a) and (b) the simulated envelopeplot of the randomized quantile residuals. . . . . . . . Index G ene r a li z ed C oo k D i s t an c e Leof.KavalasLeof.Athinon Panepistimiou (a)
Index | L i k e li hood d i s p l a c e m en t | Katehaki Leof.AthinonPanepistimiouPatision (b)Figure 8: Index plot of (a) the generalized Cook distance and (b) the likelihood displacementfor the number of car accidents on central Athenian roads in 1989-1991.Finally, the percentage relative deviations (
P RD ) are calculated and we verify that themost significant changes in the ML estimates are associated with the removal of the influentialroads
P atision and
P eiraios . The results in Table 7 show that the estimates are affected when
P atision and
P eiraios are removed simultaneously.15
10 15 20 . . . . . Index | d m a x | Aharnon PatisionPeiraios (a)
Index C i Aharnon PatisionPeiraios (b)Figure 9: Index plot of | d max | for the case weight perturbation scheme: (a) local influence and(b) total local influence for the number of car accidents on central Athenian roads in 1989-1991. . . . . . . . . Index | d m a x | (Patision,3)(Peiraios,3) (a) Index C i (Patision,5) (b)Figure 10: Index plot of | d max | for the case weight perturbation scheme: (a) local influence and(b) total local influence on j th measurement of the number of car accidents for each atypicalcentral Athenian road. In this study, we developed diagnostic tools for the MNBR model derived from the Poissonmixed model where the GLG distribution is assumed for the random effect. Simulation resultsexhibited the features of the MNBR model and its association with the hierarchical model, which16
10 15 20 . . . . . . . Index | d m a x | Leof.KavalasLeof.Athinon (a) . . . . . Index C i Leof.Athinon Panepistimiou (b)Figure 11: Index plot of | d max | for the dispersion perturbation scheme, (a) local influence and(b) total local influence for the number of car accident on central Athenian roads of 1989-1991.Table 7: Parameter estimates, standard errors (Std. Error), z-values, p -values, and P RD (%)when
P atision road is deleted from the accidents data.Dropping Parameter Estimate Std. Error z-value p -value P RD (%)
P atision φ − − − β < .
001 2.27 β < .
001 -2.65 β < .
001 -5.36 β < .
001 -3.86 β -0.399 0.069 -5.765 < .
001 -23.97
P eiraios φ − − − β < .
001 1.54 β < .
001 -12.63 β < .
001 -8.23 β < .
001 -11.98 β -0.287 0.067 -4.257 < .
001 10.91
P eiraios and P atision φ − − − β < .
001 4.24 β < .
001 -17.95 β < .
001 -15.76 β < .
001 -18.47 β -0.370 0.075 -4.924 < .
001 -14.91was essential for explaining that the asymptotic consistency of the estimators of the regressioncoefficients and the dispersion parameter depends on the asymmetry of the GLG distribution. Asexpected, the MNBR model provides inconsistent estimates of the asymptotic variance of the MLestimators when the covariance matrix of the response variable is misspecified. The randomized17uantile residuals can be used to assess possible departures of the data from the MNBR modelassumptions. Following the approaches of Cook (1977, 1986) and Lesaffre and Verbeke (1998),global and local measures for the MNBR model were derived and implemented in the authors’
MNB package. The application of the
MNB package to two data sets was presented and it wasshown that the asymmetric behavior of the empirical distributions of the individual profilescan indicate the need to use the multivariate model to better fit these profiles. The proposedmethodology was helpful for identifying outlying subjects that matter in the MNBR model overtime and handle the overdispersion phenomenon. The code for installing the
MNB package ispresented in the Appendix.
Acknowledgements
The work of the fourth author is partially funded by CNPq, Brazil. We also thank anonymousreferees for constructive comments and suggestions.
References
Cook, R. D. (1977). Detection of influential observations in linear regression.
Technometrics , , 15–18. 2, 4, 18Cook, R. D. (1986). Assessment of local influence (with discussion). Journal of the RoyalStatistical Society Series B , , 133–169. 2, 4, 18Cook, R. D. and Weisberg, S. (1983). Residuals and Influence in Regresion . Chapman and Hall,New York. 2Diggle, P. J., Liang, K. Y., and Zeger, S. L. (2013).
Analysis of Longitudinal Data . OxfordUniversity Press, N.Y., 2 edition. 9Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals.
Journal of Computationaland Graphical Statistics , , 236–244. 2, 4Fabio, L., Paula, G. A., and de Castro, M. (2012). A Poisson mixed model with nonormalrandom effect distribution. Computational Statistics and Data Analysis , , 1499–1510. 1, 2,3Faraway, F. (2016). Extending the Linear Model with R: Generalized Linear, Mixed Effects andnonparametric regression models . Chapman & Hall/CRC, New York. 9Hilbe, J. M. (2011).
Negative Binomial Regression . Cambridge, United Kingdom. 3Johnson, N., Kotz, S., and Balakrishnan, N. (1997).
Discrete Multivariate Distributions . Wiley,New York. 3Karlis, D. (2003). An EM algorithm for multivariate Poisson distribution and related models.
Journal of Applied Statistics , , 63–77. 13Lawless, J. (1987). Negative binomial and mixed Poisson regression. The Canadian Journal ofStatistics , , 209–225. 2Lawless, J. F. (2002). Statistical Models and Methods for Lifetime Data . Wiley-Interscience,New York. 2Lee, Y., Nelder, J. A., and Pawitan, Y. (2006).
Generalized linear models with random effects.Unified analysis via H-likelihood . Chapman and Hall/CRC, Boca Raton. 1Lesaffre, E. and Verbeke, G. (1998). Local influence in linear mixed models.
Biometrics , ,570–582. 2, 4, 18Molenberghs, G. and Verbeke, G. (2010). Models for Discrete Longitudinal Data . Springer, NewYork. 2, 3Molenberghs, G., Verbeke, G., and Dem´etrio, C. G. B. (2007). An extended random-effects18pproach to modeling repeated, overdispersed count data.
Lifetime Data Analysis , , 513–531. 1Moller, J. and Rubak, E. (2010). A model for positively correlated count variables. InternationalStatistical Review , , 65–80. 2R Core Team (2017). R: A Language and Environment for Statistical Computing . R Foundationfor Statistical Computing, Vienna, Austria. 2Solis-Trapala, I. and Farewell, V. (2005). Regression analysis of overdispersed correlated countdata with subject specific covariates.
Statistics in Medicine , , 2557–2575. 2Sung, Y. and Lee, K. (2018). Negative binomial loglinear mixed models with general randomeffects covariance matrix. Communications for Statistical Applications and Methods , , 61–70. 2Tsui, K.-W. (1986). Multiparameter estimation for some multivariate discrete distributions withpossibly dependent components. Annals of the Institute of Statistical Mathematics , , 45–56.3 Appendix1. The score vector and the observed Fisher information matrix
The score vector U θ = ( U (cid:62) β , U φ ) (cid:62) is obtained by deriving the log-likelihood function (2) withrespect to the parameters β and φ , respectively. Thus, U β = n (cid:88) i =1 X (cid:62) i ( y i − a i µ i ) and U φ = n (cid:88) i =1 (cid:26) ψ ( φ + y i + ) − ψ ( φ ) + log (cid:18) φφ + µ i + (cid:19) + (cid:18) µ i + − y i + φ + µ i + (cid:19) (cid:27) , (3)where a i = ( φ + y i + ) / ( φ + µ i + ) , ψ ( · ) is the digamma function, and X i is an m i × p matrix withrow x (cid:62) ij for i = 1 , . . . , n and j = 1 , . . . , m i . However, using the fact that Γ( φ + y i + ) / Γ( φ ) = φ ( φ + 1)( φ + 2) . . . ( φ + y i + −
1) in (3), U φ takes the following form U φ = n (cid:88) i =1 y i + − (cid:88) j =0 ( j + φ ) − − y i + φ + µ i + − log(1 + φ − µ i + ) + µ i + ( φ + µ i + ) . The observed information matrix is obtained by deriving the score vector with respect to β and φ . Thus, − ¨ (cid:96) ( θ ) = (cid:20) − ¨ (cid:96) ββ − ¨ (cid:96) βφ − ¨ (cid:96) φβ − ¨ (cid:96) φφ , (cid:21) where ¨ (cid:96) ββ = − n (cid:88) i =1 ( φ + y i + )( φ + µ i + ) m i (cid:88) j =1 x ij x (cid:62) ij µ ij + n (cid:88) i =1 ( φ + y i + )( φ + µ i + ) m i (cid:88) j =1 x ij µ ij m i (cid:88) j =1 x (cid:62) ij µ ij , ¨ (cid:96) βφ = ¨ (cid:96) φβ = − m i (cid:88) j =1 x ijk µ ij ( µ i + − y i + )( φ + µ i + ) and¨ (cid:96) φφ = − n (cid:88) i =1 y ∗ i (cid:88) s =0 s + φ ) + φ − µ i + ( φ + µ i + ) − µ i + ( φ + µ i + ) − y i + ( φ + µ i + ) . . How to install and use the MNB package
We present the R platform code used to install the MNB package. require(devtools) devtools::install_github("carrascojalmar/MNB") require(MNB) data(seizures) star <-list(phi=1, beta0=1, beta1=1, beta2=1, beta3 =1) mod <- fit.MNB(formula=Y ~ trt + period + trt:period + offset(log(weeks)), star=star , dataSet=seizures ,tab=FALSE) mod par <- mod $ par names(par)<-c() res.q <- qMNB(par=par ,formula=Y ~ trt + period + trt:period + offset(log(weeks)),dataSet=seizures) plot(res.q,ylim=c(-3,4.5),ylab="Randomized quantile residual", xlab="Index",pch=15,cex.lab = 1.5, cex = 0.6, bg = 5) abline(h=c(-2,0,2),lty=3) envelope.MNB(formula=Y ~ trt + period + trt:period + offset(weeks),star=star ,nsim=21,n.r=6, dataSet=seizures ,plot=TRUE) global.MNB(formula=Y ~ trt + period + trt:period + offset(log(weeks)),star=star ,dataSet=seizures ,plot=TRUE) local.MNB(formula=Y ~ trt + period + trt:period + offset(log(weeks)),star=star ,dataSet=seizures , schemes="weight",plot=TRUE) require(MNB) accident <- read.table("...\\ accident.txt",h=TRUE) temp <- apply(accident [,2:6],1,t) Y <- matrix(temp ,ncol=1) length.x <- rep(accident $ length ,each=5) ind <- rep(1:24,each=5) year <- as.factor(rep(1:5 ,24)) accident2 <- data.frame(Y,year ,length.x,ind) star=list(phi=10, beta1=2, beta2=0.4, beta3= 0.6, beta4=0.5, beta5 =-0.3) mod <- fit.MNB(Y ~ year + offset(log(length.x)),star=star ,dataSet=accident2 ,tab=FALSE) mod par <- mod $ par names(par)<-c() qMNB(par=par ,formula=Y ~ year + offset(log(length.x)),dataSet=accident2) envelope.MNB(formula=Y ~ year + offset(log(length.x)),star=star ,nsim=21,n.r=6, dataSet=accident2 ,plot=TRUE) global.MNB(Y ~ year + offset(log(length.x)),star=star ,dataSet=accident2 ,plot=TRUE) local.MNB(Y ~ year + offset(log(length.x)),star=star ,dataSet=accident2 , schemes="weight",plot=TRUE)schemes="weight",plot=TRUE)