[PDF] Diagnostic tools for a multivariate negative binomial model for fitting correlated data with overdispersion

Abstract

We focus on the development of diagnostic tools and an R package called MNB for a multivariate negative binomial (MNB) regression model for detecting atypical and influential subjects. The MNB model is deduced from a Poisson mixed model in which the random intercept follows the generalized log-gamma (GLG) distribution. The MNB model for correlated count data leads to an MNB regression model that inherits the features of a hierarchical model to accommodate the intraclass correlation and the occurrence of overdispersion simultaneously. The asymptotic consistency of the dispersion parameter estimator depends on the asymmetry of the GLG distribution. Inferential procedures for the MNB regression model are simple, although it can provide inconsistent estimates of the asymptotic variance when the correlation structure is misspecified. We propose the randomized quantile residual for checking the adequacy of the multivariate model, and derive global and local influence measures from the multivariate model to assess influential subjects. Finally, two applications are presented in the data analysis section. The code for installing the MNB package and the code used in the two examples is exhibited in the Appendix.

Full PDF

DDiagnostic tools for a multivariate negative binomial model forﬁtting correlated data with overdispersion

Lizandra Castilho FabioDepartment of StatisticsFederal University of BahiaBrazil Cristian VillegasDepartment of Exact SciencesUniversity of S˜ao PauloBrazilJalmar M. F. Carrasco ∗ Department of StatisticsFederal University of BahiaBrazil M´ario de CastroInstituto de Ciˆencias Matem´aticas e de Computa¸c˜aoUniversidade de S˜ao PauloBrazil

Abstract

We focus on the development of diagnostic tools and an R package called MNB for a mul-tivariate negative binomial (MNB) regression model for detecting atypical and inﬂuentialsubjects. The MNB model is deduced from a Poisson mixed model in which the randomintercept follows the generalized log-gamma (GLG) distribution. The MNB model for corre-lated count data leads to an MNB regression model that inherits the features of a hierarchicalmodel to accommodate the intraclass correlation and the occurrence of overdispersion simul-taneously. The asymptotic consistency of the dispersion parameter estimator depends on theasymmetry of the GLG distribution. Inferential procedures for the MNB regression modelare simple, although it can provide inconsistent estimates of the asymptotic variance whenthe correlation structure is misspeciﬁed. We propose the randomized quantile residual forchecking the adequacy of the multivariate model, and derive global and local inﬂuence mea-sures from the multivariate model to assess inﬂuential subjects. Finally, two applications arepresented in the data analysis section. The code for installing the

MNB package and the codeused in the two examples is exhibited in the Appendix. keywords:

Count data; Overdispersion; Multivariate negative binomial distribution;

MNB package.

Hierarchical models have been suggested for analyzing correlated count data due to theirfeature of allowing one to model the intraclass correlation and accommodate the occurrence ofoverdispersion simultaneously. This is handled through the inclusion of random eﬀects in thesystematic component of generalized linear models. Using this approach, it is possible to relaxthe assumption about the distribution of the random eﬀects, taking into account the empiricaldistributions of the data or individual proﬁles (Lee et al. , 2006, Molenberghs et al. , 2007 andFabio et al. , 2012). Several methods have been proposed for inferential procedures in hierar-chical models due to the intractable integrals involved in inference functions. The latter facthas been an obstacle in the development of diagnostic tools, leading to completely numerical ∗ Corresponding author. Email: [email protected] a r X i v : . [ s t a t . M E ] F e b rocedures being used. In such cases, a multivariate model deduced from the random eﬀectsapproach (Molenberghs and Verbeke, 2010) can be an alternative to a hierarchical model forﬁtting correlated count data with extra variability. Fabio et al. (2012) proposed the randomintercept Poisson mixed regression model by assuming that the random eﬀects follow a gener-alized log-gamma (GLG) distribution (Lawless, 1987). This distribution can be skewed to theright or skewed to the left, with the normal distribution as a particular case. Thus, the randomintercept Poisson-GLG model reduces to a multivariate negative binomial (MNB) model when itis assumed that the scale and shape parameters of the GLG distribution are equal. The randomintercept of a Poisson-GLG model is able to accommodate intraclass correlation and handleoverdispersion, due to the several degrees of asymmetry that the GLG distribution can assume.The MNB inherits these features, with the overdispersion parameter φ − = λ and simpler corre-lation structures. In the literature, the MNB regression (MNBR) model has been used for mod-eling correlated count data from several areas, for instance, health (Solis-Trapala and Farewell,2005), spatial (Moller and Rubak, 2010) and economics (Sung and Lee, 2018) data.We focus on developing diagnostic tools and an R (R Core Team, 2017) package for theMNBR model under overdispersion, which are essential for detecting outliers and for checkingthe adequacy of the model (Cook and Weisberg, 1983). Simulation studies are also conductedto evaluate the asymptotic properties of the maximum likelihood (ML) estimator and to analyzehow these properties’ are impacted on the random eﬀect distribution misspeciﬁcation. As notedin Solis-Trapala and Farewell (2005), the MNBR model can provide inconsistent estimates of theasymptotic variance of the regression coeﬃcient when the covariance matrix of the random eﬀectis misspeciﬁed. We use randomized quantile residuals (Dunn and Smyth, 1996) for checkingthe adequacy of the MNBR model. Following Cook (1977, 1986) approach, global and localinﬂuence measures are developed from the MNBR model to detect inﬂuential subjects. Thesemethodologies are useful to assess the impact on the estimation procedure by removing thesubject from the data set and by proposing perturbation models, respectively. The total localinﬂuence measure suggested by Lesaﬀre and Verbeke (1998) is also extended for the MNBRmodel. These measures are helpful for identifying outlying subjects that matter in the MNBRmodel and interpret them according to the application. The inferences and diagnostic analysisfrom MNBR are performed by using the MNB package developed by the authors.The paper is organized as follows: In Section 2, we provide some background about theMNBR model approach by Fabio et al. (2012). In Section 3, we discuss diagnostic analysismethodologies for the MNBR model. In Section 4, we perform a simulation study to evaluatethe asymptotic behavior of the ML estimator. In Section 5, the diagnostic analysis is applied totwo real data sets, using the

MNB package. The code for installing the

MNB package is presentedin the Appendix. Finally, in Section 6, we provide some discussions.

Let y ij denote the j th measurement taken on the i th subject or cluster, for j = 1 , . . . , m i and i = 1 , . . . , n . Further, let b i be random eﬀects that follow the GLG distribution (Lawless, 2002).Assuming that y ij | b i are independent outcomes with probability mass function represented bya Poisson distribution, Fabio et al. (2012) proposed a random intercept Poisson-GLG modelwith the following hierarchical structure: ( i ) y ij | b i ind ∼ Poisson( u ij ), ( ii ) u ij = µ ij exp( b i ) , and( iii ) b i iid ∼ GLG(0 , σ, λ ) , where µ ij = exp ( x (cid:62) ij β ), with x ij = ( x ij , . . . , x ijp ) (cid:62) containing thevalues of explanatory variables, β = ( β , . . . , β p ) (cid:62) the vector of regression coeﬃcients, and σ > λ ∈ R the scale and shape parameters of the GLG distribution. In general, in the randomeﬀects approach, the marginal distribution of the i th subject does not have an explicit form2Molenberghs and Verbeke, 2010 pp. 259-266). The ML estimates are obtained by integratingout the random eﬀect and maximizing the log-likelihood function. Fabio et al. (2012) showedthat the integral can be solved analytically when the scale and shape parameters of the GLGdistribution are equal ( σ = λ , with λ ∈ R + ). When φ = λ − , the MNB distribution is deducedfrom the random intercept Poisson-GLG model of the following form: f ( y i ; β , φ ) = Γ( φ + y i + ) φ φ (cid:16)(cid:81) m i j =1 y ij ! (cid:17) Γ( φ ) exp (cid:16)(cid:80) m i j =1 y ij log( µ ij ) (cid:17) ( φ + µ i + ) φ + y i + , (1)where y i = ( y i , . . . , y im i ) (cid:62) is the vector of measurements available for the i th subject, φ isthe dispersion parameter, Γ( · ) is the gamma function, y i + = (cid:80) m i j =1 y ij , and µ i + = (cid:80) m i j =1 µ ij .According to Johnson et al. (1997), the MNB distribution belongs to the discrete multivariateexponential family of distributions, and its marginals f ( y ij ) are negative binomial distributionswith E( y ij ) = µ ij and Var( y ij ) = µ ij + µ ij /φ , for i = 1 , . . . , n and j = 1 , . . . , m i . The covariancesCov( y ij , y ij (cid:48) ) = µ ij µ ij (cid:48) /φ and intraclass correlations Corr( y ij , y ij (cid:48) ) = √ µ ij µ ij (cid:48) / (cid:16)(cid:112) φ + µ ij (cid:113) φ + µ ij (cid:48) (cid:17) for j (cid:54) = j (cid:48) are always positive. When φ − indicates the amount of excess correlation in the data(Hilbe, 2011), the parameter φ is also called an overdispersion parameter. For large values of φ ,the marginals of the MNB distribution behave approximately as independent Poisson distribu-tions with mean µ ij . By y i ∼ MNB( µ i , φ ) , we denote independent vectors of random outcomesthat follow the probability function given in (1), with µ i = ( µ i , . . . , µ im i ) (cid:62) and φ > i ) y i ind ∼ MNB( µ i , φ ) and ( ii ) log( µ ij ) = x (cid:62) ij β .Letting y = ( y (cid:62) , . . . , y (cid:62) n ) (cid:62) be the vector containing all the measured outcomes for the i thsubject, the log-likelihood function is given by (cid:96) ( θ ) = n (cid:88) i =1 log (cid:26) Γ( φ + y i + )Γ( φ ) (cid:27) − n (cid:88) i =1 m i (cid:88) j =1 log ( y ij !) + nφ log( φ ) − φ n (cid:88) i =1 log( φ + µ i + ) + n (cid:88) i =1 m i (cid:88) j =1 y ij log (cid:26) µ ij φ + µ i + (cid:27) , (2)where θ = ( β (cid:62) , φ ) (cid:62) . The ML estimates (cid:98) θ of θ are computed by using the quasi-Newton ( BFGS )method. Alternatively, we can solve the nonlinear equation obtained by setting the componentsof the score vector equal to zero, that is, U θ = ( U (cid:62) β , U φ ) (cid:62) = (details about the calculationsare in the Appendix). For interval estimation and hypothesis tests on the model parameters,the expected or observed Fisher information is required. Under standard regularity conditions, (cid:98) θ − θ is asymptotically distributed as a multivariate normal distribution with mean andcovariance matrix equal to the inverse of the Fisher information matrix of the MNBR model(see Fabio et al. , 2012). This section is devoted to diagnostic tools for the MNBR model. In what follows, we willdiscuss the residual analysis, and the global and local inﬂuence methodologies.

Let y i = ( y i , y i , . . . , y im i ) (cid:62) be a random vector that follows a MNB distribution. Accord-ing to Tsui (1986), the distribution of y i + = (cid:80) m i j =1 y ij is negative binomial with probability3istribution function given by f ( y i + ) = Γ( y i + + φ )( y i + )!Γ( φ ) q φ (1 − q ) y i + , y i + = 0 , , , ..., where q = φ/ ( φ + µ i + ). Then, the randomized quantile residuals (Dunn and Smyth, 1996),which follow a standard normal distribution, can be used to assess departures from the MNBRmodel. If F ( y i + ; µ, φ ) is the cumulative distribution of f ( y i + ), a i = lim y ↑ y i + F ( y ; (cid:98) µ i , (cid:98) φ ), and b i = F ( y i + ; (cid:98) µ i , (cid:98) φ ), then the randomized quantile residuals for y i are given by r q,i = Φ − ( u i ),where Φ( · ) is the cumulative distribution function of the standard normal and u i is a uniformrandom variable on the interval ( a i , b i ]. Based on the case-deletion approach (Cook, 1977), we propose the generalized Cook’s dis-tance as a global inﬂuence measure to assess the impact on the ML estimates of the MNBRmodel when the i th subject (or cluster) is removed from the data set. Let y ( i ) be a vector ofrandom outcomes after deleting the i th subject. Let (cid:98) θ ( i ) be the ML estimate of θ computedfrom (cid:96) ( i ) ( θ ), where (cid:96) ( i ) ( θ ) is obtained from (2) after removing the i th subject. The generalizedCook’s distance is deﬁned as the standardized norm of the distance between (cid:98) θ ( i ) and (cid:98) θ , givenby the expression GD i ( θ ) = ( (cid:98) θ ( i ) − (cid:98) θ ) (cid:62) [¨ (cid:96) θθ ] − ( (cid:98) θ ( i ) − (cid:98) θ ) , where ¨ (cid:96) θθ (see the Appendix) is the Fisher information matrix of the MNBR model. Largevalues of GD i indicate that the ML estimates are strongly inﬂuenced by deleting the i th subject.Another popular measure of the diﬀerence between (cid:98) θ ( i ) and (cid:98) θ is the likelihood displacement givenby LD ( i ) ( θ ) = 2 { (cid:96) ( (cid:98) θ ) − (cid:96) ( (cid:98) θ ( i ) ) } . The local inﬂuence methodology was proposed by Cook (1986) for assessing the sensitivityof the parameters estimated when small perturbations are introduced in the model. Considerthe perturbation vector ω = ( ω , . . . , ω v ) (cid:62) , varying in some open subset Ω ⊂ R v . Let (cid:96) ( θ | ω )denote the log-likelihood function of the perturbed model. It is assumed there exists ω ∈ Ωsuch that (cid:96) ( θ | ω ) = (cid:96) ( θ ) for all θ . The inﬂuence of the minor perturbation on the ML estimate (cid:98) θ may be assessed by the likelihood displacement LD ( ω ) = 2 { (cid:96) ( (cid:98) θ ) − (cid:96) ( (cid:98) θ ω ) } , where (cid:98) θ ω denotesthe ML estimate from the perturbed model. A plot of LD ( ω ) versus ω contains essentialinformation about the inﬂuence of a perturbation scheme. Cook’s idea consists of selectinga unit direction d and evaluating the plot LD ( ω + a d ), where a ∈ R . This plot is calledlifted line, and each ﬁtted line can be obtained by considering the normal curvature deﬁned by C d ( θ ) = 2 | d (cid:62) ∆ (cid:62) ¨ (cid:96) ( θ ) − ∆ d | around a = 0, where ¨ (cid:96) ( θ ) = ∂ (cid:96) ( θ ) /∂ θ ∂ θ (cid:62) is evaluated at (cid:98) θ and ∆ = ∂ (cid:96) ( θ | ω ) /∂ θ ∂ ω (cid:62) is evaluated at (cid:98) θ and ω . Large values of C d ( θ ) indicate the sensitivityinduced by perturbation schemes in direction d . Cook (1986) suggests the local inﬂuence measure C d max ( θ ), evaluated in the direction d max corresponding to the eigenvector with maximal normalcurvature. If the i th component of d max is relatively large, this indicates that perturbations maylead to substantial changes in the ML estimates. Based on this approach, Lesaﬀre and Verbeke(1998) suggested the total local curvature corresponding to the i th element. This local inﬂuencemeasure is obtained by taking the direction d i , an n × i thposition. The total local curvature in direction d i assumes the form C i ( θ ) = 2 | ∆ (cid:62) i ¨ (cid:96) ( θ ) − ∆ i | ,where ∆ i = ∂ (cid:96) i ( θ | ω ) /∂ θ ∂ ω (cid:62) denotes the i th row of ∆ , such that (cid:96) i ( θ | ω ) is the contribution4f the i th individual to the log-likelihood function. ∆ i is evaluated at (cid:98) θ and ω , and it isrecommended to look at the index plot of C i ( θ ) to assess inﬂuential subjects. Usually, in theliterature, three types of perturbation schemes are considered for count data: case weights,explanatory variable, and dispersion parameter perturbation schemes. Case weights perturbation for the i th subject The log-likelihood function of the perturbed model takes the form (cid:96) ( θ | ω ) = n (cid:88) i =1 ω i (cid:110) log (cid:16) Γ( φ + y i + ) (cid:17) − log (cid:16) Γ( φ ) (cid:17) − m i (cid:88) j =1 log( y ij !) + φ log( φ ) − φ log( φ + µ i + ) + m i (cid:88) j =1 y ij log( µ ij ) − m i (cid:88) j =1 y ij log( φ + µ i + ) (cid:111) . The ∆ matrix in C d ( θ ) is given by ∆ (cid:62) = ( ∆ (cid:62) β , ∆ (cid:62) φ ) (cid:62) , with ∆ (cid:62) β = ( ∆ (cid:62) β , ∆ (cid:62) β , . . . , ∆ (cid:62) β p ) and ∆ β k = (∆ k , ∆ k , . . . , ∆ ki , . . . , ∆ kn ) , where∆ ki = m i (cid:88) j =1 (cid:40) y ij − ( (cid:98) φ + y i + )( (cid:98) φ + (cid:98) µ i + ) exp( x (cid:62) ij (cid:98) β ) (cid:41) x ijk = m i (cid:88) j =1 ( y ij − (cid:98) a i (cid:98) µ ij ) x ijk , and ∆ (cid:62) φ = (∆ φ , ∆ φ , . . . , ∆ φi , . . . , ∆ φn ), with∆ φi = ψ ( (cid:98) φ + y i + ) − ψ ( (cid:98) φ ) + log (cid:32) (cid:98) φ (cid:98) φ + (cid:98) µ i + (cid:33) + (cid:32) (cid:98) µ i + − y i + (cid:98) φ + (cid:98) µ i + (cid:33) , where ψ ( · ) is the digamma function, (cid:98) µ i + = (cid:80) m i j =1 (cid:98) µ ij , and (cid:98) µ ij = exp( x (cid:62) ij (cid:98) β ), for j = 1 , . . . , m i and i = 1 , . . . , n . Case weights perturbation for the j th measurement of the i th subject For this perturbation scheme, the log-likelihood function can be expressed as (cid:96) ( θ | ω ) = n (cid:88) i =1 m i (cid:88) j =1 ω ij (cid:110) m i log Γ( φ + y i + ) − m i log Γ( φ ) − log( y ij !)+ 1 m i φ log( φ ) − m i φ log( φ + µ i + ) + y ij log( µ ij ) − y ij log( φ + µ i + ) (cid:111) . The matrix ∆ = ( ∆ (cid:62) β , ∆ (cid:62) φ ) (cid:62) , in which element ij of ∆ (cid:62) β and ∆ (cid:62) φ , respectively, is given by∆ kij = y ij x ijk − (cid:98) φ/m i + y ij (cid:98) φ + (cid:98) µ i + m i (cid:88) j =1 (cid:98) µ ij x ijk , k = 1 , . . . , p and∆ φij = 1 m i ( ψ ( (cid:98) φ + y i + ) − ψ ( (cid:98) φ )) + 1 m i (1 + log( (cid:98) φ ) − log( (cid:98) φ + (cid:98) µ i + )) − (cid:98) φ/m i + y ij (cid:98) φ + (cid:98) µ i + . xplanatory variable perturbation We now consider an additive perturbation on a particular continuous explanatory variable, x ijk , j = 1 , . . . , m i , i = 1 , . . . , n, k = 1 , . . . , p , by setting x ∗ ijk = x ijk + ω i S x , where S x is ascale factor and ω i ∈ R . This perturbation scheme leads to the following expression for thelog-likelihood function: (cid:96) ( θ | ω ) = n (cid:88) i =1 (cid:40) log (cid:16) Γ( φ + y i + ) (cid:17) − log (cid:16) Γ( φ ) (cid:17) − m i (cid:88) j =1 log( y ij !) + φ log( φ ) − φ log( φ + µ ∗ i + ) + m i (cid:88) j =1 y ij log( µ ∗ ij ) − m i (cid:88) j =1 y ij log( φ + µ ∗ i + ) (cid:41) , where µ ∗ i + = (cid:80) m i j =1 µ ∗ ij , µ ∗ ij = exp( x ∗(cid:62) ij β ), x ∗(cid:62) ij β = β + β x ij + . . . + β k ( x ijk + ω i S x )+ . . . + β p x ijp ,and ω = (0 , . . . , (cid:62) . For k = 1 , . . . , p , if t = k , ∆ t is given by∆ ti = (cid:98) β t S x (cid:98) φ exp( x (cid:62) ij (cid:98) β )( (cid:98) φ + (cid:98) µ i + ) (cid:32) − (cid:98) µ i + (cid:98) φ + (cid:98) µ i + (cid:33) x ijt + S x exp( x (cid:62) ij (cid:98) β ) (cid:98) φ + (cid:98) µ i + + m i (cid:88) j =1 y ij (cid:104) S x + (cid:16) (cid:98) β t S x − (cid:98) µ ij (cid:17) x ijt (cid:105) + m i (cid:88) j =1 y ij exp( x (cid:62) ij (cid:98) β ) (cid:98) φ + (cid:98) µ i + × (cid:104) S x + ( (cid:98) β t S x − x ijt (cid:105) . If t (cid:54) = k , ∆ ti = (cid:98) β t S x (cid:98) φ exp( x (cid:62) ij (cid:98) β )( (cid:98) φ + (cid:98) µ i + ) (cid:32) − (cid:98) µ i + (cid:98) φ + (cid:98) µ i + (cid:33) x ijt + m i (cid:88) j =1 y ij x ijt × (cid:16) (cid:98) β t S x − (cid:98) µ ij (cid:17) + m i (cid:88) j =1 ( (cid:98) β t S x − (cid:98) φ + (cid:98) µ i + ) exp( x (cid:62) ij (cid:98) β ) y ij x ijt , and the i th element of ∆ (cid:62) φ is∆ φ i = (cid:98) β t S x (cid:98) µ i + ( (cid:98) φ + (cid:98) µ i + ) (cid:16) (cid:98) φ (cid:98) φ + (cid:98) µ i + − (cid:17) + (cid:98) β t S x m i (cid:88) j =1 y ij (cid:98) µ i + ( (cid:98) φ + (cid:98) µ i + ) . Dispersion parameter perturbation

Let φ ∗ i = ω i × φ be the dispersion parameter perturbation, where ω i ∈ R + . The log-likelihoodfunction of the perturbed model takes the form (cid:96) ( θ | ω ) = n (cid:88) i =1 (cid:40) log (cid:16) Γ( φ ∗ i + y i + ) (cid:17) − log (cid:16) Γ( φ ∗ i ) (cid:17) − m i (cid:88) j =1 log( y ij !) + φ ∗ i log( φ ∗ i ) − φ ∗ i log( φ ∗ i + µ i + ) + m i (cid:88) j =1 y ij log( µ ij ) − m i (cid:88) j =1 y ij log( φ ∗ i + µ i + ) (cid:41) . = ( ∆ (cid:62) β , ∆ (cid:62) φ ) (cid:62) , where the i -th elements of ∆ (cid:62) β and ∆ (cid:62) φ , respectively, are given by∆ k i = (cid:98) φ ∗ i (cid:98) µ i + ( y i + − (cid:98) µ i + )( (cid:98) φ ∗ i + (cid:98) µ i + ) x ijk and∆ φ i = ψ ( (cid:98) φ ∗ i + y i + ) − ψ ( (cid:98) φ ∗ i ) + (cid:98) φ ∗ i [ ψ (cid:48) ( (cid:98) φ ∗ i + y i + ) − ψ (cid:48) ( (cid:98) φ ∗ i )] + 1+ log (cid:32) (cid:98) φ ∗ i (cid:98) φ ∗ i + (cid:98) µ i + (cid:33) + (cid:98) µ i + (cid:98) φ ∗ i + (cid:98) µ i + + y i + (cid:98) µ i + ( (cid:98) φ ∗ i + (cid:98) µ i + ) + (cid:98) φ ∗ i (2 (cid:98) µ i + + (cid:98) φ ∗ i )( (cid:98) φ ∗ i + (cid:98) µ i + ) . For all the perturbation schemes, the ¨ (cid:96) ( θ ) matrix in C d ( θ ) is described in the Appendix. First, a simulation study is conducted to evaluate the asymptotic behavior of the ML estima-tor, (cid:98) θ . The vector y = ( y (cid:62) , . . . , y (cid:62) n ) (cid:62) , where y i ∼ MNB( µ i , φ ), is generated from a random in-tercept Poisson-GLG model with ( i ) y ij | b i ind ∼ Poisson( u ij ), ( ii ) log( u ij ) = β + β x ij + β x ij + b i ,and ( iii ) b i iid ∼ GLG(0 , λ, λ ) , where x ij ∼ N (0 ,

1) and x ij is a dummy variable with two levels,for i = 1 , . . . , n and j = 1 , ,

3. Further, it is assumed that β = (1 . , . , . (cid:62) and λ = φ − / inthe GLG distribution. The Bias, root mean square error (RMSE) and coverage probabilities forthe 95% conﬁdence level are obtained from R = 10 ,

000 Monte Carlo replications performed fora sample size of n = 50, 100, 150, and 200 and for three diﬀerent values of the dispersion pa-rameter, namely, φ = 3, 5 and 7. The estimates of the Bias and RMSE for θ are obtained from:Bias( θ s ) = θ s − θ s and RMSE( θ s ) = (cid:113)(cid:80) Rr =1 ( (cid:98) θ ( r ) s − θ s ) /R , respectively, where θ s = (cid:80) Rr =1 (cid:98) θ ( r ) s /R and (cid:98) θ ( r ) s the estimate from s th parameter obtained in r th Monte Carlo replication. These simu-lation results are presented in Table 1. As expected, the Bias and RMSE values of the regressioncoeﬃcients estimates decrease as the sample size and values of the φ parameter increase. Thecoverage probabilities are close to 0.95. Moreover, the results show that the dispersion parame-ter exhibits a desirable asymptotic behavior when φ assumes small values. Since φ = λ − , it ispossible to aﬃrm that the asymptotic properties of the dispersion parameter estimator are asso-ciated with the asymmetry of the GLG distribution. A second simulation study is performed toevaluate the impact of misspecifying the random eﬀect distribution on the ML estimates of theMNBR model. It is assumed that the random intercept is normally distributed and negativelycorrelated. For sample size n = 50 , , and 150, the vector y = ( y (cid:62) , . . . , y (cid:62) n ) (cid:62) is generatedfrom a random intercept Poisson-Normal distribution with the following hierarchical structure:( i ) y ij | b i ind ∼ Poisson( u ij ), ( ii ) log( u ij ) = β + β x ij + b i and ( iii ) b i iid ∼ N(0 , σ ), σ = φ − and ( i ) y ij | b ij ind ∼ Poisson( u ij ), ( ii ) log( u ij ) = β + β x ij + z ij b ij , and ( iii ) b ∼ N ( , Σ ) , where( β , β ) (cid:62) = (1 . , . x ij ∼ U (0 , Σ = (cid:34) φ − -0.5 -0.1-0.5 φ − -1.0-0.1 -1.0 φ − (cid:35) , and φ − = 4 .

0. In Table 2, we present the Bias and root mean square error (RMSE), calculatedby ﬁtting the MNBR model on 1,000 Monte Carlo replications for diﬀerent sample sizes. Basedon the assumption that b i iid ∼ N(0 , σ ), we observe that the parameter estimates of φ and β arebiased. To summarize, we conclude that the MNBR model provides inconsistent estimates ofthe asymptotic variance of the ML estimators when the covariance matrix of the random eﬀectis misspeciﬁed. 7able 1: Bias, root mean square error (RMSE), and coverage (%) of 95% conﬁdence intervals. φ ( λ ) Measure n φ β β β A third simulation study is conducted to evaluate the variance-to-mean ratio, VMR =Var( y ) / E( y ). Considering three diﬀerent values for the dispersion parameter, φ = 0 . , . φ , are (122 . , . . , .

35) and (5 . , . φ increases the VMR statistic decreasesand assumes values more than one. Thus, it is possible to conclude that the MNBR model isan overdispersion model. When the dispersion parameter is more than 50 similar results areobtained. 8able 2: Bias and root mean square error (RMSE) when the covariance matrix of the randomresponse variable is misspeciﬁed. Variance n Measure φ β β σ

50 Bias -0.4285 -1.8818 0.0561RMSE 0.4584 1.9805 1.0931100 Bias -0.3799 -1.9408 0.0497RMSE 0.3968 2.0068 0.9257150 Bias -0.3633 -1.9282 -0.0129RMSE 0.3750 1.9754 0.6798 Σ

50 Bias -0.4965 -1.8623 0.0068RMSE 0.5301 1.9669 1.0656100 Bias -0.4375 -1.9014 -0.0315RMSE 0.4560 1.9601 0.8784150 Bias -0.4196 -1.9228 -0.0372RMSE 0.4353 1.9953 0.8282

In this section, we present two examples illustrating the diagnostic tools for the MNBRmodel. The data sets and programs can be found in the

MNB package available on the R platform. Seizures data

The data set described in Diggle et al. (2013) refers to an experiment in which 59 epilepticpatients were randomly assigned to one of two treatment groups: treatment (progabide drug)and placebo groups. The number of seizures experienced by each patient during the baselineperiod (week eight) and the four consecutive periods (every two weeks) was recorded. The maingoal of this application is to analyze the drug eﬀect with respect to the placebo. Two dummycovariates are considered in this study; Group which assumes values equal to 1 if the patientbelongs to treatment group and 0 otherwise, and Period which assumes values equal to 1 ifthe number of seizures are recorded during the treatment and 0 if are measured in the baselineperiod. Taking into account the irregular measurement of rate seizures during the time, thevariable Time is considered as an oﬀset for ﬁtting the data, where Time assumes values equal8 if the number of seizures is observed in the baseline period and 2 otherwise. The individualproﬁles of the patients belonging to the placebo and progabide groups are shown in Figure 1 (a)and (b), respectively. The atypical individual proﬁles correspond to patient (cid:93) , , ,

25) ofthe placebo group, who presented a high number of seizures in the third visit compared to otherclinic visits, and patient (cid:93) , , ,

63) of the progabide group, who suﬀered a high numberof seizures in every clinic visit, indicating the ineﬀectiveness of the drug in patients with complexseizures. In both groups, it is possible to see the right skewness of the empirical distributionsof individual proﬁles. Figure 1(c) shows a normal probability plot with simulated envelope forthe Pearson residual (Faraway, 2016, p.135) computed by ﬁtting the Poisson regression modelto seizures data. We conﬁrm the occurrence of the overdispersion phenomenon. Thus, basedon Figure 1(a) - (c), which shows the asymmetric behavior of the empirical distribution ofindividual proﬁles, the MNBR model is proposed for modeling this behavior according to thefollowing structure: ( i ) y i ind ∼ MNB( µ i , φ ) and ( ii ) log( µ ij ) = β + β Group i + β Period ij + β (Group i × Period ij ) + log(Time ij ) , where y i = ( y i , y i , y i , y i ) (cid:62) , µ i = ( µ i , µ i , µ i , µ i ) (cid:62) , for i = 1 , . . . , β is the logarithm of the ratio of the average rate of the treatment groupto the placebo group at baseline, β is the logarithm of the ratio of the seizure mean aftertreatment period to before treatment period for the placebo group, and exp( β ) is the treatment9able 3: Bias, root mean square error (RMSE), coverage (%) of 95% conﬁdence intervals. φ Measure n φ β β β VMR0.1 Bias 50 0.0093 -0.1769 -0.0006 0.0575 (122.39,220.24)100 0.0051 -0.1201 -0.0004 0.0347150 0.0027 -0.0838 -0.0003 0.0310200 0.0020 -0.0331 -0.0001 0.0226RMSE 50 0.0299 0.7048 0.0707 0.9794100 0.0198 0.5105 0.0411 0.6742150 0.0151 0.3891 0.0298 0.5531200 0.0130 0.3203 0.0253 0.4468Coverage 50 95.4 91.3 95.0 91.1100 95.0 92.8 94.2 92.8150 95.3 93.7 94.9 92.9200 95.5 94.5 95.1 94.71.0 Bias 50 0.0979 -0.0201 0.0017 -0.0026 (19.38,29.35)100 0.0390 -0.0059 0.0000 -0.0189150 0.0317 -0.0045 0.0000 -0.0160200 0.0164 -0.0039 0.0000 -0.0006RMSE 50 0.2706 0.2188 0.0552 0.3046100 0.1627 0.1550 0.0380 0.2132150 0.1351 0.1249 0.0291 0.1712200 0.1065 0.1069 0.0244 0.1477Coverage 50 95.7 93.9 94.7 92.2100 95.4 94.3 95.6 93.5150 95.3 94.9 95.2 94.1200 95.1 95.2 94.9 94.750 Bias 50 36.1329 -0.0034 0.0000 0.0022 (5.85,16.87)100 17.2463 -0.0030 0.0000 0.0015150 11.0381 -0.0009 0.0000 0.0011200 9.1020 -0.0008 0.0000 0.0021RMSE 50 73.4328 0.0621 0.0332 0.0739100 45.1901 0.0433 0.0272 0.0519150 30.6868 0.0367 0.0213 0.0454200 25.7042 0.0312 0.0206 0.0374Coverage 50 95.0 95.0 93.2 95.3100 94.2 95.5 94.1 95.4150 94.0 93.9 94.0 94.1200 95.7 93.4 94.1 94.5 eﬀect, and it is the ratio of post- to pre-treatment mean seizure ratios between treatment andplacebo groups. The parameter estimates obtained by using the fit.MNB function from the

MNB package are shown in Table 4. It is not observed evidence of the treatment eﬀect. The shapeparameter estimate indicates that the variability of individual proﬁles with respect their averageis asymmetric to the left with dispersion equal to 1.607 ( λ = 1 / √ φ = 0 . (cid:93)

49 is atypical. The qMNB and envelope.MNB functions from the

MNB package isused to display the graphics of the randomized quantile residual.The global inﬂuential graphics presented in Figure 3(a) and (b) reveal that patients (cid:93) (cid:93)

25 can impact the ML estimates of the MNBR model when they are removed from themodel. The local and total local inﬂuential graphics exhibited in Figure 4(a) and (b) indicate thesensitivity of the estimate associated with patient (cid:93)

49 when the minor perturbation is induced10 imes S e i z u r e s c oun t (a) Times S e i z u r e s c oun t (b) −3 −2 −1 0 1 2 3 N(0,1) quantile P ea r s on r e s i dua l (c)Figure 1: Individual proﬁles of the patients in (a) the placebo group, (b) the progabide group,and (c) the simulated envelope plot of the Pearson residuals.Table 4: Parameter estimates with their respective approximate standard errors (Std. error),z-values, and p -values for the MNBR model ﬁtted to the epileptic data. Parameter Estimate Std. error z-value p -value φ − − β < . β β β -0.105 0.065 -1.610 0.107 λ − Index R ando m i z ed quan t il e r e s i dua l (a) −2 −1 0 1 2 − − N(0,1) quantile R ando m i z ed quan t il e r e s i dua l (b)Figure 2: Index plot of the randomized quantile residuals (a) and (b) the simulated envelopeplot of the randomized quantile residuals. 11

10 20 30 40 50 60 . . . . Index G ene r a li z ed C oo k D i s t an c e (a) Index | L i k e li hood d i s p l a c e m en t |

25 49 (b)Figure 3: Index plot of (a) the generalized Cook distance and (b) the likelihood displacement. . . . . . . . Index | d m a x | (a) Index C i (b)Figure 4: Index plot of | d max | for the case weight perturbation scheme: (a) local inﬂuence and(b) total local inﬂuence on the i th patient.in the directions of d max and d i , respectively. According to Figure 5(a) and (b), the number ofepileptic seizures of patient (cid:93)

49 recorded on the ﬁrst and last clinic visits of the trial are moreinﬂuential. The dispersion perturbation scheme did not show evidence of any possible inﬂuentialobservations.Finally, the percentage relative deviations

P RD =[( (cid:98) θ − (cid:98) θ ∗ ) / (cid:98) θ ] × (cid:98) θ ∗ is the estimatorof θ obtained after deleting one or more atypical subjects, are calculated and the results are12

50 100 150 200 250 300 . . . . . . Index | d m a x | (49,1) (a) Index C i (49,1) (b)Figure 5: Index plot of | d max | for case weight perturbation scheme: (a) local inﬂuence and (b)total inﬂuence of the j th measurement of each patient.presented in Table 5. We can observe signiﬁcatively changes are associate with the estimates of β and β . Noting that the signal of the estimate β is indicating that the treatment decreasesthe number of seizures, its descriptive-level keeps nonsigniﬁcant though. For the parameter β we observe that the interaction eﬀect incresed 50% and that its descriptive-level becomessigniﬁcant.Table 5: Parameter estimates, standard errors (Std. error), p -values, and percentage relativedeviations ( P RD ). Dropping Parameter Estimate Std. error p -value P RD (%) φ − -28.21 β < .

000 0.00 (cid:93) β -0.107 0.189 0.573 487.95 β β -0.302 0.070 < .

000 -188.73

Accident data

Karlis (2003) provides a data set which refers to the number of car accidents in 24 centralroads in Athens, for a time period of 1987 to 1991. In this paper, we are interested in modelingthe number of car accidents per road length during the period of 5 years. Thus, the dummycovariate years is considered for ﬁtting the data including the road length as an oﬀset. Figure6(a) presents the individual proﬁles of the central roads for the number of car accidents recordedon each road over time. In general, the highest number of car accidents occurred in 1989 andthe smallest in 1991. The empirical distribution of individual proﬁles is skewed to the right.Figure 6(b) reveals that

P atision and

P eiraios roads are considered as outliers and evidence ofvariability between the individual proﬁles and across the years is suggested in Figure 6(c).13 ears N u m be r o f a cc i den t s (a) Years N u m be r o f a cc i den t s PatisionPeiraios Patision (b) −2 −1 0 1 2 − N(0,1) quantile P ea r s on r e s i dua l (c)Figure 6: (a) Individual proﬁles, (b) box-plots of the number of car accidents on central Athensroads during the 5 years, and (c) simulated envelope plot of Pearson residuals.In the ﬁtting the Poisson regression model to the accidents data, Figure 6(c), we observe theoccurrence of the overdispersion phenomenon. Based on Figure 6(a) - (b) showing the asymmet-ric behavior of the empirical distribution of individual proﬁles, we propose the MNBR modelwith the following structure for modeling the data: ( i ) y i ind ∼ MNB( µ i , φ ) and ( ii ) log( µ ij ) = β + β t year j + log(length i ), where y i = ( y i , y i , y i , y i , y i ) (cid:62) , µ i = ( µ i , µ i , µ i , µ i , µ i ) (cid:62) , t = 1 , , , β t is the logarithm of the ratio of the average car accidents per road length for year t to year 1987. The ML estimates for the MNBR model are presented in Table 6. We observethe estimates are signiﬁcant and that the small diﬀerence in the logarithm of the rate averageof the number of car accidents per length of the roads occurred between the years the 1987 and1991.Table 6: Parameter estimates, standard errors (Std. error), z-values, and p -values for the MNBRmodel ﬁtted to the accidents data.Parameter Estimate Std. error z-value p -value φ − β < . β < . β < . β < . β -0.322 0.063 -5.112 < . Leof. Kavalas , Leof. Athinon and

P anepistimiou can impact the ML estimates when they are removed.

Leof. Kavalas (4,6,12,9,3) and

P anepistimiou (24, 58, 40, 36, 05) are small roads, 2.0 and 1.1 kilometerslong, respectively.

Leof. Athinon (15,11,16,21,28) is a large road, 6.1 kilometers long. Fur-thermore, two roads that can impact the ML estimates are showed in Figure 8(b),

P atision (80,108,114,113,86) and

Katehaki (2,3,27,24,7), which are 4.1 and 1.4 kilometers long, respec-tively. In Figure 9(a) and (b) are showed inﬂuential roads,

P eraios (86 89 109 90 49) and

Aharnon (44, 79, 91, 88, 33), which are 8.0 and 5.5 kilometers long, respectively.14

10 15 20 − − Index R ando m i z ed quan t il e r e s i dua l (a) −2 −1 0 1 2 − − − N(0,1) quantile R ando m i z ed quan t il e r e s i dua l (b)Figure 7: Index plot of the randomized quantile residuals (a) and (b) the simulated envelopeplot of the randomized quantile residuals. . . . . . . . Index G ene r a li z ed C oo k D i s t an c e Leof.KavalasLeof.Athinon Panepistimiou (a)

Index | L i k e li hood d i s p l a c e m en t | Katehaki Leof.AthinonPanepistimiouPatision (b)Figure 8: Index plot of (a) the generalized Cook distance and (b) the likelihood displacementfor the number of car accidents on central Athenian roads in 1989-1991.Finally, the percentage relative deviations (

P RD ) are calculated and we verify that themost signiﬁcant changes in the ML estimates are associated with the removal of the inﬂuentialroads

P atision and

P eiraios . The results in Table 7 show that the estimates are aﬀected when

P atision and

P eiraios are removed simultaneously.15

10 15 20 . . . . . Index | d m a x | Aharnon PatisionPeiraios (a)

Index C i Aharnon PatisionPeiraios (b)Figure 9: Index plot of | d max | for the case weight perturbation scheme: (a) local inﬂuence and(b) total local inﬂuence for the number of car accidents on central Athenian roads in 1989-1991. . . . . . . . . Index | d m a x | (Patision,3)(Peiraios,3) (a) Index C i (Patision,5) (b)Figure 10: Index plot of | d max | for the case weight perturbation scheme: (a) local inﬂuence and(b) total local inﬂuence on j th measurement of the number of car accidents for each atypicalcentral Athenian road. In this study, we developed diagnostic tools for the MNBR model derived from the Poissonmixed model where the GLG distribution is assumed for the random eﬀect. Simulation resultsexhibited the features of the MNBR model and its association with the hierarchical model, which16

10 15 20 . . . . . . . Index | d m a x | Leof.KavalasLeof.Athinon (a) . . . . . Index C i Leof.Athinon Panepistimiou (b)Figure 11: Index plot of | d max | for the dispersion perturbation scheme, (a) local inﬂuence and(b) total local inﬂuence for the number of car accident on central Athenian roads of 1989-1991.Table 7: Parameter estimates, standard errors (Std. Error), z-values, p -values, and P RD (%)when

P atision road is deleted from the accidents data.Dropping Parameter Estimate Std. Error z-value p -value P RD (%)

P atision φ − − − β < .

001 2.27 β < .

001 -2.65 β < .

001 -5.36 β < .

001 -3.86 β -0.399 0.069 -5.765 < .

001 -23.97

P eiraios φ − − − β < .

001 1.54 β < .

001 -12.63 β < .

001 -8.23 β < .

001 -11.98 β -0.287 0.067 -4.257 < .

001 10.91

P eiraios and P atision φ − − − β < .

001 4.24 β < .

001 -17.95 β < .

001 -15.76 β < .

001 -18.47 β -0.370 0.075 -4.924 < .

001 -14.91was essential for explaining that the asymptotic consistency of the estimators of the regressioncoeﬃcients and the dispersion parameter depends on the asymmetry of the GLG distribution. Asexpected, the MNBR model provides inconsistent estimates of the asymptotic variance of the MLestimators when the covariance matrix of the response variable is misspeciﬁed. The randomized17uantile residuals can be used to assess possible departures of the data from the MNBR modelassumptions. Following the approaches of Cook (1977, 1986) and Lesaﬀre and Verbeke (1998),global and local measures for the MNBR model were derived and implemented in the authors’

MNB package. The application of the

MNB package to two data sets was presented and it wasshown that the asymmetric behavior of the empirical distributions of the individual proﬁlescan indicate the need to use the multivariate model to better ﬁt these proﬁles. The proposedmethodology was helpful for identifying outlying subjects that matter in the MNBR model overtime and handle the overdispersion phenomenon. The code for installing the

MNB package ispresented in the Appendix.

Acknowledgements

The work of the fourth author is partially funded by CNPq, Brazil. We also thank anonymousreferees for constructive comments and suggestions.

References

Cook, R. D. (1977). Detection of inﬂuential observations in linear regression.

Technometrics , , 15–18. 2, 4, 18Cook, R. D. (1986). Assessment of local inﬂuence (with discussion). Journal of the RoyalStatistical Society Series B , , 133–169. 2, 4, 18Cook, R. D. and Weisberg, S. (1983). Residuals and Inﬂuence in Regresion . Chapman and Hall,New York. 2Diggle, P. J., Liang, K. Y., and Zeger, S. L. (2013).

Analysis of Longitudinal Data . OxfordUniversity Press, N.Y., 2 edition. 9Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals.

Journal of Computationaland Graphical Statistics , , 236–244. 2, 4Fabio, L., Paula, G. A., and de Castro, M. (2012). A Poisson mixed model with nonormalrandom eﬀect distribution. Computational Statistics and Data Analysis , , 1499–1510. 1, 2,3Faraway, F. (2016). Extending the Linear Model with R: Generalized Linear, Mixed Eﬀects andnonparametric regression models . Chapman & Hall/CRC, New York. 9Hilbe, J. M. (2011).

Negative Binomial Regression . Cambridge, United Kingdom. 3Johnson, N., Kotz, S., and Balakrishnan, N. (1997).

Discrete Multivariate Distributions . Wiley,New York. 3Karlis, D. (2003). An EM algorithm for multivariate Poisson distribution and related models.

Journal of Applied Statistics , , 63–77. 13Lawless, J. (1987). Negative binomial and mixed Poisson regression. The Canadian Journal ofStatistics , , 209–225. 2Lawless, J. F. (2002). Statistical Models and Methods for Lifetime Data . Wiley-Interscience,New York. 2Lee, Y., Nelder, J. A., and Pawitan, Y. (2006).

Generalized linear models with random eﬀects.Uniﬁed analysis via H-likelihood . Chapman and Hall/CRC, Boca Raton. 1Lesaﬀre, E. and Verbeke, G. (1998). Local inﬂuence in linear mixed models.

Biometrics , ,570–582. 2, 4, 18Molenberghs, G. and Verbeke, G. (2010). Models for Discrete Longitudinal Data . Springer, NewYork. 2, 3Molenberghs, G., Verbeke, G., and Dem´etrio, C. G. B. (2007). An extended random-eﬀects18pproach to modeling repeated, overdispersed count data.

Lifetime Data Analysis , , 513–531. 1Moller, J. and Rubak, E. (2010). A model for positively correlated count variables. InternationalStatistical Review , , 65–80. 2R Core Team (2017). R: A Language and Environment for Statistical Computing . R Foundationfor Statistical Computing, Vienna, Austria. 2Solis-Trapala, I. and Farewell, V. (2005). Regression analysis of overdispersed correlated countdata with subject speciﬁc covariates.

Statistics in Medicine , , 2557–2575. 2Sung, Y. and Lee, K. (2018). Negative binomial loglinear mixed models with general randomeﬀects covariance matrix. Communications for Statistical Applications and Methods , , 61–70. 2Tsui, K.-W. (1986). Multiparameter estimation for some multivariate discrete distributions withpossibly dependent components. Annals of the Institute of Statistical Mathematics , , 45–56.3 Appendix1. The score vector and the observed Fisher information matrix

The score vector U θ = ( U (cid:62) β , U φ ) (cid:62) is obtained by deriving the log-likelihood function (2) withrespect to the parameters β and φ , respectively. Thus, U β = n (cid:88) i =1 X (cid:62) i ( y i − a i µ i ) and U φ = n (cid:88) i =1 (cid:26) ψ ( φ + y i + ) − ψ ( φ ) + log (cid:18) φφ + µ i + (cid:19) + (cid:18) µ i + − y i + φ + µ i + (cid:19) (cid:27) , (3)where a i = ( φ + y i + ) / ( φ + µ i + ) , ψ ( · ) is the digamma function, and X i is an m i × p matrix withrow x (cid:62) ij for i = 1 , . . . , n and j = 1 , . . . , m i . However, using the fact that Γ( φ + y i + ) / Γ( φ ) = φ ( φ + 1)( φ + 2) . . . ( φ + y i + −

1) in (3), U φ takes the following form U φ = n (cid:88) i =1  y i + − (cid:88) j =0 ( j + φ ) − − y i + φ + µ i + − log(1 + φ − µ i + ) + µ i + ( φ + µ i + )  . The observed information matrix is obtained by deriving the score vector with respect to β and φ . Thus, − ¨ (cid:96) ( θ ) = (cid:20) − ¨ (cid:96) ββ − ¨ (cid:96) βφ − ¨ (cid:96) φβ − ¨ (cid:96) φφ , (cid:21) where ¨ (cid:96) ββ = − n (cid:88) i =1 ( φ + y i + )( φ + µ i + ) m i (cid:88) j =1 x ij x (cid:62) ij µ ij + n (cid:88) i =1 ( φ + y i + )( φ + µ i + ) m i (cid:88) j =1 x ij µ ij m i (cid:88) j =1 x (cid:62) ij µ ij , ¨ (cid:96) βφ = ¨ (cid:96) φβ = − m i (cid:88) j =1 x ijk µ ij ( µ i + − y i + )( φ + µ i + ) and¨ (cid:96) φφ = − n (cid:88) i =1  y ∗ i (cid:88) s =0 s + φ ) + φ − µ i + ( φ + µ i + ) − µ i + ( φ + µ i + ) − y i + ( φ + µ i + )  . . How to install and use the MNB package

We present the R platform code used to install the MNB package. require(devtools) devtools::install_github("carrascojalmar/MNB") require(MNB) data(seizures) star <-list(phi=1, beta0=1, beta1=1, beta2=1, beta3 =1) mod <- fit.MNB(formula=Y ~ trt + period + trt:period + offset(log(weeks)), star=star , dataSet=seizures ,tab=FALSE) mod par <- mod $ par names(par)<-c() res.q <- qMNB(par=par ,formula=Y ~ trt + period + trt:period + offset(log(weeks)),dataSet=seizures) plot(res.q,ylim=c(-3,4.5),ylab="Randomized quantile residual", xlab="Index",pch=15,cex.lab = 1.5, cex = 0.6, bg = 5) abline(h=c(-2,0,2),lty=3) envelope.MNB(formula=Y ~ trt + period + trt:period + offset(weeks),star=star ,nsim=21,n.r=6, dataSet=seizures ,plot=TRUE) global.MNB(formula=Y ~ trt + period + trt:period + offset(log(weeks)),star=star ,dataSet=seizures ,plot=TRUE) local.MNB(formula=Y ~ trt + period + trt:period + offset(log(weeks)),star=star ,dataSet=seizures , schemes="weight",plot=TRUE) require(MNB) accident <- read.table("...\\ accident.txt",h=TRUE) temp <- apply(accident [,2:6],1,t) Y <- matrix(temp ,ncol=1) length.x <- rep(accident $ length ,each=5) ind <- rep(1:24,each=5) year <- as.factor(rep(1:5 ,24)) accident2 <- data.frame(Y,year ,length.x,ind) star=list(phi=10, beta1=2, beta2=0.4, beta3= 0.6, beta4=0.5, beta5 =-0.3) mod <- fit.MNB(Y ~ year + offset(log(length.x)),star=star ,dataSet=accident2 ,tab=FALSE) mod par <- mod $ par names(par)<-c() qMNB(par=par ,formula=Y ~ year + offset(log(length.x)),dataSet=accident2) envelope.MNB(formula=Y ~ year + offset(log(length.x)),star=star ,nsim=21,n.r=6, dataSet=accident2 ,plot=TRUE) global.MNB(Y ~ year + offset(log(length.x)),star=star ,dataSet=accident2 ,plot=TRUE) local.MNB(Y ~ year + offset(log(length.x)),star=star ,dataSet=accident2 , schemes="weight",plot=TRUE)schemes="weight",plot=TRUE)