Bias Reduction as a Remedy to the Consequences of Infinite Estimates in Poisson and Tobit Regression
Susanne Köll, Ioannis Kosmidis, Christian Kleiber, Achim Zeileis
Susanne Köll^a, Ioannis Kosmidis^b, Christian Kleiber^c, Achim Zeileis^a,∗

^a Faculty of Economics and Statistics, Universität Innsbruck, Austria
^b Department of Statistics, University of Warwick & The Alan Turing Institute, United Kingdom
^c Faculty of Business and Economics, Universität Basel, Switzerland
Abstract
Data separation is a well-studied phenomenon that can cause problems in estimation and inference from binary-response models. Complete or quasi-complete separation occurs when there is a combination of regressors in the model whose value can perfectly predict one or both outcomes. In such cases, and such cases only, the maximum likelihood estimates and the corresponding standard errors are infinite. It is less widely known that the same can happen in further microeconometric models. One of the few works in the area is Santos Silva and Tenreyro (2010), who note that the finiteness of the maximum likelihood estimates in Poisson regression depends on the data configuration and propose a strategy to detect and overcome the consequences of data separation. However, their approach can lead to notable bias in the parameter estimates when the regressors are correlated. We illustrate how bias-reducing adjustments to the maximum likelihood score equations can overcome the consequences of separation in Poisson and Tobit regression models.
Keywords:
Bias reduction, Data separation, Shrinkage
JEL:
C13, C24, C25, C52
1. Sources of separation in regression models
Suppose that the non-negative random variable $y_i$ has a distribution with a point mass at zero. Suppose that the distribution function of $y_i$ is $F(\cdot; \mu_i, \phi)$ ($i = 1, \ldots, n$), where the scalar parameter $\mu_i$ is a centrality measure (e.g., the mean), and the parameter $\phi$ represents higher-order characteristics of the distribution (e.g., dispersion). A regression model can be formulated as

$$y_i \sim F(\cdot; \mu_i, \phi), \quad (1)$$
$$\mu_i = h(x_i^\top \beta) \quad (i = 1, \ldots, n), \quad (2)$$

where $x_i$ is a vector of regressors with $\dim(x_i) = p$, which is observed along with $y_i$, and $h(\cdot)$ is a monotonically increasing function that links $\mu_i$ to $x_i$ and a parameter vector $\beta$. The model specification in (1) and (2) covers a range of models, including models for binary, multinomial, ordinal, and count responses, models for limited dependent variables such as the Tobit model and its extensions, and zero-inflated and two-part or hurdle models.

∗ Corresponding author
Email addresses: [email protected] (Susanne Köll), [email protected] (Ioannis Kosmidis), [email protected] (Christian Kleiber), [email protected] (Achim Zeileis)

Note that the discussion here extends to the case where the support of the response is bounded below or above. If the lower boundary is a constant $b \neq 0$, we can use $y_i - b$. Similarly, if the upper boundary is $b$, we can use $b - y_i$.

The existence of a point mass at zero implies that $f(0; \mu_i, \phi) = F(0; \mu_i, \phi)$, where $f(\cdot; \mu_i, \phi)$ is the density or probability mass function corresponding to $F(\cdot; \mu_i, \phi)$. The simplest but arguably often-encountered occurrence of data separation in practice is when there is a regressor $x_{i,k} \in \{0, 1\}$ such that $y_i = 0$ for all $i \in \{1, \ldots, n\}$ with $x_{i,k} = 1$. Assuming that $y_1, \ldots, y_n$ are independent conditionally on $x_1, \ldots, x_n$, the log-likelihood $\ell(\beta, \phi)$ for the model defined by (1) and (2) can be decomposed as

$$\ell(\beta, \phi) = \sum_{x_{i,k} = 0} \log f(y_i; h(x_{i,-k}^\top \beta_{-k}), \phi) \quad (3)$$
$$\phantom{\ell(\beta, \phi) =\;} + \sum_{x_{i,k} = 1} \log F(0; h(x_{i,-k}^\top \beta_{-k} + \beta_k), \phi), \quad (4)$$

where $a_{-k}$ indicates the sub-vector formed from a vector $a$ after omitting its $k$-th component.

Term (3) is exactly the log-likelihood without the $k$-th regressor and based only on the observations with $x_{i,k} = 0$. Under the extra assumption that $F(0; \mu_i, \phi)$ is monotonically decreasing in $\mu_i$ (which is true, for example, in Poisson and Tobit regression models), $\beta_k$ will diverge to $-\infty$ during maximization, so that (4) achieves its maximum value of 0. Then, the maximization of term (3) with respect to $\beta_{-k}$ yields the maximum likelihood (ML) estimate $\hat\beta_{-k}$. So, the ML estimate of $\beta_{-k}$ will be the same as the ML estimate obtained by maximizing the log-likelihood without the $k$-th regressor over the subset of observations with $x_{i,k} = 0$.

Santos Silva and Tenreyro (2010) show for Poisson regression that the same situation can occur more generally, when separation occurs for a certain linear combination of regressors.
Our discussion above extends their considerations beyond log-link models and Poisson regression.
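The divergence mechanism described above is easy to reproduce numerically. The following Python sketch (hypothetical toy data; the paper itself works in R) profiles the Poisson log-likelihood in the coefficient of a separating binary regressor and shows that it increases monotonically as that coefficient tends to $-\infty$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x_k = np.zeros(n)
x_k[:10] = 1                     # binary regressor x_{i,k}
y = rng.poisson(2.0, size=n).astype(float)
y[x_k == 1] = 0.0                # separation: y_i = 0 whenever x_{i,k} = 1

def loglik(beta0, beta_k):
    # Poisson log-likelihood with log link, up to an additive constant
    mu = np.exp(beta0 + beta_k * x_k)
    return np.sum(y * np.log(mu) - mu)

# Fix the intercept at its ML value from the x_{i,k} = 0 subsample (term (3))
# and profile over beta_k: the log-likelihood keeps increasing as
# beta_k -> -infinity, so the ML estimate of beta_k is formally infinite.
b0 = np.log(y[x_k == 0].mean())
vals = [loglik(b0, bk) for bk in (-1.0, -5.0, -10.0, -20.0)]
assert all(b > a for a, b in zip(vals, vals[1:]))
```

The contribution of the separated observations is $-\exp(\beta_0 + \beta_k)$, which increases towards its supremum 0 as $\beta_k$ decreases, exactly as in the decomposition (3)-(4).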
2. Estimating regression models with separated data
Albert and Anderson (1984) showed that infinite estimates in multinomial logistic regression occur if and only if there is data separation. Since then, the consequences of infinite estimates for estimation and inference have been well studied for binomial and multinomial responses.

A popular remedy in the statistics literature is to replace the ML estimator with shrinkage estimators that are guaranteed to take finite values (see, for example, Gelman et al., 2008, for the use of shrinkage priors in the estimation of binary regression models). Probably the most widely used estimator of this kind comes from the solution of the bias-reducing adjusted score equations in Firth (1993) (see, for example, Heinze and Schemper 2002 and Zorn 2005 for accessible detailed accounts), which guarantee estimators with smaller asymptotic bias than the ML estimator typically has (Firth, 1993; Kosmidis and Firth, 2009).

In contrast, the majority of methods that have been put forward in the econometrics literature are typically based on omitting the regressors that are responsible for the infinite estimates. Such practice can be problematic, as we discuss in the following sections.
Santos Silva and Tenreyro (2010) show that the regressors responsible for separation in Poisson models can be easily identified by running a least squares regression on the non-boundary observations and checking for perfect collinearity among the regressors. The same strategy is also applicable to Tobit regression models.

Having identified the collinear regressors associated with separation, Santos Silva and Tenreyro (2010) propose to simply omit those and re-estimate the model using the full data set with all $n$ observations. The same strategy is also adopted in Cameron and Trivedi (2013, Chapter 6.2), who suggest dropping the separating regressor from the binary part of a count data hurdle model.

However, this strategy only leads to consistent estimates if the omitted regressors are, in fact, not relevant, or were constructed specifically to indicate a zero response (e.g., in the artificial data set used in the illustrations of Santos Silva and Tenreyro, 2011). In contrast, when a highly informative regressor is omitted, separation will be replaced by a systematic misspecification of the model (Heinze and Schemper, 2002; Zorn, 2005). In that situation, consistent estimates can be obtained by omitting not only the regressor but also the observations responsible for separation, i.e., considering only the first term (3) in the likelihood and dropping (4).

Kosmidis and Firth (2020) have formally shown that, in logit regression models with full-rank model matrix, the bias-reduced (BR) estimators coming from the adjusted score equations in Firth (1993) (i) always have finite value and (ii) shrink towards zero in the direction of maximizing the Fisher information about the parameters. There are also strong empirical findings that the finiteness of the BR estimator extends beyond logit models.

A desirable feature of the bias-reducing adjustments to the score functions is that they are asymptotically dominated by the score functions.
As a result, inference that relies on the BR estimates (Wald tests, information criteria, etc.) can be performed as usual by simply using the BR estimates in place of the ML estimates. This makes BR estimation a rather attractive alternative approach for dealing with separation, without omitting regressors.

While bias reduction is a well-established remedy for data separation in binary regression models, it is less well known that it is also effective in more general settings such as generalized nonlinear models (Kosmidis and Firth, 2009) and, as illustrated here, the models in Section 1.
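To make the adjusted-score idea concrete, here is a minimal Python sketch of bias-reduced Poisson regression with log link. It solves Firth-type adjusted score equations of the form $\sum_i (y_i + h_i/2 - \mu_i)\, x_i = 0$ by Fisher scoring (the function name and toy data are illustrative; the paper's computations use the R packages brglm2 and brtobit):

```python
import numpy as np

def br_poisson(X, y, n_iter=200, tol=1e-8):
    """Bias-reduced Poisson regression (log link), solving the Firth-type
    adjusted score equations  sum_i (y_i + h_i/2 - mu_i) x_i = 0
    by Fisher scoring."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        XtWX = X.T @ (X * mu[:, None])          # expected information
        XtWX_inv = np.linalg.inv(XtWX)
        # hat values: h_i = mu_i * x_i' (X'WX)^{-1} x_i
        h = mu * np.einsum("ij,jk,ik->i", X, XtWX_inv, X)
        step = XtWX_inv @ (X.T @ (y + h / 2 - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Separated toy data: y_i = 0 whenever the binary regressor equals 1,
# so the ML estimate of its coefficient would be minus infinity.
rng = np.random.default_rng(1)
n = 100
x2 = rng.binomial(1, 0.3, size=n).astype(float)
X = np.column_stack([np.ones(n), x2])
y = rng.poisson(2.0, size=n).astype(float)
y[x2 == 1] = 0.0
beta_br = br_poisson(X, y)
print(beta_br)  # both components finite; the second is moderately negative
```

The adjustment $h_i/2$ acts like adding half a leverage-weighted observation to each count, which is what keeps the estimate of the separating coefficient finite.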
3. Illustration
Similarly to Santos Silva and Tenreyro (2011), we consider models with intercept $x_{i,1} = 1$ and regressors $x_{i,2}$ and $x_{i,3}$ ($i = 1, \ldots, n$). The values for $x_{i,2}$ are generated from a uniform distribution; the values for $x_{i,3}$ are then generated from Bernoulli distributions as $x_{i,3} \sim B(\pi)$ if $x_{i,2}$ exceeds a threshold and $x_{i,3} \sim B(1 - \pi)$ otherwise, in order to allow for correlation between the two regressors. The responses for the Poisson model are generated from (1) using $h(x_i^\top \beta) = \exp(x_i^\top \beta)$ and the Poisson distribution for $F$ (with known dispersion $\phi = 1$). The responses for the Tobit model are generated from a latent normal distribution $N(x_i^\top \beta, \phi)$ with variance $\phi = 2$ and subsequent censoring by setting all negative responses to 0.

For illustration purposes, we generate a single artificial data set involving $n = 100$ regressor values and coefficients $\beta_1 = 1$, $\beta_2 = 1$, and $\beta_3 = -10$. In both cases, separation occurs due to the extreme value for the coefficient of $x_{i,3}$. In the Appendix, we carry out a thorough simulation study with 10,000 data sets for a range of combinations of $n$ and $\pi$ and with $\beta_3 = -3$.

We compare ML and bias-reduced (BR) estimation using all $n = 100$ observations, and ML estimation of the reduced model after omitting $x_{i,3}$, either by using just the subset of the data set with $x_{i,3} = 0$ (ML/sub) or all $n = 100$ observations as proposed by Santos Silva and Tenreyro (2010) (ML/SST).

The bias-reducing adjusted score equations for the Poisson regression are $\sum_{i=1}^{n} (y_i + h_i/2 - \mu_i)\, x_i = 0_p$, where $0_p$ is a $p$-vector of zeros and $h_i = x_i^\top (X^\top W X)^{-1} x_i\, \mu_i$ with $W = \mathrm{diag}\{\mu_1, \ldots, \mu_n\}$ (Firth, 1992). They are solved with the brglmFit method from the R package brglm2 (Kosmidis, 2020). For the Tobit model we derived the adjusted score equations along with an implementation in the R package brtobit (Köll et al., 2021). The derivations are tedious but not complicated and are provided in the Appendix.

Table 1: Comparison of different approaches when dealing with separation in a Poisson model. N is the number of observations used. [Columns ML, BR, ML/sub, ML/SST; rows with coefficient estimates, standard errors, log-likelihood, and N. The numerical entries are not recoverable from this extraction.]

Table 2: Comparison of different approaches when dealing with separation in a Tobit model. N is the number of observations used. [Same layout as Table 1; the numerical entries are not recoverable from this extraction.]

Tables 1 and 2 show the results from estimating the Poisson and Tobit models, respectively, with the four different strategies. The following remarks can be made:

• Standard ML estimation using all observations leads to a large negative estimate of $\beta_3$ with an even larger standard error. (The estimates for $\beta_3$ and the corresponding standard errors are formally infinite; the displayed finite values result from stopping the iterations early according to the convergence criteria used during maximization of the likelihood, and stricter convergence criteria will result in estimates and standard errors that diverge further.) As a result, a standard Wald test finds no evidence against the hypothesis that $x_3$ should not be in the model, despite the fact that $x_3$ is, with $\beta_3 = -10$, perhaps the most influential regressor.

• The ML/sub strategy, i.e., estimating the model without $x_3$ using only the observations with $x_{i,3} = 0$, yields exactly the same estimates as ML because it optimizes term (3) after setting (4) to zero.

• Compared to ML and ML/sub, BR has the advantage of returning a finite estimate and standard error for $\beta_3$. Hence a Wald test can be directly used to examine the evidence against $\beta_3 = 0$. The other parameter estimates and the log-likelihood are close to ML. Similarly to binary response models, bias reduction here slightly shrinks the parameter estimates of $\beta_1$ and $\beta_2$ towards zero.

• Finally, the estimates from ML/SST, where regressor $x_3$ is omitted and all observations are used, appear to be far from the values we used to generate the data. This is due to the fact that $x_3$ is not only highly informative but also correlated with $x_2$.

Moreover, the simulation experiments in the Appendix provide evidence that the BR estimates are always finite and result in Wald-type intervals with better coverage.

References
Albert, A., Anderson, J.A., 1984. On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71, 1–10.
Amemiya, T., 1973. Regression analysis when the dependent variable is truncated normal. Econometrica 41, 997–1016.
Cameron, A.C., Trivedi, P.K., 2013. Regression Analysis of Count Data. 2nd ed., Cambridge University Press, New York.
Firth, D., 1992. Bias reduction, the Jeffreys prior and GLIM, in: Fahrmeir, L., Francis, B., Gilchrist, R., Tutz, G. (Eds.), Advances in GLIM and Statistical Modelling: Proceedings of the GLIM 92 Conference, Munich. Springer, New York, pp. 91–100.
Firth, D., 1993. Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.
Gelman, A., Jakulin, A., Pittau, M.G., Su, Y.S., 2008. A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics 2, 1360–1383.
Gourieroux, C., 2000. Econometrics of Qualitative Dependent Variables. Cambridge University Press, Cambridge.
Heinze, G., Schemper, M., 2002. A solution to the problem of separation in logistic regression. Statistics in Medicine 21, 2409–2419.
Köll, S., Kosmidis, I., Kleiber, C., Zeileis, A., 2021. brtobit: Bias-Reduced Tobit Regression Models. R package version 0.1-1/r1146. URL: https://R-Forge.R-project.org/projects/topmodels/
Kosmidis, I., 2020. brglm2: Bias Reduction in Generalized Linear Models. R package version 0.6.2. URL: https://CRAN.R-project.org/package=brglm2
Kosmidis, I., Firth, D., 2009. Bias reduction in exponential family nonlinear models. Biometrika 96, 793–804.
Kosmidis, I., Firth, D., 2010. A generic algorithm for reducing bias in parametric estimation. Electronic Journal of Statistics 4, 1097–1112.
Kosmidis, I., Firth, D., 2020. Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. Biometrika.
R Core Team, 2020. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.
Santos Silva, J.M.C., Tenreyro, S., 2010. On the existence of the maximum likelihood estimates in Poisson regression. Economics Letters 107, 310–312.
Santos Silva, J.M.C., Tenreyro, S., 2011. Poisson: Some convergence issues. Stata Journal 11, 207–212.
Zorn, C., 2005. A solution to separation in binary response models. Political Analysis 13, 157–170.

Appendix A. Bias-reducing adjusted score functions for Tobit regression

The Tobit model is one of the classic models of microeconometrics. Fundamental results were obtained by Amemiya (1973). A detailed account of basic properties is available in, e.g., Gourieroux (2000). Here we provide the building blocks for bias-reduced estimation of the Tobit model.

Denote by $\ell(\theta)$ the log-likelihood function for a Tobit regression model with full-rank, $n \times p$ model matrix $X$ with rows the $p$-vectors $x_1, \ldots, x_n$, and a $(p+1)$-vector of parameters $\theta = (\beta^\top, \phi)^\top$ with regression parameters $\beta$ and variance $\phi$. Then, up to an additive constant,
$$\ell(\theta) = \sum_{i=1}^{n} \left[ (1 - d_i) \log(1 - F_i) - d_i (\log \phi)/2 - d_i (y_i - \eta_i)^2/(2\phi) \right],$$
where $d_i = 1$ if $y_i > 0$, $d_i = 0$ if $y_i \leq 0$, $\eta_i = x_i^\top \beta$, and $F_i$ is the standard normal distribution function evaluated at $\eta_i/\sqrt{\phi}$.
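For concreteness, the log-likelihood above can be evaluated directly. The following Python sketch (illustrative data and helper names; the paper's actual implementation is the R package brtobit) codes $\ell(\theta)$ for a Tobit model censored at zero:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    # standard normal distribution function via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tobit_loglik(beta, phi, X, y):
    """Tobit log-likelihood (censoring at 0, latent N(x'beta, phi)),
    up to the additive constant dropped in the text."""
    eta = X @ beta
    d = (y > 0).astype(float)
    F = np.array([norm_cdf(e / np.sqrt(phi)) for e in eta])
    return np.sum((1 - d) * np.log(1 - F)
                  - d * np.log(phi) / 2
                  - d * (y - eta) ** 2 / (2 * phi))

# Toy censored data from the latent-normal data generating process
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
ystar = X @ beta_true + rng.normal(scale=np.sqrt(2.0), size=n)
y = np.maximum(ystar, 0.0)          # negative latent responses censored to 0

ll_true = tobit_loglik(beta_true, 2.0, X, y)
ll_off = tobit_loglik(np.array([2.0, -1.0]), 2.0, X, y)
print(ll_true, ll_off)  # the data-generating parameters score higher
```

Censored observations enter through $\log(1 - F_i)$ only, uncensored ones through the Gaussian terms, mirroring the two sums in the decomposition of Section 1.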
The score vector is
$$s(\theta) = \nabla \ell(\theta) = \begin{bmatrix} s_\beta(\theta) \\ s_\phi(\theta) \end{bmatrix}, \quad
s_\beta(\theta) = \sum_{i=1}^{n} \left\{ \frac{(d_i - 1)\,\lambda_i}{\sqrt{\phi}} + \frac{d_i (y_i - \eta_i)}{\phi} \right\} x_i, \quad
s_\phi(\theta) = \sum_{i=1}^{n} \left\{ \frac{(1 - d_i)\,\lambda_i \eta_i}{2\phi^{3/2}} - \frac{d_i}{2\phi} + \frac{d_i (y_i - \eta_i)^2}{2\phi^2} \right\},$$
where $\lambda_i = f_i/(1 - F_i)$ and $f_i$ is the density function of the standard normal distribution at $\eta_i/\sqrt{\phi}$.

The observed information matrix, $j(\theta) = -\nabla\nabla^\top \ell(\theta)$, has the form
$$j(\theta) = \begin{bmatrix} j_{\beta\beta}(\theta) & j_{\beta\phi}(\theta) \\ j_{\phi\beta}(\theta) & j_{\phi\phi}(\theta) \end{bmatrix},$$
where, setting $\nu_i = f_i/(1 - F_i)^2$,
$$j_{\beta\beta}(\theta) = \sum_{i=1}^{n} \left[ \frac{(1 - d_i)\,\nu_i}{\sqrt{\phi}} \left\{ \frac{f_i}{\sqrt{\phi}} - \frac{(1 - F_i)\,\eta_i}{\phi} \right\} + \frac{d_i}{\phi} \right] x_i x_i^\top,$$
$$j_{\beta\phi}(\theta) = \sum_{i=1}^{n} \left[ \frac{(d_i - 1)\,\nu_i}{2\phi^{3/2}} \left\{ \frac{\eta_i f_i}{\sqrt{\phi}} - \frac{(1 - F_i)\,\eta_i^2}{\phi} + 1 - F_i \right\} + \frac{d_i (y_i - \eta_i)}{\phi^2} \right] x_i,$$
$$j_{\phi\beta}(\theta) = j_{\beta\phi}(\theta)^\top,$$
$$j_{\phi\phi}(\theta) = \sum_{i=1}^{n} \left[ \frac{(1 - d_i)\,\nu_i}{4\phi^{5/2}} \left\{ \frac{f_i \eta_i^2}{\sqrt{\phi}} - \frac{(1 - F_i)\,\eta_i^3}{\phi} + 3 (1 - F_i)\,\eta_i \right\} - \frac{d_i}{2\phi^2} + \frac{d_i (y_i - \eta_i)^2}{\phi^3} \right].$$

As shown in Kosmidis and Firth (2010), a BR estimator for $\theta$ results as the solution of the adjusted score equations $s(\theta) + A(\theta) = 0_{p+1}$, where the vector $A(\theta)$ has $t$-th component $A_t(\theta) = \mathrm{tr}[\{i(\theta)\}^{-1} \{P_t(\theta) + Q_t(\theta)\}]/2$ ($t = 1, \ldots, p + 1$). In the above expression, $Q_t(\theta) = -\mathrm{E}(j(\theta) s_t(\theta))$ and $P_t(\theta) = \mathrm{E}(s(\theta) s^\top(\theta) s_t(\theta))$, where $i(\theta) = \mathrm{E}(j(\theta))$ is the expected information matrix.

The R package brtobit implements $i(\theta)$, $Q_t(\theta)$, and $P_t(\theta)$, and solves the bias-reducing adjusted score equations for general Tobit regressions using the quasi Fisher-scoring scheme proposed in Kosmidis and Firth (2010).

The matrices $i(\theta)$, $Q_t(\theta)$ and $P_t(\theta)$ have the same block structure as $j(\theta)$ and, directly by their definition, closed-form expressions for their blocks result by taking expectations of the appropriate products of blocks of $s(\theta)$ and $j(\theta)$. By direct inspection of the expressions for $s(\theta)$ and $j(\theta)$, the required expectations result by noting that $\mathrm{E}(d_i^m) = F_i$, $\mathrm{E}((1 - d_i)^m) = 1 - F_i$, $\mathrm{E}(d_i^m (1 - d_i)^l) = 0$, $\mathrm{E}(d_i^m (1 - d_i)^l (y_i - \eta_i)^k) = 0$, and by computing $\mathrm{E}(d_i^m (y_i - \eta_i)^l)$ ($k, l, m = 1, 2, \ldots$), with $\mathrm{E}(d_i^m (y_i - \eta_i)^l) = F_i\, \mathrm{E}((y_i - \eta_i)^l \mid y_i > 0)$.
The required truncated moments are
$$\mathrm{E}(y_i - \eta_i \mid y_i > 0) = \sqrt{\phi}\,\xi_i,$$
$$\mathrm{E}((y_i - \eta_i)^2 \mid y_i > 0) = \phi - \sqrt{\phi}\,\eta_i \xi_i,$$
$$\mathrm{E}((y_i - \eta_i)^3 \mid y_i > 0) = \sqrt{\phi}\,\xi_i (\eta_i^2 + 2\phi),$$
$$\mathrm{E}((y_i - \eta_i)^4 \mid y_i > 0) = 3\phi^2 - \eta_i^3 \sqrt{\phi}\,\xi_i - 3\phi^{3/2} \eta_i \xi_i,$$
$$\mathrm{E}((y_i - \eta_i)^5 \mid y_i > 0) = \sqrt{\phi}\,\eta_i^4 \xi_i + 4\phi^{3/2} \xi_i (\eta_i^2 + 2\phi),$$
$$\mathrm{E}((y_i - \eta_i)^6 \mid y_i > 0) = -\eta_i \sqrt{\phi}\,\xi_i (\eta_i^4 + 5\eta_i^2 \phi + 15\phi^2) + 15\phi^3,$$
where $\xi_i = f_i/F_i$.

The expected information is
$$i(\theta) = \begin{bmatrix} \mathrm{E}(j_{\beta\beta}(\theta)) & \mathrm{E}(j_{\beta\phi}(\theta)) \\ \mathrm{E}(j_{\phi\beta}(\theta)) & \mathrm{E}(j_{\phi\phi}(\theta)) \end{bmatrix},$$
with
$$\mathrm{E}(j_{\beta\beta}(\theta)) = -\frac{1}{\phi} \sum_{i=1}^{n} \left\{ \frac{\eta_i f_i}{\sqrt{\phi}} - \lambda_i f_i - F_i \right\} x_i x_i^\top,$$
$$\mathrm{E}(j_{\beta\phi}(\theta)) = \frac{1}{2\phi^{3/2}} \sum_{i=1}^{n} f_i \left\{ \frac{\eta_i^2}{\phi} + 1 - \frac{\lambda_i \eta_i}{\sqrt{\phi}} \right\} x_i,$$
$$\mathrm{E}(j_{\phi\beta}(\theta)) = \mathrm{E}(j_{\beta\phi}(\theta))^\top,$$
$$\mathrm{E}(j_{\phi\phi}(\theta)) = -\frac{1}{4\phi^2} \sum_{i=1}^{n} \left\{ \frac{f_i \eta_i^3}{\phi^{3/2}} + \frac{f_i \eta_i}{\sqrt{\phi}} - \frac{\lambda_i f_i \eta_i^2}{\phi} - 2 F_i \right\}.$$

Furthermore, for $t \in \{1, \ldots, p\}$,
$$Q_t(\theta) = -\begin{bmatrix} \mathrm{E}(j_{\beta\beta} s_{\beta_t}) & \mathrm{E}(j_{\beta\phi} s_{\beta_t}) \\ \mathrm{E}(j_{\beta\phi} s_{\beta_t})^\top & \mathrm{E}(j_{\phi\phi} s_{\beta_t}) \end{bmatrix} \quad \text{and} \quad P_t(\theta) = \begin{bmatrix} \mathrm{E}(s_\beta s_\beta^\top s_{\beta_t}) & \mathrm{E}(s_\beta s_\phi s_{\beta_t}) \\ \mathrm{E}(s_\beta s_\phi s_{\beta_t})^\top & \mathrm{E}(s_\phi s_\phi s_{\beta_t}) \end{bmatrix},$$
and for $t = p + 1$,
$$Q_{p+1}(\theta) = -\begin{bmatrix} \mathrm{E}(j_{\beta\beta} s_\phi) & \mathrm{E}(j_{\beta\phi} s_\phi) \\ \mathrm{E}(j_{\beta\phi} s_\phi)^\top & \mathrm{E}(j_{\phi\phi} s_\phi) \end{bmatrix} \quad \text{and} \quad P_{p+1}(\theta) = \begin{bmatrix} \mathrm{E}(s_\beta s_\beta^\top s_\phi) & \mathrm{E}(s_\beta s_\phi s_\phi) \\ \mathrm{E}(s_\beta s_\phi s_\phi)^\top & \mathrm{E}(s_\phi s_\phi s_\phi) \end{bmatrix}.$$

[The closed-form expressions for the individual expectations $\mathrm{E}(j_{\cdot\cdot}\, s_\cdot)$ and $\mathrm{E}(s_\cdot s_\cdot s_\cdot)$ are too garbled in this extraction to be recovered reliably; they follow from the truncated moments above and are implemented in the brtobit package.]

Appendix B. Simulation

The aim of the simulation experiment is to compare the performance of the BR and ML estimator in count and limited dependent variable models with varying probabilities of infinite ML estimates. The comparison here is in terms of bias, variance, and empirical coverage of nominally 95% Wald-type confidence intervals based on the asymptotic normality of the estimators. Our results were obtained using R (R Core Team, 2020). Random variables were generated using the default methods for the relevant distributions, which in turn rely on uniform random numbers obtained by the Mersenne Twister, currently R's default generator.

The same data generating process as in Section 3 of the main paper is considered, with the coefficient of the binary regressor $x_3$ set to the less extreme value $\beta_3 = -3$. The amount of correlation between $x_2$ and $x_3$ varies with $\pi$, which takes five values such that increasing the value of $\pi$ leads to decreasing the probability of infinite estimates. Five sample sizes $n$ are considered, the smallest being $n = 25$. For each combination of $\pi$ and $n$, 10,000 independent samples are simulated, and the parameters of the Poisson and Tobit regression models in Section 3 are estimated using maximum likelihood and bias reduction. The estimates are then used to compute simulation-based estimates of the bias, variance, and coverage probability for $\beta_3$.

For the ML estimator, the bias, variance, and coverage probabilities are computed conditionally on the finiteness of the ML estimates. We classify an ML estimate as infinite if the corresponding estimated standard error exceeds 20. In effect, we are assuming that if the standard error exceeds 20, the Fisher scoring iteration for ML stopped while moving along an asymptote on the log-likelihood surface, hence at a point where the inverse negative Hessian has at least one massive diagonal element. The heuristic value 20 is conservative even for $n = 25$. This has been verified through a pilot simulation study to estimate the variance of the reduced-bias estimator, which has the same asymptotic distribution as the ML estimator. No convergence issues were encountered, and the maximum estimated standard error of the reduced-bias estimators across simulation settings, parameters, and sample sizes was less than 9.

Figures B.1, B.2, B.3, and B.4 show the probability that the ML estimates of $\beta_3$ are infinite, the estimated bias, the estimated variance, and the estimated coverage probability of 95% Wald-type confidence intervals, respectively, for the Poisson model. Figures B.5, B.6, B.7, and B.8 show the corresponding results for the Tobit model. The results for Poisson and Tobit regression lead to similar insights:

• Bias reduction via adjusted score functions always yields finite estimates.

• The BR estimator has bias close to zero even for small sample sizes.

• Wald-type confidence intervals based on BR estimates have good coverage properties.

• The variances of the BR and ML estimator get closer to each other and closer to zero as $n$ increases.
This is exactly what the theory suggests, because the score functions asymptotically dominate the bias-reducing adjustments.

Figure B.1: Probability of infinite estimates for $\beta_3$ (Poisson).
Figure B.2: Bias of estimates for $\beta_3$ (Poisson).
Figure B.3: Variance of estimates for $\beta_3$ (Poisson).
Figure B.4: Coverage of 95% Wald-type confidence intervals for $\beta_3$ (Poisson).
Figure B.5: Probability of infinite estimates for $\beta_3$ (Tobit).
Figure B.6: Bias of estimates for $\beta_3$ (Tobit).
Figure B.7: Variance of estimates for $\beta_3$ (Tobit).
Figure B.8: Coverage of 95% Wald-type confidence intervals for $\beta_3$ (Tobit).

[Each figure plots unconditional and conditional summaries for BR and ML against $n$, for five values of $\pi$; the plotted curves themselves are not recoverable from this extraction.]
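As a quick numerical check on the truncated-normal moments used in Appendix A, the first two conditional moments can be confirmed by Monte Carlo simulation. A Python sketch (independent of the paper's R code, with illustrative parameter values and simulation size):

```python
import numpy as np
from math import erf, exp, sqrt, pi

# Check E(y - eta | y > 0) = sqrt(phi) * xi  and
#       E((y - eta)^2 | y > 0) = phi - sqrt(phi) * eta * xi,
# where xi = f/F with f, F the standard normal density and
# distribution function at eta/sqrt(phi).
eta, phi = 0.7, 2.0
z = eta / sqrt(phi)
f = exp(-z * z / 2.0) / sqrt(2.0 * pi)
F = 0.5 * (1.0 + erf(z / sqrt(2.0)))
xi = f / F

rng = np.random.default_rng(3)
ystar = eta + sqrt(phi) * rng.normal(size=2_000_000)  # latent responses
r = ystar[ystar > 0] - eta                            # uncensored residuals

print(r.mean(), sqrt(phi) * xi)                 # agree up to Monte Carlo error
print((r ** 2).mean(), phi - sqrt(phi) * eta * xi)
```

The same simulation approach extends to the higher-order moments needed for $P_t(\theta)$ and $Q_t(\theta)$.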