Bias Reduction as a Remedy to the Consequences of Infinite Estimates in Poisson and Tobit Regression
Susanne Köll, Ioannis Kosmidis, Christian Kleiber, Achim Zeileis
Susanne Köll^a, Ioannis Kosmidis^b, Christian Kleiber^c, Achim Zeileis^a,∗

^a Faculty of Economics and Statistics, Universität Innsbruck, Austria
^b Department of Statistics, University of Warwick & The Alan Turing Institute, United Kingdom
^c Faculty of Business and Economics, Universität Basel, Switzerland
Abstract
Data separation is a well-studied phenomenon that can cause problems in estimation and inference from binary-response models. Complete or quasi-complete separation occurs when there is a combination of regressors in the model whose value can perfectly predict one or both outcomes. In such cases, and such cases only, the maximum likelihood estimates and the corresponding standard errors are infinite. It is less widely known that the same can happen in further microeconometric models. One of the few works in the area is Santos Silva and Tenreyro (2010), who note that the finiteness of the maximum likelihood estimates in Poisson regression depends on the data configuration and propose a strategy to detect and overcome the consequences of data separation. However, their approach can lead to notable bias in the parameter estimates when the regressors are correlated. We illustrate how bias-reducing adjustments to the maximum likelihood score equations can overcome the consequences of separation in Poisson and Tobit regression models.
Keywords:
Bias reduction, Data separation, Shrinkage
JEL:
C13, C24, C25, C52
1. Sources of separation in regression models
Suppose that the non-negative random variable $y_i$ has a distribution with a point mass at zero. Suppose that the distribution function of $y_i$ is $F(\cdot; \mu_i, \phi)$ ($i = 1, \ldots, n$), where the scalar parameter $\mu_i$ is a centrality measure (e.g., the mean), and the parameter $\phi$ represents higher-order characteristics of the distribution (e.g., dispersion). A regression model can be formulated as

$$y_i \sim F(\cdot; \mu_i, \phi), \quad (1)$$
$$\mu_i = h(x_i^\top \beta) \quad (i = 1, \ldots, n), \quad (2)$$

where $x_i$ is a vector of regressors with $\dim(x_i) = p$, which is observed along with $y_i$, and $h(\cdot)$ is a monotonically increasing function that links $\mu_i$ to $x_i$ and a parameter vector $\beta$. The model specification in (1) and (2) covers a range of models, including models for binary, multinomial, ordinal, and count responses, models for limited dependent variables such as the Tobit model and its extensions, and zero-inflated and two-part or hurdle models.

∗ Corresponding author
Email addresses: [email protected] (Susanne Köll), [email protected] (Ioannis Kosmidis), [email protected] (Christian Kleiber), [email protected] (Achim Zeileis)

Note that the discussion here extends to the case where the support of the response is bounded below or above. If the lower boundary is a constant $b \neq 0$, we can use $y_i - b$. Similarly, if the upper boundary is $b$, we can use $b - y_i$.

The existence of a point mass at zero implies that $f(0; \mu_i, \phi) = F(0; \mu_i, \phi)$, where $f(\cdot; \mu_i, \phi)$ is the density or probability mass function corresponding to $F(\cdot; \mu_i, \phi)$. The simplest but arguably often-encountered occurrence of data separation in practice is when there is a regressor $x_{i,k} \in \{0, 1\}$ such that $y_i = 0$ for all $i \in \{1, \ldots, n\}$ with $x_{i,k} = 1$. Assuming that $y_1, \ldots, y_n$ are independent conditionally on $x_1, \ldots, x_n$, the log-likelihood $\ell(\beta, \phi)$ for the model defined by (1) and (2) can be decomposed as

$$\ell(\beta, \phi) = \sum_{x_{i,k} = 0} \log f(y_i; h(x_{i,-k}^\top \beta_{-k}), \phi) \quad (3)$$
$$\phantom{\ell(\beta, \phi) =\;} + \sum_{x_{i,k} = 1} \log F(0; h(x_{i,-k}^\top \beta_{-k} + \beta_k), \phi), \quad (4)$$

where $a_{-k}$ indicates the sub-vector formed from a vector $a$ after omitting its $k$-th component.

Term (3) is exactly the log-likelihood without the $k$-th regressor and based only on the observations with $x_{i,k} = 0$. Under the extra assumption that $F(0; \mu_i, \phi)$ is monotonically decreasing in $\mu_i$ (which is true, for example, in Poisson and Tobit regression models), $\beta_k$ will diverge to $-\infty$ during maximization, so that (4) achieves its maximum value of 0. Then, the maximization of term (3) with respect to $\beta_{-k}$ yields the maximum likelihood (ML) estimate $\hat\beta_{-k}$. So, the ML estimate of $\beta_{-k}$ will be the same as the ML estimate obtained by maximizing the log-likelihood without the $k$-th regressor over the subset of observations with $x_{i,k} = 0$.

Santos Silva and Tenreyro (2010) show for Poisson regression that the same situation can occur more generally, when separation occurs for a certain linear combination of regressors.
Our discussion above extends their considerations beyond log-link models and Poisson regression.
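The divergence mechanism described above is easy to reproduce numerically. The following Python sketch (hypothetical toy data; the paper itself works in R) profiles the Poisson log-likelihood in the coefficient of a separating binary regressor and shows that it increases monotonically as that coefficient tends to $-\infty$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x_k = np.zeros(n)
x_k[:10] = 1                     # binary regressor x_{i,k}
y = rng.poisson(2.0, size=n).astype(float)
y[x_k == 1] = 0.0                # separation: y_i = 0 whenever x_{i,k} = 1

def loglik(beta0, beta_k):
    # Poisson log-likelihood with log link, up to an additive constant
    mu = np.exp(beta0 + beta_k * x_k)
    return np.sum(y * np.log(mu) - mu)

# Fix the intercept at its ML value from the x_{i,k} = 0 subsample (term (3))
# and profile over beta_k: the log-likelihood keeps increasing as
# beta_k -> -infinity, so the ML estimate of beta_k is formally infinite.
b0 = np.log(y[x_k == 0].mean())
vals = [loglik(b0, bk) for bk in (-1.0, -5.0, -10.0, -20.0)]
assert all(b > a for a, b in zip(vals, vals[1:]))
```

The contribution of the separated observations is $-\exp(\beta_0 + \beta_k)$, which increases towards its supremum 0 as $\beta_k$ decreases, exactly as in the decomposition (3)-(4).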
2. Estimating regression models with separated data
Albert and Anderson (1984) showed that infinite estimates in multinomial logistic regression occur if and only if there is data separation. Since then, the consequences of infinite estimates for estimation and inference have been well studied for binomial and multinomial responses.

A popular remedy in the statistics literature is to replace the ML estimator with shrinkage estimators that are guaranteed to take finite values (see, for example, Gelman et al., 2008, for the use of shrinkage priors in the estimation of binary regression models). Probably the most widely used estimator of this kind comes from the solution of the bias-reducing adjusted score equations in Firth (1993) (see, for example, Heinze and Schemper 2002 and Zorn 2005 for accessible detailed accounts), which guarantee estimators with smaller asymptotic bias than the ML estimator typically has (Firth, 1993; Kosmidis and Firth, 2009).

In contrast, the majority of methods that have been put forward in the econometrics literature are typically based on omitting the regressors that are responsible for the infinite estimates. Such practice can be problematic, as we discuss in the following sections.
Santos Silva and Tenreyro (2010) show that the regressors responsible for separation in Poisson models can be easily identified by running a least squares regression on the non-boundary observations and checking for perfect collinearity among the regressors. The same strategy is also applicable to Tobit regression models.

Having identified the collinear regressors associated with separation, Santos Silva and Tenreyro (2010) propose to simply omit those and re-estimate the model using the full data set with all $n$ observations. The same strategy is also adopted in Cameron and Trivedi (2013, Chapter 6.2), who suggest dropping the separating regressor from the binary part of a count data hurdle model.

However, this strategy only leads to consistent estimates if the omitted regressors are, in fact, not relevant, or were constructed specifically to indicate a zero response (e.g., in the artificial data set used in the illustrations of Santos Silva and Tenreyro, 2011). In contrast, when a highly informative regressor is omitted, separation will be replaced by a systematic misspecification of the model (Heinze and Schemper, 2002; Zorn, 2005). In that situation, consistent estimates can be obtained by omitting not only the regressor but also the observations responsible for separation, i.e., considering only the first term (3) in the likelihood and dropping (4).

Kosmidis and Firth (2020) have formally shown that, in logit regression models with full-rank model matrix, the bias-reduced (BR) estimators coming from the adjusted score equations in Firth (1993) (i) always have finite value and (ii) shrink towards zero in the direction of maximizing the Fisher information about the parameters. There are also strong empirical findings that the finiteness of the BR estimator extends beyond logit models.

A desirable feature of the bias-reducing adjustments to the score functions is that they are asymptotically dominated by the score functions.
As a result, inference that relies on the BR estimates (Wald tests, information criteria, etc.) can be performed as usual by simply using the BR estimates in place of the ML estimates. This makes BR estimation a rather attractive alternative approach for dealing with separation, without omitting regressors.

While bias reduction is a well-established remedy for data separation in binary regression models, it is less well known that it is also effective in more general settings such as generalized nonlinear models (Kosmidis and Firth, 2009) and, as illustrated here, the models in Section 1.
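To make the adjusted-score idea concrete, here is a minimal Python sketch of bias-reduced Poisson regression with log link. It solves Firth-type adjusted score equations of the form $\sum_i (y_i + h_i/2 - \mu_i)\, x_i = 0$ by Fisher scoring (the function name and toy data are illustrative; the paper's computations use the R packages brglm2 and brtobit):

```python
import numpy as np

def br_poisson(X, y, n_iter=200, tol=1e-8):
    """Bias-reduced Poisson regression (log link), solving the Firth-type
    adjusted score equations  sum_i (y_i + h_i/2 - mu_i) x_i = 0
    by Fisher scoring."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        XtWX = X.T @ (X * mu[:, None])          # expected information
        XtWX_inv = np.linalg.inv(XtWX)
        # hat values: h_i = mu_i * x_i' (X'WX)^{-1} x_i
        h = mu * np.einsum("ij,jk,ik->i", X, XtWX_inv, X)
        step = XtWX_inv @ (X.T @ (y + h / 2 - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Separated toy data: y_i = 0 whenever the binary regressor equals 1,
# so the ML estimate of its coefficient would be minus infinity.
rng = np.random.default_rng(1)
n = 100
x2 = rng.binomial(1, 0.3, size=n).astype(float)
X = np.column_stack([np.ones(n), x2])
y = rng.poisson(2.0, size=n).astype(float)
y[x2 == 1] = 0.0
beta_br = br_poisson(X, y)
print(beta_br)  # both components finite; the second is moderately negative
```

The adjustment $h_i/2$ acts like adding half a leverage-weighted observation to each count, which is what keeps the estimate of the separating coefficient finite.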
3. Illustration
Similarly to Santos Silva and Tenreyro (2011), we consider models with intercept $x_{i,1} = 1$ and regressors $x_{i,2}$ and $x_{i,3}$ ($i = 1, \ldots, n$). The values for $x_{i,2}$ are generated from a uniform distribution; the values for $x_{i,3}$ are then generated from Bernoulli distributions as $x_{i,3} \sim B(\pi)$ if $x_{i,2}$ exceeds a threshold and $x_{i,3} \sim B(1 - \pi)$ otherwise, in order to allow for correlation between the two regressors. The responses for the Poisson model are generated from (1) using $h(x_i^\top \beta) = \exp(x_i^\top \beta)$ and the Poisson distribution for $F$ (with known dispersion $\phi = 1$). The responses for the Tobit model are generated from a latent normal distribution $N(x_i^\top \beta, \phi)$ with variance $\phi = 2$ and subsequent censoring by setting all negative responses to 0.

For illustration purposes, we generate a single artificial data set involving $n = 100$ regressor values and coefficients $\beta_1 = 1$, $\beta_2 = 1$, and $\beta_3 = -10$. In both cases, separation occurs due to the extreme value for the coefficient of $x_{i,3}$. In the Appendix, we carry out a thorough simulation study with 10,000 data sets for a range of combinations of $n$ and $\pi$ and with $\beta_3 = -3$.

We compare ML and bias-reduced (BR) estimation using all $n = 100$ observations, and ML estimation of the reduced model after omitting $x_{i,3}$, either by using just the subset of the data set with $x_{i,3} = 0$ (ML/sub) or all $n = 100$ observations as proposed by Santos Silva and Tenreyro (2010) (ML/SST).

The bias-reducing adjusted score equations for the Poisson regression are $\sum_{i=1}^{n} (y_i + h_i/2 - \mu_i)\, x_i = 0_p$, where $0_p$ is a $p$-vector of zeros and $h_i = x_i^\top (X^\top W X)^{-1} x_i\, \mu_i$ with $W = \mathrm{diag}\{\mu_1, \ldots, \mu_n\}$ (Firth, 1992). They are solved with the brglmFit method from the R package brglm2 (Kosmidis, 2020). For the Tobit model we derived the adjusted score equations along with an implementation in the R package brtobit (Köll et al., 2021). The derivations are tedious but not complicated and are provided in the Appendix.

Table 1: Comparison of different approaches when dealing with separation in a Poisson model. N is the number of observations used. [Columns ML, BR, ML/sub, ML/SST; rows with coefficient estimates, standard errors, log-likelihood, and N. The numerical entries are not recoverable from this extraction.]

Table 2: Comparison of different approaches when dealing with separation in a Tobit model. N is the number of observations used. [Same layout as Table 1; the numerical entries are not recoverable from this extraction.]

Tables 1 and 2 show the results from estimating the Poisson and Tobit models, respectively, with the four different strategies. The following remarks can be made:

• Standard ML estimation using all observations leads to a large negative estimate of $\beta_3$ with an even larger standard error. (The estimates for $\beta_3$ and the corresponding standard errors are formally infinite; the displayed finite values result from stopping the iterations early according to the convergence criteria used during maximization of the likelihood, and stricter convergence criteria will result in estimates and standard errors that diverge further.) As a result, a standard Wald test finds no evidence against the hypothesis that $x_3$ should not be in the model, despite the fact that $x_3$ is, with $\beta_3 = -10$, perhaps the most influential regressor.

• The ML/sub strategy, i.e., estimating the model without $x_3$ using only the observations with $x_{i,3} = 0$, yields exactly the same estimates as ML because it optimizes term (3) after setting (4) to zero.

• Compared to ML and ML/sub, BR has the advantage of returning a finite estimate and standard error for $\beta_3$. Hence a Wald test can be directly used to examine the evidence against $\beta_3 = 0$. The other parameter estimates and the log-likelihood are close to ML. Similarly to binary response models, bias reduction here slightly shrinks the parameter estimates of $\beta_1$ and $\beta_2$ towards zero.

• Finally, the estimates from ML/SST, where regressor $x_3$ is omitted and all observations are used, appear to be far from the values we used to generate the data. This is due to the fact that $x_3$ is not only highly informative but also correlated with $x_2$.

Moreover, the simulation experiments in the Appendix provide evidence that the BR estimates are always finite and result in Wald-type intervals with better coverage.

References
Albert, A., Anderson, J.A., 1984. On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71, 1–10.
Amemiya, T., 1973. Regression analysis when the dependent variable is truncated normal. Econometrica 41, 997–1016.
Cameron, A.C., Trivedi, P.K., 2013. Regression Analysis of Count Data. 2nd ed., Cambridge University Press, New York.
Firth, D., 1992. Bias reduction, the Jeffreys prior and GLIM, in: Fahrmeir, L., Francis, B., Gilchrist, R., Tutz, G. (Eds.), Advances in GLIM and Statistical Modelling: Proceedings of the GLIM 92 Conference, Munich. Springer, New York, pp. 91–100.
Firth, D., 1993. Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.
Gelman, A., Jakulin, A., Pittau, M.G., Su, Y.S., 2008. A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics 2, 1360–1383.
Gourieroux, C., 2000. Econometrics of Qualitative Dependent Variables. Cambridge University Press, Cambridge.
Heinze, G., Schemper, M., 2002. A solution to the problem of separation in logistic regression. Statistics in Medicine 21, 2409–2419.
Köll, S., Kosmidis, I., Kleiber, C., Zeileis, A., 2021. brtobit: Bias-Reduced Tobit Regression Models. R package version 0.1-1/r1146. URL: https://R-Forge.R-project.org/projects/topmodels/
Kosmidis, I., 2020. brglm2: Bias Reduction in Generalized Linear Models. R package version 0.6.2. URL: https://CRAN.R-project.org/package=brglm2
Kosmidis, I., Firth, D., 2009. Bias reduction in exponential family nonlinear models. Biometrika 96, 793–804.
Kosmidis, I., Firth, D., 2010. A generic algorithm for reducing bias in parametric estimation. Electronic Journal of Statistics 4, 1097–1112.
Kosmidis, I., Firth, D., 2020. Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. Biometrika.
R Core Team, 2020. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.
Santos Silva, J.M.C., Tenreyro, S., 2010. On the existence of the maximum likelihood estimates in Poisson regression. Economics Letters 107, 310–312.
Santos Silva, J.M.C., Tenreyro, S., 2011. Poisson: Some convergence issues. Stata Journal 11, 207–212.
Zorn, C., 2005. A solution to separation in binary response models. Political Analysis 13, 157–170.

Appendix A. Bias-reducing adjusted score functions for Tobit regression

The Tobit model is one of the classic models of microeconometrics. Fundamental results were obtained by Amemiya (1973). A detailed account of basic properties is available in, e.g., Gourieroux (2000). Here we provide the building blocks for bias-reduced estimation of the Tobit model.

Denote by $\ell(\theta)$ the log-likelihood function for a Tobit regression model with full-rank, $n \times p$ model matrix $X$ with rows the $p$-vectors $x_1, \ldots, x_n$, and a $(p+1)$-vector of parameters $\theta = (\beta^\top, \phi)^\top$ with regression parameters $\beta$ and variance $\phi$. Then, up to an additive constant,
$$\ell(\theta) = \sum_{i=1}^{n} \left[ (1 - d_i) \log(1 - F_i) - d_i (\log \phi)/2 - d_i (y_i - \eta_i)^2/(2\phi) \right],$$
where $d_i = 1$ if $y_i > 0$, $d_i = 0$ if $y_i \leq 0$, $\eta_i = x_i^\top \beta$, and $F_i$ is the standard normal distribution function evaluated at $\eta_i/\sqrt{\phi}$.
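For concreteness, the log-likelihood above can be evaluated directly. The following Python sketch (illustrative data and helper names; the paper's actual implementation is the R package brtobit) codes $\ell(\theta)$ for a Tobit model censored at zero:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    # standard normal distribution function via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tobit_loglik(beta, phi, X, y):
    """Tobit log-likelihood (censoring at 0, latent N(x'beta, phi)),
    up to the additive constant dropped in the text."""
    eta = X @ beta
    d = (y > 0).astype(float)
    F = np.array([norm_cdf(e / np.sqrt(phi)) for e in eta])
    return np.sum((1 - d) * np.log(1 - F)
                  - d * np.log(phi) / 2
                  - d * (y - eta) ** 2 / (2 * phi))

# Toy censored data from the latent-normal data generating process
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
ystar = X @ beta_true + rng.normal(scale=np.sqrt(2.0), size=n)
y = np.maximum(ystar, 0.0)          # negative latent responses censored to 0

ll_true = tobit_loglik(beta_true, 2.0, X, y)
ll_off = tobit_loglik(np.array([2.0, -1.0]), 2.0, X, y)
print(ll_true, ll_off)  # the data-generating parameters score higher
```

Censored observations enter through $\log(1 - F_i)$ only, uncensored ones through the Gaussian terms, mirroring the two sums in the decomposition of Section 1.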
The score vector is
$$s(\theta) = \nabla \ell(\theta) = \begin{bmatrix} s_\beta(\theta) \\ s_\phi(\theta) \end{bmatrix}, \quad
s_\beta(\theta) = \sum_{i=1}^{n} \left\{ \frac{(d_i - 1)\,\lambda_i}{\sqrt{\phi}} + \frac{d_i (y_i - \eta_i)}{\phi} \right\} x_i, \quad
s_\phi(\theta) = \sum_{i=1}^{n} \left\{ \frac{(1 - d_i)\,\lambda_i \eta_i}{2\phi^{3/2}} - \frac{d_i}{2\phi} + \frac{d_i (y_i - \eta_i)^2}{2\phi^2} \right\},$$
where $\lambda_i = f_i/(1 - F_i)$ and $f_i$ is the density function of the standard normal distribution at $\eta_i/\sqrt{\phi}$.

The observed information matrix, $j(\theta) = -\nabla\nabla^\top \ell(\theta)$, has the form
$$j(\theta) = \begin{bmatrix} j_{\beta\beta}(\theta) & j_{\beta\phi}(\theta) \\ j_{\phi\beta}(\theta) & j_{\phi\phi}(\theta) \end{bmatrix},$$
where, setting $\nu_i = f_i/(1 - F_i)^2$,
$$j_{\beta\beta}(\theta) = \sum_{i=1}^{n} \left[ \frac{(1 - d_i)\,\nu_i}{\sqrt{\phi}} \left\{ \frac{f_i}{\sqrt{\phi}} - \frac{(1 - F_i)\,\eta_i}{\phi} \right\} + \frac{d_i}{\phi} \right] x_i x_i^\top,$$
$$j_{\beta\phi}(\theta) = \sum_{i=1}^{n} \left[ \frac{(d_i - 1)\,\nu_i}{2\phi^{3/2}} \left\{ \frac{\eta_i f_i}{\sqrt{\phi}} - \frac{(1 - F_i)\,\eta_i^2}{\phi} + 1 - F_i \right\} + \frac{d_i (y_i - \eta_i)}{\phi^2} \right] x_i,$$
$$j_{\phi\beta}(\theta) = j_{\beta\phi}(\theta)^\top,$$
$$j_{\phi\phi}(\theta) = \sum_{i=1}^{n} \left[ \frac{(1 - d_i)\,\nu_i}{4\phi^{5/2}} \left\{ \frac{f_i \eta_i^2}{\sqrt{\phi}} - \frac{(1 - F_i)\,\eta_i^3}{\phi} + 3 (1 - F_i)\,\eta_i \right\} - \frac{d_i}{2\phi^2} + \frac{d_i (y_i - \eta_i)^2}{\phi^3} \right].$$

As shown in Kosmidis and Firth (2010), a BR estimator for $\theta$ results as the solution of the adjusted score equations $s(\theta) + A(\theta) = 0_{p+1}$, where the vector $A(\theta)$ has $t$-th component $A_t(\theta) = \mathrm{tr}[\{i(\theta)\}^{-1} \{P_t(\theta) + Q_t(\theta)\}]/2$ ($t = 1, \ldots, p + 1$). In the above expression, $Q_t(\theta) = -\mathrm{E}(j(\theta) s_t(\theta))$ and $P_t(\theta) = \mathrm{E}(s(\theta) s^\top(\theta) s_t(\theta))$, where $i(\theta) = \mathrm{E}(j(\theta))$ is the expected information matrix.

The R package brtobit implements $i(\theta)$, $Q_t(\theta)$, and $P_t(\theta)$, and solves the bias-reducing adjusted score equations for general Tobit regressions using the quasi Fisher-scoring scheme proposed in Kosmidis and Firth (2010).

The matrices $i(\theta)$, $Q_t(\theta)$ and $P_t(\theta)$ have the same block structure as $j(\theta)$ and, directly by their definition, closed-form expressions for their blocks result by taking expectations of the appropriate products of blocks of $s(\theta)$ and $j(\theta)$. By direct inspection of the expressions for $s(\theta)$ and $j(\theta)$, the required expectations result by noting that $\mathrm{E}(d_i^m) = F_i$, $\mathrm{E}((1 - d_i)^m) = 1 - F_i$, $\mathrm{E}(d_i^m (1 - d_i)^l) = 0$, $\mathrm{E}(d_i^m (1 - d_i)^l (y_i - \eta_i)^k) = 0$, and by computing $\mathrm{E}(d_i^m (y_i - \eta_i)^l)$ ($k, l, m = 1, 2, \ldots$), with $\mathrm{E}(d_i^m (y_i - \eta_i)^l) = F_i\, \mathrm{E}((y_i - \eta_i)^l \mid y_i > 0)$.
The required truncated moments are
$$\mathrm{E}(y_i - \eta_i \mid y_i > 0) = \sqrt{\phi}\,\xi_i,$$
$$\mathrm{E}((y_i - \eta_i)^2 \mid y_i > 0) = \phi - \sqrt{\phi}\,\eta_i \xi_i,$$
$$\mathrm{E}((y_i - \eta_i)^3 \mid y_i > 0) = \sqrt{\phi}\,\xi_i (\eta_i^2 + 2\phi),$$
$$\mathrm{E}((y_i - \eta_i)^4 \mid y_i > 0) = 3\phi^2 - \eta_i^3 \sqrt{\phi}\,\xi_i - 3\phi^{3/2} \eta_i \xi_i,$$
$$\mathrm{E}((y_i - \eta_i)^5 \mid y_i > 0) = \sqrt{\phi}\,\eta_i^4 \xi_i + 4\phi^{3/2} \xi_i (\eta_i^2 + 2\phi),$$
$$\mathrm{E}((y_i - \eta_i)^6 \mid y_i > 0) = -\eta_i \sqrt{\phi}\,\xi_i (\eta_i^4 + 5\eta_i^2 \phi + 15\phi^2) + 15\phi^3,$$
where $\xi_i = f_i/F_i$.

The expected information is
$$i(\theta) = \begin{bmatrix} \mathrm{E}(j_{\beta\beta}(\theta)) & \mathrm{E}(j_{\beta\phi}(\theta)) \\ \mathrm{E}(j_{\phi\beta}(\theta)) & \mathrm{E}(j_{\phi\phi}(\theta)) \end{bmatrix},$$
with
$$\mathrm{E}(j_{\beta\beta}(\theta)) = -\frac{1}{\phi} \sum_{i=1}^{n} \left\{ \frac{\eta_i f_i}{\sqrt{\phi}} - \lambda_i f_i - F_i \right\} x_i x_i^\top,$$
$$\mathrm{E}(j_{\beta\phi}(\theta)) = \frac{1}{2\phi^{3/2}} \sum_{i=1}^{n} f_i \left\{ \frac{\eta_i^2}{\phi} + 1 - \frac{\lambda_i \eta_i}{\sqrt{\phi}} \right\} x_i,$$
$$\mathrm{E}(j_{\phi\beta}(\theta)) = \mathrm{E}(j_{\beta\phi}(\theta))^\top,$$
$$\mathrm{E}(j_{\phi\phi}(\theta)) = -\frac{1}{4\phi^2} \sum_{i=1}^{n} \left\{ \frac{f_i \eta_i^3}{\phi^{3/2}} + \frac{f_i \eta_i}{\sqrt{\phi}} - \frac{\lambda_i f_i \eta_i^2}{\phi} - 2 F_i \right\}.$$

Furthermore, for $t \in \{1, \ldots, p\}$,
$$Q_t(\theta) = -\begin{bmatrix} \mathrm{E}(j_{\beta\beta} s_{\beta_t}) & \mathrm{E}(j_{\beta\phi} s_{\beta_t}) \\ \mathrm{E}(j_{\beta\phi} s_{\beta_t})^\top & \mathrm{E}(j_{\phi\phi} s_{\beta_t}) \end{bmatrix} \quad \text{and} \quad P_t(\theta) = \begin{bmatrix} \mathrm{E}(s_\beta s_\beta^\top s_{\beta_t}) & \mathrm{E}(s_\beta s_\phi s_{\beta_t}) \\ \mathrm{E}(s_\beta s_\phi s_{\beta_t})^\top & \mathrm{E}(s_\phi s_\phi s_{\beta_t}) \end{bmatrix},$$
and for $t = p + 1$,
$$Q_{p+1}(\theta) = -\begin{bmatrix} \mathrm{E}(j_{\beta\beta} s_\phi) & \mathrm{E}(j_{\beta\phi} s_\phi) \\ \mathrm{E}(j_{\beta\phi} s_\phi)^\top & \mathrm{E}(j_{\phi\phi} s_\phi) \end{bmatrix} \quad \text{and} \quad P_{p+1}(\theta) = \begin{bmatrix} \mathrm{E}(s_\beta s_\beta^\top s_\phi) & \mathrm{E}(s_\beta s_\phi s_\phi) \\ \mathrm{E}(s_\beta s_\phi s_\phi)^\top & \mathrm{E}(s_\phi s_\phi s_\phi) \end{bmatrix}.$$

[The closed-form expressions for the individual expectations $\mathrm{E}(j_{\cdot\cdot}\, s_\cdot)$ and $\mathrm{E}(s_\cdot s_\cdot s_\cdot)$ are too garbled in this extraction to be recovered reliably; they follow from the truncated moments above and are implemented in the brtobit package.]

Appendix B. Simulation

The aim of the simulation experiment is to compare the performance of the BR and ML estimator in count and limited dependent variable models with varying probabilities of infinite ML estimates. The comparison here is in terms of bias, variance, and empirical coverage of nominally 95% Wald-type confidence intervals based on the asymptotic normality of the estimators. Our results were obtained using R (R Core Team, 2020). Random variables were generated using the default methods for the relevant distributions, which in turn rely on uniform random numbers obtained by the Mersenne Twister, currently R's default generator.

The same data generating process as in Section 3 of the main paper is considered, with the coefficient of the binary regressor $x_3$ set to the less extreme value $\beta_3 = -3$. The amount of correlation between $x_2$ and $x_3$ varies with $\pi$, which takes five values such that increasing the value of $\pi$ leads to decreasing the probability of infinite estimates. Five sample sizes $n$ are considered, the smallest being $n = 25$. For each combination of $\pi$ and $n$, 10,000 independent samples are simulated, and the parameters of the Poisson and Tobit regression models in Section 3 are estimated using maximum likelihood and bias reduction. The estimates are then used to compute simulation-based estimates of the bias, variance, and coverage probability for $\beta_3$.

For the ML estimator, the bias, variance, and coverage probabilities are computed conditionally on the finiteness of the ML estimates. We classify an ML estimate as infinite if the corresponding estimated standard error exceeds 20. In effect, we are assuming that if the standard error exceeds 20, the Fisher scoring iteration for ML stopped while moving along an asymptote on the log-likelihood surface, hence at a point where the inverse negative Hessian has at least one massive diagonal element. The heuristic value 20 is conservative even for $n = 25$. This has been verified through a pilot simulation study to estimate the variance of the reduced-bias estimator, which has the same asymptotic distribution as the ML estimator. No convergence issues were encountered, and the maximum estimated standard error of the reduced-bias estimators across simulation settings, parameters, and sample sizes was less than 9.

Figures B.1, B.2, B.3, and B.4 show the probability that the ML estimates of $\beta_3$ are infinite, the estimated bias, the estimated variance, and the estimated coverage probability of 95% Wald-type confidence intervals, respectively, for the Poisson model. Figures B.5, B.6, B.7, and B.8 show the corresponding results for the Tobit model. The results for Poisson and Tobit regression lead to similar insights:

• Bias reduction via adjusted score functions always yields finite estimates.

• The BR estimator has bias close to zero even for small sample sizes.

• Wald-type confidence intervals based on BR estimates have good coverage properties.

• The variances of the BR and ML estimator get closer to each other and closer to zero as $n$ increases.
This is exactly what the theory suggests, because the score functions asymptotically dominate the bias-reducing adjustments.

Figure B.1: Probability of infinite estimates for $\beta_3$ (Poisson).
Figure B.2: Bias of estimates for $\beta_3$ (Poisson).
Figure B.3: Variance of estimates for $\beta_3$ (Poisson).
Figure B.4: Coverage of 95% Wald-type confidence intervals for $\beta_3$ (Poisson).
Figure B.5: Probability of infinite estimates for $\beta_3$ (Tobit).
Figure B.6: Bias of estimates for $\beta_3$ (Tobit).
Figure B.7: Variance of estimates for $\beta_3$ (Tobit).
Figure B.8: Coverage of 95% Wald-type confidence intervals for $\beta_3$ (Tobit).

[Each figure plots unconditional and conditional summaries for BR and ML against $n$, for five values of $\pi$; the plotted curves themselves are not recoverable from this extraction.]
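As a quick numerical check on the truncated-normal moments used in Appendix A, the first two conditional moments can be confirmed by Monte Carlo simulation. A Python sketch (independent of the paper's R code, with illustrative parameter values and simulation size):

```python
import numpy as np
from math import erf, exp, sqrt, pi

# Check E(y - eta | y > 0) = sqrt(phi) * xi  and
#       E((y - eta)^2 | y > 0) = phi - sqrt(phi) * eta * xi,
# where xi = f/F with f, F the standard normal density and
# distribution function at eta/sqrt(phi).
eta, phi = 0.7, 2.0
z = eta / sqrt(phi)
f = exp(-z * z / 2.0) / sqrt(2.0 * pi)
F = 0.5 * (1.0 + erf(z / sqrt(2.0)))
xi = f / F

rng = np.random.default_rng(3)
ystar = eta + sqrt(phi) * rng.normal(size=2_000_000)  # latent responses
r = ystar[ystar > 0] - eta                            # uncensored residuals

print(r.mean(), sqrt(phi) * xi)                 # agree up to Monte Carlo error
print((r ** 2).mean(), phi - sqrt(phi) * eta * xi)
```

The same simulation approach extends to the higher-order moments needed for $P_t(\theta)$ and $Q_t(\theta)$.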