[PDF] Bayesian Panel Quantile Regression for Binary Outcomes with Correlated Random Effects: An Application on Crime Recidivism in Canada

Abstract

This article develops a Bayesian approach for estimating panel quantile regression with binary outcomes in the presence of correlated random effects. We construct a working likelihood using an asymmetric Laplace (AL) error distribution and combine it with suitable prior distributions to obtain the complete joint posterior distribution. For posterior inference, we propose two Markov chain Monte Carlo (MCMC) algorithms but prefer the algorithm that exploits the blocking procedure to produce lower autocorrelation in the MCMC draws. We also explain how to use the MCMC draws to calculate the marginal effects, relative risk and odds ratio. The performance of our preferred algorithm is demonstrated in multiple simulation studies and shown to perform extremely well. Furthermore, we implement the proposed framework to study crime recidivism in Quebec, a Canadian Province, using a novel data from the administrative correctional files. Our results suggest that the recently implemented "tough-on-crime" policy of the Canadian government has been largely successful in reducing the probability of repeat offenses in the post-policy period. Besides, our results support existing findings on crime recidivism and offer new insights at various quantiles.

Full PDF

aa r X i v : . [ ec on . E M ] J a n Preprint submitted to arXiv.org

Bayesian panel quantile regression for binary outcomeswith correlated random effects: An application on crimerecidivism in Canada

Georges Bresson · Guy Lacroix · Mohammad ArshadRahman

January 2020

Abstract

This article develops a Bayesian approach for estimating panel quantile regression with binaryoutcomes in the presence of correlated random effects. We construct a working likelihood using anasymmetric Laplace (AL) error distribution and combine it with suitable prior distributions to obtainthe complete joint posterior distribution. For posterior inference, we propose two Markov chain MonteCarlo (MCMC) algorithms but prefer the algorithm that exploits the blocking procedure to producelower autocorrelation in the MCMC draws. We also explain how to use the MCMC draws to calculate themarginal effects, relative risk and odds ratio. The performance of our preferred algorithm is demonstratedin multiple simulation studies and shown to perform extremely well. Furthermore, we implement theproposed framework to study crime recidivism in Quebec, a Canadian Province, using a novel data fromthe administrative correctional ﬁles. Our results suggest that the recently implemented “tough-on-crime”policy of the Canadian government has been largely successful in reducing the probability of repeatoffenses in the post-policy period. Besides, our results support existing ﬁndings on crime recidivism andoffer new insights at various quantiles.

Keywords

Bayesian inference · correlated random effects · crime · panel data · quantile regression · recidivism. JEL codes

C11 · C31 · C33 · C35 · K14 · K42.

Georges BressonDepartment of Economics, Universit´e Paris II, Paris, France.E-mail: [email protected]

Corresponding author. Department of Economics, Universit´e Paris II, 12 place du Panth´eon, 75231 Paris cedex 05, France(Tel.: +33 (1) 44 41 89 73).

Guy LacroixDepartment of Economics, Universit´e Laval, Quebec, Canada.E-mail: [email protected] Arshad RahmanDepartment of Economic Sciences, Indian Institute of Technology, Kanpur, India.E-mail: [email protected]

Bresson, Lacroix and Rahman

The concept of quantile regression introduced in Koenker and Bassett (1978) has captured the attentionof both statisticians and econometricians, theorists as well as applied researchers, and across school ofthoughts i.e., Classicals (or Frequentists) and Bayesians. Quantile regression offers several advantagesover mean regression (such as robustness against outliers, desirable equivariance properties, etc .) andestimation methods, particularly for cross-section data, are also well developed . The method has beenemployed in various disciplines including economics, ﬁnance, and the social sciences (Koenker, 2005;Davino et al., 2013). However, the development of quantile regression for panel data witnessed notice-able delay (more than two decades) because of complexities in estimation. The primary challenge wasthat quantiles, unlike means, are not linear operators and hence standard differencing (or demeaning)methods are not applicable to estimation of quantile regression. The challenges in estimation increases,if, for example, the outcome variable is discrete (such as binary or ordinal) because quantiles for suchvariables are not readily deﬁned. Besides, modeling of panel data brings in consideration of unobservedindividual-speciﬁc heterogeneity and the related debate on the choice of “random-effects” versus “ﬁxed-effects”. Motivated by these challenges in modeling and estimation, this paper considers a quantile re-gression model for panel data in the presence of correlated-random effects (CRE) and introduces twoMarkov chain Monte Carlo (MCMC) algorithms for its estimation. The proposed framework is appliedto study crime recidivism in the Province of Quebec, Canada, using a novel data constructed from theadministrative correctional ﬁles.The current paper touches on at least two growing econometric/statistic literatures – quantile re-gression for panel data and panel quantile regression for discrete outcomes. In reference to the former,Koenker (2004) was ﬁrst to suggest a penalization based approach to estimate quantile regression modelwith unobserved individual-speciﬁc effects . Geraci and Bottai (2007) adopted the likelihood based ap-proach of Yu and Moyeed (2001) and constructed a working likelihood using the asymmetric Laplace(AL) distribution. They proposed a Monte Carlo expectation-maximization (EM) algorithm to estimatethe panel quantile regression model and apply it to study labor pain data reported in Davis (1991). Later,Geraci and Bottai (2014) extended the panel quantile regression model of Geraci and Bottai (2007) toaccommodate multiple individual-speciﬁc effects and suggested strategies to reduce the computationalburden of the Monte Carlo EM algorithm. A Bayesian approach to estimate the panel quantile regressionwas presented in Luo et al. (2012), where they propose a Gibbs sampling algorithm by exploiting thenormal-exponential mixture representation of the AL distribution (Kozumi and Kobayashi, 2011). Wang(2012) also utilized the AL density to develop a Bayesian estimation method for quantile regression in aparametric nonlinear mixed-effects model.The papers on quantile regression mentioned in the previous paragraph have assumed that the unob-served individual-speciﬁc effects are uncorrelated with the regressors – also known as “random-effects”in the Classical econometrics literature. In contrast, when the individual-speciﬁc effects are assumed tobe correlated with the regressors, the models have been termed as “ﬁxed-effects” model. Fixed-effects Some Classical techniques include simplex method (Dantzig, 1963; Dantzig and Thapa, 1997, 2003;Barrodale and Roberts, 1973; Koenker and d’Orey, 1987), interior point algorithm (Karmarkar, 1984; Mehrotra, 1992)and smoothing algorithm (Madsen and Nielsen, 1993; Chen, 2007). Bayesian methods using Markov chain Monte Carlo(MCMC) algorithms for estimating quantile regression was introduced in Yu and Moyeed (2001) and reﬁned, amongst others,in Kozumi and Kobayashi (2011). A non-Markovian simulation based algorithm was proposed in Rahman (2013). See alsoSoares and Fagundes (2018) for interval quantile regression using swarm intelligence. For other development in quantile regression on panel data see, amongst others, Lamarche (2010), Canay (2011),Chernozhukov et al. (2013), Galvao et al. (2013), Galvao and Kato (2017), Graham et al. (2018), and Galvao and Poirier (2019)to mention a few. ayesian panel quantile regression for binary outcomes with correlated random effects 3 models suffer from the limitation that it cannot estimate the coefﬁcient for time-invariant regressors. So,when most of the variation in a regressor is located in the individual dimension (rather than in the timedimension), estimation of coefﬁcients of time varying regressors may be imprecise. Most disciplines inapplied statistics, other than econometrics, use the random-effects model (Cameron and Trivedi, 2005).However, as shown in Baltagi (2013), most applied work in economics have settled the choice betweenthe two speciﬁcations using the speciﬁcation test proposed in Hausman (1978).Between the questionable orthogonality assumption of the random-effects model and the limitationsof the ﬁxed-effects speciﬁcation, lies the idea of correlated random-effects (CRE). This concept is uti-lized in the current paper to soften the assertion of unobserved individual heterogeneity being uncor-related with regressors. The CRE was introduced in Mundlak (1978), where he models the individual-speciﬁc effects as a linear function of the time averages of all the regressors. Hausman and Taylor (1981)proposed an alternative speciﬁcation in which some of the time-varying and time-invariant regressorsare related to the unobserved individual-speciﬁc effects. Later, Chamberlain (1982, 1984) considereda richer model and deﬁned the individual-speciﬁc effects as a weighted sum of the regressors. TheseCRE models lead to an estimator of the coefﬁcients of the regressors that equals the ﬁxed-effects es-timator. The literature has numerous publications on the Hausman tests or the CRE models in a linearor non-linear framework. We refer the reader to Baltagi (2013), Wooldridge (2010), Arellano (1993),Burda and Harding (2013), Greene (2015) and references therein. Most recently, Joshi and Wooldridge(2019) extended the CRE approach to linear panel data models when instrumental variables are neededand the panel is unbalanced.Within the quantile regression for panel data literature, Abrevaya and Dahl (2008) incorporated theCRE to the quantile panel regression model and utilized it to study birth weight using a balanced paneldata from Arizona and Washington. They make certain simplifying assumptions which allows them to es-timate the model using pooled linear quantile regression. Following the quantile regression framework ofAbrevaya and Dahl (2008), Bache et al. (2013) considers a more restricted speciﬁcation to model birthweight using an unbalanced panel data from Denmark. Arellano and Bonhomme (2016) introduced aclass of QR estimators for short panels, where the conditional quantile response function of the unob-served heterogeneity is also speciﬁed as a function of observables. The literature on Bayesian panelquantile regression with CRE is limited to Kobayashi and Kozumi (2012), where they develop Bayesianquantile regression for censored dynamic panel data and proposed a Gibbs sampling algorithm to es-timate the model. The initial condition problem arising due to the dynamic nature of the model wassuccessfully managed using correlated random effects. In addition, they implement the framework tostudyThe literature on panel quantile regression for discrete outcomes is quite sparse and most of the workhas only come recently . Alhamzawi and Ali (2018) extended the Bayesian ordinal quantile regressionintroduced in Rahman (2016) to panel data and use it to analyze treatment related changes in illnessseverity using data from the National Institute of Mental Health Schizophrenia Collaborative (NIMHSC),and previously analyzed in Gibbons and Hedeker (1993). Ghasemzadeh et al. (2018a) proposed a Gibbssampling algorithm to estimate Bayesian quantile regression for ordinal longitudinal response in the pres- Baltagi et al. (2003) suggested an alternative pretest estimator based on the Hausman-Taylor (HT) model. This pretestalternative considers an HT model in which some of the variables, but not all, may be correlated with the individual effects.The pretest estimator becomes the random-effects estimator if the standard Hausman test is not rejected. The pretest estimatorbecomes the HT estimator if a second Hausman test (based on the difference between the FE and HT estimators) does not rejectthe choice of strictly exogenous regressors. Otherwise, the pretest estimator is the FE estimator. A body of work related to quantile regression for discrete outcomes include, but is not limited to, Kordas (2006),Benoit and Poel (2010), Alhamzawi (2016), Omata et al. (2017), Alhamzawi and Ali (2018) and Rahman and Karnawat (2019)

Bresson, Lacroix and Rahman ence of non-ignorable missingness and use it to analyze the Schizophrenia data of Gibbons and Hedeker(1993). Ghasemzadeh et al. (2018b) developed a Bayesian quantile regression model for bivariate lon-gitudinal mixed ordinal and continuous responses to study the relationship between reading ability andantisocial behavior amongst children using the Peabody Individual Achievement Test (PIAT) data. Mostrecently, Rahman and Vossmeyer (2019) considered a panel quantile regression model with binary out-comes and develop an efﬁcient blocked sampling algorithm. They apply the framework to study femalelabor force participation and home ownership using data from the Panel Study of Income Dynamics(PSID).This article contributes to the two literatures by incorporating the CRE concept into the panel quan-tile regression model for binary outcomes. Our proposed framework is more general and can accommo-date the binary panel quantile regression model of Rahman and Vossmeyer (2019) as a special case. Wepresent two MCMC algorithms – a simple (non-blocked) Gibbs sampling algorithm and another blockedGibbs sampling algorithm that exploits the block sampling of parameters to reduce the autocorrelation inMCMC draws. We also explain how to calculate the marginal effects, relative risk and the odds ratio usingthe MCMC draws. The performance of the blocked algorithm is thoroughly tested in multiple simulationstudies and shown to perform extremely well. Lastly, we implement the model to study crime recidivismin the Province of Quebec, Canada, using data from the administrative correction ﬁles for the period2007 − We propose a binary quantile regression framework for panel data where the individual-speciﬁc effectsare correlated with the covariates giving rise to correlated random effects. The resulting binary panelquantile regression with correlated random effects (BPQRCRE) model can be conveniently expressed inthe latent variable formulation of Albert and Chib (2001) as follows, z it = x ′ it β + α i + ε it ∀ i = , · · · , n , t = , · · · , T i , y it = (cid:26) z it > , α i ∼ N ( m ′ i ζ , σ α ) , (1)where z it is a continuous latent variable associated with the binary outcome y it , x ′ it = ( x it , , x it , , · · · , x it , k ) is a ( × k ) vector of explanatory variables including the intercept, β is the ( k × ) vector of commonparameters, and α i is the individual-speciﬁc effect assumed to be independently distributed as a normaldistribution, i.e., α i ∼ N (cid:0) m ′ i ζ , σ α (cid:1) . Here m i , j = ∑ T i t = x it , j / T i (for j = , ..., k ) and m ′ i = ( m i , , · · · , m i , k ) ayesian panel quantile regression for binary outcomes with correlated random effects 5 is a ( × ( k − )) vector of individual means of explanatory variables excluding the intercept. The de-pendence of α on the covariates ( x ) yields a correlated random effects model (Mundlak, 1978). Theerror term ε it , conditional on α i , is assumed to be independently and identically distributed ( iid ) as anAsymmetric Laplace (AL) distribution i.e., ε it | α i iid ∼ AL ( , , p ) , where p denotes the quantile. The ALerror distribution is used to create a working likelihood and has been utilized in previous studies onlongitudinal data models such as Luo et al. (2012) and Rahman and Vossmeyer (2019).In the proposed BPQRCRE framework, the modeling of correlated random effects as a function ofthe means of the covariates is inspired from Mundlak (1978). Utilizing m ′ i as a set of controls for un-observed heterogeneity is both intuitive and advantageous. It is intuitive because it estimates the effectof the covariates holding the time average ﬁxed, and advantageous because it serves a compromise be-tween the questionable orthogonality assumptions of the random effects model and the limitation of theﬁxed effects speciﬁcation which leads to the incidental parameters problem. The considered model re-duces to the standard uncorrelated random effects case, if we set ζ =

0, i.e., assume α i is independentof the covariates (Rahman and Vossmeyer, 2019). Here, we note that Chamberlain (1982, 1984) allowedfor correlation between α i and the covariates x ′ it (excluding the intercept) through a more general for-mulation: α i ∼ N (cid:16) ∑ T i t = x ′ it ζ t , σ α (cid:17) . However, this approach is more involved for an unbalanced panel,particularly if endogeneity attrition is the reason for the panel to be unbalanced (see Wooldridge, 2010).Besides, the correlated random effects speciﬁcation has a number of virtues for nonlinear panel datamodels as underlined in Burda and Harding (2013) and Greene (2015). Hence, we prefer the approachpresented in Mundlak (1978) compared to the method in Chamberlain (1980, 1982, 1984).The BPQRCRE model as presented in equation (1) can be directly estimated using MCMC algo-rithms, but the resulting posterior will not yield the full set of tractable conditional posteriors necessaryfor a Gibbs sampler. Therefore, as done in Luo et al. (2012) and Rahman and Vossmeyer (2019), we uti-lize the normal-exponential mixture representation of the AL distribution to facilitate Gibbs sampling(Kozumi and Kobayashi, 2011). The mixture representation for ε it can be written as follows, ε it = θ w it + τ √ w it u it , (2)where u it ∼ N ( , ) is mutually independent of w it ∼ E ( ) with E representing the exponential distri-bution and the constants are θ = − pp ( − p ) and τ = p ( − p ) . The mixture representation gives access to theappealing properties of the normal distribution.To implement the Bayesian approach, we stack the model across i . Deﬁne z i = ( z i , ..., z iT i ) ′ , y i =( y i , · · · , y iT i ) ′ , X i = ( x ′ i , · · · , x ′ iT i ) ′ , w i = ( w i , · · · , w iT i ) ′ , D τ √ w i = τ diag ( √ w i , · · · , √ w iT i ) ′ and u i = ( u i , · · · , u iT i ) ′ .The resulting hierarchical model can be written as, z i = X i β + ι T i α i + w i θ + D τ √ w i u i ∀ i = , ..., n , y it = (cid:26) z it > , ∀ i = , ..., n , ; t = , ..., T i , α i ∼ N (cid:0) m ′ i ζ , σ α (cid:1) w it ∼ E ( ) , u it ∼ N ( , ) , β ∼ N k ( β , B ) σ α ∼ IG (cid:18) c , d (cid:19) , ζ ∼ N k − ( ζ , C ) , (3)where ι T i is a ( T i × ) vector of ones and the last line in equation (3) presents the prior distribution onthe parameters. The notation N k ( · ) denotes a multivariate normal distribution of dimension k and IG ( · ) denotes an inverse-gamma distribution. We note that the form of the prior distribution on β holds apenalty interpretation on the quantile loss function (Koenker, 2004). A normal prior on β implies an Bresson, Lacroix and Rahman ℓ penalty and has been used in Geraci and Bottai (2007), Yuan and Yin (2010), Luo et al. (2012) andRahman and Vossmeyer (2019).By Bayes’ theorem, we express the “complete joint posterior” density as proportional to the productof complete likelihood function and the prior distributions as follows, π ( β , α , z , w , ζ , σ α | y ) ∝ (cid:26) n ∏ i = f ( y i | z i , β , α i , w i , ζ , σ α ) π ( z i | β , α i , w i , ζ , σ α ) × π ( w i ) π ( α i ) (cid:27) π ( β ) π ( ζ ) π ( σ α ) ∝ (cid:26) n ∏ i = (cid:20) T i ∏ t = f ( y it | z it ) (cid:21) π ( z i | β , α i , w i , ζ , σ α ) π ( w i ) π ( α i ) (cid:27) × π ( β ) π ( ζ ) π ( σ α ) , (4)where the ﬁrst line assumes independence between prior distributions and second line follows from thefact that given z it , the observed y it is independent of all parameters because the second line of equation (3)determines y it given z it with probability 1. Substituting the distribution of the variables associated withthe likelihood and the prior distributions in equation (4) yields the following expression, π ( β , α , z , w , ζ , σ α | y ) ∝ (cid:26) n ∏ i = T i ∏ t = h I ( z it > ) I ( y it = ) + I ( z it ≤ ) I ( y it = ) i(cid:27) × exp (cid:20) − n ∑ i = n ( z i − X i β − ι T i α i − w i θ ) ′ D − τ √ w i ( z i − X i β − ι T i α i − w i θ ) o(cid:21) × exp (cid:18) − n ∑ i = T i ∑ t = w it (cid:19)(cid:0) πσ α (cid:1) − n exp (cid:20) − σ α n ∑ i = ( α i − m ′ i ζ ) ′ ( α i − m ′ i ζ ) (cid:21) × ( π ) − k | B | − exp (cid:20) − ( β − β ) ′ B − ( β − β ) (cid:21) ( π ) − k − | C | − × exp (cid:20) − ( ζ − ζ ) ′ C − ( ζ − ζ ) (cid:21) × (cid:0) σ α (cid:1) − ( c + ) exp (cid:20) − d σ α (cid:21) . (5)The complete joint posterior density in equation (5) does not have a tractable form, and thus simulationtechniques are necessary for estimation. Similar to Rahman and Vossmeyer (2019), we adopt a Bayesianapproach due to the following two reasons.. First, the likelihood function of a discrete panel data modelis analytically intractable which makes optimization difﬁcult using standard hill-climbing techniques.Second, numerical simulation methods for discrete panel data models are often slow and difﬁcult toimplement as noted in Burda and Harding (2013) and others. The complete joint posterior distribution(equation 5) readily yields a full set of conditional distributions (outlined below) which can be readilyemployed to estimate the model using Gibbs sampling.We can derive the conditional posteriors of the parameters and latent variables from the joint pos-terior density (5) by a straightforward extension of the non-blocked sampling method presented inRahman and Vossmeyer (2019). This is presented in Algorithm 1, and the derivations of the conditionalposterior densities can be found in the supplementary material. The parameters β are sampled from anupdated multivariate normal distribution. Similarly, the parameters α i are sampled from an updated mul-tivariate normal distribution. The latent weights w it are sampled element wise from a generalized inverseGaussian ( GIG ) distribution (Devroye, 2014). The variance σ α is sampled from an updated inverse-gamma ( IG ) distribution. The parameters ζ are sampled from an updated multivariate normal distri-bution. Last, the latent variable z it is sampled element wise from an univariate truncated normal ( T N ) ayesian panel quantile regression for binary outcomes with correlated random effects 7 Algorithm 1

Non-blocked sampling in the BPQRCRE model

1. Sample β | α , z , w ∼ N k (cid:16) e β , e B (cid:17) where, e B − = (cid:18) n ∑ i = X ′ i D − τ √ w i X i + B − (cid:19) , and e β = e B (cid:18) n ∑ i = X ′ i D − τ √ w i ( z i − ι T i α i − w i θ ) + B − β (cid:19) .2. Sample α i | β , z , w , σ α , ζ ∼ N (cid:16)e a , e A (cid:17) for i = , · · · , n , where, e A − = (cid:16) ι ′ T i D − τ √ w i ι T i + σ − α (cid:17) , and e a = e A (cid:16) ι ′ T i D − τ √ w i ( z i − X i β − w i θ ) + σ − α m ′ i ζ (cid:17) .3. Sample w it | β , α i , z it ∼ GIG (cid:16) , e λ it , e η (cid:17) for i = , · · · , n and t = , · · · , T i , where, e λ it = (cid:16) z it − x ′ it β − α i τ (cid:17) , and e η = (cid:16) θ τ + (cid:17) .4. Sample σ α | α , ζ ∼ IG (cid:16) e c , e d (cid:17) where, e c = ( n + c ) , and e d = d + n ∑ i = ( α i − m ′ i ζ ) ′ ( α i − m ′ i ζ ) .5. Sample ζ | α , σ α ∼ N k − (cid:16)e ζ , e Σ ζ (cid:17) where, e Σ − ζ = (cid:18) σ − α n ∑ i = m i m ′ i + C − (cid:19) , and e ζ = e Σ ζ (cid:18) σ − α n ∑ i = m i α ′ i + C − ζ (cid:19) .6. Sample the latent variable z | β , α , w for all values of i = , · · · , n and t = , · · · , T i from an univariate truncatednormal (TN) distribution as follows, z it | β , α , w ∼ ( T N ( − ∞ , ] (cid:0) x ′ it β + α i + w it θ , τ w it (cid:1) if y it = , T N ( , ∞ ) (cid:0) x ′ it β + α i + w it θ , τ w it (cid:1) if y it = . distribution. Note that while drawing each of the parameters or latent variables, we hold the remainingquantities ﬁxed as presented in Algorithm 1.The MCMC procedure presented in Algorithm 1 exhibits the conditional posterior distributions forthe parameters and latent variables necessary for a Gibbs sampler. While this Gibbs sampler is straight-forward, there is potential for poor mixing of the MCMC draws due to correlation between ( β , α i ) and( z i , α i ). This correlation arises because the variables corresponding to the parameters in α i are oftena subset of those in x ′ it . Thus conditioning these items on one another leads to high autocorrelation inMCMC draws as demonstrated in Chib and Carlin (1999) and noted in Rahman and Vossmeyer (2019).To avoid the high autocorrelation in MCMC draws, we present an alternative algorithm that jointlysamples ( β , z ) in one block within the Gibbs sampler (see Rahman and Vossmeyer, 2019, for more on theblocking procedure). The details of our blocked sampler are described in Algorithm 2, and the deriva-tions of the conditional posterior densities are presented in the supplementary ﬁle. Speciﬁcally, β issampled marginally of α i from a multivariate normal distribution. Then the latent variable z i is sampledmarginally of α i from a truncated multivariate normal distribution denoted by T MV N B i , where B i is thetruncation region given by B i = ( B i × B i × ... × B iT i ) such that B it is the interval ( , ∞ ) if y it = ( − ∞ , ] if y it =

0. To draw from a truncated multivariate normal distribution, we utilize themethod proposed in Geweke (1991, 2005); as done in Rahman and Vossmeyer (2019). This involvesdrawing from a series of conditional posteriors which are univariate truncated normal distributions. Theparameter α i is sampled conditional on ( β , z , w , σ α , ζ ) from an updated multivariate normal distribution.The latent weights w it are sampled element wise from a generalized inverse Gaussian ( GIG ) distribution(Devroye, 2014). The variance σ α is sampled from an updated inverse-gamma ( IG ) distribution. Lastly, Bresson, Lacroix and Rahman

Algorithm 2

Blocked sampling in the BPQRCRE model

1. Sample ( β , z i ) marginally of α in one block as follows.(a) Let Ω i = σ α J T i + D τ √ w i with J T i = ι T i ι ′ T i . Sample β | z , w , σ α , ζ ∼ N k (cid:16) e β , e B (cid:17) where, e B − = (cid:18) n ∑ i = X ′ i Ω − i X i + B − (cid:19) , and e β = e B (cid:18) n ∑ i = X ′ i Ω − i ( z i − ι T i x ′ i ζ − w i θ ) + B − β (cid:19) .(b) Sample the vector z i | β , w i , σ α , ζ ∼ T MV N B i ( X i β + ι T i m ′ i ζ + w i θ , Ω i ) for all i = , ..., n , where B i =( B i × B i × ... × B iT i ) and B it is the interval ( , ∞ ) if y it = ( − ∞ , ] if y it =

0. This isachieved by sampling z i at the j -th pass of the MCMC iteration using a series of conditional posteriordistributions as follows: z jit | z ji , ..., z ji ( t − ) , z ji ( t + ) , ..., z jiT i ∼ T N B it (cid:0) µ t |− t , Σ t |− t (cid:1) , for t = , ..., T i ,where T N denotes a truncated normal distribution. The terms µ t |− t and Σ t |− t are the conditional meanand variance, and are deﬁned as, µ t |− t = x ′ it β + m ′ i ζ + w it θ + Σ t , − t Σ − − t , − t (cid:16) z ji , − t − ( X i β + ι T i x ′ i ζ + w i θ ) − t (cid:17) , Σ t |− t = Σ t , t − Σ t , − t Σ − − t , − t Σ − t , t , where z ji , − t = (cid:16) z ji , ..., z ji ( t − ) , z j − i ( t + ) , ..., z j − iT i (cid:17) ′ , ( X i β + ι T i m ′ i ζ + w i θ ) − t is a column vector with t -th ele-ment removed, Σ t , t denotes the ( t , t ) -th element of Ω i , Σ t , − t denotes the t -th row of Ω i with element inthe t -th column removed and Σ − t , − t is the Ω i matrix with t -th row and t -th column removed.2. Sample α i | β , z , w , σ α , ζ ∼ N (cid:16)e a , e A (cid:17) for i = , ..., n , where, e A − = (cid:16) ι ′ T i D − τ √ w i ι T i + σ − α (cid:17) , and e a = e A (cid:16) ι ′ T i D − τ √ w i ( z i − X i β − w i θ ) + σ − α m ′ i ζ (cid:17) .3. Sample w it | β , α i , z it ∼ GIG (cid:16) , e λ it , e η (cid:17) for i = , ..., n , and t = , ..., T i , where, e λ it = (cid:16) z it − x ′ it β − α i τ (cid:17) , and e η = (cid:16) θ τ + (cid:17) .4. Sample σ α | α , ζ ∼ IG (cid:16) e c , e d (cid:17) where, e c = ( n + c ) , and e d = d + n ∑ i = ( α i − m ′ i ζ ) ′ ( α i − m ′ i ζ ) .5. Sample ζ | α , σ α ∼ N k − (cid:16)e ζ , e Σ ζ (cid:17) where, e Σ − ζ = (cid:18) σ − α n ∑ i = m i m ′ i + C − (cid:19) , and e ζ = e Σ ζ (cid:18) σ − α n ∑ i = m i α ′ i + C − ζ (cid:19) . the parameters ζ are sampled from an updated multivariate normal distribution. Once again, while sam-pling each quantity of interest, we hold the remaining parameters or latent variables ﬁxed as exhibited inAlgorithm 2. In this section, we present two simulation studies to demonstrate the performance of the blocked algo-rithm for the BPQRCRE model. The simulation data are generated from the following model, z it = x ′ it β + α i + ε it , ∀ i = , · · · , n , and t = , · · · , T i , α i = m ′ i ζ + ξ i , ξ i ∼ N (cid:0) , σ α (cid:1) . (6) ayesian panel quantile regression for binary outcomes with correlated random effects 9 where x ′ it = [ , x it , , x it , , x it , ] , m ′ i = [ m i , , m i , ] , m i , j = ∑ T i t = x it , j / T i , j = , β = ( β , β , β , β ) ′ =( . , , . , − . ) ′ , ζ = ( ζ , ζ ) ′ = ( − , ) ′ . The covariates are generated as x it , ∼ U ( − , ) , x it , ∼ U ( − , ) , x it , ∼ U ( − , ) , where U denotes a uniform distribution, and σ α =

1. Our ﬁrst sample isunbalanced with n = ,

000 and T i ∼ U ( , ) , leading to T = ∑ ni = T i = ,

989 observations. In a secondexercise, we increase the number of individuals n = ,

000 leading to T = ,

985 observations. The errorterm is generated from a standard AL distribution, i.e., ε it ∼ AL ( , , p ) for i = , · · · , n , and t = , · · · , T i at three different quantiles p = 0 .

25, 0 .

5, 0 . y is constructed from the continuous variable z , by assigning y it = z it > y it = z it ≤ i = , · · · , n and t = , · · · , T i . We note thatthe binary response values of 0s and 1s are different at each quantile, because the error values gen-erated from an AL distribution are different for each quantile. In the ﬁrst simulation exercise with n = , ( , ) , ( , ) and ( , ) , respectively. In the second simulation exercisewith n = , ( , ) , ( , ) and ( , ) , respectively. To complete the Bayesiansetup for estimation, we use the following independent prior distributions: β ∼ N k (cid:0) k , I k (cid:1) , ζ ∼ N k − (cid:0) k − , I k − (cid:1) , σ α ∼ IG ( / , / ) . For each exercise, we generate 16 ,

000 MCMC sampleswhere the ﬁrst 1 ,

000 values are discarded as burn-ins. The posterior estimates are reported based onthe remaining 15 ,

000 MCMC iterations with a thinning factor of 10. The mixing of the MCMC chain isextremely good as illustrated in Figure 1, which reports the trace and autocorrelation plots of the parame-ters from the second simulation exercise at the 75th quantile. The ﬁgure shows that, as desired, the chainsmix well and the autocorrelation of the MCMC draws are close to zero. The plots from the ﬁrst simu-lation exercise and the remaining quantiles in the second simulation exercise are extremely similar andnot presented to avoid repetition and keep the paper within reasonable length. To supplement the plots in β − . − . − . − . − . β − . − . β − . − . − . − . β − . − . ζ − . − . − . − . ζ − . − . − . − . σ α − . − . − . β − . β − . − . − . − . β − . − . β − . ζ − . − . − . ζ − . − . − . σ α − . − . Table 1:

Autocorrelation in MCMC draws at Lag 1, Lag 5 and Lag 10 for n = ,

000 individuals (upper panel) and n = , ( k ) ( k ) ( k ) ( k ) ( k ) ( k ) ( k ) Fig. 1:

Trace plots and autocorrelation plots of the parameters for the 75th quantile and n = ,

000 individuals. ayesian panel quantile regression for binary outcomes with correlated random effects 11

TRUE MEAN STD IF MEAN STD IF MEAN STD IF n=1000 β β β β − . − . − . − . ζ − . − . − . − . ζ σ α β β β β − . − . − . − . ζ − . − . − . − . ζ σ α Table 2:

True values (True), posterior mean (Mean), standard deviation (Std) and inefﬁciency factor (IF) of the parametersin the simulation study. The upper panel presents results for n = ,

000 individuals and the lower panel presents results for n = ,

000 individuals.

Figure 1, Table 1 presents the autocorrelation in MCMC draws at lag 1, lag 5, and lag 10 conﬁrming thegood mixing across simulation exercises and at all quantiles.The results from the two simulation exercises are presented in Table 2. Speciﬁcally, the table reportsthe true values of the parameters used to generate the data, along with the posterior mean, standard de-viation and inefﬁciency factor (calculated using the batch-means method discussed in Greenberg, 2012)of the MCMC draws. In general, the results show that the posterior means for ( β , ζ ) are near to theirrespective true values, β = ( . , , . , − . ) ′ and ζ = ( − , ) ′ across all considered quantiles. The pos-terior standard deviations for all the parameters are small and all the coefﬁcients are statistically differentfrom zero. So, the proposed MCMC algorithm is successful in correctly estimating all the model param-eters across all quantiles. This is especially important because the number of 0s and 1s were differentfor each quantile. Moreover, the inefﬁciency factor for all the parameters is close to 1, suggesting a goodsampling performance and a nice mixing of the Markov chain. Comparing the results from the ﬁrst andsecond simulation exercise, we see that when the sample size is increased from ( n = , T = , n = , T = , β , ζ , and ζ at the 25th quantile arereduced to a large extent. To summarize, the proposed algorithm for estimating BQQRCRE model doeswell in both the simulations, but the advantages of having a larger data is clearly evident in the posteriorresults. Our proposed binary panel quantile model is nonlinear, as such the coefﬁcients by themselves do notgive the marginal effects (Rahman, 2016; Rahman and Vossmeyer, 2019). However, marginal effectsare important to understand the effect of a covariate on the probability of success. For example, in our current application one may be interested in seeing how the probability of recidivism is affected due toan additional year of schooling, decreasing regional unemployment rate by 1 percentage, or involvementin violent crime. These may be useful to policy makers and researchers alike.To formally derive the marginal effects, we rewrite the BPQRCRE model presented in Equation (1)as follows, z it = x ′ it β + α i + ε it , ∀ i = , · · · , n , and t = , · · · , T i , α i ∼ N ( m ′ i ζ , σ α ) , (7)where ε it = w it θ + τ √ w it u it . We know ε it iid ∼ AL ( , , p ) for i = , · · · , n and t = , · · · , T i , which implies z it | α i ind ∼ AL ( x ′ it β + α i , , p ) , where ind denotes independently distributed.Given the model framework, the probability of success can be calculated as,Pr ( y it = | x it , β , α i ) = Pr ( z it > | β , α i , x it )= − Pr ( z it ≤ | β , α i , x it )= − Pr ( ε it ≤ − x ′ it β − α i | β , α i , x it )= − F AL ( − x ′ it β − α i , , , p ) , (8)for i = , · · · , n and t = , · · · , T i , where F AL ( x , , , p ) denotes the cumulative distribution function ( cdf )of an AL distribution evaluated at x , with location 0, scale 1 and quantile p .Marginal effect (i.e., the derivative of the probability of success with respect to a covariate) is oftencomputed at the average covariate values or by averaging the marginal effects over the sample, alias average partial effects (Wooldridge, 2010; Greene, 2017). However, Jeliazkov and Vossmeyer (2018)show that both these quantities can be clearly inadequate in nonlinear settings (e.g., binary, ordinal andPoisson models) because they employ point estimates rather than their full distribution. To account forthe uncertainty in parameters, we need another layer of integration over the model parameters. This ideaof calculating the marginal effect that accounts for uncertainty in parameters and the covariates has beenpreviously considered, amongst others, by Chib and Jeliazkov (2006) in the context of semiparametricdynamic binary longitudinal models, and Jeliazkov et al. (2008) and Jeliazkov and Rahman (2012) inrelation to ordinal and binary models. Within the quantile literature, this has been mentioned by Rahman(2016) in the context of ordinal models and discussed by Rahman and Vossmeyer (2019) in connectionto binary longitudinal outcome models.Suppose, we are interested in the average marginal effect i.e., average difference between proba-bilities of success when the j -th covariate { x it , j } T i t = is set to the values a and b , denoted as { x ait , j } T i t = and { x bit , j } T i t = , respectively. To proceed, we split the covariate and parameter vectors as follows: x ait =( x ait , j , x it , − j ) , x bit = ( x bit , j , x it , − j ) , and β = ( β j , β − j ) , where − j in the subscript denotes all covariates/parametersexcept the j -th covariate/parameter. We are interested in the distribution of the difference { Pr ( y it = | x bit , j ) − Pr ( y it = | x ait , j ) } , marginalized over { x it , − j } and the parameters ( β , α ) , given the data y =( y , · · · , y n ) ′ . As done in Chib and Jeliazkov (2006) and Rahman and Vossmeyer (2019), we marginal-ize the covariates using their empirical distribution and integrate the parameters using their posteriordistribution. ayesian panel quantile regression for binary outcomes with correlated random effects 13 To obtain a sample of draws from the distribution of the difference in probabilities of success,marginalized over { x it , − j } and ( β , α ) , we express it as follows, { Pr ( y it = | x bit , j ) − Pr ( y it = | x ait , j ) } = Z n P ( y it = | x bit , j , x it , − j , β , α ) − P ( y it = | x ait , j , x it , − j , β , α ) o × π ( x it , − j ) π ( β | y ) π ( α | y ) d ( x it , − j ) d β d α . (9)Drawing a sample from the above predictive distribution (i.e., equation 9) utilizes the method of com-position. This involves randomly drawing an individual, extracting the corresponding sequence of co-variate values, drawing a value ( β , α ) from the posterior distribution and ﬁnally evaluating { Pr ( y it = | x bit , j ) − Pr ( y it = | x ait , j ) } . This is repeated for all other individuals and other draws from the posteriordistribution. Finally, the average marginal effect ( AME

Bayes ) is calculated as the average of the differencein pointwise probabilities of success as follows, AME

Bayes ≈ T M n ∑ i = T i ∑ t = M ∑ m = h F AL ( − x ait , j β ( m ) j − x ′ it , − j β ( m ) − j − α mi , , , p ) − F AL ( − x bit , j β ( m ) j − x ′ it , − j β ( m ) − j − α mi , , , p ) i (10)where the expression for probability of success follows from equation (8), T = ∑ ni = T i is the total numberof observations, and M is the number of MCMC draws. Here, ( β ( m ) , α ( m ) ) is an MCMC draw of ( β , α ) for m = , ..., M . The quantity in equation (10) provides estimate that integrates out the variability in thesample and the uncertainty in parameter estimation.Relative risk ( RR ) can be calculated to demonstrate the association between the risk factor or ex-posure ( x j ) and the event ( y ) being studied. It is the ratio of the probability of the outcome with therisk factor ( x j = b ) to the probability of the outcome with the risk factor ( x j = a ) ( e.g. , exposed ( b = a = RR ( b / a ) Bayes = T M n ∑ i = T i ∑ t = M ∑ m = H bAL H aAL . (11)where H rAL = − F AL ( − x rit , j β ( m ) j − x ′ it , − j β ( m ) − j − α mi , , , p ) for r = a , b , is the complement of the cdf ofthe AL distribution. If there is a causal effect between the exposure and the outcome, values of RR canbe interpreted as follows: if RR > RR < RR =

1, the exposure does not affect the outcome.The odds ratio is the ratio of the odds of the event occurring with the risk factor ( x j = b ) to the oddsof it occurring with the risk factor ( x j = a ). It is given by: OR ( b / a ) Bayes = T M n ∑ i = T i ∑ t = M ∑ m = (cid:18) H bAL − H bAL (cid:19) (cid:30) (cid:18) H aAL − H aAL (cid:19) . (12)The odds ratio, for a given exposure x j , does not have an intuitive interpretation as the relative risk. ORare often interpreted as if they were equivalent to relative risks while ignoring their meaning as a ratioof odds. Two main factors inﬂuence the discrepancies between RR and OR : the initial risk of an event y it , and the strength of the association between exposure x it , j and the event y it . When the event y it = OR ( b / a ) ≈ RR ( b / a ) , but the odds ratio generally overestimates the relative risk, and thisoverestimation becomes larger with increasing incidence of the outcome. Crime has been extensively studied by economists both theoretically and empirically (see, e.g.

Chalﬁn and McCrary(2017) for a recent survey). Many empirical analyses have used panel data either at the state (Cornwell and Trumbull,1994; Baltagi, 2006; Baltagi et al., 2018) or at the individual level (Bhuller et al., 2019). The vast major-ity of the published papers focus on the situation in the U.S. Here, we study crime recidivism in Canadabetween 2007-2017 for two reasons. First, the Canadian government implemented a “tough-on-crime”policy in 2012 which marked a shift from rehabilitating to warehousing people. Our proposed estimatoris well suited to measure the sensitivity of recidivism to this new policy. Second, offenders who are sen-tenced to less than two years serve their sentence in a provincial correctional institution while offenderssentenced to two years or more serve their’s in a federal penitentiary . The former have committed lessserious crimes and are more likely to reoffend over the time span of our panel. Because our analysis fo-cuses on this population, the impact of the “tough-on-crime” policy may be more easily unearthed fromthe data than if it focused on detainees serving long sentences.5.1 The dataWe utilize a sample data drawn from the administrative correctional ﬁles for the Province of Quebec.The ﬁles are used by corrections personnel to manage activities and interventions related to housing of-fenders and contain detailed information on inmates’ characteristics, correctional facilities, and sentenceadministration. While they offer a wealth of information, the ﬁles have never been used for research pur-poses. For illustrative purpose, we have drawn a random sample of 8,974 detainees out of a populationof 148,441. Each detainee is observed upon release and up until 2017. The earliest releases occur in2007 and the latest in 2016. Overall, our unbalanced panel includes 61,880 observations. Of the 8,974detainees, as many as 3,466 had at least one repeat offense over our sample period.Table 3 presents the main characteristics of our sample. Detainees are 41 years of age on average,have a level of schooling corresponding to a high-school degree, and few are married. Aboriginal de-tainees represent 4.5% of our sample and most are incarcerated in a correctional institution suited totheir needs and speciﬁcities. Approximately 7% of inmates do not have French or English, Canada’s twoofﬁcial languages, as their mother tongue. These include some Aboriginal residents as well as recentimmigrants. Crimes have been aggregated into 4 distinct categories. By far the most common concernsproperty crime. Trafﬁc related and infractions to the criminal code usually entail shorter sentences. Vi-olent crimes receive the longest sentences in our data but necessarily less than two years. As mentionedabove, major crimes fall under the federal jurisdiction. The yearly unemployment rate is measured atthe regional level where a detainee is released. Over our sample period, it varies between 4.4% and17.5%. The “Post 2012” variable is equal to one if a detainee entered the panel at any time during orafter 2012 while the “Pre-Post 2012” variable is equal to one if a detainee entered at any time before2012. In the latter case, repeat offenses are observed over the entire duration of the panel, i.e. 2007-2017.In the former, they are only observed over 2012-2017. Roughly a quarter of our sample belongs to theperiod post the implementation of “tough-on-crime” policy. The remaining observations (74.8%) weresanctioned prior to 2012 and may or may not have reoffended in the Post 2012 period. The next 3 lines Starting in 2012, the government enacted a series of legislations that made prison conditions more austere; imposed length-ier incarceration periods; signiﬁcantly expanded the scope of mandatory minimum penalties; and reduced opportunities forconditional release, parole, and alternatives to incarceration. ayesian panel quantile regression for binary outcomes with correlated random effects 15Mean StdAge 41.366 12.596Schooling 6.011 3.814Married 0.045 0.208Aboriginal † etc. ) 0.099 0.299Property (Theft, Robbery, etc. ) 0.439 0.496Other Infractions to Criminal Code 0.299 0.458Unemployment rate 8.329 2.063Post 2012 (=1) 0.252 0.434Recidivism Entire Sample 0.114 0.318Recidivism Pre-Post 2012 0.091 0.288Recidivism Post 2012 0.023 0.150 † First Nations, Inuit and M´etis.

Table 3: Descriptive Summary of the Sample Data.of the table provide information on the rates of recidivism for distinct periods. Thus, the overall rate ofrecidivism is equal to 11.4%. The next line focuses on individuals who are present both before and afterthe implementation of the “tough-on-crime” policy. Their recidivism rate is approximately 9%. The lastline focuses on individuals who entered the panel on or after 2012. Naturally, as they are observed for ashorter period of time, their recidivism rate is relatively smaller at 2.3%.Figure 2 depicts the proportions of repeat offenses for the entire sample period and for those whoentered the panel in 2012 or later. The ﬁgure provides prima facie evidence on the impact of the policy.Indeed, the proportion of detainees who do not reoffend upon release in the post-policy period is 15percentage points larger (74.1%) than the proportion for the whole sample period (51.5%). Likewise, theproportion of repeat offenders is between 3 to 6 percentage points lower in the post-policy period forany given number of repeat offenses. Naturally, such differences may results from factors other thanthe “tough-on-crime” policy, such as, but not limited to, better economic opportunities, and demographiccompositional changes. In order to net these out, we now turn to formal econometric modelling. y is an indicator variable that equals 1 if an individual commits a repeat offenseand 0 otherwise. We regress the probability of recidivism on time-varying covariates (age, schooling,unemployment rate), on time-invariant policy variables ( Pre-Post 2012 and

Post 2012 ) and on other Recidivism is a yearly dummy variable equal to one the year at which the new incarceration begins and zero otherwise.Recidivism may be equal to one in consecutive years so long as the repeat offenses occurred after the end of the previoussentence. Reincarcerations while on parole or on conditional release are not considered repeat offenses. Obviously, detainees who entered the sample on or after 2012 have had less time to reoffend. Yet, in our sample as manyas 34% of detainees are reincarcerated within 12 months upon release, and as many as 43% within two years. Hence, the sharpdecline in repeat offenses in the post-2012 period is unlikely due to the sampling frame. See Lalande et al. (2015). To the extent the new legislation has indeed lowered the recidivism rates, it not clear whether it did so through deterrent orincapacitative effects. Yet, see Bhuller et al. (2019) for U.S. evidence according to which deterrence dominates incapacitation.

Number of Offenses R e l a t i v e P r opo r t i on ( % ) Group

Fig. 2:

Frequency of Repeat Offenses time-invariant control variables.Our Bayesian setup uses the same independent prior distributions as in the simulation exercise: β ∼ N k (cid:0) k , I k (cid:1) , ζ ∼ N k − (cid:0) k − , I k − (cid:1) , σ α ∼ IG ( / , / ) . We generate 60 ,

000 MCMC samples ofwhich the ﬁrst 10 ,

000 are discarded as burn-ins. The posterior estimates are reported using a thinningfactor of 50, optimized following the approach in Owen (2017). The mixing of the MCMC chain is extremely good as illustrated in Figure 3 which exhibits the traceplots of the parameters at the 75th quantile. Trace plots at other quantiles are similar and not reportedfor the sake of brevity but they are available upon request. Figure 4 provides additional information onthe performance of the MCMC chain. The left-hand-side ﬁgure depicts the boxplots of the inefﬁciencyfactors of the parameters ( β s, ζ s and σ α ) for each of the ﬁve different quantiles used in estimating themodel. Except perhaps for the 10th quantile, all are reasonably close to one. Consistent with the sim-ulation results, the parameter with the largest inefﬁciency factor at the 10th quantile is σ α (not shown,see Table 2). The right-hand-side ﬁgure reports the boxplots of the convergence diagnostics of the pa-rameter estimates for the same ﬁve speciﬁcations based on the ﬁrst 10% and the last 40% values of theMarkov chain (Geweke, 1992). As depicted, all parameters have Z -scores within 2 standard deviationof the mean at the 5% level or within 2 .

58 standard deviation at 1% level. All in all, the Markov chainsbehave satisfactorily and thus lend themselves to statistical inference.Table 4 reports the posterior means and standard deviations at ﬁve different quantiles separately. Toease interpretation, the quantile-speciﬁc estimates are reported column-wise in increasing order. Row-wise, we distinguish the time-varying covariates from the time-invariant and the correlated random ef-fects variables. Note that the correlated random effects speciﬁcation does not include an intercept. This isto allow the identiﬁcation of the two time-invariant policy variables,

Pre-Post 2012 and

Post 2012 .The former, is equal to one if the detainee was incarcerated prior to 2012 and thus observed both before Thinning has been criticized by some (MacEachern and Berliner, 1994; Link and Eaton, 2012) while others acknowledgethat it can increase statistical efﬁciency (Geyer, 1991). See Owen (2017) who claims that the arguments against thinning maybe misleading. Note that the time-varying covariates (

Age , Schooling and

Unemployment rate ) have been “demeaned” and that

Age has been divided by 10. The parameter estimates must thus be interpreted accordingly. ayesian panel quantile regression for binary outcomes with correlated random effects 17

Variable p = p = p = p = p = β -Age − .

634 0.426 − .

417 0.183 − .

024 0.099 − .

598 0.083 − .

829 0.123 β -Schooling − .

474 0.116 − .

649 0.047 − .

379 0.025 − .

335 0.021 − .

484 0.032 β -Unemp Rate 0.347 0.103 0.140 0.040 0.072 0.022 0.056 0.020 0.078 0.028Policy Variables (Time invariant)Pre-Post 2012 − .

671 0.465 − .

299 0.203 − .

034 0.113 − .

735 0.087 − .

800 0.121Post-2012 − .

238 0.522 − .

389 0.224 − .

610 0.125 − .

202 0.097 − .

504 0.134Other Time invariant covariatesMarried − .

315 0.901 − .

286 0.399 − .

222 0.204 − .

932 0.164 − .

298 0.226Aboriginals 5 .

661 0.634 2 .

494 0.295 1 .

359 0.148 1 .

132 0.121 1 .

645 0.197Oth. Mot. Ton. 0 .

329 0.683 0 .

174 0.270 0.090 0.147 0.080 0.120 0 .

089 0.164Violent Crime − .

286 0.948 − .

908 0.419 − .

484 0.224 − .

797 0.159 − .

253 0.205Property Crime 3 .

111 0.495 1 .

674 0.210 0 .

906 0.112 0 .

713 0.090 0 .

967 0.126Other Crime 5 .

656 0.498 2 .

707 0.212 1 .

456 0.116 1 .

171 0.092 1 .

636 0.131Correlated Random Effects ζ -Age 10 .

381 0.456 4 .

521 0.195 2 .

558 0.105 2 .

218 0.085 3.286 0.123 ζ -Schooling 1 .

263 0.127 0 .

560 0.051 0 .

331 0.027 0 .

295 0.023 0 .

426 0.034 ζ -Unemployment − .

311 0.137 − .

129 0.054 − .

067 0.029 − .

054 0.025 − .

080 0.037 σ α .

810 3.325 13 .

133 0.540 3 .

871 0.165 2 .

777 0.132 6 .

217 0.325

Table 4:

Posterior Mean (Mean) and Standard Deviation (Std) of the Parameters in the Crime Application. and after the implementation of the “tough-on-crime” policy. The latter is equal to one if a detainee’sﬁrst incarceration occurred during or after 2012, and thus always exposed to the policy. All other time-invariant variables are measured at ﬁrst entry in the panel. The estimates of the correlated randomcomponents associated with the individual mean

Age , Schooling and

Unemployment , b ζ , are all statis-tically different from zero regardless of the quantile. The individual-speciﬁc effects, α i , are thus highlycorrelated with the individual means of the time-varying variables. Omitting this correlation may there-fore bias the model estimates and hence their intrinsic marginal effects and relative risks. This providesempirical support to the worthiness of incorporating correlated random effects within a quantile regres-sion.The ﬁrst noteworthy feature of the table is that all parameter estimates are statistically differentfrom zero, except for the parameter associated with Other Mother Tongue . Thus detainees who reportspeaking a language other than English or French at home are no more and no less likely to eventuallyreoffend. A second interesting feature concerns the sign of the parameter estimates. Indeed, all are con-sistent with recent research on crime recidivism. For instance,

Age and

Schooling are associated withlower rates of recidivism (Bhuller et al., 2019) whereas being released during a period of high unem-ployment has been found to favour recidivism (Siwach, 2018; Rege et al., 2019). Likewise, married menare less likely to reoffend whereas Aboriginal detainees are more likely to do so (Justice Canada, 2017).The type of crime is also associated with recidivism. The estimates must be interpreted relative to trafﬁc Recall from Table 3 that very few men are married. In addition, next to none report a change in their marital status inbetween incarcerations. Further, since the marital status of non-repeaters is not observed in the data we are constrained to usethe information at entry in the panel. related crimes, which is the base or omitted category in our analysis. Clearly, sentences for

ViolentCrimes will be harsher and so the large parameter estimate presumably reﬂects an incapacitative effect.Finally, the parameter estimates of

Post 2012 is larger than that of

Pre-Post 2012 which suggeststhat the implementation of the “tough-on-crime” policy may have had a detrimental effect on recidivism.As stated in Section 4, the parameter estimates such as those reported in Table 4 do not give themarginal effects. Yet, the latter are important from a policy perspective. Thus, while the parameter es-timates vary considerably across quantiles, it is not clear that the marginal effects are equally sensitivesince they depend both on the time-varying variables and the correlated random components. Figure 5reports the average marginal effects computed according to equation (10), along with their highest pos-terior density intervals (HPDI). Note that most marginal effects have a relatively ﬂat proﬁle between p

10 and p

75 and then exhibit a small kink between p

75 and p

90. For instance, increasing

Age by 1/10threduces the probability of reoffending by 1% at the 10th quantile and by 1.6% at the 90th quantile.Similar results hold for

Schooling (1% vs Married (0.3% vs Violent Crime (5% vs p

10 to p

90. As for the time-invariant variables, their marginal effects all increase by atleast 50% as we move from p

10 to p

90. In particular, the marginal effects associated to

First Nation , Property Crime and

Other Crime exhibit a twofold increase. More importantly, the marginal effectsof the two “tough-on-crime” variables increase manifold and in a steady fashion between p

10 and p Pre-Post 2012 , the probability of reoffending decreases from 78% at the 10th quantileto as little as 10% at the 90th. Likewise, the parameters of

Post 2012 imply that the probability de-creases from 79% to 14% at both extremes. These results are important from a policy perspective for tworeasons. First, they imply that detainees from both groups are sensitive to the “tough-on-crime” policy,and even more so for those in the

Post 2012 group. Consequently long-run recidivism ( i.e. recidivismby the

Pre-Post 2012 group between 2012-2017) can be addressed just as well as short-run recidivism( i.e. recidivism by the

Post 2012 group between 2012-2017) by such policies. Second, the policy doesnot impact all detainees alike. Those in the lower quantiles are much more responsive than those in theupper quantiles.In order to gain further insight into the sensitivity of recidivism to various covariates, we reportthe corresponding relative risks in Figure 6 (see equation (11)) along with their HDPI. Not surprisinglygiven the marginal effects, the relative risks are fairly constant for the ﬁrst two or three quantiles ( p = , , Age , Schooling and

Unemployment Rate are associated with slightly different rates of repeat offenses, only those inthe highest quantiles exhibit signiﬁcantly different recidivism rates. On the other hand, marital status(

Married ), First Nation and types of crime (

Violent, Property, Other ) all have signiﬁcantlyhigher or lower relative risks of reoffending as the case may be, and all exhibit a sharp change betweenthe last two quantiles. Here, as with the previous ﬁgure, the results concerning the “tough-on-crime”variables are particularly interesting. Indeed, according to the ﬁgure all detainees were much less likelyto reoffend in the post 2012 period, irrespective of whether they where ﬁrst convicted prior to 2012 orafter. As with the marginal effects, the policy appears to have had a larger impact on those in the lowerquantiles. Thus for every quantile the risk of recidivism is much lower (and signiﬁcantly different) forthose who were exposed to the “tough-on-crime” policy. For instance, the 95% HPDI at quantile p

10 is The marginal effects for

Age correspond to 1/10 of an additional year relative to the mean. Those for

Unemployment and

Schooling correspond to one additional year and one additional percentage point relative to their individual means,respectively. The remaining marginal effects correspond to a change in the indicator variables. ayesian panel quantile regression for binary outcomes with correlated random effects 19 [ . . ] for the Pre-Post 2012 group and [ . . ] for the Post 2012 group. On the otherhand, the 95% HPDI at quantile p

90 for the two groups are [ . . ] and [ . . ] , respectively.In other words, for the lowest quantile ( p [

90; 91 ] % and [

92; 93 ] % for the Pre-Post 2012 and

Post 2012 groups, respectively. In contrast, forthose in the highest quantile, p

90, the

Post 2012 group decreases its recidivism rate more than that ofthe

Pre-Post 2012 ( [

76; 81 ] % vs [

61; 67 ] %). This paper presents a panel quantile regression model for binary outcomes with correlated random-effects (CRE) and proposes two MCMC algorithms for its estimation. By incorporating the CRE into thepanel quantile regression for discrete outcomes, we move beyond the random-effects framework typicallyconsidered in the Bayesian quantile regression literature. The paper makes an important contribution tothe literature on quantile regression for panel data and panel quantile regression for discrete outcomes.The two proposed MCMC algorithms are simpler to implement, but we prefer the algorithm that exploitsblock sampling of parameters to reduce the autocorrelation in MCMC draws. This blocked algorithmis tested in multiple simulation studies and shown to perform extremely well. We also emphasize thecalculation of marginal effects in models with discrete outcome and explain its computation, along withthose of relative risk and odds ratio, using the MCMC draws. Finally, we implement the proposed quantileframework to analyze crime recidivism in Quebec (a Canadian Province) for the period 2007 − Conﬂict of interest

The authors declare that they have no conﬂict of interest.

References

Abrevaya J, Dahl CM (2008) The effects of birth inputs on birthweight: Evidence from quantile estima-tion on panel data. JBES 26(4):379–397Albert J, Chib S (2001) Sequential ordinal modeling with applications to survival data. Biometrics57:829–836

Alhamzawi R (2016) Bayesian model selection in ordinal quantile regression. Computational Statisticsand Data Analysis 103:68–78Alhamzawi R, Ali HTM (2018) Bayesian single-index quantile regression for ordinal data. Communica-tions in Statistics − Simulation and Computation pp 1–15Arellano M (1993) On the testing of correlated effects with panel data. Journal of Econometrics 59(1-2):87–97Arellano M, Bonhomme S (2016) Nonlinear panel data estimation via quantile regression. The Econo-metrics Journal 19(3):61–94Bache SHM, Dahl CM, Christensen JT (2013) Headlights on tobacco road to low birthweight out-comes: Evidence from a battery of quantile regression estimators and a heterogeneous panel. EmpiricalEconomcis 44(3):1593–1633Baltagi BH (2006) Estimating an economic model of crime using panel data from North Carolina. Journalof Applied Econometrics 21(4):543–547Baltagi BH (2013) Econometric Analysis of Panel Data. 5th Edition, John Wiley & Sons, ChichesterBaltagi BH, Bresson G, Pirotte A (2003) Fixed effects, random effects or Hausman-Taylor? a pretestestimator. Economics Letters 79(3):361–369Baltagi BH, Bresson G, Chaturvedi A, Lacroix G (2018) Robust linear static panel data models using ε -contamination. Journal of Econometrics 202:108–123Barrodale I, Roberts FDK (1973) Improved algorithm for discrete l linear approximation. SIAM Journalof Numerical Analysis 10(5):839–848Benoit DF, Poel DVD (2010) Binary quantile regression: A Bayesian approach based on the asymmetricLaplace distribution. Journal of Applied Econometrics 27(7):1174–1188Bhuller M, Dahl G, Loken K, Mogstad M (2019) Incarceration, recidivism and employment. Journal ofPolitical Economy (forthcoming)Burda M, Harding M (2013) Panel probit with ﬂexible correlated effects: Quantifying technologyspillovers in the presence of latent heterogeneity. Journal of Applied Econometrics 28(6):956–981Cameron AC, Trivedi PK (2005) Microeconometrics: Methods and Applications. Cambridge UniversityPress, CambridgeCanay IA (2011) A simple approach to quantile regression for panel data. The Econometrics Journal14(3):368–386Chalﬁn A, McCrary J (2017) Criminal deterrence: A review of the literature. Journal of Economic Liter-ature 55(1):5–48Chamberlain G (1980) Analysis with qualitative data. Review of Economic Studies 47:225–238Chamberlain G (1982) Multivariate regression models for panel data. Journal of Econometrics 18(1):5–46 ayesian panel quantile regression for binary outcomes with correlated random effects 21 Chamberlain G (1984) Panel data. In: Griliches Z, Intriligator MD (eds) Handbook of Econometrics,vol 2, Elsevier, pp 1247–1318Chen C (2007) A ﬁnite Smoothing algorithm for quantile regression. JCGS 16(1):136–164Chernozhukov V, Fern´andez-Val I, Hahn J, Newey W (2013) Average and quantile effects in nonsepara-ble panel models. Econometrica 81(2):535–580Chib S, Carlin BP (1999) On MCMC sampling in hierarchical longitudinal models. Statistics and Com-puting 9:17–26Chib S, Jeliazkov I (2006) Inference in semiparametric dynamic models for binary longitudinal data.Journal of the American Statistical Association 101(474):685–700Cornwell C, Trumbull WN (1994) Estimating the economic model of crime with panel data. The Reviewof Economics and Statistics 76(2):360–366Dantzig GB (1963) Linear Programming and Extensions. Princeton University Press, PrincetonDantzig GB, Thapa MN (1997) Linear Programming 1: Introduction. Springer, New YorkDantzig GB, Thapa MN (2003) Linear Programming 2: Theory and Extensions. Springer, New YorkDavino C, Furno M, Vistocco D (2013) Quantile Regression: Theory and Applications. John Wiley &Sons, ChichesterDavis CS (1991) Semi-parametric and non-parametric methods for the analysis of repeated measure-ments with applications to clinical trials. Statistics in Medicine 10(12):1959–1980Devroye L (2014) Random variate generation for the generalized inverse Gaussian distribution. Statisticsand Computing 24(2):239–246Galvao AF, Kato K (2017) Quantile regression methods for longitudinal data. In: Koenker R, Cher-nozhukov V, He X, Peng L (eds) Handbook of Quantile Regression, Chapman and HAll/CRC, NewYork, pp 363–380Galvao AF, Poirier A (2019) Quantile regression random effects. Annals of Economics and Statistics(134):109–148Galvao AF, Lamarche C, Lima LR (2013) Estimation of censored quantile regression for panel data withﬁxed effects. JASA 108(503):1075–1089Geraci M, Bottai M (2007) Quantile regression for longitudinal data using the asymmetric Laplace dis-tribution. Biostatistics 8(1):140–154Geraci M, Bottai M (2014) Linear quantile mixed models. Statistics and Computing 24(461-479)Geweke J (1991) Efﬁcient simulation from the multivariate normal and student- t dis-tributions subject to linear constraints and the evaluation of constraint probabilities. , iowaCity, IA, USA Geweke J (1992) Evaluating the accuracy of sampling-based approaches to the calculation of poste-rior moments. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian Statistics, vol 4,Clarendon Press, pp 169–193Geweke J (2005) Contemporary Bayesian Econometrics and Statistics. John Wiley & SonsGeyer CJ (1991) Markov chain monte carlo maximum likelihood. In: Kemramides EM (ed) ComputingScience and Statistics: Proceedings of the 23rd Symposium on the Interface, Interface Foundation ofNorth America, Fairfax Station, VA, USA, pp 156–163Ghasemzadeh S, Ganjali M, Baghfalaki T (2018a) Bayesian quantile regression for analyzing ordinallongitudinal responses in the presence of non-ignorable missingness. METRON 76(3):321–348Ghasemzadeh S, Ganjali M, Baghfalaki T (2018b) Bayesian quantile regression for joint modeling oflongitudinal mixed ordinal contiuous data. Commincations in Statistics − Simulation and Computa-tion pp 1–21Gibbons RD, Hedeker D (1993) Application of random-effects probit regression. Journal of Consultingand Clinical Psychology 62(2):285–296Graham BS, Hahn J, Poirier A, Powell JL (2018) A quantile correlated random coefﬁcients panel datamodel. Journal of Econometrics 206(2):305–335Greenberg E (2012) Introduction to Bayesian Econometrics. 2nd Edition, Cambridge University Press,New YorkGreene W (2015) Panel data models for discrete choice. In: Baltagi BH (ed) The Oxford Handbook ofPanel Data, Oxford University Press, New YorkGreene WH (2017) Econometric Analysis. 8th Edition, Prentice Hall, New YorkHausman JA (1978) Speciﬁcation tests in econometrics. Econometrica 46(6):1251–1271Hausman JA, Taylor WE (1981) Panel data and unobservable individual effects. Econometrica49(6):1377–1398Jeliazkov I, Rahman MA (2012) Binary and ordinal data analysis in economics: Modeling and estimation.In: Yang XS (ed) Mathematical Modeling with Multidisciplinary Applications, John Wiley & SonsInc., New Jersey, pp 123–150Jeliazkov I, Vossmeyer A (2018) The impact of estimation uncertainty on covariate effects in nonlinearmodels. Statistical Papers 59(3):1031–1042Jeliazkov I, Graves J, Kutzbach M (2008) Fitting and comparison of models for multivariate ordinaloutcomes. Advances in Econometrics: Bayesian Econometrics 23:115–156Joshi R, Wooldridge JM (2019) Correlated random effects models with endogeneous explanatory vari-ables and unbalanced panels. Annals of Economics and Statistics (134):243–268Justice Canada (2017) Indigenous overrepresentation in the criminal justice system. URL ayesian panel quantile regression for binary outcomes with correlated random effects 23

Karmarkar N (1984) A new polynomial time algorithm for linear programming. Combinatorica4(4):373–395Kobayashi G, Kozumi H (2012) Bayesian analysis of quantile regression for censored dynamic paneldata model. Computational Statistics 27(2):359–380Koenker R (2004) Quantile regression for longitudinal data. Journal of Multivariate Analysis 91(1):74–89Koenker R (2005) Quantile Regression. Cambridge University Press, CambridgeKoenker R, Bassett G (1978) Regression quantiles. Econometrica 46(1):33–50Koenker R, d’Orey V (1987) Computing regression quantiles. JRSSC 36(3):383–393Kordas G (2006) Smoothed binary regression quantiles. Journal of Applied Econometrics 21(3):387–407Kozumi H, Kobayashi G (2011) Gibbs sampling methods for Bayesian quantile regression. Journal ofStatistical Computation and Simulation 81(11):1565–1578Lalande P, Pelletier Y, Dolmaire P, Raza E (2015) Projet, enquˆete sur la r´ecidive/reprise de laclient`ele conﬁ´ee aux services correctionnels du Qu´ebec. Minist`ere de la s´ecurit´e publique du Qu´ebec(http://collections.banq.qc.ca/ark:/52327/2505967)Lamarche C (2010) Robust penalized quantile regression estimation for panel data. Journal of Econo-metrics 157(2):396–408Link WA, Eaton MJ (2012) On thinning of chains in MCMC. Methods in Ecology and Evolution 3:112–115Luo Y, Lian H, Tian M (2012) Bayesian quantile regression for longitudinal data models. Journal ofStatistical Computation and Simulation 82(11):1635–1649MacEachern SN, Berliner LM (1994) Subsampling the Gibbs sampler. The American Statistician48(3):188–190Madsen K, Nielsen HB (1993) A ﬁnite smoothing algorithm for linear l estimation. SIAM Journal ofOptimization 3(2):223–235Mehrotra S (1992) On the implementation of Primal-Dual Interior Point methods. SIAM Journal ofOptimization 2(4):575–601Mundlak Y (1978) On the pooling of time series and cross section data. Econometrica 46(1):69–85Omata Y, Katayama H, Arimura TH (2017) Same concerns, same responses: A Bayesian quantile regres-sion analysis of the determinants for nuclear power generation in Japan. Environmental Economics andPolicy Studies 19(3):581–608Owen AB (2017) Statistically efﬁcient thinning of a Markov chain sampler. Journal of Computationaland Graphical Statistics 26(3):738–744Rahman MA (2013) Quantile regression using metaheuristic algorithms. International Journal of Com-putational Economics and Econometrics 3(3/4):205–233 Rahman MA (2016) Bayesian quantile regression for ordinal models. Bayesian Analysis 11(1):1–24Rahman MA, Karnawat S (2019) Flexible bayesian quantile regression in ordinal models. Advances inEconometrics 40B:211–251Rahman MA, Vossmeyer A (2019) Estimation and applications of quantile regression for binary longi-tudinal data. Advances in Econometrics 40(B):157–191Rege M, Skardhamar T, Telle K, Votruba M (2019) Job displacement and crime: Evidence from norwe-gian register data. Labour Economics 61:101761Siwach G (2018) Unemployment shocks for individuals on the margin: Exploring recidivism effects.Labour Economics 52:231–244Soares YM, Fagundes RA (2018) Interval quantile regression models based on swarm intelligence. Ap-plied Soft Computing 72:474–485Wang J (2012) Bayesian quantile regression for parametric nonlinear mixed effects models. StatisticalMethods & Applications 21(3):279–295Wooldridge JM (2010) Econometric Analysis of Cross Section and Panel Data. 2nd Edition, MIT Press,CambridgeYu K, Moyeed RA (2001) Bayesian quantile regression. Statistics and Probability Letters 54(4):437–447Yuan Y, Yin G (2010) Bayesian quantile regression for longitudinal studies with nonignorable missingdata. Biometrics 66(1):105–114 ayesian panel quantile regression for binary outcomes with correlated random effects 25

Fig. 3: Trace plots of the parameters for the 75th quantile. p10 p25 p50 p75 p9011.522.533.544.5 I ne ff i c i en cy F a c t o r ( I F ) p10 p25 p50 p75 p90-2-1.5-1-0.500.511.52 C on v e r gen c e D i agno s t i c ( CD ) Fig. 4: Boxplots of the Inefﬁciency Factors and Convergence Diagnostics for ( β , ζ , σ α ) at 5 differentquantiles. ayesian panel quantile regression for binary outcomes with correlated random effects 27 -3 Fig. 5: Marginal Effects with 95% HPDI.

Fig. 6: Relative Risks with 95% HPDI. r X i v : . [ ec on . E M ] J a n Preprint submitted to arXiv.org

Supplementary material - Bayesian quantile regression for abinary outcome model with correlated random effects

An application on crime recidivism in Canada

Georges Bresson · Guy Lacroix · Mohammad ArshadRahman

January 2020

Contents β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Conditional posterior density of α i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.3 Conditional posterior density of w it . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.4 Conditional posterior density of σ α . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.5 Conditional posterior density of ζ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.6 Conditional posterior density of z it . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Conditional densities for the blocked sampling in the BPQRCRE model . . . . . . . . . . . . . . . . . . . 62.1 Joint posterior conditional density of ( β , z it ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.1 Conditional posterior density of β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.2 Conditional posterior density of z i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.2 Other conditional posterior densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Georges BressonDepartment of Economics, Universit´e Paris II, Paris, France.E-mail: [email protected]

Corresponding author. Department of Economics, Universit´e Paris II, 12 place du Panth´eon, 75231 Paris cedex 05, France (Tel.:+33 (1) 44 41 89 73).

Bresson, Lacroix and Rahman

The binary panel quantile regression with correlated random effects (BPQRCRE) model, when stacked forindividual i , can be written in the hierarchical form as follows, z i = X i β + ι T i α i + w i θ + D τ √ w i u i ∀ i = , · · · , n , y it = (cid:26) z it > , ∀ i = , · · · , n , ; t = , · · · , T i , α i ∼ N (cid:0) m ′ i ζ , σ α (cid:1) , w it ∼ E ( ) , u it ∼ N ( , ) , β ∼ N k ( β , B ) , σ α ∼ IG (cid:18) c , d (cid:19) , ζ ∼ N k − ( ζ , C ) , (1)where the ﬁrst three lines represent the model and distribution of the mixture variables, and the last linespeciﬁes the prior distribution of the model parameters. All the notations are as described in the paper.Using the Bayes’s theorem, the complete posterior density is given by, π ( β , α , z , w , ζ , σ α | y ) ∝ (cid:26) n ∏ i = T i ∏ t = h I ( z it > ) I ( y it = ) + I ( z it ≤ ) I ( y it = ) i(cid:27) × exp (cid:20) − n ∑ i = n ( z i − X i β − ι T i α i − w i θ ) ′ D − τ √ w i ( z i − X i β − ι T i α i − w i θ ) o(cid:21) × exp (cid:18) − n ∑ i = T i ∑ t = w it (cid:19)(cid:0) πσ α (cid:1) − n exp (cid:20) − σ α n ∑ i = ( α i − m ′ i ζ ) ′ ( α i − m ′ i ζ ) (cid:21) × ( π ) − k | B | − exp (cid:20) − ( β − β ) ′ B − ( β − β ) (cid:21) ( π ) − k − | C | − × exp (cid:20) − ( ζ − ζ ) ′ C − ( ζ − ζ ) (cid:21) × (cid:0) σ α (cid:1) − ( c + ) exp (cid:20) − d σ α (cid:21) . (2)1.1 Conditional posterior density of β The conditional posterior density π ( β | α , z , w ) can be derived from the complete posterior density given byequation (2) by collecting terms involving β as follows, π ( β | α , z , w ) ∝ exp (cid:20) − n ∑ i = ( z i − X i β − ι T i α i − w i θ ) ′ D − τ √ w i ( z i − X i β − ι T i α i − w i θ ) (cid:21) × exp (cid:20) − ( β − β ) ′ B − ( β − β ) (cid:21) . Let G i = ( z i − ι T i α i − w i θ ) , and consider the expressions in the exponential without the − / n ∑ i = ( G i − X i β ) ′ D − τ √ w i ( G i − X i β ) + ( β − β ) ′ B − ( β − β )= n ∑ i = G ′ i D − τ √ w i G i − n ∑ i = β ′ X ′ i D − τ √ w i G i + n ∑ i = β ′ X ′ i D − τ √ w i X i β + β ′ B − β − β ′ B − β + β ′ B − β = β ′ (cid:18) n ∑ i = X ′ i D − τ √ w i X i + B − (cid:19) β − β ′ (cid:18) n ∑ i = X ′ i D − τ √ w i G i + B − β (cid:19) + n ∑ i = G ′ i D − τ √ w i G i + β ′ B − β . itle Suppressed Due to Excessive Length 3 Our interest lies in the distribution of β , so all terms that do not involve β are absorbed into the proportion-ality constant. Applying the idea of completing the square to the previous equation we have, π ( β | α , z , w ) ∝ exp (cid:20) − (cid:16) β − e β (cid:17) ′ e B − (cid:16) β − e β (cid:17)(cid:21) , where , e B − = (cid:18) n ∑ i = X ′ i D − τ √ w i X i + B − (cid:19) , and e β = e B (cid:18) n ∑ i = X ′ i D − τ √ w i ( z i − ι T i α i − w i θ ) + B − β (cid:19) , which is recognized as the kernel of a normal distribution. Hence we can write, β | α , z , w ∼ N k (cid:16) e β , e B (cid:17) , where e B − = (cid:18) n ∑ i = X ′ i D − τ √ w i X i + B − (cid:19) , and e β = e B (cid:18) n ∑ i = X ′ i D − τ √ w i ( z i − ι T i α i − w i θ ) + B − β (cid:19) . (3).1.2 Conditional posterior density of α i Similar to β , the conditional posterior density π (cid:0) α i | β , z , w , σ α , ζ (cid:1) can also be derived from the completeposterior density, given by equation (2), by collecting terms involving α i . This results in the followingexpression, π (cid:0) α i | β , z , w , σ α , ζ (cid:1) ∝ exp (cid:20) − ( z i − X i β − ι T i α i − w i θ ) ′ D − τ √ w i ( z i − X i β − ι T i α i − w i θ ) (cid:21) × exp (cid:20) − σ α (cid:0) α i − m ′ i ζ (cid:1) ′ (cid:0) α i − m ′ i ζ (cid:1)(cid:21) Let H i = ( z i − X i β − w i θ ) and consider the expressions in the exponential without the constant − /

2. Thenthe expression can be simpliﬁed as, ( H i − ι T i α i ) ′ D − τ √ w i ( H i − ι T i α i ) + σ − α (cid:0) α i − m ′ i ζ (cid:1) ′ (cid:0) α i − m ′ i ζ (cid:1) = H ′ i D − τ √ w i H i − α ′ i ι ′ T i D − τ √ w i H i + α ′ i ι ′ T i D − τ √ w i ι T i α i + σ − α α ′ i α i − σ − α α ′ i m ′ i ζ + σ − α ζ ′ m i x ′ i ζ = α ′ i (cid:16) ι ′ T i D − τ √ w i ι T i + σ − α (cid:17) α i − α ′ i (cid:16) ι ′ T i D − τ √ w i ι T i + σ − α m ′ i ζ (cid:17) + H ′ i D − τ √ w i H i + σ − α ζ ′ m i m ′ i ζ . We are interested in the distribution of α i , so all terms that do not involve α i are absorbed into the propor-tionality constant. Applying the idea of completing the square as in the previous equation we have, π (cid:0) α i | β , z , w , σ α , ζ (cid:1) ∝ exp (cid:20) − ( α i − e a ) ′ e A − ( α i − e a ) (cid:21) , where , e A − = (cid:16) ι ′ T i D − τ √ w i ι T i + σ − α (cid:17) , and e a = e A (cid:16) ι ′ T i D − τ √ w i ( z i − X i β − w i θ ) + σ − α m ′ i ζ (cid:17) , which is the kernel of a normal distribution. Hence, the conditional posterior density of α i is given by, α i | β , z , w , σ α , ζ ∼ N (cid:16)e a , e A (cid:17) , where e A − = (cid:16) ι ′ T i D − τ √ w i ι T i + σ − α (cid:17) , and e a = e A (cid:16) ι ′ T i D − τ √ w i ( z i − X i β − w i θ ) + σ − α m ′ i ζ (cid:17) . (4) Bresson, Lacroix and Rahman w it The conditional posterior density π ( w it | β , α i , z it ) can be obtained from the complete posterior density(equation 2) by collecting terms involving w it . This is done as follows, π ( w it | β , α i , z it ) ∝ w − / it exp (cid:20) − (cid:0) z it − x ′ it β − α i − w it θ (cid:1) ′ τ w it (cid:0) z it − x ′ it β − α i − w it θ (cid:1)(cid:21) × exp [ − w it ] ∝ w − / it exp " − ( w it + ( z it − x ′ it β − α i − w it θ ) τ w it ) ∝ w − / it exp " − ((cid:18) θ τ + (cid:19) w it + τ ( z it − x ′ it β − α i ) w it ) , where all terms not involving w it have been absorbed in the proportionality constant. By comparing the lastexpression to that of a generalized inverse Gaussian distribution ( GIG ) , we have w it | β , α i , z it ∼ GIG (cid:18) , e λ it , e η (cid:19) for i = , ..., n , and t = , ..., T i , where e λ it = (cid:18) z it − x ′ it β − α i τ (cid:19) , and e η = (cid:18) θ τ + (cid:19) . (5)1.4 Conditional posterior density of σ α The conditional posterior density π (cid:0) σ α | α , ζ (cid:1) is also derived from the complete posterior density (equa-tion 2) by collecting terms involving σ α . This is done as follows, π (cid:0) σ α | α , ζ (cid:1) ∝ (cid:0) πσ α (cid:1) − n exp (cid:20) − σ α n ∑ i = (cid:0) α i − m ′ i ζ (cid:1) ′ (cid:0) α i − m ′ i ζ (cid:1)(cid:21) × (cid:0) σ α (cid:1) − ( c + ) exp (cid:20) − d σ α (cid:21) ∝ (cid:0) πσ α (cid:1) − ( n + c ) + exp (cid:20) − σ α (cid:26) d + n ∑ i = (cid:0) α i − m ′ i ζ (cid:1) ′ (cid:0) α i − m ′ i ζ (cid:1)(cid:27)(cid:21) , which is the kernel of an inverse-gamma distribution ( IG ) with shape parameter e c = n + c and scaleparameter e d = d + n ∑ i = ( α i − m ′ i ζ ) ′ ( α i − m ′ i ζ ) . Therefore, the conditional posterior density can be writtenas, σ α | α , ζ ∼ IG (cid:16) e c , e d (cid:17) , where e c = ( n + c ) , and e d = d + n ∑ i = ( α i − m ′ i ζ ) ′ ( α i − m ′ i ζ ) . (6) The generalized inverse Gaussian distribution

GIG ( p , a , b ) is a three-parameters family of continuous probability distributionswith probability density, f ( x ) = ( b / a ) p / K p (cid:16) √ ab (cid:17) x ( p − ) exp (cid:20) − (cid:16) ax + bx (cid:17)(cid:21) , x > , where K p ( . ) is the modiﬁed Bessel function of the second kind, a > b > p is a real parameter. itle Suppressed Due to Excessive Length 5 ζ The conditional posterior density π (cid:0) ζ | α , σ α (cid:1) is derived from the complete posterior density (equation 2).Collecting terms involving ζ we obtain the following expression, π (cid:0) ζ | α , σ α (cid:1) ∝ exp (cid:20) − σ α n ∑ i = (cid:0) α i − m ′ i ζ (cid:1) ′ (cid:0) α i − m ′ i ζ (cid:1)(cid:21) × exp (cid:20) − ( ζ − ζ ) ′ C − ( ζ − ζ ) . (cid:21) Focusing on the expressions in the exponential but without the − / σ − α n ∑ i = (cid:0) α i − m ′ i ζ (cid:1) ′ (cid:0) α i − m ′ i ζ (cid:1) + ( ζ − ζ ) ′ C − ( ζ − ζ )= σ − α n ∑ i = α ′ i α i − σ − α n ∑ i = ζ ′ m i α i + σ − α n ∑ i = ζ ′ m i m ′ i ζ + ζ ′ C − ζ − ζ ′ C − ζ + ζ ′ C − ζ = ζ ′ (cid:18) σ − α n ∑ i = m i m ′ i + C − (cid:19) ζ − ζ ′ (cid:18) σ − α n ∑ i = m i α i + C − ζ (cid:19) + σ − α n ∑ i = α ′ i α i + ζ ′ C − ζ The interest lies in the distribution of ζ , so all terms that do not involve ζ are absorbed into the propor-tionality constant. Applying the idea of completing the square we have, π (cid:0) ζ | α , σ α (cid:1) ∝ exp (cid:20) − (cid:16) ζ − e ζ (cid:17) ′ e Σ − ζ (cid:16) ζ − e ζ (cid:17)(cid:21) , where e Σ − ζ = (cid:18) σ − α n ∑ i = m i m ′ i + C − (cid:19) , and e ζ = e Σ ζ (cid:18) σ − α n ∑ i = m i α ′ i + C − ζ (cid:19) , which is recognized as the kernel of a normal distribution. Hence, the conditional posterior density is, ζ | α , σ α ∼ N k − (cid:16)e ζ , e Σ ζ (cid:17) , where e Σ − ζ = (cid:18) σ − α n ∑ i = m i m ′ i + C − (cid:19) , and e ζ = e Σ ζ (cid:18) σ − α n ∑ i = m i α ′ i + C − ζ (cid:19) . (7)1.6 Conditional posterior density of z it The conditional posterior density π ( z it | β , α , w , y it ) is obtained by collecting terms involving z it from thecomplete posterior density (equation 2) and seen to have the following expression, π ( z it | β , α , w , y it ) ∝ [ I ( z it > ) I ( y it = ) + I ( z it ≤ ) I ( y it = )] × exp (cid:20) − (cid:0) z it − x ′ it β − α i − w it θ (cid:1) ′ τ w it (cid:0) z it − x ′ it β − α i − w it θ (cid:1)(cid:21) ∝ [ I ( z it > ) I ( y it = ) + I ( z it ≤ ) I ( y it = )] φ (cid:0) z it | x ′ it β + α i + w it θ , τ w it (cid:1) , where the notation φ ( x | µ , Σ ) denotes the probability density function of a normal distribution with mean µ and variance Σ . Note that the observations are conditionally independent, so we can draw each z it indepen-dently of each other. Consequently, we have, π ( z it | β , α , w , y it ) ∝ I ( z it ≤ ) φ (cid:0) z it | x ′ it β + α i + w it θ , τ w it (cid:1) if y it = , π ( z it | β , α , w , y it ) ∝ I ( z it > ) φ (cid:0) z it | x ′ it β + α i + w it θ , τ w it (cid:1) if y it = , Bresson, Lacroix and Rahman which implies that we can draw z it as follows, z it | β , α , w , y it ∼ ( T N ( − ∞ , ] (cid:0) x ′ it β + α i + w it θ , τ w it (cid:1) if y it = , T N ( , ∞ ) (cid:0) x ′ it β + α i + w it θ , τ w it (cid:1) if y it = , (8)where T N [ a , b ] (cid:0) µ , σ (cid:1) denotes a normal distribution truncated to the interval [ a , b ] with mean µ and variance σ . ( β , z it ) The Gibbs sampler presented in Section 1 is straightforward, however, there is a potential for poor mixingdue to correlation between ( β , α i ) and ( z i , α i ). This correlation arises because the variables correspondingto the parameters in α i are often a subset of those in x ′ it . Thus, by conditioning these items on one another,the mixing of the Markov chain will be slow.To avoid this issue, we present an alternative algorithm which jointly samples ( β , z i ) in one block withinthe Gibbs sampler. In particular, β is sampled marginally of α i from a multivariate normal distribution.Thereafter, the latent variable z i is sampled marginally of α i from a truncated multivariate normal distribu-tion. β First, we derive the conditional posterior density of β , marginalized over the random effects α i . Since α i ∼ N (cid:0) m ′ i ζ , σ α (cid:1) , we can write α i = m ′ i ζ + ξ i with ξ i ∼ N (cid:0) , σ α (cid:1) . Therefore, the model can be rewritten asfollows, z i = X i β + ι T i α i + w i θ + D τ √ w i u i , ∀ i = , ..., n , = X i β + ι T i m ′ i ζ + w i θ + v i , where v i = ι T i ξ i + D τ √ w i u i . Given the properties of the components that constitutes ν i , we have E ( ν i ) = T i .The covariance can be derived to have the following expression, Cov ( ν i ) = E (cid:2) v i v ′ i (cid:3) = E (cid:2) ι T i ξ i ξ ′ i ι ′ T i + D τ √ w i u i u ′ i D τ √ w i (cid:3) = σ α ι T i ι ′ T i + D − τ √ w i = σ α J T i + D − τ √ w i = Ω i . where J T i = ι T i ι ′ T i . The conditional posterior density π (cid:0) β | z , w , σ α , ζ (cid:1) , marginally of α , has the expression, π (cid:0) β | z , w , σ α (cid:1) ∝ exp (cid:20) − n ∑ i = (cid:0) z i − X i β − ι T i m ′ i ζ − w i θ (cid:1) ′ Ω − i (cid:0) z i − X i β − ι T i m ′ i ζ − w i θ (cid:1)(cid:21) × exp (cid:20) − ( β − β ) ′ B − ( β − β ) (cid:21) . Similar to Section 1.1, we open the quadratic expressions and collect the terms involving β to complete asquare. This yields the following, π (cid:0) β | z , w , σ α , ζ (cid:1) ∝ exp (cid:20) − (cid:16) β − e β (cid:17) ′ e B − (cid:16) β − e β (cid:17)(cid:21) where e B − = (cid:18) n ∑ i = X ′ i Ω − i X i + B − (cid:19) , and e β = e B (cid:18) n ∑ i = X ′ i Ω − i ( z i − ι T i m ′ i ζ − w i θ ) + B − β (cid:19) , itle Suppressed Due to Excessive Length 7 which is clearly the kernel of a normal distribution. Hence, the conditional posterior density is, β | z , w , σ α , ζ ∼ N k (cid:16) e β , e B (cid:17) where e B − = (cid:18) n ∑ i = X ′ i Ω − i X i + B − (cid:19) , and e β = e B (cid:18) n ∑ i = X ′ i Ω − i ( z i − ι T i m ′ i ζ − w i θ ) + B − β (cid:19) . (9) i The conditional posterior density of the latent variable z , marginally of α , has the following expression, π (cid:0) z | β , z , w , ζ , σ α , y (cid:1) ∝ n ∏ i = (cid:26) T i ∏ t = [ I ( z it > ) I ( y it = ) + I ( z it ≤ ) I ( y it = )] (cid:27) × exp (cid:20) − (cid:0) z i − X i β − ι T i m ′ i ζ − w i θ (cid:1) ′ Ω − i (cid:0) z i − X i β − ι T i m ′ i ζ − w i θ (cid:1)(cid:21) . The above expression is recognized as the kernel of a truncated multivariate normal (TMVN) distribution.Hence, we can write, z i | β , w i , ζ , σ α , y i ∼ T MV N B i (cid:0) X i β + ι T i m ′ i ζ + w i θ , Ω i (cid:1) ∀ i = , · · · , n , where B i = ( B i × B i × ... × B iT i ) and B it is the interval ( , ∞ ) if y it = ( − ∞ , ] if y it = T MV N is not possible, hence, as in ? , we resort to the method proposed in ?? ,which utilizes Gibbs sampling to make draws from a T MV N .We apply the theorem on the conditional multivariate normal distribution (see ? p.171) on full condi-tional f ( z it | z i , − t ) = f (cid:0) z it | z i , ..., z i ( t − ) , z i ( t + ) , ..., z iT i (cid:1) which we denote as t | − t . This results in: (cid:18) z it z i , − t (cid:19) ∼ N ( µ , Σ ) , where µ = (cid:18) x ′ it β + m ′ i ζ + w it θ ( X i β + ι T i m ′ i ζ + w i θ ) − t (cid:19) , and Σ = (cid:20) Σ t , t Σ t , − t Σ − t , t Σ − t , − t (cid:21) , where ( X i β + ι T i m ′ i ζ + w i θ ) − t is a column vector with t -th element removed. Σ t , t denotes the ( t , t ) -th elementof Ω i , Σ t , − t denotes the t -th row of Ω i with element in the t -th column removed and Σ − t , − t is the Ω i matrixwith t -th row and t -th column removed. Then, the distribution of z it conditional on z i , − t is normal withconditional mean and conditional variance equal to, µ t |− t = x ′ it β + m ′ i ζ + w it θ + Σ t , − t Σ − − t , − t (cid:16) z i , − t − (cid:0) X i β + ι T i m ′ i ζ + w i θ (cid:1) − t (cid:17) , Σ t |− t = Σ t , t − Σ t , − t Σ − − t , − t Σ − t , t . We can then construct a Markov chain which continuously draws from f ( z it | z i , − t ) subject to the bound ( , ∞ ) if y it = ( − ∞ , ] if y it =

0. This is done by sampling z i at the j -th pass of the MCMC iterationusing a series of conditional posterior distributions as follows: z jit | z ji , ..., z ji ( t − ) , z ji ( t + ) , ..., z jiT i ∼ T N B it (cid:0) µ t |− t , Σ t |− t (cid:1) , for t = , · · · , T i , (10)where T N denotes a truncated normal distribution. The terms µ t |− t and Σ t |− t are the conditional mean andvariance deﬁned above and z i , − t at the j − th pass is z ji , − t = (cid:16) z ji , ..., z ji ( t − ) , z j − i ( t + ) , ..., z j − iT i (cid:17) ′ . Let (cid:18) xy (cid:19) ∼ N ( µ , Σ ) with µ = (cid:18) µ x µ y (cid:19) and Σ = (cid:20) Σ xx Σ xy Σ yx Σ yy (cid:21) , then the distribution of y conditional on x is normal N (cid:0) µ y . x , Σ y . x (cid:1) with µ y . x = µ y + Σ yx Σ − xx ( x − µ ) and Σ y . x = Σ yy − Σ yx Σ − xx Σ xy . Bresson, Lacroix and Rahman α i , w it , σ α and ζ remains unaltered as presentedearlier from Section 1.2 to Section 1.5. For the sake of completion, we present the conditional distributionsbelow once more. α i | β , z , w , σ α , ζ ∼ N (cid:16)e a , e A (cid:17) , for i = , · · · , n , where e A − = (cid:16) ι ′ T i D − τ √ w i ι T i + σ − α (cid:17) , and e a = e A (cid:16) ι ′ T i D − τ √ w i ( z i − X i β − w i θ ) + σ − α m ′ i ζ (cid:17) . (11) w it | β , α i , z it ∼ GIG (cid:16) , e λ it , e η (cid:17) , for i = , · · · , n , and t = , · · · , T i , where e λ it = (cid:16) z it − x ′ it β − α i τ (cid:17) , and e η = (cid:16) θ τ + (cid:17) . (12) σ α | α , ζ ∼ IG (cid:16) e c , e d (cid:17) , where e c = ( n + c ) , and e d = d + n ∑ i = ( α i − m ′ i ζ ) ′ ( α i − m ′ i ζ ) . (13) ζ | α , σ α ∼ N k − (cid:16)e ζ , e Σ ζ (cid:17) , where e Σ − ζ = (cid:18) σ − α n ∑ i = m i m ′ i + C − (cid:19) , and e ζ = e Σ ζ (cid:18) σ − α n ∑ i = m i α ′ i + C − ζ (cid:19) ..