A New Spatial Count Data Model with Time-varying Parameters
AA New Spatial Count Data Model with Time-varying Parameters
Prasad Buddhavarapu b,1 , Prateek Bansal a,b,2 , Jorge A. Prozzi a Corresponding author: [email protected] b Equal contribution.
Abstract
Recent crash frequency studies incorporate spatiotemporal correlations, but these studies have two key limitations – i)none of these studies accounts for temporal variation in model parameters; and ii) Gibbs sampler su ff ers from conver-gence issues due to non-conjugacy. To address the first limitation, we propose a new count data model that identifiesthe underlying temporal patterns of the regression parameters while simultaneously allowing for time-varying spatialcorrelation. The model is also extended to incorporate heterogeneity in non-temporal parameters across spatial units.We tackle the second shortcoming by deriving a Gibbs sampler that ensures conditionally conjugate posterior updatesfor all model parameters. To this end, we take the advantages of P´olya-Gamma data augmentation and forward fil-tering backward sampling (FFBS) algorithm. After validating the properties of the Gibbs sampler in a Monte Carlostudy, the advantages of the proposed specification are demonstrated in an empirical application to uncover relation-ships between crash frequency spanning across nine years and pavement characteristics. Model parameters exhibitpractically significant temporal patterns (i.e., temporal instability). For example, the safety benefits of better pavementride quality are estimated to increase over time. Keywords:
Negative-Binomial regression, Dynamic linear models, Spatiotemporal dependence, Bayesianestimation, P´olya-Gamma data augmentation.
Email addresses: [email protected] (Prasad Buddhavarapu ), [email protected] (Jorge A. Prozzi) Department of Civil Architectural and Environmental Engineering,The University of Texas at Austin Department of Civil and Environmental Engineering, Imperial College London
Preprint submitted to Elsevier August 11, 2020 a r X i v : . [ s t a t . A P ] A ug . Introduction Tra ffi c crashes are one of the main sources of fatalities in the United States. The National Highway Tra ffi c SafetyAdministration reported 37,133 crash-related fatalities in the year 2017, which resulted in the economic cost of $242billion (NHTSA, 2017). These startling statistics call for new safety countermeasures and policies. To this end, spatialcount data models have been adopted to uncover complex relationships between crash counts and influencing factorssuch as road conditions and geometric features.There are three main sources of unobserved heterogeneity in crash frequency modeling and neglecting it mayresult in biased parameter estimates and an inaccurate policy guidance (Mannering et al., 2016). First, the crashinformation collected from policy reports and other databases lacks several factors such as human behaviour, vehiclecharacteristics, and environmental conditions that can influence the likelihood of an accident. Such unobserved factorsmay introduce observation-specific variation into the relationship between observed explanatory variables and crashcount outcomes. Second, since accidents are rare events, they are generally aggregated over time (e.g. day, month, oryear) and space (e.g., county or census tract) to ensure that each observation unit has adequate crash frequencies forstatistical analysis (Lord and Mannering, 2010; Mannering and Bhat, 2014; Mannering, 2018). Previous studies haveshown that crash counts in spatial units may be correlated due to resemblance in land use, weather, tra ffi c laws, anddriving behaviour (Liu and Sharma, 2017; Li et al., 2019). Third, the parameters estimates can also exhibit temporalinstability (or correlation) due to temporal changes in driver’s decision-making, risk-taking behaviour, and cognitivebiases (Mannering, 2018). These temporal correlations between parameters of di ff erent time units are another sourceof unobserved heterogeneity.To specify observation-specific variations in the e ff ect of the observed variables, model parameters are assumedto be random variables with various parametric and semi-parametric distributions. Dirichlet process mixture (Heydariet al., 2017) and its parametric counterpart, a finite mixture of Gaussian distributions (Buddhavarapu et al., 2016), arethe state-of-the-art mixing distributions. Both are discrete-continuous representations of heterogeneity where eachobservation has a probabilistic association with latent classes and a normal distribution is specified within each class.To specify the unobserved spatial dependence between observations, various specifications of spatial correlationhave been explored in the literature – intrinsic conditional autoregressive (ICAR) (MacNab, 2004; Aguero-Valverdeand Jovanis, 2008; Wang and Kockelman, 2013), spatial autoregressive and spatial error model (Quddus, 2008) , andgeographic weighted Poisson regression (Hadayeghi et al., 2010). There is no consensus among researchers in termsof superiority of one specification over others and the choice of the specification is generally driven by computationalconvenience. In contrast to the abundant literature on modeling spatial correlation, only a handful of crash frequencystudies account for spatiotemporal correlations. Miaou et al. (2003) first introduced spatiotemporal correlations totra ffi c crash frequency modeling by adopting a hierarchical Bayesian framework. In the frequentist setting, seminalwork by Castro et al. (2012) facilitates the incorporation of spatiotemporal correlation and observation-specific het-erogeneity in count data models by recasting them as a restricted version of a generalized ordered response model.Some recent studies propose variants of spatiotemporal count data models, but most of them resort to the Bayesianestimation (Aguero-Valverde and Jovanis, 2006; Truong et al., 2016; Dong et al., 2016; Liu and Sharma, 2017; Maet al., 2017; Cheng et al., 2018; Liu and Sharma, 2018; Li et al., 2019). This is perhaps because Markov ChainMonte Carlo (MCMC) methods are easier to implement in a canned software like OpenBUGS (Lunn et al., 2009) orWinBUGS (Lunn et al., 2000). We identify two main limitations of the existing count data models with spatiotemporal correlations: • Modeling: none of these models incorporates temporal variation in model parameters, rather temporal corre-lation is specified in the link function after conditioning on observed covariates. Such specifications cannotmodel the temporal variation in parameters. In the absence on any coherent model, the temporal instability ofparameters is often quantified by estimating crash count / severity data models for each period, followed by thehypothesis testing to evaluate whether parameters of consecutive periods are statistically di ff erent or not (Islamand Mannering, 2020; Islam et al., 2020). Such methods fail to account for the inherent dependence betweenparameters of consecutive time periods. 2 Estimation: the studies relying on MCMC-based estimation use conventional Gibbs samplers, which do nothave closed-form conditional marginal posteriors. Therefore, they have to embed the Metropolis-Hastings (MH)routine into a Gibbs sampler for posterior inference. This approach is prone to computational and convergenceissues (Liu and Sharma, 2017), and is highly sensitive to initial values Liu and Sharma (2018). This is becausethe step size tuning in MH is challenging – a small step size leads to high serial correlation and a large step sizemay not fully explore the posterior domain (Rossi et al., 2012).
To address these research gaps, we advance the specification of temporal correlation in spatial Negative-Binomial(NB) models and propose an e ffi cient posterior inference routine. The proposed specification allows for temporalvariation in NB parameters using dynamic linear models (DLMs) and temporal variation in spatial correlations byleveraging ICAR priors. DLMs provide a flexible structure, which not only accounts for cross-temporal correlationsacross regression coe ffi cients, but also enables temporal variation in the coe ffi cients of the auto-regressive process.The conventional Gibbs sampler for this model also su ff ers from the unavailability of non-conjugate priors for theNB likelihood. To this end, we add P´olya-Gamma-distributed auxiliary variables in the hierarchical structure of thespecification to transform the NB likelihood into Gaussian distribution and the resulting conjugate structure providesa Gibbs sampler with closed-form posterior updates (Polson et al., 2013). The proposed P´olya-Gamma augmentedGibbs sampler circumvents the need for MH steps in MCMC simulation, which enables computationally-e ffi cient androbust estimation. We first validate the inference procedure in a Monte Carlo study and illustrate its application inestimating crash counts of the contiguous road segments in the Houston area from 2003 to 2011.We also extend the proposed specification to additionally account for observation-specific heterogeneity in non-temporal parameters, where the heterogeneity is specified using a finite mixture of Gaussian distributions (Bud-dhavarapu et al., 2016). A P´olya-Gamma-augmented Gibbs sampler is also derived for this extension.The remaining of this paper is organized as follows. The proposed specification and the Bayesian inference algo-rithm are described in sections 2 and 3, respectively. Section 4 outlines the changes in the original model specificationdue to inclusion of the observation-specific heterogeneity in non-temporal parameters and discusses correspondingmodifications in the Gibbs sampler. Subsequently, a Monte Carlo study is presented in section 5, followed by theempirical study in section 6. Lastly, section 7 concludes with the key findings, and highlights potential avenues forfuture research.
2. Model Development
We analyze a crash count data across T years from n contiguous road segments. Let y it represents the crashcount on i th road segment during t th year. Crash counts { y it : i ∈ { , ....., n } , t ∈ { , ....., T }} are assumed to be gen-erated by a NB process with parameters p it and r . The site-specific attributes may be divided into two groupsbased on their e ff ect on the respective crash count: time-invariant fixed parameters and time-varying parameters.Let X Fit = [ x Fit , x Fit , ...., z Fitg ] denote the 1 × g attribute vector with fixed coe ffi cients γ = [ γ , γ , ...γ g ]. Also, let X Dit = [ X Dit , X Dit , ....., X Ditq ] denote a 1 × q vector of attributes with dynamic parameters and θ t = { θ t , θ t , ......, θ tq } de-note the vector of time-varying regression coe ffi cients. Note that the matrices X F and X D are of nT × g and nT × q dimension, respectively. The proposed crash count model is described below: y it ∼ NB( r , p it ); i ∈ { , , ... n } ; t ∈ { , , ... T } p it = exp( ψ it )1 + exp( ψ it ) ; ψ it = X Fit γ + X Dit θ t + φ ti (1)We consider all attributes to be time-varying, but time-invariant attributes can also have a time-varying e ff ect andcan be included in the current specification by repeating them across periods. The temporal variation of the regressioncoe ffi cients θ t is modeled as a dynamic linear model. The crash counts of contiguous road segments are likely to bespatially correlated and the magnitude of spatial correlation may change over time. The proposed specification allowsfor time-varying spatial correlation through time-specific spatial random e ff ects. A vector of spatial random e ff ects φ t generated using ICAR prior structure is utilized to induce spatial correlations across the crash counts at time t . In3ubsequent subsections, we discuss the specification of time-varying parameters and ICAR prior structure, followedby summarizing the generative process of the proposed model. Dynamic regression facilitates the variation of the parameters according to a specified state-space structure. Forinstance, dynamic linear models (DLMs) are formulated by assuming linear operators while specifying the systemof equations. DLMs are extensively used in time series applications for extracting the underlying states that mightbe driving temporal changes in the outcome of interest. We assume that the following DLM structure generates theobserved crash count time series. ζ t = F t θ t + ν t ; ν t ∼ Normal(0 , V t ) θ t = G t θ t − + u t ; u t ∼ Normal(0 , W t )while inferring vector θ T conditional on other model parameters, Bayesian implementation allows to pretend ζ t ,instead of y t ( n × t ), as the observed outcome at time t . A data augmentation techniqueis employed to transform y t into a multivariate Gaussian distributed random variable ζ t (construction of ζ t from y t isfurther discussed in subsection 3.2). The attribute matrix F t is generally constructed using the time-varying attributes(i.e., X Dt , n × q ), but time-invariant attributes can be included by repeating them across time periods. The vector ofobservations ζ t are generated by the latent parameter vector θ t after transformation using the attribute matrix F t andadding a zero-centered multivariate Gaussian noise term ν t with the covariance matrix V t .The latent parameter vector θ t is assumed to be generated according to a linear state equation. θ t is generated bythe transformation of θ t − using the system operator matrix G t ( q × q ) and adding a zero-centered multivariate Gaussiannoise term u t with the covariance matrix W t . G t may be designed such that θ t is generated through an auto-regressive(AR-1) process. The DLM structure allows for specifying any other general auto-regressive structure. For instance, G t matrix may be specified such that a current state θ tk is dependent on another previous state θ tk (cid:48) (where k (cid:44) k (cid:48) ),rather than just on the last time period. In addition, G t may be designed as a time-varying system operator matrix;however, we assume it time-invariant. The proposed DLM framework is thus adequately flexible to investigate severalunderlying temporal patterns beyond the specific structure considered in this study. As mentioned earlier, we transform the NB likelihood into Gaussian likelihood by adding P´olya-Gamma-distributedauxiliary variables (more details in subsection 3.1). To exploit this conjugacy attained with the data augmentation,we assume Gaussian ICAR prior on spatial random e ff ects. ICAR prior generates spatially correlated random e ff ectsbased on a neighborhood or distance based correlation matrix. We utilize a neighborhood weight matrix W definedas follows: w i j = / k , if i and j are k-order neighbors. We denote w i + as the sum of i th row of the weight matrix.The spatial dependence is assumed to be proportional to the closeness of the neighboring road segments. The spatialcorrelation may vary with time, which we model by allowing for temporal variation in the parameter τ t . ICAR priorspecifies the distribution of spatial random e ff ect of i th road segment at time t (i.e. φ ti ), conditional on spatial randome ff ects of other road segments (i.e. φ t − i ): φ ti | φ t − i ∼ N (cid:88) j w i j w i + φ tj , τ t w i + (2)It is worth noting that the τ t parameter does not quantify the strength of spatial correlation at time t (see section3.3 of Banerjee et al., 2004, for a detailed discussion). Total variation in mean crash count may be decomposed intounstructured heterogeneity and structured spatial variation; we use the proportion of variation due to spatial clusteringas an estimate of strength of spatial correlation: α t = σ φ t σ φ t + σ (cid:15) t , where σ φ t is the empirical standard deviation of posteriordraws of spatial random e ff ects, and σ (cid:15) t is the standard deviation of the unstructured random e ff ects generated by theGamma mixing in the NB model at time t . We construct the empirical posterior distribution of the α t by computing σ φ t and σ (cid:15) t in each MCMC iteration to estimate the strength of spatial correlation.4 .3. Generative Process To facilitate the Bayesian estimation, non-informative conjugate priors are imposed on the model parameters. Thegenerative process of the proposed NB spatial model with time-varying parameters is summarised below. y it ∼ NB( r , p it ); i ∈ { , , ... n } ; t ∈ { , , ... T } p it = + exp( − ψ it ) ; ψ it = X Fit γ + X Dit θ t + φ ti θ t = G t θ t − + u t ; u t ∼ Normal(0 , W t )Where, θ t = θ t θ t ...θ qt G t = ρ .. .. ρ .. .. .. .. .. .. .. .. .. ρ W t = σ θ .. .. .. σ θ .. .. .. .. .. .. .. .. .. σ θ q γ ∼ Normal( s , S ); θ ∼ Normal( m , C ); σ θ k qk = ∼ Gamma( a σ , b σ ) φ ti | φ t − i ∼ N (cid:88) j w i j w i + φ tj , τ t w i + τ − t ∼ Gamma( c , d ); r ∼ Gamma( r , h ); h ∼ Gamma( he , h f );where Gamma( ϕ, χ ) is Gamma distribution with mean ϕχ . { r , h , γ, ω it , θ t , σ θ k , φ ti , τ t } {∀ i , t , k } is a set of model parame-ters and { r , e , f , s , S , m , C , c , d , a σ , b σ } is a set of hyper-parameters. We set ρ =
3. Bayesian Inference
The model parameters are estimated using Markov Chain Monte Carlo (MCMC) simulation. As discussed earlier,analytical full conditional distributions are not available for NB regression models because their likelihood does nothave a conjugate prior specification. In this section, we first discuss the intuition behind the P´olya-Gamma data-augmentation, which we use to address the non-conjugacy of NB regression. Subsequently, we discuss ForwardFiltering Backward Sampling (FFBS) algorithm (Prado and West, 2010), which we employ for posterior inference oftime-varying parameters. We also discuss how the P´olya-Gamma data augmentation facilitates the integration of theFFBS algorithm into the Gibbs sampler of the proposed model. The steps of the Gibbs sampler are summarized inalgorithm 1.
Data augmentation involves introducing latent random variables into the specification that are useful to deriveanalytically tractable full conditional posteriors. P´olya-Gamma data augmentation relies on adding P´olya-Gamma-distributed auxiliary variables. Conditional on these additional variables, the logistic likelihood translates into Gaus-sian likelihood. The same strategy works for the NB regression because it involves logistic likelihood. The details onthe transformation of NB likelihood into Gaussian likelihood are provided in Appendix A. The conjugacy of the con-ditionally Gaussian likelihood and multivariate normal priors leads to analytically tractable full conditional posteriordistributions for the non-temporal fixed parameters (i.e., γ ).The P´olya-Gamma data augmentation also allows to construct a tractable analytical full conditional posterior forspatial random e ff ects. Time-dependent ICAR priors are improper probability distributions on the vector of spatialrandom e ff ects (Banerjee et al., 2004). However, the full conditional posterior turns out to be a proper probability5istribution. We impose a sum-to-zero constraint for each time t by recentering the draws of φ t in each MCMCiteration; i.e. φ ti = φ ti − n (cid:80) i = φ ti ∀ i ∈ { , , ....., n } ; t ∈ { , , ....., T } .We also employ another data augmentation technique proposed by Zhou et al. (2012) to construct the full con-ditional distribution for the dispersion parameter r (see Appendix B.1 for details). We consider an alternate repre-sentation of NB regression – compound Poisson factorization, which introduces a Poisson-distributed latent indicatorvariable L it for each spatial unit i and period t in the model structure. The full conditional posterior of the dispersionparameter r turns out to be Gamma distribution. Initialization:
Initialize parameters: { r , h , γ, ω it , θ t , σ θ k , φ ti , τ t } , ∀ i ∈ { , . . . , n } , ∀ t ∈ { , . . . , T } , ∀ k ∈ { , . . . , q } ;Set hyper-parameters: { r , e , f , s , S , m , C , c , d , a σ , b σ } ; for to max-iteration sample i) r | – ∼ Gamma (cid:32) r + T (cid:80) t = N (cid:80) i = L it , h − T (cid:80) t = N (cid:80) i = ln(1 − p it ) (cid:33) (see equation 12 for the details of L it );ii) h | – ∼ Gamma( r + he , r + h f ) ;iii) γ | – ∼ Normal (cid:32) V γ (cid:32) n (cid:80) i = X Fi (cid:48) Ω − i ( z i − φ i − X Di θ t ) + S − s (cid:33) , V γ (cid:33) , where V γ = (cid:32) n (cid:80) i = X Fi (cid:48) Ω − i X Fi + S − (cid:33) − ;iv) { ω it | – } ∀ i , t ∼ PG( y it + r , ψ it ) ;v) { θ t } Tt = using forward filtering and backward smoothing (appendix B.5);Forward filtering: for t in T sample Posterior at t − θ t − | D t − ∼ Normal( m t − , C t − ) ;Prior at t : θ t | D t − ∼ Normal( G t m t − , G t C t − G (cid:48) t + W t ) ;Predictive at t : ζ t | D t − ∼ Normal( F t a t , F t R t F (cid:48) t + Ω t ) ;Posterior at t : θ t | D t ∼ Normal( a t + R t F (cid:48) t Q − t ( ζ t − f t ) , R t − R t F (cid:48) t Q − t F t R t ) ; end Backward smoothing: for t in ( T −
1) : 1 sample θ t | θ t + , D t ∼ Normal( m t + C t G (cid:48) t + R − t + ( h t + − a t + ) , C t − C t G (cid:48) t + R − t + ( R t + − B t + ) R − t + G (cid:48) t + C t ); end vi) (cid:26) σ θ k (cid:27) qk = | – ∼ Gamma a σ + ( T − , b σ + T (cid:80) t = ( θ t − G t [ k , k ] θ t − ) ;vii) (cid:110) φ ti | – (cid:111) {∀ i , t } ∼ Normal (cid:32)(cid:18) ω it + w i + τ t (cid:19) − (cid:32) [ z it − ( X Fit γ + X Dit θ t )] ω it + (cid:32)(cid:80) j w i j φ tj (cid:33) τ t (cid:33) , (cid:18) ω it + w i + τ t (cid:19) − (cid:33) ;viii) (cid:110) τ − t | – (cid:111) Tt = ∼ Gamma c + n , d + n (cid:80) i = w i + (cid:34) φ ti − (cid:80) j w ij w i + φ tj (cid:35) ;ix) Compute spatial correlation: { α t } Tt = = σ φ t σ φ t + σ (cid:15) t . endAlgorithm 1: Gibbs sampler for the spatial negative binomial model with time-varying parameters.
We adopt FFBS algorithm (originally proposed by Frhwirth-Schnatter (1994); Carter and Kohn (1994)) for poste-rior sampling of time-varying regression parameters θ T . FFBS algorithm simultaneously produces posterior drawsof the state vector θ T through forward sampling followed by backward smoothing in each MCMC iteration. FFBSalgorithm is not capable of handling a non-linear model (such as NB regression) (Windle et al., 2013). However, the6ransformation of the NB-distributed crash counts y it into a conditionally Gaussian distributed data vector z it = y it − r ω it using P´olya-Gamma data augmentation facilitates the adoption of FFBS in the proposed non-linear model.The NB likelihood shown in Equation 1 can be equivalently written as (see Appendix A for details): z it | ω it , – ∼ Normal( ψ it , ω − it )Where, ψ it = X Fit γ + X Dit θ t + φ ti and ω it is a P´olya-Gamma distributed auxiliary variable. Now, the evolution equationsof the system may be written as follows using the transformed data z t . z t = X Ft γ + X Dt θ t + φ t + ν t z t − X Ft γ − φ t = X Dt θ t + ν t ζ t = F t θ t + ν t ; ν t ∼ Normal(0 , Ω t ) (3) Ω t = ω t .... ω t ....
00 0 .... ω nt θ t = G t θ t − + u t ; u t ∼ Normal(0 , W t )where ζ t = z t − X Ft γ − φ t and F t = X Dt . The above augmented specification matches with a traditional dynamiclinear model.
4. Extension: Inclusion of Random Parameters
We first discuss the required modifications in the original model specification to include non-temporal randomparameters in section 4.1. Subsequently, we highlight key modifications in the Gibbs sampler of the extended modelin section 4.2.
Let X Rit = [ X Rit , X Rit , ...., X Rith ] denote the 1 × h attribute vector with time-invariant random parameters β i = [ β i , β i , ...β ih ]. After introducing non-temporal random parameters, only link function ψ it in the original model (seeEquation 1) is modified as: ψ it = X Fit γ + X Rit β i + X Dit θ t + φ ti (4)Following Buddhavarapu et al. (2016), we consider a finite mixture of multivariate normal distributions on time-invariant random parameter β i . If µ c and Σ c are mean vector and covariance matrix corresponding to c th componentand η c is a weight of the c th component in the mixture of C components, the flexible discrete-continuous distributionon β i is represented as follows: β i ∼ C (cid:88) c = η c Normal( µ c , Σ c ); C (cid:88) c = η c = , (5)Due to change in specification of ψ it , ζ t in equation 3 of the augmented dynamic linear model also changes to ζ t = z t − X Ft γ − X Rt β i − φ t . Apart from these modifications, we add the following priors to the the generative processof the original model as presented in section 2.3: η c ∼ Dirichlet( α , ...α ); µ c ∼ Normal( b , B ); Σ − c ∼ Wish( ν , V );where Wish ( ξ, Ξ ) is Wishart distribution with mean ξ Ξ . Thus, { r , h , γ, β i , µ c , Σ c , η c , ω it , θ t , σ θ k , φ ti , τ t } {∀ i , t , c , k } is a setof model parameters and { r , e , f , s , S , b , B , ν , V , α , m , C , c , d , a σ , b σ } is a set of hyper-parameters for theextended specifications. We set ρ = .2. The Modified Gibbs Sampler Since the inclusion of non-temporal random parameters in the model specification adds just another layer ofparameters in the hierarchy, the data augmentation techniques used for the original specification are applicable for theextended specification. In fact, as a consequence of P´olya-Gamma data augmentation, the parameters associated withthe mixing distribution of the additional non-temporal random parameters (i.e., µ C , Σ C , η C ) also attain conjugateposterior updates.Conditional posterior distributions of the original model parameters (except γ , the non-random parameter) eitherremain una ff ected or are slightly modified after addition of non-temporal random parameters. We develop a blockedGibbs sampler for the extended specification to improve the mixing of Markov chains. The detailed derivation ofthe conditional distributions of model parameters for the extended specification are provided in Appendix B and thesteps of the Gibbs sampler are summarized in Algorithm 2. Conditional distributions for original model parameterspresented in Algorithm 1 can be easily retrieved from those derived for the extended specification due to nesting ofthe original specification within the extended specification.
5. Monte Carlo Study
Before applying the proposed dynamic spatial NB (DSNB) model in an empirical application, validation of thefinite sample and convergence properties of the Gibbs sampler is important. To this end, we simulate crash count datausing the data generating process (DGP) of the DSNB specification and estimate the marginal posterior distributionsof model parameters using the derived Gibbs sampler.
We assume a highway network of 1000 road segments (i.e., n = T = ff ect γ and three covariates withtime-varying e ff ect θ t . Whereas features with time-invariant e ff ects X Fit are generated from a multivariate normaldistribution, features with time-varying e ff ects X Dit are generated by assuming a series of correlated draws utilizing asinusoidal temporal pattern with Gaussian random noise. A binary spatial weight matrix is created assuming that thecrash counts of a given road segment are spatially correlated with those of 4 contiguous road segments on each sidealong the highway. The following true parameters are set prior to generating the intermediate model parameters: γ , σ θ { q } , τ T , G T , and r . Subsequently, we generate { φ it } i , t and θ T according to the spatial and dynamic linear modelsof the proposed DSNB specification. Lastly, p i t is computed for each road segment i at year t , followed by generatingcrash count y it using a negative binomial likelihood with parameters p i t and r . We choose true model parameters insuch a way that the distribution of the simulated crash counts approximately matches with that of the empirical crashdata considered in this study. We write our own code to implement the Gibbs sampler (summarized in Algorithm 1) on the simulated datain R software (R Core Team, 2020). We write several components of the code in Rcpp package (Eddelbuettel andBalamuta, 2017) to gain additional computational advantages. To be specific, Rcpp implementation accelerates thecomputation of these components by a factor of 10 or more. We employ
BayesLogit package to sample e ffi cientlyfrom P´olya-Gamma-distributed random variables (Polson et al., 2012). Around 2000 burn-in draws were deemedsu ffi cient to attain stationary distribution, and the marginal posterior of model parameters were estimated based onsubsequent 1000 MCMC draws. We carry out simulations on a Macintosh machine with Intel Core i5 CPU with 2.7GHz and 8GB RAM. An average run time of around 6 hours is required to take 3000 MCMC draws from the jointdistribution.To demonstrate the convergence of the Gibbs sampler, we report Geweke diagnostic statistic (Geweke et al.,1991). The test takes two non-overlapping portions of the Markov chain, and performs a two-means Z-test to checkfor convergence. Moreover, we use the following metrics to assess the e ffi cacy of the proposed Gibbs Sampler inrecovering the true model parameters: • Mean Absolute Bias (MAB) = (cid:12)(cid:12)(cid:12) True parameter value - Posterior mean (cid:12)(cid:12)(cid:12) Absolute Percentage Bias (APB) = (cid:12)(cid:12)(cid:12)(cid:12) True parameter value - Posterior meanTrue parameter value (cid:12)(cid:12)(cid:12)(cid:12) × •
95% credible interval coverage : a binary indicator, which is 1 if the true parameter lies in the estimated 95%credible interval.
Table 1 summarizes the posterior statistics of model parameters, aforementioned performance metrics, and con-vergence diagnostics. The APB values of all model parameters range between 0.75% and 28.39%, and all key modelparameters are captured within the estimated 95% posterior credible intervals. To illustrate the recovery of time-varying parameters graphically, we plot the recovered posterior statistics of these parameters alongside their truevalues across ten years in Figure 1. We also superimpose the distribution of posterior means of road segment-specificspatial random e ff ects on the true values used in generating crash counts in Figure 2. The resemblance in both plotsensures the e ffi cacy of the proposed Gibbs sampler in recovering spatial random e ff ects. Figure 1: Simulation results for dynamic spatial NB (DSNB) model: time-varying parameters
We choose a 95% threshold for hypothesis testing using Geweke statistic. In testing, we use Bonferroni correction(Napierala, 2012) because we do multiple testing, i.e. individually check convergence of Markov chains for all 34model parameters of interest (30 time varying, 3 time-invariant, and 1 dispersion parameter). The Geweke statistic ofmodel parameters are shown in Figure 3 – Z-score of the most of the model parameters is between -3.18 and 3.18 (95%confidence interval thresholds with Bonferroni correction), suggesting that Markov chains have attained stationarityand the sampler has converged.In summary, convergence diagnostics and parameter recovery metrics of all the relevant parameters, i.e. the onesthat are used as input to devise safety policies, indicate that the presented Gibbs sampler in Algorithm 1 is appropriatefor posterior inference in the DSNB model and can be used in empirical studies.9 able 1: Results of the simulation for dynamic spatial NB (DSNB) model
Parameter True value Posterior mean 2.5%-quantile 97.5%-quantile MAB APB 95%-coverage
Time-invariant parameters γ γ γ -0.100 -0.089 -0.103 -0.073 0.011 11.40% 1 r θ [1] 0.500 0.483 0.388 0.563 0.017 3.32% 1 θ [1] 0.502 0.574 0.498 0.654 0.073 14.47% 1 θ [1] 0.551 0.565 0.483 0.654 0.014 2.53% 1 θ [1] 0.553 0.564 0.477 0.650 0.011 2.00% 1 θ [1] 0.635 0.622 0.549 0.701 0.013 2.08% 1 θ [1] 0.616 0.605 0.526 0.689 0.011 1.80% 1 θ [1] 0.443 0.459 0.366 0.548 0.016 3.51% 1 θ [1] 0.419 0.412 0.325 0.490 0.007 1.72% 1 θ [1] 0.440 0.432 0.349 0.518 0.008 1.79% 1 θ [1] 0.375 0.397 0.312 0.471 0.023 6.07% 1 θ [2] -0.500 -0.567 -0.768 -0.349 0.067 13.50% 1 θ [2] -0.479 -0.615 -0.796 -0.431 0.136 28.39% 1 θ [2] -0.531 -0.675 -0.862 -0.473 0.143 26.93% 1 θ [2] -0.565 -0.709 -0.896 -0.530 0.144 25.46% 1 θ [2] -0.551 -0.688 -0.875 -0.477 0.137 24.89% 1 θ [2] -0.587 -0.751 -0.946 -0.571 0.164 27.94% 1 θ [2] -0.629 -0.719 -0.912 -0.532 0.090 14.35% 1 θ [2] -0.603 -0.716 -0.918 -0.536 0.113 18.71% 1 θ [2] -0.660 -0.689 -0.893 -0.497 0.029 4.41% 1 θ [2] -0.581 -0.606 -0.822 -0.381 0.025 4.36% 1 θ [3] -0.500 -0.589 -0.831 -0.361 0.089 17.83% 1 θ [3] -0.481 -0.543 -0.717 -0.346 0.062 12.99% 1 θ [3] -0.527 -0.574 -0.770 -0.387 0.047 8.92% 1 θ [3] -0.531 -0.573 -0.753 -0.393 0.042 7.93% 1 θ [3] -0.593 -0.549 -0.711 -0.370 0.044 7.40% 1 θ [3] -0.684 -0.541 -0.708 -0.328 0.142 20.83% 1 θ [3] -0.668 -0.595 -0.786 -0.395 0.073 10.91% 1 θ [3] -0.655 -0.678 -0.865 -0.503 0.023 3.44% 1 θ [3] -0.693 -0.732 -0.935 -0.540 0.039 5.64% 1 θ [3] -0.707 -0.723 -0.932 -0.511 0.016 2.23% 1Deviance Information Criterion (DIC) 39263.25Number of MCMC iterations 3000Number of burn-in iterations 2000Number of simulated road segments 1000Number of simulated years of data 10Geweke convergence diagnostics See Figure 3 igure 2: Simulation results for dynamic spatial NB (DSNB) model: spatial random e ff ects pooled across all yearsFigure 3: Simulation results: Geweke diagnostic statistic
6. Empirical Analysis
This section presents an application of the proposed DSNB model in understanding the relationship betweenroad attributes and crash counts using panel data. Quantifying these relationships help in designing new safety-countermeasures. Studying the evolution of the relationship of road attributes with crash outcomes enables the devel-opment of informed safety improvement strategies.
In this empirical analysis, we model historical crash counts of contiguous freeway road segments of a metropoli-tan road network from Houston city, USA, which has grade-separated freeway junctions. We use data from eleven11i ff erent road facilities across a span of 9 years (2003 to 2011). We source crash counts from the publicly availablemotor vehicle Crash Record Information System (CRIS) database maintained by the local Department of Transporta-tion (DOT). We geographically map individual crash occurrences to the respective road segments and subsequentlyaggregate them temporally to obtain the annual crash counts for each road segment. We further integrate the crashdata with road condition management databases, which track several road-specific attributes of the road transportationnetwork. This integration helps in linking the annual crash counts with the respective annually aggregated road-segment-specific attributes. More details about the data collection and processing are presented in (Buddhavarapu,2015).The attributes of road segments include tra ffi c volumes, various geometric features, pavement surface character-istics and structural distresses, and road locations. Table 2 reports annual summary statistics of these time-varyingand time-invariant attributes of road segments and the key trends are discussed below. An overall decreasing trendof mean crash counts during the study period indicates that the study network witnessed safety improvements overtime. Annual average daily tra ffi c (AADT) increased on an average, while the proportion of truck tra ffi c remainedconstant for the initial few years, followed by a slight reduction during the last part of the study period. The averagespeed limit across the road segments slightly changed from 2003 to 2004 and remained consistent until the end of thestudy period. The average number of lanes and shoulder widths did not vary much during the study period, indicat-ing no significant changes to the road network in terms of road widening. The ride quality of a road (measured byinternational roughness index, IRI) and road distress index indicate that the road condition of the study network (onan average) improved slightly during the analysis period. This improvement reflects the e ffi cacy of the maintenancee ff orts of DOT in managing the road network. The proportion of continuously reinforced concrete pavements (CRCP)also exhibits an increasing trend. About 45% of the road segments belong to interstate highways (IH), and the restof the road segments belong to state highways (SH) and US highways. The reduction in the proportion of the roadsegments with rural area flag indicates urban sprawl in the study area. We have established the statistical properties of the Gibbs sampler for the DSNB model in the simulation studyin section 5. In this empirical study, we follow the same specification and procedure to implement the Gibbs sampleras used in the simulation study (see section 5.2 for details). For the model selection, i.e. to identify a set of roadattributes that explain the variation in the crash counts, we test various model specifications and select the one withthe lowest Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002)).To demonstrate the importance of accounting for temporal variation in parameters, we also estimate another con-strained model specification where we consider all parameters to be time-invariant. In other words, in addition to aDSNB specification, we also estimate a corresponding spatial NB (SNB) model. Tables 3 reports the posterior summaries of DSNB model parameters. Since presenting all temporal parametersof DSNB in tabular format results in a very long table, we only report variance of the AR(1) process for each of theseparameters in Table 3. However, we plot the posterior summaries of time-varying parameters and spatial correlation(see Section 2.2 for discussion on spatial correlation) of DSNB model in Figures 4 and 5, respectively, while jux-taposing the posterior estimates of the SNB specification for comparison. To perform statistical inference on modelparameters, we report mean, standard deviation, and 95% credible interval limits of the estimated posterior distribu-tions. The posterior mean highlights the magnitude of the model parameters, while credible intervals provide insightson proximity to any null hypothesis value of interest. In the next subsection, we provide a detailed discussion on theestimation results and their implications to safety management. Whereas we have extended the proposed DSNB model to account for unobserved heterogeneity in time-invariant parameters in section 4 andalso derived its Gibbs sampler in Algorithm 2, we do not adopt this specification in this application because such e ff ects are hard to empirically iden-tify along with time-varying parameters using the sample size of the empirical study. The likelihood of empirically identifying such heterogeneouse ff ects would improve with the increase in the number of road segments and the repeated observations in the sample. able 2: Descriptive statistics for years from 2003 to 2011 Category Description Mean (Standard Deviation)2003 2004 2005 2006 2007 2008 2009 2010 2011
Crashes Crash count 21.2 (33.8) 17.3 (25.6) 19.8 (27.4) 19.4 (26.9) 19.2 (25.8) 15.2 (20.6) 14.2 (19.2) 17.4 (23.7) 14.5 (21.6)Tra ffi c Annual Average Daily Tra ffi c 44706 (32102) 49566 (35998) 50820 (38000) 50098 (36601) 51049 (37334) 51911 (37074) 52504 (37918) 50889 (36505) 51047 (36330)Tra ffi c load estimate 130 (12) 173 (20) 119 (16) 142 (10) 166 (14) 167 (14) 157 (12) 187 (15) 185 (15)Truck tra ffi c percentage 11.4 (7.3) 11.1 (6.8) 11.1 (6.3) 11.2 (6.3) 10.5 (6.7) 10.4 (6.5) 10.7 (6.6) 10.8 (6.4) 10.6 (5.2)Imposed speed limit (miles / hours) 62 (6) 59 (6) 61 (6) 61 (5) 61 (5) 61 (5) 61 (5) 61 (5) 61 (5)Geometric Number of lanes (per direction) 3 (1) 3 (1) 3 (1) 3 (1) 3 (1) 3 (1) 3 (1) 3 (1) 3 (1)Total surface roadway width (ft) 52.5 (14.3) 53.3 (14.9) 54.2 (15.4) 54.3 (15.4) 54.5 (15.2) 55 (15.3) 56 (15.6) 56.3 (15.4) 56.2 (15.4)Left shoulder width (ft) 8.4 (2.4) 8.2 (2.6) 8.4 (2.7) 8.5 (2.8) 8.6 (2.8) 8.7 (2.8) 8.4 (3.3) 8.5 (3.2) 8.3 (3.5)Right shoulder width (ft) 8.8 (2.3) 8.9 (2.3) 9 (2.3) 9 (2.3) 9 (2.3) 9.1 (2.3) 9.8 (2.1) 9.6 (2.3) 9.5 (2.6)Segment length (mile) 0.5 (0.1) 0.5 (0.1) 0.5 (0.1) 0.5 (0.1) 0.5 (0.1) 0.5 (0.1) 0.5 (0.1) 0.5 (0.1) 0.5 (0.1)Pavement Road Condition Index 82 (24) 83 (22) 84 (20) 84 (21) 85 (20) 87 (18) 86 (19) 88 (18) 88 (17)Road Distress Index 88 (21) 88 (19) 90 (16) 90 (17) 92 (15) 93 (14) 93 (14) 94 (13) 94 (12)Road Ride Index 3.4 (0.6) 3.5 (0.5) 3.4 (0.6) 3.5 (0.6) 3.4 (0.6) 3.5 (0.6) 3.5 (0.6) 3.5 (0.6) 3.5 (0.6)Average IRI (inch / mile) a
118 (35) 113 (33) 118 (36) 114 (37) 117 (36) 114 (34) 117 (36) 113 (34) 113 (34)Left wheel path IRI (inch / mile) a
117 (35) 111 (32) 116 (36) 112 (36) 117 (35) 108 (34) 115 (35) 112 (33) 107 (31)Right wheel path IRI (inch / mile) a
119 (36) 116 (35) 120 (36) 116 (37) 118 (38) 121 (38) 119 (39) 114 (36) 119 (40)Maintenance Cost ($ scaled to hide actual budgets) 921 (2026) 1133 (2191) 834 (1286) 887 (1834) 1446 (9154) 991 (1969) 993 (1963) 879 (1632) 943 (2020)Indicator: Asphalt pavement 0.21 0.19 0.20 0.19 0.17 0.17 0.14 0.12 0.12Indicator: CRCP pavement b c a IRI: international roughness index b CRCP: continuously reinforced concrete pavements c JCP: jointed concrete pavement able 3: Posterior estimation results for dynamic spatial NB (DSNB) model Description Posterior mean Posterior Std.Dev. 2.5%-quantile 97.5%-quantile
Time-invariant parametersIntercept 0.801 0.077 0.609 0.894Indicator Variable: Asphalt pavement 0.271 0.044 0.192 0.369Indicator Variable: Facility-IH 0.712 0.066 0.601 0.847r 1.416 0.041 1.339 1.488AR (1) variance for time-varying parametersIntercept 0.028 0.023 0.006 0.084Annual Average Daily Tra ffi c 0.012 0.021 0.001 0.066Segment length (mile) 0.002 0.002 0.000 0.008Truck tra ffi c percentage 0.013 0.016 0.001 0.055Indicator Variable: Rural Area 0.020 0.022 0.002 0.081Indicator Variable: Shoulder type - Asphalt 0.006 0.007 0.001 0.024Avg International Roughness Index (IRI inch / mile) 0.003 0.004 0.000 0.011Total shoulder width (ft) 0.004 0.006 0.001 0.019Imposed speed limit (miles / hour) 0.003 0.003 0.000 0.009MCMC diagnostics and sample meta dataDeviance Information Criterion (DIC) 69113.02Number of MCMC iterations 3000Number of burnin iterations 2000Number of road segments 1158Number of years of data 9 We mainly focus on the findings of the DSNB model and compare them with those of the SNB specification.Figure 4 shows that except intercept, 95% credible intervals of time-varying parameters in DSNB model almost fullycover those in SNB model. We observe a similar overlapping pattern for spatial correlation in Figure 5. This figurealso shows a little temporal variation in the magnitude of spatial correlation. The AR(1) variance estimates of time-varying parameters in Table 3 indicate that many link function parameters in DSNB specification exhibit statisticallysignificant temporal instability and trends in Figure 4 shows that the temporal changes in many of these parametersare practically significant. We now discuss sign, magnitude, and practical implications of parameter estimates.The positive posterior mean estimates indicate that road segments on interstate highways with asphalt pavementsare likely to have higher mean crash counts as compared to those on other facility types with concrete pavements,keeping all other covariates constant. These di ff erences between pavement-facility types remain constant over timebecause corresponding indicators have time-invariant parameters in the final specification.Among time-varying parameters, we first discuss segment length and AADT, which are generally considered asexposure variables. As expected, the longer segments with higher AADT are associated with higher mean crashcounts in each year. The mean crash count per unit increase in segment length declines after 2007, and the e ff ectof AADT on crash counts remains nearly constant (see Figure 4). The magnitude of the negative association of theproportion of truck tra ffi c volume with mean crash count first decreases, then remains constant, and finally increasesduring the study period. Negative posterior mean estimate suggests that road segments in rural areas experience lowermean crash counts as compared to that of non-rural road segments. The di ff erence in mean crash counts of rural andnon-rural road segments remains fairly constant over the years. The pavements with asphalt shoulders experiencelarger mean crash count relative to the other types of shoulders throughout the study period, and the magnitude of thisdi ff erence temporally increases as shown in the Figure 4. International roughness index (IRI) is positively associatedwith the mean crash counts, and the strength of this association consistently increases with time (see Figure 4). Sincelower IRI values are proxy for superior road conditions, this result indicates that the safety benefits of improving theride quality of road segments are increasing over time. 14 igure 4: Posterior estimates of temporal parameters from dynamic spatial NB (DSNB) specification (Black lines: posterior mean; shaded region:95% credible intervals) and spatial NB (SNB) specification (Blue lines: posterior mean; dashed blue lines: 95% credible intervals)Figure 5: Temporal variation in spatial correlation
7. Conclusions and Future Work
We propose a dynamic spatial negative binomial count data model that simultaneously allows for time-varyingparameters using a dynamic linear model formulation, unobserved heterogeneity in time-invariant parameters, andtime-varying spatial correlations. This specification provides an elegant way to evaluate temporal instability, a termcoined by Mannering (2018), in model parameters while accounting for all potential sources of heterogeneity. Estimat-ing such a flexible model using traditional Markov Chain Monte Carlo methods is challenging due to non-conjugacyof the negative binomial likelihood. To this end, we leverage recently developed P´olya-gamma data-augmentationtechnique to address the challenges associate with non-conjugacy and adopt the Forward Filtering and BackwardSampling algorithm to perform posterior inference on dynamic parameters. These advancements enable us to derivefull conditional distributions for all the model parameters, and thus, obviate the need of Metropolis-Hasting step,resulting in a computationally-e ffi cient and robust Gibbs sampler for the proposed model. We demonstrate the finitesample and convergence properties of the proposed Gibbs sampler in a comprehensive simulation study.We also demonstrate the advantages of the proposed specification in modeling crash frequency spanning acrossnine years (from 2003 to 2011) from a freeway road network of Houston city, USA. For this analysis, we createannually-aggregated features of road segments by fusing road management databases with crash frequency informa-tion. A majority of parameters corresponding to both time-varying and time-invariant explanatory variables exhibitstatistically significant temporal instability. For example, the increasing magnitude of the positive parameter corre-sponding to the road roughness index suggests the temporal increase in the importance of maintaining superior ridequality to improve safety.While the findings from this small-scale empirical study are valuable for practitioners, our proposed frameworkis applicable to a higher number of features and larger networks due to the e ffi ciency and robustness of the derivedGibbs sampler. The posterior predictive distributions of the model parameters can be constructed to predict thecrash counts in a future year. However, a long crash history would be necessary to accurately predict the modelparameters corresponding to a future year using a dynamic linear model. Representing unobserved heterogeneity intime-invariant parameters with a finite mixture of Gaussian distributions would enable practitioners to classify roadsegments in various categories in a data-driven manner (Buddhavarapu et al., 2016) while simultaneously learningtemporal evolution of other parameters, particularly for large road networks with several years of crash data. In ourfuture work, we plan to explore recent developments in approximate Bayesian inference (Bansal et al., 2020; Lutset al., 2015) to make the estimation even faster and scalable, and eventually facilitate periodical data-driven safetystrategy development on large-scale networks for transportation agencies.16 eferences Aguero-Valverde, J., Jovanis, P.P., 2006. Spatial analysis of fatal and injury crashes in pennsylvania. Accident Analysis & Prevention 38, 618–625.Aguero-Valverde, J., Jovanis, P.P., 2008. Analysis of road crash frequency with spatial models. Transportation Research Record 2061, 55–63.Banerjee, S., Carlin, B.P., Gelfand, A.E., 2004. Hierarchical modeling and analysis for spatial data. Number 101 in Monographs on statistics andapplied probability, Chapman & Hall / CRC, Boca Raton, Fla.Bansal, P., Krueger, R., Graham, D.J., 2020. Fast bayesian estimation of spatial count data models. arXiv preprint arXiv:2007.03681 .Buddhavarapu, P., Scott, J.G., Prozzi, J.A., 2016. Modeling unobserved heterogeneity using finite mixture random parameters for spatially corre-lated discrete count data. Transportation Research Part B: Methodological 91, 492–510.Buddhavarapu, P.N.V.S.R., 2015. On Bayesian estimation of spatial and dynamic count models using data augmentation techniques: application toroad safety management. Ph.D. thesis.Carter, C.K., Kohn, R., 1994. On Gibbs Sampling for State Space Models. Biometrika 81, 541–553. doi: .Castro, M., Paleti, R., Bhat, C.R., 2012. A latent variable representation of count data models to accommodate spatial and temporal dependence:Application to predicting crash frequency at intersections. Transportation research part B: methodological 46, 253–272.Cheng, W., Gill, G.S., Zhang, Y., Cao, Z., 2018. Bayesian spatiotemporal crash frequency models with mixture components for space-timeinteractions. Accident Analysis & Prevention 112, 84–93.Dong, N., Huang, H., Lee, J., Gao, M., Abdel-Aty, M., 2016. Macroscopic hotspots identification: a bayesian spatio-temporal interaction approach.Accident Analysis & Prevention 92, 256–264.Eddelbuettel, D., Balamuta, J.J., 2017. Extending extitR with extitC ++ https://doi.org/10.7287/peerj.preprints.3188v1 , doi: .Frhwirth-Schnatter, S., 1994. Data Augmentation and Dynamic Linear Models. Journal of Time Series Analysis 15, 183–202. doi: .Geweke, J., et al., 1991. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. volume 196.Hadayeghi, A., Shalaby, A.S., Persaud, B.N., 2010. Development of planning level transportation safety tools using geographically weightedpoisson regression. Accident Analysis & Prevention 42, 676–688.Heydari, S., Fu, L., Miranda-Moreno, L.F., Jopseph, L., 2017. Using a flexible multivariate latent class approach to model correlated outcomes: Ajoint analysis of pedestrian and cyclist injuries. Analytic methods in accident research 13, 16–27.Islam, M., Alnawmasi, N., Mannering, F., 2020. Unobserved heterogeneity and temporal instability in the analysis of work-zone crash-injuryseverities. Analytic Methods in Accident Research , 100130.Islam, M., Mannering, F., 2020. A temporal analysis of driver-injury severities in crashes involving aggressive and non-aggressive driving. AnalyticMethods in Accident Research , 100128.Li, Z., Chen, X., Ci, Y., Chen, C., Zhang, G., 2019. A hierarchical bayesian spatiotemporal random parameters approach for alcohol / drug impaired-driving crash frequency analysis. Analytic Methods in Accident Research 21, 44–61.Liu, C., Sharma, A., 2017. Exploring spatio-temporal e ff ects in tra ffi c crash trend analysis. Analytic methods in accident research 16, 104–116.Liu, C., Sharma, A., 2018. Using the multivariate spatio-temporal bayesian model to analyze tra ffi c crashes by severity. Analytic methods inaccident research 17, 14–31.Lord, D., Mannering, F., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Trans-portation research part A: policy and practice 44, 291–305.Lunn, D., Spiegelhalter, D., Thomas, A., Best, N., 2009. The bugs project: Evolution, critique and future directions. Statistics in medicine 28,3049–3067.Lunn, D.J., Thomas, A., Best, N., Spiegelhalter, D., 2000. Winbugs-a bayesian modelling framework: concepts, structure, and extensibility.Statistics and computing 10, 325–337.Luts, J., Wand, M.P., et al., 2015. Variational inference for count response semiparametric regression. Bayesian Analysis 10, 991–1023.Ma, X., Chen, S., Chen, F., 2017. Multivariate space-time modeling of crash frequencies by injury severity levels. Analytic Methods in AccidentResearch 15, 29–40.MacNab, Y.C., 2004. Bayesian spatial and ecological models for small-area accident and injury analysis. Accident Analysis & Prevention 36,1019–1028.Mannering, F., 2018. Temporal instability and the analysis of highway accident data. Analytic methods in accident research 17, 1–13.Mannering, F.L., Bhat, C.R., 2014. Analytic methods in accident research: Methodological frontier and future directions. Analytic methods inaccident research 1, 1–22.Mannering, F.L., Shankar, V., Bhat, C.R., 2016. Unobserved heterogeneity and the statistical analysis of highway accident data. Analytic methodsin accident research 11, 1–16.Miaou, S.P., Song, J.J., Mallick, B.K., 2003. Roadway tra ffi c crash mapping: a space-time modeling approach. Journal of Transportation andStatistics 6, 33–58.Napierala, M.A., 2012. What is the bonferroni correction. AAOS Now 6, 40.NHTSA, 2017. Quick facts 2017. Available from https: // crashstats.nhtsa.dot.gov / Api / Public / ViewPublication / .Prado, R., West, M., 2010. Time Series: Modeling, Computation, and Inference. CRC Press.Quddus, M.A., 2008. Modelling area-wide count outcomes with spatial correlation and heterogeneity: An analysis of london crash data. AccidentAnalysis & Prevention 40, 1486–1497.Quenouille, M., 1949. A Relation between the Logarithmic, Poisson, and Negative Binomial Series. Biometrics 5, 162–164.R Core Team, 2020. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. URL: . ossi, P.E., Allenby, G.M., McCulloch, R., 2012. Bayesian statistics and marketing. John Wiley & Sons.Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Van Der Linde, A., 2002. Bayesian measures of model complexity and fit. Journal of the RoyalStatistical Society: Series B (Statistical Methodology) 64, 583–639.Truong, L.T., Kieu, L.M., Vu, T.A., 2016. Spatiotemporal and random parameter panel data models of tra ffi c crash fatalities in vietnam. AccidentAnalysis & Prevention 94, 153–161.Wang, Y., Kockelman, K.M., 2013. A poisson-lognormal conditional-autoregressive model for multivariate spatial analysis of pedestrian crashcounts across neighborhoods. Accident Analysis & Prevention 60, 71–84.Windle, J., Carvalho, C.M., Scott, J.G., Sun, L., 2013. Polya-Gamma Data Augmentation for Dynamic Models. arXiv preprint arXiv:1308.0774 .Zhou, M., Li, L., Dunson, D., Carin, L., 2012. Lognormal and gamma mixed negative binomial regression. arXiv preprint arXiv:1206.6456 . ppendix A P´olya-Gamma data augmentation A random variable ω has a PG(b,c) distribution (P´olya-Gamma distribution with parameters b and c) if ω D = π ∞ (cid:88) k = g k ( k − / + c / (4 π ) (6)where, g , g , ... g k , ... are independent and identically distributed (i.i.d) random variables with Gamma( b , ffi ciency of drawing P´olya-Gamma random variables.We now discuss how a NB likelihood can be converted to a Gaussian likelihood using P´olya-gamma data augmen-tation. Consider a simplistic representation of the cross-sectional NB regression model: y i ∼ NB( r , p i ); i ∈ { , , ... n } p i = exp( ψ i )1 + exp( ψ i ) ; ψ i = X i γ + φ i (7)The NB likelihood parametrized by log-odds can be written as, P ( y i | ψ i , r ) = Γ ( y i + r ) Γ ( r ) y i ! exp( ψ i ) y i (1 + exp( ψ i )) r + y i (8)After applying the main result of Polson et al. (2013), the equation can be written as P ( y i | ψ i , r ) = Γ ( y i + r ) Γ ( r ) y i ! 2 − ( r + y i ) exp (cid:32) ( y i − r ) ψ i (cid:33) (cid:90) ∞ exp − ω i ψ i p ( ω i ) d ω i = Γ ( y i + r ) Γ ( r ) y i ! 2 − ( r + y i ) exp (cid:32) ( y i − r ) ψ i (cid:33) E ω i exp − ω i ψ i (9)where ω i ∼ PG ( y i + r , γ ) becomesGaussian likelihood conditional on the P´olya-Gamma random variable ω i , r and ψ i as shown below. P ( y | ψ, r , ω ) = n (cid:89) i = P ( y i | ψ i , r , ω i ) = n (cid:89) i = Γ ( y i + r ) Γ ( r ) y i ! 2 − ( r + y i ) exp (cid:32) ( y i − r ) ψ i (cid:33) exp − ω i ψ i ∝ n (cid:89) i = exp − ω i (cid:34) ψ i − y i − r ω i (cid:35) ∝ exp (cid:32) − ( z − ψ ) (cid:48) Ω ( z − ψ )2 (cid:33) where, z i = y i − r ω i for i ∈ { , , .... n } and ψ i contains the regression coe ffi cients. z i = ψ i + ν i ; ν i ∼ N(0 , ω − i ) z = ψ + ν ; ν ∼ Normal(0 , Ω )where Ω = diag( ω − i ). In the Gibbs sampler, we pretend that z i is observed instead of y i and thus, a Gaussianlikelihood form of crash counts is obtained in terms of z . We also consider a panel data setting, where Ω t and Ω i areused to represent time-specific and individual-specific variance-covariance matrices of n × n and T × T , respectively.19 ppendix B Gibbs Sampler This section provides a Gibbs sampling scheme to iteratively draw from full conditional posterior distributions ofthe parameters of the proposed dynamic spatial NB model with dynamic parameters and random heterogeneity.
B.1 Posterior sampling of the dispersion parameter (r)
To derive posterior distribution of the dispersion parameter r , we express NB random variables as sums of Loga-rithmic random variables under compound Poisson distribution (Quenouille, 1949): y it = L it (cid:88) k = ς kt , ς kt iid ∼ Logarithmic( p it ) , L it ∼ Poisson( − r ln(1 − p it )) , ∀ i ∈ { , ..., n } , t ∈ { , ..., T } (10) P ( r | L , p , –) ∝ T (cid:89) t = n (cid:89) i = P ( L it | r ) P ( r ) r | L , p , – ∼ Gamma r + T (cid:88) t = N (cid:88) i = L it , h − T (cid:88) t = N (cid:88) i = ln(1 − p it ) (11)The conditional posterior of L it is obtained using the procedure described in Zhou et al. (2012). Here, we providethe analytical closed form expression for the posterior of discrete random variable L it below: P ( L it = j | r , y ) = R ( y it , j ) , j ∈ { , , ... y it } (12)where, R ( l , m ) = l = m = F ( l , m ) r ml (cid:80) j = F ( l , j ) r j l (cid:44) m (cid:44) F ( m , j ) = m = j = m < j ( m − m F ( m − , j ) + m F ( m − , j −
1) 1 ≤ j ≤ m The hyper parameter h is learned by creating the full conditional posterior: P ( h | r , –) ∝ P ( r | r , h ) P ( h | e , f ) h | r , – ∼ Gamma( r + he , r + h f ) (13) B.2 Posterior sampling of { β n , µ C , γ, ω { n , T } } Using the P´olya-Gamma data augmentation, the crash counts y it are transformed to z it = y it − r ω it as described inAppendix A. Pretending that z it is observed, instead of y it , we obtain a Gaussian likelihood form of crash countsin terms of z it . We sample the time-invariant random parameters β n , their component-specific means µ C , fixedparameters γ , and auxiliary variable ω in blocks to accelerate the convergence by improving the mixing of Markovchains. Considering the joint distribution p ( β N , γ, µ C , ω | –) given in equation 14, we perform this blocked samplingin the following steps: p ( β N , γ, µ C , ω | –) ∝ n (cid:89) i = p ( β i | γ, µ C , ω, –) p ( γ, µ C | ω, –) p ( ω | –) (14) • I: p ( β i | γ, µ C , ω, –) ∀ i ∈ { , . . . , n }• II: p ( γ, µ C | ω, –) • III: p ( ω | –) 20 : Sampling from p ( β i | γ, µ C , ω, – ) P ( β i | γ, µ C , ω, –) ∝ T (cid:89) t = P ( z it | ψ it , ω, r ) P ( β i | (cid:37) i ) ∝ exp (cid:32) −
12 ( z i − ψ i ) (cid:48) Ω − i (( z i − ψ i ) (cid:33) exp C (cid:89) c = (cid:32) −
12 ( β i − µ c ) (cid:48) Σ − c ( β i − µ c ) (cid:33) I ( (cid:37) i = c ) ∼ Normal( m β i , V β i )where V β i = X Ri (cid:48) Ω − i X Ri + C (cid:89) c = (cid:16) Σ − c (cid:17) I ( (cid:37) i = c ) − m β i = V β i X Ri (cid:48) Ω − i ( z i − X Fi γ − X Di θ t − φ i ) + C (cid:89) c = ( Σ − c µ c ) I ( (cid:37) i = c ) (15)where X Fi , X Ri , and X Di are covariates of T × g , T × h and T × q dimension, respectively; Ω i is a T × T diagonalmatrix with diagonal elements ω − it ∀ t ∈ { , , .... T } ; (cid:37) is a n × (cid:37) i = c , if i th unitbelongs to c th latent class; I ( (cid:37) i = c ) is atomic latent class indicator which equals one if i th unit belongs to c th latentclass, else switches to zero; z i , ψ i , and φ i are T × II: Sampling from p ( γ, µ C | ω, – )To construct the conditional posterior p ( γ, µ C | ω, –), we marginalize the random parameters as follows. z i = X Fi γ + X Ri β i + X Di θ t + φ i + ν i ; ν i ∼ Normal(0 , Ω i ) = X Fi γ + C (cid:88) c = I ( (cid:37) i = c ) X Ri µ c + X Di θ t + φ i + ν ∗ i ; ν ∗ i ∼ Normal(0 , Υ i ) , Υ i = Ω i + C (cid:88) c = I ( (cid:37) i = c ) X Ri Σ c X Ri (cid:48) The joint distribution can be written as: p ( γ, µ C | (cid:37), ω, –) ∝ C (cid:89) c = p ( µ c | γ, ω, (cid:37), –) p ( γ | ω, –) • II-1: µ c | γ, ω, – ∼ Normal( m µ c , V µ c ), where V µ c = n (cid:88) i = X Ri (cid:48) Υ − i X Ri I ( (cid:37) i = c ) + B − − m µ c ( γ ) = V µ c n (cid:88) i = X Ri (cid:48) Υ − i I ( (cid:37) i = c )( z i − φ i − X Fi γ − X Di θ t ) + B − b (16) • II-2: γ | ω, – ∼ Normal( m γ , V γ ), where V γ = n (cid:88) i = X Fi (cid:48) Υ − i ( Υ i − X Ri V µ (cid:37) i X Ri (cid:48) ) Υ − i X Fi + S − − m γ = V γ n (cid:88) i = X Fi (cid:48) Υ − i ( z i − φ i − X Ri m µ (cid:37) i (0) − X Di θ t ) + S − s (17)21 II: Sample from p ( ω | – )The full conditional distribution ω | – turns out to be a distribution in the P´olya-Gamma class (see Polson et al.,2013, for details). ω it | – ∼ PG( y it + r , ψ it ) (18)The details of the P´olya-Gamma distribution are provided in Appendix A. B.3 Posterior sampling from component-specific variance ( Σ C )P ( Σ c | –) ∝ n (cid:89) i = P ( β i | µ c , Σ c , (cid:37) i ) P ( Σ c ) Σ c | – ∼ Wish( ν Σ c , V Σ c )where ν Σ c = ν + n (cid:88) i = I ( (cid:37) i = c ) V Σ c = V − + n (cid:88) i = I ( (cid:37) i = c )( β i − µ c )( β i − µ c ) (cid:48) − (19) B.4 Posterior sampling from categorical variable ( (cid:37) n ) P ( (cid:37) i = c | –) = Φ ( z i ; m (cid:37) ic , V (cid:37) ic ) η cC (cid:80) c (cid:48) = Φ ( z i ; m (cid:37) ic (cid:48) , V (cid:37) ic (cid:48) ) η (cid:48) c (20)Where, m (cid:37) ic = X Fi γ + X Ri µ c + X Di θ t + φ i , V (cid:37) ic = X Ri Σ c X Ri (cid:48) + Ω i , Φ ( . ) is a multivariate Gaussian density function, and η = [ η , η , ....., η C ] is the mixture weight vector. We sample η using the following full conditional distribution. P ( η | (cid:37) ) ∝ P ( (cid:37) | η ) P ( η ) η | (cid:37) ∼ Dirichlet( α + g , α + g , ....., α + g C ) (21)where, g c = n (cid:80) i = ( (cid:37) i = c ). B.5 Posterior sampling of state vector ( θ t ) The evolution equations of the system are written as follows using the transformed data ζ t . ζ t = F t θ t + ν t ; ν t ∼ Normal(0 , Ω t ) θ t = G t θ t − + u t ; u t ∼ Normal(0 , W t )FFBS algorithm is performed in two steps: Forward filtering and backward smoothing. The following recursionsare performed in each MCMC iteration within the Gibbs sampler. B.5.1 Forward Filtering
The following recursions are performed in each MCMC iteration within the Gibbs sampler. We start the recursionsby drawing a state vector θ from a non-informative distribution, which is the state prior distribution at t = t = θ ∼ Normal( m , C )22 rom t = :Posterior at t-1: θ t − | D t − ∼ Normal( m t − , C t − )Prior at t: θ t | D t − ∼ Normal( a t , R t ) a t = G t m t − R t = G t C t − G (cid:48) t + W t Predictive at t: ζ t | D t − ∼ Normal( f t , Q t ) f t = F t a t Q t = F t R t F (cid:48) t + Ω t Posterior at t: θ t | D t ∼ Normal( m t , C t ) m t = a t + R t F (cid:48) t Q − t ( ζ t − f t ) C t = R t − R t F (cid:48) t Q − t F t R t where D t is the information provided by the first t observations. The above computations involve inversion of Q t of size n × n , which becomes computationally expensive with the number of road segments. However, we use thefollowing established result from matrix algebra to circumvent this challenge: Q − t = ( F t R t F (cid:48) t + Ω t ) − = Ω − t − Ω − t FR ( I q + F (cid:48) t Ω − t FR ) − F (cid:48) Ω − t We continue the recursions until time T and then draw the state vector at time T using θ T | D T ∼ Normal( m T , C T ). B.5.2 Backward Smoothing
Subsequently, we recursively draw the remaining states θ T − by backward smoothing using the following equa-tions and then a single draw of the complete state vector θ , ..θ T is available at each MCMC iteration: θ t | θ t + , D t ∼ Normal( h t , B t ) h t = m t + C t G (cid:48) t + R − t + ( h t + − a t + ) B t = C t − C t G (cid:48) t + R − t + ( R t + − B t + ) R − t + G (cid:48) t + C t B.6 Posterior sampling of σ θ k For each k th diagonal element σ θ k of W t , where k ∈ , , . . . q , take a draw in each MCMC iteration from:1 σ θ k | – ∼ Gamma a σ +
12 ( T − , b σ + T (cid:80) t = ( θ t − G t [ k , k ] θ t − ) (22)where, G t [ k , k ] is the k th diagonal element of G t . 23 .7 Posterior sampling of spatial random e ff ects ( φ ti ) p ( φ ti | φ t − i , –) ∝ P ( z it | φ t − i , –) P ( φ ti | φ t − i ) ∝ exp (cid:18) − ω it (cid:104) z it − ( X Fit γ + X Rit β i + X Dit θ t + φ ti ) (cid:105) (cid:19) exp − w i + τ t φ ti − (cid:88) j w i j w i + φ tj φ ti | φ ( t ) − i , – ∼ Normal V t φ [ z it − ( X Fit γ + X Rit β i + X Dit θ t )] ω it + (cid:88) j w i j φ tj τ t , V t φ where V t φ = (cid:32) ω it + w i + τ t (cid:33) − (23)We perform mean centering to accommodate the identification issue: φ t = φ t − ¯ φ t , where ¯ φ t is mean of the vector φ t . B.8 Posterior sampling of spatial parameter ( τ t ) τ − t | – ∼ Gamma c + n , d + n (cid:88) i = w i + φ ti − (cid:88) j w i j w i + φ tj (24)24 nitialization: Initialize parameters: { r , h , γ, β i , µ c , Σ c , η c , ω it , θ t , σ θ k , φ ti , τ t } , ∀ i ∈ { , . . . , n } , ∀ t ∈ { , . . . , T } , ∀ k ∈ { , . . . , q } , ∀ c ∈ { , . . . , C } ;Set hyper-parameters: { r , e , f , s , S , b , B , ν , V , α , m , C , c , d , a σ , b σ } ; for to max-iteration sample i) r | – ∼ Gamma (cid:32) r + T (cid:80) t = N (cid:80) i = L it , h − T (cid:80) t = N (cid:80) i = ln(1 − p it ) (cid:33) (see equation 12 for the details of L it );ii) h | – ∼ Gamma( r + he , r + h f ) ;iii) { β n , µ C , γ, ω { n , T } } : • { β i | – } ni = ∼ Normal( V β i (cid:34) X Ri (cid:48) Ω − i ( z i − X Fi γ − X Di θ t − φ i ) + C (cid:81) c = ( Σ − c µ c ) I ( (cid:37) i = c ) (cid:35) , V β i ), where V β i = (cid:34) X Ri (cid:48) Ω − i X Ri + C (cid:81) c = (cid:16) Σ − c (cid:17) I ( (cid:37) i = c ) (cid:35) − ; • { µ c | – } Cc = ∼ Normal( V µ c (cid:32) n (cid:80) i = X Ri (cid:48) Υ − i I ( (cid:37) i = c )( z i − φ i − X Fi γ − X Di θ t ) + B − b (cid:33) , V µ c ), where V µ c = (cid:32) n (cid:80) i = X Ri (cid:48) Υ − i X Ri I ( (cid:37) i = c ) + B − (cid:33) − ; • γ | – ∼ Normal( V γ (cid:32) n (cid:80) i = X Fi (cid:48) Υ − i ( z i − φ i − X Ri m µ (cid:37) i (0) − X Di θ t ) + S − s (cid:33) , V γ ), where V γ = (cid:32) n (cid:80) i = X Fi (cid:48) Υ − i ( Υ i − X Ri V µ (cid:37) i X Ri (cid:48) ) Υ − i X Fi + S − (cid:33) − ; • { ω it | – } ∀ i , t ∼ PG( y it + r , ψ it ) ;iv) { Σ c | – } Cc = ∼ Wish ν + n (cid:80) i = I ( (cid:37) i = c ) , (cid:32) V − + n (cid:80) i = I ( (cid:37) i = c )( β i − µ c )( β i − µ c ) (cid:48) (cid:33) − ;v) Class indicator { (cid:37) i } ni = using P ( (cid:37) i = c | –) = Φ ( z i ; m (cid:37) ic , V (cid:37) ic ) η cC (cid:80) c (cid:48) = Φ ( z i ; m (cid:37) ic (cid:48) , V (cid:37) ic (cid:48) ) η (cid:48) c , where m (cid:37) ic = X Fi γ + X Ri µ c + X Di θ t + φ i and V (cid:37) ic = X Ri Σ c X Ri (cid:48) + Ω i ;vi) η | (cid:37) ∼ Dirichlet( α + g , α + g , ....., α + g C ) where, g c = n (cid:80) i = ( (cid:37) i = c ) ;vii) { θ t } Tt = using forward filtering and backward smoothing (appendix B.5);Forward filtering: for t in T sample Posterior at t − θ t − | D t − ∼ Normal( m t − , C t − ) ;Prior at t : θ t | D t − ∼ Normal( G t m t − , G t C t − G (cid:48) t + W t ) ;Predictive at t : ζ t | D t − ∼ Normal( F t a t , F t R t F (cid:48) t + Ω t ) ;Posterior at t : θ t | D t ∼ Normal( a t + R t F (cid:48) t Q − t ( ζ t − f t ) , R t − R t F (cid:48) t Q − t F t R t ) ; end Backward smoothing: for t in ( T −
1) : 1 sample θ t | θ t + , D t ∼ Normal( m t + C t G (cid:48) t + R − t + ( h t + − a t + ) , C t − C t G (cid:48) t + R − t + ( R t + − B t + ) R − t + G (cid:48) t + C t ); end viii) (cid:26) σ θ k (cid:27) qk = | – ∼ Gamma a σ + ( T − , b σ + T (cid:80) t = ( θ t − G t [ k , k ] θ t − ) ;ix) (cid:110) φ ti | – (cid:111) {∀ i , t } ∼ Normal (cid:32)(cid:18) ω it + w i + τ t (cid:19) − (cid:32) [ z it − ( X Fit γ + X Rit β i + X Dit θ t )] ω it + (cid:32)(cid:80) j w i j φ tj (cid:33) τ t (cid:33) , (cid:18) ω it + w i + τ t (cid:19) − (cid:33) ;x) (cid:110) τ − t | – (cid:111) Tt = ∼ Gamma c + n , d + n (cid:80) i = w i + (cid:34) φ ti − (cid:80) j w ij w i + φ tj (cid:35) ;xi) Compute spatial correlation: { α t } Tt = = σ φ t σ φ t + σ (cid:15) t . endAlgorithm 2:endAlgorithm 2: