Quantifying the causal effect of speed cameras on road traffic accidents via an approximate Bayesian doubly robust estimator

Abstract

This paper quantifies the effect of speed cameras on road traffic collisions using an approximate Bayesian doubly-robust (DR) causal inference estimation method. Previous empirical work on this topic, which shows a diverse range of estimated effects, is based largely on outcome regression (OR) models using the Empirical Bayes approach or on simple before-and-after comparisons. Issues of causality and confounding have received little formal attention. A causal DR approach combines propensity score (PS) and OR models to give an average treatment effect (ATE) estimator that is consistent and asymptotically normal under correct specification of either of the two component models. We develop this approach within a novel approximate Bayesian framework to derive posterior predictive distributions for the ATE of speed cameras on road traffic collisions. Our results for England indicate significant reductions in the number of collisions at speed camera sites (mean ATE = -30%). Our proposed method offers a promising approach for evaluation of transport safety interventions.


arXiv [stat.AP]

Quantifying the causal effect of speed cameras on road traffic collisions via an approximate Bayesian doubly robust estimator

Daniel J. Graham, Cian Naik, Emma J. McCoy, and Haojie Li

Corresponding author: Department of Civil Engineering, Imperial College London, London, SW7 2AZ, UK. Email: [email protected]
Department of Mathematics, Imperial College London, London, UK
School of Transportation, Southeast University, Nanjing, China


Keywords: Doubly robust; Bayesian inference; propensity score; average treatment effect; speed cameras; casualties.

Fixed speed limit enforcement cameras are a common intervention used to encourage drivers to comply with maximum legal speed limits. The cameras are installed at sites on selected links in order to detect speed limit violations, which can subsequently be punished with monetary fines, driver licence disqualification points, or prosecution. Since the introduction of speed cameras (SCs) there has been considerable debate about their effects on road traffic collisions (RTCs). At various times claims have been made that SCs serve to reduce RTCs, that they have no effect, or even that they increase RTCs by encouraging more erratic driving behaviour.

A number of academic studies of the effect of speed cameras on RTCs have been undertaken (for a review see Li et al. 2013). Most studies find that speed cameras have led to a reduction in RTCs, but the range of estimated effects is large (from 0% to -55%). Variation in estimates is to be expected given that study results pertain to diverse empirical contexts, but it is also the case that a number of different methods have been applied which can have a critical influence on results obtained. In particular, since SCs are not randomly assigned, it is essential that any adopted method recognises that the observed relationship between SCs and RTCs may be subject to confounding. Confounding arises when the characteristics that influence treatment assignment (i.e. whether a site is ‘treated’ or ‘untreated’ with an SC) also matter for outcomes (i.e. RTCs). Regression to the mean (RTM), for instance, is a well known manifestation of confounding that arises via ‘selection bias’.

The extent to which confounding has been recognised and addressed in existing studies varies considerably. Some studies have simply ignored it, using simple before-and-after methods with control groups (e.g. Christie et al. 2003, Cunningham et al. 2000, De Pauw et al. 2014, Gains et al. 2004, 2005, Goldenbeld and van Schagen 2005, Jones et al. 2008, Maher 2015). Others have used the empirical Bayes (EB) method as suggested by Hauer et al. (2002), largely to adjust for effects of confounding that arise via RTM (e.g. Chen et al. 2002, Elvik 1997, Hoye 2015, Mountain et al. 2004, 2005, Shin et al. 2009). Finally, there are a small number of studies that have used time-series methods, either interrupted time-series analyses with control groups or ARIMA, to test for changes in outcome rates (e.g. Carnis and Blais 2013, Hess and Polak 2003, Keall et al. 2001). Where studies have attempted to address confounding this has been done via the inclusion of covariates in outcome regression (OR) models, typically using Poisson or negative binomial Generalised Linear Models (GLMs).

In a previous paper we adopted a propensity score (PS) matching approach to evaluate the effectiveness of speed cameras (see Li et al. 2013). A key advantage of the PS over the OR approach is that it provides an effective way of isolating a valid control group by ensuring that the distribution of pre-treatment covariates matches that of the treated group and that genuine overlap in the support of the covariates exists between the two groups. However, as with the OR approach, valid inference from PS models crucially depends on the unknown PS model being correctly specified.

In this paper we build on our previous work by developing and applying an estimation approach which we believe has much to offer in evaluating the effectiveness of road safety interventions. Our approach uses the principle of doubly-robust (DR) estimation, which provides robustness to model misspecification by combining both OR and PS models to derive an average treatment effect (ATE) estimator which is consistent and asymptotically normal under correct specification of just one of the two component models. The DR approach is attractive for our application because the PS and OR models we can construct make different assumptions about the nature of confounding. For the PS model, we are able to faithfully represent via measured covariates the formal criteria that exist for the assignment of speed cameras to sites. For the OR model, we can difference our response variable before and after treatment to allow for the existence of site level time-invariant unobserved effects in addition to measured confounders. To avoid common sources of misspecification error, we estimate our component models using semiparametric Generalized Additive Mixed Models (GAMMs) which make minimal a priori assumptions on the functional form of the relationships under study. We also use a matching algorithm prior to forming the DR model to establish a valid control group. Thus, in our approach, potential biases from confounding are addressed by combining three compatible modelling tools: via matching to achieve comparability between treated and control sites, via a regression model for RTCs, and via a model for the treatment assignment mechanism.

DR estimators have been studied and applied extensively in the frequentist setting (e.g. Robins 2000, Robins et al. 2000, Robins and Rotnitzky 2001, van der Laan and Robins 2003, Lunceford and Davidian 2004, Bang and Robins 2005, Kang and Schafer 2007). A further contribution of the paper is that we develop our binary DR estimator within the Bayesian paradigm. A Bayesian representation of the DR model has proven difficult to formulate in previous work because DR estimators are typically constructed as solutions to estimating equations based on a set of moment restrictions that do not imply fully specified likelihood functions. We choose the Bayesian paradigm for three main reasons. First, DR estimation of the ATE involves prediction and extrapolation over covariate distributions with underlying uncertainty in parameter estimates. Bayesian inference provides a suitable framework for prediction that explicitly addresses such uncertainty in the sense that both the predicted observations, and the relevant parameters for prediction, have the same random status. Second, by deriving a posterior predictive distribution for the ATE, rather than a fixed value, we can make probability statements about the causal quantity of interest, allowing us to discuss findings in relation to specific hypotheses or in terms of credible intervals, which can offer a more intuitive understanding of the effects of SCs for public policy formulation. Finally, we develop an approximate Bayesian approach that can utilise prior information about the parameters of interest, which could be useful in evaluating safety interventions when historical data or training data from other regions are available.

The paper is structured as follows. Section two outlines broad trends in road traffic casualties for Britain and then sets out a formal causal modelling framework to estimate the effects of SCs on RTCs. Section three describes our approximate Bayesian DR approach and presents some simulations that demonstrate its properties. Section four describes the data available for estimation and outlines our chosen model specifications. Results are then presented in section five and conclusions are drawn in the final section.

For the year ending September 2016 the UK DfT recorded a total of 182,560 casualties on British roads of which 25,160 were classified as killed or seriously injured (KSIs) (DfT 2017). Since 2010 the annual numbers of fatalities and KSIs have not changed significantly, following several years in which road safety was improving. The average number of fatal road traffic incidents over the period 2010 to 2016 is approximately 1,800. Since the volume of road traffic has continued to grow over this period, however, the number of fatalities per vehicle mile driven has been falling (DfT 2016).

The DfT argue that there is good evidence to suggest that while the absolute number of fatalities on British roads now appears to be relatively static, overall absolute casualty numbers are continuing to fall. In short, levels of safety appear to be improving in relative terms and not deteriorating in absolute terms. Given the changes that have occurred in vehicle technology, medical care, and road safety interventions, however, the DfT also note that a comprehensive causal understanding of the factors underpinning casualty trends is currently out of reach. In this paper we attempt to contribute to such an understanding by quantifying the causal impact of one type of safety intervention: speed cameras (SCs).

2.2 ATE estimation within the potential outcomes framework

Our sample comprises $n$ links on the road network, indexed $i = 1, ..., n$. Some links have a SC, others do not. We define $D_i \in \{1, 0\}$ as a binary random variable indicating the presence or otherwise of a SC and we refer to this as the treatment variable. We are interested in the effect of the treatment on an outcome $Y_i$, which measures collision frequency. We define $Y_i(1)$ and $Y_i(0)$ as the potential outcomes for unit $i$ under treated and control status respectively. Recognising that SCs are not assigned randomly, we also define $X_i$ as a random vector of pre-treatment covariates that capture characteristics of links that are relevant to whether a SC was assigned or not, and are also relevant for outcomes. Thus, the data we observe for each link take the form of a random vector, $z_i = (y_i, d_i, x_i)$, where $y_i$ denotes a response, $d_i$ the treatment received, and $x_i$ a vector of pre-treatment covariates.

Ideally, we would assess the effects of SCs on each link by calculating the individual causal effect (ICE): $\tau_i = Y_i(1) - Y_i(0)$, but the observed data reveal only actual outcomes, not potential outcomes. Thus we observe the random variable $Y_i = Y_i(1) I(D_i) + Y_i(0)(1 - I(D_i))$, where $I(D_i)$ is the indicator function for receiving the treatment, but we do not observe the joint density, $f(Y_i(0), Y_i(1))$, since a SC cannot be both present and absent on a link simultaneously. Instead, our target of inference is the ATE, defined as

$$\tau = E[Y_i(1)] - E[Y_i(0)],$$

which measures the difference in expected outcomes under treatment and control status.

A key insight of the potential outcomes approach is that if we focus on estimating the ATE then we do not have to observe all potential outcomes, even under a non-random treatment assignment, as long as three key assumptions hold. First, the potential outcomes for unit $i$ must be conditionally independent of the treatment assignment given a (sufficient) set of observed covariates $X_i$: $(Y_i(0), Y_i(1)) \perp\!\!\!\perp I(D_i) \mid X_i$.
Second, the support of the conditional distribution of $X_i$ given a particular treatment status must overlap with that of $X_i$ given any other treatment status: $0 < \Pr(I(D_i) = 1 \mid X_i = x) < 1, \ \forall x$. Third, the relationship between observed and potential outcomes must comply with the Stable Unit Treatment Value Assumption (SUTVA) (e.g. Rubin 1978, 1980, 1986, 1990), which requires that the observed response under a given treatment allocation is equivalent to the potential response under that treatment allocation: $Y_i = I(D_i) Y_i(1) + (1 - I(D_i)) Y_i(0)$ for all $i = 1, ..., n$.

The three assumptions defined above, which are together referred to by Rosenbaum and Rubin (1983) as strong ignorability, allow for identification of causal effects from observational data because if they hold the ATE can be derived as

\begin{align}
\tau = E_i(Y_i(1) - Y_i(0)) &= E_X\left[E_i(Y_i(1) \mid X_i = x) - E_i(Y_i(0) \mid X_i = x)\right] \tag{1a} \\
&= E_X\left[E_i(Y_i(1) \mid X_i = x, I(D_i) = 1) - E_i(Y_i(0) \mid X_i = x, I(D_i) = 0)\right] \tag{1b} \\
&= E_X\left[E_i(Y_i \mid X_i = x, I(D_i) = 1) - E_i(Y_i \mid X_i = x, I(D_i) = 0)\right]. \tag{1c}
\end{align}

Conditional independence justifies the equality of (1a) and (1b), the SUTVA allows the substitution of observed for potential outcomes to give (1c), and overlap ensures that the population ATE in (1c) is estimable since there are units in both the treated and untreated groups.

Thus, if strong ignorability holds, the potential outcomes approach offers a route to obtaining valid causal estimates of the ATE of SCs. To proceed we need to estimate the relevant expectations in (1c) above.

2.3 Causal estimators

Using the notation of Tsiatis and Davidian (2007), we define joint densities of the observed data of the form

$$f_Z(z) = f_{Y|D,X}(y \mid d, x) \, f_{D|X}(d \mid x) \, f_X(x).$$

Given strong ignorability, estimation of the ATE of SCs can proceed in one of the following ways:

i. Outcome regression (OR) model - leave $f_{D|X}(d \mid x)$ and $f_X(x)$ unspecified and posit a model for $E[Y_i \mid D_i, X_i]$, the mean of the conditional density of the response given the covariates, using an OR model $\Psi^{-1}\{m(D_i, X_i; \beta)\}$, for known link function $\Psi$, regression function $m()$, and unknown parameter vector $\beta$. If the OR is correctly specified for the mean response then the ATE can be consistently estimated by

$$\hat{\tau}_{OR} = \frac{1}{n} \sum_{i=1}^{n} \left[ \Psi^{-1}\{m(1, X_i; \hat{\beta})\} - \Psi^{-1}\{m(0, X_i; \hat{\beta})\} \right].$$

ii. Propensity score (PS) model - leave $f_{Y|D,X}(y \mid d, x)$ and $f_X(x)$ unspecified but assume a model for $f_{D|X}(d \mid x)$, the conditional density of treatment assignment given covariates. This is a propensity score (PS) model, denoted $\pi(D_i \mid X_i; \alpha)$, which can be used to form a number of different nonparametric estimators, but of primary interest here is its use in the weighting estimator attributed to Horvitz and Thompson (1952)

$$\hat{\tau}_{IPW} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{I(D_i) \cdot Y_i}{\pi(D_i \mid X_i; \hat{\alpha})} - \frac{[1 - I(D_i)] \cdot Y_i}{1 - \pi(D_i \mid X_i; \hat{\alpha})} \right],$$

which is consistent under correct specification of the PS by virtue of the fact that $E[Y_i(1)] = E\{[Y_i(1) \cdot I(D_i)] / \pi(D_i \mid X_i; \alpha)\}$, and similarly for control treatment status.

iii. Doubly-robust (DR) model - leave $f_X(x)$ unspecified but assume both an OR model and a PS model and combine them to form a DR estimator. This is achieved by weighting or augmenting the OR model with a function of the inverse of the estimated PS to give a DR model. In this paper we estimate the weighted model $e(D_i, X_i; \xi) = \Psi^{-1}\{m(D_i, X_i; \xi)\}$, where the unknown parameter vector $\xi$ is obtained by weighting the model with

$$\hat{\kappa}_i(D_i, X_i) = \frac{I(D_i)}{\hat{\pi}(D_i \mid X_i; \hat{\alpha})} + \frac{1 - I(D_i)}{1 - \hat{\pi}(D_i \mid X_i; \hat{\alpha})}.$$

This model will consistently estimate $E[Y_i \mid D_i, X_i]$ if the model $\Psi^{-1}\{m(I(D_i), X_i; \beta)\}$ is correct because, while weighting may induce inefficiency, it will leave the consistency and asymptotic normality of the OR estimates unchanged. If the OR model is incorrectly specified, but the PS is correctly specified, the model is still consistent because weighting gives rise to estimating equations of the form

$$\sum_{i=1}^{n} \hat{\kappa}_i(D_i, X_i) \, \frac{1}{\phi_i} \, \frac{\partial e(d_i, x_i; \xi)}{\partial \xi^{T}} \left[ y_i - e(d_i, x_i; \xi) \right] = 0, \tag{2}$$

where $\phi_i \equiv \phi(D_i, X_i)$ is a working conditional variance for $Y_i$ given $(D_i, X_i)$, which effectively corrects for the bias in approximating $E[Y_i \mid D_i, X_i]$ using $\Psi^{-1}\{m(D_i, X_i; \beta)\}$ (for a proof see Lunceford and Davidian 2004).

We use estimates of $\xi$ to form the DR estimator

$$\hat{\tau}_{DR} = \frac{1}{n} \sum_{i=1}^{n} \left[ \Psi^{-1}\{m(1, X_i; \hat{\xi})\} - \Psi^{-1}\{m(0, X_i; \hat{\xi})\} \right].$$

So far we have discussed DR estimation within the context of frequentist semiparametric inference. As mentioned in the introduction to the paper, there are good reasons why a Bayesian inferential approach is particularly beneficial for evaluating road safety interventions. Bayesian inference has, however, proven difficult to apply for DR estimators because they are based on a set of moment restrictions which do not provide fully specified likelihood functions. Here, we make some improvements to the approach proposed by Graham et al. (2016) in the context of continuous treatment. In contrast to that paper we focus on binary treatments using PS weighting rather than augmentation to achieve the DR model, and we implement ways of incorporating prior information into the posterior distribution of the ATE.
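For reference, the frequentist estimators discussed so far can be illustrated with a minimal Python sketch on synthetic data. This is not the implementation used in the paper: the data generating process, the helper `wls`, the identity link (under which the ATE is simply the coefficient on $D$) and the use of the true propensity score in place of an estimated one are all assumptions made for illustration.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def wls(rows, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy by Gauss-Jordan elimination."""
    p = len(rows[0])
    a = [[0.0] * (p + 1) for _ in range(p)]  # augmented matrix [X'WX | X'Wy]
    for r, yi, wi in zip(rows, y, w):
        for j in range(p):
            for k in range(p):
                a[j][k] += wi * r[j] * r[k]
            a[j][p] += wi * r[j] * yi
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(a[i][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        for i in range(p):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    return [a[j][p] / a[j][j] for j in range(p)]

# Hypothetical confounded data: X drives both treatment and outcome; true ATE = 2.
rng = random.Random(0)
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
ps = [expit(0.5 * xi) for xi in x]  # true PS, treated as known (assumption)
d = [1 if rng.random() < p else 0 for p in ps]
y = [1.0 + 2.0 * di + 1.5 * xi + rng.gauss(0.0, 1.0) for di, xi in zip(d, x)]
ones = [1.0] * n

# OR estimator: correctly specified linear model, ATE = coefficient on D.
tau_or = wls([[1.0, di, xi] for di, xi in zip(d, x)], y, ones)[1]

# IPW estimator in the Horvitz-Thompson form.
tau_ipw = sum(di * yi / p - (1 - di) * yi / (1 - p)
              for di, yi, p in zip(d, y, ps)) / n

# DR estimator: OR model with X wrongly omitted, weighted by kappa.
kappa = [di / p + (1 - di) / (1 - p) for di, p in zip(d, ps)]
tau_dr = wls([[1.0, di] for di in d], y, kappa)[1]

# Naive estimator: the same misspecified OR model without weights.
tau_naive = wls([[1.0, di] for di in d], y, ones)[1]

print(tau_or, tau_ipw, tau_dr, tau_naive)
```

In this sketch the weighted fit recovers the treatment effect even though the OR model omits the confounder, whereas the unweighted fit of the same model does not.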
The basic theory underpinning approximate Bayesian inference in this context is covered comprehensively in Graham et al. (2016) and so we provide only a brief summary here.

The Bayesian bootstrap was first introduced by Rubin (1981) and applied in weighted likelihood models by Newton and Raftery (1994). The basic idea is to create new datasets by repeatedly re-weighting the original data in order to obtain the posterior distribution for some parameter of interest. If we treat our observed data, $z_i$ say, as effectively coming from a multinomial distribution with distinct values $a_k$, $k = (1, ..., K)$, and attach a probability to each distinct value, $\theta = (\theta_1, ..., \theta_K)$, then by placing an improper Dirichlet prior on $\theta$,

$$\pi(\theta) \propto \prod_{k=1}^{K} \theta_k^{-1},$$

the posterior density also has a Dirichlet distribution

$$p(\theta \mid v) \propto \prod_{k=1}^{K} \theta_k^{n_k - 1},$$

with parameter $n_k$. This posterior can be simulated via the weighted likelihood

$$\tilde{L}(\theta) = \prod_{i=1}^{n} f(z_i; \theta)^{w_i},$$

in which the weights $w = (w_1, ..., w_n)$ are distributed according to the uniform Dirichlet distribution and simulated as $n$ independent standard exponential (i.e. gamma(1,1)) variates and standardised. The weighted likelihood reduces to

$$\tilde{L}(\theta) = \prod_{i=1}^{n} \left\{ \prod_{k=1}^{K} \theta_k^{I_k(z_i)} \right\}^{w_i} = \prod_{k=1}^{K} \theta_k^{\sum_{i=1}^{n} w_i I_k(z_i)} = \prod_{k=1}^{K} \theta_k^{n \gamma_k},$$

say, where $n\gamma_k$ is the sum of the weights $w_i$ for which $z_i = a_k$.
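The re-weighting step can be sketched in a few lines of Python. The example below is a hedged illustration rather than the authors' code: the function name `dirichlet_weights`, the toy data `z`, the choice of a population mean as the parameter of interest, and standardising the weights to sum to one are all assumptions. For a Gaussian mean the maximiser of the weighted likelihood is the weighted average, so each set of weights yields one posterior draw.

```python
import random
import statistics

def dirichlet_weights(n, rng):
    """Uniform Dirichlet weights: n standard exponential variates, standardised to sum to 1."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [v / total for v in e]

rng = random.Random(1)
z = [rng.gauss(10.0, 2.0) for _ in range(200)]  # hypothetical observed data

# Each re-weighting of the data gives one draw from the approximate posterior of the mean.
draws = []
for _ in range(2000):
    w = dirichlet_weights(len(z), rng)
    draws.append(sum(wi * zi for wi, zi in zip(w, z)))

post_mean = statistics.fmean(draws)
post_sd = statistics.stdev(draws)
print(post_mean, post_sd)
```

The posterior standard deviation obtained this way should be close to the usual standard error of the sample mean, which is a convenient sanity check on the weights.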
Since the vector $\gamma = (\gamma_1, ..., \gamma_K)$ has a Dirichlet distribution with parameters $n_k = (n_1, ..., n_K)$,

$$p(\gamma) \propto \prod_{k=1}^{K} \gamma_k^{n_k - 1},$$

and since the point of maximisation of $\tilde{L}(\theta)$ is $\tilde{\theta} = \gamma$, the solutions to the maximised weighted likelihood function with repeatedly sampled uniform Dirichlet weights $w^{(l)}$ represent a sample from the posterior of $\theta$ under the improper prior $\prod_k \theta_k^{-1}$.

To apply the Bayesian bootstrap to our DR model we estimate $e(D_i, X_i; \xi) = \Psi^{-1}\{m_A(D_i, X_i; \xi)\}$ with weights $w_i^{(l)} \cdot \hat{\kappa}_i(D_i, X_i)$. The maximiser of $\tilde{L}(\xi)$, which we denote $\tilde{\xi}$, implies a solution to

$$\sum_{i=1}^{n} w_i^{(l)} \cdot \hat{\kappa}_i(D_i, X_i) \cdot \frac{1}{\phi_i} \, \frac{\partial e(d_i, x_i; \xi)}{\partial \xi^{T}} \left[ y_i - e(d_i, x_i; \xi) \right] = 0, \tag{3}$$

which as noted above has the DR property. We repeatedly draw sets of random weights $\{w_i^{(l)}\}_{i=1}^{n}$ as $n$ standardised independent standard exponential variates and solve (3) to build up an empirical posterior density of $\tilde{\xi}$, denoted $p_n(\tilde{\xi})$, from which the sampled values $\tilde{\xi}^{(l)}$ are consistent with the DR estimating equations.

Newton and Raftery (1994) apply sampling-importance resampling (SIR) to improve accuracy of the weighted bootstrap approach, but this improvement requires a fully specified likelihood function. Instead, for our restricted moment model, we use the resampling scheme proposed by Muliere and Secchi (1996), which extends Rubin's bootstrap in a general Bayesian nonparametric context. Two attractive features of Muliere and Secchi's approach for causal modelling are that it ensures that predictive distributions are not constrained to be concentrated on observed values and it allows us to take prior opinions into account. The posterior predictive distribution of the ATE, incorporating prior information, is obtained in the following way.

i. Estimate the PS model $\pi(D_i \mid X_i; \alpha)$, and form
$$\hat{\kappa}_i(d_i, x_i; \hat{\alpha}) = \frac{I(d_i)}{\hat{\pi}(d_i \mid x_i; \hat{\alpha})} + \frac{1 - I(d_i)}{1 - \hat{\pi}(d_i \mid x_i; \hat{\alpha})}.$$

ii. Draw a single set of random weights $\{w_i^{(l)}\}_{i=1}^{n}$, form the combined weights $w_i^{(l)} \cdot \hat{\kappa}_i(d_i, x_i; \hat{\alpha})$ and estimate the weighted model $\Psi^{-1}\{m_A(d_i, x_i; \xi^{(l)})\}$.

iii. Repeatedly compute (ii) using new weights $\{w_i^{(l)}\}_{i=1}^{n}$ to obtain the empirical posterior distribution $p_n(\tilde{\xi})$.

iv. Introduce a prior distribution $p$ for $\xi$ and a positive number $k$, the ‘measure of faith’ that we have in this prior. This can range anywhere from 1 to a size comparable to the number of samples of $\xi$.

v. Generate $m$ observations $x^*_1, ..., x^*_m$ from $\dfrac{k p + L p_n}{k + L}$, where $p_n$ is as above. We choose $m = L$ in our case.

vi. For $i = 1, ..., m$ generate $v_i$ from a $\Gamma\left(\dfrac{L + k}{m}, 1\right)$ distribution.

vii. Sample new parameters $\tilde{\xi}_{MS}$ from $x^*_1, ..., x^*_m$ using the weights $v_1, ..., v_m$ to form the posterior $p_m(\tilde{\xi})$.

viii. Resample $V$ values of the covariate vector uniformly over the observed values and a single vector $\xi^{(m)}$ from $p_m(\tilde{\xi})$.

ix. Form a sampled value of the ATE random variable as
$$\tau^{(m)}_{BDR} = \frac{1}{V} \sum_{v=1}^{V} \left[ \Psi^{-1}\{m_A(1, x_v; \tilde{\xi}^{(m)})\} - \Psi^{-1}\{m_A(0, x_v; \tilde{\xi}^{(m)})\} \right].$$

x. Repeat this procedure $M$ times, $m = (1, ..., M)$, to obtain the posterior predictive distribution.

In this subsection we present some simulations to demonstrate the DR properties of our approximate Bayesian approach. The simulations are based on the following data generating process: a binary treatment $D$ is assigned as a function of covariate $X$, and the outcome of interest $Y$ depends on both treatment $D$ and covariate $X$:

$$X \sim \text{Normal}(0, 10), \qquad D \sim \text{Bernoulli}(\text{expit}(\alpha_0 + \alpha_1 X)), \qquad Y \sim \text{Normal}(\beta_0 + \beta_1 D + \beta_2 X, 5),$$

where $\alpha_0 = 2$, $\alpha_1 = 0.2$, $\beta_0 = 10$, $\beta_1 = 5$, $\beta_2 = 0.2$. The true ATE is given by parameter $\beta_1$, that is $\tau = 5.0$.

The following models are tested:

1. $\hat{\tau}_{BOR1}$ - an approximate Bayesian OR model based on the correctly specified model: $E[Y \mid D, X] = \beta_0 + \beta_1 D + \beta_2 X$. The point estimate reported in the simulations is the mean value of the ATE posterior predictive distribution, i.e.
$$\hat{\tau}_{BOR1} = \frac{1}{L} \sum_{l=1}^{L} \left[ \frac{1}{V} \sum_{v=1}^{V} \left[ \Psi^{-1}\{m(1, x_v; \tilde{\beta}^{(l)})\} - \Psi^{-1}\{m(0, x_v; \tilde{\beta}^{(l)})\} \right] \right].$$

2. $\hat{\tau}_{BOR2}$ - same as [1.] except based on an incorrectly specified OR model with covariate $X$ excluded.

3. $\hat{\tau}_{BDR1}$ - an approximate Bayesian DR model based on an incorrectly specified OR model ($X$ excluded) but with weights based on the correct PS model
$$\hat{\tau}_{BDR1} = \frac{1}{L} \sum_{l=1}^{L} \left[ \frac{1}{V} \sum_{v=1}^{V} \left[ \Psi^{-1}\{m_A(1, x_v; \tilde{\xi}^{(l)})\} - \Psi^{-1}\{m_A(0, x_v; \tilde{\xi}^{(l)})\} \right] \right].$$

4. $\hat{\tau}_{BDR2}$ - an approximate Bayesian DR model based on a correctly specified OR model but with weights based on an incorrect PS model, which is generated randomly from the continuous uniform distribution: $\hat{\pi}(D \mid X) \sim \text{Uniform}(0, 1)$.

5. $\hat{\tau}_{BDR3}$ - an approximate Bayesian DR model based on an incorrectly specified OR model weighted with weights based on an incorrect PS model.

The simulations are based on 1000 runs on generated datasets of size 1,000. In each case, we place a Normal prior on the treatment coefficient $\beta_1$, with mean equal to the true value (5 in this case). We set the measure of faith $k$ to be relatively low so as not to overly affect the results. Table 1 shows our simulation results. Mean values and variances of the point estimates obtained (i.e. means and variances of the ATE distributions) and the mean squared error (MSE) are reported.

Table 1: Simulation results for posterior predictive distributions ($\tau = 5.0$).

Model   Av. Est.   Emp. Var.   MSE
BOR1    5.004      0.036       0.036
BOR2    5.350      0.036       0.157
BDR1    5.008      0.046       0.046
BDR2    5.018      0.862       0.862
BDR3    5.360      0.946       1.074

The mean of the posterior distribution for the ATE from the correctly specified OR model, $\hat{\tau}_{BOR1}$, provides a good approximation to the true value of $\tau$. The incorrectly specified OR model, BOR2, fails to address confounding and consequently $\hat{\tau}_{BOR2}$ provides a poor approximation to the true ATE.
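A single replication of the BDR1 design can be sketched as follows. This is a simplified illustration under stated assumptions, not the code behind Table 1: the second arguments of the Normal distributions are read as variances, the PS is fitted with a small hand-written Newton-Raphson routine (`fit_logit`, a hypothetical helper), the link is the identity, and the prior-based resampling of steps iv to vii is omitted so that only the Bayesian bootstrap of steps i to iii is shown. With the identity link and an OR model containing only an intercept and $D$, the weighted fit reduces to a difference of weighted means, so the averaging over covariates in steps viii and ix leaves the estimate unchanged.

```python
import math
import random
import statistics

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logit(x, d, iters=25):
    """Newton-Raphson for a logistic PS model with an intercept and one covariate."""
    a0 = a1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = expit(a0 + a1 * xi)
            r, v = di - p, p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1

def weighted_effect(y, d, w):
    """Coefficient on D from a weighted fit of Y on (1, D): difference of weighted means."""
    s1 = sum(wi * yi for yi, di, wi in zip(y, d, w) if di == 1)
    n1 = sum(wi for di, wi in zip(d, w) if di == 1)
    s0 = sum(wi * yi for yi, di, wi in zip(y, d, w) if di == 0)
    n0 = sum(wi for di, wi in zip(d, w) if di == 0)
    return s1 / n1 - s0 / n0

rng = random.Random(2024)
n, L = 1000, 500
x = [rng.gauss(0.0, math.sqrt(10.0)) for _ in range(n)]  # assumes 10 is a variance
d = [1 if rng.random() < expit(2.0 + 0.2 * xi) else 0 for xi in x]
y = [10.0 + 5.0 * di + 0.2 * xi + rng.gauss(0.0, math.sqrt(5.0))
     for di, xi in zip(d, x)]

# Step i: estimate the PS once and form kappa.
a0, a1 = fit_logit(x, d)
ps = [expit(a0 + a1 * xi) for xi in x]
kappa = [di / p + (1 - di) / (1 - p) for di, p in zip(d, ps)]

# Steps ii and iii: re-weight and refit the misspecified OR model (X excluded).
draws = []
for _ in range(L):
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    w = [ei / total * ki for ei, ki in zip(e, kappa)]
    draws.append(weighted_effect(y, d, w))

post_mean = statistics.fmean(draws)
post_sd = statistics.stdev(draws)
print(post_mean, post_sd)
```

Repeating this over many generated datasets and averaging `post_mean` would give a quantity comparable in spirit to the "Av. Est." column, although the numbers need not match Table 1 because the prior is not used here.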
Weighting the incorrectly specified OR model with weights $\hat{\kappa}(D, X)$, based on a correctly specified PS model, as in the BDR1 model, provides correction for misspecification bias with an average point estimate very close to the true value, but slightly larger posterior variances relative to the correctly specified OR model. The BDR2 model simulation also produces valid point estimates because weighting by weights based on an incorrectly specified PS model does not induce bias, but it does increase variance. Finally, if both the OR and PS models are wrongly specified as in BDR3, the model fails to produce a good point estimate of the mean ATE.

We have data on the location of fixed speed cameras for 771 camera sites in the following English administrative districts: Cheshire, Dorset, Greater Manchester, Lancashire, Leicester, Merseyside, Sussex and the West Midlands. These sites form our group of treated units. To select potential control sites we randomly sampled a total of 4787 points on the network within our eight administrative districts. The large ratio of potential control to treated units is adopted to ensure that we have a sufficient number of control units after we apply a matching algorithm.

Our outcome variable is the number of personal injury collisions (PICs) per kilometre as recorded from the location of the speed cameras, or in the case of control groups, from the randomly selected point. The PIC data are taken from records completed by police officers each time that an incident is reported to them. The individual police records are collated and processed by the UK Department for Transport as the ‘STATS 19’ data. The location of each PIC is recorded using the British National Grid coordinate system and can be located on a map using Geographical Information System (GIS) software. Because the established dates of speed cameras vary from 2002 to 2004, the period of analysis is from 1999 to 2007 to ensure the availability of collision data for the years before and after the camera installation for every camera site.

To adequately adjust for confounding we require a set of measured covariates that adequately represent the characteristics of units that simultaneously determine treatment assignment and outcome. For the UK there exists a formal set of site selection guidelines for fixed speed cameras (see Gains et al. 2004) that are extremely valuable in choosing covariates. The criteria are as follows:

1. Site length: between 400-1500 m.
2. Number of fatal and serious collisions (FSCs): at least 4 FSCs per km in last three calendar years.
3. Number of personal injury collisions (PICs): at least 8 PICs per km in last three calendar years.
4. 85th percentile speed at collision hot spots: 85th percentile speed at least 10% above speed limit.
5. Percentage over the speed limit: at least 20% of drivers are exceeding the speed limit.

Criteria one to three are primary guidelines for site selection and criteria four and five are of secondary importance. There are sites that do not meet the above criteria that will still be selected as enforcement sites, mainly for reasons such as community concern and engineering factors.

Selection of the speed camera sites was primarily based on collision history. Collision data can be obtained from the STATS 19 database and located on the map using GIS. However, secondary criteria such as the 85th percentile speed and percentages of vehicles over the speed limit are normally unavailable for all sites on UK roads. If speed distributions differ between the treated and untreated groups, then the failure to include the speed data could bias the estimation, an issue discussed in previous research (e.g. Mountain et al. 2005, Gains et al. 2004). For untreated sites with the speed limit of 30 mph and 40 mph, the national average mean speed and percentages of speeding are similar to the data for the camera sites. The focus groups for this study are sites with the speed limit of 30 mph and 40 mph throughout the UK. It is reasonable to assume that there is no significant difference in the speed distribution between the treated and untreated groups and hence exclusion of the speed data will not affect the accuracy of the propensity score model.

It is also possible that drivers may choose alternative routes to avoid speed camera sites. Collision reduction at camera sites may include the effect induced by a reduced traffic flow. The benefits of speed cameras will therefore be overestimated without controlling for the change in traffic flow. The annual average daily flow (AADF) is available for both treated and untreated roads and the effect due to traffic flow is controlled for in this study by including the AADF in the propensity score model.

In addition to the criteria that strongly influence the treatment assignment, factors that affect the outcomes should also be taken into account when the propensity score model is specified. We further include road characteristics such as: road types, speed limit, and the number of minor junctions within site length, which are suggested as important factors when estimating the safety impact of speed cameras (Gains et al. 2005, Christie et al. 2003).
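As a small illustration of how the primary site selection guidelines (criteria one to three, listed earlier in this section) translate into measurable quantities, the following Python sketch encodes them as an eligibility check. The function and argument names are hypothetical, and the check is only a reading of the published thresholds, not a rule applied in our analysis; as noted above, sites that fail these criteria may still be selected for other reasons.

```python
def meets_primary_guidelines(site_length_m, fsc_per_km, pic_per_km):
    """Check criteria one to three using the thresholds quoted in the text.

    site_length_m: site length in metres (guideline: 400 to 1500 m)
    fsc_per_km:    fatal and serious collisions per km in the last three calendar years
    pic_per_km:    personal injury collisions per km in the last three calendar years
    """
    return (400 <= site_length_m <= 1500) and fsc_per_km >= 4 and pic_per_km >= 8

# Hypothetical sites, for illustration only.
print(meets_primary_guidelines(800, 5, 10))   # satisfies all three primary guidelines
print(meets_primary_guidelines(800, 2, 10))   # too few fatal and serious collisions
```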

The outcome variable of interest is the number of collisions per site. For the OR model the response is specified in differenced form, i.e. the number of collisions in the post-treatment period minus the number of collisions in the pre-treatment period. Differencing allows for the existence of unit level time-invariant effects, which could be random or fixed. The PS model is estimated using a logit Generalized Additive Mixed Model (GAMM) specification. Matching and overlap are achieved using nearest neighbour matching via the MatchIt package in R. The weighted OR model is then estimated on the trimmed dataset, which satisfies matching and overlap conditions, using a Gaussian GAMM specification. We use GAMMs to avoid making a priori assumptions on the functional form of the relationships under study.

As mentioned in the introduction, the DR approach is particularly attractive for our application because of the differences inherent in our PS and OR model specifications. Due to the existence of formal criteria for SC assignment we have a high degree of confidence in the ability of our covariates to eliminate confounding via the PS model. For the OR model, differencing of the response variable before and after treatment allows for the existence of site level time-invariant unobserved effects in addition to measured confounders. Thus, there are subtle differences in the way we model the ATE via the PS or OR approaches. A degree of robustness is offered using a DR approach since we will obtain a consistent estimate of the ATE if just one of the component models is well specified.

The objective of our application is to estimate the marginal effect of SCs on RTCs, having adjusted for baseline confounders. We estimate the following models: an OR model, an IPW model, a DR model comprising an OR model weighted with the inverse PS covariate, and a naïve model which is simply the OR model without covariates. For the naïve model we report results using the matched and full samples. All models are repeatedly estimated using the approximate Bayesian approach outlined above. In addition to the posterior predictive distribution for the ATE we report point estimates at the mean of the posterior. For comparison, we also report Frequentist results.

The results are shown in Table 2, including means and credible intervals of the ATE distributions. Our causal models (OR, IPW and DR) indicate that the presence of speed cameras corresponds with an average change in the number of RTCs of -29% to -31%. Note that the approximate Bayesian and Frequentist point estimates are very similar, which is what we would expect for linear models with uninformative priors. In comparison, the naïve model, which does not adjust for confounding, finds a larger reduction, with an ATE of -35% using the matched sample and -39% using the unmatched sample. Figure 1 shows the posterior predictive distribution derived from the DR model.

Figure 1: Predictive posterior distribution of the average treatment effect from the doubly-robust model (density plotted against ATE; mean = -28.912, s.d. = 7.041).
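The estimation loop described above (reweight the data, re-estimate the PS and OR components, recompute the ATE, and summarise the resulting draws) can be illustrated with a small sketch. This is not the paper's implementation: it assumes simulated data with a single binary confounder, replaces the GAMM specifications with saturated cell means and proportions, omits matching and trimming, and uses the standard augmented IPW form of a doubly robust estimator rather than the weighted OR model used in the paper. All function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_sites(n, rng):
    """Hypothetical sites: x = 1 marks a high-risk site, which is more likely to be treated."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.4 else 0
        d = 1 if rng.random() < (0.7 if x else 0.2) else 0
        # Differenced outcome (post minus pre); the assumed true treatment effect is -3.
        y = -3.0 * d - 2.0 * x + rng.gauss(0.0, 2.0)
        rows.append((x, d, y))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and untreated sites."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return statistics.mean(treated) - statistics.mean(control)


def weighted_aipw_ate(rows, w):
    """Augmented IPW estimate of the ATE under observation weights w.

    With one binary covariate, the PS and OR models reduce to weighted
    proportions and weighted cell means (saturated models).
    """
    sum_w = {(x, d): 0.0 for x in (0, 1) for d in (0, 1)}
    sum_wy = dict(sum_w)
    for (x, d, y), wi in zip(rows, w):
        sum_w[(x, d)] += wi
        sum_wy[(x, d)] += wi * y
    ps = {x: sum_w[(x, 1)] / (sum_w[(x, 1)] + sum_w[(x, 0)]) for x in (0, 1)}
    mu = {cell: sum_wy[cell] / sum_w[cell] for cell in sum_w}
    total = 0.0
    for (x, d, y), wi in zip(rows, w):
        m1, m0, e = mu[(x, 1)], mu[(x, 0)], ps[x]
        # OR prediction plus inverse-probability-weighted residual corrections.
        psi = (m1 - m0) + d * (y - m1) / e - (1 - d) * (y - m0) / (1.0 - e)
        total += wi * psi
    return total / sum(w)


def bayesian_bootstrap_ate(rows, draws, rng):
    """One ATE draw per set of Dirichlet(1, ..., 1) weights (Rubin 1981)."""
    out = []
    for _ in range(draws):
        # Independent Exp(1) draws are Dirichlet weights up to scale;
        # weighted_aipw_ate is scale-invariant, so no normalisation is needed.
        w = [rng.expovariate(1.0) for _ in rows]
        out.append(weighted_aipw_ate(rows, w))
    return out


if __name__ == "__main__":
    rng = random.Random(2024)
    rows = simulate_sites(2000, rng)
    ate_draws = bayesian_bootstrap_ate(rows, 200, rng)
    cuts = statistics.quantiles(ate_draws, n=40)  # cuts[0] = 2.5%, cuts[-1] = 97.5%
    print("naive difference:", round(naive_difference(rows), 3))
    print("posterior mean:", round(statistics.mean(ate_draws), 3))
    print("posterior s.d.:", round(statistics.stdev(ate_draws), 3))
    print("95% interval:", (round(cuts[0], 3), round(cuts[-1], 3)))
```

In this toy setting the unadjusted difference overstates the reduction because high-risk sites are both more likely to be treated and more likely to show a fall in collisions, whereas the reweighted estimator centres near the assumed effect. The percentile interval is one simple way to summarise the draws; the paper does not state how its reported intervals were formed.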

Table 2: Bayesian and Frequentist bootstrapped estimates of the average treatment effect

                          Bayesian bootstrap                              Frequentist bootstrap
Model                     posterior mean   s.d.     95% cred. int.        Est.      s.e.
OR                        -28.981          7.002    (-42.912, -15.971)    -28.746   7.933
IPW                       -31.313          11.107   (-53.083, -9.543)     -31.103   10.453
DR                        -28.912          7.041    (-43.240, -15.267)    -28.490   8.003
Naïve (matched sample)    -35.039          10.045   (-55.284, -16.029)    -36.260   10.617
Naïve (full sample)       -39.144          6.504    (-48.681, -16.037)    -40.297   4.213

Thus, it would appear that correcting for potential sources of confounding serves to reduce the magnitude of our ATE estimates, but we still find a substantial reduction in RTCs associated with the presence of speed cameras. The difference in estimated ATE between the naïve and causal models makes sense given that the formal criteria used to assign SCs favour sites that have exhibited high rates of collisions in the past. Crucially, our causal models imply that SCs do make a real difference to RTCs over and above the modelled effect of confounding from non-random assignment.
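The selection mechanism mentioned above, in which sites are chosen because of high past collision counts, is commonly described as regression to the mean. A minimal simulation sketch shows how it can inflate an unadjusted before and after comparison even when the treatment has no effect at all. The gamma-Poisson data-generating process, the selection threshold, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def mean_changes(n_sites, threshold, rng):
    """Mean (post - pre) change for selected sites and for all sites, with no true effect."""
    selected, everyone = [], []
    for _ in range(n_sites):
        # Hypothetical site-specific collision rate (gamma with mean 4).
        rate = rng.gammavariate(4.0, 1.0)
        pre = poisson(rate, rng)
        post = poisson(rate, rng)  # same rate in both periods: no treatment effect
        change = post - pre
        everyone.append(change)
        if pre >= threshold:  # "camera" assigned on a high pre-period count
            selected.append(change)
    return sum(selected) / len(selected), sum(everyone) / len(everyone)


if __name__ == "__main__":
    rng = random.Random(11)
    sel, overall = mean_changes(20000, 8, rng)
    print("mean change at selected sites:", round(sel, 3))
    print("mean change at all sites:", round(overall, 3))
```

Selected sites show an apparent fall in collisions purely because an unusually high pre-period count tends to be followed by a more typical one, which is the kind of bias the causal models are intended to remove.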

Conclusions

In this paper we have quantified the causal effect of speed cameras on road traffic collisions via an approximate Bayesian doubly robust approach. This is the first time such an approach has been applied to study road safety outcomes. The method we propose could be used more generally for estimation of crash modification factor (CMF) distributions. Simulations demonstrate that the approach is doubly-robust for average treatment effect estimation. Our results indicate that speed cameras do cause a significant reduction in road traffic collisions, by as much as 30% on average for treated sites. This is an important result that could help inform public policy debates on appropriate measures to reduce RTCs. The adoption of evidence based approaches by public authorities, based on clear principles of causal inference, could vastly improve their ability to evaluate different courses of action and better understand the consequences of intervention.

References

Bang, H. and J. M. Robins (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–972.

Carnis, L. and E. Blais (2013). An assessment of the safety effects of the French speed camera program. Accident Analysis & Prevention 51, 301–309.

Chen, G., W. Meckle, and J. Wilson (2002). Speed and safety effect of photo radar enforcement on a highway corridor in British Columbia. Accident Analysis & Prevention 34, 129–138.

Christie, S., R. Lyons, F. Dunstan, and S. Jones (2003). Are mobile speed cameras effective? a controlled before and after study. Injury Prevention 9, 302–306.

Cunningham, C., J. Hummer, and J. Moon (2000). Analysis of automated speed enforcement cameras in Charlotte, North Carolina. Transportation Research Record 2078, 127–134.

De Pauw, E., S. Daniels, T. Brijs, E. Hermans, and G. Wets (2014). An evaluation of the traffic safety effect of fixed speed cameras. Safety Science 62, 168–174.

DfT (2016). Transport statistics Great Britain: 2016. Statistical release, UK Department for Transport, London.

DfT (2017). Reported road casualties in Great Britain: quarterly provisional estimates. Statistical release, UK Department for Transport, London.

Elvik, R. (1997). Effects on accidents of automatic speed enforcement in Norway. Transportation Research Record 1595, 14–19.

Gains, A., B. Heydecker, J. Shrewsbury, and S. Robertson (2004). The national safety camera programme 3-year evaluation report. Technical working paper, UK Department for Transport.

Gains, A., B. Heydecker, J. Shrewsbury, and S. Robertson (2005). The national safety camera programme 4-year evaluation report. Technical working paper, UK Department for Transport.

Goldenbeld, C. and I. van Schagen (2005). The effects of speed enforcement with mobile radar on speed and accidents. an evaluation study on rural roads in the Dutch province Friesland. Accident Analysis & Prevention 37, 1135–1144.

Graham, D. J., E. J. McCoy, and D. A. Stephens (2016). Approximate Bayesian inference for doubly robust estimation. Bayesian Anal. 11 (1), 47–69.

Hauer, E., D. W. Harwood, F. M. Council, and M. S. Griffith (2002). Estimating safety by the empirical Bayes method: a tutorial. Transportation Research Record 1784, 126–131.

Hess, S. and J. Polak (2003). Effects of speed limit enforcement cameras on accident rates. Transportation Research Record 1830, 25–34.

Horvitz, D. G. and D. J. Thompson (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 663–685.

Hoye, A. (2015). Safety effects of fixed speed cameras - an empirical Bayes evaluation. Accident Analysis & Prevention 82, 263–269.

Jones, A. P., V. Sauerzapf, and R. Haynes (2008). The effects of mobile speed camera introduction on road traffic crashes and casualties in a rural county of England. Journal of Safety Research 39, 101–110.

Kang, J. D. Y. and J. L. Schafer (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science 22 (4), 523–539.

Keall, M. D., L. J. Povey, and W. J. Frith (2001). The relative effectiveness of a hidden versus a visible speed camera programme. Accident Analysis & Prevention 33, 277–284.

Li, H., D. Graham, and A. Majumdar (2013). The impacts of speed cameras on road accidents: an application of propensity score matching methods. Accident Analysis & Prevention 60, 148–157.

Lunceford, J. K. and M. Davidian (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine 23, 2937–2960.

Maher, M. (2015). A note on the modelling of TfL fixed speed camera data. Technical report, University College London.

Mountain, L. J., W. M. Hirst, and M. J. Mahar (2004). Costing lives or saving lives: a detailed evaluation of the impact of speed cameras. Traffic Engineering & Control 45, 280–287.

Mountain, L. J., W. M. Hirst, and M. J. Mahar (2005). Are speed enforcement cameras more effective than other speed management measures? the impact of speed management schemes on 30 mph roads. Accident Analysis & Prevention 37, 742–754.

Muliere, P. and P. Secchi (1996). Bayesian nonparametric predictive inference and bootstrap techniques. Annals of the Institute of Statistical Mathematics 48 (4), 663–673.

Newton, M. A. and A. E. Raftery (1994). Approximate Bayesian inference with the weighted likelihood bootstrap (with discussion). Journal of the Royal Statistical Society. Series B (Methodological) 56 (1), pp. 3–48.

Robins, J. M. (2000). Robust estimation in sequentially ignorable missing data and causal inference models. In Proceedings of the American Statistical Association, Section on Bayesian Statistical Science, pp. 6–10. Alexandria, VA: American Statistical Association.

Robins, J. M. and A. Rotnitzky (2001). Comment on “Inference for semiparametric models: some questions and an answer”. Statistica Sinica 11, 920–936.

Robins, J. M., A. Rotnitzky, and M. J. van der Laan (2000). Comment on the Murphy and Van der Vaart article “On profile likelihood”. Journal of the American Statistical Association 95, 431–435.

Rosenbaum, P. R. and D. B. Rubin (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 (1), 41–55.

Rubin, D. B. (1978). Bayesian inference for causal effects: the role of randomization. Annals of Statistics 6 (1), 34–58.

Rubin, D. B. (1980). Comment on ‘Randomization analysis of experimental data in the Fisher randomization test’ by Basu. Journal of the American Statistical Association 75 (371), 591–593.

Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics 9 (1), 130–134.

Rubin, D. B. (1986). Comment: which ifs have causal answers? Journal of the American Statistical Association 81 (396), 961–962.

Rubin, D. B. (1990). Neyman (1923) and causal inference in experiments and observational studies. Statistical Science 5 (4), 472–480.

Shin, K., S. P. Washington, and I. van Schalkwyk (2009). Evaluation of the Scottsdale Loop 101 automated speed enforcement demonstration program. Accident Analysis & Prevention 41, 393–403.

Tsiatis, A. A. and M. Davidian (2007). Comment: Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science 22 (4), 569–573.

van der Laan, M. and J. M. Robins (2003).