Impact of COVID-19 on City-Scale Transportation and Safety: An Early Experience from Detroit
Yongtao Yao
Department of Computer Science, Wayne State University
Detroit, MI 48202, [email protected]
Tony G. Geara
Department of Public Works, City of Detroit
Detroit, MI 48216, [email protected]
Weisong Shi
Department of Computer Science, Wayne State University
Detroit, MI 48202, [email protected]
Abstract: The COVID-19 pandemic brought unprecedented levels of disruption to the local and regional transportation networks throughout the United States, especially in the Motor City, Detroit. That was mainly a result of swift restrictive measures, such as statewide quarantine and lock-down orders to confine the spread of the virus and flatten the curve, along with a natural reaction of the population to the rising number of COVID-19-related cases and deaths. This work is driven by analyzing five types of real-world data sets from Detroit related to traffic volume, daily cases, weather, social distancing index, and crashes from January 2019 to June 2020. The primary goal is to determine the impacts of COVID-19 on transportation network usage (traffic volume) and safety (crashes) for the City of Detroit, to explore the potential correlation between these diverse data features, and to determine whether each type of data (e.g., traffic volume data) could be a useful factor in confirmed-case prediction. In addition, early prediction of COVID-19 rates can be a vital contributor to life-saving preventative and preparatory responses. To achieve this goal, a deep learning model was developed using long short-term memory networks to predict the number of confirmed cases within the next week. The model demonstrated a promising prediction result, with a coefficient of determination ($R^2$) of up to approximately 0.91. Moreover, to provide statistical evaluation measures of confirmed-case prediction and to quantify the prediction effectiveness of each type of data, the prediction results of six feature groups are presented and analyzed. Furthermore, six essential observations with supporting evidence and analyses are presented. These will help decision-makers take specific measures that aid in preventing the spread of COVID-19 and protecting public health and safety. The goal of this paper is to present an approach that can be applied, customised, adjusted, and replicated for analysis of the impact of COVID-19 on a transportation network and prediction of the anticipated COVID-19 cases using a similar data set obtained for other large cities in the USA or around the world.

Index Terms: COVID-19, Data, Analysis, Prediction, Quarantine, Transportation networks, Traffic volume, Crashes, Social distancing, Weather, Daily cases, Detroit.
I. INTRODUCTION
The disease caused by the 2019 novel coronavirus (SARS-CoV-2), commonly known as COVID-19, has spread rapidly across the globe. As of July 28, 2020, over 16 million confirmed cases and 650 thousand deaths had been reported worldwide (WHO Situation Report-190, 2020) [1]; meanwhile, the United States is one of the most affected nations in the world, with more than 4 million confirmed cases and 149 thousand deaths. This tragic spread of COVID-19 nationally has resulted in disparate impacts across states and cities.

To slow the progression of COVID-19 and limit fatalities, public officials throughout Michigan issued a series of government directives that changed over time, starting with voluntary stay-at-home requests and restrictions on large public gatherings, followed by statewide quarantine and lock-down orders. Nonetheless, essential travel activities continue to take place across Detroit, such as people's access to daily supplies, medical services, and other basic necessities of welfare and safety. These government directives inevitably affect various forms of travel activity and, in turn, significantly impact transportation across Detroit [2].

Our work is based on the hypothesis that objective, reliable, and continuous transportation data can reflect the degree of social distancing, i.e., the extent of social activities and interpersonal contact, while many previous works provide evidence that the social distancing measures enacted have helped control COVID-19 [3]–[5]. Therefore, we believe that traffic data can provide a basis for assessing the current and upcoming pandemic status, and that it is meaningful to explore the changes in traffic patterns during the COVID-19 pandemic for a specific city.

In addition, crash-related information, such as the total number of daily crashes, severity, and crash type, can indirectly reflect traffic conditions [6]–[8].
Since one of the focus points of this work is to explore the correlation between traffic volume data and the outbreak of COVID-19, we also collected and analyzed crash data from Detroit to identify the impact of crash-related information on confirmed-case prediction.

Through our literature review, we found that numerous studies point out that weather factors, e.g., temperature (°C) and wind speed (mph), can contribute to the spread of COVID-19 [9]–[11]. Inspired by these works, we also sought to determine whether weather could be a factor in the spread of this disease.

Moreover, it is well known that maintaining social distancing can prevent the spread of COVID-19 and contain the number of casualties [12]–[17], based on the assumption that the degree of social distancing is highly related to the speed at which COVID-19 spreads. Therefore, we also collected social distancing-related data for Wayne County and the State of Michigan to explore and test the correlation between social distancing and the severity of COVID-19.

By considering the aforementioned data features, we then aimed to build an effective deep learning model using long short-term memory networks (LSTM) to predict the number of confirmed cases in Detroit. To provide statistical evaluation measures that quantify the prediction effectiveness of each type of data on the confirmed-case prediction results, i.e., the performance of the LSTM, we trained the LSTM on six experiment groups with different features and then analyzed the prediction results of the six feature groups.

Our observations and prediction model are intended to help decision-makers concentrate suitable public health efforts and apply effective transportation management techniques to protect residents and improve safety for Detroiters.
It must be noted that the presented statistical analysis approaches and the proposed prediction model were applied to the Detroit-based data set as an example and because of its availability. The method could be applied, customised, adjusted, and replicated for analysis of the impact of COVID-19 on a transportation network and prediction of the anticipated COVID-19 cases using a similar data set obtained from other large cities within the USA or around the world.

In particular, this paper sets out to answer, and is driven by, the following questions:

i) What are the sudden and drastic changes in overall temporal traffic patterns resulting from the outbreak of COVID-19?
ii) Did traffic decrease and then recover evenly across all measured signals during COVID-19 and as COVID-19 restrictions were being lifted?
iii) What are the impacts of COVID-19 and social distancing on the reasons behind crashes?
iv) Can we leverage the traffic count data, crash data, and other COVID-19-related information, such as the data on daily confirmed cases and social distancing, combined with weather information, to predict the number of COVID-19 confirmed cases for the next week?

The rest of the paper is organized as follows. We first review previous works and introduce the research gap in Sec. II. Then, we elaborate on the data sets used for the experiments in Sec. III. Sec. IV describes experimental investigations for the questions presented in i) to iii), and Sec. V demonstrates the proposed algorithms to predict the number of COVID-19 confirmed cases as described in question iv). After presenting a discussion of the prediction results in Sec. VI, we conclude the work in Sec. VII.

II. RELATED WORK
In this section, we review recently published works that focus on COVID-19 confirmed-case prediction or similar research triggered by the outbreak of COVID-19 in terms of transportation, weather, social distancing, and other aspects. To the best of our knowledge, prior works do not consider the effects of traffic volume data and crash-related information in COVID-19 forecasting research.
A. Transportation and COVID-19
The travel restrictions put in place to reduce the spread of COVID-19 resulted in a sharp reduction in traffic throughout the United States. Some recent works explored the changes in transportation modes. For example, Hu et al. [2] studied transportation modes during and after the COVID-19 pandemic, using basic laws of traffic and mathematical analysis to explore scenarios of increased car commuting. Lacus et al. [18] analyzed worldwide air traffic data to assess the impact of the travel ban on the aviation sector, as well as after changes in COVID-19 diagnostic criteria. Lau et al. [19] calculated the correlation between air traffic and the number of confirmed COVID-19 cases and determined the growth curves of cases before and after lock-down. Teixeira et al. [20] presented clues on how bike-sharing can support the transition to a post-COVID-19 society.
B. Weather and COVID-19
Moreover, weather factors including temperature (°C), humidity (%), and wind speed (mph) are regarded in recent works as factors that affect the spread of COVID-19. For example, through the Spearman rank correlation test, Tosepu et al. [9] found that among minimum temperature, maximum temperature, average temperature, humidity, and rainfall, only the average temperature is significantly related to the COVID-19 pandemic in Jakarta, Indonesia. Another experiment [10], conducted on daily new cases and weather information in 50 states of the United States, suggested that weather parameters, i.e., temperature and absolute humidity, can help classify risky geographic areas in different countries. In addition, based on Spearman's correlation coefficients, Mehmet et al. [11] reported that wind speed (mph) showed the highest correlation with the outbreak of COVID-19. A few studies have also claimed, by considering nine cities in Turkey, that warm weather can possibly slow down the global COVID-19 pandemic [11].
C. Social Distancing and COVID-19
In addition, a portion of the related work studied the relationship between social distancing and COVID-19. For instance, Singh et al. [12] focused on the age-structured impact of social distancing on the COVID-19 epidemic in India and presented a mathematical model of the spread of infection in a population structured by age and by social contact between age groups. Courtemanche et al. [5] pointed out that the spread of COVID-19 would have been ten times greater without shelter-in-place orders. The study in [13] showed that social distancing and shelter-in-place orders have had some impact on crime and disorder, but only for a restricted collection of crime types and not consistently across places.
D. Other Coronavirus-Related Research
To date, more and more researchers are aggressively shifting their focus to detecting, predicting, treating, and recovering from COVID-19. Some prior works do not consider transportation, weather, or social distancing, but they present promising results in addressing coronavirus-related issues. For example, Qin et al. [21] proposed a novel model to predict the outbreak of COVID-19 in populations in affected areas based on the social media search indexes (SMSI) for dry cough, fever, chest distress, coronavirus, and pneumonia. Lacus et al. [18] calculated the economic impact, measured in terms of loss of GDP due to the aviation sector, as well as the social impact due to job losses related to aviation and correlated sectors (tourism, catering, etc.).

III. DATA DESCRIPTION AND CRITICAL DATES
In this section, we discuss the five types of data collected and analyzed for this study (shown in Table I): (i) traffic volume data, (ii) daily case numbers, including daily confirmed cases and daily deaths, (iii) weather information, (iv) social distancing-related data, and (v) crash data. For each category of data, we then elaborate on the corresponding data source and its attributes.

[Figure: Timeline of shutdown and data collection period.]

Knowing these specific dates is important for researchers exploring the changes in overall traffic volume patterns across Detroit pertaining to COVID-19 and the impact of these data on the prediction of confirmed cases. Note that in Detroit, the onset of COVID-19 was estimated to be on March1th, the shutdown started on March 19th, and people started going back to work in the office on June 1st. However, both the number of closed businesses in the downtown area and the number of people working from home in general remained high.

Thus, we defined the shutdown period as being in effect from the 19th of March to the 1st of June. We collected and analyzed transportation data from Detroit before the first COVID-19 confirmed cases, during the pandemic, and after the release of the shutdown, from 1/1/2019 to 6/30/2020, covering one and a half years. To keep the data consistent, we also analyzed daily case numbers, weather information, and social distancing index data spanning the same time period.
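The shutdown window defined above lends itself to a small labeling helper for splitting any daily series into "before", "during", and "after" periods. The following is a minimal sketch, not the authors' code: the function name and label strings are hypothetical, and treating June 1st as the first "after" day is an assumption, since the text does not say whether the end date is inclusive.

```python
from datetime import date

# Critical dates taken from the text above.
SHUTDOWN_START = date(2020, 3, 19)
SHUTDOWN_END = date(2020, 6, 1)  # assumption: June 1st itself counts as "after"


def shutdown_period(day: date) -> str:
    """Label a calendar day relative to the shutdown window (hypothetical helper)."""
    if day < SHUTDOWN_START:
        return "before"
    if day < SHUTDOWN_END:
        return "during"
    return "after"


if __name__ == "__main__":
    for d in (date(2020, 3, 18), date(2020, 3, 19), date(2020, 6, 1)):
        print(d.isoformat(), shutdown_period(d))
```

Each daily record (traffic counts, crashes, cases) can then be grouped by this label before averaging.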
A. Traffic Volume Data
We collected and analyzed transportation data from 73 signalized intersection sites equipped with an advanced Remote Traffic Signal Management System (RTSMS) Level-II. Those locations have continuous data collection and analytics metrics through camera detection. They are all owned by the City of Detroit, out of a total of 787 City-owned signals and approximately 700 additional signals owned by other jurisdictions around the City. These signalized intersections are indispensable parts of urban traffic networks, since around two-thirds of urban vehicle miles are traveled on signal-controlled roads [22]. Fig. 1 depicts the geographical distribution of the 73 studied signalized intersections, which are highlighted by green dots. Those locations provided aggregated daily traffic volume data for the following 10 attributes: Bus, MotorizedVehicle, PickupTruck, ArticulatedTruck, SingleUnitTruck, Pedestrian, Motorcycle, Car, WorkVan, and Bicycle. Those attributes were later consolidated into six categories, as described in Sec. IV. Additionally, it must be noted that in 2019 the number of Level-II RTSMS locations was limited to 25. To account for that discrepancy, the normalized average volume per intersection was used.
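To illustrate why a per-intersection average makes days with 25 reporting sites comparable to days with 73, the sketch below divides the daily total by the number of intersections that reported on that day. This is one plausible reading of "normalized average volume per intersection", not the authors' implementation; the function name and the dictionary layout are assumptions.

```python
def average_volume_per_intersection(daily_counts):
    """Mean count per reporting intersection for one day.

    daily_counts maps a (hypothetical) intersection id to that day's count
    for a single transportation mode.
    """
    if not daily_counts:
        raise ValueError("no intersections reported on this day")
    return sum(daily_counts.values()) / len(daily_counts)


if __name__ == "__main__":
    day_2019 = {f"site-{i}": 100 for i in range(25)}  # 25 sites, as in 2019
    day_2020 = {f"site-{i}": 100 for i in range(73)}  # 73 sites, as in 2020
    print(average_volume_per_intersection(day_2019))
    print(average_volume_per_intersection(day_2020))
```

Dividing by the number of reporting sites keeps the two years on the same scale even though the raw totals differ by almost a factor of three.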
Fig. 1. The distribution of 73 intersections in Detroit.
B. Daily Cases
We obtained and analyzed the daily cases data from Michigan's official Coronavirus dashboard, including (i) the number of confirmed cases and (ii) the number of reported deaths. More specifically, confirmed cases are dated by the disease onset date; when that is not available, either the specimen collection date of the first positive COVID-19 test or the referral date is used. As for the number of reported deaths, each value corresponds to the actual reported date of death; 8 confirmed deaths did not have a valid date available and are not included in the collected data.

Fig. 2. The number of daily confirmed cases and deaths in Detroit. (Axes: Quantity versus Date; series: Confirmed Cases, Death.)
C. Weather Data
Since previous work suggests that weather factors, e.g., temperature (°C) and wind speed (mph), may contribute to the spread of COVID-19, we also included weather information as one of the inputs for COVID-19 prediction. We collected weather data from the official website of the National Oceanic and Atmospheric Administration and analyzed six weather-related attributes: rain precipitation, snow precipitation, average temperature, maximum temperature, minimum temperature, and average wind speed.

D. Social Distancing Information
We also collected and analyzed the social distancing-related attributes for Wayne County and the State of Michigan from the COVID-19 Impact Analysis Platform published by the University of Maryland. This data was not granular enough to isolate the City's boundaries; however, it included valuable data for the various inputs contributing to social distancing at both the County and State levels.

In particular, the social distancing index is calculated from six mobility indicators by the following equation: social distancing index = 0.8 × [%staying home + 0.01 × (100 − %staying home) × (0.1 × %reduction of all trips compared with pre-COVID-19 benchmark + × %reduction of work trips + × %reduction of non-work trips + × %reduction of travel distance)] + 0.2 × %reduction of out-of-county trips. The choice of weights is based on the shared travel ratio of residents and tourists (e.g., about 20% of all trips are trips outside the county, which leads to a choice of 0.8 for residents and 0.2 for out-of-county travel) and on which trips are considered more important (e.g., work travel is more important than non-work travel). A higher social distancing index score indicates fewer chances for close interpersonal interaction and reduced opportunities for the transmission of COVID-19.

E. Crash Data
To determine the impact of COVID-19 and social distancing metrics on the rate and severity of crashes, we also collected 19 crash metrics for Detroit from the Michigan State Police (MSP) Traffic Crash Reporting System (TCRS), which contains information on the number of total crashes per day broken down by severity, other reasons, and types of crashes. Table I lists these metrics. The monitoring frequency for all metrics is daily.

IV. STATISTICAL ANALYSIS
In this section, we conduct statistical analysis on the collected metrics and present our observations related to the changes of traffic volume and pattern, correlation studies, and crash statistics analysis.
A. Changes of Traffic Volume and Pattern
1) Changes of Traffic Volume in 2020:
We first sought to answer the question "What are the sudden spatial traffic patterns pertaining to COVID-19?" To answer this question, we first calculated the average number of each transportation mode among the 73 signalized intersections for each monitoring day, and then drew the average number of each mode during the periods before, during, and after the quarantine in Detroit (shown in Fig. 3). Out of the 10 collected metrics related to traffic counts at signalized intersections (shown in Table I), we combined some metrics and consolidated the classifications down to six categories. For example, the value of "Truck Van" represents the total number of all recorded types of trucks and vans, while the value of "Car" indicates the total quantity of motorized vehicles and traditional cars.

(Footnotes: COVID-19 Impact Analysis Platform, https://data.covid.umd.edu/; MSP Traffic Crash Reporting System, https://milogintp.michigan.gov/mdot-waps6/crash/)

It can be seen that the numbers of buses, pedestrians, and cars all showed declining trends during the shutdown period and then increased after the end of the shutdown. In contrast, it is notable that the quantities of Bicycle and Motorcycle increased sharply even after the statewide quarantine order. For example, the average number of bicycles per day increased to approximately 2× its original value during the quarantine and to approximately 4× after it. Similarly, the average number of motorcycles surged to approximately 3× its pre-shutdown level.

Observation 1:
Although cars are still the predominant transportation mode, biking and motorcycling have demonstrated a transition in usage and show increased popularity (up to 4× the previous volume) in post-COVID-19 urban mobility for Detroit. These numbers do not account for the seasonal weather impact on bike and motorcycle usage.

This observation may be attributed to the fact that cycling can be an alternative mode of transport, as it is compatible with social distancing regulations and allows for short individual trips. It must be noted that this exercise does not account for the seasonal weather impact on bike and motorcycle usage, which coincided with the onset of COVID-19. Although literature exploring the role of cycling in previous epidemics is rare, it is recognized that one of the factors leading to the rise of e-bikes in China was the 2002–2004 SARS outbreak, as people tried to avoid overcrowded public transportation services [23], [24]. Additionally, the same pattern was observed in New York City [20], showing some evidence of a modal transfer of some subway users to the bike-sharing system.

Observation 2:
The total number of trucks and vans was almost the same before and during the shutdown. An increase of approximately 40% is observed in the post-shutdown numbers.

The above truck-related observation might be attributed to the following: i) With the outbreak of the epidemic, a portion of Detroit citizens turned to online shopping. However, due to the shutdown policies, some industry-related trucks and vans stopped running, leading to a relatively stable volume of trucks and vans during the quarantine period. ii) After the shutdown period, although there were no specific rules limiting the transportation of trucks and vans, it is expected that the increased delivery demand was maintained in addition to the increased demand from the reopening of industrial activity and road construction.

Fig. 3. Changes in the daily number of each transportation mode before, during, and after quarantine in Detroit. (Axes: Quantity versus Transportation Modes, i.e., Bus, Pedestrian, Car, Bicycle, Motorcycle, Truck_Van; series: Average Number Before Shutdown, Average Number During Shutdown, Average Number After Shutdown.)

2) Changes of Traffic Pattern between 2019 and 2020:
In order to answer the question "How did traffic patterns change from 2019 to 2020 considering the impact of COVID-19?", we explored the temporal distribution of each transportation mode's quantity in 2019 and 2020, including bus, bicycle, car, motorcycle, and truck. More specifically, we divided the collected traffic data set into four groups based on time period, i.e.,
TABLE I
ANALYZED METRICS SUMMARY

Data Category | Count | Metrics | Description
Traffic Volume | 10 | Bus, MotorizedVehicle, PickupTruck, ArticulatedTruck, SingleUnitTruck, Pedestrian, Motorcycle, Car, WorkVan, and Bicycle | Traffic volume by classification: number of each transportation mode per day.
Daily Case | … | … | …
Weather Data | … | … | …
Crash | 19 | Total crashes | Number of total crashes per day.
 | | Fatal, Serious, Minor, Possible, None | Severity of crashes (worst injury).
 | | Ped, Cyclist, YoungDriver (Under 24) | Other reason for crash (crashes with non-motorized roadway users and young drivers under 24 years of age).
 | | Single motor vehicle, Head on, Head on left turn, Angle, Rear end, Rear end right turn, Sideswipe same, Sideswipe opposite, Backing, Other, Unknown, Null or not entered | Type of crashes.
Social Distancing Related Data | 21 | Social distancing index | An integer from 0 ∼ …

… "W" shape, i.e., the median value of bicycle and motorcycle is significantly lower than the value of bus. However, when it comes to "2020 March - June", this "W" shape disappears, since the median value of bicycles increased while the median value of buses declined markedly compared with the other sub-figures.

Observation 3:
Comparing the traffic volume data of 2019 and 2020, the traffic pattern of March to June 2020 (the outbreak period of COVID-19) is significantly different from the patterns of 2019 and the first two months of 2020.

Fig. 4. The change of traffic patterns from 2019 to 2020 pertaining to COVID-19. (Vertical axis of each panel: Normalized Traffic Volume.)
B. Correlation Study
To further explore the correlation between each type of collected metric, such as the correlation between the number of new confirmed cases and the daily traffic volume, we conduct correlation coefficient analysis and joint distribution analysis, which provide statistical results on their correlation.

1) Correlation Coefficients:
One of the more frequently reported statistical methods is correlation analysis, in which a correlation coefficient is reported to represent the degree of linear association between two variables [25], [26].

In this work, we calculated the correlation coefficients between i) transportation modes (e.g., bus, pedestrian, bicycle, car, motorcycle, and truck), ii) total crashes, iii) weather information (e.g., average temperature, rain precipitation, and daily average wind speed), iv) the social distancing index of Wayne County and Michigan, and v) daily cases (e.g., daily confirmed cases and daily deaths). The correlation coefficient is defined as follows:

$$\rho_{X,Y} = \frac{E[XY] - E[X]\,E[Y]}{\sqrt{E[X^2] - (E[X])^2}\,\sqrt{E[Y^2] - (E[Y])^2}} \tag{1}$$

The correlation coefficient is a statistic that measures the linear correlation between two attributes $X$ and $Y$, with a value range of $[-1, +1]$. A value of $+1$ indicates total positive linear correlation, $0$ indicates no linear correlation, and $-1$ indicates total negative linear correlation. The larger the absolute value of the correlation coefficient, the stronger the linear correlation between the two attributes.

Fig. 5 presents the absolute values of the correlation coefficients calculated with Equation 1. We use a color gradient from yellow to blue to indicate lower to higher correlation between two attributes.

Observation 4: Daily cases, i.e., daily confirmed cases and daily deaths, are highly related to:
− the traffic volume, especially for cars,
− total crashes,
− the social distancing index at the Wayne County level and the Michigan level, and
− the average temperature.

Note that although previous work reported that wind speed (mph) is one of the factors that triggered the spread of COVID-19 [11], in this work we do not find a high linear correlation between the number of daily cases and wind speed or other weather factors, such as rain precipitation, in Detroit based on our collected data set.
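Equation 1 can be evaluated directly from two paired daily series by replacing each expectation with a sample mean. The sketch below is illustrative only: the function name is hypothetical, the toy numbers are not from the paper's data set, and it is not claimed to be the tooling the authors used.

```python
import math


def pearson(xs, ys):
    """Correlation coefficient of Equation 1, with expectations as sample means."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("series must be non-empty and of equal length")
    e_x = sum(xs) / n
    e_y = sum(ys) / n
    e_xy = sum(x * y for x, y in zip(xs, ys)) / n
    e_x2 = sum(x * x for x in xs) / n
    e_y2 = sum(y * y for y in ys) / n
    # Clamp at zero to guard against tiny negative values from rounding.
    var_x = max(e_x2 - e_x ** 2, 0.0)
    var_y = max(e_y2 - e_y ** 2, 0.0)
    denom = math.sqrt(var_x) * math.sqrt(var_y)
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return (e_xy - e_x * e_y) / denom


if __name__ == "__main__":
    cars = [120.0, 90.0, 60.0, 45.0]   # toy daily car counts
    cases = [5.0, 40.0, 80.0, 95.0]    # toy daily confirmed cases
    rho = pearson(cars, cases)
    print(round(rho, 3), round(abs(rho), 3))  # Fig. 5 reports the absolute value
```

Taking `abs(rho)` matches the presentation in Fig. 5, where a strongly negative relationship (fewer cars as cases rise) is shown with the same intensity as a strongly positive one.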
Fig. 5. The absolute value of correlation coefficients. TAVG represents the average temperature per day, PRCP indicates the daily rain precipitation, AWND stands for the daily average wind speed, and SC Index represents the social distancing index. (Attributes on both axes: Bus, Pedestrian, Bicycle, Car, Motorcycle, Truck, Total crashes, TAVG, PRCP, AWND, SC Index – Wayne County, SC Index – MI, Daily deaths, Daily cases.)

2) Joint Distribution:
Based on the correlation coefficients, we can roughly know the degree of correlation between each pair of attributes, but we cannot decisively derive the specific reason for that relationship. Therefore, we calculated and drew the joint distributions [27] for selected pairs of attributes. This demonstrates the quantitative relationship between variables intuitively (linear or non-linear, or whether there is a more obvious correlation). Most importantly, the joint distributions allow us to identify the relationship between multiple attributes.

Specifically, for $\{X = x, Y = y\}$, we found all elements in the sample space that satisfy these two values. These elements form a subset of the sample space, and the probability of this subset is the joint probability $P(X = x, Y = y)$. The function $p(x, y) = P(X = x, Y = y)$ is called the joint PMF (joint probability mass function). The joint probability can be regarded as the probability that two events occur at the same time: if event $A$ is $X = x$ and event $B$ is $Y = y$, the joint probability is $P(A \cap B)$.

Fig. 6. Joint distribution of various relevant attributes.
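As a concrete reading of the joint PMF definition above, an empirical estimate counts how often each pair $(x, y)$ occurs among the paired daily observations and divides by the number of days. The sketch below assumes the attributes have already been discretized (for example into "low"/"high" bins), which is an assumption of this example and not a step described in the text; the function name is hypothetical.

```python
from collections import Counter


def joint_pmf(xs, ys):
    """Empirical joint PMF p(x, y) = P(X = x, Y = y) from paired observations."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    counts = Counter(zip(xs, ys))  # occurrences of each (x, y) pair
    return {pair: c / n for pair, c in counts.items()}


if __name__ == "__main__":
    # Toy, pre-binned observations for four days (illustrative values only).
    cases = ["low", "low", "high", "high"]
    rain = ["wet", "dry", "dry", "dry"]
    for pair, p in sorted(joint_pmf(cases, rain).items()):
        print(pair, p)
```

The probabilities over all observed pairs sum to 1, and a pair that never occurs simply has no entry (probability 0).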
Fig. 6 presents the joint distribution between the daily confirmed cases and other attributes, such as transportation volume, total crashes, weather, and social distancing index, from March to June 2020. Green points display the specific values of the selected attributes, and red straight lines represent the linear fit between each pair of attributes, e.g., the linear fit of the daily confirmed cases and the number of buses per day. The greater the slope of the red line, the stronger the linear relationship between the two attributes.

Based on the joint distribution of the daily confirmed cases and the rain precipitation shown in Fig. 6, we can deduce why they are not linearly correlated: no matter how large the value of daily cases is, the value of PRCP is very small, i.e., rainfall in Detroit from March to June 2020 was very low, resulting in a very narrow range of rainfall values. Under these circumstances, the correlation coefficient between daily cases and PRCP is low, indicating a low linear correlation.

C. Crashes Statistics
We then analyzed crash data covering 2019 Jan to Feb, 2019 Mar to Jun, 2020 Jan to Feb, and 2020 Mar to Jun. Our goal was to identify the percent distribution of the different crash types, which is shown in Fig. 7.
[Figure 7 panels: Crashes Statistics / 2019 Jan - Feb; 2019 Mar - Jun; 2020 Jan - Feb; 2020 Mar - Jun. Legend: Single_Motor_Vehicle, Head_On, Head On-Left Turn, Angle, Rear End, Rear End-Left Turn, Rear End-Right Turn, Sideswipe-Same, Sideswipe-Opposite, Backing, Other, Unknown, Null | Not entered.]
Fig. 7. Distribution by crash type percentage for 2019 to 2020.
Observation 5: When comparing the 2020 crash type percentages from before the outbreak of COVID-19 with those during the pandemic period, a clear shift in the crash type distribution is observed:
− Angle crashes became the most common, moving up from third place (18.12% ∼ …).
− Rear-end crashes moved from first to second place (27.80% ∼ …).
− Sideswipe crashes decreased slightly but maintained third place (20.44% ∼ …).
− Single-vehicle crashes increased slightly and maintained fourth place (14.51% ∼ …).
Data key: (2020 Jan-Feb ∼ …)

V. CONFIRMED CASES PREDICTION
In this section, we aim to further study the influence of traffic volume data, weather information, crash data, and social distancing index information on confirmed-case prediction. Our ultimate goal is to build a suitable deep learning model to predict the number of confirmed cases based on (i) traffic volume data, (ii) daily case numbers, including daily confirmed cases and daily deaths, (iii) weather information, (iv) social distancing-related data, and (v) crash data.

A. Problem Formulation and Solution
Problem Definition.
We formulate the problem of predicting the number of COVID-19 confirmed cases as a regression problem. Specifically, we use $T = \{\mathit{input}_i\}_{i=1}^{n}$ to represent our training data set, in which $\mathit{input}_i \in I$ denotes all input features, i.e., the 58 features presented in Table I of Sec. III. Our goal is to employ the best method to learn the function $f$ that minimizes the loss function $\ell(f(\mathit{input}); \mathit{groundtruth})$, a measurement of the difference between the desired output and the actual output of the current model, such that the trained model is able to predict the number of confirmed cases over a specific prediction horizon with high performance. In addition, we choose 21 days as our monitoring window, and we aim to predict confirmed cases for the next 7 days.

Deep Learning Model Selection.
Recently, machine learning methods have been applied with success in regression tasks. We tackle the confirmed-case prediction problem using Long Short-Term Memory networks (LSTMs) [28], [29], since they have become highly successful learning models for both classification and regression problems across diverse domains [30]–[33]. Specifically, LSTM is a type of recurrent neural network (RNN) capable of processing sequential data. Since being proposed by Hochreiter and Schmidhuber [28], LSTM has proven able to address long-term back-propagation issues. It includes a memory cell and a gating mechanism, which allow it to decide what is kept in the memory cell and how new input data contributes to what is already there. Fig. 8 depicts the structure of the LSTM model that we deployed for confirmed-case prediction.
[Figure 8 layer labels: Input; LSTM (×4); Dropout (×3); Dense (×3); Output]
Fig. 8. The structure of the LSTM.
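The 21-day monitoring window and 7-day prediction horizon from the problem definition determine the shape of the samples fed to the LSTM. The following is a minimal sketch of how such samples could be cut from a daily time series. It assumes a stride of one day and a target of seven consecutive daily counts; neither detail is stated in the paper, and the function name `make_windows` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

WINDOW = 21   # monitoring window in days (from the problem definition)
HORIZON = 7   # prediction horizon in days (from the problem definition)


def make_windows(
    features: Sequence[Sequence[float]],
    confirmed: Sequence[float],
    window: int = WINDOW,
    horizon: int = HORIZON,
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Pair each `window`-day block of feature rows with the confirmed-case
    counts of the `horizon` days that immediately follow it."""
    if len(features) != len(confirmed):
        raise ValueError("features and confirmed must cover the same days")
    inputs: List[List[List[float]]] = []
    targets: List[List[float]] = []
    last_start = len(features) - window - horizon
    for start in range(last_start + 1):  # assumed stride of one day
        end = start + window
        inputs.append([list(row) for row in features[start:end]])
        targets.append(list(confirmed[end:end + horizon]))
    return inputs, targets


# Toy series: 30 days with 2 hypothetical features per day
# (the real input would have 58 features per day).
days = 30
toy_features = [[float(d), float(d) * 0.5] for d in range(days)]
toy_confirmed = [float(10 * d) for d in range(days)]

X, y = make_windows(toy_features, toy_confirmed)
print(len(X), len(X[0]), len(X[0][0]), len(y[0]))
```

Each sample then has the shape (time steps, features) expected by a recurrent layer, and each target holds the seven values to be predicted.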
Effective Measurements.
To design the best prediction method, we need metrics that accurately measure the quality of our prediction approaches. We use three measures that are commonly used for regression problems [34]: the coefficient of determination ($R^2$), the mean absolute error (MAE), and the root mean square error (RMSE), which is the square root of the mean square error (MSE). The $R^2$ score is widely used to indicate the fit of a machine learning model, i.e., the higher the value, the better the fit. The maximum value of $R^2$ is 1 (ideal case), and it may be negative, with a range of $(-\infty, 1]$. MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average over the test sample of the absolute differences between prediction and actual observation, where all individual differences have equal weight. RMSE is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of squared differences between prediction and actual observation.
Suppose the ground truth is $y = \{y_1, y_2, ..., y_N\}$ and the prediction result is denoted as $\hat{y} = \{\hat{y}_1, \hat{y}_2, ..., \hat{y}_N\}$. MAE and RMSE are defined as:
$$MAE = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i| \tag{2}$$
$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{3}$$
Specifically, $R^2$ is calculated as follows:
$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{4}$$
$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i \tag{5}$$
where TSS (total sum of squares) is the sum of squared differences between each sample and the mean value, which is $N$ times the variance, and RSS (residual sum of squares) is the sum of the squared errors of all samples, which is $N$ times the MSE. When the predicted value of every sample equals the true value, RSS is 0, so $R^2$ equals 1 (ideal case).
B. Model Creation
In this subsection, we introduce the experimental hardware and the software packages used, and then explain in detail why we conduct experiments on six experimental groups and what the precise format of our experimental input is.
Experimental Setup.
In this work, we adopt an NVIDIA GPU workstation as our experiment platform. It is equipped with four GeForce RTX 2080 Ti graphics cards, an Intel Xeon E5-2690 v4 CPU (2.6 GHz, 14 cores), and 64 GB of memory, and it runs Ubuntu 16.04.6 LTS. Such a workstation is capable of delivering cluster-level performance even for demanding applications [35], [36]. The models in this paper are implemented in Python, using TensorFlow 1.13.1 [37], Keras 2.1.5 [38], and the scikit-learn library [39] for model building.
Experimental Groups.
To show the impact of traffic volume data, weather-related metrics, crash data, and social distancing-related data on the confirmed-cases prediction, we conduct experiments on six experimental groups. Our first step is to combine all five categories, i.e., the 58 features presented in Table I of Sec. III, to train models using LSTM, and we label this group as the A group (A represents all). Then, we exclude all traffic volume metrics but keep the remaining features, and we denote the result as the A-T group. Similarly, we exclude weather information but keep the other features to obtain the A-W group. Since we have two levels of social distancing index, i.e., the social distancing index of Wayne County (denoted as SD) and the social distancing index of the state of Michigan (denoted as SDM), we remove SD and SDM to obtain the A-SD group and the A-SDM group, respectively. Finally, to determine the impact of crash data (denoted as C) on the confirmed-cases prediction, we remove it to obtain the A-C group. Table II shows the input features for the A, A-T, A-W, A-SD, A-SDM, and A-C groups.
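The construction of the six groups amounts to a leave-one-category-out ablation. The sketch below illustrates the idea; the column names are hypothetical placeholders for the 58 features of Table I, and `build_groups` is an illustrative helper, not the authors' code.

```python
from typing import Dict, List

# Hypothetical column names standing in for the 58 features of Table I.
CATEGORY_COLUMNS: Dict[str, List[str]] = {
    "T": ["traffic_cars", "traffic_trucks"],    # traffic volume
    "D": ["daily_confirmed", "daily_deaths"],   # daily cases (never removed)
    "W": ["avg_temperature", "precipitation"],  # weather
    "C": ["total_crashes"],                     # crash data
    "SD": ["sd_index_wayne"],                   # Wayne County social distancing index
    "SDM": ["sd_index_michigan"],               # Michigan social distancing index
}


def build_groups(categories: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the column list of group A and of each A-<category> group."""
    all_columns = [col for cols in categories.values() for col in cols]
    groups = {"A": all_columns}
    for removed in ("T", "W", "SD", "SDM", "C"):
        dropped = set(categories[removed])
        groups["A-" + removed] = [c for c in all_columns if c not in dropped]
    return groups


groups = build_groups(CATEGORY_COLUMNS)
for name, columns in groups.items():
    print(name, len(columns))
```

Training one model per entry of `groups` and comparing its scores with those of group A then isolates the contribution of the removed category.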
TABLE II
INPUT FEATURES FOR SIX EXPERIMENTAL GROUPS.
Group    Traffic volume   Daily cases   Weather   Crash   SD   SDM
A        √                √             √         √       √    √
A-T      ×                √             √         √       √    √
A-W      √                √             ×         √       √    √
A-SD     √                √             √         √       ×    √
A-SDM    √                √             √         √       √    ×
A-C      √                √             √         ×       √    √
C. Training and Validation Methodology
Training and Validation Methodology.
Next, we provide a high-level description of our methodology before delving into the details. We use 5-fold cross-validation [40], a validation technique that assesses the predictive performance of machine learning models, judges how models perform on an unseen data set (testing data set) [41], and helps avoid over-fitting during the training phase. More specifically, our data set is randomly partitioned into five equal-sized sub-samples. Each time, we take one sub-sample as the testing data set and the remaining four sub-samples as the training data set. Then, we fit a model on the training data set, evaluate it on the testing data set, and calculate the evaluation scores. After that, we retain the evaluation scores and discard the current model. The process is repeated five times, so that each sub-sample serves as the testing data set once, and we use the average of the evaluation scores as the final result for each method.
First, we need to determine the hyperparameters of our models, an important aspect of building effective deep learning models. Concretely, we use the hold-out method [42] to split the training-phase data set further into a parameter-training part and a validation part (80% and 20% of the training-phase data set, respectively); the validation part provides an unbiased evaluation of a model fit on the training data set while tuning parameters. Then, we conduct a grid search over the candidate parameter values to find the combination that achieves the highest performance.
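The cross-validation loop and the evaluation scores it averages can be sketched as follows. This is an illustration only: a least-squares line (`fit_line`) is used as a hypothetical stand-in for the LSTM so that the example stays self-contained, the data are synthetic, and the hold-out split and grid search are omitted.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Predictor = Callable[[float], float]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, i.e., the square root of the MSE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - RSS / TSS."""
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - rss / tss  # undefined when all targets are identical


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical placeholder model: ordinary least-squares line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validate(xs, ys, fit, k: int = 5, seed: int = 0) -> Dict[str, float]:
    """Random k-fold split; fit on k-1 folds, score on the held-out fold,
    discard the model, and average the scores over the k rounds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores: Dict[str, List[float]] = {"r2": [], "mae": [], "rmse": []}
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [j for m in range(k) if m != held_out for j in folds[m]]
        predict = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        y_true = [ys[j] for j in test_idx]
        y_pred = [predict(xs[j]) for j in test_idx]
        scores["r2"].append(r2(y_true, y_pred))
        scores["mae"].append(mae(y_true, y_pred))
        scores["rmse"].append(rmse(y_true, y_pred))
    return {name: sum(vals) / k for name, vals in scores.items()}


# Synthetic, noise-free data, so the placeholder model fits it perfectly.
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
result = cross_validate(xs, ys, fit_line)
print(result)
```

The metric helpers also show why $R^2$ can be negative: for targets [1, 2, 3] and predictions [1, 2, 5], RSS is 4 and TSS is 2, so `r2` returns -1.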
Avoiding Overfitting of the Models.
Another important factor during the training process is the number of epochs [43], i.e., the number of passes over the input data set during training. With more epochs, the error on the training data decreases further; however, beyond a critical tipping point, the network begins to over-fit the training data. Hence, finding the best number of epochs is essential to avoid over-fitting. Figure 9 shows how the values of the training and validation loss functions (the smaller, the better) change as the number of epochs increases. Initially, both losses decrease with more epochs; after 340 epochs, however, the validation loss slowly increases and stays above the training loss, which indicates over-fitting. Therefore, we choose 340 epochs for the LSTM.
[Figure 9: loss value versus epoch; curves: Training Loss, Validation Loss]
Fig. 9. The validation loss reaches its minimum value at 340 epochs.
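Choosing the epoch count from a validation-loss curve can be sketched as below. The curve is synthetic and constructed so that its minimum falls at epoch 340; it is not the measured curve of Fig. 9. The patience-based variant is a common alternative shown for comparison, not a procedure described in the paper.

```python
from typing import List, Sequence


def best_epoch(val_losses: Sequence[float]) -> int:
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


def early_stop_epoch(val_losses: Sequence[float], patience: int = 10) -> int:
    """Return the 1-based epoch of the best validation loss seen before
    `patience` consecutive epochs pass without improvement."""
    best_loss = float("inf")
    best_at = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_at, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_at


# Synthetic validation loss: decreases until epoch 340, then rises slowly.
val_curve: List[float] = [0.05 + 1e-6 * (e - 340) ** 2 for e in range(1, 501)]

print(best_epoch(val_curve), early_stop_epoch(val_curve))
```

On a noisy real curve the two functions can disagree, since early stopping may halt at a local minimum before the global one is reached.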
VI. RESULTS AND DISCUSSION
In this section, we present and analyze the sensitivity of the LSTM model to different feature groups. Our discussion includes supporting evidence and reasoning to explain the observed trends, as well as the implications of these trends for authorities and decision-makers taking specific measures for Detroit.
A. Prediction Results and Ground Truths
Fig. 10 presents the confirmed-cases prediction results, which are denoted by the black curve. To allow an intuitive comparison between the prediction results and the ground truth, we also include a red curve that represents the ground truth. It can be seen that our prediction result is very close to the ground truth, which motivates the statistical evaluation that follows.
[Figure 10: number of cases versus day; curves: Prediction Result, Ground Truth]
Fig. 10. Prediction result and ground truth.
B. Evaluation Results and Observations
Next, we present the key prediction quality measures for the six experiment groups (Figure 11). Note that among the three evaluation metrics, i.e., $R^2$, MAE, and RMSE, $R^2$ is a more intuitive and objective indicator of the goodness of fit in a regression problem. Therefore, we treat $R^2$ as our primary evaluation metric. We make several interesting observations as follows:
Group    RMSE     MAE      R^2
A        0.0606   0.0378   0.9088
A-T      0.0760   0.0457   0.7934
A-W      0.0716   0.0438   0.8309
A-SD     0.0727   0.0467   0.8178
A-SDM    0.0714   0.0404   0.8213
A-C      0.0687   0.0450   0.8494
Fig. 11. Model prediction quality for the six experiment groups: A (all selected features), A-T (without traffic volume-related features), A-W (without weather-related features), A-SD (without the social distancing index at the Wayne County level), A-SDM (without the social distancing index at the Michigan level), and A-C (without crash-related features).
We observe that the A group performs best across all experiment groups, i.e., it achieves the highest $R^2$ score (around 0.91) and the lowest MAE and RMSE. This observation supports our hypothesis that (i) traffic volume data, (ii) weather features, (iii) social distancing-related data, and (iv) crash information are all helpful for improving the effectiveness of confirmed-cases prediction.
Considering the difference in $R^2$ score between the A group and the other five groups, we observe that the A-T group achieves the lowest score, i.e., the largest gap in effectiveness is between the A group and the A-T group. This indicates that removing traffic volume data has the most significant adverse effect on the confirmed-cases prediction, i.e., traffic volume data is critical for improving it.
Similarly, we draw a conclusion on the relative effectiveness of traffic volume data, social distancing-related data, weather data, and crash data in terms of prediction improvement (shown in Fig. 12): adding any of these four types of data improves the prediction performance; traffic volume data is the most effective, followed by social distancing-related data and then weather features, while crash data appears to have the least impact on the prediction performance.
[Figure 12: Effectiveness: Traffic Volume > Social Distancing Related Data > Weather > Crash. ↓ $R^2$: A-T < A-SD < A-SDM < A-W < A-C. ↑ RMSE: A-T > A-SD > A-SDM ≈ A-W > A-C.]
Fig. 12. Effectiveness comparison between four types of data in terms of prediction improvement. The arrow indicates whether a higher or a lower value of the evaluation metric corresponds to better prediction performance.
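The ordering in Fig. 12 can be checked directly from the scores reported in Fig. 11. The short sketch below computes, for each ablated group, how far its $R^2$ falls below that of group A; the helper name `r2_drop` is illustrative.

```python
from typing import Dict, List, Tuple

# Scores transcribed from Fig. 11.
SCORES: Dict[str, Dict[str, float]] = {
    "A":     {"rmse": 0.0606, "mae": 0.0378, "r2": 0.9088},
    "A-T":   {"rmse": 0.0760, "mae": 0.0457, "r2": 0.7934},
    "A-W":   {"rmse": 0.0716, "mae": 0.0438, "r2": 0.8309},
    "A-SD":  {"rmse": 0.0727, "mae": 0.0467, "r2": 0.8178},
    "A-SDM": {"rmse": 0.0714, "mae": 0.0404, "r2": 0.8213},
    "A-C":   {"rmse": 0.0687, "mae": 0.0450, "r2": 0.8494},
}


def r2_drop(scores: Dict[str, Dict[str, float]],
            baseline: str = "A") -> List[Tuple[str, float]]:
    """Rank ablated groups by how much R^2 falls relative to the baseline."""
    base = scores[baseline]["r2"]
    drops = {g: base - s["r2"] for g, s in scores.items() if g != baseline}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


for group, drop in r2_drop(SCORES):
    print(f"{group:6s} {drop:.4f}")
# A-T    0.1154
# A-SD   0.0910
# A-SDM  0.0875
# A-W    0.0779
# A-C    0.0594
```

Removing traffic volume costs about 0.115 in $R^2$, roughly twice the loss from removing crash data (about 0.059), which matches the ranking traffic volume, social distancing, weather, crash.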
Observation 6:
Besides the daily case data and the social distancing index, the data related to traffic volume, crashes, and weather can all be good indicators for COVID-19 confirmed-cases prediction. The traffic volume data is particularly useful to the prediction model.
VII. CONCLUSION
In this work, we collected and analyzed five types of data sets: traffic volume, daily cases, weather information, crash features, and social distancing-related data. Important observations with supporting evidence and analysis are presented to provide practical implications for authorities and decision-makers taking preventive actions for Detroit. In terms of crashes, there was a clear change in the percentage distribution of crashes by type. During COVID-19 there was a significant increase in angle crashes, which are typically more severe and indicative of more serious driver-behavior-related issues. In terms of correlations, daily cases, i.e., daily confirmed cases and daily deaths, are highly related to (i) the transportation volume, especially for cars, (ii) total crashes, (iii) the social distancing index at the Wayne County level and the Michigan level, and (iv) the average temperature. Additionally, we have trained an accurate deep learning model that is effective at predicting COVID-19 confirmed cases for the next week, i.e., with $R^2$ up to approximately 0.91. The prediction quality is tested on six experiment groups, and the prediction results also show that adding traffic volume data, social distancing-related metrics, weather information, and crash features all improve the prediction performance.
REFERENCES
[1] W. H. Organization et al., “Coronavirus disease (covid-19): situation report, 190,” 2020.
[2] Y. Hu, W. Barbour, S. Samaranayake, and D. Work, “Impacts of covid-19 mode shift on road traffic,” arXiv preprint arXiv:2005.01610, 2020.
[3] K. E. Ainslie, C. E. Walters, H. Fu, S. Bhatia, H. Wang, X. Xi, M. Baguelin, S. Bhatt, A. Boonyasiri, O. Boyd et al., “Evidence of initial success for china exiting covid-19 social distancing policy after achieving containment,”
Wellcome Open Research, vol. 5, 2020.
[4] G. Briscese, N. Lacetera, M. Macis, and M. Tonin, “Compliance with covid-19 social-distancing measures in italy: the role of expectations and duration,” National Bureau of Economic Research, Tech. Rep., 2020.
[5] C. Courtemanche, J. Garuccio, A. Le, J. Pinkston, and A. Yelowitz, “Strong social distancing measures in the united states reduced the covid-19 growth rate: Study evaluates the impact of social distancing measures on the growth rate of confirmed covid-19 cases across the united states.” Health Affairs, pp. 10–1377, 2020.
[6] J. C. Stutts, D. W. Reinfurt, L. Staplin, E. Rodgman et al., “The role of driver distraction in traffic crashes,” 2001.
[7] S. Candefjord, R. Buendia, E.-C. Caragounis, B. A. Sjöqvist, and H. Fagerlind, “Prehospital transportation decisions for patients sustaining major trauma in road traffic crashes in sweden,” Traffic injury prevention, vol. 17, no. sup1, pp. 16–20, 2016.
[8] M. A. Abdel-Aty and H. T. Abdelwahab, “Predicting injury severity levels in traffic crashes: a modeling comparison,” Journal of transportation engineering, vol. 130, no. 2, pp. 204–210, 2004.
[9] R. Tosepu, J. Gunawan, D. S. Effendy, H. Lestari, H. Bahar, P. Asfian et al., “Correlation between weather and covid-19 pandemic in jakarta, indonesia,” Science of The Total Environment, p. 138436, 2020.
[10] S. Gupta, G. S. Raghuwanshi, and A. Chanda, “Effect of weather on covid-19 spread in the us: a prediction model for india in 2020,” Science of The Total Environment, p. 138860, 2020.
[11] M. Şahin, “Impact of weather on covid-19 pandemic in turkey,” Science of The Total Environment, p. 138810, 2020.
[12] R. Singh and R. Adhikari, “Age-structured impact of social distancing on the covid-19 epidemic in india,” arXiv preprint arXiv:2003.12055, 2020.
[13] G. Mohler, A. L. Bertozzi, J. Carter, M. B. Short, D. Sledge, G. E. Tita, C. D. Uchida, and P. J. Brantingham, “Impact of social distancing during covid-19 pandemic on crime in los angeles and indianapolis,” Journal of Criminal Justice, p. 101692, 2020.
[14] M. Painter and T. Qiu, “Political beliefs affect compliance with covid-19 social distancing orders,” Available at SSRN 3569098, 2020.
[15] A. Olivera-La Rosa, E. G. Chuquichambi, and G. P. Ingram, “Keep your (social) distance: Pathogen concerns and social perception in the time of covid-19,” Personality and Individual Differences, vol. 166, p. 110200, 2020.
[16] H. Chen, W. Xu, C. Paris, A. Reeson, and X. Li, “Social distance and sars memory: impact on the public awareness of 2019 novel coronavirus (covid-19) outbreak,” medRxiv, 2020.
[17] J. A. Lewnard and N. C. Lo, “Scientific and ethical basis for social-distancing interventions against covid-19,” The Lancet. Infectious diseases, vol. 20, no. 6, p. 631, 2020.
[18] S. M. Iacus, F. Natale, C. Santamaria, S. Spyratos, and M. Vespe, “Estimating and projecting air passenger traffic during the covid-19 coronavirus outbreak and its socio-economic impact,” Safety Science, p. 104791, 2020.
[19] H. Lau, V. Khosrawipour, P. Kocbach, A. Mikolajczyk, J. Schubert, J. Bania, and T. Khosrawipour, “The positive impact of lockdown in wuhan on containing the covid-19 outbreak in china,” Journal of travel medicine, vol. 27, no. 3, p. taaa037, 2020.
[20] J. F. Teixeira and M. Lopes, “The link between bike sharing and subway use during the covid-19 pandemic: The case-study of new york’s citi bike,” Transportation Research Interdisciplinary Perspectives, vol. 6, p. 100166, 2020.
[21] L. Qin, Q. Sun, Y. Wang, K.-F. Wu, M. Chen, B.-C. Shia, and S.-Y. Wu, “Prediction of number of cases of 2019 novel coronavirus (covid-19) using social media search index,” International journal of environmental research and public health, vol. 17, no. 7, p. 2365, 2020.
[22] J. McCracken, “Demonstration project 93–making the most of today’s technology,” Public Roads, vol. 59, no. 3, 1996.
[23] J. Weinert, C. Ma, and C. Cherry, “The transition to electric bikes in china: history and key reasons for rapid growth,” Transportation, vol. 34, no. 3, pp. 301–318, 2007.
[24] P. Simha, “Disruptive innovation on two wheels: Chinese urban transportation and electrification of the humble bike,” Periodica Polytechnica Transportation Engineering, vol. 44, no. 4, pp. 222–227, 2016.
[25] R. Taylor, “Interpretation of the correlation coefficient: a basic review,” Journal of diagnostic medical sonography, vol. 6, no. 1, pp. 35–39, 1990.
[26] J. Lee Rodgers and W. A. Nicewander, “Thirteen ways to look at the correlation coefficient,” The American Statistician, vol. 42, no. 1, pp. 59–66, 1988.
[27] A. Sklar, “Random variables, joint distribution functions, and copulas,” Kybernetika, vol. 9, no. 6, pp. 449–460, 1973.
[28] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[29] F. D. dos Santos Lima, G. M. R. Amaral, L. G. de Moura Leite, J. P. P. Gomes, and J. de Castro Machado, “Predicting failures in hard drives with lstm networks,” in Proceedings of the 2017 Brazilian Conference on Intelligent Systems (BRACIS). IEEE, 2017, pp. 222–227.
[30] S. Basak, A. Dubey, and L. Bruno, “Analyzing the cascading effect of traffic congestion using lstm networks,” in . IEEE, 2019, pp. 2144–2153.
[31] J. Hong, Z. Wang, and Y. Yao, “Fault prognosis of battery system based on accurate voltage abnormity prognosis using long short-term memory neural networks,” Applied Energy, vol. 251, p. 113381, 2019.
[32] S. Lu, B. Luo, T. Patel, Y. Yao, D. Tiwari, and W. Shi, “Making disk failure predictions smarter!” in USENIX Conference on File and Storage Technologies (FAST), 2020, pp. 151–167.
[33] S. Lu, Y. Yao, and W. Shi, “Collaborative learning on the edges: A case study on connected vehicles,” in USENIX Workshop on Hot Topics in Edge Computing (HotEdge 19), 2019.
[34] C. Anastassopoulou, L. Russo, A. Tsakris, and C. Siettos, “Data-based analysis, modelling and forecasting of the covid-19 outbreak,” PloS one, vol. 15, no. 3, p. e0230405, 2020.
[35] F. Spiga and I. Girotto, “phiGEMM: a CPU-GPU library for porting quantum espresso on hybrid systems,” in . IEEE, 2012, pp. 368–375.
[36] I. V. Morozov, A. Kazennov, R. Bystryi, G. E. Norman, V. Pisarev, and V. V. Stegailov, “Molecular dynamics simulations of the relaxation processes in the condensed matter on GPUs,” Computer Physics Communications, vol. 182, no. 9, pp. 1974–1978, 2011.
[37] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “Tensorflow: a system for large-scale machine learning,” in Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), vol. 16, 2016, pp. 265–283.
[38] A. Gulli and S. Pal, Deep Learning with Keras. Packt Publishing Ltd, 2017.
[39] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in python,” Journal of machine learning research, vol. 12, no. Oct, pp. 2825–2830, 2011.
[40] R. Kohavi et al., “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in International Joint Conference on Artificial Intelligence (IJCAI), vol. 14, no. 2, 1995, pp. 1137–1145.
[41] J. D. Rodriguez, A. Perez, and J. A. Lozano, “Sensitivity analysis of k-fold cross validation in prediction error estimation,” IEEE transactions on pattern analysis and machine intelligence (TPAMI), vol. 32, no. 3, pp. 569–575, 2010.
[42] J.-H. Kim, “Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap,” Computational statistics & data analysis, vol. 53, no. 11, pp. 3735–3745, 2009.
[43] A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional lstm and other neural network architectures,”