Impact of COVID-19 on City-Scale Transportation and Safety: An Early Experience from Detroit

Abstract

The COVID-19 pandemic brought unprecedented levels of disruption to local and regional transportation networks throughout the United States, especially in the Motor City, Detroit. This was mainly a result of swift restrictive measures, such as statewide quarantine and lock-down orders, intended to confine the spread of the virus. This work analyzes five types of real-world data sets from Detroit, covering traffic volume, daily cases, weather, social distancing index, and crashes, from January 2019 to June 2020. The primary goals are to determine the impacts of COVID-19 on transportation network usage (traffic volume) and safety (crashes) in Detroit, to explore the potential correlation between these diverse data features, and to determine whether each type of data (e.g., traffic volume data) could be a useful factor in confirmed-cases prediction. In addition, a deep learning model was developed using long short-term memory networks to predict the number of confirmed cases within the next week. The model demonstrated a promising prediction result, with a coefficient of determination (R^2) of up to approximately 0.91. Moreover, to provide statistical evaluation measures of confirmed-case prediction and to quantify the prediction effectiveness of each type of data, the prediction results of six feature groups are presented and analyzed. Furthermore, six essential observations with supporting evidence and analyses are presented. The goal of this paper is to present an approach that can be applied, customised, adjusted, and replicated for analyzing the impact of COVID-19 on a transportation network and for predicting anticipated COVID-19 cases using a similar data set obtained for other large cities in the USA or around the world.


Impact of COVID-19 on City-Scale Transportation and Safety: An Early Experience from Detroit

Yongtao Yao

Department of Computer Science, Wayne State University

Detroit, MI 48202, [email protected]

Tony G. Geara

Department of Public Works, City of Detroit

Detroit, MI 48216, [email protected]

Weisong Shi

Department of Computer Science, Wayne State University

Detroit, MI 48202, [email protected]

Abstract: The COVID-19 pandemic brought unprecedented levels of disruption to the local and regional transportation networks throughout the United States, especially the Motor City, Detroit. That was mainly a result of swift restrictive measures, such as statewide quarantine and lock-down orders to confine the spread of the virus and flatten the curve, along with a natural reaction of the population to the rising number of COVID-19-related cases and deaths. This work is driven by analyzing five types of real-world data sets from Detroit related to traffic volume, daily cases, weather, social distancing index, and crashes from January 2019 to June 2020. The primary goal is figuring out the impacts of COVID-19 on the transportation network usage (traffic volume) and safety (crashes) for the City of Detroit, exploring the potential correlation between these diverse data features, and determining whether each type of data (e.g., traffic volume data) could be a useful factor in the confirmed-cases prediction. In addition, early prediction of future COVID-19 rates can be a vital contributor to life-saving advanced preventative and preparatory responses. In order to achieve this goal, a deep learning model was developed using long short-term memory networks to predict the number of confirmed cases within the next one week. The model demonstrated a promising prediction result with a coefficient of determination (R^2) of up to approximately 0.91. Moreover, in order to provide statistical evaluation measures of confirmed-case prediction and to quantify the prediction effectiveness of each type of data, the prediction results of six feature groups are presented and analyzed. Furthermore, six essential observations with supporting evidence and analyses are presented. Those will be helpful for decision-makers to take specific measures that aid in preventing the spread of COVID-19 and protect public health and safety. The goal of this paper is to present a proposed approach which can be applied, customised, adjusted, and replicated for analysis of the impact of COVID-19 on a transportation network and prediction of the anticipated COVID-19 cases using a similar data set obtained for other large cities in the USA or from around the world.

Index Terms: COVID-19, Data, Analysis, Prediction, Quarantine, Transportation networks, Traffic volume, Crashes, Social distancing, Weather, Daily cases, Detroit.

I. INTRODUCTION

The 2019 novel coronavirus (SARS-CoV-2), which causes the disease commonly known as COVID-19, has spread rapidly across the globe. As of July 28, 2020, over 16 million confirmed cases and 650 thousand deaths had been reported worldwide (WHO Situation Report-190, 2020) [1]; meanwhile, the United States is one of the most affected nations in the world, with more than 4 million confirmed cases and 149 thousand deaths. This tragic spread of COVID-19 nationally has resulted in disparate impacts across states and cities.

To slow the progression of COVID-19 and limit fatalities, public officials throughout Michigan published a series of government directives that changed over time, starting with voluntary requests to stay at home and restrictions on large public gatherings, followed by statewide quarantine and lock-down orders. Nonetheless, essential travel activities continue to take place across Detroit, such as people's access to daily supplies, medical services, and other basic necessities of welfare and safety. These government directives inevitably affect various forms of travel activities and, in turn, significantly impact transportation across Detroit [2].

Our work is based on the hypothesis that objective, reliable, and continuous transportation data can reflect the degree of social distancing, i.e., the possibility of social activities and interpersonal communication, to a certain extent, while many previous works provide evidence that the social distancing measures enacted have led to control of COVID-19 [3]–[5]. Therefore, we believe that traffic data can provide a basis for assessing the current and incoming pandemic status, and that it is meaningful to explore the changes in traffic patterns during the COVID-19 pandemic for a specific city.

In addition, crash-related information, such as the total number of daily crashes, severity, and crash type, can indirectly reflect traffic conditions [6]–[8]. Since one of the focus points of this work is to explore the correlation between traffic volume data and the outbreak of COVID-19, we also collected and analyzed crash data from Detroit to identify the impacts of crash-related information on the confirmed-cases prediction.

Through our literature review, an abundance of studies pointed out that weather factors, e.g., temperature (°C) and wind speed (mph), can contribute to the spread of COVID-19 [9]–[11]. Inspired by these works, we also sought to determine whether weather could be a factor in the spread of this disease.

Moreover, it is well known that maintaining social distancing can prevent the spread of COVID-19 and contain the number of casualties [12]–[17], which is based on the assumption that the degree of social distancing is highly related to the spread speed of COVID-19. Therefore, we also collected social distancing related data for Wayne County and Michigan state, to explore and test the correlation between social distancing and the severity of COVID-19.

By considering the aforementioned data features, we then aimed to build an effective deep learning model using long short-term memory networks (LSTM) to predict the number of confirmed cases in Detroit. In order to provide statistical evaluation measures to quantify the prediction effectiveness of each type of data on the confirmed-cases prediction results, i.e., the performance of LSTM, we trained LSTM on six experiment groups with different features, then analyzed the prediction results of the six feature groups.

Our observations and prediction model are intended to help decision-makers concentrate suitable public health efforts and apply effective transportation management techniques to protect residents and improve safety for Detroiters. It must be noted that the presented statistical analysis approaches and the proposed prediction model were used on the Detroit-based data set as an example and due to availability. The method could be applied, customised, adjusted, and replicated for analysis of the impact of COVID-19 on a transportation network and prediction of the anticipated COVID-19 cases using a similar data set obtained from other large cities within the USA or from around the world.

In particular, this paper is driven by the following questions:
i) What are the sudden and drastic changes in overall temporal traffic patterns resulting from the outbreak of COVID-19?
ii) Did traffic decrease and then recover evenly across all measured signals during COVID-19 and as COVID-19 restrictions were being lifted?
iii) What are the impacts of COVID-19 and social distancing on the reasons behind crashes?
iv) Can we leverage the traffic count data, crash data, and other COVID-19 related information, such as the data on daily confirmed cases plus social distancing, combined with weather information, to predict the number of COVID-19 confirmed cases for the next one week?

The rest of the paper is organized as follows. We first review previous works and introduce the research gap in Sec. II. Then, we elaborate on the data sets used for the experiments in Sec. III. Sec. IV describes experimental investigations for the questions presented in i) to iii), and Sec. V demonstrates the proposed algorithms to predict the number of COVID-19 confirmed cases as described in question iv). After presenting a discussion of the prediction results in Sec. VI, we conclude the work in Sec. VII.

II. RELATED WORK

In this section, we review recently published works that focus on COVID-19 confirmed-cases prediction or similar research triggered by the outbreak of COVID-19 in terms of transportation, weather, social distancing, and other aspects. To the best of our knowledge, prior works do not consider the effects of traffic volume data and crash-related information in research on COVID-19 forecasts.

A. Transportation and COVID-19

The travel restrictions put in place to reduce the spread of COVID-19 resulted in a sharp reduction in traffic throughout the United States. Some recent works explored the changes in transportation modes. For example, Hu et al. [2] studied transportation modes during and after the COVID-19 pandemic using basic laws of traffic and mathematical analysis to explore scenarios of increased car commuting. Lacus et al. [18] analyzed data on air traffic worldwide with the aim of analyzing the impact of the travel ban on the aviation sector, as well as after changes in COVID-19 diagnostic criteria. Lau et al. [19] calculated the correlation of air traffic with the number of confirmed COVID-19 cases and determined the growth curves of cases before and after lock-down. Teixeira et al. [20] presented clues on how bike-sharing can support the transition to a post-COVID-19 society.

B. Weather and COVID-19

Moreover, weather factors including temperature (°C), humidity (%), and wind speed (mph) are regarded in recent works as factors that contribute to the spread of COVID-19. For example, using the Spearman rank correlation test, Tosepu et al. [9] showed that among minimum temperature, maximum temperature, average temperature, humidity, and rainfall, only the average temperature is significantly related to the COVID-19 pandemic in Jakarta, Indonesia. Another experiment [10] was conducted based on daily new cases and weather information in 50 states in the United States, and it clarified that weather parameters, i.e., temperature and absolute humidity, will help classify risky geographic areas in different countries. In addition, based on Spearman's correlation coefficients, Mehmet et al. [11] pointed out that wind speed (mph) had the highest correlation with the outbreak of COVID-19. A few studies have also claimed, by considering nine cities in Turkey, that warm weather can possibly slow down the global COVID-19 pandemic [11].

C. Social Distancing and COVID-19

In addition, a portion of the related work studied the relationship between social distancing and COVID-19. For instance, Singh et al. [12] focused on the age-structured impact of social distancing on the COVID-19 epidemic in India and presented a mathematical model of the spread of infection in a population that is structured by age and by social contact between ages. Courtemanche et al. [5] pointed out that there would have been ten times greater spread of COVID-19 without shelter-in-place orders. The work in [13] showed that social distancing and sheltering in place have had some impact on crime and disorder, but only for a restricted collection of crime types and not consistently across places.

D. Other Coronavirus-Related Research

To date, more and more researchers are shifting their focus to detecting, predicting, treating, and recovering from COVID-19. Some of the prior works do not consider transportation, weather, or social distancing, but they presented promising results in addressing coronavirus-related issues. For example, Qin et al. [21] proposed a novel model to predict the outbreak of COVID-19 in populations in affected areas based on social media search indexes (SMSI) for dry cough, fever, chest distress, coronavirus, and pneumonia. Lacus et al. [18] calculated the economic impact, measured in terms of loss of GDP due to the aviation sector, as well as the social impact due to job losses related to aviation and correlated sectors (tourism, catering, etc.).

III. DATA DESCRIPTION AND CRITICAL DATES

In this section, we discuss all five types of data collected and analyzed for this study (shown in Table I): (i) traffic volume data, (ii) daily case numbers, including daily confirmed cases and daily deaths, (iii) weather information, (iv) social distancing-related data, and (v) crash data. For each category of data, we then elaborate on the corresponding data source and the embodied attributes.

Timeline of shutdown and data collection period.

Knowing these specific dates is important for researchers exploring the changes in overall traffic volume patterns across Detroit pertaining to COVID-19 and the impact of these four types of data on the prediction of the confirmed cases. Note that in Detroit, the onset of COVID-19 was estimated to be on March 1th, the shutdown started on March 19th, and people started going back to work in the office on June 1st. However, both the number of closed businesses in the downtown area and the number of people working from home in general continued to remain high.

Thus, we defined the shutdown period as running from the 19th of March to the 1st of June. We collected and analyzed transportation data from Detroit before the first COVID-19 confirmed cases, during the pandemic, and after the release of the shutdown (from 1/1/2019 to 6/30/2020), covering one and a half years. To keep the data consistent, we also analyzed daily case numbers, weather information, and social distancing index data spanning the same time period.
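As a minimal sketch of how the critical dates above could be used to label each monitoring day, the Python snippet below assigns a day to the "before", "during", or "after" shutdown phase. The function name `shutdown_phase` is hypothetical, and treating both shutdown endpoints (March 19 and June 1, 2020) as inclusive is an assumption, since the text does not state it.

```python
from datetime import date

# Dates taken from the text above; inclusive endpoints are an assumption.
DATA_START = date(2019, 1, 1)
DATA_END = date(2020, 6, 30)
SHUTDOWN_START = date(2020, 3, 19)
SHUTDOWN_END = date(2020, 6, 1)


def shutdown_phase(day: date) -> str:
    """Label a monitoring day relative to the shutdown period (hypothetical helper)."""
    if not (DATA_START <= day <= DATA_END):
        raise ValueError(f"{day} is outside the data collection period")
    if day < SHUTDOWN_START:
        return "before"
    if day <= SHUTDOWN_END:
        return "during"
    return "after"


if __name__ == "__main__":
    for d in (date(2020, 3, 18), date(2020, 3, 19), date(2020, 6, 2)):
        print(d.isoformat(), shutdown_phase(d))
```

Labels of this kind are what a before/during/after comparison of daily metrics would group on.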

A. Traffic Volume Data

We collected and analyzed transportation data from 73 signalized intersection sites with advanced Remote Traffic Signal Management System (RTSMS) Level-II. Those locations have continuous data collection and analytics metrics through camera detection. They are all owned by the City of Detroit, out of a total of 787 City-owned signals and approximately 700 more signals owned by other jurisdictions around the City. These signalized intersections are indispensable parts of urban traffic networks, since around two-thirds of urban vehicle miles are traveled on signal-controlled roads [22]. Fig. 2 depicts the geographical distribution of the studied 73 signalized intersections, which are highlighted by the green dots. Those locations provided aggregated daily traffic volume data for the following 10 attributes: Bus, MotorizedVehicle, PickupTruck, ArticulatedTruck, SingleUnitTruck, Pedestrian, Motorcycle, Car, WorkVan, and Bicycle. Those attributes were later consolidated into six categories, as described in later sections. Additionally, it must be noted that in 2019, the number of Level-II RTSMS locations was limited to 25. To account for that discrepancy, the normalized average volume per intersection was used.
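The normalization mentioned above can be illustrated with a short sketch. The paper does not spell out the exact procedure, so dividing the daily total by the number of reporting intersections is an assumption, and the function name and toy counts below are hypothetical.

```python
def average_volume_per_intersection(counts_by_site):
    """Mean daily volume per reporting intersection.

    counts_by_site maps an intersection id to that day's count for one
    transportation mode. Dividing by the number of reporting sites is an
    assumed normalization, used so that days with 25 sites (2019) and days
    with 73 sites (2020) become comparable.
    """
    if not counts_by_site:
        raise ValueError("no intersections reported on this day")
    return sum(counts_by_site.values()) / len(counts_by_site)


if __name__ == "__main__":
    # Toy data: every site records 100 cars, but the site count differs.
    day_2019 = {f"site{i}": 100 for i in range(25)}
    day_2020 = {f"site{i}": 100 for i in range(73)}
    print(average_volume_per_intersection(day_2019))
    print(average_volume_per_intersection(day_2020))
```

With raw totals the 2020 day would look almost three times busier; the per-intersection average removes that artifact of the larger sensor network.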

Fig. 1. The distribution of 73 intersections in Detroit.

B. Daily Cases

We obtained and analyzed the daily cases data from Michigan's official Coronavirus dashboard, including (i) the number of confirmed cases and (ii) the number of reported deaths. More specifically, confirmed cases are dated by the disease onset date; otherwise, either the specimen collection date of the first positive COVID-19 test or the referral date is used. As for the number of reported deaths, the corresponding value represents the actual reported date of death; 8 confirmed deaths did not have a valid date available and are not included in the collected data.

Fig. 2. The number of daily confirmed cases and deaths in Detroit. (Axes: Date vs. Quantity; series: Confirmed Cases, Death.)

C. Weather Data

Since previous research suggests that weather factors, e.g., temperature (°C) and wind speed (mph), may contribute to the spread of COVID-19, we also included weather information as one of the inputs for COVID-19 prediction. We collected weather data from the official website of the National Oceanic and Atmospheric Administration and analyzed six weather-related attributes: rain precipitation, snow precipitation, average temperature, maximum temperature, minimum temperature, and average wind speed.

D. Social Distancing Information

We also collected and analyzed the social distancing related attributes for Wayne County and the State of Michigan from the COVID-19 Impact Analysis Platform published by the University of Maryland. This data was not granular enough to isolate the City's boundaries; however, it included valuable data for the various inputs contributing to social distancing at both the county and state levels.

In particular, the social distancing index is calculated from the six mobility indicators by the following equation: social distancing index = 0.8 × [%staying home + 0.01 × (100 − %staying home) × (0.1 × %reduction of all trips compared with pre-COVID-19 benchmark + 0.2 × %reduction of work trips + 0.4 × %reduction of non-work trips + 0.3 × %reduction of travel distance)] + 0.2 × %reduction of out-of-county trips. The choice of weights is based on the share of trips made by residents and visitors (e.g., about 20% of all trips are out-of-county trips, which leads to a choice of 0.8 for residents and 0.2 for out-of-county travel) and on which trips are considered more important (e.g., work travel is more important than non-work travel). A higher social distancing index score indicates fewer chances for close interpersonal interaction and reduced opportunities for the transmission of COVID-19.

E. Crash Data

To determine the impact of COVID-19 and social distancing metrics on the rate and severity of crashes, we also collected 19 crash metrics for Detroit from the Michigan State Police (MSP) Traffic Crash Reporting System (TCRS), which contains information on the number of total crashes per day broken down by severity, other reasons, and types of crashes. Table I lists these metrics. The monitoring frequency for all metrics is daily.

IV. STATISTICAL ANALYSIS

In this section, we conduct statistical analysis on the collected metrics and present our observations related to the changes in traffic volume and pattern, correlation studies, and crash statistics analysis.

A. Changes of Traffic Volume and Pattern

1) Changes of Traffic Volume in 2020:

We first sought to answer the question "What are the sudden spatial traffic patterns pertaining to COVID-19?" To answer this question, we first calculated the average number of each transportation mode among the 73 signalized intersections for each monitoring day; we then plotted the average number of each mode during the periods before, during, and after quarantine in Detroit (shown in Fig. 3). Out of the 10 collected metrics related to traffic counts at signalized intersections (shown in Table I), we combined some of those metrics and consolidated the classifications down to six categories. For example, the value of "Truck Van" represents the total number of all recorded types of trucks and vans, while the value of "Car" indicates the total quantity of motorized vehicles and traditional cars.

It can be seen that the numbers of buses, pedestrians, and cars all showed declining trends during the shutdown period and then increased after the end of the shutdown. In contrast, it is notable that the quantities of Bicycle and Motorcycle increased sharply even after the statewide quarantine order. For example, the average number of Bicycle per day increased to approximately 2× its original value during the quarantine and to approximately 4× after it. Similarly, the average number of Motorcycle surged to approximately 3× its pre-shutdown value.

Footnotes: https://data.covid.umd.edu/ (COVID-19 Impact Analysis Platform); https://milogintp.michigan.gov/mdot-waps6/crash/ (MSP Traffic Crash Reporting System).

Observations 1:

Although cars are still the predominant transportation mode, biking and motorcycling have demonstrated a transition in usage and are showing increased popularity (up to 4× the previous volume) in post-COVID-19 urban mobility for Detroit. These numbers do not account for the seasonal weather impact on Bike and Motorcycle usage.

This observation may be attributed to the fact that cycling can be an alternative mode of transport, as it is compatible with social distancing regulations and allows for short individual trips. It must be noted that this exercise does not account for the seasonal weather impact on Bike and Motorcycle usage, which coincided with the onset of COVID-19. Although literature exploring the role of cycling in previous epidemics is rare, it is recognized that one of the factors leading to the rise of e-bikes in China was the 2002–2004 SARS outbreak, as people tried to avoid overcrowded public transportation services [23], [24]. Additionally, the same pattern was also observed in New York City [20], showing some evidence of a modal transfer of some subway users to the bike-sharing system.

Observations 2:

The total number of trucks and vans was almost the same before and during the shutdown. An increase of approximately 40% is observed in the post-shutdown numbers.

The above truck-related observation might be attributed to the following: i) with the outbreak of the epidemic, a portion of Detroit citizens turned to online shopping. However, due to the shutdown policies, some industry-related trucks and vans stopped running, leading to a relatively stable volume of trucks and vans during the quarantine period. ii) After the shutdown period, although there were no specific rules limiting the transportation of trucks and vans, it is expected that the increased delivery demand was maintained, in addition to the increased demand from the reopening of industrial activity and road construction.

Fig. 3. Changes in the daily number of each transportation mode before, during, and after quarantine in Detroit. (Bar chart: quantity by transportation mode for Bus, Pedestrian, Car, Bicycle, Motorcycle, and Truck_Van; series: Average Number Before Shutdown, Average Number During Shutdown, Average Number After Shutdown.)

2) Changes of Traffic Pattern between 2019 and 2020:

In order to answer the question "How did traffic patterns change from 2019 to 2020 considering the impact of COVID-19?", we explored the temporal distribution of each transportation mode's quantity across 2019 and 2020, including bus, bicycle, car, motorcycle, and truck. More specifically, we divided the collected traffic data set into four groups based on the time periods, i.e.,

TABLE I
ANALYZED METRICS SUMMARY

Data Category | Count | Metrics | Description
Traffic Volume | 10 | Bus, MotorizedVehicle, PickupTruck, ArticulatedTruck, SingleUnitTruck, Pedestrian, Motorcycle, Car, WorkVan, and Bicycle | Traffic volume by classification: number of each transportation mode per day.
Daily Case | | |
Weather Data | | |
Crash | 19 | Total crashes | Number of total crashes per day.
 | | Fatal, Serious, Minor, Possible, None | Severity of crashes (worst injury).
 | | Ped, Cyclist, YoungDriver (Under 24) | Other reason for crash (crashes with non-motorized roadway users and young drivers under 24 years of age).
 | | Single motor vehicle, Head on, Head on left turn, Angle, Rear end, Rear end right turn, Sideswipe same, Sideswipe opposite, Backing, Other, Unknown, Null or not entered | Type of crashes.
Social Distancing Related Data | 21 | Social distancing index | An integer from 0 ∼ ...

... "W" shape, i.e., the median value of bicycle and motorcycle is significantly lower than the value of bus. However, when it comes to "2020 March - June", this "W" shape disappears, since the median value of bicycles increased while the median value of buses declined markedly compared with the other sub-figures.

Observations 3:

Comparing the traffic volume data of 2019 and 2020, the traffic pattern of 2020 March - June (outbreak period of COVID-19) is significantly different from the patterns of 2019 and the first two months of 2020.

Fig. 4. The change of traffic patterns from 2019 to 2020 pertaining to COVID-19. (Vertical axis: normalized traffic volume.)

B. Correlation Study

To further explore the correlation between each type of collected metric, such as the correlation between the number of new confirmed cases and the daily traffic volume, we conduct correlation coefficient analysis and joint distribution analysis, which provide statistical evidence of their correlation.

1) Correlation Coefficients:

One of the more frequently reported statistical methods is correlation analysis, in which a correlation coefficient is reported to represent the degree of linear association between two variables [25], [26].

In this work, we calculated the correlation coefficients between i) transportation modes (e.g., bus, pedestrian, bicycle, car, motorcycle, and truck), ii) total crashes, iii) weather information (e.g., average temperature, rain precipitation, and daily average wind speed), iv) the social distancing index of Wayne County and Michigan, and v) daily cases (e.g., daily confirmed cases and daily deaths). The formula for the correlation coefficient is defined as follows.

$$\rho_{X,Y} = \frac{E[XY] - E[X]\,E[Y]}{\sqrt{E[X^2] - (E[X])^2}\,\sqrt{E[Y^2] - (E[Y])^2}} \tag{1}$$

The correlation coefficient is a statistic that measures the linear correlation between two attributes X and Y, with a value range of [−1, +1]. A value of +1 indicates total positive linear correlation, 0 indicates no linear correlation, and −1 indicates total negative linear correlation. A higher absolute value of the correlation coefficient represents a stronger linear correlation between the two attributes.

Fig. 5 presents the absolute values of the calculated correlation coefficients based on Equation 1. We use a color gradient from yellow to blue to indicate lower to higher correlation between two attributes.

Observations 4: Daily cases, i.e., daily confirmed cases and daily deaths, are highly correlated with:
- the traffic volume, especially for cars,
- total crashes,
- the social distancing index at the Wayne County level and Michigan level, and
- the average temperature.

Note that although previous work reported that wind speed (mph) is one of the factors that triggered the spread of COVID-19 [11], in this work we do not find a high linear correlation between the number of daily cases and the wind speed or other weather factors, such as rain precipitation, in Detroit based on our collected dataset.
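Equation 1 can be evaluated directly from sample averages. The sketch below is one possible implementation, not the authors' code: expectations are replaced by sample means, the function name `pearson_correlation` is hypothetical, and the toy series are invented for illustration only.

```python
import math


def pearson_correlation(xs, ys):
    """Correlation coefficient of Equation 1, with E[.] taken as a sample mean."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2   # E[X^2] - (E[X])^2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2   # E[Y^2] - (E[Y])^2
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined for a constant series")
    return (mean_xy - mean_x * mean_y) / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented toy data: a daily count that falls as cases rise.
    daily_cases = [1, 2, 3, 4]
    car_volume = [8, 6, 4, 2]
    r = pearson_correlation(daily_cases, car_volume)
    print(r, abs(r))  # Fig. 5 reports the absolute value
```

Because Fig. 5 shows absolute values, a strongly negative relationship (as in the toy data) and a strongly positive one would both appear as high correlation.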
Fig. 5. The absolute value of correlation coefficients. TAVG represents the average temperature per day. PRCP indicates the daily rain precipitation. AWND stands for the daily average wind speed, and SC index represents the social distancing index. (Both axes list: Bus, Pedestrian, Bicycle, Car, Motorcycle, Truck, Total crashes, TAVG, PRCP, AWND, SC Index (Wayne County), SC Index (MI), Daily deaths, Daily cases.)

2) Joint Distribution:

Based on the correlation coefficients, we can roughly gauge the degree of correlation between each pair of attributes, but we cannot decisively derive the specific reason for that relationship. Therefore, we calculated and drew the joint distributions [27] for select pairs of attributes. This demonstrates the intuitive quantitative relationship between variables (linear or non-linear, or whether there is a more obvious correlation). Most importantly, the joint distributions allow us to identify the relationship between multiple attributes.

Specifically, for {X = x, Y = y}, we found all elements in the sample space that satisfy these two values. These elements form a subset of the sample space, and the probability of this subset is the joint probability P(X = x, Y = y). The function p(x, y) = P(X = x, Y = y) is called the joint PMF (joint probability mass function). The joint probability can be regarded as the probability that two events occur at the same time: if event A is X = x and event B is Y = y, the joint probability is P(A ∩ B).

Fig. 6. Joint distribution of various relevant attributes.
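The joint PMF definition above can be sketched as a simple frequency count over paired observations. This is an illustrative estimate only: the function name is hypothetical, the labels are invented, and binning continuous quantities such as daily counts into discrete labels is an assumption that the text does not describe.

```python
from collections import Counter


def empirical_joint_pmf(xs, ys):
    """Estimate p(x, y) = P(X = x, Y = y) as the share of days with that pair."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty series of equal length")
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    return {pair: count / n for pair, count in pair_counts.items()}


if __name__ == "__main__":
    # Invented, pre-binned toy observations for four days.
    traffic_level = ["low", "low", "high", "high"]
    case_level = ["few", "few", "few", "many"]
    for pair, prob in sorted(empirical_joint_pmf(traffic_level, case_level).items()):
        print(pair, prob)
```

Each entry is the empirical probability of event A (X = x) and event B (Y = y) occurring on the same day, and the entries sum to one.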

Fig. 6 presents the joint distribution between the daily confirmed cases and other attributes, such as transportation volume, total crashes, weather, and social distancing index, from March to June 2020. Green points display the specific values of the selected attributes, and red straight lines represent the linear fit between the pairs of attributes, e.g., the linear fit of the daily confirmed cases and the number of buses per day. The greater the slope of the red line, the stronger the linear relationship between the two attributes.

Based on the joint distribution of the daily confirmed cases and the rain precipitation shown in Fig. 6, we can deduce the reason why they are not linearly correlated: no matter how large the value of daily cases is, the value of PRCP is very small, i.e., the rainfall in Detroit from March to June 2020 was very low, resulting in a very small range of rainfall values. Under these circumstances, the value of the correlation coefficient between the daily cases and PRCP is low, indicating a low linear correlation.

C. Crashes Statistics

We then analyzed crash data covering 2019 Jan to Feb, 2019Mar to Jun, 2020 Jan to Feb, and 2020 Mar to Jun. Our goalwas to identify the percent distribution of the different crashtypes, which is shown in Fig. 7.
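A percent distribution of crash types for one period can be computed as in the following sketch. The record list is invented toy data, and the function name and the two-decimal rounding are assumptions made for illustration.

```python
from collections import Counter


def crash_type_percentages(crash_types):
    """Return (crash type, percent share) pairs, largest share first."""
    if not crash_types:
        raise ValueError("no crash records for this period")
    total = len(crash_types)
    counts = Counter(crash_types)
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, round(100 * count / total, 2)) for name, count in ranked]


if __name__ == "__main__":
    # Invented records: one crash type label per reported crash.
    period_records = ["Angle"] * 5 + ["Rear End"] * 3 + ["Backing"] * 2
    for name, share in crash_type_percentages(period_records):
        print(f"{name}: {share}%")
```

Running the same function on each of the four periods gives rankings that can be compared across periods.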

Fig. 7. Distribution by crash type percentage for 2019 to 2020. (Panels: Crashes Statistics / 2019 Jan - Feb, 2019 Mar - Jun, 2020 Jan - Feb, 2020 Mar - Jun. Legend: Single_Motor_Vehicle, Head_On, Head On-Left Turn, Angle, Rear End, Rear End-Left Turn, Rear End-Right Turn, Sideswipe-Same, Sideswipe-Opposite, Backing, Other, Unknown, Null | Not entered.)

Observations 5: When comparing the 2020 crash type percentages from before the outbreak of COVID-19 with those during the pandemic period, a clear shift in the crash type distribution is observed:
- Angle crashes became the most common, moving up from third place (18.12% ∼ ...),
- Rear-end crashes moved from first to second place (27.80% ∼ ...),
- Sideswipe crashes slightly decreased but maintained third place (20.44% ∼ ...), and
- Single vehicle crashes increased slightly and maintained fourth place (14.51% ∼ ...).
Data key: (2020 Jan-Feb ∼ 2020 Mar-Jun).

V. CONFIRMED CASES PREDICTION

In this section, we conduct a further study on the influence of traffic volume data, weather information data, crash data, and social distancing index information on the confirmed-cases prediction. Our ultimate goal is to build a suitable deep learning model to predict the number of confirmed cases based on (i) traffic volume data, (ii) daily case numbers, including daily confirmed cases and daily deaths, (iii) weather information, (iv) social distancing-related data, and (v) crash data.

A. Problem Formulation and Solution

Problem Definition.

We formulate the problem of predicting the number of COVID-19 confirmed cases as a regression problem. Specifically, we use $T = \{\mathit{input}_i\}_{i=1}^{n}$ to represent our training data set, in which $\mathit{input}_i \in I$ denotes all input features, i.e., the 58 features presented in Table I of Sec. III. Our goal is to employ the best method to learn the function $h$ that minimizes the loss function $\ell(h(\mathit{input}); \mathit{groundtruth})$, a measurement of the difference between the desired output and the actual output of the current model, such that the trained model is able to predict the number of confirmed cases over a specific prediction horizon with high performance. We choose 21 days as our monitoring window, and we aim to predict confirmed cases for the next 7 days.

Deep Learning Model Selection.

Recently, machine learning methods have been applied with success in regression tasks. We tackle the confirmed-cases prediction problem using Long Short-Term Memory networks (LSTMs) [28], [29], since they have become highly successful learning models for both classification and regression problems across diverse domains [30]–[33]. Specifically, LSTM is a type of recurrent neural network (RNN) capable of processing sequential data. Since being proposed by Hochreiter and Schmidhuber [28], LSTM has proven able to address long-term back-propagation issues. It includes a memory cell and a gating mechanism, which allows it to decide what is kept in the memory cell and how the new input data contributes to what is already in the memory cell. Fig. 8 depicts the structure of the LSTM model that we deployed for the confirmed-cases prediction.
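The gating mechanism described above can be illustrated with one time step of a standard LSTM cell. This is a minimal scalar sketch of the textbook formulation, not the Keras model used in this work; the function name `lstm_cell_step`, the layout of `params`, and all weight values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps each gate name to a (w_x, w_h, b) tuple (hypothetical layout).
    Returns the new hidden state h and the new memory cell value c.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    forget_gate = sigmoid(pre_activation("forget"))    # how much old memory is kept
    input_gate = sigmoid(pre_activation("input"))      # how much new input is written
    candidate = math.tanh(pre_activation("candidate")) # proposed new memory content
    output_gate = sigmoid(pre_activation("output"))    # how much memory is exposed

    c = forget_gate * c_prev + input_gate * candidate
    h = output_gate * math.tanh(c)
    return h, c


# Toy usage: feed a short sequence through the cell with arbitrary weights.
toy_params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "candidate", "output")}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.2]:
    h, c = lstm_cell_step(x, h, c, toy_params)
```

With the forget gate near 1 and the input gate near 0, the memory cell is carried forward almost unchanged, which is what lets the cell retain information over long sequences.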

[Figure 8 blocks: Input, four LSTM blocks, three Dropout blocks, three Dense blocks, Output.]

Fig. 8. The structure of the LSTM.
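The 21-day monitoring window and 7-day prediction horizon from the problem definition imply a sliding-window layout of the training samples, which can be sketched as follows. The function name `make_windows` and the toy data are hypothetical, and the sketch assumes that the target is the sequence of the next 7 daily case counts; the text does not state whether the target is per-day values or an aggregate.

```python
def make_windows(features, cases, window=21, horizon=7):
    """Build (input, target) samples from per-day feature rows and case counts.

    Sample k uses days k .. k+window-1 as input and the case counts of the
    following `horizon` days as target (assumed target layout).
    """
    if len(features) != len(cases):
        raise ValueError("features and cases must cover the same days")
    samples = []
    last_start = len(features) - window - horizon
    for start in range(last_start + 1):
        x = features[start:start + window]
        y = cases[start + window:start + window + horizon]
        samples.append((x, y))
    return samples


# Toy data: 40 days with 2 features per day (the paper uses 58 features).
days = 40
features = [[float(d), float(d) * 2.0] for d in range(days)]
cases = [d * 10 for d in range(days)]

samples = make_windows(features, cases)
print(len(samples))  # 40 - 21 - 7 + 1 = 13 samples
```

Each input then has shape (21, number of features), which matches the (time steps, features) layout that recurrent layers typically expect.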

Effective Measurements.

To design the best prediction method, we need metrics that accurately measure the quality of our prediction approaches. We use three common measures: the coefficient of determination ($R^2$), the mean absolute error (MAE), and the root mean square error (RMSE), all of which are commonly used evaluation metrics for regression problems [34]. The $R^2$ score is widely used to indicate the fit of a machine learning model, i.e., the higher the value, the better the fit generated by the model. The maximum value of $R^2$ is 1 (ideal case), and it may be negative, with a range of $(-\infty, 1]$. MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average over the test sample of the absolute differences between prediction and actual observation, where all individual differences have equal weight. RMSE is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of squared differences between prediction and actual observation.

Suppose the ground truth is $y = \{y_1, y_2, ..., y_N\}$, and the prediction result is denoted as $\hat{y} = \{\hat{y}_1, \hat{y}_2, ..., \hat{y}_N\}$. MAE and RMSE are defined as:

$$MAE = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i| \tag{2}$$

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{3}$$

Specifically, the formula to calculate $R^2$ is defined as follows:

$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{4}$$

$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i \tag{5}$$

where TSS (total sum of squares) is the sum of squared differences between all samples and the mean value, which is $N$ times the variance, and RSS (residual sum of squares) is the sum of the squares of all sample errors, which is $N$ times the mean square error (MSE). When the predicted value of every sample equals the true value, RSS is 0, so $R^2$ equals 1 (ideal case).

B. Model Creation

In this subsection, we introduce the experimental hardware and the packages used, then explain in detail why we conduct experiments on six experimental groups and what the precise format of our experimental input is.

Experimental Setup.

In this work, we adopt an NVIDIA GPU Workstation as our experiment platform, equipped with 4 × GeForce RTX 2080 Ti graphics cards, an Intel Xeon E5-2690 v4 CPU (2.6 GHz, 14 cores), 64 GB of memory, and Ubuntu 16.04.6 LTS as the operating system. The NVIDIA GPU Workstation is capable of delivering cluster-level performance for even demanding applications [35], [36]. The models in this paper are implemented in Python, using TensorFlow 1.13.1 [37], Keras 2.1.5 [38], and the scikit-learn library [39] for model building.

Experimental Groups.

To show the impact of traffic volume data, weather-related metrics, crash data, and social distancing related data on the confirmed-cases prediction, we conduct experiments on six experimental groups. Our first step is to combine all five categories of 58 features presented in Table I of Sec. III to train models using LSTM methods, and we label this group as A Group (A represents all). Then, we exclude all traffic volume metrics but keep the remaining features, and we denote this as A-T Group. Similarly, we exclude weather information but keep the other features, and we get the A-W Group. Since we have two levels of social distancing index, i.e., the social distancing index of Wayne County (denoted as SD) and the social distancing index of Michigan state (denoted as SDM), we delete SD and SDM to get the A-SD group and the A-SDM group, respectively. Finally, in order to figure out the impact of crash data (denoted as C) on the confirmed-cases prediction, we get the A-C group. Table II shows the input features for A Group, A-T Group, A-W Group, A-SD Group, A-SDM Group, and A-C Group.
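The construction of the six groups amounts to dropping one feature category at a time from the full feature set. A minimal sketch follows; the column names in `FEATURE_CATEGORIES` are hypothetical stand-ins for the 58 features of Table I, and `build_groups` is an illustrative helper, not the authors' code.

```python
# Hypothetical column names; the real feature list is given in Table I.
FEATURE_CATEGORIES = {
    "T": ["cars_per_day", "buses_per_day"],      # traffic volume
    "D": ["daily_confirmed", "daily_deaths"],    # daily cases (never excluded)
    "W": ["TAVG", "PRCP"],                       # weather
    "C": ["total_crashes"],                      # crashes
    "SD": ["sd_index_wayne_county"],             # social distancing, county level
    "SDM": ["sd_index_michigan"],                # social distancing, state level
}


def build_groups(categories):
    """Return {group_name: feature_columns} for A and the five ablation groups."""
    groups = {"A": [col for cols in categories.values() for col in cols]}
    for excluded in ("T", "W", "SD", "SDM", "C"):
        groups["A-" + excluded] = [
            col
            for key, cols in categories.items()
            if key != excluded
            for col in cols
        ]
    return groups


groups = build_groups(FEATURE_CATEGORIES)
for name, columns in groups.items():
    print(name, len(columns))
```

Training the same model once per group and comparing each score against the A group then isolates the contribution of the excluded category.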

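TABLE II

INPUT FEATURES FOR SIX EXPERIMENTAL GROUPS.

| Group | Traffic volume | Daily cases | Weather | Crash | SD | SDM |
| --- | --- | --- | --- | --- | --- | --- |
| A | √ | √ | √ | √ | √ | √ |
| A-T | × | √ | √ | √ | √ | √ |
| A-W | √ | √ | × | √ | √ | √ |
| A-SD | √ | √ | √ | √ | × | √ |
| A-SDM | √ | √ | √ | √ | √ | × |
| A-C | √ | √ | √ | × | √ | √ |

C. Training and Validation Methodology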

Training and Validation Methodology.

Next, we provide a high-level description of our methodology before delving into the details. We use 5-fold cross-validation [40], a validation technique used to assess the predictive performance of machine learning models, judge how models perform on an unseen data set (testing data set) [41], and avoid over-fitting during the training phase. More specifically, our data set is randomly partitioned into five equal-sized sub-samples. Each time, we take one sub-sample as the testing data set and the remaining four sub-samples as the training data set. Then, we fit a model on the training data set, evaluate it on the testing data set, and calculate the evaluation scores. After that, we retain the evaluation scores and discard the current model. The process is repeated five times with different combinations of sub-samples, and we use the average of the evaluation scores as the final result for each method.

First, we need to determine the hyperparameters of our models, an important aspect of building effective deep learning models. Concretely, we use the hold-out method [42] to split the training phase data set further into a parameter training set and a validation set (80% and 20% of the training phase data set, respectively); the validation set provides an unbiased evaluation of a model fit on the training data set while tuning parameters. Then, we conduct a grid search over the parameter values to find the combination that achieves the highest performance.
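The splitting scheme described above (5-fold cross-validation with an 80%/20% hold-out split inside each training fold) can be sketched in plain Python as follows. The helper names `k_fold_indices`, `hold_out_split`, and `cross_validate` are hypothetical, the `fit_and_score` callback stands in for model training and evaluation, and the grid search over hyperparameters is omitted.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for k randomly partitioned folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # sizes differ by at most one
    for test_fold in range(k):
        test_idx = folds[test_fold]
        train_idx = [i for j, fold in enumerate(folds) if j != test_fold for i in fold]
        yield train_idx, test_idx


def hold_out_split(train_idx, validation_fraction=0.2):
    """Split training indices into (fit_idx, validation_idx)."""
    n_val = int(round(len(train_idx) * validation_fraction))
    return train_idx[n_val:], train_idx[:n_val]


def cross_validate(n_samples, fit_and_score, k=5):
    """Average the score returned by fit_and_score over all k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(n_samples, k):
        fit_idx, val_idx = hold_out_split(train_idx)
        # A fresh model is trained per fold; only the score is retained.
        scores.append(fit_and_score(fit_idx, val_idx, test_idx))
    return sum(scores) / len(scores)


# Toy usage: a dummy scorer that just reports the test-fold size.
average = cross_validate(100, lambda fit, val, test: float(len(test)))
```

In a real experiment, `fit_and_score` would train on `fit_idx`, tune hyperparameters against `val_idx`, and return the evaluation score on `test_idx`.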

Avoiding Overfitting of the Models.

Another important factor during the training process is the number of epochs [43], i.e., the number of passes over the input data set during training. With more epochs, the error on the training data decreases further; however, beyond a tipping point, the network begins to over-fit the training data. Hence, finding the best number of epochs is essential to avoid overfitting. Figure 9 shows the change in the value of the training and validation loss functions (the smaller, the better) as the number of epochs increases. Initially, the values of both loss functions decrease as training proceeds; but after 340 epochs, the value of the validation loss function slowly increases (and is higher than the training loss), which indicates over-fitting. Therefore, we choose 340 epochs for LSTM.

[Figure 9: training loss and validation loss (y-axis: loss value) plotted against epoch.]

Fig. 9. The validation loss reaches minimum value at 340.
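Selecting the number of epochs from the validation curve can be sketched as picking the epoch at which the recorded validation loss is lowest. The function name `best_epoch` and the loss values below are hypothetical; they are not the values behind Fig. 9.

```python
def best_epoch(validation_losses):
    """Return the 1-based epoch with the lowest validation loss.

    On ties, the earliest epoch is returned.
    """
    if not validation_losses:
        raise ValueError("need at least one recorded epoch")
    best_index = min(range(len(validation_losses)), key=validation_losses.__getitem__)
    return best_index + 1


# Toy curve: the loss falls, reaches its minimum at epoch 4, then rises again.
toy_validation_loss = [0.90, 0.50, 0.30, 0.25, 0.27, 0.31]
chosen = best_epoch(toy_validation_loss)
print(chosen)  # 4
```

Training libraries usually offer an early-stopping mechanism that applies the same idea during training instead of after it.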

VI. RESULTS AND DISCUSSION

In this section, we present and analyze the sensitivity of LSTM toward different feature groups. Our discussion includes supporting evidence and reasoning to explain observed trends, and implications of observed trends for the authorities and decision-makers on taking specific measures for Detroit.

A. Prediction Results and Ground Truths

Fig. 10 presents the confirmed-cases prediction results, which are denoted by the black curve. To allow an intuitive comparison between the prediction results and the ground truth, we also include a red curve to represent the ground truth. It can be seen that our prediction result is very close to the ground truth, which motivates us to obtain the statistical evaluation results.

[Figure 10: number of cases plotted against day; curves: Prediction Result, Ground Truth.]

Fig. 10. Prediction result and ground truth.
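The statistical evaluation of a prediction against the ground truth uses the MAE, RMSE, and $R^2$ metrics defined in Sec. V. A minimal pure-Python sketch of those definitions is given below; the function names and the toy values are hypothetical and are not the paper's data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, i.e. the square root of the MSE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_true = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_true) ** 2 for t in y_true)
    if tss == 0:
        raise ValueError("R^2 is undefined when the ground truth is constant")
    return 1.0 - rss / tss


# Toy values, not the paper's data.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 3.0, 3.0]

print(mae(y_true, y_pred))       # 0.5
print(rmse(y_true, y_pred))      # sqrt(0.5)
print(r2_score(y_true, y_pred))  # 1 - 2/5
```

A model that predicts worse than the mean of the ground truth yields a negative $R^2$; for example, reversing `y_true` as the prediction gives RSS = 20 against TSS = 5.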

B. Evaluation Results and Observations

Next, we present the key prediction quality measures for the six experiment groups (Figure 11). Note that among the three evaluation metrics, i.e., $R^2$, MAE, and RMSE, $R^2$ is a more intuitive and objective performance indicator of the fit in a regression problem. Therefore, we treat $R^2$ as our primary evaluation metric. Finally, we make several interesting observations as follows.

| Group | RMSE | MAE | R^2 |
| --- | --- | --- | --- |
| A | 0.0606 | 0.0378 | 0.9088 |
| A-T | 0.0760 | 0.0457 | 0.7934 |
| A-W | 0.0716 | 0.0438 | 0.8309 |
| A-SD | 0.0727 | 0.0467 | 0.8178 |
| A-SDM | 0.0714 | 0.0404 | 0.8213 |
| A-C | 0.0687 | 0.0450 | 0.8494 |

Fig. 11. Model prediction quality for the six experiment groups: A (all selected features), A-T (without traffic volume related features), A-W (without weather-related features), A-SD (without the social distancing index at the Wayne County level), A-SDM (without the social distancing index at the Michigan state level), and A-C (without crash-related features).

We observe that the A group performs the best across all experiment groups, i.e., it achieves the highest $R^2$ score (around 0.91) and the lowest MAE and RMSE. This observation supports our hypothesis that (i) traffic volume data, (ii) weather features, (iii) social distancing-related data, and (iv) crash information are all useful for improving the effectiveness of confirmed-cases prediction.

Considering the difference in $R^2$ score between the A group and the other five groups, we observe that the A-T group achieves the lowest $R^2$ score, i.e., the largest effectiveness gap is between the A group and the A-T group. This indicates that deleting traffic volume data has the most significant adverse effect on the confirmed-cases prediction, i.e., traffic volume data is critical for the improvement of confirmed-cases prediction.

Similarly, we compare the effectiveness of traffic volume data, social distancing related data, weather data, and crash data in terms of prediction improvement (shown in Fig. 12): adding each of these four types of data improves the prediction performance; traffic volume data is the most effective, followed by social distancing related data and then weather features, and crash data seems to have the least impact on the prediction performance.

[Figure 12: effectiveness ordering of the four data types (Traffic Volume, Social Distancing Related Data, Weather, Crash), with groups A-T, A-SD, A-SDM, A-W, and A-C compared on $R^2$ (↓) and RMSE (↑).]

Fig. 12. Effectiveness comparison between four types of data in terms of prediction improvement. The arrows indicate whether a higher or lower value of the evaluation metric corresponds to better prediction performance.
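The ordering of the data types follows directly from the $R^2$ values reported in Fig. 11: the larger the drop in $R^2$ when a feature category is removed, the more that category contributes. A short sketch of that arithmetic is given below; the helper name `r2_drop_ranking` is hypothetical, while the scores are the Fig. 11 values.

```python
# R^2 scores reported in Fig. 11.
R2_BY_GROUP = {
    "A": 0.9088,
    "A-T": 0.7934,
    "A-W": 0.8309,
    "A-SD": 0.8178,
    "A-SDM": 0.8213,
    "A-C": 0.8494,
}


def r2_drop_ranking(r2_by_group, baseline="A"):
    """Rank ablation groups by how much R^2 falls relative to the baseline."""
    base = r2_by_group[baseline]
    drops = {
        group: base - score
        for group, score in r2_by_group.items()
        if group != baseline
    }
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


ranking = r2_drop_ranking(R2_BY_GROUP)
for group, drop in ranking:
    print(f"{group}: R^2 drop = {drop:.4f}")
```

The resulting order is A-T (0.1154), A-SD (0.0910), A-SDM (0.0875), A-W (0.0779), and A-C (0.0594), which matches the ordering of traffic volume, social distancing, weather, and crash data stated in the text.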

Observations 6:

Besides the daily case data and social distancing index, the data related to traffic volume, crashes, and weather can all be good indicators for COVID-19 confirmed-cases prediction. The traffic volume data is very useful information for the prediction model.

VII. CONCLUSION

In this work, we collected and analyzed five types of data sets: traffic volume, daily cases, weather information, crash features, and social distancing related data. Important observations with supporting evidence and analysis are presented to provide practical implications for authorities and decision-makers on taking preventive actions for Detroit. In terms of crashes, there was a clear change in the crash percentage distribution by type. During COVID-19 there was a significant increase in angle crashes, which are typically more severe and indicative of more severe driver-behavior-related issues. In terms of correlations, daily cases, i.e., daily confirmed cases and daily deaths, are highly related to: − the transportation volume, especially for cars, − total crashes, − the social distancing index at the Wayne County level and the Michigan level, and − the average temperature. Additionally, we have trained an accurate deep learning model, which shows its effectiveness in predicting COVID-19 confirmed cases for the next week, i.e., $R^2$ up to approximately 0.91. The prediction quality is tested on six experiment groups, and the prediction results also show that adding traffic volume data, social distancing related metrics, weather information, and crash features could all improve the prediction performance.

REFERENCES

[1] W. H. Organization et al., “Coronavirus disease (covid-19): situation report, 190,” 2020.
[2] Y. Hu, W. Barbour, S. Samaranayake, and D. Work, “Impacts of covid-19 mode shift on road traffic,” arXiv preprint arXiv:2005.01610, 2020.
[3] K. E. Ainslie, C. E. Walters, H. Fu, S. Bhatia, H. Wang, X. Xi, M. Baguelin, S. Bhatt, A. Boonyasiri, O. Boyd et al., “Evidence of initial success for china exiting covid-19 social distancing policy after achieving containment,” Wellcome Open Research, vol. 5, 2020.
[4] G. Briscese, N. Lacetera, M. Macis, and M. Tonin, “Compliance with covid-19 social-distancing measures in italy: the role of expectations and duration,” National Bureau of Economic Research, Tech. Rep., 2020.
[5] C. Courtemanche, J. Garuccio, A. Le, J. Pinkston, and A. Yelowitz, “Strong social distancing measures in the united states reduced the covid-19 growth rate: Study evaluates the impact of social distancing measures on the growth rate of confirmed covid-19 cases across the united states.” Health Affairs, pp. 10–1377, 2020.
[6] J. C. Stutts, D. W. Reinfurt, L. Staplin, E. Rodgman et al., “The role of driver distraction in traffic crashes,” 2001.
[7] S. Candefjord, R. Buendia, E.-C. Caragounis, B. A. Sjöqvist, and H. Fagerlind, “Prehospital transportation decisions for patients sustaining major trauma in road traffic crashes in sweden,” Traffic injury prevention, vol. 17, no. sup1, pp. 16–20, 2016.
[8] M. A. Abdel-Aty and H. T. Abdelwahab, “Predicting injury severity levels in traffic crashes: a modeling comparison,” Journal of transportation engineering, vol. 130, no. 2, pp. 204–210, 2004.
[9] R. Tosepu, J. Gunawan, D. S. Effendy, H. Lestari, H. Bahar, P. Asfian et al., “Correlation between weather and covid-19 pandemic in jakarta, indonesia,” Science of The Total Environment, p. 138436, 2020.
[10] S. Gupta, G. S. Raghuwanshi, and A. Chanda, “Effect of weather on covid-19 spread in the us: a prediction model for india in 2020,” Science of The Total Environment, p. 138860, 2020.
[11] M. Şahin, “Impact of weather on covid-19 pandemic in turkey,” Science of The Total Environment, p. 138810, 2020.
[12] R. Singh and R. Adhikari, “Age-structured impact of social distancing on the covid-19 epidemic in india,” arXiv preprint arXiv:2003.12055, 2020.
[13] G. Mohler, A. L. Bertozzi, J. Carter, M. B. Short, D. Sledge, G. E. Tita, C. D. Uchida, and P. J. Brantingham, “Impact of social distancing during covid-19 pandemic on crime in los angeles and indianapolis,” Journal of Criminal Justice, p. 101692, 2020.
[14] M. Painter and T. Qiu, “Political beliefs affect compliance with covid-19 social distancing orders,” Available at SSRN 3569098, 2020.
[15] A. Olivera-La Rosa, E. G. Chuquichambi, and G. P. Ingram, “Keep your (social) distance: Pathogen concerns and social perception in the time of covid-19,” Personality and Individual Differences, vol. 166, p. 110200, 2020.
[16] H. Chen, W. Xu, C. Paris, A. Reeson, and X. Li, “Social distance and sars memory: impact on the public awareness of 2019 novel coronavirus (covid-19) outbreak,” medRxiv, 2020.
[17] J. A. Lewnard and N. C. Lo, “Scientific and ethical basis for social-distancing interventions against covid-19,” The Lancet. Infectious diseases, vol. 20, no. 6, p. 631, 2020.
[18] S. M. Iacus, F. Natale, C. Santamaria, S. Spyratos, and M. Vespe, “Estimating and projecting air passenger traffic during the covid-19 coronavirus outbreak and its socio-economic impact,” Safety Science, p. 104791, 2020.
[19] H. Lau, V. Khosrawipour, P. Kocbach, A. Mikolajczyk, J. Schubert, J. Bania, and T. Khosrawipour, “The positive impact of lockdown in wuhan on containing the covid-19 outbreak in china,” Journal of travel medicine, vol. 27, no. 3, p. taaa037, 2020.
[20] J. F. Teixeira and M. Lopes, “The link between bike sharing and subway use during the covid-19 pandemic: The case-study of new york’s citi bike,” Transportation Research Interdisciplinary Perspectives, vol. 6, p. 100166, 2020.
[21] L. Qin, Q. Sun, Y. Wang, K.-F. Wu, M. Chen, B.-C. Shia, and S.-Y. Wu, “Prediction of number of cases of 2019 novel coronavirus (covid-19) using social media search index,” International journal of environmental research and public health, vol. 17, no. 7, p. 2365, 2020.
[22] J. McCracken, “Demonstration project 93–making the most of today’s technology,” Public Roads, vol. 59, no. 3, 1996.
[23] J. Weinert, C. Ma, and C. Cherry, “The transition to electric bikes in china: history and key reasons for rapid growth,” Transportation, vol. 34, no. 3, pp. 301–318, 2007.
[24] P. Simha, “Disruptive innovation on two wheels: Chinese urban transportation and electrification of the humble bike,” Periodica Polytechnica Transportation Engineering, vol. 44, no. 4, pp. 222–227, 2016.
[25] R. Taylor, “Interpretation of the correlation coefficient: a basic review,” Journal of diagnostic medical sonography, vol. 6, no. 1, pp. 35–39, 1990.
[26] J. Lee Rodgers and W. A. Nicewander, “Thirteen ways to look at the correlation coefficient,” The American Statistician, vol. 42, no. 1, pp. 59–66, 1988.
[27] A. Sklar, “Random variables, joint distribution functions, and copulas,” Kybernetika, vol. 9, no. 6, pp. 449–460, 1973.
[28] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[29] F. D. dos Santos Lima, G. M. R. Amaral, L. G. de Moura Leite, J. P. P. Gomes, and J. de Castro Machado, “Predicting failures in hard drives with lstm networks,” in Proceedings of the 2017 Brazilian Conference on Intelligent Systems (BRACIS). IEEE, 2017, pp. 222–227.
[30] S. Basak, A. Dubey, and L. Bruno, “Analyzing the cascading effect of traffic congestion using lstm networks,” in . IEEE, 2019, pp. 2144–2153.
[31] J. Hong, Z. Wang, and Y. Yao, “Fault prognosis of battery system based on accurate voltage abnormity prognosis using long short-term memory neural networks,” Applied Energy, vol. 251, p. 113381, 2019.
[32] S. Lu, B. Luo, T. Patel, Y. Yao, D. Tiwari, and W. Shi, “Making disk failure predictions smarter!” in USENIX Conference on File and Storage Technologies (FAST), 2020, pp. 151–167.
[33] S. Lu, Y. Yao, and W. Shi, “Collaborative learning on the edges: A case study on connected vehicles,” in USENIX Workshop on Hot Topics in Edge Computing (HotEdge 19), 2019.
[34] C. Anastassopoulou, L. Russo, A. Tsakris, and C. Siettos, “Data-based analysis, modelling and forecasting of the covid-19 outbreak,” PloS one, vol. 15, no. 3, p. e0230405, 2020.
[35] F. Spiga and I. Girotto, “phiGEMM: a CPU-GPU library for porting quantum espresso on hybrid systems,” in . IEEE, 2012, pp. 368–375.
[36] I. V. Morozov, A. Kazennov, R. Bystryi, G. E. Norman, V. Pisarev, and V. V. Stegailov, “Molecular dynamics simulations of the relaxation processes in the condensed matter on GPUs,” Computer Physics Communications, vol. 182, no. 9, pp. 1974–1978, 2011.
[37] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “Tensorflow: a system for large-scale machine learning,” in Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), vol. 16, 2016, pp. 265–283.
[38] A. Gulli and S. Pal, Deep Learning with Keras. Packt Publishing Ltd, 2017.
[39] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in python,” Journal of machine learning research, vol. 12, no. Oct, pp. 2825–2830, 2011.
[40] R. Kohavi et al., “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in International Joint Conference on Artificial Intelligence (IJCAI), vol. 14, no. 2, 1995, pp. 1137–1145.
[41] J. D. Rodriguez, A. Perez, and J. A. Lozano, “Sensitivity analysis of k-fold cross validation in prediction error estimation,” IEEE transactions on pattern analysis and machine intelligence (TPAMI), vol. 32, no. 3, pp. 569–575, 2010.
[42] J.-H. Kim, “Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap,” Computational statistics & data analysis, vol. 53, no. 11, pp. 3735–3745, 2009.
[43] A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional lstm and other neural network architectures,”