Adaptive Reinforcement Learning Model for Simulation of Urban Mobility during Crises
Chao Fan∗, Xiangqi Jiang∗, Ali Mostafavi
Texas A&M University, College Station, TX 77843
[email protected], [email protected]
Abstract
The objective of this study is to propose and test an adaptive reinforcement learning model that can learn the patterns of human mobility in a normal context and simulate mobility during perturbations caused by crises, such as flooding, wildfire, and hurricanes. Understanding and predicting human mobility patterns, such as destination and trajectory selection, can inform emerging congestion and road closures caused by disruptions in emergencies. Data related to human movement trajectories are scarce, especially in the context of emergencies, which limits applications of existing urban mobility models learned from empirical data. Models that can learn mobility patterns from data generated in normal situations and adapt to emergency situations are needed to inform emergency response and urban resilience assessments. To address this gap, this study creates and tests an adaptive reinforcement learning model that can predict the destinations of movements, estimate the trajectory for each origin and destination pair, and examine the impact of perturbations on humans' decisions related to destinations and movement trajectories. The application of the proposed model is shown in the context of Houston and the flooding caused by Hurricane Harvey in August 2017. The results show that the model can achieve more than 76% precision and recall. The results also show that the model can predict traffic patterns and congestion resulting from urban flooding. The outcomes of the analysis demonstrate the capabilities of the model for analyzing urban mobility during crises, which can inform the public and decision-makers about response strategies and resilience planning to reduce the impacts of crises on urban mobility.
Keywords: Adaptive reinforcement learning; Urban mobility; Resilience; Crisis; Simulation.
∗ Contributed equally. Preprint. Under review.

1 Introduction

The resilience of communities to crises such as flooding, wildfires, pandemics, and other extreme events hinges on a deep understanding and effective simulation of the impacts of the crisis on the human and physical environment [1]. Urban mobility plays a vital role in community resilience to crises by enabling populations to access critical facilities, such as healthcare, pharmacies, and grocery stores [2]; hence, the ability to simulate and examine the impacts of crises on urban mobility is essential to effectively improve the resilience of cities [3]. Studies, including Lei et al. [4], have examined the vulnerability and resilience of transportation systems during natural disasters and other crises. The majority of existing studies focus primarily on examining the vulnerability of physical road networks [5, 6] and quantifying the effects of crises on traffic patterns [7, 8]. A critical missing component is the capability to simulate crisis perturbation scenarios for predictive evaluation of the impacts on movement trajectories and traffic patterns at urban scale to inform emergency-response and resilience-planning decisions. To address this gap, the goal of this study is to create and test a deep learning model that can capture individuals' movement patterns during normal situations and impose crisis-induced perturbations to simulate impacts on movement trajectories and traffic patterns. With the wide use of smartphones and apps with location services, a person's digital footprints can be generated based on daily trips, which allow researchers to gather fine-grained data related to individuals' movement trajectories during normal situations for model training and prediction [9]. Prior studies have attempted to address mobility simulation through statistical models and machine-learning techniques at different scales [10].
One stream of the existing studies focuses on quantitative assessments of the dynamical and statistical properties of human travel [11]. With the growing prevalence of location data related to mobile phone users, González et al. [12] demonstrated that human movement trajectories have a high degree of temporal and spatial regularity, as they follow simple reproducible patterns. This finding has significant implications for the predictability of individual mobility patterns in normal conditions. Furthermore, by measuring the entropy of an individual's trajectory, Song et al. [13] showed a 93% potential predictability of human mobility under normal situations. The statistical metrics provide essential knowledge about the travel distance, radius, and frequently visited locations of humans [14]. Despite these advances in the statistical characterization of aggregated human mobility patterns, few models exist that can capture the temporal sequence of activities and spatial distribution of activities of each individual. The majority of the standard mobility models have limited capability for predicting individual mobility at a local scale, such as a neighborhood, at a specific timestamp.

With advances in machine-learning techniques and enhanced computational power, it is now possible to learn and simulate finer-grained mobility patterns, such as the sequential patterns of individual location histories [15]. In particular, deep neural networks (DNNs) are prevalent in learning and simulating human dynamics and density from large data sets [16]. For example, Wang et al. [17] designed a DNN architecture with an alternative-specific utility using behavioral knowledge to analyze human choices of travel mode in normal circumstances. Hosseini et al. [18] introduced a deep learning-based methodology to predict traffic state using convolutional neural networks (CNNs) by taking individual vehicle-level data as inputs.
The CNN model was further improved to capture citywide crowd activities with parameter efficiency and stability [19]. These deep learning techniques offer significant improvements in predicting individual mobility patterns with high resolution in terms of locations and timestamps [20].

Human mobility, including origin-destination matrices and the trajectories for each origin and destination pair, however, is strongly influenced by the condition of road networks, specific gathering events, friendship effects [21], and residents' travel habits and lifestyles [22]. In particular, crises such as natural disasters cause perturbations in physical infrastructure and roads that influence human movement trajectories. For example, inundated road segments and buildings would necessitate evacuation of affected residents, thus reducing travel demand in certain areas [5]. Human trajectory data in the context of crises for model training and mobility simulation are scarce, as such data are rarely recorded in historical records [23], necessitating adaptive models for simulating human mobility under crisis scenarios (e.g., flooding, wildfires, and blizzards). Information related to movement trajectories and traffic patterns in such extreme and complex situations is rarely recorded in historical data. The lack of data makes the simulation of human mobility during crisis conditions challenging [24].

To address this information gap, this study tested an adaptive reinforcement learning model that can (1) learn individual mobility patterns from data collected from regular daily activities; and (2) simulate mobility under extreme events. The model incorporates both time and location factors as the input features and allows users (e.g., practitioners and decision makers) to adjust the environmental parameters for various application objectives depending upon the scenario.
To illustrate the application as well as the performance of the model in both prediction and application, we trained the model using data related to human mobility activities in Houston during March and April 2017, and used the model to simulate movement trajectory densities and traffic patterns during the flooding caused by Hurricane Harvey in August 2017. The potential of the model for applications in emergency response and pandemic prediction is also discussed.

2 Related work
The role of urban mobility in a crisis, such as access by emergency responders, has been emphasized in the existing literature [25]. Assessing strategies to enhance the resilience of urban mobility is essential to mitigate the negative effects of crises and to improve the effectiveness of response efforts. To this end, existing studies [26, 27] have put forth methodologies to evaluate the vulnerability of physical road networks and have proposed appropriate strategies for enhancement of urban mobility resilience. In another stream of work, Fan et al. developed a mathematical contagion model to predict the spread of floodwaters over road networks that informs about the disruptions of urban mobility networks in flooding events [5]. Dong et al. proposed a machine-learning model that coupled road networks and water channels to examine the impacts of flooding on road networks through the lens of network dependencies [6]. Despite the success of prior models, the majority of these efforts are focused on physical road networks, such as accessibility to critical facilities. To expand the understanding of collective mobility patterns in crises, Wang et al. [27] used empirical data to examine the impact of natural disasters on population travel distances and radii of gyration. The results of their study indicate inherent resilience of urban mobility in crises. The majority of the existing studies (such as Lu et al., 2012 [24]) have used empirical approaches or analytical methods (such as network science-based models) to examine mobility patterns when a population is hit by a crisis. A critical missing piece, however, is predictive data-driven models that can realistically simulate the spatial-temporal patterns of urban mobility during a crisis. Such predictive models and their resulting information should inform about the spatial distribution of vehicles on road networks and emerging traffic jams caused by crisis perturbations, such as road inundations.
This information is crucial for emergency response and mobility resilience enhancement; however, using empirical data from past crises, one can evaluate only one scenario of crisis impacts. The key to addressing this gap is a predictive data-driven model that can learn the patterns of movements by individual vehicles during normal situations and that is capable of imposing crisis-related perturbations to simulate changes in mobility patterns. Such models for urban mobility simulation in a crisis situation could offer a unique and effective approach that enables city planners, emergency managers, and decision makers to capture the spatial and temporal patterns of population movements during extreme events, and facilitate contingency and hazard mitigation planning.
Simulating urban mobility is an important problem with a wide range of applications, such as traffic control, accident warning, pandemic prediction, and urban planning [28]. Therefore, there is a need for general and robust methods to predict the destinations and trajectories of drivers, especially in populated urban areas [29]. A number of prior studies have explored multiple methods to quantify, model, and simulate human mobility, ranging from destination (next-place) prediction [30] to route selection. This section discusses existing research to show the achievements in addressing the challenge and to highlight the need for adaptive models to simulate urban mobility under crisis situations.

First, a number of existing studies have proposed and tested methods for next-place prediction [31]. Next-place prediction, also called destination prediction at the local urban scale, predicts the movement patterns of individuals and accordingly estimates the flow of population in both spatial and temporal manners. In next-place prediction, the majority of the methods deal with pairs of origins and destinations based on individuals' historical trip data. As mentioned earlier, statistical evidence shows the presence of regularity in human daily mobility patterns [13]. Hence, to detect and learn this regularity, one commonly adopted approach is to develop clustering algorithms, such as k-means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise), based on historical travel data and to identify association rules between the origin and destination [32]. Multiple studies [33] indicated that clustering-based approaches have achieved very high performance in next-place prediction under normal circumstances. Location embedding is another method to capture location semantics with comprehensive numerical representations [34, 35]. For example, Shimizu et al.
proposed a place-embedding method that can learn fine-grained representations with spatial hierarchical information and achieved higher accuracy in predicting the next place of human trips in urban areas [36].

Second, in addition to single destination prediction, existing studies have also explored models for sequential patterns of individual travel to places [37]. In this task, Markov chain-based models, including the Mobility Markov Chain [38], Mixed Markov Chain [39], and Hidden Markov Model [40], are commonly used to predict the location sequence of a given person or vehicle. Associated with location sequences but more related to route selection, other research concerned with predicting the trajectory selected by an individual from an origin to a destination has garnered significant attention over the past few years [41]. The predicted trajectories allow researchers not only to estimate travel time [42], but also to analyze travel demand over urban road networks [43]. Due to the complexity of trajectory prediction and the required data scale, only a limited number of studies have examined this problem. The most commonly used approach is the shortest-path algorithm, which constructs routable graphs from historical trajectory data and then computes the route based on travel frequencies [44]. This approach relies on traffic flow on specific road segments, but does not consider origin and destination pairs, which significantly influence route selection [45]. More recent studies direct attention to the behavioral mobility patterns of individuals by inferring home and workplace, duration of activities, and other relevant temporal and spatial features for route prediction [46]. For example, Jiang et al.
proposed a mechanistic modeling framework, called TimeGeo, that can generate mobility behaviors with a resolution of 10 minutes and hundreds of meters [47].

In summary, recent works have significantly advanced methods of simulating urban mobility and trajectory prediction under normal conditions based on the evidence of regularity in human movement and activity patterns. Still unsolved, however, is the challenge of adaptability of a model for simulating urban mobility when people are exposed to disruptive conditions, such as life-threatening situations during crises like flooding, wildfires, and even pandemics. Modeling approaches with limited adaptability are not useful for assessing movement trajectories and traffic under crisis situations. Advances in adaptive reinforcement learning models provide opportunities for simulating mobility under crisis situations. In the following section, we elaborate on the proposed adaptive reinforcement learning model and then show its application in the context of flood impact analysis in Houston during Hurricane Harvey in 2017.
3 Adaptive reinforcement learning model

The proposed adaptive reinforcement learning model comprises three modules: destination prediction, trajectory prediction, and crisis scenario application (Figure 1). The proposed model uses human movement trajectory data as input, learns mobility patterns in regular situations, and then simulates the trajectories and traffic conditions in crisis situations by adjusting the reward table. The mathematics and steps are formulated and elaborated in the following sub-sections.
In this section, we first define the notations and terminologies used in this study.

• Definition 1. A trajectory $T_i$, also named waypoints, route, or path, is a spatial trace of a vehicle (or an individual) generated by their mobile apps in geographical space. $T_i$ contains a sequence of locations with specific latitudes and longitudes, $[(x_{i0}, y_{i0}, t_{i0}), ..., (x_{ik_i}, y_{ik_i}, t_{ik_i}), ..., (x_{in_i}, y_{in_i}, t_{in_i})]$, where $x_{ik_i} \in \mathbb{R}$ and $y_{ik_i} \in \mathbb{R}$ are the latitude and longitude of a trace record $i$ at the $k_i$th location at time $t_{ik_i} \in \mathbb{R}$, and $k_i \in [0, 1, ..., n_i]$.

• Definition 2. A trip $t_i$, also called an O-D pair, is a pair of origin and destination for a given trace record $i$. Trips can be extracted from trajectory data.

• Definition 3. A provider is a service provider that is allowed to anonymously collect locations from mobile devices. The providers are sorted into four categories: consumer vehicles, taxi/shuttle/town car services, field service/local delivery fleets, and for-hire/private trucking fleets. Each provider's services fall exclusively into one category.

• Definition 4. A grid world is a grid based on a geographical map. The cells in the grid world have the same shape and equal area. Each cell represents a specific state (or location) showing the location of a vehicle. This term is used to address the Markov decision process (MDP).

• Definition 5. A policy, $\pi$, is a set of choices of actions for an agent (a person or vehicle in this study) at each state.
Figure 1: A schema of the proposed adaptive reinforcement learning model for simulating urban mobility.

Based on these definitions, the destination prediction problem can be described as follows: given an origin with spatial coordinate $(x_{i0}, y_{i0}, t_{i0})$ and a specific scenario, predict the spatial coordinate of the destination, $(x_{in_i}, y_{in_i})$. The trajectory prediction problem can further be described as: given a pair of origin $(x_{i0}, y_{i0}, t_{i0})$ and destination $(x_{in_i}, y_{in_i})$ with time information, predict a trajectory, $[(x_{i0}, y_{i0}, t_{i0}), ..., (x_{ik_i}, y_{ik_i}), ..., (x_{in_i}, y_{in_i})]$, that a vehicle (or an individual) will go through from the origin to the destination. The specific time at a location on the path to the destination is uncertain due to the dynamic traffic states and the data collection frequency. Hence, in this study, the trajectory prediction task predicts only the route a vehicle (or an individual) will select from an origin to a destination, while the specific time at which the vehicle (or individual) occupies each location on the route is not estimated.

Module 1: Destination prediction

The first module of the model is a tool that can predict the destination of a vehicle, given an origin at a specific time. As suggested by existing studies, human movements are space- and time-dependent, meaning that each pair of origin and destination should include both temporal and spatial features to ensure the model is both accurate and general. To achieve this, this section discusses the process of generating the relevant features.

For the temporal dimension, each trip sets the start time at the origin, indicating the time of the day and day of the week. Empirical results show that urban mobility presents a significant weekly recurrent pattern [21]. To incorporate this feature and to mitigate the effect of mobility variations across different days in a week, we converted the time of the day and day of the week into an integrated feature, called $t_n$.
Specifically, we considered that any time within a week can be represented by a real number from 0 to 1 [48]; 0 represents the start of Monday (00:00, Monday), and 1 represents the end of Sunday (23:59, Sunday). Each day can be divided into 24 bins based on hours. Then, a specific time can be translated by:

$$t_n = \frac{hours + mins/60 + 24 \cdot weekday}{24 \cdot 7} \qquad (1)$$

where $hours$ represents the hours of the time in the 24-hour system, $mins$ represents the minutes of the time, and $weekday$ represents the number of days from Monday (for example, $weekday = 2$ for Wednesday). In this way, we can specify the time feature of a trip in a continuous manner across different times in a week.

Since people tend to have recurrent movement patterns during the weekday and special travels during the weekend (Saturday and Sunday), we also considered this difference in the model by using the category of weekday and weekend as a separate feature, $w_n$. The value of the trips in this feature can be represented using the following rule:

$$w_n = \begin{cases} 0 & \text{weekday} \\ 1 & \text{weekend} \end{cases} \qquad (2)$$

The location of the origin and destination is usually recognized by the latitude and longitude coordinates. The resolution of the coordinates tends to be in meters. This raises a significant challenge for predictive models to accurately learn the locations from the training data and predict the location for a given origin. Considering extremely high-resolution coordinates would also increase the computational cost of the model. To simplify this process and maintain fairly high resolution, we round the coordinates to numbers with three decimal places. (This threshold is flexible and can be selected based on the required resolution.) Similar to the idea of a grid, the coordinates are associated with a cell and represented by the centroid of the corresponding cell. Furthermore, since the earth is an ellipsoid that should be described in a three-dimensional space, we projected the latitude and longitude to a Euclidean 3D space using the following formula [48]:

$$x_{st} = \cos(x_i)\cos(y_i), \quad y_{st} = \cos(x_i)\sin(y_i), \quad z_{st} = \sin(x_i) \qquad (3)$$

Through this approach, we can obtain three location features to represent the location of an origin, $(x_{st}, y_{st}, z_{st})$.
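As a concrete illustration, the feature generation above can be sketched as follows. This is a minimal sketch of our reading of Eqs. (1)-(3): the weekly normalization by $24 \cdot 7$, the 0/1 weekend encoding, and the conversion of degrees to radians before the spherical projection are assumptions rather than the authors' exact implementation.

```python
import math

def time_features(weekday, hours, mins):
    """Continuous week-time feature t_n in [0, 1] (Eq. 1) and weekend flag w_n (Eq. 2).

    weekday: 0 for Monday ... 6 for Sunday (assumed encoding).
    """
    t_n = (hours + mins / 60 + 24 * weekday) / (24 * 7)
    w_n = 1 if weekday >= 5 else 0  # Saturday/Sunday treated as weekend
    return t_n, w_n

def location_features(lat, lon):
    """Project latitude/longitude (degrees) onto the unit sphere (Eq. 3)."""
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    x = math.cos(lat_r) * math.cos(lon_r)
    y = math.cos(lat_r) * math.sin(lon_r)
    z = math.sin(lat_r)
    return x, y, z
```

For example, `time_features(0, 0, 0)` maps the start of Monday to 0, and the last minute of Sunday maps to a value just below 1, matching the stated 0-to-1 week scale.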
We also applied the same transformation to the coordinates of the destinations in the training data to obtain $(x_{ed}, y_{ed}, z_{ed})$.

Once the five features for an origin are prepared, we can train a supervised learning model to learn the movement patterns by extracting the association between origins and destinations from the historical data in regular conditions. Among the commonly used learning methods [49], k-nearest neighbor (k-NN) regression demonstrates high performance in predicting the location feature values of the destinations. The k-NN regression method assigns weights based on the contributions of the neighbors to the predicted outcomes (location features in this study); the nearer the neighbors, the more they contribute to the outcome values.

Specifically, the k-NN model takes the list of trips $t_i = (O_i, D_i)$ as the training set, including all time and location features of origin and destination: $O_i = (t_{in}, w_{in}, x_{ist}, y_{ist}, z_{ist})$ and $D_i = (x_{ied}, y_{ied}, z_{ied})$. Then, we defined the number $k$ of nearest neighbors that would contribute to predicting the destination. The value of $k$ should be learned from the training data by minimizing the root mean square error (RMSE), which is defined below. The distance between every two neighbors (two origins, $O_i$ and $O_j$) is calculated using the Euclidean norm ($L_2$ norm), $\|\cdot\|$, as shown below:

$$\|O_i - O_j\| = \sqrt{(t_{in}-t_{jn})^2 + (w_{in}-w_{jn})^2 + (x_{ist}-x_{jst})^2 + (y_{ist}-y_{jst})^2 + (z_{ist}-z_{jst})^2} \qquad (4)$$

The location feature values of the predicted destination are the average values of the location features for the corresponding destinations of the $k$ nearest origins. This process generates three location feature values for the predicted destination. We then convert the location features back to latitude and longitude with three decimal places of accuracy, analogous to the centroid of the destination cell.
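A minimal sketch of the distance-weighted k-NN prediction just described, using the Eq. (4) distance over the five origin features and averaging the three destination features of the $k$ nearest origins. The inverse-distance weighting and the small epsilon guarding exact matches are illustrative assumptions, not details given in the text.

```python
import math

def knn_predict(train, origin, k):
    """Predict destination location features as the distance-weighted
    average over the k nearest training origins (Eq. 4 distance).

    train: list of (origin_features, dest_features) pairs; each origin has
    5 features (t_n, w_n, x_st, y_st, z_st), each dest has 3 (x_ed, y_ed, z_ed).
    """
    def dist(o1, o2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(o1, o2)))

    # Keep the k training origins closest to the query origin.
    neighbors = sorted(train, key=lambda pair: dist(pair[0], origin))[:k]
    # Inverse-distance weights; epsilon guards a zero-distance exact match.
    weights = [1.0 / (dist(o, origin) + 1e-9) for o, _ in neighbors]
    total = sum(weights)
    return tuple(
        sum(w * d[j] for w, (_, d) in zip(weights, neighbors)) / total
        for j in range(3)
    )
```

Converting the three predicted feature values back to latitude and longitude then recovers the destination cell.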
These coordinates are the predicted coordinates for the destination. By adjusting the value of $k$ and calculating the corresponding RMSE based on all pairs of predicted and actual coordinates, we can find an optimized value of $k$ that minimizes the RMSE with high predictive performance. This process is analogous to addressing a regression problem. Hence, we use the RMSE [50] to measure the error between the actual and the predicted values:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\frac{1}{2}\left[(x_{in_i} - x'_{in_i})^2 + (y_{in_i} - y'_{in_i})^2\right]} \qquad (5)$$

where $x'_{in_i}$ and $y'_{in_i}$ are the latitude and longitude of the predicted destination for a trace record $i$, and $N$ is the number of traces in the test data. Since we calculate the RMSE for each trace, the value of $N$ here is 1. The RMSE takes the Euclidean distances between the predicted outcomes and the observations, which are then normalized across distances among all test data.

Module 2: Trajectory prediction

Once the destinations are predicted for given origins, the next step is to estimate the trajectories between the origins and destinations. An effective and commonly used approach is to structure the space of learned policies, more generally called a Markov decision process (MDP). The MDP is a stochastic control process with a tuple $(S, A, P, R)$, where $S$ is a set of states $s_i \in S$, here meaning a cell in the grid world representing a vehicle location; $A$ is a set of actions $a \in A$, which includes the four directions a vehicle can move from one cell to another in this context; $P(s'_i \mid s_i, a)$ is the transition probability that action $a$ in state $s_i$ for agent $i$ leads to state $s'_i$ in the next timestamp, following a transition matrix $T(s_i, a, s'_i)$; and $R(s_i, a, s'_i)$ is the reward function that specifies the reward a vehicle will immediately receive after transitioning from state $s_i$ to state $s'_i$ due to action $a$. To simplify the state space and also to enable matrix computation, we create a grid world that represents all possible states of an agent in the environment.
With a start state (i.e., the origin of a vehicle) and a terminal state (i.e., the destination), the objective of this MDP is to learn an optimal policy, $\pi: S \to A$, that maximizes the total rewards for an agent moving in the grid world [51]. To learn a policy, we first need to specify the parameters in the model: the probability distribution over a trajectory, $P(\zeta \mid T, \kappa)$ ($\zeta$ is a trajectory and $\kappa$ is the vector of reward weights), the transition matrix $T$, and the reward function $R$. The calculation starts with the transition matrix $T$. In this study, we assume that the agents have the same probability of going in each of the four directions $a$. Hence, the transition probability is set to be equal (the value is 1 if the action can be taken) among the four directions. Since the agent is usually not allowed to move out of the grid or out of the destination, we set the transition probabilities of such actions at the border and destination cell to 0.

Then, the probability $P_T$ that an agent will be in state $s'_i$ at the next step, given a state $s_i$, an action $a$, and a transition matrix $T$, can be computed as $P_T(s'_i \mid a, s_i)$. The probability distribution over a trajectory, $P(\zeta \mid T, \kappa)$, is proportional to the product of all transition probabilities over the path:

$$P(\zeta \mid T, \kappa) \propto \prod_{s_i, a, s'_i \in \zeta} P_T(s'_i \mid a, s_i) \qquad (6)$$

To learn the reward function from historical trajectory data, we maximize the likelihood of the historical trajectory data under the maximum entropy distribution:

$$\kappa^* = \arg\max_{\kappa} \sum_{data} \log P(\tilde{\zeta} \mid T, \kappa) \qquad (7)$$

where $\tilde{\zeta}$ specifically refers to the historical trajectories. Solving this function with a gradient-based optimization method, we obtain the reward weights for the grid world. The reward value of a trajectory is simply the sum of the state rewards, which is calculated as the product of the path features and the reward weights.
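To make the transition-matrix construction concrete, here is a sketch of a deterministic grid-world $T(s, a, s')$ under the border and destination rules above. The action ordering and the choice to make the destination fully absorbing are our assumptions for illustration.

```python
import numpy as np

def grid_transition_matrix(rows, cols, terminal):
    """Deterministic transition tensor T[s, a, s'] for a grid world.

    Actions 0-3 are up, down, left, right (assumed ordering). Moves that
    would leave the grid, and any move out of the terminal (destination)
    state, get probability 0, matching the border/destination rule above.
    """
    n = rows * cols
    T = np.zeros((n, 4, n))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for r in range(rows):
        for c in range(cols):
            s = r * cols + c
            if s == terminal:
                continue  # absorbing destination: no outgoing transitions
            for a, (dr, dc) in enumerate(moves):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    T[s, a, nr * cols + nc] = 1.0
    return T
```

Sampling a path under this tensor and multiplying the corresponding entries gives the (unnormalized) trajectory probability of Eq. (6).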
$$R(f_{\tilde{\zeta}}) = \sum_{s_i \in \tilde{\zeta}} \kappa^{\top} f_{s_i} \qquad (8)$$

where $f_{\tilde{\zeta}}$ is the path feature, which is the sum of the grid features of each state, $f_{s_i}$:

$$f_{\tilde{\zeta}} = \sum_{s_i \in \tilde{\zeta}} f_{s_i} \qquad (9)$$

In implementing the model, the first challenge we encountered is the computational cost of training the agent on the grid world. In each step, when the agent determines the action and next state, the model has to perform matrix multiplications. Since we include the whole Houston metropolitan area in the grid world, the matrix is very large; thus, calculating the rewards and informing the agent's decision would require an onerously long calculation time. This cost is prohibitive for a model intended to be applied in various contexts, especially in crisis settings. In fact, the cells distant from the origin and destination barely influence the trajectory selection of an agent. Hence, to reduce the computational cost, in the implementation process we draw only a small bounding box that incorporates both origin and destination. Then, only the cells in the bounding box are considered in the reward matrix for reward calculations.

The second challenge is the effect of the historical trajectories on the immediate reward values. As discussed earlier, the reward values in the reward table are learned from the historical trajectories that pass through each cell. For an origin and destination pair, we aim to find a trajectory that connects these two cells. The historical trajectories that connect other O-D pairs, however, give weights to other cells in the reward table. Sometimes the weights from the trajectories for other O-D pairs are quite high, inducing the agent to go in other directions, move around some high-reward cells, or even leave the actual destination.
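The bounding-box reduction can be sketched as below. The `margin` padding is a hypothetical parameter we add so that detours slightly outside the tight origin-destination box remain representable; the paper does not specify how much slack, if any, is included.

```python
def bounding_box_cells(origin, dest, rows, cols, margin=2):
    """Cells inside a margin-padded box spanning origin and destination.

    origin/dest are (row, col) grid cells; margin (hypothetical parameter)
    pads the box so near-boundary detours stay available. Only these cells
    enter the reward-matrix calculations.
    """
    r0 = max(0, min(origin[0], dest[0]) - margin)
    r1 = min(rows - 1, max(origin[0], dest[0]) + margin)
    c0 = max(0, min(origin[1], dest[1]) - margin)
    c1 = min(cols - 1, max(origin[1], dest[1]) + margin)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

Restricting the state space this way shrinks the transition and reward matrices from the full metropolitan grid to the handful of cells that can plausibly affect one O-D pair.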
To mitigate the effect of noisy historical trajectories, we include only the historical trajectories that pass through the given origin and destination to learn the reward table. This solution effectively overcomes the noise in the reward tables, reduces the training time, and improves the efficiency and performance of the model in learning reward tables.

The reward table for each pair of origin and destination is the key to predicting the trajectory that an agent would select under a specific circumstance. In examining the reward function learned from the Markov decision process, some common problems hinder an agent from finding the optimal policy. First, due to the sparsity of the data, it is often the case that the immediate rewards among all cells are quite close to each other, meaning that an agent would not be able to effectively distinguish the benefits of two states to make an accurate choice of the next state. As a result, the model is usually unstable, generating different trajectories in different rounds of implementation. In addition, the reward values learned by the reinforcement learning model are always positive for all cells in the grid world. The agent tends to get stuck in cells with relatively high rewards, since the next state does not offer substantial reward to enable a moving action. To address these pitfalls, we conducted a normalization to project the values of the immediate rewards into a negative space with a range of $[-1, 0]$. The normalized reward matrix is denoted as $R'$. The process can be formulated using the following equation:

$$r'_{ij} = \frac{r_{ij} - r_{max}}{r_{max} - r_{min}} \qquad (10)$$

where $r_{ij}$ is the immediate reward of the cell at the $i$th row and $j$th column of the grid according to the reward function $R$, and $r_{max}$ and $r_{min}$ are the maximum and minimum immediate rewards, respectively, among all cells in the same grid. By doing so, the range of the reward values is extended and distributed over a greater space.
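The normalization in Eq. (10) maps the learned rewards into $[-1, 0]$, so the best cell carries reward 0 and every other cell carries a movement cost; a one-line sketch:

```python
import numpy as np

def normalize_rewards(R):
    """Map immediate rewards into [-1, 0] per Eq. (10): the maximum-reward
    cell maps to 0 and the minimum-reward cell maps to -1."""
    r_max, r_min = R.max(), R.min()
    return (R - r_max) / (r_max - r_min)
```

With all non-maximal cells negative, lingering in place is penalized, which is exactly the "move out of some cells" behavior the text describes next.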
The negative reward values enable the agent to move out of intermediate cells along the trajectory and obtain a high reward only at the destination cell.

Second, in the reward table, the reward of an action does not always compel the agent to move towards the destination, due to the high density of the road networks and the concentration of historical trajectories. An agent thus tends to move around cells with high reward values, sometimes even moving back to the origin. The agent, however, should intentionally move towards the destination: the closer the next state is to the destination, the higher the reward the agent should obtain. To this end, we create a new matrix, $L$, that is added to the reward table to capture the effect of proximity to the destination on trajectory selection. Empirically, the reward distribution accounting for closeness to the destination is centered at the destination cell; the reward has to decrease if the agent does not reach the destination or moves further from it. This idea is similar to the Gaussian distribution. Hence, we employed Gaussian profiles to assign rewards to the cells in the grid world. The closeness reward is denoted as $l_{ij}$ for the cell at the $i$th row and $j$th column. The value of $l_{ij}$ can be obtained using the following formulas:

$l_{ij} = f(i) + f(j) - \delta$   (11)

$f(i) = \frac{1}{\sigma\sqrt{\pi}} e^{-\left(\frac{i-\mu}{\sigma}\right)^2}$   (12)

$f(j) = \frac{1}{\sigma'\sqrt{\pi}} e^{-\left(\frac{j-\mu'}{\sigma'}\right)^2}$   (13)

where $f(i)$ and $f(j)$ are the reward values calculated based on the row and column; $\mu$ and $\mu'$ are the row and column of the destination, respectively; $\sigma$ and $\sigma'$ are calculated from the variances of rows and columns; and $\delta$ is a parameter that controls the range of the reward values.
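Eqs. (11)-(13) can be sketched as follows; the function name, the default $\delta = 0$, and the vectorized layout are our assumptions.

```python
import numpy as np

def proximity_rewards(shape, dest, sigma_r, sigma_c, delta=0.0):
    """Closeness-to-destination reward l_ij = f(i) + f(j) - delta
    (Eqs. 11-13), with Gaussian profiles centered on the destination
    row and column."""
    rows = np.arange(shape[0])
    cols = np.arange(shape[1])
    f_i = np.exp(-((rows - dest[0]) / sigma_r) ** 2) / (sigma_r * np.sqrt(np.pi))
    f_j = np.exp(-((cols - dest[1]) / sigma_c) ** 2) / (sigma_c * np.sqrt(np.pi))
    # Broadcast the row profile against the column profile over the grid.
    return f_i[:, None] + f_j[None, :] - delta
```

The matrix peaks exactly at the destination cell and decays smoothly with row/column distance from it.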
Given the contribution of closeness to the destination, the value of $\delta$ can be tuned to improve the prediction performance of the model.

Third, the model usually does not converge at the destination cell because the reward at the destination is not significantly higher than at other states; the agent goes back and forth around the destination cell. To address this problem, we simply assign a large reward to the destination cell so that the process converges when the agent arrives at and stops at the destination. The corresponding matrix can be created as follows:

$d_{ij} = \begin{cases} M & \text{if cell } (i,j) \text{ is the destination} \\ 0 & \text{otherwise} \end{cases}$   (14)

where $d_{ij}$ is the extra reward given to the destination cell ($M$ is a large constant), while every other cell receives no extra reward. After all transformation matrices are prepared, the reward table for a pair of origin and destination is finally obtained by combining these matrices:

$R_t = R' + L + D$   (15)

Using this transformed reward table, we can train agents for each origin and destination pair to effectively learn the optimal policy.

Once the reward table for each pair of origin and destination is prepared, we define $V^*(s_i)$ as the expected utility of starting in $s_i$ and acting optimally. The expected utility of starting out by taking action $a$ from state $s_i$ and thereafter acting optimally is defined as $Q^*(s_i, a)$. Here $Q^*(s_i, a)$ is computed by:

$Q^*(s_i, a) = \sum_{s'_i} T(s_i, a, s'_i)\left[R_t(s_i, a, s'_i) + \gamma V^*(s'_i)\right]$   (16)

where $\gamma$ is a hyperparameter that can be adjusted to balance the contributions of the reward function and the expected utility.
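To make the Bellman relations above concrete, the sketch below runs tabular value iteration on a transformed reward table, assuming deterministic four-neighbor moves and that the reward is collected on entering the successor cell. This is a simplified stand-in for the paper's solver (the DQN described later), and all names are ours.

```python
import numpy as np

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def value_iteration(R_t, gamma=0.9, tol=1e-6):
    """Fixed point of V*(s) = max_a [R_t(s') + gamma * V*(s')],
    where s' is the cell reached from s by action a."""
    rows, cols = R_t.shape
    V = np.zeros((rows, cols))
    while True:
        V_new = np.full((rows, cols), -np.inf)
        for i in range(rows):
            for j in range(cols):
                for di, dj in ACTIONS:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        q = R_t[ni, nj] + gamma * V[ni, nj]
                        V_new[i, j] = max(V_new[i, j], q)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def greedy_action(R_t, V, s, gamma=0.9):
    """pi*(s): the action whose successor maximizes R_t(s') + gamma*V(s')."""
    rows, cols = R_t.shape
    best, best_q = None, -np.inf
    for di, dj in ACTIONS:
        ni, nj = s[0] + di, s[1] + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            q = R_t[ni, nj] + gamma * V[ni, nj]
            if q > best_q:
                best, best_q = (di, dj), q
    return best
```

With a large reward planted at the destination cell, the greedy policy pulls the agent toward it from neighboring cells.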
Based on the historical data, we can maximize the value of a state by:

$V^*(s_i) = \max_a Q^*(s_i, a)$   (17)

Finally, we can learn the optimal policy:

$V^*(s_i) = \max_a \sum_{s'_i} T(s_i, a, s'_i)\left[R_t(s_i, a, s'_i) + \gamma V^*(s'_i)\right]$   (18)

$\pi^*(s_i) = \arg\max_a \sum_{s'_i} T(s_i, a, s'_i)\left[R_t(s_i, a, s'_i) + \gamma V^*(s'_i)\right]$   (19)

A deep Q-network (DQN) with prioritized experience replay is adopted to solve the behavioral policy and predict the optimal trajectory that the agent would select for a given O-D pair. Consider a historical trajectory from which the model has learned and estimated the $Q^*$ value for an action. The empirical trajectories should be encouraged to be sampled in the prediction. To prioritize the empirical trajectories, we first measure the difference between the predicted $Q^*$ value and the experienced $Q^*$ value for the same action in the same state, represented by $\delta_i$:

$\delta_i = R_t(s_i, a, s'_i) + \gamma V^*_{\theta^-}(s'_i) - Q^*_{\theta}(s_i, a)$   (20)

where $\theta^-$ represents the target deep neural network, and $\theta$ represents the current deep neural network. The equation can further be reformulated as:

$\delta_i = R_t(s_i, a, s'_i) + \gamma Q^*_{\theta^-}(s'_i, \arg\max_a Q^*_{\theta}(s'_i, a)) - Q^*_{\theta}(s_i, a)$   (21)

The first term, $R_t(s_i, a, s'_i) + \gamma Q^*_{\theta^-}(s'_i, \arg\max_a Q^*_{\theta}(s'_i, a))$, is the target value, and the subtracted term, $Q^*_{\theta}(s_i, a)$, is the predicted value. This equation provides a quantitative measure of how much the deep neural network can learn from a given experience sample $i$.
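The double-Q TD error of Eq. (21), and the priority-proportional sampling used by prioritized experience replay, can be sketched as follows. The exponent `alpha` and the small `eps` are conventional knobs of prioritized replay, not values reported in the paper.

```python
import numpy as np

def double_q_td_error(r, q_next_target, q_next_online, q_pred, gamma=0.99):
    """Double-Q TD error (Eq. 21): the online network (theta) picks the
    next action, the target network (theta^-) evaluates it."""
    a_star = int(np.argmax(q_next_online))  # arg max_a Q*_theta(s', a)
    return r + gamma * q_next_target[a_star] - q_pred

def priorities_to_probs(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to |delta|^alpha, so samples
    with larger TD error are replayed more often."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()
```

Transitions drawn from historical trajectories with large residual error thus dominate the replay buffer, which is the prioritization the text describes.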
This is known as the double-Q temporal difference (TD) error. To find an appropriate $\theta$ and an optimal trajectory, the TD error is minimized in expectation over samples from the replay buffer $D$:

$\min_\theta \mathbb{E}_{(s_i, a, R, s'_i) \sim D}\left[\left(R_t(s_i, a, s'_i) + \gamma Q^*_{\theta^-}(s'_i, \arg\max_a Q^*_{\theta}(s'_i, a)) - Q^*_{\theta}(s_i, a)\right)^2\right]$   (22)

By doing so, we can predict the optimal trajectory for each pair of origin and destination.

The performance of trajectory prediction is measured based on the overlap between the actual and the predicted trajectory. Since we created the grid world for the Markov decision process and optimal policy learning, trajectories are characterized by the cells they pass through. The intersection between the set of cells in the predicted trajectory and the set of cells in the actual trajectory indicates the precision and recall of the model for O-D pairs [52]. Hence, we adopted the following formulas for measuring the precision and recall for each O-D pair to assess the performance of the model:

$Precision = \frac{True\ positive}{True\ positive + False\ positive}$   (23)
$Recall = \frac{True\ positive}{True\ positive + False\ negative}$   (24)

where $True\ positive$ is the number of passing cells that the model correctly predicts and that are present in the actual trajectory; $False\ positive$ is the number of passing cells that the model incorrectly predicts and that are not in the actual trajectory; and $False\ negative$ is the number of cells that are incorrectly predicted to be out of the trajectory. Calculating precision and recall allows us to quantitatively assess the performance of the model for each O-D pair and to identify the specific correctly predicted cells.
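Treating trajectories as cell sets, Eqs. (23)-(24) reduce to set intersections; the following is a minimal sketch (the convention of returning 0.0 for empty sets is our assumption).

```python
def trajectory_precision_recall(predicted, actual):
    """Cell-overlap precision and recall (Eqs. 23-24) for one O-D pair;
    `predicted` and `actual` are iterables of (row, col) grid cells."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)  # cells predicted and actually traversed
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall
```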
The adaptability of the model is an important feature that enables it to be widely used in various contexts, especially crisis situations such as flooding, hurricanes, wildfires, and pandemics. The major changes in the model across contexts are the destinations and the immediate rewards that agents face when deciding on actions. Therefore, in this section, we discuss the steps and strategies to update the destination selection and the reward table. The detailed steps are shown below: • Step 1: sampling the origins based on the historical data (or the density of population);
• Step 2: predicting the destinations corresponding to each sampled origin using the well-trained first module of the model. In particular contexts, such as crises, some destinations might have been damaged or closed. Based on the observed situation, these destinations must be removed from the results and only reachable destinations retained. • Step 3: generating the situation matrix $F$ to represent the situation between the origin and destination and updating the transformed reward table with $F$. Take a flooding event as an example: the value of each cell $f_{ij}$ in the situation matrix $F$ can represent the extent of flooding in that cell, such as the number of flooded road segments or the flooded area of the cell. Then, scaled by a parameter $1/\beta$ to normalize the values, the situation matrix is subtracted from the transformed reward table, following the formula shown below:

$R_t = R' + L + D - F/\beta$   (25)

where the value of $\beta$ can be selected based on the contribution of the situation to the trajectory selection of the agents. • Step 4: predicting the trajectory for the given origin and destination using the updated reward table and the prioritized deep Q-network. • Step 5: augmenting the prediction results. It is important to note that the model simulates only the trajectory for each origin and destination pair, rather than directly predicting the number of vehicles on the road. The model, however, has the capability to simulate traffic conditions on the road based on the simulated trajectories. In this step, the algorithm first simulates the trajectory for an origin and destination pair 50 times. Then the algorithm selects the cells with high pass-through probability.
We use the number of times, $n_{ij}$, that the cell is included in the predicted trajectory as the weight of the cell, $v_{ij}$; all other cells are assigned zero, as in the equation shown below:

$v^{(k)}_{ij} = \begin{cases} n^{(k)}_{ij} & \text{if the cell is selected} \\ 0 & \text{otherwise} \end{cases}$   (26)

Each pair of origin and destination has one vehicle matrix, $V^{(k)}$, that encompasses the entire grid world. Since a cell is likely to lie on multiple trajectories for different pairs of origins and destinations, we use $c_{ij} = \sum_{k=1}^{N} v^{(k)}_{ij}$ to represent the number of vehicles passing through the cell across all pairs, where $N$ is the number of simulated pairs of origins and destinations.

Through the implementation of all these steps, we arrive at a final traffic matrix $C$ representing the number of vehicles simulated in the cells of the grid world. This matrix not only shows the common trajectories that people would choose to reach their destinations, but can also inform emergency response and resource allocation based on perturbations in mobility and traffic patterns in the context of crises. The capabilities of the model in simulating urban mobility during crises are illustrated in Section 4.4.

To demonstrate the performance and capabilities of the proposed model, we first introduce the data sets and then elaborate a specific use case of the model with an application to flooding impact analysis in the Houston metropolitan area in the context of Hurricane Harvey in 2017.
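Steps 3 and 5 above can be sketched as follows; the data layout (one list of simulated runs per O-D pair) and the `min_count` threshold standing in for "high pass-through probability" are our assumptions.

```python
import numpy as np

def perturbed_reward_table(R_norm, L, D, F, beta=10.0):
    """Step 3: subtract the scaled situation matrix (e.g., flooded-segment
    counts per cell) from the transformed reward table (Eq. 25)."""
    return R_norm + L + D - F / beta

def traffic_matrix(simulated_trajectories, grid_shape, min_count=1):
    """Step 5: count pass-throughs per cell over repeated simulations of
    each O-D pair, zero out rarely visited cells, and sum across pairs
    (Eq. 26 and c_ij = sum_k v^(k)_ij)."""
    C = np.zeros(grid_shape, dtype=int)
    for runs in simulated_trajectories:   # one list of runs per O-D pair
        V = np.zeros(grid_shape, dtype=int)
        for trajectory in runs:           # e.g., 50 simulated trajectories
            for i, j in trajectory:
                V[i, j] += 1
        V[V < min_count] = 0              # keep only high-probability cells
        C += V
    return C
```

The resulting matrix `C` plays the role of the final traffic matrix described above.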
The data set used to train, tune, and evaluate the adaptive reinforcement learning model comes from INRIX, a private location intelligence company providing location-based data and analytics. The reasons we employ these data are threefold. First, the INRIX data provide very detailed coordinates along the trajectory of the vehicles; INRIX collected the coordinates of the vehicles every few seconds, so the temporal and spatial resolution of the data is high. Second, the INRIX data are collected over the entire Houston metropolitan area over a time period of two months (March through April 2017) at all times of the day. This yielded a dataset of more than 26 million trip records collected in a continuous timeframe. Third, the INRIX data contain the trips of hundreds of providers (defined in Section 3.1) with four vehicle types. The vast volume of trips and the diversity of vehicle types allow us to capture various activity patterns and to enhance the robustness of the proposed approach in learning and simulating large-scale urban mobility.

Figure 2: The performance of the model for destination prediction in four example providers. The average RMSEs for providers 1 through 4 are 0.0288, 0.0701, 0.0780, and 0.0824, respectively.

To prepare the data for training, we selected providers with large record sizes and discarded roughly 20% of the trips, those that were too short (distance between origin and destination less than 5 miles) and might induce noise into the training process. Then, we randomly selected 80% of the trips as the training set and used the remaining 20% as the testing set.
The first task is to predict the destinations, given the origins. We mainly train the model to automatically find an optimal value for the number of neighbors, $k$, which indicates how many neighbors should be included when estimating the coordinates of the destinations. Since we have a large number of providers of different types, we trained the model on the dataset from each provider separately.

Figure 2 shows the prediction performance of the models for four example providers. As shown in the figure, the RMSE starts from a very small value, about 0.005, and grows gradually to 0.1 by the time 80% of the O-D pairs are predicted. Only 20% of the O-D pairs have an RMSE greater than 0.1. This result indicates that the model can accurately predict the destinations for given origins based on the parameters learned from training data. Since we filtered out short-distance trips, the lengths of the remaining trips in the training and testing data are relatively long. By measuring the distances between the actual and the predicted destinations, we find that a few predicted destinations are less than 0.3 miles from the actual destination, about 10% of the predicted destinations are less than 0.6 miles from the actual destination, and more than 50% of the predicted destinations are within 3 miles of the actual destination. Considering the length of the trips themselves, the differences between the actual and predicted destinations are relatively small. Hence, the accuracy of the results is acceptable. Figure 3 further shows the spatial accuracy of some example O-D pairs, confirming the high performance of the model. Comparing the results among all providers, we also find that the model is quite stable with respect to prediction performance and is not affected by the type of provider. This result demonstrates that the model is robust and generalizable enough to be applied to different data providers for destination prediction.
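The destination-prediction module can be sketched as a k-nearest-neighbor estimate over historical trips. This simplified version, with plain Euclidean distances over coordinates and a mean over the neighbors' destinations, is our reading of the module, not its exact implementation.

```python
import numpy as np

def knn_predict_destination(origins, destinations, query_origin, k=3):
    """Estimate a destination for a query origin as the mean destination
    of the k historical trips whose origins are nearest to it."""
    origins = np.asarray(origins, dtype=float)
    destinations = np.asarray(destinations, dtype=float)
    dists = np.linalg.norm(origins - np.asarray(query_origin, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]       # indices of the k closest origins
    return destinations[nearest].mean(axis=0)
```

In the paper, $k$ itself is tuned per provider; here it is simply a parameter.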
Once we acquire the destinations for given origins, we further use them to predict the optimal trajectories between the origins and destinations. The first step is to compute the reward table for each O-D pair. As discussed in the methodology section, we include only the cells within the bounding box, which is drawn slightly larger than the box tightly covering the origin and destination. Figure 4 shows examples of reward tables for some O-D pairs.

Figure 3: Example origin-destination pairs showing the performance of the model. (The blue polygon is the borderline of Harris County, Texas, USA.)

Figure 4: Example reward tables for origin and destination pairs, learned from training data using reinforcement learning.

In the majority of the example tables, such as R1, R2, and R5, some significantly highlighted cells have high reward values, and the agent can effectively determine the optimal trajectory from the origin to the destination. Although the historical data do not have a significant footprint for some O-D pairs, such as R3 and R4, the model can still drive the agent to find an optimal trajectory.

A quantitative assessment of the model performance is shown in Figure 5. We tested the model for multiple providers and O-D pairs. As the results suggest, the majority of the predicted trajectories are similar to the actual trajectories selected by the vehicles. Both average precision and recall are fairly high (although they are negatively affected by some extreme values). Figure 6 shows some examples of actual trajectories, predicted trajectories, and their overlaps. Despite the complexity of the historical trajectories transited by the vehicles, the model can still identify optimal trajectories for each pair of origin and destination. The performance of the model is also stable when it is trained and tested on different types of providers and datasets. These results and findings indicate that the proposed model is robust and can be used for trajectory-finding tasks.

Figure 5: Performance of the trajectory prediction using module 2 of the proposed model (average precision: 0.765; average recall: 0.766).
Figure 6: Example trajectories showing the performance of the model. (The blue polygon is the borderline of Harris County.)
To illustrate the application and capabilities of the proposed adaptive reinforcement learning model, we implemented it for the situation during Hurricane Harvey in late August 2017 in the Houston metropolitan area. Hurricane Harvey, a Category 4 tropical storm, made landfall on August 26, 2017, and dissipated inland on August 30, 2017. August 27 brought extreme and sustained rainfall that subsequently caused large-scale flooding over the urban areas, especially on urban road networks. As reported in news articles and government reports, more than 115,000 buildings were damaged, and 290 roads and highways were flooded [53, 54]. To illustrate the performance of the model, we selected a specific short time interval, 10:00 a.m. through 12:00 p.m., August 29, 2017. We simulated vehicles that would move during this time interval and estimated their trajectories to understand the traffic conditions under the perturbation of urban flooding. Here, we measured the traffic conditions in different locations by the number of vehicles that move across a cell.

To be consistent, the data set used to validate the capability of the proposed model for this scenario application was also obtained from INRIX. For each road segment, the dataset includes the average speed in 5-minute intervals and the speed in free-flow conditions (estimated as the speed limit for the road). The flood situation on the road segments can also be inferred from the INRIX data: in the INRIX average speed dataset, flooded road segments are identified by a designation of NULL for the traffic speed, since no vehicles drive through flooded road segments and thus no traffic records exist. To test the accuracy and validity of the data for identifying flooded roads, we compared the NULL records between the datasets collected before and during Hurricane Harvey, and also checked the records against flooded roads in government reports [6].
We found that the NULL records were present only during the flooding period and were consistent with the flooded locations. Using this dataset, we identified the flooded road segments and quantified the extent of flooding in each cell. In the grid map for Houston, Harris County, each cell contains multiple road segments. To measure the extent of flooding in a specific cell, we considered the number of flooded road segments within the cell as the metric. As shown in Figure 7(a), the flooded cells are concentrated in the central area of Houston and along major roads such as highways.

Figure 7: The simulation results. (a) Number of flooded segments in Houston, Harris County (the blue borderline); (b) Actual average speed for major road segments during the investigated time period; (c) Predicted density of vehicles during the investigated time period; and (d) Relationships among three variables for simulation validation.

The traffic conditions in a cell can be captured by the actual traffic speed on the road segments. Since our model is mainly learned from vehicles driving on major roads, we take into account only the road segments on which the speed limit is equal to or greater than 50 km/hour, so that all simulated results are on the same road basis. This step also helps eliminate the effect of speed limits (road types) on the results. Hence, we computed the average of the actual speed on these selected road segments as the traffic-condition metric for each cell. Figure 7(b) shows areas where traffic was quite heavy due to flooding.

To understand the impact of flooding on population movement patterns, especially trajectory selection, we implemented the proposed adaptive reinforcement learning model, adopting the flood conditions to adjust the parameters in the model. Cells with more flooded road segments have larger negative values added to the reward table; accordingly, the agent is less likely to transit a severely flooded cell to reach its destination. The simulated results are shown in Figure 7(c) and (d). From these results, we found that the majority of the road segments hit hardest by the flood had fewer vehicles compared to road segments where flooding was less severe (i.e., passable roads). This result indicates that the model effectively simulated the trajectories selected by people who deliberately avoided flooded road segments.
Hence, the road segments that were less flooded tended to have either a greater number of vehicles at low speed or fewer vehicles at high speed. The results are quite realistic, revealing the redistribution of vehicles and the utility of roads in urban road networks when flooding disrupted the area.
In this paper, we presented an adaptive reinforcement learning model that can simulate human mobility under crisis conditions based on mobility patterns learned from historical data collected under normal circumstances. The proposed model can overcome the scarcity of mobility pattern data in crisis contexts and has the capability to simulate different crisis situations for impact analysis. The study showed the application of the proposed model using digital trace data collected in Houston during regular situations and demonstrated the adaptability of the model in the case of urban flooding during Hurricane Harvey in 2017.

The presented model makes multiple contributions. First, the proposed simulation model and its adaptability to different crisis contexts can contribute to hazard mitigation and resilience improvement of urban areas. Predictive emergency warning and situational awareness are key components of hazard mitigation, which would benefit from effective simulation of crisis situations. Simulating urban mobility during crises is an essential step to capture the crisis situation and further inform early warning and mitigation planning. For example, using the proposed simulation model, city planners, emergency managers, and decision-makers can simulate scenarios of flooding (such as 100-year and 500-year flood scenarios) and road inundation to examine the effects on movement trajectories and traffic congestion. This output information could identify areas that would experience significant mobility disruptions and traffic congestion under varying flood scenarios. With sufficient examination of such contingencies, first responders and decision-makers can effectively assess the extent to which reducing the flood vulnerability of a major road or highway would decrease traffic congestion and improve movement trajectories.
Also, the proposed model could be used to simulate mobility disruptions between critical origins (such as areas where vulnerable populations, including the elderly and low-income households, are located) and critical destinations (such as healthcare facilities). In addition, the outcomes of the proposed model can support near-real-time prediction of mobility disruptions and traffic congestion based on real-time information, such as inundated roads, to improve the accessibility of critical facilities and the efficiency of responder dispatch.

Second, the presented study offers a new perspective on using deep learning techniques and digital trace data for urban mobility analysis. Although extremely large volumes of data related to human digital traces are generated by the daily mobility activities of individuals, existing deep learning techniques are limited to the prediction of regular mobility patterns. The proposed model enables consideration of contextual factors, such as road conditions, road network connectivity, and population distribution, in the simulation process. This significantly extends the capabilities of deep learning models trained on normal situations when applied to unfamiliar conditions, such as emergencies and traffic anomalies, to support decision-making and response strategies. For example, one can learn the mobility demand of the road networks from daily mobility data and predict the utility of each road segment by changing the layout of the networks. This can help design efficient transportation networks and traffic control systems to improve the performance of urban road networks.

Finally, although the proposed model is general and can adapt to various phenomena and applications, future research could address some limitations.
First, although this study achieves good performance in predicting destinations and trajectories at a relatively fine-grained scale, the prediction of local movements, such as visits to nearby neighborhoods and points of interest, still needs to be improved. Local movements provide essential information about the lifestyles of residents. This study predicts the raw coordinates of destinations and trajectories regardless of the specific locations visited and the roads traversed. Including this information when creating the movement profiles of populations and predicting local movements would be of interest and importance for applications in city and emergency planning. Second, the model has great potential to be used for rapid prediction, but implementing the proposed algorithm is still time-consuming and computationally costly. Hence, there is a need to improve the efficiency of the algorithm so that it can be used for more real-time prediction and across more cities and crisis contexts.
Acknowledgement
This material is based in part upon work supported by the National Science Foundation under Grant Numbers CMMI-1846069 (CAREER) and CMMI-1832662 (CRISP 2.0 Type 2), the National Academies' Gulf Research Program Early-Career Research Fellowship, and the Amazon Web Services (AWS) Machine Learning Award. The authors also would like to thank INRIX for providing the data for this research. Jan Gerston provided editorial services. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or Amazon Web Services.
Data availability
The data that support the findings of this study are available from INRIX and TTI (Texas A&M Transportation Institute), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.

Competing interests
The authors declare that they have no competing interests.
Author contribution
C.F., X.J., and A.M. designed the study. X.J. and C.F. implemented the method and empirical case study. A.M. supported data acquisition. C.F. and A.M. wrote the main manuscript. All authors reviewed the manuscript.
References

[1] L Zhu, F R Yu, Y Wang, B Ning, and T Tang. Big Data Analytics in Intelligent Transportation Systems: A Survey. IEEE Transactions on Intelligent Transportation Systems, 20(1):383–398, 2019.
[2] L Liu, Z Qiu, G Li, Q Wang, W Ouyang, and L Lin. Contextualized Spatial–Temporal Network for Taxi Origin-Destination Demand Prediction. IEEE Transactions on Intelligent Transportation Systems, 20(10):3875–3887, 2019.
[3] Arif Mohaimin Sadri, Samiul Hasan, Satish V Ukkusuri, and Manuel Cebrian. Exploring network properties of social media interactions and activities during Hurricane Sandy. Transportation Research Interdisciplinary Perspectives, 6:100143, 2020.
[4] Zengxiang Lei, Xinwu Qian, and Satish V Ukkusuri. Efficient proactive vehicle relocation for on-demand mobility service with recurrent neural networks. Transportation Research Part C: Emerging Technologies, 117:102678, 2020.
[5] Chao Fan, Xiangqi Jiang, and Ali Mostafavi. A Network Percolation-based Contagion Model of Flood Propagation and Recession in Urban Road Networks. Scientific Reports, 10(13481):1–12, August 2020.
[6] Shangjia Dong, Tianbo Yu, Hamed Farahmand, and Ali Mostafavi. Probabilistic Modeling of Cascading Failure Risk in Interdependent Channel and Road Networks in Urban Flooding. Sustainable Cities and Society, Forthcoming, 2020.
[7] Simon Oh, Ravi Seshadri, Carlos Lima Azevedo, Nishant Kumar, Kakali Basak, and Moshe Ben-Akiva. Assessing the impacts of automated mobility-on-demand through agent-based simulation: A study of Singapore. Transportation Research Part A: Policy and Practice, 138:367–388, 2020.
[8] Weiping Wang, Saini Yang, Jianxi Gao, Fuyu Hu, Wanyi Zhao, and H Eugene Stanley. An Integrated Approach for Assessing the Impact of Large-Scale Future Floods on a Highway Transport System. Risk Analysis, June 2020.
[9] Luis E Olmos, Serdar Çolak, Sajjad Shafiei, Meead Saberi, and Marta C González. Macroscopic dynamics and the collapse of urban traffic. Proceedings of the National Academy of Sciences, 115(50):12654–12661, December 2018.
[10] Xiao-Yong Yan, Wen-Xu Wang, Zi-You Gao, and Ying-Cheng Lai. Universal model of individual and population mobility on diverse spatial scales. Nature Communications, 8(1):1639, 2017.
[11] Ryuichi Kitamura, Cynthia Chen, Ram M Pendyala, and Ravi Narayanan. Micro-simulation of daily activity-travel patterns for travel demand forecasting. Transportation, 27(1):25–51, 2000.
[12] Marta C González, César A Hidalgo, and Albert-László Barabási. Understanding individual human mobility patterns. Nature, 453(7196):779–782, 2008.
[13] Chaoming Song, Zehui Qu, Nicholas Blumm, and Albert-László Barabási. Limits of Predictability in Human Mobility. Science, 327(5968):1018–1021, February 2010.
[14] Chaoming Song, Tal Koren, Pu Wang, and Albert-László Barabási. Modelling the scaling properties of human mobility. Nature Physics, 6(10):818–823, 2010.
[15] Y Tang, N Cheng, W Wu, M Wang, Y Dai, and X Shen. Delay-Minimization Routing for Heterogeneous VANETs With Machine Learning Based Mobility Prediction. IEEE Transactions on Vehicular Technology, 68(4):3967–3979, 2019.
[16] J Li, X Mei, D Prokhorov, and D Tao. Deep Neural Network for Structural Prediction and Lane Detection in Traffic Scene. IEEE Transactions on Neural Networks and Learning Systems, 28(3):690–703, 2017.
[17] Shenhao Wang, Baichuan Mo, and Jinhua Zhao. Deep neural networks for choice analysis: Architecture design with alternative-specific utility functions. Transportation Research Part C: Emerging Technologies, 112:234–251, 2020.
[18] Mohammadreza Khajeh Hosseini and Alireza Talebpour. Traffic Prediction using Time-Space Diagram: A Convolutional Neural Network Approach. Transportation Research Record, 2673(7):425–435, April 2019.
[19] Yuxuan Liang, Kun Ouyang, Yiwei Wang, Ye Liu, Junbo Zhang, Yu Zheng, and David S Rosenblum. Revisiting Convolutional Neural Networks for Citywide Crowd Flow Analytics, 2020.
[20] Jiawei Wang and Lijun Sun. Dynamic holding control to avoid bus bunching: A multi-agent deep reinforcement learning framework. Transportation Research Part C: Emerging Technologies, 116:102661, 2020.
[21] Eunjoon Cho, Seth A Myers, and Jure Leskovec. Friendship and Mobility: User Movement in Location-Based Social Networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 1082–1090, New York, NY, USA, 2011. Association for Computing Machinery.
[22] Fang Zong, Yongda Tian, Yanan He, Jinjun Tang, and Jianyu Lv. Trip destination prediction based on multi-day GPS data. Physica A: Statistical Mechanics and its Applications, 515:258–269, 2019.
[23] Pierre Deville, Catherine Linard, Samuel Martin, Marius Gilbert, Forrest R Stevens, Andrea E Gaughan, Vincent D Blondel, and Andrew J Tatem. Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences, 111(45):15888–15893, November 2014.
[24] X. Lu, L. Bengtsson, and P. Holme. Predictability of population displacement after the 2010 Haiti earthquake. Proceedings of the National Academy of Sciences, 109(29):11576–11581, 2012.
[25] Qi Wang. Human Mobility Perturbation and Resilience in Natural Disasters.
Thesis , 2015.[26] Shangjia Dong, Haizhong Wang, Ali Mostafavi, and Jianxi Gao. Robust component: a robust-ness measure that incorporates access to critical facilities under disruptions.
Journal of TheRoyal Society Interface , 16(157):20190149, aug 2019.[27] Qi Wang and John E Taylor. Patterns and Limitations of Urban Human Mobility Resilienceunder the Influence of Multiple Types of Natural Disaster.
PLOS ONE , 11(1):e0147299, jan2016.[28] Fan Chao, Jiang Yucheng, and Mostafavi Ali. Social Sensing in Disaster City Digital Twin:Integrated Textual–Visual–Geo Framework for Situational Awareness during Built EnvironmentDisruptions.
Journal of Management in Engineering , 36(3):4020002, may 2020.[29] Zhan Zhao, Haris N Koutsopoulos, and Jinhua Zhao. Individual mobility prediction using transitsmart card data.
Transportation Research Part C: Emerging Technologies , 89:19–34, 2018.[30] Anastasios Noulas, Salvatore Scellato, Renaud Lambiotte, Massimiliano Pontil, and CeciliaMascolo. A tale of many cities: Universal patterns in human urban mobility.
PLoS ONE , 7(5),may 2012. 1831] Xiaolei Ma, Yao-Jan Wu, Yinhai Wang, Feng Chen, and Jianfeng Liu. Mining smart card datafor transit riders’ travel patterns.
Transportation Research Part C: Emerging Technologies ,36:1–12, 2013.[32] Ka Kee Alfred Chu and Robert Chapleau. Augmenting Transit Trip Characterization and TravelBehavior Comprehension: Multiday Location-Stamped Smart Card Transactions.
Transporta-tion Research Record , 2183(1):29–40, jan 2010.[33] R Du, P Santi, M Xiao, A V Vasilakos, and C Fischione. The Sensable City: A Survey on theDeployment and Management for Smart City Monitoring.
IEEE Communications Surveys &Tutorials , 21(2):1533–1560, 2019.[34] Chen Cheng, Haiqin Yang, Michael R Lyu, and Irwin King. Where You Like to Go Next:Successive Point-of-Interest Recommendation.
International Joint Conference on ArtificialIntelligence; Twenty-Third International Joint Conference on Artificial Intelligence , 2013.[35] Hongjian Wang and Zhenhui Li. Region Representation Learning via Mobility Flow. In
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management ,CIKM ’17, pages 237–246, New York, NY, USA, 2017. Association for Computing Machinery.[36] Toru Shimizu, Takahiro Yabe, and Kota Tsubouchi. Learning Fine Grained Place Embeddingswith Spatial Hierarchy from Human Mobility Trajectories, 2020.[37] Zhan Zhao, Haris N Koutsopoulos, and Jinhua Zhao. Detecting pattern changes in individualtravel behavior: A Bayesian approach.
Transportation Research Part B: Methodological ,112:73–88, 2018.[38] Sébastien Gambs, Marc-Olivier Killijian, and Miguel Núñez del Prado Cortez. Next PlacePrediction Using Mobility Markov Chains. In
Proceedings of the First Workshop on Measure-ment, Privacy, and Mobility , MPM ’12, New York, NY, USA, 2012. Association for ComputingMachinery.[39] Akinori Asahara, Kishiko Maruyama, Akiko Sato, and Kouichi Seto. Pedestrian-MovementPrediction Based on Mixed Markov-Chain Model. In
Proceedings of the 19th ACM SIGSPATIALInternational Conference on Advances in Geographic Information Systems , GIS ’11, pages25–33, New York, NY, USA, 2011. Association for Computing Machinery.[40] Wesley Mathew, Ruben Raposo, and Bruno Martins. Predicting Future Locations with HiddenMarkov Models. In
Proceedings of the 2012 ACM Conference on Ubiquitous Computing , Ubi-Comp ’12, pages 911–918, New York, NY, USA, 2012. Association for Computing Machinery.[41] S Qiao, N Han, W Zhu, and L A Gutierrez. TraPlan: An Effective Three-in-One Trajectory-Prediction Model in Transportation Networks.
IEEE Transactions on Intelligent TransportationSystems , 16(3):1188–1198, 2015.[42] K Tang, S Chen, and Z Liu. Citywide Spatial-Temporal Travel Time Estimation Using Big andSparse Trajectories.
IEEE Transactions on Intelligent Transportation Systems , 19(12):4023–4034, 2018.[43] S Dabiri, C Lu, K Heaslip, and C K Reddy. Semi-Supervised Deep Learning Approachfor Transportation Mode Identification Using GPS Trajectory Data.
IEEE Transactions onKnowledge and Data Engineering , 32(5):1010–1023, 2020.[44] Ling-Yin Wei, Yu Zheng, and Wen-Chih Peng. Constructing Popular Routes from UncertainTrajectories. In
Proceedings of the 18th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining , KDD ’12, pages 195–203, New York, NY, USA, 2012. Associationfor Computing Machinery.[45] Lin Yang, Mei-Po Kwan, Xiaofang Pan, Bo Wan, and Shunping Zhou. Scalable space-timetrajectory cube for path-finding: A study using big taxi trajectory data.
Transportation ResearchPart B: Methodological , 101:1–27, 2017. 1946] Lauren Alexander, Shan Jiang, Mikel Murga, and Marta C González. Origin–destination tripsby purpose and time of day inferred from mobile phone data.
Transportation Research Part C:Emerging Technologies , 58:240–250, 2015.[47] Shan Jiang, Yingxiang Yang, Siddharth Gupta, Daniele Veneziano, Shounak Athavale, andMarta C González. The TimeGeo modeling framework for urban mobility without travel surveys.
Proceedings of the National Academy of Sciences , 113(37):E5370 LP – E5378, sep 2016.[48] Carlos García. carlosbkm/car-destination-prediction, 2017.[49] Caiyan Jia, Yafang Li, Matthew B. Carson, Xiaoyang Wang, and Jian Yu. Node Attribute-enhanced Community Detection in Complex Networks.
Scientific Reports , 7(1):1–15, 2017.[50] T Chai and R R Draxler. Root mean square error (RMSE) or mean absolute error (MAE)? –Arguments against avoiding RMSE in the literature.
Geosci. Model Dev. , 7(3):1247–1250, jun2014.[51] Brian D Ziebart, Andrew Maas, J Andrew Bagnell, and Anind K Dey. Maximum EntropyInverse Reinforcement Learning. In
Proc. AAAI , pages 1433–1438, 2008.[52] Michael Buckland and Fredric Gey. The relationship between Recall and Precision.