SOUP: Spatial-Temporal Demand Forecasting and Competitive Supply via Graph Convolutional Networks

Bolong Zheng, Qi Hu, Lingfeng Ming, Jilin Hu, Lu Chen, Kai Zheng, Christian S. Jensen

Huazhong University of Science and Technology, Wuhan, China {bolongzheng, huqi11, lingfengming}@hust.edu.cn
Aalborg University, Aalborg, Denmark {hujilin, luchen, csj}@cs.aau.dk
University of Electronic Science and Technology of China, Chengdu, China [email protected]
Abstract—We consider a setting with an evolving set of requests for transportation from an origin to a destination before a deadline and a set of agents capable of servicing the requests. In this setting, an assignment authority is to assign agents to requests such that the average idle time of the agents is minimized. An example is the scheduling of taxis (agents) to meet incoming requests for trips while ensuring that the taxis are empty as little as possible. In this paper, we study the problem of spatial-temporal demand forecasting and competitive supply (SOUP). We address the problem in two steps. First, we build a granular model that provides spatial-temporal predictions of requests. Specifically, we propose a Spatial-Temporal Graph Convolutional Sequential Learning (ST-GCSL) algorithm that predicts the service requests across locations and time slots. Second, we provide means of routing agents to request origins while avoiding competition among the agents. In particular, we develop a demand-aware route planning (DROP) algorithm that considers both the spatial-temporal predictions and the supply-demand state. We report on extensive experiments with real-world and synthetic data that offer insight into the performance of the solution and show that it is capable of outperforming the state-of-the-art proposals.
Index Terms—Spatial-temporal request forecasting, graph convolutional networks, route planning
I. INTRODUCTION
The near-ubiquitous deployment of smartphones has enabled transportation network companies such as Didi Chuxing [1], Uber [3], and Lyft [2] to operate ride-hailing platforms that enable the servicing of transportation requests by means of fleets of drivers. In this setting, drivers accept requests and move to the origins of requests to complete the requests. Such platforms have reduced significantly the amounts of time drivers are idle and the amounts of time spent waiting for service by prospective passengers, thus improving the traffic efficiency of a city. In this setting, historical requests provide insight into the movement patterns of passengers and drivers, which is beneficial for many applications such as traffic demand prediction, supply and demand scheduling, and route planning.

We study the problem of spatial-temporal demand forecasting and competitive supply (SOUP), which consists of forecasting spatial-temporal service requests, as well as planning routes for agents to active requests in a manner that minimizes the average idle time of all agents. Besides drivers (agents) looking for passengers (requests), this kind of competitive assignment problem also occurs in other urban transportation settings, e.g., drivers looking for parking and drivers looking for electric charging stations.

Our focus is on a population of drivers servicing an evolving set of requests for transportation from an origin to a destination within a given time window. The drivers are often called taxis. Most existing proposals on crowdsourced taxis focus either on how to better match taxis with service requests to maximize global revenue [22], [44], [51], or on how to learn taxi and passenger movement patterns from trajectory data to guide route planning [10], [28], [30], [47]. Once a taxi drops off a passenger and completes a request, no further instructions are provided to the taxi to reduce the time it is idle before servicing the next request. Rather, taxis may either stay stationary or may move towards regions expected to have high demand, which may lead to competition. We aim to develop a data-driven solution that assigns a route to a taxi as soon as the taxi becomes idle such that the average time taxis are idle is minimized.

Overall, we address two sub-problems:
(i) Dynamic request availability patterns. In order to help agents service new requests quickly, we need to know the request availabilities across the road network of a city. We first carefully choose a spatial granularity for partitioning a road network and a temporal granularity for partitioning time in order to achieve accurate predictions of requests. We then build a corresponding model that predicts the availability of future requests.
(ii) Competition among agents. If all agents tend to move towards hot regions to find new requests, they will compete if the supply-demand ratio is high, which causes the so-called "herding" effect. To eliminate this effect, we develop a route planning strategy that assigns agents to destinations with supply-demand balance.
TABLE I
SUMMARY OF NOTATIONS

Notation        | Definition
G = (V, E)      | A road network
A = {a_i}       | A set of mobile agents
Ω = {ω_j}       | A set of requests
I_i = {I_ik}    | The set of idle times of agent a_i
R = {r_i}       | A set of regions on the road network
T = {t_i}       | A set of time slots during a day
r*              | The search route
G = (R, A)      | A region correlation graph
A ∈ R^{N×N}     | The adjacency matrix of graph G
C_in, C_out     | The numbers of channels for network input and output
D_i             | The request availability vector at time slot t_i
D̂_{t+1}         | The predicted request availability vector
E_i             | The context feature vector at time slot t_i
c_d             | The final context features
X_l, X_{l+1}    | The input and output of the l-th layer
Θ^l_{⋆G}        | The spectral kernel of graph convolution at the l-th layer
Γ^l_{∗τ}        | The temporal convolution kernel at the l-th layer

The route recommendation framework we propose consists of an offline and an online component. The offline component comprises an end-to-end deep learning model, called spatial-temporal graph convolutional sequential learning (ST-GCSL), that is capable of predicting request availabilities at different locations and times. The online component comprises a demand-aware route planning (DROP) algorithm that exploits both the available spatial-temporal information on requests and the supply-demand state to guide idle agents.

The major contributions are summarized as follows:
• We design a multi-level partitioning method that enables purposeful request availability prediction across space and time.
• We propose ST-GCSL to accurately predict the availability of requests.
• We develop DROP to assign routes that take into account both the available spatial-temporal request availability information and the supply-demand state.
• We report on experiments that suggest that the proposed ST-GCSL and DROP outperform their baseline methods.

The rest of the paper is organized as follows. We detail the problem addressed in Section II. In Section III, we present the multi-level partitioning method and the ST-GCSL algorithm. Section IV then presents the DROP algorithm. The paper's experimental study is covered in Section V, and related work is the topic of Section VI. Finally, Section VII concludes the paper.

II. PRELIMINARIES
We proceed to introduce the background settings and to formalize the SOUP problem. Frequently used notation is summarized in Table I.
A. Settings
The problem setting encompasses four types of entities: a road network, mobile agents (taxis), requests (passengers), and an assignment authority.
Definition 1 (Road Network). A road network is defined as a weighted directed graph G = (V, E, W), where V is the set of nodes, E is the set of edges, and W is the set of edge weights. Each edge e(u, v) ∈ E from node u to node v has a positive weight w(u, v) ∈ W, i.e., the travel time on the edge.

Definition 2 (Mobile Agent). We assume a population of mobile agents A = {a_i} that is introduced into the system at once at the beginning. Each a_i has an original location l_i on the road network. The population of agents is fixed throughout the operation and has cardinality |A|. Agents are initially labeled empty and travel along a so-called search route r* provided by our system.

Definition 3 (Request). We assume an evolving set of requests Ω = {ω_j} that are introduced into the system in a streaming fashion. Each request ω_j = (o, d, t_o, t*) has an origin o, a destination d, an introduction time t_o, and a maximum lifetime (MLT) t*. A request that is not serviced by t* time units after t_o is automatically removed from the system, an outcome that we call request expiration.

When a request enters the system, the agent that meets the following conditions is assigned to the request by the assignment authority:
(i) The agent is empty.
(ii) The agent is the agent that is closest to the request.
(iii) The shortest travel time from the agent to the request enables the agent to reach the request before it expires.
Once an agent is assigned to a request, the agent is labeled occupied, and the request is removed from the system. Then the agent moves to the request origin (for pick-up) and then to the request destination (for drop-off), both along the shortest-travel-time path in the road network. Once the agent arrives at the destination, it is labeled empty and travels along an assigned search route.
If the agent finishes traversing its search route without having been assigned a new request, it is assigned a new search route.

If no agent meets the above conditions, the request remains in the system until an agent meets the conditions or the request expires.

It is worth noting that an agent neither knows when and where requests will appear nor has any information about other agents.
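Under one reading of conditions (i)–(iii), the assignment authority picks the closest empty agent that can still reach the request before it expires. The following sketch illustrates this rule; the data layout and the names (`travel_time`, `t_star`) are illustrative assumptions, not the paper's implementation.

```python
import math

def assign_agent(request, agents, now, travel_time):
    """Pick the agent for a newly arrived request: the closest empty
    agent whose shortest travel time lets it arrive before expiration.
    `travel_time(agent, origin)` returns the shortest-travel-time;
    all field names are illustrative."""
    best, best_t = None, math.inf
    deadline = request["t_o"] + request["t_star"]  # expiration time (MLT)
    for a in agents:
        if not a["empty"]:
            continue  # condition (i): the agent must be empty
        t = travel_time(a, request["o"])
        if now + t > deadline:
            continue  # condition (iii): must arrive before expiration
        if t < best_t:  # condition (ii): closest qualifying agent
            best, best_t = a, t
    return best  # None if no agent qualifies; the request then waits
```

If no agent qualifies, the function returns `None`, mirroring the rule that the request stays in the system until an agent qualifies or the request expires.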
Definition 4 (Idle Time). The idle time of an agent is the amount of time from when the agent is labeled empty to when it is assigned to a request. An agent may experience multiple idle times in a day, each corresponding to traveling on a search route. We denote by I_i = {I_ik} the set of idle times of agent a_i's search routes, where I_ik is the idle time of a_i's k-th search route r*_ik.

In order to reduce the idle time when planning a search route for an idle agent, we build an accurate request data model that the assignment authority can use to make decisions. The data model is a software module that is shared with all agents and that is used to represent request availability patterns and predict future request availability.

Fig. 1. System Overview
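The quantity the assignment authority seeks to minimize, the average idle time of Eq. 1, is a simple aggregation over the per-agent idle-time sets I_i. A minimal sketch, assuming a dict-of-lists layout for illustration:

```python
def average_idle_time(idle_times):
    """Average idle time over all agents and all search routes (Eq. 1).

    idle_times: dict mapping an agent id to its list of idle durations
    I_ik, one per search route. The layout is illustrative, not from
    the paper's implementation."""
    total = sum(sum(ts) for ts in idle_times.values())   # sum of all I_ik
    count = sum(len(ts) for ts in idle_times.values())   # sum of |I_i|
    return total / count if count else 0.0
```

For example, two agents with idle-time sets {10, 20} and {30} yield an average idle time of 20.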
B. Problem Statement
The SOUP problem consists of two tasks: (1) spatial-temporal request forecasting; and (2) competitive spatial-temporal request search.

Task 1 (Spatial-Temporal Request Forecasting). Given a road network and a historical set of requests Ω, we aim to build a request data model to predict requests for different locations and times.

Task 2 (Competitive Spatial-Temporal Request Search). Given a road network, a request data model, a set of agents A with original locations, and a stream of requests Ω, we aim to plan a search route for each agent to travel when it becomes idle, such that the average idle time in Eq. 1 is minimized.

  (1 / Σ_{a_i∈A} |I_i|) · Σ_{a_i∈A} Σ_{I_ik∈I_i} I_ik    (1)

C. Framework Overview
To solve the above problem, we design a search route recommendation framework. Instead of considering the two tasks as independent modules, we integrate them into a unified framework, shown in Fig. 1. The processing pipeline includes offline and online components. For the former, we propose the ST-GCSL algorithm to train a request data model based on historical request data, i.e., taxi order data. The data model builds a multi-level partitioning of the road network and predicts future requests for the partitions. For the latter component, we propose the DROP algorithm that computes a search route for an agent based on the location and time at which it becomes idle and the supply-demand state in the near future.

Once an agent is assigned to a request, it travels to the request origin for pickup and to the request's destination for dropoff. Afterwards, the assignment authority provides the agent with a search route. We use an open source framework called COMpetitive SEarching Testbed (COMSET) as a simulator to implement the whole process. We limit our discussion to the forecasting and search processes, and we omit the details on the assignment of agents to requests.

Fig. 2. Multi-level partitioning in Manhattan: (a) first-level partitioning; (b) second-level partitioning

III. SPATIAL-TEMPORAL REQUEST FORECASTING
In order to solve Task 1, we design an end-to-end deep learning model, called ST-GCSL, to build the request data model for predicting where and when requests are likely to appear. First, we develop a multi-level partitioning method that partitions the space into a hierarchical structure. Then, we present the details of the proposed ST-GCSL.

A. Multi-level Partitioning
We partition the space into a three-level structure. For the first level, we partition the road network into regions based on administrative boundaries. Fig. 2(a) shows the first-level partitioning of Manhattan, where the boundary information can be found in [41]. The intuition is that the functionalities of these administrative regions differ, and the travel patterns of people in the same region are usually similar. For instance, the downtown districts are financial and business centers, and the upper district is a traditional wealthy district. With the first-level partitioning, our method can quickly filter out the most unpopular regions, thus reducing the search space.

For the second level, we adopt the partitioning method from STP [21] that obtains K subregions of a first-level region by applying the KMeans algorithm [31]. We find K cluster centers of all the request locations in the historical data and classify the vertices into K groups according to their closest centers. We thereby form the regions of the second-level partitioning, denoted as R = {r_1, r_2, ..., r_K}. Fig. 2(b) shows an example of a second-level partitioning of Manhattan. Through the second-level partitioning, we can further refine the popularity of regions in the first-level partitioning and find relatively more popular regions.

For the third level, we partition the edges (roads) in the second-level regions into different groups based on their starting vertices. Therefore, each third-level partition contains a vertex and several edges, and the vertex can be used as the destination of a search route after the third-level partition is determined.

https://github.com/Chessnl/COMSET-GISCUP

Fig. 3. Geographical and semantic neighbors: (a) geographical neighbor; (b) semantic neighbor

B. Region Correlation Graph
Before we introduce the details of ST-GCSL, we first explain how to build a region correlation graph based on the historical request data.

Let T = {t_1, t_2, ..., t_{|T|}} be the set of time slots, where the length of each time slot is 10 mins. Let D_{i,j} denote the request availability, i.e., the number of requests (taxi demands) that appear in region r_i at time slot t_j. We have

  D_{i,j} = |{ω | ω.o ∈ r_i ∧ ω.t_o ∈ t_j}|.    (2)

After processing the historical request data, we obtain a request availability sequence {D_1, D_2, ..., D_{|T|}}, where D_j ∈ R^K represents the request availabilities of all K regions at time slot t_j.

We follow previous studies [5], [39] to derive a region correlation graph G = (R, A), where R is the aforementioned second-level region set, and A is the adjacency matrix of G. Specifically, we consider two types of neighbors, i.e., geographical neighbors and semantic neighbors, based on whether two regions are geographically close or have similar request availability patterns.
• The geographical neighbor relation is based on the first law of geography [37], "near things are more related than distant things," and is used to extract spatial correlations between a region and its adjacent regions. As shown in Fig. 3(a), a region's geographical neighbors are its adjacent regions.
• The semantic neighbor relation is used to extract semantic correlations between regions with similar request availability patterns. To quantify the request availability similarity between regions, we use the Pearson correlation coefficient [5]. Let D_i represent the request availability sequence of region r_i in the training data. The semantic similarity between r_i and r_j is defined as follows:

  Sim(r_i, r_j) = Pearson(D_i, D_j).    (3)

We consider regions r_i and r_j to be semantic neighbors if Sim(r_i, r_j) > ε, where ε is a threshold that controls the number of semantic neighbors. As shown in Fig. 3(b), the two regions have similar request availability patterns, so they are semantic neighbors.
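The semantic-neighbor test of Eq. 3 can be sketched as follows, with a hand-rolled Pearson correlation; the function names and the pairwise-loop layout are illustrative, not the paper's implementation.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def semantic_neighbors(series, eps):
    """Return pairs (i, j) of regions whose request availability
    sequences D_i, D_j satisfy Sim(r_i, r_j) > eps (Eq. 3)."""
    pairs = []
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            if pearson(series[i], series[j]) > eps:
                pairs.append((i, j))
    return pairs
```

For K regions this is an O(K²) pairwise scan, which is affordable because the number of second-level regions is small.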
Fig. 4. The architecture of ST-GCSL
Therefore, the adjacency matrix A is calculated as follows:

  A_ij = 1 if r_i and r_j are neighbors, and A_ij = 0 otherwise.    (4)

C. ST-GCSL Model
The architecture of ST-GCSL is shown in Fig. 4. It consists of two parallel chains that process the historical request availability sequences and the contexts, respectively. The corresponding results are then concatenated and convoluted to return the forecasting result D̂_{t+1} ∈ R^N. In the upper chain, the historical request availability sequences {D_{t−h+1}, D_{t−h+2}, ..., D_t} are taken as inputs, which are then followed by two Spatial-Temporal Gate (ST-Gate) Blocks, each of which is composed of a Multiple Spatial-Temporal Convolutional Module (MSTCM) and a Spatial-Temporal Convolutional Module (STCM) in sequence. Meanwhile, the lower chain takes as input the context features {E_{t−h+1}, E_{t−h+2}, ..., E_t}, which are processed by Toeplitz Inverse Covariance-Based Clustering (TICC) [16] and a 2D convolution operation (Conv2d).

Spatial-Temporal Gate Block: The ST-Gate Block is constructed to process graph-structured historical request availability sequences and aims to capture the short- and long-term spatial-temporal correlations simultaneously. The short-term spatial-temporal correlation is captured via the MSTCM, which stacks several STCMs, where each STCM takes as input a short-term historical request availability sequence, i.e., m time steps, to extract spatial-temporal correlations. The long-term spatial-temporal correlation is maintained by a single STCM that takes as input the whole historical request availability sequence, i.e., h time steps.

Spatial-Temporal Convolutional Module.
Fig. 5(a) depicts the construction of the STCM, which consists of three operations: temporal convolution, Graph Convolution (GC), and a Gated Linear Unit (GLU) [12]. Given an input X_l ∈ R^{q×N×C_l}, where q is the number of time steps and C_l is the number of input features, and the underlying region correlation graph G, the input is first processed by a temporal convolution defined as follows:

  Γ^l_{∗τ} ∗ X_l = B,    (5)

where ∗ denotes the convolution operator and Γ^l_{∗τ} represents a total of 2 × C_f convolution filters at the l-th layer, each of which has kernel size k × 1, stride 1, and no padding. B ∈ R^{(q−(k−1))×N×2C_f} is the output of the temporal convolution, which is split equally into B_1 and B_2 along the feature dimension, i.e., B_1, B_2 ∈ R^{(q−(k−1))×N×C_f}.

Then, B_1 and B_2 are fed into two GC layers separately, which can be formulated as follows:

  X^{(k)}_{l+1} = Θ^l_{⋆G,k} ⋆ B_k = P̃^{−1/2} Ã P̃^{−1/2} B_k W_k,    (6)

where ⋆ denotes the graph convolution operator and Θ^l_{⋆G,k} represents the graph convolution filter at the l-th layer. Ã = I + A ∈ R^{N×N} is the adjacency matrix with self-loops, A is the adjacency matrix of graph G, and P̃ ∈ R^{N×N} is the degree matrix with P̃_ii = Σ_j Ã_ij and P̃_ij = 0 for all i ≠ j. B_k is the input and W_k holds the weight parameters, k ∈ {1, 2}.

Furthermore, we use a GLU to model the complex non-linearity in request forecasting, which is formulated as follows:

  X_{l+1} = [X^{(1)}_{l+1} + X_l] ⊗ σ[X^{(2)}_{l+1}],    (7)

where σ is the sigmoid function and ⊗ denotes the Hadamard product. The right half controls what information can be output. In the left half, a residual connection is utilized to avoid network degradation. Finally, we obtain the output X_{l+1} ∈ R^{(q−k+1)×N×C_{l+1}}.

Fig. 5. Spatial-Temporal Gate Block: (a) Spatial-Temporal Convolutional Module; (b) Multiple Spatial-Temporal Convolutional Module

Multiple Spatial-Temporal Convolutional Module.
Fig. 5(b) shows the architecture of the MSTCM, which stacks several STCMs along the time axis to capture the short-term spatial-temporal correlation. Each STCM takes as input m consecutive time steps, Y_i ∈ R^{m×N×C_l}, where C_l is the feature dimension and 1 ≤ i ≤ h − m + 1. According to Eqs. 5, 6, and 7, the operation of each STCM can be formulated as follows:

  S_i = [Θ^l_{⋆G,1} ⋆ (Γ^l_{∗τ,1} ∗ Y_i) + Y_i] ⊗ σ[Θ^l_{⋆G,2} ⋆ (Γ^l_{∗τ,2} ∗ Y_i)],    (8)

where S_i ∈ R^{(m−k+1)×N×C_{l+1}} is the output of the i-th STCM. For ease of presentation, we use m = k in all MSTCMs in this paper, such that S_i ∈ R^{1×N×C_{l+1}}.

Then, the output of the MSTCM can be represented as S = [S_1, ..., S_{h−m+1}], where [·, ·] is the concatenation operator. To improve the processing efficiency, these STCMs can be executed in parallel, such that the MSTCM captures the short-term spatial-temporal correlation of the entire input at one time.

Clustering Context Feature Sequence: Due to the strong correlation between the request availability pattern and the data periodicity, e.g., working days, weekends, or rainy days, we cluster the context features. When predicting the request availability, we first determine which cluster the current context features belong to and then add this cluster as an additional feature to improve the prediction, as shown in Fig. 4.

The context feature E_{t−i+1} at time step i is a five-tuple (time of day, day of week, weather, holiday, events), which is clustered via TICC. Thus, the context features of all h time steps can be encoded into a cluster feature vector c ∈ R^h. Then, c is further processed by a convolution operation with a total of h − k − 1 filters and kernel size h × 1, such that the output is c_d ∈ R^{(h−k−1)×1}. To enable concatenation with the output from the ST-Gate Block, c_d is duplicated into a tensor F ∈ R^{(h−k−1)×N×1}.

Finally, the outputs from the historical request availability sequences and contexts are concatenated and then convoluted to obtain the prediction of the request availability in the next time step, D̂_{t+1} ∈ R^N.

IV. DEMAND-AWARE ROUTE PLANNING
We proceed to present the details of the demand-aware route planning (DROP) algorithm. In the offline component, the request availability patterns of different regions and time slots are predicted by ST-GCSL. In the online component, we assign search routes to idle agents based on the forecasting results. Specifically, we first determine the current supply-demand state based on the real-time request forecasting result and compute the region weights according to the state. Then, we adopt a distance-aware random roulette wheel selection method to select a target region level by level. Finally, after determining the final destination, we take the shortest-travel-time path as the search route and send it to the agent.

Fig. 6. Distribution of idle agents (red dots) and active requests (blue dots): (a) 12:00, oversupply; (b) 18:00, supply-demand balance; (c) 21:00, undersupply
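The random roulette wheel selection used throughout DROP picks an item with probability proportional to its weight (cf. Eq. 12). A minimal sketch, assuming non-negative weights; the function name and signature are illustrative:

```python
import random

def roulette_select(items, weights, rng=None):
    """Roulette wheel selection: draw one item with probability
    proportional to its weight. Weights are assumed non-negative
    and not all zero; `rng` may be seeded for reproducibility."""
    rng = rng or random.Random()
    total = sum(weights)
    x = rng.random() * total          # uniform point on the wheel
    acc = 0.0
    for item, w in zip(items, weights):
        acc += w
        if x < acc:                   # landed in this item's slice
            return item
    return items[-1]                  # guard against round-off
```

Over many draws, an item with weight 3 is selected about three times as often as an item with weight 1, which is exactly the dispatch behavior the region weights are meant to induce.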
A. Supply-Demand Analysis
For better understanding, we add a visualization module to the COMSET simulator to analyze the real-time state of requests and idle agents. The road network is from Manhattan, New York City, USA. The New York TLC Trip Record YELLOW Data from June 1st, 2016 is used as simulation data, and we select three moments of the day for visual analysis. The results are shown in Fig. 6, where the red dots represent idle agents and the blue dots represent active requests. Figs. 6(a), 6(b), and 6(c) show three distinct supply-demand states, i.e., oversupply, supply-demand balance, and undersupply.

In Fig. 6(a), we observe that at 12:00, there are almost no active requests, while agents are mainly gathered in the midtown area. In this case, we need to spread agents across the network as much as possible to avoid the herding effect, such that the number of request expirations can be reduced. Fig. 6(b) shows that before the evening rush hour, the numbers of idle agents and active requests are small, since most requests and agents are assigned to each other. The idle agents are gathered in the uptown area, since it is the destination of most people after work. The agent distribution at this time is reasonable, since agents move to where they are requested; thus we aim to find a way to maintain this balanced state. Fig. 6(c) shows that the number of requests is much greater than the number of idle agents at 21:00, and that the request density in the midtown area is significantly higher than in other areas. Therefore, to make an improvement, agents need to be guided to the midtown area, rather than being scattered across the road network.

B. Supply-Demand State Determination
For the aforementioned three supply-demand states, we adopt different strategies based on the real-time supply-demand state to reduce the search time. Therefore, we first need to determine the current supply-demand state. Following [25], we use Δ as the time horizon that represents the length of the near future to be considered. Let t denote the current time, |A| the number of agents, and T̄ the average time for an agent to finish an order. Through analyzing the data, we find that T̄ does not change much from day to day and is always around 20 minutes, so we take it as a constant. We use S to represent the agent supply, which is estimated as follows:

  S = |A| / T̄    (9)

According to Eq. 9, S can be regarded as the maximum number of requests that can be served per minute. With ST-GCSL, we can predict the request availability (demand) from t to t + Δ. Let D̄ be the predicted average request availability per minute during this time period. We then determine the supply-demand state by the supply-demand ratio as follows:

  S / D̄ > α_1: oversupply
  S / D̄ < α_2: undersupply
  otherwise: supply-demand balance    (10)

where α_1 (α_1 > 1) and α_2 (α_2 < 1) are parameters that control the thresholds of the different supply-demand states. The values of α_1 and α_2 are tuned in the experiments for each dataset.

C. Region Weight Matrix
In order to choose a region from R as the destination, we need to define a weight for each region that captures its popularity. We therefore define a region weight matrix as follows.

Definition 5 (Region Weight Matrix). For the region weight, following the definition in [21], we let p_ij and d_ij denote the numbers of pick-up and drop-off events predicted to occur in region r_i during time slot t_j, respectively. Note that we build prediction models for both pick-up and drop-off events. We define a region weight matrix W of shape |R| × |T|, where each W_ij is calculated as follows:

  W_ij = p_ij − λ × d_ij  (λ < 1).    (11)

Obviously, the more pick-up events happen in a region, the more popular the region is. On the contrary, the more drop-off events happen in a region, the more agents compete in the region, thus making the region relatively unpopular. Therefore, we use a parameter λ to balance the positive effect of pick-up events and the negative effect of drop-off events.

Through the above analysis, it is easy to see that the value of λ is state-dependent. The intuition is that when the state is oversupply, the value of λ should be large enough to prevent agents from going to regions with many drop-off events, which may cause the herding effect. For the supply-demand balance state, the value of λ is relatively smaller than that for oversupply. In the undersupply state, we use the smallest value of λ, since the number of idle agents in each region is small and the herding effect may not happen. The optimal value of λ needs to be obtained through experiments.

Besides, for the undersupply state, we divide W_ij by the total road travel time in the region. This means that we take the request "density" into consideration and lead the agents to move to the regions with high request "density", which can further reduce the search time without worrying about the herding effect caused by competition. In the undersupply state, the demand far exceeds the agent supply, which means that most agents can be assigned to requests, and the search time for finding requests in dense regions is much smaller than that in sparse regions.

D. Planning Search Route

The process of planning a search route for an agent is equivalent to selecting a target region level by level. It is worth noting that the search strategy is state-dependent, i.e., the search strategies of the three supply-demand states differ. Algorithm 1 summarizes the planSearchRoute algorithm.

Algorithm 1: planSearchRoute
Input: Agent's current location l and timestamp t
Output: The search route r*
1: Sample a region from the first-level regions using roulette wheel selection;
2: Initialize an empty set C of candidate regions;
3: while |C| < n do
4:   Sample a second-level region by roulette wheel selection and add it to C;
5: end
6: foreach region r ∈ C do
7:   Re-compute its weight using Dist(l, r) and γ;
8: end
9: Sample a region r from C based on the weights;
10: if the S-D state is undersupply then
11:   Get the intersection with the largest weight from r as the destination;
12: else
13:   Use roulette wheel selection to select an intersection as the destination;
14: end
15: Get the shortest path from l to the destination as r*;

Selecting First-level Region: In order to dispatch agents according to the weight distribution of regions, we adopt the same strategy as STP [21] to select a first-level region. First, we obtain all the region weights at the current time slot t_j from the first-level region weight matrix W. Then we use random roulette wheel selection [40] to obtain a target region. For a region r_m, the probability of r_m being selected is

  Pr[r_m] = W_mj / Σ_i W_ij    (12)

Selecting Second-level Region: We apply two rounds of selection. The first round finds several popular regions in the first-level region, and the second round tends to select a region close to the agent's location to reduce traveling time.
(i) We select a set of n candidate regions from all the second-level regions of the selected first-level region by repeating random roulette wheel selection n times, and denote the set as C.
(ii) For each r ∈ C, we re-compute its weight by using Dist(a.l, r)^γ, where Dist(a.l, r) is the road network distance between the agent's current location a.l and the candidate region r ∈ C, and γ is a preference parameter of agents for neighboring regions, i.e., the smaller the value of γ, the more likely the agents are to choose nearby regions. Then we execute roulette wheel selection to obtain the destination region.

Selecting Third-level Region: As a third-level region consists of a vertex and its edges, selecting a third-level region amounts to selecting a destination. In order to allow agents to move to hotspots, we consider the current supply-demand state.
• If the current state is undersupply, we calculate the weight of each intersection and simply choose the intersection with the largest weight as the destination.
• Otherwise, in order to prevent competition, we use roulette wheel selection to select the destination.

V. EXPERIMENTAL STUDY

The ST-GCSL model is implemented in Python3 with TensorFlow, and the DROP algorithm is implemented in Java. The experiments are run on a Windows machine with an Intel 2.8GHz CPU and 16GB memory.

A. Experimental Settings

Dataset Description. We use two real datasets, a New York dataset and a Haikou dataset, and a synthetic dataset generated from the New York dataset, to evaluate our model. Table II summarizes the statistics of the datasets: the numbers of taxi requests, edges (roads), and nodes (intersections).
• New York: The New York dataset is a subset of the New York TLC Trip Record YELLOW Data, restricted to records whose pick-ups and drop-offs are within the Manhattan area; it contains the order records of yellow taxis. Each record includes the coordinates and times of the pick-up and drop-off events. We extract the data from January 2016 to June 2016 for evaluation. The related weather data is extracted from New York Central Park.
• Haikou: The Haikou dataset (https://outreach.didichuxing.com/app-vue/HaiKou?id=999) contains the taxi order data of Haikou city from May 1 to October 31, 2017, including the coordinates of origins and destinations, as well as the order type, the travel category, and the number of passengers. The related weather data (https://pan.baidu.com/share/init?surl=gj9rHC6Qe67IGEwxDRyIw) is extracted as context features.
• Synthetic Dataset: We construct a synthetic dataset from the New York dataset as follows. First, we take one day's data and count the number of orders per hour in each first-level partition; we then divide the number of orders by the number of intersections in each first-level partition, which gives the average number of orders per hour at each intersection. Finally, for each intersection, we take the intersection as the origin and randomly choose one of the intersections 3 to 20 minutes away from it as the destination. We take the average number of orders at the intersection as the arrival rate λ of a Poisson process and generate the pick-up times of the records accordingly.

TABLE II
DATASETS

Dataset               #Requests    #Edges  #Nodes
New York & Synthetic  69,406,526   9,542   4,360
Haikou                12,374,094   8,034   3,298

Methods for Comparison. For the request forecasting problem, we compare ST-GCSL with the following methods:
• HA: The historical average model treats the average number of requests of a region at a time slot as the predicted value.
• VAR [17]: The Vector Auto-Regression model is widely used to analyze multivariate time series data.
• LSTM [18]: Long Short-Term Memory is a typical time series model for forecasting problems.
• DCRNN [26]: The Diffusion Convolutional Recurrent Neural Network models the spatial-temporal dependency by integrating graph convolution into the gated recurrent unit.
• STGCN [46]: Spatio-Temporal Graph Convolutional Networks capture temporal dependencies and spatial correlations by using 2D convolutional networks and graph convolutional networks, respectively.
• STG2Seq [6]: The Spatial-Temporal Graph to Sequence Model uses multiple gated graph convolution modules with two attention mechanisms to capture spatio-temporal correlations.
• Graph WaveNet [42]: Graph WaveNet combines graph convolution with dilated causal convolution to capture spatial-temporal dependencies.

For a fair comparison, we use the same loss function in all models, defined as follows:

  Loss(θ) = ||D_{t+1} − D̂_{t+1}||

where D_{t+1} and D̂_{t+1} denote the real and the predicted request availability, respectively. We normalize the context feature data to [0, 1] using Min-Max normalization and then group the features into 10 clusters with the TICC algorithm; the request values are preprocessed by Mean-Std (z-score) normalization.

For the route planning problem, we compare DROP with three baselines:
• SmartAgent [25]: SmartAgent uses non-negative matrix factorization (NMF) to model and predict the spatio-temporal distributions of requests, and then chooses destinations using a greedy heuristic.
• TripBandAgent [9]: TripBandAgent optimizes the taxicab search strategy by using reinforcement learning (RL).
• STP [21]: Spatial-Temporal Partitioning divides the search space into regions and computes weights for planning routes.

Parameter Settings. We set the length of each time slot t to 10 minutes, so each day is evenly divided into 144 time slots. The length of the time horizon Δ is 40 minutes, i.e., the region weight is determined by the prediction results within the next 4 time slots.
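The mapping from the 40-minute horizon Δ onto region weights can be sketched as follows. The array name, its (num_slots, num_regions) shape, and the plain sum aggregation are assumptions for illustration; the paper's actual weighting scheme may differ:

```python
import numpy as np

SLOT_MINUTES = 10        # length of one time slot t
HORIZON_MINUTES = 40     # time horizon Delta
HORIZON_SLOTS = HORIZON_MINUTES // SLOT_MINUTES  # = 4 slots

def region_weights(predictions, current_slot):
    """Aggregate predicted request counts over the next HORIZON_SLOTS
    time slots to obtain one weight per region.

    predictions: array of shape (num_slots, num_regions)
    """
    window = predictions[current_slot + 1 : current_slot + 1 + HORIZON_SLOTS]
    return window.sum(axis=0)
```

Under this sketch, the weight of each region at slot t_j is simply the predicted demand summed over slots t_{j+1} .. t_{j+4}.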
The threshold ε that controls the number of semantic neighbors is set to 0.5. The values of α1 and α2 are 1.2 and 0.8, respectively. The batch size for stochastic gradient descent is 32, and the learning rate is 0.001. The numbers of filters of the 2D convolution and the graph convolution in the first ST-Gate block are 32 and 32, while those in the second ST-Gate block are 32 and 64, respectively. The dropout rate is set to 0.2. The parameter h is set to 10, while k and m are both equal to 3.

For the New York and synthetic datasets, the number of first-level regions is 4 and the number of second-level regions is 100. For the Haikou dataset, the number of first-level regions is 4 and the number of second-level regions is 45. On all datasets, we set the distance attenuation parameter γ to -0.2, -0.5, and -0.6 for the three states of undersupply, supply-demand balance, and oversupply, respectively.

Evaluation Metrics. For the request forecasting problem, we use three well-adopted metrics:
• Mean Absolute Percentage Error (MAPE).
• Mean Absolute Error (MAE).
• Root Mean Square Error (RMSE).
For the route planning problem, we use three evaluation metrics:
• The average agent idle time.
• The average request waiting time, i.e., the period from a request's introduction until its pick-up or expiration.
• The expiration percentage, i.e., the percentage of expired requests.
For the route planning problem, as the search time differences among the algorithms are small (usually within a few seconds), we use the improvement percentage over the Random Destination (RD) algorithm [43] as the metric in order to better compare performance. RD randomly chooses an intersection in the road network as the destination and then uses the shortest path between the current location and the destination as the search route.

B. Performance Evaluation

1) Request Forecasting: To evaluate the performance of ST-GCSL, we first compare ST-GCSL with baseline models. Then, we study the effect of different components of ST-GCSL.
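The three forecasting metrics can be sketched as follows. This is a generic sketch; in particular, how zero-demand entries are handled in MAPE is an assumption, as the paper does not specify it:

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-8):
    """Mean Absolute Percentage Error, in percent.
    eps guards against division by zero (an assumption; implementations
    often instead mask zero-demand entries)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float)))

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return np.sqrt(np.mean(diff ** 2))
```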
Furthermore, we compare our model with several popular models in multi-step prediction.

Note that, for the data of each month, we use the last 10 days for validation and testing (i.e., 5 days for validation and 5 days for testing) and the rest for training, using the same method throughout. The following results are obtained on the data of June 2016 for the New York dataset and of May 2017 for the Haikou dataset.

TABLE III
COMPARISON WITH DIFFERENT BASELINES

                New York                   Haikou
Method          MAPE(%)  MAE    RMSE      MAPE(%)  MAE    RMSE
HA              37.60    4.737  7.07      41.40    2.669  4.143
VAR             26.30    4.024  5.802     32.92    2.023  3.076
LSTM            23.77    3.944  5.756     31.70    2.002  3.041
DCRNN           20.94    3.752  5.570     31.87    2.010  3.084
STGCN           19.42    3.708  5.519     31.54    1.985  3.007
STG2Seq         19.79    3.725  5.532     30.88    1.995  3.042
Graph WaveNet   20.53    3.890  5.838     33.56    2.143  3.288
ST-GCSL         18.667   3.648  5.458     29.87    1.958  2.986

TABLE IV
COMPARISON WITH VARIANTS OF ST-GCSL

                    New York                   Haikou
Removed component   MAPE(%)  MAE    RMSE      MAPE(%)  MAE    RMSE
Context features    19.15    3.686  5.481     30.07    1.958  2.986
GLU                 19.12    3.671  5.461     30.58    1.975  2.991
STCM                19.04    3.664  5.465     30.65    1.973  2.974
MSTCM               19.07    3.669  5.458     30.14

Comparison with Baseline Models. Table III shows the test error comparison of the different methods for request forecasting. We have the following observations:
(i) The classical methods, HA and VAR, perform poorly. The reason is that they cannot model the non-linear spatial-temporal dependencies.
(ii) In general, the deep learning methods perform better. LSTM only takes the temporal dependencies into consideration, while DCRNN, STGCN, STG2Seq, and Graph WaveNet use two modules to model temporal dependencies and spatial correlations, respectively, so they outperform LSTM. It is worth noting that Graph WaveNet performs poorly on the Haikou dataset, which may be caused by data sparsity.
(iii) ST-GCSL achieves the best performance on all metrics on both real-world datasets.
Our method takes localized spatial-temporal correlations and long-term temporal dependencies into account and can capture temporal dependencies, spatial correlations, and spatial-temporal correlations simultaneously, while the other approaches ignore either temporal dependencies or spatial-temporal correlations to some extent.

Component Analysis. To further evaluate the effect of the different components of our model, we compare the following variants of ST-GCSL:
(i) Removing the context features;
(ii) Replacing GLU with the ReLU activation function;
(iii) Removing the STCM from the ST-Gate block;
(iv) Removing the MSTCM from the ST-Gate block.
The experimental results on both datasets are shown in Table IV. We make three observations. First, without context features, the model performs worse, since the context features contain useful information for prediction. Second, the model with GLU outperforms the one with the ReLU activation function. This is because the GLU module has twice the parameter size of ReLU, so it can capture more complex spatial-temporal correlations; moreover, the gate in GLU controls the output more effectively than ReLU. Third, removing the STCM or the MSTCM from the ST-Gate block discards some temporal information and spatial-temporal correlations: removing the STCM loses the long-term temporal dependencies, while removing the MSTCM misses localized spatial-temporal correlations. Although removing the MSTCM yields slightly better MAE and RMSE on the Haikou dataset, such small fluctuations are expected given the data sparsity. The results therefore verify the effectiveness of our network design.

[Fig. 7. Multi-step Prediction on New York Dataset: (a) MAPE, (b) MAE, (c) RMSE]
[Fig. 8. Multi-step Prediction on Haikou Dataset: (a) MAPE, (b) MAE, (c) RMSE]

Multi-step Prediction Comparison. ST-GCSL is able to conduct multi-step prediction, like DCRNN, STGCN, and STG2Seq.
We therefore compare ST-GCSL with these methods on request forecasting over the following 3 time steps. Fig. 7 and Fig. 8 present the experimental results on all metrics on both datasets. As the figures show, ST-GCSL performs better than these methods.

2) Route Planning: To evaluate the performance of DROP, we randomly select the data of two days from each month to form the test data set. Then, we compare the performance of all algorithms. Finally, we compare the algorithms while varying parameters such as the number of agents and the request's maximum life time (MLT).

Performance Overview. To compare all the algorithms under the default parameter settings, we report the idle time (s), waiting time (s), and expiration percentage (%) on all the datasets in Table V.

[TABLE V: Performance Overview — idle time (s), waiting time (s), and expiration percentage (%) of DROP, SmartAgent, TripBandAgent, STP, and RD on the New York, Synthetic, and Haikou datasets]

DROP is better than the other algorithms in terms of idle time and expiration percentage on all the datasets. Even in terms of waiting time, it is only slightly worse than STP on the New York and synthetic datasets. Moreover, we find that the improvement of DROP over RD is related to the distribution of requests. The spatial and temporal distribution of requests on the New York dataset is quite uneven, so DROP improves more over RD, whereas the distribution on the Haikou dataset is more uniform, so the improvement there is smaller. In short, the more skewed the spatial and temporal distribution of the requests, the better DROP performs.

Idle Time. We evaluate the average idle time of agents on all the datasets while varying the agent cardinality and the MLT. The results, shown in Fig. 9 and Fig. 10, show that DROP outperforms its competitors under all agent cardinalities and MLT values on all the datasets.
Compared to RD, DROP has the largest percentage improvement in idle time, dropping from about 6.3% to 2.9% on the New York dataset, from 11.8% to 2.4% on the synthetic dataset, and from 1.2% to 0.7% on the Haikou dataset as the agent cardinality increases. On all the datasets, DROP has the lowest average idle time regardless of the MLT.

Waiting Time. Fig. 11 and Fig. 12 show the average request waiting time of all algorithms as the agent cardinality and the MLT increase. The figures show that when the agent cardinality is small, the waiting time of DROP is not the smallest. For example, on the New York dataset, when the number of taxis is 5000 or 6000, STP has the shortest waiting time. However, as the number of taxis increases, DROP becomes significantly better than the other algorithms in waiting time. As we do not take the average waiting time as our primary evaluation metric, the performance of DROP is still good.

Expiration Percentage. The expiration percentages of all algorithms are shown in Fig. 13 and Fig. 14. The results show that DROP has the lowest expiration percentage among all algorithms in all cases. This is because DROP uses a multilevel partition-based method to dispatch agents such that the distribution of agents is consistent with the real-time distribution of requests, which reduces the expiration percentage.

[Fig. 9. Idle Time Improvement by Varying Agent Cardinality: (a) New York, (b) Synthetic, (c) Haikou]
[Fig. 10. Idle Time Improvement by Varying MLT: (a) New York, (b) Synthetic, (c) Haikou]
[Fig. 11. Waiting Time Improvement by Varying Agent Cardinality: (a) New York, (b) Synthetic, (c) Haikou]
[Fig. 12. Waiting Time Improvement by Varying MLT: (a) New York, (b) Synthetic, (c) Haikou]
[Fig. 13. Expiration Percentage Improvement by Varying Agent Cardinality: (a) New York, (b) Synthetic, (c) Haikou]
[Fig. 14. Expiration Percentage Improvement by Varying MLT: (a) New York, (b) Synthetic, (c) Haikou]

VI. RELATED WORK
A. Taxi Demand Prediction

Taxi demand prediction is a critical component of an efficient transportation system. LinUOTD [38] applies a unified linear regression model with high-dimensional features. One study [14] treats demand prediction as a classic time series problem using an RNN. ConvLSTM [32] combines CNN and RNN to model spatial and temporal correlations and extends fully-connected LSTM [18]. ST-ResNet [50] models the temporal closeness, period, and trend properties of crowd traffic based on [48], [49]. One study [27] conducts a systematic comparison of two recent deep neural networks [24], [48] for taxi demand prediction. DMVST-Net [45] employs graph embedding as external features to improve forecast accuracy based on localized spatial and temporal views. However, all these CNN-based methods only model Euclidean correlations among grid regions, whereas GCNs can extract local features from non-Euclidean structures, which explains their increasing popularity. One study [20] predicts travel cost, while GCWC [11] imputes missing stochastic speed weights via GCN; their techniques can be applied to demand prediction. GEML [39] formulates origin-destination matrix prediction via GCN. DCRNN [26] models the spatial-temporal dependency by integrating graph convolution into the gated recurrent unit. STGCN [46] captures temporal dependencies and spatial correlations by using 2D convolutional networks and graph convolutional networks. STG2Seq [6] uses multiple gated graph convolution modules with two attention mechanisms to capture spatio-temporal correlations, while Graph WaveNet [42] combines graph convolution with dilated causal convolution to capture spatial-temporal dependencies. STMGCN [52] employs multiple graphs to extract spatial correlations and uses an RNN to capture temporal dependencies. These GCN-based models are more flexible and expressive than CNN-based methods, and our method also uses GCN to extract spatial correlations.
B. Taxi Dispatch

Existing work on taxi dispatch can be divided into two categories: order matching [4], [7], [8], [11], [23], [34], [36], [44], [51], [52] and route recommendation [10], [13], [15], [19], [28]–[30], [33], [35], [47]. Order matching aims to match idle taxis with passengers to maximize global revenue. One study [44] considers both instant passenger satisfaction and the expected future gain in a unified decision-making framework, and then optimizes long-term platform efficiency through reinforcement learning. Another study [8] formulates the problem of online taxi routing and introduces a backbone algorithm that makes the online vehicle routing problem tractable. Two studies [7], [11] introduce game theory into the matching problem and model matching as a process of reaching a stable Nash equilibrium. OSM-KIID [52] not only maximizes the expected total profits, but also tries to satisfy the preferences of passengers and taxis.

For route recommendation, some studies [30], [47] learn taxi and passenger movement patterns from trajectory data to develop routing strategies. One study [10] divides the space into several regions, thereby converting the infinite search space into a limited region search space, and then establishes a fluid model associated with a closed queueing network composed of single-server and infinite-server stations to find an optimal static empty-car routing policy. MDM [13] employs a continuous learning platform in which the underlying model that predicts future customer requests is dynamically updated, and minimizes the distance of idle taxi drivers to anticipated customers through Monte Carlo Tree Search. RHC [28] combines highly spatio-temporally correlated demand-supply models with real-time GPS location and occupancy information to dispatch taxis; its objectives include reducing the taxi idle driving distance and matching the spatio-temporal ratio between demand and supply for service quality.
VII. CONCLUSION AND FUTURE WORK

In this paper, we study the spatial-temporal demand forecasting and competitive supply (SOUP) problem. We propose an end-to-end deep learning model, ST-GCSL, for request forecasting, and develop a route planning algorithm, DROP, that guides agents so as to reduce their idle time. The experimental studies show that ST-GCSL outperforms the baseline models in both single-step and multi-step prediction, with improvements of 3.9%, 1.6%, and 1.2% in MAPE, MAE, and RMSE over the baseline methods on the New York dataset. In addition, DROP outperforms all competitors on average idle time, with improvements of 6.3%, 11.8%, and 1.2% on the three datasets, respectively.

For future work, we plan to further optimize ST-GCSL for higher accuracy and better robustness. In addition, SOUP can be extended to more complex problems, such as dynamic vehicle routing, where customers can dial a ride within a dynamic time window. We plan to use deep reinforcement learning and operations research methods to solve these problems in future work.

REFERENCES

[4] ... ACM Trans. Spatial Algorithms and Systems, 3(4):11:1–11:39, 2018.
[5] L. Bai, L. Yao, S. S. Kanhere, X. Wang, W. Liu, and Z. Yang. Spatio-temporal graph convolutional and recurrent networks for citywide passenger demand prediction. In CIKM, pages 2293–2296, 2019.
[6] L. Bai, L. Yao, S. S. Kanhere, X. Wang, and Q. Z. Sheng. STG2Seq: Spatial-temporal graph to sequence model for multi-step passenger demand forecasting. In IJCAI, pages 1981–1987, 2019.
[7] R. Bai, J. Li, J. A. D. Atkin, and G. Kendall. A novel approach to independent taxi scheduling problem based on stable matching. JORS, 65(10):1501–1510, 2014.
[8] D. Bertsimas, P. Jaillet, and S. Martin. Online vehicle routing: The edge of optimization in large-scale applications. Operations Research, 67(1):143–162, 2019.
[9] F. Borutta, S. Schmoll, and S. Friedl. Optimizing the spatio-temporal resource search problem with reinforcement learning (GIS cup).
In SIGSPATIAL/GIS, pages 628–631, 2019.
[10] A. Braverman, J. G. Dai, X. Liu, and L. Ying. Empty-car routing in ridesharing systems. Operations Research, 67(5):1437–1452, 2019.
[11] P. Cheng, L. Chen, and J. Ye. Cooperation-aware task assignment in spatial crowdsourcing. In ICDE, pages 1442–1453, 2019.
[12] Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier. Language modeling with gated convolutional networks. In ICML, volume 70 of Proceedings of Machine Learning Research, pages 933–941. PMLR, 2017.
[13] N. Garg and S. Ranu. Route recommendations for idle taxi drivers: Find me the shortest route to a customer! In KDD, pages 1425–1434, 2018.
[14] A. Ghaderi, B. M. Sanandaji, and F. Ghaderi. Deep forecast: Deep learning-based spatio-temporal forecasting. CoRR, abs/1707.08110, 2017.
[15] Q. Guo and O. Wolfson. Probabilistic spatio-temporal resource search. GeoInformatica, 22(1):75–103, 2018.
[16] D. Hallac, S. Vare, S. P. Boyd, and J. Leskovec. Toeplitz inverse covariance-based clustering of multivariate time series data. In IJCAI, pages 5254–5258. ijcai.org, 2018.
[17] J. D. Hamilton. Time Series Analysis, volume 2. Princeton, New Jersey, 1994.
[18] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[19] J. Holler, R. Vuorio, Z. Qin, X. Tang, Y. Jiao, T. Jin, S. Singh, C. Wang, and J. Ye. Deep reinforcement learning for multi-driver vehicle dispatching and repositioning problem. In ICDM, pages 1090–1095. IEEE, 2019.
[20] J. Hu, C. Guo, B. Yang, C. S. Jensen, and H. Xiong. Stochastic origin-destination matrix forecasting using dual-stage graph convolutional, recurrent neural networks. In ICDE, 2020.
[21] Q. Hu, L. Ming, C. Tong, and B. Zheng. An effective partitioning approach for competitive spatial-temporal searching (GIS cup). In SIGSPATIAL/GIS, pages 620–623, 2019.
[22] J. Jin, M. Zhou, W. Zhang, M. Li, Z. Guo, Z. Qin, Y. Jiao, X. Tang, C. Wang, J. Wang, G. Wu, and J. Ye.
CoRide: Joint order dispatching and fleet management for multi-scale ride-hailing platforms. In CIKM, pages 1983–1992, 2019.
[23] L. Kazemi and C. Shahabi. GeoCrowd: Enabling query answering with spatial crowdsourcing. In SIGSPATIAL/GIS, pages 189–198, 2012.
[24] J. Ke, H. Zheng, H. Yang, and X. Chen. Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach. CoRR, abs/1706.06279, 2017.
[25] J. Kim, D. Pfoser, and A. Züfle. Distance-aware competitive spatiotemporal searching using spatiotemporal resource matrix factorization (GIS cup). In SIGSPATIAL/GIS, pages 624–627, 2019.
[26] Y. Li, R. Yu, C. Shahabi, and Y. Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In ICLR (Poster). OpenReview.net, 2018.
[27] S. Liao, L. Zhou, X. Di, B. Yuan, and J. Xiong. Large-scale short-term urban taxi demand forecasting using deep learning. In ASP-DAC, pages 428–433, 2018.
[28] F. Miao, S. Han, S. Lin, J. A. Stankovic, D. Zhang, S. Munir, H. Huang, T. He, and G. J. Pappas. Taxi dispatch with real-time sensing data in metropolitan areas: A receding horizon control approach. IEEE Trans. Automation Science and Engineering, 13(2):463–478, 2016.
[29] Z. T. Qin, X. Tang, Y. Jiao, F. Zhang, C. Wang, and Q. T. Li. Deep reinforcement learning for ride-sharing dispatching and repositioning. In IJCAI, pages 6566–6568. ijcai.org, 2019.
[30] M. Qu, H. Zhu, J. Liu, G. Liu, and H. Xiong. A cost-effective recommender system for taxi drivers. In KDD, pages 45–54, 2014.
[31] S. Z. Selim and M. A. Ismail. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell., 6(1):81–87, 1984.
[32] X. Shi, Z. Chen, H. Wang, D. Yeung, W. Wong, and W. Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In NIPS, pages 802–810, 2015.
[33] Z. Shou, X. Di, J. Ye, H. Zhu, and R. C. Hampshire.
Where to find next passengers on e-hailing platforms? - A Markov decision process approach. CoRR, abs/1905.09906, 2019.
[34] T. Sühr, A. J. Biega, M. Zehlike, K. P. Gummadi, and A. Chakraborty. Two-sided fairness for repeated matchings in two-sided markets: A case study of a ride-hailing platform. In KDD, pages 3082–3092, 2019.
[35] X. Tang, Z. T. Qin, F. Zhang, Z. Wang, Z. Xu, Y. Ma, H. Zhu, and J. Ye. A deep value-network based approach for multi-driver order dispatching. In KDD, pages 1780–1790. ACM, 2019.
[36] H. To, C. Shahabi, and L. Kazemi. A server-assigned spatial crowdsourcing framework. ACM Trans. Spatial Algorithms and Systems, 1(1):2:1–2:28, 2015.
[37] W. R. Tobler. A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(sup1):234–240, 1970.
[38] Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, J. Ye, and W. Lv. The simpler the better: A unified approach to predicting original taxi demands based on large-scale online platforms. In KDD, pages 1653–1662, 2017.
[39] Y. Wang, H. Yin, H. Chen, T. Wo, J. Xu, and K. Zheng. Origin-destination matrix prediction via graph convolution: A new perspective of passenger demand modeling. In KDD, pages 1227–1235, 2019.
[40] Wikipedia contributors. Fitness proportionate selection — Wikipedia, the free encyclopedia, 2019. [Online; accessed 18-June-2020].
[41] Wikipedia contributors. List of Manhattan neighborhoods — Wikipedia, the free encyclopedia, 2020. [Online; accessed 11-June-2020].
[42] Z. Wu, S. Pan, G. Long, J. Jiang, and C. Zhang. Graph WaveNet for deep spatial-temporal graph modeling. In IJCAI, pages 1907–1913. ijcai.org, 2019.
[43] B. Xu, Y. Pao, and U. Khurana. SIGSPATIAL GIS Cup, 2019.
[44] Z. Xu, Z. Li, Q. Guan, D. Zhang, Q. Li, J. Nan, C. Liu, W. Bian, and J. Ye. Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach. In KDD, pages 905–913, 2018.
[45] H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, J. Ye, and Z.
Li. Deep multi-view spatial-temporal network for taxi demand prediction. In AAAI, pages 2588–2595, 2018.
[46] B. Yu, H. Yin, and Z. Zhu. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In IJCAI, pages 3634–3640. ijcai.org, 2018.
[47] N. J. Yuan, Y. Zheng, L. Zhang, and X. Xie. T-Finder: A recommender system for finding passengers and vacant taxis. TKDE, 25(10):2390–2403, 2013.
[48] J. Zhang, Y. Zheng, and D. Qi. Deep spatio-temporal residual networks for citywide crowd flows prediction. In AAAI, pages 1655–1661, 2017.
[49] J. Zhang, Y. Zheng, D. Qi, R. Li, and X. Yi. DNN-based prediction model for spatio-temporal data. In SIGSPATIAL/GIS, pages 92:1–92:4, 2016.
[50] J. Zhang, Y. Zheng, D. Qi, R. Li, X. Yi, and T. Li. Predicting citywide crowd flows using deep spatio-temporal residual networks. Artif. Intell., 259:147–166, 2018.
[51] L. Zhang, T. Hu, Y. Min, G. Wu, J. Zhang, P. Feng, P. Gong, and J. Ye. A taxi order dispatch model based on combinatorial optimization. In KDD, pages 2151–2159, 2017.
[52] B. Zhao, P. Xu, Y. Shi, Y. Tong, Z. Zhou, and Y. Zeng. Preference-aware task assignment in on-demand taxi dispatching: An online stable matching approach. In