Graph-Based Equilibrium Metrics for Dynamic Supply-Demand Systems with Applications to Ride-sourcing Platforms
Fan Zhou
School of Statistics and Management, Shanghai University of Finance and Economics
Shikai Luo, Xiaohu Qie, Jieping Ye, Hongtu Zhu
Didi Chuxing
Abstract
The aim of this paper is to introduce a novel graph-based equilibrium metric (GEM) to quantify the distance between two discrete measures with possibly different masses on a weighted graph structure. This development is primarily motivated by dynamically measuring the local-to-global spatio-temporal coherence between demand and supply networks obtained from large-scale two-sided markets, such as ride-sourcing platforms and E-commerce. We formulate GEM as the optimal objective value of an unbalanced transport problem. Transport is only allowed among connected vertices satisfying certain constraints based on the weighted graph structure. The transport problem can be efficiently solved by optimizing an equivalent linear program. We also investigate several important GEM-related theoretical properties, such as metric properties and weak convergence. Furthermore, we use real and simulated data sets obtained from a real ride-sourcing platform to address three important problems of interest including predicting answer rate, large-scale order dispatching optimization, and policy assessment in ride-sourcing platforms.
Keywords: Graph-based Equilibrium Metric; Order Dispatching; Ride-sourcing Platform; Unbalanced Optimal Transport; Weighted Graph.

arXiv preprint [math.OC]

1 Introduction
Large volumes of data collected from multiple spatio-temporal networks are increasingly studied in diverse fields including climate science, social sciences, neuroscience, epidemiology, and transportation. In addition, those spatio-temporal networks may interact with each other across spatial and/or temporal dimensions. A typical example is that the dynamic demand and supply networks of a ride-sourcing platform (Wang & Yang 2019) are two sequences of un-normalized masses measured on the same undirected (or directed) graph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ are, respectively, a vertex set and a set of edges connecting vertex pairs. Figure 1 illustrates how the two complicated networks interact with each other and evolve over time. Specifically, a city is divided into hundreds of non-overlapping grids as the vertex set $\mathcal{V}$ with the edge structure $\mathcal{E}$ determined by road networks and location functionalities. Both demands and supplies are observed across grids at each time window with possibly different total masses and distributions. The ride-sourcing platform uses some order dispatching policy to match customer requests with nearby idle drivers, while after finishing serving assigned orders, drivers return to the supply pool to prepare for the next feasible matching. The aim of this paper is to address a fundamental question of interest for the demand and supply networks of two-sided markets.

The fundamental question of interest that we consider here is how to quantify the spatial equilibrium of dynamic supply-demand networks for two-sided markets, particularly ride-sourcing platforms (e.g., Uber and DiDi). To answer this question, we first introduce a weighted graph structure $(G, W, C)$ to characterize the transport network and transport costs of a city. Specifically, we divide each market into $N$ disjoint areas and regard them as vertices, denoted as $\mathcal{V} = \{v_1, \ldots, v_N\}$.
Let $\mathcal{E}$ be a set of edges between any possible pair of vertices such that $(v_i, v_j) \in \mathcal{E} \subset \mathcal{V} \times \mathcal{V}$ is an edge equipped with a nonnegative weight $w_{ij}$ (e.g., transportation cost). For all $(v_i, v_j) \notin \mathcal{E}$, we set $w_{ij} = \infty$. The weighted graph structure consists of an undirected (or directed) graph $G = (\mathcal{V}, \mathcal{E})$ as well as a weight matrix $W = (w_{ij})$, where the $w_{ij}$'s are nonnegative weights. A graph-based transport cost from $v_i$ to $v_j$ is defined as

$$c_{ij} = \min_{K \ge 0,\ (i_k)_{k=0}^{K}:\, v_i \to v_j} \Big\{ \sum_{k=0}^{K-1} w_{i_k, i_{k+1}} : \forall k \in [\![0, K-1]\!],\ (v_{i_k}, v_{i_{k+1}}) \in \mathcal{E} \Big\}, \tag{1}$$

where $(i_k)_{k=0}^{K}: v_i \to v_j$ denotes any path on $G$ through $\mathcal{E}$ starting from $v_{i_0} = v_i$ and ending at $v_{i_K} = v_j$. Thus, $c_{ij}$ is the geodesic distance from $v_i$ to $v_j$, or the minimal cost of transporting one unit of object from $v_i$ to $v_j$, and we can define a transport cost matrix on $(G, W)$, denoted as $C = (c_{ij}) \in \mathbb{R}^{N \times N}$. The matrix $C$ may be time variant, since it depends on the real-time traffic and weather conditions for a ride-sourcing platform. It is also possibly asymmetric, since the graph $G$ can be directed. Throughout the paper, we consider the weighted graph structure $(G, W, C)$.

Second, we need to introduce a distance (or metric) to quantify the difference between demand and supply masses at each time interval and across time on $(G, W, C)$. At a given time interval, we define $\nu_j = \nu(v_j)$ and $\mu_j = \mu(v_j)$ as the point masses at vertex $v_j$ for the two measures $\nu$ and $\mu$, which, respectively, represent the number of customer requests and available drivers inside the vertex $v_j$ of the ride-sourcing platform (Wang & Yang 2019). The supply and demand systems at each timestamp can be modeled as two discrete Lebesgue measures $\mu$ and $\nu$ on $(G, W, C)$ with locally finite masses such that $\max(\mu(V), \nu(V))$ is finite for every compact set $V \subset \mathcal{V}$. We consider a general case in which the two measures can be unbalanced, that is, the total masses $\sum_{i=1}^{N} \mu_i$ and $\sum_{i=1}^{N} \nu_i$ may be unequal to each other. Defining a metric between $\mu$ and $\nu$ falls into the field of optimal transport.

Figure 1: Dynamic supply and demand networks at three time points in a representative ride-sourcing platform. We divide the whole city into multiple hexagon areas and each hexagon contains multiple supplies and demands.

Optimal transport has been widely studied in diverse disciplines including statistics, applied mathematics, neuroscience, medical imaging, computational biology, and computer vision. Wasserstein-based metrics based on the mathematics of optimal mass transport have proved to be powerful tools for comparing objects in complex spaces. Some successful applications include solving transport partial differential equations (PDE) (Ambrosio & Gangbo 2008, Ambrosio et al. 2008), measuring image similarities (Rubner et al. 1998a,b, 2000), image processing (Rabin & Papadakis 2015, Shi & Wang 2020), statistical inference in machine learning (Lacombe et al. 2018, Solomon et al. 2014), manifold diffeomorphisms (Grenander & Miller 2007, Srivastava & Klassen 2016, Younes 2010), and serving as the cost function for training Generative Adversarial Networks (Arjovsky et al. 2017), among many others.
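The graph-based transport cost $c_{ij}$ in (1) is an all-pairs shortest-path quantity, so for a small graph it can be computed with the Floyd-Warshall recursion. The sketch below is only an illustration under stated assumptions: the function name `transport_costs` and the toy weight matrix are hypothetical, vertices are indexed from 0, and $c_{ii}$ is set to zero (the empty path with $K = 0$), which does not cover settings with a nonzero within-vertex cost.

```python
import math

def transport_costs(weights):
    """All-pairs geodesic costs c_ij from an edge-weight matrix, as in (1).

    weights[i][j] is the nonnegative weight w_ij of edge (v_i, v_j),
    or math.inf when (v_i, v_j) is not an edge.
    """
    n = len(weights)
    cost = [row[:] for row in weights]
    for i in range(n):
        cost[i][i] = 0.0  # assumption: K = 0, the empty path from v_i to itself
    # Floyd-Warshall: allow paths through intermediate vertex k, one k at a time.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = cost[i][k] + cost[k][j]
                if via_k < cost[i][j]:
                    cost[i][j] = via_k
    return cost

INF = math.inf
# Hypothetical directed toy graph: edges 0 -> 1, 1 -> 2, 2 -> 0,
# plus a costly direct edge 0 -> 2.
W = [
    [INF, 1.0, 5.0],
    [INF, INF, 1.0],
    [2.0, INF, INF],
]
C = transport_costs(W)
for row in C:
    print(row)
```

On this toy graph the resulting matrix is asymmetric (for example, the cost from vertex 0 to vertex 1 differs from the cost in the opposite direction), which is the situation the directed-graph remark above refers to.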
However, existing Wasserstein-based metrics are not directly applicable to the comparison of two unbalanced measures defined on $(G, W, C)$, as detailed in Section 2.1.

We introduce a novel graph-based equilibrium metric (GEM) and formulate it as an unbalanced optimal transport problem. The main contributions of this paper are summarized as follows:

• We propose a novel GEM, which can be regarded as a restricted generalized Wasserstein distance, to quantify the distance between dynamic demand and supply networks on the weighted graph structure. It not only allows the optimal transport guided by asymmetric costs and node connections, but also accounts for unbalanced masses. It also allows one of the two sides (supplies) to play the transporting role and the other (demands) to be fixed, which satisfies the physical interpretation of ride-sourcing platforms.

• Varying the size of each vertex leads to multilevel GEMs and their corresponding optimal transport functions. At the finest scale, our GEM reduces to solving an unbalanced assignment problem and its corresponding optimal transport function contains many local details. In contrast, at a relatively coarse scale, it gives a coarse representation (or low frequency patterns) of the optimal transport function.

• Numerically, the calculation of GEM can be reformulated as a standard linear programming (LP) problem. Theoretically, we investigate several properties of our GEM including the convergence of the LP algorithm for computing GEM, the expectation of GEM, and the metric property, additive property and weak convergence of GEM.

• We examine the finite sample performance of GEM by using real data sets obtained from a ride-sourcing platform in order to address three practical problems including the prediction of the efficiency of a given dispatching policy, the optimization of order dispatching policy, and the A/B testing for comparing different dispatching policies.

The remainder of this paper is structured as follows.
Section 2 reviews various methodologies including existing Wasserstein-type distances, graph-based equilibrium metrics, computational approach, and three important applications of GEM in ride-sourcing platforms. Section 3 studies four theoretical properties associated with GEM. Section 4 examines the finite sample performance of GEM by using real and simulated data sets.
Many approaches have been proposed to measure the distance between two measures (or distributions) on a metric space. Most of them fall into two broad categories including the aggregation of pixel-wise differences and the transport cost of moving one measure to match the other. Measurements in the first category include the $L_p$-distance, the Total Variation (TV) distance (Levin & Peres 2017), and the Kullback-Leibler (KL) divergence (Cha 2007), among others. A typical example is the Hellinger Distance (Nikulin 2001) reviewed as follows.

Definition 2.1. (Hellinger Distance)
Let $M(X)$ and $M_+(X)$ be the vector space of Radon measures and the cone of nonnegative Radon measures on a Hausdorff topological space $X$. Then, we use $\mu$ and $\nu$ to denote two probability measures that are absolutely continuous with respect to a third probability measure $\tau$. The square of the Hellinger distance between $\mu$ and $\nu \in M_+(X)$ is defined as

$$D_H^2(\mu, \nu) = \frac{1}{2} \int_X \Big( \sqrt{d\mu/d\tau} - \sqrt{d\nu/d\tau} \Big)^2 d\tau, \tag{2}$$

where $d\mu/d\tau$ and $d\nu/d\tau$ are the Radon-Nikodym derivatives of $\mu$ and $\nu$, respectively.

All these metrics suffer from two major issues. Please refer to Figure 2 for details. First, all these metrics not only fail to consider the connections among different locations (vertices in the graph), but also ignore the topological (or geometric) structure of $X$. Second, the use of Hellinger-type distances requires a normalization step to enforce $\mu(X) = \int_X d\mu = \nu(X) = \int_X d\nu$, which can create a false balance issue.

To address these two issues, the second category of metrics, such as the Wasserstein distance (Ambrosio et al. 2008, Villani 2008), is proposed by solving an optimal transport problem. In the real world, original supply resources can usually be transported to achieve a better equilibrium between $\mu$ and $\nu$. All those distances have deep connections to well studied assignment problems from combinatorial optimization (Steele 1987).

Definition 2.2. (Wasserstein Distance)
Let $X$ and $Y$ be Hausdorff topological spaces and $X \times Y$ be their product space. We introduce a lower semi-continuous function $c: X \times Y \to \mathbb{R} \cup \{\infty\}$, a nonnegative measure (or a transport function) $\gamma \in M_+(X \times Y)$, and an equality constraint $\iota_{\{=\}}(\alpha | \beta)$, which is $0$ if $\alpha = \beta$ and $\infty$ otherwise. Then the optimal transport problem for measures $\mu \in M_+(X)$ and $\nu \in M_+(Y)$ with the same total masses, that is $\mu(X) = \nu(Y)$, can be defined as

$$D_W(\mu, \nu | c) = \inf_{\gamma \in M_+(X \times Y)} \Big\{ \int_{X \times Y} c \, d\gamma + \iota_{\{=\}}(P_X \gamma | \mu) + \iota_{\{=\}}(P_Y \gamma | \nu) \Big\}, \tag{3}$$

where $P_X \gamma$ and $P_Y \gamma$ denote the first and second marginals of $\gamma$, respectively.

Here $\gamma$ denotes a transport plan, measuring how far you have to move the mass of $\mu$ to turn it into $\nu$. Standard optimal transport in (3) is only meaningful whenever $\mu$ and $\nu$ have the same total masses. Whenever $\mu(X) \neq \nu(Y)$, there is no feasible $\gamma$ in (3). For the real-world ride-sourcing platforms, however, it is important to compute some sort of relaxed transportation between two arbitrary non-negative measures. An improved approach is to build an unbalanced optimal transport problem by introducing two divergences over $X$ and $Y$, denoted as $D_{\varphi_1}$ and $D_{\varphi_2}$ respectively (Chizat et al. 2018, Liero et al. 2018), which are defined as follows.

Definition 2.3. (Divergences). Let $\varphi$ be an entropy function. For $\mu, \nu \in M(T)$, let $\frac{d\mu}{d\nu} \nu + \mu^{\perp}$ be the Lebesgue decomposition of $\mu$ with respect to $\nu$. The divergence $D_\varphi$ is defined by

$$D_\varphi(\mu | \nu) := \int_T \varphi\Big(\frac{d\mu}{d\nu}\Big) d\nu + \varphi'_\infty \, \mu^{\perp}(T)$$

if $\mu$ and $\nu$ are nonnegative and $\infty$ otherwise.

Now, we can give the formal definition of the Generalized Wasserstein Distance.
Definition 2.4. (Generalized Wasserstein Distance (GWD))
Let $c: X \times Y \to [0, \infty]$ be a lower semi-continuous function. The unbalanced optimal transport problem is

$$D_{\varphi_1, \varphi_2}(\mu, \nu | c) = \inf_{\gamma \in M_+(X \times Y)} \Big\{ \int_{X \times Y} c \, d\gamma + D_{\varphi_1}(P_X \gamma | \mu) + D_{\varphi_2}(P_Y \gamma | \nu) \Big\}. \tag{4}$$

Different from the standard Wasserstein Distance, which normalizes the input measures into probability distributions, GWD quantifies in some way the deviation of the marginals of the transport plan $\gamma$ from the two unbalanced measures $\mu$ and $\nu$ by using the $\varphi$-divergence. Although $D_{\varphi_1, \varphi_2}(\mu, \nu | c)$ enjoys some nice properties, such as the metric property (Chizat et al. 2018, Liero et al. 2018), the solution to (4), denoted as $\gamma^*$, may not have any physical meaning. For the ride-sourcing business, such $\gamma^*$ is critically important for assigning supplies to demands since it can be regarded as the graph representation of a dispatching policy. Therefore, the use of $D_{\varphi_1, \varphi_2}(\mu, \nu | c)$ still cannot fully cover the 'useful' relative size between $\mu$ and $\nu$, since it may underestimate unmatched resources by allowing some infeasible transports, that is, the space $M_+(X \times Y)$ is too large to be useful. Three major issues of using $D_{\varphi_1, \varphi_2}(\mu, \nu | c)$ are given as follows.

• The first issue is that in many applications (e.g., ride-sourcing platforms), point masses in only one of the two measures are allowed to be transported and those in the other measure are fixed. In this case, the symmetric property does not hold.

• The second issue is that neither $W$ nor $C$ can be used to define a standard metric space on $G = (\mathcal{V}, \mathcal{E})$, since the transport cost (or weight) matrix may not satisfy the three key assumptions of standard metrics. For instance, the transport cost from $v_i$ to $v_j$ may be unequal to that from $v_j$ to $v_i$, since the transport cost matrix $C \in \mathbb{R}^{N \times N}$ can be asymmetric for directed graphs. Moreover, the direct transport cost from $v_i$ to $v_j$ may be larger than or equal to the sum of the transport cost from $v_i$ to $v_k$ and that from $v_k$ to $v_j$.
• The third issue is that in some applications, such as supply-demand networks, the transport cost from $v_i$ to $v_j$ may not be a constant and the transport cost from a vertex to itself may not be zero. It is possible that supply units at vertex $v_i$ have their individual transport costs of moving within/outside the vertex $v_i$. Subsequently, their transport costs from $v_i$ to $v_j$ may follow a distribution instead of being a constant.

2.2 Graph-based Equilibrium Metrics

On $(G, W, C)$, we formally introduce our GEMs for two discrete measures $\mu$ and $\nu$ in $M_+(\mathcal{V})$, among which point masses in $\mu$ are allowed to be transported and those in $\nu$ are fixed. We need to introduce some notation. In this case, we have $X = Y = \mathcal{V}$ and use $P^{1}_{\mathcal{V}} \gamma$ and $P^{2}_{\mathcal{V}} \gamma$ to represent $P_X \gamma$ and $P_Y \gamma$, respectively. Let $|\mu| = \sum_{i=1}^{N} \mu(v_i)$ and $|\mu - \tilde{\mu}| = \sum_{i=1}^{N} |\mu(v_i) - \tilde{\mu}(v_i)|$. For $i = 1, \ldots, N$, we use $\mathcal{N}_i$ to denote the neighboring set of $v_i$ in $\mathcal{V}$, which contains $v_i$ and its (possibly high-order) neighboring vertices. Moreover, $v_i \in \mathcal{N}_j$ does not ensure $v_j \in \mathcal{N}_i$, since the traffic and road networks may prevent directly transporting cars from $v_i$ to $v_j$.

Let $c: \mathcal{V} \times \mathcal{V} \to \mathbb{R} \cup \{\infty\}$ be a function and $\gamma \in M_+(\mathcal{V} \times \mathcal{V})$ be a nonnegative measure. The general form of our GEM on $(G, W, C)$ is written as

$$\rho_\lambda(\mu, \nu | G, C) = \inf_{\tilde{\mu} \in M_+(\mathcal{V}),\ \gamma \in M_+(\mathcal{V} \times \mathcal{V})} \Big\{ |\nu - \tilde{\mu}| + \lambda \int_{\mathcal{V} \times \mathcal{V}} c \, d\gamma \Big\} \tag{5}$$

subject to an equality constraint and two sets of transport constraints given by

$$|\mu| = |\tilde{\mu}|, \quad (P^{1}_{\mathcal{V}} \gamma)(v_i) = \sum_{v_j \in \mathcal{N}_i} \gamma(v_i, v_j) = \mu_i \quad \text{and} \quad (P^{2}_{\mathcal{V}} \gamma)(v_i) = \sum_{j:\, v_i \in \mathcal{N}_j} \gamma(v_j, v_i) = \tilde{\mu}_i, \tag{6}$$

where $\lambda$ is a non-negative hyper-parameter. The three sets of constraints in (6) ensure that $\tilde{\mu}$, as an intermediate measure, shares the same total mass with $\mu$ and that $\gamma$ transports $\mu$ to $\tilde{\mu}$. Thus, the feasible set for (6) is much smaller than that for (4). The integration of $\lambda \int_{\mathcal{V} \times \mathcal{V}} c \, d\gamma$ and the three sets of constraints in (6) is equivalent to the balanced Wasserstein distance in (3), so GEM is the integration of the balanced Wasserstein distance and the $L_1$ norm.

In our GEM framework, one of the two measures plays the role of 'predator' to move and 'catch' the 'prey', which mimics the general supply-demand system of ride-sourcing platforms.
Therefore, different from the setting of Piccoli & Rossi (2014), in which both measures are rescaled, we fix $\nu$ and change only $\mu$ to make the two sides match each other under the asymmetric distance and transport range constraints. In Figure 3, we consider two simple examples in order to understand the differences between GEM and GWD. Moreover, since we only consider the transport from $\mu$ to $\tilde{\mu}$ with $\nu$ fixed, $\rho_\lambda(\mu, \nu | G, C)$ is generally asymmetric and can be regarded as a restricted GWD.

Figure 3: Examples illustrating the differences between GEM and GWD. Panel (a): For GEM, the four units can be matched in the left sub-figure, whereas it is infeasible in the right one. There exists a directed edge from vertex A to vertex B, but not from vertex B to vertex A. Panel (b): In the top sub-figure, one 'demand' unit at vertex C cannot be matched (the upper line) for GEM since the transport from vertex A to vertex C is not allowed in this case, whereas in the bottom sub-figure, they can be transported for GWD. In panel (b), for GEM, it is assumed that the neighboring set $\mathcal{N}_i$ only includes the adjacent vertices of each vertex.

Besides GEM, the optimal solution of $(\tilde{\mu}, \gamma)$ to (5), denoted as $(\tilde{\mu}^*, \gamma^*)$, also plays an important role in various two-sided markets, such as ride-sourcing platforms and E-commerce. The $\tilde{\mu}^*$ can be regarded as an optimal dispatch of transporting supplies $\mu$ to match demands $\nu$, whereas $\gamma^*$ is an optimal transport function associated with $\rho_\lambda(\mu, \nu | G, C)$. If we vary the area of each vertex from the coarsest to the finest scale, then we obtain a multilevel GEM and its transport function. At the finest scale, our GEM reduces to solving an unbalanced assignment problem, so $\gamma^*$ is able to capture the local structure of the optimal transport function. In contrast, at a relatively coarse scale, we obtain a coarse representation of the optimal transport function, reflecting its global patterns.
We will discuss how to apply GEM to ride-sourcing platforms in Section 2.4.

Furthermore, we can simplify $\gamma$ by defining $\gamma = (\gamma_{ij})$ as an $N \times N$ flow matrix with $\gamma_{ij}$ being the transport amount from $v_i$ to $v_j$. Let $\Gamma$ represent the set consisting of all the feasible solutions $\gamma$ with all non-negative elements $\gamma_{ij} \ge 0$. Let $\tilde{\mu} = (\tilde{\mu}_1, \ldots, \tilde{\mu}_N)^T \in \mathbb{R}^N$ represent the measure $\mu$ after transporting $\gamma$ such that $\tilde{\mu}_i = \sum_{j:\, v_i \in \mathcal{N}_j} \gamma_{ji}$ holds for all $i$. Therefore, our proposed GEM is equivalent to solving a discrete optimization problem as follows:

$$\rho_\lambda(\mu, \nu | G, C) = \min_{\gamma \in \Gamma} \Big\{ \|\nu - \tilde{\mu}\|_1 + \lambda \sum_{v_i \in \mathcal{V}} \sum_{v_j \in \mathcal{V}} c_{ij} \gamma_{ij} \Big\} \tag{7}$$

$$\text{subject to} \quad \sum_{v_j \in \mathcal{N}_i} \gamma_{ij} = \mu_i, \quad \sum_{v_j \notin \mathcal{N}_i} \gamma_{ij} = 0, \quad \text{and} \quad \tilde{\mu}_i = \sum_{j:\, v_i \in \mathcal{N}_j} \gamma_{ji} \quad \text{for } \forall v_i \in \mathcal{V},$$

where $\nu = (\nu_1, \ldots, \nu_N)^T \in \mathbb{R}^N$ and $\|\cdot\|_1$ corresponds to the $L_1$ norm. Moreover, $\|\nu - \tilde{\mu}\|_1$ in (7) is equivalent to the first term of the objective function in (5).

There are two key advantages of using the derived form given in (7) compared to the existing unbalanced optimal transport problem. The first one is that transport is only allowed between a vertex $v_i$ and its neighboring set $\mathcal{N}_i$ based on $G$.

The second one is that $\lambda$, as a hyper-parameter, can balance the transport cost taken to reallocate point masses and the requirement of assigning $\mu$ to satisfy $\nu$. The choice of $\lambda$ in practice is data-driven. To ensure that the transport only happens among selected vertex pairs under the optimal transport plan, the theoretical upper bound of $\lambda$ is $2 / \max_{v_i \in \mathcal{V},\, v_j \in \tilde{\mathcal{N}}_i}(c_{ij})$, where $\tilde{\mathcal{N}}_i \subset \mathcal{N}_i$ contains all the neighboring vertices of $v_i$ that receive transport from $v_i$. In this case, the cost of transporting one unit of supply from vertex $v_i$ to $v_j \in \tilde{\mathcal{N}}_i$, $\lambda c_{ij}$, is smaller than its contribution to reducing $\|\nu - \tilde{\mu}\|_1$, which is 2 (1 each for $v_i$ and $v_j$), when $\nu_j - \mu_j \ge \mu_i - \nu_i \ge 1$. Transport from $v_i$ to $v_j$ keeps decreasing the objective value $\rho_\lambda(\mu, \nu | G, C)$ until either the balance in the destination vertex $v_j$ or that in the origin vertex $v_i$ is achieved. In the real world, we usually let $\lambda \max_{v_i \in \mathcal{V},\, v_j \in \tilde{\mathcal{N}}_i}(c_{ij})$ fall into the range [0. , .5] with $c_{ij}$ being the geographical distance and $\tilde{\mathcal{N}}_i$ containing all the first-order adjacent vertices of $v_i$ in $(G, W, C)$, which can achieve the best performance in some practical problems such as the prediction of order answer rate, dispatching policy design, and A/B testing.
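To make the discrete problem (7) concrete, the following sketch evaluates it by exhaustive search on two toy graphs. It is an illustration under stated assumptions rather than the method used in this paper: it enumerates integer flows only, it is exponential in the number of supply units, and the names `compositions` and `gem_bruteforce`, the toy costs, and the 0-based vertex indices are all hypothetical.

```python
from itertools import product

def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def gem_bruteforce(mu, nu, neighbors, cost, lam):
    """Evaluate (7) by enumerating integer flows on a tiny graph.

    neighbors[i] lists the indices j with v_j in N_i (including i itself).
    Returns (objective value, transported supply mu_tilde).
    """
    n = len(mu)
    # Every way of splitting the supply mu[i] among the allowed destinations.
    choices = [list(compositions(mu[i], len(neighbors[i]))) for i in range(n)]
    best = None
    for splits in product(*choices):
        mu_tilde = [0] * n
        transport = 0.0
        for i, split in enumerate(splits):
            for j, amount in zip(neighbors[i], split):
                mu_tilde[j] += amount
                transport += cost[i][j] * amount
        value = sum(abs(nu[j] - mu_tilde[j]) for j in range(n)) + lam * transport
        if best is None or value < best[0]:
            best = (value, mu_tilde)
    return best

# Hypothetical line graph A - B - C; supply may move only to adjacent vertices.
line_cost = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
line_neighbors = [[0, 1], [0, 1, 2], [1, 2]]
line_value, line_mu_tilde = gem_bruteforce(
    [2, 0, 0], [1, 1, 1], line_neighbors, line_cost, lam=0.5)

# Hypothetical directed pair: transport is allowed from A to B but not back.
pair_cost = [[0.0, 1.0], [1.0, 0.0]]
pair_neighbors = [[0, 1], [1]]
forward_value, _ = gem_bruteforce([2, 0], [0, 2], pair_neighbors, pair_cost, lam=0.5)
backward_value, _ = gem_bruteforce([0, 2], [2, 0], pair_neighbors, pair_cost, lam=0.5)
print(line_value, line_mu_tilde, forward_value, backward_value)
```

In the line-graph case one demand unit at vertex C stays unmatched because supply at A cannot reach C directly. In the directed pair, swapping the positions of supply and demand changes the value, since supply at B has no allowed destination other than B itself.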
The optimal solution $\gamma^*$ to (7) can be calculated by solving a standard linear program (LP). We will reformulate (7) as an LP problem below. Numerically, we use a revised simplex method incorporated in the C package GNU Linear Programming Kit (GLPK) to solve (7). We have found that GLPK works well in our real data analyses in Section 4.

We need to introduce some notation. Since the transport range constraints in (6) impose $\gamma_{ij} = 0$ for $v_j \notin \mathcal{N}_i$, we only need to assign optimal values to $\tilde{\gamma} = \mathrm{Vec}\{\gamma_{ij}, j \in \mathcal{N}_i\} \in \mathbb{R}^{N_1 \times 1}$, where $N_1 = \sum_{i=1}^{N} n_i$, $n_i$ is the number of vertices in $\mathcal{N}_i$, and $\mathrm{Vec}(\cdot)$ denotes the vectorization of a matrix. With this simplification, the dimension of solvable variables is reduced from $O(N^2)$ to $O(N_1)$, which greatly increases the computational efficiency of our algorithm. Let $A_1$ and $A_2$ be two $N \times N_1$ matrices. The $i$-th row of $A_1$ consists of 0's except the $(\sum_{j=1}^{i-1} n_j + 1)$-th to $(\sum_{j=1}^{i} n_j)$-th elements being 1. Similarly, all the elements of the $i$-th row of $A_2$ are zeros except the $(\sum_{p=1}^{j-1} n_p + q)$-th element being 1 when grid $v_i$ is indexed by $q$ in the neighboring set $\mathcal{N}_j$ of vertex $v_j$. Let $\tilde{C} \in \mathbb{R}^{N_1 \times 1}$ be the vector including the unit transport costs for all the corresponding $\gamma_{ij}$'s in $\tilde{\gamma}$. Moreover, we define

$$A = \begin{pmatrix} A_1 & 0 & 0 & 0 \\ A_2 & I_N & -I_N & 0 \\ A_2 & -I_N & 0 & I_N \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} \mu \\ \nu \\ \nu \end{pmatrix},$$

where $\mu = (\mu_1, \ldots, \mu_N)^T$, $A \in \mathbb{R}^{3N \times (N_1 + 3N)}$, $b \in \mathbb{R}^{3N}$, and $I_N$ is an identity matrix.

Problem (7) is equivalent to

$$\min \{ \|\nu - A_2 \tilde{\gamma}\|_1 + \lambda \tilde{C}^T \tilde{\gamma} \} \quad \text{subject to} \quad A_1 \tilde{\gamma} = \mu \ \text{ and } \ \tilde{\gamma} \ge 0.$$

Introducing $S \in \mathbb{R}^{N \times 1}$, it can be further transformed into a standard linear program (LP)

$$\min \{ \mathbf{1}^T S + \lambda \tilde{C}^T \tilde{\gamma} \} \quad \text{subject to} \quad A_1 \tilde{\gamma} = \mu, \ A_2 \tilde{\gamma} + S \ge \nu, \ A_2 \tilde{\gamma} - S \le \nu, \ \tilde{\gamma} \ge 0, \ \text{and} \ S \ge 0. \tag{8}$$
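The block structure of $A_1$, $A_2$, and $\tilde{C}$ can be sketched on a toy graph as follows. This is a minimal illustration, not the GLPK implementation: the function names `build_lp_blocks` and `matvec`, the dense list-of-lists storage, the toy neighbor sets, and the 0-based vertex indices are assumptions made only for this example. It checks that $A_1 \tilde{\gamma}$ returns the supplies $\mu$ and that $A_2 \tilde{\gamma}$ returns the transported supplies $\tilde{\mu}$ for one feasible flow.

```python
def build_lp_blocks(neighbors, cost):
    """Build A1, A2 (dense 0/1 row lists) and the cost vector C_tilde.

    gamma_tilde stacks gamma_ij for j in N_i, vertex by vertex,
    so it has N1 = sum_i n_i entries.
    """
    n = len(neighbors)
    pairs = [(i, j) for i in range(n) for j in neighbors[i]]  # column order
    n1 = len(pairs)
    A1 = [[0] * n1 for _ in range(n)]
    A2 = [[0] * n1 for _ in range(n)]
    for col, (i, j) in enumerate(pairs):
        A1[i][col] = 1  # row i sums the outflow of origin v_i
        A2[j][col] = 1  # row j sums the inflow into destination v_j
    c_tilde = [cost[i][j] for i, j in pairs]
    return A1, A2, c_tilde, pairs

def matvec(A, x):
    """Plain matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Hypothetical line graph with three vertices and first-order neighbors.
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
cost = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
A1, A2, c_tilde, pairs = build_lp_blocks(neighbors, cost)

nu = [1, 1, 1]
lam = 0.5
gamma = [1, 1, 0, 0, 0, 0, 0]  # keep one unit at v_0, send one unit to v_1
mu_tilde = matvec(A2, gamma)
S = [abs(d - s) for d, s in zip(nu, mu_tilde)]  # smallest S allowed by (8)
objective = sum(S) + lam * sum(c * g for c, g in zip(c_tilde, gamma))
print(matvec(A1, gamma), mu_tilde, objective)
```

For a fixed $\tilde{\gamma}$, the two inequality constraints in (8) force $S_i \ge |\nu_i - (A_2 \tilde{\gamma})_i|$, so the smallest admissible $S$ reproduces the $L_1$ term of (7); the sketch uses exactly that choice.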
The above LP can be further rewritten as

$$\min_{X} \{ B^T X \} \quad \text{subject to} \quad AX = b, \ X \ge 0, \tag{9}$$

where $B = (\lambda \tilde{C}^T, \mathbf{1}^T, \mathbf{0}^T, \mathbf{0}^T)^T$ and $X = (\tilde{\gamma}^T, S^T, w_1^T, w_2^T)^T$, in which $w_1$ and $w_2$ are vectors of slack variables. The dual of (9) is given by

$$\max_{y \in \mathbb{R}^{3N}} \{ b^T y \} \quad \text{subject to} \quad A^T y \le B, \tag{10}$$

which further reduces the variable dimension from $N_1 + 3N$ to $3N$.

2.4 Applications of GEM in Ride-sourcing Platforms

To calculate GEM, we need to build a dynamic weighted graph structure over time for each city on the ride-sourcing platform as follows. We first divide a city into $|\mathcal{V}| = N$ non-overlapping hexagons and regard each hexagon as a vertex in $\mathcal{V}$. Then, we set $\mathcal{N}_i = \cup_{k=0}^{K} \mathcal{N}_i^k$, where $\mathcal{N}_i^k$ includes all the neighboring hexagons within the $k$-th outer layer of $v_i$ for $k > 0$ and $\mathcal{N}_i^0$ only includes $v_i$ itself. A vertex $v_j$ belongs to the $k$-th outer layer of $v_i$ if $k$ steps are required to walk from $v_i$ to $v_j$ on the hexagonal network. Thus, we determine $G = (\mathcal{V}, \mathcal{E})$. Second, we set $W_t = (w_{ijt})$, where $w_{ijt}$ is the distance between $v_i$ and $v_j$ in the $t$-th timestamp. Note that $w_{ijt}$ may vary with time due to the real-time locations of drivers and customers. Third, we compute $C_t = (c_{ijt})$ by using $W_t$ according to (1) in the $t$-th timestamp. Finally, we obtain the dynamic weighted graph structure $(G, W_t, C_t)$.

We show how to use GEM to address three important questions of interest in ride-sourcing platforms. First, we can measure the optimal distance between observed dynamic supply and demand networks across time. We extract the spatio-temporal data $O = \{(o_{it})\}_t$ and $D = \{(d_{it})\}_t$ from the dynamic demand and supply systems, where $o_{it}$ and $d_{it}$ represent demands and supplies at vertex $v_i$ in the $t$-th timestamp, respectively.
Given $O$ and $D$, we set $\mu_t = (d_{it})_i$ and $\nu_t = (o_{it})_i$ and use the LP algorithm to calculate $\rho(t) = \rho_\lambda(\mu_t, \nu_t | G, C_t)$ and its corresponding solution, denoted as $(\tilde{\mu}_t^* = (\tilde{d}_{it}^*)_i, \gamma_t^*)$, in the $t$-th timestamp.

Furthermore, we introduce an optimal supply-demand ratio at each $v_i$ in the $t$-th timestamp, defined as the ratio of $o_{it}$ over the 'optimal' supplies $\tilde{d}_{it}^* + \mathbf{1}(\tilde{d}_{it}^* = 0)$ and denoted as $\mathrm{DSr}_{it}$, in which we add the extra term $\mathbf{1}(\tilde{d}_{it}^* = 0)$ to avoid zero in the denominator. Similarly, we can define an optimal supply-demand difference as $\mathrm{DSd}_{it} = o_{it} - \tilde{d}_{it}^*$ at each $(v_i, t)$. It allows us to create the spatiotemporal map of GEM-related measures $(\mathrm{DSr}_{it}, \mathrm{DSd}_{it})$. Furthermore, we extend $(\mathrm{DSr}_{it}, \mathrm{DSd}_{it})$ to a wide timespan $\mathcal{T}$ and a set of vertices $V \subset \mathcal{V}$. For instance, we define a weighted average supply-demand ratio over $V$ in $\mathcal{T}$ and a weighted average absolute supply-demand difference over $V$ in $\mathcal{T}$ as follows:

$$\mathrm{DSr}_{\mathcal{T}}(V) = \frac{\int_{t \in \mathcal{T}} \sum_{i \in V} w_{it} \, \mathrm{DSr}_{it} \, dt}{\int_{t \in \mathcal{T}} \sum_{i \in V} w_{it} \, dt} \quad \text{and} \quad \mathrm{ADSd}_{\mathcal{T}}(V) = \frac{\int_{t \in \mathcal{T}} \sum_{i \in V} w_{it} \, |\mathrm{DSd}_{it}| \, dt}{\int_{t \in \mathcal{T}} \sum_{i \in V} w_{it} \, dt}, \tag{11}$$

in which we set $w_{it}$ as either $o_{it}$ or $(o_{it} + \tilde{d}_{it}^*)/$[…] $|\mathrm{DSd}_{it}|$ and $|\mathrm{DSr}_{it} - $[…]$|$ across all $(v_i, t)$. Please see an application of $(\mathrm{DSr}_{it}, \mathrm{DSd}_{it})$ in Section 4.1 for details.

Second, we can use historical supply-demand information contained in $\{(\mathrm{DSr}_{it}, \mathrm{DSd}_{it}) : (v_i, t) \in V \times \mathcal{T}\}$ to design order dispatching policies for large-scale ride-sourcing platforms. Order dispatch is an essential component of any ride-sourcing platform for assigning idle drivers to nearby passengers. Standard order dispatching approaches focus on immediate customer satisfaction, such as serving the order with the nearest drivers (Liao 2003) or the first-come-first-served strategy of serving the order at the top of the waiting list with the first driver becoming available (Zhang & Pavone 2016).
Those greedy methods, however, fail to account for the spatial effects of an order and driver (O-D) pair on the other O-D pairs. Thus, they may not be optimal from a global perspective. To improve users' experience, some more advanced techniques strive to balance between small pick-up distance and large drivers' revenue. To design better dispatching policies, we will include additional historical supply-demand network information based on GEM to delineate its effects on the average expected gain from serving the current order. Please see Section 4.2 for details.

Third, an important application of $\{\rho(t)\}$ is to use it as a metric to directly compare two (or more) dispatching policies for ride-sourcing platforms. The key idea is to detect whether there exists a significant difference between two sets of GEMs for two competitive policies under the same platform environment. Given the joint distribution of demand and supply in the platform, the smaller GEM is, the better many global operational metrics, such as order answer rate, order finishing rate, and driver's working time, are. Compared with those global operational metrics, GEM is a more direct measurement of the operational efficiency for a ride-sourcing platform. Please see Section 4.3 for details.

In this section, we study the theoretical properties of our GEM-related methods proposed in Section 2; most of the proofs can be found in the supplementary document.

First, we establish the convergence property of LP (9) for GEM.
Theorem 3.1.
The LP (9) has an optimal basic feasible solution. Furthermore, if $X$ is feasible for the primal problem (9) and $y$ is feasible for the dual (10), then we have

$$\bar{z} = y^T b = y^T A X \le B^T X = z. \tag{12}$$

If either (9) or (10) has a finite optimal value, then so does the other, the optimal values coincide, and the optimal solutions to both (9) and (10) exist.
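For readers who want the intermediate step behind (12), the chain can be spelled out as follows. This is the standard weak-duality argument written in the notation of (9) and (10); it is offered as a reading aid and is not taken from the supplementary proofs.

$$y^T b = y^T (A X) = (A^T y)^T X \le B^T X.$$

The first equality uses the primal constraint $AX = b$. The inequality uses the dual constraint $A^T y \le B$, which holds component-wise, together with $X \ge 0$, so that each term $(A^T y)_l X_l$ is at most $B_l X_l$. In particular, if a feasible pair $(X, y)$ attains $y^T b = B^T X$, then $X$ is optimal for (9) and $y$ is optimal for (10).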
An implication of Theorem 3.1 is that the LP algorithm for GEM converges. It demonstrates that there always exist theoretically optimal transport plans (including the no-transport case) to maximally increase the systematic coherence between the initially unbalanced supplies and demands. However, Theorem 3.1 also indicates that the optimal transport plan may not be unique considering the weighted graph structure and initial supply and demand distributions.

Second, we carry out a probabilistic analysis of our LP (9) for GEM when $c_{ij}$ follows a distribution. Let us start with two motivating examples of ride-sourcing platforms described in Figure 4, in which each vertex represents a hexagonal urban area. The $w_{ij}$ represents the geographical distance or the traffic time, which may vary between each pair of supply at $v_j$ and demand at $v_i$.

Figure 4: Two examples to illustrate the importance of using random $c_{ij}$ in some cases. In Panel (a), two demands and two drivers can only be matched in the lower sub-figure since their corresponding pairs of distance are below a given threshold, whereas it is not the case in the upper sub-figure when the distances are larger than the threshold. In Panel (b), supply A is assigned to demand C in the lower sub-figure when the within-grid cost $c_{ii}$ is non-zero, and to demand B when $c_{ii} = 0$, guided by the optimal transport plan with smaller transport costs.

For the LP defined in (9), it is assumed that each component of $\tilde{C} = (\tilde{c}_1, \tilde{c}_2, \ldots, \tilde{c}_{N_1})^T$ is a non-negative random variable, whereas all elements in $A$ and $b$ in (9) are known. Let $z^*$ denote the optimal objective value of (9). Since $z^*$ is a function of $\tilde{C}$, it is also a random variable. We provide an upper bound for the expectation of $z^* = z^*(\tilde{C})$ below.

Theorem 3.2 (Expectation Bound).
Let ˜ c , . . . , ˜ c N be independent non-negative ran-dom variables. Suppose there exist α ∈ (0 , ∞ ) and α ∈ (0 , such that for l = 1 , , . . . , N and all h > with P ( λ ˜ c l ≥ h ) > , we have E ( λ ˜ c l | λ ˜ c l ≥ h ) ≥ α λE (˜ c l ) + α h, (13) where the expectation is taken with respect to ˜ c l . Let { ˆ x , . . . , ˆ x N + N } be any fixed feasiblesolution to (9). We have E ( z ∗ ) ≤ α − { N (cid:88) l =1 (1 − α δ l ) E ( λ ˜ c l )ˆ x l + N + N (cid:88) l = N +1 ˆ x l } , (14) where δ l ∈ [0 , defined in the supplementary document is a pre-defined nonnegative con-stant for each l ∈ { , . . . , N } . Theorem 3.2 has at least two implications. First, condition (13) holds under some mildconditions. For instance, it can be shown that if ˜ c j is a bounded random variable that takesvalues in [ c j,L , c j,U ] such that P (˜ c j > c j,L ) > h → h − P ( λ ˜ c j < h + λc j,L ) > c j include uniform, truncated normal, andtruncated exponential random variables, among others. For instance, we consider the casethat ˜ c j follows Uniform [ c j,L , c j,U ]. It can be shown that E ( λ ˜ c j | λ ˜ c j ≥ h ) = 0 . λc j,U + h ),yielding α = c j,U / ( c j,U + c j,L ) and α = 0 .
5. Second, (14) gives an upper bound of the expected value of $z^*(\widetilde{C})$. If we set $\delta_l = 0$ for all $l$, then we obtain a larger upper bound than the right-hand side of (14). This result generalizes an existing result of Dyer et al. (1986) for standard linear programs with random costs under a stronger condition corresponding to $\alpha = 1$.

Third, we examine the metric properties of $\rho_\lambda((\cdot), (\cdot) \mid G, C)$, including non-negativity, identity, symmetry, and the triangle inequality.
Theorem 3.3. The operator $\rho_\lambda((\cdot), (\cdot) \mid G, C)$ is a semi-metric, that is, it satisfies non-negativity, identity, and symmetry, but not necessarily the triangle inequality, when (i) $C = (c_{ij}) \in \mathbb{R}^{N \times N}$ is symmetric with $c_{ii} = 0$ for all $i$; and (ii) $j \in N_i$ if and only if $i \in N_j$.

Theorem 3.3 indicates that if $C$ is symmetric, then $\rho_\lambda((\cdot), (\cdot) \mid G, C)$, as a semi-metric, satisfies three properties: non-negativity, identity, and symmetry. Although the symmetry assumption on $C$ may not hold exactly for every pair of vertices, it should be valid for most of them. Thus, $\rho_\lambda((\cdot), (\cdot) \mid G, C)$ is approximately a semi-metric.

Fourth, we give upper and lower bounds of GEM and consider an additivity property in order to better understand how the transport costs and network structures affect GEM.
Theorem 3.4. The following properties hold:
(i) $\big|\, |\mu| - |\nu| \,\big| \le \rho_\lambda(\mu, \nu \mid G, C) \le (|\mu| + |\nu|)$.
(ii) Additivity Property. For a non-negative $\Delta$, when $C$ is symmetric, we have
$$|\rho_\lambda(\mu, \nu + \Delta \mid G, C) - \rho_\lambda(\mu, \nu \mid G, C)| \le N\Delta; \qquad |\rho_\lambda(\mu + \Delta, \nu \mid G, C) - \rho_\lambda(\mu, \nu \mid G, C)| \le N\Delta.$$

Property (i) shows that GEM is bounded from both above and below. By the additivity property, the GEM value can either increase or decrease under one-sided node-wise augmentation, depending on the weighted graph structure and the distribution of supply and demand. This indicates that applying a proper stimulus at selected vertices is more efficient than globally increasing supply resources.

Fifth, we examine the weak convergence property of $\rho_\lambda((\cdot), (\cdot) \mid G, C)$.

Theorem 3.5 (Weak Convergence).
Let $\{\mu_n\}$ be a sequence of measures on the space $V$, with $\mu_n, \mu \in \mathcal{M}_+(V)$. If all the transport costs are bounded, that is, $c_{ij} \le R$ holds for all $v_i \in V$ and $v_j \in N_i$, then the following holds: if $\mu_n \to \mu$ and $\{\mu_n\}$ is tight, then $\rho_\lambda(\mu, \mu_n \mid G, C) \to 0$.

Here is an immediate corollary of Theorem 3.5, which guarantees the continuity of $\rho_\lambda(\mu, \nu \mid G, C)$ on $(G, W, C)$:

Corollary 3.5.1.
Let $\{\mu_n\}$ and $\{\nu_n\}$ be two sequences of measures on the space $V$, with $\mu_n, \nu_n, \mu, \nu \in \mathcal{M}_+(V)$. If $c_{ij} \le R$ holds for all $v_i \in V$ and $v_j \in N_i$, then the following holds: if $\mu_n$ (resp. $\nu_n$) $\to \mu$ (resp. $\nu$) and $\{\mu_n\}$, $\{\nu_n\}$ are tight, then $\rho_\lambda(\mu_n, \nu_n \mid G, C) \to \rho_\lambda(\mu, \nu \mid G, C)$.

Theorem 3.5 states that the GEM value goes to 0, and no transport is required, as the initial distributions $\mu$ and $\nu$ get close to each other.

In this section, we examine the finite sample performance of GEM by carrying out three real data analyses: answer-rate prediction, the design of an order dispatching strategy, and policy assessment. Unless otherwise stated, we use the method described in subsection 2.4 to construct the dynamic weighted graph structure across time in all these analyses.
The data set that we use here includes both demand and idle driver information from April 21st to May 20th, 2018 in a large city H. We divide the whole city into $N = 800$ non-overlapping hexagonal sub-regions with a side length of 1400m to form the vertex set $V$. We let the directed edge weight $w_{ij}$ from $v_i$ to $v_j \in N_i$ be the distance between the centers of the two sub-regions, which is 2400m, if $v_j$ can be directly reached from $v_i$ through traffic without first passing through another vertex. Otherwise, $w_{ij} = \infty$. We compute the numbers of idle drivers and demands in each vertex per minute and then extract the dynamic supply-demand data set.

The aim of this data analysis is to examine whether the GEM-related measures, such as $\mathrm{DSr}_{it}$, are useful for predicting the order answer rate in ride-sourcing platforms. The order answer rate is defined as the number of orders accepted by drivers divided by the total number of orders in a fixed time interval. Specifically, we predict the log-value of the order answer rate of the incoming 10 (or 60) minutes by using historical metric values. We computed the Hellinger distance, the $L_2$ distance, the Wasserstein distance, and GEM for each 10-minute interval. The $L_2$ distance is calculated by using the numbers of orders and available drivers in all vertices across 10 consecutive 1-min timestamps. The Hellinger distance is calculated by normalizing the numbers of orders and available drivers in all vertices and across 10 consecutive 1-min timestamps into probability distributions. For the Wasserstein distance, we first normalize both supplies and demands at each one-minute time interval into two probability distributions and calculate their corresponding Wasserstein distance. Subsequently, we obtain the metric value over each 10-minute interval by aggregating the Wasserstein distances computed across the 10 included one-minute timestamps, using the weights $\sum_{v_i \in V} o_{it_k} \big/ \sum_{t_k \in \mathcal{T}} \sum_{v_i \in V} o_{it_k}$.
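As a rough illustration of the order-weighted aggregation just described, the sketch below combines per-minute distances into a single interval-level value. It assumes that $o_{it_k}$ is the number of orders at vertex $v_i$ in minute $t_k$; the function name `aggregate_by_order_weight` and the toy numbers are hypothetical and not taken from the paper.

```python
def aggregate_by_order_weight(distances, orders):
    """Aggregate per-minute metric values into one interval-level value.

    distances[k] is the metric value at minute t_k, and orders[k][i] is the
    (assumed) number of orders o_{i t_k} at vertex v_i in that minute.
    """
    if len(distances) != len(orders):
        raise ValueError("need one order vector per minute")
    # Numerator of each weight: total orders over all vertices at minute t_k.
    per_minute = [sum(minute) for minute in orders]
    # Denominator: total orders over all minutes and vertices in the interval.
    total = sum(per_minute)
    if total == 0:
        raise ValueError("weights are undefined when there are no orders")
    return sum(d * n / total for d, n in zip(distances, per_minute))


# Toy example: two minutes, two vertices; the busier minute gets weight 6/8.
example = aggregate_by_order_weight([0.2, 0.4], [[1, 1], [3, 3]])
```

Minutes with more orders contribute more to the interval-level value, so a quiet minute with an extreme distance does not dominate the aggregate.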
For GEM, we compute the supply-demand ratio map of $\mathrm{DSr}_{it}$ per minute and then we calculate $\mathrm{DSr}_{\mathcal{T}}(V)$ for each 10-minute interval. We split the supply-demand data set into a training data set consisting of observations from April 26th to May 11th, 2018, and a test data set consisting of observations from May 12th to May 21st, 2018. We use linear regression models to predict the log-value of the order answer rate of the incoming $j$-th 10 minutes for $j = 1, \ldots$, using the historical metric values in $p = 10$ 10-minute snapshots and those in the same time windows of the previous 5 days.

Figure 5: Results from the answer-rate prediction. Panel (a): comparisons of the log-value of real answer rates obtained from May 12th to May 18th, 2018 and their predictive values based on the $L_2$ distance, the Hellinger distance, the Wasserstein distance, and GEM. Panel (b): comparisons of day-wise RMSEs of answer rate prediction obtained from Monday to Sunday using the Hellinger distance, the Wasserstein distance, and GEM within the whole city area.

We use Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) as evaluation metrics to examine the prediction accuracy of all four compared metrics. Table 1 shows their corresponding RMSE and MAPE values based on the test data. Due to space limitations, we only provide the results at $t + 10$ and $t + 60$ minutes, which indicate the short-term and long-term prediction capacities of the four metrics. Moreover, we also include the results during the evening peak hours from 6 pm to 8 pm. For both the $t + 10$ and $t + 60$ cases, GEM significantly outperforms the other three metrics, which may not sufficiently capture the dynamic transport and systematic balance of the weighted graph structure.

Figure 5 (a) presents the real order answer rates and their predictive values in the last 7 test days (Tuesday to Monday) from May 12th to May 18th based on all four metrics for the $(t + 10)$ case.
Compared with all other methods, GEM shows higher consistency between the true and predicted answer rate values, especially for some abnormal extreme cases. Furthermore, Figure 5 (b) presents the histograms of RMSEs for the Hellinger distance, the Wasserstein distance, and GEM on each of the last seven days, indicating that GEM outperforms the other two metrics consistently on all seven days. Therefore, GEM is able to capture the short- and long-term variability within the coherence between the two spatio-temporal systems and has strong interpretation capacity for predicting future answer rates.

We consider the order dispatching problem of matching $N_o$ orders with $N_d$ available idle drivers, where $N_o$ and $N_d$ denote the total number of orders and that of idle drivers in the current timestamp, respectively. The edge weight $A(k, l)$ in the bipartite graph equals the expected earnings when pairing driver $l$ with order $k$. Let $x_{kl}$ be 1 if order $k$ is assigned to driver $l$ and 0 otherwise. The global order dispatching algorithm solves a bipartite matching problem as follows:

$$\arg\max_{x_{kl}} \sum_{k=0}^{N_d} \sum_{l=0}^{N_o} A(k, l)\, x_{kl}, \quad \text{s.t.} \quad \sum_{k=0}^{N_d} x_{kl} \le 1 \ \ \forall l; \quad \sum_{l=0}^{N_o} x_{kl} \le 1 \ \ \forall k; \quad x_{kl} = 0 \ \text{if} \ c_{kl} > \epsilon \ \ \forall k, l. \tag{15}$$

Figure 6: The order dispatch as a bipartite matching problem: (a) available orders and drivers prepared for pairing; (b) quantifying all the potential expected earnings $A(k, l)$ for all driver-order pairs $(k, l)$ that satisfy the dispatching constraints; and (c) finding the optimal one-to-one bipartite matching in order to maximize the total revenue.

See Figure 6 for a graphical illustration of (15). The constraints ensure that each order can be paired with at most one available driver and, similarly, each driver can be assigned to at most one order.
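To make the constraints in (15) concrete, here is a minimal sketch that solves a toy instance by exhaustive search. It is not the Kuhn-Munkres algorithm and would not scale beyond a handful of orders; the function name `best_dispatch` and all numbers are illustrative assumptions.

```python
from itertools import permutations


def best_dispatch(A, c, eps):
    """Exhaustively solve a toy instance of problem (15).

    A[k][l] is the edge weight for pairing order k with driver l, c[k][l] is
    the pick-up distance, and eps is the maximal pick-up distance.
    Returns (best total weight, list of (order, driver) pairs).
    """
    n_orders, n_drivers = len(A), len(A[0])
    # Each order picks a distinct driver or None (stay unassigned), so every
    # order and every driver appears in at most one pair.
    slots = list(range(n_drivers)) + [None] * n_orders
    best_value, best_pairs = 0, []
    for choice in permutations(slots, n_orders):
        pairs = [(k, l) for k, l in enumerate(choice) if l is not None]
        # Enforce x_kl = 0 whenever c_kl > eps.
        if any(c[k][l] > eps for k, l in pairs):
            continue
        value = sum(A[k][l] for k, l in pairs)
        if value > best_value:
            best_value, best_pairs = value, pairs
    return best_value, best_pairs


# Hypothetical instance with 3 orders and 3 drivers.
A = [[10, 8, 3], [9, 4, 5], [2, 7, 6]]
c = [[1.0, 2.0, 5.0], [4.0, 1.5, 2.0], [2.5, 1.0, 3.5]]
value, pairs = best_dispatch(A, c, eps=3.0)
```

In this toy instance the pairs (0, 2), (1, 0), and (2, 2) are excluded by the distance threshold, so the search only considers matchings built from the remaining six pairs. For realistic problem sizes, a polynomial-time assignment solver such as the KM algorithm is needed instead of enumeration.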
In practice, only drivers within a certain distance can serve the corresponding orders, which means that $x_{kl}$ is forced to be 0 when the distance between order $k$ and driver $l$, denoted as $c_{kl}$, is beyond the maximal pick-up distance $\epsilon$. The state-of-the-art algorithm for solving this kind of matching problem is the Kuhn-Munkres (KM) algorithm (Munkres 1957), which is used to solve the problem formulated here.

In this paper, we compare three different dispatching policies based on three different formulations of $A(k, l)$. The first one, as a baseline, only considers the immediate reward of assigning driver $l$ to order $k$, which is defined as

$$A^{(1)}(k, l) = \alpha_1 r_k - \alpha_2 c_{kl}, \tag{16}$$

where $r_k$ is the driver's earning by serving order $k$ and $c_{kl}$ is the pick-up distance between order $k$ and driver $l$. Moreover, $\alpha_1$ and $\alpha_2$ are tuning parameters such that the two terms are balanced to maximize drivers' earnings while reducing customers' waiting time.

The second one is given by

$$A^{(2)}(k, l) = \alpha_1 r_k - \alpha_2 c_{kl} + \alpha_3 \{\eta^{\Delta t_{lk}} V_1(s'_{lk}) - V_1(s_l)\}, \tag{17}$$

where $\eta$ is the discount factor and the additional term $\alpha_3 \{\eta^{\Delta t_{lk}} V_1(s'_{lk}) - V_1(s_l)\}$ is introduced to account for the long-term effects of current actions on drivers' future income (Xu et al. 2018). Let $V_1(s)$ be the expected earnings from now to the end of the day for a driver located at $s = (v, t)$, where $v \in V$ and $t$ is the current time. Moreover, $s_l = (v(l), t)$ and $s'_{lk} = (v(k), t + \Delta t_{lk})$ represent the current spatio-temporal state of driver $l$ and his/her estimated finishing state after serving order $k$, where $v(l) \in V$ is the current region of driver $l$ before order assignment, $v(k) \in V$ is the destination region of order $k$, and $\Delta t_{lk}$ denotes the total time required for driver $l$ to finish the whole process of serving order $k$.
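The edge weights (16) and (17) can be sketched as follows. The sketch assumes that the discount enters as $\eta$ raised to the power $\Delta t_{lk}$; the parameter names (`alpha1`, `alpha2`, `alpha3`) and the numeric values are hypothetical, and the value function is passed in as two plain numbers rather than learned from data.

```python
def baseline_weight(r_k, c_kl, alpha1, alpha2):
    """Immediate reward of pairing driver l with order k, as in (16)."""
    return alpha1 * r_k - alpha2 * c_kl


def value_aware_weight(r_k, c_kl, v_next, v_now, dt, eta,
                       alpha1, alpha2, alpha3):
    """Edge weight with the long-term term of (17).

    v_next stands for V(s'_lk), v_now for V(s_l), dt for the serving time
    Delta t_lk, and eta for the discount factor (assumed to act as eta**dt).
    """
    advantage = eta ** dt * v_next - v_now
    return baseline_weight(r_k, c_kl, alpha1, alpha2) + alpha3 * advantage


# Hypothetical numbers for a single driver-order pair.
w1 = baseline_weight(r_k=20.0, c_kl=2.0, alpha1=1.0, alpha2=0.5)
w2 = value_aware_weight(r_k=20.0, c_kl=2.0, v_next=100.0, v_now=70.0,
                        dt=2, eta=0.9, alpha1=1.0, alpha2=0.5, alpha3=0.5)
```

With these numbers the long-term term is positive, so the same driver-order pair receives a larger weight under the second formulation than under the baseline.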
If a driver becomes available for a new order immediately after finishing the ongoing one, then $\eta^{\Delta t_{lk}} V_1(s'_{lk}) - V_1(s_l)$ is the extra future earning for driver $l$ from serving order $k$ rather than staying idle.

The third one is given by

$$A^{(3)}(k, l) = A^{(2)}(k, l) + \alpha_4 \{\eta^{\Delta t_{lk}} V_2(s'_{lk}) - V_2(s_l)\}, \tag{18}$$

where $\alpha_4 \{\eta^{\Delta t_{lk}} V_2(s'_{lk}) - V_2(s_l)\}$ is further introduced to balance the supply-demand coherence. Moreover, $V_2(s) = \nu_t(v) - \tilde{\mu}_t(v)$ at $s = (v, t)$ is calculated from GEM in (7). The use of $V_2(\cdot)$ increases the probability that customers' requests can be quickly answered by nearby drivers, whereas $V_1(\cdot)$ ignores the interaction effects when multiple drivers are heading to the same location. Thus, when the future demand has already been fulfilled by drivers re-allocated by previously completed trips, assigning more drivers might decrease $V_2(\cdot)$ in the target location.

We use a comprehensive and realistic dispatch simulator, designed to reproduce the real online ride-sourcing system, to evaluate the three dispatching policies (16)-(18). The simulator models the transition dynamics of the supply and demand systems to mimic the real on-demand ride-hailing platform. The order demand distribution of the simulator is generated based on historical data. The driver supply distribution is initialized by historical data at the beginning of the day, and then evolves following the simulator's transition dynamics (including drivers getting online/offline, driver movement with passengers, and idle driver random movement) as well as the order dispatching policies. The differences between the simulated results and the real-world situation are less than 2% in terms of some important metrics, such as drivers' revenue, answer rate, and idle driver rate.

To compare the three dispatching policies, we randomly selected a specific city S, which usually has in total 150,000 to 200,000 ride demands per day.
We still divide the whole city area into $N = 800$ hexagonal vertices and use the geographical distance between two nearby grids as the edge weights. Furthermore, three different days, 2018/05/15 (Tuesday), 2018/05/18 (Friday), and 2018/05/19 (Saturday), were analyzed since, according to the historical data, the global order answer rates on weekdays are usually much lower than those at weekends. Both $V_1(\cdot)$ and $V_2(\cdot)$ values were obtained by taking the average of the same weekday or weekend from the previous four weeks since the platform has significant weekly periodicity. The length of time intervals that we used to compute $V_1(\cdot)$ and $V_2(\cdot)$ was set to be $T = 10$ minutes so that all the action windows inside share the same $V_1(\cdot)$ and $V_2(\cdot)$ values. Specifically, $V_2(\cdot)$ is obtained by aggregating the $|\mathcal{T}|$ consecutive $(\nu_{kt} - \tilde{\mu}_{kt})$ values. We applied the three dispatching policies with different edge weights to the simulator based on the same initial input and transition dynamics.

We set $\alpha_1 = 1$ and choose $\alpha_2$ so that $r_k$ and the pick-up distance $c_{kl}$ fall into comparable ranges. The term $r_k$ contributes more to the variations of $A(k, l)$ because of the constrained pick-up distance ($c_{kl} \le \epsilon$). Furthermore, we perform a grid search over a wide range of $(\alpha_3, \alpha_4)$ combinations to find the optimal solution, denoted as $(\alpha_3^*, \alpha_4^*)$, that maximizes average drivers' revenues for weekdays and weekends in the simulator. Specifically, we first fix $\alpha_4 = 0$ and use the bisection method to obtain a rough value range of length 0.1 for $\alpha_3$, starting from an initial interval with left endpoint 0. We then search over $\alpha_3$ within this range until finding the optimal $\alpha_3^*$ corresponding to the largest average drivers' revenue. Subsequently, we fix $\alpha_3^*$ and do a similar grid search to get the optimal $\alpha_4^*$.

Tables 2 and 3 summarize the collected results corresponding to the baseline policy, $A^{(2)}(k, l)$ with the optimal $\alpha_3$, and our approach with different $\alpha_4$ values.
It reveals that the order dispatching policy based on $A^{(3)}(k, l)$ could achieve higher drivers' revenue and answer rate compared with the other two policies. The optimal $\alpha_3$ is achieved at 0.54, 0.61, and 0.52 for 2018/05/15, 2018/05/18, and 2018/05/19, respectively. On 2018/05/15 and 2018/05/19, we obtain a smaller optimal $\alpha_3$ since a higher coherence between supplies and demands is achieved under the baseline policy ($\alpha_3 = \alpha_4 = 0$) than on 2018/05/18, which indicates that the supply-demand relationship is more related to the policy efficiency than the weekday/weekend status. Moreover, the supply abundance on 2018/05/15 and 2018/05/18 results in a higher order answer rate but a smaller optimal $\alpha$. Compared to the policy corresponding to $A^{(2)}(k, l)$, adding the GEM-related measurements increases the expected whole-day answer rate and drivers' revenue by more than 1%. It may further indicate that the supply-demand difference may affect the expected future gain of a marginal driver.

In practice, we first perform a grid search over a wide range of $(\alpha_3, \alpha_4)$ combinations to find the optimal solution $(\alpha_3^*, \alpha_4^*)$ that maximizes average drivers' revenues for some representative days in the simulator. Then we fine-tune the parameters via on-line A/B testing, and apply the policy in the real-life dispatching system. The value functions $V_1(\cdot)$ and $V_2(\cdot)$ are updated after the new policy has been employed for a period of time, and $\alpha_3^*$ and $\alpha_4^*$ are re-tuned in the real environment to achieve the optimal efficiency.

We conduct an experiment using another supply-demand data set of the same city H from December 3rd to December 16th, 2018 in order to compare the effectiveness of two order dispatching policies. We executed them alternately on successive half-hourly time intervals. Specifically, we start with the baseline policy in the first half hour, change the policy every half hour throughout the whole day, and reverse their order on another day. We include an A/A test, which compares the baseline policy against itself, by using the historical data obtained from November 12th to November 25th as a direct comparison. We calculate GEM within each time window of 30 minutes as follows.
There are in total $M_T = 48$ time intervals per day. To obtain GEM in each time interval $\mathcal{T}$, we aggregate 30 GEM values, each of which is calculated within a 1-min timestamp, by using the normalization weights $o_{it} \big/ \sum_{t \in \mathcal{T}} \sum_{v_i \in V} o_{it}$.

We first need to introduce some notation. We denote $y_m(t_k)$ as the aggregated GEM value and use $x_m(t_k)$ to denote a $2 \times 1$ covariate vector in the $k$-th time interval of day $m$ for $k = 1, \ldots, M_T$ and $m = 1, \ldots, M_D$. Let $a_m(t_k) = 1$ if the new policy is used and $-1$ otherwise. We consider the model

$$y_m(t_k) = \beta_0(t_k) + \beta_1(t_k)^T \{x_m(t_k) - \bar{x}(t_k)\} + \beta_2(t_k)\, a_m(t_k) + \eta_m(t_k) + \varepsilon_m(t_k), \tag{19}$$

where $\beta(t_k) = (\beta_0(t_k), \beta_1(t_k)^T, \beta_2(t_k))^T$ is a vector of regression coefficients at $t_k$, and $\bar{x}(t_k)$ is the sample mean of all $x_m(t_k)$ for $k = 1, \ldots, M_T$. In addition, we assume that $\eta_m = (\eta_m(t_1), \ldots, \eta_m(t_{M_T}))^T$ and $\varepsilon_m = (\varepsilon_m(t_1), \ldots, \varepsilon_m(t_{M_T}))^T$ are $M_T \times 1$ random vectors following $N(0, \Sigma_\eta)$ and $N(0, \sigma_\varepsilon \cdot I_{M_T})$, respectively, where $\Sigma_\eta$ is an $M_T \times M_T$ matrix and $\sigma_\varepsilon$ is a positive scalar. We are interested in testing the following null and alternative hypotheses:

$$H_0: \int_0^{M_T} \beta_2(t)\, dt = 0 \quad \text{v.s.} \quad H_1: \int_0^{M_T} \beta_2(t)\, dt \neq 0, \tag{20}$$

where $\int_0^{M_T} \beta_2(t)\, dt \approx \sum_{k=1}^{M_T} \beta_2(t_k) \Delta t$ denotes the average treatment effect per day, in which $\Delta t$ is the length of each time interval. We propose a joint estimation procedure based on Generalized Estimating Equations (GEE) to iteratively estimate all unknown parameters until a specific convergence criterion is reached (Liang & Zeger 1986). Subsequently, we compute the $t$-test statistic associated with the average treatment effect per day and its corresponding one-sided (or two-sided) $p$-value (Mancl & DeRouen 2001).

Furthermore, we consider three global operational metrics, namely the order answer rate, the order finishing rate, and the gross merchandise value (GMV), as $y_m(t_k)$ in model (19).
We fit the corresponding three regression models in order to study whether the new dispatching policy significantly improves the ride-sourcing platform at the operational level.

Table 4 summarizes all regression analysis results for both the A/A and A/B experimental designs. In the A/B experimental design, there is a significant increase in the mean answer rate, finishing rate, and gross merchandise value when replacing the old policy with the new one, since the $p$-values associated with the average treatment effect are all at most 4.32e-3 (Table 4). The new policy also significantly reduces the GEM value ($p$-value 4.06e-2 in Table 4). Figure 7 (A) highlights the period from 7:00 to 8:00 a.m., and Figure 7 (B) shows the heatmaps of the vertex-wise supply-demand difference $\mathrm{DSd}_{it}$ within the same time period under the control and experimental policies, respectively. The customer requests in three selected regions marked by green and purple circles were satisfied by the drivers in nearby regions, resulting in a higher supply-demand coherence and thus a smaller GEM value.

References
Ambrosio, L. & Gangbo, W. (2008), 'Hamiltonian ODEs in the Wasserstein space of probability measures', Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences (1), 18–53.
Ambrosio, L., Gigli, N. & Savaré, G. (2008), Gradient flows: in metric spaces and in the space of probability measures, Springer Science & Business Media.
Arjovsky, M., Chintala, S. & Bottou, L. (2017), Wasserstein generative adversarial networks, in 'International Conference on Machine Learning', pp. 214–223.
Cha, S.-H. (2007), 'Comprehensive survey on distance/similarity measures between probability density functions', City (2), 1.
Chizat, L., Peyré, G., Schmitzer, B. & Vialard, F.-X. (2018), 'Scaling algorithms for unbalanced optimal transport problems', Mathematics of Computation (314), 2563–2609.
Dyer, M. E., Frieze, A. M. & McDiarmid, C. J. H. (1986), 'On linear programs with random costs', Mathematical Programming, 3–16.
Grenander, U. & Miller, M. (2007), Pattern Theory: From Representation to Inference, Oxford University Press.
Lacombe, T., Cuturi, M. & Oudot, S. (2018), Large scale computation of means and clusters for persistence diagrams using optimal transport, in S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi & R. Garnett, eds, 'Advances in Neural Information Processing Systems 31', Curran Associates, Inc., pp. 9770–9780.
URL: http://papers.nips.cc/paper/8184-large-scale-computation-of-means-and-clusters-for-persistence-diagrams-using-optimal-transport.pdf
Levin, D. A. & Peres, Y. (2017), Markov chains and mixing times, Vol. 107, American Mathematical Soc.
Liang, K.-Y. & Zeger, S. L. (1986), 'Longitudinal data analysis using generalized linear models', Biometrika (1), 13–22.
Liao, Z. (2003), 'Real-time taxi dispatching using global positioning systems', Communications of the ACM (5), 81–83.
Liero, M., Mielke, A. & Savaré, G. (2018), 'Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures', Inventiones mathematicae (3), 969–1117.
Mancl, L. A. & DeRouen, T. A. (2001), 'A covariance estimator for GEE with improved small-sample properties', Biometrics (1), 126–134.
Munkres, J. (1957), 'Algorithms for the assignment and transportation problems', Journal of the Society for Industrial and Applied Mathematics (1), 32–38.
Nikulin, M. S. (2001), 'Hellinger distance', Encyclopedia of Mathematics.
Piccoli, B. & Rossi, F. (2014), 'Generalized Wasserstein distance and its application to transport equations with source', Archive for Rational Mechanics and Analysis (1), 335–358.
Rabin, J. & Papadakis, N. (2015), Convex color image segmentation with optimal transport distances, in 'International Conference on Scale Space and Variational Methods in Computer Vision', Springer, pp. 256–269.
Rubner, Y., Tomasi, C. & Guibas, L. J. (1998a), Adaptive color-image embeddings for database navigation, in 'Asian Conference on Computer Vision', Springer, pp. 104–111.
Rubner, Y., Tomasi, C. & Guibas, L. J. (1998b), A metric for distributions with applications to image databases, in 'Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271)', IEEE, pp. 59–66.
Rubner, Y., Tomasi, C. & Guibas, L. J. (2000), 'The earth mover's distance as a metric for image retrieval', International Journal of Computer Vision (2), 99–121.
Shi, J. & Wang, Y. (2020), 'Hyperbolic Wasserstein distance for shape indexing', IEEE Trans Pattern Anal Mach Intell., in press.
Solomon, J., Rustamov, R., Guibas, L. & Butscher, A. (2014), Wasserstein propagation for semi-supervised learning, in 'International Conference on Machine Learning', pp. 306–314.
Srivastava, A. & Klassen, E. P. (2016), Shapes and Diffeomorphisms, Springer Series in Statistics.
Steele, J. M. (1987), Probability Theory and Combinatorial Optimization, Society for Industrial and Applied Mathematics.
Villani, C. (2008), Optimal transport: old and new, Vol. 338, Springer Science & Business Media.
Wang, H. & Yang, H. (2019), 'Ridesourcing systems: A framework and review', Transportation Research Part B: Methodological, 122–155.
Xu, Z., Li, Z., Guan, Q., Zhang, D., Li, Q., Nan, J., Liu, C., Bian, W. & Ye, J. (2018), Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach, in 'Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining', ACM, pp. 905–913.
Younes, L. (2010), Shapes and Diffeomorphisms, Springer.
Zhang, R. & Pavone, M. (2016), 'Control of robotic mobility-on-demand systems: a queueing-theoretical perspective',
The International Journal of Robotics Research (1-3), 186–203.

Table 1: Results from the answer-rate prediction. Comparisons of Hellinger, $L_2$-distance, Wasserstein, and GEM in predicting answer rate at $t + 10$ and $t + 60$ minutes. Peak hour denotes the time from 6 pm to 8 pm. MAPE and RMSE denote the mean absolute percentage error and root mean squared error, respectively.

Horizon   Period      Metric   Hellinger   L2-distance   Wasserstein   GEM
t+10      All time    RMSE     0.1362      0.1496        0.1273
t+10      All time    MAPE     0.0801      0.0891        0.0718
t+10      Peak hour   RMSE     0.2219      0.2187        0.2088
t+10      Peak hour   MAPE     0.1494      0.1457        0.1089
t+60      All time    RMSE     0.1522      0.1552        0.1413
t+60      All time    MAPE     0.0828      0.0868        0.0859
t+60      Peak hour   RMSE     0.2395      0.2565        0.2222
t+60      Peak hour   MAPE     0.1077      0.1159        0.1317

Table 2: Comparison of the three dispatching policies (16)-(18) with respect to two evaluation metrics, the drivers' revenue and the global answer rate, using the simulator for city S on two selected weekdays. The rows with $(\alpha_3, \alpha_4) = (0, 0)$ correspond to the first (or baseline) policy (16), those with $\alpha_4 = 0$ and $\alpha_3 \neq 0$ correspond to the second policy, and all other rows correspond to the third policy. The numbers in parentheses denote the relative improvement of the corresponding policy over the baseline policy for each evaluation metric.

Date                   $\alpha_3$   $\alpha_4$   Drivers' Revenue (Yuan)   Order Answer Rate
2018/05/15 (Tuesday)   0            0            1191316                   0.737
2018/05/15 (Tuesday)   0.54         0            1227175 (+3.01%)          0.760 (+3.12%)
2018/05/15 (Tuesday)   0.54         6            1235037 (+3.67%)          0.761 (+3.28%)
2018/05/15 (Tuesday)   0.54         7            1236824 (+3.82%)          0.763 (+3.54%)
2018/05/15 (Tuesday)   0.54         8

Table 3: Comparison of the three dispatching policies (16)-(18) with respect to two evaluation metrics, the drivers' revenue and the global answer rate, using the simulator for city S on a selected weekend.

Date                    $\alpha_3$   $\alpha_4$   Drivers' Revenue (Yuan)   Order Answer Rate
2018/05/19 (Saturday)   0            0            13507568                  0.745
2018/05/19 (Saturday)   0.52         0            13886185 (+2.80%)         0.768 (+3.09%)
2018/05/19 (Saturday)   0.52         6            14034453 (+3.90%)         0.774 (+3.89%)
2018/05/19 (Saturday)   0.52         7            14008847 (+3.71%)         0.772 (+3.62%)
2018/05/19 (Saturday)   0.52         8

Table 4: Relative improvement and $p$-value of average treatment effects for the A/A and A/B experiments.

Experiment Design   $y_m(t)$         Relative Improvement (%)   $p$-value
A/B                 Answer Rate      0.76                       1.16e-12
A/B                 Finishing Rate   0.36                       4.32e-3
A/B                 GMV              0.86                       2.91e-6
A/B                 GEM              -0.80                      4.06e-2
A/A                 Answer Rate      0.01                       0.96
A/A                 Finishing Rate   0.01                       0.96
A/A                 GMV              -0.08                      0.72
A/A                 GEM              -0.25                      0.43

Figure 7: Results from the policy evaluation. (A) The GEM values of a randomly selected day, 2018/12/08, for city H at the 30-min scale. Green and red points represent the GEM values generated by the baseline (control) and new (experimental) policies, respectively. In particular, we mark the time period from 7:00 to 8:00 a.m. by a blue circle, which demonstrates a significant reduction of the GEM value when changing the policy from the control one to the experimental one. (B) The heatmaps of the vertex-wise supply-demand difference $\mathrm{DSd}_{it}$ of city H under the control and experimental policies within the 30-min time window from 7:00 to 7:30 am and that from 7:30 to 8:00 am, respectively. Hexagons in red and blue represent the locations with positive and negative $\mathrm{DSd}_{it}$, respectively, and a deeper color corresponds to a larger $|\mathrm{DSd}_{it}|$.