Graph-Based Equilibrium Metrics for Dynamic Supply-Demand Systems with Applications to Ride-sourcing Platforms

Abstract

How to dynamically measure the local-to-global spatio-temporal coherence between demand and supply networks is a fundamental task for ride-sourcing platforms such as DiDi. Such a coherence measurement is critically important for quantifying market efficiency and for comparing different platform policies, such as dispatching. The aim of this paper is to introduce a graph-based equilibrium metric (GEM) to quantify the distance between demand and supply networks based on a weighted graph structure. We formulate GEM as the optimal objective value of an unbalanced transport problem, which can be solved efficiently as an equivalent linear program. We examine how GEM can help solve three operational tasks of ride-sourcing platforms. First, GEM achieves up to a 70.6% reduction in root-mean-square error over the second-best distance measurement in prediction accuracy. Second, using GEM to design the order dispatching policy increases the answer rate and drivers' revenue by more than 1%, a large improvement in absolute numbers. Third, GEM can serve as an endpoint for comparing different platform policies in A/B tests.


Fan Zhou
School of Statistics and Management, Shanghai University of Finance and Economics

Shikai Luo, Xiaohu Qie, Jieping Ye, Hongtu Zhu
Didi Chuxing

Abstract

The aim of this paper is to introduce a novel graph-based equilibrium metric (GEM) to quantify the distance between two discrete measures with possibly different masses on a weighted graph structure. This development is primarily motivated by dynamically measuring the local-to-global spatio-temporal coherence between demand and supply networks obtained from large-scale two-sided markets, such as ride-sourcing platforms and E-commerce. We formulate GEM as the optimal objective value of an unbalanced transport problem. Transport is only allowed among connected vertices satisfying certain constraints based on the weighted graph structure. The transport problem can be efficiently solved as an equivalent linear program. We also investigate several important GEM-related theoretical properties, such as metric properties and weak convergence. Furthermore, we use real and simulated data sets obtained from a real ride-sourcing platform to address three important problems of interest: predicting answer rate, large-scale order dispatching optimization, and policy assessment in ride-sourcing platforms.

Keywords: Graph-based Equilibrium Metric; Order Dispatching; Ride-sourcing Platform; Unbalanced Optimal Transport; Weighted Graph.

arXiv [math.OC], Feb

1 Introduction

Large volumes of data collected from multiple spatio-temporal networks are increasingly studied in diverse fields including climate science, social sciences, neuroscience, epidemiology, and transportation. In addition, those spatio-temporal networks may interact with each other across spatial and/or temporal dimensions. A typical example is that the dynamic demand and supply networks of a ride-sourcing platform (Wang & Yang 2019) are two sequences of un-normalized masses measured on the same undirected (or directed) graph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ are, respectively, a vertex set and a set of edges connecting vertex pairs. Figure 1 illustrates how the two complicated networks interact with each other and evolve over time. Specifically, a city is divided into hundreds of non-overlapping grids as the vertex set $\mathcal{V}$, with the edge structure $\mathcal{E}$ determined by road networks and location functionalities. Both demands and supplies are observed across grids at each time window with possibly different total masses and distributions. The ride-sourcing platform uses some order dispatching policy to match customer requests with possible surrounding idle drivers; after finishing their assigned orders, drivers return to the supply pool to prepare for the next feasible matching.

The aim of this paper is to address a fundamental question of interest for the demand and supply networks of two-sided markets: how to quantify the spatial equilibrium of dynamic supply-demand networks, particularly for ride-sourcing platforms (e.g., Uber and DiDi). To answer this question, we first introduce a weighted graph structure $(G, W, C)$ to characterize the transport network and transport costs of a city. Specifically, we divide each market into $N$ disjoint areas and regard them as vertices, denoted as $\mathcal{V} = \{v_1, \ldots, v_N\}$.
Let $\mathcal{E}$ be a set of edges between any possible pair of vertices such that $(v_i, v_j) \in \mathcal{E} \subset \mathcal{V} \times \mathcal{V}$ is an edge equipped with a nonnegative weight $w_{ij}$ (e.g., transportation cost). For all $(v_i, v_j) \notin \mathcal{E}$, we set $w_{ij} = \infty$. The weighted graph structure consists of an undirected (or directed) graph $G = (\mathcal{V}, \mathcal{E})$ as well as a weight matrix $W = (w_{ij})$, where the $w_{ij}$ are nonnegative weights. A graph-based transport cost from $v_i$ to $v_j$ is defined as

$$c_{ij} = \min_{K \ge 0,\ (i_k)_{k=0}^{K}:\, v_i \to v_j} \Big\{ \sum_{k=0}^{K-1} w_{i_k, i_{k+1}} : \forall k \in [\![0, K-1]\!],\ (v_{i_k}, v_{i_{k+1}}) \in \mathcal{E} \Big\}, \tag{1}$$

where $(i_k)_{k=0}^{K}: v_i \to v_j$ denotes any path on $G$ through $\mathcal{E}$ starting from $v_{i_0} = v_i$ and ending at $v_{i_K} = v_j$. Thus, $c_{ij}$ is the geodesic distance from $v_i$ to $v_j$, or the minimal cost of transporting one unit of object from $v_i$ to $v_j$, and we can define a transport cost matrix on $(G, W)$, denoted as $C = (c_{ij}) \in R^{N \times N}$. The matrix $C$ may be time-variant, since it depends on the real-time traffic and weather conditions for a ride-sourcing platform. It is also possibly asymmetric, since the graph $G$ can be directed. Throughout the paper, we consider the weighted graph structure $(G, W, C)$.

Second, we need to introduce a distance (or metric) to quantify the difference between demand and supply masses at each time interval and across time on $(G, W, C)$. At a given time interval, we define $\nu_j = \nu(v_j)$ and $\mu_j = \mu(v_j)$ as the point masses at vertex $v_j$ for the two measures $\nu$ and $\mu$, which, respectively, represent the number of customer requests and available drivers inside the vertex $v_j$ of the ride-sourcing platform (Wang & Yang 2019). The supply and demand systems at each timestamp can be modeled as two discrete Lebesgue measures $\mu$ and $\nu$ on $(G, W, C)$ with locally finite masses such that $\max(\mu(V), \nu(V))$ is finite for every compact set $V \subset \mathcal{V}$. We consider the general case in which the two measures can be unbalanced, that is, the total masses $\sum_{i=1}^{N} \mu_i$ and $\sum_{i=1}^{N} \nu_i$ may be unequal. Defining a metric between $\mu$ and $\nu$ falls into the field of optimal transport.

Optimal transport has been widely studied in diverse disciplines including statistics, applied mathematics, neuroscience, medical imaging, computational biology, and computer vision. Wasserstein-based metrics based on the mathematics of optimal mass transport have proved to be powerful tools for comparing objects in complex spaces. Some successful applications include solving transport partial differential equations (PDEs) (Ambrosio & Gangbo 2008, Ambrosio et al. 2008), measuring image similarities (Rubner et al. 1998a,b, 2000), image processing (Rabin & Papadakis 2015, Shi & Wang 2020), statistical inference in machine learning (Lacombe et al. 2018, Solomon et al. 2014), manifold diffeomorphisms (Grenander & Miller 2007, Srivastava & Klassen 2016, Younes 2010), and serving as the cost function for training Generative Adversarial Networks (Arjovsky et al. 2017), among many others.

Figure 1: Dynamic supply and demand networks at three time points in a representative ride-sourcing platform. We divide the whole city into multiple hexagon areas and each hexagon contains multiple supplies and demands.
However, existing Wasserstein-based metrics are not directly applicable to the comparison of two unbalanced measures defined on $(G, W, C)$, as detailed in Section 2.1. We introduce a novel graph-based equilibrium metric (GEM) and formulate it as an unbalanced optimal transport problem. The main contributions of this paper are summarized as follows:

• We propose a novel GEM, which can be regarded as a restricted generalized Wasserstein distance, to quantify the distance between dynamic demand and supply networks on the weighted graph structure. It not only allows the optimal transport to be guided by asymmetric costs and node connections, but also accounts for unbalanced masses. It also allows one of the two sides (supplies) to play the transporting role and the other (demands) to be fixed, which matches the physical interpretation of ride-sourcing platforms.

• Varying the size of each vertex leads to multilevel GEMs and their corresponding optimal transport functions. At the finest scale, our GEM reduces to solving an unbalanced assignment problem and its corresponding optimal transport function contains many local details. In contrast, at a relatively coarse scale, it gives a coarse representation (or low-frequency patterns) of the optimal transport function.

• Numerically, the calculation of GEM can be reformulated as a standard linear programming (LP) problem. Theoretically, we investigate several properties of our GEM including the convergence of the LP algorithm for computing GEM, the expectation of GEM, and the metric property, additive property, and weak convergence of GEM.

• We examine the finite sample performance of GEM by using real data sets obtained from a ride-sourcing platform in order to address three practical problems: the prediction of the efficiency of a given dispatching policy, the optimization of the order dispatching policy, and the A/B testing of different dispatching policies.

The remainder of this paper is structured as follows. Section 2 reviews various methodologies including existing Wasserstein-type distances, graph-based equilibrium metrics, the computational approach, and three important applications of GEM in ride-sourcing platforms. Section 3 studies four theoretical properties associated with GEM. Section 4 examines the finite sample performance of GEM by using real and simulated data sets.
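
The graph-based transport cost in (1) is an all-pairs shortest-path computation on the weighted graph. The following is a minimal sketch using the Floyd-Warshall recursion; it is not taken from the paper, the function name `geodesic_costs` and the toy weights are hypothetical, and it assumes a zero within-vertex cost (the trivial path that stays at $v_i$), which the paper later notes may not hold in some applications.

```python
import math


def geodesic_costs(weights):
    """All-pairs shortest-path costs c_ij from an edge-weight matrix.

    weights[i][j] is the nonnegative weight w_ij of edge (v_i, v_j),
    or math.inf when (v_i, v_j) is not an edge.
    """
    n = len(weights)
    # Start from the direct edge weights. The zero diagonal is an assumption:
    # it corresponds to allowing the trivial path that stays at v_i.
    cost = [[0.0 if i == j else weights[i][j] for j in range(n)]
            for i in range(n)]
    # Floyd-Warshall relaxation: allow paths through intermediate vertex k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = cost[i][k] + cost[k][j]
                if via_k < cost[i][j]:
                    cost[i][j] = via_k
    return cost


INF = math.inf
# Directed toy graph: v0 -> v1 -> v2 -> v0, plus a costly direct edge v0 -> v2.
W = [
    [INF, 1.0, 5.0],
    [INF, INF, 1.0],
    [2.0, INF, INF],
]
C = geodesic_costs(W)
print(C[0][2])           # 2.0: the path v0 -> v1 -> v2 beats the direct edge
print(C[0][1], C[1][0])  # 1.0 3.0: C is asymmetric on a directed graph
```

On a real platform the weights would change with traffic, so under this sketch the matrix would simply be recomputed for each timestamp.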

Many approaches have been proposed to measure the distance between two measures (or distributions) on a metric space. Most of them fall into two broad categories: the aggregation of pixel-wise differences and the transport cost of moving one measure to match the other. Measurements in the first category include the $L_p$-distance, the Total Variation (TV) distance (Levin & Peres 2017), and the Kullback-Leibler (KL) divergence (Cha 2007), among others. A typical example is the Hellinger distance (Nikulin 2001), reviewed as follows.

Definition 2.1. (Hellinger Distance)

Let $M(X)$ and $M_+(X)$ be the vector space of Radon measures and the cone of nonnegative Radon measures on a Hausdorff topological space $X$. Then, we use $\mu$ and $\nu$ to denote two probability measures that are absolutely continuous with respect to a third probability measure, written here as $\tau$. The square of the Hellinger distance between $\mu$ and $\nu \in M_+(X)$ is defined as

$$D_H^2(\mu, \nu) = \frac{1}{2} \int_X \Big( \sqrt{d\mu/d\tau} - \sqrt{d\nu/d\tau} \Big)^2 d\tau, \tag{2}$$

where $d\mu/d\tau$ and $d\nu/d\tau$ are the Radon-Nikodym derivatives of $\mu$ and $\nu$, respectively.

All these metrics suffer from two major issues; please refer to Figure 2 for details. First, all these metrics not only fail to consider the connections among different locations (vertices in the graph), but also ignore the topological (or geometric) structure of $X$. Second, the use of Hellinger-type distances requires a normalization step to enforce $\mu(X) = \int_X d\mu = \nu(X) = \int_X d\nu$, which can create a false balance issue.

To address these two issues, the second category of metrics, such as the Wasserstein distance (Ambrosio et al. 2008, Villani 2008), is proposed by solving an optimal transport problem. In the real world, original supply resources can usually be transported to achieve a better equilibrium between $\mu$ and $\nu$. All those distances have deep connections to well-studied assignment problems from combinatorial optimization (Steele 1987).

Definition 2.2. (Wasserstein Distance)

Let $X$ and $Y$ be Hausdorff topological spaces and $X \times Y$ be their product space. We introduce a lower semi-continuous function $c : X \times Y \to R \cup \{\infty\}$, a nonnegative measure (or a transport function) $\gamma \in M_+(X \times Y)$, and an equality constraint $\iota_{\{=\}}(\alpha \mid \beta)$, which is $0$ if $\alpha = \beta$ and $\infty$ otherwise. Then the optimal transport problem for measures $\mu \in M_+(X)$ and $\nu \in M_+(Y)$ with the same total masses, that is $\mu(X) = \nu(Y)$, can be defined as

$$D_W(\mu, \nu \mid c) = \inf_{\gamma \in M_+(X \times Y)} \Big\{ \int_{X \times Y} c\, d\gamma + \iota_{\{=\}}(P^X \gamma \mid \mu) + \iota_{\{=\}}(P^Y \gamma \mid \nu) \Big\}, \tag{3}$$

where $P^X \gamma$ and $P^Y \gamma$ denote the first and second marginals of $\gamma$, respectively.

Here $\gamma$ denotes a transport plan, and the optimal value measures how far one has to move the mass of $\mu$ to turn it into $\nu$. Standard optimal transport in (3) is only meaningful when $\mu$ and $\nu$ have the same total masses. Whenever $\mu(X) \neq \nu(Y)$, there is no feasible $\gamma$ in (3). For real-world ride-sourcing platforms, however, it is important to compute some sort of relaxed transportation between two arbitrary non-negative measures. An improved approach is to build an unbalanced optimal transport problem by introducing two divergences over $X$ and $Y$, denoted as $D_{\varphi_1}$ and $D_{\varphi_2}$ respectively (Chizat et al. 2018, Liero et al. 2018), which are defined as follows.

Definition 2.3. (Divergences). Let $\varphi$ be an entropy function. For $\mu, \nu \in M(T)$, let $\frac{d\mu}{d\nu}\nu + \mu^{\perp}$ be the Lebesgue decomposition of $\mu$ with respect to $\nu$. The divergence $D_\varphi$ is defined by

$$D_\varphi(\mu \mid \nu) := \int_T \varphi\Big(\frac{d\mu}{d\nu}\Big) d\nu + \varphi'_\infty\, \mu^{\perp}(T)$$

if $\mu$ and $\nu$ are nonnegative, and $\infty$ otherwise.

Now, we can give the formal definition of the Generalized Wasserstein Distance.

Definition 2.4. (Generalized Wasserstein Distance (GWD))

Let $c : X \times Y \to [0, \infty]$ be a lower semi-continuous function. The unbalanced optimal transport problem is

$$D_{\varphi_1, \varphi_2}(\mu, \nu \mid c) = \inf_{\gamma \in M_+(X \times Y)} \Big\{ \int_{X \times Y} c\, d\gamma + D_{\varphi_1}(P^X \gamma \mid \mu) + D_{\varphi_2}(P^Y \gamma \mid \nu) \Big\}. \tag{4}$$

Different from the standard Wasserstein distance, which normalizes the input measures into probability distributions, GWD quantifies the deviation of the marginals of the transport plan $\gamma$ from the two unbalanced measures $\mu$ and $\nu$ by using $\varphi$-divergences. Although $D_{\varphi_1, \varphi_2}(\mu, \nu \mid c)$ enjoys some nice properties, such as the metric property (Chizat et al. 2018, Liero et al. 2018), the solution to (4), denoted as $\gamma^*$, may not have any physical meaning. For the ride-sourcing business, such $\gamma^*$ is critically important for assigning supplies to demands since it can be regarded as the graph representation of a dispatching policy. Therefore, the use of $D_{\varphi_1, \varphi_2}(\mu, \nu \mid c)$ still cannot fully cover the 'useful' relative size between $\mu$ and $\nu$, since it may underestimate unmatched resources by allowing some infeasible transports; that is, the space $M_+(X \times Y)$ is too large to be useful. Three major issues of using $D_{\varphi_1, \varphi_2}(\mu, \nu \mid c)$ are given as follows.

• The first issue is that in many applications (e.g., ride-sourcing platforms), point masses in only one of the two measures are allowed to be transported and those in the other measure are fixed. In this case, the symmetric property does not hold.

• The second issue is that neither $W$ nor $C$ can be used to define a standard metric space on $G = (\mathcal{V}, \mathcal{E})$, since the transport cost (or weight) matrix may not satisfy the three key assumptions of standard metrics. For instance, the transport cost from $v_i$ to $v_j$ may be unequal to that from $v_j$ to $v_i$, since the transport cost matrix $C \in R^{N \times N}$ can be asymmetric for directed graphs. Moreover, the direct transport cost from $v_i$ to $v_j$ may be larger than the sum of the transport cost from $v_i$ to $v_k$ and that from $v_k$ to $v_j$.

• The third issue is that in some applications, such as supply-demand networks, the transport cost from $v_i$ to $v_j$ may not be a constant and the transport cost from a vertex to itself may not be zero. It is possible that supply units at vertex $v_i$ have their individual transport costs of moving within or outside the vertex $v_i$. Subsequently, their transport costs from $v_i$ to $v_j$ may follow a distribution instead of being a constant.

2.2 Graph-based Equilibrium Metrics

On $(G, W, C)$, we formally introduce our GEMs for two discrete measures $\mu$ and $\nu$ in $M_+(\mathcal{V})$, among which point masses in $\mu$ are allowed to be transported and those in $\nu$ are fixed. We need to introduce some notation. In this case, we have $X = Y = \mathcal{V}$ and use $P^{\mathcal{V}}_1 \gamma$ and $P^{\mathcal{V}}_2 \gamma$ to represent $P^X \gamma$ and $P^Y \gamma$, respectively. Let $|\mu| = \sum_{i=1}^{N} \mu(v_i)$ and $|\mu - \tilde\mu| = \sum_{i=1}^{N} |\mu(v_i) - \tilde\mu(v_i)|$. For $i = 1, \ldots, N$, we use $\mathcal{N}_i$ to denote the neighboring set of $v_i$ in $\mathcal{V}$, which contains $v_i$ and its (possibly high-order) neighboring vertices. Moreover, $v_i \in \mathcal{N}_j$ does not ensure $v_j \in \mathcal{N}_i$ since the traffic and road networks may constrain directly transporting cars from $v_j$ to $v_i$.

Let $c : \mathcal{V} \times \mathcal{V} \to R \cup \{\infty\}$ be a function and $\gamma \in M_+(\mathcal{V} \times \mathcal{V})$ be a nonnegative measure. The general form of our GEM on $(G, W, C)$ is written as

$$\rho_\lambda(\mu, \nu \mid G, C) = \inf_{\tilde\mu \in M_+(\mathcal{V}),\ \gamma \in M_+(\mathcal{V} \times \mathcal{V})} \Big\{ |\nu - \tilde\mu| + \lambda \int_{\mathcal{V} \times \mathcal{V}} c\, d\gamma \Big\} \tag{5}$$

subject to an equality constraint and two sets of transport constraints given by

$$|\mu| = |\tilde\mu|, \quad (P^{\mathcal{V}}_1 \gamma)(v_i) = \sum_{v_j \in \mathcal{N}_i} \gamma(v_i, v_j) = \mu_i, \quad \text{and} \quad (P^{\mathcal{V}}_2 \gamma)(v_i) = \sum_{v_i \in \mathcal{N}_j} \gamma(v_j, v_i) = \tilde\mu_i, \tag{6}$$

where $\lambda$ is a non-negative hyper-parameter. The three sets of constraints in (6) ensure that $\tilde\mu$, as an intermediate measure, shares the same total mass with $\mu$ and that $\gamma$ transports $\mu$ to $\tilde\mu$. Thus, the feasible set for (6) is much smaller than that for (4). The combination of $\lambda \int_{\mathcal{V} \times \mathcal{V}} c\, d\gamma$ and the three sets of constraints in (6) is equivalent to the balanced Wasserstein distance in (3), so GEM integrates the balanced Wasserstein distance with the $L_1$ norm.

In our GEM framework, one of the two measures plays the role of 'predator' to move and 'catch' the 'prey', which mimics the general supply-demand system of ride-sourcing platforms. Therefore, different from the setting of Piccoli & Rossi (2014), in which both measures are rescaled, we fix $\nu$ and change only $\mu$ to make the two sides match each other under the asymmetric distance and transport range constraints. In Figure 3, we consider two simple examples in order to understand the differences between GEM and GWD. Moreover, since we only consider the transport from $\mu$ to $\tilde\mu$ with $\nu$ fixed, $\rho_\lambda(\mu, \nu \mid G, C)$ is generally asymmetric and can be regarded as a restricted GWD.

Figure 3: Examples illustrating the differences between GEM and GWD. Panel (a): For GEM, the four units can be matched in the left sub-figure, whereas it is infeasible in the right one. There exists a directed edge from vertex A to vertex B, but not from vertex B to vertex A. Panel (b): In the top sub-figure, one 'demand' unit at vertex C cannot be matched (the upper line) for GEM since the transport from vertex A to vertex C is not allowed in this case, whereas in the bottom sub-figure, they can be transported for GWD. In panel (b), for GEM, it is assumed that the neighboring set $\mathcal{N}_i$ only includes the adjacent vertices of each vertex.

Besides GEM, the optimal solution of $(\tilde\mu, \gamma)$ to (5), denoted as $(\tilde\mu^*, \gamma^*)$, also plays an important role in various two-sided markets, such as ride-sourcing platforms and E-commerce. The $\tilde\mu^*$ can be regarded as an optimal dispatch of transporting supplies $\mu$ to match demands $\nu$, whereas $\gamma^*$ is an optimal transport function associated with $\rho_\lambda(\mu, \nu \mid G, C)$. If we vary the area of each vertex from the coarsest to the finest scale, then we obtain a multilevel GEM and its transport function. At the finest scale, our GEM reduces to solving an unbalanced assignment problem, so $\gamma^*$ is able to capture the local structure of the optimal transport function. In contrast, at a relatively coarse scale, we obtain a coarse representation of the optimal transport function, reflecting its global patterns.
We will discuss how to apply GEM to ride-sourcing platforms in Section 2.4.

Furthermore, we can simplify $\gamma$ by defining $\gamma = (\gamma_{ij})$ as an $N \times N$ flow matrix with $\gamma_{ij}$ being the transport amount from $v_i$ to $v_j$. Let $\Gamma$ represent the set consisting of all the feasible solutions $\gamma$ with all non-negative elements $\gamma_{ij} \ge 0$. Let $\tilde\mu = (\tilde\mu_1, \ldots, \tilde\mu_N)^T \in R^N$ represent the measure $\mu$ after transporting $\gamma$ such that $\tilde\mu_i = \sum_{v_i \in \mathcal{N}_j} \gamma_{ji}$ holds for all $i$. Therefore, our proposed GEM is equivalent to solving a discrete optimization problem as follows:

$$\rho_\lambda(\mu, \nu \mid G, C) = \min_{\gamma \in \Gamma} \Big\{ \|\nu - \tilde\mu\|_1 + \lambda \sum_{v_i \in \mathcal{V}} \sum_{v_j \in \mathcal{V}} c_{ij} \gamma_{ij} \Big\} \tag{7}$$

$$\text{subject to} \quad \sum_{v_j \in \mathcal{N}_i} \gamma_{ij} = \mu_i, \quad \sum_{v_j \notin \mathcal{N}_i} \gamma_{ij} = 0, \quad \text{and} \quad \tilde\mu_i = \sum_{v_i \in \mathcal{N}_j} \gamma_{ji} \quad \text{for } \forall v_i \in \mathcal{V},$$

where $\nu = (\nu_1, \ldots, \nu_N)^T \in R^N$ and $\|\cdot\|_1$ corresponds to the $L_1$ norm. Moreover, $\|\nu - \tilde\mu\|_1$ in (7) is equivalent to the first term of the objective function in (5).

There are two key advantages of using the derived form given in (7) compared to the existing unbalanced optimal transport problem. The first one is that transport is only allowed between a vertex $v_i$ and its neighboring set $\mathcal{N}_i$ based on $G$.

The second one is that $\lambda$, as a hyper-parameter, can balance the transport cost taken to reallocate point masses against the requirement of assigning $\mu$ to satisfy $\nu$. The choice of $\lambda$ in practice is data-driven. To ensure that the transport only happens among selected vertex pairs under the optimal transport plan, the theoretical upper bound of $\lambda$ is $2 / \max_{v_i \in \mathcal{V}, v_j \in \tilde{\mathcal{N}}_i}(c_{ij})$, where $\tilde{\mathcal{N}}_i \subset \mathcal{N}_i$ contains all the neighboring vertices of $v_i$ that receive transport from $v_i$. In this case, the cost of transporting one unit of supply from vertex $v_i$ to $v_j \in \tilde{\mathcal{N}}_i$, $\lambda c_{ij}$, is smaller than its contribution to reducing $\|\nu - \tilde\mu\|_1$, which is 2 (1 for each of $v_i$ and $v_j$), when $\nu_j - \mu_j \ge \mu_i - \nu_i \ge 1$. Transport from $v_i$ to $v_j$ keeps decreasing the objective value $\rho_\lambda(\mu, \nu \mid G, C)$ until either the balance in the destination vertex $v_j$ or that in the origin vertex $v_i$ is achieved. In the real world, we usually let $\lambda \max_{v_i \in \mathcal{V}, v_j \in \tilde{\mathcal{N}}_i}(c_{ij})$ fall into the range [0. , .5] with $c_{ij}$ being the geographical distance and $\tilde{\mathcal{N}}_i$ containing all the first-order adjacent vertices of $v_i$ in $(G, W, C)$, which can achieve the best performance in some practical problems such as the prediction of order answer rate, dispatching policy design, and A/B testing.
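
To make the role of the neighboring sets and of $\lambda$ in (7) concrete, the following toy sketch evaluates the objective by exhaustive search over integer flows on a three-vertex line graph. This is only an illustration under stated assumptions: it is not the paper's LP solver, the names `compositions` and `gem_bruteforce` and all numbers are hypothetical, and the search is restricted to integer flows, which is practical only for very small examples.

```python
from itertools import product


def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def gem_bruteforce(mu, nu, cost, neighbors, lam):
    """Minimise ||nu - mu_tilde||_1 + lam * sum_ij c_ij * gamma_ij.

    neighbors[i] lists the vertices that supply at v_i may be sent to
    (including i itself); every unit of mu[i] goes to exactly one of them,
    which enforces the transport constraints of (7) on integer flows.
    """
    n = len(mu)
    per_vertex = [list(compositions(mu[i], len(neighbors[i])))
                  for i in range(n)]
    best_value, best_flow = None, None
    for choice in product(*per_vertex):
        mu_tilde = [0] * n
        transport = 0.0
        flow = {}
        for i, alloc in enumerate(choice):
            for j, amount in zip(neighbors[i], alloc):
                if amount:
                    flow[(i, j)] = amount
                    mu_tilde[j] += amount
                    transport += cost[i][j] * amount
        mismatch = sum(abs(nu[j] - mu_tilde[j]) for j in range(n))
        value = mismatch + lam * transport
        if best_value is None or value < best_value:
            best_value, best_flow = value, flow
    return best_value, best_flow


# Line graph v0 - v1 - v2 with first-order neighboring sets (an assumption).
cost = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
mu = [2, 0, 0]   # two idle drivers at v0
nu = [0, 1, 1]   # one request at v1 and one at v2

print(gem_bruteforce(mu, nu, cost, neighbors, lam=0.5))
# (2.5, {(0, 0): 1, (0, 1): 1}): one driver moves to v1; v2 is out of range
print(gem_bruteforce(mu, nu, cost, neighbors, lam=3.0))
# (4.0, {(0, 0): 2}): lam * c_01 = 3 exceeds the gain of 2, so nothing moves
```

In the first call $\lambda c_{01} = 0.5 < 2$, so moving one unit pays off; the request at $v_2$ stays unmatched because $v_2 \notin \mathcal{N}_0$. In the second call $\lambda$ is above the bound $2 / \max c_{ij}$ and the best plan leaves all supply in place.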

The optimal solution $\gamma^*$ to (7) can be calculated by solving a standard linear programming (LP) problem. We will reformulate (7) as an LP problem below. Numerically, we use a revised simplex method from the C package GNU Linear Programming Kit (GLPK) to solve (7). We have found that GLPK works well in our real data analyses in Section 4.

We need to introduce some notation. Since the transport range constraints in (6) impose $\gamma_{ij} = 0$ for $v_j \notin \mathcal{N}_i$, we only need to assign optimal values to $\tilde\gamma = \mathrm{Vec}\{\gamma_{ij}, j \in \mathcal{N}_i\} \in R^{N_0 \times 1}$, where $N_0 = \sum_{i=1}^{N} n_i$, $n_i$ is the number of vertices in $\mathcal{N}_i$, and $\mathrm{Vec}(\cdot)$ denotes the vectorization of a matrix. With this simplification, the dimension of solvable variables is reduced from $O(N^2)$ to $O(N)$, which greatly increases the computational efficiency of our algorithm. Let $A_1$ and $A_2$ be two $N \times N_0$ matrices. The $i$-th row of $A_1$ consists of 0's except the $(\sum_{j=1}^{i-1} n_j + 1)$-th to $(\sum_{j=1}^{i} n_j)$-th elements being 1. Similarly, all the elements of the $i$-th row of $A_2$ are zeros except the $(\sum_{p=1}^{j-1} n_p + q)$-th element being 1 when grid $v_i$ is indexed by $q$ in the neighboring set $\mathcal{N}_j$ of vertex $v_j$. Let $\tilde C \in R^{N_0 \times 1}$ be the vector including the unit transport costs for all the corresponding $\gamma_{ij}$'s in $\tilde\gamma$. Moreover, we define

$$A = \begin{pmatrix} A_1 & 0 & 0 & 0 \\ A_2 & I_N & -I_N & 0 \\ A_2 & -I_N & 0 & I_N \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} \mu \\ \nu \\ \nu \end{pmatrix},$$

where $\mu = (\mu_1, \ldots, \mu_N)^T$, $A \in R^{3N \times (N_0 + 3N)}$, $b \in R^{3N}$, and $I_N$ is an identity matrix. Problem (7) is equivalent to

$$\min \{ \|\nu - A_2 \tilde\gamma\|_1 + \lambda \tilde C^T \tilde\gamma \} \quad \text{subject to} \quad A_1 \tilde\gamma = \mu \ \text{and} \ \tilde\gamma \ge 0.$$

With $S \in R^{N \times 1}$, it can be further transformed into a standard linear programming (LP) problem

$$\min \{ \mathbf{1}^T S + \lambda \tilde C^T \tilde\gamma \} \quad \text{subject to} \quad A_1 \tilde\gamma = \mu, \ A_2 \tilde\gamma + S \ge \nu, \ A_2 \tilde\gamma - S \le \nu, \ \tilde\gamma \ge 0, \ \text{and} \ S \ge 0. \tag{8}$$
The above LP can be further rewritten as

$$\min_X \{ B^T X \} \quad \text{subject to} \quad AX = b, \ X \ge 0, \tag{9}$$

where $B = (\lambda \tilde C^T, \mathbf{1}^T, \mathbf{0}^T, \mathbf{0}^T)^T$ and $X = (\tilde\gamma^T, S^T, w_1^T, w_2^T)^T$, in which $w_1$ and $w_2$ are vectors of slack variables. The dual of (9) is given by

$$\max_{y \in R^{3N}} \{ b^T y \} \quad \text{subject to} \quad A^T y \le B, \tag{10}$$

which further reduces the variable dimension from $N_0 + 3N$ to $3N$.

2.4 Applications of GEM in Ride-sourcing Platforms

To calculate GEM, we need to build a dynamic weighted graph structure over time for each city on the ride-sourcing platform as follows. We first divide a city into $|\mathcal{V}| = N$ non-overlapping hexagons and regard each hexagon as a vertex in $\mathcal{V}$. Then, we set $\mathcal{N}_i = \cup_{k=0} \mathcal{N}_i^k$, where $\mathcal{N}_i^k$ includes all the neighboring hexagons within the $k$-th outer layer of $v_i$ for $k > 0$ and $\mathcal{N}_i^0$ only includes $v_i$ itself. A vertex $v_j$ belongs to the $k$-th outer layer of $v_i$ if $k$ steps are required to walk from $v_i$ to $v_j$ on the hexagonal network. Thus, we determine $G = (\mathcal{V}, \mathcal{E})$. Second, we set $W_t = (w_{ijt})$, where $w_{ijt}$ is the distance between $v_i$ and $v_j$ in the $t$-th timestamp. Note that $w_{ijt}$ may vary with time due to the real-time locations of drivers and customers. Third, we compute $C_t = (c_{ijt})$ by using $W_t$ according to (1) in the $t$-th timestamp. Finally, we obtain the dynamic weighted graph structure $(G, W_t, C_t)$.

We show how to use GEM to address three important questions of interest in ride-sourcing platforms. First, we can measure the optimal distance between observed dynamic supply and demand networks across time. We extract the spatio-temporal data $O = \{(o_{it})\}_t$ and $D = \{(d_{it})\}_t$ from the dynamic demand and supply systems, where $o_{it}$ and $d_{it}$ represent demands and supplies at vertex $v_i$ in the $t$-th timestamp, respectively.
Given $O$ and $D$, we set $\mu_t = (d_{it})_i$ and $\nu_t = (o_{it})_i$ and use the LP algorithm to calculate $\rho(t) = \rho_\lambda(\mu_t, \nu_t \mid G, C_t)$ and its corresponding solution, denoted as $(\tilde\mu_t^* = (\tilde d_{it}^*)_i, \gamma_t^*)$, in the $t$-th timestamp.

Furthermore, we introduce an optimal supply-demand ratio at each $v_i$ in the $t$-th timestamp, defined as the ratio of $o_{it}$ over the 'optimal' supplies $\tilde d_{it}^* + \iota_{\{=\}}(\tilde d_{it}^* = 0)$ and denoted as $\mathrm{DSr}_{it}$, in which the extra term $\iota_{\{=\}}(\tilde d_{it}^* = 0)$ is added to avoid a zero denominator. Similarly, we can define an optimal supply-demand difference as $\mathrm{DSd}_{it} = o_{it} - \tilde d_{it}^*$ at each $(v_i, t)$. It allows us to create the spatiotemporal map of GEM-related measures $(\mathrm{DSr}_{it}, \mathrm{DSd}_{it})$. Furthermore, we extend $(\mathrm{DSr}_{it}, \mathrm{DSd}_{it})$ to a wide timespan $\mathcal{T}$ and a region $V \subset \mathcal{V}$. For instance, we define a weighted average supply-demand ratio over $V$ in $\mathcal{T}$ and a weighted average absolute supply-demand difference over $V$ in $\mathcal{T}$ as follows:

$$\mathrm{DSr}_{\mathcal{T}}(V) = \frac{\int_{t \in \mathcal{T}} \sum_{i \in V} w_{it}\, \mathrm{DSr}_{it}\, dt}{\int_{t \in \mathcal{T}} \sum_{i \in V} w_{it}\, dt} \quad \text{and} \quad \mathrm{ADSd}_{\mathcal{T}}(V) = \frac{\int_{t \in \mathcal{T}} \sum_{i \in V} w_{it}\, |\mathrm{DSd}_{it}|\, dt}{\int_{t \in \mathcal{T}} \sum_{i \in V} w_{it}\, dt}, \tag{11}$$

in which we set $w_{it}$ as either $o_{it}$ or $(o_{it} + \tilde d_{it}^*)/$ [...] $|\mathrm{DSd}_{it}|$ and $|\mathrm{DSr}_{it} - 1|$ across all $(v_i, t)$. Please see an application of $(\mathrm{DSr}_{it}, \mathrm{DSd}_{it})$ in Section 4.1 for details.

Second, we can use historical supply-demand information contained in $\{(\mathrm{DSr}_{it}, \mathrm{DSd}_{it}) : (v_i, t) \in \mathcal{V} \times \mathcal{T}\}$ to design order dispatching policies for large-scale ride-sourcing platforms. Order dispatch is an essential component of any ride-sourcing platform for assigning idle drivers to nearby passengers. Standard order dispatching approaches focus on immediate customer satisfaction, such as serving the order with the nearest drivers (Liao 2003) or the first-come-first-served strategy of serving the order at the top of the waiting list with the first driver to become available (Zhang & Pavone 2016).
Those greedy methods, however, fail to account for the spatial effects of an order and driver (O-D) pair on the other O-D pairs. Thus, they may not be optimal from a global perspective. To improve users' experience, some more advanced techniques strive to balance small pick-up distance against large drivers' revenue. To design a better dispatching policy, we will include additional historical supply-demand network information based on GEM to delineate its effects on the average expected gain from serving the current order. Please see Section 4.2 for details.

Third, an important application of $\{\rho(t)\}$ is to use it as a metric to directly compare two (or more) dispatching policies for ride-sourcing platforms. The key idea is to detect whether there exists a significant difference between two sets of GEMs for two competing policies under the same platform environment. Given the joint distribution of demand and supply in the platform, the smaller GEM is, the better many global operational metrics are, such as order answer rate, order finishing rate, and driver's working time. Compared with those global operational metrics, GEM is a more direct measurement of the operational efficiency of a ride-sourcing platform. Please see Section 4.3 for details.

In this section, we study the theoretical properties of our GEM-related methods proposed in Section 2, most of whose proofs can be found in the supplementary document.

First, we establish the convergence property of LP (9) for GEM.

Theorem 3.1.

The LP (9) has an optimal basic feasible solution. Furthermore, if $X$ is feasible for the primal problem (9) and $y$ is feasible for the dual (10), then we have

$$\bar z = y^T b = y^T A X \le B^T X = z. \tag{12}$$

If either (9) or (10) has a finite optimal value, then so does the other, the optimal values coincide, and the optimal solutions to both (9) and (10) exist.
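
The chain in (12) follows from $AX = b$, $A^T y \le B$, and $X \ge 0$. As a small numerical sketch of that inequality, the snippet below checks it on a hypothetical two-constraint LP in the same standard form as (9) and (10); the matrices and vectors are made up for illustration and are unrelated to any GEM instance in the paper.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def primal_feasible(A, b, X, tol=1e-9):
    """Check A X = b and X >= 0."""
    rows_ok = all(abs(dot(row, X) - bi) <= tol for row, bi in zip(A, b))
    return rows_ok and all(x >= -tol for x in X)


def dual_feasible(A, B, y, tol=1e-9):
    """Check A^T y <= B, one inequality per column of A."""
    columns = list(zip(*A))
    return all(dot(col, y) <= Bj + tol for col, Bj in zip(columns, B))


# Hypothetical standard-form LP: min B^T X  s.t.  A X = b, X >= 0.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
b = [2.0, 3.0]
B = [1.0, 3.0, 1.0]

# Any feasible pair satisfies weak duality: b^T y <= B^T X.
X = [1.0, 1.0, 2.0]
y = [0.5, 1.0]
assert primal_feasible(A, b, X) and dual_feasible(A, B, y)
print(dot(b, y), "<=", dot(B, X))    # 4.0 <= 6.0

# At an optimal pair the two objective values coincide.
X_opt = [2.0, 0.0, 3.0]
y_opt = [1.0, 1.0]
assert primal_feasible(A, b, X_opt) and dual_feasible(A, B, y_opt)
print(dot(b, y_opt), dot(B, X_opt))  # 5.0 5.0
```

Equal primal and dual values for a feasible pair certify that both are optimal, which is the practical use of (12) when checking a solver's output.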

An implication of Theorem 3.1 is that the LP algorithm for GEM converges. It demonstrates that there always exist theoretically optimal transport plans (including the no-transport case) that maximally increase the systematic coherence between the initially unbalanced supplies and demands. However, Theorem 3.1 also indicates that the optimal transport plan may not be unique, depending on the weighted graph structure and the initial supply and demand distributions.

Second, we carry out a probabilistic analysis of our LP (9) for GEM when $c_{ij}$ follows a distribution. We start with two motivating examples of ride-sourcing platforms described in Figure 4, in which each vertex represents a hexagonal urban area. The $w_{ij}$ represents the geographical distance or the travel time, which may vary between each pair of supply at $v_j$ and demand at $v_i$.

Figure 4: Two examples to illustrate the importance of using random $c_{ij}$ in some cases. In Panel (a), two demands and two drivers can only be matched in the lower sub-figure since their corresponding pairs of distance are below a given threshold, whereas it is not the case in the upper sub-figure when the distances are larger than the threshold. In Panel (b), supply A is assigned to demand C in the lower sub-figure when the within-grid cost $c_{ii}$ is non-zero, and to demand B when $c_{ii} = 0$, guided by the optimal transport plan with smaller transport costs.

For the LP defined in (9), it is assumed that each component of $\tilde C = (\tilde c_1, \tilde c_2, \ldots, \tilde c_{N_0})^T$ is a non-negative random variable, whereas all elements in $A$ and $b$ in (9) are known. Let $z^*$ denote the optimal objective value of (9). Since $z^*$ is a function of $\tilde C$, it is also a random variable. We provide an upper bound for the expectation of $z^* = z^*(\tilde C)$ below.

Theorem 3.2 (Expectation Bound).

Let $\tilde c_1, \ldots, \tilde c_{N_1}$ be independent non-negative random variables. Suppose there exist $\alpha_1 \in (0, \infty)$ and $\alpha_2 \in (0, 1]$ such that for $l = 1, 2, \ldots, N_1$ and all $h > 0$ with $P(\lambda \tilde c_l \ge h) > 0$, we have

$$E(\lambda \tilde c_l \mid \lambda \tilde c_l \ge h) \ge \alpha_1 \lambda E(\tilde c_l) + \alpha_2 h, \tag{13}$$

where the expectation is taken with respect to $\tilde c_l$. Let $\{\hat x_1, \ldots, \hat x_{N_1+N_2}\}$ be any fixed feasible solution to (9). We have

$$E(z^*) \le \alpha_2^{-1} \Big\{ \sum_{l=1}^{N_1} (1 - \alpha_1 \delta_l) E(\lambda \tilde c_l)\, \hat x_l + \sum_{l=N_1+1}^{N_1+N_2} \hat x_l \Big\}, \tag{14}$$

where $\delta_l \in [0, 1]$, defined in the supplementary document, is a pre-defined non-negative constant for each $l \in \{1, \ldots, N_1\}$.

Theorem 3.2 has at least two implications. First, condition (13) holds under some mild conditions. For instance, it can be shown to hold if $\tilde c_j$ is a bounded random variable that takes values in $[c_{j,L}, c_{j,U}]$ such that $P(\tilde c_j > c_{j,L}) > 0$, together with a condition on the behavior of $h^{-1} P(\lambda \tilde c_j < h + \lambda c_{j,L})$ as $h \to 0$. Examples of such $\tilde c_j$ include uniform, truncated normal, and truncated exponential random variables, among others. For instance, consider the case that $\tilde c_j$ follows Uniform$[c_{j,L}, c_{j,U}]$. It can be shown that $E(\lambda \tilde c_j \mid \lambda \tilde c_j \ge h) = 0.5(\lambda c_{j,U} + h)$, yielding $\alpha_1 = c_{j,U} / (c_{j,U} + c_{j,L})$ and $\alpha_2 = 0.5$. Second, (14) gives an upper bound on the expected value of $z^*(\tilde{C})$. If we set $\delta_l = 0$ for all $l$, then we obtain a larger upper bound than the right-hand side of (14). This result generalizes an existing result of Dyer et al. (1986) for standard linear programs with random costs under a stronger condition corresponding to $\alpha_1 = 1$.

Third, we examine the metric properties of $\rho_\lambda(\cdot, \cdot \mid G, C)$, including non-negativity, identity, symmetry, and the triangle inequality.

Theorem 3.3.

The operator $\rho_\lambda(\cdot, \cdot \mid G, C)$ is a semi-metric, in the sense that it satisfies non-negativity, identity, and symmetry, but not necessarily the triangle inequality, when (i) $C = (c_{ij}) \in R^{N \times N}$ is symmetric with $c_{ii} = 0$ for all $i$; and (ii) $j \in \mathcal{N}_i$ if and only if $i \in \mathcal{N}_j$.

Theorem 3.3 indicates that if $C$ is symmetric, then $\rho_\lambda(\cdot, \cdot \mid G, C)$, as a semi-metric, satisfies three properties: non-negativity, identity, and symmetry. Although the symmetry assumption on $C$ may not hold for all vertices, it should be valid for most vertices. Thus, $\rho_\lambda(\cdot, \cdot \mid G, C)$ is approximately a semi-metric.

Fourth, we give upper and lower bounds for GEM and consider an additivity property in order to better understand how the transport costs and network structures affect GEM.

Theorem 3.4.

The following properties hold:

(i) $\big|\, |\mu| - |\nu| \,\big| \le \rho_\lambda(\mu, \nu \mid G, C) \le (|\mu| + |\nu|)$.

(ii) Additivity Property. For a non-negative $\Delta$, when $C$ is symmetric, we have

$$|\rho_\lambda(\mu, \nu + \Delta \mid G, C) - \rho_\lambda(\mu, \nu \mid G, C)| \le N\Delta; \qquad |\rho_\lambda(\mu + \Delta, \nu \mid G, C) - \rho_\lambda(\mu, \nu \mid G, C)| \le N\Delta.$$

Property (i) shows that GEM can be bounded from both above and below. Based on the additivity property, the GEM value can either increase or decrease with one-sided node-wise augmentation, depending on the weighted graph structure and the distribution of supply and demand. This indicates that applying a proper stimulus at selected vertices is more efficient than globally increasing supply resources.

Fifth, we examine the weak convergence property of $\rho_\lambda(\cdot, \cdot \mid G, C)$.

Theorem 3.5 (Weak Convergence).

Let $\{\mu_n\}$ be a sequence of measures on the space $V$, with $\mu_n, \mu \in M_+(V)$. If all the transport costs are bounded, that is, $c_{ij} \le R$ holds for all $v_i \in V$ and $v_j \in \mathcal{N}_i$, then we have the following: if $\mu_n \to \mu$ and $\{\mu_n\}$ is tight, then $\rho_\lambda(\mu, \mu_n \mid G, C) \to 0$.

Here is an immediate corollary of Theorem 3.5, which guarantees the continuity of $\rho_\lambda(\mu, \nu \mid G, C)$ on $(G, W, C)$:

Corollary 3.5.1.

Let $\{\mu_n\}$ and $\{\nu_n\}$ be two sequences of measures on the space $V$, with $\mu_n, \nu_n, \mu, \nu \in M_+(V)$. If $c_{ij} \le R$ holds for all $v_i \in V$ and $v_j \in \mathcal{N}_i$, then we have the following: if $\mu_n$ (resp. $\nu_n$) $\to \mu$ (resp. $\nu$) and $\{\mu_n\}$, $\{\nu_n\}$ are tight, then $\rho_\lambda(\mu_n, \nu_n \mid G, C) \to \rho_\lambda(\mu, \nu \mid G, C)$.

Theorem 3.5 states that the GEM value goes to 0 and no transport is required when the initial distributions of $\mu$ and $\nu$ get close to each other.

In this section, we examine the finite sample performance of GEM by carrying out three real data analyses: answer-rate prediction, the design of an order dispatching strategy, and policy assessment. Unless otherwise stated, we use the method described in subsection 2.4 to construct the dynamic weighted graph structure across time in all these analyses.

The data set that we use here includes both demand and idle driver information from April 21st to May 20th, 2018 in a large city H. We divide the whole city into $N = 800$ non-overlapping hexagonal sub-regions with side length 1400m to form the vertex set $V$. We let the directed edge weight $w_{ij}$ from $v_i$ to $v_j \in \mathcal{N}_i$ be the distance between the centers of the two sub-regions, which is 2400m if $v_j$ can be directly reached from $v_i$ through traffic without first passing through another vertex. Otherwise, $w_{ij} = \infty$. We compute the numbers of idle drivers and demands in each vertex per minute and then extract the dynamic supply-demand data set.

The aim of this data analysis is to examine whether the GEM-related measures, such as $\mathrm{DSr}_{it}$, are useful for predicting the order answer rate in ride-sourcing platforms. The order answer rate is defined as the number of orders accepted by drivers divided by the total number of orders in a fixed time interval. Specifically, we predict the log-value of the order answer rate of the incoming 10 (or 60) minutes by using historical metric values. We computed the Hellinger distance, the $L_2$ distance, the Wasserstein distance, and GEM for each 10-minute interval. The $L_2$ distance is calculated by using the numbers of orders and available drivers in all vertices across 10 consecutive 1-min timestamps. The Hellinger distance is calculated by normalizing the numbers of orders and available drivers in all vertices and across 10 consecutive 1-min timestamps into probability distributions. For the Wasserstein distance, we first normalize both supplies and demands at each one-minute time interval into two probability distributions and calculate their corresponding Wasserstein distance. Subsequently, we obtain the metric value over each 10-minute interval by aggregating the Wasserstein distances computed across the 10 included one-minute timestamps by using their corresponding weights $\sum_{v_i \in V} o_{it_k} / \sum_{t_k \in \mathcal{T}} \sum_{v_i \in V} o_{it_k}$.
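The normalization and weighting steps just described can be sketched as follows. This is a minimal illustration, assuming that $o_{it_k}$ denotes the order count at vertex $v_i$ and minute $t_k$; the function names and toy numbers are hypothetical, and the per-minute distances are taken as given rather than computed.

```python
import math

def hellinger(p_counts, q_counts):
    """Hellinger distance between two count vectors, each normalized to sum to 1.

    Assumes both totals are positive.
    """
    p_total, q_total = sum(p_counts), sum(q_counts)
    s = sum((math.sqrt(p / p_total) - math.sqrt(q / q_total)) ** 2
            for p, q in zip(p_counts, q_counts))
    return math.sqrt(s / 2.0)

def flatten(counts_by_minute):
    """Stack per-minute vertex counts into one long vector (all vertices, all minutes)."""
    return [c for minute in counts_by_minute for c in minute]

def order_weighted_average(per_minute_values, orders_by_minute):
    """Aggregate per-minute metric values with weights proportional to each
    minute's total order count, as in the weighting formula above."""
    minute_totals = [sum(minute) for minute in orders_by_minute]
    grand_total = sum(minute_totals)
    return sum(v * o / grand_total
               for v, o in zip(per_minute_values, minute_totals))

# Toy data: 2 minutes x 2 vertices (hypothetical counts).
orders = [[1, 0], [2, 1]]
drivers = [[0, 1], [1, 2]]
h = hellinger(flatten(orders), flatten(drivers))
agg = order_weighted_average([1.0, 3.0], orders)  # minute totals are 1 and 3
```

Minutes with more orders contribute more to the aggregated value, so a quiet minute with a large distance does not dominate the 10-minute summary.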
For GEM, we compute the supply-demand ratio map of $\mathrm{DSr}_{it}$ per minute and then we calculate $\mathrm{DSr}_{\mathcal{T}}(V)$ for each 10-minute interval.

We split the supply-demand data set into a training data set consisting of observations from April 26th to May 11th, 2018, and a test data set consisting of observations from May 12th to May 21st, 2018. We use linear regression models to predict the log-value of the order answer rate of the incoming $j$-th 10-minute interval ($j = 1, 2, \ldots$), using the metric values from $p = 10$ 10-minute snapshots and those in the same time windows of the previous 5 days.

Figure 5: Results from the answer-rate prediction. Panel (a): comparisons of the log-value of real answer rates obtained from May 12th to May 18th, 2018 and their predictive values based on the $L_2$ distance, the Hellinger distance, the Wasserstein distance, and GEM. Panel (b): comparisons of day-wise RMSEs of answer rate prediction obtained from Monday to Sunday using the Hellinger distance, the Wasserstein distance, and GEM within the whole city area.

We use Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) as evaluation metrics to examine the prediction accuracy of all four compared metrics. Table 1 shows their corresponding RMSE and MAPE values based on the test data. Due to space limitations, we only provide the results at $t + 10$ and $t + 60$ minutes, which indicate the short-term and long-term prediction capacities of the four metrics. Moreover, we also include the results during the evening peak hours from 6 pm to 8 pm. For both the $t + 10$ and $t + 60$ cases, GEM significantly outperforms the other three metrics, which may not sufficiently capture the dynamic transport and systematic balance of the weighted graph structure.

Figure 5 (a) presents the real order answer rates and their predictive values in the last 7 test days (Tuesday to Monday) from May 12th to May 18th based on all four metrics for the $(t + 10)$ case.
Compared with all other methods, GEM shows higher consistency between the true and predicted answer rate values, especially for some abnormal extreme cases. Furthermore, Figure 5 (b) presents the histograms of RMSEs for the Hellinger distance, the Wasserstein distance, and GEM on each of the last seven days, indicating that GEM consistently outperforms the other two metrics on all seven days. Therefore, GEM is able to capture the short- and long-term variability within the coherence between the two spatial-temporal systems and has strong interpretation capacity for predicting future answer rates.

We consider the order dispatching problem of matching $N_o$ orders with $N_d$ available idle drivers, where $N_o$ and $N_d$ denote the total number of orders and that of idle drivers in the current timestamp, respectively. The edge weight $A(k, l)$ in the bipartite graph equals the expected earnings when pairing driver $l$ with order $k$. Let $x_{kl}$ be 1 if order $k$ is assigned to driver $l$ and 0 otherwise. The global order dispatching algorithm solves a bipartite matching problem as follows:

$$\arg\max_{x_{kl}} \sum_{k=0}^{N_d} \sum_{l=0}^{N_o} A(k, l)\, x_{kl}, \quad \text{s.t.} \quad \sum_{k=0}^{N_d} x_{kl} \le 1 \ \ \forall l; \quad \sum_{l=0}^{N_o} x_{kl} \le 1 \ \ \forall k; \quad x_{kl} = 0 \ \text{if} \ c_{kl} > \epsilon \ \ \forall k, l. \tag{15}$$

Figure 6: The order dispatch as a bipartite matching problem: (a) available orders and drivers prepared for pairing; (b) quantifying all the potential expected earnings $A(k, l)$ for all driver-order pairs $(k, l)$ that satisfy the dispatching constraints; and (c) finding the optimal one-to-one bipartite matching in order to maximize the total revenue.

See Figure 6 for a graphical illustration of (15). The constraints ensure that each order can be paired with at most one available driver and, similarly, each driver can be assigned to at most one order.
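To make the structure of (15) concrete, the following sketch solves a tiny instance by exhaustive enumeration. It is a brute-force illustration only, not the Kuhn-Munkres algorithm, and it scales exponentially; the function name `best_matching` and the toy numbers are hypothetical. Rows index orders $k$ and columns index drivers $l$.

```python
from itertools import permutations

def best_matching(A, c, eps):
    """Exhaustively solve a small instance of (15).

    A[k][l]: expected earning when order k is paired with driver l.
    c[k][l]: pick-up distance; pairs with c[k][l] > eps are forbidden.
    Returns (best_total, list of (order, driver) pairs).
    """
    n_orders, n_drivers = len(A), len(A[0])
    # Each driver appears once; None slots let an order stay unmatched.
    slots = list(range(n_drivers)) + [None] * n_orders
    best_total, best_pairs = 0, []
    for assignment in set(permutations(slots, n_orders)):
        pairs = [(k, l) for k, l in enumerate(assignment) if l is not None]
        if any(c[k][l] > eps for k, l in pairs):
            continue  # violates the maximal pick-up distance
        total = sum(A[k][l] for k, l in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs

A = [[10, 9],
     [9, 5]]
c = [[1, 5],
     [2, 1]]

near = best_matching(A, c, eps=3)   # pair (order 0, driver 1) is too far
far = best_matching(A, c, eps=10)   # no pair is excluded
```

With the tighter threshold the otherwise best pairing is excluded, so the achievable total drops; this is the trade-off that the distance constraint in (15) encodes.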
In practice, only drivers within a certain distance can serve the corresponding orders, which means that the $x_{kl}$'s are forced to be 0 when the distance between order $k$ and driver $l$, denoted as $c_{kl}$, is beyond the maximal pick-up distance $\epsilon$. The state-of-the-art algorithm for this kind of matching problem is the Kuhn-Munkres (KM) algorithm (Munkres 1957), which will be used to solve the formulated problem here.

In this paper, we compare three different dispatching policies based on three different formulations of $A(k, l)$. The first one, as a baseline, only considers the immediate reward of assigning driver $l$ to order $k$, which is defined as

$$A^{(1)}(k, l) = \alpha_1 r_k - \alpha_2 c_{kl}, \tag{16}$$

where $r_k$ is the driver's earning by serving order $k$ and $c_{kl}$ is the pick-up distance between order $k$ and driver $l$. Moreover, $\alpha_1$ and $\alpha_2$ are tuning parameters such that the two terms are balanced to maximize drivers' earnings while reducing customers' waiting time.

The second one is given by

$$A^{(2)}(k, l) = \alpha_1 r_k - \alpha_2 c_{kl} + \alpha_3 \{\eta^{\Delta t_{lk}} V_1(s'_{lk}) - V_1(s_l)\}, \tag{17}$$

where $\eta$ is the discount factor and the additional term $\alpha_3 \{\eta^{\Delta t_{lk}} V_1(s'_{lk}) - V_1(s_l)\}$ is introduced to account for the long-term effects of current actions on drivers' future income (Xu et al. 2018). Let $V_1(s)$ be the expected earnings from now to the end of the day for a driver located at $s = (v, t)$, where $v \in V$ and $t$ is the current time. Moreover, $s_l = (v(l), t)$ and $s'_{lk} = (v(k), t + \Delta t_{lk})$ represent the current spatial-temporal state of driver $l$ and his/her estimated finishing state after serving order $k$, where $v(l) \in V$ is the current region of driver $l$ before order assignment, $v(k) \in V$ is the destination region of order $k$, and $\Delta t_{lk}$ denotes the total time required for driver $l$ to finish the whole process of serving order $k$.
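The first two edge-weight formulations can be sketched as below. This is a minimal illustration under stated assumptions: the keyword arguments `w_reward`, `w_distance`, and `w_future` stand in for the tuning parameters in (16) and (17), the value function is a plain dictionary keyed by (region, time) states, and all numbers are hypothetical rather than taken from the paper.

```python
def immediate_weight(r_k, c_kl, w_reward=1.0, w_distance=0.5):
    """Edge weight of the form (16): weighted earning minus weighted pick-up distance."""
    return w_reward * r_k - w_distance * c_kl

def advantage(V, s_now, s_next, dt, eta=0.9):
    """Discounted value at the finishing state minus value at the current state."""
    return (eta ** dt) * V[s_next] - V[s_now]

def long_term_weight(r_k, c_kl, V, s_now, s_next, dt,
                     w_reward=1.0, w_distance=0.5, w_future=0.5, eta=0.9):
    """Edge weight of the form (17): immediate reward plus weighted advantage."""
    base = immediate_weight(r_k, c_kl, w_reward, w_distance)
    return base + w_future * advantage(V, s_now, s_next, dt, eta)

# Hypothetical value table: state = (region, time index).
V = {("a", 0): 10.0, ("b", 2): 20.0}

w1 = immediate_weight(30.0, 4.0)
adv = advantage(V, ("a", 0), ("b", 2), dt=2, eta=0.5)
w2 = long_term_weight(30.0, 4.0, V, ("a", 0), ("b", 2), dt=2,
                      w_future=2.0, eta=0.5)
```

A negative advantage, as in this toy case, lowers the edge weight of an order whose destination is expected to be less valuable than the driver's current position, so the matching step becomes less likely to select it.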
If a driver becomes available for a new order immediately after finishing the ongoing one, then $\eta^{\Delta t_{lk}} V_1(s'_{lk}) - V_1(s_l)$ is the extra future earning for driver $l$ from serving order $k$ rather than staying idle.

The third one is given by

$$A^{(3)}(k, l) = A^{(2)}(k, l) + \alpha_4 \{\eta^{\Delta t_{lk}} V_2(s'_{lk}) - V_2(s_l)\}, \tag{18}$$

where $\alpha_4 \{\eta^{\Delta t_{lk}} V_2(s'_{lk}) - V_2(s_l)\}$ is further introduced to balance the supply-demand coherence. Moreover, $V_2(s) = \nu_t(v) - \tilde\mu_t(v)$ at $s = (v, t)$ is calculated from GEM in (7). The use of $V_2(\cdot)$ increases the probability that customers' requests can be quickly answered by nearby drivers, whereas $V_1(\cdot)$ ignores the interaction effects when multiple drivers are heading to the same location. Thus, when the future demand has already been fulfilled by drivers re-allocated by previously completed trips, assigning more drivers might decrease $V_2(\cdot)$ in the target location.

We use a comprehensive and realistic dispatch simulator, designed to reproduce the real online ride-sourcing system, to evaluate the three dispatching policies (16)-(18). The simulator models the transition dynamics of the supply and demand systems to mimic the real on-demand ride-hailing platform. The order demand distribution of the simulator is generated based on historical data. The driver supply distribution is initialized by historical data at the beginning of the day, and then evolves following the simulator's transition dynamics (including drivers getting online/offline, driver movement with passengers, and idle driver random movement) as well as the order dispatching policies. The differences between the simulated results and the real-world situation are less than 2% in terms of some important metrics, such as drivers' revenue, answer rate, and idle driver rate.

To compare the three dispatching policies, we randomly selected a specific city S, which usually has in total 150,000 to 200,000 ride demands per day.
We still divide the whole city area into $N = 800$ hexagonal vertices and use the geographical distance between two nearby grids as the edge weights. Furthermore, three different days, 2018/05/15 (Tuesday), 2018/05/18 (Friday), and 2018/05/19 (Saturday), were analyzed, since the historical data show that the global order answer rates on weekdays are usually much lower than those on weekends. Both $V_1(\cdot)$ and $V_2(\cdot)$ values were obtained by taking the average of the same weekday or weekend from the previous four weeks, since the platform has significant weekly periodicity. The length of the time intervals that we used to compute $V_1(\cdot)$ and $V_2(\cdot)$ was set to be $T = 10$ minutes, so that all the action windows inside share the same $V_1(\cdot)$ and $V_2(\cdot)$ values. Specifically, $V_2(\cdot)$ is obtained by aggregating the $|\mathcal{T}|$ consecutive $(\nu_{kt} - \tilde\mu_{kt})$'s. We applied the three dispatching policies with different edge weights to the simulator, all based on the same initial input and transition dynamics.

We set $\alpha_1 = 1$ and fix $\alpha_2$ at a value that brings the order earning $r_k$ and the pick-up distance $c_{kl}$ into comparable ranges. The $r_k$ term contributes more to the variations of $A(k, l)$ because of the constrained pick-up distance ($c_{kl} \le \epsilon$). Furthermore, we perform a grid search over a wide range of $(\alpha_3, \alpha_4)$ combinations to find the optimal solution, denoted as $(\alpha_3^*, \alpha_4^*)$, that maximizes average drivers' revenues for weekdays and weekends in the simulator. Specifically, we fix $\alpha_4 = 0$ first and use the bisection method to obtain a rough value range of length 0.1 for $\alpha_3$, starting from an initial interval whose lower endpoint is 0. We then step through values of $\alpha_3$ within this range until finding the optimal $\alpha_3^*$ corresponding to the largest average drivers' revenue. Subsequently, we fix $\alpha_3^*$ and do a similar grid search to get the optimal $\alpha_4^*$.

Tables 2 and 3 summarize the collected results corresponding to the baseline policy, $A^{(2)}(k, l)$ with the optimal $\alpha_3$, and our approach with different $\alpha_4$ values.
They reveal that the order dispatching policy based on $A^{(3)}(k, l)$ could achieve higher drivers' revenue and answer rate compared with the other two policies. The optimal $\alpha_3$ is achieved at 0.54, 0.61, and 0.52 for 2018/05/15, 2018/05/18, and 2018/05/19, respectively. On 2018/05/15 and 2018/05/19, we obtain a smaller optimal $\alpha_3$ since a higher coherence between supplies and demands is achieved under the baseline policy ($\alpha_3 = \alpha_4 = 0$) than on 2018/05/18, which indicates that the supply-demand relationship is more related to the policy efficiency than the weekday/weekend status. Moreover, the supply abundance on 2018/05/15 and 2018/05/18 results in a higher order answer rate but a smaller optimal $\alpha$. Compared to the policy corresponding to $A^{(2)}(k, l)$, adding the GEM-related measurements increases the expected whole-day answer rate and drivers' revenue by more than 1%. It may further indicate that the supply-demand difference may affect the expected future gain of a marginal driver.

In practice, we first perform a grid search over a wide range of $(\alpha_3, \alpha_4)$ combinations to find the optimal solution $(\alpha_3^*, \alpha_4^*)$ that maximizes average drivers' revenues for some representative days in the simulator. Then we fine-tune the parameters via online A/B testing, and apply the policy in the real-life dispatching system. The value functions $V_1(\cdot)$ and $V_2(\cdot)$ are updated once the new policy has been employed for a period of time, and $\alpha_3^*$ and $\alpha_4^*$ are re-tuned in the real environment to achieve the optimal efficiency.

We conduct an experiment using another supply-demand data set of the same city H from December 3rd to December 16th, 2018 in order to compare the effectiveness of two order dispatching policies. We executed them alternately on successive half-hourly time intervals. Specifically, we start with the baseline policy in the first half hour, change the policy every half hour throughout the whole day, and reverse the order on the next day. We include an A/A test, which compares the baseline policy against itself, by using the historical data obtained from November 12th to November 25th as a direct comparison. We calculate GEM within each time window of 30 minutes as follows.
There are in total $M_T = 48$ time intervals per day. To obtain GEM in each time interval $\mathcal{T}$, we aggregate 30 GEM values, each of which is calculated within a 1-min timestamp, by using normalization weights $o_{it} / \sum_{t \in \mathcal{T}} \sum_{v_i \in V} o_{it}$.

We first need to introduce some notation. We denote $y_m(t_k)$ as the aggregated GEM value and use $x_m(t_k)$ to denote a $2 \times 1$ vector of covariates for the $k$-th time interval of day $m$, for $k = 1, \ldots, M_T$ and $m = 1, \ldots, M_D$. Let $a_m(t_k) = 1$ if the new policy is used and $= -1$ otherwise. We consider the model

$$y_m(t_k) = \beta_0(t_k) + \beta_1(t_k)^T \{x_m(t_k) - \bar{x}(t_k)\} + \beta_2(t_k)\, a_m(t_k) + \eta_m(t_k) + \varepsilon_m(t_k), \tag{19}$$

where $\beta(t_k) = (\beta_0(t_k), \beta_1(t_k)^T, \beta_2(t_k))^T$ is a vector of regression coefficients at $t_k$, and $\bar{x}(t_k)$ is the sample mean of all $x_m(t_k)$'s for $k = 1, \ldots, M_T$. In addition, we assume that $\eta_m = (\eta_m(t_1), \ldots, \eta_m(t_{M_T}))^T$ and $\varepsilon_m = (\varepsilon_m(t_1), \ldots, \varepsilon_m(t_{M_T}))^T$ are $M_T \times 1$ random vectors following $N(0, \Sigma_\eta)$ and $N(0, \sigma_\varepsilon^2 \cdot I_{M_T})$, where $\Sigma_\eta$ is an $M_T \times M_T$ matrix and $\sigma_\varepsilon$ is a positive scalar. We are interested in testing the following null and alternative hypotheses:

$$H_0: \int_0^{M_T} \beta_2(t)\, dt = 0 \quad \text{v.s.} \quad H_1: \int_0^{M_T} \beta_2(t)\, dt \ne 0, \tag{20}$$

where $\int_0^{M_T} \beta_2(t)\, dt \approx \sum_{k=1}^{M_T} \beta_2(t_k) \Delta t$ denotes the average treatment effect per day, in which $\Delta t$ is the length of each time interval. We propose a joint estimation procedure based on Generalized Estimating Equations (GEE) to iteratively estimate all unknown parameters until a specific convergence criterion is reached (Liang & Zeger 1986). Subsequently, we compute the $t$-test statistic associated with the average treatment effect per day and its corresponding one-sided (or two-sided) $p$-value (Mancl & DeRouen 2001).

Furthermore, we consider three global operational metrics, namely the order answer rate, the order finishing rate, and the gross merchandise value (GMV), as $y_m(t_k)$ in model (19).
We fit the corresponding three regression models in order to study whether the new dispatching policy significantly improves the ride-sourcing platform at the operational level.

Table 4 summarizes all regression analysis results for both the A/A and A/B experimental designs. We can see that in the A/B experimental design, there is a significant increase in the mean answer rate, finishing rate, and gross merchandise value when replacing the old policy with the new one, since the $p$-values associated with the average treatment effect are all small (at most 4.32e-3 in Table 4). The new policy also significantly reduces the GEM value ($p$-value 4.06e-2 in Table 4). Figure 7 (B) displays the vertex-wise supply-demand difference $\mathrm{DSd}_{it}$ within the same time period under the control and experimental policies, respectively. The customer requests in three selected regions marked by green and purple circles were satisfied by the drivers in nearby regions, resulting in higher supply-demand coherence and thus a smaller GEM value.

References
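The alternating assignment used in the A/B experiment, and the discrete approximation of the daily average treatment effect, can be sketched as follows. This is an illustrative sketch only: it assumes 48 half-hour intervals per day, a day index starting at 0, and an interval length expressed in hours; the function names are hypothetical, and the coefficient values passed in would come from a fitted model, which is not shown.

```python
def switchback_design(n_days, n_intervals=48):
    """Return a[m][k] = +1 if the new policy runs in interval k of day m, else -1.

    Even-numbered days start with the baseline (-1); the order is reversed
    on odd-numbered days. The policy switches every interval.
    """
    design = []
    for m in range(n_days):
        first = -1 if m % 2 == 0 else 1
        design.append([first * (-1) ** k for k in range(n_intervals)])
    return design

def average_daily_effect(treatment_coefs, dt=0.5):
    """Riemann-sum approximation of the integral of the time-varying
    treatment coefficient over one day."""
    return sum(b * dt for b in treatment_coefs)

design = switchback_design(2)
effect = average_daily_effect([0.1] * 48)  # hypothetical constant coefficient
```

Reversing the starting policy on alternate days means that, over a pair of days, every half-hour slot is observed under both policies, which helps separate the policy effect from time-of-day effects.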

Ambrosio, L. & Gangbo, W. (2008), ‘Hamiltonian ODEs in the Wasserstein space of probability measures’, Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences (1), 18–53.

Ambrosio, L., Gigli, N. & Savaré, G. (2008), Gradient flows: in metric spaces and in the space of probability measures, Springer Science & Business Media.

Arjovsky, M., Chintala, S. & Bottou, L. (2017), Wasserstein generative adversarial networks, in ‘International Conference on Machine Learning’, pp. 214–223.

Cha, S.-H. (2007), ‘Comprehensive survey on distance/similarity measures between probability density functions’, City (2), 1.

Chizat, L., Peyré, G., Schmitzer, B. & Vialard, F.-X. (2018), ‘Scaling algorithms for unbalanced optimal transport problems’, Mathematics of Computation (314), 2563–2609.

Dyer, M. E., Frieze, A. M. & McDiarmid, C. J. H. (1986), ‘On linear programs with random costs’, Mathematical Programming, 3–16.

Grenander, U. & Miller, M. (2007), Pattern Theory: From Representation to Inference, Oxford University Press.

Lacombe, T., Cuturi, M. & Oudot, S. (2018), Large scale computation of means and clusters for persistence diagrams using optimal transport, in S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi & R. Garnett, eds, ‘Advances in Neural Information Processing Systems 31’, Curran Associates, Inc., pp. 9770–9780.
URL: http://papers.nips.cc/paper/8184-large-scale-computation-of-means-and-clusters-for-persistence-diagrams-using-optimal-transport.pdf

Levin, D. A. & Peres, Y. (2017), Markov chains and mixing times, Vol. 107, American Mathematical Soc.

Liang, K.-Y. & Zeger, S. L. (1986), ‘Longitudinal data analysis using generalized linear models’, Biometrika (1), 13–22.

Liao, Z. (2003), ‘Real-time taxi dispatching using global positioning systems’, Communications of the ACM (5), 81–83.

Liero, M., Mielke, A. & Savaré, G. (2018), ‘Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures’, Inventiones mathematicae (3), 969–1117.

Mancl, L. A. & DeRouen, T. A. (2001), ‘A covariance estimator for GEE with improved small-sample properties’, Biometrics (1), 126–134.

Munkres, J. (1957), ‘Algorithms for the assignment and transportation problems’, Journal of the Society for Industrial and Applied Mathematics (1), 32–38.

Nikulin, M. S. (2001), ‘Hellinger distance’, Encyclopedia of Mathematics.

Piccoli, B. & Rossi, F. (2014), ‘Generalized Wasserstein distance and its application to transport equations with source’, Archive for Rational Mechanics and Analysis (1), 335–358.

Rabin, J. & Papadakis, N. (2015), Convex color image segmentation with optimal transport distances, in ‘International Conference on Scale Space and Variational Methods in Computer Vision’, Springer, pp. 256–269.

Rubner, Y., Tomasi, C. & Guibas, L. J. (1998a), Adaptive color-image embeddings for database navigation, in ‘Asian Conference on Computer Vision’, Springer, pp. 104–111.

Rubner, Y., Tomasi, C. & Guibas, L. J. (1998b), A metric for distributions with applications to image databases, in ‘Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271)’, IEEE, pp. 59–66.

Rubner, Y., Tomasi, C. & Guibas, L. J. (2000), ‘The earth mover’s distance as a metric for image retrieval’, International Journal of Computer Vision (2), 99–121.

Shi, J. & Wang, Y. (2020), ‘Hyperbolic Wasserstein distance for shape indexing’, IEEE Trans Pattern Anal Mach Intell., in press.

Solomon, J., Rustamov, R., Guibas, L. & Butscher, A. (2014), Wasserstein propagation for semi-supervised learning, in ‘International Conference on Machine Learning’, pp. 306–314.

Srivastava, A. & Klassen, E. P. (2016), Shapes and Diffeomorphisms, Springer Series in Statistics.

Steele, J. M. (1987), Probability Theory and Combinatorial Optimization, Society for Industrial and Applied Mathematics.

Villani, C. (2008), Optimal transport: old and new, Vol. 338, Springer Science & Business Media.

Wang, H. & Yang, H. (2019), ‘Ridesourcing systems: A framework and review’, Transportation Research Part B: Methodological, 122–155.

Xu, Z., Li, Z., Guan, Q., Zhang, D., Li, Q., Nan, J., Liu, C., Bian, W. & Ye, J. (2018), Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach, in ‘Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining’, ACM, pp. 905–913.

Younes, L. (2010), Shapes and Diffeomorphisms, Springer.

Zhang, R. & Pavone, M. (2016), ‘Control of robotic mobility-on-demand systems: a queueing-theoretical perspective’,

The International Journal of Robotics Research (1-3), 186–203.

Table 1: Results from the answer-rate prediction. Comparisons of Hellinger, $L_2$-distance, Wasserstein, and GEM in predicting answer rate at $t + 10$ and $t + 60$ minutes. Peak hour denotes the time from 6 pm to 8 pm. MAPE and RMSE denote the mean absolute percentage error and root mean squared error, respectively.

Columns: Hellinger, L2-distance, Wasserstein, GEM

t+10, All time, RMSE: 0.1362 0.1496 0.1273
t+10, All time, MAPE: 0.0801 0.0891 0.0718
t+10, Peak hour, RMSE: 0.2219 0.2187 0.2088
t+10, Peak hour, MAPE: 0.1494 0.1457 0.1089
t+60, All time, RMSE: 0.1522 0.1552 0.1413
t+60, All time, MAPE: 0.0828 0.0868 0.0859
t+60, Peak hour, RMSE: 0.2395 0.2565 0.2222
t+60, Peak hour, MAPE: 0.1077 0.1159 0.1317

Table 2: Comparison of the dispatching policies (16)-(18) with respect to two evaluation metrics, the drivers' revenue and the global answer rate, using the simulator for city S on two selected weekdays. The rows with $(\alpha_3, \alpha_4) = (0, 0)$ correspond to the first (or baseline) policy (16), those with $\alpha_4 = 0$ and $\alpha_3 \ne 0$ correspond to the second policy, and all other rows correspond to the third policy. The numbers in parentheses denote the relative improvement of the corresponding policy over the baseline policy for each evaluation metric.

2018/05/15 (Tuesday)
α_3    α_4   Drivers' Revenue (Yuan)   Order Answer Rate
0      0     1191316                   0.737
0.54   0     1227175 (+3.01%)          0.760 (+3.12%)
0.54   6     1235037 (+3.67%)          0.761 (+3.28%)
0.54   7     1236824 (+3.82%)          0.763 (+3.54%)
0.54   8

Table 3: Comparison of the dispatching policies (16)-(18) with respect to two evaluation metrics, the drivers' revenue and the global answer rate, using the simulator for city S on a selected weekend.

2018/05/19 (Saturday)
α_3    α_4   Drivers' Revenue (Yuan)   Order Answer Rate
0      0     13507568                  0.745
0.52   0     13886185 (+2.80%)         0.768 (+3.09%)
0.52   6     14034453 (+3.90%)         0.774 (+3.89%)
0.52   7     14008847 (+3.71%)         0.772 (+3.62%)
0.52   8

Table 4: Relative improvements and $p$-values of average treatment effects for the A/A and A/B experiments.

Experiment Design   y_m(t)           Relative Improvement (%)   p-value
A/B                 Answer Rate      0.76                       1.16e-12
A/B                 Finish Rate      0.36                       4.32e-3
A/B                 GMV              0.86                       2.91e-6
A/B                 GEM              -0.80                      4.06e-2
A/A                 Answer Rate      0.01                       0.96
A/A                 Finishing Rate   0.01                       0.96
A/A                 GMV              -0.08                      0.72
A/A                 GEM              -0.25                      0.43

Figure 7: Results from the policy evaluation. (A) The GEM values of a randomly selected day, 2018/12/08, for city H at the 30-min scale. Green and red points represent the GEM values generated by the baseline (control) and new (experimental) policies, respectively. In particular, we mark the time period 7:00 to 8:00 a.m. by a blue circle, which demonstrates a significant reduction of the GEM value when changing the policy from the control one to the experimental one. (B) The heatmaps of the vertex-wise supply-demand difference $\mathrm{DSd}_{it}$ of city H under the control and experimental policies within the 30-min time window from 7:00 to 7:30 am and that from 7:30 to 8:00 am, respectively. Hexagons in red and blue colors represent the locations with positive and negative $\mathrm{DSd}_{it}$, respectively, and a deeper color corresponds to a larger $|\mathrm{DSd}_{it}|$.