Optimal Vehicle Dispatching Schemes via Dynamic Pricing
An economic approach to vehicle dispatching for ride sharing
Mengjing Chen, Weiran Shen, Pingzhong Tang, and Song Zuo
IIIS, Tsinghua University∗
March 2, 2018
Abstract
Over the past few years, ride-sharing has emerged as an effective way to relieve traffic congestion. A key problem for these platforms is to come up with a revenue-optimal (or GMV-optimal) pricing scheme and an induced vehicle dispatching policy that incorporate geographic and temporal information. In this paper, we aim to tackle this problem via an economic approach. Modeled naively, the underlying optimization problem may be non-convex and thus hard to compute. To this end, we use a so-called "ironing" technique to convert the problem into an equivalent convex optimization problem via a clean Markov decision process (MDP) formulation, where the states are the driver distributions and the decision variables are the prices for each pair of locations. Our main finding is an efficient algorithm that computes the exact revenue-optimal (or GMV-optimal) randomized pricing schemes. We characterize the optimal solution of the MDP by a primal-dual analysis of a corresponding convex program. We also conduct empirical evaluations of our solution through real data of a major ride-sharing platform and show its advantages over fixed pricing schemes as well as several prevalent surge-based pricing schemes.
The recently established applications of shared mobility, such as ride-sharing, bike-sharing, and car-sharing, have been proven to be an effective way to utilize redundant transportation resources and to optimize social efficiency (Cramer and Krueger, 2016). Over the past few years, intensive research has been done on topics related to the economic aspects of shared mobility (Crawford and Meng, 2011; Kostiuk, 1990; Oettinger, 1999). Despite this research, the problem of how to design revenue-optimal prices and vehicle dispatching schemes has remained largely open and is one of the main research agendas in sharing economics. There are at least two challenges in tackling this problem in real-world applications. First of all, due to the nature of transportation, the price and dispatch scheme must be geographically dependent. Secondly, the price and dispatch scheme must take into consideration the fact that supplies and demands in these environments may change over time. As a result, it may be difficult to compute, or even to represent, a price and dispatch scheme for such complex environments.

Traditional price and dispatch schemes for taxis (Laporte, 1992; Gendreau et al., 1994; Ghiani et al., 2003) and airplanes (Gale and Holmes, 1993; Stavins, 2001; McAfee and Te Velde, 2006) do not capture the dynamic aspects of the environments: taxi fees are normally calculated at a fixed rate of distance and time, and flight tickets are sold over relatively long booking periods, while in contrast, the customers of shared vehicles make their decisions instantly.

The dynamic ride-sharing market studied in this paper is also known to have imbalanced supply and demand, either globally in a city or locally at a particular time and location. Such imbalance in supply and demand is known to cause severe consequences on revenues (e.g., the so-called wild goose chase phenomenon (Castillo et al., 2017)).
Surge pricing is a way to balance dynamic supply and demand (Chen and Sheldon, 2015), but there is no known guarantee that surge-based pricing can dispatch vehicles efficiently and resolve imbalanced supplies and demands. Traditional dispatch schemes (Laporte, 1992; Gendreau et al., 1994; Ghiani et al., 2003) focus more on the algorithmic aspects of static vehicle routing, without considering pricing. However, the vehicle dispatching and pricing problems are tightly related, since a new price scheme will surely induce a change in supply and demand, as the drivers and passengers are strategic. In this paper, we aim to come up with price schemes with desirable induced supplies and demands.

∗Contacts: [email protected], {emersonswr, kenshinping, songzuo.z}@gmail.com

Our contribution

In this paper, we propose a graph model to analyze the vehicle pricing and dispatching problem mentioned above. In the graph, each node refers to a region in the city and each edge refers to a possible trip that includes a pair of origin and destination as well as a cost associated with the trip on this edge. The design problem is, for the platform, to set a price and specify the vehicle dispatch for each edge at each time step. Drivers are considered to be non-strategic in our model, meaning that they will accept whatever offer is assigned to them. The objective of the platform can be its revenue, the GMV, or any convex combination of the two.

Our model naturally induces a
Markov decision process (MDP) with the driver distributions on each node as states, the prices and dispatches along each edge as actions, and the revenue as immediate reward. Although the corresponding mathematical program is not convex (and thus computationally hard) in general, we show that it can be reduced to a convex one without loss of generality. In particular, in the resulting convex program, where the throughputs along each source-destination pair in each time period are the variables, all the constraints are linear and hence the exact optimal solutions can be efficiently computed (Theorem 3.1).

We further characterize the optimal solution via primal-dual analysis. In particular, a pricing scheme is optimal if and only if the marginal contribution of the throughput along each edge equals the system-wise marginal contribution of additional supply minus the difference of the long-term contributions of unit supply at the origin and the destination (see Section 5).

We also perform extensive empirical analysis based on a public dataset with more than 8 .

Driven by real-life applications, a large body of research has been done on ride-sharing markets. Some of it employs queuing networks to model the markets (Iglesias et al., 2016; Banerjee et al., 2015; Tang et al., 2016). Iglesias et al. (2016) describe the market as a closed, multi-class BCMP queuing network which captures the randomness of customer arrivals. They assume that the number of customers is fixed, since customers only change their locations but do not leave the network. In contrast, the number of customers is dynamic in our model and we only consider those who ask for a ride (i.e., send a request to the platform). Banerjee et al. (2015) also use a queuing-theoretic approach to analyze ride-sharing markets and mainly focus on the behaviors of drivers and customers. They assume that the drivers enter or leave the market with certain probabilities. Bimpikis et al.
(2016) take into account the spatial dimension of pricing schemes in ride-sharing markets. They price each region, and their goal is to rebalance the supply and demand of the whole market. In contrast, we price each route and aim to maximize the total revenue or social welfare of the platform. We also refer the readers to the line of research initiated by Ma et al. (2013) for problems about car-pooling in ride-sharing systems (Alonso-Mora et al., 2017; Zhao et al., 2014; Chan and Shaheen, 2012).

Many works on ride-sharing consider both the customers and the drivers to be strategic, where the drivers may reject requests or leave the system if the prices are too low (Banerjee et al., 2015; Fang et al., 2017). As we mentioned, if the revenue sharing ratios between the platform and the drivers can be dynamic, then the pricing problem and the revenue sharing problem can be decoupled, and hence the drivers are non-strategic in the pricing problem. In addition, the platform can also increase its profit by adopting dynamic revenue sharing schemes (Balseiro et al., 2017).

Another work closely related to ours is by Banerjee et al. (2017). Their work is concurrent and has been developed independently from ours. In their model, the customers arrive according to a queuing model and the pricing policy is state-independent and depends on the transition volume. Both their and our models are built upon the underlying Markovian transitions between the states (the distribution of drivers over the graph).
The major differences are: (i) our model is built for dynamic environments with a very large number of customers (each of them non-atomic) to match practical situations, while theirs adopts a discrete-agent setting; (ii) they overcome the non-convexity of the problem by relaxation and focus only on concave objectives, which makes their approach hard to use in real applications, while we solve the problem via randomized pricing and transform it into a convex program; (iii) they prove approximation bounds for the relaxed problem, while we give exact optimal solutions by efficiently solving the convex program.
A passenger (she) enters the ride-sharing platform and sends a request, including her origin and destination, to the platform. The platform receives the request and determines a price for it. If the passenger accepts the price, then the platform may decide whether to send a driver (he) to pick her up. The platform is also able to dispatch drivers from one place to another even if there is no request to be served. Through the pricing and dispatching methods above, the goal of maximizing the revenue or social welfare of the entire platform can be achieved. Our model incorporates the two methods into a single pricing problem. In this section, we define the basic components of our model and consider two settings: dynamic environments with a finite time horizon and static environments with an infinite time horizon. Finally, we reduce the action space of the problem and give a simple formulation.
Requests
We use a strongly connected digraph G = (V, E) to model the geographical information of a city. Passengers can only take rides from node to node on the graph. When a passenger enters the platform, she expects to get a ride from node s to node t along the edge e = (s, t), and is willing to pay at most x ≥ 0 for it. Upon receiving the request, the platform sets a price p for it. If the price is accepted by the passenger (i.e., x ≥ p), then the platform tries to send a driver to pick her up. We say that the platform rejects the request if no driver is available. A request is said to be accepted if both the passenger accepts the price p and there are available drivers. Otherwise, the request is considered to end immediately.

Drivers
Clearly, within each time period, the total number of accepted requests starting from s cannot be more than the number of drivers available at s. Formally, let q(e) denote the total number of accepted requests along edge e; then:

∑_{e ∈ OUT(v)} q(e) ≤ w(v), ∀v ∈ V,   (2.1)

where OUT(v) is the set of edges starting from v and w(v) is the number of currently available drivers at node v. In particular, we assume that both the total number of drivers and the number of requests are very large, which is often the case in practice, and consider each driver and each request to be non-atomic. For simplicity, we normalize the total amount of drivers on the graph to 1, so w(v) is a real number in [0, 1]. We also normalize the number of requests on each edge by the total number of drivers. Note that the amount of requests on an edge e can be more than 1, if there are more requests on e than total drivers on the graph.

Geographic Status
For each accepted request on edge e, the platform has to cover a transportation cost c_τ(e) for the driver. Meanwhile, the assigned driver, who is currently at node s, will not be available until he arrives at the destination t. Let ∆τ(e) be the traveling time from s to t and τ be the time step at which the driver leaves s. He will be available again at time step τ + ∆τ(e) at node t. Formally, the amount of available drivers at any v ∈ V evolves according to the following equation:

w_{τ+1}(v) = w_τ(v) − ∑_{e ∈ OUT(v)} q_τ(e) + ∑_{e ∈ IN(v)} q_{τ+1−∆τ(e)}(e),   (2.2)

where IN(v) is the set of edges ending at v. Here we add subscripts to emphasize the time step for each quantity. In particular, throughout this paper, we focus on the discrete time step setting, i.e., τ ∈ ℕ.

Demand Function
As we mentioned, the platform can set different prices for the requests. Such prices may vary with the request edge e, the time step τ, and the driver distribution, but must be independent of the passenger's private value x, as it is not observable. Formally, let D_τ(·|e): ℝ₊ → ℝ₊ be the demand function of edge e, i.e., D_τ(p|e) is the amount of requests on edge e with private value x ≥ p in time step τ. Then the amount of accepted requests q_τ(e) ≤ E[D_τ(p_τ(e)|e)], where the expectation is taken over the potential randomness of the pricing rule p_τ(e). In practice, such a demand function can be predicted from historical data (Tong et al., 2017; Moreira-Matias et al., 2013). The randomized pricing rule may set different prices for the requests on the same edge e.

Design Objectives

In this paper, we consider a class of state-irrelevant objective functions. A function is state-irrelevant if its value only depends on the amount of accepted requests on each edge q(e) but not on the driver distribution of the system w(v). Note that a wide range of objectives is included in this class, such as the revenue of the platform:

REVENUE(p, q) = ∑_{e,τ} E[(p_τ(e) − c_τ(e)) · q_τ(e)],

and the social welfare of the entire system:

WELFARE(p, q) = ∑_{e,τ} E[(x − c_τ(e)) · q_τ(e)].

In general, our techniques work for any state-irrelevant objective. Let g(p, q) denote the general objective function; the dispatching and pricing problem can then be formulated as follows:

maximize ∑_{e,τ} g(p_τ(e), q_τ(e)|e)   (2.3)
subject to (2.1) and (2.2).

Static and Dynamic Environment
In general, our model is defined for a dynamic environment in the sense that the demand function D_τ and the transportation cost c_τ can differ across time steps τ. In particular, we study problem (2.3) in general dynamic environments with a finite time horizon τ = 1, . . . , T, where the initial driver distribution w(v) is given as input. In addition, we also study the special case of a static environment with an infinite time horizon, where D_τ ≡ D and c_τ ≡ c are consistent across time steps.

In this section, we rewrite the problem in an equivalent reduced form by incorporating the action of dispatching into pricing, i.e., using p to express q. The idea is straightforward: (i) for the requests rejected by the platform, the platform could equivalently set an infinitely large price; (ii) if the platform is dispatching available drivers (without requests) from node s to t, we can create virtual requests from s to t with value 0 and let the platform set price 0 for these virtual requests. In fact, we can assume without loss of generality that D(0|e) ≡ 1, the total amount of drivers, because one can always add enough virtual requests for the edges with maximum demand less than 1, or remove the requests with low values for the edges with maximum demand exceeding the total driver supply, 1. As a result, we may conclude that q(e) ≤ D(p|e). Since our goal is to maximize the objective g(p, q), raising prices to achieve the same amount of flow q(e) (such that E[D(p|e)] = q(e)) never eliminates the optimal solution. In other words,
The original problem is equivalent to the following reduced problem, where the flow variables q_τ(e) are uniquely determined by the price variables p_τ(e):

maximize ∑_{e,τ} g(p_τ(e), D_τ(p_τ(e)|e))
subject to q_τ(e) = E[D_τ(p_τ(e)|e)],   (2.4)
(2.1) and (2.2).

In this section, we demonstrate how the original problem (2.4) can be equivalently rewritten as a Markov decision process with a convex objective function. Formally,
Theorem 3.1.
The original problem (2.4) of the instance ⟨G, D, g, ∆τ⟩ is equivalent to a Markov decision process problem of another instance ⟨G′, D′, g′, ∆τ′⟩ with g′ being convex.

The proof of Theorem 3.1 is immediate from Lemmas 3.2 and 3.4. The equivalent Markov decision process problem can be formulated as a convex program, and hence can be solved efficiently.

Unifying travel time
Note that the original problem (2.4), in general, is not an MDP by itself, because the current state w_{τ+1}(v) may depend on the action q_{τ+1−∆τ(e)}(e) in (2.2). Hence our first step is to equivalently map the original instance to another instance in which the traveling time is always 1, i.e., ∆τ(e) ≡ 1.

Lemma 3.2 (Unifying travel time). The original problem (2.4) of a general instance ⟨G, D, g, ∆τ⟩ is equivalent to the problem of a 1-travel-time instance ⟨G′, D′, g′, ∆τ′⟩, where ∆τ′(·) ≡ 1.

Intuitively, we tackle this problem by adding virtual nodes into the graph to replace the original edges. This operation splits the entire trip into smaller ones, so that at each time step, all drivers become available.
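A minimal sketch of this virtual-node construction (the function names and data layout are my own, not the paper's notation): every edge with travel time d > 1 is replaced by a chain of d unit-time edges through d − 1 virtual nodes.

```python
def split_edges(edges, travel_time):
    """edges: list of (s, t) pairs; travel_time: dict mapping (s, t) -> int d >= 1.

    Returns (new_edges, chains), where chains[(s, t)] lists the unit-time
    edges that replace (s, t); in the new instance these edges must share
    one price, so that a driver carries the passenger along the whole chain.
    """
    new_edges, chains = [], {}
    for (s, t) in edges:
        d = travel_time[(s, t)]
        if d == 1:
            chain = [(s, t)]  # unit travel time: keep the edge as is
        else:
            # virtual nodes v^e_1 ... v^e_{d-1}, named after the edge e = (s, t)
            nodes = [s] + [("virtual", s, t, i) for i in range(1, d)] + [t]
            chain = list(zip(nodes, nodes[1:]))
        chains[(s, t)] = chain
        new_edges.extend(chain)
    return new_edges, chains
```

As in the lemma, the graph grows by at most a factor of max_e ∆τ(e), and the chain map makes the bijection back to the original dispatching behavior explicit.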
Proof.
For edges with traveling time ∆τ(e) = 1, we are done. For edges with traveling time ∆τ(e) > 1, we add ∆τ(e) − 1 virtual nodes v^e_1, . . . , v^e_{∆τ(e)−1}, and the directed edges connecting them to replace the original edge e, i.e.,

E′(e) = {(s, v^e_1), (v^e_1, v^e_2), . . . , (v^e_{∆τ(e)−2}, v^e_{∆τ(e)−1}), (v^e_{∆τ(e)−1}, t)},
E′ = ⋃_{e ∈ E} E′(e),  V′ = ⋃_{e ∈ E} {v^e_1, . . . , v^e_{∆τ(e)−1}} ∪ V.

We set the demand function of each new edge e′ ∈ E′(e) to be identical to that of the original edge e: D′(·|e′) ≡ D(·|e). An important but natural constraint is that if a driver handles a request on edge e of the original graph, then he must traverse all edges in E′(e) of the new graph, because he cannot leave the passenger halfway. To guarantee this, we only need to guarantee that all edges in E′(e) have the same price. Also, we need to split the objective of traveling along e across the new edges, i.e., each new edge has objective function

g′(p, q|e′) = g(p, q|e)/∆τ(e),  ∀e′ ∈ E′(e).

One can easily verify that the above operations increase the graph size to at most max_{e ∈ E} ∆τ(e) times that of the original one. In particular, there is a straightforward bijection between the dispatching behaviors on the original graph G = (V, E) and the new graph G′ = (V′, E′). Hence we can always recover the solution to the original problem.

By Lemma 3.2, the original problem (2.4) can be formulated as an MDP:
Definition 3.3 (Markov Decision Process). The vehicle pricing and dispatching problem is a Markov decision process, denoted by a tuple (G, D, g, S, A, W), where G = (V, E) is the given graph, D is the demand function, the objective g is the reward function, S = ∆(V) is the state space consisting of all possible driver distributions over the nodes, A is the action space, and W is the state transition rule:

w_{τ+1}(v) = w_τ(v) − ∑_{e ∈ OUT(v)} q_τ(e) + ∑_{e ∈ IN(v)} q_τ(e).   (3.1)

However, by naïvely using the pricing functions p_τ(e) as the actions, the induced flow q_τ(e) = E[D_τ(p_τ(e)|e)], in general, is neither convex nor concave. In other words, both the reward g and the state transition W of the corresponding MDP are non-convex. As a result, it is hard to solve the MDP efficiently. In this section, we show that by formulating the MDP with the flows q_τ(e) as actions, the corresponding MDP is convex.

Lemma 3.4 (Flow-based MDP). In the MDP (G, D, g, S, A, W) with all possible flows as the action set A, i.e., A = [0, 1]^{|E|}, the state transition rules are linear functions of the flows and the reward functions g are convex functions of the flows.

Proof. To do this, we first need to rewrite the prices p_τ(e) as functions of the flows q_τ(e). In general, since the prices can be randomized, the inverse function of q_τ(e) = E[D_τ(p_τ(e)|e)] is not unique. Note that conditional on fixed flows q_τ(e), the state transition of the MDP is also fixed. In this case, different prices yielding such specific flows differ only in the rewards. In other words, it is without loss of generality to let the inverse function of prices be as follows:

p_τ(e) = arg max_p g_τ(p_τ(e), q_τ(e)|e), s.t. q_τ(e) = E[D_τ(p_τ(e)|e)].
In particular, since the objective function g we study in this paper is linear and weakly increasing in the prices p, and the demand function D(p|e) is decreasing in p, the inverse price function can be defined as follows:

• Let g_τ(q|e) = g_τ(D⁻¹_τ(q|e), q|e), i.e., the objective obtained by setting the maximum fixed price p = D⁻¹_τ(q|e) such that the induced flow is exactly q;
• Let ĝ_τ(q|e) be the ironed objective function, i.e., the smallest concave function that upper-bounds g_τ(q|e) (see Figure 1);
• For any given q_τ(e), the maximum objective on edge e is ĝ_τ(q_τ(e)|e) and can be achieved by setting the price to be randomized over D⁻¹_τ(q′|e) and D⁻¹_τ(q″|e).

Figure 1: Ironed objective function

Finally, we prove the above claim to complete the proof of Lemma 3.4. By the definition of ĝ_τ(q|e), for any randomized price p,

E_p[g_τ(D_τ(p|e)|e)] ≤ E_p[ĝ_τ(D_τ(p|e)|e)].

Since ĝ is concave, applying Jensen's inequality yields:

E_p[ĝ_τ(D_τ(p|e)|e)] ≤ ĝ_τ(E_p[D_τ(p|e)] | e) = ĝ_τ(q̄|e).

Now it suffices to show that the upper bound ĝ_τ(q̄|e) is attainable. If ĝ_τ(q̄|e) = g_τ(q̄|e), then the right-hand side can be achieved by letting p_τ(e) be the deterministic price D⁻¹_τ(q̄|e). Otherwise, let I = (q′, q″) be the ironed interval containing q̄ (where ĝ_τ(q|e) > g_τ(q|e), ∀q ∈ I, but ĝ_τ(q′|e) = g_τ(q′|e) and ĝ_τ(q″|e) = g_τ(q″|e)). Thus q̄ can be written as a convex combination of the endpoints q′ and q″: q̄ = λq′ + (1 − λ)q″. Note that the function ĝ_τ is linear within the interval I.
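The envelope-plus-lottery construction in the bullets above can be sketched numerically: on a grid of flows, the smallest concave upper bound of g_τ(·|e) is the upper convex hull of the sampled points, and any flow inside an ironed interval is realized by a two-point lottery over the bracketing hull vertices. The grid values below are an assumed toy sample, not derived from any real demand curve.

```python
def concave_envelope(qs, gs):
    """Indices of the upper convex hull of the points (qs[i], gs[i]).

    qs must be sorted; on the grid, the hull is exactly the smallest
    concave function that upper-bounds the sampled objective.
    """
    hull = []
    for i in range(len(qs)):
        # pop the last vertex while it lies on or below the new chord
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((qs[b] - qs[a]) * (gs[i] - gs[a])
                     - (qs[i] - qs[a]) * (gs[b] - gs[a]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

def ironed_value(qbar, qs, gs, hull):
    """Envelope value at qbar, plus the two-point lottery attaining it.

    Returns (value, (q_lo, q_hi, lam)): posting the price for flow q_lo
    with probability lam and the price for q_hi otherwise gives expected
    flow qbar and expected objective equal to the envelope value.
    """
    for a, b in zip(hull, hull[1:]):
        if qs[a] <= qbar <= qs[b]:
            lam = (qs[b] - qbar) / (qs[b] - qs[a])
            return lam * gs[a] + (1 - lam) * gs[b], (qs[a], qs[b], lam)
    i = hull[-1]
    return gs[i], (qs[i], qs[i], 1.0)
```

On an ironed interval the mixed price strictly beats any deterministic price with the same expected flow, which is exactly the gap closed in the proof that follows.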
Therefore,

λ g_τ(q′|e) + (1 − λ) g_τ(q″|e) = λ ĝ_τ(q′|e) + (1 − λ) ĝ_τ(q″|e) = ĝ_τ(λq′ + (1 − λ)q″ | e) = ĝ_τ(q̄|e).

In other words, the upper bound ĝ_τ(q̄|e) can be achieved by setting the price to be D⁻¹_τ(q′|e) with probability λ and D⁻¹_τ(q″|e) with probability 1 − λ. Meanwhile, the flow q_τ(e) remains the same.

Proof of Theorem 3.1.
The theorem is implied by Lemma 3.2 and Lemma 3.4. In particular, the reward function is the ironed objective function ĝ.

In the rest of the paper, we will focus on the following equivalent problem:

maximize ∑_{e,τ} ĝ_τ(q_τ(e)|e)
subject to (2.1) and (3.1).   (3.2)

Optimal Solution in Static Environment
In this setting, we restrict our attention to the case where the environment is static, so the objective function does not change over time, i.e., ∀τ ∈ [T], ĝ_τ(q|e) ≡ ĝ(q|e). We aim to find the optimal stationary policy that maximizes the objective function, i.e., the decisions q_τ depend only on the current state w_τ. In this section, we discretize the MDP problem and focus on stable policies. With the introduction of the ironed objective function ĝ_τ, we show that for any discretization scheme, the optimal stationary policy of the induced discretized MDP is dominated by a stable dispatching scheme. We then formulate the stable dispatching scheme as a convex program, which means the optimal stationary policy can be found in polynomial time.

Definition 4.1.
A stable dispatching scheme is a pair of a state and a policy (w_τ, π) such that if policy π is applied, the distribution of available drivers does not change over time, i.e., w_{τ+1}(v) = w_τ(v). In particular, under a stable dispatching scheme, the state transition rule (3.1) is equivalent to the following form:

∑_{e ∈ OUT(v)} q(e) = ∑_{e ∈ IN(v)} q(e).   (4.1)

Definition 4.2.
Let M = (G, D, ĝ, S, A, W) be the original MDP problem. A discretized MDP DM with respect to M is a tuple (G_d, D_d, ĝ_d, S_d, A_d, W_d), where G_d = G, D_d = D, ĝ_d = ĝ, W_d = W, S_d is a finite subset of S, and A_d is a finite subset of A that contains all feasible transition flows between every two states in S_d.

Theorem 4.3.
Let DM and M be a discretized MDP and the corresponding original MDP. Let π_d: S_d → A_d be an optimal stationary policy of DM. Then there exists a stable dispatching scheme (w, π) such that the time-average objective of π in M is no less than that of π_d in DM.

Proof. Consider policy π_d in DM. Starting from any state in S_d with policy π_d, let {w_τ} be the subsequent state sequence. Since DM has finitely many states and policy π_d is a stationary policy, there must be an integer n such that w_n = w_m for some m < n, and from time step m on, the state sequence becomes periodic. Define

w̄ = (1/(n − m)) ∑_{k=m}^{n−1} w_k,  q̄ = (1/(n − m)) ∑_{k=m}^{n−1} π_d(w_k).

Denote by π_d(w_k|e) or q_k(e) the flow at edge e of the decision π_d(w_k). Summing the transition equations over all time steps m ≤ k < n, we get:

∑_{k=m}^{n−1} w_{k+1}(v) − ∑_{k=m}^{n−1} w_k(v) = ∑_{k=m}^{n−1} ∑_{e ∈ IN(v)} π_d(w_k|e) − ∑_{k=m}^{n−1} ∑_{e ∈ OUT(v)} π_d(w_k|e),

which gives

w̄(v) = w̄(v) − ∑_{e ∈ OUT(v)} q̄(e) + ∑_{e ∈ IN(v)} q̄(e).

Also, policy π_d is a valid policy, so ∀v ∈ V and ∀m ≤ k < n:

∑_{e ∈ OUT(v)} q_k(e) ≤ w_k(v).

Summing over k, we have:

∑_{e ∈ OUT(v)} q̄(e) ≤ w̄(v).

Now consider the original problem M. Let w = w̄ and let π be any stationary policy such that:

• π(w) = q̄;
• starting from any state w′ ≠ w, policy π leads to state w within finitely many steps.

Note that the second condition can easily be satisfied since the graph G is strongly connected. With the above definitions, (w, π) is a stable dispatching scheme. Now we compare the objectives of the two policies π_d and π.
The time-average objective is not sensitive to the first finitely many immediate objectives. Since the state sequences of both policies π_d and π are periodic, their time-average objectives can be written as:

OBJ(π_d) = (1/(n − m)) ∑_{k=m}^{n−1} ∑_{e ∈ E} ĝ(q_k(e)|e),  OBJ(π) = ∑_{e ∈ E} ĝ(q̄(e)|e).

By Jensen's inequality, we have:

OBJ(π_d) = (1/(n − m)) ∑_{k=m}^{n−1} ∑_{e ∈ E} ĝ(q_k(e)|e) ≤ ∑_{e ∈ E} ĝ((1/(n − m)) ∑_{k=m}^{n−1} q_k(e) | e) = ∑_{e ∈ E} ĝ(q̄(e)|e) = OBJ(π).

With Theorem 4.3, we know there exists a stable dispatching scheme that dominates the optimal stationary policy of our discretized MDP. Thus we now focus only on stable dispatching schemes. The problem of finding an optimal stable dispatching scheme can be formulated as a convex program with linear constraints:

maximize ∑_{e ∈ E} ĝ(q|e)
subject to (2.1) and (4.1).   (4.2)

Because ĝ(q|e) is concave, the program is convex. Since convex programs can be solved in polynomial time, our algorithm for finding the optimal stationary policy is efficient.

In this section, we characterize the optimal solution via dual analysis. For ease of presentation, we consider Program (4.2) in the static environment with infinite horizon.
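To make Program (4.2) concrete, the sketch below solves a toy instance on a two-node cycle a⇄b, where conservation (4.1) forces both edge flows to equal a single scalar q and the supply constraint (2.1), with total driver mass 1 split between the nodes, caps q at 1/2. The concave per-edge objectives are illustrative assumptions, and a fine grid search stands in for a full convex solver.

```python
def g_ab(q):
    """Assumed ironed (concave) objective on edge a -> b."""
    return q * (1.0 - q)

def g_ba(q):
    """Assumed ironed (concave) objective on edge b -> a."""
    return q * (0.6 - q)

def solve_stable_dispatch(steps=20000):
    """Maximize g_ab(q) + g_ba(q) over the feasible interval [0, 1/2].

    Conservation reduces this toy program to one dimension, so a grid
    search over the concave objective suffices; a real instance would be
    handed to an off-the-shelf convex solver instead.
    """
    best_q, best_val = 0.0, g_ab(0.0) + g_ba(0.0)
    for i in range(steps + 1):
        q = 0.5 * i / steps
        val = g_ab(q) + g_ba(q)
        if val > best_val:
            best_q, best_val = q, val
    return best_q, best_val
```

Here the optimum sits strictly inside the feasible interval (at q = 0.4 for these assumed objectives), illustrating that the binding structure of (2.1) and (4.1) is instance-dependent.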
Our characterization directly extends to the dynamic environment. The Lagrangian is defined as

L(q, λ, µ) = −∑_{e ∈ E} ĝ(q|e) + λ(∑_{e ∈ E} q(e) − 1) + ∑_{v ∈ V} µ_v(∑_{e ∈ OUT(v)} q(e) − ∑_{e ∈ IN(v)} q(e))
           = −λ + ∑_{e ∈ E} [−ĝ(q|e) + (λ + µ_s − µ_t) q(e)],

where s and t are the origin and destination of e, i.e., e = (s, t), and λ and µ are Lagrangian multipliers with λ ≥ 0. The Lagrangian dual function is

h(λ, µ) = inf_q L(q, λ, µ) = −λ + ∑_{e ∈ E} [−ĝ(q̃|e) + (λ + µ_s − µ_t) q̃(e)],

where q̃(e) is a function of λ and µ such that λ + µ_s − µ_t = ĝ′(q̃|e), with ĝ′(q̃|e) the derivative of the objective function with respect to the flow q. The dual program corresponding to Program (4.2) is

maximize h(λ, µ) subject to λ ≥ 0.   (5.1)

Theorem 5.1. Let q*(e) be a feasible solution to the primal program (4.2) and (λ*, µ*) be a feasible solution to the dual program (5.1). Then q*(e) and (λ*, µ*) are primal and dual optimal with −∑_{e ∈ E} ĝ(q*|e) = h(λ*, µ*), if and only if

λ*(∑_{e ∈ E} q*(e) − 1) = 0,   (5.2)
ĝ′(q*|e) = λ* + µ*_s − µ*_t, ∀e ∈ E.   (5.3)

Proof.
According to the definition of h(λ, µ), we have h(λ*, µ*) = inf_q L(q, λ*, µ*). Since the ĝ(q|e) are concave functions, Equation (5.3) is equivalent to the fact that q*(e) minimizes the function L(q, λ*, µ*). Hence

h(λ*, µ*) = inf_q L(q, λ*, µ*) = L(q*, λ*, µ*)
          = −∑_{e ∈ E} ĝ(q*|e) + λ*(∑_{e ∈ E} q*(e) − 1) + ∑_{v ∈ V} µ*_v(∑_{e ∈ OUT(v)} q*(e) − ∑_{e ∈ IN(v)} q*(e))
          = −∑_{e ∈ E} ĝ(q*|e),

where the last equality uses Equation (5.2) and the fact that q*(e) is feasible.

Continuing with Theorem 5.1, we analyze the dual variables from an economic angle and draw some insights into this problem for real applications. The dual variables have useful economic interpretations (see (Boyd and Vandenberghe, 2004, Chapter 5.6)). λ* is the system-wise marginal contribution of the drivers (i.e., the increase in the objective function when a small amount of drivers is added to the system). Note that by complementary slackness (Equation 5.2), if λ* > 0, the total flow must sum to 1, meaning that all drivers are busy, and more requests can be accepted (hence more revenue) if more drivers are added to the system. Otherwise, there must be some idle drivers, and adding more drivers cannot increase the revenue. µ*_v is the marginal contribution of the drivers at node v. If we allow the outgoing flow from node v to be slightly more than the incoming flow to node v, then µ_v is the revenue gain from adding more drivers at node v.

The way we formulate and solve the problem, in fact, naturally leads to two interesting insights into this problem, which are potentially useful for real applications.
1. Scalability
In our model, the size of the convex program increases linearly in the number of edges, hence quadratically in the number of regions. This could be a hidden obstacle to real applications, where the number of regions in a city might be quite large. A key observation regarding this issue is that any dispatching policy induced by a real system is a feasible solution of our convex program, and any improvement (for example, via gradient descent) from such a policy in fact leads to a better solution for the system. In other words, it might be hard to find the exact optimal or nearly optimal solutions, but it is easy to improve from the current state. Therefore, in practice, the platform can keep running the optimization in the background and apply the most recent policy to gain more revenue (or achieve a higher value of some other objective).
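A hedged sketch of this improve-as-you-go idea on a one-dimensional toy feasible set (the concave objective, step size, and bounds below are illustrative assumptions, not the paper's model): each iteration takes a gradient step and projects back onto the constraints, so every intermediate iterate is itself a deployable, feasible policy.

```python
def objective(q):
    """Assumed concave toy objective over the feasible interval [0, 1/2]."""
    return q * (1.0 - q) + q * (0.6 - q)

def gradient(q):
    """Derivative of the toy objective: d/dq [1.6 q - 2 q^2]."""
    return 1.6 - 4.0 * q

def improve(q0, lr=0.05, iters=200):
    """Projected gradient ascent from a feasible starting policy q0."""
    q = q0
    for _ in range(iters):
        q = q + lr * gradient(q)
        q = min(max(q, 0.0), 0.5)  # project back onto the feasible set
    return q
```

Because the objective is concave and the feasible set convex, the iterates improve monotonically toward the optimum, mirroring the claim that a platform can always "improve from the current state" even when solving to exact optimality is impractical.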
2. Alternative solution
As suggested by the characterization and its economic interpretation, instead of solving the convex program directly, we can also find the optimal policy by solving the dual program: the optimal policy can be easily recovered from the dual optimal solutions. In particular, according to the economic interpretation of the dual variables, we only need to estimate the marginal contributions of the drivers. More importantly, the number of dual variables (= the number of regions) is much smaller than the number of primal variables (= the number of edges ≈ the square of the former). Hence solving the dual program may be more efficient when applied to real systems, and is also of independent interest beyond this paper.

6 Empirical Analysis
We design experiments to demonstrate the performance of our algorithms in real applications. In this section, we first describe the dataset and then explain how to extract the information our model needs from it. Two benchmark policies, FIXED and SURGE, are compared with our pricing policy. The analysis covers demand-supply balance and instantaneous revenue in both static and dynamic environments.
We perform our empirical analysis based on a public dataset from a major ride-sharing company. The dataset includes the orders in a city for three consecutive weeks, and the total number of orders is more than 8 . .

Figure 2: The logarithmic frequencies of request routes (heatmap over origin and destination IDs).
The travel times between nodes and the demand curves on edges are assumed known in our model. However, the dataset does not provide such information directly. We filter out "abnormal" requests and apply a linear regression to obtain the relationship between the travel time and the price, which makes it possible to infer the travel time from the order price. For the demand curves, we observe the values on each edge and fit them to lognormal distributions.

Figure 3: The logarithmic frequencies of (time, price) pairs, (a) without and (b) with filtering the "abnormal" requests.

Figure 4: Fitting request values to lognormal distributions (left: request values from region 9 to 11; right: request values from region 6 to 2).
Distance and travel time
The distance (or equivalently the travel time) from one region to another is required to perform our simulation. We approximate the travel time by the time interval between two consecutive requests assigned to the same driver. In Figure 3(a), we plot the frequencies of requests with certain (time, price) pairs. In this figure, we cannot see a clear relationship between time and price, although they are supposed to be roughly linearly related. We attribute this to the existence of two types of "abnormal" requests:

• Cancelled requests, which usually have very short completion times but not necessarily low prices (the bottom-right part of the plot);

• The last request of a working period, after which the driver might go home or take a rest. These requests usually have very long completion times but not necessarily high prices (the top-left part of the plot).

With the observations above, we filter out the requests with significantly longer or shorter travel times compared with most of the requests sharing the same origin and destination. Figure 3(b) illustrates the frequencies of requests after such filtering. As expected, the brightest region roughly surrounds the 30° line in the figure. By applying a standard linear regression, the slope turns out to be approximately 0 . The price of a ride is the maximum of a two-dimensional linear function of the traveled distance and time spent, and a minimal price (which is 7 CNY, as one can see from the vertical bright line at price = 7 in Figure 3(b)).
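The filtering-and-regression step described above can be sketched as follows: for each (origin, destination) pair, drop requests whose travel time is far from the median for that route, then fit time against price by least squares. The synthetic records below (constant route times, roughly 2 CNY per minute) stand in for the proprietary order data.

```python
# Sketch of the "abnormal request" filter and the travel-time regression.
# Route base times and the 2-per-minute rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
base_time = {(o, d): 10.0 + 7.0 * abs(o - d)
             for o in range(3) for d in range(3) if o != d}
records = []
for (o, d), t0 in base_time.items():
    for _ in range(50):
        t = t0 + rng.normal(0.0, 1.0)        # travel time: ~constant per route
        p = 2.0 * t + rng.normal(0.0, 1.0)   # price: roughly 2 per minute
        records.append((o, d, p, t))
records += [(0, 1, 30.0, 0.5),               # cancelled: short time, high price
            (0, 1, 12.0, 90.0)]              # end-of-shift: long time, low price

def filter_abnormal(recs, tol=2.0):
    """Keep requests within a factor tol of their route's median travel time."""
    kept = []
    for o, d, p, t in recs:
        times = [t2 for o2, d2, _, t2 in recs if (o2, d2) == (o, d)]
        med = float(np.median(times))
        if med / tol <= t <= med * tol:
            kept.append((o, d, p, t))
    return kept

clean = filter_abnormal(records)
prices = np.array([p for _, _, p, _ in clean])
times = np.array([t for _, _, _, t in clean])
slope, intercept = np.polyfit(prices, times, 1)   # minutes per currency unit
print(len(clean), round(float(slope), 2))
```

Both injected outliers are removed by the median test, and the fitted slope recovers the assumed time-per-price rate.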
Figure 5: Convergence of revenue (revenue per minute vs. iterations) in (a) the static environment and (b) the dynamic environment.
Estimation of demand curves
To estimate the demand curves, we first gather all the requests along the same edge (and within the same time period for the dynamic environment, see Section 6.5) and take the prices associated with the requests as the values of the passengers. Then, we fit the values of each edge (and each time period for the dynamic environment) to a lognormal distribution. We choose the lognormal distribution for two reasons: (i) the data fits lognormal distributions quite well (see Figure 4 for examples); (ii) lognormal distributions are commonly used in the related literature (Ostrovsky and Schwarz, 2011; Lahaie and Pennock, 2007; Roberts et al., 2016; Shen and Tang, 2017). We set the cost of traveling to zero, because we do not have enough information in the dataset to infer the cost.
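The estimation step above can be sketched directly: treat the observed prices on an edge as passenger values, fit a lognormal by estimating the mean and standard deviation of the log-values, and read off the demand curve D(p) = Pr[value ≥ p]. The synthetic values below stand in for the real orders.

```python
# Lognormal demand-curve fit for a single edge. The true parameters
# (3.0, 0.5) used to generate the synthetic values are illustrative.
import math
import random

random.seed(0)
values = [random.lognormvariate(3.0, 0.5) for _ in range(5000)]  # one edge

logs = [math.log(v) for v in values]
mu = sum(logs) / len(logs)
sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))

def demand(p):
    """Estimated fraction of passengers whose value is at least p."""
    z = (math.log(p) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))    # lognormal survival function

print(round(mu, 2), round(sigma, 2))              # close to the true (3.0, 0.5)
```

By construction, `demand` is decreasing in the price and equals one half at the median value, which is the shape the pricing program consumes.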
We consider two benchmark policies:

• FIXED: fixed per-minute pricing, i.e., the price of a ride equals the estimated traveling time from the origin to the destination multiplied by a per-minute price α, where α is a constant across the platform.

• SURGE: based on the FIXED policy, using surge pricing to clear the local market when supply is not enough. In other words, the price of a ride equals the estimated traveling time multiplied by αβ, where α is the fixed per-minute price and β ≥ 1 is a surge multiplier. The multiplier β is dynamic and can be different for requests initiated at different regions, while requests initiated at the same region share the same surge multiplier.

In the rest of this section, we evaluate and compare our dynamic pricing policy DYNAM with these two benchmarks in both static and dynamic environments.
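In code, the two benchmarks amount to the following minimal rendering. FIXED charges a flat per-minute rate α; SURGE scales it by a per-origin multiplier β ≥ 1 raised until local demand falls to local supply. The market-clearing rule and all numbers here are illustrative assumptions, not the platform's actual parameters.

```python
# Minimal FIXED / SURGE benchmark pricing. ALPHA, the clearing rule, and
# the example demand curve are all assumed for illustration.
ALPHA = 2.0                                # assumed per-minute base price

def fixed_price(minutes):
    return ALPHA * minutes

def surge_price(minutes, local_demand, supply, beta_max=3.0, step=0.05):
    """Raise the origin's multiplier until demand <= supply (or beta_max)."""
    beta = 1.0
    while beta < beta_max and local_demand(ALPHA * beta) > supply:
        beta += step
    return ALPHA * beta * minutes

# Example: local demand is linear in the per-minute price; 40 idle drivers.
demand_at = lambda per_minute: max(100.0 - 20.0 * per_minute, 0.0)
print(fixed_price(15), surge_price(15, demand_at, supply=40.0))
```

Note that the surge multiplier depends only on the origin's local market, which is precisely why SURGE cannot rebalance supply globally, as the experiments below show.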
We first present the empirical analysis for the static environment, which is simpler than the dynamic environment considered next, and hence easier to begin with. In the static environment, we use the averages of the statistics of all 21 days as the inputs to our model. For example, the demand function D(p|e) is estimated based on the frequencies and prices of the requests along edge e averaged over time. Similarly, the total supply of drivers is estimated based on the total durations of completed requests.

With the static environment, we can instantiate the convex program (4.2) and solve it via standard gradient descent algorithms. In our case, we simply use the MATLAB function fmincon to solve the convex program on a PC with an Intel i5-3470 CPU. We did not apply any additional techniques to speed up the computation, as running-time optimization is not the main focus of this paper. Figure 5(a) illustrates the convergence of the objective value (revenue) with an increasing number of iterations, where each iteration roughly takes 0 .

Figure 6: Instantaneous revenue per minute over 24 hours under DYNAM, FIXED, and SURGE, in (a) the static environment and (b) the dynamic environment.
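Instantiating the convex program is straightforward for a small instance. The sketch below solves a two-region version with SciPy in place of MATLAB's fmincon; the edges are the four ordered region pairs (including self-loops), and the per-edge revenue curves r_e q − 2q² are illustrative stand-ins for the fitted demand functions.

```python
# A two-region instance of the static convex program: maximize total
# revenue subject to the driver-mass and flow-conservation constraints.
import numpy as np
from scipy.optimize import minimize

edges = [(0, 0), (0, 1), (1, 0), (1, 1)]
r = np.array([1.0, 3.0, 0.5, 0.8])          # assumed marginal revenues

def neg_revenue(q):
    return -float(r @ q - 2.0 * q @ q)      # negate: minimize the convex -f

cons = [
    {"type": "ineq", "fun": lambda q: 1.0 - q.sum()},  # total driver mass
    # Flow conservation at region 0 (region 1 is then automatic):
    {"type": "eq", "fun": lambda q: q[1] - q[2]},
]
res = minimize(neg_revenue, x0=np.full(4, 0.25), bounds=[(0, None)] * 4,
               constraints=cons, method="SLSQP")
q_opt, revenue = res.x, -res.fun
print(q_opt.round(3), round(revenue, 3))
```

Since the program is convex, any local solver of this kind reaches the global optimum, mirroring the role fmincon plays in the experiments.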
Figure 7: Instantaneous supply ratios for different regions over 24 hours, under DYNAM, FIXED, and SURGE (one panel per region).

To compare the performance of policy DYNAM with the benchmark policies FIXED and SURGE, we also simulate them under the same static environment. In particular, the length of each timestep is set to 15 minutes and the number of steps in the simulation is 96 (so 24 hours in total). For both FIXED and SURGE, we use the per-minute price fitted from the data as the base price α, and restrict the surge multiplier β to a bounded interval. To make the evaluations comparable, we use the distribution of drivers under the stationary solution of our convex program as the initial driver distribution for FIXED and SURGE. Figure 6(a) shows how the instantaneous revenues evolve over time, where DYNAM on average outperforms FIXED and SURGE by roughly 24% and 17%, respectively. Note that our policy
DYNAM is stationary under the static environment, so its instantaneous revenue is constant (the red horizontal line). Interestingly, the instantaneous revenue curves of both FIXED and SURGE are decreasing, and the curve of FIXED decreases much faster. This observation reflects that neither FIXED nor SURGE dispatches the vehicles well: FIXED simply never balances supply and demand, while SURGE shows better control because it seeks to balance demand with the local supply when supply cannot meet demand. However, neither of them really balances the global supply and demand, so the instantaneous revenue decreases as supply and demand become more unbalanced. In other words, the empirical analysis supports our insight about the importance of vehicle dispatching in ride-sharing platforms.
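The simulation underlying these comparisons can be stripped down to a few lines: a driver distribution over regions evolves under a policy's relocation matrix, one 15-minute timestep at a time, and instantaneous revenue is the demand the local drivers can serve. The three-region numbers below are illustrative, not fitted from the data.

```python
# Stripped-down simulation loop comparing a demand-steering relocation
# policy with an inert one. All vectors and fares are assumed values.
import numpy as np

demand = np.array([0.5, 0.3, 0.2])          # request mass per region per step
fare = np.array([10.0, 8.0, 12.0])          # average fare per served request

# Row-stochastic relocation: P[i, j] = fraction of drivers moving i -> j.
P_balanced = np.array([[0.5, 0.3, 0.2]] * 3)   # steers supply toward demand
P_inert = np.eye(3)                             # drivers never relocate

def simulate(P, steps=96):                  # 96 steps of 15 min = 24 hours
    drivers = np.array([0.2, 0.2, 0.6])     # initially mismatched supply
    total = 0.0
    for _ in range(steps):
        served = np.minimum(drivers, demand)
        total += float(fare @ served)
        drivers = P.T @ drivers             # relocation between timesteps
    return total / steps                    # average revenue per timestep

print(round(simulate(P_balanced), 2), round(simulate(P_inert), 2))
```

The inert policy keeps losing the demand its drivers cannot reach, which is the same mechanism behind the decreasing revenue curves of FIXED in Figure 6(a).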
In the dynamic environment, the parameters (i.e., the demand functions and the total number of requests) are estimated based on the statistics of each hour, averaged over different days. For example, the demand functions D_h(p|e) are defined for each edge e and each of the 24 hours h ∈ {0, . . . , 23}. In particular, we only use the data from the weekdays (14 days in total) among the 5 most popular regions for the estimation. The reason that we only use data from weekdays is that the dynamics of demands and supplies on weekdays share similar patterns, which are quite different from the patterns of weekends.

Again, we instantiate the convex program (3.2) for the dynamic environment and solve it via the fmincon function on the same PC that we used for the static case. Figure 5(b) shows the convergence of the objective value with an increasing number of iterations, where each iteration takes less than 1 minute.
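The per-hour estimation step can be sketched as follows: keep only weekday orders, bucket them by (edge, hour), and compute per-bucket statistics from which a demand curve D_h(p|e) can be fitted. The order tuples and their schema here are hypothetical stand-ins for the dataset.

```python
# Bucketing weekday orders by (edge, hour) for per-hour demand estimation.
# The order records are synthetic; only the grouping logic is the point.
from collections import defaultdict
from statistics import mean

# (is_weekday, hour, origin, destination, price) -- assumed schema
orders = [
    (True, 8, 1, 2, 21.0), (True, 8, 1, 2, 19.0), (True, 8, 1, 2, 23.0),
    (True, 14, 1, 2, 15.0), (True, 14, 1, 2, 17.0),
    (False, 8, 1, 2, 30.0),                 # weekend order: excluded
]

buckets = defaultdict(list)
for is_weekday, hour, o, d, price in orders:
    if is_weekday:                          # weekdays only, as in the paper
        buckets[((o, d), hour)].append(price)

stats = {key: (len(v), mean(v)) for key, v in buckets.items()}
print(stats[((1, 2), 8)])                   # morning-peak bucket: (3, 21.0)
```

Each bucket's prices would then be fed to the lognormal fit from the previous subsection to obtain one demand curve per edge and hour.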
We set up FIXED and SURGE in exactly the same way as we did for the static environment, except that the initial driver distribution comes from the solution of the convex program for the dynamic environment. Figure 6(b) shows the instantaneous revenue along the simulation. In particular, the relationship DYNAM ≻ SURGE ≻ FIXED holds throughout almost the entire simulation. Moreover, the advantages of DYNAM over the other two policies are more significant at the high-demand "peak times", for example, at 8 a.m.

Demand-supply balance
Balancing demand and supply is not the explicit goal of our dispatching policy. However, a policy without such balancing abilities is unlikely to perform well. In Figure 7, we plot the supply ratios (defined as the local instantaneous supply divided by the local instantaneous demand) for all 5 regions during the 24 hours of the simulation.

From the figures, we can see that, compared with the other two curves, the red curve (the supply ratio of DYNAM) tightly surrounds the "balance" line of 100%, which means that the number of available drivers at any time and in each region is close to the number of requests sent from that region at that time. The curves of the other two policies can sometimes be very far from the "balance" line; that is, the drivers under policies FIXED and SURGE are often not in the locations where passengers need the service.

As a result, our policy DYNAM is much more effective at dispatching vehicles and balancing demand and supply in dynamic ride-sharing systems. Such dispatching can in turn help the platform gain higher revenue by serving more passengers.
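The supply-ratio diagnostic behind Figure 7 is simple to compute for a single snapshot: local instantaneous supply divided by local instantaneous demand, with values near 1.0 meaning the region is balanced. The supply and demand vectors below are illustrative.

```python
# Per-region supply ratio for one snapshot, plus the worst deviation
# from the "balance" line of 100%. All numbers are assumed.
def supply_ratios(supply, demand):
    return [s / d for s, d in zip(supply, demand)]

supply = [0.48, 0.31, 0.21]      # available drivers per region (assumed)
demand = [0.50, 0.30, 0.20]      # requests per region (assumed)

ratios = supply_ratios(supply, demand)
worst = max(abs(r - 1.0) for r in ratios)   # largest deviation from balance
print([round(r, 2) for r in ratios], round(worst, 2))
```

Tracking `worst` over time gives a one-number summary of how tightly a policy's curve hugs the balance line in Figure 7.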
References
Javier Alonso-Mora, Samitha Samaranayake, Alex Wallar, Emilio Frazzoli, and Daniela Rus. 2017. On-demand high-capacity ride-sharing via dynamic trip-vehicle assignment.
PNAS (2017), 201611675.
Santiago Balseiro, Max Lin, Vahab Mirrokni, Renato Paes Leme, and Song Zuo. 2017. Dynamic revenue sharing. In NIPS 2017.
Siddhartha Banerjee, Daniel Freund, and Thodoris Lykouris. 2017. Pricing and Optimization in Shared Vehicle Systems: An Approximation Framework. In EC 2017.
Siddhartha Banerjee, Carlos Riquelme, and Ramesh Johari. 2015. Pricing in Ride-share Platforms: A Queueing-Theoretic Approach. (2015).
Kostas Bimpikis, Ozan Candogan, and Daniela Saban. 2016. Spatial Pricing in Ride-Sharing Networks. (2016).
Stephen Boyd and Lieven Vandenberghe. 2004.
Convex Optimization. Cambridge University Press.
Gerard P Cachon, Kaitlin M Daniels, and Ruben Lobel. 2016. The role of surge pricing on a service platform with self-scheduling capacity. (2016).
Juan Camilo Castillo, Dan Knoepfle, and Glen Weyl. 2017. Surge pricing solves the wild goose chase. In
EC 2017. ACM, 241–242.
Nelson D Chan and Susan A Shaheen. 2012. Ridesharing in North America: Past, present, and future.
Transport Reviews
32, 1 (2012), 93–112.
M Keith Chen and Michael Sheldon. 2015. Dynamic pricing in a labor market: Surge pricing and flexible work on the Uber platform. Technical Report. Mimeo, UCLA.
Judd Cramer and Alan B Krueger. 2016. Disruptive change in the taxi business: The case of Uber.
The American Economic Review (2016).
Proceedings of the 26th International Conference on World Wide Web. WWW 2017, 53–62.
Ian L Gale and Thomas J Holmes. 1993. Advance-purchase discounts and monopoly allocation of capacity.
The American Economic Review (1993), 135–146.
Michel Gendreau, Alain Hertz, and Gilbert Laporte. 1994. A tabu search heuristic for the vehicle routing problem. Management Science 40, 10 (1994), 1276–1290.
Gianpaolo Ghiani, Francesca Guerriero, Gilbert Laporte, and Roberto Musmanno. 2003. Real-time vehicle routing: Solution concepts, algorithms and parallel computing strategies.
European Journal of Operational Research (2003).
arXiv preprint arXiv:1607.04357 (2016).
Peter F Kostiuk. 1990. Compensating differentials for shift work. Journal of Political Economy 98, 5, Part 1 (1990), 1054–1075.
Sébastien Lahaie and David M Pennock. 2007. Revenue analysis of a family of ranking rules for keyword auctions. In
EC 2007. ACM, 50–56.
Gilbert Laporte. 1992. The vehicle routing problem: An overview of exact and approximate algorithms. European Journal of Operational Research 59, 3 (1992).
Shuo Ma, Yu Zheng, and Ouri Wolfson. 2013. T-share: A large-scale dynamic taxi ridesharing service. In
ICDE. IEEE, 410–421.
R Preston McAfee and Vera Te Velde. 2006. Dynamic pricing in the airline industry. In Handbook on Economics and Information Systems, Ed: TJ Hendershott, Elsevier (2006).
Luis Moreira-Matias, Joao Gama, Michel Ferreira, Joao Mendes-Moreira, and Luis Damas. 2013. Predicting taxi-passenger demand using streaming data.
IEEE Transactions on Intelligent Transportation Systems
14, 3 (2013), 1393–1402.
Gerald S Oettinger. 1999. An empirical analysis of the daily labor supply of stadium vendors. Journal of Political Economy (1999).
Michael Ostrovsky and Michael Schwarz. 2011. Reserve prices in internet advertising auctions: A field experiment. In EC 2011. ACM, 59–60.
Ben Roberts, Dinan Gunawardena, Ian A Kash, and Peter Key. 2016. Ranking and tradeoffs in sponsored search auctions.
ACM Transactions on Economics and Computation
4, 3 (2016), 17.
Weiran Shen and Pingzhong Tang. 2017. Practical versus Optimal Mechanisms. In AAMAS. 78–86.
Joanna Stavins. 2001. Price discrimination in the airline market: The effect of market concentration. Review of Economics and Statistics
83, 1 (2001), 200–202.
Christopher S Tang, Jiaru Bai, Kut C So, Xiqun Michael Chen, and Hai Wang. 2016. Coordinating supply and demand on an on-demand platform: Price, wage, and payout ratio. (2016).
Yongxin Tong, Yuqiang Chen, Zimu Zhou, Lei Chen, Jie Wang, Qiang Yang, Jieping Ye, and Weifeng Lv. 2017. The simpler the better: a unified approach to predicting original taxi demands based on large-scale online platforms. In
KDD 2017. ACM, 1653–1662.
Dengji Zhao, Dongmo Zhang, Enrico H Gerding, Yuko Sakurai, and Makoto Yokoo. 2014. Incentives in ridesharing with deficit control. In