Closed Non-atomic Resource Allocation Games
The Price of Queueing
Costas Courcoubetis (Singapore University of Technology and Design) and Antonis Dimakis (Athens University of Economics and Business)
Abstract.
How is efficiency affected when demand excesses over supply are signalled through waiting in queues? We consider a class of congestion games with a nonatomic set of players of a constant mass, based on a formulation of generic linear programs as sequential resource allocation games. Players continuously select activities such that they maximize linear objectives interpreted as time-averages of activity rewards, while active resource constraints cause queueing. In turn, the resulting waiting delays enter the optimization problem of each player.

The existence of Wardrop-type equilibria and their properties are investigated by means of a potential function related to proportional fairness. The inefficiency of the equilibria relative to optimal resource allocation is characterized through the price of anarchy, which is 2 if all players are of the same type (∞ if not).

In crowdsourcing, access, and sharing economies, a large number of individuals interact to exchange goods and services, with each individual pursuing his or her own interest. The matching of supply and demand takes place in shorter times than in traditional product-service economies, so mismatches may also be manifested in non-monetary terms as congestion. For example in ride-hailing, it is common for drivers to face significant waiting delays until they are matched with a customer, if the number of available drivers in an area exceeds local demand. In this paper we consider a class of nonatomic games, and the appropriate equilibrium concept, which capture the noncooperative behavior and congestion effects in resource allocation settings such as the above.

There is a large literature on congestion games [15], which examine the interaction between congestion and noncooperative behavior.
In the case where the number of players is large and each has a negligible effect on congestion, nonatomic congestion games view players as a continuous mass whose equilibrium behavior is described by Wardrop-type equilibria, first studied for road traffic in [3, 18]. In this paper we consider a similar case but with a constant player mass playing a sequential game.

To illustrate the difference consider the example in Fig. 1, which resembles Pigou's selfish routing example [14]. Players flow from left to right utilizing routes 1 (upper) and 2 (lower). Traversal of the upper route offers reward 2 while the lower route offers a unit reward. Let $x_i$ be the rate of players flowing through route $i$, and assume the maximum rate at which players can flow through route 1 is 1, i.e., $x_1 \le 1$. Route 2 is not rate limited and its delay is always 1. The players do not exit the system after a traversal; they return to the origin on the left and keep circulating. Assume the mass of players in the system is equal to 2. A basic difference with Pigou's example is that the delay of route 1 is not a function of the flow there. It is expressed as $1 + w$, where $w$ is the additional waiting time players face if they accumulate in route 1 due to the limitation in the flow rate. If $x_1 < 1$, since the players behave as a fluid, they do not accumulate, so $w = 0$. On the other hand, if $x_1 = 1$ then $w$ is not a function of $x_1$ anymore; it also depends on the player mass currently on route 2, i.e., $x_2$. The total mass is 2, so in route 1 there must be $2 - x_2$ players, whose mass is also expressed by Little's law [12] as $(1 + w)x_1$. Thus, $w = 1 - x_2$ if $x_1 = 1$.

Fig. 1. Selfish circulation of a constant mass of nonatomic players which maximize their average reward per unit time. The waiting delay $w$ is 0 if $x_1 < 1$.

arXiv: [cs.GT], Jul.

What flows $x_1, x_2$ will result from a 'selfish circulation'? Players will prefer the upper route as long as its reward per unit time of a round-trip is no lower than that offered by the lower route, i.e., $\frac{2}{1+w} \ge 1$. If $x_1 = 1$, this is always the case: if $x_2 > 0$ then $w = 1 - x_2$ and hence $\frac{2}{1+w} = 1 + \frac{x_2}{1+w} > 1$. In fact, the entire mass of players can utilize the upper route, in which case $w = 1$, and still make that route at least as preferable as the lower one. Thus, $x_1 = 1$, $x_2 = 0$ is an equilibrium, and it is easy to see that it is the only one. The long-run average total reward resulting from the equilibrium circulation of players is 2 (or 1 per unit of player mass). This is less than the maximum possible reward, obtained by solving:

$$\begin{aligned} \max\ & 2x_1 + x_2 \\ \text{such that}\ & x_1 \le 1, \\ & x_1 + x_2 = 2, \\ \text{over}\ & x_1, x_2 \ge 0. \end{aligned} \tag{1}$$

(Note that in the mass constraint $x_1 + x_2 = 2$ we do not need to account for waiting players, as those can always be assigned more profitably on route 2.) Its (obvious) optimal solution, $x_1^* = 1$, $x_2^* = 1$, yields 3 as average reward (or 1.5 per unit mass), as the use of route 2 increases the average reward by 50%.

More generally, we take arbitrary linear programs with a single 'mass constraint', similar to (1), as our point of departure, and consider a selfish circulation of nonatomic players which select activities that maximize their time-average rewards, given by the objective function. The inequality constraints correspond to resource constraints which, when active, cause waiting delays. The latter include all the relevant information a player needs to know about the other players' strategies in order to maximize his or her own time-average rewards, by solving a dynamic program.

In this respect, our work is related to the stationary anonymous sequential games in [19], where the players are aware of how the other players are distributed over strategies, and maximize their time-average payoffs. The equilibria in these systems are similar to our Definition 2, except that we allow the inclusion of 'balance' constraints, which are private to each player, without additional private state variables (see Section 2). Another difference with the literature on anonymous sequential games is that the existence of equilibrium there is established using nonconstructive compactness arguments (see [6, 19]).

In Section 4 the existence and uniqueness properties are established by means of a potential function the players unknowingly maximize, which is markedly different from the objective of the linear program.
For example, the optimization problem corresponding to (1) is:

$$\max\ 2\log(2x_1 + x_2) - x_1 - x_2 \quad \text{such that } x_1 \le 1, \text{ over } x_1, x_2 \ge 0,$$

with the sole optimal solution being the equilibrium flow $x_1 = 1$, $x_2 = 0$. As the optimal solutions of the two optimization problems do not coincide in general, the time-average rewards in equilibrium are in general strictly lower than the optimum. The largest possible ratio of the maximum reward over the reward at equilibrium, called the price of anarchy, is a measure of the inefficiency of equilibrium, first proposed by Koutsoupias and Papadimitriou in [11]. For nonatomic congestion games, the price of anarchy was first computed in [16] for various families of delay functions. In Proposition 2 we establish that the price of anarchy is 2, attained in the limit of a sequence of simple examples, similar to that above.

In Section 3 we consider examples from three areas: ride-hailing, crowdsourcing platforms, and interacting semi-Markov decision processes. For each case, we give example formulations as resource allocation games and obtain some new results. In [5] the authors consider the optimization of a fluid model of a ride-hailing system, where mismatches of demand and supply cause waiting delays similar to fluid queues, but no gaming aspects are explored. This is done in [2], where a concept of equilibrium similar to ours is defined for the two strategies of whether to circulate (with routing fixed) or not. [4] considers routing as part of the strategy set and establishes existence of equilibrium in symmetric systems with identical players. Corollary 1 in Section 4 extends the results of [4] to arbitrary network topologies and multiple player types.

The statements and proofs of the main results are given in Section 4, followed by discussion in Section 5.

A mass $d^l > 0$ of type $l = 1, \dots, L$ players generate value by performing a set of $J$ activities which consume $I$ resources. For a type $l$ player, activity $j \in \{1, \dots, J\}$ takes time $t^l_j \ge 0$ and consumes $a^l_{ij} \ge 0$ units of resource $i \in \{1, \dots, I\}$. Let $t^l = (t^l_j, j = 1, \dots, J)$ be the column vector of activity durations, and $b = (b_i, i = 1, \dots, I)$, where $b_i > 0$ is the rate at which resource $i$ is provided. Also, let $x^l_j$ denote the total rate at which activity $j$ is taken by all players of type $l$, and $x^l = (x^l_j, j = 1, \dots, J)$ be the column vector of type $l$ rates. Then the resource constraints are expressed as $\sum_l A^l x^l \le b$, where $A^l$ is an $I \times J$ matrix with $a^l_{ij}$ in its $i$-th row and $j$-th column. Each type $l$ player receives reward $c^l_j$ for completing activity $j$. The total reward rate for all type $l$ players is expressed as $c^{lT} x^l$, where $c^l = (c^l_j, j = 1, \dots, J)$ is a column vector.

The activities can be interdependent in the sense that the rates $x^l$ for type $l$ satisfy $K^l$ homogeneous balance constraints, i.e., $H^l x^l = 0$, for some $K^l \times J$ matrix $H^l$ with the element of the $k$-th row and $j$-th column denoted by $h^l_{kj}$. Note that the resource constraints restrict the aggregate rates, whereas the balance constraints restrict the strategies of each player.

Next, we define the activity rates which correspond to optimal resource allocation.

Definition 1.
A vector of activity rates $x^* = (x^{*l}, l = 1, \dots, L)$ is optimal if it maximizes the total reward rate, i.e., it solves

$$\max \sum_l c^{lT} x^l \tag{2}$$

such that

$$\sum_l A^l x^l \le b, \tag{3}$$
$$H^l x^l = 0, \tag{4}$$
$$t^{lT} x^l = d^l, \quad l = 1, \dots, L, \tag{5}$$

over $x^l \ge 0$, $l = 1, \dots, L$.

If instead players act selfishly, each maximizes his or her own average reward rate. As activities are assigned with no central coordination, players may have to wait before they can commence high reward activities, due to competition for the limited resources. Thus, players also need to take into account the waiting delay $w^l_j$ before each activity $j$ can commence.

Definition 2.
A pair $(x^o, w)$, where $x^o = (x^{ol}, l = 1, \dots, L)$ and $w = (w^l, l = 1, \dots, L)$, is an equilibrium if:

1. $x^{ol} = (x^l_j, j = 1, \dots, J)$ is an optimal solution of

$$\max\ c^{lT} x \tag{6}$$

such that

$$H^l x = 0, \tag{7}$$
$$t^{lT} x + w^{lT} x = d^l, \tag{8}$$

over $x \in \mathbb{R}^J_+$, for each $l = 1, \dots, L$.

2. $$\sum_l A^l x^{ol} \le b. \tag{9}$$

3. $w^l = A^{lT} \delta$ for a nonnegative column vector $\delta = (\delta_i, i = 1, \dots, I)$ with $\delta_i = 0$ if $\sum_{j,l} a^l_{ij} x^l_j < b_i$.

In problem (6) the time-average reward is maximized from the perspective of a type $l$ player: time is split into either performing some activity or waiting for it (as suggested by (8)) while respecting the balance constraints. Each player solves an instance of (6) for an infinitesimal mass on the right-hand side of (8), but as optimal solutions are homogeneous of degree 1 with respect to the mass constant, $x^{ol}$ gives the equilibrium rates for the entire mass of type $l$ players.

Resource constraints (9) are not part of the optimization in (6), as resource capacities $b$ and aggregate rates $x^o$ are not directly known to the players; resource exhaustion is signaled through the waiting delays $w$ instead. Condition 3 in Definition 2 requires the delays to be of a specific form, which can be thought of as resulting from the following posited mechanism: tickets, each granting usage of a single unit of resource $i$, are handed out from a booth for the respective resource at rate $b_i$. Players of type $l$ can start performing activity $j$ once they have collected all tickets for the resources required, i.e., $a^l_{ij}$ tickets for each resource $i$. If $\delta_i$ is the delay to obtain a single resource $i$ ticket, then the waiting delay $w^l_j$ for collecting all the required tickets for activity $j$ is $\sum_i a^l_{ij} \delta_i$.

We make the assumption that there always exist activity assignments with positive rewards for all types.

Assumption 1 (Feasibility).
There exist nonnegative vectors $x^l$, $l = 1, \dots, L$, with $\sum_l A^l x^l \le b$, $H^l x^l = 0$, $c^{lT} x^l > 0$ for all $l$.

Also, we assume that all types have the incentive to participate under all possible waiting delays. For example, this is the case when there is an outside option with a positive reward.

Assumption 2 (Participation). For every $w^l \ge 0$, $l = 1, \dots, L$, the maximum value $c^{lT} x^l$ in (6) is strictly positive.

In this section we formulate a model for ride-hailing which fits into the resource allocation framework. In ride-hailing systems a population of drivers transport customers to their destinations. We consider a geographical area which we assume is divided into a finite set of regions. Let $b_i$ be the rate at which customers arrive in region $i$, with a proportion $q_{ij}$ of them requesting transport to region $j$. In the context of resource allocation, customers are seen as resources.

The drivers constitute the players, which we assume are of a single type with mass $d$. There are two types of activities: i) a 'busy' activity, where the driver transports a customer who has been picked up from $i$ to his destination, and ii) a 'free' activity, where the driver chooses to move from $i$ to $j$ without carrying a customer. Here it is assumed that drivers cannot pick up customers from different regions, and any customers which exceed the driver capacity in a region are lost. Notice that the 'busy' activity may involve waiting if the supply of drivers exceeds the rate of arriving customers. Also, it does not include the customer's destination, as this is typically not known to the driver before agreeing to serve the customer. A driver is compensated with $c_i > 0$ per unit of transport time for a customer picked up in region $i$. Thus, the busy activity brings an expected reward $c_i \sum_j q_{ij} t_{ij}$, with $t_{ij} > 0$ the travel time from $i$ to $j$, while the free activity is not rewarded and takes $t_{ij}$ time to complete.

Let $x_i$ be the rate of drivers choosing the busy activity in $i$, and $y_{ij}$ be the rate of drivers moving free from $i$ to $j$. Then, as the inflow and outflow of drivers in any region must balance, we have the constraint:

$$x_i + \sum_j y_{ij} = \sum_j x_j q_{ji} + \sum_j y_{ji},$$

for each $i$. The first term on the right-hand side is the rate of busy drivers arriving at $i$ after reaching the destination of a customer that was picked up from $j$, for any $j$.
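The balance constraint can be checked mechanically for a candidate flow. The following is a minimal Python sketch, not part of the paper: the function name `balance_residuals` and the two-region data are hypothetical and chosen only for illustration. It computes outflow minus inflow of drivers per region, which must vanish for a feasible circulation.

```python
def balance_residuals(x, y, q):
    """Outflow minus inflow of drivers for each region i.

    x[i]    : rate of 'busy' activities started in region i
    y[i][j] : rate of 'free' moves from region i to region j
    q[i][j] : fraction of customers picked up in i who travel to j
    """
    n = len(x)
    residuals = []
    for i in range(n):
        outflow = x[i] + sum(y[i][j] for j in range(n))
        inflow = sum(x[j] * q[j][i] for j in range(n)) + sum(y[j][i] for j in range(n))
        residuals.append(outflow - inflow)
    return residuals


# Hypothetical two-region data (an assumption, not taken from the paper):
# all customers travel to the other region.
q = [[0.0, 1.0], [1.0, 0.0]]
x = [1.0, 0.5]                 # busy rates in regions 0 and 1
y = [[0.0, 0.0], [0.5, 0.0]]   # half a unit of drivers returns free from 1 to 0
residuals = balance_residuals(x, y, q)
print(residuals)
```

In this toy instance region 0 sends out busy drivers at rate 1 but receives busy drivers only at rate 0.5, so the free flow of 0.5 from region 1 to region 0 is what restores the balance.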
Each busy activity 'consumes' a customer, so we have the resource constraint $x_i \le b_i$ for all $i$.

Corollary 1 guarantees the existence of an equilibrium $(x^o, w)$, where the average reward and the mass of waiting drivers have unique values in all equilibria. The equilibria can be computed by solving the convex optimization problem (15).

Fig. 2. A ride-hailing system with $d$ drivers comprised of three regions (depicted as arcs). Top: busy and free flows (solid and dashed arrows, respectively) for $d \le 2$. Middle and bottom: the flows for the two larger ranges of $d$ discussed in Example 1.

Example 1. Consider an area with three regions as depicted in Fig. 2. Customers request transport from regions 1 and 3, with unit rate from each, towards the center region 2, i.e., $b_1 = b_3 = 1$, $b_2 = 0$, $q_{12} = q_{32} = 1$. For each trip transporting a customer from region 3, the driver receives a unit reward, while from region 1 the driver receives double ($c_1 = 2$, $c_3 = 1$). Assume unit transport times between neighboring regions: $t_{ij} = |i - j|$ for any two regions $i, j$.

We observe three regimes, depending on the mass $d$ of drivers. In the first regime, there is no waiting in our fluid model to pick up customers in region 1. Serving continuously customers from 1 to 2 generates 2 units of reward per round-trip, i.e., an average reward rate of 1. This is higher than the 1/2 average reward rate of serving region 3, so all $d$ drivers will choose serving region 1, provided they can always pick up a customer on their return to region 1. This will be possible as long as there are no waiting drivers in region 1, and so $d$ equals the number of drivers $2x_1$ on the forward and return trip. As $x_1 \le b_1 = 1$, we must have $d \le 2$.

In the second regime, $d$ is just above 2; then the customer demand from 1 cannot support all drivers and so some of them may wait. They will do so if the average reward (including the wait) is no less than the average reward of serving region 3 (in which there is no waiting). At this point the total reward rate is 2 (an average of $2/d$ per driver) and does not increase for small increases of $d$, even though the revenue stream from customers from region 3 is not utilized. Clearly, this equilibrium does not maximize the total reward rate (2), and even more, it is not Pareto efficient, i.e., the society of drivers as a whole could gain more by serving region 3 too and splitting the total proceeds.

In the third regime, serving region 3 becomes a best choice due to the high delays in waiting for customers in region 1. The mass $d - 2$ of waiting drivers grows as $d$ is further increased, until the average reward $2/d$ equals the 1/2 reward for serving region 3, i.e., $d = 4$. If $d > 4$, the extra $d - 4$ drivers turn to serving region 3.

In crowdsourcing platforms, tasks which typically form small parts of a much larger effort are executed by many participants in parallel, who may receive a reward for each task completion. The tasks vary in their difficulty, time to complete, reward given etc., and so do the capabilities and task preferences of the participants. The latter typically select tasks in order to receive as high rewards as possible.

We can formulate a simplified model in terms of resource allocation as follows: tasks of type $i$ are generated at rate $b_i$ and correspond to a unit of resource $i$. Activity $i$ concerns the processing of one type $i$ task. Participants, which are the players here, are of $L$ different types, with the rewards $c^l$ and task processing times $t^l$ being dependent on the type $l$. One of the activities corresponds to idling and has 0 reward. Task types which cannot be undertaken by a participant type are assumed to bring a negative reward, so that they are never selected. The $A^l$ are all unit matrices and there are no balance constraints, as the tasks are assumed independent.

Theorem 1 below implies that an equilibrium $(x^o, w)$ exists and that the resulting aggregate reward $c^{lT} x^{ol}$ attained by type $l$ participants, for each $l$, is unique. By Corollary 1, the waiting delay $w^l_j$ is uniquely determined for tasks with nonzero equilibrium rates, and given by (21).

How does the total reward $\sum_l c^{lT} x^{ol}$ compare to the maximum possible reward when task assignment is performed according to (2), with the same participants? Notice that if all rewards of one participant type, e.g., type 1, are doubled while those of the other types do not change, $x^o$ remains an equilibrium, as it is the relative rewards between activities that matter in the players' selection, not the actual rewards. Thus, there will be no increase in the amount of tasks completed by type 1.
This is not the case under optimal task assignment, as the change will likely allow type 1 to complete more tasks (by having other types idle) because their rewards are part of the system objective. The increase in optimal rewards may be arbitrarily larger than the increase in $c^{lT} x^{ol}$, as illustrated in the following example.

Example 2. Consider $L = 2$ participant types, with tasks of a single type (besides the idling task) arriving at rate 1. The rewarded value is $1/\epsilon$ and 1 for type 1 and 2 respectively, for some constant $\epsilon > 0$. The mass of type 2 participants is $1/\epsilon$, while that of type 1 is unit. Writing $x^l_1$ for the task rate and $x^l_0$ for the idling rate of type $l$,

$$\begin{aligned} \max\ & \tfrac{1}{\epsilon} x^1_1 + x^2_1 \\ \text{s.t.}\ & x^1_1 + x^2_1 \le 1, \\ & x^1_1 + x^1_0 = 1, \\ & x^2_1 + x^2_0 = \tfrac{1}{\epsilon}, \\ \text{over}\ & x^1_1, x^2_1, x^1_0, x^2_0 \ge 0, \end{aligned}$$

yields the optimal solution $x^1_1 = 1$, $x^2_1 = 0$, with $x^1_0, x^2_0$ being the rates of the idling activities for each participant type. This is expected, as the higher value type 1 participants generate more value than type 2, attaining total value $1/\epsilon$.

On the other hand, at equilibrium both types choose the task, since it is the only activity generating positive revenue, and the (unique) equilibrium has

$$x^1_1 = \frac{\epsilon}{1 + \epsilon}, \quad x^2_1 = \frac{1}{1 + \epsilon}, \quad w = \frac{1}{\epsilon},$$

yielding total value $2/(1 + \epsilon)$. This can be formally shown either directly from Definition 2, or by Theorem 1 below, but it is expected because type 1 participants are a fraction $\epsilon$ of type 2.

The ratio of the optimal to the equilibrium value, i.e., the 'price of anarchy', is $\frac{1 + \epsilon}{2\epsilon}$, which approaches $\infty$ as $\epsilon \to 0$.

In this section we formulate a nonatomic game with players' states evolving according to semi-Markov decision processes (SMDPs), which interact through congestion effects due to linear constraints. Although Proposition 1 below holds for players with SMDPs of multiple types, we state it for a single player type to simplify notation. We then show that such games are instances of stationary anonymous sequential games [6, 19].

Consider an SMDP with a finite state space $\mathcal{S}$ and action space $\mathcal{A}$. At each state $i \in \mathcal{S}$, action $a \in \mathcal{A}$ will make the process transit to $j$ with probability $p^a_{ij}$ after a random time with mean $t_{ia} > 0$, which is independent of the past conditionally on the current state and action. A stationary policy is specified by the probability $p(a|i)$ of choosing action $a$ once transitioning to $i$, for every $i \in \mathcal{S}$, $a \in \mathcal{A}$. We assume the transition probabilities are such that the embedded Markov chain resulting from any stationary policy is irreducible, so in particular the SMDP possesses a unique stationary distribution $(\pi_i, i \in \mathcal{S})$. Under this distribution, let $x_{ia} = d\pi_i p(a|i)$ be the average rate at which action $a$ is taken in state $i$, by $d$ copies of the SMDP, all following the same policy.

Let $c_{ia}$ be the reward received for taking action $a$ in state $i$. Action rates $x = (x_{ia}, i \in \mathcal{S}, a \in \mathcal{A})$ are constrained by resource constraints of the form $Ax \le b$, for a nonnegative $I \times |\mathcal{S} \times \mathcal{A}|$ matrix $A$ and column vector $b$. As in the general framework, active resource constraints cause a waiting delay $w_{ia}$ before action $a$ in state $i$ can be taken.

We consider equilibria of the following form.

Definition 3. $(p, x^o, w)$ is an SMDP equilibrium if and only if:

1. The policy $p = (p(a|i), i \in \mathcal{S}, a \in \mathcal{A})$ solves the dynamic programming equation

$$V(i) = \max_{a \in \mathcal{A}} \Big[ c_{ia} - \gamma (t_{ia} + w_{ia}) + \sum_{j \in \mathcal{S}} p^a_{ij} V(j) \Big], \quad i \in \mathcal{S}, \tag{10}$$

with $\gamma$ the time-average reward, i.e., the maximum is attained for any $a$ with $p(a|i) > 0$.

2. $$x^o_{ia} = d\pi_i p(a|i), \quad i \in \mathcal{S}, a \in \mathcal{A}, \tag{11}$$

where $(\pi_i, i \in \mathcal{S})$ is the stationary distribution under policy $p$.

3. $Ax^o \le b$,

4. $w = A^T \delta$ for a nonnegative column vector $\delta$ with $\delta_q = 0$ if $\sum_{i,a} a_{q,ia} x^o_{ia} < b_q$.

Proposition 1.
An SMDP equilibrium $(p, x^o, w)$ exists and the time-average reward attained by the policy $p$ is the same in every equilibrium.

Proof. SMDP equilibria directly correspond to equilibria of Definition 2, as (10) is equivalent to (6) for $L = 1$ and set of activities $\mathcal{S} \times \mathcal{A}$. It is well known that the stationary policies $p$ which optimize (10) correspond to optimal solutions $(y^*_{ia}, (i, a) \in \mathcal{S} \times \mathcal{A})$ of the linear program [7]:

$$\max \sum_{(i,a) \in \mathcal{S} \times \mathcal{A}} c_{ia} y_{ia}$$

such that

$$\sum_{a \in \mathcal{A}} y_{ia} = \sum_{(j,a') \in \mathcal{S} \times \mathcal{A}} y_{ja'} p^{a'}_{ji}, \quad i \in \mathcal{S}, \tag{12}$$
$$\sum_{(i,a) \in \mathcal{S} \times \mathcal{A}} (t_{ia} + w_{ia}) y_{ia} = 1, \tag{13}$$

over $y_{ia} \ge 0$, $(i, a) \in \mathcal{S} \times \mathcal{A}$, where $y^*_{ia}$ corresponds to the rate at which action $a$ is chosen at $i$ under an optimal policy of (10). This implies $x^o/d$ is an optimal solution, since it corresponds to $p$, by (11). Therefore, $(x^o, w)$ satisfies the conditions in Definition 2 for balance constraints given by (12).

The converse is also true, as given an equilibrium $(x^o, w)$,

$$p(a|i) = \frac{x^o_{ia}}{\sum_{a'} x^o_{ia'}}, \qquad \pi_i = \frac{\sum_a (t_{ia} + w_{ia}) x^o_{ia}}{\sum_{j,a'} (t_{ja'} + w_{ja'}) x^o_{ja'}}$$

define a policy and the corresponding stationary distribution which give an SMDP equilibrium. From Corollary 1, an SMDP equilibrium exists. □

If no resource constraint is active then $w_{ia} = 0$ for all $i, a$, and no interaction takes place between the SMDPs. In this case the equilibrium policies achieve the maximum possible total average reward, and a joint policy selection (control centralization) cannot produce a higher total reward. If some constraints are active in equilibrium and waiting results, then the average reward is strictly below the one possible under centralized control. This drop due to decentralization cannot be more than half, because of Proposition 2 below.
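The converse step in the proof of Proposition 1 is a direct computation on the rates. The following minimal Python sketch applies the expressions for $p(a|i)$ and $\pi_i$ given in that proof to rates stored as dictionaries keyed by (state, action). The function name `policy_and_occupancy` and the toy numbers are illustrative assumptions, not data from the paper, and the rates used are not claimed to be an equilibrium.

```python
def policy_and_occupancy(x, t, w):
    """Recover p(a|i) and pi_i from action rates.

    x, t, w: dicts keyed by (state, action) holding rates,
    mean action durations and waiting delays respectively.
    """
    states = sorted({i for (i, _) in x})
    # Total rate at which decisions are taken in each state.
    rate_in_state = {i: sum(v for (s, _), v in x.items() if s == i) for i in states}
    policy = {
        (i, a): (v / rate_in_state[i] if rate_in_state[i] > 0 else 0.0)
        for (i, a), v in x.items()
    }
    # Player mass sitting in each state: (duration + wait) * rate, summed over actions.
    mass = {i: sum((t[k] + w[k]) * x[k] for k in x if k[0] == i) for i in states}
    total = sum(mass.values())
    occupancy = {i: mass[i] / total for i in states}
    return policy, occupancy


# Hypothetical rates, durations and delays (assumptions for illustration only).
x = {("A", "go"): 1.0, ("A", "stay"): 1.0, ("B", "go"): 2.0}
t = {("A", "go"): 1.0, ("A", "stay"): 1.0, ("B", "go"): 1.0}
w = {("A", "go"): 1.0, ("A", "stay"): 0.0, ("B", "go"): 0.0}
policy, occupancy = policy_and_occupancy(x, t, w)
print(policy)
print(occupancy)
```

Here the denominator of $\pi_i$ equals the total player mass, so in this toy case state A holds 3 of the 5 units of mass.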
Relation to stationary anonymous sequential games:
In stationary anonymous sequential games [6, 19], each player knows its own state and the distribution $n = (n_{ia}, i \in \mathcal{S}, a \in \mathcal{A})$ of player mass on state-action pairs. The game between SMDPs is an instance of a (nonlinear) stationary anonymous sequential game because the information on the aggregate, $(x, w)$, and $n$ are equivalent, through the identities $n_{ia} = (t_{ia} + w_{ia}) x_{ia}$ for each $i, a$.

Lemma 1. For each nonnegative $n = (n_{ia}, i \in \mathcal{S}, a \in \mathcal{A})$ with $\sum_{(i,a) \in \mathcal{S} \times \mathcal{A}} n_{ia} = d$, there exist unique $x(n) = (x_{ia}(n), (i, a) \in \mathcal{S} \times \mathcal{A})$ and $w(n) = A^T \delta$ with $\delta \in \mathbb{R}^I_+$, such that

$$n_{ia} = (t_{ia} + w_{ia}(n)) x_{ia}(n) \text{ for all } i \in \mathcal{S}, a \in \mathcal{A}, \qquad Ax(n) \le b, \qquad \delta^T (Ax(n) - b) = 0. \tag{14}$$

The mapping $n \mapsto (x(n), w(n))$ is continuous.

Proof. Consider the optimization problem

$$\max \sum_{i,a} \left[ n_{ia} \log(x_{ia}) - t_{ia} x_{ia} \right] \quad \text{s.t. } Ax \le b, \text{ over } x = (x_{ia}, i \in \mathcal{S}, a \in \mathcal{A}) \ge 0.$$

A unique solution $x(n)$ exists, as the objective is a strictly concave function maximized over a set with compact closure, and $x_{ia}(n) > 0$ unless $n_{ia} = 0$. By strong duality, (14) characterizes the optimal solution, with $\delta$ being the optimal solution of the dual problem.

The mapping $n \mapsto x(n)$ is continuous because the objective is continuous in $n$ and $x(n)$ is unique. The continuity of $w(n)$ follows from (14). □

Since the action delays $w_{ia}(n)$ of a player taking action $a$ in state $i$ are continuous in $n$, the existence of equilibrium in Proposition 1 also follows from the time-average reward case in [19]. In Theorem 1 we give a constructive proof which also yields uniqueness, based on a potential function for the game.

In the case $p^a_{ii} = 1$ for all $i, a$, the SMDP game becomes a finite strategy nonatomic (one-shot) game [13, 17] with the payoff of playing strategy $(i, a)$ given by

$$\frac{c_{ia}}{t_{ia} + w_{ia}(n)} = \frac{c_{ia} x_{ia}(n)}{n_{ia}}, \quad i \in \mathcal{S}, a \in \mathcal{A}.$$

The third case in Corollary 1 states that the equilibrium $w_{ia}(n)$ is unique if $x_{ia}(n) > 0$.

Equilibria have the following variational characterization.
Theorem 1. $(x^o, w)$ is an equilibrium if $x^o$ maximizes

$$\max \sum_l \left[ d^l \log\left( c^{lT} x^l \right) - t^{lT} x^l \right] \tag{15}$$

such that

$$\sum_l A^l x^l \le b, \tag{16}$$
$$H^l x^l = 0, \quad l = 1, \dots, L, \tag{17}$$

over $x^l \in \mathbb{R}^J_+$, $l = 1, \dots, L$, and $w = (A^{lT} \lambda, l = 1, \dots, L)$, where $\lambda \in \mathbb{R}^I_+$ are optimal values for the dual variables of the constraint (16).

Under Assumption 2, for any equilibrium $(x^o, w)$, $x^o$ maximizes (15) and $w$ is as above.

Proof. Let $(x^o, w)$ be an equilibrium. Since $x^{ol}$ maximizes (6), Assumption 2 implies it is also the maximizer of $d^l \log\left( c^{lT} x^l \right)$ under the same constraints. The optimality conditions for this problem are:

1. (Feasibility)
$$H^l x^{ol} = 0, \quad t^{lT} x^{ol} + w^{lT} x^{ol} = d^l, \tag{18}$$

2. (First order conditions)
$$\frac{d^l c^l_j}{c^{lT} x^{ol}} - t^l_j \nu^l - w^l_j \nu^l - \sum_k \mu^l_k h^l_{kj} \le 0, \text{ with equality if } x^l_j > 0, \tag{19}$$

for some values $\nu^l, \mu^l_k$ of the dual variables, for each $l = 1, \dots, L$, $k = 1, \dots, K^l$.

First note that multiplying both sides of (19) with $x^l_j$, summing over $j$, and applying (18) yields $\nu^l = 1$. For $\delta$ as in Definition 2, letting $\lambda = \delta$ yields

$$\lambda \ge 0, \quad \lambda^T \Big( \sum_l A^l x^{ol} - b \Big) = 0 \tag{20}$$

as well as $w^l = A^{lT} \lambda$. Thus, the conditions above are reexpressed as

1') $H^l x^{ol} = 0$,

2') $\dfrac{d^l c^l_j}{c^{lT} x^{ol}} - t^l_j - \sum_i \lambda_i a^l_{ij} - \sum_k \mu^l_k h^l_{kj} \le 0$, with equality if $x^l_j > 0$.

These, along with (9), (20), are the optimality conditions for the problem (15), which $x^o$ satisfies.

By proceeding in the reverse direction, it is easy to see that 1'), 2'), (9), (20) imply 1), 2), and so $(x^o, w)$ is an equilibrium for $w$ as in the statement of the theorem. □

The linear term $\sum_l t^{lT} x^l$ in (15), which we refer to as the active mass, corresponds to the total player mass that is engaged in any activity.

Corollary 1.
1. Under Assumption 1 there exists an equilibrium.

2. Under Assumption 2, the value $c^{lT} x^{ol}$ rewarded to type $l$ players and the active mass $\sum_l t^{lT} x^{ol}$ assume the same values in all equilibria.

3. Under Assumption 2, if type $l$ has no balance constraints, i.e., $H^l = 0$, the waiting delay $w^l_j$ is uniquely determined by

$$\frac{c^l_j}{w^l_j + t^l_j} = \frac{c^{lT} x^{ol}}{d^l}, \tag{21}$$

whenever $x^l_j > 0$ in any equilibrium.

Proof. By Assumption 1 the feasible set of problem (15) is nonempty, and it has a compact closure. Since the objective function is continuous, a maximizing $x^o = (x^{ol}, l = 1, \dots, L)$ exists inside the closure. As the value of the objective function tends to $-\infty$ as $c^{lT} x^l \to 0^+$, $x^o$ is feasible and so it is optimal.

Under Assumption 2 any equilibrium $x^o$ corresponds to an optimal solution of (15). Since the objective function is strictly concave with respect to $c^{lT} x^l$, the $c^{lT} x^{ol}$ values are unique. As all equilibria yield the same optimal value in (15), $\sum_l t^{lT} x^{ol}$ is also unique.

Equation (21) holds because $x^l_j > 0$ implies that activity $j$'s reward per unit time, appearing on the left-hand side of (21), is equal to the optimal one for type $l$ on the right. (This is a restatement of condition (19).) For every $l, j$ with $x^l_j > 0$, $w^l_j$ is unique because $c^{lT} x^{ol}$ is. □

If the maximization in (15) is restricted to a constant active mass (by including the constraint $\sum_l t^{lT} x^l = d$ for some $d$) then only the first term, the aggregate of logarithmic rewards, is optimized. This objective induces a proportionally fair [9] distribution of value between players, i.e., any changes to activity rates incur an aggregate of proportional value changes which is nonpositive. Therefore, the value distribution at equilibrium is the proportionally fair allocation under the additional restriction that the active mass is that at equilibrium, i.e., $\sum_l t^{lT} x^l = \sum_l t^{lT} x^{ol}$.
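As a numerical illustration of Theorem 1 and of (21), the sketch below maximizes the potential of the introductory two-route example, $2\log(2x_1 + x_2) - x_1 - x_2$ subject to $x_1 \le 1$, by a plain grid search, and then recovers the waiting delay on route 1 from (21). The grid resolution, the search box $x_2 \in [0, 2]$ and all variable names are assumptions made for this sketch; a convex solver would be the natural choice for larger instances.

```python
import math

d = 2.0          # player mass
c = (2.0, 1.0)   # rewards of routes 1 and 2
t = (1.0, 1.0)   # traversal times of routes 1 and 2
b1 = 1.0         # rate limit on route 1


def potential(x1, x2):
    """Objective (15) for a single player type without balance constraints."""
    return d * math.log(c[0] * x1 + c[1] * x2) - (t[0] * x1 + t[1] * x2)


steps = 100
best = None
for i in range(steps + 1):           # x1 on a grid over [0, b1]
    for k in range(2 * steps + 1):   # x2 on a grid over [0, 2]
        x1, x2 = b1 * i / steps, k / steps
        if c[0] * x1 + c[1] * x2 <= 0:
            continue                 # the logarithm is undefined at zero reward
        value = potential(x1, x2)
        if best is None or value > best[0]:
            best = (value, x1, x2)

_, x1_eq, x2_eq = best
reward = c[0] * x1_eq + c[1] * x2_eq
# Equation (21): c_1 / (w_1 + t_1) = reward / d, solved for w_1.
w1 = c[0] * d / reward - t[0]
print(x1_eq, x2_eq, reward, w1)
```

The search returns the flow $x_1 = 1$, $x_2 = 0$ with total reward 2 and delay $w_1 = 1$, matching the equilibrium derived by hand in the introduction.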
Note also that proportionally fair allocations coincide with the Nash bargaining solution if disagreement entails nonparticipation.

In the single player type case, equilibria achieve maximum value when only the active mass at equilibrium is allowed to participate. Let $F(d')$ be the optimal value of (2) for player mass $d'$, i.e.,

$$F(d') = \max\ c^T x \tag{22}$$

such that

$$Ax \le b, \quad Hx = 0, \quad t^T x = d', \tag{23}$$

over $x \ge 0$, where we have dropped the type index.

Corollary 2. Let Assumption 2 hold, and $F(d) > 0$. For a single player type with mass $d$, the equilibrium $x^o$ is optimal for a player mass equal to the active mass at equilibrium, $t^T x^o$, i.e., $c^T x^o = F(t^T x^o)$. Moreover, the active mass at equilibrium is the unique $d' \in (0, d]$ with the property $F(d') = F'(d') d$, where $F'(d')$ is a subgradient at $d'$.

Proof. By Theorem 1, $x^o$ and $d^o = t^T x^o$ maximize

$$\max\ c^T x\, e^{-d'/d} \tag{24}$$

such that $Ax \le b$, $Hx = 0$, $t^T x = d'$, over $x \ge 0$, $d' \ge 0$.

If $x$ is feasible in (22) for $d' = d$ then $x d'/d$ is also feasible for any $d' \le d$. As $F(d) > 0$, the feasible set of (22) is nonempty for any $d' \in (0, d]$, and $F(d') > 0$ for $d' > 0$. For $d' \in (0, d]$ fixed, optimizing (24) with respect to $x$ yields the optimal value $F(d') e^{-d'/d}$, which itself is maximized for $d' = d^o$, and so $c^T x^o = F(d^o)$ as well.

Now,

$$-\frac{d'}{d} + \log F(d')$$

is a concave function of $d'$, with the maximizing $d'$ characterized by $F(d') = F'(d') d$ for a subgradient $F'(d')$. As $d^o$ is the unique maximizer, this equation identifies $d^o$ uniquely. □

For a single player type, we calculate the price of anarchy, i.e., the largest possible ratio of the optimal value to the value at equilibrium,

$$\sup_{\substack{d > 0,\ c \in \mathbb{R}^J,\ A \in \mathbb{R}^{I \times J}_+,\ b \in \mathbb{R}^I_+, \\ H \in \mathbb{R}^{K \times J},\ t \in \mathbb{R}^J_+,\ I, J, K \in \mathbb{N} \\ \text{s.t. Assumption 2},\ F(d) > 0}} \frac{c^T x^*}{c^T x^o},$$

where $x^*$ is optimal, and $x^o$ an equilibrium, for player mass $d$. Assumption 2 and $F(d) > 0$ ensure that both values are positive, so the ratio is well defined.

Proposition 2.
The price of anarchy is 2.
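As a numerical illustration of this statement, the following sketch (not part of the original analysis) evaluates the ratio on a hypothetical two-activity instance: rewards $c = (1/\epsilon, 1)$, unit activity durations, a single resource constraint $x_1 \le \epsilon$ with $\epsilon \in (0, 1)$, and player mass $d = 1$. It relies on the characterization from Corollary 2 and its proof, namely that the equilibrium active mass maximizes $\log F(d') - d'/d$ over $(0, d]$ and that the equilibrium value equals $F(d^o)$. The function names and the use of ternary search are assumptions made for illustration only.

```python
import math


def F(mass, eps):
    """Optimal value of the toy LP (an assumed instance):
    max x1/eps + x2  s.t.  x1 <= eps,  x1 + x2 = mass,  x1, x2 >= 0.
    Requires 0 < eps < 1, so that activity 1 pays more per unit of mass."""
    x1 = min(mass, eps)   # fill the better-paying, capacity-limited activity first
    x2 = mass - x1        # the remaining mass goes to activity 2
    return x1 / eps + x2


def equilibrium_active_mass(eps, d=1.0, iters=200):
    """Maximize log F(d') - d'/d over (0, d] by ternary search.
    The objective is concave in d' (see the proof of Corollary 2)."""
    def g(m):
        return math.log(F(m, eps)) - m / d

    lo, hi = 1e-12, d
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def price_of_anarchy_ratio(eps, d=1.0):
    """Return (d_o, ratio) where ratio = F(d) / F(d_o)."""
    d_o = equilibrium_active_mass(eps, d)
    optimum = F(d, eps)        # value of the optimal allocation for mass d
    equilibrium = F(d_o, eps)  # value at equilibrium, by Corollary 2
    return d_o, optimum / equilibrium


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01):
        d_o, ratio = price_of_anarchy_ratio(eps)
        print(f"eps={eps}: active mass ~ {d_o:.4f}, ratio ~ {ratio:.4f}")
```

For $\epsilon = 0.1$ the computed active mass is approximately $0.1$ and the ratio approximately $1.9 = 2 - \epsilon$, so the ratio approaches 2 as $\epsilon$ decreases.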
Proof.
Since $F$, defined in (22), satisfies $F(d^o) = F'(d^o)\, d = c^T x^o$ by Corollary 2,

$$c^T x^* - c^T x^o = F(d) - F(d^o) \le \frac{c^T x^o}{d}\,(d - d^o),$$

by using also the concavity of $F$. Thus,

$$\frac{c^T x^*}{c^T x^o} \le 2 - \frac{d^o}{d} \le 2,$$

as $0 < d^o \le d$.

To get a lower bound, for any $\epsilon \in (0, 1)$ consider the problem

$$\max\; \frac{1}{\epsilon} x_1 + x_2 \quad \text{s.t.}\quad x_1 \le \epsilon,\quad x_1 + x_2 = 1,$$

over $x_1, x_2 \ge 0$. The maximum value is $2 - \epsilon$, achieved at $x_1^* = \epsilon$, $x_2^* = 1 - \epsilon$.

On the other hand, waiting delays $w_1 = \epsilon^{-1} - 1$, $w_2 = 0$ induce $x_1^o = \epsilon$, $x_2^o = 0$ as an optimal solution of

$$\max\; \frac{1}{\epsilon} x_1 + x_2 \quad \text{s.t.}\quad x_1 + x_2 + w_1 x_1 + w_2 x_2 = 1$$

over $x_1, x_2 \ge 0$, which saturates the resource constraint, $x_1^o = \epsilon$. Thus, $x_1^o = \epsilon$, $x_2^o = 0$ is the equilibrium with value 1, and the ratio $c^T x^*/c^T x^o$ in this case is $2 - \epsilon$, where $\epsilon > 0$ can be arbitrarily small. $\square$

A way to force players to pick activities which maximize total value, as opposed to individual rewards, is to use the shadow prices $\lambda^*$ of the resource constraints in (2) as resource prices. Under these 'optimal' prices, activity $j$ has net reward $c_j - \sum_i a_{ij} \lambda_i^*$.

Now, duality implies the optimal $x^*$ maximizes the Lagrangian of (2),

$$\max\; \left(c^T - \lambda^{*T} A\right) x \quad \text{such that}\quad Hx = 0,\quad t^T x = d,$$

over $x \ge 0$, and so $(x^*, 0)$ is an equilibrium under optimal pricing, as $Ax^* \le b$ also holds. We include this in the following result.

Proposition 3.
Under optimal pricing, i.e., imposing a price $\lambda_i^*$ per unit of each resource $i$, where $(\lambda_1^*, \dots, \lambda_I^*)$ are the optimal dual variables for the resource constraint in (2), the ensuing equilibrium yields the same value as the optimal value in (2).

The value at equilibrium without optimal pricing is at least as high as the net value retained by players under optimal pricing, i.e., $c^T x^o \ge c^T x^* - \lambda^{*T} A x^*$.

Proof.
Corollary 2 and the concavity of $F$ yield

$$c^T x^o = F'(d^o)\, d \ge F'(d)\, d = c^T x^* - \lambda^{*T} b = c^T x^* - \lambda^{*T} A x^*,$$

where the second equality is by strong duality for problem (22), and the last one by complementary slackness. $\square$

In ordinary, i.e., one-shot, congestion games, waiting delays have the role of congestion cost, usually given exogenously [15] or caused by randomness in the arrivals and service times [8]. The delays in equilibrium correspond exactly to the Lagrange multipliers of flow balance constraints of an optimization problem maximizing the potential function of the game, e.g., see [8]. This is also what happens in Theorem 1, where the delays are the Lagrange multipliers of the resource constraints in (15). Of course, this is a consequence of how delay is defined in the third condition of Definition 2, which is the complementary slackness condition for these constraints. What is novel, to the best of the authors' knowledge, is the use of the potential function in (15) for sequential congestion games, where the delays are determined endogenously by constraints on player mass (i.e., Little's law [12]). In particular, the concavity of the potential function can be used in showing that the best response dynamics, coupled with the waiting delay dynamics due to queueing,

$$\dot{\delta}_i = \frac{1}{b_i} \sum_{l,j} a^l_{ij} x^l_j - 1,$$

converge to an equilibrium, by interpreting them as dynamics of a primal-dual algorithm for solving (15).

The linear reward structure is readily generalized to concave homogeneous rewards by following essentially the same proofs. In one-shot congestion games, inefficiency arises due to inhomogeneity of cost functions [16]. In games exhibiting both endogenous delays and inhomogeneous rewards, it will be interesting to determine how efficiency is affected by each.

The analysis in this paper may be useful in economic applications where both consumption and production of resources take place.
Activities with $a^l_{ij} < 0$ can be thought of as producing $-a^l_{ij}$ units of resource $i$, while resources with $b_i < 0$ must be produced at a net rate of at least $-b_i$. (Note however that (8) is no longer a mass constraint.) Such models are considered in activity analysis, e.g., see [10], where the focus is on optimizing (2). This can be a daunting task because the requirement of centralized knowledge of production parameters is unrealistic, and for this reason activity analysis has been subsumed by general equilibrium models [1]. Nonetheless, an equilibrium concept in activity analysis, such as the one considered in Definition 2 and Theorem 1, may be useful in cases that elude general equilibrium models, namely, cases where the production decisions are decentralized and taken by competing economic agents, as in crowdsourced production where prices may react more slowly to variations of supply and demand.

References

1. Kenneth J. Arrow. George Dantzig in the development of economic analysis. Discrete Optimization, 5(2):159–167, 2008. In Memory of George B. Dantzig.
2. Siddhartha Banerjee, Ramesh Johari, and Carlos Riquelme. Pricing in ride-sharing platforms: A queueing-theoretic approach. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC '15, page 639, New York, NY, USA, 2015. Association for Computing Machinery.
3. Martin J. Beckmann, Charles B. McGuire, and Christopher B. Winsten. Studies in the economics of transportation. 1955.
4. Kostas Bimpikis, Ozan Candogan, and Daniela Saban. Spatial pricing in ride-sharing networks. Operations Research, 67(3):744–769, 2019.
5. Anton Braverman, J. G. Dai, Xin Liu, and Lei Ying. Empty-car routing in ridesharing systems. Operations Research, 67(5):1437–1452, January 2019.
6. Boyan Jovanovic and Robert W. Rosenthal. Anonymous sequential games. Journal of Mathematical Economics, 17(1):77–87, 1988.
7. L. C. M. Kallenberg. Linear Programming and Finite Markovian Control Problems. Mathematical Centre tracts. Mathematisch Centrum (Amsterdam, Netherlands), 1983.
8. F. P. Kelly. Network routing. Philosophical Transactions: Physical Sciences and Engineering, 337(1647):343–367, 1991.
9. Frank Kelly. Charging and rate control for elastic traffic. European Transactions on Telecommunications, 8(1):33–37, 1997.
10. T. C. Koopmans, editor. Activity Analysis of Production and Allocation. Wiley, New York, 1951.
11. Elias Koutsoupias and Christos Papadimitriou. Worst-case equilibria. Computer Science Review, 3(2):65–69, 2009.
12. John D. C. Little. A proof for the queuing formula: L = λW. Operations Research, 9(3):383–387, 1961.
13. Andreu Mas-Colell. On a theorem of Schmeidler. Journal of Mathematical Economics, 13(3):201–206, 1984.
14. Arthur Cecil Pigou. The economics of welfare. Palgrave Macmillan, 2013.
15. Robert W. Rosenthal. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory, 2(1):65–67, 1973.
16. Tim Roughgarden and Éva Tardos. How bad is selfish routing? J. ACM, 49(2):236–259, March 2002.
17. David Schmeidler. Equilibrium points of nonatomic games. Journal of Statistical Physics, 7(4):295–300, 1973.
18. J. G. Wardrop. Some theoretical aspects of road traffic research. Proceedings of the Institution of Civil Engineers, 1(3):325–362, 1952.
19. Piotr Wiecek and Eitan Altman. Stationary anonymous sequential games with undiscounted rewards.