Dynamic system optimal traffic assignment with atomic users: Convergence and stability
Koki Satsukawa (a, ∗), Kentaro Wada (b), David Watling (c)
(a) New Industry Creation Hatchery Center, Tohoku University, Miyagi, Japan
(b) Faculty of Engineering, Information and Systems, University of Tsukuba, Ibaraki, Japan
(c) Institute for Transport Studies, University of Leeds, Leeds, United Kingdom
Abstract
In this study, we analyse the convergence and stability of dynamic system optimal (DSO) traffic assignment with fixed departure times. We first formulate the DSO traffic assignment problem as a strategic game, called a 'DSO game', wherein atomic users select routes that minimise their marginal social costs. By utilising the fact that the DSO game is a potential game, we prove that a globally optimal state is stochastically stable under the logit response dynamics, and that the better/best response dynamics converges to locally optimal states. Furthermore, as an application of DSO assignment, we examine the characteristics of the evolutionary implementation scheme of marginal cost pricing. Through a theoretical comparison with a fixed pricing scheme, we find the following properties of the evolutionary implementation scheme: (i) the total travel time decreases more smoothly to an efficient traffic state because congestion externalities are perfectly internalised; (ii) a traffic state would reach a more efficient state because the globally optimal state is stabilised. Numerical experiments also suggest that these properties make the evolutionary scheme robust, in the sense that they prevent a traffic state from moving to worse traffic states with high total travel times.
Keywords: dynamic traffic assignment, system optimal, Nash equilibrium, potential game, weakly acyclic game, convergence, stochastic stability
1. Introduction
Dynamic system optimal (DSO) traffic assignment represents normative traffic flow patterns that minimise total costs in transport networks. The optimal solutions of a DSO problem provide useful insights into the design of efficient transport management and control schemes, while the optimal value of the objective function is the benchmark for evaluating these schemes. Therefore, DSO assignment has attracted significant attention over the decades since the pioneering work of Merchant and Nemhauser (1978a,b).

There are two major research streams for analysing DSO assignment. The first analyses the properties of the mathematical optimisation problems of DSO assignment. In this stream, several studies investigated the relationship between the solutions of total cost minimisation problems and route marginal social costs. Regarding DSO assignment with fixed departure times, Carey and Srinivasan (1993) and Nie (2011) showed that route choice equilibrium with respect to marginal social costs is a first-order necessary condition for optimality in networks with many-to-one origin–destination (OD) pairs. Ziliaskopoulos (2000) and Carey and Watling (2012) formulated a convex approximation of DSO problems; they also demonstrated that marginal cost equilibrium is a sufficient condition for optimality in the approximated problems. The second stream provides qualitative insights into optimal control rules from simple networks, where exact optimal solutions can be analysed. For example, Kuwahara et al. (2001), Muñoz and Laval (2006) and Zhao and Leclercq (2018) developed graphical solution methods for simple parallel-link networks based on the concept of marginal cost equilibrium, and derived properties of optimal ramp metering control. Shen and Zhang (2009) and Zhang and Shen (2010) investigated an optimal tolling scheme in a corridor network and a ramp control policy in a monocentric network, respectively.

Footnote: Although this approximation allows 'vehicle-holding' situations to exist, some studies developed methods for resolving this problem computationally (e.g. Zhu and Ukkusuri, 2013; Shen and Zhang, 2014).

∗ Corresponding author. Tel.: +81-22-795-7492 (ext. 7492).

Preprint submitted to Elsevier, January 5, 2021.

Although the existing studies indicate some useful properties of optimal states and costs, little is known theoretically about the convergence of dynamical processes (i.e. evolutionary dynamics and iterative algorithms) to optimal states and about the stability of these states, despite their importance in achieving optimal states. For instance, several studies proposed heuristic solution algorithms (e.g. Ghali and Smith, 1995; Shen et al., 2007; Qian et al., 2012; Zhang and Qian, 2020); however, convergence is not guaranteed in most cases. The main difficulty in addressing these theoretical issues is the non-convexity of DSO problems (or the non-monotonicity of the mapping in the variational inequality (VI) formulation of marginal cost equilibrium problems), which stems from the complex dynamic loading model (Carey, 1992). Because the first-order necessary condition is not sufficient for optimality, it is essentially difficult to establish the convergence of standard dynamical processes (e.g. deterministic evolutionary dynamics and gradient-based algorithms) to the globally optimal state. Furthermore, such dynamical processes may not converge even to a locally optimal state of the total cost minimisation problem (i.e. a DSO solution in a broad sense), since traffic states satisfying the first-order condition include local maxima and saddle points.

This study analyses the convergence and stability properties of DSO assignment that minimises the total travel time with fixed departure times in a general network.
We first formulate the DSO traffic assignment problem as a strategic game, called a 'DSO game', wherein atomic users (individual vehicles) select routes that minimise their marginal social costs. By utilising the fact that the DSO game is a potential game (Monderer and Shapley, 1996), we rigorously analyse the behaviour of dynamical processes with perturbations. We prove that a globally optimal state is stochastically stable under the logit response dynamics. We also show that the better/best response dynamics converges to locally optimal states. Numerical experiments confirm the theoretical findings regarding some features of these dynamics.

The convergence and stability results indicate that a traffic state is led toward an efficient state by the evolutionary implementation scheme of marginal cost pricing (Sandholm, 2002, 2005, 2007) in the DSO game, in which the toll level is adjusted on a day-to-day basis according to the realised traffic state. Here, we further examine whether such an implementation scheme is essential for achieving efficient states. To this end, we compare the scheme with another typical scheme, the 'fixed implementation scheme', in which the toll level is set optimally in advance according to a (known or target) optimal state. Through this comparison, we find that under the evolutionary implementation scheme, (i) the total travel time decreases more smoothly to an efficient traffic state, and (ii) a traffic state would reach a more efficient state. Finally, numerical experiments are conducted to validate the theoretical findings.

The remainder of this paper is organised as follows. In Section 2, we define the DSO game. Section 3 presents the convergence and stability properties of the DSO game under natural evolutionary dynamics. Section 4 presents solution algorithms based on the evolutionary dynamics; using these algorithms, we conduct numerical experiments on the DSO game.
Section 5 compares the evolutionary implementation scheme with the fixed implementation scheme, theoretically and numerically. Section 6 concludes the paper.
2. DSO game
In this section, we formulate a DSO traffic assignment problem with atomic users as a strategic game, the 'DSO game'. The game consists of atomic users travelling through the network (i.e. players), sets of available routes for the users (i.e. strategy sets), and dynamic loading models that determine their travel times (i.e. utility functions). After explaining these components, we define the Nash equilibrium of the DSO game.
Footnote: One exception is Garcia et al. (2000), who reported that the evolutionary process of the historical frequency of route usage induced by fictitious play converges to a locally optimal state. However, this does not mean that traffic flow patterns converge to a particular state. Moreover, the type of locally optimal state achieved through this process is not known; the resulting locally optimal state may be worse than the initial state.

A general road network with many-to-many origin–destination (OD) pairs is considered herein. The network consists of a set of nodes $\mathcal{N}$ and a set of directed links $\mathcal{L}$. The sets of origin nodes and destination nodes are denoted by $\mathcal{N}_o$ and $\mathcal{N}_d$, respectively. The set of all acyclic routes from node $a$ to node $b$ is denoted by $\mathcal{R}(a, b)$; when these nodes are not connected, this set is empty.

The length of link $l \in \mathcal{L}$ is denoted by $L_l$. The free-flow speed, backward wave speed and saturation flow rate are constant and denoted by $v_l$, $w_l$ and $q_l$, respectively. Each link has a bottleneck at its downstream end, and the bottleneck capacity of link $l$ is denoted by $\mu_l\ (\le q_l)$. This capacity represents the maximum possible outflow rate from the link. This means that we deal not only with homogeneous links, where the capacity of every section is equal, but also with heterogeneous links, where the capacities of the entrance and exit sections differ; i.e. the fundamental diagram of each link is triangular or trapezoidal. The saturation flow and bottleneck capacity of links can be set sufficiently large that vehicle queues never start to grow from these links; we refer to such links as 'uncapacitated links' and to the other links as 'capacitated links'. If necessary, we distinguish these two types of links.

Each atomic user is a player in the DSO game. The set of users is denoted by $\mathcal{P}$, and the number of users by $|\mathcal{P}|$.
The origin, destination and departure time of user $i \in \mathcal{P}$ are denoted by $o_i$, $d_i$ and $s_i$, respectively. These are given exogenously. All users departing from the same origin have different departure times. Each user selects a route between his/her origin and destination. The set of available routes for user $i \in \mathcal{P}$ is denoted by $\mathcal{R}_i\ (= \mathcal{R}(o_i, d_i))$. This set includes a special strategy $\phi_i$, which represents that user $i$ selects no route (i.e. user $i$ is not assigned to the network). A set of strategies for all users (i.e. a strategy profile) corresponds to a route choice pattern of all users, which is referred to as a 'route profile' or simply a (traffic) state. A route profile is denoted by a vector $\mathbf{r} \equiv \{r_1, \dots, r_i, \dots, r_{|\mathcal{P}|}\} \in \mathcal{R}$, where $\mathcal{R} \equiv \mathcal{R}_1 \times \cdots \times \mathcal{R}_{|\mathcal{P}|}$. Note that $\mathcal{R}$ is finite, as we consider only acyclic routes and a given finite number of users. For any route profile $\mathbf{r}$, the routes of the users other than user $i$ are denoted by $\mathbf{r}_{-i} \equiv \{r_1, \dots, r_{i-1}, r_{i+1}, \dots, r_{|\mathcal{P}|}\}$. With this notation, we sometimes write a profile $\mathbf{r}$ as $(r_i, \mathbf{r}_{-i})$ to state the route of user $i$ explicitly.

We assume that the utility of each user equals the negative of his/her route marginal social cost on the selected route. This is usually interpreted as the situation in which each user tries to minimise his/her generalised travel cost under the marginal cost or Pigouvian pricing scheme. The marginal cost of a user who selects a route for a given route profile is defined as the sum of the route travel time of the user (private cost) and the change in the route travel times of the other users caused by assigning the user (external cost).
Mathematically, the utility of user $i$ who selects route $r_i \in \mathcal{R}_i$ for a given route profile $\mathbf{r}_{-i}$ is represented by
$$U_i(r_i, \mathbf{r}_{-i}) = -C_i(r_i, \mathbf{r}_{-i}) - E_i(r_i, \mathbf{r}_{-i}), \tag{2.1}$$
where
$$E_i(r_i, \mathbf{r}_{-i}) = \sum_{i' \in \mathcal{P} \setminus \{i\}} \left\{ C_{i'}(r_i, \mathbf{r}_{-i}) - C_{i'}(\phi_i, \mathbf{r}_{-i}) \right\}, \tag{2.2}$$
where $C_i(r_i, \mathbf{r}_{-i})$ and $E_i(r_i, \mathbf{r}_{-i})$ are the private and external costs of user $i$, respectively, and $C_{i'}(\phi_i, \mathbf{r}_{-i})$ represents the travel time of user $i'$ when user $i$ is not assigned to the network. We also denote by $TC(\mathbf{r})$ the total travel time of the transport system (hereinafter referred to as the total cost) for route profile $\mathbf{r}$:
$$TC(\mathbf{r}) = \sum_{i' \in \mathcal{P}} C_{i'}(\mathbf{r}), \quad \forall \mathbf{r} \in \mathcal{R}. \tag{2.3}$$

The travel time (and external cost) of each user for a given route profile is uniquely determined by a dynamic loading model, which consists of two submodels: a link model and a node model. A link model determines two kinds of 'possible' times on each link according to existing vehicle trajectories and a specified car-following model: the earliest possible departure times at which vehicles can depart from the link when the downstream links are empty, and the earliest arrival times at which vehicles can enter the link while satisfying the time-headway restriction required for car-following behaviour. In other words, the link model determines boundary conditions for the departure and arrival times of vehicles on each link. A node model then calculates the 'actual' departure and arrival times of vehicles on links such that the actual departure times are consistent with the boundary conditions of the upstream and downstream links of each node. The framework of this dynamic loading model is consistent with the 'demand/supply approach' proposed by Daganzo (1994, 1995) (see Appendix A for details of the link and node models).
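As an illustration of Eqs. (2.1)-(2.3), the following Python sketch evaluates the private cost, external cost and utility of a user by loading the network with and without that user, as described above. It is a minimal sketch under stated assumptions: the loading model is a hypothetical stand-in (two parallel single-link routes named "A" and "B", each with an assumed free-flow time and a FIFO point-queue bottleneck), not the link and node models of Appendix A, and `None` encodes the strategy $\phi_i$.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL toy network, not the link/node model of Appendix A:
# two parallel single-link routes, each with a free-flow time and a
# FIFO point-queue bottleneck at its downstream end.
FREE_FLOW = {"A": 2.0, "B": 3.0}  # free-flow travel time of each route
CAPACITY = {"A": 1.0, "B": 2.0}   # bottleneck capacity (vehicles per time unit)

Profile = List[Optional[str]]     # route of each user; None encodes phi_i


def travel_times(profile: Profile, departures: List[float]) -> Dict[int, float]:
    """Stand-in loading model: travel time of every assigned user."""
    times: Dict[int, float] = {}
    for route, free_flow in FREE_FLOW.items():
        users = [i for i, r in enumerate(profile) if r == route]
        users.sort(key=lambda i: (departures[i], i))  # FIFO by departure time
        headway = 1.0 / CAPACITY[route]
        last_exit = float("-inf")
        for i in users:
            # Leave the bottleneck at the free-flow arrival time or one
            # headway after the previous vehicle, whichever is later.
            exit_time = max(departures[i] + free_flow, last_exit + headway)
            last_exit = exit_time
            times[i] = exit_time - departures[i]
    return times


def total_cost(profile: Profile, departures: List[float]) -> float:
    """TC(r) of Eq. (2.3); unassigned users contribute nothing."""
    return sum(travel_times(profile, departures).values())


def private_and_external(i: int, profile: Profile,
                         departures: List[float]) -> Tuple[float, float]:
    """C_i and E_i of Eqs. (2.1)-(2.2): load with and without user i."""
    with_i = travel_times(profile, departures)
    without_i = travel_times(profile[:i] + [None] + profile[i + 1:], departures)
    external = sum(with_i[j] - without_i[j] for j in with_i if j != i)
    return with_i[i], external


def utility(i: int, profile: Profile, departures: List[float]) -> float:
    """U_i = -C_i - E_i of Eq. (2.1); user i must be assigned to a route."""
    private, external = private_and_external(i, profile, departures)
    return -private - external


if __name__ == "__main__":
    departures = [0.0, 0.25, 0.5]  # distinct departure times, as assumed
    profile: Profile = ["A", "A", "A"]
    print(private_and_external(0, profile, departures))  # (2.0, 1.5)
    print(total_cost(profile, departures))               # 8.25
```

With this stand-in, `utility(i, r, s)` equals `-total_cost(r, s)` plus the total cost of the same profile with user i removed, for any assigned user. This follows directly from Eqs. (2.1)-(2.3): user i's own travel time cancels, and the remaining term does not depend on user i's route, which is the identity behind Theorem 1.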
Footnote: Unlike in the conventional fluid approach, where the evaluation is difficult and inexact (Qian et al., 2012), the marginal costs can be evaluated exactly here: we compare the resulting travel times for the route profiles $(r_i, \mathbf{r}_{-i})$ and $(\phi_i, \mathbf{r}_{-i})$.

2.2. Nash equilibrium
We employ the concept of pure Nash equilibrium as the equilibrium of users' route choices. Mathematically, an equilibrium state $\mathbf{r}^*$ satisfies the following condition:
$$U_i(r^*_i, \mathbf{r}^*_{-i}) = \max_{r \in \mathcal{R}_i} U_i(r, \mathbf{r}^*_{-i}), \quad \forall i \in \mathcal{P}. \tag{2.4}$$
The route of each user in a Nash equilibrium state is a best response to the route choices of all other users; we refer to this route as a 'best response route'. If each user has a unique best response route, $\mathbf{r}^*$ is called a strict Nash equilibrium state.

Substituting Eq. (2.1) into condition (2.4), and noting that $-C_i(r, \mathbf{r}^*_{-i}) - E_i(r, \mathbf{r}^*_{-i}) = -\sum_{i' \in \mathcal{P}} C_{i'}(r, \mathbf{r}^*_{-i}) + \sum_{i' \in \mathcal{P} \setminus \{i\}} C_{i'}(\phi_i, \mathbf{r}^*_{-i})$, where the last sum does not depend on $r$, we obtain
$$\sum_{i' \in \mathcal{P}} C_{i'}(r^*_i, \mathbf{r}^*_{-i}) = \min_{r \in \mathcal{R}_i} \sum_{i' \in \mathcal{P}} C_{i'}(r, \mathbf{r}^*_{-i}), \quad \forall i \in \mathcal{P}. \tag{2.5}$$
This equation means that the total cost at equilibrium cannot be decreased by any user unilaterally changing his/her route. Thus, each equilibrium state corresponds to a locally optimal state of a total cost minimisation problem. When the total cost of a locally optimal state is minimal among all locally optimal states, that state is referred to as a globally optimal state.

Note that a globally optimal state always exists because the set of feasible states (i.e. route profiles) is finite; however, its uniqueness is not guaranteed in general, since different route profiles can have the same total cost.
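Condition (2.5) suggests a brute-force check for very small games: a profile is a Nash equilibrium (a locally optimal state) if no single user can lower the total cost by switching routes, and a globally optimal state is any profile with the minimum total cost. The Python sketch below assumes a hypothetical two-user game whose total costs are given directly as a table (the numbers are invented for illustration and are not taken from this paper); because it enumerates all of $\mathcal{R}$, it is only practical for tiny route sets.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Profile = Tuple[str, ...]
CostFn = Callable[[Profile], float]

# HYPOTHETICAL two-user total costs TC(r), chosen only for illustration.
TC_TABLE: Dict[Profile, float] = {
    ("A", "A"): 10, ("A", "B"): 9, ("A", "C"): 11,
    ("B", "A"): 7, ("B", "B"): 5, ("B", "C"): 8,
    ("C", "A"): 4, ("C", "B"): 5, ("C", "C"): 9,
}
ROUTE_SETS: List[Tuple[str, ...]] = [("A", "B", "C"), ("A", "B", "C")]


def deviate(profile: Profile, i: int, route: str) -> Profile:
    """Return (r, r_-i): user i switches to `route`, the others keep theirs."""
    return profile[:i] + (route,) + profile[i + 1:]


def is_nash(profile: Profile, route_sets: Sequence[Sequence[str]],
            tc: CostFn) -> bool:
    """Condition (2.5): no unilateral route change strictly lowers TC."""
    return all(tc(deviate(profile, i, r)) >= tc(profile)
               for i, routes in enumerate(route_sets) for r in routes)


def classify(route_sets: Sequence[Sequence[str]],
             tc: CostFn) -> Tuple[List[Profile], List[Profile]]:
    """Enumerate R; return (Nash equilibria = local optima, global optima)."""
    profiles: List[Profile] = list(product(*route_sets))
    equilibria = [p for p in profiles if is_nash(p, route_sets, tc)]
    lowest = min(tc(p) for p in profiles)
    global_optima = [p for p in profiles if tc(p) == lowest]
    return equilibria, global_optima


if __name__ == "__main__":
    equilibria, global_optima = classify(ROUTE_SETS, TC_TABLE.__getitem__)
    print(equilibria)     # [('B', 'B'), ('C', 'A')]
    print(global_optima)  # [('C', 'A')]
```

In this toy table, (B, B) is locally but not globally optimal, while (C, A) is the unique globally optimal state.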
3. Convergence and stability in DSO games
This section analyses the convergence and stability in a DSO game. We first show that a DSO game is a potential game. By utilising this appealing property, we prove the convergence of evolutionary dynamics to locally optimal states. Furthermore, we establish the stochastic stability of a globally optimal state under evolutionary dynamics with perturbations.
A potential game is a game wherein the change in the utility of a user resulting from a unilateral change in strategy equals the change in a global function, referred to as a potential function. This is formally defined as follows:
Definition 1. (Potential game (Monderer and Shapley, 1996))
A finite $n$-user game with action sets $\{\mathcal{R}_i\}_{i=1}^{n}$ and utility functions $\{U_i\}_{i=1}^{n}$ is a potential game if and only if there is a function $\Pi: \mathcal{R} \to \mathbb{R}$ such that
$$U_i(r'_i, \mathbf{r}_{-i}) - U_i(r''_i, \mathbf{r}_{-i}) = \Pi(r'_i, \mathbf{r}_{-i}) - \Pi(r''_i, \mathbf{r}_{-i}), \quad \forall i \in \mathcal{P},\ \forall \mathbf{r}_{-i} \in \mathcal{R}_{-i},\ \forall r'_i, r''_i \in \mathcal{R}_i. \tag{3.1}$$
According to this definition, we have the following theorem:
Theorem 1.
A DSO game is a potential game whose potential function is the negative of the total cost function in Eq. (2.3).
Proof.
Considering two route profiles $(r'_i, \mathbf{r}_{-i})$ and $(r_i, \mathbf{r}_{-i})$, where user $i$ changes his/her route from $r_i$ to $r'_i$, we have
$$\begin{aligned}
U_i(r'_i, \mathbf{r}_{-i}) - U_i(r_i, \mathbf{r}_{-i})
&= -C_i(r'_i, \mathbf{r}_{-i}) - E_i(r'_i, \mathbf{r}_{-i}) - \left\{ -C_i(r_i, \mathbf{r}_{-i}) - E_i(r_i, \mathbf{r}_{-i}) \right\} \\
&= -C_i(r'_i, \mathbf{r}_{-i}) + C_i(r_i, \mathbf{r}_{-i}) - \sum_{i' \in \mathcal{P} \setminus \{i\}} \left\{ C_{i'}(r'_i, \mathbf{r}_{-i}) - C_{i'}(\phi_i, \mathbf{r}_{-i}) - C_{i'}(r_i, \mathbf{r}_{-i}) + C_{i'}(\phi_i, \mathbf{r}_{-i}) \right\} \\
&= -\sum_{i' \in \mathcal{P}} C_{i'}(r'_i, \mathbf{r}_{-i}) + \sum_{i' \in \mathcal{P}} C_{i'}(r_i, \mathbf{r}_{-i}) \\
&= -TC(r'_i, \mathbf{r}_{-i}) + TC(r_i, \mathbf{r}_{-i}).
\end{aligned}$$
Hence, the change in the utility of user $i$ equals the change in the negative of the total cost function. This means that the negative of the total cost function is the potential function of this game, which proves the theorem. □

This means that an improvement in the utility of any user always leads to an improvement (i.e. a decrease) in the total cost.

Footnote: By definition, local maxima and saddle points do not become Nash equilibrium states, unlike in the fluid approach.

3.2. Convergence
In the following, we consider a situation in which a DSO game is played repeatedly on a day-to-day basis. In such a repeated game, at each time (i.e. day) $\tau \in \mathbb{N}$, each user $i \in \mathcal{P}$ takes route $r^\tau_i \in \mathcal{R}_i$ and receives utility $U_i(\mathbf{r}^\tau)$, where $\mathbf{r}^\tau$ is the route profile at that time. On each day, one user who is going to change his/her route is selected randomly with equal probability $1/|\mathcal{P}|$, and this user calculates his/her utility of each route for the route profile of the previous day. The other users must repeat their route choices from the previous day. Then, based on the calculated utilities, the selected user changes his/her route according to a behavioural rule common to all users by comparing the utilities of the current route and other routes.
The resulting (day-to-day) evolution of the traffic state is called evolutionary dynamics.

Let us investigate the convergence of the following two general evolutionary dynamics: better response dynamics and best response dynamics. Under the better response dynamics, the selected user $i$ changes from route $r^\tau_i$ to $r^{\tau+1}_i$ if this strictly improves the user's utility compared with the previous day. If the user has no route satisfying this condition, the user keeps the route of the previous day. Mathematically, the probability $p^{\text{better}}_i(r; \mathbf{r}^\tau)$ that selected user $i$ chooses route $r$ for a given route profile $\mathbf{r}^\tau$ is given as follows:
$$p^{\text{better}}_i(r; \mathbf{r}^\tau) = \begin{cases} 1 & \text{if } D_i(\mathbf{r}^\tau) = \emptyset \text{ and } r = r^\tau_i, \\ \dfrac{1}{|D_i(\mathbf{r}^\tau)|} & \text{if } D_i(\mathbf{r}^\tau) \neq \emptyset \text{ and } r \in D_i(\mathbf{r}^\tau), \\ 0 & \text{otherwise}, \end{cases} \tag{3.2}$$
where
$$D_i(\mathbf{r}^\tau) := \left\{ r^*_i \in \mathcal{R}_i \;\middle|\; U_i(r^*_i, \mathbf{r}^\tau_{-i}) > U_i(r^\tau_i, \mathbf{r}^\tau_{-i}) \right\} \tag{3.3}$$
is the set of better responses of user $i$ for the given route profile $\mathbf{r}^\tau$. Then, the transition probability from a route profile $\mathbf{r}^\tau$ to $\mathbf{r}^{\tau+1}$ is represented as follows:
$$p_{\mathbf{r}^\tau \mathbf{r}^{\tau+1}} = \begin{cases} \dfrac{1}{|\mathcal{P}|}\, p^{\text{better}}_i(r^{\tau+1}_i; \mathbf{r}^\tau) & \text{if } r^{\tau+1}_i \neq r^\tau_i \text{ and } \mathbf{r}^{\tau+1} = (r^{\tau+1}_i, \mathbf{r}^\tau_{-i}) \text{ for some } i \in \mathcal{P}, \\ \displaystyle\sum_{i \in \mathcal{P}} \dfrac{1}{|\mathcal{P}|}\, p^{\text{better}}_i(r^\tau_i; \mathbf{r}^\tau) & \text{if } \mathbf{r}^{\tau+1} = \mathbf{r}^\tau, \\ 0 & \text{otherwise}. \end{cases} \tag{3.4}$$
The first case shows that one user $i$ changes his/her route $r^\tau_i$ to a different route $r^{\tau+1}_i$; the second case shows that the route profile does not change because the selected user keeps his/her route; the third case shows that two or more users change their routes simultaneously, and this probability must be zero. The transition probability shows that the probability distribution of route profiles at $\tau + 1$ depends only on the route profile at $\tau$.
Thus, the resulting stochastic process of route profiles $\{\mathbf{r}^\tau\}_{\tau \in \mathbb{N}}$ is a Markov chain whose transition probabilities are given by Eq. (3.4) and whose state space is the set of route profiles $\mathcal{R}$.

As a direct consequence of the fact that a DSO game is a potential game, we can establish the following theorem regarding the convergence of the dynamics:
Theorem 2. (Convergence of better response dynamics).
In a DSO game under the better response dynamics, a route profile converges almost surely to a Nash equilibrium state (a locally or globally optimal state) from an arbitrary initial profile, and this state has a total cost no higher than that of the initial state.
Proof.
Since the game is a potential game, the better response dynamics, which increases the potential of the DSO game with every route change, converges to a state that maximises the potential locally or globally as $\tau \to \infty$. Moreover, since the potential equals the negative of the total cost, each increase in the potential through a better response decreases the total cost. Thus, the theorem is proved. □

Footnote: It would be difficult for the selected user to calculate external costs. However, these costs are considered to be imposed by a road manager in the form of marginal cost or Pigouvian pricing schemes, as mentioned in Section 2. It follows that the user only has to calculate his/her private costs (i.e. route travel times) on the assumption that the other users keep taking their current routes, which may not be so difficult.

Footnote: A traffic state might get stuck in a state that is not a Nash equilibrium if a user who cannot change his/her route is continuously given the opportunity to change it, so that the potential does not increase. However, the probability of such an event converges to zero as $\tau \to \infty$.

Under the best response dynamics, for a given route profile $\mathbf{r}^\tau$, the selected user $i$ chooses a route randomly from the following route set $B_i(\mathbf{r}^\tau)$:
$$B_i(\mathbf{r}^\tau) := \left\{ r^*_i \in \mathcal{R}_i \;\middle|\; U_i(r^*_i, \mathbf{r}^\tau_{-i}) = \max_{r \in \mathcal{R}_i} U_i(r, \mathbf{r}^\tau_{-i}) \right\}. \tag{3.5}$$
Then, the probability $p^{\text{best}}_i(r; \mathbf{r}^\tau)$ that selected user $i$ chooses route $r$ for a given route profile $\mathbf{r}^\tau$ is given as follows:
$$p^{\text{best}}_i(r; \mathbf{r}^\tau) = \begin{cases} \dfrac{1}{|B_i(\mathbf{r}^\tau)|} & \text{if } r \in B_i(\mathbf{r}^\tau), \\ 0 & \text{otherwise}. \end{cases} \tag{3.6}$$
The transition probability between route profiles can be derived by replacing the route choice probability $p^{\text{better}}_i(r^{\tau+1}_i; \mathbf{r}^\tau)$ in Eq. (3.4) with $p^{\text{best}}_i(r^{\tau+1}_i; \mathbf{r}^\tau)$. Thus, the stochastic process $\{\mathbf{r}^\tau\}_{\tau \in \mathbb{N}}$ under the best response dynamics is also a Markov chain with finite state space $\mathcal{R}$.

The best response dynamics has the following property compared with the better response dynamics: a user taking his/her best response route can switch to another best response route if one exists, i.e. users can switch their routes among their best response routes. This means that even when the current route profile is a Nash equilibrium, the route profile can deviate from that equilibrium if some user finds another route whose marginal cost is the same as that of the current route. This property yields the following corollary about the closed communication classes of the Markov chain generated by the best response dynamics (a closed communication class is an absorbing set of states of a Markov chain):
Corollary 1.
In a DSO game, any closed communication class of the Markov chain generated by the best response dynamics consists of Nash equilibrium states with equal total costs.
Proof.
If two traffic states have different total costs, they cannot both be reached from each other under the best response dynamics, because the total cost never increases along its transitions: the state with the higher total cost is not reachable from the other. It follows that all traffic states in any communication class must have the same total cost in order to 'communicate'. Moreover, if a communication class contains a traffic state that is not a Nash equilibrium state (i.e. a state in which at least one user could improve his/her utility), the class is not closed: if the total cost is decreased by a user's best response and the traffic state leaves the communication class, the best response dynamics never returns to states in that class. It follows that all closed communication classes consist of Nash equilibrium states with equal total costs. Thus, the corollary is proved. □
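The better and best response dynamics of Eqs. (3.2)-(3.6) can be simulated directly. By Theorem 1, $U_i(r, \mathbf{r}_{-i})$ and $-TC(r, \mathbf{r}_{-i})$ differ only by a term that does not depend on user $i$'s own route, so the following Python sketch ranks each user's routes by total cost. The two-user cost table, route names, number of days and random seed are all hypothetical: the table is invented so that (B, B) and (C, A) are Nash equilibria and (C, B) has the same total cost as (B, B), and the route sets omit $\phi_i$.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Profile = Tuple[str, ...]
CostFn = Callable[[Profile], float]

# HYPOTHETICAL two-user total costs TC(r), chosen only for illustration.
TC_TABLE: Dict[Profile, float] = {
    ("A", "A"): 10, ("A", "B"): 9, ("A", "C"): 11,
    ("B", "A"): 7, ("B", "B"): 5, ("B", "C"): 8,
    ("C", "A"): 4, ("C", "B"): 5, ("C", "C"): 9,
}
ROUTE_SETS: List[Tuple[str, ...]] = [("A", "B", "C"), ("A", "B", "C")]


def deviate(profile: Profile, i: int, route: str) -> Profile:
    """Return (r, r_-i): user i switches to `route`, the others keep theirs."""
    return profile[:i] + (route,) + profile[i + 1:]


def better_responses(i: int, profile: Profile,
                     route_sets: Sequence[Sequence[str]], tc: CostFn) -> List[str]:
    """D_i(r) of Eq. (3.3): routes that strictly lower TC (= strictly raise U_i)."""
    current = tc(profile)
    return [r for r in route_sets[i] if tc(deviate(profile, i, r)) < current]


def best_responses(i: int, profile: Profile,
                   route_sets: Sequence[Sequence[str]], tc: CostFn) -> List[str]:
    """B_i(r) of Eq. (3.5): routes that minimise TC (= maximise U_i)."""
    costs = {r: tc(deviate(profile, i, r)) for r in route_sets[i]}
    lowest = min(costs.values())
    return [r for r, c in costs.items() if c == lowest]


def simulate(initial: Profile, route_sets: Sequence[Sequence[str]], tc: CostFn,
             rule: str, days: int, seed: int = 0) -> Profile:
    """Day-to-day dynamics: each day one user, drawn w.p. 1/|P|, may revise."""
    rng = random.Random(seed)
    profile = initial
    for _ in range(days):
        i = rng.randrange(len(route_sets))
        if rule == "better":
            candidates = better_responses(i, profile, route_sets, tc)  # Eq. (3.2)
            if not candidates:
                continue  # D_i is empty: repeat the previous day's route
        else:
            candidates = best_responses(i, profile, route_sets, tc)  # Eq. (3.6)
        profile = deviate(profile, i, rng.choice(candidates))  # uniform choice
    return profile


if __name__ == "__main__":
    tc = TC_TABLE.__getitem__
    print(simulate(("B", "B"), ROUTE_SETS, tc, "better", 1000))  # ('B', 'B')
    print(simulate(("B", "B"), ROUTE_SETS, tc, "best", 1000))
```

Started from (B, B), the better response dynamics never moves, because no user has a strictly better route. The best response dynamics instead lets user 1 switch to the equally good route C, after which user 2 can move to A; over many days the process reaches (C, A) with probability approaching one.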
This corollary means that a route profile could change among a set of states with equal total costs. From this corollary, we obtain the following theorem showing the convergence of the best response dynamics:
Theorem 3. (Convergence of best response dynamics).
In a DSO game under the best response dynamics, a route profile converges almost surely to a set of Nash equilibrium states with equal total costs from an arbitrary initial profile, and these states have a total cost no higher than that of the initial state.
Proof.
Every finite-state Markov chain eventually enters a closed communication class with probability 1 as $\tau \to \infty$. From this fact and Corollary 1, it follows that a route profile generated by the best response dynamics is absorbed into a set of Nash equilibrium states with equal total costs. Moreover, as in the proof of Theorem 2, the total cost is no higher than that of the initial state. □
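Corollary 1 and Theorem 3 can also be checked without simulation by computing the closed communication classes of the unperturbed chain. In the following Python sketch, which assumes a hypothetical two-user cost table invented for illustration, a state is recurrent exactly when it can be reached back from every state it can reach, and its reachable set is then its closed class. Only the positive-probability transitions of Eq. (3.4) matter for this, so the probabilities themselves are not computed.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Sequence, Set, Tuple

Profile = Tuple[str, ...]
CostFn = Callable[[Profile], float]

# HYPOTHETICAL two-user total costs TC(r), chosen only for illustration.
TC_TABLE: Dict[Profile, float] = {
    ("A", "A"): 10, ("A", "B"): 9, ("A", "C"): 11,
    ("B", "A"): 7, ("B", "B"): 5, ("B", "C"): 8,
    ("C", "A"): 4, ("C", "B"): 5, ("C", "C"): 9,
}
ROUTE_SETS: List[Tuple[str, ...]] = [("A", "B", "C"), ("A", "B", "C")]


def deviate(profile: Profile, i: int, route: str) -> Profile:
    """Return (r, r_-i): user i switches to `route`, the others keep theirs."""
    return profile[:i] + (route,) + profile[i + 1:]


def successors(profile: Profile, route_sets: Sequence[Sequence[str]],
               tc: CostFn, rule: str) -> Set[Profile]:
    """Profiles reachable in one day with positive probability under Eq. (3.4)."""
    nxt: Set[Profile] = set()
    for i, routes in enumerate(route_sets):
        costs = {r: tc(deviate(profile, i, r)) for r in routes}
        if rule == "better":
            improving = [r for r, c in costs.items() if c < tc(profile)]
            chosen = improving or [profile[i]]  # keep the route if D_i is empty
        else:
            lowest = min(costs.values())
            chosen = [r for r, c in costs.items() if c == lowest]
        nxt.update(deviate(profile, i, r) for r in chosen)
    return nxt


def reachable(start: Profile, route_sets: Sequence[Sequence[str]],
              tc: CostFn, rule: str) -> Set[Profile]:
    """All profiles reachable from `start` (including itself), by depth-first search."""
    seen = {start}
    stack = [start]
    while stack:
        for t in successors(stack.pop(), route_sets, tc, rule):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def closed_classes(route_sets: Sequence[Sequence[str]], tc: CostFn,
                   rule: str) -> List[FrozenSet[Profile]]:
    """A state is recurrent iff every state it reaches can reach it back;
    the reachable set of a recurrent state is its closed communication class."""
    states: List[Profile] = list(product(*route_sets))
    reach = {s: reachable(s, route_sets, tc, rule) for s in states}
    classes: List[FrozenSet[Profile]] = []
    for s in states:
        if all(s in reach[t] for t in reach[s]):
            cls = frozenset(reach[s])
            if cls not in classes:
                classes.append(cls)
    return classes


if __name__ == "__main__":
    tc = TC_TABLE.__getitem__
    for rule in ("better", "best"):
        print(rule, [sorted(c) for c in closed_classes(ROUTE_SETS, tc, rule)])
    # better [[('B', 'B')], [('C', 'A')]]
    # best [[('C', 'A')]]
```

For this table, the better response chain has two closed classes, {(B, B)} and {(C, A)}, whereas the best response chain has only {(C, A)}. Each class consists of Nash equilibria with a common total cost, as Corollary 1 requires.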
It might seem undesirable from a convergence perspective that a traffic state could change constantly under the best response dynamics. However, such behaviour of the dynamics yields the following desirable property: a route profile can deviate from an inefficient locally optimal state where the better response dynamics could converge. Specifically, under the best response dynamics, a route profile can shift from an equilibrium state to another state with an equal total cost, since users can switch their routes among their best response routes, as mentioned above. Such a switch could cause the following ripple effect (Satsukawa et al., 2019): the route change of one user affects the utilities of other users and turns their routes into non-best response routes. This means that, after the shift, these users can find routes whose marginal costs are lower than those of the routes they currently select. Thus, if such users are given opportunities to change their routes, the total cost can decrease through their best responses.

Meanwhile, under the better response dynamics, a route profile cannot deviate from any equilibrium state, because a user can only select a route whose utility is strictly higher. In other words, in a Markov chain generated by the better response dynamics, all Nash equilibrium states are absorbing states. Consequently, total costs generated by the best response dynamics are expected to tend to be lower than those generated by the better response dynamics.

Footnote: The better response dynamics does not allow switching among best response routes, because users cannot change their routes to other routes with the same utility.

Footnote: A communication class $\mathcal{C} \subseteq \mathcal{R}$ is a set of states (route profiles) whose members can reach each other (i.e. communicate) under the corresponding evolutionary dynamics, and no state in $\mathcal{C}$ communicates with any state outside $\mathcal{C}$. A communication class $\mathcal{C}$ is said to be closed if no state in $\mathcal{C}$ can reach any state outside $\mathcal{C}$. For details, see, for example, Meyn and Tweedie (2009).

Figure 1: Matrices of a two-user DSO game in which each number is the total cost corresponding to each route profile. Each dotted arrow represents a better or best response.

An example demonstrating the difference is illustrated in Figure 1. Each matrix in the figure shows the total costs of the route profiles and the better or best responses in a two-user DSO game.
There exist two Nash equilibrium states, (B, B) and (C, A). As shown in the left matrix, if the better response dynamics is employed, the route profile can converge to either equilibrium state, since all equilibrium states are absorbing states of the dynamics. In contrast, as shown in the right matrix, if the best response dynamics is employed, a route profile can deviate from (B, B) and change to (C, B), since user 1 need not strictly improve his/her utility. As a result, the route profile converges to (C, A), whose total cost is lower than that of (C, B).

Footnote: Which route profiles are Nash equilibrium states can be identified from the matrix of total costs: since a DSO game is a potential game whose potential function corresponds to the total cost, an improvement in the total cost through a route change by a user always leads to an improvement in the utility of that user.

Although we have shown the convergence of the deterministic evolutionary dynamics here, it is essentially difficult to establish the convergence of such dynamics to the globally optimal state, as mentioned in Section 1. To avoid getting stuck in inefficient locally optimal states, where the best response dynamics can also converge, it is necessary to introduce perturbations that lead traffic states to more efficient states by allowing shifts to worse traffic states with higher total costs. The next section examines evolutionary dynamics with perturbations and shows the stochastic stability of globally optimal states, which is regarded as the convergence concept for such perturbed dynamics.

The following is a brief summary of Young (1993). We first consider a finite-state Markov chain over the state space $\mathcal{R}$ generated by evolutionary dynamics without perturbation (e.g. best response dynamics); we refer to this kind of evolutionary dynamics and the resulting Markov chain as 'unperturbed evolutionary dynamics' and an 'unperturbed Markov chain', respectively. Let $P$ be the transition matrix of the Markov chain.
Further, let $p_{\mathbf{r}\mathbf{r}'}$ be the element of this matrix that represents the transition probability from state (i.e. route profile) $\mathbf{r} \in \mathcal{R}$ to state $\mathbf{r}' \in \mathcal{R}$.

We then consider another Markov chain, referred to as a 'perturbed Markov chain', which is generated by a perturbed version of the (unperturbed) evolutionary dynamics under which users are subjected to a small perturbation whose size is indexed by a scalar $\epsilon$ taking all values in some interval $(0, a]$. Let $P^\epsilon$ be the corresponding transition probability matrix. We assume that the perturbed Markov chain satisfies the following conditions:
$$P^\epsilon \text{ is aperiodic and irreducible (i.e. the perturbed Markov chain is ergodic) for all } \epsilon \in (0, a], \tag{3.7}$$
$$\lim_{\epsilon \to 0^+} p^\epsilon_{\mathbf{r}\mathbf{r}'} = p_{\mathbf{r}\mathbf{r}'}, \tag{3.8}$$
and
$$p^\epsilon_{\mathbf{r}\mathbf{r}'} > 0 \text{ for some } \epsilon \implies \exists\, c(\mathbf{r} \to \mathbf{r}') \ge 0 \text{ s.t. } 0 < \lim_{\epsilon \to 0^+} \epsilon^{-c(\mathbf{r} \to \mathbf{r}')}\, p^\epsilon_{\mathbf{r}\mathbf{r}'} < \infty. \tag{3.9}$$
Condition (3.7) implies that the perturbed Markov chain has a unique stationary distribution $\pi^\epsilon$ for every $\epsilon$. Condition (3.8) implies that $P^\epsilon$ converges to the unperturbed matrix $P$. Condition (3.9) implies that $P^\epsilon$ approaches $P$ at an exponentially smooth rate. The scalar $c(\mathbf{r} \to \mathbf{r}')$ is called the 'resistance' of the transition $\mathbf{r} \to \mathbf{r}'$; it represents the intensity of mistakes required for this transition (e.g. the minimum number of mistakes). If the transition is allowed under the unperturbed evolutionary dynamics, $c(\mathbf{r} \to \mathbf{r}') = 0$. A Markov chain satisfying (3.7)-(3.9) is called a 'regular perturbed Markov chain' of $P$.

The unperturbed Markov chain may have multiple stationary distributions, whereas the (regular) perturbed Markov chain has a unique stationary distribution. This unique stationary distribution converges to one of the stationary distributions of the unperturbed Markov chain as $\epsilon \to 0$.
This means that the perturbations effectively select one stationary distribution, which gives the long-run probability of observing each state when the perturbed Markov chain runs for a long time. Stochastic stability is then defined as follows:
Definition 2. (Stochastic stability (Young, 1993)).
A state $\mathbf{r} \in \mathcal{R}$ is 'stochastically stable' relative to the Markov chain $P^{\epsilon}$ if $\lim_{\epsilon \to 0} \pi^{\epsilon}_{\mathbf{r}} > 0$.

Over the long run, states that are not stochastically stable are observed infrequently compared with states that are stochastically stable, provided that the probability of mistakes is small ($\epsilon \to 0$).

We establish the stability of globally optimal states by employing the logit response dynamics (Blume, 1993). Under these dynamics, the probability $p^{\beta}_i(r, \mathbf{r}^{\tau})$ that a selected user $i$ chooses route $r$ for a given route profile $\mathbf{r}^{\tau}$ is given as follows:

$$p^{\beta}_i(r, \mathbf{r}^{\tau}) = \frac{\exp\big(\beta U_i(r, \mathbf{r}^{\tau}_{-i})\big)}{\sum_{r' \in \mathcal{R}_i} \exp\big(\beta U_i(r', \mathbf{r}^{\tau}_{-i})\big)}, \tag{3.10}$$

where $\beta \in (0, \infty)$ controls the degree of noise in the best response (a larger $\beta$ means less noise). Note that the logit response dynamics is a perturbed best response dynamics, and it converges to the best response dynamics as $\beta \to \infty$ (this corresponds to $\epsilon \to 0$). We then show that a globally optimal state is stochastically stable under the logit response dynamics:

Theorem 4. (Stochastic stability of logit response dynamics).
Consider the logit response dynamics in a DSOgame. The stochastically stable state is the globally optimal state minimising the total cost.
Proof.
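Marden and Shamma (2012) show that the stochastically stable states in a potential game under the logit response dynamics are the set of potential maximisers. Since the potential function of a DSO game is the negative of the total cost, these maximisers are the states minimising the total cost. □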
This theorem means that, with the help of stochastic perturbations, a traffic state does not remain trapped in locally optimal (and inefficient) states and approaches the state minimising the total cost over the long run.
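To see Theorem 4 at work on something small, the sketch below builds the full transition matrix of the logit response dynamics (3.10) for a toy game with three users and two routes, computes its stationary distribution, and reports the probability mass on the total-cost minimisers as $\beta$ grows. Everything about the toy is an assumption: the static affine route costs (`A`, `B`) stand in for the dynamic loading model, and the utility is taken to be the negative marginal social cost $-(TC(\mathbf{r}) - TC(\mathbf{r}_{-i}))$, where $TC(\mathbf{r}_{-i})$ is the total cost with user $i$ removed; with this choice $-TC$ is an exact potential. The sketch also checks the closed form $\pi^{\beta}(\mathbf{r}) \propto \exp(\beta \Phi(\mathbf{r}))$, a standard property of asynchronous logit dynamics in exact potential games. Writing $\epsilon = e^{-\beta}$, (3.10) gives a mistake probability of order $\epsilon^{\max_{r'} U_i(r', \mathbf{r}_{-i}) - U_i(r, \mathbf{r}_{-i})}$, so the resistance in (3.9) of a single logit move is the utility loss it incurs.

```python
import itertools

import numpy as np

# Hypothetical toy game (not the paper's dynamic loading model): three users,
# two routes, and a static per-user route cost a_k + b_k * n_k.
A = np.array([6.0, 9.0])
B = np.array([2.0, 0.5])
N_USERS, ROUTES = 3, (0, 1)
STATES = list(itertools.product(ROUTES, repeat=N_USERS))
INDEX = {s: k for k, s in enumerate(STATES)}


def total_cost(profile):
    """Total cost TC(r) = sum_k n_k * (a_k + b_k * n_k)."""
    n = np.bincount(np.asarray(profile, dtype=int), minlength=len(ROUTES))
    return float(np.sum(n * (A + B * n)))


def utility(i, route, profile):
    """Assumed DSO utility: -(TC with user i on `route` - TC without user i)."""
    others = [r for j, r in enumerate(profile) if j != i]
    return -(total_cost(others + [route]) - total_cost(others))


def logit_matrix(beta):
    """Transition matrix of the logit response dynamics, Eq. (3.10)."""
    P = np.zeros((len(STATES), len(STATES)))
    for s in STATES:
        for i in range(N_USERS):  # the revising user is drawn uniformly
            u = np.array([utility(i, r, s) for r in ROUTES])
            w = np.exp(beta * (u - u.max()))  # shifting by max avoids overflow
            for r, p in zip(ROUTES, w / w.sum()):
                P[INDEX[s], INDEX[s[:i] + (r,) + s[i + 1:]]] += p / N_USERS
    return P


def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 (unique: the chain is irreducible)."""
    M = P.T - np.eye(len(P))
    M[-1, :] = 1.0  # replace one redundant balance equation by normalisation
    rhs = np.zeros(len(P))
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


def gibbs(beta):
    """Closed form pi(r) proportional to exp(beta * Phi(r)) with Phi = -TC."""
    phi = np.array([-total_cost(s) for s in STATES])
    w = np.exp(beta * (phi - phi.max()))
    return w / w.sum()


def mass_on_optimum(beta):
    """Stationary probability of the set of total-cost minimisers."""
    pi = stationary(logit_matrix(beta))
    best = min(total_cost(s) for s in STATES)
    return float(sum(p for s, p in zip(STATES, pi) if total_cost(s) == best))


for beta in (0.1, 1.0, 5.0):
    assert np.allclose(stationary(logit_matrix(beta)), gibbs(beta))
    print(beta, round(mass_on_optimum(beta), 4))
```

Running it prints, for each $\beta$, the stationary mass on the three cost-minimising profiles, which grows toward 1 as $\beta$ increases; the assertion confirms the Gibbs form at each step.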
4. Solution algorithms and numerical experiments
In this section, we briefly summarise the evolutionary dynamics with and without perturbations as solution algorithms for the DSO game. We then present numerical experiments that demonstrate the theoretical properties of the evolutionary dynamics shown in the previous section.

4.1. Solution algorithms

We first present deterministic algorithms for computing a locally optimal state based on the better and best response dynamics, as follows.
Algorithm 1: Deterministic solution algorithms with better or best response dynamics
Initialisation: Set $\tau = 0$, where $\tau$ represents the iteration counter. Set also an initial route profile $\mathbf{r}^{0}$.
1. Select a user conducting a better/best response: Randomly select one user $i \in \mathcal{P}$ who can conduct a better or best response. Calculate the utility of the user for each route from the dynamic loading model. Change the route of the selected user to a new route $r^{*}_i \in \mathcal{R}_i$ according to Eq. (3.2) or Eq. (3.6).
2. Update the route profile and judge the convergence: Update the route profile: $\mathbf{r}^{\tau+1} = (r^{*}_i, \mathbf{r}^{\tau}_{-i})$. If $\mathbf{r}^{\tau+1}$ is not a Nash equilibrium state, let $\tau := \tau + 1$ and go back to Step 1. If $\mathbf{r}^{\tau+1}$ is a Nash equilibrium state, terminate the algorithm.

Next, we present a stochastic algorithm for computing a globally optimal state based on the logit response dynamics. It is clear that as $\beta \to \infty$, the stationary distribution of the logit response dynamics in a DSO game converges to states maximising the potential function. However, this does not imply that the logit response dynamics converges if $\beta$ is fixed. To derive a globally optimal state, we have to develop an algorithm based on the logit response dynamics with a time-dependent perturbation parameter $\beta(\tau)$ that guarantees the convergence of the resulting stationary distribution to the potential maximiser, i.e. the state minimising the total cost. The framework of the stochastic algorithm is described as follows:

Algorithm 2: Stochastic solution algorithm with logit response dynamics
Initialisation: Set $\tau = 0$, where $\tau$ represents the iteration counter. Set also an initial route profile $\mathbf{r}^{0}$.
1. Select a user changing the route according to the logit response dynamics: Randomly select one user $i \in \mathcal{P}$ and calculate the utility of the user for each route. Change the user's route to a new route $r \in \mathcal{R}_i$ according to Eq. (3.10).
2. Update the route profile and perturbation parameter: Update the route profile: $\mathbf{r}^{\tau+1} = (r, \mathbf{r}^{\tau}_{-i})$. Update also the perturbation parameter according to a specified schedule: $\beta := \beta(\tau)$.
3. Go back to Step 1 and repeat.

Regarding the condition on the perturbation parameter that guarantees convergence of the algorithm to a globally optimal state, we present a proposition based on the theorem in Tatarenko (2014), as follows:

Proposition 1.
Consider a DSO game with $|\mathcal{P}|$ users. Then, Algorithm 2 with $\beta(\tau) = \ln(\tau + 1)/(\lceil |\mathcal{P}|/\ \rceil)$ guarantees the probabilistic convergence of the route profiles of the DSO game to the maximisers of the potential function, i.e.

$$\lim_{\tau \to \infty} \Pr\Big\{\mathbf{r}^{\tau} \in \big\{\mathbf{r}^{*} \mid \Pi(\mathbf{r}^{*}) = \max_{\mathbf{r}} \Pi(\mathbf{r})\big\}\Big\} = 1. \tag{4.1}$$

Proof. See Appendix B. □
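Algorithms 1 and 2 can be sketched as follows, again on a hypothetical static two-route game that stands in for the dynamic loading model, with the assumed utility $-(TC(\mathbf{r}) - TC(\mathbf{r}_{-i}))$. In this sketch only users with a strict improvement are selected in Algorithm 1 (a simplifying assumption), so every step strictly lowers the total cost and the loop must terminate at a Nash equilibrium. The two schedules passed to `logit_dynamics` are hypothetical: they only mimic logarithmic versus linear growth of $\beta(\tau)$ and are not the constants of Proposition 1 or of the experiments.

```python
import math
import random

# Hypothetical static stand-in for the dynamic loading model (NOT the paper's
# model): two routes, each user on route k incurs a_k + b_k * n_k.
A, B = (6.0, 9.0), (2.0, 0.5)
ROUTES = (0, 1)


def total_cost(profile):
    """Total cost TC(r) of a route profile given as a list of route indices."""
    n = [profile.count(k) for k in ROUTES]
    return sum(n[k] * (A[k] + B[k] * n[k]) for k in ROUTES)


def utility(i, route, profile):
    """Assumed DSO utility: -(TC with user i on `route` - TC without user i)."""
    others = profile[:i] + profile[i + 1:]
    return -(total_cost(others + [route]) - total_cost(others))


def improving_users(profile, tol=1e-12):
    """Users who can strictly improve their utility by switching routes."""
    return [i for i in range(len(profile))
            if max(utility(i, r, profile) for r in ROUTES)
            > utility(i, profile[i], profile) + tol]


def best_response_dynamics(profile, rng):
    """Algorithm 1, best response variant; stops at a Nash equilibrium."""
    profile = list(profile)
    while True:
        candidates = improving_users(profile)
        if not candidates:  # Step 2: no user can improve -> Nash equilibrium
            return profile
        i = rng.choice(candidates)  # Step 1: random user with an improvement
        profile[i] = max(ROUTES, key=lambda r: utility(i, r, profile))


def logit_dynamics(profile, rng, n_iter, schedule):
    """Algorithm 2: logit responses, Eq. (3.10), with beta = schedule(tau).

    Returns the final profile and the lowest-cost profile visited.
    """
    profile = list(profile)
    best = list(profile)
    for tau in range(n_iter):
        beta = schedule(tau)
        i = rng.randrange(len(profile))
        u = [utility(i, r, profile) for r in ROUTES]
        weights = [math.exp(beta * (x - max(u))) for x in u]
        profile[i] = rng.choices(ROUTES, weights=weights)[0]
        if total_cost(profile) < total_cost(best):
            best = list(profile)
    return profile, best


start = [0, 0, 0]
ne = best_response_dynamics(start, random.Random(0))
print("best response:", ne, total_cost(ne))

# Illustrative schedules only; the divisors 2 and 200 are not from the paper.
schedules = {"log": lambda tau: math.log(tau + 1) / 2.0,
             "linear": lambda tau: (tau + 1) / 200.0}
for name, schedule in schedules.items():
    final, best = logit_dynamics(start, random.Random(1), 2000, schedule)
    print(name, final, total_cost(final), best, total_cost(best))
```

In this tiny game every Nash equilibrium is globally optimal, so both algorithms find the same total cost; the difference between them only shows up in games with several equilibria, as in the experiments below.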
Although this proposition provides a setting of the perturbation parameter that guarantees convergence to a globally optimal state, the perturbation parameter grows only logarithmically with the iteration count. This means that convergence is very slow. Thus, in the next subsection addressing numerical experiments on DSO games, we also introduce a more practical setting in which $\beta(\tau)$ is a polynomial function, and test its performance.

We first consider a simple network with a single OD pair, as shown in Figure 2. There exist two parallel routes (links), and each link has a bottleneck section with a bottleneck capacity at its end. The physical conditions of each link (e.g. free-flow travel time and capacity) are summarised in the figure. Note that the saturation flow rates of all links are set to . / sec . Route 1 is the shortest-distance route and consists of links , and , whereas route 2 is referred to as the bypass route and consists of links , and . The total number of users is 400, and they depart from the origin with a fixed time headway, . / veh .

Figure 2: Simple network with two parallel routes (nodes o, j and d; link labels give (free-flow travel time [sec], capacity [veh/sec]): (6.0, 2.0), (9.0, 1.0), (18.0, 2.0) and (9.0, 2.0))
Figure 3: Results regarding deterministic solution algorithms. (a) Total costs of the best optimal states in sample paths (better and best response dynamics; vertical axis: total travel time [sec]). (b) Convergence process of best response dynamics (average number of non-best response users and average total travel time [sec] per time slot, 1 time slot = 200 iterations; annotated phases: 'Reach NE once', 'Deviate from NE & decrease TTT', 'Reach efficient NE').
In the numerical experiment, we compare the total costs derived by the solution algorithms (with better, best and logit responses) to observe the differences in their convergence properties. For each algorithm, we generate 1,000 sample paths with different initial route profiles; the number of iterations for each sample path is set to 20,000, which is enough for each algorithm to find at least a locally optimal state (i.e. an equilibrium). Note that for the logit response dynamics we consider two time-dependent perturbation parameters, a logarithmic and a linear schedule, under both of which the perturbation decreases as the iterations proceed. Specifically, in the former, the perturbation parameter at iteration $\tau$ is set as $\ln(\tau + 1)$ divided by a constant; in the latter, it is set as $\tau + 1$ divided by a constant.

Figure 3(a) shows the distribution of the total cost of the best optimal state obtained in each sample path by the deterministic solution algorithms. From this figure, we see that there are multiple optimal (i.e. equilibrium) states with different total costs in this DSO game. The difference between the worst and the best cases is approximately .

This figure also shows that the total costs generated by the best response dynamics tend to be lower than those generated by the better response dynamics. This is because, as mentioned in Section 3.2, the behaviour of the two evolutionary dynamics differs near equilibrium, even in this simple network. Specifically, when there exist users for whom the two routes yield the same utility (i.e. both routes are best response routes), these users can switch between the two best response routes under the best response dynamics. As a result, a route profile under the best response dynamics can shift from one Nash equilibrium to a more efficient equilibrium.

Let us look at Figure 3(b), which shows the behaviour of the best response dynamics in more detail.
In this figure, for every time slot (200 iterations), we calculate the average total cost and the number of users who do not take their best response routes (referred to as 'non-best response users') in the route profile. From this figure, we see that the number of non-best response users increases from zero during the process; then, the total cost sometimes decreases in response to this increase. This result comes from ripple effects caused by best responses. Specifically, a user's switch between the two best response routes affects the utilities of other users and turns their routes into non-best response routes; then, if one of these users takes a best response, the traffic state moves to a more efficient state by deviating from the inefficient one. This result thus confirms the properties of the dynamics predicted by the theory.

Figure 4 shows the distribution of the best total cost obtained in each sample path of the logit response dynamics. The result of the best response dynamics is also shown in the figure for reference. We see that the stochastic algorithm with the logarithmic perturbation parameter tends to produce higher total costs than the best response dynamics. By contrast, the stochastic algorithm with the linear perturbation parameter tends to produce lower total costs, although the difference is relatively small. These results can be explained by the difference between the adjustment rules of the parameter $\beta(\tau)$: in the former case, the perturbation decreases slowly, and thus the traffic states are continually perturbed during the iterations. The validity of this explanation is confirmed by Figure 5, which shows the stochastic processes (sample paths) of the two stochastic algorithms from the same initial state. In summary, the stochastic algorithm with the linear perturbation parameter may achieve a lower total cost in a practical number of iterations.

Figure 4: Minimum total costs obtained from each solution algorithm (Best, Logit:log and Logit:linear dynamics; vertical axis: total travel time [sec])

Figure 5: Changes in the average number of non-best response users and total cost produced by the stochastic solution algorithms. (a) With the logarithmic perturbation parameter; (b) With the linear perturbation parameter (1 time slot = 200 iterations; axes: average number of non-best response users and average total travel time [sec])

We conduct a similar experiment in the Nguyen-Dupuis network with multiple OD pairs (Figure 6). The physical conditions of each link are summarised in Table 1. The number of users departing for each origin-destination pair is 1,000 (i.e. 4,000 in total). The users depart from each origin with a fixed time headway, . / veh .
We generate 50 sample paths for each dynamics, and the number of iterations is set as , ; as the stochastic algorithm, we employ the logit response dynamics with a linear time-varying parameter $\beta(\tau)$, set as $\tau + 1$ divided by a constant.

The results in Figure 7 show that the solution algorithms have the same qualitative properties as in the previous experiment: the achieved total costs rank, from best to worst, as follows: the logit, best and better response dynamics. These results are consistent with the properties shown in Section 3. Future work should investigate the characteristics of flow patterns in different optimal states.
Figure 6: Nguyen-Dupuis network (nodes 1-13; origins and destinations marked)

Table 1: Physical conditions of each link (FFTT: Free Flow Travel Time, BC: Bottleneck Capacity, SF: Saturation Flow)

Link      FFTT [sec]   BC [veh/sec]   SF [veh/sec]      Link       FFTT [sec]   BC [veh/sec]   SF [veh/sec]
(1, …)    42           1.25           6                 (8, …)     72           1.25           6
(1, …)    54           1.25           6                 (9, …)     60           2.25           6
(4, …)    54           1.25           6                 (9, …)     54           0.83           6
(4, …)    90           0.83           6                 (10, …)    18           1.67           6
(5, …)    36           1.42           6                 (11, …)    54           1.25           6
(5, …)    54           1.67           6                 (11, …)    42           1.25           6
(6, …)    24           1.25           6                 (12, …)    30           0.67           6
(6, …)    78           0.83           6                 (12, …)    84           1.25           6
(7, …)    48           0.83           6                 (13, …)    66           0.83           6
(7, …)    66           0.92           6
5. Marginal cost pricing and stochastic evolutionary implementation
The most direct and important application of DSO assignment is to provide insights into marginal cost (or Pigouvian) pricing for achieving optimal states (as Nash equilibria). In this context, the analyses of the evolutionary dynamics so far can be interpreted as analyses of the evolutionary implementation scheme of marginal cost pricing (Sandholm, 2002, 2005, 2007), in which the toll level is adjusted according to the realised traffic state on a day-to-day basis. In this section, we examine whether such an implementation scheme is essential for achieving optimal states. Specifically, we show some important features of the evolutionary implementation scheme through a comparison with another typical scheme, the 'fixed implementation scheme', in which the toll level is set optimally in advance according to a (known or target) optimal state.

This section aims to clarify the differences between the two schemes through rigorous theoretical analysis. To this end, we mainly focus on a simple network in which there exists a single origin and each route contains only one capacitated link, i.e. a single-bottleneck-per-route network, as shown in Figure 8. We refer to this network as an 'SBPR-1 network' for short. SBPR-1 networks have some limitations regarding their topology. However, these networks can be considered as sub-networks included in general networks. Convergence and stability properties in such sub-networks will strongly affect those in the whole network, as the equilibrium of the traffic flow of each OD pair is necessary for the equilibrium of the whole traffic flow. This implies that the qualitative convergence and stability properties in general networks would inherit those in SBPR-1 networks. In Sections 5.1 and 5.2, we first show some theoretical properties of a new game under fixed tolls. In Section 5.3, we compare the two implementation schemes theoretically. Then, in Section 5.4, we conduct numerical experiments to validate the theoretical results and examine whether they are still applicable beyond an SBPR-1 network.
Figure 7: Total costs of the best optimal states in sample paths in the Nguyen-Dupuis network (Better, Best and Logit:linear dynamics; vertical axis: total travel time [sec])

Figure 8: Example of a single-bottleneck-per-route network with a single origin (one origin, several destinations, bottleneck links marked)
We formulate a strategic game in which (optimal) fixed tolls are imposed on users; we refer to this game as a 'dynamic user equilibrium game with fixed congestion pricing (DUE-FCP game)'. In the DUE-FCP game, we assume that the utility of each user is the negative of the sum of the user's route travel time and the fixed toll. Mathematically, the utility of user $i$ who selects route $r_i \in \mathcal{R}_i$ for a given route profile $\mathbf{r}_{-i}$ is represented by

$$U^{F}_i(r_i, \mathbf{r}_{-i}) = -C_i(r_i, \mathbf{r}_{-i}) - T_i(r_i), \tag{5.1}$$

where $T_i(r)$ is the toll imposed on user $i$ whose route is $r$. A Nash equilibrium state $\mathbf{r}^{*}$ is then defined as a state in which no user can improve their utility by unilaterally changing routes:

$$U^{F}_i(r^{*}_i, \mathbf{r}^{*}_{-i}) = \max_{r \in \mathcal{R}_i} U^{F}_i(r, \mathbf{r}^{*}_{-i}), \quad \forall i \in \mathcal{P}. \tag{5.2}$$

In this game, with a suitable fixed set of tolls, any traffic state can be made a Nash equilibrium state, as follows:

Proposition 2.
Consider a DUE-FCP game and an arbitrary route profile $\mathbf{r}^{*} \in \mathcal{R}$. Then, there always exists at least one set of tolls $T^{*}$ that makes $\mathbf{r}^{*}$ a strict Nash equilibrium. That is, the following condition is satisfied under $T^{*}$:

$$U^{F}_i(r^{*}_i, \mathbf{r}^{*}_{-i}) > U^{F}_i(r, \mathbf{r}^{*}_{-i}), \quad \forall r \in \mathcal{R}_i \setminus \{r^{*}_i\},\ \forall i \in \mathcal{P}. \tag{5.3}$$

Proof.
Changing the tolls on the routes of one user does not change the utilities of the other users, because each user's utility (5.1) contains only that user's own toll. Thus, without affecting the other users' utilities, we can make $r^{*}_i$ the unique best response route for user $i$ in $\mathbf{r}^{*}$ by setting the toll of each route to $T^{*}_i(r)$ such that

$$-C_i(r^{*}_i, \mathbf{r}^{*}_{-i}) - T^{*}_i(r^{*}_i) > -C_i(r_i, \mathbf{r}^{*}_{-i}) - T^{*}_i(r_i), \quad \forall r_i \in \mathcal{R}_i \setminus \{r^{*}_i\}. \tag{5.4}$$

By applying the same procedure to all users, we ensure that $\mathbf{r}^{*}$ is a strict Nash equilibrium in the DUE-FCP game. □
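The construction in the proof can be made concrete. The sketch below is a minimal illustration on a hypothetical two-route instance with static travel times $a_k + b_k \times$ (number of users on route $k$), not the paper's dynamic model. It assumes the normalisation $T^{*}_i(r^{*}_i) = 0$ and tolls every other route just enough, plus an assumed positive `margin`, to satisfy (5.4); `travel_time`, `strict_ne_tolls` and `is_strict_ne` are hypothetical helper names.

```python
# Hypothetical toy: two routes, per-user travel time a_k + b_k * (users on k).
A, B = (6.0, 9.0), (2.0, 0.5)
ROUTES = (0, 1)


def travel_time(i, r, profile):
    """C_i(r, r*_{-i}): user i's time on route r, others fixed as in profile."""
    n_others = sum(1 for j, rj in enumerate(profile) if j != i and rj == r)
    return A[r] + B[r] * (n_others + 1)


def strict_ne_tolls(target, margin=1.0):
    """Tolls T*_i(r) satisfying (5.4) for every user (Proposition 2).

    Assumed normalisation: T*_i(r*_i) = 0, and every other route is tolled
    just enough (plus `margin` > 0) to be strictly worse than r*_i.
    """
    tolls = []
    for i, r_star in enumerate(target):
        c_star = travel_time(i, r_star, target)
        tolls.append({r: 0.0 if r == r_star
                      else max(0.0, c_star - travel_time(i, r, target)) + margin
                      for r in ROUTES})
    return tolls


def is_strict_ne(profile, tolls):
    """Check condition (5.3) with U^F_i = -C_i - T_i, Eq. (5.1)."""
    for i, r_star in enumerate(profile):
        u_star = -travel_time(i, r_star, profile) - tolls[i][r_star]
        for r in ROUTES:
            u_alt = -travel_time(i, r, profile) - tolls[i][r]
            if r != r_star and not u_star > u_alt:
                return False
    return True


target = [0, 0, 0]  # all users on route 0
no_tolls = [{r: 0.0 for r in ROUTES} for _ in target]
print(is_strict_ne(target, no_tolls))                 # False
print(is_strict_ne(target, strict_ne_tolls(target)))  # True
```

Without tolls, route 1 is cheaper for every user (9.5 versus 12 in this toy), so the target is not an equilibrium; with the constructed tolls it becomes a strict one. Because user $i$'s tolls enter only $U^{F}_i$, each user's tolls are set independently, mirroring the argument of the proof.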
If the tolls for each user are set as the negative externalities that he/she creates in an optimal state $\mathbf{r}^{*}$ (a Nash equilibrium state in a DSO game), i.e. $T_i(r_i) = E_i(r_i, \mathbf{r}^{*}_{-i})$, a Nash equilibrium state in the DUE-FCP game corresponds to the optimal state. Thus, a DUE-FCP game with a set of optimal tolls is useful for analysing the properties of the fixed implementation scheme.

From now on, we focus on an SBPR-1 network, as mentioned above. This network has the following ordering property, which characterises the direction of the externalities:

Proposition 3.
Consider a DUE-FCP game in an SBPR-1 network. The utility of any route of a user is independentof the route choices of the users who depart from the origin later than the considered user.
Proof.
The utility of a user's route consists of the route travel time and the corresponding fixed toll. As the travel time on every uncapacitated link does not change, it is sufficient to prove that the travel time of a user on each capacitated link is independent of the route choices of the users who depart later than the user.

As no delay occurs on the uncapacitated links, a user arrives at any node situated upstream of the capacitated links earlier than users who depart later than the user. Combining this fact with the FIFO principle and the causality of the dynamic loading model, it is guaranteed that the travel time on a capacitated link, which includes delays owing to traffic congestion, is independent of the route choices of the users who depart later than the user. In other words, it depends only on the route choices of users who depart earlier than the user. □
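Proposition 3 can be illustrated with a minimal point-queue model, which is an assumption rather than the paper's dynamic loading model: each route has one bottleneck serving one vehicle per `1 / capacity[k]` seconds, preceded by an uncapacitated section with free-flow time `upstream[k]` and followed by one with `downstream[k]`. Because all users of a route share the same upstream time, FIFO at the bottleneck coincides with departure order, so users can be loaded one at a time in that order. The function name and all numbers are hypothetical.

```python
def sbpr1_travel_times(departures, routes, upstream, capacity, downstream):
    """Hypothetical SBPR-1 loading: one origin, one point-queue bottleneck per route.

    departures must be sorted ascending. All users on route k reach its
    bottleneck after the same free-flow time upstream[k], so FIFO at the
    bottleneck coincides with departure order and users can be loaded one by
    one in that order.
    """
    last_exit = [float("-inf")] * len(capacity)
    times = []
    for dep, k in zip(departures, routes):
        arrival = dep + upstream[k]
        # Served no earlier than 1/capacity after the previous user on route k.
        exit_time = max(arrival, last_exit[k] + 1.0 / capacity[k])
        last_exit[k] = exit_time
        times.append(exit_time + downstream[k] - dep)
    return times


deps = [0.0, 0.5, 1.0, 1.5]
up, cap, down = (3.0, 6.0), (1.0, 1.0), (3.0, 3.0)
base = sbpr1_travel_times(deps, [0, 0, 0, 1], up, cap, down)
moved = sbpr1_travel_times(deps, [0, 0, 1, 1], up, cap, down)  # user 2 switches
print(base)   # [6.0, 6.5, 7.0, 9.0]
print(moved)  # [6.0, 6.5, 9.0, 9.5]
```

When user 2 switches routes, users 0 and 1 (who depart earlier) keep their travel times, while the later user 3 is delayed by the new queue on route 1; this is the one-directional externality that the ordering property describes.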
With this ordering property, we can prove the uniqueness of equilibrium in the DUE-FCP game:
Corollary 2.
Consider a DUE-FCP game in an SBPR-1 network and an arbitrary route profile $\mathbf{r}^{*} \in \mathcal{R}$. Consider also a set of tolls $T^{*}$ that makes $\mathbf{r}^{*}$ a strict Nash equilibrium. Then, $\mathbf{r}^{*}$ is the unique Nash equilibrium in the DUE-FCP game.

Proof. See Appendix C. □
The ordering property is useful for analytically investigating the properties of equilibrium, as utilised in previous studies (e.g. Kuwahara and Akamatsu, 1993; Akamatsu et al., 2015; Wada et al., 2019). In particular, with this property and the theory of weakly acyclic games, Satsukawa et al. (2019) established the convergence and stability of dynamic user equilibrium in a unidirectional network. In a similar manner, we prove the convergence and stochastic stability of the evolutionary dynamics in a DUE-FCP game in an SBPR-1 network.
To define a weakly acyclic game, we utilise the notion of a better response path. A better response path is a sequence of strategy profiles $\mathbf{r}^{0}, \mathbf{r}^{1}, \ldots, \mathbf{r}^{L}$ such that, for each successive pair $\mathbf{r}^{\tau}, \mathbf{r}^{\tau+1}$, there is exactly one user $i_{\tau}$ who changes his/her route, and that user improves his/her utility:

$$r^{\tau}_{i_{\tau}} \neq r^{\tau+1}_{i_{\tau}} \ \text{ s.t. } \ r^{\tau+1}_{i_{\tau}} \in D_{i_{\tau}}(\mathbf{r}^{\tau}), \qquad r^{\tau}_{j} = r^{\tau+1}_{j}, \ \forall j \in \mathcal{P} \setminus \{i_{\tau}\}. \tag{5.5}$$

The class of weakly acyclic games, introduced by Young (1993) and Marden and Shamma (2012), is then defined as follows:

Definition 3. A game is a weakly acyclic game if, from any strategy profile $\mathbf{r} \in \mathcal{R}$, there exists a better response path starting at $\mathbf{r}$ and ending at a Nash equilibrium state.

We then obtain the following theorem and propositions regarding the theoretical properties of a DUE-FCP game in an SBPR-1 network in the same way as in Satsukawa et al. (2019):

Theorem 5.
A DUE-FCP game in an SBPR-1 network is a weakly acyclic game.
Proof.
The outline of the proof is as follows. Thanks to Proposition 3, it is guaranteed that if a user selects a best response route in a particular route profile, that route remains a best response route regardless of any route changes made by the users departing later. Thus, by changing the routes of the users to their best response routes sequentially in the order of their departure times, all users come to select their ex-post best response routes; that is, we obtain a Nash equilibrium state (see Appendix D for details). □

Proposition 4. Consider a DUE-FCP game in an SBPR-1 network, an arbitrary route profile $\mathbf{r}^{*} \in \mathcal{R}$, and a set of tolls $T^{*}$ that makes $\mathbf{r}^{*}$ a unique strict Nash equilibrium. Then, a route profile generated by the better/best response dynamics from an arbitrary initial route profile converges almost surely to $\mathbf{r}^{*}$; that is, $\mathbf{r}^{*}$ is globally asymptotically stable under the best response dynamics.

Proof. Young (2004) proved the convergence of the better response dynamics to Nash equilibrium in weakly acyclic games. Since $\mathbf{r}^{*}$ is the unique Nash equilibrium, its global asymptotic stability follows immediately. Next, in the same way as in the proof of Theorem 5, we can prove that there exists a sequence of best responses (i.e. a best response path) from any initial route profile ending at the strict Nash equilibrium state. Such a game is called a weakly acyclic game under best replies, and Young (2004) proved the convergence of the best response dynamics in such games; thus, the proposition holds. □
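The constructive argument behind Theorem 5 (one round of best responses in order of departure time) can be checked with the same kind of hypothetical point-queue model used above to illustrate Proposition 3. The network data, departure times and tolls below are made up or random; arbitrary tolls are used on purpose, because the ordering property holds for any fixed tolls. This is a sketch under those assumptions, not the procedure of Appendix D.

```python
import random

# Hypothetical SBPR-1 network data (not from the paper).
ROUTES = (0, 1)
UP, CAP, DOWN = (3.0, 6.0), (1.0, 1.0), (3.0, 3.0)


def travel_times(deps, routes):
    """Point-queue loading; deps sorted ascending, so FIFO = departure order."""
    last_exit = [float("-inf")] * len(CAP)
    times = []
    for dep, k in zip(deps, routes):
        exit_time = max(dep + UP[k], last_exit[k] + 1.0 / CAP[k])
        last_exit[k] = exit_time
        times.append(exit_time + DOWN[k] - dep)
    return times


def utility(i, r, profile, deps, tolls):
    """U^F_i = -(travel time) - (fixed toll), Eq. (5.1)."""
    trial = list(profile)
    trial[i] = r
    return -travel_times(deps, trial)[i] - tolls[i][r]


def is_nash(profile, deps, tolls, tol=1e-9):
    """True if no user can strictly gain by a unilateral route change."""
    return all(utility(i, r, profile, deps, tolls)
               <= utility(i, profile[i], profile, deps, tolls) + tol
               for i in range(len(profile)) for r in ROUTES)


def one_pass_best_responses(profile, deps, tolls):
    """Each user best-responds once, in order of departure time."""
    profile = list(profile)
    for i in range(len(profile)):
        profile[i] = max(ROUTES, key=lambda r: utility(i, r, profile, deps, tolls))
    return profile


rng = random.Random(42)
deps = sorted(rng.uniform(0.0, 20.0) for _ in range(30))
tolls = [{r: rng.uniform(0.0, 5.0) for r in ROUTES} for _ in deps]  # arbitrary
start = [rng.choice(ROUTES) for _ in deps]
result = one_pass_best_responses(start, deps, tolls)
print(is_nash(result, deps, tolls))  # True
```

The single pass suffices here because user $i$'s utilities depend only on users who departed earlier, and those users never move again after $i$'s turn; the better response path of Theorem 5 is exactly this sequence of switches.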
Proposition 5.
Consider a DUE-FCP game in an SBPR-1 network, an arbitrary route profile $\mathbf{r}^{*} \in \mathcal{R}$, and a set of tolls $T^{*}$ that makes $\mathbf{r}^{*}$ a unique strict Nash equilibrium. Then, $\mathbf{r}^{*}$ is stochastically stable under the logit response dynamics.

Proof. This follows immediately from Theorem 4 of Young (1993). □
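Proposition 5 can likewise be checked numerically on a tiny hypothetical instance: three users on an assumed point-queue SBPR-1 network, an arbitrarily chosen target profile, tolls built as in the proof of Proposition 2 with an assumed margin of 1 second, and the exact stationary distribution of the logit response dynamics (3.10). With these particular numbers each user happens to have a dominant route under the tolls, so this is an easy case; the sketch shows the stationary mass concentrating on the target as $\beta$ grows and is not a proof.

```python
import itertools

import numpy as np

# Hypothetical SBPR-1 instance (not from the paper): 3 users, 2 routes.
ROUTES = (0, 1)
UP, CAP, DOWN = (3.0, 6.0), (1.0, 1.0), (3.0, 3.0)
DEPS = (0.0, 0.2, 0.4)
STATES = list(itertools.product(ROUTES, repeat=len(DEPS)))
INDEX = {s: k for k, s in enumerate(STATES)}


def travel_times(routes):
    """Point-queue loading; DEPS sorted, so FIFO coincides with departure order."""
    last_exit = [float("-inf")] * len(CAP)
    times = []
    for dep, k in zip(DEPS, routes):
        exit_time = max(dep + UP[k], last_exit[k] + 1.0 / CAP[k])
        last_exit[k] = exit_time
        times.append(exit_time + DOWN[k] - dep)
    return times


def utility(i, r, profile, tolls):
    """U^F_i = -C_i - T_i, Eq. (5.1); profile is a tuple of routes."""
    return -travel_times(profile[:i] + (r,) + profile[i + 1:])[i] - tolls[i][r]


def strict_ne_tolls(target, margin=1.0):
    """Tolls making `target` a strict NE, as in the proof of Proposition 2."""
    free = [{r: 0.0 for r in ROUTES} for _ in target]
    tolls = []
    for i, r_star in enumerate(target):
        u_star = utility(i, r_star, target, free)
        tolls.append({r: 0.0 if r == r_star
                      else max(0.0, utility(i, r, target, free) - u_star) + margin
                      for r in ROUTES})
    return tolls


def logit_matrix(beta, tolls):
    """Logit response dynamics, Eq. (3.10), one uniformly drawn user per step."""
    n = len(DEPS)
    P = np.zeros((len(STATES), len(STATES)))
    for s in STATES:
        for i in range(n):
            u = np.array([utility(i, r, s, tolls) for r in ROUTES])
            w = np.exp(beta * (u - u.max()))
            for r, p in zip(ROUTES, w / w.sum()):
                P[INDEX[s], INDEX[s[:i] + (r,) + s[i + 1:]]] += p / n
    return P


def stationary(P):
    """Unique stationary distribution of an irreducible chain."""
    M = P.T - np.eye(len(P))
    M[-1, :] = 1.0
    rhs = np.zeros(len(P))
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


target = (1, 1, 0)  # an arbitrary profile to implement with fixed tolls
tolls = strict_ne_tolls(target)
for beta in (0.1, 2.0, 10.0):
    pi = stationary(logit_matrix(beta, tolls))
    print(beta, round(float(pi[INDEX[target]]), 4))
```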
We are now ready to discuss the structural differences between the evolutionary and fixed implementation schemes of marginal cost pricing in SBPR-1 networks. The convergence and stability results for the DSO and DUE-FCP games so far show that an optimal traffic state will be achieved by natural evolutionary dynamics under both implementation schemes. However, there are two major differences between the two schemes in the mechanism for achieving optimal states.

The first difference stems from the differences in the externalities created under each implementation scheme, which affect the convergence processes. Under the fixed implementation scheme (or in a DUE-FCP game), owing to the ordering property shown in Proposition 3, a route change of a user affects the utility of users who depart later from the same origin, but not vice versa. Because of this temporal asymmetry of the interaction, users with earlier departure times tend to settle on their ex-post best response routes earlier in the convergence process (Satsukawa et al., 2019). By contrast, under the evolutionary implementation scheme (or in a DSO game), as externalities are perfectly internalised, the better/best response of any user strictly decreases the total cost and thus moves the traffic state toward an equilibrium state, regardless of the order of his/her departure time. This implies that the total travel time decreases more smoothly to an efficient state under the evolutionary implementation scheme.

The second difference is related to the obvious difference in price-setting: the toll level is adjusted according to the realised state on a day-to-day basis under the evolutionary implementation scheme, whereas it is set in advance according to a known target state under the fixed implementation scheme. This difference affects which state is stabilised. Under the evolutionary implementation scheme, the most efficient equilibrium state is stochastically stable, as shown in Theorem 4. In other words, that equilibrium state will be achieved thanks to perturbations even though there are multiple equilibria (i.e. locally optimal states). On the other hand, under the fixed implementation scheme, the target state is the unique and stochastically stable equilibrium state, as shown in Proposition 5. This means that (i) a locally optimal (possibly inefficient) state is stabilised unless information regarding a globally optimal state is known in advance, and (ii) perturbations do not play an essential role, unlike in the evolutionary implementation scheme. These differences imply that when such an inefficient optimal state is the target state under the fixed implementation scheme, the traffic state cannot reach a more efficient traffic state, such as the globally optimal state minimising the total cost.

For static traffic assignment problems dealing with continuum users, Sandholm (2002, 2005, 2007) analysed the problems with marginal cost pricing schemes using a similar approach. The author showed that static traffic assignment problems under the evolutionary and fixed implementation schemes are potential games, and obtained similar conclusions about the convergence and stability properties. These papers also provide an excellent discussion of information issues in the implementation of congestion pricing schemes.
Figure 9: Behaviour of total costs under evolutionary and fixed implementation schemes in the simple network (average total travel time per time slot, 1 time slot = 500 iterations, up to 100 time slots; curves: Evolutionary, Fixed)
From the above theoretical discussion, we can expect that under the evolutionary implementation scheme, the total cost decreases more smoothly to an efficient state and the traffic state reaches a more efficient state. In this section, we conduct numerical experiments for two purposes: (i) confirming the validity of this expectation and (ii) testing whether the theoretical results (especially those for the fixed implementation scheme) are still applicable beyond an SBPR-1 network.
We consider the simple network used in the previous section (Figure 2), which is an SBPR-1 network. To simulate the evolutionary implementation scheme, we iterate the logit response dynamics with a fixed perturbation parameter in the DSO game. For the fixed implementation scheme, we must first determine a target state and the (optimal) fixed tolls for implementing that state. In the experiment, we set an equilibrium route profile $\mathbf{r}^{*}$ obtained under the evolutionary scheme as the target state; the external costs at this state are set as the fixed tolls (i.e. $T_i(r) = E_i(r, \mathbf{r}^{*}_{-i})$). We then iterate the logit response dynamics in the DUE-FCP game from the same initial state as in the DSO game.

First, let us look at a sample path of the stochastic process under each implementation scheme. Here we set β = . , the number of iterations to 50,000, and the initial route of each user to the shortest-distance route; the target state of the fixed implementation scheme is an efficient equilibrium state that has the minimum total cost among the equilibrium states that occurred under the evolutionary implementation scheme. Figure 9 shows the process of the average total cost, calculated for every time slot (1 time slot = 500 iterations). The black dotted line indicates the total cost at the efficient equilibrium.
From this figure, we see that:
• Under the fixed implementation scheme, although the process approaches the efficient equilibrium (or target) state rather quickly, it does not stay at that state and frequently deteriorates.
• Under the evolutionary implementation scheme, the process does not significantly deteriorate during the iterations; the total cost decreases almost monotonically.
These results suggest that a traffic state under the evolutionary implementation scheme is more robust to perturbations than that under the fixed one, in the sense that the total costs do not oscillate widely.

The above difference can be attributed to the differences in externalities between the two implementation schemes discussed in the previous subsection. Under the evolutionary implementation scheme, as the utility of each user is aligned with the total cost, if one user makes a mistake and the total cost increases as a result, any user who can improve their utility rationally chooses a route that decreases the total cost. Thus, total costs will not significantly deteriorate due to perturbations. On the other hand, under the fixed implementation scheme, an improvement in the utility of a user does not necessarily decrease the total cost. Moreover, once a traffic state is perturbed, the ordering property requires sequential route changes of users to their best response routes in the order of their departure times to restore the traffic state. This would prevent the traffic state from recovering to an efficient state smoothly. Thus, we can infer from these facts that the total cost under the evolutionary implementation scheme would be more robust to perturbations.

Figure 10: Averages and standard deviations of the total costs around Nash equilibrium. (a) Initial state: the efficient equilibrium state in Figure 9; (b) Initial state: the tentative globally optimal state (horizontal axis: average total travel time; vertical axis: SD of total travel times; markers for β = 0.2, 0.5 and 1.0; line: total travel time of the initial NE)

To confirm the validity of this observation about robustness, we generate 500 sample paths from an equilibrium state under each implementation scheme for every experimental setting; the number of iterations for each sample path is set to 20,000. We select three perturbation parameters considering the frequencies of mistakes (i.e. changing to routes with worse utilities), $\beta$ = 0.2, 0.5 and 1.0, and two initial states: the efficient equilibrium in Figure 9 and a tentative globally optimal state obtained in the experiment in Section 4. The target state of the fixed implementation scheme is set as the initial state.

Figure 10 shows the results. Each mark represents the average and standard deviation of the total costs derived in one sample path (circles: the evolutionary implementation scheme; crosses: the fixed implementation scheme); different colours of marks indicate different perturbation parameters; each black line represents the total cost at the initial equilibrium. In Figure 10(a), the frequencies of mistakes are about 0.5, 1.0 and 2.0% when $\beta$ is 1.0, 0.5 and 0.2, respectively. In Figure 10(b), by contrast, the frequencies are about 0.5, 1.0 and 2.0% under the evolutionary implementation scheme, and 1.0, 1.5 and 2.5% under the fixed implementation scheme.

Figure 10(a) shows that the standard deviations under the fixed implementation scheme tend to be larger than those under the evolutionary one; this tendency is particularly evident when the perturbation level becomes high (i.e. as $\beta$ decreases). The tendencies are also observable in Figure 10(b), which shows the results of the process starting from the tentative globally optimal state.
These results are consistent with the above observation that the total cost under the evolutionary implementation scheme is more robust to perturbations than that under the fixed implementation scheme.

Figures 10(a) and 10(b) also show that the average total costs under the fixed implementation scheme are systematically higher than the initial one. This is because the (initial) locally optimal state is strongly stabilised by the fixed tolls, so a traffic state is attracted to it. Consequently, the traffic state tends to remain among states with higher total travel times near the locally optimal state, and the total travel time becomes worse on average than the initial one. Meanwhile, some average total costs under the evolutionary implementation scheme in Figure 10(a) are lower than the initial total cost. This is because the globally optimal state is stochastically stable, so a traffic state is attracted to the most efficient state. This observation is consistent with the theoretical discussion in the previous subsection.

Finally, we consider the Nguyen-Dupuis network with tandem bottlenecks and multiple OD pairs (i.e. a non-SBPR-1 network). Figure 11(a) shows the sample paths of the average total costs (here, 1 time slot = 500 iterations) under the evolutionary and fixed implementation schemes. We set β = … and run the dynamics for 200,000 iterations from the shortest-distance route profile. The target state of the fixed implementation scheme is a Nash equilibrium near the most efficient state, i.e. the state with the minimum total cost found under the evolutionary implementation scheme; we derived this Nash equilibrium by employing the better response dynamics with the initial route profile set to that most efficient state.

Figure 11: Results of the behaviour of average total costs under the logit response dynamics in the Nguyen-Dupuis network. (a) Behaviour of total costs from the shortest-distance route profile (time slot vs. average total travel time; 1 time slot = 500 iterations; evolutionary and fixed schemes). (b) Averages and standard deviations of total costs around Nash equilibrium (average vs. SD of total travel time; β = 0.025, 0.05 and 1.0; the reference line marks the total travel time of the initial NE).

From this figure, we can see that the average total costs under the evolutionary implementation scheme decrease almost monotonically, whereas those under the fixed implementation scheme frequently oscillate, as in the previous experiment.

Figure 11(b) shows the behaviour of the average total costs around Nash equilibrium. We consider three perturbation parameters, β = 0.025, 0.05 and 1.0, and generate 50 sample paths from an equilibrium state under each scheme; the frequencies of mistakes are about …, … and …% for these values of β, respectively. The number of iterations for each sample path is set to 50,000, and the most efficient Nash equilibrium is used as the initial state. This figure confirms that the characteristics found in the SBPR-1 network also hold in a non-SBPR-1 network. Specifically, the standard deviations of the total costs under the fixed implementation scheme are larger than those under the evolutionary implementation scheme, and the average total costs under the evolutionary implementation scheme tend to be lower than those under the fixed implementation scheme. To explore whether this result holds more generally, a systematic set of numerical experiments should be conducted for different types of network settings.
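Each mark in Figures 10 and 11 summarises one sample path of the logit response dynamics by the average and standard deviation of the total cost, and the text reports how often users make "mistakes". The following is a minimal Python sketch of that bookkeeping, not the experimental code used here. It replaces the dynamic loading model with a hypothetical static two-route congestion game (per-user route cost a_r + b_r n_r), lets one randomly chosen user revise per iteration, and uses minus the marginal social cost as the utility, so that, as in a DSO game, the potential is the negative of the total cost. The names `total_cost`, `utility` and `logit_sample_path` and all numerical values are illustrative assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

RouteCost = Tuple[float, float]  # hypothetical (a_r, b_r): per-user cost a_r + b_r * n_r


def total_cost(profile: Sequence[int], route_costs: Sequence[RouteCost]) -> float:
    """Total cost of a hypothetical static congestion game (stand-in for dynamic loading)."""
    counts = [0] * len(route_costs)
    for r in profile:
        counts[r] += 1
    return sum(n * (a + b * n) for n, (a, b) in zip(counts, route_costs))


def utility(i: int, r: int, profile: Sequence[int],
            route_costs: Sequence[RouteCost]) -> float:
    """DSO-style utility: minus user i's marginal social cost of taking route r."""
    with_i = list(profile)
    with_i[i] = r
    without_i = list(profile[:i]) + list(profile[i + 1:])
    return -(total_cost(with_i, route_costs) - total_cost(without_i, route_costs))


def logit_sample_path(start: Sequence[int], route_costs: Sequence[RouteCost],
                      beta: float, iterations: int,
                      seed: int = 0) -> Tuple[float, float, float]:
    """One sample path of logit response dynamics.

    Returns the mean and (population) standard deviation of the total cost
    along the path, and the fraction of revisions that picked a worse route.
    """
    rng = random.Random(seed)
    profile: List[int] = list(start)
    costs: List[float] = []
    mistakes = 0
    for _ in range(iterations):
        i = rng.randrange(len(profile))  # one randomly chosen user revises
        utils = [utility(i, r, profile, route_costs) for r in range(len(route_costs))]
        best = max(utils)
        # logit choice weights; subtracting `best` keeps exp() from overflowing
        weights = [math.exp(beta * (u - best)) for u in utils]
        r = rng.choices(range(len(route_costs)), weights=weights)[0]
        if utils[r] < best:
            mistakes += 1
        profile[i] = r
        costs.append(total_cost(profile, route_costs))
    return statistics.mean(costs), statistics.pstdev(costs), mistakes / iterations


if __name__ == "__main__":
    route_costs = [(1.0, 1.0), (2.0, 0.5)]  # hypothetical two-route network
    for beta in (0.2, 0.5, 1.0):
        mean, sd, rate = logit_sample_path([0] * 10, route_costs, beta, 20_000)
        print(f"beta={beta}: mean={mean:.2f}, sd={sd:.2f}, mistakes={rate:.2%}")
```

As in the experiments, a larger β should make mistakes rarer; the exact figures printed depend on the hypothetical costs and the random seed.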
6. Conclusion
In this study, we examined the convergence and stability of DSO traffic assignment with atomic users. We first formulated a DSO game, which is a strategic-game form of DSO traffic assignment wherein atomic users select routes minimising their marginal social costs. This game is a potential game whose potential function is the negative of the total cost function; its Nash equilibrium states correspond to the optimal states of the problem of minimising the total cost. These properties enable us to rigorously analyse the theoretical properties of evolutionary dynamics. We proved that the better/best response dynamics converges to a set of (locally) optimal states and that a globally optimal state is stochastically stable under the logit response dynamics. The numerical experiments demonstrated that the total costs generated by the best response dynamics tend to be lower than those generated by the better response dynamics, and that the stochastic algorithm with the linear perturbation parameter might achieve a lower total cost within a practical number of iterations.

As an application of the above analysis, we further examined the characteristics of the evolutionary implementation scheme of marginal cost pricing by comparing it with a fixed implementation scheme. Specifically, using the theory of weakly acyclic games, we first showed convergence and stability in a DUE-FCP game, which is the strategic game under the fixed implementation scheme, in an SBPR-1 network. From the differences in externalities and stable states between DSO and DUE-FCP games, we clarified that, under the evolutionary implementation scheme, (i) the total cost decreases more smoothly to an efficient traffic state, and (ii) a traffic state could, in principle, reach a more efficient state. Finally, we conducted numerical experiments to validate the theoretical findings.
The experiments also indicate differences between the two implementation schemes in their robustness to perturbations: under the fixed implementation scheme, total costs frequently deteriorate owing to perturbations, whereas under the evolutionary implementation scheme, total costs decrease almost monotonically.

As future work, it is important to clarify the traffic flow patterns of optimal states through theoretical analysis and systematic numerical experiments. Regarding numerical experiments in particular, although the algorithms we provide are guaranteed to converge, they may be inefficient, and their computational costs could be high in large-scale networks. It is thus necessary to develop efficient algorithms that exploit useful properties of the DSO game, such as the existence of a potential function. Relatedly, it is important to examine which types of individual learning behaviour and available information make evolutionary dynamics converge quickly to optimal states. Exploring the applicability of the proposed approach to more general DSO assignment with both route and departure time choices is also important. Furthermore, it would be interesting to pursue other implementation schemes for achieving an optimal state, such as controlling users via an autonomous vehicle system. The DSO game could be directly applied to the case in which all vehicles are autonomous. However, more complex situations will arise, such as the coexistence of non-autonomous vehicles. Introducing such conditions into a strategic game and analysing its theoretical properties are important topics for future studies.
Acknowledgements
This work was financially supported by JSPS KAKENHI Grant Numbers JP20K14843 and JP20H00265. The comments of anonymous reviewers are gratefully acknowledged.
Appendix A. A dynamic loading model dealing with atomic users
Appendix A.1. Link model
A link model must satisfy several natural conditions, including the FIFO principle and causality (Carey et al., 2003), to eliminate physically implausible vehicle movements. Here we employ the car-following model of Newell (2002). This model describes how the trajectory $x_{l,n}(t)$, i.e. the position of the $n$-th vehicle (the 'follower') entering link $l$, depends on the trajectory $x_{l,n-1}(t)$ of the $(n-1)$-th vehicle (the 'leader'). Specifically, the trajectory $x_{l,n}(t)$ is determined such that the spacing and headway are no smaller than the dimensions of the 'Newell box' (see Figure A.12(a)). The Newell box is characterised by two variables, namely the reaction time $\tau_l$ and the jam spacing $d_l$, which are calculated by

$$\tau_l = \frac{1}{w_l \kappa_l}, \qquad d_l = \frac{1}{\kappa_l}, \qquad \text{where } \kappa_l = \frac{(v_l + w_l)\, q_l}{v_l w_l}.$$

Then, for a given trajectory of the leader $x_{l,n-1}(t)$ and link arrival time of the follower $t^a_{l,n}$, the trajectory $x_{l,n}(t)$ is calculated as follows:

$$x_{l,n}(t) = \min\left\{ v_l \left(t - t^a_{l,n}\right),\; x_{l,n-1}(t - \tau_l) - d_l \right\}. \tag{A.1}$$

The first term in the minimum corresponds to the follower driving in the free-flow condition; the second term corresponds to the follower driving in the congested condition.

Based on this model, the possible departure/arrival times of the $n$-th vehicle on a link are derived from the trajectory of the $(n-1)$-th vehicle (see Figure A.12(b)). The earliest possible arrival time of the $n$-th vehicle, $t^{PA}_{l,n}$, is determined from $x^{-1}_{l,n-1}(d_l)$, the time at which the $(n-1)$-th vehicle reaches position $d_l$, as follows:

$$t^{PA}_{l,n} = x^{-1}_{l,n-1}(d_l) + \tau_l. \tag{A.2}$$

(If the $(n-1)$-th vehicle stops at position $d_l$, so that $x^{-1}_{l,n-1}(d_l)$ has multiple values, the maximum of these values is used to determine the possible arrival time.)

The earliest possible departure time $t^{PD}_{l,n}$ is determined from the departure time of the $(n-1)$-th vehicle, $t^d_{l,n-1}$, and the (actual) arrival time of the $n$-th vehicle, $t^a_{l,n}$:

$$t^{PD}_{l,n} = \max\left\{ t^d_{l,n-1} + \frac{1}{\mu_l},\; t^a_{l,n} + \frac{L_l}{v_l} \right\}. \tag{A.3}$$
Figure A.12: Schematic of Newell's car-following model and possible times (axes: time vs. space; follower trajectory $x_{l,n}(t)$).
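To show how the Newell-box parameters and Eqs. (A.1)–(A.3) fit together, here is a minimal Python sketch. It assumes a triangular fundamental diagram with the parameters named above ($v_l$, $w_l$, $q_l$, $L_l$), passes the leader's trajectory in as a function, and reads the headway term in (A.3) as $1/\mu_l$. The class and function names and the numerical values in the usage block are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Link:
    """Parameters of link l in Newell's model (units chosen by the user)."""
    v: float       # free-flow speed v_l
    w: float       # backward wave speed w_l
    q: float       # capacity q_l
    length: float  # link length L_l

    @property
    def kappa(self) -> float:
        # jam density: kappa_l = (v_l + w_l) q_l / (v_l w_l)
        return (self.v + self.w) * self.q / (self.v * self.w)

    @property
    def tau(self) -> float:
        # reaction time (width of the Newell box): tau_l = 1 / (w_l kappa_l)
        return 1.0 / (self.w * self.kappa)

    @property
    def d(self) -> float:
        # jam spacing (height of the Newell box): d_l = 1 / kappa_l
        return 1.0 / self.kappa


def follower_position(link: Link, t: float, t_arr: float,
                      leader: Callable[[float], float]) -> float:
    """Eq. (A.1): the smaller of the free-flow and the shifted-leader position."""
    free_flow = link.v * (t - t_arr)
    congested = leader(t - link.tau) - link.d
    return min(free_flow, congested)


def earliest_arrival(link: Link, leader_time_at_d: float) -> float:
    """Eq. (A.2): t^PA_{l,n} = x^{-1}_{l,n-1}(d_l) + tau_l."""
    return leader_time_at_d + link.tau


def earliest_departure(link: Link, leader_departure: float,
                       t_arr: float, mu: float) -> float:
    """Eq. (A.3): headway behind the leader (1/mu_l) vs. free-flow traversal."""
    return max(leader_departure + 1.0 / mu, t_arr + link.length / link.v)


if __name__ == "__main__":
    link = Link(v=1.0, w=1.0, q=0.5, length=10.0)

    def leader(t: float) -> float:
        # hypothetical leader entering at t = 0 and driving at free-flow speed
        return link.v * t

    # this leader reaches position d_l at time d_l / v_l
    t_pa = earliest_arrival(link, leader_time_at_d=link.d / link.v)
    print(link.kappa, link.tau, link.d)                  # 1.0 1.0 1.0
    print(t_pa)                                          # 2.0
    print(follower_position(link, 3.0, t_pa, leader))    # 1.0
    print(earliest_departure(link, 10.0, t_pa, mu=0.25)) # 14.0
```

In the usage, both terms of (A.1) coincide at $t = 3$ because the follower enters exactly at its earliest possible arrival time, and the headway term dominates (A.3) because the hypothetical $\mu_l$ is small.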
Appendix A.2. Node model
A node model determines the actual departure/arrival times of vehicles based on the possible times derived from the link model. Each node belongs to one of the following types: a normal node, which connects one upstream link and one downstream link; a diverge node, which connects one upstream link and multiple downstream links; a merge node, which connects multiple upstream links and one downstream link; and an intersection node, which connects multiple upstream and downstream links. For nodes other than normal nodes, the departure/arrival times are determined while taking into account the diverging and/or merging behaviour of vehicles.

On a normal node, the actual departure time from the upstream link is determined as the maximum of the possible departure time from the upstream link and the possible arrival time at the downstream link (see Figure A.13). Specifically, when the $n$-th vehicle on the upstream link $l$ enters the downstream link $l'$ as the $n'$-th vehicle, the departure time of the vehicle from link $l$ (i.e. the arrival time at link $l'$) is determined as follows:

$$t^d_{l,n} = t^a_{l',n'} = \max\left\{ t^{PD}_{l,n},\; t^{PA}_{l',n'} \right\}. \tag{A.4}$$

On a diverge node, we assume that the movement of a follower vehicle is restricted by the leader vehicle on the link that the follower vehicle is going to enter. Thus, we determine the actual departure/arrival times of a vehicle from Eq. (A.4) by regarding the link included in the vehicle's route as $l'$.

On a merge node, we have to determine which vehicle actually enters the downstream link when more than one vehicle can enter it simultaneously. We assume here that the vehicle on the upstream link with the highest 'priority' departs from its link and enters the downstream link. The priority of each upstream link of a node at time $t$ is determined according to the traffic flow passing through the node before time $t$.
Specifically, the priority is determined such that vehicles enter from the upstream links in a ratio equal to the ratio of their capacities; e.g. when there exist two upstream links $l_a$ and $l_b$ whose capacity ratio is … and one vehicle has already departed from …
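The node rules above can be sketched in a few lines of Python. The first function is a direct transcription of Eq. (A.4) for a normal node (and for a diverge node, once $l'$ is taken from the vehicle's route). The merge rule is only partially stated here, so the second function is a hypothetical reading of "enter in a ratio equal to the ratio of their capacities": among upstream links with a waiting vehicle, it gives priority to the link with the smallest number of vehicles already served per unit capacity. Function names, link names and capacities are assumptions for illustration.

```python
from typing import Dict, List


def normal_node_time(t_pd_upstream: float, t_pa_downstream: float) -> float:
    """Eq. (A.4): t^d_{l,n} = t^a_{l',n'} = max{t^PD_{l,n}, t^PA_{l',n'}}."""
    return max(t_pd_upstream, t_pa_downstream)


def merge_priority(capacities: Dict[str, float], served: Dict[str, int],
                   candidates: List[str]) -> str:
    """Hypothetical capacity-proportional priority at a merge node.

    Among upstream links with a vehicle ready to enter, choose the link whose
    number of vehicles already served, divided by its capacity, is smallest;
    ties are broken by link name. Cumulative entries then track the ratio of
    capacities.
    """
    return min(candidates, key=lambda link: (served[link] / capacities[link], link))


if __name__ == "__main__":
    print(normal_node_time(5.0, 6.5))  # 6.5: the downstream link is binding

    capacities = {"l_a": 2.0, "l_b": 1.0}  # hypothetical 2:1 capacity ratio
    served = {"l_a": 0, "l_b": 0}
    order = []
    for _ in range(6):  # both links always have a vehicle waiting
        chosen = merge_priority(capacities, served, ["l_a", "l_b"])
        served[chosen] += 1
        order.append(chosen)
    print(order)   # ['l_a', 'l_b', 'l_a', 'l_a', 'l_b', 'l_a']
    print(served)  # {'l_a': 4, 'l_b': 2}
```

With a 2:1 capacity ratio, six entries split 4:2, which is the behaviour the ratio rule describes; the paper's exact tie-breaking may differ.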
This proposition is derived by employing Theorem 6 in Tatarenko (2014). Before stating the theorem, we introduce the following notion used in it:
Definition 4. (Scrambling matrix).
A matrix $P$ is called a scrambling matrix if, for any two rows, say $\alpha$ and $\beta$, there exists at least one column $\gamma$ such that both $p_{\alpha\gamma} > 0$ and $p_{\beta\gamma} > 0$.

Using this notion, Tatarenko (2014) established the following theorem.

Theorem 6. (Theorem 6 of Tatarenko (2014)).
Consider a potential game, and let $B$ be the pattern matrix of the logit response dynamics in the game, i.e. the transition probability matrix of the logit response dynamics between states when the parameter $\beta = 0$. Let also $n < \infty$ be the minimal integer such that the $n$-th power of the matrix $B$ (i.e. $B^n$) is scrambling. Then, the logit response dynamics with $\epsilon(\tau) = \ln(\tau + 1)/n$ applied to the game guarantees probabilistic convergence of joint actions to the maximisers of the potential function $\Pi$, namely,

$$\lim_{\tau \to \infty} \Pr\left\{ \mathbf{r}^\tau \in \left\{ \mathbf{r}^* \,\middle|\, \Pi(\mathbf{r}^*) = \max_{\mathbf{r}} \Pi(\mathbf{r}) \right\} \right\} = 1. \tag{B.1}$$

Thus, it suffices to prove that the integer $n$ is $\lceil |\mathcal{P}|/2 \rceil$ in a DSO game with $|\mathcal{P}|$ users. To prove this, we utilise the following proposition.

Proposition 6.
Consider the pattern matrix $B$ of the logit response dynamics in a DSO game. Consider also two rows $\alpha$ and $\beta$, and the corresponding route profiles $\mathbf{r}^\alpha$ and $\mathbf{r}^\beta$. Let $\mathcal{P}_{\alpha\beta}$ be the set of users taking different routes in these route profiles, that is,

$$r^\alpha_i \neq r^\beta_i, \quad i \in \mathcal{P}_{\alpha\beta}; \qquad r^\alpha_i = r^\beta_i = r_i, \quad i \in \mathcal{P} \setminus \mathcal{P}_{\alpha\beta}. \tag{B.2}$$

Then, $n = \lceil |\mathcal{P}_{\alpha\beta}|/2 \rceil$ is the minimal integer such that, for the two rows $\alpha$ and $\beta$ of $B^n$, there exists at least one column $\gamma$ satisfying

$$b^{(n)}_{\alpha\gamma} > 0 \;\text{ and }\; b^{(n)}_{\beta\gamma} > 0. \tag{B.3}$$

(Footnote to the merge-node rule in Appendix A.2: this rule can be regarded as an atomic version of Daganzo's (fluid) merge model (Daganzo, 1995).)

Proof. First, each element $(a, b)$ of the matrix $B^n$ can be characterised by the corresponding route profiles $\mathbf{r}^a$ and $\mathbf{r}^b$ as follows:

$$b^{(n)}_{ab} \begin{cases} > 0 & \text{if } \mathbf{r}^b \text{ is reachable from } \mathbf{r}^a \text{ by route changes of at most } n \text{ users}, \\ = 0 & \text{otherwise}. \end{cases} \tag{B.4}$$

This implies that if a route profile $\mathbf{r}^\gamma$ is reachable from $\mathbf{r}^\alpha$ by route changes of $k$ $(< |\mathcal{P}_{\alpha\beta}|)$ users, route changes of at least $|\mathcal{P}_{\alpha\beta}| - k$ users are required to reach $\mathbf{r}^\gamma$ from $\mathbf{r}^\beta$. Specifically, if $\mathbf{r}^\gamma$ is reached from $\mathbf{r}^\alpha$ by changing the routes of $k$ users in $\mathcal{P}_{\alpha\beta}$ to their routes in $\mathbf{r}^\beta$, then $\mathbf{r}^\gamma$ can be reached from $\mathbf{r}^\beta$ by changing the routes of the other $|\mathcal{P}_{\alpha\beta}| - k$ users to their routes in $\mathbf{r}^\alpha$ (see Figure B.14). It follows that, by iterating the logit response dynamics $\max\{k, |\mathcal{P}_{\alpha\beta}| - k\}$ times, there exists at least one route profile $\mathbf{r}^\gamma$ that is reachable from both $\mathbf{r}^\alpha$ and $\mathbf{r}^\beta$, that is,

$$b^{(\max\{k, |\mathcal{P}_{\alpha\beta}| - k\})}_{\mathbf{r}^\alpha \mathbf{r}^\gamma} > 0 \;\text{ and }\; b^{(\max\{k, |\mathcal{P}_{\alpha\beta}| - k\})}_{\mathbf{r}^\beta \mathbf{r}^\gamma} > 0. \tag{B.5}$$

For the given $\mathcal{P}_{\alpha\beta}$, the max function is minimised by $k = \lceil |\mathcal{P}_{\alpha\beta}|/2 \rceil$. Thus, the proposition is proved. $\square$

From this proposition, in a DSO game with $|\mathcal{P}|$ users, it follows that if $n \geq \lceil |\mathcal{P}|/2 \rceil$, then $B^n$ is a scrambling matrix, since the maximum number of users who take different routes is $|\mathcal{P}|$.
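The scrambling bound can be checked by brute force on small instances. The following Python sketch builds only the support (positive entries) of the pattern matrix $B$ for a hypothetical game in which every user has a given number of routes. It assumes a revision protocol in which, at $\beta = 0$, one uniformly chosen user picks one of its routes uniformly at random (staying on the current route is allowed), so that $B_{ab} > 0$ exactly when profiles $a$ and $b$ differ in at most one user; this is the property behind (B.4). The sketch then finds the smallest $n$ for which $B^n$ is scrambling and compares it with $\lceil |\mathcal{P}|/2 \rceil$. All function names are illustrative.

```python
import itertools
import math
from typing import List, Sequence

Support = List[List[bool]]


def pattern_support(route_counts: Sequence[int]) -> Support:
    """Positive entries of the pattern matrix B (logit response dynamics, beta = 0).

    Assumed protocol: one uniformly chosen user picks one of its routes uniformly
    (possibly its current one), so B[a][b] > 0 iff profiles a, b differ in <= 1 user.
    """
    profiles = list(itertools.product(*(range(k) for k in route_counts)))
    return [[sum(x != y for x, y in zip(a, b)) <= 1 for b in profiles]
            for a in profiles]


def bool_matmul(A: Support, B: Support) -> Support:
    """Support of the product of two nonnegative matrices, given their supports."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_scrambling(S: Support) -> bool:
    """Definition 4: every pair of rows has a common column with positive entries."""
    m = len(S)
    return all(any(S[a][c] and S[b][c] for c in range(m))
               for a in range(m) for b in range(a + 1, m))


def minimal_scrambling_power(route_counts: Sequence[int], max_power: int = 20):
    """Smallest n such that B^n is scrambling, or None if none up to max_power."""
    B = pattern_support(route_counts)
    power = B  # support of B^1
    for n in range(1, max_power + 1):
        if is_scrambling(power):
            return n
        power = bool_matmul(power, B)  # support of B^(n+1)
    return None


if __name__ == "__main__":
    # hypothetical games in which every user has two routes
    for n_users in range(1, 6):
        print(n_users, minimal_scrambling_power([2] * n_users), math.ceil(n_users / 2))
    # prints: 1 1 1 / 2 1 1 / 3 2 2 / 4 2 2 / 5 3 3 (one triple per line)
```

Working with supports rather than probabilities is enough here, because the scrambling property only asks whether entries are positive, and it avoids numerical underflow in larger powers.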
From the fact that $B^n$ is scrambling for $n \geq \lceil |\mathcal{P}|/2 \rceil$ and Theorem 6 of Tatarenko (2014), the proposition is derived. $\square$

Appendix C. Proof of Corollary 2
We prove the corollary by contradiction. Suppose that there exists another route profile $\mathbf{r}^{**} \neq \mathbf{r}^*$ that is a Nash equilibrium. Among the users taking different routes in these two route profiles, consider the user $i$ who departs earliest. From Proposition 3 and the fact that the users departing earlier than user $i$ take the same routes, the following relationship holds:

$$U^F_i(r, \mathbf{r}^*_{-i}) = U^F_i(r, \mathbf{r}^{**}_{-i}), \quad \forall r \in \mathcal{R}_i. \tag{C.1}$$

However, combining this relationship with Eq. (5.3), we obtain

$$U^F_i(r^{**}_i, \mathbf{r}^{**}_{-i}) = U^F_i(r^{**}_i, \mathbf{r}^*_{-i}) < U^F_i(r^*_i, \mathbf{r}^*_{-i}) = U^F_i(r^*_i, \mathbf{r}^{**}_{-i}). \tag{C.2}$$

That is, user $i$ could strictly improve their utility in $\mathbf{r}^{**}$ by switching to $r^*_i$, which contradicts the assumption that $\mathbf{r}^{**}$ is a Nash equilibrium. Hence, multiple equilibrium states cannot exist, and the corollary is proved. $\square$

Appendix D. Proof of Theorem 5
We prove the existence of a better response path ending at a Nash equilibrium state from every route profile in a constructive manner: we construct an algorithm that is guaranteed to move any initial traffic state with route profile $\mathbf{r}$ into an equilibrium state $\mathbf{r}^*$ by a sequence of better responses. In the algorithm, the users are divided into two groups: (i) Group A contains the users who are currently on their ex-post best response routes, and (ii) Group B contains the users who are not. The sets of users in Group A and Group B are denoted by $\mathcal{P}_A$ and $\mathcal{P}_B$, respectively. We also denote by $\mathbf{r}_A$ and $\mathbf{r}_B$ the route profiles of users in Group A and Group B, respectively (i.e. $\mathbf{r} = (\mathbf{r}_A, \mathbf{r}_B)$). We then propose the following algorithm.

Algorithm 3: Leading a traffic state to a Nash equilibrium state by better responses

Initialisation: Set $m = 0$, where $m$ is the iteration counter. Initialise all users to be in Group B, i.e. $(\mathcal{P}^m_A, \mathcal{P}^m_B) = (\emptyset, \mathcal{P})$ and $\mathbf{r}^m = (\mathbf{r}^m_A, \mathbf{r}^m_B) = (\emptyset, \mathbf{r})$.

1. Pick the user departing from the origin earliest: Pick the vehicle in Group B that departs from the origin earliest, and denote it by $i^m \in \mathcal{P}^m_B$. Such a vehicle can be chosen uniquely in each iteration because we assume that all users with the same origin depart at different times.

2. Update the route profile: Find a best response route of user $i^m$ to the current route profile $\mathbf{r}^m$, i.e. a route $r^* \in \mathcal{R}_{i^m}$ satisfying $U^F_{i^m}(r, \mathbf{r}^m_{-i^m}) \leq U^F_{i^m}(r^*, \mathbf{r}^m_{-i^m})$ for all $r \in \mathcal{R}_{i^m}$. Compare the utility when user $i^m$ takes the current route $r^m_{i^m}$ with the utility when taking route $r^*$. Then:
   (a) If the former is lower than the latter (i.e. $U^F_{i^m}(r^m_{i^m}, \mathbf{r}^m_{-i^m}) < U^F_{i^m}(r^*, \mathbf{r}^m_{-i^m})$), change the route of user $i^m$ to $r^*$, i.e. $r^{m+1}_{i^m} := r^*$; the routes of the other users remain the same: $r^{m+1}_j := r^m_j$ for all $j \in \mathcal{P} \setminus \{i^m\}$.
   (b) If both are equal (i.e. $U^F_{i^m}(r^m_{i^m}, \mathbf{r}^m_{-i^m}) = U^F_{i^m}(r^*, \mathbf{r}^m_{-i^m})$), no user changes route in the current iteration, i.e. $r^{m+1}_j := r^m_j$ for all $j \in \mathcal{P}$.

3. Update the sets of users and check convergence: Let $\mathcal{P}^{m+1}_A := \mathcal{P}^m_A \cup \{i^m\}$ and $\mathcal{P}^{m+1}_B := \mathcal{P}^m_B \setminus \{i^m\}$. If $\mathcal{P}^{m+1}_B \neq \emptyset$, let $m := m + 1$ and go back to Step 1. If $\mathcal{P}^{m+1}_B = \emptyset$, terminate the algorithm; $\mathbf{r}^{m+1} = \mathbf{r}^{m+1}_A$ is a Nash equilibrium state.

This algorithm derives an equilibrium state by assigning users to their best response routes in the order of their departure times (see Satsukawa et al. (2019) for details of the role of each step). Proposition 3 guarantees that the utility of a user in Group A is not changed by the route change of any user in Group B. Therefore, once a user takes a best response route in Step 2 and is transferred to Group A in Step 3, that route remains a best response route for the user regardless of the route choices of the users in Group B; that is, it is an ex-post best response route. It follows that this algorithm finds a Nash equilibrium state in exactly $|\mathcal{P}|$ iterations. Therefore, there exists a better response path ending at a Nash equilibrium state from an arbitrary initial route profile in a DUE-FCP game in an SBPR-1 network. $\square$
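The constructive argument can be illustrated with a short Python sketch of Algorithm 3. It processes users in departure order, moves each one to a best response only when that is a strict improvement (Step 2(a)), and otherwise leaves the profile unchanged (Step 2(b)). The game it runs on is hypothetical and is not the DUE-FCP utility: two routes `R1`/`R2` with assumed free-flow times and a fixed delay per earlier user on the same route, chosen so that, as in Proposition 3, a user's utility depends only on the routes of users who departed earlier.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

User = int
Route = Hashable
Profile = Dict[User, Route]


def sequential_best_response(
    users: Sequence[User],
    routes: Dict[User, Sequence[Route]],
    utility: Callable[[User, Route, Profile], float],
    profile: Profile,
) -> Tuple[Profile, List[Tuple[User, Route, Route]]]:
    """Sketch of Algorithm 3; `users` must be sorted by departure time."""
    profile = dict(profile)
    group_b = list(users)  # Initialisation: every user starts in Group B
    moves = []
    while group_b:  # runs exactly |P| times
        i = group_b.pop(0)  # Step 1: earliest-departing user still in Group B
        best = max(routes[i], key=lambda r: utility(i, r, profile))  # Step 2
        if utility(i, profile[i], profile) < utility(i, best, profile):
            moves.append((i, profile[i], best))  # Step 2(a): strict better response
            profile[i] = best
        # Step 2(b): otherwise nobody changes route in this iteration.
        # Step 3: user i now belongs to Group A (it has left group_b).
    return profile, moves


# Hypothetical toy game mimicking Proposition 3: a user's utility depends only
# on the routes of users who departed earlier (a smaller id = earlier departure).
USERS = [1, 2, 3, 4]
FREE_FLOW = {"R1": 10.0, "R2": 12.0}  # hypothetical free-flow times
HEADWAY = 1.5                         # hypothetical delay per earlier user on the same route


def toy_utility(i: User, r: Route, profile: Profile) -> float:
    earlier_same = sum(1 for j in USERS if j < i and profile[j] == r)
    return -(FREE_FLOW[r] + HEADWAY * earlier_same)


if __name__ == "__main__":
    routes = {i: ["R1", "R2"] for i in USERS}
    final, moves = sequential_best_response(USERS, routes, toy_utility,
                                            {i: "R2" for i in USERS})
    print(final)  # {1: 'R1', 2: 'R1', 3: 'R2', 4: 'R1'}
    print(moves)  # [(1, 'R2', 'R1'), (2, 'R2', 'R1'), (4, 'R2', 'R1')]
```

Every recorded move is a strict better response, and in this toy example the final profile is a Nash equilibrium; rerunning the sketch from that profile produces no moves.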
References
Akamatsu, T., Wada, K., Hayashi, S., 2015. The corridor problem with discrete multiple bottlenecks. Transportation Research Part B: Methodological 81, 808–829.
Blume, L.E., 1993. The statistical mechanics of strategic interaction. Games and Economic Behavior 5, 387–424.
Carey, M., 1992. Nonconvexity of the dynamic traffic assignment problem. Transportation Research Part B 26, 127–133.
Carey, M., Ge, Y.E., McCartney, M., 2003. A whole-link travel-time model with desirable properties. Transportation Science 37, 83–96.
Carey, M., Srinivasan, A., 1993. Externalities, average and marginal costs, and tolls on congested networks with time-varying flows. Operations Research 41, 217–231.
Carey, M., Watling, D., 2012. Dynamic traffic assignment approximating the kinematic wave model: System optimum, marginal costs, externalities and tolls. Transportation Research Part B: Methodological 46, 634–648.
Daganzo, C.F., 1994. The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B: Methodological 28, 269–287.
Daganzo, C.F., 1995. The cell transmission model, part II: Network traffic. Transportation Research Part B: Methodological 29, 79–93.
Garcia, A., Reaume, D., Smith, R.L., 2000. Fictitious play for finding system optimal routings in dynamic traffic networks. Transportation Research Part B: Methodological 34, 147–156.
Ghali, M., Smith, M., 1995. A model for the dynamic system optimum traffic assignment problem. Transportation Research Part B: Methodological 29, 155–170.
Kuwahara, M., Akamatsu, T., 1993. Dynamic equilibrium assignment with queues for a one-to-many OD pattern, in: Daganzo, C.F. (Ed.), Proceedings of the 12th International Symposium on the Theory of Traffic Flow and Transportation, Elsevier, Berkeley. pp. 185–204.
Kuwahara, M., Yoshii, T., Kumagai, K., 2001. An analysis on dynamic system optimal assignment and ramp control on a simple network. Journal of Japan Society of Civil Engineers 2001, 59–71 [In Japanese].
Marden, J.R., Shamma, J.S., 2012. Revisiting log-linear learning: Asynchrony, completeness and payoff-based implementation. Games and Economic Behavior 75, 788–808.
Merchant, D.K., Nemhauser, G.L., 1978a. A model and an algorithm for the dynamic traffic assignment problems. Transportation Science 12, 183–199.
Merchant, D.K., Nemhauser, G.L., 1978b. Optimality conditions for a dynamic traffic assignment model. Transportation Science 12, 200–207.
Meyn, S., Tweedie, R.L., 2009. Markov Chains and Stochastic Stability. 2nd ed., Cambridge University Press, USA.
Monderer, D., Shapley, L.S., 1996. Potential games. Games and Economic Behavior 14, 124–143.
Muñoz, J.C., Laval, J.A., 2006. System optimum dynamic traffic assignment graphical solution method for a congested freeway and one destination. Transportation Research Part B: Methodological 40, 1–15.
Newell, G.F., 2002. A simplified car-following theory: A lower order model. Transportation Research Part B: Methodological 36, 195–205.
Nie, Y.M., 2011. A cell-based Merchant-Nemhauser model for the system optimum dynamic traffic assignment problem. Transportation Research Part B: Methodological 45, 329–342.
Qian, Z.S., Shen, W., Zhang, H.M., 2012. System-optimal dynamic traffic assignment with and without queue spillback: Its path-based formulation and solution via approximate path marginal cost. Transportation Research Part B: Methodological 46, 874–893.
Sandholm, W.H., 2002. Evolutionary implementation and congestion pricing. Review of Economic Studies 69, 667–689.
Sandholm, W.H., 2005. Negative externalities and evolutionary implementation. Review of Economic Studies 72, 885–915.
Sandholm, W.H., 2007. Pigouvian pricing and stochastic evolutionary implementation. Journal of Economic Theory 132, 367–382.
Satsukawa, K., Wada, K., Iryo, T., 2019. Stochastic stability of dynamic user equilibrium in unidirectional networks: Weakly acyclic game approach. Transportation Research Part B: Methodological 125, 229–247.
Shen, W., Nie, Y., Zhang, H.M., 2007. On path marginal cost analysis and its relation to dynamic system-optimal traffic assignment, in: Allsop, R.E., Bell, M.G.H., Heydecker, B. (Eds.), Proceedings of the 17th International Symposium on Transportation and Traffic Theory, Elsevier, London, England. pp. 327–360.
Shen, W., Zhang, H.M., 2009. On the morning commute problem in a corridor network with multiple bottlenecks: Its system-optimal traffic flow patterns and the realizing tolling scheme. Transportation Research Part B: Methodological 43, 267–284.
Shen, W., Zhang, H.M., 2014. System optimal dynamic traffic assignment: Properties and solution procedures in the case of a many-to-one network. Transportation Research Part B: Methodological 65, 1–17.
Tatarenko, T., 2014. Proving convergence of log-linear learning in potential games. Proceedings of the American Control Conference, 972–977.
Wada, K., Satsukawa, K., Smith, M.J., Akamatsu, T., 2019. Network throughput under dynamic user equilibrium: Queue spillback, paradox and traffic control. Transportation Research Part B: Methodological 126, 391–413.
Young, H.P., 1993. The evolution of conventions. Econometrica 61, 57–84.
Young, H.P., 2004. Strategic Learning and Its Limits. Oxford University Press, USA.
Zhang, H.M., Shen, W., 2010. Access control policies without inside queues: Their properties and public policy implications. Transportation Research Part B: Methodological 44, 1132–1147.
Zhang, P., Qian, S., 2020. Path-based system optimal dynamic traffic assignment: A subgradient approach. Transportation Research Part B: Methodological 134, 41–63.
Zhao, C.L.L., Leclercq, L., 2018. Graphical solution for system optimum dynamic traffic assignment with day-based incentive routing strategies. Transportation Research Part B: Methodological 117, 87–100.
Zhu, F., Ukkusuri, S.V., 2013. A cell based dynamic system optimum model with non-holding back flows. Transportation Research Part C: Emerging Technologies 36, 367–380.
Ziliaskopoulos, A.K., 2000. A linear programming model for the single destination system optimum dynamic traffic assignment problem. Transportation Science 34, 36–49.