Discrete Mean Field Games: Existence of Equilibria and Convergence
arXiv preprint [math.OC]
Josu Doncel (a), Nicolas Gast (b,c), Bruno Gaujal (b,c) — (a) University of the Basque Country UPV/EHU, Spain; (b) Univ. Grenoble Alpes, F-38000 Grenoble, France; (c) Inria
Abstract
We consider mean field games with discrete state spaces (called discrete mean field games in the following) and we analyze these games in continuous and discrete time, over finite as well as infinite time horizons. We prove the existence of a mean field equilibrium assuming continuity of the cost and of the drift. These conditions are more general than those of the existing papers studying finite state space mean field games. Besides, we also study the convergence of the equilibria of N-player games to mean field equilibria in our four settings. On the one hand, we define a class of strategies in which any sequence of equilibria of the finite games converges weakly to a mean field equilibrium when the number of players goes to infinity. On the other hand, we exhibit equilibria outside this class that do not converge to mean field equilibria and for which the value of the game does not converge. In discrete time, this non-convergence phenomenon implies that the Folk theorem does not scale to the mean field limit.
1. Introduction
Mean field games were introduced by Lasry and Lions [34] as well as Huang, Caines and Malhamé [30] to model interactions between a large number of strategic agents (players) and have had a large success ever since. Since the seminal work in [32, 33, 34, 30], a large variety of papers have investigated mean field games. Most of the literature concerns continuous state spaces and describes a mean field game as the coupling of a Hamilton-Jacobi-Bellman equation with a Fokker-Planck equation (see for example [28, 7, 9, 24, 10, 25, 22, 23, 3]). Here, we are interested in studying mean field games with a finite number of states and a finite number of actions per player. In this case, the analog of the Hamilton-Jacobi-Bellman equation is the
Bellman equation and the discrete version of the Fokker-Planck equation is the
Kolmogorov equation.

Preprint submitted to Elsevier.

Finite state space mean field games in discrete time (a.k.a. with synchronous players) were previously studied in [20]. In their work, the strategy of the players is the probability matrix of the Kolmogorov equation. This implies that each player can choose her dynamics independently of the state of the others: the behavior of players is only coupled via their costs. In that case, the Kolmogorov equation becomes linear.

Finite state space mean field games in continuous time (a.k.a. with asynchronous players) have also been previously analyzed in [21, 27, 5, 13]. In their model, the players also control completely the transition rate matrix, so that the dynamics are again linear once the actions of the players are given. Again, players do not interact with each other directly in these models, but only through their costs.

The models we study here, both in the synchronous and asynchronous cases, cover non-linear dynamics: we consider that the players do not have the power to choose the rate matrix and that their actions only have a limited effect on their state. Here, the transition rate matrix may depend not only on the actions taken by the player, but also on the population distribution of the system. This introduces an explicit interaction between the players (and not just through their costs). This non-linear dynamics is called the relaxed case in [14]. We claim that the model with explicit interactions covers several natural phenomena such as information/infection propagation or resource congestion, where not only the cost but also the state dynamics of a player depend on the state of all the others. This type of behavior is classical in systems with a large number of interacting objects [6] and cannot be handled using previous mean field game models. For instance, in the classical SIR (Susceptible, Infected, Recovered) infection model [39], the rate of infection of one individual depends on the proportion of individuals already infected.
Similarly, in a model of congestion, one player cannot typically use a resource if it is already used to full capacity.

We show that the only requirement needed to guarantee the existence of a mean field equilibrium in mixed strategies is that the cost is continuous with respect to the population distribution (convexity is not needed). This result nicely mimics the conditions for existence of a Nash equilibrium in the simpler case of static population games (see [36]). The existence of a mean field equilibrium in mixed strategies has been previously shown by [31, 12] in the diffusion case. In [27], the existence of a mean field equilibrium is proven under the assumption that the cost of a player is strictly convex w.r.t. her strategy, and in [21] the authors also consider uniformly convex functions. These conditions are rather strong because they are not satisfied in the important case of linear and/or expected costs. In [14], existence of a Nash equilibrium is also proved under mere continuity assumptions and with a compact action space (more general than the simplex used here). However, the main difference between the two approaches is the type of mean field limit that is used. In [14], the trajectories of the states of the players are considered, while we only consider the state at time t. The first approach uses arguments in line with the propagation of chaos while the second one is closer to the work in [4, 38]. While the convergence of trajectories is a more refined notion than point-wise convergence in general, this refinement is not needed here. Indeed, for mean field games, costs are associated to states and actions and not to trajectories. Therefore, the point-wise mean field approach is sufficient. Another difference with [14] is that there an additional assumption about the uniqueness of the argmin is needed in some parts of the convergence proof as well as for existence (in the feedback case).
This is not the case here, so the two papers do not cover the exact same set of games.

As in most existence proofs, our proof is based on a version of the Kakutani fixed point theorem in infinite dimension (see for example [13], where such an extended version of the fixed point theorem is used in a mean field game model with minor and major players). Here, however, we do not consider the best response operator but the evolution of the population distribution instead, as in [14]. Out of the four cases (asynchronous/synchronous, finite/infinite horizons), we mainly detail the asynchronous player case, for which we prove the existence of a mean field equilibrium in an infinite horizon with discounted costs. We also show, more briefly, how these results can be extended to a finite horizon, as well as to a finite or infinite time horizon in the synchronous-player case.

Our second contribution concerns the convergence of finite games to mean field limits. Different authors have studied the convergence of N-player game equilibria to mean field equilibria, e.g. [29, 1, 37, 38]. The type of strategies considered in these papers is different from ours: they consider that the strategy of a player only depends on her internal state (these are called stationary policies in [38]), whereas here we allow time dependence in these policies. The model in [38] does include state dynamics that depend on the population distribution but only considers stationary strategies that do not depend on time, hence cannot depend on the population dynamics.

In all four combinations (finite / infinite horizon, synchronous / asynchronous), a mean field equilibrium is always an ε-approximation of an equilibrium of a corresponding game with a finite number N of players, where ε goes to 0 when N goes to infinity. This is the discrete counterpart of similar results in continuous games [11]. However, we also show that not all equilibria of the finite version converge to a Nash equilibrium of the mean field limit of the game.
We provide several counter-examples to illustrate this fact. They are all based on the following idea: the "tit for tat" principle allows one to define many equilibria in repeated games with N players. However, when the number of players is infinite, the deviation of a single player is not visible to the population, which therefore cannot punish her in retaliation for her deviation. This implies that while the games with N players may have many equilibria, as stated by the Folk theorem, this may not be the case for the limit game. This fact is well-known for large repeated games (see the examples of anti-Folk theorems in [35, 2]). However, to our knowledge, these results have not yet been investigated in the mean field game framework.

Finally, our four models of dynamic games do not face the issue of the order of play, nor partial information. Thus, we avoid two difficulties of dynamic games: the information structure of each player and the existence of a value [15]. In our case, all players are similar, so the order of play is irrelevant, and we only consider the full information case: players know the strategy of the other players and the current global state (more details on this are given in Section 3.2).

The rest of the article is organized as follows. We introduce mean field games with explicit interactions in continuous time in Section 2, where we mainly focus on the infinite horizon with discounted costs. We describe the evolution of the state of the players, the cost function, as well as the best response operator. In both cases (finite and infinite horizon), we prove the existence of an equilibrium. We show in Section 3 that this equilibrium is an approximation of an equilibrium for the game with a finite number of players. Finally, we study an example of an N-player game inspired by the prisoner's dilemma whose equilibria are not always equilibria for the limit mean field game. We focus on the synchronous case in Section 5 (where players all play at the same time).
In this case, N-player games can be seen as classical stochastic games in discrete time. We derive the mean field limit dynamics and the existence of an equilibrium. Here, counter-examples of equilibria for finite games that do not converge to equilibria of the limit game are easier to find. Indeed, the Folk theorem applies, and all equilibria based on retaliation cannot be equilibria at the limit.
2. Discrete Mean Field Games in Continuous Time
A discrete mean field game G is a tuple G = (E, A, {Q^a}, m_0, {c^a}, β), where E is the state space, A the action set, {Q^a} the transition rate matrices, m_0 the initial state, {c^a} the cost functions and β ∈ R a discount factor. The game is described as follows.

State and action sets.
We consider a population made of an infinite number of homogeneous players that evolve in continuous time. Each player has a finite state space denoted by E = {1, . . . , E} and a finite action set A = {1, . . . , A}. We denote by P(A) (resp. P(E)) the set of probability measures over A (resp. E). Since A is finite, P(A) is the simplex of dimension A. (An extended abstract discussing our counterexample in the continuous time model with infinite horizon was presented in [16].)

Set of strategies. A mixed strategy (or strategy for short) is a measurable function π : E × R_+ → P(A) that associates to each state i ∈ E and each time t ≥ 0 a probability measure π_i(t) ∈ P(A) on the set of possible actions. We also denote by π_{i,a}(t) the probability that, at time t, a player in state i takes the action a under strategy π. For all t ≥ 0 and i ∈ E, we have Σ_{a∈A} π_{i,a}(t) = 1. The set of all possible strategies is denoted by S.

We say that a strategy is pure if, for all states i and all t ∈ R_+, there exists an action a ∈ A such that π_{i,a}(t) = 1 and π_{i,a'}(t) = 0 for all a' ≠ a.

The set S is a bounded subset of the Hilbert space of the functions E × R_+ → R^A equipped with the exponentially weighted inner product: ⟨f, g⟩ = ∫_0^∞ f(t) g(t) e^{−βt} dt. This shows that S is weakly compact, where the weak topology is defined as follows: a sequence of policies π^n converges to a policy π if for any bounded function g:

lim_{n→∞} ∫_0^∞ π^n(t) g(t) e^{−βt} dt = ∫_0^∞ π(t) g(t) e^{−βt} dt.   (1)

Rate matrices.
We denote by m^π(t) ∈ P(E) the population distribution at time t. As the state space is finite, m^π(t) is a vector whose i-th component, m^π_i(t), is the proportion of players in state i at time t. The evolution over time of the population distribution is driven by the rate matrices {Q^a(m^π(t))}_{a∈A}. By definition, Q^a_{ij}(m^π(t)) is the rate at which a player in state i moves to state j when choosing action a, when the population distribution is m^π(t). Note that by definition, Σ_{j∈E} Q^a_{ij}(m^π(t)) = 0 for all i and a, and Q^a_{ij}(m^π(t)) is non-negative for all j ≠ i and all a. In the following, we assume that for all i, j, a, Q^a_{ij}(m) is Lipschitz-continuous in m with constant L.

The initial condition is m^π(0) = m_0. For t ≥ 0, the population distribution m^π(t) is the solution of the following differential equation, that depends on the strategy π: for j ∈ E,

ṁ^π_j(t) = Σ_{i∈E} Σ_{a∈A} m^π_i(t) Q^a_{ij}(m^π(t)) π_{i,a}(t).   (2)

The rationale behind this differential equation is that all players in state i that use the action a ∈ A move to state j with rate Q^a_{ij}(m^π(t)).

If the strategy π_i(t) is not continuous in time, the differential equation (2) may not be well-posed at time-points where π_i is not continuous. The existence of a continuous solution for (2) is guaranteed by Carathéodory's existence theorem. The Lipschitz condition on Q further implies that this solution is essentially unique, because any solution of (2) must be a fixed point of

m^π_j(t) = m_{j,0} + ∫_0^t ( Σ_{i∈E} Σ_{a∈A} m^π_i(u) Q^a_{ij}(m^π(u)) π_{i,a}(u) ) du.   (3)

In anticipation, the same properties (existence and uniqueness of the solution of the ODE) hold for the differential equation (4).

Remark 1 (Explicit interactions). In this model, the rate matrix Q^a_{ij}(m^π(t)) depends explicitly on the population distribution: the rate to go from state i to state j under action a depends on how the whole population is distributed among the states of the system. Other mean field models, such as [20], only consider the special case where Q^a_{ij}(m^π(t)) is constant: Q^a_{ij}(m^π(t)) = Q^a_{ij}. This restricts the population dynamics given in (2) to linear dynamics.

Cost function.
We now concentrate on a particular player, whom we call Player 0. Player 0 chooses her own strategy π_0 : E × R_+ → P(A). We denote by x^{π_0,m}(t) ∈ P(E) the probability distribution of the state of Player 0 when Player 0 uses strategy π_0 against a population whose distribution is m. For a given state i ∈ E, x^{π_0,m}_i(t) denotes the probability for Player 0 to be in state i at time t. The distribution x^{π_0,m} evolves over time according to the following differential equation: for j ∈ E,

ẋ^{π_0,m}_j(t) = Σ_{i∈E} Σ_{a∈A} x^{π_0,m}_i(t) Q^a_{ij}(m(t)) π_{0,i,a}(t).   (4)

If Player 0 is in state i and takes an action a, she suffers an instantaneous cost c_{i,a}(m(t)), that depends on the population distribution at time t. We assume that the cost is always continuous in m. Given a population distribution m and the strategy π_0 of Player 0, we define the discounted cost of Player 0 as

W(π_0, m) = ∫_0^∞ ( Σ_{i∈E} Σ_{a∈A} x^{π_0,m}_i(t) c_{i,a}(m(t)) π_{0,i,a}(t) ) e^{−βt} dt,   (5)

where β > 0 is the discount factor. We also define V(π_0, π), the discounted cost of Player 0 when the population plays a strategy π: V(π_0, π) = W(π_0, m^π).

Best response.
The best response of Player 0 to π is to choose a strategy π_0 ∈ S that minimizes her discounted cost (5) when the rest of the population plays strategy π. For a given population strategy π, we denote the set of best responses of Player 0 to π by BR(π). This set is the set of strategies that minimize her discounted cost:

BR(π) := argmin_{π_0 ∈ S} V(π_0, π).   (6)

Note that the best response function is well defined or, in other words, the "argmin" is attained for some strategy in Equation (6). To prove that, we will later show in Section 2.3 that the function V is continuous for the weak topology. As S is weakly compact, this shows that the minimum in π_0 is attained.

Proposition 1.
The function V, defined from Equation (5), is continuous in π_0 and π (for the weak topology on S).

Mean field equilibrium.
We then define a mean field equilibrium as a strategy π^{MFE} such that, when the population strategy is π^{MFE}, a selfish Player 0 would also choose the same strategy π^{MFE} as her best response.
Definition 1 (Mean Field Equilibrium). A strategy π^{MFE} is called a mean field equilibrium if it is a fixed point of the best response function, i.e.,

π^{MFE} ∈ BR(π^{MFE}).   (7)

A mean field equilibrium is pure if it is a pure strategy.
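Although the analysis in this paper is entirely theoretical, the coupled objects (2)-(5) are straightforward to simulate. The sketch below uses a hypothetical two-state SIR-like instance (all rates, costs and parameter values are our own illustrative choices, not taken from the paper): it integrates the Kolmogorov equation (2) with an Euler scheme and accumulates the discounted cost (5) along the equilibrium population trajectory, i.e., taking x = m.

```python
import numpy as np

BETA, DT, T = 0.5, 1e-3, 20.0   # discount, Euler step, truncation horizon

def Q(a, m):
    """Rate matrix Q^a(m) on E = {S, I}: the infection rate S -> I grows with
    the infected proportion m[1]; the 'risky' action a = 1 doubles exposure."""
    infect = (1.0 + a) * m[1]
    recover = 0.3
    return np.array([[-infect, infect],
                     [recover, -recover]])

def c(i, a, m):
    """Instantaneous cost c_{i,a}(m): being infected costs 2, caution costs 0.1."""
    return (2.0 if i == 1 else 0.0) + (0.1 if a == 0 else 0.0)

def simulate(pi):
    """Euler scheme for the Kolmogorov equation (2); accumulates the discounted
    cost (5) along the population trajectory (i.e. with x = m).
    pi[i] is a constant-in-time mixed action, a point of P(A)."""
    m, W = np.array([0.9, 0.1]), 0.0
    for k in range(int(T / DT)):
        disc = np.exp(-BETA * k * DT)
        drift = sum(m[i] * pi[i][a] * Q(a, m)[i]
                    for i in range(2) for a in range(2))
        W += disc * DT * sum(m[i] * pi[i][a] * c(i, a, m)
                             for i in range(2) for a in range(2))
        m = m + DT * drift
    return m, W

m_inf, W = simulate({0: [1.0, 0.0], 1: [1.0, 0.0]})  # everyone plays "cautious"
```

With everyone cautious, the drift of (2) vanishes at m_S = 0.3, so the infected proportion settles near m_I = 0.7; this is the kind of non-linear, population-dependent dynamics that the explicit-interaction model covers.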
The rationale behind this definition appears when one considers that the population is formed by players that each take selfish decisions. As the population is homogeneous, each player's best response is the same as Player 0's. In other words, for a given population strategy π, all the rational players of the population choose a strategy in BR(π). As in classical games, a mean field equilibrium is a situation where no player has an incentive to deviate unilaterally from the common strategy.

We now show that, under very general assumptions, all discrete mean field games admit a mean field equilibrium. As for classical games, these equilibria are not necessarily pure. As with most proofs of existence of equilibria, our proof relies on a generalization of the Kakutani fixed point theorem to infinite dimensional spaces. However, the classical approach consisting of showing that the best response function BR(π) is a Kakutani map does not work here when the cost function is not strictly convex. Therefore, in our approach we focus on the state of the game instead of the best response function.

As mentioned before, the differential equations (2), (4) and the cost equation (5) are all well defined under our running Assumption (A1):

(A1) The rate function m ↦ Q^a_{ij}(m) is Lipschitz-continuous in m. The cost function m ↦ c_{i,a}(m) is continuous in m.

In particular, this assumption implies that the costs and the rates are all bounded by a finite value.

Theorem 1.
Any discrete mean field game G whose rates and costs satisfy Assumption (A1) admits a mean field equilibrium.

Note that the best response function π ↦ BR(π) is in general neither continuous nor hemi-continuous under (A1). In particular, the best response set BR(π) may not be a convex set. This makes the application of the classical fixed point theorems to the best response function difficult. As a result, our proof will formulate the fixed point problem in an alternative manner, by considering a fixed point in m.

For a strategy π, the function m^π satisfies the differential equation (2). As m^π(t) lives in a compact set and the functions Q are continuous, the right-hand side of this differential equation is bounded. This shows that there exists a constant L' such that for any strategy π, the function m^π is Lipschitz-continuous with constant L'. Similarly, the function x^{π_0,m} is also Lipschitz-continuous with constant L'.

Let M be the set of functions from R_+ to P(E) that are Lipschitz-continuous with constant L'. We equip this set with the exponentially weighted L^∞-norm:

‖m − m'‖ = sup_{i∈E, t≥0} |m_i(t) − m'_i(t)| e^{−βt}.

By the Arzelà-Ascoli theorem, M is a compact space.

To prove that V is continuous in π_0 and π, it suffices to show that the mapping π ↦ m^π is continuous (for the weak topology) and that the mapping (π_0, m) ↦ x^{π_0,m} is continuous. To prove the continuity of m^π, let π^n be a sequence of strategies that converges to a strategy π. As M is compact, there exist a function m and a subsequence of m^{π^n} that converges to m. Moreover, we have:

m_j(t) = m_{j,0} + lim_{n→∞} ∫_0^t ( Σ_{i∈E} Σ_{a∈A} m^{π^n}_i(u) Q^a_{ij}(m^{π^n}(u)) π^n_{i,a}(u) ) du   (8)
       = m_{j,0} + ∫_0^t ( Σ_{i∈E} Σ_{a∈A} m_i(u) Q^a_{ij}(m(u)) π_{i,a}(u) ) du,   (9)

where the convergence holds because π^n converges weakly to π and m^{π^n} converges to m uniformly on all compact sets.

Equation (9) shows that the function m is equal to the function m^π. This shows that π ↦ m^π is continuous, which implies that V is continuous in π. The proof that (π_0, m) ↦ x^{π_0,m} is continuous is very similar to the above proof and we therefore omit it.

Recall that for a given population distribution m ∈ M, the cost of a strategy π is defined as

W(π, m) = ∫_0^∞ Σ_{i,a} x_i(t) π_{i,a}(t) c_{i,a}(m(t)) e^{−βt} dt,   (10)

where x satisfies (for all j ∈ E):

ẋ_j(t) = Σ_{i,a} x_i(t) Q^a_{ij}(m(t)) π_{i,a}(t).   (11)

We now define the function Φ : M → M as the best response to a population distribution m. It is a mapping that associates to a population distribution m ∈ M the set of all state distributions that can be induced by an optimal policy:

Φ(m) = { x^{π,m} such that π ∈ argmin_{π∈S} W(π, m) }.   (12)

In the remainder of the proof, we show that for all m ∈ M, Φ(m) is well defined and non-empty (i.e., the minimum is attained), convex and compact. Moreover, we also show that the function Φ(·) is upper-semicontinuous. As M is compact [8, Prop. 11.11], this shows that Φ(·) satisfies the conditions of the fixed point theorem given in [26, Theorem 8.6] and therefore has a fixed point m*. By the definition of Φ, this implies that there exists a strategy π that is a best response to m^π, which implies that π is a mean field equilibrium.

Definition of Φ(m) – It can be shown that W is continuous (by a reasoning similar to the one for V (Proposition 1)). This shows that there exists π that attains the minimum on the right-hand side of Equation (12), which shows that Φ(m) is well defined and non-empty.
Convexity and compactness of Φ(m) – Let us consider the following optimization problem:

min_{x,z} ∫_0^∞ Σ_{i,a} z_{i,a}(t) c_{i,a}(m(t)) e^{−βt} dt   (13)

such that z satisfies:

Σ_a z_{j,a}(t) = x_j(t)   ∀ j ∈ E,
z_{j,a}(t) ≥ 0   ∀ j ∈ E, ∀ a ∈ A,   (14)
ẋ_j(t) = Σ_{i,a} z_{i,a}(t) Q^a_{ij}(m(t))   ∀ j ∈ E.

The above problem is a linear problem, which implies that its set of optimal solutions is convex and compact. Let us show that the set of optimal solutions of the optimization problem (13) is Φ(m). To show this, let us remark that the constraints (11) are equivalent to the constraints (14) by replacing the variables x_i(t)π_{i,a}(t) by z_{i,a}(t). Then, the constraint π ∈ S of (11), that corresponds to π(t) ∈ P(A), is replaced with z_{i,a}(t) ≥ 0 and Σ_a z_{i,a}(t) = x_i(t).

Upper-semicontinuity of Φ – To prove that Φ is upper-semicontinuous, let us show that the graph of m ↦ Φ(m) is closed. Let m^n ∈ M and x^n ∈ Φ(m^n) be two sequences such that lim_{n→∞} m^n = m^∞ and lim_{n→∞} x^n = x^∞. We want to show that x^∞ ∈ Φ(m^∞).

As W is continuous, for all x^n ∈ Φ(m^n), there exists a strategy π^n that minimizes W(π, m^n) and such that x^n = x^{π^n, m^n}. As the set S is weakly compact, this sequence of strategies has a subsequence that converges weakly to a strategy π*. Moreover, we have:

• As W is continuous, π* minimizes W(π, m^∞). This shows that x^{π*, m^∞} ∈ Φ(m^∞).
• The solution of (11) is continuous in π and m, which shows that x^∞ = x^{π*, m^∞}.

Combining these two facts shows that x^∞ ∈ Φ(m^∞), which implies that the graph of Φ is closed.

Remark 2.
The continuity assumption (A1) is tight in the following sense:

1. If the rate Q is not Lipschitz-continuous in m, then the evolution of the population is not well defined, in the sense that the evolution equation (2) may have several solutions or no solution at all.

2. There exist games with non-continuous cost functions that do not admit any mean field equilibrium. For example, consider the following mean field game:

G = ( E = {1, 2}, A = {a, b}, Q^a = 0, Q^b = [ −1 1 ; 0 0 ], m(0) = (1, 0),   (15)

c_a(m_1, m_2) = 0,  c_b(m_1, m_2) = { −1 if m_2 ≤ 1/2; 1 otherwise },  β ).   (16)

Assume that this game has a mean field equilibrium and denote by m(t) the state at equilibrium. By definition of Q^a and Q^b, m_2(t) is a non-decreasing function. Hence, let τ = sup{t : m_2(t) ≤ 1/2} (note that τ ∈ [ln 2, +∞) ∪ {+∞}). It should be clear that the best response of Player 0 to any state function m is the policy π(τ) that consists in playing "b" until τ and "a" after τ. However, such a policy is never a mean field equilibrium: under the policy π(τ), m_2(t) = 1 − e^{−min(t,τ)}, which means that sup{t : m_2(t) ≤ 1/2} ∈ {ln 2, +∞}. None of the policies π(ln 2) or π(∞) is an equilibrium: the policy π(ln 2) is the best response to π(∞) and vice-versa.
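The cycling argument of Remark 2 can be checked numerically. In the sketch below we take the cost of action b to be −1 when m_2 ≤ 1/2 and +1 otherwise (the exact positive value after the threshold is our reconstruction; any positive value yields the same cycle) and an illustrative discount β = 1/2. Against the population profile induced by π(τ), we search for the best switching time s of Player 0 over a small grid:

```python
import numpy as np

BETA = 0.5            # illustrative discount factor (any beta > 0 works)
T, DT = 30.0, 1e-3    # truncation horizon and step for numerical integration
t = np.arange(0.0, T, DT)

def m2(tau):
    """Population state m_2(t) = 1 - exp(-min(t, tau)) induced by pi(tau)."""
    return 1.0 - np.exp(-np.minimum(t, tau))

def cb(m):
    """Discontinuous cost of action b; the small tolerance guards the
    floating-point boundary m_2 = 1/2."""
    return np.where(m <= 0.5 + 1e-9, -1.0, 1.0)

def cost(s, tau):
    """Discounted cost of Player 0 playing b until s, then a (cost 0),
    against the population profile induced by pi(tau)."""
    return np.sum(cb(m2(tau)) * (t < s) * np.exp(-BETA * t)) * DT

def best_switch(tau, grid):
    """Best switching time s over a finite grid (np.inf = never switch)."""
    return min(grid, key=lambda s: cost(s, tau))

grid = [np.log(2.0), 1.0, 2.0, 5.0, np.inf]
br_of_ln2 = best_switch(np.log(2.0), grid)  # best response to pi(ln 2)
br_of_inf = best_switch(np.inf, grid)       # best response to pi(infinity)
```

The computation reproduces the cycle: the best response to π(ln 2) never switches (s = ∞, i.e. π(∞)), while the best response to π(∞) switches at s = ln 2, so no policy π(τ) is a fixed point of the best response map.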
3. Convergence of Finite Games to Mean Field Games
Mean field games are often presented as a limit of a sequence of finite games asthe number N of players goes to infinity. In this section, we investigate positive andnegative results that link finite games and mean field games. N Exchangeable Players
To any discrete mean field game G = (E, A, {Q^a}, m_0, {c^a}, β), one can associate a stochastic N-player game G^N = (N, E, A, {Q^a}, m_0, {c^a}, β) as follows. The finite stochastic game G^N has the same state and action spaces E, A, the same rate matrices Q^a, the same cost functions c^a, the same discount factor β, and the same initial state as G. The time evolution of the finite game is as follows. At any time t, each player (say Player n) chooses a (randomized) action A_n(t) ∈ P(A).

We consider a mean field interaction model between the players, which means that the behavior of one player only depends on the states of the other players through the proportion of players that are in a given state. To be more precise, we denote by M(t) ∈ P(E) the population distribution of the system at time t. As the set E is finite, M(t) is a vector with |E| components and for all i ∈ E, M_i(t) is the fraction of players that are in state i at time t:

M_i(t) = (1/N) Σ_{n=1}^N 1_{X_n(t)=i}.

The state of one player (say Player n) follows a continuous time Markov chain whose rates vary over time. The only dependence between players is through the rates, which depend on the population distribution. More precisely, the evolution of the state of Player n satisfies, with F_t the natural filtration of the process and for all states i ≠ j:

P(X_n(t + dt) = j | X_n(t) = i, M(t) = m, A_n(t) = a, F_t) = Q^a_{ij}(m) dt + o(dt),   (17)

where A_n(t) is the action taken by Player n at time t.

At any time t, Player n suffers an instantaneous cost that is a function of her state X_n(t), the action A_n(t) that she takes, and the population distribution M(t). We write this instantaneous cost c_{X_n(t),A_n(t)}(M(t)). The objective of Player n is to choose a strategy π_n from some set of admissible strategies Π, in order to minimize her expected discounted cost, knowing the strategies of the others.
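The mean field interaction (17) is easy to simulate directly. The following Monte Carlo sketch (again a hypothetical SIR-like instance with illustrative rates of our own choosing) advances N individual continuous-time chains by small time steps, with an infection rate that depends on the empirical distribution M(t):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, for reproducibility
N, DT, T = 1000, 1e-2, 5.0
RECOVERY = 0.5

def infection_rate(m_infected):
    """Q^a_{SI}(M): the individual transition rate depends on the empirical
    mean field, as in (17)."""
    return 2.0 * m_infected

x = np.zeros(N, dtype=int)   # player states: 0 = susceptible, 1 = infected
x[: N // 10] = 1             # 10% of the players start infected

for _ in range(int(T / DT)):
    m_I = x.mean()           # M_I(t): current fraction of infected players
    u = rng.random(N)
    # the two jump events are disjoint (they apply to different states)
    infect = (x == 0) & (u < infection_rate(m_I) * DT)
    recover = (x == 1) & (u < RECOVERY * DT)
    x[infect] = 1
    x[recover] = 0

m_final = x.mean()
```

For this instance the limiting ODE (2) reads ṁ_I = 2 m_I (1 − m_I) − 0.5 m_I, whose stable point is m_I = 0.75; the empirical fraction M_I(T) ends near this value, with fluctuations of order 1/√N, which is the concentration behind Theorem 2.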
As before, the discount factor is denoted by β. Given a strategy π_n ∈ Π used by Player n and a strategy π ∈ Π used by all the others, we denote by V^N(π_n, π) the expected discounted cost of Player n:

V^N(π_n, π) = E[ ∫_0^∞ e^{−βt} c_{X_n(t),A_n(t)}(M^π(t)) dt | A_n is chosen w.r.t. π_n, A_{n'} is chosen w.r.t. π (∀ n' ≠ n) ].

A Nash equilibrium for this game is a strategy π such that Player n does not have another admissible strategy that leads to a lower cost. This notion naturally depends on the set of admissible strategies.

Definition 2 (Equilibrium of the N-player game). For a given set of strategies Π, a strategy π ∈ Π is called a symmetric equilibrium in Π if for any strategy π_n ∈ Π:

V^N(π, π) ≤ V^N(π_n, π).

We will also use the notion of ε-equilibrium:

Definition 3 (ε-equilibrium of the N-player game). For a given set of strategies Π, a strategy π ∈ Π is called an ε-symmetric equilibrium in Π if for any strategy π_n ∈ Π:

V^N(π, π) ≤ V^N(π_n, π) + ε.

In a full information setting, A_n(t) is a (possibly random) function of the values X_{n'}(t') up to time t' ≤ t and of all actions taken in the past A_{n'}(t'), for t' < t and for n' ∈ {1 . . . N}. Such a strategy is, however, hard to analyze. Therefore, in the following, we will consider two natural subclasses for the set of admissible strategies, depending on the information available to the players:

• (Markov) – A strategy π is called a Markov strategy if it induces a choice of A_n(t) that is a (possibly random) measurable function of only t, M(t) and X_n(t):

P(A_n(t) = a | F_t) = π_{a,X_n(t)}(t, M(t)).

When all the other players use a Markov strategy, the state of Player n, together with M(t), forms a Markov process. This implies that when all the other players use a Markov strategy, the set of Markov strategies is dominant among the set of full-information strategies: there exists a full-information best response for Player n that is a Markov strategy.
Furthermore, any Markov game admits a Markovian Nash equilibrium (see [17]).

• (Local) – A strategy π is a local strategy if the choice of the action only depends on the player's internal state and on the time:

P(A_n(t) = a | F_t) = π_{a,X_n(t)}(t).

If a player uses a local strategy, her actions may depend on time, hence may track the law of the population M(t) (but not M(t) itself). Also notice that a local strategy is not necessarily stationary, because of its dependence on time.

The next theorem provides a relation between local equilibria of finite games and mean field equilibria of the limit mean field game. In particular, it shows that mean field equilibria are a good approximation of local equilibria. However, as we will show later, this result does not hold for Markovian equilibria.
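The difference between the two admissible classes is purely one of information structure, which can be made concrete with function signatures (the interfaces and names below are ours, given for illustration only):

```python
from typing import Callable, Sequence

# A point of P(A) or P(E) is represented as a list of probabilities.
Dist = Sequence[float]

# A Markov strategy may observe the time, its own state and the empirical
# population distribution M(t).
MarkovStrategy = Callable[[float, int, Dist], Dist]

# A local strategy observes only the time and its own state; through its time
# dependence it may anticipate the *law* of M(t), but it can never react to
# the realized M(t) itself.
LocalStrategy = Callable[[float, int], Dist]

def as_markov(pi: LocalStrategy) -> MarkovStrategy:
    """Every local strategy is a degenerate Markov strategy that ignores M(t)."""
    return lambda t, i, m: pi(t, i)

# A retaliating strategy, in the spirit of the counterexample later in this
# section, is Markov but not local: it reacts to whether the whole population
# cooperates (m[0] == 1).
tit_for_tat: MarkovStrategy = (
    lambda t, i, m: [1.0, 0.0] if m[0] == 1.0 else [0.0, 1.0]
)
```

This inclusion of local strategies into Markov strategies is why the convergence result of Theorem 2 for local equilibria does not contradict the Markovian counterexample that follows.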
Theorem 2.
Consider a finite stochastic game G^N, with N players, and assume that (A1) holds for its rate matrices Q^a and its cost functions c^a. Then:

(i) Let π be a mean field equilibrium of the associated mean field game G. For any ε > 0, there exists N_0 such that for all N ≥ N_0, π is a local ε-equilibrium of the N-player game.

(ii) Let (π^N)_{N∈N} be a sequence of local strategies such that π^N is an ε_N-equilibrium for the N-player game, with ε_N → 0. Then any sub-sequence of the sequence (π^N) has a sub-sequence that converges weakly to a mean field equilibrium of G.

Proof. First, V^N(π_n, π) converges to V(π_n, π) uniformly in (π_n, π). Uniform convergence follows from Theorem 3.3.2 in [38] (the theorem is stated for stationary strategies, but local strategies as defined here are equivalent to stationary strategies as defined in [38]). Thus, for any ε, there exists N_0 such that N ≥ N_0 implies |V^N(π_n, π) − V(π_n, π)| ≤ ε/2. Hence, if π is a mean field equilibrium, this implies that for any local strategy π_n:

V^N(π, π) ≤ V(π, π) + ε/2 ≤ V(π_n, π) + ε/2 ≤ V^N(π_n, π) + ε.

This proves (i).

For (ii), if π^N is a sequence of local strategies, then any sub-sequence has a sub-sequence that converges weakly to some local strategy π^∞. As V(π_n, π) is continuous in π_n and π (for the weak topology), this implies that V(π^∞, π^∞) ≤ V(π_n, π^∞) for all local strategies π_n.

We now show that Theorem 2-(ii) does not generalize to Markov strategies. The following example was first presented in [16]. The main ingredient used to construct the counterexample is the "tit-for-tat" principle. This principle can be used to construct equilibria for any N-player game but cannot be used in mean field games. This approach has been used in repeated-game papers (see for example the examples in [35], further generalized in [2]). To our knowledge, this type of behavior has not yet been described in the mean field game framework.

Let us consider a mean field version of the classical prisoner's dilemma. The state space of a player is E = {C, D} (which stand for Cooperate and Defect) and the action set is the same, A = E. At each time step, one player is chosen. If she selects an action a ∈ A, her state becomes a at the next time step. The instantaneous cost of Player n depends on her state i and on the mean field m:

c_{i,a}(m) = { m_C + 3 m_D if i = C; 2 m_D if i = D }.

At each time step, this cost function corresponds to a matching game where a player plays against a randomly assigned opponent and suffers a cost that corresponds to the following matrix:

        C      D
  C    1,1    3,0
  D    0,3    2,2

The strategy D dominates the strategy C. This implies that playing D is the unique mean field equilibrium.
Indeed, the expected cost (given by (5)) of a Player 0 that has a state vector $x$ while the mean field is $m(t)$ is
$$\int_0^\infty \big[ x_C(t)\,(m_C(t) + 3 m_D(t))\,(\pi_{CC}(t) + \pi_{CD}(t)) + 2\,x_D(t)\,m_D(t)\,(\pi_{DC}(t) + \pi_{DD}(t)) \big]\, e^{-\beta t}\, dt = \int_0^\infty \big[ x_C(t) + 2 m_D(t) \big]\, e^{-\beta t}\, dt,$$
by using the fact that $\pi_{CC}(t) + \pi_{CD}(t) = \pi_{DC}(t) + \pi_{DD}(t) = 1$ and $x_C(t) + x_D(t) = m_C(t) + m_D(t) = 1$. This cost is minimized when $x_C$ is minimal, which occurs when the strategy is to choose action $D$ regardless of the current state. This shows that the only mean field equilibrium is the one in which all players choose action $D$.

Let us now consider the game with $N$ players and the following Markov strategy:
$$\pi^N(m) = \begin{cases} C & \text{if } m_C = 1,\\ D & \text{if } m_C < 1.\end{cases}$$
We claim that, for $\beta < 1$ and $N$ large, $\pi^N$ is a Markov Nash equilibrium. Assume that all players except Player $n$ play the strategy $\pi^N$ and let us compute the best response of Player $n$. If at time 0, $m_C < 1$, then the best response of Player $n$ is to play $D$. On the other hand, if $m_C = 1$, then:

• If Player $n$ applies $\pi^N$, she suffers a cost $\int_0^\infty e^{-\beta t}\, dt = 1/\beta$.

• If Player $n$ deviates from $\pi^N$ and chooses the action $D$, all players will also deviate after that time. This implies that $m_D(t) \approx 1 - e^{-t}$ and that Player $n$ suffers a cost approximately equal to $\int_0^\infty \big( x_C(t) + 2(1 - e^{-t}) \big) e^{-\beta t}\, dt \geq 2/(\beta(\beta+1))$ when $N$ is large.

When $\beta < 1$, we have $2/(\beta(\beta+1)) > 1/\beta$, so Player $n$ has no incentive to deviate from the strategy $\pi^N$; therefore, $\pi^N$ is a Nash equilibrium. We also observe that, for this example, the value of the finite game does not converge to the one of the mean field game.

In conclusion to this section, one can argue that this counterexample should not be surprising: in mean field games, punishment is possible against a fraction of the population that deviates, but not against an individual deviation, because such a deviation is not seen in the population distribution. As a final remark, as in the case of repeated games, the continuity of strategies with respect to $m$ (which does not hold here) is critical for convergence (see [35]).
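The threshold $\beta < 1$ in the argument above can be checked numerically. The sketch below is ours (the function names and the sample values of $\beta$ are illustrative, not from the paper): it compares the conforming cost $1/\beta$ with the lower bound $2/(\beta(\beta+1))$ on the deviation cost, and verifies the closed form of the bound by quadrature.

```python
import math

def conform_cost(beta):
    # Cost of following pi^N when m_C = 1: the integral of e^(-beta t) dt
    # over [0, infinity), which equals 1/beta.
    return 1.0 / beta

def deviation_lower_bound(beta):
    # After a deviation, m_D(t) ~ 1 - e^(-t), so the deviator's cost is at
    # least the integral of 2 (1 - e^(-t)) e^(-beta t) dt = 2/(beta (beta+1)).
    return 2.0 / (beta * (beta + 1.0))

def deviation_numeric(beta, horizon=100.0, steps=200_000):
    # Riemann-sum check of the closed form above (the tail beyond 'horizon'
    # is negligible for the betas used here).
    dt = horizon / steps
    return sum(2.0 * (1.0 - math.exp(-i * dt)) * math.exp(-beta * i * dt) * dt
               for i in range(steps))

for beta in (0.2, 0.5, 0.9):   # punishment is credible when beta < 1
    assert deviation_lower_bound(beta) > conform_cost(beta)
assert deviation_lower_bound(1.5) < conform_cost(1.5)  # threat too weak for beta > 1
assert abs(deviation_numeric(0.5) - deviation_lower_bound(0.5)) < 1e-2
```

For $\beta < 1$ the bound exceeds $1/\beta$, so deviating is unprofitable, while for $\beta > 1$ the discounted punishment is too weak, matching the condition in the text.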
4. Finite Horizon Case
Let us now consider mean field games over a finite time horizon $T$. These games are similar to the games with discounted costs defined previously, but they only run for a finite duration $T$. As in the discounted case, the evolution over time of the population distribution $m^\pi$ is given by (2) and the evolution of Player 0's distribution is given by (4). Given the population strategy $\pi$ and Player 0's strategy $\pi^0$, the expected cost of Player 0 for the finite horizon case is defined as follows:
$$V(\pi^0, \pi) = \int_0^T \Big( \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(t) \Big)\, dt. \qquad (18)$$
In the literature, similar models have been studied, considering continuous time finite state space mean field games with finite horizon. The authors of [21] consider uniformly convex cost functions, and in [27] the cost functions are assumed to be strictly convex. In our model, we only assume that the costs are continuous in the population distribution; moreover, the instantaneous cost of Player 0 is linear in $\pi^0$. Therefore, the model studied in this work is not covered by these papers.

We define the notion of mean field equilibrium for the finite horizon case as in the discounted case, by replacing the cost function (5) by (18). The proof of the existence Theorem 1 then applies mutatis mutandis to show the existence of a mean field equilibrium in this case: any continuous time mean field game over a finite horizon that satisfies Assumption (A1) has a mean field equilibrium.

The construction of the counter-example to convergence with an infinite time horizon given in the previous section does not carry over to the finite horizon: $\pi^N$ is not a Nash equilibrium for the $N$-player game because, at the last time-slot, the best response of Player $n$ to any strategy is to play $D$. By induction on the number of time-slots, the only Nash equilibrium of the $N$-player game is the one in which all players play $D$, which coincides with the mean field equilibrium. Yet, a counter-example also exists for the finite-time horizon.
The essential idea is to start from a matrix game with two pure Nash equilibria instead of one, as in the previous example. Let us consider the following cost matrix:

        C      D      P
  C    1,1    3,0    4,0
  D    0,3    2,2    4,3
  P    0,4    3,4    3,3

The setting is similar to the previous example: the action set is equal to the state space, $\mathcal{E} = \mathcal{A} = \{C, D, P\}$, and at each time step one player is chosen; if she selects an action $a \in \mathcal{A}$, her state becomes $a$ at the next time step. This game can be viewed as a generalization of the prisoner's dilemma with an additional Nash equilibrium $P$ (which stands for "punish"). It can be shown, following a similar path as in the previous section, that when $T$ is large enough the following time-dependent Markovian strategy is a Nash equilibrium:
$$\pi^N(m, t) = \begin{cases} C & \text{if } t < T \text{ and } m_C = 1,\\ D & \text{if } t \geq T \text{ and } m_P = 0,\\ P & \text{otherwise}.\end{cases} \qquad (19)$$
In this strategy, the state $P$ is used as a stick to punish players who deviate from the imposed strategy. Nobody has an incentive to deviate at the last step because $D$ is also a Nash equilibrium of the matrix game.

The mean field game has only two equilibria: the whole population always plays $D$, or the whole population always plays $P$. These equilibria are also equilibria of the finite game. Yet, they both have a larger cost than the strategy of Equation (19). This leads us to say that the value of the game does not converge: the asymptotic cost of the strategy (19) is strictly smaller than the cost of any of the mean field equilibria.
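The cost gap behind this non-convergence can be totalled directly from the diagonal of the cost matrix. The sketch below is ours (a lockstep mean-field approximation; the exact switching time of the punishing strategy is illustrative): it compares always playing $D$, always playing $P$, and cooperating until the end of the horizon.

```python
# Row player's per-round cost against a pure population, read off the 3x3
# cost matrix of the example (first entry of each cell).
COST = {('C', 'C'): 1, ('C', 'D'): 3, ('C', 'P'): 4,
        ('D', 'C'): 0, ('D', 'D'): 2, ('D', 'P'): 4,
        ('P', 'C'): 0, ('P', 'D'): 3, ('P', 'P'): 3}

def total_cost(profile):
    # 'profile' lists the common action played by the whole population at
    # each time slot; everyone pays the diagonal cost of that action.
    return sum(COST[(a, a)] for a in profile)

T = 50
always_D = ['D'] * T
always_P = ['P'] * T
punishing = ['C'] * (T - 1) + ['D']   # cooperate, then defect at the last slot

assert total_cost(punishing) < total_cost(always_D) < total_cost(always_P)
```

Per round, cooperation costs 1 against 2 for the all-$D$ equilibrium and 3 for the all-$P$ equilibrium, so the punishing strategy's cost stays strictly below that of either mean field equilibrium for any long horizon.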
5. Synchronous Players
As explained in the previous section, mean field games in continuous time appear naturally as limits of $N$-player asynchronous games as $N$ goes to infinity: in these asynchronous games, only one player changes state at a time. However, there are other situations in which it is more natural to consider synchronous games, where at each time step all players take an action. (When the time horizon is finite, it is also natural to consider Markovian strategies that depend on time.)

5.1. $N$-Player Games with Exchangeable Players

Here we consider a finite synchronous game $G^N_s = (N, \mathcal{E}, \mathcal{A}, \{P_a\}, M, \{c_a\}, \delta)$ with $N$ identical players, with several differences from the model used in Section 3.1, the main one being the replacement of the rate matrices by stochastic matrices. As before, each Player $n$ has an internal state $X_n(t)$ that belongs to a finite state space $\mathcal{E}$ (we write $X(t) = (X_0(t), \ldots, X_{N-1}(t))$) and chooses an action from a finite action space $\mathcal{A}$. The main difference with the previous asynchronous model is that at each time step $t \in \mathbb{Z}_+$, all players choose an action $A_n(t) \in \mathcal{A}$ simultaneously. We assume that a player in state $i$ who chooses action $a$ goes to state $j$ with probability $P_{ija}(X(t))$ and that, given $X(t)$, the evolutions of the players are independent. Furthermore, we assume that the players are exchangeable, i.e. for any permutation $\sigma$ of the $N$ players, $P_{ija}(X_0(t), \ldots, X_{N-1}(t)) = P_{ija}(X_{\sigma(0)}(t), \ldots, X_{\sigma(N-1)}(t))$. The exchangeability of the players implies that the dependence on $X(t)$ can be replaced by a dependence on the population distribution $M(t)$.
More precisely, for any state vectors $i, j \in \mathcal{E}^N$ and any action vector $a \in \mathcal{A}^N$, one can write:
$$P\big( X(t+1) = j \mid X(t) = i,\ A(t) = a,\ \mathcal{F}_t \big) = \prod_{n=0}^{N-1} P_{i_n j_n a_n}(M(t)), \qquad (20)$$
where $\mathcal{F}_t$ is the natural filtration of the game up to time $t$, $M(t)$ is the population distribution of $X(t)$, and, for all $i, j \in \mathcal{E}$ and $a \in \mathcal{A}$, $P_{ija}(m)$ forms a stochastic matrix that is continuous in $m$.

The instantaneous cost at time $t$ depends on the actions and states at time $t$. It is symmetric in all players, so it can be written as a function of the population distribution, $c_{X_n(t), A_n(t)}(M(t))$, and costs are discounted by a factor $\delta$ at each time step. Given a strategy $\pi^0$ used by Player 0 and a strategy $\pi$ used by all the others, the expected cost of Player 0 is:
$$V_N(\pi^0, \pi) = \mathbb{E}\left[ (1-\delta) \sum_{t=0}^{\infty} \delta^t\, c_{X_0(t), A_0(t)}(M^\pi(t)) \ \middle|\ A_0 \text{ is chosen w.r.t. } \pi^0,\ A_{n'} \text{ is chosen w.r.t. } \pi \text{ if } n' \neq 0 \right]. \qquad (21)$$

Synchronous games also admit mean field game limits. To construct this limit, let us consider a strategy $\pi$ such that $\pi_{i,a}(m)$ is the probability for a player to choose action $a$ given that she is in state $i$ and that $M(t) = m$. Assume that $M(0)$ converges in probability to some $m(0)$ as $N$ goes to infinity and that all players except Player 0 apply a strategy $\pi$ that is continuous in $m$. As shown in Theorem 1 of [19] (up to differences in notation, the mean field model of [19] is the same as Equation (20)), the population distribution $M^\pi(t)$ then converges in probability to a deterministic quantity $m^\pi(t)$ as $N$ goes to infinity, defined by
$$m^\pi_j(t+1) = \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} m^\pi_i(t)\, P_{ija}(m^\pi(t))\, \pi_{i,a}(m^\pi(t)). \qquad (22)$$
We denote by $\pi^0$ the strategy of Player 0. The probability $x_j(t)$ that Player 0 is in state $j \in \mathcal{E}$ evolves over time according to the following equation:
$$x_j(t+1) = \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} x_i(t)\, P_{ija}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)). \qquad (23)$$
In this limit, the cost of Player 0, given by (21), becomes
$$V(\pi^0, \pi) = (1-\delta) \sum_{t=0}^{\infty} \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} \delta^t\, x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)).$$
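The recursions (22) and (23) are straightforward to iterate. The sketch below is a toy instance of ours (the kernel, in which the chosen action determines the next state, and the inertial strategy are hypothetical, chosen only to illustrate the updates):

```python
STATES = (0, 1)
ACTIONS = (0, 1)

def P(i, j, a, m):
    # Hypothetical kernel P_ija(m): the chosen action determines the next
    # state. (Any stochastic matrix continuous in m would do.)
    return 1.0 if j == a else 0.0

def pi(i, a, m):
    # Hypothetical population strategy with inertia: keep the current state
    # with probability 0.7, switch with probability 0.3.
    return 0.7 if a == i else 0.3

def step(x, strategy, m):
    # One step of x_j(t+1) = sum_{i,a} x_i(t) P_ija(m(t)) strategy_ia(m(t)),
    # i.e. equation (23); with x = m and strategy = pi it is equation (22).
    return [sum(x[i] * P(i, j, a, m) * strategy(i, a, m)
                for i in STATES for a in ACTIONS) for j in STATES]

m = [0.9, 0.1]                  # initial population distribution
for _ in range(50):
    m = step(m, pi, m)          # iterate equation (22)

assert abs(sum(m) - 1.0) < 1e-9   # the update preserves the simplex
assert abs(m[0] - 0.5) < 1e-6     # the inertial strategy mixes to (1/2, 1/2)
```

The same `step` applied to Player 0's distribution $x$ with her own strategy implements (23); the population term $m$ is passed separately because both the kernel and the strategy may depend on it.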
As the evolution of $m$ is deterministic, for any closed-loop strategy $\pi_{i,a}(m(t))$ and any initial condition $m(0)$, there exists an open-loop strategy $\pi_{i,a}(t)$ that leads to the same values of $m^\pi(t)$ and the same cost. Hence, for the mean field model, one can replace any state-dependent strategy $\pi(m(t))$ in the above equations by a time-dependent strategy $\pi(t)$.

Player 0 chooses the strategy that minimizes her expected cost; when she does so, we say that she uses a best response to the population strategy $\pi$:
$$BR(\pi) = \arg\min_{\pi^0} V(\pi^0, \pi).$$
A strategy is said to be a mean field equilibrium if it is a fixed point of the best-response map, that is, $\pi^{MFE} \in BR(\pi^{MFE})$.

One of the difficulties in the analysis of continuous time mean field games is that the objects under consideration (the population distribution, the population strategy, Player 0's strategy...) are continuous functions of time. In the discrete time case, the model is significantly simpler since all these objects are vectors. Hence, the proof of the existence of a mean field equilibrium for continuous-time mean field games (Theorem 1) can be adapted to show the following result.
Theorem 3 (Mean field equilibrium existence for synchronous games). Any synchronous mean field game with discounted cost that satisfies Assumption (A1) for $P$ and $c$ has a mean field equilibrium.

Sketch of proof. We first observe that the set of discrete-time open-loop policies is a compact and convex set. To finish the proof, we need to show that the best-response map has a closed graph and convex values. The former holds because the set of open-loop policies lives in a finite dimensional space, together with the continuity assumptions (A1). The latter can be shown using the same arguments as in the proof of Theorem 1. □
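Theorem 3 asserts existence via a fixed-point argument; it does not provide an algorithm, and plain best-response iteration need not converge. Still, the fixed-point structure can be illustrated on a toy congestion game (our construction, not from the paper): each player myopically prefers the less crowded of two states, and a damped best-response iteration with vanishing step sizes settles at the symmetric equilibrium $m = (1/2, 1/2)$.

```python
def best_response(m):
    # Myopic best response to the population distribution m: move to the
    # least crowded state, splitting equally in case of a tie.
    lo = min(m)
    winners = sum(1 for mi in m if mi == lo)
    return [1.0 / winners if mi == lo else 0.0 for mi in m]

def damped_iteration(m, steps=2000):
    # m <- (1 - eta_k) m + eta_k BR(m), with vanishing step sizes eta_k,
    # so the oscillation induced by the discontinuous BR dies out.
    for k in range(steps):
        eta = 1.0 / (k + 2)
        br = best_response(m)
        m = [(1.0 - eta) * mi + eta * bi for mi, bi in zip(m, br)]
    return m

m = damped_iteration([0.9, 0.1])
assert abs(sum(m) - 1.0) < 1e-9
assert abs(m[0] - 0.5) < 0.01   # converges to the symmetric fixed point
```

The damping is essential here: with a fixed step size the iteration cycles around the equilibrium, which is one reason the theorem is proved with a fixed-point theorem rather than with an algorithmic argument.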
The classical repeated games with discounted costs and identical players form a subclass of the synchronous games defined here. To see this, first consider a static $N$-player matrix game $G$ with symmetric cost: $u(a_1, \ldots, a_N)$ is the instantaneous cost of any player when the players use actions $a_1, \ldots, a_N$, and $u(a_1, \ldots, a_N) = u(a_{\sigma(1)}, \ldots, a_{\sigma(N)})$ for any permutation $\sigma$ of $\{1, \ldots, N\}$. The players repeat the matrix game infinitely often, and their cost under strategies $\pi_1, \ldots, \pi_N$ is the discounted sum of the stage costs:
$$V_N(\pi_1, \ldots, \pi_N) = (1-\delta) \sum_{t=0}^{\infty} \delta^t\, u(\pi_1(t), \ldots, \pi_N(t)). \qquad (24)$$
These games fit in our framework: the state of a player is merely her current action ($X_n(t) = A_n(t)$) and the state evolution is trivial: in state $x = a$, selecting action $b$ leads to state $b$ with probability one, regardless of the other players, i.e. $P_{abb}(M(t)) = 1$. The cost of one player at each stage corresponds to an immediate cost $c_{X_n(t), A_n(t)}(M(t)) = u(X(t))$, since by symmetry the cost $u$ only depends on the population distribution. As for the total cost of a player, (24) coincides with (21) as long as all players in the same state use the same strategy.

The relation between the equilibria of $N$-player games and their mean field limits is also complex in the discrete time case. Let us first focus on the performance of mean field equilibria in the $N$-player game. The situation is similar to the continuous time case and resembles Theorem 2(i): if $\pi$ is a mean field equilibrium then, under Assumption (A1), for every $\varepsilon > 0$ there exists $N_0$ such that for all $N \geq N_0$, $\pi$ is a local $\varepsilon$-equilibrium of the $N$-player game. The proof is essentially the same as that of Theorem 2. Let us now consider the Nash equilibria of the $N$-player game.
The situation is very different from the continuous time case: in discrete time, the states of all the players can change within one time unit, while in continuous time the state changes in small steps, one player at a time. This has several consequences on the nature of equilibria in the two models. As mentioned before, the Nash equilibria in the continuous time case may depend on the initial population distribution; this is not the case here, so there is more latitude for designing equilibria.

Let us consider the particular case of repeated games, introduced in Section 5.2.1. For this type of game, the set of equilibria can be characterized by the Folk theorem for repeated games.

Theorem 4 (Folk theorem, adapted from Theorem A in [18]). Let $G$ be a symmetric matrix game, and let $V^*$ be the cost under the strategy that repeats the Nash equilibrium of the static game $G$. Then for any compatible cost $V$ smaller than $V^*$ (in this context, a compatible cost is a cost that can be attained by at least one strategy), there exists a discount factor $\delta \in (0, 1)$ such that $V$ is the cost of an equilibrium of the discounted repeated game.

Actually, for any
$V < V^*$, the construction of an equilibrium whose cost is $V$ is based on the "tit for tat" principle. We claim that none of these equilibria scales to the mean field limit. Consider the following static game. Each player has two actions, $D$ and $C$. If all players play $D$, the cost of each player is $-1$; if all players play $C$, the cost is $-2$. If some players play $D$ and others play $C$, then all the players who play $C$ get $-2M_C$ while the players who play $D$ get $-3M_C - M_D$, where $M_C$ and $M_D$ are the proportions of players playing $C$ and $D$ respectively. These costs correspond to the average costs obtained by a player in a matching game against a random opponent.

The unique Nash equilibrium of the static game is the strategy $(D, D, \ldots, D)$, and the cost of the corresponding repeated game is $(1-\delta) \sum_t (-\delta^t) = -1$. Now consider the following strategy (called $\pi^N$ in the following) for all players: play $D$ for $k$ rounds, then play $C$ as long as every other player has followed the same pattern, else play $D$ forever. The cost of this strategy is between $-1$ and $-2$:
$$(1-\delta) \left( \sum_{t=0}^{k-1} (-\delta^t) + \sum_{t=k}^{\infty} (-2\delta^t) \right) = -1 - \delta^k.$$
The strategy $\pi^N$ is an equilibrium of the finite game if $\delta$ is large enough. Indeed, no player wants to deviate in the first $k$ rounds, because her cost would increase; and in the rounds after $k$, a deviation provides an immediate cost advantage at the price of being punished forever after, so a large enough $\delta$ makes deviations non-profitable.

Let us now consider the mean field game setting. If the whole population uses the strategy $\pi^N$ and Player 0 uses the same strategy, her cost is
$$V(\pi^N, \pi^N) = (1-\delta) \sum_{t=0}^{\infty} \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} \delta^t\, x_i(t)\, c_{i,a}(m^\pi(t))\, \pi_{i,a}(m^\pi(t)) = (1-\delta) \left( \sum_{t=0}^{k-1} (-\delta^t) + \sum_{t=k}^{\infty} (-2\delta^t) \right) = -1 - \delta^k.$$
However, in the mean field setting, the best response of Player 0 to $\pi^N$ is not $\pi^N$ but the strategy $\pi^D$ in which she plays $D$ all the time. Indeed, in this case her total cost becomes
$$V(\pi^D, \pi^N) = (1-\delta) \left( \sum_{t=0}^{k-1} (-\delta^t) + \sum_{t=k}^{\infty} (-3\delta^t) \right) = -1 - 2\delta^k.$$
This shows that $\pi^N$ is not a mean field equilibrium: a "free rider" can take advantage of the fact that the population will not react against her.

We now focus on mean field games in which the objects evolve in discrete time over a finite horizon, from 0 to $T$. In this case, the population distribution $m^\pi$ is defined by (22), which depends on the strategy $\pi$ of the mass, while Player 0 chooses her own strategy $\pi^0$. The expected cost of Player 0 is
$$V(\pi^0, \pi) = \sum_{t=0}^{T} \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)),$$
where $x_i(t)$ is the probability that Player 0 is in state $i$ at time $t$; the evolution of $x_i(t)$ over time is described in (23). Player 0 uses a best response to a given population strategy $\pi$, which means that she selects the strategy $\pi^0$ that minimizes her expected cost. We are interested in proving the existence of a mean field equilibrium, which amounts to finding a strategy that is a fixed point of the best-response map.
In Section 5.2, we showed this for the discounted case. In the finite horizon case, the vectors have finite size and, as a consequence, it is immediate to show, using the same arguments as in the proof of Theorem 3, that any discrete time mean field game with finite horizon cost such that $P$ and $c$ satisfy Assumption (A1) has a mean field equilibrium. Again, the proof mimics the proof of the analogous Theorem 1 for continuous time over a finite horizon.
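Returning to the repeated-game example of Section 5.2.1, the free-rider argument reduces to comparing two geometric sums. The sketch below is ours; it assumes, consistently with the costs of the example, that a round costs $-1$ while everyone defects and $-2$ while everyone cooperates, and that a lone defector facing an all-cooperating population pays $-3$ per round (the temptation payoff; this coefficient is our reading of the partially garbled example).

```python
def discounted(head, tail, delta, k):
    # Normalized discounted cost (1 - delta) * sum_t delta^t c_t when the
    # per-round cost is 'head' for t < k and 'tail' for t >= k.
    return head * (1.0 - delta**k) + tail * delta**k

delta, k = 0.95, 5
conform = discounted(-1.0, -2.0, delta, k)    # Player 0 follows pi^N
free_ride = discounted(-1.0, -3.0, delta, k)  # Player 0 plays D forever

assert abs(conform - (-1.0 - delta**k)) < 1e-12        # matches -1 - delta^k
assert abs(free_ride - (-1.0 - 2.0 * delta**k)) < 1e-12  # matches -1 - 2 delta^k
assert free_ride < conform   # defecting forever is strictly better: no MFE
```

The gap $\delta^k$ between the two costs is exactly what an invisible deviation gains in the mean field limit, whereas in the $N$-player game the punishment phase removes it.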
6. Conclusions
In this article, we generalize the framework of discrete-space mean field games to the case of non-convex costs and explicit interactions. Such games strike a good compromise between tractability (existence of equilibria) and modeling power (including propagation and congestion behaviors). The model consists of a finite state space mean field game in which the transition rates of the objects and the cost function of a generic object depend not only on the actions taken but also on the population distribution. We also show that there exists a sub-class of Nash equilibria of $N$-player games that converge to mean field equilibria when the number of players goes to infinity. Outside of this class, and in particular for all equilibria based on the "tit for tat" principle on which the Folk theorem relies, convergence does not hold.

For future work, we are interested in finding conditions ensuring the uniqueness of the mean field equilibrium. We believe that monotonicity assumptions similar to those of [21] are required to prove the existence of a unique mean field equilibrium in this model. Another interesting open question concerns the convergence of $N$-player equilibria to mean field equilibria as the number of players grows. We believe that there exist many $N$-player games for which the only limiting equilibria are mean field equilibria, for example when players have incomplete information about the game. It would be interesting to characterize the sub-class of strategies for which convergence to mean field equilibria holds; this class includes all local strategies (no information) and excludes some Markovian ones (full information).

References

[1] S. Adlakha, R. Johari, and G. Y. Weintraub. Equilibria of dynamic games with many players: Existence, approximation, and market structure.
Journal of Economic Theory, 2015.
[2] N. I. Al-Najjar and R. Smorodinsky. Large nonanonymous repeated games. Games and Economic Behavior, 37:26–39, 2001.
[3] D. M. Ambrose. Strong solutions for time-dependent mean field games with non-separable Hamiltonians. Journal de Mathématiques Pures et Appliquées, 113:141–154, 2018.
[4] R. Basna, A. Hilbert, and V. N. Kolokoltsov. An epsilon-Nash equilibrium for non-linear Markov games of mean-field-type on finite spaces. Commun. Stoch. Anal., 8(4):449–468, 2014.
[5] E. Bayraktar and A. Cohen. Analysis of a finite state many player game using its master equation. arXiv preprint arXiv:1707.02648, 2017.
[6] M. Benaim and J.-Y. Le Boudec. A class of mean field interaction models for computer and communication systems. Performance Evaluation, 65(11):823–838, 2008.
[7] A. Bensoussan, J. Frehse, and P. Yam. Mean Field Games and Mean Field Type Control Theory. Springer, 2013.
[8] K. C. Border. Fixed Point Theorems with Applications to Economics and Game Theory. Cambridge University Press, 1989.
[9] P. Cardaliaguet, F. Delarue, J.-M. Lasry, and P.-L. Lions. The master equation and the convergence problem in mean field games. arXiv preprint arXiv:1509.02505, 2015.
[10] R. Carmona and F. Delarue. Probabilistic analysis of mean-field games. SIAM Journal on Control and Optimization, 51(4):2705–2734, 2013.
[11] R. Carmona and F. Delarue. Probabilistic analysis of mean-field games. SIAM J. Control Optim., 51(4):2705–2734, 2013.
[12] R. Carmona, D. Lacker, et al. A probabilistic weak formulation of mean field games and applications. The Annals of Applied Probability, 25(3):1189–1231, 2015.
[13] R. Carmona and P. Wang. Finite state mean field games with major and minor players. arXiv preprint arXiv:1610.05408, 2016.
[14] A. Cecchin and M. Fischer. Probabilistic approach to finite state mean field games. Applied Mathematics & Optimization, Mar 2018.
[15] P. Dasgupta and E. Maskin. The existence of equilibrium in discontinuous economic games, I: Theory. Review of Economic Studies, 53(1):1–26, 1986.
[16] J. Doncel, N. Gast, and B. Gaujal. Are mean-field games the limits of finite stochastic games? SIGMETRICS Perform. Eval. Rev., 44(2):18–20, Sept. 2016.
[17] A. M. Fink. Equilibrium in a stochastic n-person game. J. Sci. Hiroshima Univ. Ser. A-I Math., 28(1):89–93, 1964.
[18] D. Fudenberg and E. Maskin. The folk theorem in repeated games with discounting or with incomplete information. Econometrica, 54(3):533–554, 1986.
[19] N. Gast and B. Gaujal. A mean field approach for optimization in discrete time. Discrete Event Dynamic Systems, 21(1):63–101, 2011.
[20] D. A. Gomes, J. Mohr, and R. R. Souza. Discrete time, finite state space mean field games. Journal de Mathématiques Pures et Appliquées, 93(3):308–328, 2010.
[21] D. A. Gomes, J. Mohr, and R. R. Souza. Continuous time finite state mean field games. Applied Mathematics & Optimization, 68(1):99–143, 2013.
[22] D. A. Gomes and E. Pimentel. Time-dependent mean-field games with logarithmic nonlinearities. SIAM Journal on Mathematical Analysis, 47(5):3798–3812, 2015.
[23] D. A. Gomes, E. Pimentel, and H. Sánchez-Morgado. Time-dependent mean-field games in the superquadratic case. ESAIM: Control, Optimisation and Calculus of Variations, 22(2):562–580, 2016.
[24] D. A. Gomes and E. A. Pimentel. Regularity for mean-field games systems with initial-initial boundary conditions: The subquadratic case. In Dynamics, Games and Science, pages 291–304. Springer, 2015.
[25] D. A. Gomes, E. A. Pimentel, and H. Sánchez-Morgado. Time-dependent mean-field games in the subquadratic case. Communications in Partial Differential Equations, 40(1):40–76, 2015.
[26] A. Granas and J. Dugundji. Fixed Point Theory. Springer Science & Business Media, 2013.
[27] O. Guéant. Existence and uniqueness result for mean field games with congestion effect on graphs. Applied Mathematics & Optimization, 72(2):291–303, 2014.
[28] O. Guéant, J.-M. Lasry, and P.-L. Lions. Mean field games and applications. In Paris-Princeton Lectures on Mathematical Finance 2010, volume 2003 of Lecture Notes in Mathematics, pages 205–266. Springer Berlin Heidelberg, 2011.
[29] M. Huang. Mean field stochastic games with discrete states and mixed players. In Game Theory for Networks, pages 138–151. Springer, 2012.
[30] M. Huang, R. Malhame, and P. Caines. Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Communications in Information and Systems, 6(3):221–252, 2006. Special issue in honor of the 65th birthday of Tyrone Duncan.
[31] D. Lacker. A general characterization of the mean field limit for stochastic differential games. Probability Theory and Related Fields, 165(3), Aug 2016.
[32] J.-M. Lasry and P.-L. Lions. Jeux à champ moyen. I – Le cas stationnaire. Comptes Rendus Mathématique, 343(9):619–625, 2006.
[33] J.-M. Lasry and P.-L. Lions. Jeux à champ moyen. II – Horizon fini et contrôle optimal. Comptes Rendus Mathématique, 343(10):679–684, 2006.
[34] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(1):229–260, 2007.
[35] H. Sabourian. Anonymous repeated games with a large number of players and random outcomes. Journal of Economic Theory, 51:92–110, 1990.
[36] W. Sandholm. Population Games and Evolutionary Dynamics. MIT Press, 2010.
[37] H. Tembine. Mean field stochastic games: convergence, Q/H-learning and optimality. In American Control Conference (ACC), 2011, pages 2423–2428. IEEE, 2011.
[38] H. Tembine, J.-Y. L. Boudec, R. El-Azouzi, and E. Altman. Mean field asymptotics of Markov decision evolutionary games and teams. In Game Theory for Networks, 2009. GameNets '09. International Conference on, pages 140–150. IEEE, 2009.
[39] Z. Wang, C. T. Bauch, S. Bhattacharyya, A. d'Onofrio, P. Manfredi, M. Perc, N. Perra, M. Salathé, and D. Zhao. Statistical physics of vaccination.