Discrete Mean Field Games: Existence of Equilibria and Convergence
arXiv preprint [math.OC]
Josu Doncel (a), Nicolas Gast (b,c), Bruno Gaujal (b,c) — (a) University of the Basque Country UPV/EHU, Spain; (b) Univ. Grenoble Alpes, F-38000 Grenoble, France; (c) Inria
Abstract
We consider mean field games with discrete state spaces (called discrete mean field games in the following) and we analyze these games in continuous and discrete time, over finite as well as infinite time horizons. We prove the existence of a mean field equilibrium assuming continuity of the cost and of the drift. These conditions are more general than those of the existing papers studying finite state space mean field games. Besides, we also study the convergence of the equilibria of N-player games to mean field equilibria in our four settings. On the one hand, we define a class of strategies in which any sequence of equilibria of the finite games converges weakly to a mean field equilibrium when the number of players goes to infinity. On the other hand, we exhibit equilibria outside this class that do not converge to mean field equilibria and for which the value of the game does not converge. In discrete time, this non-convergence phenomenon implies that the Folk theorem does not scale to the mean field limit.
1. Introduction
Mean field games were introduced by Lasry and Lions [34] as well as Huang, Caines and Malhamé [30] to model interactions between a large number of strategic agents (players) and have had a large success ever since. Since the seminal work in [32, 33, 34, 30], a large variety of papers have investigated mean field games. Most of the literature concerns continuous state spaces and describes a mean field game as the coupling of a Hamilton-Jacobi-Bellman equation with a Fokker-Planck equation (see for example [28, 7, 9, 24, 10, 25, 22, 23, 3]). Here, we are interested in studying mean field games with a finite number of states and a finite number of actions per player. In this case, the analog of the Hamilton-Jacobi-Bellman equation is the
Bellman equation and the discrete version of the Fokker-Planck equation is the
Kolmogorov equation.

Preprint submitted to Elsevier.

Finite state space mean field games in discrete time (a.k.a. with synchronous players) were previously studied in [20]. In their work, the strategy of the players is the probability matrix of the Kolmogorov equation. This implies that each player can choose her dynamics independently of the state of the others: the behavior of players is only coupled via their costs. In that case, the Kolmogorov equation becomes linear.

Finite state space mean field games in continuous time (a.k.a. with asynchronous players) have also been previously analyzed in [21, 27, 5, 13]. In their model, the players also control completely the transition rate matrix, so that the dynamics are again linear once the actions of the players are given. Again, players do not interact with each other directly in these models, but only through their costs.

The models we study here, both in the synchronous and asynchronous cases, cover non-linear dynamics: we consider that the players do not have the power to choose the rate matrix and that their actions only have a limited effect on their state. Here, the transition rate matrix may depend not only on the actions taken by the player, but also on the population distribution of the system. This introduces an explicit interaction between the players (and not just through their costs). This non-linear dynamics is called the relaxed case in [14]. We claim that the model with explicit interactions covers several natural phenomena such as information/infection propagation or resource congestion, where not only the cost but also the state dynamics of a player depend on the state of all the others. This type of behavior is classical in systems with a large number of interacting objects [6] and cannot be handled using previous mean field game models. For instance, in the classical SIR (Susceptible, Infected, Recovered) infection model [39], the rate of infection of one individual depends on the proportion of individuals already infected.
Similarly, in a model of congestion, one player cannot typically use a resource if it is already used to full capacity.

We show that the only requirement needed to guarantee the existence of a mean field equilibrium in mixed strategies is that the cost is continuous with respect to the population distribution (convexity is not needed). This result nicely mimics the conditions for existence of a Nash equilibrium in the simpler case of static population games (see [36]). The existence of a mean field equilibrium in mixed strategies has been previously shown by [31, 12] in the diffusion case. In [27], the existence of a mean field equilibrium is proven under the assumption that the cost of a player is strictly convex w.r.t. her strategy, and in [21] the authors also consider uniformly convex functions. These conditions are rather strong because they are not satisfied in the important case of linear and/or expected costs. In [14], existence of a Nash equilibrium is also proved under mere continuity assumptions and with a compact action space (more general than the simplex used here). However, the main difference between the two approaches is the type of mean field limit that is used. In [14], the trajectories of the states of the players are considered, while we only consider the state at time t. The first approach uses arguments in line with the propagation of chaos while the second one is closer to the work in [4, 38]. While the convergence of trajectories is a more refined notion than point-wise convergence in general, this refinement is not needed here. Indeed, for mean field games, costs are associated to states and actions and not to trajectories. Therefore, the point-wise mean field approach is sufficient. Another difference with [14] is that there an additional assumption about the uniqueness of the argmin is needed in some parts of the convergence proof as well as for existence (in the feedback case).
This is not the case here, so the two papers do not cover the exact same set of games.

As in most existence proofs, our proof is based on a version of the Kakutani fixed point theorem in infinite dimension (see for example [13], where such an extended version of the fixed point theorem is used in a mean field game model with minor and major players). Here, however, we do not consider the best response operator but the evolution of the population distribution instead, as in [14]. Out of the four cases (asynchronous/synchronous, finite/infinite horizons), we mainly detail the asynchronous player case, for which we prove the existence of a mean field equilibrium in an infinite horizon with discounted costs. We also show, more briefly, how these results can be extended to a finite horizon, as well as to a finite or infinite time horizon in the synchronous-player case.

Our second contribution concerns the convergence of finite games to mean field limits. Different authors have studied the convergence of N-player game equilibria to mean field equilibria, e.g. [29, 1, 37, 38]. The type of strategies considered in these papers is different from ours: they consider that the strategy of a player only depends on her internal state (these are called stationary policies in [38]), whereas here we allow time dependence in these policies. The model in [38] does include state dynamics that depend on the population distribution but only considers stationary strategies that do not depend on time, hence cannot depend on the population dynamics.

In all four combinations (finite / infinite horizon, synchronous / asynchronous), a mean field equilibrium is always an ε-approximation of an equilibrium of a corresponding game with a finite number N of players, where ε goes to 0 when N goes to infinity. This is the discrete counterpart of similar results in continuous games [11]. However, we also show that not all equilibria of the finite version converge to a Nash equilibrium of the mean field limit of the game.
We provide several counter-examples to illustrate this fact. They are all based on the following idea: the "tit for tat" principle allows one to define many equilibria in repeated games with N players. However, when the number of players is infinite, the deviation of a single player is not visible to the population, which therefore cannot punish her in retaliation for her deviation. This implies that while the games with N players may have many equilibria, as stated by the Folk theorem, this may not be the case for the limit game. This fact is well-known for large repeated games (see the examples of anti-Folk theorems in [35, 2]). However, to our knowledge, these results have not yet been investigated in the mean field game framework.

Finally, our four models of dynamic games do not face the issue of the order of play, nor partial information. Thus, we avoid two difficulties of dynamic games: the information structure of each player and the existence of a value [15]. In our case, all players are similar, so the order of play is irrelevant, and we only consider the full information case: players know the strategy of the other players and the current global state (more details on this are given in Section 3.2).

The rest of the article is organized as follows. We introduce mean field games with explicit interactions in continuous time in Section 2, where we mainly focus on the infinite horizon with discounted costs. We describe the evolution of the state of the players, the cost function, as well as the best response operator. In both cases (finite and infinite horizon), we prove the existence of an equilibrium. We show in Section 3 that this equilibrium is an approximation of an equilibrium for the game with a finite number of players. Finally, we study an example of an N-player game inspired by the prisoner's dilemma whose equilibria are not always equilibria for the limit mean field game. We focus on the synchronous case in Section 5 (where players all play at the same time).
In this case, N-player games can be seen as classical stochastic games in discrete time. We derive the mean field limit dynamics and the existence of an equilibrium. Here, counter-examples of equilibria for finite games that do not converge to equilibria of the limit game are easier to find. Indeed, the Folk theorem applies, and all equilibria based on retaliation cannot be equilibria at the limit.
2. Discrete Mean Field Games in Continuous Time
A discrete mean field game G is a tuple G = (E, A, {Q^a}, m_0, {c^a}, β), where E is the state space, A the action set, {Q^a} the transition rate matrices, m_0 the initial state, {c^a} the cost functions and β ∈ R a discount factor. The game is described as follows.

State and action sets.
We consider a population made of an infinite number of homogeneous players that evolve in continuous time. Each player has a finite state space denoted by E = {1, . . . , E} and a finite action set A = {1, . . . , A}. We denote by P(A) (resp. P(E)) the set of probability measures over A (resp. E). Since A is finite, P(A) is the simplex of dimension A. (An extended abstract discussing our counterexample in the continuous time model with infinite horizon was presented in [16].)

Set of strategies. A mixed strategy (or strategy for short) is a measurable function π : E × R_+ → P(A) that associates to each state i ∈ E and each time t ≥ 0 a probability measure π_i(t) ∈ P(A) on the set of possible actions. We also denote by π_{i,a}(t) the probability that, at time t, a player in state i takes the action a under strategy π. For all t ≥ 0 and i ∈ E, we have Σ_{a∈A} π_{i,a}(t) = 1. The set of all possible strategies is denoted by S.

We say that a strategy is pure if, for all states i and all t ∈ R_+, there exists an action a ∈ A such that π_{i,a}(t) = 1 and π_{i,a'}(t) = 0 for all a' ≠ a.

The set S is a bounded subset of the Hilbert space of the functions E × R_+ → R^A equipped with the exponentially weighted inner product: ⟨f, g⟩ = ∫_0^∞ f(t) g(t) e^{−βt} dt. This shows that S is weakly compact, where the weak topology is defined as follows: a sequence of policies π^n converges to a policy π if for any bounded function g:

lim_{n→∞} ∫_0^∞ π^n(t) g(t) e^{−βt} dt = ∫_0^∞ π(t) g(t) e^{−βt} dt.   (1)

Rate matrices.
We denote by m^π(t) ∈ P(E) the population distribution at time t. As the state space is finite, m^π(t) is a vector whose i-th component, m^π_i(t), is the proportion of players in state i at time t. The evolution over time of the population distribution is driven by the rate matrices {Q^a(m^π(t))}_{a∈A}. By definition, Q^a_{ij}(m^π(t)) is the rate at which a player in state i moves to state j when choosing action a, when the population distribution is m^π(t). Note that by definition, Σ_{j∈E} Q^a_{ij}(m^π(t)) = 0 for all i and a, and Q^a_{ij}(m^π(t)) is non-negative for all j ≠ i and all a. In the following, we assume that for all i, j, a, Q^a_{ij}(m) is Lipschitz-continuous in m with constant L.

The initial condition is m^π(0) = m_0. For t ≥ 0, the population distribution m^π(t) is the solution of the following differential equation, that depends on the strategy π: for j ∈ E,

ṁ^π_j(t) = Σ_{i∈E} Σ_{a∈A} m^π_i(t) Q^a_{ij}(m^π(t)) π_{i,a}(t).   (2)

The rationale behind this differential equation is that all players in state i that use the action a ∈ A move to state j with rate Q^a_{ij}(m^π(t)).

If the strategy π_i(t) is not continuous in time, the differential equation (2) may not be well-posed at time-points where π_i is not continuous. The existence of a continuous solution for (2) is guaranteed by Carathéodory's existence theorem. The Lipschitz condition on Q further implies that this solution is essentially unique, because any solution of (2) must be a fixed point of

m^π_j(t) = m_{j,0} + ∫_0^t ( Σ_{i∈E} Σ_{a∈A} m^π_i(u) Q^a_{ij}(m^π(u)) π_{i,a}(u) ) du.   (3)

In anticipation, the same properties (existence and uniqueness of the solution of the ODE) hold for the differential equation (4).

Remark 1 (Explicit interactions). In this model, the rate matrix Q^a_{ij}(m^π(t)) depends explicitly on the population distribution: the rate to go from state i to state j under action a depends on how the whole population is distributed among the states of the system. Other mean field models, such as [20], only consider the special case where Q^a_{ij}(m^π(t)) is constant: Q^a_{ij}(m^π(t)) = Q^a_{ij}. This restricts the population dynamics given in (2) to linear dynamics.

Cost function.
We now concentrate on a particular player, whom we call Player 0. Player 0 chooses her own strategy π_0 : E × R_+ → P(A). We denote by x^{π_0,m}(t) ∈ P(E) the probability distribution of the state of Player 0 when Player 0 uses strategy π_0 against a population whose distribution is m. For a given state i ∈ E, x^{π_0,m}_i(t) denotes the probability for Player 0 to be in state i at time t. The distribution x^{π_0,m} evolves over time according to the following differential equation: for j ∈ E,

ẋ^{π_0,m}_j(t) = Σ_{i∈E} Σ_{a∈A} x^{π_0,m}_i(t) Q^a_{ij}(m(t)) π_{0,i,a}(t).   (4)

If Player 0 is in state i and takes an action a, she suffers an instantaneous cost c_{i,a}(m(t)), that depends on the population distribution at time t. We assume that the cost is always continuous in m. Given a population distribution m and the strategy π_0 of Player 0, we define the discounted cost of Player 0 as

W(π_0, m) = ∫_0^∞ ( Σ_{i∈E} Σ_{a∈A} x^{π_0,m}_i(t) c_{i,a}(m(t)) π_{0,i,a}(t) ) e^{−βt} dt,   (5)

where β > 0 is the discount factor. We also define V(π_0, π), the discounted cost of Player 0 when the population plays a strategy π: V(π_0, π) = W(π_0, m^π).

Best response.
The best response of Player 0 to π is to choose a strategy π_0 ∈ S that minimizes her discounted cost (5) when the rest of the population plays strategy π. For a given population strategy π, we denote the set of best responses of Player 0 to π by BR(π). This set is the set of strategies that minimize her discounted cost:

BR(π) := argmin_{π_0 ∈ S} V(π_0, π).   (6)

Note that the best response function is well defined or, in other words, the "argmin" is attained for some strategy in Equation (6). To prove that, we will later show in Section 2.3 that the function V is continuous for the weak topology. As S is weakly compact, this shows that the minimum in π_0 is attained.

Proposition 1.
The function V, defined from Equation (5), is continuous in π_0 and π (for the weak topology on S).

Mean field equilibrium.
We then define a mean field equilibrium as a strategy π^{MFE} such that, when the population strategy is π^{MFE}, a selfish Player 0 would also choose the same strategy π^{MFE} as her best response.
Definition 1 (Mean Field Equilibrium). A strategy π^{MFE} is called a mean field equilibrium if it is a fixed point of the best response function, i.e.,

π^{MFE} ∈ BR(π^{MFE}).   (7)

A mean field equilibrium is pure if it is a pure strategy.
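Although the analysis in this paper is entirely theoretical, the coupled objects (2)-(5) are straightforward to simulate. The sketch below uses a hypothetical two-state SIR-like instance (all rates, costs and parameter values are our own illustrative choices, not taken from the paper): it integrates the Kolmogorov equation (2) with an Euler scheme and accumulates the discounted cost (5) along the equilibrium population trajectory, i.e., taking x = m.

```python
import numpy as np

BETA, DT, T = 0.5, 1e-3, 20.0   # discount, Euler step, truncation horizon

def Q(a, m):
    """Rate matrix Q^a(m) on E = {S, I}: the infection rate S -> I grows with
    the infected proportion m[1]; the 'risky' action a = 1 doubles exposure."""
    infect = (1.0 + a) * m[1]
    recover = 0.3
    return np.array([[-infect, infect],
                     [recover, -recover]])

def c(i, a, m):
    """Instantaneous cost c_{i,a}(m): being infected costs 2, caution costs 0.1."""
    return (2.0 if i == 1 else 0.0) + (0.1 if a == 0 else 0.0)

def simulate(pi):
    """Euler scheme for the Kolmogorov equation (2); accumulates the discounted
    cost (5) along the population trajectory (i.e. with x = m).
    pi[i] is a constant-in-time mixed action, a point of P(A)."""
    m, W = np.array([0.9, 0.1]), 0.0
    for k in range(int(T / DT)):
        disc = np.exp(-BETA * k * DT)
        drift = sum(m[i] * pi[i][a] * Q(a, m)[i]
                    for i in range(2) for a in range(2))
        W += disc * DT * sum(m[i] * pi[i][a] * c(i, a, m)
                             for i in range(2) for a in range(2))
        m = m + DT * drift
    return m, W

m_inf, W = simulate({0: [1.0, 0.0], 1: [1.0, 0.0]})  # everyone plays "cautious"
```

With everyone cautious, the drift of (2) vanishes at m_S = 0.3, so the infected proportion settles near m_I = 0.7; this is the kind of non-linear, population-dependent dynamics that the explicit-interaction model covers.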
The rationale behind this definition appears when one considers that the population is formed by players that each take selfish decisions. As the population is homogeneous, each player's best response is the same as Player 0's. In other words, for a given population strategy π, all the rational players of the population choose a strategy in BR(π). As in classical games, a mean field equilibrium is a situation where no player has an incentive to deviate unilaterally from the common strategy.

We now show that, under very general assumptions, all discrete mean field games admit a mean field equilibrium. As for classical games, these equilibria are not necessarily pure. As with most proofs of existence of equilibria, our proof relies on a generalization of the Kakutani fixed point theorem to infinite dimensional spaces. However, the classical approach consisting of showing that the best response function BR(π) is a Kakutani map does not work here when the cost function is not strictly convex. Therefore, in our approach we focus on the state of the game instead of the best response function.

As mentioned before, the differential equations (2), (4) and the cost equation (5) are all well defined under our running Assumption (A1):

(A1) The rate function m ↦ Q^a_{ij}(m) is Lipschitz-continuous in m. The cost function m ↦ c_{i,a}(m) is continuous in m.

In particular, this assumption implies that the costs and the rates are all bounded by a finite value.

Theorem 1.
Any discrete mean field game G whose rates and costs satisfy Assumption (A1) admits a mean field equilibrium.

Note that the best response function π ↦ BR(π) is in general neither continuous nor hemi-continuous under (A1). In particular, the best response set BR(π) may not be a convex set. This makes the application of the classical fixed point theorems to the best response function difficult. As a result, our proof will formulate the fixed point problem in an alternative manner, by considering a fixed point in m.

For a strategy π, the function m^π satisfies the differential equation (2). As m^π(t) lives in a compact set and the functions Q are continuous, the right-hand side of this differential equation is bounded. This shows that there exists a constant L' such that for any strategy π, the function m^π is Lipschitz-continuous with constant L'. Similarly, the function x^{π_0,m} is also Lipschitz-continuous with constant L'.

Let M be the set of functions from R_+ to P(E) that are Lipschitz-continuous with constant L'. We equip this set with the exponentially weighted L^∞-norm:

‖m − m'‖ = sup_{i∈E, t≥0} |m_i(t) − m'_i(t)| e^{−βt}.

By the Arzelà-Ascoli theorem, M is a compact space.

To prove that V is continuous in π_0 and π, it suffices to show that the mapping π ↦ m^π is continuous (for the weak topology) and that the mapping (π_0, m) ↦ x^{π_0,m} is continuous. To prove the continuity of m^π, let π^n be a sequence of strategies that converges to a strategy π. As M is compact, there exist a function m and a subsequence of m^{π^n} that converges to m. Moreover, we have:

m_j(t) = m_{j,0} + lim_{n→∞} ∫_0^t ( Σ_{i∈E} Σ_{a∈A} m^{π^n}_i(u) Q^a_{ij}(m^{π^n}(u)) π^n_{i,a}(u) ) du   (8)
       = m_{j,0} + ∫_0^t ( Σ_{i∈E} Σ_{a∈A} m_i(u) Q^a_{ij}(m(u)) π_{i,a}(u) ) du,   (9)

where the convergence holds because π^n converges weakly to π and m^{π^n} converges to m uniformly on all compact sets.

Equation (9) shows that the function m is equal to the function m^π. This shows that π ↦ m^π is continuous, which implies that V is continuous in π. The proof that (π_0, m) ↦ x^{π_0,m} is continuous is very similar to the above proof and we therefore omit it.

Recall that for a given population distribution m ∈ M, the cost of a strategy π is defined as

W(π, m) = ∫_0^∞ Σ_{i,a} x_i(t) π_{i,a}(t) c_{i,a}(m(t)) e^{−βt} dt,   (10)

where x satisfies (for all j ∈ E):

ẋ_j(t) = Σ_{i,a} x_i(t) Q^a_{ij}(m(t)) π_{i,a}(t).   (11)

We now define the function Φ : M → M as the best response to a population distribution m. It is a mapping that associates to a population distribution m ∈ M the set of all state distributions that can be induced by an optimal policy:

Φ(m) = { x^{π,m} such that π ∈ argmin_{π∈S} W(π, m) }.   (12)

In the remainder of the proof, we show that for all m ∈ M, Φ(m) is well defined and non-empty (i.e., the minimum is attained), convex and compact. Moreover, we also show that the function Φ(·) is upper-semicontinuous. As M is compact [8, Prop. 11.11], this shows that Φ(·) satisfies the conditions of the fixed point theorem given in [26, Theorem 8.6] and therefore has a fixed point m*. By the definition of Φ, this implies that there exists a strategy π that is a best response to m^π, which implies that π is a mean field equilibrium.

Definition of Φ(m) – It can be shown that W is continuous (by a reasoning similar to the one for V (Proposition 1)). This shows that there exists π that attains the minimum on the right-hand side of Equation (12), which shows that Φ(m) is well defined and non-empty.
Convexity and compactness of Φ(m) – Let us consider the following optimization problem:

min_{x,z} ∫_0^∞ Σ_{i,a} z_{i,a}(t) c_{i,a}(m(t)) e^{−βt} dt   (13)

such that z satisfies:

Σ_a z_{j,a}(t) = x_j(t)   ∀ j ∈ E,
z_{j,a}(t) ≥ 0   ∀ j ∈ E, ∀ a ∈ A,   (14)
ẋ_j(t) = Σ_{i,a} z_{i,a}(t) Q^a_{ij}(m(t))   ∀ j ∈ E.

The above problem is a linear problem, which implies that its set of optimal solutions is convex and compact. Let us show that the set of optimal solutions of the optimization problem (13) is Φ(m). To show this, let us remark that the constraints (11) are equivalent to the constraints (14) by replacing the variables x_i(t)π_{i,a}(t) by z_{i,a}(t). Then, the constraint π ∈ S of (11), that corresponds to π(t) ∈ P(A), is replaced with z_{i,a}(t) ≥ 0 and Σ_a z_{i,a}(t) = x_i(t).

Upper-semicontinuity of Φ – To prove that Φ is upper-semicontinuous, let us show that the graph of m ↦ Φ(m) is closed. Let m^n ∈ M and x^n ∈ Φ(m^n) be two sequences such that lim_{n→∞} m^n = m^∞ and lim_{n→∞} x^n = x^∞. We want to show that x^∞ ∈ Φ(m^∞).

As W is continuous, for all x^n ∈ Φ(m^n), there exists a strategy π^n that minimizes W(π, m^n) and such that x^n = x^{π^n, m^n}. As the set S is weakly compact, this sequence of strategies has a subsequence that converges weakly to a strategy π*. Moreover, we have:

• As W is continuous, π* minimizes W(π, m^∞). This shows that x^{π*, m^∞} ∈ Φ(m^∞).
• The solution of (11) is continuous in π and m, which shows that x^∞ = x^{π*, m^∞}.

Combining these two facts shows that x^∞ ∈ Φ(m^∞), which implies that the graph of Φ is closed.

Remark 2.
The continuity assumption (A1) is tight in the following sense:

1. If the rate Q is not Lipschitz-continuous in m, then the evolution of the population is not well defined, in the sense that the evolution equation (2) may have several solutions or no solution at all.

2. There exist games with non-continuous cost functions that do not admit any mean field equilibrium. For example, consider the following mean field game:

G = ( E = {1, 2}, A = {a, b}, Q^a = 0, Q^b = [ −1 1 ; 0 0 ], m(0) = (1, 0),   (15)

c_a(m_1, m_2) = 0,  c_b(m_1, m_2) = { −1 if m_2 ≤ 1/2; 1 otherwise },  β ).   (16)

Assume that this game has a mean field equilibrium and denote by m(t) the state at equilibrium. By definition of Q^a and Q^b, m_2(t) is a non-decreasing function. Hence, let τ = sup{t : m_2(t) ≤ 1/2} (note that τ ∈ [ln 2, +∞) ∪ {+∞}). It should be clear that the best response of Player 0 to any state function m is the policy π(τ) that consists in playing "b" until τ and "a" after τ. However, such a policy is never a mean field equilibrium: under the policy π(τ), m_2(t) = 1 − e^{−min(t,τ)}, which means that sup{t : m_2(t) ≤ 1/2} ∈ {ln 2, +∞}. None of the policies π(ln 2) or π(∞) is an equilibrium: the policy π(ln 2) is the best response to π(∞) and vice-versa.
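The cycling argument of Remark 2 can be checked numerically. In the sketch below we take the cost of action b to be −1 when m_2 ≤ 1/2 and +1 otherwise (the exact positive value after the threshold is our reconstruction; any positive value yields the same cycle) and an illustrative discount β = 1/2. Against the population profile induced by π(τ), we search for the best switching time s of Player 0 over a small grid:

```python
import numpy as np

BETA = 0.5            # illustrative discount factor (any beta > 0 works)
T, DT = 30.0, 1e-3    # truncation horizon and step for numerical integration
t = np.arange(0.0, T, DT)

def m2(tau):
    """Population state m_2(t) = 1 - exp(-min(t, tau)) induced by pi(tau)."""
    return 1.0 - np.exp(-np.minimum(t, tau))

def cb(m):
    """Discontinuous cost of action b; the small tolerance guards the
    floating-point boundary m_2 = 1/2."""
    return np.where(m <= 0.5 + 1e-9, -1.0, 1.0)

def cost(s, tau):
    """Discounted cost of Player 0 playing b until s, then a (cost 0),
    against the population profile induced by pi(tau)."""
    return np.sum(cb(m2(tau)) * (t < s) * np.exp(-BETA * t)) * DT

def best_switch(tau, grid):
    """Best switching time s over a finite grid (np.inf = never switch)."""
    return min(grid, key=lambda s: cost(s, tau))

grid = [np.log(2.0), 1.0, 2.0, 5.0, np.inf]
br_of_ln2 = best_switch(np.log(2.0), grid)  # best response to pi(ln 2)
br_of_inf = best_switch(np.inf, grid)       # best response to pi(infinity)
```

The computation reproduces the cycle: the best response to π(ln 2) never switches (s = ∞, i.e. π(∞)), while the best response to π(∞) switches at s = ln 2, so no policy π(τ) is a fixed point of the best response map.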
3. Convergence of Finite Games to Mean Field Games
Mean field games are often presented as a limit of a sequence of finite games asthe number N of players goes to infinity. In this section, we investigate positive andnegative results that link finite games and mean field games. N Exchangeable Players
To any discrete mean field game G = (E, A, {Q^a}, m_0, {c^a}, β), one can associate a stochastic N-player game G^N = (N, E, A, {Q^a}, m_0, {c^a}, β) as follows. The finite stochastic game G^N has the same state and action spaces E, A, the same rate matrices Q^a, the same cost functions c^a, the same discount factor β, and the same initial state as G. The time evolution of the finite game is as follows. At any time t, each player (say Player n) chooses a (randomized) action A_n(t) ∈ P(A).

We consider a mean field interaction model between the players, which means that the behavior of one player only depends on the states of the other players through the proportion of players that are in a given state. To be more precise, we denote by M(t) ∈ P(E) the population distribution of the system at time t. As the set E is finite, M(t) is a vector with |E| components and for all i ∈ E, M_i(t) is the fraction of players that are in state i at time t:

M_i(t) = (1/N) Σ_{n=1}^N 1_{X_n(t)=i}.

The state of one player (say Player n) follows a continuous time Markov chain whose rates vary over time. The only dependence between players is through the rates, which depend on the population distribution. More precisely, the evolution of the state of Player n satisfies, with F_t the natural filtration of the process and for all states i ≠ j:

P(X_n(t + dt) = j | X_n(t) = i, M(t) = m, A_n(t) = a, F_t) = Q^a_{ij}(m) dt + o(dt),   (17)

where A_n(t) is the action taken by Player n at time t.

At any time t, Player n suffers an instantaneous cost that is a function of her state X_n(t), the action A_n(t) that she takes, and the population distribution M(t). We write this instantaneous cost c_{X_n(t),A_n(t)}(M(t)). The objective of Player n is to choose a strategy π_n from some set of admissible strategies Π, in order to minimize her expected discounted cost, knowing the strategies of the others.
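The mean field interaction (17) is easy to simulate directly. The following Monte Carlo sketch (again a hypothetical SIR-like instance with illustrative rates of our own choosing) advances N individual continuous-time chains by small time steps, with an infection rate that depends on the empirical distribution M(t):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, for reproducibility
N, DT, T = 1000, 1e-2, 5.0
RECOVERY = 0.5

def infection_rate(m_infected):
    """Q^a_{SI}(M): the individual transition rate depends on the empirical
    mean field, as in (17)."""
    return 2.0 * m_infected

x = np.zeros(N, dtype=int)   # player states: 0 = susceptible, 1 = infected
x[: N // 10] = 1             # 10% of the players start infected

for _ in range(int(T / DT)):
    m_I = x.mean()           # M_I(t): current fraction of infected players
    u = rng.random(N)
    # the two jump events are disjoint (they apply to different states)
    infect = (x == 0) & (u < infection_rate(m_I) * DT)
    recover = (x == 1) & (u < RECOVERY * DT)
    x[infect] = 1
    x[recover] = 0

m_final = x.mean()
```

For this instance the limiting ODE (2) reads ṁ_I = 2 m_I (1 − m_I) − 0.5 m_I, whose stable point is m_I = 0.75; the empirical fraction M_I(T) ends near this value, with fluctuations of order 1/√N, which is the concentration behind Theorem 2.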
As before, the discount factor is denoted by β. Given a strategy π_n ∈ Π used by Player n and a strategy π ∈ Π used by all the others, we denote by V^N(π_n, π) the expected discounted cost of Player n:

V^N(π_n, π) = E[ ∫_0^∞ e^{−βt} c_{X_n(t),A_n(t)}(M^π(t)) dt | A_n is chosen w.r.t. π_n, A_{n'} is chosen w.r.t. π (∀ n' ≠ n) ].

A Nash equilibrium for this game is a strategy π such that Player n does not have another admissible strategy that leads to a lower cost. This notion naturally depends on the set of admissible strategies.

Definition 2 (Equilibrium of the N-player game). For a given set of strategies Π, a strategy π ∈ Π is called a symmetric equilibrium in Π if for any strategy π_n ∈ Π:

V^N(π, π) ≤ V^N(π_n, π).

We will also use the notion of ε-equilibrium:

Definition 3 (ε-equilibrium of the N-player game). For a given set of strategies Π, a strategy π ∈ Π is called an ε-symmetric equilibrium in Π if for any strategy π_n ∈ Π:

V^N(π, π) ≤ V^N(π_n, π) + ε.

In a full information setting, A_n(t) is a (possibly random) function of the values X_{n'}(t') up to time t' ≤ t and of all actions taken in the past A_{n'}(t'), for t' < t and for n' ∈ {1 . . . N}. Such a strategy is, however, hard to analyze. Therefore, in the following, we will consider two natural subclasses for the set of admissible strategies, depending on the information available to the players:

• (Markov) – A strategy π is called a Markov strategy if it induces a choice of A_n(t) that is a (possibly random) measurable function of only t, M(t) and X_n(t):

P(A_n(t) = a | F_t) = π_{a,X_n(t)}(t, M(t)).

When all the other players use a Markov strategy, the state of Player n, together with M(t), forms a Markov process. This implies that when all the other players use a Markov strategy, the set of Markov strategies is dominant among the set of full-information strategies: there exists a full-information best response for Player n that is a Markov strategy.
Furthermore, any Markov game admits a Markovian Nash equilibrium (see [17]).

• (Local) – A strategy π is a local strategy if the choice of the action only depends on the player's internal state and on the time:

P(A_n(t) = a | F_t) = π_{a,X_n(t)}(t).

If a player uses a local strategy, her actions may depend on time, hence may track the law of the population M(t) (but not M(t) itself). Also notice that a local strategy is not necessarily stationary, because of its dependence on time.

The next theorem provides a relation between local equilibria of finite games and mean field equilibria of the limit mean field game. In particular, it shows that mean field equilibria are a good approximation of local equilibria. However, as we will show later, this result does not hold for Markovian equilibria.
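The difference between the two admissible classes is purely one of information structure, which can be made concrete with function signatures (the interfaces and names below are ours, given for illustration only):

```python
from typing import Callable, Sequence

# A point of P(A) or P(E) is represented as a list of probabilities.
Dist = Sequence[float]

# A Markov strategy may observe the time, its own state and the empirical
# population distribution M(t).
MarkovStrategy = Callable[[float, int, Dist], Dist]

# A local strategy observes only the time and its own state; through its time
# dependence it may anticipate the *law* of M(t), but it can never react to
# the realized M(t) itself.
LocalStrategy = Callable[[float, int], Dist]

def as_markov(pi: LocalStrategy) -> MarkovStrategy:
    """Every local strategy is a degenerate Markov strategy that ignores M(t)."""
    return lambda t, i, m: pi(t, i)

# A retaliating strategy, in the spirit of the counterexample later in this
# section, is Markov but not local: it reacts to whether the whole population
# cooperates (m[0] == 1).
tit_for_tat: MarkovStrategy = (
    lambda t, i, m: [1.0, 0.0] if m[0] == 1.0 else [0.0, 1.0]
)
```

This inclusion of local strategies into Markov strategies is why the convergence result of Theorem 2 for local equilibria does not contradict the Markovian counterexample that follows.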
Theorem 2.
Consider a finite stochastic game G^N, with N players, and assume that (A1) holds for its rate matrices Q^a and its cost functions c^a. Then:

(i) Let π be a mean field equilibrium of the associated mean field game G. For any ε > 0, there exists N_0 such that for all N ≥ N_0, π is a local ε-equilibrium of the N-player game.

(ii) Let (π^N)_{N∈N} be a sequence of local strategies such that π^N is an ε_N-equilibrium for the N-player game, with ε_N → 0. Then any sub-sequence of the sequence (π^N) has a sub-sequence that converges weakly to a mean field equilibrium of G.

Proof. First, V^N(π_n, π) converges to V(π_n, π) uniformly in (π_n, π). Uniform convergence follows from Theorem 3.3.2 in [38] (the theorem is stated for stationary strategies, but local strategies as defined here are equivalent to stationary strategies as defined in [38]). Thus, for any ε, there exists N_0 such that N ≥ N_0 implies |V^N(π_n, π) − V(π_n, π)| ≤ ε/2. Hence, if π is a mean field equilibrium, this implies that for any local strategy π_n:

V^N(π, π) ≤ V(π, π) + ε/2 ≤ V(π_n, π) + ε/2 ≤ V^N(π_n, π) + ε.

This proves (i).

For (ii), if π^N is a sequence of local strategies, then any sub-sequence has a sub-sequence that converges weakly to some local strategy π^∞. As V(π_n, π) is continuous in π_n and π (for the weak topology), this implies that V(π^∞, π^∞) ≤ V(π_n, π^∞) for all local strategies π_n.

We now show that Theorem 2-(ii) does not generalize to Markov strategies. The following example was first presented in [16]. The main ingredient used to construct the counterexample is the "tit-for-tat" principle. This principle can be used to construct equilibria for any N-player game but cannot be used in mean field games. This approach has been used in repeated-game papers (see for example the examples in [35], further generalized in [2]). To our knowledge, this type of behavior has not yet been described in the mean field game framework.

Let us consider a mean field version of the classical prisoner's dilemma. The state space of a player is E = {C, D} (which stand for Cooperate and Defect) and the action set is the same, A = E. At each time step, one player is chosen. If she selects an action a ∈ A, her state becomes a at the next time step. The instantaneous cost of Player n depends on her state i and on the mean field m:

c_{i,a}(m) = { m_C + 3 m_D if i = C; 2 m_D if i = D }.

At each time step, this cost function corresponds to a matching game where a player plays against a randomly assigned opponent and suffers a cost that corresponds to the following matrix:

        C      D
  C    1,1    3,0
  D    0,3    2,2

The strategy D dominates the strategy C. This implies that playing D is the unique mean field equilibrium.
Indeed, the expected cost (given by (5)) of a Player 0 that has a state vector $x$ while the mean field is $m(t)$ is
$$\int_0^\infty \big[ x_C(t)\,(m_C(t) + 3 m_D(t))\,(\pi_{CC}(t) + \pi_{CD}(t)) + 2\,x_D(t)\,m_D(t)\,(\pi_{DC}(t) + \pi_{DD}(t)) \big]\, e^{-\beta t}\, dt = \int_0^\infty \big[ x_C(t) + 2 m_D(t) \big]\, e^{-\beta t}\, dt,$$
by using the fact that $\pi_{CC}(t) + \pi_{CD}(t) = \pi_{DC}(t) + \pi_{DD}(t) = 1$ and $x_C(t) + x_D(t) = m_C(t) + m_D(t) = 1$. This cost is minimized when $x_C$ is minimal, which occurs when the strategy is to choose action $D$ regardless of the current state. This shows that the only mean field equilibrium is the one in which all players choose action $D$.

Let us now consider the game with $N$ players and the following Markov strategy:
$$\pi^N(m) = \begin{cases} C & \text{if } m_C = 1,\\ D & \text{if } m_C < 1.\end{cases}$$
We claim that, for $\beta < 1$ and $N$ large, $\pi^N$ is a Markov Nash equilibrium. Assume that all players except Player $n$ play the strategy $\pi^N$ and let us compute the best response of Player $n$. If at time 0, $m_C < 1$, then the best response of Player $n$ is to play $D$. On the other hand, if $m_C = 1$, then:

• If Player $n$ applies $\pi^N$, she suffers a cost $\int_0^\infty e^{-\beta t}\, dt = 1/\beta$.

• If Player $n$ deviates from $\pi^N$ and chooses the action $D$, all players will also deviate after that time. This implies that $m_D(t) \approx 1 - e^{-t}$ and that Player $n$ suffers a cost approximately equal to $\int_0^\infty \big( x_C(t) + 2(1 - e^{-t}) \big) e^{-\beta t}\, dt \geq 2/(\beta(\beta+1))$ when $N$ is large.

When $\beta < 1$, we have $2/(\beta(\beta+1)) > 1/\beta$, so Player $n$ has no incentive to deviate from the strategy $\pi^N$; therefore, $\pi^N$ is a Nash equilibrium. We also observe that, for this example, the value of the finite game does not converge to the one of the mean field game.

In conclusion to this section, one can argue that this counterexample should not be surprising: in mean field games, punishment is possible against a fraction of the population that deviates, but not against an individual deviation, because such a deviation is not seen in the population distribution. As a final remark, as in the case of repeated games, the continuity of strategies with respect to $m$ (which does not hold here) is critical for convergence (see [35]).
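The threshold $\beta < 1$ in the argument above can be checked numerically. The sketch below is ours (the function names and the sample values of $\beta$ are illustrative, not from the paper): it compares the conforming cost $1/\beta$ with the lower bound $2/(\beta(\beta+1))$ on the deviation cost, and verifies the closed form of the bound by quadrature.

```python
import math

def conform_cost(beta):
    # Cost of following pi^N when m_C = 1: the integral of e^(-beta t) dt
    # over [0, infinity), which equals 1/beta.
    return 1.0 / beta

def deviation_lower_bound(beta):
    # After a deviation, m_D(t) ~ 1 - e^(-t), so the deviator's cost is at
    # least the integral of 2 (1 - e^(-t)) e^(-beta t) dt = 2/(beta (beta+1)).
    return 2.0 / (beta * (beta + 1.0))

def deviation_numeric(beta, horizon=100.0, steps=200_000):
    # Riemann-sum check of the closed form above (the tail beyond 'horizon'
    # is negligible for the betas used here).
    dt = horizon / steps
    return sum(2.0 * (1.0 - math.exp(-i * dt)) * math.exp(-beta * i * dt) * dt
               for i in range(steps))

for beta in (0.2, 0.5, 0.9):   # punishment is credible when beta < 1
    assert deviation_lower_bound(beta) > conform_cost(beta)
assert deviation_lower_bound(1.5) < conform_cost(1.5)  # threat too weak for beta > 1
assert abs(deviation_numeric(0.5) - deviation_lower_bound(0.5)) < 1e-2
```

For $\beta < 1$ the bound exceeds $1/\beta$, so deviating is unprofitable, while for $\beta > 1$ the discounted punishment is too weak, matching the condition in the text.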
4. Finite Horizon Case
Let us now consider mean field games over a finite time horizon $T$. These games are similar to the games with discounted costs defined previously, but they only run for a finite duration $T$. As in the discounted case, the evolution over time of the population distribution $m^\pi$ is given by (2) and the evolution of Player 0's distribution is given by (4). Given the population strategy $\pi$ and Player 0's strategy $\pi^0$, the expected cost of Player 0 for the finite horizon case is defined as follows:
$$V(\pi^0, \pi) = \int_0^T \Big( \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(t) \Big)\, dt. \qquad (18)$$
In the literature, similar models have been studied, considering continuous time finite state space mean field games with finite horizon. The authors of [21] consider uniformly convex cost functions, and in [27] the cost functions are assumed to be strictly convex. In our model, we only assume that the costs are continuous in the population distribution; moreover, the instantaneous cost of Player 0 is linear in $\pi^0$. Therefore, the model studied in this work is not covered by these papers.

We define the notion of mean field equilibrium for the finite horizon case as in the discounted case, by replacing the cost function (5) by (18). The proof of the existence Theorem 1 then applies mutatis mutandis to show the existence of a mean field equilibrium in this case: any continuous time mean field game over a finite horizon that satisfies Assumption (A1) has a mean field equilibrium.

The construction of the counter-example to convergence with an infinite time horizon given in the previous section does not carry over to the finite horizon: $\pi^N$ is not a Nash equilibrium for the $N$-player game because, at the last time-slot, the best response of Player $n$ to any strategy is to play $D$. By induction on the number of time-slots, the only Nash equilibrium of the $N$-player game is the one in which all players play $D$, which coincides with the mean field equilibrium. Yet, a counter-example also exists for the finite-time horizon.
The essential idea is to start from a matrix game with two pure Nash equilibria instead of one, as in the previous example. Let us consider the following cost matrix:

        C      D      P
  C    1,1    3,0    4,0
  D    0,3    2,2    4,3
  P    0,4    3,4    3,3

The setting is similar to the previous example: the action set is equal to the state space, $\mathcal{E} = \mathcal{A} = \{C, D, P\}$, and at each time step one player is chosen; if she selects an action $a \in \mathcal{A}$, her state becomes $a$ at the next time step. This game can be viewed as a generalization of the prisoner's dilemma with an additional Nash equilibrium $P$ (which stands for "punish"). It can be shown, following a similar path as in the previous section, that when $T$ is large enough the following time-dependent Markovian strategy is a Nash equilibrium:
$$\pi^N(m, t) = \begin{cases} C & \text{if } t < T \text{ and } m_C = 1,\\ D & \text{if } t \geq T \text{ and } m_P = 0,\\ P & \text{otherwise}.\end{cases} \qquad (19)$$
In this strategy, the state $P$ is used as a stick to punish players who deviate from the imposed strategy. Nobody has an incentive to deviate at the last step because $D$ is also a Nash equilibrium of the matrix game.

The mean field game has only two equilibria: the whole population always plays $D$, or the whole population always plays $P$. These equilibria are also equilibria of the finite game. Yet, they both have a larger cost than the strategy of Equation (19). This leads us to say that the value of the game does not converge: the asymptotic cost of the strategy (19) is strictly smaller than the cost of any of the mean field equilibria.
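The cost gap behind this non-convergence can be totalled directly from the diagonal of the cost matrix. The sketch below is ours (a lockstep mean-field approximation; the exact switching time of the punishing strategy is illustrative): it compares always playing $D$, always playing $P$, and cooperating until the end of the horizon.

```python
# Row player's per-round cost against a pure population, read off the 3x3
# cost matrix of the example (first entry of each cell).
COST = {('C', 'C'): 1, ('C', 'D'): 3, ('C', 'P'): 4,
        ('D', 'C'): 0, ('D', 'D'): 2, ('D', 'P'): 4,
        ('P', 'C'): 0, ('P', 'D'): 3, ('P', 'P'): 3}

def total_cost(profile):
    # 'profile' lists the common action played by the whole population at
    # each time slot; everyone pays the diagonal cost of that action.
    return sum(COST[(a, a)] for a in profile)

T = 50
always_D = ['D'] * T
always_P = ['P'] * T
punishing = ['C'] * (T - 1) + ['D']   # cooperate, then defect at the last slot

assert total_cost(punishing) < total_cost(always_D) < total_cost(always_P)
```

Per round, cooperation costs 1 against 2 for the all-$D$ equilibrium and 3 for the all-$P$ equilibrium, so the punishing strategy's cost stays strictly below that of either mean field equilibrium for any long horizon.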
5. Synchronous Players
As explained in the previous section, mean field games in continuous time appear naturally as limits of $N$-player asynchronous games as $N$ goes to infinity: in these asynchronous games, only one player changes state at a time. However, there are other situations in which it is more natural to consider synchronous games, where at each time step all players take an action. (When the time horizon is finite, it is also natural to consider Markovian strategies that depend on time.)

5.1. $N$-Player Games with Exchangeable Players

Here we consider a finite synchronous game $G^N_s = (N, \mathcal{E}, \mathcal{A}, \{P_a\}, M, \{c_a\}, \delta)$ with $N$ identical players, with several differences from the model used in Section 3.1, the main one being the replacement of the rate matrices by stochastic matrices. As before, each Player $n$ has an internal state $X_n(t)$ that belongs to a finite state space $\mathcal{E}$ (we write $X(t) = (X_0(t), \ldots, X_{N-1}(t))$) and chooses an action from a finite action space $\mathcal{A}$. The main difference with the previous asynchronous model is that at each time step $t \in \mathbb{Z}_+$, all players choose an action $A_n(t) \in \mathcal{A}$ simultaneously. We assume that a player in state $i$ who chooses action $a$ goes to state $j$ with probability $P_{ija}(X(t))$ and that, given $X(t)$, the evolutions of the players are independent. Furthermore, we assume that the players are exchangeable, i.e. for any permutation $\sigma$ of the $N$ players, $P_{ija}(X_0(t), \ldots, X_{N-1}(t)) = P_{ija}(X_{\sigma(0)}(t), \ldots, X_{\sigma(N-1)}(t))$. The exchangeability of the players implies that the dependence on $X(t)$ can be replaced by a dependence on the population distribution $M(t)$.
More precisely, for any state vectors $i, j \in \mathcal{E}^N$ and any action vector $a \in \mathcal{A}^N$, one can write:
$$P\big( X(t+1) = j \mid X(t) = i,\ A(t) = a,\ \mathcal{F}_t \big) = \prod_{n=0}^{N-1} P_{i_n j_n a_n}(M(t)), \qquad (20)$$
where $\mathcal{F}_t$ is the natural filtration of the game up to time $t$, $M(t)$ is the population distribution of $X(t)$, and, for all $i, j \in \mathcal{E}$ and $a \in \mathcal{A}$, $P_{ija}(m)$ forms a stochastic matrix that is continuous in $m$.

The instantaneous cost at time $t$ depends on the actions and states at time $t$. It is symmetric in all players, so it can be written as a function of the population distribution, $c_{X_n(t), A_n(t)}(M(t))$, and costs are discounted by a factor $\delta$ at each time step. Given a strategy $\pi^0$ used by Player 0 and a strategy $\pi$ used by all the others, the expected cost of Player 0 is:
$$V_N(\pi^0, \pi) = \mathbb{E}\left[ (1-\delta) \sum_{t=0}^{\infty} \delta^t\, c_{X_0(t), A_0(t)}(M^\pi(t)) \ \middle|\ A_0 \text{ is chosen w.r.t. } \pi^0,\ A_{n'} \text{ is chosen w.r.t. } \pi \text{ if } n' \neq 0 \right]. \qquad (21)$$

Synchronous games also admit mean field game limits. To construct this limit, let us consider a strategy $\pi$ such that $\pi_{i,a}(m)$ is the probability for a player to choose action $a$ given that she is in state $i$ and that $M(t) = m$. Assume that $M(0)$ converges in probability to some $m(0)$ as $N$ goes to infinity and that all players except Player 0 apply a strategy $\pi$ that is continuous in $m$. As shown in Theorem 1 of [19] (up to differences in notation, the mean field model of [19] is the same as Equation (20)), the population distribution $M^\pi(t)$ then converges in probability to a deterministic quantity $m^\pi(t)$ as $N$ goes to infinity, defined by
$$m^\pi_j(t+1) = \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} m^\pi_i(t)\, P_{ija}(m^\pi(t))\, \pi_{i,a}(m^\pi(t)). \qquad (22)$$
We denote by $\pi^0$ the strategy of Player 0. The probability $x_j(t)$ that Player 0 is in state $j \in \mathcal{E}$ evolves over time according to the following equation:
$$x_j(t+1) = \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} x_i(t)\, P_{ija}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)). \qquad (23)$$
In this limit, the cost of Player 0, given by (21), becomes
$$V(\pi^0, \pi) = (1-\delta) \sum_{t=0}^{\infty} \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} \delta^t\, x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)).$$
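The recursions (22) and (23) are straightforward to iterate. The sketch below is a toy instance of ours (the kernel, in which the chosen action determines the next state, and the inertial strategy are hypothetical, chosen only to illustrate the updates):

```python
STATES = (0, 1)
ACTIONS = (0, 1)

def P(i, j, a, m):
    # Hypothetical kernel P_ija(m): the chosen action determines the next
    # state. (Any stochastic matrix continuous in m would do.)
    return 1.0 if j == a else 0.0

def pi(i, a, m):
    # Hypothetical population strategy with inertia: keep the current state
    # with probability 0.7, switch with probability 0.3.
    return 0.7 if a == i else 0.3

def step(x, strategy, m):
    # One step of x_j(t+1) = sum_{i,a} x_i(t) P_ija(m(t)) strategy_ia(m(t)),
    # i.e. equation (23); with x = m and strategy = pi it is equation (22).
    return [sum(x[i] * P(i, j, a, m) * strategy(i, a, m)
                for i in STATES for a in ACTIONS) for j in STATES]

m = [0.9, 0.1]                  # initial population distribution
for _ in range(50):
    m = step(m, pi, m)          # iterate equation (22)

assert abs(sum(m) - 1.0) < 1e-9   # the update preserves the simplex
assert abs(m[0] - 0.5) < 1e-6     # the inertial strategy mixes to (1/2, 1/2)
```

The same `step` applied to Player 0's distribution $x$ with her own strategy implements (23); the population term $m$ is passed separately because both the kernel and the strategy may depend on it.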
As the evolution of $m$ is deterministic, for any closed-loop strategy $\pi_{i,a}(m(t))$ and any initial condition $m(0)$, there exists an open-loop strategy $\pi_{i,a}(t)$ that leads to the same values of $m^\pi(t)$ and the same cost. Hence, for the mean field model, one can replace any state-dependent strategy $\pi(m(t))$ in the above equations by a time-dependent strategy $\pi(t)$.

Player 0 chooses the strategy that minimizes her expected cost; when she does so, we say that she uses a best response to the population strategy $\pi$:
$$BR(\pi) = \arg\min_{\pi^0} V(\pi^0, \pi).$$
A strategy is said to be a mean field equilibrium if it is a fixed point of the best-response map, that is, $\pi^{MFE} \in BR(\pi^{MFE})$.

One of the difficulties in the analysis of continuous time mean field games is that the objects under consideration (the population distribution, the population strategy, Player 0's strategy...) are continuous functions of time. In the discrete time case, the model is significantly simpler since all these objects are vectors. Hence, the proof of the existence of a mean field equilibrium for continuous-time mean field games (Theorem 1) can be adapted to show the following result.
Theorem 3 (Mean field equilibrium existence for synchronous games). Any synchronous mean field game with discounted cost that satisfies Assumption (A1) for $P$ and $c$ has a mean field equilibrium.

Sketch of proof. We first observe that the set of discrete-time open-loop policies is a compact and convex set. To finish the proof, we need to show that the best-response map has a closed graph and convex values. The former holds because the set of open-loop policies lives in a finite dimensional space, together with the continuity assumptions (A1). The latter can be shown using the same arguments as in the proof of Theorem 1. □
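Theorem 3 asserts existence via a fixed-point argument; it does not provide an algorithm, and plain best-response iteration need not converge. Still, the fixed-point structure can be illustrated on a toy congestion game (our construction, not from the paper): each player myopically prefers the less crowded of two states, and a damped best-response iteration with vanishing step sizes settles at the symmetric equilibrium $m = (1/2, 1/2)$.

```python
def best_response(m):
    # Myopic best response to the population distribution m: move to the
    # least crowded state, splitting equally in case of a tie.
    lo = min(m)
    winners = sum(1 for mi in m if mi == lo)
    return [1.0 / winners if mi == lo else 0.0 for mi in m]

def damped_iteration(m, steps=2000):
    # m <- (1 - eta_k) m + eta_k BR(m), with vanishing step sizes eta_k,
    # so the oscillation induced by the discontinuous BR dies out.
    for k in range(steps):
        eta = 1.0 / (k + 2)
        br = best_response(m)
        m = [(1.0 - eta) * mi + eta * bi for mi, bi in zip(m, br)]
    return m

m = damped_iteration([0.9, 0.1])
assert abs(sum(m) - 1.0) < 1e-9
assert abs(m[0] - 0.5) < 0.01   # converges to the symmetric fixed point
```

The damping is essential here: with a fixed step size the iteration cycles around the equilibrium, which is one reason the theorem is proved with a fixed-point theorem rather than with an algorithmic argument.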
The classical repeated games with discounted costs and identical players form a subclass of the synchronous games defined here. To see this, first consider a static $N$-player matrix game $G$ with symmetric cost: $u(a_1, \ldots, a_N)$ is the instantaneous cost of any player when the players use actions $a_1, \ldots, a_N$, and $u(a_1, \ldots, a_N) = u(a_{\sigma(1)}, \ldots, a_{\sigma(N)})$ for any permutation $\sigma$ of $\{1, \ldots, N\}$. The players repeat the matrix game infinitely often, and their cost under strategies $\pi_1, \ldots, \pi_N$ is the discounted sum of the stage costs:
$$V_N(\pi_1, \ldots, \pi_N) = (1-\delta) \sum_{t=0}^{\infty} \delta^t\, u(\pi_1(t), \ldots, \pi_N(t)). \qquad (24)$$
These games fit in our framework: the state of a player is merely her current action ($X_n(t) = A_n(t)$) and the state evolution is trivial: in state $x = a$, selecting action $b$ leads to state $b$ with probability one, regardless of the other players, i.e. $P_{abb}(M(t)) = 1$. The cost of one player at each stage corresponds to an immediate cost $c_{X_n(t), A_n(t)}(M(t)) = u(X(t))$, since by symmetry the cost $u$ only depends on the population distribution. As for the total cost of a player, (24) coincides with (21) as long as all players in the same state use the same strategy.

The relation between the equilibria of $N$-player games and their mean field limits is also complex in the discrete time case. Let us first focus on the performance of mean field equilibria in the $N$-player game. The situation is similar to the continuous time case and resembles Theorem 2(i): if $\pi$ is a mean field equilibrium then, under Assumption (A1), for every $\varepsilon > 0$ there exists $N_0$ such that for all $N \geq N_0$, $\pi$ is a local $\varepsilon$-equilibrium of the $N$-player game. The proof is essentially the same as that of Theorem 2. Let us now consider the Nash equilibria of the $N$-player game.
The situation is very different from the continuous time case: in discrete time, the states of all the players can change within one time unit, while in continuous time the state changes in small steps, one player at a time. This has several consequences on the nature of equilibria in the two models. As mentioned before, the Nash equilibria in the continuous time case may depend on the initial population distribution; this is not the case here, so there is more latitude for designing equilibria.

Let us consider the particular case of repeated games, introduced in Section 5.2.1. For this type of game, the set of equilibria can be characterized by the Folk theorem for repeated games.

Theorem 4 (Folk theorem, adapted from Theorem A in [18]). Let $G$ be a symmetric matrix game, and let $V^*$ be the cost under the strategy that repeats the Nash equilibrium of the static game $G$. Then for any compatible cost $V$ smaller than $V^*$ (in this context, a compatible cost is a cost that can be attained by at least one strategy), there exists a discount factor $\delta \in (0, 1)$ such that $V$ is the cost of an equilibrium of the discounted repeated game.

Actually, for any
$V < V^*$, the construction of an equilibrium whose cost is $V$ is based on the "tit for tat" principle. We claim that none of these equilibria scales to the mean field limit. Consider the following static game. Each player has two actions, $D$ and $C$. If all players play $D$, the cost of each player is $-1$; if all players play $C$, the cost is $-2$. If some players play $D$ and others play $C$, then all the players who play $C$ get $-2M_C$ while the players who play $D$ get $-3M_C - M_D$, where $M_C$ and $M_D$ are the proportions of players playing $C$ and $D$ respectively. These costs correspond to the average costs obtained by a player in a matching game against a random opponent.

The unique Nash equilibrium of the static game is the strategy $(D, D, \ldots, D)$, and the cost of the corresponding repeated game is $(1-\delta) \sum_t (-\delta^t) = -1$. Now consider the following strategy (called $\pi^N$ in the following) for all players: play $D$ for $k$ rounds, then play $C$ as long as every other player has followed the same pattern, else play $D$ forever. The cost of this strategy is between $-1$ and $-2$:
$$(1-\delta) \left( \sum_{t=0}^{k-1} (-\delta^t) + \sum_{t=k}^{\infty} (-2\delta^t) \right) = -1 - \delta^k.$$
The strategy $\pi^N$ is an equilibrium of the finite game if $\delta$ is large enough. Indeed, no player wants to deviate in the first $k$ rounds, because her cost would increase; and in the rounds after $k$, a deviation provides an immediate cost advantage at the price of being punished forever after, so a large enough $\delta$ makes deviations non-profitable.

Let us now consider the mean field game setting. If the whole population uses the strategy $\pi^N$ and Player 0 uses the same strategy, her cost is
$$V(\pi^N, \pi^N) = (1-\delta) \sum_{t=0}^{\infty} \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} \delta^t\, x_i(t)\, c_{i,a}(m^\pi(t))\, \pi_{i,a}(m^\pi(t)) = (1-\delta) \left( \sum_{t=0}^{k-1} (-\delta^t) + \sum_{t=k}^{\infty} (-2\delta^t) \right) = -1 - \delta^k.$$
However, in the mean field setting, the best response of Player 0 to $\pi^N$ is not $\pi^N$ but the strategy $\pi^D$ in which she plays $D$ all the time. Indeed, in this case her total cost becomes
$$V(\pi^D, \pi^N) = (1-\delta) \left( \sum_{t=0}^{k-1} (-\delta^t) + \sum_{t=k}^{\infty} (-3\delta^t) \right) = -1 - 2\delta^k.$$
This shows that $\pi^N$ is not a mean field equilibrium: a "free rider" can take advantage of the fact that the population will not react against her.

We now focus on mean field games in which the objects evolve in discrete time over a finite horizon, from 0 to $T$. In this case, the population distribution $m^\pi$ is defined by (22), which depends on the strategy $\pi$ of the mass, while Player 0 chooses her own strategy $\pi^0$. The expected cost of Player 0 is
$$V(\pi^0, \pi) = \sum_{t=0}^{T} \sum_{i \in \mathcal{E}} \sum_{a \in \mathcal{A}} x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)),$$
where $x_i(t)$ is the probability that Player 0 is in state $i$ at time $t$; the evolution of $x_i(t)$ over time is described in (23). Player 0 uses a best response to a given population strategy $\pi$, which means that she selects the strategy $\pi^0$ that minimizes her expected cost. We are interested in proving the existence of a mean field equilibrium, which amounts to finding a strategy that is a fixed point of the best-response map.
In Section 5.2, we showed this for the discounted case. In the finite horizon case, the vectors have finite size and, as a consequence, it is immediate to show, using the same arguments as in the proof of Theorem 3, that any discrete time mean field game with finite horizon cost such that $P$ and $c$ satisfy Assumption (A1) has a mean field equilibrium. Again, the proof mimics the proof of the analogous Theorem 1 for continuous time over a finite horizon.
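Returning to the repeated-game example of Section 5.2.1, the free-rider argument reduces to comparing two geometric sums. The sketch below is ours; it assumes, consistently with the costs of the example, that a round costs $-1$ while everyone defects and $-2$ while everyone cooperates, and that a lone defector facing an all-cooperating population pays $-3$ per round (the temptation payoff; this coefficient is our reading of the partially garbled example).

```python
def discounted(head, tail, delta, k):
    # Normalized discounted cost (1 - delta) * sum_t delta^t c_t when the
    # per-round cost is 'head' for t < k and 'tail' for t >= k.
    return head * (1.0 - delta**k) + tail * delta**k

delta, k = 0.95, 5
conform = discounted(-1.0, -2.0, delta, k)    # Player 0 follows pi^N
free_ride = discounted(-1.0, -3.0, delta, k)  # Player 0 plays D forever

assert abs(conform - (-1.0 - delta**k)) < 1e-12        # matches -1 - delta^k
assert abs(free_ride - (-1.0 - 2.0 * delta**k)) < 1e-12  # matches -1 - 2 delta^k
assert free_ride < conform   # defecting forever is strictly better: no MFE
```

The gap $\delta^k$ between the two costs is exactly what an invisible deviation gains in the mean field limit, whereas in the $N$-player game the punishment phase removes it.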
6. Conclusions
In this article, we generalize the framework of discrete-space mean field games to the case of non-convex costs and explicit interactions. Such games strike a good compromise between tractability (existence of equilibria) and modeling power (including propagation and congestion behaviors). The model consists of a finite state space mean field game in which the transition rates of the objects and the cost function of a generic object depend not only on the actions taken but also on the population distribution. We also show that there exists a sub-class of Nash equilibria of $N$-player games that converge to mean field equilibria when the number of players goes to infinity. Outside of this class, and in particular for all equilibria based on the "tit for tat" principle on which the Folk theorem relies, convergence does not hold.

For future work, we are interested in finding conditions ensuring the uniqueness of the mean field equilibrium. We believe that monotonicity assumptions similar to those of [21] are required to prove the existence of a unique mean field equilibrium in this model. Another interesting open question concerns the convergence of $N$-player equilibria to mean field equilibria as the number of players grows. We believe that there exist many $N$-player games for which the only limiting equilibria are mean field equilibria, for example when players have incomplete information about the game. It would be interesting to characterize the sub-class of strategies for which convergence to mean field equilibria holds; this class includes all local strategies (no information) and excludes some Markovian ones (full information).

References

[1] S. Adlakha, R. Johari, and G. Y. Weintraub. Equilibria of dynamic games with many players: Existence, approximation, and market structure.
Journal of Economic Theory, 2015.
[2] N. I. Al-Najjar and R. Smorodinsky. Large nonanonymous repeated games. Games and Economic Behavior, 37:26–39, 2001.
[3] D. M. Ambrose. Strong solutions for time-dependent mean field games with non-separable Hamiltonians. Journal de Mathématiques Pures et Appliquées, 113:141–154, 2018.
[4] R. Basna, A. Hilbert, and V. N. Kolokoltsov. An epsilon-Nash equilibrium for non-linear Markov games of mean-field-type on finite spaces. Commun. Stoch. Anal., 8(4):449–468, 2014.
[5] E. Bayraktar and A. Cohen. Analysis of a finite state many player game using its master equation. arXiv preprint arXiv:1707.02648, 2017.
[6] M. Benaim and J.-Y. Le Boudec. A class of mean field interaction models for computer and communication systems. Performance Evaluation, 65(11):823–838, 2008.
[7] A. Bensoussan, J. Frehse, and P. Yam. Mean Field Games and Mean Field Type Control Theory. Springer, 2013.
[8] K. C. Border. Fixed Point Theorems with Applications to Economics and Game Theory. Cambridge University Press, 1989.
[9] P. Cardaliaguet, F. Delarue, J.-M. Lasry, and P.-L. Lions. The master equation and the convergence problem in mean field games. arXiv preprint arXiv:1509.02505, 2015.
[10] R. Carmona and F. Delarue. Probabilistic analysis of mean-field games. SIAM Journal on Control and Optimization, 51(4):2705–2734, 2013.
[11] R. Carmona and F. Delarue. Probabilistic analysis of mean-field games. SIAM J. Control Optim., 51(4):2705–2734, 2013.
[12] R. Carmona, D. Lacker, et al. A probabilistic weak formulation of mean field games and applications. The Annals of Applied Probability, 25(3):1189–1231, 2015.
[13] R. Carmona and P. Wang. Finite state mean field games with major and minor players. arXiv preprint arXiv:1610.05408, 2016.
[14] A. Cecchin and M. Fischer. Probabilistic approach to finite state mean field games. Applied Mathematics & Optimization, Mar 2018.
[15] P. Dasgupta and E. Maskin. The existence of equilibrium in discontinuous economic games, I: Theory. Review of Economic Studies, 53(1):1–26, 1986.
[16] J. Doncel, N. Gast, and B. Gaujal. Are mean-field games the limits of finite stochastic games? SIGMETRICS Perform. Eval. Rev., 44(2):18–20, Sept. 2016.
[17] A. M. Fink. Equilibrium in a stochastic n-person game. J. Sci. Hiroshima Univ. Ser. A-I Math., 28(1):89–93, 1964.
[18] D. Fudenberg and E. Maskin. The folk theorem in repeated games with discounting or with incomplete information. Econometrica, 54(3):533–554, 1986.
[19] N. Gast and B. Gaujal. A mean field approach for optimization in discrete time. Discrete Event Dynamic Systems, 21(1):63–101, 2011.
[20] D. A. Gomes, J. Mohr, and R. R. Souza. Discrete time, finite state space mean field games. Journal de Mathématiques Pures et Appliquées, 93(3):308–328, 2010.
[21] D. A. Gomes, J. Mohr, and R. R. Souza. Continuous time finite state mean field games. Applied Mathematics & Optimization, 68(1):99–143, 2013.
[22] D. A. Gomes and E. Pimentel. Time-dependent mean-field games with logarithmic nonlinearities. SIAM Journal on Mathematical Analysis, 47(5):3798–3812, 2015.
[23] D. A. Gomes, E. Pimentel, and H. Sánchez-Morgado. Time-dependent mean-field games in the superquadratic case. ESAIM: Control, Optimisation and Calculus of Variations, 22(2):562–580, 2016.
[24] D. A. Gomes and E. A. Pimentel. Regularity for mean-field games systems with initial-initial boundary conditions: The subquadratic case. In Dynamics, Games and Science, pages 291–304. Springer, 2015.
[25] D. A. Gomes, E. A. Pimentel, and H. Sánchez-Morgado. Time-dependent mean-field games in the subquadratic case. Communications in Partial Differential Equations, 40(1):40–76, 2015.
[26] A. Granas and J. Dugundji. Fixed Point Theory. Springer Science & Business Media, 2013.
[27] O. Guéant. Existence and uniqueness result for mean field games with congestion effect on graphs. Applied Mathematics & Optimization, 72(2):291–303, 2014.
[28] O. Guéant, J.-M. Lasry, and P.-L. Lions. Mean field games and applications. In Paris-Princeton Lectures on Mathematical Finance 2010, volume 2003 of Lecture Notes in Mathematics, pages 205–266. Springer Berlin Heidelberg, 2011.
[29] M. Huang. Mean field stochastic games with discrete states and mixed players. In Game Theory for Networks, pages 138–151. Springer, 2012.
[30] M. Huang, R. Malhame, and P. Caines. Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Communications in Information and Systems, 6(3):221–252, 2006. Special issue in honor of the 65th birthday of Tyrone Duncan.
[31] D. Lacker. A general characterization of the mean field limit for stochastic differential games. Probability Theory and Related Fields, 165(3), Aug 2016.
[32] J.-M. Lasry and P.-L. Lions. Jeux à champ moyen. I – Le cas stationnaire. Comptes Rendus Mathématique, 343(9):619–625, 2006.
[33] J.-M. Lasry and P.-L. Lions. Jeux à champ moyen. II – Horizon fini et contrôle optimal. Comptes Rendus Mathématique, 343(10):679–684, 2006.
[34] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(1):229–260, 2007.
[35] H. Sabourian. Anonymous repeated games with a large number of players and random outcomes. Journal of Economic Theory, 51:92–110, 1990.
[36] W. Sandholm. Population Games and Evolutionary Dynamics. MIT Press, 2010.
[37] H. Tembine. Mean field stochastic games: convergence, Q/H-learning and optimality. In American Control Conference (ACC), 2011, pages 2423–2428. IEEE, 2011.
[38] H. Tembine, J.-Y. L. Boudec, R. El-Azouzi, and E. Altman. Mean field asymptotics of Markov decision evolutionary games and teams. In Game Theory for Networks, 2009. GameNets '09. International Conference on, pages 140–150. IEEE, 2009.
[39] Z. Wang, C. T. Bauch, S. Bhattacharyya, A. d'Onofrio, P. Manfredi, M. Perc, N. Perra, M. Salathé, and D. Zhao. Statistical physics of vaccination.