Binary Mean Field Stochastic Games: Stationary Equilibria and Comparative Statics
Minyi Huang and Yan Ma ∗

Minyi Huang: School of Mathematics and Statistics, Carleton University, Ottawa, ON K1S 5B6, Canada. This author was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Yan Ma: School of Mathematics and Statistics, Zhengzhou University, 450001, Henan, China. This author was supported by the National Science Foundation of China (No. 11601489).

∗ In IMA volume Modeling, Stochastic Control, Optimization, and Applications, eds. G. Yin and Q. Zhang, Springer, 2019, pp. 283-313. Submitted Dec 2018; revised Feb 2019. This version: Oct 10, 2020. Minor changes in Example 2.
Abstract
This paper considers mean field games in a multi-agent Markov decision process (MDP) framework. Each player has a continuum state and binary action, and benefits from the improvement of the condition of the overall population. Based on an infinite horizon discounted individual cost, we show existence of a stationary equilibrium, and prove its uniqueness under a positive externality condition. We further analyze comparative statics of the stationary equilibrium by quantitatively determining the impact of the effort cost.
1 Introduction

Mean field game theory provides a powerful methodology for reducing complexity in the analysis and design of strategies in large population dynamic games [25, 30, 37]. Following ideas in statistical physics, it takes a continuum approach to specify the aggregate impact of many individually insignificant players and solves a special stochastic optimal control problem from the point of view of a representative player. By this methodology, one may construct a set of decentralized strategies for the original large but finite population model and show its ε-Nash equilibrium property [25, 26, 30].
A related solution notion in Markov decision models is the oblivious equilibrium [55]. The readers are referred to [12, 16, 17, 18, 19] for an overview of mean field game theory and further references. For mean field type optimal control, see [12, 56], but the analysis in these models only involves a single decision maker.

Dynamic games within an MDP setting originated from the work of Shapley and are called stochastic games [21, 50]. Their mean field game extension has been studied in the literature; see e.g. [3, 13, 46, 55]. Continuous time mean field games with finite state space can be found in [22, 35]. Our previous work [27, 28] studied a class of mean field games in a multi-agent Markov decision process (MDP) framework. The players in [27] have continuum state spaces and binary action spaces, and have coupling through their costs. The state of each player is used to model its risk (or unfitness) level, which has random increase if no active control is taken. Naturally, the one-stage cost of a player is an increasing function of its own state apart from coupling with others. The motivation of this modeling framework comes from applications including network security investment games and flu vaccination games [34, 38, 40]; when the one-stage cost is an increasing function of the population average state, it reflects positive externalities. Markov decision processes with binary action spaces also arise in control of queues and machine replacement problems [4, 10]. Binary choice models have formed a subject of significant interest [8, 15, 48, 49, 54]. Our game model has a connection with anonymous sequential games [33], which combine stochastic game modeling with a continuum of players. In anonymous sequential games one determines the equilibrium as a joint state-action distribution of the population and leaves the individual strategies unspecified [33, Sec. 4], although there is an interpretation of randomized actions for players sharing a given state.

For both anonymous games and MDP based mean field games, stationary solutions with discount have been studied in the literature [3, 33]. These works focus on fixed point analysis to prove the existence of a stationary distribution. This approach does not address ergodic behavior of individuals or the population, while assuming the population starts from the steady-state distribution at the initial time. Thus, there is a need to examine whether the individuals collectively have the ability to move into that distribution at all when they have a general initial distribution. Our ergodic analysis based approach will provide justification of the stationary solution regarding the population's ability to settle down around the limiting distribution.

The previous work [27, 28] studied the finite horizon mean field game by showing existence of a solution with threshold policies, and under an infinite horizon discounted cost further proved there is at most one stationary equilibrium, for which existence was not established. A similar continuous time model is introduced in [57], which addresses Poisson state jumps and impulse control.
It should be noted that except for linear-quadratic models [9, 26, 31, 39, 43], mean field games rarely have closed-form solutions and often rely on heavy numerical computations. Within this context, the consideration of structured solutions, such as threshold policies, is of particular interest from the point of view of efficient computation and simple implementation. Under such a policy, the individual states evolve as regenerative processes [6, 51].
By exploiting stochastic monotonicity, this paper adopts more general state transition assumptions than in [27, 28] and continues the analysis of the stationary equation system. The first contribution of the present paper is the proof of the existence of a stationary equilibrium. Our analysis depends on checking the continuous dependence of the limiting state distribution on the threshold parameter in the best response. The existence and uniqueness analysis in this paper has appeared in a preliminary form in the conference paper [29].

A key parameter in our game model is the effort cost. Intuitively, this parameter is a disincentive indicator of an individual for taking active efforts, and in turn will further impact the mean field forming the ambient environment of that agent. This suggests that we can study a family of mean field games parametrized by the effort costs and compare their solution behaviors. We address this in the setup of comparative statics, which have a long history in the economics literature [24, 42, 47] and operations research [53] and provide the primary means to analyze the effect of model parameter variations. For dynamic models, such as economic growth models, the analysis follows similar ideas and is sometimes called comparative dynamics [5, 11, 45, 47], comparing two dynamic equilibria. In control and optimization, such studies are usually called sensitivity analysis [14, 20, 32]. For comparative statics in large static games and mean field games, see [1, 2]. Our analysis is accomplished by performing perturbation analysis around the equilibrium of the mean field game.

The paper is organized as follows. Section 2 introduces the mean field stochastic game. The best response is analyzed in Section 3. Section 4 proves existence and uniqueness of stationary equilibria. Comparative statics are analyzed in Section 5. Section 6 concludes the paper.
2 The Mean Field Stochastic Game

The system consists of N players denoted by A_i, 1 ≤ i ≤ N. At time t ∈ Z_+ = {0, 1, 2, ...}, the state of A_i is denoted by x_t^i, and its action by a_t^i. For simplicity, we consider a population of homogeneous (or symmetric) players. Each player has state space S = [0, 1] and action space A = {a_0, a_1}. A value of S may be interpreted as a risk or unfitness level. A player can either take inaction (as a_0) or make an active effort (as a_1). For an interval I, let B(I) denote the Borel σ-algebra of I.

The state of each player evolves as a controlled Markov process, which is affected only by its own action. For t ≥ 0 and x ∈ S, the state has a transition kernel specified by

P(x_{t+1}^i ∈ B | x_t^i = x, a_t^i = a_0) = Q(B|x),   (1)
P(x_{t+1}^i = 0 | x_t^i = x, a_t^i = a_1) = 1,   (2)

where Q(·|x) is a stochastic kernel defined for B ∈ B(S) and Q([x, 1]|x) = 1.
By the structure of Q, the state of the player deteriorates if no active control is taken. The vector process (x_t^1, ..., x_t^N) constitutes a controlled Markov process in higher dimension with its transition kernel defining a product measure on (B(S))^N for given (x_t^1, ..., x_t^N, a_t^1, ..., a_t^N).

Define the population average state x_t^{(N)} = (1/N) ∑_{i=1}^N x_t^i. The one stage cost of A_i is

c(x_t^i, x_t^{(N)}, a_t^i) = R(x_t^i, x_t^{(N)}) + γ 1{a_t^i = a_1},

where γ > 0 is the effort cost. The function R ≥ 0 is defined on S × S and models the risk-related cost. Let ν_i denote the strategy of A_i. We introduce the infinite horizon discounted cost

J_i(x_0^1, ..., x_0^N, ν_1, ..., ν_N) = E ∑_{t=0}^∞ β^t c(x_t^i, x_t^{(N)}, a_t^i),  1 ≤ i ≤ N.   (3)

The standard methodology of mean field games may be applied by approximating {x_t^{(N)}, t ≥ 0} by a deterministic sequence {z_t, t ≥ 0} which depends on the initial condition of the system. One may solve the limiting optimal control problem of A_i and derive a dynamic programming equation for its value function denoted by v_i(t, x, (z_k)_{k=0}^∞), whose dependence on t is due to the time-varying sequence {z_t, t ≥ 0}. Subsequently one derives another equation for the mean field {z_t, t ≥ 0} by averaging the individual states across the population. This approach, however, has the drawback of heavy computational load.

We are interested in a steady-state form of the solution of the mean field game starting with {z_t, t ≥ 0}. Such steady state equations provide information on the long time behavior of the solution and are of interest in their own right. They may also be used for approximation purposes to compute strategies efficiently. We introduce the system

v(x) = min[ β ∫ v(y) Q(dy|x) + R(x, z), β v(0) + R(x, z) + γ ],   (4)
z = ∫ x μ(dx),   (5)

where μ is a probability measure on S. We say (v, z, μ, a_i(·)) is a stationary equilibrium to (4)-(5) if i) the feedback policy a_i(·), as a mapping from S to {a_0, a_1}, is the best response with respect to z in (4), and ii) given an initial distribution of x_0^i, {x_t^i, t ≥ 0} under the policy a_i has its distribution converging (under a total variation norm or only weakly) to the stationary distribution (also called limiting distribution) μ.

We may interpret v as the value function of an MDP with cost J̄_i(x_0^i, z, ν_i) = E ∑_{t=0}^∞ β^t c(x_t^i, z, a_t^i). An alternative way to interpret (4)-(5) is that the initial state of A_i has been sampled according to the "right" distribution μ, and that z is obtained by averaging an infinite number of such initial values by the law of large numbers [52]. A similar solution notion is adopted in [2, 3] but ergodicity is not part of their solution specification.
Let the probability measure μ_k be the distribution of the R-valued random variable Z_k, k = 1, 2. We say μ_2 stochastically dominates μ_1, and denote μ_1 ≤st μ_2, if μ_2((y, ∞)) ≥ μ_1((y, ∞)) (or equivalently, P(Z_2 > y) ≥ P(Z_1 > y)) for all y. It is well known [44] that μ_1 ≤st μ_2 if and only if

∫ ψ(y) μ_1(dy) ≤ ∫ ψ(y) μ_2(dy)   (6)

for all increasing functions ψ (not necessarily strictly increasing) for which the two integrals are finite. A stochastic kernel Q(B|x), 0 ≤ x ≤ 1, B ∈ B(S), is said to be strictly stochastically increasing if ϕ(x) := ∫_S ψ(y) Q(dy|x) is strictly increasing in x ∈ S for any strictly increasing function ψ: [0, 1] → R, for which the integral is necessarily finite. Q(·|x) is said to be weakly continuous if ϕ is continuous whenever ψ is continuous.
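To make the dominance criterion concrete, the following small Python check (illustrative only; the Beta distributions and the test function ψ(y) = y³ are our own choices, not from the paper) compares empirical survival functions and the integral criterion (6).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sample distributions on [0,1]; Beta(2,5) is "smaller" than Beta(5,2).
z1 = rng.beta(2, 5, 200_000)
z2 = rng.beta(5, 2, 200_000)

# Survival-function criterion: P(Z1 > y) <= P(Z2 > y) for all y.
ys = np.linspace(0, 1, 201)
surv1 = np.array([(z1 > y).mean() for y in ys])
surv2 = np.array([(z2 > y).mean() for y in ys])
print("mu_1 <=st mu_2 (empirical):", bool(np.all(surv1 <= surv2 + 1e-3)))

# Equivalent integral criterion (6) with the increasing test function y^3.
print("E psi(Z1) <= E psi(Z2):", np.mean(z1**3) <= np.mean(z2**3))
```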
Let {Y_t, t ≥ 0} be a Markov process with state space [0, 1], transition kernel Q(·|x) and initial state Y_0 = 0. So each of its trajectories is monotonically increasing. Define τ_Q^θ = inf{t | Y_t ≥ θ} for θ ∈ (0, 1). It is clear that τ_Q^{θ_1} ≤ τ_Q^{θ_2} for 0 < θ_1 < θ_2 < 1. We introduce the following assumptions.

(A1) {x_0^i, i ≥ 1} are i.i.d. random variables taking values in S.

(A2) R(x, z) is a continuous function on S × S. For each fixed z, R(·, z) is strictly increasing.

(A3) i) Q(·|x) satisfies Q([x, 1]|x) = 1, and is strictly stochastically increasing; ii) Q(dy|x) is weakly continuous and has a positive probability density q(y|x) for each fixed x < 1; iii) for any small 0 < δ < 1, inf_x Q([1 − δ, 1]|x) > 0.

(A4) R(x, ·) is increasing for each fixed x.

(A5) lim_{θ↑1} E τ_Q^θ = ∞.

(A3)-iii) will be used to ensure the uniform ergodicity of the controlled Markov process. In fact, under (A3) we can show E τ_Q^θ < ∞. The following condition is a special case of (A3).

(A3′) Q(·|x) is equal to the law of x + (1 − x)ξ for some random variable ξ with probability density f_ξ(x) > 0, a.e. x ∈ S.

When (A3′) holds, we can verify (A5) by analyzing the stopping time τ_ξ = inf{t | ∏_{s=1}^t (1 − ξ_s) ≤ 1 − θ}, where {ξ_s, s ≥ 1} is a sequence of i.i.d. random variables with probability density f_ξ. For existence analysis of the mean field game, (A5) will be used to ensure continuity of the mean field when the threshold θ approaches 1.
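As an illustration of (A5) under (A3′), the following Monte Carlo sketch (our own example; ξ ~ Beta(1, 3) is an assumed density with f_ξ > 0 on (0, 1)) estimates E τ_Q^θ and shows its growth as θ ↑ 1. Here 1 − Y_t = ∏_{s≤t}(1 − ξ_s), so E τ_Q^θ grows roughly like |log(1 − θ)|.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_hitting_time(theta, n_paths=2000, t_max=100_000):
    """Estimate E[tau_Q^theta] for Y_{t+1} = Y_t + (1 - Y_t)*xi, xi ~ Beta(1,3)."""
    total = 0
    for _ in range(n_paths):
        y, t = 0.0, 0
        while y < theta and t < t_max:
            y += (1.0 - y) * rng.beta(1, 3)
            t += 1
        total += t
    return total / n_paths

for theta in [0.5, 0.9, 0.99, 0.999]:
    print(f"theta={theta}: E[tau] ~ {mean_hitting_time(theta):.2f}")
```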
Proposition 1. The two conditions are equivalent:

i) μ_1 ≤st μ_2, and μ_1 ≠ μ_2;

ii) ∫ φ(y) μ_1(dy) < ∫ φ(y) μ_2(dy) for all strictly increasing functions φ for which both integrals are finite.

Proof. Assume i) holds. By [44, Theorem 1.2.16], we have

φ(Z_1) ≤st φ(Z_2),   (7)

and so E φ(Z_1) ≤ E φ(Z_2). Since μ_1 ≠ μ_2, there exists y_0 such that P(Z_1 > y_0) ≠ P(Z_2 > y_0). Take r_0 such that φ(y_0) = r_0. Then

P(φ(Z_1) > r_0) ≠ P(φ(Z_2) > r_0).   (8)

If E φ(Z_1) = E φ(Z_2) were true, by (7) and [44, Theorem 1.2.9], φ(Z_1) and φ(Z_2) would have the same distribution, which contradicts (8). We conclude E φ(Z_1) < E φ(Z_2), which is equivalent to ii).

Next we show ii) implies i). Let ψ be any increasing function satisfying (6) with two finite integrals. When ii) holds, we take φ_ε(y) = ψ(y) + ε y/(1 + |y|), ε > 0. Then ∫ φ_ε dμ_1 < ∫ φ_ε dμ_2 holds for all ε > 0. Letting ε → 0, (6) follows and μ_1 ≤st μ_2. It is clear that μ_1 ≠ μ_2. ⊓⊔

3 The Best Response

For this section we assume (A1)-(A3). We take any fixed z ∈ [0, 1] and consider (4) as a separate equation, which is rewritten below:

v(x) = min{ β ∫ v(y) Q(dy|x) + R(x, z), β v(0) + R(x, z) + γ }.   (9)

Here z is not required to satisfy (5). In relation to the mean field game, the resulting optimal policy will be called the best response with respect to z. Denote G(x) = ∫ v(y) Q(dy|x).

Lemma 1. i) Equation (9) has a unique solution v ∈ C([0, 1], R).

ii) v is strictly increasing.

iii) The optimal policy is determined as follows:

a) If β G(1) < β v(0) + γ, a_i(x) ≡ a_0.

b) If β G(1) = β v(0) + γ, a_i(1) = a_1 and a_i(x) = a_0 for x < 1.

c) If β G(0) ≥ β v(0) + γ, a_i(x) ≡ a_1.

d) If β G(0) < β v(0) + γ < β G(1), there exists a unique x* ∈ (0, 1) and a_i is a threshold policy with parameter x*, i.e., a_i(x) = a_1 if x ≥ x* and a_i(x) = a_0 if x < x*.

Proof. Define the dynamic programming operator

(L g)(x) = min{ β ∫ g(y) Q(dy|x) + R(x, z), β g(0) + R(x, z) + γ },   (10)

which is from C([0, 1], R) to itself. The proving method in [27], [28, Lemma 6], which assumed (A3′), can be extended to the present equation (9) in a straightforward manner. In particular, for the proof of ii) and iii), we obtain progressively stronger properties of v and G. First, denoting g_0 = 0 and g_{k+1} = L g_k for k ≥ 0,
we use a successive approximation procedure to show that v is increasing, which implies that G is continuous and increasing by weak continuity and monotonicity of Q. Since R is strictly increasing in x, by the right hand side of (9), we show that v is strictly increasing, which implies the same property for G by strict monotonicity of Q. ⊓⊔

For the optimal policy specified in part iii) of Lemma 1, we can formally denote the threshold parameters for the corresponding cases: a) θ = 1+, b) θ = 1, c) θ = 0, and d) θ = x*. Such a policy will be called a θ-threshold policy.
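As a computational illustration of Lemma 1, the following sketch runs value iteration with the operator (10) on a grid and reads off the threshold; the kernel (uniform on [x, 1], as in Example 1 below) and the numerical values of β, γ, c, z are our own illustrative choices.

```python
import numpy as np

beta, gamma, c, z = 0.8, 0.5, 0.2, 0.3
n = 1001
xs = np.linspace(0, 1, n)
R = xs * (c + z)                     # R(x,z) = x(c+z), cf. Example 1
v = np.zeros(n)

for _ in range(3000):
    # G(x) = int v(y) Q(dy|x) with Q(.|x) = Uniform[x,1] (Riemann sum).
    G = (np.cumsum(v[::-1]) / np.arange(1, n + 1))[::-1]
    v_new = np.minimum(beta * G + R, beta * v[0] + R + gamma)
    if np.max(np.abs(v_new - v)) < 1e-12:
        break
    v = v_new

active = beta * G >= beta * v[0] + gamma        # region where a_1 is optimal
theta = xs[active.argmax()] if active.any() else None   # None encodes "1+"
print("threshold:", theta)
```

Since R(x, z) enters both branches of (9) identically, the action comparison reduces to β G(x) versus β v(0) + γ, which is how the code extracts the threshold.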
We give the condition for θ = 0 in the next lemma.

Lemma 2. For γ > 0 and v solving (9),

β G(0) ≥ β v(0) + γ   (11)

holds if and only if

γ ≤ β ∫ R(y, z) Q(dy|0) − β R(0, z).   (12)

Proof. We show necessity first. Suppose (11) holds. Note that G(x) is strictly increasing on [0, 1]. Equation (9) reduces to

v(x) = β v(0) + R(x, z) + γ,   (13)
β G(x) ≥ β v(0) + γ, ∀x.   (14)

From (13), we uniquely solve

v(0) = (1/(1 − β))[R(0, z) + γ],  v(x) = (β/(1 − β))[R(0, z) + γ] + R(x, z) + γ,   (15)

which, substituted into (14) at x = 0, implies (12).

We continue to show sufficiency. Suppose (12) holds. We define v by (15) and verify (13) and (14). So v is the unique solution of (9) satisfying (11). ⊓⊔
The next lemma gives the condition for θ = 1+ in the best response.

Lemma 3. For γ > 0 and v solving (9), we have

β G(1) < β v(0) + γ   (16)

if and only if

γ > β[V_β(1) − V_β(0)],   (17)

where V_β(x) ∈ C([0, 1], R) is the unique solution of

V_β(x) = β ∫ V_β(y) Q(dy|x) + R(x, z).   (18)

Proof. By Banach's fixed point theorem, we can show that (18) has a unique solution. Next, by a successive approximation {V_β^{(k)}, k ≥ 0} with V_β^{(0)} = 0, we can show that V_β is strictly increasing. Moreover, ∫ V_β(y) Q(dy|x) is increasing in x by monotonicity of Q.

We show necessity. Since G is strictly increasing, (16) implies that the right hand side of (9) now reduces to the first term within the parentheses and that v = V_β. Since Q([1, 1]|1) = 1 gives G(1) = V_β(1), (17) follows.

To show sufficiency, suppose (17) holds. We have

β ∫ V_β(y) Q(dy|x) ≤ β V_β(1) < β V_β(0) + γ, ∀x.

Therefore, v := V_β gives the unique solution of (9) and β G(1) < β v(0) + γ. ⊓⊔
Example 1. Let R(x, z) = x(c + z), where c > 0. Take Q(·|x) as the uniform distribution on [x, 1]. Then (18) reduces to

V_β(x) = (β/(1 − x)) ∫_x^1 V_β(y) dy + R(x, z).

Define φ(x) = ∫_x^1 V_β(y) dy, x ∈ [0, 1]. Then φ′(x) = −(β/(1 − x)) φ(x) − R(x, z) holds and we solve

φ(x) = (1 − x)^β ∫_x^1 R(s, z)(1 − s)^{−β} ds,

where the right hand side converges to 0 as x → 1−. We further obtain

V_β(x) = β(1 − x)^{β−1} ∫_x^1 R(s, z)(1 − s)^{−β} ds + R(x, z)

for x ∈ [0, 1), and the right hand side has the limit R(1, z)/(1 − β) as x → 1−. This gives a well defined V_β ∈ C([0, 1], R). Therefore,

V_β(1) = (c + z)/(1 − β),  V_β(0) = β(c + z)/((1 − β)(2 − β)).

Then (17) reduces to

γ > 2β(c + z)/(2 − β).

4 Existence and Uniqueness of Stationary Equilibria

Assume (A1)-(A5) for this section. Define the class P of probability measures on S as follows: ν ∈ P if there exist a constant c_ν ≥ 0 and a nonnegative integrable function g(x) ≥ 0 on [0, 1] such that

ν(B) = ∫_B g(x) dx + c_ν 1_B(0),

where B ∈ B(S) and 1_B is the indicator function of B. When restricted to (0, 1], ν is absolutely continuous with respect to the Lebesgue measure μ_Leb.

Let X be a random variable with distribution ν ∈ P. Set x_t^i = X. Define Y_1 = x_{t+1}^i by applying a_t^i ≡ a_0. Further define Y_2 = x_{t+1}^i by applying the r-threshold policy a_t^i with r ∈ (0, 1).
Lemma 4. The distribution ν_i of Y_i is in P for i = 1, 2.

Proof. Let q(y|x) denote the density function of Q(·|x) for x ∈ [0, 1), where q(y|x) = 0 for y < x. Denote

g_1(y) = ∫_{0 ≤ x < y} q(y|x) ν(dx),  y ∈ (0, 1),

and

g_2(y) = ∫_{0 ≤ x < y∧r} q(y|x) ν(dx),  y ∈ (0, 1).

Then it can be checked that

P(Y_1 ∈ B) = ∫_B g_1(y) dy,  P(Y_2 ∈ B) = ∫_B g_2(y) dy + P(X ≥ r) 1_B(0).

This completes the lemma. ⊓⊔

In order to show that (4)-(5) has a solution, we define a mapping Γ: S → S by the following rule. For z ∈ [0, 1], we solve (4) to obtain a well defined threshold θ(z) ∈ [0, 1] ∪ {1+}, which in turn determines a limiting distribution μ_{θ(z)} of the closed-loop state process x_t^i by Lemmas A.1 and A.2. Define

Γ(z) = ∫ x μ_{θ(z)}(dx).

If Γ has a fixed point, we obtain a solution to (4)-(5).

We analyze the case where the best response gives a strictly positive threshold. Assume

γ > β max_{z∈[0,1]} ∫ [R(y, z) − R(0, z)] Q(dy|0).   (19)

Note that under a zero threshold policy, the behavior of the state process is sensitive to a positive perturbation of the threshold. The above condition ensures that the zero threshold will not occur, and this will ensure continuity of Γ to facilitate the fixed point analysis.
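The construction of Γ suggests a direct numerical scheme: compute the best-response threshold θ(z) by value iteration, estimate the stationary mean under the θ(z)-threshold policy, and iterate on z. The sketch below does this for the uniform kernel and R(x, z) = x(c + z) with illustrative parameter values; plain fixed-point iteration is used and happens to settle here, though a bisection on z − Γ(z) would be the more robust choice.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, gamma, c, n = 0.8, 0.5, 0.2, 1001
xs = np.linspace(0, 1, n)

def threshold(z, iters=2000):
    """Best-response threshold from (9); Q(.|x) = Uniform[x,1], R = x(c+z)."""
    R, v = xs * (c + z), np.zeros(n)
    for _ in range(iters):
        G = (np.cumsum(v[::-1]) / np.arange(1, n + 1))[::-1]
        v = np.minimum(beta * G + R, beta * v[0] + R + gamma)
    act = beta * G >= beta * v[0] + gamma
    return xs[act.argmax()] if act.any() else 2.0   # 2.0 encodes "1+"

def Gamma(z, t_burn=1000, t_len=100_000):
    """Long-run average state under the theta(z)-threshold policy."""
    th, x, total = threshold(z), 0.0, 0.0
    for t in range(t_burn + t_len):
        x = 0.0 if x >= th else x + (1 - x) * rng.random()
        if t >= t_burn:
            total += x
    return total / t_len

z = 0.5
for _ in range(8):
    z = Gamma(z)
print("stationary mean field z ~", round(z, 3))
```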
Lemma 5. Assume (19). Then Γ(z) is continuous on [0, 1].

Proof. Let z̄ ∈ [0, 1] be fixed, giving a corresponding threshold parameter θ̄ when (9) is solved using z̄. We check continuity at z̄ and consider 3 cases.

Case i) θ̄ ∈ (0, 1). Let π̄ be the stationary distribution with the θ̄-threshold policy. Consider any fixed ε > 0. There exists ε_0 > 0 such that for all θ ∈ (θ̄ − ε_0, θ̄ + ε_0) ⊂ (0, 1), |∫ x π(dx) − ∫ x π̄(dx)| < ε, where π is the stationary distribution associated with θ. This follows since lim_{θ→θ̄} ‖π − π̄‖_TV = 0 by Lemma A.3. Next, we can select a sufficiently small δ > 0 such that for all |z − z̄| < δ, z generates a threshold parameter θ ∈ (θ̄ − ε_0, θ̄ + ε_0), which implies |Γ(z) − Γ(z̄)| ≤ ε.

Case ii) z̄ gives θ̄ = 1. Then Γ(z̄) = 1. Fix any ε > 0. Then we can show there exists ε_0 > 0 such that for all θ ∈ (1 − ε_0, 1), the associated stationary distribution π_θ gives |Γ(z̄) − ∫ x π_θ(dx)| < ε, where we use (A5) and the right hand side of (C.1) to estimate a lower bound for ∫ x π_θ(dx). Now, there exists δ > 0 such that any z satisfying |z − z̄| < δ gives a threshold θ either in (1 − ε_0, 1) or equal to 1 or 1+; for each case, we have |Γ(z̄) − ∫ x π_θ(dx)| < ε.

Case iii) z̄ gives θ̄ = 1+. Then there exists δ > 0 such that any z satisfying |z − z̄| < δ gives a threshold parameter θ = 1+. Then Γ(z) = Γ(z̄) = 1. ⊓⊔
Theorem 1. Assume (19). There exists a stationary equilibrium to (4)-(5).

Proof. Since Γ is a continuous function from [0, 1] to [0, 1] by Lemma 5, the theorem follows from Brouwer's fixed point theorem. ⊓⊔

Let x_t^{i,θ} and π_θ denote the state process and its stationary distribution, respectively, under a θ-threshold policy. Denote z(θ) = ∫ x π_θ(dx). We have the first comparison theorem on monotonicity.
Lemma 6. z(θ_1) ≤ z(θ_2) for 0 < θ_1 < θ_2 < 1.

Proof. By the ergodicity of {x_t^{i,θ_l}, t ≥ 0} in Lemma A.2, we have the representation

z(θ_l) = lim_{k→∞} (1/k) ∑_{t=0}^{k−1} x_t^{i,θ_l}  w.p.1.

Lemma C.2 implies z(θ_1) ≤ z(θ_2). ⊓⊔

To establish uniqueness, we consider R(x, z) = R_1(x) R_2(z), where R_1 ≥ 0, R_2 ≥ 0, and which satisfies (A1)-(A5). We further make the following assumption.

(A6) R_2 > 0 on S.

This assumption indicates positive externalities since an individual benefits from the decrease of the population average state. This condition has a crucial role in the uniqueness analysis. Given the product form of R, now (9) takes the form:
V(x) = min[ β ∫ V(y) Q(dy|x) + R_1(x) R_2(z), β V(0) + R_1(x) R_2(z) + γ ].

Consider 0 ≤ z_2 < z_1 ≤ 1 and, for l = 1, 2,

V_l(x) = min[ β ∫ V_l(y) Q(dy|x) + R_1(x) R_2(z_l), β V_l(0) + R_1(x) R_2(z_l) + γ ].   (20)

Denote the optimal policy as a threshold policy with parameter θ_l in [0, 1] or equal to 1+, where we follow the interpretation in Section 3 if θ_l = 1+. We state the second comparison theorem about the threshold parameters under different mean field parameters z_l.

Theorem 2. θ_1 and θ_2 in (20) are specified according to the following scenarios:

i) If θ_1 = 0, then we have either θ_2 ∈ [0, 1] or θ_2 = 1+.

ii) If θ_1 ∈ (0, 1), we have either a) θ_2 ∈ (θ_1, 1), or b) θ_2 = 1, or c) θ_2 = 1+.

iii) If θ_1 = 1, θ_2 = 1+.

iv) If θ_1 = 1+, θ_2 = 1+.

Proof. Since R_2(z_1) > R_2(z_2) > 0,
we divide both sides of (20) by R_2(z_l) and define γ_l = γ/R_2(z_l). Then 0 < γ_1 < γ_2. The dynamic programming equation reduces to (D.2). Subsequently, the optimal policy is determined according to Lemma D.4. ⊓⊔
Corollary 1. Assume (A6) in addition to the assumptions in Theorem 1. Then the system (4)-(5) has a unique stationary equilibrium.

Proof. The proof is similar to [27, 28], which assumed (A3′). ⊓⊔

5 Comparative Statics

This section assumes (A1)-(A6). Consider the two solution systems

v̄(x) = min[ β ∫ v̄(y) Q(dy|x) + R_1(x) R_2(z̄), β v̄(0) + R_1(x) R_2(z̄) + γ̄ ],
z̄ = ∫ x μ̄(dx),   (21)

and

v(x) = min[ β ∫ v(y) Q(dy|x) + R_1(x) R_2(z), β v(0) + R_1(x) R_2(z) + γ ],
z = ∫ x μ(dx).   (22)

Suppose γ̄ satisfies (19). By Corollary 1, (21) has a unique solution denoted by (v̄, z̄, μ̄, θ̄), where θ̄ is the threshold parameter. We further assume θ̄ ∈ (0, 1). Suppose γ > γ̄. Then we can uniquely solve (v, z, μ, θ). The next theorem presents a result on monotone comparative statics [53].

Theorem 3. If γ > γ̄, we have θ > θ̄, z > z̄, v > v̄.

Proof.
We prove by contradiction. Assume θ ≤ θ̄. Then by Lemma 6, z ≤ z̄, and therefore γ/R_2(z) > γ̄/R_2(z̄). By the method of proving Theorem 2, we would establish θ > θ̄, which contradicts the assumption θ ≤ θ̄. We conclude θ > θ̄. By Lemma 6 and Remark B.1, we have z > z̄.

For (21), we use value iteration to approximate v̄ by an increasing sequence of functions v̄_k with v̄_0 = 0. Similarly, v is approximated by v_k with v_0 = 0. By induction, we have v_k ≥ v̄_k for all k. This proves v ≥ v̄. Next, we have β v(0) + R_1(x) R_2(z) + γ > β v̄(0) + R_1(x) R_2(z̄) + γ̄ on [0, 1], and β ∫ v(y) Q(dy|x) + R_1(x) R_2(z) > β ∫ v̄(y) Q(dy|x) + R_1(x) R_2(z̄) on (0, 1]. By the method in [27, Lemma 2], we have v > v̄ on (0, 1]. Then ∫ v(y) Q(dy|0) > ∫ v̄(y) Q(dy|0). This further implies v(0) > v̄(0). ⊓⊔
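A quick numerical sanity check of Theorem 3 is sketched next (illustrative only: uniform kernel, R_1(x) = x, R_2(z) = c + z, and our own parameter values; the closed-form stationary mean z(θ) used below is derived for this kernel later in this section, cf. (40)): the equilibrium threshold and mean field both increase with γ.

```python
import numpy as np

def equilibrium(gamma, beta=0.8, c=0.2, n=801, z=0.4):
    xs = np.linspace(0, 1, n)
    for _ in range(20):                        # fixed point in z
        R, v = xs * (c + z), np.zeros(n)
        for _ in range(1500):                  # value iteration for (9)
            G = (np.cumsum(v[::-1]) / np.arange(1, n + 1))[::-1]
            v = np.minimum(beta * G + R, beta * v[0] + R + gamma)
        act = beta * G >= beta * v[0] + gamma
        th = xs[act.argmax()] if act.any() else 1.0 - 1e-9
        L = np.log(1 - th)
        z = ((1 - th) / 2 - L) / (2 - L)       # stationary mean, cf. (40)
    return th, z

print("gamma=0.3:", equilibrium(0.3))
print("gamma=0.5:", equilibrium(0.5))          # larger theta and z expected
```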
Remark 1. It is possible to have θ = 1+ in Theorem 3.

By a continuity argument, we can further show

lim_{γ→γ̄} ( |θ − θ̄| + |z − z̄| + sup_x |v(x) − v̄(x)| ) = 0.

In the analysis below, we take γ = γ̄ + ε for some small ε > 0. For this section, we further introduce the following assumption.

(A7) For γ > γ̄, (v, z, θ) has the representation

v(x) = v̄(x) + ε w(x) + o(ε),  0 ≤ x ≤ 1,   (23)
z = z̄ + ε z_γ + o(ε),   (24)
θ = θ̄ + ε θ_γ + o(ε),   (25)

where v, z, θ are solved depending on the parameter γ and w is a function defined on [0, 1]. The derivatives z_γ and θ_γ at γ̄ exist, and R_2 is differentiable on [0, 1]. For 0 ≤ x < 1, the probability density function q(y|x), y ∈ [x, 1], of Q(dy|x) is continuous on {(x, y) | 0 ≤ x ≤ y < 1}. Moreover, ∂q(y|x)/∂x exists and is continuous in (x, y).

We aim to provide a characterization of w, z_γ, θ_γ.
Theorem 4. The function w satisfies

w(x) = β ∫ w(y) Q(dy|x) + R_1(x) R_2′(z̄) z_γ for 0 ≤ x ≤ θ̄,
w(x) = β w(0) + R_1(x) R_2′(z̄) z_γ + 1 for θ̄ < x ≤ 1.   (26)

Proof. We have

v̄(x) = β ∫ v̄(y) Q(dy|x) + R_1(x) R_2(z̄),  x ∈ [0, θ̄],

and

v(x) = β ∫ v(y) Q(dy|x) + R_1(x) R_2(z),  x ∈ [0, θ].

Note that θ > θ̄. For any fixed x ∈ [0, θ̄], we have

v(x) − v̄(x) = β ∫ (v(y) − v̄(y)) Q(dy|x) + R_1(x)(R_2(z) − R_2(z̄)).

Then the equation of w(x) for x ∈ [0, θ̄] is derived. We similarly treat the case x ∈ (θ̄, 1]. ⊓⊔
Remark 2. In general w has a discontinuity at x = θ̄, so that β ∫ w(y) Q(dy|θ̄) ≠ β w(0) + 1. We give some interpretation. Let the value function be written as v(x, γ) to explicitly indicate γ. Let the rectangle [0, 1] × [γ_a, γ_b] be a region of interest in which (x, γ) varies so that the value function defines a continuous surface. Then (θ, γ) starts at (θ̄, γ̄) and traces out the curve of an increasing function along which the expression of the value function has a switch, and the value function surface may be visualized as two pieces glued together along the curve in a non-smooth way. The value of w amounts to finding on the surface the directional derivative in the direction of γ; and therefore, discontinuity may occur at x = θ̄.

To better understand the solution of (26), we consider the general equation

W(x) = β ∫ W(y) Q(dy|x) + R_1(x) R_2′(z_0) c_0 for 0 ≤ x ≤ θ_0,
W(x) = β W(0) + R_1(x) R_2′(z_0) c_0 + 1 for θ_0 < x ≤ 1,   (27)

where c_0, z_0 ∈ [0, 1] and θ_0 ∈ (0, 1) are arbitrarily chosen and fixed. Let B([0, 1], R) be the Banach space of bounded Borel measurable functions with norm ‖g‖ = sup_x |g(x)|. By a contraction mapping, we can show (27) has a unique solution W ∈ B([0, 1], R).

We continue to characterize the sensitivity θ_γ of the threshold. Recall the partial derivative ∂q(y|x)/∂x.

Lemma 7.
We have

β [ ∫_{θ̄}^1 v̄(y) (∂q(y|θ̄)/∂x) dy − v̄(θ̄) q(θ̄|θ̄) ] θ_γ = 1 + β w(0) − β ∫_{θ̄}^1 w(y) Q(dy|θ̄).   (28)

Proof. Write γ = γ̄ + ε. By the property of the threshold, we have

β ∫_{θ̄}^1 v̄(y) Q(dy|θ̄) = β v̄(0) + γ̄,
β ∫_θ^1 v(y) Q(dy|θ) = β v(0) + γ̄ + ε.

Note that θ > θ̄. We check

∆ := ∫_θ^1 v(y) Q(dy|θ) − ∫_{θ̄}^1 v̄(y) Q(dy|θ̄)
 = ∫_θ^1 v(y) Q(dy|θ) − ∫_θ^1 v̄(y) Q(dy|θ̄) − ∫_{θ̄}^θ v̄(y) Q(dy|θ̄)
 = ∫_θ^1 v(y) Q(dy|θ) − ∫_θ^1 v̄(y) Q(dy|θ) + ∫_θ^1 v̄(y) Q(dy|θ) − ∫_θ^1 v̄(y) Q(dy|θ̄) − ∫_{θ̄}^θ v̄(y) Q(dy|θ̄)
 = ε ∫_θ^1 w(y) q(y|θ) dy + (θ − θ̄) ∫_θ^1 v̄(y)[∂q(y|θ̄)/∂x] dy − (θ − θ̄) v̄(θ̄) q(θ̄|θ̄) + o(ε + |θ − θ̄|)
 = ε ∫_{θ̄}^1 w(y) q(y|θ̄) dy + (θ − θ̄) ∫_{θ̄}^1 v̄(y)[∂q(y|θ̄)/∂x] dy − (θ − θ̄) v̄(θ̄) q(θ̄|θ̄) + o(ε + |θ − θ̄|).

Note that β∆ = β[v(0) − v̄(0)] + ε. We derive

β ∫_{θ̄}^1 w(y) Q(dy|θ̄) + β θ_γ ∫_{θ̄}^1 v̄(y) (∂q(y|θ̄)/∂x) dy − β v̄(θ̄) q(θ̄|θ̄) θ_γ = β w(0) + 1.

This completes the proof. ⊓⊔
Lemma 8. Given the threshold θ̄ ∈ (0, 1), the stationary distribution μ̄ has a probability density function (p.d.f.) p(x) on (0, 1], and μ̄({0}) = π_0, where (p, π_0) is determined by

π_0 = ∫_{θ̄}^1 p(x) dx,   (29)

p(x) = ∫_0^x q(x|y) p(y) dy + π_0 q(x|0) for 0 ≤ x < θ̄,
p(x) = ∫_0^{θ̄} q(x|y) p(y) dy + π_0 q(x|0) for θ̄ ≤ x ≤ 1.   (30)

Proof.
Let δ_0 be the dirac measure at x = 0. For any Borel subset B ⊂ [0, 1], we have

μ̄(B) = ∫ [ Q(B|y) 1(y < θ̄) + δ_0(B) 1(y ≥ θ̄) ] μ̄(dy).

Then it can be checked that (p, π_0) satisfying the above equations determines the stationary distribution. Now we show there exists a unique solution. Let π_0 > 0 be given. Consider

p(x) = ∫_0^x q(x|y) p(y) dy + π_0 q(x|0),  0 ≤ x ≤ θ̄,   (31)

and we obtain a unique solution p in C([0, θ̄], R) (see e.g. [36, p. 33]). In fact p is a nonnegative function with ∫_0^{θ̄} p(x) dx > 0. Subsequently, we further determine p ≥ 0 on [θ̄, 1] by (30). The solution p on [0, 1] depends linearly on π_0 and so there exists a unique π_0 such that ∫_0^1 p(x) dx + π_0 = 1. After we uniquely solve p for (30), we integrate both sides of this equation on [0, 1] and obtain ∫_0^1 p(x) dx = ∫_0^{θ̄} p(x) dx + π_0, which implies that (29) is satisfied. ⊓⊔
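A discretized version of (30) can be solved by forward substitution since the equation is of Volterra type; the sketch below (uniform kernel q(x|y) = 1/(1 − y) for y ≤ x and an illustrative threshold θ̄ = 0.6, both our own choices here) exploits the linear dependence on π_0 noted in the proof and checks the result against the closed form p(x) = π_0/(1 − x), π_0 = 1/(2 − ln(1 − θ̄)), obtained below.

```python
import numpy as np

theta_bar, n = 0.6, 4000
dx = 1.0 / n
xs = np.arange(n) * dx                   # grid on [0, 1)

# Solve (30) with pi0 = 1 first (p depends linearly on pi0), then rescale.
p = np.zeros(n)
for i, x in enumerate(xs):
    k = int(min(x, theta_bar) / dx)      # integrate over y in [0, min(x, theta_bar))
    p[i] = np.sum(p[:k] / (1 - xs[:k])) * dx + 1.0   # q(x|0) = 1 here
pi0 = 1.0 / (p.sum() * dx + 1.0)         # normalization: pi0 + int p = 1
p *= pi0

print("pi0:", round(pi0, 5), "closed form:", round(1 / (2 - np.log(1 - theta_bar)), 5))
i = int(0.3 / dx)
print("p(0.3):", round(p[i], 5), "closed form:", round(pi0 / 0.7, 5))
```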
0. Inthis case, (A2)-(A6) are satisfied. For (21), we have¯ v ( x ) = β − x Z x ¯ v ( y ) dy + R ( x ) R ( ¯ z ) , ≤ x ≤ ¯ θ , β ¯ v ( ) + R ( x ) R ( ¯ z ) + ¯ γ , ¯ θ ≤ x ≤ . (32)Denote ϕ ( x ) = R x ¯ v ( y ) dy . Then˙ ϕ ( x ) = − β − x ϕ − R ( x ) R ( ¯ z ) , ≤ x ≤ ¯ θ . Taking the initial condition ϕ ( ) , we have ϕ ( x ) = ϕ ( )( − x ) β − ( − x ) β Z x R ( τ ) R ( ¯ z )( − τ ) β d τ . On [ , ¯ θ ] ,¯ v ( x ) = ( − x ) β − ¯ v ( ) − β ( − x ) β − Z x R ( τ ) R ( ¯ z )( − τ ) β d τ + R ( x ) R ( ¯ z )= ( − x ) β − h ¯ v ( ) − β ( c + ¯ z )( − β )( − β ) i + ( c + ¯ z ) h β ( − β )( − β ) + x − β i . By the continuity of ¯ v and its form on [ ¯ θ , ] , we have¯ v ( ¯ θ ) = β ¯ v ( ) + ¯ θ ( ¯ z + c ) + ¯ γ . (33)Hence, [( − ¯ θ ) β − − β ] ¯ v ( ) = β ( c + ¯ z )[( − ¯ θ ) β − − ]( − β )( − β ) − β ( c + ¯ z ) ¯ θ − β + ¯ γ . (34)On the other hand, since ¯ v is increasing and ¯ θ is the threshold, we have ¯ v ( ¯ θ ) = β Z θ [ β ¯ v ( ) + ( c + z ) y + ¯ γ ] − ¯ θ dy + ( c + ¯ z ) ¯ θ = β ¯ v ( ) + β ¯ γ + β ( c + ¯ z ) + ( β + )( c + ¯ z ) ¯ θ , which combined with (33) gives β ( c + ¯ z )( + ¯ θ ) = ( β ¯ v ( ) + ¯ γ )( − β ) . (35)Given the special form of Q ( dy | x ) , (26) becomes w ( x ) = β − x Z x w ( y ) dy + R ( x ) R ′ ( ¯ z ) z γ , ≤ x ≤ ¯ θ , β w ( ) + R ( x ) R ′ ( ¯ z ) z γ + , ¯ θ < x ≤ . (36)The computation of w now reduces to uniquely solving w ( ) . By the expression of w on [ , ¯ θ ] , we have w ( ¯ θ ) = β Z θ w ( y ) Q ( dy | ¯ θ ) + R ( ¯ θ ) R ′ ( ¯ z ) z γ = β w ( ) + β + R ( ¯ θ ) R ′ ( ¯ z ) z γ + β R ′ ( ¯ z ) z γ − ¯ θ Z θ R ( y ) dy = β w ( ) + β + ¯ θ z γ + β z γ + ¯ θ . (37)For x ∈ [ , ¯ θ ] , we further write w ( x ) = β − x Z x w ( y ) dy + R ( x ) R ′ ( ¯ z ) z γ , and solve w ( x ) = ( − x ) β − w ( ) + z γ x − β z γ h ( − x ) β − ( − β )( − β ) − − β + − x − β i , which further gives w ( ¯ θ ) = ( − ¯ θ ) β − w ( ) + z γ ¯ θ − β z γ h ( − ¯ θ ) β − ( − β )( − β ) − − β + − ¯ θ − β i . (38)By (37)–(38), we have [ β − ( − ¯ θ ) β − − β ] w ( ) = + z γ (cid:16) + ¯ θ + ( − ¯ θ ) β − ( − β )( − β ) + − ¯ θ − β − − β (cid:17) . (39)Now from (30) we have inary Mean Field Stochastic Games: Stationary Equilibria and Comparative Statics 17 p ( x ) = Z x − y p ( y ) dy + π , ≤ x < ¯ θ , Z ¯ θ − y p ( y ) dy + π , ¯ θ ≤ x ≤ , which determines p ( x ) = π − x , ≤ x < ¯ θ , π − ¯ θ , ¯ θ ≤ x ≤ , where π = − ln ( − ¯ θ ) . We determine the mean field¯ z = Z ¯ θ xp ( x ) dx + Z θ xp ( x ) dx = π (cid:0) − ¯ θ − ln ( − ¯ θ ) (cid:1) . (40)We further obtain dzd γ at ¯ γ as z γ = ln ( − ¯ θ ) − + − ¯ θ [ − ln ( − ¯ θ )] θ γ . (41)We note that a perturbation analysis directly based on the general case (30) is morecomplicated.Now (28) reduces to h β − ¯ θ Z θ ¯ v ( y ) − ¯ θ dy − β ¯ v ( ¯ θ ) − ¯ θ i θ γ = + β w ( ) − β Z θ w ( y ) − ¯ θ dy . By the expression of ¯ v in (32) and w in (36) at θ = ¯ θ , we obtain ( − β ) ¯ v ( ¯ θ ) − ¯ θ ( c + ¯ z ) − ¯ θ θ γ = + β w ( ) − w ( ¯ θ ) + ¯ θ z γ . Recalling (33) and (37), we have ( − β )[ β ¯ v ( ) + ¯ γ ] − β ¯ θ ( ¯ z + c ) − ¯ θ θ γ − β ( − β ) w ( ) + + ¯ θ β z γ = − β . (42)By combining (34), (35) and (40), we have¯ v ( ) = [( − ¯ θ ) β − − β ] − h β ( c + ¯ z )[( − ¯ θ ) β − − ]( − β )( − β ) − β ( c + ¯ z ) ¯ θ − β + ¯ γ i , (43)¯ θ = ( − β )( β ¯ v ( ) + ¯ γ ) β ( c + ¯ z ) − , (44)¯ z = − ln ( − ¯ θ ) (cid:0) − ¯ θ − ln ( − ¯ θ ) (cid:1) . (45) x v(x)w(x) Fig. 1
Next, combining (39), (41) and (42), we obtain

{ [(1 − β)(β v̄(0) + γ̄) − β θ̄(z̄ + c)]/(1 − θ̄) } θ_γ − β(1 − β) w(0) + (1 + θ̄) β z_γ/2 = 1 − β,   (46)
[ β^{−1}(1 − θ̄)^{β−1} − β ] w(0) = 1 + z_γ ( (1 + θ̄)/2 + (1 − θ̄)^{β−1}/((1 − β)(2 − β)) + (1 − θ̄)/(2 − β) − 1/(1 − β) ),   (47)
z_γ = { [ ln(1 − θ̄) − 3 + 4/(1 − θ̄) ] / ( 2[2 − ln(1 − θ̄)]² ) } θ_γ.   (48)

After (v̄(0), z̄, θ̄) has been determined from (43)-(45), the above gives a linear equation system with unknowns w(0), θ_γ and z_γ.
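The following sketch carries out this computation end to end: it solves the nonlinear system (43)-(45) for (v̄(0), θ̄, z̄) with scipy's fsolve and then the 3×3 linear system (46)-(48). The parameter values γ̄, β, c below are our own illustrative choices (the published Example 2 uses different values), and the initial guess for fsolve is crude, so convergence to a root with θ̄ ∈ (0, 1) should be checked.

```python
import numpy as np
from scipy.optimize import fsolve

gamma_bar, beta, c = 0.3, 0.6, 0.2

def eqs(u):
    v0, th, z = u
    th = np.clip(th, 1e-6, 1 - 1e-6)       # keep (1-th)**(beta-1) well defined
    a = (1 - th) ** (beta - 1)
    f1 = (a - beta) * v0 - (beta * (c + z) * (a - 1) / ((1 - beta) * (2 - beta))
                            - beta * (c + z) * th / (2 - beta) + gamma_bar)
    f2 = th - (2 * (1 - beta) * (beta * v0 + gamma_bar) / (beta * (c + z)) - 1)
    f3 = z - ((1 - th) / 2 - np.log(1 - th)) / (2 - np.log(1 - th))
    return [f1, f2, f3]

v0, th, z = fsolve(eqs, [1.0, 0.5, 0.3])
assert 0 < th < 1

Lg = np.log(1 - th)
a = (1 - th) ** (beta - 1)
A = ((1 - beta) * (beta * v0 + gamma_bar) - beta * th * (z + c)) / (1 - th)
B = a / beta - beta
C = ((1 + th) / 2 + a / ((1 - beta) * (2 - beta))
     + (1 - th) / (2 - beta) - 1 / (1 - beta))
D = (Lg - 3 + 4 / (1 - th)) / (2 * (2 - Lg) ** 2)

# Rows encode (46), (47), (48); unknowns ordered (w(0), theta_gamma, z_gamma).
M = np.array([[-beta * (1 - beta), A, (1 + th) * beta / 2],
              [B, 0.0, -C],
              [0.0, -D, 1.0]])
w0, th_g, z_g = np.linalg.solve(M, [1 - beta, 1.0, 0.0])
print(dict(v0=v0, theta=th, z=z, w0=w0, theta_gamma=th_g, z_gamma=z_g))
```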
Example 2. We take R_1(x) = x and R_2(z) = . + z, γ̄ = . , β = . . We numerically solve (43)-(45) to obtain v̄(0) = . , θ̄ = . , z̄ = . , and then w(0) = . , θ_γ = . , z_γ = . . v̄(x) and w(x) are displayed in Fig. 1, where w has a discontinuity at x = θ̄ as discussed in Remark 2. The positive value of θ_γ implies the threshold increases with γ, as asserted in Theorem 3.

Corrected on Oct 10, 2020 by adding the value of β and correcting the parameter in R_2(z).

6 Conclusion

This paper considers mean field games in a framework of binary Markov decision processes (MDP) and establishes existence and uniqueness of stationary equilibria. The resulting policy has a threshold structure. We further analyze comparative statics to address the impact of parameter variations in the model.

For future research, there are some potentially interesting extensions. One may consider a heterogeneous population and study the emergence of free-riders who care more about their own effort costs and have less incentive to contribute to the common benefit of the population. Another modelling direction of a quite different nature involves negative externalities, where other players' improvement brings more pressure on the player in question. For instance, this arises in competitions for market share. The modelling and analysis of the agent behavior will be of interest.

Appendix A: Preliminaries on Ergodicity
Assume (A3). The next two lemmas determine the limiting distribution of the state process under threshold policies.

Lemma A.1. i) If θ = 0, then the distribution of x_t^i remains the dirac measure δ_0 for all t ≥ 1, for any x_0^i.

ii) If θ = 1 or θ = 1+, the distribution of x_t^i converges to the dirac measure δ_1 weakly.

Proof. Part i) is obvious and part ii) follows from (A3). ⊓⊔

Let x_t^{i,θ} denote the state process generated by the θ-threshold policy with θ ∈ (0, 1), and let P_θ^t(x, ·) be the distribution of x_t^{i,θ} given x_0^{i,θ} = x.
Lemma A.2. For θ ∈ (0, 1), {x_t^{i,θ}, t ≥ 0} is uniformly ergodic with stationary probability distribution π_θ, i.e.,

sup_{x∈S} ‖P_θ^t(x, ·) − π_θ‖_TV ≤ K r^t,   (A.1)

for some constants K > 0 and r ∈ (0, 1), where ‖·‖_TV is the total variation norm of signed measures.

Proof. The proof is similar to that of the ergodicity theorem in [27], which assumed (A3′). We use (A3)-iii) to estimate r. ⊓⊔

We take C_s = {0} as a small set and θ ∈ (0, 1). The θ-threshold policy gives

P(x_2^{i,θ} = 0 | x_0^{i,θ} = 0) ≥ ∫_θ^1 q(y|0) dy =: ε_0.   (A.2)

So for any Borel set B, P(x_2^{i,θ} ∈ B | x_0^{i,θ} = 0) ≥ ε_0 δ_0(B), where δ_0 is the dirac measure. For θ′ in a small neighborhood of θ, we can ensure that the θ′-threshold policy gives

P(x_2^{i,θ′} ∈ B | x_0^{i,θ′} = 0) ≥ (ε_0/2) δ_0(B).   (A.3)
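The uniform ergodicity in (A.1) is easy to visualize by simulation; the sketch below (uniform kernel and θ = 0.6, our own illustrative choices) propagates many copies of the chain from a common initial state and tracks the mass at the regeneration state 0, which stabilizes geometrically fast.

```python
import numpy as np

rng = np.random.default_rng(3)

theta, n_paths, horizon = 0.6, 100_000, 30
x = np.full(n_paths, 0.9)                   # common initial state x0 = 0.9
for t in range(1, horizon + 1):
    act = x >= theta
    x = np.where(act, 0.0, x + (1 - x) * rng.random(n_paths))
    if t % 5 == 0:
        # P_theta^t(x0, {0}); approaches pi_theta({0}) = 1/(2 - ln(1-theta))
        # ~ 0.343 here (cf. Lemma 8 for the uniform kernel).
        print(t, round(float(np.mean(x == 0.0)), 4))
```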
Lemma A.3. Suppose θ, θ′ ∈ (0, 1) for two threshold policies. Let the corresponding stationary distributions of the state process be π and π′. Then lim_{θ′→θ} ‖π′ − π‖_TV = 0.

Proof. Fix θ ∈ (0, 1). By (A.3) and [41], there exist a neighborhood I_θ = (θ − κ, θ + κ) ⊂ (0, 1) and two constants C_0 > 0, r_0 ∈ (0, 1) such that for all θ′ ∈ I_θ,

‖P_θ^t(x, ·) − π‖_TV ≤ C_0 r_0^t,  ‖P_{θ′}^t(x, ·) − π′‖_TV ≤ C_0 r_0^t,  ∀x ∈ [0, 1].

Subsequently,

‖π′ − π‖_TV ≤ ‖P_{θ′}^t(0, ·) − P_θ^t(0, ·)‖_TV + 2C_0 r_0^t.

For any given ε > 0, fix a large k_0 such that 2C_0 r_0^{k_0} ≤ ε/2. We show for all θ′ sufficiently close to θ,

‖P_{θ′}^{k_0}(0, ·) − P_θ^{k_0}(0, ·)‖_TV ≤ ε/2.

Given two probability measures μ_t, μ′_t, define the probability measures μ_{t+1} and μ′_{t+1} by

μ_{t+1}(B) = ∫_S P_θ(y, B) μ_t(dy),  μ′_{t+1}(B) = ∫_S P_{θ′}(y, B) μ′_t(dy),

for Borel sets B ⊂ [0, 1]. Then

|μ_{t+1}(B) − μ′_{t+1}(B)| ≤ | ∫_S P_θ(y, B) μ_t(dy) − ∫_S P_{θ′}(y, B) μ_t(dy) | + | ∫_S P_{θ′}(y, B) μ_t(dy) − ∫_S P_{θ′}(y, B) μ′_t(dy) | =: D_1 + D_2.

We have

D_2 = | ∫_S P_{θ′}(y, B) μ_t(dy) − ∫_S P_{θ′}(y, B) μ′_t(dy) | ≤ ‖μ_t − μ′_t‖_TV.

Denote θ_a = min{θ, θ′} and θ_b = max{θ, θ′}. Then

D_1 = | − ∫_{[θ_a, θ_b)} Q(B|y) μ_t(dy) + 1_B(0) μ_t([θ_a, θ_b)) | ≤ μ_t([θ_a, θ_b)).

Setting μ_0 = μ′_0 = δ_0, then μ_t = P_θ^t(0, ·), μ′_t = P_{θ′}^t(0, ·). Hence,

|P_{θ′}^{t+1}(0, B) − P_θ^{t+1}(0, B)| ≤ ‖P_{θ′}^t(0, ·) − P_θ^t(0, ·)‖_TV + P_θ^t(0, [θ_a, θ_b)),   (A.4)

which implies

‖P_{θ′}^{t+1}(0, ·) − P_θ^{t+1}(0, ·)‖_TV ≤ ‖P_{θ′}^t(0, ·) − P_θ^t(0, ·)‖_TV + 2P_θ^t(0, [θ_a, θ_b)).   (A.5)

For μ_0 = μ′_0 = δ_0, we have P_θ^1(0, ·) = P_{θ′}^1(0, ·). It is clear from (A.5) and Lemma 4 that for each t ≥ 1,

lim_{θ′→θ} ‖P_{θ′}^t(0, ·) − P_θ^t(0, ·)‖_TV = 0,  lim_{θ′→θ} P_θ^t(0, [θ_a, θ_b)) = 0.

Therefore, for the fixed k_0, there exists δ > 0 such that for all θ′ satisfying |θ′ − θ| < δ, ‖P_{θ′}^{k_0}(0, ·) − P_θ^{k_0}(0, ·)‖_TV < ε/2 and ‖π′ − π‖_TV ≤ ε. The lemma follows. ⊓⊔

Appendix B: Cycle Average of a Regenerative Process
Let 0 < r < r′ < 1. Consider a Markov process {Y_t, t ≥ 0} with state space [0, 1] and transition kernel Q_Y(·|y) which satisfies Q_Y([y, 1]|y) = 1 for all y ∈ [0, 1] and is stochastically increasing. Suppose Y_0 ≡ y_0 < r. Define the stopping times

τ = inf{t | Y_t ≥ r},  τ′ = inf{t | Y_t ≥ r′}.

Lemma B.1. If Eτ < ∞, then E ∑_{t=0}^τ Y_t < ∞ and

(E ∑_{t=0}^τ Y_t)/(1 + Eτ) = [ EY_0 + EY_1 + ∑_{k=1}^∞ E(Y_{k+1} 1{Y_k < r}) ] / [ 2 + ∑_{k=1}^∞ P(Y_k < r) ].   (B.1)

Proof. Since 0 ≤ Y_t ≤ 1, E ∑_{t=0}^τ Y_t ≤ 1 + Eτ. It is clear that {τ ≥ k} = {Y_{k−1} < r} for k ≥ 1. We have

Eτ = ∑_{k=1}^∞ P(τ ≥ k) = 1 + ∑_{k=1}^∞ P(Y_k < r),   (B.2)

and

E ∑_{t=0}^τ Y_t = E [ ∑_{k=1}^∞ ( ∑_{t=0}^k Y_t ) 1{τ = k} ] = EY_0 + EY_1 + ∑_{k=2}^∞ E(Y_k 1{τ ≥ k})
 = EY_0 + EY_1 + ∑_{k=1}^∞ E(Y_{k+1} 1{Y_k < r}).

The lemma follows. ⊓⊔
Lemma B.2. Assume Eτ′ < ∞. We have

(E ∑_{t=0}^τ Y_t)/(1 + Eτ) ≤ (E ∑_{t=0}^{τ′} Y_t)/(1 + Eτ′).   (B.3)

Proof. Eτ < ∞ since τ ≤ τ′ w.p.1. For k ≥ 1, denote

p_k = P(Y_k < r),  η_k = P(r ≤ Y_k < r′),  m_k = E(Y_{k+1} 1{Y_k < r}),  ∆_k = E(Y_{k+1} 1{r ≤ Y_k < r′}).

By Lemma B.1,

(E ∑_{t=0}^τ Y_t)/(1 + Eτ) = [EY_0 + EY_1 + ∑_{k=1}^∞ m_k]/[2 + ∑_{k=1}^∞ p_k],
(E ∑_{t=0}^{τ′} Y_t)/(1 + Eτ′) = [EY_0 + EY_1 + ∑_{k=1}^∞ (m_k + ∆_k)]/[2 + ∑_{k=1}^∞ (p_k + η_k)].

So (B.3) is equivalent to

(EY_0 + EY_1 + ∑_{k=1}^∞ m_k)(∑_{k=1}^∞ η_k) ≤ (∑_{k=1}^∞ ∆_k)(2 + ∑_{k=1}^∞ p_k).   (B.4)

By the stochastic monotonicity of Q_Y, we have

E[Y_{k+1} 1{Y_k < r} | Y_k] = 1{Y_k < r} ∫ y Q_Y(dy|Y_k) ≤ 1{Y_k < r} ∫ y Q_Y(dy|r) =: c_r 1{Y_k < r}.

Note that

c_r = ∫_{y ≥ r} y Q_Y(dy|r) ≥ r.   (B.5)

Moreover,

E[Y_{k+1} 1{r ≤ Y_k < r′} | Y_k] = 1{r ≤ Y_k < r′} ∫ y Q_Y(dy|Y_k) ≥ c_r 1{r ≤ Y_k < r′}.

It follows that

m_k = E[Y_{k+1} 1{Y_k < r}] ≤ c_r p_k,  ∆_k = E[Y_{k+1} 1{r ≤ Y_k < r′}] ≥ c_r η_k.   (B.6)

Since Y_0 = y_0 < r, E[Y_1 | Y_0] = ∫ y Q_Y(dy|Y_0) ≤ c_r. Hence, E(Y_0 + Y_1) ≤ r + c_r. By (B.6) and (B.5),

(EY_0 + EY_1 + ∑_{k=1}^∞ m_k)(∑_{k=1}^∞ η_k) − (∑_{k=1}^∞ ∆_k)(2 + ∑_{k=1}^∞ p_k)
 ≤ (r + c_r + c_r ∑_{k=1}^∞ p_k)(∑_{k=1}^∞ η_k) − c_r (∑_{k=1}^∞ η_k)(2 + ∑_{k=1}^∞ p_k)
 = (r − c_r) ∑_{k=1}^∞ η_k ≤ 0,

which establishes (B.4). ⊓⊔
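A short Monte Carlo check of the cycle-average comparison (B.3) (with Q_Y given by Y_{t+1} = Y_t + (1 − Y_t)U, U ~ Uniform(0, 1), an assumed kernel satisfying the hypotheses) is sketched below.

```python
import numpy as np

rng = np.random.default_rng(4)

def cycle_ratio(r, n_cycles=50_000):
    """Estimate E[sum_{t=0}^{tau} Y_t] / (1 + E[tau]) for level r, Y_0 = 0."""
    num = den = 0.0
    for _ in range(n_cycles):
        y, s, t = 0.0, 0.0, 0
        while y < r:
            s += y                       # accumulates Y_0, ..., Y_{tau-1}
            y += (1 - y) * rng.random()
            t += 1
        num += s + y                     # add Y_tau (>= r)
        den += t + 1                     # tau + 1 terms
    return num / den

print(cycle_ratio(0.3), "<=", cycle_ratio(0.7))   # monotone in the level, cf. (B.3)
```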
Remark B.1. If for each y ∈ [0, 1), Q_Y(dx|y) has a probability density function q_Y(x|y) > 0 for x ∈ (y, 1), then c_r > r and η_k > 0 for all k ≥ 1. In this case, a strict inequality holds for (B.3). ⊓⊔

Appendix C
We assume (A3). Let {x_t^{i,θ}, t ≥ 0} be the Markov chain generated by a θ-threshold policy with 0 < θ < 1, where x_0^{i,θ} is given. By Lemma A.2, {x_t^{i,θ}, t ≥ 0} is ergodic. We next define an auxiliary Markov chain {Y_t, t ≥ 0} with the same transition kernel and Y_0 = 0. Denote S_t = ∑_{i=0}^t Y_i for t ≥ 0. Define τ = inf{t | Y_t ≥ θ}.

Lemma C.1. We have

lim_{k→∞} (1/k) ∑_{t=0}^{k−1} Y_t = (E S_τ)/(1 + Eτ)  w.p.1.   (C.1)

Proof. By (A3), we can show Eτ < ∞. Since {Y_t, t ≥ 0} has the same transition probability kernel as {x_t^{i,θ}, t ≥ 0}, it is ergodic, and therefore the left hand side of (C.1) has a constant limit w.p.1. Define T_0 = 0 and T_n as the time for {Y_t, t ≥ 0} to return to state 0 for the nth time. So T_1 = τ + 1. Define B_n = ∑_{t=T_{n−1}}^{T_n − 1} Y_t for n ≥ 1. We observe that {Y_t, t ≥ 0} is a regenerative process (see e.g. [6, 51] and [7, Theorem 4]) with regeneration times {T_n, n ≥ 1} and that {B_n, n ≥ 1} is a sequence of i.i.d. random variables. Note that B_1 = S_τ is the sum of the τ + 1 states visited in the first cycle, and the lemma then follows from the renewal reward theorem. ⊓⊔

Suppose 0 < θ < θ′ < 1. Then there exist two constants C_θ, C_{θ′} such that

lim_{k→∞} (1/k) ∑_{t=0}^{k−1} x_t^{i,θ} = C_θ,  lim_{k→∞} (1/k) ∑_{t=0}^{k−1} x_t^{i,θ′} = C_{θ′},  w.p.1.

Lemma C.2. We have C_θ ≤ C_{θ′}.

Proof. Due to the ergodicity of the Markov chain, C_θ (resp., C_{θ′}) does not depend on x_0^{i,θ} (resp., x_0^{i,θ′}). Therefore, lim_{k→∞} (1/k) ∑_{t=0}^{k−1} Y_t = C_θ w.p.1. The lemma follows from Lemmas C.1 and B.2. ⊓⊔

Appendix D: An Auxiliary MDP
Assume (A3). This appendix introduces an auxiliary control problem to show the effect of the effort cost on the threshold parameter of the optimal policy. The state and control processes {(x_t^i, a_t^i), t ≥ 0} are specified by (1)-(2). The cost has the form

J_i^r = E ∑_{t=0}^∞ ρ^t ( R_1(x_t^i) + r 1{a_t^i = a_1} ),   (D.1)

where R_1 is continuous and strictly increasing on [0, 1] and ρ ∈ (0, 1), r ∈ (0, ∞). Let r take two different values 0 < γ_1 < γ_2 and write the corresponding dynamic programming equations

v_l(x) = min{ ρ ∫ v_l(y) Q(dy|x) + R_1(x), ρ v_l(0) + R_1(x) + γ_l },  l = 1, 2,  x ∈ S.   (D.2)

By the method in proving Lemma 1, it can be shown that there exists a unique solution v_l ∈ C([0, 1], R) and that the optimal policy a_{i,l}(x) is a threshold policy. If ρ ∫ v_l(y) Q(dy|1) < ρ v_l(0) + γ_l, a_{i,l}(x) ≡ a_0, and we follow the notation in Section 3 to denote the threshold θ_l = 1+. Otherwise, a_{i,l}(x) is a θ_l-threshold policy with θ_l ∈ [0, 1], i.e., a_{i,l}(x) = a_1 if x ≥ θ_l, and a_{i,l}(x) = a_0 if x < θ_l.

Lemma D.1. If θ_1 ∈ (0, 1), θ_2 ≠ θ_1.

Proof. We prove by contradiction. Suppose for some θ ∈ (0, 1),

θ_1 = θ_2 = θ.   (D.3)

Under (D.3), the resulting optimal policy leads to the representation (see e.g. [23, pp. 22])

v_l(x) = E ∑_{t=0}^∞ ρ^t [ R_1(x_t^i) + γ_l 1{a_t^i = a_1} ],  l = 1, 2,

where {x_t^i, t ≥ 0} is generated by the θ-threshold policy a_t^i(x_t^i) and x_0^i = x. Denote δ = γ_2 − γ_1.

For fixed x ≥ θ and x_0^i = x, denote the resulting optimal state and control processes by {(x̂_t^i, â_t^i), t ≥ 0}. Then â_0^i = a_1 w.p.1, and

v_2(x) − v_1(x) = δ + δ E ∑_{t=1}^∞ ρ^t 1{â_t^i = a_1},  x ≥ θ.

Next consider x_0^i = 0 with the processes {(x̌_t^i, ǎ_t^i), t ≥ 0}. Then

v_2(0) − v_1(0) = δ E ∑_{t=0}^∞ ρ^t 1{ǎ_t^i = a_1} =: ∆_0.

It is clear that x̂_1^i = 0, and {(x̂_t^i, â_t^i), t ≥ 1} may be interpreted as the optimal state and control processes of the MDP with initial state 0 at t = 1. Hence the two processes {(x̂_{t+1}^i, â_{t+1}^i), t ≥ 0} and {(x̌_t^i, ǎ_t^i), t ≥ 0} have the same distribution. Therefore,

E ∑_{t=1}^∞ ρ^{t−1} 1{â_t^i = a_1} = E ∑_{t=0}^∞ ρ^t 1{ǎ_t^i = a_1}.

It follows that

v_2(x) − v_1(x) = δ + ρ ∆_0,  ∀x ≥ θ.   (D.4)

Combining (D.2) and (D.3) gives

ρ ∫ v_l(y) Q(dy|θ) = ρ v_l(0) + γ_l,  l = 1, 2,

which implies

ρ ∫ [v_2(x) − v_1(x)] Q(dx|θ) = δ + ρ ∆_0.   (D.5)

By Q([0, θ)|θ) = 0 and (D.4), the left hand side of (D.5) equals ρ(δ + ρ∆_0). Then ρ(δ + ρ∆_0) = δ + ρ∆_0, which is impossible since 0 < ρ < 1 and δ + ρ∆_0 > 0. Therefore, (D.3) does not hold. This completes the proof. ⊓⊔

For the MDP with cost (D.1), we continue to analyze the dynamic programming equation

v_r(x) = min[ ρ ∫ v_r(y) Q(dy|x) + R_1(x), ρ v_r(0) + R_1(x) + r ].   (D.6)

For each fixed r ∈ (0, ∞), we obtain the optimal policy as a threshold policy with threshold parameter θ(r). By evaluating the cost (D.1) associated with the two policies a_t^i(x_t^i) ≡ a_0 and a_t^i(x_t^i) ≡ a_1, respectively, we have the prior estimate

v_r(x) ≤ min{ R_1(1)/(1 − ρ), R_1(x) + (r + ρ R_1(0))/(1 − ρ) }.   (D.7)

On the other hand, let {x_t^i, t ≥ 0} with x_0^i = x be generated by any fixed Markov policy. Then

E ∑_{t=0}^∞ ρ^t ( R_1(x_t^i) + r 1{a_t^i = a_1} ) ≥ R_1(x) + ∑_{t=1}^∞ ρ^t R_1(0),

which implies

v_r(x) ≥ R_1(x) + ρ R_1(0)/(1 − ρ).   (D.8)

If r > ρ[R_1(1) − R_1(0)]/(1 − ρ), it follows from (D.7) and (D.8) that

ρ ∫ v_r(y) Q(dy|x) < ρ v_r(0) + r,  ∀x,   (D.9)

i.e., θ(r) = 1+.
Lemma D.2. There exists δ_0 > 0 such that for all 0 < r < δ_0,

ρ ∫ v_r(y) Q(dy|x) > ρ v_r(0) + r,  ∀x,   (D.10)

and so θ(r) = 0.

Proof. By (D.8),

ρ ∫ v_r(y) Q(dy|x) ≥ ρ ∫ R_1(y) Q(dy|x) + ρ² R_1(0)/(1 − ρ) ≥ ρ ∫ R_1(y) Q(dy|0) + ρ² R_1(0)/(1 − ρ),

and (D.7) gives

ρ v_r(0) + r ≤ ρ R_1(0)/(1 − ρ) + r/(1 − ρ).

Since R_1(x) is strictly increasing,

C_R := ∫ R_1(y) Q(dy|0) − R_1(0) > 0.

And we have

ρ ∫ v_r(y) Q(dy|x) − (ρ v_r(0) + r) ≥ ρ C_R − r/(1 − ρ).

It suffices to take δ_0 = ρ(1 − ρ) C_R. ⊓⊔

Define the nonempty sets

R_{a_0} = {r > 0 | (D.9) holds},  R_{a_1} = {r > 0 | (D.10) holds}.
Remark D.1. We have (ρ[R_1(1) − R_1(0)]/(1 − ρ), ∞) ⊂ R_{a_0} and (0, δ_0) ⊂ R_{a_1}.
Lemma D.3. Let (r, v_r) be the parameter and the associated solution in (D.6).

i) If r > 0 satisfies

ρ ∫ v_r(y) Q(dy|x) ≤ ρ v_r(0) + r,  ∀x,   (D.11)

then any r′ > r is in R_{a_0}.

ii) If r > 0 satisfies

ρ ∫ v_r(y) Q(dy|x) ≥ ρ v_r(0) + r,  ∀x,   (D.12)

then any r′ ∈ (0, r) is in R_{a_1}.

Proof. i) For r′ > r, v_{r′} is uniquely solved from (D.6) with r′ in place of r. We can use (D.11) to verify

v_r(x) = min[ ρ ∫ v_r(y) Q(dy|x) + R_1(x), ρ v_r(0) + R_1(x) + r′ ].

Hence v_{r′} = v_r on [0, 1]. It follows that ρ ∫ v_{r′}(y) Q(dy|x) < ρ v_{r′}(0) + r′ for all x. Hence r′ ∈ R_{a_0}.

ii) By (D.6) and (D.12), v_r(0) = [R_1(0) + r]/(1 − ρ), and subsequently,

v_r(x) = ρ v_r(0) + R_1(x) + r = ρ[R_1(0) + r]/(1 − ρ) + R_1(x) + r.

By substituting v_r(0) and v_r(x) into (D.12), we obtain

ρ R_1(0) + r ≤ ρ ∫ R_1(y) Q(dy|x),  ∀x.   (D.13)

Now for 0 < r′ < r, we construct v_{r′}(x), as a candidate solution to (D.6) with r replaced by r′, to satisfy

v_{r′}(0) = ρ v_{r′}(0) + R_1(0) + r′,  v_{r′}(x) = ρ v_{r′}(0) + R_1(x) + r′,   (D.14)

which gives

v_{r′}(x) = ρ[R_1(0) + r′]/(1 − ρ) + R_1(x) + r′.   (D.15)

We show that v_{r′}(x) in (D.15) satisfies

ρ v_{r′}(0) + r′ < ρ ∫ v_{r′}(y) Q(dy|x),  ∀x,   (D.16)

which is equivalent to ρ R_1(0) + r′ < ρ ∫ R_1(y) Q(dy|x) for all x, which in turn follows from (D.13) and r′ < r. By (D.14) and (D.16), v_{r′} indeed satisfies (D.6) with r replaced by r′. So r′ ∈ R_{a_1}. ⊓⊔

Further define r̲ = sup R_{a_1}, r̄ = inf R_{a_0}.

Lemma D.4. i) r̲ satisfies ρ ∫ v_{r̲}(y) Q(dy|0) = ρ v_{r̲}(0) + r̲, and θ(r̲) = 0.

ii) r̄ satisfies ρ ∫ v_{r̄}(y) Q(dy|1) = ρ v_{r̄}(1) = ρ v_{r̄}(0) + r̄, and θ(r̄) = 1.

iii) We have 0 < r̲ < r̄ < ∞.

iv) The threshold θ(r), as a function of r ∈ (0, ∞), is continuous and strictly increasing on [r̲, r̄].

Proof. i)-ii) By Lemmas D.2 and D.3, we have 0 < r̲ ≤ ∞ and 0 ≤ r̄ < ∞. Assume r̲ = ∞; then R_{a_1} = (0, ∞), giving R_{a_0} = ∅, a contradiction to Remark D.1. So 0 < r̲ < ∞. For δ_0 > 0 in Lemma D.2, (0, δ_0) ⊂ R_{a_1}. Therefore, 0 < r̄ < ∞. Note that v_r depends on the parameter r continuously, i.e., lim_{|r′−r|→0} sup_x |v_{r′}(x) − v_r(x)| = 0. Hence

ρ ∫ v_{r̲}(y) Q(dy|0) ≥ ρ v_{r̲}(0) + r̲.

Now assume

ρ ∫ v_{r̲}(y) Q(dy|0) > ρ v_{r̲}(0) + r̲.   (D.17)

Then there exists a sufficiently small ε > 0 such that (D.17) still holds when (r̲ + ε, v_{r̲+ε}) replaces (r̲, v_{r̲}); since g(x) = ∫ v_{r̲+ε}(y) Q(dy|x) is increasing in x, then r̲ + ε ∈ R_{a_1}, which is impossible. Hence (D.17) does not hold, and this proves i). ii) can be shown in a similar manner.

To show iii), assume

0 < r̄ < r̲ < ∞.   (D.18)

Then, recalling Remark D.1, there exist r′ ∈ R_{a_0} and r″ ∈ R_{a_1} such that

0 < r̄ < r′ < r″ < r̲ < ∞.

By Lemma D.3-i), r″ ∈ R_{a_0}, and then r″ ∈ R_{a_0} ∩ R_{a_1} = ∅, which is impossible. Therefore, (D.18) does not hold and we conclude 0 < r̲ ≤ r̄ < ∞. We further assume r̲ = r̄. Then i)-ii) would imply ∫ v_{r̲}(y) Q(dy|0) = v_{r̲}(1), which is impossible since v_{r̲} is strictly increasing on [0, 1] and (A3) holds. This proves iii).

iv) By the definition of r̲ and r̄, it can be shown using (D.6) that θ(r) ∈ (0, 1) for r ∈ (r̲, r̄). By the continuous dependence of the function v_r(·) on r and the method of proving [27, Lemma 10], we can show the continuity of θ(r) on (r̲, r̄), and further show lim_{r→r̲+} θ(r) = 0 and lim_{r→r̄−} θ(r) = 1. So θ(r) is continuous on [r̲, r̄]. If θ(r) were not strictly increasing on [r̲, r̄], there would exist r̲ ≤ r_1 < r_2 ≤ r̄ such that

θ(r_1) ≥ θ(r_2).   (D.19)

If θ(r_1) > θ(r_2) in (D.19), by the continuity of θ(r), the boundary values θ(r̲) = 0 and θ(r̄) = 1, and the intermediate value theorem, we may find r′ ∈ (r_2, r̄) such that θ(r′) = θ(r_1). Next, we replace r_2 by r′. Thus if θ(r) is not strictly increasing, we may find r_1 < r_2 from (r̲, r̄) such that θ(r_1) = θ(r_2) ∈ (0, 1), which is a contradiction to Lemma D.1. This proves iv). ⊓⊔
Remark D.2. By Lemmas D.3 and D.4, R_{a_1} = (0, r̲) and R_{a_0} = (r̄, ∞).

Acknowledgement
We would like to thank Aditya Mahajan for helpful discussions.
References
1. Acemoglu, D., Jensen, M.K.: Aggregate comparative statics. Games and Economic Behavior, 27-49 (2013)
2. Acemoglu, D., Jensen, M.K.: Robust comparative statics in large dynamic economies. Journal of Political Economy, 587-640 (2015)
3. Adlakha, S., Johari, R., Weintraub, G.Y.: Equilibria of dynamic games with many players: Existence, approximation, and market structure. J. Econ. Theory, 269-316 (2015)
4. Altman, E., Stidham, S.: Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems, 267-291 (1995)
5. Amir, R.: Sensitivity analysis of multisector optimal economic dynamics. Journal of Mathematical Economics, 123-141 (1996)
6. Asmussen, S.: Applied Probability and Queues, 2nd edn. Springer, New York (2003)
7. Athreya, K.B., Roy, V.: When is a Markov chain regenerative? Statistics and Probability Letters, 22-26 (2014)
8. Babichenko, Y.: Best-reply dynamics in large binary-choice anonymous games. Games and Economic Behavior, 130-144 (2013)
9. Bardi, M.: Explicit solutions of some linear-quadratic mean field games. Netw. Heterogeneous Media, 243-261 (2012)
10. Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer, Berlin (2011)
11. Becker, R.A.: Comparative dynamics in aggregate models of optimal capital accumulation. Quarterly Journal of Economics, 1235-1256 (1985)
12. Bensoussan, A., Frehse, J., Yam, P.: Mean Field Games and Mean Field Type Control Theory. Springer, New York (2013)
13. Biswas, A.: Mean field games with ergodic cost for discrete time Markov processes. Preprint, arXiv:1510.08968 (2015)
14. Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer-Verlag, New York (2000)
15. Brock, W.A., Durlauf, S.N.: Discrete choice with social interactions. Rev. Econ. Studies, 235-260 (2001)
16. Caines, P.E.: Mean field games. In: Samad, T., Baillieul, J. (eds.) Encyclopedia of Systems and Control. Springer-Verlag, Berlin (2014)
17. Caines, P.E., Huang, M., Malhamé, R.P.: Mean field games. In: Basar, T., Zaccour, G. (eds.) Handbook of Dynamic Game Theory, pp. 345-372. Springer, Berlin (2017)
18. Cardaliaguet, P.: Notes on mean field games. University of Paris, Dauphine (2012)
19. Carmona, R., Delarue, F.: Probabilistic Theory of Mean Field Games with Applications, vols. I and II. Springer, Cham (2018)
20. Dorato, P.: On sensitivity in optimal control systems. IEEE Transactions on Automatic Control, 256-257 (1963)
21. Filar, J.A., Vrieze, K.: Competitive Markov Decision Processes. Springer, New York (1997)
22. Gomes, D.A., Mohr, J., Souza, R.R.: Discrete time, finite state space mean field games. J. Math. Pures Appl. 93, 308-328 (2010)
26. Huang, M., Caines, P.E., Malhamé, R.P.: Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Trans. Autom. Control 52, 1560-1571 (2007)
27. Huang, M., Ma, Y.: Mean field stochastic games: Monotone costs and threshold policies (in Chinese). Sci. Sin. Math. (special issue in honour of the 80th birthday of Prof. H-F. Chen), 1445-1460 (2016)
28. Huang, M., Ma, Y.: Mean field stochastic games with binary action spaces and monotone costs. arXiv:1701.06661v1 (2017)
29. Huang, M., Ma, Y.: Mean field stochastic games with binary actions: Stationary threshold policies. Proc. 56th IEEE Conference on Decision and Control, Melbourne, Australia, pp. 27-32 (2017)
30. Huang, M., Malhamé, R.P., Caines, P.E.: Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Commun. Inform. Systems 6, 221-251 (2006)
31. Huang, M., Zhou, M.: Linear quadratic mean field games: Asymptotic solvability and relation to the fixed point approach. IEEE Transactions on Automatic Control (2018, in revision, conditionally accepted)
32. Ito, K., Kunisch, K.: Sensitivity analysis of solutions to optimization problems in Hilbert spaces with applications to optimal control and estimation. J. Differential Equations, 1-40 (1992)
33. Jovanovic, B., Rosenthal, R.W.: Anonymous sequential games. Journal of Mathematical Economics, 77-87 (1988)
34. Jiang, L., Anantharam, V., Walrand, J.: How bad are selfish investments in network security? IEEE/ACM Trans. Networking, 549-560 (2011)
35. Kolokoltsov, V.N.: Nonlinear Markov games on a finite state space (mean-field and binary interactions). International J. Statistics Probability, 77-91 (2012)
36. Kress, R.: Linear Integral Equations. Springer, Berlin (1989)
37. Lasry, J.-M., Lions, P.-L.: Mean field games. Japan. J. Math. 2, 229-260 (2007)
38. Lelarge, M., Bolot, J.: A local mean field analysis of security investments in networks. Proc. ACM SIGCOMM NetEcon, Seattle, WA, pp. 25-30 (2008)
39. Li, T., Zhang, J.-F.: Asymptotically optimal decentralized control for large population stochastic multiagent systems. IEEE Trans. Autom. Control, 1643-1660 (2008)
40. Manfredi, P., Posta, P.D., d'Onofrio, A., Salinelli, E., Centrone, F., Meo, C., Poletti, P.: Optimal vaccination choice, vaccination games, and rational exemption: An appraisal. Vaccine, 98-109 (2010)
41. Meyn, S., Tweedie, R.L.: Markov Chains and Stochastic Stability, 2nd edn. Cambridge University Press, Cambridge (2009)
42. Milgrom, P., Shannon, C.: Monotone comparative statics. Econometrica, 157-180 (1994)
43. Moon, J., Basar, T.: Linear quadratic risk-sensitive and robust mean field games. IEEE Trans. Autom. Control, 1062-1077 (2017)
44. Müller, A., Stoyan, D.: Comparison Methods for Stochastic Models and Risks. Wiley, Chichester (2002)
45. Oniki, H.: Comparative dynamics (sensitivity analysis) in optimal control theory. J. Econ. Theory, 265-283 (1973)
46. Saldi, N., Basar, T., Raginsky, M.: Markov-Nash equilibria in mean-field games with discounted cost. SIAM J. Control Optimization, 4256-4287 (2018)
47. Samuelson, P.A.: Foundations of Economic Analysis, enlarged edn. Harvard University Press, Cambridge, MA (1983)
48. Schelling, T.C.: Hockey helmets, concealed weapons, and daylight saving: A study of binary choices with externalities. The Journal of Conflict Resolution, 381-428 (1973)
49. Selten, R.: An axiomatic theory of a risk dominance measure for bipolar games with linear incentives. Games and Econ. Behav., 213-263 (1995)
50. Shapley, L.S.: Stochastic games. Proc. Natl. Acad. Sci. 39, 1095-1100 (1953)
51. Sigman, K., Wolff, R.W.: A review of regenerative processes. SIAM Rev., 269-288 (1993)
52. Sun, Y.: The exact law of large numbers via Fubini extension and characterization of insurable risks. J. Econ. Theory, 31-69 (2006)
53. Topkis, D.M.: Supermodularity and Complementarity. Princeton Univ. Press, Princeton (1998)
54. Walker, M., Wooders, J., Amir, R.: Equilibrium play in matches: Binary Markov games. Games and Economic Behavior, 487-502 (2011)
55. Weintraub, G.Y., Benkard, C.L., Van Roy, B.: Markov perfect industry dynamics with many firms. Econometrica, 1375-1411 (2008)
56. Yong, J.: Linear-quadratic optimal control problems for mean-field stochastic differential equations. SIAM J. Control Optim. 51, 2809-2838 (2013)