Binary Mean Field Stochastic Games: Stationary Equilibria and Comparative Statics
Minyi Huang and Yan Ma ∗

Minyi Huang: School of Mathematics and Statistics, Carleton University, Ottawa, ON K1S 5B6, Canada. This author was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Yan Ma: School of Mathematics and Statistics, Zhengzhou University, 450001, Henan, China. This author was supported by the National Science Foundation of China (No. 11601489).

∗ In IMA volume Modeling, Stochastic Control, Optimization, and Applications, eds. G. Yin and Q. Zhang, Springer, 2019, pp. 283-313. Submitted Dec 2018; revised Feb 2019. This version: Oct 10, 2020. Minor changes in Example 2.
Abstract
This paper considers mean field games in a multi-agent Markov decision process (MDP) framework. Each player has a continuum state and binary action, and benefits from the improvement of the condition of the overall population. Based on an infinite horizon discounted individual cost, we show existence of a stationary equilibrium, and prove its uniqueness under a positive externality condition. We further analyze comparative statics of the stationary equilibrium by quantitatively determining the impact of the effort cost.
1 Introduction

Mean field game theory provides a powerful methodology for reducing complexity in the analysis and design of strategies in large population dynamic games [25, 30, 37]. Following ideas in statistical physics, it takes a continuum approach to specify the aggregate impact of many individually insignificant players and solves a special stochastic optimal control problem from the point of view of a representative player. By this methodology, one may construct a set of decentralized strategies for the original large but finite population model and show its ε-Nash equilibrium property [25, 26, 30].
A related solution notion in Markov decision models is the oblivious equilibrium [55]. The readers are referred to [12, 16, 17, 18, 19] for an overview of mean field game theory and further references. For mean field type optimal control, see [12, 56], but the analysis in these models only involves a single decision maker.

Dynamic games within an MDP setting originated from the work of Shapley and are called stochastic games [21, 50]. Their mean field game extension has been studied in the literature; see e.g. [3, 13, 46, 55]. Continuous time mean field games with finite state space can be found in [22, 35]. Our previous work [27, 28] studied a class of mean field games in a multi-agent Markov decision process (MDP) framework. The players in [27] have continuum state spaces and binary action spaces, and have coupling through their costs. The state of each player is used to model its risk (or unfitness) level, which has random increase if no active control is taken. Naturally, the one-stage cost of a player is an increasing function of its own state apart from coupling with others. The motivation of this modeling framework comes from applications including network security investment games and flu vaccination games [34, 38, 40]; when the one-stage cost is an increasing function of the population average state, it reflects positive externalities. Markov decision processes with binary action spaces also arise in control of queues and machine replacement problems [4, 10]. Binary choice models have formed a subject of significant interest [8, 15, 48, 49, 54]. Our game model has a connection with anonymous sequential games [33], which combine stochastic game modeling with a continuum of players. In anonymous sequential games one determines the equilibrium as a joint state-action distribution of the population and leaves the individual strategies unspecified [33, Sec. 4], although there is an interpretation of randomized actions for players sharing a given state.

For both anonymous games and MDP based mean field games, stationary solutions with discount have been studied in the literature [3, 33]. These works focus on fixed point analysis to prove the existence of a stationary distribution. This approach does not address ergodic behavior of individuals or the population, while assuming the population starts from the steady-state distribution at the initial time. Thus, there is a need to examine whether the individuals collectively have the ability to move into that distribution at all when they have a general initial distribution. Our ergodic analysis based approach will provide justification of the stationary solution regarding the population's ability to settle down around the limiting distribution.

The previous work [27, 28] studied the finite horizon mean field game by showing existence of a solution with threshold policies, and under an infinite horizon discounted cost further proved there is at most one stationary equilibrium, for which existence was not established. A similar continuous time model is introduced in [57], which addresses Poisson state jumps and impulse control.
It should be noted that except for linear-quadratic models [9, 26, 31, 39, 43], mean field games rarely have closed-form solutions and often rely on heavy numerical computations. Within this context, the consideration of structured solutions, such as threshold policies, is of particular interest from the point of view of efficient computation and simple implementation. Under such a policy, the individual states evolve as regenerative processes [6, 51].
By exploiting stochastic monotonicity, this paper adopts more general state transition assumptions than in [27, 28] and continues the analysis of the stationary equation system. The first contribution of the present paper is the proof of the existence of a stationary equilibrium. Our analysis depends on checking the continuous dependence of the limiting state distribution on the threshold parameter in the best response. The existence and uniqueness analysis in this paper has appeared in a preliminary form in the conference paper [29].

A key parameter in our game model is the effort cost. Intuitively, this parameter is a disincentive indicator of an individual for taking active efforts, and in turn will further impact the mean field forming the ambient environment of that agent. This suggests that we can study a family of mean field games parametrized by the effort costs and compare their solution behaviors. We address this in the setup of comparative statics, which have a long history in the economics literature [24, 42, 47] and operations research [53] and provide the primary means to analyze the effect of model parameter variations. For dynamic models, such as economic growth models, the analysis follows similar ideas and is sometimes called comparative dynamics [5, 11, 45, 47], comparing two dynamic equilibria. In control and optimization, such studies are usually called sensitivity analysis [14, 20, 32]. For comparative statics in large static games and mean field games, see [1, 2]. Our analysis is accomplished by performing perturbation analysis around the equilibrium of the mean field game.

The paper is organized as follows. Section 2 introduces the mean field stochastic game. The best response is analyzed in Section 3. Section 4 proves existence and uniqueness of stationary equilibria. Comparative statics are analyzed in Section 5. Section 6 concludes the paper.
2 The Mean Field Stochastic Game

The system consists of N players denoted by A_i, 1 ≤ i ≤ N. At time t ∈ Z_+ = {0, 1, 2, ...}, the state of A_i is denoted by x_t^i, and its action by a_t^i. For simplicity, we consider a population of homogeneous (or symmetric) players. Each player has state space S = [0, 1] and action space A = {a_0, a_1}. A value of S may be interpreted as a risk or unfitness level. A player can either take inaction (as a_0) or make an active effort (as a_1). For an interval I, let B(I) denote the Borel σ-algebra of I.

The state of each player evolves as a controlled Markov process, which is affected only by its own action. For t ≥ 0 and x ∈ S, the state has a transition kernel specified by

P(x_{t+1}^i ∈ B | x_t^i = x, a_t^i = a_0) = Q(B|x),   (1)
P(x_{t+1}^i = 0 | x_t^i = x, a_t^i = a_1) = 1,   (2)

where Q(·|x) is a stochastic kernel defined for B ∈ B(S) and Q([x, 1]|x) = 1.
By the structure of Q, the state of the player deteriorates if no active control is taken. The vector process (x_t^1, ..., x_t^N) constitutes a controlled Markov process in higher dimension with its transition kernel defining a product measure on (B(S))^N for given (x_t^1, ..., x_t^N, a_t^1, ..., a_t^N).

Define the population average state x_t^{(N)} = (1/N) ∑_{i=1}^N x_t^i. The one stage cost of A_i is

c(x_t^i, x_t^{(N)}, a_t^i) = R(x_t^i, x_t^{(N)}) + γ 1{a_t^i = a_1},

where γ > 0 is the effort cost. The function R ≥ 0 is defined on S × S and models the risk-related cost. Let ν_i denote the strategy of A_i. We introduce the infinite horizon discounted cost

J_i(x_0^1, ..., x_0^N, ν_1, ..., ν_N) = E ∑_{t=0}^∞ β^t c(x_t^i, x_t^{(N)}, a_t^i),  1 ≤ i ≤ N.   (3)

The standard methodology of mean field games may be applied by approximating {x_t^{(N)}, t ≥ 0} by a deterministic sequence {z_t, t ≥ 0} which depends on the initial condition of the system. One may solve the limiting optimal control problem of A_i and derive a dynamic programming equation for its value function denoted by v_i(t, x, (z_k)_{k=0}^∞), whose dependence on t is due to the time-varying sequence {z_t, t ≥ 0}. Subsequently one derives another equation for the mean field {z_t, t ≥ 0} by averaging the individual states across the population. This approach, however, has the drawback of heavy computational load.

We are interested in a steady-state form of the solution of the mean field game starting with {z_t, t ≥ 0}. Such steady state equations provide information on the long time behavior of the solution and are of interest in their own right. They may also be used for approximation purposes to compute strategies efficiently. We introduce the system

v(x) = min[ β ∫ v(y) Q(dy|x) + R(x, z), β v(0) + R(x, z) + γ ],   (4)
z = ∫ x μ(dx),   (5)

where μ is a probability measure on S. We say (v, z, μ, a_i(·)) is a stationary equilibrium to (4)-(5) if i) the feedback policy a_i(·), as a mapping from S to {a_0, a_1}, is the best response with respect to z in (4), and ii) given an initial distribution of x_0^i, {x_t^i, t ≥ 0} under the policy a_i has its distribution converging (under a total variation norm or only weakly) to the stationary distribution (also called limiting distribution) μ.

We may interpret v as the value function of an MDP with cost J̄_i(x_0^i, z, ν_i) = E ∑_{t=0}^∞ β^t c(x_t^i, z, a_t^i). An alternative way to interpret (4)-(5) is that the initial state of A_i has been sampled according to the "right" distribution μ, and that z is obtained by averaging an infinite number of such initial values by the law of large numbers [52]. A similar solution notion is adopted in [2, 3] but ergodicity is not part of their solution specification.
Let the probability measure μ_k be the distribution of the R-valued random variable Z_k, k = 1, 2. We say μ_2 stochastically dominates μ_1, and denote μ_1 ≤st μ_2, if μ_2((y, ∞)) ≥ μ_1((y, ∞)) (or equivalently, P(Z_2 > y) ≥ P(Z_1 > y)) for all y. It is well known [44] that μ_1 ≤st μ_2 if and only if

∫ ψ(y) μ_1(dy) ≤ ∫ ψ(y) μ_2(dy)   (6)

for all increasing functions ψ (not necessarily strictly increasing) for which the two integrals are finite. A stochastic kernel Q(B|x), 0 ≤ x ≤ 1, B ∈ B(S), is said to be strictly stochastically increasing if ϕ(x) := ∫_S ψ(y) Q(dy|x) is strictly increasing in x ∈ S for any strictly increasing function ψ: [0, 1] → R, for which the integral is necessarily finite. Q(·|x) is said to be weakly continuous if ϕ is continuous whenever ψ is continuous.
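To make the dominance criterion concrete, the following small Python check (illustrative only; the Beta distributions and the test function ψ(y) = y³ are our own choices, not from the paper) compares empirical survival functions and the integral criterion (6).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sample distributions on [0,1]; Beta(2,5) is "smaller" than Beta(5,2).
z1 = rng.beta(2, 5, 200_000)
z2 = rng.beta(5, 2, 200_000)

# Survival-function criterion: P(Z1 > y) <= P(Z2 > y) for all y.
ys = np.linspace(0, 1, 201)
surv1 = np.array([(z1 > y).mean() for y in ys])
surv2 = np.array([(z2 > y).mean() for y in ys])
print("mu_1 <=st mu_2 (empirical):", bool(np.all(surv1 <= surv2 + 1e-3)))

# Equivalent integral criterion (6) with the increasing test function y^3.
print("E psi(Z1) <= E psi(Z2):", np.mean(z1**3) <= np.mean(z2**3))
```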
Let {Y_t, t ≥ 0} be a Markov process with state space [0, 1], transition kernel Q(·|x) and initial state Y_0 = 0. So each of its trajectories is monotonically increasing. Define τ_Q^θ = inf{t | Y_t ≥ θ} for θ ∈ (0, 1). It is clear that τ_Q^{θ_1} ≤ τ_Q^{θ_2} for 0 < θ_1 < θ_2 < 1. We introduce the following assumptions.

(A1) {x_0^i, i ≥ 1} are i.i.d. random variables taking values in S.

(A2) R(x, z) is a continuous function on S × S. For each fixed z, R(·, z) is strictly increasing.

(A3) i) Q(·|x) satisfies Q([x, 1]|x) = 1, and is strictly stochastically increasing; ii) Q(dy|x) is weakly continuous and has a positive probability density q(y|x) for each fixed x < 1; iii) for any small 0 < δ < 1, inf_x Q([1 − δ, 1]|x) > 0.

(A4) R(x, ·) is increasing for each fixed x.

(A5) lim_{θ↑1} E τ_Q^θ = ∞.

(A3)-iii) will be used to ensure the uniform ergodicity of the controlled Markov process. In fact, under (A3) we can show E τ_Q^θ < ∞. The following condition is a special case of (A3).

(A3′) Q(·|x) is equal to the law of x + (1 − x)ξ for some random variable ξ with probability density f_ξ(x) > 0, a.e. x ∈ S.

When (A3′) holds, we can verify (A5) by analyzing the stopping time τ_ξ = inf{t | ∏_{s=1}^t (1 − ξ_s) ≤ 1 − θ}, where {ξ_s, s ≥ 1} is a sequence of i.i.d. random variables with probability density f_ξ. For existence analysis of the mean field game, (A5) will be used to ensure continuity of the mean field when the threshold θ approaches 1.
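As an illustration of (A5) under (A3′), the following Monte Carlo sketch (our own example; ξ ~ Beta(1, 3) is an assumed density with f_ξ > 0 on (0, 1)) estimates E τ_Q^θ and shows its growth as θ ↑ 1. Here 1 − Y_t = ∏_{s≤t}(1 − ξ_s), so E τ_Q^θ grows roughly like |log(1 − θ)|.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_hitting_time(theta, n_paths=2000, t_max=100_000):
    """Estimate E[tau_Q^theta] for Y_{t+1} = Y_t + (1 - Y_t)*xi, xi ~ Beta(1,3)."""
    total = 0
    for _ in range(n_paths):
        y, t = 0.0, 0
        while y < theta and t < t_max:
            y += (1.0 - y) * rng.beta(1, 3)
            t += 1
        total += t
    return total / n_paths

for theta in [0.5, 0.9, 0.99, 0.999]:
    print(f"theta={theta}: E[tau] ~ {mean_hitting_time(theta):.2f}")
```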
Proposition 1. The two conditions are equivalent:

i) μ_1 ≤st μ_2, and μ_1 ≠ μ_2;

ii) ∫ φ(y) μ_1(dy) < ∫ φ(y) μ_2(dy) for all strictly increasing functions φ for which both integrals are finite.

Proof. Assume i) holds. By [44, Theorem 1.2.16], we have

φ(Z_1) ≤st φ(Z_2),   (7)

and so E φ(Z_1) ≤ E φ(Z_2). Since μ_1 ≠ μ_2, there exists y_0 such that P(Z_1 > y_0) ≠ P(Z_2 > y_0). Take r_0 such that φ(y_0) = r_0. Then

P(φ(Z_1) > r_0) ≠ P(φ(Z_2) > r_0).   (8)

If E φ(Z_1) = E φ(Z_2) were true, by (7) and [44, Theorem 1.2.9], φ(Z_1) and φ(Z_2) would have the same distribution, which contradicts (8). We conclude E φ(Z_1) < E φ(Z_2), which is equivalent to ii).

Next we show ii) implies i). Let ψ be any increasing function satisfying (6) with two finite integrals. When ii) holds, we take φ_ε(y) = ψ(y) + ε y/(1 + |y|), ε > 0. Then ∫ φ_ε dμ_1 < ∫ φ_ε dμ_2 holds for all ε > 0. Letting ε → 0, (6) follows and μ_1 ≤st μ_2. It is clear that μ_1 ≠ μ_2. ⊓⊔

3 The Best Response

For this section we assume (A1)-(A3). We take any fixed z ∈ [0, 1] and consider (4) as a separate equation, which is rewritten below:

v(x) = min{ β ∫ v(y) Q(dy|x) + R(x, z), β v(0) + R(x, z) + γ }.   (9)

Here z is not required to satisfy (5). In relation to the mean field game, the resulting optimal policy will be called the best response with respect to z. Denote G(x) = ∫ v(y) Q(dy|x).

Lemma 1. i) Equation (9) has a unique solution v ∈ C([0, 1], R).

ii) v is strictly increasing.

iii) The optimal policy is determined as follows:

a) If β G(1) < β v(0) + γ, a_i(x) ≡ a_0.

b) If β G(1) = β v(0) + γ, a_i(1) = a_1 and a_i(x) = a_0 for x < 1.

c) If β G(0) ≥ β v(0) + γ, a_i(x) ≡ a_1.

d) If β G(0) < β v(0) + γ < β G(1), there exists a unique x* ∈ (0, 1) and a_i is a threshold policy with parameter x*, i.e., a_i(x) = a_1 if x ≥ x* and a_i(x) = a_0 if x < x*.

Proof. Define the dynamic programming operator

(L g)(x) = min{ β ∫ g(y) Q(dy|x) + R(x, z), β g(0) + R(x, z) + γ },   (10)

which is from C([0, 1], R) to itself. The proving method in [27], [28, Lemma 6], which assumed (A3′), can be extended to the present equation (9) in a straightforward manner. In particular, for the proof of ii) and iii), we obtain progressively stronger properties of v and G. First, denoting g_0 = 0 and g_{k+1} = L g_k for k ≥ 0,
we use a successive approximation procedure to show that v is increasing, which implies that G is continuous and increasing by weak continuity and monotonicity of Q. Since R is strictly increasing in x, by the right hand side of (9), we show that v is strictly increasing, which implies the same property for G by strict monotonicity of Q. ⊓⊔

For the optimal policy specified in part iii) of Lemma 1, we can formally denote the threshold parameters for the corresponding cases: a) θ = 1+, b) θ = 1, c) θ = 0, and d) θ = x*. Such a policy will be called a θ-threshold policy.
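As a computational illustration of Lemma 1, the following sketch runs value iteration with the operator (10) on a grid and reads off the threshold; the kernel (uniform on [x, 1], as in Example 1 below) and the numerical values of β, γ, c, z are our own illustrative choices.

```python
import numpy as np

beta, gamma, c, z = 0.8, 0.5, 0.2, 0.3
n = 1001
xs = np.linspace(0, 1, n)
R = xs * (c + z)                     # R(x,z) = x(c+z), cf. Example 1
v = np.zeros(n)

for _ in range(3000):
    # G(x) = int v(y) Q(dy|x) with Q(.|x) = Uniform[x,1] (Riemann sum).
    G = (np.cumsum(v[::-1]) / np.arange(1, n + 1))[::-1]
    v_new = np.minimum(beta * G + R, beta * v[0] + R + gamma)
    if np.max(np.abs(v_new - v)) < 1e-12:
        break
    v = v_new

active = beta * G >= beta * v[0] + gamma        # region where a_1 is optimal
theta = xs[active.argmax()] if active.any() else None   # None encodes "1+"
print("threshold:", theta)
```

Since R(x, z) enters both branches of (9) identically, the action comparison reduces to β G(x) versus β v(0) + γ, which is how the code extracts the threshold.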
We give the condition for θ = 0 in the next lemma.

Lemma 2. For γ > 0 and v solving (9),

β G(0) ≥ β v(0) + γ   (11)

holds if and only if

γ ≤ β ∫ R(y, z) Q(dy|0) − β R(0, z).   (12)

Proof. We show necessity first. Suppose (11) holds. Note that G(x) is strictly increasing on [0, 1]. Equation (9) reduces to

v(x) = β v(0) + R(x, z) + γ,   (13)
β G(x) ≥ β v(0) + γ, ∀x.   (14)

From (13), we uniquely solve

v(0) = (1/(1 − β))[R(0, z) + γ],  v(x) = (β/(1 − β))[R(0, z) + γ] + R(x, z) + γ,   (15)

which, substituted into (14) at x = 0, implies (12).

We continue to show sufficiency. Suppose (12) holds. We define v by (15) and verify (13) and (14). So v is the unique solution of (9) satisfying (11). ⊓⊔
The next lemma gives the condition for θ = 1+ in the best response.

Lemma 3. For γ > 0 and v solving (9), we have

β G(1) < β v(0) + γ   (16)

if and only if

γ > β[V_β(1) − V_β(0)],   (17)

where V_β(x) ∈ C([0, 1], R) is the unique solution of

V_β(x) = β ∫ V_β(y) Q(dy|x) + R(x, z).   (18)

Proof. By Banach's fixed point theorem, we can show that (18) has a unique solution. Next, by a successive approximation {V_β^{(k)}, k ≥ 0} with V_β^{(0)} = 0, we can show that V_β is strictly increasing. Moreover, ∫ V_β(y) Q(dy|x) is increasing in x by monotonicity of Q.

We show necessity. Since G is strictly increasing, (16) implies that the right hand side of (9) now reduces to the first term within the parentheses and that v = V_β. Since Q([1, 1]|1) = 1 gives G(1) = V_β(1), (17) follows.

To show sufficiency, suppose (17) holds. We have

β ∫ V_β(y) Q(dy|x) ≤ β V_β(1) < β V_β(0) + γ, ∀x.

Therefore, v := V_β gives the unique solution of (9) and β G(1) < β v(0) + γ. ⊓⊔
Example 1. Let R(x, z) = x(c + z), where c > 0. Take Q(·|x) as the uniform distribution on [x, 1]. Then (18) reduces to

V_β(x) = (β/(1 − x)) ∫_x^1 V_β(y) dy + R(x, z).

Define φ(x) = ∫_x^1 V_β(y) dy, x ∈ [0, 1]. Then φ′(x) = −(β/(1 − x)) φ(x) − R(x, z) holds and we solve

φ(x) = (1 − x)^β ∫_x^1 R(s, z)(1 − s)^{−β} ds,

where the right hand side converges to 0 as x → 1−. We further obtain

V_β(x) = β(1 − x)^{β−1} ∫_x^1 R(s, z)(1 − s)^{−β} ds + R(x, z)

for x ∈ [0, 1), and the right hand side has the limit R(1, z)/(1 − β) as x → 1−. This gives a well defined V_β ∈ C([0, 1], R). Therefore,

V_β(1) = (c + z)/(1 − β),  V_β(0) = β(c + z)/((1 − β)(2 − β)).

Then (17) reduces to

γ > 2β(c + z)/(2 − β).

4 Existence and Uniqueness of Stationary Equilibria

Assume (A1)-(A5) for this section. Define the class P of probability measures on S as follows: ν ∈ P if there exist a constant c_ν ≥ 0 and a nonnegative integrable function g(x) ≥ 0 on [0, 1] such that

ν(B) = ∫_B g(x) dx + c_ν 1_B(0),

where B ∈ B(S) and 1_B is the indicator function of B. When restricted to (0, 1], ν is absolutely continuous with respect to the Lebesgue measure μ_Leb.

Let X be a random variable with distribution ν ∈ P. Set x_t^i = X. Define Y_1 = x_{t+1}^i by applying a_t^i ≡ a_0. Further define Y_2 = x_{t+1}^i by applying the r-threshold policy a_t^i with r ∈ (0, 1).
Lemma 4. The distribution ν_i of Y_i is in P for i = 1, 2.

Proof. Let q(y|x) denote the density function of Q(·|x) for x ∈ [0, 1), where q(y|x) = 0 for y < x. Denote

g_1(y) = ∫_{0 ≤ x < y} q(y|x) ν(dx),  y ∈ (0, 1),

and

g_2(y) = ∫_{0 ≤ x < y∧r} q(y|x) ν(dx),  y ∈ (0, 1).

Then it can be checked that

P(Y_1 ∈ B) = ∫_B g_1(y) dy,  P(Y_2 ∈ B) = ∫_B g_2(y) dy + P(X ≥ r) 1_B(0).

This completes the lemma. ⊓⊔

In order to show that (4)-(5) has a solution, we define a mapping Γ: S → S by the following rule. For z ∈ [0, 1], we solve (4) to obtain a well defined threshold θ(z) ∈ [0, 1] ∪ {1+}, which in turn determines a limiting distribution μ_{θ(z)} of the closed-loop state process x_t^i by Lemmas A.1 and A.2. Define

Γ(z) = ∫ x μ_{θ(z)}(dx).

If Γ has a fixed point, we obtain a solution to (4)-(5).

We analyze the case where the best response gives a strictly positive threshold. Assume

γ > β max_{z∈[0,1]} ∫ [R(y, z) − R(0, z)] Q(dy|0).   (19)

Note that under a zero threshold policy, the behavior of the state process is sensitive to a positive perturbation of the threshold. The above condition ensures that the zero threshold will not occur, and this will ensure continuity of Γ to facilitate the fixed point analysis.
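The construction of Γ suggests a direct numerical scheme: compute the best-response threshold θ(z) by value iteration, estimate the stationary mean under the θ(z)-threshold policy, and iterate on z. The sketch below does this for the uniform kernel and R(x, z) = x(c + z) with illustrative parameter values; plain fixed-point iteration is used and happens to settle here, though a bisection on z − Γ(z) would be the more robust choice.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, gamma, c, n = 0.8, 0.5, 0.2, 1001
xs = np.linspace(0, 1, n)

def threshold(z, iters=2000):
    """Best-response threshold from (9); Q(.|x) = Uniform[x,1], R = x(c+z)."""
    R, v = xs * (c + z), np.zeros(n)
    for _ in range(iters):
        G = (np.cumsum(v[::-1]) / np.arange(1, n + 1))[::-1]
        v = np.minimum(beta * G + R, beta * v[0] + R + gamma)
    act = beta * G >= beta * v[0] + gamma
    return xs[act.argmax()] if act.any() else 2.0   # 2.0 encodes "1+"

def Gamma(z, t_burn=1000, t_len=100_000):
    """Long-run average state under the theta(z)-threshold policy."""
    th, x, total = threshold(z), 0.0, 0.0
    for t in range(t_burn + t_len):
        x = 0.0 if x >= th else x + (1 - x) * rng.random()
        if t >= t_burn:
            total += x
    return total / t_len

z = 0.5
for _ in range(8):
    z = Gamma(z)
print("stationary mean field z ~", round(z, 3))
```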
Lemma 5. Assume (19). Then Γ(z) is continuous on [0, 1].

Proof. Let z̄ ∈ [0, 1] be fixed, giving a corresponding threshold parameter θ̄ when (9) is solved using z̄. We check continuity at z̄ and consider 3 cases.

Case i) θ̄ ∈ (0, 1). Let π̄ be the stationary distribution with the θ̄-threshold policy. Consider any fixed ε > 0. There exists ε_0 > 0 such that for all θ ∈ (θ̄ − ε_0, θ̄ + ε_0) ⊂ (0, 1), |∫ x π(dx) − ∫ x π̄(dx)| < ε, where π is the stationary distribution associated with θ. This follows since lim_{θ→θ̄} ‖π − π̄‖_TV = 0 by Lemma A.3. Next, we can select a sufficiently small δ > 0 such that for all |z − z̄| < δ, z generates a threshold parameter θ ∈ (θ̄ − ε_0, θ̄ + ε_0), which implies |Γ(z) − Γ(z̄)| ≤ ε.

Case ii) z̄ gives θ̄ = 1. Then Γ(z̄) = 1. Fix any ε > 0. Then we can show there exists ε_0 > 0 such that for all θ ∈ (1 − ε_0, 1), the associated stationary distribution π_θ gives |Γ(z̄) − ∫ x π_θ(dx)| < ε, where we use (A5) and the right hand side of (C.1) to estimate a lower bound for ∫ x π_θ(dx). Now, there exists δ > 0 such that any z satisfying |z − z̄| < δ gives a threshold θ either in (1 − ε_0, 1) or equal to 1 or 1+; for each case, we have |Γ(z̄) − ∫ x π_θ(dx)| < ε.

Case iii) z̄ gives θ̄ = 1+. Then there exists δ > 0 such that any z satisfying |z − z̄| < δ gives a threshold parameter θ = 1+. Then Γ(z) = Γ(z̄) = 1. ⊓⊔
Theorem 1. Assume (19). There exists a stationary equilibrium to (4)-(5).

Proof. Since Γ is a continuous function from [0, 1] to [0, 1] by Lemma 5, the theorem follows from Brouwer's fixed point theorem. ⊓⊔

Let x_t^{i,θ} and π_θ denote the state process and its stationary distribution, respectively, under a θ-threshold policy. Denote z(θ) = ∫ x π_θ(dx). We have the first comparison theorem on monotonicity.
Lemma 6. z(θ_1) ≤ z(θ_2) for 0 < θ_1 < θ_2 < 1.

Proof. By the ergodicity of {x_t^{i,θ_l}, t ≥ 0} in Lemma A.2, we have the representation

z(θ_l) = lim_{k→∞} (1/k) ∑_{t=0}^{k−1} x_t^{i,θ_l}  w.p.1.

Lemma C.2 implies z(θ_1) ≤ z(θ_2). ⊓⊔

To establish uniqueness, we consider R(x, z) = R_1(x) R_2(z), where R_1 ≥ 0, R_2 ≥ 0, and which satisfies (A1)-(A5). We further make the following assumption.

(A6) R_2 > 0 on S.

This assumption indicates positive externalities since an individual benefits from the decrease of the population average state. This condition has a crucial role in the uniqueness analysis. Given the product form of R, now (9) takes the form:
V(x) = min[ β ∫ V(y) Q(dy|x) + R_1(x) R_2(z), β V(0) + R_1(x) R_2(z) + γ ].

Consider 0 ≤ z_2 < z_1 ≤ 1 and, for l = 1, 2,

V_l(x) = min[ β ∫ V_l(y) Q(dy|x) + R_1(x) R_2(z_l), β V_l(0) + R_1(x) R_2(z_l) + γ ].   (20)

Denote the optimal policy as a threshold policy with parameter θ_l in [0, 1] or equal to 1+, where we follow the interpretation in Section 3 if θ_l = 1+. We state the second comparison theorem about the threshold parameters under different mean field parameters z_l.

Theorem 2. θ_1 and θ_2 in (20) are specified according to the following scenarios:

i) If θ_1 = 0, then we have either θ_2 ∈ [0, 1] or θ_2 = 1+.

ii) If θ_1 ∈ (0, 1), we have either a) θ_2 ∈ (θ_1, 1), or b) θ_2 = 1, or c) θ_2 = 1+.

iii) If θ_1 = 1, θ_2 = 1+.

iv) If θ_1 = 1+, θ_2 = 1+.

Proof. Since R_2(z_1) > R_2(z_2) > 0,
we divide both sides of (20) by R_2(z_l) and define γ_l = γ/R_2(z_l). Then 0 < γ_1 < γ_2. The dynamic programming equation reduces to (D.2). Subsequently, the optimal policy is determined according to Lemma D.4. ⊓⊔
Corollary 1. Assume (A6) in addition to the assumptions in Theorem 1. Then the system (4)-(5) has a unique stationary equilibrium.

Proof. The proof is similar to [27, 28], which assumed (A3′). ⊓⊔

5 Comparative Statics

This section assumes (A1)-(A6). Consider the two solution systems

v̄(x) = min[ β ∫ v̄(y) Q(dy|x) + R_1(x) R_2(z̄), β v̄(0) + R_1(x) R_2(z̄) + γ̄ ],
z̄ = ∫ x μ̄(dx),   (21)

and

v(x) = min[ β ∫ v(y) Q(dy|x) + R_1(x) R_2(z), β v(0) + R_1(x) R_2(z) + γ ],
z = ∫ x μ(dx).   (22)

Suppose γ̄ satisfies (19). By Corollary 1, (21) has a unique solution denoted by (v̄, z̄, μ̄, θ̄), where θ̄ is the threshold parameter. We further assume θ̄ ∈ (0, 1). Suppose γ > γ̄. Then we can uniquely solve (v, z, μ, θ). The next theorem presents a result on monotone comparative statics [53].

Theorem 3. If γ > γ̄, we have θ > θ̄, z > z̄, v > v̄.

Proof.
We prove by contradiction. Assume θ ≤ θ̄. Then by Lemma 6, z ≤ z̄, and therefore γ/R_2(z) > γ̄/R_2(z̄). By the method of proving Theorem 2, we would establish θ > θ̄, which contradicts the assumption θ ≤ θ̄. We conclude θ > θ̄. By Lemma 6 and Remark B.1, we have z > z̄.

For (21), we use value iteration to approximate v̄ by an increasing sequence of functions v̄_k with v̄_0 = 0. Similarly, v is approximated by v_k with v_0 = 0. By induction, we have v_k ≥ v̄_k for all k. This proves v ≥ v̄. Next, we have β v(0) + R_1(x) R_2(z) + γ > β v̄(0) + R_1(x) R_2(z̄) + γ̄ on [0, 1], and β ∫ v(y) Q(dy|x) + R_1(x) R_2(z) > β ∫ v̄(y) Q(dy|x) + R_1(x) R_2(z̄) on (0, 1]. By the method in [27, Lemma 2], we have v > v̄ on (0, 1]. Then ∫ v(y) Q(dy|0) > ∫ v̄(y) Q(dy|0). This further implies v(0) > v̄(0). ⊓⊔
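A quick numerical sanity check of Theorem 3 is sketched next (illustrative only: uniform kernel, R_1(x) = x, R_2(z) = c + z, and our own parameter values; the closed-form stationary mean z(θ) used below is derived for this kernel later in this section, cf. (40)): the equilibrium threshold and mean field both increase with γ.

```python
import numpy as np

def equilibrium(gamma, beta=0.8, c=0.2, n=801, z=0.4):
    xs = np.linspace(0, 1, n)
    for _ in range(20):                        # fixed point in z
        R, v = xs * (c + z), np.zeros(n)
        for _ in range(1500):                  # value iteration for (9)
            G = (np.cumsum(v[::-1]) / np.arange(1, n + 1))[::-1]
            v = np.minimum(beta * G + R, beta * v[0] + R + gamma)
        act = beta * G >= beta * v[0] + gamma
        th = xs[act.argmax()] if act.any() else 1.0 - 1e-9
        L = np.log(1 - th)
        z = ((1 - th) / 2 - L) / (2 - L)       # stationary mean, cf. (40)
    return th, z

print("gamma=0.3:", equilibrium(0.3))
print("gamma=0.5:", equilibrium(0.5))          # larger theta and z expected
```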
Remark 1. It is possible to have θ = 1+ in Theorem 3.

By a continuity argument, we can further show

lim_{γ→γ̄} ( |θ − θ̄| + |z − z̄| + sup_x |v(x) − v̄(x)| ) = 0.

In the analysis below, we take γ = γ̄ + ε for some small ε > 0. For this section, we further introduce the following assumption.

(A7) For γ > γ̄, (v, z, θ) has the representation

v(x) = v̄(x) + ε w(x) + o(ε),  0 ≤ x ≤ 1,   (23)
z = z̄ + ε z_γ + o(ε),   (24)
θ = θ̄ + ε θ_γ + o(ε),   (25)

where v, z, θ are solved depending on the parameter γ and w is a function defined on [0, 1]. The derivatives z_γ and θ_γ at γ̄ exist, and R_2 is differentiable on [0, 1]. For 0 ≤ x < 1, the probability density function q(y|x), y ∈ [x, 1], of Q(dy|x) is continuous on {(x, y) | 0 ≤ x ≤ y < 1}. Moreover, ∂q(y|x)/∂x exists and is continuous in (x, y).

We aim to provide a characterization of w, z_γ, θ_γ.
Theorem 4. The function w satisfies

w(x) = β ∫ w(y) Q(dy|x) + R_1(x) R_2′(z̄) z_γ for 0 ≤ x ≤ θ̄,
w(x) = β w(0) + R_1(x) R_2′(z̄) z_γ + 1 for θ̄ < x ≤ 1.   (26)

Proof. We have

v̄(x) = β ∫ v̄(y) Q(dy|x) + R_1(x) R_2(z̄),  x ∈ [0, θ̄],

and

v(x) = β ∫ v(y) Q(dy|x) + R_1(x) R_2(z),  x ∈ [0, θ].

Note that θ > θ̄. For any fixed x ∈ [0, θ̄], we have

v(x) − v̄(x) = β ∫ (v(y) − v̄(y)) Q(dy|x) + R_1(x)(R_2(z) − R_2(z̄)).

Then the equation of w(x) for x ∈ [0, θ̄] is derived. We similarly treat the case x ∈ (θ̄, 1]. ⊓⊔
Remark 2. In general w has a discontinuity at x = θ̄, so that β ∫ w(y) Q(dy|θ̄) ≠ β w(0) + 1. We give some interpretation. Let the value function be written as v(x, γ) to explicitly indicate γ. Let the rectangle [0, 1] × [γ_a, γ_b] be a region of interest in which (x, γ) varies so that the value function defines a continuous surface. Then (θ, γ) starts at (θ̄, γ̄) and traces out the curve of an increasing function along which the expression of the value function has a switch, and the value function surface may be visualized as two pieces glued together along the curve in a non-smooth way. The value of w amounts to finding on the surface the directional derivative in the direction of γ; and therefore, discontinuity may occur at x = θ̄.

To better understand the solution of (26), we consider the general equation

W(x) = β ∫ W(y) Q(dy|x) + R_1(x) R_2′(z_0) c_0 for 0 ≤ x ≤ θ_0,
W(x) = β W(0) + R_1(x) R_2′(z_0) c_0 + 1 for θ_0 < x ≤ 1,   (27)

where c_0, z_0 ∈ [0, 1] and θ_0 ∈ (0, 1) are arbitrarily chosen and fixed. Let B([0, 1], R) be the Banach space of bounded Borel measurable functions with norm ‖g‖ = sup_x |g(x)|. By a contraction mapping, we can show (27) has a unique solution W ∈ B([0, 1], R).

We continue to characterize the sensitivity θ_γ of the threshold. Recall the partial derivative ∂q(y|x)/∂x.

Lemma 7.
We have

β [ ∫_{θ̄}^1 v̄(y) (∂q(y|θ̄)/∂x) dy − v̄(θ̄) q(θ̄|θ̄) ] θ_γ = 1 + β w(0) − β ∫_{θ̄}^1 w(y) Q(dy|θ̄).   (28)

Proof. Write γ = γ̄ + ε. By the property of the threshold, we have

β ∫_{θ̄}^1 v̄(y) Q(dy|θ̄) = β v̄(0) + γ̄,
β ∫_θ^1 v(y) Q(dy|θ) = β v(0) + γ̄ + ε.

Note that θ > θ̄. We check

∆ := ∫_θ^1 v(y) Q(dy|θ) − ∫_{θ̄}^1 v̄(y) Q(dy|θ̄)
 = ∫_θ^1 v(y) Q(dy|θ) − ∫_θ^1 v̄(y) Q(dy|θ̄) − ∫_{θ̄}^θ v̄(y) Q(dy|θ̄)
 = ∫_θ^1 v(y) Q(dy|θ) − ∫_θ^1 v̄(y) Q(dy|θ) + ∫_θ^1 v̄(y) Q(dy|θ) − ∫_θ^1 v̄(y) Q(dy|θ̄) − ∫_{θ̄}^θ v̄(y) Q(dy|θ̄)
 = ε ∫_θ^1 w(y) q(y|θ) dy + (θ − θ̄) ∫_θ^1 v̄(y)[∂q(y|θ̄)/∂x] dy − (θ − θ̄) v̄(θ̄) q(θ̄|θ̄) + o(ε + |θ − θ̄|)
 = ε ∫_{θ̄}^1 w(y) q(y|θ̄) dy + (θ − θ̄) ∫_{θ̄}^1 v̄(y)[∂q(y|θ̄)/∂x] dy − (θ − θ̄) v̄(θ̄) q(θ̄|θ̄) + o(ε + |θ − θ̄|).

Note that β∆ = β[v(0) − v̄(0)] + ε. We derive

β ∫_{θ̄}^1 w(y) Q(dy|θ̄) + β θ_γ ∫_{θ̄}^1 v̄(y) (∂q(y|θ̄)/∂x) dy − β v̄(θ̄) q(θ̄|θ̄) θ_γ = β w(0) + 1.

This completes the proof. ⊓⊔
Lemma 8. Given the threshold θ̄ ∈ (0, 1), the stationary distribution μ̄ has a probability density function (p.d.f.) p(x) on (0, 1], and μ̄({0}) = π_0, where (p, π_0) is determined by

π_0 = ∫_{θ̄}^1 p(x) dx,   (29)

p(x) = ∫_0^x q(x|y) p(y) dy + π_0 q(x|0) for 0 ≤ x < θ̄,
p(x) = ∫_0^{θ̄} q(x|y) p(y) dy + π_0 q(x|0) for θ̄ ≤ x ≤ 1.   (30)

Proof.
Let δ_0 be the dirac measure at x = 0. For any Borel subset B ⊂ [0, 1], we have

μ̄(B) = ∫ [ Q(B|y) 1(y < θ̄) + δ_0(B) 1(y ≥ θ̄) ] μ̄(dy).

Then it can be checked that (p, π_0) satisfying the above equations determines the stationary distribution. Now we show there exists a unique solution. Let π_0 > 0 be given. Consider

p(x) = ∫_0^x q(x|y) p(y) dy + π_0 q(x|0),  0 ≤ x ≤ θ̄,   (31)

and we obtain a unique solution p in C([0, θ̄], R) (see e.g. [36, p. 33]). In fact p is a nonnegative function with ∫_0^{θ̄} p(x) dx > 0. Subsequently, we further determine p ≥ 0 on [θ̄, 1] by (30). The solution p on [0, 1] depends linearly on π_0 and so there exists a unique π_0 such that ∫_0^1 p(x) dx + π_0 = 1. After we uniquely solve p for (30), we integrate both sides of this equation on [0, 1] and obtain ∫_0^1 p(x) dx = ∫_0^{θ̄} p(x) dx + π_0, which implies that (29) is satisfied. ⊓⊔
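A discretized version of (30) can be solved by forward substitution since the equation is of Volterra type; the sketch below (uniform kernel q(x|y) = 1/(1 − y) for y ≤ x and an illustrative threshold θ̄ = 0.6, both our own choices here) exploits the linear dependence on π_0 noted in the proof and checks the result against the closed form p(x) = π_0/(1 − x), π_0 = 1/(2 − ln(1 − θ̄)), obtained below.

```python
import numpy as np

theta_bar, n = 0.6, 4000
dx = 1.0 / n
xs = np.arange(n) * dx                   # grid on [0, 1)

# Solve (30) with pi0 = 1 first (p depends linearly on pi0), then rescale.
p = np.zeros(n)
for i, x in enumerate(xs):
    k = int(min(x, theta_bar) / dx)      # integrate over y in [0, min(x, theta_bar))
    p[i] = np.sum(p[:k] / (1 - xs[:k])) * dx + 1.0   # q(x|0) = 1 here
pi0 = 1.0 / (p.sum() * dx + 1.0)         # normalization: pi0 + int p = 1
p *= pi0

print("pi0:", round(pi0, 5), "closed form:", round(1 / (2 - np.log(1 - theta_bar)), 5))
i = int(0.3 / dx)
print("p(0.3):", round(p[i], 5), "closed form:", round(pi0 / 0.7, 5))
```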
0. Inthis case, (A2)-(A6) are satisfied. For (21), we have¯ v ( x ) = β − x Z x ¯ v ( y ) dy + R ( x ) R ( ¯ z ) , ≤ x ≤ ¯ θ , β ¯ v ( ) + R ( x ) R ( ¯ z ) + ¯ γ , ¯ θ ≤ x ≤ . (32)Denote ϕ ( x ) = R x ¯ v ( y ) dy . Then˙ ϕ ( x ) = − β − x ϕ − R ( x ) R ( ¯ z ) , ≤ x ≤ ¯ θ . Taking the initial condition ϕ ( ) , we have ϕ ( x ) = ϕ ( )( − x ) β − ( − x ) β Z x R ( τ ) R ( ¯ z )( − τ ) β d τ . On [ , ¯ θ ] ,¯ v ( x ) = ( − x ) β − ¯ v ( ) − β ( − x ) β − Z x R ( τ ) R ( ¯ z )( − τ ) β d τ + R ( x ) R ( ¯ z )= ( − x ) β − h ¯ v ( ) − β ( c + ¯ z )( − β )( − β ) i + ( c + ¯ z ) h β ( − β )( − β ) + x − β i . By the continuity of ¯ v and its form on [ ¯ θ , ] , we have¯ v ( ¯ θ ) = β ¯ v ( ) + ¯ θ ( ¯ z + c ) + ¯ γ . (33)Hence, [( − ¯ θ ) β − − β ] ¯ v ( ) = β ( c + ¯ z )[( − ¯ θ ) β − − ]( − β )( − β ) − β ( c + ¯ z ) ¯ θ − β + ¯ γ . (34)On the other hand, since ¯ v is increasing and ¯ θ is the threshold, we have ¯ v ( ¯ θ ) = β Z θ [ β ¯ v ( ) + ( c + z ) y + ¯ γ ] − ¯ θ dy + ( c + ¯ z ) ¯ θ = β ¯ v ( ) + β ¯ γ + β ( c + ¯ z ) + ( β + )( c + ¯ z ) ¯ θ , which combined with (33) gives β ( c + ¯ z )( + ¯ θ ) = ( β ¯ v ( ) + ¯ γ )( − β ) . (35)Given the special form of Q ( dy | x ) , (26) becomes w ( x ) = β − x Z x w ( y ) dy + R ( x ) R ′ ( ¯ z ) z γ , ≤ x ≤ ¯ θ , β w ( ) + R ( x ) R ′ ( ¯ z ) z γ + , ¯ θ < x ≤ . (36)The computation of w now reduces to uniquely solving w ( ) . By the expression of w on [ , ¯ θ ] , we have w ( ¯ θ ) = β Z θ w ( y ) Q ( dy | ¯ θ ) + R ( ¯ θ ) R ′ ( ¯ z ) z γ = β w ( ) + β + R ( ¯ θ ) R ′ ( ¯ z ) z γ + β R ′ ( ¯ z ) z γ − ¯ θ Z θ R ( y ) dy = β w ( ) + β + ¯ θ z γ + β z γ + ¯ θ . (37)For x ∈ [ , ¯ θ ] , we further write w ( x ) = β − x Z x w ( y ) dy + R ( x ) R ′ ( ¯ z ) z γ , and solve w ( x ) = ( − x ) β − w ( ) + z γ x − β z γ h ( − x ) β − ( − β )( − β ) − − β + − x − β i , which further gives w ( ¯ θ ) = ( − ¯ θ ) β − w ( ) + z γ ¯ θ − β z γ h ( − ¯ θ ) β − ( − β )( − β ) − − β + − ¯ θ − β i . (38)By (37)–(38), we have [ β − ( − ¯ θ ) β − − β ] w ( ) = + z γ (cid:16) + ¯ θ + ( − ¯ θ ) β − ( − β )( − β ) + − ¯ θ − β − − β (cid:17) . (39)Now from (30) we have inary Mean Field Stochastic Games: Stationary Equilibria and Comparative Statics 17 p ( x ) = Z x − y p ( y ) dy + π , ≤ x < ¯ θ , Z ¯ θ − y p ( y ) dy + π , ¯ θ ≤ x ≤ , which determines p ( x ) = π − x , ≤ x < ¯ θ , π − ¯ θ , ¯ θ ≤ x ≤ , where π = − ln ( − ¯ θ ) . We determine the mean field¯ z = Z ¯ θ xp ( x ) dx + Z θ xp ( x ) dx = π (cid:0) − ¯ θ − ln ( − ¯ θ ) (cid:1) . (40)We further obtain dzd γ at ¯ γ as z γ = ln ( − ¯ θ ) − + − ¯ θ [ − ln ( − ¯ θ )] θ γ . (41)We note that a perturbation analysis directly based on the general case (30) is morecomplicated.Now (28) reduces to h β − ¯ θ Z θ ¯ v ( y ) − ¯ θ dy − β ¯ v ( ¯ θ ) − ¯ θ i θ γ = + β w ( ) − β Z θ w ( y ) − ¯ θ dy . By the expression of ¯ v in (32) and w in (36) at θ = ¯ θ , we obtain ( − β ) ¯ v ( ¯ θ ) − ¯ θ ( c + ¯ z ) − ¯ θ θ γ = + β w ( ) − w ( ¯ θ ) + ¯ θ z γ . Recalling (33) and (37), we have ( − β )[ β ¯ v ( ) + ¯ γ ] − β ¯ θ ( ¯ z + c ) − ¯ θ θ γ − β ( − β ) w ( ) + + ¯ θ β z γ = − β . (42)By combining (34), (35) and (40), we have¯ v ( ) = [( − ¯ θ ) β − − β ] − h β ( c + ¯ z )[( − ¯ θ ) β − − ]( − β )( − β ) − β ( c + ¯ z ) ¯ θ − β + ¯ γ i , (43)¯ θ = ( − β )( β ¯ v ( ) + ¯ γ ) β ( c + ¯ z ) − , (44)¯ z = − ln ( − ¯ θ ) (cid:0) − ¯ θ − ln ( − ¯ θ ) (cid:1) . (45) x v(x)w(x) Fig. 1
Next, combining (39), (41) and (42), we obtain

{ [(1 − β)(β v̄(0) + γ̄) − β θ̄(z̄ + c)]/(1 − θ̄) } θ_γ − β(1 − β) w(0) + (1 + θ̄) β z_γ/2 = 1 − β,   (46)
[ β^{−1}(1 − θ̄)^{β−1} − β ] w(0) = 1 + z_γ ( (1 + θ̄)/2 + (1 − θ̄)^{β−1}/((1 − β)(2 − β)) + (1 − θ̄)/(2 − β) − 1/(1 − β) ),   (47)
z_γ = { [ ln(1 − θ̄) − 3 + 4/(1 − θ̄) ] / ( 2[2 − ln(1 − θ̄)]² ) } θ_γ.   (48)

After (v̄(0), z̄, θ̄) has been determined from (43)-(45), the above gives a linear equation system with unknowns w(0), θ_γ and z_γ.
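The following sketch carries out this computation end to end: it solves the nonlinear system (43)-(45) for (v̄(0), θ̄, z̄) with scipy's fsolve and then the 3×3 linear system (46)-(48). The parameter values γ̄, β, c below are our own illustrative choices (the published Example 2 uses different values), and the initial guess for fsolve is crude, so convergence to a root with θ̄ ∈ (0, 1) should be checked.

```python
import numpy as np
from scipy.optimize import fsolve

gamma_bar, beta, c = 0.3, 0.6, 0.2

def eqs(u):
    v0, th, z = u
    th = np.clip(th, 1e-6, 1 - 1e-6)       # keep (1-th)**(beta-1) well defined
    a = (1 - th) ** (beta - 1)
    f1 = (a - beta) * v0 - (beta * (c + z) * (a - 1) / ((1 - beta) * (2 - beta))
                            - beta * (c + z) * th / (2 - beta) + gamma_bar)
    f2 = th - (2 * (1 - beta) * (beta * v0 + gamma_bar) / (beta * (c + z)) - 1)
    f3 = z - ((1 - th) / 2 - np.log(1 - th)) / (2 - np.log(1 - th))
    return [f1, f2, f3]

v0, th, z = fsolve(eqs, [1.0, 0.5, 0.3])
assert 0 < th < 1

Lg = np.log(1 - th)
a = (1 - th) ** (beta - 1)
A = ((1 - beta) * (beta * v0 + gamma_bar) - beta * th * (z + c)) / (1 - th)
B = a / beta - beta
C = ((1 + th) / 2 + a / ((1 - beta) * (2 - beta))
     + (1 - th) / (2 - beta) - 1 / (1 - beta))
D = (Lg - 3 + 4 / (1 - th)) / (2 * (2 - Lg) ** 2)

# Rows encode (46), (47), (48); unknowns ordered (w(0), theta_gamma, z_gamma).
M = np.array([[-beta * (1 - beta), A, (1 + th) * beta / 2],
              [B, 0.0, -C],
              [0.0, -D, 1.0]])
w0, th_g, z_g = np.linalg.solve(M, [1 - beta, 1.0, 0.0])
print(dict(v0=v0, theta=th, z=z, w0=w0, theta_gamma=th_g, z_gamma=z_g))
```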
Example 2. We take R_1(x) = x and R_2(z) = . + z, γ̄ = . , β = . . We numerically solve (43)-(45) to obtain v̄(0) = . , θ̄ = . , z̄ = . , and then w(0) = . , θ_γ = . , z_γ = . . v̄(x) and w(x) are displayed in Fig. 1, where w has a discontinuity at x = θ̄ as discussed in Remark 2. The positive value of θ_γ implies the threshold increases with γ, as asserted in Theorem 3.

Corrected on Oct 10, 2020 by adding the value of β and correcting the parameter in R_2(z).

6 Conclusion

This paper considers mean field games in a framework of binary Markov decision processes (MDP) and establishes existence and uniqueness of stationary equilibria. The resulting policy has a threshold structure. We further analyze comparative statics to address the impact of parameter variations in the model.

For future research, there are some potentially interesting extensions. One may consider a heterogeneous population and study the emergence of free-riders who care more about their own effort costs and have less incentive to contribute to the common benefit of the population. Another modelling direction of a quite different nature involves negative externalities, where other players' improvement brings more pressure on the player in question. For instance, this arises in competitions for market share. The modelling and analysis of the agent behavior will be of interest.

Appendix A: Preliminaries on Ergodicity
Assume (A3). The next two lemmas determine the limiting distribution of the state process under threshold policies.

Lemma A.1. i) If θ = 0, then the distribution of x_t^i remains the dirac measure δ_0 for all t ≥ 1, for any x_0^i.

ii) If θ = 1 or θ = 1+, the distribution of x_t^i converges to the dirac measure δ_1 weakly.

Proof. Part i) is obvious and part ii) follows from (A3). ⊓⊔

Let x_t^{i,θ} denote the state process generated by the θ-threshold policy with θ ∈ (0, 1), and let P_θ^t(x, ·) be the distribution of x_t^{i,θ} given x_0^{i,θ} = x.
Lemma A.2. For θ ∈ (0, 1), {x_t^{i,θ}, t ≥ 0} is uniformly ergodic with stationary probability distribution π_θ, i.e.,

sup_{x∈S} ‖P_θ^t(x, ·) − π_θ‖_TV ≤ K r^t,   (A.1)

for some constants K > 0 and r ∈ (0, 1), where ‖·‖_TV is the total variation norm of signed measures.

Proof. The proof is similar to that of the ergodicity theorem in [27], which assumed (A3′). We use (A3)-iii) to estimate r. ⊓⊔

We take C_s = {0} as a small set and θ ∈ (0, 1). The θ-threshold policy gives

P(x_2^{i,θ} = 0 | x_0^{i,θ} = 0) ≥ ∫_θ^1 q(y|0) dy =: ε_0.   (A.2)

So for any Borel set B, P(x_2^{i,θ} ∈ B | x_0^{i,θ} = 0) ≥ ε_0 δ_0(B), where δ_0 is the dirac measure. For θ′ in a small neighborhood of θ, we can ensure that the θ′-threshold policy gives

P(x_2^{i,θ′} ∈ B | x_0^{i,θ′} = 0) ≥ (ε_0/2) δ_0(B).   (A.3)
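The uniform ergodicity in (A.1) is easy to visualize by simulation; the sketch below (uniform kernel and θ = 0.6, our own illustrative choices) propagates many copies of the chain from a common initial state and tracks the mass at the regeneration state 0, which stabilizes geometrically fast.

```python
import numpy as np

rng = np.random.default_rng(3)

theta, n_paths, horizon = 0.6, 100_000, 30
x = np.full(n_paths, 0.9)                   # common initial state x0 = 0.9
for t in range(1, horizon + 1):
    act = x >= theta
    x = np.where(act, 0.0, x + (1 - x) * rng.random(n_paths))
    if t % 5 == 0:
        # P_theta^t(x0, {0}); approaches pi_theta({0}) = 1/(2 - ln(1-theta))
        # ~ 0.343 here (cf. Lemma 8 for the uniform kernel).
        print(t, round(float(np.mean(x == 0.0)), 4))
```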
Lemma A.3. Suppose θ, θ′ ∈ (0, 1) for two threshold policies. Let the corresponding stationary distributions of the state process be π and π′. Then lim_{θ′→θ} ‖π′ − π‖_TV = 0.

Proof. Fix θ ∈ (0, 1). By (A.3) and [41], there exist a neighborhood I_θ = (θ − κ, θ + κ) ⊂ (0, 1) and two constants C_0 > 0, r_0 ∈ (0, 1) such that for all θ′ ∈ I_θ,

‖P_θ^t(x, ·) − π‖_TV ≤ C_0 r_0^t,  ‖P_{θ′}^t(x, ·) − π′‖_TV ≤ C_0 r_0^t,  ∀x ∈ [0, 1].

Subsequently,

‖π′ − π‖_TV ≤ ‖P_{θ′}^t(0, ·) − P_θ^t(0, ·)‖_TV + 2C_0 r_0^t.

For any given ε > 0, fix a large k_0 such that 2C_0 r_0^{k_0} ≤ ε/2. We show for all θ′ sufficiently close to θ,

‖P_{θ′}^{k_0}(0, ·) − P_θ^{k_0}(0, ·)‖_TV ≤ ε/2.

Given two probability measures μ_t, μ′_t, define the probability measures μ_{t+1} and μ′_{t+1} by

μ_{t+1}(B) = ∫_S P_θ(y, B) μ_t(dy),  μ′_{t+1}(B) = ∫_S P_{θ′}(y, B) μ′_t(dy),

for Borel sets B ⊂ [0, 1]. Then

|μ_{t+1}(B) − μ′_{t+1}(B)| ≤ | ∫_S P_θ(y, B) μ_t(dy) − ∫_S P_{θ′}(y, B) μ_t(dy) | + | ∫_S P_{θ′}(y, B) μ_t(dy) − ∫_S P_{θ′}(y, B) μ′_t(dy) | =: D_1 + D_2.

We have

D_2 = | ∫_S P_{θ′}(y, B) μ_t(dy) − ∫_S P_{θ′}(y, B) μ′_t(dy) | ≤ ‖μ_t − μ′_t‖_TV.

Denote θ_a = min{θ, θ′} and θ_b = max{θ, θ′}. Then

D_1 = | − ∫_{[θ_a, θ_b)} Q(B|y) μ_t(dy) + 1_B(0) μ_t([θ_a, θ_b)) | ≤ μ_t([θ_a, θ_b)).

Setting μ_0 = μ′_0 = δ_0, then μ_t = P_θ^t(0, ·), μ′_t = P_{θ′}^t(0, ·). Hence,

|P_{θ′}^{t+1}(0, B) − P_θ^{t+1}(0, B)| ≤ ‖P_{θ′}^t(0, ·) − P_θ^t(0, ·)‖_TV + P_θ^t(0, [θ_a, θ_b)),   (A.4)

which implies

‖P_{θ′}^{t+1}(0, ·) − P_θ^{t+1}(0, ·)‖_TV ≤ ‖P_{θ′}^t(0, ·) − P_θ^t(0, ·)‖_TV + 2P_θ^t(0, [θ_a, θ_b)).   (A.5)

For μ_0 = μ′_0 = δ_0, we have P_θ^1(0, ·) = P_{θ′}^1(0, ·). It is clear from (A.5) and Lemma 4 that for each t ≥ 1,

lim_{θ′→θ} ‖P_{θ′}^t(0, ·) − P_θ^t(0, ·)‖_TV = 0,  lim_{θ′→θ} P_θ^t(0, [θ_a, θ_b)) = 0.

Therefore, for the fixed k_0, there exists δ > 0 such that for all θ′ satisfying |θ′ − θ| < δ, ‖P_{θ′}^{k_0}(0, ·) − P_θ^{k_0}(0, ·)‖_TV < ε/2 and ‖π′ − π‖_TV ≤ ε. The lemma follows. ⊓⊔

Appendix B: Cycle Average of a Regenerative Process
Let 0 < r < r′ < 1. Consider a Markov process {Y_t, t ≥ 0} with state space [0, 1] and transition kernel Q_Y(·|y) which satisfies Q_Y([y, 1]|y) = 1 for all y ∈ [0, 1] and is stochastically increasing. Suppose Y_0 ≡ y_0 < r. Define the stopping times

τ = inf{t | Y_t ≥ r},  τ′ = inf{t | Y_t ≥ r′}.

Lemma B.1. If Eτ < ∞, then E ∑_{t=0}^τ Y_t < ∞ and

(E ∑_{t=0}^τ Y_t)/(1 + Eτ) = [ EY_0 + EY_1 + ∑_{k=1}^∞ E(Y_{k+1} 1{Y_k < r}) ] / [ 2 + ∑_{k=1}^∞ P(Y_k < r) ].   (B.1)

Proof. Since 0 ≤ Y_t ≤ 1, E ∑_{t=0}^τ Y_t ≤ 1 + Eτ. It is clear that {τ ≥ k} = {Y_{k−1} < r} for k ≥ 1. We have

Eτ = ∑_{k=1}^∞ P(τ ≥ k) = 1 + ∑_{k=1}^∞ P(Y_k < r),   (B.2)

and

E ∑_{t=0}^τ Y_t = E [ ∑_{k=1}^∞ ( ∑_{t=0}^k Y_t ) 1{τ = k} ] = EY_0 + EY_1 + ∑_{k=2}^∞ E(Y_k 1{τ ≥ k})
 = EY_0 + EY_1 + ∑_{k=1}^∞ E(Y_{k+1} 1{Y_k < r}).

The lemma follows. ⊓⊔
Lemma B.2. Assume Eτ′ < ∞. We have

(E ∑_{t=0}^τ Y_t)/(1 + Eτ) ≤ (E ∑_{t=0}^{τ′} Y_t)/(1 + Eτ′).   (B.3)

Proof. Eτ < ∞ since τ ≤ τ′ w.p.1. For k ≥ 1, denote

p_k = P(Y_k < r),  η_k = P(r ≤ Y_k < r′),  m_k = E(Y_{k+1} 1{Y_k < r}),  ∆_k = E(Y_{k+1} 1{r ≤ Y_k < r′}).

By Lemma B.1,

(E ∑_{t=0}^τ Y_t)/(1 + Eτ) = [EY_0 + EY_1 + ∑_{k=1}^∞ m_k]/[2 + ∑_{k=1}^∞ p_k],
(E ∑_{t=0}^{τ′} Y_t)/(1 + Eτ′) = [EY_0 + EY_1 + ∑_{k=1}^∞ (m_k + ∆_k)]/[2 + ∑_{k=1}^∞ (p_k + η_k)].

So (B.3) is equivalent to

(EY_0 + EY_1 + ∑_{k=1}^∞ m_k)(∑_{k=1}^∞ η_k) ≤ (∑_{k=1}^∞ ∆_k)(2 + ∑_{k=1}^∞ p_k).   (B.4)

By the stochastic monotonicity of Q_Y, we have

E[Y_{k+1} 1{Y_k < r} | Y_k] = 1{Y_k < r} ∫ y Q_Y(dy|Y_k) ≤ 1{Y_k < r} ∫ y Q_Y(dy|r) =: c_r 1{Y_k < r}.

Note that

c_r = ∫_{y ≥ r} y Q_Y(dy|r) ≥ r.   (B.5)

Moreover,

E[Y_{k+1} 1{r ≤ Y_k < r′} | Y_k] = 1{r ≤ Y_k < r′} ∫ y Q_Y(dy|Y_k) ≥ c_r 1{r ≤ Y_k < r′}.

It follows that

m_k = E[Y_{k+1} 1{Y_k < r}] ≤ c_r p_k,  ∆_k = E[Y_{k+1} 1{r ≤ Y_k < r′}] ≥ c_r η_k.   (B.6)

Since Y_0 = y_0 < r, E[Y_1 | Y_0] = ∫ y Q_Y(dy|Y_0) ≤ c_r. Hence, E(Y_0 + Y_1) ≤ r + c_r. By (B.6) and (B.5),

(EY_0 + EY_1 + ∑_{k=1}^∞ m_k)(∑_{k=1}^∞ η_k) − (∑_{k=1}^∞ ∆_k)(2 + ∑_{k=1}^∞ p_k)
 ≤ (r + c_r + c_r ∑_{k=1}^∞ p_k)(∑_{k=1}^∞ η_k) − c_r (∑_{k=1}^∞ η_k)(2 + ∑_{k=1}^∞ p_k)
 = (r − c_r) ∑_{k=1}^∞ η_k ≤ 0,

which establishes (B.4). ⊓⊔
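A short Monte Carlo check of the cycle-average comparison (B.3) (with Q_Y given by Y_{t+1} = Y_t + (1 − Y_t)U, U ~ Uniform(0, 1), an assumed kernel satisfying the hypotheses) is sketched below.

```python
import numpy as np

rng = np.random.default_rng(4)

def cycle_ratio(r, n_cycles=50_000):
    """Estimate E[sum_{t=0}^{tau} Y_t] / (1 + E[tau]) for level r, Y_0 = 0."""
    num = den = 0.0
    for _ in range(n_cycles):
        y, s, t = 0.0, 0.0, 0
        while y < r:
            s += y                       # accumulates Y_0, ..., Y_{tau-1}
            y += (1 - y) * rng.random()
            t += 1
        num += s + y                     # add Y_tau (>= r)
        den += t + 1                     # tau + 1 terms
    return num / den

print(cycle_ratio(0.3), "<=", cycle_ratio(0.7))   # monotone in the level, cf. (B.3)
```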
Remark B.1. If for each y ∈ [0, 1), Q_Y(dx|y) has a probability density function q_Y(x|y) > 0 for x ∈ (y, 1), then c_r > r and η_k > 0 for all k ≥ 1. In this case, a strict inequality holds for (B.3). ⊓⊔

Appendix C
We assume (A3). Let {x_t^{i,θ}, t ≥ 0} be the Markov chain generated by a θ-threshold policy with 0 < θ < 1, where x_0^{i,θ} is given. By Lemma A.2, {x_t^{i,θ}, t ≥ 0} is ergodic. We next define an auxiliary Markov chain {Y_t, t ≥ 0} with the same transition kernel and Y_0 = 0. Denote S_t = ∑_{i=0}^t Y_i for t ≥ 0. Define τ = inf{t | Y_t ≥ θ}.

Lemma C.1. We have

lim_{k→∞} (1/k) ∑_{t=0}^{k−1} Y_t = (E S_τ)/(1 + Eτ)  w.p.1.   (C.1)

Proof. By (A3), we can show Eτ < ∞. Since {Y_t, t ≥ 0} has the same transition probability kernel as {x_t^{i,θ}, t ≥ 0}, it is ergodic, and therefore the left hand side of (C.1) has a constant limit w.p.1. Define T_0 = 0 and T_n as the time for {Y_t, t ≥ 0} to return to state 0 for the nth time. So T_1 = τ + 1. Define B_n = ∑_{t=T_{n−1}}^{T_n − 1} Y_t for n ≥ 1. We observe that {Y_t, t ≥ 0} is a regenerative process (see e.g. [6, 51] and [7, Theorem 4]) with regeneration times {T_n, n ≥ 1} and that {B_n, n ≥ 1} is a sequence of i.i.d. random variables. Note that B_1 = S_τ is the sum of the τ + 1 states visited in the first cycle, and the lemma then follows from the renewal reward theorem. ⊓⊔

Suppose 0 < θ < θ′ < 1. Then there exist two constants C_θ, C_{θ′} such that

lim_{k→∞} (1/k) ∑_{t=0}^{k−1} x_t^{i,θ} = C_θ,  lim_{k→∞} (1/k) ∑_{t=0}^{k−1} x_t^{i,θ′} = C_{θ′},  w.p.1.

Lemma C.2. We have C_θ ≤ C_{θ′}.

Proof. Due to the ergodicity of the Markov chain, C_θ (resp., C_{θ′}) does not depend on x_0^{i,θ} (resp., x_0^{i,θ′}). Therefore, lim_{k→∞} (1/k) ∑_{t=0}^{k−1} Y_t = C_θ w.p.1. The lemma follows from Lemmas C.1 and B.2. ⊓⊔

Appendix D: An Auxiliary MDP
Assume (A3). This appendix introduces an auxiliary control problem to show the effect of the effort cost on the threshold parameter of the optimal policy. The state and control processes {(x_t^i, a_t^i), t ≥ 0} are specified by (1)-(2). The cost has the form

J_i^r = E ∑_{t=0}^∞ ρ^t ( R_1(x_t^i) + r 1{a_t^i = a_1} ),   (D.1)

where R_1 is continuous and strictly increasing on [0, 1] and ρ ∈ (0, 1), r ∈ (0, ∞). Let r take two different values 0 < γ_1 < γ_2 and write the corresponding dynamic programming equations

v_l(x) = min{ ρ ∫ v_l(y) Q(dy|x) + R_1(x), ρ v_l(0) + R_1(x) + γ_l },  l = 1, 2,  x ∈ S.   (D.2)

By the method in proving Lemma 1, it can be shown that there exists a unique solution v_l ∈ C([0, 1], R) and that the optimal policy a_{i,l}(x) is a threshold policy. If ρ ∫ v_l(y) Q(dy|1) < ρ v_l(0) + γ_l, a_{i,l}(x) ≡ a_0, and we follow the notation in Section 3 to denote the threshold θ_l = 1+. Otherwise, a_{i,l}(x) is a θ_l-threshold policy with θ_l ∈ [0, 1], i.e., a_{i,l}(x) = a_1 if x ≥ θ_l, and a_{i,l}(x) = a_0 if x < θ_l.

Lemma D.1. If θ_1 ∈ (0, 1), θ_2 ≠ θ_1.

Proof. We prove by contradiction. Suppose for some θ ∈ (0, 1),

θ_1 = θ_2 = θ.   (D.3)

Under (D.3), the resulting optimal policy leads to the representation (see e.g. [23, pp. 22])

v_l(x) = E ∑_{t=0}^∞ ρ^t [ R_1(x_t^i) + γ_l 1{a_t^i = a_1} ],  l = 1, 2,

where {x_t^i, t ≥ 0} is generated by the θ-threshold policy a_t^i(x_t^i) and x_0^i = x. Denote δ = γ_2 − γ_1.

For fixed x ≥ θ and x_0^i = x, denote the resulting optimal state and control processes by {(x̂_t^i, â_t^i), t ≥ 0}. Then â_0^i = a_1 w.p.1, and

v_2(x) − v_1(x) = δ + δ E ∑_{t=1}^∞ ρ^t 1{â_t^i = a_1},  x ≥ θ.

Next consider x_0^i = 0 with the processes {(x̌_t^i, ǎ_t^i), t ≥ 0}. Then

v_2(0) − v_1(0) = δ E ∑_{t=0}^∞ ρ^t 1{ǎ_t^i = a_1} =: ∆_0.

It is clear that x̂_1^i = 0, and {(x̂_t^i, â_t^i), t ≥ 1} may be interpreted as the optimal state and control processes of the MDP with initial state 0 at t = 1. Hence the two processes {(x̂_{t+1}^i, â_{t+1}^i), t ≥ 0} and {(x̌_t^i, ǎ_t^i), t ≥ 0} have the same distribution. Therefore,

E ∑_{t=1}^∞ ρ^{t−1} 1{â_t^i = a_1} = E ∑_{t=0}^∞ ρ^t 1{ǎ_t^i = a_1}.

It follows that

v_2(x) − v_1(x) = δ + ρ ∆_0,  ∀x ≥ θ.   (D.4)

Combining (D.2) and (D.3) gives

ρ ∫ v_l(y) Q(dy|θ) = ρ v_l(0) + γ_l,  l = 1, 2,

which implies

ρ ∫ [v_2(x) − v_1(x)] Q(dx|θ) = δ + ρ ∆_0.   (D.5)

By Q([0, θ)|θ) = 0 and (D.4), the left hand side of (D.5) equals ρ(δ + ρ∆_0). Then ρ(δ + ρ∆_0) = δ + ρ∆_0, which is impossible since 0 < ρ < 1 and δ + ρ∆_0 > 0. Therefore, (D.3) does not hold. This completes the proof. ⊓⊔

For the MDP with cost (D.1), we continue to analyze the dynamic programming equation

v_r(x) = min[ ρ ∫ v_r(y) Q(dy|x) + R_1(x), ρ v_r(0) + R_1(x) + r ].   (D.6)

For each fixed r ∈ (0, ∞), we obtain the optimal policy as a threshold policy with threshold parameter θ(r). By evaluating the cost (D.1) associated with the two policies a_t^i(x_t^i) ≡ a_0 and a_t^i(x_t^i) ≡ a_1, respectively, we have the prior estimate

v_r(x) ≤ min{ R_1(1)/(1 − ρ), R_1(x) + (r + ρ R_1(0))/(1 − ρ) }.   (D.7)

On the other hand, let {x_t^i, t ≥ 0} with x_0^i = x be generated by any fixed Markov policy. Then

E ∑_{t=0}^∞ ρ^t ( R_1(x_t^i) + r 1{a_t^i = a_1} ) ≥ R_1(x) + ∑_{t=1}^∞ ρ^t R_1(0),

which implies

v_r(x) ≥ R_1(x) + ρ R_1(0)/(1 − ρ).   (D.8)

If r > ρ[R_1(1) − R_1(0)]/(1 − ρ), it follows from (D.7) and (D.8) that

ρ ∫ v_r(y) Q(dy|x) < ρ v_r(0) + r,  ∀x,   (D.9)

i.e., θ(r) = 1+.
Lemma D.2. There exists δ_0 > 0 such that for all 0 < r < δ_0,

ρ ∫ v_r(y) Q(dy|x) > ρ v_r(0) + r,  ∀x,   (D.10)

and so θ(r) = 0.

Proof. By (D.8),

ρ ∫ v_r(y) Q(dy|x) ≥ ρ ∫ R_1(y) Q(dy|x) + ρ² R_1(0)/(1 − ρ) ≥ ρ ∫ R_1(y) Q(dy|0) + ρ² R_1(0)/(1 − ρ),

and (D.7) gives

ρ v_r(0) + r ≤ ρ R_1(0)/(1 − ρ) + r/(1 − ρ).

Since R_1(x) is strictly increasing,

C_R := ∫ R_1(y) Q(dy|0) − R_1(0) > 0.

And we have

ρ ∫ v_r(y) Q(dy|x) − (ρ v_r(0) + r) ≥ ρ C_R − r/(1 − ρ).

It suffices to take δ_0 = ρ(1 − ρ) C_R. ⊓⊔

Define the nonempty sets

R_{a_0} = {r > 0 | (D.9) holds},  R_{a_1} = {r > 0 | (D.10) holds}.
Remark D.1. We have (ρ[R_1(1) − R_1(0)]/(1 − ρ), ∞) ⊂ R_{a_0} and (0, δ_0) ⊂ R_{a_1}.
Lemma D.3. Let (r, v_r) be the parameter and the associated solution in (D.6).

i) If r > 0 satisfies

ρ ∫ v_r(y) Q(dy|x) ≤ ρ v_r(0) + r,  ∀x,   (D.11)

then any r′ > r is in R_{a_0}.

ii) If r > 0 satisfies

ρ ∫ v_r(y) Q(dy|x) ≥ ρ v_r(0) + r,  ∀x,   (D.12)

then any r′ ∈ (0, r) is in R_{a_1}.

Proof. i) For r′ > r, v_{r′} is uniquely solved from (D.6) with r′ in place of r. We can use (D.11) to verify

v_r(x) = min[ ρ ∫ v_r(y) Q(dy|x) + R_1(x), ρ v_r(0) + R_1(x) + r′ ].

Hence v_{r′} = v_r on [0, 1]. It follows that ρ ∫ v_{r′}(y) Q(dy|x) < ρ v_{r′}(0) + r′ for all x. Hence r′ ∈ R_{a_0}.

ii) By (D.6) and (D.12), v_r(0) = [R_1(0) + r]/(1 − ρ), and subsequently,

v_r(x) = ρ v_r(0) + R_1(x) + r = ρ[R_1(0) + r]/(1 − ρ) + R_1(x) + r.

By substituting v_r(0) and v_r(x) into (D.12), we obtain

ρ R_1(0) + r ≤ ρ ∫ R_1(y) Q(dy|x),  ∀x.   (D.13)

Now for 0 < r′ < r, we construct v_{r′}(x), as a candidate solution to (D.6) with r replaced by r′, to satisfy

v_{r′}(0) = ρ v_{r′}(0) + R_1(0) + r′,  v_{r′}(x) = ρ v_{r′}(0) + R_1(x) + r′,   (D.14)

which gives

v_{r′}(x) = ρ[R_1(0) + r′]/(1 − ρ) + R_1(x) + r′.   (D.15)

We show that v_{r′}(x) in (D.15) satisfies

ρ v_{r′}(0) + r′ < ρ ∫ v_{r′}(y) Q(dy|x),  ∀x,   (D.16)

which is equivalent to ρ R_1(0) + r′ < ρ ∫ R_1(y) Q(dy|x) for all x, which in turn follows from (D.13) and r′ < r. By (D.14) and (D.16), v_{r′} indeed satisfies (D.6) with r replaced by r′. So r′ ∈ R_{a_1}. ⊓⊔

Further define r̲ = sup R_{a_1}, r̄ = inf R_{a_0}.

Lemma D.4. i) r̲ satisfies ρ ∫ v_{r̲}(y) Q(dy|0) = ρ v_{r̲}(0) + r̲, and θ(r̲) = 0.

ii) r̄ satisfies ρ ∫ v_{r̄}(y) Q(dy|1) = ρ v_{r̄}(1) = ρ v_{r̄}(0) + r̄, and θ(r̄) = 1.

iii) We have 0 < r̲ < r̄ < ∞.

iv) The threshold θ(r), as a function of r ∈ (0, ∞), is continuous and strictly increasing on [r̲, r̄].

Proof. i)-ii) By Lemmas D.2 and D.3, we have 0 < r̲ ≤ ∞ and 0 ≤ r̄ < ∞. Assume r̲ = ∞; then R_{a_1} = (0, ∞), giving R_{a_0} = ∅, a contradiction to Remark D.1. So 0 < r̲ < ∞. For δ_0 > 0 in Lemma D.2, (0, δ_0) ⊂ R_{a_1}. Therefore, 0 < r̄ < ∞. Note that v_r depends on the parameter r continuously, i.e., lim_{|r′−r|→0} sup_x |v_{r′}(x) − v_r(x)| = 0. Hence

ρ ∫ v_{r̲}(y) Q(dy|0) ≥ ρ v_{r̲}(0) + r̲.

Now assume

ρ ∫ v_{r̲}(y) Q(dy|0) > ρ v_{r̲}(0) + r̲.   (D.17)

Then there exists a sufficiently small ε > 0 such that (D.17) still holds when (r̲ + ε, v_{r̲+ε}) replaces (r̲, v_{r̲}); since g(x) = ∫ v_{r̲+ε}(y) Q(dy|x) is increasing in x, then r̲ + ε ∈ R_{a_1}, which is impossible. Hence (D.17) does not hold, and this proves i). ii) can be shown in a similar manner.

To show iii), assume

0 < r̄ < r̲ < ∞.   (D.18)

Then, recalling Remark D.1, there exist r′ ∈ R_{a_0} and r″ ∈ R_{a_1} such that

0 < r̄ < r′ < r″ < r̲ < ∞.

By Lemma D.3-i), r″ ∈ R_{a_0}, and then r″ ∈ R_{a_0} ∩ R_{a_1} = ∅, which is impossible. Therefore, (D.18) does not hold and we conclude 0 < r̲ ≤ r̄ < ∞. We further assume r̲ = r̄. Then i)-ii) would imply ∫ v_{r̲}(y) Q(dy|0) = v_{r̲}(1), which is impossible since v_{r̲} is strictly increasing on [0, 1] and (A3) holds. This proves iii).

iv) By the definition of r̲ and r̄, it can be shown using (D.6) that θ(r) ∈ (0, 1) for r ∈ (r̲, r̄). By the continuous dependence of the function v_r(·) on r and the method of proving [27, Lemma 10], we can show the continuity of θ(r) on (r̲, r̄), and further show lim_{r→r̲+} θ(r) = 0 and lim_{r→r̄−} θ(r) = 1. So θ(r) is continuous on [r̲, r̄]. If θ(r) were not strictly increasing on [r̲, r̄], there would exist r̲ ≤ r_1 < r_2 ≤ r̄ such that

θ(r_1) ≥ θ(r_2).   (D.19)

If θ(r_1) > θ(r_2) in (D.19), by the continuity of θ(r), the boundary values θ(r̲) = 0 and θ(r̄) = 1, and the intermediate value theorem, we may find r′ ∈ (r_2, r̄) such that θ(r′) = θ(r_1). Next, we replace r_2 by r′. Thus if θ(r) is not strictly increasing, we may find r_1 < r_2 from (r̲, r̄) such that θ(r_1) = θ(r_2) ∈ (0, 1), which is a contradiction to Lemma D.1. This proves iv). ⊓⊔
Remark D.2. By Lemmas D.3 and D.4, R_{a_1} = (0, r̲) and R_{a_0} = (r̄, ∞).

Acknowledgement
We would like to thank Aditya Mahajan for helpful discussions.
References
1. Acemoglu, D., Jensen, M.K.: Aggregate comparative statics. Games and Economic Behavior, 27-49 (2013)
2. Acemoglu, D., Jensen, M.K.: Robust comparative statics in large dynamic economies. Journal of Political Economy, 587-640 (2015)
3. Adlakha, S., Johari, R., Weintraub, G.Y.: Equilibria of dynamic games with many players: Existence, approximation, and market structure. J. Econ. Theory, 269-316 (2015)
4. Altman, E., Stidham, S.: Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems, 267-291 (1995)
5. Amir, R.: Sensitivity analysis of multisector optimal economic dynamics. Journal of Mathematical Economics, 123-141 (1996)
6. Asmussen, S.: Applied Probability and Queues, 2nd edn. Springer, New York (2003)
7. Athreya, K.B., Roy, V.: When is a Markov chain regenerative? Statistics and Probability Letters, 22-26 (2014)
8. Babichenko, Y.: Best-reply dynamics in large binary-choice anonymous games. Games and Economic Behavior, 130-144 (2013)
9. Bardi, M.: Explicit solutions of some linear-quadratic mean field games. Netw. Heterogeneous Media, 243-261 (2012)
10. Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer, Berlin (2011)
11. Becker, R.A.: Comparative dynamics in aggregate models of optimal capital accumulation. Quarterly Journal of Economics, 1235-1256 (1985)
12. Bensoussan, A., Frehse, J., Yam, P.: Mean Field Games and Mean Field Type Control Theory. Springer, New York (2013)
13. Biswas, A.: Mean field games with ergodic cost for discrete time Markov processes. Preprint, arXiv:1510.08968 (2015)
14. Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer-Verlag, New York (2000)
15. Brock, W.A., Durlauf, S.N.: Discrete choice with social interactions. Rev. Econ. Studies, 235-260 (2001)
16. Caines, P.E.: Mean field games. In: Samad, T., Baillieul, J. (eds.) Encyclopedia of Systems and Control. Springer-Verlag, Berlin (2014)
17. Caines, P.E., Huang, M., Malhamé, R.P.: Mean field games. In: Basar, T., Zaccour, G. (eds.) Handbook of Dynamic Game Theory, pp. 345-372. Springer, Berlin (2017)
18. Cardaliaguet, P.: Notes on mean field games. University of Paris, Dauphine (2012)
19. Carmona, R., Delarue, F.: Probabilistic Theory of Mean Field Games with Applications, vols. I and II. Springer, Cham (2018)
20. Dorato, P.: On sensitivity in optimal control systems. IEEE Transactions on Automatic Control, 256-257 (1963)
21. Filar, J.A., Vrieze, K.: Competitive Markov Decision Processes. Springer, New York (1997)
22. Gomes, D.A., Mohr, J., Souza, R.R.: Discrete time, finite state space mean field games. J. Math. Pures Appl. 93, 308-328 (2010)
26. Huang, M., Caines, P.E., Malhamé, R.P.: Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Trans. Autom. Control 52, 1560-1571 (2007)
27. Huang, M., Ma, Y.: Mean field stochastic games: Monotone costs and threshold policies (in Chinese). Sci. Sin. Math. (special issue in honour of the 80th birthday of Prof. H-F. Chen), 1445-1460 (2016)
28. Huang, M., Ma, Y.: Mean field stochastic games with binary action spaces and monotone costs. arXiv:1701.06661v1 (2017)
29. Huang, M., Ma, Y.: Mean field stochastic games with binary actions: Stationary threshold policies. Proc. 56th IEEE Conference on Decision and Control, Melbourne, Australia, pp. 27-32 (2017)
30. Huang, M., Malhamé, R.P., Caines, P.E.: Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Commun. Inform. Systems 6, 221-251 (2006)
31. Huang, M., Zhou, M.: Linear quadratic mean field games: Asymptotic solvability and relation to the fixed point approach. IEEE Transactions on Automatic Control (2018, in revision, conditionally accepted)
32. Ito, K., Kunisch, K.: Sensitivity analysis of solutions to optimization problems in Hilbert spaces with applications to optimal control and estimation. J. Differential Equations, 1-40 (1992)
33. Jovanovic, B., Rosenthal, R.W.: Anonymous sequential games. Journal of Mathematical Economics, 77-87 (1988)
34. Jiang, L., Anantharam, V., Walrand, J.: How bad are selfish investments in network security? IEEE/ACM Trans. Networking, 549-560 (2011)
35. Kolokoltsov, V.N.: Nonlinear Markov games on a finite state space (mean-field and binary interactions). International J. Statistics Probability, 77-91 (2012)
36. Kress, R.: Linear Integral Equations. Springer, Berlin (1989)
37. Lasry, J.-M., Lions, P.-L.: Mean field games. Japan. J. Math. 2, 229-260 (2007)
38. Lelarge, M., Bolot, J.: A local mean field analysis of security investments in networks. Proc. ACM SIGCOMM NetEcon, Seattle, WA, pp. 25-30 (2008)
39. Li, T., Zhang, J.-F.: Asymptotically optimal decentralized control for large population stochastic multiagent systems. IEEE Trans. Autom. Control, 1643-1660 (2008)
40. Manfredi, P., Posta, P.D., d'Onofrio, A., Salinelli, E., Centrone, F., Meo, C., Poletti, P.: Optimal vaccination choice, vaccination games, and rational exemption: An appraisal. Vaccine, 98-109 (2010)
41. Meyn, S., Tweedie, R.L.: Markov Chains and Stochastic Stability, 2nd edn. Cambridge University Press, Cambridge (2009)
42. Milgrom, P., Shannon, C.: Monotone comparative statics. Econometrica, 157-180 (1994)
43. Moon, J., Basar, T.: Linear quadratic risk-sensitive and robust mean field games. IEEE Trans. Autom. Control, 1062-1077 (2017)
44. Müller, A., Stoyan, D.: Comparison Methods for Stochastic Models and Risks. Wiley, Chichester (2002)
45. Oniki, H.: Comparative dynamics (sensitivity analysis) in optimal control theory. J. Econ. Theory, 265-283 (1973)
46. Saldi, N., Basar, T., Raginsky, M.: Markov-Nash equilibria in mean-field games with discounted cost. SIAM J. Control Optimization, 4256-4287 (2018)
47. Samuelson, P.A.: Foundations of Economic Analysis, enlarged edn. Harvard University Press, Cambridge, MA (1983)
48. Schelling, T.C.: Hockey helmets, concealed weapons, and daylight saving: A study of binary choices with externalities. The Journal of Conflict Resolution, 381-428 (1973)
49. Selten, R.: An axiomatic theory of a risk dominance measure for bipolar games with linear incentives. Games and Econ. Behav., 213-263 (1995)
50. Shapley, L.S.: Stochastic games. Proc. Natl. Acad. Sci. 39, 1095-1100 (1953)
51. Sigman, K., Wolff, R.W.: A review of regenerative processes. SIAM Rev., 269-288 (1993)
52. Sun, Y.: The exact law of large numbers via Fubini extension and characterization of insurable risks. J. Econ. Theory, 31-69 (2006)
53. Topkis, D.M.: Supermodularity and Complementarity. Princeton Univ. Press, Princeton (1998)
54. Walker, M., Wooders, J., Amir, R.: Equilibrium play in matches: Binary Markov games. Games and Economic Behavior, 487-502 (2011)
55. Weintraub, G.Y., Benkard, C.L., Van Roy, B.: Markov perfect industry dynamics with many firms. Econometrica, 1375-1411 (2008)
56. Yong, J.: Linear-quadratic optimal control problems for mean-field stochastic differential equations. SIAM J. Control Optim. 51, 2809-2838 (2013)