Analysis of Markovian Competitive Situations using Nonatomic Games
Jian Yang
Department of Management Science and Information Systems
Business School, Rutgers University
Newark, NJ 07102
Email: [email protected]

2016; revised, February 2017
Abstract
For dynamic situations where the evolution of a player's state is influenced by his own action as well as other players' states and actions, we show that equilibria derived for nonatomic games (NGs) can be used by their large finite counterparts to achieve near-equilibrium performances. We focus on the case with quite general spaces but also with independently generated shocks driving random actions and state transitions. The NG equilibria we consider are random state-to-action maps that pay no attention to players' external environments. They are adoptable by a variety of real situations where awareness of other players' states can be anywhere between full and non-existent. Transient results here also form the basis of a link between an NG's stationary equilibrium (SE) and good stationary profiles for large finite games.
Keywords: Nonatomic Game; Markov Equilibrium; Large Finite Game

1 Introduction
Many multi-period competitive situations, as first noted by Shapley [25], involve randomly evolving player states that affect players' payoffs. When making a decision, a player has to contemplate not only what states other players are in and how other players will act, but also how his and others' states and actions will influence the future evolution of all players' states. Another complicating factor is that players may have zero, partial, or full knowledge of other players' states before they take their actions in each period. The task of analyzing these dynamic games is certainly daunting.

Consider a dynamic pricing game as an example. Multiple firms start a fixed time horizon with stocks of the same product. Each of them is bent on using pricing to influence demand and earn the highest revenue from selling its respective stock. In any given period, a firm is aware of its own current inventory level but not the levels of others. Yet, demand arrival to the firm is random and influenced not only by its own price, but also by the prices charged by other firms.

The ultimate goal with such a Markovian game lies in identifying an equilibrium action plan that will earn each player the highest total payoff when other players adhere to the plan. But even in the stationary setting, known equilibria come in quite complicated forms that, for real implementation, demand a high degree of coordination among players; see, e.g., Mertens and Parthasarathy [20], Duffie, Geanakoplos, Mas-Colell, and McLennan [9], and Solan [26]. Alternatively, we propose that equilibria be reached asymptotically as the number of players grows, on the premise that the game's nonatomic-game (NG) counterpart is analyzable. In the latter, a continuum of players are in competition, none of whom has any discernible influence on any other player, and yet all players in aggregation hold sway over players' payoffs and state evolutions. The key advantage of such a game is that its state distribution will evolve in a deterministic fashion. This results in the relatively simple form taken by the NG's equilibria $x$: the pure or mixed action plan $x_t(s_t)$, though dependent on the time period $t$ and the player's own individual state $s_t$, is insensitive to whatever portion of the overall state distribution the player can observe.

When an NG equilibrium is handy, we show that it can be used on the original finite Markovian game to serve our intended purpose. Relying on intermediate results stemming from the weak Law of Large Numbers (LLN) concerning empirical distributions, we establish two main results. In Theorem 1, we show that the empirical distribution of players' states, which is itself random in the finite game, will nevertheless converge in probability to the deterministic distribution predicted for the NG counterpart when the number of players grows to infinity. This convergence paves the way for Theorem 2, which states that players can apply the observation-blind NG equilibrium to the finite-player situation and gain an average performance that is ever harder to beat as the number of players grows. In both results, the "average" over players' states is assessed on the state distribution prevailing in either the NG or the finite game.
After assuming time-invariant payoff and transition functions, as well as fixed discounting over time and an infinite time horizon, we obtain a stationary setting. For this, we establish Theorem 3, effectively our affirmative answer to whether the stationary equilibria (SE) studied in past literature can be useful in large finite games.

The above theory will be most useful when the NG counterpart is relatively easy to deal with in comparison to the corresponding finite games. Besides evidence in the literature, this point is further buttressed by the dynamic pricing game mentioned earlier. Presented in Yang [33] as supplementary material to the current paper, our analysis demonstrates the usefulness of the transient result Theorem 2. The game is also extended through the consideration of locked-in production, wherein every firm uses production to bring its inventory back up to a pre-determined level whenever it becomes empty. The resultant game is again asymptotically analyzable due to the stationary result Theorem 3.

As our foremost contribution, we establish one more link between NGs and their finite-game counterparts. Previously, links were mostly established for single-period games, special multi-period games without individual states, or games exhibiting stationary features. The introduction of information-carrying individual states allows for the proper treatment of a much wider body of applicable situations involving present-future tradeoffs and transient properties. Compared to the earlier work Yang [32], which dealt with the NG-finite link for Markovian games as well, the current paper treats more general, non-discrete state and action spaces. As a tradeoff, we are compelled to let random shocks drive both decision making and state evolution. This, as is backed up by results such as Aumann [6] on the interchangeability between the presentations with and without such drivers, does not much restrict the generality of our results. Moreover, we demonstrate that the usefulness of SEs to large finite games stems from more fundamental properties possessed by transient NG equilibria.

Here is our plan for the remainder of the paper. We spend Section 2 on a survey of related research and Section 3 on basic model primitives. The nonatomic game is introduced in Section 4, while finite games are treated in Section 5. We present the main transient convergence results in Section 6. These are used in Section 7 to establish a link between SEs and large finite games with stationary features. Further discussion is made in Section 8, while the paper is concluded in Section 9.

2 Literature Survey

NGs are often easier to analyze than their finite counterparts because, in them, the action of an individual player has no impact on the payoffs and future state evolutions of the other players. Therefore, they are often used as proxies of real competitive systems in economic studies; see, e.g., Aumann [5] and Reny and Perry [22]. Systematic research on NGs started with Schmeidler [24]. He formulated a single-period semi-anonymous NG, wherein the joint distribution of other players' types and actions may affect any given player's payoff. When the action space is finite, Schmeidler established the existence of pure equilibria when the game becomes anonymous, so that only the distribution of other players' actions matters. Mas-Colell [19] showed the existence of distributional equilibria in anonymous NGs with compact action spaces. Khan, Rath, and Sun [18] identified a certain limit to which Schmeidler's result can be extended.
Links between NGs and their finite counterparts were covered in Green [12], Housman [14], Carmona [8], Kalai [17], Al-Najjar [4], and Yang [31].

This paper differs from the above by its focus on multi-period games. For such games without individual states that allow past actions to impact future gains, Green [11], Sabourian [23], and Al-Najjar and Smorodinsky [3] showed that equilibria for large games are nearly myopic. With individual states that inherit traces of past actions, the games we study pose new challenges. An NG equilibrium for our situation is certainly not myopic, as it takes into account the current action's future consequences. Rather, it is insensitive to real-time observations made of other players' states. We succeed in showing that such a simple action plan can be used profitably in finite situations with randomly evolving state distributions of which a player may have zero, partial, or full knowledge. The type of NG we deal with is similar to the sequential anonymous games studied by Jovanovic and Rosenthal [16], who established the existence of distributional equilibria. The result was generalized to games involving aggregate shocks by Bergin and Bernhardt [7]. Different from these papers, we work on the link between NGs and finite games, not the NGs themselves.

In their effort to simplify dynamic games, some authors went further than silencing individual players' influences as done through the NG approach. In addition, they pursued the so-called stationary equilibria (SE), which stress further the long-run steady-state nature of individual action plans and system-wide state distributions; see, e.g., Hopenhayn [13] and Adlakha and Johari [1]. The oblivious equilibrium (OE) concept proposed by Weintraub, Benkard, and van Roy [29], though accounting for impacts of large players, took the same stationary approach by letting firms beware of only long-run average state distributions. We caution that the implicit stationarity of SE or OE renders them inappropriate for applications that are transient by nature; for instance, the dynamic pricing game studied in Yang [33], where the inventory level of every firm can only decrease over time.

Some recent works also contributed to the links between equilibria of infinite-player games and their finite-player brethren. Weintraub, Benkard, and van Roy [30] did so for a setting where a long-run average system state can be defined. Adlakha, Johari, and Weintraub [2] established the existence of SE and achieved a similar conclusion by using only exogenous conditions on model primitives. Weintraub et al. [28] studied nonstationary oblivious equilibria (NOE) that capture transient behaviors of players, and showed their usefulness in finite-player situations by relying on a "light-tail" condition on players' state distributions similar to that used in [30]. Huang, Malhame, and Caines [15] dealt with a continuous-time multi-player system where independent diffusion processes provide random drivers. They reached equilibria in the nonatomic limit, and derived asymptotic results as the number of players becomes large. In their work, other players impact a given player through linear functionals of the state distribution they form; meanwhile, their actions play no direct role. Our discrete-time framework affords us almost full generality regarding other players' impacts on any given player's payoffs and state transitions: it is the joint state-action distribution that forms the environment faced by an individual player.
As already mentioned, while this paper tackles the case where exogenous shocks drive state evolution and decision making, Yang [32] dealt with the setting where such shocks are not necessarily identifiable; however, technical challenges faced there forced the state and action spaces to be discrete.
3 Model Primitives

In the dynamic games we study, players are engaged in multi-period competition in periods $t = 1, 2, \ldots, \bar{t}$. In period $t$, a player's payoff $\psi_t(s, x, \mu)$ depends on his state $s$, action $x$, and some $\mu$ depicting the outside environment. We suppose the game is semi-anonymous, so that $\mu$ can just be the joint distribution of other players' states and actions. The dynamics of the game are represented by a function $\theta_t(s, x, \mu, i)$, where $s$, $x$, and $\mu$ are defined as above, and $i$ is an idiosyncratic shock the player experiences individually after taking his action. All players' post-action shocks are independently sampled from a common distribution $\iota$.

We allow players to cast dice to decide their actions. However, we do not model the extent to which players can observe their outside environments; after all, we focus only on action plans that do not take advantage of any such observations. In every period $t$, we suppose each player receives another idiosyncratic shock $g$, this time before taking his action. All players' pre-action shocks, such as outcomes of dice casts, are independently sampled from a common distribution $\gamma$. We study the case where a player's action $x_t(s, g)$ depends merely on his own state $s$ and the shock $g$ that he himself has received. The main purpose of the paper is to show that one such action plan $x_{[1\bar{t}]} \equiv (x_t)_{t=1,\ldots,\bar{t}}$ is quite sufficient for the multi-period game just described, even when the latter may be transient in nature and of the more complex finite-player variety.

Some notation is needed for formal definitions. Given a metric space $A$, we use $d_A$ to denote its metric, $\mathcal{B}(A)$ its Borel $\sigma$-field, and $\mathcal{P}(A)$ the set of all probability measures (distributions) on the measurable space $(A, \mathcal{B}(A))$. The space $\mathcal{P}(A)$ is metrized by the Prohorov metric $\rho_A$, which induces on it the weak topology. Given metric spaces $A$ and $B$, we use $\mathcal{M}(A, B)$ to represent all measurable functions from $A$ to $B$.

We use a complete separable metric space $S$ for individual states $s$ and a separable metric space $X$ for player actions $x$. In a semi-anonymous fashion, payoffs and state transitions depend on the joint distribution $\mu \in \mathcal{P}(S \times X)$ of other players' states and actions. Let pre-action shocks $g$ come from a complete separable metric space $G$. In every period, these action-influencing shocks are independently drawn from a common distribution $\gamma \in \mathcal{P}(G)$. Let post-action shocks $i$ come from a complete separable metric space $I$. In every period, these transition-influencing shocks are independently drawn from a common distribution $\iota \in \mathcal{P}(I)$. The completeness requirements on $S$, $G$, and $I$ stem from the need to invoke Lemma 3 in Appendix A. These are certainly not stringent.

For period $t = 1, \ldots, \bar{t}$, a player's state $s \in S$, his action $x \in X$, and the joint state-action distribution $\mu \in \mathcal{P}(S \times X)$ he faces together determine his payoff in period $t$. In fact, we require there to be a bounded payoff function

$$\psi_t : S \times X \times \mathcal{P}(S \times X) \longrightarrow [-\bar{\psi}_t, \bar{\psi}_t], \qquad (1)$$

where $\bar{\psi}_t$ is some positive constant. It satisfies that $\psi_t(\cdot, \cdot, \mu) \in \mathcal{M}(S \times X, [-\bar{\psi}_t, \bar{\psi}_t])$ for every given distribution $\mu \in \mathcal{P}(S \times X)$. As the same player will enter a new state under post-action shock $i \in I$, we require there to be

$$\theta_t : S \times X \times \mathcal{P}(S \times X) \times I \longrightarrow S. \qquad (2)$$

It satisfies that $\theta_t(\cdot, \cdot, \mu, \cdot) \in \mathcal{M}(S \times X \times I, S)$ at every distribution $\mu \in \mathcal{P}(S \times X)$.

The action plans we consider are of the form

$$x_t : S \times G \longrightarrow X, \qquad (3)$$

which are required to be members of $\mathcal{M}(S \times G, X)$.
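To fix ideas, the primitives can be mirrored in code. The following is a minimal Python sketch in which every concrete choice, the real-valued states and actions, the uniform and Gaussian shocks, and the particular payoff and transition rules, is an illustrative assumption rather than part of the paper's model; an in-action environment $\mu$ is crudely represented by a finite list of state-action pairs.

```python
import random

# Toy stand-ins for the primitives (psi_t, theta_t, x_t, gamma, iota); every
# concrete choice below is an assumption made purely for illustration.

def sample_gamma():                      # pre-action shock g ~ gamma
    return random.uniform(0.0, 1.0)

def sample_iota():                       # post-action shock i ~ iota
    return random.gauss(0.0, 0.1)

def policy_x(t, s, g):                   # x_t(s, g): a measurable action plan
    return max(0.0, s - g)               # e.g., "sell down by a shock-dependent amount"

def payoff_psi(t, s, x, mu):             # psi_t(s, x, mu); mu ~ list of (s', x') pairs
    mean_action = sum(a for (_, a) in mu) / len(mu)
    return x / (1.0 + mean_action)       # bounded once states/actions are bounded

def transition_theta(t, s, x, mu, i):    # theta_t(s, x, mu, i): the next state
    mean_state = sum(sp for (sp, _) in mu) / len(mu)
    return max(0.0, s - x + 0.1 * mean_state + i)
```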
That is, a player will use action $x_t(s, g)$ in period $t$ when he starts with state $s \in S$ and receives pre-action shock $g$. We call any state distribution $\sigma \in \mathcal{P}(S)$ a pre-action environment, because the one formed by other players is what a player could potentially see at the beginning of any period. Also, call any joint state-action distribution $\mu \in \mathcal{P}(S \times X)$ an in-action environment, because the one formed by other players is what a player could potentially see in the midst of play in any period.

Let us recount our model primitives as follows: the horizon length $\bar{t}$; the state space $S$, the action space $X$, the pre-action shock space $G$, and the post-action shock space $I$; also, the pre-action shock distribution $\gamma$ and the post-action shock distribution $\iota$; finally, for periods $t = 1, \ldots, \bar{t}$, the payoff functions $\psi_t$ and state transition functions $\theta_t$.

4 The Nonatomic Game

Given an initial pre-action environment $\sigma_1 \in \mathcal{P}(S)$, we can define a nonatomic game $\Gamma(\sigma_1)$ which starts period 1 with $\sigma_1$ as the distribution of all players' states. We focus on policy profiles of the form $x_{[1\bar{t}]} \equiv (x_t)_{t=1,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}}$, where each $x_t \in \mathcal{M}(S \times G, X)$ is a map from a player's state-shock pairs to actions. Along with the given initial environment $\sigma_1$, we suppose such a profile will help generate a deterministic pre-action environment trajectory $\sigma_{[1,\bar{t}+1]} \equiv (\sigma_t)_{t=1,2,\ldots,\bar{t},\bar{t}+1} \in (\mathcal{P}(S))^{\bar{t}+1}$. This allows a player's policy to be observation-blind; that is, what portion of $\sigma_t$ is observable to the player in each period $t$ is not of any concern. The determinism of the environment evolution in $\Gamma(\sigma_1)$ is justifiable by Sun's [27] LLN involving a continuum of indexed players.

We now discuss how the deterministic trajectory can be formed. Let $t = 1, \ldots, \bar{t}$ be given. When all players form state distribution $\sigma_t \in \mathcal{P}(S)$ at the beginning and adopt the same plan $x_t \in \mathcal{M}(S \times G, X)$ for the period, the in-action environment $\mu_t \equiv M(\sigma_t, x_t) \in \mathcal{P}(S \times X)$ to be experienced by all players will take the form

$$\mu_t = M(\sigma_t, x_t) = (\sigma_t \times \gamma) \cdot (\mathrm{prj}_S, x_t)^{-1}, \qquad (4)$$

where $\mathrm{prj}_S$ stands for the projection map from $S \times G$ to $S$. The meaning of (4) is that, for any measurable joint state-action set $W' \in \mathcal{B}(S \times X)$,

$$\mu_t(W') = (\sigma_t \times \gamma)\left((\mathrm{prj}_S, x_t)^{-1}(W')\right) = \int_S \int_G \mathbf{1}[(s, x_t(s, g)) \in W'] \cdot \gamma(dg) \cdot \sigma_t(ds). \qquad (5)$$

This reflects that the joint distribution for states and pre-action shocks is of the product form $\sigma_t \times \gamma$; also, $x_t$ provides the map from state-shock pairs to actions for this period.

For a player who starts with state $s_t$ and has experienced pre-action shock $g_t$ as well as post-action shock $i_t$, his new state will be governed by (2):

$$s_{t+1} = \theta_t(s_t, x_t(s_t, g_t), M(\sigma_t, x_t), i_t). \qquad (6)$$

To describe the transition of the overall pre-action environment from $\sigma_t$ to $\sigma_{t+1}$ under action plan $x_t$, we define the operator $T_t(x_t)$ on $\mathcal{P}(S)$. Note that states are distributed according to $\sigma_t$, pre-action shocks are distributed according to $\gamma$, and post-action shocks are distributed according to $\iota$. So following (6),

$$\sigma_{t+1} = T_t(x_t) \circ \sigma_t = (\sigma_t \times \gamma \times \iota) \cdot \left[\theta_t\left(\mathrm{prj}_S, x_t \circ \mathrm{prj}_{S \times G}, M(\sigma_t, x_t), \mathrm{prj}_I\right)\right]^{-1}, \qquad (7)$$

meaning that, for any measurable state set $S' \in \mathcal{B}(S)$,

$$\sigma_{t+1}(S') = [T_t(x_t) \circ \sigma_t](S') = \int_S \int_G \int_I \mathbf{1}[\theta_t(s, x_t(s, g), M(\sigma_t, x_t), i) \in S'] \cdot \iota(di) \cdot \gamma(dg) \cdot \sigma_t(ds). \qquad (8)$$
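Under the law-of-large-numbers reading of (5) and (8), one period of the deterministic trajectory can be approximated by Monte Carlo. The sketch below, which reuses the same kind of toy primitives as before (all of them assumptions), represents $\sigma_t$ by a large sample of states and pushes it forward through $M(\sigma_t, x_t)$ and $T_t(x_t)$.

```python
import random

def in_action_environment(state_samples, x_t, t, sample_gamma):
    """Monte Carlo version of (4)-(5): pair each sampled state s with an
    independent pre-action shock g and record the state-action pair."""
    return [(s, x_t(t, s, sample_gamma())) for s in state_samples]

def advance_environment(state_samples, x_t, theta_t, t, sample_gamma, sample_iota):
    """Monte Carlo version of sigma_{t+1} = T_t(x_t) applied to sigma_t, cf. (7)-(8)."""
    mu_t = in_action_environment(state_samples, x_t, t, sample_gamma)
    nxt = []
    for s in state_samples:
        g, i = sample_gamma(), sample_iota()
        nxt.append(theta_t(t, s, x_t(t, s, g), mu_t, i))
    return nxt  # a sample-based stand-in for sigma_{t+1}

# Example: push 10,000 draws from a hypothetical sigma_1 through one period.
sigma1 = [random.uniform(0.0, 2.0) for _ in range(10_000)]
x = lambda t, s, g: max(0.0, s - g)
theta = lambda t, s, a, mu, i: max(0.0, s - a + i)
sigma2 = advance_environment(sigma1, x, theta, 1,
                             lambda: random.uniform(0, 1),
                             lambda: random.gauss(0, 0.1))
```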
so that T [ t,t − is the identitymapping on P ( S ) and for t ′ = t, t + 1 , ... , T [ tt ′ ] ( x [ tt ′ ] ) = T t ′ ( x t ′ ) ◦ T [ t,t ′ − ( x [ t,t ′ − ) . (9)The environment trajectory alluded to earlier is therefore σ [1 , ¯ t +1] = ( T [1 ,t − ( x [1 ,t − ) ◦ σ ) t =1 , ,..., ¯ t, ¯ t +1 . (10)In defining Γ( σ )’s equilibria, we subject a candidate policy profile to the one-time de-viation of a single player, who is negligible in his influence over others. The deviation willnot alter the environment trajectory corresponding to the candidate profile. Thus, we define v t ( s t , σ t , x [ t ¯ t ] , y t ) as the total expected payoff a player can make from time t to ¯ t , when hestarts with state s t ∈ S , other players form pre-action environment σ t ∈ P ( S ), all playersadopt policy x [ t ¯ t ] ≡ ( x t ′ ) t ′ = t,..., ¯ t ∈ ( M ( S × G, X )) ¯ t − t +1 with the exception of the current8layer in period t alone, who deviates to policy y t ∈ M ( S × G, X ) in that period. As aterminal condition, we have v ¯ t +1 ( s ¯ t +1 , σ ¯ t +1 , y ¯ t +1 ) = 0 . (11)For t = ¯ t, ¯ t − , ...,
For $t = \bar{t}, \bar{t}-1, \ldots, 1$, we have

$$v_t(s_t, \sigma_t, x_{[t\bar{t}]}, y_t) = \int_G \Big[\psi_t(s_t, y_t(s_t, g_t), M(\sigma_t, x_t)) + \int_I v_{t+1}\big(\theta_t(s_t, y_t(s_t, g_t), M(\sigma_t, x_t), i_t), T_t(x_t) \circ \sigma_t, x_{[t+1,\bar{t}]}, x_{t+1}\big) \cdot \iota(di_t)\Big] \cdot \gamma(dg_t), \qquad (12)$$

due to the dynamics illustrated in (6) to (8). The deviation $y_t$ affects the current player's action $y_t(s_t, g_t)$ in period $t$ and his own state $\theta_t(s_t, y_t(s_t, g_t), M(\sigma_t, x_t), i_t)$ in period $t+1$. But as a distinctive feature of the NG setup, it has no bearing on the period-$(t+1)$ pre-action environment $T_t(x_t) \circ \sigma_t$.

Now define $u_t : \mathcal{P}(S) \times (\mathcal{M}(S \times G, X))^{\bar{t}-t+1} \times \mathcal{M}(S \times G, X) \longrightarrow \Re$ so that

$$u_t(\sigma_t, x_{[t\bar{t}]}, y_t) = \int_S v_t(s_t, \sigma_t, x_{[t\bar{t}]}, y_t) \cdot \sigma_t(ds_t). \qquad (13)$$

This can be understood as one particular player's average gain from period $t$ onward when the same conditions specified earlier prevail and his period-$t$ state is sampled from the distribution $\sigma_t$. We deem policy $x^*_{[1\bar{t}]} \equiv (x^*_t)_{t=1,2,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}}$ a Markov equilibrium for the game $\Gamma(\sigma_1)$ when, for every $t = 1, 2, \ldots, \bar{t}$ and $y_t \in \mathcal{M}(S \times G, X)$,

$$u_t\left(T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1,\; x^*_{[t\bar{t}]},\; x^*_t\right) \geq u_t\left(T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1,\; x^*_{[t\bar{t}]},\; y_t\right). \qquad (14)$$

That is, policy $x^*_{[1\bar{t}]}$ will be regarded as an equilibrium when it cannot be bettered by any plan $y_t \in \mathcal{M}(S \times G, X)$ in any period $t$, in an average sense defined by the period-$t$ environment $\sigma_t = T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1$. We caution that (14) is weaker than

$$v_t\left(s_t, T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1, x^*_{[t\bar{t}]}, x^*_t\right) \geq v_t\left(s_t, T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1, x^*_{[t\bar{t}]}, y_t\right) \qquad (15)$$

for every $s_t \in S$. On the other hand, since $y_t \in \mathcal{M}(S \times G, X)$ allows much freedom in choosing, for each state $s \in S$ and shock $g \in G$, a competitive reaction $y_t(s, g)$, there is not much difference between the two criteria aside from measurability subtleties.
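Before moving to finite games, it may help to see the recursion (12) and the averaging (13) spelled out computationally. The sketch below assumes, purely for tractability, that $S$, $X$, $G$, and $I$ are tiny finite sets, so every integral becomes a finite sum; all primitives are hypothetical stand-ins.

```python
# Backward induction for the NG value v_t of (11)-(12) and the average u_t of
# (13), under the simplifying assumption of tiny finite S, X, G, I.

S, G, I = [0, 1], [0, 1], [0, 1]
gamma = {0: 0.5, 1: 0.5}                 # pre-action shock distribution
iota = {0: 0.5, 1: 0.5}                  # post-action shock distribution
t_bar = 3

def psi(t, s, x, mu):                    # toy bounded payoff (an assumption)
    return x * mu.get((s, x), 0.0)

def theta(t, s, x, mu, i):               # toy state transition (an assumption)
    return min(1, max(0, s + x - i))

def M(sigma, plan):                      # in-action environment, cf. (4)-(5)
    mu = {}
    for s in S:
        for g in G:
            key = (s, plan[(s, g)])
            mu[key] = mu.get(key, 0.0) + sigma[s] * gamma[g]
    return mu

def T(t, plan, sigma):                   # environment transition, cf. (7)-(8)
    nxt = {s: 0.0 for s in S}
    mu = M(sigma, plan)
    for s in S:
        for g in G:
            for i in I:
                nxt[theta(t, s, plan[(s, g)], mu, i)] += sigma[s] * gamma[g] * iota[i]
    return nxt

def v(t, s, sigma, plans, y_plan):       # recursion (12); (11) gives 0 past t_bar
    if t > t_bar:
        return 0.0
    mu, total = M(sigma, plans[t]), 0.0
    for g in G:
        a = y_plan[(s, g)]               # the one-time deviation acts in period t only
        cont = sum(iota[i] * v(t + 1, theta(t, s, a, mu, i), T(t, plans[t], sigma),
                               plans, plans.get(t + 1, y_plan)) for i in I)
        total += gamma[g] * (psi(t, s, a, mu) + cont)
    return total

def u(t, sigma, plans, y_plan):          # the average gain in (13)
    return sum(sigma[s] * v(t, s, sigma, plans, y_plan) for s in S)

plans = {t: {(s, g): 1 if s == 0 else 0 for s in S for g in G}
         for t in range(1, t_bar + 1)}
sigma1 = {0: 0.5, 1: 0.5}
print(u(1, sigma1, plans, plans[1]))     # average value with no deviation
```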
5 Finite Games

More notation is needed to appropriately describe finite games. For a metric space $A$ and $a \in A$, we use $\varepsilon(a)$ to denote the singleton probability measure with $\varepsilon(a)(\{a\}) = 1$. For $a = (a_1, \ldots, a_n) \in A^n$, where $n \in \mathbb{N}$, the set of natural numbers, we use $\varepsilon(a)$ for $\sum_{m=1}^n \varepsilon(a_m)/n$. The two uses are consistent. We also use $\mathcal{P}_n(A)$ for the space of probability measures of the type $\varepsilon(a)$ for $a \in A^n$, i.e., the space of empirical distributions generated from $n$ samples.

For some $n = 2, 3, \ldots$ and initial state distribution $\sigma_1 \in \mathcal{P}_n(S)$, we can define an $n$-player game $\Gamma_n(\sigma_1)$. Note the initial pre-action environment $\sigma_1$ must be of the form $\varepsilon(s_1) = \varepsilon(s_{11}, s_{12}, \ldots, s_{1n})$, where each $s_{1m} \in S$ is player $m$'s initial state. The game's payoffs and state transitions are still governed by (1) and (2), respectively. In period $t$, the pre-action environment is also some $\sigma_t = \varepsilon(s_{t1}, \ldots, s_{tn}) \in \mathcal{P}_n(S) \subset \mathcal{P}(S)$. Hence, the in-action environment $\mu_t \in \mathcal{P}_{n-1}(S \times X) \subset \mathcal{P}(S \times X)$ experienced by any designated player 1 is the empirical distribution $\varepsilon(s_{t,-1}, y_{t,-1}) = \varepsilon((s_{t2}, y_{t2}), \ldots, (s_{tn}, y_{tn}))$ when each player $m$ is in state $s_{tm} \in S$ and takes action $y_{tm} \in X$. Let players still adopt a policy $x_{[1\bar{t}]} \equiv (x_t)_{t=1,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}}$, which is but the crudest of many choices available to the $n$ players. We shall see later that this restriction is not going to do much harm.

Simplistic as it may seem, $x_{[1\bar{t}]}$ will not merely generate a deterministic environment trajectory. Given pre-action shock vector $g_t = (g_{t1}, \ldots, g_{tn}) \in G^n$ and post-action shock vector $i_t = (i_{t1}, \ldots, i_{tn}) \in I^n$, we can define $T_{nt}(x_t, g_t, i_t)$ as the operator on $\mathcal{P}_n(S)$ that converts a period-$t$ pre-action environment into a period-$(t+1)$ one. Thus, following (4) to (6), $\varepsilon(s_{t+1}) = T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)$ is such that

$$s_{t+1,m} = \theta_t(s_{tm}, x_t(s_{tm}, g_{tm}), M_n(s_{t,-m}, g_{t,-m}, x_t), i_{tm}), \quad \forall m = 1, 2, \ldots, n, \qquad (16)$$

where

$$M_n(s_{t,-m}, g_{t,-m}, x_t) = \varepsilon(s_{t,-m}, g_{t,-m}) \cdot (\mathrm{prj}_S, x_t)^{-1}, \qquad (17)$$

and each $\varepsilon(s_{t,-m}, g_{t,-m})$ represents the empirical distribution built on the state-shock pairs $(s_{t1}, g_{t1})$, ..., $(s_{t,m-1}, g_{t,m-1})$, $(s_{t,m+1}, g_{t,m+1})$, ..., $(s_{tn}, g_{tn})$. The latter reflects that player $m$'s in-action environment is made up of the states and actions of the other $n-1$ players. With $T_{n,[tt']}$ as the identity map when $t' \leq t-1$, and for $t \leq t'$, let

$$T_{n,[tt']}(x_{[tt']}, g_{[tt']}, i_{[tt']}) = T_{nt'}(x_{t'}, g_{t'}, i_{t'}) \circ T_{n,[t,t'-1]}(x_{[t,t'-1]}, g_{[t,t'-1]}, i_{[t,t'-1]}). \qquad (18)$$

The evolution of pre-action environments $\sigma_t = \varepsilon(s_t)$ is guided by the random shock vectors $g_t$ and $i_t$, and hence is stochastic by nature.

For an $n$-player game, let $v_{nt}(s_{t1}, \varepsilon(s_{t,-1}), x_{[t\bar{t}]}, y_t)$ be the total expected payoff player 1 can make from $t$ to $\bar{t}$, when he starts with state $s_{t1} \in S$, other players' initial environments are describable by their aggregate empirical state distribution $\varepsilon(s_{t,-1}) = \varepsilon(s_{t2}, \ldots, s_{tn})$, and all players adopt the policy $x_{[t\bar{t}]} \equiv (x_{t'})_{t'=t,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}-t+1}$ from period $t$ to period $\bar{t}$, with the exception of player 1 in period $t$ alone, who deviates to policy $y_t \in \mathcal{M}(S \times G, X)$. As a terminal condition, we have

$$v_{n,\bar{t}+1}(s_{\bar{t}+1,1}, \varepsilon(s_{\bar{t}+1,-1}), y_{\bar{t}+1}) = 0. \qquad (19)$$
For $t = \bar{t}, \bar{t}-1, \ldots, 1$, we have the recursive relationship

$$v_{nt}(s_{t1}, \varepsilon(s_{t,-1}), x_{[t\bar{t}]}, y_t) = \int_{G^n} \gamma^n(dg_t) \times \Big\{\psi_t(s_{t1}, y_t(s_{t1}, g_{t1}), M_n(s_{t,-1}, g_{t,-1}, x_t)) + \int_{I^n} \iota^n(di_t) \times v_{n,t+1}\big(\theta_t(s_{t1}, y_t(s_{t1}, g_{t1}), M_n(s_{t,-1}, g_{t,-1}, x_t), i_{t1}),\; [T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)]_{-1},\; x_{[t+1,\bar{t}]},\; x_{t+1}\big)\Big\}, \qquad (20)$$

due to the dynamics illustrated in (6) and (16). By $[T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)]_{-1}$, we mean $\varepsilon(s_{t+1,-1})$, where $\varepsilon(s_{t+1})$ is $T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)$ as defined through (16). The current (20) is much more complicated than its NG counterpart (12). The evolution from period $t$ to $t+1$ now depends on the pre-action shocks $g_t \equiv (g_{t1}, \ldots, g_{tn})$ and post-action shocks $i_t \equiv (i_{t1}, \ldots, i_{tn})$. Also, the in-action environment $M_n(s_{t,-1}, g_{t,-1}, x_t)$ experienced by player 1 excludes his own state and action, and hence is different from the environment faced by any other player. Similarly, the environment $[T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)]_{-1}$ to be faced by player 1 in period $t+1$ is unique to him as well. The added complexity motivates us to exploit the easier-to-handle NG case.
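A direct simulation of (16) and (17) makes the player-specific environments concrete. In the sketch below (toy primitives assumed, as before), player $m$'s in-action environment is built from the other $n-1$ players' state-shock pairs, which is exactly why each player faces a slightly different $\mu$.

```python
def step_finite_game(t, states, x_t, theta_t, sample_gamma, sample_iota):
    """One application of T_nt(x_t, g_t, i_t) from (16)-(17): draw one shock
    pair per player and update every state, with player m's environment built
    from the other n-1 players only. Primitives are assumed toy stand-ins."""
    n = len(states)
    g = [sample_gamma() for _ in range(n)]
    i = [sample_iota() for _ in range(n)]
    new_states = []
    for m in range(n):
        mu_minus_m = [(states[k], x_t(t, states[k], g[k]))   # cf. (17)
                      for k in range(n) if k != m]
        new_states.append(theta_t(t, states[m], x_t(t, states[m], g[m]),
                                  mu_minus_m, i[m]))
    return new_states
```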
Let $\sigma_{[1\bar{t}]} \equiv (\sigma_t)_{t=1,\ldots,\bar{t}} \in (\mathcal{P}(S))^{\bar{t}}$ be a sequence of environments. For $\epsilon \geq 0$, we deem $x^*_{[1\bar{t}]} \equiv (x^*_t)_{t=1,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}}$ an $\epsilon$-Markov equilibrium for the game family $(\Gamma_n(\varepsilon(s_1)) \mid s_1 \in S^n)$ in the sense of $\sigma_{[1\bar{t}]}$ when, for every $t = 1, \ldots, \bar{t}$ and $y_t \in \mathcal{M}(S \times G, X)$,

$$\int_{S^n} v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, x^*_t\right) \cdot \sigma_t^n(ds_t) \geq \int_{S^n} v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, y_t\right) \cdot \sigma_t^n(ds_t) - \epsilon. \qquad (21)$$

That is, the action plan $x^*_{[1\bar{t}]}$ will be an $\epsilon$-Markov equilibrium in the sense of $\sigma_{[1\bar{t}]}$ when, under its guidance, the average payoff from any period $t$ on cannot be improved by more than $\epsilon$ through any deviation, where the "average" is taken with respect to the state distribution $\sigma_t$.
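Criterion (21) also suggests a direct diagnostic: estimate both sides by repeatedly playing the $n$-player game, once with everyone following the candidate profile and once with player 1 deviating in the period under scrutiny. The sketch below assumes a hypothetical helper `simulate_total_payoff` that evaluates the recursion (20) by forward simulation; it is an illustration, not part of the paper's apparatus.

```python
def estimated_regret(n, sample_sigma_t, x_star, y_dev, simulate_total_payoff,
                     runs=2000):
    """Monte Carlo estimate of the smallest epsilon making (21) hold at one
    period t: average the payoff gap between deviating and conforming."""
    gap = 0.0
    for _ in range(runs):
        states = [sample_sigma_t() for _ in range(n)]        # s_t ~ sigma_t^n
        v_conform = simulate_total_payoff(states, x_star, x_star)
        v_deviate = simulate_total_payoff(states, x_star, y_dev)
        gap += (v_deviate - v_conform) / runs
    return max(0.0, gap)   # (21) asks this to vanish as n grows
```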
6 Convergence Results

We can achieve convergence of environments and then of equilibria. The former is more fundamental and challenging, and the latter is built on it.

6.1 Convergence of Environments

Even without touching upon payoffs or equilibria, we can establish a link between finite games and their NG counterpart. It reflects that the stochastic environment pathways experienced by large finite games converge to the NG's deterministic environment trajectory.

Let $A$, $B$, and $C$ be metric spaces and $\pi_B \in \mathcal{P}(B)$ be a distribution. We use $\mathcal{K}(A, B, \pi_B, C) \subseteq \mathcal{M}(A \times B, C)$ to represent the space of all measurable functions from $A \times B$ to $C$ that are uniformly continuous in a probabilistic sense. The criterion for $y \in \mathcal{K}(A, B, \pi_B, C)$ is that for any $\epsilon > 0$, there exists $\delta > 0$, so that for any $a, a' \in A$ satisfying $d_A(a, a') < \delta$,

$$\pi_B(\{b \in B \mid d_C(y(a, b), y(a', b)) < \epsilon\}) > 1 - \epsilon. \qquad (22)$$

When $B$ is a singleton and hence $\pi_B$ is degenerate, $y \in \mathcal{K}(A, B, \pi_B, C)$ merely means that $y$ is a uniformly continuous function from $A$ to $C$, a situation we denote by $y \in \mathcal{K}(A, C)$. For general $B$ and $\pi_B$, the meaning is, loosely, that continuity holds in most cases.
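Condition (22) can be probed numerically for a candidate $y$: fix nearby $a$ and $a'$, draw $b \sim \pi_B$ repeatedly (the same draw fed to both arguments), and check how often the two outputs stay $\epsilon$-close. The example map and metric below are assumptions chosen only to make the sketch runnable.

```python
import random

def satisfies_22(y, a, a_prime, eps, sample_b, d_C, n=100_000):
    """Empirical check of (22): estimate pi_B({b : d_C(y(a,b), y(a',b)) < eps})
    and compare it with 1 - eps."""
    hits = 0
    for _ in range(n):
        b = sample_b()                    # the same b enters both y-values
        if d_C(y(a, b), y(a_prime, b)) < eps:
            hits += 1
    return hits / n > 1 - eps

# e.g., y(a, b) = a + 0.01 * b passes comfortably for nearby a, a'
print(satisfies_22(lambda a, b: a + 0.01 * b, 0.500, 0.501, 0.05,
                   lambda: random.gauss(0.0, 1.0), lambda u, v: abs(u - v)))
```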
We now make two assumptions on the transition function $\theta_t$:

(S1) For every $\mu \in \mathcal{P}(S \times X)$, the function $\theta_t(\cdot, \cdot, \mu, \cdot)$ is a member of $\mathcal{K}(S \times X, I, \iota, S)$. That is, for any $\mu \in \mathcal{P}(S \times X)$ and $\epsilon > 0$, there exist $\delta_S > 0$ and $\delta_X > 0$, so that for any $s, s' \in S$ and $x, x' \in X$ satisfying $d_S(s, s') < \delta_S$ and $d_X(x, x') < \delta_X$,

$$\iota(\{i \in I \mid d_S(\theta_t(s, x, \mu, i), \theta_t(s', x', \mu, i)) < \epsilon\}) > 1 - \epsilon.$$

(S2) Not only is it true that $\theta_t(s, x, \cdot, \cdot) \in \mathcal{K}(\mathcal{P}(S \times X), I, \iota, S)$ at every $(s, x) \in S \times X$, but the continuity is also achieved at a rate independent of the $(s, x)$ present. That is, for any $\mu \in \mathcal{P}(S \times X)$ and $\epsilon > 0$, there exists $\delta > 0$, so that for any $\mu' \in \mathcal{P}(S \times X)$ satisfying $\rho_{S \times X}(\mu, \mu') < \delta$, as well as any $s \in S$ and $x \in X$,

$$\iota(\{i \in I \mid d_S(\theta_t(s, x, \mu, i), \theta_t(s, x, \mu', i)) < \epsilon\}) > 1 - \epsilon.$$

For a separable metric space $A$, we use $(A^n, \mathcal{B}^n(A))$ to denote the product measurable space that houses $n$-long sample sequences. Given $\pi \in \mathcal{P}(A)$, we use $\pi^n$ to denote the product measure on $(A^n, \mathcal{B}^n(A))$. We can show that a one-step evolution in a big game is not that much different from that in a nonatomic game.
Proposition 1. Given a separable metric space $A$, distribution $\pi \in \mathcal{P}(A)$, and pre-action environment $\sigma \in \mathcal{P}(S)$, suppose $s^n = (s^n(a) \mid a \in A^n)$ for each $n \in \mathbb{N}$ is a member of $\mathcal{M}(A^n, S^n)$, and $\varepsilon(s^n(a))$ converges to $\sigma$ in probability, to the effect that

$$\pi^n(\{a \in A^n \mid \rho_S(\varepsilon(s^n(a)), \sigma) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough. Then, $T_{nt}(x, g, i) \circ \varepsilon(s^n(a))$ will converge to $T_t(x) \circ \sigma$ in probability for any probabilistically continuous $x$. That is, for any $x \in \mathcal{K}(S, G, \gamma, X)$,

$$(\pi \times \gamma \times \iota)^n(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_S(T_{nt}(x, g, i) \circ \varepsilon(s^n(a)), T_t(x) \circ \sigma) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough.

Recall that $\rho_S$ is the Prohorov metric for measuring the distance between two state distributions. Also, the operator $T_t(x)$ delineating the period-$t$ transition of an NG's pre-action environment is defined at (8), and its finite-game counterpart $T_{nt}(x, g, i)$ is defined at (16). The proof of Proposition 1 calls upon Lemma 3 in Appendix A. This is why the spaces $S$, $G$, and $I$ are required to be complete. Now imagine that $(A, \mathcal{B}(A), \pi)$ provides the exogenous shocks that drive the games' evolutions up to period $t$: $A = S \times G^{t-1} \times I^{t-1}$ and $\pi = \sigma_1 \times \gamma^{t-1} \times \iota^{t-1}$. Proposition 1 states that, when starting period $t$ with initial state vectors $s^n(a)$ in $n$-player games that in aggregation increasingly resemble the given starting distribution $\sigma$ for the NG, one will still get state vectors in large games that in aggregation resemble the NG's state distribution after the period-$t$ transition. Exploiting this proposition iteratively, we can arrive at our first main result on the convergence of environments.
Theorem 1. Let a policy profile $x_{[t\bar{t}]} \in (\mathcal{M}(S \times G, X))^{\bar{t}-t+1}$ for periods $t, t+1, \ldots, \bar{t}$ be such that each $x_{t'}$ is a member of $\mathcal{K}(S, G, \gamma, X)$. Then, when we sample $s_t = (s_{t1}, \ldots, s_{tn})$ from a given pre-action environment $\sigma_t \in \mathcal{P}(S)$, the sequence $(\sigma_{nt'})_{t'=t,t+1,\ldots,\bar{t},\bar{t}+1}$ of stochastic pre-action environments will converge to the sequence $(\sigma_{t'})_{t'=t,t+1,\ldots,\bar{t},\bar{t}+1}$ of deterministic pre-action environments in probability, where for each $t' = t, t+1, \ldots, \bar{t}, \bar{t}+1$, $\sigma_{nt'}$ is a sample over the $\varepsilon(s_{t'})$'s with

$$\varepsilon(s_{t'}) = T_{n,[t,t'-1]}(x_{[t,t'-1]}, g_{[t,t'-1]}, i_{[t,t'-1]}) \circ \varepsilon(s_t),$$

while $(s_t, g_{[t,t'-1]}, i_{[t,t'-1]})$ is distributed according to $(\sigma_t \times \gamma^{t'-t} \times \iota^{t'-t})^n$; also, $\sigma_{t'} = T_{[t,t'-1]}(x_{[t,t'-1]}) \circ \sigma_t$. That is, for any $\epsilon > 0$ and any $n$ large enough,

$$\left(\sigma_t \times \gamma^{\bar{t}-t+1} \times \iota^{\bar{t}-t+1}\right)^n\left(\tilde{A}^n(\epsilon)\right) > 1 - \epsilon,$$

where $\tilde{A}^n(\epsilon) \in \mathcal{B}^n(S \times G^{\bar{t}-t+1} \times I^{\bar{t}-t+1})$ is such that, for any $(s_t, g_{[t,\bar{t}]}, i_{[t,\bar{t}]}) \in \tilde{A}^n(\epsilon)$,

$$\rho_S(\sigma_{nt'}, \sigma_{t'}) < \epsilon, \quad \forall t' = t, t+1, \ldots, \bar{t}, \bar{t}+1.$$

The multi-period transition operator $T_{[t,t'-1]}(x_{[t,t'-1]})$ for the NG is defined at (9), and its finite-game counterpart $T_{n,[t,t'-1]}(x_{[t,t'-1]}, g_{[t,t'-1]}, i_{[t,t'-1]})$ is defined at (18). Suppose an NG starts period $t$ with pre-action environment $\sigma_t$ and a slew of finite games start the period with pre-action environments that are sampled from $\sigma_t$. Let the evolution of both types of games be guided by players acting according to the same probabilistically continuous policy profile $x_{[t\bar{t}]}$. Then, as the number of players $n$ involved in the finite games grows to infinity, Theorem 1 predicts an ever smaller chance for the finite games' period-$t'$ environments $\sigma_{nt'} = T_{n,[t,t'-1]}(x_{[t,t'-1]}, g_{[t,t'-1]}, i_{[t,t'-1]}) \circ \varepsilon(s_t)$ to veer even slightly away from the NG's deterministic period-$t'$ environment $\sigma_{t'} = T_{[t,t'-1]}(x_{[t,t'-1]}) \circ \sigma_t$.
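Theorem 1 lends itself to a simple experiment: run the $n$-player dynamics and a large-sample proxy of the NG trajectory side by side under one probabilistically continuous policy, and watch the worst per-period discrepancy shrink as $n$ grows. Since the Prohorov metric is awkward to evaluate directly, the sketch below substitutes the Kolmogorov distance for real-valued states, a crude proxy chosen purely for convenience; `step_finite` and `step_ng` stand for the finite-game and NG one-step updates sketched earlier, and all of these choices are assumptions.

```python
def kolmogorov(xs, ys):
    """Sup-distance between two empirical CDFs; a rough, assumed stand-in for
    rho_S when states are real numbers."""
    pts = sorted(set(xs) | set(ys))
    cdf = lambda data, p: sum(1 for v in data if v <= p) / len(data)
    return max(abs(cdf(xs, p) - cdf(ys, p)) for p in pts)

def worst_drift(n, t_bar, step_finite, step_ng, sample_sigma_t, proxy_size=5_000):
    """Track max over t' of the distance between the finite game's random
    environments and the NG's (sample-approximated) deterministic trajectory."""
    finite = [sample_sigma_t() for _ in range(n)]
    ng = [sample_sigma_t() for _ in range(proxy_size)]
    worst = kolmogorov(finite, ng)
    for t in range(1, t_bar + 1):
        finite, ng = step_finite(t, finite), step_ng(t, ng)
        worst = max(worst, kolmogorov(finite, ng))
    return worst   # Theorem 1: this shrinks in probability as n grows
```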
6.2 Convergence of Equilibria

We now set out to establish this section's main result: an equilibrium from the NG will serve as an ever more accurate approximate equilibrium for ever larger finite games. First, we need to assume that the single-period payoff functions $\psi_t$ are continuous:

(F1) Each $\psi_t(s, x, \mu)$ is continuous in $(s, x)$. That is, for any $\mu \in \mathcal{P}(S \times X)$ and $\epsilon > 0$, there exist $\delta_S > 0$ and $\delta_X > 0$, so that for any $s, s' \in S$ and $x, x' \in X$ satisfying $d_S(s, s') < \delta_S$ and $d_X(x, x') < \delta_X$,

$$|\psi_t(s, x, \mu) - \psi_t(s', x', \mu)| < \epsilon.$$

(F2) Each $\psi_t(s, x, \mu)$ is continuous in $\mu$ at a rate independent of the $(s, x)$ present. That is, for any $\mu \in \mathcal{P}(S \times X)$ and $\epsilon > 0$, there exists $\delta > 0$, so that for any $\mu' \in \mathcal{P}(S \times X)$ satisfying $\rho_{S \times X}(\mu, \mu') < \delta$, as well as any $s \in S$ and $x \in X$,

$$|\psi_t(s, x, \mu) - \psi_t(s, x, \mu')| < \epsilon.$$

There are a couple of intermediate results, whose proofs are provided in Appendix B. Recall that the value functions $v_t$ for an NG are defined around (11) and (12), while the value functions $v_{nt}$ for finite games are defined around (19) and (20).

Proposition 2. $v_t(s_t, \sigma_t, x_{[t\bar{t}]}, x_t)$ is continuous in $s_t$ under probabilistically continuous $x_{t'}$'s.
Proposition 3. Let $\sigma_t \in \mathcal{P}(S)$ and $x_{[t\bar{t}]} \in (\mathcal{K}(S, G, \gamma, X))^{\bar{t}-t+1}$ be given. Suppose the sequence $s_{t,-1} = (s_{t2}, s_{t3}, \ldots)$ is sampled from $\sigma_t$; then $v_{nt}(s_{t1}, \varepsilon(s^n_{t,-1}), x_{[t\bar{t}]}, x_t)$ will converge to $v_t(s_{t1}, \sigma_t, x_{[t\bar{t}]}, x_t)$ in probability at an $s_{t1}$-independent rate, where $s^n_{t,-1}$ stands for the cutoff $(s_{t2}, s_{t3}, \ldots, s_{tn})$.

Now here comes our main transient result.
Theorem 2
For state distribution $\sigma_1 \in \mathcal{P}(S)$, suppose $x^*_{[1\bar{t}]} \equiv (x^*_t)_{t=1,2,\ldots,\bar{t}} \in (\mathcal{K}(S, G, \gamma, X))^{\bar{t}}$ is a probabilistically continuous Markov equilibrium of the nonatomic game $\Gamma(\sigma_1)$. Then, for any $\epsilon > 0$ and large enough $n \in \mathbb{N}$, this $x^*_{[1\bar{t}]}$ is also an $\epsilon$-Markov equilibrium for the game family $(\Gamma_n(\varepsilon(s_1)) \mid s_1 \in S^n)$ in the sense of $\sigma_{[1\bar{t}]} \equiv (\sigma_t)_{t=1,\ldots,\bar{t}}$, where every $\sigma_t = T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1$. This means that for any $t = 1, \ldots, \bar{t}$ and $y_t \in \mathcal{M}(S \times G, X)$, (21) is true:

$$\int_{S^n} v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, x^*_t\right) \cdot \sigma_t^n(ds_t) \geq \int_{S^n} v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, y_t\right) \cdot \sigma_t^n(ds_t) - \epsilon.$$

Furthermore, the same is true in the sense of the stochastic pre-action environment sequence $\sigma_{n,[1\bar{t}]} \equiv (\sigma_{nt})_{t=1,\ldots,\bar{t}}$, where every $\sigma_{nt}$ is a sample over the $\varepsilon(s_t)$'s with $\varepsilon(s_t) = T_{n,[1,t-1]}(x_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]}) \circ \varepsilon(s_1)$, while $(s_1, g_{[1,t-1]}, i_{[1,t-1]})$ is distributed according to $(\sigma_1 \times \gamma^{t-1} \times \iota^{t-1})^n$. This means that, for any $\epsilon > 0$ and large enough $n \in \mathbb{N}$, for any $t = 1, \ldots, \bar{t}$ and $y_t \in \mathcal{M}(S \times G, X)$,

$$\int_{S^n} \sigma_1^n(ds_1) \times \int_{G^{n(t-1)}} \gamma^{n(t-1)}(dg_{[1,t-1]}) \times \int_{I^{n(t-1)}} \iota^{n(t-1)}(di_{[1,t-1]}) \times v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, x^*_t\right)$$
$$\geq \int_{S^n} \sigma_1^n(ds_1) \times \int_{G^{n(t-1)}} \gamma^{n(t-1)}(dg_{[1,t-1]}) \times \int_{I^{n(t-1)}} \iota^{n(t-1)}(di_{[1,t-1]}) \times v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, y_t\right) - \epsilon,$$

where both $s_{t1}$ and $\varepsilon(s_{t,-1})$ come from $\varepsilon(s_t)$.

Theorem 2 says that, when there are enough of them, players in a finite game can agree on an NG equilibrium and expect to lose little on average; also, the distribution on which the "average" is based can be either the NG's state distribution or an accurate assessment of what players' states would be had they followed the NG equilibrium all along. In the latter option, different players' states can even be correlated. In the NG limit, the evolution of pre-action environments is deterministic. An equilibrium here, which is necessarily observation-blind to the extent that other players' states and actions do not influence it, serves as a good asymptotic equilibrium for finite games when there are enough players; and this asymptotic result is independent of the observational power of players in the finite games.
7 The Stationary Setting

Now we study an infinite-horizon model with stationary features. To this end, suppose there is a payoff function $\psi$, so that

$$\psi_t(s, x, \mu) = \alpha^{t-1} \cdot \psi(s, x, \mu), \quad \forall t = 1, 2, \ldots, \qquad (23)$$

where $\alpha \in [0, 1)$ is a discount factor.
Also, we use $\bar{\psi}$ for the bound $\bar{\psi}_1$ that appears in (1). In addition, suppose there is a state transition function $\theta$, so that

$$\theta_t(s, x, \mu, i) = \theta(s, x, \mu, i), \quad \forall t = 1, 2, \ldots. \qquad (24)$$

For the nonatomic game $\Gamma$ with the above stationary features, we use $x \equiv (x(s, g) \mid s \in S, g \in G) \in \mathcal{M}(S \times G, X)$ to represent a stationary policy profile. It is a map from the current period's state and pre-action shock to the player's action. Given an $x \in \mathcal{M}(S \times G, X)$, we denote by $T(x)$ the operator on $\mathcal{P}(S)$ that converts one state distribution $\sigma$ to its corresponding $T(x) \circ \sigma$, so that following (8), for every $S' \in \mathcal{B}(S)$,

$$[T(x) \circ \sigma](S') = \int_S \int_G \int_I \mathbf{1}(\theta(s, x(s, g), M(\sigma, x), i) \in S') \cdot \iota(di) \cdot \gamma(dg) \cdot \sigma(ds). \qquad (25)$$

An environment $\sigma \in \mathcal{P}(S)$ is said to be associated with $x$ when

$$\sigma = T(x) \circ \sigma. \qquad (26)$$

That is, we consider $\sigma \in \mathcal{P}(S)$ to be associated with $x \in \mathcal{M}(S \times G, X)$ when the former is invariant under the state transition facilitated by the $T(x)$ operator.

Suppose pre-action environment $\sigma \in \mathcal{P}(S)$ is associated with policy $x \in \mathcal{M}(S \times G, X)$. For $t = 0, 1, \ldots$, we define $v_t(s, \sigma, x, y)$ as the total expected payoff a player can make from period 1 to $t$, when he starts period 1 with state $s \in S$ and outside environment $\sigma$, while all players keep on using policy $x$ from period 1 to $t$, with the exception of the current player in the very beginning, who deviates to $y \in \mathcal{M}(S \times G, X)$. As a terminal condition, we have

$$v_0(s, \sigma, x, y) = 0. \qquad (27)$$

Due to the stationarity of the setting, we have, for $t = 1, 2, \ldots$,

$$v_t(s, \sigma, x, y) = \int_G \left[\psi(s, y(s, g), M(\sigma, x)) + \alpha \cdot \int_I v_{t-1}(\theta(s, y(s, g), M(\sigma, x), i), \sigma, x, x) \cdot \iota(di)\right] \cdot \gamma(dg). \qquad (28)$$

Using (27) and (28), we can inductively show that

$$|v_{t+1}(s, \sigma, x, y) - v_t(s, \sigma, x, y)| \leq \alpha^t \cdot \bar{\psi}. \qquad (29)$$

The sequence $\{v_t(s, \sigma, x, y) \mid t = 0, 1, \ldots\}$ is thus Cauchy with a limit point $v_\infty(s, \sigma, x, y)$. This $v_\infty(s, \sigma, x, y)$ can be understood as the infinite-horizon total discounted expected payoff a player can obtain by starting with state $s$ and environment $\sigma$, while all players adhere to the action plan $x$ except for the current player in the beginning, who deviates to $y$.

We deem $x^* \in \mathcal{M}(S \times G, X)$ a Markov equilibrium for the nonatomic game $\Gamma$ when, for some $\sigma^* \in \mathcal{P}(S)$ associated with $x^*$ in the fashion of (26) and every $y \in \mathcal{M}(S \times G, X)$,

$$\int_S v_\infty(s, \sigma^*, x^*, x^*) \cdot \sigma^*(ds) \geq \int_S v_\infty(s, \sigma^*, x^*, y) \cdot \sigma^*(ds). \qquad (30)$$

Therefore, a policy will be considered an equilibrium when it induces an invariant environment profile under which the policy forms a best response in the long run.

Now we move on to the $n$-player game $\Gamma_n$ with the same stationary features provided by $\psi$, $\theta$, and $\alpha$. Given policy profile $x = (x(s, g) \mid s \in S, g \in G) \in \mathcal{M}(S \times G, X)$, pre-action shock vector $g = (g_1, \ldots, g_n) \in G^n$, and post-action shock vector $i = (i_1, \ldots, i_n) \in I^n$, we define $T_n(x, g, i)$ as the operator on $\mathcal{P}_n(S)$ that converts one period's pre-action environment into that of the next period. Following (16), $\varepsilon(s') = T_n(x, g, i) \circ \varepsilon(s)$ is such that

$$s'_m = \theta(s_m, x(s_m, g_m), M_n(s_{-m}, g_{-m}, x), i_m), \quad \forall m = 1, 2, \ldots, n. \qquad (31)$$
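As an aside on the association condition (26): on a finite state space, an associated environment can be sought by simply iterating $T(x)$ until the distribution stops moving. The loop below is only a heuristic sketch under that finiteness assumption; existence or uniqueness of the fixed point is not guaranteed in general.

```python
def invariant_environment(T_of_x, sigma0, tol=1e-12, max_iter=100_000):
    """Iterate sigma <- T(x) applied to sigma on a finite state space until
    (26) holds approximately. T_of_x maps a dict {state: mass} to the next."""
    sigma = dict(sigma0)
    for _ in range(max_iter):
        nxt = T_of_x(sigma)
        if max(abs(nxt[s] - sigma[s]) for s in sigma) < tol:
            return nxt                    # approximately T(x)-invariant
        sigma = nxt
    raise RuntimeError("iteration did not settle; T(x) may not be contracting")
```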
Let $v_{nt}(s_1, \varepsilon(s_{-1}), x, y)$ be the total expected payoff player 1 can make from period 1 to $t$, when the player's starting state is $s_1 \in S$, other players' initial environments are describable by their aggregate empirical state distribution $\varepsilon(s_{-1}) = \varepsilon(s_2, \ldots, s_n)$, and all players adopt the policy $x \in \mathcal{M}(S \times G, X)$, with the exception that player 1 adopts policy $y \in \mathcal{M}(S \times G, X)$ in the very beginning. As a terminal condition, we have

$$v_{n0}(s_1, \varepsilon(s_{-1}), x, y) = 0. \qquad (32)$$

For $t = 1, 2, \ldots$, we have that $v_{nt}(s_1, \varepsilon(s_{-1}), x, y)$ equals

$$\int_{G^n} \gamma^n(dg) \times \left\{\psi(s_1, y(s_1, g_1), M_n(s_{-1}, g_{-1}, x)) + \alpha \cdot \int_{I^n} \iota^n(di) \times v_{n,t-1}(\theta(s_1, y(s_1, g_1), M_n(s_{-1}, g_{-1}, x), i_1), [T_n(x, g, i) \circ \varepsilon(s)]_{-1}, x, x)\right\}, \qquad (33)$$

where $[T_n(x, g, i) \circ \varepsilon(s)]_{-1}$ stands for $\varepsilon(s'_{-1})$, such that $\varepsilon(s') = T_n(x, g, i) \circ \varepsilon(s)$. Using (32) and (33), we can inductively show that

$$|v_{n,t+1}(s_1, \varepsilon(s_{-1}), x, y) - v_{nt}(s_1, \varepsilon(s_{-1}), x, y)| \leq \alpha^t \cdot \bar{\psi}. \qquad (34)$$

Thus, the sequence $\{v_{nt}(s_1, \varepsilon(s_{-1}), x, y) \mid t = 0, 1, \ldots\}$ is Cauchy with limit $v_{n\infty}(s_1, \varepsilon(s_{-1}), x, y)$.

We make the following assumptions, which are $t$-independent versions of (S1) to (F2):

(S1-s) For every $\mu \in \mathcal{P}(S \times X)$, the function $\theta(\cdot, \cdot, \mu, \cdot)$ is a member of $\mathcal{K}(S \times X, I, \iota, S)$.

(S2-s) Not only is it true that $\theta(s, x, \cdot, \cdot) \in \mathcal{K}(\mathcal{P}(S \times X), I, \iota, S)$ at every $(s, x) \in S \times X$, but the continuity is also achieved at a rate independent of the $(s, x)$ present.

(F1-s) The function $\psi(s, x, \mu)$ is continuous in $(s, x)$.

(F2-s) The function $\psi(s, x, \mu)$ is continuous in $\mu$ at an $(s, x)$-independent rate.

Here comes our main result for the stationary case.

Theorem 3. Suppose $x^* \in \mathcal{K}(S, G, \gamma, X)$ is a probabilistically continuous Markov equilibrium for the nonatomic game $\Gamma$. Let $\sigma^* \in \mathcal{P}(S)$ be associated with $x^*$ in the fashion of (26). Then, for any $\epsilon > 0$ and large enough $n \in \mathbb{N}$, for any $y \in \mathcal{M}(S \times G, X)$,

$$\int_{S^n} v_{n\infty}(s_1, \varepsilon(s_{-1}), x^*, x^*) \cdot (\sigma^*)^n(ds) \geq \int_{S^n} v_{n\infty}(s_1, \varepsilon(s_{-1}), x^*, y) \cdot (\sigma^*)^n(ds) - \epsilon.$$

Theorem 3 is proved in Appendix C. It states that players in a large finite game will not regret much by keeping on adopting a stationary equilibrium of the corresponding nonatomic game. The regret is measured in an average sense, where the underlying invariant state distribution for measuring the "average" is part of the NG equilibrium. So players can fare well by responding to their individual states in the same fashion indefinitely.
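The geometric bounds (29) and (34) also quantify how long a truncated horizon must be: summing the tail gives $|v_\infty - v_t| \leq \alpha^t \bar{\psi} / (1 - \alpha)$, so the horizon computed below suffices for any target accuracy. The numerical values in the example are arbitrary.

```python
import math

def truncation_horizon(alpha, psi_bar, eps):
    """Smallest t with alpha^t * psi_bar / (1 - alpha) <= eps, a direct
    consequence of the Cauchy bounds (29)/(34)."""
    assert 0 < alpha < 1 and psi_bar > 0 and eps > 0
    t = math.log(eps * (1 - alpha) / psi_bar) / math.log(alpha)
    return max(0, math.ceil(t))

print(truncation_horizon(alpha=0.9, psi_bar=1.0, eps=1e-3))  # -> 88
```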
8 Further Discussion

Using this paper's language and notation, we offer a comparison with the most relevant papers. Within the discrete-time framework, while not considering atomic players or players' entries and exits, we have arguably worked with the most general setup.

Both Weintraub et al. [28] and Weintraub, Benkard, and van Roy [30] treated competing firms on a common market as players. They allowed for the entry and exit of firms, and accounted for the effect of the firm density $c$ per unit market size. Roughly speaking, their regular payoff is of the form $\psi^1(s, c \cdot \mu|_S) - \psi^2(x)$, where $\mu|_S$ stands for the marginal state distribution derivable from the joint state-action distribution $\mu$. Also, firms' state transitions are governed by a certain $\theta(s, x, i)$ that is independent of the environment $\mu$.

Weintraub et al. [28] arrived at something akin to our Theorem 2. Meanwhile, Weintraub, Benkard, and van Roy [30] found a stationary policy of the form $x(s)$ to suffice for the NG limit. It was considered oblivious because of firms' ability to ignore the industry state $c \cdot \mu|_S$. When there are few dominant firms, an NG equilibrium was shown to work increasingly well for larger finite models. This is close in spirit to our Theorem 3. We note that $\theta$'s independence of $\mu$ helped greatly with their derivations. While free from the task of dealing with entry, exit, or the impacts of market size and the number of firms, we have allowed players' state transitions to be profoundly impacted by the environment that their collective states and actions fabricate. Namely, our $\theta_t$ can depend on $\mu$ in virtually arbitrary fashions.

Huang, Malhame, and Caines [15] dealt with continuous-time games with the state space $S$ equal to the real line $\Re$. These games' discrete-time counterparts can be obtained by replacing their Brownian motions with symmetric random walks. In particular, we can let the post-action shock space $I$ be $\{-1, +1\}$, let the probability $\iota$ place weight one half on each of $-1$ and $+1$, and let

$$\theta_t(s, x, \mu, i) = \int_\Re \theta^0(s, x, s') \cdot \mu|_S(ds') + \bar{s} \cdot i, \qquad (35)$$

where $\theta^0$ is a function from $\Re \times X \times \Re$ to $\Re$ and $\bar{s}$ is a constant. So there, only the state-distribution portion of the joint state-action distribution $\mu$ of other firms affects the current firm's state transition; its impact is also felt in an average sense; moreover, the effect of the random shock is additive. Their one-period payoff function can be understood as

$$\psi_t(s, x, \mu) = \int_\Re \psi^0(s, x, s') \cdot \mu|_S(ds'), \qquad (36)$$

where $\psi^0$ is a function from $\Re \times X \times \Re$ to $\Re$. Artificial randomization in decision making turns out to be unnecessary there: NG equilibria can be found in the form $x_t(s)$ rather than the more general $x_t(s, g)$. We, on the other hand, believe that allowing other players' actions to play a role in both state transitions and one-period payoffs can greatly enhance the relevant models' applicability. In the competitive pricing situation, for instance, the demand level experienced by a firm is perturbable by the prices charged by other firms. It in turn influences not only the firm's present profitability but also its future inventory levels.

As can be seen from equivalence results such as Aumann [6] (Lemma F), using pre-action shocks $g$ and post-action shocks $i$ permits us to effectively deal with both random action plans and random state transitions. These were indeed treated by Yang [32] in an alternative transition-probability formulation, with each $\chi_t(s)$ there effectively the pushforward $\gamma \cdot x_t(s, \cdot)^{-1}$ here, and each $\tilde{g}_t(s, x, \mu)$ there effectively $\iota \cdot \theta_t(s, x, \mu, \cdot)^{-1}$ here.
Due to its need to sample from joint probabilities of the non-product type, however, the earlier work found it necessary to assume discrete state and action spaces. This restriction is removed here through the exploitation of independently generated shocks and tools pertinent to the tightness of probabilities. The latter only requires the current spaces $S$, $G$, and $I$ to be complete.

We can also apply our results to a dynamic pricing game participated in by heterogeneous firms. Since the random demand arrival process is influenced by the prices charged by all firms and leftover items are stored for future sales, the finite-player version of this problem is virtually intractable. The usefulness of the transient result Theorem 2 is thus on full display. To the stationary case also involving production, the stationary result Theorem 3 can further be applied. Moreover, depending on which portion of the outside environment is observable, whether it be merely other firms' prices or both their prices and inventory levels, there can be different versions of the finite game. The NG approximation renders these differences irrelevant. Details are furnished in Yang [33].

9 Conclusion

We have established links between multi-period Markovian games and their NG counterparts. Our focus is the case where the state and action spaces are general metric spaces, and there are independently generated shocks serving as random drivers for decision making and state evolution. In essence, the evolution of player-state distributions in large finite games, though random, resembles in probability the deterministic pathway taken by their NG counterparts. This allows NG equilibria to be well adapted to large finite games.

Still, many dynamic competitive situations not yet covered by existing studies like Huang, Malhame, and Caines [15] are better described by continuous-time models. These will require vastly different techniques to probe. For one thing, the mathematical induction approach we have taken to deal with multiple periods would not seem to go well with a discrete-time approximation of a continuous-time model. In the latter model, even identifying the environment induced by all players adopting a common policy might involve solving a fixed point problem. Therefore, serious challenges will have to be overcome.

Appendices

A Technical Developments in Section 6.1
Given a metric space $A$, the Prohorov metric $\rho_A$ is such that, for any distributions $\pi, \pi' \in \mathcal{P}(A)$,

$$\rho_A(\pi, \pi') = \inf\left\{\epsilon > 0 \mid \pi'((A')^\epsilon) + \epsilon \geq \pi(A'), \text{ for all } A' \in \mathcal{B}(A)\right\}, \qquad (A.1)$$

where

$$(A')^\epsilon = \{a \in A \mid d_A(a, a') < \epsilon \text{ for some } a' \in A'\}. \qquad (A.2)$$

The metric $\rho_A$ is known to generate the weak topology for $\mathcal{P}(A)$.
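On a small finite metric space, (A.1)-(A.2) can even be evaluated by brute force, which may help in building intuition for the lemmas that follow. The sketch scans all events $A'$ and a grid of candidate $\epsilon$ values; it is exponential in the number of points and purely illustrative.

```python
from itertools import combinations

def prohorov(pi, pi2, points, d, grid=10_000):
    """Brute-force Prohorov distance per (A.1)-(A.2) between two distributions
    (dicts point -> mass) on a finite metric space with metric d."""
    events = [set(c) for r in range(1, len(points) + 1)
              for c in combinations(points, r)]
    def blow_up(A_prime, eps):                     # (A')^eps from (A.2)
        return {a for a in points if any(d(a, b) < eps for b in A_prime)}
    def mass(p, event):
        return sum(p[a] for a in event)
    for k in range(1, grid + 1):
        eps = k / grid                             # smallest grid eps satisfying (A.1)
        if all(mass(pi2, blow_up(A, eps)) + eps >= mass(pi, A) for A in events):
            return eps
    return 1.0

pts = [0.0, 1.0]
print(prohorov({0.0: 0.5, 1.0: 0.5}, {0.0: 0.6, 1.0: 0.4}, pts,
               lambda a, b: abs(a - b)))           # -> 0.1
```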
According to Parthasarathy [21] (Theorem II.7.1), the strong LLN applies to the empirical distribution under the weak topology, and hence under the Prohorov metric. In the following, we state its weak version.

Lemma 1. Given separable metric spaces $A$ and $B$, suppose distribution $\pi_A \in \mathcal{P}(A)$ and measurable mapping $y \in \mathcal{M}(A, B)$. Then, for any $\epsilon > 0$, as long as $n$ is large enough,

$$(\pi_A)^n\left(\left\{a = (a_1, \ldots, a_n) \in A^n \mid \rho_B(\varepsilon(a) \cdot y^{-1}, \pi_A \cdot y^{-1}) < \epsilon\right\}\right) > 1 - \epsilon.$$

For a separable metric space $A$, a point $a \in A$, and an $(n-1)$-sample empirical distribution $\pi \in \mathcal{P}_{n-1}(A)$, we use $(a, \pi)_n$ to represent the member of $\mathcal{P}_n(A)$ that places an additional $1/n$ weight on the point $a$, with the probability masses in $\pi$ reduced to $(n-1)/n$ times their original values. For $a \in A^n$ and $m = 1, \ldots, n$, we have $(a_m, \varepsilon(a_{-m}))_n = \varepsilon(a)$. Concerning the Prohorov metric, we also have a simple but useful observation.
Lemma 2. Let $A$ be a separable metric space. Then, for any $n = 2, 3, \ldots$, $a \in A$, and $\pi \in \mathcal{P}_{n-1}(A)$, we have $\rho_A((a, \pi)_n, \pi) \leq 1/n$.

Proof.
Let $A' \in \mathcal{B}(A)$ be chosen. If $a \notin A'$, then

$$(a, \pi)_n(A') \leq \pi(A') \leq (a, \pi)_n(A') + \frac{1}{n}; \qquad (A.3)$$

if $a \in A'$, then

$$(a, \pi)_n(A') - \frac{1}{n} \leq \pi(A') \leq (a, \pi)_n(A'). \qquad (A.4)$$

Hence, it is always true that

$$|(a, \pi)_n(A') - \pi(A')| \leq \frac{1}{n}. \qquad (A.5)$$

In view of (A.1) and (A.2), we have

$$\rho_A((a, \pi)_n, \pi) \leq \frac{1}{n}. \qquad (A.6)$$

We have thus completed the proof.

The following result is important for showing the near-trajectory evolution of aggregate environments in large multi-period games.
Lemma 3. Given a separable metric space $A$ and complete separable metric spaces $B$ and $C$, suppose $y^n \in \mathcal{M}(A^n, B^n)$ for every $n \in \mathbb{N}$, $\pi_A \in \mathcal{P}(A)$, $\pi_B \in \mathcal{P}(B)$, and $\pi_C \in \mathcal{P}(C)$. If

$$(\pi_A)^n(\{a \in A^n \mid \rho_B(\varepsilon(y^n(a)), \pi_B) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough, then

$$(\pi_A \times \pi_C)^n(\{(a, c) \in (A \times C)^n \mid \rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough.

Proof.
Suppose the sequence $\{\pi'_{B1}, \pi'_{B2}, \ldots\}$ weakly converges to the given probability measure $\pi_B$, and the sequence $\{\pi'_{C1}, \pi'_{C2}, \ldots\}$ weakly converges to the given probability measure $\pi_C$. We are to show that the sequence $\{\pi'_{B1} \times \pi'_{C1}, \pi'_{B2} \times \pi'_{C2}, \ldots\}$ weakly converges to $\pi_B \times \pi_C$.

Let $F(B)$ denote the family of uniformly continuous real-valued functions on $B$ with bounded support. Let $F(C)$ be similarly defined for $C$. We certainly have

$$\lim_{k \to +\infty} \int_B f(b) \cdot \pi'_{Bk}(db) = \int_B f(b) \cdot \pi_B(db), \quad \forall f \in F(B), \quad \text{and} \quad \lim_{k \to +\infty} \int_C f(c) \cdot \pi'_{Ck}(dc) = \int_C f(c) \cdot \pi_C(dc), \quad \forall f \in F(C). \qquad (A.7)$$

Define $F$ so that

$$F = \{f \mid f(b, c) = f_B(b) \cdot f_C(c) \text{ for any } (b, c) \in B \times C, \text{ where } f_B \in F(B) \cup \{\mathbf{1}\} \text{ and } f_C \in F(C) \cup \{\mathbf{1}\}\}, \qquad (A.8)$$

where $\mathbf{1}$ stands for the function whose value is 1 everywhere. By (A.7) and (A.8),

$$\lim_{k \to +\infty} \int_{B \times C} f(b, c) \cdot (\pi'_{Bk} \times \pi'_{Ck})(d(b, c)) = \int_{B \times C} f(b, c) \cdot (\pi_B \times \pi_C)(d(b, c)). \qquad (A.9)$$

According to Ethier and Kurtz [10] (Proposition III.4.4), $F(B)$ and $F(C)$ happen to be convergence determining families for $\mathcal{P}(B)$ and $\mathcal{P}(C)$, respectively. As $B$ and $C$ are complete, Ethier and Kurtz ([10], Proposition III.4.6, whose proof involves Prohorov's Theorem, i.e., the equivalence between tightness and relative compactness of a collection of probability measures defined on complete separable metric spaces) further state that $F$ as defined through (A.8) is convergence determining for $\mathcal{P}(B \times C)$. Therefore, we have the desired weak convergence by (A.9).
Let $\epsilon > 0$ be given. By the weak convergence just established, there exist $\delta_B > 0$ and $\delta_C > 0$, such that $\rho_B(\pi'_B, \pi_B) < \delta_B$ and $\rho_C(\pi'_C, \pi_C) < \delta_C$ will imply

$$\rho_{B \times C}(\pi'_B \times \pi'_C, \pi_B \times \pi_C) < \epsilon. \qquad (A.10)$$

By (A.1) and the given hypothesis, there is $\bar{n}_1 \in \mathbb{N}$, so that for $n = \bar{n}_1, \bar{n}_1 + 1, \ldots$,

$$(\pi_A)^n(\tilde{A}^n) > 1 - \frac{\epsilon}{2}, \qquad (A.11)$$

where $\tilde{A}^n$ contains all $a \in A^n$ such that

$$\rho_B(\varepsilon(y^n(a)), \pi_B) < \delta_B. \qquad (A.12)$$

By (A.1) and Lemma 1, on the other hand, there is $\bar{n}_2 \in \mathbb{N}$, so that for $n = \bar{n}_2, \bar{n}_2 + 1, \ldots$,

$$(\pi_C)^n(\tilde{C}^n) > 1 - \frac{\epsilon}{2}, \qquad (A.13)$$

where $\tilde{C}^n$ contains all $c \in C^n$ such that

$$\rho_C(\varepsilon(c), \pi_C) < \delta_C. \qquad (A.14)$$

For any $n = \bar{n}_1 \vee \bar{n}_2, \bar{n}_1 \vee \bar{n}_2 + 1, \ldots$, let $(a, c)$ be an arbitrary member of $\tilde{A}^n \times \tilde{C}^n$. We have from (A.10), (A.12), and (A.14) that

$$\rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \epsilon. \qquad (A.15)$$

Noting that the facilitating $(a, c)$ is but an arbitrary member of $\tilde{A}^n \times \tilde{C}^n$, we see that

$$(\pi_A \times \pi_C)^n(\{(a, c) \in (A \times C)^n \mid \rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \epsilon\}) \geq (\pi_A)^n(\tilde{A}^n) \times (\pi_C)^n(\tilde{C}^n), \qquad (A.16)$$

which, by (A.11) and (A.13), is greater than $1 - \epsilon$.

Because the equivalence between tightness and relative compactness of a collection of probability measures is indirectly related to the proof of Lemma 3, we require $B$ and $C$ to be complete separable metric spaces.
Lemma 4. Given separable metric spaces $A$, $B$, $C$, and $D$, as well as distributions $\pi_A \in \mathcal{P}(A)$, $\pi_B \in \mathcal{P}(B)$, and $\pi_C \in \mathcal{P}(C)$, suppose $y^n \in \mathcal{M}(A^n, B^n)$ for every $n \in \mathbb{N}$ and $z \in \mathcal{K}(B, C, \pi_C, D)$. If

$$(\pi_A \times \pi_C)^n(\{(a, c) \in A^n \times C^n \mid \rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough, then

$$(\pi_A \times \pi_C)^n\left(\left\{(a, c) \in A^n \times C^n \mid \rho_D(\varepsilon(y^n(a), c) \cdot z^{-1}, (\pi_B \times \pi_C) \cdot z^{-1}) < \epsilon\right\}\right) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough.

Proof.
Let $\epsilon > 0$ be given. Since $z \in \mathcal{K}(B, C, \pi_C, D)$, there exist $C' \in \mathcal{B}(C)$ satisfying

$$\pi_C(C') > 1 - \frac{\epsilon}{2}, \qquad (A.17)$$

as well as

$$\delta \in \left(0, \frac{\epsilon}{2}\right], \qquad (A.18)$$

such that for any $b, b' \in B$ satisfying $d_B(b, b') < \delta$ and any $c \in C'$,

$$d_D(z(b, c), z(b', c)) < \epsilon. \qquad (A.19)$$

For any subset $D' \in \mathcal{B}(D)$, we therefore have

$$(z^{-1}(D'))^\delta \cap (B \times C') \subseteq z^{-1}((D')^\epsilon). \qquad (A.20)$$

This leads to $(z^{-1}(D'))^\delta \setminus (B \times (C \setminus C')) \subseteq z^{-1}((D')^\epsilon)$, and hence, due to (A.17),

$$(\pi_B \times \pi_C)\left(z^{-1}((D')^\epsilon)\right) \geq (\pi_B \times \pi_C)\left((z^{-1}(D'))^\delta\right) - \frac{\epsilon}{2}. \qquad (A.21)$$

On the other hand, by the hypothesis, we know for $n$ large enough,

$$(\pi_A \times \pi_C)^n(E'_n) > 1 - \delta, \qquad (A.22)$$

where

$$E'_n = \{(a, c) \in A^n \times C^n \mid \rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \delta\} \in \mathcal{B}^n(A \times C). \qquad (A.23)$$

By (A.23), for any $(a, c) \in E'_n$ and $F' \in \mathcal{B}(B \times C)$,

$$(\pi_B \times \pi_C)((F')^\delta) \geq [\varepsilon(y^n(a), c)](F') - \delta. \qquad (A.24)$$

Combining the above, we have, for any $(a, c) \in E'_n$ and $D' \in \mathcal{B}(D)$,

$$[(\pi_B \times \pi_C) \cdot z^{-1}]((D')^\epsilon) = (\pi_B \times \pi_C)(z^{-1}((D')^\epsilon)) \geq (\pi_B \times \pi_C)((z^{-1}(D'))^\delta) - \epsilon/2 \geq [\varepsilon(y^n(a), c)](z^{-1}(D')) - \delta - \epsilon/2 \geq [\varepsilon(y^n(a), c)](z^{-1}(D')) - \epsilon = ([\varepsilon(y^n(a), c)] \cdot z^{-1})(D') - \epsilon, \qquad (A.25)$$

where the first inequality is due to (A.21), the second inequality is due to (A.24), and the third inequality is due to (A.18). That is, we have

$$\rho_D\left(\varepsilon(y^n(a), c) \cdot z^{-1}, (\pi_B \times \pi_C) \cdot z^{-1}\right) \leq \epsilon, \quad \forall (a, c) \in E'_n. \qquad (A.26)$$

In view of (A.18) and (A.22), we have the desired result.

We can now prove Proposition 1 and then Theorem 1.
Proof of Proposition 1. Let $t = 1, \ldots, \bar{t}$ and $x \in \mathcal{K}(S, G, \gamma, X)$ be given. Define the map $z \in \mathcal{M}(S \times G \times I, S)$, so that

$$z(s, g, i) = \theta_t(s, x(s, g), M(\sigma, x), i), \quad \forall s \in S, g \in G, i \in I. \qquad (A.27)$$

In view of (7) and (A.27), we have, for any $S' \in \mathcal{B}(S)$,

$$[T_t(x) \circ \sigma](S') = \int_S \int_G \int_I \mathbf{1}(z(s, g, i) \in S') \cdot \iota(di) \cdot \gamma(dg) \cdot \sigma(ds) = (\sigma \times \gamma \times \iota)(\{(s, g, i) \in S \times G \times I \mid z(s, g, i) \in S'\}) = (\sigma \times \gamma \times \iota)(z^{-1}(S')). \qquad (A.28)$$

For $n \in \mathbb{N}$, $g = (g_1, \ldots, g_n) \in G^n$, and $i = (i_1, \ldots, i_n) \in I^n$, also define the operator $T'_n(g, i)$ on $\mathcal{P}_n(S)$ so that $T'_n(g, i) \circ \varepsilon(s) = \varepsilon(s')$, where for $m = 1, 2, \ldots, n$,

$$s'_m = z(s_m, g_m, i_m) = \theta_t(s_m, x(s_m, g_m), M(\sigma, x), i_m). \qquad (A.29)$$

It is worth noting that (A.29) is different from the earlier (16). In view of (A.27) and (A.29), we have, for $S' \in \mathcal{B}(S)$, that $[T'_n(g, i) \circ \varepsilon(s)](S')$ equals

$$\frac{1}{n} \cdot \sum_{m=1}^n \mathbf{1}(z(s_m, g_m, i_m) \in S') = \varepsilon((s_1, g_1, i_1), \ldots, (s_n, g_n, i_n))\left(z^{-1}(S')\right). \qquad (A.30)$$

Combining (A.28) and (A.30), we arrive at the key observation that

$$T_t(x) \circ \sigma = (\sigma \times \gamma \times \iota) \cdot z^{-1}, \quad \text{while} \quad T'_n(g, i) \circ \varepsilon(s) = \varepsilon(s, g, i) \cdot z^{-1}. \qquad (A.31)$$

In the rest of the proof, we first show the asymptotic closeness between $T_t(x) \circ \sigma$ and $T'_n(g, i) \circ \varepsilon(s^n(a))$, and then that between the latter and $T_{nt}(x, g, i) \circ \varepsilon(s^n(a))$.

First, due to the hypothesis on the convergence of $\varepsilon(s^n(a))$ to $\sigma$, the completeness of the spaces $S$, $G$, and $I$ (and hence also the completeness of $G \times I$), as well as Lemma 3,

$$(\pi \times \gamma \times \iota)^n(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_{S \times G \times I}(\varepsilon(s^n(a), g, i), \sigma \times \gamma \times \iota) < \epsilon'\}) > 1 - \epsilon', \qquad (A.32)$$

for any $\epsilon' > 0$ and $n$ large enough. By (S1) and the fact that $x \in \mathcal{K}(S, G, \gamma, X)$, we may see that $z$ as defined through (A.27) is a member of $\mathcal{K}(S, G \times I, \gamma \times \iota, S)$. By Lemma 4, this fact along with (A.32) will lead to the strict dominance of $1 - \epsilon'$ by

$$(\pi \times \gamma \times \iota)^n(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_S(\varepsilon(s^n(a), g, i) \cdot z^{-1}, (\sigma \times \gamma \times \iota) \cdot z^{-1}) < \epsilon'\}), \qquad (A.33)$$

for any $\epsilon' > 0$ and $n$ large enough. By (A.31), this is equivalent to the statement that, given $\epsilon > 0$, there is $\bar{n}_1 \in \mathbb{N}$ so that for any $n = \bar{n}_1, \bar{n}_1 + 1, \ldots$,

$$(\pi \times \gamma \times \iota)^n\left(\tilde{A}^n(\epsilon)\right) > 1 - \frac{\epsilon}{3}, \qquad (A.34)$$

where $\tilde{A}^n(\epsilon) \in \mathcal{B}^n(A \times G \times I)$ is equal to

$$\left\{(a, g, i) \in (A \times G \times I)^n \mid \rho_S(T_t(x) \circ \sigma, T'_n(g, i) \circ \varepsilon(s^n(a))) < \frac{\epsilon}{2}\right\}. \qquad (A.35)$$

Next, note that the only difference between $T_{nt}(x, g, i) \circ \varepsilon(s^n(a))$ and $T'_n(g, i) \circ \varepsilon(s^n(a))$ lies in that $\varepsilon(s^{n,-m}(a), g_{-m})$ is used in the former, as in (16), whereas $\sigma \times \gamma$ is used in the latter, as in (A.29). Here, $s^{n,-m}(a)$ refers to the vector $(s^n_1(a), \ldots, s^n_{m-1}(a), s^n_{m+1}(a), \ldots, s^n_n(a))$.
and $I' \in \mathcal{B}(I)$ with

$$\iota(I') > 1 - \epsilon/4, \tag{A.36}$$

so that for any $(s, g, i) \in S \times G \times I'$ and any $\mu' \in \mathcal{P}(S \times X)$ satisfying $\rho_{S \times X}(M(\sigma, x), \mu') < \delta$,

$$d_S\big(\theta_t(s, x(s, g), M(\sigma, x), i), \theta_t(s, x(s, g), \mu', i)\big) < \epsilon/4. \tag{A.37}$$

For each $n \in \mathbb{N}$, define $I'_n$ so that

$$I'_n = \Big\{i = (i_1, \ldots, i_n) \in I^n \mid \text{more than } \Big(1 - \frac{\epsilon}{2}\Big) \cdot n \text{ components come from } I'\Big\}. \tag{A.38}$$

Also important is that by (A.37) and (A.38), for any $S' \in \mathcal{B}(S)$ and $i = (i_1, \ldots, i_n) \in I'_n$,

$$[T^n_t(x, g, i) \circ \varepsilon(s^n(a))]\big((S')^{\epsilon/2}\big) + \epsilon/2 \ge [T'_n(g, i) \circ \varepsilon(s^n(a))](S'), \tag{A.39}$$

whenever

$$\rho_{S \times X}\big(M(\sigma, x), M^n(s^{n,-m}(a), g_{-m}, x)\big) < \delta, \qquad m = 1, \ldots, n. \tag{A.40}$$

It can be shown that $I'_n$ will occupy a big chunk of $I^n$ as measured by $\iota^n$ when $n$ is large. Define map $q$ from $I$ to $\{0, 1\}$ so that $q(i) = 1$ or $0$ depending on whether or not $i \in I'$. By (A.36), $\iota \cdot q^{-1}$ is a Bernoulli distribution with $(\iota \cdot q^{-1})(\{1\}) > 1 - \epsilon/4$.
So by (A.38), $I'_n$ contains all $i = (i_1, \ldots, i_n) \in I^n$ that satisfy

$$\rho_{\{0,1\}}\big(\varepsilon(i) \cdot q^{-1}, \iota \cdot q^{-1}\big) < \epsilon/4. \tag{A.41}$$

Therefore, by Lemma 1, there exists $\bar n_2 \in \mathbb{N}$, so that for $n = \bar n_2, \bar n_2 + 1, \ldots$,

$$\iota^n(I'_n) > 1 - \epsilon/3. \tag{A.42}$$

We can also demonstrate that (A.40) will be highly likely when $n$ is large. By Lemma 3 and the hypothesis on the convergence of $\varepsilon(s^n(a))$ to $\sigma$, we know $\varepsilon(s^n(a), g)$ will converge to $\sigma \times \gamma$ in probability. Due to Lemma 2, this conclusion applies to the sequence $\varepsilon(s^{n,-m}(a), g_{-m})$ as well. The fact that $x \in K(S, G, \gamma, X)$ certainly leads to $(\mathrm{prj}_S, x) \in K(S, G, \gamma, S \times X)$. So by Lemma 4, there is $\bar n_3 \in \mathbb{N}$, so that for $n = \bar n_3, \bar n_3 + 1, \ldots$,

$$(\pi^n \times \gamma^n)\big(\tilde B^n(\delta)\big) > 1 - \epsilon/3, \tag{A.43}$$

where

$$\tilde B^n(\delta) = \{(a, g) \in A^n \times G^n \mid \text{(A.40) is true}\} \in \mathcal{B}^n(A \times G). \tag{A.44}$$

Consider arbitrary $n = \bar n_1 \vee \bar n_2 \vee \bar n_3, \bar n_1 \vee \bar n_2 \vee \bar n_3 + 1, \ldots$, $(a, g, i) \in \tilde A^n(\epsilon) \cap (\tilde B^n(\delta) \times I'_n)$, and $S' \in \mathcal{B}(S)$. By (A.1) and (A.35), we see that

$$[T'_n(g, i) \circ \varepsilon(s^n(a))]\big((S')^{\epsilon/2}\big) + \epsilon/2 \ge [T_t(x) \circ \sigma](S'). \tag{A.45}$$

Combining this with (A.39), (A.40), and (A.44), we obtain

$$[T^n_t(x, g, i) \circ \varepsilon(s^n(a))]\big((S')^\epsilon\big) + \epsilon \ge [T'_n(g, i) \circ \varepsilon(s^n(a))]\big((S')^{\epsilon/2}\big) + \epsilon/2 \ge [T_t(x) \circ \sigma](S'). \tag{A.46}$$

According to (A.1), this means

$$\rho_S\big(T^n_t(x, g, i) \circ \varepsilon(s^n(a)), T_t(x) \circ \sigma\big) \le \epsilon. \tag{A.47}$$

Therefore, for $n \ge \bar n_1 \vee \bar n_2 \vee \bar n_3$,

$$(\pi \times \gamma \times \iota)^n\big(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_S\big(T^n_t(x, g, i) \circ \varepsilon(s^n(a)), T_t(x) \circ \sigma\big) \le \epsilon\}\big) \ge (\pi \times \gamma \times \iota)^n\big(\tilde A^n(\epsilon) \cap (\tilde B^n(\delta) \times I'_n)\big), \tag{A.48}$$

whereas the latter is, in view of (A.34), (A.42), and (A.43), greater than $1 - \epsilon$.
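To make the two transition operators in this proof concrete, the following sketch simulates one period of a hypothetical one-dimensional specification; none of the functional forms below come from the model, and the population distribution entering $\theta_t$ is summarized by its mean action for simplicity. The operator $T'_n$ lets every player transition against the limiting aggregate $M(\sigma, x)$, while the finite-game operator $T^n_t$ uses each player's leave-one-out empirical aggregate $M^n(s^{n,-m}, g_{-m}, x)$.

```python
import numpy as np

rng = np.random.default_rng(1)

def theta(s, a, mu_mean, i):
    # Hypothetical transition kernel theta_t: drifts toward the mean
    # action of the population distribution mu (summarized by mu_mean).
    return 0.9 * s + 0.1 * (a + mu_mean) + 0.05 * i

def x(s, g):
    # Hypothetical observation-blind action plan x_t(s, g).
    return np.tanh(s) + 0.1 * g

n = 5_000
s = rng.normal(size=n)          # current states s^n
g = rng.normal(size=n)          # action shocks, one per player
i = rng.normal(size=n)          # transition shocks, one per player
a = x(s, g)                     # actions under the plan x

# NG-style operator T'_n: every player faces the limiting mean action,
# approximated here by a large auxiliary sample.
S_big, G_big = rng.normal(size=10**6), rng.normal(size=10**6)
m_limit = x(S_big, G_big).mean()
s_next_prime = theta(s, a, m_limit, i)

# Finite-game operator T^n_t: player m faces the empirical mean action
# of the other n - 1 players.
m_emp = (a.sum() - a) / (n - 1)
s_next_finite = theta(s, a, m_emp, i)

# Proposition 1 says the resulting empirical distributions are close;
# in this toy the states even agree player by player up to O(1/sqrt(n)).
print(np.abs(s_next_prime - s_next_finite).max())
```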
Proof of Theorem 1: We use induction to show that, for each $\tau = 0, 1, \ldots, \bar t - t + 1$,

$$(\sigma_t \times \gamma^\tau \times \iota^\tau)^n\big(\tilde A^n_\tau(\epsilon)\big) > 1 - \frac{\epsilon}{\bar t - t + 2}, \tag{A.49}$$

for any $\epsilon > 0$ and $n$ large enough, where $\tilde A^n_\tau(\epsilon) \in \mathcal{B}^n(S \times G^\tau \times I^\tau)$ is such that, for any $(s_t, g_{[t,t+\tau-1]}, i_{[t,t+\tau-1]}) \in \tilde A^n_\tau(\epsilon)$,

$$\rho_S\big(T^{n,[t,t+\tau-1]}(x_{[t,t+\tau-1]}, g_{[t,t+\tau-1]}, i_{[t,t+\tau-1]}) \circ \varepsilon(s_t),\ T^{[t,t+\tau-1]}(x_{[t,t+\tau-1]}) \circ \sigma_t\big) < \epsilon. \tag{A.50}$$

Once the above is achieved, we can then define $\tilde A^n(\epsilon)$ required in the theorem by

$$\tilde A^n(\epsilon) = \bigcap_{\tau=0}^{\bar t - t + 1} \big[\tilde A^n_\tau(\epsilon) \times G^{n \cdot (\bar t - t + 1 - \tau)} \times I^{n \cdot (\bar t - t + 1 - \tau)}\big]. \tag{A.51}$$

This and (A.49) will lead to

$$\big(\sigma_t \times \gamma^{\bar t - t + 1} \times \iota^{\bar t - t + 1}\big)^n\big(\tilde A^n(\epsilon)\big) > \Big(1 - \frac{\epsilon}{\bar t - t + 2}\Big)^{\bar t - t + 2} > 1 - \epsilon, \tag{A.52}$$

for any $\epsilon > 0$ and $n$ large enough.

Now we proceed with the induction process. First, note that $T^{n,[t,t-1]} \circ \varepsilon(s_t)$ is merely $\varepsilon(s_t)$ itself and $T^{[t,t-1]} \circ \sigma_t$ is merely $\sigma_t$ itself. Hence, we will have (A.49) for $\tau = 0$ for any $\epsilon > 0$ and $n$ large enough just by Lemma 1. Then, for some $\tau = 1, 2, \ldots, \bar t - t + 1$, suppose

$$\big(\sigma_t \times \gamma^{\tau-1} \times \iota^{\tau-1}\big)^n\big(\tilde A^n_{\tau-1}(\epsilon)\big) > 1 - \frac{\epsilon}{\bar t - t + 2}, \tag{A.53}$$

for any $\epsilon > 0$ and $n$ large enough. We may apply Proposition 1 to the above, while at the same time identifying $S \times G^{\tau-1} \times I^{\tau-1}$ with $A$, $\sigma_t \times \gamma^{\tau-1} \times \iota^{\tau-1}$ with $\pi$, $x_{t+\tau-1}$ with $x$, $T^{n,[t,t+\tau-2]}(x_{[t,t+\tau-2]}, g_{[t,t+\tau-2]}, i_{[t,t+\tau-2]}) \circ \varepsilon(s_t)$ with $\varepsilon(s^n(a))$, and $T^{[t,t+\tau-2]}(x_{[t,t+\tau-2]}) \circ \sigma_t$ with $\sigma$. This way, we will verify (A.49) for any $\epsilon > 0$ and $n$ large enough. Therefore, the induction process can be completed.
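The last inequality in (A.52) is an instance of Bernoulli's inequality: writing $k = \bar t - t + 2 \ge 2$ and noting $0 < \epsilon/k < 1$ for the relevant $\epsilon$,

$$\Big(1 - \frac{\epsilon}{k}\Big)^{k} \ge 1 - k \cdot \frac{\epsilon}{k} = 1 - \epsilon,$$

with the inequality strict since $k \ge 2$ and $\epsilon > 0$.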
B Technical Developments in Section 6.2

Proof of Proposition 2:
Because payoff functions are bounded, the value functions are bounded too. We then prove by induction on $t$. By (11), we know the result is true for $t = \bar t + 1$.
Suppose for some $t = \bar t, \bar t - 1, \ldots, 2$, we have the continuity of $v_{t+1}(s_{t+1}, \sigma_{t+1}, x_{[t+1,\bar t]}, x_{t+1})$ in $s_{t+1}$. By this induction hypothesis, the probabilistic continuity of $x_t$, (S1), (F1), and the boundedness of the value functions, we see the continuity of the right-hand side of (12) in $s_t$. So, $v_t(s_t, \sigma_t, x_{[t,\bar t]}, x_t)$ is continuous in $s_t$, and we have completed our induction process.
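The backward recursion that drives this induction can be sketched numerically. The following toy implementation of the pattern in (11) and (12) uses hypothetical one-dimensional primitives, with the population term in $\theta_t$ frozen and the double integral over $(g, i)$ approximated by paired Monte Carlo nodes; it is meant only to show the mechanics of the recursion, with the continuity of each $v_t$ in the state visible in the smooth grid values.

```python
import numpy as np

rng = np.random.default_rng(2)
S_GRID = np.linspace(-2.0, 2.0, 201)     # state grid for interpolation
G_NODES = rng.normal(size=500)           # Monte Carlo nodes for gamma
I_NODES = rng.normal(size=500)           # Monte Carlo nodes for iota

def psi(s, a):        # hypothetical bounded per-period payoff psi_t
    return np.tanh(s * a)

def theta(s, a, i):   # hypothetical continuous transition theta_t
    return np.clip(0.8 * s + 0.2 * a + 0.1 * i, -2.0, 2.0)

def x(s, g):          # hypothetical observation-blind plan x_t
    return np.tanh(s + 0.2 * g)

t_bar = 5
v = np.zeros_like(S_GRID)                # terminal condition, as in (11)
for _ in range(t_bar):
    # One backward step in the spirit of (12): average the payoff plus
    # continuation value over the shock pairs (g, i).
    s = S_GRID[:, None]
    a = x(s, G_NODES[None, :])
    s_next = theta(s, a, I_NODES[None, :])
    cont = np.interp(s_next, S_GRID, v)  # continuation value v_{t+1}
    v = (psi(s, a) + cont).mean(axis=1)

print(v[:5])  # v_1 on the first few grid points: varies smoothly in s
```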
Proof of Proposition 3: We prove by induction on $t$. By (11) and (19), we know the result is true for $t = \bar t + 1$.
Suppose for some $t = \bar t, \bar t - 1, \ldots, 2$, we have the convergence of $v^n_{t+1}(s_{t+1,1}, \varepsilon(s^n_{t+1,-1}), x_{[t+1,\bar t]}, x_{t+1})$ to $v_{t+1}(s_{t+1,1}, \sigma_{t+1}, x_{[t+1,\bar t]}, x_{t+1})$ at an $s_{t+1,1}$-independent rate when $s_{t+1,-1} = (s_{t+1,2}, s_{t+1,3}, \ldots)$ is sampled from $\sigma_{t+1}$. Now, suppose $s_{t,-1} = (s_{t2}, s_{t3}, \ldots)$ is sampled from $\sigma_t$. Let also $g = (g_1, g_2, \ldots)$ be generated through sampling on $(G, \mathcal{B}(G), \gamma)$ and $i = (i_1, i_2, \ldots)$ be generated through sampling on $(I, \mathcal{B}(I), \iota)$. In the remainder of the proof, we let $s^n_t = (s_{t1}, s_{t2}, \ldots, s_{tn})$ for any arbitrary $s_{t1} \in S$, $g^n = (g_1, \ldots, g_n)$, and $i^n = (i_1, \ldots, i_n)$.

Due to Lemma 1, $\varepsilon(s^n_{t,-1})$ will converge to $\sigma_t$. By Lemma 2, $\varepsilon(s^n_t)$ will converge to $\sigma_t$ at an $s_{t1}$-independent rate. By Proposition 1, we know that $T^n_t(x_t, g^n, i^n) \circ \varepsilon(s^n_t)$ will converge to $T_t(x_t) \circ \sigma_t$ in probability at an $s_{t1}$-independent rate, and by Lemma 2 again, so will $[T^n_t(x_t, g^n, i^n) \circ \varepsilon(s^n_t)]_{-1}$ to $T_t(x_t) \circ \sigma_t$. Now Lemma 3 will lead to the convergence in probability of $\varepsilon(s^n_{t,-1}, g^n_{-1})$ to $\sigma_t \times \gamma$. Due to $x_t$'s probabilistic continuity, Lemma 4 will lead to the convergence in probability of $M^n(s^n_{t,-1}, g^n_{-1}, x_t)$ to $M(\sigma_t, x_t)$. Thus,

1. $\psi_t(s_{t1}, x_t(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x_t))$ will converge to $\psi_t(s_{t1}, x_t(s_{t1}, g_1), M(\sigma_t, x_t))$ in probability at an $s_{t1}$-independent rate due to (F2);

2. $v^n_{t+1}(\theta_t(s_{t1}, x_t(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x_t), i_1), [T^n_t(x_t, g^n, i^n) \circ \varepsilon(s^n_t)]_{-1}, x_{[t+1,\bar t]}, x_{t+1})$ will converge to $v_{t+1}(\theta_t(s_{t1}, x_t(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x_t), i_1), T_t(x_t) \circ \sigma_t, x_{[t+1,\bar t]}, x_{t+1})$ in probability at an $s_{t1}$-independent rate due to the induction hypothesis; the latter will in turn converge to $v_{t+1}(\theta_t(s_{t1}, x_t(s_{t1}, g_1), M(\sigma_t, x_t), i_1), T_t(x_t) \circ \sigma_t, x_{[t+1,\bar t]}, x_{t+1})$ in probability at an $s_{t1}$-independent rate due to (S2) and Proposition 2.

As per-period payoffs are bounded, all value functions are bounded. The above convergences will then lead to the convergence of the right-hand side of (20) to the right-hand side of (12) at an $s_{t1}$-independent rate. That is, $v^n_t(s_{t1}, \varepsilon(s^n_{t,-1}), x_{[t,\bar t]}, x_t)$ will converge to $v_t(s_{t1}, \sigma_t, x_{[t,\bar t]}, x_t)$ at a rate independent of $s_{t1}$. We have completed the induction process.
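The $s$-independence of the convergence rate is the delicate part of Proposition 3. A one-period toy version can be checked numerically: with a hypothetical payoff $\psi(s, m) = \sin s \cdot \tanh m$ that is Lipschitz in the co-players' aggregate $m$ uniformly in $s$ (the role played by (F2) in the proof above), the worst-case gap over a grid of own states between the finite-$n$ value and its nonatomic limit shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)

def psi(s, m):
    # Hypothetical one-period payoff in own state s and the co-players'
    # aggregate m; Lipschitz in m uniformly in s.
    return np.sin(s) * np.tanh(m)

s_grid = np.linspace(-3.0, 3.0, 13)       # candidate own states
true_m = 0.5                              # population mean under sigma_t
v_limit = psi(s_grid, true_m)             # nonatomic-limit value

for n in [10, 100, 1_000, 10_000]:
    reps = 400
    others = rng.normal(true_m, 1.0, size=(reps, n - 1))
    m_emp = others.mean(axis=1)           # empirical aggregate per replication
    v_n = psi(s_grid[:, None], m_emp[None, :]).mean(axis=1)
    print(n, np.abs(v_n - v_limit).max()) # worst-case gap over s shrinks
```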
Proof of Theorem 2: Let us consider subgames starting with some time $t = 1, 2, \ldots, \bar t$. For convenience, we let $\sigma_t = T^{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1$. Now let $s_t = (s_{t1}, s_{t2}, \ldots)$ be generated through sampling on $(S, \mathcal{B}(S), \sigma_t)$, $g = (g_1, g_2, \ldots)$ be generated through sampling on $(G, \mathcal{B}(G), \gamma)$, and $i = (i_1, i_2, \ldots)$ be generated through sampling on $(I, \mathcal{B}(I), \iota)$. In the remainder of the proof, we let $s^n_t = (s_{t1}, \ldots, s_{tn})$, $s^n_{t,-1} = (s_{t2}, \ldots, s_{tn})$, $g^n = (g_1, \ldots, g_n)$, and $i^n = (i_1, \ldots, i_n)$.

By Lemma 1 and Proposition 1, we know that $\varepsilon(s^n_t) = \varepsilon(s_{t1}, \ldots, s_{tn})$ converges to $\sigma_t$ in probability, and also that $T^n_t(x^*_t, g^n, i^n) \circ \varepsilon(s^n_t)$ converges to $T_t(x^*_t) \circ \sigma_t$ in probability. Due to Lemma 2, $\varepsilon(s^n_{t,-1})$ and $[T^n_t(x^*_t, g^n, i^n) \circ \varepsilon(s^n_t)]_{-1}$ will have the same respective convergences. Also, Lemma 3 will lead to the convergence in probability of $\varepsilon(s^n_{t,-1}, g^n_{-1})$ to $\sigma_t \times \gamma$. Due to $x^*_t$'s probabilistic continuity, Lemma 4 will lead to the convergence in probability of $M^n(s^n_{t,-1}, g^n_{-1}, x^*_t)$ to $M(\sigma_t, x^*_t)$. Then,

1. $\psi_t(s_{t1}, y(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x^*_t))$ will converge to $\psi_t(s_{t1}, y(s_{t1}, g_1), M(\sigma_t, x^*_t))$ in probability at a $y$-independent rate due to (F2);

2. $v^n_{t+1}(\theta_t(s_{t1}, y(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x^*_t), i_1), [T^n_t(x^*_t, g^n, i^n) \circ \varepsilon(s^n_t)]_{-1}, x^*_{[t+1,\bar t]}, x^*_{t+1})$ will converge to $v_{t+1}(\theta_t(s_{t1}, y(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x^*_t), i_1), T_t(x^*_t) \circ \sigma_t, x^*_{[t+1,\bar t]}, x^*_{t+1})$ in probability at a $y$-independent rate due to Proposition 3, which, due to (S2) and Proposition 2, will converge to $v_{t+1}(\theta_t(s_{t1}, y(s_{t1}, g_1), M(\sigma_t, x^*_t), i_1), T_t(x^*_t) \circ \sigma_t, x^*_{[t+1,\bar t]}, x^*_{t+1})$ in probability at a $y$-independent rate.

As per-period payoffs are bounded, all value functions are bounded. By (12) and (20), the above convergences will then lead to the convergence of the left-hand side of (21) to the left-hand side of (14). At the same time, the right-hand side of (21) plus $\epsilon$ will converge to the right-hand side of (14) due to the convergence of $\varepsilon(s^n_{t,-1})$ to $\sigma_t$, Proposition 3, and the uniform boundedness of the value functions. By (14), as long as $n$ is large enough, (21) will be true for any $\epsilon > 0$ and $y \in \mathcal{M}(S \times G, X)$. This would then lead to the final conclusion due to Theorem 1 and the boundedness of payoff functions.
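Theorem 2 says, in effect, that the gain from deviating away from the observation-blind NG plan vanishes as $n$ grows. That phenomenon can be seen in a deliberately simple one-period quadratic game that is not the paper's model: player $m$ earns $-(a_m - s_m - \beta \bar a_{-m})^2$, where $\bar a_{-m}$ is the other players' mean action. In the continuum limit the plan $x^*(s) = s + \beta \mathbb{E}[a]$, with $\mathbb{E}[a] = \mathbb{E}[s]/(1 - \beta)$, is exactly optimal, and its per-player deviation gain in the $n$-player game shrinks like $1/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 0.5            # strength of the aggregate term (hypothetical)
mean_s = 0.0          # population mean state, so E[a] = 0 here

# NG equilibrium plan for the toy game: x*(s) = s + beta * E[a].
x_star = lambda s: s + beta * mean_s / (1 - beta)

for n in [10, 100, 1_000, 10_000]:
    gains = []
    for _ in range(200):                    # replications
        s = rng.normal(mean_s, 1.0, size=n)
        a = x_star(s)
        abar = (a.sum() - a) / (n - 1)      # others' mean action, per player
        # The best deviation attains payoff 0; playing x* instead costs
        # the squared gap, which is the per-player deviation gain.
        gains.append(np.mean((x_star(s) - s - beta * abar) ** 2))
    print(n, np.mean(gains))                # shrinks roughly like 1/n
```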
C Technical Developments in Section 7
Proof of Theorem 3:
Let $\epsilon > 0$ be given. For $t = 1, 2, \ldots$ satisfying $t \ge \ln\big(6\bar\psi/(\epsilon \cdot (1-\alpha))\big)/\ln(1/\alpha) + 1$, we have from (33) and (34),

$$\big|v^n_\infty(s_1, \varepsilon(s_{-1}), x^*, y) - v^n_t(s_1, \varepsilon(s_{-1}), x^*, y)\big| < \epsilon/6. \tag{C.1}$$

Therefore, we need merely to select such a large $t$ and show that, when $n$ is large enough,

$$\int_{S^n} v^n_t(s_1, \varepsilon(s_{-1}), x^*, x^*) \cdot (\sigma^*)^n(ds) \ge \int_{S^n} v^n_t(s_1, \varepsilon(s_{-1}), x^*, y) \cdot (\sigma^*)^n(ds) - \epsilon/2. \tag{C.2}$$

For $t = 1, 2, \ldots$, since $(x^*, \sigma^*)$ forms an equilibrium for $\Gamma$, we know (30) is true. This, as well as (28) and (29), lead to

$$\alpha^{t-\tau} \cdot \Big[\int_S v_\tau(s, \sigma^*, x^*, y) \cdot \sigma^*(ds) - \int_S v_\tau(s, \sigma^*, x^*, x^*) \cdot \sigma^*(ds)\Big] \le \alpha^{t-1} \cdot \frac{\bar\psi}{1-\alpha} \le \frac{\epsilon}{6}, \tag{C.3}$$

for $\tau = 1, 2, \ldots, t$, $g \in G$, $s \in S$, and $y \in \mathcal{M}(S \times G, X)$.

We associate entities here with those defined in Section 4 when the $\bar t$ there is fixed at the $t$ here. To signify the difference in the two notational systems, we add superscript "$K$" to symbols defined for the previous section. For instance, we write $v^K_\tau$ for the $v_\tau$ defined in that section, which has a different meaning than the $v_\tau$ here. Now, our $\alpha^{t-\tau} \cdot v_\tau(s, \sigma^*, x^*, y)$ can be understood as $v^K_{t+1-\tau}(s, \sigma^*, x', y)$, with $x' = (x'_{t+1-\tau}, \ldots, x'_t) \in (\mathcal{M}(S \times G, X))^\tau$ being such that $x'_{t'} = x^*$ for $t' = t+1-\tau, \ldots, t$. Due to the association of $\sigma^*$ with $x^*$ through the definition (26), we can understand $\sigma^*$ as $T^{K,[1,\tau-1]}(x'_{[1,\tau-1]}) \circ \sigma^K_1$, where $x'_{[1,\tau-1]} = (x'_1, \ldots, x'_{\tau-1}) \in (\mathcal{M}(S \times G, X))^{\tau-1}$ is such that $x'_{t'} = x^*$ for $t' = 1, 2, \ldots, \tau - 1$. By (C.3), $(x^*, \sigma^*)$ offers an $(\epsilon/6)$-equilibrium to $\Gamma^K(\sigma^*)$ with $\bar t = t$, $\theta^K_\tau = \theta$, and $\psi^K_\tau = \alpha^{\tau-1} \cdot \psi$. Even though Theorem 2 is nominally about going from a 0-equilibrium for the nonatomic game to $\epsilon$-equilibria for finite games, we can follow exactly the same logic used to prove it to go from an $(\epsilon/6)$-equilibrium for the nonatomic game to $(\epsilon/2)$-equilibria for finite games: for $n$ large enough and any $y \in \mathcal{M}(S \times G, X)$,

$$\int_{S^n} (\sigma^K_1)^n(ds) \cdot v^{Kn}_1\big(s_1, \varepsilon(s_{-1}), x'_{[1,t]}, x'_1\big) \ge \int_{S^n} (\sigma^K_1)^n(ds) \cdot v^{Kn}_1\big(s_1, \varepsilon(s_{-1}), x'_{[1,t]}, y\big) - \epsilon/2, \tag{C.4}$$

where $x'_{[1,t]}$ is again to be understood as the policy that takes action $x^*(s, g)$ whenever the most immediate state-shock pair is $(s, g)$. But this translates into (C.2).
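The choice of the truncation time $t$ at the start of this proof can be checked mechanically; the following minimal sketch uses arbitrary illustrative parameter values.

```python
import math

def truncation_time(eps, alpha, psi_bar):
    """Smallest integer t with t >= ln(6*psi_bar/(eps*(1-alpha)))/ln(1/alpha) + 1,
    which forces the discounted tail alpha^(t-1) * psi_bar / (1-alpha) <= eps/6,
    the bound behind (C.1)."""
    t = math.ceil(math.log(6 * psi_bar / (eps * (1 - alpha))) / math.log(1 / alpha) + 1)
    assert alpha ** (t - 1) * psi_bar / (1 - alpha) <= eps / 6
    return t

print(truncation_time(eps=0.1, alpha=0.9, psi_bar=1.0))  # prints 62
```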