Analysis of Markovian Competitive Situations using Nonatomic Games
Jian Yang
Department of Management Science and Information Systems
Business School, Rutgers University
Newark, NJ 07102
Email: [email protected]

2016; revised, February 2017
Abstract
For dynamic situations where the evolution of a player's state is influenced by his own action as well as other players' states and actions, we show that equilibria derived for nonatomic games (NGs) can be used by their large finite counterparts to achieve near-equilibrium performances. We focus on the case with quite general spaces but also with independently generated shocks driving random actions and state transitions. The NG equilibria we consider are random state-to-action maps that pay no attention to players' external environments. They are adoptable by a variety of real situations where awareness of other players' states can be anywhere between full and non-existent. Transient results here also form the basis of a link between an NG's stationary equilibrium (SE) and good stationary profiles for large finite games.
Keywords: Nonatomic Game; Markov Equilibrium; Large Finite Game

1 Introduction
Many multi-period competitive situations, as first noted by Shapley [25], involve randomly evolving player states that affect players' payoffs. When making a decision, a player has to contemplate not only what states other players are in and how other players will act, but also how his and others' states and actions will influence the future evolution of all players' states. Another complicating factor is that players may have zero, partial, or full knowledge of other players' states before they take their actions in each period. The task of analyzing these dynamic games is certainly daunting.

Consider a dynamic pricing game as an example. Multiple firms start a fixed time horizon with stocks of the same product. Each of them is bent on using pricing to influence demand and earn the highest revenue from selling its respective stock. In any given period, a firm is aware of its own current inventory level but not the levels of others. Yet, demand arrival to the firm is random and influenced not only by its own price, but also by the prices charged by other firms.

The ultimate goal with such a Markovian game lies in identifying an equilibrium action plan that will earn each player the highest total payoff when other players adhere to the plan. But even in the stationary setting, known equilibria come in quite complicated forms that, for real implementation, demand a high degree of coordination among players; see, e.g., Mertens and Parthasarathy [20], Duffie, Geanakoplos, Mas-Colell, and McLennan [9], and Solan [26]. Alternatively, we propose that equilibria be reached asymptotically as the number of players grows, on the premise that the game's nonatomic-game (NG) counterpart is analyzable. In the latter, a continuum of players are in competition, none of whom has any discernible influence on any other player, and yet all players in aggregation hold sway over players' payoffs and state evolutions. The key advantage of such a game is that its state distribution will evolve in a deterministic fashion. This results in the relatively simple form taken by the NG's equilibria $x$: the pure or mixed action plan $x_t(s_t)$, though dependent on the time period $t$ and the player's own individual state $s_t$, is insensitive to whatever portion of the overall state distribution the player can observe.

When an NG equilibrium is handy, we show that it can be used on the original finite Markovian game to serve our intended purpose. Relying on intermediate results stemming from the weak Law of Large Numbers (LLN) concerning empirical distributions, we establish two main results. In Theorem 1, we show that the empirical distribution of players' states, which is itself random in the finite game, will nevertheless converge in probability to the deterministic distribution predicted for the NG counterpart when the number of players grows to infinity. This convergence paves the way for Theorem 2, which states that players can apply the observation-blind NG equilibrium to the finite-player situation and gain an average performance that is ever harder to beat as the number of players grows. In both results, the "average" over players' states is assessed on the state distribution prevailing in either the NG or the finite game.
After assuming time-invariant payoff and transition functions, as well as fixed discounting over time and an infinite time horizon, we obtain a stationary setting. For this, we establish Theorem 3, effectively our affirmative answer to whether the stationary equilibria (SE) studied in past literature can be useful in large finite games.

The above theory will be most useful when the NG counterpart is relatively easy to deal with in comparison to the corresponding finite games. Besides evidence in the literature, this point is further buttressed by the dynamic pricing game mentioned earlier. Presented in Yang [33] as supplementary material to the current paper, our analysis demonstrates the usefulness of the transient result Theorem 2. The game is also extended through the consideration of locked-in production, wherein every firm uses production to bring its inventory back up to a pre-determined level whenever it becomes empty. The resultant game is again asymptotically analyzable due to the stationary result Theorem 3.

As our foremost contribution, we establish one more link between NGs and their finite-game counterparts. Previously, links were mostly established for single-period games, special multi-period games without individual states, or games exhibiting stationary features. The introduction of information-carrying individual states allows for the proper treatment of a much wider body of applicable situations involving present-future tradeoffs and transient properties. Compared to the earlier work Yang [32], which dealt with the NG-finite link for Markovian games as well, the current paper treats more general, non-discrete state and action spaces. As a tradeoff, we are compelled to let random shocks drive both decision making and state evolution. This, as is backed up by results such as Aumann [6] on the interchangeability between the presentations with and without such drivers, does not much restrict the generality of our results. Moreover, we demonstrate that the usefulness of SEs to large finite games stems from more fundamental properties possessed by transient NG equilibria.

Here is our plan for the remainder of the paper. We spend Section 2 on a survey of related research and Section 3 on basic model primitives. The nonatomic game is introduced in Section 4, while finite games are treated in Section 5. We present the main transient convergence results in Section 6. These are used in Section 7 to establish a link between SEs and large finite games with stationary features. Further discussion is made in Section 8, while the paper is concluded in Section 9.

2 Literature Survey

NGs are often easier to analyze than their finite counterparts because, in them, the action of an individual player has no impact on the payoffs and future state evolutions of the other players. Therefore, they are often used as proxies of real competitive systems in economic studies; see, e.g., Aumann [5] and Reny and Perry [22]. Systematic research on NGs started with Schmeidler [24]. He formulated a single-period semi-anonymous NG, wherein the joint distribution of other players' types and actions may affect any given player's payoff. When the action space is finite, Schmeidler established the existence of pure equilibria when the game becomes anonymous, so that only the distribution of other players' actions matters. Mas-Colell [19] showed the existence of distributional equilibria in anonymous NGs with compact action spaces. Khan, Rath, and Sun [18] identified a certain limit to which Schmeidler's result can be extended.
Links between NGs and their finite counterparts were covered in Green [12], Housman [14], Carmona [8], Kalai [17], Al-Najjar [4], and Yang [31].

This paper differs from the above by its focus on multi-period games. For such games without individual states that allow past actions to impact future gains, Green [11], Sabourian [23], and Al-Najjar and Smorodinsky [3] showed that equilibria for large games are nearly myopic. With individual states that inherit traces of past actions, the games we study pose new challenges. An NG equilibrium for our situation is certainly not myopic, as it takes into account the current action's future consequences. Rather, it is insensitive to real-time observations made of other players' states. We succeed in showing that such a simple action plan can be used profitably in finite situations with randomly evolving state distributions of which a player may have zero, partial, or full knowledge. The type of NG we deal with is similar to the sequential anonymous games studied by Jovanovic and Rosenthal [16], who established the existence of distributional equilibria. The result was generalized to games involving aggregate shocks by Bergin and Bernhardt [7]. Different from these papers, we work on the link between NGs and finite games, not the NGs themselves.

In their effort to simplify dynamic games, some authors went further than silencing individual players' influences as done through the NG approach. In addition, they pursued the so-called stationary equilibria (SE), which stress further the long-run steady-state nature of individual action plans and system-wide state distributions; see, e.g., Hopenhayn [13] and Adlakha and Johari [1]. The oblivious equilibrium (OE) concept proposed by Weintraub, Benkard, and van Roy [29], though accounting for impacts of large players, took the same stationary approach by letting firms beware of only long-run average state distributions. We caution that the implicit stationarity of SE or OE renders them inappropriate for applications that are transient by nature; for instance, the dynamic pricing game studied in Yang [33], where the inventory level of every firm can only decrease over time.

Some recent works also contributed to the links between equilibria of infinite-player games and their finite-player brethren. Weintraub, Benkard, and van Roy [30] did so for a setting where a long-run average system state can be defined. Adlakha, Johari, and Weintraub [2] established the existence of SE and achieved a similar conclusion by using only exogenous conditions on model primitives. Weintraub et al. [28] studied nonstationary oblivious equilibria (NOE) that capture transient behaviors of players, and showed their usefulness in finite-player situations by relying on a "light-tail" condition on players' state distributions similar to that used in [30]. Huang, Malhame, and Caines [15] dealt with a continuous-time multi-player system where independent diffusion processes provide random drivers. They reached equilibria in the nonatomic limit, and derived asymptotic results as the number of players becomes large. In their work, other players impact a given player through linear functionals of the state distribution they form; meanwhile, their actions play no direct role. Our discrete-time framework affords us almost full generality regarding other players' impacts on any given player's payoffs and state transitions: it is the joint state-action distribution that forms the environment faced by an individual player.
As already mentioned, while this paper tackles the case where exogenous shocks drive state evolution and decision making, Yang [32] dealt with the setting where such shocks are not necessarily identifiable; however, technical challenges faced there forced the state and action spaces to be discrete.
3 Model Primitives

In the dynamic games we study, players are engaged in multi-period competition in periods $t = 1, 2, \ldots, \bar{t}$. In period $t$, a player's payoff $\psi_t(s, x, \mu)$ depends on his state $s$, action $x$, and some $\mu$ depicting the outside environment. We suppose the game is semi-anonymous, so that $\mu$ can just be the joint distribution of other players' states and actions. The dynamics of the game are represented by a function $\theta_t(s, x, \mu, i)$, where $s$, $x$, and $\mu$ are defined as above, and $i$ is an idiosyncratic shock the player experiences individually after taking his action. All players' post-action shocks are independently sampled from a common distribution $\iota$.

We allow players to cast dice to decide their actions. However, we do not model the extent to which players can observe their outside environments; after all, we focus only on action plans that do not take advantage of any such observations. In every period $t$, we suppose each player receives another idiosyncratic shock $g$, this time before taking his action. All players' pre-action shocks, such as outcomes of dice casts, are independently sampled from a common distribution $\gamma$. We study the case where a player's action $x_t(s, g)$ depends merely on his own state $s$ and the shock $g$ that he himself has received. The main purpose of the paper is to show that one such action plan $x_{[1\bar{t}]} \equiv (x_t)_{t=1,\ldots,\bar{t}}$ is quite sufficient for the multi-period game just described, even when the latter may be transient in nature and of the more complex finite-player variety.

Some notation is needed for formal definitions. Given a metric space $A$, we use $d_A$ to denote its metric, $\mathcal{B}(A)$ its Borel $\sigma$-field, and $\mathcal{P}(A)$ the set of all probability measures (distributions) on the measurable space $(A, \mathcal{B}(A))$. The space $\mathcal{P}(A)$ is metrized by the Prohorov metric $\rho_A$, which induces on it the weak topology. Given metric spaces $A$ and $B$, we use $\mathcal{M}(A, B)$ to represent all measurable functions from $A$ to $B$.

We use a complete separable metric space $S$ for individual states $s$ and a separable metric space $X$ for player actions $x$. In a semi-anonymous fashion, payoffs and state transitions depend on the joint distribution $\mu \in \mathcal{P}(S \times X)$ of other players' states and actions. Let pre-action shocks $g$ come from a complete separable metric space $G$. In every period, these action-influencing shocks are independently drawn from a common distribution $\gamma \in \mathcal{P}(G)$. Let post-action shocks $i$ come from a complete separable metric space $I$. In every period, these transition-influencing shocks are independently drawn from a common distribution $\iota \in \mathcal{P}(I)$. The completeness requirements on $S$, $G$, and $I$ stem from the need to invoke Lemma 3 in Appendix A. These are certainly not stringent.

For period $t = 1, \ldots, \bar{t}$, a player's state $s \in S$, his action $x \in X$, and the joint state-action distribution $\mu \in \mathcal{P}(S \times X)$ he faces together determine his payoff in period $t$. In fact, we require there to be a bounded payoff function

$$\psi_t : S \times X \times \mathcal{P}(S \times X) \longrightarrow [-\bar{\psi}_t, \bar{\psi}_t], \qquad (1)$$

where $\bar{\psi}_t$ is some positive constant. It satisfies that $\psi_t(\cdot, \cdot, \mu) \in \mathcal{M}(S \times X, [-\bar{\psi}_t, \bar{\psi}_t])$ for every given distribution $\mu \in \mathcal{P}(S \times X)$. As the same player will enter a new state under post-action shock $i \in I$, we require there to be

$$\theta_t : S \times X \times \mathcal{P}(S \times X) \times I \longrightarrow S. \qquad (2)$$

It satisfies that $\theta_t(\cdot, \cdot, \mu, \cdot) \in \mathcal{M}(S \times X \times I, S)$ at every distribution $\mu \in \mathcal{P}(S \times X)$.

The action plans we consider are of the form

$$x_t : S \times G \longrightarrow X, \qquad (3)$$

which are required to be members of $\mathcal{M}(S \times G, X)$.
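To fix ideas, the primitives can be mirrored in code. The following is a minimal Python sketch in which every concrete choice, the real-valued states and actions, the uniform and Gaussian shocks, and the particular payoff and transition rules, is an illustrative assumption rather than part of the paper's model; an in-action environment $\mu$ is crudely represented by a finite list of state-action pairs.

```python
import random

# Toy stand-ins for the primitives (psi_t, theta_t, x_t, gamma, iota); every
# concrete choice below is an assumption made purely for illustration.

def sample_gamma():                      # pre-action shock g ~ gamma
    return random.uniform(0.0, 1.0)

def sample_iota():                       # post-action shock i ~ iota
    return random.gauss(0.0, 0.1)

def policy_x(t, s, g):                   # x_t(s, g): a measurable action plan
    return max(0.0, s - g)               # e.g., "sell down by a shock-dependent amount"

def payoff_psi(t, s, x, mu):             # psi_t(s, x, mu); mu ~ list of (s', x') pairs
    mean_action = sum(a for (_, a) in mu) / len(mu)
    return x / (1.0 + mean_action)       # bounded once states/actions are bounded

def transition_theta(t, s, x, mu, i):    # theta_t(s, x, mu, i): the next state
    mean_state = sum(sp for (sp, _) in mu) / len(mu)
    return max(0.0, s - x + 0.1 * mean_state + i)
```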
That is, a player will use action $x_t(s, g)$ in period $t$ when he starts with state $s \in S$ and receives pre-action shock $g$. We call any state distribution $\sigma \in \mathcal{P}(S)$ a pre-action environment, because the one formed by other players is what a player could potentially see at the beginning of any period. Also, call any joint state-action distribution $\mu \in \mathcal{P}(S \times X)$ an in-action environment, because the one formed by other players is what a player could potentially see in the midst of play in any period.

Let us recount our model primitives as follows: the horizon length $\bar{t}$; the state space $S$, the action space $X$, the pre-action shock space $G$, and the post-action shock space $I$; also, the pre-action shock distribution $\gamma$ and the post-action shock distribution $\iota$; finally, for periods $t = 1, \ldots, \bar{t}$, the payoff functions $\psi_t$ and state transition functions $\theta_t$.

4 The Nonatomic Game

Given an initial pre-action environment $\sigma_1 \in \mathcal{P}(S)$, we can define a nonatomic game $\Gamma(\sigma_1)$ which starts period 1 with $\sigma_1$ as the distribution of all players' states. We focus on policy profiles of the form $x_{[1\bar{t}]} \equiv (x_t)_{t=1,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}}$, where each $x_t \in \mathcal{M}(S \times G, X)$ is a map from a player's state-shock pairs to actions. Along with the given initial environment $\sigma_1$, we suppose such a profile will help generate a deterministic pre-action environment trajectory $\sigma_{[1,\bar{t}+1]} \equiv (\sigma_t)_{t=1,2,\ldots,\bar{t},\bar{t}+1} \in (\mathcal{P}(S))^{\bar{t}+1}$. This allows a player's policy to be observation-blind; that is, what portion of $\sigma_t$ is observable to the player in each period $t$ is not of any concern. The determinism of the environment evolution in $\Gamma(\sigma_1)$ is justifiable by Sun's [27] LLN involving a continuum of indexed players.

We now discuss how the deterministic trajectory can be formed. Let $t = 1, \ldots, \bar{t}$ be given. When all players form state distribution $\sigma_t \in \mathcal{P}(S)$ at the beginning and adopt the same plan $x_t \in \mathcal{M}(S \times G, X)$ for the period, the in-action environment $\mu_t \equiv M(\sigma_t, x_t) \in \mathcal{P}(S \times X)$ to be experienced by all players will take the form

$$\mu_t = M(\sigma_t, x_t) = (\sigma_t \times \gamma) \cdot (\mathrm{prj}_S, x_t)^{-1}, \qquad (4)$$

where $\mathrm{prj}_S$ stands for the projection map from $S \times G$ to $S$. The meaning of (4) is that, for any measurable joint state-action set $W' \in \mathcal{B}(S \times X)$,

$$\mu_t(W') = (\sigma_t \times \gamma)\left((\mathrm{prj}_S, x_t)^{-1}(W')\right) = \int_S \int_G \mathbf{1}[(s, x_t(s, g)) \in W'] \cdot \gamma(dg) \cdot \sigma_t(ds). \qquad (5)$$

This reflects that the joint distribution for states and pre-action shocks is of the product form $\sigma_t \times \gamma$; also, $x_t$ provides the map from state-shock pairs to actions for this period.

For a player who starts with state $s_t$ and has experienced pre-action shock $g_t$ as well as post-action shock $i_t$, his new state will be governed by (2):

$$s_{t+1} = \theta_t(s_t, x_t(s_t, g_t), M(\sigma_t, x_t), i_t). \qquad (6)$$

To describe the transition of the overall pre-action environment from $\sigma_t$ to $\sigma_{t+1}$ under action plan $x_t$, we define the operator $T_t(x_t)$ on $\mathcal{P}(S)$. Note that states are distributed according to $\sigma_t$, pre-action shocks are distributed according to $\gamma$, and post-action shocks are distributed according to $\iota$. So following (6),

$$\sigma_{t+1} = T_t(x_t) \circ \sigma_t = (\sigma_t \times \gamma \times \iota) \cdot \left[\theta_t\left(\mathrm{prj}_S, x_t \circ \mathrm{prj}_{S \times G}, M(\sigma_t, x_t), \mathrm{prj}_I\right)\right]^{-1}, \qquad (7)$$

meaning that, for any measurable state set $S' \in \mathcal{B}(S)$,

$$\sigma_{t+1}(S') = [T_t(x_t) \circ \sigma_t](S') = \int_S \int_G \int_I \mathbf{1}[\theta_t(s, x_t(s, g), M(\sigma_t, x_t), i) \in S'] \cdot \iota(di) \cdot \gamma(dg) \cdot \sigma_t(ds). \qquad (8)$$
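Under the law-of-large-numbers reading of (5) and (8), one period of the deterministic trajectory can be approximated by Monte Carlo. The sketch below, which reuses the same kind of toy primitives as before (all of them assumptions), represents $\sigma_t$ by a large sample of states and pushes it forward through $M(\sigma_t, x_t)$ and $T_t(x_t)$.

```python
import random

def in_action_environment(state_samples, x_t, t, sample_gamma):
    """Monte Carlo version of (4)-(5): pair each sampled state s with an
    independent pre-action shock g and record the state-action pair."""
    return [(s, x_t(t, s, sample_gamma())) for s in state_samples]

def advance_environment(state_samples, x_t, theta_t, t, sample_gamma, sample_iota):
    """Monte Carlo version of sigma_{t+1} = T_t(x_t) applied to sigma_t, cf. (7)-(8)."""
    mu_t = in_action_environment(state_samples, x_t, t, sample_gamma)
    nxt = []
    for s in state_samples:
        g, i = sample_gamma(), sample_iota()
        nxt.append(theta_t(t, s, x_t(t, s, g), mu_t, i))
    return nxt  # a sample-based stand-in for sigma_{t+1}

# Example: push 10,000 draws from a hypothetical sigma_1 through one period.
sigma1 = [random.uniform(0.0, 2.0) for _ in range(10_000)]
x = lambda t, s, g: max(0.0, s - g)
theta = lambda t, s, a, mu, i: max(0.0, s - a + i)
sigma2 = advance_environment(sigma1, x, theta, 1,
                             lambda: random.uniform(0, 1),
                             lambda: random.gauss(0, 0.1))
```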
so that T [ t,t − is the identitymapping on P ( S ) and for t ′ = t, t + 1 , ... , T [ tt ′ ] ( x [ tt ′ ] ) = T t ′ ( x t ′ ) ◦ T [ t,t ′ − ( x [ t,t ′ − ) . (9)The environment trajectory alluded to earlier is therefore σ [1 , ¯ t +1] = ( T [1 ,t − ( x [1 ,t − ) ◦ σ ) t =1 , ,..., ¯ t, ¯ t +1 . (10)In defining Γ( σ )’s equilibria, we subject a candidate policy profile to the one-time de-viation of a single player, who is negligible in his influence over others. The deviation willnot alter the environment trajectory corresponding to the candidate profile. Thus, we define v t ( s t , σ t , x [ t ¯ t ] , y t ) as the total expected payoff a player can make from time t to ¯ t , when hestarts with state s t ∈ S , other players form pre-action environment σ t ∈ P ( S ), all playersadopt policy x [ t ¯ t ] ≡ ( x t ′ ) t ′ = t,..., ¯ t ∈ ( M ( S × G, X )) ¯ t − t +1 with the exception of the current8layer in period t alone, who deviates to policy y t ∈ M ( S × G, X ) in that period. As aterminal condition, we have v ¯ t +1 ( s ¯ t +1 , σ ¯ t +1 , y ¯ t +1 ) = 0 . (11)For t = ¯ t, ¯ t − , ...,
For $t = \bar{t}, \bar{t}-1, \ldots, 1$, we have

$$v_t(s_t, \sigma_t, x_{[t\bar{t}]}, y_t) = \int_G \Big[\psi_t(s_t, y_t(s_t, g_t), M(\sigma_t, x_t)) + \int_I v_{t+1}\big(\theta_t(s_t, y_t(s_t, g_t), M(\sigma_t, x_t), i_t), T_t(x_t) \circ \sigma_t, x_{[t+1,\bar{t}]}, x_{t+1}\big) \cdot \iota(di_t)\Big] \cdot \gamma(dg_t), \qquad (12)$$

due to the dynamics illustrated in (6) to (8). The deviation $y_t$ affects the current player's action $y_t(s_t, g_t)$ in period $t$ and his own state $\theta_t(s_t, y_t(s_t, g_t), M(\sigma_t, x_t), i_t)$ in period $t+1$. But as a distinctive feature of the NG setup, it has no bearing on the period-$(t+1)$ pre-action environment $T_t(x_t) \circ \sigma_t$.

Now define $u_t : \mathcal{P}(S) \times (\mathcal{M}(S \times G, X))^{\bar{t}-t+1} \times \mathcal{M}(S \times G, X) \longrightarrow \Re$ so that

$$u_t(\sigma_t, x_{[t\bar{t}]}, y_t) = \int_S v_t(s_t, \sigma_t, x_{[t\bar{t}]}, y_t) \cdot \sigma_t(ds_t). \qquad (13)$$

This can be understood as one particular player's average gain from period $t$ onward when the same conditions specified earlier prevail and his period-$t$ state is sampled from the distribution $\sigma_t$. We deem policy $x^*_{[1\bar{t}]} \equiv (x^*_t)_{t=1,2,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}}$ a Markov equilibrium for the game $\Gamma(\sigma_1)$ when, for every $t = 1, 2, \ldots, \bar{t}$ and $y_t \in \mathcal{M}(S \times G, X)$,

$$u_t\left(T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1,\; x^*_{[t\bar{t}]},\; x^*_t\right) \geq u_t\left(T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1,\; x^*_{[t\bar{t}]},\; y_t\right). \qquad (14)$$

That is, policy $x^*_{[1\bar{t}]}$ will be regarded as an equilibrium when it cannot be bettered by any plan $y_t \in \mathcal{M}(S \times G, X)$ in any period $t$, in an average sense defined by the period-$t$ environment $\sigma_t = T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1$. We caution that (14) is weaker than

$$v_t\left(s_t, T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1, x^*_{[t\bar{t}]}, x^*_t\right) \geq v_t\left(s_t, T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1, x^*_{[t\bar{t}]}, y_t\right) \qquad (15)$$

for every $s_t \in S$. On the other hand, since $y_t \in \mathcal{M}(S \times G, X)$ allows much freedom in choosing, for each state $s \in S$ and shock $g \in G$, a competitive reaction $y_t(s, g)$, there is not much difference between the two criteria aside from measurability subtleties.
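Before moving to finite games, it may help to see the recursion (12) and the averaging (13) spelled out computationally. The sketch below assumes, purely for tractability, that $S$, $X$, $G$, and $I$ are tiny finite sets, so every integral becomes a finite sum; all primitives are hypothetical stand-ins.

```python
# Backward induction for the NG value v_t of (11)-(12) and the average u_t of
# (13), under the simplifying assumption of tiny finite S, X, G, I.

S, G, I = [0, 1], [0, 1], [0, 1]
gamma = {0: 0.5, 1: 0.5}                 # pre-action shock distribution
iota = {0: 0.5, 1: 0.5}                  # post-action shock distribution
t_bar = 3

def psi(t, s, x, mu):                    # toy bounded payoff (an assumption)
    return x * mu.get((s, x), 0.0)

def theta(t, s, x, mu, i):               # toy state transition (an assumption)
    return min(1, max(0, s + x - i))

def M(sigma, plan):                      # in-action environment, cf. (4)-(5)
    mu = {}
    for s in S:
        for g in G:
            key = (s, plan[(s, g)])
            mu[key] = mu.get(key, 0.0) + sigma[s] * gamma[g]
    return mu

def T(t, plan, sigma):                   # environment transition, cf. (7)-(8)
    nxt = {s: 0.0 for s in S}
    mu = M(sigma, plan)
    for s in S:
        for g in G:
            for i in I:
                nxt[theta(t, s, plan[(s, g)], mu, i)] += sigma[s] * gamma[g] * iota[i]
    return nxt

def v(t, s, sigma, plans, y_plan):       # recursion (12); (11) gives 0 past t_bar
    if t > t_bar:
        return 0.0
    mu, total = M(sigma, plans[t]), 0.0
    for g in G:
        a = y_plan[(s, g)]               # the one-time deviation acts in period t only
        cont = sum(iota[i] * v(t + 1, theta(t, s, a, mu, i), T(t, plans[t], sigma),
                               plans, plans.get(t + 1, y_plan)) for i in I)
        total += gamma[g] * (psi(t, s, a, mu) + cont)
    return total

def u(t, sigma, plans, y_plan):          # the average gain in (13)
    return sum(sigma[s] * v(t, s, sigma, plans, y_plan) for s in S)

plans = {t: {(s, g): 1 if s == 0 else 0 for s in S for g in G}
         for t in range(1, t_bar + 1)}
sigma1 = {0: 0.5, 1: 0.5}
print(u(1, sigma1, plans, plans[1]))     # average value with no deviation
```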
5 Finite Games

More notation is needed to appropriately describe finite games. For a metric space $A$ and $a \in A$, we use $\varepsilon(a)$ to denote the singleton probability measure with $\varepsilon(a)(\{a\}) = 1$. For $a = (a_1, \ldots, a_n) \in A^n$, where $n \in \mathbb{N}$, the set of natural numbers, we use $\varepsilon(a)$ for $\sum_{m=1}^n \varepsilon(a_m)/n$. The two uses are consistent. We also use $\mathcal{P}_n(A)$ for the space of probability measures of the type $\varepsilon(a)$ for $a \in A^n$, i.e., the space of empirical distributions generated from $n$ samples.

For some $n = 2, 3, \ldots$ and initial state distribution $\sigma_1 \in \mathcal{P}_n(S)$, we can define an $n$-player game $\Gamma_n(\sigma_1)$. Note the initial pre-action environment $\sigma_1$ must be of the form $\varepsilon(s_1) = \varepsilon(s_{11}, s_{12}, \ldots, s_{1n})$, where each $s_{1m} \in S$ is player $m$'s initial state. The game's payoffs and state transitions are still governed by (1) and (2), respectively. In period $t$, the pre-action environment is also some $\sigma_t = \varepsilon(s_{t1}, \ldots, s_{tn}) \in \mathcal{P}_n(S) \subset \mathcal{P}(S)$. Hence, the in-action environment $\mu_t \in \mathcal{P}_{n-1}(S \times X) \subset \mathcal{P}(S \times X)$ experienced by any designated player 1 is the empirical distribution $\varepsilon(s_{t,-1}, y_{t,-1}) = \varepsilon((s_{t2}, y_{t2}), \ldots, (s_{tn}, y_{tn}))$ when each player $m$ is in state $s_{tm} \in S$ and takes action $y_{tm} \in X$. Let players still adopt a policy $x_{[1\bar{t}]} \equiv (x_t)_{t=1,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}}$, which is but the crudest of many choices available to the $n$ players. We shall see later that this restriction is not going to do much harm.

Simplistic as it may seem, $x_{[1\bar{t}]}$ will not merely generate a deterministic environment trajectory. Given pre-action shock vector $g_t = (g_{t1}, \ldots, g_{tn}) \in G^n$ and post-action shock vector $i_t = (i_{t1}, \ldots, i_{tn}) \in I^n$, we can define $T_{nt}(x_t, g_t, i_t)$ as the operator on $\mathcal{P}_n(S)$ that converts a period-$t$ pre-action environment into a period-$(t+1)$ one. Thus, following (4) to (6), $\varepsilon(s_{t+1}) = T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)$ is such that

$$s_{t+1,m} = \theta_t(s_{tm}, x_t(s_{tm}, g_{tm}), M_n(s_{t,-m}, g_{t,-m}, x_t), i_{tm}), \quad \forall m = 1, 2, \ldots, n, \qquad (16)$$

where

$$M_n(s_{t,-m}, g_{t,-m}, x_t) = \varepsilon(s_{t,-m}, g_{t,-m}) \cdot (\mathrm{prj}_S, x_t)^{-1}, \qquad (17)$$

and each $\varepsilon(s_{t,-m}, g_{t,-m})$ represents the empirical distribution built on the state-shock pairs $(s_{t1}, g_{t1})$, ..., $(s_{t,m-1}, g_{t,m-1})$, $(s_{t,m+1}, g_{t,m+1})$, ..., $(s_{tn}, g_{tn})$. The latter reflects that player $m$'s in-action environment is made up of the states and actions of the other $n-1$ players. With $T_{n,[tt']}$ as the identity map when $t' \leq t-1$, and for $t \leq t'$, let

$$T_{n,[tt']}(x_{[tt']}, g_{[tt']}, i_{[tt']}) = T_{nt'}(x_{t'}, g_{t'}, i_{t'}) \circ T_{n,[t,t'-1]}(x_{[t,t'-1]}, g_{[t,t'-1]}, i_{[t,t'-1]}). \qquad (18)$$

The evolution of pre-action environments $\sigma_t = \varepsilon(s_t)$ is guided by the random shock vectors $g_t$ and $i_t$, and hence is stochastic by nature.

For an $n$-player game, let $v_{nt}(s_{t1}, \varepsilon(s_{t,-1}), x_{[t\bar{t}]}, y_t)$ be the total expected payoff player 1 can make from $t$ to $\bar{t}$, when he starts with state $s_{t1} \in S$, other players' initial environments are describable by their aggregate empirical state distribution $\varepsilon(s_{t,-1}) = \varepsilon(s_{t2}, \ldots, s_{tn})$, and all players adopt the policy $x_{[t\bar{t}]} \equiv (x_{t'})_{t'=t,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}-t+1}$ from period $t$ to period $\bar{t}$, with the exception of player 1 in period $t$ alone, who deviates to policy $y_t \in \mathcal{M}(S \times G, X)$. As a terminal condition, we have

$$v_{n,\bar{t}+1}(s_{\bar{t}+1,1}, \varepsilon(s_{\bar{t}+1,-1}), y_{\bar{t}+1}) = 0. \qquad (19)$$
For $t = \bar{t}, \bar{t}-1, \ldots, 1$, we have the recursive relationship

$$v_{nt}(s_{t1}, \varepsilon(s_{t,-1}), x_{[t\bar{t}]}, y_t) = \int_{G^n} \gamma^n(dg_t) \times \Big\{\psi_t(s_{t1}, y_t(s_{t1}, g_{t1}), M_n(s_{t,-1}, g_{t,-1}, x_t)) + \int_{I^n} \iota^n(di_t) \times v_{n,t+1}\big(\theta_t(s_{t1}, y_t(s_{t1}, g_{t1}), M_n(s_{t,-1}, g_{t,-1}, x_t), i_{t1}),\; [T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)]_{-1},\; x_{[t+1,\bar{t}]},\; x_{t+1}\big)\Big\}, \qquad (20)$$

due to the dynamics illustrated in (6) and (16). By $[T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)]_{-1}$, we mean $\varepsilon(s_{t+1,-1})$, where $\varepsilon(s_{t+1})$ is $T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)$ as defined through (16). The current (20) is much more complicated than its NG counterpart (12). The evolution from period $t$ to $t+1$ now depends on the pre-action shocks $g_t \equiv (g_{t1}, \ldots, g_{tn})$ and post-action shocks $i_t \equiv (i_{t1}, \ldots, i_{tn})$. Also, the in-action environment $M_n(s_{t,-1}, g_{t,-1}, x_t)$ experienced by player 1 excludes his own state and action, and hence is different from the environment faced by any other player. Similarly, the environment $[T_{nt}(x_t, g_t, i_t) \circ \varepsilon(s_t)]_{-1}$ to be faced by player 1 in period $t+1$ is unique to him as well. The added complexity motivates us to exploit the easier-to-handle NG case.
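A direct simulation of (16) and (17) makes the player-specific environments concrete. In the sketch below (toy primitives assumed, as before), player $m$'s in-action environment is built from the other $n-1$ players' state-shock pairs, which is exactly why each player faces a slightly different $\mu$.

```python
def step_finite_game(t, states, x_t, theta_t, sample_gamma, sample_iota):
    """One application of T_nt(x_t, g_t, i_t) from (16)-(17): draw one shock
    pair per player and update every state, with player m's environment built
    from the other n-1 players only. Primitives are assumed toy stand-ins."""
    n = len(states)
    g = [sample_gamma() for _ in range(n)]
    i = [sample_iota() for _ in range(n)]
    new_states = []
    for m in range(n):
        mu_minus_m = [(states[k], x_t(t, states[k], g[k]))   # cf. (17)
                      for k in range(n) if k != m]
        new_states.append(theta_t(t, states[m], x_t(t, states[m], g[m]),
                                  mu_minus_m, i[m]))
    return new_states
```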
Let $\sigma_{[1\bar{t}]} \equiv (\sigma_t)_{t=1,\ldots,\bar{t}} \in (\mathcal{P}(S))^{\bar{t}}$ be a sequence of environments. For $\epsilon \geq 0$, we deem $x^*_{[1\bar{t}]} \equiv (x^*_t)_{t=1,\ldots,\bar{t}} \in (\mathcal{M}(S \times G, X))^{\bar{t}}$ an $\epsilon$-Markov equilibrium for the game family $(\Gamma_n(\varepsilon(s_1)) \mid s_1 \in S^n)$ in the sense of $\sigma_{[1\bar{t}]}$ when, for every $t = 1, \ldots, \bar{t}$ and $y_t \in \mathcal{M}(S \times G, X)$,

$$\int_{S^n} v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, x^*_t\right) \cdot \sigma_t^n(ds_t) \geq \int_{S^n} v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, y_t\right) \cdot \sigma_t^n(ds_t) - \epsilon. \qquad (21)$$

That is, the action plan $x^*_{[1\bar{t}]}$ will be an $\epsilon$-Markov equilibrium in the sense of $\sigma_{[1\bar{t}]}$ when, under its guidance, the average payoff from any period $t$ on cannot be improved by more than $\epsilon$ through any deviation, where the "average" is taken with respect to the state distribution $\sigma_t$.
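Criterion (21) also suggests a direct diagnostic: estimate both sides by repeatedly playing the $n$-player game, once with everyone following the candidate profile and once with player 1 deviating in the period under scrutiny. The sketch below assumes a hypothetical helper `simulate_total_payoff` that evaluates the recursion (20) by forward simulation; it is an illustration, not part of the paper's apparatus.

```python
def estimated_regret(n, sample_sigma_t, x_star, y_dev, simulate_total_payoff,
                     runs=2000):
    """Monte Carlo estimate of the smallest epsilon making (21) hold at one
    period t: average the payoff gap between deviating and conforming."""
    gap = 0.0
    for _ in range(runs):
        states = [sample_sigma_t() for _ in range(n)]        # s_t ~ sigma_t^n
        v_conform = simulate_total_payoff(states, x_star, x_star)
        v_deviate = simulate_total_payoff(states, x_star, y_dev)
        gap += (v_deviate - v_conform) / runs
    return max(0.0, gap)   # (21) asks this to vanish as n grows
```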
6 Convergence Results

We can achieve convergence of environments and then of equilibria. The former is more fundamental and challenging, and the latter is built on it.

6.1 Convergence of Environments

Even without touching upon payoffs or equilibria, we can establish a link between finite games and their NG counterpart. It reflects that the stochastic environment pathways experienced by large finite games converge to the NG's deterministic environment trajectory.

Let $A$, $B$, and $C$ be metric spaces and $\pi_B \in \mathcal{P}(B)$ be a distribution. We use $\mathcal{K}(A, B, \pi_B, C) \subseteq \mathcal{M}(A \times B, C)$ to represent the space of all measurable functions from $A \times B$ to $C$ that are uniformly continuous in a probabilistic sense. The criterion for $y \in \mathcal{K}(A, B, \pi_B, C)$ is that for any $\epsilon > 0$, there exists $\delta > 0$, so that for any $a, a' \in A$ satisfying $d_A(a, a') < \delta$,

$$\pi_B(\{b \in B \mid d_C(y(a, b), y(a', b)) < \epsilon\}) > 1 - \epsilon. \qquad (22)$$

When $B$ is a singleton and hence $\pi_B$ is degenerate, $y \in \mathcal{K}(A, B, \pi_B, C)$ merely means that $y$ is a uniformly continuous function from $A$ to $C$, a situation we denote by $y \in \mathcal{K}(A, C)$. For general $B$ and $\pi_B$, the meaning is, loosely, that continuity holds in most cases.
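Condition (22) can be probed numerically for a candidate $y$: fix nearby $a$ and $a'$, draw $b \sim \pi_B$ repeatedly (the same draw fed to both arguments), and check how often the two outputs stay $\epsilon$-close. The example map and metric below are assumptions chosen only to make the sketch runnable.

```python
import random

def satisfies_22(y, a, a_prime, eps, sample_b, d_C, n=100_000):
    """Empirical check of (22): estimate pi_B({b : d_C(y(a,b), y(a',b)) < eps})
    and compare it with 1 - eps."""
    hits = 0
    for _ in range(n):
        b = sample_b()                    # the same b enters both y-values
        if d_C(y(a, b), y(a_prime, b)) < eps:
            hits += 1
    return hits / n > 1 - eps

# e.g., y(a, b) = a + 0.01 * b passes comfortably for nearby a, a'
print(satisfies_22(lambda a, b: a + 0.01 * b, 0.500, 0.501, 0.05,
                   lambda: random.gauss(0.0, 1.0), lambda u, v: abs(u - v)))
```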
We now make two assumptions on the transition function $\theta_t$:

(S1) For every $\mu \in \mathcal{P}(S \times X)$, the function $\theta_t(\cdot, \cdot, \mu, \cdot)$ is a member of $\mathcal{K}(S \times X, I, \iota, S)$. That is, for any $\mu \in \mathcal{P}(S \times X)$ and $\epsilon > 0$, there exist $\delta_S > 0$ and $\delta_X > 0$, so that for any $s, s' \in S$ and $x, x' \in X$ satisfying $d_S(s, s') < \delta_S$ and $d_X(x, x') < \delta_X$,

$$\iota(\{i \in I \mid d_S(\theta_t(s, x, \mu, i), \theta_t(s', x', \mu, i)) < \epsilon\}) > 1 - \epsilon.$$

(S2) Not only is it true that $\theta_t(s, x, \cdot, \cdot) \in \mathcal{K}(\mathcal{P}(S \times X), I, \iota, S)$ at every $(s, x) \in S \times X$, but the continuity is also achieved at a rate independent of the $(s, x)$ present. That is, for any $\mu \in \mathcal{P}(S \times X)$ and $\epsilon > 0$, there exists $\delta > 0$, so that for any $\mu' \in \mathcal{P}(S \times X)$ satisfying $\rho_{S \times X}(\mu, \mu') < \delta$, as well as any $s \in S$ and $x \in X$,

$$\iota(\{i \in I \mid d_S(\theta_t(s, x, \mu, i), \theta_t(s, x, \mu', i)) < \epsilon\}) > 1 - \epsilon.$$

For a separable metric space $A$, we use $(A^n, \mathcal{B}^n(A))$ to denote the product measurable space that houses $n$-long sample sequences. Given $\pi \in \mathcal{P}(A)$, we use $\pi^n$ to denote the product measure on $(A^n, \mathcal{B}^n(A))$. We can show that a one-step evolution in a big game is not that much different from that in a nonatomic game.
Proposition 1. Given a separable metric space $A$, distribution $\pi \in \mathcal{P}(A)$, and pre-action environment $\sigma \in \mathcal{P}(S)$, suppose $s^n = (s^n(a) \mid a \in A^n)$ for each $n \in \mathbb{N}$ is a member of $\mathcal{M}(A^n, S^n)$, and $\varepsilon(s^n(a))$ converges to $\sigma$ in probability, to the effect that

$$\pi^n(\{a \in A^n \mid \rho_S(\varepsilon(s^n(a)), \sigma) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough. Then, $T_{nt}(x, g, i) \circ \varepsilon(s^n(a))$ will converge to $T_t(x) \circ \sigma$ in probability for any probabilistically continuous $x$. That is, for any $x \in \mathcal{K}(S, G, \gamma, X)$,

$$(\pi \times \gamma \times \iota)^n(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_S(T_{nt}(x, g, i) \circ \varepsilon(s^n(a)), T_t(x) \circ \sigma) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough.

Recall that $\rho_S$ is the Prohorov metric for measuring the distance between two state distributions. Also, the operator $T_t(x)$ delineating the period-$t$ transition of an NG's pre-action environment is defined at (8), and its finite-game counterpart $T_{nt}(x, g, i)$ is defined at (16). The proof of Proposition 1 calls upon Lemma 3 in Appendix A. This is why the spaces $S$, $G$, and $I$ are required to be complete. Now imagine that $(A, \mathcal{B}(A), \pi)$ provides the exogenous shocks that drive the games' evolutions up to period $t$: $A = S \times G^{t-1} \times I^{t-1}$ and $\pi = \sigma_1 \times \gamma^{t-1} \times \iota^{t-1}$. Proposition 1 states that, when starting period $t$ with initial state vectors $s^n(a)$ in $n$-player games that in aggregation increasingly resemble the given starting distribution $\sigma$ for the NG, one will still get state vectors in large games that in aggregation resemble the NG's state distribution after the period-$t$ transition. Exploiting this proposition iteratively, we can arrive at our first main result on the convergence of environments.
Theorem 1. Let a policy profile $x_{[t\bar{t}]} \in (\mathcal{M}(S \times G, X))^{\bar{t}-t+1}$ for periods $t, t+1, \ldots, \bar{t}$ be such that each $x_{t'}$ is a member of $\mathcal{K}(S, G, \gamma, X)$. Then, when we sample $s_t = (s_{t1}, \ldots, s_{tn})$ from a given pre-action environment $\sigma_t \in \mathcal{P}(S)$, the sequence $(\sigma_{nt'})_{t'=t,t+1,\ldots,\bar{t},\bar{t}+1}$ of stochastic pre-action environments will converge to the sequence $(\sigma_{t'})_{t'=t,t+1,\ldots,\bar{t},\bar{t}+1}$ of deterministic pre-action environments in probability, where for each $t' = t, t+1, \ldots, \bar{t}, \bar{t}+1$, $\sigma_{nt'}$ is a sample over the $\varepsilon(s_{t'})$'s with

$$\varepsilon(s_{t'}) = T_{n,[t,t'-1]}(x_{[t,t'-1]}, g_{[t,t'-1]}, i_{[t,t'-1]}) \circ \varepsilon(s_t),$$

while $(s_t, g_{[t,t'-1]}, i_{[t,t'-1]})$ is distributed according to $(\sigma_t \times \gamma^{t'-t} \times \iota^{t'-t})^n$; also, $\sigma_{t'} = T_{[t,t'-1]}(x_{[t,t'-1]}) \circ \sigma_t$. That is, for any $\epsilon > 0$ and any $n$ large enough,

$$\left(\sigma_t \times \gamma^{\bar{t}-t+1} \times \iota^{\bar{t}-t+1}\right)^n\left(\tilde{A}^n(\epsilon)\right) > 1 - \epsilon,$$

where $\tilde{A}^n(\epsilon) \in \mathcal{B}^n(S \times G^{\bar{t}-t+1} \times I^{\bar{t}-t+1})$ is such that, for any $(s_t, g_{[t,\bar{t}]}, i_{[t,\bar{t}]}) \in \tilde{A}^n(\epsilon)$,

$$\rho_S(\sigma_{nt'}, \sigma_{t'}) < \epsilon, \quad \forall t' = t, t+1, \ldots, \bar{t}, \bar{t}+1.$$

The multi-period transition operator $T_{[t,t'-1]}(x_{[t,t'-1]})$ for the NG is defined at (9), and its finite-game counterpart $T_{n,[t,t'-1]}(x_{[t,t'-1]}, g_{[t,t'-1]}, i_{[t,t'-1]})$ is defined at (18). Suppose an NG starts period $t$ with pre-action environment $\sigma_t$ and a slew of finite games start the period with pre-action environments that are sampled from $\sigma_t$. Let the evolution of both types of games be guided by players acting according to the same probabilistically continuous policy profile $x_{[t\bar{t}]}$. Then, as the number of players $n$ involved in the finite games grows to infinity, Theorem 1 predicts an ever smaller chance for the finite games' period-$t'$ environments $\sigma_{nt'} = T_{n,[t,t'-1]}(x_{[t,t'-1]}, g_{[t,t'-1]}, i_{[t,t'-1]}) \circ \varepsilon(s_t)$ to veer even slightly away from the NG's deterministic period-$t'$ environment $\sigma_{t'} = T_{[t,t'-1]}(x_{[t,t'-1]}) \circ \sigma_t$.
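Theorem 1 lends itself to a simple experiment: run the $n$-player dynamics and a large-sample proxy of the NG trajectory side by side under one probabilistically continuous policy, and watch the worst per-period discrepancy shrink as $n$ grows. Since the Prohorov metric is awkward to evaluate directly, the sketch below substitutes the Kolmogorov distance for real-valued states, a crude proxy chosen purely for convenience; `step_finite` and `step_ng` stand for the finite-game and NG one-step updates sketched earlier, and all of these choices are assumptions.

```python
def kolmogorov(xs, ys):
    """Sup-distance between two empirical CDFs; a rough, assumed stand-in for
    rho_S when states are real numbers."""
    pts = sorted(set(xs) | set(ys))
    cdf = lambda data, p: sum(1 for v in data if v <= p) / len(data)
    return max(abs(cdf(xs, p) - cdf(ys, p)) for p in pts)

def worst_drift(n, t_bar, step_finite, step_ng, sample_sigma_t, proxy_size=5_000):
    """Track max over t' of the distance between the finite game's random
    environments and the NG's (sample-approximated) deterministic trajectory."""
    finite = [sample_sigma_t() for _ in range(n)]
    ng = [sample_sigma_t() for _ in range(proxy_size)]
    worst = kolmogorov(finite, ng)
    for t in range(1, t_bar + 1):
        finite, ng = step_finite(t, finite), step_ng(t, ng)
        worst = max(worst, kolmogorov(finite, ng))
    return worst   # Theorem 1: this shrinks in probability as n grows
```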
6.2 Convergence of Equilibria

We now set out to establish this section's main result: an equilibrium from the NG will serve as an ever more accurate approximate equilibrium for ever larger finite games. First, we need to assume that the single-period payoff functions $\psi_t$ are continuous:

(F1) Each $\psi_t(s, x, \mu)$ is continuous in $(s, x)$. That is, for any $\mu \in \mathcal{P}(S \times X)$ and $\epsilon > 0$, there exist $\delta_S > 0$ and $\delta_X > 0$, so that for any $s, s' \in S$ and $x, x' \in X$ satisfying $d_S(s, s') < \delta_S$ and $d_X(x, x') < \delta_X$,

$$|\psi_t(s, x, \mu) - \psi_t(s', x', \mu)| < \epsilon.$$

(F2) Each $\psi_t(s, x, \mu)$ is continuous in $\mu$ at a rate independent of the $(s, x)$ present. That is, for any $\mu \in \mathcal{P}(S \times X)$ and $\epsilon > 0$, there exists $\delta > 0$, so that for any $\mu' \in \mathcal{P}(S \times X)$ satisfying $\rho_{S \times X}(\mu, \mu') < \delta$, as well as any $s \in S$ and $x \in X$,

$$|\psi_t(s, x, \mu) - \psi_t(s, x, \mu')| < \epsilon.$$

There are a couple of intermediate results, whose proofs are provided in Appendix B. Recall that the value functions $v_t$ for an NG are defined around (11) and (12), while the value functions $v_{nt}$ for finite games are defined around (19) and (20).

Proposition 2. $v_t(s_t, \sigma_t, x_{[t\bar{t}]}, x_t)$ is continuous in $s_t$ under probabilistically continuous $x_{t'}$'s.
Proposition 3. Let $\sigma_t \in \mathcal{P}(S)$ and $x_{[t\bar{t}]} \in (\mathcal{K}(S, G, \gamma, X))^{\bar{t}-t+1}$ be given. Suppose the sequence $s_{t,-1} = (s_{t2}, s_{t3}, \ldots)$ is sampled from $\sigma_t$; then $v_{nt}(s_{t1}, \varepsilon(s^n_{t,-1}), x_{[t\bar{t}]}, x_t)$ will converge to $v_t(s_{t1}, \sigma_t, x_{[t\bar{t}]}, x_t)$ in probability at an $s_{t1}$-independent rate, where $s^n_{t,-1}$ stands for the cutoff $(s_{t2}, s_{t3}, \ldots, s_{tn})$.

Now here comes our main transient result.
Theorem 2
For state distribution $\sigma_1 \in \mathcal{P}(S)$, suppose $x^*_{[1\bar{t}]} \equiv (x^*_t)_{t=1,2,\ldots,\bar{t}} \in (\mathcal{K}(S, G, \gamma, X))^{\bar{t}}$ is a probabilistically continuous Markov equilibrium of the nonatomic game $\Gamma(\sigma_1)$. Then, for any $\epsilon > 0$ and large enough $n \in \mathbb{N}$, this $x^*_{[1\bar{t}]}$ is also an $\epsilon$-Markov equilibrium for the game family $(\Gamma_n(\varepsilon(s_1)) \mid s_1 \in S^n)$ in the sense of $\sigma_{[1\bar{t}]} \equiv (\sigma_t)_{t=1,\ldots,\bar{t}}$, where every $\sigma_t = T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1$. This means that for any $t = 1, \ldots, \bar{t}$ and $y_t \in \mathcal{M}(S \times G, X)$, (21) is true:

$$\int_{S^n} v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, x^*_t\right) \cdot \sigma_t^n(ds_t) \geq \int_{S^n} v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, y_t\right) \cdot \sigma_t^n(ds_t) - \epsilon.$$

Furthermore, the same is true in the sense of the stochastic pre-action environment sequence $\sigma_{n,[1\bar{t}]} \equiv (\sigma_{nt})_{t=1,\ldots,\bar{t}}$, where every $\sigma_{nt}$ is a sample over the $\varepsilon(s_t)$'s with $\varepsilon(s_t) = T_{n,[1,t-1]}(x_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]}) \circ \varepsilon(s_1)$, while $(s_1, g_{[1,t-1]}, i_{[1,t-1]})$ is distributed according to $(\sigma_1 \times \gamma^{t-1} \times \iota^{t-1})^n$. This means that, for any $\epsilon > 0$ and large enough $n \in \mathbb{N}$, for any $t = 1, \ldots, \bar{t}$ and $y_t \in \mathcal{M}(S \times G, X)$,

$$\int_{S^n} \sigma_1^n(ds_1) \times \int_{G^{n(t-1)}} \gamma^{n(t-1)}(dg_{[1,t-1]}) \times \int_{I^{n(t-1)}} \iota^{n(t-1)}(di_{[1,t-1]}) \times v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, x^*_t\right)$$
$$\geq \int_{S^n} \sigma_1^n(ds_1) \times \int_{G^{n(t-1)}} \gamma^{n(t-1)}(dg_{[1,t-1]}) \times \int_{I^{n(t-1)}} \iota^{n(t-1)}(di_{[1,t-1]}) \times v_{nt}\left(s_{t1}, \varepsilon(s_{t,-1}), x^*_{[t\bar{t}]}, y_t\right) - \epsilon,$$

where both $s_{t1}$ and $\varepsilon(s_{t,-1})$ come from $\varepsilon(s_t)$.

Theorem 2 says that, when there are enough of them, players in a finite game can agree on an NG equilibrium and expect to lose little on average; also, the distribution on which the "average" is based can be either the NG's state distribution or an accurate assessment of what players' states would be had they followed the NG equilibrium all along. In the latter option, different players' states can even be correlated. In the NG limit, the evolution of pre-action environments is deterministic. An equilibrium here, which is necessarily observation-blind to the extent that other players' states and actions do not influence it, serves as a good asymptotic equilibrium for finite games when there are enough players; and this asymptotic result is independent of the observational power of players in the finite games.
7 The Stationary Setting

Now we study an infinite-horizon model with stationary features. To this end, suppose there is a payoff function $\psi$, so that

$$\psi_t(s, x, \mu) = \alpha^{t-1} \cdot \psi(s, x, \mu), \quad \forall t = 1, 2, \ldots, \qquad (23)$$

where $\alpha \in [0, 1)$ is a discount factor.
Also, we use $\bar{\psi}$ for the bound $\bar{\psi}_1$ that appears in (1). In addition, suppose there is a state transition function $\theta$, so that

$$\theta_t(s, x, \mu, i) = \theta(s, x, \mu, i), \quad \forall t = 1, 2, \ldots. \qquad (24)$$

For the nonatomic game $\Gamma$ with the above stationary features, we use $x \equiv (x(s, g) \mid s \in S, g \in G) \in \mathcal{M}(S \times G, X)$ to represent a stationary policy profile. It is a map from the current period's state and pre-action shock to the player's action. Given an $x \in \mathcal{M}(S \times G, X)$, we denote by $T(x)$ the operator on $\mathcal{P}(S)$ that converts one state distribution $\sigma$ to its corresponding $T(x) \circ \sigma$, so that following (8), for every $S' \in \mathcal{B}(S)$,

$$[T(x) \circ \sigma](S') = \int_S \int_G \int_I \mathbf{1}(\theta(s, x(s, g), M(\sigma, x), i) \in S') \cdot \iota(di) \cdot \gamma(dg) \cdot \sigma(ds). \qquad (25)$$

An environment $\sigma \in \mathcal{P}(S)$ is said to be associated with $x$ when

$$\sigma = T(x) \circ \sigma. \qquad (26)$$

That is, we consider $\sigma \in \mathcal{P}(S)$ to be associated with $x \in \mathcal{M}(S \times G, X)$ when the former is invariant under the state transition facilitated by the $T(x)$ operator.

Suppose pre-action environment $\sigma \in \mathcal{P}(S)$ is associated with policy $x \in \mathcal{M}(S \times G, X)$. For $t = 0, 1, \ldots$, we define $v_t(s, \sigma, x, y)$ as the total expected payoff a player can make from period 1 to $t$, when he starts period 1 with state $s \in S$ and outside environment $\sigma$, while all players keep on using policy $x$ from period 1 to $t$, with the exception of the current player in the very beginning, who deviates to $y \in \mathcal{M}(S \times G, X)$. As a terminal condition, we have

$$v_0(s, \sigma, x, y) = 0. \qquad (27)$$

Due to the stationarity of the setting, we have, for $t = 1, 2, \ldots$,

$$v_t(s, \sigma, x, y) = \int_G \left[\psi(s, y(s, g), M(\sigma, x)) + \alpha \cdot \int_I v_{t-1}(\theta(s, y(s, g), M(\sigma, x), i), \sigma, x, x) \cdot \iota(di)\right] \cdot \gamma(dg). \qquad (28)$$

Using (27) and (28), we can inductively show that

$$|v_{t+1}(s, \sigma, x, y) - v_t(s, \sigma, x, y)| \leq \alpha^t \cdot \bar{\psi}. \qquad (29)$$

The sequence $\{v_t(s, \sigma, x, y) \mid t = 0, 1, \ldots\}$ is thus Cauchy with a limit point $v_\infty(s, \sigma, x, y)$. This $v_\infty(s, \sigma, x, y)$ can be understood as the infinite-horizon total discounted expected payoff a player can obtain by starting with state $s$ and environment $\sigma$, while all players adhere to the action plan $x$ except for the current player in the beginning, who deviates to $y$.

We deem $x^* \in \mathcal{M}(S \times G, X)$ a Markov equilibrium for the nonatomic game $\Gamma$ when, for some $\sigma^* \in \mathcal{P}(S)$ associated with $x^*$ in the fashion of (26) and every $y \in \mathcal{M}(S \times G, X)$,

$$\int_S v_\infty(s, \sigma^*, x^*, x^*) \cdot \sigma^*(ds) \geq \int_S v_\infty(s, \sigma^*, x^*, y) \cdot \sigma^*(ds). \qquad (30)$$

Therefore, a policy will be considered an equilibrium when it induces an invariant environment profile under which the policy forms a best response in the long run.

Now we move on to the $n$-player game $\Gamma_n$ with the same stationary features provided by $\psi$, $\theta$, and $\alpha$. Given policy profile $x = (x(s, g) \mid s \in S, g \in G) \in \mathcal{M}(S \times G, X)$, pre-action shock vector $g = (g_1, \ldots, g_n) \in G^n$, and post-action shock vector $i = (i_1, \ldots, i_n) \in I^n$, we define $T_n(x, g, i)$ as the operator on $\mathcal{P}_n(S)$ that converts one period's pre-action environment into that of the next period. Following (16), $\varepsilon(s') = T_n(x, g, i) \circ \varepsilon(s)$ is such that

$$s'_m = \theta(s_m, x(s_m, g_m), M_n(s_{-m}, g_{-m}, x), i_m), \quad \forall m = 1, 2, \ldots, n. \qquad (31)$$
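As an aside on the association condition (26): on a finite state space, an associated environment can be sought by simply iterating $T(x)$ until the distribution stops moving. The loop below is only a heuristic sketch under that finiteness assumption; existence or uniqueness of the fixed point is not guaranteed in general.

```python
def invariant_environment(T_of_x, sigma0, tol=1e-12, max_iter=100_000):
    """Iterate sigma <- T(x) applied to sigma on a finite state space until
    (26) holds approximately. T_of_x maps a dict {state: mass} to the next."""
    sigma = dict(sigma0)
    for _ in range(max_iter):
        nxt = T_of_x(sigma)
        if max(abs(nxt[s] - sigma[s]) for s in sigma) < tol:
            return nxt                    # approximately T(x)-invariant
        sigma = nxt
    raise RuntimeError("iteration did not settle; T(x) may not be contracting")
```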
Let $v_{nt}(s_1, \varepsilon(s_{-1}), x, y)$ be the total expected payoff player 1 can make from period 1 to $t$, when the player's starting state is $s_1 \in S$, other players' initial environments are describable by their aggregate empirical state distribution $\varepsilon(s_{-1}) = \varepsilon(s_2, \ldots, s_n)$, and all players adopt the policy $x \in \mathcal{M}(S \times G, X)$, with the exception that player 1 adopts policy $y \in \mathcal{M}(S \times G, X)$ in the very beginning. As a terminal condition, we have

$$v_{n0}(s_1, \varepsilon(s_{-1}), x, y) = 0. \qquad (32)$$

For $t = 1, 2, \ldots$, we have that $v_{nt}(s_1, \varepsilon(s_{-1}), x, y)$ equals

$$\int_{G^n} \gamma^n(dg) \times \left\{\psi(s_1, y(s_1, g_1), M_n(s_{-1}, g_{-1}, x)) + \alpha \cdot \int_{I^n} \iota^n(di) \times v_{n,t-1}(\theta(s_1, y(s_1, g_1), M_n(s_{-1}, g_{-1}, x), i_1), [T_n(x, g, i) \circ \varepsilon(s)]_{-1}, x, x)\right\}, \qquad (33)$$

where $[T_n(x, g, i) \circ \varepsilon(s)]_{-1}$ stands for $\varepsilon(s'_{-1})$, such that $\varepsilon(s') = T_n(x, g, i) \circ \varepsilon(s)$. Using (32) and (33), we can inductively show that

$$|v_{n,t+1}(s_1, \varepsilon(s_{-1}), x, y) - v_{nt}(s_1, \varepsilon(s_{-1}), x, y)| \leq \alpha^t \cdot \bar{\psi}. \qquad (34)$$

Thus, the sequence $\{v_{nt}(s_1, \varepsilon(s_{-1}), x, y) \mid t = 0, 1, \ldots\}$ is Cauchy with limit $v_{n\infty}(s_1, \varepsilon(s_{-1}), x, y)$.

We make the following assumptions, which are $t$-independent versions of (S1) to (F2):

(S1-s) For every $\mu \in \mathcal{P}(S \times X)$, the function $\theta(\cdot, \cdot, \mu, \cdot)$ is a member of $\mathcal{K}(S \times X, I, \iota, S)$.

(S2-s) Not only is it true that $\theta(s, x, \cdot, \cdot) \in \mathcal{K}(\mathcal{P}(S \times X), I, \iota, S)$ at every $(s, x) \in S \times X$, but the continuity is also achieved at a rate independent of the $(s, x)$ present.

(F1-s) The function $\psi(s, x, \mu)$ is continuous in $(s, x)$.

(F2-s) The function $\psi(s, x, \mu)$ is continuous in $\mu$ at an $(s, x)$-independent rate.

Here comes our main result for the stationary case.

Theorem 3. Suppose $x^* \in \mathcal{K}(S, G, \gamma, X)$ is a probabilistically continuous Markov equilibrium for the nonatomic game $\Gamma$. Let $\sigma^* \in \mathcal{P}(S)$ be associated with $x^*$ in the fashion of (26). Then, for any $\epsilon > 0$ and large enough $n \in \mathbb{N}$, for any $y \in \mathcal{M}(S \times G, X)$,

$$\int_{S^n} v_{n\infty}(s_1, \varepsilon(s_{-1}), x^*, x^*) \cdot (\sigma^*)^n(ds) \geq \int_{S^n} v_{n\infty}(s_1, \varepsilon(s_{-1}), x^*, y) \cdot (\sigma^*)^n(ds) - \epsilon.$$

Theorem 3 is proved in Appendix C. It states that players in a large finite game will not regret much by keeping on adopting a stationary equilibrium of the corresponding nonatomic game. The regret is measured in an average sense, where the underlying invariant state distribution for measuring the "average" is part of the NG equilibrium. So players can fare well by responding to their individual states in the same fashion indefinitely.
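The geometric bounds (29) and (34) also quantify how long a truncated horizon must be: summing the tail gives $|v_\infty - v_t| \leq \alpha^t \bar{\psi} / (1 - \alpha)$, so the horizon computed below suffices for any target accuracy. The numerical values in the example are arbitrary.

```python
import math

def truncation_horizon(alpha, psi_bar, eps):
    """Smallest t with alpha^t * psi_bar / (1 - alpha) <= eps, a direct
    consequence of the Cauchy bounds (29)/(34)."""
    assert 0 < alpha < 1 and psi_bar > 0 and eps > 0
    t = math.log(eps * (1 - alpha) / psi_bar) / math.log(alpha)
    return max(0, math.ceil(t))

print(truncation_horizon(alpha=0.9, psi_bar=1.0, eps=1e-3))  # -> 88
```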
8 Further Discussion

Using this paper's language and notation, we offer a comparison with the most relevant papers. Within the discrete-time framework, while not considering atomic players or players' entries and exits, we have arguably worked with the most general setup.

Both Weintraub et al. [28] and Weintraub, Benkard, and van Roy [30] treated competing firms on a common market as players. They allowed for the entry and exit of firms, and accounted for the effect of the firm density $c$ per unit market size. Roughly speaking, their regular payoff is of the form $\psi^1(s, c \cdot \mu|_S) - \psi^2(x)$, where $\mu|_S$ stands for the marginal state distribution derivable from the joint state-action distribution $\mu$. Also, firms' state transitions are governed by a certain $\theta(s, x, i)$ that is independent of the environment $\mu$.

Weintraub et al. [28] arrived at something akin to our Theorem 2. Meanwhile, Weintraub, Benkard, and van Roy [30] found a stationary policy of the form $x(s)$ to suffice for the NG limit. It was considered oblivious because of firms' ability to ignore the industry state $c \cdot \mu|_S$. When there are few dominant firms, an NG equilibrium was shown to work increasingly well for larger finite models. This is close in spirit to our Theorem 3. We note that $\theta$'s independence of $\mu$ helped greatly with their derivations. While free from the task of dealing with entry, exit, or the impacts of market size and the number of firms, we have allowed players' state transitions to be profoundly impacted by the environment that their collective states and actions fabricate. Namely, our $\theta_t$ can depend on $\mu$ in virtually arbitrary fashions.

Huang, Malhame, and Caines [15] dealt with continuous-time games with the state space $S$ equal to the real line $\Re$. These games' discrete-time counterparts can be obtained by replacing their Brownian motions with symmetric random walks. In particular, we can let the post-action shock space $I$ be $\{-1, +1\}$, let the probability $\iota$ place weight one half on each of $-1$ and $+1$, and let

$$\theta_t(s, x, \mu, i) = \int_\Re \theta^0(s, x, s') \cdot \mu|_S(ds') + \bar{s} \cdot i, \qquad (35)$$

where $\theta^0$ is a function from $\Re \times X \times \Re$ to $\Re$ and $\bar{s}$ is a constant. So there, only the state-distribution portion of the joint state-action distribution $\mu$ of other firms affects the current firm's state transition; its impact is also felt in an average sense; moreover, the effect of the random shock is additive. Their one-period payoff function can be understood as

$$\psi_t(s, x, \mu) = \int_\Re \psi^0(s, x, s') \cdot \mu|_S(ds'), \qquad (36)$$

where $\psi^0$ is a function from $\Re \times X \times \Re$ to $\Re$. Artificial randomization in decision making turns out to be unnecessary there: NG equilibria can be found in the form $x_t(s)$ rather than the more general $x_t(s, g)$. We, on the other hand, believe that allowing other players' actions to play a role in both state transitions and one-period payoffs can greatly enhance the relevant models' applicability. In the competitive pricing situation, for instance, the demand level experienced by a firm is perturbable by the prices charged by other firms. It in turn influences not only the firm's present profitability but also its future inventory levels.

As can be seen from equivalence results such as Aumann [6] (Lemma F), using pre-action shocks $g$ and post-action shocks $i$ permits us to effectively deal with both random action plans and random state transitions. These were indeed treated by Yang [32] in an alternative transition-probability formulation, with each $\chi_t(s)$ there effectively the pushforward $\gamma \cdot x_t(s, \cdot)^{-1}$ here, and each $\tilde{g}_t(s, x, \mu)$ there effectively $\iota \cdot \theta_t(s, x, \mu, \cdot)^{-1}$ here.
Due to its need to sample from joint probabilities of the non-product type, however, the earlier work found it necessary to assume discrete state and action spaces. This restriction is removed here through the exploitation of independently generated shocks and tools pertinent to the tightness of probabilities. The latter only requires the current spaces $S$, $G$, and $I$ to be complete.

We can also apply our results to a dynamic pricing game participated in by heterogeneous firms. Since the random demand arrival process is influenced by the prices charged by all firms and leftover items are stored for future sales, the finite-player version of this problem is virtually intractable. The usefulness of the transient result Theorem 2 is thus on full display. To the stationary case also involving production, the stationary result Theorem 3 can further be applied. Moreover, depending on which portion of the outside environment is observable, whether it be merely other firms' prices or both their prices and inventory levels, there can be different versions of the finite game. The NG approximation renders these differences irrelevant. Details are furnished in Yang [33].

9 Conclusion

We have established links between multi-period Markovian games and their NG counterparts. Our focus is the case where the state and action spaces are general metric spaces, and there are independently generated shocks serving as random drivers for decision making and state evolution. In essence, the evolution of player-state distributions in large finite games, though random, resembles in probability the deterministic pathway taken by their NG counterparts. This allows NG equilibria to be well adapted to large finite games.

Still, many dynamic competitive situations not yet covered by existing studies like Huang, Malhame, and Caines [15] are better described by continuous-time models. These will require vastly different techniques to probe. For one thing, the mathematical induction approach we have taken to deal with multiple periods would not seem to go well with a discrete-time approximation of a continuous-time model. In the latter model, even identifying the environment induced by all players adopting a common policy might involve solving a fixed point problem. Therefore, serious challenges will have to be overcome.

Appendices

A Technical Developments in Section 6.1
Given a metric space $A$, the Prohorov metric $\rho_A$ is such that, for any distributions $\pi, \pi' \in \mathcal{P}(A)$,

$$\rho_A(\pi, \pi') = \inf\left\{\epsilon > 0 \mid \pi'((A')^\epsilon) + \epsilon \geq \pi(A'), \text{ for all } A' \in \mathcal{B}(A)\right\}, \qquad (A.1)$$

where

$$(A')^\epsilon = \{a \in A \mid d_A(a, a') < \epsilon \text{ for some } a' \in A'\}. \qquad (A.2)$$

The metric $\rho_A$ is known to generate the weak topology for $\mathcal{P}(A)$.
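On a small finite metric space, (A.1)-(A.2) can even be evaluated by brute force, which may help in building intuition for the lemmas that follow. The sketch scans all events $A'$ and a grid of candidate $\epsilon$ values; it is exponential in the number of points and purely illustrative.

```python
from itertools import combinations

def prohorov(pi, pi2, points, d, grid=10_000):
    """Brute-force Prohorov distance per (A.1)-(A.2) between two distributions
    (dicts point -> mass) on a finite metric space with metric d."""
    events = [set(c) for r in range(1, len(points) + 1)
              for c in combinations(points, r)]
    def blow_up(A_prime, eps):                     # (A')^eps from (A.2)
        return {a for a in points if any(d(a, b) < eps for b in A_prime)}
    def mass(p, event):
        return sum(p[a] for a in event)
    for k in range(1, grid + 1):
        eps = k / grid                             # smallest grid eps satisfying (A.1)
        if all(mass(pi2, blow_up(A, eps)) + eps >= mass(pi, A) for A in events):
            return eps
    return 1.0

pts = [0.0, 1.0]
print(prohorov({0.0: 0.5, 1.0: 0.5}, {0.0: 0.6, 1.0: 0.4}, pts,
               lambda a, b: abs(a - b)))           # -> 0.1
```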
According to Parthasarathy [21] (Theorem II.7.1), the strong LLN applies to the empirical distribution under the weak topology, and hence under the Prohorov metric. In the following, we state its weak version.

Lemma 1. Given separable metric spaces $A$ and $B$, suppose distribution $\pi_A \in \mathcal{P}(A)$ and measurable mapping $y \in \mathcal{M}(A, B)$. Then, for any $\epsilon > 0$, as long as $n$ is large enough,

$$(\pi_A)^n\left(\left\{a = (a_1, \ldots, a_n) \in A^n \mid \rho_B(\varepsilon(a) \cdot y^{-1}, \pi_A \cdot y^{-1}) < \epsilon\right\}\right) > 1 - \epsilon.$$

For a separable metric space $A$, a point $a \in A$, and an $(n-1)$-sample empirical distribution $\pi \in \mathcal{P}_{n-1}(A)$, we use $(a, \pi)_n$ to represent the member of $\mathcal{P}_n(A)$ that places an additional $1/n$ weight on the point $a$, with the probability masses in $\pi$ reduced to $(n-1)/n$ times their original values. For $a \in A^n$ and $m = 1, \ldots, n$, we have $(a_m, \varepsilon(a_{-m}))_n = \varepsilon(a)$. Concerning the Prohorov metric, we also have a simple but useful observation.
Lemma 2. Let $A$ be a separable metric space. Then, for any $n = 2, 3, \ldots$, $a \in A$, and $\pi \in \mathcal{P}_{n-1}(A)$, we have $\rho_A((a, \pi)_n, \pi) \leq 1/n$.

Proof.
Let $A' \in \mathcal{B}(A)$ be chosen. If $a \notin A'$, then

$$(a, \pi)_n(A') \leq \pi(A') \leq (a, \pi)_n(A') + \frac{1}{n}; \qquad (A.3)$$

if $a \in A'$, then

$$(a, \pi)_n(A') - \frac{1}{n} \leq \pi(A') \leq (a, \pi)_n(A'). \qquad (A.4)$$

Hence, it is always true that

$$|(a, \pi)_n(A') - \pi(A')| \leq \frac{1}{n}. \qquad (A.5)$$

In view of (A.1) and (A.2), we have

$$\rho_A((a, \pi)_n, \pi) \leq \frac{1}{n}. \qquad (A.6)$$

We have thus completed the proof.

The following result is important for showing the near-trajectory evolution of aggregate environments in large multi-period games.
Lemma 3. Given a separable metric space $A$ and complete separable metric spaces $B$ and $C$, suppose $y^n \in \mathcal{M}(A^n, B^n)$ for every $n \in \mathbb{N}$, $\pi_A \in \mathcal{P}(A)$, $\pi_B \in \mathcal{P}(B)$, and $\pi_C \in \mathcal{P}(C)$. If

$$(\pi_A)^n(\{a \in A^n \mid \rho_B(\varepsilon(y^n(a)), \pi_B) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough, then

$$(\pi_A \times \pi_C)^n(\{(a, c) \in (A \times C)^n \mid \rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough.

Proof.
Suppose the sequence $\{\pi'_{B1}, \pi'_{B2}, \ldots\}$ weakly converges to the given probability measure $\pi_B$, and the sequence $\{\pi'_{C1}, \pi'_{C2}, \ldots\}$ weakly converges to the given probability measure $\pi_C$. We are to show that the sequence $\{\pi'_{B1} \times \pi'_{C1}, \pi'_{B2} \times \pi'_{C2}, \ldots\}$ weakly converges to $\pi_B \times \pi_C$.

Let $F(B)$ denote the family of uniformly continuous real-valued functions on $B$ with bounded support. Let $F(C)$ be similarly defined for $C$. We certainly have

$$\lim_{k \to +\infty} \int_B f(b) \cdot \pi'_{Bk}(db) = \int_B f(b) \cdot \pi_B(db), \quad \forall f \in F(B), \quad \text{and} \quad \lim_{k \to +\infty} \int_C f(c) \cdot \pi'_{Ck}(dc) = \int_C f(c) \cdot \pi_C(dc), \quad \forall f \in F(C). \qquad (A.7)$$

Define $F$ so that

$$F = \{f \mid f(b, c) = f_B(b) \cdot f_C(c) \text{ for any } (b, c) \in B \times C, \text{ where } f_B \in F(B) \cup \{\mathbf{1}\} \text{ and } f_C \in F(C) \cup \{\mathbf{1}\}\}, \qquad (A.8)$$

where $\mathbf{1}$ stands for the function whose value is 1 everywhere. By (A.7) and (A.8),

$$\lim_{k \to +\infty} \int_{B \times C} f(b, c) \cdot (\pi'_{Bk} \times \pi'_{Ck})(d(b, c)) = \int_{B \times C} f(b, c) \cdot (\pi_B \times \pi_C)(d(b, c)). \qquad (A.9)$$

According to Ethier and Kurtz [10] (Proposition III.4.4), $F(B)$ and $F(C)$ happen to be convergence determining families for $\mathcal{P}(B)$ and $\mathcal{P}(C)$, respectively. As $B$ and $C$ are complete, Ethier and Kurtz ([10], Proposition III.4.6, whose proof involves Prohorov's Theorem, i.e., the equivalence between tightness and relative compactness of a collection of probability measures defined on complete separable metric spaces) further state that $F$ as defined through (A.8) is convergence determining for $\mathcal{P}(B \times C)$. Therefore, we have the desired weak convergence by (A.9).
Let $\epsilon > 0$ be given. By the weak convergence just established, there exist $\delta_B > 0$ and $\delta_C > 0$, such that $\rho_B(\pi'_B, \pi_B) < \delta_B$ and $\rho_C(\pi'_C, \pi_C) < \delta_C$ will imply

$$\rho_{B \times C}(\pi'_B \times \pi'_C, \pi_B \times \pi_C) < \epsilon. \qquad (A.10)$$

By (A.1) and the given hypothesis, there is $\bar{n}_1 \in \mathbb{N}$, so that for $n = \bar{n}_1, \bar{n}_1 + 1, \ldots$,

$$(\pi_A)^n(\tilde{A}^n) > 1 - \frac{\epsilon}{2}, \qquad (A.11)$$

where $\tilde{A}^n$ contains all $a \in A^n$ such that

$$\rho_B(\varepsilon(y^n(a)), \pi_B) < \delta_B. \qquad (A.12)$$

By (A.1) and Lemma 1, on the other hand, there is $\bar{n}_2 \in \mathbb{N}$, so that for $n = \bar{n}_2, \bar{n}_2 + 1, \ldots$,

$$(\pi_C)^n(\tilde{C}^n) > 1 - \frac{\epsilon}{2}, \qquad (A.13)$$

where $\tilde{C}^n$ contains all $c \in C^n$ such that

$$\rho_C(\varepsilon(c), \pi_C) < \delta_C. \qquad (A.14)$$

For any $n = \bar{n}_1 \vee \bar{n}_2, \bar{n}_1 \vee \bar{n}_2 + 1, \ldots$, let $(a, c)$ be an arbitrary member of $\tilde{A}^n \times \tilde{C}^n$. We have from (A.10), (A.12), and (A.14) that

$$\rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \epsilon. \qquad (A.15)$$

Noting that the facilitating $(a, c)$ is but an arbitrary member of $\tilde{A}^n \times \tilde{C}^n$, we see that

$$(\pi_A \times \pi_C)^n(\{(a, c) \in (A \times C)^n \mid \rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \epsilon\}) \geq (\pi_A)^n(\tilde{A}^n) \times (\pi_C)^n(\tilde{C}^n), \qquad (A.16)$$

which, by (A.11) and (A.13), is greater than $1 - \epsilon$.

Because the equivalence between tightness and relative compactness of a collection of probability measures is indirectly related to the proof of Lemma 3, we require $B$ and $C$ to be complete separable metric spaces.
Lemma 4. Given separable metric spaces $A$, $B$, $C$, and $D$, as well as distributions $\pi_A \in \mathcal{P}(A)$, $\pi_B \in \mathcal{P}(B)$, and $\pi_C \in \mathcal{P}(C)$, suppose $y^n \in \mathcal{M}(A^n, B^n)$ for every $n \in \mathbb{N}$ and $z \in \mathcal{K}(B, C, \pi_C, D)$. If

$$(\pi_A \times \pi_C)^n(\{(a, c) \in A^n \times C^n \mid \rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \epsilon\}) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough, then

$$(\pi_A \times \pi_C)^n\left(\left\{(a, c) \in A^n \times C^n \mid \rho_D(\varepsilon(y^n(a), c) \cdot z^{-1}, (\pi_B \times \pi_C) \cdot z^{-1}) < \epsilon\right\}\right) > 1 - \epsilon,$$

for any $\epsilon > 0$ and any $n$ large enough.

Proof.
Let $\epsilon > 0$ be given. Since $z \in \mathcal{K}(B, C, \pi_C, D)$, there exist $C' \in \mathcal{B}(C)$ satisfying

$$\pi_C(C') > 1 - \frac{\epsilon}{2}, \qquad (A.17)$$

as well as

$$\delta \in \left(0, \frac{\epsilon}{2}\right], \qquad (A.18)$$

such that for any $b, b' \in B$ satisfying $d_B(b, b') < \delta$ and any $c \in C'$,

$$d_D(z(b, c), z(b', c)) < \epsilon. \qquad (A.19)$$

For any subset $D' \in \mathcal{B}(D)$, we therefore have

$$(z^{-1}(D'))^\delta \cap (B \times C') \subseteq z^{-1}((D')^\epsilon). \qquad (A.20)$$

This leads to $(z^{-1}(D'))^\delta \setminus (B \times (C \setminus C')) \subseteq z^{-1}((D')^\epsilon)$, and hence, due to (A.17),

$$(\pi_B \times \pi_C)\left(z^{-1}((D')^\epsilon)\right) \geq (\pi_B \times \pi_C)\left((z^{-1}(D'))^\delta\right) - \frac{\epsilon}{2}. \qquad (A.21)$$

On the other hand, by the hypothesis, we know for $n$ large enough,

$$(\pi_A \times \pi_C)^n(E'_n) > 1 - \delta, \qquad (A.22)$$

where

$$E'_n = \{(a, c) \in A^n \times C^n \mid \rho_{B \times C}(\varepsilon(y^n(a), c), \pi_B \times \pi_C) < \delta\} \in \mathcal{B}^n(A \times C). \qquad (A.23)$$

By (A.23), for any $(a, c) \in E'_n$ and $F' \in \mathcal{B}(B \times C)$,

$$(\pi_B \times \pi_C)((F')^\delta) \geq [\varepsilon(y^n(a), c)](F') - \delta. \qquad (A.24)$$

Combining the above, we have, for any $(a, c) \in E'_n$ and $D' \in \mathcal{B}(D)$,

$$[(\pi_B \times \pi_C) \cdot z^{-1}]((D')^\epsilon) = (\pi_B \times \pi_C)(z^{-1}((D')^\epsilon)) \geq (\pi_B \times \pi_C)((z^{-1}(D'))^\delta) - \epsilon/2 \geq [\varepsilon(y^n(a), c)](z^{-1}(D')) - \delta - \epsilon/2 \geq [\varepsilon(y^n(a), c)](z^{-1}(D')) - \epsilon = ([\varepsilon(y^n(a), c)] \cdot z^{-1})(D') - \epsilon, \qquad (A.25)$$

where the first inequality is due to (A.21), the second inequality is due to (A.24), and the third inequality is due to (A.18). That is, we have

$$\rho_D\left(\varepsilon(y^n(a), c) \cdot z^{-1}, (\pi_B \times \pi_C) \cdot z^{-1}\right) \leq \epsilon, \quad \forall (a, c) \in E'_n. \qquad (A.26)$$

In view of (A.18) and (A.22), we have the desired result.

We can now prove Proposition 1 and then Theorem 1.
Proof of Proposition 1. Let $t = 1, \ldots, \bar{t}$ and $x \in \mathcal{K}(S, G, \gamma, X)$ be given. Define the map $z \in \mathcal{M}(S \times G \times I, S)$, so that

$$z(s, g, i) = \theta_t(s, x(s, g), M(\sigma, x), i), \quad \forall s \in S, g \in G, i \in I. \qquad (A.27)$$

In view of (7) and (A.27), we have, for any $S' \in \mathcal{B}(S)$,

$$[T_t(x) \circ \sigma](S') = \int_S \int_G \int_I \mathbf{1}(z(s, g, i) \in S') \cdot \iota(di) \cdot \gamma(dg) \cdot \sigma(ds) = (\sigma \times \gamma \times \iota)(\{(s, g, i) \in S \times G \times I \mid z(s, g, i) \in S'\}) = (\sigma \times \gamma \times \iota)(z^{-1}(S')). \qquad (A.28)$$

For $n \in \mathbb{N}$, $g = (g_1, \ldots, g_n) \in G^n$, and $i = (i_1, \ldots, i_n) \in I^n$, also define the operator $T'_n(g, i)$ on $\mathcal{P}_n(S)$ so that $T'_n(g, i) \circ \varepsilon(s) = \varepsilon(s')$, where for $m = 1, 2, \ldots, n$,

$$s'_m = z(s_m, g_m, i_m) = \theta_t(s_m, x(s_m, g_m), M(\sigma, x), i_m). \qquad (A.29)$$

It is worth noting that (A.29) is different from the earlier (16). In view of (A.27) and (A.29), we have, for $S' \in \mathcal{B}(S)$, that $[T'_n(g, i) \circ \varepsilon(s)](S')$ equals

$$\frac{1}{n} \cdot \sum_{m=1}^n \mathbf{1}(z(s_m, g_m, i_m) \in S') = \varepsilon((s_1, g_1, i_1), \ldots, (s_n, g_n, i_n))\left(z^{-1}(S')\right). \qquad (A.30)$$

Combining (A.28) and (A.30), we arrive at the key observation that

$$T_t(x) \circ \sigma = (\sigma \times \gamma \times \iota) \cdot z^{-1}, \quad \text{while} \quad T'_n(g, i) \circ \varepsilon(s) = \varepsilon(s, g, i) \cdot z^{-1}. \qquad (A.31)$$

In the rest of the proof, we first show the asymptotic closeness between $T_t(x) \circ \sigma$ and $T'_n(g, i) \circ \varepsilon(s^n(a))$, and then that between the latter and $T_{nt}(x, g, i) \circ \varepsilon(s^n(a))$.

First, due to the hypothesis on the convergence of $\varepsilon(s^n(a))$ to $\sigma$, the completeness of the spaces $S$, $G$, and $I$ (and hence also the completeness of $G \times I$), as well as Lemma 3,

$$(\pi \times \gamma \times \iota)^n(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_{S \times G \times I}(\varepsilon(s^n(a), g, i), \sigma \times \gamma \times \iota) < \epsilon'\}) > 1 - \epsilon', \qquad (A.32)$$

for any $\epsilon' > 0$ and $n$ large enough. By (S1) and the fact that $x \in \mathcal{K}(S, G, \gamma, X)$, we may see that $z$ as defined through (A.27) is a member of $\mathcal{K}(S, G \times I, \gamma \times \iota, S)$. By Lemma 4, this fact along with (A.32) will lead to the strict dominance of $1 - \epsilon'$ by

$$(\pi \times \gamma \times \iota)^n(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_S(\varepsilon(s^n(a), g, i) \cdot z^{-1}, (\sigma \times \gamma \times \iota) \cdot z^{-1}) < \epsilon'\}), \qquad (A.33)$$

for any $\epsilon' > 0$ and $n$ large enough. By (A.31), this is equivalent to the statement that, given $\epsilon > 0$, there is $\bar{n}_1 \in \mathbb{N}$ so that for any $n = \bar{n}_1, \bar{n}_1 + 1, \ldots$,

$$(\pi \times \gamma \times \iota)^n\left(\tilde{A}^n(\epsilon)\right) > 1 - \frac{\epsilon}{3}, \qquad (A.34)$$

where $\tilde{A}^n(\epsilon) \in \mathcal{B}^n(A \times G \times I)$ is equal to

$$\left\{(a, g, i) \in (A \times G \times I)^n \mid \rho_S(T_t(x) \circ \sigma, T'_n(g, i) \circ \varepsilon(s^n(a))) < \frac{\epsilon}{2}\right\}. \qquad (A.35)$$

Next, note that the only difference between $T_{nt}(x, g, i) \circ \varepsilon(s^n(a))$ and $T'_n(g, i) \circ \varepsilon(s^n(a))$ lies in that $\varepsilon(s^{n,-m}(a), g_{-m})$ is used in the former, as in (16), whereas $\sigma \times \gamma$ is used in the latter, as in (A.29). Here, $s^{n,-m}(a)$ refers to the vector $(s^n_1(a), \ldots, s^n_{m-1}(a), s^n_{m+1}(a), \ldots, s^n_n(a))$.
and $I' \in \mathcal{B}(I)$ with

$$\iota(I') > 1 - \epsilon/4, \tag{A.36}$$

so that for any $(s, g, i) \in S \times G \times I'$ and any $\mu' \in \mathcal{P}(S \times X)$ satisfying $\rho_{S \times X}(M(\sigma, x), \mu') < \delta$,

$$d_S\big(\theta_t(s, x(s, g), M(\sigma, x), i), \theta_t(s, x(s, g), \mu', i)\big) < \epsilon/4. \tag{A.37}$$

For each $n \in \mathbb{N}$, define $I'_n$ so that

$$I'_n = \Big\{i = (i_1, \ldots, i_n) \in I^n \mid \text{more than } \Big(1 - \frac{\epsilon}{2}\Big) \cdot n \text{ components come from } I'\Big\}. \tag{A.38}$$

Also important is that by (A.37) and (A.38), for any $S' \in \mathcal{B}(S)$ and $i = (i_1, \ldots, i_n) \in I'_n$,

$$[T^n_t(x, g, i) \circ \varepsilon(s^n(a))]\big((S')^{\epsilon/2}\big) + \epsilon/2 \ge [T'_n(g, i) \circ \varepsilon(s^n(a))](S'), \tag{A.39}$$

whenever

$$\rho_{S \times X}\big(M(\sigma, x), M^n(s^{n,-m}(a), g_{-m}, x)\big) < \delta, \qquad m = 1, \ldots, n. \tag{A.40}$$

It can be shown that $I'_n$ will occupy a big chunk of $I^n$ as measured by $\iota^n$ when $n$ is large. Define map $q$ from $I$ to $\{0, 1\}$ so that $q(i) = 1$ or $0$ depending on whether or not $i \in I'$. By (A.36), $\iota \cdot q^{-1}$ is a Bernoulli distribution with $(\iota \cdot q^{-1})(\{1\}) > 1 - \epsilon/4$.
So by (A.38), $I'_n$ contains all $i = (i_1, \ldots, i_n) \in I^n$ that satisfy

$$\rho_{\{0,1\}}\big(\varepsilon(i) \cdot q^{-1}, \iota \cdot q^{-1}\big) < \epsilon/4. \tag{A.41}$$

Therefore, by Lemma 1, there exists $\bar n_2 \in \mathbb{N}$, so that for $n = \bar n_2, \bar n_2 + 1, \ldots$,

$$\iota^n(I'_n) > 1 - \epsilon/3. \tag{A.42}$$

We can also demonstrate that (A.40) will be highly likely when $n$ is large. By Lemma 3 and the hypothesis on the convergence of $\varepsilon(s^n(a))$ to $\sigma$, we know $\varepsilon(s^n(a), g)$ will converge to $\sigma \times \gamma$ in probability. Due to Lemma 2, this conclusion applies to the sequence $\varepsilon(s^{n,-m}(a), g_{-m})$ as well. The fact that $x \in K(S, G, \gamma, X)$ certainly leads to $(\mathrm{prj}_S, x) \in K(S, G, \gamma, S \times X)$. So by Lemma 4, there is $\bar n_3 \in \mathbb{N}$, so that for $n = \bar n_3, \bar n_3 + 1, \ldots$,

$$(\pi^n \times \gamma^n)\big(\tilde B^n(\delta)\big) > 1 - \epsilon/3, \tag{A.43}$$

where

$$\tilde B^n(\delta) = \{(a, g) \in A^n \times G^n \mid \text{(A.40) is true}\} \in \mathcal{B}^n(A \times G). \tag{A.44}$$

Consider arbitrary $n = \bar n_1 \vee \bar n_2 \vee \bar n_3, \bar n_1 \vee \bar n_2 \vee \bar n_3 + 1, \ldots$, $(a, g, i) \in \tilde A^n(\epsilon) \cap (\tilde B^n(\delta) \times I'_n)$, and $S' \in \mathcal{B}(S)$. By (A.1) and (A.35), we see that

$$[T'_n(g, i) \circ \varepsilon(s^n(a))]\big((S')^{\epsilon/2}\big) + \epsilon/2 \ge [T_t(x) \circ \sigma](S'). \tag{A.45}$$

Combining this with (A.39), (A.40), and (A.44), we obtain

$$[T^n_t(x, g, i) \circ \varepsilon(s^n(a))]\big((S')^\epsilon\big) + \epsilon \ge [T'_n(g, i) \circ \varepsilon(s^n(a))]\big((S')^{\epsilon/2}\big) + \epsilon/2 \ge [T_t(x) \circ \sigma](S'). \tag{A.46}$$

According to (A.1), this means

$$\rho_S\big(T^n_t(x, g, i) \circ \varepsilon(s^n(a)), T_t(x) \circ \sigma\big) \le \epsilon. \tag{A.47}$$

Therefore, for $n \ge \bar n_1 \vee \bar n_2 \vee \bar n_3$,

$$(\pi \times \gamma \times \iota)^n\big(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_S\big(T^n_t(x, g, i) \circ \varepsilon(s^n(a)), T_t(x) \circ \sigma\big) \le \epsilon\}\big) \ge (\pi \times \gamma \times \iota)^n\big(\tilde A^n(\epsilon) \cap (\tilde B^n(\delta) \times I'_n)\big), \tag{A.48}$$

whereas the latter is, in view of (A.34), (A.42), and (A.43), greater than $1 - \epsilon$.
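To make the two transition operators in this proof concrete, the following sketch simulates one period of a hypothetical one-dimensional specification; none of the functional forms below come from the model, and the population distribution entering $\theta_t$ is summarized by its mean action for simplicity. The operator $T'_n$ lets every player transition against the limiting aggregate $M(\sigma, x)$, while the finite-game operator $T^n_t$ uses each player's leave-one-out empirical aggregate $M^n(s^{n,-m}, g_{-m}, x)$.

```python
import numpy as np

rng = np.random.default_rng(1)

def theta(s, a, mu_mean, i):
    # Hypothetical transition kernel theta_t: drifts toward the mean
    # action of the population distribution mu (summarized by mu_mean).
    return 0.9 * s + 0.1 * (a + mu_mean) + 0.05 * i

def x(s, g):
    # Hypothetical observation-blind action plan x_t(s, g).
    return np.tanh(s) + 0.1 * g

n = 5_000
s = rng.normal(size=n)          # current states s^n
g = rng.normal(size=n)          # action shocks, one per player
i = rng.normal(size=n)          # transition shocks, one per player
a = x(s, g)                     # actions under the plan x

# NG-style operator T'_n: every player faces the limiting mean action,
# approximated here by a large auxiliary sample.
S_big, G_big = rng.normal(size=10**6), rng.normal(size=10**6)
m_limit = x(S_big, G_big).mean()
s_next_prime = theta(s, a, m_limit, i)

# Finite-game operator T^n_t: player m faces the empirical mean action
# of the other n - 1 players.
m_emp = (a.sum() - a) / (n - 1)
s_next_finite = theta(s, a, m_emp, i)

# Proposition 1 says the resulting empirical distributions are close;
# in this toy the states even agree player by player up to O(1/sqrt(n)).
print(np.abs(s_next_prime - s_next_finite).max())
```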
Proof of Theorem 1: We use induction to show that, for each $\tau = 0, 1, \ldots, \bar t - t + 1$,

$$(\sigma_t \times \gamma^\tau \times \iota^\tau)^n\big(\tilde A^n_\tau(\epsilon)\big) > 1 - \frac{\epsilon}{\bar t - t + 2}, \tag{A.49}$$

for any $\epsilon > 0$ and $n$ large enough, where $\tilde A^n_\tau(\epsilon) \in \mathcal{B}^n(S \times G^\tau \times I^\tau)$ is such that, for any $(s_t, g_{[t,t+\tau-1]}, i_{[t,t+\tau-1]}) \in \tilde A^n_\tau(\epsilon)$,

$$\rho_S\big(T^{n,[t,t+\tau-1]}(x_{[t,t+\tau-1]}, g_{[t,t+\tau-1]}, i_{[t,t+\tau-1]}) \circ \varepsilon(s_t),\ T^{[t,t+\tau-1]}(x_{[t,t+\tau-1]}) \circ \sigma_t\big) < \epsilon. \tag{A.50}$$

Once the above is achieved, we can then define $\tilde A^n(\epsilon)$ required in the theorem by

$$\tilde A^n(\epsilon) = \bigcap_{\tau=0}^{\bar t - t + 1} \big[\tilde A^n_\tau(\epsilon) \times G^{n \cdot (\bar t - t + 1 - \tau)} \times I^{n \cdot (\bar t - t + 1 - \tau)}\big]. \tag{A.51}$$

This and (A.49) will lead to

$$\big(\sigma_t \times \gamma^{\bar t - t + 1} \times \iota^{\bar t - t + 1}\big)^n\big(\tilde A^n(\epsilon)\big) > \Big(1 - \frac{\epsilon}{\bar t - t + 2}\Big)^{\bar t - t + 2} > 1 - \epsilon, \tag{A.52}$$

for any $\epsilon > 0$ and $n$ large enough.

Now we proceed with the induction process. First, note that $T^{n,[t,t-1]} \circ \varepsilon(s_t)$ is merely $\varepsilon(s_t)$ itself and $T^{[t,t-1]} \circ \sigma_t$ is merely $\sigma_t$ itself. Hence, we will have (A.49) for $\tau = 0$ for any $\epsilon > 0$ and $n$ large enough just by Lemma 1. Then, for some $\tau = 1, 2, \ldots, \bar t - t + 1$, suppose

$$\big(\sigma_t \times \gamma^{\tau-1} \times \iota^{\tau-1}\big)^n\big(\tilde A^n_{\tau-1}(\epsilon)\big) > 1 - \frac{\epsilon}{\bar t - t + 2}, \tag{A.53}$$

for any $\epsilon > 0$ and $n$ large enough. We may apply Proposition 1 to the above, while at the same time identifying $S \times G^{\tau-1} \times I^{\tau-1}$ with $A$, $\sigma_t \times \gamma^{\tau-1} \times \iota^{\tau-1}$ with $\pi$, $x_{t+\tau-1}$ with $x$, $T^{n,[t,t+\tau-2]}(x_{[t,t+\tau-2]}, g_{[t,t+\tau-2]}, i_{[t,t+\tau-2]}) \circ \varepsilon(s_t)$ with $\varepsilon(s^n(a))$, and $T^{[t,t+\tau-2]}(x_{[t,t+\tau-2]}) \circ \sigma_t$ with $\sigma$. This way, we will verify (A.49) for any $\epsilon > 0$ and $n$ large enough. Therefore, the induction process can be completed.
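The last inequality in (A.52) is an instance of Bernoulli's inequality: writing $k = \bar t - t + 2 \ge 2$ and noting $0 < \epsilon/k < 1$ for the relevant $\epsilon$,

$$\Big(1 - \frac{\epsilon}{k}\Big)^{k} \ge 1 - k \cdot \frac{\epsilon}{k} = 1 - \epsilon,$$

with the inequality strict since $k \ge 2$ and $\epsilon > 0$.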
B Technical Developments in Section 6.2

Proof of Proposition 2:
Because payoff functions are bounded, the value functions are bounded too. We then prove by induction on $t$. By (11), we know the result is true for $t = \bar t + 1$.
Suppose for some $t = \bar t, \bar t - 1, \ldots, 2$, we have the continuity of $v_{t+1}(s_{t+1}, \sigma_{t+1}, x_{[t+1,\bar t]}, x_{t+1})$ in $s_{t+1}$. By this induction hypothesis, the probabilistic continuity of $x_t$, (S1), (F1), and the boundedness of the value functions, we see the continuity of the right-hand side of (12) in $s_t$. So, $v_t(s_t, \sigma_t, x_{[t,\bar t]}, x_t)$ is continuous in $s_t$, and we have completed our induction process.
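The backward recursion that drives this induction can be sketched numerically. The following toy implementation of the pattern in (11) and (12) uses hypothetical one-dimensional primitives, with the population term in $\theta_t$ frozen and the double integral over $(g, i)$ approximated by paired Monte Carlo nodes; it is meant only to show the mechanics of the recursion, with the continuity of each $v_t$ in the state visible in the smooth grid values.

```python
import numpy as np

rng = np.random.default_rng(2)
S_GRID = np.linspace(-2.0, 2.0, 201)     # state grid for interpolation
G_NODES = rng.normal(size=500)           # Monte Carlo nodes for gamma
I_NODES = rng.normal(size=500)           # Monte Carlo nodes for iota

def psi(s, a):        # hypothetical bounded per-period payoff psi_t
    return np.tanh(s * a)

def theta(s, a, i):   # hypothetical continuous transition theta_t
    return np.clip(0.8 * s + 0.2 * a + 0.1 * i, -2.0, 2.0)

def x(s, g):          # hypothetical observation-blind plan x_t
    return np.tanh(s + 0.2 * g)

t_bar = 5
v = np.zeros_like(S_GRID)                # terminal condition, as in (11)
for _ in range(t_bar):
    # One backward step in the spirit of (12): average the payoff plus
    # continuation value over the shock pairs (g, i).
    s = S_GRID[:, None]
    a = x(s, G_NODES[None, :])
    s_next = theta(s, a, I_NODES[None, :])
    cont = np.interp(s_next, S_GRID, v)  # continuation value v_{t+1}
    v = (psi(s, a) + cont).mean(axis=1)

print(v[:5])  # v_1 on the first few grid points: varies smoothly in s
```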
Proof of Proposition 3: We prove by induction on $t$. By (11) and (19), we know the result is true for $t = \bar t + 1$.
Suppose for some $t = \bar t, \bar t - 1, \ldots, 2$, we have the convergence of $v^n_{t+1}(s_{t+1,1}, \varepsilon(s^n_{t+1,-1}), x_{[t+1,\bar t]}, x_{t+1})$ to $v_{t+1}(s_{t+1,1}, \sigma_{t+1}, x_{[t+1,\bar t]}, x_{t+1})$ at an $s_{t+1,1}$-independent rate when $s_{t+1,-1} = (s_{t+1,2}, s_{t+1,3}, \ldots)$ is sampled from $\sigma_{t+1}$. Now, suppose $s_{t,-1} = (s_{t2}, s_{t3}, \ldots)$ is sampled from $\sigma_t$. Let also $g = (g_1, g_2, \ldots)$ be generated through sampling on $(G, \mathcal{B}(G), \gamma)$ and $i = (i_1, i_2, \ldots)$ be generated through sampling on $(I, \mathcal{B}(I), \iota)$. In the remainder of the proof, we let $s^n_t = (s_{t1}, s_{t2}, \ldots, s_{tn})$ for any arbitrary $s_{t1} \in S$, $g^n = (g_1, \ldots, g_n)$, and $i^n = (i_1, \ldots, i_n)$.

Due to Lemma 1, $\varepsilon(s^n_{t,-1})$ will converge to $\sigma_t$. By Lemma 2, $\varepsilon(s^n_t)$ will converge to $\sigma_t$ at an $s_{t1}$-independent rate. By Proposition 1, we know that $T^n_t(x_t, g^n, i^n) \circ \varepsilon(s^n_t)$ will converge to $T_t(x_t) \circ \sigma_t$ in probability at an $s_{t1}$-independent rate, and by Lemma 2 again, so will $[T^n_t(x_t, g^n, i^n) \circ \varepsilon(s^n_t)]_{-1}$ to $T_t(x_t) \circ \sigma_t$. Now Lemma 3 will lead to the convergence in probability of $\varepsilon(s^n_{t,-1}, g^n_{-1})$ to $\sigma_t \times \gamma$. Due to $x_t$'s probabilistic continuity, Lemma 4 will lead to the convergence in probability of $M^n(s^n_{t,-1}, g^n_{-1}, x_t)$ to $M(\sigma_t, x_t)$. Thus,

1. $\psi_t(s_{t1}, x_t(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x_t))$ will converge to $\psi_t(s_{t1}, x_t(s_{t1}, g_1), M(\sigma_t, x_t))$ in probability at an $s_{t1}$-independent rate due to (F2);

2. $v^n_{t+1}(\theta_t(s_{t1}, x_t(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x_t), i_1), [T^n_t(x_t, g^n, i^n) \circ \varepsilon(s^n_t)]_{-1}, x_{[t+1,\bar t]}, x_{t+1})$ will converge to $v_{t+1}(\theta_t(s_{t1}, x_t(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x_t), i_1), T_t(x_t) \circ \sigma_t, x_{[t+1,\bar t]}, x_{t+1})$ in probability at an $s_{t1}$-independent rate due to the induction hypothesis; the latter will in turn converge to $v_{t+1}(\theta_t(s_{t1}, x_t(s_{t1}, g_1), M(\sigma_t, x_t), i_1), T_t(x_t) \circ \sigma_t, x_{[t+1,\bar t]}, x_{t+1})$ in probability at an $s_{t1}$-independent rate due to (S2) and Proposition 2.

As per-period payoffs are bounded, all value functions are bounded. The above convergences will then lead to the convergence of the right-hand side of (20) to the right-hand side of (12) at an $s_{t1}$-independent rate. That is, $v^n_t(s_{t1}, \varepsilon(s^n_{t,-1}), x_{[t,\bar t]}, x_t)$ will converge to $v_t(s_{t1}, \sigma_t, x_{[t,\bar t]}, x_t)$ at a rate independent of $s_{t1}$. We have completed the induction process.
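The $s$-independence of the convergence rate is the delicate part of Proposition 3. A one-period toy version can be checked numerically: with a hypothetical payoff $\psi(s, m) = \sin s \cdot \tanh m$ that is Lipschitz in the co-players' aggregate $m$ uniformly in $s$ (the role played by (F2) in the proof above), the worst-case gap over a grid of own states between the finite-$n$ value and its nonatomic limit shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)

def psi(s, m):
    # Hypothetical one-period payoff in own state s and the co-players'
    # aggregate m; Lipschitz in m uniformly in s.
    return np.sin(s) * np.tanh(m)

s_grid = np.linspace(-3.0, 3.0, 13)       # candidate own states
true_m = 0.5                              # population mean under sigma_t
v_limit = psi(s_grid, true_m)             # nonatomic-limit value

for n in [10, 100, 1_000, 10_000]:
    reps = 400
    others = rng.normal(true_m, 1.0, size=(reps, n - 1))
    m_emp = others.mean(axis=1)           # empirical aggregate per replication
    v_n = psi(s_grid[:, None], m_emp[None, :]).mean(axis=1)
    print(n, np.abs(v_n - v_limit).max()) # worst-case gap over s shrinks
```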
Proof of Theorem 2: Let us consider subgames starting with some time $t = 1, 2, \ldots, \bar t$. For convenience, we let $\sigma_t = T^{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1$. Now let $s_t = (s_{t1}, s_{t2}, \ldots)$ be generated through sampling on $(S, \mathcal{B}(S), \sigma_t)$, $g = (g_1, g_2, \ldots)$ be generated through sampling on $(G, \mathcal{B}(G), \gamma)$, and $i = (i_1, i_2, \ldots)$ be generated through sampling on $(I, \mathcal{B}(I), \iota)$. In the remainder of the proof, we let $s^n_t = (s_{t1}, \ldots, s_{tn})$, $s^n_{t,-1} = (s_{t2}, \ldots, s_{tn})$, $g^n = (g_1, \ldots, g_n)$, and $i^n = (i_1, \ldots, i_n)$.

By Lemma 1 and Proposition 1, we know that $\varepsilon(s^n_t) = \varepsilon(s_{t1}, \ldots, s_{tn})$ converges to $\sigma_t$ in probability, and also that $T^n_t(x^*_t, g^n, i^n) \circ \varepsilon(s^n_t)$ converges to $T_t(x^*_t) \circ \sigma_t$ in probability. Due to Lemma 2, $\varepsilon(s^n_{t,-1})$ and $[T^n_t(x^*_t, g^n, i^n) \circ \varepsilon(s^n_t)]_{-1}$ will have the same respective convergences. Also, Lemma 3 will lead to the convergence in probability of $\varepsilon(s^n_{t,-1}, g^n_{-1})$ to $\sigma_t \times \gamma$. Due to $x^*_t$'s probabilistic continuity, Lemma 4 will lead to the convergence in probability of $M^n(s^n_{t,-1}, g^n_{-1}, x^*_t)$ to $M(\sigma_t, x^*_t)$. Then,

1. $\psi_t(s_{t1}, y(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x^*_t))$ will converge to $\psi_t(s_{t1}, y(s_{t1}, g_1), M(\sigma_t, x^*_t))$ in probability at a $y$-independent rate due to (F2);

2. $v^n_{t+1}(\theta_t(s_{t1}, y(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x^*_t), i_1), [T^n_t(x^*_t, g^n, i^n) \circ \varepsilon(s^n_t)]_{-1}, x^*_{[t+1,\bar t]}, x^*_{t+1})$ will converge to $v_{t+1}(\theta_t(s_{t1}, y(s_{t1}, g_1), M^n(s^n_{t,-1}, g^n_{-1}, x^*_t), i_1), T_t(x^*_t) \circ \sigma_t, x^*_{[t+1,\bar t]}, x^*_{t+1})$ in probability at a $y$-independent rate due to Proposition 3, which, due to (S2) and Proposition 2, will converge to $v_{t+1}(\theta_t(s_{t1}, y(s_{t1}, g_1), M(\sigma_t, x^*_t), i_1), T_t(x^*_t) \circ \sigma_t, x^*_{[t+1,\bar t]}, x^*_{t+1})$ in probability at a $y$-independent rate.

As per-period payoffs are bounded, all value functions are bounded. By (12) and (20), the above convergences will then lead to the convergence of the left-hand side of (21) to the left-hand side of (14). At the same time, the right-hand side of (21) plus $\epsilon$ will converge to the right-hand side of (14) due to the convergence of $\varepsilon(s^n_{t,-1})$ to $\sigma_t$, Proposition 3, and the uniform boundedness of the value functions. By (14), as long as $n$ is large enough, (21) will be true for any $\epsilon > 0$ and $y \in \mathcal{M}(S \times G, X)$. This would then lead to the final conclusion due to Theorem 1 and the boundedness of payoff functions.
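Theorem 2 says, in effect, that the gain from deviating away from the observation-blind NG plan vanishes as $n$ grows. That phenomenon can be seen in a deliberately simple one-period quadratic game that is not the paper's model: player $m$ earns $-(a_m - s_m - \beta \bar a_{-m})^2$, where $\bar a_{-m}$ is the other players' mean action. In the continuum limit the plan $x^*(s) = s + \beta \mathbb{E}[a]$, with $\mathbb{E}[a] = \mathbb{E}[s]/(1 - \beta)$, is exactly optimal, and its per-player deviation gain in the $n$-player game shrinks like $1/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 0.5            # strength of the aggregate term (hypothetical)
mean_s = 0.0          # population mean state, so E[a] = 0 here

# NG equilibrium plan for the toy game: x*(s) = s + beta * E[a].
x_star = lambda s: s + beta * mean_s / (1 - beta)

for n in [10, 100, 1_000, 10_000]:
    gains = []
    for _ in range(200):                    # replications
        s = rng.normal(mean_s, 1.0, size=n)
        a = x_star(s)
        abar = (a.sum() - a) / (n - 1)      # others' mean action, per player
        # The best deviation attains payoff 0; playing x* instead costs
        # the squared gap, which is the per-player deviation gain.
        gains.append(np.mean((x_star(s) - s - beta * abar) ** 2))
    print(n, np.mean(gains))                # shrinks roughly like 1/n
```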
C Technical Developments in Section 7
Proof of Theorem 3:
Let $\epsilon > 0$ be given. For $t = 1, 2, \ldots$ satisfying $t \ge \ln\big(6\bar\psi/(\epsilon \cdot (1-\alpha))\big)/\ln(1/\alpha) + 1$, we have from (33) and (34),

$$\big|v^n_\infty(s_1, \varepsilon(s_{-1}), x^*, y) - v^n_t(s_1, \varepsilon(s_{-1}), x^*, y)\big| < \epsilon/6. \tag{C.1}$$

Therefore, we need merely to select such a large $t$ and show that, when $n$ is large enough,

$$\int_{S^n} v^n_t(s_1, \varepsilon(s_{-1}), x^*, x^*) \cdot (\sigma^*)^n(ds) \ge \int_{S^n} v^n_t(s_1, \varepsilon(s_{-1}), x^*, y) \cdot (\sigma^*)^n(ds) - \epsilon/2. \tag{C.2}$$

For $t = 1, 2, \ldots$, since $(x^*, \sigma^*)$ forms an equilibrium for $\Gamma$, we know (30) is true. This, as well as (28) and (29), lead to

$$\alpha^{t-\tau} \cdot \Big[\int_S v_\tau(s, \sigma^*, x^*, y) \cdot \sigma^*(ds) - \int_S v_\tau(s, \sigma^*, x^*, x^*) \cdot \sigma^*(ds)\Big] \le \alpha^{t-1} \cdot \frac{\bar\psi}{1-\alpha} \le \frac{\epsilon}{6}, \tag{C.3}$$

for $\tau = 1, 2, \ldots, t$, $g \in G$, $s \in S$, and $y \in \mathcal{M}(S \times G, X)$.

We associate entities here with those defined in Section 4 when the $\bar t$ there is fixed at the $t$ here. To signify the difference in the two notational systems, we add superscript "$K$" to symbols defined for the previous section. For instance, we write $v^K_\tau$ for the $v_\tau$ defined in that section, which has a different meaning than the $v_\tau$ here. Now, our $\alpha^{t-\tau} \cdot v_\tau(s, \sigma^*, x^*, y)$ can be understood as $v^K_{t+1-\tau}(s, \sigma^*, x', y)$, with $x' = (x'_{t+1-\tau}, \ldots, x'_t) \in (\mathcal{M}(S \times G, X))^\tau$ being such that $x'_{t'} = x^*$ for $t' = t+1-\tau, \ldots, t$. Due to the association of $\sigma^*$ with $x^*$ through the definition (26), we can understand $\sigma^*$ as $T^{K,[1,\tau-1]}(x'_{[1,\tau-1]}) \circ \sigma^K_1$, where $x'_{[1,\tau-1]} = (x'_1, \ldots, x'_{\tau-1}) \in (\mathcal{M}(S \times G, X))^{\tau-1}$ is such that $x'_{t'} = x^*$ for $t' = 1, 2, \ldots, \tau - 1$. By (C.3), $(x^*, \sigma^*)$ offers an $(\epsilon/6)$-equilibrium to $\Gamma^K(\sigma^*)$ with $\bar t = t$, $\theta^K_\tau = \theta$, and $\psi^K_\tau = \alpha^{\tau-1} \cdot \psi$. Even though Theorem 2 is nominally about going from a 0-equilibrium for the nonatomic game to $\epsilon$-equilibria for finite games, we can follow exactly the same logic used to prove it to go from an $(\epsilon/6)$-equilibrium for the nonatomic game to $(\epsilon/2)$-equilibria for finite games: for $n$ large enough and any $y \in \mathcal{M}(S \times G, X)$,

$$\int_{S^n} (\sigma^K_1)^n(ds) \cdot v^{Kn}_1\big(s_1, \varepsilon(s_{-1}), x'_{[1,t]}, x'_1\big) \ge \int_{S^n} (\sigma^K_1)^n(ds) \cdot v^{Kn}_1\big(s_1, \varepsilon(s_{-1}), x'_{[1,t]}, y\big) - \epsilon/2, \tag{C.4}$$

where $x'_{[1,t]}$ is again to be understood as the policy that takes action $x^*(s, g)$ whenever the most immediate state-shock pair is $(s, g)$. But this translates into (C.2).
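The choice of the truncation time $t$ at the start of this proof can be checked mechanically; the following minimal sketch uses arbitrary illustrative parameter values.

```python
import math

def truncation_time(eps, alpha, psi_bar):
    """Smallest integer t with t >= ln(6*psi_bar/(eps*(1-alpha)))/ln(1/alpha) + 1,
    which forces the discounted tail alpha^(t-1) * psi_bar / (1-alpha) <= eps/6,
    the bound behind (C.1)."""
    t = math.ceil(math.log(6 * psi_bar / (eps * (1 - alpha))) / math.log(1 / alpha) + 1)
    assert alpha ** (t - 1) * psi_bar / (1 - alpha) <= eps / 6
    return t

print(truncation_time(eps=0.1, alpha=0.9, psi_bar=1.0))  # prints 62
```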