A Unified Approach to Dynamic Decision Problems with Asymmetric Information - Part II: Strategic Agents
Hamidreza Tavafoghi, Yi Ouyang, and Demosthenis Teneketzis
Abstract
We study a general class of dynamic games with asymmetric information where agents' beliefs are strategy dependent, i.e., signaling occurs. We show that the notion of sufficient information, introduced in the companion paper [2], can be used to effectively compress the agents' information in a mutually consistent manner that is sufficient for decision-making purposes. We present instances of dynamic games with asymmetric information where we can characterize a time-invariant information state for each agent. Based on the notion of sufficient information, we define a class of equilibria for dynamic games called Sufficient Information Based Perfect Bayesian Equilibrium (SIB-PBE). Utilizing the notion of SIB-PBE, we provide a sequential decomposition of dynamic games with asymmetric information over time; this decomposition leads to a dynamic program that determines SIB-PBEs of dynamic games. Furthermore, we provide conditions under which we can guarantee the existence of SIB-PBE.
I. INTRODUCTION
We study a general class of stochastic dynamic games with asymmetric information. We consider a setting where the underlying system has Markovian dynamics controlled by the agents' joint actions at every time. The instantaneous utility of each agent depends on the agents' joint
A preliminary version of this paper will appear in the Proceedings of the 57th IEEE Conference on Decision and Control (CDC), Miami Beach, FL, December 2018 [1]. H. Tavafoghi is with the Department of Mechanical Engineering at the University of California, Berkeley (e-mail: [email protected]). Y. Ouyang is with Preferred Networks America, Inc. (e-mail: [email protected]). D. Teneketzis is with the Department of Electrical Engineering and Computer Science at the University of Michigan, Ann Arbor (e-mail: [email protected]). This work was supported in part by NSF grants CNS-1238962 and CCF-1111061, ARO-MURI grant W911NF-13-1-0421, and ARO grant W911NF-17-1-0232.
actions and the system state. At every time, each agent makes a private noisy observation that depends on the current system state and the agents' past actions. Therefore, at every time agents have asymmetric and imperfect information about the history of the game. Furthermore, the information that an agent possesses about the history of the game at each time instant depends on the other agents' past actions and strategies; this phenomenon is known as signaling among the agents. Moreover, at each time, each agent's strategy depends on his information about the current system state and the other agents' strategies. Therefore, the agents' decisions and information are coupled and interdependent over time.

There are three main challenges in the study of dynamic games with asymmetric information. First, since the agents' decisions and information are interdependent and coupled over time, we need to determine the agents' strategies simultaneously for all times. Second, the agents' strategy domains grow over time as they acquire more information. Third, in contrast to dynamic teams where agents coordinate their strategies, in dynamic games each agent's strategy is his private information, as he chooses it individually so as to maximize his utility. Therefore, in dynamic games each agent needs to form a belief about other agents' strategies as well as about the game history.

In this paper, we propose a general approach for the study of dynamic games with asymmetric information that addresses the challenges stated above. We build our approach on the notion of sufficient information, introduced in the companion paper [2], and define a class of sufficient information based assessments, where strategic agents compress their information in a mutually consistent manner that is sufficient for decision-making purposes. Accordingly, we propose the notion of Sufficient Information Based Perfect Bayesian Equilibrium (SIB-PBE) for dynamic games that characterizes a set of equilibrium outcomes. Using the notion of SIB-PBE, we provide a sequential decomposition of the game over time, and formulate a dynamic program that enables us to compute the set of SIB-PBEs via backward induction. We discover specific instances of dynamic games where we can determine a set of information states for the agents that have time-invariant domain. We determine conditions that guarantee the existence of SIB-PBEs. We discuss the relation between the class of SIB-PBE and PBE in dynamic games, and argue that the class of SIB-PBE provides a simpler and more robust set of equilibria than PBE that are consistent with agents' rationality.

The notion of SIB-PBE we introduce in this paper provides a generalization/extension of Markov Perfect Equilibrium (MPE) to dynamic games with asymmetric information. The authors in [3] introduce the notion of Markov Perfect Equilibrium that characterizes a subset of Subgame Perfect Equilibria (SPE) for dynamic games with symmetric information and provide a sequential decomposition of the game over time. Moreover, our results, along with those in the companion paper [2], provide a unified approach to the study of dynamic decision problems with asymmetric information and strategic or non-strategic agents.
A. Related Literature
Dynamic games with asymmetric information have been investigated extensively in the literature in the context of repeated games; see [4]–[7] and the references therein. The key feature of these games is the absence of a dynamic system. Moreover, the works on repeated games primarily study their asymptotic properties when the horizon is infinite and agents are sufficiently patient (i.e., the discount factor is close to one). In repeated games, agents play a stage (static) game repeatedly over time. As a result, the decision-making problem that each agent faces is very simple. The main objective of this strand of literature is to explore situations where agents can form self-enforcing punishment/reward mechanisms so as to create additional equilibria that improve upon the payoffs agents can get by simply playing an equilibrium of the stage game over time. Recent works (see [8]–[10]) adopt approaches similar to those used in repeated games to study infinite horizon dynamic games with asymmetric information when there is an underlying dynamic Markovian system. Under certain conditions on the system dynamics and information structure, the authors of [8]–[10] characterize a set of asymptotic equilibria when the agents are sufficiently patient.

The problem we study in this paper is different from the ones in [4]–[10] in two aspects. First, we consider a class of dynamic games where the underlying system has general Markovian dynamics and a general information structure, and we do not restrict attention to asymptotic behaviors when the horizon is infinite and the agents are sufficiently patient. Second, we study situations where the decision problem that each agent faces, in the absence of strategic interactions with other agents, is a Partially Observed Markov Decision Process (POMDP), which is a complex problem to solve by itself. Therefore, reaching (and computing) a set of equilibrium strategies, which take into account the strategic interactions among the agents, is a very challenging task. As a result, it is not very plausible for the agents to seek an equilibrium that is generated by the formation of self-enforcing punishment/reward mechanisms similar to those used in infinitely repeated games (see Section VII for more discussion). We believe that
our results provide new insight into the behavior of strategic agents in complex and dynamic environments, and complement the existing results in the repeated games literature with simple and (mostly) static environments.

The works in [11]–[14] consider dynamic zero-sum games with asymmetric information. The authors of [11], [12] study zero-sum games with Markovian dynamics and lack of information on one side (i.e., one informed player and one uninformed player). The authors of [13], [14] study zero-sum games with Markovian dynamics and lack of information on both sides. The problem that we study in this paper is different from the ones in [11]–[14] in three aspects. First, we study a general class of dynamic games that includes dynamic zero-sum games with asymmetric information as a special case. Second, we consider general Markovian dynamics for the underlying system, whereas the authors of [11]–[14] consider specific Markovian dynamics where each agent perfectly observes a local state that evolves independently of the other local states conditioned on the agents' observable actions. Third, we consider a general information structure that allows us to capture scenarios with unobservable actions and imperfect observations that are not captured in [11]–[14].

The problems investigated in [15]–[20] are the most closely related to our problem. The authors of [15], [16] study a class of dynamic games where the agents' common information based belief (defined in [15]) is independent of their strategies; that is, there is no signaling among them. This property allows them to apply ideas from the common information approach developed in [21], [22], and define an equivalent dynamic game with symmetric information among fictitious agents. Consequently, they characterize a class of equilibria for dynamic games called
Common Information based Markov Perfect Equilibrium. Our results are different from those in [15], [16] in two aspects. First, we consider a general class of dynamic games where the agents' CIB beliefs are strategy-dependent; thus, signaling is present. Second, the proposed approach in [15], [16] requires the agents to keep track of all of their private information over time. We propose an approach to effectively compress the agents' private information, and consequently, reduce the number of variables on which the agents need to form CIB beliefs.

The authors of [17]–[20] study a class of dynamic games with asymmetric information where signaling occurs. When the horizon is finite, the authors of [17], [18] introduce the notion of Common Information Based Perfect Bayesian Equilibrium (CIB-PBE), and provide a sequential decomposition of the game over time. The authors of [19], [20] extend the results of [17], [18] to finite horizon Linear-Quadratic-Gaussian (LQG) dynamic games and infinite horizon dynamic
games, respectively. The class of dynamic games studied in [17]–[20] satisfies the following assumptions: (i) agents' actions are observable; (ii) each agent perfectly observes his own local state/type; (iii) conditioned on the agents' actions, the local states evolve independently.

We relax assumptions (i)-(iii) of [17]–[20], and study a general class of dynamic games with asymmetric information, hidden actions, imperfect observations, and controlled and coupled dynamics. As a result, each agent needs to form a belief about the other agents' past actions and private (imperfect) observations. Moreover, in contrast to [17]–[20], an agent's, say agent i's, belief about the system state and the other agents' private information is his own private information and is different from the CIB belief. In this paper, we extend the methodology developed in [17], [18] for dynamic games, and generalize the notion of CIB-PBE. Furthermore, we propose an approach to effectively compress the agents' private information and obtain the results of [17]–[20] as special cases.

B. Contribution
We develop a general methodology for the study and analysis of dynamic games with asymmetric information, where the information structure is non-classical; that is, signaling occurs. We propose an approach to characterize a set of information states that effectively compress the agents' private and common information in a mutually consistent manner. We characterize a subclass of Perfect Bayesian Equilibria, called SIB-PBE, and provide a sequential decomposition of these games over time. This decomposition provides a backward induction algorithm to determine the set of SIB-PBEs. We discover special instances of dynamic games where we can identify a set of information states with time-invariant domain. We provide conditions that guarantee the existence of SIB-PBEs in dynamic games with asymmetric information. We show that the methodology developed in this paper generalizes the existing results on dynamic games with non-classical information structure.
C. Organization
The rest of the paper is organized as follows. In Section II, we describe our model. In Section III, we discuss the main issues that arise in the study of dynamic games with asymmetric information. We provide the formal definition of Perfect Bayesian Equilibrium in Section IV. In Section V, we describe the sufficient information approach to dynamic games with asymmetric information and introduce the notions of Sufficient Information Based (SIB) assessment and
SIB-PBE. In Section VI, we present our main results and provide a sequential decomposition of dynamic games over time. We discuss our results in Section VII, and compare the notion of SIB-PBE with other equilibrium concepts. In Section VIII, we determine conditions that guarantee the existence of SIB-PBE in dynamic games with asymmetric information. We conclude in Section IX. The proofs of all the theorems and lemmas appear in the Appendix.
Remark 1.
Section I-D on notation and Section V-A on the definition of sufficient private information are similar to the ones appearing in the companion paper [23]; moreover, the model presented in Section II with strategic agents is similar to that of the companion paper [23] with non-strategic agents. All these sections are included in this paper for ease of reading and to make the paper self-contained.

D. Notation
Random variables are denoted by upper case letters, their realizations by the corresponding lower case letters. In general, subscripts are used as time indices while superscripts are used to index agents. For $t_1 \le t_2$, $X_{t_1:t_2}$ (resp. $f_{t_1:t_2}(\cdot)$) is the shorthand notation for the random variables $(X_{t_1}, X_{t_1+1}, \dots, X_{t_2})$ (resp. functions $(f_{t_1}(\cdot), \dots, f_{t_2}(\cdot))$). When we consider a sequence of random variables (resp. functions) for all time, we drop the subscript and use $X$ to denote $X_{1:T}$ (resp. $f(\cdot)$ to denote $f_{1:T}(\cdot)$). For random variables $X_t^1, \dots, X_t^N$ (resp. functions $f_t^1(\cdot), \dots, f_t^N(\cdot)$), we use $X_t := (X_t^1, \dots, X_t^N)$ (resp. $f_t(\cdot) := (f_t^1(\cdot), \dots, f_t^N(\cdot))$) to denote the vector of the set of random variables (resp. functions) at $t$, and $X_t^{-n} := (X_t^1, \dots, X_t^{n-1}, X_t^{n+1}, \dots, X_t^N)$ (resp. $f_t^{-n}(\cdot) := (f_t^1(\cdot), \dots, f_t^{n-1}(\cdot), f_t^{n+1}(\cdot), \dots, f_t^N(\cdot))$) to denote all random variables (resp. functions) at $t$ except that of the agent indexed by $n$. $\mathbb{P}\{\cdot\}$ and $\mathbb{E}\{\cdot\}$ denote the probability of an event and the expectation of a random variable, respectively. For a set $\mathcal{X}$, $\Delta(\mathcal{X})$ denotes the set of all beliefs/distributions on $\mathcal{X}$. For random variables $X, Y$ with realizations $x, y$, $\mathbb{P}\{x|y\} := \mathbb{P}\{X=x|Y=y\}$ and $\mathbb{E}\{X|y\} := \mathbb{E}\{X|Y=y\}$. For a strategy $g$ and a belief (probability distribution) $\pi$, we use $\mathbb{P}^g_\pi\{\cdot\}$ (resp. $\mathbb{E}^g_\pi\{\cdot\}$) to indicate that the probability (resp. expectation) depends on the choice of $g$ and $\pi$. We use $\mathbb{1}\{X=x\}$ to denote the indicator function for the event $X=x$. For sets $\mathcal{A}$ and $\mathcal{B}$, we use $\mathcal{A} \setminus \mathcal{B}$ to denote all elements in set $\mathcal{A}$ that are not in set $\mathcal{B}$.

II. MODEL
1) System dynamics:
There are $N$ strategic agents who live in a dynamic Markovian world over horizon $\mathcal{T} := \{1, 2, \dots, T\}$, $T < \infty$. Let $X_t \in \mathcal{X}_t$ denote the state of the world at $t \in \mathcal{T}$. At time $t$, each agent, indexed by $i \in \mathcal{N} := \{1, 2, \dots, N\}$, chooses an action $a_t^i \in \mathcal{A}_t^i$, where $\mathcal{A}_t^i$ denotes the set of actions available to him at $t$. Given the collective action profile $A_t := (A_t^1, \dots, A_t^N)$, the state of the world evolves according to the stochastic dynamic equation

$$X_{t+1} = f_t(X_t, A_t, W_t^x), \qquad (1)$$

where $W_{1:T-1}^x$ is a sequence of independent random variables. The initial state $X_1$ is a random variable that has a probability distribution $\eta \in \Delta(\mathcal{X}_1)$ with full support.

At every time $t \in \mathcal{T}$, before taking an action, agent $i$ receives a noisy private observation $Y_t^i \in \mathcal{Y}_t^i$ of the current state of the world $X_t$ and the action profile $A_{t-1}$, given by

$$Y_t^i = O_t^i(X_t, A_{t-1}, W_t^i), \qquad (2)$$

where $W_{1:T}^i$, $i \in \mathcal{N}$, are sequences of independent random variables. Moreover, at every $t \in \mathcal{T}$, all agents receive a common observation $Z_t \in \mathcal{Z}_t$ of the current state of the world $X_t$ and the action profile $A_{t-1}$, given by

$$Z_t = O_t^c(X_t, A_{t-1}, W_t^c), \qquad (3)$$

where $W_{1:T}^c$ is a sequence of independent random variables. We note that the agents' actions $A_{t-1}$ are commonly observable at $t$ if $A_{t-1} \subseteq Z_t$. We assume that the random variables $X_1$, $W_{1:T-1}^x$, $W_{1:T}^c$, and $W_{1:T}^i$, $i \in \mathcal{N}$, are mutually independent.
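To make the model concrete, the following minimal simulation sketch instantiates (1)-(3) for a toy two-agent example; the specific kernels $f$, $O^i$, $O^c$, the binary sets, and the noise levels are illustrative assumptions, not part of the model above:

```python
import random

# A minimal simulation of the dynamics (1)-(3), with hypothetical
# transition/observation kernels and a two-agent binary setup chosen
# purely for illustration.

T = 5                       # horizon
N = 2                       # number of agents
STATES = [0, 1]             # X_t
ACTIONS = [0, 1]            # A_t^i

def f(x, a, w):
    # state transition (1): next state depends on state, actions, noise
    return (x + sum(a) + w) % 2

def O_i(x, a_prev, w):
    # private observation (2): the state, flipped when the noise fires
    return x if w == 0 else 1 - x

def O_c(x, a_prev, w):
    # common observation (3): here, the previous action profile itself,
    # so actions are commonly observable (A_{t-1} is contained in Z_t)
    return tuple(a_prev)

x = random.choice(STATES)   # X_1 ~ eta (uniform, full support)
a_prev = (0,) * N
for t in range(1, T + 1):
    y = [O_i(x, a_prev, random.random() < 0.1) for _ in range(N)]  # private
    z = O_c(x, a_prev, None)                                       # common
    a = tuple(random.choice(ACTIONS) for _ in range(N))            # actions
    print(f"t={t}: x={x}, y={y}, z={z}, a={a}")
    x = f(x, a, random.choice([0, 1]))                             # (1)
```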
2) Information structure:
Let $H_t \in \mathcal{H}_t$ denote the aggregate information of all agents at time $t$. Assuming that agents have perfect recall, we have $H_t = \{Z_{1:t}, Y_{1:t}^{1:N}, A_{1:t-1}^{1:N}\}$, i.e., $H_t$ denotes the set of all agents' past observations and actions. The set of all possible realizations of the agents' aggregate information is given by $\mathcal{H}_t := \prod_{\tau \le t} \mathcal{Z}_\tau \times \prod_{i \in \mathcal{N}} \prod_{\tau \le t} \mathcal{Y}_\tau^i \times \prod_{i \in \mathcal{N}} \prod_{\tau < t} \mathcal{A}_\tau^i$. We denote by $C_t \in \mathcal{C}_t$ the agents' common information at $t$, i.e., the part of $H_t$ known to all agents, and by $P_t^i \in \mathcal{P}_t^i$ agent $i$'s private information at $t$.

3) Strategies and Utilities: Let $H_t^i := \{C_t, P_t^i\} \in \mathcal{H}_t^i$ denote the information available to agent $i$ at $t$, where $\mathcal{H}_t^i$ denotes the set of all possible realizations of agent $i$'s information at $t$. Agent $i$'s behavioral strategy $g_t^i$, $t \in \mathcal{T}$, is defined as a sequence of mappings $g_t^i : \mathcal{H}_t^i \to \Delta(\mathcal{A}_t^i)$, $t \in \mathcal{T}$, that determine agent $i$'s action $A_t^i$ for every realization $h_t^i \in \mathcal{H}_t^i$ of the history at $t \in \mathcal{T}$.

Agent $i$'s instantaneous utility at $t$ depends on the system state $X_t$ and the collective action profile $A_t$, and is given by $u_t^i(X_t, A_t)$. Agent $i$ chooses his strategy $g_{1:T}^i$ so as to maximize his total (expected) utility over horizon $T$, given by

$$U^i(X_{1:T}, A_{1:T}) = \sum_{t \in \mathcal{T}} u_t^i(X_t, A_t). \qquad (4)$$

To avoid measure-theoretic technical difficulties and for clarity and convenience of exposition, we assume that all the random variables take values in finite sets.

Assumption 1 (Finite game). The sets $\mathcal{X}_t$, $\mathcal{Z}_t$, $\mathcal{Y}_t^i$, $\mathcal{A}_t^i$, $\mathcal{N}$, and $\mathcal{T}$ are finite.

Moreover, we assume that given any sequence of actions $a_{1:t-1}$ up to time $t-1$, every possible realization $x_t \in \mathcal{X}_t$ of the system state at $t$ has a strictly positive probability of realization.

Assumption 2 (Strictly positive transition matrix). For all $t \in \mathcal{T}$, $x_t \in \mathcal{X}_t$, and $a_{1:t-1} \in \mathcal{A}_{1:t-1}$, we have $\mathbb{P}\{x_t \mid a_{1:t-1}\} > 0$.

Furthermore, we assume that for any sequence of actions $\{a_{1:T}\}$, all possible realizations of private observations $\{y_{1:T}^{1:N}\}$ have positive probability. That is, no agent can perfectly infer another agent's action based only on his private observations.

Assumption 3 (Imperfect private monitoring). For all $t \in \mathcal{T}$, $y_t \in \mathcal{Y}_t$, and $a_{1:t-1} \in \mathcal{A}_{1:t-1}$, we have $\mathbb{P}\{y_t \mid a_{1:t-1}\} > 0$.

Remark 2. We can relax Assumptions 2 and 3 under certain conditions and obtain results similar to those appearing in this paper; for instance, when agents' actions are observable we can relax Assumptions 2 and 3. Broadly, the crucial assumption that underlies our results is that every deviation that can be detected by agent $i$ at any time $t$ must also be detectable by all agents at the same time $t$ based only on the common information $C_t$. Due to space limitations we do not include the discussion of Assumptions 2 and 3 and the extension of our results when we relax them; we refer the interested reader to [24].

A. Special Cases

We discuss several instances of dynamic games with asymmetric information that are special cases of the general model described above.

1) Nested information structure: Consider a two-player game with one informed player (player 1) and one uninformed player (player 2) and general Markovian dynamics. At every time $t \in \mathcal{T}$, the informed player makes a private perfect observation of the state $X_t$, i.e., $Y_t^1 = X_t$. The uninformed player does not have any observation of the state $X_t$. Both the informed and uninformed players observe each other's actions, i.e., $Z_t = \{A_{t-1}\}$. Therefore, we have $P_t^1 = \{X_{1:t}\}$, $P_t^2 = \emptyset$, and $C_t = \{A_{1:t-1}^1, A_{1:t-1}^2\}$ for all $t \in \mathcal{T}$. The above nested information structure corresponds to the dynamic games considered in [11], [25], [26], where in [25], [26] the state $X_t$ is static.
2) Independent dynamics with observable actions: Consider an $N$-player game where the state $X_t := (X_t^0, X_t^1, \dots, X_t^N)$ consists of a global component $X_t^0$ and $N$ local components. The agents' actions $A_t$ are observable by all agents, i.e., $A_{t-1} \subset Z_t$ for all $t \in \mathcal{T}$. At every time $t \in \mathcal{T}$, agent $i$ makes a perfect observation of his local state $X_t^i$ as well as the global state $X_t^0$. Moreover, at time $t$ all agents make a common imperfect observation of state $X_t^i$ given by $Z_t^i = O_t^{c,i}(X_t^i, A_{t-1}, W_t^{c,i})$, $i \in \mathcal{N}$. Conditioned on the agents' collective action $A_t$, each $X_t^i$ evolves independently over time as $X_{t+1}^i = f_t^i(X_t^i, A_t, W_t^{x,i})$ for all $i \in \mathcal{N}$ and $t \in \mathcal{T}$. We assume that $X_1$, $W_t^c$, $t \in \mathcal{T}$, and $W_t^{x,i}$, $i \in \mathcal{N}$, $t \in \mathcal{T}$, are mutually independent. Therefore, we have $P_t^i = \{X_{1:t}^i\}$ and $C_t = \{X_{1:t}^0, Z_{1:t}^{1:N}, A_{1:t-1}\}$. The above environment includes the dynamic games considered in [17], [18] as special cases.

3) Perfectly controlled dynamics with hidden actions: Consider an $N$-player game where the state $X_t := (X_t^1, X_t^2, \dots, X_t^N)$ has $N$ components. Agent $i$, $i \in \mathcal{N}$, perfectly controls $X_t^i$, i.e., $X_{t+1}^i = A_t^i$. Agent $i$'s actions $A_t^i$, $t \in \mathcal{T}$, are not observable by the other agents $-i$. Every agent $i$, $i \in \mathcal{N}$, makes a noisy private observation $Y_t^i = O_t^i(X_t, W_t^i)$ of the system state at $t \in \mathcal{T}$. Therefore, we have $P_t^i := \{A_{1:t-1}^i, Y_{1:t}^i\}$, $C_t = \emptyset$.

III. APPRAISALS AND ASSESSMENTS

In this section we provide an overview of the notions of appraisals, assessments, and an equilibrium solution concept for dynamic games with asymmetric information. We argue that an equilibrium solution concept must consist of a pair of a strategy profile and a belief system (to be defined below), and discuss the importance of off-equilibrium path beliefs in dynamic games.

In a dynamic game with asymmetric information agents have private information about the evolution of the game, and they do not observe the complete history of the game given by $\{H_t, X_{1:t}\}$. (We refer the interested reader to the papers by Battigalli [27], Myerson and Reny [28], and Watson [29] for more discussion.) Therefore, at every time $t \in \mathcal{T}$, each agent, say agent $i \in \mathcal{N}$, needs to form (i) an appraisal about the current state of the system $X_t$ and the other agents' information $H_t^{-i}$ (appraisal about the history), and (ii) an appraisal about how other agents will play in the future, so as to evaluate the performance of his strategy choices (appraisal about the future). Given the other agents' strategies $g^{-i}$, agent $i$ can utilize his own information $H_t^i$ at $t \in \mathcal{T}$, along with (i) other agents' past strategies $g_{1:t-1}^{-i}$ and (ii) other agents' future strategies $g_{t:T}^{-i}$, to form these appraisals about the history and future of the game, respectively.

In contrast to dynamic teams where agents have a common objective and coordinate their strategies, in dynamic games each agent has his own objective and chooses his strategy $g^i$ so as to maximize his objective. Thus, unlike dynamic teams, in dynamic games strategy $g^i$ is agent $i$'s private information and not known to other agents. Therefore, in dynamic games, each agent needs to form a prediction about the other agents' strategies. We denote this prediction by $g_{1:T}^{*1:N}$ to distinguish it from the strategy profile $g_{1:T}^{1:N}$ that is actually being played by the agents. Following Nash's idea, we assume that agents share a common prediction $g^*$ about the actual strategy $g$. We would like to emphasize that the prediction $g^*$ does not necessarily coincide with the actual strategy $g$.
As we point out later, one requirement of an equilibrium is that for every agent $i \in \mathcal{N}$, the prediction $g^{*i}$ must be an optimal strategy for him given the other agents' prediction strategies $g^{*-i}$.

Since an agent's actual strategy, say agent $i$'s strategy $g^i$, is his own private information, it is possible that $g^i$ is different from the prediction $g^{*i}$. Below we discuss the implications of an agent's deviation from the prediction strategy profile $g^*$. For that matter, we first consider an agent who may want to deviate from $g^*$, and then we consider an agent who faces such a deviation and his response.

In dynamic games, when agent $i \in \mathcal{N}$ chooses his strategy $g^i$, he needs to know how other agents will play for any choice of $g^i$, which can be different from the prediction $g^{*i}$. Therefore, the prediction $g^*$ has to be defined at all possible information realizations (i.e., information sets) of every agent: those that have positive probability under $g^*$ as well as those that have zero probability under $g^*$. Using the prediction $g^*$, any agent, say agent $i$, can form an appraisal about the future of the game for any strategy choice $g^i$, and evaluate the performance of $g^i$. (This is not an issue in dynamic teams since agents coordinate in advance their choice of strategy profile $g$, and no agent has an incentive to (privately) deviate from it. Hence, the agents' strategy profile $g$ needs to be defined only on information sets of positive probability under $g$.)

By the same rationale, when agent $i$ chooses $g^i$ he needs to determine his strategy for all of his information sets, even those that have zero probability under $g^{*-i}$. This is because it is possible that some agent $j \in \mathcal{N}$ may deviate from $g^{*j}$ and play a strategy $g^j$ that is different from the prediction $g^{*j}$. Agent $i$ must foresee these possible deviations by other agents and determine his response to them.

To determine his optimal strategy $g^i$ at any information set, agent $i$ needs to first form an appraisal about the history of the game at $t$ as well as an appraisal about the future of the game using the strategy prediction $g^{*-i}$. For an information set $h_t^i$ that is compatible with the prediction $g^{*-i}$ given his strategy $g^i$ at $t \in \mathcal{T}$ (i.e., $h_t^i$ has positive probability of being realized under $g^*$), agent $i$ can use Bayes' rule to derive the appraisal about the history of the game at $t$. However, for an information set $h_t^i$ that has zero probability under the prediction $g^{*-i}$ given $g^i$, agent $i$ can no longer rely on the prediction $g^*$ and use Bayes' rule to form his appraisal about the history of the game at $t$. The realization of history $h_t^i$ tells agent $i$ that his original prediction $g_{1:t-1}^{*-i}$ is not (completely) correct; thus, he needs to revise his original prediction $g_{1:t-1}^{*-i}$ and to form a revised appraisal about the history of the game at $t$. Therefore, agent $i$ must determine how to form/revise his appraisal about the history of the game for every realization $h_t^i \in \mathcal{H}_t^i$, $t \in \mathcal{T}$, that has zero probability under $g^{*-i}$. We note that upon reaching an information set of measure zero, agent $i$ only revises his prediction $g_{1:t-1}^{*-i}$ about other agents' past strategies, but does not change his prediction $g_{t:T}^{*-i}$ about their future strategies. This is because at equilibrium, the prediction $g_{t:T}^{*-i}$ specifies a set of strategies for other agents that are optimal in the continuation game that takes place after the realization of the information set $h_t^i$ of zero probability under $g_{1:t-1}^*$.
We describe below how one can formalize the above issues we need to consider in the study of dynamic games with asymmetric information. Following the game theory literature [30], agents' appraisals about the history and future of the game can be captured by an assessment that all agents commonly hold about the game. We define an assessment as a pair of mappings $(g^*, \mu)$, where $g^* := \{g_t^{*i}, i \in \mathcal{N}, t \in \mathcal{T}\}$, $g_t^{*i} : \mathcal{H}_t^i \to \Delta(\mathcal{A}_t^i)$, denotes a prediction about agent $i$'s strategy at $t$, and $\mu := \{\mu_t^i, i \in \mathcal{N}, t \in \mathcal{T}\}$, where $\mu_t^i : \mathcal{H}_t^i \to \Delta(\mathcal{X}_t \times \mathcal{H}_t^{-i})$, denotes agent $i$'s belief about the system state $X_t$ and agents $-i$'s information $H_t^{-i}$ given his information $H_t^i$. The collection of mappings $\mu := \{\mu_t^i, i \in \mathcal{N}, t \in \mathcal{T}\}$ is called a belief system. For every $i \in \mathcal{N}$, $t \in \mathcal{T}$, and $h_t^i \in \mathcal{H}_t^i$, $\mu_t^i(h_t^i)$ denotes agent $i$'s belief about the history $\{X_t, H_t^{-i}\}$ of the game, and $g_{t:T}^{*-i}$ denotes agent $i$'s prediction about all other agents' continuation strategies from $t$ onward. We note that $\mu_t^i(h_t^i)$ determines agent $i$'s appraisal about the history of the game whether $h_t^i$ has positive or zero probability under $g^*$. Therefore, using an assessment $(g^*, \mu)$ each agent can fully construct appraisals about the history and future of the game at any $t \in \mathcal{T}$.

(In dynamic teams, agents only need to determine their optimal strategy $g$ for information sets that have positive probability under $g$. As a result, a collective choice of strategy $g$ is optimal at every information set with positive probability if and only if it maximizes the (expected) utility of the team from $t = 1$ up to $T$. However, in dynamic games agents need to determine their strategies for all information sets, irrespective of whether they have zero or positive probability under $g^*$. Therefore, if a choice of strategy $g^i$ maximizes agent $i$'s (expected) utility from $t = 1$ to $T$, it does not imply that it is also optimal at all information sets that have zero probability under $\{g^{*-i}, g^i\}$. Consequently, a choice of agent $i$'s strategy must be optimal for all continuation games that follow the realization of an information set $h_t^i$, irrespective of whether it has zero or positive probability.)

Using the definition of an assessment, we can extend the idea of Nash equilibrium to dynamic games with asymmetric information. An equilibrium of the dynamic game is defined as a common assessment $(g^*, \mu)$ among the agents that satisfies the following conditions under the assumption that the agents are rational. (i) Agent $i \in \mathcal{N}$ chooses his strategy $g_{1:T}^i$ so as to maximize his total expected utility (4) in all continuation games given the assessment $(g^*, \mu)$ about the game. Therefore, the prediction $g_{1:T}^{*i}$ that other agents hold about agent $i$'s strategy must be a maximizer of agent $i$'s total expected utility under the assessment $(g^*, \mu)$. (ii) For all $t \in \mathcal{T}$, agent $i$'s, $i \in \mathcal{N}$, belief $\mu_t^i(h_t^i)$ at an information set $h_t^i \in \mathcal{H}_t^i$ that has positive probability of realization under $g^*$ must be equal to the probability distribution of $\{X_t, H_t^{-i}\}$ conditioned on the realization $h_t^i$ (determined via Bayes' rule), assuming that agents $-i$ play according to $g_{1:t}^{*-i}$. When $h_t^i$ has zero probability under the assessment $g^*$, the belief $\mu_t^i(h_t^i)$ cannot be determined via Bayes' rule and must be revised. The revised belief must satisfy a certain set of "reasonable" conditions so as to be compatible with agent $i$'s rationality.
Various sets of conditions have been proposed in the literature (see [30], [31]) to capture the notion of "reasonable" beliefs that are compatible with the agents' rationality. Different sets of conditions for off-equilibrium beliefs $\mu_t^i(h_t^i)$ result in the different equilibrium concepts that have been proposed for dynamic games with asymmetric information.

In this paper, we consider Perfect Bayesian Equilibrium (PBE) as the equilibrium solution concept. In the next section we provide the formal definition of PBE.

IV. PERFECT BAYESIAN EQUILIBRIUM

The formal definition of Perfect Bayesian Equilibrium (PBE) for dynamic games in extensive form can be found in [31]. In this paper we use a state space representation for dynamic games instead of an extensive game form representation; therefore, we need to adapt the definition of PBE to this representation. A PBE is defined as an assessment $(g^*, \mu)$ that satisfies the sequential rationality and consistency conditions. The sequential rationality condition requires that for all $i \in \mathcal{N}$, the prediction $g^{*i}$ is optimal for agent $i$ given the assessment $(g^*, \mu)$. The consistency condition requires that for all $i \in \mathcal{N}$, $t \in \mathcal{T}$, and $h_t^i \in \mathcal{H}_t^i$, agent $i$'s belief $\mu_t^i(h_t^i)$ must be compatible with the prediction $g^*$. We formally define these conditions below.

Let $\mathbb{P}^{(g_{t:T}^{*-i}, g_{t:T}^{*i})}_{\mu_t^i}\{\cdot \mid h_t^i\}$ denote the probability measure induced by the stochastic process that starts at time $t$ with initial condition $\{X_t, P_t^{-i}, p_t^i, c_t\}$, where the random variables $\{X_t, P_t^{-i}\}$ are distributed according to the probability distribution $\mu_t^i(h_t^i)$, assuming that agents $i$ and $-i$ take actions according to strategies $g_{t:T}^{*i}$ and $g_{t:T}^{*-i}$, respectively. In the sequel, to save some notation, we write $\mathbb{P}^{g^*}_{\mu}\{\cdot\}$ instead of $\mathbb{P}^{(g_{t:T}^{*-i}, g_{t:T}^{*i})}_{\mu_t^i}\{\cdot\}$ whenever there is no confusion.

Definition 1 (Sequential rationality). We say that an assessment $(g^*, \mu)$ is sequentially rational if $\forall i \in \mathcal{N}$, $t \in \mathcal{T}$, and $h_t^i \in \mathcal{H}_t^i$, the strategy prediction $g_{t:T}^{*i}$ is a solution to

$$\sup_{g_{t:T}^i} \mathbb{E}^{(g_{t:T}^{*-i}, g_{t:T}^i)}_{\mu_t^i} \left\{ \sum_{\tau=t}^{T} u_\tau^i(X_\tau, A_\tau) \,\Big|\, h_t^i \right\}. \qquad (5)$$

The sequential rationality condition (5) requires that, given the assessment $(g^*, \mu)$, the prediction strategy $g_{t:T}^{*i}$ for agent $i$ is an optimal strategy for him for all continuation games after any history realization $h_t^i \in \mathcal{H}_t^i$, irrespective of whether $h_t^i$ has positive or zero probability under $(g^*, \mu)$. That is, the common prediction $g^{*i}$ about agent $i$'s strategy must be an optimal strategy choice for him since it is common knowledge that he is a rational agent. We note that the sequential rationality condition defined above is more restrictive than the optimality condition for Bayesian Nash Equilibrium (BNE), which only requires (5) to hold at $t = 1$. By the sequential rationality condition, we require the optimality of the prediction $g^*$ even along off-equilibrium paths, and thus, we rule out the possibility of non-credible threats (see [30] for more discussion).

The sequential rationality condition results in a set of constraints that the strategy prediction $g^*$ must satisfy given a belief system $\mu$. As we argued in Section III, the belief system $\mu$ must also be compatible with the strategy prediction $g^*$. The following consistency condition captures such compatibility between the belief system $\mu$ and the prediction $g^*$.

Definition 2 (Consistency).
We say that an assessment $(g^*, \mu)$ is consistent if

i) for all $i \in \mathcal{N}$, $t \in \mathcal{T}\setminus\{1\}$, $h_{t-1}^i \in \mathcal{H}_{t-1}^i$, and $h_t^i \in \mathcal{H}_t^i$ such that $\mathbb{P}^{g^*}_{\mu}\{h_t^i \mid h_{t-1}^i\} > 0$, the belief $\mu_t^i(h_t^i)$ must satisfy Bayes' rule, i.e.,

$$\mu_t^i(h_t^i)(x_t, p_t^{-i}) = \frac{\mathbb{P}^{g^*}\{h_t^i, x_t, p_t^{-i} \mid h_{t-1}^i\}}{\mathbb{P}^{g^*}\{h_t^i \mid h_{t-1}^i\}}; \qquad (6)$$

ii) for all $i \in \mathcal{N}$, $t \in \mathcal{T}\setminus\{1\}$, $h_{t-1}^i \in \mathcal{H}_{t-1}^i$, and $h_t^i \in \mathcal{H}_t^i$ such that $\mathbb{P}^{g^*}_{\mu}\{h_t^i \mid h_{t-1}^i\} = 0$, we have $\mu_t^i(h_t^i)(x_t, p_t^{-i}) > 0$ only if there exists an open-loop strategy $(A_{1:t-1}^{-i} = \hat{a}_{1:t-1}^{-i}, A_{1:t-1}^i = a_{1:t-1}^i)$ such that

$$\mathbb{P}^{(A_{1:t-1}^{-i} = \hat{a}_{1:t-1}^{-i},\, A_{1:t-1}^i = a_{1:t-1}^i)}\{x_t, p_t^{-i}\} > 0. \qquad (7)$$

The above consistency condition places a restriction on the belief system $\mu$ so that it is compatible with the strategy prediction $g^*$. For information sets along equilibrium paths, i.e., $\mathbb{P}^{g^*}_{\mu^i}\{h_t^i\} > 0$, the belief $\mu_t^i(h_t^i)$ must be updated according to (6) via Bayes' rule since agent $i$'s observations are consistent with the prediction $g^*$. For information sets along off-equilibrium paths, i.e., $\mathbb{P}^{g^*}_{\mu^i}\{h_t^i\} = 0$, agent $i$ needs to revise his belief about the strategy of agents $-i$, as the realization of $h_t^i$ indicates that some agent has deviated from the prediction $g_{1:t}^{*-i}$. As pointed out before, the revised belief $\mu^i(h_t^i)$ must be "reasonable". Definition 2 provides a set of such "reasonable" conditions, captured by (6) and (7), that we discuss further below.

First, consider an information set $h_t^i$ along an off-equilibrium path such that $\mathbb{P}^{g^*}_{\mu_{t-1}^i}\{h_t^i \mid h_{t-1}^i\} > 0$. That is, conditioned on reaching information set $h_{t-1}^i$ at $t-1$, $h_t^i$ has positive probability under the prediction strategy $g^*$. Since $\mathbb{P}^{g^*}_{\mu^i}\{h_t^i\} = \mathbb{P}^{g^*}_{\mu_{t-1}^i}\{h_t^i \mid h_{t-1}^i\}\, \mathbb{P}^{g^*}_{\mu^i}\{h_{t-1}^i\}$ and $\mathbb{P}^{g^*}_{\mu^i}\{h_t^i\} = 0$, we have $\mathbb{P}^{g^*}_{\mu^i}\{h_{t-1}^i\} = 0$. Therefore, $h_{t-1}^i$ is also an information set along an off-equilibrium path, and $\mu^i(h_{t-1}^i)$ is a revised belief that agent $i$ holds at $t-1$. Note that if the assessment $(g^*, \mu)$ satisfies the sequential rationality condition, $g^*$ is a best response for all agents in all continuation games that follow the realization of every information set of positive or zero probability. Moreover, since $\mathbb{P}^{g^*_{t-1}}_{\mu_{t-1}^i}\{h_t^i \mid h_{t-1}^i\} > 0$, the realization of $h_t^i$ conditioned on reaching $h_{t-1}^i$ is consistent with the strategy prediction $g^*_{t-1}$. Therefore, agent $i$ does not have any reason to further revise his belief about agents $-i$'s strategy beyond the revision that results in $\mu_{t-1}^i(h_{t-1}^i)$. Thus, agent $i$ determines his belief $\mu_t^i(h_t^i)$ by utilizing his belief $\mu_{t-1}^i(h_{t-1}^i)$ at $t-1$ and updating it via Bayes' rule assuming that agents $-i$ play according to $g_{t-1}^{*-i}$ (see part (i), eq. (6)).

Next, consider an information set $h_t^i$ along an off-equilibrium path such that $\mathbb{P}^{g^*}_{\mu_t^i}\{h_t^i \mid h_{t-1}^i\} = 0$. That is, conditioned on reaching information set $h_{t-1}^i$ at $t-1$, $h_t^i$ has zero probability of realization under the prediction $g^*$. In this case, the realization of $h_t^i$ indicates that agents $-i$ have deviated from the prediction $g_{1:t-1}^{*-i}$, and this deviation has not been detected by agent $i$ before. Therefore, agent $i$ needs to form a new belief about agents $-i$'s private information $P_t^{-i}$ and the state $X_t$ by revising $\mu_t^i(h_t^i)$.
Part (ii) of the consistency condition concerns such belief revisions and requires that the support of agent $i$'s revised belief $\mu_t^i(h_t^i)$ include only the states and private information that are feasible under the system and information dynamics (1) and (2), that is, reachable under some open-loop control strategy $(A_{1:t-1}^{-i} = \hat{a}_{1:t-1}^{-i}, A_{1:t-1}^i = a_{1:t-1}^i)$. We note that since we are using a state representation of the dynamic game, we need to impose such a requirement, whereas in the equivalent extensive form representation of the game such a requirement is satisfied by the construction of the game tree.

Remark 3. Under Assumptions 2 and 3, we have $\mathbb{P}^{(A_{1:t-1} = \hat{a}_{1:t-1})}_{\mu}\{x_t, p_t^{-i}\} > 0$ for all $(A_{1:t-1} = \hat{a}_{1:t-1})$. Therefore part (ii) of the consistency condition is trivially satisfied. In the rest of the paper, we ignore part (ii) and only consider part (i) of the definition of consistency. In [24], we discuss the case when we relax Assumptions 2 and 3.

We can now provide the formal definition of PBE for the dynamic game of Section II.

Definition 3. An assessment $(g^*, \mu)$ is called a PBE if it satisfies the sequential rationality and consistency conditions.

The definition of Perfect Bayesian Equilibrium provides a general formalization of outcomes that are rationalizable (i.e., consistent with agents' rationality) under some strategy profile and belief system. However, as we argue further in Section VII, there are computational and philosophical reasons that motivate us to define a subclass of PBEs that provides a simpler and more tractable approach to characterizing the outcomes of dynamic games with asymmetric information.

There are two major challenges in computing a PBE $(g^*, \mu)$. First, there is an inter-temporal coupling between the agents' strategy prediction $g^*$ and belief system $\mu$. According to the consistency requirement, the belief system $\mu$ has to satisfy a set of conditions given a strategy prediction $g^*$. On the other hand, by sequential rationality, a strategy prediction $g^*$ must satisfy a set of optimality conditions given the belief system $\mu$. Therefore, there is a circular dependency between a prediction strategy $g^*$ and a belief system $\mu$ over time. For instance, by sequential rationality, agent $i$'s strategy $g_t^{*i}$ at time $t$ depends on the agents' future strategies $g_{t:T}^*$ and on the agents' past strategies $g_{1:t-1}^*$ indirectly through the consistency condition for $\mu_t^i$. As a result, one needs to determine the strategy prediction $g^*$ and belief system $\mu$ simultaneously for the whole time horizon so as to satisfy the sequential rationality and consistency conditions, and thus, cannot sequentially decompose the computation of PBE over time. Second, the agents' information $h_t^i$, $i \in \mathcal{N}$, has a growing domain over time. Hence, the agents' strategies have growing domains over time, and this feature further complicates the computation of PBEs of dynamic games with asymmetric information; the short computation below illustrates how quickly these domains grow.

The definition of PBE requires an agent to keep track of all observations he acquires over time and to form beliefs about the private information of all other agents. As we show next, agents do not need to keep track of all of their past observations to reach an equilibrium. They can take into account fewer variables for decision making and ignore the part of their information that is not relevant to the continuation game.
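To illustrate the growing-domain challenge numerically, the following short computation compares the number of possible private histories $h_t^i$ under perfect recall with the size of a fixed compressed statistic; the per-period set sizes are hypothetical:

```python
# Growth of an agent's information sets over time versus a fixed-size
# compressed statistic. The set sizes below are hypothetical.

Z, Y, A = 2, 3, 2      # |Z_t|, |Y_t^i|, |A_t^i| per period (assumed)
S = 12                 # size of a hypothetical time-invariant statistic

for t in range(1, 11):
    # h_t^i contains z_{1:t}, y_{1:t}^i, a_{1:t-1}^i under perfect recall
    n_histories = (Z ** t) * (Y ** t) * (A ** (t - 1))
    print(f"t={t:2d}: |H_t^i| = {n_histories:>12,d}   vs   |S_t^i| = {S}")
```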
As we argue in Section VII, the class of simpler strategies proposed in this paper characterizes a more plausible prediction about the outcome of the interaction among agents when the underlying system is highly dynamic and there exists considerable information asymmetry among them.

V. THE SUFFICIENT INFORMATION APPROACH

We characterize a class of PBEs that utilize strategy choices that are simpler than general behavioral strategies, as they require agents to keep track of only a compressed version of their information over time. We proceed as follows. In Section V-A we provide sufficient conditions for the subset of private information an agent needs to keep track of over time for decision-making purposes. In Section V-B, we introduce the sufficient information based belief as a compressed version of the agents' common information that is sufficient for decision-making purposes. Based on these compressions of the agents' private and common information, we introduce the notions of sufficient information based assessments and Sufficient Information Based Perfect Bayesian Equilibrium (SIB-PBE) in Sections V-D and V-E, respectively.

A. Sufficient Private Information

The key ideas for compressing an agent's private information appear in Definition 4 below; we refer the interested reader to the companion paper [2] for a discussion of the rationale behind Definition 4.

Definition 4 (Sufficient private information). We say $S_t^i = \zeta_t^i(P_t^i, C_t; g_{1:t-1}^*)$, $i \in \mathcal{N}$, $t \in \mathcal{T}$, is sufficient private information for the agents if

(i) it can be updated recursively as

$$S_t^i = \phi_t^i(S_{t-1}^i, H_t^i \setminus H_{t-1}^i; g_{1:t-1}^*) \quad \text{if } t \in \mathcal{T}\setminus\{1\}; \qquad (8)$$

(ii) for any strategy profile $g^*$ and for all realizations $\{c_t, p_t, p_{t+1}, z_{t+1}, a_t\} \in \mathcal{C}_t \times \mathcal{P}_t \times \mathcal{P}_{t+1} \times \mathcal{Z}_{t+1} \times \mathcal{A}_t$ of positive probability,

$$\mathbb{P}^{g_{1:t}^*}\left\{ s_{t+1}, z_{t+1} \,\middle|\, p_t, c_t, a_t \right\} = \mathbb{P}^{g_{1:t}^*}\left\{ s_{t+1}, z_{t+1} \,\middle|\, s_t, c_t, a_t \right\}, \qquad (9)$$

where $s_\tau^{1:N} = \zeta_\tau^{1:N}(p_\tau^{1:N}, c_\tau; g_{1:\tau-1}^*)$ for $\tau \in \mathcal{T}$;

(iii) for every strategy profile $\tilde{g}^*$ of the form $\tilde{g}^* := \{\tilde{g}_t^{*i} : \mathcal{S}_t^i \times \mathcal{C}_t \to \Delta(\mathcal{A}_t^i),\ i \in \mathcal{N}, t \in \mathcal{T}\}$ and $a_t \in \mathcal{A}_t$, $t \in \mathcal{T}$,

$$\mathbb{E}^{\tilde{g}_{1:t-1}^*}\left\{ u_t^i(X_t, A_t) \,\middle|\, c_t, p_t^i, a_t \right\} = \mathbb{E}^{\tilde{g}_{1:t-1}^*}\left\{ u_t^i(X_t, A_t) \,\middle|\, c_t, s_t^i, a_t \right\}, \qquad (10)$$

for all realizations $\{c_t, p_t^i\} \in \mathcal{C}_t \times \mathcal{P}_t^i$ of positive probability, where $s_\tau^{1:N} = \zeta_\tau^{1:N}(p_\tau^{1:N}, c_\tau; \tilde{g}_{1:\tau-1}^*)$ for $\tau \in \mathcal{T}$;

(iv) given an arbitrary strategy profile $\tilde{g}^*$ of the form $\tilde{g}^* := \{\tilde{g}_t^i : \mathcal{S}_t^i \times \mathcal{C}_t \to \Delta(\mathcal{A}_t^i),\ i \in \mathcal{N}, t \in \mathcal{T}\}$, $i \in \mathcal{N}$, and $t \in \mathcal{T}$,

$$\mathbb{P}^{\tilde{g}_{1:t-1}^*}\left\{ s_t^{-i} \,\middle|\, p_t^i, c_t \right\} = \mathbb{P}^{\tilde{g}_{1:t-1}^*}\left\{ s_t^{-i} \,\middle|\, s_t^i, c_t \right\}, \qquad (11)$$

for all realizations $\{c_t, p_t^i\} \in \mathcal{C}_t \times \mathcal{P}_t^i$ of positive probability, where $s_\tau^{1:N} = \zeta_\tau^{1:N}(p_\tau^{1:N}, c_\tau; \tilde{g}_{1:\tau-1}^*)$ for $\tau \in \mathcal{T}$.

We note that the conditions of Definition 4 are written in terms of the strategy prediction profile $g^*$ for dynamic games. This is because, as we discussed before, the agents' actual strategy profile $g$ is their private information. Therefore, each agent $i$, $i \in \mathcal{N}$, evaluates the sufficiency of a compression of his private information using the strategy prediction he holds about other agents.

B. Sufficient Common Information

Based on the characterization of sufficient private information, we present a statistic (compressed version) of the common information $C_t$ that agents need to keep track of over time for decision-making purposes.
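Before constructing this statistic, here is a minimal sketch of the recursive update (8) of Definition 4, instantiated for the perfectly-controlled/hidden-actions special case of Section II-A, where (as identified in Section V-C below) one may take $S_t^i = \{A_{t-1}^i, Y_t^i\}$; the concrete values are illustrative:

```python
from collections import namedtuple

# Sketch of the recursive compression (8): the sufficient private
# information S_t^i is updated from S_{t-1}^i and the new observations
# H_t^i \ H_{t-1}^i, without revisiting the full history. Here
# S_t^i = (A_{t-1}^i, Y_t^i), so its domain is time-invariant.

S = namedtuple("S", ["a_prev", "y"])   # sufficient private information

def phi(s_prev, new_obs):
    # new_obs = (y_t, a_{t-1}^i): the increment H_t^i \ H_{t-1}^i
    y_t, a_prev = new_obs
    return S(a_prev=a_prev, y=y_t)     # older history is discarded

# usage: a private history is folded into a fixed-size statistic
history = [((1, 0), None), ((0, 1), 1), ((1, 1), 0)]   # (y_t, a_{t-1}^i)
s = S(a_prev=None, y=history[0][0])                     # t = 1
for new_obs in history[1:]:
    s = phi(s, new_obs)
print(s)   # S(a_prev=0, y=(1, 1))
```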
Consider the sufficient private information $S_t^{1:N}$, $t \in \mathcal{T}$. Define $\mathcal{S}_t^i$ to be the set of all possible realizations of $S_t^i$, and $\mathcal{S}_t := \prod_{i=1}^N \mathcal{S}_t^i$. Let $\gamma_t : \mathcal{C}_t \to \Delta(\mathcal{X}_t \times \mathcal{S}_t)$ denote a mapping that determines a conditional probability distribution over the system state $X_t$ and the agents' sufficient private information $S_t$ given the common information $C_t$ at time $t$. We call the collection of mappings $\gamma := \{\gamma_t, t \in \mathcal{T}\}$ a Sufficient Information Based belief system (SIB belief system). Note that $\gamma_t$ is only a function of the common information $C_t$, and thus, it is computable by all agents. Let $\Pi_t^\gamma := \gamma_t(C_t)$ denote the (random) sufficient information based belief that agents hold under belief system $\gamma$ at $t$. We can interpret $\Pi_t^\gamma$ as the common belief that each agent holds about the system state $X_t$ and all the agents' (including his own) sufficient private information $S_t$ at time $t$. We call the SIB belief $\Pi_t$ sufficient common information for the agents. In the rest of the paper, we write $\Pi_t$ and drop the superscript $\gamma$ whenever such a simplification in notation is clear. Moreover, we use the terms sufficient common information and SIB belief interchangeably.

C. Special Cases

We consider the special classes described in Section II and identify the sufficient information $S_{1:T}^{1:N}$ and SIB belief for each of them.

1) Nested information structure: The uninformed agent (agent 2) has no private information, $P_t^2 = \emptyset$. Thus, $S_t^2 = \emptyset$. For the informed agent (agent 1), whose private information is $P_t^1 = \{X_{1:t}\}$, we can set $S_t^1 = X_t$. Note that $P_t^2 = \emptyset$; thus, the uninformed agent's belief about $P_t^1$ is the same as the SIB belief $\Pi_t = \mathbb{P}\{X_t \mid A_{1:t-1}^1, A_{1:t-1}^2\}$.

2) Independent dynamics with observable actions: Consider $S_t^i = X_t^i$. Note that the $X_t^j$, $j \in \mathcal{N}$, have independent dynamics given the collective action $A_t$, which is commonly observable by all agents. Therefore, agent $i$'s belief about $X_t^j$, $j \neq i$, coincides with the corresponding marginal of the SIB belief, $\mathbb{P}^g\{X_t^j \mid C_t\} = \mathbb{P}^g\{X_t^j \mid P_t^i, C_t\}$.

3) Perfectly controlled dynamics with hidden actions: Since agent $i$, $i \in \mathcal{N}$, perfectly controls $X_t^i$ over time $t \in \mathcal{T}$, we set $S_t^i = \{A_{t-1}^i, Y_t^i\}$ and $\Pi_t = \emptyset$.

D. Sufficient Information Based Assessment

As we discussed in Section IV, to form a prediction about the game we need to determine an assessment about the game that is sequentially rational and consistent. We show below that using the sufficient private information $S_t^{1:N}$ and the sufficient common information (the SIB belief) $\Pi_t$, we can form a sufficient information based assessment about the game. We prove that such a sufficient information based assessment is rich enough to capture a subset of PBE.

Consider a class of strategies that utilize the information given by $(\Pi_t, S_t^i)$ for agent $i \in \mathcal{N}$ at time $t$. We call the mapping $\sigma_t^i : \Delta(\mathcal{X}_t \times \mathcal{S}_t) \times \mathcal{S}_t^i \to \Delta(\mathcal{A}_t^i)$ a Sufficient Information Based (SIB) strategy for agent $i$ at time $t$. A SIB strategy $\sigma_t^i$ determines a probability distribution for agent $i$'s action $A_t^i$ at time $t$ given his information $(\Pi_t, S_t^i)$. A SIB strategy is a behavioral strategy where agents only use the SIB belief $\Pi_t = \gamma_t(C_t)$ (instead of the complete common information $C_t$), and the sufficient private information $S_t^i = \zeta_t^i(P_t^i, C_t; g_{1:t-1}^*)$ (instead of the complete private information $P_t^i$). A collection of SIB strategies $\{\sigma_{1:T}^1, \dots, \sigma_{1:T}^N\}$ is called a SIB strategy profile $\sigma$.
The set of SIB strategies is a subset of the behavioral strategies defined in Section II, as we can define

$$g_t^{(\sigma,\gamma),i}(h_t^i) := \sigma_t^i(\pi_t^\gamma, s_t^i).$$

In Section IV, we defined a consistency condition between a strategy prediction $g^*$ and a belief system $\mu$. Below, we provide an analogous consistency condition between a SIB strategy prediction $\sigma^*$ and a SIB belief system $\gamma$.

Definition 5. A pair $(\sigma^*, \gamma)$ of a SIB strategy prediction profile $\sigma^*$ and belief system $\gamma$ satisfies the consistency condition if

(i) for all $t \in \mathcal{T}\setminus\{1\}$, $z_t \in \mathcal{Z}_t$, $\pi_{t-1} = \gamma_{t-1}(c_{t-1})$, and $\pi_t = \gamma_t(\{c_{t-1}, z_t\})$ such that $\mathbb{P}^{\sigma_{t-1}^*}_{\pi_{t-1}}\{z_t\} > 0$, $\pi_t$ must satisfy Bayes' rule, i.e.,

$$\pi_t(x_t, s_t) = \frac{\mathbb{P}^{\sigma_{t-1}^*}_{\pi_{t-1}}\{x_t, s_t, z_t\}}{\mathbb{P}^{\sigma_{t-1}^*}_{\pi_{t-1}}\{z_t\}}, \quad \forall x_t \in \mathcal{X}_t, \forall s_t \in \mathcal{S}_t; \qquad (12)$$

(ii) for all $t \in \mathcal{T}\setminus\{1\}$, $c_{t-1} \in \mathcal{C}_{t-1}$, $\pi_{t-1} = \gamma_{t-1}(c_{t-1})$, $z_t \in \mathcal{Z}_t$, and $\pi_t = \gamma_t(\{c_{t-1}, z_t\})$ such that $\mathbb{P}^{\sigma_{t-1}^*}_{\pi_{t-1}}\{z_t\} = 0$, we have $\pi_t(x_t, s_t) > 0$, $\forall x_t \in \mathcal{X}_t, \forall s_t \in \mathcal{S}_t$, only if there exists an open-loop strategy $(A_{1:t-1} = a_{1:t-1})$ such that

$$\mathbb{P}^{(A_{1:t-1} = a_{1:t-1})}_{\pi_1}\{c_{t-1}, z_t\} > 0 \quad \text{and} \quad \mathbb{P}^{(A_{1:t-1} = a_{1:t-1})}_{\pi_1}\{x_t, s_t\} > 0; \qquad (13)$$

(iii) for all $t \in \mathcal{T}\setminus\{1\}$, $c_{t-1} \in \mathcal{C}_{t-1}$, $\pi_{t-1} = \gamma_{t-1}(c_{t-1})$, $z_t \in \mathcal{Z}_t$, and $\pi_t = \gamma_t(\{c_{t-1}, z_t\})$ such that $\mathbb{P}^{\sigma_{t-1}^*}_{\pi_{t-1}}\{z_t\} = 0$, we have

$$\sum_{x_t \in \mathcal{X}_t} \pi_t(x_t, s_t) > 0, \quad \forall s_t \in \mathcal{S}_t,$$

if there exists an open-loop strategy $(A_{1:t-1} = a_{1:t-1})$ such that

$$\mathbb{P}^{(A_{1:t-1} = a_{1:t-1})}_{\pi_1}\{c_{t-1}, z_t\} > 0 \quad \text{and} \quad \sum_{x_t \in \mathcal{X}_t} \mathbb{P}^{(A_{1:t-1} = a_{1:t-1})}_{\pi_1}\{x_t, s_t\} > 0. \qquad (14)$$

(For $t = 1$, $\Pi_1$ is given directly by the conditional probability at $t = 1$ as $\Pi_1(x_1, s_1) := \frac{\mathbb{P}\{s_1 \mid x_1, z_1\}\,\mathbb{P}\{z_1 \mid x_1\}\,\eta(x_1)}{\sum_{\hat{x}_1 \in \mathcal{X}_1} \mathbb{P}\{z_1 \mid \hat{x}_1\}\,\eta(\hat{x}_1)}$.)

Parts (i) and (ii) of Definition 5 follow from rationales similar to their analogues in Definition 2, and require a SIB belief system to satisfy a set of constraints with respect to a SIB strategy profile that are similar to those for an assessment $(g^*, \mu)$. Definition 5 requires an additional condition, described by part (iii). By (14), we require a SIB belief system $\gamma$ consistent with the SIB strategy profile $\sigma^*$ to assign positive probability to every realization $s_t$ of the agents' sufficient private information $S_t$ that is "plausible" given the common information realization $c_t = \{c_{t-1}, z_t\}$; plausibility of $s_t$ given $c_t$ means that there exists an open-loop strategy profile $(A_{1:t-1} = a_{1:t-1})$ consistent with the realization $c_t$ that leads to the realization of $s_t$ with positive probability. Therefore, part (iii) ensures that there exists no possible incompatibility between the SIB belief $\Pi_t$ and the agents' sufficient private information $S_t$. As we show later (Section V-E), such a compatibility condition allows each agent to refine the SIB belief $\Pi_t$ using his own sufficient private information $S_t^i$, and to form his private belief about the game.

Remark 4. Assumptions 2 and 3 imply that (13) holds for all $(A_{1:t-1} = a_{1:t-1})$ such that $\mathbb{P}^{(A_{1:t-1} = a_{1:t-1})}\{c_t\} > 0$. Therefore, in the rest of the paper, we ignore part (ii) of the consistency condition for SIB belief systems. Moreover, under Assumptions 2 and 3, condition (14) is always satisfied. Therefore, condition (iii) is equivalent to having $\sum_{x_t \in \mathcal{X}_t} \pi_t(x_t, s_t) > 0$ whenever $\mathbb{P}^{\sigma^*}_{\pi_{t-1}}\{z_t\} = 0$.

Given a SIB strategy profile prediction $\sigma^*$, a consistent SIB belief must satisfy (12), which determines the SIB belief $\Pi_t$ at $t$ in terms of the SIB belief $\Pi_{t-1}$ at $t-1$ and the new common information $Z_t$ at $t$; a computational sketch of this one-step update is given below.
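The following minimal sketch implements the one-step Bayes update (12) for finite sets; the transition/observation kernel and the predicted strategy below are hypothetical stand-ins, so this is an illustration of the update, not the paper's algorithm:

```python
import itertools

# One step of the SIB belief update (12) for finite sets. The kernels
# are hypothetical: `sigma(s, a)` is the predicted SIB strategy at t-1,
# and `kernel(x, s, a, x2, s2, z)` is the assumed joint probability of
# (X_t = x2, S_t = s2, Z_t = z) given (x, s, a).

X = [0, 1]          # state values
S = [0, 1]          # joint sufficient-private-information values
A = [0, 1]          # joint action values

def sigma(s, a):                      # predicted strategy sigma*_{t-1}
    return 0.5                        # uniform, for illustration

def kernel(x, s, a, x2, s2, z):       # P{x2, s2, z | x, s, a}, assumed
    p_x2 = 0.9 if x2 == (x ^ a) else 0.1
    p_z = 0.8 if z == x2 else 0.2
    p_s2 = 0.5                        # s2 uniform, for illustration
    return p_x2 * p_z * p_s2

def sib_update(pi_prev, z):
    """Bayes update (12): pi_t(x2, s2) given Pi_{t-1} and Z_t = z."""
    joint = {}
    for (x, s), (a, x2, s2) in itertools.product(
            pi_prev, itertools.product(A, X, S)):
        p = pi_prev[(x, s)] * sigma(s, a) * kernel(x, s, a, x2, s2, z)
        joint[(x2, s2)] = joint.get((x2, s2), 0.0) + p
    total = sum(joint.values())       # = P{z}; positive on-path
    return {k: v / total for k, v in joint.items()}

pi0 = {(x, s): 0.25 for x in X for s in S}   # uniform prior
print(sib_update(pi0, z=1))
```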
We define a SIB belief update rule as a collection of mappings $\psi_t : \Delta(\mathcal{X}_{t-1} \times \mathcal{S}_{t-1}) \times \mathcal{Z}_t \to \Delta(\mathcal{X}_t \times \mathcal{S}_t)$, $t \in \mathcal{T}$, that determines recursively the SIB belief

$$\Pi_t^\psi := \psi_t(\Pi_{t-1}^\psi, Z_t), \qquad (15)$$

as a function of the new common observation $Z_t$ at $t$ and the SIB belief $\Pi_{t-1}^\psi$ at $t-1$. The superscript $\psi$ in $\Pi_t^\psi$ indicates that the SIB belief $\Pi_t^\psi$ is generated using the SIB update rule $\psi$. Let $\gamma^\psi$ denote the common belief system that is equivalent to the SIB update rule $\psi$. We call a SIB belief update rule $\psi$ consistent with a SIB strategy profile $\sigma^*$ if the equivalent SIB belief system $\gamma^\psi$ is consistent with $\sigma^*$ (Definition 5). (Upon reaching an information set of measure zero (parts (ii) and (iii) of Definition 5), the revised SIB belief could be a function of $C_t = \{C_{t-1}, Z_t\}$, rather than only $\Pi_{t-1}(C_{t-1})$ and $Z_t$. Therefore, the set of SIB belief systems that can be generated from SIB update rules is a subset of all consistent SIB belief systems given by Definition 5. However, we argue that upon reaching an information set of measure zero, it is more plausible to revise the SIB belief only as a function of the relevant information $\Pi_{t-1}(C_{t-1})$ and $Z_t$; $C_t$ is irrelevant given $\Pi_{t-1}(C_{t-1})$ and $Z_t$.)

Define a SIB assessment $(\sigma^*, \gamma)$ as a pair of a SIB strategy profile $\sigma^*$ and a SIB belief system $\gamma$. Below, we show that a consistent SIB assessment $(\sigma^*, \gamma)$ is equivalent to a consistent assessment $(g^*, \mu)$ as defined in Section IV (Definition 2).

Lemma 1. For any given SIB assessment $(\sigma^*, \gamma)$, there exists an equivalent assessment $(g^*, \mu)$ of a behavioral strategy prediction $g^*$ and belief system $\mu$ such that:

i) the behavioral strategy $g^*$ is defined by

$$g_t^{*i}(h_t^i) := \sigma_t^{*i}(\pi_t^\gamma, s_t^i); \qquad (16)$$

ii) the belief system $\mu$ is consistent with $g^*$ and satisfies

$$\mathbb{P}^{g^*}\{s_t^{-i} \mid h_t^i\} = \mathbb{P}\{s_t^{-i} \mid \pi_t, s_t^i\}, \qquad (17)$$

for all $i \in \mathcal{N}$, $t \in \mathcal{T}$, $h_t^i \in \mathcal{H}_t^i$, and $s_t^{-i} \in \mathcal{S}_t^{-i}$.

Lemma 1 shows that the set of consistent SIB assessments $(\sigma^*, \gamma)$ is equivalent to a subset of consistent assessments $(g^*, \mu)$. That is, using the SIB belief system $\gamma$ and SIB strategy profile $\sigma^*$, agents can form a consistent assessment about the evolution of the game. Moreover, condition (17) implies that the SIB belief $\Pi_t$ along with agent $i$'s sufficient information $S_t^i$ captures all the information in $H_t^i$ that is relevant to agent $i$'s belief about $S_t^{-i}$.

E. Sufficient Information Based PBE

Using the result of Lemma 1, we can define a class of PBE, called Sufficient Information Based PBE (SIB-PBE), as the set of equilibria for dynamic games with asymmetric information that can be expressed as SIB assessments.

Definition 6. A SIB assessment $(\sigma^*, \gamma)$ is called a SIB-PBE if $\gamma$ is consistent with $\sigma^*$ (Definition 5), and the equivalent consistent assessment $(g^*, \mu)$, given by Lemma 1, is a PBE.

In the sequel, we also call a consistent pair $(\sigma^*, \psi)$ of a SIB strategy prediction profile $\sigma^*$ and a SIB belief update rule $\psi$ a SIB-PBE if $(\sigma^*, \gamma^\psi)$ is a SIB-PBE.

Throughout Section V, we assumed that agents play according to the strategy predictions $g^*$ (or SIB strategy predictions $\sigma^*$). However, an agent's, say agent $i \in \mathcal{N}$'s, actual strategy $g^i$ is his private information and could be different from $g^{*i}$ if such a deviation is profitable for him. The proposed class of SIB assessments imposes two restrictions on agents' strategies and beliefs compared to the general class of assessments presented in Section IV.
First, it requires that each agent $i$, $i \in \mathcal{N}$, play a SIB strategy $\sigma^{*i}$ instead of a general behavioral strategy $g^{*i}$. Second, it requires that each agent $i$, $i \in \mathcal{N}$, form a belief about the status of the game using only the SIB belief $\Pi_t$ along with his sufficient private information $S_t^i$ (instead of a general belief $\mu_t^i$). A strategic agent $i \in \mathcal{N}$ does not restrict his choice of strategy to SIB strategies, and may deviate from $\sigma^{*i}$ to a non-SIB strategy $g^i$ if it is profitable to him. Moreover, a strategic agent $i$ does not limit himself to forming a belief about the current status of the game based only on $\Pi_t$ and $S_t^i$, and may instead use a general belief $\mu^i$ if it enables him to improve his expected utility. In the next section, we address these strategic concerns, and show that no agent $i \in \mathcal{N}$ wants to deviate from $(\Pi, \sigma^*)$ and play a non-SIB strategy $g^i$ when all other agents are playing according to the SIB assessment $(\Pi, \sigma^*)$. This result allows us to focus on the class of SIB assessments, and develop a methodology to sequentially decompose the dynamic game over time.

VI. MAIN RESULTS

In this section, we show that the class of SIB assessments is rich enough to capture the agents' strategic interactions. We first show that when agents $-i$ play according to a SIB assessment $(\sigma^*, \gamma)$, agent $i$, $i \in \mathcal{N}$, cannot mislead these agents by playing a strategy $g^i$ different from $\sigma^{*i}$, thus creating dual beliefs: one belief that is based on the SIB assessment $(\sigma^*, \gamma)$, the functional form of which is known to all agents, and another belief that is based on his private strategy $g^i$ that is known only to him (Theorem 1). Then, we show that given that agents $-i$ play SIB strategies $\sigma^{*-i}$, agent $i$ has a best response that is a SIB strategy (Theorem 2).

The result of Theorem 1 (resp. 2) for agent $i \in \mathcal{N}$ assumes that all other agents $-i$ play according to the strategy prediction $g^{*-i}$ (resp. $\sigma^{*-i}$). The same results hold for every continuation game that starts at any time $t \in \mathcal{T}$ along an off-equilibrium path; they can be proved by relabeling time $t$ as time $1$, and using the SIB belief $\pi_t = \gamma_t(c_t)$ and the corresponding belief $\mu_t^i(h_t^i)$, defined by Lemma 1, as the initial common belief for the continuation game.

Using the results of Theorems 1 and 2, we present a methodology to determine the set of SIB-PBEs of stochastic dynamic games with asymmetric information (Theorem 3). The proposed methodology leads to a sequential decomposition of stochastic dynamic games with asymmetric information. This decomposition gives rise to a dynamic program that can be utilized to compute SIB-PBEs via backward induction. We proceed by stating the following result from the companion paper [2, Theorem 1].

Theorem 1 (Policy-independence belief property [2]).

(i) Consider a general strategy prediction profile $g^*$. If agents $-i$ play according to strategy predictions $g^{*-i}$, then for every strategy $g^i$ that agent $i$ plays,

$$\mathbb{P}^{(g^{*-i}, g^i)}\left\{ x_t, p_t^{-i} \,\middle|\, h_t^i \right\} = \mathbb{P}^{g^{*-i}}\left\{ x_t, p_t^{-i} \,\middle|\, h_t^i \right\}. \qquad (18)$$

(ii) Consider a SIB strategy prediction profile $\sigma^*$ along with the associated consistent update rule $\psi$. If agents $-i$ play according to SIB strategy predictions $\sigma^{*-i}$, then for every general strategy $g^i$ that agent $i$ plays,

$$\mathbb{P}^{(\sigma^{*-i}, g^i)}_{\psi}\left\{ x_t, p_t^{-i} \,\middle|\, h_t^i \right\} = \mathbb{P}^{\sigma^{*-i}}_{\psi}\left\{ x_t, p_t^{-i} \,\middle|\, h_t^i \right\}. \qquad (19)$$
Using the result of Theorem 1, we show that agent $i \in \mathcal{N}$ does not gain by playing a non-SIB strategy $\tilde{g}_i$ when all other agents $-i$ play the SIB strategies $\sigma^*_{-i}$.

Theorem 2 (Closedness of SIB strategies). Consider a consistent SIB assessment $(\sigma^*, \gamma^\psi)$, where $\psi$ is a SIB update rule consistent with $\sigma^*$. If every agent $j \in \mathcal{N}$, $j \neq i$, plays the SIB strategy $\sigma^*_j$, then there exists a SIB strategy $\sigma_i$ for agent $i$ that is a best response to $\sigma^*_{-i}$.

The results of Theorems 1 and 2 address, respectively, the two restrictions (discussed above) that SIB assessments impose on the agents' beliefs and strategies. Based on these results, we restrict attention to SIB assessments, and provide a sequential decomposition of dynamic games with asymmetric information. A SIB-PBE is a SIB assessment that is a fixed point of the best-response map for all agents. Below, we formulate a dynamic program that enables us to compute SIB-PBEs of dynamic games with asymmetric information.

Consider a dynamic program over the time horizon $\mathcal{T} \cup \{T+1\}$ with information state $\{\Pi_t, S_t\}$, $t \in \mathcal{T}$. Let $V_t := \{V_{it} : \Delta(\mathcal{X}_t \times \mathcal{S}_t) \times \mathcal{S}_{it} \to \mathbb{R},\ i \in \mathcal{N}\}$ denote the value functions that capture the continuation payoffs of all agents, for all realizations of the SIB belief $\Pi_t$ and the agents' sufficient private information $S_t$, $t \in \mathcal{T}$. Set $V_{iT+1} = 0$ for all $i \in \mathcal{N}$. For each stage $t \in \mathcal{T}$ of the dynamic program, consider the following static game.

Stage game $G_t(\pi_t, V_{t+1}, \psi_{t+1})$: Given the value functions $V_{t+1}$ and the SIB update rule $\psi_{t+1}$, we define the stage game $G_t(\pi_t, V_{t+1}, \psi_{t+1})$ as a static game of asymmetric information among the agents, for every realization $\pi_t$. Each agent $i \in \mathcal{N}$ has private information $S_{it}$ that is distributed according to $\pi_t$, which is common knowledge among the agents. Given a realization $a_t$ of the agents' action profile and a realization $s_t$ of the agents' sufficient private information, agent $i$'s utility is given by
$$\bar{U}_{it}(a_t, s_t, \pi_t, V_{t+1}, \psi_{t+1}) := \mathbb{E}^{\pi_t}\big\{u_{it}(X_t, a_t) + V_{it+1}(\psi_{t+1}(\pi_t, Z_{t+1}), S_{it+1}) \,\big|\, \pi_t, s_t, a_t\big\}. \tag{20}$$

BNE correspondence: We define the correspondence $\mathrm{BNE}_t(V_{t+1}, \psi_{t+1})$, $t \in \mathcal{T}$, as the mapping that characterizes the set of BNEs of the stage game $G_t(\pi_t, V_{t+1}, \psi_{t+1})$ for every realization of $\pi_t$; this correspondence is given by
$$\mathrm{BNE}_t(V_{t+1}, \psi_{t+1}) := \big\{\sigma^*_t : \forall \pi_t \in \Delta(\mathcal{X}_t \times \mathcal{S}_t),\ \sigma^*_t(\pi_t, \cdot) \text{ is a BNE of } G_t(\pi_t, V_{t+1}, \psi_{t+1})\big\}. \tag{21}$$
We say $\sigma^*_t(\pi_t, \cdot)$ is a BNE of the stage game $G_t(\pi_t, V_{t+1}, \psi_{t+1})$ if for all agents $i \in \mathcal{N}$ and all $s_{it} \in \mathcal{S}_{it}$,
$$\sigma^*_{it}(\pi_t, s_{it}) \in \arg\max_{\alpha \in \Delta(\mathcal{A}_{it})} \mathbb{E}^{\pi_t}\big\{\bar{U}_{it}\big((\alpha, \sigma^*_{-it}(\pi_t, S_{-it})), S_t, \pi_t, V_{t+1}, \psi_{t+1}\big) \,\big|\, \pi_t, s_{it}\big\}. \tag{22}$$

Below, we provide a sequential decomposition of dynamic games with asymmetric information using the stage game and the BNE correspondence defined above.

Theorem 3 (Sequential decomposition). A pair $(\sigma^*, \psi)$ of a SIB strategy profile $\sigma^*$ and a SIB update rule $\psi$ (equivalently, a SIB assessment $(\sigma^*, \gamma^\psi)$) is a SIB-PBE if $(\sigma^*, \psi)$ solves the following dynamic program:
for $t = T+1$:
$$V_{iT+1}(\cdot) := 0 \quad \forall i \in \mathcal{N}; \tag{23}$$
for $t \in \mathcal{T}$:
$$\sigma^*_t \in \mathrm{BNE}_t(V_{t+1}, \psi_{t+1}), \tag{24}$$
$$\psi_{t+1} \text{ is consistent with } \sigma^*_t, \tag{25}$$
$$V_{it}(\pi_t, s_{it}) := \mathbb{E}^{\sigma^*_t}_{\psi_{t+1}}\big\{\bar{U}_{it}\big(\sigma^*_t(\pi_t, S_t), S_t, \pi_t, V_{t+1}, \psi_{t+1}\big) \,\big|\, \pi_t, s_{it}\big\} \quad \forall i \in \mathcal{N}. \tag{26}$$
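Theorem 3 turns the equilibrium computation into backward induction. The sketch below (our own, not the paper's) runs the dynamic program (23)-(26) in the degenerate case of a repeated game with observable actions and no private information, as in the example of Section VII below: the SIB belief is trivial, each stage BNE reduces to a pure Nash equilibrium of a bimatrix game (found here by enumeration), and picking the first stage equilibrium is one admissible selection rule, while the correspondence (21) carries the full equilibrium set.

```python
import numpy as np
from itertools import product

def pure_nash(U1, U2):
    """All pure-strategy Nash equilibria of a finite bimatrix game."""
    eqs = []
    for a1, a2 in product(range(U1.shape[0]), range(U1.shape[1])):
        best1 = U1[a1, a2] >= U1[:, a2].max()   # agent 1 cannot improve
        best2 = U2[a1, a2] >= U2[a1, :].max()   # agent 2 cannot improve
        if best1 and best2:
            eqs.append((a1, a2))
    return eqs

def backward_induction(U1, U2, T, pick=0):
    """Dynamic program (23)-(26) when beliefs are trivial: each stage game is
    the one-shot game with payoffs shifted by the continuation values V_{t+1}.
    The shift is constant, so it leaves the stage equilibrium set unchanged,
    which is why the SIB-PBEs here are products of stage equilibria. Assumes
    a pure stage equilibrium exists; `pick` selects one per stage."""
    V1 = V2 = 0.0
    plan = []
    for t in reversed(range(T)):
        a1, a2 = pure_nash(U1 + V1, U2 + V2)[pick]   # Eq. (24) at stage t
        plan.append((t, (a1, a2)))
        V1, V2 = U1[a1, a2] + V1, U2[a1, a2] + V2    # Eq. (26)
    return plan[::-1], (V1, V2)

# Payoffs of the example in Section VII (Table I).
U1 = np.array([[8, 0, 2], [0, 1, 0]])
U2 = np.array([[3, 2, 10], [1, 2, 0]])
print(backward_induction(U1, U2, T=2))   # one of the four SIB-PBEs: (U,R),(U,R)
```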
VII. DISCUSSION

We showed that the SIB assessments proposed in this paper are rich enough to capture a set of PBEs. However, we would like to point out that the concept of SIB-PBE does not, in general, capture all PBEs of a dynamic game. We elaborate on the relation between the set of SIB-PBEs and the set of PBEs below, and argue that SIB-PBEs are more plausible to arise as the information asymmetry among the agents increases and the underlying system becomes more dynamic.

We presented an approach to compress the agents' private and common information by providing conditions sufficient to characterize the information that is relevant for decision-making purposes. Such information compression means that the agents do not incorporate into their decision-making processes observations that are irrelevant to the continuation game. As we show in [2], this compression is without loss of generality for dynamic team problems. However, this is not the case in dynamic games. In general, the set of SIB-PBEs of a dynamic game is a subset of all PBEs of that game. This is because in a dynamic game agents can incorporate their past irrelevant observations into their future decisions so as to create rewards (resp. punishments) that incentivize the agents to play (resp. not play) specific actions over time. By compressing the agents' private and common information in SIB assessments, we do not capture such punishment/reward schemes based on past irrelevant observations. Below, we present an example where there exists a PBE that cannot be captured as a SIB-PBE.

Consider a two-agent repeated game with $T = 2$ and the payoff matrix given in Table I. At each stage, agent 1 chooses an action from $\{U, D\}$, and agent 2 chooses an action from $\{L, M, R\}$. We assume that the agents' actions are observable. Therefore, the agents have no private information, and the sufficient private information and the SIB belief are trivial. The stage game has two pure-strategy equilibria, $(D, M)$ and $(U, R)$. Using the results of Theorem 3, we can characterize four SIB-PBEs of the repeated game, corresponding to the four combinations of the two stage-game equilibria: $(DD, MM)$, $(UU, RR)$, $(DU, MR)$, and $(UD, RM)$. However, there exists another PBE of the repeated game that cannot be captured as a SIB-PBE. Consider the following strategy profile: play $(U, L)$ at $t = 1$; if agent 2 plays $L$ at $t = 1$, play $(U, R)$ at $t = 2$; otherwise, play $(D, M)$ at $t = 2$. Note that agent 1's decision at $t = 2$ depends on agent 2's action at $t = 1$, which is payoff-irrelevant information since the two stages of the game are independent. Nevertheless, agent 1 utilizes agent 2's action at $t = 1$, and punishes him (agent 2) by playing $D$ at $t = 2$ if he deviates from playing $L$ at $t = 1$.

TABLE I: Payoff matrix (row player: agent 1; column player: agent 2)

          L        M        R
  U     (8,3)    (0,2)    (2,10)
  D     (0,1)    (1,2)    (0,0)
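The incentives in this example can be verified mechanically. The following sketch (ours, with the payoffs of Table I) computes agent 2's two-stage payoff for each $t = 1$ action under the scheme and confirms that the payoff-irrelevant punishment deters the deviation to $R$ (13 versus 12):

```python
import numpy as np

# Payoffs from Table I (rows: U, D for agent 1; columns: L, M, R for agent 2).
U2 = np.array([[3, 2, 10],
               [1, 2, 0]])
A2 = ["L", "M", "R"]

def agent2_total(a2_first):
    """Agent 2's two-stage payoff when agent 1 follows the trigger scheme:
    (U, .) at t=1, then (U,R) at t=2 if agent 2 played L, else (D,M)."""
    if a2_first == "L":
        cont = U2[0, 2]          # rewarded with (U, R): agent 2 gets 10
    else:
        cont = U2[1, 1]          # punished with (D, M): agent 2 gets 2
    return U2[0, A2.index(a2_first)] + cont

for a in A2:
    print(a, agent2_total(a))    # L: 13, M: 4, R: 12 -> L is optimal at t=1
```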
We would like to point out that there are instances of dynamic games with asymmetric information, such as zero-sum dynamic games [32], where the equilibrium payoffs of the agents are unique. In these games it is not possible to incorporate payoff-irrelevant information so as to construct additional equilibria whose payoffs differ from those corresponding to SIB-PBEs; this is clearly the case for zero-sum games, since the agents do not cooperate on creating punishment/reward schemes due to the zero-sum nature of the game. In Section VIII, we utilize this fact to prove the existence of SIB-PBEs in zero-sum games. Furthermore, we show that in these games SIB-PBEs are equivalent to PBEs in terms of the agents' equilibrium payoffs.

While in general the set of PBEs of a dynamic game is larger than its set of SIB-PBEs, in the remainder of this section we provide three reasons why, in a highly dynamic environment with information asymmetry among the agents, SIB-PBEs are more plausible to arise as the outcome of the game.

First, we argue that in the face of a highly dynamic environment, an agent with partial observations of the environment should not behave fundamentally differently depending on whether he interacts in a strategic or a cooperative environment. From the single-agent decision-making point of view (i.e., control theory), SIB strategies are the natural choice for decision-making purposes (see Theorem 2 in the companion paper [2]). (Indeed, the SIB approach proposed in this paper for dynamic games, along with the SIB approach to dynamic teams proposed in the companion paper [2], provides a unified approach to the study of agents' behavior in a dynamic environment with information asymmetry among them.)

Second, we argue that in a highly dynamic environment with information asymmetry among the agents, the formation of punishment/reward schemes that utilize the agents' payoff-irrelevant information requires prior complex agreements among the agents; these agreements are sensitive to the parameters of the model and are not very plausible to arise in practice when the decision-making problem facing each agent is itself a complex task. We note that the PBEs that cannot be captured as SIB-PBEs are the ones that utilize payoff-irrelevant information to create punishment/reward schemes in the continuation game, as in the example above. Such punishment/reward schemes require the agents to form a common agreement among themselves on how to utilize the payoff-irrelevant information and how to implement the punishments/rewards. The formation of such a common agreement is more likely in games where the underlying system is not highly dynamic (as in repeated games [7]) and there is not much information asymmetry among the agents. In a highly dynamic environment with information asymmetry, however, the formation of such a common agreement becomes less likely, for two reasons. (i) In those environments each agent's individual decision-making problem is a complex POMDP; thus, strategic agents are less likely to form a prior common agreement (which depends on the solutions of the individual POMDPs) on top of solving their individual POMDPs. (ii) As the information asymmetry among the agents increases, punishment/reward schemes that utilize payoff-irrelevant information require a complex agreement among the agents that is sensitive, and not robust, to changes in the assumptions on the information structure of the game. For instance, consider the example described above, but assume that the agents observe each other's actions imperfectly at each stage (Assumption 3): with probability $1-\epsilon$, where $\epsilon \in (0, 1/2)$, an agent observes the other agent's action correctly, and with probability $\epsilon$ his observation differs from the true action. Then the non-SIB strategy profile described above remains a PBE of the game only if $\epsilon$ is below a certain threshold. The author of [33] provides a general result on the robustness of the above-mentioned punishment/reward schemes in repeated games; he shows that the set of equilibria that are robust to changes in the information structure affecting only payoff-irrelevant signals does not include the equilibria that utilize such punishment/reward schemes.

Third, the proposed notion of SIB-PBE can be viewed as a generalization of Markov Perfect Equilibrium [3] to dynamic games with asymmetric information. Therefore, a set of rationales similar to those supporting the notion of MPE also applies to the notion of SIB-PBE. First, SIB assessments describe the simplest form of strategies capturing the agents' behavior that is consistent with the agents' rationality. Second, the class of SIB assessments captures the notion that "bygones are bygones", which also underlies the requirement of subgame perfection in equilibrium concepts for dynamic games; that is, the agents' strategies in two continuation games that differ only in the agents' information about payoff-irrelevant events must be identical. Third, the class of SIB assessments embodies the principle that "minor changes in the past should have minor effects": if there is a small perturbation in the specification of the game, or in the agents' past strategies, that is irrelevant to the continuation game, the outcome of the continuation game should not change drastically. The two-stage example above presents one such situation, where the equilibrium that is not a SIB-PBE disappears abruptly once the observation noise $\epsilon$ exceeds such a threshold.
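The fragility claim can be made quantitative under one concrete noise model, which is our own instantiation (uniform misreads of agent 2's action) rather than the paper's exact specification, so the threshold below depends on that modeling choice. The on-path payoff of the trigger scheme degrades linearly in $\epsilon$ while the deviation payoff does not:

```python
import numpy as np

U2 = np.array([[3, 2, 10], [1, 2, 0]])   # agent 2's payoffs (rows U, D)

def u2_total(eps, play_L):
    if play_L:
        # on-path: 3 at t=1; then (U,R) if agent 1 reads L (prob 1-eps),
        # (D,R) if he misreads (prob eps)
        return 3 + (1 - eps) * U2[0, 2] + eps * U2[1, 2]
    # deviate to R: 10 at t=1; agent 2 then plays M and gets 2 whether or not
    # agent 1 detects the deviation (both (D,M) and (U,M) pay her 2)
    return 10 + 2

for eps in np.linspace(0, 0.3, 7):
    print(f"eps={eps:.2f}  on-path={u2_total(eps, True):.2f}  "
          f"deviate={u2_total(eps, False):.2f}")
# On-path payoff 13 - 10*eps drops below the deviation payoff 12 once
# eps > 0.1: under this noise model the non-SIB equilibrium disappears
# for any noise level above that threshold.
```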
VIII. EXISTENCE OF EQUILIBRIA

As we discussed in Section VII, there exist PBEs that cannot, in general, be described as SIB-PBEs. Therefore, the standard results that guarantee the existence of a PBE for dynamic games with asymmetric information [31, Proposition 249.1] cannot be used to guarantee the existence of a SIB-PBE in these games. In this section, we discuss the existence of SIB-PBEs for dynamic games with asymmetric information. We provide conditions that are sufficient to guarantee the existence of SIB-PBEs (Lemmas 2 and 3). Using the result of Lemma 2, we prove the existence of SIB-PBEs for zero-sum dynamic games with asymmetric information (Theorem 4). Using the result of Lemma 3, we identify instances of non-zero-sum dynamic games with asymmetric information where we can guarantee the existence of SIB-PBEs.

Lemma 2. The dynamic program given by (24)-(26) has at least one solution at stage $t$ if the value function $V_{t+1}$ is continuous in $\pi_{t+1}$.

We note that the condition of Lemma 2 is always satisfied for $t = T$ by the definition of $V_{T+1}$; see (20) and (23). However, for $t < T$ it is not straightforward, in general, to prove the continuity of the value function $V_t$ in $\pi_t$. Given that $V_{t+1}$ is continuous in $\pi_{t+1}$, the result of [34, Theorem 2] implies that the set of equilibrium payoffs of the stage game at $t$ is upper hemicontinuous in $\pi_t$. Therefore, if the stage game $G_t(\pi_t, V_{t+1}, \psi_{t+1})$ has a unique equilibrium payoff for every $\pi_t$, we can show that $V_t$ is continuous in $\pi_t$ for $t < T$. Using this approach, we prove the existence of SIB-PBEs for zero-sum games below.

Theorem 4. For every dynamic zero-sum game with asymmetric information there exists a SIB-PBE that is a solution to the dynamic program given by (24)-(26).

For dynamic non-zero-sum games, it is harder to establish that $V_t$ is continuous in $\pi_t$ for $t < T$, since the set of equilibrium payoffs is not a singleton in general. However, we conjecture that for every dynamic game with asymmetric information described in Section II, at every stage of the corresponding dynamic program it is possible to select a BNE for every realization of $\pi_t$ so that the resulting $V_t$ is continuous in $\pi_t$.

In addition to the results of Lemma 2 and Theorem 4, we provide below another condition that guarantees the existence of SIB-PBEs in some instances of dynamic games with asymmetric information.

Lemma 3. A dynamic game with asymmetric information described in Section II has at least one SIB-PBE if there exists sufficient information $S^{1:N}_t$, $t \in \mathcal{T}$, such that the SIB update rule $\psi_{1:T}$ is independent of $\sigma^*$.

The independence of the SIB update rule from $\sigma^*$ is a condition that is not satisfied in all dynamic games with asymmetric information. Nevertheless, we present below special instances where this condition is satisfied.

1) Nested information structure with one controller [11]: Consider the nested information structure described in Section II. Assume that the evolution of the system is controlled only by the uninformed agent (say agent 2) and is given by $X_{t+1} = f_t(X_t, A_{2t}, W_t)$. For $S_{1t} = X_t$ and $S_{2t} = \emptyset$, it is easy to check that $\mathbb{P}^{\sigma^*}\{\pi_{t+1} \mid \pi_t, a_t\} = \mathbb{P}\{\pi_{t+1} \mid \pi_t, a_t\}$ for all $t \in \mathcal{T}$.

2) Independent dynamics with observable actions and no private valuations [18]: Consider the model with independent dynamics and observable actions described in Section II. Assume that agent $i$'s instantaneous utility is given by $u_{it}(A_t, X_{-it})$ (no private valuation); that is, agent $i$'s utility at $t$ does not depend on $X_{it}$. It is easy to verify that $S_{it} = \emptyset$ is sufficient private information for agent $i$. Hence, the condition of Lemma 3 is trivially satisfied.

3) Delayed sharing information structure with $d = 1$ [35], [36]: Consider a delayed sharing information structure where the agents' actions and observations are revealed publicly with a delay of $d$ time steps. When the delay is $d = 1$, we have $P_{it} = \{Y_{it}\}$. Let $S_{it} = P_{it} = Y_{it}$. Then it is easy to verify that the condition of Lemma 3 is satisfied.

4) Uncontrolled state process with hidden actions: Consider an $N$-agent game with uncontrolled dynamics given by $X_{t+1} = f_t(X_t, W_t)$, $t \in \mathcal{T}$. At every time $t \in \mathcal{T}$, agent $i \in \mathcal{N}$ receives a noisy observation $Y_{it} = O_{it}(X_{it}, Z_{it})$. The agents' actions are hidden. Thus, $P_{it} = \{Y_{i,1:t}, A_{i,1:t-1}\}$ and $C_t = \emptyset$. Hence, the condition of Lemma 3 is trivially satisfied. We note that in the case where a subset of the agents' observations is revealed to all agents with some delay, i.e., $C_t \subseteq \{Y_{1:t}\}$, the condition of Lemma 3 is also satisfied.
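For instance 4, the condition of Lemma 3 is visible directly: with $C_t = \emptyset$ there are no common observations, so the common belief is simply propagated through the exogenous kernel. A short sketch (ours, with a hypothetical two-state kernel) makes the strategy-independence explicit:

```python
import numpy as np

def common_belief_step(pi, trans):
    """Instance 4: C_t is empty, so the SIB belief over X_t is the prior
    pushed forward by the exogenous kernel of X_{t+1} = f_t(X_t, W_t); no
    strategy enters, which is exactly the condition of Lemma 3."""
    return trans.T @ pi   # pi_{t+1}[j] = sum_i P(x_{t+1}=j | x_t=i) * pi[i]

pi = np.array([1.0, 0.0])
P = np.array([[0.9, 0.1], [0.3, 0.7]])   # hypothetical uncontrolled kernel
for _ in range(3):
    pi = common_belief_step(pi, P)
print(pi)   # the same belief sequence results whatever strategies are played
```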
IX. CONCLUSION

We proposed a general approach to the study of dynamic games with asymmetric information. We presented a set of conditions sufficient to characterize an information state for each agent that effectively compresses his common and private information in a mutually consistent manner. Along with the results of the companion paper [2], we showed that the characterized information state provides a sufficient statistic for decision-making purposes in both strategic and non-strategic settings. We introduced the notion of Sufficient Information Based Perfect Bayesian Equilibrium, which characterizes a set of outcomes of a dynamic game. We provided a sequential decomposition of the dynamic game over time, which leads to a dynamic program for the computation of the set of SIB-PBEs of the dynamic game. We determined conditions that guarantee the existence of SIB-PBEs. Using these conditions, we proved the existence of SIB-PBEs for dynamic zero-sum games and for special instances of dynamic non-zero-sum games.

REFERENCES

[1] H. Tavafoghi, Y. Ouyang, and D. Teneketzis, "A sufficient information approach to decentralized decision making," in Proc. 57th IEEE Conference on Decision and Control (CDC), 2018.
[2] H. Tavafoghi, Y. Ouyang, and D. Teneketzis, "A unified approach to dynamic multi-agent decision problems with asymmetric information - part i: Non-strategic agents," working paper, 2018.
[3] E. Maskin and J. Tirole, "Markov perfect equilibrium: I. Observable actions," Journal of Economic Theory, vol. 100, no. 2, pp. 191-219, 2001.
[4] S. Zamir, "Repeated games of incomplete information: Zero-sum," Handbook of Game Theory, vol. 1, pp. 109-154, 1992.
[5] F. Forges, "Repeated games of incomplete information: Non-zero-sum," Handbook of Game Theory, vol. 1, pp. 109-154, 1992.
[6] R. Aumann, M. Maschler, and R. Stearns, Repeated Games with Incomplete Information. MIT Press, 1995.
[7] G. Mailath and L. Samuelson, Repeated Games and Reputations. Oxford University Press, 2006.
[8] J. Hörner, T. Sugaya, S. Takahashi, and N. Vieille, "Recursive methods in discounted stochastic games: An algorithm for δ → 1 and a folk theorem," Econometrica, vol. 79, no. 4, pp. 1277-1318, 2011.
[9] J. Escobar and J. Toikka, "Efficiency in games with Markovian private information," Econometrica, vol. 81, no. 5, pp. 1887-1934, 2013.
[10] T. Sugaya, "Efficiency in Markov games with incomplete and private information," working paper, 2012.
[11] J. Renault, "The value of Markov chain games with lack of information on one side," Mathematics of Operations Research, vol. 31, no. 3, pp. 490-512, 2006.
[12] P. Cardaliaguet, C. Rainer, D. Rosenberg, and N. Vieille, "Markov games with frequent actions and incomplete information - the limit case," Mathematics of Operations Research, 2015.
[13] F. Gensbittel and J. Renault, "The value of Markov chain games with incomplete information on both sides," Mathematics of Operations Research, vol. 40, no. 4, pp. 820-841, 2015.
[14] L. Li, C. Langbort, and J. Shamma, "Solving two-player zero-sum repeated Bayesian games," arXiv preprint arXiv:1703.01957, 2017.
[15] A. Nayyar, A. Gupta, C. Langbort, and T. Başar, "Common information based Markov perfect equilibria for stochastic games with asymmetric information: Finite games," IEEE Transactions on Automatic Control, vol. 59, pp. 555-570, March 2014.
[16] A. Gupta, A. Nayyar, C. Langbort, and T. Başar, "Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information," SIAM Journal on Control and Optimization, vol. 52, no. 5, pp. 3228-3260, 2014.
[17] Y. Ouyang, H. Tavafoghi, and D. Teneketzis, "Dynamic oligopoly games with private Markovian dynamics," in Proc. IEEE Conference on Decision and Control (CDC), 2015.
[18] Y. Ouyang, H. Tavafoghi, and D. Teneketzis, "Dynamic games with asymmetric information: Common information based perfect Bayesian equilibria and sequential decomposition," IEEE Transactions on Automatic Control, 2017.
[19] D. Vasal and A. Anastasopoulos, "Signaling equilibria for dynamic LQG games with asymmetric information," in Proc. IEEE Conference on Decision and Control (CDC), pp. 6901-6908, 2016.
[20] A. Sinha and A. Anastasopoulos, "Structured perfect Bayesian equilibrium in infinite horizon dynamic games with asymmetric information," in Proc. American Control Conference, 2016.
[21] A. Nayyar, A. Mahajan, and D. Teneketzis, "Optimal control strategies in delayed sharing information structures," IEEE Transactions on Automatic Control, vol. 56, no. 7, pp. 1606-1620, 2011.
[22] A. Nayyar, A. Mahajan, and D. Teneketzis, "Decentralized stochastic control with partial history sharing: A common information approach," IEEE Transactions on Automatic Control, vol. 58, no. 7, pp. 1644-1658, 2013.
[23] H. Tavafoghi and D. Teneketzis, "Dynamic market mechanisms for wind energy," arXiv preprint arXiv:1608.04143, 2016.
[24] H. Tavafoghi, On Design and Analysis of Cyber-Physical Systems with Strategic Agents. PhD thesis, University of Michigan, 2017.
[25] J. Renault, "The value of repeated games with an informed controller," Mathematics of Operations Research, vol. 37, no. 1, pp. 154-179, 2012.
[26] L. Li and J. Shamma, "Efficient strategy computation in zero-sum asymmetric repeated games," arXiv preprint arXiv:1703.01952, 2017.
[27] P. Battigalli, "Strategic independence and perfect Bayesian equilibria," Journal of Economic Theory, vol. 70, no. 1, pp. 201-234, 1996.
[28] R. Myerson and P. Reny, "Open sequential equilibria of multi-stage games with infinite sets of types and actions," working paper, 2015.
[29] J. Watson, "Perfect Bayesian equilibrium: General definitions and illustrations," working paper, 2016.
[30] D. Fudenberg and J. Tirole, Game Theory. MIT Press, Cambridge, Massachusetts, 1991.
[31] M. J. Osborne and A. Rubinstein, A Course in Game Theory. MIT Press, 1994.
[32] S. Sorin, A First Course on Zero-Sum Repeated Games, vol. 37. Springer Science & Business Media, 2002.
[33] D. Miller, "Robust collusion with private information," The Review of Economic Studies, vol. 79, no. 2, pp. 778-811, 2012.
[34] P. Milgrom and R. Weber, "Distributional strategies for games with incomplete information," Mathematics of Operations Research, vol. 10, no. 4, pp. 619-632, 1985.
[35] T. Başar, "Two-criteria LQG decision problems with one-step delay observation sharing pattern," Information and Control, vol. 38, no. 1, pp. 21-50, 1978.
[36] T. Başar, "Decentralized multicriteria optimization of linear stochastic systems," IEEE Transactions on Automatic Control, vol. 23, no. 2, pp. 233-243, 1978.
[37] E. Hendon, H. Jacobsen, and B. Sloth, "The one-shot-deviation principle for sequential rationality," Games and Economic Behavior, vol. 12, no. 2, pp. 274-282, 1996.
[38] A. Nedich, "Optimization I," Lecture Notes, 2008.
[39] E. Ok, Real Analysis with Economic Applications, vol. 10. Princeton University Press, 2007.
[40] K. Border, Fixed Point Theorems with Applications to Economics and Game Theory. Cambridge University Press, 1989.
[41] E. Einy, O. Haimanko, and B. Tumendemberel, "Continuity of the value and optimal strategies when common priors change," International Journal of Game Theory, vol. 41, no. 4, 2012.
APPENDIX

Proof of Lemma 1. For any given consistent SIB assessment $(\sigma^*, \gamma)$, let $g^*$ denote the behavioral strategy profile constructed according to (16). In the following, we construct recursively a belief system $\mu$ that is consistent with $g^*$ and satisfies (17).

For $t = 1$, we have $P_{i1} = Y_{i1}$ and $C_1 = Z_1$. Define
$$\mu_{i1}(h_{i1})(x_1, p_{-i1}) := \frac{\mathbb{P}\{y_1, z_1 \mid x_1\}\,\eta(x_1)}{\sum_{\hat{x} \in \mathcal{X}_1} \mathbb{P}\{y_{i1}, z_1 \mid \hat{x}\}\,\eta(\hat{x})}. \tag{27}$$

For $t > 1$, if $\mathbb{P}^{g^*}_{\mu_{it-1}}\{h_{it} \mid h_{it-1}\} > 0$ (i.e., there is no deviation from $g^*_{t-1}$ at $t-1$), define $\mu_{it}$ recursively by Bayes' rule:
$$\mu_{it}(h_{it})(x_t, p_{-it}) := \frac{\mathbb{P}^{g^*}_{\mu_{it-1}}\{h_{it}, x_t, p_{-it} \mid h_{it-1}\}}{\mathbb{P}^{g^*}_{\mu_{it-1}}\{h_{it} \mid h_{it-1}\}}. \tag{28}$$

For $t > 1$, if $\mathbb{P}^{g^*}_{\mu_{it-1}}\{h_{it} \mid h_{it-1}\} = 0$ (i.e., there is a deviation from $g^*_{t-1}$ at $t-1$), define $\mu_{it}$ as
$$\mu_{it}(h_{it})(x_t, p_{-it}) := \frac{|\mathcal{S}_t|}{|\mathcal{P}_t|}\, \frac{\gamma_t(c_t)(x_t, s_t)}{\sum_{\hat{s}_{-it} \in \mathcal{S}_{-it}} \gamma_t(c_t)(x_t, \hat{s}_{-it}, s_{it})}, \tag{29}$$
where $s_{jt} = \zeta_{jt}(p_{jt}, c_t; g^*_{1:t-1})$ for all $j \in \mathcal{N}$.

At $t = 1$, (17) holds by construction from (27). For $t > 1$,
$$\mathbb{P}^{g^*}\{s_{-it} \mid h_{it}\} = \mathbb{P}^{g^*}\{s_{-it} \mid p_{it}, c_t\} = \mathbb{P}^{g^*}\{s_{-it} \mid s_{it}, c_t\} = \frac{\mathbb{P}^{g^*}\{s_{-it}, s_{it} \mid c_t\}}{\mathbb{P}^{g^*}\{s_{it} \mid c_t\}} = \frac{\pi_t(s_{-it}, s_{it})}{\sum_{\hat{s}_{-it} \in \mathcal{S}_{-it}} \pi_t(\hat{s}_{-it}, s_{it})} = \mathbb{P}\{s_{-it} \mid s_{it}, \pi_t\},$$
where the second equality follows from (11). Therefore, (17) holds for all $t \in \mathcal{T}$. ∎

Lemma 4 ([2, Lemma 2]). Given a SIB strategy profile $\sigma^*$ and an update rule $\psi$ consistent with $\sigma^*$,
$$\mathbb{P}^{\sigma^*}_\psi\{S_{t+1}, \Pi_{t+1} \mid p_t, c_t, a_t\} = \mathbb{P}^{\sigma^*}_\psi\{S_{t+1}, \Pi_{t+1} \mid s_t, \pi_t, a_t\} \tag{30}$$
for all $s_t, \pi_t, a_t$.

Proof of Theorem 2. Consider a "super dynamic system" consisting of the original dynamic system together with agents $-i$, who play according to the SIB assessment $(\sigma^*, \gamma^\psi)$. This super dynamic system captures the system that agent $i$ "sees". We establish the claim of Theorem 2 in two steps: (i) we show that from agent $i$'s viewpoint the super dynamic system is a POMDP, and (ii) we show that $\{\Pi_t, S_{it}\}$ is an information state for agent $i$ when he faces the super dynamic system with his original utilities $u_{it}(\cdot, \cdot)$, $t \in \mathcal{T}$. Therefore, without loss of optimality, agent $i$ can choose his best response from the class of strategies that are functions of the information state $\{\Pi_t, S_{it}\}$, i.e., the class of SIB strategies.

To establish step (i), consider $\tilde{X}_t := \{X_t, \Pi_t, S_t, \Pi_{t-1}, S_{t-1}\}$ as the state of the super dynamic system at $t$. Agent $i$'s observation at time $t$ is given by $\tilde{Y}_{it} := \{Y_{it}, Z_t\}$. To show that the super dynamic system is a POMDP, we need to verify the following properties:
(a) it has controlled Markovian dynamics, that is, for all $t \in \mathcal{T}$,
$$\mathbb{P}^{\sigma^*}_\psi\{\tilde{x}_{t+1} \mid \tilde{x}_{1:t}, a_{i,1:t}, \tilde{y}_{i,1:t}\} = \mathbb{P}^{\sigma^*_{-i}}_\psi\{\tilde{x}_{t+1} \mid \tilde{x}_t, a_{it}\}; \tag{31}$$
(b) agent $i$'s observation $\tilde{Y}_{it}$ depends only on the current state $\tilde{X}_t$ and the previous action $A_{it-1}$, that is, for all $t \in \mathcal{T}$,
$$\mathbb{P}^{\sigma^*}_\psi\{\tilde{y}_{it} \mid \tilde{x}_{1:t}, a_{i,1:t-1}, \tilde{y}_{i,1:t-1}\} = \mathbb{P}^{\sigma^*_{-i}}\{\tilde{y}_{it} \mid \tilde{x}_t, a_{it-1}\}; \tag{32}$$
(c) agent $i$'s instantaneous utility at $t$ can be written as a function $\tilde{u}_t(\tilde{x}_t, a_{it})$ of the state $\tilde{X}_t$ and his action $A_{it}$, that is, for all $t \in \mathcal{T}$,
$$\mathbb{E}^{\sigma^*}_\psi\{u_{it}(X_t, A_t) \mid \tilde{x}_{1:t}, a_{i,1:t}, \tilde{y}_{i,1:t}\} = \tilde{u}_t(\tilde{x}_t, a_{it}). \tag{33}$$
Property (a) is true because
$$\begin{aligned}
&\mathbb{P}^{\sigma^*}_\psi\{\tilde{x}_{t+1} \mid \tilde{x}_{1:t}, a_{i,1:t}, \tilde{y}_{i,1:t}\}
= \mathbb{P}^{\sigma^*}_\psi\{x_{t+1}, \pi_{t+1}, s_{t+1}, \pi_t, s_t \mid x_{1:t}, \pi_{1:t}, s_{1:t}, y_{i,1:t}, z_{1:t}, a_{i,1:t}\} \\
&= \sum_{a_{-it},\, z_{t+1},\, y_{t+1}} \mathbb{P}^{\sigma^*}_\psi\{x_{t+1}, \pi_{t+1}, s_{t+1}, a_{-it}, z_{t+1}, y_{t+1} \mid x_{1:t}, \pi_{1:t}, s_{1:t}, y_{i,1:t}, z_{1:t}, a_{i,1:t}\} \\
&\stackrel{\text{(by the system dynamics (1) and (2))}}{=} \sum_{a_{-it},\, z_{t+1},\, y_{t+1}} \Big[ \mathbb{P}^{\sigma^*}_\psi\{\pi_{t+1}, s_{t+1} \mid x_{1:t+1}, \pi_{1:t}, s_{1:t}, y_{i,1:t}, z_{1:t}, a_{i,1:t}, a_{-it}, z_{t+1}, y_{t+1}\} \\
&\qquad\qquad \times \mathbb{P}\{z_{t+1}, y_{t+1} \mid x_{t+1}, a_t\}\, \mathbb{P}\{x_{t+1} \mid x_t, a_t\}\, \sigma^*_{-it}(\pi_t, s_{-it})(a_{-it}) \Big].
\end{aligned}$$
Define $\hat{\mathcal{Z}} := \{z_{t+1} : \pi_{t+1} = \psi_{t+1}(\pi_t, z_{t+1})\}$ and $\hat{\mathcal{Y}}(z_{t+1}) := \{y_{t+1} : s_{jt+1} = \phi_{jt+1}(s_{jt}, \{y_{jt+1}, z_{t+1}, a_{jt}\})\ \forall j \in \mathcal{N}\}$. Then the sum above reduces to
$$\begin{aligned}
&= \sum_{a_{-it},\, z_{t+1} \in \hat{\mathcal{Z}},\, y_{t+1} \in \hat{\mathcal{Y}}(z_{t+1})} \mathbb{P}\{y_{t+1}, z_{t+1} \mid x_{t+1}, a_t\}\, \mathbb{P}\{x_{t+1} \mid x_t, a_t\}\, \sigma^*_{-it}(\pi_t, s_{-it})(a_{-it}) \\
&= \mathbb{P}^{\sigma^*_{-i}}_\psi\{x_{t+1}, \pi_{t+1}, s_{t+1} \mid x_t, \pi_t, s_t, a_{it}\} = \mathbb{P}^{\sigma^*_{-i}}_\psi\{\tilde{x}_{t+1} \mid \tilde{x}_t, a_{it}\}.
\end{aligned}$$

Property (b) is true because
$$\begin{aligned}
\mathbb{P}^{\sigma^*}_\psi\{\tilde{y}_{it} \mid \tilde{x}_{1:t}, a_{i,1:t-1}, \tilde{y}_{i,1:t-1}\}
&= \mathbb{P}^{\sigma^*}_\psi\{y_{it}, z_t \mid x_{1:t}, \pi_{1:t}, s_{1:t}, y_{i,1:t-1}, z_{1:t-1}, a_{i,1:t-1}\} \\
&= \sum_{a_{-it-1}} \mathbb{P}^{\sigma^*}_\psi\{y_{it}, z_t, a_{-it-1} \mid x_{1:t}, \pi_{1:t}, s_{1:t}, y_{i,1:t-1}, z_{1:t-1}, a_{i,1:t-1}\} \\
&= \sum_{a_{-it-1}} \mathbb{P}^{\sigma^*}_\psi\{y_{it}, z_t \mid x_{1:t}, \pi_{1:t}, s_{1:t}, y_{i,1:t-1}, z_{1:t-1}, a_{i,1:t-1}, a_{-it-1}\}\, \sigma^*_{-it-1}(\pi_{t-1}, s_{-it-1})(a_{-it-1}) \\
&\stackrel{\text{(by the system dynamics (2))}}{=} \sum_{a_{-it-1}} \mathbb{P}\{y_{it}, z_t \mid x_t, a_{it-1}, a_{-it-1}\}\, \sigma^*_{-it-1}(\pi_{t-1}, s_{-it-1})(a_{-it-1}) \\
&= \mathbb{P}^{\sigma^*_{-i}}\{y_{it}, z_t \mid x_t, \pi_{t-1}, s_{t-1}, a_{it-1}\} = \mathbb{P}^{\sigma^*_{-i}}\{\tilde{y}_{it} \mid \tilde{x}_t, a_{it-1}\}.
\end{aligned}$$

Property (c) is true because
$$\begin{aligned}
\mathbb{E}^{\sigma^*}_\psi\{u_{it}(X_t, A_t) \mid \tilde{x}_{1:t}, a_{i,1:t}, \tilde{y}_{i,1:t}\}
&= \mathbb{E}^{\sigma^*}_\psi\{u_{it}(X_t, A_t) \mid x_t, \pi_t, s_t, \tilde{x}_{1:t-1}, a_{i,1:t}, \tilde{y}_{i,1:t}\} \\
&= \mathbb{E}^{\sigma^*_{-i}}_\psi\{u_{it}(X_t, (a_{it}, \sigma^*_{-it}(\pi_t, s_{-it}))) \mid x_t, \pi_t, s_t, \tilde{x}_{1:t-1}, a_{i,1:t}, \tilde{y}_{i,1:t}\} \\
&= u_{it}(x_t, (a_{it}, \sigma^*_{-it}(\pi_t, s_{-it}))) =: \tilde{u}_t(\tilde{x}_t, a_{it}).
\end{aligned}$$

To establish step (ii), that is, to show that $\{\Pi_t, S_{it}\}$ is an information state for agent $i$ when he interacts with the super dynamic system defined above under the SIB assessment $(\sigma^*, \gamma^\psi)$, we need to prove: (1) it can be updated recursively at $t$, i.e., it can be determined from $\{\Pi_{t-1}, S_{it-1}\}$ and $\{\tilde{Y}_{it}, A_{it-1}\} = \{Y_{it}, Z_t, A_{it-1}\}$; (2) agent $i$'s belief about $\{\Pi_{t+1}, S_{it+1}\}$ conditioned on $\{\Pi_t, S_{it}, A_{it}\}$ is independent of $H_{it}$ and of his actual strategy $g_i$; and (3) it is sufficient to evaluate agent $i$'s instantaneous utility at $t$ for every action $a_{it} \in \mathcal{A}_{it}$, for all $t \in \mathcal{T}$.

Condition (1) is satisfied since $\Pi_t = \psi_t(\Pi_{t-1}, Z_t)$ and $S_{it} = \phi_{it}(S_{it-1}, \{Y_{it}, Z_t, A_{it-1}\})$ for $t \in \mathcal{T} \setminus \{1\}$; see part (i) of Definition 4 and (15).

To prove condition (2), let
$$g^*_{jt}(h_{jt}) = \sigma^*_{jt}(\gamma^\psi_t(c_t), \ell_{jt}(h_{jt})) \tag{34}$$
for all $j \in \mathcal{N}$ and $t \in \mathcal{T}$, where $\ell_{jt}(h_{jt})$ denotes the sufficient private information $s_{jt}$ computed from $h_{jt}$.
Then condition (2) is satisfied since
$$\begin{aligned}
\mathbb{P}^{\sigma^*}_\psi\{s_{it+1}, \pi_{t+1} \mid h_{it}, a_{it}\}
&= \sum_{h_{-it},\, a_{-it}} \mathbb{P}^{\sigma^*}_\psi\{s_{it+1}, \pi_{t+1}, h_{-it}, a_{-it} \mid h_{it}, a_{it}\} \\
&\stackrel{\text{(by Theorem 1 and (34))}}{=} \sum_{h_{-it},\, a_{-it}} \mathbb{P}^{\sigma^*}_\psi\{s_{it+1}, \pi_{t+1} \mid h_t, a_t\}\, \mathbb{P}^{g^*_{-i}}\{h_{-it} \mid h_{it}\}\, g^*_{-it}(h_{-it})(a_{-it}) \\
&= \sum_{h_{-it},\, a_{-it},\, s_{-it+1}} \mathbb{P}^{\sigma^*}_\psi\{s_{it+1}, \pi_{t+1}, s_{-it+1} \mid h_t, a_t\}\, \mathbb{P}^{g^*_{-i}}\{h_{-it} \mid h_{it}\}\, g^*_{-it}(h_{-it})(a_{-it}) \\
&\stackrel{\text{(by Lemma 4 and } s_{-it} = \zeta_{-it}(h_{-it};\, g^*_{1:t-1}) \text{, see Definition 4)}}{=} \sum_{h_{-it},\, a_{-it},\, s_{-it+1}} \mathbb{P}^{\sigma^*}_\psi\{s_{t+1}, \pi_{t+1} \mid s_t, \pi_t, a_t\}\, \mathbb{P}^{g^*_{-i}}\{h_{-it} \mid h_{it}\}\, g^*_{-it}(h_{-it})(a_{-it}) \\
&\stackrel{\text{(by part (ii) of Lemma 1 and (34))}}{=} \sum_{s_{-it},\, a_{-it},\, s_{-it+1}} \mathbb{P}^{\sigma^*}_\psi\{s_{t+1}, \pi_{t+1} \mid s_t, \pi_t, a_t\}\, \mathbb{P}\{s_{-it} \mid s_{it}, \pi_t\}\, \sigma^*_{-it}(\pi_t, s_{-it})(a_{-it}) \\
&= \mathbb{P}^{\sigma^*}_\psi\{s_{it+1}, \pi_{t+1} \mid s_{it}, \pi_t, a_{it}\}.
\end{aligned}$$

To prove condition (3), we need to show that for all $a_{it} \in \mathcal{A}_{it}$,
$$\mathbb{E}^{g^*_{-i}}\{u_{it}(X_t, A_{-it}, a_{it}) \mid h_{it}\} = \mathbb{E}^{g^*_{-i}}\{u_{it}(X_t, A_{-it}, a_{it}) \mid \pi_t, s_{it}\} \tag{35}$$
for all $h_{it}, \pi_t, s_{it}$, $t \in \mathcal{T}$.

By Lemma 1,
$$\mathbb{P}^{g^*_{-i}}\{s_{-it} \mid h_{it}\} = \mathbb{P}\{s_{-it} \mid \pi_t, s_{it}\}. \tag{36}$$
Setting $\pi_t = \gamma^\psi_t(c_t)$ and $A_{-it} = \sigma^*_{-it}(\pi_t, S_{-it})$,
$$\begin{aligned}
\mathbb{E}^{\sigma^*}_\psi\{u_{it}(X_t, A_{-it}, a_{it}) \mid h_{it}\}
&= \mathbb{E}^{\sigma^*}_\psi\{u_{it}(X_t, \sigma^*_{-it}(\pi_t, S_{-it}), a_{it}) \mid h_{it}\} \\
&= \mathbb{E}^{\sigma^*}_\psi\big\{\mathbb{E}^{\sigma^*_{-i}}\{u_{it}(X_t, \sigma^*_{-it}(\pi_t, S_{-it}), a_{it}) \mid S_{-it}, \pi_t, s_{it}, h_{it}\} \,\big|\, h_{it}\big\} \\
&= \mathbb{E}^{\sigma^*}_\psi\big\{\mathbb{E}^{\sigma^*_{-i}}\{u_{it}(X_t, \sigma^*_{-it}(\pi_t, S_{-it}), a_{it}) \mid S_{-it}, \pi_t, s_{it}, c_t\} \,\big|\, h_{it}\big\} \\
&= \mathbb{E}^{\sigma^*_{-i}}_\psi\big\{\mathbb{E}^{\sigma^*_{-i}}\{u_{it}(X_t, \sigma^*_{-it}(\pi_t, S_{-it}), a_{it}) \mid S_{-it}, \pi_t, s_{it}, c_t\} \,\big|\, h_{it}\big\} \\
&= \mathbb{E}^{\sigma^*_{-i}}_\psi\big\{\mathbb{E}^{\sigma^*_{-i}}\{u_{it}(X_t, \sigma^*_{-it}(\pi_t, S_{-it}), a_{it}) \mid S_{-it}, \pi_t, s_{it}\} \,\big|\, h_{it}\big\} \\
&= \mathbb{E}^{\sigma^*_{-i}}_\psi\{u_{it}(X_t, \sigma^*_{-it}(\pi_t, S_{-it}), a_{it}) \mid \pi_t, s_{it}\}. \tag{37}
\end{aligned}$$
The first equality above follows by substituting $A_{-it} = \sigma^*_{-it}(\pi_t, S_{-it})$. The second equality follows from the smoothing property of conditional expectation. The third equality holds by condition (iii) of Definition 4. The fourth equality follows from Theorem 1, since $S_{-it}$ is a function of $H_{-it}$. The fifth equality holds since for every $x_t, s_t, \pi_t, c_t$,
$$\mathbb{P}\{x_t \mid s_t, \pi_t, c_t\} = \frac{\mathbb{P}\{x_t, s_t \mid \pi_t, c_t\}}{\mathbb{P}\{s_t \mid \pi_t, c_t\}} = \frac{\pi_t(x_t, s_t)}{\sum_{\hat{x}_t} \pi_t(\hat{x}_t, s_t)} = \mathbb{P}\{x_t \mid s_t, \pi_t\}.$$
The last equality is true by (36). By (37), condition (3) holds for $\{\Pi_t, S_{it}\}$ to be an information state, and this establishes the result of Theorem 2. ∎

Proof of Theorem 3. Let $(\sigma^*, \psi)$ denote a solution of the dynamic program. We note that the SIB update rule $\psi$ is consistent with $\sigma^*$ by requirement (25). Therefore, we only need to show that the SIB assessment $(\sigma^*, \psi)$ is sequentially rational. To prove this, we use the one-shot deviation principle for dynamic games with asymmetric information [37]. To state this principle, we need the following definitions.

Definition 7 (One-shot deviation). We say $\tilde{g}_i$ is a one-shot deviation from $g^*_i$ if there exists a unique $h_{it} \in \mathcal{H}_i$ such that $\tilde{g}_{it}(h_{it}) \neq g^*_{it}(h_{it})$, and $\tilde{g}_{i\tau}(h_{i\tau}) = g^*_{i\tau}(h_{i\tau})$ for all $h_{i\tau} \neq h_{it}$, $h_{i\tau} \in \mathcal{H}_i$.
Definition 8 (Profitable one-shot deviation). Consider an assessment $(g^*, \mu)$. We say $\tilde{g}_i$ is a profitable one-shot deviation for agent $i$ if $\tilde{g}_i$ is a one-shot deviation from $g^*_i$ at $h_{it}$ such that $\tilde{g}_{it}(h_{it}) \neq g^*_{it}(h_{it})$, and
$$\mathbb{E}^{(g^*_{-i},\, \tilde{g}_i)}_\mu\Big\{\sum_{\tau=t}^{T} u_{i\tau}(X_\tau, A_\tau) \,\Big|\, h_{it}\Big\} > \mathbb{E}^{(g^*_{-i},\, g^*_i)}_\mu\Big\{\sum_{\tau=t}^{T} u_{i\tau}(X_\tau, A_\tau) \,\Big|\, h_{it}\Big\}.$$

One-shot deviation principle [37]: A consistent assessment $(g^*, \mu)$ is a PBE if and only if no agent has a profitable one-shot deviation.

Below, we show that the consistent SIB assessment $(\sigma^*, \psi)$ satisfies the sequential rationality condition using the one-shot deviation principle. Consider an arbitrary agent $i \in \mathcal{N}$, time $t \in \mathcal{T}$, and history realization $h_{it} \in \mathcal{H}_{it}$. Agent $i$ has a profitable one-shot deviation at $h_{it}$ only if
$$\sigma^*_{it}(\pi_t, s_{it}) \notin \arg\max_{\tilde{g}_{it}(h_{it}) \in \Delta(\mathcal{A}_{it})} \mathbb{E}^{\sigma^*_t}_{\pi_t}\big\{\bar{U}_{it}\big((\tilde{g}_{it}(h_{it}), \sigma^*_{-it}(\pi_t, S_{-it})), S_t, \pi_t, V_{t+1}, \psi_{t+1}\big) \,\big|\, h_{it}\big\}.$$
Given $(\pi_t, V_{t+1}, \psi_{t+1}, \sigma^*_t)$, the expected value of the function $\bar{U}_{it}$ conditioned on $h_{it}$ is only a function of $s_{it}$, of agent $i$'s belief about $S_{-it}$, and of agent $i$'s strategy $\tilde{g}_{it}(h_{it})$. Agent $i$'s belief about $S_{-it}$ given $h_{it}$ is only a function of $s_{it}$ and $\pi_t$ (see (17)). Therefore, any solution to the maximization problem above can be written as a function of $\pi_t$ and $s_{it}$; that is, it is a SIB strategy $\tilde{\sigma}_{it}(\pi_t, s_{it})$ for agent $i$. Consequently, agent $i$ has a profitable one-shot deviation only if
$$\sigma^*_{it}(\pi_t, s_{it}) \notin \arg\max_{\tilde{\sigma}_{it}(\pi_t, s_{it}) \in \Delta(\mathcal{A}_{it})} \mathbb{E}^{\sigma^*_t}_{\pi_t}\big\{\bar{U}_{it}\big((\tilde{\sigma}_{it}(\pi_t, s_{it}), \sigma^*_{-it}(\pi_t, S_{-it})), S_t, \pi_t, V_{t+1}, \psi_{t+1}\big) \,\big|\, \pi_t, s_{it}\big\}.$$
By (24), $\sigma^*_t$ is a BNE of the stage game $G_t(\pi_t, V_{t+1}, \psi_{t+1})$ (see also (21)); i.e.,
$$\sigma^*_{it}(\pi_t, s_{it}) \in \arg\max_{\tilde{\sigma}_{it}(\pi_t, s_{it}) \in \Delta(\mathcal{A}_{it})} \mathbb{E}^{\sigma^*_t}_{\pi_t}\big\{\bar{U}_{it}\big((\tilde{\sigma}_{it}(\pi_t, s_{it}), \sigma^*_{-it}(\pi_t, S_{-it})), S_t, \pi_t, V_{t+1}, \psi_{t+1}\big) \,\big|\, \pi_t, s_{it}\big\}.$$
Consequently, there exists no profitable one-shot deviation from $\sigma^*_{it}(\pi_t, s_{it})$ at $h_{it}$. Since $i$, $t$, and $h_{it}$ were arbitrary, no agent has a profitable one-shot deviation. Hence, by the one-shot deviation principle, the consistent SIB assessment $(\sigma^*, \psi)$ is sequentially rational, and thus it is a SIB-PBE. ∎
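The one-shot-deviation test invoked above is mechanical in finite settings. The following sketch is our own construction for the degenerate two-agent case with trivial beliefs (e.g., the repeated game of Section VII); it checks a prescribed stage-action pair against all unilateral one-stage deviations while holding the continuation payoffs fixed, mirroring the argument that a stage BNE of (24) admits no profitable one-shot deviation:

```python
import numpy as np

def profitable_oneshot_deviation(U1, U2, a, V_next):
    """One-shot-deviation test (Definitions 7-8) at one stage of a two-agent
    repeated game with trivial beliefs: does either agent gain by changing
    only this stage's action, given that the SIB continuation payoffs V_next
    ignore payoff-irrelevant history and therefore stay fixed?"""
    a1, a2 = a
    gain1 = (U1[:, a2] + V_next[0]).max() > U1[a1, a2] + V_next[0]
    gain2 = (U2[a1, :] + V_next[1]).max() > U2[a1, a2] + V_next[1]
    return gain1 or gain2

U1 = np.array([[8, 0, 2], [0, 1, 0]])    # Table I payoffs
U2 = np.array([[3, 2, 10], [1, 2, 0]])
print(profitable_oneshot_deviation(U1, U2, (0, 2), V_next=(2.0, 10.0)))
# False: the stage prescription (U, R) admits no profitable one-shot
# deviation, so the SIB-PBE (UU, RR) passes the test at both stages.
```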
Proof of Lemma 2. We prove below that if $V_{t+1}(\cdot, s_{t+1})$ is continuous in $\pi_{t+1}$, then the dynamic program has a solution at stage $t$, $t \in \mathcal{T}$; that is, there exists at least one $\sigma^*_t$ such that $\sigma^*_t \in \mathrm{BNE}_t(V_{t+1}, \psi_{t+1})$, where $\psi_{t+1}$ is consistent with $\sigma^*_t$.

For every $\pi_t$, define a perturbation of the stage game $G_t(\pi_t, V_{t+1}, \psi_{t+1})$ by restricting the set of strategies of each agent to mixed strategies that assign probability at least $\epsilon > 0$ to every action $a_{it} \in \mathcal{A}_{it}$ of agent $i \in \mathcal{N}$; for every agent $i \in \mathcal{N}$ we denote this class of $\epsilon$-restricted strategies by $\Sigma^{i,\epsilon}_t$, and let $\Sigma^\epsilon_t := \Sigma^{1,\epsilon}_t \times \cdots \times \Sigma^{N,\epsilon}_t$. In the following we prove that, for every $\epsilon > 0$, the corresponding perturbed stage game has an equilibrium $\sigma^{*,\epsilon}_t$ along with a consistent update rule $\psi^\epsilon_{t+1}$.

We note that when the agents' equilibrium strategies are completely mixed, the update rule $\psi^\epsilon_{t+1}$ is determined entirely by Bayes' rule. Therefore, for every strategy profile $\sigma^{*,\epsilon}_t \in \Sigma^\epsilon_t$ we can write $\psi^\epsilon_{t+1} := \beta_{t+1}(\sigma^{*,\epsilon}_t)$, where $\beta_{t+1}(\sigma^{*,\epsilon}_t)$ denotes the Bayes-rule update under $\sigma^{*,\epsilon}_t$ (see (12)).

For every agent $i \in \mathcal{N}$, define the best-response correspondence $BR^{i,\epsilon}_t : \Sigma^\epsilon_t \rightrightarrows \Sigma^{i,\epsilon}_t$ as
$$BR^{i,\epsilon}_t(\sigma^{*,\epsilon}_t) := \Big\{\sigma_{it} \in \Sigma^{i,\epsilon}_t : \sigma_{it}(\pi_t, s_{it}) \in \arg\max_{\sigma^{i,\epsilon}_t \in \Sigma^{i,\epsilon}_t} \mathbb{E}^{\sigma^{*,\epsilon}_{-it},\, \sigma^{i,\epsilon}_t}\{\bar{U}_{it}(A_t, S_t, \pi_t, V_{t+1}, \beta_{t+1}(\sigma^{*,\epsilon}_t)) \mid s_{it}, \pi_t\},\ \forall \pi_t, s_{it}\Big\}, \tag{38}$$
which determines the set of all of agent $i$'s best responses within the class of $\epsilon$-restricted strategies, assuming that agents $-i$ play $\sigma^{*,\epsilon}_{-it}$ and the update rule is $\psi^\epsilon_{t+1} = \beta_{t+1}(\sigma^{*,\epsilon}_t)$.

For every $i \in \mathcal{N}$ and $\sigma^{*,\epsilon}_t \in \Sigma^\epsilon_t$, we prove below that $BR^{i,\epsilon}_t(\sigma^{*,\epsilon}_t)$ is non-empty, convex, closed, and upper hemicontinuous. We note that
$$\mathbb{E}^{\sigma^{*,\epsilon}_{-it},\, \sigma^{i,\epsilon}_t}\{\bar{U}_{it}(A_t, S_t, \pi_t, V_{t+1}, \beta_{t+1}(\sigma^{*,\epsilon}_t)) \mid s_{it}, \pi_t\} = \sum_{a_{it}} \tilde{U}^{\sigma^{*,\epsilon}}_{\pi_t, s_{it}}(a_{it})\, \sigma^{i,\epsilon}_t(\pi_t, s_{it})(a_{it}),$$
where
$$\tilde{U}^{\sigma^{*,\epsilon}}_{\pi_t, s_{it}}(a_{it}) := \mathbb{E}^{\sigma^{*,\epsilon}_{-it},\, A_{it} = a_{it}}\{\bar{U}_{it}(A_t, S_t, \pi_t, V_{t+1}, \beta_{t+1}(\sigma^{*,\epsilon}_t)) \mid s_{it}, \pi_t\}.$$
Therefore, for every $\pi_t, s_{it}$, we have
$$\sigma^{i,\epsilon}_t(\pi_t, s_{it}) \in \arg\max_{\alpha \in \Delta(\mathcal{A}_{it}):\ \alpha(a_{it}) \ge \epsilon\ \forall a_{it}} \sum_{a_{it}} \tilde{U}^{\sigma^{*,\epsilon}}_{\pi_t, s_{it}}(a_{it})\, \alpha(a_{it}).$$
This maximization is a linear program; thus, by [38, Theorem 16], the set of agent $i$'s best responses $BR^{i,\epsilon}_t(\sigma^{*,\epsilon}_t)$ is closed and convex. (A numerical sketch of this $\epsilon$-restricted linear program follows below.)
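As an aside, the $\epsilon$-restricted best response is an ordinary linear program over a truncated simplex and can be solved directly; in this sketch (ours, assuming SciPy is available) the payoff vector `u_tilde` is a hypothetical input standing for $\tilde{U}^{\sigma^{*,\epsilon}}_{\pi_t, s_{it}}(\cdot)$:

```python
import numpy as np
from scipy.optimize import linprog

def eps_restricted_best_response(u_tilde, eps):
    """Best response in the eps-perturbed stage game: maximize the linear
    objective sum_a u_tilde[a] * alpha[a] over the simplex subject to the
    floor alpha[a] >= eps (the linear program in the proof of Lemma 2)."""
    n = len(u_tilde)
    res = linprog(
        c=-np.asarray(u_tilde),              # linprog minimizes, so negate
        A_eq=np.ones((1, n)), b_eq=[1.0],    # probabilities sum to one
        bounds=[(eps, 1.0)] * n,             # every action gets mass >= eps
        method="highs",
    )
    return res.x

print(eps_restricted_best_response([1.0, 3.0, 2.0], eps=0.05))
# -> [0.05, 0.90, 0.05]: all mass on the best action except the eps floor,
#    so as eps -> 0 the solution converges to an unrestricted best response.
```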
If $V_{t+1}$ is continuous in $\pi_{t+1}$, then $V_{t+1}$ is continuous in agent $i$'s strategy $\sigma_{it}$. Moreover, the instantaneous utility $u_{it}$ is continuous in agent $i$'s strategy $\sigma_{it}$. Therefore, $\bar{U}_{it}$, given by (20), is continuous in agent $i$'s strategy $\sigma_{it}$, and by the maximum theorem [39] the set of agent $i$'s best responses is upper hemicontinuous in $\sigma^{*,\epsilon}_t$ and non-empty.

Consequently, for every $i \in \mathcal{N}$, $BR^{i,\epsilon}_t(\sigma^{*,\epsilon}_t)$ is closed, convex, upper hemicontinuous, and non-empty for every $\sigma^{*,\epsilon}_t \in \Sigma^\epsilon_t$. Define $BR^\epsilon_t := \times_{i \in \mathcal{N}} BR^{i,\epsilon}_t$, where $\times$ denotes the Cartesian product. The correspondence $BR^\epsilon_t(\sigma^{*,\epsilon}_t)$ is closed, convex, upper hemicontinuous, and non-empty for every $\sigma^{*,\epsilon}_t \in \Sigma^\epsilon_t$, since each $BR^{i,\epsilon}_t(\sigma^{*,\epsilon}_t)$ has these properties. Therefore, by Kakutani's fixed-point theorem [40, Corollary 15.3], the correspondence $BR^\epsilon_t$ has a fixed point, and every perturbed stage game has an equilibrium $\sigma^{*,\epsilon}_t$ along with a consistent update rule $\psi^\epsilon_{t+1} = \beta_{t+1}(\sigma^{*,\epsilon}_t)$.

Now consider the sequence of these perturbed games as $\epsilon \to 0$. Since the set of agents' strategies is compact, there exists a subsequence of perturbed games whose equilibrium strategies converge, say to $\sigma^*_t$; similarly, let $\psi^*_{t+1}$ denote the limit of $\beta_{t+1}(\sigma^{*,\epsilon}_t)$ along this subsequence. We note that $\psi^*_{t+1}$ is consistent with $\sigma^*_t$ since $\beta_{t+1}$ (i.e., Bayes' rule) is continuous in $\sigma^{*,\epsilon}_t$. We show that for every agent $i \in \mathcal{N}$, $\sigma^*_{it}$ is a best response for him given $V_{t+1}$ and $\psi^*_{t+1}$ when he chooses his strategy from the unconstrained class of SIB strategies. As proved above, the set of agent $i$'s best responses $BR_{it}(\sigma^*_t)$ is upper hemicontinuous and closed given $\psi^*_{t+1}$. Therefore, $\sigma^*_t(\pi_t, \cdot)$ is also a best response for agent $i$ in the stage game $G_t(\pi_t, V_{t+1}, \psi^*_{t+1})$. Consequently, $\sigma^*_t \in \mathrm{BNE}_t(V_{t+1}, \psi^*_{t+1})$, where $\psi^*_{t+1}$ is consistent with $\sigma^*_t$. ∎

Proof of Theorem 4. We have a Bayesian zero-sum game with finite state and action spaces. By [41, Theorem 1], the equilibrium payoff is a continuous function of the agents' common prior/belief. Using this result, we prove by backward induction that every stage of the dynamic program described by (24)-(26) has a solution and that $V_t$ is continuous in $\pi_t$ for all $t$.

For $t = T+1$ the dynamic program trivially has a solution, since the agents accrue utility only at times $t \le T$; moreover, $V_{T+1}(\cdot, \cdot) = 0$ is trivially continuous in $\pi_{T+1}$.

For $t \le T$, assume that $V_{t+1}$ is continuous in $\pi_{t+1}$. Then, by Lemma 2, the dynamic program has a solution at $t$. We note that the continuation game from $t$ to $T$ is a dynamic zero-sum game with finite state and action spaces. Therefore, as argued above, by [41, Theorem 1] the agents' equilibrium payoff at $t$ (i.e., $V_t$) is unique and continuous in the agents' common prior, given by $\pi_t$. By induction, we establish the assertion of Theorem 4. ∎

Proof of Lemma 3. Assume that $\psi_{1:T}$ is independent of $\sigma^*$. Then the evolution of $\Pi_t$ is independent of $\sigma^*$ and known a priori. As a result, we can ignore the consistency condition (25) in the dynamic program. Given $\psi_{t+1}$, the stage game $G_t(\pi_t, V_{t+1}, \psi_{t+1})$ is a static game of incomplete information with finite actions (given by $\mathcal{A}^{1:N}_t$) and finite types (given by $\mathcal{S}^{1:N}_t$) for every $\pi_t$. Therefore, by the standard existence results for finite games [30, Theorem 1.1], the stage game has a BNE. Consequently, the correspondence $\mathrm{BNE}_t(V_{t+1}, \psi_{t+1})$ is non-empty for every $t \in \mathcal{T}$; thus, the dynamic program given by (24)-(26) has a solution. ∎