arXiv [econ.TH], May 2020

TECHNICAL NOTE
Dynamic information design
Deepanshu Vasal
Abstract
We consider the problem of dynamic information design with one sender and one receiver, where the sender observes a private state of the system and sends a signal to the receiver based on its observation. Based on this signal, the receiver takes an action that determines rewards for both the sender and the receiver and controls the state of the system. In this technical note, we show that this problem can be posed as a dynamic game of asymmetric information, and that its perfect Bayesian equilibria (PBE) and Stackelberg equilibria (SE) can be analyzed using the algorithms presented in [1], [2] by the same author (among others). We then extend this model to one sender and multiple receivers, and provide algorithms to compute a class of equilibria of this game.
I. INTRODUCTION
Game theory is a powerful tool to analyze behavior among strategic agents. An engineering side of game theory is mechanism design, which aims to design systems such that when strategic agents who optimize their individual objectives play on them, they achieve the objective envisioned by the designer. A classic and one of the most widely used practical examples of mechanism design is auctions [3], where an auctioneer asks for bids by the bidders on a private good. The auction is designed in such a way that when the strategic bidders bid to maximize their own valuations, the returns of the auctioneer are maximized. There is a huge and growing literature on the theory of mechanism design as well as its real-world applications [4]. Information design is a relatively new field related to mechanism design, introduced by Kamenica and Gentzkow in [5], where a sender (designer) observes a state of the world not observed by the receiver. The sender sends a signal to the receiver about this state, based on which the receiver takes an action, which determines individual rewards for both the sender and the receiver. The sender has to choose a signal that maximizes its reward. The receiver interprets the state of the world from the sender's signal, knowing that the sender would have chosen a signal that maximizes the sender's reward, and thus takes an action that maximizes its own reward. There are two notions of equilibrium that can be considered in this setting: (a) Nash equilibrium and (b) Stackelberg equilibrium.

Nash equilibrium is defined as a set of strategies of the players such that no player wants to unilaterally deviate; it can thus be defined as a fixed point of the players' best responses to each other's strategies. In a Stackelberg equilibrium there is a leader and a follower (the sender and the receiver in this case, respectively). The leader commits to a policy that is known and observed by the receiver. The Stackelberg equilibrium is then defined as a set of strategies of the players such that the receiver plays a best response to the leader's committed strategy, and the leader, knowing that the receiver will play a best response, plays a strategy that maximizes its own reward.

Since [5], there has been a growing number of works on information design, including dynamic information design, where the state of the world evolves in a dynamic fashion and both the sender and the receiver play a sequential game [6]–[29]. The authors in [15]–[19] considered a dynamic version of the model considered in [5], where the state evolves as a Markov process and the sender is forward-looking but the receiver is myopic. Recently, Farhadi and Teneketzis in [31] considered a dynamic model with an evolving Markovian state and presented (Stackelberg) equilibrium strategies of the sender and the receiver, where both players are fully rational. We refer the readers to [31] for an excellent introduction to information design problems.

In this note, we consider a general discrete-time, finite-horizon model where the state of the system evolves as a controlled Markov process that is privately observed by the sender.
At each time $t$, the sender sends a signal to the receiver, based on which both the sender and the receiver get individual instantaneous rewards. The objective of the sender is to maximize its total expected reward over the time horizon $T$, and the objective of the receiver is to maximize its own. Thus the problem can be posed as a dynamic game of asymmetric information where the players play alternately. We assume both the sender and the receiver are fully rational and forward-looking. We consider two equilibrium concepts: (1) perfect Bayesian equilibrium (PBE), which can be thought of as an extension of Nash equilibrium to dynamic games of incomplete information, and (2) perfect Stackelberg equilibrium (PSE), where the sender has commitment power and commits to a policy. In this technical note, we show that the game fits within the framework of the models used by the authors in [1], [2], which provides a tool to analyze Markovian perfect Bayesian equilibria (PBE) and Markovian perfect Stackelberg equilibria (PSE) of this game. We further extend this model such that, instead of one, there are multiple receivers taking actions. Based on [1], [2], we provide an algorithm to analyze PBEs of this game.
A. Notation
We use uppercase letters for random variables and lowercase for their realizations. For any variable, subscripts represent time indices and superscripts represent player indices. We use the notation $-i$ to represent all players other than player $i$, i.e., $-i = \{1, \ldots, i-1, i+1, \ldots, N\}$. We use the notation $A_{t:t'}$ to represent the vector $(A_t, A_{t+1}, \ldots, A_{t'})$ when $t' \ge t$, and an empty vector if $t' < t$. We use $A_t^{-i}$ to mean $(A_t^1, \ldots, A_t^{i-1}, A_t^{i+1}, \ldots, A_t^N)$. We remove superscripts or subscripts to represent the whole vector; for example, $A_t$ represents $(A_t^1, \ldots, A_t^N)$. In a similar vein, for any collection of sets $(\mathcal{X}^i)_{i \in \mathcal{N}}$, we denote $\times_{i \in \mathcal{N}} \mathcal{X}^i$ by $\mathcal{X}$. We denote the indicator function of a set $A$ by $I_A(\cdot)$. For any finite set $\mathcal{S}$, $\Delta(\mathcal{S})$ represents the space of probability measures on $\mathcal{S}$ and $|\mathcal{S}|$ represents its cardinality. We denote by $P^g$ (or $E^g$) the probability measure generated by (or expectation with respect to) the strategy profile $g$. We denote the set of real numbers by $\mathbb{R}$. For a probabilistic strategy profile $(\sigma_t^i)_{i \in \mathcal{N}}$, where the probability of action $a_t^i$ conditioned on $(a_{1:t-1}, x_{1:t}^i)$ is given by $\sigma_t^i(a_t^i \mid a_{1:t-1}, x_{1:t}^i)$, we use the notation $\sigma_t^{-i}(a_t^{-i} \mid a_{1:t-1}, x_{1:t}^{-i})$ to represent $\prod_{j \neq i} \sigma_t^j(a_t^j \mid a_{1:t-1}, x_{1:t}^j)$. All equalities and inequalities involving random variables are to be interpreted in the a.s. sense. For mappings with function-set range, $f : A \to (B \to C)$, we use square brackets, $f[a] \in B \to C$, to denote the image of $a \in A$ through $f$, and parentheses, $f[a](b) \in C$, to denote the image of $b \in B$ through $f[a]$. A controlled Markov process with state $X_t$, action $A_t$, and horizon $\mathcal{T}$ is denoted by $(X_t, A_t)_{t \in \mathcal{T}}$.

The paper is organized as follows. In Section II, we present the model. In Section III, we present the solution concepts of perfect Bayesian and perfect Stackelberg equilibrium.
In Section IV, we present a two-step backward-forward recursive algorithm to construct a strategy profile and a sequence of beliefs of the dynamic game considered. In Section V, we extend this methodology to multiple receivers.

II. MODEL
Suppose there are two players, a sender and a receiver. The sender privately observes a controlled Markov process $\{X_t\}_t$ such that

$$P(x_t \mid x_{1:t-1}, a_{1:t-1}) = Q_x(x_t \mid x_{t-1}, a_{t-1}), \qquad (1a)$$

where $a_t \in \mathcal{A}$ is the action taken by the receiver at time $t$. The sender takes an action $s_t \in \mathcal{S}$ at time $t$ upon observing $(s_{1:t-1}, a_{1:t-1})$, which is common information among the players, and $x_{1:t}$, which is the sender's private information. The sets $\mathcal{A}, \mathcal{X}, \mathcal{S}$ are assumed to be finite, and we also assume that the kernel $Q_x$ has full support. Players play alternately, such that the sender plays at odd times and the receiver plays at even times. At the end of interval $t$, player $i$ receives an instantaneous reward $R^i(x_t, a_t)$. All reward functions, priors, and update kernels are assumed to be common knowledge. We also assume that the receiver observes the rewards $\{R_t^r\}_t$ it receives during the course of the game. These can be understood as additional observations of the state by the receiver. We note that these rewards are a function of the current state and the action of the receiver, both of which the sender perfectly observes. Let $g^i = (g_t^i)_t$ be a probabilistic strategy of player $i \in \{S, R\}$, where $g_t^s : (\mathcal{S} \times \mathcal{A} \times \mathcal{R})^{t-1} \times \mathcal{X}^t \to \Delta(\mathcal{S})$ and $g_t^r : (\mathcal{S} \times \mathcal{A} \times \mathcal{R})^{t-1} \times \mathcal{S} \to \Delta(\mathcal{A})$, such that the players play their actions according to $A_t^s \sim g_t^s(\cdot \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_{1:t})$ and $A_t^r \sim g_t^r(\cdot \mid a_{1:t-1}, r_{1:t-1}^r, s_{1:t})$. Let $g := (g^i)_{i \in \{S,R\}}$ be a strategy profile of both players. The objective of player $i$ is to maximize its total expected reward

$$J^{i,g} := E^g\left[\sum_{t=1}^T R^i(X_t, A_t)\right]. \qquad (2)$$

A. Common agent approach
Any history of this game at which the players take an action is of the form $h_t = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r, x_{1:t})$. Let $\mathcal{H}_t$ be the set of such histories, $\mathcal{H}^T \triangleq \cup_{t=0}^{T} \mathcal{H}_t$ be the set of all possible such histories for the finite horizon, and $\mathcal{H}^\infty \triangleq \cup_{t=0}^{\infty} \mathcal{H}_t$ for the infinite horizon. At any time $t$ the sender observes $h_t^s = h_t = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r, x_{1:t})$, and the players together have $h_t^c = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ as the common history. Since the receiver does not observe any private information, $h_t^r = h_t^c = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. Let $\mathcal{H}_t^i$ be the set of observed histories of player $i$ at time $t$, and $\mathcal{H}_t^c$ the set of common histories at time $t$.

We recall that the sender and the receiver generate their actions at time $t$ as $A_t^s \sim g_t^s(\cdot \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_{1:t})$ and $A_t^r \sim g_t^r(\cdot \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. An alternative way to view the problem is as follows. As is done in the common information approach [32], at an odd time $t$, a fictitious common agent observes the common information $(a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r)$ and generates a prescription function $\gamma_t^s = \psi_t^s[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]$. The sender uses its prescription function $\gamma_t^s$ to operate on its private information $x_{1:t}$ to produce its action $s_t$, where $\gamma_t^s : \mathcal{X}^t \to \Delta(\mathcal{S})$ and $s_t \sim \gamma_t^s(\cdot \mid x_{1:t})$. At an even time $t$, the fictitious common agent observes the common information $(a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ and generates a prescription function $\gamma_t^r = \psi_t^r[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]$. The receiver uses its prescription function $\gamma_t^r$ to produce its action $a_t$, where $\gamma_t^r \in \Delta(\mathcal{A})$ and $a_t \sim \gamma_t^r(\cdot)$. It is easy to see that for any strategy profile $g$ of the players, there exists an equivalent profile $\psi$ of the common agent (and vice versa) that generates the same control actions for every realization of the information of the players. Here, we will consider Markovian common agent policies, as follows.
We call a common agent policy of "type $\theta$" if the common agent observes the common beliefs $\mu_t$ at odd times and $\nu_t$ at even times, derived from the common observations $(a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r)$ and $(a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ respectively, and generates prescription functions $\gamma_t^s = \theta_t^s[\mu_t]$ and $\gamma_t^r = \theta_t^r[\nu_t]$, where $\nu_t(x_t) = P^g(X_t = x_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ and $\mu_{t+1}(x_{t+1}) = P^g(X_{t+1} = x_{t+1} \mid a_{1:t}, s_{1:t}, r_{1:t}^r)$. Moreover, the sender's action depends only on its current private information $x_t$, i.e., $S_t \sim \gamma_t^s(\cdot \mid x_t)$. In the next lemma we show that, for any given policy of type $\theta$, the belief states $\mu_t, \nu_t$ can be updated recursively. Let $\mu_1(x_1) := Q(x_1)$.

Lemma 1:
For any given policy of type $\theta$, there exist update functions $F, G$, independent of $\theta$, such that

$$\nu_t = F(\mu_t, \gamma_t^s, s_t) \qquad (3)$$
$$\mu_{t+1} = G(\nu_t, a_t, r_t^r). \qquad (4)$$

Proof:
Please see Appendix A.

III. SOLUTION CONCEPTS
A. Solution Concept: Perfect Bayesian Equilibrium
An appropriate concept of equilibrium for such games is PBE [33], which consists of a pair $(\beta^*, \mu^*)$ of a strategy profile $\beta^* = (\beta_t^{*,i})_{t \in \mathcal{T}, i \in \mathcal{N}}$, where $\beta_t^{*,i} : \mathcal{H}_t^i \to \Delta(\mathcal{A}^i)$, and a belief profile $\mu^* = ({}^i\mu_t^*)_{t \in \mathcal{T}, i \in \mathcal{N}}$, where ${}^i\mu_t^* : \mathcal{H}_t^i \to \Delta(\mathcal{H}_t)$, that satisfy sequential rationality: for every player $i$, $t \in \mathcal{T}$, $h_t^i \in \mathcal{H}_t^i$, and strategy $\beta^i$,

$$W_t^{i,\beta^*,T}(h_t^i) \ge W_t^{i,\beta^i \beta^{*,-i},T}(h_t^i), \qquad (5)$$

where the reward-to-go is defined as

$$W_t^{i,\beta^i,T}(h_t^i) \triangleq E^{\beta^i \beta^{*,-i},\, {}^i\mu_t^*[h_t^i]}\left\{ \sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, h_t^i \right\}, \qquad (6)$$

and the beliefs are updated using Bayes' rule whenever possible. In general, a belief for player $i$ at time $t$, ${}^i\mu_t^*$, is defined on the history $h_t = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r, x_{1:t})$ given its private history $h_t^i$.
At any time $t$, the relevant uncertainty the receiver has is about the state history $x_{1:t} \in \times_{n=1}^t \mathcal{X}$ and the sender's future actions. In our setting, we consider beliefs that are functions of each player's history $h_t^i$ only through the common history $h_t^c$, and that are beliefs on the current state only. Here the receiver's belief for each history $h_t^c = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ is derived from a common belief $\mu_t^*[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]$. In order to anticipate the receiver's actions through its strategy, the sender keeps track of this belief as well (and it can, since the belief is derived from common information). Thus it is sufficient to use the system of beliefs $\mu^* = (\mu_t^*)_{t \in \mathcal{T}}$, where $\mu_t^* : \mathcal{H}_t^c \to \Delta(\mathcal{X})$.

B. Solution Concept: Stackelberg Equilibrium
An appropriate notion of equilibrium here is the Stackelberg equilibrium, defined as follows. For a given strategy profile $\sigma^s$ of the sender, the receiver maximizes its total expected discounted utility over the finite horizon $T$,

$$\max_{\sigma^r} E^{\sigma^s, \sigma^r}\left\{ \sum_{t=1}^{T} \delta^{t-1} R^r(X_t, A_t) \right\}. \qquad (7)$$

Let $BR^r(\sigma^s)$ be the set of optimizing strategies of the receiver given a strategy $\sigma^s$ of the sender, i.e.,

$$BR^r(\sigma^s) = \arg\max_{\sigma^r} E^{\sigma^s, \sigma^r}\left\{ \sum_{t=1}^{T} \delta^{t-1} R^r(X_t, A_t) \right\}. \qquad (8)$$

The sender finds its optimal strategy that maximizes its total expected discounted reward given that the receiver will use its best response to it,

$$\tilde{\sigma}^s \in \arg\max_{\sigma^s} E^{\sigma^s, BR^r(\sigma^s)}\left\{ \sum_{t=1}^{T} \delta^{t-1} R^s(X_t, A_t) \right\}. \qquad (9)$$

Then $(\tilde{\sigma}^s, \tilde{\sigma}^r)$ constitutes a Stackelberg equilibrium, where $\tilde{\sigma}^r \in BR^r(\tilde{\sigma}^s)$.

C. Common Perfect Stackelberg Equilibrium
In this paper, we consider sender equilibrium policies that depend on the common history and only the current state $x_t$, i.e., at equilibrium, $a_t^s \sim \tilde{\sigma}_t^s(\cdot \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t)$ and $a_t^r \sim \tilde{\sigma}_t^r(\cdot \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. For the game considered, we introduce a notion of common perfect Stackelberg equilibrium (cPSE), inspired by perfect Bayesian equilibrium [34], as follows. Note, however, that for the purpose of equilibrium, the optimization is performed in the space of all possible strategies, which may depend on the entire history of the state.
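Before turning to the cPSE definition, the best-response structure in (8) and (9) can be illustrated in a one-shot example. The following sketch (the payoff matrices, the prior, and the restriction to deterministic signaling rules are all illustrative assumptions, not part of the model) enumerates the sender's commitments, lets the receiver best respond to the induced posterior, and keeps the commitment that maximizes the sender's expected reward.

```python
import itertools
import numpy as np

# One-shot Stackelberg persuasion sketch: the sender commits to a
# deterministic signaling rule (a map state -> signal), the receiver
# Bayes-updates and best-responds, cf. (8)-(9). All numbers are toy values.
prior = np.array([0.5, 0.5])                 # prior over two states
R_s = np.array([[0.0, 1.0], [0.0, 1.0]])     # sender reward R^s[x, a]
R_r = np.array([[1.0, 0.0], [0.0, 1.0]])     # receiver reward R^r[x, a]
states, signals, actions = range(2), range(2), range(2)

best_val, best_rule = float("-inf"), None
for rule in itertools.product(signals, repeat=2):      # rule[x] = signal sent in state x
    val = 0.0
    for s in signals:
        # unnormalized posterior over states given signal s under this rule
        unnorm = np.array([prior[x] * (rule[x] == s) for x in states])
        if unnorm.sum() == 0:
            continue                                   # signal s is never sent
        post = unnorm / unnorm.sum()
        a = max(actions, key=lambda a: post @ R_r[:, a])   # receiver BR, cf. (8)
        val += unnorm.sum() * (post @ R_s[:, a])           # sender's expected payoff
    if val > best_val:
        best_val, best_rule = val, rule                    # sender's commitment, cf. (9)
```

With these toy rewards the sender always prefers action 1 while the receiver wants to match the state; a fully informative rule earns the sender 0.5 in expectation, which is what the enumeration recovers.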
Let $(\tilde{\sigma}, \mu, \nu)$ be a cPSE of the game, where $\mu = (\mu_t)_{t \in [T]}$, $\nu = (\nu_t)_{t \in [T]}$, and for any $t$ and action history, $\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r] \in \Delta(\mathcal{X})$ and $\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] \in \Delta(\mathcal{X})$ are the equilibrium beliefs on the current state $x_t$, given the histories $(a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r)$ and $(a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$, respectively, i.e., $\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r](x_t) = P^{\tilde{\sigma}}(x_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r)$ and $\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r](x_t) = P^{\tilde{\sigma}}(x_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. For any given $\sigma^s$, with some abuse of notation, let $BR^r(\sigma^s)$, the best response of the receiver to any strategy $\sigma^s$ of the sender, be defined as

$$BR^r(\sigma^s) := \bigcap_t \bigcap_{h_t^c} \arg\max_{\sigma^r} E^{\sigma^s, \sigma^r}\left\{ \sum_{n=t}^{T} \delta^{n-t} R^r(X_n, A_n) \,\Big|\, h_t^c \right\}, \qquad (10)$$

and let the set of optimal strategies of the sender be defined as

$$\tilde{\sigma}^s \in \bigcap_t \bigcap_{h_t^c} \arg\max_{\sigma^s} E^{\sigma^s, BR^r(\sigma^s)}\left\{ \sum_{n=t}^{T} \delta^{n-t} R^s(X_n, A_n) \,\Big|\, h_t^c \right\}. \qquad (11)$$

Then $(\tilde{\sigma}^s, \tilde{\sigma}^r)$ constitutes a cPSE of the game, where $\tilde{\sigma}^r \in BR^r(\tilde{\sigma}^s)$.

Definition 1:
We call a strategy profile $\sigma$ a Markov cPSE if it is a cPSE of type $\theta$. In the next section, we design an algorithm to compute all Markovian cPSE of the game.

IV. SINGLE RECEIVER
A. PBE methodology
In the following, we adapt the methodology presented in [1] to compute PBE of this game.
B. Backward Recursion
In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i \in \{s,r\}, t \in [T]}$ and a sequence of functions $(V_t^s, V_t^r)_{t \in \{1, \ldots, T+1\}}$ in a backward recursive way, as follows.

1. Initialize $\forall\, \mu_{T+1} \in \mathcal{P}(\mathcal{X}),\ x_{T+1} \in \mathcal{X}$:

$$V_{T+1}^r(\mu_{T+1}) \triangleq 0 \qquad (12)$$
$$V_{T+1}^{s+}(\mu_{T+1}, x_{T+1}) \triangleq 0. \qquad (13)$$

(Note that we condition on the common information and not on the actual observed histories of the players.)
2. For $t = T, T-1, \ldots, 1$:

$\forall\, \nu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^r = \theta_t^r[\nu_t]$ be generated as the solution of the following optimization problem,

$$\tilde{\gamma}_t^r \in \arg\max_{\gamma_t^r} E^{\gamma_t^r, \nu_t}\left\{ R^r(X_t, A_t) + \delta V_{t+1}^r(G(\nu_t, A_t, R_t^r)) \,\Big|\, \nu_t \right\}, \qquad (14)$$

where the expectation in (14) is defined with respect to the random variables $(X_t, A_t, R_t^r)$ through the measure $\nu_t(x_t)\gamma_t^r(a_t) I(R_t^r = R^r(x_t, a_t))$. Let

$$V_{t+1}^s(x_t, \nu_t, x_{t+1}) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ V_{t+1}^{s+}(G(\nu_t, A_t, R_t^r), x_{t+1}) \mid x_t \right\} \qquad (15)$$
$$V_t^{r+}(\nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ R^r(X_t, A_t) + \delta V_{t+1}^r(G(\nu_t, A_t, R_t^r)) \right\}. \qquad (16)$$

$\forall\, \mu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^s = \theta_t^s[\mu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^s(\cdot \mid x_t) \in \arg\max_{\gamma_t^s(\cdot \mid x_t)} E^{\gamma_t^s(\cdot \mid x_t), \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(x_t, A_t) + \delta V_{t+1}^s(x_t, F(\mu_t, \tilde{\gamma}_t^s, S_t), X_{t+1}) \,\Big|\, \mu_t, x_t \right\}. \qquad (17)$$

Let

$$V_t^{s+}(\mu_t, x_t) \triangleq E^{\tilde{\gamma}_t^s(\cdot \mid x_t), \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(x_t, A_t) + \delta V_{t+1}^s(x_t, F(\mu_t, \tilde{\gamma}_t^s, S_t), X_{t+1}) \,\Big|\, x_t \right\} \qquad (18)$$
$$V_t^r(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \mu_t}\left\{ V_t^{r+}(F(\mu_t, \tilde{\gamma}_t^s, S_t)) \,\Big|\, \mu_t \right\}, \qquad (19)$$

where the expectation in (17) is with respect to the random variables $(S_t, A_t, X_{t+1})$ through the measure $\gamma_t^s(s_t \mid x_t)\theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, s_t)](a_t) Q(x_{t+1} \mid x_t, a_t)$, and $F$ is defined in Appendix A.

C. Forward Recursion
Based on $\theta$ defined in the backward recursion above, we now construct a set of strategies $\tilde{\sigma}$ (through beliefs $\mu, \nu$) in a forward recursive way, as follows.

1. Initialize at time $t = 1$:

$$\mu_1[\phi](x_1) := Q(x_1). \qquad (20)$$

2. For $t = 1, \ldots, T$, for every realization $(a_{1:t}, s_{1:t}, r_{1:t}^r)$ and $x_t \in \mathcal{X}$:

$$\tilde{\sigma}_t^r(a_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r) := \theta_t^r[\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]](a_t) \qquad (21)$$
$$\tilde{\sigma}_t^s(s_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t) := \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]](s_t \mid x_t) \qquad (22)$$
$$\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] = F(\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r], \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]], s_t) \qquad (23a)$$
$$\mu_{t+1}[a_{1:t}, s_{1:t}, r_{1:t}^r] = G(\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r], a_t, r_t^r), \qquad (23b)$$

where $F, G$ are defined in Appendix A.
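The maps $F$ and $G$ referenced above admit a direct implementation for a finite state space. The sketch below (the NumPy array conventions for $\mu_t$, $\gamma_t^s$, $R^r$, and $Q$ are assumptions made for illustration) mirrors the derivation in Appendix A: $F$ is a Bayes update on the sender's signal, and $G$ keeps the states consistent with the receiver's observed reward and propagates through the kernel.

```python
import numpy as np

def F(mu, gamma_s, s):
    # nu_t = F(mu_t, gamma_s_t, s_t):  nu_t(x) is proportional to
    # mu_t(x) * gamma_s_t(s | x), cf. Eq. (3)
    unnorm = mu * gamma_s[:, s]          # gamma_s[x, s] = Pr(signal s | state x)
    return unnorm / unnorm.sum()

def G(nu, a, r, R_r, Q):
    # mu_{t+1} = G(nu_t, a_t, r_t^r): condition on the observed reward,
    # then propagate through the controlled kernel Q[x, a, x'], cf. Eq. (4)
    w = nu * (R_r[:, a] == r)            # indicator I(r = R^r(x, a))
    unnorm = w @ Q[:, a, :]
    return unnorm / unnorm.sum()

# example with two states: signal 0 is more likely in state 0
mu = np.array([0.5, 0.5])
gamma_s = np.array([[0.8, 0.2], [0.4, 0.6]])
nu = F(mu, gamma_s, 0)                   # -> [2/3, 1/3]
```

Note that, consistent with the full-support assumption on $Q_x$, the normalizations above are well defined whenever the conditioning events have positive probability.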
Theorem 1:
A strategy and belief profile $(\tilde{\sigma}, \mu, \nu)$, as constructed through the backward-forward recursion algorithm above, is a PBE of the game.

Proof:
Our model fits within the framework considered in [1] as follows. [1] considers a model with $N$ strategic players, each with a private type $x_t^i$, such that the players' types are conditionally independent across players given the history of actions. Although it assumes that all $N$ players act simultaneously in all periods of the game, it can accommodate cases where at each time $t$ the acting player is chosen through an exogenously defined Markov process. This is done by introducing a "nature" player $0$, who perfectly observes its state process $(X_t^0)_t$, $X_t^0 \in \{S, R\}$, where the state process evolves deterministically such that $x_t^0 = S$ at odd times and $x_t^0 = R$ at even times. Player $0$ has reward function zero and plays actions $a_t^0 = x_t^0$. Once the quantity $a_{t-1}^0$ is publicly observed, all players can determine the acting player at time $t$: $a_{t-1}^0 = x_{t-1}^0 = S$ indicates that the sender plays in the game, and $a_{t-1}^0 = x_{t-1}^0 = R$ indicates that the receiver plays. This is achieved by setting, $\forall i$, $R_t^i(x_t, a_t) = 0$ if $i \neq a_t^0$, and $Q_x(x_{t+1}^i \mid x_t^i, a_t) = Q_x(x_{t+1}^i \mid x_t^i, a_t^{a_t^0})$. Thus, in each period only one player (player $a_t^0 = x_t^0$) acts in the game, while all other, non-acting players receive zero rewards during that period. The above methodology can then be seen as an adaptation of the methodology considered in [1] to compute PBE of the game, and the result of the theorem is implied by [1, Theorem 1].

D. Stackelberg methodology: Single Receiver
In the following, we adapt the methodology presented in [2] to compute cPSE of this game.
E. Backward Recursion
In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i \in \{s,r\}, t \in [T]}$, where $\theta_t^s : \mathcal{P}(\mathcal{X}) \to \{\mathcal{X} \to \mathcal{P}(\mathcal{S})\}$ and $\theta_t^r : \mathcal{P}(\mathcal{X}) \to \mathcal{P}(\mathcal{A})$, and a sequence of functions $(V_t^s, V_t^r)_{t \in \{1, \ldots, T+1\}}$, where $V_t^s : \mathcal{P}(\mathcal{X}) \times \mathcal{X} \to \mathbb{R}$ and $V_t^r : \mathcal{P}(\mathcal{X}) \to \mathbb{R}$, in a backward recursive way, as follows.
1. Initialize $\forall\, \mu_{T+1} \in \mathcal{P}(\mathcal{X})$:

$$V_{T+1}^r(\mu_{T+1}) \triangleq 0 \qquad (24)$$
$$V_{T+1}^{s+}(\mu_{T+1}) \triangleq 0. \qquad (25)$$

2. For $t = T, T-1, \ldots, 1$:

$\forall\, \nu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^r = \theta_t^r[\nu_t]$ be generated as the solution of the following optimization problem,

$$\tilde{\gamma}_t^r \in \arg\max_{\gamma_t^r} E^{\gamma_t^r, \nu_t}\left\{ R^r(X_t, A_t) + \delta V_{t+1}^r(G(\nu_t, A_t, R_t^r)) \,\Big|\, \nu_t \right\}, \qquad (26)$$

where the expectation in (26) is defined with respect to the random variables $(X_t, A_t, R_t^r)$ through the measure $\nu_t(x_t)\gamma_t^r(a_t) I(R_t^r = R^r(x_t, a_t))$. Let

$$V_{t+1}^s(x_t, \nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ V_{t+1}^{s+}(G(\nu_t, A_t, R_t^r)) \mid x_t \right\} \qquad (27)$$
$$V_t^{r+}(\nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ R^r(X_t, A_t) + \delta V_{t+1}^r(G(\nu_t, A_t, R_t^r)) \right\}. \qquad (28)$$

$\forall\, \mu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^s = \theta_t^s[\mu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^s \in \arg\max_{\gamma_t^s} E^{\gamma_t^s, \theta_t^r[F(\mu_t, \gamma_t^s, S_t)], \mu_t}\left\{ R^s(X_t, A_t) + \delta V_{t+1}^s(X_t, F(\mu_t, \gamma_t^s, S_t)) \right\}. \qquad (29)$$

Let

$$V_t^{s+}(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(X_t, A_t) + \delta V_{t+1}^s(X_t, F(\mu_t, \tilde{\gamma}_t^s, S_t)) \right\} \qquad (30)$$
$$V_t^r(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \mu_t}\left\{ V_t^{r+}(F(\mu_t, \tilde{\gamma}_t^s, S_t)) \right\}, \qquad (31)$$

where the expectation in (29) is with respect to the random variables $(X_t, S_t, A_t)$ through the measure $\mu_t(x_t)\gamma_t^s(s_t \mid x_t)\theta_t^r[F(\mu_t, \gamma_t^s, s_t)](a_t)$, and $F$ is defined in Appendix A.

F. Forward Recursion
Based on $\theta$ defined in the backward recursion above, we now construct a set of strategies $\tilde{\sigma}$ (through beliefs $\mu, \nu$) in a forward recursive way, as follows.

1. Initialize at time $t = 1$:

$$\mu_1[\phi](x_1) := Q(x_1). \qquad (32)$$

2. For $t = 1, \ldots, T$, for every realization $(a_{1:t}, s_{1:t}, r_{1:t}^r)$ and $x_t \in \mathcal{X}$:

$$\tilde{\sigma}_t^r(a_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r) := \theta_t^r[\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]](a_t) \qquad (33)$$
$$\tilde{\sigma}_t^s(s_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t) := \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]](s_t \mid x_t) \qquad (34)$$
$$\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] = F(\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r], \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]], s_t) \qquad (35a)$$
$$\mu_{t+1}[a_{1:t}, s_{1:t}, r_{1:t}^r] = G(\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r], a_t, r_t^r), \qquad (35b)$$

where $F, G$ are defined in Appendix A.
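A single receiver step of this backward recursion, (26) together with (28), can be sketched numerically by discretizing the belief space (a heuristic: the recursion is defined over the whole continuum $\mathcal{P}(\mathcal{X})$, and continuation values are looked up at the nearest grid belief). The rewards, kernel, discount factor, and grid below are illustrative assumptions.

```python
import numpy as np

delta = 0.9
R_r = np.array([[1.0, 0.0], [0.0, 1.0]])       # receiver reward R^r[x, a]
Q = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.3, 0.7]]])       # kernel Q[x, a, x']
grid = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 21)]
V_next = [0.0] * len(grid)                     # V^r_{t+1} on the grid (terminal step)

def nearest(nu):
    # continuation value is looked up at the nearest grid belief
    return min(range(len(grid)), key=lambda i: abs(grid[i][0] - nu[0]))

V_t, gamma_r = [], []                          # tables for V^r_t and theta^r_t
for nu in grid:
    vals = []
    for a in range(2):
        v = 0.0
        for r in set(R_r[:, a]):               # rewards the receiver may observe
            w = nu * (R_r[:, a] == r)          # mass on states consistent with r
            if w.sum() == 0.0:
                continue
            mu_next = w @ Q[:, a, :]
            mu_next = mu_next / mu_next.sum()  # G(nu, a, r) of Lemma 1
            v += w.sum() * (r + delta * V_next[nearest(mu_next)])
        vals.append(v)
    gamma_r.append(int(np.argmax(vals)))       # ~gamma^r_t = theta^r_t[nu], cf. (26)
    V_t.append(max(vals))                      # V^r_t(nu), cf. (28)
```

With zero continuation values this reduces to the myopic rule $\max_a E_{\nu_t}[R^r(X_t, a)]$; iterating the same step for $t = T, \ldots, 1$, with `V_next` replaced by the previously computed table, tabulates the full recursion on the grid.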
Theorem 2:
A strategy and belief profile $(\tilde{\sigma}, \mu, \nu)$, as constructed through the backward-forward recursion algorithm above, is a cPSE of the game.

Proof:
The above methodology can be seen as an adaptation of the methodology considered in [2] to compute cPSE of the game, and the result of the theorem is implied by [2, Theorem 1]. In this case, since only the sender has private information and can thus influence the beliefs, it solves the fixed-point equation in (29). The receiver, however, does not have any private information and solves the optimization problem in (26).

V. MULTIPLE PLAYERS
In this section, we consider the case when there are multiple receivers. As before, each receiver $i$ takes an action $A_t^{r,i}$ at time $t$ as $A_t^{r,i} \sim g_t^{r,i}(\cdot \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. The above methodologies can be extended as follows.

A. PBE methodology: Multiple Receivers

B. Backward Recursion
In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i \in \{s, r_1, \ldots, r_N\}, t \in [T]}$ and a sequence of functions $(V_t^s, V_t^{r,1}, \ldots, V_t^{r,N})_{t \in \{1, \ldots, T+1\}}$ in a backward recursive way, as follows.

1. Initialize $\forall\, \mu_{T+1} \in \mathcal{P}(\mathcal{X}),\ x_{T+1} \in \mathcal{X}$:

$$V_{T+1}^{r,i}(\mu_{T+1}) \triangleq 0 \qquad (36)$$
$$V_{T+1}^{s+}(\mu_{T+1}, x_{T+1}) \triangleq 0. \qquad (37)$$

2. For $t = T, T-1, \ldots, 1$:

$\forall\, \nu_t \in \mathcal{P}(\mathcal{X})$ and for all $i$, let $\tilde{\gamma}_t^{r,i} = \theta_t^{r,i}[\nu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^{r,i} \in \arg\max_{\gamma_t^{r,i}} E^{\gamma_t^{r,i}, \tilde{\gamma}_t^{r,-i}, \nu_t}\left\{ R^{r,i}(X_t, A_t) + \delta V_{t+1}^{r,i}(G(\nu_t, A_t, R_t^r)) \,\Big|\, \nu_t \right\}, \qquad (38)$$

where the expectation in (38) is defined with respect to the random variables $(X_t, A_t)$ through the measure $\nu_t(x_t)\gamma_t^{r,i}(a_t^i)\tilde{\gamma}_t^{r,-i}(a_t^{-i})$. Let

$$V_{t+1}^s(x_t, \nu_t, x_{t+1}) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ V_{t+1}^{s+}(G(\nu_t, A_t, R_t^r), x_{t+1}) \mid x_t \right\} \qquad (39)$$
$$V_t^{r+,i}(\nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ R^{r,i}(X_t, A_t) + \delta V_{t+1}^{r,i}(G(\nu_t, A_t, R_t^r)) \right\}. \qquad (40)$$

$\forall\, \mu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^s = \theta_t^s[\mu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^s(\cdot \mid x_t) \in \arg\max_{\gamma_t^s(\cdot \mid x_t)} E^{\gamma_t^s(\cdot \mid x_t), \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(x_t, A_t) + \delta V_{t+1}^s(x_t, F(\mu_t, \tilde{\gamma}_t^s, S_t), X_{t+1}) \,\Big|\, \mu_t, x_t \right\}. \qquad (41)$$

Let

$$V_t^{s+}(\mu_t, x_t) \triangleq E^{\tilde{\gamma}_t^s(\cdot \mid x_t), \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(x_t, A_t) + \delta V_{t+1}^s(x_t, F(\mu_t, \tilde{\gamma}_t^s, S_t), X_{t+1}) \,\Big|\, x_t \right\} \qquad (42)$$
$$V_t^{r,i}(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \mu_t}\left\{ V_t^{r+,i}(F(\mu_t, \tilde{\gamma}_t^s, S_t)) \,\Big|\, \mu_t \right\}, \qquad (43)$$

where the expectation in (41) is with respect to the random variables $(S_t, A_t, X_{t+1})$ through the measure $\gamma_t^s(s_t \mid x_t)\theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, s_t)](a_t) Q(x_{t+1} \mid x_t, a_t)$, and $F$ is defined in Appendix A.

C. Forward Recursion
Based on $\theta$ defined in the backward recursion above, we now construct a set of strategies $\tilde{\sigma}$ (through beliefs $\mu, \nu$) in a forward recursive way, as follows.

1. Initialize at time $t = 1$:

$$\mu_1[\phi](x_1) := Q(x_1). \qquad (44)$$

2. For $t = 1, \ldots, T$, for every $i$, every realization $(a_{1:t}, s_{1:t}, r_{1:t}^r)$, and $x_t \in \mathcal{X}$:

$$\tilde{\sigma}_t^{r,i}(a_t^i \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r) := \theta_t^{r,i}[\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]](a_t^i) \qquad (45)$$
$$\tilde{\sigma}_t^s(s_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t) := \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]](s_t \mid x_t) \qquad (46)$$
$$\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] = F(\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r], \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]], s_t) \qquad (47)$$
$$\mu_{t+1}[a_{1:t}, s_{1:t}, r_{1:t}^r] = G(\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r], a_t, r_t^r), \qquad (48)$$

where $F, G$ are defined in Appendix A.
Theorem 3:
A strategy and belief profile $(\tilde{\sigma}, \mu, \nu)$, as constructed through the backward-forward recursion algorithm above, is a PBE of the game.

Proof:
By arguments similar to those in the proof of Theorem 1 above, the result is implied by [1, Theorem 1].
D. cPSE methodology: Multiple Receivers
In this section, we consider the case when there are multiple receivers. As before, each receiver $i$ takes an action $A_t^{r,i}$ at time $t$ as $A_t^{r,i} \sim g_t^{r,i}(\cdot \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. The above methodology can be extended as follows.
In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i \in \{s, r_1, \ldots, r_N\}, t \in [T]}$ and a sequence of functions $(V_t^s, V_t^{r,1}, \ldots, V_t^{r,N})_{t \in \{1, \ldots, T+1\}}$ in a backward recursive way, as follows.

1. Initialize $\forall\, \mu_{T+1} \in \mathcal{P}(\mathcal{X})$:

$$V_{T+1}^{r,i}(\mu_{T+1}) \triangleq 0 \qquad (49)$$
$$V_{T+1}^{s+}(\mu_{T+1}) \triangleq 0. \qquad (50)$$

2. For $t = T, T-1, \ldots, 1$:

$\forall\, \nu_t \in \mathcal{P}(\mathcal{X})$ and for all $i$, let $\tilde{\gamma}_t^{r,i} = \theta_t^{r,i}[\nu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^{r,i} \in \arg\max_{\gamma_t^{r,i}} E^{\gamma_t^{r,i}, \tilde{\gamma}_t^{r,-i}, \nu_t}\left\{ R^{r,i}(X_t, A_t) + \delta V_{t+1}^{r,i}(G(\nu_t, A_t, R_t^r)) \,\Big|\, \nu_t \right\}, \qquad (51)$$

where the expectation in (51) is defined with respect to the random variables $(X_t, A_t)$ through the measure $\nu_t(x_t)\gamma_t^{r,i}(a_t^i)\tilde{\gamma}_t^{r,-i}(a_t^{-i})$. Note that the above equation is similar to a fixed-point equation corresponding to a Bayesian Nash equilibrium. Let

$$V_{t+1}^s(x_t, \nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ V_{t+1}^{s+}(G(\nu_t, A_t, R_t^r)) \mid x_t \right\} \qquad (52)$$
$$V_t^{r+,i}(\nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ R^{r,i}(X_t, A_t) + \delta V_{t+1}^{r,i}(G(\nu_t, A_t, R_t^r)) \right\}. \qquad (53)$$

$\forall\, \mu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^s = \theta_t^s[\mu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^s \in \arg\max_{\gamma_t^s} E^{\gamma_t^s, \theta_t^r[F(\mu_t, \gamma_t^s, S_t)], \mu_t}\left\{ R^s(X_t, A_t) + \delta V_{t+1}^s(X_t, F(\mu_t, \gamma_t^s, S_t)) \,\Big|\, \mu_t \right\}. \qquad (54)$$
Let

$$V_t^{s+}(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(X_t, A_t) + \delta V_{t+1}^s(X_t, F(\mu_t, \tilde{\gamma}_t^s, S_t)) \right\} \qquad (55)$$
$$V_t^{r,i}(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \mu_t}\left\{ V_t^{r+,i}(F(\mu_t, \tilde{\gamma}_t^s, S_t)) \,\Big|\, \mu_t \right\}, \qquad (56)$$

where the expectation in (54) is with respect to the random variables $(X_t, S_t, A_t)$ through the measure $\mu_t(x_t)\gamma_t^s(s_t \mid x_t)\theta_t^r[F(\mu_t, \gamma_t^s, s_t)](a_t)$, and $F$ is defined in Appendix A.

F. Forward Recursion
Based on $\theta$ defined in the backward recursion above, we now construct a set of strategies $\tilde{\sigma}$ (through beliefs $\mu, \nu$) in a forward recursive way, as follows.

1. Initialize at time $t = 1$:

$$\mu_1[\phi](x_1) := Q(x_1). \qquad (57)$$

2. For $t = 1, \ldots, T$, for every realization $(a_{1:t}, s_{1:t}, r_{1:t}^r)$ and $x_t \in \mathcal{X}$:

$$\tilde{\sigma}_t^r(a_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r) := \theta_t^r[\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]](a_t) \qquad (58)$$
$$\tilde{\sigma}_t^s(s_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t) := \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]](s_t \mid x_t) \qquad (59)$$
$$\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] = F(\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r], \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]], s_t) \qquad (60a)$$
$$\mu_{t+1}[a_{1:t}, s_{1:t}, r_{1:t}^r] = G(\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r], a_t, r_t^r), \qquad (60b)$$

where $F, G$ are defined in Appendix A.
Theorem 4:
A strategy and belief profile $(\tilde{\sigma}, \mu, \nu)$, as constructed through the backward-forward recursion algorithm above, is a cPSE of the game.

Proof:
The result is implied by [2, Theorem 1], using arguments similar to those in the proof of Theorem 1 above. Note that in this case there are multiple receivers maximizing their rewards at each step in (51); thus, instead of solving an optimization problem, they solve a fixed-point equation similar to that of a Bayesian Nash equilibrium.
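The stage fixed point in (51) can be approached by best-response iteration among the receivers, here with continuation values set to zero and pure actions only for brevity. The belief and payoff tensors below are illustrative assumptions, and convergence of the iteration is a heuristic: it may cycle in general, in which case a mixed-strategy fixed point has to be sought.

```python
import numpy as np

nu = np.array([0.6, 0.4])                       # common belief on x_t
# R[i][x, a0, a1]: receiver i's reward; a coordination-style toy example
R = [np.array([[[2., 0.], [0., 1.]],
               [[1., 0.], [0., 2.]]]),
     np.array([[[2., 0.], [0., 1.]],
               [[1., 0.], [0., 2.]]])]

acts = [0, 0]                                   # initial pure-action profile
for _ in range(50):
    new = []
    for i in range(2):
        def payoff(ai, i=i):
            a = list(acts); a[i] = ai           # unilateral deviation of receiver i
            return sum(nu[x] * R[i][x, a[0], a[1]] for x in range(2))
        new.append(max(range(2), key=payoff))   # best response to the others, cf. (51)
    if new == acts:                             # no receiver wants to deviate:
        break                                   # a pure fixed point of (51)
    acts = new
```

In this example, the action profile (0, 0) yields every receiver an expected reward of 1.6 under the belief `nu`, so no one deviates and the iteration stops immediately.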
APPENDIX
Proof:

$$\nu_t(x_t) = P^\theta(x_t \mid s_{1:t}, a_{1:t-1}, r_{1:t-1}^r) \qquad (61)$$
$$= \frac{P^\theta(x_t, s_t \mid s_{1:t-1}, a_{1:t-1}, r_{1:t-1}^r)}{\sum_{x_t} P^\theta(x_t, s_t \mid s_{1:t-1}, a_{1:t-1}, r_{1:t-1}^r)} \qquad (62)$$
$$= \frac{\mu_t(x_t)\gamma_t^s(s_t \mid x_t)}{\sum_{x_t} \mu_t(x_t)\gamma_t^s(s_t \mid x_t)}. \qquad (63)$$

Thus,

$$\nu_t = F(\mu_t, \gamma_t^s, s_t). \qquad (64)$$

Similarly,

$$\mu_{t+1}(x_{t+1}) = P^\theta(x_{t+1} \mid s_{1:t}, a_{1:t}, r_{1:t}^r) \qquad (65)$$
$$= \frac{\sum_{x_t} P^\theta(x_t, a_t, r_t^r, x_{t+1} \mid s_{1:t}, a_{1:t-1}, r_{1:t-1}^r)}{\sum_{x_t} P^\theta(x_t, a_t, r_t^r \mid s_{1:t}, a_{1:t-1}, r_{1:t-1}^r)} \qquad (66)$$
$$= \frac{\sum_{x_t} \nu_t(x_t)\gamma_t^r(a_t) I(r_t^r = R^r(x_t, a_t)) Q_x(x_{t+1} \mid x_t, a_t)}{\sum_{x_t} \nu_t(x_t)\gamma_t^r(a_t) I(r_t^r = R^r(x_t, a_t))} \qquad (67)$$
$$= \frac{\sum_{x_t} \nu_t(x_t) I(r_t^r = R^r(x_t, a_t)) Q_x(x_{t+1} \mid x_t, a_t)}{\sum_{x_t} \nu_t(x_t) I(r_t^r = R^r(x_t, a_t))}. \qquad (68)$$

Thus,

$$\mu_{t+1} = G(\nu_t, a_t, r_t^r). \qquad (69)$$

REFERENCES

[1] D. Vasal, A. Sinha, and A. Anastasopoulos, "A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information,"
IEEE Transactions on Automatic Control, vol. 64, no. 1, pp. 81–96, Jan. 2019.

[2] D. Vasal, “Stochastic Stackelberg games,” SSRN Electronic Journal, May 2020. [Online]. Available: http://arxiv.org/abs/2005.01997

[3] R. B. Myerson, “Optimal auction design,” Mathematics of Operations Research.

[4] SSRN Electronic Journal, Dec. 2014.

[5] E. Kamenica and M. Gentzkow, “Bayesian persuasion,” American Economic Review, vol. 101, no. 6, pp. 2590–2615, Oct. 2011.

[6] J. Hedlund, “Persuasion with communication costs,” Games and Economic Behavior, vol. 92, pp. 28–40, 2015.
[7] W. Tamura, “A theory of multidimensional information disclosure,” Institute of Social and Economic Research, Osaka University, ISER Discussion Paper 0828, Jan. 2012.

[8] M. Gentzkow and E. Kamenica, “Bayesian persuasion with multiple senders and rich signal spaces,” Games and Economic Behavior, vol. 104, pp. 411–429, 2017.

[9] F. Li and P. Norman, “On Bayesian persuasion with multiple senders,” Economics Letters, vol. 170, pp. 66–70, 2018.

[10] D. Bergemann and S. Morris, “Information design, Bayesian persuasion, and Bayes correlated equilibrium,” American Economic Review, vol. 106, no. 5, May 2016.

[11] ——, “Bayes correlated equilibrium and the comparison of information structures in games,” Theoretical Economics, vol. 11, no. 2, pp. 487–522, 2016.

[12] R. Alonso and O. Câmara, “Bayesian persuasion with heterogeneous priors,” Journal of Economic Theory, vol. 165, pp. 672–706, 2016.

[13] E. Kamenica, “Bayesian persuasion and information design,” Annual Review of Economics, vol. 11, no. 1, pp. 249–272, 2019.

[14] F. Farokhi, A. M. H. Teixeira, and C. Langbort, “Estimation with strategic sensors,” IEEE Transactions on Automatic Control, vol. 62, 2017.

[15] D. Lingenbrink and K. Iyer, “Optimal signaling mechanisms in unobservable queues with strategic customers,” in Proceedings of the 2017 ACM Conference on Economics and Computation. ACM, 2017, p. 347.

[16] J. Ely, “Beeps,” American Economic Review, 2017.

[17] J. Renault, E. Solan, and N. Vieille, “Optimal dynamic information provision,” Games and Economic Behavior, vol. 104, pp. 329–349, 2017.

[18] J. W. Best and D. P. Quigley, “Honestly dishonest: A solution to the commitment problem in Bayesian persuasion,” 2016.

[19] J. Best and D. Quigley, “Persuasion for the long-run,” Economics Group, Nuffield College, University of Oxford, Economics Papers 2016-W12, 2016.

[20] T. Honryo, “Dynamic persuasion,” Journal of Economic Theory, vol. 178, pp. 36–58, 2018.

[21] D. Orlov, A. Skrzypacz, and P. Zryumov, “Persuading the principal to wait,” Stanford University Graduate School of Business Research Paper, no. 16-20, 2019.

[22] J. Hörner and A. Skrzypacz, “Selling information,” Journal of Political Economy, vol. 124, no. 6, pp. 1515–1562, 2016.

[23] J. Ely and M. Szydlowski, “Moving the goalposts,” Journal of Political Economy, 2019.

[24] L. Doval and J. Ely, “Sequential information design,” Econometrica, 2019.

[25] P. Basu, “Dynamic Bayesian persuasion with a privately informed receiver,” 2017.

[26] P. H. Au, “Dynamic information disclosure,” The RAND Journal of Economics, vol. 46, no. 4, pp. 791–823, 2015.

[27] F. Farhadi, D. Teneketzis, and S. J. Golestani, “Static and dynamic informational incentive mechanisms for security enhancement,” Jun. 2018.

[28] H. Tavafoghi and D. Teneketzis, “Informational incentives in congestion games,” in Proc. of the Allerton, 2017.

[29] E. Meigs, F. Parise, A. Ozdaglar, and D. Acemoglu, “Optimal dynamic information provision in traffic routing,” arXiv, 2020.

[30] J. C. Ely, “Beeps,” American Economic Review, vol. 107, no. 1, pp. 31–53, Jan. 2017.

[31] F. Farhadi and D. Teneketzis, “Dynamic information design: A simple problem on optimal sequential information disclosure,” SSRN Electronic Journal, Mar. 2020. [Online]. Available: http://arxiv.org/abs/2003.07965

[32] A. Nayyar, A. Mahajan, and D. Teneketzis, “Decentralized stochastic control with partial history sharing: A common information approach,” IEEE Transactions on Automatic Control, vol. 58, no. 7, pp. 1644–1658, 2013.
[33] D. Fudenberg and J. Tirole, Game Theory. Cambridge, MA: MIT Press, 1991.

[34] ——, “Perfect Bayesian equilibrium and sequential equilibrium,” Journal of Economic Theory, vol. 53, no. 2, pp. 236–260, 1991.