arXiv [econ.TH], May 2020

TECHNICAL NOTE
Dynamic information design
Deepanshu Vasal
Abstract
We consider the problem of dynamic information design with one sender and one receiver, where the sender observes a private state of the system and sends a signal to the receiver based on its observation. Based on this signal, the receiver takes an action that determines rewards for both the sender and the receiver and controls the state of the system. In this technical note, we show that this problem can be posed as a dynamic game of asymmetric information, and that its perfect Bayesian equilibria (PBE) and Stackelberg equilibria (SE) can be analyzed using the algorithms presented in [1], [2] by the same author (among others). We then extend this model to one sender and multiple receivers, and provide algorithms to compute a class of equilibria of this game.
I. INTRODUCTION
Game theory is a powerful tool to analyze behavior among strategic agents. An engineering side of game theory is mechanism design, which aims to design systems such that when strategic agents who optimize their individual objectives play on them, they achieve the objective envisioned by the designer. A classic and one of the most widely used practical examples of mechanism design is auctions [3], where an auctioneer asks for bids by the bidders on a private good. The auction is designed in such a way that when the strategic bidders bid to maximize their own valuations, the returns of the auctioneer are maximized. There is a huge and growing literature on the theory of mechanism design as well as its real-world applications [4]. Information design is a relatively new field related to mechanism design, introduced by Kamenica and Gentzkow in [5], where a sender (designer) observes a state of the world not observed by the receiver. The sender sends a signal to the receiver about this state, based on which the receiver takes an action, which determines individual rewards for both the sender and the receiver. The sender has to choose a signal that maximizes its reward. The receiver interprets the state of the world from the sender's signal, knowing that the sender would have chosen a signal that maximizes the sender's reward, and thus takes an action that maximizes its own reward. There are two notions of equilibrium that can be considered in this setting: (a) Nash equilibrium and (b) Stackelberg equilibrium.

Nash equilibrium is defined as a set of strategies of the players such that no player wants to unilaterally deviate; it can thus be defined as a fixed point of the players' best responses to each other's strategies. In a Stackelberg equilibrium there is a leader and a follower (the sender and the receiver in this case, respectively). The leader commits to a policy that is known and observed by the receiver. The Stackelberg equilibrium is then defined as a set of strategies of the players such that the receiver plays a best response to the leader's committed strategy, and the leader, knowing that the receiver will play a best response, plays a strategy that maximizes its own reward.

Since [5], there has been a growing number of works on information design, including dynamic information design, where the state of the world evolves in a dynamic fashion and both the sender and the receiver play a sequential game [6]–[29]. The authors in [15]–[19] considered a dynamic version of the model considered in [5], where the state evolves as a Markov process and the sender is forward-looking but the receiver is myopic. Recently, Farhadi and Teneketzis in [31] considered a dynamic model with an evolving Markovian state and presented (Stackelberg) equilibrium strategies of the sender and the receiver, where both players are fully rational. We refer the readers to [31] for an excellent introduction to information design problems.

In this note, we consider a general discrete-time, finite-horizon model where the state of the system evolves as a controlled Markov process that is privately observed by the sender.
At each time $t$, the sender sends a signal to the receiver, based on which both the sender and the receiver get individual instantaneous rewards. The objective of the sender is to maximize its total expected reward over the time horizon $T$, and the objective of the receiver is to maximize its own. Thus the problem can be posed as a dynamic game of asymmetric information where the players play alternately. We assume both the sender and the receiver are fully rational and forward-looking. We consider two equilibrium concepts: (1) perfect Bayesian equilibrium (PBE), which can be thought of as an extension of Nash equilibrium to dynamic games of incomplete information, and (2) perfect Stackelberg equilibrium (PSE), where the sender has commitment power and commits to a policy. In this technical note, we show that the game fits within the framework of the models used by the authors in [1], [2], which provides a tool to analyze Markovian perfect Bayesian equilibria (PBE) and Markovian perfect Stackelberg equilibria (PSE) of this game. We further extend this model such that, instead of one, there are multiple receivers taking actions. Based on [1], [2], we provide an algorithm to analyze PBEs of this game.
A. Notation
We use uppercase letters for random variables and lowercase for their realizations. For any variable, subscripts represent time indices and superscripts represent player indices. We use the notation $-i$ to represent all players other than player $i$, i.e., $-i = \{1, \ldots, i-1, i+1, \ldots, N\}$. We use the notation $A_{t:t'}$ to represent the vector $(A_t, A_{t+1}, \ldots, A_{t'})$ when $t' \ge t$, and an empty vector if $t' < t$. We use $A_t^{-i}$ to mean $(A_t^1, \ldots, A_t^{i-1}, A_t^{i+1}, \ldots, A_t^N)$. We remove superscripts or subscripts to represent the whole vector; for example, $A_t$ represents $(A_t^1, \ldots, A_t^N)$. In a similar vein, for any collection of sets $(\mathcal{X}^i)_{i \in \mathcal{N}}$, we denote $\times_{i \in \mathcal{N}} \mathcal{X}^i$ by $\mathcal{X}$. We denote the indicator function of a set $A$ by $I_A(\cdot)$. For any finite set $\mathcal{S}$, $\Delta(\mathcal{S})$ represents the space of probability measures on $\mathcal{S}$ and $|\mathcal{S}|$ represents its cardinality. We denote by $P^g$ (or $E^g$) the probability measure generated by (or expectation with respect to) the strategy profile $g$. We denote the set of real numbers by $\mathbb{R}$. For a probabilistic strategy profile $(\sigma_t^i)_{i \in \mathcal{N}}$, where the probability of action $a_t^i$ conditioned on $(a_{1:t-1}, x_{1:t}^i)$ is given by $\sigma_t^i(a_t^i \mid a_{1:t-1}, x_{1:t}^i)$, we use the notation $\sigma_t^{-i}(a_t^{-i} \mid a_{1:t-1}, x_{1:t}^{-i})$ to represent $\prod_{j \neq i} \sigma_t^j(a_t^j \mid a_{1:t-1}, x_{1:t}^j)$. All equalities and inequalities involving random variables are to be interpreted in the a.s. sense. For mappings with function-set range, $f : A \to (B \to C)$, we use square brackets, $f[a] \in B \to C$, to denote the image of $a \in A$ through $f$, and parentheses, $f[a](b) \in C$, to denote the image of $b \in B$ through $f[a]$. A controlled Markov process with state $X_t$, action $A_t$, and horizon $\mathcal{T}$ is denoted by $(X_t, A_t)_{t \in \mathcal{T}}$.

The paper is organized as follows. In Section II, we present the model. In Section III, we present the solution concepts of perfect Bayesian and perfect Stackelberg equilibrium.
In Section IV, we present a two-step backward-forward recursive algorithm to construct a strategy profile and a sequence of beliefs of the dynamic game considered. In Section V, we extend this methodology to multiple receivers.

II. MODEL
Suppose there are two players, a sender and a receiver. The sender privately observes a controlled Markov process $\{X_t\}_t$ such that

$$P(x_t \mid x_{1:t-1}, a_{1:t-1}) = Q_x(x_t \mid x_{t-1}, a_{t-1}), \qquad (1a)$$

where $a_t \in \mathcal{A}$ is the action taken by the receiver at time $t$. The sender takes an action $s_t \in \mathcal{S}$ at time $t$ upon observing $(s_{1:t-1}, a_{1:t-1})$, which is common information among the players, and $x_{1:t}$, which is the sender's private information. The sets $\mathcal{A}, \mathcal{X}, \mathcal{S}$ are assumed to be finite, and we also assume that the kernel $Q_x$ has full support. Players play alternately, such that the sender plays at odd times and the receiver plays at even times. At the end of interval $t$, player $i$ receives an instantaneous reward $R^i(x_t, a_t)$. All reward functions, priors, and update kernels are assumed to be common knowledge. We also assume that the receiver observes the rewards $\{R_t^r\}_t$ it receives during the course of the game. These can be understood as additional observations of the state by the receiver. We note that these rewards are a function of the current state and the action of the receiver, both of which the sender perfectly observes. Let $g^i = (g_t^i)_t$ be a probabilistic strategy of player $i \in \{S, R\}$, where $g_t^s : (\mathcal{S} \times \mathcal{A} \times \mathcal{R})^{t-1} \times \mathcal{X}^t \to \Delta(\mathcal{S})$ and $g_t^r : (\mathcal{S} \times \mathcal{A} \times \mathcal{R})^{t-1} \times \mathcal{S} \to \Delta(\mathcal{A})$, such that the players play their actions according to $A_t^s \sim g_t^s(\cdot \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_{1:t})$ and $A_t^r \sim g_t^r(\cdot \mid a_{1:t-1}, r_{1:t-1}^r, s_{1:t})$. Let $g := (g^i)_{i \in \{S,R\}}$ be a strategy profile of both players. The objective of player $i$ is to maximize its total expected reward

$$J^{i,g} := E^g\left[\sum_{t=1}^T R^i(X_t, A_t)\right]. \qquad (2)$$

A. Common agent approach
Any history of this game at which the players take an action is of the form $h_t = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r, x_{1:t})$. Let $\mathcal{H}_t$ be the set of such histories, $\mathcal{H}^T \triangleq \cup_{t=0}^{T} \mathcal{H}_t$ be the set of all possible such histories for the finite horizon, and $\mathcal{H}^\infty \triangleq \cup_{t=0}^{\infty} \mathcal{H}_t$ for the infinite horizon. At any time $t$ the sender observes $h_t^s = h_t = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r, x_{1:t})$, and the players together have $h_t^c = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ as the common history. Since the receiver does not observe any private information, $h_t^r = h_t^c = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. Let $\mathcal{H}_t^i$ be the set of observed histories of player $i$ at time $t$, and $\mathcal{H}_t^c$ the set of common histories at time $t$.

We recall that the sender and the receiver generate their actions at time $t$ as $A_t^s \sim g_t^s(\cdot \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_{1:t})$ and $A_t^r \sim g_t^r(\cdot \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. An alternative way to view the problem is as follows. As is done in the common information approach [32], at an odd time $t$, a fictitious common agent observes the common information $(a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r)$ and generates a prescription function $\gamma_t^s = \psi_t^s[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]$. The sender uses its prescription function $\gamma_t^s$ to operate on its private information $x_{1:t}$ to produce its action $s_t$, where $\gamma_t^s : \mathcal{X}^t \to \Delta(\mathcal{S})$ and $s_t \sim \gamma_t^s(\cdot \mid x_{1:t})$. At an even time $t$, the fictitious common agent observes the common information $(a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ and generates a prescription function $\gamma_t^r = \psi_t^r[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]$. The receiver uses its prescription function $\gamma_t^r$ to produce its action $a_t$, where $\gamma_t^r \in \Delta(\mathcal{A})$ and $a_t \sim \gamma_t^r(\cdot)$. It is easy to see that for any strategy profile $g$ of the players, there exists an equivalent profile $\psi$ of the common agent (and vice versa) that generates the same control actions for every realization of the information of the players. Here, we will consider Markovian common agent policies, as follows.
We call a common agent policy of "type $\theta$" if the common agent observes the common beliefs $\mu_t$ at odd times and $\nu_t$ at even times, derived from the common observations $(a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r)$ and $(a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ respectively, and generates prescription functions $\gamma_t^s = \theta_t^s[\mu_t]$ and $\gamma_t^r = \theta_t^r[\nu_t]$, where $\nu_t(x_t) = P^g(X_t = x_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ and $\mu_{t+1}(x_{t+1}) = P^g(X_{t+1} = x_{t+1} \mid a_{1:t}, s_{1:t}, r_{1:t}^r)$. Moreover, the sender's action depends only on its current private information $x_t$, i.e., $S_t \sim \gamma_t^s(\cdot \mid x_t)$. In the next lemma we show that, for any given policy of type $\theta$, the belief states $\mu_t, \nu_t$ can be updated recursively. Let $\mu_1(x_1) := Q(x_1)$.

Lemma 1:
For any given policy of type $\theta$, there exist update functions $F, G$, independent of $\theta$, such that

$$\nu_t = F(\mu_t, \gamma_t^s, s_t) \qquad (3)$$
$$\mu_{t+1} = G(\nu_t, a_t, r_t^r). \qquad (4)$$

Proof:
Please see Appendix A.

III. SOLUTION CONCEPTS
A. Solution Concept: Perfect Bayesian Equilibrium
An appropriate concept of equilibrium for such games is PBE [33], which consists of a pair $(\beta^*, \mu^*)$ of a strategy profile $\beta^* = (\beta_t^{*,i})_{t \in \mathcal{T}, i \in \mathcal{N}}$, where $\beta_t^{*,i} : \mathcal{H}_t^i \to \Delta(\mathcal{A}^i)$, and a belief profile $\mu^* = ({}^i\mu_t^*)_{t \in \mathcal{T}, i \in \mathcal{N}}$, where ${}^i\mu_t^* : \mathcal{H}_t^i \to \Delta(\mathcal{H}_t)$, that satisfy sequential rationality: for every player $i$, $t \in \mathcal{T}$, $h_t^i \in \mathcal{H}_t^i$, and strategy $\beta^i$,

$$W_t^{i,\beta^*,T}(h_t^i) \ge W_t^{i,\beta^i \beta^{*,-i},T}(h_t^i), \qquad (5)$$

where the reward-to-go is defined as

$$W_t^{i,\beta^i,T}(h_t^i) \triangleq E^{\beta^i \beta^{*,-i},\, {}^i\mu_t^*[h_t^i]}\left\{ \sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, h_t^i \right\}, \qquad (6)$$

and the beliefs are updated using Bayes' rule whenever possible. In general, a belief for player $i$ at time $t$, ${}^i\mu_t^*$, is defined on the history $h_t = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r, x_{1:t})$ given its private history $h_t^i$.
At any time $t$, the relevant uncertainty the receiver has is about the state history $x_{1:t} \in \times_{n=1}^t \mathcal{X}$ and the sender's future actions. In our setting, we consider beliefs that are functions of each player's history $h_t^i$ only through the common history $h_t^c$, and that are beliefs on the current state only. Here the receiver's belief for each history $h_t^c = (a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$ is derived from a common belief $\mu_t^*[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]$. In order to anticipate the receiver's actions through its strategy, the sender keeps track of this belief as well (and it can, since the belief is derived from common information). Thus it is sufficient to use the system of beliefs $\mu^* = (\mu_t^*)_{t \in \mathcal{T}}$, where $\mu_t^* : \mathcal{H}_t^c \to \Delta(\mathcal{X})$.

B. Solution Concept: Stackelberg Equilibrium
An appropriate notion of equilibrium here is the Stackelberg equilibrium, defined as follows. For a given strategy profile $\sigma^s$ of the sender, the receiver maximizes its total expected discounted utility over the finite horizon $T$,

$$\max_{\sigma^r} E^{\sigma^s, \sigma^r}\left\{ \sum_{t=1}^{T} \delta^{t-1} R^r(X_t, A_t) \right\}. \qquad (7)$$

Let $BR^r(\sigma^s)$ be the set of optimizing strategies of the receiver given a strategy $\sigma^s$ of the sender, i.e.,

$$BR^r(\sigma^s) = \arg\max_{\sigma^r} E^{\sigma^s, \sigma^r}\left\{ \sum_{t=1}^{T} \delta^{t-1} R^r(X_t, A_t) \right\}. \qquad (8)$$

The sender finds its optimal strategy that maximizes its total expected discounted reward given that the receiver will use its best response to it,

$$\tilde{\sigma}^s \in \arg\max_{\sigma^s} E^{\sigma^s, BR^r(\sigma^s)}\left\{ \sum_{t=1}^{T} \delta^{t-1} R^s(X_t, A_t) \right\}. \qquad (9)$$

Then $(\tilde{\sigma}^s, \tilde{\sigma}^r)$ constitutes a Stackelberg equilibrium, where $\tilde{\sigma}^r \in BR^r(\tilde{\sigma}^s)$.

C. Common Perfect Stackelberg Equilibrium
In this paper, we consider sender equilibrium policies that depend on the common history and only the current state $x_t$, i.e., at equilibrium, $a_t^s \sim \tilde{\sigma}_t^s(\cdot \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t)$ and $a_t^r \sim \tilde{\sigma}_t^r(\cdot \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. For the game considered, we introduce a notion of common perfect Stackelberg equilibrium (cPSE), inspired by perfect Bayesian equilibrium [34], as follows. Note, however, that for the purpose of equilibrium, the optimization is performed in the space of all possible strategies, which may depend on the entire history of the state.
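Before turning to the cPSE definition, the best-response structure in (8) and (9) can be illustrated in a one-shot example. The following sketch (the payoff matrices, the prior, and the restriction to deterministic signaling rules are all illustrative assumptions, not part of the model) enumerates the sender's commitments, lets the receiver best respond to the induced posterior, and keeps the commitment that maximizes the sender's expected reward.

```python
import itertools
import numpy as np

# One-shot Stackelberg persuasion sketch: the sender commits to a
# deterministic signaling rule (a map state -> signal), the receiver
# Bayes-updates and best-responds, cf. (8)-(9). All numbers are toy values.
prior = np.array([0.5, 0.5])                 # prior over two states
R_s = np.array([[0.0, 1.0], [0.0, 1.0]])     # sender reward R^s[x, a]
R_r = np.array([[1.0, 0.0], [0.0, 1.0]])     # receiver reward R^r[x, a]
states, signals, actions = range(2), range(2), range(2)

best_val, best_rule = float("-inf"), None
for rule in itertools.product(signals, repeat=2):      # rule[x] = signal sent in state x
    val = 0.0
    for s in signals:
        # unnormalized posterior over states given signal s under this rule
        unnorm = np.array([prior[x] * (rule[x] == s) for x in states])
        if unnorm.sum() == 0:
            continue                                   # signal s is never sent
        post = unnorm / unnorm.sum()
        a = max(actions, key=lambda a: post @ R_r[:, a])   # receiver BR, cf. (8)
        val += unnorm.sum() * (post @ R_s[:, a])           # sender's expected payoff
    if val > best_val:
        best_val, best_rule = val, rule                    # sender's commitment, cf. (9)
```

With these toy rewards the sender always prefers action 1 while the receiver wants to match the state; a fully informative rule earns the sender 0.5 in expectation, which is what the enumeration recovers.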
Let $(\tilde{\sigma}, \mu, \nu)$ be a cPSE of the game, where $\mu = (\mu_t)_{t \in [T]}$, $\nu = (\nu_t)_{t \in [T]}$, and for any $t$ and action history, $\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r] \in \Delta(\mathcal{X})$ and $\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] \in \Delta(\mathcal{X})$ are the equilibrium beliefs on the current state $x_t$, given the histories $(a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r)$ and $(a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$, respectively, i.e., $\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r](x_t) = P^{\tilde{\sigma}}(x_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r)$ and $\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r](x_t) = P^{\tilde{\sigma}}(x_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. For any given $\sigma^s$, with some abuse of notation, let $BR^r(\sigma^s)$, the best response of the receiver to any strategy $\sigma^s$ of the sender, be defined as

$$BR^r(\sigma^s) := \bigcap_t \bigcap_{h_t^c} \arg\max_{\sigma^r} E^{\sigma^s, \sigma^r}\left\{ \sum_{n=t}^{T} \delta^{n-t} R^r(X_n, A_n) \,\Big|\, h_t^c \right\}, \qquad (10)$$

and let the set of optimal strategies of the sender be defined as

$$\tilde{\sigma}^s \in \bigcap_t \bigcap_{h_t^c} \arg\max_{\sigma^s} E^{\sigma^s, BR^r(\sigma^s)}\left\{ \sum_{n=t}^{T} \delta^{n-t} R^s(X_n, A_n) \,\Big|\, h_t^c \right\}. \qquad (11)$$

Then $(\tilde{\sigma}^s, \tilde{\sigma}^r)$ constitutes a cPSE of the game, where $\tilde{\sigma}^r \in BR^r(\tilde{\sigma}^s)$.

Definition 1:
We call a strategy profile $\sigma$ a Markov cPSE if it is a cPSE of type $\theta$. In the next section, we design an algorithm to compute all Markovian cPSE of the game.

IV. SINGLE RECEIVER
A. PBE methodology
In the following, we adapt the methodology presented in [1] to compute PBE of this game.
B. Backward Recursion
In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i \in \{s,r\}, t \in [T]}$ and a sequence of functions $(V_t^s, V_t^r)_{t \in \{1, \ldots, T+1\}}$ in a backward recursive way, as follows.

1. Initialize $\forall\, \mu_{T+1} \in \mathcal{P}(\mathcal{X}),\ x_{T+1} \in \mathcal{X}$:

$$V_{T+1}^r(\mu_{T+1}) \triangleq 0 \qquad (12)$$
$$V_{T+1}^{s+}(\mu_{T+1}, x_{T+1}) \triangleq 0. \qquad (13)$$

(Note that we condition on the common information and not on the actual observed histories of the players.)
2. For $t = T, T-1, \ldots, 1$:

$\forall\, \nu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^r = \theta_t^r[\nu_t]$ be generated as the solution of the following optimization problem,

$$\tilde{\gamma}_t^r \in \arg\max_{\gamma_t^r} E^{\gamma_t^r, \nu_t}\left\{ R^r(X_t, A_t) + \delta V_{t+1}^r(G(\nu_t, A_t, R_t^r)) \,\Big|\, \nu_t \right\}, \qquad (14)$$

where the expectation in (14) is defined with respect to the random variables $(X_t, A_t, R_t^r)$ through the measure $\nu_t(x_t)\gamma_t^r(a_t) I(R_t^r = R^r(x_t, a_t))$. Let

$$V_{t+1}^s(x_t, \nu_t, x_{t+1}) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ V_{t+1}^{s+}(G(\nu_t, A_t, R_t^r), x_{t+1}) \mid x_t \right\} \qquad (15)$$
$$V_t^{r+}(\nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ R^r(X_t, A_t) + \delta V_{t+1}^r(G(\nu_t, A_t, R_t^r)) \right\}. \qquad (16)$$

$\forall\, \mu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^s = \theta_t^s[\mu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^s(\cdot \mid x_t) \in \arg\max_{\gamma_t^s(\cdot \mid x_t)} E^{\gamma_t^s(\cdot \mid x_t), \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(x_t, A_t) + \delta V_{t+1}^s(x_t, F(\mu_t, \tilde{\gamma}_t^s, S_t), X_{t+1}) \,\Big|\, \mu_t, x_t \right\}. \qquad (17)$$

Let

$$V_t^{s+}(\mu_t, x_t) \triangleq E^{\tilde{\gamma}_t^s(\cdot \mid x_t), \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(x_t, A_t) + \delta V_{t+1}^s(x_t, F(\mu_t, \tilde{\gamma}_t^s, S_t), X_{t+1}) \,\Big|\, x_t \right\} \qquad (18)$$
$$V_t^r(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \mu_t}\left\{ V_t^{r+}(F(\mu_t, \tilde{\gamma}_t^s, S_t)) \,\Big|\, \mu_t \right\}, \qquad (19)$$

where the expectation in (17) is with respect to the random variables $(S_t, A_t, X_{t+1})$ through the measure $\gamma_t^s(s_t \mid x_t)\theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, s_t)](a_t) Q(x_{t+1} \mid x_t, a_t)$, and $F$ is defined in Appendix A.

C. Forward Recursion
Based on $\theta$ defined in the backward recursion above, we now construct a set of strategies $\tilde{\sigma}$ (through beliefs $\mu, \nu$) in a forward recursive way, as follows.

1. Initialize at time $t = 1$:

$$\mu_1[\phi](x_1) := Q(x_1). \qquad (20)$$

2. For $t = 1, \ldots, T$, for every realization $(a_{1:t}, s_{1:t}, r_{1:t}^r)$ and $x_t \in \mathcal{X}$:

$$\tilde{\sigma}_t^r(a_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r) := \theta_t^r[\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]](a_t) \qquad (21)$$
$$\tilde{\sigma}_t^s(s_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t) := \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]](s_t \mid x_t) \qquad (22)$$
$$\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] = F(\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r], \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]], s_t) \qquad (23a)$$
$$\mu_{t+1}[a_{1:t}, s_{1:t}, r_{1:t}^r] = G(\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r], a_t, r_t^r), \qquad (23b)$$

where $F, G$ are defined in Appendix A.
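The maps $F$ and $G$ referenced above admit a direct implementation for a finite state space. The sketch below (the NumPy array conventions for $\mu_t$, $\gamma_t^s$, $R^r$, and $Q$ are assumptions made for illustration) mirrors the derivation in Appendix A: $F$ is a Bayes update on the sender's signal, and $G$ keeps the states consistent with the receiver's observed reward and propagates through the kernel.

```python
import numpy as np

def F(mu, gamma_s, s):
    # nu_t = F(mu_t, gamma_s_t, s_t):  nu_t(x) is proportional to
    # mu_t(x) * gamma_s_t(s | x), cf. Eq. (3)
    unnorm = mu * gamma_s[:, s]          # gamma_s[x, s] = Pr(signal s | state x)
    return unnorm / unnorm.sum()

def G(nu, a, r, R_r, Q):
    # mu_{t+1} = G(nu_t, a_t, r_t^r): condition on the observed reward,
    # then propagate through the controlled kernel Q[x, a, x'], cf. Eq. (4)
    w = nu * (R_r[:, a] == r)            # indicator I(r = R^r(x, a))
    unnorm = w @ Q[:, a, :]
    return unnorm / unnorm.sum()

# example with two states: signal 0 is more likely in state 0
mu = np.array([0.5, 0.5])
gamma_s = np.array([[0.8, 0.2], [0.4, 0.6]])
nu = F(mu, gamma_s, 0)                   # -> [2/3, 1/3]
```

Note that, consistent with the full-support assumption on $Q_x$, the normalizations above are well defined whenever the conditioning events have positive probability.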
Theorem 1:
A strategy and belief profile $(\tilde{\sigma}, \mu, \nu)$, as constructed through the backward-forward recursion algorithm above, is a PBE of the game.

Proof:
Our model fits within the framework considered in [1] as follows. [1] considers a model with $N$ strategic players, each with a private type $x_t^i$, such that the players' types are conditionally independent across players given the history of actions. Although it assumes that all $N$ players act simultaneously in all periods of the game, it can accommodate cases where at each time $t$ the acting player is chosen through an exogenously defined Markov process. This is done by introducing a "nature" player $0$, who perfectly observes its state process $(X_t^0)_t$, $X_t^0 \in \{S, R\}$, where the state process evolves deterministically such that $x_t^0 = S$ at odd times and $x_t^0 = R$ at even times. Player $0$ has reward function zero and plays actions $a_t^0 = x_t^0$. Once the quantity $a_{t-1}^0$ is publicly observed, all players can determine the acting player at time $t$: $a_{t-1}^0 = x_{t-1}^0 = S$ indicates that the sender plays in the game, and $a_{t-1}^0 = x_{t-1}^0 = R$ indicates that the receiver plays. This is achieved by setting, $\forall i$, $R_t^i(x_t, a_t) = 0$ if $i \neq a_t^0$, and $Q_x(x_{t+1}^i \mid x_t^i, a_t) = Q_x(x_{t+1}^i \mid x_t^i, a_t^{a_t^0})$. Thus, in each period only one player (player $a_t^0 = x_t^0$) acts in the game, while all other, non-acting players receive zero rewards during that period. The above methodology can then be seen as an adaptation of the methodology considered in [1] to compute PBE of the game, and the result of the theorem is implied by [1, Theorem 1].

D. Stackelberg methodology: Single Receiver
In the following, we adapt the methodology presented in [2] to compute cPSE of this game.
E. Backward Recursion
In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i \in \{s,r\}, t \in [T]}$, where $\theta_t^s : \mathcal{P}(\mathcal{X}) \to \{\mathcal{X} \to \mathcal{P}(\mathcal{S})\}$ and $\theta_t^r : \mathcal{P}(\mathcal{X}) \to \mathcal{P}(\mathcal{A})$, and a sequence of functions $(V_t^s, V_t^r)_{t \in \{1, \ldots, T+1\}}$, where $V_t^s : \mathcal{P}(\mathcal{X}) \times \mathcal{X} \to \mathbb{R}$ and $V_t^r : \mathcal{P}(\mathcal{X}) \to \mathbb{R}$, in a backward recursive way, as follows.
1. Initialize $\forall\, \mu_{T+1} \in \mathcal{P}(\mathcal{X})$:

$$V_{T+1}^r(\mu_{T+1}) \triangleq 0 \qquad (24)$$
$$V_{T+1}^{s+}(\mu_{T+1}) \triangleq 0. \qquad (25)$$

2. For $t = T, T-1, \ldots, 1$:

$\forall\, \nu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^r = \theta_t^r[\nu_t]$ be generated as the solution of the following optimization problem,

$$\tilde{\gamma}_t^r \in \arg\max_{\gamma_t^r} E^{\gamma_t^r, \nu_t}\left\{ R^r(X_t, A_t) + \delta V_{t+1}^r(G(\nu_t, A_t, R_t^r)) \,\Big|\, \nu_t \right\}, \qquad (26)$$

where the expectation in (26) is defined with respect to the random variables $(X_t, A_t, R_t^r)$ through the measure $\nu_t(x_t)\gamma_t^r(a_t) I(R_t^r = R^r(x_t, a_t))$. Let

$$V_{t+1}^s(x_t, \nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ V_{t+1}^{s+}(G(\nu_t, A_t, R_t^r)) \mid x_t \right\} \qquad (27)$$
$$V_t^{r+}(\nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ R^r(X_t, A_t) + \delta V_{t+1}^r(G(\nu_t, A_t, R_t^r)) \right\}. \qquad (28)$$

$\forall\, \mu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^s = \theta_t^s[\mu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^s \in \arg\max_{\gamma_t^s} E^{\gamma_t^s, \theta_t^r[F(\mu_t, \gamma_t^s, S_t)], \mu_t}\left\{ R^s(X_t, A_t) + \delta V_{t+1}^s(X_t, F(\mu_t, \gamma_t^s, S_t)) \right\}. \qquad (29)$$

Let

$$V_t^{s+}(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(X_t, A_t) + \delta V_{t+1}^s(X_t, F(\mu_t, \tilde{\gamma}_t^s, S_t)) \right\} \qquad (30)$$
$$V_t^r(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \mu_t}\left\{ V_t^{r+}(F(\mu_t, \tilde{\gamma}_t^s, S_t)) \right\}, \qquad (31)$$

where the expectation in (29) is with respect to the random variables $(X_t, S_t, A_t)$ through the measure $\mu_t(x_t)\gamma_t^s(s_t \mid x_t)\theta_t^r[F(\mu_t, \gamma_t^s, s_t)](a_t)$, and $F$ is defined in Appendix A.

F. Forward Recursion
Based on $\theta$ defined in the backward recursion above, we now construct a set of strategies $\tilde{\sigma}$ (through beliefs $\mu, \nu$) in a forward recursive way, as follows.

1. Initialize at time $t = 1$:

$$\mu_1[\phi](x_1) := Q(x_1). \qquad (32)$$

2. For $t = 1, \ldots, T$, for every realization $(a_{1:t}, s_{1:t}, r_{1:t}^r)$ and $x_t \in \mathcal{X}$:

$$\tilde{\sigma}_t^r(a_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r) := \theta_t^r[\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]](a_t) \qquad (33)$$
$$\tilde{\sigma}_t^s(s_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t) := \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]](s_t \mid x_t) \qquad (34)$$
$$\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] = F(\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r], \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]], s_t) \qquad (35a)$$
$$\mu_{t+1}[a_{1:t}, s_{1:t}, r_{1:t}^r] = G(\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r], a_t, r_t^r), \qquad (35b)$$

where $F, G$ are defined in Appendix A.
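A single receiver step of this backward recursion, (26) together with (28), can be sketched numerically by discretizing the belief space (a heuristic: the recursion is defined over the whole continuum $\mathcal{P}(\mathcal{X})$, and continuation values are looked up at the nearest grid belief). The rewards, kernel, discount factor, and grid below are illustrative assumptions.

```python
import numpy as np

delta = 0.9
R_r = np.array([[1.0, 0.0], [0.0, 1.0]])       # receiver reward R^r[x, a]
Q = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.3, 0.7]]])       # kernel Q[x, a, x']
grid = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 21)]
V_next = [0.0] * len(grid)                     # V^r_{t+1} on the grid (terminal step)

def nearest(nu):
    # continuation value is looked up at the nearest grid belief
    return min(range(len(grid)), key=lambda i: abs(grid[i][0] - nu[0]))

V_t, gamma_r = [], []                          # tables for V^r_t and theta^r_t
for nu in grid:
    vals = []
    for a in range(2):
        v = 0.0
        for r in set(R_r[:, a]):               # rewards the receiver may observe
            w = nu * (R_r[:, a] == r)          # mass on states consistent with r
            if w.sum() == 0.0:
                continue
            mu_next = w @ Q[:, a, :]
            mu_next = mu_next / mu_next.sum()  # G(nu, a, r) of Lemma 1
            v += w.sum() * (r + delta * V_next[nearest(mu_next)])
        vals.append(v)
    gamma_r.append(int(np.argmax(vals)))       # ~gamma^r_t = theta^r_t[nu], cf. (26)
    V_t.append(max(vals))                      # V^r_t(nu), cf. (28)
```

With zero continuation values this reduces to the myopic rule $\max_a E_{\nu_t}[R^r(X_t, a)]$; iterating the same step for $t = T, \ldots, 1$, with `V_next` replaced by the previously computed table, tabulates the full recursion on the grid.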
Theorem 2:
A strategy and belief profile $(\tilde{\sigma}, \mu, \nu)$, as constructed through the backward-forward recursion algorithm above, is a cPSE of the game.

Proof:
The above methodology can be seen as an adaptation of the methodology considered in [2] to compute cPSE of the game, and the result of the theorem is implied by [2, Theorem 1]. In this case, since only the sender has private information and can thus influence the beliefs, it solves the fixed-point equation in (29). The receiver, however, does not have any private information and solves the optimization problem in (26).

V. MULTIPLE PLAYERS
In this section, we consider the case when there are multiple receivers. As before, each receiver $i$ takes an action $A_t^{r,i}$ at time $t$ as $A_t^{r,i} \sim g_t^{r,i}(\cdot \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. The above methodologies can be extended as follows.

A. PBE methodology: Multiple Receivers

B. Backward Recursion
In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i \in \{s, r_1, \ldots, r_N\}, t \in [T]}$ and a sequence of functions $(V_t^s, V_t^{r,1}, \ldots, V_t^{r,N})_{t \in \{1, \ldots, T+1\}}$ in a backward recursive way, as follows.

1. Initialize $\forall\, \mu_{T+1} \in \mathcal{P}(\mathcal{X}),\ x_{T+1} \in \mathcal{X}$:

$$V_{T+1}^{r,i}(\mu_{T+1}) \triangleq 0 \qquad (36)$$
$$V_{T+1}^{s+}(\mu_{T+1}, x_{T+1}) \triangleq 0. \qquad (37)$$

2. For $t = T, T-1, \ldots, 1$:

$\forall\, \nu_t \in \mathcal{P}(\mathcal{X})$ and for all $i$, let $\tilde{\gamma}_t^{r,i} = \theta_t^{r,i}[\nu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^{r,i} \in \arg\max_{\gamma_t^{r,i}} E^{\gamma_t^{r,i}, \tilde{\gamma}_t^{r,-i}, \nu_t}\left\{ R^{r,i}(X_t, A_t) + \delta V_{t+1}^{r,i}(G(\nu_t, A_t, R_t^r)) \,\Big|\, \nu_t \right\}, \qquad (38)$$

where the expectation in (38) is defined with respect to the random variables $(X_t, A_t)$ through the measure $\nu_t(x_t)\gamma_t^{r,i}(a_t^i)\tilde{\gamma}_t^{r,-i}(a_t^{-i})$. Let

$$V_{t+1}^s(x_t, \nu_t, x_{t+1}) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ V_{t+1}^{s+}(G(\nu_t, A_t, R_t^r), x_{t+1}) \mid x_t \right\} \qquad (39)$$
$$V_t^{r+,i}(\nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ R^{r,i}(X_t, A_t) + \delta V_{t+1}^{r,i}(G(\nu_t, A_t, R_t^r)) \right\}. \qquad (40)$$

$\forall\, \mu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^s = \theta_t^s[\mu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^s(\cdot \mid x_t) \in \arg\max_{\gamma_t^s(\cdot \mid x_t)} E^{\gamma_t^s(\cdot \mid x_t), \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(x_t, A_t) + \delta V_{t+1}^s(x_t, F(\mu_t, \tilde{\gamma}_t^s, S_t), X_{t+1}) \,\Big|\, \mu_t, x_t \right\}. \qquad (41)$$

Let

$$V_t^{s+}(\mu_t, x_t) \triangleq E^{\tilde{\gamma}_t^s(\cdot \mid x_t), \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(x_t, A_t) + \delta V_{t+1}^s(x_t, F(\mu_t, \tilde{\gamma}_t^s, S_t), X_{t+1}) \,\Big|\, x_t \right\} \qquad (42)$$
$$V_t^{r,i}(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \mu_t}\left\{ V_t^{r+,i}(F(\mu_t, \tilde{\gamma}_t^s, S_t)) \,\Big|\, \mu_t \right\}, \qquad (43)$$

where the expectation in (41) is with respect to the random variables $(S_t, A_t, X_{t+1})$ through the measure $\gamma_t^s(s_t \mid x_t)\theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, s_t)](a_t) Q(x_{t+1} \mid x_t, a_t)$, and $F$ is defined in Appendix A.

C. Forward Recursion
Based on $\theta$ defined in the backward recursion above, we now construct a set of strategies $\tilde{\sigma}$ (through beliefs $\mu, \nu$) in a forward recursive way, as follows.

1. Initialize at time $t = 1$:

$$\mu_1[\phi](x_1) := Q(x_1). \qquad (44)$$

2. For $t = 1, \ldots, T$, for every $i$, every realization $(a_{1:t}, s_{1:t}, r_{1:t}^r)$, and $x_t \in \mathcal{X}$:

$$\tilde{\sigma}_t^{r,i}(a_t^i \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r) := \theta_t^{r,i}[\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]](a_t^i) \qquad (45)$$
$$\tilde{\sigma}_t^s(s_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t) := \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]](s_t \mid x_t) \qquad (46)$$
$$\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] = F(\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r], \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]], s_t) \qquad (47)$$
$$\mu_{t+1}[a_{1:t}, s_{1:t}, r_{1:t}^r] = G(\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r], a_t, r_t^r), \qquad (48)$$

where $F, G$ are defined in Appendix A.
Theorem 3:
A strategy and belief profile $(\tilde{\sigma}, \mu, \nu)$, as constructed through the backward-forward recursion algorithm above, is a PBE of the game.

Proof:
By arguments similar to those in the proof of Theorem 1 above, the result is implied by [1, Theorem 1].
D. cPSE methodology: Multiple Receivers
In this section, we consider the case when there are multiple receivers. As before, each receiver $i$ takes an action $A_t^{r,i}$ at time $t$ as $A_t^{r,i} \sim g_t^{r,i}(\cdot \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r)$. The above methodology can be extended as follows.
In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i \in \{s, r_1, \ldots, r_N\}, t \in [T]}$ and a sequence of functions $(V_t^s, V_t^{r,1}, \ldots, V_t^{r,N})_{t \in \{1, \ldots, T+1\}}$ in a backward recursive way, as follows.

1. Initialize $\forall\, \mu_{T+1} \in \mathcal{P}(\mathcal{X})$:

$$V_{T+1}^{r,i}(\mu_{T+1}) \triangleq 0 \qquad (49)$$
$$V_{T+1}^{s+}(\mu_{T+1}) \triangleq 0. \qquad (50)$$

2. For $t = T, T-1, \ldots, 1$:

$\forall\, \nu_t \in \mathcal{P}(\mathcal{X})$ and for all $i$, let $\tilde{\gamma}_t^{r,i} = \theta_t^{r,i}[\nu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^{r,i} \in \arg\max_{\gamma_t^{r,i}} E^{\gamma_t^{r,i}, \tilde{\gamma}_t^{r,-i}, \nu_t}\left\{ R^{r,i}(X_t, A_t) + \delta V_{t+1}^{r,i}(G(\nu_t, A_t, R_t^r)) \,\Big|\, \nu_t \right\}, \qquad (51)$$

where the expectation in (51) is defined with respect to the random variables $(X_t, A_t)$ through the measure $\nu_t(x_t)\gamma_t^{r,i}(a_t^i)\tilde{\gamma}_t^{r,-i}(a_t^{-i})$. Note that the above equation is similar to a fixed-point equation corresponding to a Bayesian Nash equilibrium. Let

$$V_{t+1}^s(x_t, \nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ V_{t+1}^{s+}(G(\nu_t, A_t, R_t^r)) \mid x_t \right\} \qquad (52)$$
$$V_t^{r+,i}(\nu_t) \triangleq E^{\tilde{\gamma}_t^r, \nu_t}\left\{ R^{r,i}(X_t, A_t) + \delta V_{t+1}^{r,i}(G(\nu_t, A_t, R_t^r)) \right\}. \qquad (53)$$

$\forall\, \mu_t \in \mathcal{P}(\mathcal{X})$, let $\tilde{\gamma}_t^s = \theta_t^s[\mu_t]$ be generated as the solution of the following fixed-point equation,

$$\tilde{\gamma}_t^s \in \arg\max_{\gamma_t^s} E^{\gamma_t^s, \theta_t^r[F(\mu_t, \gamma_t^s, S_t)], \mu_t}\left\{ R^s(X_t, A_t) + \delta V_{t+1}^s(X_t, F(\mu_t, \gamma_t^s, S_t)) \,\Big|\, \mu_t \right\}. \qquad (54)$$
Let

$$V_t^{s+}(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \theta_t^r[F(\mu_t, \tilde{\gamma}_t^s, S_t)], \mu_t}\left\{ R^s(X_t, A_t) + \delta V_{t+1}^s(X_t, F(\mu_t, \tilde{\gamma}_t^s, S_t)) \right\} \qquad (55)$$
$$V_t^{r,i}(\mu_t) \triangleq E^{\tilde{\gamma}_t^s, \mu_t}\left\{ V_t^{r+,i}(F(\mu_t, \tilde{\gamma}_t^s, S_t)) \,\Big|\, \mu_t \right\}, \qquad (56)$$

where the expectation in (54) is with respect to the random variables $(X_t, S_t, A_t)$ through the measure $\mu_t(x_t)\gamma_t^s(s_t \mid x_t)\theta_t^r[F(\mu_t, \gamma_t^s, s_t)](a_t)$, and $F$ is defined in Appendix A.

F. Forward Recursion
Based on $\theta$ defined in the backward recursion above, we now construct a set of strategies $\tilde{\sigma}$ (through beliefs $\mu, \nu$) in a forward recursive way, as follows.

1. Initialize at time $t = 1$:

$$\mu_1[\phi](x_1) := Q(x_1). \qquad (57)$$

2. For $t = 1, \ldots, T$, for every realization $(a_{1:t}, s_{1:t}, r_{1:t}^r)$ and $x_t \in \mathcal{X}$:

$$\tilde{\sigma}_t^r(a_t \mid a_{1:t-1}, s_{1:t}, r_{1:t-1}^r) := \theta_t^r[\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r]](a_t) \qquad (58)$$
$$\tilde{\sigma}_t^s(s_t \mid a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r, x_t) := \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]](s_t \mid x_t) \qquad (59)$$
$$\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r] = F(\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r], \theta_t^s[\mu_t[a_{1:t-1}, s_{1:t-1}, r_{1:t-1}^r]], s_t) \qquad (60a)$$
$$\mu_{t+1}[a_{1:t}, s_{1:t}, r_{1:t}^r] = G(\nu_t[a_{1:t-1}, s_{1:t}, r_{1:t-1}^r], a_t, r_t^r), \qquad (60b)$$

where $F, G$ are defined in Appendix A.
Theorem 4:
A strategy and belief profile $(\tilde{\sigma}, \mu, \nu)$, as constructed through the backward-forward recursion algorithm above, is a cPSE of the game.

Proof:
The result is implied by [2, Theorem 1], using arguments similar to those in the proof of Theorem 1 above. Note that in this case there are multiple receivers maximizing their rewards at each step in (51); thus, instead of solving an optimization problem, they solve a fixed-point equation similar to that of a Bayesian Nash equilibrium.
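The stage fixed point in (51) can be approached by best-response iteration among the receivers, here with continuation values set to zero and pure actions only for brevity. The belief and payoff tensors below are illustrative assumptions, and convergence of the iteration is a heuristic: it may cycle in general, in which case a mixed-strategy fixed point has to be sought.

```python
import numpy as np

nu = np.array([0.6, 0.4])                       # common belief on x_t
# R[i][x, a0, a1]: receiver i's reward; a coordination-style toy example
R = [np.array([[[2., 0.], [0., 1.]],
               [[1., 0.], [0., 2.]]]),
     np.array([[[2., 0.], [0., 1.]],
               [[1., 0.], [0., 2.]]])]

acts = [0, 0]                                   # initial pure-action profile
for _ in range(50):
    new = []
    for i in range(2):
        def payoff(ai, i=i):
            a = list(acts); a[i] = ai           # unilateral deviation of receiver i
            return sum(nu[x] * R[i][x, a[0], a[1]] for x in range(2))
        new.append(max(range(2), key=payoff))   # best response to the others, cf. (51)
    if new == acts:                             # no receiver wants to deviate:
        break                                   # a pure fixed point of (51)
    acts = new
```

In this example, the action profile (0, 0) yields every receiver an expected reward of 1.6 under the belief `nu`, so no one deviates and the iteration stops immediately.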
APPENDIX
Proof:

$$\nu_t(x_t) = P^\theta(x_t \mid s_{1:t}, a_{1:t-1}, r_{1:t-1}^r) \qquad (61)$$
$$= \frac{P^\theta(x_t, s_t \mid s_{1:t-1}, a_{1:t-1}, r_{1:t-1}^r)}{\sum_{x_t} P^\theta(x_t, s_t \mid s_{1:t-1}, a_{1:t-1}, r_{1:t-1}^r)} \qquad (62)$$
$$= \frac{\mu_t(x_t)\gamma_t^s(s_t \mid x_t)}{\sum_{x_t} \mu_t(x_t)\gamma_t^s(s_t \mid x_t)}. \qquad (63)$$

Thus,

$$\nu_t = F(\mu_t, \gamma_t^s, s_t). \qquad (64)$$

Similarly,

$$\mu_{t+1}(x_{t+1}) = P^\theta(x_{t+1} \mid s_{1:t}, a_{1:t}, r_{1:t}^r) \qquad (65)$$
$$= \frac{\sum_{x_t} P^\theta(x_t, a_t, r_t^r, x_{t+1} \mid s_{1:t}, a_{1:t-1}, r_{1:t-1}^r)}{\sum_{x_t} P^\theta(x_t, a_t, r_t^r \mid s_{1:t}, a_{1:t-1}, r_{1:t-1}^r)} \qquad (66)$$
$$= \frac{\sum_{x_t} \nu_t(x_t)\gamma_t^r(a_t) I(r_t^r = R^r(x_t, a_t)) Q_x(x_{t+1} \mid x_t, a_t)}{\sum_{x_t} \nu_t(x_t)\gamma_t^r(a_t) I(r_t^r = R^r(x_t, a_t))} \qquad (67)$$
$$= \frac{\sum_{x_t} \nu_t(x_t) I(r_t^r = R^r(x_t, a_t)) Q_x(x_{t+1} \mid x_t, a_t)}{\sum_{x_t} \nu_t(x_t) I(r_t^r = R^r(x_t, a_t))}. \qquad (68)$$

Thus,

$$\mu_{t+1} = G(\nu_t, a_t, r_t^r). \qquad (69)$$

REFERENCES

[1] D. Vasal, A. Sinha, and A. Anastasopoulos, "A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information,"
IEEE Transactions on Automatic Control, vol. 64, no. 1, pp. 81–96, Jan. 2019.

[2] D. Vasal, “Stochastic Stackelberg games,” SSRN Electronic Journal, May 2020. [Online]. Available: http://arxiv.org/abs/2005.01997

[3] R. B. Myerson, “Optimal auction design,” Mathematics of Operations Research.

[4] SSRN Electronic Journal, Dec. 2014.

[5] E. Kamenica and M. Gentzkow, “Bayesian persuasion,” American Economic Review, vol. 101, no. 6, pp. 2590–2615, Oct. 2011.

[6] J. Hedlund, “Persuasion with communication costs,” Games and Economic Behavior, vol. 92, pp. 28–40, 2015.
[7] W. Tamura, “A theory of multidimensional information disclosure,” Institute of Social and Economic Research, Osaka University, ISER Discussion Paper 0828, Jan. 2012.

[8] M. Gentzkow and E. Kamenica, “Bayesian persuasion with multiple senders and rich signal spaces,” Games and Economic Behavior, vol. 104, pp. 411–429, 2017.

[9] F. Li and P. Norman, “On Bayesian persuasion with multiple senders,” Economics Letters, vol. 170, pp. 66–70, 2018.

[10] D. Bergemann and S. Morris, “Information design, Bayesian persuasion, and Bayes correlated equilibrium,” American Economic Review, vol. 106, no. 5, May 2016.

[11] ——, “Bayes correlated equilibrium and the comparison of information structures in games,” Theoretical Economics, vol. 11, no. 2, pp. 487–522, 2016.

[12] R. Alonso and O. Câmara, “Bayesian persuasion with heterogeneous priors,” Journal of Economic Theory, vol. 165, pp. 672–706, 2016.

[13] E. Kamenica, “Bayesian persuasion and information design,” Annual Review of Economics, vol. 11, no. 1, pp. 249–272, 2019.

[14] F. Farokhi, A. M. H. Teixeira, and C. Langbort, “Estimation with strategic sensors,” IEEE Transactions on Automatic Control, vol. 62, 2017.

[15] D. Lingenbrink and K. Iyer, “Optimal signaling mechanisms in unobservable queues with strategic customers,” in Proceedings of the 2017 ACM Conference on Economics and Computation. ACM, 2017, p. 347.

[16] J. Ely, “Beeps,” American Economic Review, 2017.

[17] J. Renault, E. Solan, and N. Vieille, “Optimal dynamic information provision,” Games and Economic Behavior, vol. 104, pp. 329–349, 2017.

[18] J. W. Best and D. P. Quigley, “Honestly dishonest: A solution to the commitment problem in Bayesian persuasion,” 2016.

[19] J. Best and D. Quigley, “Persuasion for the long-run,” Economics Group, Nuffield College, University of Oxford, Economics Papers 2016-W12, 2016.

[20] T. Honryo, “Dynamic persuasion,” Journal of Economic Theory, vol. 178, pp. 36–58, 2018.

[21] D. Orlov, A. Skrzypacz, and P. Zryumov, “Persuading the principal to wait,” Stanford University Graduate School of Business Research Paper, no. 16-20, 2019.

[22] J. Hörner and A. Skrzypacz, “Selling information,” Journal of Political Economy, vol. 124, no. 6, pp. 1515–1562, 2016.

[23] J. Ely and M. Szydlowski, “Moving the goalposts,” Journal of Political Economy, 2019.

[24] L. Doval and J. Ely, “Sequential information design,” Econometrica, 2019.

[25] P. Basu, “Dynamic Bayesian persuasion with a privately informed receiver,” 2017.

[26] P. H. Au, “Dynamic information disclosure,” The RAND Journal of Economics, vol. 46, no. 4, pp. 791–823, 2015.

[27] F. Farhadi, D. Teneketzis, and S. J. Golestani, “Static and dynamic informational incentive mechanisms for security enhancement,” Jun. 2018.

[28] H. Tavafoghi and D. Teneketzis, “Informational incentives in congestion games,” in Proc. of the Allerton, 2017.

[29] E. Meigs, F. Parise, A. Ozdaglar, and D. Acemoglu, “Optimal dynamic information provision in traffic routing,” arXiv, 2020.

[30] J. C. Ely, “Beeps,” American Economic Review, vol. 107, no. 1, pp. 31–53, Jan. 2017.

[31] F. Farhadi and D. Teneketzis, “Dynamic information design: A simple problem on optimal sequential information disclosure,” SSRN Electronic Journal, Mar. 2020. [Online]. Available: http://arxiv.org/abs/2003.07965

[32] A. Nayyar, A. Mahajan, and D. Teneketzis, “Decentralized stochastic control with partial history sharing: A common information approach,” IEEE Transactions on Automatic Control, vol. 58, no. 7, pp. 1644–1658, 2013.
[33] D. Fudenberg and J. Tirole, Game Theory. Cambridge, MA: MIT Press, 1991.

[34] ——, “Perfect Bayesian equilibrium and sequential equilibrium,” Journal of Economic Theory, vol. 53, no. 2, pp. 236–260, 1991.