Quadratic Signaling Games with Channel Combining Ratio
Serkan Sarıtaş, Photios A. Stavrou, Ragnar Thobaben, Mikael Skoglund
Division of Information Science and Engineering, KTH Royal Institute of Technology, SE-10044 Stockholm, Sweden. Email: {saritas, fstavrou, ragnart, skoglund}@kth.se
Abstract
In this study, Nash and Stackelberg equilibria of single-stage and multi-stage quadratic signaling games between an encoder and a decoder are investigated. In the considered setup, the objective functions of the encoder and the decoder are misaligned, there is a noisy channel between the encoder and the decoder, the encoder has a soft power constraint, and the decoder also has a noisy observation of the source to be estimated. We show that there exist only linear encoding and decoding strategies at the Stackelberg equilibrium, and derive the equilibrium strategies and costs. Regarding the Nash equilibrium, we explicitly characterize affine equilibria for the single-stage setup and show that the optimal encoder (resp. decoder) is affine for an affine decoder (resp. encoder) in the multi-stage setup. On the decoder side, between the information coming from the encoder and the noisy observation of the source, our results describe what the combining ratio of these two channels should be. Regarding the encoder, we derive the conditions under which it is meaningful to transmit a message.
I. INTRODUCTION
Decision making has a wide range of applications, from engineering (e.g., information and communication theory, control theory, machine learning) to social sciences (e.g., economics, management) to interdisciplinary sciences (e.g., cognitive science). Every decision mechanism requires some prior input/data or observation so that the decision maker (DM) can make an optimal decision. Questions then arise: if, for instance, there are multiple observations corresponding to the same data, are all the inputs reliable, and which observation is the most usable? In this paper, we seek an answer to these questions for two observation channels under a game-theoretic framework.

Consider a scenario with two DMs, an encoder and a decoder. The encoder has access to the data and transmits a message to the decoder over a noisy channel. Besides the information coming from the encoder, the decoder also has access to a noisy observation of the original data. Based on these two observations/inputs, the decoder takes its optimal action. Here, the encoder and the decoder are assumed to have misaligned objective functions, which makes the setting a game-theoretic setup. In the following, we make further explanations and comments:
• Our setup can be considered as a signaling game: a privately informed sender (i.e., encoder) observes the private data and chooses a signal that is observed by the (uninformed) receiver (i.e., decoder). Upon receiving the message from the encoder, the decoder picks an action, which determines the costs of the encoder and the decoder.
• From the decoder's perspective, there are two information sources: a noisy observation of the encoder's message and one of the original data. The considered question is then which conditions dictate combining the channels, and what their respective ratio of utilization should be.
• The setup can also be considered as a point-to-point communication setup with a (specific type of) side information at the decoder.

A. Motivational Example
When satellite navigation such as GPS is inadequate for various reasons (e.g., the signal loses significant power indoors, multiple reflections may cause multi-path propagation, or acquiring a satellite fix may take too long), additional information sources such as Wi-Fi positioning systems and indoor positioning systems can be utilized. As a solution to this problem, i.e., in order to make positioning signals ubiquitous, satellite navigation and indoor positioning can be integrated. Accordingly, our setup can model such a scenario: the actual location is to be estimated by the user, and satellite navigation, which contains the location information, can be considered as the original data. The analogue of the encoder is the other positioning systems, which transmit location-related information to the user. Even though there is a direct noisy channel between the original data (actual location) and the user due to satellite navigation, a more precise location estimate can be achieved by utilizing the additional information coming from the other positioning systems.

If the transmitted signal does not affect the costs, the game is called cheap talk. The decoder utilizes a convex combination of the channels and uses restricted gain coefficients for the utilization of the channels (e.g., due to a power constraint); thus our setup is not equivalent to the case of parallel Gaussian channels, and it may not achieve maximum-ratio combining (see Remark III.1). Since the decoder cannot adjust the gains of the main and side channels separately, our setup is not completely equivalent to the point-to-point communication setup with side information at the decoder.

B. Related Literature

The studies on cheap talk and signaling games were initiated by Crawford and Sobel in [1], who showed that, under some technical conditions on the objective functions of the players, the cheap talk problem only admits quantized Nash equilibrium strategies.
Signaling games have many applications in networked systems [2], [3], recommendation systems [4], [5], and economics [6], [7]. Starting with the seminal work [8], there are many studies that consider the Stackelberg equilibrium of signaling games [9]–[15]. Many of these works assume that the non-alignment between the objective functions of the encoder and the decoder is a function of a Gaussian random variable (RV) that is correlated with the Gaussian source and secret to the decoder, unlike the original case where it is fixed and commonly known by the encoder and the decoder [1], which is also studied in [9], [13], [15] and in this paper; the Stackelberg equilibrium under quadratic costs in this setting is investigated in [10]–[12]. We refer to [7], [9], [13] for more discussion of the literature and some extensions (including Nash equilibrium analyses and multi-stage extensions) of cheap talk and signaling games.

An information-theoretic formulation of the Bayesian persuasion problem [8] is studied in [11] for general (not necessarily Gaussian) sources, including the case with side information at the decoder, and recently also in [14] for finite state and action spaces with decoder side information. In [16], lossy source coding with side information at the decoder only, known as Wyner-Ziv coding, is studied for a source observed via a memoryless noisy channel.

Similarly to our setup, the Bayesian Nash equilibrium of a finite-alphabet semantic communication game is investigated in [17]. Besides the encoder/decoder pair acting as a team, there is also a (helpful or adversarial) agent who is able to modify the channel transition probability of the side information received by the decoder. In [18], a similar setup is considered in which the decoder, besides receiving the message from the encoder over a noiseless channel, also observes side information consisting of the original source subject to slow fading and noise.
The source coding analysis in [18] is extended to a joint source/channel coding analysis in [19] by assuming a noisy but static channel between the encoder and the decoder.
C. Contributions
The main contributions of this paper can be summarized as follows:
(i) A signaling game between an encoder and a decoder with quadratic objective functions is modeled with channel combining and utilization at the decoder side.
(ii) Nash and Stackelberg equilibria of the single-stage and multi-stage setups are investigated, and the equilibrium strategies and costs are characterized.
(iii) The optimality of linear strategies is proved for the single-stage Stackelberg equilibrium (Theorem III.3).
(iv) For the Stackelberg equilibrium of the multi-stage setup, it is proved that linear strategies are optimal for both the encoder and the decoder (Theorem IV.3), and an algorithm is provided to find the equilibrium (Algorithm 1).
(v) For the Nash equilibrium of the single-stage and multi-stage setups, it is proved that the optimal encoder (decoder) is affine for an affine decoder (encoder) (Theorem V.1).
The remainder of the paper is organized as follows. We present the system model and problem formulation in Section II. Stackelberg equilibria of the single-stage and multi-stage setups are investigated in Section III and Section IV, respectively. In Section V, we analyze Nash equilibria. Section VI concludes the paper and discusses future research directions.
Notation: N(µ, σ²) denotes a scalar Gaussian distribution with mean µ and variance σ², and we denote random variables by bold lower case letters, e.g., x.

II. SYSTEM MODEL AND PROBLEM FORMULATION
A. System Model
For the purpose of illustration, the considered system model is depicted in Fig. 1. An informed player (encoder) observes the realization of the scalar Gaussian RV x ∼ N(0, σ_x²) and transmits a message m to the uninformed player (decoder) through an additive white Gaussian noise (AWGN) channel. The noise v is modeled as v ∼ N(0, σ_v²), and the output y of the channel is y = m + v. Besides the noisy message from the encoder, the decoder also has access to the source over an AWGN channel; i.e., the decoder can also observe z = x + w with w ∼ N(0, σ_w²). The decoder can choose to observe either of the channels or a combination of them (e.g., by using a time-sharing approach). In particular, letting α ∈ [0, 1], the combining and utilization ratio of the channel from the encoder is α, whereas that of the channel from the source is 1 − α; i.e., r = α y + (1 − α) z. The decoder, upon observing its input r, generates an estimate x̂ of the original source x. Combining and utilization of the channels can be interpreted as channel gains of α and 1 − α. Modifying the channel gains to two independent real-valued coefficients would result in infinitely many decoder strategies and maximum-ratio combining (see Remark III.1).
Fig. 1. Single-stage system model.
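As an aside, the single-stage model is easy to simulate. The following minimal sketch (ours, not from the paper; the parameter values are arbitrary) draws the Gaussian source and noises, forms y, z, and the combined decoder input r = αy + (1 − α)z for a linear encoder m = Ax, and checks that the empirical variance of r matches the model prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000                      # Monte Carlo samples
sx2, sv2, sw2 = 1.0, 0.5, 0.8    # variances of x, v, w (arbitrary test values)
A, alpha = 1.3, 0.6              # linear encoder gain and combining ratio (arbitrary)

x = rng.normal(0.0, np.sqrt(sx2), N)   # source
v = rng.normal(0.0, np.sqrt(sv2), N)   # noise on the encoder channel
w = rng.normal(0.0, np.sqrt(sw2), N)   # noise on the direct source observation

m = A * x                              # encoder message
y = m + v                              # encoder-channel output
z = x + w                              # noisy source observation
r = alpha * y + (1 - alpha) * z        # combined decoder input

# r = (alpha*A + 1 - alpha) x + alpha v + (1 - alpha) w, so its variance is:
var_pred = (alpha * A + 1 - alpha)**2 * sx2 + alpha**2 * sv2 + (1 - alpha)**2 * sw2
assert abs(np.var(r) - var_pred) < 0.02
```

The decomposition of r in the comment is the one used repeatedly in the proofs (e.g., Appendix A).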
B. Preliminaries
For the source realization x and the decoder estimate x̂, let c^e(x, x̂) and c^d(x, x̂) denote the corresponding cost functions of the encoder and the decoder, respectively. Then, for a given encoder strategy m = γ^e(x) and decoder strategy x̂ = γ^d(r), the expected encoder and decoder costs are J^e(γ^e, γ^d) = E[c^e(x, x̂)] and J^d(γ^e, γ^d) = E[c^d(x, x̂)], respectively. Since the costs are not (essentially) equivalent/aligned, the problem is studied under a game-theoretic framework, and two equilibrium types are investigated: Stackelberg and Nash equilibria.

In the Stackelberg (leader-follower) game, the leader (encoder) commits to a particular policy and announces it to the follower (decoder). Upon observing the encoder's committed strategy, the decoder takes its optimal action. More precisely, a pair of strategies (γ^{e,*}, γ^{d,*}) is said to be a Stackelberg equilibrium [20] if

J^e(γ^{e,*}, γ^{d,*}(γ^{e,*})) ≤ J^e(γ^e, γ^{d,*}(γ^e))  ∀γ^e ∈ Γ^e,
where γ^{d,*}(γ^e) satisfies
J^d(γ^e, γ^{d,*}(γ^e)) ≤ J^d(γ^e, γ^d(γ^e))  ∀γ^d ∈ Γ^d.  (1)

Since the follower (decoder) takes its action after observing the strategy γ^e of the leader (encoder), the strategy γ^d(γ^e) of the decoder is a function of γ^e.

In the Nash (simultaneous-move) game, the encoder and the decoder announce their strategies at the same time. More precisely, a pair of policies (γ^{e,*}, γ^{d,*}) is said to be a Nash equilibrium [20] if

J^e(γ^{e,*}, γ^{d,*}) ≤ J^e(γ^e, γ^{d,*})  ∀γ^e ∈ Γ^e,
J^d(γ^{e,*}, γ^{d,*}) ≤ J^d(γ^{e,*}, γ^d)  ∀γ^d ∈ Γ^d.  (2)

As observed in (2), none of the players prefers to change its optimal strategy at the equilibrium; i.e., there is no unilateral profitable deviation by any of the players.

C. Problem Formulation
We consider quadratic cost functions with a soft power constraint at the encoder side. In particular, c^e(x, x̂) = (x − x̂ − b)² + θ(γ^e(x))² and c^d(x, x̂) = (x − x̂)², where b denotes the bias term commonly known by the encoder and the decoder, i.e., the misalignment between the encoder and decoder costs, and θ is the coefficient responsible for the soft power constraint. Note that the costs simply reduce to those of a minimum mean-square estimation (MMSE) problem when b = 0. Further note that the case θ = 0 corresponds to the setup with no power constraint at the encoder.

The encoder aims to minimize J^e(γ^e, γ^d) = E[c^e(x, x̂)] by selecting an optimal encoding strategy γ^e(x), whereas the decoder's goal is to minimize J^d(γ^e, γ^d) = E[c^d(x, x̂)] by choosing an optimal decoding strategy γ^d(r) and the channel combining parameter α.

III. SINGLE-STAGE STACKELBERG EQUILIBRIUM
In this section, we analyze the Stackelberg equilibrium of the game between the encoder (leader) and the decoder (follower). First, we show that the lowest estimation error is achieved when the encoder and the decoder jointly use linear strategies. Then we characterize the (existence of) equilibria with respect to the soft power coefficient θ.

Theorem III.1.
Let the encoder use a linear strategy m = γ^e(x) = A x. Then, the optimal decoder selects the channel combining parameter α and the linear strategy x̂ = γ^d(r) = K r accordingly. The optimal decoder strategy (α*, K*) and the corresponding cost J^{d,*} = E[(x − x̂)²] are characterized in Table I.

TABLE I
OPTIMAL DECODER STRATEGY FOR A LINEAR ENCODER

Case A ≥ 0:
  α* = Aσ_w² / (Aσ_w² + σ_v²)
  K* = (Aσ_x²σ_w² + σ_x²σ_v²) / (A²σ_x²σ_w² + σ_x²σ_v² + σ_w²σ_v²)
  J^{d,*} = σ_x²σ_w²σ_v² / ((A²σ_w² + σ_v²)σ_x² + σ_w²σ_v²)

Case −√(σ_v²/σ_w²) ≤ A ≤ 0:
  α* = 0
  K* = σ_x² / (σ_x² + σ_w²)
  J^{d,*} = σ_x²σ_w² / (σ_x² + σ_w²)

Case A ≤ −√(σ_v²/σ_w²):
  α* = 1
  K* = Aσ_x² / (A²σ_x² + σ_v²)
  J^{d,*} = σ_x²σ_v² / (A²σ_x² + σ_v²)

Proof:
See Appendix A.
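The case analysis of Table I can be sanity-checked numerically. The sketch below (ours; the helper names and test variances are arbitrary) implements the three cases and verifies that the closed-form (α*, K*) both reproduces the claimed cost via the estimation-error expression of Appendix A and beats a brute-force grid over (α, K).

```python
import numpy as np

def optimal_decoder(A, sx2, sv2, sw2):
    """Table I: optimal (alpha, K, J_d) for a linear encoder m = A x."""
    if A >= 0:
        alpha = A * sw2 / (A * sw2 + sv2)
        K = (A * sx2 * sw2 + sx2 * sv2) / (A**2 * sx2 * sw2 + sx2 * sv2 + sw2 * sv2)
        J = sx2 * sw2 * sv2 / ((A**2 * sw2 + sv2) * sx2 + sw2 * sv2)
    elif A >= -np.sqrt(sv2 / sw2):
        alpha, K = 0.0, sx2 / (sx2 + sw2)           # use the source channel only
        J = sx2 * sw2 / (sx2 + sw2)
    else:
        alpha, K = 1.0, A * sx2 / (A**2 * sx2 + sv2)  # use the encoder channel only
        J = sx2 * sv2 / (A**2 * sx2 + sv2)
    return alpha, K, J

def dec_cost(A, alpha, K, sx2, sv2, sw2):
    """Estimation error of the decoder x_hat = K r as a function of (alpha, K)."""
    g = alpha * A + 1 - alpha                        # effective signal gain in r
    return (g * K - 1)**2 * sx2 + (alpha * K)**2 * sv2 + ((1 - alpha) * K)**2 * sw2

sx2, sv2, sw2 = 1.0, 0.5, 0.8
alphas = np.linspace(0, 1, 201)[:, None]
Ks = np.linspace(-3, 3, 601)[None, :]
for A in [2.0, 0.7, 0.0, -0.5, -2.0]:
    alpha, K, J = optimal_decoder(A, sx2, sv2, sw2)
    # closed form matches the cost expression ...
    assert abs(dec_cost(A, alpha, K, sx2, sv2, sw2) - J) < 1e-12
    # ... and is no worse than a brute-force grid over (alpha, K)
    grid_min = dec_cost(A, alphas, Ks, sx2, sv2, sw2).min()
    assert J <= grid_min + 1e-9
```

The test values of A exercise all three rows of the table (note √(σ_v²/σ_w²) ≈ 0.79 here).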
Remark III.1.
As can be observed from Table I, the optimal decoder achieves maximum-ratio combining by randomizing between the channels when A > 0. However, when A < 0, since the decoder's action space does not support maximum-ratio combining, the decoder always selects the better channel without randomization.

Theorem III.2.
The lower bound on the estimation error J^d = E[(x − x̂)²] is

J^d ≥ σ_x² / (P/σ_v² + σ_x²/σ_w² + 1),

where P ≜ E[m²] is the power of the signal m = γ^e(x) transmitted by the encoder, and this lower bound is achieved if and only if both the encoder and the decoder jointly use linear strategies.

Proof: See Appendix B.

Regarding the encoder cost, observe the following.

Remark III.2.
Due to the Stackelberg assumption, since the encoder anticipates that the decoder will use x̂ = γ^{d,*}(r) = E[x|r], the bias b can be decoupled from the encoder cost [9], [13]. In particular,

J^e = E[(x − E[x|r] − b)² + θ(γ^e(x))²] = E[(x − E[x|r])² + θ(γ^e(x))²] + b² = J^d + E[θ(γ^e(x))²] + b².

After finding the optimal decoder cost, we can proceed to analyze the optimal encoder strategy and characterize the (existence of) equilibria.
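The decoupling in Remark III.2 is straightforward to verify by Monte Carlo. In the rough sketch below (ours; parameter values are arbitrary), the decoder plays the conditional mean E[x|r] for a linear encoder, computed from the A ≥ 0 row of Table I; the residual cross term −2b·E[x − x̂] vanishes only in expectation, hence the loose tolerance.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400_000
sx2, sv2, sw2 = 1.0, 0.5, 0.8
A, b, theta = 1.5, 0.7, 0.05     # linear encoder gain, bias, power coefficient

# Optimal decoder for a linear encoder (Table I, case A >= 0).
alpha = A * sw2 / (A * sw2 + sv2)
K = (A * sx2 * sw2 + sx2 * sv2) / (A**2 * sx2 * sw2 + sx2 * sv2 + sw2 * sv2)

x = rng.normal(0, np.sqrt(sx2), N)
v = rng.normal(0, np.sqrt(sv2), N)
w = rng.normal(0, np.sqrt(sw2), N)
m = A * x
r = alpha * (m + v) + (1 - alpha) * (x + w)
xhat = K * r                     # = E[x | r] for jointly Gaussian (x, r)

J_e = np.mean((x - xhat - b)**2 + theta * m**2)
J_d = np.mean((x - xhat)**2)
# Remark III.2: J_e = J_d + theta E[m^2] + b^2 (up to Monte Carlo error).
assert abs(J_e - (J_d + theta * np.mean(m**2) + b**2)) < 0.01
```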
Theorem III.3.
The only equilibrium (affine or not) in the Stackelberg setup is the linear equilibrium with γ^e(x) = A x and γ^d(r) = K r, where A ≥ 0 and K ≥ 0. In particular, at the equilibrium, the encoder cost is J^{e,*} = J^{d,*} + θA²σ_x² + b², where A is decided according to the following decision rule:

A = √( √(σ_v²/(θσ_x²)) − (σ_v²/σ_x²)(σ_x²/σ_w² + 1) ),  if θ < σ_x² / (σ_v²(σ_x²/σ_w² + 1)²),
A = 0,  if θ ≥ σ_x² / (σ_v²(σ_x²/σ_w² + 1)²).

Then, the corresponding α, K, and J^{d,*} can be derived from Table I.

Proof: See Appendix C.
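The decision rule of Theorem III.3 amounts to a scalar convex problem in the transmit power P. A small numerical check (ours; hypothetical function names) confirms that the closed-form A minimizes the achievable encoder cost J^e_LB(P) = σ_x²/(P/σ_v² + σ_x²/σ_w² + 1) + θP + b² over P ≥ 0.

```python
import numpy as np

def stackelberg_A(sx2, sv2, sw2, theta):
    """Decision rule of Theorem III.3 for the encoder gain A (m = A x)."""
    thresh = sx2 / (sv2 * (sx2 / sw2 + 1)**2)
    if theta >= thresh:
        return 0.0               # transmitting is too costly: stay silent
    A2 = np.sqrt(sv2 / (theta * sx2)) - (sv2 / sx2) * (sx2 / sw2 + 1)
    return float(np.sqrt(A2))

def J_e_LB(P, sx2, sv2, sw2, theta, b):
    """Achievable encoder cost as a function of the transmit power P."""
    return sx2 / (P / sv2 + sx2 / sw2 + 1) + theta * P + b**2

sx2, sv2, sw2, b = 1.0, 0.5, 0.8, 0.3
Ps = np.linspace(0, 20, 20001)
for theta in [0.01, 0.05, 0.2, 1.0]:
    A = stackelberg_A(sx2, sv2, sw2, theta)
    P_star = A**2 * sx2          # power of the linear encoder m = A x
    best = J_e_LB(Ps, sx2, sv2, sw2, theta, b).min()
    J_star = J_e_LB(P_star, sx2, sv2, sw2, theta, b)
    assert J_star <= best + 1e-9
    assert best - J_star < 1e-6  # grid minimum is essentially attained at P_star
```

With these variances the threshold is ≈ 0.395, so the last test value θ = 1.0 exercises the non-transmitting branch A = 0.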
IV. MULTI-STAGE STACKELBERG EQUILIBRIUM
In this section, we consider the dynamic counterpart of Section III. We start by giving the problem statement illustrated in Fig. 2. In Fig. 2, the input message is formed by a Gauss-Markov model described by the following recursion:

x_{t+1} = β_t x_t + n_t,  t ∈ N_n,  (3)

where {β_t : t ∈ N_n} are deterministic coefficients, the initial message is x_0 ∼ N(0, σ_x²) with σ_x² > 0, and {n_t : t ∈ N_{n−1}} is a mutually independent process, independent of everything else, with n_t ∼ N(0, σ_{n_t}²), σ_{n_t}² > 0. Since we assume a fixed and public b, in contrast to a private and random b correlated with the source as in [10]–[12], the results obtained in the former setup cannot be applied directly to the latter one; i.e., the Stackelberg equilibria of these two setups are different.
Fig. 2. Multi-stage system model.
We assume a causal noisy observation of the source at the decoder, modeled as a time-varying Gaussian process:

z_t = x_t + w_t,  t ∈ N_n,  (4)

where {w_t : t ∈ N_n} is an independent noise process, independent of everything else, with w_t ∼ N(0, σ_{w_t}²), σ_{w_t}² > 0.

At stage t, the encoder has access to x^t ≜ {x_0, x_1, ..., x_t} and r^{t−1} ≜ {r_0, r_1, ..., r_{t−1}} (a noiseless feedback channel is assumed), whereas r^t ≜ {r_0, r_1, ..., r_t} is available to the decoder. Then, we can define the stage-wise costs of the players similarly to the single-stage case, i.e., c_t^e(x_t, x̂_t) = (x_t − x̂_t − b_t)² + θ_t(γ_t^e(x^t, r^{t−1}))² and c_t^d(x_t, x̂_t) = (x_t − x̂_t)², where b_t ∈ R denotes the stage-wise bias term commonly known by the encoder and the decoder, and {θ_t ∈ (0, ∞) : t ∈ N_n} are the stage-wise coefficients of the soft power constraints. Assuming myopic encoder and decoder strategies, the costs are defined as follows:

J_t^d = min_{γ_t^d(r^t), α_t ∈ [0,1]} E[(x_t − x̂_t)²],
J_t^e = min_{γ_t^e(x^t, r^{t−1})} J_t^d + E[θ_t(γ_t^e(x^t, r^{t−1}))²] + b_t²,
J^d_{average total} = (1/(n+1)) Σ_{t=0}^n J_t^d,
J^e_{average total} = (1/(n+1)) Σ_{t=0}^n J_t^e.  (5)

Remark IV.1.
Note that, in the sequel, we see that although the costs of the encoder and the decoder appear to form a nested optimization, they do not. In fact, they can be decoupled into distinct time stages J_0^e, J_1^e, ..., J_n^e for the encoder and J_0^d, J_1^d, ..., J_n^d for the decoder, and solved independently moving forward in time.

Similarly to the single-stage counterpart, we first show that the lowest estimation error is achieved when the encoder and the decoder jointly utilize linear strategies. In the following, we first find the optimal decoder for a linear memoryless encoder without any feedback.
Theorem IV.1.
Let the encoder use a linear memoryless strategy m_t = γ_t^e(x_t) = A_t x_t, t ∈ N_n. Then, the optimal decoder is obtained by a discrete-time Kalman filter, due to joint Gaussianity, and admits closed-form recursions. To present the recursions of the filter, we need to define the following conditional mean and conditional variances:

x̂_{t|t−1} ≜ E[x_t | r^{t−1}],  Σ_{t|t−1} ≜ E[(x_t − x̂_{t|t−1})² | r^{t−1}],
x̂_{t|t} ≜ E[x_t | r^t],  Σ_{t|t} ≜ E[(x_t − x̂_{t|t})² | r^t].

Due to joint Gaussianity, the conditional variances are equivalent to the unconditional ones. Then, {x̂_{t|t−1}, Σ_{t|t−1}, x̂_{t|t}, Σ_{t|t} : t ∈ N_n} satisfy the following scalar-valued filtering recursions:

x̂_{t|t−1} = β_{t−1} x̂_{t−1|t−1},
Σ_{t|t−1} = β_{t−1}² Σ_{t−1|t−1} + σ_{n_{t−1}}²,  Σ_{0|−1} = σ_x²,
x̂_{t|t} = x̂_{t|t−1} + K_t I_t,
I_t ≜ r_t − E[r_t | r^{t−1}] = (α_t A_t + 1 − α_t)(x_t − x̂_{t|t−1}) + (1 − α_t) w_t + α_t v_t,  (innovations)
σ_{I_t}² = (α_t A_t + 1 − α_t)² Σ_{t|t−1} + (1 − α_t)² σ_{w_t}² + α_t² σ_{v_t}²,
K_t = Σ_{t|t−1}(α_t A_t + 1 − α_t) / σ_{I_t}²,  (Kalman gain)
Σ_{t|t} = (1 − K_t(α_t A_t + 1 − α_t)) Σ_{t|t−1},  (6)

which corresponds to the optimal decoder strategy

γ_t^d(r^t) = x̂_{t|t} = x̂_{t|t−1} + K_t I_t = x̂_{t|t−1} + K_t (r_t − (α_t A_t + 1 − α_t) x̂_{t|t−1}).  (7)

Then, the optimal channel combining parameter α_t, Kalman gain K_t, and stage-wise costs J_t^{d,*} (or Σ*_{t|t}) are characterized in Table II. The optimal average cost is J^{d,*}_{average total} = (1/(n+1)) Σ_{t=0}^n J_t^{d,*}.
TABLE II
STAGE-WISE OPTIMAL DECODER STRATEGY FOR A LINEAR ENCODER

Case A_t ≥ 0:
  α_t* = A_tσ_{w_t}² / (A_tσ_{w_t}² + σ_{v_t}²)
  K_t* = (A_tΣ*_{t|t−1}σ_{w_t}² + Σ*_{t|t−1}σ_{v_t}²) / (A_t²Σ*_{t|t−1}σ_{w_t}² + Σ*_{t|t−1}σ_{v_t}² + σ_{w_t}²σ_{v_t}²)
  J_t^{d,*} = Σ*_{t|t−1}σ_{w_t}²σ_{v_t}² / ((A_t²σ_{w_t}² + σ_{v_t}²)Σ*_{t|t−1} + σ_{w_t}²σ_{v_t}²)

Case −√(σ_{v_t}²/σ_{w_t}²) ≤ A_t ≤ 0:
  α_t* = 0
  K_t* = Σ*_{t|t−1} / (Σ*_{t|t−1} + σ_{w_t}²)
  J_t^{d,*} = Σ*_{t|t−1}σ_{w_t}² / (Σ*_{t|t−1} + σ_{w_t}²)

Case A_t ≤ −√(σ_{v_t}²/σ_{w_t}²):
  α_t* = 1
  K_t* = A_tΣ*_{t|t−1} / (A_t²Σ*_{t|t−1} + σ_{v_t}²)
  J_t^{d,*} = Σ*_{t|t−1}σ_{v_t}² / (A_t²Σ*_{t|t−1} + σ_{v_t}²)

Proof:
See Appendix D.

Now assume that the encoder has memory, and consider linear encoders with memory via noiseless feedback, i.e.,

γ_t^e(x^t, r^{t−1}) = A_t(x_t − x̂_{t|t−1}),  t ∈ N_n.  (8)

The following proposition shows that the memoryless encoder and the innovations encoder (i.e., the encoder with memory) generate the same innovations process.

Proposition IV.1.
The innovations process at the decoder for the class of linear encoders in (8) generates the same information as the innovations process obtained for the class of linear memoryless encoders assumed in Theorem IV.1 at each instant of time. Hence, the same values of α_t*, K_t*, and J_t^{d,*} given in Table II can be derived even if the encoder is an innovations encoder.

Proof: See Appendix E.

In what follows, we leverage Proposition IV.1 to prove a theorem that generalizes Theorem III.2 to the dynamic setup.
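Before moving to the lower bound, the filtering recursions (6) together with the Table II case analysis can be prototyped in a few lines. The sketch below (ours; hypothetical names, arbitrary parameters) propagates Σ_{t|t−1} forward for a memoryless linear encoder and checks at every stage that the filtered variance produced by the Kalman gain of (6) coincides with the Table II cost.

```python
import numpy as np

def stage_decoder(A, S_pred, sv2, sw2):
    """Table II: optimal (alpha_t, K_t, J_t^d) given A_t and Sigma_{t|t-1}."""
    if A >= 0:
        alpha = A * sw2 / (A * sw2 + sv2)
        J = S_pred * sw2 * sv2 / ((A**2 * sw2 + sv2) * S_pred + sw2 * sv2)
    elif A >= -np.sqrt(sv2 / sw2):
        alpha = 0.0
        J = S_pred * sw2 / (S_pred + sw2)
    else:
        alpha = 1.0
        J = S_pred * sv2 / (A**2 * S_pred + sv2)
    g = alpha * A + 1 - alpha                      # effective signal gain
    sI2 = g**2 * S_pred + (1 - alpha)**2 * sw2 + alpha**2 * sv2
    K = S_pred * g / sI2                           # Kalman gain of (6)
    return alpha, K, J

n, sx2 = 6, 1.0
beta = 0.9 * np.ones(n)       # Gauss-Markov coefficients (arbitrary)
sn2 = 0.2 * np.ones(n)        # process-noise variances
sv2 = 0.5 * np.ones(n + 1)
sw2 = 0.8 * np.ones(n + 1)
A = np.array([1.5, 0.7, 0.0, -0.5, -2.0, 1.0, 2.0])   # stage-wise encoder gains

Js = []
S_pred = sx2                  # Sigma_{0|-1} = sigma_x^2
for t in range(n + 1):
    alpha, K, J = stage_decoder(A[t], S_pred, sv2[t], sw2[t])
    g = alpha * A[t] + 1 - alpha
    S_filt = (1 - K * g) * S_pred                  # Sigma_{t|t} from (6)
    assert abs(S_filt - J) < 1e-9                  # Table II cost == filtered variance
    Js.append(J)
    if t < n:
        S_pred = beta[t]**2 * S_filt + sn2[t]      # Sigma_{t+1|t}
```

The chosen gains A_t deliberately hit all three rows of Table II.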
Theorem IV.2.
The lower bound on the estimation error J^d_{average total} = (1/(n+1)) Σ_{t=0}^n E[(x_t − x̂_{t|t})²] is (1/(n+1)) Σ_{t=0}^n J_{t,LB}^{d,*}, with {J_{t,LB}^{d,*} : t ∈ N_n} computed forward in time as follows:

J_{t,LB}^{d,*} = Σ*_{t|t−1}σ_{w_t}²σ_{v_t}² / (((P_t/Σ*_{t|t−1})σ_{w_t}² + σ_{v_t}²)Σ*_{t|t−1} + σ_{w_t}²σ_{v_t}²),  (9)

with Σ*_{t|t−1} = β_{t−1}² J_{t−1,LB}^{d,*} + σ_{n_{t−1}}², Σ_{0|−1} = σ_x², where P_t ≜ E[m̃_t²], m̃_t ≜ m_t − m̂_{t|t−1}, m̂_{t|t−1} ≜ E[m_t | r^{t−1}], is the power of the innovation m̃_t of the signal transmitted by γ_t^e(x^t, r^{t−1}), i.e., by an encoder with noiseless feedback, at each instant of time. This lower bound is achieved if and only if both the encoder and the decoder jointly use linear strategies.

Proof: See Appendix F.

Observe the following regarding the optimization problem at the encoder:

J^e_{average total} = min_{m_t = γ_t^e(x^t, r^{t−1})} (1/(n+1)) Σ_{t=0}^n [ J_t^{d,*} + E[θ_t m_t²] + b_t² ]
≥ min_{m_t = γ_t^e(x^t, r^{t−1})} (1/(n+1)) Σ_{t=0}^n [ J_{t,LB}^{d,*} + E[θ_t m_t²] + b_t² ]
= min_{m_t = γ_t^e(x^t, r^{t−1})} (1/(n+1)) Σ_{t=0}^n [ Σ*_{t|t−1}σ_{w_t}²σ_{v_t}² / (((P_t/Σ*_{t|t−1})σ_{w_t}² + σ_{v_t}²)Σ*_{t|t−1} + σ_{w_t}²σ_{v_t}²) + E[θ_t m_t²] + b_t² ].  (10)

Here, due to Theorem IV.2, the lower bound is achievable by a linear encoder, i.e., m_t = γ_t^e(x^t, r^{t−1}) = A_t(x_t − x̂_{t|t−1}), which implies P_t = E[m̃_t²] = E[(m_t − m̂_{t|t−1})²] = A_t²Σ_{t|t−1}. Then the optimization problem at the encoder becomes

J^e_{average total} = min_{A_t ≥ 0, t ∈ N_n} (1/(n+1)) Σ_{t=0}^n [ Σ*_{t|t−1}σ_{w_t}²σ_{v_t}² / ((A_t²σ_{w_t}² + σ_{v_t}²)Σ*_{t|t−1} + σ_{w_t}²σ_{v_t}²) + θ_t A_t²Σ*_{t|t−1} + b_t² ],  (11)

where Σ*_{0|−1} = σ_x². The solution is obtained recursively, forward in time, in the next theorem.

Theorem IV.3.
(Recursive solution, forward in time, of (11)) For given {θ_t ∈ (0, ∞) : t ∈ N_n} and Σ*_{0|−1} = σ_x², the solution of (11) is

J^{e,*}_{average total} = (1/(n+1)) Σ_{t=0}^n J_t^{e,*},  (12)

where {J_t^{e,*} : t = 0, 1, ..., n} are computed forward in time by

J_t^{e,*} = J_t^{d,*} + θ_t A_t^{2,*} Σ*_{t|t−1} + b_t²,  Σ*_{0|−1} = σ_x²,  (13)
J_t^{d,*} = Σ*_{t|t−1}σ_{w_t}²σ_{v_t}² / ((A_t^{2,*}σ_{w_t}² + σ_{v_t}²)Σ*_{t|t−1} + σ_{w_t}²σ_{v_t}²),  (14)

where A_t^{2,*} is computed according to the following decision rule:

A_t^{2,*} = √(σ_{v_t}²/(θ_t Σ*_{t|t−1})) − (σ_{v_t}²/Σ*_{t|t−1})(Σ*_{t|t−1}/σ_{w_t}² + 1),  if θ_t < Σ*_{t|t−1} / (σ_{v_t}²(Σ*_{t|t−1}/σ_{w_t}² + 1)²),
A_t^{2,*} = 0,  if θ_t ≥ Σ*_{t|t−1} / (σ_{v_t}²(Σ*_{t|t−1}/σ_{w_t}² + 1)²),  (15)

and {Σ*_{t|t−1} : t = 1, ..., n} are computed forward in time by

Σ*_{t|t−1} = β_{t−1}² J_{t−1}^{d,*} + σ_{n_{t−1}}².  (16)

Proof:
See Appendix G.

In Algorithm 1, we summarize the previous results by providing an iterative scheme to compute the multi-stage Stackelberg equilibrium.
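Algorithm 1 is a plain forward recursion through (16), (15), (14), (13), and finally (12). A compact sketch (ours; the function name and parameter values are hypothetical):

```python
import numpy as np

def multistage_stackelberg(sx2, beta, sn2, sv2, sw2, theta, b):
    """Forward recursion of Algorithm 1 / Theorem IV.3 (eqs. (12)-(16))."""
    n = len(sv2) - 1
    S_pred = sx2                                   # Sigma_{0|-1} = sigma_x^2
    Je, Jd = [], []
    for t in range(n + 1):
        thresh = S_pred / (sv2[t] * (S_pred / sw2[t] + 1)**2)
        if theta[t] < thresh:                      # decision rule (15)
            A2 = np.sqrt(sv2[t] / (theta[t] * S_pred)) \
                 - (sv2[t] / S_pred) * (S_pred / sw2[t] + 1)
        else:
            A2 = 0.0                               # transmitting is not worthwhile
        Jd_t = S_pred * sw2[t] * sv2[t] / ((A2 * sw2[t] + sv2[t]) * S_pred
                                           + sw2[t] * sv2[t])        # (14)
        Je_t = Jd_t + theta[t] * A2 * S_pred + b[t]**2               # (13)
        Jd.append(Jd_t)
        Je.append(Je_t)
        if t < n:
            S_pred = beta[t]**2 * Jd_t + sn2[t]    # (16): Sigma_{t+1|t}
    return np.mean(Je), np.mean(Jd)                # (12) and the decoder average

n = 4
Je_avg, Jd_avg = multistage_stackelberg(
    sx2=1.0,
    beta=0.9 * np.ones(n), sn2=0.2 * np.ones(n),
    sv2=0.5 * np.ones(n + 1), sw2=0.8 * np.ones(n + 1),
    theta=0.05 * np.ones(n + 1), b=0.3 * np.ones(n + 1))
assert 0.0 < Jd_avg < Je_avg
```

Note that A2 stores A_t^{2,*} directly, which is all that (13)-(14) require.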
V. NASH EQUILIBRIUM
In this section, we analyze the Nash equilibrium of the game between the encoder and the decoder. We consider only affine equilibria.
Theorem V.1.
Consider a single-stage scenario. (i)
If the encoder is affine, the optimal decoder is also affine. (ii)
If the decoder is affine, the optimal encoder is also affine. (iii)
For θ < σ_x² / (σ_v²(σ_x²/σ_w² + 1)²), there are two affine Nash equilibria. In particular, letting

A* ≜ √( √(σ_v²/(θσ_x²)) − σ_v²/σ_w² − σ_v²/σ_x² ),

the two sets of γ^e(x) = A x + C, γ^d(r) = K r + L, and the channel combining parameter α are characterized as follows:

Equilibrium 1:
  A = A*
  C = −αKb/θ
  K = √(θ/(σ_x²σ_v²)) (A*σ_x²σ_w² + σ_x²σ_v²) / σ_w²
  L = α²K²b/θ
  α = A*σ_w² / (A*σ_w² + σ_v²)

Equilibrium 2:
  A = −√( √(σ_v²/(θσ_x²)) − σ_v²/σ_x² )
  C = −Kb/θ
  K = −√( √(θσ_x²/σ_v²) − θ )
  L = K²b/θ
  α = 1

Furthermore, for any value of θ, the following also forms an affine Nash equilibrium: A = 0, C = 0, K = σ_x²/(σ_x² + σ_w²), L = 0, α = 0.

Algorithm 1 Multi-stage Stackelberg equilibrium
Initialize: Set Σ*_{0|−1} = σ_x²; choose {(β_t, σ_{n_t}²) : t ∈ N_{n−1}} of (3); choose {(σ_{w_t}², σ_{v_t}²) : t ∈ N_n}; choose {θ_t ∈ (0, ∞) : t ∈ N_n}.
for t = 0 : n do
  if t > 0 then
    Compute Σ*_{t|t−1} according to (16).
  end if
  Compute A_t^{2,*} according to (15).
  Compute J_t^{d,*} according to (14).
  Compute J_t^{e,*} according to (13).
end for
Compute J^{e,*}_{average total} according to (12).
Compute J^{d,*}_{average total} = (1/(n+1)) Σ_{t=0}^n J_t^{d,*}.

Proof:
See Appendix H.
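The last (non-informative) equilibrium of Theorem V.1 is easy to verify numerically: with α = 0 the decoder ignores the encoder's channel, so the encoder gains nothing (and only pays power) by transmitting, while K = σ_x²/(σ_x² + σ_w²) is the MMSE gain for z alone. A rough Monte Carlo check (ours; arbitrary parameters) of no unilateral profitable deviation over a grid of affine strategies:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50_000
sx2, sv2, sw2 = 1.0, 0.5, 0.8
theta, b = 0.05, 0.3

x = rng.normal(0, np.sqrt(sx2), N)
v = rng.normal(0, np.sqrt(sv2), N)
w = rng.normal(0, np.sqrt(sw2), N)

def enc_cost(A, C, K, L, alpha):
    m = A * x + C
    r = alpha * (m + v) + (1 - alpha) * (x + w)
    return np.mean((x - (K * r + L) - b)**2 + theta * m**2)

def dec_cost(A, C, K, L, alpha):
    m = A * x + C
    r = alpha * (m + v) + (1 - alpha) * (x + w)
    return np.mean((x - (K * r + L))**2)

# Candidate non-informative equilibrium of Theorem V.1.
K_eq = sx2 / (sx2 + sw2)
J_e_eq = enc_cost(0.0, 0.0, K_eq, 0.0, 0.0)
J_d_eq = dec_cost(0.0, 0.0, K_eq, 0.0, 0.0)

# No profitable unilateral deviation over a coarse grid of affine strategies.
for A in np.linspace(-2, 2, 9):
    for C in np.linspace(-1, 1, 5):
        assert enc_cost(A, C, K_eq, 0.0, 0.0) >= J_e_eq - 1e-9
for K in np.linspace(-1, 2, 31):
    for L in np.linspace(-0.5, 0.5, 11):
        for alpha in np.linspace(0, 1, 6):
            assert dec_cost(0.0, 0.0, K, L, alpha) >= J_d_eq - 1e-3
```

The encoder check is exact (its deviation cost differs from J_e_eq only by the power term θE[(Ax + C)²] ≥ 0); the decoder check uses a loose tolerance to absorb Monte Carlo noise.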
Remark V.1.
Similarly to the single-stage case, affine strategies constitute an invariant subspace under best response maps for multi-stage Nash equilibria. In particular, the first and second parts of Theorem V.1 can be extended to the multi-stage scenario. However, since the number of equations and unknowns increases quadratically, an explicit analysis as in the third part of Theorem V.1 becomes infeasible.
VI. CONCLUSION
In this paper, we studied Nash and Stackelberg equilibria of single-stage and multi-stage quadratic signaling games with channel combining and utilization at the decoder. We established qualitative (e.g., linearity and informativeness) and quantitative (explicit computation) properties of Nash and Stackelberg equilibria under misaligned objectives. Our model has many possible interesting extensions. Of particular interest are the case of a hard power constraint at the encoder and the analysis of steady-state equilibria. Scenarios with more general alternative channels/encoders, or with more general objective functions, are also under consideration.

APPENDIX A
PROOF OF THEOREM III.1

For the given encoder strategy m = γ^e(x) = A x, the decoder input is r = (αA + 1 − α)x + αv + (1 − α)w when the decoder adjusts the time-sharing parameter α of the channels. Then, the optimal decoder strategy is γ^d(r) = x̂ = E[x|r], which can be expressed as

γ^d(r) = [(αA + 1 − α)σ_x² / ((αA + 1 − α)²σ_x² + α²σ_v² + (1 − α)²σ_w²)] r ≜ K r.  (17)

Then, the corresponding decoder cost (namely, the estimation error) is

J^d = E[(x − K r)²] = ((αA + 1 − α)K − 1)²σ_x² + α²K²σ_v² + (1 − α)²K²σ_w².  (18)

The decoder minimizes J^d by adjusting both K and α. Let t ≜ αK and u ≜ (1 − α)K. Then, the decoder cost becomes J^d = (At + u − 1)²σ_x² + t²σ_v² + u²σ_w². Since the Hessian matrix

H = [ ∂²J^d/∂t²  ∂²J^d/∂t∂u ; ∂²J^d/∂u∂t  ∂²J^d/∂u² ]

is positive semi-definite, J^d is a convex function of t and u. Thus, at the optimal point, i.e., when ∂J^d/∂t = ∂J^d/∂u = 0, we obtain

α = Aσ_w² / (Aσ_w² + σ_v²)  and  K = (Aσ_x²σ_w² + σ_x²σ_v²) / (A²σ_x²σ_w² + σ_x²σ_v² + σ_w²σ_v²).

By inserting these into (18), we obtain J^d = σ_x² / (A²σ_x²/σ_v² + σ_x²/σ_w² + 1).

However, note that when A < 0, the optimal α lies outside its feasible region [0, 1]. Thus, the extreme values of this closed interval should be compared for A < 0. Let α = 0. Then, the optimal decoder is γ^d(r) = (σ_x²/(σ_x² + σ_w²)) r by (17), which results in the decoder cost J^d = σ_x²σ_w²/(σ_x² + σ_w²) by (18). Now let α = 1; i.e., a point-to-point communication scenario is considered. Then, the optimal decoder is γ^d(r) = (Aσ_x²/(A²σ_x² + σ_v²)) r by (17), which corresponds to the decoder cost J^d = σ_x²σ_v²/(A²σ_x² + σ_v²) by (18). Then, the following comparison can be made to find the optimal decoder for A < 0:

σ_x²σ_w²/(σ_x² + σ_w²) ⋚ σ_x²σ_v²/(A²σ_x² + σ_v²)  ⇔  A² ⋚ σ_v²/σ_w².  (19)

Hence, −√(σ_v²/σ_w²) ≤ A ≤ 0 corresponds to the case with α = 0, and A ≤ −√(σ_v²/σ_w²) corresponds to the case with α = 1. This completes the derivation.
A PPENDIX BP ROOF OF T HEOREM
III.2In the proof, we first obtain information theoretic lower bound on the estimation error, then show that this lower bound isachieved when the players jointly utilize linear strategies.Since the decoder’s received signal is r = α ( m + v ) + (1 − α )( x + w ) , its power can be expressed as E [ r ] = α P + (1 − α ) σ x + 2 α (1 − α ) E [ mx ] (cid:124) (cid:123)(cid:122) (cid:125) signal power + α σ v + (1 − α ) σ w (cid:124) (cid:123)(cid:122) (cid:125) noise power . Since the channels are additive Gaussian, a corresponding (combined) channel/information capacity C between x and r canbe represented as C = sup I ( x ; r ) = 12 log (cid:18) α P + (1 − α ) σ x + 2 α (1 − α ) E [ mx ] α σ v + (1 − α ) σ w (cid:19) . (20)Then, the lower bound on the estimation error can be derived as follows: I ( x ; r ) = h ( x ) − h ( x | r ) = h ( x ) − h ( x − E [ x | r ] | r ) ≥ h ( x ) − h ( x − E [ x | r ]) ( a ) ≥
12 log (2 π e σ x ) −
12 log (2 π e J d ) ⇒ I ( x ; r ) ≥
12 log ( σ x J d ) ⇒ J d ≥ σ x − I ( x ; r ) ≥ σ x − I ( x ; r ) ( b ) = σ x − log (cid:0) α P +(1 − α )2 σ x +2 α (1 − α ) E [ mx ] α σ v +(1 − α )2 σ w (cid:1) = σ x α P +(1 − α ) σ x +2 α (1 − α ) E [ mx ] α σ v +(1 − α ) σ w ( c ) ≥ σ x Pσ v + σ x σ w + 1 . (21)Here, (a) holds since the differential entropy is h ( x ) = log (2 π e σ x ) for a Gaussian source x and maximum h ( x − E [ x | r ]) is achieved when x − E [ x | r ] is Gaussian, (b) follows from (20), and (c) holds for < α < due to the following inequalities: α P + (1 − α ) σ x + 2 α (1 − α ) E [ mx ] α σ v + (1 − α ) σ w ( a ) ≤ α P + (1 − α ) σ x + 2 α (1 − α ) (cid:112) P σ x α σ v + (1 − α ) σ w ( b ) ≤ α Pα σ v + (1 − α ) σ x (1 − α ) σ w , where (a) holds due to the Cauchy-Schwarz inequality and (b) holds since ( √ a + √ b ) c + d ≤ ac + bd for positive a, b, c, d with a = α P , b = (1 − α ) σ x , c = α σ v , and d = (1 − α ) σ w . Note that ac + bd − ( √ a + √ b ) c + d = ad + bccd − a + b +2 √ abc + d = ad + bc − √ abcdcd ( c + d ) = ( √ ad −√ bc ) cd ( c + d ) ≥ . n (21), the first inequality is tight iff x and r are jointly Gaussian, which is satisfied for a linear encoder, whereas the secondinequality (i.e., (c) of (21)) holds with equality for < α < when √ ad = √ bc ⇒ α √ P (1 − α ) σ w = (1 − α ) σ x α σ v ⇒√ P = α − α σ x σ v σ w . Since α = Aσ w Aσ w + σ v for a linear encoder m = γ e ( x ) = A x as shown in Theorem III.1, we obtain E [ m ] = P = A σ x , which is consistent with a linear encoder case. Note that for α = 0 , (c) in (21) reduces to σ x σ x σ w +1 with equality, and for α = 1 , (c) in (21) reduces to σ x Pσ v +1 with equality. Thus, the information theoretic lower bound on theestimation error is σ x Pσ v + σ x σ w +1 and it is achievable only for jointly linear encoder and decoder with A > and < α < . Thiscompletes the derivation. A PPENDIX CP ROOF OF T HEOREM
APPENDIX C
PROOF OF THEOREM III.3

Due to Theorem III.2 and Remark III.2, the encoder cost is lower bounded by

$$J^e \geq \frac{\sigma_x^2}{\frac{P}{\sigma_v^2} + \frac{\sigma_x^2}{\sigma_w^2} + 1} + \theta P + b^2 \triangleq J^e_{LB},$$

where $P \triangleq E[m^2]$. Note that the lower bound $J^e_{LB}$ is achievable when the encoder uses a linear strategy; i.e.,

$$\min_P J^e_{LB} = \min_P \frac{\sigma_x^2}{\frac{P}{\sigma_v^2} + \frac{\sigma_x^2}{\sigma_w^2} + 1} + \theta P. \tag{22}$$

The first and second order derivatives of the lower bound $J^e_{LB}$ are

$$\frac{dJ^e_{LB}}{dP} = -\frac{\sigma_x^2/\sigma_v^2}{\left(\frac{P}{\sigma_v^2} + \frac{\sigma_x^2}{\sigma_w^2} + 1\right)^2} + \theta \geq \theta - \frac{\sigma_x^2/\sigma_v^2}{\left(\frac{\sigma_x^2}{\sigma_w^2} + 1\right)^2}, \qquad \frac{d^2J^e_{LB}}{dP^2} = \frac{2\sigma_x^2/\sigma_v^4}{\left(\frac{P}{\sigma_v^2} + \frac{\sigma_x^2}{\sigma_w^2} + 1\right)^3} > 0.$$

If $\theta \geq \frac{\sigma_x^2/\sigma_v^2}{(\sigma_x^2/\sigma_w^2 + 1)^2}$, then $\frac{dJ^e_{LB}}{dP} \geq 0$, which implies that $J^e_{LB}$ is an increasing function of $P$; thus $P$ should be selected as $P = 0$, i.e., the encoder does not transmit any message.

Otherwise, i.e., if $\theta < \frac{\sigma_x^2/\sigma_v^2}{(\sigma_x^2/\sigma_w^2 + 1)^2}$, the lower bound is minimized at the critical point $\frac{dJ^e_{LB}}{dP} = 0$, which implies

$$P^* = \sigma_v^2\left(\sqrt{\frac{\sigma_x^2}{\theta\sigma_v^2}} - \frac{\sigma_x^2}{\sigma_w^2} - 1\right).$$

Since $J^e_{LB}$ is achievable for a linear encoder $m = \gamma^e(x) = Ax$ and $P^* \triangleq E[(A^*x)^2] = (A^*)^2\sigma_x^2$, the optimal $A$ is obtained as

$$A^* = \sqrt{\frac{\sigma_v^2}{\sigma_x^2}\left(\sqrt{\frac{\sigma_x^2}{\theta\sigma_v^2}} - \frac{\sigma_x^2}{\sigma_w^2} - 1\right)}.$$

This completes the derivation.
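The two regimes in this proof (silent encoder versus an interior optimum) can be checked numerically. The sketch below is illustrative only (parameter values are assumptions, not from the paper); it compares the closed-form critical point $P^*$ with a grid minimization of (22).

```python
import numpy as np

# Encoder power trade-off of (22); all numeric values are illustrative assumptions.
sx2, sv2, sw2, theta = 2.0, 1.0, 3.0, 0.05

def J_LB(P):
    # lower bound on the encoder cost, up to the constant b^2
    return sx2 / (P / sv2 + sx2 / sw2 + 1) + theta * P

theta_prime = (sx2 / sv2) / (sx2 / sw2 + 1)**2           # silence threshold
assert theta < theta_prime                               # transmission regime
P_star = sv2 * (np.sqrt(sx2 / (theta * sv2)) - sx2 / sw2 - 1)  # closed-form critical point
grid = np.linspace(0.0, 20.0, 200001)
P_num = grid[np.argmin(J_LB(grid))]
assert abs(P_star - P_num) < 1e-3                        # closed form matches grid search
```

For $\theta \geq \theta'$ the same grid search returns $P = 0$, matching the no-transmission case.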
APPENDIX D
PROOF OF THEOREM IV.1

From the system in Fig. 2, we know that the observations process $\{r_t : t \in \mathbb{N}_n\}$ is given by

$$r_t = \alpha_t y_t + (1-\alpha_t)z_t \stackrel{(i)}{=} \alpha_t(\gamma^e_t(x_t) + v_t) + (1-\alpha_t)(x_t + w_t) = \alpha_t(A_t x_t + v_t) + (1-\alpha_t)(x_t + w_t) = (\alpha_t A_t + 1 - \alpha_t)x_t + \alpha_t v_t + (1-\alpha_t)w_t, \quad t \in \mathbb{N}_n, \tag{23}$$

where (i) follows from the realization in Fig. 2 and (4). Moreover, since the minimum error at the decoder at each instant of time is $E[(x_t - \hat{x}_{t|t})^2]$, the decoder's cost can be modified as follows:

$$J^d_t = E[(x_t - \hat{x}_{t|t})^2] \stackrel{(i)}{=} E[(x_t - \hat{x}_{t|t-1} - k_t(r_t - (\alpha_t A_t + 1 - \alpha_t)\hat{x}_{t|t-1}))^2]$$
$$\stackrel{(ii)}{=} E[(x_t - \hat{x}_{t|t-1} - k_t((\alpha_t A_t + 1 - \alpha_t)(x_t - \hat{x}_{t|t-1}) + \alpha_t v_t + (1-\alpha_t)w_t))^2]$$
$$= E[((1 - k_t(\alpha_t A_t + 1 - \alpha_t))(x_t - \hat{x}_{t|t-1}) - k_t\alpha_t v_t - (1-\alpha_t)k_t w_t)^2]$$
$$= (1 - k_t(\alpha_t A_t + 1 - \alpha_t))^2\Sigma_{t|t-1} + k_t^2\alpha_t^2\sigma_{v_t}^2 + (1-\alpha_t)^2 k_t^2\sigma_{w_t}^2, \tag{24}$$

where (i) follows from (6); (ii) follows by substituting in our expression (23) and after some simple calculations.

Remark D.1. The decoder's cost in (24), although written in a different form, is precisely $\Sigma_{t|t} \geq 0$, because the conditional variance $\Sigma_{t|t}$ is equal to the unconditional one in the KF algorithm.

Optimization Problem. We will solve the decoder's optimization problem in (5) forward in time, starting at time stage zero and moving forward to a fixed time stage $n$. To do it, we first re-formulate it as follows:

$$J^d_{\text{average total}} = \frac{1}{n+1}\min_{\alpha_n}\left\{\min_{\alpha_{n-1}}\left\{\ldots\min_{\alpha_1}\left\{\left\{\min_{\alpha_0} J^d_0\right\} + J^d_1\right\} + \ldots + J^d_{n-1}\right\} + J^d_n\right\}, \tag{25}$$

where

$$J^d_0 = (1 - k_0(\alpha_0 A_0 + 1 - \alpha_0))^2\Sigma_{0|-1} + k_0^2\alpha_0^2\sigma_{v_0}^2 + (1-\alpha_0)^2 k_0^2\sigma_{w_0}^2, \quad \Sigma_{0|-1} = \sigma_x^2, \tag{26}$$
$$J^d_t = (1 - k_t(\alpha_t A_t + 1 - \alpha_t))^2\Sigma_{t|t-1} + k_t^2\alpha_t^2\sigma_{v_t}^2 + (1-\alpha_t)^2 k_t^2\sigma_{w_t}^2. \tag{27}$$

We first consider $t = 0$. Using the formulation in (25), we want to optimize

$$\min_{\alpha_0} J^d_0. \tag{28}$$

Observe from (24) that optimizing w.r.t. $\alpha_0$ is the same as optimizing w.r.t. $(k_0, \alpha_0)$, because $k_0$ depends on $\alpha_0$. Hence, we can re-write (28) as

$$\min_{\alpha_0, k_0}\ (1 - k_0(\alpha_0 A_0 + 1 - \alpha_0))^2\Sigma^*_{0|-1} + k_0^2\alpha_0^2\sigma_{v_0}^2 + (1-\alpha_0)^2 k_0^2\sigma_{w_0}^2, \quad \Sigma^*_{0|-1} = \sigma_x^2. \tag{29}$$

To solve (29), we first show that it is convex. To do it, we introduce the auxiliary variables

$$\phi_0 = \alpha_0 k_0, \qquad \upsilon_0 = (1-\alpha_0)k_0. \tag{30}$$

For the choice of (30), (29) can be simplified to:

$$\min_{\phi_0, \upsilon_0}\ (1 - A_0\phi_0 - \upsilon_0)^2\sigma_x^2 + \phi_0^2\sigma_{v_0}^2 + \upsilon_0^2\sigma_{w_0}^2. \tag{31}$$

The Hessian matrix that corresponds to the objective function of (31), hereinafter denoted by $H$, can be found as follows:

$$\frac{\partial J^d_0}{\partial\phi_0} = 2(A_0\phi_0 + \upsilon_0 - 1)A_0\sigma_x^2 + 2\phi_0\sigma_{v_0}^2, \qquad \frac{\partial^2 J^d_0}{\partial\phi_0^2} = 2A_0^2\sigma_x^2 + 2\sigma_{v_0}^2,$$
$$\frac{\partial J^d_0}{\partial\upsilon_0} = 2(A_0\phi_0 + \upsilon_0 - 1)\sigma_x^2 + 2\upsilon_0\sigma_{w_0}^2, \qquad \frac{\partial^2 J^d_0}{\partial\upsilon_0^2} = 2\sigma_x^2 + 2\sigma_{w_0}^2,$$
$$\frac{\partial^2 J^d_0}{\partial\upsilon_0\partial\phi_0} = \frac{\partial^2 J^d_0}{\partial\phi_0\partial\upsilon_0} = 2A_0\sigma_x^2. \tag{32}$$

Based on (32), the Hessian matrix $H$ is given as follows:

$$H = \begin{bmatrix} \frac{\partial^2 J^d_0}{\partial\phi_0^2} & \frac{\partial^2 J^d_0}{\partial\upsilon_0\partial\phi_0} \\ \frac{\partial^2 J^d_0}{\partial\phi_0\partial\upsilon_0} & \frac{\partial^2 J^d_0}{\partial\upsilon_0^2} \end{bmatrix} = \begin{bmatrix} 2A_0^2\sigma_x^2 + 2\sigma_{v_0}^2 & 2A_0\sigma_x^2 \\ 2A_0\sigma_x^2 & 2\sigma_x^2 + 2\sigma_{w_0}^2 \end{bmatrix}. \tag{33}$$

It can be easily checked that for any $A_0$, the eigenvalues of $H$ are non-negative, hence the matrix is positive semi-definite. This in turn implies that $J^d_0$ is jointly convex in $(\phi_0, \upsilon_0)$. Therefore, the optimal solution $\alpha_0^*$ is as follows. Dividing the two first-order conditions yields

$$\frac{\partial J^d_0}{\partial\phi_0} = \frac{\partial J^d_0}{\partial\upsilon_0} = 0 \stackrel{(32)}{\Rightarrow} \frac{\sigma_{v_0}^2}{A_0\sigma_{w_0}^2} = \frac{\upsilon_0}{\phi_0} \stackrel{(30)}{=} \frac{(1-\alpha_0^*)k_0}{\alpha_0^* k_0} \Rightarrow \alpha_0^* = \frac{A_0\sigma_{w_0}^2}{A_0\sigma_{w_0}^2 + \sigma_{v_0}^2} \stackrel{(6)}{\Rightarrow} k_0^* = \frac{(A_0\sigma_{w_0}^2 + \sigma_{v_0}^2)\sigma_x^2}{(A_0^2\sigma_{w_0}^2 + \sigma_{v_0}^2)\sigma_x^2 + \sigma_{w_0}^2\sigma_{v_0}^2}. \tag{34}$$

Substituting $(\alpha_0^*, k_0^*)$ obtained in (34) into (29), we obtain

$$J^{d,*}_0 = \frac{\Sigma^*_{0|-1}\sigma_{w_0}^2\sigma_{v_0}^2}{(A_0^2\sigma_{w_0}^2 + \sigma_{v_0}^2)\Sigma^*_{0|-1} + \sigma_{w_0}^2\sigma_{v_0}^2}.$$

Similar to the single-stage case, note that when $A_0 < 0$, the optimal $\alpha_0^*$ lies outside the feasible region $[0, 1]$. Thus, the extreme values of this closed interval should be compared for $A_0 < 0$. This is done next.

Let $\alpha_0^* = 0$. Then, the optimal decoder is $\gamma^d_0(r_0) = \frac{\Sigma_{0|-1}}{\Sigma_{0|-1} + \sigma_{w_0}^2}\,r_0$ by (6), which means that the decoder's cost becomes $J^{d,*}_0 = \frac{\Sigma_{0|-1}\sigma_{w_0}^2}{\Sigma_{0|-1} + \sigma_{w_0}^2}$, again from (6).

Now let $\alpha_0^* = 1$, i.e., a point-to-point communication scenario is considered without side information. Then, the optimal decoder is $\gamma^d_0(r_0) = \frac{A_0\Sigma_{0|-1}}{A_0^2\Sigma_{0|-1} + \sigma_{v_0}^2}\,r_0$ by (6), which yields a decoder's cost $J^{d,*}_0 = \frac{\Sigma_{0|-1}\sigma_{v_0}^2}{A_0^2\Sigma_{0|-1} + \sigma_{v_0}^2}$ by (6).

Next, we proceed to $t = 1$. Again, using the formulation in (25), this corresponds to the optimization problem

$$\min_{\alpha_1}\left\{J^{d,*}_0 + J^d_1\right\} \stackrel{(a)}{\equiv} \min_{\alpha_1} J^d_1, \tag{35}$$

where (a) follows because $J^{d,*}_0$ is a constant, as it is already optimized in time stage $t = 0$. Observe from (24) that optimizing w.r.t. $\alpha_1$ is the same as optimizing w.r.t. $(k_1, \alpha_1)$, because $k_1$ depends on $\alpha_1$. Hence, we can re-write (35) as

$$\min_{\alpha_1, k_1}\ (1 - k_1(\alpha_1 A_1 + 1 - \alpha_1))^2\Sigma^*_{1|0} + k_1^2\alpha_1^2\sigma_{v_1}^2 + (1-\alpha_1)^2 k_1^2\sigma_{w_1}^2, \tag{36}$$

where $\Sigma^*_{1|0} = \beta^2\Sigma^*_{0|0} + \sigma_n^2$ by (6), hence it is independent of $\alpha_1$. The latter observation stems from the fact that $\Sigma_{0|0} = J^d_0$ (see Remark D.1). Therefore, the procedure is precisely the same as in time stage $t = 0$, with $\sigma_x^2$ replaced by $\Sigma^*_{1|0}$. The final result is given in Table II for $t = 1$.

Suppose that for $t = n-1$, the solution is given by $J^{d,*}_{n-1}$ in Table II. Then, for $t = n$, following the approach of time stage $t = 0$, the solution will be given by $J^{d,*}_n$ in Table II.

Clearly, the average total cost of the decoder in (5) is the average over all time stages of the individual optimal decoder costs. This completes the derivation.
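The forward recursion behind Table II can be exercised numerically. The sketch below is an assumption-laden illustration, not the paper's code: the source parameters $\beta$, $\sigma_n^2$ and the per-stage gains $A_t$ are arbitrary choices. It evaluates the closed-form per-stage cost and cross-checks it against a scalar Kalman measurement update on the combined observation $r_t = c\,x_t + \text{noise}$.

```python
import numpy as np

# Assumed Gauss-Markov source x_{t+1} = beta x_t + n_t; all values are illustrative.
beta, s_n2 = 0.9, 0.5
sx2, sv2, sw2 = 2.0, 1.0, 3.0
A = [1.2, 0.8, 1.5, 1.0]          # per-stage encoder gains (assumed)

Sigma_pred = sx2                  # Sigma_{0|-1}
posts = []
for At in A:
    alpha = At * sw2 / (At * sw2 + sv2)     # optimal combining ratio (Table II form)
    closed = Sigma_pred * sw2 * sv2 / ((At**2 * sw2 + sv2) * Sigma_pred + sw2 * sv2)
    # cross-check via a scalar Kalman update: r_t = c x_t + noise of variance R
    c = alpha * At + 1 - alpha
    R = alpha**2 * sv2 + (1 - alpha)**2 * sw2
    k = c * Sigma_pred / (c**2 * Sigma_pred + R)
    kf = (1 - k * c)**2 * Sigma_pred + k**2 * R
    assert np.isclose(closed, kf)           # (24) equals the closed-form Table II cost
    posts.append(closed)
    Sigma_pred = beta**2 * closed + s_n2    # time update
```

The assertion inside the loop confirms that the cost (24) with the optimal $(\alpha_t^*, k_t^*)$ coincides with the closed-form posterior variance at every stage.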
APPENDIX E
PROOF OF PROPOSITION IV.1

Observe that if the linear encoder is of the class (8), then, by definition, the innovations process is obtained as follows:

$$I_t \triangleq r_t - E[r_t \mid r^{t-1}], \quad t \in \mathbb{N}_n$$
$$= \alpha_t A_t(x_t - \hat{x}_{t|t-1}) + (1-\alpha_t)x_t + \alpha_t v_t + (1-\alpha_t)w_t - E[\alpha_t A_t(x_t - \hat{x}_{t|t-1}) + (1-\alpha_t)x_t + \alpha_t v_t + (1-\alpha_t)w_t \mid r^{t-1}]$$
$$\stackrel{(i)}{=} \alpha_t A_t(x_t - \hat{x}_{t|t-1}) + (1-\alpha_t)x_t + \alpha_t v_t + (1-\alpha_t)w_t - \alpha_t A_t\hat{x}_{t|t-1} + \alpha_t A_t E[\hat{x}_{t|t-1} \mid r^{t-1}] - (1-\alpha_t)\hat{x}_{t|t-1}$$
$$\stackrel{(ii)}{=} \alpha_t A_t(x_t - \hat{x}_{t|t-1}) + (1-\alpha_t)x_t + \alpha_t v_t + (1-\alpha_t)w_t - (1-\alpha_t)\hat{x}_{t|t-1} = \text{innovations in (6)}, \tag{37}$$

where (i) follows from the fact that the expectation is a linear operator, $C_t$ is a constant, and the noise processes $\{v_t : t \in \mathbb{N}_n\}$ and $\{w_t : t \in \mathbb{N}_n\}$ are zero-mean mutually independent processes independent of everything else; (ii) follows from the tower property of conditional expectation, or simply because $\hat{x}_{t|t-1}$ is $r^{t-1}$-measurable. Since the innovations process generates the same information as with a linear memoryless encoder, the results obtained in Table II also apply to this class of linear encoders.
APPENDIX F
PROOF OF THEOREM IV.2

The proof is obtained by first deriving an information theoretic lower bound on the estimation error, and then showing that this lower bound is achievable when the players jointly utilize linear strategies.

Since the decoder's received signal at each instant of time is $r_t = \alpha_t(m_t + v_t) + (1-\alpha_t)(x_t + w_t)$, the conditional mean and conditional variance (power) of $\{r_t : t \in \mathbb{N}_n\}$ are as follows:

$$E[r_t \mid r^{t-1}] = \alpha_t\hat{m}_{t|t-1} + (1-\alpha_t)\hat{x}_{t|t-1}, \tag{38}$$
$$E[(r_t - E[r_t \mid r^{t-1}])^2 \mid r^{t-1}] \equiv E[(r_t - E[r_t \mid r^{t-1}])^2] = \underbrace{\alpha_t^2 P_t + (1-\alpha_t)^2\Sigma_{t|t-1} + 2\alpha_t(1-\alpha_t)E[(m_t - \hat{m}_{t|t-1})(x_t - \hat{x}_{t|t-1})]}_{\text{signal power}} + \underbrace{\alpha_t^2\sigma_{v_t}^2 + (1-\alpha_t)^2\sigma_{w_t}^2}_{\text{Gaussian noise power}}. \tag{39}$$

Next, we give the information theoretic characterization of the average total feedback capacity and the corresponding information feedback capacity per time instant between $\{x_t : t \in \mathbb{N}_n\}$ and $\{r_t : t \in \mathbb{N}_n\}$, denoted hereinafter by $C^{fb}_{\text{average total}}(\{P_t\}_{t=0}^n)$ and $C^{fb}_t(P_t)$, respectively:

$$C^{fb}_{\text{average total}}(\{P_t\}_{t=0}^n) = \sup_{E[(m_t - \hat{m}_{t|t-1})^2] = P_t,\,\forall t} \frac{1}{n+1}\, I(x^n \to r^n) \stackrel{(i)}{=} \sup_{E[(m_t - \hat{m}_{t|t-1})^2] = P_t,\,\forall t} \frac{1}{n+1}\sum_{t=0}^n I(x^t; r_t \mid r^{t-1})$$
$$= \sup_{E[(m_t - \hat{m}_{t|t-1})^2] = P_t,\,\forall t} \frac{1}{n+1}\sum_{t=0}^n \left[h(r_t \mid r^{t-1}) - h(r_t \mid r^{t-1}, x^t)\right] \stackrel{(ii)}{=} \sup_{E[(m_t - \hat{m}_{t|t-1})^2] = P_t,\,\forall t} \frac{1}{n+1}\sum_{t=0}^n \left[h^G(r_t \mid r^{t-1}) - h^G(r_t \mid r^{t-1}, x^t)\right] \stackrel{(iii)}{=} \frac{1}{n+1}\sum_{t=0}^n C^{fb}_t(P_t), \tag{40}$$

where

$$C^{fb}_t(P_t) = \frac{1}{2}\log\left(1 + \frac{\alpha_t^2 P_t + (1-\alpha_t)^2\Sigma_{t|t-1} + 2\alpha_t(1-\alpha_t)E[(m_t - \hat{m}_{t|t-1})(x_t - \hat{x}_{t|t-1})]}{\alpha_t^2\sigma_{v_t}^2 + (1-\alpha_t)^2\sigma_{w_t}^2}\right), \quad t \in \mathbb{N}_n, \tag{41}$$

$h(\cdot|\cdot) < \infty$ is the conditional differential entropy that is assumed to be finite; (i) follows by definition of directed information [21]; (ii) follows because the noise is additive Gaussian; (iii) follows because $h^G(r_t \mid r^{t-1})$ can be computed from (39) and $h^G(r_t \mid r^{t-1}, x^t) = \frac{1}{2}\log(2\pi e)\left(\alpha_t^2\sigma_{v_t}^2 + (1-\alpha_t)^2\sigma_{w_t}^2\right)$ for each time instant.

Next, we describe an interesting structural result of both $C^{fb}_{\text{average total}}(\{P_t\}_{t=0}^n)$ and $C^{fb}_t(P_t)$.

Proposition F.1. (Structural result) Define the following information characterization of the information feedback capacity:

$$\bar{C}^{fb}_{\text{average total}}(\{P_t\}_{t=0}^n) = \sup_{E[(m_t - \hat{m}_{t|t-1})^2] = P_t,\,\forall t} \frac{1}{n+1}\sum_{t=0}^n I(x_t; r_t \mid r^{t-1}). \tag{42}$$

Then, for the same $\{x_t : t \in \mathbb{N}_n\}$ and $\{r_t : t \in \mathbb{N}_n\}$ used to obtain $C^{fb}_{\text{average total}}(\{P_t\}_{t=0}^n)$ and $C^{fb}_t(P_t)$, we have that $\bar{C}^{fb}_{\text{average total}}(\{P_t\}_{t=0}^n) = C^{fb}_{\text{average total}}(\{P_t\}_{t=0}^n) = \frac{1}{n+1}\sum_{t=0}^n C^{fb}_t(P_t)$, where $C^{fb}_t(P_t)$ is given by (41) for any $t$.

Proof. This follows by computing $\bar{C}^{fb}_{[0,n]}$ at each instant of time. Recall that the conditional variance is equivalent to the unconditional one for jointly Gaussian processes.

Next, we derive the lower bound on the average total estimation error. Before doing it, we first consider a lower bound on the estimation error at each time instant, obtained forward in time. To do it, we consider the following inequality:

$$I(x^n \to r^n) = \sum_{t=0}^n I(x^t; r_t \mid r^{t-1}) \stackrel{(*)}{\geq} \sum_{t=0}^n I(x_t; r_t \mid r^{t-1}), \tag{43}$$

where (*) follows by definition of directed information. Observe that per time instant, the following series of inequalities hold:

$$I(x_t; r_t \mid r^{t-1}) = h(x_t \mid r^{t-1}) - h(x_t \mid r^t) = h(x_t \mid r^{t-1}) - h(x_t - E[x_t \mid r^t] \mid r^t) \stackrel{(\star)}{\geq} h(x_t \mid r^{t-1}) - h(x_t - E[x_t \mid r^t]) \stackrel{(\star\star)}{\geq} \frac{1}{2}\log(2\pi e\,\Sigma_{t|t-1}) - \frac{1}{2}\log(2\pi e\,J^d_t) = \frac{1}{2}\log\left(\frac{\Sigma_{t|t-1}}{J^d_t}\right), \quad \Sigma_{0|-1} = \sigma_x^2,\ \forall t,$$
$$\Rightarrow J^d_t \geq \Sigma_{t|t-1}\,2^{-2I(x_t; r_t \mid r^{t-1})} \stackrel{(\star\star\star)}{\geq} \Sigma_{t|t-1}\,2^{-2C^{fb}_t} \stackrel{(\star\star\star\star)}{=} \frac{\Sigma_{t|t-1}}{1 + \frac{\alpha_t^2 P_t + (1-\alpha_t)^2\Sigma_{t|t-1} + 2\alpha_t(1-\alpha_t)E[(m_t - \hat{m}_{t|t-1})(x_t - \hat{x}_{t|t-1})]}{\alpha_t^2\sigma_{v_t}^2 + (1-\alpha_t)^2\sigma_{w_t}^2}} \stackrel{(\star\star\star\star\star)}{\geq} \text{eq. (9)}, \quad \text{for any } t, \tag{44}$$

where $(\star)$ follows because conditioning reduces entropy; $(\star\star)$ follows because the source process is Gauss-Markov driven by additive Gaussian noise, whereas $h(x_t - \hat{x}_{t|t})$ is maximized if and only if $h(x_t - \hat{x}_{t|t}) = h^G(x_t - \hat{x}_{t|t})$; $(\star\star\star)$ follows because $I(x_t; r_t \mid r^{t-1}) \leq \sup_{E[(m_t - \hat{m}_{t|t-1})^2] = P_t} I(x_t; r_t \mid r^{t-1})$ for any $t$; $(\star\star\star\star)$ follows from Proposition F.1 and (41); $(\star\star\star\star\star)$ is obtained using the following series of inequalities:

$$\frac{\alpha_t^2 P_t + (1-\alpha_t)^2\Sigma_{t|t-1} + 2\alpha_t(1-\alpha_t)E[(m_t - \hat{m}_{t|t-1})(x_t - \hat{x}_{t|t-1})]}{\alpha_t^2\sigma_{v_t}^2 + (1-\alpha_t)^2\sigma_{w_t}^2} \stackrel{(p_1)}{\leq} \frac{\alpha_t^2 P_t + (1-\alpha_t)^2\Sigma_{t|t-1} + 2\alpha_t(1-\alpha_t)\sqrt{P_t\Sigma_{t|t-1}}}{\alpha_t^2\sigma_{v_t}^2 + (1-\alpha_t)^2\sigma_{w_t}^2} \stackrel{(p_2)}{\leq} \frac{\alpha_t^2 P_t}{\alpha_t^2\sigma_{v_t}^2} + \frac{(1-\alpha_t)^2\Sigma_{t|t-1}}{(1-\alpha_t)^2\sigma_{w_t}^2} = \frac{P_t}{\sigma_{v_t}^2} + \frac{\Sigma_{t|t-1}}{\sigma_{w_t}^2},$$

where $(p_1)$ holds due to the Cauchy-Schwarz inequality; $(p_2)$ holds because of the inequality in the derivation of Theorem III.2, i.e., $\frac{\eta_t}{\upsilon_t} + \frac{\theta_t}{\phi_t} \geq \frac{(\sqrt{\eta_t} + \sqrt{\theta_t})^2}{\upsilon_t + \phi_t}$ for positive $\eta_t, \theta_t, \upsilon_t, \phi_t$ with $\eta_t = \alpha_t^2 P_t$, $\theta_t = (1-\alpha_t)^2\Sigma_{t|t-1}$, $\upsilon_t = \alpha_t^2\sigma_{v_t}^2$, and $\phi_t = (1-\alpha_t)^2\sigma_{w_t}^2$.

In (44), the first inequality holds with equality if and only if $(x^n, r^n)$ are jointly Gaussian, which is the case when the encoder is linear with noiseless feedback; $(\star\star\star\star\star)$ holds with equality for $0 < \alpha_t < 1$ when

$$\sqrt{\eta_t}\,\phi_t = \sqrt{\theta_t}\,\upsilon_t \Rightarrow \alpha_t\sqrt{P_t}(1-\alpha_t)^2\sigma_{w_t}^2 = (1-\alpha_t)\sqrt{\Sigma_{t|t-1}}\,\alpha_t^2\sigma_{v_t}^2 \Rightarrow \sqrt{P_t} = \frac{\alpha_t}{1-\alpha_t}\,\frac{\sqrt{\Sigma_{t|t-1}}\,\sigma_{v_t}^2}{\sigma_{w_t}^2}$$

for any $t$. Since from Proposition IV.1 we showed that $\alpha_t^* = \frac{A_t^*\sigma_{w_t}^2}{A_t^*\sigma_{w_t}^2 + \sigma_{v_t}^2}$ (from Table II) for an innovations encoder $\gamma^e_t(x_t, r^{t-1}) = A_t(x_t - \hat{x}_{t|t-1})$, we obtain $E[(m_t - \hat{m}_{t|t-1})^2] = P_t = A_t^2\Sigma_{t|t-1}$, which is consistent with a linear encoder with noiseless feedback (innovations encoder). Note that for $\alpha_t = 0$, inequality $(\star\star\star\star\star)$ in (44) reduces to $\frac{\Sigma_{t|t-1}}{\Sigma_{t|t-1}/\sigma_{w_t}^2 + 1}$, which also holds with equality, and for $\alpha_t = 1$, $(\star\star\star\star\star)$ in (44) reduces to $\frac{\Sigma_{t|t-1}}{P_t/\sigma_{v_t}^2 + 1}$, which also holds with equality. Thus, the information theoretic lower bound on the estimation error at each instant of time is given by (9), and it is achievable at each instant of time only for a jointly linear encoder and decoder with $A_t > 0$ and $0 < \alpha_t < 1$.

Thus, we have proved that at each instant of time, going forward in time, the information theoretic lower bound on the estimation error is achievable only for a jointly linear encoder and decoder. The final result is obtained once we take the average total value of the estimation error at each instant of time. This completes the derivation.
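The achievability claim can be spot-checked in closed form: with the innovations power $P_t = A_t^2\Sigma_{t|t-1}$, the Table II cost coincides with the per-stage bound in (9). A small sketch follows (numeric values are arbitrary illustrations, not from the paper).

```python
import numpy as np

# With P_t = A_t^2 Sigma_{t|t-1}, the Table II cost equals the per-stage bound (9).
sv2, sw2 = 1.0, 3.0                    # channel noise variances (illustrative)
for Sigma in [2.0, 0.7, 1.3]:          # candidate Sigma_{t|t-1} values
    for At in [0.5, 1.0, 2.0]:
        Pt = At**2 * Sigma             # innovations-encoder power
        bound = Sigma / (Pt / sv2 + Sigma / sw2 + 1)
        achieved = Sigma * sw2 * sv2 / ((At**2 * sw2 + sv2) * Sigma + sw2 * sv2)
        assert np.isclose(bound, achieved)
```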
APPENDIX G
PROOF OF THEOREM IV.3

We observe that the optimization variables of interest in (11) are $\{A_t : t \in \mathbb{N}_n\}$; hence we can introduce the decision variables $\{\mu_t = A_t^2 : t \in \mathbb{N}_n\}$, which are non-negative. Hence, (11) can be cast as follows:

$$J^e_{\text{average total}} = \min_{\mu_t \geq 0,\ t \in \mathbb{N}_n} \frac{1}{n+1}\sum_{t=0}^n \left[\frac{\Sigma^*_{t|t-1}\sigma_{w_t}^2\sigma_{v_t}^2}{(\mu_t\sigma_{w_t}^2 + \sigma_{v_t}^2)\Sigma^*_{t|t-1} + \sigma_{w_t}^2\sigma_{v_t}^2} + \theta_t\mu_t\Sigma_{t|t-1} + b_t^2\right], \quad \Sigma^*_{0|-1} = \sigma_x^2. \tag{45}$$

To solve (45), we employ Lagrange multipliers and forward induction. First, we write the augmented Lagrangian problem as follows:

$$\mathcal{L}^e_{\text{average total}}(\{f_t\}_{t=0}^n, \{\mu_t\}_{t=0}^n) = \frac{1}{n+1}\sum_{t=0}^n \left[\frac{\Sigma^*_{t|t-1}\sigma_{w_t}^2\sigma_{v_t}^2}{(\mu_t\sigma_{w_t}^2 + \sigma_{v_t}^2)\Sigma^*_{t|t-1} + \sigma_{w_t}^2\sigma_{v_t}^2} + \theta_t\mu_t\Sigma_{t|t-1} + b_t^2 - f_t\mu_t\right], \quad \Sigma^*_{0|-1} = \sigma_x^2. \tag{46}$$

The first order derivative test, the complementary slackness, and the primal and dual feasibility conditions, respectively, are as follows:

$$\frac{\partial\mathcal{L}^e_{\text{average total}}(\{f_t\}_{t=0}^n, \{\mu_t\}_{t=0}^n)}{\partial\mu_t}\bigg|_{\mu_t = \mu_t^*,\ f_t = f_t^*} = 0, \quad t = 0, 1, \ldots, n, \tag{47}$$
$$f_t\mu_t = 0, \quad \forall t, \tag{48}$$
$$\mu_t \geq 0, \quad \forall t, \tag{49}$$
$$f_t \geq 0, \quad \forall t. \tag{50}$$

Next, we optimize $\mathcal{L}^e_{\text{average total}}(\cdot)$ forward in time and study every possible scenario depending on the active variables.

t = 0:

$$\frac{\partial\mathcal{L}^e_{\text{average total}}}{\partial\mu_0}\bigg|_{\mu_0 = \mu_0^*,\ f_0 = f_0^*} = 0 \Rightarrow -\frac{\sigma_x^4/\sigma_{v_0}^2}{\left(\mu_0^*\frac{\sigma_x^2}{\sigma_{v_0}^2} + \frac{\sigma_x^2}{\sigma_{w_0}^2} + 1\right)^2} + \theta_0\sigma_x^2 - f_0^* = 0 \Rightarrow \theta_0 = \frac{f_0^*}{\sigma_x^2} + \frac{\sigma_x^2/\sigma_{v_0}^2}{\left(\mu_0^*\frac{\sigma_x^2}{\sigma_{v_0}^2} + \frac{\sigma_x^2}{\sigma_{w_0}^2} + 1\right)^2}. \tag{51}$$

Next, we check the possible cases to obtain our results when $\theta_t > 0$ is given.

Case 1: Let $\mu_0^* = 0$. Then, from (48), $f_0^* \geq 0$, which in turn implies from (51) that

$$\theta_0 = \frac{f_0^*}{\sigma_x^2} + \frac{\sigma_x^2/\sigma_{v_0}^2}{\left(\frac{\sigma_x^2}{\sigma_{w_0}^2} + 1\right)^2} \geq \frac{\sigma_x^2/\sigma_{v_0}^2}{\left(\frac{\sigma_x^2}{\sigma_{w_0}^2} + 1\right)^2} \equiv \theta_0'. \tag{52}$$

Case 2: Now assume that $\mu_0^* > 0$. Then, from (48) we obtain $f_0^* = 0$, which implies from (51) that

$$\theta_0 = \frac{\sigma_x^2/\sigma_{v_0}^2}{\left(\mu_0^*\frac{\sigma_x^2}{\sigma_{v_0}^2} + \frac{\sigma_x^2}{\sigma_{w_0}^2} + 1\right)^2} < \theta_0'. \tag{53}$$

Moreover, solving the equation in (53) w.r.t. $\mu_0^*$, we obtain

$$\mu_0^* = \frac{\sigma_{v_0}^2}{\sqrt{\theta_0\sigma_x^2\sigma_{v_0}^2}} - \frac{\sigma_{v_0}^2}{\sigma_x^2}\left(\frac{\sigma_x^2}{\sigma_{w_0}^2} + 1\right). \tag{54}$$

Clearly, from the first order derivative in (51), we can easily see that the second derivative w.r.t. $\mu_0$ is positive, hence the function is convex and the optimal solution at this stage is global.

t = 1:

$$\frac{\partial\mathcal{L}^e_{\text{average total}}}{\partial\mu_1}\bigg|_{\mu_1 = \mu_1^*,\ f_1 = f_1^*} = 0 \Rightarrow \theta_1 = \frac{f_1^*}{\Sigma^*_{1|0}} + \frac{\Sigma^*_{1|0}/\sigma_{v_1}^2}{\left(\mu_1^*\frac{\Sigma^*_{1|0}}{\sigma_{v_1}^2} + \frac{\Sigma^*_{1|0}}{\sigma_{w_1}^2} + 1\right)^2}. \tag{55}$$

At this stage, we note that $\Sigma^*_{1|0}$ is independent of $\mu_1^*$, because its optimal solution depends on $\mu_0^*$, which is already obtained at time stage 0. Hence, under this observation, we can follow precisely the approach of time stage zero. Next, we check the possible cases to obtain our results.

Case 1: Let $\mu_1^* = 0$. Then, from (48), $f_1^* \geq 0$, which in turn implies from (55) that

$$\theta_1 = \frac{f_1^*}{\Sigma^*_{1|0}} + \frac{\Sigma^*_{1|0}/\sigma_{v_1}^2}{\left(\frac{\Sigma^*_{1|0}}{\sigma_{w_1}^2} + 1\right)^2} \geq \frac{\Sigma^*_{1|0}/\sigma_{v_1}^2}{\left(\frac{\Sigma^*_{1|0}}{\sigma_{w_1}^2} + 1\right)^2} \equiv \theta_1'. \tag{56}$$

Case 2: Now assume that $\mu_1^* > 0$. Then, from (48) we obtain $f_1^* = 0$, which implies from (55) that

$$\theta_1 = \frac{\Sigma^*_{1|0}/\sigma_{v_1}^2}{\left(\mu_1^*\frac{\Sigma^*_{1|0}}{\sigma_{v_1}^2} + \frac{\Sigma^*_{1|0}}{\sigma_{w_1}^2} + 1\right)^2} < \theta_1'. \tag{57}$$

Moreover, solving the equality in (57) w.r.t. $\mu_1^*$, we obtain

$$\mu_1^* = \frac{\sigma_{v_1}^2}{\sqrt{\theta_1\Sigma^*_{1|0}\sigma_{v_1}^2}} - \frac{\sigma_{v_1}^2}{\Sigma^*_{1|0}}\left(\frac{\Sigma^*_{1|0}}{\sigma_{w_1}^2} + 1\right). \tag{58}$$

Clearly, from the first order derivative in (55), we can easily see that the second derivative w.r.t. $\mu_1$ is positive, hence the function is convex and the optimal solution at this stage is global.

Now suppose that at time $n-1$, the optimal solution $\mu^*_{n-1}$ for the possible cases is as follows:

Case 1: Let $\mu^*_{n-1} = 0$. Then, from (48), $f^*_{n-1} \geq 0$, which in turn implies that

$$\theta_{n-1} = \frac{f^*_{n-1}}{\Sigma^*_{n-1|n-2}} + \frac{\Sigma^*_{n-1|n-2}/\sigma_{v_{n-1}}^2}{\left(\frac{\Sigma^*_{n-1|n-2}}{\sigma_{w_{n-1}}^2} + 1\right)^2} \geq \frac{\Sigma^*_{n-1|n-2}/\sigma_{v_{n-1}}^2}{\left(\frac{\Sigma^*_{n-1|n-2}}{\sigma_{w_{n-1}}^2} + 1\right)^2} \equiv \theta_{n-1}'. \tag{59}$$

Case 2: Now assume that $\mu^*_{n-1} > 0$. Then, from (48) we obtain $f^*_{n-1} = 0$, which implies

$$\theta_{n-1} = \frac{\Sigma^*_{n-1|n-2}/\sigma_{v_{n-1}}^2}{\left(\mu^*_{n-1}\frac{\Sigma^*_{n-1|n-2}}{\sigma_{v_{n-1}}^2} + \frac{\Sigma^*_{n-1|n-2}}{\sigma_{w_{n-1}}^2} + 1\right)^2} < \theta_{n-1}'. \tag{60}$$

Moreover, solving the equation in (60) w.r.t. $\mu^*_{n-1}$, we obtain

$$\mu^*_{n-1} = \frac{\sigma_{v_{n-1}}^2}{\sqrt{\theta_{n-1}\Sigma^*_{n-1|n-2}\sigma_{v_{n-1}}^2}} - \frac{\sigma_{v_{n-1}}^2}{\Sigma^*_{n-1|n-2}}\left(\frac{\Sigma^*_{n-1|n-2}}{\sigma_{w_{n-1}}^2} + 1\right). \tag{61}$$

Then, at time stage $t = n$, following the same argument as at time $t = 1$, the following cases hold.

Case 1: Let $\mu_n^* = 0$. Then, from (48), $f_n^* \geq 0$, which in turn implies that

$$\theta_n = \frac{f_n^*}{\Sigma^*_{n|n-1}} + \frac{\Sigma^*_{n|n-1}/\sigma_{v_n}^2}{\left(\frac{\Sigma^*_{n|n-1}}{\sigma_{w_n}^2} + 1\right)^2} \geq \frac{\Sigma^*_{n|n-1}/\sigma_{v_n}^2}{\left(\frac{\Sigma^*_{n|n-1}}{\sigma_{w_n}^2} + 1\right)^2} \equiv \theta_n'. \tag{62}$$

Case 2: Now assume that $\mu_n^* > 0$. Then, from (48) we obtain $f_n^* = 0$, which implies

$$\theta_n = \frac{\Sigma^*_{n|n-1}/\sigma_{v_n}^2}{\left(\mu_n^*\frac{\Sigma^*_{n|n-1}}{\sigma_{v_n}^2} + \frac{\Sigma^*_{n|n-1}}{\sigma_{w_n}^2} + 1\right)^2} < \theta_n'. \tag{63}$$

Moreover, solving (63) w.r.t. $\mu_n^*$, we obtain

$$\mu_n^* = \frac{\sigma_{v_n}^2}{\sqrt{\theta_n\Sigma^*_{n|n-1}\sigma_{v_n}^2}} - \frac{\sigma_{v_n}^2}{\Sigma^*_{n|n-1}}\left(\frac{\Sigma^*_{n|n-1}}{\sigma_{w_n}^2} + 1\right). \tag{64}$$

Hence, we proved that by optimizing forward in time, we obtain the optimal $\{\mu_t^* : t \in \mathbb{N}_n\}$. The problem is solved once we replace $\mu_t^* = A_t^{2,*}$, for $t = 0, 1, \ldots, n$, in (11), which leads to (12), (13), (14), (15) and (16). This completes the derivation.
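The forward KKT recursion above can be sketched numerically as follows. All numeric values (the Gauss-Markov parameters $\beta$, $\sigma_n^2$ and the per-stage weights $\theta_t$) are illustrative assumptions; the middle stage is deliberately given a large $\theta_t$ so that it falls into Case 1 and the encoder stays silent there.

```python
import numpy as np

# Forward KKT recursion for mu_t = A_t^2 (illustrative parameter values).
beta, s_n2 = 0.9, 0.5
sv2, sw2 = 1.0, 3.0
theta = [0.05, 0.8, 0.1]          # theta_1 large enough to trigger Case 1

Sigma = 2.0                        # Sigma*_{0|-1} = sigma_x^2
mus = []
for th in theta:
    th_prime = (Sigma / sv2) / (Sigma / sw2 + 1)**2      # per-stage threshold, cf. (52)
    if th >= th_prime:
        mu = 0.0                                         # Case 1: no transmission
    else:
        # Case 2: interior KKT point, cf. (54)
        mu = (sv2 / Sigma) * (np.sqrt(Sigma / (th * sv2)) - Sigma / sw2 - 1)
        # stationarity check: per-stage cost derivative vanishes at mu
        cost = lambda m, S=Sigma, t=th: (S * sw2 * sv2 /
               ((m * sw2 + sv2) * S + sw2 * sv2) + t * m * S)
        eps = 1e-6
        assert abs((cost(mu + eps) - cost(mu - eps)) / (2 * eps)) < 1e-4
    mus.append(mu)
    Sigma_post = Sigma * sw2 * sv2 / ((mu * sw2 + sv2) * Sigma + sw2 * sv2)
    Sigma = beta**2 * Sigma_post + s_n2                  # propagate to the next stage
```

Note how a silent stage ($\mu_t = 0$) still updates $\Sigma^*_{t|t}$ through the side channel alone, which then shifts the threshold $\theta_t'$ of the following stage.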
APPENDIX H
PROOF OF THEOREM V.1

(i) For the given affine encoder strategy $m = \gamma^e(x) = Ax + C$, the decoder input is $r = (\alpha A + 1 - \alpha)x + \alpha v + (1-\alpha)w + \alpha C$ when the decoder adjusts the time-sharing parameter $\alpha$ of the channels. Then, similar to Theorem III.1, the optimal decoder strategy is $\gamma^d(r) = \hat{x} = E[x|r]$. For $A > 0$, it can be expressed as

$$\gamma^d(r) = \frac{(A\sigma_w^2 + \sigma_v^2)\sigma_x^2}{(A^2\sigma_w^2 + \sigma_v^2)\sigma_x^2 + \sigma_w^2\sigma_v^2}(r - \alpha C) \tag{65}$$

with the channel combining parameter $\alpha = \frac{A\sigma_w^2}{A\sigma_w^2 + \sigma_v^2}$.

For $-\sqrt{\sigma_v^2/\sigma_w^2} \leq A \leq 0$, we have $\gamma^d(r) = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_w^2}\,r$ and $\alpha = 0$.

For $A \leq -\sqrt{\sigma_v^2/\sigma_w^2}$, we have $\gamma^d(r) = \frac{A\sigma_x^2}{A^2\sigma_x^2 + \sigma_v^2}(r - C)$ and $\alpha = 1$.

(ii) For the given affine decoder strategy $\hat{x} = \gamma^d(r) = Kr + L$ and a nonzero channel combining parameter $\alpha$, since $r = \alpha(\gamma^e(x) + v) + (1-\alpha)(x + w)$, we have $\hat{x} = \alpha K\gamma^e(x) + (1-\alpha)Kx + \alpha Kv + (1-\alpha)Kw + L$. Then, the corresponding encoder cost is

$$J^e = E[(x - \hat{x} - b)^2] + \theta E[(\gamma^e(x))^2]$$
$$= E\left[(-\alpha K\gamma^e(x) + (1 - (1-\alpha)K)x - L - b)^2 + \theta(\gamma^e(x))^2\right] + \alpha^2 K^2\sigma_v^2 + (1-\alpha)^2 K^2\sigma_w^2$$
$$= E\left[(\alpha^2 K^2 + \theta)(\gamma^e(x))^2 - 2\alpha K((1 - (1-\alpha)K)x - L - b)\gamma^e(x) + ((1 - (1-\alpha)K)x - L - b)^2\right] + \alpha^2 K^2\sigma_v^2 + (1-\alpha)^2 K^2\sigma_w^2$$
$$= (\alpha^2 K^2 + \theta)E\left[\left(\gamma^e(x) - \frac{\alpha K((1 - (1-\alpha)K)x - L - b)}{\alpha^2 K^2 + \theta}\right)^2 - \left(\frac{\alpha K((1 - (1-\alpha)K)x - L - b)}{\alpha^2 K^2 + \theta}\right)^2\right] + (1 - (1-\alpha)K)^2\sigma_x^2 + (L + b)^2 + \alpha^2 K^2\sigma_v^2 + (1-\alpha)^2 K^2\sigma_w^2$$
$$= (\alpha^2 K^2 + \theta)E\left[\left(\gamma^e(x) - \frac{\alpha K((1 - (1-\alpha)K)x - L - b)}{\alpha^2 K^2 + \theta}\right)^2\right] + \theta\,\frac{(1 - (1-\alpha)K)^2\sigma_x^2 + (L + b)^2}{\alpha^2 K^2 + \theta} + \alpha^2 K^2\sigma_v^2 + (1-\alpha)^2 K^2\sigma_w^2.$$

Thus, the optimal encoder strategy that minimizes the encoder cost is

$$\gamma^e(x) = \frac{\alpha K((1 - (1-\alpha)K)x - L - b)}{\alpha^2 K^2 + \theta}. \tag{66}$$

(iii) In order to have an affine Nash equilibrium, the best responses of the encoder and the decoder must match each other. In particular, for $A > 0$, (65) and (66) must be simultaneously satisfied:

$$A = \frac{\alpha K(1 - (1-\alpha)K)}{\alpha^2 K^2 + \theta}, \quad C = -\frac{\alpha K(L + b)}{\alpha^2 K^2 + \theta}, \quad K = \frac{(A\sigma_w^2 + \sigma_v^2)\sigma_x^2}{(A^2\sigma_w^2 + \sigma_v^2)\sigma_x^2 + \sigma_w^2\sigma_v^2}, \quad L = -\alpha KC, \quad \alpha = \frac{A\sigma_w^2}{A\sigma_w^2 + \sigma_v^2}.$$

Notice the following:

$$A = \frac{\alpha K(1 - (1-\alpha)K)}{\alpha^2 K^2 + \theta} \Rightarrow \theta\left(A^2\sigma_w^2 + \sigma_v^2 + \frac{\sigma_w^2\sigma_v^2}{\sigma_x^2}\right)^2 = \frac{\sigma_w^4\sigma_v^2}{\sigma_x^2} \Rightarrow \left(A^2 + \frac{\sigma_v^2}{\sigma_w^2} + \frac{\sigma_v^2}{\sigma_x^2}\right)^2 = \frac{1}{\theta}\frac{\sigma_v^2}{\sigma_x^2} \Rightarrow A = \sqrt{\sqrt{\frac{\sigma_v^2}{\theta\sigma_x^2}} - \frac{\sigma_v^2}{\sigma_w^2} - \frac{\sigma_v^2}{\sigma_x^2}}. \tag{67}$$

Then, by utilizing (67), $K$ and $\alpha$ can be determined correspondingly. In order to have a valid encoder strategy, i.e., $A > 0$, it must be satisfied that

$$\sqrt{\frac{\sigma_v^2}{\theta\sigma_x^2}} - \frac{\sigma_v^2}{\sigma_w^2} - \frac{\sigma_v^2}{\sigma_x^2} > 0 \Rightarrow \frac{\sigma_v^2}{\theta\sigma_x^2} > \left(\frac{\sigma_v^2}{\sigma_w^2} + \frac{\sigma_v^2}{\sigma_x^2}\right)^2 \Rightarrow \theta < \frac{\sigma_v^2/\sigma_x^2}{\left(\frac{\sigma_v^2}{\sigma_w^2} + \frac{\sigma_v^2}{\sigma_x^2}\right)^2} = \frac{\sigma_x^2/\sigma_v^2}{\left(\frac{\sigma_x^2}{\sigma_w^2} + 1\right)^2}.$$

Thus, the linear parts of the strategies (i.e., $A$ and $K$) construct consistent equations. Regarding the translation parts, observe the following:

$$L = -\alpha CK = \alpha\frac{\alpha K(L + b)}{\alpha^2 K^2 + \theta}K = \frac{\alpha^2 K^2(L + b)}{\alpha^2 K^2 + \theta} \Rightarrow L\left(1 - \frac{\alpha^2 K^2}{\alpha^2 K^2 + \theta}\right) = \frac{\alpha^2 K^2 b}{\alpha^2 K^2 + \theta} \Rightarrow L = \frac{\alpha^2 K^2 b}{\theta} \Rightarrow C = -\frac{\alpha Kb}{\theta}.$$

As a result, when $\theta < \frac{\sigma_x^2/\sigma_v^2}{(\sigma_x^2/\sigma_w^2 + 1)^2}$, the jointly affine encoder and decoder strategies $\gamma^e(x) = Ax + C$ and $\gamma^d(r) = Kr + L$ and the channel combining parameter $\alpha$ form a Nash equilibrium.

Now consider the case when $A \leq -\sqrt{\sigma_v^2/\sigma_w^2}$, which implies the following must be simultaneously satisfied:

$$A = \frac{K}{K^2 + \theta}, \quad C = -\frac{K(L + b)}{K^2 + \theta}, \quad K = \frac{A\sigma_x^2}{A^2\sigma_x^2 + \sigma_v^2}, \quad L = -KC, \quad \alpha = 1.$$

Notice the following:

$$AK = \frac{K^2}{K^2 + \theta} = \frac{A^2\sigma_x^2}{A^2\sigma_x^2 + \sigma_v^2} \Rightarrow \frac{\theta}{K^2 + \theta} = \frac{\sigma_v^2}{A^2\sigma_x^2 + \sigma_v^2} = \frac{\sigma_v^2 K}{A\sigma_x^2} = \frac{\sigma_v^2(K^2 + \theta)}{\sigma_x^2} \Rightarrow (K^2 + \theta)^2 = \frac{\theta\sigma_x^2}{\sigma_v^2}$$
$$\Rightarrow K = \pm\sqrt{\sqrt{\frac{\theta\sigma_x^2}{\sigma_v^2}} - \theta} \Rightarrow A = \pm\sqrt{\sqrt{\frac{\sigma_v^2}{\theta\sigma_x^2}} - \frac{\sigma_v^2}{\sigma_x^2}}.$$

Note that in order to have valid strategies, it must hold that $\sqrt{\frac{\theta\sigma_x^2}{\sigma_v^2}} - \theta > 0 \Rightarrow \theta < \frac{\sigma_x^2}{\sigma_v^2}$. Due to the assumption, we have $\theta < \frac{\sigma_x^2/\sigma_v^2}{(\sigma_x^2/\sigma_w^2 + 1)^2} < \frac{\sigma_x^2}{\sigma_v^2}$, which satisfies the validity of the strategies. Furthermore, we must also have $A \leq -\sqrt{\sigma_v^2/\sigma_w^2}$; thus, the negative solution of $A$ (which also implies the negative solution of $K$) will be preferred. In particular, the following must hold:

$$A = -\sqrt{\sqrt{\frac{\sigma_v^2}{\theta\sigma_x^2}} - \frac{\sigma_v^2}{\sigma_x^2}} \leq -\sqrt{\frac{\sigma_v^2}{\sigma_w^2}} \Rightarrow \sqrt{\frac{\sigma_v^2}{\theta\sigma_x^2}} - \frac{\sigma_v^2}{\sigma_x^2} \geq \frac{\sigma_v^2}{\sigma_w^2} \Rightarrow \theta \leq \frac{\sigma_v^2/\sigma_x^2}{\left(\frac{\sigma_v^2}{\sigma_w^2} + \frac{\sigma_v^2}{\sigma_x^2}\right)^2},$$

which is satisfied by the assumption. Thus, the linear parts of the strategies (i.e., $A$ and $K$) construct consistent equations. Regarding the translation parts, observe the following:

$$L = -KC = \frac{K^2(L + b)}{K^2 + \theta} \Rightarrow L = \frac{K^2 b}{\theta} \Rightarrow C = -\frac{Kb}{\theta}.$$

If $-\sqrt{\sigma_v^2/\sigma_w^2} \leq A \leq 0$ holds, then the decoder does not utilize any information from the encoder, which implies the following must be simultaneously satisfied:

$$A = \frac{\alpha K(1 - (1-\alpha)K)}{\alpha^2 K^2 + \theta}, \quad C = -\frac{\alpha K(L + b)}{\alpha^2 K^2 + \theta}, \quad K = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_w^2}, \quad L = 0, \quad \alpha = 0.$$

Thus, $A = C = 0$ is obtained. Note that, in this particular case, since the encoder has no effect on the estimation performance of the decoder, the encoder prefers not to transmit any message to minimize its cost (by avoiding the transmission cost).

This completes the derivation.
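The fixed-point equations of the $A > 0$ equilibrium can be verified numerically. The sketch below uses illustrative parameter values (including the bias $b$ in the encoder cost, whose value is an assumption): it builds $(A, K, L, C, \alpha)$ from (67) and checks that the best responses reproduce each other.

```python
import numpy as np

# Fixed-point check of the A > 0 affine Nash equilibrium; all values are illustrative.
sx2, sv2, sw2, theta, b = 2.0, 1.0, 3.0, 0.1, 0.4
assert theta < (sx2 / sv2) / (sx2 / sw2 + 1)**2        # validity condition for A > 0

A = np.sqrt(np.sqrt(sv2 / (theta * sx2)) - sv2 / sw2 - sv2 / sx2)   # eq. (67)
alpha = A * sw2 / (A * sw2 + sv2)                                   # combining ratio
K = (A * sw2 + sv2) * sx2 / ((A**2 * sw2 + sv2) * sx2 + sw2 * sv2)  # decoder gain (65)
L = alpha**2 * K**2 * b / theta                                     # translation parts
C = -alpha * K * b / theta

# best responses must reproduce each other at the fixed point
assert np.isclose(A, alpha * K * (1 - (1 - alpha) * K) / (alpha**2 * K**2 + theta))
assert np.isclose(C, -alpha * K * (L + b) / (alpha**2 * K**2 + theta))
assert np.isclose(L, -alpha * K * C)
```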
REFERENCES

[1] V. P. Crawford and J. Sobel, "Strategic information transmission," Econometrica, vol. 50, pp. 1431–1451, 1982.
[2] I. Shames, A. M. H. Teixeira, H. Sandberg, and K. H. Johansson, "Agents misbehaving in a network: a vice or a virtue?" IEEE Network, vol. 26, no. 3, pp. 35–40, May 2012.
[3] B. Larrousse, O. Beaude, and S. Lasaulce, "Crawford-Sobel meet Lloyd-Max on the grid," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2014, pp. 6127–6131.
[4] J. Miklós-Thal and H. Schumacher, "The value of recommendations," Games and Economic Behavior, vol. 79, pp. 132–147, 2013.
[5] O. Ben-Porat and M. Tennenholtz, "A game-theoretic approach to recommendation systems with strategic content providers," in International Conference on Neural Information Processing Systems (NeurIPS), 2018, pp. 1118–1128.
[6] J. G. Riley, "Silver signals: Twenty-five years of screening and signaling," Journal of Economic Literature, vol. 39, no. 2, pp. 432–478, June 2001.
[7] J. Sobel, "Signaling games," in Encyclopedia of Complexity and Systems Science, R. A. Meyers, Ed. Springer New York, 2009, pp. 8125–8139.
[8] E. Kamenica and M. Gentzkow, "Bayesian persuasion," American Economic Review, vol. 101, no. 6, pp. 2590–2615, Oct. 2011.
[9] S. Sarıtaş, S. Yüksel, and S. Gezici, "Quadratic multi-dimensional signaling games and affine equilibria," IEEE Transactions on Automatic Control, vol. 62, no. 2, pp. 605–619, Feb. 2017.
[10] F. Farokhi, A. M. H. Teixeira, and C. Langbort, "Estimation with strategic sensors," IEEE Transactions on Automatic Control, vol. 62, no. 2, pp. 724–739, Feb. 2017.
[11] E. Akyol, C. Langbort, and T. Başar, "Information-theoretic approach to strategic communication as a hierarchical game," Proceedings of the IEEE, vol. 105, no. 2, pp. 205–218, Feb. 2017.
[12] M. O. Sayin, E. Akyol, and T. Başar, "Hierarchical multistage Gaussian signaling games in noncooperative communication and control systems," Automatica, vol. 107, pp. 9–20, 2019.
[13] S. Sarıtaş, S. Yüksel, and S. Gezici, "Dynamic signaling games with quadratic criteria under Nash and Stackelberg equilibria," Automatica, vol. 115, p. 108883, May 2020.
[14] M. L. Treust and T. Tomala, "Strategic communication with side information at the decoder," arXiv preprint arXiv:1911.04950, 2020.
[15] S. Sarıtaş, G. Dán, and H. Sandberg, "Passive fault-tolerant estimation under strategic adversarial bias," in American Control Conference (ACC), 2020, pp. 4644–4651.
[16] Y. Wei, S. Lin, S. Lin, H. Su, and H. V. Poor, "Residual-quantization based code design for compressing noisy sources with arbitrary decoder side information," IEEE Transactions on Communications, vol. 64, no. 4, pp. 1711–1725, 2016.
[17] B. Güler, A. Yener, and A. Swami, "The semantic communication game," IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 4, pp. 787–802, 2018.
[18] C. T. K. Ng, C. Tian, A. J. Goldsmith, and S. Shamai, "Minimum expected distortion in Gaussian source coding with fading side information," IEEE Transactions on Information Theory, vol. 58, no. 9, pp. 5725–5739, 2012.
[19] I. Estella Aguerri and D. Gündüz, "Joint source-channel coding with time-varying channel and side-information," IEEE Transactions on Information Theory, vol. 62, no. 2, pp. 736–753, 2016.
[20] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory. Philadelphia, PA: SIAM Classics in Applied Mathematics, 1999.
[21] J. L. Massey, "Causality, feedback and directed information," in