Continuous Random Variable Estimation is not Optimal for the Witsenhausen Counterexample
Maël Le Treust
ETIS UMR 8051, CY Cergy Paris Université, ENSEA, CNRS

Tobias J. Oechtering
KTH Royal Institute of Technology, EECS, Div. of Information Science and Engineering
Abstract: Optimal design of distributed decision policies can be a difficult task, as illustrated by the famous Witsenhausen counterexample. In this paper we characterize the optimal control designs for the vector-valued setting, assuming that the design results in an internal state that can be described by a continuous random variable which has a probability density function. More specifically, we provide a genie-aided outer bound that relies on our previous results for empirical coordination problems. This solution turns out to be not optimal in general, since it consists of a time-sharing strategy between two linear schemes of specific power. It follows that the optimal decision strategy for the original scalar Witsenhausen problem must lead to an internal state that cannot be described by a continuous random variable which has a probability density function.
I. INTRODUCTION
Distributed decision-making systems arise in many engineering problems where decentralized agents choose actions based on locally available information so as to minimize a common cost function. The information at each agent is either locally observed or received from other agents. Since the process of sharing information comes with a cost, agents usually do not have access to the whole information available at all agents. The design of optimal decision strategies for such problems is considered to be notoriously difficult. The Witsenhausen counterexample [1] from 1968 is an outstanding toy example that has significantly helped to understand the fundamental difficulty that actions serve two purposes: a control purpose affecting the system state, and a communication purpose providing information to other agents [2].

Although Witsenhausen refuted with his simple two-point counterexample the assertion that a linear policy would also be optimal in such a Gaussian setting, the optimal non-linear policy remains unknown. Many researchers have approached the optimization problem with various methods. In the last decade, for instance, it has been approached with numerical optimization methods [3], [4], where the latter is based on an iterative source-channel coding approach. Analytically, using results from optimal transport theory, it has been shown in [5] that the optimal decision strategy is a strictly increasing unbounded piece-wise real analytic function with a real analytic
left inverse. More necessary conditions have been derived in [6] by analyzing an equivalent optimization problem on the space of square-integrable quantile functions. However, it is unclear if the optimal decision policy of the first agent results in an internal state that can be described by a continuous random variable.

In this work, we show that the optimal decision strategy will not lead to an internal state that can be described by a continuous random variable that has a probability density function. The observation rests on a subtle point in an outer bound argument, which might easily be overlooked. We will further discuss that this observation, and in essence also the Witsenhausen counterexample, can be explained by the relation between the MMSE and the mutual information considering Gaussian or binary distributed inputs [7].

The authors gratefully acknowledge the financial support of SRV ENSEA for visits at KTH in Stockholm in 2017 and 2019, and at ETIS in Cergy in 2018. This research has been conducted as part of the Labex MME-DII (ANR11-LBX-0023-01). Part of the research has been supported by the Swedish Research Council (VR) under grant 2020-03884.

Fig. 1. The state and the channel noise are drawn according to the i.i.d. Gaussian distributions X0^n ∼ N(0, Q·I) and Z^n ∼ N(0, N·I). The internal state sequence X1^n is causally estimated by the decision maker C2.

Our approach is based on a vector-valued extension of the Witsenhausen counterexample as proposed by Grover and Sahai in [8]. They study a non-causal encoding and decoding strategy that combines a coding scheme with side information and a linear scheme, which has been shown to be optimal by Choudhuri and Mitra in [9]. It has later been observed that such problems can also be approached as an empirical coordination coding problem.
In [10], we have provided an overview of the individual findings and completed the missing cases using coding results from [11]. In [12], we have derived an achievability result considering non-causal encoding and causal decoding with a continuous alphabet, building on proof methods from [13]. In this work, we derive a genie-aided outer bound for this case, considering only decision strategies that result in continuous random variables which have a probability density function.

II. SYSTEM MODEL
In this work, we restrict our study to continuous random variables which have a probability density function (pdf), see [14, Chap. 8]. For brevity, we only refer to continuous random variables.

We consider the vector-valued Witsenhausen setup in which the sequences of states and channel noises are drawn independently according to the i.i.d. Gaussian distributions X0^n ∼ N(0, Q·I) and Z^n ∼ N(0, N·I) with min(Q, N) > 0, where I denotes the identity matrix. We denote by X1 the internal state and by Y1 the output of the noisy channel:

X1 = X0 + U1, with X0 ∼ N(0, Q),   (1)
Y1 = X1 + Z = X0 + U1 + Z, with Z ∼ N(0, N).   (2)

We denote by P_X0 = N(0, Q) the Gaussian state distribution and by P_{X1 Y1 | X0 U1} the conditional probability distribution corresponding to equations (1) and (2).

Definition 1.
For n ∈ N* = N \ {0}, a "control design" with non-causal encoder and causal decoder is a tuple of stochastic functions c = (f, {g_t}_{t ∈ {1,...,n}}) defined by

f : X0^n → U1^n,   g_t : Y1^t → U2, for all t ∈ {1, ..., n},   (3)

which induces a distribution over the sequences given by

( ∏_{t=1}^n P_{X0,t} ) · f_{U1^n | X0^n} · ( ∏_{t=1}^n P_{X1,t Y1,t | X0,t U1,t} ) · ( ∏_{t=1}^n g_{U2,t | Y1^t} ).

We denote by C_d(n) the set of control designs with non-causal encoder and causal decoder c = (f, {g_t}_{t ∈ {1,...,n}}) that induce sequences of continuous random variables.

Definition 2.
We define the n-stage costs associated with c by

γ_p^n(c) = E[ (1/n) ∑_{t=1}^n U1,t² ] if it exists, and +∞ otherwise,   (4)
γ_s^n(c) = E[ (1/n) ∑_{t=1}^n (X1,t − U2,t)² ] if it exists, and +∞ otherwise.   (5)

The pair of costs (P, S) ∈ R² is achievable if for all ε > 0 there exists n̄ ∈ N* such that for all n ≥ n̄ there exists a control design c ∈ C_d(n) with

|P − γ_p^n(c)| + |S − γ_s^n(c)| ≤ ε.   (6)

Theorem 1.
The pair of Witsenhausen costs (P, S) is achievable if and only if there exist continuous random variables with probability distribution that decomposes according to

P_X0 · Q_{U1 W1 W2 | X0} · P_{X1 Y1 | X0 U1} · Q_{U2 | W2 Y1},   (7)

where (W1, W2) are auxiliary random variables such that

0 ≤ I(W2; Y1 | W1) − I(W1, W2; X0), with P = E_Q[U1²] and S = E_Q[(X1 − U2)²].   (8)

This result is stated in [12, Theorem 1].

Remark 1.
The probability distribution in (7) satisfies the Markov chains

(X1, Y1) - (X0, U1) - (W1, W2),
U2 - (Y1, W2) - (X0, X1, U1, W1),
X1 - (X0, U1) - (X0, U2, Y1, W1, W2).   (9)

The causality condition prevents the controller C2 from recovering W1, which induces the second Markov chain of (9).

Definition 3.
The optimal cost considering continuous random variables is characterized by the optimization problem defined as follows:

S_c(P) = min_{Q ∈ Q_c(P)} E_Q[(X1 − U2)²],   (10)

Q_c(P) = { (Q_{U1 W1 W2 | X0}, Q_{U2 | W2 Y1}) s.t. P = E_Q[U1²], I(W2; Y1 | W1) − I(W1, W2; X0) ≥ 0, and X0, U1, W1, W2, X1, Y1, U2 are continuous }.   (11)

Lemma 1 (Lemma 11 in [1]). The best linear scheme is U1 = −√(P/Q)·X0 if P ≤ Q, and otherwise U1 = −X0 + √(P − Q). This induces the estimation cost given by

S_ℓ(P) = (√Q − √P)²·N / ( (√Q − √P)² + N ) if P ∈ [0, Q], and 0 otherwise.   (12)

Fig. 2. The curve S_ℓ(P) in (12) and the straight line N·(Q − N − P)/Q, for Q = 2 and N = 0.…

Theorem 2.
The optimal cost with continuous random variables satisfies

S_c(P) = N·(Q − N − P)/Q if Q > 4N and P ∈ [P1, P2], and S_c(P) = S_ℓ(P) otherwise,   (13)

with

P1 = (1/2)·( Q − 2N − √(Q·(Q − 4N)) ),   (14)
P2 = (1/2)·( Q − 2N + √(Q·(Q − 4N)) ).   (15)

If P > Q, the internal state X1 can be canceled, and the offset √(P − Q) is only included to meet the power constraint with equality as in (6). The proof of Theorem 2 is stated in Sec. V-A and V-B. Figure 2 represents the cost of the linear scheme S_ℓ(P) and the straight line P ↦ N·(Q − N − P)/Q. Note that the upper bound in (13) may be obtained by using either a time-sharing strategy between the two linear schemes with parameters P1 and P2, when Q > 4N and P ∈ [P1, P2], or a linear scheme of power P. This result shows that memoryless policies are optimal, so that these policies are also optimal for the original scalar Witsenhausen counterexample setup restricted to continuous random variables. However, as pointed out by Witsenhausen in [1], such a strategy in the original scalar model is generally not optimal!

III. WITSENHAUSEN'S TWO-POINT STRATEGY
The two-point strategy described in [1, pp. 141] outperforms the optimal cost with continuous random variables S_c(P) of Theorem 2. We consider Q = 10, N = 1 and the sender's strategy with parameter a ≥ 0 given by

U1 = a·sign(X0) − X0,   (16)

which induces X1 = a·sign(X0). For all a ≥ 0, the pair of costs is given by

P_t(a) = Q + a·( a − 2·√(2Q/π) ),   (17)
S_t(a) = (a²/√(2π))·e^{−a²/2} · ∫_{−∞}^{+∞} e^{−y²/2} / cosh(a·y) dy.   (18)

Fig. 3 shows that for a range of values of a, the pair of costs (P_t(a), S_t(a)) Pareto-dominates S_c(P) of Theorem 2 for some P. From the previous, we have the following theorem.

Fig. 3. The lasso-shaped curve corresponds to Witsenhausen's two-point strategy with parametric equations (17) and (18).
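These costs can be checked numerically. The sketch below is illustrative: it assumes the reconstructed forms of (17) and (18) (in particular the factor 2√(2Q/π) and the normalization of the integral), and it uses the estimator a·tanh(a·Y1), which is the conditional mean of an equiprobable ±a input in unit-variance Gaussian noise. It compares a Monte Carlo estimate of S_t(a) with the closed form and with the continuous-variable optimum S_c(P_t(a)) of Theorem 2, for Q = 10, N = 1 and the illustrative choice a = 2.

```python
import math
import random

Q, N = 10.0, 1.0
a = 2.0                                   # one illustrative strategy parameter

# Power cost (17): E[(a*sign(X0) - X0)^2] = Q + a*(a - 2*sqrt(2Q/pi)).
P_t = Q + a * (a - 2.0 * math.sqrt(2.0 * Q / math.pi))

# Monte Carlo estimate of S_t(a): X1 = a*sign(X0) is +/- a equiprobably,
# Y1 = X1 + Z with Z ~ N(0, 1); the MMSE estimator of X1 is a*tanh(a*Y1).
rng = random.Random(2)
n = 400_000
err = 0.0
for _ in range(n):
    X1 = a if rng.random() < 0.5 else -a
    Y1 = X1 + rng.gauss(0.0, 1.0)
    err += (X1 - a * math.tanh(a * Y1)) ** 2
S_t_mc = err / n

# Closed form (18), as reconstructed here, via a plain Riemann sum on [-8, 8].
h = 1e-3
integral = sum(math.exp(-(k * h) ** 2 / 2) / math.cosh(a * k * h) * h
               for k in range(-8000, 8001))
S_t = a * a * math.exp(-a * a / 2) / math.sqrt(2 * math.pi) * integral

# Continuous-variable optimum of Theorem 2 at the same power
# (P_t lies inside [P1, P2] for these parameters).
S_c = N * (Q - N - P_t) / Q
print(round(P_t, 3), round(S_t, 4), round(S_t_mc, 4), round(S_c, 4))
```

At this operating point the two-point strategy yields an estimation cost of roughly 0.27 at power P_t ≈ 3.9, while the continuous-variable optimum at the same power is about 0.51, in line with the next theorem.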
Theorem 3.
There exist values of Q, N, and a for which S_t(a) < S_c(P_t(a)).

Discussion

In the following, we briefly explain the Witsenhausen counterexample result in terms of the I-MMSE formula by Guo, Shamai and Verdú [7]. The formula has been used to illustrate in [7, Fig. 1] that binary inputs in additive Gaussian noise channels result in a lower MMSE than Gaussian distributed inputs with the same SNR. In other words, to achieve the same MMSE, a binary distributed input requires less channel input power than a Gaussian distributed input. Exactly this power gain is exploited in the Witsenhausen counterexample scheme, in which the internal state X1 is binary, so that the resulting MMSE outperforms the MMSE of the best linear scheme. Analytically, it is interesting to see that the MMSE formulas (17) and (18) directly relate to [7, Equations (13) and (17)] with the corresponding signal powers and noise power.

IV. CONCLUSION
Our results show that information-theoretic methods, in particular coordination coding results, lead to new insights on the Witsenhausen counterexample. Vice versa, we believe that our observation makes the Witsenhausen counterexample also interesting for other source-channel coding problems. In more detail, we show that a convex combination of linear memoryless policies is optimal for the vector-valued Witsenhausen problem with causal decoder restricted to the space of continuous random variables. Since the policy is memoryless, it follows that the linear policy is also optimal for the original Witsenhausen problem restricted to the space of continuous random variables which have a pdf. However, Witsenhausen's two-point strategy outperforms the best linear strategy, which implies that the hypothesis of a continuous random variable with a pdf is an active restriction for the Witsenhausen counterexample setup. According to the Lebesgue decomposition theorem, every probability distribution on the real line is a mixture of a discrete part, a singular part, and an absolutely continuous part. Accordingly, we conclude that the optimal decision strategy for the unrestricted Witsenhausen counterexample must lead to an internal state that cannot be described by a continuous random variable which has a pdf. In future works, we will consider policies that result in internal states described by more general probability distributions.

V. PROOFS
A. Lower bound for Theorem 2
The Markov chain Y1 - (X0, U1) - (W1, W2) implies

I(W2; Y1 | W1) − I(W1, W2; X0) ≤ I(W2; Y1 | W1, X0) − I(W1; X0)   (19)
 ≤ I(U1; Y1 | W1, X0) − I(W1; X0).   (20)

Therefore,

S_c(P) ≥ min_{Q_{U1 W1 | X0} ∈ Q1(P), Q_{U2 | W1 Y1}} E_Q[(X1 − U2)²]   (21)
 ≥ min_{Q_{U1 W1 | X0} ∈ Q1(P)} E_Q[(X1 − E[X1 | W1, Y1])²],   (22)

where

Q1(P) = { Q_{U1 W1 | X0} s.t. P = E_Q[U1²], I(U1; Y1 | W1, X0) − I(W1; X0) ≥ 0, and (X0, U1, W1, X1, Y1, U2) are continuous }.   (23)

The random variables (X0, W1, U1), drawn according to Q'_{U1 W1 | X0} which is optimal for (22), have covariance matrix

K = [ Q, ρ1·√(QV), ρ2·√(QP); ρ1·√(QV), V, ρ3·√(VP); ρ2·√(QP), ρ3·√(VP), P ],   (24)

where V denotes the variance of W1 and the correlation coefficients (ρ1, ρ2, ρ3) ∈ [−1, 1]³ are such that QVP·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) ≥ 0, i.e. K is positive semi-definite. We denote by Q_{U1 W1 | X0} the Gaussian conditional distribution such that (X0, W1, U1) ∼ N(0, K). According to [15, Maximum Differential Entropy Lemma, pp. 21], we have

E_{Q'}[(X1 − E[X1 | Y1, W1])²] ≥ (1/(2πe)) · e^{2h(X1 | Y1, W1)}   (25)
 = E_Q[(X1 − E[X1 | Y1, W1])²].   (26)

Moreover, both P_X0 Q'_{U1 W1 | X0} and P_X0 Q_{U1 W1 | X0} satisfy

0 ≤ I_{Q'}(U1; Y1 | W1, X0) − I_{Q'}(W1; X0)   (27)
 = h_{Q'}(Y1 | W1, X0) − h_Q(Y1 | W1, X0, U1) − h_Q(X0) + h_{Q'}(X0 | W1)   (28)
 ≤ h_Q(Y1 | W1, X0) − h_Q(Y1 | W1, X0, U1) − h_Q(X0) + h_Q(X0 | W1)   (29)
 = I_Q(U1; Y1 | W1, X0) − I_Q(W1; X0),   (30)

where (28) comes from h_{Q'}(X0) = h_Q(X0) and h_{Q'}(Y1 | W1, X0, U1) = h_Q(Z) = h_Q(Y1 | W1, X0, U1), and (29) comes from [14, (8.92), pp. 256].

Lemma 2.
Assume that (X0, W1, U1) ∼ N(0, K); then

I(U1; Y1 | X0, W1) − I(X0; W1) = (1/2)·log( (P/N)·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) + (1 − ρ1²) ),   (31)

E_Q[(X1 − E[X1 | Y1, W1])²] = N·( Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ) / ( N + Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ).   (32)

The proof of Lemma 2 is stated in Sec. V-C. Note that equations (31) and (32) do not depend on the variance parameter V of the auxiliary random variable W1. Also, when (31) is positive, the matrix K is positive semi-definite. By using Lemma 2, we reformulate (22): since the function x ↦ N·x/(N + x) is strictly increasing for all x ≥ 0, the optimal parameters (ρ1⋆, ρ2⋆, ρ3⋆) ∈ [−1, 1]³ minimize

Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3),   (33)

under the constraint

(P/N)·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) − ρ1² ≥ 0   (34)
⇔ (1 − ρ1²)·(1 − ρ3²) − (N/P)·ρ1² ≥ (ρ2 − ρ1ρ3)²,   (35)

which yields

ρ2⋆ = ρ1ρ3 − √( (1 − ρ1²)·(1 − ρ3²) − (N/P)·ρ1² ).   (36)

Lemma 3. If Q > 4N and P ∈ [P1, P2], then

ρ1⋆ = √( (P·Q − (P + N)²) / ((P + N)·Q) ),  ρ3⋆ = 0.   (37)

If Q ≤ 4N, or if Q > 4N and P ∈ [0, P1] ∪ [P2, Q], then ρ1⋆ = 0, ρ3⋆ = 0.   (38)

If P > Q, then ρ1⋆ = 0, ρ3⋆ = √((P − Q)/P).   (39)

The proof of Lemma 3 is stated in Sec. V-D. We obtain the lower bound by replacing the optimal parameters (ρ1⋆, ρ2⋆, ρ3⋆) in (32), that is, if P ∈ [0, Q],

S_c(P) ≥ N·(Q − N − P)/Q if Q > 4N and P ∈ [P1, P2], and
S_c(P) ≥ (√Q − √P)²·N / ( (√Q − √P)² + N ) otherwise.   (40)

B. Upper bound for Theorem 2

1) Linear scheme:
By considering the best linear scheme

U1 = −√(P/Q)·X0 if P < Q, and U1 = −X0 + √(P − Q) if P ≥ Q,   (41)

we have that S_c(P) ≤ S_ℓ(P) for all P ≥ 0.
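As a sanity check of (41) and (12), the following sketch (with illustrative parameter values) compares the closed-form cost S_ℓ(P) with a Monte Carlo estimate of the MMSE achieved by the linear scheme:

```python
import math
import random

def S_ell(P, Q, N):
    """Estimation cost (12) of the best linear scheme."""
    if P > Q:
        return 0.0   # the state is cancelled, only a known offset remains
    s = (math.sqrt(Q) - math.sqrt(P)) ** 2   # Var(X1) under U1 = -sqrt(P/Q)*X0
    return s * N / (s + N)

Q, N, P = 2.0, 1.0, 0.5      # illustrative values with P < Q
rng = random.Random(1)
n = 200_000
err = 0.0
for _ in range(n):
    X0 = rng.gauss(0.0, math.sqrt(Q))
    U1 = -math.sqrt(P / Q) * X0              # linear scheme of power P, eq. (41)
    X1 = X0 + U1
    Y1 = X1 + rng.gauss(0.0, math.sqrt(N))
    s = (math.sqrt(Q) - math.sqrt(P)) ** 2
    err += (X1 - s / (s + N) * Y1) ** 2      # Gaussian MMSE estimate of X1 from Y1
mc = err / n
print(round(S_ell(P, Q, N), 4), round(mc, 4))
```

The empirical average squared error matches the closed form (about 1/3 for these values), as expected for jointly Gaussian (X1, Y1).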
2) Case where Q > 4N and P ∈ [P1, P2]: The upper bound of Theorem 2 can be obtained by using a time-sharing strategy between the two linear schemes with parameters P1 and P2. We obtain the same result by assuming that the random variables (U1, W1, W2) are drawn according to

W1 = √( (P + N) / (P·Q − (P + N)²) ) · (X0 − Z2), with Z2 ∼ N(0, (QN + (P + N)²)/(P + N)) and Z2 ⊥ W2,   (42)

W2 = ( (P·Q − (P + N)²) / (Q·(P + N)) ) · X0 + U1', with U1' ⊥ (X0, W1) and U1' ∼ N(0, N·(P·Q − (P + N)²) / (QN + (P + N)²)),   (43)

U1 = −( (P + N)² / (QN + (P + N)²) )·X0 + U1' + √( (P·Q − (P + N)²)/(P + N) ) · ( (P + N)/(QN + (P + N)²) ) · W1.   (44)

Then we have

I(W2; Y1, W1) − I(W2; X0, W1) = I(U1; Y1 | X0, W1)   (45)
 = I(X0; W1) = (1/2)·log( 1 + (P·Q − (P + N)²)/(QN + (P + N)²) ),   (46)

and

S_c(P) ≤ N·(Q − P − N)/Q.   (47)
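The geometry behind this construction can be verified numerically. The sketch below, which assumes the reconstructed expressions (14) and (15) for P1 and P2, checks that the straight line N·(Q − N − P)/Q touches the curve S_ℓ at P1 and P2, lies strictly below it in between, and is traced exactly by time-sharing averages of the two tangent schemes:

```python
import math

def S_ell(P, Q, N):
    """Linear-scheme cost (12)."""
    if P > Q:
        return 0.0
    s = (math.sqrt(Q) - math.sqrt(P)) ** 2
    return s * N / (s + N)

Q, N = 10.0, 1.0                          # requires Q > 4N
d = math.sqrt(Q * (Q - 4 * N))
P1, P2 = 0.5 * (Q - 2 * N - d), 0.5 * (Q - 2 * N + d)

def line(P):
    """Time-sharing segment of Theorem 2."""
    return N * (Q - N - P) / Q

# The segment touches the linear-scheme curve exactly at P1 and P2.
touch = max(abs(S_ell(P1, Q, N) - line(P1)), abs(S_ell(P2, Q, N) - line(P2)))

# Time sharing: run the P1-scheme on a fraction t of the block and the
# P2-scheme on the rest; both costs average linearly in t.
on_segment = all(
    abs(t * S_ell(P1, Q, N) + (1 - t) * S_ell(P2, Q, N)
        - line(t * P1 + (1 - t) * P2)) < 1e-9
    for t in [k / 10 for k in range(11)]
)

# Strictly below the curve in the interior of [P1, P2].
below = all(S_ell(P, Q, N) > line(P)
            for P in [P1 + k * (P2 - P1) / 12 for k in range(1, 12)])
print(round(touch, 12), on_segment, below)
```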
C. Proof of Lemma 2
We consider (X0, W1, U1) ∼ N(0, K) with K defined in (24), which, together with (2), induces the Gaussian random variables (X0, W1, Y1) whose entropy is

h(X0, W1, Y1) = (1/2)·log( (2πe)³·Q·V·( P·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) + N·(1 − ρ1²) ) ).   (48)-(49)

Therefore we have

I(U1; Y1 | X0, W1) − I(X0; W1) = h(X0, W1, Y1) − h(Y1 | U1, X0, W1) − h(X0) − h(W1)   (50)
 = (1/2)·log( (P/N)·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) + (1 − ρ1²) ).   (51)

According to (1) and (2), the entropy of (X1, W1, Y1) writes

h(X1, W1, Y1) = (1/2)·log( (2πe)³·V·N·( Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ) ),   (52)-(53)

and hence

E[(X1 − E(X1 | Y1, W1))²] = N·( Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ) / ( N + Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ).   (54)

D. Proof of Lemma 3
We replace ρ2⋆ in (33) and define

f(ρ1, ρ3) = Q·(1 − ρ1²) + P·(1 − ρ3²) − 2·√(QP)·√( (1 − ρ1²)·(1 − ρ3²) − (N/P)·ρ1² ).   (55)

Note that f is well defined if ρ1² ≤ P/(P + N) and ρ3² ≤ 1 − (N/P)·ρ1²/(1 − ρ1²). Since

∂f(ρ1, ρ3)/∂ρ3 = 2ρ3·( √(QP)·(1 − ρ1²) / √( (1 − ρ1²)·(1 − ρ3²) − (N/P)·ρ1² ) − P ),   (56)

then for all ρ1² ≤ P/(P + N), the optimal ρ3⋆(ρ1) satisfies

ρ3⋆(ρ1)² = max( 1 − (Q/P)·(1 − ρ1²) − (N/P)·ρ1²/(1 − ρ1²), 0 ).   (57)

We introduce the parameters

ρa = √( ( 2Q − (P + N) − √((P + N)² − 4QN) ) / (2Q) ),   (58)
ρb = √( ( 2Q − (P + N) + √((P + N)² − 4QN) ) / (2Q) ),   (59)

and we define the function

F(ρ1) = f(ρ1, ρ3⋆(ρ1)) =
  Q·(1 − ρ1²) + P − 2·√(QP)·√( 1 − ρ1²·(P + N)/P ) if 0 ≤ ρ1 ≤ ρa,
  N·ρ1²/(1 − ρ1²) if ρa ≤ ρ1 ≤ ρb,
  Q·(1 − ρ1²) + P − 2·√(QP)·√( 1 − ρ1²·(P + N)/P ) if ρb ≤ ρ1 ≤ √(P/(P + N)).   (60)

The function F(ρ1) is continuous at ρa and ρb. We define

ρ⋆ = √( (P·Q − (P + N)²) / ((P + N)·Q) ).   (61)

• If Q > 4N and P ∈ [P1, P2], then the function F(ρ1) is decreasing over the interval ρ1 ∈ [0, ρ⋆] and increasing over the interval ρ1 ∈ [ρ⋆, √(P/(P + N))]; the optimal parameters are then

ρ1 = ρ⋆, ρ3 = 0.   (62)

• If Q ≤ 4N, or if Q > 4N and P ∈ [0, P1] ∪ [P2, Q], then the optimal parameters are ρ1 = ρ3 = 0.
• If P > Q, then

ρ1 = 0, ρ3 = √((P − Q)/P).   (63)
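The reduction above lends itself to a brute-force verification. The following sketch, which assumes the reconstructed objective (55), the constraint (35), and the closed forms (61) and (40), grid-searches f(ρ1, ρ3) over the feasible region for illustrative parameters and maps the minimum G through G ↦ N·G/(N + G) from (32):

```python
import math

Q, N, P = 10.0, 1.0, 4.0                 # Q > 4N and P inside [P1, P2]

def f(r1, r3):
    """Objective (55); returns None when the constraint (35) is infeasible."""
    D = (1 - r1 * r1) * (1 - r3 * r3) - (N / P) * r1 * r1
    if D < 0:
        return None
    return (Q * (1 - r1 * r1) + P * (1 - r3 * r3)
            - 2 * math.sqrt(Q * P * D))

# f depends on rho1 and rho3 only through their squares, so a grid over
# non-negative values suffices; rho1 is restricted to rho1^2 <= P/(P+N).
grid = 500
best = min(v for v in (f(i / grid * math.sqrt(P / (P + N)), j / grid)
                       for i in range(grid + 1) for j in range(grid + 1))
           if v is not None)

rho_star = math.sqrt((P * Q - (P + N) ** 2) / ((P + N) * Q))   # eq. (61)
G_star = f(rho_star, 0.0)        # claimed minimizer (rho1, rho3) = (rho*, 0)
S_lower = N * best / (N + best)  # map the minimum G through (32)
print(round(G_star, 6), round(best, 6), round(S_lower, 4),
      round(N * (Q - N - P) / Q, 4))
```

For these parameters the grid minimum agrees with the closed-form minimizer, and the resulting bound coincides with the straight line N·(Q − N − P)/Q of (40).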
REFERENCES
[1] H. Witsenhausen, "A counterexample in stochastic optimum control," SIAM Journal on Control, vol. 6, no. 1, pp. 131–147, 1968.
[2] S. Yüksel and T. Başar, Stochastic Networked Control Systems: Stabilization and Optimization under Information Constraints, ser. Systems & Control: Foundations & Applications. New York, NY: Springer, 2013.
[3] S.-H. Tseng and A. Tang, "A Local Search Algorithm for the Witsenhausen's Counterexample," in Proc. IEEE CDC, Dec. 2017.
[4] J. Karlsson, A. Gattami, T. J. Oechtering, and M. Skoglund, "Iterative source-channel coding approach to Witsenhausen's counterexample," in Proc. IEEE ACC.
[5] Y. Wu and S. Verdú, "Witsenhausen's counterexample: a view from optimal transport theory," in Proc. IEEE CDC.
[6] Automatica.
[7] D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, April 2005.
[8] P. Grover and A. Sahai, "Witsenhausen's counterexample as assisted interference suppression," Int. J. Syst., Control Commun., 2010.
[9] C. Choudhuri and U. Mitra, "On Witsenhausen's counterexample: the asymptotic vector case," pp. 162–166, July 2012.
[10] M. Le Treust and T. J. Oechtering, "Optimal Control Designs for Vector-valued Witsenhausen Counterexample Setups," in IEEE 56th Annual Allerton Conf. on Commun., Control, and Comp., Sept. 2018.
[11] M. Le Treust, "Joint Empirical Coordination of Source and Channel," IEEE Trans. Inf. Theory, vol. 63, no. 8, pp. 5087–5114, Aug. 2017.
[12] T. J. Oechtering and M. Le Treust, "Coordination Coding with Causal Decoder for Vector-valued Witsenhausen Counterexample Setups," in IEEE Information Theory Workshop (ITW), Aug. 2019.
[13] M. T. Vu, T. J. Oechtering, and M. Skoglund, "Hierarchical Identification With Pre-Processing," IEEE Trans. Inf. Theory, Jan. 2020.
[14] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley & Sons, 2006.
[15] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.