Continuous Random Variable Estimation is not Optimal for the Witsenhausen Counterexample
Maël Le Treust
ETIS UMR 8051, CY Cergy Paris Université, ENSEA, CNRS

Tobias J. Oechtering
KTH Royal Institute of Technology, EECS, Div. of Information Science and Engineering
Abstract: Optimal design of distributed decision policies can be a difficult task, as illustrated by the famous Witsenhausen counterexample. In this paper we characterize the optimal control designs for the vector-valued setting, assuming that the design results in an internal state that can be described by a continuous random variable which has a probability density function. More specifically, we provide a genie-aided outer bound that relies on our previous results for empirical coordination problems. This solution turns out to be not optimal in general, since it consists of a time-sharing strategy between two linear schemes of specific power. It follows that the optimal decision strategy for the original scalar Witsenhausen problem must lead to an internal state that cannot be described by a continuous random variable which has a probability density function.
I. INTRODUCTION
Distributed decision-making systems arise in many engineering problems where decentralized agents choose actions based on locally available information so as to minimize a common cost function. The information at each agent is either locally observed or received from other agents. Since the process of sharing information comes with a cost, agents usually do not have access to the whole information available at all agents. The design of optimal decision strategies for such problems is considered to be notoriously difficult. The Witsenhausen counterexample [1] from 1968 is an outstanding toy example that has significantly helped to understand the fundamental difficulty that actions serve two purposes: a control purpose affecting the system state, and a communication purpose providing information to other agents [2].

Although Witsenhausen refuted with his simple two-point counterexample the assertion that a linear policy would also be optimal in such a Gaussian setting, the optimal non-linear policy remains unknown. Many researchers have approached the optimization problem with various methods. In the last decade, for instance, it has been approached with numerical optimization methods [3], [4], where the latter is based on an iterative source-channel coding approach. Analytically, using results from optimal transport theory, it has been shown in [5] that the optimal decision strategy is a strictly increasing unbounded piece-wise real analytic function with a real analytic
left inverse. More necessary conditions have been derived in [6] by analyzing an equivalent optimization problem on the space of square-integrable quantile functions. However, it is unclear if the optimal decision policy of the first agent results in an internal state that can be described by a continuous random variable.

In this work, we show that the optimal decision strategy will not lead to an internal state that can be described by a continuous random variable that has a probability density function. The observation rests on a subtle point in an outer bound argument, which might easily be overlooked. We will further discuss that this observation, and in essence also the Witsenhausen counterexample, can be explained by the relation between the MMSE and the mutual information considering Gaussian or binary distributed inputs [7].

The authors gratefully acknowledge the financial support of SRV ENSEA for visits at KTH in Stockholm in 2017 and 2019, and at ETIS in Cergy in 2018. This research has been conducted as part of the Labex MME-DII (ANR11-LBX-0023-01). Part of the research has been supported by the Swedish Research Council (VR) under grant 2020-03884.

Fig. 1. The state and the channel noise are drawn according to the i.i.d. Gaussian distributions X0^n ∼ N(0, Q·I) and Z^n ∼ N(0, N·I). The internal state sequence X1^n is causally estimated by the decision maker C2.

Our approach is based on a vector-valued extension of the Witsenhausen counterexample as proposed by Grover and Sahai in [8]. They study a non-causal encoding and decoding strategy that combines a coding scheme with side information and a linear scheme, which has been shown to be optimal by Choudhuri and Mitra in [9]. It has later been observed that such problems can also be approached as an empirical coordination coding problem.
In [10], we have provided an overview of the individual findings and completed the missing cases using coding results from [11]. In [12], we have derived an achievability result considering non-causal encoding and causal decoding with a continuous alphabet, building on proof methods from [13]. In this work, we derive a genie-aided outer bound for this case, considering only decision strategies that result in continuous random variables which have a probability density function.

II. SYSTEM MODEL
In this work, we restrict our study to continuous random variables which have a probability density function (pdf), see [14, Chap. 8]. For brevity, we only refer to continuous random variables.

We consider the vector-valued Witsenhausen setup in which the sequences of states and channel noises are drawn independently according to the i.i.d. Gaussian distributions X0^n ∼ N(0, Q·I) and Z^n ∼ N(0, N·I) with min(Q, N) > 0, where I denotes the identity matrix. We denote by X1 the internal state and by Y1 the output of the noisy channel:

X1 = X0 + U1, with X0 ∼ N(0, Q),   (1)
Y1 = X1 + Z = X0 + U1 + Z, with Z ∼ N(0, N).   (2)

We denote by P_X0 = N(0, Q) the Gaussian state distribution and by P_{X1 Y1 | X0 U1} the conditional probability distribution corresponding to equations (1) and (2).

Definition 1.
For n ∈ N* = N \ {0}, a "control design" with non-causal encoder and causal decoder is a tuple of stochastic functions c = (f, {g_t}_{t ∈ {1,...,n}}) defined by

f : X0^n → U1^n,   g_t : Y1^t → U2, for all t ∈ {1, ..., n},   (3)

which induces a distribution over the sequences given by

( ∏_{t=1}^n P_{X0,t} ) · f_{U1^n | X0^n} · ( ∏_{t=1}^n P_{X1,t Y1,t | X0,t U1,t} ) · ( ∏_{t=1}^n g_{U2,t | Y1^t} ).

We denote by C_d(n) the set of control designs with non-causal encoder and causal decoder c = (f, {g_t}_{t ∈ {1,...,n}}) that induce sequences of continuous random variables.

Definition 2.
We define the n-stage costs associated with c by

γ_p^n(c) = E[ (1/n) ∑_{t=1}^n U1,t² ] if it exists, and +∞ otherwise,   (4)
γ_s^n(c) = E[ (1/n) ∑_{t=1}^n (X1,t − U2,t)² ] if it exists, and +∞ otherwise.   (5)

The pair of costs (P, S) ∈ R² is achievable if for all ε > 0 there exists n̄ ∈ N* such that for all n ≥ n̄ there exists a control design c ∈ C_d(n) with

|P − γ_p^n(c)| + |S − γ_s^n(c)| ≤ ε.   (6)

Theorem 1.
The pair of Witsenhausen costs (P, S) is achievable if and only if there exist continuous random variables with probability distribution that decomposes according to

P_X0 · Q_{U1 W1 W2 | X0} · P_{X1 Y1 | X0 U1} · Q_{U2 | W2 Y1},   (7)

where (W1, W2) are auxiliary random variables such that

0 ≤ I(W2; Y1 | W1) − I(W1, W2; X0), with P = E_Q[U1²] and S = E_Q[(X1 − U2)²].   (8)

This result is stated in [12, Theorem 1].

Remark 1.
The probability distribution in (7) satisfies the Markov chains

(X1, Y1) - (X0, U1) - (W1, W2),
U2 - (Y1, W2) - (X0, X1, U1, W1),
X1 - (X0, U1) - (X0, U2, Y1, W1, W2).   (9)

The causality condition prevents the controller C2 from recovering W1, which induces the second Markov chain of (9).

Definition 3.
The optimal cost considering continuous random variables is characterized by the optimization problem defined as follows:

S_c(P) = min_{Q ∈ Q_c(P)} E_Q[(X1 − U2)²],   (10)

Q_c(P) = { (Q_{U1 W1 W2 | X0}, Q_{U2 | W2 Y1}) s.t. P = E_Q[U1²], I(W2; Y1 | W1) − I(W1, W2; X0) ≥ 0, and X0, U1, W1, W2, X1, Y1, U2 are continuous }.   (11)

Lemma 1 (Lemma 11 in [1]). The best linear scheme is U1 = −√(P/Q)·X0 if P ≤ Q, and otherwise U1 = −X0 + √(P − Q). This induces the estimation cost given by

S_ℓ(P) = (√Q − √P)²·N / ( (√Q − √P)² + N ) if P ∈ [0, Q], and 0 otherwise.   (12)

Fig. 2. The curve S_ℓ(P) in (12) and the straight line N·(Q − N − P)/Q, for Q = 2 and N = 0.…

Theorem 2.
The optimal cost with continuous random variables satisfies

S_c(P) = N·(Q − N − P)/Q if Q > 4N and P ∈ [P1, P2], and S_c(P) = S_ℓ(P) otherwise,   (13)

with

P1 = (1/2)·( Q − 2N − √(Q·(Q − 4N)) ),   (14)
P2 = (1/2)·( Q − 2N + √(Q·(Q − 4N)) ).   (15)

If P > Q, the internal state X1 can be canceled, and the offset √(P − Q) is only included to meet the power constraint with equality as in (6). The proof of Theorem 2 is stated in Sec. V-A and V-B. Figure 2 represents the cost of the linear scheme S_ℓ(P) and the straight line P ↦ N·(Q − N − P)/Q. Note that the upper bound in (13) may be obtained by using either a time-sharing strategy between the two linear schemes with parameters P1 and P2, when Q > 4N and P ∈ [P1, P2], or a linear scheme of power P. This result shows that memoryless policies are optimal, so that these policies are also optimal for the original scalar Witsenhausen counterexample setup restricted to continuous random variables. However, as pointed out by Witsenhausen in [1], such a strategy in the original scalar model is generally not optimal!

III. WITSENHAUSEN'S TWO-POINT STRATEGY
The two-point strategy described in [1, pp. 141] outperforms the optimal cost with continuous random variables S_c(P) of Theorem 2. We consider Q = 10, N = 1 and the sender's strategy with parameter a ≥ 0 given by

U1 = a·sign(X0) − X0,   (16)

which induces X1 = a·sign(X0). For all a ≥ 0, the pair of costs is given by

P_t(a) = Q + a·( a − 2·√(2Q/π) ),   (17)
S_t(a) = (a²/√(2π))·e^{−a²/2} · ∫_{−∞}^{+∞} e^{−y²/2} / cosh(a·y) dy.   (18)

Fig. 3 shows that for a range of values of a, the pair of costs (P_t(a), S_t(a)) Pareto-dominates S_c(P) of Theorem 2 for some P. From the previous, we have the following theorem.

Fig. 3. The lasso-shaped curve corresponds to Witsenhausen's two-point strategy with parametric equations (17) and (18).
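These costs can be checked numerically. The sketch below is illustrative: it assumes the reconstructed forms of (17) and (18) (in particular the factor 2√(2Q/π) and the normalization of the integral), and it uses the estimator a·tanh(a·Y1), which is the conditional mean of an equiprobable ±a input in unit-variance Gaussian noise. It compares a Monte Carlo estimate of S_t(a) with the closed form and with the continuous-variable optimum S_c(P_t(a)) of Theorem 2, for Q = 10, N = 1 and the illustrative choice a = 2.

```python
import math
import random

Q, N = 10.0, 1.0
a = 2.0                                   # one illustrative strategy parameter

# Power cost (17): E[(a*sign(X0) - X0)^2] = Q + a*(a - 2*sqrt(2Q/pi)).
P_t = Q + a * (a - 2.0 * math.sqrt(2.0 * Q / math.pi))

# Monte Carlo estimate of S_t(a): X1 = a*sign(X0) is +/- a equiprobably,
# Y1 = X1 + Z with Z ~ N(0, 1); the MMSE estimator of X1 is a*tanh(a*Y1).
rng = random.Random(2)
n = 400_000
err = 0.0
for _ in range(n):
    X1 = a if rng.random() < 0.5 else -a
    Y1 = X1 + rng.gauss(0.0, 1.0)
    err += (X1 - a * math.tanh(a * Y1)) ** 2
S_t_mc = err / n

# Closed form (18), as reconstructed here, via a plain Riemann sum on [-8, 8].
h = 1e-3
integral = sum(math.exp(-(k * h) ** 2 / 2) / math.cosh(a * k * h) * h
               for k in range(-8000, 8001))
S_t = a * a * math.exp(-a * a / 2) / math.sqrt(2 * math.pi) * integral

# Continuous-variable optimum of Theorem 2 at the same power
# (P_t lies inside [P1, P2] for these parameters).
S_c = N * (Q - N - P_t) / Q
print(round(P_t, 3), round(S_t, 4), round(S_t_mc, 4), round(S_c, 4))
```

At this operating point the two-point strategy yields an estimation cost of roughly 0.27 at power P_t ≈ 3.9, while the continuous-variable optimum at the same power is about 0.51, in line with the next theorem.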
Theorem 3.
There exist values of Q, N, and a for which S_t(a) < S_c(P_t(a)).

Discussion

In the following, we briefly explain the Witsenhausen counterexample result in terms of the I-MMSE formula by Guo, Shamai and Verdú [7]. The formula has been used to illustrate in [7, Fig. 1] that binary inputs in additive Gaussian noise channels result in a lower MMSE than Gaussian distributed inputs with the same SNR. In other words, to achieve the same MMSE, a binary distributed input requires less channel input power than a Gaussian distributed input. Exactly this power gain is exploited in the Witsenhausen counterexample scheme, in which the internal state X1 is binary, so that the resulting MMSE outperforms the MMSE of the best linear scheme. Analytically, it is interesting to see that the MMSE formulas (17) and (18) directly relate to [7, Equations (13) and (17)] with the corresponding signal powers and noise power.

IV. CONCLUSION
Our results show that information-theoretic methods, in particular coordination coding results, lead to new insights on the Witsenhausen counterexample. Vice versa, we believe that our observation makes the Witsenhausen counterexample also interesting for other source-channel coding problems. In more detail, we show that a convex combination of linear memoryless policies is optimal for the vector-valued Witsenhausen problem with causal decoder restricted to the space of continuous random variables. Since the policy is memoryless, it follows that the linear policy is also optimal for the original Witsenhausen problem restricted to the space of continuous random variables which have a pdf. However, Witsenhausen's two-point strategy outperforms the best linear strategy, which implies that the hypothesis of a continuous random variable with a pdf is an active restriction for the Witsenhausen counterexample setup. According to the Lebesgue decomposition theorem, every probability distribution on the real line is a mixture of a discrete part, a singular part, and an absolutely continuous part. Accordingly, we conclude that the optimal decision strategy for the unrestricted Witsenhausen counterexample must lead to an internal state that cannot be described by a continuous random variable which has a pdf. In future works, we will consider policies that result in internal states described by more general probability distributions.

V. PROOFS
A. Lower bound for Theorem 2
The Markov chain Y1 - (X0, U1) - (W1, W2) implies

I(W2; Y1 | W1) − I(W1, W2; X0) ≤ I(W2; Y1 | W1, X0) − I(W1; X0)   (19)
 ≤ I(U1; Y1 | W1, X0) − I(W1; X0).   (20)

Therefore,

S_c(P) ≥ min_{Q_{U1 W1 | X0} ∈ Q1(P), Q_{U2 | W1 Y1}} E_Q[(X1 − U2)²]   (21)
 ≥ min_{Q_{U1 W1 | X0} ∈ Q1(P)} E_Q[(X1 − E[X1 | W1, Y1])²],   (22)

where

Q1(P) = { Q_{U1 W1 | X0} s.t. P = E_Q[U1²], I(U1; Y1 | W1, X0) − I(W1; X0) ≥ 0, and (X0, U1, W1, X1, Y1, U2) are continuous }.   (23)

The random variables (X0, W1, U1), drawn according to Q'_{U1 W1 | X0} which is optimal for (22), have covariance matrix

K = [ Q, ρ1·√(QV), ρ2·√(QP); ρ1·√(QV), V, ρ3·√(VP); ρ2·√(QP), ρ3·√(VP), P ],   (24)

where V denotes the variance of W1 and the correlation coefficients (ρ1, ρ2, ρ3) ∈ [−1, 1]³ are such that QVP·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) ≥ 0, i.e. K is positive semi-definite. We denote by Q_{U1 W1 | X0} the Gaussian conditional distribution such that (X0, W1, U1) ∼ N(0, K). According to [15, Maximum Differential Entropy Lemma, pp. 21], we have

E_{Q'}[(X1 − E[X1 | Y1, W1])²] ≥ (1/(2πe)) · e^{2h(X1 | Y1, W1)}   (25)
 = E_Q[(X1 − E[X1 | Y1, W1])²].   (26)

Moreover, both P_X0 Q'_{U1 W1 | X0} and P_X0 Q_{U1 W1 | X0} satisfy

0 ≤ I_{Q'}(U1; Y1 | W1, X0) − I_{Q'}(W1; X0)   (27)
 = h_{Q'}(Y1 | W1, X0) − h_Q(Y1 | W1, X0, U1) − h_Q(X0) + h_{Q'}(X0 | W1)   (28)
 ≤ h_Q(Y1 | W1, X0) − h_Q(Y1 | W1, X0, U1) − h_Q(X0) + h_Q(X0 | W1)   (29)
 = I_Q(U1; Y1 | W1, X0) − I_Q(W1; X0),   (30)

where (28) comes from h_{Q'}(X0) = h_Q(X0) and h_{Q'}(Y1 | W1, X0, U1) = h_Q(Z) = h_Q(Y1 | W1, X0, U1), and (29) comes from [14, (8.92), pp. 256].

Lemma 2.
Assume that (X0, W1, U1) ∼ N(0, K); then

I(U1; Y1 | X0, W1) − I(X0; W1) = (1/2)·log( (P/N)·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) + (1 − ρ1²) ),   (31)

E_Q[(X1 − E[X1 | Y1, W1])²] = N·( Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ) / ( N + Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ).   (32)

The proof of Lemma 2 is stated in Sec. V-C. Note that equations (31) and (32) do not depend on the variance parameter V of the auxiliary random variable W1. Also, when (31) is positive, the matrix K is positive semi-definite. By using Lemma 2, we reformulate (22): since the function x ↦ N·x/(N + x) is strictly increasing for all x ≥ 0, the optimal parameters (ρ1⋆, ρ2⋆, ρ3⋆) ∈ [−1, 1]³ minimize

Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3),   (33)

under the constraint

(P/N)·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) − ρ1² ≥ 0   (34)
⇔ (1 − ρ1²)·(1 − ρ3²) − (N/P)·ρ1² ≥ (ρ2 − ρ1ρ3)²,   (35)

which yields

ρ2⋆ = ρ1ρ3 − √( (1 − ρ1²)·(1 − ρ3²) − (N/P)·ρ1² ).   (36)

Lemma 3. If Q > 4N and P ∈ [P1, P2], then

ρ1⋆ = √( (P·Q − (P + N)²) / ((P + N)·Q) ),  ρ3⋆ = 0.   (37)

If Q ≤ 4N, or if Q > 4N and P ∈ [0, P1] ∪ [P2, Q], then ρ1⋆ = 0, ρ3⋆ = 0.   (38)

If P > Q, then ρ1⋆ = 0, ρ3⋆ = √((P − Q)/P).   (39)

The proof of Lemma 3 is stated in Sec. V-D. We obtain the lower bound by replacing the optimal parameters (ρ1⋆, ρ2⋆, ρ3⋆) in (32), that is, if P ∈ [0, Q],

S_c(P) ≥ N·(Q − N − P)/Q if Q > 4N and P ∈ [P1, P2], and
S_c(P) ≥ (√Q − √P)²·N / ( (√Q − √P)² + N ) otherwise.   (40)

B. Upper bound for Theorem 2

1) Linear scheme:
By considering the best linear scheme

U1 = −√(P/Q)·X0 if P < Q, and U1 = −X0 + √(P − Q) if P ≥ Q,   (41)

we have that S_c(P) ≤ S_ℓ(P) for all P ≥ 0.
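As a sanity check of (41) and (12), the following sketch (with illustrative parameter values) compares the closed-form cost S_ℓ(P) with a Monte Carlo estimate of the MMSE achieved by the linear scheme:

```python
import math
import random

def S_ell(P, Q, N):
    """Estimation cost (12) of the best linear scheme."""
    if P > Q:
        return 0.0   # the state is cancelled, only a known offset remains
    s = (math.sqrt(Q) - math.sqrt(P)) ** 2   # Var(X1) under U1 = -sqrt(P/Q)*X0
    return s * N / (s + N)

Q, N, P = 2.0, 1.0, 0.5      # illustrative values with P < Q
rng = random.Random(1)
n = 200_000
err = 0.0
for _ in range(n):
    X0 = rng.gauss(0.0, math.sqrt(Q))
    U1 = -math.sqrt(P / Q) * X0              # linear scheme of power P, eq. (41)
    X1 = X0 + U1
    Y1 = X1 + rng.gauss(0.0, math.sqrt(N))
    s = (math.sqrt(Q) - math.sqrt(P)) ** 2
    err += (X1 - s / (s + N) * Y1) ** 2      # Gaussian MMSE estimate of X1 from Y1
mc = err / n
print(round(S_ell(P, Q, N), 4), round(mc, 4))
```

The empirical average squared error matches the closed form (about 1/3 for these values), as expected for jointly Gaussian (X1, Y1).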
2) Case where Q > 4N and P ∈ [P1, P2]: The upper bound of Theorem 2 can be obtained by using a time-sharing strategy between the two linear schemes with parameters P1 and P2. We obtain the same result by assuming that the random variables (U1, W1, W2) are drawn according to

W1 = √( (P + N) / (P·Q − (P + N)²) ) · (X0 − Z2), with Z2 ∼ N(0, (QN + (P + N)²)/(P + N)) and Z2 ⊥ W2,   (42)

W2 = ( (P·Q − (P + N)²) / (Q·(P + N)) ) · X0 + U1', with U1' ⊥ (X0, W1) and U1' ∼ N(0, N·(P·Q − (P + N)²) / (QN + (P + N)²)),   (43)

U1 = −( (P + N)² / (QN + (P + N)²) )·X0 + U1' + √( (P·Q − (P + N)²)/(P + N) ) · ( (P + N)/(QN + (P + N)²) ) · W1.   (44)

Then we have

I(W2; Y1, W1) − I(W2; X0, W1) = I(U1; Y1 | X0, W1)   (45)
 = I(X0; W1) = (1/2)·log( 1 + (P·Q − (P + N)²)/(QN + (P + N)²) ),   (46)

and

S_c(P) ≤ N·(Q − P − N)/Q.   (47)
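The geometry behind this construction can be verified numerically. The sketch below, which assumes the reconstructed expressions (14) and (15) for P1 and P2, checks that the straight line N·(Q − N − P)/Q touches the curve S_ℓ at P1 and P2, lies strictly below it in between, and is traced exactly by time-sharing averages of the two tangent schemes:

```python
import math

def S_ell(P, Q, N):
    """Linear-scheme cost (12)."""
    if P > Q:
        return 0.0
    s = (math.sqrt(Q) - math.sqrt(P)) ** 2
    return s * N / (s + N)

Q, N = 10.0, 1.0                          # requires Q > 4N
d = math.sqrt(Q * (Q - 4 * N))
P1, P2 = 0.5 * (Q - 2 * N - d), 0.5 * (Q - 2 * N + d)

def line(P):
    """Time-sharing segment of Theorem 2."""
    return N * (Q - N - P) / Q

# The segment touches the linear-scheme curve exactly at P1 and P2.
touch = max(abs(S_ell(P1, Q, N) - line(P1)), abs(S_ell(P2, Q, N) - line(P2)))

# Time sharing: run the P1-scheme on a fraction t of the block and the
# P2-scheme on the rest; both costs average linearly in t.
on_segment = all(
    abs(t * S_ell(P1, Q, N) + (1 - t) * S_ell(P2, Q, N)
        - line(t * P1 + (1 - t) * P2)) < 1e-9
    for t in [k / 10 for k in range(11)]
)

# Strictly below the curve in the interior of [P1, P2].
below = all(S_ell(P, Q, N) > line(P)
            for P in [P1 + k * (P2 - P1) / 12 for k in range(1, 12)])
print(round(touch, 12), on_segment, below)
```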
C. Proof of Lemma 2
We consider (X0, W1, U1) ∼ N(0, K) with K defined in (24), which, together with (2), induces the Gaussian random variables (X0, W1, Y1) whose entropy is

h(X0, W1, Y1) = (1/2)·log( (2πe)³·Q·V·( P·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) + N·(1 − ρ1²) ) ).   (48)-(49)

Therefore we have

I(U1; Y1 | X0, W1) − I(X0; W1) = h(X0, W1, Y1) − h(Y1 | U1, X0, W1) − h(X0) − h(W1)   (50)
 = (1/2)·log( (P/N)·(1 − ρ1² − ρ2² − ρ3² + 2ρ1ρ2ρ3) + (1 − ρ1²) ).   (51)

According to (1) and (2), the entropy of (X1, W1, Y1) writes

h(X1, W1, Y1) = (1/2)·log( (2πe)³·V·N·( Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ) ),   (52)-(53)

and hence

E[(X1 − E(X1 | Y1, W1))²] = N·( Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ) / ( N + Q·(1 − ρ1²) + P·(1 − ρ3²) + 2·√(QP)·(ρ2 − ρ1ρ3) ).   (54)

D. Proof of Lemma 3
We replace ρ2⋆ in (33) and define

f(ρ1, ρ3) = Q·(1 − ρ1²) + P·(1 − ρ3²) − 2·√(QP)·√( (1 − ρ1²)·(1 − ρ3²) − (N/P)·ρ1² ).   (55)

Note that f is well defined if ρ1² ≤ P/(P + N) and ρ3² ≤ 1 − (N/P)·ρ1²/(1 − ρ1²). Since

∂f(ρ1, ρ3)/∂ρ3 = 2ρ3·( √(QP)·(1 − ρ1²) / √( (1 − ρ1²)·(1 − ρ3²) − (N/P)·ρ1² ) − P ),   (56)

then for all ρ1² ≤ P/(P + N), the optimal ρ3⋆(ρ1) satisfies

ρ3⋆(ρ1)² = max( 1 − (Q/P)·(1 − ρ1²) − (N/P)·ρ1²/(1 − ρ1²), 0 ).   (57)

We introduce the parameters

ρa = √( ( 2Q − (P + N) − √((P + N)² − 4QN) ) / (2Q) ),   (58)
ρb = √( ( 2Q − (P + N) + √((P + N)² − 4QN) ) / (2Q) ),   (59)

and we define the function

F(ρ1) = f(ρ1, ρ3⋆(ρ1)) =
  Q·(1 − ρ1²) + P − 2·√(QP)·√( 1 − ρ1²·(P + N)/P ) if 0 ≤ ρ1 ≤ ρa,
  N·ρ1²/(1 − ρ1²) if ρa ≤ ρ1 ≤ ρb,
  Q·(1 − ρ1²) + P − 2·√(QP)·√( 1 − ρ1²·(P + N)/P ) if ρb ≤ ρ1 ≤ √(P/(P + N)).   (60)

The function F(ρ1) is continuous at ρa and ρb. We define

ρ⋆ = √( (P·Q − (P + N)²) / ((P + N)·Q) ).   (61)

• If Q > 4N and P ∈ [P1, P2], then the function F(ρ1) is decreasing over the interval ρ1 ∈ [0, ρ⋆] and increasing over the interval ρ1 ∈ [ρ⋆, √(P/(P + N))]; the optimal parameters are then

ρ1 = ρ⋆, ρ3 = 0.   (62)

• If Q ≤ 4N, or if Q > 4N and P ∈ [0, P1] ∪ [P2, Q], then the optimal parameters are ρ1 = ρ3 = 0.
• If P > Q, then

ρ1 = 0, ρ3 = √((P − Q)/P).   (63)
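The reduction above lends itself to a brute-force verification. The following sketch, which assumes the reconstructed objective (55), the constraint (35), and the closed forms (61) and (40), grid-searches f(ρ1, ρ3) over the feasible region for illustrative parameters and maps the minimum G through G ↦ N·G/(N + G) from (32):

```python
import math

Q, N, P = 10.0, 1.0, 4.0                 # Q > 4N and P inside [P1, P2]

def f(r1, r3):
    """Objective (55); returns None when the constraint (35) is infeasible."""
    D = (1 - r1 * r1) * (1 - r3 * r3) - (N / P) * r1 * r1
    if D < 0:
        return None
    return (Q * (1 - r1 * r1) + P * (1 - r3 * r3)
            - 2 * math.sqrt(Q * P * D))

# f depends on rho1 and rho3 only through their squares, so a grid over
# non-negative values suffices; rho1 is restricted to rho1^2 <= P/(P+N).
grid = 500
best = min(v for v in (f(i / grid * math.sqrt(P / (P + N)), j / grid)
                       for i in range(grid + 1) for j in range(grid + 1))
           if v is not None)

rho_star = math.sqrt((P * Q - (P + N) ** 2) / ((P + N) * Q))   # eq. (61)
G_star = f(rho_star, 0.0)        # claimed minimizer (rho1, rho3) = (rho*, 0)
S_lower = N * best / (N + best)  # map the minimum G through (32)
print(round(G_star, 6), round(best, 6), round(S_lower, 4),
      round(N * (Q - N - P) / Q, 4))
```

For these parameters the grid minimum agrees with the closed-form minimizer, and the resulting bound coincides with the straight line N·(Q − N − P)/Q of (40).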
REFERENCES
[1] H. Witsenhausen, "A counterexample in stochastic optimum control," SIAM Journal on Control, vol. 6, no. 1, pp. 131–147, 1968.
[2] S. Yüksel and T. Başar, Stochastic Networked Control Systems: Stabilization and Optimization under Information Constraints, ser. Systems & Control: Foundations & Applications. New York, NY: Springer, 2013.
[3] S.-H. Tseng and A. Tang, "A Local Search Algorithm for the Witsenhausen's Counterexample," in Proc. IEEE CDC, Dec. 2017.
[4] J. Karlsson, A. Gattami, T. J. Oechtering, and M. Skoglund, "Iterative source-channel coding approach to Witsenhausen's counterexample," in Proc. IEEE ACC.
[5] Y. Wu and S. Verdú, "Witsenhausen's counterexample: a view from optimal transport theory," in Proc. IEEE CDC.
[6] Automatica.
[7] D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, April 2005.
[8] P. Grover and A. Sahai, "Witsenhausen's counterexample as assisted interference suppression," Int. J. Syst., Control Commun., 2010.
[9] C. Choudhuri and U. Mitra, "On Witsenhausen's counterexample: the asymptotic vector case," pp. 162–166, July 2012.
[10] M. Le Treust and T. J. Oechtering, "Optimal Control Designs for Vector-valued Witsenhausen Counterexample Setups," in IEEE 56th Annual Allerton Conf. on Commun., Control, and Comp., Sept. 2018.
[11] M. Le Treust, "Joint Empirical Coordination of Source and Channel," IEEE Trans. Inf. Theory, vol. 63, no. 8, pp. 5087–5114, Aug. 2017.
[12] T. J. Oechtering and M. Le Treust, "Coordination Coding with Causal Decoder for Vector-valued Witsenhausen Counterexample Setups," in IEEE Information Theory Workshop (ITW), Aug. 2019.
[13] M. T. Vu, T. J. Oechtering, and M. Skoglund, "Hierarchical Identification With Pre-Processing," IEEE Trans. Inf. Theory, Jan. 2020.
[14] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley & Sons, 2006.
[15] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.