Information Bottleneck for an Oblivious Relay with Channel State Information: the Vector Case
Hao Xu∗, Tianyu Yang∗, Giuseppe Caire∗, and Shlomo Shamai (Shitz)†
∗Faculty of Electrical Engineering and Computer Science, Technical University of Berlin, 10587 Berlin, Germany
†Department of Electrical Engineering, Technion—Israel Institute of Technology, Haifa 3200003, Israel
E-mail: [email protected]; [email protected]; [email protected]; [email protected]
Abstract—This paper considers the information bottleneck (IB) problem of a Rayleigh fading multiple-input multiple-output (MIMO) channel. Due to the bottleneck constraint, it is impossible for the oblivious relay to inform the destination node of the perfect channel state information (CSI) in each channel realization. To evaluate the bottleneck rate, we provide an upper bound, obtained by assuming that the destination node can get the perfect CSI at no cost, and two achievable schemes with simple symbol-by-symbol relay processing and compression. Numerical results show that the lower bounds obtained by the proposed achievable schemes come close to the upper bound over a wide range of relevant system parameters.
I. INTRODUCTION
For a Markov chain X → Y → Z and an assigned joint probability distribution p_{X,Y}, consider the following information bottleneck (IB) problem

  max_{p_{Z|Y}}  I(X; Z)                                        (1a)
  s.t.  I(Y; Z) ≤ C,                                            (1b)

where C is the bottleneck constraint parameter and the optimization is with respect to the conditional probability distribution p_{Z|Y} of Z given Y. Formulation (1) was introduced by Tishby in [1], and has been used to interpret the behavior of deep learning neural networks [2]. From a more fundamental information-theoretic viewpoint, the IB arises from the classical remote source coding problem [3], [4] under logarithmic distortion [5].

An interesting application of the IB problem in communications consists of a source node, an oblivious relay, and a destination node, which is connected to the relay via an error-free link with capacity C. The source node sends codewords over a communication channel and an observation is made at the relay. X and Y are respectively the channel input from the source node and the output at the relay. The relay is oblivious in the sense that it cannot itself decode the information message of the source node. This feature can be modeled rigorously by assuming that the source and destination nodes use a codebook selected at random from a library, while the relay is unaware of this random selection. Hence, the relay must treat X as a random process with a distribution induced by the random selection over the codebook library (see [6] and references therein), and has to produce some useful representation Z and convey it to the destination node subject to the link constraint C. It then makes sense to find Z such that I(X; Z) is maximized.

The IB problem for this kind of communication scenario has been studied in [7]–[10]. In [7], the IB method was applied to reduce the fronthaul data rate of a cloud radio access network (C-RAN).
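Problem (1) has no closed-form solution in general; for finite alphabets it is typically solved by self-consistent alternating updates in the spirit of [1]. The following is a minimal illustrative sketch of our own (not the paper's method), with the constraint (1b) handled through the usual Lagrangian trade-off parameter beta rather than an explicit C:

```python
import numpy as np

def mutual_info(p_ab):
    """I(A;B) in bits from a joint pmf matrix."""
    pa, pb = p_ab.sum(axis=1), p_ab.sum(axis=0)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / np.outer(pa, pb)[mask])).sum())

def ib_solve(p_xy, beta, n_z=3, iters=500, seed=0):
    """Self-consistent IB iterations for discrete X, Y.
    beta trades compression I(Y;Z) against relevance I(X;Z)."""
    rng = np.random.default_rng(seed)
    p_y = p_xy.sum(axis=0)
    p_xgy = p_xy / p_y                                    # columns p(x|y)
    p_zgy = rng.dirichlet(np.ones(n_z), size=p_y.size).T  # (Z, Y), columns sum to 1
    for _ in range(iters):
        p_z = p_zgy @ p_y
        p_ygz = (p_zgy * p_y) / p_z[:, None]              # rows p(y|z)
        p_xgz = p_xgy @ p_ygz.T                           # (X, Z)
        # D_KL(p(x|y) || p(x|z)) for every pair (z, y)
        lr = np.log(np.maximum(p_xgy, 1e-300))[:, None, :] \
           - np.log(np.maximum(p_xgz, 1e-300))[:, :, None]
        kl = (p_xgy[:, None, :] * lr).sum(axis=0)         # (Z, Y)
        p_zgy = p_z[:, None] * np.exp(-beta * kl)
        p_zgy /= p_zgy.sum(axis=0)
    p_zy = p_zgy * p_y                                    # joint p(z, y)
    p_zx = p_zgy @ p_xy.T                                 # joint p(z, x)
    return p_zgy, mutual_info(p_zy), mutual_info(p_zx)
```

For beta → 0 the mapping collapses Z to a constant (maximal compression), while for large beta the relevance I(X;Z) approaches I(X;Y); for any beta the returned pair respects the data-processing inequality I(X;Z) ≤ min{I(Y;Z), I(X;Y)}.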
References [8] and [9] considered Gaussian scalar and vector channels with an IB constraint, respectively, and investigated the optimal trade-off between the compression rate and the relevant information. However, references [7]–[9] all considered block fading channels, so perfect channel state information (CSI) was known at both the relay and the destination node. In [10], we studied the IB problem of a scalar Rayleigh fading channel. Due to the bottleneck constraint, it is impossible to inform the destination node of the perfect CSI in each channel realization. An upper bound and two achievable schemes were provided in [10].

In this paper, we extend the work in [10] to the multiple-input multiple-output (MIMO) channel with independent and identically distributed (i.i.d.) Rayleigh fading. To evaluate the bottleneck rate, we first obtain an upper bound by assuming that the channel matrix is also known at the destination node at no cost. Then, we provide two achievable schemes: the first scheme transmits the compressed noisy signal as well as the quantized noise levels to the destination node, while the second scheme only transmits a compressed estimate. Numerical results show that with simple symbol-by-symbol relay processing and compression, the lower bounds obtained by the proposed achievable schemes come close to the upper bound over a wide range of relevant system parameters.

II. PROBLEM FORMULATION
We consider a system with a source node, an oblivious relay, and a destination node. For convenience, we call the source-relay channel 'Channel 1' and the relay-destination channel 'Channel 2'. For Channel 1, we consider the following Gaussian MIMO channel with i.i.d. Rayleigh fading

  y = Hx + n,                                                   (2)

where x ∈ ℂ^{K×1} and n ∈ ℂ^{M×1} are respectively the zero-mean circularly symmetric complex Gaussian input and noise, with covariance matrices I_K and σ²I_M, i.e., x ∼ CN(0, I_K) and n ∼ CN(0, σ²I_M). H ∈ ℂ^{M×K} is a random matrix independent of both x and n, and the elements of H are i.i.d. zero-mean unit-variance complex Gaussian random variables, i.e., H ∼ CN(0, I_K ⊗ I_M). Let z denote a useful representation of y produced by the relay for the destination node; x → (y, H) → z thus forms a Markov chain. We assume that the relay node has a direct observation of the channel matrix H, while the destination node does not, since H is not subject to any distortion constraint, i.e., it is not needed (explicitly) at the destination. Then, we consider the following IB problem

  max_{p(z|y,H)}  I(x; z)                                       (3a)
  s.t.  I(y, H; z) ≤ C,                                         (3b)

where C is the bottleneck constraint, i.e., the link capacity of Channel 2. In this paper, we call I(x; z) the bottleneck rate and I(y, H; z) the compression rate. Obviously, for a joint probability distribution p(x, y, H) determined by (2), problem (3) is a slightly augmented version of IB problem (1). In our problem, we aim to find a conditional distribution p(z|y, H) such that the bottleneck constraint (3b) is satisfied and the bottleneck rate is maximized, i.e., as much information about x as possible can be extracted from the representation z.

III. INFORMED RECEIVER UPPER BOUND
As stated in [10], an obvious upper bound to problem (3) can be obtained by letting both the relay and the destination node know the channel matrix H. We call the bound in this case the informed receiver upper bound. The IB problem in this case takes the following form

  max_{p(z|y)}  I(x; z | H)                                     (4a)
  s.t.  I(y; z | H) ≤ C.                                        (4b)

In [8], the IB problem for a scalar Gaussian channel with block fading was studied. In the following theorem, we show that for the considered MIMO channel with Rayleigh fading, (4) can be decomposed into a set of parallel scalar IB problems, and the informed receiver upper bound can be obtained based on the result in [8].

Theorem 1.
For the considered MIMO channel with Rayleigh fading, the informed receiver upper bound, i.e., the optimal objective function of IB problem (4), is

  R^ub = T ∫_{ν/ρ}^{∞} [log(1 + ρλ) − log(1 + ν)] f_λ(λ) dλ,    (5)

where T = min{K, M}, ρ = 1/σ², the probability density function (pdf) of λ, i.e., f_λ(λ), is given by (53), and ν is chosen such that the following bottleneck constraint is met

  ∫_{ν/ρ}^{∞} log(ρλ/ν) f_λ(λ) dλ = C/T.                        (6)
Proof:
See Appendix A. □
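The bound (5)–(6) can also be evaluated without the closed-form pdf (53) by Monte Carlo: draw channel matrices, pool the unordered eigenvalues of the Gram matrix, and bisect on the water level ν until the empirical version of (6) is met. A sketch of our own (sample sizes and the bisection bracket are illustrative choices, not from the paper):

```python
import numpy as np

def informed_upper_bound(K, M, snr, C, n_samp=20000, seed=0):
    """Monte Carlo evaluation of (5)-(6): sample the unordered eigenvalues
    lambda of the Wishart Gram matrix, then bisect on the water level nu
    so that the empirical constraint E[log2(rho*lam/nu); lam > nu/rho] = C/T."""
    rng = np.random.default_rng(seed)
    T, rho = min(K, M), snr
    lams = []
    for _ in range(n_samp // T + 1):
        H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        G = H.conj().T @ H if K <= M else H @ H.conj().T   # smaller Gram matrix
        lams.append(np.linalg.eigvalsh(G))
    lam = np.concatenate(lams)

    def constraint(nu):       # empirical E[log2(rho*lam/nu); lam > nu/rho]
        active = lam > nu / rho
        return np.log2(rho * lam[active] / nu).sum() / lam.size

    lo, hi = 1e-12, 1e12      # constraint(nu) is decreasing in nu
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if constraint(mid) > C / T:
            lo = mid
        else:
            hi = mid
    nu = np.sqrt(lo * hi)
    active = lam > nu / rho
    R_ub = T * (np.log2(1 + rho * lam[active]) - np.log2(1 + nu)).sum() / lam.size
    return R_ub, nu
```

By construction the returned rate never exceeds C, consistent with I(x; z|H) ≤ I(y; z|H) ≤ C.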
Lemma 1.
When M → +∞ or ρ → +∞, the upper bound R^ub tends asymptotically to C. When C → +∞, R^ub approaches the capacity of Channel 1, i.e.,

  R^ub → I(x; y, H) = T ∫_0^{∞} log(1 + ρλ) f_λ(λ) dλ.          (7)

Proof:
See Appendix B. □
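The eigenvalue pdf (53) can be spot-checked numerically: it should integrate to one and give E[λ] = S = max{K, M} (for K = 1 it reduces to the Erlang(M, 1) density used in Appendix B, with mean M). A sketch of our own using SciPy's generalized Laguerre polynomials (helper name and integration grid are illustrative choices):

```python
import numpy as np
from math import factorial
from scipy.special import eval_genlaguerre

def f_lambda(lam, K, M):
    """Unordered-eigenvalue pdf (53) with S = max(K, M), T = min(K, M)."""
    S, T = max(K, M), min(K, M)
    lam = np.asarray(lam, dtype=float)
    out = np.zeros_like(lam)
    for i in range(T):
        coef = factorial(i) / factorial(i + S - T)
        out += coef * eval_genlaguerre(i, S - T, lam) ** 2 * lam ** (S - T) * np.exp(-lam)
    return out / T
```

Checking normalization and the mean on a fine grid verifies the reconstruction of (53)–(54).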
IV. ACHIEVABLE SCHEMES
In this section, we provide two achievable schemes; each scheme guarantees the bottleneck constraint and gives a lower bound to the bottleneck rate.
A. Quantized channel inversion (QCI) scheme when K ≤ M

In our first achievable scheme, the relay first obtains an estimate of the channel input using channel inversion, and then transmits the compressed noisy signal and quantized noise levels to the destination node.

In particular, we apply the pseudo-inverse matrix of H, i.e., (H^H H)^{−1} H^H, to y, and get the zero-forcing estimate of x as follows

  x̃ = (H^H H)^{−1} H^H y = x + (H^H H)^{−1} H^H n ≜ x + ñ.      (8)

For a given channel matrix H, ñ ∼ CN(0, A), where A = σ²(H^H H)^{−1}. Let A = A₁ + A₂, where A₁ and A₂ respectively consist of the diagonal and off-diagonal elements of A, i.e., A₁ = A ⊙ I_K and A₂ = A − A₁. If H could be perfectly transmitted to the destination node, the bottleneck rate could be obtained by following steps similar to those in Appendix A. However, since H follows a non-degenerate continuous distribution and the bottleneck constraint is finite, this is not possible. To reduce the number of bits per channel use required for informing the destination node of the channel information, we only convey a compressed version of A₁ and consider a set of independent scalar Gaussian sub-channels. Specifically, we force each diagonal entry of A₁ to belong to a finite set of quantized levels by adding artificial noise, i.e., by introducing physical degradation. We fix a finite grid of J positive quantization points B = {b₁, ..., b_J}, where b₁ ≤ b₂ ≤ ... ≤ b_{J−1} < b_J, b_J = +∞, and define the following ceiling operation

  ⌈a⌉_B = arg min_{b ∈ B} {a ≤ b}.                              (9)

Then, by adding a Gaussian noise vector ñ′ ∼ CN(0, diag{⌈a₁⌉_B − a₁, ..., ⌈a_K⌉_B − a_K}), which is independent of everything else, to (8), a degraded version of x̃ can be obtained as follows

  x̂ = x̃ + ñ′ = x + ñ + ñ′ ≜ x + n̂,                              (10)

where n̂ ∼ CN(0, A₁′ + A₂) for a given H, and A₁′ ≜ diag{⌈a₁⌉_B, ..., ⌈a_K⌉_B}. Obviously, due to A₂, the elements in the noise vector n̂ are correlated.

To evaluate the bottleneck rate, we consider a new variable

  x̂_g = x + n̂_g,                                                (11)

where n̂_g ∼ CN(0, A₁′). Obviously, (11) can be seen as K parallel scalar Gaussian sub-channels with noise power ⌈a_k⌉_B for each sub-channel. Since each quantized noise level ⌈a_k⌉_B has only J possible values, it is possible for the relay to inform the destination node of the channel information via the constrained link. Note that from the definition of A in (8), it is known that a_k, ∀k ∈ K ≜ {1, ..., K}, are correlated. The quantized noise levels ⌈a_k⌉_B, ∀k ∈ K, are thus also correlated. Hence, we could jointly source-encode ⌈a_k⌉_B, ∀k ∈ K, to further reduce the number of bits used for CSI feedback. However, since the joint entropy of the quantization indices is difficult to obtain (even numerically, since it is a discrete joint distribution over J^K possible values), in this work we consider the (slightly) suboptimal, but far more practical, entropy coding of each sub-channel quantization index separately. The resulting optimization problem becomes

  max_{p(ẑ_g|x̂_g)}  I(x; ẑ_g | A₁′)                              (12a)
  s.t.  I(x̂_g; ẑ_g | A₁′) ≤ C − Σ_{k=1}^{K} H_k,                 (12b)

where H_k denotes the entropy of ⌈a_k⌉_B. In Appendix C, we show that a_k, ∀k ∈ K, are marginally identically inverse chi-squared distributed with M − K + 1 degrees of freedom.
Hence, H_k = H₁ ≜ −Σ_{j=1}^{J} P_j log P_j, where P_j = Pr{⌈a⌉_B = b_j} and a follows the same distribution as the a_k. The pdf of a is given in (68), based on which the probability mass function (pmf) P_j can be calculated as in (69). The optimal solution of IB problem (12) is given in the following theorem.

Theorem 2. If A₁′ is conveyed to the destination node for each channel realization, the optimal objective function of IB problem (12) is

  R^lb_1 = Σ_{j=1}^{J−1} K P_j [log(1 + ρ_j) − log(1 + ρ_j 2^{−c_j})],   (13)

where ρ_j = 1/b_j, c_j = [log(ρ_j/ν)]⁺, and ν is chosen such that the following bottleneck constraint is met

  Σ_{j=1}^{J−1} K P_j c_j = C − K H₁.                            (14)

Proof:
See Appendix C. □
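The ceiling operation (9) and the per-entry CSI entropy H₁ can be illustrated by Monte Carlo: sample the diagonal entries of A = σ²(H^H H)^{−1}, place the quantization points at quantiles as suggested later in this section, and estimate the pmf P_j. A sketch of our own (function names and sample sizes are illustrative choices):

```python
import numpy as np

def ceil_B(a, B):
    """Ceiling operation (9): smallest quantization level b in B with a <= b."""
    B = np.sort(np.asarray(B))
    idx = np.searchsorted(B, a, side="left")
    return B[idx]                  # b_J = +inf guarantees idx is always valid

def quantile_levels(K, M, snr, J, n_samp=5000, seed=0):
    """Sample a = diagonal entries of sigma^2 (H^H H)^{-1} and place the J
    quantization points at quantiles, so each level has pmf P_j ~= 1/J and the
    per-entry CSI entropy is H1 ~= log2(J) bits."""
    rng = np.random.default_rng(seed)
    sigma2 = 1.0 / snr
    samples = []
    for _ in range(n_samp):
        H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        A = sigma2 * np.linalg.inv(H.conj().T @ H)
        samples.extend(np.real(np.diag(A)))
    a = np.array(samples)
    # b_1 .. b_{J-1} are the j/J-quantiles of a, and b_J = +inf
    B = np.append(np.quantile(a, np.arange(1, J) / J), np.inf)
    levels = ceil_B(a, B)
    pmf = np.array([(levels == b).mean() for b in B])
    H1 = -np.sum(pmf[pmf > 0] * np.log2(pmf[pmf > 0]))
    return B, pmf, H1
```

With quantile levels the pmf is near-uniform, so H₁ ≈ log₂ J bits per entry, matching the per-entry feedback cost B = log J used later in (21).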
Since (11) can be seen as K parallel scalar Gaussian sub-channels, according to [8, (16)], the representation of x̂_g, i.e., ẑ_g, can be constructed by adding independent fading and Gaussian noise to each element of x̂_g. Denote

  ẑ_g = Φx̂_g + n̂_g′ = Φx + Φn̂_g + n̂_g′,                          (15)

where Φ is a diagonal matrix with positive and real diagonal entries, and n̂_g′ ∼ CN(0, I_K). Note that x̂_g in (11) and its representation ẑ_g in (15) are only auxiliary variables. What we are really interested in is the representation of x̂ and the corresponding bottleneck rate. Hence, we also add the fading Φ and Gaussian noise n̂_g′ to x̂ in (10) and get its representation as follows

  z = Φx̂ + n̂_g′ = Φx + Φn̂ + n̂_g′.                                (16)

In the following lemma we show that by transmitting the quantized noise levels ⌈a_k⌉_B, ∀k ∈ K, and the representation z to the destination node, R^lb_1 is an achievable lower bound to the bottleneck rate and the bottleneck constraint is satisfied.

Lemma 2. If A₁′ is forwarded to the destination node for each channel realization, with signal vectors x̂ and x̂_g in (10) and (11), and their representations z and ẑ_g in (16) and (15), we have

  I(x̂; z | A₁′) ≤ I(x̂_g; ẑ_g | A₁′),                             (17)
  I(x; z | A₁′) ≥ I(x; ẑ_g | A₁′),                                (18)

where (17) indicates that I(x̂; z | A₁′) ≤ C − K H₁ and (18) gives I(x; z | A₁′) ≥ R^lb_1.

Proof:
See Appendix D. □
Lemma 3.
When M → +∞ or ρ → +∞, we can always find suitable quantization points B = {b₁, ..., b_J} such that R^lb_1 → C. When C → +∞,

  R^lb_1 → K E[log(1 + 1/a)] ≤ I(x; y, H),                        (19)

where the expectation can be calculated by using the pdf of a in (68), and I(x; y, H) is the capacity of Channel 1.

Proof:
See Appendix E. □
For the sake of simplicity, we may choose the quantization levels as quantiles such that we obtain the uniform pmf P_j = 1/J. The lower bound (13) can thus be simplified as

  R^lb_1 = Σ_{j=1}^{J−1} (K/J) [log(1 + ρ_j) − log(1 + ρ_j 2^{−c_j})],   (20)

and the bottleneck constraint (14) becomes

  Σ_{j=1}^{J−1} [log(ρ_j/ν)]⁺ = JC/K − JB,                        (21)

where B = log J can be seen as the number of bits required for quantizing each diagonal entry of A₁. Since ρ₁ ≥ ... ≥ ρ_{J−1}, from the strict convexity of the problem, we know that there must exist a unique integer 1 ≤ l ≤ J − 1 such that

  Σ_{j=1}^{l} log(ρ_j/ν) = JC/K − JB,   ρ_j ≤ ν, ∀ l + 1 ≤ j ≤ J − 1.   (22)

Hence, ν can be obtained from

  log ν = (1/l) Σ_{j=1}^{l} log ρ_j − JC/(lK) + JB/l,             (23)

and R^lb_1 can be calculated as follows

  R^lb_1 = Σ_{j=1}^{l} (K/J) [log(1 + ρ_j) − log(1 + ν)].         (24)

Then, we only need to test the above condition for l = 1, 2, 3, ... until (22) is satisfied. Note that to ensure R^lb_1 > 0, the quantity JC/K − JB in (21) has to be positive, i.e., B < C/K. Moreover, though choosing the quantization levels as quantiles makes it easier to calculate R^lb_1, the results in Lemma 3 may not hold in this case since the choice of quantization points B = {b₁, ..., b_J} is restricted.

B. MMSE estimate at the relay
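For the quantile case, the search over l in (22)–(24) is a finite water-filling: try l = 1, 2, ..., compute ν from (23), and accept the first l whose water level is consistent. A sketch of our own in bits (log base 2), taking rho_levels = ρ₁ ≥ ... ≥ ρ_{J−1}:

```python
import numpy as np

def qci_rate_quantiles(rho_levels, K, C):
    """Water-filling over the J-1 finite quantile levels, eqs. (20)-(24).
    rho_levels: rho_j = 1/b_j for j = 1..J-1.
    Returns (R_lb1, nu) in bits/complex dimension."""
    rho = np.sort(np.asarray(rho_levels, dtype=float))[::-1]
    J = rho.size + 1                     # b_J = +inf carries zero rate
    Bbits = np.log2(J)                   # bits per quantized noise level
    budget = J * C / K - J * Bbits       # right-hand side of (21)
    assert budget > 0, "need B < C/K for a positive rate"
    for l in range(1, J):
        log_nu = np.log2(rho[:l]).sum() / l - budget / l   # eq. (23)
        nu = 2.0 ** log_nu
        ok_hi = rho[l - 1] > nu          # levels 1..l stay above the water level
        ok_lo = (l == J - 1) or (rho[l] <= nu)
        if ok_hi and ok_lo:              # consistency condition of (22)
            R = (K / J) * (np.log2(1 + rho[:l]) - np.log2(1 + nu)).sum()
            return R, nu
    raise RuntimeError("no consistent water level found")
```

For example, with ρ = (100, 10, 1) (so J = 4, B = 2 bits), K = 2 and C = 12, the accepted index is l = J − 1 = 3 and the left-hand side of (21) evaluates exactly to JC/K − JB = 16.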
In the second achievable scheme, we assume that the relay first produces the MMSE estimate of x given (y, H), and then source-encodes this estimate. Denote

  F = (HH^H + σ²I_M)^{−1} H.                                     (25)

The MMSE estimate of x is thus given by

  x̄ = F^H y = F^H Hx + F^H n.                                    (26)

Then, we consider the following modified IB problem

  max_{p(z|x̄)}  I(x; z)                                          (27a)
  s.t.  I(x̄; z) ≤ C.                                             (27b)

Note that since the matrix HH^H + σ²I_M in (25) is always invertible, the results obtained in this subsection hold no matter whether K ≤ M or K > M.

To evaluate the bottleneck rate I(x; z), we define an auxiliary Gaussian vector x̄_g ∼ CN(0, E[x̄x̄^H]), let z̄_g denote its representation, and choose p(z|x̄) as well as p(z̄_g|x̄_g) to be conditionally Gaussian, i.e.,

  z = x̄ + q,   z̄_g = x̄_g + q,                                    (28)

where q ∼ CN(0, D I_K) is independent of everything else. Let

  I(x̄_g; z̄_g) = log det(I_K + E[x̄x̄^H]/D) = C,                    (29)

based on which D can be calculated. From (28) it is known that the squared-error distortion between z̄_g and x̄_g (or z and x̄) is E[q^H q] = KD. The rate-distortion pair (I(x̄_g; z̄_g), KD) is therefore achievable [11, Theorem 10.4.1]. Due to the fact that the Gaussian input maximizes the mutual information of a Gaussian additive noise channel, we have I(x̄; z) ≤ I(x̄_g; z̄_g). The rate-distortion pair (I(x̄; z), KD) is thus also achievable.

In the following, we obtain a lower bound to I(x; z) by evaluating h(z|H) and h(z|x) separately, and then using

  I(x; z) = h(z) − h(z|x) ≥ h(z|H) − h(z|x).                      (30)

First, since z is conditionally Gaussian given H, we have

  h(z|H) = E[log (πe)^K det(F^H HH^H F + σ²F^H F + D I_K)].       (31)

Next, using the fact that conditioning reduces entropy and that the Gaussian distribution maximizes the entropy over all distributions with the same variance [11, Theorem 8.6.5], we have

  h(z|x) = h(z − E(z|x) | x)
         = h((F^H H − E[F^H H])x + F^H n + q | x)
         ≤ h((F^H H − E[F^H H])x + F^H n + q)
         ≤ log (πe)^K det(G),                                     (32)

where

  G = E[(F^H H − E[F^H H])(H^H F − E[H^H F])] + σ²E[F^H F] + D I_K
    = E[F^H HH^H F] − E[F^H H] E[H^H F] + σ²E[F^H F] + D I_K.     (33)

Combining (30), (31), and (32), we can get a lower bound to I(x; z), as shown in the following theorem.

Theorem 3.
With the MMSE estimate at the relay, a lower bound to I(x; z) can be obtained as follows

  R^lb_2 = T E[log(λ/(λ + σ²) + D)] + (K − T) log D
           − K log{ (T/K) E[λ/(λ + σ²)] − (T/K)² (E[λ/(λ + σ²)])² + D },   (34)

where

  D = (T/K) E[λ/(λ + σ²)] / (2^{C/K} − 1),                        (35)

and the expectations can be calculated by using the pdf of λ in (53).

Proof:
See Appendix F. □
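In the MMSE scheme the only free parameter is the compression-noise power D, fixed by (29). Rather than using the closed form (35), D can be obtained directly from (29) by estimating E[x̄x̄^H] by Monte Carlo and bisecting on D. A sketch of our own (sample sizes and the bisection bracket are illustrative choices):

```python
import numpy as np

def mmse_scheme_noise_power(K, M, snr, C, n_samp=2000, seed=0):
    """Monte Carlo sketch of the MMSE scheme: form x_bar = F^H y with
    F = (H H^H + sigma^2 I_M)^{-1} H as in (25)-(26), estimate E[x_bar x_bar^H],
    then bisect on D so that (29) holds: log2 det(I_K + E[x_bar x_bar^H]/D) = C."""
    rng = np.random.default_rng(seed)
    sigma2 = 1.0 / snr
    Sigma = np.zeros((K, K), dtype=complex)
    for _ in range(n_samp):
        H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        x = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
        n = np.sqrt(sigma2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
        F = np.linalg.solve(H @ H.conj().T + sigma2 * np.eye(M), H)
        xbar = F.conj().T @ (H @ x + n)                 # MMSE estimate (26)
        Sigma += np.outer(xbar, xbar.conj())
    Sigma /= n_samp

    def rate(D):            # left-hand side of (29), in bits
        return np.log2(np.linalg.det(np.eye(K) + Sigma / D).real)

    lo, hi = 1e-12, 1e12    # rate(D) is decreasing in D
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        lo, hi = (mid, hi) if rate(mid) > C else (lo, mid)
    return np.sqrt(lo * hi)
```

The returned D can then be plugged into (34) to evaluate R^lb_2.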
[Figure: curves for R^ub; R^lb_1 (QCI) with B = 1, 2, 4, 8 bits; and R^lb_2 (MMSE).]
Fig. 1. Upper and lower bounds to the bottleneck rate versus ρ with K = M = 2 and C = 40 bits/complex dimension.
[Figure: curves for R^ub; R^lb_1 (QCI) with B = 1, 2, 4, 8 bits; and R^lb_2 (MMSE).]
Fig. 2. Upper and lower bounds to the bottleneck rate versus C with K = M = 4 and ρ = 40 dB.

Lemma 4.
When M → +∞, or when K ≤ M and ρ → +∞, the lower bound R^lb_2 tends asymptotically to C. When K ≤ M and C → +∞,

  R^lb_2 → K E[log(λ/(λ + σ²))] − K log{ E[λ/(λ + σ²)] − (E[λ/(λ + σ²)])² }.   (36)

Proof:
See Appendix G. □
V. NUMERICAL RESULTS

In this section, we investigate the lower bounds obtained by the proposed achievable schemes and compare them with the upper bound derived in Section III. When performing the QCI scheme, we choose the quantization levels as quantiles for the sake of convenience.

In Fig. 1, the upper and lower bounds are depicted versus the SNR ρ. It can be seen that when ρ is small and sufficiently many bits are applied to quantize the noise levels, the QCI scheme outperforms the MMSE scheme. As ρ grows large, R^lb_2 obtained by the MMSE scheme approaches C and is larger than R^lb_1. This is because when ρ is small, the bottleneck rate is mainly limited by the capacity of Channel 1, and the QCI scheme works well in this case since partial CSI, i.e., the noise level of each sub-channel, is conveyed to the destination node. When ρ is large, the MMSE scheme can get an accurate estimate and does not require CSI feedback. The MMSE scheme thus performs better when ρ is large.

The effect of the bottleneck constraint C is investigated in Fig. 2. It can be seen that as C increases, all bounds grow and converge to different constants, which can be calculated based on Lemma 1, Lemma 3, and Lemma 4, respectively. Fig. 2 also shows that R^lb_2 virtually achieves the upper bound when C is small, while when C is large, the QCI scheme outperforms the MMSE scheme thanks to CSI feedback.

[Figure: curves for R^ub; R^lb_1 (QCI) with B = 1, 2, 4, 8 bits; and R^lb_2 (MMSE).]
Fig. 3. Upper and lower bounds to the bottleneck rate versus M with K = 2, ρ = 40 dB, and C = 40 bits/complex dimension.

Fig. 3 depicts the bounds versus the number of relay antennas M. As M increases, R^lb_1 quickly approaches R^ub. It is also shown that the result for the limit case in Lemma 3, i.e., that when M → +∞ we can always find suitable quantization points B = {b₁, ..., b_J} such that R^lb_1 → C, does not hold here.
This is because when performing the QCI scheme we choose the quantization levels as quantiles, and the choice of quantization points B = {b₁, ..., b_J} is thus restricted.

VI. CONCLUSIONS
This work extends the IB problem of the scalar case in [10] to the case of MIMO Rayleigh fading channels. Due to the information bottleneck constraint, the destination node cannot get the perfect CSI from the relay. Our results show that with simple symbol-by-symbol relay processing and compression, we can obtain a bottleneck rate close to the upper bound over a wide range of relevant system parameters.
APPENDIX A
PROOF OF THEOREM 1

Consider first a scalar Gaussian channel

  y = sx + n,                                                    (37)

where x ∼ CN(0, 1), n ∼ CN(0, σ²), and s ∈ ℂ is the deterministic channel gain. With bottleneck constraint C, the IB problem for (37) has been studied in [8], and the optimal bottleneck rate is given by

  R = log(1 + ρ|s|²) − log(1 + ρ|s|² 2^{−C}).                    (38)

In the following, we show that (4) can be decomposed into a set of parallel scalar IB problems, and (38) can then be applied to get the upper bound R^ub in Theorem 1.

According to the definition of conditional mutual information, problem (4) can be rewritten as

  max_{p(z|y,H)}  ∫ I(x; z | H = H) p_H(H) dH                    (39a)
  s.t.  ∫ I(y; z | H = H) p_H(H) dH ≤ C,                         (39b)

where H is a realization of H. Let UΛU^H denote the eigendecomposition of HH^H, where U is a unitary matrix whose columns are the eigenvectors of HH^H, and Λ is a diagonal matrix whose diagonal elements are the eigenvalues of HH^H. Since the rank of HH^H is no greater than T = min{K, M}, there are at most T positive diagonal entries in Λ. Denote them by λ_t, where t ∈ T and T = {1, ..., T}. Let

  ŷ = U^H y = U^H Hx + U^H n.                                    (40)

Then, for a given channel realization H = H, ŷ is conditionally Gaussian, i.e.,

  ŷ | H = H ∼ CN(0, Λ + σ²I_M).                                  (41)

Since

  I(x; y | H = H) = I(x; ŷ | H = H),                             (42)

we work with ŷ instead of y in the following.

Based on (39) and (41), it is known that the MIMO channel p(ŷ|x, H) can first be divided into a set of parallel channels for different realizations of H, and each channel p(ŷ|x, H = H) can be further divided into T independent scalar Gaussian channels with SNRs ρλ_t, ∀t ∈ T. Accordingly, problem (4) can be decomposed into a set of parallel IB problems. For a scalar Gaussian channel with SNR ρλ_t, let c_t^ub denote the allocation of the bottleneck constraint C and R_t^ub denote the corresponding rate. According to (38), we have

  R_t^ub = log(1 + ρλ_t) − log(1 + ρλ_t 2^{−c_t^ub}).            (43)

Then, the solution of problem (4) can be obtained by solving the following problem

  max_{c_t^ub}  Σ_{t=1}^{T} E[R_t^ub]                            (44a)
  s.t.  Σ_{t=1}^{T} E[c_t^ub] ≤ C.                               (44b)

Assume that λ_t, ∀t ∈ T, are the unordered positive eigenvalues of HH^H. Then, they follow the same distribution. For convenience, define a new variable λ which follows the same distribution as the λ_t. The subscript 't' in c_t^ub and R_t^ub can thus be omitted. In order to distinguish from R^ub in (5), we use r^ub to denote the bottleneck rate corresponding to c^ub, i.e.,

  r^ub = log(1 + ρλ) − log(1 + ρλ 2^{−c^ub}).                    (45)

Then, we have

  Σ_{t=1}^{T} E[R_t^ub] = T E[r^ub],   Σ_{t=1}^{T} E[c_t^ub] = T E[c^ub].   (46)

Problem (44) is thus equivalent to

  max_{c^ub}  E[r^ub]                                            (47a)
  s.t.  E[c^ub] ≤ C/T.                                           (47b)

This problem can be solved by the water-filling method. Consider the Lagrangian

  L = E[−r^ub + α c^ub] − αC/T,                                  (48)

where α is the Lagrange multiplier. The KKT condition for optimality is

  ∂L/∂c^ub = 0 if c^ub > 0,   ∂L/∂c^ub ≥ 0 if c^ub = 0.          (49)

Then,

  c^ub = log(ρλ/ν) if λ > ν/ρ,   c^ub = 0 if λ ≤ ν/ρ,            (50)

where ν = α/(1 − α) is chosen such that the following bottleneck constraint is met

  E[log(ρλ/ν) | λ > ν/ρ] Pr{λ > ν/ρ} = C/T.                      (51)

The informed receiver upper bound is thus given by

  R^ub = T E[log(1 + ρλ) − log(1 + ν) | λ > ν/ρ] Pr{λ > ν/ρ}.    (52)

From the definition of H in (2), it is known that when K ≤ M (resp., when K > M), H^H H (resp., HH^H) is a central complex Wishart matrix with M (resp., K) degrees of freedom and covariance matrix I_K (resp., I_M), i.e., H^H H ∼ CW_K(M, I_K) (resp., HH^H ∼ CW_M(K, I_M)) [12].
Since λ can be seen as one of the unordered positive eigenvalues of H^H H or HH^H, its pdf is given by [12, Theorem 2.17], [13]

  f_λ(λ) = (1/T) Σ_{i=0}^{T−1} [i!/(i + S − T)!] [L_i^{S−T}(λ)]² λ^{S−T} e^{−λ},   (53)

where S = max{K, M} and the Laguerre polynomials are

  L_i^{S−T}(λ) = (e^λ / (i! λ^{S−T})) d^i/dλ^i (e^{−λ} λ^{S−T+i}).                 (54)

Substituting (53) and (54) into (52) and (51), (5) and (6) can be obtained. Theorem 1 is thus proven.

APPENDIX B
PROOF OF LEMMA 1

To show that R^ub approaches C as M → +∞, we first look at the special case with K = 1. In this case, S = M and T = 1. From (54) and (53), we have L_0^{S−T} = 1 and the pdf of λ is

  f_λ(λ) = λ^{M−1} e^{−λ} / (M − 1)!,                            (55)

which shows that λ follows the Erlang distribution with shape parameter M and rate parameter 1, i.e., λ ∼ Erlang(M, 1). The expectation of λ is thus M. As M → +∞, f_λ(λ) becomes a delta function [14], and

  Pr{λ = M} → 1,   Pr{λ ≠ M} → 0.                                (56)

Hence, the bottleneck constraint (6) becomes

  ∫_{ν/ρ}^{∞} log(ρλ/ν) f_λ(λ) dλ = C → log(ρM/ν),               (57)

based on which we get

  ν → Mρ 2^{−C}.                                                 (58)

Then, the informed receiver upper bound

  R^ub → log(1 + Mρ) − log(1 + Mρ 2^{−C}) → C.                   (59)

Next, we consider the general case. For any positive integer K, when M → +∞, based on the definition of H and the law of large numbers, we have H^H H → M I_K. Since H^H H and HH^H have the same positive eigenvalues, we have λ → M, and (56) also holds in this general case. Then,

  ∫_{ν/ρ}^{∞} log(ρλ/ν) f_λ(λ) dλ = C/T → log(ρM/ν),             (60)

based on which we get

  ν → Mρ 2^{−C/T}.                                               (61)

We thus have

  R^ub → T [log(1 + Mρ) − log(1 + Mρ 2^{−C/T})] → C.             (62)

Now we prove that R^ub approaches C as ρ → +∞. From (6), it can be seen that ∫_{ν/ρ}^{∞} log(ρλ/ν) f_λ(λ) dλ decreases with ν. Therefore, when ρ → +∞, to ensure that constraint (6) holds, ν becomes large.
Then, we have

  R^ub = T ∫_{ν/ρ}^{∞} [log(1 + ρλ) − log(1 + ν)] f_λ(λ) dλ
       → T ∫_{ν/ρ}^{∞} [log(ρλ) − log ν] f_λ(λ) dλ = C.          (63)

In addition, when C → +∞, it can be found from (6) that ν → 0. Using (5), we can get (7), which is the capacity of Channel 1. This completes the proof.

APPENDIX C
PROOF OF THEOREM 2

Since n̂_g ∼ CN(0, A₁′) and ⌈a_k⌉_B has J possible values, i.e., b₁, ..., b_J, the channel in (11) can be divided into KJ independent scalar Gaussian sub-channels with noise power ⌈a_k⌉_B = b_j for each sub-channel. For the sub-channel with noise power ⌈a_k⌉_B = b_j, let c_{k,j} denote the allocation of the bottleneck constraint C and R_{k,j} denote the corresponding rate. According to (38), we have

  R_{k,j} = log(1 + ρ_j) − log(1 + ρ_j 2^{−c_{k,j}}),             (64)

where ρ_j = 1/b_j. Since b_J = +∞, we let R_{k,J} = 0 and c_{k,J} = 0. Note that based on [8, (16)], the representation of x̂_g, i.e., ẑ_g, can be constructed by adding independent fading and Gaussian noise to each element of x̂_g in (11). Denote

  P_{k,j} = Pr{⌈a_k⌉_B = b_j}.                                   (65)

Then, the optimal I(x; ẑ_g | A₁′) is equal to the objective function of the following problem

  max_{c_{k,j}}  Σ_{k=1}^{K} Σ_{j=1}^{J−1} P_{k,j} R_{k,j}        (66a)
  s.t.  Σ_{k=1}^{K} Σ_{j=1}^{J−1} P_{k,j} c_{k,j} ≤ C − Σ_{k=1}^{K} H_k,   (66b)

where H_k = −Σ_{j=1}^{J} P_{k,j} log P_{k,j}.

Since K ≤ M, as stated in Appendix A, H^H H ∼ CW_K(M, I_K). Then, (H^H H)^{−1} follows a complex inverse Wishart distribution and the diagonal elements of (H^H H)^{−1} are identically inverse chi-squared distributed with M − K + 1 degrees of freedom [15]. Let η denote one of the diagonal elements of (H^H H)^{−1}. The pdf of η is thus given by

  f_η(η) = [2^{−(M−K+1)/2} / Γ((M − K + 1)/2)] η^{−(M−K+1)/2 − 1} e^{−1/(2η)}.   (67)

Since A = σ²(H^H H)^{−1}, the diagonal entries of A, i.e., a_k, ∀k ∈ K, are marginally identically distributed. Let a denote a new variable which has the same distribution as the a_k; a then follows the same distribution as σ²η and its pdf is given by

  f_a(a) = (1/σ²) f_η(a/σ²) = [(2/σ²)^{−(M−K+1)/2} / Γ((M − K + 1)/2)] a^{−(M−K+1)/2 − 1} e^{−σ²/(2a)}.   (68)

In addition, P_{k,j}, R_{k,j}, and c_{k,j} can be simplified to P_j, R_j, and c_j by dropping the subscript 'k'. Using (68), the pmf P_j can be calculated as follows

  P_j = Pr{⌈a⌉_B = b_j} = Pr{b_{j−1} < a ≤ b_j} = ∫_{b_{j−1}}^{b_j} f_a(a) da.   (69)

Problem (66) thus becomes

  max_{c_j}  Σ_{j=1}^{J−1} K P_j R_j                              (70a)
  s.t.  Σ_{j=1}^{J−1} K P_j c_j ≤ C − K H₁,                       (70b)

where

  R_j = log(1 + ρ_j) − log(1 + ρ_j 2^{−c_j}),   H₁ = −Σ_{j=1}^{J} P_j log P_j.   (71)

Analogously to problem (47), (70) can be optimally solved by the water-filling method. The optimal I(x; ẑ_g | A₁′) is given by

  R^lb_1 = Σ_{j=1}^{J−1} K P_j [log(1 + ρ_j) − log(1 + ρ_j 2^{−c_j})],   (72)

where c_j = [log(ρ_j/ν)]⁺ and ν is chosen such that the bottleneck constraint

  Σ_{j=1}^{J−1} K P_j c_j = C − K H₁                              (73)

is met. Theorem 2 is then proven.

APPENDIX D
PROOF OF LEMMA 2

Since Φ is a diagonal matrix with positive and real diagonal entries, it is invertible. Denote

  z′ = Φ^{−1} z = x + n̂ + Φ^{−1} n̂_g′,   ẑ_g′ = Φ^{−1} ẑ_g = x + n̂_g + Φ^{−1} n̂_g′.   (74)

For a given A₁′, each element in n̂ is Gaussian distributed with zero mean and variance ⌈a_k⌉_B. However, n̂ is not a Gaussian vector since H is unknown. Hence, z′ is not a Gaussian vector. As for ẑ_g′, from (11) and (15), it is known that ẑ_g′ ∼ CN(0, I_K + A₁′ + Φ^{−2}). We first prove inequality (17).
  I(x̂; z | A₁′) = I(x̂; z′ | A₁′)
                = h(z′ | A₁′) − h(z′ | x̂, A₁′)
            (a) ≤ E[log det(I_K + E[n̂n̂^H] + Φ^{−2}) − log det(Φ^{−2})]
            (b) ≤ E[log det(I_K + A₁′ + Φ^{−2}) − log det(Φ^{−2})]
                = I(x̂_g; ẑ_g′ | A₁′)
                = I(x̂_g; ẑ_g | A₁′),                              (75)

where (a) holds since the Gaussian distribution maximizes the entropy over all distributions with the same variance, and (b) follows by using Hadamard's inequality.

Denote x = (x₁, ..., x_K)^T, z′ = (z₁′, ..., z_K′)^T, ẑ_g′ = (ẑ′_{g,1}, ..., ẑ′_{g,K})^T, and Φ = diag{φ₁, ..., φ_K}. Then, we prove inequality (18). Using the chain rule of mutual information,

  I(x; z | A₁′) = I(x; z′ | A₁′)
                = Σ_{k=1}^{K} I(x_k; z_k′ | A₁′) + Q
                ≥ Σ_{k=1}^{K} I(x_k; z_k′ | A₁′)
            (a) = Σ_{k=1}^{K} I(x_k; ẑ′_{g,k} | A₁′)
            (b) = I(x; ẑ_g′ | A₁′)
                = I(x; ẑ_g | A₁′),                                 (76)

where Q is a non-negative constant, (a) holds since for a given A₁′, both z_k′ and ẑ′_{g,k} follow CN(0, 1 + ⌈a_k⌉_B + φ_k^{−2}), and (b) follows since the elements of x and of ẑ_g′ are independent. Since Φ is optimally obtained when solving IB problem (12), bottleneck constraint (12b) is satisfied and I(x; ẑ_g | A₁′) = R^lb_1. Then, from (75) and (76), we have

  I(x̂; z | A₁′) ≤ C − K H₁,   I(x; z | A₁′) ≥ R^lb_1.             (77)

This completes the proof.

APPENDIX E
PROOF OF LEMMA 3

When M → +∞, H^H H → M I_K. Hence, A₁ → (σ²/M) I_K. Let J = 2, b₁ = σ²/M + ε, and b₂ = +∞, where ε is a sufficiently small positive real number. Since A₁ → (σ²/M) I_K, we have P₁ → 1 and H₁ → 0.
Then, from (13) and (14), $c_1 \to \frac{C}{K}$ and
$$
R_{\rm lb} \to K\left[ \log\!\left(1 + \frac{M}{\sigma^2}\right) - \log\!\left(1 + \frac{M}{\sigma^2} 2^{-\frac{C}{K}}\right) \right] \to C. \tag{78}
$$
When $\rho \to +\infty$, $\sigma^2 \to 0$ and $\mathbf{A} \to \mathbf{0}$. By setting $J = 2$ and $b_1$ small enough, it can be proven as above that $R_{\rm lb} \to C$.

When $C \to +\infty$, it is possible to choose suitable quantization points $B = \{b_1, \cdots, b_J\}$ such that $\hat{\mathbf{x}}_g$ in (11) and $\mathbf{A}$ can be almost perfectly quantized. Hence,
$$
R_{\rm lb} = I(\mathbf{x}; \hat{\mathbf{z}}_g \,|\, \mathbf{A}') \to I(\mathbf{x}; \hat{\mathbf{x}}_g \,|\, \mathbf{A}') = \mathbb{E}\left[\log\det(\mathbf{I}_K + \mathbf{A}') - \log\det(\mathbf{A}')\right] \to \mathbb{E}\left[\log\det(\mathbf{I}_K + \mathbf{A}_{\rm d}) - \log\det(\mathbf{A}_{\rm d})\right], \tag{79}
$$
where $\mathbf{A}_{\rm d} = \mathbf{A} \odot \mathbf{I}_K$ consists of the diagonal entries of $\mathbf{A}$. On the other hand, the capacity of Channel 1 is given by
$$
\begin{aligned}
I(\mathbf{x}; \mathbf{y}, \mathbf{H}) &= I(\mathbf{x}; \mathbf{y} \,|\, \mathbf{H}) = \mathbb{E}\left[\log\det\left(\mathbf{H}\mathbf{H}^H + \sigma^2 \mathbf{I}_M\right) - \log\det\left(\sigma^2 \mathbf{I}_M\right)\right] \\
&= \mathbb{E}\left[\log\det\left(\mathbf{H}^H\mathbf{H} + \sigma^2 \mathbf{I}_K\right) - \log\det\left(\sigma^2 \mathbf{I}_K\right)\right] = \mathbb{E}\left[\log\det(\mathbf{I}_K + \mathbf{A}) - \log\det(\mathbf{A})\right].
\end{aligned} \tag{80}
$$
To prove that (79) is upper bounded by (80), we first give and prove the following lemma.

Lemma 5.
For any $K$-dimensional positive definite matrix $\mathbf{N}$, let $\mathbf{N}_{\rm d} = \mathbf{N} \odot \mathbf{I}_K$, i.e., $\mathbf{N}_{\rm d}$ consists of the diagonal elements of $\mathbf{N}$. Then,
$$
\log\det(\mathbf{I}_K + \mathbf{N}) - \log\det(\mathbf{N}) \ge \log\det(\mathbf{I}_K + \mathbf{N}_{\rm d}) - \log\det(\mathbf{N}_{\rm d}). \tag{81}
$$

Proof:
Obviously, (81) is equivalent to
$$
\log\det(\mathbf{N}_{\rm d}) - \log\det(\mathbf{N}) \ge \log\det(\mathbf{I}_K + \mathbf{N}_{\rm d}) - \log\det(\mathbf{I}_K + \mathbf{N}). \tag{82}
$$
To prove (82), we introduce the auxiliary function $g(x) = \log\det(x\mathbf{I}_K + \mathbf{N}_{\rm d}) - \log\det(x\mathbf{I}_K + \mathbf{N})$ and show that $g(x)$ decreases monotonically w.r.t. $x$ when $x \ge 0$. Taking the first-order derivative of $g(x)$, we have
$$
g'(x) = {\rm tr}\left[(x\mathbf{I}_K + \mathbf{N}_{\rm d})^{-1}\right] - {\rm tr}\left[(x\mathbf{I}_K + \mathbf{N})^{-1}\right]. \tag{83}
$$
To prove $g'(x) \le 0$, we show in the following that for any positive definite matrix $\mathbf{O}$, we always have
$$
{\rm tr}\left(\mathbf{O}_{\rm d}^{-1}\right) \le {\rm tr}\left(\mathbf{O}^{-1}\right), \tag{84}
$$
where $\mathbf{O}_{\rm d}$ consists of the diagonal elements of $\mathbf{O}$, i.e., $\mathbf{O}_{\rm d} = \mathbf{O} \odot \mathbf{I}_K$. Denote the diagonal entries of $\mathbf{O}$ (or $\mathbf{O}_{\rm d}$) by $\mathbf{o} = (o_1, \cdots, o_K)^T$ and the eigenvalues of $\mathbf{O}$ by $\boldsymbol{\theta} = (\theta_1, \cdots, \theta_K)^T$. Since $\mathbf{O}$ is a positive definite matrix, the entries of $\mathbf{o}$ and $\boldsymbol{\theta}$ are real and positive. In addition, according to the Schur-Horn theorem, $\mathbf{o}$ is majorized by $\boldsymbol{\theta}$, i.e.,
$$
\mathbf{o} \prec \boldsymbol{\theta}. \tag{85}
$$
Define a real vector $\mathbf{u} = (u_1, \cdots, u_K)^T$ with $u_k > 0, \forall k \in \mathcal{K}$, and the function $g_0(\mathbf{u}) = \sum_{k=1}^K \frac{1}{u_k}$. It is obvious that $g_0(\mathbf{u})$ is convex and symmetric. Hence, $g_0(\mathbf{u})$ is a Schur-convex function. Therefore,
$$
g_0(\mathbf{o}) \le g_0(\boldsymbol{\theta}). \tag{86}
$$
Using (86), we have
$$
{\rm tr}\left(\mathbf{O}_{\rm d}^{-1}\right) = \sum_{k=1}^K \frac{1}{o_k} = g_0(\mathbf{o}) \le g_0(\boldsymbol{\theta}) = \sum_{k=1}^K \frac{1}{\theta_k} = {\rm tr}\left(\mathbf{O}^{-1}\right), \tag{87}
$$
based on which we get $g'(x) \le 0$, and (81) can then be proven. $\square$

Then, from (79), (80), and Lemma 5, it is known that when $C \to +\infty$,
$$
R_{\rm lb} \to \mathbb{E}\left[\log\det(\mathbf{I}_K + \mathbf{A}_{\rm d}) - \log\det(\mathbf{A}_{\rm d})\right] = K\, \mathbb{E}\left[\log\!\left(1 + \frac{1}{a}\right)\right] \le I(\mathbf{x}; \mathbf{y}, \mathbf{H}), \tag{88}
$$
where the expectation can be calculated by using the pdf of $a$ in (68). Lemma 3 is thus proven.

APPENDIX F
PROOF OF THEOREM 3
Recall that $\mathbf{U}\mathbf{\Lambda}\mathbf{U}^H$ is the eigendecomposition of $\mathbf{H}\mathbf{H}^H$ and $\lambda_t, \forall t \in \mathcal{T}$, are the unordered positive eigenvalues of $\mathbf{H}\mathbf{H}^H$. To derive $R_{\rm lb}$, we further denote the singular value decomposition of $\mathbf{H}$ by $\mathbf{U}\mathbf{L}\mathbf{V}^H$, where $\mathbf{V} \in \mathbb{C}^{K \times K}$ is a unitary matrix and $\mathbf{L} \in \mathbb{R}^{M \times K}$ is a rectangular diagonal matrix. In fact, the diagonal entries of $\mathbf{L}$ are the non-negative square roots of the positive eigenvalues of $\mathbf{H}\mathbf{H}^H$. Then, from (25), we have
$$
\begin{aligned}
\mathbf{F}^H \mathbf{H} &= \mathbf{H}^H \left(\mathbf{H}\mathbf{H}^H + \sigma^2 \mathbf{I}_M\right)^{-1} \mathbf{H} = \mathbf{V}\mathbf{L}^H \left(\mathbf{\Lambda} + \sigma^2 \mathbf{I}_M\right)^{-1} \mathbf{L}\mathbf{V}^H \\
&= \mathbf{V} \,{\rm diag}\left\{ \frac{\lambda_1}{\lambda_1 + \sigma^2}, \cdots, \frac{\lambda_T}{\lambda_T + \sigma^2}, \mathbf{0}_{K-T}^T \right\} \mathbf{V}^H, \\
\mathbf{F}^H \mathbf{H}\mathbf{H}^H \mathbf{F} &= \mathbf{V}\mathbf{L}^H \left(\mathbf{\Lambda} + \sigma^2 \mathbf{I}_M\right)^{-1} \mathbf{\Lambda} \left(\mathbf{\Lambda} + \sigma^2 \mathbf{I}_M\right)^{-1} \mathbf{L}\mathbf{V}^H = \mathbf{V} \,{\rm diag}\left\{ \frac{\lambda_1^2}{(\lambda_1+\sigma^2)^2}, \cdots, \frac{\lambda_T^2}{(\lambda_T+\sigma^2)^2}, \mathbf{0}_{K-T}^T \right\} \mathbf{V}^H, \\
\mathbf{F}^H \mathbf{F} &= \mathbf{V}\mathbf{L}^H \left(\mathbf{\Lambda} + \sigma^2 \mathbf{I}_M\right)^{-2} \mathbf{L}\mathbf{V}^H = \mathbf{V} \,{\rm diag}\left\{ \frac{\lambda_1}{(\lambda_1+\sigma^2)^2}, \cdots, \frac{\lambda_T}{(\lambda_T+\sigma^2)^2}, \mathbf{0}_{K-T}^T \right\} \mathbf{V}^H,
\end{aligned} \tag{89}
$$
where $\mathbf{0}_{K-T}$ is a $(K-T)$-dimensional all-'0' column vector. Based on (89),
$$
\mathbf{F}^H \mathbf{H}\mathbf{H}^H \mathbf{F} + \sigma^2 \mathbf{F}^H \mathbf{F} + D \mathbf{I}_K = \mathbf{V} \,{\rm diag}\left\{ \frac{\lambda_1}{\lambda_1+\sigma^2} + D, \cdots, \frac{\lambda_T}{\lambda_T+\sigma^2} + D, D \times \mathbf{1}_{K-T}^T \right\} \mathbf{V}^H, \tag{90}
$$
where $\mathbf{1}_{K-T}$ is a $(K-T)$-dimensional all-'1' column vector. Since $\mathbf{\Lambda}$ is independent of $\mathbf{U}$, $\mathbf{L}$ is independent of $\mathbf{U}$ as well as $\mathbf{V}$, and $\lambda_t, \forall t \in \mathcal{T}$, are unordered, we have
$$
\mathbb{E}\left[\log\det\left(\mathbf{F}^H \mathbf{H}\mathbf{H}^H \mathbf{F} + \sigma^2 \mathbf{F}^H \mathbf{F} + D \mathbf{I}_K\right)\right] = T\, \mathbb{E}\left[\log\!\left(\frac{\lambda}{\lambda+\sigma^2} + D\right)\right] + (K-T)\log D. \tag{91}
$$
Then, we calculate $\mathbf{G}$ in (33). For this purpose, we have to calculate $\mathbb{E}[\mathbf{F}^H \mathbf{H}]$, $\mathbb{E}[\mathbf{F}^H \mathbf{H}\mathbf{H}^H \mathbf{F}]$, and $\mathbb{E}[\mathbf{F}^H \mathbf{F}]$. To get these expectations, we consider two different cases, i.e., the case with $K \le M$ and the case with $K > M$. When $K \le M$, from (89), we have
$$
\mathbb{E}[\mathbf{F}^H \mathbf{H}] = \mathbb{E}\left[\frac{\lambda}{\lambda+\sigma^2}\right] \mathbf{I}_K, \quad \mathbb{E}[\mathbf{F}^H \mathbf{H}\mathbf{H}^H \mathbf{F}] = \mathbb{E}\left[\frac{\lambda^2}{(\lambda+\sigma^2)^2}\right] \mathbf{I}_K, \quad \mathbb{E}[\mathbf{F}^H \mathbf{F}] = \mathbb{E}\left[\frac{\lambda}{(\lambda+\sigma^2)^2}\right] \mathbf{I}_K. \tag{92}
$$
When $K > M$, denote $\mathbf{V} = (\mathbf{v}_1, \cdots, \mathbf{v}_K)$.
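Before continuing with the $K > M$ case, the per-realization identities underlying (89) and (90) can be spot-checked numerically. The dimensions and noise power below are illustrative, and $\mathbf{F} = (\mathbf{H}\mathbf{H}^H + \sigma^2\mathbf{I}_M)^{-1}\mathbf{H}$ is taken as implied by the first line of (89):

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, sigma2 = 5, 3, 0.7   # illustrative dimensions (K <= M, so T = K) and noise power

H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
F = np.linalg.inv(H @ H.conj().T + sigma2 * np.eye(M)) @ H   # so that F^H H matches (89)

FhH = F.conj().T @ H
lam = np.linalg.eigvalsh(H.conj().T @ H)   # the T positive eigenvalues of H H^H

# (89): the eigenvalues of F^H H are lambda_t / (lambda_t + sigma^2)
ok_89 = np.allclose(np.linalg.eigvalsh(FhH), np.sort(lam / (lam + sigma2)))

# the simplification behind (90): F^H H H^H F + sigma^2 F^H F = F^H H
lhs = F.conj().T @ H @ H.conj().T @ F + sigma2 * F.conj().T @ F
ok_90 = np.allclose(lhs, FhH)

print(ok_89, ok_90)   # both True
```

Both checks hold for every channel realization, not only in expectation, which is what makes the averaging step in (91)-(92) straightforward.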
Then, from (89),
$$
\begin{aligned}
\mathbf{F}^H \mathbf{H} &= \mathbf{V} \,{\rm diag}\left\{ \frac{\lambda_1}{\lambda_1+\sigma^2}, \cdots, \frac{\lambda_M}{\lambda_M+\sigma^2}, \mathbf{0}_{K-M}^T \right\} \mathbf{V}^H \\
&= \left( \frac{\lambda_1}{\lambda_1+\sigma^2}\mathbf{v}_1, \cdots, \frac{\lambda_M}{\lambda_M+\sigma^2}\mathbf{v}_M, \mathbf{0}_K, \cdots, \mathbf{0}_K \right) \begin{pmatrix} \mathbf{v}_1^H \\ \vdots \\ \mathbf{v}_K^H \end{pmatrix} = \sum_{m=1}^M \frac{\lambda_m}{\lambda_m+\sigma^2} \mathbf{v}_m \mathbf{v}_m^H. \tag{93}
\end{aligned}
$$
Since $\mathbf{v}_m$ is an eigenvector of matrix $\mathbf{H}^H \mathbf{H}$ and is independent of the unordered eigenvalue $\lambda_m$, we have
$$
\mathbb{E}[\mathbf{F}^H \mathbf{H}] = \sum_{m=1}^M \mathbb{E}\left[\frac{\lambda_m}{\lambda_m+\sigma^2}\right] \frac{\mathbf{I}_K}{K} = \frac{M}{K}\, \mathbb{E}\left[\frac{\lambda}{\lambda+\sigma^2}\right] \mathbf{I}_K. \tag{94}
$$
Similarly, we also have
$$
\mathbb{E}[\mathbf{F}^H \mathbf{H}\mathbf{H}^H \mathbf{F}] = \frac{M}{K}\, \mathbb{E}\left[\frac{\lambda^2}{(\lambda+\sigma^2)^2}\right] \mathbf{I}_K, \qquad \mathbb{E}[\mathbf{F}^H \mathbf{F}] = \frac{M}{K}\, \mathbb{E}\left[\frac{\lambda}{(\lambda+\sigma^2)^2}\right] \mathbf{I}_K. \tag{95}
$$
Using (92), (94), (95), and (33), $\mathbf{G}$ can be calculated as
$$
\mathbf{G} = \mathbb{E}[\mathbf{F}^H \mathbf{H}\mathbf{H}^H \mathbf{F}] - \mathbb{E}[\mathbf{F}^H \mathbf{H}]\, \mathbb{E}[\mathbf{H}^H \mathbf{F}] + \sigma^2 \mathbb{E}[\mathbf{F}^H \mathbf{F}] + D \mathbf{I}_K = \left\{ \frac{T}{K}\, \mathbb{E}\left[\frac{\lambda}{\lambda+\sigma^2}\right] - \frac{T^2}{K^2} \left(\mathbb{E}\left[\frac{\lambda}{\lambda+\sigma^2}\right]\right)^2 + D \right\} \mathbf{I}_K. \tag{96}
$$
Hence,
$$
\log\det(\mathbf{G}) = K \log\left\{ \frac{T}{K}\, \mathbb{E}\left[\frac{\lambda}{\lambda+\sigma^2}\right] - \frac{T^2}{K^2} \left(\mathbb{E}\left[\frac{\lambda}{\lambda+\sigma^2}\right]\right)^2 + D \right\}. \tag{97}
$$
Substituting (91) and (97) into (31) and (32), respectively, and using (30), we can get (34).

We then calculate $D$ in (35). From (26), (92), and (95),
$$
\mathbb{E}[\bar{\mathbf{x}} \bar{\mathbf{x}}^H] = \mathbb{E}[\mathbf{F}^H \mathbf{H}\mathbf{H}^H \mathbf{F} + \sigma^2 \mathbf{F}^H \mathbf{F}] = \frac{T}{K}\, \mathbb{E}\left[\frac{\lambda}{\lambda+\sigma^2}\right] \mathbf{I}_K. \tag{98}
$$
$I(\bar{\mathbf{x}}_g; \bar{\mathbf{z}}_g)$ in (29) can thus be calculated as follows:
$$
I(\bar{\mathbf{x}}_g; \bar{\mathbf{z}}_g) = \log\det\left(\mathbf{I}_K + \frac{\mathbb{E}[\bar{\mathbf{x}} \bar{\mathbf{x}}^H]}{D}\right) = K \log\left(1 + \frac{T}{DK}\, \mathbb{E}\left[\frac{\lambda}{\lambda+\sigma^2}\right]\right) = C, \tag{99}
$$
based on which (35) can be obtained. Theorem 3 is then proven.

APPENDIX G
PROOF OF LEMMA 4

When $M \to +\infty$, $T = K$. As stated in Appendix B, $\mathbf{H}^H \mathbf{H} \to M \mathbf{I}_K$. Hence, $\lambda \to M$. From (99),
$$
I(\bar{\mathbf{x}}_g; \bar{\mathbf{z}}_g) = K \log\left(1 + \frac{1}{D}\, \mathbb{E}\left[\frac{\lambda}{\lambda+\sigma^2}\right]\right) = C \to K \log\left(1 + \frac{1}{D}\right). \tag{100}
$$
Combining (34) and (100), we have
$$
R_{\rm lb} \to K\log(1+D) - K\log D = K\log\left(1 + \frac{1}{D}\right) \to C. \tag{101}
$$
When $K \le M$ and $\rho \to +\infty$, $T = K$ and $\sigma^2 \to 0$.
Using (99) and (34), we can also get (100) and (101).

When $K \le M$ and $C \to +\infty$, it can be found from (35) that $D \to 0$. Then, using (34), we can get (36). This finishes the proof.

ACKNOWLEDGMENTS
This work was supported by the Alexander von Humboldt Foundation. The work of S. Shamai has been supported by the European Union's Horizon 2020 Research and Innovation Programme, grant agreement No. 694630.
REFERENCES

[1] N. Tishby, F. C. Pereira, and W. Bialek, "The information bottleneck method," arXiv preprint physics/0004057, 2000.
[2] R. Shwartz-Ziv and N. Tishby, "Opening the black box of deep neural networks via information," arXiv preprint arXiv:1703.00810, 2017.
[3] R. Dobrushin and B. Tsybakov, "Information transmission with additional noise," IRE Trans. Inf. Theory, vol. 8, no. 5, pp. 293-304, Sep. 1962.
[4] H. Witsenhausen, "Indirect rate distortion problems," IEEE Trans. Inf. Theory, vol. 26, no. 5, pp. 518-521, Sep. 1980.
[5] T. A. Courtade and T. Weissman, "Multiterminal source coding under logarithmic loss," IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 740-761, Jan. 2014.
[6] I. E. Aguerri, A. Zaidi, G. Caire, and S. S. Shitz, "On the capacity of cloud radio access networks with oblivious relaying," IEEE Trans. Inf. Theory, vol. 65, no. 7, pp. 4575-4596, Jul. 2019.
[7] J. Demel, T. Monsees, C. Bockelmann, D. Wuebben, and A. Dekorsy, "Cloud-RAN fronthaul rate reduction via IBM-based quantization for multicarrier systems," in Proc. 24th International ITG Workshop on Smart Antennas, Hamburg, Germany, Feb. 2020, pp. 1-6.
[8] A. Winkelbauer and G. Matz, "Rate-information-optimal Gaussian channel output compression," in Proc. 48th Annu. Conf. Inf. Sci. Syst. (CISS), Princeton, NJ, USA, Mar. 2014, pp. 1-5.
[9] A. Winkelbauer, S. Farthofer, and G. Matz, "The rate-information trade-off for Gaussian vector channels," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Honolulu, HI, USA, Jun. 2014, pp. 2849-2853.
[10] G. Caire, S. Shamai, A. Tulino, S. Verdú, and C. Yapar, "Information bottleneck for an oblivious relay with channel state information: the scalar case," in Proc. IEEE Int. Conf. Science of Electrical Engineering in Israel (ICSEE), Eilat, Israel, Dec. 2018, pp. 1-5.
[11] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[12] A. M. Tulino and S. Verdú, Random Matrix Theory and Wireless Communications. Now Publishers, 2004.
[13] E. Telatar, "Capacity of multi-antenna Gaussian channels," Europ. Trans. Telecommun., vol. 10, no. 6, pp. 585-595, Nov.-Dec. 1999.
[14] W. C. Lee, "Estimate of channel capacity in Rayleigh fading environment," IEEE Trans. Veh. Technol., vol. 39, no. 3, pp. 187-189, Aug. 1990.
[15] L. E. Brennan and I. S. Reed, "An adaptive array signal processing algorithm for communications,"