On State Dependent Broadcast Channels with Cooperation
Lior Dikstein, Haim H. Permuter and Yossef Steinberg
Abstract
In this paper, we investigate problems of communication over physically degraded, state-dependent broadcast channels (BCs) with cooperating decoders. Two different setups are considered and their capacity regions are characterized. First, we study a setting in which one decoder can use a finite capacity link to send the other decoder information regarding the messages or the channel states. In this scenario we analyze two cases: one where noncausal state information is available to the encoder and the strong decoder and the other where state information is available only to the encoder in a causal manner. Second, we examine a setting in which the cooperation between the decoders is limited to taking place before the outputs of the channel are given. In this case, one decoder, which is informed of the state sequence noncausally, can cooperate only to send the other decoder rate-limited information about the state sequence. The proofs of the capacity regions introduce a new method of coding for channels with cooperation between different users, where we exploit the link between the decoders for multiple-binning. Finally, we discuss the optimality of using rate splitting techniques when coding for cooperative BCs. In particular, we show that rate splitting is not necessarily optimal when coding for cooperative BCs by solving an example in which our method of coding outperforms rate splitting.
Index Terms
Binning, broadcast channels, causal coding, channel capacity, cooperative broadcast, degraded broadcast channel, noncausal coding, partial side information, side information, state-dependent channels.
I. INTRODUCTION
L. Dikstein and H. Permuter are with the Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel. Y. Steinberg is with the Department of Electrical Engineering, Technion, Haifa, Israel. This paper was presented in part at the 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton). This work was supported by the Israel Science Foundation (grant no. 684/11).

Classical broadcast channels (BCs) adequately model a variety of practical communication scenarios, such as cellular systems, Wi-Fi routers and digital TV broadcasting. However, with the rapid growth of wireless networking, it is necessary to expand the study of such channels and to consider more complex settings that can more accurately describe a wider range of scenarios. Some of these important extensions include settings in which the BC is state dependent. Wireless channels with fading, jamming or interference in the transmission are but a few examples that can be modeled by state-dependent settings. Other important extensions include cooperation between different nodes in a network, where the nodes help one another in decoding. These settings can, inter alia, describe sensor networks, in which a large number of nodes conduct measurements on some ongoing process. When such measurements are correlated, cooperation between nodes can assist them in decoding. Some practical sensor network applications include surveillance, health monitoring and environmental detection. Therefore, the results presented in this work will contribute to meeting the growing need to find the fundamental limits of such important communication scenarios.

The most general form of the BC, in which a single source attempts to communicate simultaneously to two or more receivers, was introduced by Cover in [4]. Following his work, the capacity of the degraded BC was characterized by Bergmans [2] and Gallager [6]. In the degraded BC setting, one receiver is statistically stronger than the other. This scenario can, for instance, model TV broadcasts, where some users consume high definition media, while other users watch the same broadcast with lower resolution. In [9], El-Gamal expanded the capacity result for the degraded BC and in [7]-[10] discussed the two-user BC with and without feedback.
Later, in [8], El-Gamal showed that the capacity of the two-user physically degraded BC does not change with feedback.

State-dependent channels were first introduced by Shannon [20], who characterized the capacity for the case where a single user channel, $P_{Y|X,S}$, is controlled by a random parameter $S$, and the state information sequence up to time $i$, $s^i$, is known causally at the encoder. The case in which the full realization of the channel states, $s^n$, is known noncausally at the encoder was presented by Kuznetsov and Tsybakov [14], in the context of coding for a storage unit, and similar cases were studied by Heegard and El Gamal in [13]. The capacity of the channel for this case was fully solved by Gel’fand and Pinsker in [12]. In recent years, growing interest in state-dependent channels has resulted in many studies on multi-user settings. Some examples considering multiple access channels (MACs) include the works of Lapidoth and Steinberg [15], [16], Piantanida, Zaidi and Shamai [19], Somekh-Baruch, Shamai and Verdu [22] and many more. In the case of BCs, Steinberg studied the degraded, state-dependent BC in [23]. Inner and outer bounds were derived for the case in which the state information is known noncausally at the encoder, and the capacity region was found for the case in which the states are known causally at the encoder or known noncausally at both the encoder and the strong decoder. Our channel setting with cooperation naturally extends this model, and the capacity results of this paper generalize the capacity regions found in [23].

Other important settings for state-dependent channels are cases where only rate-limited state information is available at the encoder or decoder. In many practical systems, information on the channel is not available freely. Thus, to provide side information on the channel’s states to the different users, we must allocate resources such as time slots, bandwidth and memory.
Heegard and El Gamal [13] presented a model of a state-dependent channel, where the transmitter is informed of the state information at a rate limited to $R_e$ and the receiver is informed of the state information at a rate limited to $R_d$. Cover and Chiang [3] extended the Gel’fand-Pinsker problem to the case where both the encoder and the decoder are provided with different, but correlated, partial side information. Steinberg in [24] derived the capacity of a channel where the state information is known fully to the encoder in a noncausal manner and is available rate-limited at the decoder. Coding for such a channel involves solving a Gel’fand-Pinsker problem as well as a Wyner-Ziv problem [27] simultaneously. An extension of this setting to a degraded, state-dependent BC is introduced in this work.

Cooperation between different users in multi-user channels was first considered for MACs in the works of Willems [25]. Further studies involving cooperation between encoders in MACs include papers such as the work of Willems and van der Meulen [26], and later [21] and [1]. A setting in which cooperation between users takes place in a state-dependent MAC, where partial state information is known to each user and full state information is known at the decoder, was treated by Shamai, Permuter and Somekh-Baruch in [18].

The notion of cooperation between receivers in a BC was proposed by Dabora and Servetto [5], where the capacity region for the cooperative physically degraded BC is characterized. Simultaneously, Liang and Veeravalli independently examined a more general problem of a relay-BC in [17]. The direct part of the proof for the capacity region of the cooperative physically degraded BC combines methods of broadcast coding together with a code construction for the degraded relay channel. The BC setting we present in this work generalizes the model of Dabora and Servetto, and therefore our capacity result generalizes the result of [5].
Moreover, the coding scheme we propose, which achieves the capacity of the more general physically degraded state-dependent cooperative BC, is fundamentally different and in some sense simpler, since it uses binning instead of block Markov coding.

In this work, we consider several scenarios of cooperation between decoders for physically degraded, state-dependent (PDSD) BCs. First, we consider a setting in which there is no constraint on when the cooperation link between the decoders, $C$, should be used. For this setting, we characterize the capacity region for the noncausal case, where noncausal state information is known at the encoder and the strong decoder, and for the causal case, where causal state information is available only at the encoder. The proof proposes a new coding method for channels with cooperating users, using multiple-binning. We suggest dividing the weak decoder’s message set into bins, where the number of bins is determined by the capacity of the link between the decoders. The strong decoder will use the cooperation link to send the weak decoder the bin number containing its message, and hence narrow down the search from the entire message set to the message set of that bin alone. This scheme increases the rate of the weak user.

The optimal schemes of the first scenario use the cooperation link between the decoders, $C$, solely for sending additional information about the messages, i.e., information about the state sequence is not sent explicitly via $C$. The second setting we consider is a case in which the cooperation link $C$ can be used only before the outputs are given. In such a case, the strong decoder can use the cooperation link only to convey rate-limited state information to the weaker user. This setting can be regarded as a broadcast extension of the results in [24].
The capacity region for this case is derived by using the methods we developed when solving the first scenario, combined with Wyner-Ziv coding.

Another interesting result involves the use of rate splitting techniques when coding for cooperative BCs. In MACs, rate splitting, which is the most common coding method for dealing with cooperation, has been used to achieve the capacity for most settings that have been solved. Thus, a first guess would be to use this method when coding for cooperative BCs. However, we show that rate splitting schemes are not necessarily optimal for BCs. Moreover, we demonstrate that our method of coding, using binning, strictly outperforms rate splitting. An example for which rate splitting is shown to be suboptimal is the binary symmetric BC with cooperation.

The remainder of the paper is organized as follows. Section II presents the mathematical notation and definitions used in this paper. In Section III, all the main results are stated, which include the capacity regions of three PDSD BC settings. Section III-A is devoted to the noncausal PDSD BC and a discussion of special cases, Section III-B is dedicated to the causal PDSD BC, and Section III-C is dedicated to the PDSD BC with rate-limited state information. In Section IV, we discuss the optimality of using rate splitting methods when dealing with cooperating BCs, and in Section IV-A we give an example of a cooperative BC in which rate splitting is suboptimal. Finally, proofs are given in Section V.

II. NOTATION AND PROBLEM DEFINITION
[Figure 1: block diagram showing switches A and B, the state inputs $S^i$ and $S^n$, the messages $M_Z, M_Y$, the Encoder, the channel input $X^n$, the channel $P_{Y,Z|X,S} = P_{Y|X,S} P_{Z|Y}$, the outputs $Y^n$ and $Z^n$, Decoder Y, Decoder Z, the link $C$, and the estimates $\hat{M}_Y$, $\hat{M}_Z$.]

Fig. 1. The physically degraded, state-dependent BC with cooperating decoders. When considering cooperation between decoders in a physically degraded BC setting, only a cooperation link from the strong decoder to the weak decoder will contribute to increasing the rate.
In this paper, random variables are denoted with upper case letters, deterministic realizations or specific values are denoted with lower case letters, and calligraphic letters denote the alphabets of the random variables. Vectors of $n$ elements, $(x_1, x_2, \ldots, x_n)$, are denoted as $x^n$, and $x_i^j$ denotes the $(j - i + 1)$-tuple $(x_i, x_{i+1}, \ldots, x_j)$ when $j \ge i$ and an empty set otherwise. The probability distribution function of $X$, the joint distribution function of $X$ and $Y$, and the conditional distribution of $X$ given $Y$ are denoted by $P_X$, $P_{X,Y}$ and $P_{X|Y}$, respectively.

A PDSD BC, $(\mathcal{S}, P_S(s), \mathcal{X}, \mathcal{Y}, \mathcal{Z}, P_{Y,Z|X,S}(y,z|x,s))$, illustrated in Fig. 1, is a channel with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y} \times \mathcal{Z}$ and a state space $\mathcal{S}$. The encoder selects a channel input sequence, $X^n = X^n(M_Z, M_Y, S^n)$. The outputs of the channel at Decoder Y and Decoder Z are denoted $Y^n$ and $Z^n$, respectively. The channel is assumed memoryless and without feedback; thus probabilities on $n$-vectors are given by
$$P_{Y,Z|X,S}(y^n, z^n | x^n, s^n) = \prod_{i=1}^{n} P_{Y,Z|X,S}(y_i, z_i | x_i, s_i). \tag{1}$$
In addition, the channel probability function can be decomposed as $P_{Y,Z|X,S}(y_i, z_i | x_i, s_i) = P_{Y|X,S}(y_i | x_i, s_i) P_{Z|Y}(z_i | y_i)$, i.e., $(X,S) - Y - Z$ form a Markov chain. Due to the Markov property of physically degraded BCs, only a cooperation link from Decoder Y to Decoder Z will contribute to increasing the rate.

Definition 1: A $((2^{nR_Z}, 2^{nR_Y}), 2^{nC}, n)$ code for the PDSD BC with noncausal side information available at the encoder and strong decoder (where switch A is closed in Fig. 1) consists of two sets of integers, $\mathcal{M}_Z = \{1, 2, \ldots, 2^{nR_Z}\}$ and $\mathcal{M}_Y = \{1, 2, \ldots, 2^{nR_Y}\}$, called message sets, an index set for the conference message $\mathcal{M} = \{1, 2, \ldots, 2^{nC}\}$, an encoding function
$$f : \mathcal{M}_Z \times \mathcal{M}_Y \times \mathcal{S}^n \to \mathcal{X}^n, \tag{2}$$
a conference mapping
$$h : \mathcal{Y}^n \times \mathcal{S}^n \to \mathcal{M}, \tag{3}$$
and two decoding functions
$$g_y : \mathcal{Y}^n \times \mathcal{S}^n \to \hat{\mathcal{M}}_Y \tag{4}$$
$$g_z : \mathcal{Z}^n \times \mathcal{M} \to \hat{\mathcal{M}}_Z. \tag{5}$$

Definition 2:
The definition of a $((2^{nR_Z}, 2^{nR_Y}), 2^{nC}, n)$ code for the PDSD BC with causal side information and noninformed decoders (where switch A is open and switch B is closed in Fig. 1) follows Definition 1 above, except that the encoder (2) is replaced by a sequence of encoders
$$f_i : \mathcal{M}_Z \times \mathcal{M}_Y \times \mathcal{S}^i \to \mathcal{X}, \quad i = 1, 2, \ldots, n, \tag{6}$$
and Decoder Y’s decoding function (4) is replaced by
$$g_y : \mathcal{Y}^n \to \hat{\mathcal{M}}_Y. \tag{7}$$
In Definition 1 there is no restriction on when the link $C$ can be used. However, if we restrict the link to be used before the sequence $Y^n$ is given to the strong decoder, then only information on the state sequence, $S^n$, can be sent there. This is the subject of the next definition.

Definition 3: A $((2^{nR_Z}, 2^{nR_Y}), 2^{nC}, n)$ code for the PDSD BC with rate-limited side information at the weak decoder, illustrated in Fig. 2, consists of three sets of integers: $\mathcal{M}_Z = \{1, 2, \ldots, 2^{nR_Z}\}$ and $\mathcal{M}_Y = \{1, 2, \ldots, 2^{nR_Y}\}$, called message sets, and an index set $\mathcal{M} = \{1, 2, \ldots, 2^{nC}\}$; a channel encoding function
$$f : \mathcal{M}_Z \times \mathcal{M}_Y \times \mathcal{S}^n \to \mathcal{X}^n, \tag{8}$$
a state encoding function
$$h_s : \mathcal{S}^n \to \mathcal{M}, \tag{9}$$
and two decoding functions
$$g_y : \mathcal{Y}^n \times \mathcal{S}^n \to \hat{\mathcal{M}}_Y \tag{10}$$
$$g_z : \mathcal{Z}^n \times \mathcal{M} \to \hat{\mathcal{M}}_Z. \tag{11}$$
The next definitions, which deal with probability of error, achievable rates, and capacity region, hold for all three problems defined in Definitions 1 to 3 above, with respect to the corresponding code definitions.

Definition 4:
We define the average probability of error for a $((2^{nR_Z}, 2^{nR_Y}), 2^{nC}, n)$ code as follows:
$$P_e^{(n)} = \Pr\big( (\hat{M}_Y \neq M_Y) \cup (\hat{M}_Z \neq M_Z) \big). \tag{12}$$
The average probability of error at each receiver is defined as
$$P_{e,y}^{(n)} = \Pr(\hat{M}_Y \neq M_Y) \tag{13}$$
$$P_{e,z}^{(n)} = \Pr(\hat{M}_Z \neq M_Z). \tag{14}$$
As is commonly held when discussing BCs, the average probability $P_e^{(n)}$ tends to zero as $n$ approaches infinity iff $P_{e,y}^{(n)}$ and $P_{e,z}^{(n)}$ both tend to zero as $n$ approaches infinity.

Definition 5:
A rate triplet $(R_Z, R_Y, C)$ is achievable if there exists a sequence of codes $((2^{nR_Z}, 2^{nR_Y}), 2^{nC}, n)$ such that $P_e^{(n)} \to 0$ as $n \to \infty$.

Definition 6:
The capacity region is the closure of all achievable rates.

[Figure 2: block diagram showing the state sequence $S^n$, the messages $M_Z, M_Y$, the Encoder, the channel input $X^n$, the channel $P_{Y,Z|X,S} = P_{Y|X,S} P_{Z|Y}$, the outputs $Y^n$ and $Z^n$, Decoder Y, Decoder Z, a State Encoder fed by $S^n$ with output rate $\le C$, and the estimates $\hat{M}_Y$, $\hat{M}_Z$.]

Fig. 2. The physically degraded, state-dependent BC with full state information at the encoder and one decoder together with rate-limited state information at the other decoder. This model describes the case where the cooperation between the decoders is confined such that it takes place prior to the decoding of the messages. Therefore, the only information that the strong decoder, Decoder Y, can send to the weaker decoder, Decoder Z, is regarding the state sequence. However, the state is only partially available at Decoder Z, since it is sent rate-limited due to the limited capacity of the link between the decoders.
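The statement in Definition 4 that the joint error probability vanishes iff both per-receiver error probabilities vanish rests on the bounds $\max(P_{e,y}^{(n)}, P_{e,z}^{(n)}) \le P_e^{(n)} \le P_{e,y}^{(n)} + P_{e,z}^{(n)}$. The following is a minimal sketch that checks these bounds empirically. The function name and the three event probabilities are hypothetical, chosen only for illustration; they are not derived from any code in this paper.

```python
import random


def simulate_error_rates(p_both, p_y_only, p_z_only, trials=100_000, seed=0):
    """Estimate (P_e, P_e_y, P_e_z) for hypothetical decoding-error events.

    Each trial falls into exactly one of four disjoint events: both decoders
    err, only Decoder Y errs, only Decoder Z errs, or neither errs.
    """
    rng = random.Random(seed)
    err_y = err_z = err_any = 0
    for _ in range(trials):
        u = rng.random()
        y_wrong = u < p_both + p_y_only
        z_wrong = u < p_both or (p_both + p_y_only <= u < p_both + p_y_only + p_z_only)
        err_y += y_wrong
        err_z += z_wrong
        err_any += (y_wrong or z_wrong)  # union event of (12)
    return err_any / trials, err_y / trials, err_z / trials


# Hypothetical event probabilities (assumptions for illustration only).
p_e, p_e_y, p_e_z = simulate_error_rates(p_both=0.01, p_y_only=0.02, p_z_only=0.05)
print(f"P_e={p_e:.4f}  P_e_y={p_e_y:.4f}  P_e_z={p_e_z:.4f}")
```

Because the union event contains each individual error event and is contained in their combination, the two bounds hold for the empirical frequencies on every run, not only in expectation.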
III. MAIN RESULTS AND INSIGHTS
A. Capacity Region of the PDSD BC with Cooperating Decoders
We begin by stating the capacity region for the PDSD BC illustrated in Fig. 1 (switch A is closed) in the following theorem.

Theorem 1:
The capacity region of the PDSD BC, $(X,S) - Y - Z$, with noncausal state information known at the encoder and at Decoder Y, with cooperating decoders, is the closure of the set that contains all the rates $(R_Z, R_Y)$ that satisfy
$$R_Z \le I(U;Z) - I(U;S) + C \tag{15a}$$
$$R_Y \le I(X;Y|U,S) \tag{15b}$$
$$R_Z + R_Y \le I(X;Y|S), \tag{15c}$$
for some joint probability distribution of the form
$$P_{S,U,X,Z,Y} = P_S P_{U|S} P_{X|S,U} P_{Y|X,S} P_{Z|Y}. \tag{15d}$$
The proof is given in Sections V-B and V-C. The achievability of Theorem 1 is proved by using techniques that include triple-binning and superposition coding. The main idea is to identify how to best use the capacity link between the decoders. In particular, we want to maximize the potential use of the capacity link while simultaneously balancing the allocation of rate resources between the two messages. Changes in the allocation of resources between the messages $M_Y$ and $M_Z$ are possible due to the fact that we use superposition coding. Using this coding method, Decoder Y, which is the strong decoder, also decodes the message $M_Z$ intended for Decoder Z. This allows us to shift resources between the messages and thus increase the rate resources of $M_Z$ at the expense of the message $M_Y$. Decoder Y can then send information about $M_Z$ to Decoder Z by using the capacity link between them. Therefore, the optimal coding scheme balances the distribution of rate resources between the messages, taking into account that additional information can be sent through the capacity link.

The additional information sent from Decoder Y to Decoder Z comes into play via the use of binning; that is, we divide the messages $M_Z$ among superbins in ascending order. Next, we use a Gel’fand-Pinsker code for each superbin. Now, we can redirect some of the rate resources of the message $M_Y$ to send the superbin index that contains $M_Z$.
Decoder Y, which decodes both messages, sends this superbin index to Decoder Z through the capacity link between the decoders. Decoder Z then searches for $M_Z$ only in that superbin, by using joint typicality methods. By utilizing the capacity link through the added superbinning step, we can increase the rate of $M_Z$ beyond the rate achieved using the standard Gel’fand-Pinsker coding scheme.

Nevertheless, even if the capacity link between the decoders is very large, there is still a restriction on the amount of information we can send through it. This restriction is reflected through the bound on the rate sum, $R_Z + R_Y \le I(X;Y|S)$. This bound indicates that we cannot send more information about $(M_Y, M_Z)$ through this setting than we could have sent about $(M_Y, M_Z)$ through a state-dependent point-to-point channel, where the state information is known at both the encoder and decoder. Moreover, note that
$$R_Z + R_Y \le I(X;Y|S) = I(U,X;Y|S) = I(U;Y|S) + I(X;Y|U,S), \tag{16}$$
that is, if we have a large capacity link between the decoders, we have a tradeoff between sending information about $M_Z$ and $M_Y$. If we choose to send information about $M_Y$ at the maximal rate possible, $I(X;Y|U,S)$, then the maximal rate at which we can send $M_Z$ is $I(U;Y|S)$. In contrast, we can increase the rate of $M_Z$ (up to the minimum of $I(U;Z) - I(U;S) + C$ and $I(X;Y|S)$) at the expense of reducing the rate of $M_Y$. For example, if we have an infinite capacity link between the decoders, we can consolidate the two decoders into one decoder which will decode both messages.

Another interesting insight is revealed by comparing a special case of the result presented in Theorem 1 with the result presented in [5], which is referred to as the physically degraded BC with cooperating receivers (i.e., the BC model where the channel is not state dependent).

Let us consider the special case of Theorem 1 where $S = \emptyset$.
As a result, the region (15) reduces to
$$R_Z \le I(U;Z) + C \tag{17a}$$
$$R_Y \le I(X;Y|U) \tag{17b}$$
$$R_Z + R_Y \le I(X;Y), \tag{17c}$$
for some $P_{U,X,Z,Y} = P_{X,U} P_{Y|X} P_{Z|Y}$. This special case was studied in [5], where a different expression for the capacity region was found:
$$R_Z \le \min\{ I(U;Z) + C, \; I(U;Y) \} \tag{18a}$$
$$R_Y \le I(X;Y|U) \tag{18b}$$
for some $P_{U,X,Z,Y} = P_{X,U} P_{Y|X} P_{Z|Y}$.

Since the two regions, (17) and (18), are both shown to be the capacity region of the same setting, they should be equivalent. It is simple to show that (18) $\subseteq$ (17). However, the reverse inclusion is not so straightforward. This raises the question: are the regions indeed equal? The answer is given in the following corollary.

Corollary 2:
The two regions, (17) and (18), are equivalent.

A direct proof, which is not based on the fact that both regions characterize the same capacity, is given in Section V-A. The main idea is to find a specific choice of $U$, for any rate pair $(R_Z, R_Y)$ in (17), such that $(R_Z, R_Y)$ satisfies (18), implying that (17) $\subseteq$ (18). Hence, we conclude that (17) and (18) are equivalent.

B. Causal Side Information
Consider the case where the state is known to the encoder in a causal manner, i.e., at each time index $i$, the encoder has access to the sequence $s^i$. This setting is illustrated in Fig. 1, where we take switch A to be open and switch B to be closed. In this scenario, the encoder is the only user with information regarding the state sequence, in contrast to the noncausal case, where the strong decoder also has access to the channel states. The capacity region for this setting is characterized in the following theorem.

Theorem 3:
The capacity region for the PDSD BC with cooperating decoders and causal side information known at the encoder is the closure of the set that contains all the rates $(R_Z, R_Y)$ that satisfy
$$R_Z \le I(U;Z) + C \tag{19a}$$
$$R_Y \le I(V;Y|U) \tag{19b}$$
$$R_Z + R_Y \le I(V,U;Y), \tag{19c}$$
for some joint probability distribution of the form
$$P_{S,U,V,X,Z,Y} = P_S P_{U,V} P_{X|S,U,V} P_{Y|X,S} P_{Z|Y}. \tag{19d}$$
The proof is given in Section V-D.

C. Capacity Region of the PDSD BC with Rate-Limited State Information at the Weak Decoder
In the previous two cases there was no restriction on when the cooperation link $C$ is to be used. In the following case, we consider a setting where the cooperation is restricted to being used before the outputs $Y^n$ are given to the strong decoder. Therefore, we can only use the cooperation link to send the weak decoder information about the state sequence, $s^n$, from the strong decoder. Moreover, since there is a limit on the information we can send through the link, the weak decoder receives rate-limited information about the channel states. Hence, we model this setting as a PDSD BC with noncausal state information at the encoder and strong decoder and rate-limited state information at the weak decoder. We state the capacity region for this setting in the following theorem.

Theorem 4:
The capacity region of the PDSD BC, $(X,S) - Y - Z$, with rate-limited state information at the weak decoder, illustrated in Fig. 2, is the closure of the set that contains all the rates $(R_Z, R_Y)$ that satisfy
$$R_Z \le I(U;Z,S_d) - I(U;S,S_d) \tag{20a}$$
$$R_Y \le I(X;Y|U,S,S_d) \tag{20b}$$
for some joint probability distribution of the form
$$P_{S,S_d,U,X,Z,Y} = P_S P_{S_d,U,X|S} P_{Y|X,S} P_{Z|Y} \tag{20c}$$
such that
$$C \ge I(S;S_d) - I(Z;S_d). \tag{20d}$$

Remark 1:
As was noted in [24], we can replace the rate bound on $R_Z$ in (20a) with the bound
$$R_Z \le I(U;Z|S_d) - I(U;S|S_d). \tag{21}$$
This can easily be seen by applying the chain rule to the expressions on the right-hand side of (20a) as follows:
$$I(U;Z,S_d) - I(U;S,S_d) = I(U;S_d) + I(U;Z|S_d) - I(U;S_d) - I(U;S|S_d) = I(U;Z|S_d) - I(U;S|S_d).$$

Remark 2:
Observe that the region (20) is contained in the region (15) given in Theorem 1. The rate $R_Z$ can be further bounded as follows:
$$R_Z \le I(U;Z|S_d) - I(U;S|S_d) = I(U,S_d;Z) - I(U,S_d;S) - I(S_d;Z) + I(S_d;S) \le I(U,S_d;Z) - I(U,S_d;S) + C = I(\tilde{U};Z) - I(\tilde{U};S) + C,$$
where the inequality follows from (20d) and we take $\tilde{U} = (U, S_d)$. Furthermore, with the definition of $\tilde{U}$, the rate $R_Y$ is bounded by
$$R_Y \le I(X;Y|U,S_d,S) = I(X;Y|\tilde{U},S).$$
Finally, notice that the sum of rates, $R_Z + R_Y$, satisfies $R_Z + R_Y \le I(X;Y|S)$, since
$$I(X;Y|S) = I(\tilde{U},X;Y|S) = I(U,S_d;Y|S) + I(X;Y|U,S_d,S) = I(U,S_d;Y,S) - I(U,S_d;S) + I(X;Y|U,S_d,S),$$
where
$$R_Z \le I(U;Z|S_d) - I(U;S|S_d) \le I(U,S_d;Y,S) - I(U,S_d;S).$$
Hence we have that (20) $\subseteq$ (15).

The proof of Theorem 4 is given in Sections V-E and V-F. The coding scheme that achieves the capacity region (20) uses techniques that include superposition coding and Gel’fand-Pinsker coding, which are similar to the proof of Theorem 1, with the addition of Wyner-Ziv compression. The main idea is based on a Gel’fand-Pinsker code, but with several extensions. First, note that the Channel Encoder and Decoder Y are both informed of the sequence $s_d^n$, the compressed codeword that the State Encoder sends Decoder Z. This is due to the fact that they both know the state sequence $s^n$ and the State Encoder’s strategy. The encoder then uses this knowledge to find a codeword $u^n$, in a bin associated with $m_Z$, that is jointly typical not only with $s^n$, as in the original Gel’fand-Pinsker scheme, but also with $s_d^n$. Each codeword $u^n$ is set as the center of $2^{nR_Y}$ bins, one bin for each message $m_Y$. Once a codeword $u^n$ is chosen, we look in its satellite bin $m_Y$ for a codeword $x^n$ that is jointly typical with that $u^n$, the state sequence $s^n$ and $s_d^n$.
Upon transmission, the State Encoder chooses a codeword $s_d^n$, the Channel Encoder chooses a codeword $u^n$ and a corresponding $x^n$, where $x^n$ is transmitted through the channel. Consequently, identifying $x^n$ leads to the identification of $u^n$.

At the decoder’s end, the State Encoder sends Decoder Z a compressed version of $s^n$ by using the codewords $s_d^n$ in a Wyner-Ziv scheme, where Decoder Z uses the channel output $z^n$ as side information. The joint typicality of $z^n$ and $s_d^n$ used for decoding is a result of the Markov chain $S_d - (X,S) - Y - Z$ and the fact that the codewords $x^n$ are generated $\sim \prod_{i=1}^{n} p(x_i | u_i, s_{d,i})$. Finally, $s_d^n$ is used as side information to identify the codeword $u^n$. As for the strong decoder, Decoder Y looks for codewords $u^n$ and $x^n$ that are jointly typical with the received $y^n$, the state sequence $s^n$, and the codeword $s_d^n$.

IV. IS RATE SPLITTING OPTIMAL FOR COOPERATION?

When dealing with cooperation settings, the most common approach is the use of rate splitting. Many coding schemes based on rate splitting have been known to achieve the capacity of channels involving cooperation. For example, rate splitting is the preferred coding method when coding for cooperative MACs, and it has been shown to be optimal [25], [26]. However, when dealing with cooperative BCs, we show, by a numerical example, that rate splitting schemes are not necessarily optimal. Moreover, other techniques, such as binning, strictly outperform rate splitting.

The main idea of the rate splitting scheme is to split the message intended for the weaker decoder, $M_Z$, into two messages, $M_Z = (M_{Z1}, M_{Z2})$. Next, we reorganize the messages. We concatenate part of the message intended for the weak decoder, $M_{Z2}$, to the message intended for the strong decoder, $M_Y$. In addition, we define new message sets, $M'_Z = M_{Z1}$ and $M'_Y = (M_Y, M_{Z2})$, where we choose $M_{Z2}$ to have rate at most $C$. Now that we have a
Now that we have anew message set ( M ′ Z , M ′ Y ) , we transmit by using a Gelfand-Pinsker superposition coding scheme, such asthe one described in [23]. Once the strong decoder decodes both messages, ( M ′ Z , M ′ Y ) (which, in whole, equal ( M Z , M Y ) ), it uses the capacity link between the decoders, C , to send the message M Z to the weak decoder.To sum up, this scheme results with the strong decoder decoding both messages, and the weak decoder decodingthe original message M Z .The achievability scheme that uses the rate splitting method closely follows the achievability of Theorem 1,but with some alterations. In the rate splitting scheme, we define two sets of messages M Z = { , , ..., nR Z } and M Z = { , , ..., nR Z } , where |M Z ||M Z | = |M Z | and R Z = R Z + R Z , such that each message, m Z ∈ { , , ..., nR Z } , is uniquely defined by a pair of messages ( m Z , m Z ) . Using these definitions, we candefine a new pair of messages, ( m ′ Z , m ′ Y ) , where we take m ′ Z = m Z , R ′ Z = R Z , m ′ y = ( m Y , m Z ) and R ′ Y = R Y + R Z . The code is now constructed in a similar manner to the code described in the triple-binningachievability scheme with respect to ( m ′ Z , m ′ Y ) , despite the fact that in this scheme additional partitioning into superbins is not required. However, we will see that this fact turns out to be significant.To transmit ( m Y , m Z ) in the encoding stage, we first construct the corresponding pair ( m ′ Z , m ′ Y ) . The rest ofthe encoding is preformed in a manner similar to the encoding in Section V-B with respect to the constructed ( m ′ Z , m ′ Y ) . The decoding stage is also similar, except that now Decoder Y, upon decoding the messages ( ˆ m ′ Y , ˆ m ′ Z ) , uses the link C to send the message ˆ M Z to Decoder Z (instead of a bin number as in the achievabilityof Theorem 1). The code construction is illustrated in Fig. 
3.

[Figure 3: the codewords $u^n(m'_Z)$ are arranged in $2^{n(I(U;Z) - I(U;S))}$ bins with $2^{nI(U;S)}$ codewords in a bin, $2^{nI(U;Z)}$ codewords in total. Each $u^n(m'_Z)$ is a cloud center with $2^{nI(X;Y|U,S)}$ satellite bins, and each bin contains $2^{nI(X;S|U)}$ codewords $x^n(m'_Z, m'_Y)$.]

Fig. 3. The code construction for the rate splitting scheme. We can see that the code construction is similar to the triple-binning scheme, except that here we do not partition the bins associated with the messages $\tilde{m}_Z$ into superbins.

Using the achievability result of the capacity region found in [23, Theorem 3] for the PDSD BC with state information known at the encoder and decoder, together with the fact that the rate $R_{Z2}$ cannot be negative or greater than $C$, we derive the following bounds:
$$R_{Z2} \le C \tag{22a}$$
$$R_{Z2} \ge 0 \tag{22b}$$
$$R_{Z1} \le I(U;Z) - I(U;S) \tag{22c}$$
$$R_Y + R_{Z2} \le I(X;Y|U,S) \tag{22d}$$
$$R_{Z1} + R_{Z2} + R_Y \le I(X;Y|S). \tag{22e}$$
Recalling that $R_Z = R_{Z1} + R_{Z2}$, we substitute $R_{Z1}$ with $R_Z - R_{Z2}$ in the bounds (22c) and (22e). Next, by using Fourier-Motzkin elimination, we eliminate $R_{Z2}$ from the bounds that contain it. The resulting region is given by the following bounds on $R_Z$ and $R_Y$:
$$R_Z \le I(U;Z) - I(U;S) + C \tag{23a}$$
$$R_Y \le I(X;Y|U,S) \tag{23b}$$
$$R_Z + R_Y \le I(U;Z) - I(U;S) + I(X;Y|U,S). \tag{23c}$$
This region, (23), is the region achievable by rate splitting.

We note that in the process we derive an additional bound on the rate sum, $R_Z + R_Y \le I(X;Y|S)$; however, this bound is satisfied automatically when (23c) is satisfied, since
$$I(X;Y|S) = I(U,X;Y|S) = I(U;Y|S) + I(X;Y|U,S) = I(U;Y,S) - I(U;S) + I(X;Y|U,S) \ge I(U;Z) - I(U;S) + I(X;Y|U,S), \tag{24}$$
where the last inequality is due to the degradedness properties of the channel.
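The elimination step from (22) to (23) can be sanity-checked numerically for one fixed set of mutual information values. The sketch below is an illustration only: the function names are hypothetical, and the numbers assigned to $a = I(U;Z) - I(U;S)$, $b = I(X;Y|U,S)$, $s = I(X;Y|S)$ and $C$ are arbitrary assumptions (chosen so that $s \ge a + b$, as (24) requires), not values computed from any channel. Exact fractions are used to avoid rounding effects on the region boundaries.

```python
from fractions import Fraction as F


def split_exists(rz, ry, a, b, s, c):
    """True if some R_Z2 satisfies (22a)-(22e) with R_Z1 = R_Z - R_Z2."""
    lo = max(F(0), rz - a)   # lower bounds on R_Z2 from (22b) and (22c)
    hi = min(c, b - ry)      # upper bounds on R_Z2 from (22a) and (22d)
    return lo <= hi and rz + ry <= s  # (22e) does not involve R_Z2


def in_region_23(rz, ry, a, b, c):
    """Membership in the rate splitting region (23) for fixed a, b, c."""
    return rz <= a + c and ry <= b and rz + ry <= a + b


def in_region_15(rz, ry, a, b, s, c):
    """Membership in the region (15) for the same fixed distribution."""
    return rz <= a + c and ry <= b and rz + ry <= s


# Hypothetical values (assumptions): a, b, s, c in bits per channel use.
a, b, s, c = F(3, 10), F(4, 10), F(9, 10), F(2, 10)
grid = [F(k, 20) for k in range(25)]

# Points where eliminating R_Z2 by hand disagrees with a direct search.
mismatches = [(rz, ry) for rz in grid for ry in grid
              if split_exists(rz, ry, a, b, s, c) != in_region_23(rz, ry, a, b, c)]

# Points allowed by (15) but not by (23) for this fixed distribution.
gap_points = [(rz, ry) for rz in grid for ry in grid
              if in_region_15(rz, ry, a, b, s, c) and not in_region_23(rz, ry, a, b, c)]

print(len(mismatches), len(gap_points))
```

With these assumed values the pair $(R_Z, R_Y) = (1/2, 2/5)$ satisfies (15) but violates (23c). This only shows a gap for one fixed choice of mutual information values; it says nothing about the union over all distributions, which is the harder question.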
Moreover, we also omit the bound $C \ge 0$, which follows from the problem setting.

Examining the region (23), we notice that its form differs from the capacity region of this channel, (15). Therefore, an interesting question arises: Are rate splitting coding schemes optimal for BCs with cooperating decoders? We answer this question in the following lemma.

Lemma 5:
Using rate splitting coding for BCs with cooperating decoders is not necessarily optimal.
Proof:
We would like to show that the rate splitting coding scheme that yields the region (23) does not achieve the capacity of the channel given by (15) in Theorem 1. In order to do so, we need to show that the region (15) is strictly larger than the region achievable by rate splitting, (23). The region (15) is shown to be achievable in Section V-B by using triple-binning. Thus, by showing that (15) is strictly larger than (23), we can conclude that the rate splitting method is not optimal.

First, it is easy to see that the region (15) contains (23), since the bounds on $R_Z$ and $R_Y$ are the same, yet the bound on the rate sum (15c) is greater than or equal to (23c), as shown in (24). However, to show that (15) is strictly larger than (23), we need to show that for all distributions of the form (15d):
\begin{align}
\big\{ \exists\, (R_Z, R_Y) \in (15) : (R_Z, R_Y) \notin (23) \big\}. \tag{25}
\end{align}
This is not an easy task. If we look at the regions in their general form, we need to find a pair $(R_Z, R_Y) \in (15)$ and show that for every random variable $U$ we choose, $(R_Z, R_Y) \notin (23)$. Nevertheless, we can show that (15) is strictly larger than (23) by considering a specific channel and showing that for this specific setting $(23) \subset (15)$.

A. The Special Case of the Binary Symmetric Broadcast Channel
Consider the binary symmetric BC [11, Section 5.3], illustrated in Fig. 4. Here, $Y = X \oplus W_1$ and $Z = X \oplus W_2$, where $W_1 \sim \mathrm{Ber}(p_1)$ and $W_2 \sim \mathrm{Ber}(p_2)$. Note that we can present this channel as a physically degraded BC, where $Y = X \oplus W_1$, $Z = Y \oplus \tilde{W}_2$, $W_1 \sim \mathrm{Ber}(p_1)$ and $\tilde{W}_2 \sim \mathrm{Ber}\big(\frac{p_2 - p_1}{1 - 2p_1}\big)$. This channel is a special case of our setting, where the channel is not state-dependent (hence, we take the state as a constant).

Fig. 4. The physically degraded binary symmetric BC.
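As a numerical companion to this special case, the sketch below does two things. First, it checks that cascading $\mathrm{Ber}(p_1)$ noise with $\mathrm{Ber}\big(\frac{p_2-p_1}{1-2p_1}\big)$ noise gives overall crossover probability $p_2$, which is what the degraded representation requires. Second, it evaluates the largest $R_Z$ at $R_Y = 0$ permitted by the rate splitting bounds (26) and by the binning bounds (27), which are stated in the remainder of this subsection. The values `p1 = 0.1`, `p2 = 0.2` and `c = 0.05` are illustrative assumptions and are not claimed to be the values used in Fig. 5; the function names are hypothetical, and the rate splitting maximum is only approximated by a grid search over $\alpha$.

```python
import math

def hb(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

def max_rz_rate_splitting(p1, p2, c, steps=5000):
    """Approximate the largest R_Z at R_Y = 0 allowed by (26), over alpha in [0, 1/2]."""
    best = 0.0
    for k in range(steps + 1):
        a = 0.5 * k / steps
        bound_a = 1 - hb(conv(a, p2)) + c                         # (26a)
        bound_c = 1 - hb(conv(a, p2)) + hb(conv(a, p1)) - hb(p1)  # (26c) with R_Y = 0
        best = max(best, min(bound_a, bound_c))
    return best

def max_rz_binning(p1, p2, c, alpha=0.0):
    """Largest R_Z at R_Y = 0 allowed by (27a) and (27c) for a given alpha."""
    return min(1 - hb(conv(alpha, p2)) + c, 1 - hb(p1))

# Illustrative values (assumptions, not taken from the paper's figure)
p1, p2, c = 0.1, 0.2, 0.05
p_tilde = (p2 - p1) / (1 - 2 * p1)
rs = max_rz_rate_splitting(p1, p2, c)
bn = max_rz_binning(p1, p2, c)
print("cascade reproduces p2:", math.isclose(conv(p1, p_tilde), p2))
print("max R_Z at R_Y = 0, rate splitting:", round(rs, 4))
print("max R_Z at R_Y = 0, binning:       ", round(bn, 4))
```

For these assumed values the binning bound at $\alpha = 0$ is larger than anything the rate splitting bounds allow, in line with the comparison made in this subsection: under (26), increasing $\alpha$ relaxes the sum-rate bound but tightens (26a), so the two cannot both be loose at the same $\alpha$.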
Following closely the arguments given in [11, Section 5.4.2], we can upper bound the region (23) by considering the set of all rate pairs $(R_Z, R_Y)$ such that
\begin{align}
R_Z &\le 1 - H(\alpha * p_2) + C \tag{26a}\\
R_Y &\le H(\alpha * p_1) - H(p_1) \tag{26b}\\
R_Z + R_Y &\le 1 - H(\alpha * p_2) + H(\alpha * p_1) - H(p_1) \tag{26c}
\end{align}
for some $\alpha \in [0, \tfrac{1}{2}]$. In contrast, we can show that by taking $X = U \oplus V$, where $U \sim \mathrm{Ber}(\tfrac{1}{2})$ and $V \sim \mathrm{Ber}(\alpha)$, and calculating the corresponding expressions of (15), the following region of rate pairs $(R_Z, R_Y)$ such that
\begin{align}
R_Z &\le 1 - H(\alpha * p_2) + C \tag{27a}\\
R_Y &\le H(\alpha * p_1) - H(p_1) \tag{27b}\\
R_Z + R_Y &\le 1 - H(p_1), \tag{27c}
\end{align}
is achievable via the binning scheme.

Consequently, we can see that the region (27) is strictly larger than the region (26). For example, consider Fig. 5, where we take $p_1$ = 0. , $p_2$ = 0. and $C$ = 0. . Looking at both regions, we can see that by taking $R_Y$ to be zero, the point $(R_Z, R_Y)$ = (0. , 0) is achievable in the binning region (27) (the dotted line) for $\alpha = 0$, but it is not achievable in the rate splitting region (26) (the solid line) for any value of $\alpha \in [0, \tfrac{1}{2}]$.

Thus, for the binary symmetric BC we have shown that an achievable region derived from (15) by a specific choice of $U$ is strictly larger than the upper bound for the region (23). Therefore, we can conclude that (15) is strictly larger than (23) and that the rate splitting coding scheme is not necessarily optimal for BCs.

[Figure 5: four panels, each titled "Rate Region for $C$ = 0. ", with axes labeled $R_Y$ and $R_Z$.]

Fig. 5. The upper bound for the region (23), which is calculated in (26), is plotted using the solid line and corresponds to the smaller region. The achievable region for (15), given by the expressions in (27), is plotted using the dashed line and corresponds to the larger region. Both regions in all the figures are plotted for values of $p_1$ = 0. , $p_2$ = 0. , where each figure corresponds to a different value of $C$.

V. PROOFS
A. Proof of Corollary 2
Let us denote our region without states by $\mathcal{A}$. It is characterized as the union of all rate pairs $(R_y, R_z)$ satisfying:
\begin{align}
R_z &\le I(U;Z) + C \tag{28a}\\
R_y &\le I(X;Y|U) \tag{28b}\\
R_y + R_z &\le I(X;Y) \tag{28c}
\end{align}
for some joint distribution
\begin{align}
P_{U,X,Y,Z} = P_U P_{X|U} P_{Y|X} P_{Z|Y} \tag{29}
\end{align}
where $P_{Y|X} P_{Z|Y}$ is the original BC (without states). The region of Dabora and Servetto, presented in [5], is the union of all rate pairs $(R_y, R_z)$ satisfying
\begin{align}
R_z &\le \min\{ I(U;Z) + C,\; I(U;Y) \} \tag{30a}\\
R_y &\le I(X;Y|U) \tag{30b}
\end{align}
for some joint distribution (29). For brevity, we denote the region of Dabora and Servetto by $\mathcal{B}$. It is simple to show that $\mathcal{B} \subseteq \mathcal{A}$. We now proceed to show the reverse inclusion, i.e., $\mathcal{A} \subseteq \mathcal{B}$.

Let $(R_y, R_z)$ be a rate pair in $\mathcal{A}$, achieved with a given pair of random variables $(U, X)$. If $R_y = I(X;Y|U)$, then by (28c) and the Markov structure (29), we also have:
\begin{align}
R_z \le I(U;Y) \tag{31}
\end{align}
and (28a), (31), (28b) coincide with the region $\mathcal{B}$. Therefore, we have only to examine the case where a strict inequality holds in (28b). Thus, let
\begin{align}
R_y = I(X;Y|U) - \gamma \tag{32}
\end{align}
for some $\gamma > 0$. Define the random variable
\begin{align}
U^* = \begin{cases} U & \text{w.p. } \lambda \\ X & \text{w.p. } 1 - \lambda. \end{cases} \tag{33}
\end{align}
Clearly, the Markov structure
\begin{align}
U^* - X - Y - Z \tag{34}
\end{align}
still holds. Moreover,
\begin{align}
I(X;Y|U^*) &= I(X;Y|U^* = U) P(U^* = U) + I(X;Y|U^* = X) P(U^* = X) \nonumber\\
&= \lambda I(X;Y|U). \tag{35}
\end{align}
(In (35) and in the sequel, by $I(X;Y|U^* = U)$ we mean $I(X;Y|U^*, U^* = U)$, that is, the conditioning is not only on the event that $U^* = U$ but also on the specific value.) Now, we choose $\lambda$ to be
\begin{align}
\lambda = \frac{I(X;Y|U) - \gamma}{I(X;Y|U)}. \tag{36}
\end{align}
Note that with this choice, the following holds
\begin{align}
R_y = I(X;Y|U) - \gamma = I(X;Y|U^*) \tag{37}
\end{align}
and
\begin{align}
R_y + R_z \le I(X;Y) = I(X, U^*;Y) &= I(U^*;Y) + I(X;Y|U^*) \nonumber\\
&= I(U^*;Y) + R_y, \tag{38}
\end{align}
so that
\begin{align}
R_z \le I(U^*;Y). \tag{39}
\end{align}
We now turn to bound $I(U^*;Z)$. For this purpose, observe that we can decompose $I(X;Z)$ as
\begin{align}
I(X;Z) &= I(U;Z) + I(X;Z|U) \tag{40a}\\
&= I(U^*;Z) + I(X;Z|U^*) \nonumber\\
&= I(U^*;Z) + I(X;Z|U^* = U) P(U^* = U) + I(X;Z|U^* = X) P(U^* = X) \nonumber\\
&= I(U^*;Z) + \lambda I(X;Z|U). \tag{40b}
\end{align}
From (40a) and (40b) we obtain
\begin{align}
I(U^*;Z) = I(U;Z) + (1 - \lambda) I(X;Z|U) \ge I(U;Z). \tag{41}
\end{align}
Therefore, (28a) and (41) imply
\begin{align}
R_z \le I(U^*;Z) + C. \tag{42}
\end{align}
From (39), (42), and (37) we have
\begin{align}
R_z &\le \min\{ I(U^*;Z) + C,\; I(U^*;Y) \} \tag{43a}\\
R_y &\le I(X;Y|U^*) \tag{43b}
\end{align}
which, together with the Markov structure (34), imply that $(R_y, R_z) \in \mathcal{B}$.

B. Proof of Achievability for Theorem 1
In this section, we prove the achievability part of Theorem 1. Throughout the achievability proof we use the definition of a strong typical set [11]. The set $T_\epsilon^{(n)}(X, Y, Z)$ of $\epsilon$-typical $n$-sequences is defined by
$$\Big\{ (x^n, y^n, z^n) : \Big| \tfrac{1}{n} N(x, y, z | x^n, y^n, z^n) - p(x, y, z) \Big| \le \epsilon \cdot p(x, y, z) \;\; \forall (x, y, z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \Big\},$$
where $N(x, y, z | x^n, y^n, z^n)$ is the number of appearances of $(x, y, z)$ in the $n$-sequence $(x^n, y^n, z^n)$.

Proof:
Fix a joint distribution $P_{S,U,X,Z,Y} = P_S P_{U|S} P_{X|S,U} P_{Y,Z|X,S}$, where $P_{Y,Z|X,S} = P_{Y|X,S} P_{Z|Y}$ is given by the channel.

Code Construction: First, generate $2^{nC}$ superbins. Next, generate $2^{nR_Z}$ bins, one for each message $m_Z \in \{1, 2, \dots, 2^{nR_Z}\}$. Partition the bins among the superbins in their natural ordering such that each superbin $l \in \{1, 2, \dots, 2^{nC}\}$ contains the bins associated with the messages $m_Z \in \{(l-1) 2^{n(R_Z - C)} + 1, \dots, l\, 2^{n(R_Z - C)}\}$. Thus, each superbin contains $2^{n(R_Z - C)}$ bins. Second, for each bin generate $2^{n\tilde{R}_Z}$ codewords $u^n(m_Z, j)$, where $j \in \{1, 2, \dots, 2^{n\tilde{R}_Z}\}$. Each codeword, $u^n(m_Z, j)$, is generated i.i.d. $\sim \prod_{i=1}^n p(u_i)$. Third, for each codeword $u^n(m_Z, j)$ generate $2^{nR_Y}$ satellite bins. In each satellite bin generate $2^{n\tilde{R}_Y}$ codewords $x^n(m_Z, j, m_Y, k)$, where $k \in \{1, 2, \dots, 2^{n\tilde{R}_Y}\}$, i.i.d. $\sim \prod_{i=1}^n p(x_i(m_Z, j, m_Y, k) | u_i(m_Z, j))$. The code construction is illustrated in Fig. 6.
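The natural-ordering partition of the bins into superbins can be made concrete with a few lines of code. This is a toy sketch under assumed parameters ($n = 4$, $R_Z = 1$, $C = 0.5$, chosen only so that all the counts are small integers); the helper names `superbin_of` and `messages_in` are hypothetical and do not appear in the paper.

```python
def superbin_of(m_z, bins_per_superbin):
    """Index l of the superbin whose range {(l-1)B + 1, ..., lB} contains m_z."""
    return (m_z - 1) // bins_per_superbin + 1

def messages_in(l, bins_per_superbin):
    """Messages m_Z assigned to superbin l under the natural ordering."""
    return range((l - 1) * bins_per_superbin + 1, l * bins_per_superbin + 1)

# Toy parameters (assumptions for illustration only)
n, r_z, c = 4, 1.0, 0.5
num_messages = 2 ** round(n * r_z)                  # 2^{n R_Z} bins, one per message
num_superbins = 2 ** round(n * c)                   # 2^{n C} superbins
bins_per_superbin = num_messages // num_superbins   # 2^{n (R_Z - C)} bins per superbin

for l in range(1, num_superbins + 1):
    print("superbin", l, "->", list(messages_in(l, bins_per_superbin)))
```

In the scheme, the index returned by `superbin_of` for the decoded message is what the strong decoder would send over the link, which is why the number of superbins is tied to the link capacity.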
[Figure 6 labels: a codeword $u^n$; $2^{nI(U;S)}$ codewords in a bin; $2^{n(I(U;Z)-I(U;S))}$ bins in a superbin; $2^{nC}$ superbins; $2^{nI(U;Z)}$ codewords in a superbin; $u^n$ cloud center; $2^{nI(X;Y|U,S)}$ satellite bins, each bin containing $2^{nI(X;S|U)}$ codewords $x^n$.]

Fig. 6. The code construction. We have $2^{nC}$ superbins, each one containing $2^{n(I(U;Z)-I(U;S))}$ bins. In each bin we have $2^{nI(U;S)}$ codewords $u^n$, so in total each superbin contains $2^{nI(U;Z)}$ codewords $u^n$. Finally, each codeword $u^n$ plays the role of a cloud center and is associated with $2^{nI(X;Y|U,S)}$ satellite codewords $x^n$.

Encoding: To transmit $(m_Y, m_Z)$, the encoder first looks in the bin associated with the message $m_Z$ for a codeword $u^n(m_Z, j)$ that is jointly typical with the state sequence, $s^n$, i.e.,
\begin{align}
(u^n(m_Z, j), s^n) \in T_{\epsilon'}^{(n)}(U, S). \tag{44}
\end{align}
If such a codeword, $u^n$, does not exist, namely, no codeword in the bin $m_Z$ is jointly typical with $s^n$, choose an arbitrary $u^n$ from the bin (in such a case the decoder will declare an error). If there is more than one such codeword, choose the one with the smallest index $j$. Next, the encoder looks for a sequence $x^n(m_Z, j, m_Y, k)$ (where $j$ was chosen in the first stage) that is jointly typical with the state sequence, $s^n$, and the codeword $u^n(m_Z, j)$, i.e.,
\begin{align}
(x^n(m_Z, j, m_Y, k), u^n(m_Z, j), s^n) \in T_{\epsilon'}^{(n)}(X, U, S). \tag{45}
\end{align}
If such a codeword, $x^n$, does not exist, choose an arbitrary $x^n$ from the bin $m_Y$ (in such a case the decoder will declare an error). If there is more than one such codeword, choose the one with the smallest index $k$.

Decoding:

1) Let $\epsilon > \epsilon'$. Decoder Y looks for the smallest values of $(\hat{m}_Y, \hat{m}_Z)$ for which there exist a $\hat{j}$ and a $\hat{k}$ such that
\begin{align}
(u^n(\hat{m}_Z, \hat{j}), x^n(\hat{m}_Z, \hat{j}, \hat{m}_Y, \hat{k}), s^n, y^n) \in T_{\epsilon}^{(n)}(U, X, S, Y). \tag{46}
\end{align}
If no pair or more than one pair is found, an error is declared.
2) Upon decoding the messages $(\hat{m}_Y, \hat{m}_Z)$, Decoder Y uses the link $C$ to send Decoder Z the superbin number, $\hat{l}$, that contains the decoded message $\hat{m}_Z$.

3) Decoder Z looks in the superbin $\hat{l}$ for the smallest value of $\hat{m}_Z$ for which there exists a $\hat{j}$ such that
\begin{align}
(u^n(\hat{m}_Z, \hat{j}), z^n) \in T_{\epsilon}^{(n)}(U, Z). \tag{47}
\end{align}
If no value or more than one value is found, an error is declared.

Analysis of the probability of error:
Without loss of generality, we can assume that the messages $(m_Z, m_Y) = (1, 1)$ were sent. Therefore, the superbin containing $m_Z = 1$ is $l = 1$. Codeword indices are written in the order of the code construction, $u^n(m_Z, j)$ and $x^n(m_Z, j, m_Y, k)$.

We define the error events at the encoder:
\begin{align}
E_1 &= \{ \forall j \in \{1, 2, \dots, 2^{n\tilde{R}_Z}\} : (U^n(1, j), S^n) \notin T_{\epsilon'}^{(n)}(U, S) \}, \tag{48}\\
E_2 &= \{ \forall k \in \{1, 2, \dots, 2^{n\tilde{R}_Y}\} : (X^n(1, j, 1, k), U^n(1, j), S^n) \notin T_{\epsilon'}^{(n)}(X, U, S) \}. \tag{49}
\end{align}
We define the error events at Decoder Y:
\begin{align}
E_3 &= \{ \forall j \in \{1, 2, \dots, 2^{n\tilde{R}_Z}\}, \forall k \in \{1, 2, \dots, 2^{n\tilde{R}_Y}\} : (U^n(1, j), X^n(1, j, 1, k), S^n, Y^n) \notin T_{\epsilon}^{(n)}(U, X, S, Y) \}, \tag{50}\\
E_4 &= \{ \exists\, \hat{m}_Y \ne 1 : (U^n(1, j), X^n(1, j, \hat{m}_Y, k), S^n, Y^n) \in T_{\epsilon}^{(n)}(U, X, S, Y) \}, \tag{51}\\
E_5 &= \{ \exists\, \hat{m}_Z \ne 1, \hat{m}_Y \ne 1 : (U^n(\hat{m}_Z, j), X^n(\hat{m}_Z, j, \hat{m}_Y, k), S^n, Y^n) \in T_{\epsilon}^{(n)}(U, X, S, Y) \}, \tag{52}\\
E_6 &= \{ \exists\, \hat{m}_Z \ne 1 : (U^n(\hat{m}_Z, j), X^n(\hat{m}_Z, j, 1, k), S^n, Y^n) \in T_{\epsilon}^{(n)}(U, X, S, Y) \}. \tag{53}
\end{align}
We define the error events at Decoder Z:
\begin{align}
E_7 &= \{ \forall j \in \{1, 2, \dots, 2^{n\tilde{R}_Z}\} : (U^n(1, j), Z^n) \notin T_{\epsilon}^{(n)}(U, Z) \}, \tag{54}\\
E_8 &= \{ \exists\, \hat{m}_Z \ne 1,\ \hat{m}_Z \in \{1, 2, \dots, 2^{n(R_Z - C)}\} : (U^n(\hat{m}_Z, j), Z^n) \in T_{\epsilon}^{(n)}(U, Z) \}. \tag{55}
\end{align}
Then, by the union bound:
\begin{align*}
P_e^{(n)} \le{}& P(E_1) + P(E_2 \cap E_1^c) + P(E_3 \cap (E_1^c \cup E_2^c)) + P(E_4) \\
&+ P(E_5) + P(E_6) + P(E_7 \cap (E_1^c \cup E_2^c)) + P(E_8).
\end{align*}
Now, consider:
1) For the encoder error, using the Covering Lemma [11], $P(E_1)$ tends to zero as $n \to \infty$ if in each bin associated with $m_Z$ we have more than $2^{n(I(U;S) + \delta(\epsilon))}$ codewords, i.e., $\tilde{R}_Z > I(U;S) + \delta(\epsilon)$.

2) For the second term, we have that $(U^n, S^n) \in T_{\epsilon}^{(n)}(U, S)$ and $X^n$ is generated i.i.d. $\sim \prod_{i=1}^n p(x_i | u_i)$. Hence, using the Covering Lemma, we have that $P(E_2 \cap E_1^c)$ tends to zero as $n \to \infty$ if in each bin associated with $m_Y$ we have more than $2^{n(I(X;S|U) + \delta(\epsilon))}$ codewords, i.e., $\tilde{R}_Y > I(X;S|U) + \delta(\epsilon)$.

3) For the third term, note that $(X^n, U^n, S^n) \in T_{\epsilon'}^{(n)}(U, S, X)$. Furthermore, $Y^n$ is generated i.i.d. $\sim \prod_{i=1}^n p(y_i | x_i, s_i)$ and $\epsilon > \epsilon'$. Therefore, by the Conditional Typicality Lemma [11], $P(E_3 \cap (E_1^c \cup E_2^c))$ tends to zero as $n \to \infty$.

4) For the fourth term, note that if $\hat{m}_Y \ne 1$ then for any $j \in \{1, 2, \dots, 2^{n\tilde{R}_Z}\}$ and any $k \in \{1, 2, \dots, 2^{n\tilde{R}_Y}\}$, $X^n(1, j, \hat{m}_Y, k)$ is conditionally independent of $(X^n(1, j, 1, k), S^n, Y^n)$ given $U^n(1, j)$ and is distributed according to $\prod_{i=1}^n p(x_i | u_i(1, j))$. Hence, by the Packing Lemma [11], $P(E_4)$ tends to zero as $n \to \infty$ if $R_Y + \tilde{R}_Y < I(X; S, Y | U) - \delta(\epsilon)$.

5) For the fifth term, note that for any $\hat{m}_Z \ne 1$, any $\hat{m}_Y \ne 1$, any $j \in \{1, 2, \dots, 2^{n\tilde{R}_Z}\}$ and any $k \in \{1, 2, \dots, 2^{n\tilde{R}_Y}\}$, $(U^n(\hat{m}_Z, j), X^n(\hat{m}_Z, j, \hat{m}_Y, k))$ are independent of $(U^n(1, j), X^n(1, j, 1, k), S^n, Y^n)$. Hence, by the Packing Lemma [11], $P(E_5)$ tends to zero as $n \to \infty$ if $R_Z + \tilde{R}_Z + R_Y + \tilde{R}_Y < I(U, X; Y, S) - \delta(\epsilon)$. This bound, in addition to the bounds on $\tilde{R}_Z$ and $\tilde{R}_Y$, gives us
\begin{align*}
R_Z + R_Y &< I(U, X; Y, S) - \tilde{R}_Z - \tilde{R}_Y - \delta(\epsilon) \\
&< I(U, X; Y, S) - I(U;S) - I(X;S|U) - \delta(\epsilon) \\
&= I(U, X; Y | S) - \delta(\epsilon) \\
&= I(X; Y | S) - \delta(\epsilon).
\end{align*}
6) For the sixth term, by the same considerations as for the previous event, by the Packing Lemma we have $R_Z + \tilde{R}_Z < I(U, X; Y, S) - \delta(\epsilon)$ (which is already satisfied).

7) For the seventh term, $(X^n, U^n, S^n) \in T_{\epsilon}^{(n)}(U, S, X)$. In addition, $Y^n$ is generated i.i.d. $\sim \prod_{i=1}^n p(y_i | x_i, s_i)$, $Z^n$ is generated $\sim \prod_{i=1}^n p(z_i | y_i) = \prod_{i=1}^n p(z_i | y_i, x_i, s_i, u_i)$ and $\epsilon > \epsilon'$. Hence, by the Conditional Typicality Lemma [11], $P(E_7 \cap (E_1^c \cup E_2^c))$ tends to zero as $n \to \infty$.

8) For the eighth term, note that for any $\hat{m}_Z \ne 1$ and any $j \in \{1, 2, \dots, 2^{n\tilde{R}_Z}\}$, $U^n(\hat{m}_Z, j)$ is independent of $(U^n(1, j), Z^n)$. Hence, by the Packing Lemma [11], $P(E_8)$ tends to zero as $n \to \infty$ if the number of codewords in each superbin is less than $2^{nI(U;Z)}$, i.e., $R_Z - C + \tilde{R}_Z < I(U;Z) - \delta(\epsilon)$.

Combining the results, we have shown that $P(E) \to 0$ as $n \to \infty$ if
\begin{align*}
R_Z &\le I(U;Z) - I(U;S) + C \\
R_Y &\le I(X;Y|U,S) \\
R_Z + R_Y &\le I(X;Y|S).
\end{align*}
The above bound shows that the average probability of error, which, by symmetry, is equal to the probability for an individual pair of codewords, $(m_Z, m_Y)$, averaged over all choices of codebooks in the random code construction, is arbitrarily small. Hence, there exists at least one code, $((2^{nR_Z}, 2^{nR_Y}), 2^{nC}, n)$, with an arbitrarily small probability of error.

C. Converse Proof of Theorem 1
In the previous section, we proved the achievability part of Theorem 1. In this section, we provide the upper bound on the capacity region of the PDSD BC, i.e., we give the proof of the converse for Theorem 1.
Proof:
Given an achievable rate triplet, $(R_Y, R_Z, C)$, we need to show that there exists a joint distribution of the form (15d), $P_S P_{U|S} P_{X|S,U} P_{Y|X,S} P_{Z|Y}$, such that
\begin{align*}
R_Z &\le I(U;Z) - I(U;S) + C \\
R_Y &\le I(X;Y|U,S) \\
R_Z + R_Y &\le I(X;Y|S).
\end{align*}
Since $(R_Y, R_Z, C)$ is an achievable rate triplet, there exists a code, $(n, 2^{nR_Z}, 2^{nR_Y}, 2^{nC})$, with a probability of error, $P_e^{(n)}$, that is arbitrarily small. By Fano's inequality,
\begin{align}
H(M_Y | Y^n, S^n) &\le n R_Y P_{e,1}^{(n)} + H(P_{e,1}^{(n)}) \triangleq \epsilon_{n,1}, \tag{56}\\
H(M_Z | Z^n, M) &\le n R_Z P_{e,2}^{(n)} + H(P_{e,2}^{(n)}) \triangleq \epsilon_{n,2}, \tag{57}
\end{align}
and let
\begin{align}
\epsilon_{n,1} + \epsilon_{n,2} \triangleq \epsilon_n. \tag{58}
\end{align}
Furthermore,
\begin{align}
H(M_Y | M_Z, Y^n, S^n, Z^n) &\le H(M_Y | Y^n, S^n) \le \epsilon_n, \tag{59}\\
H(M_Z | Y^n, Z^n, S^n) &\le H(M_Z | Z^n, M(Y^n, S^n)) \le \epsilon_n. \tag{60}
\end{align}
Thus, we can say that $\epsilon_n \to 0$ as $P_e^{(n)} \to 0$.

To bound the rate $R_Z$ consider:
\begin{align}
nR_Z &= H(M_Z) \nonumber\\
&= H(M_Z) - H(M_Z | Z^n, M) + H(M_Z | Z^n, M) \nonumber\\
&\overset{(a)}{\le} I(M_Z; Z^n, M) + n\epsilon_n \nonumber\\
&= I(M_Z; Z^n) + I(M_Z; M | Z^n) + n\epsilon_n \nonumber\\
&\overset{(b)}{\le} I(M_Z; Z^n) + H(M) + n\epsilon_n \nonumber\\
&\overset{(c)}{\le} I(M_Z; Z^n) + nC + n\epsilon_n \nonumber\\
&= \sum_{i=1}^n I(M_Z; Z_i | Z^{i-1}) + nC + n\epsilon_n \nonumber\\
&\le \sum_{i=1}^n I(M_Z, Z^{i-1}; Z_i) + nC + n\epsilon_n \nonumber\\
&\le \sum_{i=1}^n I(M_Z, Z^{i-1}, Y^{i-1}; Z_i) + nC + n\epsilon_n \nonumber\\
&\overset{(d)}{=} \sum_{i=1}^n I(M_Z, Y^{i-1}; Z_i) + nC + n\epsilon_n \nonumber\\
&= \sum_{i=1}^n \big[ I(M_Z, Y^{i-1}, S_{i+1}^n; Z_i) - I(S_{i+1}^n; Z_i | M_Z, Y^{i-1}) \big] + nC + n\epsilon_n \nonumber\\
&\overset{(e)}{=} \sum_{i=1}^n \big[ I(M_Z, Y^{i-1}, S_{i+1}^n; Z_i) - I(S_i; Y^{i-1} | M_Z, S_{i+1}^n) \big] + nC + n\epsilon_n \nonumber\\
&\overset{(f)}{=} \sum_{i=1}^n \big[ I(M_Z, Y^{i-1}, S_{i+1}^n; Z_i) - I(S_i; Y^{i-1}, M_Z, S_{i+1}^n) \big] + nC + n\epsilon_n \nonumber\\
&\overset{(g)}{=} \sum_{i=1}^n \big[ I(U_i; Z_i) - I(S_i; U_i) \big] + nC + n\epsilon_n \tag{61}
\end{align}
where
(a) follows from Fano's inequality,
(b) follows from the fact that conditioning reduces entropy,
(c) follows from the admissibility of the conference,
(d) follows from the physical degradedness properties of the channel,
(e) follows from the Csiszar sum identity,
(f) follows from the fact that $S_i$ is independent of $(M_Z, S_{i+1}^n)$,
(g) follows from the choice of $U_i = (M_Z, S_{i+1}^n, Y^{i-1})$.
Hence, we have:
\begin{align}
R_Z \le \frac{1}{n} \sum_{i=1}^n \big[ I(U_i; Z_i) - I(S_i; U_i) \big] + C + \epsilon_n. \tag{62}
\end{align}
Next, to bound the rate $R_Y$ consider:
\begin{align}
nR_Y &= H(M_Y) \nonumber\\
&\overset{(a)}{=} H(M_Y | M_Z, S^n) \nonumber\\
&= H(M_Y | M_Z, S^n) - H(M_Y | M_Z, S^n, Y^n) + H(M_Y | M_Z, S^n, Y^n) \nonumber\\
&\overset{(b)}{\le} I(M_Y; Y^n | M_Z, S^n) + n\epsilon_n \nonumber\\
&\overset{(c)}{=} I(M_Y, X^n(M_Y, M_Z, S^n); Y^n | M_Z, S^n) + n\epsilon_n \nonumber\\
&= \sum_{i=1}^n I(M_Y, X^n; Y_i | M_Z, S^n, Y^{i-1}) + n\epsilon_n \nonumber\\
&= \sum_{i=1}^n \big[ H(Y_i | M_Z, S^n, Y^{i-1}) - H(Y_i | M_Z, S^n, Y^{i-1}, M_Y, X^n) \big] + n\epsilon_n \nonumber\\
&\overset{(d)}{\le} \sum_{i=1}^n \big[ H(Y_i | M_Z, S_i, S_{i+1}^n, Y^{i-1}) - H(Y_i | M_Z, S^n, Y^{i-1}, M_Y, X^n) \big] + n\epsilon_n \nonumber\\
&\overset{(e)}{\le} \sum_{i=1}^n \big[ H(Y_i | M_Z, S_i, S_{i+1}^n, Y^{i-1}) - H(Y_i | M_Z, S_i, S_{i+1}^n, Y^{i-1}, X_i) \big] + n\epsilon_n \nonumber\\
&= \sum_{i=1}^n I(Y_i; X_i | M_Z, S_i, S_{i+1}^n, Y^{i-1}) + n\epsilon_n \nonumber\\
&\overset{(f)}{=} \sum_{i=1}^n I(Y_i; X_i | S_i, U_i) + n\epsilon_n \tag{63}
\end{align}
where
(a) follows from the fact that $M_Y$ is independent of $(M_Z, S^n)$,
(b) follows from Fano's inequality,
(c) follows from the fact that $X^n$ is a deterministic function of $(M_Z, M_Y, S^n)$,
(d) follows from the fact that conditioning reduces entropy,
(e) follows from the properties of the channel,
(f) follows from the choice of $U_i = (M_Z, S_{i+1}^n, Y^{i-1})$.
Hence, we have:
\begin{align}
R_Y \le \frac{1}{n} \sum_{i=1}^n I(Y_i; X_i | S_i, U_i) + \epsilon_n. \tag{64}
\end{align}
To bound the sum of rates, $R_Z + R_Y$, consider:
\begin{align*}
n(R_Z + R_Y) &= H(M_Z, M_Y) \\
&\overset{(a)}{=} H(M_Z, M_Y | S^n) \\
&= H(M_Z, M_Y | S^n) + H(M_Z, M_Y | Y^n, Z^n, S^n) - H(M_Z, M_Y | Y^n, Z^n, S^n) \\
&\overset{(b)}{\le} I(M_Z, M_Y; Y^n, Z^n | S^n) + n\epsilon_n \\
&\overset{(c)}{=} I(M_Z, M_Y; Y^n | S^n) + n\epsilon_n \\
&\overset{(d)}{=} I(M_Y, M_Z, X^n(M_Y, M_Z, S^n); Y^n | S^n) + n\epsilon_n \\
&= \sum_{i=1}^n I(M_Y, M_Z, X^n; Y_i | S^n, Y^{i-1}) + n\epsilon_n \\
&= \sum_{i=1}^n \big[ H(Y_i | S^n, Y^{i-1}) - H(Y_i | S^n, Y^{i-1}, X^n, M_Y, M_Z) \big] + n\epsilon_n \\
&\overset{(e)}{\le} \sum_{i=1}^n \big[ H(Y_i | S_i) - H(Y_i | S^n, Y^{i-1}, X^n, M_Y, M_Z) \big] + n\epsilon_n \\
&\overset{(f)}{\le} \sum_{i=1}^n \big[ H(Y_i | S_i) - H(Y_i | S_i, X_i) \big] + n\epsilon_n \\
&= \sum_{i=1}^n I(Y_i; X_i | S_i) + n\epsilon_n
\end{align*}
where
(a) follows from the fact that $(M_Z, M_Y)$ are independent of $S^n$,
(b) follows from Fano's inequality,
(c) follows from the physical degradedness and memorylessness of the channel,
(d) follows from the fact that $X^n$ is a deterministic function of $(M_Z, M_Y, S^n)$,
(e) follows from the fact that conditioning reduces entropy,
(f) follows from the properties of the channel.
Hence, we have:
\begin{align}
R_Z + R_Y \le \frac{1}{n} \sum_{i=1}^n I(Y_i; X_i | S_i) + \epsilon_n. \tag{65}
\end{align}
We complete the proof by using standard time-sharing arguments to obtain the rate bounds given in (15).

D. Proof of Theorem 3

Proof:
The proof of Theorem 3 follows closely the proof of the noncausal case. Therefore, the proof is not given in full detail. Instead, we rely on the guidelines of the proof for the noncausal case and emphasize the differences when considering the causal scenario.

For the proof of achievability, fixing a distribution of the form (19d), we generate codewords $u^n(m_Z)$ i.i.d. $\sim \prod_{i=1}^n p(u_i)$. Next, we generate satellite codewords $v^n(m_Z, m_Y)$ i.i.d. $\sim \prod_{i=1}^n p(v_i | u_i)$ (instead of $x^n(m_Z, m_Y)$) around each cloud center $u^n(m_Z)$. Furthermore, the codewords $u^n$ are divided among $2^{nC}$ superbins. To send $(m_Y, m_Z)$, the encoder transmits $x_i(u_i(m_Z), v_i(m_Z, m_Y), s_i)$ at time $i \in [1, n]$. For decoding, the strong decoder, Decoder Y, decodes the satellite codeword $v^n$ (and hence also the cloud center $u^n$), i.e., it looks for an $(\hat{m}_Y, \hat{m}_Z)$ such that $(u^n(\hat{m}_Z), v^n(\hat{m}_Z, \hat{m}_Y), y^n) \in T_{\epsilon}^{(n)}(U, V, Y)$. Decoder Y then uses the link between the decoders to send Decoder Z the number of the superbin that contains $u^n$. Decoder Z now looks in this specific superbin for a unique $\hat{m}_Z$ such that $(u^n(\hat{m}_Z), z^n) \in T_{\epsilon}^{(n)}(U, Z)$. Now, by the LLN, the Conditional Typicality Lemma and the Packing Lemma [11], and similarly to the proof of achievability of Theorem 1, the probability of error tends to zero as $n \to \infty$ if
\begin{align*}
R_Z &\le I(U;Z) + C - \delta(\epsilon) \\
R_Y &\le I(V;Y|U) - \delta(\epsilon) \\
R_Z + R_Y &\le I(V, U; Y) - \delta(\epsilon).
\end{align*}
For the converse, we define the auxiliary random variables $U_i = (M_Z, Y^{i-1})$ and $V_i = M_Y$. Note that for this setting, this definition of $U_i$ and $V_i$ results in $(U_i, V_i)$ that are independent of $S_i$. Therefore, if we follow the same steps as in the converse of Theorem 1, the bound on $R_Z$ reduces to
\begin{align}
R_Z \le \frac{1}{n} \sum_{i=1}^n I(U_i; Z_i) + C + \epsilon_n. \tag{66}
\end{align}
Next, to bound the rate $R_Y$ consider:
\begin{align*}
nR_Y &= H(M_Y) \\
&\le I(M_Y; Y^n | M_Z) + n\epsilon_n \\
&= \sum_{i=1}^n I(M_Y; Y_i | M_Z, Y^{i-1}) + n\epsilon_n \\
&= \sum_{i=1}^n I(V_i; Y_i | U_i) + n\epsilon_n.
\end{align*}
Hence, we have:
\begin{align}
R_Y \le \frac{1}{n} \sum_{i=1}^n I(V_i; Y_i | U_i) + \epsilon_n. \tag{67}
\end{align}
Finally, to bound the sum of rates, $R_Z + R_Y$, consider:
\begin{align*}
n(R_Z + R_Y) &= H(M_Z, M_Y) \\
&\le I(M_Z, M_Y; Y^n) + n\epsilon_n \\
&= \sum_{i=1}^n I(M_Y, M_Z; Y_i | Y^{i-1}) + n\epsilon_n \\
&\le \sum_{i=1}^n I(M_Y, M_Z, Y^{i-1}; Y_i) + n\epsilon_n \\
&= \sum_{i=1}^n I(V_i, U_i; Y_i) + n\epsilon_n.
\end{align*}
Hence, we have:
\begin{align}
R_Z + R_Y \le \frac{1}{n} \sum_{i=1}^n I(V_i, U_i; Y_i) + \epsilon_n. \tag{68}
\end{align}
We complete the proof by using standard time-sharing arguments; hence, the details are omitted.

E. Proof of Achievability of Theorem 4
Let us prove achievability of the region given in Theorem 4.
Proof:
Fix a joint distribution of the form (20c), $P_{S,S_d,U,X,Z,Y} = P_S P_{S_d,U,X|S} P_{Y,Z|X,S}$, where $P_{Y,Z|X,S} = P_{Y|X,S} P_{Z|Y}$ is given by the channel.

Code Construction: First, we start by generating the codebook of the State Encoder. Randomly and independently generate $2^{n\tilde{R}}$ sequences $s_d^n(l)$, $l \in [1, 2^{n\tilde{R}}]$, i.i.d. $\sim \prod_{i=1}^n p(s_{d,i})$. Partition the codewords, $s_d^n(l)$, among $2^{nC}$ bins in their natural ordering such that each bin $B(t)$, $t \in [1, 2^{nC}]$, contains the codewords associated with the index $l \in [(t-1) 2^{n(\tilde{R} - C)} + 1, t\, 2^{n(\tilde{R} - C)}]$. Reveal the codebook to the Channel Encoder, Decoder Y and Decoder Z.

Second, we create the codebook for the Channel Encoder. Generate $2^{nR_Z}$ bins, $B(m_Z)$, $m_Z \in [1, 2^{nR_Z}]$. In each bin generate $2^{n\tilde{R}_Z}$ codewords $u^n(m_Z, j)$, $j \in [1, 2^{n\tilde{R}_Z}]$, i.i.d. $\sim \prod_{i=1}^n p(u_i)$. Third, for each codeword $u^n(m_Z, j)$ generate $2^{nR_Y}$ satellite bins. In each satellite bin generate $2^{n\tilde{R}_Y}$ codewords $x^n(m_Z, j, m_Y, k)$, where $k \in [1, 2^{n\tilde{R}_Y}]$, i.i.d. $\sim \prod_{i=1}^n p(x_i(m_Z, j, m_Y, k) | u_i(m_Z, j))$.

Encoding:

1) State Encoder: Given $s^n$, the State Encoder finds an index $l$ such that
\begin{align}
(s_d^n(l), s^n) \in T_{\epsilon'}^{(n)}(S_d, S). \tag{69}
\end{align}
If there is more than one such index, choose the smallest one. If there is no such index, select an index at random from the bin $B(t)$. The State Encoder sends the bin index $t$.

2) Channel Encoder: First, note that the Channel Encoder knows the sequence transmitted from the State Encoder, $s_d^n(l)$, since it knows both $s^n$ and the State Encoder's strategy. To transmit $(m_Y, m_Z)$, the encoder first looks in the bin associated with the message $m_Z$ for a codeword $u^n(m_Z, j)$ that is jointly typical with the state sequence, $s^n$, and the codeword $s_d^n(l)$, i.e.,
\begin{align}
(u^n(m_Z, j), s^n, s_d^n(l)) \in T_{\epsilon'}^{(n)}(U, S, S_d). \tag{70}
\end{align}
If there is more than one such index, choose the smallest $j$. If there is no such index, choose an arbitrary $u^n$ from the bin $m_Z$ (in such a case the decoder will declare an error). Next, the encoder looks for a sequence $x^n(m_Z, j, m_Y, k)$ (where $j$ was chosen in the first stage) that is jointly typical with the state sequence, $s^n$, the codeword $s_d^n(l)$ and the codeword $u^n(m_Z, j)$, i.e.,
\begin{align}
(x^n(m_Z, j, m_Y, k), u^n(m_Z, j), s_d^n(l), s^n) \in T_{\epsilon'}^{(n)}(X, U, S, S_d). \tag{71}
\end{align}
If such a codeword, $x^n$, does not exist, choose an arbitrary $x^n$ from the bin $m_Y$ (in such a case the decoder will declare an error). If there is more than one such codeword, choose the one with the smallest index $k$.

Decoding:

1) Let $\epsilon > \epsilon'$. Note that Decoder Y knows both the sequence $s_d^n$ and $s^n$. Since Decoder Y knows the sequence $s^n$ and the State Encoder's strategy, it also knows $s_d^n$ (similar to the Channel Encoder).
Hence, it looks for the smallest values of $(\hat{m}_Y, \hat{m}_Z)$ for which there exist a $\hat{j}$ and a $\hat{k}$ such that
\begin{align}
(u^n(\hat{m}_Z, \hat{j}), x^n(\hat{m}_Z, \hat{j}, \hat{m}_Y, \hat{k}), s_d^n, s^n, y^n) \in T_{\epsilon}^{(n)}(U, X, S_d, S, Y). \tag{72}
\end{align}
If no pair or more than one such pair is found, an error is declared.
2) Decoder Z first looks for the unique index $\hat{l} \in B(t)$ such that
\begin{align}
(s_d^n(\hat{l}), z^n) \in T_{\epsilon}^{(n)}(S_d, Z). \tag{73}
\end{align}

3) Once Decoder Z has decoded $s_d^n(\hat{l})$, it uses $s_d^n(\hat{l})$ as side information to help the next decoding stage. Hence, the second step is to look for the smallest value of $\hat{m}_Z$ for which there exists a $\hat{j}$ such that
\begin{align}
(u^n(\hat{m}_Z, \hat{j}), z^n, s_d^n(\hat{l})) \in T_{\epsilon}^{(n)}(U, Z, S_d). \tag{74}
\end{align}
If no pair or more than one such pair is found, an error is declared.

Analysis of the probability of error:
Without loss of generality, we can assume that the messages $(m_Z, m_Y) = (1, 1)$ were sent. We define the error event at the State Encoder:
\begin{align}
E_1 = \{ \forall l \in [1, 2^{n\tilde{R}}] : (S_d^n(l), S^n) \notin T_{\epsilon'}^{(n)}(S_d, S) \}. \tag{75}
\end{align}
We define the error events at the Channel Encoder:
\begin{align}
E_2 &= \{ \forall j \in [1, 2^{n\tilde{R}_Z}] : (U^n(1, j), S^n, S_d^n) \notin T_{\epsilon'}^{(n)}(U, S, S_d) \}, \tag{76}\\
E_3 &= \{ \forall k \in [1, 2^{n\tilde{R}_Y}] : (X^n(1, j, 1, k), U^n(1, j), S^n, S_d^n) \notin T_{\epsilon'}^{(n)}(X, U, S, S_d) \}. \tag{77}
\end{align}
We define the error events at Decoder Y:
\begin{align}
E_4 &= \{ \forall j \in [1, 2^{n\tilde{R}_Z}] : (U^n(1, j), X^n(1, j, 1, k), S_d^n, S^n, Y^n) \notin T_{\epsilon}^{(n)}(U, X, S_d, S, Y) \}, \tag{78}\\
E_5 &= \{ \exists\, \hat{m}_Y \ne 1 : (U^n(1, j), X^n(1, j, \hat{m}_Y, k), S_d^n, S^n, Y^n) \in T_{\epsilon}^{(n)}(U, X, S_d, S, Y) \}, \tag{79}\\
E_6 &= \{ \exists\, \hat{m}_Z \ne 1, \hat{m}_Y \ne 1 : (U^n(\hat{m}_Z, j), X^n(\hat{m}_Z, j, \hat{m}_Y, k), S_d^n, S^n, Y^n) \in T_{\epsilon}^{(n)}(U, X, S_d, S, Y) \}, \tag{80}\\
E_7 &= \{ \exists\, \hat{m}_Z \ne 1 : (U^n(\hat{m}_Z, j), X^n(\hat{m}_Z, j, 1, k), S_d^n, S^n, Y^n) \in T_{\epsilon}^{(n)}(U, X, S_d, S, Y) \}. \tag{81}
\end{align}
We define the error events at Decoder Z:
\begin{align}
E_8 &= \{ \forall l \in [1, 2^{n\tilde{R}}] : (S_d^n(l), Z^n) \notin T_{\epsilon}^{(n)}(S_d, Z) \}, \tag{82}\\
E_9 &= \{ \exists\, \hat{l} \ne L, \hat{l} \in B(T) : (S_d^n(\hat{l}), Z^n) \in T_{\epsilon}^{(n)}(S_d, Z) \}, \tag{83}\\
E_{10} &= \{ \forall j \in [1, 2^{n\tilde{R}_Z}] : (U^n(1, j), Z^n, S_d^n(l)) \notin T_{\epsilon}^{(n)}(U, Z, S_d) \}, \tag{84}\\
E_{11} &= \{ \exists\, \hat{m}_Z \ne 1 : (U^n(\hat{m}_Z, j), Z^n, S_d^n(l)) \in T_{\epsilon}^{(n)}(U, Z, S_d) \}. \tag{85}
\end{align}
Then, by the union of events bound:
\begin{align*}
P_e^{(n)} \le{}& P(E_1) + P(E_2 \cap E_1^c) + P(E_3 \cap (E_1^c \cup E_2^c)) + P(E_4 \cap (E_1^c \cup E_2^c \cup E_3^c)) \\
&+ P(E_5) + P(E_6) + P(E_7) + P(E_8 \cap (E_1^c \cup E_2^c \cup E_3^c)) + P(E_9) \\
&+ P(E_{10} \cap (E_1^c \cup E_2^c \cup E_3^c)) + P(E_{11} \cap E_9^c).
\end{align*}
Now, consider:

1) For the error at the State Encoder, $E_1$, by invoking the Covering Lemma [11], we obtain that $P(E_1)$ tends to zero as $n \to \infty$ if $\tilde{R} \ge I(S; S_d) + \delta(\epsilon')$.
2) The probabilities of the errors $P(E_2 \cap E_1^c)$, $P(E_3 \cap (E_1^c \cup E_2^c))$, $P(E_4 \cap (E_1^c \cup E_2^c \cup E_3^c))$, $P(E_5)$, $P(E_6)$ and $P(E_7)$ are treated in exactly the same manner as in Section V-B, where they are shown to tend to zero as $n \to \infty$. Therefore, the details are omitted.

3) For the eighth term, we have that $(X^n, U^n, S^n, S_d^n) \in T_{\epsilon}^{(n)}(X, U, S, S_d)$. In addition, $Y^n$ is generated i.i.d. $\sim \prod_{i=1}^n p(y_i | x_i, s_i)$, and $Z^n$ is generated $\sim \prod_{i=1}^n p(z_i | y_i) = \prod_{i=1}^n p(z_i | y_i, x_i, s_i, u_i, s_{d,i})$. Since $\epsilon > \epsilon'$, by the Conditional Typicality Lemma [11], $P(E_8 \cap (E_1^c \cup E_2^c \cup E_3^c))$ tends to zero as $n \to \infty$.

4) For the ninth error expression, $E_9$, we have that
\begin{align*}
P(E_9) &= P(\exists\, \hat{l} \ne L, \hat{l} \in B(T) : (S_d^n(\hat{l}), Z^n) \in T_{\epsilon}^{(n)}(S_d, Z)) \\
&\le P(\exists\, \hat{l} \in B(1) : (S_d^n(\hat{l}), Z^n) \in T_{\epsilon}^{(n)}(S_d, Z))
\end{align*}
[11, Lemma 11.1]. Therefore, since the sequence $S_d^n(\hat{l})$ is independent of $Z^n$, by the Packing Lemma [11], $P(E_9)$ tends to zero as $n \to \infty$ if $\tilde{R} - C < I(Z; S_d) - \delta(\epsilon)$.

5) For the tenth term, we again note that the random variables $(U, Z, S, X, S_d)$ are generated i.i.d. Hence, since $\epsilon > \epsilon'$ and by the Conditional Typicality Lemma [11], $P(E_{10} \cap (E_1^c \cup E_2^c \cup E_3^c))$ tends to zero as $n \to \infty$.

6) For the eleventh term, note that for any $\hat{m}_Z \ne 1$ and any $j \in [1, 2^{n\tilde{R}_Z}]$, $U^n(\hat{m}_Z, j)$ is independent of $(U^n(1, j), Z^n, S_d^n)$. Hence, by the Packing Lemma [11], $P(E_{11} \cap E_9^c)$ tends to zero as $n \to \infty$ if $R_Z + \tilde{R}_Z < I(U; Z, S_d) - \delta(\epsilon)$.

Combining the results, we have shown that $P(E) \to 0$ as $n \to \infty$ if
\begin{align*}
R_Z &\le I(U; Z, S_d) - I(U; S, S_d) \\
R_Y &\le I(X; S, S_d, Y | U) - I(X; S, S_d | U) \\
R_Z + R_Y &\le I(U, X; S, S_d, Y) - I(U, X; S, S_d) \\
C &\ge I(S; S_d) - I(Z; S_d).
\end{align*}

Remark 3:
Rearranging the expressions, we obtain
\begin{align}
R_Z &\le I(U; Z, S_d) - I(U; S, S_d) \tag{86}\\
R_Y &\le I(X; Y | U, S, S_d) \tag{87}\\
R_Z + R_Y &\le I(U, X; Y | S, S_d) \tag{88}\\
C &\ge I(S; S_d) - I(Z; S_d). \tag{89}
\end{align}
Note that the bound on the rate sum, (88), is redundant and can be removed, since:
\begin{align*}
R_Z + R_Y &\le I(U, X; Y | S, S_d) \\
&= I(U; Y | S, S_d) + I(X; Y | U, S, S_d) \\
&= I(U; Y, S, S_d) - I(U; S_d, S) + I(X; Y | U, S, S_d),
\end{align*}
in addition to $R_Y$ satisfying (87) and $R_Z$ satisfying (86), where (86) can be bounded by
\begin{align*}
R_Z &\le I(U; Z, S_d) - I(U; S, S_d) \\
&\le I(U; Y, S_d) - I(U; S, S_d) \\
&\le I(U; Y, S_d, S) - I(U; S, S_d).
\end{align*}
The above bound shows that the average probability of error, which, by symmetry, is equal to the probability for an individual pair of codewords, $(m_Z, m_Y)$, averaged over all choices of codebooks in the random code construction, is arbitrarily small. Hence, there exists at least one code, $((2^{nR_Z}, 2^{nR_Y}), 2^{nC}, n)$, with an arbitrarily small probability of error.

F. Converse Proof of Theorem 4
In Section V-E, the achievability of Theorem 4 was shown. To finish the proof, we provide the upper bound on the capacity region.
Proof:
Given an achievable rate triplet $(C_{12}, R_Z, R_Y)$, we need to show that there exists a joint distribution of the form (20c), $P_S P_{S_d,U,X|S} P_{Y|X,S} P_{Z|Y}$, such that
\begin{align*}
R_Z &\le I(U;Z|S_d) - I(U;S|S_d),\\
R_Y &\le I(X;Y|U,S),\\
C_{12} &\ge I(S;S_d) - I(Z;S_d).
\end{align*}
Since $(C_{12}, R_Z, R_Y)$ is an achievable rate triplet, there exists a code, $((2^{nR_Z}, 2^{nR_Y}), 2^{nC_{12}}, n)$, with a probability of error, $P_e^{(n)}$, that is arbitrarily small. By Fano's inequality,
\begin{align}
H(M_Y|Y^n,S^n) &\le nR_Y P_{e,1}^{(n)} + H(P_{e,1}^{(n)}) \triangleq \epsilon_{1n}, \tag{90}\\
H(M_Z|Z^n,M_{12}) &\le nR_Z P_{e,2}^{(n)} + H(P_{e,2}^{(n)}) \triangleq \epsilon_{2n}, \tag{91}
\end{align}
and let
\begin{align}
\epsilon_{1n} + \epsilon_{2n} \triangleq \epsilon_n. \tag{92}
\end{align}
Furthermore,
\begin{align}
H(M_Y|M_Z,Y^n,S^n,Z^n) &\le H(M_Y|Y^n,S^n) \le \epsilon_{1n}, \tag{93}\\
H(M_Z|Y^n,Z^n,S^n,M_{12}) &\le H(M_Z|Z^n,M_{12}) \le \epsilon_{2n}. \tag{94}
\end{align}
Thus, we can say that $\epsilon_n \to 0$ as $P_e^{(n)} \to 0$.

For $C_{12}$ consider:
\begin{align*}
nC_{12} &\ge H(M_{12})\\
&\ge I(M_{12};S^n)\\
&= \sum_{i=1}^{n} I(M_{12};S_i|S_{i+1}^n)\\
&\overset{(a)}{=} \sum_{i=1}^{n} I(M_{12},S_{i+1}^n;S_i)\\
&= \sum_{i=1}^{n} \left[ I(M_{12},S_{i+1}^n,Z^{i-1};S_i) - I(Z^{i-1};S_i|M_{12},S_{i+1}^n) \right]\\
&\overset{(b)}{=} \sum_{i=1}^{n} \left[ I(M_{12},S_{i+1}^n,Z^{i-1};S_i) - I(Z_i;S_{i+1}^n|M_{12},Z^{i-1}) \right]\\
&\ge \sum_{i=1}^{n} \left[ I(M_{12},S_{i+1}^n,Z^{i-1};S_i) - I(Z_i;S_{i+1}^n,M_{12},Z^{i-1}) \right]\\
&\overset{(c)}{=} \sum_{i=1}^{n} \left[ I(S_{d,i};S_i) - I(Z_i;S_{d,i}) \right],
\end{align*}
where
(a) follows since $S_i$ is independent of $S_{i+1}^n$,
(b) follows from the Csiszar sum identity,
(c) follows from the definition of the auxiliary random variable $S_{d,i} = (M_{12}, S_{i+1}^n, Z^{i-1})$.
Hence, we have:
\begin{align}
C_{12} \ge \frac{1}{n} \sum_{i=1}^{n} \left[ I(S_{d,i};S_i) - I(Z_i;S_{d,i}) \right]. \tag{95}
\end{align}

To bound the rate $R_Z$ consider:
\begin{align}
nR_Z &= H(M_Z) \nonumber\\
&= H(M_Z|M_{12}) \nonumber\\
&= H(M_Z|M_{12}) - H(M_Z|Z^n,M_{12}) + H(M_Z|Z^n,M_{12}) \nonumber\\
&\overset{(a)}{\le} I(M_Z;Z^n|M_{12}) + n\epsilon_n \nonumber\\
&= \sum_{i=1}^{n} I(M_Z;Z_i|M_{12},Z^{i-1}) + n\epsilon_n \nonumber\\
&= \sum_{i=1}^{n} \left[ I(M_Z,S_{i+1}^n;Z_i|M_{12},Z^{i-1}) - I(S_{i+1}^n;Z_i|M_Z,M_{12},Z^{i-1}) \right] + n\epsilon_n \nonumber\\
&\overset{(b)}{=} \sum_{i=1}^{n} \left[ I(M_Z,S_{i+1}^n;Z_i|M_{12},Z^{i-1}) - I(S_i;Z^{i-1}|M_Z,M_{12},S_{i+1}^n) \right] + n\epsilon_n \nonumber\\
&\overset{(c)}{=} \sum_{i=1}^{n} \left[ I(M_Z,S_{i+1}^n;Z_i|M_{12},Z^{i-1}) - I(S_i;Z^{i-1},M_Z|M_{12},S_{i+1}^n) \right] + n\epsilon_n \nonumber\\
&\overset{(d)}{=} \sum_{i=1}^{n} \left[ I(S_{i+1}^n;Z_i|M_{12},Z^{i-1}) + I(M_Z;Z_i|M_{12},Z^{i-1},S_{i+1}^n) - I(S_i;Z^{i-1},M_Z|M_{12},S_{i+1}^n) \right] + n\epsilon_n \nonumber\\
&\overset{(e)}{=} \sum_{i=1}^{n} \left[ I(S_i;Z^{i-1}|M_{12},S_{i+1}^n) + I(M_Z;Z_i|M_{12},Z^{i-1},S_{i+1}^n) - I(S_i;Z^{i-1},M_Z|M_{12},S_{i+1}^n) \right] + n\epsilon_n \nonumber\\
&= \sum_{i=1}^{n} \left[ I(M_Z;Z_i|M_{12},Z^{i-1},S_{i+1}^n) - I(S_i;M_Z|Z^{i-1},M_{12},S_{i+1}^n) \right] + n\epsilon_n \nonumber\\
&\overset{(f)}{=} \sum_{i=1}^{n} \left[ I(U_i;Z_i|S_{d,i}) - I(S_i;U_i|S_{d,i}) \right] + n\epsilon_n, \tag{96}
\end{align}
where
(a) follows from Fano's inequality,
(b) follows from the Csiszar sum identity,
(c) follows from the fact that $S_i$ is independent of $M_Z$ given $(M_{12}, S_{i+1}^n)$,
(d) follows from using the chain rule,
(e) follows from using the Csiszar sum identity,
(f) follows from the definition of the auxiliary random variables $S_{d,i} = (M_{12}, S_{i+1}^n, Z^{i-1})$ and $U_i = M_Z$.
Hence, we have:
\begin{align}
R_Z \le \frac{1}{n} \sum_{i=1}^{n} \left[ I(U_i;Z_i|S_{d,i}) - I(S_i;U_i|S_{d,i}) \right] + \epsilon_n. \tag{97}
\end{align}

Next, to bound the rate $R_Y$ consider:
\begin{align}
nR_Y &= H(M_Y) \nonumber\\
&\overset{(a)}{=} H(M_Y|M_Z,S^n) \nonumber\\
&= H(M_Y|M_Z,S^n) - H(M_Y|M_Z,S^n,Y^n) + H(M_Y|M_Z,S^n,Y^n) \nonumber\\
&\overset{(b)}{\le} I(M_Y;Y^n|M_Z,S^n) + n\epsilon_n \nonumber\\
&\overset{(c)}{=} I(M_Y,X^n(M_Y,M_Z,S^n);Y^n|M_Z,S^n) + n\epsilon_n \nonumber\\
&= \sum_{i=1}^{n} I(M_Y,X^n;Y_i|M_Z,S^n,Y^{i-1}) + n\epsilon_n \nonumber\\
&\overset{(d)}{=} \sum_{i=1}^{n} I(M_Y,X^n;Y_i|M_Z,S^n,M_{12},Y^{i-1}) + n\epsilon_n \nonumber\\
&\overset{(e)}{=} \sum_{i=1}^{n} I(M_Y,X^n;Y_i|M_Z,S^n,M_{12},Y^{i-1},Z^{i-1}) + n\epsilon_n \nonumber\\
&= \sum_{i=1}^{n} \left[ H(Y_i|M_Z,S^n,M_{12},Y^{i-1},Z^{i-1}) - H(Y_i|M_Z,S^n,M_{12},Y^{i-1},Z^{i-1},M_Y,X^n) \right] + n\epsilon_n \nonumber\\
&\overset{(f)}{\le} \sum_{i=1}^{n} \left[ H(Y_i|M_Z,S_i,S_{d,i}) - H(Y_i|M_Z,S^n,M_{12},Y^{i-1},Z^{i-1},M_Y,X^n) \right] + n\epsilon_n \nonumber\\
&\overset{(g)}{\le} \sum_{i=1}^{n} \left[ H(Y_i|M_Z,S_i,S_{d,i}) - H(Y_i|M_Z,S_i,S_{d,i},X_i) \right] + n\epsilon_n \nonumber\\
&= \sum_{i=1}^{n} I(Y_i;X_i|M_Z,S_{d,i},S_i) + n\epsilon_n \nonumber\\
&\overset{(h)}{=} \sum_{i=1}^{n} I(Y_i;X_i|U_i,S_{d,i},S_i) + n\epsilon_n, \tag{98}
\end{align}
where
(a) follows from the fact that $M_Y$ is independent of $(M_Z, S^n)$,
(b) follows from Fano's inequality,
(c) follows from the fact that $X^n$ is a deterministic function of $(M_Z, M_Y, S^n)$,
(d) follows from the fact that $M_{12}$ is a function of $S^n$,
(e) follows from the degradedness property of the channel,
(f) follows from the fact that conditioning reduces entropy and the definition of the auxiliary random variable $S_{d,i} = (M_{12}, S_{i+1}^n, Z^{i-1})$,
(g) follows from the properties of the channel,
(h) follows from the choice of $U_i = M_Z$.
Hence, we have:
\begin{align}
R_Y \le \frac{1}{n} \sum_{i=1}^{n} I(Y_i;X_i|S_i,U_i) + \epsilon_n. \tag{99}
\end{align}
We complete the proof by using standard time-sharing arguments to obtain the rate bounds given in Theorem 4.

REFERENCES
[1] E. Amir and Y. Steinberg. Joint source-channel coding for cribbing models. In
Int. Symp. Information Theory (ISIT-12), pages 1952–1956, 2012.
[2] P. P. Bergmans. Random coding theorem for broadcast channels with degraded message components. IEEE Trans. Inf. Theory, IT-19(2):197–207, Mar. 1973.
[3] T. M. Cover and M. Chiang. Duality between channel capacity and rate distortion with two-sided state information. IEEE Trans. Inf. Theory, 48(6):1629–1638, June 2002.
[4] T. M. Cover. Broadcast channels. IEEE Trans. Inf. Theory, IT-18(1):2–14, Jan. 1972.
[5] R. Dabora and S. Servetto. Broadcast channels with cooperating decoders. IEEE Trans. Inf. Theory, 52(12):5438–5454, December 2006.
[6] R. G. Gallager. Capacity and coding for degraded broadcast channels. Probl. Pered. Inform., 3-14(3), 1972.
[7] A. E. Gamal. Broadcast channels with and without feedback. In Proc. 11th Annu. Conf. Circuits Systems and Computers, pages 180–183, 1978.
[8] A. E. Gamal. The feedback capacity of degraded broadcast channels. IEEE Trans. Inf. Theory, 24:379–381, 1978.
[9] A. E. Gamal. The capacity of a class of broadcast channels. IEEE Trans. Inf. Theory, IT-25(2):166–169, Mar. 1979.
[10] A. E. Gamal. The capacity of the physically degraded Gaussian broadcast channel with feedback. IEEE Trans. Inf. Theory, IT-27(4):508–511, Jul. 1981.
[11] A. E. Gamal and Y. H. Kim. Network Information Theory. Cambridge University Press, 2011.
[12] S. I. Gel’fand and M. S. Pinsker. Coding for channel with random parameters. Probl. Contr. and Inf. Theory, 9(1):19–31, 1980.
[13] C. Heegard and A. E. Gamal. On the capacity of computer memory with defects. IEEE Trans. Inf. Theory, 29(5):731–739, 1983.
[14] A. V. Kuznetsov and B. S. Tsybakov. Coding in a memory with defective cells. Probl. Peredachi Inf., 10:52–60, 1974.
[15] A. Lapidoth and Y. Steinberg. The multiple access channel with causal and strictly causal side information at the encoders. 2010.
[16] A. Lapidoth and Y. Steinberg. The multiple-access channel with causal side information: Common state. IEEE Trans. Inf. Theory, 59(1):32–50, 2013.
[17] Y. Liang and V. V. Veeravalli. Cooperative relay broadcast channels. IEEE Trans. Inf. Theory, 51:900–928, 2007.
[18] H. Permuter, S. Shamai, and A. Somekh-Baruch. Message and state cooperation in multiple access channels. IEEE Trans. Inf. Theory, 57(10):6379–6396, 2011.
[19] P. Piantanida, A. Zaidi, and S. Shamai. Multiple access channel with states known noncausally at one encoder and only strictly causally at the other encoder. CoRR, abs/1105.5975, 2011.
[20] C. E. Shannon. Channels with side information at the transmitter. IBM J. Res. Devel., 2:289–293, 1958.
[21] O. Simeone, D. Gündüz, H. V. Poor, A. J. Goldsmith, and S. Shamai. Compound multiple-access channels with partial cooperation. IEEE Trans. Inf. Theory, 55(6):2425–2441, 2009.
[22] A. Somekh-Baruch, S. Shamai, and S. Verdu. Cooperative multiple-access encoding with states available at one transmitter. IEEE Trans. Inf. Theory, IT-54:4448–4469, Oct. 2008.
[23] Y. Steinberg. Coding for the degraded broadcast channel with random parameters, with causal and noncausal side information. IEEE Trans. Inf. Theory, 51(8):2867–2877, August 2005.
[24] Y. Steinberg. Coding for channels with rate-limited side information at the decoder, with applications. IEEE Trans. Inf. Theory, 54:4283–4295, 2008.
[25] F. M. J. Willems. The discrete memoryless multiple channel with partially cooperating encoders. IEEE Trans. Inf. Theory, 29(6):441–445, 1983.
[26] F. M. J. Willems and E. C. van der Meulen. The discrete memoryless multiple-access channel with cribbing encoders. IEEE Trans. Inf. Theory, 31(3):313–327, 1985.
[27] A. D. Wyner and J. Ziv. The rate-distortion function for source coding with side information at the decoder.