[PDF] Novel Techniques to Derive Capacity Results for Multi-User Interference Channels

Abstract

The Interference Channels (ICs) represent fundamental building blocks of wireless communication networks. Despite considerable progress in network information theory, available capacity results for ICs, specifically those with more than two users, are still very limited. One of the main difficulties in the analysis of these networks is how to establish useful capacity outer bounds for them. In this paper, novel techniques requiring subtle sequential application of Csiszar-Korner identity are developed to establish efficient single-letter outer bounds on the sum-rate capacity of interference networks. By using the outer bounds, a full characterization of the sum-rate capacity is then derived for various multi-user ICs under specific conditions. Our capacity results hold for both discrete and Gaussian networks.

Full PDF

1 Novel Techniques to Derive Capacity Results for Multi-User Interference Channels

Abstract : The Interference Channels (ICs) represent fundamental building blocks of wireless communication networks. Despite considerable progress in network information theory, available capacity results for ICs, specifically those with more than two users, are still very limited. One of the main difficulties in the analysis of these networks is how to establish useful capacity outer bounds for them. In this paper, novel techniques requiring subtle sequential application of Csiszar-Korner identity are developed to establish efficient single-letter outer bounds on the sum-rate capacity of interference networks. By using the outer bounds, a full characterization of the sum-rate capacity is then derived for various multi-user ICs under specific conditions. Our capacity results hold for both discrete and Gaussian networks. I. I NTRODUCTION

A major challenge in developing wireless networks is to manage interference that is a resultant of simultaneous communication by different users. In network information theory, this problem is basically analyzed by the Interference Channel (IC), a scenario where some transmitters send independent messages to their respective receivers via a common media. Despite considerable progress in Shannon theory, the capacity region of ICs, even for the simple two-user case, is still open problem. A limiting expression for the capacity region in the general case was derived in [1]; however, computing the capacity through this expression is difficult. Capacity bounds for the ICs have been studied in numerous papers. The two-user IC is considered in [1-12]. Specifically, a single-letter characterization of the capacity region has been derived for some special cases such as the IC with strong interference [2, 3], a subclass of deterministic ICs [4], a class of discrete degraded IC [5], and a subclass of one-sided IC [6]. In the concurrent works [7], [8] and [9], the authors established new capacity outer bounds for the two-user Gaussian IC. Using the derived outer bounds, they also identified a noisy interference regime for the channel. In this regime, treating interference as noise at each receiver is sum-rate optimal. Moreover, new sum-rate capacities were recently derived in [10] for the two-user discrete

IC. The multi-user ICs have been also studied widely in recent years. However, available capacity results are very limited. The so-called many-to-one and one-to-many ICs -special cases where interference is experienced, or is caused, by only one user- were considered in [13]. Capacity bounds were presented in [14] for a class of three-user deterministic ICs. An outer bound was presented in [15] for the three-user IC, and the sum-rate capacity was established for some certain Gaussian channels. A class of multi-user cyclic Gaussian IC was studied in [16] where the capacity region was approximately determined to within a constant gap; the authors also identified a strong interference regime for the Gaussian channel. Capacity region of a class of multi-user symmetric Gaussian ICs in the very strong interference regime was determined in [17] using lattice coding. In [18] a full characterization of the sum-rate capacity is provided for the degraded ICs. Finally, the strong interference regime initially derived in [2, 3] for the two-user IC is generalized to the multi-user ICs in [19].

Reza K. Farsani, Amir K. Khandani

Department of Electrical and Computer Engineering University of Waterloo, Waterloo, ON. Canada Email: {r3khosra, khandani}@uwaterloo.ca

2 One of the main difficulties in the analysis of multi-user ICs is how to establish useful capacity outer bounds for them. In this paper, we develop novel techniques requiring subtle sequential application of Csiszar-Korner identity [20] to establish efficient single-letter outer bounds on the sum-rate capacity of multi-user ICs. We demonstrate that our outer bounds can be applied to obtain the sum-rate capacity for various multi-user ICs under specific conditions. Our new capacity results hold for both discrete and Gaussian networks. The rest of the paper is organized as follows. Preliminaries and definitions are given in Section II. The main results are presented in Section III.

II. P RELIMINARIES

Throughout this paper, given a length- 𝑛 vector of random variables 𝑋 𝑛 , the notation 𝑋 𝑛\𝑡 is defined as follows: 𝑋 𝑛\𝑡 ≜ (𝑋 𝑡−1 , 𝑋 𝑡+1𝑛 ), 𝑡 = 1, … , 𝑛 (1) The classical 𝐾 -user interference channel is a communication scenario where 𝐾 transmitters send independent messages to their respective receivers via a common media. The channel is specified by 𝐾 input signals 𝑋 𝑖 ∈ 𝒳 𝑖 , 𝐾 output signals 𝑌 𝑖 ∈ 𝒴 𝑖 , 𝑖 = 1, … , 𝐾 , and the transition probability function ℙ(𝑦 , 𝑦 , … , 𝑦 𝐾 |𝑥 , 𝑥 , … , 𝑥 𝐾 ) . Figure 1 depicts the channel model. 𝑀 𝑋 𝑌 𝑀̂ 𝑀 𝑋 𝑌 𝑀̂ 𝑀 𝐾 𝑋 𝐾 𝑌 𝐾 𝑀̂ 𝐾 Figure 1.

The K-user interference channel.

The Gaussian IC with real-valued input and output signals is given by: [𝑌 ⋮𝑌 𝐾 ] = [𝑎 ⋯ 𝑎 ⋮ ⋱ ⋮𝑎 𝐾1 ⋯ 𝑎 𝐾𝐾 ] [𝑋 ⋮𝑋 𝐾 ] + [𝑍 ⋮𝑍 𝐾 ] (2) where the parameters {𝑎 𝑗𝑖 } 𝑗,𝑖=1,…,𝐾 are (fixed) real-valued numbers, and the random variables {𝑍 𝑗 } 𝑗=1𝐾 are zero-mean unit-variance Gaussian noises. Without loss of generality, one can assume that 𝑎 𝑖𝑖 = 1, 𝑖 = 1, … , 𝐾 . The 𝑖 𝑡ℎ transmitter is subject to an average power constraint as: 𝔼[𝑋 𝑖2 ] ≤ 𝑃 𝑖 , where 𝑃 𝑖 ∈ ℝ + , 𝑖 = 1, … , 𝐾 . In the following, we present some technical lemmas which have an essential role in our main derivations given in Section III. These lemmas are in fact proved in [21]. However, we have provided proofs of them in the Appendix for complicity. ℙ(𝑦 , … , 𝑦 𝐾 |𝑥 , … , 𝑥 𝐾 ) DEC-1 ENC-1 DEC-2

ENC-2 DEC-K

ENC-K

3 Let 𝒴 , 𝒴 , 𝒳 , 𝒳 , … , 𝒳 𝜇 , 𝒳 𝜇 +1 , … , 𝒳 𝜇 +𝜇 be arbitrary sets, where 𝜇 , 𝜇 ∈ ℕ are arbitrary natural numbers. Let also ℙ(𝑦 , 𝑦 |𝑥 , 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) be a given conditional probability distribution defined on the set 𝒴 × 𝒴 × 𝒳 × 𝒳 × … ×𝒳 𝜇 × 𝒳 𝜇 +1 × … × 𝒳 𝜇 +𝜇 . Lemma 1.

Consider the inequality below:

𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) ≤ 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) (3) If the inequality (3) holds for all PDFs 𝑃 𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) with the following factorization: 𝑃 𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 = 𝑃 𝑋 …𝑋 𝜇1 (𝑥 , … , 𝑥 𝜇 )𝑃 𝑋 𝜇1+1 (𝑥 𝜇 +1 )𝑃 𝑋 𝜇1+2 (𝑥 𝜇 +2 ) … 𝑃 𝑋 𝜇1+𝜇2 (𝑥 𝜇 +𝜇 ) (4) then, we have: 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝐷) ≤ 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝐷) (5) for all joint PDFs 𝑃 𝐷𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑑, 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) where 𝐷 → 𝑋 , … , 𝑋 𝜇 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 → 𝑌 , 𝑌 forms a Markov chain. Proof of Lemma 1) See Appendix A. ■ Corollary 1.

For any

𝛺 ⊆ {1, … , 𝜇 } , if the inequality (3) holds for all joint PDFs (4), then we have: 𝐼({𝑋 , … , 𝑋 𝜇 } − {𝑋 𝑖 } 𝑖∈𝛺 ; 𝑌 | {𝑋 𝑖 } 𝑖∈𝛺 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝐷) ≤ 𝐼({𝑋 , … , 𝑋 𝜇 } − {𝑋 𝑖 } 𝑖∈𝛺 ; 𝑌 | {𝑋 𝑖 } 𝑖∈𝛺 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝐷) (6) for all joint PDFs 𝑃 𝐷𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑑, 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) where 𝐷 → 𝑋 , … , 𝑋 𝜇 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 → 𝑌 , 𝑌 forms a Markov chain. Lemma 2.

Consider the inequality below:

𝐼(𝑈; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) ≤ 𝐼(𝑈; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) (7) If the inequality (7) holds for all PDFs 𝑃 𝑈𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) with the following factorization: 𝑃 𝑈𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 = 𝑃 𝑈𝑋 …𝑋 𝜇1 (𝑢, 𝑥 , … , 𝑥 𝜇 )𝑃 𝑋 𝜇1+1 (𝑥 𝜇 +1 )𝑃 𝑋 𝜇1+2 (𝑥 𝜇 +2 ) … 𝑃 𝑋 𝜇1+𝜇2 (𝑥 𝜇 +𝜇 ) (8) then, we have: 𝐼(𝑈; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝐷) ≤ 𝐼(𝑈; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝐷) (9) for all joint PDFs 𝑃 𝐷𝑈𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑑, 𝑢, 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) where 𝐷, 𝑈 → 𝑋 , … , 𝑋 𝜇 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 → 𝑌 , 𝑌 form a Markov chain. Proof of Lemma 2) See Appendix B. ■ For the Gaussian networks, we need to the following variations of Lemmas 1 and 2. Let the outputs 𝑌 and 𝑌 be given as follows: {𝑌 ≜ 𝑎 𝑋 + 𝑎 𝑋 + ⋯ + 𝑎 𝜇 𝑋 𝜇 + 𝑎 𝜇 +1 𝑋 𝜇 +1 + ⋯ + 𝑎 𝜇 +𝜇 𝑋 𝜇 +𝜇 + 𝑍 𝑌 ≜ 𝑏 𝑋 + 𝑏 𝑋 + ⋯ + 𝑏 𝜇 𝑋 𝜇 + 𝑏 𝜇 +1 𝑋 𝜇 +1 + ⋯ + 𝑏 𝜇 +𝜇 𝑋 𝜇 +𝜇 + 𝑍 (10) where 𝑍 and 𝑍 are zero-mean unit-variance Gaussian random variables; also, 𝑋 , 𝑋 , … , 𝑋 𝜇 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 are real-valued power-constrained random variables independent of (𝑍 , 𝑍 ) and 𝑎 , 𝑎 , … , 𝑎 𝜇 , 𝑎 𝜇 +1 , … , 𝑎 𝜇 +𝜇 and 𝑏 , 𝑏 , … , 𝑏 𝜇 , 𝑏 𝜇 +1 , … , 𝑏 𝜇 +𝜇 are fixed real numbers. The goal is to determine sufficient conditions for this setup under which the inequality (5) (or (9)) holds for all joint PDFs 𝑃 𝐷𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑑, 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) . The following lemmas present such conditions. Lemma 3.

Consider the Gaussian system in (10). If the following condition satisfies: 𝑎 𝑏 = 𝑎 𝑏 = ⋯ = 𝑎 𝜇 𝑏 𝜇 = 𝛼, |𝛼| ≤ 1, (11) then the inequality (5) holds for all joint PDFs 𝑃 𝐷𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑑, 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) where 𝐷 is independent of (𝑍 , 𝑍 ) . Proof of Lemma 3) See Appendix C.■ In fact, under the condition (11), given 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , the signal 𝑌 is a stochastically degraded version of 𝑌 . Also, we remark that the relation (11) is a sufficient condition under which (5) holds; however, in general the inequality (5) may not be equivalent to (11). It is essential to note that the condition (11) is not derived by evaluating (3) for Gaussian input distributions. Only for the case of 𝜇 = 1 , the condition (11) can be equivalently derived by evaluating (3) for Gaussian input distributions. Lemma 4.

Consider the Gaussian system in (10). If (11) holds, then the inequality (9) is satisfied for all joint PDFs 𝑃 𝐷𝑈𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑑, 𝑢, 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) where (𝐷, 𝑈) is independent of (𝑍 , 𝑍 ) . Proof of Lemma 4) The proof is in fact similar to Lemma 3. In essence, if the condition (11) holds, given 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , the signal 𝑌 is a stochastically degraded version of 𝑌 . Therefore, (9) is always satisfied for all joint PDFs 𝑃 𝐷𝑈𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑑, 𝑢, 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) . ■ Given these preliminaries, our main results are presented in the next section. IV. M AIN R ESULTS

In this section, first we establish a single-letter outer bound on the sum-rate capacity of the general multi-user ICs which satisfy certain less-noisy conditions. The proof of this outer bound includes a novel interesting technique requiring a sequential application of the Csiszar-Korner identity. We then obtain scenarios for which the derived outer bound is also achievable which yields the exact sum-rate capacity. Next, we present some generalizations of the ideas to derive other outer bounds which can be used to prove more capacity results.

Theorem 1.

Consider the K-user IC shown in Fig. 1. Assume that the channel transition probability function satisfies the following conditions:

First note that, according to Lemma 2, the conditions (12) can be extended as follows:

𝐼(𝑈; 𝑌 𝑖 |𝑋 𝑖 , … , 𝑋 𝐾 , 𝐷) ≤ 𝐼(𝑈; 𝑌 𝑖−1 |𝑋 𝑖 , … , 𝑋 𝐾 , 𝐷) , for all joint PDFs 𝑃 𝐷𝑈𝑋 𝑋 …𝑋 𝐾 , 𝑖 = 2, … , 𝐾 (14) Now consider a code of length 𝑛 for the network with the rates 𝑅 , 𝑅 , … , 𝑅 𝐾 for the users 𝑋 , 𝑋 , … , 𝑋 𝐾 , respectively. Using Fano’s inequality, for 𝑖 = 1, … , 𝐾 , we have: 𝑛𝑅 𝑖 ≤ 𝐼(𝑋 𝑖𝑛 ; 𝑌 𝑖𝑛 ) + 𝑛𝜖 𝑖,𝑛 ≤ (𝑎) 𝐼(𝑋 𝑖𝑛 ; 𝑌 𝑖𝑛 |𝑋 𝑖+1𝑛 , … , 𝑋 𝐾𝑛 ) + 𝑛𝜖 𝑖,𝑛 (15) where 𝜖 𝑖,𝑛 → 0 as 𝑛 → ∞ . Note that the inequality (a) in (15) holds because 𝑋 𝑖𝑛 is independent of 𝑋 𝑖+1𝑛 , … , 𝑋 𝐾𝑛 . By adding the two sides of (15) for 𝑖 = 1, … , 𝐾 , we obtain: 𝑛𝒞 𝑠𝑢𝑚 = 𝑛 ∑ 𝑅 𝑖𝐾𝑖=1 ≤ 𝐼(𝑋 ; 𝑌 |𝑋 , … , 𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) + ⋯ + 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1𝑛 |𝑋 𝐾𝑛 ) + 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾𝑛 ) + 𝑛𝜖 𝑛 (16) where 𝜖 𝑛 → 0 as 𝑛 → ∞ . The gist of the proof is to provide a single-letter characterization of the expression on the right side of (16). We derive this by a sequential application of Csiszar-Korner identity [20] as well as by utilizing the conditions (14). Consider the last two mutual information functions in (16). We have: { 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1𝑛 |𝑋 𝐾𝑛 ) = ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾𝑛 ) = ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 |𝑌 𝐾,𝑡+1𝑛 ) 𝑛𝑡=1 ≤ ∑ 𝐼(𝑋 𝐾𝑛 , 𝑌 𝐾,𝑡+1𝑛 ; 𝑌

𝐾,𝑡 ) 𝑛𝑡=1 (17) Please precisely observe the style of applying the chain rule in each of the equations (81). Now, from (81) we derive: 𝐼(𝑋

𝐾−1𝑛 ; 𝑌

𝐾−1𝑛 |𝑋 𝐾𝑛 ) + 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾𝑛 ) ≤ ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 , 𝑌 𝐾,𝑡+1𝑛 ; 𝑌

𝐾,𝑡 ) 𝑛𝑡=1 = ∑ 𝐼(𝑋 𝐾−1𝑛 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾,𝑡 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾−1𝑡−1 ; 𝑌

𝐾,𝑡 |𝑋 𝐾𝑛 , 𝑌 𝐾,𝑡+1𝑛 ) 𝑛𝑡=1 = ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾−1𝑡−1 ; 𝑌

𝐾,𝑡 |𝑋 𝐾𝑛 , 𝑌 𝐾,𝑡+1𝑛 ) 𝑛𝑡=1 = (𝑎) ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 (18) where the equality (a) holds because, according to Csiszar-Korner identity, the 𝑛𝑑 and the 𝑡ℎ ensembles on the left hand side of (a) are equal. Now consider the second ensemble on right side of (a) in (18). We claim that: ∑ 𝐼(𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 ≤ ∑ 𝐼(𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 (19)

6 To verify the inequality (19), first note that we have: { ∑ 𝐼(𝑌

𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 = ∑ 𝐼(𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾,𝑡 |𝑋 𝐾,𝑡 , 𝑋

𝐾𝑛\𝑡 ) 𝑛𝑡=1 ∑ 𝐼(𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 = ∑ 𝐼(𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾,𝑡 , 𝑋

𝐾𝑛\𝑡 ) 𝑛𝑡=1 (20) Considering (20), the inequality (19) is derived from the condition (14) for 𝑖 = 𝐾, 𝑈 ≡ (𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ) and 𝐷 ≡ 𝑋

𝐾𝑛\𝑡 . Now, by substituting (19) into (18), we obtain:

𝐼(𝑋

𝐾−1𝑛 ; 𝑌

𝐾−1𝑛 |𝑋 𝐾𝑛 ) + 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾𝑛 ) ≤ ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 = ∑ 𝐼(𝑋 𝐾−1𝑛 , 𝑌

𝐾−1𝑡−1 , 𝑌

𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 = ∑ 𝐼(𝑋 𝐾−1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 ) 𝑛𝑡=1 (21) Next, consider the following equality: 𝐼(𝑋

𝐾−2𝑛 ; 𝑌

𝐾−2𝑛 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) = ∑ 𝐼(𝑋 𝐾−2𝑛 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 (22) Note that the style of applying the chain rule in (22) is similar to the second relation in (17); it changes alternately among the mutual information functions in (16) . Now consider the ensemble in (22) and the first ensemble on the right side of the last equality of (21); we can write: ∑ 𝐼(𝑋 𝐾−2𝑛 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾−1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 = ∑ 𝐼(𝑋 𝐾−2𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾−1𝑛 , 𝑌

𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−2𝑛 , 𝑋

𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 = ∑ 𝐼(𝑋 𝐾−2𝑛 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 , 𝑌

𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−2𝑛 , 𝑋

𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 ) 𝑛𝑡=1 = (𝑎) ∑ 𝐼(𝑋 𝐾−2𝑛 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 , 𝑌

𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−2𝑛 , 𝑋

𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 (23) where equality (a) holds because, according to the Csiszar-Korner identity, the 𝑛𝑑 and the 𝑡ℎ ensembles on the left side of (a) are equal. Now consider the second ensemble on the right side of (a) in (87). We claim: ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 ≤ ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 (24) To prove this inequality, first note that we have: {∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 = ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾−1,𝑡 , 𝑋

𝐾,𝑡 , 𝑋

𝐾−1𝑛\𝑡 , 𝑋

𝐾𝑛\𝑡 ) 𝑛𝑡=1 ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 = ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1,𝑡 , 𝑋

𝐾,𝑡 , 𝑋

𝐾−1𝑛\𝑡 , 𝑋

𝐾𝑛\𝑡 ) 𝑛𝑡=1 (25)

7 By considering these equalities, (24) is derived from the condition (14) for 𝑖 = 𝐾 − 1, 𝑈 ≡ (𝑌

𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ) and 𝐷 ≡ (𝑋

𝐾−1𝑛\𝑡 , 𝑋

𝐾𝑛\𝑡 ) . By substituting (24) in (23), we obtain: ∑ 𝐼(𝑋 𝐾−2𝑛 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾−1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 ≤ ∑ 𝐼(𝑋 𝐾−2𝑛 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−1𝑡−1 , 𝑌

𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝐾−2,𝑡+1𝑛 , 𝑌

𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−2𝑛 , 𝑋

𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 = ∑ 𝐼(𝑋 𝐾−2𝑛 , 𝑌

𝐾−1𝑡−1 , 𝑌

𝐾−2,𝑡+1𝑛 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝐾−1𝑡−1 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−2𝑛 , 𝑋

𝐾−1𝑛 , 𝑋 𝐾𝑛 , 𝑌 𝐾−2,𝑡+1𝑛 ) 𝑛𝑡=1 = ∑ 𝐼(𝑋 𝐾−2𝑛 , 𝑌

𝐾−2,𝑡+1𝑛 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 (26) Therefore, by combining (21), (22) and (26), we have: 𝐼(𝑋

𝐾−2𝑛 ; 𝑌

𝐾−2𝑛 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) + 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1𝑛 |𝑋 𝐾𝑛 ) + 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾𝑛 ) ≤ ∑ 𝐼(𝑋 𝐾−2𝑛 , 𝑌

𝐾−2,𝑡+1𝑛 ; 𝑌

𝐾−2,𝑡 |𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 ) 𝑛𝑡=1 (27) This procedure can be followed sequentially to manipulate other mutual information functions in (16). At last, we derive: 𝐼(𝑋 ; 𝑌 |𝑋 , … , 𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) + ⋯ + 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1𝑛 |𝑋 𝐾𝑛 ) + 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾𝑛 ) ≤ 𝛯 (28) where:  If 𝐾 is even, 𝛯 is given by: 𝛯 = ∑ 𝐼(𝑋 , 𝑌 ; 𝑌 |𝑋 , … , 𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ⋯ + ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 ) 𝑛𝑡=1 (29)  If 𝐾 is odd, 𝛯 is given by: 𝛯 = ∑ 𝐼(𝑋 , 𝑌 ; 𝑌 |𝑋 , … , 𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ⋯ + ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 ) 𝑛𝑡=1 (30) The expressions (29) and (30) both are in fact identical and equal to: 𝛯 = ∑ 𝐼(𝑋 ; 𝑌 |𝑋 , … , 𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ⋯ + ∑ 𝐼(𝑋 𝐾−1𝑛 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾𝑛 ; 𝑌 𝐾,𝑡 ) 𝑛𝑡=1 (31) This is because: ∑ 𝐼(𝑌 ; 𝑌 |𝑋 , 𝑋 , … , 𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 = ∑ 𝐼(𝑌 ; 𝑌 |𝑋 , 𝑋 , … , 𝑋 𝐾−1𝑛 , 𝑋 𝐾𝑛 ) 𝑛𝑡=1 = 0 (32) Moreover, since the network is memoryless, the following factorization is induced on the code: 𝑃(𝑥 , … , 𝑥 𝐾𝑛 , 𝑦 , … , 𝑦 𝐾𝑛 ) = ∏ 𝑃(𝑥 𝑖𝑛 ) 𝐾𝑖=1 ∏ ℙ ( 𝑦 , 𝑦 , … , 𝑦 𝐾,𝑡 | 𝑥 , 𝑥 , … , 𝑥 𝐾,𝑡 ) 𝑛𝑡=1 (33) Based on the factorization (33), one can easily show that for any 𝛺 ⊆ {1, … , 𝐾} and any 𝑖 ∈ {1, … , 𝐾} , the following Markov relations hold: {𝑋 𝑗𝑛\𝑡 } 𝑗∈𝛺 → {𝑋 𝑗,𝑡 } 𝑗∈𝛺 → 𝑌 𝑖,𝑡 , 𝑡 = 1, … , 𝑛 (34) Therefore, (31) can be simplified in the following single-letter form: 𝛯 ≜ ∑ 𝐼(𝑋 ; 𝑌 |𝑋 , … , 𝑋 𝐾−1,𝑡 , 𝑋

𝐾,𝑡 ) 𝑛𝑡=1 + ⋯ + ∑ 𝐼(𝑋 𝐾−1,𝑡 ; 𝑌

𝐾−1,𝑡 |𝑋 𝐾,𝑡 ) 𝑛𝑡=1 + ∑ 𝐼(𝑋 𝐾,𝑡 ; 𝑌

𝐾,𝑡 ) 𝑛𝑡=1 (35) Finally, by applying a standard time-sharing argument, we derive the desired outer bound as given in (13). ■ In the next theorem, we show that the outer bound derived in Theorem 1 is indeed optimal for a class of ICs with a sequence of less noisy receivers.

Theorem 2.

Consider the K-user IC given in Fig. 1. Assume that the channel satisfies the following less noisy conditions:

𝐼(𝑈; 𝑌 𝑖 |𝑋 𝑖+1 , 𝑋 𝑖+2 , … , 𝑋 𝐾 ) ≤ min{𝐼(𝑈; 𝑌 |𝑋 𝑖+1 , 𝑋 𝑖+2 , … , 𝑋 𝐾 ), 𝐼(𝑈; 𝑌 |𝑋 𝑖+1 , 𝑋 𝑖+2 , … , 𝑋 𝐾 ), … , 𝐼(𝑈; 𝑌 𝑖−1 |𝑋 𝑖+1 , 𝑋 𝑖+2 , … , 𝑋 𝐾 )} (36) for all joint PDFs 𝑃 𝑈𝑋 𝑋 …𝑋 𝑖 𝑃 𝑋 𝑖+1 … 𝑃 𝑋 𝐾 , 𝑖 = 1, … , 𝐾 . Then, the sum-rate capacity of the network is given by (13). Proof of Theorem 2) First note that according to Lemma 2, the conditions (36) can be extended as follows:

𝐼(𝑈; 𝑌 𝑖 |𝑋 𝑖+1 , 𝑋 𝑖+2 , … , 𝑋 𝐾 , 𝐷) ≤ min{𝐼(𝑈; 𝑌 |𝑋 𝑖+1 , 𝑋 𝑖+2 , … , 𝑋 𝐾 , 𝐷), 𝐼(𝑈; 𝑌 |𝑋 𝑖+1 , 𝑋 𝑖+2 , … , 𝑋 𝐾 , 𝐷), … , 𝐼(𝑈; 𝑌 𝑖−1 |𝑋 𝑖+1 , 𝑋 𝑖+2 , … , 𝑋 𝐾 , 𝐷)} (37) for all joint PDFs 𝑃 𝐷𝑈𝑋 𝑋 …𝑋 𝐾 , 𝑖 = 1, … , 𝐾 . Now, for each 𝑖 , if we set 𝐷 ≡ 𝑋 𝑖 in (37), we deduce that (37) implies (12). Therefore, if (36) holds, then the conditions of Theorem 1 are satisfied and thereby (13) constitutes an outer bound on the sum-rate capacity. It remains to prove that (13) is achievable as well. Consider the following simple successive decoding scheme. All messages are simply encoded at the transmitters similar to a multiple access channel (let the parameter 𝑄 be a time-sharing variable). The receiver 𝑌 𝐾 decodes its own message and treats other signals as noise. The rate cost due to this step is: 𝐼(𝑋 𝐾 ; 𝑌 𝐾 |𝑄) At the receiver 𝑌 𝐾−1 , first the message corresponding to the receiver 𝑌 𝐾 is decoded. This step does not introduce any new rate cost. To see this consider (37) for 𝑖 = 𝐾 , 𝑈 ≡ 𝑋 𝐾 , and 𝐷 ≡ 𝑄 . We obtain:

𝐼(𝑋 𝐾 ; 𝑌 𝐾 |𝑄) ≤ min{𝐼(𝑋 𝐾 ; 𝑌 |𝑄), 𝐼(𝑋 𝐾 ; 𝑌 |𝑄), … , 𝐼(𝑋 𝐾 ; 𝑌 𝐾−1 |𝑄)} ≤ 𝐼(𝑋 𝐾 ; 𝑌 𝐾−1 |𝑄)

The receiver 𝑌 𝐾−1 next deocodes its own message. The rate cost due to this step is:

𝐼(𝑋

𝐾−1 ; 𝑌

𝐾−1 |𝑋 𝐾 , 𝑄) This decoding scheme is repeated at the other receivers similarly where the receiver 𝑌 𝑖 successively decodes all the messages corresponding to the receivers 𝑌 𝑗 with 𝑖 ≤ 𝑗 . Considering the conditions (37), one can easily see that by this scheme a sum-rate equal to (13) is achieved. The proof is thus complete. ■ Consider the degraded interference network for which 𝑋 , 𝑋 , … , 𝑋 𝐾 → 𝑌 → 𝑌 → ⋯ → 𝑌 𝐾 forms a Markov chain. For this network, all the less-noisy conditions in (36) are satisfied. Thus, Theorem 2 provides an alternative proof for a partial result of [18]. Note that if the matrix gain of the Gaussian network (2) is of rank one, it would be degraded and therefore its sum-rate capacity can be derived from Theorem 2. In what follows, we present some interesting generalizations of the outer bound derived in Theorem 1. Accordingly, we obtain new bounds which can used to prove capacity results for many other scenarios. To present the main result, we need to the following lemma. Lemma 5.

Consider the general interference channel in Fig. 1. Let 𝑌 𝑎 and 𝑌 𝑏 be two arbitrary receivers. Also, for any arbitrary 𝛺 , 𝛺 ⊆{1, … , 𝐾} , let {𝑋 𝑙 } 𝑙∈𝛺 and {𝑋 𝑙 } 𝑙∈𝛺 be two subset of input signals. Assume that the following condition holds: 𝐼(𝑈, {𝑋 𝑙 } 𝑙∈𝛺 ; 𝑌 𝑎 |{𝑋 𝑙 } 𝑙∈𝛺 ) ≤ 𝐼(𝑈, {𝑋 𝑙 } 𝑙∈𝛺 ; 𝑌 𝑏 |{𝑋 𝑙 } 𝑙∈𝛺 ) (38) for all joint PDFs 𝑃 𝑈,{𝑋 𝑖 } 𝑖=1𝐾 −{{𝑋 𝑙 } 𝑙∈𝛺2 } ∏ 𝑃 𝑋 𝑖 𝑋 𝑖 ∈{𝑋 𝑙 } 𝑙∈𝛺2 . Then, given any code of length 𝑛 for the network, we have: 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑎𝑛 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 ) ≤ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑏𝑛 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 ) (39) Proof of Lemma 5)

First note that the condition (38) can be extended to:

𝐼(𝑈, {𝑋 𝑙 } 𝑙∈𝛺 ; 𝑌 𝑎 |{𝑋 𝑙 } 𝑙∈𝛺 , 𝐷) ≤ 𝐼(𝑈, {𝑋 𝑙 } 𝑙∈𝛺 ; 𝑌 𝑏 |{𝑋 𝑙 } 𝑙∈𝛺 , 𝐷) , for all joint PDFs 𝑃 𝐷𝑈𝑋 𝑋 …𝑋 𝐾 (40) This can be derived by following the same arguments as the proof of Lemma 2. Now, consider a code of length 𝑛 for the network. The proof indeed involves interesting computations including subtle applications of Csiszar-Korner identity. We have: 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑎𝑛 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 ) − 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑏𝑛 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 ) = ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑎,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 ) 𝑛𝑡=1 − ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑏,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 = ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑏,𝑡+1𝑛 ; 𝑌 𝑎,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 ) 𝑛𝑡=1 − ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 ; 𝑌 𝑏,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝑏,𝑡+1𝑛 ; 𝑌 𝑎,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , {𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝑎𝑡−1 ; 𝑌 𝑏,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , {𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 = (𝑎) ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑏,𝑡+1𝑛 ; 𝑌 𝑎,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 ) 𝑛𝑡=1 − ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 ; 𝑌 𝑏,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 = ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑎,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 + ∑ 𝐼(𝑌 𝑏,𝑡+1𝑛 ; 𝑌 𝑎,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 ) 𝑛𝑡=1 − ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑏,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 − ∑ 𝐼(𝑌 𝑎𝑡−1 ; 𝑌 𝑏,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 = (𝑏) ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑎,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 − ∑ 𝐼({𝑋 𝑙𝑛 } 𝑙∈𝛺 ; 𝑌 𝑏,𝑡 |{𝑋 𝑙𝑛 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 = ∑ 𝐼 ({𝑋 𝑙𝑛\𝑡 } 𝑙∈𝛺 , {𝑋 𝑙,𝑡 } 𝑙∈𝛺 ; 𝑌 𝑎,𝑡 |{𝑋 𝑙,𝑡 } 𝑙∈𝛺 , {𝑋 𝑙𝑛\𝑡 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 − ∑ 𝐼 ({𝑋 𝑙𝑛\𝑡 } 𝑙∈𝛺 , {𝑋 𝑙,𝑡 } 𝑙∈𝛺 ; 𝑌 𝑏,𝑡 |{𝑋 𝑙,𝑡 } 𝑙∈𝛺 , {𝑋 𝑙𝑛\𝑡 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 , 𝑌 𝑏,𝑡+1𝑛 ) 𝑛𝑡=1 ≤ (𝑐) (41) where equality (a) holds because, by Csiszar-Korner identity, the third and the fourth ensembles on the left side of (a) are equal; also, equality (b) holds because, again by Csiszar-Korner identity, the second and the fourth ensembles on the left side of (b) are equal; lastly, inequality (c) is derived from (40) by substituting 𝑈 ≡ {𝑋 𝑙𝑛\𝑡 } 𝑙∈𝛺 , and 𝐷 ≡ ({𝑋 𝑙𝑛\𝑡 } 𝑙∈𝛺 , 𝑌 𝑎𝑡−1 , 𝑌 𝑏,𝑡+1𝑛 ) . The proof is thus complete. ■ Remark 1.

Consider the condition (38). If 𝛺 = {1, … , 𝐾} − 𝛺 , then the auxiliary random variable 𝑈 in (38) is dropped. In other words, it is reduced as follows: 𝐼({𝑋 𝑙 } 𝑙∈𝛺 ; 𝑌 𝑎 |{𝑋 𝑙 } 𝑙∈𝛺 ) ≤ 𝐼({𝑋 𝑙 } 𝑙∈𝛺 ; 𝑌 𝑏 |{𝑋 𝑙 } 𝑙∈𝛺 ) (42) for all joint PDFs ∏ 𝑃 𝑋 𝑖 𝑖=1,…,𝐾 . This is because 𝑈 → {𝑋 , 𝑋 , … , 𝑋 𝐾 } → 𝑌 𝑎 , 𝑌 𝑏 forms a Markov chain. Now, using Lemma 5, we derive the following generalization of Theorem 1. Theorem 3.

Consider the general K-user interference channel in Fig. 1. Let 𝜆(. ) be a permutation of the elements of the set {1, … , 𝐾} . Let also 𝑖 , 𝑖 , … , 𝑖 𝜇 be elements of the set {1, … , 𝐾} with: < 𝑖 < ⋯ < 𝑖 𝜇−1 < 𝑖 𝜇 = 𝐾, (43) where 𝜇 is an arbitrary natural number less than or equal to 𝐾 . Define: 𝕏 𝑖 𝜃 ≜ ⋃ {𝑋 𝜆(𝑙) } 𝑖 𝜃−1 +1≤𝑙≤𝑖 𝜃 , 𝜃 = 1, … , 𝜇, (𝑖 ≜ 0) (44) Assume that the network transition probability function satisfies the following conditions:  For 𝜔 = 1, … , 𝑖 − 1 ⟹ 𝐼 ({𝑋 𝜆(𝑙) } 𝑙≤𝜔 ; 𝑌 𝜆(𝜔) |{𝑋 𝜆(𝑙) } 𝜔+1≤𝑙 ) ≤ 𝐼 ({𝑋 𝜆(𝑙) } 𝑙≤𝜔 ; 𝑌 𝜆(𝜔+1) |{𝑋 𝜆(𝑙) } 𝜔+1≤𝑙 ) for all joint PDFs 𝑃 {𝑋 𝜆(𝑙) } 𝑙≤𝜔 ∏ 𝑃 𝑋 𝑗 𝑗 ∈ {𝜆(𝑙)} 𝜔+1≤𝑙 , (45)  For 𝜃 = 1, … , 𝜇 − 1, 𝜔 = 1, … , 𝑖 𝜃+1 − 𝑖 𝜃 − 1 ⟹ 𝐼 (𝑈, {𝑋 𝜆(𝑙) } 𝑖 𝜃 +1≤𝑙≤𝑖 𝜃 +𝜔 ; 𝑌 𝜆(𝑖 𝜃 +𝜔) |{𝑋 𝜆(𝑙) } 𝑖 𝜃 +𝜔+1≤𝑙 ) ≤ 𝐼 (𝑈, {𝑋 𝜆(𝑙) } 𝑖 𝜃 +1≤𝑙≤𝑖 𝜃 +𝜔 ; 𝑌 𝜆(𝑖 𝜃 +𝜔+1) |{𝑋 𝜆(𝑙) } 𝑖 𝜃 +𝜔+1≤𝑙 ) for all joint PDFs 𝑃 𝑈,{𝑋 𝜆(𝑙) } 𝑙≤𝑖𝜃+𝜔 ∏ 𝑃 𝑋 𝑗 𝑗 ∈ {𝜆(𝑙)} 𝑖𝜃+𝜔+1≤𝑙 , (46)  For 𝜃 = 2, … , 𝜇 ⟹ 𝐼(𝑈; 𝑌 𝜆(𝑖 𝜃 ) | ⋃ 𝕏 𝑖 𝛼 𝜇𝛼=𝜃 ) ≤ 𝐼(𝑈; 𝑌 𝜆(𝑖 𝜃−1 ) | ⋃ 𝕏 𝑖 𝛼 𝜇𝛼=𝜃 ) for all joint PDFs 𝑃 𝑈,{𝑋 𝑗 } 𝑗=1𝐾 −⋃ 𝕏 𝑖𝛼𝜇𝛼=𝜃 ∏ 𝑃 𝑋 𝑗 𝑋 𝑗 ∈⋃ 𝕏 𝑖𝛼𝜇𝛼=𝜃 , (47) Then the sum-rate capacity is bounded-above by: 𝒞 𝑠𝑢𝑚 ≤ max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 …𝑃 𝑋𝐾|𝑄 (𝐼 (𝕏 𝑖 ; 𝑌 𝜆(𝑖 ) |𝕏 𝑖 , … , 𝕏 𝑖 𝜇−1 , 𝕏 𝑖 𝜇 , 𝑄) + 𝐼 (𝕏 𝑖 ; 𝑌 𝜆(𝑖 ) |𝕏 𝑖 , … , 𝕏 𝑖 𝜇−1 , 𝕏 𝑖 𝜇 , 𝑄) + ⋯ + 𝐼 (𝕏 𝑖 𝜇 ; 𝑌 𝜆(𝑖 𝜇 ) |𝑄)) (48) Proof of Theorem 3)

Consider a code of length 𝑛 for the network. Similar to (16), one can readily derive: 𝑛𝒞 𝑠𝑢𝑚 ≤ 𝐼 (𝑋 𝜆(1)𝑛 ; 𝑌 𝜆(1)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + 𝐼 (𝑋 𝜆(2)𝑛 ; 𝑌 𝜆(2)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) +𝐼 (𝑋 𝜆(𝑖 +1)𝑛 ; 𝑌 𝜆(𝑖 +1)𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +2≤𝑙 ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) +𝐼 (𝑋 𝜆(𝑖 +1)𝑛 ; 𝑌 𝜆(𝑖 +1)𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +2≤𝑙 ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) ⋮ +𝐼 (𝑋 𝜆(𝑖 𝜇−1 +1)𝑛 ; 𝑌 𝜆(𝑖 𝜇−1 +1)𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 𝜇−1 +2≤𝑙 ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 𝜇 )𝑛 ; 𝑌 𝜆(𝑖 𝜇 )𝑛 ) +𝑛𝜖 𝑛 (49)

11 where 𝜖 𝑛 → 0 as 𝑛 → 0 . Now consider the first row of (49). The conditions (45), according to Lemma 5 (see also Remark 1), imply that: 𝐼 (𝑋 𝜆(1)𝑛 ; 𝑌 𝜆(1)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + 𝐼 (𝑋 𝜆(2)𝑛 ; 𝑌 𝜆(2)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) ≤ 𝐼 (𝑋 𝜆(1)𝑛 ; 𝑌 𝜆(2)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + 𝐼 (𝑋 𝜆(2)𝑛 ; 𝑌 𝜆(2)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + 𝐼 (𝑋 𝜆(3)𝑛 ; 𝑌 𝜆(3)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) = 𝐼 (𝑋 𝜆(1)𝑛 , 𝑋 𝜆(2)𝑛 ; 𝑌 𝜆(2)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + 𝐼 (𝑋 𝜆(3)𝑛 ; 𝑌 𝜆(3)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) ≤ 𝐼 (𝑋 𝜆(1)𝑛 , 𝑋 𝜆(2)𝑛 ; 𝑌 𝜆(3)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + 𝐼 (𝑋 𝜆(3)𝑛 ; 𝑌 𝜆(3)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) = 𝐼 (𝑋 𝜆(1)𝑛 , 𝑋 𝜆(2)𝑛 , 𝑋 𝜆(3)𝑛 ; 𝑌 𝜆(3)𝑛 |{𝑋 𝜆(𝑙)𝑛 } ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) ⋮ ≤ 𝐼 (𝑋 𝜆(1)𝑛 , 𝑋 𝜆(2)𝑛 , … , 𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) = 𝐼 (𝕏 𝑖 𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |𝕏 𝑖 𝑛 , … , 𝕏 𝑖 𝜇−1 𝑛 , 𝕏 𝑖 𝜇 𝑛 ) (50) Similarly, for the other rows in (49), according to Lemma 5, the conditions (46) imply that: 𝐼 (𝑋 𝜆(𝑖 +1)𝑛 ; 𝑌 𝜆(𝑖 +1)𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +2≤𝑙 ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 )𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 +1≤𝑙 ) ≤ 𝐼 (𝕏 𝑖 𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |𝕏 𝑖 𝑛 , … , 𝕏 𝑖 𝜇−1 𝑛 , 𝕏 𝑖 𝜇 𝑛 ) ⋮ 𝐼 (𝑋 𝜆(𝑖 𝜇−1 +1)𝑛 ; 𝑌 𝜆(𝑖 𝜇−1 +1)𝑛 |{𝑋 𝜆(𝑙)𝑛 } 𝑖 𝜇−1 +2≤𝑙 ) + ⋯ + 𝐼 (𝑋 𝜆(𝑖 𝜇 )𝑛 ; 𝑌 𝜆(𝑖 𝜇 )𝑛 ) ≤ 𝐼 (𝕏 𝑖 𝜇 𝑛 ; 𝑌 𝜆(𝑖 𝜇 )𝑛 ) (51) Thus, by substituting (50)-(51) in (49), we obtain: 𝑛𝒞 𝑠𝑢𝑚 ≤ 𝐼 (𝕏 𝑖 𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |𝕏 𝑖 𝑛 , … , 𝕏 𝑖 𝜇−1 𝑛 , 𝕏 𝑖 𝜇 𝑛 ) + 𝐼 (𝕏 𝑖 𝑛 ; 𝑌 𝜆(𝑖 )𝑛 |𝕏 𝑖 𝑛 , … , 𝕏 𝑖 𝜇−1 𝑛 , 𝕏 𝑖 𝜇 𝑛 ) + ⋯ + 𝐼 (𝕏 𝑖 𝜇 𝑛 ; 𝑌 𝜆(𝑖 𝜇 )𝑛 ) + 𝑛𝜖 𝑛 (52) Finally, if the conditions (47) hold, by following the same lines as (16)-(35), one can derive the single-letter outer bound given in (48). The proof is thus complete. ■ Remark 2.

Let consider the two-user IC. By setting 𝜆(1) = 2 , 𝜆(2) = 1 , 𝜇 = 1 , 𝑖 = 2 , in the conditions of Theorem 3, we obtain that if the following holds: 𝐼(𝑋 ; 𝑌 |𝑋 ) ≤ 𝐼(𝑋 ; 𝑌 |𝑋 ) for all PDFs 𝑃 𝑋 𝑃 𝑋 (53) then, the sum-rate capacity is bounded by: 𝒞 𝑠𝑢𝑚 ≤ max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄

𝐼(𝑋 , 𝑋 ; 𝑌 |𝑄) (54) Also, by setting 𝜆(1) = 1 , 𝜆(2) = 2 , 𝜇 = 2 , 𝑖 = 1 , 𝑖 = 2 , we obtain that if: 𝐼(𝑈; 𝑌 |𝑋 ) ≤ 𝐼(𝑈; 𝑌 |𝑋 ) for all joint PDFs 𝑃 𝑈𝑋 𝑃 𝑋 (55) then, the sum-rate capacity is bounded by: 𝒞 𝑠𝑢𝑚 ≤ max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 (𝐼(𝑋 ; 𝑌 |𝑋 , 𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄)) (56) Therefore, one can deduce that if both the conditions (53) and (55) hold simultaneously, then the following is an outer bound on the sum-rate capacity: max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 min ( 𝐼(𝑋 , 𝑋 ; 𝑌 |𝑄)𝐼(𝑋 ; 𝑌 |𝑋 , 𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄)) (57) On the other hand, by a simple successive decoding scheme, one can achieve a sum-rate equal to (57). Thereby, (57) is the sum capacity of the channel under the conditions (53) and (55). These conditions in fact represent the mixed interference regime identified in [10, Th. 6] for the two-user IC. Thus, the result of [10, Th. 6] is indeed recovered by Theorem 3 here. Next, we demonstrate how one can derive new sum-rate capacity results for many different interference networks using the outer bound established in Theorem 3. Specifically, we consider the three-user IC. The Gaussian channel is formulated as: {𝑌 = 𝑋 + 𝑎 𝑋 + 𝑎 𝑋 + 𝑍 𝑌 = 𝑎 𝑋 + 𝑋 + 𝑎 𝑋 + 𝑍 𝑌 = 𝑎 𝑋 + 𝑎 𝑋 + 𝑋 + 𝑍 (58) where 𝑍 , 𝑍 , and 𝑍 are zero-mean unit-variance Gaussian random variables and the inputs are subject to power constraints 𝔼[𝑋 𝑖2 ] ≤𝑃 𝑖 , 𝑖 = 1,2,3 . First let us present the sum-rate achievable by the successive decoding scheme. It is given below: max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 𝑃 𝑋3|𝑄 min { 𝐼(𝑋 ; 𝑌 |𝑋 , 𝑋 , 𝑄) + 𝐼(𝑋 : 𝑌 |𝑋 , 𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄),𝐼(𝑋 ; 𝑌 |𝑋 , 𝑋 , 𝑄) + 𝐼(𝑋 , 𝑋 : 𝑌 |𝑄), 𝐼(𝑋 ; 𝑌 |𝑋 , 𝑋 , 𝑄) + 𝐼(𝑋 : 𝑌 |𝑋 , 𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄),𝐼(𝑋 , 𝑋 ; 𝑌 |𝑋 , 𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄), 𝐼(𝑋 , 𝑋 ; 𝑌 |𝑋 , 𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄), 𝐼(𝑋 , 𝑋 , 𝑋 ; 𝑌 |𝑄) } (59) To achieve this sum-rate:  The receiver 𝑌 decodes only its corresponding signal 𝑋 .  The receiver 𝑌 decodes the signal 𝑋 first and then decodes its corresponding signal 𝑋 .  The receiver 𝑌 decodes the signal 𝑋 first, then the signal 𝑋 , and lastly its corresponding signal 𝑋 . As we see from (59), the sum-rate expression due to this achievability scheme is described by six constraints. We intend to explore conditions under which this achievable sum-rate is optimal. Note that some of the constraints in (59) do not have a structure similar to the expression of the outer bound (48); for example, the third and the fifth ones. Therefore, we need to impose appropriate conditions on the network probability function so that such constraints can be relaxed. Let the network satisfies the following conditions: {𝐼(𝑋 : 𝑌 |𝑋 , 𝑄) ≥ 𝐼(𝑋 : 𝑌 |𝑋 , 𝑄)𝐼(𝑋 ; 𝑌 |𝑄) ≥ 𝐼(𝑋 ; 𝑌 |𝑄) (60) for all joint PDFs which are a solution to the following maximization: max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 𝑃 𝑋3|𝑄 min (𝐼(𝑋 , 𝑋 ; 𝑌 |𝑋 , 𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄),𝐼(𝑋 , 𝑋 , 𝑋 ; 𝑌 |𝑄) ) (61) In this case, the achievable sum-rate (59) is reduced to (61). In other words, the conditions (60) enable us to relax those constraints of (59) which are not given in (61). Now consider the outer bound (48) specialized for the three-user IC. By setting 𝜆(1) = 2 , 𝜆(2) = 1 , 𝜆(3) = 3 , 𝜇 = 2 , 𝑖 = 2 , and 𝑖 = 3 , we obtain that if the following conditions hold: {𝐼(𝑋 ; 𝑌 |𝑋 , 𝑋 ) ≤ 𝐼(𝑋 ; 𝑌 |𝑋 , 𝑋 ) for all joint PDFs 𝑃 𝑋 𝑃 𝑋 𝑃 𝑋 𝐼(𝑈; 𝑌 |𝑋 ) ≤ 𝐼(𝑈; 𝑌 |𝑋 ) for all joint PDFs 𝑃 𝑈𝑋 𝑋 𝑃 𝑋 (62) then, the sum-rate capacity is bounded above by: 𝒞 𝑠𝑢𝑚 ≤ max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 𝑃 𝑋3|𝑄 (𝐼(𝑋 , 𝑋 ; 𝑌 |𝑋 , 𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄)) (63) Also, by setting 𝜆(1) = 3, 𝜆(2) = 2, 𝜆(3) = 1 , 𝜇 = 1 , and 𝑖 = 3 , we obtain that if the following conditions hold: {𝐼(𝑋 ; 𝑌 |𝑋 , 𝑋 ) ≤ 𝐼(𝑋 ; 𝑌 |𝑋 , 𝑋 ) for all joint PDFs 𝑃 𝑋 𝑃 𝑋 𝑃 𝑋 𝐼(𝑋 , 𝑋 ; 𝑌 |𝑋 ) ≤ 𝐼(𝑋 , 𝑋 ; 𝑌 |𝑋 ) for all joint PDFs 𝑃 𝑋 𝑃 𝑋 𝑋 (64) then, the sum-rate capacity is bounded above as: 𝒞 𝑠𝑢𝑚 ≤ max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 𝑃 𝑋3|𝑄 (𝐼(𝑋 , 𝑋 , 𝑋 ; 𝑌 |𝑄)) (65) Thus, if the collection of the conditions (60), (62) and (64) are satisfied, the sum-rate capacity of the network is given by (61). This capacity result could not be obtained using the outer bound (13) derived in Theorem 1. Let us now consider the Gaussian channel given in (58). Lemmas 3 and 4 imply that if the channel gains satisfy: {|𝑎 | ≥ 1, |𝑎 | ≤ 1, |𝑎 | ≥ 1 𝑎 = 𝑎 𝑎 , 𝑎 = 𝑎 𝑎 , (66) then (62) and (64) are also satisfied. Note that, according to Corollary 1, the second inequality of (64) implies the first inequality of (62). For a Gaussian channel satisfying (66), using the entropy power inequality, one can prove that Gaussian input distributions without time-sharing ( 𝑄 ≡ ∅ ) is the solution to the maximization (61). Hence, if the inequalities (60) hold for Gaussian distributions, then the sum-rate capacity is given by (61). Considering (66), it is readily derived that both inequalities (60) hold for Gaussian distributions provided that: 𝑃 + 1 ≥ 𝑎 (𝑎 𝑃 + 1) (67) Thus, for a three-user Gaussian IC (58) with the conditions (66)-(67), the sum-rate capacity is given as follows: min (𝜓(𝑃 + 𝑎 𝑃 ) + 𝜓 ( 𝑃 𝑎 𝑃 + 𝑎 𝑃 + 1 ) 𝜓(𝑃 + 𝑎 𝑃 + 𝑎 𝑃 ) ) (68) where 𝜓(𝑥) ∶= log(1 + 𝑥) . This sum-rate capacity is achieved by successive decoding scheme. It should be remarked that by applying more efficient coding strategies such as joint decoding scheme (or combination of both successive and joint decoding schemes), one can achieve further capacity results for the ICs. Indeed, our approach could be followed to obtain sum-rate capacity for many other network topologies. A second generalization of the result of Theorem 1 is given below. Theorem 4.

Consider the general K-user IC shown in Fig. 1. Let 𝒀 𝐺(1) , 𝒀

𝐺(2) , … , 𝒀

𝐺(𝜇) be nonempty subsets of the set of the outputs {𝑌 , … , 𝑌 𝐾 } so that: {𝒀 𝐺(𝑖 ) ⋂𝒀 𝐺(𝑖 ) ≠ ∅ ⋃ 𝒀 𝐺(𝑖)𝜇𝑖=1 = {𝑌 , … , 𝑌 𝐾 } (69) i.e., the collection 𝒀 𝐺(1) , 𝒀

𝐺(2) , … , 𝒀

𝐺(𝜇) constitutes a nonempty partitioning for {𝑌 , … , 𝑌 𝐾 } . Define: 𝕏 𝐺(𝑖) ≜ ⋃ {𝑋 𝑙 } 𝑙: 𝑌 𝑙 ∈𝒀 𝐺(𝑖) , 𝑖 = 1, … , 𝜇 (70)

Assume that the network transition probability function satisfies the following conditions:

𝐼(𝑈; 𝒀

𝐺(𝑖) |𝕏 𝐺(𝑖) , 𝕏

𝐺(𝑖+1) , … , 𝕏

𝐺(𝜇) ) ≤ 𝐼(𝑈; 𝒀

𝐺(𝑖−1) |𝕏 𝐺(𝑖) , 𝕏

𝐺(𝑖+1) , … , 𝕏

𝐺(𝜇) ) , for all joint PDFs 𝑃 𝑈,{𝑋 𝑗 } 𝑗=1𝐾 −⋃ 𝕏 𝐺(𝛼)𝜇𝛼=𝑖 ∏ 𝑃 𝑋 𝑗 𝑋 𝑗 ∈⋃ 𝕏 𝐺(𝛼)𝜇𝛼=𝑖 , 𝑖 = 2, … , 𝜇 (71) Then the sum-rate capacity of the network is bounded above as: 𝒞 𝑠𝑢𝑚 ≤ max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 …𝑃 𝑋𝐾|𝑄 (𝐼(𝕏

𝐺(1) ; 𝒀

𝐺(1) |𝕏 𝐺(2) , … , 𝕏

𝐺(𝜇−1) , 𝕏

𝐺(𝜇) , 𝑄) + ⋯ + 𝐼(𝕏

𝐺(𝜇−1) ; 𝒀

𝐺(𝜇−1) |𝕏 𝐺(𝜇) , 𝑄) + 𝐼(𝕏

𝐺(𝜇) ; 𝒀

𝐺(𝜇) |𝑄)) (72)

Proof of Theorem 4)

Consider a code of length 𝑛 for the network with the rates 𝑅 , 𝑅 , … , 𝑅 𝐾 for the users 𝑋 , 𝑋 , … , 𝑋 𝐾 , respectively. Given the definition (70), using Fano’s inequality, one can derive: 𝑛 ∑ 𝑅 𝑙𝑙∶𝑌 𝑙 ∈𝒀 𝐺(𝑖) ≤ 𝐼(𝕏

𝐺(𝑖)𝑛 ; 𝒀

𝐺(𝑖)𝑛 ) + 𝑛𝜖

𝐺(𝑖),𝑛 ≤ (𝑎) 𝐼(𝕏

𝐺(𝑖)𝑛 ; 𝒀

𝐺(𝑖)𝑛 |𝕏 𝐺(𝑖+1)𝑛 ⋃ … ⋃𝕏

𝐺(𝜇)𝑛 ) + 𝑛𝜖

𝐺(𝑖),𝑛 (73) where 𝜖 𝐺(𝑖),𝑛 → 0 as 𝑛 → ∞ . Note that inequality (a) in (73) holds because the signals of the set 𝕏 𝐺(𝑖)𝑛 are independent of those in 𝕏 𝐺(𝑖+1)𝑛 ⋃ … ⋃𝕏

𝐺(𝜇)𝑛 . By adding the two sides of (73) for 𝑖 = 1, … , 𝜇 , we obtain: 𝑛𝒞 𝑠𝑢𝑚 = 𝑛 (∑ ∑ 𝑅 𝑙𝑙∶𝑌 𝑙 ∈𝒀 𝐺(𝑖) 𝜇𝑖=1 ) ≤ 𝐼(𝕏 𝐺(1)𝑛 ; 𝒀

𝐺(1)𝑛 |𝕏 𝐺(2)𝑛 , … , 𝕏

𝐺(𝜇−1)𝑛 , 𝕏

𝐺(𝜇)𝑛 ) + ⋯ + 𝐼(𝕏

𝐺(𝜇−1)𝑛 ; 𝒀

𝐺(𝜇−1)𝑛 |𝕏 𝐺(𝜇)𝑛 ) + 𝐼(𝕏

𝐺(𝜇)𝑛 ; 𝒀

𝐺(𝜇)𝑛 ) + 𝑛𝜖 𝑛 (74) where 𝜖 𝑛 → 0 as 𝑛 → ∞ . Now, if the conditions (71) hold, by following the same lines as (16)-(35), one can derive the single-letter outer bound given in (72). The proof is thus complete. ■ Remark 3.

It is clear that by setting 𝜇 = 𝐾 and 𝒀 𝐺(𝑖) = {𝑌 𝑖 }, 𝑖 = 1, … , 𝐾 , the conditions (71) and the outer bound (72) are reduced to (12) and (13), respectively. In fact, the generalized bound of Theorem 4 could be simply deduced by considering a virtual interference network with the outputs 𝒀 𝐺(1) , 𝒀

𝐺(2) , … , 𝒀

𝐺(𝜇) and the corresponding inputs 𝕏 𝐺(1) , 𝕏

𝐺(2) , … , 𝕏

𝐺(𝜇) and then applying the result of Theorem 1. The outer bound given in Theorem 4 can be used to prove explicit sum-rate capacity results which are not necessarily derived from the bounds of Theorems 1 and 3. We conclude the paper by providing an example in this regard. Consider the K-user many-to-one interference channel . This is a special class of the K-user interference channel where only one receiver experiences interference. In this case, the channel transition probability function is factorized in the following form: ℙ 𝑌 …𝑌 𝐾 |𝑋 …𝑋 𝐾 = ℙ 𝑌 |𝑋 ℙ 𝑌 |𝑋 ℙ 𝑌 |𝑋 … ℙ 𝑌 𝐾−1 |𝑋 𝐾−1 ℙ 𝑌 𝐾 |𝑋 𝑋 𝑋 …𝑋 𝐾 (75) Define 𝒀 𝐺(1) , 𝒀

𝐺(2) as follows: 𝒀 𝐺(1) ≜ {𝑌 , 𝑌 , … , 𝑌 𝐾−1 }, 𝒀

𝐺(2) ≜ {𝑌 𝐾 } (76) Now by letting 𝜇 = 2 and substituting (76) in Theorem 4, we obtain that if the following condition holds: 𝐼(𝑈; 𝑌 𝐾 |𝑋 𝐾 ) ≤ 𝐼(𝑈; 𝑌 , 𝑌 , … , 𝑌 𝐾−1 |𝑋 𝐾 ) , for all joint PDFs 𝑃 𝑈𝑋 𝑋 …𝑋 𝐾−1 𝑃 𝑋 𝐾 , (77) then the sum-rate capacity is bounded above as: 𝒞 𝑠𝑢𝑚 ≤ max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 …𝑃 𝑋𝐾|𝑄 (𝐼(𝑋 , 𝑋 , … , 𝑋 𝐾−1 ; 𝑌 , 𝑌 , … , 𝑌 𝐾−1 |𝑋 𝐾 , 𝑄) + 𝐼(𝑋 𝐾 ; 𝑌 𝐾 |𝑄)) = (𝑎) max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 …𝑃 𝑋𝐾|𝑄 (𝐼(𝑋 ; 𝑌 |𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄) + ⋯ + 𝐼(𝑋 𝐾 ; 𝑌 𝐾 |𝑄)) (78) where equality (a) is due to the factorization (75). Now, one can achieve (78) by a simple treating interference as noise strategy, i.e., the outer bound is in fact optimal. Thus, the sum-rate capacity of the many-to-one interference channel, if the less noisy condition (77) holds, is given by: 𝒞 𝑠𝑢𝑚 = max 𝑃 𝑄 𝑃 𝑋1|𝑄 𝑃 𝑋2|𝑄 …𝑃 𝑋𝐾|𝑄 (𝐼(𝑋 ; 𝑌 |𝑄) + 𝐼(𝑋 ; 𝑌 |𝑄) + ⋯ + 𝐼(𝑋 𝐾 ; 𝑌 𝐾 |𝑄)) (79) Similarly, many other scenarios can be identified for which the outer bound derived in Theorem 4 yields the exact sum-rate capacity. C ONCLUSION

In this paper, we presented novel techniques to derive single-letter outer bounds on the sum-rate capacity of multi-user interference channels. Our approach requires subtle sequential applications of Csiszar-Korner identity. We demonstrated that our bounds are efficient to prove several new capacity results for various networks under specific conditions. Our new capacity results hold for both discrete and Gaussian channels. R EFERENCES [1]

R. Ahlswede, “Multi-way communnication channels,” in

Proc. 2nd Int. Symp. Information Theory , U. Tsahkadsor, Armenia, Sep. 1971, pp. 23–52. [2]

H. Sato, “The capacity of the Gaussian interference channel under strong interference,”

IEEE Trans. Inf. Theory , vol. IT-27, no. 6, pp. 786–788, Nov. 1981. [3]

M. H. M. Costa and A. E. Gamal, “The capacity region of the discrete memoryless interference channel with strong interference,”

IEEE

Trans. Inf. Theory , vol. 33, pp. 710–711, Sep. 1987. [4]

A. A. El Gamal and M. H. M. Costa, “The capacity region of a class of deterministic interference channels,”

IEEE Trans. Inf. Theory , vol. IT-28, no. 2, pp. 343–346, Mar. 1982. [5]

N. Liu and S. Ulukus, “The capacity region of a class of discrete degraded interference channels,” in

IEEE Trans. Inform. Theory , vol. 54(9), Sept 2008, pp. 4372–4378. [6]

N. Liu and A. J. Goldsmith, “Capacity regions and bounds for a class of Z-interference channels,”

IEEE Trans. Inf. Theory , vol. 55, pp. 4986–4994, Nov. 2009. [7]

A. S. Motahari and A. K. Khandani, “Capacity bounds for the Gaussian interference channel,”

IEEE Trans. Inform. Theory , vol. 55, no. 2, pp. 620–643, Feb 2009. [8]

V. S. Annapureddy and V. V. Veeravalli, “Gaussian interference networks: Sum capacity in the low-interference regime and new outer bounds on the capacity region,” in

IEEE Trans. Inform. Theory , vol. 55(7), July 2009, pp. 3032 – 3050. [9]

X. Shang, G. Kramer, and B. Chen, “A new outer bound and the noisy interference sumrate capacity for Gaussian interference channels,” in

IEEE Trans. Inform. Theory , vol. 55(2), 2009, pp. 689 – 699. [10]

R. K. Farsani, “On the capacity region of the broadcast, the interference, and the cognitive radio channels,”

IEEE Transactions on Information Theory , vol. 61, no. 5, pp. 2600–2623, May 2015. [11]

R. Etkin, D. N. C. Tse, and H. Wang, “Gaussian interference channel capacity to within one bit,”

IEEE Trans. Inf. Theory , vol. 54, no. 12, pp. 5534–5562, Dec. 2008. [12]

T. S. Han and K. Kobayashi, “A new achievable rate region for the interference channel,”

IEEE Trans. Inf. Theory , vol. IT-27, no. 1, pp. 49–6o, Jan. 1981. [13]

G. Bresler, A. Parekh, and D. Tse, “The approximate capacity of the many-to-one and one-to-many Gaussian interference channels,”

IEEE Trans. Inf. Theory , vol. 56, no. 9, pp. 4566-4592, Sep. 2010. [14]

B. Bandemer and A. El Gamal, “Interference decoding for deterministic channels,”

IEEE Tran. on Inf. Theory , vol. 57, no. 5, pp. 2966–2975, May 2011. [15]

D. Tuninetti, “A new sum-rate outer bound for interference channels with three source-destination pairs,”

Proceedings of the Information Theory and Applications Workshop , 2011, San Diego, CA USA, Feb 2011. [16]

L. Zhou and W. Yu, “On the capacity of the K-user cyclic Gaussian interference channel”, submitted to IEEE Transactions on Information Theory , arXiv:1010.1044. [17]

S. Sridharan, A. Jafarian, S. Vishwanath, and S. Jafar, “Capacity of symmetric k-user Gaussian very strong interference channels, in

Proc. IEEE Global Telecommun. Conf. , Dec. 4, 2008, DOI: 10.1109/GLOCOM.2008.ECP.180. [18]

R. K. Farsani, “The sum-rate capcity of general degarded interference networks with arbitrary topologies,” in

IEEE International Symposium on Information Theory Proceedings , 2014, pp. 276-280. [19] ___________, “The k-user interference channel: Strong interference regime,” in

IEEE International Symposium on Information Theory Proceedings , 2013, pp. 2029–2033. [20]

I. Csiszár and J. Körner, “Broadcast channels with confidential messages,”

IEEE Trans. Inf. Theory , vol. IT-24, no. 3, pp. 339–348, May 1978. [21]

R. K. Farsani, “Fundamental limits of communications in interference networks-Part III: Information flow in strong interference regime,”

IEEE Trans.

Information Theory, Submitted for Publication, 2012, available at http://arxiv.org/abs/1207.3035 . Appendix A

Proof of Lemma 1 First we show that (3) implies the following inequality:

𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝑊) ≤ 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝑊) (80) for all PDFs 𝑃 𝑊𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑤, 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) with: 𝑃 𝑊𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 = 𝑃 𝑊 𝑃 𝑋 …𝑋 𝜇1 |𝑊 𝑃 𝑋 𝜇1+1 |𝑊 𝑃 𝑋 𝜇1+2 |𝑊 … 𝑃 𝑋 𝜇1+𝜇2 |𝑊 (81) where 𝑊 → 𝑋 , … , 𝑋 𝜇 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 → 𝑌 , 𝑌 forms a Markov chain. To prove this inequality, one can write: 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝑊) = ∑ 𝑃 𝑊 (𝑤)𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝑤) 𝑤 = ∑ 𝑃 𝑊 (𝑤)𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) 〈𝑃 𝑋1…𝑋𝜇1|𝑤 ×𝑃 𝑋𝜇1+1|𝑤 ×𝑃 𝑋𝜇1+2|𝑤 ×…×𝑃

𝑋𝜇1+𝜇2|𝑤 〉𝑤 = ∑ 𝑃 𝑊 (𝑤)𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝑤) 𝑤 = 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝑊) (82) where the notation 𝐼(𝐴; 𝐵|𝐶) 〈𝑃(.)〉 indicates that the mutual information function

𝐼(𝐴; 𝐵|𝐶) is evaluated by the distribution

𝑃(. ) . Note that for any given 𝑤 , the function 𝑃 𝑋 …𝑋 𝜇1 |𝑤 × 𝑃 𝑋 𝜇1+1 |𝑤 × 𝑃 𝑋 𝜇1+2 |𝑤 × … × 𝑃 𝑋 𝜇1+𝜇2 |𝑤 is a probability distribution defined over the set 𝒳 × … × 𝒳 𝜇 × 𝒳 𝜇 +1 × … × 𝒳 𝜇 +𝜇 with the factorization (4). The inequality (a) is due to (3). Now, having at hand the inequality (80), one can substitute 𝑊 ≡ (𝐷, 𝑋 𝜇 +1 , 𝑋 𝜇 +2 , … , 𝑋 𝜇 +𝜇 ) with an arbitrary joint distribution on the set 𝒟 × 𝒳 𝜇 +1 × … × 𝒳 𝜇 +𝜇 . By this substitution, we derive that (5) holds for all joint PDFs 𝑃 𝐷𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 𝑃 𝑋 …𝑋 𝜇1 |𝐷𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 . The proof is thus complete. ■ Appendix B

Proof of Lemma 2 The proof is rather similar to Lemma 1. First, note that (7) implies the following inequality:

𝐼(𝑈; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝑊) ≤ 𝐼(𝑈; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 , 𝑊) (83) for all PDFs 𝑃 𝑊𝑈𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 (𝑤, 𝑢, 𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) with: 𝑃 𝑊𝑈𝑋 …𝑋 𝜇1 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 = 𝑃 𝑊 𝑃 𝑈𝑋 …𝑋 𝜇1 |𝑊 𝑃 𝑋 𝜇1+1 |𝑊 𝑃 𝑋 𝜇1+2 |𝑊 … 𝑃 𝑋 𝜇1+𝜇2 |𝑊 (84) We have this liberty because 𝑃 𝑊 (𝑤) in (81) is arbitrary.

18 where

𝑊, 𝑈 → 𝑋 , … , 𝑋 𝜇 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 → 𝑌 , 𝑌 forms a Markov chain. This can be proved by following the same lines as (82). Now, having at hand the inequality (83), one can substitute 𝑊 ≡ (𝑋 𝜇 +1 , 𝑋 𝜇 +2 , … , 𝑋 𝜇 +𝜇 , 𝐷) with an arbitrary joint distribution on the set 𝒳 𝜇 +1 × … × 𝒳 𝜇 +𝜇 × 𝒟 . By this substitution, we obtain that the inequality (9) holds for all joint PDFs 𝑃 𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 𝐷 𝑃 𝑈𝑋 …𝑋 𝜇1 |𝑋 𝜇1+1 …𝑋 𝜇1+𝜇2 𝐷 . The proof is complete. ■ Appendix C

Proof of Lemma 3 First note that if 𝐷 is independent of (𝑍 , 𝑍 ) , then 𝐷 → 𝑋 , … , 𝑋 𝜇 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 → 𝑌 , 𝑌 forms a Markov chain. It is sufficient to prove that (3) holds. Now define: 𝑌̃ ≜ 𝛼𝑌 + (𝛼𝑏 𝜇 +1 − 𝑎 𝜇 +1 )𝑋 𝜇 +1 + (𝛼𝑏 𝜇 +2 − 𝑎 𝜇 +2 )𝑋 𝜇 +2 + ⋯ + (𝛼𝑏 𝜇 +𝜇 − 𝑎 𝜇 +𝜇 )𝑋 𝜇 +𝜇 + √1 − 𝛼 𝑍̃ (85) where 𝑍̃ is a Gaussian random variable with zero mean and unit variance which is independent of (𝑍 , 𝑍 ) . By considering (10), it is readily derived that 𝑌̃ is statistically equivalent to 𝑌 in the sense of: ℙ(𝑦̃ |𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) ≋ ℙ(𝑦 |𝑥 , … , 𝑥 𝜇 , 𝑥 𝜇 +1 , … , 𝑥 𝜇 +𝜇 ) (86) Therefore, for all input distributions, we have: 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) = 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌̃ |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) ≤ 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌̃ , 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) = (𝑎) 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) + 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌̃ |𝑌 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) = 𝐼(𝑋 , … , 𝑋 𝜇 ; 𝑌 |𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 ) (87) where (a) holds because 𝑋 , … , 𝑋 𝜇 → 𝑌 , 𝑋 𝜇 +1 , … , 𝑋 𝜇 +𝜇 → 𝑌̃ forms a Markov chain (this is clear from (85)). The proof is thus complete. ■■