Multi-layer Interference Alignment and GDoF of the K-User Asymmetric Interference Channel

Abstract

In wireless networks, link strengths are often affected by topological factors such as propagation path loss, shadowing, and inter-cell interference. Thus, different users in the network might experience different link strengths. In this work we consider a $K$-user asymmetric interference channel, where the channel gains of the links connected to Receiver $k$ are scaled with $\sqrt{P^{\alpha_k}}$, $k=1,2,\dots,K$, for $0<\alpha_1\le\alpha_2\le\cdots\le\alpha_K\le 1$. For this setting, we show that the optimal sum generalized degrees-of-freedom (GDoF) is characterized as $d_{\mathrm{sum}} = \frac{1}{2}\big(\sum_{k=1}^K \alpha_k + \alpha_K - \alpha_{K-1}\big)$, which matches the existing result $d_{\mathrm{sum}} = K/2$ when $\alpha_1=\alpha_2=\cdots=\alpha_K=1$. The achievability is based on multi-layer interference alignment, where different interference alignment sub-schemes are designed in different layers associated with specific power levels, and successive decoding is applied at the receivers. While the converse for the symmetric case only requires bounding the sum degrees-of-freedom (DoF) for two selected users, the converse for this asymmetric case involves bounding the weighted sum GDoF for $J+2$ selected users, with weights $(2^{J}, 2^{J-1}, \dots, 2^{2}, 2^{1})$ (a geometric sequence with common ratio 2) for the first $J$ users and weights $(1, 1)$ for the last two users, for $J \in \{1,2,\dots,\lceil\log (K/2)\rceil\}$.


Jinyuan Chen

I. INTRODUCTION

In wireless networks, the strengths of communication links are often affected by propagation path loss, shadowing, inter-cell interference, and other topological factors. Therefore, different users in the network might experience different link strengths. For example, in an interference network, a receiver that is relatively far from the transmitters might experience weaker links than receivers that are closer to the transmitters (see Fig. 1). Such asymmetry of link strengths in communication networks can crucially affect the transceiver design, as well as the capacity performance.

In this work we consider a $K$-user asymmetric interference channel, where different receivers might have different link strengths. For this setting, the channel gains of the links connected to Receiver $k$ are scaled with $\sqrt{P^{\alpha_k}}$, where $\alpha_k$ captures the link strength of Receiver $k$, which might differ from that of the other receivers, for $k = 1, 2, \dots, K$. This generalizes the symmetric setting, in which $\alpha_1 = \alpha_2 = \cdots = \alpha_K = 1$, to a setting with diverse link strengths.

For the symmetric $K$-user interference channel, the work in [1] showed that the optimal sum degrees-of-freedom (DoF) is $K/2$, which implies that "everyone gets half of the cake". DoF is the pre-log factor of capacity in the high signal-to-noise ratio (SNR) regime. Although the DoF metric can produce profound insights, it has a fundamental limitation: it treats all non-zero links as approximately equally strong. This motivates going beyond the DoF metric to the generalized degrees-of-freedom (GDoF) metric (see [2]–[26] and the references therein) for settings with diverse link strengths. For the $K$-user asymmetric interference channel, we focus on the optimal sum GDoF.
Specifically, for this asymmetric setting we show that the optimal sum GDoF is characterized as $d_{\mathrm{sum}} = \frac{1}{2}\big(\sum_{k=1}^{K}\alpha_k + \alpha_K - \alpha_{K-1}\big)$, for $0 < \alpha_1 \le \alpha_2 \le \cdots \le \alpha_K \le 1$. This result generalizes the existing result of the symmetric case to the setting with diverse link strengths.

Jinyuan Chen is with Louisiana Tech University, Department of Electrical Engineering, Ruston, USA (email: [email protected]).

arXiv [cs.IT], Oct.

[Fig. 1: Transmitters Tx1, ..., TxK with messages $w_1, \dots, w_K$; Receivers Rx1, ..., RxK with estimates $\hat{w}_1, \dots, \hat{w}_K$.]

Fig. 1. An asymmetric interference channel, where some receivers are relatively far from the transmitters and consequently might have weaker links compared to the receivers closer to the transmitters.

The proposed achievability is based on multi-layer interference alignment and successive decoding. While the traditional interference alignment scheme is usually dedicated to all users in the network (cf. [1], [27]), the multi-layer interference alignment scheme proposed in this work consists of $K$ different interference alignment sub-schemes, each dedicated to a subset of users. In this scheme, each interference alignment sub-scheme is designed in a specific layer associated with a particular power level. In terms of decoding, successive decoding is applied at the receivers layer by layer: at each layer, every involved receiver decodes the desired signals and the interference in this layer, and then removes them before decoding the signals at the next layer. The converse for this asymmetric case involves bounding the weighted sum GDoF for $J+2$ selected users, with weights forming a geometric sequence for the first $J$ users, for $J \in \{1, 2, \dots, \lceil\log(K/2)\rceil\}$. This is very different from the converse for the symmetric case, which only requires bounding the sum DoF for two selected users.

The remainder of this work is organized as follows. Section II describes the system model of the asymmetric interference channel. Section III provides the main result of this work. The converse proof is provided in Section IV, while the achievability proof is described in Section V. Finally, Section VI concludes this work. Throughout this work, $H(\bullet)$, $h(\bullet)$ and $I(\bullet)$ denote the entropy, differential entropy and mutual information, respectively. $|\bullet|$ denotes the magnitude of a scalar or the cardinality of a set. $\mathbb{Z}$, $\mathbb{Z}^+$, $\mathbb{R}$ and $\mathbb{N}$ denote the sets of integers, positive integers, real numbers, and natural numbers, respectively. $o(\bullet)$ is the standard Landau notation, where $f(x) = o(g(x))$ implies that $\lim_{x\to\infty} f(x)/g(x) = 0$.
$[A : B]$ is the set of integers from $A$ to $B$, for some integers $A \le B$. Given a set $\mathcal{A}$, $\mathcal{A}(i)$ denotes the $i$th element of $\mathcal{A}$. Logarithms are in base 2.

II. SYSTEM MODEL

We focus on a $K$-user asymmetric interference channel defined by the following input-output equations:
$$y_k(t) = \sqrt{P^{\alpha_k}} \sum_{\ell=1}^{K} h_{k\ell}\, x_\ell(t) + z_k(t), \quad k \in [1:K], \tag{1}$$
$t \in [1:n]$, where $x_\ell(t)$ is the channel input at Transmitter $\ell$, subject to the normalized average power constraint $\mathbb{E}|x_\ell(t)|^2 \le 1$; $z_k(t) \sim \mathcal{N}(0,1)$ is additive white Gaussian noise at Receiver $k$; and $h_{k\ell}$ is the channel coefficient between Transmitter $\ell$ and Receiver $k$. $P \ge 1$ denotes a nominal power value. The exponent $\alpha_k$ represents the channel strength of the links connected to Receiver $k$. Without loss of generality we consider the case $0 < \alpha_1 \le \alpha_2 \le \cdots \le \alpha_K \le 1$. The channel coefficients $\{h_{k\ell}\}_{k,\ell}$ are drawn independently and identically from a continuous distribution. We assume that the absolute value of each channel coefficient is bounded between a finite maximum value and a nonzero minimum value. All the channel parameters $\{\alpha_k\}_k$ and coefficients $\{h_{k\ell}\}_{k,\ell}$ are assumed to be perfectly known to all the transmitters and receivers (perfect CSIT and CSIR).

In this channel, the message $w_k$ is sent from Transmitter $k$ to Receiver $k$ over $n$ channel uses, for $k \in [1:K]$, where $w_k$ is uniformly drawn from the set $\mathcal{W}_k = [1 : 2^{nR_k}]$ and $R_k$ is the rate of this message. A rate tuple $(R_1(P,\boldsymbol{\alpha}), R_2(P,\boldsymbol{\alpha}), \dots, R_K(P,\boldsymbol{\alpha}))$ is said to be achievable if for any $\epsilon > 0$ there exists a sequence of $n$-length codes such that each receiver can decode its own message reliably, i.e., $\Pr[\hat{w}_k \ne w_k] \le \epsilon$, $\forall k \in [1:K]$, when $n$ is large, where $\boldsymbol{\alpha} \triangleq [\alpha_1, \alpha_2, \dots, \alpha_K]$. The capacity region $\mathcal{C}(P,\boldsymbol{\alpha})$ is the collection of all achievable rate tuples $(R_1(P,\boldsymbol{\alpha}), R_2(P,\boldsymbol{\alpha}), \dots, R_K(P,\boldsymbol{\alpha}))$. The GDoF region $\mathcal{D}(\boldsymbol{\alpha})$ is defined as
$$\mathcal{D}(\boldsymbol{\alpha}) \triangleq \Big\{(d_1, d_2, \dots, d_K) : \exists\, \big(R_1(P,\boldsymbol{\alpha}), \dots, R_K(P,\boldsymbol{\alpha})\big) \in \mathcal{C}(P,\boldsymbol{\alpha}) \ \text{s.t.}\ d_k = \lim_{P\to\infty} \frac{R_k(P,\boldsymbol{\alpha})}{\frac{1}{2}\log P},\ \forall k \in [1:K]\Big\}.$$
The sum GDoF is then defined by
$$d_{\mathrm{sum}}(\boldsymbol{\alpha}) \triangleq \max_{(d_1, \dots, d_K) \in \mathcal{D}(\boldsymbol{\alpha})} d_1 + d_2 + \cdots + d_K.$$
GDoF is a generalization of DoF: DoF can be considered as a specific point of GDoF, obtained by letting $\alpha_1 = \alpha_2 = \cdots = \alpha_K = 1$.

III. MAIN RESULT

The main result of this work is the characterization of the optimal sum GDoF for the $K$-user asymmetric interference channel.

Theorem 1.

For the $K$-user asymmetric interference channel defined in Section II, for almost all realizations of the channel coefficients $\{h_{k\ell}\}$, the optimal sum GDoF is characterized as
$$d_{\mathrm{sum}}(\boldsymbol{\alpha}) = \frac{\sum_{k=1}^{K}\alpha_k + \alpha_K - \alpha_{K-1}}{2}. \tag{2}$$

Proof.

The achievability is based on multi-layer interference alignment and successive decoding. The converse for this asymmetric case involves bounding the weighted sum GDoF for $J+2$ selected users, $J \in [1 : \lceil\log(K/2)\rceil]$. The details of the achievability and converse proofs are provided in Section V and Section IV, respectively.

Remark 1.

The result of Theorem 1 matches the previous result $d_{\mathrm{sum}}(\boldsymbol{\alpha}) = K/2$ when $\alpha_1 = \alpha_2 = \cdots = \alpha_K = 1$ (see [1]).

Remark 2.

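One observation from Theorem 1 is that changing the link strength of the $(K-1)$th receiver, i.e., $\alpha_{K-1}$, does not affect the sum GDoF, as long as $\alpha_{K-2} \le \alpha_{K-1} \le \alpha_K$. Indeed, $\alpha_{K-1}$ cancels in (2), which can be rewritten as $d_{\mathrm{sum}}(\boldsymbol{\alpha}) = \frac{1}{2}\big(\sum_{k=1}^{K-2}\alpha_k + 2\alpha_K\big)$.

Remark 3.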

Theorem 1 also reveals that the link strength of the $K$th receiver, i.e., $\alpha_K$, has a larger effect on the optimal sum GDoF than the link strengths of the other receivers: it carries weight 1 in (2), whereas each of $\alpha_1, \dots, \alpha_{K-2}$ carries weight $1/2$.

IV. CONVERSE

This section provides the converse of Theorem 1 for the $K$-user asymmetric interference channel defined in Section II. While the converse for the symmetric case only requires bounding the sum DoF for two selected users, the converse for this asymmetric case involves bounding the weighted sum GDoF for $J+2$ selected users, with weights $(2^J, 2^{J-1}, \dots, 2^2, 2, 1, 1)$, for $J \in [1 : \lceil\log(K/2)\rceil]$. The result on bounding the weighted sum GDoF is given in the following lemma.

Lemma 1.

For $1 \le l_1 < l_2 < \cdots < l_{J+2} \le K$ and $J \in [1 : \lceil\log(K/2)\rceil]$, the following inequality holds:
$$\sum_{j=1}^{J} 2^{J-j+1} d_{l_j} + d_{l_{J+1}} + d_{l_{J+2}} \le \sum_{j=1}^{J} 2^{J-j}\alpha_{l_j} + \alpha_{l_{J+2}}. \tag{3}$$
Before proving Lemma 1, we provide the following result derived from Lemma 1, which serves as the converse of Theorem 1.

Corollary 1.

For the $K$-user asymmetric interference channel defined in Section II, the optimal sum GDoF is upper bounded by
$$d_{\mathrm{sum}}(\boldsymbol{\alpha}) \le \frac{\sum_{k=1}^{K}\alpha_k + \alpha_K - \alpha_{K-1}}{2}. \tag{4}$$

Proof.

The proof is based on Lemma 1. The details of this proof are provided in Appendix B.

Let us now prove Lemma 1. Without loss of generality, we focus on the case $l_i = i$ for $i \in [1 : J+2]$ and $J \in [1 : \lceil\log(K/2)\rceil]$, and prove
$$\sum_{j=1}^{J} 2^{J-j+1} d_j + d_{J+1} + d_{J+2} \le \sum_{j=1}^{J} 2^{J-j}\alpha_j + \alpha_{J+2}. \tag{5}$$
Let us define an auxiliary variable
$$\tilde{y}_{k,\ell}(t) \triangleq \sqrt{P^{\alpha_\ell}}\sum_{i=1}^{K} h_{ki}\,x_i(t) + \tilde{z}_\ell(t) \tag{6}$$
where $\tilde{z}_\ell(t) \sim \mathcal{N}(0,1)$ is independent of the other noise random variables, for $k, \ell \in [1:K]$. In words, $\tilde{y}_{k,\ell}$ carries the same linear combination of inputs as Receiver $k$, but scaled with the link strength $\alpha_\ell$ instead of $\alpha_k$. Let $y^n_k \triangleq \{y_k(t)\}_{t=1}^{n}$, $x^n_k \triangleq \{x_k(t)\}_{t=1}^{n}$, $z^n_k \triangleq \{z_k(t)\}_{t=1}^{n}$, and $\tilde{y}^n_{k,\ell} \triangleq \{\tilde{y}_{k,\ell}(t)\}_{t=1}^{n}$. For ease of description, we define $\bar{W}_{[i,j]} \triangleq \{w_\ell : \ell \in [1:K], \ell \ne i, \ell \ne j\}$ and $\bar{W}_{[i]} \triangleq \{w_\ell : \ell \in [1:K], \ell \ne i\}$, for $i, j \in [1:K]$, $i \ne j$. We also define
$$\Phi(J') \triangleq 2^{J-J'+1} I(w_{J'}; y^n_{J'}) + \sum_{j=J'+1}^{J+2}\max\{2^{J-j+1}, 1\}\, I(w_j; \tilde{y}^n_{J'+1,J'} \mid \bar{W}_{[j]}) \tag{7}$$
for $J' \in [1 : J-1]$, and
$$d_0 \triangleq 0,\quad \alpha_0 \triangleq 0,\quad \tilde{y}^n_{1,0} \triangleq \phi,\quad I(w_j; \tilde{y}^n_{1,0} \mid \bar{W}_{[j]}) \triangleq 0\ \ \forall j,\quad I(w_0; y^n_0) \triangleq 0,\quad \Phi(0) \triangleq 0. \tag{8}$$
Beginning with Fano's inequality, we have
\begin{align}
&\sum_{j=1}^{J} 2^{J-j+1} nR_j + nR_{J+1} + nR_{J+2} - n\epsilon_n \nonumber\\
&\le \sum_{j=1}^{J-1} 2^{J-j+1} I(w_j; y^n_j) + 2I(w_J; y^n_J) + I(w_{J+1}; y^n_{J+1}) + I(w_{J+2}; y^n_{J+2}) \tag{9}\\
&\le \sum_{j=1}^{J-1} 2^{J-j+1} I(w_j; y^n_j) + \sum_{j=J}^{J+2}\max\{2^{J-j+1},1\}\, I(w_j; \tilde{y}^n_{J,J-1} \mid \bar{W}_{[j]}) \nonumber\\
&\qquad + \big((\alpha_{J+2}-\alpha_J) + 2(\alpha_J-\alpha_{J-1})\big)\frac{n}{2}\log P + n\,o(\log P) \tag{10}\\
&= \sum_{j=1}^{J-2} 2^{J-j+1} I(w_j; y^n_j) + \Phi(J-1) + \big((\alpha_{J+2}-\alpha_J) + 2(\alpha_J-\alpha_{J-1})\big)\frac{n}{2}\log P + n\,o(\log P) \tag{11}\\
&\le \sum_{j=1}^{J-3} 2^{J-j+1} I(w_j; y^n_j) + \Phi(J-2) + \big((\alpha_{J+2}-\alpha_J) + 2(\alpha_J-\alpha_{J-1}) + 2^2(\alpha_{J-1}-\alpha_{J-2})\big)\frac{n}{2}\log P + n\,o(\log P) \tag{12}\\
&\le \sum_{j=1}^{J-4} 2^{J-j+1} I(w_j; y^n_j) + \Phi(J-3) + \big((\alpha_{J+2}-\alpha_J) + 2(\alpha_J-\alpha_{J-1}) + 2^2(\alpha_{J-1}-\alpha_{J-2}) + 2^3(\alpha_{J-2}-\alpha_{J-3})\big)\frac{n}{2}\log P + n\,o(\log P) \tag{13}\\
&\ \ \vdots \nonumber\\
&\le \big((\alpha_{J+2}-\alpha_J) + 2(\alpha_J-\alpha_{J-1}) + 2^2(\alpha_{J-1}-\alpha_{J-2}) + 2^3(\alpha_{J-2}-\alpha_{J-3}) + \cdots + 2^J(\alpha_1-\alpha_0)\big)\frac{n}{2}\log P + n\,o(\log P) \tag{14}\\
&= \Big(\sum_{j=1}^{J} 2^{J-j}\alpha_j + \alpha_{J+2}\Big)\frac{n}{2}\log P + n\,o(\log P) \tag{15}
\end{align}
where $\Phi(J')$ is defined in (7) for $J' \in [1 : J-1]$, and sums whose upper limit is smaller than their lower limit are zero; (9) is from Fano's inequality, with $\epsilon_n \to 0$ as $n \to \infty$; (10) follows from Lemma 4, which is provided at the end of this section; (11) uses the definition of $\Phi(J')$; and (12)-(14) follow from Lemma 2, also provided at the end of this section. Dividing each side of (15) by $\frac{n}{2}\log P$ and letting $n, P \to \infty$ proves the bound in (5). By mapping each index $i$ to $l_i$, for $i \in [1 : J+2]$ and $1 \le l_1 < l_2 < \cdots < l_{J+2} \le K$, this proves Lemma 1.

Note that, in our proof, the weights of the sum GDoF for the $J+2$ users are specifically designed as $(2^J, 2^{J-1}, \dots, 2, 1, 1)$.
With this design, for $J' \in [1 : J]$, the $J'$th mutual information term $I(w_{J'}; y^n_{J'})$, which has weight $2^{J-J'+1}$, can be bounded together with $2^{J-J'+1}$ other mutual information terms generated from User $(J'+1)$ to User $(J+2)$, namely $\sum_{j=J'+1}^{J+2}\max\{2^{J-j+1}, 1\}\, I(w_j; \tilde{y}^n_{J'+1,J'} \mid \bar{W}_{[j]})$. This bounding operation also generates a total of $2^{J-(J'-1)+1}$ mutual information terms, which are then used to bound the $(J'-1)$th mutual information term $I(w_{J'-1}; y^n_{J'-1})$, which has weight $2^{J-(J'-1)+1}$. This process repeats until $J' = 1$. Since each weighted mutual information term is bounded with other weighted mutual information terms and also generates new terms for the next operation, the bounding process forms a "chain".

The lemmas and claims used in our proof are provided below. Their proofs are relegated to Appendix A.

Lemma 2.

For $\Phi(J')$ defined in (7), $J' \in [1 : J-1]$, we have the following bound:
$$\Phi(J') + 2^{J-(J'-1)+1} I(w_{J'-1}; y^n_{J'-1}) \le 2^{J-J'+1}(\alpha_{J'} - \alpha_{J'-1}) \cdot \frac{n}{2}\log P + n\,o(\log P) + \Phi(J'-1)$$
where $\alpha_0$, $I(w_0; y^n_0)$, and $\Phi(0)$ are defined in (8).

Proof.

See Appendix A-A. The proof is based on the result of Lemma 3.

Lemma 3.

For $J' \in [1 : J-1]$, the following inequality holds:
$$2^{J-J'+1} I(w_{J'}; y^n_{J'}) + \sum_{j=J'+1}^{J+2}\max\{2^{J-j+1},1\}\, I(w_j; \tilde{y}^n_{J'+1,J'} \mid \bar{W}_{[j]}) \le 2^{J-J'+1}(\alpha_{J'} - \alpha_{J'-1})\cdot\frac{n}{2}\log P + n\,o(\log P) + \sum_{j=J'}^{J+2}\max\{2^{J-j+1},1\}\, I(w_j; \tilde{y}^n_{J',J'-1} \mid \bar{W}_{[j]})$$
where $\alpha_0$, $\tilde{y}^n_{1,0}$, and $I(w_j; \tilde{y}^n_{1,0} \mid \bar{W}_{[j]})$ are defined in (8).

Proof. See Appendix A-B. The proof uses the result of Lemma 5.

Lemma 4.

The following bound holds:
$$2I(w_J; y^n_J) + I(w_{J+1}; y^n_{J+1}) + I(w_{J+2}; y^n_{J+2}) \le 2I(w_J; \tilde{y}^n_{J,J-1} \mid \bar{W}_{[J]}) + I(w_{J+1}; \tilde{y}^n_{J,J-1} \mid \bar{W}_{[J+1]}) + I(w_{J+2}; \tilde{y}^n_{J,J-1} \mid \bar{W}_{[J+2]}) + \big(\alpha_{J+2} - \alpha_J + 2(\alpha_J - \alpha_{J-1})\big)\cdot\frac{n}{2}\log P + n\,o(\log P).$$

Proof.

See Appendix A-C. The proof uses the result of Lemma 5.
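As a worked special case (our own illustration, not taken from the appendix), set $J = 1$ in Lemma 4. By the conventions in (8), $\alpha_0 = 0$, $\tilde{y}^n_{1,0} = \phi$, and every term $I(w_j; \tilde{y}^n_{1,0} \mid \bar{W}_{[j]})$ is zero, so Lemma 4 collapses to a bound on three users which, combined with Fano's inequality, is exactly (5) for $J = 1$:

$$
2I(w_1;y_1^n)+I(w_2;y_2^n)+I(w_3;y_3^n)\le\big(\alpha_3-\alpha_1+2\alpha_1\big)\frac{n}{2}\log P+n\,o(\log P)
\;\Longrightarrow\; 2d_1+d_2+d_3\le\alpha_1+\alpha_3 .
$$

In the symmetric case $\alpha_1=\alpha_2=\alpha_3=1$ this reads $2d_1+d_2+d_3\le 2$, which the symmetric point $d_1=d_2=d_3=1/2$ meets with equality.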

Lemma 5.

For $\ell_1, \ell_2, \ell_3, l, i, j \in [1:K]$ with $\ell_1 < \ell_2 \le \ell_3$ and $i \ne j$, the following bound holds:
$$I(w_i; y^n_{\ell_3} \mid \tilde{y}^n_{\ell_3,\ell_1}, \bar{W}_{[i,j]}) + I(w_j; \tilde{y}^n_{l,\ell_2} \mid \tilde{y}^n_{\ell_3,\ell_1}, \bar{W}_{[j]}) \le \frac{n}{2}\log\big(1 + P^{\alpha_{\ell_3} - \alpha_{\ell_1}}\big) + \frac{n}{2}\log\Big(1 + P^{\alpha_{\ell_2} - \alpha_{\ell_3}}\frac{|h_{lj}|^2}{|h_{\ell_3 j}|^2}\Big).$$
When $\ell_2, \ell_3, l, i, j \in [1:K]$, $i \ne j$, and $\ell_2 \le \ell_3$, we have
$$I(w_i; y^n_{\ell_3} \mid \bar{W}_{[i,j]}) + I(w_j; \tilde{y}^n_{l,\ell_2} \mid \bar{W}_{[j]}) \le \alpha_{\ell_3} \cdot \frac{n}{2}\log P + n\,o(\log P).$$

Proof.

See Appendix A-D. The proof is based on the result of Claim 1 and Claim 2.

Claim 1.

For $\ell_1, \ell_2, i, j \in [1:K]$ with $\ell_1 < \ell_2$ and $i \ne j$, it holds that
$$I(w_i, w_j; y^n_{\ell_2} \mid \tilde{y}^n_{\ell_2,\ell_1}, \bar{W}_{[i,j]}) \le \frac{n}{2}\log\big(1 + P^{\alpha_{\ell_2} - \alpha_{\ell_1}}\big).$$
When $\ell_2, i, j \in [1:K]$ and $i \ne j$, the following inequality holds:
$$I(w_i, w_j; y^n_{\ell_2} \mid \bar{W}_{[i,j]}) \le \alpha_{\ell_2} \cdot \frac{n}{2}\log P + n\,o(\log P).$$

Proof.

See Appendix A-E.

Claim 2.

For $\ell_1, \ell_2, \ell_3, l, j \in [1:K]$ with $\ell_1 < \ell_2 \le \ell_3$, it holds that
$$I(w_j; \tilde{y}^n_{l,\ell_2} \mid y^n_{\ell_3}, \tilde{y}^n_{\ell_3,\ell_1}, \bar{W}_{[j]}) \le \frac{n}{2}\log\Big(1 + P^{\alpha_{\ell_2} - \alpha_{\ell_3}}\frac{|h_{lj}|^2}{|h_{\ell_3 j}|^2}\Big).$$
When $\ell_2, \ell_3, l, j \in [1:K]$, $\ell_2 \le \ell_3$, and $\tilde{y}^n_{\ell_3,\ell_1} = \phi$, the above inequality also holds.

Proof. See Appendix A-F.
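The bookkeeping behind (9)-(15) can be checked mechanically. The Python sketch below is our own illustration (the function names are hypothetical, not from the paper): it accumulates the coefficients contributed by Lemma 4 and by the repeated application of Lemma 2, confirms that they telescope to the right-hand side of (15), and verifies the term-counting statement that the $J'$th term, of weight $2^{J-J'+1}$, is matched by exactly $\sum_{j=J'+1}^{J+2}\max\{2^{J-j+1},1\} = 2^{J-J'+1}$ weighted terms. Exact rational arithmetic (`fractions.Fraction`) is used so the equalities are checked exactly rather than up to floating-point error.

```python
from fractions import Fraction
import random


def chain_total(alpha, J):
    """Coefficient of (n/2) log P accumulated along the chain (9)-(14).

    alpha[0] must be 0 (the convention alpha_0 = 0 in (8)); alpha[1..J+2]
    are the link strengths of the J+2 selected users.
    """
    # Lemma 4, used in step (10)
    total = (alpha[J + 2] - alpha[J]) + 2 * (alpha[J] - alpha[J - 1])
    # Lemma 2, applied for J' = J-1, J-2, ..., 1 in steps (12)-(14)
    for jp in range(J - 1, 0, -1):
        total += 2 ** (J - jp + 1) * (alpha[jp] - alpha[jp - 1])
    return total


def lemma1_rhs(alpha, J):
    """Right-hand side of (15): sum_{j=1}^{J} 2^(J-j) alpha_j + alpha_{J+2}."""
    return sum(2 ** (J - j) * alpha[j] for j in range(1, J + 1)) + alpha[J + 2]


def terms_generated(J, jp):
    """sum_{j=J'+1}^{J+2} max{2^(J-j+1), 1}; Fraction keeps 2^(-1) exact."""
    return sum(max(Fraction(2) ** (J - j + 1), 1) for j in range(jp + 1, J + 3))


random.seed(1)
for J in range(1, 7):
    # random nondecreasing link strengths in (0, 1], as exact fractions
    a = sorted(Fraction(random.randint(1, 100), 100) for _ in range(J + 2))
    alpha = [Fraction(0)] + a
    assert chain_total(alpha, J) == lemma1_rhs(alpha, J)
    for jp in range(1, J + 1):
        assert terms_generated(J, jp) == 2 ** (J - jp + 1)
print("chain (9)-(15) telescopes; term counts match")
```

Because only the coefficients of $\frac{n}{2}\log P$ are compared, this checks the arithmetic of the chain, not the information-theoretic inequalities themselves.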

V. ACHIEVABILITY

This section provides the achievability for Theorem 1. The achievability is based on multi-layer interference alignment, where different interference alignment sub-schemes are designed in different layers associated with specific power levels, and successive decoding is applied at the receivers. The proposed scheme uses pulse amplitude modulation (PAM).

Let us first review the PAM modulation used in our scheme. If a random variable $x$ is uniformly drawn from the PAM constellation set
$$\Omega(\xi, Q) \triangleq \{\xi \cdot a : a \in \mathbb{Z} \cap [-Q, Q]\} \tag{16}$$
for some $Q \in \mathbb{Z}^+$ and $\xi \in \mathbb{R}$, then the average power of $x$ is
$$\mathbb{E}|x|^2 = \frac{2\xi^2}{2Q+1}\sum_{i=1}^{Q} i^2 = \frac{\xi^2 Q(Q+1)}{3}. \tag{17}$$
The parameter $\xi$ is used to regularize the average power of $x$. The expression in (17) implies that
$$\mathbb{E}|x|^2 \le 1/\tau, \quad \text{for } \xi \le \frac{1}{\sqrt{\tau}\,Q} \tag{18}$$
given some $\tau > 0$. One property of the PAM constellation is that, given PAM signals $c_1, c_2, \dots, c_M \in \Omega(\xi, Q)$, their sum is still a PAM signal:
$$c_1 + c_2 + \cdots + c_M \in \Omega(\xi, MQ). \tag{19}$$
In the GDoF analysis of the proposed scheme, we will use the Khintchine-Groshev Theorem for Monomials, stated in the following theorem as in [28].

Theorem 2 (Khintchine-Groshev Theorem for Monomials). Let $N \le M$, $\mathbf{v} = (v_1, v_2, \dots, v_N) \in \mathbb{R}^N$, and let $g_1, g_2, \dots, g_M$ be distinct monomials generated by $\mathbf{v}$. Then, for any $\epsilon' > 0$ and almost all $\mathbf{v}$, there exists a positive constant $\kappa$ such that
$$\Big|\sum_{i=1}^{M} g_i q_i\Big| > \frac{\kappa}{(\max_i |q_i|)^{M-1+\epsilon'}} \tag{20}$$
holds for all $(q_1, q_2, \dots, q_M) \ne \mathbf{0} \in \mathbb{Z}^M$.

The proposed scheme, with multi-layer interference alignment and successive decoding, is described in the following subsections.
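The PAM facts (16)-(19) are easy to confirm numerically. The Python sketch below is our own check rather than part of the scheme, and `pam` and `avg_power` are hypothetical helper names. It enumerates $\Omega(\xi, Q)$ with exact fractions, compares the empirical average power with (17), checks the power bound (18) for values of $\tau$ that are perfect squares (so that $1/(\sqrt{\tau}Q)$ stays rational), and verifies the sum property (19) by brute force.

```python
from fractions import Fraction
from itertools import product


def pam(xi, Q):
    """PAM constellation Omega(xi, Q) = {xi * a : a integer, -Q <= a <= Q}, as in (16)."""
    return [xi * a for a in range(-Q, Q + 1)]


def avg_power(points):
    """Average power E|x|^2 of a random variable uniform over the given points."""
    return sum(p * p for p in points) / len(points)


for Q in range(1, 10):
    xi = Fraction(1, 3)
    # (17): E|x|^2 = xi^2 Q (Q + 1) / 3
    assert avg_power(pam(xi, Q)) == xi ** 2 * Q * (Q + 1) / 3
    # (18): xi <= 1 / (sqrt(tau) Q) gives E|x|^2 <= 1 / tau (tau = r^2 keeps things rational)
    for r in (1, 2, 3):
        assert avg_power(pam(Fraction(1, r * Q), Q)) <= Fraction(1, r * r)

# (19): the sum of M symbols from Omega(xi, Q) lies in Omega(xi, M Q)
xi, Q, M = Fraction(1, 2), 2, 3
sums = {sum(c) for c in product(pam(xi, Q), repeat=M)}
assert sums <= set(pam(xi, M * Q))
print("PAM checks (17)-(19) passed")
```

The sum property is what later lets the aligned interference coefficients, each a sum of PAM symbols sharing one dimension, be treated as points of a larger PAM constellation.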

A. Multi-layer interference alignment

The proposed scheme consists of $K$ sub-schemes, each designed in a specific layer, i.e., at a specific power level. For each of the first $K-2$ layers, the design follows the interference alignment technique in [1], [28]. Since interference alignment is designed across multiple layers, we call it multi-layer interference alignment. The last two layers are dedicated to two users and one user, respectively, so their design is very simple. The $\ell$th layer (the $\ell$th sub-scheme) is dedicated specifically to the last $K_\ell$ users, from User $\ell$ to User $K$, where
$$K_\ell \triangleq K - \ell + 1, \quad \ell \in [1:K]. \tag{21}$$

(Footnote: A function $f(\mathbf{v})$ is a monomial generated by $\mathbf{v} = (v_1, v_2, \dots, v_N) \in \mathbb{R}^N$ if it can be written as $f(\mathbf{v}) = \prod_{i=1}^{N} v_i^{\beta_i}$, for $\beta_i \in \mathbb{N}$, $\forall i \in [1:N]$.)

[Fig. 2: Layers 1 through $K$; Layer $\ell$ carries the signals $x_{k,\ell}$ of Transmitters $k \in [\ell:K]$.]

Fig. 2. The structure of the multi-layer interference alignment. The $\ell$th layer is dedicated to the last $(K-\ell+1)$ users, from User $\ell$ to User $K$, for $\ell \in [1:K]$. For Transmitter $k$, the transmitted signal is a superposition of the signals dedicated to the first $k$ layers, and $x_{k,\ell}$ is the signal dedicated to the $\ell$th layer, for $\ell \in [1:k]$, $k \in [1:K]$.

For Transmitter $k$, the transmitted signal is a superposition of the signals dedicated to the first $k$ layers, designed as
$$x_k = \sum_{\ell=1}^{k} \sqrt{P^{-\alpha_{\ell-1}}}\, x_{k,\ell}, \quad \text{for } x_{k,\ell} = \mathbf{v}_{k,\ell}^{\mathsf{T}}\mathbf{b}_{k,\ell} \tag{22}$$
for $k \in [1:K]$, where $\alpha_0 \triangleq 0$ and $x_{k,\ell}$ is the signal of Transmitter $k$ dedicated to the $\ell$th layer. The vector
$$\mathbf{v}_{k,\ell} \triangleq [v_{k,\ell,1}, v_{k,\ell,2}, \dots, v_{k,\ell,N_\ell}]^{\mathsf{T}} \in \mathbb{R}^{N_\ell \times 1} \tag{23}$$
will be specified later on, where $N_\ell$ is designed as
$$N_\ell \triangleq \begin{cases} m^{K_\ell(K_\ell-1)} & \text{if } \ell \in [1:K-2] \quad (24a)\\ 1 & \text{if } \ell \in [K-1:K] \quad (24b)\end{cases}$$
for some $m \in \mathbb{Z}^+$. The vector
$$\mathbf{b}_{k,\ell} \triangleq [b_{k,\ell,1}, b_{k,\ell,2}, \dots, b_{k,\ell,N_\ell}]^{\mathsf{T}} \tag{25}$$
is an information vector for the $\ell$th layer, whose elements $\{b_{k,\ell,i}\}_{i=1}^{N_\ell}$ are independent random variables uniformly drawn from the PAM constellation set
$$b_{k,\ell,i} \in \Omega\big(\xi = \gamma/Q_\ell,\ Q = Q_\ell\big), \quad i \in [1:N_\ell],\ k \in [\ell:K],\ \ell \in [1:K] \tag{26}$$
where $\gamma$ is a positive constant, and $Q_\ell$ is defined as
$$Q_\ell \triangleq P^{\lambda_\ell/2}, \quad \ell \in [1:K]. \tag{27}$$

(Footnote: Without loss of generality we assume that $P^{\lambda_\ell/2}$ is an integer, for $\ell \in [1:K]$. When $P^{\lambda_\ell/2}$ is not an integer, the parameter $\epsilon$ in (28a) and (28b) can be slightly modified so that $P^{\lambda_\ell/2}$ is an integer, in the regime of large $P$.)
The parameter $\lambda_\ell$ is designed as
$$\lambda_\ell \triangleq \begin{cases} \dfrac{\alpha_\ell - \alpha_{\ell-1}}{M_\ell} - \epsilon & \text{if } \ell \in [1:K-2] \quad (28a)\\[2ex] \dfrac{\alpha_\ell - \alpha_{\ell-1}}{K-\ell+1} - \epsilon & \text{if } \ell \in [K-1:K] \quad (28b)\end{cases}$$
for
$$M_\ell \triangleq 2m^{K_\ell(K_\ell-1)} + (K_\ell-1)\,m^{K_\ell(K_\ell-1)-1} - 1 \tag{29}$$
and for some small enough $\epsilon > 0$. As we will see later on, $\lambda_\ell$ represents the GDoF carried by each of the symbols $\{b_{k,\ell,i}\}_{i,k}$. In our scheme, when $\alpha_\ell = \alpha_{\ell-1}$, the $\ell$th layer can simply be removed without affecting the GDoF performance, i.e., the signal $x_{k,\ell}$ is set as $x_{k,\ell} = 0$, $\forall k$. Without loss of generality, we therefore focus on the case $\alpha_\ell > \alpha_{\ell-1}$, $\forall \ell$.

Let us now design the vectors $\mathbf{v}_{k,\ell}$ for each layer. The design of $\mathbf{v}_{k,\ell}$ for the last two layers is very straightforward. Note that the $(K-1)$th layer is dedicated to User $K-1$ and User $K$, while the $K$th layer is dedicated to User $K$ only. Therefore, we set $v_{K-1,K-1,1} = v_{K,K-1,1} = v_{K,K,1} = 1$. Recall that $N_{K-1} = N_K = 1$ (see (24b)). In the following, we design the vectors $\mathbf{v}_{k,\ell}$ for the $\ell$th layer, $\ell \in [1:K-2]$. For the $\ell$th layer, dedicated to the last $K_\ell$ users, we define a set of dimensions as
$$\mathcal{V}_{\ell,m} \triangleq \Big\{\prod_{j=\ell}^{K}\prod_{\substack{i=\ell\\ i\ne j}}^{K} h_{ij}^{\beta_{ij}} : \beta_{ij} \in [0:m-1]\Big\}, \quad \ell \in [1:K-2]. \tag{30}$$
Note that $\mathcal{V}_{\ell,m}$ consists of $N_\ell$ rationally independent real numbers (for almost all realizations of the channel coefficients), where $N_\ell = m^{K_\ell(K_\ell-1)}$ for $\ell \in [1:K-2]$. In our scheme, we let $\mathbf{v}_{k,\ell}$ be the vector containing all the elements of $\mathcal{V}_{\ell,m}$, i.e.,
$$v_{k,\ell,i} = \mathcal{V}_{\ell,m}(i), \quad i \in [1:N_\ell],\ k \in [\ell:K],\ \ell \in [1:K-2] \tag{31}$$
where $\mathcal{V}_{\ell,m}(i)$ denotes the $i$th element of the set $\mathcal{V}_{\ell,m}$.

Based on our design, Lemma 6 below shows that the average power of each transmitted signal is upper bounded by $\gamma^2\eta$, where $\eta$ is a positive value independent of $P$, and $\gamma$ is the positive constant appearing in (26). Thus, by choosing $\gamma$ as a constant that is bounded away from zero and no more than $1/\sqrt{\eta}$, i.e., $\gamma \in (0, 1/\sqrt{\eta}]$, the average power constraint is satisfied, that is, $\mathbb{E}|x_k|^2 \le 1$ for $k \in [1:K]$.

Lemma 6.

Based on the signal design in (22)-(30), the average power of the transmitted signal at Transmitter $k$, $k \in [1:K]$, satisfies
$$\mathbb{E}|x_k|^2 \le \gamma^2\eta \tag{32}$$
where $\eta$ is a positive value independent of $P$.

Proof. See Appendix C-A.

(Footnote: We say that $p_1, p_2, \dots, p_M$ are rationally independent if the only $M$-tuple of integers $q_1, q_2, \dots, q_M$ such that $\sum_{i=1}^{M} p_i q_i = 0$ is the trivial solution in which every $q_i$ is zero.)

B. Successive decoding

The decoding is based on successive decoding. The idea is to decode the signals of one layer by treating the lower layers as noise, and then remove them to decode the signals of the next layer. The signals decoded in one layer include the desired signals and the interference signals, the latter possibly in an aligned form.

Let us first focus on the decoding for the first $K-2$ layers, and then discuss the decoding for the last two layers. For the $\ell$th layer, $\ell \in [1:K-2]$, based on the above design of multi-layer interference alignment, at Receiver $k$, $k \in [\ell:K]$, the interference signals are aligned into a set of dimensions denoted by $\mathcal{I}_{k,\ell}$, where
$$\mathcal{I}_{k,\ell} = \bigcup_{\substack{l \in [\ell:K]\\ l \ne k}} \Big\{h_{kl}^{m} \cdot \prod_{\substack{i,j \in [\ell:K],\ i\ne j\\ (i,j) \ne (k,l)}} h_{ij}^{\beta_{ij}} : \beta_{ij} \in [0:m-1]\Big\} \ \bigcup\ \big(\mathcal{V}_{\ell,m} \setminus \{1\}\big) \tag{33}$$
which satisfies $\mathcal{I}_{k,\ell} \subset \mathcal{V}_{\ell,m+1}$ and $|\mathcal{I}_{k,\ell}| = m^{K_\ell(K_\ell-1)} + (K_\ell-1)m^{K_\ell(K_\ell-1)-1} - 1 = M_\ell - N_\ell$; while the desired signals lie in a set of dimensions denoted by $\mathcal{S}_{k,\ell}$, where
$$\mathcal{S}_{k,\ell} = h_{kk}\mathcal{V}_{\ell,m} = \Big\{h_{kk}\prod_{j=\ell}^{K}\prod_{\substack{i=\ell\\ i\ne j}}^{K} h_{ij}^{\beta_{ij}} : \beta_{ij} \in [0:m-1]\Big\} \tag{34}$$
which satisfies $|\mathcal{S}_{k,\ell}| = m^{K_\ell(K_\ell-1)} = N_\ell$. Note that $h_{kk}$ does not appear in any dimension of $\mathcal{I}_{k,\ell}$, whereas it appears in every dimension of $\mathcal{S}_{k,\ell}$. Hence the elements of $\mathcal{I}_{k,\ell} \cup \mathcal{S}_{k,\ell}$ are $M_\ell$ distinct monomials in the channel coefficients, and they are rationally independent for almost all channel realizations.

For the successive decoding at the $\ell$th layer, $\ell \in [1:K-2]$, at Receiver $k$, $k \in [\ell:K]$, the goal is to decode the desired information vector $\mathbf{b}_{k,\ell}$ (see (25)), as well as the interference at that layer, given that the decoding of the previous layers is complete.
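The cardinalities claimed for (30), (33) and (34) can be checked by brute force for small parameters. The Python sketch below is our own illustration with hypothetical names: it represents each monomial $\prod h_{ij}^{\beta_{ij}}$ by its exponent vector over the generators $h_{ij}$, $i \ne j$, plus one extra exponent for $h_{kk}$, relabels the $K_\ell$ users of the layer as $0, \dots, K_\ell-1$, and builds the set of actually aligned interference dimensions $\bigcup_{l \ne k} h_{kl}\mathcal{V}_{\ell,m}$. It then checks that these lie inside $\mathcal{I}_{k,\ell}$ as written in (33), that $|\mathcal{I}_{k,\ell}| = m^{K_\ell(K_\ell-1)} + (K_\ell-1)m^{K_\ell(K_\ell-1)-1} - 1$, that $\mathcal{I}_{k,\ell}$ and $\mathcal{S}_{k,\ell}$ share no monomial, and that $|\mathcal{S}_{k,\ell}| + |\mathcal{I}_{k,\ell}|$ equals $M_\ell$ under our reading of (29), namely $M_\ell = 2m^{K_\ell(K_\ell-1)} + (K_\ell-1)m^{K_\ell(K_\ell-1)-1} - 1$.

```python
from itertools import product


def dimension_sets(Kl, m, k=0):
    """Exponent-vector model of V_{l,m} (30), I_{k,l} (33) and S_{k,l} (34).

    The Kl users of layer l are relabeled 0..Kl-1 (illustrative assumption).
    A monomial is a tuple of exponents over the generators h_ij (i != j),
    followed by one extra slot holding the exponent of h_kk.
    """
    gens = [(i, j) for j in range(Kl) for i in range(Kl) if i != j]
    G = len(gens)  # Kl (Kl - 1) generators
    V = {beta + (0,) for beta in product(range(m), repeat=G)}

    # Interference actually seen at receiver k: union over l != k of h_kl * V
    aligned = set()
    for l in range(Kl):
        if l != k:
            pos = gens.index((k, l))
            for v in V:
                w = list(v)
                w[pos] += 1
                aligned.add(tuple(w))

    # I_{k,l} as in (33): exponent m on one h_kl (others < m), plus V without the monomial 1
    I = {v for v in V if any(v)}
    for l in range(Kl):
        if l != k:
            pos = gens.index((k, l))
            for beta in product(range(m), repeat=G - 1):
                I.add(beta[:pos] + (m,) + beta[pos:] + (0,))

    # S_{k,l} = h_kk * V as in (34): set the h_kk exponent to 1
    S = {v[:-1] + (1,) for v in V}
    return G, V, aligned, I, S


for Kl, m in [(2, 3), (3, 2)]:
    G, V, aligned, I, S = dimension_sets(Kl, m)
    N = m ** G
    M = 2 * m ** G + (Kl - 1) * m ** (G - 1) - 1        # our reading of (29)
    assert len(V) == len(S) == N
    assert aligned <= I                                  # interference stays inside I_{k,l}
    assert len(I) == m ** G + (Kl - 1) * m ** (G - 1) - 1
    assert max(max(v) for v in I) <= m                   # I_{k,l} is contained in V_{l,m+1}
    assert not (I & S)                                   # desired and interference monomials differ
    assert len(I) + len(S) == M
    print(f"K_l={Kl}, m={m}: N_l={N}, M_l={M}, N_l/M_l={N / M:.3f}")
```

As $m$ grows, $N_\ell/M_\ell \to 1/2$, which (with $\epsilon \to 0$ in (28a)) is consistent with each user of layer $\ell$ obtaining about half of that layer's GDoF $\alpha_\ell - \alpha_{\ell-1}$.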
For the $\ell$th layer, $\ell \in [1:K-2]$, assuming that the decoding of the previous layers is complete, Receiver $k$, $k \in [\ell:K]$, has the following observation (the time index is omitted):
$$y_{k,\ell} \triangleq y_k - \underbrace{\sum_{l=1}^{\ell-1}\sum_{j=l}^{K}\sqrt{P^{\alpha_k-\alpha_{l-1}}}\,h_{kj}\,\mathbf{v}_{j,l}^{\mathsf{T}}\mathbf{b}_{j,l}}_{\text{side information from previous layers}} \tag{35}$$
where the term $\sum_{l=1}^{\ell-1}\sum_{j=l}^{K}\sqrt{P^{\alpha_k-\alpha_{l-1}}}\,h_{kj}\mathbf{v}_{j,l}^{\mathsf{T}}\mathbf{b}_{j,l}$ is constructed from the side information about the desired signals and interference obtained from decoding the previous layers, with $\sum_{l=1}^{0} s_l \triangleq 0$ for any $s_l \in \mathbb{R}$. When $\ell = 1$, this term is zero. Let us expand $y_{k,\ell}$ from (35) into the following expression:
\begin{align}
y_{k,\ell} &= \sum_{l=1}^{K}\sum_{j=l}^{K}\sqrt{P^{\alpha_k-\alpha_{l-1}}}\,h_{kj}\mathbf{v}_{j,l}^{\mathsf{T}}\mathbf{b}_{j,l} + z_k - \sum_{l=1}^{\ell-1}\sum_{j=l}^{K}\sqrt{P^{\alpha_k-\alpha_{l-1}}}\,h_{kj}\mathbf{v}_{j,l}^{\mathsf{T}}\mathbf{b}_{j,l} \nonumber\\
&= \underbrace{\sqrt{P^{\alpha_k-\alpha_{\ell-1}}}\,h_{kk}\mathbf{v}_{k,\ell}^{\mathsf{T}}\mathbf{b}_{k,\ell}}_{\triangleq S_{k,\ell},\ \text{desired signal}} + \underbrace{\sum_{\substack{j=\ell\\ j\ne k}}^{K}\sqrt{P^{\alpha_k-\alpha_{\ell-1}}}\,h_{kj}\mathbf{v}_{j,\ell}^{\mathsf{T}}\mathbf{b}_{j,\ell}}_{\triangleq I_{k,\ell},\ \text{interference}} + \underbrace{\sum_{l=\ell+1}^{K}\sum_{j=l}^{K}\sqrt{P^{\alpha_k-\alpha_{l-1}}}\,h_{kj}\mathbf{v}_{j,l}^{\mathsf{T}}\mathbf{b}_{j,l}}_{\triangleq T_{k,\ell},\ \text{treated as noise}} + z_k \tag{36}
\end{align}
where
$$S_{k,\ell} \triangleq \sqrt{P^{\alpha_k-\alpha_{\ell-1}}}\,h_{kk}\mathbf{v}_{k,\ell}^{\mathsf{T}}\mathbf{b}_{k,\ell}, \quad I_{k,\ell} \triangleq \sum_{\substack{j=\ell\\ j\ne k}}^{K}\sqrt{P^{\alpha_k-\alpha_{\ell-1}}}\,h_{kj}\mathbf{v}_{j,\ell}^{\mathsf{T}}\mathbf{b}_{j,\ell}, \quad T_{k,\ell} \triangleq \sum_{l=\ell+1}^{K}\sum_{j=l}^{K}\sqrt{P^{\alpha_k-\alpha_{l-1}}}\,h_{kj}\mathbf{v}_{j,l}^{\mathsf{T}}\mathbf{b}_{j,l} \tag{37}$$
for $k \in [\ell:K]$, $\ell \in [1:K-2]$. From the above expression, $y_{k,\ell}$ can be expanded into four terms: $S_{k,\ell}$, $I_{k,\ell}$, $T_{k,\ell}$ and noise.
For Receiver $k$, $S_{k,\ell}$ is the term containing the desired information at Layer $\ell$; $I_{k,\ell}$ represents the interference at Layer $\ell$; and $T_{k,\ell}$ denotes the term containing signals dedicated to the subsequent layers, which is treated as noise. The term $S_{k,\ell}$ can be rewritten in the following form:
$$S_{k,\ell} = \gamma\sqrt{P^{\alpha_k-\alpha_{\ell-1}-\lambda_\ell}}\sum_{i=1}^{|\mathcal{S}_{k,\ell}|}\mathcal{S}_{k,\ell}(i)\,q_{k,\ell,i}, \quad \text{for } q_{k,\ell,1}, \dots, q_{k,\ell,|\mathcal{S}_{k,\ell}|} \in [-Q_\ell : Q_\ell] \tag{38}$$
where $Q_\ell$ and $\lambda_\ell$ are defined in (27), (28a) and (28b). From (26), it holds that $q_{k,\ell,i} \triangleq b_{k,\ell,i} \cdot P^{\lambda_\ell/2}/\gamma \in [-Q_\ell : Q_\ell]$, for $i \in [1:N_\ell]$, $k \in [\ell:K]$, $\ell \in [1:K-2]$. Similarly, the interference term $I_{k,\ell}$ can be expressed as
$$I_{k,\ell} = \gamma\sqrt{P^{\alpha_k-\alpha_{\ell-1}-\lambda_\ell}}\sum_{i=1}^{|\mathcal{I}_{k,\ell}|}\mathcal{I}_{k,\ell}(i)\,q'_{k,\ell,i}, \quad \text{for } q'_{k,\ell,1}, \dots, q'_{k,\ell,|\mathcal{I}_{k,\ell}|} \in [-K_\ell Q_\ell : K_\ell Q_\ell]. \tag{39}$$
Note that if PAM signals lie in the same dimension, their sum is still a PAM signal (see (19)). In the above expression, $q'_{k,\ell,i}$ represents the sum of the normalized PAM signals (normalized by $\gamma P^{-\lambda_\ell/2}$) lying in the dimension $\mathcal{I}_{k,\ell}(i)$, and thus $q'_{k,\ell,i} \in [-K_\ell Q_\ell : K_\ell Q_\ell]$ for $i \in [1:|\mathcal{I}_{k,\ell}|]$, $k \in [\ell:K]$, $\ell \in [1:K-2]$.
At this layer, the goal is to decode $q_{k,\ell,1}, \dots, q_{k,\ell,|\mathcal{S}_{k,\ell}|}, q'_{k,\ell,1}, \dots, q'_{k,\ell,|\mathcal{I}_{k,\ell}|}$ from $y_{k,\ell}$ by treating $T_{k,\ell}$ as noise.

Let us now focus on the minimum distance of the constellation for the signal $S_{k,\ell} + I_{k,\ell}$, defined by
$$d_{\min}(k,\ell) \triangleq \min_{\substack{q_{k,\ell,i} \in [-Q_\ell : Q_\ell],\ q'_{k,\ell,i} \in [-K_\ell Q_\ell : K_\ell Q_\ell]\\ (q_{k,\ell,1}, \dots, q_{k,\ell,|\mathcal{S}_{k,\ell}|}, q'_{k,\ell,1}, \dots, q'_{k,\ell,|\mathcal{I}_{k,\ell}|}) \ne (0, 0, \dots, 0)}} \gamma\sqrt{P^{\alpha_k-\alpha_{\ell-1}-\lambda_\ell}}\,\Big|\sum_{i=1}^{|\mathcal{S}_{k,\ell}|}\mathcal{S}_{k,\ell}(i)\,q_{k,\ell,i} + \sum_{i=1}^{|\mathcal{I}_{k,\ell}|}\mathcal{I}_{k,\ell}(i)\,q'_{k,\ell,i}\Big| \tag{40}$$
for $k \in [\ell:K]$, $\ell \in [1:K-2]$. Lemma 7 (shown at the end of this section) provides a lower bound on the minimum distance $d_{\min}(k,\ell)$ defined in (40), while Lemma 8 (also shown at the end of this section) provides an upper bound on the term $T_{k,\ell}$ appearing in (36). Let us go back to the expression of $y_{k,\ell}$ (see (36)), that is,
$$y_{k,\ell} = S_{k,\ell} + I_{k,\ell} + T_{k,\ell} + z_k \tag{41}$$
for $k \in [\ell:K]$, $\ell \in [1:K-2]$. From Lemma 8, $T_{k,\ell}$ is upper bounded by $T_{k,\ell} \le P^{\frac{\alpha_k-\alpha_\ell}{2}} \cdot \delta_{k,\ell}$, where $\delta_{k,\ell}$ is a positive value independent of $P$.
From Lemma 7, the minimum distance of the constellation of the signal $S_{k,\ell}+I_{k,\ell}$ is lower bounded by $d_{\min}(k,\ell)\ge\kappa'\sqrt{P^{\alpha_k-\alpha_\ell+\epsilon_\ell}}$ for any small enough $\epsilon_\ell>0$, where $\kappa'$ is a positive constant. Hence the minimum distance exceeds the bound on the residual term $T_{k,\ell}$ by a factor that grows as $P^{\epsilon_\ell/2}$, while the noise $z_k$ has unit variance. Therefore, one can show that $q_{k,\ell,1},\dots,q_{k,\ell,|\mathcal{S}_{k,\ell}|},q'_{k,\ell,1},\dots,q'_{k,\ell,|\mathcal{I}_{k,\ell}|}$ can be decoded from $y_{k,\ell}$ by treating $T_{k,\ell}$ as noise, with vanishing error probability as $P\to\infty$. At this point, at Layer $\ell$, Receiver $k$ has decoded the information vector $\mathbf{b}_{k,\ell}$, and it can reconstruct the interference $I_{k,\ell}$ from the side information $q'_{k,\ell,1},\dots,q'_{k,\ell,|\mathcal{I}_{k,\ell}|}$, for $k\in[\ell:K]$, $\ell\in[1:K-2]$. Once the decoding at Layer $\ell$ is complete, Receiver $k$ removes the reconstructed $S_{k,\ell}$ and $I_{k,\ell}$ from $y_{k,\ell}$ and then proceeds to the decoding at the next layer, i.e., Layer $\ell+1$, for $k\in[\ell+1:K]$, $\ell+1\in[2:K-1]$.

The decoding at the last two layers is straightforward. Note that the $(K-1)$th layer is dedicated to User $K-1$ and User $K$, while the $K$th layer is dedicated to User $K$ only. Recall that $N_{K-1}=N_K=1$, $v_{K-1,K-1,1}=v_{K,K-1,1}=v_{K,K,1}=1$, and
$$x_{K-1,K-1}=b_{K-1,K-1,1}\in\Omega(\xi=\gamma/Q_{K-1},\,Q=Q_{K-1}),\quad x_{K,K-1}=b_{K,K-1,1}\in\Omega(\xi=\gamma/Q_{K-1},\,Q=Q_{K-1}),\quad x_{K,K}=b_{K,K,1}\in\Omega(\xi=\gamma/Q_K,\,Q=Q_K)$$
for $Q_{K-1}\triangleq P^{\frac{(\alpha_{K-1}-\alpha_{K-2})/2-\epsilon}{2}}$ and $Q_K\triangleq P^{\frac{\alpha_K-\alpha_{K-1}-\epsilon}{2}}$. Once the decoding of the first $K-2$ layers is complete, both Receiver $K-1$ and Receiver $K$ remove all the desired and interference signals dedicated to the first $K-2$ layers from their received observations.
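The role of the minimum distance in this argument can be illustrated on a toy one-dimensional constellation. The Python sketch below is not the scheme of this paper: the two "dimensions" $1$ and $\sqrt{2}$ and the integer ranges are hypothetical stand-ins for the monomials in (38)-(39) and for the ranges $[-Q_\ell:Q_\ell]$ and $[-K_\ell Q_\ell:K_\ell Q_\ell]$. It enumerates the constellation, measures the smallest gap between distinct points directly, and shows that minimum-distance decoding returns the transmitted integer vector whenever the perturbation is smaller than half that gap, which is roughly the mechanism used above with $T_{k,\ell}+z_k$ as the perturbation.

```python
import itertools
import math

def constellation(coeffs, ranges):
    """Map each integer vector q (q[i] in [-ranges[i], ranges[i]]) to the real
    point sum_i coeffs[i] * q[i]."""
    grids = [range(-r, r + 1) for r in ranges]
    return {q: sum(c * x for c, x in zip(coeffs, q)) for q in itertools.product(*grids)}

def min_distance(points):
    """Smallest gap between two distinct constellation points (0.0 if two vectors collide)."""
    values = sorted(points.values())
    return min(b - a for a, b in zip(values, values[1:]))

def decode(y, points):
    """Minimum-distance decoding: the integer vector whose point is closest to y."""
    return min(points, key=lambda q: abs(y - points[q]))

# Hypothetical toy setting: a "desired" dimension (coefficient 1, range Q = 3) and an
# "aligned interference" dimension (coefficient sqrt(2), range 2Q = 6).
coeffs = [1.0, math.sqrt(2.0)]
ranges = [3, 6]
pts = constellation(coeffs, ranges)
d = min_distance(pts)

sent = (2, -5)
y = pts[sent] + 0.4 * d      # any perturbation smaller than d/2 in magnitude is tolerated
print(len(pts), round(d, 4), decode(y, pts) == sent)   # prints: 91 0.1716 True
```

Because $\sqrt{2}$ is irrational, distinct integer vectors never collide, so the gap is strictly positive; in the paper, Lemma 7 plays the role of quantifying how this gap scales with $P$.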
After that, for the $(K-1)$th layer, the decoding problem is simply equivalent to decoding two symbols over a $2\times2$ interference channel with sum GDoF $\alpha_{K-1}-\alpha_{K-2}$, where the SNR of this channel is $P^{\alpha_{K-1}-\alpha_{K-2}}$. One can show that these two symbols can be decoded at both Receiver $K-1$ and Receiver $K$ with vanishing error probability as $P\to\infty$. Receiver $K$ then removes the decoded symbols and decodes its single remaining symbol at the last layer. This completes the decoding.

After successive decoding of all the layers, Receiver $k$, $k\in[1:K]$, is able to decode all of the following PAM symbols
$$b_{k,\ell,i}\in\Omega\big(\xi=\gamma P^{-\lambda_\ell/2},\,Q=P^{\lambda_\ell/2}\big),\qquad\forall i\in[1:N_\ell],\ \ell\in[1:k] \tag{42}$$
where $\lambda_\ell$ is defined in (28a) and (28b). Since $b_{k,\ell,i}$ is drawn independently and uniformly from the corresponding PAM constellation $\Omega(\xi=\gamma P^{-\lambda_\ell/2},Q=P^{\lambda_\ell/2})$, it carries
$$H(b_{k,\ell,i})=\log\big(1+2P^{\lambda_\ell/2}\big)=\frac{\lambda_\ell}{2}\log P+o(\log P) \tag{43}$$
bits of information, for $i\in[1:N_\ell]$, $\ell\in[1:k]$, $k\in[1:K]$.
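Two PAM facts are used in this section: the entropy in (43) and, later, the second moment that enters the power analysis of Lemma 6 in Appendix C. The sketch below checks both on concrete constellations. It assumes our reading of $\Omega(\xi=\gamma/Q,Q)$ as the set $\{\xi a: a\in\mathbb{Z},|a|\le Q\}$ with an integer $Q$; the helper names and the values $P=2^{40}$, $\lambda=1/2$ are hypothetical choices for illustration.

```python
import math
from fractions import Fraction

def pam_points(gamma, Q):
    """Points of Omega(xi = gamma / Q, Q), read here as {xi * a : a integer, |a| <= Q}."""
    xi = Fraction(gamma) / Q
    return [xi * a for a in range(-Q, Q + 1)]

def pam_entropy_bits(Q):
    """Entropy (bits) of a symbol drawn uniformly from the 2Q + 1 points."""
    return math.log2(2 * Q + 1)

def pam_second_moment(gamma, Q):
    """Exact E|b|^2 for b uniform on pam_points(gamma, Q)."""
    pts = pam_points(gamma, Q)
    return sum(p * p for p in pts) / len(pts)

# Second moment equals gamma^2 * Q(Q+1)/(3Q^2), which stays below gamma^2.
for Q in range(1, 50):
    m2 = pam_second_moment(1, Q)
    assert m2 == Fraction(Q * (Q + 1), 3 * Q * Q) and m2 < 1

# Entropy: with Q = P^{lambda/2}, log2(1 + 2Q) = (lambda/2) log2 P + log2(2 + 1/Q),
# and the second term stays bounded, i.e. it is o(log P).
P_log2, lam = 40, 0.5                 # hypothetical values: P = 2^40, lambda = 1/2
Q = 2 ** int(lam / 2 * P_log2)        # Q = 2^10 = 1024
excess = pam_entropy_bits(Q) - lam / 2 * P_log2
print(pam_second_moment(1, 4), round(excess, 4))   # prints: 5/12 1.0007
```

Exact `Fraction` arithmetic is used for the second moment so that the comparison with $Q(Q+1)/(3Q^2)$ is an equality test rather than a floating-point tolerance check.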
Summing the information carried by all the symbols of all the users, and noting that these symbols are sent over a single channel use, it follows that, for almost all realizations of the channel coefficients, the following sum rate is achievable when $P$ is large:
$$\begin{aligned}
R_{\mathrm{sum}}=\sum_{k=1}^K R_k&=\sum_{k=1}^K\sum_{\ell=1}^k\sum_{i=1}^{N_\ell}H(b_{k,\ell,i})=\sum_{k=1}^K\sum_{\ell=1}^k\sum_{i=1}^{N_\ell}\Big(\frac{\lambda_\ell}{2}\log P+o(\log P)\Big) &&(44)\\
&=\sum_{\ell=1}^K\sum_{k=\ell}^K\frac{N_\ell\lambda_\ell}{2}\log P+o(\log P)=\sum_{\ell=1}^K\frac{N_\ell\lambda_\ell(K-\ell+1)}{2}\log P+o(\log P)\\
&=\sum_{\ell=1}^{K-2}\frac{N_\ell(K-\ell+1)\big(\frac{\alpha_\ell-\alpha_{\ell-1}}{M_\ell}-\epsilon\big)}{2}\log P+\frac{2\big(\frac{\alpha_{K-1}-\alpha_{K-2}}{2}-\epsilon\big)}{2}\log P+\frac{\alpha_K-\alpha_{K-1}-\epsilon}{2}\log P+o(\log P) &&(45)
\end{aligned}$$
where (44) follows from (43). Recall that $\lambda_\ell=\frac{\alpha_\ell-\alpha_{\ell-1}}{M_\ell}-\epsilon$ if $\ell\in[1:K-2]$, and $\lambda_\ell=\frac{\alpha_\ell-\alpha_{\ell-1}}{K-\ell+1}-\epsilon$ if $\ell\in[K-1:K]$. Dividing each side of (45) by $\frac12\log P$ and letting $P\to\infty$ and $\epsilon\to0$ shows that, for almost all realizations of the channel coefficients, the following sum GDoF is achievable:
$$d^{\mathrm{achievable}}_{\mathrm{sum}}(\boldsymbol{\alpha})=\sum_{\ell=1}^{K-2}(K-\ell+1)(\alpha_\ell-\alpha_{\ell-1})\frac{N_\ell}{M_\ell}+\frac{2(\alpha_{K-1}-\alpha_{K-2})}{2}+\alpha_K-\alpha_{K-1}. \tag{46}$$
Note that when $\ell\in[1:K-2]$, the ratio $N_\ell/M_\ell$, with $N_\ell=m^{K_\ell(K_\ell-1)}=|\mathcal{S}_{k,\ell}|$ and $M_\ell=|\mathcal{S}_{k,\ell}|+|\mathcal{I}_{k,\ell}|$, converges to $\frac12$ for large enough $m$.
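Replacing $N_\ell/M_\ell$ by its limit $\frac12$ in (46), the layer-by-layer sum should collapse to the closed form $\frac12\big(\sum_{k=1}^K\alpha_k+\alpha_K-\alpha_{K-1}\big)$. The short Python check below compares the two expressions with exact rational arithmetic for random ordered $\boldsymbol\alpha$ vectors. The function names are ours, and the list index $0$ holds $\alpha_0=0$, following the convention used in this paper.

```python
import random
from fractions import Fraction

def gdof_layers(alpha):
    """Per-layer terms of (46) with N_l/M_l replaced by its limit 1/2.

    alpha is [alpha_0, alpha_1, ..., alpha_K] with alpha_0 = 0."""
    K = len(alpha) - 1
    # layers 1..K-2: (K - l + 1)(alpha_l - alpha_{l-1}) / 2
    total = sum(Fraction(K - l + 1, 2) * (alpha[l] - alpha[l - 1]) for l in range(1, K - 1))
    total += alpha[K - 1] - alpha[K - 2]   # layer K-1: 2(alpha_{K-1} - alpha_{K-2}) / 2
    total += alpha[K] - alpha[K - 1]       # layer K
    return total

def gdof_closed_form(alpha):
    K = len(alpha) - 1
    return (sum(alpha[1:]) + alpha[K] - alpha[K - 1]) / 2

rng = random.Random(1)
for K in range(3, 20):
    alpha = [Fraction(0)] + sorted(Fraction(rng.randint(1, 100), 100) for _ in range(K))
    assert gdof_layers(alpha) == gdof_closed_form(alpha)

print(gdof_layers([Fraction(0)] + [Fraction(1)] * 5))   # symmetric K = 5, prints 5/2
```

The symmetric case $\alpha_1=\cdots=\alpha_K=1$ recovers $K/2$, matching the known result for the symmetric $K$-user interference channel.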
Therefore, for large enough $m$, the achievable sum GDoF in (46) simplifies to
$$d^{\mathrm{achievable}}_{\mathrm{sum}}(\boldsymbol{\alpha})=\sum_{\ell=1}^{K-2}\frac{(K-\ell+1)(\alpha_\ell-\alpha_{\ell-1})}{2}+\frac{2(\alpha_{K-1}-\alpha_{K-2})}{2}+\alpha_K-\alpha_{K-1}=\frac{\sum_{k=1}^K\alpha_k+\alpha_K-\alpha_{K-1}}{2} \tag{47}$$
which holds for almost all realizations of the channel coefficients. The last equality follows from the telescoping identity $\sum_{\ell=1}^{K}(K-\ell+1)(\alpha_\ell-\alpha_{\ell-1})=\sum_{k=1}^{K}\alpha_k$ with $\alpha_0=0$. This completes the achievability proof of Theorem 1. The two lemmas used in the GDoF analysis are provided below.

Lemma 7.

Consider the minimum distance $d_{\min}(k,\ell)$ defined in (40). For almost all realizations of the channel coefficients, and for any small enough $\epsilon_\ell>0$, there exists a positive constant $\kappa'$ such that
$$d_{\min}(k,\ell)\ge\kappa'\sqrt{P^{\alpha_k-\alpha_\ell+\epsilon_\ell}}$$
for $k\in[\ell:K]$, $\ell\in[1:K-2]$.

Proof. See Appendix C-B. The proof uses the Khintchine-Groshev Theorem for Monomials.
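The exponent bookkeeping behind this bound (carried out in (90) of Appendix C-B) reduces to an algebraic identity: with $\lambda_\ell=\frac{\alpha_\ell-\alpha_{\ell-1}}{M_\ell}-\epsilon$, the exponent of $P$ in $P^{(\alpha_k-\alpha_{\ell-1})/2}/(P^{\lambda_\ell/2})^{M_\ell+\epsilon}$ equals $\frac{\alpha_k-\alpha_\ell+\epsilon_\ell}{2}$, where $\epsilon_\ell=\epsilon\big(M_\ell+\epsilon-\frac{\alpha_\ell-\alpha_{\ell-1}}{M_\ell}\big)$. The sketch below checks this identity, and the positivity of $\epsilon_\ell$, on random exact rational inputs; it says nothing about the Khintchine-Groshev step itself, and the sampled ranges for $M_\ell$ and $\epsilon$ are arbitrary choices of ours.

```python
import random
from fractions import Fraction

def exponent_before(a_k, a_l, a_prev, M, eps):
    """Exponent of P in P^{(a_k - a_prev)/2} / (P^{lambda/2})^{M + eps},
    with lambda = (a_l - a_prev)/M - eps."""
    lam = (a_l - a_prev) / M - eps
    return (a_k - a_prev) / 2 - lam * (M + eps) / 2

def eps_ell(a_l, a_prev, M, eps):
    """epsilon_l = eps * (M + eps - (a_l - a_prev)/M), as defined in Appendix C-B."""
    return eps * (M + eps - (a_l - a_prev) / M)

rng = random.Random(7)
for _ in range(1000):
    # alpha_{l-1} <= alpha_l <= alpha_k, all in [0, 1]
    a_prev, a_l, a_k = sorted(Fraction(rng.randint(0, 1000), 1000) for _ in range(3))
    M = rng.randint(1, 50)
    eps = Fraction(rng.randint(1, 100), 10_000)
    e_l = eps_ell(a_l, a_prev, M, eps)
    assert exponent_before(a_k, a_l, a_prev, M, eps) == (a_k - a_l + e_l) / 2
    assert e_l > 0
print("exponent identity holds on all samples")
```

Positivity of $\epsilon_\ell$ follows because $M_\ell\ge1$ and $\alpha_\ell-\alpha_{\ell-1}\le1$, so $M_\ell-\frac{\alpha_\ell-\alpha_{\ell-1}}{M_\ell}\ge0$ and the extra $\epsilon>0$ makes the bracket strictly positive.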

Lemma 8.

The term $T_{k,\ell}$ defined in (37) is upper bounded by $|T_{k,\ell}|\le\sqrt{P^{\alpha_k-\alpha_\ell}}\cdot\delta_{k,\ell}$, where $\delta_{k,\ell}$ is a positive value independent of $P$, for $k\in[\ell:K]$, $\ell\in[1:K-2]$.

Proof. See Appendix C-C.

VI. CONCLUSION

This work considered the $K$-user asymmetric interference channel, where different receivers might have different channel gains, parameterized by $0<\alpha_1\le\alpha_2\le\cdots\le\alpha_K\le1$. For this channel, we characterized the optimal sum GDoF as $d_{\mathrm{sum}}=\frac{\sum_{k=1}^K\alpha_k+\alpha_K-\alpha_{K-1}}{2}$. The achievability is based on multi-layer interference alignment and successive decoding. The converse for this asymmetric setting involves bounding the weighted sum GDoF for selected $J+2$ users, $J\in[1:\lceil\log(K/2)\rceil]$, which is very different from the symmetric setting, where it suffices to bound the sum DoF for selected two users. The result of this work generalizes the existing result for the symmetric case to the setting with diverse link strengths.

APPENDIX A
PROOFS OF LEMMAS 2, 3, 4 AND 5, AND CLAIMS 1 AND 2

Recall that
$$\tilde y_{k,\ell}(t)\triangleq\sqrt{P^{\alpha_\ell}}\sum_{i=1}^K h_{ki}x_i(t)+\tilde z_\ell(t),$$
$$\Phi(J)\triangleq2^{\mathcal J-J+1}I(w_J;y^n_J)+\sum_{j=J+1}^{\mathcal J+2}\max\{2^{\mathcal J-j+1},1\}\,I(w_j;\tilde y^n_{J+1,J}\mid\bar W_{[j]}),$$
and $d_0\triangleq0$, $\alpha_0\triangleq0$, $\tilde y^n_{1,0}\triangleq\phi$, $I(w_j;\tilde y^n_{1,0}\mid\bar W_{[j]})\triangleq0$ for all $j$, $I(w_0;y^n_0)\triangleq0$, and $\Phi(0)\triangleq0$, for $J\in[1:\mathcal J-1]$ and $\mathcal J\in[1:\lceil\log(K/2)\rceil]$ (see (6), (7) and (8)).

A. Proof of Lemma 2

The proof is based on the result of Lemma 3. Specifically, Lemma 3 reveals that
$$\Phi(J)\le2^{\mathcal J-J+1}(\alpha_J-\alpha_{J-1})\cdot\frac n2\log P+n\,o(\log P)+\sum_{j=J}^{\mathcal J+2}\max\{2^{\mathcal J-j+1},1\}\,I(w_j;\tilde y^n_{J,J-1}\mid\bar W_{[j]})$$
for $J\in[1:\mathcal J-1]$. By adding $2^{\mathcal J-(J-1)+1}I(w_{J-1};y^n_{J-1})$ to both sides of the above inequality, we have
$$\Phi(J)+2^{\mathcal J-(J-1)+1}I(w_{J-1};y^n_{J-1})\le2^{\mathcal J-J+1}(\alpha_J-\alpha_{J-1})\cdot\frac n2\log P+n\,o(\log P)+\Phi(J-1)$$
which completes the proof of Lemma 2.

B. Proof of Lemma 3

The proof uses the result of Lemma 5. First, we note that the following equality holds:
$$2^{\mathcal J-J+1}I(w_J;y^n_J)+\sum_{j=J+1}^{\mathcal J+2}\max\{2^{\mathcal J-j+1},1\}\,I(w_j;\tilde y^n_{J+1,J}\mid\bar W_{[j]})=\sum_{j=J+1}^{\mathcal J+2}\max\{2^{\mathcal J-j+1},1\}\Big(I(w_J;y^n_J)+I(w_j;\tilde y^n_{J+1,J}\mid\bar W_{[j]})\Big) \tag{48}$$
which follows from the identity $\sum_{j=J+1}^{\mathcal J+2}\max\{2^{\mathcal J-j+1},1\}=2^{\mathcal J-J+1}$, for $J\in[1:\mathcal J-1]$. For the sum of the two mutual information terms on the right-hand side of (48), given $j\in[J+1:\mathcal J+2]$, we have
$$\begin{aligned}
&I(w_J;y^n_J)+I(w_j;\tilde y^n_{J+1,J}\mid\bar W_{[j]})\\
&\le I(w_J;y^n_J,\tilde y^n_{J,J-1},\bar W_{[j,J]})+I(w_j;\tilde y^n_{J+1,J},\tilde y^n_{J,J-1}\mid\bar W_{[j]}) &&(49)\\
&=\underbrace{I(w_J;\tilde y^n_{J,J-1}\mid\bar W_{[j,J]})}_{\le I(w_J;\tilde y^n_{J,J-1}\mid\bar W_{[J]})}+I(w_j;\tilde y^n_{J,J-1}\mid\bar W_{[j]})+\underbrace{I(w_J;y^n_J\mid\tilde y^n_{J,J-1},\bar W_{[j,J]})+I(w_j;\tilde y^n_{J+1,J}\mid\tilde y^n_{J,J-1},\bar W_{[j]})}_{\le(\alpha_J-\alpha_{J-1})\cdot\frac n2\log P+n\,o(\log P)} &&(50)\\
&\le I(w_J;\tilde y^n_{J,J-1}\mid\bar W_{[J]})+I(w_j;\tilde y^n_{J,J-1}\mid\bar W_{[j]})+(\alpha_J-\alpha_{J-1})\cdot\frac n2\log P+n\,o(\log P) &&(51)
\end{aligned}$$
where (49) follows from the fact that adding more information does not reduce mutual information; (50) uses the chain rule and the fact that the messages are mutually independent; and (51) follows from $I(w_J;\tilde y^n_{J,J-1}\mid\bar W_{[j,J]})\le I(w_J;\tilde y^n_{J,J-1},w_j\mid\bar W_{[j,J]})=I(w_J;\tilde y^n_{J,J-1}\mid\bar W_{[J]})$ and from Lemma 5, which gives $I(w_J;y^n_J\mid\tilde y^n_{J,J-1},\bar W_{[j,J]})+I(w_j;\tilde y^n_{J+1,J}\mid\tilde y^n_{J,J-1},\bar W_{[j]})\le(\alpha_J-\alpha_{J-1})\cdot\frac n2\log P+n\,o(\log P)$. Incorporating (51) into (48) gives
$$\begin{aligned}
&2^{\mathcal J-J+1}I(w_J;y^n_J)+\sum_{j=J+1}^{\mathcal J+2}\max\{2^{\mathcal J-j+1},1\}\,I(w_j;\tilde y^n_{J+1,J}\mid\bar W_{[j]})\\
&\le\sum_{j=J+1}^{\mathcal J+2}\max\{2^{\mathcal J-j+1},1\}\Big(I(w_J;\tilde y^n_{J,J-1}\mid\bar W_{[J]})+I(w_j;\tilde y^n_{J,J-1}\mid\bar W_{[j]})+(\alpha_J-\alpha_{J-1})\frac n2\log P+n\,o(\log P)\Big) &&(52)\\
&=2^{\mathcal J-J+1}(\alpha_J-\alpha_{J-1})\cdot\frac n2\log P+n\,o(\log P)+\sum_{j=J}^{\mathcal J+2}\max\{2^{\mathcal J-j+1},1\}\,I(w_j;\tilde y^n_{J,J-1}\mid\bar W_{[j]}) &&(53)
\end{aligned}$$
where (52) follows from (51) and (48), and (53) follows from the identity $\sum_{j=J+1}^{\mathcal J+2}\max\{2^{\mathcal J-j+1},1\}=2^{\mathcal J-J+1}$, for $J\in[1:\mathcal J-1]$. This completes the proof of Lemma 3.

C. Proof of Lemma 4

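The proof uses the result of Lemma 5. In the first step, we expand $2I(w_{\mathcal J};y^n_{\mathcal J})$ as follows:
$$\begin{aligned}
2I(w_{\mathcal J};y^n_{\mathcal J})&\le I(w_{\mathcal J};y^n_{\mathcal J},\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J,\mathcal J+1]})+I(w_{\mathcal J};y^n_{\mathcal J},\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J,\mathcal J+2]}) &&(54)\\
&=I(w_{\mathcal J};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J,\mathcal J+1]})+I(w_{\mathcal J};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J,\mathcal J+2]})\\
&\quad+I(w_{\mathcal J};y^n_{\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J,\mathcal J+1]})+I(w_{\mathcal J};y^n_{\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J,\mathcal J+2]}) &&(55)\\
&\le2I(w_{\mathcal J};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J]})+I(w_{\mathcal J};y^n_{\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J,\mathcal J+1]})+I(w_{\mathcal J};y^n_{\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J,\mathcal J+2]}) &&(56)
\end{aligned}$$
where (54) follows from the fact that adding more information does not reduce mutual information; (55) uses the chain rule and the fact that the messages are mutually independent; and (56) follows from $I(w_{\mathcal J};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J,\ell]})\le I(w_{\mathcal J};\tilde y^n_{\mathcal J,\mathcal J-1},w_\ell\mid\bar W_{[\mathcal J,\ell]})=I(w_{\mathcal J};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J]})$ for $\ell\in[1:K]$, $\ell\neq\mathcal J$.

In the second step, we expand $I(w_{\mathcal J+1};y^n_{\mathcal J+1})+I(w_{\mathcal J+2};y^n_{\mathcal J+2})$ as follows:
$$\begin{aligned}
&I(w_{\mathcal J+1};y^n_{\mathcal J+1})+I(w_{\mathcal J+2};y^n_{\mathcal J+2})\\
&\le I(w_{\mathcal J+1};y^n_{\mathcal J+1},\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+1,\mathcal J+2]})+I(w_{\mathcal J+2};y^n_{\mathcal J+2},\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+2]}) &&(57)\\
&=I(w_{\mathcal J+1};\tilde y^n_{\mathcal J+1,\mathcal J}\mid\bar W_{[\mathcal J+1,\mathcal J+2]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J+1,\mathcal J}\mid\bar W_{[\mathcal J+2]})\\
&\quad+I(w_{\mathcal J+1};y^n_{\mathcal J+1}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+1,\mathcal J+2]})+I(w_{\mathcal J+2};y^n_{\mathcal J+2}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+2]}) &&(58)\\
&\le I(w_{\mathcal J+1};\tilde y^n_{\mathcal J+1,\mathcal J},\tilde y^n_{\mathcal J,\mathcal J-1},w_{\mathcal J+2}\mid\bar W_{[\mathcal J+1,\mathcal J+2]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J+1,\mathcal J},\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J+2]})\\
&\quad+I(w_{\mathcal J+1};y^n_{\mathcal J+1}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+1,\mathcal J+2]})+I(w_{\mathcal J+2};y^n_{\mathcal J+2}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+2]}) &&(59)\\
&=I(w_{\mathcal J+1};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J+1]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J+2]})\\
&\quad+I(w_{\mathcal J+1};\tilde y^n_{\mathcal J+1,\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J+1]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J+1,\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J+2]})\\
&\quad+\underbrace{I(w_{\mathcal J+1};y^n_{\mathcal J+1}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+1,\mathcal J+2]})+I(w_{\mathcal J+2};y^n_{\mathcal J+2}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+2]})}_{\le(\alpha_{\mathcal J+2}-\alpha_{\mathcal J})\cdot\frac n2\log P+n\,o(\log P)} &&(60)\\
&\le(\alpha_{\mathcal J+2}-\alpha_{\mathcal J})\cdot\frac n2\log P+n\,o(\log P)+I(w_{\mathcal J+1};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J+1]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J+2]})\\
&\quad+I(w_{\mathcal J+1};\tilde y^n_{\mathcal J+1,\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J+1]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J+1,\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J+2]}) &&(61)
\end{aligned}$$
where (57) and (59) follow from the fact that adding more information does not reduce mutual information; (58) and (60) use the chain rule and the fact that the messages are mutually independent; and (61) follows from Lemma 5, that is,
$$\begin{aligned}
&I(w_{\mathcal J+1};y^n_{\mathcal J+1}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+1,\mathcal J+2]})+I(w_{\mathcal J+2};y^n_{\mathcal J+2}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+2]})\\
&=I(w_{\mathcal J+1};y^n_{\mathcal J+1}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+1,\mathcal J+2]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J+2,\mathcal J+2}\mid\tilde y^n_{\mathcal J+1,\mathcal J},\bar W_{[\mathcal J+2]})\le(\alpha_{\mathcal J+2}-\alpha_{\mathcal J})\cdot\frac n2\log P+n\,o(\log P).
\end{aligned}$$
By combining (56) and (61), we have
$$\begin{aligned}
&2I(w_{\mathcal J};y^n_{\mathcal J})+I(w_{\mathcal J+1};y^n_{\mathcal J+1})+I(w_{\mathcal J+2};y^n_{\mathcal J+2})\\
&\le2I(w_{\mathcal J};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J]})+I(w_{\mathcal J+1};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J+1]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J+2]})\\
&\quad+\underbrace{I(w_{\mathcal J};y^n_{\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J,\mathcal J+1]})+I(w_{\mathcal J+1};\tilde y^n_{\mathcal J+1,\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J+1]})}_{\le(\alpha_{\mathcal J}-\alpha_{\mathcal J-1})\cdot\frac n2\log P+n\,o(\log P)}\\
&\quad+\underbrace{I(w_{\mathcal J};y^n_{\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J,\mathcal J+2]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J+1,\mathcal J}\mid\tilde y^n_{\mathcal J,\mathcal J-1},\bar W_{[\mathcal J+2]})}_{\le(\alpha_{\mathcal J}-\alpha_{\mathcal J-1})\cdot\frac n2\log P+n\,o(\log P)}+(\alpha_{\mathcal J+2}-\alpha_{\mathcal J})\cdot\frac n2\log P+n\,o(\log P) &&(62)\\
&\le2I(w_{\mathcal J};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J]})+I(w_{\mathcal J+1};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J+1]})+I(w_{\mathcal J+2};\tilde y^n_{\mathcal J,\mathcal J-1}\mid\bar W_{[\mathcal J+2]})\\
&\quad+2(\alpha_{\mathcal J}-\alpha_{\mathcal J-1})\cdot\frac n2\log P+(\alpha_{\mathcal J+2}-\alpha_{\mathcal J})\cdot\frac n2\log P+n\,o(\log P) &&(63)
\end{aligned}$$
where (62) follows from (56) and (61), and (63) follows from Lemma 5. This completes the proof of Lemma 4.

D. Proof of Lemma 5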

The proof uses the results of Claim 1 and Claim 2. When $\ell_1,\ell_2,\ell_3,l,i,j\in[1:K]$, $\ell_1<\ell_2\le\ell_3$ and $i\neq j$, we have
$$\begin{aligned}
&I(w_i;y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]})+I(w_j;\tilde y^n_{l,\ell_3}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]})\\
&\le I(w_i;y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]})+I(w_j;\tilde y^n_{l,\ell_3},y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]}) &&(64)\\
&=I(w_i;y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]})+I(w_j;y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]})+I(w_j;\tilde y^n_{l,\ell_3}\mid y^n_{\ell_2},\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]})\\
&=\underbrace{I(w_i,w_j;y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]})}_{\le\frac n2\log(1+P^{\alpha_{\ell_2}-\alpha_{\ell_1}})}+\underbrace{I(w_j;\tilde y^n_{l,\ell_3}\mid y^n_{\ell_2},\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]})}_{\le\frac n2\log\left(1+P^{\alpha_{\ell_3}-\alpha_{\ell_2}}\frac{|h_{lj}|^2}{|h_{\ell_2j}|^2}\right)}\\
&\le\frac n2\log(1+P^{\alpha_{\ell_2}-\alpha_{\ell_1}})+\frac n2\log\Big(1+P^{\alpha_{\ell_3}-\alpha_{\ell_2}}\frac{|h_{lj}|^2}{|h_{\ell_2j}|^2}\Big) &&(65)
\end{aligned}$$
where (64) uses the fact that adding information does not reduce mutual information, and (65) follows from Claim 1 and Claim 2. Similarly, when $\ell_2,\ell_3,l,i,j\in[1:K]$, $\ell_2\le\ell_3$ and $i\neq j$, we have
$$\begin{aligned}
&I(w_i;y^n_{\ell_2}\mid\bar W_{[i,j]})+I(w_j;\tilde y^n_{l,\ell_3}\mid\bar W_{[j]})\\
&\le I(w_i;y^n_{\ell_2}\mid\bar W_{[i,j]})+I(w_j;\tilde y^n_{l,\ell_3},y^n_{\ell_2}\mid\bar W_{[j]})\\
&=I(w_i;y^n_{\ell_2}\mid\bar W_{[i,j]})+I(w_j;y^n_{\ell_2}\mid\bar W_{[j]})+I(w_j;\tilde y^n_{l,\ell_3}\mid y^n_{\ell_2},\bar W_{[j]})\\
&=\underbrace{I(w_i,w_j;y^n_{\ell_2}\mid\bar W_{[i,j]})}_{\le\alpha_{\ell_2}\cdot\frac n2\log P+n\,o(\log P)}+\underbrace{I(w_j;\tilde y^n_{l,\ell_3}\mid y^n_{\ell_2},\bar W_{[j]})}_{\le\frac n2\log\left(1+P^{\alpha_{\ell_3}-\alpha_{\ell_2}}\frac{|h_{lj}|^2}{|h_{\ell_2j}|^2}\right)}\\
&\le\alpha_{\ell_2}\cdot\frac n2\log P+n\,o(\log P)+\frac n2\log\Big(1+P^{\alpha_{\ell_3}-\alpha_{\ell_2}}\frac{|h_{lj}|^2}{|h_{\ell_2j}|^2}\Big) &&(66)\\
&=\alpha_{\ell_3}\cdot\frac n2\log P+n\,o(\log P)
\end{aligned}$$
where (66) follows from Claim 1 and Claim 2. This completes the proof of Lemma 5.

E. Proof of Claim 1

When $\ell_1,\ell_2,i,j\in[1:K]$, $\ell_1<\ell_2$ and $i\neq j$, we have
$$\begin{aligned}
I(w_i,w_j;y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]})&=h(y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]})-h(y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]},w_i,w_j)\\
&=h(y^n_{\ell_2}\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]})-h(z^n_{\ell_2})\\
&=h\big(\{y_{\ell_2}(t)-\sqrt{P^{\alpha_{\ell_2}-\alpha_{\ell_1}}}\,\tilde y_{\ell_2,\ell_1}(t)\}_{t=1}^n\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]}\big)-h(z^n_{\ell_2})\\
&=h\big(\{z_{\ell_2}(t)-\sqrt{P^{\alpha_{\ell_2}-\alpha_{\ell_1}}}\,\tilde z_{\ell_1}(t)\}_{t=1}^n\mid\tilde y^n_{\ell_2,\ell_1},\bar W_{[i,j]}\big)-h(z^n_{\ell_2})\\
&\le h\big(\{z_{\ell_2}(t)-\sqrt{P^{\alpha_{\ell_2}-\alpha_{\ell_1}}}\,\tilde z_{\ell_1}(t)\}_{t=1}^n\big)-h(z^n_{\ell_2}) &&(67)\\
&=\frac n2\log\big(2\pi e(1+P^{\alpha_{\ell_2}-\alpha_{\ell_1}})\big)-\frac n2\log(2\pi e)\\
&=\frac n2\log(1+P^{\alpha_{\ell_2}-\alpha_{\ell_1}})
\end{aligned}$$
where (67) follows from the fact that conditioning reduces differential entropy. When $\ell,i,j\in[1:K]$ and $i\neq j$, we have
$$\begin{aligned}
I(w_i,w_j;y^n_\ell\mid\bar W_{[i,j]})&=h(y^n_\ell\mid\bar W_{[i,j]})-h(z^n_\ell)\\
&=\sum_{t=1}^n h(y_\ell(t)\mid y^{t-1}_\ell,\bar W_{[i,j]})-\frac n2\log(2\pi e)\\
&\le\sum_{t=1}^n h(y_\ell(t))-h(z^n_\ell)\\
&\le\frac n2\log\Big(2\pi e\Big(1+P^{\alpha_\ell}\sum_{k=1}^K|h_{\ell k}|^2\Big)\Big)-\frac n2\log(2\pi e) &&(68)\\
&=\alpha_\ell\cdot\frac n2\log P+n\,o(\log P)
\end{aligned}$$
where (68) uses the fact that a Gaussian input maximizes differential entropy, together with the transmit power constraint. This completes the proof of Claim 1.

F. Proof of Claim 2

When $\ell_1,\ell_2,\ell_3,l,j\in[1:K]$ and $\ell_1<\ell_2\le\ell_3$, or when $\ell_2,\ell_3,l,j\in[1:K]$, $\ell_2\le\ell_3$ and $\tilde y^n_{\ell_2,\ell_1}=\phi$, we have
$$\begin{aligned}
&I(w_j;\tilde y^n_{l,\ell_3}\mid y^n_{\ell_2},\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]})\\
&=h(\tilde y^n_{l,\ell_3}\mid y^n_{\ell_2},\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]})-h(\tilde y^n_{l,\ell_3}\mid y^n_{\ell_2},\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]},w_j)\\
&=h\Big(\big\{\sqrt{P^{\alpha_{\ell_3}}}h_{lj}x_j(t)+\tilde z_{\ell_3}(t)\big\}_{t=1}^n\,\Big|\,\big\{\sqrt{P^{\alpha_{\ell_2}}}h_{\ell_2j}x_j(t)+z_{\ell_2}(t)\big\}_{t=1}^n,\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]}\Big)-h(\tilde z^n_{\ell_3})\\
&=h\Big(\Big\{\sqrt{P^{\alpha_{\ell_3}}}h_{lj}x_j(t)+\tilde z_{\ell_3}(t)-\sqrt{P^{\alpha_{\ell_3}-\alpha_{\ell_2}}}\frac{h_{lj}}{h_{\ell_2j}}\big(\sqrt{P^{\alpha_{\ell_2}}}h_{\ell_2j}x_j(t)+z_{\ell_2}(t)\big)\Big\}_{t=1}^n\,\Big|\,\big\{\sqrt{P^{\alpha_{\ell_2}}}h_{\ell_2j}x_j(t)+z_{\ell_2}(t)\big\}_{t=1}^n,\tilde y^n_{\ell_2,\ell_1},\bar W_{[j]}\Big)-h(\tilde z^n_{\ell_3})\\
&\le h\Big(\Big\{\tilde z_{\ell_3}(t)-\sqrt{P^{\alpha_{\ell_3}-\alpha_{\ell_2}}}\frac{h_{lj}}{h_{\ell_2j}}z_{\ell_2}(t)\Big\}_{t=1}^n\Big)-h(\tilde z^n_{\ell_3}) &&(69)\\
&=\frac n2\log\Big(1+P^{\alpha_{\ell_3}-\alpha_{\ell_2}}\frac{|h_{lj}|^2}{|h_{\ell_2j}|^2}\Big)
\end{aligned}$$
where (69) follows from the fact that conditioning reduces differential entropy. This completes the proof of Claim 2.

APPENDIX B
PROOF OF COROLLARY 1

Let $J_m\triangleq\lceil\log(K/2)\rceil$ and
$$\Theta(x)\triangleq\begin{cases}x, & \text{if } x\ge2^{J_m} \qquad (70a)\\ 0, & \text{else.} \qquad (70b)\end{cases}$$
Recall that (see (8))
$$d_0\triangleq0,\qquad\alpha_0\triangleq0. \tag{71}$$
In our proof, a total of $2^{J_m}$ bounds are required. Among those $2^{J_m}$ bounds, the first $2^{J_m-1}$ bounds have a specific structure. The last $2^{J_m-1}$ bounds have a similar structure, but some elements with certain indices are erased (set to zero).

A. Proof for the case with K = 8

From Lemma 1, the following bounds hold:
$$\begin{aligned}
4d_1+2d_3+d_7+d_8&\le2\alpha_1+\alpha_3+\alpha_8\\
4d_2+2d_3+d_7+d_8&\le2\alpha_2+\alpha_3+\alpha_8\\
4d_4+2d_6+d_7+d_8&\le2\alpha_4+\alpha_6+\alpha_8\\
4d_5+2d_6+d_7+d_8&\le2\alpha_5+\alpha_6+\alpha_8.
\end{aligned}$$
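By summing the above bounds and dividing each side by $4$, we obtain $d_{\mathrm{sum}}(\boldsymbol\alpha)\le\frac{\sum_{k=1}^{8}\alpha_k+\alpha_8-\alpha_7}{2}$.

B. Proof for the case with K = 9

The result of Lemma 1 reveals that
$$\begin{aligned}
8d_1+4d_5+2d_7+d_8+d_9&\le4\alpha_1+2\alpha_5+\alpha_7+\alpha_9\\
8d_2+4d_5+2d_7+d_8+d_9&\le4\alpha_2+2\alpha_5+\alpha_7+\alpha_9\\
8d_3+4d_6+2d_7+d_8+d_9&\le4\alpha_3+2\alpha_6+\alpha_7+\alpha_9\\
8d_4+4d_6+2d_7+d_8+d_9&\le4\alpha_4+2\alpha_6+\alpha_7+\alpha_9\\
d_8+d_9&\le\alpha_9\\
d_8+d_9&\le\alpha_9\\
d_8+d_9&\le\alpha_9\\
d_8+d_9&\le\alpha_9.
\end{aligned}$$
By summing the above bounds and dividing each side by $8$, we have $d_{\mathrm{sum}}(\boldsymbol\alpha)\le\frac{\sum_{k=1}^{9}\alpha_k+\alpha_9-\alpha_8}{2}$.

C. Proof for the case with K = 10

The following bounds are directly derived from Lemma 1:
$$\begin{aligned}
8d_1+4d_5+2d_7+d_9+d_{10}&\le4\alpha_1+2\alpha_5+\alpha_7+\alpha_{10}\\
8d_2+4d_5+2d_7+d_9+d_{10}&\le4\alpha_2+2\alpha_5+\alpha_7+\alpha_{10}\\
8d_3+4d_6+2d_7+d_9+d_{10}&\le4\alpha_3+2\alpha_6+\alpha_7+\alpha_{10}\\
8d_4+4d_6+2d_7+d_9+d_{10}&\le4\alpha_4+2\alpha_6+\alpha_7+\alpha_{10}\\
2d_8+d_9+d_{10}&\le\alpha_8+\alpha_{10}\\
2d_8+d_9+d_{10}&\le\alpha_8+\alpha_{10}\\
2d_8+d_9+d_{10}&\le\alpha_8+\alpha_{10}\\
2d_8+d_9+d_{10}&\le\alpha_8+\alpha_{10}.
\end{aligned}$$
Combining the above bounds gives $d_{\mathrm{sum}}(\boldsymbol\alpha)\le\frac{\sum_{k=1}^{10}\alpha_k+\alpha_{10}-\alpha_9}{2}$.

D. Proof for the case with K = 13

When $K=13$, the following bounds are directly derived from Lemma 1:
$$\begin{aligned}
8d_1+4d_5+2d_7+d_{12}+d_{13}&\le4\alpha_1+2\alpha_5+\alpha_7+\alpha_{13} &&(72)\\
8d_2+4d_5+2d_7+d_{12}+d_{13}&\le4\alpha_2+2\alpha_5+\alpha_7+\alpha_{13} &&(73)\\
8d_3+4d_6+2d_7+d_{12}+d_{13}&\le4\alpha_3+2\alpha_6+\alpha_7+\alpha_{13} &&(74)\\
8d_4+4d_6+2d_7+d_{12}+d_{13}&\le4\alpha_4+2\alpha_6+\alpha_7+\alpha_{13} &&(75)\\
4d_9+2d_{11}+d_{12}+d_{13}&\le2\alpha_9+\alpha_{11}+\alpha_{13} &&(76)\\
4d_9+2d_{11}+d_{12}+d_{13}&\le2\alpha_9+\alpha_{11}+\alpha_{13} &&(77)\\
4d_{10}+2d_{11}+d_{12}+d_{13}&\le2\alpha_{10}+\alpha_{11}+\alpha_{13} &&(78)\\
8d_8+4d_{10}+2d_{11}+d_{12}+d_{13}&\le4\alpha_8+2\alpha_{10}+\alpha_{11}+\alpha_{13}. &&(79)
\end{aligned}$$
The above bounds reveal that $d_{\mathrm{sum}}(\boldsymbol\alpha)\le\frac{\sum_{k=1}^{13}\alpha_k+\alpha_{13}-\alpha_{12}}{2}$.

E. Proof for the case with K = 16

When $K=16$, the following bounds are directly derived from Lemma 1:
$$\begin{aligned}
8d_1+4d_5+2d_7+d_{15}+d_{16}&\le4\alpha_1+2\alpha_5+\alpha_7+\alpha_{16}\\
8d_2+4d_5+2d_7+d_{15}+d_{16}&\le4\alpha_2+2\alpha_5+\alpha_7+\alpha_{16}\\
8d_3+4d_6+2d_7+d_{15}+d_{16}&\le4\alpha_3+2\alpha_6+\alpha_7+\alpha_{16}\\
8d_4+4d_6+2d_7+d_{15}+d_{16}&\le4\alpha_4+2\alpha_6+\alpha_7+\alpha_{16}\\
8d_8+4d_{12}+2d_{14}+d_{15}+d_{16}&\le4\alpha_8+2\alpha_{12}+\alpha_{14}+\alpha_{16}\\
8d_9+4d_{12}+2d_{14}+d_{15}+d_{16}&\le4\alpha_9+2\alpha_{12}+\alpha_{14}+\alpha_{16}\\
8d_{10}+4d_{13}+2d_{14}+d_{15}+d_{16}&\le4\alpha_{10}+2\alpha_{13}+\alpha_{14}+\alpha_{16}\\
8d_{11}+4d_{13}+2d_{14}+d_{15}+d_{16}&\le4\alpha_{11}+2\alpha_{13}+\alpha_{14}+\alpha_{16}.
\end{aligned}$$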
It then follows that $d_{\mathrm{sum}}(\boldsymbol\alpha)\le\frac{\sum_{k=1}^{16}\alpha_k+\alpha_{16}-\alpha_{15}}{2}$. In the following, we prove Corollary 1 for the general case ($K\ge3$) by using the result of Lemma 1. Note that when $K=2$, the proof is straightforward.

F. Proof for the general case

In our proof, a total of $2^{J_m}$ bounds are required, as can be seen in the previous examples. Among those $2^{J_m}$ bounds, the first $2^{J_m-1}$ bounds have a similar structure. Specifically, when $\ell\in[1:2^{J_m-1}]$, the $\ell$th bound takes the form
$$\sum_{j=0}^{J_m-1}2^{J_m-j}\cdot d_{\lceil\ell/2^j\rceil+\sum_{l=1}^{j}2^{J_m-l}}+d_{K-1}+d_K\le\sum_{j=0}^{J_m-1}2^{J_m-j-1}\cdot\alpha_{\lceil\ell/2^j\rceil+\sum_{l=1}^{j}2^{J_m-l}}+\alpha_K. \tag{80}$$
Note that in the above expression we define $\sum_{l=1}^{0}2^{J_m-l}\triangleq0$. When $\ell\in[2^{J_m-1}+1:2^{J_m}]$, the $\ell$th bound takes the form
$$\sum_{j=0}^{J_m-1}2^{J_m-j}\cdot d_{\Theta\left(K-1-2^{J_m}+\lceil(\ell-2^{J_m-1})/2^j\rceil+\sum_{l=1}^{j}2^{J_m-l}\right)}+d_{K-1}+d_K\le\sum_{j=0}^{J_m-1}2^{J_m-j-1}\cdot\alpha_{\Theta\left(K-1-2^{J_m}+\lceil(\ell-2^{J_m-1})/2^j\rceil+\sum_{l=1}^{j}2^{J_m-l}\right)}+\alpha_K \tag{81}$$
where $\Theta(\cdot)$, $d_0$ and $\alpha_0$ are defined in (70a), (70b) and (71). The last $2^{J_m-1}$ bounds have a structure similar to that of the first $2^{J_m-1}$ bounds. However, with the design in (81), we force some $d_{\Theta(\cdot)}$ and $\alpha_{\Theta(\cdot)}$ to $0$ when the corresponding indices are less than $2^{J_m}$. For example, when $K=13$ and $J_m=\lceil\log(K/2)\rceil=3$, the first $2^{J_m-1}=4$ bounds are exactly the same as in (72)-(75), while the last four bounds are expressed as
$$\begin{aligned}
8d_{\Theta(5)}+4d_9+2d_{11}+d_{12}+d_{13}&\le4\alpha_{\Theta(5)}+2\alpha_9+\alpha_{11}+\alpha_{13} &&(82)\\
8d_{\Theta(6)}+4d_9+2d_{11}+d_{12}+d_{13}&\le4\alpha_{\Theta(6)}+2\alpha_9+\alpha_{11}+\alpha_{13} &&(83)\\
8d_{\Theta(7)}+4d_{10}+2d_{11}+d_{12}+d_{13}&\le4\alpha_{\Theta(7)}+2\alpha_{10}+\alpha_{11}+\alpha_{13} &&(84)\\
8d_8+4d_{10}+2d_{11}+d_{12}+d_{13}&\le4\alpha_8+2\alpha_{10}+\alpha_{11}+\alpha_{13} &&(85)
\end{aligned}$$
where $d_{\Theta(5)}=d_{\Theta(6)}=d_{\Theta(7)}=\alpha_{\Theta(5)}=\alpha_{\Theta(6)}=\alpha_{\Theta(7)}=0$. The bounds in (82)-(85) can be rewritten as in (76)-(79).

Note that, on the left-hand side of the above $2^{J_m}$ bounds, the total weight of $d_k$ is $2^{J_m}$, $\forall k\in[1:K]$. On the right-hand side, the total weight of $\alpha_k$ is $2^{J_m-1}$, $\forall k\in[1:K-2]$; the total weight of $\alpha_K$ is $2^{J_m}$; and the total weight of $\alpha_{K-1}$ is $0$.
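The index bookkeeping in (80) and (81) is easy to get wrong by hand, so the weight totals stated above can be checked mechanically. The Python sketch below (ours, not part of the original proof) builds the $2^{J_m}$ bounds for a given $K$, stores each bound as the coefficients on $d_k$ and on $\alpha_k$, and verifies the totals. It assumes $J_m=\lceil\log_2(K/2)\rceil$ and $\Theta$ as in (70a)-(70b), and it simply drops index $0$ because $d_0=\alpha_0=0$.

```python
from collections import Counter

def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def corollary_bounds(K):
    """Return J_m and the 2^{J_m} bounds of (80)-(81) for K >= 3.

    Each bound is a pair (d_w, a_w) of Counters mapping a user index k to its
    coefficient on d_k (left-hand side) or on alpha_k (right-hand side)."""
    J_m = (K - 1).bit_length() - 1              # equals ceil(log2(K / 2)) for K >= 3
    theta = lambda x: x if x >= 2 ** J_m else 0  # (70a)-(70b)
    half = 2 ** (J_m - 1)
    bounds = []
    for ell in range(1, 2 ** J_m + 1):
        d_w, a_w = Counter(), Counter()
        for j in range(J_m):
            offset = sum(2 ** (J_m - l) for l in range(1, j + 1))  # empty sum is 0
            if ell <= half:                                        # (80)
                idx = ceil_div(ell, 2 ** j) + offset
            else:                                                  # (81)
                idx = theta(K - 1 - 2 ** J_m + ceil_div(ell - half, 2 ** j) + offset)
            if idx:                      # index 0 carries d_0 = alpha_0 = 0
                d_w[idx] += 2 ** (J_m - j)
                a_w[idx] += 2 ** (J_m - j - 1)
        d_w[K - 1] += 1
        d_w[K] += 1
        a_w[K] += 1
        bounds.append((d_w, a_w))
    return J_m, bounds

def check_weights(K):
    """Check the total weights of d_k and alpha_k over all 2^{J_m} bounds."""
    J_m, bounds = corollary_bounds(K)
    d_tot, a_tot = Counter(), Counter()
    for d_w, a_w in bounds:
        d_tot.update(d_w)
        a_tot.update(a_w)
    assert all(d_tot[k] == 2 ** J_m for k in range(1, K + 1))
    assert all(a_tot[k] == 2 ** (J_m - 1) for k in range(1, K - 1))
    assert a_tot[K] == 2 ** J_m and a_tot[K - 1] == 0
    return J_m

for K in range(3, 65):
    check_weights(K)

J_m, bounds = corollary_bounds(13)
print(J_m, dict(bounds[0][0]), dict(bounds[4][0]))
# prints: 3 {1: 8, 5: 4, 7: 2, 12: 1, 13: 1} {9: 4, 11: 2, 12: 1, 13: 1}
```

For $K=13$ the first printed bound is (72) and the second is (82) after $\Theta(5)=0$ has removed its leading term, i.e., (76). Dividing the summed weights by $2^{J_m}$ then yields the coefficient $\frac12$ on $\alpha_k$ for $k\le K-2$, $0$ on $\alpha_{K-1}$ and $1$ on $\alpha_K$, which is the right-hand side of Corollary 1.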
Therefore, by summing the above $2^{J_m}$ bounds and dividing each side by $2^{J_m}$, the following bound holds:
$$d_{\mathrm{sum}}(\boldsymbol\alpha)\le\frac{\sum_{k=1}^K\alpha_k+\alpha_K-\alpha_{K-1}}{2}$$
which completes the proof of Corollary 1.

APPENDIX C
PROOFS OF LEMMAS 6, 7 AND 8

Recall that, when $\ell\in[1:K-2]$, we have $|\mathcal S_{k,\ell}|=N_\ell=m^{K_\ell(K_\ell-1)}$, $M_\ell=|\mathcal S_{k,\ell}|+|\mathcal I_{k,\ell}|$ (with $|\mathcal I_{k,\ell}|$ as specified in the signal design), $\lambda_\ell=\frac{\alpha_\ell-\alpha_{\ell-1}}{M_\ell}-\epsilon$, and $K_\ell=K-\ell+1$.

A. Proof of Lemma 6

Based on the signal design in (22)-(30), the average power of the transmitted signal at Transmitter $k$, $k\in[1:K]$, is bounded by
$$\begin{aligned}
\mathbb E|x_k|^2&=\sum_{\ell=1}^k P^{-\alpha_{\ell-1}}\,\mathbb E|x_{k,\ell}|^2=\sum_{\ell=1}^k P^{-\alpha_{\ell-1}}\,\mathbb E|\mathbf v_{k,\ell}^T\mathbf b_{k,\ell}|^2=\sum_{\ell=1}^k P^{-\alpha_{\ell-1}}\sum_{i=1}^{N_\ell}|v_{k,\ell,i}|^2\cdot\mathbb E|b_{k,\ell,i}|^2 &&(86)\\
&=\sum_{\ell=1}^k P^{-\alpha_{\ell-1}}\sum_{i=1}^{N_\ell}|v_{k,\ell,i}|^2\cdot\frac{\gamma^2Q_\ell(Q_\ell+1)}{3Q_\ell^2} &&(87)\\
&\le\sum_{\ell=1}^k P^{-\alpha_{\ell-1}}\sum_{i=1}^{N_\ell}|v_{k,\ell,i}|^2\cdot\gamma^2 &&(88)\\
&\le\gamma^2\sum_{\ell=1}^k\sum_{i=1}^{N_\ell}|v_{k,\ell,i}|^2 &&(89)\\
&\le\gamma^2\sum_{\ell=1}^{k^\star}\sum_{i=1}^{N_\ell}|v_{k^\star,\ell,i}|^2=\gamma^2\eta
\end{aligned}$$
where
$$k^\star\triangleq\arg\max_{k'\in[1:K]}\sum_{\ell=1}^{k'}\sum_{i=1}^{N_\ell}|v_{k',\ell,i}|^2\qquad\text{and}\qquad\eta\triangleq\sum_{\ell=1}^{k^\star}\sum_{i=1}^{N_\ell}|v_{k^\star,\ell,i}|^2.$$
Note that $\eta$ is a positive value independent of $P$. Step (86) uses the fact that the symbols $\{b_{k,\ell,i}\}_{k,\ell,i}$ are mutually independent and zero-mean, based on our signal design. Step (87) follows from (17), given that $b_{k,\ell,i}\in\Omega(\xi=\gamma/Q_\ell,\,Q=Q_\ell)$ for $i\in[1:N_\ell]$, $\ell\in[1:k]$, $k\in[1:K]$ (see (26)). Step (88) uses the inequality $\frac{Q_\ell(Q_\ell+1)}{3Q_\ell^2}\le\frac{2Q_\ell^2}{3Q_\ell^2}<1$. Step (89) follows from the fact that $P^{-\alpha_{\ell-1}}\le1$ for $\ell\in[1:K]$. This completes the proof of Lemma 6.

B. Proof of Lemma 7

Since the elements of $\mathcal S_{k,\ell}$ and $\mathcal I_{k,\ell}$ are monomials generated from the channel coefficients (see (33) and (34)), the minimum distance $d_{\min}(k,\ell)$ defined in (40) can be bounded by using the Khintchine-Groshev Theorem for Monomials (see Theorem 2). Specifically, this theorem reveals that, for any small enough $\epsilon'=\epsilon>0$, and for almost all realizations of the channel coefficients, there exists a positive constant $\kappa$ such that
$$\begin{aligned}
d_{\min}(k,\ell)&\ge\frac{\kappa\gamma\sqrt{P^{\alpha_k-\alpha_{\ell-1}-\lambda_\ell}}}{(K_\ell Q_\ell)^{|\mathcal S_{k,\ell}|+|\mathcal I_{k,\ell}|-1+\epsilon}}=\frac{\kappa\gamma P^{(\alpha_k-\alpha_{\ell-1})/2}}{P^{\lambda_\ell/2}\cdot(K_\ell P^{\lambda_\ell/2})^{M_\ell-1+\epsilon}}=\frac{\kappa\gamma}{K_\ell^{M_\ell-1+\epsilon}}\cdot\frac{P^{(\alpha_k-\alpha_{\ell-1})/2}}{(P^{\lambda_\ell/2})^{M_\ell+\epsilon}}\\
&=\frac{\kappa\gamma}{K_\ell^{M_\ell-1+\epsilon}}\cdot P^{\frac{\alpha_k-\alpha_{\ell-1}}{2}-\frac{\alpha_\ell-\alpha_{\ell-1}}{2}}\cdot P^{\frac\epsilon2\left(M_\ell+\epsilon-\frac{\alpha_\ell-\alpha_{\ell-1}}{M_\ell}\right)}=\kappa'\sqrt{P^{\alpha_k-\alpha_\ell+\epsilon_\ell}} &&(90)
\end{aligned}$$
for $k\in[\ell:K]$, $\ell\in[1:K-2]$, where $\epsilon_\ell$ and $\kappa'$ are defined as
$$\epsilon_\ell\triangleq\epsilon\Big(M_\ell+\epsilon-\frac{\alpha_\ell-\alpha_{\ell-1}}{M_\ell}\Big),\qquad\kappa'\triangleq\frac{\kappa\gamma}{K_\ell^{M_\ell-1+\epsilon}}.$$
Note that $\kappa'$ is positive and independent of $P$, and $\epsilon_\ell$ is positive for all $\ell\in[1:K-2]$, given that $\epsilon>0$. This completes the proof of Lemma 7.

C. Proof of Lemma 8

For the term $T_{k,\ell}$ defined in (37), we have
$$\begin{aligned}
|T_{k,\ell}|&=\Big|\sum_{l=\ell+1}^K\sum_{j=l}^K\sqrt{P^{\alpha_k-\alpha_{l-1}}}\,h_{kj}\,\mathbf v_{j,l}^T\mathbf b_{j,l}\Big|=\Big|\sum_{l=\ell+1}^K\sum_{j=l}^K\sqrt{P^{\alpha_k-\alpha_{l-1}}}\,h_{kj}\sum_{i=1}^{N_l}v_{j,l,i}b_{j,l,i}\Big|\\
&\le\sum_{l=\ell+1}^K\sum_{j=l}^K\sqrt{P^{\alpha_k-\alpha_{l-1}}}\,|h_{kj}|\sum_{i=1}^{N_l}|v_{j,l,i}|\,\gamma &&(91)\\
&\le\sum_{l=\ell+1}^K\sum_{j=l}^K\sqrt{P^{\alpha_k-\alpha_\ell}}\,|h_{kj}|\sum_{i=1}^{N_l}|v_{j,l,i}|\,\gamma=\sqrt{P^{\alpha_k-\alpha_\ell}}\cdot\gamma\sum_{l=\ell+1}^K\sum_{j=l}^K\sum_{i=1}^{N_l}|h_{kj}||v_{j,l,i}|=\sqrt{P^{\alpha_k-\alpha_\ell}}\cdot\delta_{k,\ell}
\end{aligned}$$
for $k\in[\ell:K]$, $\ell\in[1:K-2]$, where $\delta_{k,\ell}\triangleq\gamma\sum_{l=\ell+1}^K\sum_{j=l}^K\sum_{i=1}^{N_l}|h_{kj}||v_{j,l,i}|$ is independent of $P$. Step (91) uses the fact that $|b_{j,l,i}|\le\gamma$, given that $b_{k,\ell,i}\in\Omega(\xi=\gamma P^{-\lambda_\ell/2},\,Q=P^{\lambda_\ell/2})$ for $i\in[1:N_\ell]$, $k\in[\ell:K]$, $\ell\in[1:K]$ (see (26)). The next inequality uses $\alpha_{l-1}\ge\alpha_\ell$ for $l\ge\ell+1$. This completes the proof of Lemma 8.

REFERENCES

[1] V. R. Cadambe and S. A. Jafar, "Interference alignment and degrees of freedom of the K-user interference channel," IEEE Trans. Inf. Theory, vol. 54, no. 8, pp. 3425–3441, Aug. 2008.
[2] R. H. Etkin, D. N. C. Tse, and H. Wang, "Gaussian interference channel capacity to within one bit," IEEE Trans. Inf. Theory, vol. 54, no. 12, pp. 5534–5562, Dec. 2008.
[3] C. S. Vaze, S. Karmakar, and M. K. Varanasi, "On the generalized degrees of freedom region of the MIMO interference channel with no CSIT," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aug. 2011.
[4] S. Karmakar and M. K. Varanasi, "The generalized degrees of freedom of the MIMO interference channel," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aug. 2011.
[5] S. Karmakar and M. K. Varanasi, "The generalized multiplexing gain region of the slow fading MIMO interference channel and its achievability with limited feedback," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2012.
[6] S. Karmakar and M. K. Varanasi, "The generalized degrees of freedom region of the MIMO interference channel and its achievability," IEEE Trans. Inf. Theory, vol. 58, no. 12, pp. 7188–7203, Dec. 2012.
[7] K. Mohanty and M. K. Varanasi, "The generalized degrees of freedom region of the MIMO Z-interference channel with delayed CSIT," IEEE Trans. Inf. Theory, vol. 64, no. 1, pp. 531–546, Jan. 2018.
[8] J. H. Bae, J. Lee, and I. Kang, "The GDOF of 3-user MIMO Gaussian interference channel," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2013.
[9] C. Huang, V. R. Cadambe, and S. A. Jafar, "Interference alignment and the generalized degrees of freedom of the X channel," IEEE Trans. Inf. Theory, vol. 58, no. 8, pp. 5130–5150, May 2012.
[10] J. Chen, P. Elia, and S. A. Jafar, "On the vector broadcast channel with alternating CSIT: A topological perspective," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2014.
[11] S. Mohajer, R. Tandon, and H. V. Poor, "On the feedback capacity of the fully connected K-user interference channel," IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 2863–2881, May 2013.
[12] R. Tandon, S. Mohajer, and H. V. Poor, "On the symmetric feedback capacity of the K-user cyclic Z-interference channel," IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 2713–2734, May 2013.
[13] S. A. Jafar and S. Vishwanath, "Generalized degrees of freedom of the symmetric Gaussian K user interference channel," IEEE Trans. Inf. Theory, vol. 56, no. 7, pp. 3297–3303, Jul. 2010.
[14] J. Chen, P. Elia, and S. A. Jafar, "On the two-user MISO broadcast channel with alternating CSIT: A topological perspective," IEEE Trans. Inf. Theory, vol. 61, no. 8, pp. 4345–4366, Aug. 2015.
[15] C. Geng, N. Naderializadeh, A. S. Avestimehr, and S. A. Jafar, "On the optimality of treating interference as noise," IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1753–1767, Apr. 2015.
[16] J. Chen, "Secure communication over interference channel: To jam or not to jam?" in Proc. Allerton Conf. Communication, Control and Computing, Oct. 2018.
[17] C. Geng, R. Tandon, and S. A. Jafar, "On the symmetric 2-user deterministic interference channel with confidential messages," in Proc. IEEE Global Conf. Communications (GLOBECOM), Dec. 2015.
[18] J. Chen and F. Li, "Adding a helper can totally remove the secrecy constraints in two-user interference channel," IEEE Trans. Inf. Forensics Security, vol. 14, no. 12, pp. 3126–3139, Dec. 2019.
[19] X. Yi and G. Caire, "Optimality of treating interference as noise: A combinatorial perspective," IEEE Trans. Inf. Theory, vol. 62, no. 8, pp. 4654–4673, Aug. 2016.
[20] J. Chen and C. Geng, "Optimal secure GDoF of symmetric Gaussian wiretap channel with a helper," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2019.
[21] P. Mohapatra and C. Murthy, "On the generalized degrees of freedom of the K-user symmetric MIMO Gaussian interference channel," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aug. 2011, pp. 2188–2192.
[22] F. Li and J. Chen, "How to break the limits of secrecy constraints in communication networks?" in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2019.
[23] H. Sun and S. A. Jafar, "On the separability of GDoF region for parallel Gaussian TIN optimal interference networks," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jun. 2015.
[24] A. G. Davoodi, B. Yuan, and S. A. Jafar, "GDoF region of the MISO BC: Bridging the gap between finite precision and perfect CSIT," IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 7208–7217, Nov. 2018.
[25] A. G. Davoodi and S. A. Jafar, "K-user symmetric M × N MIMO interference channel under finite precision CSIT: A GDoF perspective," IEEE Trans. Inf. Theory, vol. 65, no. 2, pp. 1126–1136, Feb. 2019.
[26] J. Wang, B. Yuan, L. Huang, and S. A. Jafar, "GDoF of interference channel with limited cooperation under finite precision CSIT," Aug. 2019, available on arXiv: https://arxiv.org/pdf/1908.00703.pdf.
[27] M. A. Maddah-Ali, A. S. Motahari, and A. K. Khandani, "Communication over MIMO X channels: Interference alignment, decomposition, and performance analysis," IEEE Trans. Inf. Theory, vol. 54, no. 8, pp. 3457–3470, Aug. 2008.
[28] A. S. Motahari, S. O. Gharan, M. A. Maddah-Ali, and A. K. Khandani, "Real interference alignment: Exploiting the potential of single antenna systems,"