DoF Analysis of the K-user MISO Broadcast Channel with Alternating CSIT

Abstract

We consider a $K$-user multiple-input single-output (MISO) broadcast channel (BC) where the channel state information (CSI) of user $i$ ($i = 1, 2, \ldots, K$) may be either perfect (P), delayed (D) or not known (N) at the transmitter with probabilities $\lambda_P^i$, $\lambda_D^i$ and $\lambda_N^i$, respectively. In this channel, according to the three possible CSIT states for each user, the joint CSIT of the $K$ users can have at most $3^K$ realizations. Although the results by Tandon et al. show that the Degrees of Freedom (DoF) region for the two-user MISO BC with symmetric marginal probabilities (i.e., $\lambda_Q^i = \lambda_Q\ \forall i \in \{1, 2, \ldots, K\}$, $Q \in \{P, D, N\}$) depends only on the marginal probabilities, we show that this interesting result does not hold in general when the number of users is more than two. In other words, the DoF region is a function of the CSIT pattern, or equivalently, all the joint probabilities. In this paper, given the marginal probabilities of CSIT, we derive an outer bound for the DoF region of the $K$-user MISO BC. Subsequently, the achievability of these outer bounds is considered in certain scenarios. Finally, we show the dependence of the DoF region on the joint probabilities.



Borzoo Rassouli, Chenxi Hao and Bruno Clerckx

Index Terms

MISO BC, Alternating CSIT, Degrees of Freedom, Outer Bound, CSIT Pattern

I. INTRODUCTION

Borzoo Rassouli, Chenxi Hao and Bruno Clerckx are with the Communication and Signal Processing group of the Department of Electrical and Electronics, Imperial College London, email: {b.rassouli12; chenxi.hao10; b.clerckx}@imperial.ac.uk. This work was partially supported by the Seventh Framework Programme for Research of the European Commission under grant number HARP-318489.

In contrast to point-to-point multiple-input multiple-output (MIMO) communication, where the channel state information at the transmitter (CSIT) does not affect the multiplexing gain, in a multiple-input single-output (MISO) broadcast channel (BC) knowledge of CSIT is crucial for interference mitigation and beamforming purposes [1]. However, the assumption of perfect CSIT may not always hold in practice due to channel estimation and feedback latency. Therefore, communication under some sort of imperfection in CSIT has recently gained more attention. The so-called MAT algorithm was presented in [2], where it was shown that, in terms of the degrees of freedom, even outdated CSIT can result in significant performance improvement in comparison to the case with no CSIT. Assuming correlation between the feedback information and the current channel state (e.g., when the feedback latency is smaller than the coherence time of the channel), the authors in [3] and [4] consider the degrees of freedom in a time-correlated MISO BC, whose achieving scheme is shown to be a combination of zero forcing beamforming (ZFBF) and the MAT algorithm. Following these works, the general case of mixed CSIT and the $K$-user MISO BC with time-correlated delayed CSIT are discussed in [5] and [6], respectively. While all these works consider the concept of delayed CSIT in the time domain, [7] and [8] deal with the DoF region and its achievable schemes in a frequency-correlated MISO BC where there is no delayed CSIT but imperfect CSIT across subbands, which is more in line with practical systems such as Long Term Evolution (LTE) [1].

The most relevant article to this paper is [9], where the synergistic benefits of alternating CSIT over fixed CSIT were presented for a two-user MISO BC with two transmit antennas. The converse in [9] is based on the idea of assigning artificial receivers to the users whose observations are (statistically) equivalent to the corresponding user when CSIT is (not) perfect. However, whether this approach can be generalized to scenarios with more than two transmit antennas and two users is unknown. Therefore, for such scenarios, it becomes necessary to look for other ways to find the fundamental limits of the system. To the best of our knowledge, this is the first paper in the literature addressing the general $K$-user MISO BC with alternating CSIT. To this end, our contributions are as follows.
• Given the marginal probabilities of CSIT in a $K$-user MISO BC, we derive an outer bound for the DoF region. The proof is based on finding upper bounds for a certain difference between entropies and is inspired by [8] and the results in [10].
• We investigate the achievability and tightness of the outer bounds. Several achievable schemes are introduced and shown to achieve the corner points of the DoF region in some scenarios, therefore proving that the outer bounds are optimal in those scenarios.
• Finally, we provide an example which proves that, in contrast to the results of [9] for the two-user BC, the DoF region of the $K$-user MISO BC ($K \geq 3$) is in general not a function of the marginal probabilities alone.

The paper is organized as follows. In section II the system model and preliminaries are presented. The main result of this paper is provided in section III as a theorem. The proof and tightness of the outer bounds are discussed in sections IV and V, respectively. Section VI shows that the DoF region depends on the joint CSIT probabilities in general, and section VII concludes the paper.

Throughout the paper, vectors are shown in bold lower case while matrices are written in upper case. $\mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma})$ is the circularly symmetric complex Gaussian distribution with covariance matrix $\boldsymbol{\Sigma}$. $f \sim O(\log P)$ is equivalent to $\lim_{P \to \infty} \frac{f}{\log P} = 0$. $X_i^n = \{X(i), X(i+1), \ldots, X(n)\}$ is the time extension of the random variable $X$, and when $i = 1$ it is dropped for simplicity (i.e., written as $X^n$). $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and conjugate transpose, respectively. The terms upper bound and outer bound, both used in this paper, have almost the same meaning with a slight difference: the former is only used for scalars, while the latter is a more general term used for multidimensional regions and can be defined by a finite or infinite number of upper bounds.
Finally, let $S_1$ and $S_2$ be two sets of inequalities defining the regions $D_1$ and $D_2$, respectively, and assume the region $D_3$ is defined by the set of inequalities $S_3 = S_1 \cup S_2$ or equivalently $D_3 = D_1 \cap D_2$. The set of inequalities $S_1$ is called inactive (or redundant) in defining $D_3$ when $D_2 \subset D_1$.

Fig. 1: Two different CSIT patterns both having the same marginal probabilities ($\lambda_P = 4\lambda_D = 4\lambda_N = \frac{2}{3}$)

II. SYSTEM MODEL

We consider a MISO BC in which a base station with $M$ antennas sends independent messages $W_1, \ldots, W_K$ to $K$ single-antenna users ($M \geq K$). In a flat fading scenario, the discrete-time baseband received signal of user $k$ at time instant $n$ can be written as

$$y_k(n) = \mathbf{h}_k^H(n)\,\mathbf{x}(n) + w_k(n), \quad k = 1, 2, \ldots, K \tag{1}$$

where $\mathbf{x}(n) \in \mathbb{C}^{M \times 1}$ is the transmitted signal at time instant $n$ satisfying the power constraint $E\left[\|\mathbf{x}\|^2\right] \leq P$, and $w_k(n)$ is the additive noise with distribution $\mathcal{CN}(0, 1)$. The channel vector of user $k$ has the distribution $\mathcal{CN}(\mathbf{0}, \mathbf{I})$ and is i.i.d. over time and users. Also, let $\mathbf{H}(n) = [\mathbf{h}_1(n), \ldots, \mathbf{h}_K(n)]^H$ and $\mathbf{H}^n = \{\mathbf{H}(1), \ldots, \mathbf{H}(n)\}$. We assume global perfect Channel State Information at the Receiver (CSIR), i.e., at time instant $n$, all users have perfect knowledge of $\mathbf{H}^n$.

The rate tuple $(R_1, R_2, \ldots, R_K)$ is achievable if the probability of error in decoding $W_i$ at user $i$ ($i = 1, \ldots, K$) can be made arbitrarily small with sufficiently large coding length. Analysis of the capacity region, which is the set of all achievable rate tuples, is not always tractable. Instead, we consider the DoF region, which is a simpler metric independent of the transmit power, and is defined as $\left\{(d_1, \ldots, d_K) \mid d_i = \lim_{P \to \infty} \frac{R_i}{\log P}\ \forall i\right\}$, where $(R_1, \ldots, R_K)$ ranges over the achievable rate tuples. At very high SNRs, the effect of noise can be neglected and what remains is the interference caused by other users' signals. Therefore, the DoF region can also be interpreted as the set constructed by the number of interference-free private data streams that each user receives per channel use.

The CSIT model used in this paper is the same as that in [9], i.e., at some time instants the transmitter has Perfect (P) instantaneous knowledge of the CSI of a particular user, at some time instants it receives the CSI with Delay (D), and at some time instants the CSI of the user is Not known (N) at the transmitter.
When there is delayed CSIT, we assume that the feedback delay is larger than the coherence time of the channel, making the feedback information completely independent of the current channel state. In this configuration, the joint CSIT of all the $K$ users has at most $3^K$ states. For example, in a 3-user MISO BC, they will be $PPP, PPD, PPN, PDP, \ldots$ with corresponding probabilities $\lambda_{PPP}, \lambda_{PPD}, \lambda_{PPN}, \lambda_{PDP}, \ldots$. A scenario is symmetric when the marginal CSIT probabilities are the same across the users (i.e., $\lambda_Q^i = \lambda_Q\ \forall i \in \{1, 2, \ldots, K\}$, $Q \in \{P, D, N\}$) and asymmetric otherwise.

Fig. 2: An example of the fixed and alternating CSIT in a 3-user MISO BC: (a) fixed CSIT, (b) alternating CSIT

For the symmetric case, it was shown in [9] that the DoF region for the two-user BC with two transmit antennas at the base station is only a function of the marginal probabilities. We show that this interesting result also holds for an arbitrary number of antennas ($> 2$) in the two-user MISO BC; however, it does not hold for the general $K$-user ($K > 2$) BC, where the DoF region is a function of the CSIT pattern (i.e., a function of all $3^K$ joint state probabilities). By CSIT pattern we refer to the knowledge of CSIT represented in a space-time matrix where the rows and columns represent users and time slots, respectively. Figure 1 shows two different symmetric CSIT patterns where both have the same marginal probabilities. For example, in pattern (b), the transmitter knows the channels of users 2 and 3 perfectly at time slot 1 and has no information about the channel of user 1. The CSI of user 1 will be known in the next time slot (i.e., time slot 2) due to feedback delay and is completely independent of the channel in time slot 2.

The synergistic benefit of alternating CSIT over fixed CSIT was shown in [9] for the 2-user MISO BC. It is interesting to check whether it holds for the general $K$-user MISO BC. For example, consider a 3-user MISO BC in which the transmitter always has delayed CSI of one user and no CSI of the remaining users, as in the pattern shown in figure 2 (a). Now, what happens if the CSIT alternates among the users as in pattern (b)? Is it beneficial? The answer is yes.
According to the results of this paper (see the theorem in section III), the sum DoF of the fixed CSIT (i.e., pattern (a)) has an upper bound of 1, and since it is simply achievable, the upper bound is tight. However, a sum DoF greater than 1 is achievable for the alternating case (pattern (b)) as follows. Let $\mathbf{u}_X$ and $\mathbf{v}_X$ be two complex vectors, each of which contains two symbols from two independently encoded Gaussian codewords intended for user $X$ ($= A$, $B$ or $C$). For brevity, the transmission scheme is shown in figure 3, where the first row shows the transmitted vectors in consecutive time slots and the remaining three rows show the received signal at the corresponding receiver. In conjunction with pattern (b) in figure 2, the received signal of the users that feed back their channel is shown in red. After the end of time slot 2, the transmitter knows both $\mathbf{h}_A^H(1)$ and $\mathbf{h}_B^H(2)$, and if it sends the scalar symbol $u_{AB} = \mathbf{h}_A^H(1)\mathbf{u}_B + \mathbf{h}_B^H(2)\mathbf{u}_A$ to both receivers $A$ and $B$, both of them can decode their intended vectors by means of interference cancellation. Such a message ($u_{AB}$) is called an order-2 message, since it is intended for two receivers [2]. Therefore, for successful detection of 12 symbols, 3 order-2 messages ($u_{AB}$, $u_{AC}$ and $u_{BC}$) must be sent. According to [2], in a 3-user scenario, 6 order-2 messages can be sent in a frame of 5 time slots, where the first three time slots of the frame look the same as pattern (b) in figure 2 and in the next two time slots the transmitter requires no CSIT. Hence, the 12 symbols are successfully decoded, and the resulting achievable sum DoF is greater than the sum DoF of the fixed CSIT scenario.

Fig. 3: Transmission scheme for the example of alternating CSIT pattern
Therefore, CSIT alternation is also beneficial in the $K$-user scenario.

The main result of this paper is an outer bound for the DoF region that, given the marginal probabilities of CSIT, holds regardless of the CSIT pattern; its achievability is considered in some scenarios with specific patterns. For example, given the marginal probabilities as in figure 1 ($\lambda_P = \frac{2}{3}$, $\lambda_D = \lambda_N = \frac{1}{6}$), the results of this paper give an upper bound on the sum DoF. This bound is optimal for the CSIT pattern in (a), since it is achievable; however, whether it is also tight for the pattern in (b) is unknown. It is our conjecture that it is not tight for the latter, despite (b) having the same marginal probabilities as (a). This dependency on the CSIT pattern, which will be discussed in section VI, is equivalent to having the optimal DoF region as a function of $\lambda_{DDP}, \lambda_{PNN}, \ldots$ in such a way that they do not add up to produce only the marginal probabilities, in contrast to the results of [9] for the 2-user case.

III. MAIN RESULTS

Theorem. Let $O_K$ be the outer bound for the DoF region of the $K$-user MISO BC with $M$ transmit antennas at the transmitter ($M \geq K$). Given the marginal probabilities of CSIT for user $i$ (which can be any two of $\lambda_P^i$, $\lambda_D^i$ and $\lambda_N^i$, since $\lambda_P^i + \lambda_D^i + \lambda_N^i = 1$),

$$O_K = \left\{ (d_1, d_2, \ldots, d_K) \;\middle|\; \sum_{i=1}^{j} \frac{d_{\pi_j(i)}}{i} \leq 1 + \sum_{i=2}^{j} \frac{\sum_{r=1}^{i-1} \lambda_P^{\pi_j(r)}}{i(i-1)},\ \ \sum_{i=1}^{j} d_{\pi_j(i)} \leq 1 + \sum_{i=1}^{j-1} \left(\lambda_P^{\psi_{\pi_j}(i)} + \lambda_D^{\psi_{\pi_j}(i)}\right),\ \forall \pi_j,\ j = 1, 2, \ldots, K \right\} \tag{2}$$

where $\pi_j(\cdot)$ is an arbitrary permutation of size $j$ over the indices $(1, 2, \ldots, K)$, and $\psi_{\pi_j}(\cdot)$ is a permutation of $\pi_j$ satisfying

$$\left(\lambda_P^{\psi_{\pi_j}(i)} + \lambda_D^{\psi_{\pi_j}(i)}\right) \leq \left(\lambda_P^{\psi_{\pi_j}(i+1)} + \lambda_D^{\psi_{\pi_j}(i+1)}\right), \quad i = 1, 2, \ldots, j-1. \tag{3}$$

For the symmetric scenario, $O_K$ is simplified as

$$O_K = \left\{ (d_1, d_2, \ldots, d_K) \;\middle|\; \sum_{i=1}^{j} \frac{d_{\pi_j(i)}}{i} \leq 1 + \lambda_P \sum_{i=2}^{j} \frac{1}{i},\ \ \sum_{i=1}^{j} d_{\pi_j(i)} \leq 1 + (j-1)(\lambda_P + \lambda_D),\ \forall \pi_j,\ j = 1, 2, \ldots, K \right\}. \tag{4}$$

It is important to note that these outer bounds hold regardless of the CSIT patterns and are only a function of the marginal probabilities. The proof is provided in the next section.

IV. PROOF OF THE THEOREM

The structure of the proof can be briefly itemized as follows.
• Applying some sort of improvement to the channel.
• The usage of Fano's inequality.
• Application of the Csiszár sum identity [11] as in [8] to change the difference between vector entropies into the sum of the component-wise entropy differences.
• Finding an upper bound for these entropy differences by application of two provided lemmas.

Having an outer bound for the DoF region of the general $K$-user BC ($M \geq K$), it is obvious that each subset of users with cardinality $j$ ($j < K$) should satisfy the outer bound for the $j$-user BC ($O_j$). Therefore, we only consider the proof of the inequalities involving all the $K$ users. For simplicity, we show the inequalities for the identity permutation (i.e., $\pi_K(i) = i$), while the results can be easily extended to any other arbitrary permutation.

A. Proof of $\sum_{i=1}^{K} \frac{d_i}{i} \leq 1 + \sum_{i=2}^{K} \frac{\sum_{r=1}^{i-1} \lambda_P^r}{i(i-1)}$

First, we improve the channel by giving the message and observation of user $i$ to users $i+1$ to $K$ ($i = 1, \ldots, K-1$). Hence, from Fano's inequality,

$$nR_i \leq I(W_i; Y_1^n, \ldots, Y_i^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n) + \epsilon_n \tag{5}$$

where $W_0 = \emptyset$. This improvement does not decrease the capacity region, meaning that the capacity region of the original channel is a subset of that of the improved channel. Also, by this improvement, the channel input and outputs (i.e., the enhanced observations of the users) form a Markov chain, which results in a degraded broadcast channel [12]. Therefore, according to [13], since feedback does not increase the capacity of degraded broadcast channels, we can ignore the delayed CSIT (D) and replace it with No CSIT (N). In other words, at time instant $n$, knowledge of the CSI up to time instant $n-1$ is not beneficial in a physically degraded BC. Therefore, it is equivalent to having the channel of user $i$ perfectly known with probability $\lambda_P^i$ and not known otherwise.
It is important to note that although the channel has become physically degraded, the perfect CSIT (P) cannot be replaced with No CSIT (N), since (P) means that at time instant $n$ the current state of the channel is known to the transmitter perfectly, which enables it to know the received signal within noise level (i.e., the results of [13] cannot be applied in this case).

From now on, we ignore the term $\epsilon_n$ for simplicity and write

$$\sum_{i=1}^{K} \frac{nR_i}{i} \leq \sum_{i=1}^{K} \frac{I(W_i; Y_1^n, \ldots, Y_i^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n)}{i} \tag{6}$$

$$\leq h(Y_1^n \mid \mathbf{H}^n) + \sum_{i=2}^{K} \left[ \frac{h(Y_1^n, \ldots, Y_i^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n)}{i} - \frac{h(Y_1^n, \ldots, Y_{i-1}^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n)}{i-1} \right] + nO(\log P) \tag{7}$$

where $Y_0 = \emptyset$ and we have used the fact that

$$\frac{h(Y_1^n, \ldots, Y_K^n \mid W_1, \ldots, W_K, \mathbf{H}^n)}{nK} \sim O(\log P),$$

since with the knowledge of $W_1, \ldots, W_K, \mathbf{H}^n$, the observations $Y_1^n, \ldots, Y_K^n$ can be reconstructed within noise distortion. From the chain rule of entropies, each of the terms in the summation in (7) can be written as

$$\frac{h(Y_1^n, \ldots, Y_i^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n)}{i} - \frac{h(Y_1^n, \ldots, Y_{i-1}^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n)}{i-1} = \sum_{j=1}^{n} \left[ \frac{h(Y_1(j), \ldots, Y_i(j) \mid W_1, \ldots, W_{i-1}, Y_1^{j-1}, \ldots, Y_i^{j-1}, \mathbf{H}^j)}{i} - \frac{h(Y_1(j), \ldots, Y_{i-1}(j) \mid W_1, \ldots, W_{i-1}, Y_1^{j-1}, \ldots, Y_{i-1}^{j-1}, \mathbf{H}^j)}{i-1} \right] \tag{8}$$

where $Y_i^{j-1}$ is the time extension of $Y_i$ from time instant 1 to $j-1$. By adding $Y_i^{j-1}$ to the conditions of the second entropy, (8) will be increased to

$$\sum_{j=1}^{n} \left[ \frac{h(Y_1(j), \ldots, Y_i(j) \mid T_{i,j}, \mathbf{H}(j))}{i} - \frac{h(Y_1(j), \ldots, Y_{i-1}(j) \mid T_{i,j}, \mathbf{H}(j))}{i-1} \right] \tag{9}$$

where $T_{i,j} = (W_1, \ldots, W_{i-1}, Y_1^{j-1}, \ldots, Y_i^{j-1}, \mathbf{H}^{j-1})$. Therefore, we can write

$$\sum_{i=1}^{K} \frac{nR_i}{i} \leq \underbrace{h(Y_1^n \mid \mathbf{H}^n)}_{\leq\, n \log P} + \sum_{i=2}^{K} \sum_{j=1}^{n} \left[ \frac{h(Y_1(j), \ldots, Y_i(j) \mid T_{i,j}, \mathbf{H}(j))}{i} - \frac{h(Y_1(j), \ldots, Y_{i-1}(j) \mid T_{i,j}, \mathbf{H}(j))}{i-1} \right] + nO(\log P). \tag{10}$$

Before going further, the following lemma is needed.

Lemma 1. Let $\Gamma_n = \{Y_1, Y_2, \ldots, Y_n\}$ be a set of $n$ ($\geq 2$) arbitrary random variables and $\Omega_i^j(\Gamma_n)$ be a sliding window of size $j$ over $\Gamma_n$, i.e.,

$$\Omega_i^j(\Gamma_n) = Y_{(i-1)_n + 1}, Y_{(i)_n + 1}, \ldots, Y_{(i+j-2)_n + 1}$$

where $(\cdot)_n$ denotes the modulo $n$ operation. Then,

$$(n-1)\, h(Y_1, Y_2, \ldots, Y_n \mid A) \leq \sum_{i=1}^{n} h(\Omega_i^{n-1}(\Gamma_n) \mid A) \tag{11}$$

where $A$ is an arbitrary condition.

Proof:

We prove the above by induction. It is obvious that (11) holds for $n = 2$ (i.e., $h(Y_1, Y_2 \mid A) \leq h(Y_1 \mid A) + h(Y_2 \mid A)$). Now, assuming that (11) is valid for $n$, we show that it also holds for $n+1$. Replacing $n$ with $n+1$, we have

$$n\, h(Y_1, \ldots, Y_{n+1} \mid A) = h(Y_1, \ldots, Y_{n+1} \mid A) + (n-1)\, h(Y_1, Y_2, \ldots, Y_{n-1}, Z \mid A) \tag{12}$$

$$\leq h(Y_1, \ldots, Y_{n+1} \mid A) + h(\Omega_1^{n-1}(\Psi_n) \mid A) + \sum_{i=2}^{n} h(\Omega_i^{n-1}(\Psi_n) \mid A) \tag{13}$$

$$= h(Y_1, \ldots, Y_{n+1} \mid A) + h(Y_1, \ldots, Y_{n-1} \mid A) + \sum_{i=2}^{n} h(\Omega_i^{n}(\Gamma_{n+1}) \mid A) \tag{14}$$

$$= h(Y_1, \ldots, Y_n \mid A) + h(Y_{n+1} \mid Y_1, \ldots, Y_n, A) + h(Y_1, \ldots, Y_{n-1} \mid A) + \sum_{i=2}^{n} h(\Omega_i^{n}(\Gamma_{n+1}) \mid A) \tag{15}$$

$$\leq h(Y_1, \ldots, Y_n \mid A) + h(Y_{n+1} \mid Y_1, \ldots, Y_{n-1}, A) + h(Y_1, \ldots, Y_{n-1} \mid A) + \sum_{i=2}^{n} h(\Omega_i^{n}(\Gamma_{n+1}) \mid A) \tag{16}$$

$$= h(Y_1, \ldots, Y_n \mid A) + h(Y_{n+1}, Y_1, \ldots, Y_{n-1} \mid A) + \sum_{i=2}^{n} h(\Omega_i^{n}(\Gamma_{n+1}) \mid A) \tag{17}$$

$$= \sum_{i=1}^{n+1} h(\Omega_i^{n}(\Gamma_{n+1}) \mid A) \tag{18}$$

where in (12), $Z = (Y_n, Y_{n+1})$; in (13), $\Psi_n = \{Y_1, Y_2, \ldots, Y_{n-1}, Z\}$ and the assumption of (11) being valid for $n$ is used; in (14), we have used the fact that $\Omega_i^{n-1}(\Psi_n) = \Omega_i^{n}(\Gamma_{n+1})$ (for $2 \leq i \leq n$); and finally, (16) is due to the fact that conditioning reduces differential entropies.

Now, each term in the summation of (10) can be rewritten as

$$\frac{(i-1)\, h(Y_1(j), \ldots, Y_i(j) \mid T_{i,j}, \mathbf{H}(j)) - i\, h(Y_1(j), \ldots, Y_{i-1}(j) \mid T_{i,j}, \mathbf{H}(j))}{i(i-1)} \tag{19}$$

and according to the previous lemma,

$$(19) \leq \frac{\sum_{r=1}^{i} \left[ h(\Omega_r^{i-1}(\Gamma_i) \mid T_{i,j}, \mathbf{H}(j)) - h(Y_1(j), \ldots, Y_{i-1}(j) \mid T_{i,j}, \mathbf{H}(j)) \right]}{i(i-1)} \tag{20}$$

$$= \frac{\sum_{r=1}^{i-1} \left[ h(Y_i(j) \mid E_r, T_{i,j}, \mathbf{H}(j)) - h(Y_r(j) \mid E_r, T_{i,j}, \mathbf{H}(j)) \right]}{i(i-1)} \tag{21}$$

where $\Gamma_i = \{Y_1(j), Y_2(j), \ldots, Y_i(j)\}$, $E_r = \{Y_1(j), Y_2(j), \ldots, Y_{i-1}(j)\} - \{Y_r(j)\}$ and (21) is from the chain rule of entropies.
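Lemma 1 can be sanity-checked numerically. The sketch below is only an illustration under simplifying assumptions that are not part of the paper: the random variables are taken to be jointly Gaussian and real-valued, the condition $A$ is dropped, and the helper names (`det`, `gaussian_entropy`, `sliding_window`, `lemma1_gap`, `random_covariance`) are hypothetical. For a Gaussian vector with covariance $\Sigma$, the differential entropy of any sub-vector has the closed form $\frac{1}{2}\log\left((2\pi e)^k \det \Sigma_S\right)$, so both sides of (11) can be evaluated exactly.

```python
import math
import random


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def gaussian_entropy(cov, idx):
    """Differential entropy (nats) of the Gaussian sub-vector indexed by idx."""
    sub = [[cov[r][c] for c in idx] for r in idx]
    k = len(idx)
    return 0.5 * math.log(((2 * math.pi * math.e) ** k) * det(sub))


def sliding_window(i, size, n):
    """0-based indices of the cyclic window of the given size starting at 1-based position i."""
    return [(i - 1 + t) % n for t in range(size)]


def lemma1_gap(cov):
    """Right-hand side minus left-hand side of (11); Lemma 1 says this is non-negative."""
    n = len(cov)
    lhs = (n - 1) * gaussian_entropy(cov, list(range(n)))
    rhs = sum(gaussian_entropy(cov, sliding_window(i, n - 1, n))
              for i in range(1, n + 1))
    return rhs - lhs


def random_covariance(n, rng):
    """A random positive definite matrix of the form B B^T + I (assumed test input)."""
    b = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    return [[sum(b[r][k] * b[c][k] for k in range(n)) + (1.0 if r == c else 0.0)
             for c in range(n)] for r in range(n)]
```

Calling `lemma1_gap(random_covariance(4, random.Random(0)))` returns the slack in (11) for one random instance; for independent variables (identity covariance) the two sides coincide.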
Before going further, the following lemma is needed.

Lemma 2. Consider a $K$-user MISO BC. At time instant $j$, two arbitrary users are selected with the received signals

$$Y_m(j) = \mathbf{h}_m(j)^T \mathbf{x}(j) + w_m(j) \tag{22}$$

$$Y_q(j) = \mathbf{h}_q(j)^T \mathbf{x}(j) + w_q(j). \tag{23}$$

Without loss of generality, we assume $m > q$. For simplicity, we assume that the communication is done in real dimensions, where $\mathbf{x} \in \mathbb{R}^{M \times 1}$ satisfies $E\left[\|\mathbf{x}\|^2\right] \leq P$, $\mathbf{h}_m$ and $\mathbf{h}_q$ have the distribution $\mathcal{N}(\mathbf{0}, \mathbf{I})$, and $w_m$ and $w_q$ have the distribution $\mathcal{N}(0, 1)$. When the CSIT of a user is either Perfect (P) or Not known (N), the following upper bound holds for the difference between entropies:

$$\lim_{P \to \infty} \frac{h(Y_m(j) \mid T) - h(Y_q(j) \mid T)}{\log P} \leq \begin{cases} 1 & \text{when CSIT of } \mathbf{h}_q(j) \text{ is P} \\ 0 & \text{when CSIT of } \mathbf{h}_q(j) \text{ is N} \end{cases} \tag{24}$$

where $T$ is a condition such as the condition of the entropies in (21) or later in (34). Interestingly, (24) is only a function of the CSIT of the second user. In other words, in the four possible cases of $PP$, $PN$, $NP$ and $NN$, the upper bound (not the exact value) for the pre-log factor of the difference is defined by the CSIT of the second user, resulting in the same upper bound for the $PN$ and $NN$ cases, and the same upper bound for the $PP$ and $NP$ cases.

Proof:

Based on the four possible states for the joint CSIT of $\mathbf{h}_m(j)$ and $\mathbf{h}_q(j)$, we have

1) CSIT of $\mathbf{h}_m(j)$ is N or P and CSIT of $\mathbf{h}_q(j)$ is P:

$$h(Y_m(j) \mid T) - h(Y_q(j) \mid T) \leq \underbrace{h(Y_m(j) \mid T)}_{\leq\, \log P} - \underbrace{h(Y_q(j) \mid T, W_1, \ldots, W_K)}_{O(\log P)} \tag{25}$$

Since $\mathbf{h}_q(j)$ is known, a Gaussian input with the conditional covariance matrix $\boldsymbol{\Sigma}_{\mathbf{x} \mid T} = P\, \mathbf{u}_q^{\perp} \mathbf{u}_q^{\perp H}$ achieves the upper bound, where $\mathbf{u}_q^{\perp}$ is a unit vector in the direction orthogonal to $\mathbf{h}_q(j)$.

2) CSIT of $\mathbf{h}_m(j)$ is N and CSIT of $\mathbf{h}_q(j)$ is N: In this case both $Y_m(j)$ and $Y_q(j)$ are statistically equivalent (i.e., they have the same probability density functions, and subsequently, the same entropies). Therefore,

$$h(Y_m(j) \mid T) - h(Y_q(j) \mid T) = 0 \tag{26}$$

3) CSIT of $\mathbf{h}_m(j)$ is P and CSIT of $\mathbf{h}_q(j)$ is N: This is a rather more complicated scenario and we defer the proof to Appendix A.

From (10) and (21), we have

$$\sum_{i=1}^{K} \frac{nR_i}{i} \leq n \log P + \sum_{i=2}^{K} \sum_{j=1}^{n} \sum_{r=1}^{i-1} \frac{h(Y_i(j) \mid E_r, T_{i,j}, \mathbf{H}(j)) - h(Y_r(j) \mid E_r, T_{i,j}, \mathbf{H}(j))}{i(i-1)} + nO(\log P) \tag{27}$$

$$= n \log P + \sum_{i=2}^{K} \sum_{r=1}^{i-1} \frac{\sum_{j=1}^{n} \left[ h(Y_i(j) \mid E_r, T_{i,j}, \mathbf{H}(j)) - h(Y_r(j) \mid E_r, T_{i,j}, \mathbf{H}(j)) \right]}{i(i-1)} + nO(\log P) \tag{28}$$

$$\leq n \log P + \sum_{i=2}^{K} \sum_{r=1}^{i-1} \frac{n \lambda_P^r}{i(i-1)} \log P + nO(\log P) \tag{29}$$

where (29) is from the application of lemma 2 and the fact that $n$ is sufficiently large. Therefore,

$$\sum_{i=1}^{K} \frac{d_i}{i} \leq 1 + \sum_{i=2}^{K} \frac{\sum_{r=1}^{i-1} \lambda_P^r}{i(i-1)}. \tag{30}$$

It is obvious that the same approach can be applied to any other permutation of $(1, 2, \ldots, K)$. In addition to the mentioned proof, an alternative converse is provided in Appendix B.

B. Proof of $\sum_{i=1}^{K} d_i \leq 1 + \sum_{i=1}^{K-1} \left(\lambda_P^{\psi_{\pi_K}(i)} + \lambda_D^{\psi_{\pi_K}(i)}\right)$

We enhance the channel in two ways:

1) Like the approach in [9], whenever there is delayed CSIT (D), we assume that it is perfect instantaneous CSIT (P), but we keep the probability of delayed CSIT. In other words, the CSIT of user $i$ is perfect with probability $\lambda_P^i + \lambda_D^i$ and unknown otherwise.

2) We give the message of user $i$ ($W_i$) to users $i+1, i+2, \ldots, K$.

Therefore,

$$nR_i \leq I(W_i; Y_i^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n). \tag{31}$$

By summing (31) over users and writing the mutual informations in terms of differential entropies,

$$\sum_{i=1}^{K} nR_i \leq \underbrace{h(Y_1^n \mid \mathbf{H}^n)}_{\leq\, n \log P} + \sum_{i=2}^{K} \left[ h(Y_i^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n) - h(Y_{i-1}^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n) \right] + nO(\log P). \tag{32}$$

The term in the summation can be written as

$$h(Y_i^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n) - h(Y_{i-1}^n \mid W_1, \ldots, W_{i-1}, \mathbf{H}^n) = \sum_{j=1}^{n} \left[ h(Y_i(j) \mid T_{i,j}, \mathbf{H}(j)) - h(Y_{i-1}(j) \mid T_{i,j}, \mathbf{H}(j)) \right] \tag{33}$$

where $T_{i,j} = \left(W_1, \ldots, W_{i-1}, \mathbf{H}^{j-1}, Y_{i-1}(1), \ldots, Y_{i-1}(j-1), Y_i(j+1), \ldots, Y_i(n)\right)$. The proof of (33) is provided in Appendix C. Therefore,

$$\sum_{i=1}^{K} nR_i \leq n \log P + \sum_{i=2}^{K} \sum_{j=1}^{n} \left[ h(Y_i(j) \mid T_{i,j}, \mathbf{H}(j)) - h(Y_{i-1}(j) \mid T_{i,j}, \mathbf{H}(j)) \right] \tag{34}$$

and finally, by applying the results of lemma 2 to (34), we have

$$\sum_{i=1}^{K} d_i \leq 1 + \sum_{i=2}^{K} \left(\lambda_P^{i-1} + \lambda_D^{i-1}\right) = 1 + \sum_{i=1}^{K-1} \left(\lambda_P^{i} + \lambda_D^{i}\right). \tag{35}$$

Let $\pi_K(\cdot)$ be an arbitrary permutation of size $K$ on $(1, \ldots, K)$. Applying the same reasoning, we have

$$\sum_{i=1}^{K} d_i \leq 1 + \sum_{i=1}^{K-1} \left(\lambda_P^{\pi_K(i)} + \lambda_D^{\pi_K(i)}\right) \quad \forall \pi_K(\cdot) \tag{36}$$

(36) results in $K$ inequalities all having the same left-hand side. Therefore,

$$\sum_{i=1}^{K} d_i \leq 1 + \min_{\pi_K(\cdot)} \sum_{i=1}^{K-1} \left(\lambda_P^{\pi_K(i)} + \lambda_D^{\pi_K(i)}\right) \tag{37}$$

and it is obvious that $\psi_{\pi_K}(\cdot)$ will minimize (37) if and only if it satisfies (3) (for $j = K$).

V. ACHIEVABILITY
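As a numerical companion to the achievability discussion, the symmetric outer bound (4) can be checked point by point. The sketch below is an illustration rather than part of the paper: it assumes (4) is read as $\sum_{i=1}^{j} d_{\pi_j(i)}/i \leq 1 + \lambda_P \sum_{i=2}^{j} 1/i$ and $\sum_{i=1}^{j} d_{\pi_j(i)} \leq 1 + (j-1)(\lambda_P + \lambda_D)$ for every ordered choice of $j$ users, and the function name `in_symmetric_outer_bound` is hypothetical. Exact rational arithmetic is used so that points lying on a bounding hyperplane are classified correctly.

```python
from fractions import Fraction
from itertools import permutations


def in_symmetric_outer_bound(d, lam_p, lam_d):
    """Return True if the DoF tuple d satisfies every inequality of the symmetric bound (4)."""
    K = len(d)
    for j in range(1, K + 1):
        # In the symmetric case the right-hand sides depend only on j.
        weighted_rhs = 1 + lam_p * sum(Fraction(1, i) for i in range(2, j + 1))
        sum_rhs = 1 + (j - 1) * (lam_p + lam_d)
        # permutations(range(K), j) enumerates every ordered choice of j distinct users.
        for order in permutations(range(K), j):
            weighted_lhs = sum(Fraction(d[user]) / position
                               for position, user in enumerate(order, start=1))
            sum_lhs = sum(Fraction(d[user]) for user in order)
            if weighted_lhs > weighted_rhs or sum_lhs > sum_rhs:
                return False
    return True
```

For instance, with $K = 3$, $\lambda_P = \frac{1}{2}$ and $\lambda_D = 0$, the point $(1, \frac{1}{2}, \frac{1}{2})$ passes the check while $(1, 1, 1)$ does not. The enumeration grows factorially in $K$, so this is only practical for small numbers of users.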

In this section, we consider the achievability of the symmetric case. The outer bound in the theorem consists of $2^K - 1 + \sum_{j=2}^{K} j! \binom{K}{j}$ inequalities. For $K = 2$, it is observed that the outer bound (even with $M > 2$) is the same as in [9], where it was shown to be achievable regardless of the pattern of CSIT. For $K \geq 3$, we show that given the marginal probabilities of CSIT, there exists at least one CSIT pattern that achieves the outer bound in some scenarios. We investigate the following two cases:

A. $\lambda_D = 0$

In this case, $2^K - 1$ inequalities ($2^K - K - 1$ inequalities having $\sum_i d_i$ (summation with equal weights) on the left-hand side and $K$ single-user inequalities) are active and the remaining $\sum_{j=2}^{K} j! \binom{K}{j}$ inequalities become inactive. The reason can be easily verified from the inequalities; however, a simpler intuitive way is to consider that those $\sum_{j=2}^{K} j! \binom{K}{j}$ inequalities are derived from making the channel degraded, and when there is no delayed CSIT, this degradation results in loose bounds. Equivalently, when there is no delayed CSIT, those inequalities derived from the degraded broadcast channel are inactive.

Fig. 4: Region in case A for 3 user BC

In this case, the region is defined by $2^K - 1$ hyperplanes in $\mathbb{R}_+^K$ and has the following $K$ corner points:

$$(1, \lambda_P, \ldots, \lambda_P),\ (\lambda_P, 1, \lambda_P, \ldots, \lambda_P),\ \ldots,\ (\lambda_P, \ldots, \lambda_P, 1)$$

The corner points have the unique characteristic that the whole region can be constructed by time sharing between them. Therefore, the achievability of these points is equivalent to the achievability of the whole region. Figure 4 shows the region for the 3 user broadcast channel. The corner points are simply achieved by the scheme shown in figure 5. The scheme has $N$ time slots and consists of two parts: in the first $\lambda_P N$ time slots, zero forcing beamforming (ZFBF) is carried out, where each user receives one interference-free symbol. In the remaining $\lambda_N N$ time slots, only one particular user (depending on the corner point of interest) is scheduled.

B. $\lambda_N \leq \dfrac{\lambda_D}{\sum_{j=2}^{K} \frac{1}{j}}$

Before going further, we need the following simple lemma.

Fig. 5: Achievable scheme in case A for 3 user BC

Lemma 3. The minimum probability of delayed CSIT for sending order-$j$ symbols in the $K$-user MAT is

$$\lambda_D^{\min}(K, j) = 1 - \frac{K - j + 1}{K \sum_{i=j}^{K} \frac{1}{i}}. \tag{38}$$

Substituting $j = 1$ in (38), we get the minimum $\lambda_D$ for order-1 symbols as

$$\lambda_D^{\min}(K) = 1 - \frac{1}{\sum_{i=1}^{K} \frac{1}{i}}. \tag{39}$$
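As a quick illustration of (38), the case $K = 3$ can be worked out by hand (a worked instance added for orientation, not taken from the paper):

```latex
\begin{align*}
\lambda_D^{\min}(3,1) &= 1 - \frac{3}{3\left(1 + \frac{1}{2} + \frac{1}{3}\right)} = 1 - \frac{6}{11} = \frac{5}{11},\\
\lambda_D^{\min}(3,2) &= 1 - \frac{2}{3\left(\frac{1}{2} + \frac{1}{3}\right)} = 1 - \frac{4}{5} = \frac{1}{5},\\
\lambda_D^{\min}(3,3) &= 1 - \frac{1}{3 \cdot \frac{1}{3}} = 0 .
\end{align*}
```

The last value is consistent with the proof of the lemma, where phase $j$ requires feedback from only $K - j$ receivers, so the final phase requires none.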

Proof:

From [2], the MAT algorithm is based on a concatenation of $K$ phases. Phase $j$ takes $(K-j+1)\binom{K}{j}$ order-$j$ messages as its input, takes $\binom{K}{j}$ time slots and produces $j\binom{K}{j+1}$ order-$(j+1)$ messages as its output, as illustrated below:

$$(K-j+1)\binom{K}{j}\ \text{order-}j \ \rightarrow\ \underbrace{\text{Phase } j}_{\binom{K}{j}\ \text{time slots}} \ \rightarrow\ j\binom{K}{j+1}\ \text{order-}(j+1).$$

In each time slot of phase $j$, the transmitter sends a random linear combination of the $(K-j+1)$ symbols to a subset $S$ of receivers, $|S| = j$. Sending the overheard interferences from the remaining $(K-j)$ receivers to the receivers in subset $S$ enables them to successfully decode their $(K-j+1)$ symbols by constructing a set of $(K-j+1)$ linearly independent equations. Therefore, the transmitter needs to know the channel of only $(K-j)$ receivers. In other words, at each time slot of phase $j$, the feedback of $(K-j)$ CSI is enough. In the MAT algorithm, the number of output symbols that phase $j$ produces should match the number of input symbols of phase $j+1$. The ratio between the input of phase $j+1$ and the output of phase $j$ is

$$\frac{(K-j)\binom{K}{j+1}}{j\binom{K}{j+1}} = \frac{K-j}{j}.$$

This means that $(K-j)$ repetitions of phase $j$ will produce the inputs needed by $j$ repetitions of phase $j+1$. In general, in order to have an integer number of repetitions, we multiply phase 1 by $K!$ (i.e., repeat it $K!$ times), phase 2 by $\frac{K!}{K-1}$, and so on. Therefore, phase $j$ will be repeated $(j-1)!\,(K-j)!\,K$ times, which takes $(j-1)!\,(K-j)!\,K\binom{K}{j}$ time slots. Since $(K-j)$ feedbacks from each time slot are sufficient, the number of feedbacks will be $(j-1)!\,(K-j)!\,K\binom{K}{j}(K-j)$. For a successful decoding of order-$j$ symbols, all the higher order symbols must be decoded successfully.

Fig. 6: Region in case B for 3 user BC
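The bookkeeping in this proof can be tallied mechanically. The sketch below is an illustrative check rather than the authors' code: it assumes phase $j$ is repeated $(j-1)!\,(K-j)!\,K$ times, with $\binom{K}{j}$ time slots per repetition and $K-j$ feedbacks per time slot, and it divides the total number of feedbacks from phases $j$ to $K$ by the total number of time slots times the number of users. The function names are hypothetical. The result is compared with the closed form (38) of Lemma 3.

```python
from fractions import Fraction
from math import comb, factorial


def phase_repetitions(K, j):
    """Number of times phase j is repeated so that all phase inputs and outputs match."""
    return factorial(j - 1) * factorial(K - j) * K


def min_delayed_csit_probability(K, j):
    """Feedbacks needed by phases j..K divided by (time slots of phases j..K) * K users."""
    slots = 0
    feedbacks = 0
    for i in range(j, K + 1):
        phase_slots = phase_repetitions(K, i) * comb(K, i)
        slots += phase_slots
        feedbacks += phase_slots * (K - i)  # K - i feedbacks per time slot of phase i
    return Fraction(feedbacks, slots * K)


def closed_form(K, j):
    """Closed-form expression (38): 1 - (K - j + 1) / (K * sum_{i=j}^{K} 1/i)."""
    harmonic_tail = sum(Fraction(1, i) for i in range(j, K + 1))
    return 1 - Fraction(K - j + 1, K) / harmonic_tail
```

For example, `min_delayed_csit_probability(2, 1)` evaluates to `Fraction(1, 3)`: in the two-user MAT scheme, two feedbacks are needed over three time slots and two users.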
Therefore, instead of having delayed CSIT at all time instantsfrom all users, the minimum probability of delayed CSIT is the number of feedbacks from phase j to K dividedby the whole number of time slots multiplied by the number of users, λ minD ( K, j ) = (cid:80) Ki = j ( i − K − i )! K  Ki  ( K − i ) (cid:80) Ki = j ( i − K − i )! K  Ki  K = 1 − K − j + 1 K (cid:80) Ki = j i . In this case (i.e., λ N ≤ λ D (cid:80) Kj =2 1 j ), the K − K − inequalities having (cid:80) i d i (summation with equal weights)in the left-hand side become inactive and the remaining (cid:80) Kj =1 j !  Kj  inequalities are active which construct (cid:80) Kj =1 j !  Kj  hyperplanes in R K + . This region has the following K − corner points •  K  corner points in the form (1 , λ P , . . . , λ P ) , ( λ P , , λ P , . . . , λ P ) , . . . , ( λ P , . . . , λ P , •  K  corner points in the form ( λ P , λ P , λ P , . . . , λ P ) , ( λ P , λ P , λ P , λ P , . . . , λ P ) , . . . •  K  corner points in the form ( λ P , λ P , λ P , λ P , . . . , λ P ) , . . . Fig. 7: Achievable scheme in case B for 3 user BC • . . . , and ﬁnally,  KK  corner points in the form ( λ P (cid:80) Ki =2 1 i (cid:80) Ki =1 1 i , λ P (cid:80) Ki =2 1 i (cid:80) Ki =1 1 i , . . . , λ P (cid:80) Ki =2 1 i (cid:80) Ki =1 1 i ) The region for the 3 user broadcast channel and the achievable scheme are shown in ﬁgure 6 and ﬁgure 7,respectively. The scheme is based on a concatenation of ZFBF and MAT as follows. For the ﬁrst K corner pointslisted above, the achievability scheme is the same as that in the previous section (i.e., ZFBF + ﬁxed user scheduling).For the  Kj  ( j ≥ ) corner points in the form ( λ P (cid:80) ji =2 1 i (cid:80) ji =1 1 i , λ P (cid:80) ji =2 1 i (cid:80) ji =1 1 i , . . . , λ P , . . . , λ P ) , we write λ P = M N , λ D = M N , λ minD ( j ) = mn (40)where λ minD ( j ) is the minimum probability of delayed CSIT for sending order-1 symbols in the j -user MAT. 
$m$, $n$, $M_i$ and $N_i$ ($i = 1, 2$) are integers. Making a common denominator between $\lambda_P$ and $\lambda_D$ we have

$$\lambda_P = \frac{nM_1N_2}{nN_1N_2}, \quad \lambda_D = \frac{nN_1M_2}{nN_1N_2}. \tag{41}$$

We construct $nN_1N_2$ time slots where the CSIT of each user can be Perfect (P) or Delayed (D) in $nM_1N_2$ or $nN_1M_2$ time slots, respectively. In the first $nM_1N_2$ time slots, ZFBF is carried out. In the remaining $n(N_1N_2 - M_1N_2)$ time slots, the $j$-user MAT algorithm is carried out. At each time slot of the ZFBF part, 1 interference-free symbol is received by each user, and in the MAT part, $\frac{n(N_1N_2 - M_1N_2)}{1+\frac{1}{2}+\cdots+\frac{1}{j}}$ symbols are sent to each of the users in subset $S$ (with $|S| = j$), where $S$ depends on the corner point of interest. In order to do the MAT algorithm in the second part, the minimum probability of delayed CSIT should be met:

$$nN_1M_2 \ge \lambda_D^{\min}(j)\, n(N_1N_2 - M_1N_2). \tag{42}$$

Dividing both sides by $nN_1N_2$,

$$\lambda_D \ge \lambda_D^{\min}(j)(1-\lambda_P) = \lambda_D^{\min}(j)(\lambda_D + \lambda_N) \tag{43}$$

which, since $\lambda_D^{\min}(j) = 1 - \frac{1}{\sum_{i=1}^{j}\frac{1}{i}}$, results in

$$\lambda_N \le \frac{\lambda_D}{\sum_{i=2}^{j}\frac{1}{i}}. \tag{44}$$

Since it should be valid for all $j$, we have

$$\lambda_N \le \frac{\lambda_D}{\sum_{i=2}^{K}\frac{1}{i}} \tag{45}$$

which is the condition assumed in this case. For the case when $\lambda_N > \frac{\lambda_D}{\sum_{i=2}^{K}\frac{1}{i}}$, finding a general achievable scheme remains an open problem.

In case B, the $2^K-K-1$ inequalities having $\sum_i d_i$ (summation with equal weights) in the left-hand side were inactive. From section B in the proof of the theorem, these inequalities were derived by enhancing the channel in a way that whenever there is delayed CSIT we replace it with Perfect CSIT, as in [9] for the two user case. Due to this enhancement, a question may be raised whether these inequalities are always loose or not when $\lambda_D \ne 0$ and $K > 2$. The following example shows that these inequalities are not always loose. Consider the CSIT pattern in figure 8. From the outer bound in the theorem, it is observed that the sum DoF must be lower than or equal to $\frac{5}{3}$. In fact, the sum DoF of $\frac{5}{3}$ is optimal and achievable: the scheme below delivers five symbols in three time slots.

Fig. 8: The CSIT pattern of the example.

The transmitted signal at time slot 1 is

$$x(1) = P_1 u_1 + P_2 u_2 + u_3 v$$

where $u_1$ and $u_2$ are the (2 by 1) private message vectors for user 1 and 2, respectively, $u_3$ is the private (scalar) message for user 3 and $v$ is a (3 by 1) vector orthogonal to both $h_1$ and $h_2$. $P_1$ and $P_2$ are the (3 by 2) precoding matrices with the following property:

$$h_2^H P_1 = h_1^H P_2 = 0_{1\times 2}.$$

At time slot 1, the received signals at the receivers are

$$y_1(1) = L_1(u_1)$$
$$y_2(1) = L_2(u_2)$$
$$y_3(1) = \tilde{L}_1(u_1) + \tilde{L}_2(u_2) + (h_3^H v)u_3$$

where $L_1(u_1) = h_1^H P_1 u_1$, $L_2(u_2) = h_2^H P_2 u_2$, $\tilde{L}_1(u_1) = h_3^H P_1 u_1$ and $\tilde{L}_2(u_2) = h_3^H P_2 u_2$. The transmitter sends $\tilde{L}_1(u_1)$ and $\tilde{L}_2(u_2)$ at time slots 2 and 3, respectively. Having $L_1$ and $\tilde{L}_1$, receiver 1 can decode its two private messages, since $h_1$ and $h_3$ are statistically, and hence almost surely linearly, independent. The same applies to receiver 2, and receiver 3 can decode its private message after eliminating the interference terms $\tilde{L}_1$ and $\tilde{L}_2$. Thus, the inequality $d_1 + d_2 + d_3 \le \frac{5}{3}$ is tight in this case.

Fig. 9: Two CSIT patterns, (a) and (b).

Fig. 10: Two symmetric CSIT patterns having the same marginal probabilities.

For the asymmetric scenario with no Delayed (D) CSIT, it is interesting to note that the outer bound in the theorem does not depend on the maximum probability of Perfect CSIT. Since there is no delayed CSIT, those inequalities obtained from making the channel degraded are inactive and only the inequality having the form of summation with equal weights becomes active. According to (2) and (3), the user with the highest probability of perfect CSIT is excluded from the right-hand side of the inequality. For example, according to the theorem, the sum DoF of the two CSIT patterns shown in figure 9 has the same upper bound, and since it is achievable in both patterns, it is optimal.

VI. DEPENDENCY OF THE DOF REGION ON THE CSIT PATTERN

In the previous sections, the focus was on the outer bounds and their achievabilities given only the marginalprobabilities. Here, we show that two different CSIT patterns, though having the same marginal probabilities, donot necessarily have the same DoF region. Consider the two simple symmetric CSIT patterns shown in ﬁgure 10.According to the theorem, the DoF region of both has an outer bound as shown in ﬁgure 4 with the corner points (1 , , ) , ( , , ) and ( , , . It is obvious that the corner points are achievable for pattern ( a ) , and in whatfollows we show that they are not achievable for pattern ( b ) . In other words, the DoF region of pattern ( b ) is insidethat of pattern ( a ) . We write, nR ≤ I ( W ; Y n | H n ) (46) nR ≤ I ( W ; Y n | H n , W ) (47)where in (47), we use the fact that W and W are independent. Adding (46) and (47) results in nR ≤ I ( W ; Y n | H n ) + I ( W ; Y n | H n , W ) . (48)By doing the same for R , we have nR ≤ I ( W ; Y n | H n ) + I ( W ; Y n | H n , W ) . (49) Finally, the rate of user 3 is written as nR ≤ I ( W ; Y n | H n , W , W ) . 
(50)By adding (48),(49) and (50) and writing them in terms of differential entropies, we get nR + 2 nR + nR ≤ h ( Y n | H n ) (cid:124) (cid:123)(cid:122) (cid:125) ≤ n log P + h ( Y n | H n ) (cid:124) (cid:123)(cid:122) (cid:125) ≤ n log P (51) + h ( Y n | H n , W ) − h ( Y n | H n , W ) (cid:124) (cid:123)(cid:122) (cid:125) ≤ n log P + h ( Y n | H n , W ) − h ( Y n | H n , W ) (cid:124) (cid:123)(cid:122) (cid:125) ≤ n log P (52) + h ( Y n | H n , W , W ) − h ( Y n | H n , W , W ) − h ( Y n | H n , W , W ) (cid:124) (cid:123)(cid:122) (cid:125) ≤− h ( Y n ,Y n | H n ,W ,W ) (53) ≤ n P + h ( Y n | H n , W , W ) − h ( Y n , Y n | H n , W , W ) (54) = 8 n P + h ( Y n | H n , W , W ) − h ( Y n ,P NN , Y n ,NP N , Y n ,NNP | H n , W , W ) (cid:124) (cid:123)(cid:122) (cid:125) O (log P ) (55) − h ( Y n ,P NN , Y n ,NP N , Y n ,NNP | H n , W , W , Y n ,P NN , Y n ,NP N , Y n ,NNP ) (cid:124) (cid:123)(cid:122) (cid:125) ≤− h ( Y n ,PNN ,Y n ,NPN ,Y n ,NNP | H n ,W ,W ,Y n ,PNN ,Y n ,NPN ,Y n ,NNP ,W ) ∼ O (log P ) (56) ≤ n P (57)where in (52), the difference terms are ﬁrst written as a time summation of instantaneous differences, as in (33).Then, lemma 2 of section IV is applied to the differences resulting in the values written under braces. We havesplit the observation of users 2 and 3 in terms of the joint CSIT, i.e., Y n = ( Y n ,P NN , Y n ,NP N , Y n ,NNP ) and Y n = ( Y n ,P NN , Y n ,NP N , Y n ,NNP ) . Again, in (55), the difference terms are ﬁrst written as a time summationof instantaneous differences and by the application of lemma 2 we get the upperbound shown under the braces.Speciﬁcally, (55) is due to the fact that there is at least one unknown CSIT (N) in the joint observations Y n , Y n (see rows 1 and 2 of the CSIT pattern shown in ﬁgure 10 ( b ) .) Finally, (56) is due to the fact that conditioningreduces the entropy and knowledge of all the messages and the channels enable us to reconstruct each observationwithin noise distortion. 
Therefore, for pattern ( b ) , the following inequalities hold which make its DoF region insidethat of pattern ( a ) . d + 2 d + d ≤ (58) d + d + 2 d ≤ (59) d + 2 d + 2 d ≤ . (60) Motivated by this simple example, we can have the following set of inequalities for the 3-user MISO BC d + 2 d + d ≤ λ P + λ D ) + ( λ P + λ D ) + ( λ P P − + λ P D − + λ DP − + λ DD − ) (61) d + d + 2 d ≤ λ P + λ D ) + ( λ P + λ D ) + ( λ P − P + λ P − D + λ D − P + λ D − D ) (62) d + 2 d + 2 d ≤ λ P + λ D ) + ( λ P + λ D ) + ( λ − P P + λ − P D + λ − DP + λ − DD ) (63)where a dashed line in the above means that the CSIT of the corresponding user is not important (for example, λ P D − = λ P DP + λ P DD + λ P DN which is a summation over all the possible values for the CSIT of user 3).The same approach could be easily extended to the K -user MISO BC which is omitted for brevity. It is obviousthat none of the above inequalities can have its right-hand side written in terms of only marginal probabilities.Therefore, in contrast to the two user scenario, marginal probabilities of CSIT are not sufﬁcient for deﬁning theDoF region of the general K -user MISO BC, and having the same marginal probabilities does not guarantee thesame DoF region. VII. C ONCLUSION

Given the marginal probabilities of CSIT, an outer bound was derived for the DoF region of the $K$-user MISO BC with CSIT alternating among Perfect (P), Delayed (D) and Not known (N). This outer bound was shown to be achievable by specific CSIT patterns in certain regions. Through an example, we showed that in general, the DoF region of the $K$-user MISO BC (when $K \ge 3$) is a function of the CSIT pattern, or equivalently of the $3^K$ joint state probabilities, rather than of the marginal probabilities alone. In contrast to the two user case, the following items may be the reason why the outer bounds may not always be achievable: 1) the bound should be a function of all the joint probabilities; 2) the channel is made degraded when there are also P and N CSITs; 3) the channel is enhanced by treating D as P. Investigating this problem in more detail and generalizing the results to the MIMO BC are the topics of our future work.

APPENDIX A
PROOF OF CASE (3) IN LEMMA

Lemma 4. Let $\hat{Z}_1$ and $\hat{Z}_2$ be two Gaussian random variables, and let $a$ and $b$ be two deterministic vectors in $\mathbb{R}^M$. Let $U$ be a random variable independent of $\hat{Z}_1$ and $\hat{Z}_2$. In the optimization problem

$$\max_{p(x|u)} \; h(a^T X + \hat{Z}_1 | U) - \mu h(b^T X + \hat{Z}_2 | U) \tag{64}$$

$$\text{subject to } \mathrm{Cov}(X|U) \preceq S \text{ and } X \to U \to (\hat{Z}_1, \hat{Z}_2) \tag{65}$$

for any $\mu \ge 1$ and positive semidefinite $S$, a Gaussian $p(x|u)$ with the same covariance matrix for each $u$ is an optimal solution.

Proof:

We restate theorem 8 in [10]: Let $Z_1$ and $Z_2$ be two Gaussian vectors with strictly positive definite covariance matrices $K_{Z_1}$ and $K_{Z_2}$, respectively. Let $\mu \ge 1$ be a real number, $S$ be a positive semidefinite matrix and $U$ be a random variable independent of $Z_1$ and $Z_2$. Consider the optimization problem

$$\max_{p(x|u)} \; h(X + Z_1 | U) - \mu h(X + Z_2 | U) \tag{66}$$

$$\text{subject to } \mathrm{Cov}(X|U) \preceq S \tag{67}$$

where the maximization is over all conditional distributions of $X$ given $U$ independent of $Z_1$ and $Z_2$. A Gaussian $p(x|u)$ with the same covariance matrix for each $u$ is an optimal solution of this optimization problem. The proof follows the same steps as in theorem 1 in [10], replacing the classical Entropy Power Inequality (EPI) by its conditional version [14].

Now, instead of (150) in Appendix D of [10], (66) can be used. Next, lemma 13 in [10] is generalized to the $M$-dimensional conditional version as follows. Let $Z = (Z_1, Z_2, \ldots, Z_M)^t$ where $Z_1, Z_2, \ldots, Z_M$ are independent Gaussian variables with variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_M^2$, respectively. Let $U$ be a random variable independent of $Z$. For any random vector $X = (X_1, X_2, \ldots, X_M)^t$ with finite variances and, given $U$, independent of $Z$, we have

$$\lim_{\sigma_2^2, \ldots, \sigma_M^2 \to \infty} I(X; X + Z | U) = I(X_1; X_1 + Z_1 | U). \tag{68}$$

The proof is essentially the same as that in [10] (i.e., (153) to (159)), considering the following Markov chains: $X_1 + Z_1 \to X_1 \to \tilde{X} + \tilde{Z}$ and $X_1 \to \tilde{X} \to \tilde{X} + \tilde{Z}$, where $\tilde{X} = (X_2, \ldots, X_M)^t$ and $\tilde{Z} = (Z_2, \ldots, Z_M)^t$.

By eigenvalue decomposition, we have $K_{Z_1} = A\Lambda_1 A^t$ and $K_{Z_2} = B\Lambda_2 B^t$ where $A = [a_1, \ldots, a_M]$, $B = [b_1, \ldots, b_M]$ and $\Lambda_i = \mathrm{diag}(\lambda_{i1}, \lambda_{i2}, \ldots, \lambda_{iM})$ ($i = 1, 2$). The columns of $A$ are $M$ orthogonal vectors in an $M$-dimensional space. In our derivations, these vectors do not need to be orthonormal (i.e., $A$ and $B$ are not necessarily unitary matrices); we only restrict them to have finite norms. Since $A$ and $B$ are invertible (due to being full rank), we can write

$$\lim_{\lambda_{12}, \ldots, \lambda_{1M} \to \infty} I(X; X + Z_1 | U) = \lim_{\lambda_{12}, \ldots, \lambda_{1M} \to \infty} I(A^t X; A^t X + \underbrace{A^t Z_1}_{\hat{Z}} | U) \tag{69}$$

where $\hat{Z}$ is a Gaussian vector independent of $U$ with independent elements having the variances $\hat{\sigma}_j^2 = \lambda_{1j}\|a_j\|^4$ ($j = 1, 2, \ldots, M$), where $a_j$ is the $j$th column of $A$, having a finite norm. According to (68),

$$(69) = I(a_1^t X; a_1^t X + \hat{Z}_1 | U) \tag{70}$$

where $\hat{Z}_1$ denotes the first element of $\hat{Z}$. Following the same steps as in [10] (from (162) to (167)), we get

$$h(a^t X + \hat{Z}_1 | U) - \mu h(b^t X + \hat{Z}_2 | U) \le \max_{0 \preceq K_{X|U} \preceq S} \left\{ \frac{1}{2}\log\big(2\pi e (a^t K_{X|U} a + \hat{\sigma}_1^2)\big) - \frac{\mu}{2}\log\big(2\pi e (b^t K_{X|U} b + \hat{\sigma}_2^2)\big) \right\} \tag{71}$$

where $K_{X|U} = \mathrm{Cov}(X|U)$.

We split $T$ in (24) as $T = (U, \mathbf{h}_m(j), \mathbf{h}_q(j))$. From now on, we drop the time indices for simplicity. Therefore,

$$h(Y_m | T) - h(Y_q | T) \le \max_{S:\, S \succeq 0,\, \mathrm{tr}(S) \le P} \; \max_{0 \preceq K_{X|U} \preceq S} h(Y_m | U, \mathbf{h}_m, \mathbf{h}_q) - h(Y_q | U, \mathbf{h}_m, \mathbf{h}_q)$$

$$= \max_{S:\, S \succeq 0,\, \mathrm{tr}(S) \le P} \; \max_{0 \preceq K_{X|U} \preceq S} E_{\mathbf{h}_m, \mathbf{h}_q}\big\{ h(Y_m | U, \mathbf{h}_m = h_m, \mathbf{h}_q = h_q) - h(Y_q | U, \mathbf{h}_m = h_m, \mathbf{h}_q = h_q) \big\}$$

$$\le \max_{S:\, S \succeq 0,\, \mathrm{tr}(S) \le P} E_{\mathbf{h}_m, \mathbf{h}_q}\Big\{ \max_{0 \preceq K_{X|U} \preceq S} h(Y_m | U, \mathbf{h}_m = h_m, \mathbf{h}_q = h_q) - h(Y_q | U, \mathbf{h}_m = h_m, \mathbf{h}_q = h_q) \Big\} \tag{72}$$

$$= \max_{S:\, S \succeq 0,\, \mathrm{tr}(S) \le P} E_{\mathbf{h}_m, \mathbf{h}_q}\Big\{ \max_{0 \preceq K_{X|U} \preceq S} h(h_m^H X(h_m) + w_m | U) - h(h_q^H X(h_m) + w_q | U) \Big\} \tag{73}$$

$$\le \max_{S:\, S \succeq 0,\, \mathrm{tr}(S) \le P} E_{\mathbf{h}_m, \mathbf{h}_q}\Big\{ \max_{0 \preceq K_{X|U} \preceq S} \Big\{ \frac{1}{2}\log\big(2\pi e (h_m^H K_{X|U} h_m + 1)\big) - \frac{\mu}{2}\log\big(2\pi e (h_q^H K_{X|U} h_q + 1)\big) \Big\} \Big\} \tag{74}$$

where $h_m$ and $h_q$ are two realizations of the random vectors $\mathbf{h}_m$ and $\mathbf{h}_q$. In (72), taking the maximization into the expectation makes the value greater. In (73), we have used the realizations and also consider that, since the CSIT of $\mathbf{h}_m$ is perfect, the transmitted signal can be a function of it, which is denoted by $X(h_m)$. (74) results from lemma 4.

Let $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_M$ be the eigenvalues of the covariance matrix $K_{X|U}$. We can write

$$\log(h_m^H K_{X|U} h_m + 1) \le \log(\lambda_1 \|h_m\|^2 + 1) \tag{75}$$

and equality is achieved when the eigenvector corresponding to the largest eigenvalue of $K_{X|U}$ is aligned with $h_m$. This is in fact possible, since the transmitter has perfect knowledge of $h_m$ and can construct the covariance matrix (via precoding) to meet this condition. However, for user $q$, we have

$$\log(h_q^H K_{X|U} h_q + 1) = \log\Big(\sum_{i=1}^{M} \lambda_i |\alpha_i|^2 + 1\Big) \tag{76}$$

where $\alpha_i$ is the projection of the $i$th eigenvector of $K_{X|U}$ onto $h_q$ (i.e., $\alpha_i = h_q^H v_i$, where $v_i$ is the eigenvector corresponding to $\lambda_i$). Since the transmitter has no CSIT of user $q$, the transmitted signal will be independent of $h_q$. This results in having $\alpha_1 \ne 0$ almost surely (or equivalently, $\Pr\{\alpha_1 = 0\} = 0$). In other words, if $E$ denotes the event of having $h_q$ orthogonal to the eigenvector corresponding to the largest eigenvalue of $K_{X|U}$, then the dimension of $E$ is lower than the dimension of the sample space, resulting in $\Pr\{E\} = 0$. By replacing $\mu$ with 1 in (74) and using (75) and (76), we get

$$\log(h_m^H K_{X|U} h_m + 1) - \log(h_q^H K_{X|U} h_q + 1) \le \log\Big(\frac{\lambda_1 \|h_m\|^2 + 1}{\lambda_1 |\alpha_1|^2 + 1}\Big) \tag{77}$$

$$\le \frac{\|h_m\|^2 \lambda_1}{1 + \|h_m\|^2 \lambda_1} \log\Big(\frac{\|h_m\|^2}{|\alpha_1|^2}\Big) \tag{78}$$

$$\le \underbrace{\log\Big(\frac{\|h_m\|^2}{|\alpha_1|^2}\Big)}_{O(\log P)} \tag{79}$$

where (78) results from the application of the log sum inequality [12, p. 30] and (79) is due to the fact that $\|h_m\|^2$ and $|\alpha_1|^2$ do not scale with $P$. Since the DoF analysis is in the infinite SNR regime, the exact value of $|\alpha_1|$ is not important as long as it is non-zero. Therefore, for each realization of the channel, the term inside the expectation of (74) is of order $O(\log P)$, and so is the expectation. Hence, when the CSIT of $\mathbf{h}_m$ is perfect and the CSIT of $\mathbf{h}_q$ is not known,

$$\lim_{P \to \infty} \frac{h(Y_m | T) - h(Y_q | T)}{\log P} = 0. \tag{80}$$

This completes the proof.

APPENDIX B
AN ALTERNATIVE PROOF OF $\sum_{i=1}^{K} \frac{d_i}{i} \le 1 + \sum_{i=2}^{K} \frac{\sum_{r=1}^{i-1} \lambda_P^r}{i(i-1)}$

The proof is based on the approach used in [3]; therefore, the following definitions are necessary. The channel vector of user $k$ at time $n$ can be written as

$$h_k(n) = \hat{h}_k(n) + \tilde{h}_k(n) \tag{81}$$

where $\hat{h}_k(n)$ and $\tilde{h}_k(n)$ are the estimate of the channel and the estimation error, with distributions $\mathcal{CN}(0, (1 - \sigma_k^2(n))I)$ and $\mathcal{CN}(0, \sigma_k^2(n)I)$, respectively. The variance of the error is

$$\sigma_k^2(n) = E\big[\|\tilde{h}_k(n)\|^2\big].$$

As observed from the above, although the channel is assumed stationary, the estimate is a non-stationary process, meaning that the quality of estimation varies over time. The quality of CSIT for user $k$ at time instant $n$ is

$$\alpha_k(n) = -\lim_{P \to \infty} \frac{\log\big(\sigma_k^2(n)\big)}{\log P}. \tag{82}$$

From the results of [15], if the rate of feedback scales linearly with $\log P$ (or equivalently, the variance of the estimation error decreases as $O(P^{-1})$ or faster), the perfect CSIT multiplexing gain can be obtained. Therefore, the effective range of $\alpha_k(n)$ will be $[0, 1]$ where, in terms of DoF, $\alpha_k(n) = 1$ could be interpreted as perfect CSIT of user $k$ at time instant $n$. We also define $\hat{H}(n) = [\hat{h}_1(n), \ldots, \hat{h}_K(n)]^H$ and $\hat{H}^n = \{\hat{H}(1), \ldots, \hat{H}(n)\}$.
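To make the definition of the CSIT quality in (82) concrete, the sketch below evaluates the ratio $-\log\sigma_k^2 / \log P$ at finite $P$ for a hypothetical error variance model $\sigma^2(P) = \min(1, cP^{-a})$. The model, the constant $c$ and the function names are assumptions made for illustration only; they are not taken from [3] or [15].

```python
import math


def make_error_variance(a, c=2.0):
    """Hypothetical model: error variance c * P^(-a), capped at 1."""
    return lambda P: min(1.0, c * P ** (-a))


def csit_quality(error_variance, P):
    """Finite-P evaluation of -log(sigma^2) / log(P) from definition (82)."""
    return -math.log(error_variance(P)) / math.log(P)


if __name__ == "__main__":
    sigma2 = make_error_variance(a=0.5)
    for exponent in (3, 6, 12, 50, 200):
        P = 10.0 ** exponent
        print(f"P = 1e{exponent}: alpha estimate = {csit_quality(sigma2, P):.4f}")
```

Under this model the estimate approaches the exponent $a$ as $P$ grows, while the constant $c$ only affects the speed of convergence, which is why only the scaling of the error variance with $P$ matters for the DoF.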
Again, for simplicity, we show the inequalities for a fixed permutation of the users, while the results could be easily extended to any arbitrary permutation. As in part A of the first proof, the same channel improvement is done here. The only difference is that we assume the users not only have perfect global CSIR, but also know the channel estimates at the transmitter. Applying this difference to formulae (5) to (10), we rewrite (10) as

$$\sum_{i=1}^{K} \frac{nR_i}{i} \le \overbrace{h(Y_1^n | H^n, \hat{H}^n)}^{\le n\log P} + \sum_{i=2}^{K} \sum_{j=1}^{n} \left[ \frac{h(Y_1(j), \ldots, Y_i(j) | T_{i,j}, H(j))}{i} - \frac{h(Y_1(j), \ldots, Y_{i-1}(j) | T_{i,j}, H(j))}{i-1} \right] + nO(\log P) \tag{83}$$

where $T_{i,j} = (W_1, \ldots, W_{i-1}, Y_1^{j-1}, \ldots, Y_i^{j-1}, H^{j-1}, \hat{H}^j)$. In what follows, we find an upper bound for the term in the brackets of (83). Following the same approach as in [3], we can write

$$\max_{P_{T_{i,j}} P_{x(j)|T_{i,j}}} \left[ \frac{h(Y_1(j), \ldots, Y_i(j) | T_{i,j}, H(j))}{i} - \frac{h(Y_1(j), \ldots, Y_{i-1}(j) | T_{i,j}, H(j))}{i-1} \right] \tag{84}$$

$$\le \max_{P_{T_{i,j}}} E_{T_{i,j}} \left[ \max_{P_{x(j)|T_{i,j}}} \left( \frac{h(Y_1(j), \ldots, Y_i(j) | T_{i,j} = T, H(j))}{i} - \frac{h(Y_1(j), \ldots, Y_{i-1}(j) | T_{i,j} = T, H(j))}{i-1} \right) \right] \tag{85}$$

$$= \max_{P_{T_{i,j}}} E_{T_{i,j}} \left[ \max_{P_{x(j)|T_{i,j}}} E_{H(j)|T_{i,j}} \left( \frac{h(Y_1(j), \ldots, Y_i(j) | T_{i,j} = T, H(j) = H)}{i} - \frac{h(Y_1(j), \ldots, Y_{i-1}(j) | T_{i,j} = T, H(j) = H)}{i-1} \right) \right] \tag{86}$$

$$= \max_{P_{T_{i,j}}} E_{T_{i,j}} \left[ \max_{P_{x(j)|T_{i,j}}} E_{H(j)|\hat{H}(j)} \left( \frac{h(H_i(j)x(j) + n_j | T_{i,j} = T)}{i} - \frac{h(H_{i-1}(j)x(j) + m_j | T_{i,j} = T)}{i-1} \right) \right] \tag{87}$$

$$= \max_{P_{T_{i,j}}} E_{T_{i,j}} \left[ \max_{C:\, C \succeq 0,\, \mathrm{tr}(C) \le P} \; \max_{\substack{P_{x(j)|T_{i,j}} \\ \mathrm{Cov}(x(j)|T_{i,j}) \preceq C}} E_{H(j)|\hat{H}(j)} \left( \frac{h(H_i(j)x(j) + n_j | T_{i,j} = T)}{i} - \frac{h(H_{i-1}(j)x(j) + m_j | T_{i,j} = T)}{i-1} \right) \right] \tag{88}$$

$$= \max_{P_{T_{i,j}}} E_{T_{i,j}} \left[ \max_{C:\, C \succeq 0,\, \mathrm{tr}(C) \le P} E_{H(j)|\hat{H}(j)} \left( \frac{\log\det\big(I_i + H_i(j) K^* H_i(j)^H\big)}{i} - \frac{\log\det\big(I_{i-1} + H_{i-1}(j) K^* H_{i-1}(j)^H\big)}{i-1} \right) \right] \tag{89}$$

$$\le E_{\hat{H}(j)} \left[ \max_{K:\, K \succeq 0,\, \mathrm{tr}(K) \le P} E_{H(j)|\hat{H}(j)} \left( \frac{\log\det\big(I_i + H_i(j) K H_i(j)^H\big)}{i} - \frac{\log\det\big(I_{i-1} + H_{i-1}(j) K H_{i-1}(j)^H\big)}{i-1} \right) \right] \tag{90}$$

$$\le -\frac{\log\det(\Sigma)}{i(i-1)} + O(\log P) \tag{91}$$

where we have the Markov chain $x(j) \leftrightarrow T_{i,j} \leftrightarrow \hat{H}(j) \leftrightarrow H(j)$, $H_i(j) = [h_1(j), \ldots, h_i(j)]^H$, $H_{i-1}(j) = [h_1(j), \ldots, h_{i-1}(j)]^H$, $n_j = [w_1(j), \ldots, w_i(j)]^T$ and $m_j = [w_1(j), \ldots, w_{i-1}(j)]^T$. (89) is the application of the extremal inequality [10], [16], where the Gaussian distribution maximizes a specific difference between two differential entropies. The last inequality (91) comes from (101) in [17], in which $\Sigma = \mathrm{diag}\big(\sigma_1^2(j), \ldots, \sigma_{i-1}^2(j)\big)$. Therefore, we can write

$$\sum_{i=1}^{K} \frac{nR_i}{i} \le n\log P + \sum_{i=2}^{K} \sum_{j=1}^{n} \left[ -\frac{\log\det(\Sigma)}{i(i-1)} + O(\log P) \right] + nO(\log P)$$

$$= n\log P + \sum_{i=2}^{K} \frac{\sum_{j=1}^{n} [\alpha_1(j) + \cdots + \alpha_{i-1}(j)]}{i(i-1)} \log P + nKO(\log P). \tag{92}$$

Since the channel is degraded and D is replaced with N, the CSIT is either P or N. Therefore, $\alpha_i(j)$ is either 1 with probability $\lambda_P^i$ or 0 otherwise, and for $n$ large enough we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} [\alpha_1(j) + \cdots + \alpha_{i-1}(j)] = \sum_{r=1}^{i-1} \lambda_P^r$$

which results in

$$\sum_{i=1}^{K} \frac{nR_i}{i} \le n\log P + \sum_{i=2}^{K} \frac{\sum_{r=1}^{i-1} n\lambda_P^r}{i(i-1)} \log P + nKO(\log P) \tag{93}$$

at large $n$. Dividing both sides by $n\log P$ and taking the limit of (93) as $n, P \to \infty$, we get

$$\sum_{i=1}^{K} \frac{d_i}{i} \le 1 + \sum_{i=2}^{K} \frac{\sum_{r=1}^{i-1} \lambda_P^r}{i(i-1)}. \tag{94}$$

It is obvious that the same approach can be applied to any other permutation of $(1, 2, \ldots, K)$.

APPENDIX C
PROOF OF (33)

According to the Csiszár sum identity [11], for two arbitrary random vectors $X^n$ and $Y^n$,

$$\sum_{i=1}^{n} I(X_{i+1}^n; Y_i | Y^{i-1}) = \sum_{i=1}^{n} I(Y^{i-1}; X_i | X_{i+1}^n) \tag{95}$$

where $X_{n+1}^n, Y^0 = \emptyset$. By writing the mutual information terms of (95) in terms of differential entropies, we have

$$\sum_{i=1}^{n} \big[ h(Y_i | Y^{i-1}) - h(Y_i | Y^{i-1}, X_{i+1}^n) \big] = \sum_{i=1}^{n} \big[ h(X_i | X_{i+1}^n) - h(X_i | X_{i+1}^n, Y^{i-1}) \big] \tag{96}$$

and finally, by using the chain rule of entropies, we get

$$h(X^n) - h(Y^n) = \sum_{i=1}^{n} \big[ h(X_i | X_{i+1}^n, Y^{i-1}) - h(Y_i | X_{i+1}^n, Y^{i-1}) \big]. \tag{97}$$

REFERENCES
[1] B. Clerckx and C. Oestges,

MIMO Wireless Networks, 2nd Edition. Academic Press, 2013.
[2] M. Maddah-Ali and D. Tse, “Completely stale transmitter channel state information is still very useful,” IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4418–4431, 2012.
[3] S. Yang, M. Kobayashi, D. Gesbert, and X. Yi, “Degrees of freedom of time correlated MISO broadcast channel with delayed CSIT,” IEEE Trans. Inf. Theory, vol. 59, no. 1, pp. 315–328, 2013.
[4] T. Gou and S. Jafar, “Optimal use of current and outdated channel state information: Degrees of freedom of the MISO BC with mixed CSIT,” IEEE Comms. Letters, vol. 16, no. 7, pp. 1084–1087, July 2012.
[5] J. Chen and P. Elia, “Degrees-of-freedom region of the MISO broadcast channel with general mixed-CSIT,” vol. arxiv/1205.3474, May 2012.
[6] P. de Kerret, X. Yi, and D. Gesbert, “On the degrees of freedom of the K-user time correlated broadcast channel with delayed CSIT,” in ISIT 2013, IEEE International Symposium on Information Theory, July 7-12, 2013, Istanbul, Turkey
IEEE ICC 2013, Budapest, Hungary, Jun. 2013, available on arXiv:1302.6521.
[8] ——, “MISO broadcast channel with imperfect and (un)matched CSIT in the frequency domain: DoF region and transmission strategies,” to appear in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) 2013, Sept. 2013.
[9] R. Tandon, S. Jafar, S. Shamai (Shitz), and H. Poor, “On the synergistic benefits of alternating CSIT for the MISO broadcast channel,” IEEE Trans. Inf. Theory, vol. 59, no. 7, pp. 4106–4128, 2013.
[10] T. Liu and P. Viswanath, “An extremal inequality motivated by multiterminal information-theoretic problems,” IEEE Trans. Inf. Theory, vol. 53, no. 5, pp. 1839–1851, 2007.
[11] A. E. Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2012.
[12] T. M. Cover and J. A. Thomas, Elements of Information Theory, second edition. New York: Wiley-Interscience, 2006.
[13] A. Gamal, “The feedback capacity of degraded broadcast channels (corresp.),” IEEE Trans. Inf. Theory, vol. 24, no. 3, pp. 379–381, May 1978.
[14] P. Bergmans, “A simple converse for broadcast channels with additive white Gaussian noise,” IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 279–280.
[15] N. Jindal, “MIMO broadcast channels with finite-rate feedback,” IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5045–5060, Nov. 2006.
[16] H. Weingarten, Y. Steinberg, and S. Shamai, “The capacity region of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 3936–3964, Sept. 2006.
[17] J. Chen and P. Elia, “Toward the performance vs. feedback tradeoff for the two-user MISO broadcast channel,” submitted to IEEE Trans. Inf. Theory, November 2012.