Joint Scheduling and ARQ for MU-MIMO Downlink in the Presence of Inter-Cell Interference

Abstract

User scheduling and multiuser multi-antenna (MU-MIMO) transmission are at the core of high-rate data-oriented downlink schemes of the next generation of cellular systems (e.g., LTE-Advanced). Scheduling selects groups of users according to their channel vector directions and SINR levels. However, when scheduling is applied independently in each cell, the inter-cell interference (ICI) power at each user receiver is not known in advance since it changes at each new scheduling slot depending on the scheduling decisions of all interfering base stations. In order to cope with this uncertainty, we consider the joint operation of scheduling, MU-MIMO beamforming and Automatic Repeat reQuest (ARQ). We develop a game-theoretic framework for this problem and build on stochastic optimization techniques in order to find optimal scheduling and ARQ schemes. Particularizing our framework to the case of "outage service rates", we obtain a scheme based on adaptive variable-rate coding at the physical layer, combined with ARQ at the Logical Link Control (ARQ-LLC). Then, we present a novel scheme based on incremental redundancy Hybrid ARQ (HARQ) that is able to achieve a throughput performance arbitrarily close to the "genie-aided service rates", with no need for a genie that provides non-causally the ICI power levels. The novel HARQ scheme is both easier to implement and superior in performance with respect to the conventional combination of adaptive variable-rate coding and ARQ-LLC.


arXiv [cs.IT]

H. Shirani-Mehr∗, H. Papadopoulos†, S. A. Ramprashad†, G. Caire∗
∗ University of Southern California, Email: shiranim, [email protected]
† DoCoMo Laboratories USA, Inc., Email: ramprashad, [email protected]


Keywords:

Multiuser MIMO, inter-cell interference, scheduling, hybrid ARQ, stochastic optimization, game theory.

I. INTRODUCTION

High-rate data-oriented downlink schemes [1], [2] have been successfully deployed as an extension of 3G cellular standards (WCDMA and CDMA2000). These schemes are based on the results of [3]–[5], showing that the throughput (or “ergodic”) sum-capacity of single-antenna multi-access (uplink) and broadcast (downlink) fading Gaussian channels is achieved by allocating opportunistically each time-frequency slot to the user with the best instantaneous channel conditions. In a multiuser setting, the sum-capacity is usually not the most meaningful measure of the system performance. Instead, maximizing the sum-throughput subject to some fairness constraint is more desirable [5]. To this purpose, a downlink scheduling policy can be designed in order to maximize a suitable concave and component-wise monotonically increasing network utility function over the system’s achievable throughput region (i.e., the region of achievable long-term average user rates). The network utility function is designed in order to capture the desired notion of “fairness” (e.g., proportional fairness, max-min fairness and, more generally, $\alpha$-fairness [6]).

In the next generation of cellular systems (e.g., the so-called LTE-Advanced [7]), high-rate data-oriented downlink schemes will be combined with multiuser multi-antenna (MU-MIMO) transmission techniques [8], [9], supporting spectral efficiencies in the tens of bits/sec/Hz [10], [11]. With MU-MIMO, the rate supported by each user is generally a function of all the user channel vectors, and depends on the type of MU-MIMO precoding [8], [9], [12].
In order to compute the transmitter precoder parameters (e.g., the beamforming steering vectors and the transmitted rates and powers), channel state information at the transmitter (CSIT) is required. This can be accurately obtained using open and closed loop channel estimation and feedback schemes (the literature on this subject is vast; see, for example, [13]–[17] and references therein).

In particular, scheduling with MU-MIMO and non-perfect CSIT was considered in [18], particularizing the general stochastic optimization framework of [19] to the case of a single-cell system with linear Zero-Forcing Beamforming (ZFBF) MU-MIMO precoding, where CSIT is obtained via noisy channel estimation and prediction.

In this work we focus on a multi-cell environment with no inter-cell cooperation. For sufficiently slowly-moving user terminals it is possible to design training and feedback schemes that achieve almost perfect CSIT [16]–[18]. Therefore, for simplicity we shall assume that each base station (BS) has perfect CSIT for its own users. In contrast, in a multi-cell system, inter-cell interference (ICI) emerges as another source of unavoidable uncertainty (see [20], [21] and references therein). When the schedulers at each BS make their own decisions independently, based only on the locally available CSIT relative to their own users, the ICI power seen at each user receiver changes on a slot-by-slot basis in a random and unpredictable manner, depending on the scheduling decisions made at all the interfering BSs. As a consequence, the instantaneous Signal to Interference plus Noise Ratio (SINR) “seen” at any given user receiver is a random variable.

The decentralized scheduling problem in a multi-cell environment can be formulated as a non-cooperative game: each BS (player) wishes to maximize its own utility function over its own feasible throughput region. The players’ strategies are all feasible scheduling policies.
In addition, the throughput region of any given cell depends on the ICI power statistics seen at the users’ receivers, which in turn depend on the scheduling policies applied at the interfering BSs. We show that when the individual network utility functions are concave the multi-cell decentralized scheduling game is a concave game and therefore Nash equilibria exist.

In order to solve the network utility maximization at each BS, for given ICI statistics, we apply the stochastic optimization framework of [18], [19], [22], [23]. A straightforward application of this approach yields a scheme based on variable-rate adaptive coding at the physical layer, and conventional ARQ at the Logical Link Control (LLC) layer. We notice that similar approaches are included in several wireless standards such as EV-DO and HSDPA [24]–[26], and therefore this can be regarded as the baseline “conventional” approach. In order to improve upon the conventional approach, we propose a new method based on combining incremental redundancy Hybrid Automatic Retransmission reQuest (HARQ) [27] and MU-MIMO opportunistic scheduling. In the proposed scheme, each user feeds back the value of the instantaneous mutual information observed in the previous slot, which is used by the scheduler to update recursively the scheduler weights. We show that the throughput achieved by the proposed HARQ scheme approaches the throughput of a “virtual system”, as if a genie provided non-causally the ICI values at each scheduling slot. However, we stress that the proposed scheme makes use of strictly causal information, and therefore requires no genie.

II. SYSTEM SETUP

We consider the downlink of a system with $C > 1$ cells. In each cell, a BS equipped with $M$ antennas transmits to $K$ single-antenna users. The channel is assumed frequency flat and constant over “slots” of length $T \gg 1$ symbols (block-fading model [28]). Any given channel use of the complex baseband discrete-time signal at the $k$-th user in cell $c$ during slot $t$ is described by

$$ y_{k,c}[t] = \underbrace{\sqrt{g_{k,c,c}}\, \mathbf{h}^H_{k,c,c}[t]\, \mathbf{x}_c[t]}_{\text{desired BS}} + \underbrace{\sum_{c' \neq c} \sqrt{g_{k,c,c'}}\, \mathbf{h}^H_{k,c,c'}[t]\, \mathbf{x}_{c'}[t]}_{\text{inter-cell interference}} + z_{k,c}[t], \tag{1} $$

where $t$ ticks at the slot rate, $(k, c)$ denotes user $k$ in cell $c$, $\mathbf{h}_{k,c,c'}[t] \in \mathbb{C}^M$ is the channel vector from the $c'$-th BS antenna array to the $(k, c)$-th receiver antenna, $\mathbf{x}_{c'}[t] \in \mathbb{C}^M$ is the signal transmitted by the $c'$-th BS and $z_{k,c}[t] \sim \mathcal{CN}(0, 1)$ is the additive white Gaussian noise (AWGN) sample. The coefficients $g_{k,c,c'}$ are distance-dependent path gains [29] that are assumed to be time-invariant over many slots. The BSs are sum-power constrained such that $\mathrm{tr}(\boldsymbol{\Sigma}_c[t]) \leq 1$ for all $t$, where $\boldsymbol{\Sigma}_c[t] = E[\mathbf{x}_c[t]\, \mathbf{x}^H_c[t]]$ denotes the transmit covariance matrix. The actual channel SNR is included as a common scaling factor in the coefficients $g_{k,c,c'}$. The channel vectors of users in cell $c$ form the columns of the channel matrix $\mathbf{H}_c[t] = [\mathbf{h}_{1,c,c}[t], \ldots, \mathbf{h}_{K,c,c}[t]] \in \mathbb{C}^{M \times K}$. We assume that all vectors $\mathbf{h}_{k,c,c'}[t]$ are mutually independent with i.i.d. components $\sim \mathcal{CN}(0, 1)$, for all distinct 4-tuples $(t, k, c, c')$. Each BS $c$ knows all time-invariant quantities relative to its own users and has perfect knowledge of $\mathbf{H}_c[t]$ immediately before the beginning of slot $t$ (perfect CSIT for its own users).

Footnote: The generalization to MIMO-OFDM and frequency selective fading is immediate.

A feasible scheduling policy $\gamma_c$ for BS $c$ is a possibly randomized stationary function that maps $\mathbf{H}_c[t]$ into the pair $\gamma_c(\mathbf{H}_c[t]) = (\boldsymbol{\Sigma}_c[t], \mathbf{r}_c[t])$, where $\mathbf{r}_c[t] = (r_{1,c}[t], \ldots, r_{K,c}[t])$ is a rate allocation vector. We assume that the MU-MIMO precoder is based on linear ZFBF. This yields the transmitted signal vector in the form $\mathbf{x}_c[t] = \sum_{k \in \mathcal{S}_c[t]} \mathbf{v}_{k,c}[t]\, u_{k,c}[t]$, where $\mathcal{S}_c[t]$ denotes the set of active users, i.e., users that are selected to be served on slot $t$, and where $u_{k,c}[t] \in \mathbb{C}$ denotes the coded symbol for user $(k, c)$, with power $E[|u_{k,c}[t]|^2] = P_{k,c}[t]$. The ZFBF steering vectors $\{\mathbf{v}_{k,c}[t] : k \in \mathcal{S}_c[t]\}$ are given by the unit-norm (normalized) $k$-th column of the Moore-Penrose pseudoinverse (e.g., see [14], [16], [18], [31], [32] and references therein) of the channel matrix restricted to the active users, i.e., to the columns $\{\mathbf{h}_{k,c,c}[t] : k \in \mathcal{S}_c[t]\}$. It follows that the transmit covariance matrix takes on the form

$$ \boldsymbol{\Sigma}_c[t] = \sum_{k \in \mathcal{S}_c[t]} \mathbf{v}_{k,c}[t]\, \mathbf{v}^H_{k,c}[t]\, P_{k,c}[t], \tag{2} $$

where the non-negative coefficients $\{P_{k,c}[t] : k \in \mathcal{S}_c[t]\}$ define the power allocation over the active users in cell $c$, and satisfy the power constraint $\sum_{k \in \mathcal{S}_c[t]} P_{k,c}[t] \leq 1$. A necessary and sufficient condition for perfect zero-forcing of the intra-cell multiuser interference is that $|\mathcal{S}_c[t]| \leq \min\{M, K\}$. Without loss of generality, in the following we identify the set of active users $\mathcal{S}_c[t]$ with those users with positive powers, i.e., $P_{k,c}[t] > 0$ for $k \in \mathcal{S}_c[t]$ and $P_{k,c}[t] = 0$ for $k \notin \mathcal{S}_c[t]$.

Footnote: Using the theory developed in [30] we can show that restricting to stationary policies does not involve any suboptimality in terms of the achievable throughput region.

The ICI power at the user $(k, c)$ receiver in slot $t$ is given by

$$ \chi_{k,c}[t] = \sum_{c' \neq c} g_{k,c,c'}\, \mathbf{h}^H_{k,c,c'}[t]\, \boldsymbol{\Sigma}_{c'}[t]\, \mathbf{h}_{k,c,c'}[t] \tag{3} $$

with mean given by $\bar{\chi}_{k,c} = E[\chi_{k,c}[t]] = \sum_{c' \neq c} g_{k,c,c'}\, \mathrm{tr}(\boldsymbol{\Sigma}_{c'}[t])$.
The SINR at user $(k, c)$ is given by

$$ \mathrm{sinr}_{k,c}[t] = \frac{g_{k,c,c}\, \left| \mathbf{h}^H_{k,c,c}[t]\, \mathbf{v}_{k,c}[t] \right|^2 P_{k,c}[t]}{1 + \chi_{k,c}[t]} \tag{4} $$

We let $R_{k,c}[t]$ denote the instantaneous service rate of user $(k, c)$ on slot $t$, measured in bits/channel use. This is in general a function of $\mathrm{sinr}_{k,c}[t]$, and therefore of $\mathbf{H}_c[t]$, $\boldsymbol{\Sigma}_c[t]$, $\chi_{k,c}[t]$, and of the allocated rate $r_{k,c}[t]$. We define the $k$-th user service rate function $R_k(g, \mathbf{H}, \chi, \boldsymbol{\Sigma}, \mathbf{r})$, such that $R_{k,c}[t] = R_k(g_{k,c,c}, \mathbf{H}_c[t], \chi_{k,c}[t], \boldsymbol{\Sigma}_c[t], \mathbf{r}_c[t])$. Let $\Gamma$ denote the set of all feasible scheduling policies and let $\gamma_{-c} = \{\gamma_{c'} : c' \neq c\}$ denote the set of scheduling policies at all cells $c' \neq c$. For fixed $\gamma_{-c} \in \Gamma^{C-1}$, the throughput of user $(k, c)$ under the scheduling policy $\gamma_c$ is given by

$$ R_{k,c}(\gamma_c, \gamma_{-c}) = \liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} R_k(g_{k,c,c}, \mathbf{H}_c[\tau], \chi_{k,c}[\tau], \gamma_c(\mathbf{H}_c[\tau])) = E[R_k(g_{k,c,c}, \mathbf{H}_c, \chi_{k,c}, \gamma_c(\mathbf{H}_c))] \tag{5} $$

where the effect of the policies at the interfering BSs is captured by the statistics of the ICI power process $\chi_{k,c}[t]$, the limit holds almost surely because of stationarity and ergodicity, and the expectation is with respect to the joint distribution of the triple $(\mathbf{H}_c[t], \chi_{k,c}[t], \gamma_c)$.

Footnote: With some abuse of notation, we denote by $\mathbf{H}_c$ and $\{\chi_{k,c} : k = 1, \ldots, K\}$ random variables whose joint distribution coincides with the first-order joint distribution of the processes $\mathbf{H}_c[t]$ and $\{\chi_{k,c}[t] : k = 1, \ldots, K\}$, which is time-invariant by stationarity.

The region of achievable throughputs for cell $c$ is given by

$$ \mathcal{R}_c(\gamma_{-c}) = \mathrm{coh} \bigcup_{\gamma_c \in \Gamma} \left\{ \mathbf{R} \in \mathbb{R}^K_+ : R_k \leq E[R_k(g_{k,c,c}, \mathbf{H}_c, \chi_{k,c}, \gamma_c(\mathbf{H}_c))], \ \forall k \right\} \tag{6} $$

where “coh” denotes “closure of the convex hull”. Notice that $\mathcal{R}_c(\gamma_{-c})$ depends on the other cells’ scheduling policies $\gamma_{-c}$ through the joint probability distribution of the ICI powers $\{\chi_{k,c} : k = 1, \ldots, K\}$.

Under our assumptions, the BSs operate in a decentralized way and influence each other only in terms of the generated ICI statistics (i.e., the joint cdf of $\{\chi_{k,c} : k = 1, \ldots, K\}$). Each BS wishes to maximize its own network utility function. This multi-objective optimization problem is formulated as a non-cooperative game [33], [34] that we nickname the multi-cell decentralized scheduling game, where each player (i.e., BS) $c$ seeks to achieve

$$ \text{maximize } U_c(\mathbf{R}) \quad \text{subject to } \mathbf{R} \in \mathcal{R}_c(\gamma_{-c}) \tag{7} $$

where we assume that $U_c(\cdot)$ is a continuous, strictly concave and component-wise increasing utility function, reflecting some suitable fairness criterion [6].

By definition, for any given joint statistics of $\mathbf{H}_c$ and of $\{\chi_{k,c} : k = 1, \ldots, K\}$, the maximum in (7) is achieved by some scheduling policy $\gamma^\star_c$. A Nash equilibrium of the decentralized scheduling game is a set of scheduling policies (also denoted, with some abuse of notation, by $\{\gamma^\star_c : c = 1, \ldots, C\}$) such that $\gamma^\star_c$ is the solution to (7) when $\gamma_{-c} = \gamma^\star_{-c}$, for all $c = 1, \ldots, C$. We have:

Theorem 1:

The decentralized scheduling game defined above is a concave game and therefore has a Nash equilibrium.

Proof:

All players have the same strategy set $\Gamma$. This is a compact convex set due to the covariance trace constraint and to the fact that we can assume that the rate allocation vector is bounded in $\mathbf{r}_c \in [0, r_{\max}]^K$ for some constant $r_{\max}$. Also, each $c$-th utility is a concave function of $\gamma_c$ for fixed $\gamma_{-c}$. In order to see this, let $\mathbf{R}(\gamma_c, \gamma_{-c})$ denote the throughput point of $\mathcal{R}_c(\gamma_{-c})$ achieved by policy $\gamma_c$ for fixed $\gamma_{-c}$, consider any two policies $\gamma'_c, \gamma''_c \in \Gamma$ and define $\gamma^{(\lambda)}_c$ as the policy that applies $\gamma'_c$ with probability $\lambda \in [0, 1]$ and $\gamma''_c$ with probability $\bar{\lambda} = 1 - \lambda$. Then, from the convexity of $\mathcal{R}_c(\gamma_{-c})$ and the concavity of $U_c(\cdot)$ we have that

$$ \lambda U_c(\mathbf{R}(\gamma'_c, \gamma_{-c})) + \bar{\lambda} U_c(\mathbf{R}(\gamma''_c, \gamma_{-c})) \leq U_c(\lambda \mathbf{R}(\gamma'_c, \gamma_{-c}) + \bar{\lambda} \mathbf{R}(\gamma''_c, \gamma_{-c})) = U_c(\mathbf{R}(\gamma^{(\lambda)}_c, \gamma_{-c})) $$

Footnote: This limitation does not involve any significant loss of generality if $r_{\max}$ is sufficiently large, and always holds in practice since practical variable-rate coding has a finite maximum rate.

Now, let $\boldsymbol{\gamma} = \{\gamma_c : c = 1, \ldots, C\}$ and $\boldsymbol{\gamma}' = \{\gamma'_c : c = 1, \ldots, C\}$ denote two vectors of scheduling policies and define the sum-utility function $\rho(\boldsymbol{\gamma}, \boldsymbol{\gamma}') = \sum_{c=1}^{C} U_c(\mathbf{R}(\gamma_c, \gamma'_{-c}))$. Since the functions $U_c(\cdot)$ are continuous (by assumption) and the throughput vectors are continuous functions of the scheduling policies, it follows that $\rho(\boldsymbol{\gamma}, \boldsymbol{\gamma}')$ is a continuous function of $(\boldsymbol{\gamma}, \boldsymbol{\gamma}') \in \Gamma^C \times \Gamma^C$ and, by the argument above, it is concave in $\boldsymbol{\gamma}$ for any fixed $\boldsymbol{\gamma}'$. These properties match exactly the assumptions of Rosen's theorem [35]. Therefore, as a direct consequence of [35], the existence of a Nash equilibrium is proved.

Since $U_c(\cdot)$ is component-wise increasing, it follows that the maximum of (7) is obtained for some $\gamma^\star_c$ such that $\mathbf{R}(\gamma^\star_c, \gamma_{-c})$ is on the Pareto boundary of $\mathcal{R}_c(\gamma_{-c})$.
If the service rate function $R_k(g, \mathbf{H}, \chi, \boldsymbol{\Sigma}, \mathbf{r})$ is strictly increasing in the power allocated to user $k$, then the Pareto boundary of $\mathcal{R}_c(\gamma_{-c})$ is achieved by policies that satisfy $\mathrm{tr}(\boldsymbol{\Sigma}_c[t]) = \sum_{k \in \mathcal{S}_c[t]} P_{k,c}[t] = 1$ with probability 1. In this case, any Nash equilibrium $\{\gamma^\star_c : c = 1, \ldots, C\}$ must correspond to scheduling policies that achieve the power constraint with equality for all BSs.

In Sections III and IV we will focus on a reference cell $c$, assuming that all other interfering cells apply a fixed arbitrary policy $\gamma_{-c}$ (i.e., for fixed and known joint statistics of the ICI powers at all users of cell $c$). We shall apply the theory developed in [18], [19] and provide a stochastic optimization algorithm that solves (7) to any desired level of approximation, for any given ICI power statistics.

III. SCHEDULING WITH ADAPTIVE VARIABLE-RATE CODING AND ARQ-LLC

From now on we shall assume Gaussian random coding and consider specific cases of service rate functions. In this case, we define the $k$-th user mutual information function as

$$ I_k(g, \mathbf{H}, \chi, \boldsymbol{\Sigma}) = \log_2\left(1 + \frac{g \left| \mathbf{h}^H_k \mathbf{v}_k \right|^2 P_k}{1 + \chi}\right) \tag{8} $$

The mutual information at the user $(k, c)$ receiver on slot $t$ is given by $I_{k,c}[t] \triangleq I_k(g_{k,c,c}, \mathbf{H}_c[t], \chi_{k,c}[t], \boldsymbol{\Sigma}_c[t])$. We approximate the decoding error probability by the corresponding information outage probability (see [28], [36] for the information-theoretic motivations underlying this very common and very useful approximation). Namely, if the mutual information $I_{k,c}[t]$ is less than the scheduled coding rate $r_{k,c}[t]$, the decoder makes a decoding error with probability close to 1, while if $I_{k,c}[t] > r_{k,c}[t]$ the random coding average error probability is very close to 0. Therefore, for slot length $T$ large enough, there exist “good” codes drawn from a Gaussian ensemble such that their block error probability is close to the information outage probability $P(r_{k,c}[t] > I_{k,c}[t])$. In this case, the user $k$ service rate function is given by the “outage rate” function [18]

$$ R_k(g, \mathbf{H}, \chi, \boldsymbol{\Sigma}, \mathbf{r}) = r_k \times 1\{ r_k \leq I_k(g, \mathbf{H}, \chi, \boldsymbol{\Sigma}) \} \tag{9} $$

In order to obtain the desired near-optimal scheduling policy, we apply the framework of [18]. We define the virtual queues with buffer state $\mathbf{Q}_c[t] = (Q_{1,c}[t], \ldots, Q_{K,c}[t])$ and virtual arrival processes $\mathbf{A}_c[t] = (A_{1,c}[t], \ldots, A_{K,c}[t])$. The virtual queues evolve according to the stochastic difference equations

$$ Q_{k,c}[t+1] = \max\{0, Q_{k,c}[t] - R_{k,c}[t]\} + A_{k,c}[t], \quad k = 1, \ldots, K \tag{10} $$

Then, we consider the adaptive policy defined by:

1) For any given $t$, let the transmit covariance matrix $\boldsymbol{\Sigma}_c[t]$ and the rate allocation vector $\mathbf{r}_c[t]$ be the solution of

$$ \text{maximize } \sum_{k=1}^{K} Q_{k,c}[t]\, E\left[ r_{k,c}[t] \times 1\{ r_{k,c}[t] \leq I_k(g_{k,c,c}, \mathbf{H}_c[t], \chi_{k,c}[t], \boldsymbol{\Sigma}_c[t]) \} \,\middle|\, \mathbf{H}_c[t] \right] \quad \text{subject to } \mathrm{tr}(\boldsymbol{\Sigma}_c[t]) \leq 1, \ r_{k,c}[t] \geq 0 \ \forall k \tag{11} $$

2) For suitable constants $V, A_{\max} > 0$, let the virtual arrival processes at time $t$ be given by the solution of

$$ \max_{0 \leq A_{k,c}[t] \leq A_{\max},\ \forall k} \ V U_c(\mathbf{A}_c[t]) - \sum_{k=1}^{K} A_{k,c}[t]\, Q_{k,c}[t] \tag{12} $$

3) Update the virtual queues according to (10), with arrivals $\mathbf{A}_c[t]$ given by (12) and service rates $R_{k,c}[t]$ given by (9) calculated for $\boldsymbol{\Sigma}_c[t]$ and $\mathbf{r}_c[t]$ solutions of (11).

As stated in Theorem 2 below, the policy defined above achieves the optimal point $\mathbf{R}^\star_c$ solution of (7) within any desired accuracy, depending on the constants $V$ and $A_{\max}$. Neglecting the (small) degradation due to stochastic adaptation and quantified by Theorem 2, we shall refer to this policy as $\gamma^\star_c$.

Footnote: It is important to keep in mind that the virtual queues have nothing to do with the ARQ transmission buffers: they are used here as a tool to recursively update the weights of the scheduling policy.
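Steps 2) and 3) of the adaptive policy can be sketched in a few lines. This is a hedged illustration under stated assumptions: it takes the proportional fairness utility $U_c(\mathbf{A}) = \sum_k \log A_k$, for which (12) decouples per user and the first-order condition $V/A_k = Q_k$ gives $A_k = \min\{V/Q_k, A_{\max}\}$ (a derivation made here, not quoted from [18]). The function names are hypothetical, and the solution of step 1) is taken as given.

```python
import numpy as np

def outage_service_rate(r, mutual_info):
    """Eq. (9): rate r_k is delivered only if r_k <= I_k, else zero."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= np.asarray(mutual_info, dtype=float), r, 0.0)

def pf_virtual_arrivals(Q, V, A_max):
    """Maximizer of V*sum(log A_k) - sum(A_k Q_k) over 0 <= A_k <= A_max.

    Assumes the proportional fairness utility. Each term V*log(A) - A*Q is
    concave with stationary point A = V/Q, clipped to A_max; if Q = 0 the
    term is increasing in A, so the maximizer is A_max.
    """
    Q = np.asarray(Q, dtype=float)
    A = np.full_like(Q, A_max)
    pos = Q > 0
    A[pos] = np.minimum(V / Q[pos], A_max)
    return A

def update_virtual_queues(Q, R, A):
    """Eq. (10): Q[t+1] = max{0, Q[t] - R[t]} + A[t]."""
    return np.maximum(0.0, np.asarray(Q, dtype=float) - np.asarray(R, dtype=float)) + np.asarray(A, dtype=float)

# One slot with assumed numbers: 3 users, rates r and realized mutual informations I.
Q = np.array([0.0, 2.0, 100.0])
A = pf_virtual_arrivals(Q, V=10.0, A_max=4.0)
R = outage_service_rate(r=[1.0, 2.0, 0.5], mutual_info=[1.5, 1.0, 0.7])
print("arrivals:", A, "service:", R, "next queues:", update_virtual_queues(Q, R, A))
```

Users with small virtual queues are admitted at the cap $A_{\max}$, while a large queue throttles its own arrivals, which is how the weights $\mathbf{Q}_c[t]$ steer the scheduler toward the fairness objective.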

Theorem 2:

Assume i.i.d. channels and fixed joint statistics of the ICI powers $\{\chi_{k,c} : k = 1, \ldots, K\}$. Assume that $U_c(\cdot)$ is concave and entry-wise non-decreasing, and that there exists at least one point $\mathbf{r} \in \mathcal{R}_c(\gamma_{-c})$ with strictly positive entries such that $U_c(\mathbf{r}/2) > -\infty$. Then, the scheduling policy $\gamma^\star_c$ defined above, for given constants $V > 0$ and $A_{\max} > 0$, has the following properties:

(a) The utility achieved by $\gamma^\star_c$ satisfies:

$$ \liminf_{t \to \infty} U_c\left( \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{R}_c[\tau] \right) \geq U_c(\mathbf{R}^\star_c(A_{\max})) - \kappa/V \tag{13} $$

where

$$ \kappa \triangleq \frac{1}{2} \left( K A^2_{\max} + \sum_{k=1}^{K} E\left[ \log_2^2\left( 1 + \frac{g_{k,c,c}\, |\mathbf{h}_{k,c,c}|^2}{1 + \chi_{k,c}} \right) \right] \right) \tag{14} $$

and where $\mathbf{R}^\star_c(A_{\max})$ denotes the solution of the problem (7) with the additional constraint $0 \leq R_{k,c} \leq A_{\max}$ for all $k = 1, \ldots, K$.

(b) For any point $\mathbf{R}_c \in \mathcal{R}_c(\gamma_{-c})$ such that $0 \leq R_{k,c} \leq A_{\max}$ for all $k$, and for any value $\beta \in [0, 1)$ we have:

$$ \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \sum_{k=1}^{K} R_{k,c}\, E[Q_{k,c}[\tau]] \leq \frac{\kappa + V \left[ U_c(\mathbf{R}^\star_c(A_{\max})) - U_c(\beta \mathbf{R}_c) \right]}{1 - \beta} \tag{15} $$

Thus, all virtual queues $Q_{k,c}[t]$ are strongly stable.

Proof:

The proof follows verbatim from the results in [18] and it is not repeated here for brevity.

As a corollary of Theorem 2, if $A_{\max}$ is sufficiently large such that $A_{\max} \geq R^\star_{k,c}$ for all $k$, then $\gamma^\star_c$ satisfies

$$ \liminf_{t \to \infty} U_c\left( \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{R}_c[\tau] \right) \geq U_c(\mathbf{R}^\star_c) - \kappa/V. \tag{16} $$

Hence, the control parameter $V$ can be chosen sufficiently large in order to make the achieved utility as close as desired to the optimal value $U_c(\mathbf{R}^\star_c)$ of problem (7). This comes with a tradeoff in the virtual queue average sizes that, as seen from (15), grow linearly with $V$. The virtual queue sizes represent the difference between the virtual bits admitted into the queues and the actual bits transmitted, and thus affect the time-scales required for the time averages to become close to their limiting values.

Footnote: A discrete-time queue $Q_k[t]$ is strongly stable if $\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} E[Q_k[\tau]] < \infty$. The system is strongly stable if all queues $k = 1, \ldots, K$ are strongly stable.

A. Implementation

The policy $\gamma^\star_c$ found before computes recursively the “weights” $\mathbf{Q}_c[t]$ via (12) and (10) and, for each $t$, solves the weighted conditional average rate sum maximization (11). Problem (12) is a standard convex optimization problem, the solution of which does not present any major conceptual difficulty and is found in closed form for the important cases of proportional fairness and max-min fairness (see [18]), corresponding to the choices $U_c(\mathbf{R}) = \sum_{k=1}^{K} \log R_k$ and $U_c(\mathbf{R}) = \min_k R_k$, respectively. In contrast, solving (11) presents some difficulties. Letting $F_{k,c}(\cdot)$ denote the marginal cdf of $\chi_{k,c}[t]$ and using (8), the objective function in (11) can be rewritten as

$$ \sum_{k \in \mathcal{S}_c[t]} Q_{k,c}[t]\, r_{k,c}[t]\, F_{k,c}\left( \frac{g_{k,c,c} \left| \mathbf{h}^H_{k,c,c}[t]\, \mathbf{v}_{k,c}[t] \right|^2 P_{k,c}[t]}{2^{r_{k,c}[t]} - 1} - 1 \right) \tag{17} $$

The optimization in (11) is generally a non-convex problem that involves a combinatorial search over all subsets $\mathcal{S}_c[t] \subseteq \{1, \ldots, K\}$ of cardinality $\leq \min\{K, M\}$ and, for each candidate subset, the maximization of (17) with respect to $\mathbf{r}_c[t]$ and the power allocation $\{P_{k,c}[t] : k \in \mathcal{S}_c[t]\}$. Since this optimization may be difficult to compute, we propose the following suboptimal low-complexity two-step approach:

Step 1) The active user subset and the corresponding power allocation are selected by assuming deterministic ICI powers, equal to their mean value $\bar{\chi}_{k,c}$. Under this assumption, the problem is reduced to the well-known user selection with ZFBF, which can be solved using standard techniques based on quasi-orthogonal user selection and waterfilling (e.g., [37]–[39]).

Step 2) For the transmit covariance $\boldsymbol{\Sigma}_c[t]$ obtained in Step 1, (17) is optimized with respect to the rate allocation. This reduces to optimizing the outage rate separately for each $k \in \mathcal{S}_c[t]$ by letting

$$ r_{k,c}[t] = \arg\max_{r \geq 0} \left\{ r\, F_{k,c}\left( \frac{g_{k,c,c} \left| \mathbf{h}^H_{k,c,c}[t]\, \mathbf{v}_{k,c}[t] \right|^2 P_{k,c}[t]}{2^{r} - 1} - 1 \right) \right\} \tag{18} $$

where $g_{k,c,c} \left| \mathbf{h}^H_{k,c,c}[t]\, \mathbf{v}_{k,c}[t] \right|^2 P_{k,c}[t]$ is fixed by Step 1.

Notice that, both in the original problem and in the proposed low-complexity two-step approximated solution, only the marginal statistics of the ICI powers $\{\chi_{k,c}[t] : k = 1, \ldots, K\}$ are relevant. These marginal statistics can be measured by each user terminal individually and fed back to the BS scheduler by some very low-rate feedback scheme.

IV. SCHEDULING WITH INCREMENTAL REDUNDANCY HARQ

If a genie provided the BS scheduler with the values of the mutual information $\{I_{k,c}[t] : k = 1, \ldots, K\}$ in a non-causal fashion, just before the beginning of slot $t$, then the optimal rate allocation would be, trivially, $r_{k,c}[t] = I_{k,c}[t]$ for all $k \in \mathcal{S}_c[t]$, yielding zero outage probability. This “genie-aided” case was considered in [18] and referred to as “optimistic rate” allocation, although no actual algorithm to approach the optimistic throughput was given. Since for any non-negative random variable $I$ and $r > 0$ we have $E[r \times 1\{r \leq I\}] \leq E[I]$, the optimistic service rates provide an upper bound to the throughput of any system with the same signaling scheme (ZFBF and Gaussian codes) and given rate allocation.

In this section we show how to achieve the “optimistic” throughput without the aid of any genie. As a preliminary step, consider the following incremental redundancy HARQ scheme. The BS scheduler maintains a buffer of information packets for each user in the cell. The size of user $(k, c)$ packets is equal to $b_{k,c}$ bits per packet. Each packet is encoded into an infinite-length sequence of complex symbols. The encoded sequence is partitioned into blocks of length $T$ symbols. At each slot $t$, the scheduling policy computes $\boldsymbol{\Sigma}_c[t]$ according to some rule to be found later. For all active users $k \in \mathcal{S}_c[t]$, if the most recent HARQ feedback message from user $k$ is “NACK” (negative acknowledgement), then the first not-yet transmitted coded block of the current packet is transmitted. Otherwise, if the most recent received HARQ feedback message is “ACK” (positive acknowledgement), then the current packet is removed from the transmission buffer of user $k$ and the first coded block of the next packet in the buffer is transmitted. The $(k, c)$-th receiver stores in memory all the received slots for times $\{t : k \in \mathcal{S}_c[t]\}$ and attempts to decode the current packet at every newly received slot, by using all the available received slots. If decoding fails, NACK is sent back, otherwise ACK is sent back and the decoder memory is reset. Notice that the scheme does not require any genie-aided “look-ahead” of the instantaneous ICI power $\chi_{k,c}[t]$, and makes use of time-invariant packet sizes $b_{k,c}$. These may differ from user to user but are independent of $t$. For later use, we define the “first-block coding rate” as the ratio $r_{k,c} = b_{k,c}/T$ bits/channel use.

Footnote: In practice, this rateless coding can be implemented by using Raptor codes [40].

Next, we describe a scheduling rule, denoted again by $\gamma^\star_c$, that operates arbitrarily closely to the genie-aided throughput when combined with the HARQ scheme described above. At the end of each slot $t$, the active users $k \in \mathcal{S}_c[t]$ feed back both their ACK/NACK message and the mutual information $I_{k,c}[t]$ “seen” at their receiver. Then, $\gamma^\star_c$ coincides with the policy given in Section III, after the following two changes.

1) The virtual queues evolution equation (10) is replaced by

$$ Q_{k,c}[t+1] = \max\{0, Q_{k,c}[t] - I_{k,c}[t]\} + A_{k,c}[t], \quad \forall k \tag{19} $$

2) The transmitter optimization (11) is replaced by

$$ \text{maximize } \sum_{k=1}^{K} Q_{k,c}[t]\, E\left[ I_k(g_{k,c,c}, \mathbf{H}_c[t], \chi_{k,c}[t], \boldsymbol{\Sigma}_c[t]) \,\middle|\, \mathbf{H}_c[t] \right] \quad \text{subject to } \mathrm{tr}(\boldsymbol{\Sigma}_c[t]) \leq 1 \tag{20} $$

In brief, the scheduler updates recursively its weights $\mathbf{Q}_c[t]$ and computes the transmitted signal covariance $\boldsymbol{\Sigma}_c[t]$ according to (20), as if it were operating on a virtual “genie-aided” system with instantaneous service rates $I_{k,c}[t]$. The throughput region of the virtual genie-aided system, denoted by $\mathcal{R}^{\mathrm{genie}}_c(\gamma_{-c})$, is given by (6), after replacing the general rate function $R_k(\cdots)$ with the mutual information function $I_k(\cdots)$ defined in (8).
The performance of $\gamma^\star_c$ for the genie-aided system is again given by Theorem 2, where $\mathbf{R}_c[\tau]$ in (13) is replaced by the vector of mutual informations $\mathbf{I}_c[\tau] = (I_{1,c}[\tau], \ldots, I_{K,c}[\tau])$ and where $\mathbf{R}^\star_c(A_{\max})$ denotes the solution of (7) when $\mathcal{R}_c(\gamma_{-c})$ is replaced by $\mathcal{R}^{\mathrm{genie}}_c(\gamma_{-c})$, with the additional constraint $0 \leq R_{k,c} \leq A_{\max}$ for all $k = 1, \ldots, K$.

For sufficiently large $A_{\max}$, $\gamma^\star_c$ yields:

$$ \liminf_{t \to \infty} U_c\left( \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{I}_c[\tau] \right) \geq U_c(\mathbf{R}^{\mathrm{genie},\star}_c) - \kappa/V, \tag{21} $$

where $\mathbf{R}^{\mathrm{genie},\star}_c$ is the utility-maximizing throughput point in the region $\mathcal{R}^{\mathrm{genie}}_c(\gamma_{-c})$. At this point, it remains to be shown that the combination of the policy $\gamma^\star_c$ with the incremental redundancy HARQ scheme yields a network utility as close as desired to the limit in (21). This is shown by the following:

Theorem 3:

Let $\mathbf{R}^{\mathrm{harq},\star}_c = (R^{\mathrm{harq},\star}_{1,c}, \ldots, R^{\mathrm{harq},\star}_{K,c})$ denote the throughput achievable by the incremental redundancy HARQ protocol under the scheduling policy $\gamma^\star_c$ defined above. For each user $(k, c)$ and $\epsilon_{k,c} > 0$ there exists a sufficiently large first-block rate $r_{k,c}$ such that $R^{\mathrm{harq},\star}_{k,c} \geq (1 - \epsilon_{k,c}) R^{\mathrm{genie},\star}_{k,c}$.

Proof:

Consider user $(k, c)$. Following the argument in [27], we can model the event of successful decoding as a “mutual information level-crossing event”. Suppose that the transmission of the current packet for user $(k, c)$ starts at slot $t_{\mathrm{start}}$ (i.e., an ACK was fed back at slot time $t_{\mathrm{start}} - 1$). Then, the current packet can be successfully decoded at slot $t \geq t_{\mathrm{start}}$ if $\sum_{\tau = t_{\mathrm{start}}}^{t} I_{k,c}[\tau] \geq r_{k,c}$. Otherwise, a decoding error occurs with very high probability. As shown in [27], [41], the probability of undetected decoding error vanishes exponentially with $T$. Therefore, in the regime of large $T$, if $\sum_{\tau = t_{\mathrm{start}}}^{t} I_{k,c}[\tau] < r_{k,c}$ the decoding error is detected with arbitrarily high probability and a NACK is sent back. Fig. 1 shows, qualitatively, the mutual information level-crossing and the corresponding successful decoding events of the $(k, c)$ decoder. Notice that the mutual information increment is non-negative, and it is exactly zero for all $t$ such that $k \notin \mathcal{S}_c[t]$, i.e., when user $(k, c)$ is not scheduled.

Consider the transmission of a long sequence of packets. Without loss of generality, assume that the system starts at time $t_{\mathrm{start}} = 1$, denote by $N_{k,c}[t]$ the number of successful decoding events of decoder $(k, c)$ up to time $t$ and let $W_{k,c}(1), W_{k,c}(2), \ldots, W_{k,c}(N_{k,c}[t])$ denote the corresponding “inter-ACK” times (see Fig. 1). Since at each successful decoding a “reward” of $r_{k,c}$ bits per channel use is delivered to the destination, the throughput of the HARQ protocol is given by

$$ R^{\mathrm{harq},\star}_{k,c} = \lim_{t \to \infty} \frac{r_{k,c}\, N_{k,c}[t]}{\sum_{n=1}^{N_{k,c}[t]} W_{k,c}(n) + \Delta_{k,c}[t]} \tag{22} $$

where $\Delta_{k,c}[t] = t - \sum_{n=1}^{N_{k,c}[t]} W_{k,c}(n)$ denotes the difference between the current time $t$ and the time at which the $N_{k,c}[t]$-th successful decoding occurred.
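The level-crossing model and the throughput expression (22) lend themselves to a short simulation. The sketch below is only an illustration under assumptions that are not part of the paper: the per-slot mutual informations are drawn i.i.d. from a toy model (user scheduled with probability 1/2, exponential channel and ICI powers, an arbitrary SNR scaling of 10), and the function name `simulate_ir_harq` is hypothetical.

```python
import numpy as np

def simulate_ir_harq(mutual_info, r_first_block):
    """Incremental redundancy HARQ for one user, level-crossing model.

    mutual_info: per-slot mutual informations I[t] (zero when not scheduled).
    Returns (throughput as in eq. (22) at the final slot, number of ACKs).
    """
    acc = 0.0      # mutual information accumulated for the current packet
    n_ack = 0      # successful decoding events N[t]
    for i_t in mutual_info:
        acc += i_t                    # all received blocks are combined
        if acc >= r_first_block:      # level crossing: packet decoded, ACK
            n_ack += 1
            acc = 0.0                 # decoder memory is reset
    return r_first_block * n_ack / len(mutual_info), n_ack

# Toy i.i.d. model (an assumption made for this sketch only).
rng = np.random.default_rng(0)
n_slots = 20000
scheduled = rng.random(n_slots) < 0.5
signal = 10.0 * rng.exponential(size=n_slots)
ici = rng.exponential(size=n_slots)
I = scheduled * np.log2(1.0 + signal / (1.0 + ici))

print("empirical mean mutual information:", I.mean())
for r in (1.0, 10.0, 100.0):
    throughput, n_ack = simulate_ir_harq(I, r)
    print(f"r = {r:6.1f}  ACKs = {n_ack:6d}  throughput = {throughput:.4f}")
```

By construction the simulated throughput can never exceed the time-averaged mutual information, since each ACK consumes at least $r_{k,c}$ of accumulated mutual information; the loss relative to that average comes from the overshoot at each level crossing and from the unfinished last packet.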
Under the assumptions of this paper, the system with HARQ protocol and scheduling policy $\gamma^\star_c$ evolves according to a discrete-time, continuous-valued vector Markov process with state given by $Q_c[t]$ and by the vector of accumulated mutual informations at each receiver. Since the virtual queues are strongly stable (see Theorem 2) and the accumulated mutual informations are bounded in $[0, r_{k,c}]$, the process is stationary and ergodic. Therefore, the limit in (22) holds almost surely, and can be explicitly computed as follows:
\[ R^{\mathrm{harq},\star}_{k,c} = \lim_{t \to \infty} \frac{r_{k,c}}{\frac{\sum_{n=1}^{N_{k,c}[t]} W_{k,c}(n) + \Delta_{k,c}[t]}{N_{k,c}[t]}} = \frac{r_{k,c}}{\lim_{t \to \infty} \frac{1}{N_{k,c}[t]} \sum_{n=1}^{N_{k,c}[t]} W_{k,c}(n) + \lim_{t \to \infty} \frac{\Delta_{k,c}[t]}{N_{k,c}[t]}} = \frac{r_{k,c}}{E[W_{k,c}]} \tag{23} \]
where $W_{k,c}$ is an integer-valued random variable with the same marginal distribution as the inter-ACK times.

In order to determine $E[W_{k,c}]$, consider the case $t_{\rm start} = 1$ and define the event
\[ \mathcal{A}_{k,c}[t] = \left\{ \sum_{\tau=1}^{t} I_{k,c}[\tau] \leq r_{k,c} \right\} \tag{24} \]
Since the accumulated mutual information between two ACKs is non-decreasing, the following nesting condition holds: $\mathcal{A}_{k,c}[t] \subseteq \mathcal{A}_{k,c}[t-1]$ for all $t$, where $\mathcal{A}_{k,c}[0] = \{0 \leq r_{k,c}\}$ has probability 1. It follows that $P(W_{k,c} = t) = P(\mathcal{A}_{k,c}[t-1], \overline{\mathcal{A}}_{k,c}[t]) = P(\mathcal{A}_{k,c}[t-1]) - P(\mathcal{A}_{k,c}[t])$, yielding the average inter-ACK time in the form
\[ E[W_{k,c}] = \sum_{t=1}^{\infty} t \, P(W_{k,c} = t) = 1 + \sum_{t=1}^{\infty} P(\mathcal{A}_{k,c}[t]) \tag{25} \]
Owing to the complete formal analogy of results (23) and (25) with the throughput of HARQ considered in [27], we can directly apply the limit proved in [27]:
\[ \lim_{r_{k,c} \to \infty} \frac{r_{k,c}}{E[W_{k,c}]} = E[I_k(g_{k,c,c}, H_c, \chi_{k,c}, \Sigma_c)] \tag{26} \]

Footnote: This result is indeed quite intuitive: when $r_{k,c}$ becomes large, then $E[W_{k,c}]$ increases. Therefore, the accumulated mutual information divided by the number of slots, $\frac{1}{W_{k,c}} \sum_{\tau=1}^{W_{k,c}} I_k(g_{k,c,c}, H_c[\tau], \chi_{k,c}[\tau], \Sigma_c[\tau])$, converges to an ensemble average. It follows that in this limit the level-crossing condition tends to become deterministic, and satisfies (approximately)
\[ \frac{1}{W_{k,c}} \sum_{\tau=1}^{W_{k,c}} I_k(g_{k,c,c}, H_c[\tau], \chi_{k,c}[\tau], \Sigma_c[\tau]) = \frac{1}{W_{k,c}} r_{k,c} \]
Of course, this argument can be made rigorous by following in the footsteps of [27].

In particular, as $r_{k,c} \to \infty$ the average inter-ACK time $E[W_{k,c}]$ diverges to infinity linearly with $r_{k,c}$. The analysis in [27] shows that, for any $\eta_{k,c} > 0$,
\[ R^{\mathrm{harq},\star}_{k,c} \geq (1 - \eta_{k,c}) E[I_k(g_{k,c,c}, H_c, \chi_{k,c}, \Sigma_c)] \tag{27} \]
for all sufficiently large $r_{k,c}$.

The proof of Theorem 3 is finally concluded by combining the result (26) with (21). By stationarity and ergodicity, under $\gamma^\star_c$ we have that
\[ \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} I_{k,c}[\tau] = E[I_k(g_{k,c,c}, H_c, \chi_{k,c}, \Sigma_c)] \]
holds almost surely. Since $U_c(\cdot)$ is component-wise increasing, (21) implies that for any $\delta_{k,c} > 0$ there exist sufficiently large $A_{\max}$ and $V$ for which
\[ E[I_k(g_{k,c,c}, H_c, \chi_{k,c}, \Sigma_c)] \geq (1 - \delta_{k,c}) R^{\mathrm{genie},\star}_{k,c} \tag{28} \]
By letting $(1 - \epsilon_{k,c}) = (1 - \eta_{k,c})(1 - \delta_{k,c})$ and using (27) and (28), Theorem 3 is proved.

From the above proof it follows that the delay-throughput operating point of the incremental redundancy HARQ protocol can be chosen individually for each user by setting the threshold value $r_{k,c}$ (or, equivalently, the size $b_{k,c}$ of the information packets). By making $r_{k,c}$ large, the average decoding delay $D_{k,c} = E[W_{k,c}]$ becomes large and the throughput approaches $R^{\mathrm{genie},\star}_{k,c}$.

Also, we wish to stress the difference between the ARQ-LLC scheme described in Section II and the incremental-redundancy HARQ protocol illustrated in this section. The ARQ-LLC protocol makes use of adaptive variable-rate coding at the physical layer, and removes or keeps in the transmission buffer packets of information bits of variable size $b_{k,c}[t] = T r_{k,c}[t]$.
In contrast, the HARQ protocol makes use of a fixed packet size $b_{k,c}$ (equivalent to a fixed first-block rate $r_{k,c}$), but the effective service rate is made adaptive by varying the decoding delay through the ACK/NACK mechanism.

A. Implementation

The scheme previously proposed requires that each active user, at the end of each slot $t$, feeds back a message formed by one bit for ACK/NACK and by the value of $I_{k,c}[t]$ or, equivalently, the value of $\mathrm{sinr}_{k,c}[t]$. We notice that feeding back the instantaneous SINR is widely proposed in the literature on opportunistic downlink scheduling [42], [43] and it is referred to as Channel Quality Indicator (CQI). However, in the current literature the CQI is relative to the current slot, and it is used to select users and allocate the rate of a variable-rate coding scheme. In contrast, here the CQI refers to the past slot, and it is used to update the scheduler weights according to (19).

Denoting again by $F_{k,c}(\cdot)$ the marginal cdf of $\chi_{k,c}[t]$, the objective function in (20) can be rewritten as
\[ \sum_{k \in \mathcal{S}_c[t]} Q_{k,c}[t] \int_0^{\infty} \log\left(1 + \frac{g_{k,c,c} \left| h^{\sf H}_{k,c,c}[t] v_{k,c}[t] \right|^2 P_{k,c}[t]}{1 + z}\right) dF_{k,c}(z) \tag{29} \]
While for any fixed user subset $\mathcal{S}_c[t]$ the maximization of (29) with respect to the powers $\{P_{k,c}[t] : k \in \mathcal{S}_c[t]\}$ is a convex problem, the solution is not generally given by the simple waterfilling formula and it may be difficult to compute, since the cdfs $F_{k,c}(\cdot)$ are typically not known in closed form. A near-optimum low-complexity approximation consists of choosing $\Sigma_c[t]$ that maximizes the objective function lower bound
\[ \sum_{k \in \mathcal{S}_c[t]} Q_{k,c}[t] \log\left(1 + \frac{g_{k,c,c} \left| h^{\sf H}_{k,c,c}[t] v_{k,c}[t] \right|^2 P_{k,c}[t]}{1 + \bar{\chi}_{k,c}}\right) \tag{30} \]
obtained by applying Jensen’s inequality to (29). Notice that the maximization of (30) with respect to the transmit covariance matrix coincides with step 1 in the low-complexity approximation of the variable-rate coding/ARQ-LLC case of Section III-A, and can be solved efficiently using the methods in [37]–[39].

B. Extremal ICI distributions

The throughput performance of the HARQ scheme depends on the statistics of the ICI powers, which in turn depend on the scheduling policies $\gamma_{-c}$ at the interfering BSs. In this section we find extremal marginal statistics for the ICI powers that provide non-trivial inner and outer bounds to $\mathcal{R}^{\mathrm{genie}}_c(\gamma_{-c})$ that are independent of $\gamma_{-c}$. Here we drop the slot index $t$ since all processes are stationary. We start with the following:

Lemma 1:

For all feasible policies $\gamma_{c'} : c' \neq c$ that satisfy the input power constraint with equality and for all users $k = 1, \ldots, K$, we have
\[ E[I_k(g_{k,c,c}, H_c, \bar{\chi}_{k,c}, \Sigma_c)] \leq E[I_k(g_{k,c,c}, H_c, \chi_{k,c}, \Sigma_c)] \leq E[I_k(g_{k,c,c}, H_c, \tilde{\chi}_{k,c}, \Sigma_c)] \tag{31} \]
where $\bar{\chi}_{k,c} = E[\chi_{k,c}] = \sum_{c' \neq c} g_{k,c,c'}$ and where $\tilde{\chi}_{k,c} = \sum_{c' \neq c} g_{k,c,c'} \left| h^{\sf H}_{k,c,c'} v_{1,c'} \right|^2$ is the ICI power at the $(k,c)$ receiver when all interfering BSs $c' \neq c$ schedule a single user in their own cell.

Proof:

The first inequality (lower bound) follows immediately from Jensen’s inequality applied to the convex function $f(x) = \log\left(1 + \frac{a}{b + x}\right)$ with $a, b > 0$, and by the fact that, by assumption, the interfering BSs use all their available power. In order to show the second inequality (upper bound), we use (2) in (3) and write $\chi_{k,c} = \sum_{c' \neq c} g_{k,c,c'} \sum_{j \in \mathcal{S}_{c'}} \alpha_{k,c,c',j} P_{j,c'}$, where $\alpha_{k,c,c',j} \triangleq | h^{\sf H}_{k,c,c'} v_{j,c'} |^2$ are random variables independent of the SINR numerator $| h^{\sf H}_{k,c,c} v_{k,c} |^2 P_{k,c}$. Since the ZFBF steering vectors $v_{j,c'}$ have unit norm and are independent of $h_{k,c,c'}$, the variables $\alpha_{k,c,c',j}$ are marginally identically distributed as central chi-squared with 2 degrees of freedom [44]. Also, notice that the $\alpha_{k,c,c',j}$’s are statistically dependent for the same index $c'$, while $\{\alpha_{k,c,c',j} : j \in \mathcal{S}_{c'}\}$ and $\{\alpha_{k,c,c'',j} : j \in \mathcal{S}_{c''}\}$ are group-wise mutually independent for $c' \neq c''$. By assumption, $\sum_{j \in \mathcal{S}_{c'}} P_{j,c'} = 1$ for all $c'$. Therefore, $\sum_{j \in \mathcal{S}_{c'}} \alpha_{k,c,c',j} P_{j,c'}$ is a convex combination of identically distributed, possibly dependent, random variables. The second inequality in (31) follows by repeated application of Jensen’s inequality. Choose $c'' \neq c$. Then, using (8), we have
\begin{align}
E\left[\log\left(1 + \frac{g_{k,c,c} | h^{\sf H}_{k,c,c} v_{k,c} |^2 P_{k,c}}{1 + \chi_{k,c}}\right)\right]
&\leq \sum_{j \in \mathcal{S}_{c''}} P_{j,c''} \, E\left[\log\left(1 + \frac{g_{k,c,c} | h^{\sf H}_{k,c,c} v_{k,c} |^2 P_{k,c}}{1 + g_{k,c,c''} \alpha_{k,c,c'',j} + \sum_{c' \neq c, c''} g_{k,c,c'} \sum_{j' \in \mathcal{S}_{c'}} \alpha_{k,c,c',j'} P_{j',c'}}\right)\right] \nonumber \\
&= E\left[\log\left(1 + \frac{g_{k,c,c} | h^{\sf H}_{k,c,c} v_{k,c} |^2 P_{k,c}}{1 + g_{k,c,c''} \alpha_{k,c,c'',1} + \sum_{c' \neq c, c''} g_{k,c,c'} \sum_{j \in \mathcal{S}_{c'}} \alpha_{k,c,c',j} P_{j,c'}}\right)\right] \tag{32}
\end{align}
where the equality in (32) follows from the fact that the $\alpha_{k,c,c'',j}$’s are identically distributed with respect to the index $j$.
Next, pick $c''' \neq c, c''$, and apply the same steps to the last line of (32). After eliminating all convex combinations, the final upper bound coincides with the rightmost term in (31).

As a corollary, we have the following interesting “robustness” result:

Theorem 4:

For any choice of the scheduling policies $\gamma_{-c}$ that satisfy the input power constraint with equality, we have
\[ \bar{\mathcal{R}}_c \subseteq \mathcal{R}^{\mathrm{genie}}_c(\gamma_{-c}) \subseteq \tilde{\mathcal{R}}^{\mathrm{genie}}_c \tag{33} \]
where $\bar{\mathcal{R}}_c$ is the region with deterministic ICI powers $\{\bar{\chi}_{k,c}\}$, and where $\tilde{\mathcal{R}}^{\mathrm{genie}}_c$ is the region corresponding to random ICI powers $\{\tilde{\chi}_{k,c}\}$. Furthermore, the gap between the inner and outer bounds in (33) is bounded by a constant that does not depend on the channel path coefficients.

Footnote: Notice that if the ICI powers were deterministic, then no genie or HARQ is needed and the system reduces to a collection of isolated cells, where each cell $c$ has modified channel path gain coefficients $\bar{g}_{k,c,c} = \frac{g_{k,c,c}}{1 + \bar{\chi}_{k,c}}$. In this case, the throughput region $\bar{\mathcal{R}}_c$ is achieved by the standard scheduling/resource allocation schemes with perfect state information and zero outage probability.

Proof:

The proof of (33) follows directly as a consequence of Lemma 1. In order to show the bounded gap, we have to find some constant $\Delta$, independent of $\{g_{k,c,c'}\}$, such that $\max\{r - \Delta, 0\} \in \bar{\mathcal{R}}_c$ for all points $r \in \tilde{\mathcal{R}}^{\mathrm{genie}}_c$. To this purpose, pick a point $r \in \tilde{\mathcal{R}}^{\mathrm{genie}}_c$ corresponding to some feasible scheduling policy $\gamma_c$ for the genie-aided system. Applying the same sequence of input covariance matrices as determined by $\gamma_c$ to the system with deterministic ICI powers, we certainly find a point $\bar{R}_c(\gamma_c) \in \bar{\mathcal{R}}_c$. Consider the throughput of the $k$-th user and let for convenience $A \triangleq g_{k,c,c} | h^{\sf H}_{k,c,c} v_{k,c} |^2 P_{k,c}$. Then, by applying Jensen’s inequality we have
\[ E\left[\log\left(1 + \frac{A}{1 + \sum_{c' \neq c} g_{k,c,c'} \alpha_{k,c,c',1}}\right) \,\middle|\, A\right] - \log\left(1 + \frac{A}{1 + \sum_{c' \neq c} g_{k,c,c'}}\right) \leq \log\left(1 + \sum_{c' \neq c} g_{k,c,c'}\right) - E\left[\log\left(1 + \sum_{c' \neq c} g_{k,c,c'} \alpha_{k,c,c',1}\right)\right] \tag{34} \]
The RHS in the above inequality is easily seen to be non-negative and component-wise increasing with respect to any coefficient $g_{k,c,c'}$. Therefore, its maximum is obtained in the limit for all $g_{k,c,c'} \to \infty$ (in passing, we notice that this corresponds to considering the interference-limited regime where SNR $\to \infty$). In order to see that this limit is finite, let $g_{\max} = \max g_{k,c,c'}$; then we have
\begin{align}
\text{RHS of (34)} &\leq \log\left(1 + (C-1) g_{\max}\right) - E\left[\log\left(1 + g_{\max} \sum_{c' \neq c} \alpha_{k,c,c',1}\right)\right] \nonumber \\
&\leq -E\left[\log\left(\frac{1}{C-1} \sum_{c' \neq c} \alpha_{k,c,c',1}\right)\right] \tag{35} \\
&\leq -E\left[\log\left(\alpha_{k,c,c',1}\right)\right] \tag{36} \\
&\leq \gamma / \ln(2) \tag{37}
\end{align}
where (35) follows by letting $g_{\max} \to \infty$, (36) follows by applying Jensen’s inequality to the convex function $-\log x$, and (37) follows by using the fact that $\alpha_{k,c,c',1}$ is chi-squared with 2 degrees of freedom, and using the limit $\lim_{\epsilon \downarrow 0} \int_{\epsilon}^{\infty} \ln x \, e^{-x} \, dx = -\gamma$, where $\gamma$ denotes the Euler-Mascheroni constant [45].

Theorem 4 has the following interesting consequence: consider the multi-cell decentralized scheduling game under the proposed incremental redundancy HARQ scheme, achieving the genie-aided throughput region in each cell. The performance of any given cell $c$ (in terms of its network utility value) at any Nash equilibrium $(\gamma^\star_1, \ldots, \gamma^\star_C)$ is bounded below and above by the solutions of (7) when $\mathcal{R}^{\mathrm{genie}}_c(\gamma^\star_{-c})$ is replaced by $\bar{\mathcal{R}}_c$ and $\tilde{\mathcal{R}}^{\mathrm{genie}}_c$, respectively. This follows from the fact that, as argued at the end of Section II, all Nash equilibria must achieve the power constraints with equality at each BS.

V. NUMERICAL RESULTS
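As a numerical aside, the constant $\gamma/\ln(2)$ in (37), which bounds the gap between the inner and outer regions of Theorem 4, can be evaluated and checked by Monte Carlo. The Python sketch below assumes that $\alpha_{k,c,c',1}$ is a unit-mean exponential random variable (the normalization implied by the density $e^{-x}$ in the integral used in the proof); the function name `estimate_gap_bits` and the sample size are hypothetical choices made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def estimate_gap_bits(num_samples=200000, seed=1):
    """Monte Carlo estimate of -E[log2(alpha)] for alpha ~ Exp(1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Guard against the (practically impossible) draw of exactly zero.
        alpha = max(rng.expovariate(1.0), 1e-300)
        total -= math.log2(alpha)
    return total / num_samples


closed_form_bits = EULER_GAMMA / math.log(2.0)  # gamma / ln(2), in bits

if __name__ == "__main__":
    print(f"closed form: {closed_form_bits:.4f} bits per channel use")
    print(f"Monte Carlo: {estimate_gap_bits():.4f} bits per channel use")
```

Under this assumption the gap is roughly 0.83 bits per channel use, irrespective of the path gain coefficients.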

We considered a simple one-dimensional cellular layout with unit width cells arranged on a line. BSs are located at integer positions $c \in \mathbb{Z}$. In each cell $c$, users are placed on a uniform grid in positions $u(k,c) = (2k - K - 1)/(2K) + c$, for $k = 1, \cdots, K$. The channel path gains are given by $g_{k,c,c'} = \frac{G}{1 + (|u(k,c) - c'|_C / \delta)^{\nu}}$, where the modulo-$C$ distance $|u - c|_C = \min\{|u - c + zC| : z \in \mathbb{Z}\}$ induces a torus topology that eliminates border effects, where $\nu$ and $\delta$ are the propagation exponent and the 3dB breakpoint distance, respectively, and where $G$ determines the received SNR at the cell edge [29]. We present results for a system with $C = 18$ cells, $M = 2$ antennas per BS, $K = 36$ users per cell and parameters G = 60 dB, α = 3 . and δ = 0 . . For the implementation of the policy $\gamma^\star_c$ we chose parameters $A_{\max} = 50$, $V = 50$ and suboptimal low-complexity approximations as explained in Sections III-A and IV-A, respectively. As for the network utility functions, we considered both proportional fairness and max-min fairness (see Section III-A and [6], [18] and references therein). In order to gather the ICI statistics, we run the same scheduling algorithm in all BSs and measure the empirical cdf of the ICI power at each user location in the reference cell $c = 0$ (since the system is completely symmetric, all cells see the same ICI statistics).

Figs. 2 and 3 compare user throughputs in cell $c = 0$ under proportional fairness and max-min fairness, respectively. Thick dashed lines illustrate the throughput upper bounds of Theorem 4. Thin dashed lines correspond to the actual “genie-aided” rates achievable by the proposed HARQ scheme in the limit of infinite decoding delay. Solid lines show the throughput achieved by the HARQ scheme operating at finite average decoding delay for all users, by setting the parameters $\{r_{k,0}\}$ such that each user achieves 97% of the genie-aided rates (infinite delay). The “triangle” marks indicate the throughput lower bounds of Theorem 4.
Finally, the “square” marks indicate the throughputs achieved by the conventional adaptive variable-rate coding with ARQ-LLC. We observe that, with respect to the ARQ-LLC scheme, HARQ achieves a throughput gain of more than 100% for the users at the edge of the cell in the proportional fairness case, and a throughput gain of more than 40% for all users in the max-min fairness case.

Footnote: Notice that the mutual information function is strictly increasing with the SINR.

Figs. 4 and 5 illustrate the average throughput as a function of the average decoding delay for the HARQ scheme in the case of two specific users: user $(1,0)$ at the left cell edge and user $(18,0)$ at the cell center, under proportional fairness and max-min fairness, respectively. The thick dashed lines show genie-aided rates. The solid lines are obtained by increasing the first-block coding rate parameter $r_{k,0}$ and computing the average decoding delay from (25) with $P(\mathcal{A}_{k,0}[t])$ obtained by Monte Carlo simulation. Note that as $r_{k,0}$ increases, the delay $E[W_{k,0}]$ also increases and the HARQ throughputs approach the genie-aided throughputs, in agreement with Theorem 3. The “o” marks indicate the throughput-delay points at which the HARQ protocol achieves 70%, 80% and 90% of the genie-aided throughput based on simulations. For example, under proportional fairness, 90% of the genie-aided throughput can be achieved at users $(1,0)$ and $(18,0)$ with average decoding delays of about and slots, respectively. These points (obtained by full system simulation) are accurately predicted by the analytical formulas of Section IV fitted with the Monte Carlo estimation of the probabilities $P(\mathcal{A}_{k,0}[t])$.

For $K = 36$ users per cell and $M = 2$ BS antennas, assuming that exactly $M = 2$ users are served in each slot, a round-robin scheduling with no outage (genie-aided rate allocation) would take an average delay of slots.
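The semi-analytic procedure used for the solid curves (sweep the first-block rate, estimate $P(\mathcal{A}[t])$ by Monte Carlo, then apply (25) for the delay and (23) for the throughput) can be sketched as follows. This is a minimal Python sketch under assumed statistics, not the simulation setup of the paper: the user is scheduled independently with probability `p_sched`, the per-slot mutual information is $\log_2(1 + \mathrm{snr}\cdot\alpha/(1+\chi))$ with unit-mean exponential $\alpha$ and $\chi$, and the infinite sum in (25) is truncated at `t_max`. All function and parameter names are hypothetical.

```python
import math
import random


def mean_inter_ack_time(r, num_trials=4000, p_sched=0.2, snr=10.0,
                        t_max=5000, seed=0):
    """Estimate E[W] = 1 + sum_{t>=1} P(A[t]) as in (25).

    A[t] is the event that the mutual information accumulated over
    slots 1..t does not exceed the first-block rate r, see (24).
    """
    rng = random.Random(seed)
    count_a = [0] * (t_max + 1)  # count_a[t]: trials in which A[t] holds
    for _ in range(num_trials):
        acc = 0.0
        for t in range(1, t_max + 1):
            if rng.random() < p_sched:
                # Assumed toy model for the per-slot mutual information.
                alpha = rng.expovariate(1.0)
                chi = rng.expovariate(1.0)
                acc += math.log2(1.0 + snr * alpha / (1.0 + chi))
            if acc > r:
                break            # A[t] fails; by nesting it fails from now on
            count_a[t] += 1
    return 1.0 + sum(count_a[1:]) / num_trials


def delay_throughput_curve(rates):
    """Return (average delay E[W], throughput r / E[W]) pairs, one per rate."""
    curve = []
    for r in rates:
        delay = mean_inter_ack_time(r)
        curve.append((delay, r / delay))  # throughput from (23)
    return curve


if __name__ == "__main__":
    for delay, thr in delay_throughput_curve([2.0, 5.0, 10.0, 20.0, 40.0]):
        print(f"E[W] = {delay:7.2f} slots, throughput = {thr:.3f} bits/channel use")
```

In this toy model both the average delay and the throughput grow with the first-block rate, which is the qualitative trade-off shown in Figs. 4 and 5.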
Remarkably, under proportional fairness, 90% of the genie-aided throughput can be achieved with about slots of average delay for the center user. This is only ≈ times that of the genie-aided round-robin scheduling. For the edge users, this is achieved with ≈ slots of average delay, which is only times that of round-robin. Under max-min fairness, both users $(1,0)$ and $(18,0)$ achieve genie-aided throughputs close to . bits/channel use. The decoding delay for the center user is larger than for the edge user due to the fact that center users are scheduled very rarely. For the point, edge users achieve . bits/channel use with average delay of slots, while center users achieve a similar throughput of . bits/channel use with delay of slots.

VI. CONCLUDING REMARKS

In this work we considered decentralized downlink scheduling in a multi-cell environment with multi-antenna BSs, where the scheduler at each BS has perfect CSIT about its own users and statistical information about the ICI caused by the other cells. Since each BS modifies its transmit covariance matrix at every slot, the ICI powers experienced at the users’ receivers are random variables. We addressed the scheduling problem in the presence of uncertain ICI powers in the framework of stochastic network optimization. A straightforward application of this framework yields a conventional scheme based on adaptive variable-rate coding at the physical layer, and ARQ at the Logical Link Control layer. Then, a new combination of the same stochastic network optimization framework with incremental redundancy Hybrid ARQ at the physical layer was shown to improve over the conventional scheme, and achieve a network utility arbitrarily close to the performance of a genie-aided system that can schedule the user rates equal to the (non-causally known) instantaneous mutual information on each slot. For this scheme, we also showed that all Nash equilibria of the multi-cell decentralized scheduling game yield network utility values that can be uniformly upper and lower bounded by virtual systems corresponding to “extremal” ICI statistics, where the lower bound corresponds to the case of deterministic ICI powers equal to their mean values, and the upper bound corresponds to the case where all interfering BSs transmit to a single user at full power (rank 1 interfering covariance matrices). These bounds stay at a fixed gap that is independent of the cellular system configuration, i.e., of the channel path gain coefficients and operating SNR. The proposed incremental redundancy HARQ can be implemented in practice by using Raptor codes [40] at the physical layer, and needs no protocol overhead to communicate slot-by-slot rate allocation as in adaptive variable-rate coding.
Hence, the proposed HARQ scheme is both easier to implement and performs significantly better than the conventional variable-rate coding scheme. Also, we hasten to say that our approach applies directly to a variety of possible configurations, including different MU-MIMO precoding schemes and network MIMO schemes with clusters of coordinated cells [46]. In this paper we considered the case of linear ZFBF and no cell clustering for the sake of clarity of exposition. The approach can also be extended to the case of non-perfect CSIT, following [18]. Here we focused on perfect CSIT for its simplicity and in order to focus on the random nature of ICI as the fundamental source of uncertainty in a multi-cell environment.

REFERENCES

[1] P. Bender, P. Black, M. Grob, R. Padovani, N. Sindhushayana, and A. Viterbi, “CDMA/HDR: A bandwidth-efficient high-speed wireless data service for nomadic users,” IEEE Commun. Mag., vol. 38, pp. 70–77, Jul. 2000.
[2] A. Jalali, R. Padovani, and R. Pankaj, “Data throughput of CDMA-HDR: A high efficiency high data rate personal communication wireless system,” in Proc. IEEE Vehic. Tech. Conf., VTC-Spring, May 2000.
[3] D. Tse, “Optimal power allocation over parallel Gaussian broadcast channels,” unpublished manuscript.
[4] D. Tse and S. Hanly, “Multi-access fading channels: Part I: Polymatroid structure, optimal resource allocation and throughput capacities,” IEEE Trans. on Inform. Theory, vol. 44, no. 7, pp. 2796–2815, Nov. 1998.
[5] P. Viswanath, D. N. C. Tse, and R. Laroia, “Opportunistic beamforming using dumb antennas,” IEEE Trans. on Inform. Theory, vol. 48, no. 6, pp. 1277–1294, Jun. 2002.
[6] J. Mo and J. Walrand, “Fair end-to-end window-based congestion control,” IEEE/ACM Trans. Netw., vol. 8, no. 5, pp. 556–567, 2000.
[7] .
[8] G. Caire and S. Shamai, “On the achievable throughput of a multiantenna Gaussian broadcast channel,” IEEE Trans. on Inform. Theory, vol. 49, no. 7, pp. 1691–1706, 2003.
[9] H. Weingarten, Y. Steinberg, and S. Shamai, “The capacity region of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Trans. on Inform. Theory, vol. 52, no. 9, pp. 3936–3964, Sep. 2006.
[10] S. A. Ramprashad and G. Caire, “Cellular vs. network MIMO: A comparison including the channel state information overhead,” in Proc. IEEE Intern. Symp. on Personal, Indoor and Mobile Radio Commun., PIMRC, Tokyo, Japan, Sep. 2009.
[11] G. Foschini, K. Karakayali, and R. A. Valenzuela, “Coordinating multiple antenna cellular networks to achieve enormous spectral efficiency,” IEE Proc. Commun., vol. 152, no. 4, pp. 548–555, Aug. 2006.
[12] F. Boccardi, F. Tosato, and G. Caire, “Precoding Schemes for the MIMO-GBC,” in Proc. Int. Zurich Seminar on Commun., Feb. 2006, pp. 10–13.
[13] T. Marzetta, “How Much Training is Required for Multiuser MIMO?” Signals, Systems and Computers, 2006. ACSSC ’06. Fortieth Asilomar Conference on, pp. 359–363, 2006.
[14] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, “Multiuser MIMO achievable rates with downlink training and channel state feedback,” submitted to IEEE Trans. on Inform. Theory, Nov. 2007, Arxiv preprint cs.IT/0711.2642v2.
[15] P. Ding, D. Love, and M. Zoltowski, “Multiple antenna broadcast channels with shape feedback and limited feedback,” IEEE Trans. on Sig. Proc., vol. 55, no. 7, Part 1, pp. 3417–3428, Jul. 2007.
[16] M. Kobayashi, G. Caire, and N. Jindal, “How much training and feedback are needed in MIMO broadcast channels?” in Proc. IEEE Int. Symp. on Inform. Theory, ISIT, Jul. 2008, pp. 2663–2667.
[17] H. Shirani-Mehr and G. Caire, “Channel State Feedback Schemes for Multiuser MIMO-OFDM Downlink,” to appear in IEEE Trans. on Commun.
[18] H. Shirani-Mehr, G. Caire, and M. J. Neely, “MIMO downlink scheduling with non-perfect channel state knowledge,” submitted to IEEE Transactions on Communications.
[19] L. Georgiadis, M. Neely, and L. Tassiulas, Resource Allocation and Cross-Layer Control in Wireless Networks, ser. Foundations and Trends in Networking. Hanover, MA, USA: Now Publishers Inc., 2006, vol. 1, no. 1.
[20] G. Fodor and C. Koutsimanis, “A low intercell interference variation scheduler for OFDMA networks,” in Communications, 2008. ICC ’08. IEEE International Conference on, May 2008, pp. 3078–3084.
[21] M. T. Ivrlac and J. A. Nossek, “Intercell-interference in the Gaussian MISO broadcast channel,” in GLOBECOM. IEEE, 2007, pp. 3195–3199.
[22] M. J. Neely, E. Modiano, and C. Li, “Fairness and optimal stochastic control for heterogeneous networks,” IEEE INFOCOM Proceedings, March 2005.
[23] M. J. Neely, E. Modiano, and C. E. Rohrs, “Dynamic power allocation and routing for time varying wireless networks,” IEEE Journal on Selected Areas in Communications, Special Issue on Wireless Ad-Hoc Networks, vol. 23, no. 1, pp. 89–103, Jan 2005.
[24] Q. Bi and S. Vitebsky, “Performance analysis of 3G-1X EVDO high data rate system,” in Proc. IEEE Wireless Commun. and Networking Conf., WCNC, vol. 1, Mar. 2002, pp. 389–395.
[25] P. Frenger, S. Parkvall, and E. Dahlman, “Performance comparison of HARQ with Chase combining and incremental redundancy for HSDPA,” in Proc. IEEE Vehic. Tech. Conf., VTC-Fall, vol. 3, Sep. 2001, pp. 1829–1833.
[26] R. Love, A. Ghosh, W. Xiao, and R. Ratasuk, “Performance of 3GPP high speed downlink packet access (HSDPA),” in Proc. IEEE Vehic. Tech. Conf., VTC-Fall, vol. 5, Sep. 2004, pp. 3359–3363.
[27] G. Caire and D. Tuninetti, “The throughput of hybrid-ARQ protocols for the Gaussian collision channel,” IEEE Trans. on Inform. Theory, vol. 47, no. 5, pp. 1971–1988, Jul. 2001.
[28] E. Biglieri, J. Proakis, and S. Shamai, “Fading channels: information-theoretic and communications aspects,” IEEE Trans. on Inform. Theory, vol. 44, no. 6, pp. 2619–2692, 1998.
[29] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005.
[30] M. J. Neely, “Energy optimal control for time-varying wireless networks,” IEEE Transactions on Information Theory, vol. 52, no. 7, pp. 2915–2934, 2006.
[31] J. Jose, A. Ashikhmin, P. Whiting, and S. Vishwanath, “Scheduling and pre-conditioning in multi-user MIMO TDD systems,” Arxiv preprint cs.IT/0709.4513.
[32] P. Ding, D. Love, and M. Zoltowski, “Multiple Antenna Broadcast Channels With Shape Feedback and Limited Feedback,” IEEE Trans. on Sig. Proc., vol. 55, pp. 3417–3428, 2007.
[33] J. W. Friedman, “A non-cooperative equilibrium for supergames,” Review of Economic Studies, vol. 38, no. 113, pp. 1–12, 1971. [Online]. Available: http://ideas.repec.org/a/bla/restud/v38y1971i113p1-12.html
[34] J. W. Friedman, Oligopoly and the Theory of Games. Amsterdam [u.a.]: North-Holland, 1977.
[35] J. B. Rosen, “Existence and uniqueness of equilibrium points for concave n-person games,” Econometrica, vol. 33, no. 3, pp. 520–534, 1965. [Online]. Available: http://dx.doi.org/10.2307/1911749
[36] L. Ozarow, S. Shamai, and A. Wyner, “Information theoretic considerations for cellular mobile radio,” Vehicular Technology, IEEE Transactions on, vol. 43, no. 2, pp. 359–378, May 1994.
[37] G. Dimic and N. Sidiropoulos, “On downlink beamforming with greedy user selection: performance analysis and simple new algorithm,” IEEE Trans. on Sig. Proc., vol. 53, no. 10, pp. 3857–3868, Oct. 2005.
[38] T. Yoo and A. Goldsmith, “On the optimality of multiantenna broadcast scheduling using zero-forcing beamforming,” Selected Areas in Communications, IEEE Journal on, vol. 24, no. 3, pp. 528–541, March 2006.
[39] H. Huh, H. Papadopoulos, and G. Caire, “MIMO broadcast channel optimization under general linear constraints,” in Information Theory, 2009. ISIT 2009. IEEE International Symposium on, Jun. 28–Jul. 3, 2009, pp. 2664–2668.
[40] A. Shokrollahi, “Raptor Codes,” IEEE Trans. on Inform. Theory, vol. 52, no. 6, pp. 2551–2567, Jun. 2006.
[41] G. Forney, Jr., “Exponential error bounds for erasure, list, and decision feedback schemes,” Information Theory, IEEE Transactions on, vol. 14, no. 2, pp. 206–220, Mar 1968.
[42] M. Sharif and B. Hassibi, “On the capacity of a MIMO broadcast channel with partial side information,” IEEE Trans. on Inform. Theory, vol. 51, no. 2, pp. 506–522, Feb. 2005.
[43] T. Yoo, N. Jindal, and A. Goldsmith, “Multi-Antenna Downlink Channels with Limited Feedback and User Selection,” IEEE J. Select. Areas Commun., vol. 25, pp. 1478–1491, 2007.
[44] G. Grimmett and D. Stirzaker, Probability and Random Processes. Oxford Univ. Press, 2004.
[45] M. Abramowitz and I. Stegun, Handbook of mathematical functions: with formulas, graphs, and mathematical tables. Courier Dover Publications, 1965.
[46] G. Caire, S. A. Ramprashad, H. C. Papadopoulos, C. Pepin, and C.-E. W. Sundberg, “Multiuser MIMO downlink with limited inter-cell cooperation: Approximate interference alignment in time, frequency and space,” Proc. of Forty-Sixth Annual Allerton Conference on Communication, Control, and Computing, Sept. 2008.

[Figure 1 plot: accumulated mutual information of user $(k,c)$ versus time, with decoding threshold $r_{k,c}$, ACK events, and inter-ACK times $W_{k,c}(1)$, $W_{k,c}(2)$, $W_{k,c}(3)$.]

Fig. 1. Qualitative plot of the mutual information level-crossing process that determines the decoding events of the HARQ protocol. The jumps of the accumulated mutual information process correspond to slot times at which user $(k,c)$ is active.

[Figure 2 plot, “Proportional Fairness”: $\bar{R}_{k,0}$ (bits/channel use) versus user location. Curves: Genie-aided Upper Bound (Th. 4); Genie-aided, Simulated ICI; HARQ (97% of Genie-aided), Simulated ICI; Mean ICI Lower Bound (Th. 4); ARQ-LLC, Simulated ICI.]

Fig. 2. Average-throughput, proportional fairness.

[Figure 3 plot, “Max-min Fairness”: $\bar{R}_{k,0}$ (bits/channel use) versus user location. Curves: Genie-aided Upper Bound (Th. 4); Genie-aided, Simulated ICI; HARQ (97% of Genie-aided), Simulated ICI; Mean ICI Lower Bound (Th. 4); ARQ-LLC, Simulated ICI.]

Fig. 3. Average-throughput, max-min fairness.

[Figure 4 plot, “Proportional Fairness”: $\bar{R}_{k,0}$ (bits/channel use) versus $E[W_{k,0}]$ (slots), with the 70%, 80% and 90% points marked for the user close to the BS and for the user at the left cell edge. Curves: Genie-aided; Semi-analytic, HARQ; Simulation, HARQ.]

Fig. 4. Average rate vs. decoding delay with proportional fairness for two sample users.

[Plot, “Max-min Fairness”: $\bar{R}_{k,0}$ (bits/channel use) versus $E[W_{k,0}]$ (slots), with the 70%, 80% and 90% points marked for the user close to the BS and for the user at the left cell edge.]