Effective Capacity in MIMO Channels with Arbitrary Inputs

Abstract

Recently, communication systems that are both spectrum and energy efficient have attracted significant attention. Different from the existing research, we investigate the throughput and energy efficiency of a general class of multiple-input and multiple-output systems with arbitrary inputs when they are subject to statistical quality-of-service (QoS) constraints, which are imposed as limits on the delay violation and buffer overflow probabilities. We employ the effective capacity as the performance metric. We obtain the optimal input covariance matrix that maximizes the effective capacity under a short-term average power constraint. Following that, we perform an asymptotic analysis of the effective capacity in the low signal-to-noise ratio and large-scale antenna regimes. In the low signal-to-noise ratio regime analysis, we utilize the first and second derivatives of the effective capacity when the signal-to-noise ratio approaches zero in order to determine the minimum energy-per-bit and also the slope of the effective capacity versus energy-per-bit curve at the minimum energy-per-bit. We observe that the minimum energy-per-bit is independent of the input distribution, whereas the slope depends on the input distribution. In the large-scale antenna analysis, we show that the effective capacity approaches the average transmission rate in the channel with the increasing number of transmit and/or receive antennas. Particularly, the gap between the effective capacity and the average transmission rate in the channel, which is caused by the QoS constraints, is minimized with the number of antennas. In addition, we put forward the non-asymptotic backlog and delay violation bounds by utilizing the effective capacity. Finally, we substantiate our analytical results through numerical illustrations.


arXiv [cs.IT], December

Marwan Hammouda, Sami Akın, M. Cenk Gursoy, and Jürgen Peissig

Abstract: Recently, communication systems that are both spectrum and energy efficient have attracted significant attention. Different from the existing research, we investigate the throughput and energy efficiency of a general class of multiple-input and multiple-output systems with arbitrary inputs when they are subject to statistical quality-of-service (QoS) constraints, which are imposed as limits on the delay violation and buffer overflow probabilities. We employ the effective capacity as the performance metric, which is the maximum constant data arrival rate at a buffer that can be sustained by the channel service process under specified QoS constraints. We obtain the optimal input covariance matrix that maximizes the effective capacity under a short-term average power constraint. Following that, we perform an asymptotic analysis of the effective capacity in the low signal-to-noise ratio and large-scale antenna regimes. In the low signal-to-noise ratio regime analysis, in order to determine the minimum energy-per-bit and also the slope of the effective capacity versus energy-per-bit curve at the minimum energy-per-bit, we utilize the first and second derivatives of the effective capacity when the signal-to-noise ratio approaches zero. We observe that the minimum energy-per-bit is independent of the input distribution, whereas the slope depends on the input distribution. In the large-scale antenna analysis, we show that the effective capacity approaches the average transmission rate in the channel with the increasing number of transmit and/or receive antennas. Particularly, the gap between the effective capacity and the average transmission rate in the channel, which is caused by the QoS constraints, is minimized with the number of antennas. In addition, we put forward the non-asymptotic backlog and delay violation bounds by utilizing the effective capacity. Finally, we substantiate our analytical results through numerical illustrations.

Index Terms: Effective capacity, energy efficiency, large-scale antenna regime, minimum energy-per-bit, multiple-antenna systems, mutual information, optimal input covariance, quality of service constraints.

I. INTRODUCTION

Following the research of Foschini [1] and Telatar [2], multiple-input and multiple-output (MIMO) transmission systems have been widely studied, and it was shown that employing multiple antennas at a transmitter and/or a receiver can remarkably enhance the system performance in terms of both reliability and spectral efficiency [3]. Herein, the information-theoretic analysis of MIMO systems formed the basis to understand the system dynamics [4]–[14]. For instance, the ergodic capacity of MIMO systems was explored, and analytical characterizations of spatial fading correlations and their effect on the ergodic capacity were provided in [5]. Moreover, regarding the available information about the channel statistics at the transmitter, the optimal input covariance matrix that achieves the maximum ergodic capacity in a one-to-one MIMO system was investigated in [6]. Considering line-of-sight characterizations in a wireless medium, the structures of the capacity-achieving input covariance matrices were researched as well [12]–[14].

Footnote: Copyright (c) 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. M. Hammouda, S. Akın, and J. Peissig are with the Institute of Communications Technology, Leibniz Universität Hannover, 30167 Hanover, Germany (e-mail: [email protected], [email protected], and [email protected]). M. C. Gursoy is with the Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244 USA (e-mail: [email protected]). This work was supported by the European Research Council under Starting Grant-306644 and in part by the National Science Foundation under Grant CCF-1618615.

The efficient use of energy is a fundamental requirement in communication networks because most of the portable communication devices are battery-driven and environmental concerns are to be carefully mediated. Thus, energy efficiency along with spectral efficiency is in the focus of attention in prospective transmission system designs.
For example, the next generation wireless communication technology, commonly known as 5G, aims to support 10 to 100 times higher data transmission rates and to provide 10 times longer battery life than the preceding mobile technology [15]. In this regard, the ergodic capacity of MIMO systems was primarily studied in low-power regimes [16]–[19]. These studies revealed that when the objective capacity function is concave, the minimum energy required to transmit one bit of information, i.e., energy-per-bit, is obtained when the signal-to-noise ratio approaches zero [16]. Subsequently, a more comprehensive energy efficiency analysis was conducted considering any power regime [20]. Particularly, MIMO scenarios with Rayleigh fading channel models were investigated, and a fairly accurate closed-form approximation for the energy-per-bit was obtained by engaging different power models. Similar investigations were conducted in distributed MIMO systems as well [21].

Another approach that maximizes the spectral efficiency while minimizing the energy-per-bit is to increase the spatial dimension by increasing the number of transmit and/or receive antennas. It was shown that the spectral efficiency improves substantially with the increasing number of antennas while making the transmit power arbitrarily small [22]. On this account, massive MIMO (or large-scale antenna [23]) systems have evolved as a candidate technology for 5G wireless communications [24], [25], and they have been investigated from information-theoretic perspectives [26]–[32]. Particularly, energy and spectral efficiency in the uplink channels of multi-user massive MIMO systems were studied with different information processing techniques such as maximum-ratio combining, zero forcing, and minimum mean-square error (MMSE) estimation [29].
Likewise, power allocation policies were also studied, and optimal input covariance matrices that maximize the sum transmission rate in multi-access channels with a massive number of antennas at both transmitters and receivers were derived [32].

Quality-of-service (QoS) constraints, which generally emerge in the form of delay and/or data buffering limitations, are typically disregarded when the ergodic capacity is set as the only performance metric. However, the increasing demand for delay-sensitive services, such as video streaming and online gaming over wireless networks, has brought up the need for a comprehensive investigation of delay-sensitive scenarios [33]. For wireless communications systems with such delay-sensitive services, the ergodic capacity alone is not a sufficient metric. Rather, QoS constraints in the data-link layer that are attributed to delay violation and buffer overflow probabilities should be invoked as performance measures as well. Relying on this motivation, cross-layer design goals emerged as a new research area. Initial cross-layer analysis was performed in wired networks, where the effective bandwidth (the minimum required service rate from a transmission node given a data arrival process at that node under desired QoS requirements) was introduced as a performance probing tool [34], [35]. In effective bandwidth studies, the stochastic nature of data arrival processes was taken into account while assuming service processes with constant transmission rates. However, in contrast to the deterministic nature of wired networks, wireless service links generally demonstrate a stochastic behavior, and the instantaneous transmission (service) rates may vary drastically. In this context, the effective capacity, which provides the maximum constant data arrival rate at a transmission node that is sustained by a given stochastic service process under defined QoS constraints, was proposed [36]. The effective capacity is the dual of the effective bandwidth.
The concept of the effective capacity has gained notable attention, and it has been investigated in several transmission scenarios, including MIMO systems [37]–[39]. Specifically, point-to-point MIMO scenarios were explored under QoS constraints by employing the effective capacity as the performance metric in the low and high signal-to-noise ratio regimes and the wide-band regime [37]. A comparable analysis was extended to cognitive MIMO systems, where the effects of channel uncertainty on the effective capacity performance of secondary users following channel sensing errors were studied [38]. Regarding the antenna beam-forming, optimal transmit strategies that maximize the effective capacity were derived in MIMO systems with doubly correlated channels and a covariance feedback [39].

Because Gaussian input signaling in certain cases is optimal in the sense of maximizing the mutual information between the input and output in a transmission channel, it has been invoked in many research scenarios. Even though Gaussian input signaling is not practical, it is preferred by many researchers since it typically simplifies the analytical presentations. On the other hand, it is of importance to understand the effects of signaling choice on the system performance, because the type of input signaling may critically affect the tradeoff between the data arrival process to a node and the data service process from that node [40]. A general look at wireless systems employing finite and discrete input signaling methods can be found in [41]–[48]. However, QoS constraints are generally not included in these studies. Particularly, the optimal precoding matrix in a point-to-point MIMO system, which maximizes the mutual information in the low and high signal-to-noise ratio regimes, was proposed [41].
With the same objective, channel diagonalization was applied in order to obtain the optimal channel precoder [43], [45], i.e., parallel and non-interfering Gaussian channels are formed to reach the optimal input covariance matrix. In another study [49], the optimal power allocation policy that maximizes the mutual information, named mercury/water-filling, was shown to be a generalization of the well-known water-filling algorithm. Multi-access systems were studied as well [47], where linear precoding matrices are obtained in order to maximize the weighted sum rate. An extension of the same analysis was performed in scenarios in which transmitters have only statistical information about the wireless channels [48]. Asymptotic analyses in the large-scale antenna regimes were also provided. Here, the notion of mutual information was utilized as the performance metric, and the fundamental relation between the mutual information and the MMSE, which was introduced in [50], [51], was exploited.

In this paper, we focus on a more general MIMO scenario in which input signaling is arbitrary and statistical QoS constraints are imposed. We investigate the system performance by employing the effective capacity. We provide a mathematical toolbox that system designers can use in order to understand performance levels of spectrum and energy efficient systems under QoS constraints imposed as limits on the buffer overflow and delay violation probabilities, which are two of the main objectives in the 5G technology [15].
More specifically, we can list our contributions as follows:

1) Assuming that the instantaneous channel fading gain estimate is available at both the transmitter and the receiver, we have identified the optimal input covariance matrix that maximizes the effective capacity under a short-term average power constraint over the transmit antennas.
2) We obtain the first and second derivatives of the effective capacity when the signal-to-noise ratio goes to zero. Using these derivatives, we obtain a linear approximation of the effective capacity in the low signal-to-noise ratio regime. We show that this approximation does not depend on the input distribution and covariance matrix.
3) We further show that the minimum energy-per-bit is obtained when the signal-to-noise ratio goes to zero and that it is independent of the QoS constraints, the input distribution, and the covariance matrix.
4) In the large-scale antenna regime, we prove that the effective capacity approaches the average mutual information in the channel, i.e., the dependence of the effective capacity performance on the QoS constraints decreases with the increasing number of antennas.
5) Under the stability condition of the data queue, we analyze the non-asymptotic backlog and delay violation bounds by utilizing the effective capacity.

Footnote: We refer interested readers to [52]–[55] and references therein for practical massive MIMO settings.

Fig. 1: Channel model.

We can apply the analysis provided in the paper in different practical scenarios that necessitate low latency, low power consumption, or the ability to simultaneously support a massive number of users. Here, we refer to the vehicular-based communication scenarios defined by the well-known European project 'Mobile and wireless communications Enablers for the Twenty-twenty Information Society (METIS)' [15], [56]. For instance, we can perform our analysis in scenarios such as 'Best experience follows you' [15] and 'Traffic Jam' and 'Traffic Efficiency and Safety' [56].

The rest of the paper is organized as follows: We describe the MIMO system in Section II. Then, we discuss the instantaneous mutual information between the channel input and output, and then introduce the effective rate and capacity expressions in Section III. We provide the optimal input covariance matrix. We perform asymptotic analyses in the low signal-to-noise ratio regime in Section IV-A and in the large-scale antenna regime in Section IV-B. We investigate non-asymptotic backlog and delay bounds in Section V. We present the numerical results in Section VI and the conclusion in Section VII. We relegate the proofs to the Appendix.

II. CHANNEL MODEL

As shown in Figure 1, we consider a point-to-point MIMO transmission system in which one transmitter and one receiver are equipped with $M$ and $N$ antennas, respectively. The data generated by a source (or sources) initially arrives at the transmitter buffer with rate $a(t)$ bits/channel use for $t \in \{1, 2, \cdots\}$ and is stored in the buffer. Following the encoding and modulation processes, the transmitter sends the data to the receiver over the wireless channel packet by packet in frames (blocks) of $T$ channel uses. During the transmission of the data, the input-output relation in the flat-fading channel at time instant $t$ is expressed as follows:

$$\mathbf{y}_t = \sqrt{P}\,\mathbf{H}_t \mathbf{x}_t + \mathbf{w}_t, \tag{1}$$

where $\mathbf{x}_t$ and $\mathbf{y}_t$ are the $M$-dimensional input and $N$-dimensional output vectors, respectively, and $\mathbf{w}_t$ represents the $N$-dimensional additive noise vector with independent and identically distributed elements. Each element of $\mathbf{w}_t$ is circularly symmetric, complex Gaussian distributed with zero mean and variance $\sigma_w^2$. Hence, we have $E\{\mathbf{w}_t \mathbf{w}_t^{\dagger}\} = \sigma_w^2 \mathbf{I}_{N \times N}$, where $E\{\cdot\}$ denotes the expected value, $\{\cdot\}^{\dagger}$ is the conjugate transpose operator, and $\mathbf{I}_{N \times N}$ is the $N \times N$ identity matrix. Furthermore, $\mathbf{H}_t = \{h_{nm}(t)\}$ is the $N \times M$ random channel matrix, where $h_{nm}(t)$ is the channel fading coefficient with an arbitrary distribution between the $m$th transmit antenna and the $n$th receive antenna. Here, we consider a general channel model and assume that the distributions of $\{h_{nm}(t)\}$ can be statistically identical, non-identical, or semi-identical. We further assume that the channel matrix remains constant during one transmission frame ($T$ channel uses) and changes independently from one frame to another.

Footnote: Each channel use duration can be considered equal to the sampling duration of one symbol, i.e., bits/sec/Hz.
We also consider a short-term power constraint, i.e., $P$ indicates the power allocated for the transmission of the data in one channel use. Then, we have $\mathrm{tr}\{E\{\mathbf{x}_t \mathbf{x}_t^{\dagger}\}\} = \mathrm{tr}\{\mathbf{K}_t\} \leq 1$, where $\mathrm{tr}\{\cdot\}$ is the trace operator and $\mathbf{K}_t$ is a positive semi-definite Hermitian matrix.

We assume that the instantaneous channel realizations are available at both the transmitter and the receiver, and that the channel fading coefficients are correlated with each other. We invoke the Kronecker product model, which is widely used in modeling real channels because of its analytical tractability with a reasonable accuracy [58, Ch. 2], [59]. Hence, the channel matrix is expressed as

$$\mathbf{H}_t = \mathbf{R}_r^{1/2}\, \boldsymbol{\Gamma}_t\, \mathbf{R}_v^{1/2}, \tag{2}$$

where $\boldsymbol{\Gamma}_t$ is an $N \times M$ matrix with independent and identically distributed complex elements. $\mathbf{R}_v$ and $\mathbf{R}_r$ are the transmit and receive correlation matrices, respectively, which are usually modeled with an exponential correlation structure [60], [61]. The transmit and receive correlation matrices depend on the array spacing at the transmitter and the receiver, and the characteristic distances proportional to the spatial coherence distances at the transmitter and the receiver, respectively. Particularly, the elements of $\mathbf{R}_v$ and $\mathbf{R}_r$ are expressed as $\{\mathbf{R}_v\}_{kl} = e^{-\frac{d_v}{\Delta_v}|k-l|}$ for $k, l \in \{1, \cdots, M\}$ and $\{\mathbf{R}_r\}_{kl} = e^{-\frac{d_r}{\Delta_r}|k-l|}$ for $k, l \in \{1, \cdots, N\}$, respectively, where $d_v$ and $d_r$ are the corresponding antenna spacings, and $\Delta_v$ and $\Delta_r$ are the corresponding characteristic distances. Therefore, the correlation matrix at one end can be locally estimated without any feedback from the other end. On the other hand, $\boldsymbol{\Gamma}_t$ is estimated by the receiver, and then forwarded to the transmitter at the beginning of each transmission frame. We further know that in practical settings the channel estimation is obtained imperfectly. Therefore, we have

$$\mathbf{H}_t = \mathbf{R}_r^{1/2}\big(\widehat{\boldsymbol{\Gamma}}_t + \widetilde{\boldsymbol{\Gamma}}_t\big)\mathbf{R}_v^{1/2} = \widehat{\mathbf{H}}_t + \widetilde{\mathbf{H}}_t, \tag{3}$$

where $\widehat{\boldsymbol{\Gamma}}_t$ is the channel estimate and $\widetilde{\boldsymbol{\Gamma}}_t$ is the channel estimation error. Given that the receiver employs an MMSE estimator in order to obtain the channel knowledge, $\widehat{\boldsymbol{\Gamma}}_t$ and $\widetilde{\boldsymbol{\Gamma}}_t$ are uncorrelated with each other. Similar to [64], we further assume that $\widetilde{\boldsymbol{\Gamma}}_t$ is a zero-mean process with a variance known at both the transmitter and the receiver. Above, $\widehat{\mathbf{H}}_t = \mathbf{R}_r^{1/2}\widehat{\boldsymbol{\Gamma}}_t\mathbf{R}_v^{1/2}$ and $\widetilde{\mathbf{H}}_t = \mathbf{R}_r^{1/2}\widetilde{\boldsymbol{\Gamma}}_t\mathbf{R}_v^{1/2}$. Hence, the input-output relation in (1) becomes

$$\mathbf{y}_t = \sqrt{P}\,\widehat{\mathbf{H}}_t \mathbf{x}_t + \sqrt{P}\,\widetilde{\mathbf{H}}_t \mathbf{x}_t + \mathbf{w}_t = \sqrt{P}\,\widehat{\mathbf{H}}_t \mathbf{x}_t + \widetilde{\mathbf{w}}_t. \tag{4}$$

Footnote: In semi-identical channel models, the channel coefficients from different transmit antennas to a common receive antenna at a receiver with multiple antennas are identically distributed, but the coefficients related to different receive antennas are non-identically distributed. Such a model fits into an uplink scenario of a cellular system, e.g., from a handset to a base station, where the antennas on the handset are installed in a small panel and the antennas at the base station are mounted several wavelengths apart from each other [57].

III. EFFECTIVE CAPACITY

Due to the time-varying nature of wireless channels, it is difficult to sustain a stable transmission rate. In particular, reliable transmission may not be provided all the time. Therefore, depending on the type of data transmission, delay violation and buffer overflow concerns become critical at the transmitter. Accordingly, one of the main research questions is how to determine, given a statistical transmission (service) process, the maximum data arrival rate at the transmitter buffer so that the QoS requirements in the form of limits on delay violation and buffer overflow probabilities can be satisfied. In this regard, the effective capacity can be employed as a performance metric. Specifically, the effective capacity identifies the maximum constant data arrival rate at the transmitter buffer that the time-varying transmission process can support under desired QoS constraints [36].

In Fig. 1, $Q(t)$ is the number of bits in the data buffer at time instant $t$ and $q$ is the buffer threshold. Now, let $Q = Q(t \to \infty)$ be the steady-state queue length and $\theta$ be the decay rate of the tail distribution of the queue length, $Q$. Then, $\theta$ is defined as [34, Theorem 3.9]

$$\theta = -\lim_{q \to \infty} \frac{\log_e \Pr\{Q \geq q\}}{q}. \tag{5}$$

$\theta$ is also called the QoS exponent. Now, we can approximate the buffer overflow probability as $\Pr(Q \geq q_{\max}) \approx e^{-\theta q_{\max}}$ for a large threshold, $q_{\max}$. Here, larger $\theta$ implies stricter QoS constraints, whereas smaller $\theta$ corresponds to looser constraints. Subsequently, for a given discrete-time, ergodic and stationary stochastic service process, $r(t)$, the effective capacity as a function of the decay rate parameter, $\theta$, is given by [36, Eq. (11)]

$$C_E(\theta) = -\lim_{\tau \to \infty} \frac{1}{\theta \tau T} \log_e E\Big\{ e^{-\theta \sum_{t=1}^{\tau T} r(t)} \Big\},$$

where $r(t)$ is the service rate in the wireless channel at time instant $t$, $\sum_{t=1}^{\tau T} r(t)$ is the time-accumulated service process, i.e., the total number of bits served from the transmitter in $\tau T$ channel uses, and $\tau \in \{1, 2, \cdots\}$ is the time frame index. Recall that the encoding and modulation of data and its transmission are performed in frames of $T$ channel uses.

Given the channel estimate, $\widehat{\mathbf{H}}_t$, the service rate in one frame can be set to the mutual information between $\mathbf{x}_t$ and $\mathbf{y}_t$, i.e., $r(t) = I(\mathbf{x}_t; \mathbf{y}_t | \widehat{\mathbf{H}}_t)$. However, considering the input-output relation (4), it is difficult to evaluate the mutual information in closed form. Therefore, the service rate in the channel is set to a lower bound on the mutual information by considering the worst-case noise and modeling the estimation error as an additional Gaussian noise vector with zero-mean, independent and identically distributed samples [66], [67], i.e., $r(t) = I_L(\mathbf{x}_t; \mathbf{y}_t | \widehat{\mathbf{H}}_t) \leq I(\mathbf{x}_t; \mathbf{y}_t | \widehat{\mathbf{H}}_t)$ and $E\{\widetilde{\mathbf{w}}_t \widetilde{\mathbf{w}}_t^{\dagger}\} = \sigma_{\widetilde{w}}^2 \mathbf{I}_{N \times N}$, where $\sigma_{\widetilde{w}}^2 = \sigma_w^2 + \frac{P}{NM} \mathrm{tr}\big\{E\big\{\widetilde{\mathbf{H}}_t \mathbf{x}_t \mathbf{x}_t^{\dagger} \widetilde{\mathbf{H}}_t^{\dagger}\big\}\big\}$.

Footnote: Similar to the strategy in [62]–[64], we assume that the feedback channel is delay-free and error-free. Because we have a block-fading channel, the channel information is valid until the end of the transmission frame. Even if we consider a feedback delay, it will only reduce the time allocated for data transmission. In particular, when the channel feedback arrives after a certain portion of the time frame ($T$ channel uses), i.e., $\alpha T$ for $0 < \alpha < 1$, the remaining $(1 - \alpha)T$ will be the time duration for data transmission. Moreover, the reliable feedback can be sustained with strong channel codes.

Footnote: The constraint on the overflow probability can be linked to the constraint on the queuing delay probability. For instance, it has been shown that $\Pr\{D \geq d_{\max}\} \leq c\sqrt{\Pr\{Q \geq q_{\max}\}}$ for constant arrival rates, where $D$ is the steady-state delay experienced at the buffer, $c$ is a constant, and $q_{\max} = a d_{\max}$, where $a$ is the data arrival rate [65].
Since the service rate in the channel is smaller than or equal to the mutual information, reliable transmission is guaranteed. Hence, the service rate is expressed as

$$r(t) = I_L(\mathbf{x}_t; \mathbf{y}_t | \widehat{\mathbf{H}}_t) = E_{\mathbf{x}_t, \mathbf{y}_t}\left\{ \log_2 \frac{f_{\mathbf{y}_t|\mathbf{x}_t}(\mathbf{y}_t|\mathbf{x}_t)}{f_{\mathbf{y}_t}(\mathbf{y}_t)} \right\}, \tag{6}$$

where $f_{\mathbf{y}_t}(\mathbf{y}_t) = \sum_{\mathbf{x}_t} p(\mathbf{x}_t) f_{\mathbf{y}_t|\mathbf{x}_t}(\mathbf{y}_t|\mathbf{x}_t)$ is the probability density function of $\mathbf{y}_t$ and $f_{\mathbf{y}_t|\mathbf{x}_t}(\mathbf{y}_t|\mathbf{x}_t) = (\pi \sigma_{\widetilde{w}}^2)^{-N} e^{-\frac{1}{\sigma_{\widetilde{w}}^2}\|\mathbf{y}_t - \sqrt{P}\,\widehat{\mathbf{H}}_t \mathbf{x}_t\|^2}$ is the conditional probability density function of $\mathbf{y}_t$ given $\mathbf{x}_t$. For notational convenience in the paper, we use $I(\mathbf{x}_t; \mathbf{y}_t)$ to refer to the lower bound, $I_L(\mathbf{x}_t; \mathbf{y}_t | \widehat{\mathbf{H}}_t)$.

Because the channel matrix stays constant during one transmission frame and changes independently from one frame to another, and because the encoding and modulation of the data packets are performed in $T$ channel uses, we can express the normalized effective rate in bits/channel use/receive dimension as

$$R_E(\theta) = -\frac{1}{\theta N T} \log_e E_{\widehat{\mathbf{H}}_t}\Big\{ e^{-\theta T I(\mathbf{x}_t; \mathbf{y}_t)} \Big\}. \tag{7}$$

Above, while the receiver has the instantaneous channel estimate, the transmitter has no information regarding the channel matrix. If the transmitter is aware of the channel statistics but not the actual value of $\widehat{\mathbf{H}}_t$, then the transmitter sets the input covariance matrix to a fixed value, i.e., $\mathbf{K}_t = \mathbf{K}$, in order to maximize the effective rate in (7) by considering the QoS constraints and the channel statistics, i.e.,

$$R_E(\theta) = \max_{\substack{\mathbf{K} \succeq 0 \\ \mathrm{tr}\{\mathbf{K}\} \leq 1}} -\frac{1}{\theta N T} \log_e E_{\widehat{\mathbf{H}}_t}\Big\{ e^{-\theta T I(\mathbf{x}_t; \mathbf{y}_t)} \Big\} \tag{8}$$

in bits/channel use/receive dimension.

Footnote: As for the effective capacity when there exists a temporal correlation between the channel matrices, we refer to [35, Chap. 7, Example 7.2.7]. Because we focus on the performance levels in the low signal-to-noise ratio and large-scale antenna regimes, we consider a temporally uncorrelated channel model.
In (8), the covariance matrix, $\mathbf{K}$, depends on the statistics of $\widehat{\mathbf{H}}_t$ and the worst-case noise, and is independent of its actual realization. On the other hand, if the instantaneous knowledge of $\widehat{\mathbf{H}}_t$ is available at the transmitter and the receiver, the transmitter can adaptively set the input covariance matrix by considering both the QoS constraints and the instantaneous realization of the channel matrix. Hence, the maximum effective rate, which we call the effective capacity, in bits/channel use/receive dimension is given as follows:

$$C_E(\theta) = \max_{\substack{\mathbf{K}_t \succeq 0 \\ \mathrm{tr}\{\mathbf{K}_t\} \leq 1}} -\frac{1}{\theta N T} \log_e E_{\widehat{\mathbf{H}}_t}\Big\{ e^{-\theta T I(\mathbf{x}_t; \mathbf{y}_t)} \Big\}. \tag{9}$$

Above, $\mathbf{K}_t$ is time-varying unlike $\mathbf{K}$ in (8), because it is a function of $\widehat{\mathbf{H}}_t$.

Here, a key research problem is the optimal selection of the power allocation policy (or input covariance matrix) given the channel matrix and the QoS requirements. In particular, the central question is the following: What is the instantaneous input covariance matrix, $\mathbf{K}_t$, that solves (9) given that the channel matrix, $\widehat{\mathbf{H}}_t$, is available at the transmitter and the receiver, and that there are certain QoS constraints? In the following theorem, we identify the optimal policy that the transmitter should employ to obtain (9).

Theorem 1: The input covariance matrix, $\mathbf{K}_t \succeq 0$, that maximizes the effective capacity given in (9) is the solution of the following equality:

$$\mathbf{K}_t = \frac{\theta T \gamma\, e^{-\theta T I(\mathbf{x}_t; \mathbf{y}_t)}}{\lambda}\, \widehat{\mathbf{H}}_t^{\dagger} \widehat{\mathbf{H}}_t\, \mathbf{mmse}_t, \tag{10}$$

where $\gamma = \frac{P}{\sigma_{\widetilde{w}}^2}$ is the average signal-to-noise ratio at the receiver, $\lambda$ is the Lagrange multiplier of the constraint $\mathrm{tr}\{\mathbf{K}_t\} \leq 1$, and $\mathbf{mmse}_t = E\big\{ (E\{\mathbf{x}_t|\mathbf{y}_t\} - \mathbf{x}_t)(E\{\mathbf{x}_t|\mathbf{y}_t\} - \mathbf{x}_t)^{\dagger} \big\}$ is the MMSE matrix.

Proof: See Appendix A. □
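To make the effective rate in (7) more tangible, it can be estimated by Monte Carlo averaging over channel realizations. The sketch below is only an illustration under assumptions that are not part of the analysis above: Gaussian inputs (so that the mutual information has the closed form $\log_2\det(\mathbf{I} + \gamma \mathbf{H}\mathbf{K}\mathbf{H}^{\dagger})$), perfect channel knowledge, uncorrelated Rayleigh fading, and the fixed covariance $\mathbf{K} = \mathbf{I}/M$. The function name `effective_rate` and all parameter values are hypothetical.

```python
import numpy as np


def effective_rate(theta, snr, M, N, T=1, num_samples=20000, seed=0):
    """Monte Carlo estimate of the effective rate in (7) and of the average rate.

    Assumptions (illustrative only): Gaussian inputs, perfect channel
    knowledge, i.i.d. CN(0, 1) fading, and K = I/M so that tr{K} = 1.
    Both returned values are in bits/channel use/receive dimension.
    """
    rng = np.random.default_rng(seed)
    # Stack of channel matrices with shape (num_samples, N, M).
    H = (rng.standard_normal((num_samples, N, M))
         + 1j * rng.standard_normal((num_samples, N, M))) / np.sqrt(2)
    K = np.eye(M) / M
    H_herm = np.conj(np.swapaxes(H, 1, 2))
    G = np.eye(N) + snr * (H @ K @ H_herm)
    # G is Hermitian positive definite, so log|det G| equals log det G.
    _, logdet = np.linalg.slogdet(G)
    rate_bits = logdet / np.log(2)
    # Numerically stable evaluation of log E{exp(-theta * T * I)}.
    x = -theta * T * rate_bits
    log_mgf = x.max() + np.log(np.mean(np.exp(x - x.max())))
    r_eff = -log_mgf / (theta * N * T)
    r_avg = rate_bits.mean() / N
    return r_eff, r_avg


if __name__ == "__main__":
    for theta in (0.01, 0.1, 1.0):
        r_eff, r_avg = effective_rate(theta, snr=1.0, M=2, N=2)
        print(f"theta={theta}: effective rate {r_eff:.3f}, "
              f"average rate {r_avg:.3f}")
```

By Jensen's inequality the estimate never exceeds the average rate, and it decreases as $\theta$ grows, which mirrors the statement that a larger $\theta$ corresponds to stricter QoS constraints.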

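In (10), both the mutual information and $\mathbf{mmse}_t$ are functions of the input covariance matrix, $\mathbf{K}_t$, and (10) is non-concave over the space spanned by $\mathbf{K}_t$ [42], [44], [45]. Therefore, the solution obtained from (10) is not necessarily unique. On the other hand, we follow a different strategy and start with the singular value decomposition of the channel matrix, i.e.,

$$\widehat{\mathbf{H}}_t = \mathbf{U}_t \mathbf{D}_t \mathbf{V}_t^{\dagger},$$

where $\mathbf{U}_t$ and $\mathbf{V}_t$ are $N \times N$ and $M \times M$ unitary matrices, respectively, and $\mathbf{D}_t$ is an $N \times M$ matrix with non-negative real numbers on the diagonal, which are the square roots of the non-zero eigenvalues of $\widehat{\mathbf{H}}_t \widehat{\mathbf{H}}_t^{\dagger}$ and $\widehat{\mathbf{H}}_t^{\dagger} \widehat{\mathbf{H}}_t$. Then, we re-express the input-output model in (4) as follows:

$$\widetilde{\mathbf{y}}_t = \sqrt{P}\, \mathbf{D}_t \widetilde{\mathbf{x}}_t + \widetilde{\mathbf{n}}_t, \tag{11}$$

where $\widetilde{\mathbf{y}}_t = \mathbf{U}_t^{\dagger} \mathbf{y}_t$ and $\widetilde{\mathbf{x}}_t = \mathbf{V}_t^{\dagger} \mathbf{x}_t$. The new noise vector is denoted by $\widetilde{\mathbf{n}}_t = \mathbf{U}_t^{\dagger} \widetilde{\mathbf{w}}_t$, which is a zero-mean, Gaussian, complex vector with independent and identically distributed elements [2]. We further know that $I(\mathbf{x}_t; \mathbf{y}_t) = I(\widetilde{\mathbf{x}}_t; \widetilde{\mathbf{y}}_t)$, because the information regarding $\widehat{\mathbf{H}}_t$ is available at both the transmitter and the receiver. Now, let $\widetilde{\mathbf{K}}_t$ be the covariance matrix of $\widetilde{\mathbf{x}}_t$, i.e.,

$$\widetilde{\mathbf{K}}_t = E\{\widetilde{\mathbf{x}}_t \widetilde{\mathbf{x}}_t^{\dagger}\} = E\{\mathbf{V}_t^{\dagger} \mathbf{x}_t \mathbf{x}_t^{\dagger} \mathbf{V}_t\} = \mathbf{V}_t^{\dagger} \mathbf{K}_t \mathbf{V}_t.$$

In particular, if we can find the optimal $\widetilde{\mathbf{K}}_t$, we can also determine the optimal input covariance matrix, $\mathbf{K}_t$. Therefore, we provide the optimal input covariance matrix in the following theorem and show that this is the global solution in its proof.

Footnote: In case there is a delay in the feedback channel, and the delay is smaller than the block duration ($T$ channel uses), the effective capacity can be reformulated as $C_E(\theta) = -\frac{1}{\theta N T} \log_e E_{\widehat{\mathbf{H}}_t}\big\{ e^{-\theta T (1-\alpha) I(\mathbf{x}_t; \mathbf{y}_t)} \big\}$, where $\alpha T$ is the delay and $0 < \alpha < 1$.

Theorem 2: The input covariance matrix, $\mathbf{K}_t \succeq 0$, that provides (9) is

$$\mathbf{K}_t = \mathbf{V}_t \boldsymbol{\Sigma}_t \mathbf{V}_t^{\dagger}, \tag{12}$$

where $\mathbf{V}_t$ is the $M \times M$ unitary matrix whose columns are the right-singular vectors of $\widehat{\mathbf{H}}_t$. $\widetilde{\mathbf{K}}_t = \boldsymbol{\Sigma}_t = \mathrm{diag}\{\sigma_t(1), \cdots, \sigma_t(M)\}$ is an $M \times M$ diagonal matrix that satisfies

$$\sigma_t(i) = \frac{\theta T \gamma\, d_t(i)}{\lambda}\, e^{-\theta T I(\widetilde{\mathbf{x}}_t; \widetilde{\mathbf{y}}_t)}\, \mathrm{mmse}_t(i), \quad \text{if } \sigma_t(i) \geq 0,$$
$$\sigma_t(i) = 0, \quad \text{otherwise},$$
$$\sigma_t(i) = 0, \quad \text{for } \min\{M, N\} < i \leq M,$$

given that $\lambda$ is the Lagrange multiplier associated with the constraint $\sum_{i=1}^{M} \sigma_t(i) \leq 1$, and $\mathrm{mmse}_t(i) = E\big\{ |E\{\widetilde{x}_t(i)|\widetilde{y}_t(i)\} - \widetilde{x}_t(i)|^2 \big\}$ is the MMSE function. Furthermore, $d_t(i)$ is the $i$th eigenvalue of $\widehat{\mathbf{H}}_t \widehat{\mathbf{H}}_t^{\dagger}$ and $\widehat{\mathbf{H}}_t^{\dagger} \widehat{\mathbf{H}}_t$.

Proof: See Appendix B. □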
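The diagonalization step that leads to (11) and (12) can be checked numerically. The following sketch is a minimal illustration, not the power allocation of Theorem 2: it takes an arbitrary (hypothetical) power vector $\sigma_t(1), \cdots, \sigma_t(M)$ with unit sum, builds $\mathbf{K}_t = \mathbf{V}_t \boldsymbol{\Sigma}_t \mathbf{V}_t^{\dagger}$, and confirms that $\mathbf{V}_t^{\dagger} \mathbf{K}_t \mathbf{V}_t$ is diagonal and that the trace constraint holds. The helper names `diagonalize_channel` and `covariance_from_powers` are assumptions introduced only for this example.

```python
import numpy as np


def diagonalize_channel(H):
    """Return U (N x N), D (N x M), V (M x M) such that H = U D V^H."""
    N, M = H.shape
    U, s, Vh = np.linalg.svd(H, full_matrices=True)
    k = min(N, M)
    D = np.zeros((N, M))
    D[:k, :k] = np.diag(s)  # singular values on the main diagonal
    return U, D, Vh.conj().T


def covariance_from_powers(V, sigma):
    """Build K = V diag(sigma) V^H as in (12) for given per-mode powers."""
    return V @ np.diag(sigma) @ V.conj().T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, M = 3, 4
    H = (rng.standard_normal((N, M))
         + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    U, D, V = diagonalize_channel(H)

    # Hypothetical powers; the last mode is unused because min(M, N) < M.
    sigma = np.array([0.5, 0.3, 0.2, 0.0])
    K = covariance_from_powers(V, sigma)

    print("H reconstructed:", np.allclose(U @ D @ V.conj().T, H))
    print("V^H K V diagonal:", np.allclose(V.conj().T @ K @ V, np.diag(sigma)))
    print("tr{K} =", np.trace(K).real)
```

Because $\mathbf{U}_t$ and $\mathbf{V}_t$ are unitary, the transformation changes neither the noise statistics nor the trace of the covariance matrix, which is why the search over $\mathbf{K}_t$ can be reduced to a search over the diagonal entries of $\boldsymbol{\Sigma}_t$.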

Remark 1: The input covariance matrix, $\mathbf{K}_t$, is set according to the channel estimate. However, the constraint $\mathrm{tr}\{\mathbf{K}_t\} \leq 1$ (or $\sum_{i=1}^{M} \sigma_t(i) \leq 1$ in Theorem 2) is independent of the channel estimate. Therefore, the worst-case noise variance, $\sigma_{\widetilde{w}}^2$, and hence the signal-to-noise ratio, $\gamma$, do not depend on the actual channel estimate.

IV. EFFECTIVE CAPACITY IN ASYMPTOTIC REGIMES

Having obtained the effective capacity and rate expressions, and having characterized the optimal input covariance matrices that maximize the effective capacity performance, we note that the complexity of the analytical formulations makes it difficult to gain insight into the system performance in general scenarios. On the other hand, asymptotic approaches can help us set the design criteria in certain asymptotic regimes. Therefore, we investigate the effective capacity of MIMO systems in the low signal-to-noise ratio and large-scale antenna regimes. We also note that we drop the time index in the sequel unless it becomes necessary.

A. Effective Capacity in Low Signal-to-Noise Ratio Regime

In this section, we explore the effective capacity performance of the aforementioned MIMO system with an arbitrary input distribution in the low signal-to-noise ratio regime. In this direction, we determine the minimum energy-per-bit and the slope of the effective capacity versus the energy-per-bit at the minimum energy-per-bit, which are denoted by $\zeta_{\min}$ and $S_0$, respectively. The benefit of the low signal-to-noise ratio analysis is that many battery-driven applications require operations at low energy costs and energy efficiency generally increases with decreasing transmission power when the transmission throughput is a concave function of the transmission power. For this purpose, we start the low signal-to-noise ratio analysis with the following second-order expansion of the effective capacity with respect to the transmission power, $P$, at $P = 0$:

$$C_E(\theta, P) = \dot{C}_E(\theta, 0) P + \frac{1}{2} \ddot{C}_E(\theta, 0) P^2 + o(P^2), \tag{13}$$

where $\dot{C}_E(\theta, 0)$ and $\ddot{C}_E(\theta, 0)$ are, respectively, the first and second derivatives of the effective capacity with respect to $P$ at $P = 0$. Note that we express the effective capacity as a function of $\theta$ and $P$.

Now, let $\zeta = \frac{P}{C_E(\theta, P)}$ denote the energy-per-bit required for given $\theta$ and $P$. Following [68, Proposition 1], we can show that the effective capacity is concave in the space spanned by $P$. Thus, we can obtain the minimum energy-per-bit when the transmission power goes to zero, i.e., $P \to 0$, as follows:

$$\zeta_{\min} = \lim_{P \to 0} \frac{P}{C_E(\theta, P)} = \frac{1}{\dot{C}_E(\theta, 0)}. \tag{14}$$

Moreover, considering the result in [16, Eq. (29)], we can show the slope of the effective capacity versus $\zeta$ (in dB) curve at $\zeta_{\min}$ as

$$S_0 = \lim_{\zeta \downarrow \zeta_{\min}} \frac{C_E(\zeta)}{10 \log_{10} \zeta - 10 \log_{10} \zeta_{\min}}\, 10 \log_{10} 2, \tag{15}$$

where $C_E(\zeta)$ is the effective capacity as a function of the energy-per-bit, $\zeta$, and $\zeta_{\min}$ is the minimum energy-per-bit, obtained when the transmission power goes to zero, i.e., $P \to 0$. Above, $\zeta \downarrow \zeta_{\min}$ indicates the limit when the value of $\zeta$ is reduced and approaches $\zeta_{\min}$. Using the first and second derivatives [16, Th. 9], we can express the slope in bits/channel use/(3 dB)/receive antenna as

$$S_0 = \frac{2 \big[\dot{C}_E(\theta, 0)\big]^2}{-\ddot{C}_E(\theta, 0)} \log_e 2. \tag{16}$$

Accordingly, having $\zeta_{\min}$ and $S_0$, we can form a linear approximation of $C_E(\zeta)$ in the low signal-to-noise ratio regime.

In order to better understand the effective capacity performance in the low signal-to-noise ratio regime, we provide the following theorem.

Theorem 3:

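The first derivative of the effective capacity in (9) with respect to $P$ at $P = 0$ is given as

$$\dot{C}_E(\theta, 0) = \frac{1}{N \log_e 2}\, \mathbb{E}_{\hat{\mathbf{H}}}\left\{ \lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}) \right\}, \tag{17}$$

and the second derivative of the effective capacity with respect to $P$ at $P = 0$ is given as

$$\ddot{C}_E(\theta, 0) = \frac{\theta T}{N \log_e^2 2} \left[ \left( \mathbb{E}_{\hat{\mathbf{H}}}\left\{ \lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}) \right\} \right)^2 - \mathbb{E}_{\hat{\mathbf{H}}}\left\{ \lambda_{\max}^2(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}) \right\} \right] - \frac{\mathbb{E}_{\hat{\mathbf{H}}}\left\{ \lambda_{\max}^2(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}) \right\}}{N l \log_e 2} - \frac{2 \sigma_e^2}{N \log_e 2}\, \mathbb{E}_{\hat{\mathbf{H}}}\left\{ \lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}) \right\}, \tag{18}$$

where $\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})$ in (17) and (18) is the maximum eigenvalue of $\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}$ and $l$ in (18) is the multiplicity of $\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})$. Above, $\sigma_e^2 = \frac{P}{NM} \mathrm{tr}\left\{ \mathbb{E}\left\{ \tilde{\mathbf{H}}_t \mathbf{x}_t \mathbf{x}_t^{\dagger} \tilde{\mathbf{H}}_t^{\dagger} \right\} \right\}$.

Proof: See Appendix C. $\square$

Footnote: It is known that the minimum energy-per-bit is obtained as the signal-to-noise ratio goes to zero [16]. In our model, the signal-to-noise ratio, $\gamma = \frac{P}{\sigma_{\tilde{w}}^2}$, goes to zero with the transmission power going to zero.

Footnote: We utilize the Taylor series representation of the effective capacity with respect to $P$ at $P = 0$.

Footnote: It is sufficient to prove the concavity of the lower bound on the mutual information over the space spanned by the transmission power $P$, because the signal-to-noise ratio is an increasing function of the transmission power. The concavity of the same lower bound on the mutual information is also shown in [62, Eq. 16] when the channel input is Gaussian distributed.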
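The quantities in (14), (16), (17) and (18) can be evaluated numerically once the first two moments of $\lambda_{\max}(\hat{\mathbf{H}}^{\dagger}\hat{\mathbf{H}})$ are available. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, (18) is read as the sum of a QoS term, a curvature term scaled by $1/l$, and an estimation-error term, and the illustrative case assumes $M = 1$, unit-variance Rayleigh fading and perfect channel knowledge, so that $\lambda_{\max} = \|\mathbf{h}\|^2$ is Gamma distributed with $\mathbb{E}\{\lambda_{\max}\} = N$ and $\mathbb{E}\{\lambda_{\max}^2\} = N(N+1)$.

```python
import math

LN2 = math.log(2.0)

def first_derivative(mean_lam, n_rx):
    """(17): derivative of C_E at P = 0, given E{lambda_max} and N."""
    return mean_lam / (n_rx * LN2)

def second_derivative(mean_lam, mean_lam_sq, n_rx, mult, theta, block_len, err_var):
    """(18) as read here: QoS term, curvature term and estimation-error term."""
    qos_term = theta * block_len * (mean_lam ** 2 - mean_lam_sq) / (n_rx * LN2 ** 2)
    curvature_term = mean_lam_sq / (n_rx * mult * LN2)
    error_term = 2.0 * err_var * mean_lam / (n_rx * LN2)
    return qos_term - curvature_term - error_term

def min_energy_per_bit(c_dot):
    """(14): zeta_min = 1 / first derivative, in linear scale."""
    return 1.0 / c_dot

def slope(c_dot, c_ddot):
    """(16): S in bits/channel use/(3 dB)/receive antenna."""
    return 2.0 * c_dot ** 2 / (-c_ddot) * LN2

# Illustrative (assumed) case: M = 1, N = 4, theta = 1, T = 1, perfect CSI.
N, THETA, T = 4, 1.0, 1.0
c_dot = first_derivative(N, N)
c_ddot = second_derivative(N, N * (N + 1), N, mult=1,
                           theta=THETA, block_len=T, err_var=0.0)
zeta_min_db = 10.0 * math.log10(min_energy_per_bit(c_dot))
S = slope(c_dot, c_ddot)
print(f"zeta_min = {zeta_min_db:.2f} dB, S = {S:.4f}")
```

In this assumed case $\zeta_{\min}$ equals $\log_e 2$, i.e., about $-1.59$ dB, regardless of $\theta$, while the slope reduces to $2\log_e 2 / (\theta T + (N+1)\log_e 2)$ and therefore shrinks as the QoS exponent grows.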

Remark 2: The first and second derivatives of the effective capacity, $\dot{C}_E(\theta, 0)$ and $\ddot{C}_E(\theta, 0)$, respectively, are independent of the input distribution. Particularly, the minimum energy-per-bit, $\zeta_{\min}$, and the slope of the effective capacity versus $\zeta$ (in dB) curve at $\zeta_{\min}$, $S$, are not functions of $\mathbf{x}$ and/or its probability density function. Additionally, our results also confirm the findings in [37], where the effective capacity of MIMO systems is investigated when the input is Gaussian distributed and the channel is perfectly known at both the transmitter and receiver.

Remark 3: As also detailed in the proof in Appendix C, the minimum energy-per-bit is achieved by allocating data power in the direction of the eigenspace of the maximum eigenvalue of $\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}$.

Remark 4: The minimum energy-per-bit, $\zeta_{\min}$, does not change with increasing or decreasing QoS constraints or the channel estimation error, while the slope of the effective capacity at $\zeta_{\min}$, $S$, is a function of both the exponential decay rate parameter, $\theta$, and the estimation error variance, $\sigma_e^2$. With increasing $\sigma_e^2$, the slope decreases.

Remark 5: The aforementioned minimum energy-per-bit and slope are acquired given the fact that the input vector, $\mathbf{x}$, is complex. On the other hand, when the modulation is performed over the real axis of the constellation only, e.g., binary phase-shift keying (BPSK) and $M$-pulse-amplitude modulation, the minimum energy-per-bit stays the same because the first derivative does not change, but the slope becomes half of the slope achieved with a complex modulation because the second derivative is double the second derivative in the case of a complex modulation [49].

B. Effective Capacity in Large-Scale Antenna Regime

With the increasing number of antennas the transmitters and the receivers are equipped with, there are more communication pathways and increased transmission link reliability. One more advantage of employing many antennas is the energy efficiency, due to the fundamental principle that with a large number of antennas, energy can be focused with extreme sharpness onto small regions in space [24]. Therefore, in this section, we turn our attention to analyzing the system performance in the large-scale antenna regime. Principally, we obtain the effective capacity while the number of transmit and/or receive antennas goes to infinity.

In particular, we are interested in the effective capacity given in (9) when both $M$ and $N$ approach, or either $M$ or $N$ approaches, infinity, i.e.,

$$\lim_{M \text{ and/or } N \to \infty} C_E(\theta, P) = C_E^{\infty}(\theta, P). \tag{19}$$

The following theorem provides a significant property of $C_E^{\infty}(\theta, P)$, which follows from the increase in the number of antennas at the transmitter and/or the receiver.

Theorem 4: For the MIMO system described in (4), the effective capacity, $C_E^{\infty}(\theta, P)$ defined in (19), is independent of the QoS exponent, $\theta$, and approaches the average transmission rate, i.e.,

$$C_E^{\infty}(\theta, P) = \lim_{M \text{ and/or } N \to \infty} \frac{1}{N}\, \mathbb{E}_{\hat{\mathbf{H}}}\{ r \}, \tag{20}$$

where $r$ is the service rate defined in (6).

Proof: See Appendix D. $\square$
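The trend stated in Theorem 4 can be illustrated with a small Monte Carlo experiment. The following is a minimal sketch under stated assumptions and not the evaluation used for the figures in this paper: it assumes a single transmit antenna, perfect channel knowledge, unit-variance Rayleigh fading and Gaussian signaling, so that the service rate is $r = \log_2(1 + \gamma \|\mathbf{h}\|^2)$ with $\|\mathbf{h}\|^2$ Gamma distributed, and it takes $N \times C_E$ to be $-\frac{1}{\theta T}\log_e \mathbb{E}\{e^{-\theta T r}\}$. The function name `link_utilization` and its default arguments are hypothetical.

```python
import math
import random

def link_utilization(n_rx, snr, theta, block_len=1.0, samples=20000, seed=0):
    """Ratio of N x C_E to the average rate E{r} for an assumed SIMO Rayleigh link."""
    rng = random.Random(seed)
    rates = []
    for _ in range(samples):
        # ||h||^2 for N i.i.d. CN(0, 1) coefficients is Gamma(N, 1) distributed.
        gain = rng.gammavariate(n_rx, 1.0)
        rates.append(math.log2(1.0 + snr * gain))
    mean_rate = sum(rates) / samples
    # Sample estimate of E{exp(-theta * T * r)}.
    mgf = sum(math.exp(-theta * block_len * r) for r in rates) / samples
    effective_rate = -math.log(mgf) / (theta * block_len)
    return effective_rate / mean_rate

for n_rx in (2, 16, 64):
    print(n_rx, round(link_utilization(n_rx, snr=1.0, theta=1.0), 4))
```

By Jensen's inequality the ratio cannot exceed one, and under these assumptions it moves toward one as the number of receive antennas grows, because the rate fluctuations shrink relative to the mean rate.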

Remark 6: Note that $N \times C_E(\theta, \gamma)$ indicates the throughput level the wireless channel can support under given QoS and transmission power constraints, and that $\mathbb{E}\{r\}$ is the average service rate in the wireless channel in one channel use. Since $N \times C_E(\theta, \gamma) \le \mathbb{E}\{r\}$ for any $\theta$, the transmitter cannot accept data to its buffer at a rate more than the effective capacity, $N \times C_E(\theta, \gamma)$, due to the delay violation and buffer overflow constraints even though the average service rate in the channel is higher. Therefore, the link utilization, which is defined to be the ratio of the data flow rate to a link to the link capacity [69, Ch. 5] and [70, Ch. 16], decreases with increasing QoS constraints. We can consider the effective capacity as the maximum data flow rate and the channel throughput as the link capacity. Herein, Theorem 4 states that the maximum link utilization can be achieved under QoS constraints by increasing the number of antennas.

Remark 7: As made clear in the proof of Theorem 4, the knowledge of the channel realizations is not necessary at the transmitter side to achieve the transmission rate given in (20) when the number of transmit and/or receive antennas becomes larger. Indeed, the statistical information regarding the channel matrix, $\mathbf{H}$, is sufficient.

Example 1:

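Let us assume that the channel is perfectly known and that the channel coefficients $\{h_{nm}(t)\}$ are zero-mean, independent and identically distributed with finite variance $\sigma_h^2$, i.e., $\mathbb{E}\{|h_{nm}|^2\} = \sigma_h^2$. When the number of antennas is going to infinity, the minimum energy-per-bit defined in (14), $\zeta_{\min}$, and the slope of the effective capacity versus $\zeta$ curve at $\zeta_{\min}$ defined in (15), $S$, are

$$\zeta_{\min}^{\infty} = \lim_{M \text{ and/or } N \to \infty} \zeta_{\min} = \lim_{M \text{ and/or } N \to \infty} \frac{1}{\dot{C}_E(\theta, 0)} \tag{21}$$

$$= \lim_{M \text{ and/or } N \to \infty} \frac{N \log_e 2}{\mathbb{E}_{\mathbf{H}}\left\{ \lambda_{\max}(\mathbf{H}^{\dagger} \mathbf{H}) \right\}} \tag{22}$$

$$= \lim_{M \text{ and/or } N \to \infty} \frac{\min\{M, N\}\, N \log_e 2}{M N \sigma_h^2} \tag{23}$$

$$= \begin{cases} 0, & \text{if } M \to \infty, \\ \dfrac{\log_e 2}{\rho \sigma_h^2}, & \text{if } M, N \to \infty,\ \dfrac{M}{N} = \rho > 1, \\ \dfrac{\log_e 2}{\sigma_h^2}, & \text{if } \dfrac{M}{N} \le 1, \end{cases} \tag{24}$$

and

$$S^{\infty} = \lim_{M \text{ and/or } N \to \infty} S = \lim_{M \text{ and/or } N \to \infty} \frac{2 \left[ \dot{C}_E(\theta, 0) \right]^2}{-\ddot{C}_E(\theta, 0)} \log_e 2 \tag{25}$$

$$= \lim_{M \text{ and/or } N \to \infty} \frac{2 \log_e 2 \left[ \dfrac{\mathbb{E}_{\mathbf{H}}\left\{ \lambda_{\max}(\mathbf{H}^{\dagger} \mathbf{H}) \right\}}{N \log_e 2} \right]^2}{\dfrac{\mathbb{E}_{\mathbf{H}}\left\{ \lambda_{\max}^2(\mathbf{H}^{\dagger} \mathbf{H}) \right\}}{N l \log_e 2}} \tag{26}$$

$$= \lim_{M \text{ and/or } N \to \infty} \frac{2 \min\{M, N\}}{N} \tag{27}$$

$$= \begin{cases} 0, & \text{if } N \to \infty, \\ 2\rho, & \text{if } M, N \to \infty,\ \dfrac{M}{N} = \rho \le 1, \\ 2, & \text{if } \dfrac{M}{N} > 1, \end{cases} \tag{28}$$

respectively.

V. NON-ASYMPTOTIC PERFORMANCE ANALYSIS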

So far, we have investigated the throughput and energy efficiency of the aforementioned MIMO systems in two different asymptotic regimes by employing the effective capacity, which is also an asymptotic measure in time. Nevertheless, non-asymptotic performance bounds regarding the statistical characterizations of buffer overflow and queueing delay are of importance for practical research agendas. Therefore, we benefit from the tools of the stochastic network calculus [71]–[73], and provide a statistical bound on the buffer overflow and queueing delay probabilities by utilizing the effective capacity.

Recall that the transmission of a packet is performed over a block duration of $T$ channel uses and the transmission rate in the channel during one transmission block is constant. Now, let us define $s(i)$ as the total number of bits transmitted (served) in the $i$th transmission block. Subsequently, considering the normalized effective rate for the input covariance matrix given in (7), $R_E(\theta)$, and following the setting in [73, Definition 7.2.1], we define a statistical affine bound for the aforementioned channel model for any decay rate value, $\theta$, as follows:

$$\mathbb{E}\left\{ e^{-\theta S(i,j)} \right\} \le e^{-\theta \left[ (j-i) N T R_E(\theta) - \sigma_R(\theta) \right]}, \tag{29}$$

where $S(i,j) = \sum_{l=i+1}^{j} s(l)$, and $\sigma_R(\theta)$ is a slack term that defines an initial transmission delay. Due to $-\theta$, the expression in (29) is in fact a lower bound on the expected amount of the transmitted data in the channel. Subsequently, noting Chernoff's lower bound $\Pr\{X \le x\} \le e^{\theta x}\, \mathbb{E}\{e^{-\theta X}\}$ for $\theta \ge 0$, we have the exponentially bounded fluctuation model described in [74] with parameters $R_E(\theta) > 0$ and $b \ge 0$ as

$$\Pr\left\{ S(i,j) < (j-i) N T R_E(\theta) - b \right\} \le \varepsilon(b),$$

where $\varepsilon(b) = e^{\theta \sigma_R(\theta)} e^{-\theta b}$ is a specific exponentially decaying deficit profile of the amount of the transmitted data in the channel. Now, using the union bound, we express the sample path guarantee as follows:

$$\Pr\left\{ \exists\, i \in [0, j] : S(i,j) < (j-i) N T R_E^{\star}(\theta) - b \right\} \le \varepsilon'(b),$$

where

$$\varepsilon'(b) = \frac{e^{\theta \sigma_R(\theta)}}{1 - e^{-\theta \delta}}\, e^{-\theta b} \tag{30}$$

and $N T R_E^{\star}(\theta) = N T R_E(\theta) - \delta$ with a free parameter $0 < \delta \le N T R_E(\theta) - T a$ for a constant data arrival rate at the transmitter buffer, i.e., $a$ bits/channel use. For a more detailed derivation, we refer to [73]. We also refer to [75], where capacity-delay-error boundaries are provisioned as performance models for networked sources and systems. Exclusively, the backlog at the transmitter buffer with the constant data arrival rate $a$, i.e., $Q(j) = \max_{i \in [0,j]} \{ (j-i) T a - S(i,j) \}$, has a statistical bound

$$q = \max_{i \in [0,j]} \left\{ (j-i) T a - \left[ (j-i) N T R_E^{\star}(\theta) - b \right]_{+} \right\}$$

and may fail with probability $\Pr\{Q(j) > q\} \le \varepsilon'(b)$, where $[x]_{+} = 0$ if $x < 0$ and $[x]_{+} = x$ otherwise, which accounts for $S(i,j) \ge 0$. In this place, if $a \le N R_E^{\star}(\theta)$ for stability,

$$q = \frac{T a\, b}{N T R_E(\theta) - \delta} \tag{31}$$

is valid for all $j$. Accordingly, we can express the delay bound $\Pr\{D(j) > d\}$ with $d = \frac{q}{a}$, which is expressed in channel uses. In other words, $\frac{T b}{N T R_E(\theta) - \delta}$ in (31) provides us the initial latency caused by the variations in the transmission. Finally, we can express $b$ by inversion of (30) for any given $\varepsilon'$ as

$$b = \sigma_R(\theta) - \frac{1}{\theta} \left[ \log_e(\varepsilon') + \log_e\left( 1 - e^{-\theta \delta} \right) \right]. \tag{32}$$

As for the existence of the slack term in (32), we refer to the following Lemma.

Lemma 1: If $S(i,j)$ has an envelope rate $N T R_E(\theta) < \infty$ for every $\epsilon > 0$, there exists $\sigma_R(\theta) < \infty$ such that $S(i,j)$ is $(\sigma_R(\theta), N T R_E(\theta) - \epsilon)$-upper constrained [76, Lemma 1].

[Fig. 2: Effective capacity as a function of the signal-to-noise ratio, $\gamma$, when $M = 2$ and $\theta = 1$ with different number of receive antennas, i.e., $N \in \{2, 4, 16, 50\}$. Panels: (a) BPSK, (b) 4-QAM. Axes: signal-to-noise ratio (dB) versus effective capacity (bits/channel use). Curves: optimal power allocation and equal power allocation.]

VI. NUMERICAL RESULTS
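As a numerical companion to the bounds of Section V, the chain from a target violation probability $\varepsilon'$ to the deficit $b$ in (32), the backlog bound $q$ in (31), and the delay bound $d = q/a$ can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `delay_bound` and all argument names are hypothetical, the effective rate $R_E(\theta)$ and the slack term $\sigma_R(\theta)$ are treated as given inputs rather than computed from the channel model, and the numbers in the usage lines are illustrative only.

```python
import math

def delay_bound(arrival_rate, eff_rate, n_rx, block_len, theta, sigma_r, delta, eps):
    """Return (b, q, d) following (32), (31) and d = q / a.

    arrival_rate: a, in bits/channel use
    eff_rate:     R_E(theta), normalized effective rate (assumed given)
    delta:        free parameter, 0 < delta <= N*T*R_E(theta) - T*a
    eps:          target violation probability, epsilon'
    """
    service = n_rx * block_len * eff_rate          # N T R_E(theta)
    if not 0.0 < delta <= service - block_len * arrival_rate:
        raise ValueError("delta must satisfy 0 < delta <= N*T*R_E - T*a")
    # (32): inversion of (30) for the deficit b.
    b = sigma_r - (math.log(eps) + math.log(1.0 - math.exp(-theta * delta))) / theta
    # (31): backlog bound, valid for all j under the stability condition.
    q = block_len * arrival_rate * b / (service - delta)
    # Delay bound in channel uses.
    d = q / arrival_rate
    return b, q, d

# Illustrative values only (not taken from the figures).
b, q, d = delay_bound(arrival_rate=1.0, eff_rate=0.5, n_rx=4, block_len=1.0,
                      theta=1.0, sigma_r=0.0, delta=0.5, eps=1e-3)
print(f"b = {b:.3f} bits, q = {q:.3f} bits, d = {d:.3f} channel uses")
```

Substituting the returned $b$ back into (30) recovers the chosen $\varepsilon'$, which is a convenient consistency check. The free parameter $\delta$ trades the prefactor $1/(1 - e^{-\theta\delta})$ against the reduced rate $N T R_E(\theta) - \delta$, so in practice one would scan it over its admissible range.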

In this section, we substantiate our analytical results through numerical analysis. We initially assume that the channel is perfectly known at both the transmitter and receiver, and that the channel coefficients are uncorrelated, i.e., $\mathbf{R}_r = \mathbf{I}_N$ and $\mathbf{R}_t = \mathbf{I}_M$. In addition, we consider a Rayleigh fading channel model, where the components of the channel matrix, $\mathbf{H}$, are independent and identically distributed, zero-mean, unit variance ($\sigma_h^2 = 1$), circularly symmetric Gaussian random variables, i.e., $\{h_{nm}\} \sim \mathcal{CN}(0, 1)$ for $n \in \{1, \cdots, N\}$ and $m \in \{1, \cdots, M\}$. In addition, we set the noise power to $\sigma_w^2 = 1$. Thus, the signal-to-noise ratio is equal to the transmission power, i.e., $\gamma = P$. Moreover, for the sake of simplicity, we set the number of channel uses in one transmission frame to 1, i.e., $T = 1$.

Initially, we plot the effective capacity of the MIMO system as a function of the signal-to-noise ratio, $\gamma$, for different numbers of receive antennas, $N$, in Fig. 2 when the number of transmit antennas and the queue decay rate are set to 2 and 1, i.e., $M = 2$ and $\theta = 1$, respectively. We employ BPSK in Fig. 2(a) and 4-quadrature amplitude modulation (4-QAM) in Fig. 2(b). This transmission scenario with 2 transmit antennas and many receive antennas can be considered as an uplink communication channel. We obtain the optimal input covariance matrix (i.e., optimal power allocation across the transmit antennas) and compare the effective capacity performance with the ones obtained when the input covariance matrix is diagonal (i.e., equal power allocation across the transmit antennas, where $\mathbf{K} = \frac{1}{M} \mathbf{I}$). We clearly observe that the performance gap decreases with the increasing number of the receive antennas. In particular, given that BPSK and 4-QAM are employed, it is not very necessary to perform power optimization across the transmit antennas when the delay concerns are of importance. With the increasing number of antennas, the channel behaves almost as a deterministic and non-fading channel.
In other words, the statistical dispersion index (Fano factor) of the channel service rates, i.e., a normalized measure of the dispersion of a probability distribution [77], approaches zero. The key point behind this behavior is the self-averaging property that we use to prove Theorem 4, and it shows that the so-called free energy converges in probability to its expectation over the distribution of the channel matrix in the large-antenna regime. Moreover, we see that because the number of bits that can be transmitted in one modulated symbol is limited (i.e., 1 bit with BPSK and 2 bits with 4-QAM, and hence 2 and 4 bits in total with 2 transmit antennas), when $\gamma$ is higher we can send the data by employing equal power allocation across the transmit antennas. Regarding the system performance when the QoS metrics are of importance, we plot the effective capacity as a function of $\theta$ in Fig. 3 by employing BPSK and setting $\gamma = 0$ dB. With increasing $\theta$, the effective capacity performance decreases and approaches zero. The effective capacity goes to the average transmission rate in the channel with decreasing $\theta$. Moreover, the performance gain by employing power optimization is again not significant when the number of receive antennas is higher.

Employing the equal power allocation policy, we plot the effective capacity as a function of $\gamma$ when the number of transmit antennas is fixed to 1 for different number of receive antennas in Fig. 4(a) and when the number of receive antennas is fixed to 1 for different number of transmit antennas in Fig. 4(b). The input data is BPSK-modulated. In order to understand the system behavior under strict QoS constraints, we set $\theta = 5$. Again, we can refer to the scenario in Fig. 4(a) as an uplink scenario and the scenario in Fig. 4(b) as a down-link scenario.

[Fig. 3: Effective capacity vs. the QoS exponent, $\theta$, when $M = 2$ and $\gamma = 0$ dB with different number of receive antennas. The input is BPSK-modulated. Panels: (a) BPSK, (b) 4-QAM. Axes: QoS exponent (dB) versus effective capacity (bits/channel use). Curves: optimal and equal power allocation for $N = 2, 4, 16, 50$.]
We observe that increasing the number of the receive antennas while keeping the number of transmit antennas constant boosts the effective capacity performance when the signal-to-noise ratio is small as seen in Fig. 4(a). On the other hand, increasing the number of transmit antennas while keeping the number of receive antennas fixed does not provide a performance increase when the delay violation and buffer overflow concerns are present as seen in Fig. 4(b). The reason behind this is the fact that increasing the number of receive antennas provides more power gain [78, Chapter 8]. Subsequently, regarding the system performance with different modulation techniques, we again plot the effective capacity as a function of $\gamma$ in Fig. 5 when we have BPSK, 4-QAM, 16-QAM and Gaussian signaling for $\theta = 1$. We set the number of transmit and receive antennas as $M = 1$ and $N = 16$ in Fig. 5(a) and $M = 16$ and $N = 1$ in Fig. 5(b). Likewise, the former scenario can be considered as an uplink transmission and the latter can be considered as a down-link transmission. Regardless of the modulation technique, increasing the number of receive antennas helps improve the system performance more than increasing the number of transmit antennas does under the same conditions.

As for the system performance in the low signal-to-noise ratio regime, we plot the effective capacity as a function of the energy-per-bit, $\zeta$, for different numbers of transmit and receive antennas in Fig. 6 by employing optimal power allocation policy when $\theta = 1$. We have the results for different number of receive antennas when the number of transmit antennas is set to 1, i.e., $M = 1$, in Fig. 6(a), and for different number of transmit antennas when the number of receive antennas is set to 1, i.e., $N = 1$, in Fig. 6(b). We plot the effective capacity in bits/channel use/dimension.
Footnote: In [78, Chapter 8], comparing multi-input-single-output (MISO) and single-input-multi-output (SIMO) channel models, the author showed that SIMO systems outperform MISO systems having the same number of receive and transmit antennas, respectively, which is also valid for the effective capacity performance.

[Fig. 4: Effective capacity of different transmission scenarios as a function of signal-to-noise ratio $\gamma$ for BPSK and $\theta = 5$. Panels: (a) $M = 1$ with $N = 16, 100, 300, 500$; (b) $N = 1$ with $M = 16, 100, 300, 500$. Axes: signal-to-noise ratio (dB) versus effective capacity (bits/channel use).]

[Fig. 5: Effective capacity of different transmission scenarios vs. signal-to-noise ratio $\gamma$ for different input signaling and $\theta = 1$. Panels: (a) $M = 1$ and $N = 16$; (b) $M = 16$ and $N = 1$. Axes: signal-to-noise ratio (dB) versus effective capacity (bits/channel use). Curves: Gaussian, 16-QAM, 4-QAM, BPSK.]

The minimum energy-per-bit, $\zeta_{\min}$, decreases with the increasing number of transmit antennas, whereas it is independent of the number of receive antennas given that the number of transmit antennas is fixed. This observation verifies our analytical derivation in (24), which provides us the minimum energy-per-bit when either the number of transmit antennas or the number of receive antennas goes to infinity, or both go to infinity. In addition, we again plot the effective capacity as a function of $\zeta$ and compare the system performance when different modulation techniques are employed. In Fig. 7(a) and Fig. 7(b), we set the number of antennas as follows: $M = 1$ and $N = 16$, and $M = 16$ and $N = 1$, respectively. In both figures, the minimum energy-per-bit, $\zeta_{\min}$, is independent of the input modulation.
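The dependence of $\zeta_{\min}$ on the antenna numbers described above can be tabulated directly. The following is a minimal sketch under stated assumptions: it evaluates the pre-limit expression of Example 1, read here as $\min\{M, N\} \log_e 2 / (M \sigma_h^2)$, at finite antenna numbers as a rough guide rather than as an exact finite-dimensional result, the function name `zeta_min_db` is hypothetical, and the antenna values are those appearing in the legends of Fig. 6.

```python
import math

def zeta_min_db(m_tx, n_rx, var_h=1.0):
    """Minimum energy-per-bit in dB from min{M, N} * ln 2 / (M * var_h) (assumed reading)."""
    zeta = min(m_tx, n_rx) * math.log(2.0) / (m_tx * var_h)
    return 10.0 * math.log10(zeta)

# One transmit antenna, growing number of receive antennas.
for n_rx in (2, 4, 16, 100):
    print("M = 1, N = %3d: %.2f dB" % (n_rx, zeta_min_db(1, n_rx)))

# One receive antenna, growing number of transmit antennas.
for m_tx in (2, 4, 16, 100):
    print("M = %3d, N = 1: %.2f dB" % (m_tx, zeta_min_db(m_tx, 1)))
```

Under this reading the value stays at $10\log_{10}(\log_e 2)$ dB, about $-1.59$ dB, for every $N$ when $M = 1$, and drops by 10 dB for each tenfold increase of $M$ when $N = 1$, in line with the trend reported for Fig. 6.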
We also note that the slope of the effective capacity versus $\zeta$ curve at $\zeta_{\min}$, $S$, when BPSK is employed is half of the slope when the other modulation techniques are employed, which are formed in the complex domain.

Theorem 3 shows that the slope of the effective capacity versus $\zeta$ (in dB) curve at $\zeta_{\min}$, i.e., $S$, decreases with the decreasing channel estimation quality. Hence, we plot $S$ as a function of the additive Gaussian noise variance, $\sigma_e^2$, for different number of receive antennas in Fig. 8(a) and for different number of transmit antennas in Fig. 8(b). The results confirm that the slope decreases with the decreasing estimation quality. In other words, the effective capacity increases slowly with the increasing transmission power in the low signal-to-noise ratio regime. Moreover, the decreasing estimation quality affects the effective capacity with the increasing number of receive antennas less than the increasing number of transmit antennas. In addition, we display the system performance in the large-scale antenna regime. Hence, by setting $\theta = 5$ and $\gamma = 0$ dB and by employing the equal power allocation policy, we plot the link utilization as a function of the number of receive antennas in Fig. 9(a) and the number of transmit antennas in Fig. 9(b). And then, we compare the system performance by having different modulation techniques. Recall that we define the link utilization as the ratio of the effective capacity to the average transmission rate. The fact that the link utilization approaches one with the increasing number of receive or transmit antennas justifies the result in Theorem 4. The link utilization reaches 1 faster with the increasing number of receive antennas than it does with the increasing number of transmit antennas. In addition, the link utilization is higher when BPSK is employed than it is when the others are employed, while it is lower when Gaussian distributed input is employed than it is when the others are employed. This is because the scattering of the probability of the achievable transmission rates in the channel is reduced when BPSK is employed and the scattering increases with the complexity of the modulation technique [40].

Finally, we display the non-asymptotic performance of an uplink MIMO scenario when the number of receive antennas is $N = 16$ and the number of transmit antennas is $M = 1$, where we employ the equal power allocation policy. Here, we set the delay violation probability to $\varepsilon' = 10^{-}$ when $\gamma = 0$ dB and $T = 10^{-}$ seconds. We plot the delay bound as a function of the data arrival rate when the transmitted data is modulated with BPSK and Gaussian input signaling in Fig. 10(a) and Fig. 10(b), respectively. We observe that Gaussian distributed input provides lower delay bounds for a given delay violation probability than BPSK-modulated input does. We further see that the delay bound goes to infinity when the data arrival rate approaches the average transmission rate in the channel. In addition, the number of receive antennas affects the transmission performance by decreasing the delay bound for a given delay violation probability. However, after a certain value, increasing the number of receive antennas does not contribute to the delay performance.

[Fig. 6: Effective capacity of different transmission scenarios as a function of energy-per-bit $\zeta$ for BPSK and $\theta = 1$. bpcu: bits/channel use. Panels: (a) $M = 1$ with $N = 2, 4, 16, 100$; (b) $N = 1$ with $M = 2, 4, 16, 100$. Axes: energy per bit (dB) versus effective capacity (bpcu/dimension).]

VII. CONCLUSION

We have studied the throughput and energy efficiency in a general class of MIMO systems with arbitrary inputs when they are subject to statistical QoS constraints, which are imposed as bounds on the delay violation and buffer overflow probabilities. Adopting the effective capacity as the performance metric, we have obtained the optimal power allocation policies across transmit antennas when there is a short-term average power constraint. Moreover, we have analyzed the system performance in the low signal-to-noise ratio and large-scale antenna regimes. We have attained the first and second derivatives of the effective capacity when the signal-to-noise ratio approaches zero. Using these characterizations, we have revealed that the minimum energy-per-bit does not depend on the input distribution and the QoS constraints but the slope does. In the large-scale antenna regime, we have identified that the gap between the effective capacity and the average transmission rate in the channel decreases with the increasing number of antennas. We have also invoked non-asymptotic performance measures by employing the effective capacity in backlog and delay violation bounds.

APPENDIX

A. Proof of Theorem 1

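Note that the logarithm in (9) is a monotonic function of $\mathcal{I}(\mathbf{x}_t; \mathbf{y}_t)$. Hence, we can write the optimization problem as

$$\min_{\mathbf{K}_t}\ \mathbb{E}_{\hat{\mathbf{H}}}\left\{ e^{-\theta T \mathcal{I}(\mathbf{x}_t; \mathbf{y}_t)} \right\} \tag{33}$$

such that $\mathrm{tr}\{\mathbf{K}_t\} \le 1$ and $\mathbf{K}_t \succeq 0$. Subsequently, we form the Lagrange function as

$$\mathcal{L}(\mathbf{K}_t, \lambda, \boldsymbol{\Phi}) = \mathbb{E}_{\hat{\mathbf{H}}}\left\{ e^{-\theta T \mathcal{I}(\mathbf{x}_t; \mathbf{y}_t)} - \lambda \left( 1 - \mathrm{tr}\{\mathbf{K}_t\} \right) - \mathrm{tr}\{\boldsymbol{\Phi} \mathbf{K}_t\} \right\},$$

where $\lambda$ and $\boldsymbol{\Phi} \succeq 0$ are the Lagrange multipliers to the problem constraints. Then, evaluating its gradient with respect to $\mathbf{K}_t$, we obtain

$$-\theta T e^{-\theta T \mathcal{I}(\mathbf{x}_t; \mathbf{y}_t)} \frac{\partial \mathcal{I}(\mathbf{x}_t; \mathbf{y}_t)}{\partial \mathbf{K}_t} + \lambda \mathbf{I} - \boldsymbol{\Phi} = 0, \tag{34}$$

where $\lambda (1 - \mathrm{tr}\{\mathbf{K}_t\}) = 0$ for $\lambda \ge 0$, and $\mathrm{tr}\{\boldsymbol{\Phi} \mathbf{K}_t\} = 0$ for $\boldsymbol{\Phi} \succeq 0$ and $\mathbf{K}_t \succeq 0$. Moreover, we know from [51, Eq. (25)] that

$$\frac{\partial \mathcal{I}(\mathbf{x}_t; \mathbf{y}_t)}{\partial \mathbf{K}_t} \mathbf{K}_t = P\, \hat{\mathbf{H}}_t^{\dagger} \boldsymbol{\Sigma}_{\tilde{w}}^{-1} \hat{\mathbf{H}}_t\, \mathrm{mmse}_t. \tag{35}$$

Since we consider the worst-case noise assumption, we have $\boldsymbol{\Sigma}_{\tilde{w}} = \sigma_{\tilde{w}}^2 \mathbf{I}_{N \times N}$ as the noise covariance matrix. Now, plugging (35) into (34), we have

$$-\theta T e^{-\theta T \mathcal{I}(\mathbf{x}_t; \mathbf{y}_t)} \gamma\, \hat{\mathbf{H}}_t^{\dagger} \hat{\mathbf{H}}_t\, \mathrm{mmse}_t + \lambda \mathbf{K}_t - \boldsymbol{\Phi} \mathbf{K}_t = 0, \tag{36}$$

where $\gamma = P / \sigma_{\tilde{w}}^2$. Moreover, we can further express (36) by multiplying both sides with $\mathbf{K}_t^{1/2}$ as follows:

$$-\theta T e^{-\theta T \mathcal{I}(\mathbf{x}_t; \mathbf{y}_t)} \gamma\, \mathbf{K}_t^{1/2} \hat{\mathbf{H}}_t^{\dagger} \hat{\mathbf{H}}_t\, \mathrm{mmse}_t + \lambda \mathbf{K}_t^{3/2} - \mathbf{K}_t^{1/2} \boldsymbol{\Phi} \mathbf{K}_t^{1/2} \mathbf{K}_t^{1/2} = 0.$$

Noting that $\mathrm{tr}\{\boldsymbol{\Phi} \mathbf{K}_t\} = \mathrm{tr}\{\mathbf{K}_t^{1/2} \boldsymbol{\Phi} \mathbf{K}_t^{1/2}\} = 0$, we know that $\mathbf{K}_t^{1/2} \boldsymbol{\Phi} \mathbf{K}_t^{1/2}$ is forced to be a null matrix [45]. Consequently, the optimal input covariance matrix, $\mathbf{K}_t \succeq 0$, is the solution of the following expression:

$$\mathbf{K}_t = \frac{\theta T \gamma\, e^{-\theta T \mathcal{I}(\mathbf{x}_t; \mathbf{y}_t)}}{\lambda}\, \hat{\mathbf{H}}_t^{\dagger} \hat{\mathbf{H}}_t\, \mathrm{mmse}_t. \tag{37}$$

This concludes the proof.

[Fig. 7: Effective capacity of different transmission scenarios as a function of energy-per-bit $\zeta$ for different input signaling and $\theta = 1$. bpcu: bits/channel use. Panels: (a) $M = 1$ and $N = 16$; (b) $M = 16$ and $N = 1$. Axes: energy per bit (dB) versus effective capacity (bpcu/dimension). Curves: Gaussian, 16-QAM, 4-QAM, BPSK.]

[Fig. 8: Effective capacity slope $S$ as a function of the error ratio, $\beta$, and $\theta = -$ dB, where $\beta = \frac{\sigma_e^2}{\sigma_h^2}$. bpcu: bits/channel use. Panels: (a) $M = 1$ with $N = 2, 4, 16, 100$; (b) $N = 1$ with $M = 2, 4, 16, 100$. Vertical axis: $S$ (bpcu/(3 dB)/receive antenna).]

B. Proof of Theorem 2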

With the input-output channel model given in (11), we have component-wise independent channels, i.e.,

$$\tilde{y}_t(i) = \sqrt{\gamma d_t(i)}\, \tilde{x}_t(i) + \tilde{n}_t(i) \quad \text{for } i = 1, \cdots, \min\{M, N\},$$

where $\sqrt{d_t(i)}$ is the $i$th non-zero diagonal element of $\mathbf{D}_t$ and $d_t(i)$ is the $i$th eigenvalue of $\hat{\mathbf{H}} \hat{\mathbf{H}}^{\dagger}$ and $\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}$. Above, $\tilde{x}_t(i)$, $\tilde{y}_t(i)$ and $\tilde{n}_t(i)$ are the $i$th elements of the input, output and noise vectors, respectively. We note that $\tilde{x}_t(i) = 0$, $\tilde{y}_t(i) = 0$, and $\tilde{n}_t(i) = 0$ for $i > \min\{N, M\}$. Moreover, because we have $\mathcal{I}(\mathbf{x}_t; \mathbf{y}_t) = \mathcal{I}(\tilde{\mathbf{x}}_t; \tilde{\mathbf{y}}_t)$, the logarithm in (9) is a monotonic function of $\mathcal{I}(\tilde{\mathbf{x}}_t; \tilde{\mathbf{y}}_t)$ as well. Returning to the optimization problem in (33), we can see that when the minimum is obtained, $\mathcal{I}(\tilde{\mathbf{x}}_t; \tilde{\mathbf{y}}_t)$ is maximized at every time instant. So, the samples of $\tilde{\mathbf{x}}$ should be independent of each other [2]. In particular, we should have $\tilde{\mathbf{K}}_t = \boldsymbol{\Sigma}_t$, which is an $N \times M$ diagonal matrix with non-negative elements, $\{\sigma_t(i)\}_{i=1}^{\min\{N, M\}}$. Consequently, the optimization problem becomes

$$\min_{\boldsymbol{\Sigma}_t}\ \mathbb{E}_{\hat{\mathbf{H}}}\left\{ e^{-\theta T \mathcal{I}(\tilde{\mathbf{x}}_t; \tilde{\mathbf{y}}_t)} \right\} \tag{38}$$

such that $\mathrm{tr}\{\boldsymbol{\Sigma}_t\} = \mathrm{tr}\{\tilde{\mathbf{K}}_t\} = \mathrm{tr}\{\mathbf{V}_t^{\dagger} \mathbf{K}_t \mathbf{V}_t\} \le 1$. Herein, we benefit from the fact that the trace of a matrix is the sum of its eigenvalues and the fact that $\tilde{\mathbf{K}}_t$ and $\mathbf{K}_t$ have the same eigenvalues because $\mathbf{V}_t$ is a unitary matrix. Subsequently, forming the Lagrange function as

$$\mathcal{L}(\boldsymbol{\Sigma}_t, \lambda, \boldsymbol{\Phi}) = \mathbb{E}_{\hat{\mathbf{H}}}\left\{ e^{-\theta T \mathcal{I}(\tilde{\mathbf{x}}_t; \tilde{\mathbf{y}}_t)} - \lambda \left( 1 - \mathrm{tr}\{\boldsymbol{\Sigma}_t\} \right) - \mathrm{tr}\{\boldsymbol{\Phi} \boldsymbol{\Sigma}_t\} \right\},$$

where $\lambda$ and $\boldsymbol{\Phi} \succeq 0$ are the Lagrange multipliers associated with the problem constraints, and taking its derivatives with respect to $\{\sigma_t(i)\}_{i=1}^{\min\{M, N\}}$, we obtain

$$-\theta T e^{-\theta T \mathcal{I}(\tilde{\mathbf{x}}_t; \tilde{\mathbf{y}}_t)} \frac{\partial \mathcal{I}(\tilde{x}_t(i); \tilde{y}_t(i))}{\partial \sigma_t(i)} + \lambda - \phi(i, i) = 0, \tag{39}$$

where $\tilde{x}_t(i)$ and $\tilde{y}_t(i)$ are the $i$th elements of $\tilde{\mathbf{x}}_t$ and $\tilde{\mathbf{y}}_t$, respectively, and $\phi(i, i)$ is the $i$th diagonal element of $\boldsymbol{\Phi}$. Since $\mathrm{tr}\{\boldsymbol{\Phi} \boldsymbol{\Sigma}\} = 0$, we have $\phi(i, i) = 0$. Moreover, using [51, Eq. (25)], we show that $\frac{\partial \mathcal{I}(\tilde{x}_t(i); \tilde{y}_t(i))}{\partial \sigma_t(i)} = \frac{\gamma d_t(i)}{\sigma_t(i)} \mathrm{mmse}_t(i)$, where $\mathrm{mmse}_t(i) = \mathbb{E}\left\{ \left| \mathbb{E}\{\tilde{x}_t(i) \mid \tilde{y}_t(i)\} - \tilde{x}_t(i) \right|^2 \right\}$. Then, given $\sigma_t(i) \ge 0$, we have the optimal $\sigma_t(i)$ as the solution of the following:

$$\sigma_t(i) = \frac{\theta T \gamma d_t(i)}{\lambda}\, e^{-\theta T \mathcal{I}(\tilde{\mathbf{x}}_t; \tilde{\mathbf{y}}_t)}\, \mathrm{mmse}_t(i). \tag{40}$$

If the solution in (40) is negative, we set $\sigma_t(i) = 0$. We further note that $e^{-\theta T \mathcal{I}(\tilde{\mathbf{x}}_t; \tilde{\mathbf{y}}_t)}$ and $\mathrm{mmse}_t(i)$ are convex and monotonically decreasing functions of $\sigma_t(i)$. Therefore, the right-hand side of (40) is also a convex function of $\sigma_t(i)$ and it is monotonically decreasing. Hence, it provides a unique and global solution.

[Fig. 9: Link Utilization of different transmission scenarios for different input signaling and $\gamma = 0$ dB. Panels: (a) $M = 1$, versus the number of receive antennas, $N$; (b) $N = 1$, versus the number of transmit antennas, $M$. Curves: BPSK, 4-QAM, 16-QAM, Gaussian.]

C. Proof of Theorem 3

The first derivative of the effective rate in (7), $R_E(\theta, P)$, with respect to the transmission power, $P$, when $P$ approaches 0, is

$$\dot{R}_E(\theta, 0) = \lim_{P \to 0} \frac{\mathbb{E}_{\hat{\mathbf{H}}}\left\{ \dot{\mathcal{I}}(P)\, e^{-\theta T \mathcal{I}(P)} \right\}}{N\, \mathbb{E}_{\hat{\mathbf{H}}}\left\{ e^{-\theta T \mathcal{I}(P)} \right\}}, \tag{41}$$

where $\mathcal{I}(P)$ and $\dot{\mathcal{I}}(P)$ are the mutual information and its derivative, respectively, as a function of $P$. Noting the worst-case noise assumption, we can re-express (4) as follows:

$$\mathbf{y}_t = \sqrt{P}\, \hat{\mathbf{H}}_t \mathbf{x}_t + \sqrt{P}\, \sigma_e \mathbf{n}_t + \mathbf{w}_t, \tag{42}$$

where $\mathbf{n}_t$ has zero-mean and unit-variance Gaussian random elements. Because $\hat{\mathbf{H}}_t$ and $\tilde{\mathbf{H}}_t$, and hence $\mathbf{n}_t$, are uncorrelated, we can see that the channel model in (42) is similar to the channel model described in [79, Eq. 7]. Therefore, the lower bound on the mutual information in the low signal-to-noise ratio regime, i.e., as $P \to 0$, is expressed as [79, Eq. 64]

$$\mathcal{I}(\mathbf{x}_t; \mathbf{y}_t) = \frac{P}{\log_e 2}\, \mathrm{tr}\left\{ \hat{\mathbf{H}}_t \mathbf{K} \hat{\mathbf{H}}_t^{\dagger} \right\} - \frac{P^2}{2 \log_e 2}\, \mathrm{tr}\left\{ \left[ \hat{\mathbf{H}}_t \mathbf{K} \hat{\mathbf{H}}_t^{\dagger} \right]^2 + 2 \sigma_e^2\, \hat{\mathbf{H}}_t \mathbf{K} \hat{\mathbf{H}}_t^{\dagger} \right\} + O(P^3). \tag{43}$$

[Fig. 10 panels: (a) BPSK, (b) Gaussian. Axes: arrival rate, $a$ (Mbits/sec), versus delay bound, $d$ (msec). Curves: $N = 16, 100, 300, 500$.]

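Fig. 10: Delay bound of an uplink MIMO scenario as a function of the data arrival rate when M = 1 and N = 16 for γ = 0 dB and ε′ = 10^−.

Then, the first derivative of $I(\mathbf{x}_t;\mathbf{y}_t)$ with respect to $P$ in the low signal-to-noise ratio regime becomes

$$\dot{I}(P) = \mathrm{tr}\{\hat{\mathbf{H}}_t \mathbf{K} \hat{\mathbf{H}}_t^{\dagger}\} \log_2 e - P \log_2 e\; \mathrm{tr}\{[\hat{\mathbf{H}}_t \mathbf{K} \hat{\mathbf{H}}_t^{\dagger}]^2 + 2\sigma_e^2 \hat{\mathbf{H}}_t \mathbf{K} \hat{\mathbf{H}}_t^{\dagger}\} + O(P^2). \tag{44}$$

Then, we can re-write (41) as

$$\dot{R}_E(\theta, 0) = \frac{\mathbb{E}_{\hat{\mathbf{H}}}\{\mathrm{tr}\{\hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}\}\}}{N \log_e 2}. \tag{45}$$

Here, we use the fact that $I(P) = 0$ when $P = 0$, and hence $e^{-\theta T I(P)} = 1$ in (41). Moreover, since the input covariance matrix, $\mathbf{K}$, is a positive semi-definite Hermitian matrix, we can express $\mathbf{K}$ as [80]

$$\mathbf{K} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{U}^{\dagger} = \sum_{i=1}^{M} \sigma_i \mathbf{u}_i \mathbf{u}_i^{\dagger}, \tag{46}$$

where $\mathbf{U}$ is a unitary matrix and $\boldsymbol{\Sigma}$ is a diagonal matrix. The unitary matrix is formed by the eigenvectors of $\mathbf{K}$, i.e., $\mathbf{U} = [\mathbf{u}_1, \cdots, \mathbf{u}_M]$, and the diagonal matrix is composed of the corresponding eigenvalues of $\mathbf{K}$, i.e., $\boldsymbol{\Sigma} = \mathrm{diag}\{\sigma_1, \cdots, \sigma_M\}$. Moreover, the eigenvectors form an orthonormal set, i.e., $\mathbf{u}_i^{\dagger} \mathbf{u}_j = 0$ for $i \neq j$ and $\mathbf{u}_i^{\dagger} \mathbf{u}_i = 1$, and the eigenvalues are greater than or equal to zero, i.e., $\sigma_i \geq 0$. Here, we assume that the system uses all the available energy for transmission, i.e., $\mathrm{tr}\{\mathbf{K}\} = 1$, and hence, we have $\sum_{i=1}^{M} \sigma_i = 1$. Now, we have the following:

\begin{align}
\dot{R}_E(\theta, 0) &= \frac{1}{N \log_e 2} \mathbb{E}_{\hat{\mathbf{H}}}\{\mathrm{tr}\{\hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}\}\} \tag{47} \\
&= \frac{1}{N \log_e 2} \sum_{i=1}^{M} \sigma_i\, \mathbb{E}_{\hat{\mathbf{H}}}\{\mathrm{tr}\{\hat{\mathbf{H}} \mathbf{u}_i \mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger}\}\} \tag{48} \\
&= \frac{1}{N \log_e 2} \sum_{i=1}^{M} \sigma_i\, \mathbb{E}_{\hat{\mathbf{H}}}\{\mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_i\} \tag{49} \\
&\leq \frac{1}{N \log_e 2} \mathbb{E}_{\hat{\mathbf{H}}}\{\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})\} = \dot{C}_E(\theta, 0), \tag{50}
\end{align}

where $\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})$ is the maximum eigenvalue of $\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}$. Above, (49) follows from the fact that $\mathrm{tr}\{\hat{\mathbf{H}} \mathbf{u}_i \mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger}\} = \mathrm{tr}\{\mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_i\} = \mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_i$, where $\mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_i$ is a scalar value. The inequality in (50) holds because $\mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_i \leq \lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})$ for every unit vector $\mathbf{u}_i$ and $\sum_{i=1}^{M} \sigma_i = 1$. The upper bound in (50) can be achieved by choosing the normalized input covariance matrix as $\mathbf{K} = \mathbf{u}_{\max} \mathbf{u}_{\max}^{\dagger}$, where $\mathbf{u}_{\max}$ is the unit eigenvector of $\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}$ that corresponds to its maximum eigenvalue. This completes the first part of the proof.

The second derivative of the effective rate in (7), $R_E(\theta, P)$, with respect to the transmission power, $P$, when $P$ approaches 0, is

$$\ddot{R}_E(\theta, 0) = \lim_{P \to 0} \left[ \frac{\mathbb{E}_{\hat{\mathbf{H}}}\{\ddot{I}(P)\, e^{-\theta T I(P)} - \theta T [\dot{I}(P)]^2 e^{-\theta T I(P)}\}}{N\, \mathbb{E}_{\hat{\mathbf{H}}}\{e^{-\theta T I(P)}\}} + \frac{\theta T \left(\mathbb{E}_{\hat{\mathbf{H}}}\{\dot{I}(P)\, e^{-\theta T I(P)}\}\right)^2}{N \left(\mathbb{E}_{\hat{\mathbf{H}}}\{e^{-\theta T I(P)}\}\right)^2} \right], \tag{51}$$

where $\ddot{I}(P)$ is the second derivative of the lower bound on the mutual information with respect to $P$. From (43), we have $\ddot{I}(0) = -\log_2 e\; \mathrm{tr}\{[\hat{\mathbf{H}}_t \mathbf{K} \hat{\mathbf{H}}_t^{\dagger}]^2\} - 2 \log_2 e\; \mathrm{tr}\{\sigma_e^2 \hat{\mathbf{H}}_t \mathbf{K} \hat{\mathbf{H}}_t^{\dagger}\}$. Then, we can re-write (51) as

$$\ddot{R}_E(\theta, 0) = \frac{\theta T}{N \log_e^2 2} \left[ \mathbb{E}_{\hat{\mathbf{H}}}^2\{\mathrm{tr}\{\hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}\}\} - \mathbb{E}_{\hat{\mathbf{H}}}\{\mathrm{tr}^2\{\hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}\}\} \right] - \frac{1}{N \log_e 2} \mathbb{E}_{\hat{\mathbf{H}}}\{\mathrm{tr}\{[\hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}]^2 + 2\sigma_e^2 \hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}\}\}. \tag{52}$$

Now, let $l$ be the multiplicity of $\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})$. Hence, we can re-express $\mathbf{K}$ as $\mathbf{K} = \sum_{i=1}^{l} \sigma_i \mathbf{u}_i \mathbf{u}_i^{\dagger}$, where $\sigma_i \in [0, 1]$ and $\sum_{i=1}^{l} \sigma_i = 1$. Above, $\{\mathbf{u}_i\}_{i=1}^{l}$ are the corresponding orthonormal eigenvectors (column vectors) associated with $\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})$. Hence, we can show that $\mathbb{E}_{\hat{\mathbf{H}}}\{\mathrm{tr}\{\hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}\}\} = \mathbb{E}_{\hat{\mathbf{H}}}\{\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})\}$ and $\mathbb{E}_{\hat{\mathbf{H}}}\{\mathrm{tr}^2\{\hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}\}\} = \mathbb{E}_{\hat{\mathbf{H}}}\{\lambda_{\max}^2(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})\}$. Moreover, we have

\begin{align}
\mathbb{E}_{\hat{\mathbf{H}}}\{\mathrm{tr}\{[\hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}]^2\}\} &= \mathbb{E}_{\hat{\mathbf{H}}}\{\mathrm{tr}\{\hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{K} \hat{\mathbf{H}}^{\dagger}\}\} \tag{53} \\
&= \mathbb{E}_{\hat{\mathbf{H}}}\Big\{\mathrm{tr}\Big\{\hat{\mathbf{H}} \sum_{i=1}^{l} \sigma_i \mathbf{u}_i \mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \sum_{j=1}^{l} \sigma_j \mathbf{u}_j \mathbf{u}_j^{\dagger} \hat{\mathbf{H}}^{\dagger}\Big\}\Big\} \tag{54} \\
&= \mathbb{E}_{\hat{\mathbf{H}}}\Big\{\mathrm{tr}\Big\{\sum_{i,j}^{l} \sigma_i \sigma_j\, \mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_j \mathbf{u}_j^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_i\Big\}\Big\} \tag{55} \\
&= \mathbb{E}_{\hat{\mathbf{H}}}\Big\{\sum_{i,j}^{l} \sigma_i \sigma_j \big|\mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_j\big|^2\Big\} \tag{56} \\
&= \mathbb{E}_{\hat{\mathbf{H}}}\Big\{\sum_{i=1}^{l} \sigma_i^2 \big|\mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_i\big|^2\Big\} \tag{57} \\
&= \mathbb{E}_{\hat{\mathbf{H}}}\Big\{\lambda_{\max}^2(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}) \sum_{i=1}^{l} \sigma_i^2\Big\} \tag{58} \\
&\geq \frac{1}{l}\, \mathbb{E}_{\hat{\mathbf{H}}}\{\lambda_{\max}^2(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})\}. \tag{59}
\end{align}

Above, (55) comes from the fact that $\mathrm{tr}\{\mathbf{A}\mathbf{B}\} = \mathrm{tr}\{\mathbf{B}\mathbf{A}\}$, where $\mathbf{A}$ and $\mathbf{B}$ are matrices. Moreover, since $\mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_j$ and $\mathbf{u}_j^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_i$ are the complex conjugates of each other, we have the result in (56). Noting that $\mathbf{u}_i$ and $\mathbf{u}_j$ are orthonormal, i.e., $\mathbf{u}_i^{\dagger} \mathbf{u}_j = 0$ given $i \neq j$ and $\mathbf{u}_i^{\dagger} \mathbf{u}_j = 1$ given $i = j$, we have (57). Moreover, we know that $\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}}) = \mathbf{u}_i^{\dagger} \hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}} \mathbf{u}_i$. Subsequently, we have (58). Finally, $\sum_{i=1}^{l} \sigma_i^2$ is minimized when $\sigma_i = \frac{1}{l}$. Therefore, we have the lower bound in (59). As a result, the second derivative of the effective rate, $\ddot{R}_E(\theta, P)$, when $P$ diminishes to zero, is upper-bounded as follows:

$$\ddot{R}_E(\theta, 0) \leq \frac{\theta T}{N \log_e^2 2} \left[ \mathbb{E}_{\hat{\mathbf{H}}}^2\{\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})\} - \mathbb{E}_{\hat{\mathbf{H}}}\{\lambda_{\max}^2(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})\} \right] - \frac{\mathbb{E}_{\hat{\mathbf{H}}}\{\lambda_{\max}^2(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})\}}{l N \log_e 2} - \frac{2\sigma_e^2}{N \log_e 2} \mathbb{E}_{\hat{\mathbf{H}}}\{\lambda_{\max}(\hat{\mathbf{H}}^{\dagger} \hat{\mathbf{H}})\} = \ddot{C}_E(\theta, 0),$$

which completes the second part of the proof.

D. Proof of Theorem 4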

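Given an input covariance matrix, $\mathbf{K}$, the instantaneous mutual information between the channel input and output, defined in (6), can be expressed as follows:

\begin{align}
r &= \mathbb{E}_{\mathbf{x},\mathbf{y}}\left\{\log_2 \frac{f_{\mathbf{y}|\mathbf{x}}(\mathbf{y}|\mathbf{x})}{f_{\mathbf{y}}(\mathbf{y})}\right\} = \mathbb{E}_{\mathbf{x},\mathbf{y}}\{\log_2 f_{\mathbf{y}|\mathbf{x}}(\mathbf{y}|\mathbf{x})\} - \mathbb{E}_{\mathbf{y}}\{\log_2 f_{\mathbf{y}}(\mathbf{y})\} \notag \\
&= -N \log_2 e - \mathbb{E}_{\mathbf{y}}\Big\{\log_2 \mathbb{E}_{\mathbf{x}}\Big\{e^{-\frac{1}{\sigma_{\tilde{w}}^2} \|\mathbf{y} - \sqrt{P} \hat{\mathbf{H}} \mathbf{x}\|^2}\Big\}\Big\}. \tag{60}
\end{align}

Now, by inserting (60) into (7) and taking the limit when $M$ goes to infinity, we can write the effective rate as follows:

\begin{align}
\lim_{M \to \infty} R_E(\theta, P) &= \lim_{M \to \infty} -\frac{1}{\theta N T} \log_e \mathbb{E}_{\hat{\mathbf{H}}}\Big\{e^{\theta T N \log_2 e}\, e^{\theta T\, \mathbb{E}_{\mathbf{y}}\{\log_2 \mathbb{E}_{\mathbf{x}}\{e^{-\frac{1}{\sigma_{\tilde{w}}^2} \|\mathbf{y} - \sqrt{P} \hat{\mathbf{H}} \mathbf{x}\|^2}\}\}}\Big\} \tag{61} \\
&= \lim_{M \to \infty} \Big\{-\log_2 e - \frac{1}{\theta N T} \log_e \mathbb{E}_{\hat{\mathbf{H}}}\Big\{e^{\theta T\, \mathbb{E}_{\mathbf{y}}\{\log_2 \mathbb{E}_{\mathbf{x}}\{e^{-\frac{1}{\sigma_{\tilde{w}}^2} \|\mathbf{y} - \sqrt{P} \hat{\mathbf{H}} \mathbf{x}\|^2}\}\}}\Big\}\Big\} \tag{62} \\
&= \lim_{M \to \infty} \Big\{-\log_2 e - \frac{1}{\theta N T} \log_e \mathbb{E}_{\hat{\mathbf{H}}}\Big\{e^{\theta T M\, \mathbb{E}_{\mathbf{y}}\{\frac{1}{M} \log_2 \mathbb{E}_{\mathbf{x}}\{e^{-\frac{1}{\sigma_{\tilde{w}}^2} \|\mathbf{y} - \sqrt{P} \hat{\mathbf{H}} \mathbf{x}\|^2}\}\}}\Big\}\Big\} \tag{63} \\
&= \lim_{M \to \infty} \Big\{-\log_2 e - \frac{1}{\theta N T} \log_e \mathbb{E}_{\hat{\mathbf{H}}}\Big\{e^{\theta T M\, \mathbb{E}_{\mathbf{y},\hat{\mathbf{H}}}\{\frac{1}{M} \log_2 \mathbb{E}_{\mathbf{x}}\{e^{-\frac{1}{\sigma_{\tilde{w}}^2} \|\mathbf{y} - \sqrt{P} \hat{\mathbf{H}} \mathbf{x}\|^2}\}\}}\Big\}\Big\} \tag{64} \\
&= \lim_{M \to \infty} -\log_2 e - \frac{M}{N}\, \mathbb{E}_{\mathbf{y},\hat{\mathbf{H}}}\Big\{\frac{1}{M} \log_2 \mathbb{E}_{\mathbf{x}}\Big\{e^{-\frac{1}{\sigma_{\tilde{w}}^2} \|\mathbf{y} - \sqrt{P} \hat{\mathbf{H}} \mathbf{x}\|^2}\Big\}\Big\} \tag{65} \\
&= \lim_{M \to \infty} \frac{1}{N}\, \mathbb{E}_{\hat{\mathbf{H}}}\Big\{-N \log_2 e - \mathbb{E}_{\mathbf{y}}\Big\{\log_2 \mathbb{E}_{\mathbf{x}}\Big\{e^{-\frac{1}{\sigma_{\tilde{w}}^2} \|\mathbf{y} - \sqrt{P} \hat{\mathbf{H}} \mathbf{x}\|^2}\Big\}\Big\}\Big\} \tag{66} \\
&= \lim_{M \to \infty} \frac{1}{N}\, \mathbb{E}_{\hat{\mathbf{H}}}\{r\}. \tag{67}
\end{align}

In (63), we benefit from the connection between the free energy and the mutual information and employ the self-averaging property, which provides us the following [81]:

$$\lim_{M \to \infty} \mathbb{E}_{\mathbf{y}}\Big\{\frac{1}{M} \log_2 \mathbb{E}_{\mathbf{x}}\Big\{e^{-\frac{1}{\sigma_{\tilde{w}}^2} \|\mathbf{y} - \sqrt{P} \hat{\mathbf{H}} \mathbf{x}\|^2}\Big\}\Big\} = \lim_{M \to \infty} \mathbb{E}_{\mathbf{y},\hat{\mathbf{H}}}\Big\{\frac{1}{M} \log_2 \mathbb{E}_{\mathbf{x}}\Big\{e^{-\frac{1}{\sigma_{\tilde{w}}^2} \|\mathbf{y} - \sqrt{P} \hat{\mathbf{H}} \mathbf{x}\|^2}\Big\}\Big\}, \tag{68}$$

which is a result of the assumption of the self-averaging property, in which the free energy converges in probability to its expectation over the distribution of the random variables $\hat{\mathbf{H}}$ and $\mathbf{y}$ in the large-system limit [81].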
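The concentration behind (68) can be illustrated numerically. The following is a minimal Monte Carlo sketch under assumptions that are not part of the proof: i.i.d. Rayleigh fading, perfect channel knowledge ($\sigma_e = 0$), and a Gaussian input with $\mathbf{K} = \mathbf{I}/M$, for which the mutual information has the closed form $\log_2 \det(\mathbf{I} + \frac{P}{M} \mathbf{H}\mathbf{H}^{\dagger})$. The function name `effective_and_average_rate` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def effective_and_average_rate(M, N, P, theta_T, trials, rng):
    """Monte Carlo estimate of (effective rate, average rate) per receive antenna.

    Illustrative assumptions: i.i.d. Rayleigh fading, perfect channel
    knowledge, Gaussian input with K = I/M, rates measured in bits.
    theta_T stands for the product of the QoS exponent and the block duration.
    """
    shape = (trials, N, M)
    # Channel realizations with i.i.d. CN(0, 1) entries.
    H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    gram = H @ H.conj().transpose(0, 2, 1)  # H H^dagger, shape (trials, N, N)
    # log det(I + (P/M) H H^dagger) per realization, converted to bits.
    _, logdet = np.linalg.slogdet(np.eye(N) + (P / M) * gram)
    rate_bits = logdet / np.log(2)
    average = rate_bits.mean() / N
    # Effective rate: -(1 / (theta T N)) * log_e E{exp(-theta T I)}.
    effective = -np.log(np.mean(np.exp(-theta_T * rate_bits))) / (theta_T * N)
    return effective, average


rng = np.random.default_rng(0)
for M in (1, 4, 16, 64):
    ec, avg = effective_and_average_rate(M, N=2, P=1.0, theta_T=1.0, trials=4000, rng=rng)
    print(f"M={M:3d}  effective={ec:.4f}  average={avg:.4f}  gap={avg - ec:.4f}")
```

By Jensen's inequality the effective rate never exceeds the average rate, and in this sketch the gap between the two shrinks as $M$ grows because the per-realization rate concentrates around its mean.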
Moreover, since the expression inside the first bracket in (66) is the same as the expression in (60), we have the result in (67). Then, we have

$$\lim_{M \to \infty} R_E(\theta, P) = \lim_{M \to \infty} \frac{1}{N}\, \mathbb{E}_{\hat{\mathbf{H}}}\{r\}.$$

Similarly, when $N$ goes to infinity or both $M$ and $N$ go to infinity, the solution is trivial: we can again use the reformulation performed in (63) and engage the property stated in (68). Since the aforementioned proof is valid for any input covariance matrix, we can complete the proof with (20).

REFERENCES

[1] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.
[2] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European Trans. Telecommun., vol. 10, no. 6, pp. 585–595, 1999.
[3] L. Zheng and D. N. Tse, “Diversity and multiplexing: a fundamental tradeoff in multiple-antenna channels,” IEEE Trans. Inf. Theory, vol. 49, no. 5, pp. 1073–1096, 2003.
[4] A. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath, “Capacity limits of MIMO channels,” IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 684–702, 2003.
[5] D.-S. Shiu, G. J. Foschini, M. J. Gans, and J. M. Kahn, “Fading correlation and its effect on the capacity of multielement antenna systems,” IEEE Trans. Commun., vol. 48, no. 3, pp. 502–513, 2000.
[6] S. A. Jafar and A. Goldsmith, “Transmitter optimization and optimality of beamforming for multiple antenna systems,” IEEE Trans. Wireless Commun., vol. 3, no. 4, pp. 1165–1175, 2004.
[7] E. A. Jorswieck and H. Boche, “Channel capacity and capacity-range of beamforming in MIMO wireless systems under correlated fading with covariance feedback,” IEEE Trans. Wireless Commun., vol. 3, no. 5, pp. 1543–1553, 2004.
[8] M. Vu and A. Paulraj, “Capacity optimization for Rician correlated MIMO wireless channels,” in Proc. 39th Asilomar Conf. Signal Syst. Comput. (ASILOMAR), 2005, pp. 133–138.
[9] A. M. Tulino, A. Lozano, and S. Verdú, “Capacity-achieving input covariance for single-user multi-antenna channels,” IEEE Trans. Wireless Commun., vol. 5, no. 3, pp. 662–671, 2006.
[10] W. Rhee and J. M. Cioffi, “On the capacity of multiuser wireless channels with multiple antennas,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2580–2595, 2003.
[11] M. Ivrlac, T. Kurpjuhn, C. Brunner, and J. Nossek, “On channel capacity of correlated MIMO channels,” ITG Fokusprojekt: Mobilkommunikation Systeme mit intelligenten Antennen, 2001.
[12] S. Venkatesan, S. H. Simon, and R. A. Valenzuela, “Capacity of a Gaussian MIMO channel with nonzero mean,” in Proc. IEEE Veh. Technol. Conf. Fall (VTC-FALL), vol. 3, 2003, pp. 1767–1771.
[13] D. Hosli and A. Lapidoth, “The capacity of a MIMO Ricean channel is monotonic in the singular values of the mean,” ITG Fachbericht, pp. 381–386, 2004.
[14] M. Kang and M.-S. Alouini, “Capacity of MIMO Rician channels,” IEEE Trans. Wireless Commun., vol. 5, no. 1, pp. 112–122, 2006.
[15] A. Osseiran, F. Boccardi, V. Braun, K. Kusume, P. Marsch, M. Maternia, O. Queseth, M. Schellmann, H. Schotten, H. Taoka et al., “Scenarios for 5G mobile and wireless communications: The vision of the METIS project,” IEEE Commun. Mag., vol. 52, no. 5, pp. 26–35, 2014.
[16] S. Verdú, “Spectral efficiency in the wideband regime,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1319–1343, 2002.
[17] A. Lozano, A. M. Tulino, and S. Verdú, “Multiple-antenna capacity in the low-power regime,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2527–2544, 2003.
[18] P. Memmolo, M. Lops, A. M. Tulino, and R. A. Valenzuela, “Up-link multi-user MIMO capacity in low-power regime,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2010, pp. 2308–2312.
[19] V. Raghavan and A. M. Sayeed, “Achieving coherent capacity of correlated MIMO channels in the low-power regime with non-flashy signaling schemes,” in Proc. IEEE Int. Symp. Inf. Theory, Adelaide, Australia, Sep. 2005, pp. 906–910.
[20] F. Heliot, M. A. Imran, and R. Tafazolli, “On the energy efficiency-spectral efficiency trade-off over the MIMO Rayleigh fading channel,” IEEE Trans. Commun., vol. 60, no. 5, pp. 1345–1356, 2012.
[21] O. Onireti, F. Heliot, and M. A. Imran, “On the energy efficiency-spectral efficiency trade-off of distributed MIMO systems,” IEEE Trans. Commun., vol. 61, no. 9, pp. 3741–3753, 2013.
[22] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, 2010.
[23] T. L. Marzetta, G. Caire, M. Debbah, I. Chih-Lin, and S. K. Mohammed, “Special issue on massive MIMO,” J. Commun. Networks, vol. 15, no. 4, pp. 333–337, 2013.
[24] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, 2014.
[25] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang, “What will 5G be?” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, 2014.
[26] A. Lozano and A. M. Tulino, “Capacity of multiple-transmit multiple-receive antenna architectures,” IEEE Trans. Inf. Theory, vol. 48, no. 12, pp. 3117–3128, 2002.
[27] A. L. Moustakas, S. H. Simon, and A. M. Sengupta, “MIMO capacity through correlated channels in the presence of correlated interferers and noise: A (not so) large N analysis,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2545–2561, 2003.
[28] E. Björnson, J. Hoydis, M. Kountouris, and M. Debbah, “Massive MIMO systems with non-ideal hardware: Energy efficiency, estimation, and capacity limits,” IEEE Trans. Inf. Theory, vol. 60, no. 11, pp. 7112–7139, 2014.
[29] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436–1449, 2013.
[30] J.-C. Shen, J. Zhang, and K. B. Letaief, “Downlink user capacity of massive MIMO under pilot contamination,” IEEE Trans. Wireless Commun., vol. 14, no. 6, pp. 3183–3193, 2015.
[31] J. Hoydis, S. Ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 160–171, 2013.
[32] C.-K. Wen, S. Jin, and K.-K. Wong, “On the sum-rate of multiuser MIMO uplink channels with jointly-correlated Rician fading,” IEEE Trans. Commun.
IEEE Trans. Autom. Control, vol. 39, no. 5, pp. 913–931, 1994.
[35] ——, Performance guarantees in communication networks. Springer Science & Business Media, 2012.
[36] D. Wu and R. Negi, “Effective capacity: a wireless link model for support of quality of service,” IEEE Trans. Wireless Commun., vol. 2, no. 4, pp. 630–643, 2003.
[37] M. C. Gursoy, “MIMO wireless communications under statistical queueing constraints,” IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 5897–5917, 2011.
[38] S. Akin and M. C. Gursoy, “On the throughput and energy efficiency of cognitive MIMO transmissions,” IEEE Trans. Veh. Technol., vol. 62, no. 7, pp. 3245–3260, 2013.
[39] E. A. Jorswieck, R. Mochaourab, and M. Mittelbach, “Effective capacity maximization in multi-antenna channels with covariance feedback,” IEEE Trans. Wireless Commun., vol. 9, no. 10, pp. 2988–2993, 2010.
[40] S. Akin, “The interplay between data transmission power and transmission link utilization,” IEEE Commun. Lett., vol. 19, no. 11, pp. 1953–1956, 2015.
[41] F. Pérez-Cruz, M. R. Rodrigues, and S. Verdú, “MIMO Gaussian channels with arbitrary inputs: Optimal precoding and power allocation,” IEEE Trans. Inf. Theory, vol. 56, no. 3, pp. 1070–1084, 2010.
[42] M. M. Lamarca Orozco, “Linear precoding for mutual information maximization in MIMO systems,” in Proc. 6th IEEE Int. Symp. Wireless Commun. Syst. (ISWCS), 2010, pp. 26–30.
[43] M. Payaró and D. P. Palomar, “On optimal precoding in linear vector Gaussian channels with arbitrary input distribution,” in Proc. IEEE Int. Symp. Inf. Theory, 2009, pp. 1085–1089.
[44] C. Xiao, Y. R. Zheng, and Z. Ding, “Globally optimal linear precoders for finite alphabet signals over complex vector Gaussian channels,” IEEE Trans. Signal Process., vol. 59, no. 7, pp. 3301–3314, 2011.
[45] M. R. Rodrigues, F. Pérez-Cruz, and S. Verdú, “Multiple-input multiple-output Gaussian channels: Optimal covariance for non-Gaussian inputs,” in Proc. IEEE Inf. Theory Workshop (ITW), 2008, pp. 445–449.
[46] Y. Wu, C.-K. Wen, C. Xiao, X. Gao, and R. Schober, “Linear MIMO precoding in jointly-correlated fading multiple access channels with finite alphabet signaling,” in Proc. IEEE Int. Conf. Commun. (ICC), 2014, pp. 5306–5311.
[47] M. Wang, W. Zeng, and C. Xiao, “Linear precoding for MIMO multiple access channels with finite discrete inputs,” IEEE Trans. Wireless Commun., vol. 10, no. 11, pp. 3934–3942, 2011.
[48] Y. Wu, C.-K. Wen, C. Xiao, X. Gao, and R. Schober, “Linear precoding for the MIMO multiple access channel with finite alphabet inputs and statistical CSI,” IEEE Trans. Wireless Commun., vol. 14, no. 2, pp. 983–997, 2015.
[49] A. Lozano, A. M. Tulino, and S. Verdú, “Optimum power allocation for parallel Gaussian channels with arbitrary input distributions,” IEEE Trans. Inf. Theory, vol. 52, no. 7, pp. 3033–3051, 2006.
[50] D. Guo, S. Shamai, and S. Verdú, “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1261–1282, 2005.
[51] D. P. Palomar and S. Verdú, “Gradient of mutual information in linear vector Gaussian channels,” IEEE Trans. Inf. Theory, vol. 52, no. 1, pp. 141–154, 2006.
[52] E. Björnson, J. Hoydis, M. Kountouris, and M. Debbah, “Hardware impairments in large-scale MISO systems: Energy efficiency, estimation, and capacity limits,” in Digital Signal Processing (DSP), 2013 18th International Conference on. IEEE, 2013, pp. 1–6.
[53] U. Gustavsson, C. Sanchéz-Perez, T. Eriksson, F. Athley, G. Durisi, P. Landin, K. Hausmair, C. Fager, and L. Svensson, “On the impact of hardware impairments on massive MIMO,” in Globecom Workshops (GC Wkshps), 2014. IEEE, 2014, pp. 294–300.
[54] J. Vieira, S. Malkowsky, K. Nieman, Z. Miers, N. Kundargi, L. Liu, I. Wong, V. Öwall, O. Edfors, and F. Tufvesson, “A flexible 100-antenna testbed for massive MIMO,” in Globecom Workshops (GC Wkshps), 2014. IEEE, 2014, pp. 287–293.
[55] S. Malkowsky, J. Vieira, L. Liu, P. Harris, K. Nieman, N. Kundargi, I. Wong, F. Tufvesson, V. Öwall, and O. Edfors, “The world’s first real-time testbed for massive MIMO: Design, implementation, and validation,” IEEE Access, 2017.
[56] ICT-317669 METIS Project, “Scenarios, requirements and KPIs for 5G mobile and wireless system,” Del. D1.1, May 2013.
[57] M. Tao and P. Y. Kam, “Analysis of differential orthogonal space–time block codes over semi-identical MIMO fading channels,” IEEE Trans. Commun., vol. 55, no. 2, pp. 282–291, 2007.
[58] E. Biglieri, R. Calderbank, A. Constantinides, A. Goldsmith, A. Paulraj, and H. V. Poor, MIMO wireless communications. Cambridge University Press, 2007.
[59] C.-N. Chuah, D. N. C. Tse, J. M. Kahn, and R. A. Valenzuela, “Capacity scaling in MIMO wireless systems under correlated fading,” IEEE Trans. Inf. Theory, vol. 48, no. 3, pp. 637–650, 2002.
[60] C. Oestges, B. Clerckx, M. Guillaud, and M. Debbah, “Dual-polarized wireless communications: from propagation models to system performance evaluation,” IEEE Trans. Wireless Commun., vol. 7, no. 10, 2008.
[61] S. L. Loyka, “Channel capacity of MIMO architecture using the exponential correlation matrix,” IEEE Commun. Lett., vol. 5, no. 9, pp. 369–371, 2001.
[62] T. Yoo and A. Goldsmith, “Capacity and power allocation for fading MIMO channels with channel estimation error,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 2203–2214, 2006.
[63] L. Musavian, M. R. Nakhai, M. Dohler, and A. H. Aghvami, “Effect of channel uncertainty on the mutual information of MIMO fading channels,” IEEE Trans. Veh. Technol., vol. 56, no. 5, pp. 2798–2806, 2007.
[64] T. E. Klein and R. G. Gallager, “Power control for the additive white Gaussian noise channel under channel estimation errors,” in Proc. IEEE Int. Symp. Inf. Theory, 2001, p. 304.
[65] L. Liu and J.-F. Chamberland, “On the effective capacities of multiple-antenna Gaussian channels,” in Proc. IEEE Int. Symp. Inf. Theory, Toronto, Canada, 2008, pp. 2583–2587.
[66] M. Medard, “The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel,” IEEE Trans. Inf. Theory, vol. 46, no. 3, pp. 933–946, 2000.
[67] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links?” IEEE Trans. Inf. Theory, vol. 49, no. 4, pp. 951–963, 2003.
[68] S. Akin and M. Fidler, “On the transmission rate strategies in cognitive radios,” IEEE Trans. Wireless Commun., vol. 15, no. 3, pp. 2335–2350, Mar. 2016.
[69] D. P. Bertsekas, R. G. Gallager, and P. Humblet, Data networks. Prentice-Hall International New Jersey, 1992, vol. 2.
[70] A. Goldsmith, Wireless communications. Cambridge University Press, 2005.
[71] C.-S. Chang, Performance Guarantees in Communication Networks. Springer-Verlag, 2000.
[72] Y. Jiang and Y. Liu, Stochastic network calculus. Springer, 2008, vol. 1.
[73] M. Fidler and A. Rizk, “A guide to the stochastic network calculus,” IEEE Commun. Surveys Tuts., vol. 17, no. 1, pp. 92–105, 2015.
[74] K. Lee, “Performance bounds in communication networks with variable-rate links,” in ACM SIGCOMM Comput. Commun. Review, vol. 25, no. 4. ACM, 1995, pp. 126–136.
[75] M. Fidler, R. Lübben, and N. Becker, “Capacity–delay–error boundaries: A composable model of sources and systems,” IEEE Trans. Wireless Commun., vol. 14, no. 3, pp. 1280–1294, 2015.
[76] S. Akin and M. Fidler, “Backlog and delay reasoning in HARQ system,” in Proc. 27th Int. Teletraffic Congress (ITC 27), 2015, pp. 185–193.
[77] U. Fano, “Ionization yield of radiations. II. The fluctuations of the number of ions,” Physical Review, vol. 72, no. 1, p. 26, 1947.
[78] D. Tse and P. Viswanath, Fundamentals of wireless communication. Cambridge University Press, 2005.
[79] V. V. Prelov and S. Verdú, “Second-order asymptotics of mutual information,” IEEE Trans. Inf. Theory, vol. 50, no. 8, pp. 1567–1580, 2004.
[80] R. A. Horn and C. R. Johnson, Matrix analysis. Cambridge University Press, 2012.
[81] C.-K. Wen, P. Ting, and J.-T. Chen, “Asymptotic analysis of MIMO wireless systems with spatial correlation at the receiver,” IEEE Trans. Commun., vol. 54, no. 2, pp. 349–363, 2006.

Marwan Hammouda received the B.S. degree in communications and control engineering from the Islamic University of Gaza, Gaza, Palestine in 2007, and the M.S. degree in communications, systems and electronics from Jacobs University Bremen, Germany in 2012. Since November 2012, he has been with the Institute of Communications Technology at Leibniz Universität Hannover, Hanover, Germany as a research assistant working toward the Ph.D. degree in electrical engineering. His research interests mainly include the application of signal processing and information theory in wireless radio frequency (RF) systems and visible light communications (VLC), with emphasis on resource allocation in wireless communications and networks under quality of service (QoS) constraints.

Sami Akın received the B.S. degree in electrical and electronics engineering from Bogazici University, Istanbul, Turkey in 2005, and the Ph.D. degree in electrical engineering from the University of Nebraska-Lincoln, Lincoln, NE, US in 2011. Since December 2011, he has been with the Institute of Communications Technology at Leibniz Universität Hannover, Hanover, Germany as a research scientist. He was the technical group leader of the Cognitive Radio for Audio Systems (CoRAS) project funded by Lower Saxony Ministry of Science and Culture, and worked in the Towards a Unified Information and Queueing Theory (UnIQue) project funded by the European Research Council (ERC) Starting Grant. Currently, he is a research member of the Feedback-Less Machine-Type Communications (FeeLMaTyC) project funded by the German Research Foundation (DFG). His research interests are in the general areas of wireless communications, signal processing, information theory, queueing theory, network calculus, and energy harvesting with a focus on wireless communications and networks under quality of service (QoS) constraints.

M. Cenk Gursoy received the B.S. degree with high distinction in electrical and electronics engineering from Bogazici University, Istanbul, Turkey, in 1999 and the Ph.D. degree in electrical engineering from Princeton University, Princeton, NJ, in 2004. He was a recipient of the Gordon Wu Graduate Fellowship from Princeton University between 1999 and 2003. In the summer of 2000, he worked at Lucent Technologies, Holmdel, NJ, where he conducted performance analysis of DSL modems. Between 2004 and 2011, he was a faculty member in the Department of Electrical Engineering at the University of Nebraska-Lincoln (UNL). He is currently a Professor in the Department of Electrical Engineering and Computer Science at Syracuse University. His research interests are in the general areas of wireless communications, information theory, communication networks, and signal processing. He is a member of the editorial boards of IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, IEEE TRANSACTIONS ON COMMUNICATIONS, and IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY. He served as an editor for IEEE COMMUNICATIONS LETTERS between 2012 and 2014 and for IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS - Series on Green Communications and Networking (JSAC-SGCN) between 2015 and 2016. He also served as a Co-Chair of the 2017 Communication QoS and System Modeling Symposium, International Conference on Computing, Networking and Communications (ICNC). He received an NSF CAREER Award in 2006. More recently, he received the IEEE PIMRC’17 Best Paper Award, EURASIP Journal of Wireless Communications and Networking Best Paper Award, the UNL College Distinguished Teaching Award, and the Maude Hammond Fling Faculty Research Fellowship. He is a Senior Member of IEEE, and is the Aerospace/Communications/Signal Processing Chapter Co-Chair of IEEE Syracuse Section.

Jürgen Peissig received the Diploma and the Ph.D. degrees in physics from the III. Physical Institute of University of Göttingen, Germany, in 1988 and 1992, respectively, in digital signal processing with application in acoustics and hearing aids. After a stay at Bell Laboratories, Murray Hill, NJ, USA in 1991, he worked as research assistant and lecturer at the universities of Göttingen and Oldenburg, Germany. In 1995, he joined Sennheiser electronic Germany, where he became responsible for the digital signal processing group in R &&