High-Dimensional CSI Acquisition in Massive MIMO: Sparsity-Inspired Approaches

Abstract

Massive MIMO has been regarded as one of the key technologies for 5G wireless networks, as it can significantly improve both the spectral efficiency and energy efficiency. The availability of high-dimensional channel side information (CSI) is critical for its promised performance gains, but the overhead of acquiring CSI may potentially deplete the available radio resources. Fortunately, it has recently been discovered that harnessing various sparsity structures in massive MIMO channels can lead to significant overhead reduction, and thus improve the system performance. This paper presents and discusses the use of sparsity-inspired CSI acquisition techniques for massive MIMO, as well as the underlying mathematical theory. Sparsity-inspired approaches for both frequency-division duplexing and time-division duplexing massive MIMO systems will be examined and compared from an overall system perspective, including the design trade-offs between the two duplexing modes, computational complexity of acquisition algorithms, and applicability of sparsity structures. Meanwhile, some future prospects for research on high-dimensional CSI acquisition to meet practical demands will be identified.



Juei-Chin Shen, Member, IEEE, Jun Zhang, Member, IEEE, Kwang-Cheng Chen, Fellow, IEEE, and Khaled B. Letaief, Fellow, IEEE


J.-C. Shen, J. Zhang, and K. B. Letaief are with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong (E-mail: {eejcshen, eejzhang, eekhaled}@ust.hk). K.-C. Chen is with the Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan (E-mail: [email protected]).

May 5, 2015 DRAFT

Index Terms

Massive MIMO, channel estimation, pilot contamination, pilot sequences, sparsity, compressed sensing, $\ell_1$ minimization.

I. INTRODUCTION

Massive MIMO systems promise to boost spectral efficiency by more than one order of magnitude [1], [2]. Full benefits of massive MIMO, however, will never come to fruition without the base stations (BSs) having adequate channel knowledge, and acquiring it appears to be an extremely challenging task [3]. The challenges posed by MIMO channels of very high dimension are confronted in both frequency-division duplexing (FDD) and time-division duplexing (TDD) massive MIMO systems. In the FDD mode, both the pilot-aided training overhead and the feedback overhead for CSI acquisition grow proportionally with the BS antenna size. However, the proportion of radio resources allocated to CSI acquisition is severely restricted by the channel coherence period. The situation is made worse in an environment with high user equipment (UE) mobility.

Fig. 1: Pilot reuse in multiple cells. (a) FDD downlink training. (b) TDD uplink training.

In view of this, a considerable research effort has been devoted to TDD massive MIMO by exploiting channel reciprocity. Although the training overhead for TDD operation becomes proportional to the number of active UEs rather than that of BS antennas, the inevitable reuse of the same pilot in neighboring cells can seriously degrade the quality of obtained channel knowledge. This is because the channels to UEs in adjacent cells that share the same pilot will be collectively acquired by the BS. In other words, the desired channel obtained by the BS will be contaminated by interference channels. Once this contaminated channel knowledge is utilized for transmitting or receiving data, intercell interference occurs immediately and hence limits the achievable performance. This problem, known as pilot contamination, cannot be circumvented simply by adding more BS antennas.

Several attempts have been made to tackle the challenges of acquiring high-dimensional CSI in massive MIMO. For instance, in [4], open/closed loop training that utilizes temporal and spatial channel statistics is proposed to reduce the amount of downlink training overhead. For mitigating pilot contamination, the optimal design of precoding matrices aimed at minimizing the square errors caused by pilot reuse has shown its superiority over linear precoding [5]. Thanks to the recent advances in compressed sensing [6], [7], sparse signal processing has attracted much attention in such high-dimensional settings, and it has also demonstrated its power in CSI acquisition in terms of reconstructing CSI from a limited number of channel measurements.
Various sparsity structures exhibited by massive MIMO channels have recently been identified, thereby motivating the development of new strategies for CSI acquisition. Surprisingly, not only can high training overhead be reduced, but pilot contamination can also be resolved by appealing to sparsity-inspired approaches.

In this paper, we provide a comprehensive overview of the state-of-the-art research on sparsity-inspired approaches for high-dimensional CSI acquisition. In Section II, the challenges in FDD and TDD massive MIMO are reviewed in detail, including a rarely mentioned issue of FDD pilot contamination. On the basis of different sparsity structures, a variety of methods for either achieving overhead reduction or alleviating the effects of pilot reuse are examined and compared in Section III. Finally, concluding remarks are made in Section IV.

Notations: $\mathbb{C}$: complex numbers, $\Re$: real part, $\|\cdot\|_p$: $p$-norm, $(\cdot)'$: transpose, $(\cdot)^H$: Hermitian transpose, $\mathbf{I}_N$: $N \times N$ identity matrix, $\mathcal{N}(\cdot,\cdot)$: normal distribution, $\mathbb{E}[\cdot]$: expectation, $\mathbf{0}$: zero vector, $\mathrm{card}(\cdot)$: cardinality, $\mathrm{supp}(\cdot)$: the set of indices of non-zero elements, $\mathrm{Var}(\cdot)$: variance, $\max\{\cdot\}$: the maximum element, $\mathrm{vec}(\cdot)$: vectorization, $\otimes$: Kronecker product, $\succeq$: matrix inequality, $(\cdot)^{\dagger}$: pseudo inverse.

II. CHALLENGES OF HIGH-DIMENSIONAL CSI ACQUISITION

In massive MIMO systems with high-dimensional channels, CSI acquisition at BSs is a fundamentally challenging problem. In FDD massive MIMO, performing this task consumes a considerable amount of radio resources, which is proportional to the dimension of channels. On the other hand, in TDD-mode operation, it is hard to ensure the orthogonality of pilot sequences in the multicell scenario as the number of overall UEs becomes large. As a result, the inevitable reuse of correlated pilot sequences in different cells, known as pilot contamination, causes capacity-limiting intercell interference.

To illustrate these difficulties further, we will consider a massive MIMO network consisting of $L$ hexagonal cells. In each cell, there is a BS equipped with an $M$-element linear array, serving $K$ single-antenna UEs. The channel between BS $i$ and UE $k$ in cell $j$ is denoted by the $M \times 1$ vector $\mathbf{h}_{i,j,k}$. The BS antenna size is assumed to be much larger than the number of served UEs.

A. FDD Massive MIMO

In the FDD mode, obtaining CSI at BSs is normally performed in two steps. First, each BS sends a downlink training matrix to its served UEs. Second, each UE estimates the desired channel based on the downlink measurements and feeds back acquired CSI through dedicated uplink feedback channels.

During downlink training, UE $k$ in cell $i$ receives channel measurements

$$\mathbf{y}^{\mathrm{DL}}_{i,k} = \mathbf{S}^{\mathrm{DL}}_{i} \mathbf{h}_{i,i,k} + \sum_{l \neq i} \mathbf{S}^{\mathrm{DL}}_{l} \mathbf{h}_{l,i,k} + \mathbf{z}^{\mathrm{DL}}_{i,k}, \tag{1}$$

where $\mathbf{S}^{\mathrm{DL}}_{l}$ denotes the $N \times M$ pilot training matrix used in cell $l$ and $\mathbf{z}^{\mathrm{DL}}_{i,k}$ is the additive noise. The first term on the right-hand side (RHS) represents the desired channel measurements, and the second term results from intercell interference. Even without considering the impact of intercell interference, the training overhead $N$ required for conventional least-squares (LS) or minimum mean square error (MMSE) estimators to achieve a reasonable performance level still scales linearly with the BS antenna size. Taking intercell interference into account increases the training overhead further. The explicit expressions of the optimal pilot training matrices ($N \geq M$) are provided in [8] for single-cell networks. In [9], the optimal design of training matrices for multicell MIMO-OFDM systems is considered.

For simplicity, the assumption of employing linear arrays is made. However, most of the results discussed in this paper can be generalized to include the cases of using planar or cylindrical arrays.
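The linear scaling of the LS training overhead with the antenna count can be illustrated numerically. The following is a minimal sketch, assuming a noiseless single-cell version of (1) with an i.i.d. complex Gaussian training matrix; the helper names and the value of `M` are illustrative choices and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 32  # number of BS antennas (illustrative value)

def complex_gaussian(shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h = complex_gaussian(M)  # channel to be estimated

def ls_estimate(N):
    """Noiseless single-cell training y = S h with N pilot symbols, followed by LS."""
    S = complex_gaussian((N, M))
    y = S @ h
    h_hat, _, rank, _ = np.linalg.lstsq(S, y, rcond=None)
    return h_hat, rank

h_full, rank_full = ls_estimate(N=M)          # N = M: square, almost surely invertible
h_short, rank_short = ls_estimate(N=M // 2)   # N < M: under-determined system

err_full = np.linalg.norm(h_full - h) / np.linalg.norm(h)
err_short = np.linalg.norm(h_short - h) / np.linalg.norm(h)
print(f"N = {M}: rank {rank_full}, relative error {err_full:.2e}")
print(f"N = {M // 2}: rank {rank_short}, relative error {err_short:.2e}")
```

With $N < M$ the system is under-determined, so `lstsq` returns only the minimum-norm solution and a large part of the channel stays unobserved. This is the gap that the sparsity-based methods of Section III aim to close.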


What makes the situation worse is that typical feedback channels are finite-rate. This implies that only quantized versions of channel estimates can be fed back to BSs. If there are predefined codebooks consisting of precoding vectors, then the index of the optimal codebook vector is required to be sent back [10], [11]. However, either the amount of quantized CSI or the size of codebooks increases in proportion to the number of BS antennas, which in turn makes these two limited feedback techniques impractical in FDD massive MIMO.

Note that when the same training matrix is repeatedly used in multiple cells, i.e., $\mathbf{S}^{\mathrm{DL}}_{1} = \cdots = \mathbf{S}^{\mathrm{DL}}_{L}$, this can be regarded as pilot contamination in FDD massive MIMO. As a result of such contamination, as shown in Fig. 1(a), BS $i$ will acquire the composite channel $\sum_{l=1}^{L} \mathbf{h}_{l,i,k}$ rather than the desired channel $\mathbf{h}_{i,i,k}$, provided that the feedback channel is error-free and the additive noise is ignored. Despite this fact, utilizing this composite CSI to form a precoding vector and transmit signals at BS $i$ will not cause serious interference to UEs in the neighboring cells. For instance, given that maximum ratio transmission (MRT) precoding is employed, the transmitted signal from BS $i$ can be expressed as $\mathbf{x}_{i} = \sum_{k=1}^{K} \mathbf{w}^{\mathrm{FDD}}_{i,k} x_{i,k}$, where $x_{i,k}$ is the signal intended for UE $k$ within the cell, and $\mathbf{w}^{\mathrm{FDD}}_{i,k} = \sum_{l=1}^{L} (\mathbf{h}^{H}_{l,i,k})'$ denotes the MRT precoding vector. During the downlink transmission phase, the received interference at UE $m$ in cell $j$ due to BS $i$ is given by

$$I_{i,j,m} = \mathbf{h}'_{i,j,m} \mathbf{x}_{i} = \sum_{k=1}^{K} \sum_{l=1}^{L} \mathbf{h}^{H}_{l,i,k} \mathbf{h}_{i,j,m} x_{i,k}. \tag{2}$$

When the number of BS antennas grows without limit, the channel vectors are asymptotically orthogonal. Thus, the channel products $\mathbf{h}^{H}_{l,i,k} \mathbf{h}_{i,j,m}$ approach zero and so does the interference $I_{i,j,m}$. In other words, intercell interference caused by pilot contamination diminishes asymptotically with increasing BS antenna size.
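The asymptotic orthogonality invoked above can be checked with a small Monte Carlo experiment. This is a sketch under assumptions that are not stated in the text: channels are drawn as independent i.i.d. Rayleigh-fading vectors, and the inner product is normalized by $M$ so that it is comparable to the per-antenna channel energy.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_normalized_inner_product(M, trials=200):
    """Average of |h^H g| / M over independent CN(0, I_M) channel pairs."""
    h = (rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M))) / np.sqrt(2)
    g = (rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M))) / np.sqrt(2)
    inner = np.sum(np.conj(h) * g, axis=1)  # h^H g for each trial
    return np.mean(np.abs(inner)) / M

results = {M: mean_normalized_inner_product(M) for M in (16, 256, 4096)}
for M, value in results.items():
    print(f"M = {M:5d}: mean |h^H g| / M = {value:.4f}")
```

Under this model the normalized cross term shrinks roughly as $1/\sqrt{M}$, which is the behavior that makes the interference in (2) negligible for large arrays.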
The vanishing of this interference implies that there is no need to mitigate intercell interference by making training matrices distinct from each other in the asymptotic regime. Hence, the existing literature rarely addresses the issue of pilot contamination in FDD massive MIMO.

Note that uplink training in the FDD mode is not considered here, for the following reason. The uplink CSI is mainly utilized for data acquisition in a multiple-access channel, instead of a broadcast channel. This means that more advanced signal processing techniques, such as blind multiuser detection, can be applied at the BS side. Thus, pilot-aided training may not be the best choice, and CSI acquisition is not necessarily separated from data acquisition.

B. TDD Massive MIMO

Making massive MIMO operate in the TDD mode is a promising way to circumvent the identified difficulties in the FDD mode. Owing to channel reciprocity in the TDD mode, the CSI obtained via uplink training can be utilized for downlink transmission. More importantly, the cost of uplink training now increases linearly with the number of active UEs rather than that of BS antennas. Typically, obtaining accurate CSI requires each UE to transmit an orthogonal pilot sequence to its serving BS. However, the number of available orthogonal pilot sequences is limited by the ratio of the channel coherence interval to the channel delay spread [12], which may be small due to the mobility of UEs or adverse physical environments. When the number of overall UEs becomes large, the situation of using non-orthogonal pilot sequences, known as pilot contamination, inevitably arises. A consequence of pilot contamination is intra- and inter-cell interference.

Fig. 2: The number ($K = 3l$) of admissible UEs versus pilot sequence length ($\tau$) for the GWBE, WBE, and FOS schemes, given a fixed SINR-requirement pattern with $\gamma_{(l+1) \sim 2l} = 1$ and $\gamma_{(2l+1) \sim 3l} = 3$; the requirement $\gamma_{1 \sim l}$ of the first group is a fraction whose digits are missing in this copy (from [13]).

During the uplink training phase, the received signal at the $i$th BS is given by

$$\mathbf{Y}^{\mathrm{UL}}_{i} = \sum_{l=1}^{L} \mathbf{S}^{\mathrm{UL}}_{l} \mathbf{H}_{i,l} + \mathbf{Z}^{\mathrm{UL}}_{i}, \tag{3}$$

where $\mathbf{H}_{i,l} = [\mathbf{h}_{i,l,1}, \ldots, \mathbf{h}_{i,l,K}]'$ consists of channel vectors from UEs in the $l$th cell to the $i$th BS, the columns of $\mathbf{S}^{\mathrm{UL}}_{l}$ form a set of $\tau \times 1$ pilot sequences $\{\mathbf{s}_{l,k}\}_{k=1}^{K}$, and $\mathbf{Z}^{\mathrm{UL}}_{i}$ denotes an additive noise matrix. To illustrate the case of intercell interference, assume that the same set of orthogonal pilot sequences is reused in each cell, i.e., $\mathbf{S}^{\mathrm{UL}}_{1} = \cdots = \mathbf{S}^{\mathrm{UL}}_{L}$ and $\mathbf{s}'_{l,k_1} \mathbf{s}_{l,k_2} = 0$ for $k_1 \neq k_2$, as shown in Fig. 1(b). Employing the LS estimator yields the channel estimate

$$\hat{\mathbf{H}}_{i,i} = \left[ (\mathbf{S}^{\mathrm{UL}}_{i})^{H} \mathbf{S}^{\mathrm{UL}}_{i} \right]^{-1} (\mathbf{S}^{\mathrm{UL}}_{i})^{H} \mathbf{Y}^{\mathrm{UL}}_{i} = \mathbf{H}_{i,i} + \sum_{l \neq i} \mathbf{H}_{i,l} + \left[ (\mathbf{S}^{\mathrm{UL}}_{i})^{H} \mathbf{S}^{\mathrm{UL}}_{i} \right]^{-1} (\mathbf{S}^{\mathrm{UL}}_{i})^{H} \mathbf{Z}^{\mathrm{UL}}_{i}, \tag{4}$$

where the rows of $\hat{\mathbf{H}}_{i,i}$ are given by $\hat{\mathbf{h}}_{i,i,k} = \sum_{l=1}^{L} \mathbf{h}_{i,l,k}$ when ignoring the noise. During downlink transmission, using estimates $\hat{\mathbf{h}}_{i,i,k}$ to form the transmit signal $\mathbf{x}_{i} = \sum_{k=1}^{K} \mathbf{w}^{\mathrm{TDD}}_{i,k} x_{i,k}$, where $\mathbf{w}^{\mathrm{TDD}}_{i,k} = \sum_{l=1}^{L} (\mathbf{h}^{H}_{i,l,k})'$ are MRT precoding vectors, will cause interference

$$I_{i,j,m} = \mathbf{h}'_{i,j,m} \mathbf{x}_{i} = \|\mathbf{h}_{i,j,m}\|_{2}^{2} \, x_{i,m} + \sum_{k \neq m \text{ or } l \neq j} \mathbf{h}^{H}_{i,l,k} \mathbf{h}_{i,j,m} x_{i,k} \tag{5}$$

to UE $m$ in cell $j$.
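The contamination mechanism behind (3) and (4) can be reproduced in a few lines. The sketch below assumes a noiseless setting, i.i.d. Rayleigh channels, and DFT-based orthonormal pilots reused in every cell; these choices, the dimensions, and the convention that cell 0 hosts the BS of interest are illustrative assumptions rather than details from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
L, K, M, tau = 3, 4, 16, 8  # cells, UEs per cell, BS antennas, pilot length (illustrative)

# Orthonormal pilots: the first K columns of a unitary tau x tau DFT matrix, reused in every cell.
S = np.fft.fft(np.eye(tau))[:, :K] / np.sqrt(tau)

# H[l] is the K x M matrix of channels from the UEs in cell l to the BS of interest.
H = (rng.standard_normal((L, K, M)) + 1j * rng.standard_normal((L, K, M))) / np.sqrt(2)

# Noiseless version of (3): every cell transmits the same pilot matrix S.
Y = sum(S @ H[l] for l in range(L))

# LS estimate as in (4).
H_hat = np.linalg.solve(S.conj().T @ S, S.conj().T @ Y)

contaminated = H.sum(axis=0)  # sum over l of H[l]
print("LS estimate equals the sum of all cells' channels:", np.allclose(H_hat, contaminated))
print("LS estimate equals the desired channel alone:", np.allclose(H_hat, H[0]))
```

The estimate coincides with the sum of the channels from all cells, not with the desired channel, and no increase in `M` changes that because the ambiguity lies in the pilot dimension.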
Though the second term on the RHS of (5) decreases with the increasing BS antenna size, the first term, which does not vanish, makes the received signal-to-interference-plus-noise ratio (SINR) at UE $m$ in cell $j$ converge to a limit and becomes the performance limiting factor.

The current investigation into TDD pilot contamination focuses on its impact on the received SINR or the sum rate when linear precoders/detectors are applied. However, very little is known about its impact on the system equipped with nonlinear precoders/detectors. A recent work [13] provides an interesting perspective on the user capacity of pilot-contaminated massive MIMO, which quantifies the maximum number of admissible UEs given their own SINR requirements. As shown in Fig. 2, the user capacity of three schemes of joint pilot design and transmit power allocation is fundamentally limited by the length of pilot sequences. For further details about pilot contamination in TDD massive MIMO, the study [14] and references therein should be consulted.

III. SPARSITY-INSPIRED CSI ACQUISITION

Despite the challenges imposed by the high dimensionality of channel matrices, a number of research efforts have sought to address them and have achieved reasonably efficient CSI acquisition. In particular, sparsity-inspired approaches have proved to be powerful tools, as presented below.

A. FDD Massive MIMO

1) The Joint CSI Recovery Method:

The authors of [15] proposed a method for low-overhead pilot training in the single-cell scenario, taking advantage of channel sparsity. Provided that a uniform linear array with critically spaced antennas is employed at the BS, the channel $\mathbf{h}_{k}$, where indices of BSs are discarded in the single-cell scenario, exhibits a sparse representation $\mathbf{h}^{a}_{k}$ in the angular domain, i.e.,

$$\mathbf{h}_{k} = \mathbf{U} \mathbf{h}^{a}_{k}, \tag{6}$$

where $\mathbf{U}$ is a discrete Fourier transform (DFT) matrix whose columns form an angular basis. The cardinality of $\mathrm{supp}(\mathbf{h}^{a}_{k})$ can be reasonably assumed to be much smaller than $M$ because of limited local scattering at the BS, whose antenna array is mounted higher than surrounding scatterers. Additionally, based on the results in [16], it has been argued that the channels to UEs are likely to share a partially common support in the angular domain, i.e., $\cap_{k=1}^{K} \mathrm{supp}(\mathbf{h}^{a}_{k}) = \Omega_{c}$. In order to utilize the channel sparsity and common support property simultaneously, channel measurements acquired at UEs are fed back to the serving BS via error-free feedback channels. Hence, a joint channel recovery problem can be formulated as follows:

$$\min_{\{\mathbf{h}_{k}, \forall k\}} \ \sum_{k=1}^{K} \left\| \mathbf{y}^{\mathrm{DL}}_{k} - \mathbf{S}^{\mathrm{DL}} \mathbf{h}_{k} \right\|_{2}^{2} \quad \text{s.t.} \quad \cap_{k=1}^{K} \mathrm{supp}(\mathbf{h}^{a}_{k}) = \Omega_{c}. \tag{7}$$

Using orthogonal matching pursuit (OMP) as a basis, a greedy algorithm has been proposed to efficiently solve this problem. The simulation results show that the required training overhead for this recovery algorithm can be significantly less than that for the conventional LS estimator. Moreover, the mean square error (MSE) performance improves with the increasing cardinality of $\Omega_{c}$.

One major concern about this joint recovery approach is the underlying assumption of perfect channel measurements being fed back. As practical feedback channels are rate-limited, it is more reasonable to assume quantized measurements at the BS. The impact of quantization on the channel recovery performance requires further investigation. On the other hand, it has been suggested that the amount of channel measurements needed at the BS should be adaptively adjusted according to the sensitivity of the system performance to the CSI inaccuracy [17]. Furthermore, there has been little quantitative analysis of the required training overhead against the channel sparsity level. This quantification is much needed, as it would make it possible to measure the achievable training overhead reduction without relying on time-consuming simulations.

Note on Fig. 2: The pilot sequences employed in the GWBE, WBE, and FOS schemes are respectively generalized Welch bound equality (GWBE) sequences, WBE sequences, and finite orthogonal sequences (FOS) whose correlation among sequences is either 1 or 0. The same downlink power allocation, $P_{i} \propto \gamma_{i} / (1 + \gamma_{i})$, is used in the three schemes.
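To make the role of angular-domain sparsity concrete, the following sketch recovers a single UE's channel from fewer than $M$ measurements using plain OMP. It is not the joint multi-user greedy algorithm of [15]: the common-support constraint in (7) is ignored, the measurements are noiseless, the sparsity level is assumed known, and all dimensions and helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, s = 128, 64, 3  # BS antennas, training length, angular sparsity (illustrative)

U = np.fft.fft(np.eye(M)) / np.sqrt(M)  # unitary DFT matrix as the angular basis in (6)
S = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2 * N)

# Sparse angular-domain channel with unit-modulus non-zero entries.
support = np.sort(rng.choice(M, size=s, replace=False))
h_a = np.zeros(M, dtype=complex)
h_a[support] = np.exp(2j * np.pi * rng.random(s))
h = U @ h_a
y = S @ h  # noiseless downlink measurements at one UE

def omp(A, y, sparsity):
    """Plain orthogonal matching pursuit for y = A x with a known sparsity level."""
    residual = y.copy()
    chosen = []
    for _ in range(sparsity):
        correlations = np.abs(A.conj().T @ residual)
        if chosen:
            correlations[chosen] = 0.0  # never pick the same column twice
        chosen.append(int(np.argmax(correlations)))
        coeffs, *_ = np.linalg.lstsq(A[:, chosen], y, rcond=None)
        residual = y - A[:, chosen] @ coeffs
    x = np.zeros(A.shape[1], dtype=complex)
    x[chosen] = coeffs
    return x, sorted(chosen)

h_a_hat, support_hat = omp(S @ U, y, s)
h_hat = U @ h_a_hat
print("true support:     ", support.tolist())
print("estimated support:", support_hat)
print("relative error:   ", np.linalg.norm(h_hat - h) / np.linalg.norm(h))
```

Here the training length is half the antenna count, a regime in which the LS estimator has no unique solution. A joint variant would additionally share the selected indices across UEs, which is where the common support $\Omega_{c}$ enters.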

2) The Weighted $\ell_1$ Minimization Method:

Considering a similar single-cell scenario, the study in [18] has drawn attention to utilizing partial support information of sparse massive MIMO channels, which is a collection of indices of significant entries of channel vectors in the angular domain. The main advantage of using partial support information is the possibility of achieving a remarkable training overhead reduction. Specifically, the order of the required overhead decreases from $O(s \log M)$ to $O(s)$, where $s = \mathrm{card}[\mathrm{supp}(\mathbf{h}^{a}_{k})]$ is the channel sparsity level. Assume that the partial support information $\hat{T}_{k}$ of channel $\mathbf{h}^{a}_{k}$ is available at UE $k$, where $\mathrm{card}(\hat{T}_{k}) = \hat{s}$ and $\mathrm{card}[\mathrm{supp}(\mathbf{h}^{a}_{k}) \cap \hat{T}_{k}]$ is given by $\lfloor \alpha \hat{s} \rfloor$. The higher the factor $\alpha$, the higher the accuracy level of partial support information. Based on a weighted $\ell_1$ minimization framework, the channel recovery is performed as follows:

$$\min_{\hat{\mathbf{h}}^{a}_{k} \in \mathbb{C}^{M}} \ \left\| \hat{\mathbf{h}}^{a}_{k} \right\|_{1,\mathbf{w}} \quad \text{subject to} \quad \left\| \mathbf{S}^{\mathrm{DL}} \mathbf{U} \hat{\mathbf{h}}^{a}_{k} - \mathbf{y}^{\mathrm{DL}}_{k} \right\|_{2} \leq \epsilon, \quad \text{with } w_{i} = \begin{cases} 1, & i \notin \hat{T}_{k}, \\ 0, & i \in \hat{T}_{k}, \end{cases} \tag{8}$$

where $\mathbf{S}^{\mathrm{DL}} \in \mathbb{C}^{N \times M}$ is designed to be a Gaussian random matrix of independent complex normal entries, the noise $\mathbf{z}^{\mathrm{DL}}_{k}$ is assumed to be upper bounded, i.e., $\|\mathbf{z}^{\mathrm{DL}}_{k}\|_{2} \leq \epsilon$, and $\|\hat{\mathbf{h}}^{a}_{k}\|_{1,\mathbf{w}} = \sum_{i=1}^{M} w_{i} |\hat{h}^{a}_{k}[i]|$. In the objective function, the entries that are expected to be zero are weighted more heavily than others. The results show a significant improvement over the method without using partial support information when the accuracy level $\alpha$ exceeds a certain threshold. Moreover, taking a convex geometry approach, the authors have successfully and precisely quantified the required training overhead for achieving a certain percentage of exact recovery. Exact recovery is declared if the error $\|\hat{\mathbf{h}}^{a}_{k} - \mathbf{h}^{a}_{k}\|_{2}$ falls below a small fixed threshold. As shown in Fig. 3, the analytical curves of $\alpha = 0.2$ and $\alpha = 0.8$ can accurately depict the empirical phase transition curves of 60% exact recovery and 55% exact recovery, respectively.

Fig. 3: Phase transition curves of (8) over different values of $\alpha$ (number of measurements versus sparsity level $s$; analytical and empirical curves for 60% recovery at $\alpha = 0.2$ and for 55% recovery at $\alpha = 0.8$), given $M = 100$, $\hat{s} = 10$, $\mathbf{z}^{\mathrm{DL}}_{k} = \mathbf{0}$, and $\epsilon = 0$ (from [18]).

Unlike the previous method, here, channel measurements are not fed back to the BS. In other words, it avoids the assumption of error-free feedback channels. However, it raises another issue of storing random matrices at UEs with limited memory. Also, performing convex optimization can impose a stringent computation requirement on UEs unless low-complexity solutions are sought. Several attempts have been made to design practical training matrices. In [19], Toeplitz-structured training matrices, suggested for realistic implementation, are shown to perform comparably to Gaussian random matrices while requiring fewer independent random variables to be generated. A deterministic approach to the training matrix design was first considered in [20] by appealing to matrix properties such as mutual coherence. More advanced deterministic training matrices are developed in [21] to yield higher recovery accuracy. In the context of FDD massive MIMO, it would be interesting to invent structurally random or deterministic training matrices that take partial support information of channels to multiple UEs into consideration. In addition, similar concepts of using prior channel knowledge to lower training overhead can be found in [4], where spatial and temporal correlations are harnessed. More study is needed to better understand how to integrate all the relevant prior knowledge into efficient CSI acquisition.

B. TDD Massive MIMO

As mentioned in Sec. II-B, employing uplink training to obtain high-dimensional downlink CSI results in undesired pilot contamination, and the following are some efforts to address this issue.

1) The Coordinated MMSE Method:

Contradicting conventional wisdom, it has been shown that it is possible to mitigate pilot contamination using the linear MMSE estimator [22]. The key factor in determining the success of MMSE estimation is that each channel to the UE can be regarded as a linear combination of a finite number of steering vectors

$$\mathbf{h}_{i,j,k} = \frac{1}{\sqrt{P}} \sum_{p=1}^{P} \alpha_{i,j,k}(p) \, \mathbf{a}[\theta_{i,j,k}(p)], \tag{9}$$

where $P$ is the number of paths, $\alpha_{i,j,k}(p)$ are zero-mean path gains, and $\mathbf{a}[\theta_{i,j,k}(p)]$ denote the steering vectors due to angles of arrival (AoAs) $\theta_{i,j,k}(p)$. Consequently, the rank of the channel covariance matrix $\mathbf{R}_{i,j,k} \triangleq \mathbb{E}\{\mathbf{h}_{i,j,k} \mathbf{h}^{H}_{i,j,k}\}$ depends on the range $[\theta^{\min}_{i,j,k}, \theta^{\max}_{i,j,k}]$ in which the AoAs $\theta_{i,j,k}(p)$ lie, and it typically turns out to be low. Let us focus on the $k$th row of (4), i.e., $\hat{\mathbf{h}}_{i,i,k} = \sum_{l=1}^{L} \mathbf{h}_{i,l,k} + \mathbf{z}_{i,k}$. Based on it, the desired channel $\mathbf{h}_{i,i,k}$ can be further extracted by the MMSE estimator, i.e.,

$$\hat{\hat{\mathbf{h}}}_{i,i,k} = \mathbf{R}_{i,i,k} \left( \sigma_{z}^{2} \mathbf{I}_{M} + \sum_{l=1}^{L} \mathbf{R}_{i,l,k} \right)^{-1} \hat{\mathbf{h}}_{i,i,k}, \tag{10}$$

where the covariance matrix of $\mathbf{z}_{i,k}$ is assumed to be $\sigma_{z}^{2} \mathbf{I}_{M}$. When the range of AoAs due to interfering UEs that use the same pilot sequence does not overlap with the AoA range due to the desired UE, the estimate $\hat{\hat{\mathbf{h}}}_{i,i,k}$ approaches the desired $\mathbf{h}_{i,i,k}$ as the BS antenna size grows to infinity. This feature is highly attractive because the dimension of the BS antennas can be made as large as desired in massive MIMO. Moreover, the condition of non-overlapping AoA ranges can be satisfied if the reused pilot sequence is properly allocated to UEs in neighboring cells. A heuristic algorithm has been developed to perform pilot allocation in a coordinated manner. Another favorable feature of this method, recently demonstrated in [23], is that the asymptotically optimal estimate is obtainable whether uniform or non-uniform arrays are employed. As a result, BS antenna arrays are exempt from the requirement of high calibration accuracy.

The second-order statistics of high-dimensional channels have successfully been utilized to facilitate robust MMSE channel estimation under pilot contamination.
However, obtaining channel covariance matrices of high dimension imposes another challenge on the massive MIMO system. It would be interesting to know whether the low-rankness can help speed up the acquisition of channel covariance matrices. Furthermore, it is still unknown whether this covariance-matrix-aware method is sensitive to the inaccuracy of the second-order statistics. On the other hand, the information about AoAs can actually be extracted from statistical channel knowledge prior to commencing the instantaneous CSI acquisition [24]. In this case, the dimension of the parameter space of each channel shrinks to $P$, which can be significantly less than the original. Most importantly, this information could aid BSs in distinguishing between training signals from UEs using the same pilot.
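The effect of non-overlapping AoA ranges on the estimator (10) can be examined numerically. The sketch below assumes a half-wavelength uniform linear array, two cells sharing one pilot, perfectly known covariance matrices built from (9) with i.i.d. unit-variance path gains on a fixed AoA grid, and illustrative values for the AoA ranges, `M`, `P`, and the noise variance; none of these numbers come from [22].

```python
import numpy as np

rng = np.random.default_rng(4)
M, P, sigma2, trials = 64, 50, 0.1, 200  # antennas, paths, noise variance, Monte Carlo runs

def steering_matrix(theta_min_deg, theta_max_deg):
    """M x P matrix of ULA steering vectors (half-wavelength spacing) on a grid of AoAs."""
    theta = np.deg2rad(np.linspace(theta_min_deg, theta_max_deg, P))
    m = np.arange(M)[:, None]
    return np.exp(-1j * np.pi * m * np.cos(theta)[None, :])

A_desired = steering_matrix(60, 80)    # AoA range of the desired UE
A_interf = steering_matrix(110, 130)   # non-overlapping AoA range of the interfering UE

# Covariances implied by (9) with i.i.d. unit-variance path gains.
R_desired = A_desired @ A_desired.conj().T / P
R_interf = A_interf @ A_interf.conj().T / P

def cn(shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# MMSE filter of (10) for L = 2 cells sharing one pilot.
W = R_desired @ np.linalg.inv(sigma2 * np.eye(M) + R_desired + R_interf)

err_ls = err_mmse = 0.0
for _ in range(trials):
    h_d = A_desired @ cn(P) / np.sqrt(P)
    h_i = A_interf @ cn(P) / np.sqrt(P)
    h_ls = h_d + h_i + np.sqrt(sigma2) * cn(M)  # contaminated LS estimate
    err_ls += np.linalg.norm(h_ls - h_d) ** 2
    err_mmse += np.linalg.norm(W @ h_ls - h_d) ** 2

mse_ls = err_ls / (trials * M)
mse_mmse = err_mmse / (trials * M)
print(f"per-antenna MSE, contaminated LS: {mse_ls:.3f}")
print(f"per-antenna MSE, MMSE of (10):    {mse_mmse:.3f}")
```

Because the two covariance matrices occupy nearly disjoint angular subspaces, the filter suppresses most of the interfering channel. Making the two AoA ranges overlap in this sketch is a quick way to see the benefit shrink.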

2) The Quadratic Semidefinite Programming (SDP) Method:

It is suggested that a BS should collect CSI of both the desired links within the cell and interference links from its neighboring cells [25]. In other words, the CSI of interference links should not be regarded as irrelevant information. From this new angle, the expression (3) can be recast as

$$\mathbf{Y}^{\mathrm{UL}}_{i} = \mathbf{S}^{\mathrm{UL}} \mathbf{H}_{i} + \mathbf{Z}^{\mathrm{UL}}_{i}, \tag{11}$$

where $\mathbf{S}^{\mathrm{UL}} \triangleq [\mathbf{S}^{\mathrm{UL}}_{1}, \ldots, \mathbf{S}^{\mathrm{UL}}_{L}]$ and $\mathbf{H}_{i} \triangleq [\mathbf{H}'_{i,1}, \ldots, \mathbf{H}'_{i,L}]'$ is the full CSI of wireless links that should be recovered. Thus, the challenging issue becomes similar to that in FDD massive MIMO, i.e., how to reduce the required training overhead.

TABLE I: Comparison of Sparsity-Inspired CSI Acquisition Methods

| Methods | Sparsity Types | Pros | Cons |
|---|---|---|---|
| Joint CSI Recovery (FDD) | Sparse channel vectors & common supports | Jointly exploit sparsity & common-support property; perform channel recovery at the BS | UEs need to feed back perfect channel measurements |
| Weighted $\ell_1$ Minimization (FDD) | Sparse channel vectors & partial support information | Sharp estimate of the required training overhead; lower training overhead | Need to obtain partial support information |
| Coordinated MMSE (TDD) | Low-rank channel covariance matrices | Performance improves with increasing antenna size; lower training overhead | Need to obtain second-order channel statistics |
| Quadratic SDP (TDD) | Low-rank channel matrices | No need for knowledge of second-order channel statistics | Only suitable for poor scattering propagation environments; higher training overhead |
| Sparse Bayesian Learning (TDD) | Sparse channel vectors in the UE domain | No need for knowledge of second-order channel statistics | Channels are not jointly recovered; higher training overhead |

In poor scattering propagation environments, the rank of the channel matrix is equal to the number $r$ of the feasible AoAs $\theta_{i,j,k}(p)$ in (9), which is much smaller than $\min\{M, K \cdot L\}$. Based on this observation, a nuclear norm regularized problem can be formulated as

$$\min_{\mathbf{H}_{i}} \ \tfrac{1}{2} \left\| \mathrm{vec}(\mathbf{Y}^{\mathrm{UL}}_{i}) - \boldsymbol{\Psi} \, \mathrm{vec}(\mathbf{H}_{i}) \right\|_{2}^{2} + \gamma \left\| \mathbf{H}_{i} \right\|_{*}, \tag{12}$$

where $\boldsymbol{\Psi} = \mathbf{S}^{\mathrm{UL}} \otimes \mathbf{I}_{M}$ and $\gamma$ is a regularization factor. The sole purpose of adopting nuclear norm regularization is to minimize the sum of the matrix's singular values, thereby achieving rank minimization. The above problem has been further recast as a quadratic SDP problem

$$\min_{\mathbf{v}} \ \tfrac{1}{2} \mathbf{v}^{H} \mathbf{v} - \Re\left\{ [\mathrm{vec}(\mathbf{Y}^{\mathrm{UL}}_{i})]^{H} \mathbf{v} \right\} \quad \text{s.t.} \quad \begin{bmatrix} \gamma \mathbf{I}_{KL} & \mathrm{vec}^{-1}_{KL,M}\left( \boldsymbol{\Psi}^{H} \mathbf{v} \right) \\ \left[ \mathrm{vec}^{-1}_{KL,M}\left( \boldsymbol{\Psi}^{H} \mathbf{v} \right) \right]^{H} & \gamma \mathbf{I}_{M} \end{bmatrix} \succeq \mathbf{0}. \tag{13}$$

The solution $\mathbf{v}^{*}$ to this SDP problem determines the estimate of the channel matrix

$$\mathbf{H}^{*}_{i} = \mathrm{vec}^{-1}_{KL,M}\left\{ \boldsymbol{\Psi}^{\dagger} \left[ \mathrm{vec}(\mathbf{Y}^{\mathrm{UL}}_{i}) - \mathbf{v}^{*} \right] \right\}, \tag{14}$$

which can now be obtained efficiently, thanks to the readily available polynomial-time SDP solvers.

In the initial study of massive MIMO [26], the CSI of interference links at BSs is viewed as nonessential. This is because desired links and interference links are asymptotically orthogonal and, more importantly, intercell interference can be proved manageable with the CSI of desired links only. Here, we offer an explanation of why there is a need for acquiring the CSI of interference links in poor scattering environments. Consider that $\mathbf{H}_{i} = \mathbf{G}_{i} \mathbf{A}$, where $\mathbf{A} = [\mathbf{a}(\phi_{1}), \ldots, \mathbf{a}(\phi_{r})]'$ is an $r \times M$ matrix of full row rank with $r \ll \min\{M, KL\}$ due to poor scattering, and $\mathbf{G}_{i}$ consists of $KL \times r$ independent and identically distributed (i.i.d.) zero-mean channel gains. Then, we have

$$\lim_{M \to \infty} \mathbf{A} \mathbf{A}^{H} = \mathbf{I}_{r} \quad \text{and} \quad \lim_{M \to \infty} \mathbf{H}_{i} \mathbf{H}^{H}_{i} = \mathbf{G}_{i} \mathbf{G}^{H}_{i} \neq \mathbf{I}_{KL}, \tag{15}$$

which implies that the correlation among wireless links does not diminish with the increasing BS antenna size. In such a situation, it becomes crucial to obtain the full CSI of wireless links for effective interference management.
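As a rough illustration of nuclear-norm regularized channel recovery, the sketch below solves the matrix form of a problem like (12), written with a factor 1/2 on the data-fit term. It uses proximal gradient descent with singular value thresholding, which is a swapped-in technique chosen for brevity and not the quadratic SDP route of (13) and (14). The pilot matrix, noise level, dimensions, and the value of `gamma` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
KL, M, r, tau = 12, 32, 2, 10   # total UEs, BS antennas, rank, pilot length (illustrative)
gamma = 0.05                    # regularization factor (assumed value)

def cn(shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_true = cn((KL, r)) @ cn((r, M))        # rank-r channel matrix, H = G A
S = cn((tau, KL)) / np.sqrt(tau)         # non-orthogonal pilots since tau < KL
Y = S @ H_true + 0.01 * cn((tau, M))     # uplink observations as in (11)

def objective(H):
    """0.5 * ||Y - S H||_F^2 + gamma * ||H||_*."""
    fit = 0.5 * np.linalg.norm(Y - S @ H, "fro") ** 2
    return fit + gamma * np.linalg.norm(H, "nuc")

def svt(X, threshold):
    """Singular value thresholding: the proximal operator of threshold * nuclear norm."""
    U, sing, Vh = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(sing - threshold, 0.0)) @ Vh

step = 1.0 / np.linalg.norm(S, 2) ** 2   # 1 / Lipschitz constant of the gradient
H = np.zeros((KL, M), dtype=complex)
history = [objective(H)]
for _ in range(300):
    gradient = S.conj().T @ (S @ H - Y)
    H = svt(H - step * gradient, step * gamma)
    history.append(objective(H))

rank_est = int(np.sum(np.linalg.svd(H, compute_uv=False) > 1e-3))
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}, numerical rank of estimate: {rank_est}")
```

The thresholding step is what drives small singular values to zero and thereby favors low-rank estimates. An SDP solver applied to (13) would be an alternative route to a solution of the same regularized problem.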

3) The Sparse Bayesian Learning (SBL) Method:

Sharing the same perspective as the study [25], the work in [27] also considers acquiring the full CSI of wireless links and proposes a sparse Bayesian learning method to achieve this goal. Sparse Bayesian learning was first presented in [28] and has been shown to outperform some prevailing $\ell_1$ minimization algorithms [29]. The SBL method proceeds by first transforming the channel matrix into the angular domain via DFT as mentioned in the joint CSI recovery method, i.e., $\tilde{\mathbf{H}}_{i} = \mathbf{H}_{i} \mathbf{U}$. Interestingly, instead of taking advantage of the sparsity in the angular domain, the sparsity in the UE domain, which has been empirically shown to exist, is utilized. In other words, the column vectors of the channel matrix $\tilde{\mathbf{H}}_{i}$ are considered one by one. As each column vector consists of elements due to different UEs, the independence among elements can be reasonably assumed. This independence together with the sparsity in the UE domain leads to an effective Gaussian-mixture (GM) model which describes the joint distributions of the channel elements well. More surprisingly, empirical results show that only a few parameters of the GM model need to be determined. Therefore, the practical Bayes estimation can be implemented by evaluating marginal probability density functions via the approximate message passing (AMP) algorithm [30] and learning GM parameters by means of the expectation-maximization (EM) algorithm [31]. The numerical results show that this Bayesian method can achieve a significant reduction in estimation errors.
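To give a feel for how a Gaussian-mixture prior promotes sparsity, the sketch below evaluates the scalar posterior mean of a zero-mean GM-distributed variable observed in Gaussian noise, the kind of component-wise estimate that message-passing schemes rely on. It is a simplified real-valued illustration with known mixture parameters, not the AMP/EM procedure of [27]; the function name and all parameter values are assumptions.

```python
import numpy as np

def gm_posterior_mean(y, weights, variances, noise_var):
    """E[x | y] for y = x + n, x ~ sum_k weights[k] * N(0, variances[k]), n ~ N(0, noise_var)."""
    y = np.asarray(y, dtype=float)[..., None]
    weights = np.asarray(weights, dtype=float)
    total_var = np.asarray(variances, dtype=float) + noise_var
    # Posterior responsibility of each mixture component given y.
    log_resp = np.log(weights) - 0.5 * np.log(total_var) - 0.5 * y**2 / total_var
    log_resp -= log_resp.max(axis=-1, keepdims=True)  # numerical stabilization
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=-1, keepdims=True)
    # Each component contributes its own Wiener (linear MMSE) shrinkage of y.
    gains = np.asarray(variances, dtype=float) / total_var
    return np.sum(resp * gains * y, axis=-1)

# Sparse-like prior: 90% of entries are near zero, 10% are strong (assumed parameters).
weights, variances, noise_var = [0.9, 0.1], [1e-4, 1.0], 0.01
y = np.array([-2.0, -0.05, 0.0, 0.05, 2.0])
x_hat = gm_posterior_mean(y, weights, variances, noise_var)
print(np.round(x_hat, 4))
```

Small observations are attributed to the near-zero component and shrunk strongly, while large observations pass through almost unchanged. In a full scheme the mixture weights and variances would be learned from data, for example by EM, instead of being fixed in advance.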

The assumption of channel vectors being sparse in the UE domain may not hold when the UE dimension $KL$ is not large enough. A possible remedy for this situation is suggested in the following. First, it is desirable to understand if the GM model is also applicable for modeling distributions of sparse channel vectors in the angular domain. Second, as angular-domain channels are very likely to consist of a small number of block-wise non-zero segments resulting from few clusters of scatterers, it is eminently reasonable to assume some dependence among angular-domain channel elements. Hence, the distribution of the channel vector could be a mixture of Gaussian random vectors, and the original AMP and EM algorithms should be modified according to this new GM model.

C. Discussion and Comparison

In the previous subsections, several methods for efficient high-dimensional CSI acquisition have been discussed for massive MIMO communications. Table I provides a brief summary of the advantages and disadvantages of these methods. The table shows that each method utilizes a distinct sparsity structure. However, all sparsity structures considered in massive MIMO are based on the observation that angular-domain channels are sparse. As a result, the second-order statistics of massive MIMO channels inherit the sparsity structure, yielding low-rank channel covariance matrices. In addition, when sparse channels are examined collectively, they form either block-sparse or low-rank channel matrices. When the UE dimension is comparable to the channel dimension, sparsity in the angular domain also results in sparsity in the UE domain. On the basis of the aforementioned sparsity structures, different sparsity-inspired methods are developed either to reduce training overhead or to mitigate pilot contamination.

In FDD massive MIMO, without feeding back channel measurements to the BS side, fewer sparsity structures are available for developing efficient CSI acquisition methods. Despite this limitation, the weighted $\ell_1$ minimization method shows that further overhead reduction is feasible if partial support information can be obtained in advance and properly harnessed. Interestingly, by enabling the BS to gather perfect channel measurements from its served UEs, the joint CSI recovery method offers an effective way of utilizing sparsity structures across multiple UEs. If the performance superiority of this method still holds when rate-limited feedback channels are taken into account, it will establish that offloading CSI acquisition tasks to the BS is feasible and beneficial.

With regard to TDD massive MIMO, uplink training has more sparsity structures to utilize, as high-dimensional channels are jointly recovered at the BS side. It is worth noting that only low-rank channel covariance matrices have been used for pilot decontamination. Other sparsity structures, such as low-rank channel matrices and sparse UE-domain channels, have not been considered for mitigating the effects of pilot reuse. In this regard, there is still much room for innovation in sparsity-inspired pilot decontamination. It is also worth noting that using perfect covariance matrices of both desired channels and interference channels in the coordinated MMSE method has drawn criticism [32]. It would be intriguing to assess whether efficient algorithms exist for learning low-rank covariance matrices. If such algorithms are developed or identified, they should be integrated into the coordinated MMSE method.
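
To make the link between angular-domain sparsity and low-rank covariance matrices concrete, the following sketch builds the covariance matrix of an array whose incoming paths are confined to a narrow angular sector, and then counts how many eigenvalues hold almost all of the energy. The half-wavelength uniform linear array, the equal-gain uncorrelated paths, the function names, and the numbers (64 antennas, a sector of ±10°) are assumptions chosen for illustration only and do not come from the methods surveyed above.

```python
import numpy as np

def steering_vector(num_antennas, angle_rad):
    """Response of a half-wavelength-spaced uniform linear array."""
    n = np.arange(num_antennas)
    return np.exp(1j * np.pi * n * np.sin(angle_rad))

def sector_covariance(num_antennas, center_deg, spread_deg, num_paths=400):
    """Covariance of a channel whose paths arrive on a uniform grid within
    [center - spread, center + spread] with equal, uncorrelated gains."""
    angles = np.deg2rad(np.linspace(center_deg - spread_deg,
                                    center_deg + spread_deg, num_paths))
    A = np.stack([steering_vector(num_antennas, a) for a in angles], axis=1)
    return (A @ A.conj().T) / num_paths

def effective_rank(R, energy=0.99):
    """Smallest number of eigenvalues capturing the given energy fraction."""
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]     # eigenvalues, largest first
    eig = np.clip(eig, 0.0, None)                  # remove tiny negative round-off
    cum = np.cumsum(eig) / eig.sum()
    return int(np.searchsorted(cum, energy) + 1)

R = sector_covariance(num_antennas=64, center_deg=0.0, spread_deg=10.0)
print(effective_rank(R), "dominant eigenvalues out of", R.shape[0])
```

With a narrow sector, the number of dominant eigenvalues is much smaller than the number of antennas, and it grows as `spread_deg` is increased.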

D. Implementation Issues

Recently, investigators have examined the practical implementation of compressed-sensing-based algorithms for sparse channel recovery [33]–[35]. Although the design targets are channel models in the 3GPP LTE standard, several of the insights provided are still valuable and applicable to realistic implementations of sparse massive MIMO channel recovery. It has been pointed out that greedy algorithms such as orthogonal matching pursuit (OMP) or matching pursuit (MP) are more desirable from a hardware perspective. This is because these algorithms require lower computational complexity and lower numerical precision than convex relaxation algorithms such as basis pursuit (BP) [34]. The trade-off between hardware complexity and denoising performance of three greedy algorithms has been characterized in [35], which indicates that the chip area overhead required to implement the gradient pursuit (GP) algorithm can be three times larger than that of MP. The power consumption is normally proportional to this area overhead. When it comes to the design of channel recovery algorithms in FDD massive MIMO, which are typically performed at the UE side, the issue of hardware complexity should be carefully taken into account. On the other hand, at the BS side, high-dimensional channels can be recovered by more advanced algorithms such as sparse Bayesian learning or joint CSI recovery.
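
For readers unfamiliar with the greedy algorithms mentioned above, a minimal NumPy sketch of OMP is given here. It is a textbook floating-point version, not the fixed-point hardware designs of [33]–[35]; the function name `omp`, the random Gaussian measurement matrix, and the problem sizes are assumptions made for this example.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: greedily estimate a `sparsity`-sparse x
    from measurements y = A @ x."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0, dtype=A.dtype)
    for _ in range(sparsity):
        # Select the column most correlated with the current residual.
        correlations = np.abs(A.conj().T @ residual)
        if support:
            correlations[support] = 0.0        # never reselect a column
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then update the residual.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    x_hat = np.zeros(A.shape[1], dtype=coeffs.dtype)
    x_hat[support] = coeffs
    return x_hat

# Hypothetical noise-free test problem: 80 measurements of a length-128
# vector with 4 non-zero entries.
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 128)) / np.sqrt(80)
x = np.zeros(128)
x[[5, 37, 64, 101]] = [1.5, -2.0, 1.0, 2.5]
y = A @ x
x_hat = omp(A, y, sparsity=4)
print("recovery error:", np.linalg.norm(x_hat - x))
```

Each iteration needs only one matrix-vector correlation and a small least-squares solve. MP omits the least-squares step and simply subtracts the selected column's contribution, which is cheaper still.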

E. Implications of New Propagation Models

Most existing studies have based their CSI acquisition approaches on conventional MIMO channel models, which may fail to capture some unique characteristics of massive MIMO channels. For instance, the far-field and plane wavefront assumptions no longer hold when antenna arrays become so large that UEs or scatterers lie within the Rayleigh distance [36]. In addition, the sheer size of the antenna arrays, where different antenna elements observe varying subsets of scatterer clusters, invalidates the assumption that spatial channels are wide-sense stationary along the array axis [37]. While new channel models have been proposed in [38], [39] by making a more accurate spherical wavefront assumption and taking the non-stationarities into consideration, there is still very little understanding of how these characteristics affect the sparsity structures of the channels in massive MIMO systems. One previous result [40], however, suggests that the spherical wavefront model does adequately characterize the rank of the channel matrix. This implies that the new channel models can potentially affect the SDP method, which exploits sparsity in the form of the channel matrix rank. In addition, the possibility that none of the clusters is perceptible to some antenna elements cannot be categorically excluded, which indicates the possible presence of sparsity along the array axis. These inferences suggest that there is abundant room for further progress in identifying utilizable sparsity structures based on the latest models.

IV. CONCLUSIONS

In this article, the challenges of acquiring high-dimensional CSI in FDD/TDD massive MIMO systems have been discussed. To address these challenges and break the curse of dimensionality, one can effectively utilize sparsity structures that uniquely appear in massive MIMO channels. Several state-of-the-art sparsity-inspired approaches for high-dimensional CSI acquisition have been examined and compared in terms of the sparsity structures being exploited, and their respective advantages and disadvantages have been identified. As a result of this study, the following conclusions can be drawn. The sparsity structures that can be harnessed are conditional on the radio propagation environments. In TDD massive MIMO, uplink training inherently has more sparsity structures to exploit, as high-dimensional channels are jointly recovered at the BS. In contrast, in the FDD mode, the desired channel is normally recovered at the UE, where utilizable sparsity structures are limited. Finally, based upon existing approaches, we have identified potential research problems in need of further investigation.

REFERENCES

[1] T. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
[2] F. Rusek, D. Persson, B. K. Lau, E. Larsson, T. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60, 2013.
[3] G. Bartoli, R. Fantacci, K. B. Letaief, D. Marabissi, N. Privitera, M. Pucci, and J. Zhang, “Beamforming for small cell deployment in LTE-advanced and beyond,” IEEE Wireless Commun., vol. 21, no. 2, pp. 50–56, Apr. 2014.
[4] J. Choi, D. Love, and P. Bidigare, “Downlink training techniques for FDD massive MIMO systems: Open-loop and closed-loop training with memory,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 802–814, Oct. 2014.
[5] J. Jose, A. Ashikhmin, T. Marzetta, and S. Vishwanath, “Pilot contamination and precoding in multi-cell TDD systems,” IEEE Trans. Wireless Commun., vol. 10, no. 8, pp. 2640–2651, Aug. 2011.
[6] W. Bajwa, J. Haupt, A. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse multipath channels,” Proc. IEEE, vol. 98, no. 6, pp. 1058–1076, Jun. 2010.
[7] Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
[8] M. Biguesh and A. Gershman, “Training-based MIMO channel estimation: A study of estimator tradeoffs and optimal training signals,” IEEE Trans. Signal Process., vol. 54, no. 3, pp. 884–893, Mar. 2006.
[9] J. W. Kang, Y. Whang, H. Y. Lee, and K. S. Kim, “Optimal pilot sequence design for multi-cell MIMO-OFDM systems,” IEEE Trans. Wireless Commun., vol. 10, no. 10, pp. 3354–3367, Oct. 2011.
[10] D. Love, R. Heath, V. Lau, D. Gesbert, B. Rao, and M. Andrews, “An overview of limited feedback in wireless communication systems,” IEEE J. Sel. Areas Commun., vol. 26, no. 8, pp. 1341–1365, Oct. 2008.
[11] A. Ghosh, J. Zhang, J. G. Andrews, and R. Muhamed, Fundamentals of LTE. Prentice-Hall, 2010.
[12] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
[13] J.-C. Shen, J. Zhang, and K. B. Letaief, “User capacity of pilot-contaminated TDD massive MIMO systems,” in Proc. IEEE Global Commun. Conf. (Globecom), Austin, TX, Dec. 2014.
[14] L. Lu, G. Li, A. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 742–758, Oct. 2014.
[15] X. Rao and V. Lau, “Distributed compressive CSIT estimation and feedback for FDD multi-user massive MIMO systems,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3261–3271, Jun. 2014.
[16] J. Poutanen, K. Haneda, J. Salmi, V. Kolmonen, F. Tufvesson, T. Hult, and P. Vainikainen, “Significance of common scatterers in multi-link indoor radio wave propagation,” in Proc. 4th Eur. Conf. Antennas Propag. (EuCAP), Barcelona, Spain, Apr. 2010, pp. 1–5.
[17] P.-H. Kuo, H. Kung, and P.-A. Ting, “Compressive sensing based channel feedback protocols for spatially-correlated massive antenna arrays,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Shanghai, China, Apr. 2012, pp. 492–497.
[18] J.-C. Shen, J. Zhang, E. Alsusa, and K. B. Letaief, “Compressed CSI acquisition in FDD massive MIMO with partial support information,” to be presented at the IEEE Int. Conf. Commun. (ICC), London, UK, 2015.
[19] W. Bajwa, J. Haupt, G. M. Raz, S. Wright, and R. Nowak, “Toeplitz-structured compressed sensing matrices,” in Proc. IEEE/SP 14th Workshop Statist. Signal Process., Madison, WI, Aug. 2007, pp. 294–298.
[20] M. Elad, “Optimized projections for compressed sensing,” IEEE Trans. Signal Process., vol. 55, no. 12, pp. 5695–5702, Dec. 2007.
[21] G. Li, Z. Zhu, D. Yang, L. Chang, and H. Bai, “On projection matrix optimization for compressive sensing systems,” IEEE Trans. Signal Process., vol. 61, no. 11, pp. 2887–2898, Jun. 2013.
[22] H. Yin, D. Gesbert, M. Filippou, and Y. Liu, “A coordinated approach to channel estimation in large-scale multiple-antenna systems,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 264–273, Feb. 2013.
[23] H. Yin, D. Gesbert, and L. Cottatellucci, “Dealing with interference in distributed large-scale MIMO systems: A statistical approach,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 942–953, Oct. 2014.
[24] J. Foutz, A. Spanias, and M. K. Banavar, Narrowband Direction of Arrival Estimation for Antenna Arrays. Morgan & Claypool Publishers, 2008.
[25] S. L. H. Nguyen and A. Ghrayeb, “Compressive sensing-based channel estimation for massive multiuser MIMO systems,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Shanghai, China, Apr. 2013, pp. 2890–2895.
[26] T. Marzetta, “How much training is required for multiuser MIMO,” in Proc. 40th Asilomar Conf. Signals, Syst., Comput. (ACSSC), Pacific Grove, CA, Oct. 2006, pp. 359–363.
[27] C.-K. Wen, S. Jin, K.-K. Wong, J.-C. Chen, and P. Ting, “Channel estimation for massive MIMO using Gaussian-mixture Bayesian learning,” IEEE Trans. Wireless Commun., 2014.
[28] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, Sept. 2001.
[29] Z. Zhang and B. Rao, “Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 5, pp. 912–926, Sept. 2011.
[30] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914–18919, Nov. 2009.
[31] J. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, Oct. 2013.
[32] J. Zhang, B. Zhang, S. Chen, X. Mu, M. El-Hajjar, and L. Hanzo, “Pilot contamination elimination for large-scale multiple-antenna aided OFDM systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 759–772, Oct. 2014.
[33] J. Lofgren, L. Liu, O. Edfors, and P. Nilsson, “Improved matching-pursuit implementation for LTE channel estimation,” IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 61, no. 1, pp. 226–237, Jan. 2014.
[34] P. Maechler, P. Greisen, N. Felber, and A. Burg, “Matching pursuit: Evaluation and implementation for LTE channel estimation,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Paris, France, May 2010, pp. 589–592.
[35] P. Maechler, P. Greisen, B. Sporrer, S. Steiner, N. Felber, and A. Burg, “Implementation of greedy algorithms for LTE sparse channel estimation,” in Proc. Asilomar Conf. Signals, Syst., Comput. (ASILOMAR), Pacific Grove, CA, Nov. 2010, pp. 400–405.
[36] S. Payami and F. Tufvesson, “Channel measurements and analysis for very large array systems at 2.6 GHz,” in Proc. Eur. Conf. Antennas Propag. (EuCAP), Prague, Czech Republic, Mar. 2012, pp. 433–437.
[37] X. Gao, F. Tufvesson, and O. Edfors, “Massive MIMO channels - measurements and models,” in Proc. 47th Annu. Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 2013, pp. 280–284.
[38] S. Wu, C.-X. Wang, E.-H. Aggoune, M. Alwakeel, and Y. He, “A non-stationary 3-D wideband twin-cluster model for 5G massive MIMO channels,” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1207–1218, Jun. 2014.
[39] S. Wu, C.-X. Wang, H. Haas, E.-H. Aggoune, M. Alwakeel, and B. Ai, “A non-stationary wideband channel model for massive MIMO communication systems,” IEEE Trans. Wireless Commun., vol. 14, no. 3, pp. 1434–1446, Mar. 2015.
[40] J.-S. Jiang and M. Ingram, “Spherical-wave model for short-range MIMO,” IEEE Trans. Commun., vol. 53, no. 9, pp. 1534–1541, Sept. 2005.