FDD Massive MIMO Based on Efficient Downlink Channel Reconstruction

Abstract

Massive multiple-input multiple-output (MIMO) systems deploying a large number of antennas at the base station considerably increase the spectrum efficiency by serving multiple users simultaneously without causing severe interference. However, the advantage relies on the availability of the downlink channel state information (CSI) of multiple users, which is still a challenge in frequency-division-duplex transmission systems. This paper aims to solve this problem by developing a full transceiver framework that includes downlink channel training (or estimation), CSI feedback, and channel reconstruction schemes. Our framework provides accurate reconstruction results for multiple users with small amounts of training and feedback overhead. Specifically, we first develop an enhanced Newtonized orthogonal matching pursuit (eNOMP) algorithm to extract the frequency-independent parameters (i.e., downtilts, azimuths, and delays) from the uplink. Then, by leveraging the information from these frequency-independent parameters, we develop an efficient downlink training scheme to estimate the downlink channel gains for multiple users. This training scheme offers an acceptable estimation error rate of the gains with a limited pilot amount. Numerical results verify the precision of the eNOMP algorithm and demonstrate that the sum-rate performance of the system using the reconstructed downlink channel can approach that of the system using perfect CSI.


Yu Han∗, Qi Liu∗, Chao-Kai Wen†, Shi Jin∗, and Kai-Kit Wong§
∗ National Mobile Communications Research Laboratory, Southeast University, P. R. China
† Institute of Communications Engineering, National Sun Yat-sen University, Taiwan
§ University College London, London, United Kingdom
Email: {hanyu, qiliu}@seu.edu.cn, [email protected], [email protected], [email protected]

Index Terms

Downlink channel reconstruction, FDD massive MIMO, multiuser transmission.

I. INTRODUCTION

Massive multiple-input multiple-output (MIMO) is a key enabler of the fifth-generation and future mobile communication networks [1]–[3]. Large-scale antenna arrays are deployed at the base stations (BSs) to fully exploit the spatial degrees of freedom, which leaves ample room for spatial division multiplexing [4]–[6]. Multiple users can be served by the BS on the same time-frequency resource block, and the spatial multiplexing dimension can be further expanded by scaling up the array at the BS. The large array is usually structured in a planar or circular topology to exploit horizontal and vertical spaces and realize three-dimensional (3D) MIMO techniques. A beam formed by the array can flexibly target any direction in the 3D space depending on practical requirements [7]–[9]. We can also design beam weights to produce a set of spatially orthogonal beams using the large array and transmit multiple data streams on these beams without causing interference [10]. These advantages result in the high sum-rate performance of multiuser massive MIMO systems.

A prerequisite to gain these advantages is the acquisition of the channel state information (CSI). Due to the lack of uplink-downlink reciprocity in frequency-division-duplex (FDD) systems, downlink training followed by feedback is a typical solution for downlink channel estimation. In the fourth-generation era and before, the number of BS antennas was relatively small. Thus, downlink CSI could be easily acquired by sending orthogonal downlink pilots, applying channel estimation at the user side, and finally feeding the estimates back to the BS. However, in massive MIMO systems, using completely orthogonal downlink pilots and sending back high-dimensional complex channel matrices are impractical. Obtaining downlink CSI at the BS side becomes a bottleneck in FDD massive MIMO systems. Therefore, researchers have been searching for new solutions to obtain downlink CSI and design corresponding transmission schemes.

Related Work:

In this area, some methods followed the traditional approach by transmitting downlink pilots and sending back the estimates to the BS. For example, compressed sensing was introduced to estimate the sparse channel through a small amount of downlink measurements [11], [12]. However, comprehensive signal processing was conducted at the user side, which placed an exorbitant requirement on the capability of the user equipment. Moreover, the time correlation of the wireless channel can be utilized in the downlink training and feedback phases. In [13], the user estimated downlink CSI based on the currently received downlink pilots and the estimated CSI obtained at the previous moment. Before feeding back the estimates, [14] and [15] proposed to quantize the channel based on previous results within the coherence time by using a trellis-extended codebook and an angle-of-departure-adaptive subspace codebook. However, these methods [13]–[15] relied on the accuracy of the initial estimates.

In recent years, the idea of using the spatial reciprocity between uplink and downlink has attracted increasing attention. Many studies have suggested acquiring downlink CSI by using the information obtained from the uplink. Existing work generally aims to obtain two types of downlink CSI. The first type is partial CSI, such as the spatial information or the reduced-dimensional channel. For example, only the angles of propagation paths are estimated during the training phase. Alternatively, channel sparsity in beamspace is utilized and spatial angle estimation is translated to the search for the non-zero elements in the beamspace channel. In this context, only spatial directions or beam indices are known at the BS, and user scheduling is required to avoid spatial overlapping among different users [10], [16]. The second type is full CSI, which describes the full-dimensional channel and contains the complete information in the propagation environment. Full CSI is usually obtained by a channel estimation or reconstruction scheme. With full CSI, the BS can serve a large number of users simultaneously by using precoding to eliminate the interference. The BS can also conduct a comprehensive user scheduling scheme to maximize the sum-rate performance and fully exploit the spatial multiplexing gain.

Most recent works aim to acquire the full CSI [17]–[21]. Focusing on the clustering channel that covers a continuous angular region, [18]–[20] suggested estimating the channel based on the downlink channel covariance matrix, which describes the angular-domain energy distribution and can be derived from its uplink version. For the limited scattering channel where multiple distinct paths exist, the authors in [21] proposed a unified transmission strategy with the aid of the spatial basis expansion model. Using channel sparsity, the reduced-dimensional angular-domain downlink channel from each user was first estimated and then transformed back to the full-dimensional antenna domain. These methods are based on an angular domain that uniformly over-samples the space, and they estimate the power distribution on these sample points. If the paths are distinguishable, then the number of sample points with distinct projected power is usually larger than the number of real paths. As a result, the training overhead used to obtain the power distribution on these sample points increases.

To save the training resources, [22]–[25] considered the real paths and reconstructed the downlink channel by extracting each component path. They proposed to estimate frequency-independent parameters in the uplink. Then, only the downlink gains need to be estimated through downlink training and feedback, and the amount of pilots and feedback overhead is small. The effectiveness of the proposed schemes was also demonstrated in over-the-air tests. However, [22] did not consider delays, restricted the estimated number of paths, and retained only four dominant paths. [23]–[25] merely provided a channel reconstruction scheme to obtain the downlink channel of a single user. Given that [23]–[25] used user-dedicated pilots for downlink training, if we simply extend the method to fit multiuser scenarios, then a large amount of training resource is needed to estimate the downlink gains of multiuser channels, as designed in [22]. Moreover, the schemes proposed in [23] and [24] are inapplicable in massive 3D-MIMO systems because they cannot extract the full-dimensional spatial parameters.

Contributions:

This paper considers the multiuser massive MIMO system using the orthogonal frequency-division-multiplexing (OFDM) technique under FDD mode. By resolving the drawbacks in [23]–[25] for multiuser scenarios, we present a full transceiver framework for downlink channel reconstruction with high reconstruction precision and low training and feedback overhead. Following the mechanism in [23]–[25], we also reconstruct the downlink channel by extracting each component path. Specifically, our transceiver framework combines three components as described below.

1) Frequency-independent parameter extraction: We develop an enhanced Newtonized orthogonal matching pursuit (eNOMP) algorithm that can detect the component paths from their noisy mixture and extract the gain, downtilt, azimuth, and delay of each path for massive MIMO OFDM systems. Part of this work will be shown in the conference version of this paper [25]. Numerical results show that eNOMP can provide precise extraction results. eNOMP-based uplink channel reconstruction also achieves smaller mean square error (MSE) than the linear minimum MSE (LMMSE) estimation.

2) Pilot scheduling: After obtaining the frequency-independent parameters (i.e., downtilt, azimuth, and delay) of each path from the uplink channels, the subsequent task is to estimate the downlink gains through the downlink pilots. The best way to estimate each downlink gain is to apply beamforming in the downlink training process and send a pilot aligned with each downtilt and azimuth direction (i.e., a dedicated pilot). However, this approach results in large training overhead in the multiuser scenario. To solve this problem, we propose to co-use and remove some pilots (or beams) while slightly compromising the estimation performance. In particular, we introduce an approximation of the MSE for the channel gains to predict the corresponding performance degradation and thereby determine which pilots (or beams) can be removed. We use a greedy method to exclude the unnecessary pilots. The proposed pilot scheduling scheme can minimize the downlink training overhead and offer an acceptable estimation error rate.

3) Multiuser downlink gain estimation and CSI feedback: After determining the pilots, we use them in a broadcast manner instead of a dedicated manner. That is, all the users can use the broadcast pilots. However, if each user directly sends the channel response on each corresponding pilot back to the BS, then the amount of feedback remains large. To reduce the feedback overhead, the BS sends the corresponding frequency-independent parameters to each user. Thereafter, the user can estimate the downlink gain of each path on the basis of the downtilt, azimuth, and delay, and send the estimates back to the BS. Given that the number of paths in each user channel is much smaller than the number of pilots, the feedback overhead of the proposed scheme is also very small. We evaluate the multiuser sum-rate performance through theoretical analysis and simulations and observe that the sum-rate offered by the reconstructed channel remains large. This finding validates the effectiveness and efficiency of the proposed downlink channel reconstruction scheme.

The rest of the paper is organized as follows. Section II describes the massive MIMO-OFDM system operating in FDD mode, introduces the uplink and downlink channel models in terms of the frequency-independent parameters, and briefly outlines the proposed downlink channel reconstruction and multiuser transmission scheme. Section III presents the eNOMP algorithm and highlights the new codebook and the updated Newton step designed for the 3D massive MIMO-OFDM scenario. Section IV presents the working principle of the proposed low-cost downlink-training strategy for the multiuser system. Section V applies the reconstruction results to the downlink multiuser transmission scheme and analyzes the sum-rate performance. Section VI provides the numerical results, and Section VII presents the conclusions.

Notations: We denote matrices and vectors by uppercase and lowercase boldface letters, respectively. The superscripts $(\cdot)^{\dagger}$, $(\cdot)^{H}$, and $(\cdot)^{T}$ denote the pseudo-inverse, conjugate transpose, and transpose, respectively. We also denote $[\mathbf{A}]_{i,:}$ and $[\mathbf{A}]_{:,j}$ as the $i$th row and the $j$th column of matrix $\mathbf{A}$, and $[\mathbf{A}]_{i,j}$ as the $(i,j)$th entry of $\mathbf{A}$. $\otimes$ denotes the Kronecker product. $\Re\{\cdot\}$ takes the real part of a complex number, whereas $\mathbb{E}\{\cdot\}$ takes the expectation with respect to the random variables inside the brackets. $|\cdot|$ and $\|\cdot\|$ indicate the absolute value and modulus operations, and $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ round a decimal number to its nearest lower and higher integers, respectively.

II. SYSTEM MODEL

A. Channel Model

A single-cell massive MIMO system using FDD transmission mode and OFDM modulation is considered. We denote the uplink and downlink carrier frequencies as $f^{\mathrm{ul}}$ and $f^{\mathrm{dl}}$, respectively. We also assume that each of the uplink and downlink frequency bands has $N$ sub-carriers with spacing $\Delta f$. The BS serves $K$ users who are randomly distributed in the cell. The BS is equipped with a uniform planar array (UPA), and each user has a single antenna. The UPA contains $M = M_h M_v$ antenna elements, including $M_h$ elements in each row and $M_v$ elements in each column, and $M \gg K$. The distance between two horizontally or vertically adjacent elements is $d = \lambda/2$, where $\lambda$ is the carrier wavelength. As shown in Fig. 1, scatterers exist in the space, and the user channel is composed of multiple propagation paths. The wireless signal can arrive at the user side along the line-of-sight path or be reflected by several scatterers. Different user channels may share a common scatterer and spatially overlap with each other. The wireless channel of each user remains constant over $T_c$ OFDM symbols.

[Figure: a BS serving User 1, User 2, ..., User K]

Fig. 1. FDD massive MIMO system. The BS is equipped with a UPA and serves $K$ users simultaneously. Buildings, trees, and cars are all scatterers in the wireless channel.

For user $k$, when down-converted to the baseband, its uplink multipath channel is expressed as

$$\mathbf{h}^{\mathrm{ul}}_k = \sum_{l=1}^{L_k} g^{\mathrm{ul}}_{k,l}\, \mathbf{a}(\theta_{k,l}, \phi_{k,l}) \otimes \mathbf{p}(\tau_{k,l}), \tag{1}$$

where $L_k$ is the number of propagation paths of user $k$; $\mathbf{a}(\theta_{k,l}, \phi_{k,l})$ is the steering vector of the UPA, with $\theta_{k,l} \in [-\pi/2, \pi/2)$ and $\phi_{k,l} \in [-\pi/2, \pi/2)$ being the downtilt and the azimuth of the $l$th propagation path of user $k$, respectively;

$$\mathbf{p}(\tau) = \left[ 1, e^{j 2\pi \Delta f \tau}, \ldots, e^{j 2\pi (N-1) \Delta f \tau} \right]^{T} \tag{2}$$

is the delay vector on the OFDM sub-carriers; and $\tau_{k,l}$ is the delay of the $l$th propagation path of user $k$. Here, the steering vector of the UPA can be explicitly expressed as

$$\begin{aligned}
\mathbf{a}(\theta, \phi) &= \mathbf{a}_v(\theta) \otimes \mathbf{a}_h(\theta, \phi), \\
\mathbf{a}_v(\theta) &= \left[ 1, e^{j 2\pi \frac{d}{\lambda} \sin\theta}, \ldots, e^{j 2\pi (M_v - 1) \frac{d}{\lambda} \sin\theta} \right]^{T}, \\
\mathbf{a}_h(\theta, \phi) &= \left[ 1, e^{j 2\pi \frac{d}{\lambda} \cos\theta \sin\phi}, \ldots, e^{j 2\pi (M_h - 1) \frac{d}{\lambda} \cos\theta \sin\phi} \right]^{T}.
\end{aligned} \tag{3}$$

Given the frequency-independent nature of the propagation delays and angles, the baseband channel of user $k$ in the downlink can be modeled as

$$\mathbf{h}^{\mathrm{dl}}_k = \sum_{l=1}^{L_k} g^{\mathrm{dl}}_{k,l}\, \mathbf{a}^{T}(\theta_{k,l}, \phi_{k,l}) \otimes \mathbf{p}^{T}(\tau_{k,l})\, e^{j 2\pi (f^{\mathrm{dl}} - f^{\mathrm{ul}}) \tau_{k,l}}, \tag{4}$$

where $g^{\mathrm{dl}}_{k,l}$ is the downlink complex gain of the $l$th propagation path in the $k$th user's channel.

B. Efficient Reconstruction of Downlink Channels

In the downlink of the FDD massive MIMO system, the BS needs to reconstruct the downlink channel for each user before conducting data transmission. Comparison between (1) and (4) shows that the uplink and downlink share the same downtilts, azimuths, and delays, while each path experiences a different phase shift [i.e., the last term in (4)] because of the carrier frequency shift. We also find that $g^{\mathrm{dl}}_{k,l}$ may not be equal to $g^{\mathrm{ul}}_{k,l}$ when reflection occurs during propagation [26]. On the basis of the observation above, the goal of reconstructing the downlink channel is translated to extracting the frequency-independent parameters $\{\theta_{k,l}, \phi_{k,l}, \tau_{k,l}\}$ and estimating the downlink gains $\{g^{\mathrm{dl}}_{k,l}\}$, for $l = 1, \ldots, L_k$ and $k = 1, \ldots, K$. It is more efficient to extract the frequency-independent parameters from the uplink channels than from the downlink channels.

We develop the following transceiver framework for downlink channel reconstruction and data transmission. Fig. 2 shows the working process of the transceiver, which consists of the following phases:

1) Frequency-independent parameter extraction:

The users send sounding reference signals (RSs) to the BS. The BS extracts the frequency-independent parameters of the channel from the sounding RSs.

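[Figure: message flow between the BS and the users. Phase 1: the users send sounding RSs, and the BS extracts the downtilts, azimuths, and delays. Phase 2: the BS sends the downtilts, azimuths, delays, and beam weights, followed by the pilots, and the users estimate the downlink gains. Phase 3: the users send the downlink gains, and the BS reconstructs the downlink channel. Phase 4: the BS designs precoders and transmits data.]

Fig. 2. Working process of the FDD massive MIMO transceiver.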

2) Downlink gain estimation:

The BS transmits the estimated downtilts, azimuths, delays, and the weights of the downlink-training beams to the users and then broadcasts downlink-training pilots. Each user estimates the downlink gains of its propagation paths on the basis of the known downtilts, azimuths, delays, and beam weights.

3) Downlink channel reconstruction:

Users send the estimated downlink gains to the BS. On the basis of the spatial reciprocity, the BS reconstructs the multiuser channel by applying the frequency-independent parameters and the downlink gains in (4).

4) Downlink data transmission:

With full CSI at the transmitter, the BS designs interference-cancelling precoders using $\mathbf{h}^{\mathrm{dl}}_k$, $k = 1, \ldots, K$, and maximizes the spatial multiplexing gain by serving all the $K$ users simultaneously in the downlink.

The channel reconstruction through the procedures above is reasonable. However, the use of a large-scale UPA and the existence of multiple users bring new challenges:

• Three kinds of frequency-independent parameters need to be estimated in Phase 1.
• We should estimate downlink gains for multiple users with an acceptable amount of downlink training overhead in Phase 2 to ensure that sufficient time resources are retained for the multiuser data transmission in Phase 4.

These two key problems are solved in the following sections.

III. EXTRACTION OF FREQUENCY-INDEPENDENT PARAMETERS IN THE UPLINK

In this section, we focus on the first challenge mentioned above and obtain the delays, azimuths, and downtilts of each user channel for massive MIMO-OFDM systems.

A. Uplink RS Model

During the uplink sounding phase, each user sends sounding RSs to the BS. Sounding RSs from different users are separated in time, so the BS can distinguish them. We assume that the sounding RS from user $k$ occupies the $k$th OFDM symbol in the uplink slot and that all-ones sounding RSs are applied. The received sounding RS at the BS from user $k$ is expressed as

$$\mathbf{y}^{\mathrm{ul}}_k = \sum_{l=1}^{L_k} g^{\mathrm{ul}}_{k,l}\, \mathbf{a}(\theta_{k,l}, \phi_{k,l}) \otimes \mathbf{p}(\tau_{k,l}) + \mathbf{z}^{\mathrm{ul}}_k, \tag{5}$$

where $\mathbf{z}^{\mathrm{ul}}_k \in \mathbb{C}^{MN \times 1}$ is the additive noise vector on all subcarriers of OFDM symbol $k$ and on all antenna elements at the BS, and each element of $\mathbf{z}^{\mathrm{ul}}_k$ is independent and identically distributed (i.i.d.) with zero mean and unit variance. We aim to extract $\{\tau_{k,l}, \theta_{k,l}, \phi_{k,l}\}_{l=1,\ldots,L_k}$ from $\mathbf{y}^{\mathrm{ul}}_k$.

This frequency estimation problem (in the delay and angular domains) can be solved by utilizing the NOMP algorithm, which detects frequencies from the noisy mixture of multiple sinusoids. However, the original NOMP algorithm in [27] and the extended NOMP algorithm proposed in [24] do not cover the case with three types of frequencies to be extracted. Hence, in this study, we further develop an eNOMP algorithm to cater for the massive MIMO-OFDM system.

Fig. 3. Codebook used in the eNOMP algorithm. Each vertical plane represents a lower dimensional sub-codebook that covers the sampled downtilts and azimuths on a sampled delay.
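To make the signal model in (5) concrete, the following minimal sketch builds the received sounding RS for one user from a list of path parameters, using the steering and delay vectors of (2) and (3). It is an illustration under stated assumptions, not the authors' simulator: the function names (`steering_vector`, `delay_vector`, `uplink_sounding`) are hypothetical, the noise is assumed to be circularly symmetric complex Gaussian (the text only requires zero mean and unit variance), and $d/\lambda = 1/2$ is taken from the channel model.

```python
import cmath
import math
import random


def steering_vector(theta, phi, m_v, m_h, d_over_lambda=0.5):
    """UPA steering vector a(theta, phi) = a_v(theta) kron a_h(theta, phi), cf. (3)."""
    a_v = [cmath.exp(2j * math.pi * m * d_over_lambda * math.sin(theta))
           for m in range(m_v)]
    a_h = [cmath.exp(2j * math.pi * m * d_over_lambda * math.cos(theta) * math.sin(phi))
           for m in range(m_h)]
    # Kronecker product: the vertical index varies slowest.
    return [v * h for v in a_v for h in a_h]


def delay_vector(tau, n_sub, delta_f):
    """Delay vector p(tau) over n_sub sub-carriers with spacing delta_f, cf. (2)."""
    return [cmath.exp(2j * math.pi * n * delta_f * tau) for n in range(n_sub)]


def kron(x, y):
    """Kronecker product of two vectors stored as Python lists."""
    return [xi * yi for xi in x for yi in y]


def uplink_sounding(paths, m_v, m_h, n_sub, delta_f, noise=True, seed=0):
    """Received sounding RS of one user, cf. (5).

    paths: list of (gain, theta, phi, tau) tuples, one per propagation path.
    Noise model (an assumption): i.i.d. complex Gaussian, zero mean, unit variance.
    """
    rng = random.Random(seed)
    y = [0j] * (m_v * m_h * n_sub)
    for gain, theta, phi, tau in paths:
        component = kron(steering_vector(theta, phi, m_v, m_h),
                         delay_vector(tau, n_sub, delta_f))
        y = [yi + gain * ci for yi, ci in zip(y, component)]
    if noise:
        sigma = math.sqrt(0.5)  # real and imaginary parts each carry variance 1/2
        y = [yi + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for yi in y]
    return y
```

For example, `uplink_sounding([(1 + 0j, 0.1, -0.2, 1e-7)], 2, 3, 4, 15e3, noise=False)` returns a noiseless vector of length $M N = 24$ whose entries all have unit modulus, because a single unit-gain path contributes only phase rotations.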

B. eNOMP for Massive MIMO-OFDM Systems

eNOMP is an iteration-based algorithm that extracts a new component within each iteration through the e-OMP and e-Newton steps. At the end of the $i$th iteration of the eNOMP algorithm, the $i$th component path is removed from the noisy mixture. If the parameters are precisely estimated, then the $i$th component path is completely eliminated and the residual noisy mixture is minimized. When the algorithm terminates, the number of extracted components equals the number of actual components if each component is accurately estimated.

We give a detailed description of the e-OMP and e-Newton steps in the $i$th iteration of the eNOMP algorithm.

1) e-OMP Step:

At the beginning, the residual noisy mixture is expressed as

$$\mathbf{y}^{\mathrm{ul}}_{\mathrm{r},k}(i) = \mathbf{y}^{\mathrm{ul}}_k - \sum_{l=1}^{i-1} \hat{g}^{\mathrm{ul}}_{k,l}\, \mathbf{a}(\hat{\theta}_{k,l}, \hat{\phi}_{k,l}) \otimes \mathbf{p}(\hat{\tau}_{k,l}), \tag{6}$$

where $\hat{g}^{\mathrm{ul}}_{k,l}$, $\hat{\theta}_{k,l}$, $\hat{\phi}_{k,l}$, and $\hat{\tau}_{k,l}$ are the gain, downtilt, azimuth, and delay estimated in the $l$th iteration, respectively. In the e-OMP step, we exhaustively search a pre-defined codebook that covers the value ranges of the downtilt, azimuth, and delay to find the codeword that best matches $\mathbf{y}^{\mathrm{ul}}_{\mathrm{r},k}(i)$, and use the downtilt, azimuth, and delay included in this codeword as the coarse estimates of the downtilt, azimuth, and delay of the extracted $i$th path.

To fit the massive MIMO-OFDM scenario, we design the codebook in accordance with the structure of a component path in (5). A codeword is expressed as

$$\mathbf{c}(\bar{\theta}, \bar{\phi}, \bar{\tau}) = \mathbf{a}(\bar{\theta}, \bar{\phi}) \otimes \mathbf{p}(\bar{\tau}), \tag{7}$$

where $\bar{\theta} \in [-\pi/2, \pi/2)$, $\bar{\phi} \in [-\pi/2, \pi/2)$, and $\bar{\tau} \in [0, 1/\Delta f)$ are the downtilt, azimuth, and delay that $\mathbf{c}$ represents. The codebook covers the 3D space and the delay domain. Thus, we sample the angles and delays uniformly as

$$\begin{aligned}
\bar{\theta} &\in \left\{ \bar{\theta}_1 = -\frac{\pi}{2},\ \bar{\theta}_2 = -\frac{\pi}{2} + \frac{\pi}{\beta_\theta M_v},\ \ldots,\ \bar{\theta}_{\beta_\theta M_v} = -\frac{\pi}{2} + \frac{(\beta_\theta M_v - 1)\pi}{\beta_\theta M_v} \right\}, \\
\bar{\phi} &\in \left\{ \bar{\phi}_1 = -\frac{\pi}{2},\ \bar{\phi}_2 = -\frac{\pi}{2} + \frac{\pi}{\beta_\phi M_h},\ \ldots,\ \bar{\phi}_{\beta_\phi M_h} = -\frac{\pi}{2} + \frac{(\beta_\phi M_h - 1)\pi}{\beta_\phi M_h} \right\}, \\
\bar{\tau} &\in \left\{ \bar{\tau}_1 = 0,\ \bar{\tau}_2 = \frac{1}{\beta_\tau N \Delta f},\ \ldots,\ \bar{\tau}_{\beta_\tau N} = \frac{\beta_\tau N - 1}{\beta_\tau N \Delta f} \right\},
\end{aligned} \tag{8}$$

where $\beta_\theta$, $\beta_\phi$, and $\beta_\tau$ are the over-sampling rates of the downtilt, azimuth, and delay, respectively. The black circles in Fig. 3 are the codewords. Each codeword points to a sampled spatial direction and covers a sampled delay. The e-OMP step selects the codeword with the maximum projected power from $\mathbf{y}^{\mathrm{ul}}_{\mathrm{r},k}(i)$, that is,

$$\mathbf{c}(\hat{\theta}_{k,i}, \hat{\phi}_{k,i}, \hat{\tau}_{k,i}) = \arg\max_{(\bar{\theta}, \bar{\phi}, \bar{\tau})} \frac{\left| \mathbf{c}^{H}(\bar{\theta}, \bar{\phi}, \bar{\tau})\, \mathbf{y}^{\mathrm{ul}}_{\mathrm{r},k}(i) \right|^{2}}{\left\| \mathbf{c}(\bar{\theta}, \bar{\phi}, \bar{\tau}) \right\|^{2}}, \tag{9}$$

where $\hat{\theta}_{k,i}$, $\hat{\phi}_{k,i}$, and $\hat{\tau}_{k,i}$ are the coarsely estimated downtilt, azimuth, and delay of the $i$th component path. Then, the gain of the $i$th component path is calculated by

$$\hat{g}^{\mathrm{ul}}_{k,i} = \frac{\mathbf{c}^{H}(\hat{\theta}_{k,i}, \hat{\phi}_{k,i}, \hat{\tau}_{k,i})\, \mathbf{y}^{\mathrm{ul}}_{\mathrm{r},k}(i)}{\left\| \mathbf{c}(\hat{\theta}_{k,i}, \hat{\phi}_{k,i}, \hat{\tau}_{k,i}) \right\|^{2}}. \tag{10}$$

Notably, the total number of codewords equals $\beta_\theta M_v \times \beta_\phi M_h \times \beta_\tau N$. Increasing the over-sampling rates helps improve the matching between $\mathbf{y}^{\mathrm{ul}}_{\mathrm{r},k}(i)$ and $\mathbf{c}(\hat{\theta}_{k,i}, \hat{\phi}_{k,i}, \hat{\tau}_{k,i})$ and enhances the precision of the coarse estimates of the azimuth, downtilt, and delay. However, it also multiplies the search time and severely degrades the efficiency, especially when $M_v$, $M_h$, and $N$ are large. Therefore, the over-sampling rates are usually small when we design the codebook for massive MIMO systems.
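The coarse search in (7)–(10) can be sketched as a brute-force loop over the sampled grid in (8). The code below is a plain illustration rather than an efficient implementation: the names `codeword`, `grid`, and `e_omp_step` are hypothetical, $d/\lambda = 1/2$ is assumed, and a practical system would replace the triple loop with FFT-based correlation.

```python
import cmath
import math


def codeword(theta, phi, tau, m_v, m_h, n_sub, delta_f, d_over_lambda=0.5):
    """Codeword c = a(theta, phi) kron p(tau), cf. (7), stored as a Python list."""
    a_v = [cmath.exp(2j * math.pi * m * d_over_lambda * math.sin(theta))
           for m in range(m_v)]
    a_h = [cmath.exp(2j * math.pi * m * d_over_lambda * math.cos(theta) * math.sin(phi))
           for m in range(m_h)]
    p = [cmath.exp(2j * math.pi * n * delta_f * tau) for n in range(n_sub)]
    return [v * h * q for v in a_v for h in a_h for q in p]


def grid(start, span, count):
    """Uniform samples start + span * i / count, i = 0, ..., count - 1, cf. (8)."""
    return [start + span * i / count for i in range(count)]


def e_omp_step(residual, m_v, m_h, n_sub, delta_f,
               beta_theta=1, beta_phi=1, beta_tau=1):
    """Return coarse (theta, phi, tau, gain) maximizing the projected power (9)."""
    thetas = grid(-math.pi / 2, math.pi, beta_theta * m_v)
    phis = grid(-math.pi / 2, math.pi, beta_phi * m_h)
    taus = grid(0.0, 1.0 / delta_f, beta_tau * n_sub)
    best = None
    for theta in thetas:
        for phi in phis:
            for tau in taus:
                c = codeword(theta, phi, tau, m_v, m_h, n_sub, delta_f)
                corr = sum(ci.conjugate() * yi for ci, yi in zip(c, residual))
                norm_sq = sum(abs(ci) ** 2 for ci in c)
                power = abs(corr) ** 2 / norm_sq
                if best is None or power > best[0]:
                    # corr / norm_sq is the least-squares gain in (10)
                    best = (power, theta, phi, tau, corr / norm_sq)
    _, theta, phi, tau, gain = best
    return theta, phi, tau, gain
```

The loop visits all $\beta_\theta M_v \times \beta_\phi M_h \times \beta_\tau N$ codewords, which makes the cost of raising the over-sampling rates visible: doubling each rate multiplies the number of correlations by eight.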

2) e-Newton Step:

Before removing the $i$th component path from the noisy mixture at the final stage of e-OMP, the e-Newton step is applied to tackle the off-grid effect and adjust the estimates toward the real values. Newton's method can successively find better approximations to the roots of a function [28]. The goal of minimizing the residual noisy mixture is realized by maximizing

$$S(\theta, \phi, \tau) = 2\Re\left\{ \left(\mathbf{y}^{\mathrm{ul}}_{\mathrm{r}}(i)\right)^{H} g^{\mathrm{ul}}\, \mathbf{c}(\theta, \phi, \tau) \right\} - \left\| g^{\mathrm{ul}}\, \mathbf{c}(\theta, \phi, \tau) \right\|^{2}. \tag{11}$$

The e-Newton step is designed to refine the downtilt, azimuth, and delay simultaneously by

$$\begin{bmatrix} \hat{\theta}'_{k,i} \\ \hat{\phi}'_{k,i} \\ \hat{\tau}'_{k,i} \end{bmatrix} = \begin{bmatrix} \hat{\theta}_{k,i} \\ \hat{\phi}_{k,i} \\ \hat{\tau}_{k,i} \end{bmatrix} - \ddot{S}\left( \hat{\theta}_{k,i}, \hat{\phi}_{k,i}, \hat{\tau}_{k,i} \right)^{-1} \dot{S}\left( \hat{\theta}_{k,i}, \hat{\phi}_{k,i}, \hat{\tau}_{k,i} \right), \tag{12}$$

where

$$\dot{S}(\theta, \phi, \tau) = \begin{bmatrix} \frac{\partial S}{\partial \theta} \\ \frac{\partial S}{\partial \phi} \\ \frac{\partial S}{\partial \tau} \end{bmatrix}, \qquad \ddot{S}(\theta, \phi, \tau) = \begin{bmatrix} \frac{\partial^{2} S}{\partial \theta^{2}} & \frac{\partial^{2} S}{\partial \theta \partial \phi} & \frac{\partial^{2} S}{\partial \theta \partial \tau} \\ \frac{\partial^{2} S}{\partial \phi \partial \theta} & \frac{\partial^{2} S}{\partial \phi^{2}} & \frac{\partial^{2} S}{\partial \phi \partial \tau} \\ \frac{\partial^{2} S}{\partial \tau \partial \theta} & \frac{\partial^{2} S}{\partial \tau \partial \phi} & \frac{\partial^{2} S}{\partial \tau^{2}} \end{bmatrix}. \tag{13}$$

In (11), we regard $\mathbf{y}^{\mathrm{ul}}_{\mathrm{r}}$ and $g^{\mathrm{ul}}$ as constants, so the differentiation of $S$ reduces to the differentiation of the codeword $\mathbf{c}$. We take the partial derivatives of $S$ with respect to $\theta$ as examples. The first-order partial derivative is calculated as

$$\frac{\partial S}{\partial \theta} = 2\Re\left\{ \left(\mathbf{y}^{\mathrm{ul}}_{\mathrm{r}}\right)^{H} g^{\mathrm{ul}} \frac{\partial \mathbf{c}}{\partial \theta} - \left| g^{\mathrm{ul}} \right|^{2} \mathbf{c}^{H} \frac{\partial \mathbf{c}}{\partial \theta} \right\}. \tag{14}$$

The second-order partial and cross partial derivatives are

$$\frac{\partial^{2} S}{\partial \theta^{2}} = 2\Re\left\{ \left( \left(\mathbf{y}^{\mathrm{ul}}_{\mathrm{r}}\right)^{H} g^{\mathrm{ul}} - \left| g^{\mathrm{ul}} \right|^{2} \mathbf{c}^{H} \right) \frac{\partial^{2} \mathbf{c}}{\partial \theta^{2}} \right\} - 2\left\| g^{\mathrm{ul}} \frac{\partial \mathbf{c}}{\partial \theta} \right\|^{2}, \tag{15}$$

and

$$\frac{\partial^{2} S}{\partial \theta \partial \phi} = 2\Re\left\{ \left( \left(\mathbf{y}^{\mathrm{ul}}_{\mathrm{r}}\right)^{H} g^{\mathrm{ul}} - \left| g^{\mathrm{ul}} \right|^{2} \mathbf{c}^{H} \right) \frac{\partial^{2} \mathbf{c}}{\partial \theta \partial \phi} \right\} - 2\Re\left\{ \left| g^{\mathrm{ul}} \right|^{2} \frac{\partial \mathbf{c}^{H}}{\partial \phi} \frac{\partial \mathbf{c}}{\partial \theta} \right\}, \tag{16}$$

respectively. Other derivatives can be derived in a similar way. From (12), we see that the effectiveness of the e-Newton step greatly depends on the initial values, that is, the coarse estimates obtained in the e-OMP step. If the initial values are not far from the true maximum, then the e-Newton refinement moves toward the global maximum rather than a local maximum, thereby increasing the precision of the estimates of the delay, azimuth, and downtilt. This is one of the reasons why we use an oversampled codebook in the e-OMP step. The gain of the $i$th component path is updated through (10) by replacing $(\hat{\theta}_{k,i}, \hat{\phi}_{k,i}, \hat{\tau}_{k,i})$ with $(\hat{\theta}'_{k,i}, \hat{\phi}'_{k,i}, \hat{\tau}'_{k,i})$. Thereafter, the refined $i$th component path is removed from $\mathbf{y}^{\mathrm{ul}}_{\mathrm{r},k}(i)$.

Algorithm 1 briefly summarizes the working steps of an iteration in the eNOMP algorithm for the massive MIMO-OFDM system.

Algorithm 1 Working Steps of an Iteration in eNOMP
Step 1: New detection. Coarsely estimate the downtilt, azimuth, and delay of a component path by applying the e-OMP step.
Step 2: Single Newton Refinement. Apply the e-Newton step to the newly detected component path.
Step 3: Cyclic Newton Refinement. Cyclically apply e-Newton steps to all the detected component paths one by one.
Step 4: Gains Update. Update the gains of all the detected component paths by utilizing the refined parameters.

The stopping criterion of the eNOMP iterations is designed on the basis of the required false alarm rate $P_{\mathrm{fa}}$ [27] whose typical value range is − − − . The algorithm terminates when

$$\left\| \mathcal{F}\{ \mathbf{y}^{\mathrm{ul}}_{\mathrm{r},k} \} \right\|_{\infty}^{2} < \ln(MN) - \ln\left( -\ln(1 - P_{\mathrm{fa}}) \right), \tag{17}$$

where $\mathcal{F}\{\cdot\}$ represents the Fourier transform and $\|\cdot\|_{\infty}$ denotes the infinity norm. Note that if each path is precisely estimated, then the algorithm terminates when all the real paths are extracted and no false alarm happens, as long as the value of $P_{\mathrm{fa}}$ is set reasonably. If the paths extracted in early iterations are not precise enough, then the value of $P_{\mathrm{fa}}$ determines how many paths are detected. The eNOMP algorithm terminates early if $P_{\mathrm{fa}}$ is small, and consequently fewer paths may be detected. Conversely, the algorithm stops late when $P_{\mathrm{fa}}$ is increased, and we obtain more paths. The paths obtained later occupy a very small proportion of the power of the reconstructed channel and may not exist in the real channel. Nevertheless, their existence helps enhance the global accuracy of the reconstructed channel. Finally, the BS obtains the estimated frequency-independent parameters of all the user channels, which are denoted by $\{\hat{\tau}_{k,l}, \hat{\theta}_{k,l}, \hat{\phi}_{k,l}\}$, where $l = 1, \ldots, \hat{L}_k$ and $k = 1, \ldots, K$.

IV. EFFICIENT DOWNLINK TRAINING AND CHANNEL RECONSTRUCTION

The multiuser downlink channel can be reconstructed by estimating the downlink gains for each user one by one. However, this procedure costs considerable downlink resources for training pilots, which is the second key problem mentioned in Section II. In this section, we propose an efficient training strategy that utilizes a small amount of training overhead to obtain accurate estimates of downlink gains for multiple users.
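The number of downlink gains that must be trained for user $k$ equals the number of paths $\hat{L}_k$ returned by eNOMP, so it can help to see how the termination test (17) of Section III sets that number. The sketch below evaluates the right-hand side of (17) and applies the test. The function names are hypothetical, and two points are assumptions rather than statements from the text: the noise has unit variance (as in (5)), and `peak_power` is the largest squared magnitude of the suitably normalized Fourier transform of the residual.

```python
import math


def enomp_stop_threshold(m_antennas, n_subcarriers, p_fa):
    """Right-hand side of (17): ln(MN) - ln(-ln(1 - P_fa)), unit noise variance assumed."""
    if not 0.0 < p_fa < 1.0:
        raise ValueError("p_fa must lie strictly between 0 and 1")
    return math.log(m_antennas * n_subcarriers) - math.log(-math.log(1.0 - p_fa))


def should_stop(peak_power, m_antennas, n_subcarriers, p_fa):
    """True when the residual peak power falls below the threshold in (17)."""
    return peak_power < enomp_stop_threshold(m_antennas, n_subcarriers, p_fa)
```

A smaller $P_{\mathrm{fa}}$ raises the threshold, so the residual drops below it sooner and fewer paths are extracted. Fewer paths in turn mean fewer downlink gains to train, at the cost of a coarser reconstruction.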

A. Requirement for Successful Estimation

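We analyze the requirement for the successful estimation of downlink gains. For a certain user in the system, the real spatial parameters are $\{\tau_l, \theta_l, \phi_l\}_{l=1,\ldots,L}$ and their estimates are $\{\hat{\tau}_l, \hat{\theta}_l, \hat{\phi}_l\}_{l=1,\ldots,\hat{L}}$. In this subsection, we omit the user index to simplify the expressions. During the downlink gain estimation phase, the BS transmits downlink pilots over $T_p$ successive OFDM symbols. To enhance the received power of the pilots and improve the estimation accuracy, beamforming is applied to the pilots. The beamforming direction is changed on every OFDM symbol.

We suppose that comb-type all-ones pilots are used and that these pilots are sparsely and uniformly inserted in the downlink frequency band. The received pilots on the $t$th OFDM symbol at a certain user are

$$\mathbf{y}^{\mathrm{dl}}(t) = \sum_{l=1}^{L} \sqrt{P}\, g_l^{\mathrm{dl}}\, \mathbf{a}^T(\theta_l, \phi_l)\, \mathbf{b}_t\, \mathbf{p}_p(\tau_l) + \mathbf{z}^{\mathrm{dl}}(t), \tag{18}$$

where $P$ is the transmit power; $\mathbf{b}_t \in \mathbb{C}^{M \times 1}$ is the training beam used on the $t$th OFDM symbol;

$$\mathbf{p}_p(\tau) = \left[ e^{j2\pi (f^{\mathrm{dl}} - f^{\mathrm{ul}} + n_1 \triangle f)\tau}, \ldots, e^{j2\pi (f^{\mathrm{dl}} - f^{\mathrm{ul}} + n_{N_p} \triangle f)\tau} \right]^T \tag{19}$$

describes the delay on the $N_p$ downlink subcarriers that are occupied by downlink pilots; and $\mathbf{z}^{\mathrm{dl}}(t)$ is the noise vector on the $t$th OFDM symbol with elements that are i.i.d. with zero mean and unit variance. At the user side, considering that $L$, $\theta_l$, $\phi_l$, and $\tau_l$ have been estimated in the uplink and sent to this user, we replace them by $\hat{L}$, $\hat{\theta}_l$, $\hat{\phi}_l$, and $\hat{\tau}_l$ in the following derivation. We denote

$$\Theta_{l,t} = \mathbf{a}^T(\hat{\theta}_l, \hat{\phi}_l)\, \mathbf{b}_t \tag{20}$$

to simplify the expressions. By stacking all the received pilots into a large vector, we obtain

$$\mathbf{y}^{\mathrm{dl}} = \sqrt{P}\, \mathbf{A} \mathbf{g}^{\mathrm{dl}} + \mathbf{z}^{\mathrm{dl}}, \tag{21}$$

where

$$\mathbf{y}^{\mathrm{dl}} = \begin{bmatrix} \mathbf{y}^{\mathrm{dl}}(1) \\ \vdots \\ \mathbf{y}^{\mathrm{dl}}(T_p) \end{bmatrix}, \quad \mathbf{g}^{\mathrm{dl}} = \begin{bmatrix} g_1^{\mathrm{dl}} \\ \vdots \\ g_{\hat{L}}^{\mathrm{dl}} \end{bmatrix}, \quad \mathbf{z}^{\mathrm{dl}} = \begin{bmatrix} \mathbf{z}^{\mathrm{dl}}(1) \\ \vdots \\ \mathbf{z}^{\mathrm{dl}}(T_p) \end{bmatrix} \tag{22}$$

are the stacked received pilot, downlink gain, and noise vectors, respectively; and

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}(1,1) & \cdots & \mathbf{A}(1,\hat{L}) \\ \vdots & \ddots & \vdots \\ \mathbf{A}(T_p,1) & \cdots & \mathbf{A}(T_p,\hat{L}) \end{bmatrix} \tag{23}$$

is the coefficient matrix with submatrix

$$\mathbf{A}(t,l) = e^{j2\pi (f^{\mathrm{dl}} - f^{\mathrm{ul}})\hat{\tau}_l}\, \Theta_{l,t}\, \mathbf{p}_p(\hat{\tau}_l). \tag{24}$$

From (21), the least squares (LS) estimate of the gains is given by

$$\hat{\mathbf{g}}^{\mathrm{dl}} = \frac{1}{\sqrt{P}} \mathbf{A}^{\dagger} \mathbf{y}^{\mathrm{dl}} = \frac{1}{\sqrt{P}} (\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H \mathbf{y}^{\mathrm{dl}}. \tag{25}$$

Evidently, the first requirement for the successful estimation of $\mathbf{g}^{\mathrm{dl}}$ is that $\mathbf{A}^H \mathbf{A}$ is invertible. Thus, $\mathbf{A}$ must have full column rank, that is, $\mathrm{rank}(\mathbf{A}) = \hat{L}$. An implied condition for full column rank is that $N_p T_p \geq \hat{L}$, which indicates that the number of downlink pilots should be no less than the number of estimated downlink gains. We apply the singular-value decomposition to the coefficient matrix, $\mathbf{A} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^H$, where $\mathbf{U} \in \mathbb{C}^{N_p T_p \times N_p T_p}$ and $\mathbf{V} \in \mathbb{C}^{\hat{L} \times \hat{L}}$ are unitary matrices and $\mathbf{\Lambda} \in \mathbb{C}^{N_p T_p \times \hat{L}}$ satisfies

$$\mathbf{\Lambda} = \begin{bmatrix} \mathrm{diag}(\lambda_1, \ldots, \lambda_{\hat{L}}) \\ \mathbf{0} \end{bmatrix}, \tag{26}$$

where $\lambda_1 \geq \ldots \geq \lambda_{\hat{L}}$ are the singular values. To ensure that $\mathbf{A}^H \mathbf{A} = \mathbf{V} \mathbf{\Lambda}^H \mathbf{\Lambda} \mathbf{V}^H$ is invertible, the smallest singular value must satisfy $|\lambda_{\hat{L}}| > 0$.

We denote $\tilde{\mathbf{y}}^{\mathrm{dl}} = \mathbf{U}^H \mathbf{y}^{\mathrm{dl}}$, $\tilde{\mathbf{g}}^{\mathrm{dl}} = \mathbf{V}^H \mathbf{g}^{\mathrm{dl}}$, and $\tilde{\mathbf{z}}^{\mathrm{dl}} = \mathbf{U}^H \mathbf{z}^{\mathrm{dl}}$. Then, the LS estimate of the downlink gain of the $l$th estimated path is

$$\hat{\tilde{g}}_l^{\mathrm{dl}} = \tilde{g}_l^{\mathrm{dl}} + \frac{\lambda_l^*}{\sqrt{P}\, |\lambda_l|^2}\, \tilde{z}_l^{\mathrm{dl}}, \tag{27}$$

where $\tilde{g}_l^{\mathrm{dl}}$ and $\tilde{z}_l^{\mathrm{dl}}$ are the $l$th elements of $\tilde{\mathbf{g}}^{\mathrm{dl}}$ and $\tilde{\mathbf{z}}^{\mathrm{dl}}$, respectively. The normalized MSE (NMSE) of the estimated downlink gains is defined as

$$\mathrm{NMSE} = \frac{\mathbb{E}\{\|\hat{\mathbf{g}}^{\mathrm{dl}} - \mathbf{g}^{\mathrm{dl}}\|^2\}}{\|\mathbf{g}^{\mathrm{dl}}\|^2}, \tag{28}$$

which can be further derived as

$$\mathrm{NMSE} = \frac{1}{P \|\mathbf{g}^{\mathrm{dl}}\|^2} \sum_{l=1}^{\hat{L}} \frac{\mathbb{E}\{|\tilde{z}_l^{\mathrm{dl}}|^2\}}{|\lambda_l|^2}. \tag{29}$$

Here, we define a successful estimation of the downlink gains in terms of the NMSE.

Assumption 1: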

Estimation of the downlink gains of a user is successful when the NMSE is below a tolerated error rate $\delta$, where $0 < \delta \ll 1$.

Now, we attempt to gain a deeper insight into (29). Because $\mathbf{U}$ is unitary, the rotated noise vector $\tilde{\mathbf{z}}^{\mathrm{dl}} = \mathbf{U}^H \mathbf{z}^{\mathrm{dl}}$ has the same statistics as $\mathbf{z}^{\mathrm{dl}}$. Thus, the elements of $\tilde{\mathbf{z}}^{\mathrm{dl}}$ are also i.i.d. with zero mean and unit variance, that is, $\mathbb{E}\{|\tilde{z}_l^{\mathrm{dl}}|^2\} = 1$. According to [18], the spatial power difference between the uplink and downlink is small. We use $\|\hat{\mathbf{g}}^{\mathrm{ul}}\|^2$ to approximate $\|\mathbf{g}^{\mathrm{dl}}\|^2$. Then, the NMSE of the gains can be estimated by

$$\widehat{\mathrm{NMSE}} = \frac{1}{P \|\hat{\mathbf{g}}^{\mathrm{ul}}\|^2} \sum_{l=1}^{\hat{L}} \frac{1}{|\lambda_l|^2}. \tag{30}$$

According to Assumption 1, we can predict that the downlink gain estimation will be successful if

$$\frac{1}{P \|\hat{\mathbf{g}}^{\mathrm{ul}}\|^2} \sum_{l=1}^{\hat{L}} \frac{1}{|\lambda_l|^2} < \delta. \tag{31}$$

Subsequently, we formulate the requirement for a successful estimation as follows.

Assumption 2:

In multiuser FDD massive MIMO systems, the requirement for successful estimation of downlink gains is that

$$\sum_{l=1}^{\hat{L}_k} \frac{1}{|\lambda_{l,k}|^2} < \delta P \|\hat{\mathbf{g}}_k^{\mathrm{ul}}\|^2 \tag{32}$$

holds for user $k = 1, \ldots, K$, where $\lambda_{l,k}$ is the $l$th singular value of the coefficient matrix $\mathbf{A}_k$ of user $k$, which is defined in (23)-(24).

This assumption guides our design of the downlink-training strategy. In the following subsection, we will use (32) to design the downlink-training beams $\mathbf{b}_t$, $t = 1, \ldots, T_p$, for achieving successful estimation of the downlink gains.

B. Spatial Angle Grid

If only one user exists in the cell, then all the downlink pilots can be dedicated to this user and beamformed toward the spatial angles estimated in the uplink. In this condition, $T_p = \hat{L}$ and $\mathbf{b}_t = \mathbf{a}^*(\hat{\theta}_t, \hat{\phi}_t)/\sqrt{M}$ for $t = 1, \ldots, \hat{L}$. Consequently, $\Theta_{t,t}$ is equal to its maximum value $\sqrt{M}$. From (30), we can see that the estimation error decreases as the singular values $|\lambda_1|, \ldots, |\lambda_{\hat{L}}|$ grow. If $\Theta_{t,t} \gg \Theta_{l,t}$ for $l \neq t$, then $|\lambda_1|, \ldots, |\lambda_{\hat{L}}|$ approach $\sqrt{M}$, and we can obtain precise estimation results with extremely small error.

A simple extension to the multiuser scenario is to execute the single-user estimation process for each user individually. In particular, $T_p = \sum_{k=1}^{K} \hat{L}_k$ OFDM symbols will be occupied by the pilots. However, this method becomes extremely time consuming when the number of users increases. If the training time exceeds the channel coherence time, that is, $T_p \geq T_c$, then the reconstruction results are no longer useful.

To ensure a balance between precision and efficiency, we introduce a spatial angle grid and select beamforming directions from the angles on the grid. The spatial angle grid covers the entire space and uniformly samples the spatial directions. Each spatial angle on the grid corresponds to a downlink-training beam. We have defined a codebook with similar usage in the eNOMP algorithm. As shown in Fig. 3, each vertical plane represents a lower dimensional sub-codebook that uniformly samples the entire 3D space. Therefore, we can reuse this sub-codebook as the spatial angle grid.

Remark: We set $\beta_\theta = \beta_\phi = 1$ to restrict the quantity of downlink-training beams. Notably, if we use an oversampled grid, the probability that a grid point is shared by more than one user greatly decreases, and the number of selected grid points increases. Besides, we do not need to precisely target the downlink pilots on the spatial directions of the paths.
We can successfully estimate the downlink gains as long as (32) is satisfied.

For the $i$th grid point, $i = 1, \ldots, M_v M_h$, we denote $(\bar{\theta}, \bar{\phi})_i = (\bar{\theta}_{i_v}, \bar{\phi}_{i_h})$, where the corresponding downtilt and azimuth are

$$\bar{\theta}_{i_v} = \frac{\pi}{M_v} \left( i_v - \frac{M_v}{2} - 1 \right), \quad \bar{\phi}_{i_h} = \frac{\pi}{M_h} \left( i_h - \frac{M_h}{2} - 1 \right). \tag{33}$$

In addition, the corresponding indices of the sampled downtilt and azimuth angles are

$$i_v = \left\lceil \frac{i}{M_h} \right\rceil, \quad i_h = i - M_h (i_v - 1), \tag{34}$$

respectively. The $i$th grid point corresponds to the beamforming vector $\mathbf{a}^*(\bar{\theta}_{i_v}, \bar{\phi}_{i_h})/\sqrt{M}$.

Each estimated pair of downtilt and azimuth angles is rounded to its "nearest" spatial grid point, that is, the grid point with the largest projected power from the original estimated angle pair. For the $l$th estimated path of user $k$, the projected power on grid point $(\bar{\theta}, \bar{\phi})$ is defined as

$$\rho_{k,l}(\bar{\theta}, \bar{\phi}) = \frac{1}{M} \left| \mathbf{a}^T(\hat{\theta}_{k,l}, \hat{\phi}_{k,l})\, \mathbf{a}^*(\bar{\theta}, \bar{\phi}) \right|^2. \tag{35}$$

To enhance the received power at the user side, we calculate the projected power on all the grid points, select the one with the maximum value

$$(\bar{\theta}_{k,l}, \bar{\phi}_{k,l}) = \arg\max_{\substack{\bar{\theta} = \bar{\theta}_1, \ldots, \bar{\theta}_{M_v} \\ \bar{\phi} = \bar{\phi}_1, \ldots, \bar{\phi}_{M_h}}} \rho_{k,l}(\bar{\theta}, \bar{\phi}), \tag{36}$$

and mark it as the optimal grid point.

The main advantages of using a spatial angle grid are as follows. (1) When different users mark the same grid point, the corresponding training beam can be shared by these users simultaneously. As a result, the downlink dedicated pilots are turned into downlink broadcast pilots, and the number of required pilots is decreased. (2) The estimated angle pair is projected to an optimal grid point, and the projected power is sufficiently large to guarantee a high received signal-to-noise ratio (SNR) of the downlink pilots.

C. Downlink-Training Strategy

After the projected grid points from each user are known, the BS records all the marked grid points, includes them in the selected grid angle set, and obtains $\Phi = \{(\bar{\theta}, \bar{\phi})_{i_1}, \ldots, (\bar{\theta}, \bar{\phi})_{i_S}\}$. Then, $S$ OFDM symbols will be occupied by the downlink pilots for the estimation of the downlink gains of each user. In a massive MIMO system, the number of users $K$ is usually large. Accordingly, $S$ is large as well. To ensure that sufficient time is retained for data transmission within the channel coherence time, we should reduce the overhead for training.
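The bookkeeping behind $\Phi$ can be illustrated with a minimal sketch. It assumes that each user's marked grid indices have already been obtained from (36); the dictionary `marked_by_user` and both helper names are hypothetical and serve only as an illustration. The index mapping follows (34), and the array size $M_v = 8$, $M_h = 16$ is the one used in the numerical results.

```python
import math


def grid_indices(i, m_h):
    """Map a 1-based grid index i to (i_v, i_h) following (34)."""
    i_v = math.ceil(i / m_h)
    i_h = i - m_h * (i_v - 1)
    return i_v, i_h


def selected_grid_set(marked_by_user):
    """Union of the grid indices marked by all users, sorted for readability."""
    phi = set()
    for marks in marked_by_user.values():
        phi.update(marks)
    return sorted(phi)


M_V, M_H = 8, 16  # UPA size from the numerical results

# Hypothetical marks: user -> grid indices of its optimal grid points.
marked_by_user = {1: [2, 9], 2: [9, 40], 3: [40, 41, 128]}

phi = selected_grid_set(marked_by_user)
S = len(phi)  # number of OFDM symbols needed before any beam scheduling
total_paths = sum(len(m) for m in marked_by_user.values())
```

In this toy setting, sharing grid points already makes `S` smaller than `total_paths`, which is the per-user dedicated-pilot count.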

1) Reasons for the Reduction in Training Overhead:

As mentioned above, we mark an optimal grid point for each estimated angle pair to achieve a high received SNR. According to [21], when the number of antennas is large but finite, the estimated angle pair has considerable projected power on more than one grid angle. Therefore, if the transmit power is sufficiently large, then a suboptimal grid point with the second or third largest projected power is also adequate to ensure a high received power of the pilots. Moreover, when the projected power on other grid points is sufficiently large and the resulting singular values of $\mathbf{A}$ satisfy (32), the suboptimal options work as well. These observations make it possible to further reduce the number of grid points included in $\Phi$. To better explain our idea, we first discuss two specific cases in which the set of selected grid points can be reduced.

Case 1: Share a common grid point.

We suppose that only one user exists in the cell. The propagation paths are spatially close to one another. These paths are projected on different optimal grid points but share a common suboptimal grid point. According to (23), if the BS beamforms the downlink pilots only in the direction of this common suboptimal grid point, then the coefficient matrix $\mathbf{A}$ becomes

$$\mathbf{A} = \left[ \Theta_{1,1}\, \mathbf{p}_p(\hat{\tau}_1),\ \Theta_{2,1}\, \mathbf{p}_p(\hat{\tau}_2),\ \ldots,\ \Theta_{\hat{L},1}\, \mathbf{p}_p(\hat{\tau}_{\hat{L}}) \right]. \tag{37}$$

Notably, $\mathbf{p}_p(\hat{\tau}_l)$ is a vector with completely different elements, and the delays $\hat{\tau}_l$, $l = 1, \ldots, \hat{L}$, differ from one another. The probability that $\mathbf{A}$ satisfies (32) is therefore high, and the downlink gains can still be successfully estimated. Thus, $\hat{L} - 1$ grid points in $\Phi$ are redundant, and the $\hat{L}$ optimal grid points can be replaced by a single grid point. In this condition, $|\Phi|$ equals 1 rather than $\hat{L}$, thereby saving $\hat{L} - 1$ OFDM symbols of training overhead.
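Case 1 can be checked numerically for $\hat{L} = 2$ with a minimal sketch. The delays, the beam gain $\Theta$, the power $P$, the tolerance $\delta$, and the uplink gain norm below are hypothetical values chosen for illustration; the pilot spacing, $\triangle f$, and $f^{\mathrm{dl}} - f^{\mathrm{ul}}$ follow the simulation settings. The columns follow (37), and the squared singular values are obtained in closed form from the $2 \times 2$ Gram matrix $\mathbf{A}^H \mathbf{A}$, so no linear-algebra library is needed.

```python
import cmath
import math


def pilot_delay_vector(tau, pilot_idx, f_gap, delta_f):
    """p_p(tau) of (19): phase ramp over the pilot subcarriers."""
    return [cmath.exp(2j * math.pi * (f_gap + n * delta_f) * tau) for n in pilot_idx]


def squared_singular_values(col1, col2):
    """Squared singular values of [col1, col2] via its 2x2 Gram matrix."""
    a = sum(abs(x) ** 2 for x in col1)
    d = sum(abs(x) ** 2 for x in col2)
    b = sum(x.conjugate() * y for x, y in zip(col1, col2))
    root = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2)
    return (a + d + root) / 2, (a + d - root) / 2


F_GAP = 300e6                     # f_dl - f_ul in Hz
DELTA_F = 75e3                    # subcarrier spacing in Hz
PILOT_IDX = range(0, 256, 4)      # one pilot in every four subcarriers

THETA = 6.0                       # hypothetical gain of both paths on the shared beam
TAU_1, TAU_2 = 1e-6, 3e-6         # hypothetical, distinct path delays in seconds

col1 = [THETA * x for x in pilot_delay_vector(TAU_1, PILOT_IDX, F_GAP, DELTA_F)]
col2 = [THETA * x for x in pilot_delay_vector(TAU_2, PILOT_IDX, F_GAP, DELTA_F)]
lam_max_sq, lam_min_sq = squared_singular_values(col1, col2)

# Left- and right-hand sides of (32) under hypothetical P, delta, and uplink gain norm.
P, DELTA, G_UL_NORM_SQ = 10.0, 1e-2, 1.0
lhs = 1 / lam_max_sq + 1 / lam_min_sq
condition_32_holds = lhs < DELTA * P * G_UL_NORM_SQ
```

With distinct delays the two columns are far from parallel, so both singular values stay large and a single shared beam suffices in this toy setting. Setting `TAU_2 = TAU_1` makes the columns identical and drives the smaller singular value to zero, which violates (32).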

Case 2: Utilize grid points selected by other users.

In this case, we suppose that two users exist and that each user has only one propagation path. We assume that user 1 can receive the broadcast downlink pilot that is beamformed to the optimal grid point selected by user 2. Therefore, this pilot can be utilized by user 1 as well. If the resulting singular values of $\mathbf{A}$ satisfy (32), then the original optimal grid point of the path of user 1 can be removed from $\Phi$. In this condition, the estimation accuracy will not be degraded considerably even if a user receives downlink pilots that are beamformed to neither its optimal nor its suboptimal grid angles.

Fig. 4. Weights of grid points for four users, with $w(2) = 1$ and $w(9) = 2$. Grid point $(\bar{\theta}, \bar{\phi})_2$ has the smallest weight and will thus be abandoned first.

2) Beam Scheduling Scheme:

The two cases indicate that multiple paths can share a common grid point and that each user can utilize the grid points selected by other users. These findings explain why the training overhead can be reduced. Based on these findings, we propose a beam scheduling scheme to exclude the redundant grid points in $\Phi$ and only retain the grid points that can be shared by the paths of all the user channels. The target of the beam scheduling scheme is to minimize the number of selected grid points in $\Phi$:

$$\arg\min_{\Phi} |\Phi|, \quad \mathrm{s.t.}\ (32)\ \text{holds for}\ k = 1, \ldots, K. \tag{38}$$

If we exhaustively search all the possible grid angle subsets of $\Phi$ to find the optimum subset, then the search time will be prohibitive. Thus, we follow an approach similar to the greedy method but use it in reverse. The redundant grid angles are identified and excluded from $\Phi$ one by one. According to Assumption 2, if (32) holds for all the $K$ users and no more grid points can be excluded from $\Phi$, then we obtain the final set of selected grid points, which will be transformed into the downlink-training beams.

We introduce the concept of "weight" in the removal of redundant grid angles. If grid point $(\bar{\theta}, \bar{\phi})_i$ is chosen by $\bar{K}$ users, then its weight is denoted by $w(i) = \bar{K}$. A large weight shows that the grid point is shared by many users. In principle, we keep the grid points that are shared by several users and abandon the grid points that are chosen by only one user. For example, the weights of grid points $(\bar{\theta}, \bar{\phi})_2$ and $(\bar{\theta}, \bar{\phi})_9$ in Fig. 4 are 1 and 2, respectively. Grid point $(\bar{\theta}, \bar{\phi})_2$ has the smallest weight and will therefore be abandoned first.

Specifically, we compare $w(i_1), \ldots, w(i_S)$ and reorder $\Phi$ in increasing weight, that is, $w(j_1) \leq w(j_2) \leq \cdots \leq w(j_S)$ and $\Phi = \{(\bar{\theta}, \bar{\phi})_{j_1}, \ldots, (\bar{\theta}, \bar{\phi})_{j_S}\}$. Following this order, we keep excluding the front grid points with small weights until the next grid point in $\Phi$ cannot be removed without violating (32).
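The weight-ordered removal described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name `schedule_beams` is hypothetical, and the check of (32) is abstracted into a caller-supplied predicate `condition_holds(user, beams)`. In the toy example, a user is (by assumption) satisfied whenever at least one of its own marked grid points remains, which merely stands in for the singular-value test.

```python
from collections import Counter


def schedule_beams(user_choices, condition_holds):
    """Greedy reverse removal of grid points in increasing order of weight.

    user_choices: dict user -> set of grid indices marked by that user.
    condition_holds(user, beams): stand-in for checking (32) for one user.
    Returns the retained grid indices; their count plays the role of T_p.
    """
    # Weight w(i) = number of users that marked grid point i.
    weights = Counter(i for chosen in user_choices.values() for i in set(chosen))
    ordered = sorted(weights, key=lambda i: (weights[i], i))
    selected = list(ordered)
    for point in ordered:
        trial = [p for p in selected if p != point]
        if not all(condition_holds(u, trial) for u in user_choices):
            break  # stop at the first indispensable grid point
        selected = trial
    return selected


# Hypothetical marks: grid point 2 has weight 1, points 4 and 9 have weight 2.
user_choices = {1: {2, 9}, 2: {9, 4}, 3: {4}}


def toy_condition(user, beams):
    """Toy stand-in for (32): at least one of the user's marks is retained."""
    return any(p in user_choices[user] for p in beams)


remaining = schedule_beams(user_choices, toy_condition)
T_p = len(remaining)
```

Stopping at the first failed removal mirrors the reverse-greedy rule in the text; a variant that skips an indispensable point and keeps testing later ones would be a different heuristic.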
The working principle of the beam scheduling scheme is summarized in Algorithm 2. We denote the indices of the remaining grid points as $j_{S-T_p+1}, \ldots, j_S$. Then, the downlink-training beams are $\mathbf{b}_t = \mathbf{a}^*\big((\bar{\theta}, \bar{\phi})_{j_t}\big)/\sqrt{M}$, $t = S - T_p + 1, \ldots, S$.

The complexity of Algorithm 2 increases in proportion to the number of users. However, if the users are spatially close to one another and their optimal grid points are almost the same, then Algorithm 2 needs to exclude only a few grid points because the size of the initial grid point set $\Phi$ is small. Under this condition, the complexity of Algorithm 2 does not increase sharply with the number of users. In contrast, if the users are spatially separated from one another, the size of the initial grid point set $\Phi$ is large. Then, the complexity increases because the number of grid points with small weights is large.

Before the start of the downlink pilot transmission phase, each selected user is informed of its delays, angles, and the grid angles for the downlink pilots. Thereafter, the BS broadcasts cell-common pilots in $T_p$ successive OFDM symbols. Each user receives all the pilots and calculates its downlink gains using (25). The estimated downlink gains are sent back to the BS. By utilizing the uplink-estimated downtilts, azimuths, delays, and the downlink-estimated gains, the BS reconstructs the downlink channel of user $k$ as

$$\hat{\mathbf{h}}_k^{\mathrm{dl}} = \sum_{l=1}^{\hat{L}_k} \hat{g}_{k,l}^{\mathrm{dl}}\, \mathbf{a}^T(\hat{\theta}_{k,l}, \hat{\phi}_{k,l}) \otimes \mathbf{p}^T(\hat{\tau}_{k,l})\, e^{j2\pi (f^{\mathrm{dl}} - f^{\mathrm{ul}})\hat{\tau}_{k,l}}, \tag{39}$$

where $\hat{g}_{k,l}^{\mathrm{dl}}$ is the estimated downlink gain of the $l$th path of user $k$. Finally, the BS obtains $\hat{\mathbf{h}}_1^{\mathrm{dl}}, \ldots, \hat{\mathbf{h}}_K^{\mathrm{dl}}$.

Algorithm 2 Beam Scheduling Strategy
Require: Selected grid point set $\Phi$
Initialize: $\Phi = \{(\bar{\theta}, \bar{\phi})_{i_1}, \ldots, (\bar{\theta}, \bar{\phi})_{i_S}\}$
1: Calculate the weight of each grid point in $\Phi$
2: Reorder the grid points in $\Phi$ in increasing order of weight and obtain $\Phi = \{(\bar{\theta}, \bar{\phi})_{j_1}, \ldots, (\bar{\theta}, \bar{\phi})_{j_S}\}$
3: Set flag $= 1$, $s = 1$
while flag do
    1) Set $\Phi_{\mathrm{temp}} = \Phi \setminus \{(\bar{\theta}, \bar{\phi})_{j_s}\}$
    for $k = 1, \ldots, K$ do
        a) if (32) does not hold for user $k$ when applying $\Phi_{\mathrm{temp}}$
               Set flag $= 0$, $T_p = S - s + 1$, break
    if flag $= 1$
        Set $\Phi = \Phi_{\mathrm{temp}}$, $s = s + 1$
Output: $\Phi$

TABLE I
COST COMPARISON

                  Training (OFDM symbols)    Feedback (complex numbers)
LMMSE             $M$                        $MNK$
Reconstruction    $T_p$                      $\sum_{k=1}^{K} \hat{L}_k$

D. Cost Evaluation

Here, we evaluate the total cost of the proposed downlink channel reconstruction scheme, including the training and the feedback overhead. We compare the proposed reconstruction with the LMMSE channel estimation method in FDD massive MIMO systems. Table I shows a brief comparison of the costs of these two schemes.
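The entries of Table I can be evaluated with a short sketch. The function names are hypothetical; the values $M = 128$, $N = 256$, $K = 10$, and six paths per user follow the paper's example settings, and $T_p = 12$ is taken as one value from the reported range of $T_p$.

```python
def lmmse_cost(m, n, k):
    """Training (OFDM symbols) and feedback (complex numbers) of LMMSE estimation."""
    return {"training": m, "feedback": m * n * k}


def reconstruction_cost(t_p, path_counts):
    """Training and feedback of the reconstruction scheme; one gain per estimated path."""
    return {"training": t_p, "feedback": sum(path_counts)}


M, N, K = 128, 256, 10
path_counts = [6] * K   # estimated number of paths per user (example setting)
T_P = 12                # one value within the reported range of T_p

lmmse = lmmse_cost(M, N, K)
recon = reconstruction_cost(T_P, path_counts)
feedback_ratio = lmmse["feedback"] / recon["feedback"]
```

Under these settings the feedback of the reconstruction scheme is several thousand times smaller than that of LMMSE estimation, because only one complex gain per estimated path is fed back instead of a full channel vector per subcarrier.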

1) Training Overhead:

For the widely used LMMSE channel estimation method, $M$ orthogonal downlink pilots are required to distinguish the $M$ BS antenna elements. To align with the orthogonal pilot design of the downlink-training strategy, the training overhead for LMMSE estimation is $M$ OFDM symbols. For the proposed reconstruction scheme, the training overhead is $T_p$ OFDM symbols. If we loosen the requirement on the estimation accuracy by increasing $\delta$, then, even in the worst case of $S = M$, most of the beams in the initial $\Phi$ are redundant and will be excluded from $\Phi$. Thus, we obtain $T_p \ll M$.

2) Feedback Overhead:

After the downlink channel is estimated using the LMMSE channel estimation method, each user sends the $M$-dimensional complex channel vectors on all the $N$ subcarriers to the BS. Therefore, the feedback overhead of the LMMSE estimation equals $MNK$ complex numbers. For the proposed downlink channel reconstruction scheme, each user only needs to send back the estimated downlink gains. Thus, the feedback overhead of the proposed scheme is $\sum_{k=1}^{K} \hat{L}_k$ complex numbers.

Example: If $M = 128$, $K = 10$, and $L_1 = \cdots = L_K = 6$, then the value range of $T_p$ is [12, 55] according to the numerical results in Section VI. The training overhead of LMMSE estimation is 128 OFDM symbols, and that of the proposed downlink channel reconstruction scheme is 12 to 55 OFDM symbols. If $N = 256$, then the LMMSE estimation needs to send back $128 \times 256 \times 10$ complex numbers. The proposed reconstruction scheme only feeds back about $6 \times 10$ complex numbers, far fewer than the LMMSE method.

V. MULTIUSER SUM-RATE ANALYSIS

With the reconstructed downlink channels of all the users, the BS conducts user scheduling and designs data transmission. If the number of users is much smaller than the number of BS antennas, then user scheduling is not necessary, and these users can be served simultaneously. In this section, we analyze the sum-rate based on the reconstructed channels.

During the downlink transmission phase, the BS sends data streams to all the $K$ users simultaneously. To overcome interference, the BS adopts zero-forcing (ZF) precoding before transmitting downlink data. The ZF precoder on downlink subcarrier $n$ is expressed as

$$\mathbf{W}(n) = \hat{\mathbf{H}}^{\dagger}(n)\, \mathbf{\Lambda}(n), \tag{40}$$

where $\hat{\mathbf{H}}(n) \in \mathbb{C}^{K \times M}$ is the reconstructed multiuser channel matrix on downlink subcarrier $n$,

$$[\hat{\mathbf{H}}(n)]_{k,:} = \hat{\mathbf{h}}_k^{\mathrm{dl}}(n) = \sum_{l=1}^{\hat{L}_k} \hat{g}_{k,l}^{\mathrm{dl}}\, \mathbf{a}^T(\hat{\theta}_{k,l}, \hat{\phi}_{k,l})\, e^{j2\pi (f^{\mathrm{dl}} - f^{\mathrm{ul}} + n \triangle f)\hat{\tau}_{k,l}}, \tag{41}$$

and $\mathbf{\Lambda}(n) = \mathrm{diag}\{\alpha_1, \ldots, \alpha_K\}$ is the normalization matrix. Here, we adopt a uniform power allocation strategy and set

$$\alpha_k = \frac{1}{\sqrt{K}\, \|[\hat{\mathbf{H}}^{\dagger}(n)]_{:,k}\|}. \tag{42}$$

For user $k$, its received data on subcarrier $n$ comprise the target data and the inter-user interference, that is,

$$r_k^{\mathrm{dl}}(n) = \sqrt{P}\, \mathbf{h}_k^{\mathrm{dl}}(n) [\mathbf{W}(n)]_{:,k}\, d_k(n) + \sum_{j \neq k} \sqrt{P}\, \mathbf{h}_k^{\mathrm{dl}}(n) [\mathbf{W}(n)]_{:,j}\, d_j(n) + z_k^{\mathrm{dl}}(n), \tag{43}$$

where $r_k^{\mathrm{dl}}(n)$ is the data received by user $k$ on subcarrier $n$, $P$ is the total downlink transmit power, $d_k(n)$ is the transmit data with unit power, and $z_k^{\mathrm{dl}}(n)$ is the noise with unit variance. Applying (40), we rewrite (43) as

$$r_k^{\mathrm{dl}}(n) = \sqrt{P}\, \alpha_k\, \mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,k}\, d_k(n) + \sum_{j \neq k} \sqrt{P}\, \alpha_j\, \mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,j}\, d_j(n) + z_k^{\mathrm{dl}}(n). \tag{44}$$

Considering the cost of the downlink channel reconstruction, the average multiuser sum-rate of this system is calculated as

$$R = \left( 1 - \frac{T_p}{T_c} \right) \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} \log_2 \left( 1 + \mathrm{SINR}_k(n) \right), \tag{45}$$

where $\mathrm{SINR}_k(n)$ is the SINR at user $k$ on subcarrier $n$ and can be expressed as

$$\mathrm{SINR}_k(n) = \frac{P \alpha_k^2\, |\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,k}|^2}{\sum_{j \neq k} P \alpha_j^2\, |\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,j}|^2 + |z_k^{\mathrm{dl}}(n)|^2}. \tag{46}$$

Notably, the multiuser sum-rate increases with the SINR but decreases as $T_p$ grows. We first evaluate the achievable rate of a single user $k$ by analyzing its SINR. To simplify the expressions, we neglect the subcarrier index $n$ in the following derivations. Theorem 1 provides the approximation of the expectation of $\mathrm{SINR}_k$ when the acceptable error rate equals $\delta$.

Theorem 1:

The expectation of the SINR at user $k$ approximates

$$\mathbb{E}\{\mathrm{SINR}_k\} \approx \frac{S_k}{\sum_{j \neq k} I_{k,j} + 1}, \tag{47}$$

where

$$S_k = \frac{P \left( 1 + \delta \sum_{m=1}^{M} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,k}|^2 \right)}{K \left( \|[\mathbf{H}^{\dagger}]_{:,k}\|^2 + \delta \sum_{j=1}^{K} \|[\mathbf{H}^{\dagger}]_{:,j}\|^2 \sum_{m=1}^{M} |[\mathbf{H}]_{j,m} [\mathbf{H}^{\dagger}]_{m,k}|^2 \right)} \tag{48}$$

can be viewed as the expected signal power, and

$$I_{k,j} = \frac{P \delta \sum_{m=1}^{M} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,j}|^2}{K \left( \|[\mathbf{H}^{\dagger}]_{:,j}\|^2 + \delta \sum_{i=1}^{K} \|[\mathbf{H}^{\dagger}]_{:,i}\|^2 \sum_{m=1}^{M} |[\mathbf{H}]_{i,m} [\mathbf{H}^{\dagger}]_{m,j}|^2 \right)} \tag{49}$$

is viewed as the expected interference power related to user $j$.

Proof: The proof of Theorem 1 can be found in the Appendix.

From Theorem 1, we can see that

$$S_k = \frac{P \left( 1 + \delta \sum_{m=1}^{M} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,k}|^2 \right)}{K \|[\mathbf{H}^{\dagger}]_{:,k}\|^2 \left( 1 + \delta \sum_{m=1}^{M} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,k}|^2 \right) + K \delta X}, \tag{50}$$

where $X = \sum_{j \neq k} \|[\mathbf{H}^{\dagger}]_{:,j}\|^2 \sum_{m=1}^{M} |[\mathbf{H}]_{j,m} [\mathbf{H}^{\dagger}]_{m,k}|^2 > 0$. Since

$$S_k < \frac{P}{K} \cdot \frac{1 + \delta \sum_{m=1}^{M} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,k}|^2}{\|[\mathbf{H}^{\dagger}]_{:,k}\|^2 \left( 1 + \delta \sum_{m=1}^{M} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,k}|^2 \right)}, \tag{51}$$

where the right-hand side reduces to $P / (K \|[\mathbf{H}^{\dagger}]_{:,k}\|^2)$, we obtain

$$S_k < \frac{P}{K \|[\mathbf{H}^{\dagger}]_{:,k}\|^2}, \tag{52}$$

whose right-hand side is the expected signal power when $\delta = 0$. The expected signal power therefore degrades if error is tolerated in the LS estimation of the downlink gains. When the value of $\delta$ becomes large, the expected signal power decreases.

The interference can be completely eliminated only if $\delta = 0$ and the reconstructed downlink multiuser channel is precise. Otherwise, $I_{k,j} > 0$, which shows that interference exists. After rewriting the expression of $I_{k,j}$ as

$$I_{k,j} = \frac{\sum_{m=1}^{M} \frac{P}{K} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,j}|^2}{\delta^{-1} \|[\mathbf{H}^{\dagger}]_{:,j}\|^2 + \sum_{i=1}^{K} \|[\mathbf{H}^{\dagger}]_{:,i}\|^2 \sum_{m=1}^{M} |[\mathbf{H}]_{i,m} [\mathbf{H}^{\dagger}]_{m,j}|^2}, \tag{53}$$

we find that the interference increases with $\delta$.

The analytical results above show that, when $\delta$ increases, the signal power decreases and the interference becomes more severe, resulting in a degradation of the SINR performance. However, with a larger $\delta$, most grid points in $\Phi$ are removed by the beam scheduling strategy and the value of $T_p$ is small, which increases the pre-log factor $(1 - T_p/T_c)$ in (45). Therefore, the sum-rate performance remains high even if the requirement on the estimation accuracy is relaxed.

VI. NUMERICAL RESULTS

In this section, we evaluate the performance of the downlink channel reconstruction-based FDD massive MIMO transceiver.In this FDD system, f dl − f ul = 300 MHz, N = 256 , and (cid:52) f = 75 kHz. For the UPA at BS, we set M v = 8 and M h = 16 .In each user channel, T c = 200 and L = · · · = L K = 6 . Delays of the paths are randomly distributed in [0 , / (cid:52) f ) . Thedowntilts and azimuths are randomly distributed in [ − π/ , π/ . Power attenuation occurs during the propagation of a wirelesssignal, and the total attenuation for each user channel is randomly set within [0 , − dB. For the e-NOMP algorithm, we set P fa = 10 − . The oversampling rates used the in e-OMP step are chosen by jointly considering the resolution of the codebookand the computation complexity. Considering that the values of N , M h , and M v are large, and N is larger than M h or M v ,we set β τ = 1 , and β θ = β φ = 2 when running the eNOMP algorithm. In the downlink training phase, pilots are uniformlyinserted in every four subcarriers. A. Evaluation of the eNOMP Algorithm

We evaluate the estimation precision of the eNOMP algorithm by checking whether the values of the extracted frequency-independent parameters are equal to their real values. Fig. 5 examines two implementations of the eNOMP algorithm when the channel attenuation and the transmit SNR equal 0 dB. The results are displayed in a 3D coordinate system. The coordinate of each point in the 3D coordinate system is composed of the delay, azimuth, and downtilt. The blue circles denote the real frequency-independent parameters, and the red stars represent their estimates. As shown in Fig. 5(a), the estimates coincide with the real values and the number of extracted paths is exactly the real number of paths. Therefore, the eNOMP algorithm can precisely detect each path from the mixture. Fig. 5(b) shows that an extra, fake path is detected, which implies that false alarms may occur during the implementation of eNOMP. Because the estimated parameters are not perfectly accurate, the component paths cannot be completely eliminated from the mixture. The residual components then combine into a fake component path, which is falsely detected by the algorithm.

Fig. 5. Results of two implementations of the eNOMP algorithm, panels (a) and (b). The real frequency-independent parameters are illustrated by circles and their estimates are denoted by stars.

The occurrence of false alarms is inevitable, but the eNOMP algorithm can still provide a globally accurate reconstruction result. We examine the global accuracy of the eNOMP-based uplink channel reconstruction by evaluating the NMSE, which is calculated as

$$\mathrm{NMSE} = \mathbb{E} \left\{ \frac{\|\hat{\mathbf{h}} - \mathbf{h}\|^2}{\|\mathbf{h}\|^2} \right\}. \tag{54}$$

The classical LS and LMMSE channel estimation methods are introduced as benchmarks. Fig. 6 compares the NMSE performance of the estimated or reconstructed uplink channels. Here, the wireless channel attenuation is set to 0 dB, and SNR equals the transmit power. As expected, the LS estimated channel has the worst accuracy, and LMMSE improves the performance by a large margin. Moreover, the eNOMP-based uplink channel reconstruction further reduces the NMSE considerably. When SNR equals 0 dB, eNOMP can obtain − NMSE. The value continuously drops with the increase of SNR. These results strongly demonstrate the high global accuracy of the eNOMP algorithm.

Fig. 6. NMSE performance of the eNOMP algorithm (uplink channel; NMSE versus SNR in dB for LS, LMMSE, and Reconstruct). The eNOMP-reconstructed uplink channel is more accurate than the LMMSE-estimated uplink channel.

Fig. 7. Evaluation of the NMSE and multiuser sum-rate performance of the proposed transceiver design with $K = 10$. Three sub-figures versus the required NMSE of gain: number of selected beams (Beam Scheduling Scheme); real NMSE of channel (LMMSE, Reconstruct); sum-rate in b/s/Hz (Perfect CSI, LMMSE, Reconstruct).

B. Evaluation of the Reconstruction-based Transceiver

We introduce LMMSE channel estimation as a benchmark again and consider its training cost of M = 128 OFDM symbolsto compare the performance of the proposed downlink channel reconstruction-based FDD massive MIMO transceiver. We alsoevaluate the case when perfect downlink CSI is known at the BS and assume that the training cost is equal to that of theproposed downlink-training strategy. The transmit SNRs in uplink and downlink are equal to 10 dB.Fig. 7 evaluates the performance of the transceiver under different requirements on the NMSE of the estimated downlinkgains with K = 10 . If we set δ = 10 − , then sufﬁcient pilots are available to guarantee the high estimation precision ofthe downlink gains. Thus, the ﬁrst sub-ﬁgure indicates that seldom grid points are excluded in the selected grid point set.In this condition, NMSE of the reconstructed downlink channel is as small as that of the LMMSE estimated channel whenobserving the second sub-ﬁgure. The real NMSE of the reconstruction scheme is higher than the required NMSE because ofa lower bound of the real NMSE, which can be further lowered by enhancing the transmit SNR. Moreover, we ﬁnd from thethird sub-ﬁgure that LMMSE estimation has extremely low sum-rate performance due to the large cost of training pilots. Thisperformance gap narrows if the channel’s coherence time increases because the time resource for training is not comparableto that of data transmission. The reconstruction scheme has nearly optimal rate performance when compared with the rateof using perfect CSI. Considering that we assume the two methods cost the same amount of training overhead, the proposedscheme can achieve as high SINR as that of using perfect CSI if δ is small. When δ increases, the high precision requirement is relaxed gradually. The number of the remaining grid points decreases as well. The reduction in training pilots results inthe degradation in accuracy of the estimates. 
Notably, the real NMSE is nearly the same with the predeﬁned NMSE. The rateperformance experiences a slight improvement and reaches the peak when δ = 10 − . Therefore, a little sacriﬁce of NMSE isacceptable, and the reduction in training overhead contributes to the increase in rate. The rate achieved by the reconstruction-based transceiver still approaches that of using perfect CSI. If we further release the NMSE requirement and set δ = 10 − ,then the rate performance gap between the proposed scheme and using perfect CSI becomes large. Nevertheless, the trainingcost is decreased to nearly 12 OFDM symbols. As a result, the rate achieved by the proposed scheme is still very high, whichdemonstrates the efﬁciency of the proposed scheme. VII. C ONCLUSION

In this paper, we proposed an efficient downlink channel reconstruction-based transceiver for FDD massive MIMO systems. Spatial reciprocity between uplink and downlink was utilized to reduce the training and feedback overhead. We first addressed the problem of extracting downtilts, azimuths, and delays in a 3D massive MIMO-OFDM system by introducing an eNOMP algorithm. Then, we solved the problem of estimating downlink gains for multiple users by utilizing a spatial angle grid and proposing an efficient downlink-training strategy with low overhead. Theoretical analysis revealed the effect of the value of the acceptable NMSE on the sum-rate performance. Numerical results showed that downtilts, azimuths, and delays could be precisely estimated using the eNOMP algorithm and that a high sum-rate could be achieved by utilizing the reconstructed multiuser downlink channel.

APPENDIX
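According to (46), the SINR at user $k$ satisfies

$$\mathbb{E}\{\mathrm{SINR}_k\} \approx \frac{\mathbb{E}\{P |\alpha_k|^2 |\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,k}|^2\}}{\sum_{j \neq k} \mathbb{E}\{P |\alpha_j|^2 |\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,j}|^2\} + \mathbb{E}\{|z_k^{\mathrm{dl}}|^2\}}. \tag{55}$$

First, it is obvious that $\mathbb{E}\{|z_k^{\mathrm{dl}}|^2\} = 1$. The signal and interference terms can be approximated by, respectively,

$$\mathbb{E}\{P |\alpha_k|^2 |\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,k}|^2\} \approx \frac{P\, \mathbb{E}\{|\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,k}|^2\}}{K\, \mathbb{E}\{\|[\hat{\mathbf{H}}^{\dagger}(n)]_{:,k}\|^2\}} \tag{56}$$

and

$$\mathbb{E}\{P |\alpha_j|^2 |\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,j}|^2\} \approx \frac{P\, \mathbb{E}\{|\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,j}|^2\}}{K\, \mathbb{E}\{\|[\hat{\mathbf{H}}^{\dagger}(n)]_{:,j}\|^2\}} \tag{57}$$

when (42) is applied. Considering the error in the reconstructed multiuser channel, we model the reconstructed multiuser channel by $\hat{\mathbf{H}} = \mathbf{H} + \mathbf{E}$, where $\mathbf{H} \in \mathbb{C}^{K \times M}$ is the real downlink channel matrix, $[\mathbf{H}]_{k,:} = \mathbf{h}_k^{\mathrm{dl}}$, and $\mathbf{E}$ is the reconstruction error with elements that are i.i.d. Gaussian with zero mean. We regard the real channel $\mathbf{H}$ as constant and take the expectation of the SINR with respect to the reconstruction error $\mathbf{E}$. According to (32), the NMSE of $\hat{\mathbf{g}}_k^{\mathrm{dl}}$ approximates $\delta$, and the NMSE of $\hat{\mathbf{h}}_k^{\mathrm{dl}}$ approximates $\delta$ as well. Thus, we make the following approximation:

$$\mathbb{E}\{|[\mathbf{E}]_{k,i}|^2\} \approx \delta |[\mathbf{H}]_{k,i}|^2. \tag{58}$$

Besides, $\hat{\mathbf{H}}^{\dagger}$ can be Taylor expanded as [29]

$$\hat{\mathbf{H}}^{\dagger} = (\mathbf{H} + \mathbf{E})^{\dagger} \approx \mathbf{H}^{\dagger} - \mathbf{H}^{\dagger} \mathbf{E} \mathbf{H}^{\dagger}. \tag{59}$$

Then, the $k$th column of $\hat{\mathbf{H}}^{\dagger}$ is expressed as

$$[\hat{\mathbf{H}}^{\dagger}]_{:,k} \approx [\mathbf{H}^{\dagger}]_{:,k} - \mathbf{H}^{\dagger} \mathbf{E} [\mathbf{H}^{\dagger}]_{:,k}. \tag{60}$$

Since

$$\mathbf{h}_k^{\mathrm{dl}} [\mathbf{H}^{\dagger}]_{:,j} = \begin{cases} 1, & j = k, \\ 0, & j \neq k, \end{cases} \tag{61}$$

we can derive that

$$|\mathbf{h}_k^{\mathrm{dl}} [\hat{\mathbf{H}}^{\dagger}]_{:,j}|^2 = \begin{cases} 1 - [\mathbf{H}^{\dagger}]_{:,k}^H [\mathbf{E}]_{k,:}^H - [\mathbf{E}]_{k,:} [\mathbf{H}^{\dagger}]_{:,k} + [\mathbf{H}^{\dagger}]_{:,k}^H [\mathbf{E}]_{k,:}^H [\mathbf{E}]_{k,:} [\mathbf{H}^{\dagger}]_{:,k}, & j = k, \\ [\mathbf{H}^{\dagger}]_{:,j}^H [\mathbf{E}]_{k,:}^H [\mathbf{E}]_{k,:} [\mathbf{H}^{\dagger}]_{:,j}, & j \neq k. \end{cases} \tag{62}$$

The expectation approximates

$$\mathbb{E}\{|\mathbf{h}_k^{\mathrm{dl}} [\hat{\mathbf{H}}^{\dagger}]_{:,j}|^2\} \approx \begin{cases} 1 + \delta \sum_{m=1}^{M} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,k}|^2, & j = k, \\ \delta \sum_{m=1}^{M} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,j}|^2, & j \neq k, \end{cases} \tag{63}$$

when applying (58) to (62). Besides,

$$\|[\hat{\mathbf{H}}^{\dagger}]_{:,k}\|^2 \approx \|[\mathbf{H}^{\dagger}]_{:,k}\|^2 + [\mathbf{H}^{\dagger}]_{:,k}^H \mathbf{E}^H (\mathbf{H}^{\dagger})^H \mathbf{H}^{\dagger} \mathbf{E} [\mathbf{H}^{\dagger}]_{:,k} - [\mathbf{H}^{\dagger}]_{:,k}^H \mathbf{H}^{\dagger} \mathbf{E} [\mathbf{H}^{\dagger}]_{:,k} - [\mathbf{H}^{\dagger}]_{:,k}^H \mathbf{E}^H (\mathbf{H}^{\dagger})^H [\mathbf{H}^{\dagger}]_{:,k}, \tag{64}$$

and its expectation satisfies

$$\mathbb{E}\{\|[\hat{\mathbf{H}}^{\dagger}]_{:,k}\|^2\} \approx \|[\mathbf{H}^{\dagger}]_{:,k}\|^2 + \delta \sum_{j=1}^{K} \|[\mathbf{H}^{\dagger}]_{:,j}\|^2 \sum_{m=1}^{M} |[\mathbf{H}]_{j,m} [\mathbf{H}^{\dagger}]_{m,k}|^2. \tag{65}$$

By applying (63) and (65) to (56), we obtain

$$\mathbb{E}\{P |\alpha_k|^2 |\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,k}|^2\} \approx \frac{P}{K} \cdot \frac{1 + \delta \sum_{m=1}^{M} |[\mathbf{H}]_{k,m} [\mathbf{H}^{\dagger}]_{m,k}|^2}{\|[\mathbf{H}^{\dagger}]_{:,k}\|^2 + \delta \sum_{j=1}^{K} \|[\mathbf{H}^{\dagger}]_{:,j}\|^2 \sum_{m=1}^{M} |[\mathbf{H}]_{j,m} [\mathbf{H}^{\dagger}]_{m,k}|^2}, \tag{66}$$

which is exactly $S_k$. Similarly, applying (63) and (65) to (57) yields, for each $j \neq k$,

$$\mathbb{E}\{P |\alpha_j|^2 |\mathbf{h}_k^{\mathrm{dl}}(n) [\hat{\mathbf{H}}^{\dagger}(n)]_{:,j}|^2\} \approx I_{k,j}. \tag{67}$$

Therefore, (47) is obtained.

REFERENCES

[1] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. Zhang, “What will 5G be?”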

IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065-1082, Jun. 2014.

[2] E. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186-195, Feb. 2014.

[3] M. Agiwal, A. Roy, and N. Saxena, “Next generation 5G wireless networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 1617-1655, 3rd Quart. 2016.

[4] A. Adhikary, J. Nam, J. Ahn, and G. Caire, “Joint spatial division and multiplexing: The large-scale array regime,” IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6441-6463, Oct. 2013.

[5] S. K. Mohammed and E. G. Larsson, “Constant-envelope multi-user precoding for frequency-selective massive MIMO systems,” IEEE Commun. Lett., vol. 2, no. 5, pp. 547-550, Oct. 2013.

[6] Y. Han, S. Jin, J. Zhang, J. Zhang, and K. K. Wong, “DFT-based hybrid beamforming multiuser systems: Rate analysis and beam selection,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 3, pp. 514-528, Jun. 2018.

[7] Y.-H. Nam, B.-L. Ng, K. Sayana, Y. Li, J. Zhang, Y. Kim, and J. Lee, “Full-dimension MIMO (FD-MIMO) for next generation cellular technology,” IEEE Commun. Mag., vol. 51, no. 6, pp. 172-179, Jun. 2013.

[8] H. Halbauer, S. Saur, J. Koppenborg, and C. Hoek, “3D Beamforming: Performance improvement for cellular networks,” Bell Labs Tech. J., vol. 18, no. 2, pp. 37-56, 2013.

[9] X. Li, S. Jin, X. Gao, and R. W. Heath, “Three-dimensional beamforming for large-scale FD-MIMO systems exploiting statistical channel state information,” IEEE Trans. Veh. Technol., vol. 65, no. 11, pp. 8992-9005, Nov. 2016.

[10] C. Sun, X. Gao, S. Jin, M. Matthaiou, Z. Ding, and C. Xiao, “Beam division multiple access transmission for massive MIMO communications,” IEEE Trans. Commun., vol. 63, no. 6, pp. 2170-2184, Jun. 2015.

[11] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially common sparsity based adaptive channel estimation and feedback for FDD massive MIMO,” IEEE Trans. Signal Process., vol. 63, no. 23, pp. 6169-6183, Dec. 2015.

[12] X. Rao and V. K. N. Lau, “Distributed compressive CSIT estimation and feedback for FDD multi-user massive MIMO systems,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3261-3271, Jun. 2014.

[13] J. Choi, D. J. Love, and P. Bidigare, “Downlink training techniques for FDD massive MIMO systems: Open-loop and closed-loop training with memory,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 802-814, Oct. 2014.

[14] J. Choi, D. J. Love, and T. Kim, “Trellis-extended codes and successive phase adjustment: A path from LTE-advanced to FDD massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 14, no. 4, pp. 2007-2016, Apr. 2015.

[15] W. Shen, L. Dai, B. Shim, Z. Wang, and R. W. Heath, “Channel feedback based on AoD-adaptive subspace codebook in FDD massive MIMO systems,” arXiv preprint arXiv:1704.00658, 2017.

[16] Y. Han, H. Zhang, S. Jin, X. Li, R. Yu, and Y. Zhang, “Investigation of transmission schemes for millimeter-wave massive MU-MIMO systems,” IEEE Syst. J., vol. 11, no. 1, pp. 72-83, Mar. 2017.

[17] U. Ugurlu, R. Wichman, C. B. Ribeiro, and C. Wijting, “A multipath extraction-based CSI acquisition method for FDD cellular networks with massive antenna arrays,” IEEE Trans. Wireless Commun., vol. 15, no. 4, pp. 2940-2953, Apr. 2016.

[18] H. Xie, F. Gao, S. Jin, J. Fang, and Y.-C. Liang, “Channel estimation for TDD/FDD massive MIMO systems with channel covariance computing,” IEEE Trans. Wireless Commun., vol. 17, no. 6, pp. 4206-4218, Jun. 2018.

[19] S. Haghighatshoar, M. B. Khalilsarai, and G. Caire, “Multi-band covariance interpolation with applications in massive MIMO,” arXiv:1801.03714, Jan. 2018.

[20] M. B. Khalilsarai, S. Haghighatshoar, X. Yi, and G. Caire, “FDD massive MIMO via UL/DL channel covariance extrapolation and active channel sparsification,” arXiv:1803.05754, Mar. 2018.

[21] H. Xie, F. Gao, S. Zhang, and S. Jin, “A unified transmission strategy for TDD/FDD massive MIMO systems with spatial basis expansion model,” IEEE Trans. Veh. Technol., vol. 66, no. 4, pp. 3170-3184, Apr. 2017.

[22] X. Zhang, L. Zhong, and A. Sabharwal, “Directional training for FDD massive MIMO,” IEEE Trans. Wireless Commun., vol. 17, no. 8, pp. 5183-5197, Aug. 2018.

[23] Y. Han, T.-H. Hsu, C.-K. Wen, K.-K. Wong, and S. Jin, “Efficient downlink channel reconstruction for FDD transmission systems,” in Proc. 27th IEEE WOCC, Jun. 2018, pp. 1-5.

[24] Y. Han, T.-H. Hsu, C.-K. Wen, K.-K. Wong, and S. Jin, “Efficient downlink channel reconstruction for FDD multi-antenna systems,” arXiv:1805.07027, May 2018.

[25] Q. Liu, Y. Han, C.-K. Wen, and S. Jin, “Downlink Channel Reconstruction for FDD 3D Multi-Antenna Systems,” in Proc. 24th IEEE APCC, Nov. 2018, pp. 1-6.

[26] S. Imtiaz, G. S. Dahman, F. Rusek, and F. Tufvesson, “On the directional reciprocity of uplink and downlink channels in frequency division duplex systems,” in Proc. IEEE PIMRC, 2015, pp. 172-176.

[27] B. Mamandipoor, D. Ramasamy, and U. Madhow, “Newtonized orthogonal matching pursuit: Frequency estimation over the continuum,” IEEE Trans. Signal Process., vol. 64, no. 19, pp. 5066-5081, Oct. 2016.

[28] P. Deuflhard, “Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms,” Springer Series in Computational Mathematics, vol. 35. Springer, Berlin, 2004.

[29] C. Wang, E. K. S. Au, R. D. Murch, W.-H. Mow, R. S. Cheng, and V. K. N. Lau, “On the performance of the MIMO zero-forcing receiver in the presence of channel estimation error,”