Deep Learning Based Frequency-Selective Channel Estimation for Hybrid mmWave MIMO Systems
Asmaa Abdallah, Abdulkadir Celik, Mohammad M. Mansour, Ahmed M. Eltawil
Asmaa Abdallah, Member, IEEE, Abdulkadir Celik, Senior Member, IEEE, Mohammad M. Mansour, Senior Member, IEEE, and Ahmed M. Eltawil, Senior Member, IEEE

Abstract—Millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems typically employ hybrid mixed-signal processing to avoid expensive hardware and high training overheads. However, the lack of fully digital beamforming at mmWave bands imposes additional challenges in channel estimation. Prior art on hybrid architectures has mainly focused on greedy optimization algorithms to estimate frequency-flat narrowband mmWave channels, despite the fact that in practice the large bandwidth associated with mmWave channels results in frequency-selective channels. In this paper, we consider a frequency-selective wideband mmWave system and propose two deep learning (DL) compressive sensing (CS) based algorithms for channel estimation. The proposed algorithms learn critical a priori information from training data to provide highly accurate channel estimates with low training overhead. In the first approach, a DL-CS based algorithm simultaneously estimates the channel supports in the frequency domain, which are then used for channel reconstruction. The second approach exploits the estimated supports to apply a low-complexity multi-resolution fine-tuning method to further enhance the estimation performance. Simulation results demonstrate that the proposed DL-based schemes significantly outperform conventional orthogonal matching pursuit (OMP) techniques in terms of the normalized mean-squared error (NMSE), computational complexity, and spectral efficiency, particularly in the low signal-to-noise ratio regime. When compared to OMP approaches that achieve an NMSE gap of {−} dB with respect to the Cramer-Rao Lower Bound (CRLB), the proposed algorithms reduce the CRLB gap to only {−.} dB, while significantly reducing complexity by two orders of magnitude.

Index Terms—Deep learning, channel estimation, compressive sensing, frequency-selective channel, mmWave, MIMO, convolutional neural networks, denoising, sparse recovery.
I. INTRODUCTION

Millimeter wave (mmWave) communication has emerged as a key technology to fulfill beyond fifth-generation (B5G) network requirements, such as enhanced mobile broadband, massive connectivity, and ultra-reliable low-latency communications. The mmWave band offers an abundant frequency spectrum (30-300 GHz) at the cost of low penetration depth and high propagation losses. Fortunately, its short wavelength mitigates these drawbacks by allowing the deployment of large antenna arrays into small form-factor transceivers, paving the way for multiple-input multiple-output (MIMO) systems with high directivity gains [1]-[4].

Hybrid MIMO structures have been introduced to operate at mmWave frequencies because an all-digital architecture, with a dedicated radio frequency (RF) chain for each antenna element, results in an expensive system architecture and high power consumption at these frequencies [2]. In these hybrid architectures, phase-only analog beamformers are employed to steer the beams using steering vectors of quantized angles. The down-converted signal is then processed by low-dimensional baseband beamformers, each of which is dedicated to a single RF chain [5], [6]. The number of RF chains is significantly reduced with this combination of high-dimensional phase-only analog and low-dimensional digital baseband beamformers [6]. Moreover, optimal configuration of the digital/analog precoders and combiners requires instantaneous channel state information (CSI) to achieve spatial diversity and multiplexing gain [7].
However, acquiring mmWave CSI is challenging with a hybrid architecture due to the following reasons [5]: 1) there is no direct access to the individual antenna elements in the array, since the channel is seen through the analog combining network, which forms a compression stage for the received signal when the number of RF chains is much smaller than the number of antennas; 2) the large channel bandwidth yields high noise power and low received signal-to-noise ratio (SNR) before beamforming; and 3) the large size of the channel matrices increases the complexity and overhead associated with traditional precoding and channel estimation algorithms. Therefore, low-complexity channel estimation for mmWave MIMO systems with hybrid architecture is necessary.
A. Related Work
Channel estimation techniques typically leverage the sparse nature of mmWave MIMO channels by formulating the estimation as a sparse recovery problem and applying compressive sensing (CS) methods to solve it. Compressive sensing is a general framework for the estimation of sparse vectors from linear measurements [8]. The supports of the sparse vectors estimated using CS help identify the indices of the Angle-of-Arrival (AoA) and Angle-of-Departure (AoD) pairs for each path in the mmWave channel, while the amplitudes of the non-zero coefficients in the sparse vectors represent the channel gains of each path. Therefore, these supports and amplitudes are the key components to be estimated to obtain accurate CSI. Moreover, it has been shown that pilot training overhead can be reduced with compressive estimation, unlike conventional approaches such as those based on least squares (LS) estimation [6].

Several channel estimation methods based on CS tools that exploit the mmWave channel sparsity have been investigated in the literature [6], [9]-[12]. A distributed grid matching pursuit (DGMP) channel estimation scheme is presented in [9], where the dominant entries of the line-of-sight (LoS) channel path are detected and updated iteratively. In [10], an orthogonal matching pursuit (OMP) channel estimation scheme to detect the support entries of multiple channel paths is also considered. Likewise, a simultaneous weighted orthogonal matching pursuit (SW-OMP) channel estimation scheme based on a weighted OMP method is developed in [11] for frequency-selective mmWave systems. A sparse reconstruction problem was formulated in [11] to estimate the channel independently for every subcarrier by exploiting common sparsity in the frequency domain.
However, such optimization and CS-based channel estimation schemes detect the support indices of the mmWave channel sequentially and greedily, and hence are not globally optimal [12].

Alternatively, deep learning (DL) approaches and data-driven algorithms have recently received much attention as key enablers for beyond-5G networks. Traditionally, signal processing and numerical optimization techniques have been heavily used to address channel estimation at mmWave bands [9]-[12]. However, optimization algorithms often demand considerable computational complexity, which creates a barrier between theoretical design/analysis and real-time processing requirements. Hence, prior data-set observations and deep neural network (DNN) models can be leveraged to learn the non-trivial mapping from compressed received pilots to channels. DNNs can be used to approximate the optimization problems by selecting a suitable set of parameters that minimizes the approximation error. The use of DNNs is expected to substantially reduce computational complexity and processing overhead, since it only requires several layers of simple operations such as matrix-vector multiplications. Moreover, several successful DL applications have been demonstrated in wireless communications problems such as channel estimation [13]-[22], analog beam selection [23], [24], and hybrid beamforming [23], [25]-[29]. Besides, DL-based techniques, when compared with other conventional optimization methods, have been shown [14], [27], [28], [30] to be more computationally efficient in searching for beamformers and more tolerant to imperfect channel inputs.
In [15], a learned denoising-based approximate message passing (LDAMP) network is presented to estimate the channel of a mmWave communication system with a lens antenna array, where the noise term is detected and removed to estimate the channel. However, channel estimation for mmWave massive MIMO systems with hybrid architecture is not considered in [15]. Prior work on channel estimation for hybrid mmWave MIMO architectures [15]-[22], [25]-[27], [31]-[34] considers the narrowband flat-fading channel model for tractability, while practical mmWave channels exhibit wideband frequency-selective fading due to the very large bandwidth, short coherence time, and different delays of the multipath components [11], [35], [36]. MmWave environments such as indoor and vehicular communications are highly variable with short coherence time, which requires channel estimation algorithms that are robust to rapidly changing channel characteristics. (The coherence time is within a few milliseconds, e.g., when operating at 60 GHz with large bandwidth [36].) Accordingly, this paper presents a combination of DL and CS methods to identify the indices of the AoA/AoD pairs and estimate the channel amplitudes for frequency-selective channel estimation in hybrid MIMO systems.
B. Contributions of the Paper
In this paper, we propose a frequency-selective channel estimation framework for mmWave MIMO systems with hybrid architecture. By exploiting the mmWave channel sparsity, the developed method aims at reaping the full advantages of both CS and DL methods. We consider the received pilot signal as an image and employ a denoising convolutional neural network (DnCNN) from [37] for channel amplitude estimation. Thereby, we treat image denoising as a plain discriminative learning problem, i.e., separating the noise from a noisy image by feed-forward convolutional neural networks (CNNs). The main motivations behind using CNNs are twofold: first, deep CNNs have been recognized to effectively extract image features [37]; second, considerable advances have been achieved in regularization and learning methods for training CNNs, including the Rectified Linear Unit (ReLU), batch normalization, and residual learning [38]. These methods can be adopted in CNNs to speed up the training process and improve the denoising performance. The main contributions of the paper can be summarized as follows:

1) We propose a deep learning compressed sensing channel estimation (DL-CS-CE) scheme for wideband mmWave massive MIMO systems. The proposed DL-CS-CE algorithm exploits the information on the support coming from every subcarrier in the MIMO-OFDM system. It is executed in two steps: channel amplitude estimation through deep learning, and channel reconstruction. We train a DnCNN using realistic mmWave channel realizations obtained from Raymobtime. The correlation between the received signal vectors and the measurement matrix is fed into the trained DnCNN to predict the channel amplitudes. Using the obtained channel amplitudes, the indices of the dominant entries of the channel are obtained, based on which the channel can be reconstructed.
Unlike the existing works [9]-[11] that estimate the dominant channel entries sequentially, we estimate the dominant entries simultaneously, which saves computational complexity and improves estimation performance.

2) Using the DL-CS-CE for support detection, we propose a refined DL-CS-CE algorithm that exploits the spatially common sparsity within the system bandwidth. A channel reconstruction with a low-complexity multi-resolution fine-tuning approach is developed that further improves the NMSE performance by enhancing the accuracy of the estimated AoAs/AoDs. The channel reconstruction consumes a very small number of pilot training frames, which significantly reduces the training overhead and computational complexity.

3) Simulation results in the low SNR regime show that both proposed algorithms significantly outperform the frequency-domain approach developed in [11]. Numerical results also show that using a reasonably small number of pilot training frames, approximately in the range of 60-100 frames, leads to substantially low channel estimation errors. The proposed algorithms are also compared with existing solutions by analyzing the trade-off between delivered performance and incurred computational complexity. Our analysis reveals that both proposed channel estimation methods achieve the desired performance at significantly lower complexity. The developed approaches are shown to attain an NMSE gap of {−.} dB with respect to the Cramer-Rao Lower Bound (CRLB), compared to the −10 dB gap attained by the SW-OMP technique, while reducing the computational complexity by two orders of magnitude.
C. Notation and Paper Organization
Bold upper case, bold lower case, and lower case letters correspond to matrices, vectors, and scalars, respectively. Scalar norms, vector norms, and Frobenius norms are denoted by |·|, ‖·‖, and ‖·‖_F, respectively. We use X to denote a set. I_X denotes the X × X identity matrix. E[·], (·)^T, \bar{(·)}, and (·)^* stand for expected value, transpose, complex conjugate, and Hermitian, respectively. X^† stands for the Moore-Penrose pseudo-inverse of X. [x]_i represents the i-th element of a vector x. The (i, j)-th entry of a matrix X is denoted by [X]_{i,j}. In addition, [X]_{:,j} and [X]_{:,Ω} denote the j-th column vector of matrix X and the sub-matrix consisting of the columns of X with indices in the set Ω. {a} mod b means a modulo b. CN(µ, C) refers to a circularly-symmetric complex Gaussian distribution with mean µ and covariance matrix C. The operations vec(X), vec2mat(x, sz), sub2ind(sz, [r, c]), and ind2sub(sz, i) correspond to transforming a matrix into a vector, transforming a vector into a matrix of a defined size (sz), transforming the row r and column c subscripts of a matrix into the corresponding linear index, and transforming the linear index i into the corresponding row and column subscripts for a matrix of a defined size (sz), respectively. X ⊗ Y is the Kronecker product of X and Y. Key model-related notation is listed in Table I.

The rest of the paper is organized as follows. The system model for the frequency-selective mmWave MIMO system is described in Section II. In Section III, the two proposed deep learning-based compressive sensing channel estimation schemes in the frequency domain are introduced. Complexity analysis in terms of convergence and computational cost is presented in Section IV. Case studies with numerical results based on the proposed schemes are simulated and analyzed in Section V. Section VI concludes the paper.

II. SYSTEM MODEL AND PROBLEM FORMULATION
This section first provides the system and channel models of frequency-selective hybrid mmWave transceivers. Then, it formulates a sparse recovery problem to estimate the sparse channel in the frequency domain.

Table I: NOTATION

F_RF ∈ C^{N_t × L_t} : RF analog precoder (time domain (TD))
W_RF ∈ C^{N_r × L_r} : RF analog combiner (TD)
F_BB[k] ∈ C^{L_t × N_s} : Baseband digital precoder (frequency domain (FD))
W_BB[k] ∈ C^{L_r × N_s} : Baseband digital combiner (FD)
s[k] ∈ C^{N_s × 1} : Data symbol vector (FD)
H_d ∈ C^{N_r × N_t} : d-th delay tap of the channel (TD)
Δ_d ∈ C^{L × L} : Complex diagonal matrix (TD)
A_R ∈ C^{N_r × L} : Receive array steering matrix
A_T ∈ C^{N_t × L} : Transmit array steering matrix
H[k] ∈ C^{N_r × N_t} : Channel at the k-th subcarrier (FD)
Δ[k] ∈ C^{L × L} : Complex diagonal matrix (FD)
Ã_R ∈ C^{N_r × G_r} : Dictionary matrix for the receive array response
Ã_T ∈ C^{N_t × G_t} : Dictionary matrix for the transmit array response
Ã_R^r ∈ C^{N_r × G_r^r} : Refining dictionary matrix for the receive array response
Ã_T^r ∈ C^{N_t × G_t^r} : Refining dictionary matrix for the transmit array response
Δ_d^v ∈ C^{G_r × G_t} : Path-gain sparse matrix of the virtual channel (TD)
Δ^v[k] ∈ C^{G_r × G_t} : Path-gain sparse matrix of the virtual channel (FD)
Φ ∈ C^{M L_r × N_t N_r} : Measurement matrix
Ψ ∈ C^{N_t N_r × G_t G_r} : Dictionary matrix
h_v[k] ∈ C^{G_r G_t × 1} : Sparse vector containing the complex channel gains (FD)
Υ ∈ C^{M L_r × G_t G_r} : Equivalent measurement matrix
y[k] ∈ C^{M L_r × 1} : Received signal (FD)
c[k] ∈ C^{G_r G_t} : Correlation vector (FD)
C_w ∈ C^{M L_r × M L_r} : Noise covariance matrix of y[k]
D_w ∈ C^{M L_r × M L_r} : Whitening matrix (upper triangular)
y_w[k] ∈ C^{M L_r × 1} : Whitened received signal (FD)
Υ_w ∈ C^{M L_r × G_t G_r} : Whitened measurement matrix
Υ_w^d ∈ C^{M L_r × G_t G_r^r} : Whitened measurement matrix to remove detection uncertainty
Υ_w^r ∈ C^{M L_r × G_t G_r} : Whitened measurement matrix for refining
C_α[k] ∈ R^{G_r × G_t} : Input matrix to the DnCNN (FD)
G[k] ∈ R^{G_r × G_t} : Output matrix of the DnCNN (FD)
g[k] ∈ R^{G_r G_t × 1} : Vectorized form of G[k] (FD)
ξ[k] ∈ C^{L × 1} : Vector of actual channel gains (FD)
P ∈ C^{M L_r × M L_r} : Projection matrix
r[k] ∈ C^{M L_r × 1} : Residual vector (FD)
T : Sparse channel support set
K̄ : Subset of the total K subcarriers

A. System Model
As shown in Fig. 1, we consider an OFDM-based mmWave MIMO link employing a total of K subcarriers to send N_s data streams from a transmitter with N_t antennas to a receiver with N_r antennas. The system is based on a hybrid MIMO architecture with L_t < N_t and L_r < N_r radio frequency (RF) chains at the transmitter and receiver sides, respectively. Following the notation of [11], we define a frequency-selective hybrid precoder F[k] = F_RF F_BB[k] ∈ C^{N_t × N_s}, k = 0, ..., K − 1, where F_RF and F_BB[k] are the analog and digital precoders, respectively. Although the analog precoder is considered to be frequency-flat, the digital precoder is different for every subcarrier. The RF precoder and combiner are implemented using a fully connected network of quantized phase shifters, as described in [6]. During transmission, the transmitter (TX) first precodes the data symbols s[k] ∈ C^{N_s × 1} at each subcarrier by applying the subcarrier-dependent baseband precoder F_BB[k]. The symbol blocks are then transformed into the time domain using L_t parallel K-point inverse Fast Fourier Transforms (IFFTs). After adding the cyclic prefix (CP), the transmitter employs the subcarrier-independent RF precoder F_RF to form the transmitted signal.

Fig. 1: Hybrid architecture system model of a mmWave MIMO system, which includes analog/digital precoders and combiners.

The complex baseband signal at the k-th subcarrier can be expressed as

\mathbf{x}[k] = \mathbf{F}_{RF} \mathbf{F}_{BB}[k] \mathbf{s}[k],   (1)

where s[k] denotes the transmitted symbol sequence of size N_s × 1 at the k-th subcarrier.
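The hybrid precoding step in (1) can be sketched in a few lines of numpy. This is a minimal illustration with assumed dimensions; the 2-bit phase quantization and the names `F_RF`, `F_BB` are illustrative choices, not prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, Lt, Ns, K = 32, 4, 2, 16

# Frequency-flat analog precoder: quantized phase shifts only (constant modulus).
phases = rng.choice([0, np.pi / 2, np.pi, 3 * np.pi / 2], size=(Nt, Lt))
F_RF = np.exp(1j * phases) / np.sqrt(Nt)

# Per-subcarrier digital precoders and data symbols.
F_BB = rng.standard_normal((K, Lt, Ns)) + 1j * rng.standard_normal((K, Lt, Ns))
s = rng.standard_normal((K, Ns)) + 1j * rng.standard_normal((K, Ns))

# Eq. (1): x[k] = F_RF F_BB[k] s[k] for every subcarrier k.
x = np.stack([F_RF @ F_BB[k] @ s[k] for k in range(K)])
```

Note how only F_BB and s depend on the subcarrier index, while the phase-shifter network F_RF is shared across the whole band.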
1) Channel Model:
We consider a frequency-selective MIMO channel between the transmitter and the receiver, with a delay tap length of N_c in the time domain. The d-th delay tap of the channel is denoted by an N_r × N_t matrix H_d, d = 0, 1, ..., N_c − 1. Assuming a geometric channel model [11], H_d can be written as

\mathbf{H}_d = \sqrt{\frac{N_t N_r}{L \rho_L}} \sum_{\ell=1}^{L} \alpha_\ell \, p_{rc}(d T_s - \tau_\ell) \, \mathbf{a}_R(\phi_\ell) \mathbf{a}_T^*(\theta_\ell),   (2)

where ρ_L represents the path loss between the transmitter and the receiver; L corresponds to the number of paths; T_s denotes the sampling period; p_rc(τ) is a filter that includes the effects of pulse shaping and other lowpass filtering evaluated at τ; α_ℓ ∈ C is the complex gain of the ℓ-th path; τ_ℓ ∈ R is the delay of the ℓ-th path; φ_ℓ ∈ [0, π] and θ_ℓ ∈ [0, π] are the AoA and AoD of the ℓ-th path, respectively; and a_R(φ_ℓ) ∈ C^{N_r × 1} and a_T(θ_ℓ) ∈ C^{N_t × 1} are the array steering vectors for the receive and transmit antennas, respectively. Both the transmitter and the receiver are assumed to use Uniform Linear Arrays (ULAs) with half-wavelength separation. Such a ULA has steering vectors obeying

[\mathbf{a}_T(\theta_\ell)]_n = \sqrt{1/N_t} \, e^{j n \pi \cos(\theta_\ell)}, \quad n = 0, ..., N_t − 1,
[\mathbf{a}_R(\phi_\ell)]_m = \sqrt{1/N_r} \, e^{j m \pi \cos(\phi_\ell)}, \quad m = 0, ..., N_r − 1.

The channel can be expressed more compactly as

\mathbf{H}_d = \mathbf{A}_R \mathbf{\Delta}_d \mathbf{A}_T^*,   (3)

where Δ_d ∈ C^{L × L} is diagonal with non-zero complex diagonal entries, and A_R ∈ C^{N_r × L} and A_T ∈ C^{N_t × L} contain the receive and transmit array steering vectors a_R(φ_ℓ) and a_T(θ_ℓ), respectively. The channel at subcarrier k can be written in terms of the different delay taps as

\mathbf{H}[k] = \sum_{d=0}^{N_c-1} \mathbf{H}_d e^{-j \frac{2\pi k d}{K}} = \mathbf{A}_R \mathbf{\Delta}[k] \mathbf{A}_T^*,   (4)

where Δ[k] ∈ C^{L × L} is diagonal with non-zero complex diagonal entries such that \mathbf{\Delta}[k] = \sum_{d=0}^{N_c-1} \mathbf{\Delta}_d e^{-j \frac{2\pi k d}{K}}, k = 0, ..., K − 1.
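A minimal numpy sketch of the channel model (2)-(4) above, under simplifying assumptions: the pulse-shaping filter p_rc is idealized as a unit pulse at integer delays, the path loss ρ_L is set to 1, and the helper names `ula_steering` and `freq_channel` are our own, not from the paper.

```python
import numpy as np

def ula_steering(n_ant, angle):
    # Half-wavelength ULA: [a(angle)]_n = sqrt(1/N) * exp(j*n*pi*cos(angle)).
    n = np.arange(n_ant)
    return np.sqrt(1.0 / n_ant) * np.exp(1j * n * np.pi * np.cos(angle))

def freq_channel(Nr, Nt, L, K, Nc, rng):
    # Draw L paths (gains, AoA/AoD, integer delays < Nc), build the taps H_d
    # of (2), then H[k] = sum_d H_d * exp(-j*2*pi*k*d/K) as in (4).
    gains = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
    aoa, aod = rng.uniform(0, np.pi, L), rng.uniform(0, np.pi, L)
    delays = rng.integers(0, Nc, L)
    Hd = np.zeros((Nc, Nr, Nt), dtype=complex)
    scale = np.sqrt(Nr * Nt / L)          # path loss rho_L taken as 1 here
    for l in range(L):
        aR = ula_steering(Nr, aoa[l])[:, None]
        aT = ula_steering(Nt, aod[l])[:, None]
        Hd[delays[l]] += scale * gains[l] * (aR @ aT.conj().T)
    dft = np.exp(-2j * np.pi * np.outer(np.arange(K), np.arange(Nc)) / K)
    return np.tensordot(dft, Hd, axes=([1], [0]))   # K x Nr x Nt

rng = np.random.default_rng(0)
Hk = freq_channel(Nr=16, Nt=32, L=4, K=8, Nc=4, rng=rng)
```

The tensordot applies the per-subcarrier DFT weights of (4) to all delay taps at once, so `Hk[k]` is the frequency-domain channel H[k].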
2) Extended Virtual Channel Model:
According to [2], we can further approximate the channel H_d using the extended virtual channel model as

\mathbf{H}_d \approx \tilde{\mathbf{A}}_R \mathbf{\Delta}_d^v \tilde{\mathbf{A}}_T^*,   (5)

where Δ_d^v ∈ C^{G_r × G_t} corresponds to a sparse matrix that contains the path gains in its non-zero elements. The dictionary matrices Ã_T and Ã_R contain the transmit and receive array response vectors evaluated on grids of size G_t ≫ L for the AoD and G_r ≫ L for the AoA, i.e., \tilde{\theta}_\ell \in \{0, \frac{2\pi}{G_t}, ..., \frac{2\pi (G_t - 1)}{G_t}\} and \tilde{\phi}_\ell \in \{0, \frac{2\pi}{G_r}, ..., \frac{2\pi (G_r - 1)}{G_r}\}, respectively:

\tilde{\mathbf{A}}_T = [\mathbf{a}_T(\tilde{\theta}_1) \; \cdots \; \mathbf{a}_T(\tilde{\theta}_{G_t})],   (6)
\tilde{\mathbf{A}}_R = [\mathbf{a}_R(\tilde{\phi}_1) \; \cdots \; \mathbf{a}_R(\tilde{\phi}_{G_r})].   (7)

Since mmWave channels have few scattering clusters, the sparsity assumption for Δ_d^v ∈ C^{G_r × G_t} is commonly accepted. To expose the sparse structure, we can express the channel at subcarrier k in terms of the sparse matrices Δ_d^v and the dictionaries as

\mathbf{H}[k] \approx \tilde{\mathbf{A}}_R \left( \sum_{d=0}^{N_c-1} \mathbf{\Delta}_d^v e^{-j \frac{2\pi k d}{K}} \right) \tilde{\mathbf{A}}_T^* = \tilde{\mathbf{A}}_R \mathbf{\Delta}^v[k] \tilde{\mathbf{A}}_T^*,   (8)

where \mathbf{\Delta}^v[k] = \sum_{d=0}^{N_c-1} \mathbf{\Delta}_d^v e^{-j \frac{2\pi k d}{K}}, k = 0, ..., K − 1, is a G_r × G_t complex sparse matrix containing the channel gains of the virtual channel.
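The dictionary construction (6)-(7) and the virtual-channel approximation (5) can be sketched as follows; grid and array sizes are illustrative, and the three non-zero entries stand in for a sparse Δ_d^v.

```python
import numpy as np

def ula_steering(n_ant, angle):
    n = np.arange(n_ant)
    return np.sqrt(1.0 / n_ant) * np.exp(1j * n * np.pi * np.cos(angle))

def dictionary(n_ant, grid_size):
    # Columns are steering vectors on the uniform angle grid {0, 2*pi/G, ...}.
    return np.column_stack([ula_steering(n_ant, 2 * np.pi * g / grid_size)
                            for g in range(grid_size)])

Nr, Nt, Gr, Gt = 16, 32, 64, 128
A_R_tilde = dictionary(Nr, Gr)          # N_r x G_r receive dictionary
A_T_tilde = dictionary(Nt, Gt)          # N_t x G_t transmit dictionary

# A sparse Delta_v with three non-zero (AoA, AoD) grid entries yields
# H ~= A_R_tilde @ Delta_v @ A_T_tilde^* as in (5).
rng = np.random.default_rng(1)
Delta_v = np.zeros((Gr, Gt), dtype=complex)
rows, cols = rng.integers(0, Gr, 3), rng.integers(0, Gt, 3)
Delta_v[rows, cols] = rng.standard_normal(3) + 1j * rng.standard_normal(3)
H = A_R_tilde @ Delta_v @ A_T_tilde.conj().T
```

Each non-zero of `Delta_v` selects one (AoA, AoD) grid pair, which is exactly the support information the proposed algorithms try to recover.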
3) Signal Reception:
Considering that the receiver (RX) applies a hybrid combiner W[k] = W_RF W_BB[k] ∈ C^{N_r × N_s}, the received signal at subcarrier k can be expressed as

\mathbf{y}[k] = \mathbf{W}_{BB}^*[k] \mathbf{W}_{RF}^* \mathbf{H}[k] \mathbf{F}_{RF} \mathbf{F}_{BB}[k] \mathbf{s}[k] + \mathbf{W}_{BB}^*[k] \mathbf{W}_{RF}^* \mathbf{n}[k],   (9)

where n[k] ∼ CN(0, σ² I) corresponds to the circularly-symmetric complex Gaussian distributed additive noise vector. The received signal model in (9) corresponds to the data transmission phase. As explained in Section III, during the channel acquisition phase, frequency-flat training precoders and combiners are considered to reduce complexity.

B. Problem Formulation
During the training phase, the transmitter and receiver use a training precoder F_tr^{(m)} ∈ C^{N_t × L_t} and a training combiner W_tr^{(m)} ∈ C^{N_r × L_r} for the m-th pilot training frame, respectively. The precoders and combiners considered in this phase are frequency-flat to keep the complexity of the sparse recovery algorithms low. The transmitted symbols are assumed to satisfy E\{\mathbf{s}^{(m)}[k] \mathbf{s}^{(m)*}[k]\} = \frac{P}{N_s} \mathbf{I}_{N_s}, where P is the total transmitted power and N_s = L_t. The transmitted symbol s^{(m)}[k] is decomposed as s^{(m)}[k] = q^{(m)} t^{(m)}[k], where q^{(m)} ∈ C^{L_t × 1} is a frequency-flat vector and t^{(m)}[k] is a pilot symbol known at the receiver. This decomposition reduces computational complexity, since it allows the simultaneous use of the L_t spatial degrees of freedom coming from the L_t RF chains and enables channel estimation using a single subcarrier-independent measurement matrix. Moreover, each entry of F_tr^{(m)} and of W_tr^{(m)} is normalized such that its squared modulus is 1/N_t and 1/N_r, respectively. Then, the received samples in the frequency domain for the m-th training frame can be expressed as

\mathbf{y}^{(m)}[k] = \mathbf{W}_{tr}^{(m)*} \mathbf{H}[k] \mathbf{F}_{tr}^{(m)} \mathbf{q}^{(m)} t^{(m)}[k] + \mathbf{n}_c^{(m)}[k],   (10)

where H[k] ∈ C^{N_r × N_t} denotes the frequency-domain MIMO channel response at the k-th subcarrier and n_c^{(m)}[k] = W_tr^{(m)*} n^{(m)}[k] ∈ C^{L_r × 1} represents the frequency-domain combined noise vector received at the k-th subcarrier. The average received SNR is given by SNR = \frac{P}{\rho_L \sigma^2}. Furthermore, the channel coherence time is assumed to be larger than the frame duration, so that the same channel can be considered over several consecutive frames.
1) Measurement Matrix:
In order to apply sparse reconstruction with a single subcarrier-independent measurement matrix, we first remove the effect of the scalar t^{(m)}[k] by multiplying the received signal by (t^{(m)}[k])^{-1}. Using the property vec\{AXC\} = (C^T ⊗ A) vec\{X\}, the vectorized received signal is given by

vec\{\mathbf{y}^{(m)}[k]\} = (\mathbf{q}^{(m)T} \mathbf{F}_{tr}^{(m)T} \otimes \mathbf{W}_{tr}^{(m)*}) \, vec\{\mathbf{H}[k]\} + \mathbf{n}_c^{(m)}[k].   (11)

The vectorized channel matrix can be expressed as

vec\{\mathbf{H}[k]\} = (\bar{\tilde{\mathbf{A}}}_T \otimes \tilde{\mathbf{A}}_R) \, vec\{\mathbf{\Delta}^v[k]\}.   (12)

Furthermore, we define the measurement matrix Φ^{(m)} ∈ C^{L_r × N_t N_r} as

\mathbf{\Phi}^{(m)} = (\mathbf{q}^{(m)T} \mathbf{F}_{tr}^{(m)T} \otimes \mathbf{W}_{tr}^{(m)*}),   (13)

and the dictionary Ψ ∈ C^{N_t N_r × G_t G_r} as

\mathbf{\Psi} = (\bar{\tilde{\mathbf{A}}}_T \otimes \tilde{\mathbf{A}}_R).   (14)

Then, the vectorized received pilot signal of size L_r × 1 at the m-th training frame can be written as

vec\{\mathbf{y}^{(m)}[k]\} = \mathbf{\Phi}^{(m)} \mathbf{\Psi} \mathbf{h}_v[k] + \mathbf{n}_c^{(m)}[k],   (15)

where h_v[k] = vec\{Δ^v[k]\} ∈ C^{G_r G_t × 1} is the sparse vector containing the complex channel gains. Moreover, we use several training frames to gather enough measurements and accurately reconstruct the sparse vector h_v[k], especially in the very-low SNR regime. Therefore, when the transmitter and receiver communicate during M training steps using different pseudorandomly built precoders and combiners, (15) can be extended to the M received signals

\underbrace{\begin{bmatrix} \mathbf{y}^{(1)}[k] \\ \vdots \\ \mathbf{y}^{(M)}[k] \end{bmatrix}}_{\mathbf{y}[k]} = \underbrace{\begin{bmatrix} \mathbf{\Phi}^{(1)} \\ \vdots \\ \mathbf{\Phi}^{(M)} \end{bmatrix}}_{\mathbf{\Phi}} \mathbf{\Psi} \mathbf{h}_v[k] + \underbrace{\begin{bmatrix} \mathbf{n}_c^{(1)}[k] \\ \vdots \\ \mathbf{n}_c^{(M)}[k] \end{bmatrix}}_{\mathbf{n}_c[k]}.   (16)

Hence, the vector h_v[k] can be estimated by solving the sparse reconstruction problem as in [11]:

\min \|\mathbf{h}_v[k]\|_1 \quad \text{subject to} \quad \|\mathbf{y}[k] - \mathbf{\Phi} \mathbf{\Psi} \mathbf{h}_v[k]\|_2 < \epsilon,   (17)

where ε represents a tunable parameter defining the maximum error between the reconstructed channel and the received signal. In realistic scenarios, the sparsity level (number of channel paths) is usually unknown; therefore, the choice of ε is critical to solving (17) and estimating the sparsity level. The choice of this parameter is explained in Section III-D.

Interestingly, the matrices in (8) exhibit the same sparse structure for all k, since the AoA and AoD do not change with frequency within the transmission bandwidth. This property can be leveraged when solving the compressed channel estimation problem defined in (17). Moreover, we denote the supports of the virtual channel matrices Δ_d^v as T_0, T_1, ..., T_{N_c−1}, d = 0, ..., N_c − 1. Then, knowing that h_v[k] = vec\{Δ^v[k]\}, with \mathbf{\Delta}^v[k] = \sum_{d=0}^{N_c-1} \mathbf{\Delta}_d^v e^{-j \frac{2\pi k d}{K}}, k = 0, ..., K − 1, the support of h_v[k] is given by

supp\{\mathbf{h}_v[k]\} = \bigcup_{d=0}^{N_c-1} supp\{vec\{\mathbf{\Delta}_d^v\}\}, \quad k = 0, ..., K − 1,   (18)

where the union of the supports of the time-domain virtual channel matrices is due to the additive nature of the Fourier transform. Therefore, as shown in (18), where the union is independent of the subcarrier k, Δ^v[k] has the same support for all k.
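The stacked measurement model (13)-(16) can be sketched as follows. This is an illustrative construction: 2-bit quantized phases are assumed for the training beamformers, the sizes are toy values, and the support indices 37 and 200 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
Nr, Nt, Lr, Lt, M = 8, 8, 2, 2, 16
Gr, Gt = 16, 16

def steer(n_ant, angle):
    n = np.arange(n_ant)
    return np.exp(1j * n * np.pi * np.cos(angle)) / np.sqrt(n_ant)

def dictionary(n_ant, grid):
    return np.column_stack([steer(n_ant, 2 * np.pi * i / grid) for i in range(grid)])

# Dictionary Psi = conj(A_T) kron A_R as in (14).
Psi = np.kron(dictionary(Nt, Gt).conj(), dictionary(Nr, Gr))

# Stack M per-frame matrices Phi^(m) = (q^T F^T kron W^*) as in (13) and (16).
def quantized_phases(rows, cols):
    return np.exp(1j * rng.choice([0, np.pi / 2, np.pi, 3 * np.pi / 2], (rows, cols)))

Phi = np.vstack([
    np.kron((quantized_phases(Nt, Lt) / np.sqrt(Nt) @ rng.standard_normal((Lt, 1))).T,
            (quantized_phases(Nr, Lr) / np.sqrt(Nr)).conj().T)
    for _ in range(M)
])
Upsilon = Phi @ Psi   # equivalent measurement matrix, identical for every subcarrier k

# A 2-sparse h_v[k] and its noisy compressed observation y[k] = Upsilon h_v[k] + n_c[k].
h_v = np.zeros(Gr * Gt, dtype=complex)
h_v[[37, 200]] = [1.0 + 1.0j, 0.5 - 0.2j]
y = Upsilon @ h_v + 0.01 * (rng.standard_normal(M * Lr) + 1j * rng.standard_normal(M * Lr))
```

Note that `Upsilon` has M·L_r = 32 rows against G_r·G_t = 256 columns, which is exactly the compression that makes (17) a sparse recovery problem.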
2) Correlation Matrix:
To estimate the multipath components of the channel, i.e., the AoAs/AoDs and channel gains, we first need to compute the atom, which is defined as the column of the measurement matrix that produces the largest sum-correlation with the received signals. The sum-correlation is used because the support of the different sparse vectors is the same over the K subcarriers. The correlation vector c[k] ∈ C^{G_r G_t} is given by

\mathbf{c}[k] = \mathbf{\Upsilon}^* \mathbf{y}[k],   (19)

where Υ = ΦΨ ∈ C^{M L_r × G_t G_r} represents the equivalent measurement matrix, which is the same for all k, and y[k] ∈ C^{M L_r × 1} is the received signal for a given k, k = 0, ..., K − 1.

One can note that if there exists a correlation between noise components, the atom estimated from the projection in (19) might not be the correct one. To compensate for this estimation error, we take the noise covariance matrix into account when performing the correlation step. In particular, we consider two arbitrary (hybrid) combiners W_tr^{(m)}(i), W_tr^{(m)}(j) ∈ C^{N_r × L_r} for two arbitrary training steps i, j and a given subcarrier k. The combined noise at training step i and subcarrier k is n_c^{(i)}[k] = W_tr^{(i)*} n^{(i)}[k], with n^{(i)}[k] ∼ CN(0, σ² I_{L_r}), which results in the noise cross-covariance matrix E\{\mathbf{n}_c^{(i)}[k] \mathbf{n}_c^{(j)*}[k]\} = \sigma^2 \delta[i-j] \, \mathbf{W}_{tr}^{(i)*} \mathbf{W}_{tr}^{(j)}. We can further write the noise covariance matrix of y[k] as a block-diagonal matrix C_w ∈ C^{M L_r × M L_r},

\mathbf{C}_w = \text{blkdiag}\{\mathbf{W}_{tr}(1)^* \mathbf{W}_{tr}(1), ..., \mathbf{W}_{tr}(M)^* \mathbf{W}_{tr}(M)\}.   (20)

Moreover, Cholesky factorization can be used to factorize C_w as C_w = D_w^* D_w, where D_w ∈ C^{M L_r × M L_r} is an upper triangular matrix. Then, taking the noise covariance matrix into consideration, the correlation step is given by

\mathbf{c}[k] = \mathbf{\Upsilon}_w^* \mathbf{y}_w[k],   (21)

where Υ_w = D_w^{-*} Υ ∈ C^{M L_r × G_t G_r} represents the whitened measurement matrix, and y_w[k] = D_w^{-*} y[k] is the M L_r × 1 whitened received signal. The matrix D_w^{-1} ∈ C^{M L_r × M L_r} is given by D_w^{-1} = \text{blkdiag}\{(\mathbf{D}_w^{(1)})^{-1}, ..., (\mathbf{D}_w^{(M)})^{-1}\}, where (D_w^{(m)})^{-1} can be considered as a frequency-flat baseband combiner W_{BB,tr}^{(m)} used in the m-th training step. Therefore, applying the whitened measurement matrix in the correlation step simultaneously whitens the spatial noise components and yields a more accurate estimate of the support indices of the sparse vectors h_v[k].

Fig. 2: Proposed denoising convolutional neural network (DnCNN) for multicarrier channel amplitude estimation. The network takes the noisy correlation image C_α[k] = |c[k]| as input and outputs the denoised channel-amplitude image G[k] = |Δ^v[k]|; its layers are Conv + ReLU (64 3×3×1 filters), Conv + BN + ReLU (64 3×3×64 filters), and a final Conv regression layer (one 3×3×64 filter).

III. DEEP LEARNING AND COMPRESSIVE-SENSING BASED CHANNEL ESTIMATION (DL-CS-CE)

To solve the CS channel estimation problem formulated above, this section proposes two DL-based algorithms. Both leverage the common support between the channel matrices for every subcarrier and provide different complexity-performance trade-offs. The former simultaneously estimates the support using an offline-trained DnCNN and then reconstructs the channel. The latter applies further fine-tuning to accurately estimate the AoAs and AoDs with higher-resolution dictionary matrices while keeping the computational complexity low.
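Returning to the whitening step of Section II-B2, the Cholesky-based construction in (20)-(21) can be sketched as follows; note that numpy's `cholesky` returns the lower-triangular factor, so the paper's upper-triangular D_w is its conjugate transpose. Sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
Nr, Lr, M = 8, 2, 10

# Random unit-modulus training combiners and the block-diagonal C_w of (20).
W_list = [np.exp(1j * rng.uniform(0, 2 * np.pi, (Nr, Lr))) / np.sqrt(Nr)
          for _ in range(M)]
C_w = np.zeros((M * Lr, M * Lr), dtype=complex)
for m, W in enumerate(W_list):
    C_w[m * Lr:(m + 1) * Lr, m * Lr:(m + 1) * Lr] = W.conj().T @ W

# Cholesky: numpy gives lower-triangular Lc with C_w = Lc @ Lc^H, so the
# upper-triangular whitening factor of the paper is D_w = Lc^H.
Lc = np.linalg.cholesky(C_w)
D_w = Lc.conj().T

# Whitened received signal y_w = D_w^{-*} y, i.e., solve Lc @ y_w = y.
y = rng.standard_normal(M * Lr) + 1j * rng.standard_normal(M * Lr)
y_w = np.linalg.solve(Lc, y)
```

Because C_w is block diagonal, the same solve could be done block by block, which is what allows interpreting each (D_w^{(m)})^{-1} as a per-frame baseband combiner.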
A. Offline Training and Online Deployment of DnCNN
Before delving into the details of the proposed solutions, let us first provide insights into the considered DnCNN architecture as well as its offline training and online deployment.
1) DnCNN Architecture:
Fig. 2 illustrates the network architecture of the DnCNN denoiser, which consists of L_C convolutional (Conv) layers. Each layer uses c_CL^{(l)} different D_x^{(l)} × D_y^{(l)} × D_z^{(l)} filters. The first convolutional layer is followed by a rectified linear unit (ReLU). The succeeding L_C − 2 convolutional layers are followed by batch normalization (BN) and a ReLU. The final (L_C-th) convolutional layer uses one separate D_x^{(L_C)} × D_y^{(L_C)} × D_z^{(L_C)} filter to reconstruct the signal. Here, D_x^{(l)}, D_y^{(l)}, and D_z^{(l)} are the convolutional kernel dimensions, and c_CL^{(l)} is the number of filters in the l-th layer. Fig. 2 also presents three pseudo-color images of the noisy channel, the residual noise, and the estimated output channel. The DnCNN takes the amplitude of the G_r × G_t correlation matrix,

\mathbf{C}_\alpha[k] = \text{vec2mat}(|\mathbf{c}[k]|, [G_r, G_t]), \quad \forall k,   (22)

as input and produces the residual noise as output, rather than the estimated channel amplitudes, where we define the G_r × G_t matrix of channel amplitudes as

\mathbf{G}[k] = |\mathbf{\Delta}^v[k]| \in \mathbb{R}^{G_r \times G_t}, \quad \forall k.   (23)

The DnCNN aims to learn a mapping function F(C_α[k]) = G[k] to predict the latent clean image from the noisy observation C_α[k]. We adopt the residual learning formulation to train a residual mapping R(C_α[k]) ≈ V, where V is the residual noise, and then obtain G[k] = C_α[k] − R(C_α[k]). Instead of learning a mapping directly from a noisy image to a denoised image, learning the residual noise is beneficial [37], [38]. Furthermore, the averaged mean-squared error between the desired residual images and the ones estimated from the noisy input is adopted as the loss function to learn the trainable parameters Θ of the DnCNN:

\ell(\mathbf{\Theta}) = \frac{1}{2N} \sum_{i=1}^{N} \|\mathcal{R}(\mathbf{C}_\alpha[k]_i; \mathbf{\Theta}) - (\mathbf{C}_\alpha[k]_i - \mathbf{G}[k]_i)\|_F^2,   (24)

where (C_α[k]_i, G[k]_i)_{i=1}^N represents the N noisy-clean training patch pairs.
This method is also known as residual learning [38] and lets the DnCNN remove the highly structured natural image rather than the unstructured noise. Consequently, residual learning improves both the training time and the accuracy of the network. In this way, combining batch normalization and residual learning accelerates training and improves denoising performance. Besides, batch normalization has been shown to offer additional merits for residual learning, such as alleviating the internal covariate shift problem [20], [37].
2) Offline Training of the DnCNN:
During offline training of the DnCNN, the dataset of C_α[k], ∀k, and G[k], ∀k, is generated based on the realistic Raymobtime dataset for a mmWave frequency-selective channel environment.

[Fig. 3: Block diagram of the DL-CS-CE scheme: (a) offline training and (b) online deployment.]

With the mmWave channel amplitude in (23) and the correlation of the received signals and the measurement matrix in (22), the training data C_α[k] and G[k] can be obtained. In particular, the process to obtain C_α[k] and G[k] involves the following four steps: i) generation of channel matrices based on the mmWave channel model from the Raymobtime dataset; ii) obtaining G[k] based on (23); iii) computing the whitened received signal vector y_w[k], ∀k; and iv) acquiring the amplitudes of the correlation vector c[k] and transforming it into matrix form C_α[k] as per (22).
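The data-generation steps above can be sketched as follows for a single subcarrier. This is a simplified sketch under stated assumptions, not the authors' pipeline: the whitened measurement matrix is taken as given, the noise is simplified to i.i.d. complex Gaussian, and the helper names (`vec2mat`, `make_training_pair`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def vec2mat(v, shape):
    # Column-major reshape, matching the vec / vec2mat convention in (22)
    return v.reshape(shape, order='F')

def make_training_pair(Upsilon_w, Delta_v, sigma):
    """Generate one (C_alpha, G) noisy-clean pair for one subcarrier.
    Upsilon_w: whitened measurement matrix (M*Lr x Gr*Gt)
    Delta_v:   sparse beamspace channel matrix (Gr x Gt)."""
    Gr, Gt = Delta_v.shape
    h_v = Delta_v.reshape(-1, order='F')          # sparse channel vector
    n = rng.standard_normal(Upsilon_w.shape[0]) + 1j * rng.standard_normal(Upsilon_w.shape[0])
    y_w = Upsilon_w @ h_v + sigma * n             # whitened received signal
    c = Upsilon_w.conj().T @ y_w                  # correlation, as per (21)
    C_alpha = vec2mat(np.abs(c), (Gr, Gt))        # DnCNN input, as per (22)
    G = np.abs(Delta_v)                           # clean label, as per (23)
    return C_alpha, G
```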
3) Online Deployment of the DnCNN:
During the online deployment of the DL-CS-CE, we obtain the measured received signal y_w[k] from the realistic mmWave channel environment. We compute C_α[k] based on (22), which is then fed to the offline-trained DnCNN. The trained DnCNN then predicts Ĝ[k], from which we can estimate the supports of Δ_v[k]. An interesting and noteworthy point is that we can feed the trained DnCNN only a subset 𝒦 of K_p out of the K subcarriers' correlation-matrix amplitudes C_α[k] to estimate the support of Δ_v[k], since, as shown in Section II-B1, the Δ_v[k] have the same support for all k. In particular, the support can be estimated using a small number of subcarriers K_p ≪ K. This eliminates the need to compute C_α[k] for all subcarriers and eventually reduces the overall computational complexity at the cost of a negligible performance degradation. By the triangle inequality, ‖y[k]‖ ≤ ‖Φ h_v[k]‖ + ‖n_c[k]‖, so the K_p selected signals are expected to exhibit the strongest channel response. Therefore, the K_p subcarriers having the largest ℓ2-norm are exploited to derive an estimate of the support of the already defined sparse channel matrix Δ_v[k], k = 0, ..., K−1.

B. Algorithm 1: DL-CS-CE
The state-of-the-art sparse channel estimation schemes [11, and references therein] depend on greedy algorithms that detect the supports sequentially, which naturally yields suboptimal solutions. This motivated us to exploit neural networks to estimate all supports simultaneously rather than sequentially. The algorithmic implementation of the proposed DL-CS-CE solution is presented in Algorithm 1. After the initialization steps in lines 1-3 and the computation of the whitened equivalent observation matrix in line 4, the DL-CS-CE is structured around three main procedures:
• estimation of the channel amplitudes using an offline-trained DnCNN;
• sorting the estimated channel amplitudes in descending order to select the supports of the dominant entries;
• reconstruction of the channel according to the selected indices;
which are explained in the sequel.
1) Strongest Subcarriers Selection:
This procedure is represented in lines 8-11 of Algorithm 1, where the algorithm iteratively finds a subset 𝒦 containing the K_p strongest subcarriers, which are expected to exhibit the strongest channel response, as explained in Section III-A3.
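A minimal sketch of this selection rule, picking the K_p subcarriers with the largest ℓ2-norm (the function name is illustrative):

```python
import numpy as np

def find_strongest_subcarriers(Y, K_p):
    """Return indices of the K_p subcarriers with the largest l2-norm.
    Y: sequence of received vectors y[k], k = 0..K-1."""
    norms = np.array([np.linalg.norm(y) for y in Y])
    return np.argsort(norms)[::-1][:K_p]   # descending order of norm
```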
2) Amplitude Estimation:
As depicted in Fig. 3, lines 13 and 14 of Algorithm 1 first compute the correlation vector as per (21) and then create the DnCNN input C_α[k] by arranging the correlation vectors in matrix form as per (22), respectively. In line 15, the offline-trained DnCNN is used as the kernel of the channel amplitude estimation to obtain the DnCNN output Ĝ[k] of size G_r × G_t, which is the estimate of G[k] given in (23). It is worth noting that we only use the subset 𝒦 of the correlation matrices C_α[k], ∀k ∈ 𝒦, as input to the DnCNN. In line 16, the output channel amplitude estimation matrix Ĝ[k] is then vectorized into the G_t G_r × 1 vector

ĝ[k] = vec(Ĝ[k]), ∀k ∈ 𝒦,  (25)

where the indices of the maximum amplitudes of ĝ[k] will be exploited for support detection.

Algorithm 1 DL-CS-CE
Input: y[k], Φ, Ψ, Ã_T, Ã_R, K_p, ε
1:  y_w[k] ← D_w^{−*} y[k], ∀k
2:  r[k] ← y_w[k], ∀k
3:  𝒯̂ ← ∅, 𝒦 ← ∅
4:  Υ_w ← D_w^{−*} ΦΨ
5:  𝒦 ← FindStrongestSubcarriers(y[k])
6:  ĝ[k] ← EstimateAmplitudes(Υ_w^*, r[k], 𝒦)
7:  Ĥ[k] ← ReconstructChannel(ĝ[k]); return Ĥ[k]
8:  procedure FindStrongestSubcarriers(y[k])
9:    for i = 1 : K_p do
10:     𝒦 ← 𝒦 ∪ arg max_{k∉𝒦} ‖y[k]‖₂
11:   end for; return 𝒦; end procedure
12: procedure EstimateAmplitudes(Υ_w^*, r[k], 𝒦)
13:   c[k] ← Υ_w^* r[k], ∀k ∈ 𝒦  // as per (21)
14:   C_α[k] ← vec2mat(|c[k]|, [G_r, G_t])  // as per (22)
15:   Ĝ[k] ← DnCNN(C_α[k])  // online, cf. Fig. 3(b)
16:   ĝ[k] ← vec(Ĝ[k])  // as per (25)
17:   return ĝ[k], ∀k ∈ 𝒦; end procedure
18: procedure ReconstructChannel(ĝ[k], ∀k)
19:   MSE ← ∞
20:   i ← 1
21:   ℐ ← IndexSortDescend(Σ_{k∈𝒦} |ĝ[k]|)
22:   while MSE > ε and i ≤ G_t G_r do
23:     𝒯̂ ← 𝒯̂ ∪ ℐ(i)
24:     ξ̂̃[k] ← ([Υ_w]_{:,𝒯̂})^† y_w[k], ∀k
25:     r[k] ← y_w[k] − [Υ_w]_{:,𝒯̂} ξ̂̃[k], ∀k
26:     MSE ← (1/(K M L_r)) Σ_{k=0}^{K−1} r^*[k] r[k]
27:     i ← i + 1
28:   end while
29:   L̂ ← |𝒯̂|
30:   [ĥ_v[k]]_𝒯̂ ← ([Υ_w]_{:,𝒯̂})^† y_w[k]  // as per (29)
31:   vec{Δ̂_v[k]} ← ĥ_v[k]
32:   vec{Ĥ[k]} ← (Ā̃_T ⊗ Ã_R) vec{Δ̂_v[k]}
33:   return Ĥ[k]; end procedure
3) Multicarrier Channel Reconstruction:
This procedure corresponds to the last stage of the block diagram in Fig. 3(b). It detects supports by iteratively updating the residual until the MSE falls below a predetermined threshold ε. After the initialization steps in lines 19 and 20, line 21 first sums the amplitudes of the predicted ĝ[k] over the subcarriers k ∈ 𝒦, since the supports are the same for all k [cf. Section II-B1]. Then, the IndexSortDescend function sorts the sum vector in descending order and returns the corresponding index set ℐ, |ℐ| = G_r G_t. Thereafter, the while loop between lines 22 and 28 repeats the following steps until the termination condition is satisfied. Line 23 updates the detected support set 𝒯̂ by adding the i-th element of the ordered index set ℐ. Then, line 24 projects the input signal y_w[k], ∀k, onto the subspace given by the detected support 𝒯̂ using the weighted least-squares (WLS) projection ([Υ_w]_{:,𝒯̂})^†, which is followed by the residual update and MSE computation in lines 25 and 26, respectively. It is also worth noting that ([Υ_w]_{:,𝒯̂})^† corresponds to a WLS estimator, with the corresponding weights given by the inverse noise covariance matrix. Lastly, line 27 increments the loop index i for the next iteration. The final value of i yields |𝒯̂| = L̂, one of the key parameters: the estimate of the number of paths sufficient to drive the MSE below ε. Thereby, L̂ is closely tied to the choice of ε, which is explained in detail in Section III-D. We should also note that the while loop is almost always terminated by the MSE condition rather than by i exceeding G_t G_r, since G_t G_r ≫ L̂, as shown in Table II.

Since the support of the sparse channel vectors is already estimated by 𝒯̂, the measurement matrix can now be defined as [Υ]_{:,𝒯̂} ∈ C^{M L_r × L̂} such that [Υ]_{:,𝒯̂} = [ΦΨ]_{:,𝒯̂}. Hence, the received signal model for the k-th subcarrier can be rewritten as

y[k] = [Υ]_{:,𝒯̂} ξ̃[k] + ñ_c[k],  (26)

where ñ_c[k] ∈ C^{M L_r × 1} represents the residual noise after estimating the channel support and ξ̃[k] ∈ C^{L̂ × 1} is the vector containing the channel gains to be estimated after sparse recovery. If the support estimation is sufficiently accurate, ñ_c[k] is approximately equal to the post-combining noise vector n_c[k] [11]. It is important to remark that the indices obtained by the trained DnCNN may differ from the actual channel support. In this case, the detected support 𝒯̂ may also differ from the actual support, and likewise the channel gains ξ̃[k] to be estimated can differ from the actual vector ξ[k] = vec{diag{Δ[k]}}.

The model in (26) is usually considered a general linear model (GLM), for which the solution for ξ̃[k] with real parameters is provided in [39]. For complex-valued parameters, the solution is straightforward and given by

ξ̂̃[k] = ([Υ]^*_{:,𝒯̂} C_w^{−1} [Υ]_{:,𝒯̂})^{−1} [Υ]^*_{:,𝒯̂} C_w^{−1} y[k],  (27)

which can be further reduced to

ξ̂̃[k] = ([Υ_w]_{:,𝒯̂})^† y_w[k].  (28)

Therefore, ξ̂̃[k] is the minimum-variance unbiased (MVU) estimator for the complex parameter vector ξ̃[k], k = 0, ..., K−1. Hence, it is unbiased and attains the Cramér-Rao lower bound (CRLB) if the support is correctly estimated [11], an assumption that holds since mmWave channels are known to have a limited number of paths.
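The support selection and WLS reconstruction loop described above can be sketched in numpy. This is a simplified sketch, not the authors' implementation: it assumes the observations are already whitened, so the pseudo-inverse of (28) reduces to an ordinary least-squares solve, and it abstracts the DnCNN output as a given summed amplitude vector.

```python
import numpy as np

def reconstruct_channel(g_hat_sum, Upsilon_w, Y_w, eps):
    """Support selection + per-subcarrier least-squares gains (lines 19-28 sketch).
    g_hat_sum: summed DnCNN amplitude estimates over k in K (length Gr*Gt)
    Upsilon_w: whitened dictionary (M*Lr x Gr*Gt)
    Y_w:       whitened observations, one column per subcarrier."""
    MLr, K = Y_w.shape
    order = np.argsort(g_hat_sum)[::-1]          # IndexSortDescend
    support = []
    mse, i = np.inf, 0
    Xi = None
    while mse > eps and i < len(order):
        support.append(order[i])                 # admit next dominant index
        A = Upsilon_w[:, support]
        Xi, *_ = np.linalg.lstsq(A, Y_w, rcond=None)  # gains per subcarrier, as in (28)
        R = Y_w - A @ Xi                         # residual update
        mse = np.sum(np.abs(R) ** 2) / (K * MLr)
        i += 1
    return support, Xi
```

In a noiseless test with a common support across subcarriers, the loop admits exactly the true support indices before the MSE drops below the threshold.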
Note that this CRLB corresponds to that of a genie-aided estimation problem, in which the estimator knows the locations of the nonzero taps, i.e., 𝒯, as if a genie had aided the estimator with the tap locations [40].

Once all the supports are detected, line 30 computes the sparse channel vector ĥ_v[k], whose non-zero elements are obtained according to

[ĥ_v[k]]_𝒯̂ = ([Υ_w]_{:,𝒯̂})^† y_w[k].  (29)

Finally, line 32 reconstructs the channel based on (12) as

vec{Ĥ[k]} = (Ā̃_T ⊗ Ã_R) vec{Δ̂_v[k]},  (30)

such that vec{Δ̂_v[k]} = ĥ_v[k].

C. Algorithm 2: Refined DL-CS-CE
The sparsity of h_v[k] can be impaired by channel power leakage caused by the limited resolution of the chosen dictionary matrices [41]. Although the DL-CS-CE provides reasonable AoD/AoA estimates, the adopted virtual quantized dictionary matrices may not capture the exact AoDs/AoAs, which may lie in the off-grid regions of the dictionary. In this section, we combat this issue by developing a method to obtain more accurate AoDs/AoAs. This new procedure, called refined DL-CS-CE, improves the NMSE performance of Algorithm 1 while reducing the incurred computational complexity at the same time.

Using the superscript r to refer to the refining phase, we consider higher-resolution refining dictionary matrices Ã_R^r and Ã_T^r with grid sizes G_r^r and G_t^r, respectively. Based on this notation, the refined DL-CS-CE, summarized in Algorithm 2, follows the same implementation as Algorithm 1 except for some technical differences during the channel reconstruction stage, on which we focus our attention in the sequel.

Multicarrier Channel Reconstruction and Refinement:
The while loop between lines 22 and 29 refines the path components by iterative projections. In line 23, the detected support ℐ(i) is first transformed into the row and column indices (i_AoA^d, i_AoD^d) of a G_r × G_t matrix, representing the indices of the detected AoA and AoD in the original lower-resolution dictionary matrices Ã_R and Ã_T, respectively. In line 24, a multi-resolution fine-tuning method is applied to enhance the resolution of the detected AoAs and AoDs. The refining procedure consists of two steps, shown between lines 36 and 39 of Algorithm 2. In what follows, these steps are explained based on the column index set notation Ω_{K,q}, where K ∈ {A, D} denotes arrival or departure and q ∈ {d, r} denotes detection or refinement, respectively.

1) The first step starts with line 36, which refines the angle component associated with the higher number of antennas. For instance, let us assume that N_r > N_t. By increasing the resolution of φ̂_l to G_r^r ≫ G_r, the maximum projection along the refined receive array steering matrix Ã_R^r, while the corresponding AoD θ̂_l is fixed, can be expressed as

i_AoA^r = arg max_i [ Σ_{k∈𝒦} |([Υ_w^d]_{:,Ω_{D,d}})^* y_w[k]| ]_i,  (31)

where Υ_w^d is an M L_r × G_r^r G_t matrix such that Υ_w^d = Φ_w (Ā̃_T ⊗ Ã_R^r), and [Υ_w^d]_{:,Ω_{D,d}} is an M L_r × G_r^r sub-matrix with the column index set defined as Ω_{D,d} = {(i_AoD^d − 1) G_r^r + 1, ..., i_AoD^d G_r^r}; here i_AoD^d corresponds to the index of the previously detected AoD before refining. Then, line 37 continues with the remaining angle by increasing the resolution of θ̂_l to G_t^r ≫ G_t. Similar to (31), the maximum projection along the refined transmit array steering matrix Ã_T^r, while the corresponding refined AoA φ̂_l is fixed, can be expressed as

i_AoD^r = arg max_i [ Σ_{k∈𝒦} |([Υ_w^r]_{:,Ω_{A,r}})^* y_w[k]| ]_i,  (32)

where Υ_w^r is an M L_r × G_r^r G_t^r matrix such that Υ_w^r = Φ_w (Ā̃_T^r ⊗ Ã_R^r), and [Υ_w^r]_{:,Ω_{A,r}} is an M L_r × G_t^r sub-matrix with the column index set defined as

Ω_{A,r} = {i_AoA^r, i_AoA^r + G_r^r, ..., i_AoA^r + (G_t^r − 1) G_r^r}.  (33)

Here, i_AoA^r is the index of the refined AoA obtained from (31).

2) The second step: in line 38, after removing the angle uncertainty caused by the detection phase, we repeat the same step by substituting all angles with their corresponding refined angles. The maximum projection along the refined receive array is given by

i_AoA^{r⋆} = arg max_i [ Σ_{k∈𝒦} |([Υ_w^r]_{:,Ω_{D,r}})^* y_w[k]| ]_i,  (34)

where Ω_{D,r} = {(i_AoD^r − 1) G_r^r + 1, ..., i_AoD^r G_r^r}, and i_AoD^r corresponds to the index obtained in the previous step from (32). Similarly, in line 39, i_AoD^{r⋆} is obtained using (32) but with i_AoA^r in (33) replaced by the obtained i_AoA^{r⋆} (the result of (34)).

Next, line 40 transforms the row and column indices [i_AoA^{r⋆}, i_AoD^{r⋆}] into a linear index j^⋆. The refining procedure lastly updates the refined support estimation set 𝒯̂ by admitting the index j^⋆ into 𝒯̂.

Table II: Average size of the estimated support L̂ = |𝒯̂| as a function of SNR.

Algorithm 2 Refined DL-CS-CE
Input: y[k], Φ, Ψ, Ã_T, Ã_R, Ã_T^r, Ã_R^r, K_p, ε
1:  y_w[k] ← D_w^{−*} y[k], ∀k
2:  r[k] ← y_w[k], ∀k
3:  𝒯̂ ← ∅, 𝒦 ← ∅
4:  Φ_w ← D_w^{−*} Φ
5:  Ψ ← (Ā̃_T ⊗ Ã_R)  // for detection
6:  Υ_w ← D_w^{−*} ΦΨ
7:  Ψ^r ← (Ā̃_T^r ⊗ Ã_R^r)  // for refining
8:  Υ_w^r ← D_w^{−*} ΦΨ^r
9:  𝒦 ← FindStrongestSubcarriers(y[k])
10: ĝ[k] ← EstimateAmplitudes(Υ_w^*, r[k], 𝒦)
11: Ĥ[k] ← ReconstructChannel&Refine(ĝ[k]); return Ĥ[k]
12: procedure FindStrongestSubcarriers(y[k])
13:   Lines 9-10 in Algorithm 1
14: end procedure
15: procedure EstimateAmplitudes(Υ_w^*, r[k], 𝒦)
16:   Lines 13-16 in Algorithm 1
17: end procedure
18: procedure ReconstructChannel&Refine(ĝ[k])
19:   ℐ ← IndexSortDescend(Σ_{k∈𝒦} |ĝ[k]|)
20:   MSE ← ∞
21:   i ← 1
22:   while MSE > ε and i ≤ G_t G_r do
23:     [i_AoA^d, i_AoD^d] ← ind2sub([G_r, G_t], ℐ(i))
24:     𝒯̂ ← Refine(i_AoA^d, i_AoD^d)
25:     ξ̂̃[k] ← ([Υ_w^r]_{:,𝒯̂})^† y_w[k], ∀k
26:     r[k] ← y_w[k] − [Υ_w^r]_{:,𝒯̂} ξ̂̃[k], ∀k
27:     MSE ← (1/(K M L_r)) Σ_{k=0}^{K−1} r^*[k] r[k]
28:     i ← i + 1
29:   end while
30:   L̂ ← |𝒯̂|
31:   [ĥ_v[k]]_𝒯̂ ← as per (29), with [Υ_w^r]_{:,𝒯̂} in place of [Υ_w]_{:,𝒯̂}
32:   vec{Δ̂_v[k]} ← ĥ_v[k]
33:   vec{Ĥ[k]} ← Ψ^r vec{Δ̂_v[k]}
34:   return Ĥ[k]; end procedure
35: procedure Refine(i_AoA^d, i_AoD^d)
36:   i_AoA^r ← as per (31)
37:   i_AoD^r ← as per (32)
38:   i_AoA^{r⋆} ← as per (34)
39:   i_AoD^{r⋆} ← as per (32), using i_AoA^{r⋆} instead of i_AoA^r
40:   j^⋆ ← sub2ind([G_r^r, G_t^r], [i_AoA^{r⋆}, i_AoD^{r⋆}])
41:   𝒯̂ ← 𝒯̂ ∪ {j^⋆}
42:   return 𝒯̂; end procedure

D. Estimation of The Sufficient Number of Paths
After estimating the channel amplitudes using the trained DnCNN, it is necessary to determine the sufficient support indices representing the number of paths needed to reconstruct the channel. To solve this detection problem, some prior information is needed to compare the received signals y[k] with the reconstructed signals x̂_rec[k] = [Υ]_{:,𝒯̂} ξ̂̃[k]. For instance, the noise variance is assumed to be known at the receiver, which can accurately estimate it before the training stage takes place. Hence, the received signal y[k] can be approximately modeled as y[k] ≈ x̂_rec[k] + ñ_c[k], since x̂_rec[k] is an estimate of the mean of y[k].

The estimation of the noise variance can be formulated as a maximum-likelihood (ML) estimation problem [11], [39]:

σ̂²_ML = arg max_{σ²} L(y, x̂_rec, σ²),  (35)

where y ≜ vec{y[0], ..., y[K−1]} represents the complete received signal, x̂_rec ≜ vec{x̂_rec[0], ..., x̂_rec[K−1]} is the complete reconstructed signal, and L(y, x̂_rec, σ²) denotes the log-likelihood function of y. This log-likelihood function is given by

L(y, x̂_rec, σ²) = −K M L_r ln(πσ²) − ln det{C_w} − (1/σ²) Σ_{k=0}^{K−1} (y[k] − x̂_rec[k])^* C_w^{−1} (y[k] − x̂_rec[k]).  (36)

The ML estimator of the noise variance is then obtained by setting the partial derivative to zero, ∂L(y, x̂_rec, σ²)/∂σ² = 0. Hence, σ̂²_ML is given by

σ̂²_ML = (1/(K M L_r)) Σ_{k=0}^{K−1} (y[k] − x̂_rec[k])^* C_w^{−1} (y[k] − x̂_rec[k]) = (1/(K M L_r)) Σ_{k=0}^{K−1} r^*[k] r[k],  (37)

where the M L_r × 1 vector r[k] ≜ y_w[k] − D_w^{−*} x̂_rec[k] is the residual.
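The estimate in (37) is a direct computation once the residuals are available. A minimal numpy sketch, with the whitening carried through a supplied inverse covariance C_w^{−1} (the function name is illustrative):

```python
import numpy as np

def noise_variance_ml(Y, X_rec, Cw_inv):
    """ML noise-variance estimate of (37):
    sigma2 = (1/(K*M*Lr)) * sum_k (y[k]-x_rec[k])^* Cw^{-1} (y[k]-x_rec[k]).
    Y, X_rec: (M*Lr x K) received and reconstructed signals, one column per k."""
    MLr, K = Y.shape
    E = Y - X_rec                               # per-subcarrier error vectors
    total = sum(np.real(E[:, k].conj() @ (Cw_inv @ E[:, k])) for k in range(K))
    return total / (K * MLr)
```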
One can note that r[k] can also be expressed as r[k] = (I_{M L_r} − P) y_w[k], where P ∈ C^{M L_r × M L_r} represents the projection matrix given by P = [Υ_w]_{:,𝒯̂} [Υ_w]^†_{:,𝒯̂}. Therefore, for a sufficient number of iterations, the L̂ sufficient paths are expected to be detected, as they correspond to the dominant L̂ entries of Σ_{k∈𝒦} |h_v[k]|. Moreover, the detection process is complete when the estimated noise variance becomes equal to the true noise variance of the received signal, which is achieved by setting ε to σ² in (17).

IV. Convergence and Complexity Analysis
In this section, we analyze the convergence of the proposed algorithms to a local optimum, followed by their step-by-step computational complexity analysis.
A. Convergence Analysis
We assume that the dictionary sizes G_t and G_r are large enough for the coarsely quantized AoAs/AoDs to be accurately estimated. For simplicity, we build the convergence analysis on the notation of Algorithm 1; the same analysis applies to Algorithm 2. To ensure convergence to a local optimum, the energy of the residual computed at the (n+1)-th iteration should be strictly smaller than that of the previous n-th iteration, i.e.,

‖r^{(n+1)}[k]‖₂² < ‖r^{(n)}[k]‖₂², k = 0, ..., K−1.  (38)

Noting that the residual computations for SW-OMP [11] and the proposed algorithms are identical, as they follow the same analysis, the residual at a given iteration n is expressed as

r^{(n)}[k] = (I_{M L_r} − P^{(n)}) y_w[k],  (39)

where P^{(n)} ∈ C^{M L_r × M L_r} corresponds to a projection matrix given by P^{(n)} ≜ [Υ_w]_{:,𝒯̂^{(n)}} [Υ_w]^†_{:,𝒯̂^{(n)}}. It is worth mentioning that the residual r^{(n)}[k] is the vector resulting from projecting y_w[k] onto the subspace orthogonal to the column space of [Υ_w]_{:,𝒯̂^{(n)}}. Moreover, we can use the projection onto the column space of [Υ_w]_{:,𝒯̂^{(n)}} to rewrite the condition in (38) as

‖P^{(n+1)} y_w[k]‖₂² > ‖P^{(n)} y_w[k]‖₂².  (40)

Following the notation used in Algorithm 1, the term inside the ℓ2-norm on the left side of (40) can be expressed as

P^{(n+1)} y_w[k] = [[Υ_w]_{:,𝒯̂^{(n)}}  [Υ_w]_{:,p̂⋆^{(n+1)}}] [[Υ_w]_{:,𝒯̂^{(n)}}  [Υ_w]_{:,p̂⋆^{(n+1)}}]^† y_w[k],  (41)

where p̂⋆^{(n+1)} is the support index estimated during the (n+1)-th iteration, such that p̂⋆^{(n+1)} ∉ 𝒯̂^{(n)} (an assumption that holds for large enough values of M and K [6]).

By using the formula for the inverse of a 2 × 2 block matrix (Appendix 8B in [39]), the projection matrix P^{(n+1)} can be recursively written as a function of P^{(n)} as

P^{(n+1)} = P^{(n)} + ((I_{M L_r} − P^{(n)}) [Υ_w]_{:,p̂⋆^{(n+1)}} [Υ_w]^*_{:,p̂⋆^{(n+1)}} (I_{M L_r} − P^{(n)})) / ([Υ_w]^*_{:,p̂⋆^{(n+1)}} (I_{M L_r} − P^{(n)}) [Υ_w]_{:,p̂⋆^{(n+1)}}),  (42)

where the second term, denoted ΔP^{(n+1)} ∈ C^{M L_r × M L_r}, is another projection matrix that captures the relation between the projections at the n-th and (n+1)-th iterations. The expression in (42) can easily be shown to fulfill the orthogonality principle, i.e., P^{(n+1)} ΔP^{(n+1)} = 0. The left-hand term in (40) can then be expressed as

‖P^{(n+1)} y_w[k]‖₂² = ‖P^{(n)} y_w[k] + ΔP^{(n+1)} y_w[k]‖₂² = ‖P^{(n)} y_w[k]‖₂² + ‖ΔP^{(n+1)} y_w[k]‖₂²,  (43)

where the last equality follows from the orthogonality of the two projections. Moreover, ΔP^{(n+1)} is idempotent [39]: using straightforward linear-algebraic manipulations, it is easy to show that ΔP^{(n+1)} = (ΔP^{(n+1)})². Hence, the eigenvalues of ΔP^{(n+1)} are either 0 or 1, and thereby ‖P^{(n+1)} y_w[k]‖₂² > ‖P^{(n)} y_w[k]‖₂². Since the condition in (40) is satisfied, the proposed algorithms are guaranteed to converge to a local optimum. Moreover, Table II shows the average number of sufficient iterations |𝒯̂| = L̂ for a range of SNR values; the results in the table confirm that the proposed support detection method using the trained DnCNN needs only a few iterations to converge.

Table III: Online computational complexity of Algorithm 1
Operation | Complexity
(K_p ×) c[k] = Υ_w^* r[k] | O(K_p (G_r G_t) M L_r)
Estimation using DnCNN | (44)
max over Σ_{k∈𝒦} |ĥ_v[k]| | O(K_p (G_t G_r) L̂)
(K ×) x_𝒯̂[k] = ([Υ_w]_{:,𝒯̂})^† y_w[k] | O(2 L̂² L_r M + L̂³)
(K ×) r[k] = y_w[k] − [Υ_w]_{:,𝒯̂} ξ̂̃[k] | O(K L_r M L̂)
MSE = (1/(K M L_r)) Σ_{k=0}^{K−1} r^*[k] r[k] | O(K L_r M L̂)
Overall | O(K_p (G_r G_t) M L_r)

Table IV: Online computational complexity of Algorithm 2
Operation | Complexity
(K_p ×) c[k] = Υ_w^* r[k] | O(K_p (G_r G_t) M L_r)
Estimation using DnCNN | (44)
max over Σ_{k∈𝒦} |ĥ_v[k]| | O(K_p (G_t G_r) L̂)
arg max_i [Σ_{k∈𝒦} |([Υ_w^d]_{:,Ω})^* y_w[k]|]_i | O(K_p M L_r G_r^r L̂)
arg max_i [Σ_{k∈𝒦} |([Υ_w^r]_{:,Ω})^* y_w[k]|]_i | O(K_p M L_r G_t^r L̂)
(K ×) x_𝒯̂[k] = ([Υ_w^r]_{:,𝒯̂})^† y_w[k] | O(2 L̂² L_r M + L̂³)
(K ×) r[k] = y_w[k] − [Υ_w^r]_{:,𝒯̂} ξ̂̃[k] | O(K L_r M L̂)
MSE = (1/(K M L_r)) Σ_{k=0}^{K−1} r^*[k] r[k] | O(K L_r M L̂)
Overall | O(K_p M L_r G_r^r L̂)

Table V: Online computational complexity of SW-OMP [11]
Operation | Overall complexity
For grid-size dictionary matrices G_r G_t | O(K (G_r G_t) M L_r L)
For grid-size dictionary matrices G_r^r G_t^r | O(K (G_r^r G_t^r) M L_r L)

Table VI: Simulation parameters
Parameter | Value
Total size of dataset |
Total number of subcarriers (K) | 16
Subset number of subcarriers (K_p) | K/4
Operating frequency | 60 GHz
Number of TX (RX) antennas N_t (N_r) | 16 (64)
Number of TX (RX) RF chains L_t (L_r) | 2 (4)
Grid size of TX (RX) detecting dictionary steering vectors G_t (G_r) | 2N_t (2N_r)
Grid size of TX (RX) refining dictionary steering vectors G_t^r (G_r^r) | 8N_t (8N_r)
Channel paths (L) |
Number of delay taps of the channel (N_c) |
Distribution of AoAs/AoDs | U(0, π)

B. Computational Analysis
The computational complexities of Algorithm 1 and Algorithm 2 are provided in Table III and Table IV, respectively. For comparison purposes, the overall computational complexity of the SW-OMP [11] benchmark is also provided in Table V. Since some steps can be performed before running the channel estimation algorithms, we distinguish between online and offline operations. For instance, the matrices Υ_w = D_w^{−*} Υ, C_w, D_w, Υ_w^d, and Υ_w^r can be computed offline before explicit channel estimation.

Besides, the computational complexity of the proposed DnCNN arises from both online deployment and offline training. Although the online complexity is easy to compute, the offline training complexity remains an open issue due to the more involved implementation of the backpropagation process during training [42]. Therefore, we only consider the complexity of the online deployment, which is based on simple matrix-vector multiplications. For a deep neural network with L_C convolutional layers [43], the total time complexity is given by

O( Σ_{l=1}^{L_C} D_x^(l) D_y^(l) D_z^(l) b_x^(l) b_y^(l) c_CL^(l−1) c_CL^(l) ),  (44)

where D_x^(l), D_y^(l), and D_z^(l) are the convolutional kernel dimensions, b_x^(l) and b_y^(l) are the dimensions of the l-th convolutional layer output, and c_CL^(l) is the number of filters in the l-th layer. We should also note that DL enjoys the advantages of graphics processing units (GPUs) and parallel processing; hence, the overall time complexity is dominated by the analytical operations performed in the proposed algorithms.

Moreover, we observe that the overall computational complexity of DL-CS-CE is lower than that of SW-OMP, especially for small grid sizes (for instance, when G_t and G_r are twice the number of transmit and receive antennas).
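The layer-wise multiply count in (44) can be evaluated directly. A small sketch using an illustrative dictionary-based layer description (the field names are assumptions, not from the paper):

```python
def dncnn_time_complexity(layers):
    """Total multiply count per (44): sum over layers of
    Dx*Dy*Dz * bx*by * c_in * c_out.
    layers: list of dicts with keys Dx, Dy, Dz, bx, by, c_in, c_out."""
    return sum(l['Dx'] * l['Dy'] * l['Dz'] * l['bx'] * l['by']
               * l['c_in'] * l['c_out'] for l in layers)
```

For example, a single 3 × 3 × 1 layer with 64 output filters on an 8 × 8 feature map contributes 3·3·1·8·8·1·64 = 36864 multiplies.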
Moreover, when the refined algorithm is applied with the higher refining resolutions G_t^r and G_r^r, the computational complexity is still lower than that of SW-OMP applied with the same higher-resolution grid sizes (G_t^r and G_r^r). In Section V-D, we compare the computation times of the proposed methods with that of SW-OMP.

V. Simulation Results
This section evaluates the performance of the proposed algorithms and compares the empirical results with benchmark frequency-domain channel estimation algorithms, including SW-OMP [11]. The results are obtained through extensive Monte Carlo simulations that evaluate the average normalized mean squared error (NMSE) and the ergodic rate as functions of the SNR and the number of training frames M. The simulations are performed with realistic channel realizations from the Raymobtime channel datasets.

The main parameters used for the system configuration are as follows. The phase shifters used at both the transmitter and the receiver are assumed to have N_Q quantization bits, so that the entries of the training vectors f_tr^(m), w_tr^(m), m = 1, 2, ..., M, are drawn from the set A = {0, 2π/2^{N_Q}, ..., 2π(2^{N_Q} − 1)/2^{N_Q}}. The number of quantization bits is set to N_Q = 2. The band-limiting filter p_rc(t) is assumed to be a raised-cosine filter.

The DnCNN adopted in this work has L_C = 3 convolutional layers: the first convolutional layer uses c = 64 different filters, the succeeding layer uses another set of filters with batch normalization, and the final convolutional layer uses one separate filter. Moreover, we divide the dataset randomly into a training set and a validation set, where the training set comprises 70% of the total set and the validation set the remaining 30%. We adopt the adaptive moment estimation (Adam) optimizer with a fixed learning rate to train the DnCNN, using mini-batches in each epoch; the training process terminates when the validation accuracy does not improve in ten consecutive iterations.

Unless stated explicitly otherwise, the default system parameters used throughout the experimental simulations are summarized in Table VI, where U(·,·) represents the uniform distribution.

A. Comparison of the Normalized Mean Squared Errors
One of the key performance metrics for the channel estimate Ĥ[k] is the NMSE, which for a given realization is expressed as

NMSE = ( Σ_{k=0}^{K−1} ‖Ĥ[k] − H[k]‖²_F ) / ( Σ_{k=0}^{K−1} ‖H[k]‖²_F ).  (45)

The NMSE is considered our baseline metric for assessing the proposed algorithms' performance and is averaged over many channel realizations. The normalized CRLB (NCRLB), for which the supports are assumed perfectly estimated [11], is also provided to compare each algorithm's average performance with the lowest achievable NMSE.

We compare the average NMSE versus SNR obtained for the different channel estimation algorithms in Fig. 4, for a practical SNR range of −15 dB to 5 dB and three different numbers of training frames, M = {100, 80, 60}. It is worth noting that this choice of SNR range is based on the fact that the SNR expected in mmWave communication systems is on the order of −20 dB and above. Using a large number of training frames M improves performance at the cost of both higher overhead and higher computational complexity, since the complexity of estimating the support, channel gains, and noise variance grows linearly with L_r M.

[Fig. 4: The NMSE vs. SNR for the DL-CS-CE, the refined DL-CS-CE, and the SW-OMP for (a) M = 100, (b) M = 80, and (c) M = 60 (N_t = 16, N_r = 64, K = 16).]

In Fig. 4, the DL-CS-CE with refining performs best, achieving NMSE values very close to the NCRLB. The performance difference between SW-OMP and the proposed algorithms is noticeable, and stems from the fact that SW-OMP estimates the dominant entries of the mmWave channel sequentially rather than in a single shot. The DL-CS-CE clearly delivers a lower NMSE than SW-OMP. The refined DL-CS-CE achieves even lower NMSE values, below −10 dB, especially at low SNR values such as SNR = −15 dB, whereas SW-OMP achieves a considerably higher NMSE at SNR = −15 dB even with higher-resolution grid sizes.

In Fig. 5, we compare the NMSE of the DL-CS-CE with G_r = 2N_r and G_t = 2N_t against the refined DL-CS-CE with refining grid sizes G_r^r = {2N_r, 4N_r, 8N_r, 16N_r} and G_t^r = {2N_t, 4N_t, 8N_t, 16N_t}. It is obvious from Fig. 5 that setting the dictionary sizes to twice the number of antennas at the transmitter and receiver is not enough to estimate the exact AoDs/AoAs that lie in the off-grid regions of the dictionary. At this very point, the refining method introduced in Algorithm 2 is shown to greatly enhance the NMSE performance, especially in the low SNR regime, at the cost of increased computational complexity as the refining resolution increases, as shown in Table IV. Hence, a trade-off exists between attaining good NMSE performance and keeping the computational complexity low. However, even with the proposed refining approach, the complexity remains lower than that of SW-OMP with the same high-resolution dictionary matrices by at least two orders of magnitude. For instance, taking M = 100, K_p = K/4, G_t^r = 8N_t, and G_r^r = 8N_r, the complexity order of SW-OMP is O(K (G_r^r G_t^r) M L_r L), while that of the refined DL-CS-CE is only O(K_p G_r^r M L_r L̂), about two orders of magnitude lower. Moreover, Fig. 5 shows that as the refining resolution increases further, the NMSE enhancement becomes gradual, as no further gains are attained from additional refinement.

B. Comparisons for the Probability of Successful Support Estimation for L Paths
In Fig. 6, we compare the successful support detection probability versus SNR for the proposed DnCNN-based amplitude estimation and that of SW-OMP. The proposed DnCNN outperforms SW-OMP over the whole SNR range, since the trained DnCNN can efficiently denoise the correlated input image and obtain a sparse matrix of the channel amplitudes. From this denoised sparse matrix, the indices of the supports (i.e., the dominant entries of the obtained sparse matrix) are detected. Moreover, we show that when we set K_p ≪ K, the support detection is not affected since, as shown in Section II-B1, the ∆[k] have the same support for all k. Therefore, we can reduce computational complexity, since there is no need to compute the correlation step (given in (21)) for all subcarriers. Thus, a smaller subset of subcarriers can also provide a high probability of correct support detection.

Fig. 5: The NMSE vs. SNR for the DL-CS-CE and the refined DL-CS-CE under different refining grid sizes of G_rr = {2N_r, 4N_r, 8N_r, 16N_r} and G_rt = {2N_t, 4N_t, 8N_t, 16N_t} (N_t = 16, N_r = 64, K = 16, M = 100).

Fig. 6: Probability of successfully detecting the supports vs. SNR for the DL-CS-CE, the refined DL-CS-CE, and the SW-OMP (N_t = 16, N_r = 64, K = 16, M = 100). [Curves: DnCNN with K_p = K, DnCNN with K_p = K/4, and SW-OMP.]

C. Spectral Efficiency Comparison
Another key performance metric is the spectral efficiency, which is computed by assuming fully-digital precoding and combining. In this way, using estimates of the N_s dominant left and right singular vectors of the channel estimate gives K parallel effective channels
$$\mathbf{H}_{\mathrm{eff}}[k] = \left[\hat{\mathbf{U}}[k]\right]^{*}_{:,1:N_s} \mathbf{H}[k] \left[\hat{\mathbf{V}}[k]\right]_{:,1:N_s}.$$
Accordingly, the average spectral efficiency can be expressed as
$$R = \frac{1}{K}\sum_{k=0}^{K-1}\sum_{n=1}^{N_s}\log_2\!\left(1+\frac{\mathrm{SNR}}{N_s}\,\lambda_n\!\left(\mathbf{H}_{\mathrm{eff}}[k]\right)\right), \quad (46)$$
with $\lambda_n(\mathbf{H}_{\mathrm{eff}}[k])$, $n = 1, \ldots, N_s$, the eigenvalues of each effective channel $\mathbf{H}_{\mathrm{eff}}[k]$.

In Fig. 7, we show the achievable spectral efficiency as a function of the SNR for the different channel estimation algorithms. The proposed DL-CS-CE approach provides a clear performance improvement over SW-OMP.

Fig. 7: Spectral efficiency vs. SNR (N_t = 16, N_r = 64, K = 16, M = 100). [Curves: Perfect CSI, DL-CS-CE, refined DL-CS-CE, SW-OMP with G_r = 2N_r, G_t = 2N_t, and SW-OMP with G_r = 8N_r, G_t = 8N_t.]
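The rate in (46) can be checked numerically. Below is a minimal sketch, assuming λ_n is realized as an eigenvalue of H_eff[k]^H H_eff[k] (the equivalent log-det form) and using a random i.i.d. channel in place of an actual estimate; the function and variable names are illustrative, not the paper's code.

```python
import numpy as np

def avg_spectral_efficiency(H_true, U_hat, V_hat, snr, Ns):
    """Average rate over K subcarriers with fully-digital precoding/combining
    built from the Ns dominant singular vectors of the channel estimate.
    H_true: [K, Nr, Nt]; U_hat: [K, Nr, Ns]; V_hat: [K, Nt, Ns]."""
    K = H_true.shape[0]
    R = 0.0
    for k in range(K):
        Heff = U_hat[k].conj().T @ H_true[k] @ V_hat[k]   # Ns x Ns effective channel
        lam = np.linalg.eigvalsh(Heff.conj().T @ Heff)    # real, non-negative eigenvalues
        R += np.sum(np.log2(1.0 + (snr / Ns) * lam))
    return R / K

# with a perfect estimate, U_hat/V_hat are the channel's own singular vectors
rng = np.random.default_rng(1)
K, Nr, Nt, Ns = 16, 64, 16, 2
H = (rng.standard_normal((K, Nr, Nt)) + 1j * rng.standard_normal((K, Nr, Nt))) / np.sqrt(2)
U, _, Vh = np.linalg.svd(H)
U_hat = U[:, :, :Ns]
V_hat = Vh.conj().transpose(0, 2, 1)[:, :, :Ns]
print(avg_spectral_efficiency(H, U_hat, V_hat, snr=1.0, Ns=Ns))
```

With perfect singular vectors, H_eff[k] is diagonal with the Ns largest singular values, so the sketch reduces to water-filling-free equal-power allocation over the dominant eigenmodes; an imperfect estimate rotates H_eff away from diagonal and lowers the rate, which is exactly the sensitivity Fig. 7 measures.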
[Figure: Spectral efficiency vs. training length M (40 to 100) for Perfect CSI, DL-CS-CE, refined DL-CS-CE, SW-OMP (G_r = 2N_r, G_t = 2N_t), and SW-OMP (G_r = 8N_r, G_t = 8N_t), at SNR = 0 dB and SNR = −15 dB.]
Fig. 8: Spectral efficiency vs. training length M (N_t = 16, N_r = 64, K = 16, SNR = {−15, 0} dB).

The refined DL-CS-CE provides near-optimal achievable rates, with a clear performance improvement over the other schemes. The spectral efficiency gap between the different schemes is smaller than the NMSE gap, since the NMSE performance is much more sensitive to the success rate of the sparse recovery. The spectral efficiency, in contrast, is determined by the beamforming gain and is less sensitive to the success rate of the sparse recovery.

In Fig. 8, we show the achievable spectral efficiency as a function of the training length for the proposed schemes under different SNRs. We observe that increasing M beyond a certain point does not significantly improve performance, which demonstrates the robustness of the two proposed approaches. Simulations also show that near-optimal achievable rates can be achieved with a reasonable number of frames. Therefore, with the proposed schemes, we can save in training overhead.

Table VII: Average running time for M = 100 and SNR = − dB

Algorithm                                        Run time [seconds]
DL-CS-CE, G_r = 2N_r, G_t = 2N_t                 .
Refined DL-CS-CE, G_rr = 2N_r, G_rt = 2N_t       .
Refined DL-CS-CE, G_rr = 8N_r, G_rt = 8N_t       .
SW-OMP, G_r = 2N_r, G_t = 2N_t                   .
SW-OMP, G_r = 8N_r, G_t = 8N_t                   .

D. Time Complexity Analysis
Table VII shows the computational times of the online estimation stage for the proposed frameworks and SW-OMP [11]. SW-OMP is the slowest in solving the inherent optimization problem, especially for high-resolution dictionary matrices. The DL-CS-CE without refining exhibits shorter computational times than the SW-OMP algorithm. For a fair comparison when refining is applied, we compare the running time of the refined DL-CS-CE with that of the higher-resolution SW-OMP, where G_r = G_rr = 8N_r and G_t = G_rt = 8N_t; the refined DL-CS-CE takes less time to perform the channel estimation. Hence, we conclude that the proposed DL-CS-CE frameworks are computationally efficient, especially for higher-resolution dictionary matrices.

VI. CONCLUSION
In this work, we have proposed two DL-CS-based frequency-selective channel estimation approaches for mmWave wideband communication systems under hybrid architectures. The developed algorithms are based on joint-sparse recovery to exploit the common basis shared by all subcarriers. Compared to state-of-the-art channel estimation techniques that estimate supports iteratively, the proposed solutions reduce computational complexity and estimation error by detecting all supports simultaneously. Simulation results have shown that the DL-CS-CE and the refined DL-CS-CE schemes achieve better channel estimation performance than existing schemes with a reasonably small training length and a low complexity order. It has also been shown that a small number of subcarriers is sufficient for successful support detection during the deep learning prediction phase. Thus, the proposed schemes are able to attain good NMSE performance with low computational complexity.
REFERENCES

[1] Z. Pi and F. Khan, "An introduction to millimeter-wave mobile broadband systems," IEEE Commun. Mag., vol. 49, no. 6, pp. 101–107, Jun. 2011.
[2] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, "An overview of signal processing techniques for millimeter wave MIMO systems," IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 436–453, Apr. 2016.
[3] T. Bai, A. Alkhateeb, and R. W. Heath, "Coverage and capacity of millimeter-wave cellular networks," IEEE Commun. Mag., vol. 52, no. 9, pp. 70–77, Sep. 2014.
[4] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, "What will 5G be?" IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
[5] A. Alkhateeb, J. Mo, N. González-Prelcic, and R. W. Heath, "MIMO precoding and combining solutions for millimeter-wave systems," IEEE Commun. Mag., vol. 52, no. 12, pp. 122–131, Dec. 2014.
[6] R. Méndez-Rial, C. Rusu, N. González-Prelcic, A. Alkhateeb, and R. W. Heath, "Hybrid MIMO architectures for millimeter wave communications: Phase shifters or switches?" IEEE Access, vol. 4, pp. 247–267, Jan. 2016.
[7] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, "Channel estimation and hybrid precoding for millimeter wave cellular systems," IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, Oct. 2014.
[8] M. F. Duarte and Y. C. Eldar, "Structured compressed sensing: From theory to applications," IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4053–4085, Sep. 2011.
[9] Z. Gao, C. Hu, L. Dai, and Z. Wang, "Channel estimation for millimeter-wave massive MIMO with hybrid precoding over frequency-selective fading channels," IEEE Commun. Lett., vol. 20, no. 6, pp. 1259–1262, Jun. 2016.
[10] K. Venugopal, A. Alkhateeb, R. W. Heath, and N. G. Prelcic, "Time-domain channel estimation for wideband millimeter wave systems with hybrid architecture," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2017, pp. 6493–6497.
[11] J. Rodríguez-Fernández, N. González-Prelcic, K. Venugopal, and R. W. Heath, "Frequency-domain compressive channel estimation for frequency-selective hybrid millimeter wave MIMO systems," IEEE Trans. Wireless Commun., vol. 17, no. 5, pp. 2946–2960, May 2018.
[12] W. Ma and C. Qi, "Beamspace channel estimation for millimeter wave massive MIMO system with hybrid precoding and combining," IEEE Trans. Signal Process., vol. 66, no. 18, pp. 4839–4853, Sep. 2018.
[13] H. Ye, G. Y. Li, and B. Juang, "Power of deep learning for channel estimation and signal detection in OFDM systems," IEEE Wireless Commun. Lett., vol. 7, no. 1, pp. 114–117, Feb. 2018.
[14] P. Dong, H. Zhang, G. Y. Li, I. S. Gaspar, and N. NaderiAlizadeh, "Deep CNN-based channel estimation for mmWave massive MIMO systems," IEEE J. Sel. Topics Signal Process., vol. 13, no. 5, pp. 989–1000, Sep. 2019.
[15] H. He, C. Wen, S. Jin, and G. Y. Li, "Deep learning-based channel estimation for beamspace mmWave massive MIMO systems," IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, Oct. 2018.
[16] M. Soltani, V. Pourahmadi, A. Mirzaei, and H. Sheikhzadeh, "Deep learning-based channel estimation," IEEE Commun. Lett., vol. 23, no. 4, pp. 652–655, Apr. 2019.
[17] W. Ma, C. Qi, Z. Zhang, and J. Cheng, "Sparse channel estimation and hybrid precoding using deep learning for millimeter wave massive MIMO," IEEE Trans. Commun., vol. 68, no. 5, pp. 2838–2849, May 2020.
[18] X. Wei, C. Hu, and L. Dai, "Knowledge-aided deep learning for beamspace channel estimation in millimeter-wave massive MIMO systems," arXiv preprint arXiv:1910.12455, 2019.
[19] C. Chun, J. Kang, and I. Kim, "Deep learning-based channel estimation for massive MIMO systems," IEEE Wireless Commun. Lett., vol. 8, no. 4, pp. 1228–1231, Aug. 2019.
[20] Y. Jin, J. Zhang, S. Jin, and B. Ai, "Channel estimation for cell-free mmWave massive MIMO through deep learning," IEEE Trans. Veh. Technol., vol. 68, no. 10, pp. 10325–10329, Oct. 2019.
[21] E. Balevi, A. Doshi, and J. G. Andrews, "Massive MIMO channel estimation with an untrained deep neural network," IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2079–2090, Mar. 2020.
[22] Ö. T. Demir and E. Björnson, "Channel estimation in massive MIMO under hardware non-linearities: Bayesian methods versus deep learning," IEEE Open J. Commun. Soc., vol. 1, pp. 109–124, 2020.
[23] Y. Long, Z. Chen, J. Fang, and C. Tellambura, "Data-driven-based analog beam selection for hybrid beamforming under mm-wave channels," IEEE J. Sel. Topics Signal Process., vol. 12, no. 2, pp. 340–352, May 2018.
[24] J. A. Hodge, K. Vijay Mishra, and A. I. Zaghloul, "Multi-discriminator distributed generative model for multi-layer RF metasurface discovery," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Ottawa, ON, Canada, 2019, pp. 1–5.
[25] A. Alkhateeb, S. Alex, P. Varkey, Y. Li, Q. Qu, and D. Tujkovic, "Deep learning coordinated beamforming for highly-mobile millimeter wave systems," IEEE Access, vol. 6, pp. 37328–37348, 2018.
[26] H. Huang, Y. Song, J. Yang, G. Gui, and F. Adachi, "Deep-learning-based millimeter-wave massive MIMO for hybrid precoding," IEEE Trans. Veh. Technol., vol. 68, no. 3, pp. 3027–3032, Mar. 2019.
[27] A. M. Elbir, "CNN-based precoder and combiner design in mmWave MIMO systems," IEEE Commun. Lett., vol. 23, no. 7, pp. 1240–1243, Jul. 2019.
[28] A. M. Elbir and K. V. Mishra, "Joint antenna selection and hybrid beamformer design using unquantized and quantized deep learning networks," IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 1677–1688, Mar. 2020.
[29] A. M. Elbir and K. V. Mishra, "Online and offline deep learning strategies for channel estimation and hybrid beamforming in multi-carrier mm-wave massive MIMO systems," arXiv preprint arXiv:1912.10036, 2019.
[30] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, "Deep learning based communication over the air," IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018.
[31] J. Xu, P. Zhu, J. Li, and X. You, "Deep learning-based pilot design for multi-user distributed massive MIMO systems," IEEE Wireless Commun. Lett., vol. 8, no. 4, pp. 1016–1019, Aug. 2019.
[32] J. Kang, C. Chun, and I. Kim, "Deep-learning-based channel estimation for wireless energy transfer," IEEE Commun. Lett., vol. 22, no. 11, pp. 2310–2313, Nov. 2018.
[33] A. Abdallah and M. M. Mansour, "Efficient angle-domain processing for FDD-based cell-free massive MIMO systems," IEEE Trans. Commun., vol. 68, no. 4, pp. 2188–2203, Apr. 2020.
[34] ——, "Angle-based multipath estimation and beamforming for FDD cell-free massive MIMO," in Proc. IEEE Int. Workshop Signal Process. Advances Wireless Commun. (SPAWC), Cannes, France, 2019, pp. 1–5.
[35] A. Alkhateeb and R. W. Heath, "Frequency selective hybrid precoding for limited feedback millimeter wave systems," IEEE Trans. Commun., vol. 64, no. 5, pp. 1801–1818, May 2016.
[36] E. Björnson, L. Van der Perre, S. Buzzi, and E. G. Larsson, "Massive MIMO in sub-6 GHz and mmWave: Physical, practical, and use-case differences," IEEE Wireless Commun., vol. 26, no. 2, pp. 100–108, Apr. 2019.
[37] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
[38] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770–778.
[39] S. M. Kay, Fundamentals of Statistical Signal Processing. Prentice Hall PTR, 1993.
[40] R. Niazadeh, M. Babaie-Zadeh, and C. Jutten, "On the achievability of Cramér–Rao bound in noisy compressed sensing," IEEE Trans. Signal Process., vol. 60, no. 1, pp. 518–526, Jan. 2012.
[41] Z. Qin, J. Fan, Y. Liu, Y. Gao, and G. Y. Li, "Sparse representation for wireless communications: A compressive sensing approach," IEEE Signal Process. Mag., vol. 35, no. 3, pp. 40–58, May 2018.
[42] B. Matthiesen, A. Zappone, K. L. Besser, E. A. Jorswieck, and M. Debbah, "A globally optimal energy-efficient power control framework and its efficient implementation in wireless interference networks," IEEE Trans. Signal Process., vol. 68, pp. 3887–3902, 2020.
[43] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
Asmaa Abdallah received the B.S. (with high distinction) and M.S. degrees in computer and communications engineering from Rafik Hariri University (RHU), Lebanon, in 2013 and 2015, respectively. In 2020, she received the Ph.D. degree in electrical and computer engineering from the American University of Beirut (AUB), Beirut, Lebanon. She is currently a post-doctoral fellow at King Abdullah University of Science and Technology (KAUST). She has been a research and teaching assistant at AUB since 2015. She was a research intern at Nokia Bell Labs in France from July 2019 to December 2019, where she worked on new hybrid automatic repeat request (HARQ) mechanisms for long-delay channels in non-terrestrial networks (NTN). Her research interests are in the areas of communication theory, stochastic geometry for wireless communications, and array signal processing, with emphasis on energy- and spectral-efficient algorithms for device-to-device (D2D) communications, massive multiple-input multiple-output (MIMO) systems, and cell-free massive MIMO systems. Ms. Abdallah was the recipient of the Academic Excellence Award at RHU in 2013 for ranking first in the graduating class. She also received a scholarship from the Lebanese National Council for Scientific Research (CNRS-L/AUB) to support her doctoral studies.

Abdulkadir Celik (S'14-M'16-SM'19) received the M.S. degree in electrical engineering in 2013, the M.S. degree in computer engineering in 2015, and the Ph.D. degree in co-majors of electrical engineering and computer engineering in 2016 from Iowa State University, Ames, IA, USA. He was a post-doctoral fellow at King Abdullah University of Science and Technology (KAUST) from 2016 to 2020. Since 2020, he has been a research scientist at the Communications and Computing Systems Lab at KAUST. His research interests are in the areas of wireless communication systems and networks.
Mohammad M. Mansour (S'97-M'03-SM'08) received the B.E. (Hons.) and M.E. degrees in computer and communications engineering from the American University of Beirut (AUB), Beirut, Lebanon, in 1996 and 1998, respectively, and the M.S. degree in mathematics and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign (UIUC), Champaign, IL, USA, in 2002 and 2003, respectively.
He was a Visiting Researcher at Qualcomm, San Jose, CA, USA, in the summer of 2016, where he worked on baseband receiver architectures for the IEEE 802.11ax standard. He was a Visiting Researcher at Broadcom, Sunnyvale, CA, USA, from 2012 to 2014, where he worked on the physical layer SoC architecture and algorithm development for LTE-Advanced baseband receivers. He was on research leave with Qualcomm Flarion Technologies in Bridgewater, NJ, USA, from 2006 to 2008, where he worked on modem design and implementation for 3GPP-LTE and 3GPP2-UMB, and on peer-to-peer wireless networking physical layer SoC architecture and algorithm development. He was a Research Assistant at the Coordinated Science Laboratory (CSL), UIUC, from 1998 to 2003. He worked at National Semiconductor Corporation, San Francisco, CA, with the Wireless Research group in 2000. He was a Research Assistant with the Department of Electrical and Computer Engineering, AUB, in 1997, and a Teaching Assistant in 1996. He joined the Department of Electrical and Computer Engineering, AUB, as a faculty member in 2003, where he is currently a Professor. His research interests are in the area of energy-efficient and high-performance VLSI circuits, architectures, algorithms, and systems for computing, communications, and signal processing.
Prof. Mansour is a member of the Design and Implementation of Signal Processing Systems (DISPS) Technical Committee Advisory Board of the IEEE Signal Processing Society. He served as a member of the DISPS Technical Committee from 2006 to 2013.
He served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II (TCAS-II) from 2008 to 2013, as an Associate Editor for the IEEE SIGNAL PROCESSING LETTERS from 2012 to 2016, and as an Associate Editor of the IEEE TRANSACTIONS ON VLSI SYSTEMS from 2011 to 2016. He served as the Technical Co-Chair of the IEEE Workshop on Signal Processing Systems in 2011, and as a member of the Technical Program Committee of various international conferences and workshops. He was the recipient of the Phi Kappa Phi Honor Society Award twice, in 2000 and 2001, and the recipient of the Hewlett Foundation Fellowship Award in 2006. He has seven issued U.S. patents.