Deep Learning-based Phase Reconfiguration for Intelligent Reflecting Surfaces

Abstract

Intelligent reflecting surfaces (IRSs), consisting of reconfigurable metamaterials, have recently attracted attention as a promising cost-effective technology that can bring new features to wireless communications. These surfaces can be used to partially control the propagation environment and can potentially provide a power gain that is proportional to the square of the number of IRS elements when configured in a proper way. However, the configuration of the local phase matrix at the IRSs can be quite a challenging task since they are purposely designed to not have any active components, therefore, they are not able to process any pilot signal. In addition, a large number of elements at the IRS may create a huge training overhead. In this paper, we present a deep learning (DL) approach for phase reconfiguration at an IRS in order to learn and make use of the local propagation environment. The proposed method uses the received pilot signals reflected through the IRS to train the deep feedforward network. The performance of the proposed approach is evaluated and the numerical results are presented.


Özgecan Özdoğan, Emil Björnson
Department of Electrical Engineering (ISY), Linköping University, Sweden


I. INTRODUCTION

An intelligent reflecting surface (IRS), also known under the names reconfigurable intelligent surface [1] and software-controlled metasurface [2], is a thin two-dimensional metasurface that is used to aid communications [3]. Depending on the application of interest, an IRS can control and transform the electromagnetic waves that impinge on it. Recently, it has received massive attention from academia and is sometimes marketed as one of the key enabling technologies for next-generation wireless communication systems.

Bringing such a technology into reality requires addressing many practical challenges. For instance, the proper configuration of an IRS critically depends on accurate channel state information (CSI). However, there are two main issues that complicate channel acquisition with an IRS [4]. First, the IRS is not inherently equipped with transceiver chains and therefore cannot sense the pilot signals. Second, introducing an IRS into an existing setup increases the number of channel coefficients proportionally to the number of IRS elements.

In the literature, some deep learning (DL) solutions are discussed to tackle these problems [5]. In [6], a supervised learning approach is presented where two identical convolutional neural networks (CNNs) are trained to estimate the direct and cascaded channels. In [7], a feedforward neural network is proposed to unveil the mapping between the measured user coordinates and the optimal phase matrix at the IRS that maximizes the targeted user's signal strength. Another approach is to equip the IRS with a small number of active elements with sensing capabilities. The data collected from the active elements are utilized during the training of deep neural networks (DNNs) in [8], [9], and the underlying channel structure is exploited to learn the entire channel. There are also deep reinforcement learning based methods that aim to solve the problem of joint optimization of IRS phases and transmit beamforming assuming perfect CSI [10], [11].

In this paper, we propose a novel DL approach for phase configuration in an IRS-assisted MIMO system. We design two DNNs that are fed by the received pilot signals to directly find the mapping between the pilot signals and the optimum phase matrix and downlink transmit beamforming vector, thereby bypassing the conventional intermediate step of estimating the channels, which is prone to error propagation. In the first DNN, we send full-length pilot sequences and compare our results with a conventional least-squares (LS) estimator based scheme. In the second method, our goal is to reduce the pilot overhead. We train the DNN with shorter pilot sequences and predict the optimum phases and beamforming vector at the online stage.

Notation: Lower and upper case boldface letters are used for vectors and matrices, respectively. The transpose and Hermitian transpose of a matrix $\mathbf{A}$ are written as $\mathbf{A}^T$ and $\mathbf{A}^H$, respectively. The superscript $(\cdot)^*$ denotes the complex conjugate. The operation $\mathbf{A} = \mathrm{diag}(\mathbf{a})$ with $\mathbf{a} \in \mathbb{C}^{N \times 1}$ returns the matrix $\mathbf{A} \in \mathbb{C}^{N \times N}$ with $\mathbf{a}$ on the diagonal. The operator $\otimes$ denotes the Kronecker product. The Euclidean norm is denoted by $\|\cdot\|$.

II. SYSTEM MODEL WITH IRS-SUPPORTED TRANSMISSION

We consider communication from an $M$-antenna base station (BS) to a single-antenna user equipment (UE) as shown in Fig. 1. A planar IRS with $N$ elements (composed of $N_H$ horizontal and $N_V$ vertical) is located in between to assist. The locations of the BS and IRS are fixed whereas the UE can be in different locations. Each element of the IRS can introduce a phase shift to an incoming narrowband signal. The phase is adjusted by an IRS controller that enables manipulation of the impinging wave. The IRS controller is connected to the BS over a backhaul link to coordinate between the IRS and BS. To configure the IRS elements, the CSI is crucial. Since the IRS is not equipped with radio frequency chains, we assume that the channel estimation is performed at the BS side.

A. Channel Estimation

We assume quasi-static flat-fading channels, and the system operates in time division duplex (TDD) mode. Pilot-based channel training is utilized to estimate the channels at the BS.

Fig. 1: Illustration of an IRS-assisted communication system.

During the channel estimation phase, the UE sends the pilot signal $x_t \in \mathbb{C}$ at time slot $t$. The received pilot signal at the BS is modeled as [12]

$$\mathbf{y}_t = \left(\mathbf{h}_d + \mathbf{H}_{br}\,\mathrm{diag}(\boldsymbol{\phi}_t)\,\mathbf{h}_{ru}\right) x_t + \mathbf{n}_t, \tag{1}$$

where $\mathbf{n}_t \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$ is the additive white Gaussian noise (AWGN), and $\mathbf{h}_d \in \mathbb{C}^{M \times 1}$, $\mathbf{H}_{br} \in \mathbb{C}^{M \times N}$, $\mathbf{h}_{ru} \in \mathbb{C}^{N \times 1}$ are the channels between BS and UE, BS and IRS, and IRS and UE, respectively. The phase configuration at the IRS at time slot $t$ is denoted by $\boldsymbol{\phi}_t = [e^{j\phi_{t,1}}, \ldots, e^{j\phi_{t,N}}]^T \in \mathbb{C}^{N \times 1}$ where $\phi_{t,n} \in [0, 2\pi)$ is the phase shift of the $n$th element.

We assume that the BS is equipped with a horizontal uniform linear array (ULA) placed on the $x$-axis. Unlike the UE, the IRS and BS typically have fixed locations once they are deployed. Therefore, $\mathbf{H}_{br}$ is represented by a static line-of-sight (LoS) channel as $\mathbf{H}_{br} = \sqrt{\beta_{br}}\, \mathbf{a}_{BS}(\varphi_{BS}, \theta_{BS})\, \mathbf{a}_{IRS}(\varphi_{IRS}, \theta_{IRS})^H$ where $\beta_{br}$ is the pathloss coefficient,

$$\mathbf{a}_{BS}(\varphi_{BS}, \theta_{BS}) = \left[1, \ldots, e^{j 2\pi (M-1) d_H \cos(\varphi_{BS}) \cos(\theta_{BS})}\right]^T \tag{2}$$

is the BS's array response vector, $\varphi_{BS}$ and $\theta_{BS}$ are the azimuth and elevation angles-of-arrival (AoA) to the IRS seen from the BS, and $d_H$ is the antenna spacing parameter measured in number of wavelengths. The array response of the IRS (placed on the $yz$-plane) is denoted by

$$\mathbf{a}_{IRS}(\varphi_{IRS}, \theta_{IRS}) = \left[e^{j \mathbf{k}(\varphi_{IRS}, \theta_{IRS})^T \mathbf{u}_1}, \ldots, e^{j \mathbf{k}(\varphi_{IRS}, \theta_{IRS})^T \mathbf{u}_N}\right]^T \tag{3}$$

where $\varphi_{IRS}$ and $\theta_{IRS}$ are the azimuth and elevation angles-of-departure (AoD) to the BS seen from the IRS, respectively. Recall that we consider a planar IRS. The wave vector is

$$\mathbf{k}(\varphi_{IRS}, \theta_{IRS}) = \frac{2\pi}{\lambda_c} \begin{bmatrix} \cos(\varphi_{IRS}) \cos(\theta_{IRS}) \\ \sin(\varphi_{IRS}) \cos(\theta_{IRS}) \\ \sin(\theta_{IRS}) \end{bmatrix}, \tag{4}$$

and the indexing vector is $\mathbf{u}_n = [0,\ i(n) d_r \lambda_c,\ j(n) d_r \lambda_c]^T$ where $\lambda_c$ is the wavelength at the carrier frequency, and $i(n) = \mathrm{mod}(n-1, N_H)$ and $j(n) = \lfloor (n-1)/N_H \rfloor$ describe the location of each IRS element [13, Sec. 7.3]. The parameter $d_r$ denotes the element spacing at the IRS, in both the horizontal and vertical directions. Notice that the ULA array response in (2) is a special case of the planar array response in (3) where $[\mathbf{a}_{BS}(\varphi_{BS}, \theta_{BS})]_m = e^{j \mathbf{k}(\varphi_{BS}, \theta_{BS})^T \mathbf{u}_m}$ with $\mathbf{u}_m = [(m-1) d_H \lambda_c,\ 0,\ 0]^T$.

To account for the assumed limited scattering environment, the channels $\mathbf{h}_d$ and $\mathbf{h}_{ru}$ are represented by the Saleh-Valenzuela (SV) model [6], [14]. We assume that there are $L_d$ and $L_{ru}$ paths, respectively. Thus, the direct channel is modeled as

$$\mathbf{h}_d = \sqrt{\frac{1}{L_d}} \sum_{l=1}^{L_d} \alpha_d^l\, \mathbf{a}_{BS}(\varphi_{BS}^l, \theta_{BS}^l) \tag{5}$$

where $\alpha_d^l$ is the complex channel gain, and $\varphi_{BS}^l$, $\theta_{BS}^l$ are the azimuth and elevation AoAs associated with the $l$th path. Similarly, the channel between the IRS and UE is

$$\mathbf{h}_{ru} = \sqrt{\frac{1}{L_{ru}}} \sum_{l=1}^{L_{ru}} \alpha_{ru}^l\, \mathbf{a}_{IRS}(\varphi_{IRS}^l, \theta_{IRS}^l) \tag{6}$$

where $\alpha_{ru}^l$ is the complex channel gain, and $\varphi_{IRS}^l$, $\theta_{IRS}^l$ are the azimuth and elevation AoAs associated with the $l$th path.

At time slot $t$, we can rewrite (1) as

$$\mathbf{y}_t = (\mathbf{h}_d + \mathbf{V} \boldsymbol{\phi}_t)\, x_t + \mathbf{n}_t \tag{7}$$

where $\mathbf{V} = \mathbf{H}_{br}\,\mathrm{diag}(\mathbf{h}_{ru}) = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N] \in \mathbb{C}^{M \times N}$ is the cascaded BS-IRS-UE channel. The pilot signals are sent $T$ times by the UE. We assume that the channels are fixed during the estimation period and that $\boldsymbol{\phi}_t$ is reconfigured at each time slot $t$. The collection of all the received pilot signals at the BS, $\mathbf{y}_p = [\mathbf{y}_1^T, \mathbf{y}_2^T, \ldots, \mathbf{y}_T^T]^T \in \mathbb{C}^{TM \times 1}$, can be written as

$$\mathbf{y}_p = \mathbf{X} (\boldsymbol{\Phi} \otimes \mathbf{I}_M)\, \mathbf{h} + \mathbf{n} \tag{8}$$

where the pilot signal is $\mathbf{X} = \mathrm{diag}([x_1 \mathbf{1}_M^T, \ldots, x_T \mathbf{1}_M^T]) \in \mathbb{C}^{TM \times TM}$, and $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{TM})$. The channels are stacked into $\mathbf{h} = [\mathbf{h}_d^T, \mathbf{v}_1^T, \ldots, \mathbf{v}_N^T]^T \in \mathbb{C}^{(N+1)M \times 1}$. All the phase configurations at the IRS are collected in $\boldsymbol{\Phi} = [\bar{\boldsymbol{\phi}}_1, \ldots, \bar{\boldsymbol{\phi}}_T]^T \in \mathbb{C}^{T \times (N+1)}$ where $\bar{\boldsymbol{\phi}}_t = [1, \boldsymbol{\phi}_t^T]^T \in \mathbb{C}^{(N+1) \times 1}$ is the extended reflection pattern accounting for both the direct and cascaded channels. Notice that the first column of $\boldsymbol{\Phi}$ is set to an all-ones vector to estimate the direct channel.

The IRS phase configuration during the channel estimation period, $\boldsymbol{\Phi}$, mimics a discrete Fourier transform (DFT) matrix as in [12], [15]. More precisely, each element of the phase matrix can be written as

$$[\boldsymbol{\Phi}]_{t,n} = e^{-j \frac{2\pi (t-1)(n-1)}{N+1}} \tag{9}$$

where $\boldsymbol{\Phi}$ cannot contain more than $N+1$ unique values around the unit circle. Note that this specific selection of $\boldsymbol{\Phi}$ guarantees that $\mathrm{rank}(\boldsymbol{\Phi}) = \min\{T, N+1\}$ and that each element satisfies the unit-modulus constraint. Besides, the first column of $\boldsymbol{\Phi}$ is equal to an all-ones vector. The property $|[\boldsymbol{\Phi}]_{t,n}| = 1$ is particularly important since implementing different amplitudes at each IRS element can be costlier and harder. Another potential choice of $\boldsymbol{\Phi}$ that satisfies the same constraints is a truncated Hadamard matrix [15].

Assuming that $T \geq N+1$, based on the pilot signal $\mathbf{y}_p$, the channels can be estimated by the LS estimator as [12]

$$\hat{\mathbf{h}} = \arg\min_{\mathbf{h}} \|\mathbf{P}\mathbf{h} - \mathbf{y}_p\|^2 = \left(\mathbf{P}^H \mathbf{P}\right)^{-1} \mathbf{P}^H \mathbf{y}_p \tag{10}$$

where $\mathbf{P} = \mathbf{X}(\boldsymbol{\Phi} \otimes \mathbf{I}_M)$ is the observation matrix. The BS can utilize these channel estimates to compute the downlink transmit beamforming vector at the BS and the optimum phase configuration at the IRS. Then, the BS can send the $N$ optimum phases to the IRS via the backhaul link.

B. IRS Phase Reconfiguration and Downlink Spectral Efficiency
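The phases computed in this subsection rest on the channel estimates from (10), so it may help to see why the DFT pattern in (9) makes that step simple. The following is a minimal sketch, not the authors' implementation: it assumes unit pilots ($x_t = 1$), $T = N + 1$, and no noise, and the function names `dft_pattern` and `ls_estimate_full_dft` as well as the toy channel values are hypothetical. Under these assumptions $\boldsymbol{\Phi}^H \boldsymbol{\Phi} = (N+1)\mathbf{I}$, so the pseudo-inverse in (10) reduces to a scaled $\boldsymbol{\Phi}^H$.

```python
import cmath


def dft_pattern(T, N):
    """T x (N+1) pattern of (9); indices are 0-based here, so (t-1)(n-1) becomes t*n."""
    return [[cmath.exp(-2j * cmath.pi * t * n / (N + 1)) for n in range(N + 1)]
            for t in range(T)]


def ls_estimate_full_dft(Phi, Y):
    """LS estimate (10), assuming unit pilots x_t = 1 and T = N + 1.

    Y[t] is the length-M received vector in slot t. For the full DFT pattern,
    Phi^H Phi = (N + 1) I, so (P^H P)^{-1} P^H collapses to a scaled Phi^H.
    Returns N + 1 length-M vectors ordered as [h_d, v_1, ..., v_N].
    """
    T, K, M = len(Phi), len(Phi[0]), len(Y[0])
    assert T == K, "this shortcut only holds for T = N + 1"
    return [[sum(Phi[t][k].conjugate() * Y[t][m] for t in range(T)) / T
             for m in range(M)] for k in range(K)]


# Toy example (all numbers hypothetical): N = 3 elements, M = 2 antennas, no noise.
N, M = 3, 2
H = [[complex(k + 1, -m) for m in range(M)] for k in range(N + 1)]  # rows: h_d, v_1..v_N
Phi = dft_pattern(N + 1, N)
# Noise-free version of (7): y_t = h_d + V phi_t, written with the extended pattern.
Y = [[sum(Phi[t][k] * H[k][m] for k in range(N + 1)) for m in range(M)]
     for t in range(N + 1)]
H_hat = ls_estimate_full_dft(Phi, Y)
max_err = max(abs(H_hat[k][m] - H[k][m]) for k in range(N + 1) for m in range(M))
```

In this noise-free toy case the estimate matches the true channels up to floating-point error. With noise or with $T > N + 1$, the general form in (10) would be needed instead of the shortcut.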

If the BS has perfect CSI, it can compute the optimal phases and the beamforming vector using the alternating optimization method in [16] as

$$\phi_n^{opt} = \arg\left(\mathbf{h}_d^H \mathbf{w}\right) - \arg\left(\mathbf{v}_n^H \mathbf{w}\right), \tag{11}$$

$$\mathbf{w}^{opt} = \frac{\mathbf{h}_d + \mathbf{V} \left(\boldsymbol{\phi}^{opt}\right)^*}{\left\|\mathbf{h}_d + \mathbf{V} \left(\boldsymbol{\phi}^{opt}\right)^*\right\|} \tag{12}$$

where $\boldsymbol{\phi}^{opt} = [\phi_1^{opt}, \ldots, \phi_N^{opt}]^T \in \mathbb{C}^{N \times 1}$. We initialize the beamforming vector as $\mathbf{w} = \frac{1}{\sqrt{M}} [1, \ldots, 1]^T$. Note that the optimized phases are obtained by phase-aligning the direct and cascaded channels. Besides, for any given phase configuration, the optimum transmit beamforming is equal to the maximum ratio precoding vector.

During the downlink transmission, the UE receives

$$y_r = \left(\mathbf{h}_d^H + \mathbf{h}_{ru}^H\, \mathrm{diag}\left(\boldsymbol{\phi}^{opt}\right) \mathbf{H}_{br}^H\right) \mathbf{w}^{opt} s + n \tag{13}$$

where $s$ is the data signal and $n \sim \mathcal{CN}(0, 1)$ is the additive noise. Alternatively, we can rewrite (13) as

$$y_r = \left(\mathbf{h}_d^H + \left(\boldsymbol{\phi}^{opt}\right)^T \mathbf{V}^H\right) \mathbf{w}^{opt} s + n. \tag{14}$$

If the channels are fixed throughout the transmission, the rate is

$$R = \log_2\left(1 + \gamma \left|\left(\mathbf{h}_d^H + \left(\boldsymbol{\phi}^{opt}\right)^T \mathbf{V}^H\right) \mathbf{w}^{opt}\right|^2\right) \tag{15}$$

$$= \log_2\left(1 + \gamma \left\|\mathbf{h}_d^H + \left(\boldsymbol{\phi}^{opt}\right)^T \mathbf{V}^H\right\|^2\right) \tag{16}$$

where $\gamma$ is the signal-to-noise ratio (SNR). If the BS utilizes the LS estimator, it treats the estimated channels as the true channels and calculates $\boldsymbol{\phi}^{opt}$ and $\mathbf{w}^{opt}$ based on $\hat{\mathbf{h}}$ in (10). Then, the optimum phase configuration $\boldsymbol{\phi}^{opt}$ based on the LS estimator is sent to the IRS over the backhaul link.

III. DEEP LEARNING-BASED PHASE CONFIGURATION
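The training labels used in this section are the outputs of the alternating optimization in (11)-(12), so a small sketch of that computation may be useful. This is an illustration under stated assumptions, not the procedure of [16] as implemented by the authors: the iteration count `iters`, the function names, and the randomly drawn toy channels are hypothetical, and $\boldsymbol{\phi}^{opt}$ in (12)-(16) is interpreted as the unit-modulus vector with entries $e^{j\phi_n^{opt}}$.

```python
import cmath
import math
import random


def cascaded_gain(h_d, V, theta, w):
    """|(h_d^H + phi^T V^H) w| with phi_n = exp(j * theta_n), as inside (15)."""
    M, N = len(h_d), len(theta)
    total = sum(h_d[m].conjugate() * w[m] for m in range(M))
    for n in range(N):
        v_n_H_w = sum(V[m][n].conjugate() * w[m] for m in range(M))
        total += cmath.exp(1j * theta[n]) * v_n_H_w
    return abs(total)


def alternating_optimization(h_d, V, iters=10):
    """Alternate between the phase update (11) and the beamformer update (12)."""
    M, N = len(h_d), len(V[0])
    w = [complex(1 / math.sqrt(M))] * M  # initialization stated in the text
    theta = [0.0] * N
    for _ in range(iters):  # iteration count is an assumption
        # (11): align every cascaded term with the direct-path term.
        ref = cmath.phase(sum(h_d[m].conjugate() * w[m] for m in range(M)))
        theta = [ref - cmath.phase(sum(V[m][n].conjugate() * w[m] for m in range(M)))
                 for n in range(N)]
        # (12): maximum ratio precoding for the current phases.
        g = [h_d[m] + sum(V[m][n] * cmath.exp(-1j * theta[n]) for n in range(N))
             for m in range(M)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in g))
        w = [x / norm for x in g]
    return theta, w


def rate(h_d, V, theta, w, snr):
    """Rate expression (15) for a given SNR gamma."""
    return math.log2(1 + snr * cascaded_gain(h_d, V, theta, w) ** 2)


# Toy channels (hypothetical values, fixed seed for repeatability).
rng = random.Random(0)
M, N = 4, 8
h_d = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)]
V = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(M)]
w0 = [complex(1 / math.sqrt(M))] * M
theta_opt, w_opt = alternating_optimization(h_d, V)
gain_before = cascaded_gain(h_d, V, [0.0] * N, w0)
gain_after = cascaded_gain(h_d, V, theta_opt, w_opt)
```

Each of the two updates maximizes the effective gain with the other variable held fixed, so `gain_after` cannot be smaller than `gain_before`. Because the final beamformer is matched to the final phases, the gain inside (15) equals the norm inside (16).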

According to the universal approximation theorem, a DNN has the capability of approximating any continuous function [17]. In supervised learning, DNNs are trained using a training dataset that is given as input-output pairs. The goal of the proposed DNNs is to find the mapping between the received pilot signals and the optimum phase configuration and downlink transmit beamforming vector. The pilot signals go through all the channels and reach the BS. Therefore, they capture important information for the phase and beamforming setting, and there is a nonlinear relation between the optimal phases and the channel coefficients. A properly designed DNN can learn this relation. Therefore, the problem is to effectively train the weights and biases of the DNN so that it can learn a nearly optimal mapping between received pilots and phases. A test dataset that is generated separately from the training data is used to evaluate the performance of the DNNs. During the online phase, the trained DNNs compute the required phases and beamforming vector.

As mentioned earlier, a main challenge of channel acquisition with an IRS is that the number of channel coefficients increases proportionally to $N$. Conventional methods such as the LS estimator in (10) require a pilot training period with $T \geq N + 1$. When applying an LS estimator and then treating the estimate as perfect, there is an information loss, which is not the case when we directly obtain the phase shifts and beamforming vector. Besides, the LS estimator is unaware of the underlying propagation conditions, while a DNN can learn them. Hence, it is possible for a DNN to outperform the conventional LS method. In this paper, we present two different DNNs with different $T$ values as described in the following subsections.

A. Deep Learning Method 1

In the first method, to train the DNN, we set $T = N + 1$ and use the input-output pairs $\{\mathbf{y}_p, \boldsymbol{\Omega}\}$ that are generated during the preamble stage. The output is formed by stacking the optimum phases and beamforming vector into $\boldsymbol{\Omega} = \left[(\boldsymbol{\phi}^{opt})^T, (\mathbf{w}^{opt})^T\right]^T \in \mathbb{C}^{(N+M) \times 1}$. Both input and output vectors contain complex numbers. To feed them into the DNN, the real and imaginary parts of each entry are separated. Thus, the input has size $2TM \times 1$ and the output dimension is $2(N+M) \times 1$. Using a training set of $n_{train}$ samples consisting of different realizations, the DNN emulates the mapping by adjusting the weights and bias terms.

The proposed DNN (DL method 1) is composed of 3 fully connected hidden layers. The details are presented in Table I. The input data is scaled using the Standard Scaler function in the Python environment, which removes the mean and normalizes the input data such that it has unit variance. We use the Adam optimizer with adaptive learning rates starting from [...]. The learning rate is halved when there is no improvement in the last 5 epochs. As loss function, we select the mean square error (MSE). The batch size is chosen as [...] and an early stopping criterion is applied that stops the training when the validation accuracy does not improve in 10 consecutive epochs. The maximum number of epochs is set to 200.

B. Deep Learning Method 2

In the second DNN, we set $T < N + 1$ to reduce the pilot overhead, and the intention is that the DNN will learn how to reconstruct the channel despite the reduced dimensionality. The input-output pairs $\{\mathbf{y}_p, \boldsymbol{\Omega}\}$ are generated during the preamble stage. Note that the input $\mathbf{y}_p$ is shorter in this case. As in DL method 1, the real and imaginary parts of [...]

Layers             Size        Activation Function
Input              2TM         elu
Layer 1 (Dense)    [...]       elu
Layer 2 (Dense)    [...]       elu
Layer 3 (Dense)    [...]       elu
Output             2(N+M)      linear

TABLE I: Layout of the proposed DL method 1 where $T = N + 1$.

Layers             Size        Activation Function
Input              2TM         elu
Layer 1 (Dense)    [...]       elu
Layer 2 (Dense)    [...]       elu
Layer 3 (Dense)    [...]       elu
Layer 4 (Dense)    [...]       elu
Output             2(N+M)      linear

TABLE II: Layout of the proposed DL method 2 where [...]
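The input and output sizes of both networks follow directly from separating real and imaginary parts. The sketch below is illustrative only: the helper names are hypothetical, and placing all real parts before all imaginary parts is an assumption, since the text does not state the ordering. The values $M = 10$, $N = 100$, $T = 101$ (method 1), and $T = 64$ (method 2) are the ones reported in the numerical results.

```python
def complex_to_real(vec):
    """Split a complex vector into real parts followed by imaginary parts.

    The ordering (all real parts, then all imaginary parts) is an assumption.
    """
    return [z.real for z in vec] + [z.imag for z in vec]


def dnn_dimensions(T, M, N):
    """Input and output sizes after the real/imaginary split: 2TM and 2(N+M)."""
    return 2 * T * M, 2 * (N + M)


M, N = 10, 100
dims_method_1 = dnn_dimensions(N + 1, M, N)  # T = N + 1 = 101
dims_method_2 = dnn_dimensions(64, M, N)     # T = 64
example = complex_to_real([1 + 2j, 3 - 4j])
```

With these values the input shrinks from 2020 real entries in method 1 to 1280 in method 2, while the output stays at 220 entries in both cases.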

IV. NUMERICAL RESULTS

In this section, we evaluate the performance of the proposed DNNs where $M = 10$ and $N = 100$. For each data sample, the location of the UE with height [...] m is drawn from a uniform distribution over a [...] × [...] square-meter room. The numbers of paths are set as $L_d = L_{ru} = 5$. The downlink transmit power is [...] dBm and the pilot power is [...] dBm, unless otherwise stated. The receiver noise power is −[...] dBm where the bandwidth is [...] MHz.

The pathloss coefficient of the BS-IRS channel is calculated as $\beta_{br} = NA / ([\ldots]\,\pi d^{[\ldots]})$ where $A = (d_r \lambda_c)^2$ is the area of one IRS element with $d_r = 0.[\ldots]$ and $\lambda_c = 0.[\ldots]$ m, and $d_{br} = 292$ m is the distance between the BS and IRS. The antenna spacing at the BS is $d_H = 0.[\ldots]$.

The other pathloss parameters are set based on [18], [19] as $\alpha_d^l = \sqrt{\beta_0 (d_{bu}/d_0)^{-[\ldots]}}\, e^{-j 2\pi f_c \tau_d^l}$ and $\alpha_{ru}^l = \sqrt{\beta_0 (d_{ru}/d_0)^{-[\ldots]}}\, e^{-j 2\pi f_c \tau_{ru}^l}$ where $d_0 = 1$ m, $\beta_0 = -[\ldots]$ dB is the reference pathloss, and $d_{bu}$ and $d_{ru}$ are the distances between BS-UE and IRS-UE, respectively. The associated path delays in nanoseconds are $\tau_d^l \sim U[0, [\ldots]]$ and $\tau_{ru}^l \sim U[0, [\ldots]]$. The minimum allowed distance is $d_{ru} = 7$ m.

The DNN was trained based on a dataset of $n_{train} = 80000$ training samples. Particularly, [...] of the samples were used for training and [...] for validation. Another [...] samples formed the test dataset, which is independent from the training dataset but drawn from the same distribution. The training process takes around 1 hour and the online testing requires approximately 0.2 ms for both methods in Python on a Windows 10 personal computer with an Intel i7-6600U CPU at 2.81 GHz and an Intel HD Graphics 520 GPU.

The normalized mean-squared error (NMSE) of the phase configuration is calculated as

$$\mathrm{NMSE} = \frac{1}{n_{test}} \sum_{s=1}^{n_{test}} \frac{\left\|\boldsymbol{\phi}_s^{opt} - \hat{\boldsymbol{\phi}}_s^{x}\right\|^2}{\left\|\boldsymbol{\phi}_s^{opt}\right\|^2} \tag{17}$$

where $\boldsymbol{\phi}_s^{opt}$ is the optimum phase configuration based on perfect CSI, and $\hat{\boldsymbol{\phi}}_s^{x}$ is either the output of one of the DNNs or calculated based on LS-based estimation, i.e., $x \in \{\text{DL method 1}, \text{DL method 2}, \text{LS-based method}\}$. Notice that $\|\boldsymbol{\phi}_s^{opt}\|^2 = \|\hat{\boldsymbol{\phi}}_s^{x}\|^2 = N$.

Fig. 2: Cumulative distribution function of the downlink spectral efficiency.

Fig. 3: NMSE versus pilot transmit powers.

Fig. 4: Cumulative distribution function of beamforming mismatch for different methods.

Fig. 2 compares the cumulative distribution of the downlink spectral efficiencies that are calculated based on (15) for different cases. The "Direct Path" label represents the case when there is no IRS in the system. The "Random $\boldsymbol{\phi}$" label denotes the setting where the phase configuration at the IRS is set randomly and the downlink transmit beamforming vector is calculated based on these phases for each test sample. We observe that DL method 1 performs better than the classical LS-based method for almost all of the samples. It is very close to the "Optimum $\boldsymbol{\phi}$" case in which the phase configuration and the beamforming vector are computed based on perfect CSI. Note that in both DL method 1 and the LS-based method, we used the same pilot length $T = N + 1 = 101$. Moreover, DL method 2, in which we used $T = 64$, also performs better than the LS-based method for most of the test data. The pilot overhead is reduced by [...] in DL method 2 compared to DL method 1 and the LS-based method. This is because the DNNs are able to find the direct mapping between the received pilot signals and the optimum phases and beamformer, whereas the LS-based method treats the estimates as the true channels, which causes an information loss. Besides, the LS estimator does not have any prior information on the channel whereas the DNNs can learn the features of the channel from the datasets.

In Fig. 3, we compare the NMSEs of the presented methods for different pilot transmit powers. During the preamble stage, the training data is generated for different pilot transmit powers while keeping the other parameters fixed. Then, the DNNs are trained on these received pilots. It is demonstrated that for practical pilot powers the DL methods provide better performance, whereas for high pilot powers the LS-based method outperforms the DL approaches. Potentially, another DNN could be designed and trained for high pilot powers by increasing the width of the hidden layers, which would increase the accuracy. However, a potential pitfall with this approach is overfitting, which causes the DNN to memorize the training set.

In Fig. 4, we compare the accuracy of the downlink transmit beamforming vectors that are designed at the BS side based on the presented methods. More precisely, the beamforming mismatch is computed as $\|\mathbf{w}^{opt} - \mathbf{w}^{x}\|$ where $x \in \{\text{DL method 1}, \text{DL method 2}, \text{LS-based method}\}$. Notice that $\|\mathbf{w}^{opt}\| = \|\mathbf{w}^{x}\| = 1$. We observe that the DL methods give very similar accuracy and that they are superior to the LS-based approach.

V. CONCLUSIONS

This paper proposes a DNN framework for the reconfiguration of IRS elements based on the available pilot signals. We showed that a properly trained feedforward DNN is able to learn how to configure the IRS phases and downlink beamforming vector. DL method 1 outperforms the classical LS estimator based method for practical pilot transmit powers. Its performance is close to the perfect CSI based approach. In addition, DL method 2 reduces the pilot overhead and has a similar performance to the LS-based method.

To further improve the framework, several extensions could be considered, such as multiple users, IRS-element grouping for reducing the pilot overhead further, or quantized IRS phases. Besides, measured channels could be used for DNN training.

REFERENCES

[1] C. Huang, A. Zappone, G. C. Alexandropoulos, M. Debbah, and C. Yuen, "Reconfigurable intelligent surfaces for energy efficiency in wireless communication," IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 4157–4170, 2019.
[2] C. Liaskos, S. Nie, A. Tsioliaridou, A. Pitsillides, S. Ioannidis, and I. Akyildiz, "A new wireless communication paradigm through software-controlled metasurfaces," IEEE Commun. Mag., vol. 56, no. 9, pp. 162–169, 2018.
[3] Q. Wu and R. Zhang, "Towards smart and reconfigurable environment: Intelligent reflecting surface aided wireless network," IEEE Commun. Mag., vol. 58, no. 1, pp. 106–112, 2020.
[4] E. Björnson, O. Özdogan, and E. G. Larsson, "Reconfigurable intelligent surfaces: Three myths and two critical questions," IEEE Commun. Mag., 2020, to appear.
[5] A. M. Elbir and K. V. Mishra, "A survey of deep learning architectures for intelligent reflecting surfaces," 2020. [Online]. Available: https://arxiv.org/abs/2009.02540
[6] A. M. Elbir, A. Papazafeiropoulos, P. Kourtessis, and S. Chatzinotas, "Deep channel learning for large intelligent surfaces aided mm-Wave massive MIMO systems," IEEE Wireless Communications Letters, vol. 9, no. 9, pp. 1447–1451, 2020.
[7] C. Huang, G. C. Alexandropoulos, C. Yuen, and M. Debbah, "Indoor signal focusing with deep learning designed reconfigurable intelligent surfaces," in IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2019, pp. 1–5.
[8] A. Taha, M. Alrabeiah, and A. Alkhateeb, "Deep learning for large intelligent surfaces in millimeter wave and massive MIMO systems," in IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6.
[9] F. Jiang, L. Yang, D. B. da Costa, and Q. Wu, "Channel estimation via direct calculation and deep learning for RIS-Aided mmWave systems," 2020. [Online]. Available: https://arxiv.org/abs/2008.04704
[10] C. Huang, R. Mo, and C. Yuen, "Reconfigurable intelligent surface assisted multiuser MISO systems exploiting deep reinforcement learning," IEEE Journal on Selected Areas in Communications, vol. 38, no. 8, pp. 1839–1850, 2020.
[11] K. Feng, Q. Wang, X. Li, and C. Wen, "Deep reinforcement learning based intelligent reflecting surface optimization for MISO communication systems," IEEE Wireless Communications Letters, vol. 9, no. 5, pp. 745–749, 2020.
[12] T. L. Jensen and E. De Carvalho, "An optimal channel estimation scheme for intelligent reflecting surfaces based on a minimum variance unbiased estimator," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 5000–5004.
[13] E. Björnson, J. Hoydis, and L. Sanguinetti, "Massive MIMO networks: Spectral, energy, and hardware efficiency," Foundations and Trends® in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017.
[14] O. E. Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, "Spatially sparse precoding in millimeter wave MIMO systems," IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499–1513, 2014.
[15] C. You, B. Zheng, and R. Zhang, "Intelligent reflecting surface with discrete phase shifts: Channel estimation and passive beamforming," in IEEE International Conference on Communications (ICC), 2020, pp. 1–6.
[16] Q. Wu and R. Zhang, "Intelligent reflecting surface enhanced wireless network via joint active and passive beamforming," IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5394–5409, Nov. 2019.
[17] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning [...]
[18] [...] IEEE Journal on Selected Areas in Communications, vol. 5, no. 2, pp. 128–137, 1987.
[19] Z. Wang, L. Liu, and S. Cui, "Channel estimation for intelligent reflecting surface assisted multiuser communications," in