A Survey of Deep Learning Architectures for Intelligent Reflecting Surfaces
Ahmet M. Elbir and Kumar Vijay Mishra
Abstract—Intelligent reflecting surfaces (IRSs) have recently received significant attention for wireless communications because they reduce the hardware complexity, physical size, weight, and cost of conventional large arrays. However, deployment of IRS entails dealing with multiple channel links between the base station (BS) and the users. Further, the BS and IRS beamformers require a joint design, wherein the IRS elements must be rapidly reconfigured. Data-driven techniques, such as deep learning (DL), are critical in addressing these challenges. The lower computation time and model-free nature of DL make it robust against data imperfections and environmental changes. At the physical layer, DL has been shown to be effective for IRS signal detection, channel estimation, and active/passive beamforming using architectures based on supervised, unsupervised, and reinforcement learning. This article provides a synopsis of these techniques for designing DL-based IRS-assisted wireless systems.
Index Terms—Intelligent reflecting surfaces, beamforming, deep learning, reinforcement learning, federated learning.
I. INTRODUCTION
The next-generation millimeter-wave (mm-Wave) massive multiple-input multiple-output (MIMO) systems require large antenna arrays with a dedicated radio-frequency (RF) chain for each antenna. This results in expensive and large system architectures that consume high power and processing resources. To reduce the number of RF chains while maintaining sufficient beamforming gains, hybrid analog and digital beamforming architectures were introduced. However, the resulting cost and energy overheads of these systems remain a concern. Recently, intelligent reflecting surfaces (IRSs) have emerged as a feasible, low-cost, and light-weight alternative to large, complex arrays [1] in both outdoor and indoor applications (Fig. 1).

An IRS is a two-dimensional electromagnetic surface composed of a large number of passive reconfigurable meta-material elements, which reflect the incoming signal by introducing a pre-determined phase shift. This phase shift is controlled via external signals by the base station (BS) through a backhaul control link. As a result, the incoming signal from the BS can be manipulated in real time, reflecting the received signal toward the users. Hence, the use of an IRS enhances the signal energy received by distant users and expands the coverage of the BS. It is, therefore, required to jointly design the beamformer parameters at both the IRS and the BS. This achieves the desired channel conditions, wherein the BS conveys information to multiple users through the IRS [2].
A. M. Elbir is with the Department of Electrical and Electronics Engineering, Duzce University, Duzce, Turkey (e-mail: [email protected]). K. V. Mishra is with the United States Army Research Laboratory, Adelphi, MD 20783 USA (e-mail: [email protected]).
The accuracy of the beamformer design strongly relies on the knowledge of channel information. In fact, IRS-assisted systems include multiple communication links, i.e., a direct channel from the BS to the users and a cascaded channel from the BS to the users through the IRS. This makes the IRS scenario even more challenging than conventional massive MIMO systems. Furthermore, the wireless channel is dynamic and uncertain because of changing IRS configurations. Consequently, there exists an inherent uncertainty stemming from the IRS configuration and the channel dynamics. These characteristics of IRS make the system design very challenging [2, 3].

To address the aforementioned uncertainties and non-linearities imposed by channel equalization, hardware impairments, and the sub-optimality of high-dimensional problems, model-free techniques have become common in wireless communications [4]. In this context, deep learning (DL) is particularly powerful in extracting features from raw data and providing a "meaning" to the input by constructing a model-free data mapping with a huge number of learnable parameters. As listed below, DL is more efficient than model-based techniques that largely rely on mathematical models:

• A learning model constructs a non-linear mapping between the raw input data and the desired output to approximate a problem from a model-free perspective. Thus, its prediction performance is robust against corruptions/imperfections in the wireless channel data.

• DL learns the feature patterns, which are easily updated for new data and adapted to environmental changes. In the long run, this results in a lower computational complexity than model-based optimization.

• DL-based solutions have significantly reduced run-times because of parallel processing capabilities.
On the other hand, it is not straightforward to achieve parallel implementations of conventional optimization and signal processing algorithms.

The aforementioned advantages have led to DL superseding optimization-based techniques in the physical-layer design of wireless communications [4]. Lately, IRS-aided wireless systems have exploited DL to handle very challenging problems. For instance, signal detection in IRS requires the development of end-to-end learning systems under the effect of the channel and the beamformers [5]. The channel needs to be estimated for multiple communication links, i.e., BS-user and BS-IRS-user [6]. Finally, beamformers are designed for the phase shifters at the BS and the passive elements of the IRS [9]. There have been recent surveys on applying DL [4] and IRS [1] individually to wireless communications.

Fig. 1. IRS-assisted wireless communications for outdoor and indoor deployments. A BS on top of the infrastructure (left) communicates with the users on the ground through an intermediate IRS mounted on other buildings (center). The BS also serves users (right) inside the apartment building through an IRS placed on the wall of the room.

TABLE I: DL-BASED TECHNIQUES FOR IRS-ASSISTED WIRELESS SYSTEMS
Signal detection:
• SL [5], MLP. Benefit: no need for a channel estimation algorithm. Drawbacks: beamformers still need to be designed; requires huge datasets and deeper NN architectures.

Channel estimation:
• SL [6], twin CNNs with convolutional and fully connected layers. Benefit: each user estimates its own channel with the trained model. Drawback: data collection requires channel training by turning each IRS element on/off.
• FL [7], a single CNN with convolutional and fully connected layers. Benefits: less transmission overhead for training; a single CNN estimates both cascaded and direct channels. Drawback: performance depends on the number of users and the diversity of the local datasets.
• SL [8], DDNN with convolutional layers. Benefit: leverages both CS and DL methods. Drawbacks: requires active IRS elements; high prediction complexity arising from the CS algorithm.

Beamforming:
• SL [9], MLP. Benefit: reduced pilot training overhead. Drawback: requires active IRS elements for channel training.
• UL [10], MLP. Benefit: reduced complexity at the model training stage. Drawback: implicitly needs the reflect beamformers as labels.
• RL [11], DQN. Benefit: standalone operation since RL does not require labels like SL. Drawbacks: longer training; active IRS elements needed for channel acquisition.
• RL [2], DDPG with actor and critic networks. Benefit: better performance than DQN. Drawback: large number of NN parameters involved.
• RL [12], DDPG with actor and critic networks. Benefit: accelerated learning with the aid of optimization, which shrinks the search space. Drawback: additional optimization tools needed.
• FL [13], MLP. Benefit: less transmission overhead during model training. Drawback: IRS must be connected to the parameter server.

Secure beamforming:
• RL [14], DQN. Benefit: robust against eavesdropping. Drawback: high model training complexity.

Energy-efficient beamforming:
• RL [3], DQN. Benefit: energy-efficient and robust against channel uncertainty. Drawback: IRS beamforming only.

Indoor beamforming:
• SL [15], MLP. Benefit: reduces the hardware complexity of multiple BSs and improves RSS in indoor environments. Drawback: performance relies on room conditions.

In this article, we provide an overview of systems which jointly employ both approaches. In particular, we describe DL techniques (Table I) for three main IRS problems: signal detection, channel estimation, and beamforming. Each of these requires different DL architectures, which have so far included supervised learning (SL), unsupervised learning (UL), reinforcement learning (RL), and federated learning (FL). We provide a detailed synopsis of the advantages and shortcomings of each algorithm for these three applications in the subsequent sections. We also discuss the design challenges of DL-based IRS systems for next-generation wireless communications and highlight related future research directions.

II. DL-BASED SIGNAL DETECTION IN IRS
Signal detection comprises mapping the received symbols, under the effect of the channel and the beamformers, to the transmitted symbols (Fig. 2). To leverage DL for signal detection, [5] devised a multi-layer perceptron (MLP) for mapping the channel- and reflect-beamformer-affected data symbols to the transmitted symbols. The MLP is a feedforward neural network (NN) composed of multiple hidden layers; the framework in [5] uses three fully connected layers. Once the MLP is trained on a dataset of received-transmitted data symbol pairs, each user feeds the learning model with a block of received symbols. These blocks account for the effect of the channel and the beamformers. Then, the MLP yields the estimated transmitted symbols.

A major advantage of this approach is its simplicity: the learning model estimates the data symbols directly, without a prior channel estimation stage. Thus, this method helps reduce the cost of channel acquisition. In [5], a bit-error-rate (BER) analysis has shown that DL-based IRS signal detection (DeepIRS) provides a better BER than the minimum mean-squared-error (MMSE) detector and performance close to the maximum likelihood detector.

However, a few challenges remain to achieve reliable performance. The training data should be collected under several channel conditions and different beamformer configurations so that the trained model learns the environment well and performs accurately in different scenarios. This is particularly challenging because it requires collecting training data for different user locations. As a result, DL-based signal detection demands a huge training dataset collected under different channel conditions. To provide reliable feature representation and mapping for this dataset, wider and deeper NN architectures are needed. Further, this approach still requires optimizing the beamformers so that the IRS effectively reflects the received signal to the users.
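As a concrete illustration of this mapping, the sketch below runs a forward pass of a small MLP detector in NumPy. The layer sizes, the random (untrained) weights, and the block length are illustrative assumptions, not values from [5]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a block of 8 received (complex) symbols enters the
# network as 16 real features; the output scores 4 candidate symbols.
n_in, n_hidden, n_out = 16, 32, 4

# Randomly initialized weights stand in for a trained model.
W1 = rng.standard_normal((n_hidden, n_in)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_out, n_hidden)) * 0.1
b2 = np.zeros(n_out)

def detect(received_block):
    """Map a block of received symbols to symbol-class scores.

    [5] uses three fully connected layers; two are shown here for brevity.
    """
    x = np.concatenate([received_block.real, received_block.imag])
    h = np.maximum(0.0, W1 @ x + b1)   # ReLU hidden layer
    scores = W2 @ h + b2               # linear output layer
    return int(np.argmax(scores))      # index of the detected symbol

symbol_index = detect(rng.standard_normal(8) + 1j * rng.standard_normal(8))
```

In a trained system, the weights would come from supervised training on received-transmitted symbol pairs, and the argmax would select from the modulation alphabet.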
III. DL-BASED IRS CHANNEL ESTIMATION
The IRS is composed of a huge number of reflecting elements and, therefore, channel state acquisition is a major task in IRS-assisted wireless systems. A common approach is to turn each individual IRS element on and off one by one while using orthogonal pilot signals to estimate the channel between the BS and the users through the IRS. In particular, IRS channel estimation via DL involves constructing a mapping between the received input signals at the user and the channel information of the direct and cascaded links (Fig. 2). This is more challenging than conventional massive MIMO, where a single channel link is used.

The SL approach proposed in [6] estimates both the direct and cascaded channels via twin convolutional neural networks (CNNs). First, the received pilot signals at the user are collected by sequentially turning on the individual IRS elements. Then, the collected data are used to find the least squares (LS) estimates of the cascaded and direct channels. Both CNNs are trained to map the LS channel estimates to the true channel data. The upshot is that each user estimates its own channels only once and feeds the received pilot data (LS estimate) to the trained CNN models. The CNNs have a higher tolerance than the MLP against channel data uncertainties and imperfections (such as switching mismatch) of the IRS elements.

When model training is conducted at the user with huge datasets as in [6], the system may lack sufficient computational capability. This is overcome by FL-based training [7], where the learning model is trained at the BS without transmitting the data (Fig. 3). Instead, only model updates are sent, thereby reducing the transmission overhead. Each user computes only the model update corresponding to its local dataset, which is smaller than the entire training data.
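The element-by-element pilot sounding and LS pre-estimation described above can be sketched as follows. The single-antenna setup, element count, and pilot value are illustrative assumptions; in the noiseless case the LS estimate recovers the cascaded channel exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8             # number of IRS elements (illustrative)
s = 1.0 + 0.0j    # known pilot symbol

# Ground-truth cascaded channel (BS -> IRS element k -> user), single antenna
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Channel sounding: activate one element at a time, record the received pilot
y = np.array([h[k] * s for k in range(N)])  # noiseless for clarity

# Per-element least-squares estimate; this is the quantity that would be fed
# to the trained CNNs in [6] for refinement
h_ls = y / s
```

With noise added to `y`, `h_ls` becomes the coarse input that the CNN denoises toward the true channel.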
Furthermore, instead of using two CNNs as in [6], a single CNN jointly estimates both the cascaded and direct channels.

Although FL reduces the transmission overhead during model training, its training performance is upper bounded by that of centralized model training, i.e., training the model with the whole dataset at once. Therefore, the prediction performance of FL is usually poorer than that of centralized learning (CL). In Fig. 4, the CL and FL frameworks are compared with MMSE and LS estimation. We note that FL performs slightly poorer than CL in high-SNR regimes. Despite this, FL significantly reduces the transmission overhead, e.g., an approximately ten-fold reduction in the number of transmitted symbols [7]. The performance of FL improves as the number of users or edge devices increases because this reduces the variance of the model updates aggregated at the BS. The diversity of the users' local datasets also affects the training/prediction performance; better performance is obtained if the local datasets are close to uniform. In addition, FL in the IRS-assisted scenario is more challenging than in conventional massive MIMO because the training stage also requires the two-hop connection between the users and the BS. Hence, beamforming should be done prior to model training.

Both SL- and FL-based channel estimation techniques suffer from high channel training overhead. In this context, compressive channel estimation with deep denoising neural networks (DDNNs) is very effective [8]. It employs a hybrid passive/active IRS architecture, where the active IRS elements are used for uplink pilot training and the passive ones for reflecting the signal from the BS to the users. Once the BS collects the compressed received pilot measurements, the complete channel matrix is recovered through sparse reconstruction algorithms such as orthogonal matching pursuit (OMP). Then, the DDNN is used to improve the channel estimation accuracy by exploiting the correlation between the real and imaginary parts of the mm-Wave channel in the angular-delay domain. During training, the input is the OMP-reconstructed channel matrix and the output is the noise, i.e., the difference between the OMP estimate and the ground-truth channel data. This method leverages both compressed sensing (CS) and DL, yielding a performance better than using these techniques individually. The major drawback is the additional hardware complexity introduced by the active IRS elements. Furthermore, the OMP algorithm is used in place of the raw received pilot measurements for constructing the input. This requires repeated execution of the OMP algorithm, thereby increasing the prediction complexity over the DL methods in [6] and [7].

Fig. 2. Model-based versus learning-based frameworks for signal detection and channel estimation. The model-based approach (top) comprises multiple subsystems to process the received signal. Learning-based signal detection (bottom, left) provides an end-to-end data mapping from the corrupted symbols under the channel effects at the receiver to the transmitted symbols. Learning-based channel estimation (bottom, right) maps the input received signals to the channel estimate as output labels.

Fig. 3. In the FL framework, each user (center) processes its own local dataset (right), computes the model updates (gradients), and sends them to the parameter server (left). The server aggregates the collected model updates, and the updated model parameters are sent back to the users.

Fig. 4. The mean-squared error of the channel estimates normalized against the ground-truth channel, obtained using a CNN in centralized and federated learning frameworks, MMSE, and LS. The BS consisted of multiple antennas and the IRS employed passive reflecting elements [6, 7].
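The aggregation step of Fig. 3 can be sketched as a FedAvg-style weighted average of the users' model updates; the update vectors and dataset sizes below are illustrative:

```python
import numpy as np

def fedavg(updates, sizes):
    """Aggregate per-user model updates, weighted by local dataset size."""
    total = sum(sizes)
    return sum((n / total) * u for u, n in zip(updates, sizes))

# Three users with equally sized local datasets (illustrative numbers)
u1 = np.array([1.0, 2.0])
u2 = np.array([3.0, 4.0])
u3 = np.array([5.0, 6.0])
aggregated = fedavg([u1, u2, u3], sizes=[100, 100, 100])
# With equal dataset sizes, this reduces to the plain mean of the updates.
```

Only these small update vectors travel over the air, which is the source of FL's transmission-overhead savings discussed above.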
IV. DL-AIDED BEAMFORMING FOR IRS APPLICATIONS
Beamforming in IRS-based communications has diverse applications, such as IRS-only (passive) beamforming, BS-IRS (active/passive) beamforming, secure beamforming (with eavesdroppers), energy-efficient beamforming, and indoor IRS beamforming. There are specific DL challenges and solutions for each of these problems.
Fig. 5. In RL, the DQN and DDPG architectures accept the same state (channel data and received SNR) and environment data (beamformers to be evaluated). The DQN involves training a single neural network based on the reward determined from the environment. In contrast, the DDPG has multiple neural networks, where actor-critic architectures are used to compute actions and target values, respectively.
A. Beamforming at the IRS
IRS beamforming requires the passive elements to continuously and reliably reflect the BS signal to the users. Here, the MLP architecture of [9] is helpful in designing the reflect beamforming weights using active IRS elements [8]. These elements are randomly distributed across the IRS. They are used for pilot training, after which compressed channel estimation is carried out using OMP. During data collection, the reflect beamforming weights are optimized by using the estimated channel data. Finally, a training dataset is constructed with the channel data and reflect beamformers as the input-output pairs for an SL framework. Note that the active IRS elements present similar shortcomings as in [8]. However, the method in [9] excels by leveraging DL for designing the beamformers.

The labeling process in [9] demands solving an optimization problem for each channel instance in the training data generation stage. One possible way to mitigate this is to use label-free techniques, such as UL. The UL approach in [10] for reflect beamforming design employs an MLP with five fully connected layers. The network maps the vectorized cascaded and direct channel data at the input to the phase values of the reflect beamformers at the output. The loss function is selected as the negative of the norm of the channel vector, which may seem like an unsupervised approach because it does not minimize the error between a label and the learning model's prediction. However, this technique yields the phase information at the output uniquely for each training sample. Consequently, the beamformers implicitly behave like labels in the training process. In UL, the training data should be unlabeled, and the "distance" between the training data samples is minimized. This clusters the training data into smaller sets without prior knowledge about the "meaning" of each clustered set. However, in [10], the output of the NN is a design parameter, i.e., the reflect beamformer phases, which carries the complexity of beamformer optimization for each input.

To eliminate the expensive labeling process of SL-based techniques, [11] employed RL to design the reflect beamformers for single-antenna users and BS. RL is a promising approach that directly yields the output by optimizing the objective function of the learning model. First, the channel state is estimated by using two orthogonal pilot signals. An action vector is selected either by exploitation (using the prior experience of the learning model) or exploration (using a predefined codebook). After computing the achievable rate based on the selected action vector from the environment, a reward or penalty is imposed by comparing the achievable rate with a threshold. Upon reward calculation, a deep Q-network (DQN) (Fig. 5) updates the map from the input state (channel data) to the output action (the action vector composed of reflect beamformer weights). This process is repeated for several input states until the learning model converges. Note that this process avoids labeling. The RL algorithm learns the reflect beamformer weights based on the optimization of the achievable rate. Thus, RL presents a solution for online learning schemes, where the model effectively adapts to changes in the propagation environment. However, RL techniques have longer training times than SL approaches because the reward mechanism and discrete action spaces make it difficult to reach the global optimum. The label-free process implies that RL usually has slightly poorer performance than SL.

To accelerate the training stage through the use of continuous action spaces, a deep deterministic policy gradient (DDPG) approach (Fig. 5) was introduced in [2]. Here, actor-critic network architectures are used to compute actions and target values, respectively. First, the learning stage is initialized with an input state excited by the cascaded and direct channels. Given the state information, a deep policy network (DPN) (the actor) constructs the actions (reflect beamformer phases). Here, the DPN provides a continuous action space that converges faster than the DQN architecture of [11]. The action vector is used by the critic network to estimate the received signal-to-noise ratio (SNR) as the objective. This SNR then yields the target beamformer vector under the learning policy. Using the gradient of the DPN, the network parameters are updated, and the next state is constructed as the combination of the received SNR and the reflect beamformers. This process is repeated until convergence. An additional benefit of this approach is that it outperforms the fixed-point iteration (FPI) algorithms used to solve the reflect beamforming optimization. Moreover, the continuous action space representation with the DPN in DDPG makes the learning model robust against changes in the channel data. However, multiple NN architectures (actor and critic networks) increase the number of learning parameters and aggravate the model update requirements for each architecture.

The model initialization in both DQN and DDPG could cause the learning models to start far from the optimum point during the early stages of learning. This leads to slow convergence and poor reward performance. In order to accelerate the learning process, [12] devised a joint learning and optimization technique. The key idea is to use DDPG to search for the optimal action for each decision epoch during training. Then, a feasible beamformer vector is found via optimization in a convex-approximation setting. This reduces the search space of the DDPG algorithm and shortens training times. However, the learning platform must be equipped with optimization and other signal processing routines. This can be avoided by replacing the optimization stage with more efficient, fully learning-based approaches, such as end-to-end learning schemes [4].

Even though RL is a label-free approach that reduces the overhead of training data generation, the training approaches in [2, 11, 12] demand expensive transmission overhead to be trained on huge datasets. This is mitigated by FL techniques. The FL approach in [13] learns the IRS reflect beamformers by training an MLP, computing the model updates at each user with its local dataset. The model updates are aggregated at a parameter server, which is connected to the IRS. The MLP input is the cascaded channel information and the output labels are the IRS beamformer weights. The federated architecture lowers the transmission overhead during training. However, it is assumed that the parameter server is connected to the IRS; the simple architecture of the IRS could make this infeasible. It is more practical to access the parameter server via the BS for model training. Importantly, FL is a privacy-preserving learning scheme in which training data are not transmitted during training [13]. This is especially critical when the data include personal content, such as images and contact information. In addition, instead of MLP architectures, CNNs can provide improved performance in FL settings due to their high feature extraction capability [7].
B. Secure Beamforming
Physical layer security in wireless systems is largely achieved through signal processing techniques, such as cooperative relaying and cooperative jamming. Hardware complexity is a major issue in these methods. The low-cost, less complex IRS-based systems have the potential to mitigate these problems. RL-based secure beamforming [14] maximizes the secrecy rate by jointly designing the beamformers at the IRS and BS to serve multiple legitimate users in the presence of eavesdroppers. The RL algorithm accepts as states the channel information of all users, the secrecy rate, and the transmission rate. Similar to [2], the action vector comprises the beamformers at the BS and IRS. The reward function is designed based on the secrecy rate of the users. A DQN is trained to learn the beamformers by maximizing the secrecy rate while guaranteeing the quality-of-service requirements. The model training takes place at the BS, which is responsible for collecting the environment information (channel data) and making decisions for secure beamforming. This scheme is more realistic and reliable than those of [2, 11], which ignore the effect of eavesdroppers. The learning model includes high-dimensional state and action information, such as the channels of all users and the beamformers of the BS and IRS. This may necessitate more computing resources for training than non-secure IRS [2, 11] and conventional SL techniques [6, 9].
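A minimal sketch of a secrecy-rate-based reward, assuming the standard secrecy-rate expression (the rate to the legitimate user minus the rate leaked to the eavesdropper, floored at zero); the SNR values are illustrative:

```python
import numpy as np

def secrecy_rate(snr_user, snr_eve):
    """Standard secrecy rate: legitimate-link rate minus eavesdropper-link
    rate, floored at zero. A reward of this form can drive the DQN in [14]."""
    return max(0.0, np.log2(1.0 + snr_user) - np.log2(1.0 + snr_eve))

r = secrecy_rate(snr_user=15.0, snr_eve=3.0)  # legitimate link much stronger
```

Beamformers that strengthen the user link while suppressing the eavesdropper link increase this reward; when the eavesdropper's link dominates, the secrecy rate is zero.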
C. Energy-Efficient Beamforming
The IRS configuration dynamically changes depending on the network status. It is very demanding for the BS to optimize the transmit power every time the on/off status of the IRS elements is updated. This can be addressed by accounting for energy efficiency in the beamformer design problem. In [3], a self-powered IRS scenario maximizes the energy efficiency by optimizing the transmit power and the IRS beamformer phases. In this DQN-based RL approach, the BS learns the outcome of the system performance while updating the model parameters. Thus, the BS makes decisions to allocate the radio resources by relying only on the estimated channel information. The RL framework has states selected as the estimated channels from the users and the energy level of the IRS. Meanwhile, the action vector includes the transmit power, the IRS beamformer phases, and the on/off status of the IRS elements. The learning policy is based on the reward, which is selected as the energy efficiency of the overall system. However, this work considers only IRS beamforming and ignores beamforming at the BS.
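A minimal sketch of ranking candidate actions by energy efficiency; the bits-per-joule metric with a fixed circuit power term is an assumption for illustration, not the exact reward of [3]:

```python
import numpy as np

def energy_efficiency(rate_bps_hz, tx_power_w, circuit_power_w=1.0):
    """Energy efficiency as achieved rate per unit of consumed power.
    The fixed circuit-power term is an illustrative assumption."""
    return rate_bps_hz / (tx_power_w + circuit_power_w)

# Candidate actions as (transmit power [W], achieved rate [bps/Hz]) pairs;
# the numbers are illustrative.
actions = [(0.5, 2.0), (1.0, 3.0), (2.0, 4.0)]
best = max(actions, key=lambda a: energy_efficiency(a[1], a[0]))
```

Note that the highest-rate action is not the most energy-efficient one: raising the transmit power yields diminishing rate gains, which is precisely why the reward in [3] trades rate against power.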
D. Beamforming for Indoor IRS
Different from the above scenarios, [15] addresses the IRS beamformer design problem in an indoor communications scenario to increase the received signal strength (RSS) (see Fig. 1). This is particularly useful from the perspective of low hardware complexity because it eliminates the deployment of multiple BSs to improve the RSS. The MLP architecture in [15] accepts a two-dimensional user position vector and yields the IRS beamformer phases at the output. Since the channel data are not employed as input, the network does not have to deal with severe environmental fluctuations. However, the learning model is trained on specific room environments and may perform poorly for different room conditions or a different obstacle distribution in the same room. This is mitigated in RL-based solutions, which are highly adaptive to different environments [2, 11].
V. CHALLENGES AND FUTURE RESEARCH DIRECTIONS
DL architectures result in significant performance gains and efficiency for IRS-assisted wireless systems. Several challenges remain in realizing these methods, as detailed below.
A. Data Collection
Massive data collection hampers the successful performance of DL-based techniques for all wireless communications tasks: signal detection, channel estimation, and beamformer design. Signal detection requires the collection and storage of transmitted and received data symbols for different channel conditions. The prerequisites for channel estimation and beamforming are even more tedious because of the additional labeling process. This is difficult to overcome, especially in online scenarios. Apart from SL, the label-free structure of RL is particularly helpful, but at the cost of longer training times. It is possible to relax the data collection requirements by modeling the propagation environment in a numerical electromagnetic simulation tool [7] and then using more realistically simulated data. This is helpful in constructing the training dataset offline, but chances of failure remain in real-world scenarios. As a result, efficient data collection algorithms are of great interest for future DL-based IRS-aided systems.
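Such an offline dataset can be sketched as follows, with random channel draws standing in for an electromagnetic simulator; the QPSK alphabet, noise level, and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def make_dataset(n_samples, n_irs, n_symbols):
    """Simulate (received block, transmitted symbols) pairs over random
    channel realizations and random IRS configurations, as an offline
    stand-in for measured training data."""
    X, Y = [], []
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    for _ in range(n_samples):
        h = rng.standard_normal(n_irs) + 1j * rng.standard_normal(n_irs)
        phases = np.exp(1j * 2 * np.pi * rng.random(n_irs))  # random IRS config
        tx = rng.choice(qpsk, size=n_symbols)
        noise = 0.05 * (rng.standard_normal(n_symbols)
                        + 1j * rng.standard_normal(n_symbols))
        rx = (h @ phases) * tx + noise   # effective scalar channel per block
        X.append(rx)
        Y.append(tx)
    return np.array(X), np.array(Y)

X, Y = make_dataset(n_samples=100, n_irs=16, n_symbols=8)
```

Replacing the random channel draws with ray-traced or measured realizations narrows the gap between such offline datasets and real-world deployment.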
B. Model Training
Model training consumes much time and many resources, including parallel processing and storage. It is usually carried out offline before online deployment, at a parameter server connected to the BS. This introduces a huge transmission overhead as well as privacy issues. FL has the potential to reduce these costs and enable communication-efficient model training (see, e.g., Fig. 4). Here, combining the label-free structure of RL with the communications efficiency of FL, i.e., federated reinforcement learning, could be the next step.

The communications efficiency of FL brings the learning stage from the cloud to the edge. In this case, the edge devices (e.g., mobile phones) should have sufficient parallel processing power for model training. To enable this, the next generation of edge devices is expected to have more powerful processing capabilities for DL tasks. Although graphical processing units (GPUs) are common in mobile phones nowadays, they are largely programmed to process image-like data and not to train DL models. It is also not straightforward to use these mobile GPUs for DL because training software is usually tailored for specific GPU architectures.
C. Environment Adaptation
To achieve the commercial viability of DL-based IRS-aided communications, dynamically adapting to changes in the environment is crucial. The behavior of the channel affects all DL-based tasks, including channel estimation, beamforming, user scheduling, power allocation, and antenna selection/switching. Current DL architectures for wireless systems remain environment-specific: the input data space of the learning model is limited. As a result, the performance degrades significantly when the learning model is fed with input from an unlearned/uncovered data space. In order to cover larger data spaces, wider and deeper learning models are required. But the current DL architectures for wireless communications comprise fewer than a million neurons [7]. The giant learning models for image recognition or natural language processing consist of millions or billions of parameters, e.g., VGG-16 (138 million), AlexNet (60 million), and GPT-3 (175 billion). Clearly, going wider and deeper in designing the learning models is of great interest for future DL-based IRS-aided systems.

VI. SUMMARY
We surveyed DL architectures for IRS-assisted wireless systems for the key applications of signal detection, channel estimation, and beamforming. We extensively discussed various learning schemes and model architectures, such as SL, UL, FL, and RL, for IRS applications. SL exhibits better performance than UL and RL because of its use of labels. UL and RL are label-free schemes that reduce the complexity of training data generation. However, UL still involves an optimization stage for each data instance. Among all, RL is the most promising technique because of its standalone operation and the consequent ability to adapt to environmental changes, at the cost of longer training times. FL reduces the transmission overhead significantly and can be integrated with the other learning methods. The combination of FL- and RL-based learning policies not only exhibits communication-efficient model training but also provides environmental adaptation. Major research challenges include data collection, model training, and environment adaptation. These should be addressed simultaneously to provide a reliable DL architecture for next-generation IRS-assisted wireless systems. Specifically, the combination of FL and RL should be fed with huge datasets and massive neural networks so that a robust DL architecture is achieved.

REFERENCES

[1] S. Gong, X. Lu, D. T. Hoang, D. Niyato, L. Shu, D. I. Kim, and Y. Liang, "Towards Smart Wireless Communications via Intelligent Reflecting Surfaces: A Contemporary Survey," IEEE Commun. Surveys Tuts., pp. 1–1, 2020.
[2] K. Feng, Q. Wang, X. Li, and C. Wen, "Deep Reinforcement Learning Based Intelligent Reflecting Surface Optimization for MISO Communication Systems," IEEE Wireless Commun. Lett., vol. 9, no. 5, pp. 745–749, 2020.
[3] G. Lee, M. Jung, A. T. Z. Kasgari, W. Saad, and M. Bennis, "Deep reinforcement learning for energy-efficient networking with reconfigurable intelligent surfaces," in ICC 2020 - 2020 IEEE International Conference on Communications (ICC), 2020, pp. 1–6.
[4] L. Dai, R. Jiao, F. Adachi, H. V. Poor, and L. Hanzo, "Deep Learning for Wireless Communications: An Emerging Interdisciplinary Paradigm," IEEE Wireless Commun., vol. 27, no. 4, pp. 133–139, 2020.
[5] S. Khan and S. Y. Shin, "Deep-learning-aided detection for reconfigurable intelligent surfaces," arXiv preprint arXiv:1910.09136, 2019.
[6] A. M. Elbir, A. Papazafeiropoulos, P. Kourtessis, and S. Chatzinotas, "Deep Channel Learning for Large Intelligent Surfaces Aided mm-Wave Massive MIMO Systems," IEEE Wireless Commun. Lett., vol. 9, no. 9, pp. 1447–1451, 2020.
[7] A. M. Elbir and S. Coleri, "Federated Learning for Channel Estimation in Conventional and IRS-Assisted Massive MIMO," arXiv preprint arXiv:2008.10846, 2020.
[8] S. Liu, Z. Gao, J. Zhang, M. D. Renzo, and M. Alouini, "Deep Denoising Neural Network Assisted Compressive Channel Estimation for mmWave Intelligent Reflecting Surfaces," IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 9223–9228, 2020.
[9] A. Taha, M. Alrabeiah, and A. Alkhateeb, "Enabling large intelligent surfaces with compressive sensing and deep learning," arXiv preprint arXiv:1904.10136, 2019.
[10] J. Gao, C. Zhong, X. Chen, H. Lin, and Z. Zhang, "Unsupervised Learning for Passive Beamforming," IEEE Commun. Lett., vol. 24, no. 5, pp. 1052–1056, 2020.
[11] A. Taha, Y. Zhang, F. B. Mismar, and A. Alkhateeb, "Deep Reinforcement Learning for Intelligent Reflecting Surfaces: Towards Standalone Operation," in , 2020, pp. 1–5.
[12] S. Gong, J. Lin, J. Zhang, D. Niyato, D. I. Kim, and M. Guizani, "Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks," arXiv preprint arXiv:2008.12938, 2020.
[13] D. Ma, L. Li, H. Ren, D. Wang, X. Li, and Z. Han, "Distributed Rate Optimization for Intelligent Reflecting Surface with Federated Learning," in , 2020, pp. 1–6.
[14] H. Yang, Z. Xiong, J. Zhao, D. Niyato, and L. Xiao, "Deep reinforcement learning based intelligent reflecting surface for secure wireless communications," arXiv preprint arXiv:2002.12271, 2020.
[15] C. Huang, G. C. Alexandropoulos, C. Yuen, and M. Debbah, "Indoor Signal Focusing with Deep Learning Designed Reconfigurable Intelligent Surfaces," in 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)