Federated Learning for 6G: Applications, Challenges, and Opportunities
Zhaohui Yang, Mingzhe Chen, Kai-Kit Wong, Fellow, IEEE, H. Vincent Poor, Fellow, IEEE, and Shuguang Cui, Fellow, IEEE
Abstract
Traditional machine learning is centralized in the cloud (data centers). Recently, security concerns and the availability of abundant data and computation resources in wireless networks have been pushing the deployment of learning algorithms towards the network edge. This has led to the emergence of a fast-growing area, called federated learning (FL), which integrates two originally decoupled areas: wireless communication and machine learning. In this paper, we provide a comprehensive study of the applications of FL for sixth generation (6G) wireless networks. First, we discuss the key requirements in applying FL to wireless communications. Then, we focus on the motivating applications of FL for wireless communications. We identify the main problems and challenges, and provide a comprehensive treatment of implementing FL techniques for wireless communications.
This work was supported in part by the U.S. National Science Foundation under Grant CCF-1908308.
Z. Yang is with the Centre for Telecommunications Research, Department of Engineering, King’s College London, WC2R 2LS, UK, Email: [email protected].
M. Chen is with the Department of Electrical Engineering, Princeton University, Princeton, NJ, 08544, USA, and also with the Chinese University of Hong Kong, Shenzhen, 518172, China, Email: [email protected].
K.-K. Wong is with the Department of Electronic and Electrical Engineering, University College London, WC1E 6BT London, UK, Email: [email protected].
H. Vincent Poor is with the Department of Electrical Engineering, Princeton University, Princeton, NJ, 08544, USA, Email: [email protected].
S. Cui is with the Shenzhen Research Institute of Big Data and School of Science and Engineering, the Chinese University of Hong Kong, Shenzhen, 518172, China, Email: [email protected].
I. BACKGROUND AND OVERVIEW ON FEDERATED LEARNING FOR WIRELESS COMMUNICATIONS
A. Motivation
Due to the explosive growth in data traffic, machine learning and data-driven approaches have recently received much attention and are anticipated to be a key enabler for the forthcoming sixth generation (6G) wireless networks [1]. Nowadays, standard machine learning approaches require centralizing the training data in a single data center or cloud. Since massive data samples need to be uploaded to the data center, transmission delay can be very high and user privacy is not guaranteed in standard centralized machine learning approaches. However, low latency and privacy are important requirements in emerging application scenarios, such as unmanned aerial vehicles, extended reality (XR) services, and autonomous driving, which makes centralized machine learning approaches inapplicable. Moreover, due to limited communication resources, it is impractical for all the wireless devices that are engaged in learning to transmit all of their collected data to a data center that uses a centralized learning algorithm for data analytics or network self-organization.
Therefore, it becomes increasingly attractive to process data locally at edge devices. This has led to the emergence of distributed optimization methods. In distributed optimization, each node computes on its own data and sends the results to its neighbours or a central node. Distributed optimization has many applications, such as user selection optimization, resource allocation optimization, trajectory optimization, and distributed machine learning design [2].
Combining the advantages of distributed optimization and machine learning, distributed learning frameworks are needed to enable wireless devices to collaboratively build a shared learning model with training taking place locally. One of the most promising distributed learning algorithms is the emerging federated learning (FL) framework [3]–[17], which is anticipated to be used in future Internet of Things (IoT) systems. In FL, wireless devices can cooperatively execute a learning task by only uploading local learning models to the base station (BS) instead of sharing the entirety of their training data, as illustrated in Fig. 1 [18]. Since the data center cannot access the local data sets at the users, FL can protect the data privacy of the users.
Fig. 1. An FL algorithm over wireless communication systems. (Figure labels: local data set, wireless link, device.)
For wireless communications, FL has the following advantages: (i) exchanging local machine learning model parameters instead of the massive training data saves energy and consumes less wireless resources; (ii) training machine learning model parameters locally can effectively reduce transmission latency; (iii) FL preserves data privacy since the training data remains at each device and only the local machine learning model parameters are uploaded; (iv) using different learning processes to train several classifiers from distributed data sets increases the possibility of achieving higher accuracy, especially on a large-size domain; (v) FL is inherently scalable since the growing amount of data may be offset by increasing the number of computers or processors, which provides a natural solution for large-scale learning where complexity and memory are the main obstacles.
FL can be used to solve complex convex and nonconvex optimization problems that arise in various use cases such as network control, user clustering, resource management, and interference alignment. Besides, FL enables users to collaboratively learn a shared prediction model while keeping their collected data on their devices for user behaviour prediction, user identification, and wireless environment analysis. Based on the predicted results, the BS can efficiently allocate the wireless resources for the devices.
B. Classification
For FL, there are two main classifications: federated reinforcement learning (FRL) and federated supervised learning (FSL). In [19], the goal of FRL is to enable wireless devices to remember what they have learned and what other wireless devices have learned. FRL can be used in the case where multiple wireless devices make decisions in different environments. In FRL, each wireless device builds a learning network with the help of other wireless devices.
1. Initially, one edge device obtains its private strategy model (learning network) through reinforcement learning (RL) in its own environment and then uploads it to the BS as the shared model.
2. After a while, the wireless devices download the shared model from the BS as the initial actor model in RL. Wireless devices get their own private learning networks through RL in new environments. After training is completed, wireless devices upload their private learning networks to the BS.
3. At the BS, the private learning networks are fused into the shared model, and then a new shared model is generated. The new shared model can be used by other wireless devices. Other wireless devices will also upload their private learning networks to the BS to evolve and update the shared model.
The FSL technique builds a uniform learning model by iteratively exchanging information between the BS and wireless devices, where the local private data is fully labeled. The FSL procedure contains three steps at each iteration: local computation at each wireless device, local FSL model parameter transmission from each wireless device, and result aggregation and broadcast at the BS.
1. Every wireless device computes its result locally by using its fully labeled data set.
2. All wireless devices upload the local prediction parameters to the BS via wireless links in the uplink.
3. The BS aggregates the prediction model parameters and broadcasts the global prediction model parameters to all the wireless devices in the downlink.
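The three FSL steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm of any cited work: it assumes a linear least-squares model, gradient-descent local updates, and a sample-size-weighted average at the BS (a FedAvg-style rule). The function names `local_update`, `aggregate`, and `run_fsl`, as well as the learning rate and iteration counts, are hypothetical choices made for the example.

```python
import numpy as np

def local_update(w_global, X, y, lr=0.1, local_iters=5):
    """Step 1: a device refines the received global model on its own labeled data."""
    w = w_global.copy()
    for _ in range(local_iters):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w

def aggregate(local_models, num_samples):
    """Step 3: the BS averages the uploaded models, weighted by local data size."""
    weights = np.asarray(num_samples, dtype=float)
    weights /= weights.sum()
    return sum(a * w for a, w in zip(weights, local_models))

def run_fsl(datasets, dim, rounds=50):
    """Repeat local computation, upload, and aggregation/broadcast for several rounds."""
    w_global = np.zeros(dim)
    for _ in range(rounds):
        # Steps 1-2: each device trains locally and uploads only its model.
        local_models = [local_update(w_global, X, y) for X, y in datasets]
        # Step 3: aggregation at the BS; the result is broadcast for the next round.
        w_global = aggregate(local_models, [len(y) for _, y in datasets])
    return w_global

# Toy example (assumed data): three devices with private data from one linear model.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
datasets = []
for n in (40, 60, 80):
    X = rng.normal(size=(n, 3))
    datasets.append((X, X @ w_true + 0.01 * rng.normal(size=n)))
w_fl = run_fsl(datasets, dim=3)
```

Note that only model vectors cross the wireless link in this sketch; the arrays `X` and `y` never leave the device that owns them.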
C. Relevant Surveys and Our Contributions
There are some interesting surveys about FL in wireless communications such as [20]–[25]. The unique characteristics and challenges of FL were discussed in [20]. Moreover, this work provided an overview of the current approaches, and outlined several directions of future work. The work in [21] introduced the challenges of FL implementation and reviewed the existing solutions. In [22], the authors described the challenges of machine learning systems that are configured at the edge computer networks. Considering RL, the authors in [23] proposed to integrate deep RL techniques and the FL framework with mobile edge systems, for optimizing mobile edge computing, caching and wireless communication resources. In addition, the work in [24] explored the key building blocks of edge machine learning and different wireless network architectural splits for wireless communications. FL applications were surveyed in [25], including software and hardware platforms, protocols, real-life applications and use-cases.
TABLE I
AN OVERVIEW OF SELECTED SURVEYS ABOUT FL IN WIRELESS COMMUNICATIONS.
Subject | Contributions | Related Work
FL | Introductory tutorial on unique characteristics and challenges of FL | [20]
FL | Challenges of FL implementation | [21]
Edge machine learning | Challenges of machine learning systems at the edge computer networks | [22]
FL | FL and RL for optimizing mobile edge computing and caching | [23]
Edge machine learning | Edge machine learning architectures | [24]
FL | FL application and use-cases | [25]
We aim to gather the state-of-the-art contributions that address the key challenges of applying FL techniques to wireless networks. In particular, our objectives are three-fold: to provide a comprehensive description of FL algorithms, to identify the key open problems in wireless communication that can be addressed using FL methods, and to point out the emerging applications in wireless communication with FL.
II. PERFORMANCE AND REQUIREMENTS FOR FEDERATED LEARNING
A. Performance Evaluation
The procedure of FL over wireless networks is shown in Fig. 2. The FL procedure contains three steps at each iteration: local computation at each device (using several local iterations), local FL parameter transmission for each device, and result aggregation and broadcast at the BS. The local computation step is essentially the phase during which each device calculates its local FL parameters by using its local data set and the received global FL parameters. There are four main performance indicators for FL: delay, energy, reliability, and massive connectivity.
Fig. 2. FL procedures over wireless networks. (Blocks: local computation at devices 1, ..., k, ..., K with a local accuracy; wireless transmission; BS aggregation with a global accuracy; information broadcast.)
Fig. 3. Energy performance of FL over wireless networks. (Timeline for devices 1, 2, ..., K showing, per device, the computation time, the transmission time, and the downloading time.)
1) Delay:
According to Fig. 3, the delay of FL includes: local computation delay of wireless devices, uplink transmission delay, BS aggregation delay, and downlink transmission delay. Considering the tradeoff between local computation delay and wireless transmission delay, it is important to minimize the delay of FL via joint transmission and computation optimization.
2) Energy:
Due to the limited energy budget of wireless devices, both local computation energy and transmission energy must be considered during the FL process. The local computation energy depends on the number of iterations for local computation at each device, and the transmission energy is related to the number of iterations for the FL algorithm to converge.
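To make the delay and energy components concrete, the following sketch evaluates them for one synchronous FL round. It is a simplified model under stated assumptions rather than the formulation of any cited work: computation time is taken as (local iterations × CPU cycles per sample × samples) / CPU frequency, computation energy follows a dynamic CPU power model with an assumed coefficient `kappa`, the uplink rate is the Shannon capacity of an orthogonal channel, and the round delay is set by the slowest device. All function names and numerical values are hypothetical.

```python
import math

def computation_time(local_iters, cycles_per_sample, num_samples, cpu_hz):
    """Local computation delay in seconds (assumed cycle-count model)."""
    return local_iters * cycles_per_sample * num_samples / cpu_hz

def computation_energy(local_iters, cycles_per_sample, num_samples, cpu_hz, kappa=1e-28):
    """Local computation energy in joules; kappa is an assumed chip-dependent constant."""
    return kappa * local_iters * cycles_per_sample * num_samples * cpu_hz ** 2

def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_psd):
    """Shannon rate in bit/s over an orthogonal uplink channel."""
    return bandwidth_hz * math.log2(1 + tx_power_w * channel_gain / (noise_psd * bandwidth_hz))

def round_delay_and_energy(devices, model_bits, bandwidth_hz, noise_psd, t_agg, t_down):
    """Delay and total device energy of one synchronous FL round."""
    per_device_delay, energy = [], 0.0
    for d in devices:
        t_cmp = computation_time(d["iters"], d["cycles"], d["samples"], d["f"])
        t_up = model_bits / uplink_rate(bandwidth_hz, d["p"], d["gain"], noise_psd)
        per_device_delay.append(t_cmp + t_up)
        energy += computation_energy(d["iters"], d["cycles"], d["samples"], d["f"]) + d["p"] * t_up
    # The BS waits for the slowest device, then aggregates and broadcasts.
    return max(per_device_delay) + t_agg + t_down, energy

# Assumed example values: two devices that differ only in CPU frequency.
fast = {"iters": 5, "cycles": 1e4, "samples": 500, "f": 2e9, "p": 0.1, "gain": 1e-8}
slow = dict(fast, f=5e8)
delay, energy = round_delay_and_energy(
    [fast, slow], model_bits=1e6, bandwidth_hz=1e6, noise_psd=4e-21, t_agg=0.01, t_down=0.02
)
```

In this model, raising the CPU frequency shortens the computation time but increases the computation energy quadratically, which is one way to see the delay-energy tradeoff discussed above.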
3) Reliability:
To train an FL algorithm in a distributed manner, the devices must transmit the training parameters over wireless links, which can introduce training errors due to limited wireless resources (e.g., bandwidth) and the inherent unreliability of wireless links. For example, symbol errors introduced by the unreliable nature of the wireless channel and by resource limitations can impact the quality and correctness of the FL updates among users. Such errors will, in turn, affect the performance of FL algorithms, as well as their convergence speed.
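One simple way to see how unreliable links enter the aggregation step is sketched below. It assumes, purely for illustration, that each local model is carried in a single packet, that the BS can detect a corrupted packet and discards it, and that the global model is the sample-size-weighted average of the updates that arrive intact. The function name `aggregate_with_packet_errors` and the per-device packet error rates are hypothetical.

```python
import numpy as np

def aggregate_with_packet_errors(local_models, num_samples, packet_error_rates, rng):
    """Average only the local models whose packets were received without error."""
    local_models = np.asarray(local_models, dtype=float)
    # A packet is received when a uniform draw in [0, 1) is at least the error rate.
    received = rng.random(len(local_models)) >= np.asarray(packet_error_rates)
    if not received.any():
        return None, received  # no usable update in this round
    weights = np.asarray(num_samples, dtype=float) * received
    weights /= weights.sum()
    return (weights[:, None] * local_models).sum(axis=0), received

# Assumed example: three devices, the third one on a poor link.
rng = np.random.default_rng(1)
models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
global_model, received = aggregate_with_packet_errors(
    models, num_samples=[10, 20, 30], packet_error_rates=[0.01, 0.05, 0.6], rng=rng
)
```

Devices with high error rates contribute less often, so the global model is biased toward the data of well-connected devices, which is one reason why transmission errors affect both accuracy and convergence speed.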
4) Massive connectivity:
To meet the low latency requirement of FL, we need to rapidly collect data distributed among a huge number of devices through wireless communications. However, with an enormous number of devices, conventional interference-avoiding channel access schemes become infeasible since they normally result in excessive latency. To overcome this challenge, over-the-air computation is a promising approach for fast wireless data aggregation by exploiting the superposition property of a multiple access channel [26].
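The superposition idea can be illustrated with an idealized sketch. The assumptions here are strong and are not taken from [26]: all devices transmit simultaneously with perfect synchronization, each device knows its own flat-fading channel and inverts it before transmission (no power constraint is enforced), and the BS observes the sum of all signals plus Gaussian noise. The function name `over_the_air_average` is hypothetical.

```python
import numpy as np

def over_the_air_average(local_models, channels, noise_std, rng):
    """Estimate the average of K local models from one superposed transmission."""
    local_models = np.asarray(local_models, dtype=float)  # shape (K, d)
    channels = np.asarray(channels, dtype=complex)        # shape (K,), assumed nonzero
    # Each device pre-compensates its channel (channel inversion, assumed perfect CSI).
    tx = local_models / channels[:, None]
    # The multiple access channel adds the faded signals of all devices.
    rx = (channels[:, None] * tx).sum(axis=0)
    noise = noise_std * (rng.normal(size=rx.shape) + 1j * rng.normal(size=rx.shape)) / np.sqrt(2)
    # The BS scales the received sum by 1/K to obtain the average.
    return np.real(rx + noise) / len(local_models)

# Assumed example: three devices, two model parameters each.
rng = np.random.default_rng(2)
models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
channels = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j]
avg_estimate = over_the_air_average(models, channels, noise_std=0.05, rng=rng)
```

The aggregation takes a single channel use per model parameter regardless of the number of devices, whereas orthogonal access would need one per device.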
B. Potential to Meet 6G Requirements
It is expected that 6G communication systems will need to accommodate 125 billion wireless devices by 2030. As a result, it is important to develop an automatic data processing framework to allow edge learning to take place. As one of the key enabling technologies, FL has the potential to meet the following 6G requirements [1].
1) Massive ultra-reliable, low latency communications (mURLLC):
Due to the explosive growth in the number of wireless devices in 6G, the 5G URLLC requirements will be changed to mURLLC. With FL, multiple edge computing units can be used to cooperatively learn a shared model for the network, which can decrease service delay and provide high reliability.
2) Scalable architecture:
Different from a central cloud, edge intelligence, such as FL, is built in a distributed manner, which includes many edge servers with computing and communication capabilities. To serve a massive number of devices in future 6G networks, it is important to provide a scalable and decomposable architecture to allow simultaneous computing among multiple edge servers. It is expected that the FL architecture will play an important role in future 6G services and applications.
3) Human-centric services:
Different from the rate-reliability-latency metrics in 5G, 6G involves human-centric services, which require a quality of experience related to the physical movement of the users. FL can be used to predict the movements and gestures of users, and the BS can utilize the predicted results to improve the quality of experience for users.
III. FEDERATED LEARNING FOR WIRELESS COMMUNICATIONS: MOTIVATING APPLICATIONS
Machine learning tools can exploit big data analytics for wireless network state estimation and find the relationship between the optimized variables and objective functions in an online manner so as to reduce the computational complexity of solving the nonconvex problems in wireless communication. Besides, machine learning is powerful because it can handle problems that are difficult to describe explicitly. However, given that a multi-cell network needs global channel state information (CSI), centralized learning algorithms may require the BSs to continuously upload their collected data to a centralized processing server, which can lead to a high network overhead and significant delays. As a consequence, using a centralized learning algorithm for resource management or network control may need a large number of iterations to converge. Thus, centralized machine learning algorithms will not be able to handle the resource allocation, signal detection and user behaviour prediction problems in future networks. To this end, FL is needed, which enables users or BSs to manage the resources in a distributed manner and analyze their collected data locally.
A. Driving Application of FL for Wireless Problems
1) Resource management:
Spectral efficiency and connectivity optimization of multi-cell networks typically leads to nonconvex resource allocation problems, which have often been solved by conventional algorithms such as successive convex approximation and matching theory, with high complexity and impractical implementation. Therefore, there is a need to introduce new FL techniques that can be used to address a variety of resource management challenges such as distributed power control for multi-cell networks, joint user association and beamforming design, and dynamic user clustering.
For multi-cell power control, as shown in Fig. 4, FRL enables each BS to build the relationship between the power control schemes and utility values so as to find the optimal power control scheme. In FRL, the BSs on a connected network process data locally by solving small optimization problems, and exchange the local results among the neighbors to arrive at a global solution.
Fig. 4. Multi-cell power control problem. (Panels: channel gain over subcarriers from BS 1, BS 2, ..., BS N to users 1, ..., M in a multi-cell multi-user network; power over subcarriers at BS 1, ..., BS N.)
Further, FRL can be used for dynamic user clustering, where users individually learn the clustering parameters by RL and the BS builds the unified clustering parameters based on the received clustering parameters from all users.
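As a rough illustration of how local RL and model fusion can interact in the power control setting, the sketch below uses a deliberately simplified setup that is an assumption of this example, not the method of any cited work: each BS chooses among a few discrete power levels, keeps a stateless Q-table (one value per power level) that it updates from its own observed utility, and a server periodically averages the Q-tables into a shared one. The utility functions, the names `local_q_update` and `federated_q_learning`, and all hyperparameters are hypothetical.

```python
import numpy as np

def local_q_update(q_shared, utility_fn, rng, eps=0.2, lr=0.5, steps=50):
    """A BS refines the shared Q-table with epsilon-greedy trials in its own environment."""
    q = q_shared.copy()
    for _ in range(steps):
        if rng.random() < eps:
            a = int(rng.integers(len(q)))   # explore a random power level
        else:
            a = int(np.argmax(q))           # exploit the current best power level
        q[a] += lr * (utility_fn(a) - q[a])  # move the estimate toward the observed utility
    return q

def federated_q_learning(utility_fns, num_actions, rounds, rng):
    """Alternate local Q-learning at each BS with averaging of the Q-tables."""
    q_shared = np.zeros(num_actions)
    for _ in range(rounds):
        local_tables = [local_q_update(q_shared, u, rng) for u in utility_fns]
        q_shared = np.mean(local_tables, axis=0)  # fusion into the shared model
    return q_shared

# Assumed toy utilities: the middle power level is best at every BS, with BS-specific scaling.
base_utility = [0.1, 0.3, 1.0, 0.3, 0.1]
utility_fns = [lambda a, s=s: base_utility[a] * s for s in (0.9, 1.0, 1.1)]
rng = np.random.default_rng(3)
q_shared = federated_q_learning(utility_fns, num_actions=5, rounds=20, rng=rng)
best_power_level = int(np.argmax(q_shared))
```

The shared table plays the role of the relationship between power control schemes and utility values mentioned above; a realistic design would add channel states and inter-cell interference to the learning problem.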
2) User behavior predictions:
Due to the heterogeneous quality-of-service requirements of users, user behaviour prediction is of great importance for the implementation of wireless networks.
FL can be used to predict user behaviors such as mobility patterns, where each user performs a local FL algorithm to train the learning model using its own user behavior data and uploads the trained model to the BS. Then the BS generates and broadcasts the unified FL model parameters to all users. Based on the mobility predictions, the users can dynamically choose a subchannel to upload data in the uplink, the BS dynamically allocates multiple subchannels to multiple users in the downlink, and multiple users that occupy the same subchannel can perform non-orthogonal multiple access (NOMA) or full duplex.
The quality of service of users can also be predicted by FL, where each BS runs the FL algorithm based on its stored information, such as users’ requested data, gender, job, and device type, and all BSs transmit the FL model results to a server to obtain a unified FL model.
Fig. 5. An RIS-assisted wireless communication system. (Figure labels: BS, user, RIS 1 to RIS 4.)
3) Channel estimation and signal detection:
Channel estimation and signal detection are major challenges due to the random nature of wireless channels in wireless communication networks. For downlink systems, FL algorithms can be used for channel estimation and multi-user detection, where each user performs an FL algorithm for channel estimation and signal detection, and sends its local FL model parameters to the BS, which generates the global FL model. For multi-cell uplink systems, multi-user signals can be detected by iteratively transmitting individual FL model parameters from all BSs to a server and broadcasting the unified FL model parameters from the server to all the BSs. Further, FL algorithms can be utilized to automatically design the codebook of BSs and the decoding strategy of users to minimize the bit error rate, where users upload the learned result to the corresponding BSs and the BSs forward their unified learned result to a server.
B. Reconfigurable Intelligent Surface
Reconfigurable intelligent surface (RIS)-assisted wireless communication has been proposed as a potential solution for enhancing the energy efficiency of wireless networks [27]–[38]. An RIS is a meta-surface equipped with low-cost and passive elements that can be programmed to turn the wireless channel into a partially deterministic space. In RIS-assisted wireless communication networks, a BS sends control signals to an RIS controller so as to optimize the properties of incident waves and improve the communication quality of users. The RIS acts as a reflector and does not perform any digitalization operation. Hence, if properly deployed, an RIS promises much lower energy consumption than traditional amplify-and-forward (AF) relays [39]–[41]. However, the diagonal structure of the phase shift matrix and the unit-modulus constraint on the reflecting RIS elements make the joint design of transmit beamforming and phase shifts extremely challenging. To address the high dimensionality, the complex electromagnetic (EM) environment, and the mathematically intractable nonlinearities of such communication systems, model-free FL methods can be used.
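The role of the unit-modulus constraint can be seen in the simplest special case, which is used here only as an illustration and is not the FL approach discussed in this section. Assume a single-antenna BS, a single-antenna user, and perfectly known channels. Each RIS element n then contributes a term g_n e^{jθ_n} h_n to the effective channel, so only the phase θ_n can be tuned, and the well-known closed-form choice aligns every reflected term with the direct path. The variable names below are hypothetical.

```python
import numpy as np

def effective_channel(h_direct, g, h, theta):
    """Direct path plus the RIS-reflected paths with unit-modulus coefficients e^{j*theta}."""
    return h_direct + np.sum(g * np.exp(1j * theta) * h)

def aligned_phases(h_direct, g, h):
    """Phases that co-phase every reflected path with the direct path (single-antenna case)."""
    return np.angle(h_direct) - np.angle(g * h)

# Assumed toy channels: h is BS-to-RIS, g is RIS-to-user, one entry per RIS element.
rng = np.random.default_rng(4)
num_elements = 16
h = (rng.normal(size=num_elements) + 1j * rng.normal(size=num_elements)) / np.sqrt(2)
g = (rng.normal(size=num_elements) + 1j * rng.normal(size=num_elements)) / np.sqrt(2)
h_direct = 0.1 * (rng.normal() + 1j * rng.normal())

theta_opt = aligned_phases(h_direct, g, h)
theta_rand = rng.uniform(0, 2 * np.pi, size=num_elements)
gain_opt = abs(effective_channel(h_direct, g, h, theta_opt))
gain_rand = abs(effective_channel(h_direct, g, h, theta_rand))
```

With multiple antennas and multiple users, the phases couple with the transmit beamformer and no such closed form is available, which is where learning-based designs become attractive.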
1) CSI Detection:
In an RIS-enhanced system, several efficient technologies are required to achieve the full advantages of the architecture, including joint active and passive beamforming, resource allocation, and energy-efficient design. Note that all of the above designs rely on perfect CSI between the BS and the RIS, and perfect CSI between the RIS and the users. However, it is infeasible for RIS-enhanced systems to estimate accurate CSI when radio frequency (RF) chains or sensors are not equipped on the RIS. To this end, it is meaningful to use FL for CSI detection in RIS-assisted wireless communications.
The FL-based model training approach can be used in RIS-assisted massive multiple-input multiple-output (MIMO) systems [42]. The FL approach mainly includes three steps: data collection, training and prediction. In the first step, each user collects its local training data set, where the pilot sequence is the input and the received signal is the output. Then, each user computes the updated model by using its own local data set, and the BS generates a global model after receiving the updated models from all users. In the last step, each user estimates its own channel by feeding the received pilot data into the trained model.
2) Distributed joint passive and active beamforming:
In an RIS, the phase shift of each element can be adjusted to improve the performance of RIS-assisted wireless communication systems. Different from conventional communications, it is important to jointly optimize the passive beamforming (phase shift matrices at the RIS) and the active beamforming (beamforming at the multi-antenna transmitter) [43], [44]. To solve the complicated joint passive and active beamforming problem, deep learning (DL) has been used to design the reflection matrix of RIS elements in indoor communication environments [45]. In practice, similar to multi-hop relaying systems, multiple RISs can be used to overcome severe signal blockage between the BS and users to achieve better service coverage. The authors in [46] presented a multi-hop RIS-assisted communication scheme to overcome the severe propagation attenuation and improve the coverage range at Terahertz (THz) band frequencies, where the hybrid design of transmit beamforming at the BS and phase shift matrices is obtained using RL. Due to the high complexity of centralized RL, FRL can be utilized to solve the joint passive and active beamforming problem, where all users individually optimize the phase shift matrices and transmit beamforming via RL, and the BS broadcasts the aggregated learning model to all users.
3) Phase shift prediction:
Due to the randomness of wireless communication channels, the phase shift matrices must be adjusted as the wireless channel changes. By exploiting the time-correlated property of channel fading, the phase shift matrices of the RIS can be predicted with FL. To predict the phase shifts, each user uses a long short-term memory (LSTM) network to predict the future CSI and phase shift matrices using its local data set, while the BS aggregates the received results from all users.
C. Semantic Communication
Semantic communication is similar to brain communication, where the difference between the meaning of the transmitted symbols and that of the recovered ones is correlated [47]. This correlation can be useful for joint encoding and decoding when the bandwidth of the system is limited or the bit error rate is high for some typical communication systems.
1) Channel encoder and decoder design:
By using semantic communication techniques, which enable the devices to transmit only semantic information to the server rather than traditional bits or symbols, the network bandwidth utilization can be effectively improved. However, a semantic communication model requires training data from multiple distributed devices, which induces a huge communication cost for data transmission. To solve this problem, FL-based, DL-enabled semantic communication can be used for channel encoder and decoder design. First, a DL model can be used to extract the semantic information from text or audio with robustness to noise. Then, in an FL approach, the devices and the server obtain practicable DL models, with the server aggregating the devices’ locally trained models and sending the aggregated model back to the devices.
2) Distributed semantic communication for IoT:
Emerging technologies, such as smart cities, IoT and machine-to-machine (M2M) networks, require intelligent communication between different ends, such as human to machine. For those applications, the intelligent communication depends on the background and interface language model [48]. Besides, there are always a large number of devices in IoT. The above factors motivate the design of distributed semantic communication for IoT with FL. Distributed semantic communication with FL includes three steps. In the first step, the BS computes the semantic communication model using DL. In the second step, the BS broadcasts the trained DL model to all users. In the third step, each user obtains the semantic features from the received broadcast information. Then, each user transmits the semantic features to the BS and the BS updates the semantic communication model accordingly.
D. XR
XR refers to all real-and-virtual environments generated by computer graphics, which includes augmented reality (AR), mixed reality (MR), and virtual reality (VR), as in Fig. 6. Deploying XR over wireless communication networks is an essential step for realising XR applications [1]. Due to the seamless and immersive requirements, it is important to introduce wireless communication technologies to meet the stringent quality of service requirements, such as high data rate and ultra low latency. For XR over wireless communications, the location and orientation information needs to be sent to the BSs, and the BSs construct the 360-degree images for users based on the received information.
1) User movement prediction:
In a wireless XR network, the user body movement can heavily influence the wireless resource allocation and network management [49]. To deal with the user movement challenge, FL is effective at predicting the users’ movements and actions. Based on the predicted movements and actions, the BSs can improve the generation of the XR images and optimize the resource management for wireless XR users.
Fig. 6. Classification of XR.
2) Resource allocation:
FL can be used to develop self-organizing algorithms for solving dynamic resource management problems for XR networks [50]. In particular, FL can be used to adaptively optimize the wireless resources and construct the format of the XR images based on the wireless environment.
E. Non-Orthogonal Multiple Access
NOMA is envisioned to be a promising technique for the development of next-generation wireless networks [51]. By serving multiple users on the same time and frequency resource, NOMA can scale up the number of served users, increase spectral efficiency, and improve user fairness compared to existing orthogonal multiple access (OMA) techniques. Recently, significant research efforts have focused on various challenges of NOMA [52]–[54], including modeling, performance analysis, signal processing, and emerging NOMA applications such as heterogeneous networks (HetNets), cognitive radio networks and millimeter wave (mmWave) communications. The non-orthogonal resource allocation nature of NOMA necessitates the introduction of novel models and algorithms for addressing several challenges that include: joint user clustering and resource allocation for devising a scalable multi-cell NOMA design, advanced channel estimation and signal detection for large-scale NOMA networks, and dynamic user behaviour prediction in NOMA-based mobile networks.
Due to non-orthogonal resource allocation, intra-cell interference always exists in NOMA networks, in contrast to OMA networks, which usually leads to nonconvex resource allocation problems. Traditional optimization methods, which are used to solve the nonconvex problems for optimizing the performance of NOMA networks, mostly operate in an offline manner with high computation complexity and depend largely on accurate CSI [55]–[58]. Machine learning tools [59]–[62] can exploit big data analytics for wireless network state estimation and find the relationship between the optimized variables and objective functions in an online manner so as to reduce the computational complexity of solving the nonconvex problems in NOMA. However, given that multi-cell NOMA needs global CSI, centralized learning algorithms may require the BSs to continuously upload their collected data to a centralized processing server, which can lead to a high network overhead and significant delays. Besides, in NOMA, each subcarrier can be occupied by multiple users. As a consequence, using a centralized learning algorithm for resource management or network control may need a large number of iterations to converge. Thus, centralized machine learning algorithms such as those in [63]–[66] will not be able to handle the resource allocation, signal detection and user behaviour prediction problems in NOMA. For NOMA, FL has two important use cases: 1) FRL can be used to solve complex convex and nonconvex optimization problems that arise in various NOMA use cases such as network control, user clustering, resource management and interference alignment, and 2) FSL enables users to collaboratively learn a shared prediction model while keeping their collected data on their devices for user detection and CSI prediction.
1) Resource management in NOMA:
With superposition coding at the transmitter and successive interference cancellation (SIC) at the receiver, NOMA can achieve higher spectral efficiency than OMA. Moreover, NOMA can serve multiple users on the same resource (e.g., time/frequency) by exploiting the user differences in the power domain [67], [68]. This power domain feature provides rich opportunities for NOMA to support massive connectivity and meet the users’ diverse quality of service requirements.
The spectral efficiency and connectivity optimization of NOMA typically leads to nonconvex resource allocation problems, which have been solved by conventional algorithms such as successive convex approximation and matching theory, with high complexity and impractical implementation [52]. Therefore, there is a need to introduce new distributed learning techniques that can be used to address a variety of resource management challenges such as distributed power control for multi-cell NOMA [57], joint user association and beamforming design [54], and dynamic user clustering [69]. For multi-cell power control, FRL enables each BS to build the relationship between the power control schemes and utility values so as to find the optimal power control scheme. FRL can also be used to study the user association and beamforming of a multi-antenna NOMA network [70]. Further, FRL can be used for dynamic user clustering in NOMA, where users individually learn the clustering parameters by RL and the BS builds the unified clustering parameters based on the received clustering parameters from all users.
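The power-domain principle can be made concrete with the standard two-user downlink example. The sketch below is a textbook illustration under assumed conditions (single cell, one near user with a strong channel and one far user with a weak channel, perfect SIC, Shannon rates), and the power split and channel values are hypothetical. The far user decodes its own signal while treating the near user's signal as interference; the near user first removes the far user's signal by SIC and then decodes its own signal without interference.

```python
import math

def noma_rates(p_total, alpha_far, g_near, g_far, noise):
    """Achievable rates (bit/s/Hz) of a two-user downlink NOMA pair with perfect SIC."""
    p_far = alpha_far * p_total          # more power is given to the weak (far) user
    p_near = (1 - alpha_far) * p_total
    # Far user: the near user's signal acts as interference.
    r_far = math.log2(1 + p_far * g_far / (p_near * g_far + noise))
    # Near user: the far user's signal has been cancelled by SIC.
    r_near = math.log2(1 + p_near * g_near / noise)
    return r_near, r_far

def oma_rates(p_total, g_near, g_far, noise):
    """Baseline: each user gets half of the time/frequency resource at full power."""
    r_near = 0.5 * math.log2(1 + p_total * g_near / noise)
    r_far = 0.5 * math.log2(1 + p_total * g_far / noise)
    return r_near, r_far

# Assumed example: normalized noise power, a 20 dB gap between the two channel gains.
noma_near, noma_far = noma_rates(p_total=1.0, alpha_far=0.8, g_near=100.0, g_far=1.0, noise=1.0)
oma_near, oma_far = oma_rates(p_total=1.0, g_near=100.0, g_far=1.0, noise=1.0)
```

For these assumed values both the sum rate and the far user's rate are higher with NOMA than with the OMA baseline. The gain depends on the power split `alpha_far`, which is the kind of variable that the nonconvex resource allocation problems above optimize.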
2) Channel estimation and signal detection in NOMA:
Channel estimation and signal detection in NOMA are major challenges due to error propagation in SIC for NOMA networks. FSL algorithms can be used for channel estimation and multi-user detection in downlink NOMA networks, where each user performs a supervised learning (SL) algorithm for channel estimation and signal detection of multiple users (as required by SIC) and sends its local federated learning model parameters to the BS, which generates the global FL model. As in [71], FSL can detect multi-user signals in multi-cell uplink NOMA networks by iteratively transmitting individual learning model parameters from all BSs to a server and broadcasting the unified learning model parameters from the server to all BSs. Further, FSL can be used to automatically design the codebook of BSs and the decoding strategy of users for code-domain NOMA networks so as to minimize the bit error rate [72], where users upload the learned result to the corresponding BSs and the BSs forward their unified learned result to a server.
3) User behavior prediction in NOMA:
Because users in NOMA have heterogeneous quality-of-service requirements, and users in the same NOMA cluster should have diverse channel gains and quality of service, user behavior prediction is of great importance for the implementation of NOMA networks. To predict user behaviors such as mobility patterns, each user in an FSL scheme runs a supervised learning algorithm to train the learning model on its own behavior data and uploads the trained model to the BS via NOMA. The BS then generates the unified learning model parameters and broadcasts them to all users via NOMA. Based on the mobility pattern predictions, the users can dynamically choose subchannels for uploading data in the uplink, the BS can dynamically allocate multiple subchannels to multiple users in the downlink, and multiple users that occupy the same subchannel can perform NOMA. For multiple BSs to predict the quality of service of users [73] with FSL, each BS runs a supervised learning algorithm on its stored information, such as users' requested data, gender, job, and device type, and all BSs transmit their learning model results to a server via NOMA to obtain a unified FL model.
IV. RESEARCH DIRECTIONS AND CHALLENGES
FL allows resource allocation and behavior prediction problems in wireless networks to be solved in a distributed manner. The utilization of FL for wireless networks has the following five main directions and challenges.
1. Convergence analysis: Due to the limited number of resource blocks (RBs) in a wireless network, only a subset of users can be selected to transmit their local FL model parameters to the BS at each learning step. However, since each user has unique training data samples, the BS prefers to include all local user FL models to generate a converged global FL model. Hence, the FL performance and convergence time will be significantly affected by the user selection scheme. Most FL convergence proofs are established on the assumption that the loss function is convex [74], [75]. However, the loss function of popular neural networks is non-convex. It is a challenge to investigate the convergence rate of FL with non-convex loss functions.
2. Privacy and security: In FL, the raw data set at each user can be protected since only the local FL model is transmitted to the BS. However, it is still possible for an eavesdropper to approximately reconstruct the raw data, especially when the local and global model parameters are not well protected [76]. Besides, the local FL model itself may leak private information. In FL, privacy can be classified into two categories: global privacy and local privacy. Global privacy requires that the model updates generated at each round are private to all untrusted third parties other than the central server, while local privacy further requires that the updates are also private to the server.
3. Asynchronous communication: FL involves information exchange between wireless devices and the BS. Synchronous communication methods are simple but introduce stragglers among the devices. Asynchronous schemes are an attractive approach to mitigating stragglers in heterogeneous environments. While asynchronous parameter servers have been successful in distributed data centers, classical bounded-delay assumptions can be unrealistic in federated settings.
4. Non-iid devices: Challenges arise when training federated models from data that is not identically distributed across devices, both in terms of modeling the data and in terms of analyzing the convergence behavior of the associated training procedures. In addition, limited computation capacity at some wireless devices causes delays.
5. Joint communication and computation design: To deploy FL in wireless networks, devices are required to transmit their multimedia data or local training results over unreliable wireless links. This exposes the performance of learning and inference to degradation caused by limited radio resources (e.g., power, time, and bandwidth). It is therefore important to jointly manage communication and computation resources for efficient and robust FL.
V. OPEN PROBLEMS AND FUTURE DIRECTIONS
This section discusses open research problems in each of the covered areas in order to shed light on future opportunities. Despite a considerable number of studies on FL, there are still many key open problems that must be investigated for FL in wireless communications.
1. Convergence: Several key problems remain regarding the FL convergence rate. For example, there is a need for exact or more accurate convergence formulations with fewer assumptions and approximations [74], which should be consistent with real FL experimental data. Although there are some studies in this area, most of them consider convex loss functions. Besides, due to the heterogeneous quality-of-service requirements, it is possible to conduct multi-task FL simultaneously. In addition, for large-scale systems, multi-cell and multi-hop FL should be considered, which requires more insight into FL convergence analysis. Moreover, one challenge is to study the effect of the mobility of wireless devices on FL convergence. Due to mobility, the channel gains between the devices and the BS change dynamically, and some devices may quit the FL process due to poor channel conditions, which affects the convergence of the whole FL process.
2. Privacy and security: In terms of open problems for privacy and security, the following need to be studied: privacy protection at each user, privacy protection at the BS, and security for the whole FL algorithm. For privacy protection at each user and at the BS, one of the key problems is to study coding schemes and physical layer security techniques. For the security of the whole FL algorithm, there is a need to study encryption (such as quantum key distribution) and defense mechanisms.
3. Performance evaluation: One of the challenges is to investigate the effect of communication bandwidth on FL delay performance. FL on mobile phones relies on wireless communication to collaboratively learn a machine learning model. Although the computing resources of mobile phones are becoming increasingly powerful, the bandwidth of wireless communication has not increased as much. As such, the bottleneck has shifted from computation to communication. As a consequence, limited communication bandwidth can incur long communication latency and thus significantly increase the convergence time of the FL process.
4. FL for emerging technologies: The interplay between FL and emerging technologies introduces new challenges. For instance, the very high propagation attenuation at THz frequencies can affect the convergence analysis. In satellite communication, FL can be used to optimize the beam and location of the satellite. For brain-computer interfaces, one of the challenges is to use FL to extract deep knowledge of the brain's neural network. In quantum communication, there is a need to use FL to optimize the parameters (such as base probability) for quantum key distribution.
VI. CONCLUSIONS
In this tutorial, we have provided a comprehensive study on the use of FL for wireless networks. We have investigated two main classes of FL, namely, FRL and FSL. Besides, we have presented the motivating applications of FL for wireless communications. Meanwhile, we have described the techniques needed to meet the challenges of using FL for wireless communications. Such an in-depth study on FL for wireless communications provides unique guidelines for optimizing, designing, and operating FL-based wireless communication systems.
REFERENCES
[1] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,”
IEEE Network, vol. 34, no. 3, pp. 134–142, 2020.
[2] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, “A joint learning and communications framework for federated learning over wireless networks,” IEEE Trans. Wireless Commun., 2020, to appear.
[3] J. Konečný, H. B. McMahan, D. Ramage, and P. Richtárik, “Federated optimization: Distributed machine learning for on-device intelligence,” arXiv preprint arXiv:1610.02527, 2016.
[4] M. Bennis, M. Debbah, K. Huang, and Z. Yang, “Communication technologies for efficient edge learning,” IEEE Commun. Magazine, 2020, to appear.
[5] G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 491–506, 2019.
[6] G. Zhu, Y. Du, D. Gunduz, and K. Huang, “One-bit over-the-air aggregation for communication-efficient federated edge learning: Design and convergence analysis,” arXiv preprint arXiv:2001.05713, 2020.
[7] Q. Zeng, Y. Du, K. Huang, and K. K. Leung, “Energy-efficient resource management for federated edge learning with CPU-GPU heterogeneous computing,” arXiv preprint arXiv:2007.07122, 2020.
[8] M. M. Amiri and D. Gündüz, “Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air,” IEEE Trans. Signal Process., vol. 68, pp. 2155–2169, 2020.
[9] D. Gunduz, D. B. Kurka, M. Jankowski, M. M. Amiri, E. Ozfatura, and S. Sreekumar, “Communicate to learn at the edge,” arXiv preprint arXiv:2009.13269, 2020.
[10] M. M. Amiri and D. Gündüz, “Federated learning over wireless fading channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3546–3557, 2020.
[11] S. Hosseinalipour, C. G. Brinton, V. Aggarwal, H. Dai, and M. Chiang, “From federated learning to fog learning: Towards large-scale distributed machine learning in heterogeneous wireless networks,” arXiv preprint arXiv:2006.03594, 2020.
[12] S. Hosseinalipour, S. S. Azam, C. G. Brinton, N. Michelusi, V. Aggarwal, D. J. Love, and H. Dai, “Multi-stage hybrid federated learning over large-scale wireless fog networks,” arXiv preprint arXiv:2007.09511, 2020.
[13] R. Jin, X. He, and H. Dai, “On the design of communication efficient federated learning over wireless networks,” arXiv preprint arXiv:2004.07351, 2020.
[14] D. Liu and O. Simeone, “Privacy for free: Wireless federated learning via uncoded transmission with adaptive power control,” arXiv preprint arXiv:2006.05459, 2020.
[15] R. Kassab and O. Simeone, “Federated generalized Bayesian learning via distributed Stein variational gradient descent,” arXiv preprint arXiv:2009.06419, 2020.
[16] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., “Advances and open problems in federated learning,” arXiv preprint arXiv:1912.04977, 2019.
[17] S. Samarakoon, M. Bennis, W. Saad, and M. Debbah, “Distributed federated learning for ultra-reliable low-latency vehicular communications,” IEEE Transactions on Communications, vol. 68, no. 2, pp. 1146–1159, 2019.
[18] Z. Yang, M. Chen, W. Saad, C. S. Hong, and M. Shikh-Bahaei, “Energy efficient federated learning over wireless communication networks,” IEEE Trans. Wireless Commun., 2020, to appear.
[19] B. Liu, L. Wang, M. Liu, and C. Xu, “Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems,” arXiv preprint arXiv:1901.06455, 2019.
[20] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Process. Magazine, vol. 37, no. 3, pp. 50–60, 2020.
[21] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, “Federated learning in mobile edge networks: A comprehensive survey,” IEEE Communications Surveys & Tutorials, 2020.
[22] M. Murshed, C. Murphy, D. Hou, N. Khan, G. Ananthanarayanan, and F. Hussain, “Machine learning at the network edge: A survey,” arXiv preprint arXiv:1908.00080, 2019.
[23] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen, “In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning,” IEEE Network, vol. 33, no. 5, pp. 156–165, 2019.
[24] J. Park, S. Samarakoon, M. Bennis, and M. Debbah, “Wireless network intelligence at the edge,”
Proceedings of the IEEE, vol. 107, no. 11, pp. 2204–2239, 2019.
[25] M. Aledhari, R. Razzak, R. M. Parizi, and F. Saeed, “Federated learning: A survey on enabling technologies, protocols, and applications,” IEEE Access, vol. 8, pp. 140699–140725, 2020.
[26] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” IEEE Trans. Wireless Commun., pp. 1–1, 2020.
[27] E. Basar, M. Di Renzo, J. de Rosny, M. Debbah, M.-S. Alouini, and R. Zhang, “Wireless communications through reconfigurable intelligent surfaces,” arXiv preprint arXiv:1906.09490, 2019.
[28] S. Zhang and R. Zhang, “Capacity characterization for intelligent reflecting surface aided MIMO communication,” arXiv preprint arXiv:1910.01573, 2019.
[29] S. Hu, K. Chitti, F. Rusek, and O. Edfors, “User assignment with distributed large intelligent surface (LIS) systems,” in Proc. IEEE Int. Symposium Personal, Indoor Mobile Radio Commun., Bologna, Italy, Sep. 2018, pp. 1–6.
[30] C. Pan, H. Ren, K. Wang, W. Xu, M. Elkashlan, A. Nallanathan, and L. Hanzo, “Intelligent reflecting surface for multicell MIMO communications,” arXiv preprint arXiv:1907.10864, 2019.
[31] Q.-U.-A. Nadeem, A. Kammoun, A. Chaaban, M. Debbah, and M.-S. Alouini, “Large intelligent surface assisted MIMO communications,” arXiv preprint arXiv:1903.08127, 2019.
[32] L. Wei, C. Huang, G. C. Alexandropoulos, Z. Yang, C. Yuen, and Z. Zhang, “Joint channel estimation and signal recovery in RIS-assisted multi-user MISO communications,” arXiv preprint arXiv:2011.13116, 2020.
[33] C. Huang, A. Zappone, G. C. Alexandropoulos, M. Debbah, and C. Yuen, “Reconfigurable intelligent surfaces for energy efficiency in wireless communication,” IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 4157–4170, Aug. 2019.
[34] C. Huang, R. Mo, and C. Yuen, “Reconfigurable intelligent surface assisted multiuser MISO systems exploiting deep reinforcement learning,” IEEE J. Sel. Areas Commun., vol. 38, pp. 1–1, 2020.
[35] C. Huang, S. Hu, G. C. Alexandropoulos, A. Zappone, C. Yuen, R. Zhang, M. Di Renzo, and M. Debbah, “Holographic MIMO surfaces for 6G wireless networks: Opportunities, challenges, and trends,” arXiv preprint arXiv:1911.12296, 2019.
[36] X. Yu, D. Xu, Y. Sun, D. W. K. Ng, and R. Schober, “Robust and secure wireless communications via intelligent reflecting surfaces,” arXiv preprint arXiv:1912.01497, 2019.
[37] B. Zheng, C. You, and R. Zhang, “Double-IRS assisted multi-user MIMO: Cooperative passive beamforming design,” arXiv preprint arXiv:2008.13701, 2020.
[38] C. Chaccour, M. N. Soorki, W. Saad, M. Bennis, and P. Popovski, “Risk-based optimization of virtual reality over terahertz reconfigurable intelligent surfaces,” arXiv preprint arXiv:2002.09052, 2020.
[39] S. V. Hum and J. Perruisseau-Carrier, “Reconfigurable reflectarrays and array lenses for dynamic antenna beam control: A review,” IEEE Trans. Antennas Prop., vol. 62, no. 1, pp. 183–198, Jan. 2013.
[40] J. Huang, Q. Li, Q. Zhang, G. Zhang, and J. Qin, “Relay beamforming for amplify-and-forward multi-antenna relay networks with energy harvesting constraint,” IEEE Signal Process. Lett., vol. 21, no. 4, pp. 454–458, Apr. 2014.
[41] K. Ntontin, M. Di Renzo, J. Song, F. Lazarakis, J. de Rosny, D.-T. Phan-Huy, O. Simeone, R. Zhang, M. Debbah, G. Lerosey, M. Fink, S. Tretyakov, and S. Shamai, “Reconfigurable intelligent surfaces vs. relaying: Differences, similarities, and performance comparison,” arXiv preprint arXiv:1908.08747, 2019.
[42] A. M. Elbir and S. Coleri, “Federated learning for channel estimation in conventional and IRS-assisted massive MIMO,” arXiv preprint arXiv:2008.10846, 2020.
[43] K. Yang, Y. Shi, Y. Zhou, Z. Yang, L. Fu, and W. Chen, “Federated machine learning for intelligent IoT via reconfigurable intelligent surface,” arXiv preprint arXiv:2004.05843, 2020.
[44] W. Ni, Y. Liu, Z. Yang, H. Tian, and X. Shen, “Federated learning in multi-RIS aided systems,” arXiv preprint arXiv:2010.13333, 2020.
[45] C. Huang, G. C. Alexandropoulos, C. Yuen, and M. Debbah, “Indoor signal focusing with deep learning designed reconfigurable intelligent surfaces,” in . IEEE, 2019, pp. 1–5.
[46] C. Huang, Z. Yang, G. C. Alexandropoulos, K. Xiong, L. Wei, C. Yuen, and Z. Zhang, “Hybrid beamforming for RIS-empowered multi-hop terahertz communications: A DRL-based method,” arXiv preprint arXiv:2009.09380, 2020.
[47] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” arXiv preprint arXiv:2006.10685, 2020.
[48] H. Xie and Z. Qin, “A lite distributed semantic communication system for internet of things,” arXiv preprint arXiv:2007.11095, 2020.
[49] M. Chen, O. Semiari, W. Saad, X. Liu, and C. Yin, “Federated echo state learning for minimizing breaks in presence in wireless virtual reality networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 177–191, 2019.
[50] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Artificial neural networks-based machine learning for wireless networks: A tutorial,”
IEEE Commun. Surveys Tutorials, vol. 21, no. 4, pp. 3039–3071, Fourthquarter 2019.
[51] Z. Ding, Y. Liu, J. Choi, Q. Sun, M. Elkashlan, C.-L. I, and H. V. Poor, “Application of non-orthogonal multiple access in LTE and 5G networks,” IEEE Commun. Mag., vol. 55, no. 2, pp. 185–191, Feb. 2017.
[52] Y. Liu, Z. Qin, M. Elkashlan, Z. Ding, A. Nallanathan, and L. Hanzo, “Nonorthogonal multiple access for 5G and beyond,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2347–2381, Dec. 2017.
[53] Y. Liu, H. Xing, C. Pan, A. Nallanathan, M. Elkashlan, and L. Hanzo, “Multiple-antenna-assisted non-orthogonal multiple access,” IEEE Wireless Commun., vol. 25, no. 2, pp. 17–23, Apr. 2018.
[54] Z. Qin, X. Yue, Y. Liu, Z. Ding, and A. Nallanathan, “User association and resource allocation in unified NOMA enabled heterogeneous ultra dense networks,” IEEE Commun. Mag., vol. 56, no. 6, pp. 86–92, June 2018.
[55] Z. Yang, W. Xu, C. Pan, Y. Pan, and M. Chen, “On the optimality of power allocation for NOMA downlinks with individual QoS constraints,” IEEE Commun. Lett., vol. 21, no. 7, pp. 1649–1652, July 2017.
[56] Z. Yang, C. Pan, W. Xu, Y. Pan, M. Chen, and M. Elkashlan, “Power control for multi-cell networks with non-orthogonal multiple access,” IEEE Trans. Wireless Commun., vol. 17, no. 2, pp. 927–942, Feb. 2018.
[57] W. Shin, M. Vaezi, B. Lee, D. J. Love, J. Lee, and H. V. Poor, “Non-orthogonal multiple access in multi-cell networks: Theory, performance, and practical challenges,” IEEE Commun. Mag., vol. 55, no. 10, pp. 176–183, Oct. 2017.
[58] W. Ni, X. Liu, Y. Liu, H. Tian, and Y. Chen, “Resource allocation for multi-cell IRS-aided NOMA networks,” arXiv preprint arXiv:2006.11811, 2020.
[59] C. Andrieu, N. De Freitas, A. Doucet, and M. I. Jordan, “An introduction to MCMC for machine learning,” Machine Learning, vol. 50, no. 1-2, pp. 5–43, Jan. 2003.
[60] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, “Learning low-level vision,” International Journal of Computer Vision, vol. 40, no. 1, pp. 25–47, Oct. 2000.
[61] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proc. of the International Conference on Machine Learning, New York, NY, USA, July 2008, pp. 160–167.
[62] C. M. Bishop,
Pattern recognition and machine learning. Springer, 2006.
[63] H. Lee, M. Wicke, B. Kusy, O. Gnawali, and L. Guibas, “Predictive data delivery to mobile users through mobility learning in wireless sensor networks,” IEEE Trans. Veh. Technol., vol. 64, no. 12, pp. 5831–5849, Dec. 2015.
[64] L. Yao, A. Chen, J. Deng, J. Wang, and G. Wu, “A cooperative caching scheme based on mobility prediction in vehicular content centric networks,” IEEE Trans. Veh. Technol., vol. 67, no. 6, pp. 5435–5444, June 2018.
[65] M. Chen, M. Mozaffari, W. Saad, C. Yin, M. Debbah, and C. S. Hong, “Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience,” IEEE J. Sel. Areas Commun., vol. 35, no. 5, pp. 1046–1061, May 2017.
[66] J. Yin, L. Li, H. Zhang, X. Li, A. Gao, and Z. Han, “A prediction-based coordination caching scheme for content centric networking,” in Proc. of Wireless and Optical Communication Conference, Hualien, Taiwan, April 2018.
[67] Z. Yang, W. Xu, H. Xu, J. Shi, and M. Chen, “Energy efficient non-orthogonal multiple access for machine-to-machine communications,” IEEE Commun. Lett., vol. 21, no. 4, pp. 817–820, Apr. 2017.
[68] Z. Yang, W. Xu, Y. Pan, C. Pan, and M. Chen, “Energy efficient resource allocation in machine-to-machine communications with multiple access and energy harvesting for IoT,” IEEE Internet Things J., vol. 5, no. 1, pp. 229–245, Feb. 2018.
[69] J. Cui, Z. Ding, P. Fan, and N. Al-Dhahir, “Unsupervised machine learning-based user clustering in millimeter-wave-NOMA systems,” IEEE Trans. Wireless Commun., vol. 17, no. 11, pp. 7425–7440, Nov. 2018.
[70] M. Li, L. Zhou, Z. Yang, A. Li, F. Xia, D. G. Andersen, and A. Smola, “Parameter server for distributed machine learning,” in Big Learning NIPS Workshop, vol. 6, 2013, p. 2.
[71] R. Bekkerman, M. Bilenko, and J. Langford, Scaling up machine learning: Parallel and distributed approaches. Cambridge University Press, 2011.
[72] M. Kim, N.-I. Kim, W. Lee, and D.-H. Cho, “Deep learning-aided SCMA,” IEEE Commun. Lett., vol. 22, no. 4, pp. 720–723, 2018.
[73] S. Samarakoon, M. Bennis, W. Saad, and M. Debbah, “Federated learning for ultra-reliable low-latency V2V communications,” in Proc. IEEE Global Commun. Conf., Abu Dhabi, United Arab Emirates, Dec. 2018, pp. 1–7.
[74] M. Chen, H. V. Poor, W. Saad, and S. Cui, “Convergence time optimization for federated learning over wireless networks,” IEEE Trans. Wireless Commun., 2020, to appear.
[75] H. H. Yang, Z. Liu, T. Q. Quek, and H. V. Poor, “Scheduling policies for federated learning in wireless networks,” IEEE Transactions on Communications, vol. 68, no. 1, pp. 317–333, 2019.
[76] C. Ma, J. Li, M. Ding, H. H. Yang, F. Shu, T. Q. Quek, and H. V. Poor, “On safeguarding privacy and security in the framework of federated learning,”