Artificial Intelligence Driven UAV-NOMA-MEC in Next Generation Wireless Networks

Abstract

Driven by the unprecedentedly high throughput and low latency requirements of next-generation wireless networks, this paper introduces an artificial intelligence (AI) enabled framework in which unmanned aerial vehicles (UAVs) use non-orthogonal multiple access (NOMA) and mobile edge computing (MEC) techniques to serve terrestrial mobile users (MUs). The proposed framework enables the terrestrial MUs to offload their computational tasks simultaneously, intelligently, and flexibly, thus enhancing their connectivity as well as reducing their transmission latency and energy consumption. To this end, the fundamentals of this framework are first introduced. Then, a number of communication and AI techniques are proposed to improve the quality of experience of terrestrial MUs. In particular, federated learning and reinforcement learning are introduced for intelligent task offloading and computing resource allocation. For each learning technique, motivations, challenges, and representative results are introduced. Finally, several key technical challenges and open research issues of the proposed framework are summarized.


arXiv preprint [eess.SP], Jan.

A MANUSCRIPT SUBMITTED TO THE IEEE WIRELESS COMMUNICATIONS

Zhong Yang, Student Member, IEEE, Mingzhe Chen, Member, IEEE, Xiao Liu, Student Member, IEEE, Yuanwei Liu, Senior Member, IEEE, Yue Chen, Senior Member, IEEE, Shuguang Cui, Fellow, IEEE, and H. Vincent Poor, Fellow, IEEE


Z. Yang, X. Liu, Y. Liu and Y. Chen are with Queen Mary University of London, London, UK (email: {zhong.yang, x.liu, yuanwei.liu, yue.chen}@qmul.ac.uk).

M. Chen is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA, and also with the Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen 518172, China (e-mail: [email protected]).

S. Cui is with the Shenzhen Research Institute of Big Data, and the Future Network of Intelligence Institute, The Chinese University of Hong Kong, Shenzhen 518172, China (e-mail: [email protected]).

H. V. Poor is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]).

I. INTRODUCTION

In next generation wireless networks, the stringent delay requirements of services and applications, such as virtual reality, augmented reality, holographic telepresence, industry 4.0, and robotics, are considerably restricted by the finite battery and computing resources of terrestrial mobile users (MUs) and terrestrial access points (APs). In order to satisfy these stringent requirements, novel highly efficient techniques, such as mobile edge computing (MEC) [1], non-orthogonal multiple access (NOMA) [2], unmanned aerial vehicles (UAVs) [3, 4], and artificial intelligence (AI) algorithms [5], should be thoroughly investigated for next generation wireless networks.

In this light, early research articles have studied these techniques to effectively exploit the performance enhancement for next generation wireless networks. In [1], fog computing is introduced for mobile networks, which is capable of achieving higher capacity than conventional communication networks. The authors in [3] investigate both cellular-enabled UAV communication and UAV-aided cellular communication and optimize the trajectory of the UAV subject to practical communication connectivity constraints. Reference [4] minimizes the sum energy consumption of MUs and UAVs in a UAV-MEC network by jointly optimizing the user association, power control, computing resources allocation, and location planning. A disaster resilient three-layered architecture is proposed in [6], in which UAV layers are integrated with edge computing to enable emergency communication links. In UAV-NOMA-MEC systems, a critical challenge is task offloading decision-making and computing resources allocation. Moreover, a natural approach to task offloading and computing resources allocation is to combine them. For this reason, they are often formulated as a mixed integer programming (MIP) problem [7, 8]. In [7], the authors proposed a joint optimization approach to allocate both the communication resources and computing resources for NOMA-MEC networks, while minimizing the total energy consumption of MUs. The authors in [8] minimize the energy consumption by adjusting the computing resources and transmit power of the APs.

MEC is a promising technique for next generation wireless networks, which moves the computing resources of central networks towards the network edges, closer to MUs. MEC is capable of significantly improving the computing performance of MUs with low energy consumption. NOMA, with high bandwidth efficiency and ultra high connectivity, is an emerging technique in next generation wireless networks. In UAV-NOMA-MEC, NOMA is capable of enabling the simultaneous offloading of multiple computational tasks from a large number of MUs under stringent spectrum constraints. In UAV-NOMA-MEC systems, UAVs are equipped with computing capabilities and can thus be swiftly deployed in emergency situations when terrestrial MEC servers are overloaded or unavailable to MUs. There are two aspects to the combination of UAVs and communication, namely, UAV aided communications and communication for UAV operations.

For the first aspect, UAV aided communication has been recognized as an emerging technique due to its superior flexibility and autonomy [9]. For the second aspect, the operational control of the UAVs often relies on wireless communication, which introduces difficult challenges for spectrum allocation and interference cancellation.

With the rapid progression of artificial intelligence (AI) and high-performance computing workstations, the integration of AI and UAV-NOMA-MEC is a promising direction to obtain an efficient joint resource allocation solution in an intelligent fashion. Firstly, deep reinforcement learning (DRL) is a model-free solution for efficient decision-making problems, such as task offloading decision and computing resources allocation in UAV-NOMA-MEC systems. Secondly, the strong fitting capability of deep neural networks (DNNs) offers a novel approach to predicting the computational tasks in UAV-NOMA-MEC systems, which can be used to further improve the performance of the above-mentioned resource allocation solutions. Moreover, a recently proposed federated learning (FL) model is capable of further enhancing the training efficiency of the DRL and DNNs.

The above challenges motivate us to consider an AI enabled UAV-NOMA-MEC framework in this paper, the rest of which is organized as follows. In Section II, the system structure for the proposed UAV-NOMA-MEC framework is presented. In Section III, FL enabled task prediction for UAV-NOMA-MEC is investigated. The deployment design for UAV-NOMA-MEC is given in Section IV. AI enabled joint resource allocation for UAV-NOMA-MEC is presented in Section V, before we conclude this work in Section VI. Table I provides a summary of advantages and disadvantages of AI solutions for UAV-NOMA-MEC systems.

Fig. 1. Network structure of UAV-NOMA-MEC. (Figure omitted; diagram labels: UAV-NOMA-MEC structure, central network, cloud networks, APs, UAV platforms 1 and 2 with computing, backhaul links, data collection (wireless data, social data, cloud data), AI solutions (deep reinforcement learning, deep learning, federated learning), NOMA uplink for computational tasks, NOMA downlink for task computational results, superposition coding (SC), successive interference cancellation (SIC).)

II. AI ENABLED UAV-NOMA-MEC SYSTEM STRUCTURE

A. Structure for AI-enabled UAV-NOMA-MEC system

Fig. 1 illustrates the network structure of UAV-NOMA-MEC, which consists of a central network, multiple UAV platforms with computing capabilities, multiple APs with computing capabilities, and MUs. MUs can be mobile smart devices or UAV platforms. Each MU has a computational task, which must be processed within a time constraint. In Fig. 1, UAVs and other MUs are clustered as one group, which has heterogeneous channel conditions and is thus appropriate for NOMA uplink transmission. The MUs communicate with the UAV platforms or APs using the NOMA techniques, while the communication between APs and the central network is also conducted by the NOMA techniques. MUs can offload their computational tasks to the UAV platforms and APs located within their communication range. Moreover, the mobility of MUs and UAVs has heterogeneous characteristics, which is challenging for resource allocation. To allocate the communication resources and computing resources efficiently, we need to predict the mobility of MUs' tasks. Then, based on the predicted task mobility, UAVs are deployed accordingly. Lastly, task offloading decision and computing resources allocation are implemented with the proposed AI algorithms.

B. Task Mobility Prediction for UAV-NOMA-MEC

Due to the mobility of MUs and computational tasks in UAV-NOMA-MEC networks, the requested computational tasks vary over time. Therefore, the computing resources allocation and the task offloading decision must be conducted dynamically according to the task mobility. To efficiently allocate computing resources in UAV-NOMA-MEC, some prior information is required, e.g., the future task mobility. The recent advances in AI have provided novel approaches to predict the task mobility. The advantage of AI algorithms is that they can train a learning model to capture the complex relationship between the future task mobility and the historical task mobility, which is non-trivial for conventional approaches. Therefore, we propose AI algorithms for task mobility prediction, which serves as prior information for joint resource allocation (e.g., bandwidth, storage capacity, and computing speed).

C. Techniques of UAV-NOMA-MEC based frameworks

1) UAVs for NOMA-MEC networks:

UAVs have attracted research attention from both academia and industry for next generation wireless networks, because UAVs are easy to deploy in various scenarios to support services such as rapid emergency communication response and accurate observation services. In these services, UAVs are deployed as relays to support MUs with line-of-sight (LOS) wireless channels.

TABLE I
ADVANTAGES AND DISADVANTAGES OF ARTIFICIAL INTELLIGENT SOLUTIONS FOR UAV-NOMA-MEC

AI solutions | Advantages for UAV-NOMA-MEC | Disadvantages for UAV-NOMA-MEC
Deep neural networks (DNN) | (a) distinguished fitting capabilities of task prediction; (b) complex non-linear relationships of task prediction | (a) require large amount of labelled input/output wireless data, social data and cloud data; (b) overfitting problem for task prediction
Deep reinforcement learning (DRL) | (a) does not need labelled training data for resource allocation; (b) similar with humans resource allocation experience | large action space and state space with the increasing number of UAVs, BSs, and MUs
Federated learning (FL) | (a) privacy preserving for sensitive MUs in UAV-NOMA-MEC; (b) high training efficiency for task prediction | local network (UAVs network, BSs networks, and MUs networks) failure affects the global network

When the computing capabilities of APs and MUs are not sufficient for computing massive tasks, deploying computing resources quickly to MUs is a major challenge for NOMA-MEC networks. UAVs can be deployed dynamically according to the requirements of MUs, and are thus an efficient complement to NOMA-MEC networks. In the proposed UAV-NOMA-MEC networks, UAVs play two roles, i.e., UAVs as base stations and UAVs as users. As base stations, UAVs are integrated with computing resources and can be deployed dynamically for emergency use. However, deploying UAVs in UAV-NOMA-MEC networks is challenging, and a large number of recent works have studied the deployment problem. Furthermore, in contrast to conventional terrestrial BS deployment, UAV placement is no longer a 2D placement problem but a 3D placement problem. As users, UAVs have computing-intensive tasks, which require a large amount of computing resources. Therefore, the UAVs can transmit the computational tasks to the MEC servers at the terrestrial AP using the NOMA technique. After computing, the tasks' computing results are transmitted back to the UAVs using the NOMA technique.

2) NOMA for UAV-MEC networks:

For UAV-MEC networks, choosing a suitable transmission mechanism for offloading the computational tasks is a key challenge for reducing the computing delay. Different from orthogonal multiple access (OMA) in UAV-MEC, NOMA can ensure that multiple computational tasks are offloaded from MUs to UAV platforms or terrestrial MEC servers within the same time/frequency resource block (RB), which is capable of significantly reducing the computation latency of MUs. For this reason, we adopt NOMA in UAV-MEC networks to better utilize the capacity of the communication channel for computational task offloading, and consequently reduce the task computational latency for multiuser UAV-MEC networks.
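As a rough illustration of why sharing one RB can help, the following minimal sketch compares per-MU uplink rates under NOMA with successive interference cancellation (SIC) against an equal time split of the same RB under OMA. It assumes a simple Shannon-rate model, a receiver that decodes the strongest received signal first, and made-up powers, channel gains, and noise level; the function names `noma_uplink_rates` and `oma_uplink_rates` are hypothetical and do not come from the paper.

```python
import math

def noma_uplink_rates(powers, gains, noise):
    """Per-user rates (bit/s/Hz) for uplink NOMA with SIC.

    Assumption: the receiver decodes users in descending order of received
    power and treats the not-yet-decoded users as interference.
    """
    received = [p * g for p, g in zip(powers, gains)]
    order = sorted(range(len(received)), key=lambda i: received[i], reverse=True)
    rates = [0.0] * len(received)
    remaining = sum(received)
    for i in order:
        remaining -= received[i]  # power of the signals still to be decoded
        rates[i] = math.log2(1.0 + received[i] / (remaining + noise))
    return rates

def oma_uplink_rates(powers, gains, noise):
    """Per-user rates when the RB is split equally in time among users (OMA)."""
    n = len(powers)
    return [math.log2(1.0 + p * g / noise) / n for p, g in zip(powers, gains)]

# Illustrative values only: one strong-channel MU and one weak-channel MU.
powers, gains, noise = [1.0, 1.0], [10.0, 1.0], 1.0
noma = noma_uplink_rates(powers, gains, noise)
oma = oma_uplink_rates(powers, gains, noise)
print("NOMA sum rate:", sum(noma))
print("OMA sum rate:", sum(oma))
```

With these illustrative numbers the NOMA sum rate is about 3.58 bit/s/Hz against about 2.23 bit/s/Hz for the equal time split, so a task of a given size would take less time to upload. The gap depends strongly on the assumed channel gains, which is consistent with the text's remark that heterogeneous channel conditions suit NOMA.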

3) AI for UAV-NOMA-MEC networks:

The recent advances in AI offer promising approaches to tackle the new challenges in UAV-NOMA-MEC. For instance, in order to efficiently allocate the limited communication and computing resources in UAV-NOMA-MEC, deep learning (DL) oriented algorithms can be used to predict task popularity more accurately, going beyond conventional approaches [10]. Furthermore, deep reinforcement learning (DRL) algorithms can be utilized to solve stochastic optimization problems, which may not be computationally feasible with conventional optimization approaches. In fact, the time complexity of optimal solutions for the joint resource allocation problems arising in UAV-NOMA-MEC increases exponentially with the number of involved variables (e.g., number of MUs, number of UAVs, etc.). To this end, we propose an AI based solution for the UAV-NOMA-MEC framework. Table I presents the advantages and disadvantages of the AI based solutions for the UAV-NOMA-MEC framework.

D. AI-enabled UAV-NOMA-MEC Network Optimization for Serving Terrestrial Users and UAVs

The considered UAV-NOMA-MEC framework contains several optimization problems, including task prediction, UAV deployment, user association, signal processing, and joint resource allocation. The predicted tasks serve as prior information on MUs' requirements, which is used in the subsequent optimizations. With the predicted requirements, the UAVs are deployed accordingly using reinforcement learning (RL) approaches, since RL algorithms are suitable for UAV deployment optimization. Then, how to associate the UAVs and terrestrial MEC servers is another critical problem. In the end, the joint optimization of task offloading and computing resources allocation can be solved by the proposed DRL algorithms.

III. FEDERATED LEARNING ENABLED USER PREDICTION FOR UAV-NOMA-MEC

In this section, we first explain why we need to use FL for computing resources allocation in the proposed framework. Then, we discuss the challenges of using FL for the proposed framework. Finally, we use an example to show the implementation of FL for optimizing computational and task allocation in the proposed framework.

A. Motivations

Due to the mobility of UAVs and dynamic computational requests, as shown in Fig. 2(a), a given terrestrial MU in the proposed framework must offload its computational tasks to different UAVs at different time periods. Hence, for a given terrestrial MU, its historical information related to computational tasks and user connection will be distributed across multiple UAVs. Due to the large data size of each user's historical information, the limited energy of each UAV, and privacy concerns, UAVs may not be able to share each MU's historical information with other UAVs. Therefore, it is necessary to design a novel distributed machine learning (ML) algorithm that enables the UAVs to train a common ML model that can accurately learn the entire future computational requests of each MU without data exchange. FL is one such type of distributed learning algorithm, which enables the UAVs to exchange their trained ML parameters to generate a common ML model for computational request predictions [11]. In particular, FL is trained by an iterative distributed process. At each iteration, each UAV must first use its collected data to train its local ML model and share its ML parameters with other UAVs, as shown in Fig. 2(b). Then, each UAV will use the ML parameters received from other UAVs and its own ML parameters to generate a common ML model. After that, each UAV will use the generated common ML model to update its own local ML model. After several iterations, each UAV can find a common ML model that can predict the entire future computational task requests of each MU.

Fig. 2. FL over UAV-NOMA-MEC Networks. (a) Dynamic user association in UAV-NOMA-MEC networks. (b) Implementation of FL over UAV-NOMA-MEC UAVs. (Figure omitted.)

B. Challenges

Implementation of FL over UAVs also faces several challenges. First, due to their mobility and limited battery capacity, UAVs may not be able to participate in all FL training iterations. In particular, the number of UAVs that can participate varies across FL iterations, which affects the FL training loss. To this end, a UAV scheduling scheme must be designed so as to minimize the FL training loss and convergence time. Meanwhile, due to their high flying speed, UAVs must complete the FL training process in a strict time period. Hence, an efficient FL training method is needed for UAVs. In addition, UAVs in the proposed framework must provide computational services for the terrestrial MUs. Therefore, UAVs must find a tradeoff between the energy consumed for FL training and that consumed for providing computational services to terrestrial MUs. Finally, in the proposed framework, each UAV must communicate with both terrestrial MUs and other UAVs. The mutual interference between such aerial and terrestrial systems will significantly affect the data rate of computational task transmission and ML parameter transmission, thus increasing the FL convergence time and computational time. In consequence, there is a need to jointly optimize wireless resource allocation and deployment for UAVs so as to minimize the mutual interference.

Fig. 3. FL for Proactive User Association. (a) Prediction accuracy as the number of MUs varies (axes: number of users, prediction accuracy; curves: centralized learning, separate learning, federated learning). (b) Predicted user association as the data size of computational tasks varies (axes: time slots, the UAV index, number of kilobits; curves: optimal user association, federated learning, data size of each computational task). (Figure omitted.)

C. Representative Result

Next, we use two simulation figures to show the performance of using FL for the proposed framework. The simulation settings are based on our previous work [12]. In particular, Fig. 3 shows the performance of using FL for proactively determining user association. Given the future user association, one can use optimization theory to optimize the task allocation and resource allocation. From Fig. 3(a), we can see that FL can achieve a better accuracy compared to separate learning. This is because FL enables UAVs to cooperatively generate a common ML model and hence improves prediction accuracy. Meanwhile, as the number of MUs increases, the gap between centralized learning and FL decreases. However, different from centralized learning, which requires UAVs to share their data, FL only needs the UAVs to share their learned ML parameters, thus improving data privacy for the UAVs. Fig. 3(b) shows how the predicted user association changes as the data size of computational tasks varies. From this figure, we can see that FL can accurately determine the user association as the data size of computational tasks varies. This is because the user association variable is binary, and hence small FL prediction errors may not significantly affect the accuracy of the optimal user association prediction.
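The iterative FL procedure described in this section (local training at each UAV, exchange of ML parameters, generation of a common model, and local model update) can be sketched as follows. This is only a toy illustration under stated assumptions: the model is a one-parameter linear predictor, the common model is formed by a data-size-weighted average of the local parameters (a FedAvg-style rule that the text does not specify), and the datasets and the function names `local_update`, `federated_round`, and `train` are hypothetical.

```python
def local_update(w, data, lr=0.1, steps=5):
    """A few gradient steps on one UAV's local squared-error loss for y ~ w * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_common, uav_datasets):
    """One FL iteration: local training, parameter exchange, and averaging."""
    local_ws = [local_update(w_common, data) for data in uav_datasets]
    sizes = [len(data) for data in uav_datasets]
    # Only the trained parameters are shared; the raw data stays on each UAV.
    # Assumption: the common model is the data-size-weighted average.
    return sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)

def train(uav_datasets, rounds=20):
    """Repeat FL iterations, starting every local model from the common model."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, uav_datasets)
    return w

# Made-up local datasets of (x, y) pairs held by three UAVs, all following y = 3x.
uav_datasets = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(1.5, 4.5), (1.0, 3.0), (0.2, 0.6)],
]
print(train(uav_datasets))
```

In this toy setting the common parameter approaches 3 even though no UAV ever sees another UAV's data. A practical design would also have to handle UAVs that miss some iterations, which is one of the challenges listed above.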

IV. DEPLOYMENT DESIGN FOR UAV-NOMA-MEC

As mentioned above, heterogeneous network segments, including heterogeneous user mobility, tele-traffic demand, and computing resource requirements, impose significant challenges on conventional terrestrial MEC networks. In an effort to tackle these challenges, the terrestrial MEC networks may be intrinsically amalgamated with UAV-aided wireless networks to form air-ground integrated mobile edge computing networks. Compared to conventional terrestrial NOMA-MEC networks, UAVs can be dynamically deployed closer to MUs than terrestrial APs, which leads to improved performance. Additionally, as shown in Fig. 4, in the NOMA-MEC networks, dynamic deployment design of UAVs is capable of making the channel conditions of NOMA MUs more suitable for the NOMA policy, which improves the system performance. In contrast to the UAV-aided wireless networks, where UAVs act as aerial base stations or aerial relays to improve the QoS of MUs, by integrating UAVs into NOMA-MEC networks, UAVs can act as aerial MEC servers to execute computational tasks from MUs. Since heterogeneous user mobility is considered, UAVs have to be re-deployed in step with the movement of MUs. Moreover, when a particular MU has a special request, UAVs can adaptively change their positions to satisfy the MU's requirement. Furthermore, as illustrated in Fig. 4, in the NOMA-MEC networks where delay-sensitive MUs and delay-tolerant MUs are partitioned into the same cluster, UAVs can fly towards delay-sensitive MUs to minimize the sum delay of all MUs. In this section, we discuss the deployment and trajectory design for UAV-NOMA-MEC networks. Based on the network structure of UAV-NOMA-MEC networks, we focus on the AI-based solutions for designing the deployment of UAVs, because UAVs operate in a complex time-variant hybrid environment, where the classic mathematical models have limited accuracy. In contrast to the conventional gradient-based optimization techniques, RL approaches are capable of enabling UAVs to rapidly adapt their trajectories to the dynamic/uncertain environment by learning from their environment and historical experiences.

Fig. 4. Air-ground integrated NOMA-MEC networks. (Figure omitted; it shows the trajectories of UAV 1 through UAV n across timeslots, UE 1 (autonomous vehicle), UEs 2 to 4 (mobile users), a NOMA link for UEs with heterogeneous QoS requirement in the same NOMA cluster, and a NOMA link for UEs with heterogeneous mobility.)

In the RL-empowered UAV-NOMA-MEC networks, the RL model empowers agents to make observations and take actions within the environment, and in return, receive rewards. It learns by correcting mistakes through trial and error and aims to maximize the expected long-term reward. Hence, RL algorithms outperform the conventional algorithms in dynamic scenarios and in scenarios that require interaction with the environment. However, every approach has both advantages and disadvantages in the various scenarios of UAV-NOMA-MEC networks. RL models assume that the formulated problem is Markovian, which indicates that when the current state depends on more than the immediately preceding state, RL algorithms may fail to solve the problem. Additionally, when faced with simple scenarios, RL algorithms have no superiority because their optimality cannot be theoretically proved or strictly guaranteed.

The discussions of designing the architecture of the RL model in UAV-NOMA-MEC networks are listed as follows:

• Distributed or Centralized:

The advantage of the centralized RL model in UAV-NOMA-MEC networks is that the central controller (the base station or control center) has complete local information. Thus, it enables the agents (UAVs) to cooperate with each other and search for the optimal control policy collectively. However, the centralized design requires accurate instantaneous channel state information (CSI). Additionally, in the centralized ML model for UAV-NOMA-MEC networks, the central controller requires each agent to share its states and actions while searching for the optimal strategies. The formulated problem has to be solved by updating the control policy based on all agents' actions and states, which leads to increased complexity of the model. On the other hand, the aforementioned challenge can be solved by the distributed RL model. However, incomplete local information may lead to performance loss. Additionally, the distributed model causes unexpected state changes in neighboring areas and leads to complicated multi-agent competition.

• Continuous or Discrete:

RL algorithms can be divided into three categories, namely, value-based algorithms, policy-based algorithms, and actor-critic algorithms. When discrete positions are considered, value-based RL algorithms are more suitable for designing the trajectory of UAVs. However, when the discrete trajectory design problem is coupled with a continuous task/resource allocation problem, how to design an RL model with both a continuous state space and a discrete state space is challenging.

The problem of UAVs' trajectory design is coupled with other problems such as task offloading and computing resource allocation, which will be discussed in the next sections. UAVs' trajectory design problem can be jointly tackled with the other problems by adopting the RL solutions introduced in this section. In terms of challenges in UAV-NOMA-MEC networks, before fully reaping the benefits of integrating UAVs into NOMA-MEC networks, some of the weaknesses of UAVs, such as their limited coverage area, meagre energy supply, and limited backhaul, have to be mitigated.

V. AI ENABLED JOINT RESOURCE ALLOCATION IN UAV-NOMA-MEC

Fig. 5. DRL based solution for joint resource allocation in UAV-NOMA-MEC. (Figure omitted; diagram labels: intelligent agent (i. UAVs as agents, ii. BSs as agents, iii. UEs as agents), DNNs, task offloading decision (local computing, offloading, MEC computing, cloud computing, downloading), computing allocation (MEC computing allocation, cloud computing allocation), UAV-NOMA-MEC networks. Step 1: observe current state, evaluate the performance. Step 2: take the selected action. Step 3: update the state according to the chosen action.)

In this section, we advocate DRL based solutions for joint resource allocation of task offloading and computing resources allocation in UAV-NOMA-MEC networks. The structure of the DRL based solution is presented in Fig. 5. The motivation of using DRL algorithms is to obtain an offline policy for the formulated joint optimization problem of task offloading and computing resources allocation.

A. Joint task offloading and computing resources allocation in UAV-NOMA-MEC

In multi-user UAV-NOMA-MEC networks, multiple MUs request task computing services. The key research challenge is joint resource allocation, i.e., task offloading decision and computing resources allocation. More particularly, offloading computational tasks simultaneously to one destination, such as a UAV or an MEC server, is capable of reducing task computing latency. In UAV-NOMA-MEC, the task offloading decision and computing resources allocation are coupled, because only the offloaded computational tasks need to be allocated computing power from the computing platforms, such as UAVs and MEC servers. Therefore, we formulate the task offloading decision and computing resources allocation as a joint optimization problem.

In the proposed UAV-NOMA-MEC networks, tasks are offloaded simultaneously using the NOMA technique, thus reducing the energy consumption of offloading and avoiding task transmission delay. Since both the UAVs and MEC servers have computing capabilities, the task offloading in UAV-NOMA-MEC networks has more than one destination. Furthermore, according to whether the computational tasks are segmented, there are two kinds of task offloading, namely, binary offloading and partial offloading.

1) Binary offloading of UAV-NOMA-MEC:

In the binary offloading of UAV-NOMA-MEC, the computational tasks are not segmented, so they are either computed locally at MUs or offloaded to UAVs and MEC servers for computing. The task offloading decision for this case is therefore to choose suitable destinations.

2) Partial offloading of UAV-NOMA-MEC:

In partial offloading of UAV-NOMA-MEC, the computational tasks are first divided into fragments. The offloading decision is then to decide which fragments are offloaded to a specific destination, which is more complex than binary offloading.
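To make the difference between the two offloading modes concrete, the sketch below evaluates task latency under a deliberately simple model. All of it is an assumption for illustration: latency is transmission time plus computing time, the time to download results is ignored, a partially offloaded task is arbitrarily divisible with its CPU cycles proportional to its bits, and the local and remote parts run in parallel. The function names and the numeric values are hypothetical.

```python
def local_latency(cycles, f_local):
    """Time to compute `cycles` CPU cycles on the MU itself."""
    return cycles / f_local

def offload_latency(bits, cycles, rate, f_server):
    """Upload time plus computing time at a UAV or MEC server."""
    return bits / rate + cycles / f_server

def binary_offloading(bits, cycles, f_local, destinations):
    """Binary offloading: pick the single option with the lowest latency.

    `destinations` maps a name to (uplink rate in bit/s, CPU frequency in Hz).
    """
    options = {"local": local_latency(cycles, f_local)}
    for name, (rate, f_server) in destinations.items():
        options[name] = offload_latency(bits, cycles, rate, f_server)
    best = min(options, key=options.get)
    return best, options[best]

def partial_offloading_latency(bits, cycles, f_local, rate, f_server, fraction):
    """Partial offloading: `fraction` of the task is offloaded, the rest is local."""
    local = local_latency((1.0 - fraction) * cycles, f_local)
    remote = offload_latency(fraction * bits, fraction * cycles, rate, f_server)
    return max(local, remote)  # both parts are assumed to run in parallel

# Illustrative task: 1 Mbit of input data requiring 1e9 CPU cycles.
destinations = {"uav": (5e6, 5e9), "mec": (2e6, 10e9)}
print(binary_offloading(1e6, 1e9, 1e9, destinations))
print(partial_offloading_latency(1e6, 1e9, 1e9, 5e6, 5e9, fraction=5 / 7))
```

Under these made-up numbers, binary offloading selects the UAV with a latency of 0.4 s, while offloading 5/7 of the task to the same UAV brings the latency down to roughly 0.29 s. The extra decision variable (the fraction, or more generally the fragment-to-destination mapping) is what makes partial offloading the harder problem.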

B. AI based solution for joint optimization in UAV-NOMA-MEC

The prosperity of AI algorithms provides effective and low-cost solutions that make UAV-NOMA-MEC adaptive to the dynamic radio environment. We adopt RL in UAV-NOMA-MEC because the mechanism of RL algorithms is to maximize a long-term reward by balancing exploration and exploitation, which makes them capable of solving the long-term optimization problem of joint task offloading and computing resources allocation [13, 14].

1) Q-learning for joint optimization:

In UAV-NOMA-MEC, our objective is to obtain an offline policy for the long-term optimization of the joint task offloading and computing resources allocation problem. Q-learning is one of the classic RL algorithms and is capable of selecting a suitable action to maximize the reward in a particular situation by training the Q-table. The reward function of Q-learning in UAV-NOMA-MEC is defined by the objective functions in the networks, e.g., energy consumption minimization, sum data rate maximization, computation latency minimization, etc. However, in the Q-learning algorithm, the action selection scheme is based on a random mechanism, such as ε-greedy.
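A minimal sketch of tabular Q-learning with ε-greedy action selection is given below. The update rule is the standard one; everything about the environment is a made-up toy for illustration, not the paper's simulation setup: two states (poor or good channel), two actions (compute locally or offload), a reward equal to the negative latency, and a channel that changes at random. The names `q_learning`, `toy_offloading_step`, and `LATENCY`, and all hyperparameter values, are hypothetical.

```python
import random

# Hypothetical latencies (seconds) for each (state, action) pair.
# state: 0 = poor channel, 1 = good channel; action: 0 = local, 1 = offload.
LATENCY = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 0.4}

def toy_offloading_step(state, action, rng):
    """Return (reward, next_state); the channel evolves independently of the action."""
    return -LATENCY[(state, action)], rng.randrange(2)

def q_learning(step, n_states, n_actions, steps=20000, alpha=0.05,
               gamma=0.9, epsilon=0.2, seed=0):
    """Train a Q-table with the standard Q-learning update and epsilon-greedy."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])  # exploit
        reward, next_state = step(state, action, rng)
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q

q_table = q_learning(toy_offloading_step, n_states=2, n_actions=2)
policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(2)]
print(policy)
```

In this toy problem the learned policy computes locally when the channel is poor and offloads when it is good. The Q-table has one entry per state-action pair, so its size grows quickly with the number of channels, MUs, and servers.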

2) Modified reinforcement learning for joint optimization:

In RL algorithms, how to select a suitable action given the feedback and the current state is critical. The action selection scheme has to balance exploration and exploitation and avoid over-fitting. The conventional ε-greedy method cannot balance the importance of the current reward and the future reward. Therefore, we proposed a Bayesian learning automata (BLA) based action selection scheme for the proposed modified RL algorithm in UAV-NOMA-MEC. The function of the BLA is to adaptively make decisions to obtain the best action for the intelligent agent from the action space offered by the UAV-NOMA-MEC environment it operates in. It is proven that the BLA based action selection scheme is capable of enabling every state to select the optimal action. The proposed BLA based RL algorithm achieves significant performance improvement over the conventional RL algorithm in UAV-NOMA-MEC [15].
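The exact BLA scheme of [15] is not reproduced in this text, so the following is only a sketch of the general idea under stated assumptions: each action keeps a Beta posterior over its probability of yielding a "good" outcome, the agent samples once from every posterior and takes the action with the largest sample, and the posterior of the chosen action is updated from binary feedback. The class name `BayesianActionSelector` and the binarized reward (for example, "latency met the deadline") are hypothetical.

```python
import random

class BayesianActionSelector:
    """Beta-Bernoulli posterior sampling over a finite action set (a sketch)."""

    def __init__(self, n_actions, seed=0):
        self.rng = random.Random(seed)
        self.successes = [1.0] * n_actions  # Beta parameter a (uniform prior)
        self.failures = [1.0] * n_actions   # Beta parameter b (uniform prior)

    def select(self):
        """Sample once from each action's posterior and pick the largest sample."""
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, success):
        """Update the chosen action's posterior with binary feedback."""
        if success:
            self.successes[action] += 1.0
        else:
            self.failures[action] += 1.0

# Toy usage: action 1 succeeds more often than action 0 (made-up probabilities).
selector = BayesianActionSelector(n_actions=2, seed=1)
env = random.Random(2)
counts = [0, 0]
for _ in range(2000):
    a = selector.select()
    counts[a] += 1
    selector.update(a, env.random() < [0.2, 0.8][a])
print(counts)
```

Unlike ε-greedy, this rule has no fixed exploration rate: uncertain actions are tried while their posteriors are wide and are abandoned as evidence accumulates. In an RL setting one such selector could be kept per state, which is one possible reading of "enabling every state to select the optimal action".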

3) DQN for joint optimization:

The curse of dimensionality in RL algorithms is a heavy burden for the intelligent agent. Moreover, for UAV-NOMA-MEC, the dimensions of the state space and the action space are determined by the number of network parameters, e.g., the number of channels, the number of MUs, and the number of MEC servers. To overcome this drawback, we adopt deep Q networks (DQN) for the joint optimization problem in UAV-NOMA-MEC. In the proposed DQN, the optimal policy of the intelligent agent is obtained by updating Q values in neural networks (NNs). The inputs of the NNs are the current states and the outputs are the estimated Q values of all the actions in the action space. By utilizing the fitting ability of the NNs, a high-dimension state input and low-dimension action output pattern is implemented to deal with the curse of dimensionality in conventional RL algorithms, especially when the number of network parameters in UAV-NOMA-MEC is large.

VI. CONCLUSION REMARKS AND FUTURE CHALLENGES

A. Conclusion Remarks

In this article, the design challenges associated with the application of AI techniques for UAV-NOMA-MEC networks have been investigated. An architecture for UAV-NOMA-MEC networks has been proposed, and key AI techniques for their optimization have been described. The network structure of UAV-NOMA-MEC has been presented, in which the NOMA technique is adopted to accommodate multiple MUs in a single resource block. Furthermore, three specific techniques, namely, federated learning enabled task prediction, deployment design for UAVs, and joint resource allocation, have been studied in detail.

B. Future Challenges

Although the advantages of AI techniques have been highlighted for task prediction, UAV deployment, and task computing in UAV-NOMA-MEC networks, some open research issues and challenges remain to be addressed in the future, which are outlined as follows:

• Combination with 6G techniques: 6G provides significant new techniques that can be combined with UAV-NOMA-MEC, such as cell-free massive multiple-input multiple-output, millimeter-wave communication, and reconfigurable intelligent surfaces.

• UAV trajectory and MA scheme selection: In UAV-NOMA-MEC, the UAV trajectory and the multiple access (MA) scheme selection play a critical role in task offloading. AI based approaches can play an important role in jointly optimizing the UAV trajectory and the MA scheme selection.

• Joint optimization of AI transmission and wireless transmission: In AI algorithms, the network parameters need to be shared with other intelligent agents or network models. For AI enabled UAV-NOMA-MEC, the transmission of network parameters in AI algorithms and the wireless transmission need to be jointly optimized. A unified design of AI transmission and wireless transmission should be further investigated.

• Joint optimization of UAVs, terrestrial MEC servers and MUs: A key aspect of the UAV-NOMA-MEC network is the mobility of UAVs, terrestrial MEC servers and MUs, which poses a significant challenge for the joint optimization of resource allocation. Therefore, more advanced approaches are needed to further explore the performance enhancement when all the elements are moving.

REFERENCES

[1] Y. Zhou, L. Tian, L. Liu, and Y. Qi, “Fog computing enabled future mobile communication networks: A convergence of communication and computing,” IEEE Commun. Mag., vol. 57, no. 5, pp. 20–27, May 2019.
[2] Y. Liu, Z. Qin, M. Elkashlan, Z. Ding, A. Nallanathan, and L. Hanzo, “Nonorthogonal multiple access for 5G and beyond,” Proc. IEEE, vol. 105, no. 12, pp. 2347–2381, Dec. 2017.
[3] S. Zhang, Y. Zeng, and R. Zhang, “Cellular-enabled UAV communication: A connectivity-constrained trajectory optimization perspective,” IEEE Trans. Commun., vol. 67, no. 3, pp. 2580–2604, Mar. 2019.
[4] Z. Yang, C. Pan, K. Wang, and M. Shikh-Bahaei, “Energy efficient resource allocation in UAV-enabled mobile edge computing networks,” vol. 18, no. 9, pp. 4576–4589, Sep. 2019.
[5] S. Zhang, J. Liu, H. Guo, M. Qi, and N. Kato, “Envisioning device-to-device communications in 6G,” IEEE Network, vol. 34, no. 3, pp. 86–91, May 2020.
[6] Z. Kaleem, M. Yousaf, A. Qamar, A. Ahmad, T. Q. Duong, W. Choi, and A. Jamalipour, “UAV-empowered disaster-resilient edge architecture for delay-sensitive communication,” IEEE Network, vol. 33, no. 6, pp. 124–132, Nov. 2019.
[7] F. Wang, J. Xu, and Z. Ding, “Multi-antenna NOMA for computation offloading in multiuser mobile edge computing systems,” IEEE Trans. Commun., vol. 67, no. 3, pp. 2450–2463, Mar. 2019.
[8] A. Kiani and N. Ansari, “Edge computing aware NOMA for 5G networks,” IEEE Int. of Things, vol. 5, no. 2, pp. 1299–1306, Apr. 2018.
[9] Y. Liu, Z. Qin, Y. Cai, Y. Gao, G. Y. Li, and A. Nallanathan, “UAV communications based on non-orthogonal multiple access,” IEEE Wireless Commun., vol. 26, no. 1, pp. 52–57, Feb. 2019.
[10] R. Nallapati, F. Zhai, and B. Zhou, “SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents,” in Proc. Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, CA, USA, Feb. 2017.
[11] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, “A joint learning and communications framework for federated learning over wireless networks,” IEEE Trans. Wireless Commun., pp. 1–1, 2020.
[12] S. Wang, M. Chen, C. Yin, W. Saad, C. S. Hong, S. Cui, and H. V. Poor, “Federated learning for task and resource allocation in wireless high altitude balloon networks,” arXiv preprint arXiv:2003.09375, 2020.
[13] C. He, Y. Hu, Y. Chen, and B. Zeng, “Joint power allocation and channel assignment for NOMA with deep reinforcement learning,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2200–2210, Oct. 2019.
[14] A. Sadeghi, F. Sheikholeslami, A. G. Marques, and G. B. Giannakis, “Reinforcement learning for adaptive caching with dynamic storage pricing,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2267–2281, Oct. 2019.
[15] Z. Yang, Y. Liu, Y. Chen, and N. Al-Dhahir, “Cache-aided NOMA mobile edge computing: A reinforcement learning approach,”