AI based Service Management for 6G Green Communications
Bomin Mao, Member, IEEE, Fengxiao Tang, Member, IEEE, Yuichi Kawamoto, Member, IEEE, and Nei Kato, Fellow, IEEE
Abstract—Green communications have always been a target for the information industry to alleviate energy overhead and reduce fossil fuel usage. In the current 5G and future 6G era, there is no doubt that the volume of network infrastructure and the number of connected terminals will keep exponentially increasing, which results in surging energy costs. It becomes increasingly important and urgent to drive the development of green communications. However, 6G will inevitably have increasingly stringent and diversified requirements for Quality of Service (QoS), security, flexibility, and even intelligence, all of which challenge the improvement of energy efficiency. Moreover, the dynamic energy harvesting process, which will be widely adopted in 6G, further complicates power control and network management. To address these challenges and reduce human intervention, Artificial Intelligence (AI) has been widely recognized and acknowledged as the only solution. Academia and industry have conducted extensive research to alleviate energy demand, improve energy efficiency, and manage energy harvesting in various communication scenarios. In this paper, we present the main considerations for green communications and survey the related research on AI-based green communications. We focus on how AI techniques are adopted to manage the network and improve energy harvesting toward the green era. We analyze how state-of-the-art Machine Learning (ML) and Deep Learning (DL) techniques can cooperate with conventional AI methods and mathematical models to reduce algorithm complexity and optimize the accuracy rate in order to accelerate their application in 6G. Finally, we discuss the existing problems and envision the challenges for these emerging techniques in 6G.
Index Terms—6G, green communications, Artificial Intelligence (AI), energy harvesting.
I. INTRODUCTION

Recently, 5G has been launched to provide users with high-throughput services in some countries, while researchers worldwide have started to conceive 6G [1]–[3]. It has been reported that 5G Base Stations (BSs) and mobile devices consume much more energy than 4G [4]. For example, a typical 5G BS with multiple bands has a power consumption of more than 11,000 W, while a 4G BS costs less than 7,000 W. The dramatically increased power consumption mainly comes from two parts: the growing Power Amplification (PA) in the massive Multiple Input Multiple Output (MIMO) antenna and the processing of booming data. Even though the energy consumption per unit of data has dropped drastically, the exponentially increasing energy required to provide seamless 5G services cannot be neglected, since the number of required 5G BSs is at least 4 times that of 4G to cover the same
sized area. Data show that Information and Communication Technology (ICT) accounts for a considerable share of total electricity consumption, as shown in Fig. 1a, and it will keep an estimated annual growth rate between 6% and 9% [5], [6].

(Bomin Mao, Fengxiao Tang, Yuichi Kawamoto, and Nei Kato are with the Graduate School of Information Sciences, Tohoku University, Sendai, Japan. Emails: {bomin.mao, fengxiao.tang, youpsan, kato}@it.is.tohoku.ac.jp)

Then, what will be the situation for 6G in terms of energy consumption? As we know, 6G is expected to extend the utilized frequency bands to Terahertz (THz) for 1,000 times of throughput improvement on the basis of 5G [1]. Since the upper bound of the transmission range is shortened from 100 m for millimeter Wave (mmWave) to 10 m for the THz spectrum, future THz-enabled BSs are envisioned to be deployed in houses to provide indoor communications [7], which means significant growth in the number of required BSs. Moreover, besides the communication purpose for mobile terminals and various sensing devices, the computation and content provision services will be gradually transferred from local devices to clouds and edge servers through real-time communications [8], [9], which is one of the main constituents of ICT energy consumption, as shown in Fig. 1b. Another critical paradigm is the utilization of Artificial Intelligence (AI) techniques to provide context-aware information transmissions and personalized services, as well as to realize automatic network management [1], [10], [11]. The growing ICT infrastructure, exploding data, and the increasingly complex network management will result in surging energy consumption, which poses a great challenge for network operators [12], [13]. Data analysis shows that the ICT sector may cost more than 20% of the total electricity [5], as in Fig. 1a.

To alleviate the growing energy burden toward 6G, academia and industry have conducted extensive research. The available solutions to address the huge energy consumption mainly come from two parts: energy-efficient network design [14], [15] and energy harvesting [16], [17]. Specifically, energy harvesting units, such as solar panels, wind turbines, and vibration harvesters, are widely adopted to convert various kinds of energy to electricity for communication devices, as shown in Fig. 1c. Among these energy harvesting techniques, Radio Frequency (RF) harvesting is an important technique which enables not only simultaneous information and energy transmission, but also the utilization of interference signals. Similar to RF harvesting, the Intelligent Reflecting Surface (IRS) is expected to be widely deployed to reflect wasted signals to the receivers to increase the Signal to Interference plus Noise Ratio (SINR) [18]–[20]. Other infrastructure, including satellites and Unmanned Aerial Vehicles (UAVs), is deployed to provide seamless coverage.

Fig. 1: Tendency of energy consumption for ICT and the promising energy harvesting techniques. (a) Energy consumption of ICT and its share. (b) Energy consumption of different parts of ICT: data centers, network infrastructure, and consumer devices. (c) Various energy harvesting sources for ICT: solar, wind, tide, RF signals, vibration, and geothermal.

For more efficient energy/power management, AI techniques, including conventional heuristic algorithms, the popular Machine Learning (ML), and state-of-the-art Deep Learning (DL) methods, have been adopted to simplify the traditional mathematical iteration process and predict future network changes, as shown in Fig. 2. Since future network services have diverse requirements instead of only high throughput, traditional mathematical models aiming at improving the bit-per-Joule metric may not apply to future complex scenarios. To realize automatic network management toward the green era, AI is the most promising solution, and what we need to do is analyze the various network resources and consider more joint optimizations, as shown in Fig. 2. Accordingly, AI techniques have been widely adopted to optimize power control and resource allocation in many works [21]–[24]. In this research, we conduct a survey on AI-related service management for 6G green communications. In the following paragraphs, we introduce the motivations, scope, and contributions of this paper.
A. Motivation

1) Energy-related Issues for Different Network Services:
Similar to 5G, which has defined three kinds of services, including eMBB (enhanced Mobile Broadband), uRLLC (ultra-Reliable and Low-Latency Communications), and mMTC (massive Machine Type Communications), some researchers have also considered service definitions in 6G [1]. Among these different service definitions, we expand our introduction from three typical communication scenarios: Cellular Network Communications (CNC), Machine Type Communications (MTC), and Computation Oriented Communications (COC).

• CNC: Since the majority of energy consumption for cellular networks comes from the BSs, the related research on green CNC mainly focuses on the deployment and configuration of BSs. To optimize the energy efficiency of CNC, the deployment and working states of the BSs should be carefully analyzed and scheduled. Moreover, for the working BSs, power control and resource allocation are critical to improving the system throughput with minimum energy consumption. Furthermore, energy harvesting technology can also be considered to alleviate the grid electricity demand of BSs.

• MTC: For MTC devices, most of which are battery-constrained and difficult to charge, alleviating energy demand can be conducted from the access layer and network layer. The research mainly concentrates on the optimization of network access, routing, and relaying. As energy harvesting has been widely regarded as an important technique for future Internet of Things (IoT) networks, how to manage the networks while considering energy dynamics is challenging and meaningful.

• COC: Computation and storage services will be an important part of 6G, which is also energy-aggressive, as shown in Fig. 1b. For the computation part, the research to reduce energy consumption mainly analyzes the offloading decision and computation resource allocation, since each server has a limited capacity. Moreover, the uneven distribution of computation demand requires the optimization of server deployment to balance latency and energy consumption. For Content Delivery Networks (CDNs), the content caching and delivery policies directly affect energy consumption.
2) Limitations of Conventional Methods:
Alleviating energy demand and improving energy efficiency is usually very complex since it is not only concerned with power control, but also related to many other factors, such as transmission scheduling, resource allocation, network design, user association, and so on. Thus, the formulated problem considering multiple related factors is non-convex or NP-hard [22], [48], [49]. The conventional mathematical approach is to iteratively search for the global optimum or to divide the problem into two or more sub-problems and search for a sub-optimal point [50], [51]. However, due to the increasing number of factors that must be considered, the solution space is huge, resulting in slow convergence or extreme difficulty in finding the global optimum. Moreover, since 6G network services have more diversified requirements for throughput, latency, and reliability than 5G, common mathematical optimization methods focusing on the maximization or minimization of a single metric are not enough. Furthermore, the nonlinear and unclear relationships among the multiple parameters that must be considered make the mathematical models difficult to construct. Additionally, node mobility and service changes lead to increasing network dynamics, which may result in frequent failures of conventional methods.

TABLE I. Existing Surveys on Energy Harvesting and Green Communications

Publication | Topics in this survey | Difference and enhancements of our survey
Zhang, 2010 [25] | Energy efficiency, optical networks | Focus on energy-efficient wireless communications and network management
Sudevalayam, 2011 [26] | Energy harvesting, wireless sensor networks | Enhanced coverage including various wireless scenarios and AI-based green communications
Feng, 2013 [27] | Energy efficiency, resource management, cooperative communication, MIMO, OFDMA | Focus on AI-based energy-efficient network management
Aziz, 2013 [28] | Energy efficiency, wireless sensor networks, topology control | Enhanced coverage including various wireless scenarios and AI-based green communications
Budzisz, 2014 [29] | Energy efficiency, cellular networks, WLAN, sleep modes | Enhanced coverage of energy-efficient wireless communication scenarios
Lu, 2015 [30] | RF energy harvesting, SWIPT, CRN, communication protocols | Enhanced coverage of various wireless scenarios and focus on AI-based green communications
Ismail, 2015 [31] | Energy efficiency, cellular networks, power consumption modeling | Enhanced coverage of various wireless scenarios and focus on AI-based green communications
Fang, 2015 [32] | Energy efficiency, information-centric networking, content delivery networks | Focus on wireless communication scenarios
Erol-Kantarci, 2015 [33] | Smart grid, data centers, energy-efficient communications | Focus on energy-efficient wireless communications and network management
Huang, 2015 [34] | Energy efficiency, energy harvesting, cognitive radio networks | Enhanced coverage including various wireless scenarios and AI-based green communications
Peng, 2015 [35] | Interference control, energy harvesting, resource allocation, heterogeneous networks | Focus on AI-based energy-efficient communications and network management
Mahapatra, 2016 [36] | Energy efficiency, tradeoff, spectrum, routing, scheduling | Focus on state-of-the-art power management for network performance optimization
Heddeghem, 2016 [37] | Power saving techniques in IP-over-WDM backbone networks | Focus on the different wireless access networks
Ku, 2016 [38] | Energy harvesting, usage protocol, energy scheduling, network design | Enhanced coverage of green communication techniques
Buzzi, 2016 [39] | Energy efficiency, 5G, cellular network, energy harvesting | Enhanced coverage of green communication techniques
Omairi, 2017 [40] | Energy harvesting, wireless sensor networks | Enhanced coverage including various wireless networks and AI-based green communications
Zhang, 2017 [41] | Green communications, tradeoffs, 5G networks | Focus on AI-based power management for network performance optimization
Alsaba, 2018 [42] | Energy harvesting, beamforming, SWIPT, physical layer security | Enhanced coverage of energy-efficient wireless communications and network management
Perera, 2018 [43] | SWIPT, 5G | Focus on energy-efficient communications and network management
Chen, 2019 [44] | Energy-saving, physical-layer and cross-layer communication coding | Enhanced coverage including various AI-based energy-efficient communication techniques
Tedeschi, 2020 [45] | Energy harvesting, security, green communications, IoT | Focus on AI-based energy-efficient communications and network management
Ma, 2020 [46] | IoT, energy harvesting, sensing, computing, and communications | Enhanced coverage including heterogeneous wireless networks
Hu, 2020 [47] | Energy harvesting management, 5G/B5G communication networks | Enhanced coverage of energy-efficient communications and focus on AI-based solutions
3) Advantages of AI Methods:
Compared with conventional methods, AI techniques, including traditional heuristic algorithms, ML, and the currently popular DL approaches, have significant advantages. AI techniques aim to solve problems in a naturally intelligent manner [52]. Thus, they can explore the complex relationships among different network parameters through trial and error [53]. In recent years, ML/DL methods have been widely used to learn power control and resource allocation policies [21], [49], [54], [55], which greatly alleviates the difficulty of manually studying the complex relationships and constructing the mathematical models. Moreover, many AI models can estimate changes in network parameters, which enables necessary network adjustments in advance and avoids potential performance deterioration [56], [57]. More importantly, the increasing number of Internet users and growing traffic in the future provide a massive data resource for adopting and developing AI methods in order to realize automatic network management.

Fig. 2: The development of AI-based green communications, from previous smart green communications, through existing DL-based green communications (deep AI models driven by power-related metrics), to future AI-based green communications with AI-assisted iterations and automatic network management.
B. Scope
In this paper, we focus on AI-based research to alleviate energy costs and improve energy efficiency. Different from previous works which concentrate on specific networks [25], [34], [39], [45], our research is expanded from three 6G communication services: CNC, MTC, and COC. We mainly focus on the AI techniques utilized for green communications, including traditional heuristic algorithms, ML, and state-of-the-art DL. Detailed introductions will be given in the following paragraphs.
1) Existing Surveys:
Green communications-related topics have attracted scholars' attention for more than 10 years, and Table I lists the relevant survey papers. We can find that these survey papers focus on specific networks, including backbone networks [37], optical networks [58], cellular networks [27], [29], [31], [41]–[43], [47], Cognitive Radio Networks (CRNs) [30], [34], and Wireless Sensor Networks (WSNs) [26], [28], [40]. Different topics, such as improving energy efficiency [25], [27], [28], [31]–[33], [36], energy harvesting [26], [30], [34], [35], [38], [40], [42], [45]–[47], and balancing the tradeoff between energy cost and network performance [36], [41], have been discussed. However, no research focuses on AI-based energy-efficient communication techniques, even though AI has been regarded as the next paradigm to improve communication and network performance [1], [59]. Another problem is that these surveys mainly focus on the relationship between energy and communication performance. However, computation and storage services will be an important part of 6G [60], [61]. Thus, to construct the 6G green ICT systems, we need to make the analysis from not only the communication perspective, but also the computation perspective.
2) Structure of This Survey:
The remaining part consists of six sections. Before introducing the related research, we introduce the widely-adopted AI techniques in Sec. II. Then, we introduce the related research according to the studied communication scenarios, including CNC, MTC, and COC, in Sec. III, IV, and V, respectively. Then, we summarize the limitations of existing research and envision the future directions in Sec. VI and conclude this article in Sec. VII. The structure of this paper is given in Fig. 3.
3) Contribution:
After discussing the existing surveys and introducing our research, the contributions can be summarized as below:

• We summarize the commonly concerned communication parts and techniques to alleviate energy demand and improve energy efficiency.

• We introduce the widely-adopted AI models as well as the state-of-the-art ML/DL methods to improve energy management and network performance, which can give some ideas for future related research.

• We analyze green ICT systems from not only the communication perspective, but also the viewpoint of computation. This survey covers the most promising 6G network scenarios, including THz-enabled cellular networks, Satellite-Air-Ground Integrated Networks (SAGINs), DCNs, Vehicular ad hoc Networks (VANETs), and IoT.

• We not only focus on how AI is adopted in these research works, but also analyze how to design AI models to improve the performance. In particular, we explain the common techniques and mathematical methods to improve the AI accuracy rate.

• We envision the challenges of AI-based 6G green communications, including the overwhelming computation overhead, security issues, and practical deployment.

II. OVERVIEW OF AI METHODS TOWARDS ENERGY-EFFICIENT COMMUNICATIONS
Besides their applications in image classification [62], natural language processing [63], and games [64], AI techniques have been widely studied to optimize network performance [65]–[68], and green communication is an important application. To improve the performance of AI strategies, various AI models have been developed, and some new tendencies have appeared toward more intelligent communication management. In this section, we introduce traditional and current AI methods.

AI has been confirmed as an important paradigm for 6G to realize automatic network management [1], [59]. However, the growing network complexity and increasingly stringent service requirements cause great challenges for existing AI techniques. Future intelligent network management depends on the cooperation of various parts: network design, deployment, resource allocation, and so on. To realize intelligence in every part, various kinds of AI techniques will be adopted.
Fig. 3: Main contents and structure of this article. Sec. II: commonly utilized AI methods for green communications (traditional AI methods, development of DL models, future perspective learning methods). Sec. III: cellular network communications (power consumption and energy efficiency model, BS deployment and configuration, power control and resource allocation, energy harvesting-enabled BSs, green BS management for 6G HetNets, energy-efficient SAGIN). Sec. IV: machine type communications (power consumption model, energy-efficient network access, energy-efficient transmissions, energy harvesting and sharing). Sec. V: computation oriented communications (power consumption model, energy-efficient cloud and edge computing, green content caching and delivery). Sec. VI: open research issues (AI-based energy-efficient transmissions, AI-enhanced energy transfer and sharing, security for AI-enabled networks, lightweight AI model and hardware design).
A. Traditional AI Algorithms
The development of AI technology can be separated into several stages, and Fig. 4 gives an example. As shown in this figure, the traditional AI techniques utilized in communication networks mainly consist of two types: heuristic algorithms and ML methods [69]. Even though some ML methods, such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), also belong to the heuristic algorithms, we only consider the non-data-based heuristic models here for clear explanation. Thus, the former type mainly utilizes the online search for an optimum solution through iterations, while the latter group constructs and trains specific models with extensive data to accumulate experience. The following paragraphs give more detailed discussions.
1) Heuristic Algorithms:
Heuristic algorithms focus on NP-hard problems and aim to find a good enough solution within a limited time frame. Generally, heuristic algorithms use some shortcuts and run faster than traditional greedy search methods. However, the sacrifice is a worse accuracy rate or only a near-global optimum. The shortcuts vary among different heuristic methods, including Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and the Genetic Algorithm (GA), as shown in Fig. 4.
Particle Swarm Optimization: This optimization method assumes that so-called particles move around the search space according to mathematical formulations of their positions and velocities [70]. The movement of each particle is affected by its own best position and the best-known positions in the search space, which leads to the discovery of improved positions. By repeating the process, a satisfactory solution may be found. This method has been adopted to optimize edge server deployment [71] and virtual machine placement [72], [73] in order to improve energy efficiency. Moreover, although the method was previously adopted mainly for continuous problems, its applicability to discrete processes has also been illustrated [72], [73]. However, this method easily falls into local optima in high-dimensional spaces and has a low convergence rate.
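The position/velocity update described above can be sketched in a few lines of Python. This is an illustrative minimal implementation, not taken from the surveyed works: the sphere objective stands in for, e.g., an energy-cost model of a deployment problem, and all parameter values (inertia, acceleration coefficients, bounds) are assumptions.

```python
import random

def pso(objective, dim=2, n_particles=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Minimal PSO minimizing `objective` over [-bound, bound]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's own best
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best-known position

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity update: inertia + cognitive + social terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy stand-in for an energy-cost objective: sphere function, minimum at 0.
best, best_val = pso(lambda x: sum(v * v for v in x))
```

On this smooth low-dimensional objective the swarm converges quickly; the local-optimum and convergence-rate issues noted above appear when the dimension grows.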
Ant Colony Optimization: Inspired by the behavior of ants searching for food, ACO finds an optimal route by simulating this foraging process [74]. Similar to PSO, ACO is based on swarm intelligence, where a group of artificial "ants" moves through the search space to find the optimal route. Each artificial ant records its position and the quality of its solution, which can guide other ants to locate better positions in later simulation iterations. This method has been widely studied in many network applications to improve energy efficiency, such as routing [75], resource allocation [76], and server deployment [66].
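The pheromone-guided route search can be sketched as follows. This is an illustrative toy, not one of the cited routing schemes: the topology is hypothetical, link weights stand in for, e.g., per-hop energy cost, and the parameter values are assumptions.

```python
import random

def aco_shortest_path(graph, src, dst, n_ants=20, iters=50,
                      evaporation=0.5, alpha=1.0, beta=2.0, seed=0):
    """Minimal ACO finding a low-cost src->dst path in a weighted digraph."""
    rng = random.Random(seed)
    tau = {(u, v): 1.0 for u in graph for v in graph[u]}   # pheromone per edge
    best_path, best_cost = None, float("inf")

    for _ in range(iters):
        for _ in range(n_ants):
            path, node, cost = [src], src, 0.0
            while node != dst:
                choices = [(v, w) for v, w in graph[node].items()
                           if v not in path]
                if not choices:
                    break                                   # dead end: discard ant
                # edge attractiveness: pheromone^alpha * (1/weight)^beta
                weights = [tau[(node, v)] ** alpha * (1.0 / w) ** beta
                           for v, w in choices]
                v, w = rng.choices(choices, weights=weights)[0]
                path.append(v); cost += w; node = v
            if node == dst:
                if cost < best_cost:
                    best_path, best_cost = path, cost
                for u, v in zip(path, path[1:]):            # deposit pheromone
                    tau[(u, v)] += 1.0 / cost
        for e in tau:                                       # evaporation
            tau[e] *= (1.0 - evaporation)
    return best_path, best_cost

# Toy topology: link weights could represent per-hop energy cost.
g = {"a": {"b": 1, "c": 4}, "b": {"c": 1, "d": 5}, "c": {"d": 1}, "d": {}}
path, cost = aco_shortest_path(g, "a", "d")
```

Pheromone deposits proportional to 1/cost make cheap routes progressively more attractive, while evaporation prevents early routes from dominating forever.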
Genetic Algorithm: The GA, which is closely related to genetic programming, borrows the concepts of mutation, crossover, and selection from evolutionary biology to improve the solution [77]. In a GA, a group of candidate solutions is abstracted as chromosomes or phenotypes, and a pair of chromosomes or phenotypes can cross over to generate a new generation with a certain probability. Moreover, a mutation may happen in each new generation, resulting in a totally new chromosome or phenotype. To guide the process toward the expected direction, a fitness function is defined to evaluate the individuals in every generation, and individuals with low fitness values are eliminated. The GA converges easily and is expandable, while it cannot guarantee the global optimum and depends heavily on parameter selection. Researchers have adopted this method to design cellular networks [78], [79] and optimize edge server deployment [80], [81].
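The selection/crossover/mutation loop can be sketched over fixed-length bit strings. This is an illustrative minimal GA: the OneMax fitness (count of 1-bits) is a hypothetical stand-in for, e.g., scoring a BS on/off configuration with an energy-efficiency model, and all rates are assumptions.

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=60,
                      crossover_p=0.9, mutation_p=0.02, seed=0):
    """Minimal GA maximizing `fitness` over fixed-length bit strings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select(pop):
        # tournament selection: the fitter of two random individuals survives
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(pop), select(pop)
            if rng.random() < crossover_p:                  # one-point crossover
                cut = rng.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for i in range(n_bits):                     # bit-flip mutation
                    if rng.random() < mutation_p:
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

# Toy fitness: number of 1-bits ("OneMax"), a placeholder for a real
# energy-efficiency score of a binary network configuration.
best = genetic_algorithm(fitness=sum)
```

The parameter sensitivity mentioned above shows up directly here: too high a mutation rate destroys good individuals, too low a rate stalls exploration.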
2) Machine Learning Algorithms:
As a data-based technique, various ML algorithms have been developed and adopted in many network performance optimization strategies [60], [82], [83]. In this part, we focus on three machine learning algorithms commonly utilized in green communications: regression analysis [84], SVM [85], and K-means clustering [86]. Another important technique, Reinforcement Learning (RL), will be introduced in the next subsection.

Fig. 4: Development of AI techniques, with increasing accuracy and growing complexity. Heuristic algorithms: Particle Swarm Optimization, Genetic Algorithm, Ant Colony Optimization, Simulated Annealing. Machine learning: supervised, unsupervised, reinforcement, and semi-supervised learning. Deep learning: supervised/semi-supervised learning, unsupervised learning, deep reinforcement learning, federated learning, transfer learning, and imitation learning.
Regression Analysis: This method is mainly utilized to analyze the relationship between two or among multiple parameters. The most common application is to map from the input parameters to the output results with a labeled dataset, and a cost function is usually defined to evaluate the accuracy rate. According to whether the output is continuous or binary, regression analysis can be divided into linear regression and logistic regression. Regression analysis plays an important role in green communications. For instance, linear regression can be utilized to predict future traffic changes, which is further adopted to determine energy-efficient transmission schemes, resource allocation, and computation offloading [15].
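The traffic-prediction use just mentioned can be sketched with closed-form simple linear regression. The hourly traffic samples below are hypothetical (arbitrary units); the point is only the fit-then-extrapolate pattern that a sleep-scheduling or offloading policy could consume.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a + b*x (simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance(x, y) / variance(x)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx                    # intercept from the means
    return a, b

# Hypothetical hourly traffic samples; predict the next hour so that,
# e.g., a BS sleep schedule can be decided in advance.
hours = [0, 1, 2, 3, 4]
traffic = [10.0, 12.1, 13.9, 16.2, 18.0]
a, b = fit_linear(hours, traffic)
predicted_next = a + b * 5
```

For these samples the fitted slope is 2.01 units/hour, giving a prediction of about 20.07 for the next hour.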
Support Vector Machine: The SVM is adopted to analyze data for classification and regression analysis in a supervised learning manner [85]. An SVM defines a hyperplane or a set of hyperplanes to separate the training data points, and the best hyperplane is the one that has the largest distance to the nearest training data point of any class. The SVM can be adopted for high-dimensional problems and is suitable for small datasets. In green communication management, the SVM has been applied to solve problems like user association [67] and computation offloading [87].
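A linear SVM can be sketched without any library by running stochastic sub-gradient descent on the regularized hinge loss, one common approximation of the max-margin objective (rather than the textbook quadratic program). The 2-D features and labels are hypothetical, e.g., (load, channel-quality) features with +1 = "offload" and -1 = "compute locally".

```python
import random

def train_linear_svm(data, labels, epochs=200, lam=0.01, seed=0):
    """Linear SVM via stochastic sub-gradient descent on the hinge loss."""
    rng = random.Random(seed)
    dim = len(data[0])
    w, b = [0.0] * dim, 0.0
    t = 0
    idx = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            lr = 1.0 / (lam * t)                  # decaying learning rate
            margin = labels[i] * (sum(wj * xj
                                      for wj, xj in zip(w, data[i])) + b)
            if margin < 1:                        # point inside margin
                w = [wj - lr * (lam * wj - labels[i] * xj)
                     for wj, xj in zip(w, data[i])]
                b += lr * labels[i]
            else:                                 # only regularization shrinks w
                w = [wj - lr * lam * wj for wj in w]
    return w, b

# Hypothetical 2-D offloading data: two classes, linearly separable.
X = [[2.0, 2.0], [2.5, 1.5], [0.5, 0.3], [0.2, 0.8]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
predict = lambda x: 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Kernelized SVMs extend the same idea to non-linear boundaries, which matters when the offloading decision is not linearly separable in the raw features.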
K-means Clustering: This method aims to partition multiple observations into several clusters in which each observation belongs to the cluster with the nearest center [86]. As an unsupervised learning method, this technique repeats the process of assigning nodes to different clusters and updating the cluster centers. To evaluate the assignments, a cost function based on the distance between the nodes and the cluster centers is defined. K-means clustering is efficient for clustering users and associating them with suitable BSs to save energy [88]–[90]. It can also be applied to the optimization of cloudlet placement [91].
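The assign/update loop is short enough to show directly. The user positions below are hypothetical; in the BS-association use cited above, each resulting center would suggest, e.g., which BS to keep active for a group of users while neighbors sleep.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: alternate nearest-center assignment and center update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                          # assignment step
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):         # update step: centroid
            if cl:
                centers[j] = [sum(coord) / len(cl) for coord in zip(*cl)]
    return centers, clusters

# Hypothetical user positions forming two spatial groups.
users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(users, k=2)
```

The two centers converge to the means of the two user groups; the squared-distance cost mentioned above decreases monotonically across iterations.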
B. Development of Deep Learning Models
Since the common ML/DL models and the three training manners shown in Fig. 4 have been introduced in many works [82], [92], we only discuss the development of the ML/DL models that have been utilized to improve energy efficiency.

Most current ML/DL models are developed from Artificial Neural Networks (ANNs), which can also be termed Neural Networks (NNs). An ANN is constructed by layers of interconnected units named "artificial neurons", which model the neurons in a biological brain [93]. Each artificial neuron can process the received signals with some non-linear function and then transmit the result to neurons in the next layer through weighted edges. Thus, the final output of each ANN depends on not only the input signals, but also the utilized non-linear functions and edge weights. In recent decades, ML/DL models have developed fast on the basis of ANNs, which can be summarized in four aspects. First, the most obvious development is the increased number of layers, which results in deep architectures instead of the traditional shallow ones. Thanks to breakthroughs in training algorithms [94] as well as hardware developments, current DL models can have very complex architectures while keeping an extremely high accuracy rate, which enables them to be adopted in very complicated scenarios and to overwhelm humans in some applications, such as board games [64]. Second, connection manners have become more complex. Besides the full connections among neurons in adjacent layers for most ANNs, partial connections have also been utilized in some modern ANNs, such as Convolutional Neural Networks (CNNs) [95], which enables flexible processing of inputs whose features are not distributed everywhere. Part of the output can also be fed back into the learning model, as in Recurrent Neural Networks (RNNs) [96], to generate time-consecutive variables. Third, researchers have developed models that concurrently utilize multiple ANNs to cooperatively complete one task, such as the Generative Adversarial Network (GAN) [97] and the Actor-Critic (AC) method [98]. The two ANNs can have the same or different structures while acting in different roles. Fourth, techniques such as different activation functions, data processing methods, and the attention mechanism significantly improve the accuracy rate of current ML/DL structures.
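The layered computation described above reduces to a short forward pass: each layer applies its weights, adds a bias, and passes the result through a non-linear activation. The layer sizes, tanh activation, and random weights below are purely illustrative.

```python
import math
import random

def forward(x, layers):
    """Forward pass of a fully connected ANN.

    `layers` is a list of (weights, biases) pairs; each neuron applies a
    non-linear activation (tanh here) to its weighted inputs, and the
    result feeds the next layer.
    """
    for W, b in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x

rng = random.Random(0)

def dense(n_in, n_out):
    # random (n_out x n_in) weight matrix and zero bias vector
    return ([[rng.uniform(-1, 1) for _ in range(n_in)]
             for _ in range(n_out)],
            [0.0] * n_out)

net = [dense(3, 4), dense(4, 2)]       # 3 inputs -> 4 hidden -> 2 outputs
y = forward([0.5, -0.2, 0.1], net)
```

The "developments" listed above all modify this skeleton: deeper `layers` lists, partial instead of full weight matrices (CNNs), feedback of `y` into the next step (RNNs), or several such networks trained against each other (GAN, AC).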
C. Future Perspective AI Learning Methods
Besides the development of ML/DL structures, the learning methods also critically affect the accuracy rate and computation performance. Future networks will consist of more complex scenarios and dynamics, which drives us to consider more advanced AI learning methods. In this part, besides traditional supervised learning and unsupervised learning, we focus on three AI learning methods which will definitely attract more attention, as shown in Fig. 4.
1) Deep Reinforcement Learning:
RL is dynamic learning through trial and error to maximize the outcome. In an RL model, the essential components are the environment, a defined agent, the state space, the action space, and the reward [99]. In the studied environment, the agent chooses an action according to the current state, and then gets rewarded for a correct action or penalized for an incorrect one. In the training process, the agent follows the existing experience or explores a new action with a certain probability in order to maximize the reward. In a traditional RL model, a table is usually utilized to store the Q value, which is the expected accumulated reward for each action at each state. The training process fills in this table, which can then guide future action selection. However, as the studied problem becomes complex, the number of states and potential actions becomes huge or even unbounded, which makes the Q-value table infeasible. To solve this problem, DL models are adopted to map from the state to the corresponding action, which is the main concept of Deep Reinforcement Learning (DRL) [99]. Another advantage is that this method enables an agent to generalize the value of states it has never seen before or has only partial information about. Due to these advantages, DRL has attracted more attention for improving energy efficiency through optimizing BS management [100], resource allocation [101], [102], power control [21], [103], and computation offloading [23], [24], [104].
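The RL loop that DRL builds on can be illustrated with tabular Q-learning on a toy chain environment (an illustrative construction, not one of the cited schemes): the agent starts at state 0, moves left or right, and only reaching the last state yields a reward. DRL replaces the Q table below with a deep network, but the epsilon-greedy action selection and temporal-difference update are the same. All parameters are assumptions.

```python
import random

def q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain; reward 1 only at the last state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]   # Q[state][action]: 0=left, 1=right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy: explore with prob epsilon, break ties randomly
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # temporal-difference update toward r + gamma * max_a' Q(s', a')
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# greedy policy per non-terminal state: 1 means "move right" toward the reward
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(5)]
```

After training, the greedy policy moves right in every state. In an energy-management setting, the state could encode traffic load, the actions BS sleep/wake decisions, and the reward a negative energy cost.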
2) Transfer Learning:
Transfer learning is a machine learning method which aims to apply the knowledge constructed while solving one problem to a different but related problem [105]. Different from traditional ML models which learn from scratch, a new application in a related problem only needs to fine-tune the new model based on the existing knowledge or retrain part of it. Thus, transfer learning can significantly reduce the computation consumption and required training data, enabling extended and accelerated applications. As the network changes frequently due to mobility and transmission environment variations, transfer learning is widely considered to address similar scenarios [106]–[109]. On the other hand, the applicable range of the existing knowledge as well as the balance between training effort and performance in the target scenario are hot topics and require more attention [108].
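A minimal sketch of the reuse-and-fine-tune idea, under toy assumptions: a small network is "pretrained" on a source task, then only its output layer is refit on a few target samples while the shared feature extractor is frozen. The tasks, data, and layer sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Source task: plentiful data for learning a hidden representation.
X_src = rng.normal(size=(200, 3))
y_src = np.sin(X_src @ np.array([1.0, -2.0, 0.5]))

W1 = rng.normal(size=(3, 8))          # hidden layer (the transferable knowledge)
def hidden(X):
    return np.tanh(X @ W1)            # shared, frozen feature extractor

# "pretrain": fit the output layer on the source task (least squares)
w_src, *_ = np.linalg.lstsq(hidden(X_src), y_src, rcond=None)

# Target task: a related problem (scaled/shifted mapping), far fewer samples.
X_tgt = rng.normal(size=(40, 3))
y_tgt = 0.7 * np.sin(X_tgt @ np.array([1.0, -2.0, 0.5])) + 0.1

# Transfer: keep W1 frozen, re-fit ONLY the output layer on the small target set
w_tgt, *_ = np.linalg.lstsq(hidden(X_tgt), y_tgt, rcond=None)

X_test = rng.normal(size=(100, 3))
y_test = 0.7 * np.sin(X_test @ np.array([1.0, -2.0, 0.5])) + 0.1
err_transfer = np.mean((hidden(X_test) @ w_tgt - y_test) ** 2)
err_source_only = np.mean((hidden(X_test) @ w_src - y_test) ** 2)
```

Refitting one layer on 40 target samples beats reusing the source model unchanged, which is the computational argument for transfer learning made above.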
3) Federated Learning:
Federated learning is a decentralized method which utilizes distributed servers or devices to train and test AI models with local data [110], [111]. The edge servers or devices keep the training data locally and only need to upload the obtained parameters to the central controller, which simply collects and integrates the parameters of the AI models. The edge devices can then download the AI models to make predictions or conduct periodical updates. Since personal privacy has aroused increasing concern recently, the federated learning technique will attract growing attention in 6G. Moreover, the cooperative training and running manner of federated learning can efficiently utilize idle computation resources and reduce the consumption in the central controller. Furthermore, uploading parameters instead of training data results in reduced communication overhead [24], [87], [112].
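The collect-and-integrate step described above can be sketched as federated averaging: each client trains locally and uploads only parameters, and the server averages them weighted by local sample counts. The linear model, data, and client sizes below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])

def make_client(n):
    # each client holds its own private dataset
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    return X, y

clients = [make_client(n) for n in (50, 100, 150)]

def local_train(w, X, y, lr=0.1, epochs=20):
    # plain gradient descent on local MSE; raw data never leaves the client
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

w_global = np.zeros(2)
for _ in range(5):                               # communication rounds
    local_ws, sizes = [], []
    for X, y in clients:
        local_ws.append(local_train(w_global, X, y))  # upload parameters only
        sizes.append(len(y))
    # server: weighted average of the uploaded parameters
    w_global = np.average(local_ws, axis=0, weights=sizes)
```

Only the two model coefficients cross the network per round, instead of the 300 training samples, which is the communication-overhead argument made above.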
D. Summary
From the above introduction, we can find that AI techniques have various application scenarios and should be chosen according to the specific problem. With the development of computation hardware, DL techniques have attracted growing attention for solving more complex problems. However, this does not mean that traditional AI techniques such as heuristic algorithms and shallow ML models are no longer suitable. Since many traditional AI methods have much lower computation complexity than DL, they remain suitable for resource-limited scenarios. In the remainder of this paper, we give more detailed explanations of how these methods realize green communications in different scenarios. It should be noted that some important AI techniques are not introduced in this section but still have promising perspectives, such as imitation learning [113] and quantum machine learning [114].

Fig. 5: One-step and two-step AI-based BS switching strategies. (a) Two-step strategy: the traffic trace, user mobility, CSI, etc. are input to predict the future traffic, which a mathematical model then converts into the BS switching policy. (b) One-step strategy: the same inputs are mapped directly to the BS switching policy.

III. CELLULAR NETWORK COMMUNICATIONS
The energy consumption of cellular networks comes from the radio access part and the core part [31]. Some practical measurements of the energy consumption of cellular networks have been reported in [31], [115]. The data illustrate that BSs account for more than half of the total energy consumption, of which 50% to 80% is utilized by the power amplifier and feeder. With the utilized frequency band extended to sub-THz and THz in the 6G era, the coverage of a single BS further shrinks [1], [116]. The increasing number of BSs required to realize seamless coverage is then expected to consume more energy. Therefore, green communication research for cellular networks mainly focuses on BSs. In this section, we first introduce the power consumption and energy efficiency modeling of cellular networks and then explain the related AI-based approaches to realize green communications from different perspectives.
A. Power Consumption and Energy Efficiency of Cellular Networks
According to the above introduction, we mainly focus on the Radio Access Network (RAN) part consisting of BSs and access terminals. We introduce the power consumption modeling of BSs and the metric "bit-per-Joule" to measure energy efficiency for both BSs and access terminals.
1) Power Consumption Modeling of BSs:
The power consumption of a BS consists of four parts: power supply, signal processing, air conditioning, and the power amplifier [31]. Since part of the power consumption is constant for BSs in sleep and idle states while the other part depends on the workload, the energy consumption of a BS can usually be summarized as [117]:

P_bs = P_sleep + I_bs (P_add + η P_trans)    (1)

where P_bs and P_trans denote the total power consumption and the maximum transmission power of the BS, while η ∈ [0, 1] denotes the usage rate. P_sleep is the constant power consumption to sustain the basic functions in sleep mode. P_add denotes the additional constant power for computation, backhaul communication, and power supply in active mode. I_bs is a binary parameter representing whether the BS is active or asleep. According to Equation 1, to reduce energy consumption, we should try our best to turn idle BSs to sleep mode and minimize the usage in active mode. If we further consider that future multi-tier heterogeneous BSs will be enabled with various frequency bands up to THz [78], [118], reducing the consumed energy of all BSs should mainly depend on the BS deployment and management, user association, and resource allocation.
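Equation 1 can be written directly as code. The wattage values below are illustrative assumptions, not measurements from the cited works.

```python
def bs_power(active, usage, p_sleep=75.0, p_add=120.0, p_trans_max=40.0):
    """Total BS power: P_bs = P_sleep + I_bs * (P_add + eta * P_trans)."""
    assert 0.0 <= usage <= 1.0          # eta must lie in [0, 1]
    i_bs = 1 if active else 0           # binary active/sleep indicator
    return p_sleep + i_bs * (p_add + usage * p_trans_max)

sleeping = bs_power(active=False, usage=0.0)    # only the constant sleep power
fully_loaded = bs_power(active=True, usage=1.0) # sleep + additional + full transmit
```

The gap between the two calls (here 160 W under the assumed figures) is exactly the saving available from putting an idle BS to sleep, which motivates the work-state management strategies below.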
2) Energy Efficiency Measurement:
Energy efficiency measures the achieved performance per unit of consumed energy. Thus, in cellular networks, it is usually defined as the ratio between the obtained transmission rate and the power consumption, with the unit of "bit-per-Joule". Besides direct energy-saving strategies, improving energy efficiency is also an important direction toward green communication. Here we derive the energy efficiency equations for a UE in cellular networks and then analyze the potential optimization strategies. It should be noted that the derivation method also applies to BSs.

We assume a multi-cell interference network with multiple single-antenna UEs and several multi-antenna BSs. The same spectrum resource is multiplexed among the cells [49]. If one UE's transmission power and channel gain to the corresponding BS are P_ut and G_u, respectively, then the maximum uplink transmission rate can be calculated as:

R_u = B log2(1 + G_u P_ut / (N + I))    (2)

where R_u is the maximum transmission rate for the uplink of the considered UE, B is the assigned bandwidth, and N and I denote the noise and interference on the utilized channel, respectively. If we further assume the inefficiency of the considered UE's power amplifier and the static power consumption are μ and P_us, respectively, then the energy efficiency can be calculated as:

EE_u = R_u / (μ P_ut + P_us)    (3)

According to Equations 2 and 3, the parameters affecting energy efficiency include the assigned bandwidth, channel gain, transmission power, and interference, while the noise and static power consumption are usually constant. Therefore, we need to optimize the allocation of resources including channels and bandwidth, the power control, and the transmission scheduling policy to improve energy efficiency.
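Equations 2 and 3 as code, with illustrative numeric inputs (the bandwidth, gain, and power figures are assumptions chosen for readability):

```python
import math

def uplink_rate(bandwidth_hz, gain, p_trans_w, noise_w, interference_w):
    """Equation (2): R_u = B * log2(1 + G_u * P_ut / (N + I))."""
    sinr = gain * p_trans_w / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)

def energy_efficiency(rate_bps, p_trans_w, mu, p_static_w):
    """Equation (3): EE_u = R_u / (mu * P_ut + P_us), in bit-per-Joule."""
    return rate_bps / (mu * p_trans_w + p_static_w)

r = uplink_rate(bandwidth_hz=1e6, gain=1e-7, p_trans_w=0.2,
                noise_w=1e-9, interference_w=1e-9)     # SINR = 10
ee = energy_efficiency(r, p_trans_w=0.2, mu=2.0, p_static_w=0.1)

# Doubling the transmit power raises the rate (log growth) but can LOWER
# bit-per-Joule efficiency (linear growth of the denominator):
r_hi = uplink_rate(1e6, 1e-7, 0.4, 1e-9, 1e-9)          # SINR = 20
ee_hi = energy_efficiency(r_hi, 0.4, 2.0, 0.1)
```

The last comparison is the core tension exploited by the power control schemes surveyed later: beyond some point, more transmit power buys rate but costs efficiency.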
Fig. 6: Intelligent BS deployment.
3) Summary:
According to the above analysis, the strategies toward green cellular networks in the 6G era mainly consist of the deployment and management of BSs [119]–[121], power control [21], [122], and resource allocation [118], [123]–[125]. Another important direction, which has been mentioned in Sec. I, is the utilization of renewable energy to drive the BSs [126]–[130].
B. Base Station Deployment and Configuration
As we mentioned in Sec. I, the significant penetration loss of THz radio signals will cause 6G BSs to cover very limited areas even with increased available frequency bands [1], both of which contribute to a drastic increase in energy consumption [4]. Moreover, the uneven user distribution and user mobility result in unbalanced traffic loads across BSs. According to Equation 1, to reduce energy consumption and improve efficiency, we need to minimize the number and transmit power of working BSs in the cellular network. Thus, the BS deployment policy, workload management, and user association are three attractive strategies.
1) Base Station Deployment:
In the network construction period, the BS deployment is an important factor affecting the communication performance and energy consumption. Even though some deployment positions can be manually selected according to the population density [131], the increasing dynamics, variable propagation characteristics, complex physical surroundings, and even the climate drive researchers and operators to consider more efficient and automatic strategies.

To decrease the number of deployed BSs, Dai and Zhang [78] consider a multi-objective GA. In their research, the proposed approach first extracts the main features which determine the Received Signal Strength (RSS). Then, multiple ML models including k-Nearest Neighbor (KNN) [83], random forest [132], SVM [85], and Multi-Layer Perceptron (MLP) [133] are adopted to map the relationship between the extracted features and the RSS values. In the second stage, the multi-objective GA [77] is adopted to optimize the locations and operating parameters. Specifically, the GA programming process is conducted with different numbers of BSs, and the minimum number reaching the coverage requirement is selected. The feasible solutions are then evaluated by the proposed ML models. Simulation results illustrate that the MLP outperforms the other ML models in terms of Mean Absolute Error (MAE), and the coverage rate is improved by 18.5% compared with the real-world deployment.

Besides the BS deployment planning, the coverage design is an important factor affecting the required number of BSs and the network performance. Assuming the deployment is done without detailed cell planning, Ho et al. [134] utilize the GA [77] to adjust the femtocell coverage in order to optimize three network metrics, namely coverage holes, coverage leakage, and load balance, which define the fitness function for evaluating the considered solutions during the evolution process.
To overcome unknown network dynamics and user mobility, an online learning method based on periodical updates with real-time network measurements is adopted. In their proposal, hierarchical Markov Models (hMMs) [135] are used to capture the behavior and generate the load trace of each femtocell with a high accuracy rate. The results can then be used to calculate the fitness, and the evolution process is illustrated to provide continuous performance improvement.

Similar to [78], [134], Moysen et al. [79] also combine the GA and ML in the design of cellular networks. In their research, the SVM [85] is trained offline as a QoS regressor with collected data including the Reference Signal Received Power (RSRP) and Reference Signal Received Quality (RSRQ) coming from the serving and neighboring eNBs. Then, in the online phase, the GA is utilized to generate feasible solutions consisting of the configuration parameters of the eNBs. The UE measurements for each feasible solution are then utilized as the input of the SVM, whose predicted QoS result is adopted to calculate the fitness function. With the goal of minimizing the PRB per transmitted Mb, an improved BS configuration set can be found through the iterations of the GA. The case study illustrates that the proposed model can enable the operator to find an appropriate deployment layout and minimize the required resources.

From the above research, it can be found that the deployment policy is usually found by iterative algorithms such as the GA, while supervised learning is adopted to predict the multiple network parameters as the input of the GA or to evaluate the fitness function. The combination of heuristic algorithms and ML as shown in Fig. 6 can cooperatively improve the performance of the proposed model.
Since DL has shown improved accuracy and more advanced policy searching ability, it is highly expected that the prevalent DL techniques will be applied to BS deployment design.
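The two-stage pattern of Fig. 6 can be sketched as an evolutionary search that proposes BS placements while a learned model scores them. In the cited works the scorer is a trained regressor (e.g. an MLP or SVM); here a simple analytic coverage proxy stands in for it so the example stays self-contained, and all positions, radii, and population sizes are toy assumptions.

```python
import random

rng = random.Random(42)
USERS = [(rng.random(), rng.random()) for _ in range(50)]   # unit-square users
N_BS, RADIUS = 3, 0.35

def surrogate_coverage(bss):
    # stand-in for the ML evaluator: fraction of users within RADIUS of some BS
    covered = sum(
        any((u[0] - b[0]) ** 2 + (u[1] - b[1]) ** 2 <= RADIUS ** 2 for b in bss)
        for u in USERS
    )
    return covered / len(USERS)

def mutate(bss):
    # jitter each BS position, clipped to the deployment area
    return [(min(1, max(0, x + rng.gauss(0, 0.1))),
             min(1, max(0, y + rng.gauss(0, 0.1)))) for x, y in bss]

# (mu + lambda)-style evolutionary loop with elitism
population = [[(rng.random(), rng.random()) for _ in range(N_BS)]
              for _ in range(20)]
initial_best_cov = max(surrogate_coverage(ind) for ind in population)
for generation in range(30):
    population.sort(key=surrogate_coverage, reverse=True)
    parents = population[:5]                                  # selection
    population = parents + [mutate(rng.choice(parents)) for _ in range(15)]

best = max(population, key=surrogate_coverage)
best_coverage = surrogate_coverage(best)
```

Because the top candidates are carried over unchanged, the best score can only improve; replacing `surrogate_coverage` with a trained RSS/QoS regressor recovers the GA-plus-ML loop described above.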
2) Work State Management:
As the network traffic changes dynamically due to user mobility, the multi-tier BSs can be scheduled to switch on and off to reduce energy consumption [136]. If the work state of a BS is changed, the user association information should be adjusted accordingly to ensure a qualified connection. Therefore, the work states of BSs should be scheduled carefully to minimize energy consumption as well as meet the QoS requirements.

Since the users' daily movements contribute to similar changing tendencies of the traffic patterns, the correlation between current traffic data and historical experience can be utilized to design the BS switch on/off policy [106], [107]. Predicting the future traffic from the historical profile and switching off the BSs with low usage may be the simplest solution. The main concern with switching off some BSs is the potential deterioration of QoS, so the accuracy of the traffic prediction directly affects the network performance in terms of energy saving and QoS. Gao et al. [137] compare multiple ML models including Auto-regressive Integrated Moving Average (ARIMA) [138], Prophet, random forest, LSTM, and ensemble learning in terms of accuracy, speed, and complexity. These models are then utilized for traffic prediction, and the prediction results are further utilized to calculate energy efficiency, so that some BSs can be switched off if the Key Performance Indicator (KPI) is below a predefined threshold. Similarly, Donevski et al. [139] utilize two kinds of NNs, a dense NN and an RNN, to predict the future traffic of Small Base Stations (SBSs) according to the previous trace. Then, a threshold is defined to decide whether each SBS should be switched off or kept on. Another unified strategy directly utilizes the traffic trace to predict the BS switching scheme, as shown in Fig. 5. It should be noted that the threshold in this proposal is adjustable to achieve a balance between the coverage loss and the efficiency loss.
Simulation results illustrate that energy consumption can be reduced by 63%, while more than 99.9% of requests can be satisfied.

Different from the above scenarios which only consider two work states, Pervaiz et al. [140] analyze the switching policy for multi-sleep-level-enabled BSs in a two-tier cellular network. The machine learning technique is utilized to decide the best sleep level of the SBSs, while the users keep connections with the Macro Base Stations (MBSs). Specifically, an SVM regression model is considered to predict the vacation period and operation time of the SBSs according to the historical network traffic profile. The prediction results are then analyzed along with the energy consumption and latency to decide which sleep level each SBS should be switched to. It should be noted that the SVM utilized in this paper can be replaced by other regression models.

The above research works utilize the historical traffic profile to efficiently train the ML models in a supervised manner. Researchers have also proposed approaches combining RL and transfer learning to increase the flexibility and accelerate the convergence. The authors of [106], [107] consider an RL agent to select the BS work modes for system power minimization according to the traffic patterns. Moreover, transfer learning [105] is exploited to use the past learning experience in current scenarios, which can accelerate the learning process. However, these two research works [106], [107] neglect the QoS even though the authors consider the user association policy after switching off some BSs. To solve this problem, in [141], the cost function of the RL model is defined as an adjustable combination of energy consumption and service delay instead of only energy consumption [106], [107]. Consequently, their proposal can not only reduce energy consumption but also guarantee diversified QoS requirements.
Additionally, the transfer learning technique is utilized to accelerate the convergence of the considered AC model [98]. Another similar research work [142] also combines RL and transfer learning to design the BS switching policy. In this proposal, the knowledge learned for spectrum assignment is transferred to the process of user association.

The Deep Q-learning (DQL) technique has also been applied to design the BS switching policy based on the network traffic in [143]. Different from the research in [106], [107] which directly utilizes the traffic pattern, the authors in [143] consider a traffic modeling module to iteratively fit an Interrupted Poisson Process [144] and predict the next traffic belief state. Since the traffic model is learned in an online fashion, it can capture the complex dynamics of real-world traffic, which allows the adopted DQL model to output more accurate actions. The adopted Deep Q-network (DQN) decides the sleeping policy according to the belief state output by the traffic modeling module, and the reward function is defined as the sum of the operation cost and the service reward. To enhance the original DQN model, a replay memory storing a certain amount of past experience is utilized in the training step as a bootstrapped estimation of the true distributions, and the stable parameters are stored in a separate network to avoid training oscillations and divergence. The authors also apply adaptive reward scaling to match the network outputs. Even though the research neglects the mutual effects among BSs, the proposed model is suitable for BSs with different traffic patterns, and the experiments with a network simulator and a dataset illustrate the advantages of the proposed model over other ML algorithms.

In the above research, switching off BSs with low usage on the one hand reduces energy consumption, but on the other hand sacrifices some network performance due to the resulting coverage holes.
Therefore, the proposed AI approaches usually define a weighted sum of energy consumption and QoS as the reward or cost function to reach a balance [140], [141]. To address the QoS sacrifice physically, Panahi et al. [145] consider a heterogeneous scenario where the Device-to-Device (D2D) technique is utilized to relay the messages toward working BSs. To decide the work state for each MBS and Femtocell Base Station (FBS), the authors propose the Fuzzy Q-learning (FQL) algorithm, which combines Q-learning (QL) and a Fuzzy Inference System (FIS) [146], [147]. In the model, the FIS is utilized to map the relationship between the input energy efficiency as well as the service success probability and the switching policy. In the QL model, the reward is defined as the weighted D2D link success probability, while a threshold on the cellular link success probability is adopted to decide whether the reward is positive or negative. With the reward function, the ε-greedy algorithm explores and exploits the potential switch on/off policies until convergence. Even though every MBS/FBS decides its own switching scheme, the control functionality including the initialization and termination of the optimization process is deployed in a central entity. After each state transition, the MBSs and FBSs receive the overall shared reward determined by the central entity and use it to update the Q value, which avoids locally selfish optimization.

Lee et al. consider the joint cell activation and user association for load balancing and energy saving in their work [148]. The authors adopt the QL method. Specifically, each BS is treated as an agent, while the state and action are the current activation variable and mode, respectively.
Once each BS chooses an action, a user association scheme can be found by relaxing the load balancing problem to a convex problem. Then, the Q-value based on the heterogeneous network (HetNet) power consumption can be calculated to evaluate the pair of BS activation and user association schemes. By iterating the process until the threshold is reached, the best scheme which jointly optimizes the load balancing and energy efficiency can be obtained. Results illustrate a significant improvement in network performance and energy efficiency.
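The predict-then-threshold pattern that runs through the work-state research above can be sketched in a few lines. A moving average stands in for the LSTM/RNN predictors of the cited works, and the traffic traces and threshold are toy assumptions.

```python
def predict_next_load(history, window=3):
    # stand-in for a learned traffic predictor: moving average of recent slots
    recent = history[-window:]
    return sum(recent) / len(recent)

def switching_decision(histories, threshold=0.2):
    decisions = {}
    for bs, history in histories.items():
        forecast = predict_next_load(history)
        # a lower threshold saves more energy but risks coverage/QoS loss;
        # this is the adjustable trade-off knob discussed in the text
        decisions[bs] = "off" if forecast < threshold else "on"
    return decisions

traffic = {
    "SBS-1": [0.9, 0.8, 0.7, 0.75],   # busy cell: forecast stays high
    "SBS-2": [0.3, 0.1, 0.05, 0.05],  # draining cell: candidate for sleep
}
decisions = switching_decision(traffic)
```

Swapping the predictor for a trained model and the binary on/off for multiple sleep levels recovers, respectively, the schemes of [137], [139] and [140].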
3) User Association and Load Balancing:
Switching idle BSs to sleep or off mode may result in overloaded usage of nearby working BSs, which further leads to QoS deterioration. To strike a balance between energy efficiency and QoS, AI-based user association schemes have been studied.

Zhang et al. adopt the QL technique to decide the user offloading policy to reduce energy consumption as well as improve network throughput [149]. In this paper, the authors consider that part of the connected users of each SBS can be offloaded to a neighboring SBS or MBS in multi-tier Ultra Dense Networks (UDNs). In this way, idle SBSs can be turned to sleep or off mode, while overloaded SBSs can be relieved to ensure the provided services. The proposed QL model aims to decide how much workload of each SBS can be offloaded to other BSs. The state space includes the load of the studied cell and its neighboring cells as well as the proportion of users who could be offloaded. To guarantee the energy-saving performance and network throughput concurrently, the reward function considers the energy efficiency, throughput, and the load difference among the cells. The authors also utilize the mean normalization method to eliminate the sample differences among the considered factors when defining the reward function.

The authors of [117] combine game theory and the RL technique to solve the user association and Orthogonal Frequency Division Multiple Access (OFDMA) tile assignment. Specifically, each user is treated as a player choosing a heterogeneous NodeB (HeNB) considering the potential profit and the effects on other players. Since the combinatorial problem can result in a huge number of potential solutions, the authors propose two RL approaches to intelligently guide the search: a regret learning-based algorithm and a fictitious play-based algorithm. In the former, the Q value is defined according to the regret, which is interpreted as the difference between the actual payoff the agent realizes and the potential payoff if another HeNB were chosen.
In the latter, the agent reinforces a strategy considering the payoff calculated on the empirical frequency distribution of the opponents' actions.

Wang et al. [54] utilize ML techniques to predict potential traffic bursts and then conduct traffic-aware vehicle association. In their proposal, a supervised learning model is adopted to analyze the statistical correlation between past and present traffic, and online learning is adopted with the goal of minimizing regret instead of loss. In the proposed architecture, every AP performs independent traffic prediction, while the central coordinator conducts the global traffic balance. Since the vehicles travel across the APs, the traffic changes in adjacent cells are correlated. Thus, the traffic prediction of each AP is based on the historical data rates and association information of the neighboring APs. Once the central coordinator obtains the traffic forecast results, it can proactively update the BS configurations to change the user association information. In this way, some BSs can make preparations for the coming traffic burst, while other BSs can be switched to off mode.

C. Power Control and Resource Allocation
According to Equation 3, to improve the system energy efficiency, the transmit power control and the resource allocation which affects the interference are critical. Since ultra-massive Multiple-Input Multiple-Output (MIMO), Non-Orthogonal Multiple Access (NOMA), and beamforming will be important techniques in 6G [1], we introduce the power control for these parts as well as the general power control issue.
1) General Power Control:
The transmit power of BSs affects the received SINR at the targeted receivers as well as the interference to users in neighboring cells. Thus, the optimization of energy consumption is also jointly considered with interference mitigation through transmit power control. In [21], [55], Zhang et al. utilize the RL technique to optimize the transmit power for alleviating the interference in neighboring cells according to the received SINR and user density. In their proposal, for each transmit power level, every target BS is assumed to obtain a defined utility according to the received SINR at the target users, the energy consumption, and the interference to non-served users. The Q-value can then be defined according to this utility to measure the overall performance of the transmit power level. With the Q-function, the target BSs apply the ε-greedy policy to determine the optimal transmit power level. The evaluation illustrates the reduced energy consumption and interference as well as the improvement in network throughput. In [21], the authors further propose a CNN-based DRL model to map from the network states, including the received SINR, the user density in the target cell, and the estimated channel conditions in neighboring cells, to the transmit power level. The evaluation shows that the DRL-based method can further improve the network performance in terms of energy consumption, throughput, and interference. Another important advantage is that the DRL method converges much faster than the RL-based strategy.

Dong et al. [108] utilize a Fully-connected NN (FNN) and a cascaded NN to optimize the transmit power and channel allocation, aiming at minimizing the network energy consumption while considering the various service requirements. In this paper, the arrival rates of services and packets are considered as the input. For the FNN, the transmit power and channel allocation are adopted as the output.
Since the transmit power is a continuous parameter while the channel allocation takes discrete values, the quantization error in the output layer cannot guarantee the optimal solution even though the DL structure is trained in a supervised manner with labeled data generated by a global optimization method. To solve this problem, the authors consider a cascaded FNN structure where the first FNN predicts the channel allocation and the second the power control for each user. The authors also analyze the non-stationary channel conditions and different service types, and then adopt the transfer learning technique to fine-tune only the last few layers of the structures through the backpropagation process, as shown in Fig. 7. For non-stationary wireless channels, the first FNN in the cascaded structure only needs to fine-tune the last few layers with a small number of data samples, as shown in Fig. 7a. On the other hand, because the channel distribution, which is its input, changes, all layers of the second FNN need to be fine-tuned. Moreover, the authors mention that fine-tuning the last few layers can also be applied when the service type changes. For instance, the parameters of the last few layers of the cascaded FNN used for delay-tolerant services can be fine-tuned to fit delay-sensitive or URLLC services, as shown in Fig. 7a. Furthermore, if multiple types of services coexist, the authors propose a structure as shown in Fig. 7b, where a few layers are cascaded at the end of the FNN for each service. In this way, only the parameters of the newly-added layers need to be fine-tuned with a few training samples.

Matthiesen et al. [49] utilize an ANN to determine the transmit power according to the channel states. The research goal of their proposal is to optimize the weighted sum energy efficiency, which is a non-convex problem.
To solve this problem, they first propose an improved Branch-and-Bound (BB) based algorithm to obtain the globally optimal solution. The results obtained with this method can then be utilized to train the ANNs in a supervised manner. Since the training is conducted offline, the ANN can be trained with a large dataset generated by the proposed BB-based algorithm to approach globally optimal performance. The online calculation of the transmit power based on the ANN is shown to be robust against mismatches between the training set and real channel conditions.

Liu et al. [150] study the power allocation in a distributed antenna system and utilize the KNN model to optimize the spectrum efficiency and energy efficiency. In this paper, a single-cell distributed antenna system with multiple Remote Access Units (RAUs) is considered, and the transmit power of the RAUs should be optimized. However, the research purpose is not further improvement over traditional methods; instead, the authors target the high computation overhead of existing methods and utilize the KNN to map the relationship between the user location and the power allocation, under the assumption of available Channel State Information (CSI) and orthogonal channel resources. Thus, they utilize the traditional method to obtain data samples for training the KNN models. In the running phase, the Euclidean distances between users in the testing and training groups are calculated, and the power of the nearest neighbor in the training samples is copied to the user in the test group. The final performance analysis shows that the KNN can achieve near-optimal performance.

The power control for a multi-layer HetNet is more complex, and it is difficult to reach the global optimum. Zhang and Liang [103] propose a multi-agent-shared-critic DRL method conducted in the core network.
Specifically, in the core network, an actor and a target actor DNN are trained for every BS, while a shared DNN pair acts as the critic and target critic.

Fig. 7: The transfer learning techniques for dynamic channel conditions and multiple service types. (a) The transfer learning model for non-stationary wireless channels: the knowledge learned in the source scenario is transferred to the target task, where only the last few layers are fine-tuned. (b) The transfer learning model for multiple types of services: a few service-specific layers are cascaded at the end of the shared network and fine-tuned per service.

The actor DNNs are trained with redundant experience and then share the weight parameters with the corresponding local DNNs, which can calculate the transmit power with real-time local data. To avoid falling into a local optimum, the core network utilizes the global experience to train the critic DNNs. Li et al. [151] combine graph theory and the RL technique. In this research, a conflict graph constructed according to the SINR received by the users is utilized to dynamically cluster the cells in order to optimize the channel allocation. To optimize the power control within the cell clusters, the RL technique is utilized, where the SBS acts as the agent. The state space consists of the interference set and the RSS, while the reward is defined according to the throughput and interference.

With the extension of utilized frequency bands to THz, the propagation loss and penetration loss will become increasingly serious. To address this problem while keeping satisfactory coverage, the radius of future THz-enabled BSs will be limited to 10 meters. Thus, power control to mitigate the interference in indoor networks will attract increasing attention. The authors in [152] propose QL-based distributed and hybrid power control strategies to optimize the network performance in terms of throughput, energy efficiency, and user experience satisfaction. For BSs without mutual communications, each BS acts as the agent to determine the power for each Resource Block (RB) in a selfish manner. On the other hand, if a central controller is provided, it conducts the QL model to decide the transmit power for each BS. In these two methods, the state is the received SINR level and the current transmit power level, while the action is the power level that can be assigned to each RB. The reward functions are defined according to the throughput.
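A common core of the power control schemes above is choosing among discrete transmit power levels so as to maximize bit-per-Joule efficiency subject to a minimum-SINR (QoS) constraint. The sketch below enumerates that choice for a single link; the cited works instead learn it with (deep) RL, and all channel and power figures are illustrative assumptions.

```python
import math

B, GAIN, NOISE_PLUS_I = 1e6, 1e-7, 2e-9     # Hz, linear gain, watts
MU, P_STATIC = 2.0, 0.1                      # amplifier inefficiency, static watts
MIN_SINR = 4.0                               # QoS constraint (linear)
LEVELS = [0.05, 0.1, 0.2, 0.4, 0.8]          # candidate transmit powers (W)

def ee_of(p):
    # Equations (2)-(3) applied to one candidate power level
    sinr = GAIN * p / NOISE_PLUS_I
    if sinr < MIN_SINR:
        return None                          # infeasible: violates the QoS constraint
    rate = B * math.log2(1 + sinr)
    return rate / (MU * p + P_STATIC)        # bit-per-Joule

feasible = {p: ee_of(p) for p in LEVELS if ee_of(p) is not None}
best_power = max(feasible, key=feasible.get)
```

Under these numbers the smallest feasible level wins: rate grows only logarithmically with power while consumption grows linearly, which is precisely why the surveyed RL agents are rewarded for low-power actions that still satisfy the SINR constraint.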
2) Beamforming:
Adaptive beamforming is an important technology that adjusts the directionality of an antenna array to enable highly directional transmissions in densely populated areas. Through the adaptive beamforming technique, the network performance of a hotspot can be significantly improved, which further results in increased energy efficiency. However, the hotspot areas are not fixed due to the dramatically changing user distribution caused by users' lifestyles and habits. In [125], Liu et al. utilize the LSTM to extract the spatial and temporal features of UE distributions from the historical dataset and detect future hotspots. Based on the location information of predicted hotspots, hybrid beamforming, which combines the digital and analog beamforming techniques at the MBS, can be adjusted to minimize the total power consumption. Specifically, in the analog beamforming design of massive MIMO systems, the phase shifters can be adjusted to maximize the large array gain. For hybrid beamforming, the optimal power allocation and beamforming directions can be found by converting the original problem into a convex one. The final results also illustrate the reduced energy consumption.

Du et al. jointly optimize the cell sleeping control and beamforming operation with DNN models in [153]. The authors first model the power minimization problem through joint cell sleeping and coordinated beamforming, where the formulated problem is constrained by the required SINR and maximum power threshold. To alleviate the computation overhead of the numerical method in large-scale scenarios, the authors consider DNN models to map the relationship between the channel coefficients and beamforming vectors. The numerical method can then be adopted to generate the training data, which are further utilized to train the constructed DNN models. To illustrate the performance, the no-sleep-control and equivalent association strategies are compared.
The final results show that the DNN-based method can achieve obvious advantages in terms of power saving and satisfaction of QoS demands.

The authors of [154] consider manifold learning [155] and the K-means method [88] to cluster the multi-cell users into several regions and reduce the complexity of the considered massive MIMO operation. In the two-tier massive MIMO system, the interference mitigation and MIMO hybrid precoding processes are challenging due to the large channel dimensionality and the high complexity caused by the large antenna count. To alleviate the computation overhead, the authors first utilize the maximum-minimum distance-based K-means method to cluster users into different groups. Then, with manifold learning, the nonlinear high-dimensional channel coefficients can be transformed into linear combinations of neighborhood channel coefficients, resulting in a significant dimension reduction of the channel matrix while keeping the original geometric properties of the underlying channel manifold. Furthermore, the two-tier beamformers are mainly characterized by the distribution of low-dimensional manifolds and split into an outer beamformer and an inner beamformer, which are utilized to minimize the inter-cell interference and the multi-user intra-cell interference, respectively. The final results illustrate the improved SINR and reduced computation complexity.

Beamforming is also jointly optimized with other network factors to improve energy efficiency, such as relay operations. Zou et al. [156], [157] adopt the DRL technique to improve the multi-antenna Hybrid AP (HAP) beamforming strategies and RF-powered relay operations. In their considered scenario, an individual relay can forward or backscatter the signal to improve the received SINR. Moreover, the relay needs to harvest part of the received power to maintain a continuous working state.
Then, a hierarchical Deep Deterministic Policy Gradient (H-DDPG) model is proposed to select the relay mode and optimize the parameters including the beamforming vector, power splitting ratio, and reflection coefficient in order to maximize the SINR. Specifically, the considered model separates the studied problem into two sub-problems. The DQN model is utilized in the outer loop to select the relay mode. Once the relay mode is selected, the channel conditions can be used by the AC networks [98] of the inner-loop Deep Deterministic Policy Gradient (DDPG) to generate the actions, which represent the values of the beamforming and relay operation parameters. To accelerate the convergence of the conventional DDPG model, which is slowed by the random initialization of the double Q-networks, an optimization model is developed to approximate the original problem and estimate a lower bound of the target value. The simulations show the improvement in the final reward value and convergence speed compared with the model-free DDPG method. Moreover, the H-DDPG-based framework can significantly improve throughput.

Since UAVs are usually adopted as flying BSs, AI techniques have also been applied to UAV-enabled cellular networks. Li et al. [158] combine the ML and Mean Field Game (MFG) techniques to jointly optimize the beamforming and beam-steering to maximize the system sum rate. In the considered scenario, the optimization of hybrid beamforming lies in optimizing the hybrid analog beamformer. The Cross-Entropy (CE) function is adopted to evaluate the obtained system sum rate corresponding to each randomly generated hybrid analog beamformer. In the beam-steering optimization process, the MFG framework is adopted, where the beams act as the agents and information interactions are converted into interactions with the mass.
Considering that the conventional numerical methods require action and state spaces of large dimensions, the RL technique is adopted to solve the MFG. Specifically, the state is defined as the combination of the index offsets of the antenna elevation and azimuth angles, while the actions represent the beam-selectable path, elevation Angle of Departure (AoD), and azimuth AoD. The reward function is defined according to the obtained system rate. Through the QL process, the optimal action can be chosen.
3) MIMO:
In distributed massive MIMO systems, the pilot sequences transmitted by users are usually adopted to estimate the CSI. However, the pilot contamination caused by the reuse of the same orthogonal pilot sequences affects the channel estimation accuracy. To alleviate the pilot contamination, the power allocated to each pilot sequence is important. Xu et al. design an unsupervised learning method to predict the power allocation scheme according to the large-scale channel fading coefficients [159]. In their research, the authors consider the Minimum Mean-Square Error (MMSE) channel estimator and formulate the problem as a sum MSE minimization. Then, a DNN is exploited with the channel fading coefficients and power allocation as the input and output, respectively. With the loss function defined by the sum MSE of channel estimation, the training process enables the DNN to map the nonlinear relationship from the channel fading coefficients to the optimal pilot power allocation. Similarly, the authors of [14] consider the same input and output for the designed Deep Convolutional Neural Network (DCNN). The authors focus on the maximum sum rate problem in limited-fronthaul cell-free massive MIMO, and a heuristic sub-optimal approach is proposed to obtain data samples, which are then used to train the DCNN model. Another similar research work [160] utilizes the ANN to map from the users' positions or shadowing coefficients to the power allocation vector. All of these research works have verified the advantages of DL techniques over traditional mathematical models in terms of power allocation in massive MIMO systems.

Intelligent power control has also been considered to suppress the attack motivation for more secure communications of MIMO transmitters in [161]. In the considered scenario, the malicious attacker can choose different attack modes including jamming, eavesdropping, and spoofing according to the potential reward.
The authors combine game theory and RL to control the power of the MIMO transmitter for the suppression of the attack motivation considering the required EE. Specifically, a game model is formulated between the MIMO transmitter and the malicious attacker, and the RL technique is adopted to derive the optimal power control and transmission probability to reach a Nash Equilibrium (NE) in favor of the MIMO transmitter. The final results illustrate the improvement in transmission secrecy performance and energy efficiency.

The authors of [162] and [48] utilize the CE-based algorithm to solve the hybrid precoding problem in mmWave massive MIMO systems. Specifically, the CE-based algorithm is adopted to update the probability distribution of the analog beamformer in the iteration process, and then the "elite" analog beamformer which results in the minimum total transmit power can be found. Moreover, the authors of [48] adaptively weight different elites according to their objective values, which can further improve the performance of CE-based algorithms. The simulations in the two papers verify that the CE-based hybrid precoding scheme can improve the energy efficiency of mmWave massive MIMO systems with low complexity.

Different from the above research focusing on intelligent power control in massive MIMO systems, the authors of [163] propose a DL-based user-aware antenna allocation strategy. In their research, the LSTM model trained with a real dataset is adopted to predict the variations of future associated users for the massive MIMO-enabled BSs, which is similar to the applications of DL in traffic forecasting [15], [137], [139]. Based on the prediction results, the optimum number of BS antennas is allocated to maximize the EE.
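The CE-based search over discrete analog-beamformer settings can be sketched as follows: candidates are sampled from a per-antenna phase distribution, the "elite" candidates with the best objective values are kept, and the distribution is re-fitted to them. The objective below (normalized array gain toward a fixed steering vector) and all sizes are illustrative assumptions, not the transmit-power objective of [162], [48].

```python
import numpy as np

# Toy Cross-Entropy (CE) optimization over quantized phase-shifter
# settings.  Each of N antennas picks one of Q phases; elites re-fit the
# per-antenna sampling distribution.  Objective and sizes are assumptions.

rng = np.random.default_rng(2)
N, Q = 8, 4                                    # antennas, phase levels
phases = np.exp(1j * 2 * np.pi * np.arange(Q) / Q)
target = np.exp(1j * 2 * np.pi * 0.25 * np.arange(N))  # assumed steering vector

def gain(idx):
    """Normalized array gain in [0, 1] of the beamformer given by `idx`."""
    w = phases[idx]
    return np.abs(np.vdot(w, target)) / N

P = np.full((N, Q), 1.0 / Q)                   # per-antenna phase distribution
for _ in range(40):
    cand = np.array([[rng.choice(Q, p=P[n]) for n in range(N)]
                     for _ in range(50)])      # sample 50 candidates
    scores = np.array([gain(c) for c in cand])
    elites = cand[np.argsort(scores)[-10:]]    # keep the 10 best
    for n in range(N):                         # smoothed distribution update
        counts = np.bincount(elites[:, n], minlength=Q)
        P[n] = 0.9 * counts / counts.sum() + 0.1 * P[n]
        P[n] /= P[n].sum()                     # guard against float drift

best = np.array([int(np.argmax(P[n])) for n in range(N)])
```

The elite-weighting refinement of [48] would replace the uniform elite count above with weights derived from each elite's objective value.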
4) NOMA:
The NOMA technique introduces an extra power domain to enable multiple users to be multiplexed on the same channel resource [101], which can improve the network capacity and resource efficiency. Thus, the resources including the power and channels are usually considered as the key factors to be optimized for network performance improvement. In [101], the authors first utilize the DRL technique to conduct the channel assignment, alleviating the computation overhead that conventional methods suffer from due to the huge solution space. In the proposal, each BS acts as the agent, while the NOMA system is regarded as the environment. An attention-based NN is adopted to model the channel assignment policy, with the encoder computing the embedding of the state space and the decoder outputting the probability distribution over all states. Once a channel assignment solution is obtained, the corresponding power allocation can be calculated. Then, the derived system performance is further utilized to define the reward function, and the training process enables the proposed NN to find the optimal channel assignment according to the system states with low complexity. The authors of [164] also utilize the DL technique to alleviate the computation overhead of conventional methods. However, their proposal trains the DNN in a supervised manner, where the downlink channel gains and the corresponding power allocation scheme serve as the input and output, respectively.

Zhang et al. [22] also consider DL-based radio resource management to improve the EE in NOMA networks. Besides the subchannel and power allocation considered in [101], the authors of [22] also analyze the user association since they consider two-tier networks including MBSs and SBSs. The authors optimize these three factors separately with three methods.
Specifically, a semi-supervised learning-based NN and a supervised learning-based DNN are adopted to optimize the subchannel assignment and power allocation, respectively, while the Lagrange dual decomposition method is used to solve the user association problem. In the optimization of subchannel assignment, a numerical iterative method, the two-sided matching algorithm, is utilized to generate some labeled data samples, which cooperate with the unlabeled data to train the NN in a semi-supervised learning manner. The authors consider the co-training semi-supervised learning model [165], where two NNs are trained with the data from different views to produce the optimal learner. The input and output of the NNs are the channel gains and allocation strategies, respectively. Since the classification with unlabeled data still depends on the labeled data, the authors select the highly confident labeled data with the most consistency. To optimize the power allocation, the DNN is trained with the labels generated by an iterative gradient algorithm.

User clustering is an important factor to improve energy efficiency in NOMA-enabled multi-tier cellular networks. Zhang et al. [89], [90] adopt K-means clustering to cluster the users in THz MIMO-NOMA systems. In their research, the users are separated into different clusters of the SBSs and MBS in the coverage. Since THz transmission is challenged by severe path spreading loss and molecular absorption loss, a suitable clustering scheme can improve the channel quality and suppress the interference, resulting in higher SINR and transmission throughput. Then, the authors propose an enhanced K-means strategy to cluster the users. To overcome the fluctuations caused by different initial clustering centers in the conventional K-means method, the authors calculate the channel correlation parameters of different cluster heads and choose the one that maximizes this metric.
The MSE analysis clearly verifies the improved convergence compared with the conventional K-means method.
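The sensitivity of conventional K-means to its initial centers, which motivates both the maximum-minimum distance seeding in [154] and the correlation-based center selection in [89], [90], can be illustrated with a minimal sketch. The data here are toy 2-D "channel features", an assumption for illustration only.

```python
import numpy as np

def maxmin_init(X, k):
    """Maximum-minimum distance seeding: start from the first point,
    then repeatedly add the point farthest from all chosen centres."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    return np.array(centers)

def kmeans(X, k, iters=50):
    """Plain Lloyd iterations from the max-min seeding."""
    C = maxmin_init(X, k)
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)
        C = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, C

# Two well-separated "user" groups in a 2-D channel-feature space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels, C = kmeans(X, 2)
```

The enhanced scheme of [89], [90] plays the same role as `maxmin_init` here, but selects cluster heads by channel correlation rather than Euclidean distance.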
D. Energy Harvesting-enabled Base Station
Motivated by the concern for climate change and inspired by the development of energy harvesting, renewable energy resources have been considered to alleviate the demand on the power grid. On the other hand, the dynamics of renewable energy resources complicate the management and operation of cellular networks. AI techniques have been widely studied to track the dynamic harvesting source and optimize network operations.

To optimize cellular network performance with renewable energy-enabled BSs, the most direct method is to predict the harvesting power. For the scenario where the BS is powered by a photovoltaic (PV) panel, a battery, and the power grid, the authors of [15] adopt Block Linear Regression (BLR) [166], the ANN [167], and the LSTM [168] to forecast the traffic, while a linear regression model is utilized to predict the dynamic harvesting power. To measure the performance of these ML models, metrics including the Average Mean Absolute Relative Error (AMARE) and Average Mean Error (AME) are analyzed. Then, the prediction results can be utilized to switch off some micro BSs in low usage to save energy.

Miozzo et al. [169] propose a distributed RL-based SBS switching strategy to balance the network drop rate and energy consumption for two-tier cellular networks where the SBSs and the MBS are powered by renewable solar energy and the electricity grid, respectively. The state space includes the instantaneous energy harvested, battery level, and traffic load, while the reward is defined according to the system drop rate and battery level. However, this method is limited in reaching the system optimum since each SBS acts as an agent and decides its working state according to its local state. To alleviate this problem, the authors further propose a layered learning optimization framework in [126]. In the lower layer, each SBS still follows the original manner to decide the switching scheme in a distributed intelligent manner.
The only difference is that a heuristic function is defined and united with the regular Q-value to select the optimal policy. Moreover, the heuristic value is decided in the upper layer in a centralized manner. Specifically, the MBS utilizes a multi-layer NN to forecast its traffic load and judge whether the system is under-dimensioned or over-dimensioned. Based on the load estimation, the heuristic value is derived.

Li et al. [170] utilize the DRL method to manage the working states of the harvesting-enabled SBSs in a centralized manner. In their proposal, the central controller acts as the agent to decide the action, which is a vector of binary units representing the switching decision for each SBS. The state space includes the harvested energy, battery levels, traffic loads, throughput, and delay of all SBSs. Since the research aims to balance the EE and QoS, the reward function is defined as the weighted sum of the two metrics. Using DNNs to approximate the Q-value, the final simulation results clearly illustrate the advantages of DQL over the traditional QL in terms of energy efficiency and delay. On the other hand, this method has the shortcoming that the size of the action space exponentially increases with the number of SBSs, which leads to abundant explorations during the training process. To solve this problem, Li et al. [171] consider the DDPG model. In this model, the AC algorithm [172] is adopted, where an actor NN and a critic NN are used to select an action and evaluate the selected action, respectively. The final results verify the improved energy efficiency over the DQN and QL methods.

Since renewable energy-enabled BSs are usually equipped with batteries to store the harvested energy, optimizing battery management can also contribute to the EE. The authors of [173] propose an FQL-based power management scheme which combines QL and an FIS [174] to minimize the electricity expenditures and enhance the battery life span.
The authors also construct the power consumption model related to the real-time traffic as well as the battery aging model, which is meaningful for designing a more detailed energy-efficient BS management policy in the future. Piovesan et al. [175] analyze the constrained capacity of the SBS battery and consider energy sharing in the design of the SBS switching scheme. The authors utilize and compare imitation learning [176], QL, and DQL methods. The considered state includes the battery level and harvested energy, while the reward functions in the two RL models are defined according to the grid energy consumption. In the imitation learning model, the ANN is trained in a supervised manner with the labeled data generated by a mathematical model [177] to map the relationship between the system state and switch action. For the two RL models, the difference is that the Q-value is stored in a table for QL, while DQL utilizes an ANN approximator to estimate the Q-value. The final comparison illustrates that the DQL model achieves the best performance in terms of energy saving and system outage, making it more suitable for highly dense scenarios.

Wei et al. [123] utilize the policy gradient-based AC networks [178] to solve the user scheduling and resource allocation problem for the optimization of the EE in a two-tier HetNet where the SBSs are powered by solar and wind energy. Since the wireless fading channels and stochastically harvested renewable energy have the Markovian property [102], the optimization of user scheduling and resource allocation can be formulated as an MDP, which lays the foundation for using the DRL method. In their proposal, the state space consists of the SINR of each user and the battery energy level of each SBS, which are both continuous variables. The action space includes the number of allocated users and subchannels as well as the transmission power. The reward function is defined as the EE with only the grid energy consumption considered.
Through online training, the final numerical analysis illustrates the improved EE.

From the above introductions, it can be found that AI techniques are efficient in addressing the dynamics of the energy harvesting process. Similar to the BSs that are only powered by the electricity grid, AI models can be utilized to optimize the switching scheme, user association, power control, and resource allocation.

E. Summary
In the above research, AI techniques can be utilized to optimize different network parameters in order to reduce energy consumption or improve the EE. The supervised learning technique can be utilized to regress the complex unknown relationships among the network parameters. For example, AI models can be trained with the data generated by conventional methods to map the relationship between channel conditions and power allocation [14], [153]. Thus, AI-based algorithms can avoid the massive iterations and alleviate the computation overhead of conventional methods. Moreover, the RL and DRL techniques can efficiently address the problem of the huge solution space [171], [179]. Furthermore, the combinations of ML/DL models with heuristic algorithms or game theory can further enhance efficiency [79], [117], [134], [161], [180].

IV. MACHINE TYPE COMMUNICATIONS
Besides the cellular networks, MTC techniques provide users with more choices and flexibility, and the development of the IoT will result in a surging number of MTC devices [181]. In this section, we first give the power consumption model of MTC and then introduce the related AI strategies to reduce energy consumption and improve efficiency.
A. Power Consumption Modeling
The actual energy consumption of MTC depends on the specific scenario, including the transmission policy, devices, information size, and so on. In this part, we give a general power consumption model for the single-hop MTC scenario, from which the multi-hop power consumption model can be derived.

The total power consumption of a machine node is mainly utilized for two purposes, transmitting and receiving packets, which can be simplified in the following equation; the details can be referred to [182]:

P_m(d) = P_t + P_r + P_a(d),    (4)

where P_m denotes the total power consumption of an MTC node, P_t and P_r are the power consumed by the circuit for transmitting and receiving, which are usually regarded as constants, and P_a(d) denotes the power consumption of the Power Amplifier (PA), where d is the transmission distance. From the equation, it can be found that the total power consumption depends on the PA. However, the value of P_a(d) is affected by many factors including the specific hardware implementation, DC bias condition, load characteristics, operating frequency, and the required PA output power P_mt. In a specific scenario with given MTC devices, we usually only study the required minimum PA output power while the other factors are constant. The relationship between the two metrics can be denoted as below:

P_mt(d) = η P_a(d),    (5)

where η denotes the drain efficiency of the PA. Specifically, the value of the required minimum PA output power P_mt can be calculated according to the given SINR threshold at the receiver side and the path loss model between the transmitter and receiver. Then, the power consumption for a single-hop MTC model can be calculated. By adding the power consumption of each hop, the multi-hop power consumption can be obtained. Since the definition of energy efficiency in MTC is similar to that in cellular networks, we can use Equation 3 accordingly.

It can be found that the power consumed by the MTC node is mainly used to support the circuit and PA.
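Equations (4) and (5) can be read off numerically as follows. The log-distance path-loss model and every constant below (reference loss, exponent, noise, circuit powers, drain efficiency) are illustrative assumptions, not values from [182].

```python
import math

def required_pa_output(d, sinr_db, noise_w=1e-12, pl0_db=40.0, n_exp=3.0):
    """Required minimum PA output power P_mt(d): the received power must
    reach (SINR threshold) * noise after an assumed log-distance path loss."""
    path_loss_db = pl0_db + 10 * n_exp * math.log10(d)   # PL(d), d in metres
    p_rx_w = 10 ** (sinr_db / 10) * noise_w              # needed receive power
    return p_rx_w * 10 ** (path_loss_db / 10)            # watts at the PA output

def node_power(d, sinr_db, p_t=0.02, p_r=0.02, eta=0.35):
    """Total node power from Eq. (4): P_m(d) = P_t + P_r + P_a(d),
    with P_a(d) = P_mt(d) / eta obtained by inverting Eq. (5)."""
    p_a = required_pa_output(d, sinr_db) / eta
    return p_t + p_r + p_a
```

Summing `node_power` over the hops of a route gives the multi-hop consumption mentioned above.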
Since most of the MTC nodes do not need to keep a working state, the idle nodes can be turned into the sleep state to reduce the circuit energy consumption. For the working nodes, how to reduce the required transmit power P_mt as well as minimize the transmission time are the main factors considered in green communications. For the former part, the transmit power depends on the path loss and the required SINR at the receiver; the practical solutions include the optimization of network deployment, access technologies, and resource allocation. To reduce the transmission time, we need to optimize the transmission protocols, such as routing and relaying. Similar to the renewable energy-enabled BSs, energy harvesting and sharing are also important techniques toward green MTC. The following paragraphs introduce the related research one by one.

B. Energy-Efficient Network Access
Various access technologies have been developed for different MTC scenarios, such as cellular communications, IEEE 802.15.4, WiFi, Narrow-Band IoT (NB-IoT), backscatter communications, and so on. Satellites and Unmanned Aerial Vehicles (UAVs) have also emerged as platforms to provide Internet access for devices. In this part, we discuss how AI is utilized to improve the energy efficiency of these access technologies.
1) Terrestrial Access Configurations:
Even though cellular communications can provide stable and high-throughput connections, the high power consumption to keep connections as well as the expense of the cellular infrastructure challenge their wide application in MTC. Moreover, since different MTC services have heterogeneous QoS requirements and are distributed in various areas including sparsely populated areas and hazardous environments, developing corresponding access techniques is important to reduce energy consumption and extend the lifetime. Some AI research works related to improving the energy efficiency of these access technologies are introduced in the following paragraphs, and Table II gives more examples of adopting AI to optimize the access layer for green communications.

Li et al. adopt the RL technique to optimize the duty cycle control for each router node in IEEE 802.15.4-based M2M communications [183], [184]. The authors consider the QL method to design the superframe order for minimizing the sum of weighted energy consumption and delay. In the considered RL model, the agent interacts with the environment and chooses the suitable superframe order according to the queue length, and the final simulation results verify the improved energy efficiency. Xu et al. also utilize the model-free RL method to improve the throughput and EE of IEEE 802.15.4-enabled Industrial IoT (IIoT) networks [185]. In their research, QL is adopted to adjust the sampling rate of the control subsystem and the backoff exponent, which is difficult to address with traditional stochastic modeling approaches. For IEEE 802.15.4-based MTC scenarios, Zarwar et al. [186] give a comprehensive survey on RL-enabled adaptive duty cycling strategies, which can be referred to for more knowledge.

Alenezi et al. focus on the LoRa communication technology and utilize the K-means clustering method to cluster the nodes in order to reduce the collision rate [187].
To address the high probability of packet collision caused by random access and simultaneous transmissions, the authors first utilize K-means clustering to separate the IoT nodes into several groups and then schedule their transmissions according to dynamic priority. The final simulations illustrate a significant reduction of the collision rate, which further results in decreased transmission delay and energy consumption. Azari and Cavdar [188] also utilize AI to optimize the performance of LoRa. The authors consider the Multi-Agent Multi-Armed Bandit (MAB) framework to choose the best transmit power level, spreading factor, and subchannel to maximize the reward, which is defined as a weighted sum of communication reliability and EE. The analysis illustrates the lightweight complexity of the proposed algorithms and verifies the performance improvement in terms of energy efficiency and transmission success probability.

Guo and Xiang [202] utilize the distributed multi-agent RL technique to pick the power ramping factor and preamble for each UE in NB-IoT networks. In their research, an adaptive learning rate-based QL algorithm is proposed for the non-stationary environment, with the reward defined according to the UE's energy consumption. Moreover, the learning rate is adjusted after comparing the current expected reward and expectations. The authors of [203] also utilize QL techniques to optimize the configurations in the random access process.

TABLE II: Some Related Research Works Using AI to Optimize Network Access for Green MTCs

| Research work | Access technology | Learning method | AI model | Input/state | Output/action | Target/metric |
| Zhou et al. [189] | UAV | reinforcement | DNN | UAV location, task information, energy cost | offloading decision | delay |
| Nguyen et al. [190] | UAV, D2D | reinforcement | DDPG | D2D SINR information | harvesting time | energy efficiency |
| Chen et al. [191] | satellite | reinforcement | QL | traffic demand, channel condition | power allocation | service fairness |
| Özbek et al. [192] | cellular network, D2D | supervised | ANN | channel gain | power allocation | energy efficiency |
| Zhang et al. [193] | cellular network, D2D | reinforcement | DDPG | QoS satisfaction degree | mode selection, power control | energy efficiency |
| Ji et al. [194] | cellular network, D2D | reinforcement | DQN | SINR, transmission power | transmission power change | energy efficiency |
| Chowdhury et al. [195] | unknown | reinforcement | DNN | current resource allocation, service demand | resource allocation strategy | energy cost and delay |
| Yang et al. [196] | WiFi, VLC | reinforcement | DQN | channel status, channel quality, service types, service satisfaction | network selection, channel assignment, power management | energy efficiency |
| Yang et al. [197] | CRN | actor-critic reinforcement | AC | channel states, device priority, channel SNR, traffic load | power control, spectrum management, modulation selection | transmission rate, power, throughput, and delay |
| Rahman et al. [198] | RAN | reinforcement | DQN | content states, channel states, and power | caching policy and power allocation | delay |
| Sharma et al. [199], [200] | unknown | reinforcement | DNN | battery states and channel gains | transmit power | throughput |
| Bao et al. [201] | cellular | reinforcement | QL | battery states | transmit power | SNR |

Their proposal focuses on the optimization of three parameters: the number of random access channel periods, the repetition value, and the number of preambles in each access period.
In the single-cell scenario, the tabular QL, linear approximation-based QL, and DQN methods are adopted by the eNB to predict the number of preambles in order to maximize the number of served IoT devices. In the multi-cell scenario, the huge size of the action space composed of the three parameters is a great challenge. The authors consider an action aggregation approach that converts the selection of a definite value into the choice of an increase or decrease. Then, the three QL methods are compared, and a cooperative multi-agent DQL method is proposed.

Lien et al. study intelligent radio access in vehicular networks to strike a balance among energy efficiency, latency, and reliability [204]. The authors concentrate on the fronthaul radio resource starvation and propose an RL-based MAB algorithm to avoid the backhaul transmission in the core networks. In the considered scenario, each vehicle can simultaneously access multiple BSs to request contents using feedbackless transmission schemes, which implies different communication reliability, energy consumption, and latency for each choice. To strike the tradeoff among energy efficiency, latency, and reliability, the authors first formulate the Lyapunov function [205] to derive the optimum number of BSs to meet the content request of each vehicle. Then, to decide whether to use the feedback-based or feedbackless transmission scheme, the authors construct the MAB model and utilize the ε-greedy RL algorithm to solve this problem. Specifically, the research goal of this step is to minimize the long-term expected cost, which is defined as the weighted sum of the request drop events, transmission latency, and energy consumption.
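The ε-greedy MAB formulations above ([188], [204]) share a common skeleton: each arm is a candidate transmit configuration, and a running mean of observed rewards drives the greedy choice. A minimal sketch follows; the hidden per-arm rewards and noise level are illustrative assumptions.

```python
import numpy as np

# Generic epsilon-greedy multi-armed bandit over candidate transmit
# configurations (e.g. power level / spreading factor pairs).  The true
# mean reward of each arm is hidden from the learner and assumed here.

rng = np.random.default_rng(5)
true_reward = np.array([0.2, 0.5, 0.8, 0.4])   # hidden mean EE of each config
K = len(true_reward)
counts = np.zeros(K)                            # pulls per arm
values = np.zeros(K)                            # running mean reward per arm
EPS = 0.1

for t in range(3000):
    if rng.random() < EPS:                      # explore a random arm
        a = int(rng.integers(K))
    else:                                       # exploit the best estimate
        a = int(np.argmax(values))
    r = true_reward[a] + rng.normal(0, 0.1)     # noisy observed reward
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]    # incremental mean update

best_arm = int(np.argmax(values))
```

The multi-agent variants in [188], [202] run one such learner per device, with the reward redefined per UE.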
2) Access through Satellites:
Satellites can provide seamless coverage for IoT devices, especially in rural and remote areas. However, the large path loss challenges the system EE and lifetime. The authors of [206] study DRL-based channel allocation to improve the system EE as well as guarantee the QoS for LEO satellite IoT. The authors formulate the channel resource allocation as an MDP and further utilize the DRL technique to solve it. In their proposal, the agent is assumed to choose an action to assign the channels according to the state, which is defined by the user task size and location. The authors also construct the users' requests into an image as the input of the considered NN, which can reduce the input size and accelerate the learning process. The proposed intelligent approach is illustrated to save more than 60% of energy.

Fig. 8: The Actor-Critic model for UAV-assisted IoT network.

Sun et al. utilize the DL technique to optimize the Successive Interference Cancellation (SIC) decoding order for the NOMA downlink system in satellite IoT networks [207]. In this research, the long-term joint power allocation and rate control scheme is formulated to improve the NOMA downlink rate. Then, the Lyapunov optimization framework [205] is adopted to convert the original problem into a series of online optimization sub-problems, where the power allocation depends on the SIC decoding order, which is further affected by the queue states and channel states. Due to the continuous changes, the DNN model is adopted to map from the states of the queues and channels to the SIC decoding order. Moreover, the DNN is trained in a supervised manner with the data obtained by traversing all possible choices.

Han et al. combine game theory and DRL to optimize the anti-jamming performance of satellite-enabled army IoT networks [208]. In their considered scenario, the sensing devices are separated into different groups and the messages are relayed by the sink nodes to reduce energy consumption. Assuming that smart jammers may launch jamming attacks on the IoT devices, the authors first utilize the DRL technique to select the optimal location of the jammers for the maximum jamming effect. In the DRL model, the reward is defined by taking the estimated transmission energy consumption and the minimal value without jamming attacks into account. With the defined jamming policy, the anti-jamming part is constructed as a hierarchical anti-jamming Stackelberg game, which is not the focus of this paper.
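The supervised "traverse all choices, then learn the mapping" idea of [207] can be sketched for a toy two-user NOMA downlink: the label for each channel realization is found by enumerating both SIC decoding orders, and a simple logistic classifier then learns to predict the best order from the channel gains. The rate model, power split, and noise below are illustrative assumptions, and the classifier stands in for the DNN of the original work.

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
P = np.array([0.8, 0.2])                       # assumed fixed power split
NOISE = 0.1

def sum_rate(g, order):
    """Sum rate under SIC order `order`: the first-decoded user treats
    the other's signal as interference; the second decodes after SIC."""
    first, second = order
    r1 = np.log2(1 + g[first] * P[first] / (g[first] * P[second] + NOISE))
    r2 = np.log2(1 + g[second] * P[second] / NOISE)
    return r1 + r2

# Labels by exhaustively traversing both decoding orders
G = rng.uniform(0.1, 2.0, (400, 2))            # per-user channel gains
y = np.array([max(itertools.permutations(range(2)),
                  key=lambda o: sum_rate(g, o))[0] for g in G], dtype=float)

# Logistic classifier: channel gains -> index of the first-decoded user
w, b = np.zeros(2), 0.0
for _ in range(3000):
    p = 1 / (1 + np.exp(-(G @ w + b)))         # sigmoid predictions
    grad = p - y                               # cross-entropy gradient
    w -= 0.3 * (G.T @ grad) / len(G)
    b -= 0.3 * grad.mean()

acc = (((1 / (1 + np.exp(-(G @ w + b)))) > 0.5) == (y > 0.5)).mean()
```

With more users the label enumeration grows factorially, which is exactly why [207] amortizes it into a trained network evaluated online.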
3) Access through UAVs:
Since UAVs can be easily controlled to fly over the communication terminals, they have been widely recognized as aerial BSs or gateways to provide periodical Internet connections [66], [189], [209], [210]. Liu et al. [66] consider UAVs collecting the sensing information aggregated by the collection points from terrestrial IoT end terminals. In the considered scenario, the flying trajectory of the UAVs affects the received SINR, which further impacts the number of uploaded packets for each collection point in a round. The authors then utilize the DRL technique to optimize the trajectory, with the reward defined by considering the EE and packet delay priority. The dueling DQN is utilized to decide the moving direction according to the current states, which consist of both the delay priority and the energy consumption priority. The simulation illustrates the improved average reward under different network scales and data densities.

Cao et al. [210] utilize the DRL technique to optimize the channel allocation and transmit power control for IoT nodes. Specifically, with a fixed trajectory assumed, the UAV acts as the agent to select a suitable channel and transmit power for every IoT uplink at each time slot in order to maximize the reward, which is defined as the EE of all IoT nodes. The AC network [178] is adopted in the DRL algorithm, where the actor and critic NNs have different structures as shown in Fig. 8. Moreover, the authors also try different numbers of trajectory steps to update the channel and power allocations through the simulations. Similar research in [211] also considers a UAV-enabled BS with a predefined trajectory. However, that research aims to optimize user scheduling and hovering time assignment to improve the EE of battery-constrained UAVs. Since the problem is a discrete constrained combinatorial problem that is beyond the conventional AC model, the authors consider a stochastic policy to address the issue of huge discrete spaces.
A flexible reward function is defined with an adjustable parameter. The final performance illustrates that the proposed model can save nearly 30% of energy compared with the conventional AC model.

The above paragraphs present some typical research works on AI-based network access toward green MTCs. We further list some research works and the utilized AI techniques in Table II.
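The actor-critic interaction of Fig. 8 reduces, in its simplest tabular form, to a critic whose TD error both updates the value function and drives the actor's policy. The sketch below uses hypothetical state/action counts and a toy reward in place of the per-slot EE feedback of [210].

```python
import numpy as np

# Tabular actor-critic sketch mirroring Fig. 8: the critic maintains a
# value function, and its TD error updates the critic and adjusts the
# actor's softmax policy.  Sizes and the reward are hypothetical.
rng = np.random.default_rng(0)
N_S, N_A = 4, 3
ALPHA_A, ALPHA_C, GAMMA = 0.05, 0.1, 0.9
theta = np.zeros((N_S, N_A))          # actor: policy preferences
V = np.zeros(N_S)                     # critic: state values

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reward(s, a):
    return 1.0 if a == s % N_A else 0.0   # toy stand-in for EE feedback

s = 0
for _ in range(20000):
    pi = softmax(theta[s])
    a = rng.choice(N_A, p=pi)              # actor samples an action
    r = reward(s, a)
    s2 = int(rng.integers(N_S))            # environment transition
    td = r + GAMMA * V[s2] - V[s]          # TD error from the critic
    V[s] += ALPHA_C * td                   # critic update
    grad = -pi
    grad[a] += 1.0                         # grad of log pi(a|s) wrt theta[s]
    theta[s] += ALPHA_A * td * grad        # actor follows the TD error
    s = s2

greedy = theta.argmax(axis=1)
print(greedy)
```

The split matches the figure: the critic computes the TD error from the reward and value function, and the actor uses that error to reshape its action distribution.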
C. Energy-Efficient Transmissions
In many MTC scenarios, messages are transmitted in a cooperative manner from the senders to the receivers or APs due to resource constraints. Hence, the routing path design or relay selection affects both the network performance and power consumption [212], [213]. Different from message transmission in wired networks, the path and relay selection in MTC scenarios needs to consider more issues, such as energy dynamics [213], [214], node mobility [204], [215], spectrum efficiency [212], [216], QoS [217], [218], and even information security [219], [220]. The following paragraphs and Table III present the related research.
1) Routing:
Liu et al. study the routing problem in wide-area mesh IoT networks and consider the RL technique to address the limitations of conventional methods in terms of energy sensitivity [212]. In their proposal, a model-free RL method called temporal difference learning is adopted to populate and update the routing table. Specifically, the routing metric, which indicates the probability of selecting a particular adjacent node, is calculated by using a Boltzmann exploration process. Once the routing metric values of the visited nodes in all paths are calculated, the path quality value is computed using the RL method. To improve the energy sensitivity of the routing method, the cost function is defined according to the transmission power as well as the remaining energy of the transmitting and receiving nodes. The final simulation results illustrate the performance improvement in terms of energy efficiency as well as the success rate and spectrum efficiency.

The routing design in Underwater Sensor Networks (UWSNs) is a hot application of AI techniques. Zhou et al. utilize the QL method to select the next node and define the reward function according to the residual energy and depth information for a balance of End-to-End (E2E) delay and energy consumption [221]. The utilization of QL enables the long-term reward to be taken into account, which finally reaches the global optimum. By sorting the neighbors according to the calculated Q-values, the node with higher priority can be selected to forward packets, while the other neighbors with smaller Q-values are suppressed for energy saving. Hu and Fei also adopt QL to solve the routing problem in UWSNs [222], while their goal is to make the residual energy of sensor nodes more evenly distributed for the maximum network lifetime.
In their RL proposal, the authors consider not only the residual energy but also the energy distribution in a group of sensor nodes to define the cost function, which is further utilized to calculate the reward and Q-value for the different actions indicating various next nodes. The authors also illustrate that the proposed method can converge in dynamic scenarios, and the final performance results indicate that the lifetime can be extended by up to 20%.

In [223], the authors adopt a supervised learning-based MLP algorithm to improve the routing performance and energy efficiency for IoT low-power networks. Different from the other works [212], [221], [222], which utilize AI models to predict the next node directly, [223] aims at optimizing the transmission range of each node to improve the routing performance and minimize energy consumption. The authors first construct an IoT network to collect labeled data including node positions and the corresponding transmission ranges. Then, the MLP is trained with the labeled data to map the relationship from the node position to the optimal transmission range. One of the advantages of this proposal is that it addresses the high dynamics of IoT networks, and the final simulations illustrate the extension of the network lifetime.

Mostafaei studies the multi-constrained routing problem in WSNs and proposes a distributed learning approach [217], where each node is regarded as a learning automaton. After the initial phase, in which each learning automaton senses its neighbor nodes to construct the action space, it transmits a packet via a randomly selected action. Once the packet reaches the sink node, the environment feeds back a reinforcement signal, which can be a penalty or a reward, to evaluate the selected action. Then, the transmission probability of each action for every node can be updated.
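The Boltzmann exploration step used in [212] can be sketched as follows; the neighbor table, the Q-value metric mixing transmit cost and residual energy, and the temperature are hypothetical illustrations rather than the paper's actual parameters.

```python
import math
import random

# Energy-aware next-hop selection with Boltzmann exploration: the
# probability of choosing a neighbour grows with its Q-value, and the
# Q-value here (hypothetically) rewards low transmit cost and high
# residual energy.
random.seed(1)
TAU = 0.5                                   # Boltzmann temperature

neighbors = {                               # id: (tx_power_cost, residual_energy)
    "a": (0.8, 0.9),
    "b": (0.5, 0.2),
    "c": (0.6, 0.6),
}

def q_value(tx_cost, residual):
    # Hypothetical metric: prefer low transmit cost and high residual energy.
    return -tx_cost + residual

def boltzmann_probs(qs):
    exps = [math.exp(q / TAU) for q in qs]
    total = sum(exps)
    return [e / total for e in exps]

ids = list(neighbors)
qs = [q_value(*neighbors[i]) for i in ids]
probs = boltzmann_probs(qs)
next_hop = random.choices(ids, weights=probs)[0]
print(dict(zip(ids, (round(p, 3) for p in probs))), "->", next_hop)
```

Lowering the temperature TAU concentrates the selection on the best neighbor, while a high TAU keeps the exploration needed in dynamic topologies.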
2) Relay and D2D:
Compared with routing, relay and D2D techniques provide more flexibility. AI can be adopted to decide whether to relay or not and to help select the optimal relay node according to the energy condition. Mastronarde et al. utilize an MDP to formulate the relay decision for each UE in cellular networks [228]. To maximize the long-term utility, the authors propose a supervised learning-based model to help each UE learn the optimal cooperation policy online. Specifically, the UE estimates three parameters, namely the outbound relay demand rate, the inbound relay demand rate, and the relay recruitment efficiency, in an online manner. The estimated values are then utilized to calculate the transition probability and utility functions. To address the problem of frequent recomputing, the authors first compute a collection of cooperation policies offline. Then, in the online phase, the estimated parameter values can be adopted to calculate the energy cost, which finally helps to choose the optimal policy.

He et al. study the relay selection problem in air-to-ground VANETs (A2G VANETs) and adopt QL to choose the relay node in order to balance the network performance and energy consumption [229]. In this paper, the flying UAVs and the ground vehicles transmit messages to each other by multi-hop relaying. The relay selection then affects the packet delivery ratio, latency, signaling overhead, and energy consumption, which is further formulated as a multi-objective optimization problem. The authors construct the Q-value table with the state and action indicating the network states and relay selection, respectively. By attempting different relay selections, the Q-values for the different choices can finally be calculated. The extensive performance analysis illustrates the improvement in terms of packet delivery ratio, latency, hop count, and signaling overhead, which means increased energy efficiency.

Wang et al.
also utilize QL to optimize the power allocation and D2D relay selection for maximizing the energy efficiency [218]. As the relay selection policy affects the energy efficiency of all D2D pairs, the authors construct a finite MDP and adopt QL to choose which neighbor node is selected. In the QL model, the state space is defined by the four cases of whether the energy efficiency of the first-hop and second-hop D2D links is below or above a predefined lower bound. Each D2D pair acts as an agent to select a neighbor node in its region with the target of maximizing the reward, which is defined according to its energy efficiency. Through the iteration process of the QL algorithm, the Q-value table of each D2D pair can be updated and the optimal relay with the maximum Q-value is chosen. The final simulation clearly illustrates the improvement of energy efficiency.

Hashima et al. [230] utilize the stochastic MAB [231] to model the neighbor discovery and selection problem in mmWave D2D networks. The considered MAB model aims to maximize the long-term reward, which is defined as the average throughput of the devices subject to the residual-energy constraints of nearby devices. To solve this MAB problem, a family of upper confidence bound algorithms plus Thompson sampling is utilized by incorporating the residual energy constraints. The final results illustrate the improved average energy efficiency and extended network lifetime. The authors of [232] also focus on relay selection in D2D mmWave networks to increase the connection reliability. However, they utilize a DL model to predict the best relay device according to the distance between the device and the BS or other devices, node mobility, signal strength, and residual energy. Specifically, the proposed relay selection algorithm consists of two phases. In the first phase, random training values are generated with the best relay labels to train the considered DNN model. The second phase then utilizes the trained DNN to predict the best relay.

TABLE III: Some Related Research Works Using AI to Optimize Transmissions for Green MTCs

| Research work | Network scenario | Learning method | AI model | Input/state | Output/action | Target/reward |
| Fu et al. [224] | vehicular energy network | supervised | LSTM | traffic flow | future traffic | optimizing routing and storage allocation |
| Jin et al. [225] | UWSN | reinforcement | QL | information of neighbors and links | packet forwarding | constant cost, congestion cost, delay, and energy consumption |
| Huang et al. [226] | wireless sensor network | supervised | CNN | adjacency matrix | link reliability | optimizing routing |
| Zhang et al. [179] | relay network | reinforcement | QL | states of link, buffer, and battery | link selection | maximizing receiving data |
| He et al. [227] | Cognitive Radio Network | reinforcement | QL | harvested energy, battery, destination nodes | next hop | optimizing energy efficiency |
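The bandit view of relay selection in [230] can be approximated by a plain UCB1 rule with a residual-energy mask; the paper combines upper-confidence-bound algorithms with Thompson sampling, which is omitted here for brevity, and the throughput means and energy levels below are hypothetical.

```python
import math
import random

# UCB1 sketch for relay selection under a residual-energy constraint:
# relays whose residual energy falls below E_MIN are excluded, and the
# agent balances exploration and exploitation over the rest.
random.seed(2)
mean_tput = [0.3, 0.7, 0.5]          # true throughputs, unknown to the agent
energy = [0.9, 0.8, 0.05]            # relay 2 is nearly depleted
E_MIN = 0.1                          # energy threshold for eligibility

counts = [0, 0, 0]
sums = [0.0, 0.0, 0.0]

def pull(arm):
    # Noisy throughput observation for the chosen relay.
    return mean_tput[arm] + random.uniform(-0.1, 0.1)

for t in range(1, 3001):
    feasible = [a for a in range(3) if energy[a] >= E_MIN]
    untried = [a for a in feasible if counts[a] == 0]
    if untried:
        arm = untried[0]
    else:
        arm = max(feasible, key=lambda a: sums[a] / counts[a]
                  + math.sqrt(2 * math.log(t) / counts[a]))
    r = pull(arm)
    counts[arm] += 1
    sums[arm] += r

print(counts)   # the high-throughput relay dominates; the depleted one is never chosen
```

The energy mask is the simplest way to encode the residual-energy constraint; a fuller treatment would let the energies deplete as arms are pulled.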
D. Energy Harvesting and Sharing
Similar to cellular networks, MTC terminals can also be charged by ambient energy in a wireless manner [206], [233]. To drive MTC toward the green 6G era, two common energy harvesting techniques are expected to be widely applied: renewable energy harvesting and RF harvesting. The former considers renewable green energy sources such as solar, wind, and tidal power to reduce the utilization of fossil fuels. The latter efficiently harvests the dissipated energy, which accounts for the majority of the RF signal but otherwise cannot be used [234]. On the other hand, the dynamics of the harvested power further complicate the network performance improvement and energy efficiency optimization, which motivates the application of AI techniques. In the following paragraphs, we introduce the related AI-based research considering the two EH techniques.
1) Renewable Energy Harvesting:
Chu et al. utilize the RL technique to design the multiaccess control policy and predict the future battery state [17]. In their research, the authors consider an uplink communication scenario where multiple energy harvesting-enabled UEs access the BS with limited channel resources. The authors first assume that the user battery and channel states are available to the BS, and then utilize a DQN-based LSTM to design the UE uplink access scheme. In this model, the system state includes the channel conditions and UE battery levels, and the reward is defined as the long-term discounted system sum rate. The consideration of multiple time slots drives the authors to adopt the LSTM model, which can make sequential decisions. The constructed LSTM model assists the BS in selecting the UEs at each time slot in order to maximize the system sum rate. In the second proposal, the authors utilize an RL-based LSTM to predict the battery level. In this RL model, the considered state space includes the access scheduling history, the previous UE battery predictions, and the actual UE battery information. Since the purpose is to maximize the prediction accuracy, the reward is defined according to the long-term prediction loss. Finally, the authors combine the predictions of access control and battery information and design a two-layer LSTM DQN network. The first layer predicts the battery level, which is adopted as part of the state space in the access control prediction. Extensive simulations illustrate the improvement of the system sum rate, further resulting in improved energy efficiency.

Similar to the scenario considered in [17], the same authors apply DRL techniques to optimize the joint control of power and access [235]. Generally, the proposal consists of two stages. In the first stage, the LSTM model is utilized to predict the battery states, similarly to [17]. In the control stage, the authors utilize the AC algorithm and DQN to decide the access and power scheme.
The state space consists of the channel power gain, the predicted UE battery level, the history of the power control policy, and the selected UE's true battery level, while the action represents the transmit power, which takes continuous values. The reward is defined according to the achieved transmission rate, so the algorithm aims to improve the system throughput. The proposed LSTM model is verified to achieve a high accuracy in predicting the battery state, and the new approach achieves an improved average sum rate compared with conventional algorithms as well as DQN-based models.

From the above introduction, we can find that using AI methods to predict the battery state of harvesting-enabled devices is an efficient way to adjust the network configurations for performance optimization. The authors of [16], [233] utilize a non-linear regression method to find the relationship between the future harvesting power and the historical records. Then, with the estimated harvesting power, the IoT node can adjust its security configurations to provide qualified service as well as reduce the outage probability. In [233], the authors also study the THz-enabled 6G IoT scenario and show the achieved network throughput improvement and extended working time.
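The non-linear regression idea of [16], [233], namely fitting the future harvesting power to historical records, can be sketched with a simple polynomial fit; the power trace below is synthetic and purely illustrative.

```python
import numpy as np

# Fit the relation between past time slots and harvested power, then
# extrapolate the next value.  The quadratic "solar-like" trace and the
# noise level are hypothetical.
t = np.arange(10, dtype=float)                 # past time slots
harvest = 5.0 + 2.0 * t - 0.15 * t**2          # hypothetical power trace (mW)
harvest += np.random.default_rng(3).normal(0, 0.05, t.size)

coeffs = np.polyfit(t, harvest, deg=2)         # quadratic regression
predicted_next = float(np.polyval(coeffs, 10.0))   # forecast for slot 10
true_next = 5.0 + 2.0 * 10 - 0.15 * 100        # noise-free value: 10.0 mW
print(round(predicted_next, 2), true_next)
```

With such a forecast in hand, a node can pre-adjust transmit power or security configurations for the coming slot instead of reacting after the harvested energy changes.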
2) Radio Frequency Harvesting:
Abuzainab et al. focus on the problem of adversarial learning in a communication network where the devices are served and powered by a Hybrid Access Point (HAP) [236]. In the considered scenario, the HAP needs to estimate the transmission power of the devices and determine the suitable energy signal to reduce the packet drop rate of the devices. As an adversary may alter the HAP's estimate, the authors propose a robust unsupervised Bayesian learning method. In the proposed model, the HAP is assumed to have full CSI, which is utilized to calculate the transmission power according to the received signal power. In the nonparametric Bayesian learning model, the Dirichlet distribution is used to calculate the posterior distribution of the probability vector of the device transmission power. Then, the HAP can find the optimal transmission power to maximize the utility while not depleting the devices' batteries. Compared with the conventional Bayesian learning method, the proposed approach achieves improved performance in terms of packet drop rate without jeopardizing energy consumption. The proposed learning scheme also exhibits improved energy efficiency compared with a fixed power transmission policy.

Kwan et al. study RF harvesting from intended and unintended sources and propose a machine learning-based wake-up scheduling policy for on-body sensors [237]. Since the unpredictable nature and low amount of energy harvested from the RF signals of unintended sources make it difficult to decide the wake-up time, the authors consider two machine learning techniques, linear regression and an ANN, to predict the wake-up time. In the linear regression-based forecaster, the authors consider the current capacitor charge level and the average energy harvesting rate to address the dynamics caused by user mobility and changing channel conditions. The proposed ANN predicts the next wake-up time considering the last successful wake-up time and the energy level.
The final simulation results illustrate that both models achieve a high accuracy rate.

Similar to [237], the authors of [238] also focus on optimizing the active time of IoT nodes which are powered by RF harvested energy. In this paper, besides information collection and energy provision, the HAP is also responsible for setting the sampling time of the IoT devices. The challenge of this problem is that the HAP cannot have exact knowledge of the harvested energy of each IoT device due to imprecise knowledge of the CSI. To address this issue, the authors combine stochastic programming and RL techniques. First, stochastic programming is used to maximize the minimum sampling time among all devices. To tackle the limitation of an unknown and dynamic probability distribution, the RL technique is adopted, where the assumed agent decides the sampling and charging time according to the states corresponding to the device battery levels. The reward function is measured by the maximum-minimum active time of the devices. Moreover, the authors model the large or continuous state space using linear function approximation. The final results illustrate that the RL approach can achieve as much as 93% of the minimum sampling time computed by stochastic programming.

E. Summary
In this section, we analyze the AI-based research toward green MTCs. Compared with conventional methods, the advantage of AI is that it can address the uncertainty and alleviate the failure ratio during the access and transmission process [185], [203], [212], [214], [222]. For the energy harvesting process, AI provides more knowledge about the future available power and battery status, which enables the necessary configurations toward improved energy efficiency [17], [235].

V. Computing Oriented Communications
In the 6G era, computation services are expected to play a more important role in people's work and life. With the great leap in transmission rate and communication capacity, an increasing number of applications will be offloaded to the cloud or edge servers for nearly real-time results instead of local execution. Moreover, storing the contents on the cloud and edge servers can provide users with more efficient and flexible service. Additionally, the widespread application of AI techniques also drives the development of computing oriented communications to accelerate network management. In this section, we discuss the power consumption model and introduce the existing AI-based research aiming to improve the energy efficiency and save energy in COC scenarios.
A. Power Consumption Modeling
The power consumed by the servers depends on the Central Processing Unit (CPU) or Graphics Processing Unit (GPU) utilization, which usually keeps changing. Generally, the energy consumption of a server is approximately linearly dependent on the CPU and GPU usage. If we let P_idle and P_max denote the power consumed by a server working at the idle state and at full load, respectively, the following equations model the energy consumption when the utilization rate is denoted as u [71], [72]:

P(t) = P_idle + (P_max − P_idle) × u(t)    (6)

E = ∫ P(t) dt    (7)

Thus, for a cluster of servers, the total energy consumption can be calculated by summing the energy cost of all servers. From Equation 6, it can be found that to save energy, we can reduce the utilization rate of each server. However, it has been found that a server in the idle state consumes approximately more than 60% of the peak-load electricity [239], [240], which makes the problem more complicated. For a given workload, utilizing only one or several servers at full load and turning off the others may result in low energy consumption, but on the other hand contributes to high delay. Therefore, how to allocate the computation resources to balance energy consumption and service quality is an important research direction [149], [241], [242].

B. Energy-Efficient Cloud and Edge Computing
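Before turning to specific techniques, the consolidation trade-off implied by the linear power model of Equations 6 and 7 can be sketched numerically; the power figures and workload split below are hypothetical.

```python
# Linear server power model of Equation 6 and why the large idle power
# favours consolidating a workload onto fewer servers.  P_IDLE and P_MAX
# are hypothetical figures with idle at 60% of peak [239], [240].
P_IDLE, P_MAX = 120.0, 200.0      # watts

def power(u):
    """Equation 6: P = P_idle + (P_max - P_idle) * u, for utilisation u."""
    return P_IDLE + (P_MAX - P_IDLE) * u

# Same total workload (one "server's worth"), two placements:
spread = 4 * power(0.25)          # four servers at 25% utilisation each
packed = 1 * power(1.0)           # one server at 100%, three switched off

print(spread, packed)             # 560.0 W vs 200.0 W
```

The packed placement draws far less power but concentrates the queue on one machine, which is exactly the energy/delay balance the allocation research above targets.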
According to Equation 6, reducing the CPU/GPU usage can alleviate the energy cost. In this part, we discuss the three common approaches to alleviating the computation resource usage: offloading decision, resource allocation, and server placement.

Fig. 9: The federated learning structure for the heterogeneous network architecture.
1) Offloading Decision:
The existing networks usually consist of multiple types of computation platforms, including cloud, fog, and edge computation servers. Moreover, the computation tasks can also be executed locally if necessary. The heterogeneous computation platforms have variable latency as well as different energy consumption. Moreover, the computation offloading choice also implies different communication overhead. In this part, we introduce how AI is utilized to decide the computation offloading policy for green communications.

Wang et al. [243] combine a heuristic algorithm and DL to optimize the policy of computation offloading to the fog or cloud servers. In this paper, the authors analyze the energy consumption and latency of finishing the computation task on the fog servers and the cloud server, and then formulate a Mixed Integer Non-Linear Programming (MINLP) problem aiming at minimizing the total energy cost under the latency constraint. To solve this NP-hard problem, the authors first utilize the simulated annealing algorithm [244] to find some optimal solutions, which are further utilized to train the constructed CNN model. Moreover, the training process is periodically conducted to update the parameters of the considered CNN models, while the greedy algorithm is utilized as compensation if the result of the CNN models is not reasonable.

To alleviate the computing overhead for meeting the latency requirement, Gong et al. [23] consider high-rate RF communication and low-power backscatter communications to realize active offloading and passive offloading, respectively. Since local computing, active offloading, and passive offloading have different computation latency as well as different energy consumption, DRL is adopted to optimize the suitable computation and transmission policy. The assumed agent chooses from three actions: local computing, active offloading, and passive offloading, given the channel conditions, energy status, and workload in each time slot.
The final results illustrate that the DRL-based method can reduce the outage probability by intelligently scheduling the offloading policy. Moreover, this paper points out many promising directions for DRL-based backscatter-aided data offloading in Mobile Edge Computing (MEC) scenarios.

Yan [245] considers a single user with multiple inter-dependent tasks, meaning that the results of some computing tasks may be utilized as the input of others. The authors first adopt the task call graph [104] to model the inter-dependency among different computation components. To reduce the energy consumption of mobile devices and the computation latency, the authors define the reward function with weighted energy consumption and latency, and then consider two problems: how to offload the computation tasks and how to allocate the CPU cycles for different tasks. Since the first problem is a combinatorial binary problem while the second one is convex, the authors adopt the DRL technique to decide the offloading policy. The AC learning structure [172] is utilized, where a DNN in the actor network learns the relationship between the input states (wireless channels and edge CPU frequency) and the offloading policy, while the critic network evaluates the energy and latency performance of different offloading strategies. Different from the conventional critic network, which utilizes a DNN to evaluate the offloading decisions, the authors define a one-climb policy under which the tasks in one path of the task graph can only migrate once from the mobile device to the edge servers. This reduces the number of performance evaluations, resulting in reduced complexity and accelerated computation of the DRL method.

Ren et al. [24] unite federated learning and DRL to optimize the partial offloading policy for energy harvesting-enabled IoT nodes.
Different from centralized learning techniques, the adopted federated learning enables every IoT node to avoid uploading its sensing data to the edge node, which protects data privacy and alleviates the transmission overhead. Specifically, the edge node only acquires the parameters of the trained DRL agent, while a random set of IoT devices is selected to download the parameters from the edge node, train the DRL agent with newly collected data, and finally upload the updated parameters to the edge node. In this paper, the authors also compare with centralized DRL in terms of training performance and network performance. Results show that the training of FL-based DRL can finally approach that of centralized DRL, even though it fluctuates more severely. Under varying computation task generation probabilities, the federated learning-enabled DRL can improve the overall network performance, especially in terms of queuing delay and task drop rate. Similar research is also conducted in [112], where the authors demonstrate that federated learning-based DRL models can be applied to various environments with reduced transmission consumption and enhanced privacy protection.

Different from the above works focusing on static scenarios, the authors of [113] adopt multiple ML techniques to optimize the cooperative Vehicular Edge Computing (VEC) and cloud computing in dynamic vehicular networks. As the uncertain vehicle mobility results in a dynamic network structure and unstable connections, which leads to low efficiency for conventional heuristic searching strategies, ML is adopted [246] to cluster the vehicles into groups according to their connection time, where each group consists of a Road Side Unit (RSU), multiple service-demanding vehicles, and service-providing vehicles. The RSU decides whether to offload the tasks to the cloud servers or conduct them locally. To schedule the computation tasks for a balance of energy consumption and latency, an imitation learning-based algorithm is proposed, which can alleviate the extreme complexity of the conventional branch-and-bound algorithm. Specifically, an expert is trained with a few samples to obtain the optimal scheduling policy in an offline manner. Then, the agent is trained to follow the expert's demonstration online. Results illustrate that imitation learning can significantly accelerate the execution of the branch-and-bound process.
2) Computation Resource Allocation:
The computation platform usually needs to execute multiple tasks. How to allocate the computation resources, especially the CPU/GPU cycles, is an attractive topic [87], [109], [113], [247]. On the other hand, energy consumption is also an important metric that needs to be considered, and balancing energy consumption and computation performance can be addressed by AI techniques [113], [248], [249]. The following paragraphs focus on several research works.

Similar to [113], the authors of [250] also consider AI techniques to balance the energy consumption and latency for scenarios utilizing capacity-limited edge servers and a cloud server. However, the edge servers are driven by hybrid power sources including solar, wind, and diesel generators, while the computation-efficient cloud servers are grid-tied. The authors model the joint workload offloading and edge server provisioning as an MDP and utilize the RL technique to solve it. They define the total system cost with the delay, the diesel generator cost, and the battery consumption, while the policy denotes the computing power demand in each time slot. To find the optimal policy, a novel post-decision state-based online learning algorithm is proposed to exploit the state transitions of the considered energy harvesting-enabled MEC system. Compared with the standard QL method, the proposed approach converges much faster, and extensive simulations confirm that the MEC system performance can be significantly improved.

Pradhan et al. [109] study the computation offloading of IoT devices in a massive MIMO Cloud-RAN (C-RAN) deployed in an indoor environment. In this paper, the purpose of optimizing the computation offloading is to minimize the total transmit power of the IoT devices. In the considered scenario, the transmission latency of the uplink signals is related to the transmit power and the CPU cycle allocation.
Therefore, to minimize the total transmit power of the IoT devices under the latency threshold, not only the signal processing factors but also the computation resource allocation needs to be considered, which is a non-convex problem due to the coupling among these factors and their value constraints. To solve this problem, the authors consider the supervised learning method and adopt a DNN model to decide the transmit power, the CPU cycle assignment vector, and the number of quantized bits. The authors also propose an Alternating Optimization (AO)-based mathematical model to obtain some near-optimal solutions to train the DNN model offline. Simulation results illustrate the fast convergence of the DNN training process. More importantly, to tackle the same problem in dynamic IoT networks, the authors utilize the transfer learning [105] technique, which means that part of the trained DNN's parameters are reused in the newly-formed DNN for the changed scenario. The DNN can then be updated through training with limited samples, avoiding complex training from scratch, which reduces the execution time by two orders of magnitude. The final performance analysis shows that the transfer learning-based DNN can provide a close approximation of the optimal resource allocation.

Wang et al. [87] study cellular networks where MEC-enabled High-Altitude Balloons (HABs) conduct the users' computation tasks with limited capacity and energy. Since the data sizes of the computation tasks vary, the user association policy should be optimized to meet the requirements as well as minimize the energy consumption. To alleviate the limitations of the traditional Lagrangian dual decomposition [251] and game theory [67] in dynamic scenarios, the authors utilize an SVM-based federated learning algorithm to map the relationship from the users' association and historical requested task sizes to the future association. Specifically, similar to the process in Fig.
9, each HAB first trains an SVM model with the locally obtained data to construct the relationship between the user association and the computation task size. Then, the HABs share their trained SVM models, which enables further integration and local improvement. Thus, each HAB can build an SVM model to quantify the relationship between all user associations and the historical computation task information. The simulation results illustrate that the energy consumption can be reduced with a better prediction of the optimal user association.

Ma et al. [247] utilize the PSO algorithm to jointly optimize the selection of access networks and edge clouds to minimize the latency and total energy consumption. In the considered scenario, each user can be served by multiple edge cloud-enabled access networks. Since the latency and energy consumption are both caused by task offloading and execution, the formulated problem of minimizing the two metrics is NP-hard. In the adopted PSO model, the fitness function is defined as the weighted sum of latency and energy consumption. Note that the values of latency and consumed energy are normalized to between 0 and 1 to avoid dimensional influence. The final performance analysis illustrates the significant improvement in terms of latency and energy consumption.
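A bare-bones PSO search in the spirit of [247] can be sketched as follows; the fitness surface, the weights, and the swarm parameters are hypothetical, with the candidate configuration relaxed to a continuous point.

```python
import numpy as np

# Minimal PSO sketch: particles search a 2-D space whose fitness is a
# weighted sum of normalised latency and energy (both hypothetical
# quadratic surfaces with known minima at 0.3 and 0.7).
rng = np.random.default_rng(5)
W_LAT, W_EN = 0.5, 0.5

def fitness(x):
    lat = (x[0] - 0.3) ** 2          # normalised latency term
    en = (x[1] - 0.7) ** 2           # normalised energy term
    return W_LAT * lat + W_EN * en

n, dims, iters = 20, 2, 200
pos = rng.uniform(0, 1, (n, dims))
vel = np.zeros((n, dims))
pbest = pos.copy()
pbest_f = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.uniform(size=(2, n, dims))
    # Standard velocity update: inertia + cognitive + social terms.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([fitness(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

print(gbest.round(2))                 # converges near (0.3, 0.7)
```

Normalizing both objectives into [0, 1] before weighting, as [247] does, keeps either metric from dominating the fitness purely through its units.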
3) Edge Server and Virtual Machine Placement:
The placement of edge servers and Virtual Machines (VMs) affects the resource utilization of the whole network. Since power consumption in the idle state constitutes the major part of total energy waste [66], minimizing the number of active servers while meeting service requirements can improve energy efficiency. AI techniques, including heuristic algorithms and machine learning methods, have been studied to optimize the deployment of edge servers and VMs.

Li and Wang [71] study edge server placement and devise a PSO-based approach to minimize energy consumption. In this paper, the authors consider multiple edge servers located at different base stations, and the delay for the base stations to access the edge servers should not exceed a threshold. The minimization of energy consumption depends on the locations and assignments of the edge servers. To solve this discrete problem, the authors redefine the parameters and operators of the PSO method. To evaluate the performance, a real dataset from Shanghai Telecom is utilized in the experiment, with which the PSO-based approach achieves more than 10% energy saving.

Liu et al. [66] study VM placement in cloud servers and adopt the ACO algorithm to minimize the number of active servers and balance resource utilization, resulting in improved energy efficiency. In their approach, a bipartite graph is constructed to describe the VM placement problem, and the pheromone is distributed not only between the VMs and servers but also among the VMs assigned to the same server. The artificial ants conduct the VM assignment based on global search information. To speed up convergence and improve the solution, a local search including ordering exchange and migration operations is conducted. The improved ACO algorithm is efficient for large-scale problems.
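A minimal sketch of such an ACO-style consolidation loop is given below. The VM demands, server capacity, pheromone update rule, and the simple "prefer already-active servers" heuristic are illustrative assumptions rather than the exact design of [66]; the bipartite pheromone structure and the local search are omitted:

```python
import random

random.seed(1)

VM_DEMAND = [2, 3, 4, 2, 1, 3]   # hypothetical CPU demand per VM
CAPACITY = 6                     # identical server capacity
N_SERVERS = len(VM_DEMAND)       # worst case: one server per VM

def build_solution(pher):
    """One ant assigns every VM to a server, respecting capacity."""
    load = [0] * N_SERVERS
    assign = []
    for vm, d in enumerate(VM_DEMAND):
        feasible = [s for s in range(N_SERVERS) if load[s] + d <= CAPACITY]
        # Desirability: pheromone x heuristic preferring already-active servers,
        # since fewer active servers means less idle-state power waste.
        weights = [pher[vm][s] * (2.0 if load[s] > 0 else 1.0) for s in feasible]
        r, acc, choice = random.uniform(0, sum(weights)), 0.0, feasible[-1]
        for s, w in zip(feasible, weights):
            acc += w
            if r <= acc:
                choice = s
                break
        assign.append(choice)
        load[choice] += d
    return assign, sum(1 for l in load if l > 0)

def aco(n_ants=8, iters=30, rho=0.1):
    pher = [[1.0] * N_SERVERS for _ in VM_DEMAND]
    best_assign, best_active = None, N_SERVERS + 1
    for _ in range(iters):
        for _ in range(n_ants):
            assign, active = build_solution(pher)
            if active < best_active:
                best_assign, best_active = assign, active
        # Evaporate everywhere, then deposit along the best-so-far assignment.
        for row in pher:
            for s in range(N_SERVERS):
                row[s] *= (1 - rho)
        for vm, s in enumerate(best_assign):
            pher[vm][s] += 1.0 / best_active
    return best_assign, best_active

assign, active = aco()
```

With the toy demands above (total demand 15 against capacity 6), any feasible assignment needs at least three active servers, and the pheromone reinforcement steers successive ants toward consolidated packings.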
The experimental results show that the number of active servers can be minimized with balanced usage of resources including CPU and memory, which results in improved energy efficiency.

Shen et al. [91] focus on cloudlet placement to improve energy efficiency in mobile scenarios, adopting the K-means clustering [88] method to search for the location centers. In this paper, energy consumption is assumed to be directly related to the number of deployed cloudlets; thus, minimizing the number of deployed cloudlets optimizes energy efficiency. To tackle this problem, the authors first utilize K-means clustering to find the central locations of the mobile devices. The following steps are to delete the locations that do not meet the density requirements and to generate the moving trajectories of the cloudlets. Performance analysis illustrates an increased number of covered devices per cloudlet, which results in reduced energy consumption.

Zhang et al. [80] study container placement to optimize the energy consumption of virtual machines and propose an improved GA. In this paper, containers are utilized to run applications, and energy consumption is assumed to be nonlinearly related to resource utilization. Since container placement can be regarded as a combinatorial optimization problem, heuristic algorithms such as GA [81] are well suited. However, the conventional GA sometimes incorrectly eliminates new individuals in the mutation operation when resource utilization is high, which causes performance degradation. To solve this problem, the authors propose two kinds of exchange mutation operations and define a control parameter based on the number of search iterations, which helps the search jump out of local optima. The final simulations illustrate significantly improved power saving in small, medium, and large scenarios with both uniform and non-uniform VM distributions.

Wang et al. [72] study virtual machine placement in heterogeneous virtualized data centers and utilize the PSO method [252] to minimize energy consumption. The authors first establish an energy consumption model of a heterogeneous virtualized data center. Since the traditional PSO method can only be utilized for continuous optimization problems, the authors redefine the particle position and velocity with two 𝑛-bit vectors, and then redefine the subtraction, addition, and multiplication operators to fit the energy-aware virtual machine placement optimization, which is a discrete problem. The authors then consider energy-aware local fitness and devise a two-dimensional encoding scheme to accelerate convergence and reduce search time. Results illustrate that the proposed method outperforms other approaches and can reduce energy consumption by 13%-23%. A similar research work based on PSO is given in [73], where the authors utilize a decimal coding method to apply PSO to a discrete problem, and energy consumption is minimized under service requirement constraints. The authors also analyze the complexity of the proposal, which is related to the numbers of migrated virtual machines, particles, and iterations.

C. Green Content Caching and Delivery
Besides offloading to the edge/cloud servers, storing contents is also an important service for future CDNs. The energy consumption of this part mainly comes from the caching and delivery processes. In the following paragraphs, we discuss the related research on how AI is adopted to improve the energy efficiency of content caching and delivery.
1) Caching Policy Design:
For future multi-tier or hierarchical networks, contents are usually cached in different parts of the network to improve storage efficiency. The content caching policy needs to be optimized due to the variable storage sizes of heterogeneous devices and the different energy costs of content retrieval. Li et al. [253] utilize DRL to optimize the content caching policy for multi-tier cache-enabled UDNs. The authors analyze the different energy consumption of content retrieval from the Small Access Points (SAPs), MBS, and core networks, and then construct an energy-efficiency model. To optimize energy efficiency, the standard DRL method using a regular multi-tier DNN is adopted, where energy efficiency and the different content combinations serve as the reward and state, respectively. To accelerate the convergence of the proposed intelligent content caching method, the authors utilize recent advances including prioritized experience replay [254], the dueling architecture, and deep RNNs. Extensive simulations illustrate that the proposed intelligent content caching algorithms can significantly improve energy efficiency for both stationary and dynamic popularity distributions. [255] analyzes the impact of channel conditions on content caching, and RL-based content caching is proposed to alleviate energy consumption.

Shi et al. [256] adopt the DQN model to optimize content caching in three-layered vehicular networks, where an airship distributes contents to UAVs to satisfy terrestrial services. In the considered scenario, if the requested content is not cached in the local UAV, the airship needs to schedule the UAV caching the required content to provide the service, which means more energy consumption. To minimize energy consumption, a DQN model is proposed whose reward considers the probabilities of local UAV requests and other UAV scheduling. To improve training performance, the experience replay mechanism is adopted.
The proposed DQN model is verified to handle the large number of states in the training process.

Tang et al. [257] consider the scenario where users can retrieve contents locally, or from neighbor devices, the SBS, and the MBS, with increasing energy consumption; on the other hand, the user's device, SBS, and MBS have increasing caching capacity. Specifically, the QL algorithm is applied at every user to select the cached contents with the goal of minimizing a cost which is inversely proportional to the popularity of the cached files. For the caching policy of each SBS, the DQN is adopted to select the contents in order to minimize total energy consumption. In the proposal, the cost function plays the role of the reward in DRL, while the optimization goal becomes minimizing the cost. Here, the complexity of QL is relatively low since every user's device has very limited capacity, which means the state space is small. On the other hand, the DQN has a relatively high complexity since the number of cache combinations, and hence the state space, is large.

The content caching policy design deeply depends on the users' preferences; thus, centralized control-based optimization methods may raise privacy concerns. For the data-driven AI algorithms including ML and DL techniques, the training and running processes that require users' data pose great challenges. To address this problem, federated learning has been widely studied to keep the data in the local area to protect privacy [116], [258], [259]. In [116], the UE conducts the calculations of the shallow layers to generate general features of the content requests. Similar to the process in Fig. 9, the heterogeneous BSs, including flying UAVs, aggregate the parameters of the shallow layers to conduct the further training and running process to decide the content caching policy. Different from the cooperative training of deep learning models, Yu et al. [258] consider that each user downloads a Stacked Autoencoder from the server and trains it with the local dataset generated from personal usage. Then, the updated parameters and extracted features are uploaded to the server, where a hybrid filtering technique is adopted to decide the content caching policy. To further ensure data security, blockchain techniques can be adopted in the data transmission process [259]. However, these research works aim to improve the caching performance rather than minimize energy consumption.
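The federated pattern running through these works (local training on private data, followed by parameter sharing and aggregation, as in Fig. 9) can be sketched with a toy federated-averaging loop. The linear model, the synthetic per-client data, and all hyperparameters below are illustrative assumptions, not the SVM or autoencoder setups of [87], [116], [258]:

```python
import random

random.seed(0)

# Hypothetical ground truth: each client observes noisy samples of the same
# underlying linear relation between a feature and a content-request rate.
TRUE_W, TRUE_B = 2.0, -1.0

def make_client_data(n=50):
    data = []
    for _ in range(n):
        x = random.uniform(0, 1)
        y = TRUE_W * x + TRUE_B + random.gauss(0, 0.05)
        data.append((x, y))
    return data

clients = [make_client_data() for _ in range(5)]  # 5 clients, data stays local

def local_update(w, b, data, lr=0.1, epochs=20):
    """A few steps of gradient descent on the client's own data only."""
    n = len(data)
    for _ in range(epochs):
        gw = sum((w * x + b - y) * x for x, y in data) / n
        gb = sum((w * x + b - y) for x, y in data) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Federated rounds: broadcast the global model, train locally, then the
# server averages the returned parameters -- raw data is never uploaded.
w, b = 0.0, 0.0
for _ in range(30):
    updates = [local_update(w, b, d) for d in clients]
    w = sum(u[0] for u in updates) / len(updates)
    b = sum(u[1] for u in updates) / len(updates)
```

Only the model parameters cross the network in each round, which is the privacy property that motivates federated caching designs.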
2) Delivery:
Besides content caching, how to deliver the contents is also an important factor affecting energy consumption. In this part, we discuss the related AI-based research on content delivery optimization.

Lei et al. [260] study content caching and delivery in cellular networks, and a supervised DNN-based approach is adopted to optimize user clustering to minimize the transmit power of the BSs. In each cell, the content delivery should satisfy stringent delay requirements, thus the user scheduling algorithm should have low computation time to enable real-time operation. To realize this goal, the DNN is trained to map the users' channel coefficients and requested data amounts to the clustering scheduling policy. The authors utilize datasets of variable size generated with conventional iterative algorithms to train the proposed DNN. The performance shows that a large dataset can achieve a 90% approximation to the optimum with limited time consumption.

Al-Hilo et al. [261] utilize the DRL technique to optimize the trajectory of a UAV in order to improve content delivery for the UAV-assisted intelligent transportation system. In this paper, the moving vehicles are assumed to cache part of the contents due to their limited capacity and need to retrieve the remaining contents from the BS, which is time-consuming and unstable. To improve the content delivery performance, cache-enabled UAVs are assumed to hover over the vehicles to serve some content requests. As the trajectory control affects the performance of content delivery as well as the power consumption of UAVs, the Proximal Policy Optimization algorithm is adopted to decide the flying velocity according to the network states including the current position, vehicle information, and cached contents.
The final results also show an improvement in energy efficiency.

The above works focus on content delivery in the access networks, while data forwarding in the core networks is also an important factor affecting energy consumption. Li [75] utilizes the ACO algorithm [262] to optimize the data forwarding scheme to reduce content retrieval hops, which results in less energy consumed by the routers and links. In this paper, the CDNs are first divided into multiple domains, and the data packets and hello message packets are modeled as two types of ants. For each path, the pheromone is defined and calculated as the normalized weighted sum of path load, delay, and bandwidth. Through the interest ants generated in the initial state, each node can construct the paths and update the corresponding pheromone values. Then, during the data packet transmission stage, the pheromone is further updated according to the real-time performance.
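Such a pheromone construction can be sketched as follows. The three candidate paths, their metrics, the weights, and the evaporation rule are hypothetical values for illustration, not the exact definitions in [75]:

```python
# Hypothetical per-path metrics for three candidate forwarding paths.
PATHS = {
    "p1": {"load": 0.6, "delay": 30.0, "bandwidth": 100.0},
    "p2": {"load": 0.2, "delay": 50.0, "bandwidth": 80.0},
    "p3": {"load": 0.9, "delay": 20.0, "bandwidth": 40.0},
}

def normalized(metric, invert=False):
    """Min-max normalize one metric across paths; invert when low is good."""
    vals = [m[metric] for m in PATHS.values()]
    lo, hi = min(vals), max(vals)
    out = {}
    for name, m in PATHS.items():
        x = (m[metric] - lo) / (hi - lo) if hi > lo else 0.0
        out[name] = 1.0 - x if invert else x
    return out

def init_pheromone(w_load=0.3, w_delay=0.4, w_bw=0.3):
    """Pheromone as a normalized weighted sum of load, delay, and bandwidth;
    low load and low delay are desirable, so those two are inverted."""
    load = normalized("load", invert=True)
    delay = normalized("delay", invert=True)
    bw = normalized("bandwidth")
    return {p: w_load * load[p] + w_delay * delay[p] + w_bw * bw[p]
            for p in PATHS}

def update(pher, path, observed_quality, rho=0.2):
    """Evaporate all paths, then reinforce the used path with the real-time
    performance observed during the data packet transmission stage."""
    for p in pher:
        pher[p] *= (1 - rho)
    pher[path] += rho * observed_quality
    return pher

pher = init_pheromone()
```

Forwarding probability can then be made proportional to the pheromone, so lightly loaded, low-delay, high-bandwidth paths attract more data packets while evaporation lets stale measurements fade.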
3) Joint Optimization:
Since the caching and delivery policies both affect energy consumption, joint optimization is another direction toward green communications. Li et al. [263] adopt the DRL method to minimize the latency and energy cost of content caching and delivery in the RAN. In this paper, the authors define the reward function considering the latency and energy cost of content caching and delivery between the users and the SBS, MBS, and cloud servers. Then, the AC model and DDPG algorithm [264] are adopted, where two identical DNNs are utilized to generate the deterministic action and evaluate the chosen strategy. Here, the action is defined by the content file placement, SBS-user association, and subchannel assignment. The simulation results illustrate improved rewards, which means performance improvements in terms of transmission delay and energy consumption.

Similarly, Li et al. [242] also utilize DL techniques to jointly optimize the content delivery latency and system energy consumption. However, as cache-enabled D2D networks are adopted to alleviate the overhead of requesting contents from the cellular BS, the device mobility, content popularity, and link establishment decisions need to be considered in this paper. To address the complexity caused by dynamics including changing channel conditions and variable content popularity, the authors propose a three-step approach, each step utilizing DL models. First, an RNN model, either the conceptor-based Echo State Network (ESN) [265] or LSTM, is utilized to predict user mobility from limited previous records. Then, the predicted D2D user location information, together with other attributes including gender, age, occupation, time, and so on, is utilized as the input of the ESN or LSTM to predict the probability of each user requesting every content at the next time slot. The content request distribution can then be utilized to assist the content placement.
For example, the content will be assigned to the user if the request probability is above 70%. In the third step, the joint value- and policy-based AC algorithm [172] is utilized for each user to choose a neighbor to establish the communication link for content delivery according to the observed environment, which is defined by the transmit power, channel gain, and distance. In this algorithm, the reward function is denoted by the weighted sum of content delivery delay and power consumption. The simulation results illustrate that different weight combinations of delay and power consumption yield variable power saving performance, which means that the proposed strategy is reasonable and flexible. Similar research is given in [266], which also utilizes the ESN model [265] to predict the user mobility and content request distribution. Since the requested contents depend on the users, the authors consider the context of users, including gender, occupation, age, and device type, to predict the probability of content requests. To make the results practical, the authors collect historical content transmission and user mobility records to train the considered models.

D. Summary
From the introduced research, we can find that AI techniques can significantly improve the energy efficiency of the content caching process. In the content placement step, AI techniques are important and efficient for predicting the content popularity and users' information, including preferences and locations, which can improve the local Cache Hit Ratio (CHR) and reduce content retrieval from cloud servers. For the content delivery part, the optimization targets resource allocation, transmission scheduling, routing, and other communication functions to save energy. Different from the energy-efficient proposals in cellular networks mentioned above, the strategies in content delivery networks should additionally consider the content placement, latency requirements, and even the caching capacity.

VI. OPEN RESEARCH ISSUES
Even though there is a huge number of research works on AI-based green communication services, we still need to pay more attention to transforming these endeavors into practical applications in the 6G era. Moreover, the utilization of AI techniques in current networks is still confronted with many challenges in terms of computation complexity, hardware compatibility, data security, and so on. The following paragraphs outline some promising directions, which we believe will provide ideas for researchers.
A. Green BS Management for 6G HetNet
As we mentioned in Sec. III, the BSs account for the majority of total energy consumption. In the 6G era, the number of BSs is expected to be multiple times that of 5G, and these BSs will be constructed in a hierarchical manner with various coverage sizes. Moreover, as UAVs and HABs will also act as BSs [87], [116], [249], [261], the heterogeneous hardware architectures and the mobility further complicate green management. The following paragraphs introduce potential AI-based research considering three potential functions of 6G BSs.

As the end terminals can be served by different BSs, including MBSs, SBSs, and Tiny Base Stations (TBSs), in the multi-tier 6G HetNet, the user association policy should be optimized in order to turn off redundant BSs for energy saving. Moreover, since the BSs are usually deployed with multiple frequency bands, the allocation of resources including channels and power is critical for network energy efficiency. However, the mobility of end devices and of UAV- or satellite-enabled BSs results in changing traffic demand and dynamic channel conditions, while the resource heterogeneity further complicates these problems. To address these issues, AI techniques can provide efficient assistance. For example, AI models can be adopted to predict the traffic demands, mobility patterns, and channel conditions, which enables network reconfiguration in advance.

Besides offering communication services, future BSs will play multiple roles, such as computation/storage providers and energy sources. As some BSs have a certain amount of computation and storage resources, the computation offloading and content caching policies can be optimized by AI models. For example, computation offloading and content caching are usually modeled as non-convex problems, which can be solved by RL or DRL techniques. As we mentioned in Sec. I, compared with the traditional method which divides the non-convex problem into two sub-problems and solves them one by one, RL or DRL can find the global optimal solution and avoid the complex iteration process during the algorithm execution period.
B. Energy-Efficient Space-Air-Ground Integrated Networks
SAGIN has been regarded as one of the key technologies for 6G [1], [267], as it can provide seamless coverage and flexible information transmission, especially for massive MTCs. Since the satellites, HABs, and many UAVs are driven by renewable energy, energy-efficient network orchestration is critically important for SAGIN. However, the diversified transmission environments, heterogeneous hardware platforms, and dynamic energy resources pose great challenges. To address the complexity and uncertainty, AI can provide many efficient models. For example, using the RL technique to optimize the resource allocation policy, including the transmit power [268] and channels [206], has been shown to improve network energy efficiency. Moreover, the CSI dynamics and network mobility make energy-efficient packet transmission more difficult. As AI has been demonstrated to efficiently map the complex relationship between existing network traces and future transmission policies for terrestrial networks [206], [216], we believe the research can be extended to the SAGIN scenario.

Even though AI has been studied to optimize SAGIN performance [189], [269], current research mainly focuses on a single layer, such as the LEOs or UAVs. From the systematic perspective, network management toward green communications should consider every part of SAGIN. For example, the UAV deployment and trajectory should be optimized considering the beam control of satellites to realize energy-efficient coverage [66], [210], [211]. As AI has been illustrated to be competent in handling complex problems with multiple coupled variables [196], [197], using AI techniques to analyze performance from the perspective of the whole SAGIN system will be a promising direction. However, the difficulty lies in how to characterize the concerned factors in the AI model [57], [92]. The execution of the AI model is another challenge due to the extreme computation overhead.
Moreover, AI is also important for optimizing RF energy harvesting in cellular networks, which will be discussed in Sec. VI-D.

C. AI-based Energy-Efficient Transmissions
Packet transmission is energy-consuming as it costs energy at the transmitters, forwarders, and receivers. Besides power control and resource allocation methods to reduce energy consumption, many other options exist, including routing policy design, relaying, backscatter communication, and IRS-aided transmissions. There is no doubt that multiple communication manners will be available for end devices to transmit packets successfully. For instance, a mobile user can send an email over the cellular network, over IEEE 802.11-based WiFi, or through D2D in a multi-hop manner. How to cooperatively utilize and schedule the different communication methods and resources in a multi-agent, multi-task environment will heavily affect the system energy consumption and network performance. Most AI-based research focuses on single communication scenarios, while very limited work studies the hybrid scenario [193], [196]. In the future, more attention can be paid to AI for energy-efficient transmission in scenarios where multiple communication manners are available.
D. AI-Enhanced Energy Harvesting and Sharing
Energy harvesting has been widely recognized as an important part of green communications. To drive the development of green communications, various energy harvesting techniques will be utilized, which can be classified according to whether the energy source is controllable and predictable [270]. AI techniques can be adopted in scenarios using the uncontrollable-but-predictable energy group and the partially controllable energy group, where the former consists of solar, wind, tidal, and other renewable sources, while the latter includes RF energy. For the uncontrollable-but-predictable energy harvesting techniques, AI models can be utilized to map the relationship between the future harvested power and related factors [271], [272], and the predicted results can be adopted to reconfigure the network in advance. Another method is to directly map the harvesting-related factors to the network management policy with AI models. These methods enable network operators to gain more knowledge of energy harvesting and improve utilization efficiency. For the partially controllable RF energy harvesting technique, AI can be used to optimize the BS power control and transmission scheduling [150], [152]. For UAV-enabled BSs, AI can be adopted to optimize the trajectory to reduce energy consumption and improve harvesting efficiency [161], [273]. While current research mainly focuses on maximizing the minimum harvested energy due to the disordered transmission and unplanned power control [237], [238], AI can make the RF harvesting process energy-aware, which can greatly reduce wasted energy, especially for signals from omnidirectional antennas.

The RF harvesting technique also enables energy sharing among devices, which can be considered to avoid the outage of some network parts as well as to reduce energy waste when the batteries of some devices are nearly full and cannot store incoming energy anymore [129].
The Simultaneous Wireless Information and Power Transmission (SWIPT) technique has been widely studied, especially in MTC scenarios [274]. Even though harvesting energy from part of the received signals may cause some performance loss, AI can be utilized to decide the ratio between RF harvesting and information transmission to reach a balance [275]. Ambient backscattering is currently a promising technique, especially for low-power machines, and AI can be considered to optimize its energy harvesting and information forwarding processes [23], [156], [157].
E. Security for AI-enabled Networks
Adversaries and unauthenticated users threaten information privacy and cause transmission failures, leading to deteriorated energy efficiency. To protect normal information transmission from attacks, AI can be considered, as it has been verified to detect network threats [276]. Moreover, using AI to control the transmit power and allocate resources is also efficient against network jammers [208]. For the future AI-driven 6G, a new type of network threat may be malicious data generated by adversaries, which misleads AI models into wrong decisions. Besides decreased throughput or increased latency, the potential results may be the widespread outage of end terminals or extremely low harvesting efficiency. How to develop robust AI models to ensure green communications will be an important topic.

Most AI techniques, including DL and ML, rely on data in the training and running phases. Since the data may involve personal privacy or business information, the development and execution of AI algorithms should consider data security issues. More importantly, standards and regulations should be built to guide the collection and usage of data [59].
F. Lightweight AI Model and Hardware Design
To develop AI-based green communications, the energy consumption of the AI algorithms themselves should be analyzed. However, most current research focuses on the network performance improvement over conventional algorithms and neglects the energy consumed by the training and running of AI models [277], [278]. This may lead to highly complex AI proposals which can be more energy-hungry than the traditional methods they replace. Thus, how to minimize the required training data and decrease the algorithm complexity is important for the development of AI-based green communications. As reduced complexity may sacrifice accuracy in some cases, balancing energy efficiency and network performance remains critical for AI algorithms. Furthermore, the amount of energy consumed by AI algorithms also depends on the hardware; designing low-cost hardware for the computation acceleration of AI algorithms should also receive more attention [279]. Currently, very limited research has analyzed how to conduct AI algorithms with low energy consumption [280], [281], and the results inspire us to pay more attention to executing the proposed AI algorithms in an energy-efficient manner.

VII. CONCLUSION
AI has aroused widespread attention from nearly every field to improve quality, accelerate production, customize provided services, and so on. Utilizing AI technologies in 6G has been widely acknowledged as a paradigm, and AI-based green communications will be an important direction due to the exponentially increasing energy consumption of the growing infrastructure and end devices. To reduce energy cost and improve energy efficiency, a large number of variables and a high-dimensional solution space need to be considered and analyzed. Conventional heuristic algorithms and convex optimizations require simplification of the considered problems, and may need a great number of iterations or fail to reach a satisfying energy efficiency level. On the other hand, AI techniques have demonstrated overwhelming advantages in handling complex problems. In this paper, we survey the AI-related research on network management and configuration toward energy efficiency optimization. Another direction for green communications is to utilize energy harvesting techniques, which adopt renewable or ambient energy to reduce the usage of fossil resources; AI techniques can be adopted to address the uncertainty and dynamics of the energy harvesting process. Moreover, this paper considers three common scenarios in 6G, namely CNC, MTC, and COC, and analyzes how AI can improve the configuration of 6G elements including massive MIMO, NOMA, SAGIN, and THz. We believe this paper can provide guidance and encourage future works focusing on AI-based 6G green communications. Furthermore, we analyze the strengths and weaknesses of different AI models, including the traditional heuristic algorithms and the state-of-the-art ML/DL methods.
We illustrate howthey can cooperatively work to reduce energy consumptionand improve energy efficiency from a systematic perspective.Additionally, we discuss the necessity to consider energyconsumption of AI models and indicate some open issuesincluding data privacy, computation complexity, hardware de-sign, and network deployment, which the future researchersneed to embrace. R
EFERENCES[1] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J. A. Zhang,“The Roadmap to 6G: AI Empowered Wireless Networks,”
IEEECommunications Magazine , vol. 57, pp. 84–90, Aug. 2019. [2] K. David and H. Berndt, “6G Vision and Requirements: Is There AnyNeed for Beyond 5G?,”
IEEE Vehicular Technology Magazine , vol. 13,no. 3, pp. 72–80, 2018.[3] W. Saad, M. Bennis, and M. Chen, “A Vision of 6G Wireless Systems:Applications, Trends, Technologies, and Open Research Problems,”
IEEE Network
Challenges , vol. 6, no. 1,pp. 117–157, 2015.[7] “White Paper: Key Drivers and Research Challenges for 6G HbiquitousWireless Intelligence.” http://jultika.oulu.fi/files/isbn9789526223544.pdf, accessed in Aug. 2020.[8] Y. Lin, E. T. . Chu, Y. Lai, and T. Huang, “Time-and-Energy-AwareComputation Offloading in Handheld Devices to Coprocessors andClouds,”
IEEE Systems Journal , vol. 9, no. 2, pp. 393–405, 2015.[9] Y. Lin, Y. Lai, J. Huang, and H. Chien, “Three-Tier Capacity and TrafficAllocation for Core, Edges, and Devices for Mobile Edge Computing,”
IEEE Transactions on Network and Service Management , vol. 15, no. 3,pp. 923–933, 2018.[10] F. Tang, B. Mao, Z. M. Fadlullah, N. Kato, O. Akashi, T. Inoue, andK. Mizutani, “On Removing Routing Protocol from Future WirelessNetworks: A Real-time Deep Learning Approach for Intelligent TrafficControl,”
IEEE Wireless Communications , vol. 25, no. 1, pp. 154–160,2018.[11] F. Tang, Z. M. Fadlullah, B. Mao, and N. Kato, “An Intelligent TrafficLoad Prediction-Based Adaptive Channel Assignment Algorithm inSDN-IoT: A Deep Learning Approach,”
IEEE Internet of ThingsJournal , vol. 5, no. 6, pp. 5141–5154, 2018.[12] B. Kar, E. H. Wu, and Y. Lin, “Energy Cost Optimization in DynamicPlacement of Virtualized Network Function Chains,”
IEEE Transactions on Network and Service Management, vol. 15, no. 1, pp. 372–386, 2018.
[13] C. T. Wang, Y. D. Lin, C. C. Wang, and Y. C. Lai, “Cost Minimization in Placing Service Chains for Virtualized Network Functions,” International Journal of Communication Systems, vol. 33, no. 4, p. e4222, 2020.
[14] M. Bashar, A. Akbari, K. Cumanan, H. Q. Ngo, A. G. Burr, P. Xiao, M. Debbah, and J. Kittler, “Exploiting Deep Learning in Limited-Fronthaul Cell-Free Massive MIMO Uplink,” IEEE Journal on Selected Areas in Communications, vol. 38, pp. 1678–1697, Aug. 2020.
[15] G. Vallero, D. Renga, M. Meo, and M. A. Marsan, “Greener RAN Operation Through Machine Learning,” IEEE Transactions on Network and Service Management, vol. 16, pp. 896–908, Sept. 2019.
[16] B. Mao, Y. Kawamoto, J. Liu, and N. Kato, “Harvesting and Threat Aware Security Configuration Strategy for IEEE 802.15.4 Based IoT Networks,” IEEE Communications Letters, vol. 23, no. 11, pp. 2130–2134, 2019.
[17] M. Chu, H. Li, X. Liao, and S. Cui, “Reinforcement Learning-Based Multiaccess Control and Battery Prediction With Energy Harvesting in IoT Systems,” IEEE Internet of Things Journal, vol. 6, pp. 2009–2020, Apr. 2019.
[18] H. Hashida, Y. Kawamoto, and N. Kato, “Intelligent Reflecting Surface Placement Optimization in Air-Ground Communication Networks Toward 6G,” IEEE Wireless Communications, pp. 1–6, 2020.
[19] E. Björnson, Ö. Özdogan, and E. G. Larsson, “Reconfigurable Intelligent Surfaces: Three Myths and Two Critical Questions,” arXiv preprint arXiv:2006.03377, 2020.
[20] D. Dampahalage, K. B. Manosha, and N. Rajatheva, “Intelligent Reflecting Surface Aided Vehicular Communications,” arXiv preprint arXiv:2011.03071, 2020.
[21] L. Xiao, H. Zhang, Y. Xiao, X. Wan, S. Liu, L. Wang, and H. V. Poor, “Reinforcement Learning-Based Downlink Interference Control for Ultra-Dense Small Cells,” IEEE Transactions on Wireless Communications, vol. 19, pp. 423–434, Jan. 2020.
[22] H. Zhang, H. Zhang, K. Long, and G. Karagiannidis, “Deep Learning Based Radio Resource Management in NOMA Networks: User Association, Subchannel and Power Allocation,”
IEEE Transactions on Network Science and Engineering, pp. 1–1, 2020.
[23] S. Gong, Y. Xie, J. Xu, D. Niyato, and Y. Liang, “Deep Reinforcement Learning for Backscatter-Aided Data Offloading in Mobile Edge Computing,” IEEE Network, pp. 1–8, 2020.
[24] J. Ren, H. Wang, T. Hou, S. Zheng, and C. Tang, “Federated Learning-Based Computation Offloading Optimization in Edge Computing-Supported Internet of Things,” IEEE Access, vol. 7, pp. 69194–69201, 2019.
[25] Y. Zhang, P. Chowdhury, M. Tornatore, and B. Mukherjee, “Energy Efficiency in Telecom Optical Networks,” IEEE Communications Surveys & Tutorials, vol. 12, no. 4, pp. 441–458, 2010.
[26] S. Sudevalayam and P. Kulkarni, “Energy Harvesting Sensor Nodes: Survey and Implications,” IEEE Communications Surveys & Tutorials, vol. 13, no. 3, pp. 443–461, 2011.
[27] D. Feng, C. Jiang, G. Lim, L. J. Cimini, G. Feng, and G. Y. Li, “A Survey of Energy-efficient Wireless Communications,” IEEE Communications Surveys & Tutorials, vol. 15, no. 1, pp. 167–178, 2013.
[28] A. A. Aziz, Y. A. Sekercioglu, P. Fitzpatrick, and M. Ivanovich, “A Survey on Distributed Topology Control Techniques for Extending the Lifetime of Battery Powered Wireless Sensor Networks,” IEEE Communications Surveys & Tutorials, vol. 15, no. 1, pp. 121–144, 2013.
[29] Ł. Budzisz, F. Ganji, G. Rizzo, M. Ajmone Marsan, M. Meo, Y. Zhang, G. Koutitas, L. Tassiulas, S. Lambert, B. Lannoo, M. Pickavet, A. Conte, I. Haratcherev, and A. Wolisz, “Dynamic Resource Provisioning for Energy Efficiency in Wireless Access Networks: A Survey and an Outlook,” IEEE Communications Surveys & Tutorials, vol. 16, no. 4, pp. 2259–2285, 2014.
[30] X. Lu, P. Wang, D. Niyato, D. I. Kim, and Z. Han, “Wireless Networks With RF Energy Harvesting: A Contemporary Survey,” IEEE Communications Surveys & Tutorials, vol. 17, no. 2, pp. 757–789, 2015.
[31] M. Ismail, W. Zhuang, E. Serpedin, and K. Qaraqe, “A Survey on Green Mobile Networking: From The Perspectives of Network Operators and Mobile Users,”
IEEE Communications Surveys & Tutorials, vol. 17, no. 3, pp. 1535–1556, 2015.
[32] C. Fang, F. R. Yu, T. Huang, J. Liu, and Y. Liu, “A Survey of Green Information-Centric Networking: Research Issues and Challenges,” IEEE Communications Surveys & Tutorials, vol. 17, no. 3, pp. 1455–1472, 2015.
[33] M. Erol-Kantarci and H. T. Mouftah, “Energy-Efficient Information and Communication Infrastructures in the Smart Grid: A Survey on Interactions and Open Issues,” IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 179–197, 2015.
[34] X. Huang, T. Han, and N. Ansari, “On Green-Energy-Powered Cognitive Radio Networks,” IEEE Communications Surveys & Tutorials, vol. 17, no. 2, pp. 827–842, 2015.
[35] M. Peng, C. Wang, J. Li, H. Xiang, and V. Lau, “Recent Advances in Underlay Heterogeneous Networks: Interference Control, Resource Allocation, and Self-Organization,” IEEE Communications Surveys & Tutorials, vol. 17, no. 2, pp. 700–729, 2015.
[36] R. Mahapatra, Y. Nijsure, G. Kaddoum, N. Ul Hassan, and C. Yuen, “Energy Efficiency Tradeoff Mechanism Towards Wireless Green Communication: A Survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 686–705, 2016.
[37] W. Van Heddeghem, B. Lannoo, D. Colle, M. Pickavet, and P. Demeester, “A Quantitative Survey of the Power Saving Potential in IP-Over-WDM Backbone Networks,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 706–731, 2016.
[38] M. Ku, W. Li, Y. Chen, and K. J. Ray Liu, “Advances in Energy Harvesting Communications: Past, Present, and Future Challenges,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1384–1412, 2016.
[39] S. Buzzi, C. I, T. E. Klein, H. V. Poor, C. Yang, and A. Zappone, “A Survey of Energy-Efficient Techniques for 5G Networks and Challenges Ahead,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 4, pp. 697–709, 2016.
[40] A. Omairi, Z. H. Ismail, K. A. Danapalasingam, and M. Ibrahim, “Power Harvesting in Wireless Sensor Networks and Its Adaptation With Maximum Power Point Tracking: Current Technology and Future Directions,”
IEEE Internet of Things Journal, vol. 4, no. 6, pp. 2104–2115, 2017.
[41] S. Zhang, Q. Wu, S. Xu, and G. Y. Li, “Fundamental Green Tradeoffs: Progresses, Challenges, and Impacts on 5G Networks,” IEEE Communications Surveys & Tutorials, vol. 19, no. 1, pp. 33–56, 2017.
[42] Y. Alsaba, S. K. A. Rahim, and C. Y. Leow, “Beamforming in Wireless Energy Harvesting Communications Systems: A Survey,” IEEE Communications Surveys & Tutorials, vol. 20, no. 2, pp. 1329–1360, 2018.
[43] T. D. Ponnimbaduge Perera, D. N. K. Jayakody, S. K. Sharma, S. Chatzinotas, and J. Li, “Simultaneous Wireless Information and Power Transfer (SWIPT): Recent Advances and Future Challenges,” IEEE Communications Surveys & Tutorials, vol. 20, no. 1, pp. 264–302, 2018.
[44] Q. Chen, L. Wang, P. Chen, and G. Chen, “Optimization of Component Elements in Integrated Coding Systems for Green Communications: A Survey,” IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2977–2999, 2019.
[45] P. Tedeschi, S. Sciancalepore, and R. Di Pietro, “Security in Energy Harvesting Networks: A Survey of Current Solutions and Research Challenges,” IEEE Communications Surveys & Tutorials, pp. 1–1, 2020.
[46] D. Ma, G. Lan, M. Hassan, W. Hu, and S. K. Das, “Sensing, Computing, and Communications for Energy Harvesting IoTs: A Survey,” IEEE Communications Surveys & Tutorials, vol. 22, no. 2, pp. 1222–1250, 2020.
[47] S. Hu, X. Chen, W. Ni, X. Wang, and E. Hossain, “Modeling and Analysis of Energy Harvesting and Smart Grid-Powered Wireless Communication Networks: A Contemporary Survey,” IEEE Transactions on Green Communications and Networking, vol. 4, no. 2, pp. 461–496, 2020.
[48] W. Dong, T. Zhang, Z. Hu, Y. Liu, and X. Han, “Energy-Efficient Hybrid Precoding for mmWave Massive MIMO Systems,” in , (Beijing, China), pp. 6–10, Aug. 2018.
[49] B. Matthiesen, A. Zappone, K. Besser, E. A. Jorswieck, and M. Debbah, “A Globally Optimal Energy-Efficient Power Control Framework and Its Efficient Implementation in Wireless Interference Networks,”
IEEE Transactions on Signal Processing, vol. 68, pp. 3887–3902, 2020.
[50] K. Yang, S. Martin, C. Xing, J. Wu, and R. Fan, “Energy-Efficient Power Control for Device-to-Device Communications,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3208–3220, 2016.
[51] Y. Jiang, Q. Liu, F. Zheng, X. Gao, and X. You, “Energy-Efficient Joint Resource Allocation and Power Control for D2D Communications,” IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6119–6127, 2016.
[52] D. Chang, Y. Ding, J. Xie, A. K. Bhunia, X. Li, Z. Ma, M. Wu, J. Guo, and Y. Z. Song, “The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification,” IEEE Transactions on Image Processing, vol. 29, pp. 4683–4695, 2020.
[53] B. Mao, F. Tang, Z. M. Fadlullah, and N. Kato, “An Intelligent Route Computation Approach Based on Real-Time Deep Learning Strategy for Software Defined Communication Systems,” IEEE Transactions on Emerging Topics in Computing, pp. 1–1, 2019.
[54] T. Wang, S. Wang, and Z. Zhou, “Machine Learning for 5G and Beyond: From Model-Based to Data-Driven Mobile Wireless Networks,” China Communications, vol. 16, pp. 165–175, Jan. 2019.
[55] H. Zhang, M. Min, L. Xiao, S. Liu, P. Cheng, and M. Peng, “Reinforcement Learning-Based Interference Control for Ultra-Dense Small Cells,” in , pp. 1–6, Dec. 2018.
[56] Y. Zhou, Z. M. Fadlullah, B. Mao, and N. Kato, “A Deep-Learning-Based Radio Resource Assignment Technique for 5G Ultra Dense Networks,” IEEE Network, vol. 32, no. 6, pp. 28–34, 2018.
[57] N. Kato, Z. M. Fadlullah, B. Mao, F. Tang, O. Akashi, T. Inoue, and K. Mizutani, “The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective,” IEEE Wireless Communications, vol. 24, pp. 146–153, Dec. 2017.
[58] Z. Zhang, Y. Xiao, Z. Ma, M. Xiao, Z. Ding, X. Lei, G. K. Karagiannidis, and P. Fan, “6G Wireless Networks: Vision, Requirements, Architecture, and Key Technologies,”
IEEE Vehicular Technology Magazine, vol. 14, pp. 28–41, Sep. 2019.
[59] N. Kato, B. Mao, F. Tang, Y. Kawamoto, and J. Liu, “Ten Challenges in Advancing Machine Learning Technologies towards 6G,” IEEE Wireless Communications Magazine, vol. 27, pp. 96–103, Jun. 2020.
[60] T. K. Rodrigues, K. Suto, H. Nishiyama, J. Liu, and N. Kato, “Machine Learning Meets Computation and Communication Control in Evolving Edge and Cloud: Challenges and Future Perspective,” IEEE Communications Surveys & Tutorials, vol. 22, no. 1, pp. 38–67, 2020.
[61] E. Peltonen, M. Bennis, M. Capobianco, M. Debbah, A. Ding, F. Gil-Castiñeira, M. Jurmu, T. Karvonen, M. Kelanti, A. Kliks, T. Leppänen, L. Lovén, T. Mikkonen, A. Rao, S. Samarakoon, K. Seppänen, P. Sroka, S. Tarkoma, and T. Yang, “6G White Paper on Edge Intelligence,” Tech. Rep. 8, Jun. 2020.
[62] X. Li, J. Wu, Z. Sun, Z. Ma, J. Cao, and J. H. Xue, “BSNet: Bi-Similarity Network for Few-shot Fine-grained Image Classification,” IEEE Transactions on Image Processing, vol. 30, pp. 1318–1331, 2021.
[63] G. Boulianne, “A Study of Inductive Biases for Unsupervised Speech Representation Learning,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2781–2795, 2020.
[64] D. Silver et al., “Mastering the Game of Go without Human Knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017.
[65] B. Mao, Z. M. Fadlullah, F. Tang, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “A Tensor Based Deep Learning Technique for Intelligent Packet Routing,” in GLOBECOM 2017 - 2017 IEEE Global Communications Conference, (Singapore), pp. 1–6, Dec. 2017.
[66] X. Liu, Z. Zhan, J. D. Deng, Y. Li, T. Gu, and J. Zhang, “An Energy Efficient Ant Colony System for Virtual Machine Placement in Cloud Computing,” IEEE Transactions on Evolutionary Computation, vol. 22, pp. 113–128, Feb. 2018.
[67] S. Moon, H. Kim, and Y. Yi, “BRUTE: Energy-Efficient User Association in Cellular Networks From Population Game Perspective,” IEEE Transactions on Wireless Communications, vol. 15, no. 1, pp. 663–675, 2016.
[68] Y. Zhao, Y. Yin, and G. Gui, “Lightweight Deep Learning Based Intelligent Edge Surveillance Techniques,” IEEE Transactions on Cognitive Communications and Networking, vol. 6, no. 4, pp. 1146–1154, 2020.
[69] Z. Beheshti and S. M. H. Shamsuddin, “A Review of Population-based Meta-Heuristic Algorithms,” Int. J. Adv. Soft Comput. Appl.
[70] J. Kennedy and R. Eberhart, “Particle Swarm Optimization,” in
Proceedings of ICNN’95-International Conference on Neural Networks, vol. 4, pp. 1942–1948, IEEE, 1995.
[71] Y. Li and S. Wang, “An Energy-Aware Edge Server Placement Algorithm in Mobile Edge Computing,” in , (San Francisco, CA, USA), pp. 66–73, July 2018.
[72] S. Wang, Z. Liu, Z. Zheng, Q. Sun, and F. Yang, “Particle Swarm Optimization for Energy-Aware Virtual Machine Placement Optimization in Virtualized Data Centers,” in , (Seoul, South Korea), pp. 102–109, Dec. 2013.
[73] A. Ibrahim, M. Noshy, H. A. Ali, and M. Badawy, “PAPSO: A Power-Aware VM Placement Technique Based on Particle Swarm Optimization,” IEEE Access, vol. 8, pp. 81747–81764, 2020.
[74] M. Dorigo, M. Birattari, and T. Stutzle, “Ant Colony Optimization,” IEEE Computational Intelligence Magazine, vol. 1, no. 4, pp. 28–39, 2006.
[75] C. Li, W. Liu, L. Wang, M. Li, and K. Okamura, “Energy-Efficient Quality of Service Aware Forwarding Scheme for Content-Centric Networking,” Journal of Network and Computer Applications, vol. 58, pp. 241–254, Dec. 2015.
[76] C. Liao, J. Wu, J. Du, and L. Zhao, “Ant Colony Optimization Inspired Resource Allocation for Multiuser Multicarrier Systems,” in , (Nanjing, China), pp. 1–6, Oct. 2017.
[77] V. Mallawaarachchi, “Introduction to Genetic Algorithms — Including Example Code.” https://towardsdatascience.com/introduction-to-genetic-algorithms-including-example-code-e396e98d8bf3, accessed Nov. 2020.
[78] L. Dai and H. Zhang, “Propagation-Model-Free Base Station Deployment for Mobile Networks: Integrating Machine Learning and Heuristic Methods,” IEEE Access, vol. 8, pp. 83375–83386, 2020.
[79] J. Moysen, L. Giupponi, and J. Mangues-Bafalluy, “A Machine Learning Enabled Network Planning Tool,” in , (Valencia, Spain), pp. 1–7, Sep. 2016.
[80] R. Zhang, Y. Chen, B. Dong, F. Tian, and Q. Zheng, “A Genetic Algorithm-Based Energy-Efficient Container Placement Strategy in CaaS,” IEEE Access, vol. 7, pp. 121360–121373, 2019.
[81] D. Gong, J. Sun, and Z. Miao, “A Set-Based Genetic Algorithm for Interval Many-Objective Optimization Problems,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 1, pp. 47–60, 2018.
[82] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow’s Intelligent Network Traffic Control Systems,” IEEE Communications Surveys & Tutorials, vol. 19, pp. 2432–2455, Fourth Quarter 2017.
[83] J. M. Keller, M. R. Gray, and J. A. Givens, “A Fuzzy K-Nearest Neighbor Algorithm,”
IEEE Transactions on Systems, Man, and Cybernetics, no. 4, pp. 580–585, 1985.
[84] S. Chatterjee and A. S. Hadi, Regression Analysis by Example. John Wiley & Sons, 2015.
[85] R. Gandhi, “Support Vector Machine — Introduction to Machine Learning Algorithms.” https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47, accessed Nov. 2020.
[86] M. J. Garbade, “Understanding K-means Clustering in Machine Learning.” https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1, accessed Nov. 2020.
[87] S. Wang, M. Chen, C. Yin, W. Saad, C. S. Hong, S. Cui, and H. V. Poor, “Federated Learning for Task and Resource Allocation in Wireless High Altitude Balloon Networks.” Available at arXiv: https://arxiv.org/abs/2003.09375 (2020/09/15), 2020.
[88] N. K. Visalakshi and J. Suguna, “K-means Clustering Using Max-min Distance Measure,” in NAFIPS 2009 - 2009 Annual Meeting of the North American Fuzzy Information Processing Society, (Cincinnati, OH, USA), pp. 1–6, June 2009.
[89] H. Zhang, H. Zhang, W. Liu, K. Long, J. Dong, and V. C. M. Leung, “Energy Efficient User Clustering and Hybrid Precoding for Terahertz MIMO-NOMA Systems,” in ICC 2020 - 2020 IEEE International Conference on Communications (ICC), (Dublin, Ireland), pp. 1–5, June 2020.
[90] H. Zhang, H. Zhang, W. Liu, K. Long, J. Dong, and V. C. M. Leung, “Energy Efficient User Clustering, Hybrid Precoding and Power Optimization in Terahertz MIMO-NOMA Systems,” IEEE Journal on Selected Areas in Communications, vol. 38, pp. 2074–2085, Sept. 2020.
[91] C. Shen, S. Xue, and S. Fu, “ECPM: An Energy-Efficient Cloudlet Placement Method in Mobile Cloud Environment,” EURASIP Journal on Wireless Communications and Networking, vol. 2019, pp. 1–10, May 2019.
[92] B. Mao, F. Tang, Z. M. Fadlullah, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “A Novel Non-Supervised Deep-Learning-Based Network Traffic Control Method for Software Defined Wireless Networks,” IEEE Wireless Communications, vol. 25, pp. 74–81, Sept. 2018.
[93] A. Krogh, “What Are Artificial Neural Networks?,” Nature Biotechnology, vol. 26, pp. 195–197, Feb. 2008.
[94] G. E. Hinton, S. Osindero, and Y. W. Teh, “A Fast Learning Algorithm for Deep Belief Nets,”
Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[95] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
[96] T. Mikolov, S. Kombrink, L. Burget, J. Černocký, and S. Khudanpur, “Extensions of Recurrent Neural Network Language Model,” in , (Prague, Czech Republic), pp. 5528–5531, May 2011.
[97] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Networks,” 2014.
[98] V. R. Konda and J. N. Tsitsiklis, “Actor-Critic Algorithms,” in Advances in Neural Information Processing Systems, pp. 1008–1014, 2000.
[99] V. François-Lavet, P. Henderson, R. Islam, M. G. Bellemare, and J. Pineau, 2018.
[100] D. A. Temesgene, M. Miozzo, D. Gunduz, and P. Dini, “Distributed Deep Reinforcement Learning for Functional Split Control in Energy Harvesting Virtualized Small Cells,” IEEE Transactions on Sustainable Computing, pp. 1–1, 2020.
[101] C. He, Y. Hu, Y. Chen, and B. Zeng, “Joint Power Allocation and Channel Assignment for NOMA With Deep Reinforcement Learning,” IEEE Journal on Selected Areas in Communications, vol. 37, pp. 2200–2210, Oct. 2019.
[102] M. Simsek, M. Bennis, and İ. Güvenç, “Learning Based Frequency- and Time-Domain Inter-Cell Interference Coordination in HetNets,” IEEE Transactions on Vehicular Technology, vol. 64, pp. 4589–4602, Oct. 2015.
[103] L. Zhang and Y. Liang, “Deep Reinforcement Learning for Multi-Agent Power Control in Heterogeneous Networks.” Available at arXiv: https://arxiv.org/abs/2004.12095 (2020/09/15), 2020.
[104] Y.-K. Kwok and I. Ahmad, “Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors,” IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 5, pp. 506–521, 1996.
[105] S. J. Pan and Q. Yang, “A Survey on Transfer Learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[106] S. Sharma, S. J. Darak, and A. Srivastava, “Energy Saving in Heterogeneous Cellular Network via Transfer Reinforcement Learning Based Policy,” in , (Bangalore, India), pp. 397–398, Jan. 2017.
[107] S. Sharma, S. J. Darak, and A. Srivastava, “Transfer Reinforcement Learning based Framework for Energy Savings in Cellular Base Station Network,” in , (New Delhi, India), pp. 1–4, Mar. 2019.
[108] R. Dong, C. She, W. Hardjawana, Y. Li, and B. Vucetic, “Deep Learning for Radio Resource Allocation with Diverse Quality-of-Service Requirements in 5G,” arXiv preprint arXiv:2004.00507, 2020.
[109] C. Pradhan, A. Li, C. She, Y. Li, and B. Vucetic, “Computation Offloading for IoT in C-RAN: Optimization and Deep Learning,” IEEE Transactions on Communications, vol. 68, no. 7, pp. 4565–4579, 2020.
[110] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated Machine Learning: Concept and Applications,”
ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 2, pp. 1–19, 2019.
[111] M. J. Garbade, “Federated Learning.” https://federated.withgoogle.com/, accessed Nov. 2020.
[112] S. Shen, Y. Han, X. Wang, and Y. Wang, “Computation Offloading with Multiple Agents in Edge-Computing-Supported IoT,” ACM Transactions on Sensor Networks, vol. 16, pp. 1–27, Dec. 2020.
[113] X. Wang, Z. Ning, S. Guo, and L. Wang, “Imitation Learning Enabled Task Scheduling for Online Vehicular Edge Computing,” IEEE Transactions on Mobile Computing, pp. 1–1, 2020.
[114] S. J. Nawaz, S. K. Sharma, S. Wyne, M. N. Patwary, and M. Asaduzzaman, “Quantum Machine Learning for 6G Communication Networks: State-of-the-Art and Vision for the Future,” IEEE Access, vol. 7, pp. 46317–46350, 2019.
[115] M. H. Alsharif, J. Kim, and J. H. Kim, “Green and Sustainable Cellular Base Stations: An Overview and Future Research Directions,” Energies, vol. 10, no. 5, p. 587, 2017.
[116] Z. M. Fadlullah and N. Kato, “HCP: Heterogeneous Computing Platform for Federated Learning Based Collaborative Content Caching Towards 6G Networks,” IEEE Transactions on Emerging Topics in Computing, pp. 1–1, 2020.
[117] Y. Wang, X. Dai, J. M. Wang, and B. Bensaou, “A Reinforcement Learning Approach to Energy Efficiency and QoS in 5G Wireless Networks,” IEEE Journal on Selected Areas in Communications, vol. 37, pp. 1413–1423, June 2019.
[118] R. Thakur, S. N. Swain, and C. S. R. Murthy, “An Energy Efficient Cell Selection Framework for Femtocell Networks With Limited Backhaul Link Capacity,” IEEE Systems Journal, vol. 12, no. 2, pp. 1969–1980, 2018.
[119] J. Wu, E. W. M. Wong, Y. Chan, and M. Zukerman, “Power Consumption and GoS Tradeoff in Cellular Mobile Networks with Base Station Sleeping and Related Performance Studies,”
IEEE Transactions on Green Communications and Networking, pp. 1–1, 2020.
[120] A. Alnoman and A. S. Anpalagan, “Computing-Aware Base Station Sleeping Mechanism in H-CRAN-Cloud-Edge Networks,” IEEE Transactions on Cloud Computing, pp. 1–1, 2019.
[121] W. K. Lai, C. Shieh, C. Ho, and Y. Chen, “A Clustering-Based Energy Saving Scheme for Dense Small Cell Networks,” IEEE Access, vol. 7, pp. 2880–2893, 2019.
[122] K. N. Doan, M. Vaezi, W. Shin, H. V. Poor, H. Shin, and T. Q. S. Quek, “Power Allocation in Cache-Aided NOMA Systems: Optimization and Deep Reinforcement Learning Approaches,” IEEE Transactions on Communications, vol. 68, no. 1, pp. 630–644, 2020.
[123] Y. Wei, F. R. Yu, M. Song, and Z. Han, “User Scheduling and Resource Allocation in HetNets With Hybrid Energy Supply: An Actor-Critic Reinforcement Learning Approach,” IEEE Transactions on Wireless Communications, vol. 17, no. 1, pp. 680–692, 2018.
[124] H. Zhang, M. Feng, K. Long, G. K. Karagiannidis, and A. Nallanathan, “Artificial Intelligence-Based Resource Allocation in Ultradense Networks: Applying Event-Triggered Q-Learning Algorithms,” IEEE Vehicular Technology Magazine, vol. 14, no. 4, pp. 56–63, 2019.
[125] Y. Liu, X. Wang, G. Boudreau, A. B. Sediq, and H. Abou-zeid, “Deep Learning Based Hotspot Prediction and Beam Management for Adaptive Virtual Small Cell in 5G Networks,” IEEE Transactions on Emerging Topics in Computational Intelligence, 2020.
[126] M. Miozzo, N. Piovesan, and P. Dini, “Coordinated Load Control of Renewable Powered Small Base Stations Through Layered Learning,” IEEE Transactions on Green Communications and Networking, vol. 4, pp. 16–30, Mar. 2020.
[127] M. Wakaiki, K. Suto, K. Koiwa, K. Liu, and T. Zanma, “A Control-Theoretic Approach for Cell Zooming of Energy Harvesting Small Cell Networks,”
IEEE Transactions on Green Communications and Networking, vol. 3, no. 2, pp. 329–342, 2019.
[128] A. Ghazanfari, H. Tabassum, and E. Hossain, “Ambient RF Energy Harvesting in Ultra-Dense Small Cell Networks: Performance and Trade-offs,” IEEE Wireless Communications, vol. 23, no. 2, pp. 38–45, 2016.
[129] W. Lin, I. Lai, and C. Lee, “Distributed Energy Cooperation for Energy Harvesting Nodes Using Reinforcement Learning,” in , pp. 1584–1588, 2015.
[130] A. Kariminezhad and A. Sezgin, “Heterogeneous Multi-Tier Networks: Improper Signaling for Joint Rate-Energy Optimization,” IEEE Transactions on Wireless Communications, vol. 18, no. 1, pp. 680–694, 2019.
[131] J. Borah, M. Hussain, and J. Bora, “Effect on Energy Efficiency with Small Cell Deployment in Heterogeneous Cellular Networks,” Internet Technology Letters, vol. 2, pp. 1–6, May/June 2019.
[132] T. Yiu, “Understanding Random Forest.” https://towardsdatascience.com/understanding-random-forest-58381e0602d2, accessed Nov. 2020.
[133] P. Marius, V. Balas, L. Perescu-Popescu, and N. Mastorakis, “Multilayer Perceptron and Neural Networks,” WSEAS Transactions on Circuits and Systems, vol. 8, July 2009.
[134] L. Ho, H. Claussen, and D. Cherubini, “Online Evolution of Femtocell Coverage Algorithms Using Genetic Programming,” in , (London, UK), pp. 3033–3038, Sep. 2013.
[135] K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[136] M. Feng, S. Mao, and T. Jiang, “Base Station ON-OFF Switching in 5G Wireless Networks: Approaches and Challenges,” IEEE Wireless Communications, vol. 24, no. 4, pp. 46–54, 2017.
[137] Y. Gao, J. Chen, Z. Liu, B. Zhang, Y. Ke, and R. Liu, “Machine Learning based Energy Saving Scheme in Wireless Access Networks,” in , (Limassol, Cyprus), pp. 1573–1578, June 2020.
[138] “WIKI: Autoregressive Integrated Moving Average.” https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average, accessed Nov. 2020.
[139] I. Donevski, G. Vallero, and M. A. Marsan, “Neural Networks for Cellular Base Station Switching,” in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), (Paris, France), pp. 738–743, Apr. 2019.
[140] H. Pervaiz, O. Onireti, A. Mohamed, M. Ali Imran, R. Tafazolli, and Q. Ni, “Energy-Efficient and Load-Proportional eNodeB for 5G User-Centric Networks: A Multilevel Sleep Strategy Mechanism,”
IEEE Vehicular Technology Magazine, vol. 13, pp. 51–59, Dec. 2018.
[141] R. Li, Z. Zhao, X. Chen, J. Palicot, and H. Zhang, “TACT: A Transfer Actor-Critic Learning Framework for Energy Saving in Cellular Radio Access Networks,” IEEE Transactions on Wireless Communications, vol. 13, pp. 2000–2011, Apr. 2014.
[142] Q. Zhao and D. Grace, “Transfer Learning for QoS Aware Topology Management in Energy Efficient 5G Cognitive Radio Networks,” in , (Akaslompolo, Finland), pp. 152–157, Nov. 2014.
[143] J. Liu, B. Krishnamachari, S. Zhou, and Z. Niu, “DeepNap: Data-Driven Base Station Sleeping Operations Through Deep Reinforcement Learning,” IEEE Internet of Things Journal, vol. 5, pp. 4273–4282, Dec. 2018.
[144] W. Fischer and K. Meier-Hellstern, “The Markov-Modulated Poisson Process (MMPP) Cookbook,” Performance Evaluation, vol. 18, pp. 149–171, Sept. 1993.
[145] F. H. Panahi, F. H. Panahi, G. Hattab, T. Ohtsuki, and D. Cabric, “Green Heterogeneous Networks via an Intelligent Sleep/Wake-Up Mechanism and D2D Communications,” IEEE Transactions on Green Communications and Networking, vol. 2, pp. 915–931, Dec. 2018.
[146] F. H. Panahi and T. Ohtsuki, “Optimal Channel-Sensing Scheme for Cognitive Radio Systems based on Fuzzy Q-Learning,” IEICE Transactions on Communications, vol. 97, no. 2, pp. 283–294, 2014.
[147] F. H. Panahi and T. Ohtsuki, “Optimal Channel-Sensing Policy based on Fuzzy Q-learning Process over Cognitive Radio Systems,” in , (Budapest, Hungary), pp. 2677–2682, June 2013.
[148] Y. L. Lee, W. L. Tan, S. B. Y. Lau, T. C. Chuah, A. A. El-Saleh, and D. Qin, “Joint Cell Activation and User Association for Backhaul Load Balancing in Green HetNets,” IEEE Wireless Communications Letters, vol. 9, pp. 1486–1490, Sept. 2020.
[149] Q. Zhang, X. Xu, J. Zhang, X. Tao, and C. Liu, “Dynamic Load Adjustments for Small Cells in Heterogeneous Ultra-dense Networks,” in , (Seoul, Korea (South)), pp. 1–6, May 2020.
[150] Y. Liu, C. He, X. Li, C. Zhang, and C. Tian, “Power Allocation Schemes Based on Machine Learning for Distributed Antenna Systems,” IEEE Access, vol. 7, pp. 20577–20584, 2019.
[151] Y. Li, Z. Gao, L. Huang, X. Du, and M. Guizani, “Energy-Aware Interference Management for Ultra-Dense Multi-Tier HetNets: Architecture and Technologies,” Computer Communications, vol. 127, pp. 30–35, 2018.
[152] Z. Gao, B. Wen, L. Huang, C. Chen, and Z. Su, “Q-Learning-Based Power Control for LTE Enterprise Femtocell Networks,”
IEEE Systems Journal, vol. 11, pp. 2699–2707, Dec. 2017.
[153] G. Du, L. Wang, Q. Liao, and H. Hu, “Deep Neural Network Based Cell Sleeping Control and Beamforming Optimization in Cloud-RAN,” in , (Honolulu, HI, USA), pp. 1–5, Sept. 2019.
[154] X. Zhou, P. Wang, Z. Yang, L. Tong, Y. Wang, C. Yang, N. Xiong, and H. Gao, “A Manifold Learning Two-Tier Beamforming Scheme Optimizes Resource Management in Massive MIMO Networks,” IEEE Access, vol. 8, pp. 22976–22987, 2020.
[155] N. Zheng and J. Xue, “Manifold Learning,” in Statistical Learning and Pattern Analysis for Image and Video Processing, pp. 87–119, London: Springer, 2009.
[156] Y. Zou, Y. Xie, C. Zhang, S. Gong, D. T. Hoang, and D. Niyato, “Optimization-Driven Hierarchical Deep Reinforcement Learning for Hybrid Relaying Communications,” in , (Seoul, Korea (South)), pp. 1–6, May 2020.
[157] S. Gong, Y. Zou, J. Xu, D. Hoang, B. Lyu, and D. Niyato, “Optimization-driven Hierarchical Learning Framework for Wireless Powered Backscatter-aided Relay Communications.” Available at arXiv: https://arxiv.org/abs/2008.01366 (2020/09/15), 2020.
[158] L. Li, H. Ren, Q. Cheng, K. Xue, W. Chen, M. Debbah, and Z. Han, “Millimeter-Wave Networking in Sky: A Machine Learning and Mean Field Game Approach for Joint Beamforming and Beam-Steering,” IEEE Transactions on Wireless Communications, pp. 1–1, 2020.
[159] J. Xu, P. Zhu, J. Li, and X. You, “Deep Learning-Based Pilot Design for Multi-User Distributed Massive MIMO Systems,” IEEE Wireless Communications Letters, vol. 8, pp. 1016–1019, Aug. 2019.
[160] C. D’Andrea, A. Zappone, S. Buzzi, and M. Debbah, “Uplink Power Control in Cell-Free Massive MIMO via Deep Learning,” in , (Le Gosier, Guadeloupe), pp. 554–558, Dec. 2019.
[161] Y. Nie, Q. Chen, X. Shen, and K. Gan, “Energy Efficient Secure MIMO Transmission in the Presence of Smart Attacker,” IET Communications, vol. 14, pp. 1619–1631, June 2020.
[162] X. Gao, L. Dai, Y. Sun, S. Han, and I. Chih-Lin, “Machine Learning Inspired Energy-Efficient Hybrid Precoding for mmWave Massive MIMO Systems,” in , (Paris, France), pp. 1–6, May 2017.
[163] P. Ge and T. Lv, “Energy-Efficient Optimized Dynamic Massive MIMO Based on Predicted User Quantity by LSTM Algorithm,” in , (Beijing, China), pp. 179–183, Aug. 2018.
[164] N. Yang, H. Zhang, K. Long, H. Hsieh, and J. Liu, “Deep Neural Network for Resource Management in NOMA Networks,” IEEE Transactions on Vehicular Technology, vol. 69, pp. 876–886, Jan. 2020.
[165] A. Blum and T. Mitchell, “Combining Labeled and Unlabeled Data with Co-Training,” in Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT ’98, (New York, NY, USA), pp. 92–100, July 1998.
[166] H. Pan, J. Liu, S. Zhou, and Z. Niu, “A Block Regression Model for Short-Term Mobile Traffic Forecasting,” in , (Shenzhen, China), pp. 1–5, Nov. 2015.
[167] K. Y. Lee, Y. T. Cha, and J. H. Park, “Short-Term Load Forecasting Using an Artificial Neural Network,”
IEEE Transactions on Power Systems, vol. 7, pp. 124–132, Feb. 1992.
[168] “Understanding LSTM Networks.” Available at https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (2020/09/15), 2015.
[169] M. Miozzo, L. Giupponi, M. Rossi, and P. Dini, “Switch-On/Off Policies for Energy Harvesting Small Cells through Distributed Q-Learning,” in , (San Francisco, CA, USA), pp. 1–6, Mar. 2017.
[170] H. Li, H. Gao, T. Lv, and Y. Lu, “Deep Q-Learning Based Dynamic Resource Allocation for Self-Powered Ultra-Dense Networks,” in , (Kansas City, MO, USA), pp. 1–6, May 2018.
[171] H. Li, T. Lv, and X. Zhang, “Deep Deterministic Policy Gradient Based Dynamic Power Control for Self-Powered Ultra-Dense Networks,” in , (Abu Dhabi, United Arab Emirates), pp. 1–6, Dec. 2018.
[172] S. Levine, “Actor-Critic Algorithms.” Available at http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_5_actor_critic_pdf (2020/09/15).
[173] M. Mendil, A. De Domenico, V. Heiries, R. Caire, and N. Hadjsaid, “Battery-Aware Optimization of Green Small Cells: Sizing and Energy Management,” IEEE Transactions on Green Communications and Networking, vol. 2, pp. 635–651, Sept. 2018.
[174] L. Busoniu, D. Ernst, B. De Schutter, and R. Babuska, “Fuzzy Approximation for Convergent Model-Based Reinforcement Learning,” in , (London, UK), pp. 1–6, July 2007.
[175] N. Piovesan, D. López-Pérez, M. Miozzo, and P. Dini, “Joint Load Control and Energy Sharing for Renewable Powered Small Base Stations: A Machine Learning Approach,” IEEE Transactions on Green Communications and Networking, pp. 1–1, 2020.
[176] Y. Yue and H. M. Le, “Imitation Learning Tutorial.” Available at https://sites.google.com/view/icml2018-imitation-learning/ (2020/09/15), 2018.
[177] N. Piovesan and P. Dini, “Optimal Direct Load Control of Renewable Powered Small Cells: A Shortest Path Approach,” Internet Technology Letters, vol. 1, no. 1, p. e7, 2018.
[178] I. Grondman, M. Vaandrager, L. Busoniu, R. Babuska, and E. Schuitema, “Efficient Model Learning Methods for Actor-Critic Control,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, pp. 591–602, June 2012.
[179] H. Zhang, D. Zhan, C. J. Zhang, K. Wu, Y. Liu, and S. Luo, “Deep Reinforcement Learning-Based Access Control for Buffer-Aided Relaying Systems With Energy Harvesting,” IEEE Access, vol. 8, pp. 145006–145017, Aug. 2020.
[180] L. Dai, B. Wang, M. Peng, and S. Chen, “Hybrid Precoding-Based Millimeter-Wave Massive MIMO-NOMA With Simultaneous Wireless Information and Power Transfer,” IEEE Journal on Selected Areas in Communications, vol. 37, pp. 131–141, Jan. 2019.
[181] Y. Kawamoto, R. Sasazawa, B. Mao, and N. Kato, “Multilayer Virtual Cell Based Resource Allocation in Low-Power Wide-Area Networks,” IEEE Internet of Things Journal, 2019.
[182] Q. Wang, M. Hempstead, and W. Yang, “A Realistic Power Consumption Model for Wireless Sensor Network Devices,” in , vol. 1, (Reston, VA, USA), pp. 286–295, Sept. 2006.
[183] Y. Li, K. K. Chai, Y. Chen, and J. Loo, “QoS-Aware Joint Access Control and Duty Cycle Control for Machine-to-Machine Communications,” in , (San Diego, CA, USA), pp. 1–6, Dec. 2015.
[184] Y. Li, K. K. Chai, Y. Chen, and J. Loo, “Smart Duty Cycle Control with Reinforcement Learning for Machine to Machine Communications,” in , (London, UK), pp. 1458–1463, June 2015.
[185] H. Xu, X. Liu, W. G. Hatcher, G. Xu, W. Liao, and W. Yu, “Priority-aware Reinforcement Learning-Based Integrated Design of Networking and Control for Industrial Internet of Things,”
IEEE Internet of ThingsJournal , pp. 1–1, 2020.[186] S. Sarwar, R. Sirhindi, L. Aslam, G. Mustafa, M. M. Yousaf, and S. W.U. Q. Jaffry, “Reinforcement Learning Based Adaptive Duty Cyclingin LR-WPANs,”
IEEE Access , vol. 8, pp. 161157–161174, 2020.[187] M. Alenezi, K. K. Chai, A. S. Alam, Y. Chen, and S. Jimaa, “Un-supervised Learning Clustering and Dynamic Transmission Schedul-ing for Efficient Dense LoRaWAN Networks,”
IEEE Access , vol. 8,pp. 191495–191509, 2020.[188] A. Azari and C. Cavdar, “Self-Organized Low-Power IoT Networks:A Distributed Learning Approach,” in , (Abu Dhabi, United Arab Emirates),pp. 1–7, Dec. 2018.[189] C. Zhou, W. Wu, H. He, P. Yang, F. Lyu, N. Cheng, and X. Shen, “DeepReinforcement Learning for Delay-Oriented IoT Task Scheduling inSpace-Air-Ground Integrated Network,”
IEEE Transactions on WirelessCommunications , pp. 1–1, 2020.[190] K. K. Nguyen, N. A. Vien, L. D. Nguyen, M. T. Le, L. Hanzo, andT. Q. Duong, “Real-Time Energy Harvesting Aided Scheduling in UAV-Assisted D2D Networks Relying on Deep Reinforcement Learning,”
IEEE Access , pp. 1–1, 2020.[191] R. Chen, X. Hu, X. Li, and W. Wang, “Optimum Power Allocationbased on Traffic Matching Service for Multi-beam Satellite System,”in , pp. 655–659, 2020. [192] B. Özbek, M. Pischella, and D. Le Ruyet, “Energy efficient resourceallocation for underlaying multi-d2d enabled multiple-antennas com-munications,” IEEE Transactions on Vehicular Technology , vol. 69,no. 6, pp. 6189–6199, 2020.[193] T. Zhang, K. Zhu, and J. Wang, “Energy-Efficient Mode Selectionand Resource Allocation for D2D-enabled Heterogeneous Networks:A Deep Reinforcement Learning Approach,”
IEEE Transactions onWireless Communications , pp. 1–1, 2020.[194] Z. Ji, A. K. Kiani, Z. Qin, and R. Ahmad, “Power Optimization inDevice-to-Device Communications: A Deep Reinforcement LearningApproach with Dynamic Reward,”
IEEE Wireless CommunicationsLetters , pp. 1–1, 2020.[195] A. Chowdhury, S. A. Raut, and H. S. Narman, “DA-DRLS: DriftAdaptive Deep Reinforcement Learning based Scheduling for IoT Re-source Management,”
Journal of Network and Computer Applications ,vol. 138, pp. 51–65, 2019.[196] H. Yang, A. Alphones, W. Zhong, C. Chen, and X. Xie, “Learning-Based Energy-Efficient Resource Management by HeterogeneousRF/VLC for Ultra-Reliable Low-Latency Industrial IoT Networks,”
IEEE Transactions on Industrial Informatics , vol. 16, pp. 5565–5576,Aug. 2020.[197] H. Yang and X. Xie, “An Actor-Critic Deep Reinforcement LearningApproach for Transmission Scheduling in Cognitive Internet of ThingsSystems,”
IEEE Systems Journal , vol. 14, pp. 51–60, Mar. 2020.[198] G. M. S. Rahman, M. Peng, S. Yan, and T. Dang, “Learning BasedJoint Cache and Power Allocation in Fog Radio Access Networks,”
IEEE Transactions on Vehicular Technology , vol. 69, pp. 4401–4411,Apr. 2020.[199] M. K. Sharma, A. Zappone, M. Assaad, M. Debbah, and S. Vassilaras,“Distributed Power Control for Large Energy Harvesting Networks: AMulti-Agent Deep Reinforcement Learning Approach,”
IEEE Transac-tions on Cognitive Communications and Networking , vol. 5, pp. 1140–1154, Dec. 2019.[200] M. K. Sharma, A. Zappone, M. Debbah, and M. Assaad, “Multi-AgentDeep Reinforcement Learning based Power Control for Large EnergyHarvesting Networks,” in ,(Avignon, France), pp. 1–7, June 2019.[201] X. Bao, H. Liang, Y. Liu, and F. Zhang, “A Stochastic Game Approachfor Collaborative Beamforming in SDN-Based Energy HarvestingWireless Sensor Networks,”
IEEE Internet of Things Journal , vol. 6,pp. 9583–9595, Dec. 2019.[202] Y. Guo and M. Xiang, “Multi-Agent Reinforcement Learning BasedEnergy Efficiency Optimization in NB-IoT Networks,” in , (Waikoloa, HI, USA), pp. 1–6,Dec. 2019.[203] N. Jiang, Y. Deng, A. Nallanathan, and J. A. Chambers, “ReinforcementLearning for Real-Time Optimization in NB-IoT Networks,”
IEEEJournal on Selected Areas in Communications , vol. 37, no. 6, pp. 1424–1440, 2019.[204] S. Lien, S. Hung, D. Deng, C. Lai, and H. Tsai, “Low LatencyRadio Access in 3GPP Local Area Data Networks for V2X: StochasticOptimization and Learning,”
IEEE Internet of Things Journal , vol. 6,pp. 4867–4879, June 2019.[205] Y. Cui, V. K. N. Lau, R. Wang, H. Huang, and S. Zhang, “A Surveyon Delay-Aware Resource Control for Wireless Systems—Large De-viation Theory, Stochastic Lyapunov Drift, and Distributed StochasticLearning,”
IEEE Transactions on Information Theory , vol. 58, no. 3,pp. 1677–1701, 2012.[206] Y. Zhao, J. Hu, K. Yang, and S. Cui, “Deep Reinforcement LearningAided Intelligent Access Control in Energy Harvesting based WLAN,”
IEEE Transactions on Vehicular Technology , pp. 1–1, 2020.[207] Y. Sun, Y. Wang, J. Jiao, S. Wu, and Q. Zhang, “Deep Learning-BasedLong-Term Power Allocation Scheme for NOMA Downlink System inS-IoT,”
IEEE Access , vol. 7, pp. 86288–86296, 2019.[208] C. Han, A. Liu, H. Wang, L. Huo, and X. Liang, “Dynamic Anti-Jamming Coalition for Satellite-Enabled Army IoT: A DistributedGame Approach,”
IEEE Internet of Things Journal , vol. 7, no. 11,pp. 10932–10944, 2020.[209] S. Khairy, P. Balaprakash, L. X. Cai, and Y. Cheng, “Constrained DeepReinforcement Learning for Energy Sustainable Multi-UAV basedRandom Access IoT Networks with NOMA,”
IEEE Journal on SelectedAreas in Communications , pp. 1–1, 2020.[210] Y. Cao, L. Zhang, and Y. Liang, “Deep Reinforcement Learning forChannel and Power Allocation in UAV-enabled IoT Systems,” in , (Waikoloa,HI, USA), pp. 1–6, Dec. 2019. [211] Y. Yuan, L. Lei, T. X. Vu, S. Chatzinotas, S. Sun, and B. Ottersten, “En-ergy minimization in UAV-aided networks: actor-critic learning for con-strained scheduling optimization,” arXiv preprint arXiv:2006.13610 ,2020.[212] Y. Liu, K. . Tong, and K. . Wong, “Reinforcement Learning basedRouting for Energy Sensitive Wireless Mesh IoT Networks,”
Electron-ics Letters , vol. 55, no. 17, pp. 966–968, 2019.[213] R. Wang, A. Yadav, E. A. Makled, O. A. Dobre, R. Zhao, andP. K. Varshney, “Optimal Power Allocation for Full-Duplex UnderwaterRelay Networks With Energy Harvesting: A Reinforcement LearningApproach,”
IEEE Wireless Communications Letters , vol. 9, no. 2,pp. 223–227, 2020.[214] C. Wang, X. Yao, W. Wang, and J. M. Jornet, “Multi-hop DeflectionRouting Algorithm Based on Reinforcement Learning for Energy-Harvesting Nanonetworks,”
IEEE Transactions on Mobile Computing ,pp. 1–1, 2020.[215] J. Zhang, J. Tang, and F. Wang, “Cooperative Relay Selection for LoadBalancing With Mobility in Hierarchical WSNs: A Multi-Armed BanditApproach,”
IEEE Access , vol. 8, pp. 18110–18122, 2020.[216] Z. Zhou, F. Xiong, C. Xu, Y. He, and S. Mumtaz, “Energy-Efficient Ve-hicular Heterogeneous Networks for Green Cities,”
IEEE Transactionson Industrial Informatics , vol. 14, no. 4, pp. 1522–1531, 2018.[217] H. Mostafaei, “Energy-Efficient Algorithm for Reliable Routing ofWireless Sensor Networks,”
IEEE Transactions on Industrial Electron-ics , vol. 66, no. 7, pp. 5567–5575, 2019.[218] X. Wang, T. Jin, L. Hu, and Z. Qian, “Energy-Efficient Power Al-location and Q-Learning-Based Relay Selection for Relay-Aided D2DCommunication,”
IEEE Transactions on Vehicular Technology , vol. 69,no. 6, pp. 6452–6462, 2020.[219] K. Haseeb, K. M. Almustafa, Z. Jan, T. Saba, and U. Tariq, “Secureand Energy-aware Heuristic Routing Protocol for Wireless SensorNetwork,”
IEEE Access , pp. 1–1, 2020.[220] L. Xiao, D. Jiang, Y. Chen, W. Su, and Y. Tang, “Reinforcement-Learning-Based Relay Mobility and Power Allocation for UnderwaterSensor Networks Against Jamming,”
IEEE Journal of Oceanic Engi-neering , vol. 45, no. 3, pp. 1148–1156, 2020.[221] Y. Zhou, T. Cao, and W. Xiang, “QLFR: A Q-Learning-BasedLocalization-Free Routing Protocol for Underwater Sensor Networks,”in ,(Waikoloa, HI, USA), pp. 1–6, Dec. 2019.[222] T. Hu and Y. Fei, “QELAR: A Machine-Learning-Based AdaptiveRouting Protocol for Energy-Efficient and Lifetime-Extended Under-water Sensor Networks,”
IEEE Transactions on Mobile Computing ,vol. 9, pp. 796–809, June 2010.[223] M. Aboubakar, M. Kellil, A. Bouabdallah, and P. Roux, “Toward Intel-ligent Reconfiguration of RPL Networks using Supervised Learning,”in , (Manchester, UK), pp. 1–4, Apr. 2019.[224] T. Fu, C. Wang, and N. Cheng, “Deep-Learning-Based Joint Optimiza-tion of Renewable Energy Storage and Routing in Vehicular EnergyNetwork,”
IEEE Internet of Things Journal , vol. 7, pp. 6229–6241,July 2020.[225] Z. Jin, Q. Zhao, and Y. Su, “RCAR: A Reinforcement-Learning-BasedRouting Protocol for Congestion-Avoided Underwater Acoustic SensorNetworks,”
IEEE Sensors Journal , vol. 19, pp. 10881–10891, Nov.2019.[226] R. Huang, L. Ma, G. Zhai, J. He, X. Chu, and H. Yan, “ResilientRouting Mechanism for Wireless Sensor Networks With Deep LearningLink Reliability Prediction,”
IEEE Access , vol. 8, pp. 64857–64872,2020.[227] X. He, H. Jiang, Y. Song, C. He, and H. Xiao, “Routing Selection WithReinforcement Learning for Energy Harvesting Multi-Hop CRN,”
IEEEAccess , vol. 7, pp. 54435–54448, 2019.[228] N. Mastronarde, V. Patel, J. Xu, L. Liu, and M. van der Schaar, “ToRelay or Not to Relay: Learning Device-to-Device Relaying Strategiesin Cellular Networks,”
IEEE Transactions on Mobile Computing ,vol. 15, pp. 1569–1585, June 2016.[229] Y. He, D. Zhai, Y. Jiang, and R. Zhang, “Relay Selection for UAV-Assisted Urban Vehicular Ad Hoc Networks,”
IEEE Wireless Commu-nications Letters , vol. 9, pp. 1379–1383, Sept. 2020.[230] S. Hashima, K. Hatano, E. Takimoto, and E. Mahmoud Mohamed,“Neighbor Discovery and Selection in Millimeter Wave D2D Net-works Using Stochastic MAB,”
IEEE Communications Letters , vol. 24,pp. 1840–1844, Aug. 2020.[231] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis ofthe multiarmed bandit problem,”
Machine learning , vol. 47, no. 2-3,pp. 235–256, 2002. [232] A. Abdelreheem, O. A. Omer, H. Esmaiel, and U. S. Mohamed, “DeepLearning-Based Relay Selection In D2D Millimeter Wave Communica-tions,” in , (Sakaka, Saudi Arabia), pp. 1–5, Apr. 2019.[233] B. Mao, Y. Kawamoto, and N. Kato, “AI-Based Joint Optimization ofQoS and Security for 6G Energy Harvesting Internet of Things,” IEEEInternet of Things Journal , vol. 7, no. 8, pp. 7032–7042, 2020.[234] A. M. Zungeru, L. M. Ang, S. Prabaharan, and K. P. Seng, “RadioFrequency Energy Harvesting and Management for Wireless SensorNetworks,” in
Green mobile devices and networks: Energy optimizationand scavenging techniques , no. 13, pp. 341–368, CRC Press New York,NY, USA, 2012.[235] M. Chu, X. Liao, H. Li, and S. Cui, “Power Control in EnergyHarvesting Multiple Access System With Reinforcement Learning,”
IEEE Internet of Things Journal , vol. 6, no. 5, pp. 9175–9186, 2019.[236] N. Abuzainab, W. Saad, and B. Maham, “Robust Bayesian Learning forWireless RF Energy Harvesting Networks,” in , (Paris, France), pp. 1–8, May 2017.[237] J. C. Kwan, J. M. Chaulk, and A. O. Fapojuwo, “A CoordinatedAmbient/Dedicated Radio Frequency Energy Harvesting Scheme UsingMachine Learning,”
IEEE Sensors Journal , vol. 20, no. 22, pp. 13808–13823, 2020.[238] C. Yang, K. Chin, T. He, and Y. Liu, “On Sampling Time Maximizationin Wireless Powered Internet of Things,”
IEEE Transactions on GreenCommunications and Networking , vol. 3, no. 3, pp. 641–650, 2019.[239] X. Fan, W. D. Weber, and L. A. Barroso, “Power Provisioning for aWarehouse-Sized Computer,”
ACM SIGARCH Computer ArchitectureNews , vol. 35, no. 2, pp. 13–23, 2007.[240] V. Gupta, R. Nathuji, and K. Schwan, “An Analysis of Power Reductionin Datacenters Using Heterogeneous Chip Multiprocessors,”
ACMSIGMETRICS Performance Evaluation Review , vol. 39, no. 3, pp. 87–91, 2011.[241] B. Tian, L. Wang, Y. Ai, and A. Fei, “Reinforcement Learning BasedMatching for Computation Offloading in D2D Communications,” in , pp. 984–988, 2019.[242] L. Li, Y. Xu, J. Yin, W. Liang, X. Li, W. Chen, and Z. Han, “DeepReinforcement Learning Approaches for Content Caching in Cache-Enabled D2D Networks,”
IEEE Internet of Things Journal , vol. 7,pp. 544–557, Jan. 2020.[243] X. Wang, X. Wei, and L. Wang, “A deep learning based energy-efficient computational offloading method in Internet of vehicles,”
China Communications , vol. 16, no. 3, pp. 81–91, 2019.[244] “Simulated Annealing Algorithm.” Available at https://en.wikipedia.org/wiki/Simulated_annealing(2020/09/15).[245] J. Yan, S. Bi, and Y. J. A. Zhang, “Offloading and Resource Alloca-tion With General Task Graph in Mobile Edge Computing: A DeepReinforcement Learning Approach,”
IEEE Transactions on WirelessCommunications , vol. 19, no. 8, pp. 5404–5419, 2020.[246] D. Zhang, H. Ge, T. Zhang, Y. Cui, X. Liu, and G. Mao, “New Multi-Hop Clustering Algorithm for Vehicular Ad Hoc Networks,”
IEEETransactions on Intelligent Transportation Systems , vol. 20, no. 4,pp. 1517–1530, 2019.[247] S. Ma, S. Song, J. Zhao, L. Zhai, and F. Yang, “Joint Network Selectionand Service Placement Based on Particle Swarm Optimization forMulti-Access Edge Computing,”
IEEE Access , vol. 8, pp. 160871–160881, 2020.[248] Y. Wang, H. Ge, A. Feng, W. Li, L. Liu, and H. Jiang, “ComputationOffloading Strategy Based on Deep Reinforcement Learning in Cloud-Assisted Mobile Edge Computing,” in ,pp. 108–113, 2020.[249] N. Cheng, F. Lyu, W. Quan, C. Zhou, H. He, W. Shi, and X. Shen,“Space/Aerial-Assisted Computing Offloading for IoT Applications:A Learning-Based Approach,”
IEEE Journal on Selected Areas inCommunications , vol. 37, no. 5, pp. 1117–1129, 2019.[250] J. Xu, L. Chen, and S. Ren, “Online Learning for Offloading andAutoscaling in Energy Harvesting Mobile Edge Computing,”
IEEETransactions on Cognitive Communications and Networking , vol. 3,no. 3, pp. 361–373, 2017.[251] H. Zhang, S. Huang, C. Jiang, K. Long, V. C. M. Leung, and H. V.Poor, “Energy Efficient User Association and Power Allocation inMillimeter-Wave-Based Ultra Dense Networks With Energy HarvestingBase Stations,”
IEEE Journal on Selected Areas in Communications ,vol. 35, no. 9, pp. 1936–1947, 2017. [252] J. Holland, “Genetic Algorithms and Adaptation,” in
Adaptive Controlof Ill-Defined Systems , pp. 317–313, Boston, MA, USA: Springer,1984.[253] W. Li, J. Wang, G. Zhang, L. Li, Z. Dang, and S. Li, “A ReinforcementLearning Based Smart Cache Strategy for Cache-Aided Ultra-DenseNetwork,”
IEEE Access , vol. 7, pp. 39390–39401, 2019.[254] T. Schaul, J. Quan, L. Antonoglou, and D. Silver, “Prioritized Expe-rience Replay,” in
International Conference on Learning Representa-tions , (San Juan, Puerto Rico), May 2016.[255] S. O. Somuyiwa, A. György, and D. Gündüz, “A Reinforcement-Learning Approach to Proactive Caching in Wireless Networks,”
IEEEJournal on Selected Areas in Communications , vol. 36, pp. 1331–1344,June 2018.[256] J. Shi, L. Zhao, X. Wang, W. Zhao, A. Hawbani, and M. Huang,“A Novel Deep Q-Learning-Based Air-Assisted Vehicular CachingScheme for Safe Autonomous Driving,”
IEEE Transactions on Intelli-gent Transportation Systems , pp. 1–11, 2020.[257] J. Tang, H. Tang, X. Zhang, K. Cumanan, G. Chen, K. Wong, and J. A.Chambers, “Energy Minimization in D2D-Assisted Cache-EnabledInternet of Things: A Deep Reinforcement Learning Approach,”
IEEETransactions on Industrial Informatics , vol. 16, pp. 5412–5423, Aug.2020.[258] Z. Yu, J. Hu, G. Min, H. Lu, Z. Zhao, H. Wang, and N. Georgalas, “Fed-erated Learning Based Proactive Content Caching in Edge Computing,”in ,(Abu Dhabi, United Arab Emirates), pp. 1–6, Dec. 2018.[259] L. Cui, X. Su, Z. Ming, Z. Chen, S. Yang, Y. Zhou, and W. Xiao,“CREAT: Blockchain-assisted Compression Algorithm of FederatedLearning for Content Caching in Edge Computing,”
IEEE Internet ofThings Journal , pp. 1–1, 2020.[260] L. Lei, L. You, G. Dai, T. X. Vu, D. Yuan, and S. Chatzinotas,“A Deep Learning Approach for Optimizing Content Delivering inCache-Enabled HetNet,” in , (Bologna, Italy), pp. 449–453, Aug.2017.[261] A. Al-Hilo, M. Samir, C. Assi, S. Sharafeddine, and D. Ebrahimi,“UAV-Assisted Content Delivery in Intelligent Transportation Systems-Joint Trajectory Planning and Cache Management,”
IEEE Transactionson Intelligent Transportation Systems , pp. 1–13, 2020.[262] M. Dorigo, “Ant Colony Optimization,”
IEEE Internet of ThingsJournal , vol. 2, no. 3, p. 1461, 2007.[263] Q. Li, Y. Sun, Q. Wang, L. Meng, and Y. Zhang, “A Green DDPGReinforcement Learning-Based Framework for Content Caching,” in , (Chongqing, China), pp. 223–227, June 2020.[264] S. Guha, “Deep Deterministic Policy Gradient (DDPG): Theoryand Implementation.” Available at https://towardsdatascience.com/deep-deterministic-policy-gradient-ddpg-theory-and-implementation-747a3010e82f(2020/09/15).[265] M. Lukoševiˇcius, “A Practical Guide to Applying Echo State Net-works,” in
Neural Networks: Tricks of the Trade (G. Montavon, G. Orr,and K. Müller, eds.), pp. 659–686, Berlin, Heidelberg, Germany:Springer, 2012.[266] M. Chen, M. Mozaffari, W. Saad, C. Yin, M. Debbah, and C. S. Hong,“Caching in the Sky: Proactive Deployment of Cache-Enabled Un-manned Aerial Vehicles for Optimized Quality-of-Experience,”
IEEEJournal on Selected Areas in Communications , vol. 35, pp. 1046–1061,May 2017.[267] J. Liu, Y. Shi, Z. M. Fadlullah, and N. Kato, “Space-Air-Ground Inte-grated Network: A Survey,”
IEEE Communications Surveys Tutorials ,vol. 20, no. 4, pp. 2714–2741, 2018.[268] H. Tsuchida, Y. Kawamoto, N. Kato, K. Kaneko, S. Tani, S. Uchida,and H. Aruga, “Efficient Power Control for Satellite-Borne BatteriesUsing Q-Learning in Low-Earth-Orbit Satellite Constellations,”
IEEEWireless Communications Letters , vol. 9, no. 6, pp. 809–812, 2020.[269] N. Kato, Z. M. Fadlullah, F. Tang, B. Mao, S. Tani, A. Okamura,and J. Liu, “Optimizing Space-Air-Ground Integrated Networks byArtificial Intelligence,”
IEEE Wireless Communications , vol. 26, no. 4,pp. 140–147, 2019.[270] A. Kansal, J. Hsu, S. Zahedi, and M. B. Srivastava, “Power Manage-ment in Energy Harvesting Sensor Networks,”
ACM Trans. Embed.Comput. Syst. , vol. 6, Sep. 2007.[271] . F. Gambín and M. Rossi, “A Sharing Framework for Energy andComputing Resources in Multi-Operator Mobile Networks,”
IEEETransactions on Network and Service Management , vol. 17, no. 2,pp. 1140–1152, 2020. [272] H. Jahangir, H. Tayarani, S. Sadeghi Gougheri, M. Aliakbar Golkar,A. Ahmadian, and A. Elkamel, “Deep Learning-based ForecastingApproach in Smart Grids with Micro-Clustering and Bi-directionalLSTM Network,” IEEE Transactions on Industrial Electronics , pp. 1–1, 2020.[273] S. A. Hoseini, J. Hassan, A. Bokani, and S. S. Kanhere, “TrajectoryOptimization of Flying Energy Sources using Q-Learning to RechargeHotspot UAVs,” in
IEEE INFOCOM 2020 - IEEE Conference on Com-puter Communications Workshops (INFOCOM WKSHPS) , (Toronto,Canda), pp. 683–688, 2020.[274] T. D. Ponnimbaduge Perera, D. N. K. Jayakody, S. K. Sharma,S. Chatzinotas, and J. Li, “Simultaneous Wireless Information andPower Transfer (SWIPT): Recent Advances and Future Challenges,”
IEEE Communications Surveys Tutorials , vol. 20, no. 1, pp. 264–302,2018.[275] Y. Liang, Y. He, and J. Qiao, “Optimal Power Splitting for Simul-taneous Wireless Information and Power Transfer in Millimeter-waveNetworks,” in
IEEE INFOCOM 2020 - IEEE Conference on Com-puter Communications Workshops (INFOCOM WKSHPS) , (Toronto,Canada), pp. 1117–1122, 2020.[276] Z. Cui, F. Xue, X. Cai, Y. Cao, G. Wang, and J. Chen, “Detection of Malicious Code Variants Based on Deep Learning,”
IEEE Transactionson Industrial Informatics , vol. 14, no. 7, pp. 3187–3196, 2018.[277] Q. Chen, Z. Zheng, C. Hu, D. Wang, and F. Liu, “On-Edge Multi-Task Transfer Learning: Model and Practice With Data-Driven TaskAllocation,”
IEEE Transactions on Parallel and Distributed Systems ,vol. 31, no. 6, pp. 1357–1371, 2020.[278] T. Nishio and R. Yonetani, “Client Selection for Federated Learningwith Heterogeneous Resources in Mobile Edge,” in
ICC 2019 - 2019IEEE International Conference on Communications (ICC) , pp. 1–7,2019.[279] Y. Yamauchi, K. Musha, and H. Amano, “Implementing a LargeAplication(LSTM) on the Multi-FPGA System: Flow-in-Cloud,” in , pp. 1–3, 2019.[280] K. Yang, Y. Shi, W. Yu, and Z. Ding, “Energy-Efficient Processing andRobust Wireless Cooperative Transmission for Edge Inference,”
IEEEInternet of Things Journal , pp. 1–1, 2020.[281] A. E. Eshratifar, M. S. Abrishami, and M. Pedram, “JointDNN: AnEfficient Training and Inference Engine for Intelligent Mobile CloudComputing Services,”