Customized Slicing for 6G: Enforcing Artificial Intelligence on Resource Management

Abstract

Next generation wireless networks are expected to support diverse vertical industries and offer countless emerging use cases. To satisfy the stringent requirements of diversified services, network slicing has been developed, which enables service-oriented resource allocation by tailoring the infrastructure network into multiple logical networks. However, challenges remain in cross-domain multi-dimensional resource management for end-to-end (E2E) slices in a dynamic and uncertain environment. Trading off the revenue and cost of resource allocation while guaranteeing service quality is important for tenants. Therefore, this article introduces a hierarchical resource management framework that uses deep reinforcement learning both for admission control of resource requests from different tenants and for resource adjustment within the admitted slices of each tenant. Specifically, we first discuss the challenges of customized resource management in 6G. Second, the motivation and background are presented to explain why artificial intelligence (AI) is applied to resource customization in multi-tenant slicing. Third, E2E resource management is decomposed into two problems: multi-dimensional resource allocation decisions based on slice-level feedback, and real-time slice adaption aimed at avoiding service quality degradation. Simulation results demonstrate the effectiveness of AI-based customized slicing. Finally, several significant challenges that need to be addressed in practical implementation are investigated.


arXiv [cs.NI], February 2021

Wanqing Guan, Haijun Zhang, Senior Member, IEEE, and Victor C. M. Leung, Fellow, IEEE


Wanqing Guan and Haijun Zhang are with Beijing Engineering and Technology Research Center for Convergence Networks and Ubiquitous Services, Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing, China (E-mail: [email protected], [email protected]). Wanqing Guan is also with the State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, China. Victor C. M. Leung is with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4 Canada (E-mail: [email protected]).

February 23, 2021 DRAFT

I. INTRODUCTION

In the coming era of 6G, the proliferation of new smart terminals and the explosion of new applications in vertical industry markets, such as augmented or virtual reality (AR/VR), unmanned aerial vehicles (UAVs), fully autonomous driving and satellite-ground communications, are forcing mobile network operators (MNOs) to handle complex scenarios and deliver diverse services [1]. Meanwhile, user demands evolve continuously, making it more difficult to provide customized services and make personalized decisions in real time [2]. 6G is expected to satisfy the dynamic and differentiated demands of users through real-time micro-management of multiple resources, including communication, computing and storage resources.

The concept of network slicing was proposed in 5G to create end-to-end (E2E) slice instances according to the different requirements of various services. As a key innovation expected to be inherited by 6G, network slicing can reduce capital expenditure and operating expense (CAPEX/OPEX) by sharing network resources among multiple tenants. Tenants such as mobile virtual network operators (MVNOs), over-the-top (OTT) providers and vertical industries with limited capacity or coverage rent the physical resources of MNOs or infrastructure network providers (InPs) to provide diversified services. To further reduce CAPEX/OPEX and increase revenue opportunities, tenants are motivated to combine the available resources provided by different InPs to enhance their attractiveness and acquire more subscribers [3].

Therefore, there is a tremendous need to efficiently manage the multi-dimensional resources of multiple InPs while meeting the strict and diversified service requirements of multiple tenants in a dynamic environment. Creating customized slices for multiple tenants according to their preferences enables flexible and adaptive resource management [4].
Moreover, allowing tenants to customize the resource allocation of each slice makes it possible to adapt dynamically to changes in the network environment caused by user mobility, time-varying channel conditions and so on. However, supporting more innovative 6G services and satisfying increasingly diverse user demands impose significant challenges on customized slicing, particularly in terms of E2E slice management and multi-dimensional resource orchestration.

The first challenge is how to observe slice status in real time by capturing dynamic slice deployment and scalable resource utilization. Slice status information should be accurately obtained and quickly incorporated into resource allocation decisions. Efficient resource planning is then conducted based on the current status information of slices, including reserving resources for slices and determining the placement of virtual network functions (VNFs) for differentiated slices.

Since an E2E slice consists of a number of interconnected VNFs from the radio access network (RAN), core network and transport network, the combinatorial optimization of numerous resources is the second challenge. The differences in the profit of providing multiple resources to different tenants need to be accounted for when maximizing the long-term revenue of InPs acting as network slice providers (NSPs). Striking a balance between the resource utilization of infrastructures and the profits of differentiated service provisioning is crucial for NSPs.

Last but not least, quickly satisfying the dynamic demands of differentiated services is another challenge. Since the scale and rates of network flows keep changing, the resources allocated to slices need to be adjusted in time to cope with dynamic user demands. Additionally, the growing number and types of slices make slice adaption highly complex. Trading off the cost of reconfiguring slices against the satisfaction of stringent service quality requirements becomes harder for tenants acting as network slice customers (NSCs).

Artificial intelligence (AI) has developed rapidly during the past ten years and has solved many pain points in different industries, such as healthcare, autonomous driving and smart manufacturing. As one of the most promising AI tools, machine learning (ML) techniques have been widely applied in wireless communications [5]. By iteratively learning from the reward feedback of the environment, ML methods can reach an optimal decision more quickly than conventional model-based optimization methods.
Many studies adopt reinforcement learning (RL) based approaches to manage resources involving both the radio access part and the core network part [6]. RL incorporates farsighted system evolution into its decision-making and updates its decision strategies through feedback on previous decisions to reach optimal performance.

However, the existing RL-based resource allocation methods [6, 7] and the AI-assisted network architecture for network slicing [8] are inadequate for balancing the capability of customizing resources for multiple tenants against maximizing revenues for multiple InPs. It remains very challenging to allocate resources across multiple domains and customize resources for each tenant simultaneously. In this article, we provide an AI-based hierarchical resource management framework, leveraging AI algorithms in both the global management of multi-domain resources and the local slice adaption for multiple tenants. In addition, we introduce a customized slicing procedure for the proposed framework, realizing real-time on-demand resource provisioning and long-term revenue maximization.

The remainder of this article is organized as follows. We first discuss the characteristics of customized slicing in the scenario of multiple InPs and multiple tenants, and explain why AI-based approaches are adopted. Then, a hierarchical framework is proposed for intelligent resource management of E2E slices, supporting slice customization for each tenant based on RL methods. The procedure of observing the status of E2E slices and incorporating it into decision-making as slice-level feedback is introduced. We illustrate the effectiveness of the proposed intelligent management scheme in achieving high revenue and maintaining service quality. Finally, the challenges in the practical implementation of 6G intelligent resource management are highlighted.

II. MOTIVATIONS AND BACKGROUND

A. Network Slicing Across Multiple Infrastructures

As a fundamental attribute of 5G and beyond, network slicing has been realized with the maturity of software defined networking (SDN) and network function virtualization (NFV). As enablers of network slicing, NFV decouples software and hardware by virtualizing network functions and running them on virtual machines (VMs), while the SDN architecture provides a centralized control plane for configuring network resources. These techniques promote a service-based E2E wireless network architecture in which the VNFs of RANs and the core network are placed as VMs deployed in data centers (DCs) of cloud InPs. The diverse demands of tenants can be satisfied by flexibly managing resources and efficiently orchestrating the VNFs of slices. By involving tenants in virtual network embedding (VNE) calculation, virtual networks can be provided in a tenant-driven manner with a trade-off between cost-effectiveness and time-efficiency. Since E2E slices require multiple resources, multiple domains administrated by different cloud InPs form a federated environment to jointly provide tenants with resources.

E2E network slicing across multiple infrastructures has been discussed in the literature, and the management and orchestration (MANO) operations of slices in multiple administrative domains are also a concern [9], as are life-cycle management operations. Through flexible slicing, the heterogeneous resources of these cloud infrastructures can be utilized in a customized manner and the additional costs of the coalition can be reduced [10]. Besides, analyzing the profit of resource provisioning and monitoring the status of resource utilization are essential in dynamic real-time E2E slicing. Specifically, admission control of resource requests needs to consider the revenue of NSPs as well as the service requirements, and reallocating resources across multiple domains requires a global view of the slice deployment status. To handle massive numbers of requests for configuring and modifying E2E slices dynamically, RL methods are used in our architecture to improve the speed and accuracy of decision-making.

B. Multi-tenant Slicing

Following upcoming application trends, such as smart driving and AR/VR cloud gaming, the typical scenarios supported by 6G include further enhanced mobile broadband (FeMBB), ultra-massive machine-type communications (umMTC), extremely reliable and low-latency communications (eURLLC), long-distance and high-mobility communications (LDHMC), and extremely low-power communications (ELPC) [11]. For services in these scenarios, continuously guaranteeing extreme quality of experience (QoE) requires rapid adjustment of network parameters based on real-time monitoring of network status. To guarantee QoE and boost revenue, efficient sharing of the underlying network infrastructure has stimulated the interest of the research community [12]. With efficient network sharing schemes, multiple tenants, which may have conflicting resource requirements, obtain access to different parts of the limited resources.

As service providers, tenants rent resources to offer slice instances according to heterogeneous service requirements, which enhances the existing flexibility of resource sharing. Network slicing allows various tenants to provide better-performing and cost-efficient services by supporting customized slices [6]. Due to the uncertainty of service requirements, many model-free AI-based solutions are applied to jointly allocate multi-dimensional resources to slices [13]. Moreover, RL has become an effective method for solving the decision-making problem of network slicing in uncertain and probabilistic environments [14]. For each tenant, when traffic flow arrivals or departures degrade a slice's service quality, it is necessary to decide individually how to reallocate the available resources. Since traffic variation cannot be predicted without error, RL methods are also applied to slice adaption decisions in our architecture.

Fig. 1: An illustration of deep Q-learning (a DNN trained on samples from an experience replay memory selects actions; the environment feeds back the new state and reward).

C. Deep Reinforcement Learning

Because resource allocation in wireless networks affects the QoE of services, various resource allocation methods, including optimization-based, heuristic and game-theoretic methods, have been studied over the past decades. As wireless networks become more complex, static model-based algorithms become inapplicable in real dynamic networks because of their long decision-making time and high computing burden. Owing to its capability of learning an optimal policy quickly, RL has been preferred for decision-making in time-varying network environments and widely applied to many resource management problems, for instance, power control, spectrum management and computation resource management [5]. Q-learning, one of the most commonly adopted conventional RL algorithms, suffers from slow convergence when the state space and action space are large. Deep reinforcement learning (DRL), which integrates deep neural networks (DNNs) with RL, was brought to prominence by Google DeepMind, and the application of many advanced DRL algorithms has attracted tremendous research attention [15].

Based on the deep Q-network, DRL as shown in Fig. 1 outperforms conventional RL because experience replay increases the efficiency of learning and enhances the stability of the DNN. After action selection, reward calculation and new state observation, the experience (S_t, A_t, R_{t+1}, S_{t+1}) is stored in the replay memory, and mini-batches of experience are sampled uniformly at random to feed the neural network during learning. The DNN, which approximates the Q-value function, takes the current state as input and outputs the Q-values of all actions available in that state. Instead of storing Q-values in a Q-table as in the Q-learning algorithm, DQL approximates them with a deep network (a deep convolutional network in DeepMind's original deep Q-network), while experience replay addresses the instability caused by correlations in the data: randomizing over the stored samples breaks the strong correlations between consecutive samples and allows the data to be used more efficiently. Hence, the DNN improves the convergence of Q-learning and enables the deep Q-learning (DQL) algorithm to solve problems with a high-dimensional state-action space.

Fig. 2: The AI-based hierarchical resource management framework (the GRM performs admission control and revenue management for resource requests from tenants A and B over the RAN, transport and core networks of MNO/InP 1 and MNO/InP 2; LRM1 and LRM2 perform AI-based slice adaption in response to demand changes and return status to the GRM).

III. AI-BASED HIERARCHICAL RESOURCE MANAGEMENT FRAMEWORK

A. Hierarchical Resource Management

Planning deployment locations from a global view can effectively avoid resource competition caused by the increasing number of co-located VMs on the same server. Scheduling multi-dimensional resources in a comprehensive and balanced way can potentially increase resource efficiency and avoid resource waste and shortage. To improve the revenue of resource providers, multi-domain resources need to be managed centrally and an optimal amount of resources allocated to E2E slice instances. However, given that traffic load variation in a slice might degrade QoE, a purely centralized management approach faces performance issues and limits the autonomy of tenants. Hence, based on the MANO architecture for multi-domain slices in 5G [9], an AI-based hierarchical resource management framework, shown in Fig. 2, is proposed to integrate intelligence into customized slicing for 6G use cases in the scenario of multiple InPs and multiple tenants.

To meet dynamically evolving service quality requirements and support fine-grained network decision optimization, the proposed framework introduces a global resource manager (GRM) to handle incoming differentiated resource requests from tenants, and multiple local resource managers (LRMs) to deal with changes in the resource requirements of individual tenants. The deployment of the GRM and LRMs enables two-layer customization of slices: resources are first allocated to each tenant according to heterogeneous slice performance requirements, and then the resource allocation of each slice is optimized and adjusted according to real-time observation of demand changes. It is worth noting that the AI-based algorithms used in global resource allocation and local slice adaption can be different.

The hierarchical approach enables flexibility and scalability by distributing resource management to individual tenants. The GRM maintains overall control over the LRMs and delegates concrete operations to each LRM.
The GRM is responsible for charging slice owners and monitoring the LRMs while allocating federated resources across multiple domains. Each LRM performs slice adaption by adjusting the assigned resources to maintain service quality. Moreover, the LRMs not only give each tenant the ability to customize resources, but also transmit the status of slices to the GRM.

Fig. 3: The procedure of customized slicing with the proposed framework (the GRM in the Service Broker and LRM1/LRM2 in the Service Conductor collect the status of each slice, compare the measured service quality with targeted values, and allocate virtualized RAN, transport and core resources through AI-driven optimization).

To handle traffic dynamics quickly, the status of slices, which includes the deployment locations of VNFs and the condition of the traffic flows passing through these VNFs, is observed periodically. Monitoring slice deployment and resource utilization helps maintain service quality and enhance resource efficiency. Observing the real-time status of E2E slices provides a reference for determining whether or not to perform resource adaption. To realize real-time resource monitoring and slice topology updates in the scenario of multiple InPs and multiple tenants, both the differentiated slices provided by multiple tenants and the joint infrastructure network consisting of multiple infrastructures are depicted. Specifically, the cooperation between these infrastructure networks and the mapping between the physical servers and the VNFs deployed on them are precisely delineated.
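The division of labour between the GRM and the LRMs can be sketched as a pair of plain Python classes. This is a minimal sketch under simplifying assumptions, not the authors' implementation: resources are pooled into a single integer count of processing units, the GRM admits a request whenever enough units remain (where the article uses a DRL policy), and all class, field and method names (`SliceStatus`, `LocalResourceManager.adapt`, `GlobalResourceManager.admit`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SliceStatus:
    """Periodic status report for one slice (hypothetical fields)."""
    slice_id: str
    vnf_placement: dict[str, str]  # VNF name -> hosting DC
    waiting_flows: int             # flows queued at the slice's VNFs


@dataclass
class LocalResourceManager:
    """Per-tenant manager: divides the tenant's budget among its slices."""
    tenant: str
    budget: int = 0                                   # units granted by the GRM
    allocation: dict[str, int] = field(default_factory=dict)

    def adapt(self, slice_id: str, extra_units: int) -> bool:
        # Slice adaption: grant extra units only if the tenant's budget allows it.
        used = sum(self.allocation.values())
        if used + extra_units > self.budget:
            return False
        self.allocation[slice_id] = self.allocation.get(slice_id, 0) + extra_units
        return True


@dataclass
class GlobalResourceManager:
    """Cross-domain manager: admits tenant requests against pooled capacity."""
    capacity: int
    lrms: dict[str, LocalResourceManager] = field(default_factory=dict)
    status: dict[str, SliceStatus] = field(default_factory=dict)

    def free_units(self) -> int:
        return self.capacity - sum(lrm.budget for lrm in self.lrms.values())

    def admit(self, tenant: str, units: int) -> bool:
        # Placeholder rule (accept if capacity remains); the article learns this with DRL.
        if units > self.free_units():
            return False
        lrm = self.lrms.setdefault(tenant, LocalResourceManager(tenant))
        lrm.budget += units
        return True

    def receive_status(self, report: SliceStatus) -> None:
        # LRMs push slice status upward; the GRM keeps the latest report per slice.
        self.status[report.slice_id] = report


# Example: 5 DCs x 300 units pooled into one budget (a simplification).
grm = GlobalResourceManager(capacity=5 * 300)
grm.admit("A", 600)
grm.admit("B", 600)
grm.lrms["A"].adapt("slice-1", 500)
grm.receive_status(SliceStatus("slice-1", {"vnf-1": "dc-1"}, waiting_flows=3))
```

After the example, 300 units remain in the pool, so a further 400-unit request from tenant A would be rejected, and tenant A's LRM can grant at most 100 more units across its slices. Keeping the budget check inside the LRM mirrors the delegation described above: the GRM only decides how much each tenant gets, not how the tenant splits it.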

B. Customized Slicing Procedure

Figure 3 shows the procedure of customized slicing with the proposed AI-based management framework. After receiving real-time slice requests from multiple tenants, the GRM, deployed in the functional plane named the Service Broker, performs admission control of these requests based on an ML model. The NSPs make a trade-off between the resource requirements associated with these requests and the revenue achieved by providing the required resources. Multi-dimensional resources are allocated to tenants with the objective of maximizing the long-term revenue of NSPs. For the DRL-based resource allocation performed in the GRM, states are defined as the number of accepted requests belonging to different tenants, the action taken by each agent is to accept or reject an arriving slice request, and the reward is related to slice utility. Based on the output of the DRL algorithm in the GRM, slices are deployed and their status is recorded periodically.
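To make the GRM's formulation concrete, the sketch below learns an admission policy with tabular Q-learning, which is one of the baselines in the evaluation rather than the DQL agent itself. The state is the number of admitted requests per tenant plus the tenant of the arriving request, the action is reject (0) or accept (1), and the reward is the tenant's immediate reward when a request is accepted. Each step applies the standard update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The arrival rates (10 and 12 requests/hour), the completion rate (6 requests/hour, assumed here to apply to each admitted request) and the limit of 5 concurrent requests (300 units per DC divided by 60 units per request) follow the evaluation setup; the discount factor, learning rate, exploration rate and all function names are illustrative assumptions.

```python
import random
from collections import defaultdict

ARRIVAL = {"A": 10.0, "B": 12.0}  # request arrivals per hour (evaluation setup)
MU = 6.0                          # completions per hour per admitted request (assumed)
MAX_ACTIVE = 300 // 60            # concurrent requests a DC can host


def next_arrival(n_a, n_b, rng):
    """Simulate departures until the next arrival; return counts and arriving tenant."""
    while True:
        rates = [ARRIVAL["A"], ARRIVAL["B"], MU * n_a, MU * n_b]
        event = rng.choices(range(4), weights=rates)[0]
        if event == 0:
            return n_a, n_b, "A"
        if event == 1:
            return n_a, n_b, "B"
        if event == 2:
            n_a -= 1  # a request of tenant A completes
        else:
            n_b -= 1  # a request of tenant B completes


def feasible_actions(state):
    n_a, n_b, _ = state
    return (0, 1) if n_a + n_b < MAX_ACTIVE else (0,)


def train(reward, steps=50_000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning over arrival events; returns Q[state] = [Q(reject), Q(accept)]."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])
    n_a, n_b, tenant = next_arrival(0, 0, rng)
    for _ in range(steps):
        state = (n_a, n_b, tenant)
        actions = feasible_actions(state)
        if rng.random() < eps:
            action = rng.choice(actions)                        # explore
        else:
            action = max(actions, key=lambda a: q[state][a])    # exploit
        r = reward[tenant] if action == 1 else 0.0
        if action == 1:
            n_a, n_b = (n_a + 1, n_b) if tenant == "A" else (n_a, n_b + 1)
        n_a, n_b, tenant = next_arrival(n_a, n_b, rng)
        nxt = (n_a, n_b, tenant)
        best_next = max(q[nxt][a] for a in feasible_actions(nxt))
        q[state][action] += alpha * (r + gamma * best_next - q[state][action])
    return q


def policy(q, state):
    """Greedy action of the learned Q-table (0 = reject, 1 = accept)."""
    return max(feasible_actions(state), key=lambda a: q[state][a])


q = train({"A": 2.0, "B": 6.0})
```

After training, the policy accepts a high-reward request from tenant B when the system is empty, while whether a request from tenant A is admitted in a given state depends on the learned trade-off. Replacing the table with a DNN and sampling updates from an experience replay memory would turn this into the DQL variant used in the article.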


Depending on the perceived status, the current service quality can be measured and compared with its target quality requirements. The current service quality satisfaction reflects the gap from the target value. The target values, regarded as the desired service quality, should be defined as the level of service that the available resources of InPs can and should provide; thus they are preset and fed as input to the optimization problem of slice adaption. As slice-level feedback, the current service quality satisfaction is used to improve the performance of the ML model and to update the model with demand changes that may occur over time. When slice requests change, for example through a sudden increase in resource requirements, the ML model is used to maintain the service quality of admitted slices by micro-managing resources.

Resource adaption is motivated by the mismatch between the available resources and the varying traffic demand in a slice. This mismatch can cause two kinds of issues: either the available resources are exhausted, or part of the resources are idle. The former means unfair resource allocation, resulting in low data rates for newly accepted users, and the latter means declining revenue for the tenant. To avoid an unbalanced distribution of available resources, i.e., some resources being under-utilized while others are over-utilized, the resources allocated to each tenant should be adjusted to maximize the profit from the available resources. After receiving requests to adjust resources for multiple slices, the tenant makes decisions by weighing the cost and revenue of adjusting resources for each slice to ensure optimal resource efficiency.

The LRM deployed in the Service Conductor performs the DRL-based slice adaption. States are tied to the current service quality satisfaction, and actions denote whether slice adaption is permitted.
The reward is defined as the revenue obtained by adjusting resources minus the resource consumption cost and the operational cost. The revenue is related to the amount of money paid by service subscribers for guaranteed service quality, which depends on the slice type. The resource consumption cost represents the cost of providing more resources, such as the extra processing units required by newly arrived traffic flows. The operational cost is the cost of performing reconfiguration, which includes the cost of service interruption caused by reallocating resources and migrating VNFs among physical servers. The DQL used in this article could also be replaced by other, more advanced DQL-based algorithms to achieve better performance.
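As an illustration of the slice-adaption signals, the sketch below computes (i) a satisfaction value from the normalized number of waiting flows, which is how satisfaction is measured in the evaluation, and (ii) the reward as revenue minus resource consumption cost minus operational cost. The linear pricing per extra flow served, the fixed reconfiguration cost, the per-VNF migration cost and all names (`AdaptionCosts`, `satisfaction`, `adaption_reward`) are hypothetical choices for the example; the article does not specify these functional forms.

```python
from dataclasses import dataclass


@dataclass
class AdaptionCosts:
    """Hypothetical linear cost model for one slice type."""
    price_per_flow: float   # revenue per extra flow served (depends on slice type)
    cost_per_unit: float    # cost of each extra processing unit
    reconfig_cost: float    # fixed service-interruption cost of reconfiguring
    migration_cost: float   # cost per VNF migrated between physical servers


def satisfaction(waiting_flows: int, max_waiting: int) -> float:
    """Map the waiting-flow count to [0, 1]; 1.0 means no backlog."""
    return 1.0 - min(waiting_flows, max_waiting) / max_waiting


def adaption_reward(extra_units: int, migrated_vnfs: int, extra_flows_served: int,
                    c: AdaptionCosts) -> float:
    """Reward = revenue - resource consumption cost - operational cost."""
    revenue = c.price_per_flow * extra_flows_served
    resource_cost = c.cost_per_unit * extra_units
    reconfigured = extra_units > 0 or migrated_vnfs > 0
    # Operational cost is only paid when the slice is actually reconfigured.
    operational_cost = (c.reconfig_cost + c.migration_cost * migrated_vnfs) if reconfigured else 0.0
    return revenue - resource_cost - operational_cost


costs = AdaptionCosts(price_per_flow=3.0, cost_per_unit=1.0, reconfig_cost=2.0, migration_cost=5.0)
state = satisfaction(waiting_flows=4, max_waiting=10)   # input to the DRL agent
reward = adaption_reward(extra_units=10, migrated_vnfs=1, extra_flows_served=10, c=costs)
```

With the example costs, the satisfaction is 0.6 and the reward is 30 - 10 - 7 = 13. Charging the operational cost only when something changes captures why the agent may decline an adaption whose extra revenue does not cover the interruption and migration costs.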


IV. EVALUATION

Having introduced the key elements of the proposed framework, the next important step is to evaluate the performance of customized slicing and verify the benefits of AI-based resource management. In this section, the effectiveness of the proposed management framework is validated in terms of improving long-term revenue and guaranteeing service quality.

A. Experimental Setup

The AI-based resource management approach is implemented in Python, with the TensorFlow library used to build the ML model. For comparison, a non-intelligent resource management approach with centralized resource allocation of differentiated slices is also implemented, in which a greedy algorithm takes the place of the DQL algorithm. The greedy algorithm permits resource reallocation as long as the remaining resources are sufficient, ignoring differences in the value of reconfiguring differentiated slices. In contrast, the AI-based framework reserves more resources for slices that can bring more revenue and performs slice adaption in a cost-effective manner. To verify the performance of the proposed framework, the simulation includes two tenants that own different types of slices with different flow dynamics. We assume that the slice of type 1 owned by tenant A and the slice of type 2 owned by tenant B have the same total number of VNFs, and that the VNFs are deployed in DCs of the joint infrastructure network provided by two InPs at the beginning of the simulation.

A summary of the simulation parameters is listed in Table I. The topology of each infrastructure network is generated according to the Barabási-Albert (BA) scale-free network model, because a newly added node of a communication network tends to connect itself to nodes with large degrees. While resources in wireless networks are miscellaneous, here we confine resources to the computational resources of DCs. We assume that processing one unit of data flow requires one unit of computational capacity. There are 5 DCs in the joint infrastructure network with a 10-node topology, and each DC has a capacity of 300 processing units. The flows in each slice arrive following a Poisson process, and the service time follows an exponential distribution (the arrival and departure rates are given in Table I).
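The topology and traffic model described above can be reproduced with the Python standard library alone. The sketch below grows a Barabási-Albert graph by preferential attachment (each new node links to `m` existing nodes chosen with probability proportional to their degree) and generates a flow trace with exponential inter-arrival and service times, i.e., a Poisson arrival process. For illustration only, it assumes that a single 10-node BA graph stands in for the joint topology of the two InPs, that `m = 2`, and that the 5 DC locations are sampled uniformly; the rates in the example are placeholders rather than the values of Table I, and the function names are hypothetical.

```python
import random


def barabasi_albert_edges(n: int, m: int, rng: random.Random) -> list[tuple[int, int]]:
    """Edge list of a BA scale-free graph grown by preferential attachment."""
    edges = []
    targets = list(range(m))   # the first new node links to the m seed nodes
    repeated = []              # each node appears once per incident edge
    for source in range(m, n):
        edges.extend((source, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:  # degree-proportional sampling without repeats
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


def flow_trace(arrival_rate: float, service_rate: float, horizon: float,
               rng: random.Random) -> list[tuple[float, float]]:
    """(arrival time, service time) pairs: Poisson arrivals, exponential service."""
    t, trace = 0.0, []
    while True:
        t += rng.expovariate(arrival_rate)  # exponential inter-arrival time
        if t > horizon:
            return trace
        trace.append((t, rng.expovariate(service_rate)))


rng = random.Random(1)
edges = barabasi_albert_edges(n=10, m=2, rng=rng)
dcs = {node: 300 for node in rng.sample(range(10), 5)}   # DC node -> processing units
flows = flow_trace(arrival_rate=2.0, service_rate=1.0, horizon=60.0, rng=rng)
```

`edges` always contains (n - m) * m = 16 links for n = 10 and m = 2, since every node after the seeds adds exactly m new edges; nodes that join early accumulate degree and are therefore more likely to attract later links, which is the scale-free property motivating the BA model.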
The status of slices is recorded and stored in a database. The service quality satisfaction of slices is measured periodically, with a period of 1 s between two measurements. In this section, to calculate the current satisfaction, the number of waiting data flows at each measurement instant is recorded and normalized.

TABLE I: Simulation parameters

Items | Values
Total number of physical nodes | 10
Number of DCs | 5
Resources of each DC | 300 processing units
Number of VNFs for each slice |
Number of flows for slice of type 1 |
Flow arrival interval for slice of type 1 (sec) |
Flow service time for slice of type 1 (sec) |
Number of flows for slice of type 2 |
Flow arrival interval for slice of type 2 (sec) |
Flow service time for slice of type 2 (sec) |

B. Long-term Revenue

First, the proposed framework is able to maximize the average reward by reserving resources for requests that could bring more revenue. Based on the requests of tenants, the GRM learns optimal admission control decisions through the DRL-based algorithm. The arrival rates of resource requests from tenant A and tenant B are set at 10 requests/hour and 12 requests/hour, respectively, while the completion rate of requests is set at 6 requests/hour. The immediate reward obtained by accepting a resource request from tenant A is set at 2, while that of tenant B is varied; each resource request requires 60 processing units of each DC.

In Fig. 4(a), the long-term revenue of the DQL, Q-learning and greedy algorithms is compared. The revenue obtained by all three algorithms increases as the immediate reward of accepting requests from tenant B is varied from 1 to 6. However, the revenue obtained by the RL algorithms, i.e., DQL and Q-learning, is significantly higher than that of the greedy algorithm. The reason is that the RL algorithms reserve resources for requests that may bring high rewards, while the greedy algorithm accepts requests according to the available resources without considering the reward. Besides, because Q-learning converges slowly, the DQL algorithm achieves higher revenue than Q-learning.

Fig. 4: The performance of the proposed framework when the immediate reward is varied. (a) Long-term revenue vs. immediate reward for tenant B (Greedy, Q-Learning, DQL); (b) Proportion of accepted requests.

To further analyze the performance of the proposed framework, the proportions of accepted requests from the two tenants are calculated and shown in Fig. 4(b). It can be observed that the RL algorithms tend to accept the resource requests with higher immediate rewards. For example, when the immediate reward of accepting requests from tenant B is larger than that from tenant A, more requests from tenant B are accepted. In contrast, the greedy algorithm accepts a request whenever the available resources satisfy its demand. Hence, the composition of accepted requests is not affected by changes in the immediate reward.
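A useful sanity check on the greedy baseline in this experiment is that accepting every request that fits turns admission into a loss system. Assuming Poisson request arrivals and exponentially distributed holding times, with the completion rate of 6 requests/hour applying to each admitted request, and noting that a request needs 60 of the 300 processing units at each DC (so at most 5 requests can be active at once), the greedy policy behaves like an M/M/5/5 queue. Its blocking probability follows from the standard Erlang-B recursion, and because Poisson arrivals see time averages, both tenants are blocked with the same probability regardless of their rewards, which is consistent with the unchanged composition of accepted requests reported for the greedy algorithm. The function names below are illustrative.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Blocking probability of an M/M/c/c loss system via the Erlang-B recursion."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


LAMBDA = {"A": 10.0, "B": 12.0}  # request arrivals per hour (evaluation setup)
MU = 6.0                         # completions per hour per admitted request (assumed)
SLOTS = 300 // 60                # requests that fit in one DC at the same time


def greedy_revenue_rate(reward: dict[str, float]) -> tuple[float, float]:
    """Long-run revenue per hour and blocking probability under greedy admission."""
    block = erlang_b(SLOTS, sum(LAMBDA.values()) / MU)
    # Every tenant sees the same blocking probability, so rewards do not shape admission.
    revenue = (1.0 - block) * sum(LAMBDA[t] * reward[t] for t in LAMBDA)
    return revenue, block


revenue, block = greedy_revenue_rate({"A": 2.0, "B": 6.0})
```

With these assumptions the offered load is 22/6 ≈ 3.67 Erlangs and the blocking probability is about 0.169. The revenue here is an undiscounted hourly rate, so it is not directly comparable with the long-term revenue plotted in Fig. 4(a); it only shows why a reward-blind policy cannot shift capacity toward the more valuable tenant.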

C. Service Quality Satisfaction

Second, to substantiate that the proposed framework is able to maintain service quality, the service quality satisfaction of the intelligent and non-intelligent frameworks is compared. The satisfaction related to E2E delay is influenced by the remaining processing units allocated to the slice, which determine whether incoming data flows can be delivered in time. Once the remaining resources are insufficient to meet the service quality requirements, waiting data flows accumulate. When the allocated resources cannot satisfy the dynamic demand, the slice adaption algorithm in each LRM of the proposed framework is triggered to make decisions on reallocating processing units. With resource customization, the service quality of slices can be maintained at a higher level at the expense of signaling cost and communication overhead.

Fig. 5: Comparison between the intelligent and non-intelligent frameworks. (a) Service quality satisfaction vs. time; (b) Decision delay vs. the number of VNFs for each slice.

In Fig. 5, we plot the service quality satisfaction vs. time and the decision delay vs. the number of VNFs for each slice for the intelligent and non-intelligent resource management frameworks. As shown in Fig. 5(a), the proposed method increases the average satisfaction by nearly 20% compared with the non-intelligent framework. In Fig. 5(b), it can be seen that the non-intelligent framework has the shortest decision delay, while the proposed framework with the Q-learning algorithm requires the most computation time. In particular, more decision time is required as the number of VNFs increases, because reconfiguring slices becomes more complex. However, both DQL and double DQL, which further improves stability by avoiding overestimation, have decision delays of less than 1 second.

V. OPEN ISSUES AND FUTURE CHALLENGES

With the deployment of the GRM and LRMs, the proposed framework achieves multi-tenant-oriented intelligent resource management with the aim of optimizing the long-term revenue of the NSPs, and realizes fine-grained resource customization with the aim of maintaining the service quality of slices from different tenants. In addition, AI-enabled techniques or, more precisely, ML algorithms allow the resource management framework to adapt to changes in resource requirements and learn the optimal policy from the dynamic environment. Nevertheless, plenty of open issues need further study, and several challenges arise in practical implementation. Some of these issues and challenges are introduced below.

Real-time prediction of evolving user demands: User demands are highly dynamic and uncertain. Accordingly, efforts have been made in the literature to predict user behavior. However, correlating the evolving trend of demands with resource allocation in network slicing constitutes a challenging but interesting line of research.

Fast implementation of E2E slicing: Responding to user demands in real time is essential for providing better service quality. For this reason, E2E slices need to be instantiated rapidly and completely. Therefore, more research should be conducted to develop practical solutions that support easier experimentation in multi-InP and multi-tenant scenarios.

Adaptive adjustment of AI-based solutions: In the learning stage of an AI-based resource management framework, the relatively long convergence time of ML methods undermines their usefulness. Besides convergence, the stochastic nature of the wireless network may require ongoing parameter updates and continuous adaption of ML methods. Therefore, feasible and scalable ML algorithms need more study and analysis.

Coordinated collaboration of multiple InPs: As mentioned earlier, cooperation between different infrastructure networks offers an attractive means of providing multiple resources at low cost. However, building effective collaboration is extremely challenging, which necessitates more investigation into management interfaces and resource isolation across multiple administrative domains.

VI. CONCLUSION

This article has proposed an intelligent resource management framework that enables customized slicing in multi-InP and multi-tenant scenarios based on slice-level feedback and real-time service quality. Empowered by artificial intelligence, the framework is hierarchical: the global resource manager utilizes a machine-learning-based method coupled with service quality evaluation to make optimal resource allocation decisions, while local resource managers collect status information about slice deployment and resource utilization to support fine-grained resource adaption. The simulation results show the effectiveness of the proposed reinforcement-learning-based resource management approach in terms of achieving high revenue for network slice providers and maintaining high quality of end-to-end services.

REFERENCES

[1] K. David and H. Berndt, “6G vision and requirements: Is there any need for beyond 5G?” IEEE Veh. Technol. Mag., vol. 13, no. 3, pp. 72–80, 2018.
[2] R. Alkurd, I. Abualhaol, and H. Yanikomeroglu, “Big-data-driven and AI-based framework to enable personalization in wireless networks,” IEEE Commun. Mag., vol. 58, no. 3, pp. 18–24, 2020.
[3] K. Samdanis, X. Costa-Perez, and V. Sciancalepore, “From network sharing to multi-tenancy: The 5G network slice broker,” IEEE Commun. Mag., vol. 54, no. 7, pp. 32–39, 2016.
[4] H. Zhang, N. Liu, X. Chu, K. Long, A. Aghvami, and V. C. M. Leung, “Network slicing based 5G and future mobile networks: Mobility, resource management, and challenges,” IEEE Commun. Mag., vol. 55, no. 8, pp. 138–145, 2017.
[5] Y. Sun, M. Peng, Y. Zhou, Y. Huang, and S. Mao, “Application of machine learning in wireless networks: Key techniques and open issues,” IEEE Commun. Surv. Tutorials, vol. 21, no. 4, pp. 3072–3108, 2019.
[6] R. Li, Z. Zhao, Q. Sun, C. I, C. Yang, X. Chen, M. Zhao, and H. Zhang, “Deep reinforcement learning for resource management in network slicing,” IEEE Access, vol. 6, pp. 74429–74441, 2018.
[7] N. Van Huynh, D. Thai Hoang, D. N. Nguyen, and E. Dutkiewicz, “Optimal and fast real-time resource slicing with deep dueling neural networks,” IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1455–1470, 2019.
[8] X. Shen, J. Gao, W. Wu, K. Lyu, M. Li, W. Zhuang, X. Li, and J. Rao, “AI-assisted network-slicing based next-generation wireless networks,” IEEE Open J. Veh. Technol., vol. 1, pp. 45–66, 2020.
[9] T. Taleb, I. Afolabi, K. Samdanis, and F. Z. Yousaf, “On multi-domain network slicing orchestration architecture and federated resource control,” IEEE Network, vol. 33, no. 5, pp. 242–252, 2019.
[10] M. Vincenzi, A. Antonopoulos, E. Kartsakli, J. Vardakas, L. Alonso, and C. Verikoukis, “Multi-tenant slicing for spectrum management on the road to 5G,” IEEE Wireless Commun., vol. 24, no. 5, pp. 118–125, 2017.
[11] Z. Zhang, Y. Xiao, Z. Ma, M. Xiao, Z. Ding, X. Lei, G. K. Karagiannidis, and P. Fan, “6G wireless networks: Vision, requirements, architecture, and key technologies,” IEEE Veh. Technol. Mag., vol. 14, no. 3, pp. 28–41, 2019.
[12] A. Antonopoulos, “Bankruptcy problem in network sharing: Fundamentals, applications and challenges,” IEEE Wireless Commun., vol. 27, no. 4, pp. 81–87, 2020.
[13] X. Chen, Z. Zhao, C. Wu, M. Bennis, H. Liu, Y. Ji, and H. Zhang, “Multi-tenant cross-slice resource orchestration: A deep reinforcement learning approach,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2377–2392, 2019.
[14] Y. Abiko, T. Saito, D. Ikeda, K. Ohta, T. Mizuno, and H. Mineno, “Flexible resource block allocation to multiple slices for radio access network slicing using deep reinforcement learning,” IEEE Access, vol. 8, pp. 68183–68198, 2020.
[15] Y. Hua, R. Li, Z. Zhao, X. Chen, and H. Zhang, “GAN-powered deep distributional reinforcement learning for resource management in network slicing,” IEEE J. Sel. Areas Commun., vol. 38, no. 2, pp. 334–349, 2020.