AI-Empowered VNF Migration as a Cost-Loss-Effective Solution for Network Resilience
Amina Lejla Ibrahimpašić, Bin Han, and Hans D. Schotten
Division of Wireless Communications and Radio Positioning (WiCoN), Department of Electrical and Computer Engineering, Technische Universität Kaiserslautern
Abstract: With the wide deployment of Multi-Access Edge Computing (MEC) in Fifth Generation (5G) mobile networks, virtual network functions (VNFs) can be flexibly migrated between different locations, which significantly enhances network resilience against the degradation in quality of service (QoS) caused by network function outages. A careful balance has to be struck between the loss avoided by VNF migration and the operations cost it generates. Achieving this in practical scenarios with realistic user behavior calls for models of both cost and user mobility. This paper proposes a novel cost model and an AI-empowered approach for the rational migration of stateful VNFs, which minimizes the sum of operations cost and potential loss caused by outages, and is capable of dealing with complex, realistic user mobility patterns.
Index Terms: Multi-access Edge Computing, VNF migration, artificial intelligence, context awareness, network resilience
Bin Han is the corresponding author.

I. INTRODUCTION

Fifth-generation (5G) mobile networks are expected to extend the traditional consumer-based market with business-oriented applications. To fully exploit the potential of 5G, MEC will play a key role [1]. MEC is intended to distribute computing tasks and mobile services from the traditional centralized cloud server to the edge of mobile networks, which can improve the flexibility, efficiency, timeliness, and reliability of cloud services for heterogeneous use cases. As mobile networking services themselves, driven by the emerging development of software-defined networking (SDN) and network function virtualization (NFV) technologies, will also be highly cloudified in 5G and beyond, MEC can also be deployed for virtualized network functions (VNFs) to improve networking performance [2]. A typical technology of this kind, known as VNF migration [3], [4], creates temporary edge-cloud redundancy for VNFs that are detected to be in abnormal situations that may lead to failures or malfunctions, so as to enhance network resilience. Compared to other pro-resilience measures such as state management and rollback recovery, VNF migration is more effective in coping with network outages caused by degraded backhaul connections. The general idea is to temporarily migrate VNFs running in the central cloud to the edge clouds, and to update them periodically. However, creating and maintaining redundancy for all central cloud VNFs in every edge cloud would lead to a huge operational expenditure (OPEX) beyond any affordable level. It was therefore proposed in [5] to selectively migrate only the critical VNFs in outage-prone edge clouds, where the estimated loss caused by a potential outage of the central cloud VNF, as an opportunity cost, exceeds the essential operation cost of creating/updating the local edge cloud redundancy.

It should be remarked that most VNFs are stateful [6]. Therefore, besides the software implementation of the VNF itself, the state information of the VNF must also be migrated to the target cloud, so as to guarantee a seamless transfer of the service. Typical states of most stateful VNFs can be classified into control state, per-flow state, and aggregate state [3]. In addition, for some specific VNFs related to security and privacy, such as virtualized authentication, authorization and accounting (V-AAA) in 5G [7], the profiles of the served subscribers (subscriber profiles, SPs) also become essential state information. In this case, the migration cost also depends on the number of migrated SPs, and cost-optimal migration control must therefore take into account the context information of subscriber devices, such as position, mobility, and local traffic scenario. To cope with this issue, we have proposed in [8] a context-aware solution for SP synchronization. Despite its success in demonstrating the principle, that work only considered a highly simplified mobility model, i.e., a modified random walk model, which fails to characterize the mobility patterns of users in complicated real-life environments.
Towards a cost-efficient, ultra-resilient MEC, in this paper we propose an AI-based solution that makes it practical to flexibly deploy cost-optimal, context-aware VNF migration in various complex real-life scenarios. Our work makes four main contributions: i) We refine the cost and loss models of VNF migration in existing work, taking into account both the user-independent part of VNF redundancy creation and the user-dependent part of SP synchronization. ii) Based on the refined models, we derive a novel cost-loss-optimal VNF migration policy and an algorithm to solve it. iii) We propose to use a mixture density neural network (MDN) to accurately model complex real-world user mobility patterns, so as to support the prediction of subscribers' edge cloud visit probability, which is crucial for making the optimal migration decision. The proposed MDN is trained and tested with real-life GPS trajectory data of mobile subscribers. iv) Finally, we carry out numerical simulations to demonstrate the effectiveness and cost efficiency of our proposed approach with realistic user mobility behaviors.

The remainder of this paper is organized as follows: Sec. II sets up the system model, from which the cost and loss in stateful VNF migration are derived in a statistical manner. Based on the models, we propose in Sec. III a novel cost-loss-optimal migration policy, and in Sec. IV an MDN-based subscriber mobility prediction scheme. We then demonstrate our proposed approaches in Sec. V by carrying out simulations with real-world mobile user trajectory data. Finally, in Sec. VI we close the paper with conclusions and some insights into future work.

II. MEC VNF MIGRATION FOR 5G RESILIENCE
A. Overview
5G networks are highly virtualized and cloudified, making it possible to flexibly deploy software-defined networking services at various positions in the network architecture. In particular, it has become a technical trend to distribute network functions, especially latency-sensitive ones, to MEC servers at the edge of the network. This not only reduces the end-to-end (E2E) latency, but also improves reliability by relieving the risks of congestion and malfunctions at cell site gateways and backhaul connections. Nevertheless, some network functions should generally remain in the central cloud, including computationally demanding ones such as network-wide AI model training, and security-critical ones such as authentication, authorization and accounting. Concerning potential network outages that can significantly degrade the quality of service (QoS) of such centralized VNFs, it has been proposed to proactively create temporary local redundancies of the involved VNFs and subscriber profiles at MEC servers, depending on the VNF outage risk. To estimate the outage risk of a certain VNF, for a specific subscriber, in the coverage of a certain edge cloud (EC), the factors to be taken into account are the backhaul reliability and the subscriber service time. While the backhaul reliability can be predicted straightforwardly by network monitoring and analysis techniques, estimating the subscriber service time relies on advanced brokering among various context information, such as user motion patterns, the local mobility model, EC coverage, and service data traffic patterns [8]. Updating the redundancies of central cloud VNFs in edge clouds costs not only data traffic, but also the server resources needed to maintain them. Hence, there is a trade-off between network resilience and operational expenditure when deploying MEC VNF migration.
Aiming at an optimum, a reasonable policy is to migrate a central cloud VNF to the MEC server only when the expected potential loss upon its outage exceeds the cost of migrating it [5].
B. Model Setup of Stateful VNF Migration
We consider a certain area, denoted as the edge coverage (EC), that can be served by its dedicated MEC server, and a stateful VNF that by default runs in the central cloud. The MEC server keeps monitoring the QoS of the VNF and is able to synchronize the VNF, together with the involved SPs, from the central cloud to a local redundancy once every synchronization interval $T$. Redundancies of both the VNF and the SPs are erased at the end of the synchronization interval, in order to save maintenance cost and ensure security. Every migration of the VNF generates a cost of $c_{NF}$, and every synchronization of every SP generates a cost of $c_{SP}$.

We consider the reliability $r_{NF}$ of the VNF in every synchronization interval $T$ to be a discrete-time random process, which can be characterized by a Markov chain [8]. When it drops below a threshold $r_{min}$, an outage of the VNF is considered to be present in the EC. In this case, every subscriber in the EC suffers from a degraded QoS, causing the operator a loss of $l$ per involved subscriber per unit time, unless 1) the VNF has been migrated to the MEC server, and 2) the subscriber's profile has been synchronized to the MEC server.

Thus, at the end of an arbitrary synchronization interval, call the current time slot $t$, and consider the set $\mathcal{U}(t)$ of all users in the network that may visit the target EC over the next synchronization interval $[t+1, t+T]$. The expected loss caused by the potential VNF outage is

$$ L(t) = l \sum_{\tau=t+1}^{t+T} \sum_{u\in\mathcal{U}(t)} p_o(\tau)\, p_{v,u}(\tau)\, \left[1 - m(t)\, s_u(t)\right], \tag{1} $$

where $p_o(\tau) = \mathrm{Prob}[r_{NF}(\tau) < r_{min}]$ is the risk of VNF outage in time slot $\tau$, and $p_{v,u}(\tau)$ is the probability that user $u$ visits the EC in time slot $\tau$. $m(t)$ is the VNF migration label, which takes the value 1 when the VNF redundancy is created/updated for the interval $[t+1, t+T]$, and 0 otherwise. Similarly, $s_u(t) \in \{0, 1\}$ is the SP synchronization label of user $u$. Correspondingly, the total cost generated by the migration decision is

$$ C(t) = c_{NF}\, m(t) + c_{SP} \sum_{u\in\mathcal{U}(t)} s_u(t). \tag{2} $$

To achieve a rational decision on VNF migration with SP synchronization, the migration controller must predict the key probabilities $p_o(\tau)$ and $p_{v,u}(\tau)$ for all $u \in \mathcal{U}(t)$ and $\tau \in [t+1, t+T]$ based on the context information provisioned by VNF monitoring and mobility management, as illustrated in Fig. 1.

III. COST-LOSS-OPTIMAL STATEFUL VNF MIGRATION
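The combination of the loss model (1) and the cost model (2) into the closed form (3) can be checked numerically. The following minimal Python sketch (standard library only) evaluates (1) and (2) directly and compares their sum with both branches of (3) on randomly drawn probabilities. The function names, the dictionary representation of users, and all numerical values are illustrative assumptions, not part of the proposed method.

```python
import random

def outage_times(p_o, p_v):
    """T_u(t) = sum_tau p_o(tau) * p_v,u(tau) for every user u.

    p_o: list of length T (outage risk in slots t+1..t+T)
    p_v: dict user -> list of length T (EC visit probability per slot)
    """
    return {u: sum(po * pv for po, pv in zip(p_o, probs)) for u, probs in p_v.items()}

def expected_loss(l, p_o, p_v, m, s):
    """Eq. (1): l * sum_tau sum_u p_o(tau) p_v,u(tau) [1 - m s_u]."""
    return l * sum(po * pv * (1 - m * s[u])
                   for u, probs in p_v.items()
                   for po, pv in zip(p_o, probs))

def total_cost(c_nf, c_sp, m, s):
    """Eq. (2): c_NF m + c_SP sum_u s_u."""
    return c_nf * m + c_sp * sum(s.values())

def closed_form(l, c_nf, c_sp, p_o, p_v, m, s):
    """Eq. (3), expressed through T_u(t)."""
    t_u = outage_times(p_o, p_v)
    if m == 1:
        return c_nf + sum(l * t_u[u] + s[u] * (c_sp - l * t_u[u]) for u in p_v)
    return sum(l * t_u[u] + c_sp * s[u] for u in p_v)

# Random check (illustrative values) that L(t) + C(t) matches (3) for both m(t).
rng = random.Random(0)
T = 10
p_o = [rng.uniform(0.0, 0.2) for _ in range(T)]
p_v = {u: [rng.random() for _ in range(T)] for u in range(5)}
s = {u: rng.randint(0, 1) for u in p_v}
l, c_nf, c_sp = 3.0, 2.0, 0.5
for m in (0, 1):
    direct = expected_loss(l, p_o, p_v, m, s) + total_cost(c_nf, c_sp, m, s)
    assert abs(direct - closed_form(l, c_nf, c_sp, p_o, p_v, m, s)) < 1e-9
```

Note that the $m(t) = 0$ branch contains the term $l T_u(t)$ for every user, since no outage loss is avoided without migration.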
For convenience of notation, let $T_u(t) \triangleq \sum_{\tau=t+1}^{t+T} p_o(\tau)\, p_{v,u}(\tau)$, which is the VNF outage time that user $u$ is expected to experience over the next synchronization interval $[t+1, t+T]$. Combining the outage loss model (1) and the migration cost model (2), for any given VNF migration decision $m(t)$ and any given SP synchronization plan $\mathcal{U}_s(t) \triangleq \{u \in \mathcal{U}(t) : s_u(t) = 1\}$, the expected sum of cost and loss over the next interval is

$$
\begin{aligned}
S(t) &= L(t) + C(t) \\
&= c_{NF}\, m(t) + \sum_{u\in\mathcal{U}(t)} \left\{ l \sum_{\tau=t+1}^{t+T} p_o(\tau)\, p_{v,u}(\tau) \left[1 - m(t)\, s_u(t)\right] + c_{SP}\, s_u(t) \right\} \\
&= \begin{cases}
c_{NF} + \sum_{u\in\mathcal{U}(t)} \left\{ l T_u(t) + s_u(t) \left[c_{SP} - l T_u(t)\right] \right\} & \text{if } m(t) = 1,\\
\sum_{u\in\mathcal{U}(t)} \left[ l T_u(t) + c_{SP}\, s_u(t) \right] & \text{if } m(t) = 0.
\end{cases}
\end{aligned}
\tag{3}
$$

Figure 1: Concept of context-aware stateful VNF migration

In particular, in the case of $m(t) = 1$, i.e., when the VNF is migrated to the MEC server, we have the lower bound

$$ S(t \mid m(t) = 1) \geq c_{NF} + \sum_{u\in\mathcal{U}(t)} \min\left\{ l T_u(t),\, c_{SP} \right\}, \tag{4} $$

which is attained by synchronizing the SP of user $u$ exactly when $l T_u(t) \geq c_{SP}$.

Figure 2: The structure of the proposed MDN

The estimation of the VNF outage probability $p_o(\tau)$, referred to by the function EstVNFOutageProb() in Algorithm 1, can be realized by online VNF status analysis [8]. Nevertheless, the prediction of the subscribers' EC visit probability $p_{v,u}(\tau)$, referred to in Algorithm 1 by the function PredECVisitProb(), remains an open issue.
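Algorithm 1 can be sketched in Python as follows. As an assumption for illustration, EstVNFOutageProb() and PredECVisitProb() are passed in as hypothetical callables `est_outage_prob(r_nf, k)` and `pred_visit_prob(u, k)` with k = τ − t, since the former relies on VNF status analysis [8] and the latter is exactly the open issue discussed here. A brute-force search over all combinations of $m(t)$ and $\mathcal{U}_s(t)$ serves as a sanity check that comparing the two buffered bounds yields the minimum of $S(t)$.

```python
import random
from itertools import product

def expected_outage_times(users, T, r_nf, est_outage_prob, pred_visit_prob):
    """T_u(t) = sum over tau in [t+1, t+T] of p_o(tau) * p_v,u(tau)."""
    t_u = {u: 0.0 for u in users}
    for k in range(1, T + 1):                     # k = tau - t
        p_o = est_outage_prob(r_nf, k)            # stands in for EstVNFOutageProb()
        for u in users:
            t_u[u] += p_o * pred_visit_prob(u, k)  # stands in for PredECVisitProb()
    return t_u

def cost_loss_optimal_migration(users, l, c_nf, c_sp, T, r_nf,
                                est_outage_prob, pred_visit_prob):
    """Sketch of Algorithm 1; returns the migration label m(t) and U_s(t)."""
    t_u = expected_outage_times(users, T, r_nf, est_outage_prob, pred_visit_prob)
    s1, s0, sync = c_nf, 0.0, set()               # bounds for m(t) = 1 and m(t) = 0
    for u in users:
        s0 += l * t_u[u]
        if l * t_u[u] >= c_sp:                    # syncing is cheaper than the expected loss
            sync.add(u)
            s1 += c_sp
        else:
            s1 += l * t_u[u]
    if s0 >= s1:
        return 1, sync
    return 0, set()                               # without migration, SP sync is useless

def sum_cost_loss(users, l, c_nf, c_sp, t_u, m, sync):
    """S(t) = L(t) + C(t) from (1) and (2), given T_u(t)."""
    loss = l * sum(t_u[u] * (1 - m * (u in sync)) for u in users)
    return loss + c_nf * m + c_sp * len(sync)

def brute_force_minimum(users, l, c_nf, c_sp, t_u):
    """Minimum of S(t) over every m(t) and every subset U_s(t)."""
    return min(sum_cost_loss(users, l, c_nf, c_sp, t_u, m,
                             {u for u, b in zip(users, bits) if b})
               for m in (0, 1)
               for bits in product((0, 1), repeat=len(users)))

# Toy example with hypothetical estimators (not the models used in this paper).
rng = random.Random(1)
users = ["a", "b", "c", "d"]
visit = {u: rng.random() for u in users}

def toy_outage_prob(r_nf, k):
    """Hypothetical: r_nf used as a per-step risk rate, growing with look-ahead k."""
    return min(1.0, r_nf * k)

def toy_visit_prob(u, k):
    """Hypothetical: constant visit probability per user."""
    return visit[u]

m, sync = cost_loss_optimal_migration(users, 2.0, 1.0, 0.3, 5, 0.05,
                                      toy_outage_prob, toy_visit_prob)
t_u = expected_outage_times(users, 5, 0.05, toy_outage_prob, toy_visit_prob)
assert abs(sum_cost_loss(users, 2.0, 1.0, 0.3, t_u, m, sync)
           - brute_force_minimum(users, 2.0, 1.0, 0.3, t_u)) < 1e-9
```

The brute force grows as $2^{|\mathcal{U}(t)| + 1}$ and is only meant for checking small cases; the algorithm itself is linear in $|\mathcal{U}(t)| \cdot T$ once the estimators are available.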
Mainstream mobility management solutions commonly predict only the most likely future trace, whereas an accurate multi-step prediction of $p_{v,u}$ requires a spatial probability distribution of the subscriber's future position. To address this challenge, we invoke the mixture density network [9].

Algorithm 1 Cost-Loss-Optimal Stateful VNF Migration
Input: U(t), l, c_NF, c_SP, T, r_NF(t)
S_1 ← c_NF    ▷ To buffer the lower bound (4)
S_0 ← 0    ▷ To buffer the lower bound (6)
U_s(t) ← ∅
for u ∈ U(t) do
    T_u(t) ← 0
end for
for τ ∈ [t+1, t+T] do
    p_o(τ) ← EstVNFOutageProb(r_NF(t), τ − t)
    for u ∈ U(t) do
        p_v,u(τ) ← PredECVisitProb(u, τ − t)
        T_u(t) ← T_u(t) + p_o(τ) p_v,u(τ)
    end for
end for
for u ∈ U(t) do
    S_0 ← S_0 + l T_u(t)
    if l T_u(t) ≥ c_SP then
        U_s(t) ← U_s(t) ∪ {u}
        S_1 ← S_1 + c_SP
    else
        s_u(t) ← 0
        S_1 ← S_1 + l T_u(t)
    end if
end for
if S_0 ≥ S_1 then
    m(t) ← 1
else
    m(t) ← 0
    U_s(t) ← ∅
end if
Return m(t), U_s(t)

A. Mixture Density Network

The core principle of an MDN is to use the outputs of a neural network to parametrize a mixture distribution. A subset of the outputs is used to define the mixture weights, while the remaining outputs parametrize the individual mixture components [10]. The probability density of the target data is then represented as a linear combination of kernel functions in the form

$$ p(\mathbf{t} \mid \mathbf{x}) = \sum_{i=1}^{I} \alpha_i(\mathbf{x})\, \theta_i(\mathbf{t} \mid \mathbf{x}), \tag{8} $$

where $I$ is the number of components (kernels) in the mixture, and the mixing coefficient $\alpha_i(\mathbf{x})$ represents the prior probability that the target vector $\mathbf{t}$, conditioned on the input $\mathbf{x}$, has been generated by the $i$-th kernel.

For any given value of $\mathbf{x}$, the mixture model (8) provides a general model for an arbitrary conditional density function $p(\mathbf{t} \mid \mathbf{x})$. Considering the mixing coefficients $\alpha_i(\mathbf{x})$, means $\mu_i(\mathbf{x})$ and variances $\sigma_i(\mathbf{x})$ of all kernels $i \in \{1, 2, \dots, I\}$ to be general (continuous) functions of $\mathbf{x}$, a conventional feed-forward neural network can be constructed that takes $\mathbf{x}$ as input and generates $p(\mathbf{t} \mid \mathbf{x})$ as output; this is known as a mixture density network. By choosing a mixture model with a sufficient number of kernel functions and a neural network with a sufficient number of hidden units, an MDN can be trained to approximate any conditional probability distribution as closely as desired. In this work we propose a network with 3 successive fully connected hidden layers with rectified linear unit (ReLU) activation functions, whose output is fed into a final softmax output layer for sampling from a mixture of bivariate Gaussians, as illustrated in Fig. 2.

B. Real-life Mobility Modeling with MDN

1) Data Provisioning: To train the MDN towards realistic user mobility patterns, we obtained real-life user mobility data from a GPS trajectory dataset collected by Microsoft Research Asia in the Geolife project. It contains the data of 182 users over a period of more than five years. Every GPS trajectory in this dataset is represented by a sequence of time-stamped 3D coordinates, each containing the latitude, longitude and altitude. These trajectories were recorded by different GPS loggers and GPS phones, and have a variety of sampling rates [11].

2) Data Pre-processing: First of all, since we focus on the scenario of terrestrial networks, the 3D X-Y-Z coordinates were first projected onto the 2D X-Y plane and resampled to a unified sampling rate. Afterwards, as users in different mobility classes can behave significantly differently, we selectively picked only pedestrian-like trajectories by filtering out high-speed users. Furthermore, user mobility is usually time-varying and therefore exhibits low stationarity over long terms, which can critically reduce the performance of most time series analysis techniques, including the MDN training (i.e., the mobility model fitting) in our case.
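The projection, resampling and speed-filtering steps described above might look as follows for a single Geolife-style track given as (time in s, latitude, longitude, altitude) tuples. This is a minimal sketch under stated assumptions: the equirectangular projection, the linear interpolation, and the pedestrian speed limit of 2.5 m/s are hypothetical choices, since the exact procedures and thresholds are not specified here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius

def project_to_plane(track):
    """Drop altitude and map (t, lat, lon, alt) fixes to (t, x, y) in metres.

    Equirectangular approximation around the first fix (an assumption that is
    adequate for the kilometre-scale extent of pedestrian tracks).
    """
    _, lat0, lon0, _ = track[0]
    cos_lat0 = math.cos(math.radians(lat0))
    return [(t,
             EARTH_RADIUS_M * math.radians(lon - lon0) * cos_lat0,
             EARTH_RADIUS_M * math.radians(lat - lat0))
            for t, lat, lon, _alt in track]

def resample(points, period):
    """Linearly interpolate time-sorted (t, x, y) samples onto a uniform grid."""
    if len(points) < 2:
        return list(points)
    t_start, t_end = points[0][0], points[-1][0]
    out, j, k = [], 0, 0
    while t_start + k * period <= t_end:
        t = t_start + k * period
        while points[j + 1][0] < t:       # advance to the bracketing segment
            j += 1
        (ta, xa, ya), (tb, xb, yb) = points[j], points[j + 1]
        w = 0.0 if tb == ta else (t - ta) / (tb - ta)
        out.append((t, xa + w * (xb - xa), ya + w * (yb - ya)))
        k += 1
    return out

def is_pedestrian(samples, period, max_speed=2.5):
    """True if no resampled step is faster than max_speed (m/s, hypothetical)."""
    return all(math.hypot(x1 - x0, y1 - y0) / period <= max_speed
               for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:]))

# Example: a walker moving north at about 1.1 m/s, logged every 2 s.
track = [(float(i), 40.0 + 1e-5 * i, 116.3, 50.0) for i in range(0, 11, 2)]
samples = resample(project_to_plane(track), period=1.0)
print(len(samples), is_pedestrian(samples, period=1.0))  # 11 True
```

Filtering on the maximum step speed is only one possible rule; a mean-speed or percentile criterion would be equally plausible.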
To cope with this non-stationarity, long-term trajectories were segmented into short ones, so as to guarantee stationarity in both the X and Y domains.

3) Specification and Training of the MDN: We specified our MDN with 64 input nodes, taking the first-order differenced 2D coordinates over a time window of 32 samples, and 2 output nodes for a single-step forward prediction. The three fully connected hidden layers have 512, 128, and 12 neurons, respectively. We split the dataset into two parts, one with trajectories as the training set, and the other with the remaining data as the validation set. We trained the model for 15 epochs, each referring to a full traversal of the training dataset with a batch of 512 trajectories in parallel. We chose RMSprop as the optimizer, with the mean square error (MSE) selected as the loss function for fitting performance evaluation.

Figure 3: Performance of MDN-based user mobility modeling. (a) A sample prediction by the trained MDN. (b) Training and validation losses.

A 100-step segment of the prediction result is shown in Fig. 3a as a validation example, showing sufficient accuracy. Both the training and validation losses drop rapidly over the epochs, as shown in Fig. 3b, exhibiting satisfactory convergence.

V. NUMERICAL EVALUATION

A. Simulation Setup

To evaluate the effectiveness of our proposed approach, we carried out numerical simulations. For the environment, we considered an 8 km × 8 km region, where a circular EC area with a radius of 2 km in the middle is served by an MEC server. For initialization, we randomly placed 1000 users across the whole region under a uniform distribution, and each user was randomly assigned, as its mobility model, one of the fitted kernels we had obtained during the MDN training in Sec. IV-B. Then, for pre-convergence from the random initial state to a steady state, we let all users randomly move w.r.t. their models for steps, with each step interval lasting .
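In this geometry, the EC visit probability $p_{v,u}$ required by Algorithm 1 could, for example, be estimated from the MDN's mixture output (8) by Monte Carlo sampling. The sketch below is an illustration under assumptions rather than the authors' implementation: it maps raw network outputs to bivariate Gaussian parameters in the style of [10] (softmax weights, exponential standard deviations, tanh correlation), predicts only a single step ahead (a multi-step estimate would require rolling sampled positions forward), and places the EC of radius 2 km at the center (4000 m, 4000 m) of the 8 km × 8 km region with the origin at one corner. All function names are hypothetical.

```python
import math
import random

def mixture_params(raw):
    """Map raw outputs to bivariate-Gaussian mixture parameters (assumed layout).

    raw: list of I tuples (logit, mu_x, mu_y, log_sx, log_sy, rho_raw).
    Softmax gives the weights alpha_i, exp keeps the standard deviations
    positive, and tanh keeps the correlation in (-1, 1).
    """
    peak = max(r[0] for r in raw)
    exps = [math.exp(r[0] - peak) for r in raw]
    total = sum(exps)
    return [(e / total, mx, my, math.exp(lsx), math.exp(lsy), math.tanh(rr))
            for e, (_, mx, my, lsx, lsy, rr) in zip(exps, raw)]

def mixture_density(params, dx, dy):
    """Eq. (8) with bivariate Gaussian kernels, evaluated at t = (dx, dy)."""
    p = 0.0
    for a, mx, my, sx, sy, rho in params:
        zx, zy = (dx - mx) / sx, (dy - my) / sy
        q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
        p += a * math.exp(-q / 2) / (2 * math.pi * sx * sy * math.sqrt(1 - rho * rho))
    return p

def sample_step(params, rng):
    """Draw one displacement: pick a kernel by alpha_i, then sample from it."""
    _, mx, my, sx, sy, rho = rng.choices(params, weights=[p[0] for p in params])[0]
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mx + sx * z1, my + sy * (rho * z1 + math.sqrt(1 - rho * rho) * z2)

def ec_visit_prob(pos, params, center=(4000.0, 4000.0), radius=2000.0,
                  n=10_000, seed=0):
    """Monte Carlo estimate of the probability that the next position is in the EC."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        dx, dy = sample_step(params, rng)
        if math.hypot(pos[0] + dx - center[0], pos[1] + dy - center[1]) <= radius:
            hits += 1
    return hits / n

# Example with a single hypothetical kernel: zero-mean step, 1 m spread.
params = mixture_params([(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)])
print(ec_visit_prob((4000.0, 4000.0), params), ec_visit_prob((0.0, 0.0), params))  # 1.0 0.0
```

Sampling displacements rather than absolute positions matches the first-order differenced inputs used for training; the sample count n trades estimation variance against run time.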
If any user leaves the region, a new user is randomly created at the region boundary, so that the user density remains constant. Meanwhile, the VNF outage events were simulated by a Markov model inherited from [8], which takes into account multiple factors in VNF reliability, such as random network status fluctuations, rare disaster events, and the repair procedure. The migration interval was set to user motion steps.

For the VNF migration control system, we considered an MDN to be created at the end of the user location pre-convergence and trained over the next steps to learn the user mobility pattern. With the MDN sufficiently trained, the simulation continued for another period of 4000 steps, during which the MDN continuously predicted the EC visit probability $p_{v,u}$ for every user $u$ and was updated online. Meanwhile, a VNF monitoring engine was considered to observe the VNF status online and predict the outage risk $p_o$ at every step with full knowledge of the outage model.

B. Benchmarking

For benchmarking, we adopted and extended the risk thresholding approach used in [8]. Unlike [8], which only considers the profile synchronization cost $c_{UP}$, in this work we also need to take into account the VNF migration cost $c_{NF}$. Therefore, a double thresholding method was designed, where the controller chooses to migrate the VNF to the EC only when the average $p_o$ in the next synchronization interval exceeds a threshold $P_o$, and synchronizes the SP of a user only when its cumulative probability of visiting the EC in the next interval exceeds another threshold $P_v$. We tested this baseline solution with different specifications of $[P_o, P_v]$, and compared the resulting sum of loss and cost to the performance achieved by the MDN-based optimal migration scheme. The results, shown in Fig. 4, indicate that our proposed method outperforms the benchmark, even when the latter is optimally specified.

Figure 4: Result of the benchmarking

VI. CONCLUSION

In this work, we have developed a novel cost-loss model of stateful VNF migration, considering both the VNF migration cost and the subscriber profile synchronization cost, and have therewith proposed a cost-loss-optimal migration strategy. To cope with the challenge of predicting subscribers' edge cloud visit probability under realistic user mobility patterns, we have developed an artificial intelligence solution based on a mixture density network. The proposed approach has been validated by numerical simulations.

REFERENCES

[1] S. Barbarossa, S. Sardellitti, and P. Di Lorenzo, "Communicating while computing: Distributed mobile cloud computing over 5G heterogeneous networks," IEEE Signal Processing Magazine, vol. 31, no. 6, pp. 45–55, 2014.
[2] S. R. Chaudhry, A. Palade, A. Kazmi, and S. Clarke, "Improved QoS at the edge using serverless computing to deploy virtual network functions," IEEE Internet of Things Journal, vol. 7, no. 10, pp. 10673–10683, 2020.
[3] B. Han, V. Gopalakrishnan, G. Kathirvel, and A. Shaikh, "On the resiliency of virtual network functions," IEEE Communications Magazine, vol. 55, no. 7, pp. 152–157, 2017.
[4] I. Sarrigiannis, K. Ramantas, E. Kartsakli, P. Mekikis, A. Antonopoulos, and C. Verikoukis, "Online VNF lifecycle management in an MEC-enabled 5G IoT architecture," IEEE Internet of Things Journal, vol. 7, no. 5, pp. 4183–4194, 2020.
[5] B. Han, M. R. Crippa, and H. D. Schotten, "5G island for network resilience and autonomous failsafe operations," arXiv preprint arXiv:1805.01715, 2018.
[6] B. Yi, X. Wang, M. Huang, and K. Li, "Design and implementation of network-aware VNF migration mechanism," IEEE Access, vol. 8, pp. 44346–44358, 2020.
[7] S. Wong, N. Sastry, O. Holland, V. Friderikos, M. Dohler, and H. Aghvami, "Virtualized authentication, authorization and accounting (V-AAA) in 5G networks," in , 2017, pp. 175–180.
[8] B. Han, S. Wong, C. Mannweiler, M. R. Crippa, and H. D. Schotten, "Context-awareness enhances 5G multi-access edge computing reliability," IEEE Access, 2019.
[9] C. M. Bishop, "Mixture density networks," Aston University, Tech. Rep., 1994.
[10] A. Graves, "Generating sequences with recurrent neural networks," arXiv preprint arXiv:1308.0850.