A Dynamic Reliability-Aware Service Placement for Network Function Virtualization (NFV)
Mohammad Karimzadeh Farshbafan, Vahid Shah-Mansouri, Dusit Niyato
Abstract
Network softwarization is one of the major paradigm shifts in the next generation of networks. It enables programmable and flexible management and deployment of the network. Network function virtualization (NFV), an example of network softwarization, refers to the deployment of software functions running on commodity servers instead of traditional hardware-based middleboxes. In NFV, a service is defined as a chain of software functions named a service function chain (SFC). The process of allocating the resources of servers to the services, called service placement, is the most challenging task in NFV. The dynamic nature of service arrivals and departures, as well as the need to meet the service level agreement, makes the service placement problem even more challenging. In this paper, we propose a model for dynamic reliability-aware service placement based on the simultaneous allocation of the main and backup servers. Then, we formulate the dynamic reliability-aware service placement as an infinite horizon Markov decision process (MDP), which aims to minimize the placement cost and maximize the number of admitted services. In the proposed MDP, the number of active services in the network is considered to be the state of the system, and the state of the idle resources is estimated based on it. Also, the number of possible admitted services is considered as the action of the presented MDP. To evaluate each possible action in the proposed MDP, we use a sub-optimal method based on the Viterbi algorithm named Viterbi-based Reliable Static Service Placement (VRSSP) algorithm. We determine the optimal policy based on the value iteration method using an algorithm named VRSSP-based Value Iteration (VVI) algorithm. Finally, extensive simulations show the superiority of the proposed model for dynamic reliability-aware service placement compared to static solutions.
Mohammad Karimzadeh Farshbafan and Vahid Shah-Mansouri are with the School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran (e-mail: {m.karimzadeh68, vmansouri}@ut.ac.ir). Dusit Niyato is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore (e-mail: [email protected]).
Index Terms
Network function virtualization (NFV), dynamic reliability-aware service placement, Markov decision process (MDP), Viterbi algorithm, value iteration algorithm.
I. INTRODUCTION
Network softwarization technologies are envisioned to be a major contribution to 5G networks [1]. In this way, network function virtualization (NFV) avoids the necessity of dedicated middleboxes and facilitates agile service provisioning [2]. In the NFV paradigm, the hardware middleboxes are replaced by software-based virtual network functions (VNFs) which run on commodity servers. Examples of these VNFs are deep packet inspection (DPI), firewall, and cellular packet core functions [3]. In NFV, a service is created by instantiating multiple connected VNFs, called a service function chain (SFC). The task of assigning SFCs and their VNFs to appropriate servers is referred to as service placement [4]. The main components of an NFV-based network are:
• Infrastructure network provider (InP) is the owner of the network function virtualization infrastructure (NFVI), which includes commodity servers for performing the VNFs and the links between the servers for routing the traffic of back-to-back VNFs in the SFCs.
• Services are requested by the users of the network. Each service has a specific service level agreement (SLA) (e.g., the reliability of the service or the end-to-end delay). The SLA of each service is determined according to the type of the service.
• Network Operator (NO) is responsible for providing the services of the users. For this purpose, the NO first composes an appropriate SFC for each service and then uses the infrastructure of the InP for performing service placement [4].
The most important challenge of a successful NFV deployment is managing the resources of the InP in a way that maximizes the number of admitted services whose SLAs are met. Most of the previous studies focused on performing service placement without considering the SLA of the service. Recently, SLA-aware service placement has received attention for NFV-based networks.
The reliability of the services is one of the most critical requirements in the next generation of telecommunication networks, especially 5G and beyond. The three service types of 5G, named enhanced mobile broadband (eMBB), ultra-reliable and low-latency communication (URLLC), and machine-type communications (MTC), have specific reliability requirements [5]. As a consequence, reliability-aware service placement for an NFV-enabled NO needs to be extensively studied.
The reliability of a service in NFV depends on the reliability of the commodity servers running the VNFs of the service. In fact, a service is in a perfect running state if all VNFs of the service are running without failure. On the other hand, the servers of InPs can have different reliability levels, which makes reliability-aware service placement more complicated. One common solution for meeting the reliability requirement is using hot backups, that is, allocating additional servers to some of the constituent VNFs of the service's SFC. The first server is called the main server, and the next servers are called the backup servers. There are two general approaches for performing backup allocation. In the first approach, backup server allocation is done after performing the main server assignment. In the second approach, the main and backup server allocations are done simultaneously. Most of the proposed methods for reliability-aware service placement are based on the first approach, which is not the optimal solution.
Generally, the most overlooked aspect of reliability-aware service placement is the dynamic nature of service arrival and departure. More precisely, most of the previous studies have not considered the dynamic characteristic of the services, which can dramatically affect the performance of the proposed methods. To the best of our knowledge, there is no comprehensive work on dynamic reliability-aware service placement where the main and backup server allocation is carried out simultaneously. The main contributions of this paper can be summarized as follows:
• We consider a scenario in which an NO aims to provide different service types using NFV for the users. Each service type is characterized by its arrival rate, departure rate, reliability requirements, and SFC specifications.
• We formulate an optimization problem for dynamic reliability-aware service placement, which considers the main and backup server allocation in one step with the objective of minimizing the placement cost and maximizing the number of admitted services meeting their reliability requirements.
• Then, we formulate the dynamic reliability-aware service placement as an infinite horizon Markov decision process (MDP) problem to minimize the placement cost and maximize the number of admitted services. We define the state set of the MDP based on the number of active services and the number of incoming services of each type, and the action set using the number of possible admitted services of each type.
• We propose a method to estimate the idle resources of the InPs according to the number of active services in the process of finding the optimal policy of the MDP. For evaluating each possible action, we use a sub-optimal method based on the Viterbi algorithm named Viterbi-based Reliable Static Service Placement (VRSSP) algorithm.
• We adopt the value iteration algorithm named VRSSP-based Value Iteration (VVI) to find the optimal policy of the proposed MDP based on VRSSP. During VVI, we determine the best possible arrangement of the admitted services for placement.
• Finally, we evaluate the performance of the proposed MDP model for dynamic reliability-aware service placement. We compare the performance of the MDP model with the baseline static methods for reliability-aware service placement.
The rest of the paper is organized as follows. The existing methods for reliability-aware service placement in NFV are reviewed in Section II. We introduce the dynamic reliability-aware service placement problem in Section III. Then, we propose an MDP model for dynamic reliability-aware service placement in Section IV. We present the algorithm for service placement, named the VRSSP algorithm, in Section V. We then present the VVI algorithm for finding the optimal policy of the introduced MDP in Section VI. Finally, we numerically evaluate the proposed scenario for dynamic reliability-aware service placement in Section VII.
II. RELATED WORKS
In this section, we first review the static service placement problem and then introduce the research conducted on dynamic service placement.
In [6]–[8], the static service placement problem is investigated for minimizing the placement cost by considering different cost components including server cost, cost of traffic routing, a penalty for resource fragmentation, and delay cost. In [9]–[14], static reliability-aware service placement is considered. In [9]–[11], the backup server allocation is done separately from the main server placement. In [9], [10], the VNF selection for backup is performed independently of backup placement. In [11], an iterative cost-effective redundancy algorithm named CERA is proposed in which VNF selection for backup and backup placement are combined. The authors in [12] first map each service with the estimated number of backups, and then provide the required reliability by adding more backups. The authors in [13] investigate iterative backup selection with a routing procedure and endeavor to maximize link utilization while providing the required reliability and delay. In [14], an algorithm named ensure reliability cost-saving (ER-CA) for reducing the cost of placement is presented. In [15] and [16], virtual backup allocation to recover failing middleboxes is considered. In [15], the idea of a shared backup, in which each backup server is a backup for multiple middleboxes, is introduced. In [16], a novel graph-based presentation for backup server allocation is proposed. In [17], the idea of pipeline sharing for decreasing the number of required cores in NFV is investigated in a way that the average delay is minimized.
In [18]–[23], service placement considering the dynamic characteristic of the services in an NFV-enabled NO is investigated. In [18], [19], a distributed approach for dynamic service placement is proposed. In [20], an online algorithm that dynamically adjusts the number of virtual network function instances (VNFIs) is introduced. 
In [21], dynamic service placement for an NO with mobile edge computing (MEC) capability is considered. In [22], the joint dynamic service placement and scheduling problem is modeled as a mixed-integer linear program (MILP) that provides guaranteed quality of service (QoS). An online algorithm based on the regularization approach is proposed in [23]. In [24]–[26], a dynamic migration-based service placement model that accounts for the negative effect of migration on the QoS is considered. In [27], a deep reinforcement learning (Deep-RL) method is proposed for dynamic reliability-aware service placement using only the main server. To the best of our knowledge, there is no comprehensive work on dynamic reliability-aware service placement with the simultaneous allocation of main and backup servers for an NFV-enabled NO.
III. SYSTEM MODEL
In this section, we introduce the dynamic reliability-aware service placement problem. We consider a scenario in which an NO needs to deliver services using NFV. However, the NO would use the resources of existing InPs for the placement of the incoming services. There are multiple InPs with commodity servers providing service to NOs. Each InP has several servers with different amounts of resources and a certain level of reliability.
A. Infrastructure Network Providers (InPs)
Let $I$ denote the set of existing InPs whose servers the NO can use. We model the entire network of InPs as a uni-directed graph $G = (G_s, G_b)$, where $G_s$ is the set of servers and $G_b$ is the set of the links between the servers, which can be written as
$$ G_s = \left\{ G_i^s \mid s \in \{1, 2, \dots, |S_i|\},\ i \in \{1, 2, \dots, |I|\} \right\}, \tag{1} $$
$$ G_b = \left\{ G_{i_1,i_2}^{s_1,s_2} \mid s_1 \in \{1, 2, \dots, |S_{i_1}|\},\ s_2 \in \{1, 2, \dots, |S_{i_2}|\},\ i_1, i_2 \in \{1, 2, \dots, |I|\} \right\}, \tag{2} $$
where $G_i^s$ is the $s$-th server of the $i$-th InP and $G_{i_1,i_2}^{s_1,s_2}$ indicates the link between the $s_1$-th server of the $i_1$-th InP and the $s_2$-th server of the $i_2$-th InP. Each server has $|R|$ resource types, where the amount of the $j$-th resource type on the $s$-th server of the $i$-th InP is denoted by $R_{i,j}^s$. Examples of such resource types are CPU, RAM, and storage. The bandwidth capacity and the unit cost of using link $G_{i_1,i_2}^{s_1,s_2}$ are denoted by $B_{i_1,i_2}^{s_1,s_2}$ and $C_{i_1,i_2}^{s_1,s_2}$, respectively. The unit cost of using the $j$-th resource type of the $i$-th InP's servers is denoted by $C_{i,j}$. Let $v_i$ indicate the failure probability of the servers of the $i$-th InP. We assume that decreasing the failure probability marginally close to zero exponentially increases the cost of servers. As a result, the value of $C_{i,j}$ can be written as
$$ C_{i,j} = \alpha_j e^{\beta (v_{\mathrm{Base}} - v_i)}, \quad i = 1, \dots, |I|, \tag{3} $$
where $\alpha_j$ ( ≤ $\alpha_j$ ≤ ) is a coefficient which indicates the importance of the $j$-th resource type for the servers of the $i$-th InP, $\beta$ is a design parameter, and $v_{\mathrm{Base}}$ is the highest acceptable failure probability. The failure probability of the servers of each InP should be lower than this threshold. Decreasing the failure probability of a server requires more expensive hardware, which increases the server cost. The downtime of the server is one of the best metrics for determining the server cost. For example, for the servers with failure probabilities of { . , . , . } , the downtimes are { . , . , . } days per year. Assume that the cost of the first server is , we can consider the costs of the second and third servers to be . and according to their lower downtimes. These values of servers' cost can be obtained using the introduced exponential model. With this model, highly reliable servers become expensive, which leads to efficient usage of the InPs' resources by the NO.
B. Characteristics of Service Requests
For service arrival and departure, we assume that time is divided into equal slots. The concept of slotted time is introduced to define the time evolution of the system. The length of the slot is an input of the problem and can be set to any value. For example, it can be selected based on the minimum time between two consecutive arrivals and the minimum time between two consecutive departures. To avoid performing service placement for each arriving service, we consider the service placement problem at the beginning of the $n$-th slot for the services arriving during the $(n-1)$-th slot. Therefore, service placement is performed jointly for the services arriving during a slot, which can lead to a better result, especially in terms of admission ratio, compared to performing service placement for each incoming service separately. In 5G networks, each service has a specific service type. Examples of such types are eMBB, URLLC, and MTC. In our model, each incoming service has a specific type with certain characteristics. Let $\Upsilon_n^k \in \Upsilon$ for $k = 1, \dots, K_n$ indicate the service type of the $k$-th incoming service of the $n$-th slot, where $K_n$ is the number of incoming services in the $n$-th slot. $\Upsilon$ is the set of service types defined as $\Upsilon = \{\Upsilon^1, \Upsilon^2, \dots, \Upsilon^L\}$, where $L$ is the total number of service types and $\Upsilon^l$ denotes the characteristics of the $l$-th service type, defined as
$$ \Upsilon^l = \left\{ F^l, d^l, b^l, U^l, f^l, \lambda_{\max}^l, r_{u,j}^l, t_u^l \right\}, \quad 1 \le l \le L,\ 1 \le j \le |R|,\ 1 \le t_u^l \le |T|, \tag{4} $$
where $F^l$ is the maximum tolerable failure probability, $b^l$ is the required bandwidth, $U^l$ is the number of VNFs in the SFC of the $l$-th service type, and $r_{u,j}^l$ is the amount of the $j$-th resource type required for the $u$-th VNF in the SFC of the $l$-th service type. $t_u^l$ indicates the VNF type of the $u$-th VNF and $|T|$ is the total number of VNF types. Also, $d^l$ indicates the departure probability of the $l$-th service type at the end of each slot.
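The service-type tuple in (4) maps naturally onto a small record type. The following is only an illustrative sketch: the class name, the field names, and the convention that the arrival distribution is stored as a list indexed from zero arrivals are assumptions made here, not part of the model description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceType:
    """Illustrative container for the tuple in (4); all names are hypothetical."""
    max_failure: float            # F^l, maximum tolerable failure probability
    departure_prob: float         # d^l, per-slot departure probability
    bandwidth: float              # b^l, required bandwidth
    arrival_pmf: List[float]      # f^l(m), assumed indexed m = 0..lambda^l_max
    resources: List[List[float]]  # r^l_{u,j}: one row per VNF, one column per resource type
    vnf_types: List[int]          # t^l_u, VNF type of each VNF in the SFC

    def __post_init__(self) -> None:
        if not 0.0 <= self.max_failure <= 1.0:
            raise ValueError("max_failure must be a probability")
        if not 0.0 <= self.departure_prob <= 1.0:
            raise ValueError("departure_prob must be a probability")
        if len(self.resources) != len(self.vnf_types):
            raise ValueError("one resource row is needed per VNF")
        if abs(sum(self.arrival_pmf) - 1.0) > 1e-9:
            raise ValueError("arrival_pmf must sum to 1")

    @property
    def num_vnfs(self) -> int:
        """U^l, the number of VNFs in the SFC."""
        return len(self.vnf_types)

    @property
    def max_arrivals(self) -> int:
        """lambda^l_max, the largest number of arrivals per slot."""
        return len(self.arrival_pmf) - 1
```

Deriving $U^l$ and $\lambda_{\max}^l$ from the stored lists, rather than storing them separately, is a design choice of this sketch that keeps the record internally consistent.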
Each admitted service will remain in the system for a random number of slots and leaves the system at the end of each slot with a probability of $d^l$. The departure probability of each active service at the end of each slot is constant and independent of the number of slots for which the service has been active. Therefore, the service duration and the number of active services at the beginning of each slot will be memoryless. As a consequence, the number of active services at the beginning of the $n$-th slot only depends on the number of active services at the beginning of the $(n-1)$-th slot and the number of admitted services at the beginning of the $n$-th slot. The value of $d^l$ is an input of the problem and can be set to any value.
Finally, $f^l$ is the probability distribution function (PDF) of the number of incoming services with type $l$ in each slot, defined as $f^l(m) = \Pr\{\lambda_n^l = m\}$ for $0 \le \lambda_n^l \le \lambda_{\max}^l$, where $\lambda_n^l$ is the number of incoming services with type $l$ in the $n$-th slot and $\lambda_{\max}^l$ is the maximum number of incoming services of the $l$-th service type. The total number of incoming services in the $n$-th slot can be written as $K_n = \sum_{l=1}^{L} \lambda_n^l$. The number of incoming services in each slot is independent of the incoming services in the previous slots and of the number of active services. It is worth noting that a popular scenario for modeling the Internet traffic dynamics of the users in cellular networks is cycle-stationary traffic [24], [28]. In this model, it is assumed that the traffic volume changes periodically among $N$ intervals. For example, in [24], a cycle-stationary traffic scenario with $N = 24$, which is a typical value for daily traffic, is used. However, the model considered for traffic arrival and departure in our paper is more general than the mentioned cycle-stationary traffic scenario. It should be mentioned that under the cycle-stationary traffic scenario, the Markov characteristic of the number of active services is preserved. Therefore, the MDP approach can also be applied to the cycle-stationary traffic scenario.
C. Service Placement Cost
Service placement in NFV has two major components that contribute to the placement cost, namely the cost of using the servers and the cost of traffic forwarding between servers. However, other cost sources, including a deployment cost for the different types of VNFs and a penalty for violating the SLA of the incoming services, can also be included. Here, we consider three cost components. For this purpose, assume that $x_{n,u,i}^{l,k,s} \in \{0, 1\}$ indicates the binary decision variable for placing the $u$-th VNF of the $k$-th incoming service of the $l$-th type in the $n$-th slot in $G_i^s$, where $1 \le l \le L$, $1 \le u \le U^l$, $1 \le k \le \lambda_n^l$, $1 \le s \le |S_i|$, $1 \le i \le |I|$, and $n \ge 1$. Also, $y_{n,u'}^{l,k}(s_1, s_2, i_1, i_2) \in \{0, 1\}$ is the binary decision variable for forwarding the traffic between the $u'$-th and $(u'+1)$-th VNFs of this service using $G_{i_1,i_2}^{s_1,s_2}$. Now, the components of the placement cost for the $k$-th incoming service of the $l$-th type in the $n$-th slot can be written as follows:
1) Server Cost: Let $\xi_{n,s}^{l,k}$ indicate the server cost for the $k$-th incoming service of the $l$-th type in the $n$-th slot. We assume that the cost of using a server is proportional to the amount of resources being used. As a result, the server cost for a service can be expressed as
$$ \xi_{n,s}^{l,k} = \sum_{u=1}^{U^l} \sum_{i=1}^{|I|} \sum_{s=1}^{|S_i|} \sum_{j=1}^{|R|} x_{n,u,i}^{l,k,s} \times r_{u,j}^l \times C_{i,j}. \tag{5} $$
2) Traffic Forwarding Cost: Let $\xi_{n,b}^{l,k}$ indicate the traffic forwarding cost for the $k$-th incoming service of the $l$-th type in the $n$-th slot, which can be expressed as
$$ \xi_{n,b}^{l,k} = \sum_{u'=1}^{U^l-1} \sum_{i_1=1}^{|I|} \sum_{s_1=1}^{|S_{i_1}|} \sum_{i_2=1}^{|I|} \sum_{s_2=1}^{|S_{i_2}|} y_{n,u'}^{l,k}(s_1, s_2, i_1, i_2) \times b^l \times C_{i_1,i_2}^{s_1,s_2}. \tag{6} $$
3) VNF Deployment Cost: We denote the VNF deployment cost for the $k$-th incoming service of the $l$-th type in the $n$-th slot by $\xi_{n,d}^{l,k}$, which depends on the number of VNFs in the service's SFC and the types of the VNFs. This cost can be expressed as
$$ \xi_{n,d}^{l,k} = \sum_{u=1}^{U^l} \sum_{i=1}^{|I|} \sum_{s=1}^{|S_i|} x_{n,u,i}^{l,k,s} \times DC_{i,t_u^l}, \tag{7} $$
where $DC_{i,t_u^l}$ indicates the deployment cost of the $t_u^l$-th VNF type in the servers of the $i$-th InP.
The placement cost for the $k$-th service of the $l$-th type in the $n$-th slot can be computed as
$$ \xi_{n,p}^{l,k} = \xi_{n,s}^{l,k} + \xi_{n,b}^{l,k} + \xi_{n,d}^{l,k}. \tag{8} $$
According to the definition of the placement cost, in most scenarios the placement cost of a service increases with the number of VNFs in the service's SFC. Therefore, admitting services with a low number of VNFs is more profitable for the NO, which is not an appropriate service admission policy. To prevent this shortcoming, we consider different values for the reward of admitting different service types, $q^l$, which is introduced in Section IV-D. Intuitively, the reward of admitting a service should increase with the number of VNFs in the service's SFC. With these values of the admission reward, the total number of admitted services decreases, but admission is balanced across services with different numbers of VNFs.
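The cost components in (5)–(8), together with the unit-cost model in (3), can be illustrated with a short numerical sketch. The function names, argument names, and data layout below are assumptions made for illustration only: a placement is given as, for each VNF, the list of (InP, server) pairs hosting it (the entries where $x = 1$), and, for each pair of consecutive VNFs, the list of links carrying its traffic (the entries where $y = 1$).

```python
import math


def resource_unit_cost(alpha_j, beta, v_base, v_i):
    """Unit cost C_{i,j} from (3): alpha_j * exp(beta * (v_base - v_i))."""
    return alpha_j * math.exp(beta * (v_base - v_i))


def placement_cost(hosts, links, demand, bandwidth, vnf_types,
                   unit_cost, link_cost, deploy_cost):
    """Placement cost (8) of one service as the sum of (5), (6) and (7).

    hosts[u]       : list of (i, s) pairs hosting VNF u (main and any backup)
    links[u]       : list of ((i1, s1), (i2, s2)) links used between VNF u and u+1
    demand[u][j]   : r_{u,j}, demand of VNF u for resource type j
    bandwidth      : b, required bandwidth of the service
    vnf_types[u]   : t_u, type of VNF u
    unit_cost[i][j]: C_{i,j}
    link_cost[link]: unit cost of the link
    deploy_cost[i][t]: DC_{i,t}
    """
    # Server cost (5): resources used on every hosting server, priced per InP.
    server = sum(demand[u][j] * unit_cost[i][j]
                 for u, servers in enumerate(hosts)
                 for (i, _s) in servers
                 for j in range(len(demand[u])))
    # Traffic forwarding cost (6): bandwidth times unit cost of each used link.
    forwarding = sum(bandwidth * link_cost[link]
                     for pair_links in links
                     for link in pair_links)
    # Deployment cost (7): one deployment per hosted VNF instance.
    deployment = sum(deploy_cost[i][vnf_types[u]]
                     for u, servers in enumerate(hosts)
                     for (i, _s) in servers)
    return server + forwarding + deployment
```

Note that a VNF with a backup contributes to the server and deployment costs once per hosting server, which follows directly from summing over all $x_{n,u,i}^{l,k,s} = 1$ in (5) and (7).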
D. Placement Constraints
We consider five constraints for the service placement. The first one is introduced to guarantee the allocation of the main server to each VNF while allowing the allocation of a backup server to the respective VNF. The second and third constraints are used to prevent the violation of each server's resources and the bandwidth of each link, respectively. The fourth constraint is introduced to guarantee the allocation of an appropriate link for forwarding the traffic between consecutive VNFs, considering the servers allocated to the VNFs. The last constraint is introduced to guarantee that the reliability requirement of each service is met. To this end, the reliability of each service is computed as a function of the binary decision variable $x_{n,u,i}^{l,k,s}$.
First, we introduce the constraint on the main and backup server allocation to each VNF, which is indicated by $H_p$. This constraint ensures that each VNF of the incoming services is placed in one server as the main server. It also provides the possibility of using a backup server to meet the reliability requirement. This constraint can be written as
$$ H_p: \quad 1 \le \sum_{i=1}^{|I|} \sum_{s=1}^{|S_i|} x_{n,u,i}^{l,k,s} \le 2, \quad 1 \le u \le U^l,\ 1 \le k \le \lambda_n^l,\ 1 \le l \le L. \tag{9} $$
According to this constraint, if a VNF has a backup server in addition to the main server, the main and backup servers are placed in different physical servers. Therefore, the main and backup servers of a VNF are physically separated.
Now, we introduce the constraint on the resources of the servers in each slot, which is indicated by $H_g$. Let $\omega_{n,i,j}^s$ indicate the amount of idle resources of the $j$-th resource type in the $s$-th server of the $i$-th InP at the beginning of the $n$-th slot $\left(0 \le \omega_{n,i,j}^s \le R_{i,j}^s\right)$. For the placement of the incoming services in this slot, the NO considers the constraints on the resources of the servers as
$$ H_g: \quad \sum_{l=1}^{L} \sum_{k=1}^{\lambda_n^l} \sum_{u=1}^{U^l} x_{n,u,i}^{l,k,s} \times r_{u,j}^l \le \omega_{n,i,j}^s, \quad 1 \le j \le |R|,\ 1 \le s \le |S_i|,\ 1 \le i \le |I|. \tag{10} $$
The third constraint is defined for the bandwidth limitation of the links between the servers, which is indicated by $H_b$. Assume that $\omega_{n,i_1,i_2}^{s_1,s_2}$ indicates the amount of remaining bandwidth in the link between the $s_1$-th server of the $i_1$-th InP and the $s_2$-th server of the $i_2$-th InP at the beginning of the $n$-th slot $\left(0 \le \omega_{n,i_1,i_2}^{s_1,s_2} \le B_{i_1,i_2}^{s_1,s_2}\right)$. For the placement of the incoming services in each slot, the NO considers the constraints on the bandwidth of the links as
$$ H_b: \quad \sum_{l=1}^{L} \sum_{k=1}^{\lambda_n^l} \sum_{u'=1}^{U^l-1} y_{n,u'}^{l,k}(s_1, s_2, i_1, i_2) \times b^l \le \omega_{n,i_1,i_2}^{s_1,s_2}, \quad 1 \le s_1 \le |S_{i_1}|,\ 1 \le s_2 \le |S_{i_2}|,\ 1 \le i_1, i_2 \le |I|. \tag{11} $$
The fourth constraint is defined for the traffic forwarding of each service and is indicated by $H_f$. For the placement of the incoming services in each slot, the NO considers the following constraints:
$$ H_f: \quad y_{n,u'}^{l,k}(s_1, s_2, i_1, i_2) \le x_{n,u',i_1}^{l,k,s_1}, \quad y_{n,u'}^{l,k}(s_1, s_2, i_1, i_2) \le x_{n,u'+1,i_2}^{l,k,s_2}, \quad y_{n,u'}^{l,k}(s_1, s_2, i_1, i_2) \ge x_{n,u',i_1}^{l,k,s_1} + x_{n,u'+1,i_2}^{l,k,s_2} - 1, \tag{12} $$
for $1 \le l \le L$, $1 \le k \le \lambda_n^l$, $1 \le u' \le U^l - 1$, $1 \le s_1 \le |S_{i_1}|$, $1 \le s_2 \le |S_{i_2}|$, and $1 \le i_1, i_2 \le |I|$. This constraint implies that if the $u'$-th and $(u'+1)$-th VNFs of a service are placed in servers $G_{i_1}^{s_1}$ and $G_{i_2}^{s_2}$, respectively, the traffic between these VNFs should be routed using link $G_{i_1,i_2}^{s_1,s_2}$.
The final constraint is defined for the reliability requirement of the incoming services. We denote the failure probability of the $k$-th incoming service of the $l$-th type in the $n$-th slot by $f_n^{l,k}$. To obtain $f_n^{l,k}$, we calculate the probability of being in the running state (i.e., not having failed), which is indicated by $p_n^{l,k}$. A service is in the running state if none of the VNFs of that service fails. As a result, we should determine the failure probability of each VNF as a function of the binary decision variable $x_{n,u,i}^{l,k,s}$. Let $f_{n,u}^{l,k}$ denote the failure probability of the $u$-th VNF of the $k$-th service of the $l$-th type in the $n$-th slot, which can be computed as $f_{n,u}^{l,k} = \prod_{i=1}^{|I|} \left( \prod_{s=1}^{|S_i|} \rho_{n,u,i}^{l,k,s} \right)$, where $\rho_{n,u,i}^{l,k,s} = v_i$ when $x_{n,u,i}^{l,k,s} = 1$, and otherwise it is 1 ($v_i$ is the failure probability of the servers of the $i$-th InP). Now, we can compute the probability of being in the running state and the failure probability of the placement for the $k$-th service of the $l$-th type in the $n$-th slot as $p_n^{l,k} = \prod_{u=1}^{U^l} (1 - f_{n,u}^{l,k})$ and $f_n^{l,k} = 1 - p_n^{l,k}$, and the reliability constraint can be written as
$$ H_r: \quad 1 - \prod_{u=1}^{U^l} \left( 1 - \prod_{i=1}^{|I|} \prod_{s=1}^{|S_i|} \rho_{n,u,i}^{l,k,s} \right) \le F^l, \quad 1 \le k \le \lambda_n^l,\ 1 \le l \le L. \tag{13} $$
E. Dynamic Reliability-Aware Service Placement Problem
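Since the problem stated in this subsection is subject to the reliability constraint $H_r$ in (13), a small numerical sketch of that constraint may be helpful. The function names and the input layout are assumptions made for illustration: each VNF is described by the list of InP indices of the servers hosting it, and `v` maps an InP index to the failure probability $v_i$ of its servers. As in (13), server failures are treated as independent.

```python
from math import prod


def service_failure(vnf_hosts, v):
    """Failure probability f = 1 - p of one service, following (13).

    vnf_hosts[u] : InP indices of the servers hosting VNF u (main and any backup)
    v[i]         : failure probability of the servers of InP i
    """
    # A VNF fails only if every server hosting it fails: f_u = product of v_i.
    # The service runs only if no VNF fails: p = product of (1 - f_u).
    running = prod(1.0 - prod(v[i] for i in hosts) for hosts in vnf_hosts)
    return 1.0 - running


def meets_reliability(vnf_hosts, v, max_failure):
    """Check constraint H_r: service failure probability at most F^l."""
    return service_failure(vnf_hosts, v) <= max_failure
```

For instance, with `v = {0: 0.1, 1: 0.2}`, a two-VNF chain hosted as `[[0], [1]]` has failure probability $1 - 0.9 \times 0.8 = 0.28$, while adding a backup on InP 1 for the first VNF, `[[0, 1], [1]]`, lowers it to $1 - 0.98 \times 0.8 = 0.216$. This shows how a backup server trades extra placement cost for a lower failure probability.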
The optimization problem of dynamic reliability-aware service placement is
$$ \min_{x_{n,u,i}^{l,k,s},\ y_{n,u'}^{l,k}(s_1, s_2, i_1, i_2)} \ \sum_{n=1}^{\infty} \sum_{l=1}^{L} \sum_{k=1}^{\lambda_n^l} \xi_{n,p}^{l,k} \tag{14} $$
$$ \text{subject to } H_p, H_g, H_b, H_f, H_r. \tag{15} $$
This optimization problem aims to minimize the total placement cost over an infinite time horizon for all incoming services of all service types. The constraints considered are for main and backup server allocation, for the resource capacity of the servers, for the bandwidth capacity of the links, for the traffic routing, and for the reliability requirement of the services, which are indicated by $H_p$, $H_g$, $H_b$, $H_f$, and $H_r$, respectively. The optimization variables are $x_{n,u,i}^{l,k,s}$ and $y_{n,u'}^{l,k}(s_1, s_2, i_1, i_2)$, where $x_{n,u,i}^{l,k,s}$ is the binary decision variable for placing the $u$-th VNF of the $k$-th incoming service of the $l$-th type in the $n$-th slot using server $G_i^s$, and $y_{n,u'}^{l,k}(s_1, s_2, i_1, i_2)$ is the binary decision variable for routing the traffic between the $u'$-th and $(u'+1)$-th VNFs of the $k$-th incoming service of the $l$-th type in the $n$-th slot using link $G_{i_1,i_2}^{s_1,s_2}$.
This optimization problem cannot be solved with standard static optimization techniques due to the dynamic nature of the parameters. More precisely, the amount of idle resources in each InP depends on the number of active services. The NO should take into consideration the number of active services when admitting new services. In such a scenario, dynamic programming (DP) techniques can be beneficial. An MDP, as a DP technique, is a model of an agent interacting synchronously with a world. The agent receives as input the state of the world and produces as output actions, which affect the state of the world. The most important requirement for modeling a dynamic optimization problem as an MDP is the memoryless characteristic of the state. According to the aforementioned assumptions, the number of active services, which can be considered as the state of the problem, is memoryless. Hence, an MDP is an appropriate model for capturing the characteristics of dynamic reliability-aware service placement. Table I includes the notations used in the problem and the proposed MDP model.
IV. AN MDP MODEL OF DYNAMIC RELIABILITY-AWARE SERVICE PLACEMENT
An MDP provides a mathematical framework for decision making in problems where outcomes are partly random and partly depend on the actions of the decision maker. It should be mentioned that the NO is the agent making a decision by formulating and solving the MDP. An MDP problem is characterized by a four-tuple $(\Omega_S, \Omega_A, \Omega_P, \Omega_R)$, where $\Omega_S$ is the state set, $\Omega_A$ is the action set, $\Omega_P$ is the probability set, and $\Omega_R$ is the reward set. In the MDP framework, it is assumed that, although there may be a great deal of uncertainty about the effects of an agent's actions, there is never any uncertainty about the agent's current state. Also, we assume that the agent has complete and perfect perceptual abilities. For modeling the dynamic reliability-aware service placement problem as an MDP, we define these sets in the following.
A. Set of States
The definition of an appropriate state set is one of the most important parts of modeling a problem as an MDP. The state of the system has three components. The first one is the state of the idle resources of servers in different InPs at the beginning of each slot, which is indicated by $\Omega_{RS}$. The second component is the state of the active services, which is equal to the number of active services of each type at the beginning of each slot, indicated by $\Omega_{DS}$. The active services at the beginning of each slot are the services that were admitted in the previous slots and have not left the network. The last component of the state set is the state of the incoming services at the beginning of each slot, which is indicated by $\Omega_{IS}$. In the following, we define the components of the state set.

TABLE I: Notation table including all notations of the optimization problem and the MDP model
Symbol | Description
$G_s$, $G_b$ | Set of servers and links
$G_i^s$ | The $s$-th server of the $i$-th InP
$G_{i_1,i_2}^{s_1,s_2}$ | The link between the $s_1$-th server of the $i_1$-th InP and the $s_2$-th server of the $i_2$-th InP
$R_{i,j}^s$ | The amount of the $j$-th resource type in $G_i^s$
$B_{i_1,i_2}^{s_1,s_2}$, $C_{i_1,i_2}^{s_1,s_2}$ | The bandwidth and cost of $G_{i_1,i_2}^{s_1,s_2}$
$C_{i,j}$, $v_i$ | The cost of using the $j$-th resource type and the failure probability of the servers of the $i$-th InP
$F^l$, $b^l$, $U^l$ | Failure probability, bandwidth requirement, and number of VNFs of the $l$-th service type
$d^l$, $f^l$, $\lambda_{\max}^l$ | Departure probability, PDF of arriving services, and maximum number of arrivals of the $l$-th service type
$r_{u,j}^l$, $t_u^l$ | Resource requirement of the $j$-th type and VNF type of the $u$-th VNF of the $l$-th service type
$x_{n,u,i}^{l,k,s}$ | Decision variable of placing the $u$-th VNF of the $k$-th service of the $l$-th type in the $n$-th slot in $G_i^s$
$\xi_{n,s}^{l,k}$, $\xi_{n,b}^{l,k}$, $\xi_{n,d}^{l,k}$ | Server cost, traffic forwarding cost, and deployment cost of the $k$-th service of the $l$-th type in the $n$-th slot
$\xi_{n,p}^{l,k}$ | Placement cost of the $k$-th service of the $l$-th type in the $n$-th slot
$\omega_{n,i,j}^s$ | Remaining resource of the $j$-th type of $G_i^s$ in the $n$-th slot
$\omega_{n,i_1,i_2}^{s_1,s_2}$ | Remaining bandwidth of $G_{i_1,i_2}^{s_1,s_2}$ in the $n$-th slot
$p_n^{l,k}$, $f_n^{l,k}$ | Running state probability and failure probability of the $k$-th service of the $l$-th type in the $n$-th slot
$\Omega_{RS}$, $\Omega_{DS}$, $\Omega_{IS}$ | State sets of idle resources, active services, and incoming services
$\lambda_n$, $\sigma_n$ | State of incoming services and active services in the $n$-th slot
$\lambda_n^l$, $\sigma_n^l$ | State of incoming services and active services of the $l$-th type in the $n$-th slot
$\sigma_{\max}^l$ | Maximum number of active services of the $l$-th type
$|\Omega_{DS}|$, $|\Omega_{IS}|$ | Size of the state sets of active services and incoming services
$\Omega_S$, $\Omega_A^f$ | State set and set of feasible actions
$S_n$, $A_n$ | State and action of the system in the $n$-th slot
$P(S, A, S')$ | Probability of ending in state $S'$ if the agent takes action $A$ in state $S$
$P_i$ | Transition matrix over $\Omega_{DS}$ when the state of incoming services is $i$
$\Delta(S, A)$ | Validation metric of taking action $A$ in state $S$
$R(S, A)$ | Reward of taking action $A$ in state $S$
$e^{l,k}$, $\xi_p^{l,k}$ | Failure probability and placement cost of the $k$-th service of the $l$-th type (results of the VRSSP algorithm)
$\rho$ | Placement arrangement (input of the VRSSP algorithm)
$N_{i,j}^s[l, k]$ | Usage of the $j$-th resource type of $G_i^s$ for the placement of the $k$-th service of the $l$-th type (output of the VRSSP algorithm)
$\omega_{i,j}^{s,\eta_S}$ | Idle resources of the $j$-th type of $G_i^s$ when the state is $S$
$\alpha^{\eta_S}$ | Updating factor for estimating the idle resources for state $S$
$N_{i,j}^s(S, A)$ | Resource usage of the $j$-th type of $G_i^s$ for taking action $A$ in state $S$
$R_{\mathrm{opt}}^{S,A}$, $\rho_{\mathrm{opt}}^{S,A}$ | Reward of the optimum arrangement and the optimum arrangement for taking action $A$ in state $S$

1) $\Omega_{RS}$: The state set of the idle resources of the InPs can be written as
$$ \Omega_{RS} = \left\{ (\omega_{i,j}^s) \mid 0 \le \omega_{i,j}^s \le R_{i,j}^s \right\}, \quad 1 \le s \le |S_i|,\ 1 \le i \le |I|,\ 1 \le j \le |R|, \tag{16} $$
where $\omega_{i,j}^s$ indicates the state of idle resources of the $j$-th resource type in the $s$-th server of the $i$-th InP. The state of idle resources of the servers in the $n$-th slot can be written as $\left\{ (\omega_{n,i,j}^s) \right\} \in \Omega_{RS}$. As seen in (16), the state of idle resources of the servers is continuous. MDPs with a continuous state space have high computational complexity and can be impractical. Later on, we discuss this challenge.
2) $\Omega_{DS}$: We assume that each incoming service has a specific type that determines the characteristics of the service. Therefore, the state set of the active services can be written as
$$ \Omega_{DS} = \left\{ (\sigma^1, \sigma^2, \dots, \sigma^L) \mid 0 \le \sigma^l \le \sigma_{\max}^l \right\}, \tag{17} $$
where $\sigma^l$ and $\sigma_{\max}^l$ are the number of active services and the maximum number of active services for the $l$-th service type, respectively. If the number of active services for the $l$-th service type at the beginning of a slot is $\sigma_{\max}^l$, the NO will not admit any new incoming services with type $l$. We indicate the number of active services of each type at the beginning of the $n$-th slot with $\sigma_n = (\sigma_n^1, \sigma_n^2, \dots, \sigma_n^L)$, where $\sigma_n \in \Omega_{DS}$.
3) $\Omega_{IS}$: We consider the state of the incoming services as the number of incoming services of each service type separately. Therefore, the state of the incoming services is Ω IS = n ( λ , λ , . 
. . , λ L ) (cid:12)(cid:12) ≤ λ l ≤ λ l max o , (18)where λ l and λ l max are the number of incoming service and the maximum number ofincoming service for the l th service type, respectively. The number of incoming services inthe n th slot is indicated by λ n = ( λ n , λ n , . . . , λ Ln ) where λ n ∈ Ω IS .The total state set of the system can be written as Ω S = Ω RS × Ω DS × Ω IS , which is a massive statespace. For example, the state space of idle resources of the InPs, Ω RS , is continuous, and we haveto use continuous state space for our MDP problem. Solving the MDP problems with continuousstate space is time-consuming. However, we know that the state of the idle resources of theservers in different InPs, at the beginning of each slot depends on the number active services atthe beginning of the corresponding slot. More precisely, if the number of active services at thebeginning of each slot is known, the state of the idle resources can be estimated. As a result, thestate set of the proposed MDP can be considered as Ω S = Ω DS × Ω IS . For this purpose, we shouldintroduce a method for estimating the state of the idle resources of the InPs according to thenumber of active services. More precisely, for each ( σ , σ , . . . , σ L ) ∈ Ω DS , we should determinethe state of idle resources, (cid:8) ( ω si,j ) (cid:9) . We discuss more details about this method in Section VI.Now, we can write the total state of the system in the n th slot as S n = { λ n , σ n } ∈ Ω S . We assume that the state of the incoming services and active services are independent. 
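As a rough illustration of the reduced state set, the following Python sketch enumerates all (incoming, active) pairs for given per-type maxima. The function name `enumerate_states` and the example maxima are assumptions made for illustration only; they do not come from the paper.

```python
from itertools import product

def enumerate_states(sigma_max, lambda_max):
    """Enumerate the reduced state set as (incoming, active) pairs.

    sigma_max[l] and lambda_max[l] are the per-type maxima of active and
    incoming services. Both lists are illustrative inputs.
    """
    # Each component ranges over 0..max, hence (max + 1) values per type.
    active = list(product(*(range(m + 1) for m in sigma_max)))
    incoming = list(product(*(range(m + 1) for m in lambda_max)))
    return [(lam, sig) for lam in incoming for sig in active]

# Two service types, with hypothetical maxima.
states = enumerate_states(sigma_max=[2, 1], lambda_max=[1, 1])
```

Because every component ranges over `max + 1` values, the number of enumerated states is the product of these ranges over all types, for both the active and the incoming part.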
The size of this state set is indicated by $|\Omega_S|$, which can be computed as $|\Omega_S| = |\Omega^D_S| \times |\Omega^I_S|$, where $|\Omega^D_S|$ and $|\Omega^I_S|$ are the sizes of the state sets of the active services and incoming services, and can be calculated as $|\Omega^D_S| = \prod_{l=1}^{L} (\sigma^l_{\max} + 1)$ and $|\Omega^I_S| = \prod_{l=1}^{L} (\lambda^l_{\max} + 1)$. It is worth noting that, because the number of incoming services in each slot is random and independent of the number of active services in that slot, including the number of incoming services in the state space is necessary to achieve the optimal policy. Otherwise, the NO cannot differentiate between the different numbers of incoming services of each type for a given state of the active services and idle resources, which leads to a sub-optimal policy.

B. Action Set
The decision variables introduced in Section III-E are $x^{l,k,s}_{n,u,i}$ and $y^{l,k}_{n,u'}(s_1, s_2, i_1, i_2)$, which indicate the actions for the placement of the VNFs and the traffic forwarding of the services, respectively. Therefore, we can define the action set as
$$\Omega_A = \left\{ \left( x^{l,k,s}_{u,i},\ y^{l,k}_{u'}(s_1, s_2, i_1, i_2) \right) \,\middle|\, x^{l,k,s}_{u,i},\ y^{l,k}_{u'}(s_1, s_2, i_1, i_2) \in \{0, 1\} \right\}.$$
Even though this action set can result in the optimal placement, it is computationally complex and cannot be used in a practical scenario. More precisely, to determine the optimal policy in each state, all possible actions must be examined, which requires a large number of exhaustive searches. Therefore, we revise the definition of the action set. According to the definition of the state set in Section IV-A, the number of admitted services of each service type can be considered as the action of the MDP:
$$\Omega_A = \left\{ (a^1, a^2, \dots, a^L) \,\middle|\, 0 \le a^l \le \lambda^l_{\max} \right\}, \tag{19}$$
where $a^l = M$ means admitting $M$ services of the $l$-th service type according to the requirements of this service type. Note that an action does not specify the placement of the admitted services. Therefore, we determine the placement of the admitted services using a separate static algorithm. For this purpose, we introduce the Viterbi-based Reliable Static Service Placement (VRSSP) algorithm in the next section. We call this algorithm static because it determines the placement of services regardless of the cost in the next slots. The algorithm takes the resources of the InPs and the admitted services as inputs and returns the placement of the admitted services as output.

In each state, depending on the state of the active and incoming services, some of the actions in $\Omega_A$ are not feasible. As a result, the set of feasible actions $\Omega^f_A \subset \Omega_A$ for each state $S \in \Omega_S$, where $S = \{(\lambda^1, \lambda^2, \dots, \lambda^L), (\sigma^1, \sigma^2, \dots, \sigma^L)\}$, can be computed as
$$\Omega^f_A = \left\{ (a^1, a^2, \dots, a^L) \,\middle|\, a^l + \sigma^l \le \sigma^l_{\max},\ a^l \le \lambda^l \right\}. \tag{20}$$
Depending on the state of the idle resources, providing the reliability requirement for some of the admitted services may not be feasible. In other words, in each state, some of the actions in $\Omega^f_A$ are not feasible because of the limited resources of the InPs. We should design the reward function so that it prevents the NO from choosing such actions as the optimal action in any state. We discuss the reward function of each action in detail in the following.

C. Transition Probability
The probability set $\Omega_P : \Omega_S \times \Omega_A \to \Omega_S$ is the state transition function, which gives, for each state and action, a probability distribution over the state set. We write $P(S_{n-1}, A_{n-1}, S_n)$ for the probability of ending in state $S_n$, given that the agent starts in state $S_{n-1}$ and takes action $A_{n-1}$. In the service placement problem, the state transition depends on the current state, the taken action, the departure distribution of the active services, and the distribution of the service arrivals. To define the state transition matrix $P$, we indicate the probability distribution over the state space in the $n$-th slot by $b_n$, which is a vector of length $|\Omega_S|$ defined as
$$b_n = \left( b^1_n, b^2_n, \dots, b^{|\Omega^I_S|}_n \right), \quad b^i_n = \left( b^{i,1}_n, b^{i,2}_n, \dots, b^{i,|\Omega^D_S|}_n \right), \tag{21}$$
$$b^{i,j}_n = \Pr\left( \lambda_n = (i_1, i_2, \dots, i_L),\ \sigma_n = (j_1, j_2, \dots, j_L) \right), \tag{22}$$
where $b^i_n$ is a vector that indicates the probability distribution over the state space of the active services when the state of the incoming services is $i$, and $b^{i,j}_n$ indicates the probability that the state of the incoming services is $i$ and the state of the active services is $j$. The state of the incoming services is $i$ and the state of the active services is $j$ when we have
$$\sum_{l=1}^{L} i_l \times \delta^l_\lambda = i, \quad \delta^l_\lambda = \begin{cases} \prod_{r=1}^{l-1} (\lambda^r_{\max} + 1), & l \ge 2, \\ 1, & l = 1, \end{cases} \quad 1 \le i \le |\Omega^I_S|, \tag{23}$$
$$\sum_{l=1}^{L} j_l \times \delta^l_\sigma = j, \quad \delta^l_\sigma = \begin{cases} \prod_{r=1}^{l-1} (\sigma^r_{\max} + 1), & l \ge 2, \\ 1, & l = 1, \end{cases} \quad 1 \le j \le |\Omega^D_S|, \tag{24}$$
where $\delta^l_\sigma$ and $\delta^l_\lambda$ are auxiliary variables introduced to facilitate the computation of the state transition matrix $P$.

We assume that the number of incoming services in the $n$-th slot is independent of the incoming services and active services in the $(n-1)$-th slot. Also, the number of active services in the $n$-th slot depends on the state of the active services and the action taken in the $(n-1)$-th slot.
Therefore, $S_n$ is independent of the state of the incoming services in the $(n-1)$-th slot, and the state transition matrix $P$ is a $|\Omega^D_S| \times |\Omega_S|$ matrix that can be written as $P = \left[ P_1\ P_2\ \cdots\ P_{|\Omega^I_S|} \right]$, where $P_i$ is a $|\Omega^D_S| \times |\Omega^D_S|$ matrix, namely the transition matrix over the state space of the active services when the state of the incoming services is $i$. Each element of $P_i$ is defined as
$$P_i(j, k) = \Pr\left( \sigma_n = (k_1, k_2, \dots, k_L),\ \lambda_n = (i_1, i_2, \dots, i_L) \,\middle|\, \sigma_{n-1} = (j_1, j_2, \dots, j_L) \right) \tag{25}$$
$$= \Pr\left( \sigma_n = (k_1, k_2, \dots, k_L) \,\middle|\, \sigma_{n-1} = (j_1, j_2, \dots, j_L) \right) \times \Pr\left( \lambda_n = (i_1, i_2, \dots, i_L) \right), \tag{26}$$
where the states of the active services in the $(n-1)$-th and $n$-th slots are $j$ and $k$, respectively. The conditions for the incoming services being in state $i$ and the active services being in states $j$ and $k$ are defined in (23) and (24). Equation (26) follows from the assumption that the state of the incoming services in the $n$-th slot is independent of the state of the active services in the $(n-1)$-th slot. The probabilities in (26) can be computed as follows:
$$\Pr\left( \sigma_n = (k_1, \dots, k_L) \,\middle|\, \sigma_{n-1} = (j_1, \dots, j_L) \right) = \begin{cases} 0, & \exists\, l,\ (j_l - k_l) < 0, \\ \prod_{l=1}^{L} (d^l)^{j_l - k_l}, & \text{otherwise}, \end{cases} \tag{27}$$
$$\Pr\left( \lambda_n = (i_1, i_2, \dots, i_L) \right) = \prod_{l=1}^{L} f^l(i_l). \tag{28}$$
Now, we can write the state transition probability, $P(S_{n-1}, A_{n-1}, S_n)$, as
$$P(S_{n-1}, A_{n-1}, S_n) = P_i(j, k), \tag{29}$$
$$\sum_{l=1}^{L} \lambda^l_n \times \delta^l_\lambda = i, \quad \sum_{l=1}^{L} \sigma^l_n \times \delta^l_\sigma = k, \quad \sum_{l=1}^{L} \left( \sigma^l_{n-1} + a^l_{n-1} \right) \times \delta^l_\sigma = j,$$
where the values of $\delta^l_\lambda$ and $\delta^l_\sigma$ are defined in (23) and (24).

D. Reward Function
In an MDP, the reward set $\Omega_R : \Omega_S \times \Omega_A \to \mathbb{R}$ is the reward function, which gives the expected immediate reward gained by the agent for taking each action in each state. We use $R(S, A)$ for the expected reward of taking action $A \in \Omega^f_A$ in state $S \in \Omega_S$. For the evaluation of $R(S, A)$, we define a validation metric $\Delta(A, S)$, where $\Delta(A, S) = 0$ if there are not enough resources for the main server placement, and $\Delta(A, S) = 1$ otherwise. As mentioned before, we define the action set as the number of admitted services, and for each action the placement of the admitted services is determined by a separate static algorithm, the VRSSP algorithm. In some cases, for the taken action and the idle resources of the InPs (estimated from the number of active services), the main server placement of some of the services is not possible because of the lack of resources in the InPs. For such actions in such states, $\Delta(A, S) = 0$. Now, we can define the reward function in terms of the reward of admitting each service while providing its reliability requirement, and the placement cost of admitting each service, as
$$R(S, A) = \begin{cases} 0, & \Delta(A, S) = 0, \\ \sum_{l=1}^{L} \sum_{k=1}^{a^l} \left( q^l \times I(F^l - e^{l,k}) - \xi^{l,k}_p \right), & \Delta(A, S) = 1, \end{cases} \tag{30}$$
where $q^l$ is the reward of admitting a service of the $l$-th type and $I(\cdot)$ is the indicator function, with $I(x) = 1$ if $x \ge 0$ and $I(x) = 0$ otherwise. Also, $e^{l,k}$ and $\xi^{l,k}_p$ are the failure probability and placement cost of the $k$-th service of the $l$-th type, respectively, which are results of running the VRSSP algorithm. We explain the inputs and outputs of the VRSSP algorithm in the next section.

V. VITERBI-BASED RELIABLE STATIC SERVICE PLACEMENT (VRSSP) ALGORITHM
In this section, we present an algorithm for the static service placement problem, named the VRSSP algorithm. The algorithm is called static because it solves the service placement problem without considering the effect of the possible placement solutions on future slots. We indicate the input of the VRSSP algorithm as $V_I = (A, \rho, O)$. The first input, $A = \{a^1, a^2, \dots, a^L\}$, determines the number of services admitted from each service type. The second input is defined as $\rho = (\rho_1, \rho_2, \dots, \rho_K)$, where $K = \sum_{l=1}^{L} a^l$ and $1 \le \rho_k \le L$, and determines the placement arrangement of the services that can be admitted according to action $A$. Here, $\rho_k = l$ means that the type of the $k$-th service to be placed is $l$. The order in which the services are placed affects the output of the VRSSP algorithm. For example, for a given action $A$, the set of services that the algorithm succeeds in placing can differ from one order to another. Therefore, we consider the order of the services as one of the inputs of the VRSSP algorithm. Different orders can also lead to different placement costs. These two points directly affect the reward of each action. As a result, the optimum placement arrangement can be determined during the procedure of computing the optimal policy, which is explained in more detail in Section VI-B. The last input determines the state of the idle resources in the InPs that can be used for service placement, defined as $O = \{(\omega^s_{i,j})\} \in \Omega^R_S$.

For each input, the algorithm determines the validation metric, $\Delta(A, S)$, defined in Section IV-D. In the case of a valid input, the algorithm also determines the placement cost, $\xi^{l,k}_p$, the failure probability, $e^{l,k}$, and the resource usage of the servers, $N^s_{i,j}[l,k]$. Therefore, the output of the VRSSP algorithm is $V_O = (\Delta, \xi^{l,k}_p, e^{l,k}, N^s_{i,j}[l,k])$, where $1 \le i \le |I|$, $1 \le s \le |S_i|$, $1 \le j \le |R|$, $1 \le l \le L$, and $1 \le k \le K$.
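To make the relation between an action $A$ and a placement arrangement $\rho$ concrete, here is a minimal Python sketch. The helper names `arrangement_from_action` and `is_valid_arrangement` are hypothetical and used only for illustration; the sketch assumes service types are numbered from 1, as in the text.

```python
def arrangement_from_action(action):
    """Return one placement arrangement rho for action A = (a^1, ..., a^L).

    rho_k = l means that the k-th service to be placed has type l.
    Types are listed in ascending order here; any permutation of the
    result is also a valid arrangement for the same action.
    """
    rho = []
    for service_type, count in enumerate(action, start=1):
        rho.extend([service_type] * count)
    return tuple(rho)

def is_valid_arrangement(rho, action):
    """Check that rho has K = sum(a^l) entries, with a^l entries equal to l."""
    return sorted(rho) == list(arrangement_from_action(action))

# Illustrative action: one service of type 1, two of type 2, one of type 4.
rho = arrangement_from_action((1, 2, 0, 1))
```

Any reordering of `rho` passes `is_valid_arrangement` for the same action, which is why the order has to be supplied to the placement algorithm as a separate input.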
In the VRSSP algorithm, we apply the idea of the Viterbi algorithm, which finds the most likely sequence of hidden states (called the Viterbi path) that results in a sequence of observed events. The Viterbi algorithm first models the states of the problem and their transitions as a multistage graph. The number of stages is equal to the number of observed events, and one observed event is considered in each stage. The states of the Viterbi algorithm are the possible events in each stage. For the transition between each pair of states in consecutive stages, a transition cost is defined according to the objective function of the considered optimization problem. For each state of a stage, the path with minimum cost among the input paths to that state is selected as the survived path. It is worth noting that the cumulative cost of the input paths is considered when determining the survived path of a state. Finally, in the last stage, each possible state has a survived path with a cost, and the survived path of the state with minimum cost is selected as the Viterbi path. Following this description, we first determine the state, stage, and transition cost in our problem.

Let $L_V = 2 \times \sum_{k=1}^{K} U^{\rho_k}$ denote the number of stages in the VRSSP algorithm, where the coefficient 2 accounts for the two server assignments, main and backup, of each VNF. We assume that the NO can assign at most one backup server to each VNF of a service. In each stage, the placement of the main or backup server of a specific VNF is performed. For example, in the first stage, the placement of the main server for the first VNF of the first service is performed. In the second stage, the placement of the backup server for the first VNF of the first service is performed.
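The stage layout described so far (two consecutive stages per VNF, main server first, services taken in the order given by $\rho$) can be sketched as follows. The function names and the example VNF counts are assumptions for illustration and are not part of the paper.

```python
def total_stages(rho, num_vnfs):
    """Number of stages: two per VNF of every service in the arrangement."""
    return 2 * sum(num_vnfs[service_type] for service_type in rho)

def stage_to_vnf(stage, rho, num_vnfs):
    """Map a 1-based stage index to (service k, VNF u, role).

    rho[k-1] is the type of the k-th service and num_vnfs[l] is the number
    of VNFs of type l. Odd stages place a main server, even stages a backup.
    """
    if stage < 1:
        raise ValueError("stage index starts at 1")
    offset = 0
    for k, service_type in enumerate(rho, start=1):
        stages_for_service = 2 * num_vnfs[service_type]
        if stage <= offset + stages_for_service:
            local = stage - offset        # 1-based position inside service k
            u = (local + 1) // 2          # VNF index within the service
            role = "main" if local % 2 == 1 else "backup"
            return k, u, role
        offset += stages_for_service
    raise ValueError("stage exceeds the total number of stages")

# Hypothetical example: a type-1 service with 3 VNFs, then a type-2 service with 2 VNFs.
example_rho = (1, 2)
example_vnfs = {1: 3, 2: 2}
```

With these example values, the first service occupies stages 1 to 6, so the main server of the first VNF of the second service falls on stage 7.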
In the $(2 \times U^{\rho_1} + 1)$-th stage, the placement of the main server for the first VNF of the second service is performed, where $U^{\rho_1}$ is the number of VNFs of the first service. It is worth noting that in each stage, several candidates for the placement of the main or backup server of the respective VNF are determined. In the final stage, the definite placement of the main and backup servers for all VNFs of all services is specified. Appendix A gives the detailed procedure for determining the placement of the services of a specific input $V_I = (A, \rho, O)$ using the idea of the Viterbi algorithm.

Algorithm 1: Viterbi-based Reliable Static Service Placement (VRSSP) Algorithm
 1: Viterbi algorithm input: $V_I = (A, \rho, O)$, $K = \sum_{l=1}^{L} a^l$, $L_V = 2 \times \sum_{k=1}^{K} U^{\rho_k}$, $X_0 = 0$, $\phi_0 = 0$, $SR_0[i,s,j] = \omega^s_{i,j}$, $\Delta = 1$, $\xi^{l,k}_p = 0$, $e^{l,k} = 1$, $p^{l,k} = 1$, $N^s_{i,j}[l,k] = 0$
 2: for ($m = 1 : L_V$) do
 3:   Determine $k_m$ and $u_m$.
 4:   Determine the set of states, $X_m$, using (36).
 5:   for $x \in X_m$ do
 6:     Determine the index of the InP, $i^x_m$, and the index of the server in the related InP, $s^x_m$.
 7:     $XP^x_m = X_{m-1}$, $RemovedState = \{\}$.
 8:     if $x \neq 0$ then
 9:       for $x' \in XP^x_m$ do
10:         for $j = 1 : |R|$ do
11:           if ($SR^{x'}_{m-1}[i^x_m, s^x_m, j] - r^{\rho_{k_m}}_{u_m,j} < 0$) then
12:             $XP^x_m = XP^x_m - \{x'\}$. break.
13:           end
14:         end
15:       end
16:       if $m \bmod 2 = 0$ && $x \in XP^x_m$ then
17:         $XP^x_m = XP^x_m - \{x\}$.
18:       end
19:     end
20:     if ($XP^x_m \neq \emptyset$) then
21:       for $x' \in XP^x_m$ do
22:         Compute $\Theta^{x',x}_{m-1,m}$ using (37).
23:       end
24:       Compute $SP^x_m$ using (41) and $SR^x_m = SR^{SP^x_m}_{m-1}$.
25:       Compute $\Lambda^W_{m,x}$ and $\Lambda^I_{m,x}$ using (42).
26:       Compute $\phi^x_m$ and $\tau^x_m$ using (43) and (44).
27:       if $x \neq 0$ then
28:         for $j = 1 : |R|$ do
29:           $SR^x_m[i^x_m, s^x_m, j] = SR^x_m[i^x_m, s^x_m, j] - r^{\rho_{k_m}}_{u_m,j}$.
30:         end
31:       end
32:     else
33:       Add $x$ to $RemovedState$.
34:     end
35:   end
36:   if ($RemovedState = X_m$) then
37:     $\Delta = 0$, return.
38:   else
39:     Remove all states in $RemovedState$ from $X_m$.
40:   end
41: end
42: Compute $\xi^{l,k}_p$, $e^{l,k}$, $N^s_{i,j}[l,k]$ by running Algorithm 2.

Algorithm 2: Viterbi Output Computation (VOP) Algorithm
 1: Determine $\zeta$, $P^I_{L_V}$ and $P^W_{L_V}$ using (45); $a^l = 0$ for $l = 1, 2, \dots, L$.
 2: for $m = 1 : L_V$ do
 3:   Determine $k_m$, $u_m$ and $i_{u_m} = P^I_{L_V}[m]$, $s_{u_m} = P^W_{L_V}[m]$.
 4:   if ($u_m = 1$) then
 5:     $a^{\rho_{k_m}} = a^{\rho_{k_m}} + 1$.
 6:   end
 7:   for $j = 1 : |R|$ do
 8:     $N^{s_{u_m}}_{i_{u_m},j}[\rho_{k_m}, a^{\rho_{k_m}}] = N^{s_{u_m}}_{i_{u_m},j}[\rho_{k_m}, a^{\rho_{k_m}}] + r^{\rho_{k_m}}_{u_m,j}$.
 9:   end
10:   $\xi^{\rho_{k_m}, a^{\rho_{k_m}}}_p = \xi^{\rho_{k_m}, a^{\rho_{k_m}}}_p + \sum_{j=1}^{|R|} r^{\rho_{k_m}}_{u_m,j} \times C_{i_{u_m},j} + DC_{i_{u_m}, t^{\rho_{k_m}}_{u_m}}$.
11:   if ($u_m \ge 2$ && $m \bmod 2 = 1$) then
12:     $\xi^{\rho_{k_m}, a^{\rho_{k_m}}}_p = \xi^{\rho_{k_m}, a^{\rho_{k_m}}}_p + b^{\rho_{k_m}} \times \left( C^{s_{u_{m-2}}, s_{u_m}}_{i_{u_{m-2}}, i_{u_m}} + C^{s_{u_{m-1}}, s_{u_m}}_{i_{u_{m-1}}, i_{u_m}} \right)$.
13:   end
14:   if ($u_m \ge 2$ && $m \bmod 2 = 0$) then
15:     $\xi^{\rho_{k_m}, a^{\rho_{k_m}}}_p = \xi^{\rho_{k_m}, a^{\rho_{k_m}}}_p + b^{\rho_{k_m}} \times \left( C^{s_{u_{m-3}}, s_{u_m}}_{i_{u_{m-3}}, i_{u_m}} + C^{s_{u_{m-2}}, s_{u_m}}_{i_{u_{m-2}}, i_{u_m}} \right)$.
16:   end
17:   if ($m \bmod 2 = 0$) then
18:     $p^{\rho_{k_m}, a^{\rho_{k_m}}} = p^{\rho_{k_m}, a^{\rho_{k_m}}} \times \left( 1 - v_{i_{u_m}} v_{i_{u_{m-1}}} \right)$.
19:   end
20:   if ($m = U^{\rho_{k_m}}$) then
21:     $e^{\rho_{k_m}, a^{\rho_{k_m}}} = 1 - p^{\rho_{k_m}, a^{\rho_{k_m}}}$.
22:   end
23: end

Algorithm 1 shows the details of the proposed VRSSP algorithm. At the beginning, we initialize the outputs and the variables used in the algorithm. The variable $SR_0[i,s,j]$ indicates the idle resources of the servers at the beginning. Then, we have a loop of length $L_V$. In each iteration of this loop, we determine the candidates for the main or backup server of a specific VNF. At the beginning of each iteration, in Line 3, we obtain the index of the considered service and the index of the considered VNF in that service. The set of states (i.e., candidates) for hosting the corresponding VNF is determined in Line 4. Next, we have an inner loop of length $|X_m|$ in Lines 5-35. In each iteration of the inner loop, we consider one of the placement candidates. In Line 6, the indices of the server and the InP of the considered placement candidate, $x$, are determined. Then, the set of possible input paths, $XP^x_m$, is determined in Lines 7-19. For this purpose, we consider the idle resources of the servers along the input paths, $SR^{x'}_{m-1}$, in the $(m-1)$-th stage. If the set of possible input paths to the $x$-th candidate is empty, the corresponding server does not have enough resources for hosting the considered VNF, and the candidate is added to a set named $RemovedState$ in Line 33. Otherwise, among all possible input paths to the $x$-th candidate, the path with minimum cost, named the survived path, is selected in Line 24. Then, the values of the Viterbi-related parameters are updated in Lines 25-31. When all candidates have been considered, if $RemovedState$ and $X_m$ are equal (Line 36), none of the servers has enough resources and, as a result, the rest of the services will not be admitted. More precisely, the services $\{k_m, k_m + 1, \dots, K\}$ cannot be admitted because of the insufficient resources of the servers.
In such a scenario, the validation metric is set to 0 and the VRSSP algorithm is terminated, as indicated in Line 37. This scenario can happen only in the odd stages, in which we consider the main server allocation. The reason is that in the even stages "no server assignment" is available as an option and, as a result, $XP^x_m$ cannot be empty and $X_m \neq RemovedState$. Finally, if the validation metric is 1, the output of Algorithm 1 is computed using Algorithm 2, which determines the failure probability, $e^{l,k}$, the placement cost, $\xi^{l,k}_p$, and the resource usage, $N^s_{i,j}[l,k]$, for all services. At the beginning of this algorithm, the Viterbi path is computed in Line 1. The resource usage of each service is computed in Lines 7-9, the placement cost of each service is computed in Lines 10-16, and the failure probability of the placement of each service is computed in Lines 17-22.

The computational complexities of Algorithms 1 and 2 can be computed for a specific input, $V_I = (A, \rho, O)$. The number of stages in the VRSSP algorithm is $L_V$, and the total number of servers of all InPs is $|S| = \sum_{i=1}^{|I|} |S_i|$. In terms of these variables, the computational complexities of Algorithms 1 and 2 are $O(L_V \times |S|)$ and $O(L_V)$, respectively. Both values grow linearly with the number of services. The memory needed by Algorithm 1 is mostly used to store the survived path information, which is related to the number of stages and the number of states. The memory requirement of Algorithm 1 for input $V_I$ is $L_V \times |S|$, which grows linearly with the number of services. The memory requirement of Algorithm 2 is negligible compared to that of Algorithm 1.

VI. OPTIMAL POLICY COMPUTING
In an MDP, the agent acts so as to maximize the long-run reward received. The most straightforward framework is the infinite-horizon discounted model, in which we sum the rewards over the infinite lifetime but discount them geometrically using a discount factor $0 < \gamma < 1$. The agent should act so as to optimize $E\left[ \sum_{n=0}^{\infty} \gamma^n R_n \right]$, where $R_n$ is the reward gained in the $n$-th step. In this model, rewards received earlier in the lifetime have more value to the agent. Even though an infinite lifetime is considered, the discount factor ensures that the sum is finite. In the infinite-horizon discounted case, we write $V_\pi(S)$ for the expected discounted sum of future rewards for starting in state $S$ and executing policy $\pi$, which can be computed as
$$V_\pi(S) = R\left( S, \pi(S) \right) + \gamma \sum_{S'} P\left( S, \pi(S), S' \right) V_\pi(S'). \tag{31}$$
The value function of each policy $\pi \in \Omega_\pi$ is the unique simultaneous solution of this set of linear equations, with one equation for each state $S \in \Omega_S$. In the infinite-horizon discounted model, for any initial state $S \in \Omega_S$, we want to obtain and execute the policy $\pi$ that maximizes the value function $V_\pi(S)$. There exists a stationary policy, $\pi^*$, that is optimal for every starting state [29]. The value function of this policy, $V^*$, can be computed from the set of equations
$$V^*(S) = \max_{A \in \Omega_A} \left\{ R(S, A) + \gamma \sum_{S'} P(S, A, S') V^*(S') \right\}. \tag{32}$$
According to (32), the optimal policy, $\pi^*$, is simply the greedy policy with respect to $V^*$. There are many methods for finding optimal policies for MDPs. Here, we use the value iteration algorithm to find the optimal policy [29]. For this purpose, we should first introduce the proposed method for estimating the state of the idle resources of the InPs, $\{(\omega^s_{i,j})\}$, according to the state of the active services in $S = \{(\lambda^1, \lambda^2, \dots, \lambda^L), (\sigma^1, \sigma^2, \dots, \sigma^L)\}$.

A. State Estimation for InP Resources
As mentioned in Section IV-A, we introduce a method for estimating the state of the idle resources of the InPs according to the state of the active services. Let $\omega^{s,\eta_S}_{i,j}$ denote the idle resource of the $j$-th resource type in the $s$-th server of the $i$-th InP for $S = \{(\lambda^1, \lambda^2, \dots, \lambda^L), (\sigma^1, \sigma^2, \dots, \sigma^L)\}$, with $\eta_S = 1 + \sum_{l=1}^{L} \sigma^l \times \delta^l_\sigma$, where $\delta^l_\sigma$ is defined in (24). The state of the idle resources depends on the state of the active services and is independent of the state of the incoming services. We assume that the NO is in state $S$, takes action $A = \{a^1, a^2, \dots, a^L\}$, and ends in state $S'$. The estimate of the state of the idle resources in state $S'$ is then updated as
$$\omega^{s,\eta_{S'}}_{i,j} = \alpha_{\eta_{S'}} \times \left( \omega^{s,\eta_S}_{i,j} - N^s_{i,j}(S, A) \right) + \left( 1 - \alpha_{\eta_{S'}} \right) \times \omega^{s,\eta_{S'}}_{i,j}, \quad \alpha_{\eta_{S'}} = \alpha_{\eta_{S'}} \times D_{\eta_{S'}}, \tag{33}$$
where $\omega^{s,\eta_{S'}}_{i,j}$ denotes the state of the idle resources in state $S'$ and $N^s_{i,j}(S, A)$ is the resource usage of the $j$-th resource type in the $s$-th server of the $i$-th InP for taking action $A$ in state $S$. Finally, $\alpha_{\eta_{S'}}$ is the updating factor for estimating the state of the idle resources in state $S'$. The value of $\alpha_{\eta_{S'}}$ is reduced at each resource estimation of state $S'$ using the discount factor $D_{\eta_{S'}}$. The values of $N^s_{i,j}(S, A)$ and $\eta_{S'}$ can be computed as
$$N^s_{i,j}(S, A) = \sum_{l=1}^{L} \sum_{k=1}^{a^l} I(F^l - e^{l,k}) \times N^s_{i,j}[l,k], \tag{34}$$
$$\eta_{S'} = 1 + \sum_{l=1}^{L} \left( \sigma^l + a^l_r \right) \times \delta^l_\sigma, \quad a^l_r = \sum_{k=1}^{a^l} I(F^l - e^{l,k}), \quad A_r = (a^1_r, a^2_r, \dots, a^L_r), \tag{35}$$
where $N^s_{i,j}[l,k]$ and $e^{l,k}$ are the outputs of running the VRSSP algorithm with the input $V_I = (A, \rho, O)$, in which $O = \{(\omega^{s,\eta_S}_{i,j})\}$ and $\rho = (\rho_1, \rho_2, \dots, \rho_K)$ is an arbitrary placement arrangement of the services that should be admitted according to action $A$.
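The update in (33) is an exponentially weighted blend of a fresh estimate and the stored one, with a weight that shrinks after each use. A minimal Python sketch for a single $(i, s, j)$ entry is given below; the function name and the numeric values in the usage lines are illustrative assumptions, not values from the paper.

```python
def update_idle_estimate(omega_s, omega_s_next, usage, alpha, decay):
    """Apply the update in (33) once for a single (i, s, j) entry.

    omega_s      : current idle-resource estimate for state S
    omega_s_next : stored estimate for the successor state S'
    usage        : N(S, A), resources consumed by the admitted services
    alpha, decay : updating factor of S' and its discount factor D
    Returns the new estimate for S' and the decayed updating factor.
    """
    new_estimate = alpha * (omega_s - usage) + (1.0 - alpha) * omega_s_next
    return new_estimate, alpha * decay

# First visit of S': alpha = 1, so the stored value is fully overwritten.
first_estimate, alpha_after_first = update_idle_estimate(10.0, 0.0, 3.0, 1.0, 0.5)
# Second visit: the new observation and the stored value are averaged.
second_estimate, alpha_after_second = update_idle_estimate(
    10.0, first_estimate, 5.0, alpha_after_first, 0.5)
```

With an initial updating factor of 1, the first visit of a state replaces its stored estimate entirely, and later visits change it less and less as the factor decays.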
In (35), $a^l_r$ indicates the number of services of the $l$-th type that can be admitted according to the output of the VRSSP algorithm. The updating formula in (33) requires an initial value of $\omega^{s,\eta_S}_{i,j}$, which is set to zero. On the other hand, the state $S = \{(\lambda^1, \lambda^2, \dots, \lambda^L), (0, 0, \dots, 0)\}$ means that there is no active service in the system, and we have $\omega^{s,\eta_S}_{i,j} = 0$. Starting from such a state, by taking different feasible actions $A \in \Omega_A$ and ending in different states $S' \in \Omega_S$, the value of $\omega^{s,\eta_{S'}}_{i,j}$ can be updated using (33).

B. Placement Arrangement for Actions
As mentioned in Section V, one of the inputs of the VRSSP algorithm is the placement arrangement of the services that should be admitted according to the taken action $A$. The placement arrangement for action $A = \{a^1, a^2, \dots, a^L\}$ with $K = \sum_{l=1}^{L} a^l$ is indicated as $\rho = (\rho_1, \rho_2, \dots, \rho_K)$, where $\rho_k = l$ means that the type of the $k$-th service to be placed is $l$. We want to find the optimum placement arrangement for each feasible action $A$ in each state $S$. For this purpose, we define $\rho^{S,A}$ and $\rho^{S,A}_{opt}$ as the set of possible placement arrangements and the optimum placement arrangement for taking action $A$ in state $S$. The possible placement arrangements for each feasible action $A$ in each state $S$ are evaluated within the value iteration algorithm, and the optimum placement arrangement is determined. Accordingly, $R^{S,A}_{opt}$ is the reward of the optimum placement arrangement for taking action $A$ in state $S$. The values of $R^{S,A}_{opt}$ and $\rho^{S,A}_{opt}$ are updated when $\rho \in \rho^{S,A}$, one of the possible placement arrangements for taking action $A$ in state $S$, is considered within the value iteration algorithm. More precisely, if the value of $R(S, A)$ is greater than the value of $R^{S,A}_{opt}$, the values of $\rho^{S,A}_{opt}$ and $R^{S,A}_{opt}$ are updated. The value of $R(S, A)$ is determined using (30), where the values of $e^{l,k}$ and $\xi^{l,k}_p$ are the outputs of running the VRSSP algorithm with the input $V_I = (A, \rho, O)$, in which $O = \{(\omega^{s,\eta_S}_{i,j})\}$.

Algorithm 3 presents the VRSSP-based Value Iteration (VVI) algorithm for finding the optimal policy of the MDP. In Lines 1 and 2, the characteristics of the service types and the MDP-related parameters are determined. Then, the state set and the transition matrix are computed in Line 3. The state of the idle resources in the InPs and the value function of each state are initialized in Line 4. The set of possible placement arrangements for each feasible action in each state, $\rho^{S,A}$, is computed in Lines 5-18. For this purpose, $STV$ is a vector that indicates the service types that should be admitted according to action $A$. For example, if $A = (1, 2, 0, 1)$, we have $STV = \{1, 2, 2, 4\}$, which means that one service of the first service type, two services of the second service type, and one service of the fourth service type should be admitted. According to $STV$, there are 12 possible placement arrangements for the given action $A$. Because of the structure of the VRSSP algorithm, each of these placement arrangements can lead to a different output, which changes the performance of the proposed algorithm. As a result, we can obtain the optimal placement arrangement within the value iteration process.

Algorithm 3: VRSSP-based Value Iteration (VVI) Algorithm
 1: Service input: $\Upsilon^l = \{F^l, d^l, b^l, U^l, f^l, \lambda^l_{\max}, r^l_{u,j}, t^l_u\}$, $L$.
 2: MDP related parameters: $q^l$, $\sigma^l_{\max}$, $\gamma$, $\epsilon$, $|O|$.
 3: Compute the state set $\Omega_S = \Omega^D_S \times \Omega^I_S$ using (17)-(18) and the state transition matrix, $P$, using (25)-(28).
 4: $\omega^{s,\eta_S}_{i,j} = 0$, $\alpha_{\eta_S} = 1$, $D_{\eta_S} = 0.\ldots$ and $V_0(S) = 0\ \forall S \in \Omega_S$, $n = 1$.
 5: for $S \in \Omega_S$ do
 6:   Compute the set of feasible actions, $\Omega^f_A$, using (20).
 7:   for $A \in \Omega^f_A$ do
 8:     $R^{S,A}_{opt} = 0$, $\rho^{S,A} = \{\}$, $K = \sum_{l=1}^{L} a^l$, $STV = \{\}$.
 9:     for ($l = 1 : L$) do
10:       for ($a = 1 : a^l$) do
11:         Add $l$ to $STV$.
12:       end
13:     end
14:     for ($o = 1 : |O|$) do
15:       $Perm$ = a random permutation of $STV$; add $Perm$ to $\rho^{S,A}$.
16:     end
17:   end
18: end
19: while ($n \ge 1$ && $|V_n(S) - V_{n-1}(S)| \ge \epsilon$) do
20:   for $S = (\lambda, \sigma) \in \Omega_S$ do
21:     Compute the set of feasible actions, $\Omega^f_A$, using (20).
22:     $\eta_S = 1 + \sum_{l=1}^{L} \sigma^l \times \delta^l_\sigma$, $\omega^s_{i,j} = \omega^{s,\eta_S}_{i,j}\ \forall i, j, s$ and $A_r = (0, 0, \dots, 0)$.
23:     for $A \in \Omega^f_A$ do
24:       Select $\rho \in \rho^{S,A}$, $\rho^{S,A} = \rho^{S,A} - \{\rho\}$ and $R(S, A) = 0$.
25:       $V_I = (A, \rho, \omega^s_{i,j})$ and $(\Delta, \xi^{l,k}_p, e^{l,k}, N^s_{i,j}[l,k]) = \text{Algorithm1}(V_I)$.
26:       if ($\Delta = 1$) then
27:         Compute $R(S, A)$ using (30).
28:         Compute $\eta_{S'}$ and $A_r$ using (35).
29:         Update $\omega^{s,\eta_{S'}}_{i,j}$ and $\alpha_{\eta_{S'}}$ using (34) and (33).
30:         if ($R^{S,A}_{opt} \le R(S, A)$) then
31:           $R^{S,A}_{opt} = R(S, A)$, $\rho^{S,A}_{opt} = \rho$.
32:         end
33:       end
34:       $Q^A_n(S) = R(S, A) + \gamma \times \sum_{S' \in \Omega_S} P(S, A_r, S') \times V_{n-1}(S')$.
35:     end
36:     $A = \arg\max_{A \in \Omega^f_A} Q^A_n(S)$.
37:     $V_n(S) = Q^A_n(S)$, $O_n(S) = \rho^{S,A}_{opt}$.
38:   end
39: end

In Lines 19-39, the value iteration is executed to find the optimal action. In each iteration of the value iteration algorithm, the value function of each $S \in \Omega_S$ is updated. The set of feasible actions for state $S$ is determined in Line 21, and the state of the idle resources in the InPs is obtained in Line 22. Then, the value of $Q^A_n(S)$, which is the $n$-th step value of starting in state $S$, taking action $A$, and then continuing with the optimal $(n-1)$-step nonstationary policy, is determined in Lines 23-35. For this purpose, one placement arrangement is selected in Line 24, and the input of the VRSSP algorithm is determined and the VRSSP algorithm is run in Line 25. In the case of a valid input ($\Delta = 1$), the reward of taking action $A$ in state $S$ is computed in Line 27, and the number of admitted services according to the output of the VRSSP algorithm, $A_r$, and the index of the next state, $\eta_{S'}$, are determined in Line 28. Then, the state of the idle resources and the values of $R^{S,A}_{opt}$ and $\rho^{S,A}_{opt}$ are updated in Lines 29-32. The value of $Q^A_n(S)$ is computed in Line 34, and the values of $V_n(S)$ and $O_n(S)$ are updated in Lines 36-37, where $O_n(S)$ is the optimal placement arrangement for the optimal action of the $n$-th step in state $S$.

To evaluate the complexity of Algorithm 3, we compute the computational complexity of each iteration of the value iteration process. The computational complexity of each iteration is $O(|\Omega_S| \times |\Omega_A| \times |U| \times |S|)$, where $|\Omega_S|$ is the size of the state set, $|\Omega_A|$ is the size of the action set, $|S|$ is the total number of servers of all InPs, and $|U| = 2 \times \sum_{l=1}^{L} U^l \times \lambda^l_{\max}$ is the maximum number of stages in the VRSSP algorithm. The memory usage of Algorithm 3 is mostly due to the storage of the placement arrangement information and the value function of the states. The memory required for the placement arrangement information is $|\rho^{S,A}| \times |O| \times |A|$, where $|\rho^{S,A}| = |\Omega_S| \times |\Omega_A|$, $|O|$ is the maximum number of permutations evaluated for each action during Algorithm 3, and $|A| = \sum_{l=1}^{L} \lambda^l_{\max}$ is the maximum number of services that can be admitted according to an action. The memory required for the value function of the states is $|\Omega_S| \times |\Omega_A|$, which corresponds to the memory needed for $Q^A_n(S)$. The memory required for $Q^A_n(S)$ is negligible compared to the memory required for the placement arrangement information. Therefore, the memory requirement of Algorithm 3 is $|\rho^{S,A}| \times |O| \times |A|$.

[Fig. 1: Dynamic reliability-aware service placement procedure. The figure shows the NFVI (InP1 to InP4 connected through a backbone network, with servers, virtual switches, and physical switches), the VVI algorithm fed by the service type information, and the per-slot procedure: at the beginning of the $n$-th slot, form $S_n = \{\lambda_n, \sigma_n\}$, determine $A_n$ and $\rho_n$ for $S_n$, run VRSSP with $V_I = (A_n, \rho_n, O_n)$, admit services and update $\omega^s_{i,j}$ and $\sigma_n$; at the end of the $n$-th slot, determine the departing services, update $\omega^s_{i,j}$, and determine $\sigma_{n+1}$.]

In Fig. 1, the proposed model for dynamic reliability-aware service placement is shown. As seen in this figure, the VVI algorithm takes the characteristics of the service types and the NFVI as inputs to determine the optimal service admission and placement policy.
At the beginning of each slot, the NO takes the state of active services and the state of incoming services as the input to determine the optimal action $(A_n, \rho_n)$. Then, according to the state of the NFVI's resources and the optimal action, the VRSSP algorithm determines the admitted services and their placement. Then, the state of the active services and the NFVI's resources are updated. All of these tasks are conducted at the beginning of each slot. At the end of each slot, the terminated services are determined and the state of the active services and the NFVI's resources are updated.

VII. NUMERICAL RESULTS
In this section, we evaluate the performance of the proposed MDP-based model. We consider three performance metrics: the placement cost, the number of backup servers, and the admission ratio. The placement cost for each service is introduced in (8). The number of backup servers indicates the number of additional servers used to meet the required reliability. The admission ratio is defined as the ratio of the number of services accepted with the required reliability to the number of incoming services. We compare the performance of the proposed model with five static methods for reliability-aware service placement: VRSSP, MinResource, MinReliability, CERA, and RedundantVNF. These algorithms perform the service placement in each slot using the idle resources, without considering the effect of their decisions on subsequent slots. In the VRSSP algorithm introduced in Algorithm 1, the NO tries to admit, at the beginning of the $n$-th slot, all of the services that arrived during the $(n-1)$-th slot.

In the MinResource, MinReliability, and CERA methods, the NO first allocates the main servers to all incoming services. Then, it tries to meet the reliability requirement of each service by allocating backup servers. In all three methods, the main servers are allocated so as to minimize the placement cost; the methods differ in the backup server allocation algorithm. In the MinResource method, for each service, the NO selects the VNF with the minimum required resource and then chooses a backup server for that VNF such that the required reliability is met. Otherwise, a server with the highest reliability and sufficient resource is allocated as a backup [11]. In the MinReliability method, for each service, the NO selects the VNF with the minimum reliability and then chooses a backup server for that VNF such that the required reliability is met. Otherwise, the selected VNF is assigned to a server with the highest reliability and sufficient resource [10].
The CERA method employs a metric named cost importance measure to select the VNF to back up and to place the backup in each iteration, until the required reliability is met [11]. The RedundantVNF method, introduced in [13], iteratively adds backups to the services according to the reliability of each VNF in each service. It is worth noting that this algorithm simultaneously performs main and backup server placement.

Fig. 2: Convergence of the proposed VVI for different scenarios of service admission reward, $q_l$, departure probability, $d_l$, and server cost parameter, $\beta$. (a) Convergence in case of changing service admission reward, $q_l$ (curves for $q_l \in \{1000, 4000, 10000\}$ with $R^s_{i,j} \in \{90, 120\}$). (b) Convergence in case of changing server cost parameter, $\beta$ (curves for $\beta \in \{10, 15\}$ with $R^s_{i,j} \in \{80, 100, 120\}$). Both panels plot the mean of states value against the step of the VVI algorithm.

A. Convergence of VVI Algorithm
Now, we investigate the convergence of Algorithm 3 for finding the optimal policy in different scenarios. We first introduce the simulation setup. We consider an NFV-enabled NO which would like to deliver four service types. The reliability requirement of each type is among { , , , }, according to the SLA of Google Apps [30]. The SFC of each service type consists of three to six VNFs, and the resource demand of each VNF is randomly generated between 20 and 30 units. The number of arriving services of each type in each slot is a random number between zero and two. The NFVI consists of seven InPs with reliability levels from to with step. For each InP, we consider three servers with equal reliability. To evaluate the performance of the MDP model, different values between 80 and 120 for the resource capacity of each server are examined. It is worth noting that Algorithm 3 is performed offline. Therefore, there is no need to execute Algorithm 3 during the slot.

In Fig. 2a, the convergence of the mean state value, $V^M_n = \sum_{S \in \Omega_S} V_n(S) / |\Omega_S|$, is shown for different values of the service admission reward, $q_l$, and two different values of the server capacity. In Fig. 2b, the convergence of $V^M_n$ is shown for two different values of the parameter $\beta$ used in (3) and three different values of the server capacity. As seen in these figures, the convergence rate is not affected by changing the value of $q_l$, the value of $\beta$, or the resource capacity of the servers. However, increasing the value of $q_l$ or the resource capacity of the servers increases the value of $V^M_n$, which is expected. As seen in Fig. 2b, increasing the value of $\beta$ reduces $V^M_n$. According to (3), increasing the value of $\beta$ increases the server cost, which increases the placement cost and reduces the value of $V^M_n$.
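The convergence behaviour discussed above can be illustrated with a minimal value-iteration sketch that records the mean state value after every step. This is not the VVI algorithm itself: the reward table `R`, the transition table `P`, and the function name `value_iteration` are hypothetical toy inputs, and the per-action reward is given directly instead of being obtained from the VRSSP algorithm. Only the stopping rule (largest change of the value function below a threshold) and the mean state value $V^M_n$ follow the text.

```python
def value_iteration(P, R, gamma=0.9, eps=1e-6, max_steps=10_000):
    """Generic value iteration on a small, fully specified MDP (toy sketch).

    P[a][s][t] is the probability of moving from state s to state t under
    action a, and R[s][a] is the immediate reward (both hypothetical inputs).
    Returns the value function, a greedy policy, and the mean state value
    recorded after every step.
    """
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states
    policy = [0] * n_states
    mean_values = []
    for _ in range(max_steps):
        V_new = []
        for s in range(n_states):
            # Q-value of each action: immediate reward plus discounted
            # expected value of the next state under the previous step's V.
            q = [
                R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=q.__getitem__)
            policy[s] = best
            V_new.append(q[best])
        # Mean state value, i.e. the average of V_n(S) over all states.
        mean_values.append(sum(V_new) / n_states)
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < eps:
            break
    return V, policy, mean_values


# Toy MDP with two states and two actions (illustrative numbers only).
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]]
R = [[1.0, 0.0], [0.0, 2.0]]
V, policy, mean_values = value_iteration(P, R)
```

Plotting `mean_values` against the step index gives a curve of the same kind as those in Fig. 2. In this toy setting, scaling the rewards changes the level the curve settles at but not the number of steps to a fixed relative accuracy, since the contraction rate is set by `gamma`.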
Fig. 3: Performance of MDP model and static methods for different values of service departure probability, $d_l$. (a) Admission ratio (percentage) for different values of $d_l$. (b) Average number of backups per VNF for different values of $d_l$. Methods compared: MDP model, VRSSP algorithm, MinResource, MinReliability, and CERA.

B. Performance Comparison of MDP and Static Algorithms
Now, we evaluate the performance of the MDP model by changing the departure probability, $d_l$, and the server resource, $R^s_{i,j}$. The simulation setup is the same as the setup introduced in Section VII-A. Figs. 3a-3b show the admission ratio and the average number of backup servers per VNF, respectively, for the MDP model and four static methods as the service departure probability changes. The simulation is conducted with $q_l = 4000$, $\beta = 15$, and $R^s_{i,j} = 80$. As seen in Fig. 3a, the admission ratio of the MDP model is remarkably higher than that of the other algorithms, especially as the value of $d_l$ decreases. When $d_l$ decreases, the admitted services last for a larger number of slots. In such a scenario, the MDP model is much more efficient at improving the admission ratio. On the other hand, for a large value of $d_l$, the MDP model can be much more efficient at improving the average number of backups, and the improvement of the MDP model in terms of the average number of backups grows as $d_l$ increases, as indicated in Fig. 3b. This improvement is due to the fact that, for large values of $d_l$, the MDP model can find an appropriate placement for each service by admitting the services in proper states.

Figs. 4a-4c show the admission ratio, the placement cost, and the mean number of backups per VNF, respectively, for the MDP model and five static methods as the server resource, $R^s_{i,j}$, changes. The simulation is conducted with $d_l$ = 0 . , $\beta = 15$. The value of $q_l$ is determined based on the number of VNFs in the $l$-th service type. It is worth noting that we expect the advantage of the MDP model to emerge in the full-load scenario, in which the number of incoming services is increased, because, when sufficient resources exist in the InPs, most of the algorithms can admit a high percentage of the incoming services.
To evaluate this, we fix the mean number of incoming services and alter the amount of the InPs' resources. According to Fig. 4a, the admission ratio of the MDP model is significantly higher than that of the static methods, especially for a low amount of server resources. This is because, as the amount of server resources increases, most of the incoming services can be admitted by a simple algorithm, and there is no need for more intelligence in service placement. Also, as shown in Figs. 4b-4c, the placement cost and the average number of backup servers in the MDP model are remarkably lower than those of the MinResource, MinReliability, CERA, and RedundantVNF methods and close to those of the VRSSP algorithm. It is worth noting that the placement cost and the average number of backup servers of RedundantVNF are lower than those of the MDP model for a low amount of resources, because the RedundantVNF method only admits services with a low number of VNFs, which leads to a lower admission ratio compared to the MDP model. However, the RedundantVNF method still performs better in all metrics than the MinResource, MinReliability, and CERA methods.

Fig. 4: Performance of MDP model for different values of server resource, $R^s_{i,j}$, compared to static methods. (a) Admission ratio (percentage) for different values of server resource. (b) Placement cost for different values of server resource. (c) Average number of backups per VNF for different values of server resource. In all panels the horizontal axis is the server resource, ranging from 60 to 110; the methods compared are the MDP model, VRSSP algorithm, RedundantVNF, CERA, MinResource, and MinReliability.

One of the most important aspects of evaluating the performance of the MDP model is its fairness analysis. The most critical characteristics of the services which should be taken into consideration are the reliability requirement and the number of VNFs of the services. For this purpose, Figs. 5a-5b show the admission ratio of the MDP model and the VRSSP algorithm for different numbers of VNFs and different reliability requirements of the service. It is worth noting that, among the introduced static methods, the VRSSP algorithm has the best performance. The simulation is conducted with $d_l$ = 0 . , $\beta = 15$, $R^s_{i,j} = 70$. The value of $q_l$ is determined based on the number of VNFs in the $l$-th service type. As shown in Fig. 5a, the admission ratio of the MDP model is greater than that of the VRSSP algorithm when the number of VNFs is , , , and only when the number of VNFs is , the performance of VRSSP is slightly better than that of the MDP model. In other words, the MDP model can significantly improve the admission ratio at the cost of slightly reducing the admitted services with a high number of VNFs. A similar observation holds when the reliability requirement increases, as seen in Fig. 5b, which indicates that the MDP model can significantly improve the admission ratio at the cost of slightly reducing the admitted services with a high reliability requirement.

Fig. 5: Admission ratio of the MDP model and VRSSP algorithm for different characteristics of the service types. (a) Admission ratio for different numbers of VNFs in the SFC of the service. (b) Admission ratio for different values of the reliability requirement of the service.

While our simulation results demonstrate the proof of concept, the accuracy of the simulations is one of the most important aspects of validating the paper. For this purpose, the repeatability of the simulations and the time horizon for terminating the simulations are two key factors, as discussed in [31], [32]. To fulfill the first aspect, we used MATLAB, version 2017b, which uses the Mersenne Twister, one of the best pseudo-random number generators (PRNG) [31], [33]. Regarding the second aspect, we should discuss two points. First, for determined service types and NFVI, the VVI algorithm runs to determine the optimal policy. Then, due to the random nature of service arrival and departure, a large-enough number of slots should be simulated to evaluate the policy. The required number of slots depends on the size of the state space, $|\Omega_S| = \prod_{l=1}^{L} (\sigma^l_{\max} + 1) \times \prod_{l=1}^{L} (\lambda^l_{\max} + 1)$, which is equal to × = 104976 in the conducted simulation. Therefore, we simulated slots to have sufficient data for evaluating the policy.
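As a quick check of the state-space formula, the sketch below evaluates $|\Omega_S|$ for given per-type bounds. The setup in Section VII-A states four service types with at most two arrivals of each type per slot, so $\lambda^l_{\max} = 2$ is taken from the text. The bound $\sigma^l_{\max} = 5$ for every type is an assumption made here only because it reproduces the reported total of 104976; the function name `state_space_size` is likewise hypothetical.

```python
from math import prod


def state_space_size(sigma_max, lambda_max):
    """Size of the state space: product over service types of
    (sigma_max + 1) times product over service types of (lambda_max + 1)."""
    return prod(s + 1 for s in sigma_max) * prod(l + 1 for l in lambda_max)


# Four service types; lambda_max = 2 follows the stated arrival range,
# sigma_max = 5 is an assumed value chosen to match the reported total.
size = state_space_size(sigma_max=[5, 5, 5, 5], lambda_max=[2, 2, 2, 2])
print(size)  # 6**4 * 3**4 = 1296 * 81 = 104976
```

Under this assumption the two factors are $6^4 = 1296$ and $3^4 = 81$. Other combinations of per-type bounds could yield the same product, so the split should be read as one consistent possibility and not as the authors' stated configuration.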
The second point is the existence of parameters which can change the output policy of the VVI algorithm for a given NFVI. The most influential parameters are the reliability requirement and the number of VNFs in each service type. In the considered simulation setup, there are different states for the reliability requirement, and also different states for the number of the VNFs of service types, which leads to different combinations of these two parameters. However, a great number of these combinations are not meaningful, and simulating all of the combinations is time-consuming. Therefore, for each determined NFVI, we simulate about different combinations. More precisely, to obtain each point in Figs. 3-4, which has a determined NFVI, we simulated 1000 different combinations of reliability requirement and number of VNFs of the services. For each combination, we run the VVI algorithm, and the resulting policy is then evaluated during slots to determine the performance metrics. For better clarification, we precisely report the admission ratio of the MDP model and the VRSSP algorithm, which is the second best algorithm, when $R^s_{i,j} = 70$. The mean improvement of admission ratio using MDP model compared to VRSSP algorithm is . . The minimum and maximum improvement of admission ratio are . and . .

Finally, we compare the runtime of Algorithm 1 with the MinResource, MinReliability, and CERA algorithms. All the simulations whose results are reported in this paper are conducted using a machine with an Intel 2.7 GHz processor and 8 GB of RAM. Using this machine, the mean execution time of Algorithm 1 in each slot is 19 ms, which is negligible compared to the length of the slot. The runtimes of the MinResource, MinReliability, and CERA algorithms are 7 ms, 7 ms, and 12 ms, respectively. Because the RedundantVNF algorithm solves an optimization problem, its runtime is significantly greater than that of the other methods.

VIII. CONCLUSION
In this paper, we investigated reliability-aware service placement considering the dynamic nature of service arrival and departure. We adopted a model based on an infinite horizon MDP for dynamic reliability-aware service placement, considering the simultaneous allocation of the main and backup servers. The reward function of the proposed MDP model was defined such that the admission ratio is maximized and the placement cost is minimized. For evaluating each action, we used a sub-optimal algorithm named VRSSP. Then, we introduced an algorithm named VVI, based on value iteration, for finding the optimal policy. Finally, via extensive simulations, we compared the performance of the MDP model with five static methods and demonstrated the superiority of the MDP model in terms of various criteria. The robustness of the proposed MDP model in different scenarios and its fairness were also investigated.

REFERENCES

[1] I. Afolabi, T. Taleb, K. Samdanis, A. Ksentini, and H. Flinck, “Network slicing and softwarization: A survey on principles, enabling technologies, and solutions,” IEEE Communications Surveys & Tutorials, vol. 20, no. 3, pp. 2429–2453, 2018.
[2] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and R. Boutaba, “Network function virtualization: State-of-the-art and research challenges,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 236–262, 2016.
[3] A. Laghrissi and T. Taleb, “A survey on the placement of virtual resources and virtual network functions,” IEEE Communications Surveys & Tutorials, vol. 21, no. 2, pp. 1409–1434, 2018.
[4] J. G. Herrera and J. F. Botero, “Resource allocation in NFV: A comprehensive survey,” IEEE Trans. on Network and Service Management, vol. 13, no. 3, pp. 518–532, 2016.
[5] A. Osseiran, F. Boccardi, V. Braun, K. Kusume, P. Marsch, M. Maternia, O. Queseth, M. Schellmann, H. Schotten, H. Taoka et al., “Scenarios for 5G mobile and wireless communications: the vision of the METIS project,” IEEE Communications Magazine, vol. 52, no. 5, pp. 26–35, 2014.
[6] C. Pham, N. H. Tran, S. Ren, W. Saad, and C. S. Hong, “Traffic-aware and energy-efficient VNF placement for service chaining: Joint sampling and matching approach,” IEEE Trans. on Services Computing, 2017.
[7] F. Bari, S. R. Chowdhury, R. Ahmed, R. Boutaba, and O. C. M. B. Duarte, “Orchestrating virtualized network functions,” IEEE Trans. on Network and Service Management, vol. 13, no. 4, pp. 725–739, 2016.
[8] M. Mechtri, C. Ghribi, and D. Zeghlache, “A scalable algorithm for the placement of service function chains,” IEEE Trans. on Network and Service Management, vol. 13, no. 3, pp. 533–546, 2016.
[9] S. Herker, X. An, W. Kiess, S. Beker, and A. Kirstaedter, “Data-center architecture impacts on virtualized network functions service chain embedding with high availability requirements,” in Workshop of IEEE Globecom, San Diego, CA, Dec. 2015.
[10] J. Fan, Z. Ye, C. Guan, X. Gao, K. Ren, and C. Qiao, “GREP: Guaranteeing reliability with enhanced protection in NFV,” in Proc. of ACM SIGCOMM Workshop, Heraklion, Greece, July 2015.
[11] W. Ding, H. Yu, and S. Luo, “Enhancing the reliability of services in NFV with the cost-efficient redundancy scheme,” in Proc. of IEEE ICC, Paris, France, May 2017.
[12] J. Fan, M. Jiang, O. Rottenstreich, Y. Zhao, T. Guan, R. Ramesh, S. Das, and C. Qiao, “A framework for provisioning availability of NFV in data center networks,” IEEE JSAC, vol. 36, no. 10, pp. 2246–2259, 2018.
[13] L. Qu, C. Assi, K. Shaban, and M. J. Khabbaz, “A reliability-aware network service chain provisioning with delay guarantees in NFV-enabled enterprise datacenter networks,” IEEE TNSM, vol. 14, no. 3, pp. 554–568, 2017.
[14] J. Sun, G. Zhu, G. Sun, D. Liao, Y. Li, A. K. Sangaiah, M. Ramachandran, and V. Chang, “A reliability-aware approach for resource efficient virtual network function deployment,” IEEE Access, vol. 6, pp. 18238–18250, 2018.
[15] Y. Kanizo, O. Rottenstreich, I. Segall, and J. Yallouz, “Optimizing virtual backup allocation for middleboxes,” IEEE/ACM Trans. on Networking, vol. 25, no. 5, pp. 2759–2772, 2017.
[16] Y. Kanizo, O. Rottenstreich, I. Segall, and J. Yallouz, “Designing optimal middlebox recovery schemes with performance guarantees,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 10, pp. 2373–2383, 2018.
[17] O. Rottenstreich, I. Keslassy, Y. Revah, and A. Kadosh, “Minimizing delay in network function virtualization with shared pipelines,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 1, pp. 156–169, 2016.
[18] X. Chen, W. Ni, I. B. Collings, X. Wang, and S. Xu, “Automated function placement and online optimization of network functions virtualization,” IEEE Trans. on Communications, vol. 67, no. 2, pp. 1225–1237, 2019.
[19] X. Chen, W. Ni, T. Chen, I. Collings, X. Wang, R. P. Liu, and G. B. Giannakis, “Multi-timescale online optimization of network function virtualization for service chaining,” IEEE Trans. on Mobile Computing, 2018.
[20] Y. T. Woldeyohannes, A. Mohammadkhan, K. Ramakrishnan, and Y. Jiang, “Cluspr: Balancing multiple objectives at scale for NFV resource allocation,” IEEE Trans. on Network and Service Management, vol. 15, no. 4, pp. 1307–1321, 2018.
[21] B. Yang, W. K. Chai, Z. Xu, K. V. Katsaros, and G. Pavlou, “Cost-efficient NFV-enabled mobile edge-cloud for low latency mobile applications,” IEEE Trans. on Network and Service Management, vol. 15, no. 1, pp. 475–488, 2018.
[22] H. Cao, H. Zhu, and L. Yang, “Dynamic embedding and scheduling of service function chains for future SDN/NFV-enabled networks,” Accepted in IEEE Access, 2019.
[23] Y. Jia, C. Wu, Z. Li, F. Le, A. Liu, Z. Li, Y. Jia, C. Wu, F. Le, and A. Liu, “Online scaling of NFV service chains across geo-distributed datacenters,” IEEE/ACM Trans. on Networking (TON), vol. 26, no. 2, pp. 699–710, 2018.
[24] V. Eramo, E. Miucci, M. Ammar, and F. G. Lavacca, “An approach for service function chain routing and virtual function network instance migration in network function virtualization architectures,” IEEE/ACM Transactions on Networking, vol. 25, no. 4, pp. 2008–2025, 2017.
[25] V. Eramo, M. Ammar, and F. G. Lavacca, “Migration energy aware reconfigurations of virtual network function instances in NFV architectures,” IEEE Access, vol. 5, pp. 4927–4938, 2017.
[26] J. Liu, W. Lu, F. Zhou, P. Lu, and Z. Zhu, “On dynamic service function chain deployment and readjustment,” IEEE Trans. on Network and Service Management, vol. 14, no. 3, pp. 543–553, 2017.
[27] H. R. Khezri, P. A. Moghadam, M. K. Farshbafan, V. Shah-Mansouri, H. Kebriaei, and D. Niyato, “Deep Q-Learning for dynamic reliability aware NFV-based service provisioning,” arXiv preprint arXiv:1812.00737, 2018.
[28] M. Z. Shafiq, L. Ji, A. X. Liu, and J. Wang, “Characterizing and modeling internet traffic dynamics of cellular devices,” ACM SIGMETRICS Performance Evaluation Review, vol. 39, no. 1, pp. 265–276, 2011.
[29] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artificial intelligence
IEEE Communications magazine, vol. 40, no. 1, pp. 132–139, 2002.
[32] N. I. Sarkar and J. A. Gutiérrez, “Revisiting the issue of the credibility of simulation studies in telecommunication networks: highlighting the results of a comprehensive survey of IEEE publications,” IEEE Communications Magazine, vol. 52, no. 5, pp. 218–224, 2014.
[33] M. Matsumoto and T. Nishimura, “Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator,” ACM Transactions on Modeling and Computer Simulation (TOMACS), vol. 8, no. 1, pp. 3–30, 1998.

APPENDIX
PROCEDURE OF THE VRSSP ALGORITHM
The two important aspects of the Viterbi-based algorithm are the definition of the state and of the decision metric. In the service placement problem, the state of each stage is defined as a set of the servers where the respective VNF can run. Therefore, the set of states in the $m$-th stage, $X_m$, is

$$X_m = \begin{cases} \{0\} \cup G_s, & (m \bmod 2) = 0, \\ G_s, & (m \bmod 2) = 1, \end{cases} \qquad m = 1, \ldots, L_V, \tag{36}$$

where $G_s$ is the set of all servers introduced in Section III-A and $0$ indicates no server assignment, which is applicable only in even stages. To define the decision metric, we exploit the placement cost and the cost of violating the reliability requirement. Let $\Theta^{x_1,x_2}_{m-1,m}$ denote the transition cost between the $x_1$-th state of the $(m-1)$-th stage and the $x_2$-th state of the $m$-th stage, which can be computed as

$$\Theta^{x_1,x_2}_{m-1,m} = \sum_{j=1}^{|R|} r^{\rho_{k_m}}_{u_m,j} \times C_{i^{x_2}_m,j} + DC_{i^{x_2}_m,\,t^{\rho_{k_m}}_{u_m}} + \psi^{x_1,x_2,r}_{m-1,m} + \psi^{x_1,x_2,b}_{m-1,m} + \phi^{x_1}_{m-1}, \tag{37}$$

where $r^{\rho_{k_m}}_{u_m,j}$ is the amount of the $j$-th resource type required for the $u_m$-th VNF of the $k_m$-th service, $\rho_{k_m}$ refers to the type of the $k_m$-th service, and $C_{i^{x_2}_m,j}$ is the unit cost of using the $j$-th resource type of the servers of the $(i^{x_2}_m)$-th InP. Also, $k_m$ denotes the index of the service considered in the $m$-th stage, $u_m$ is the index of the considered VNF of the $k_m$-th service in the $m$-th stage, and $i^{x_2}_m$ indicates the InP index for the $x_2$-th state of the $m$-th stage. $DC_{i^{x_2}_m,\,t^{\rho_{k_m}}_{u_m}}$ refers to the deployment cost of the $(t^{\rho_{k_m}}_{u_m})$-th type of VNF in the servers of the $(i^{x_2}_m)$-th InP. Also, $\phi^{x_1}_{m-1}$ indicates the cost of being in the $x_1$-th state of the $(m-1)$-th stage.
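The stage, state, and decision-metric structure described so far follows the usual Viterbi pattern: for every state of a stage, only the cheapest incoming path (the survivor) is kept. The sketch below is a generic minimum-cost trellis search written under simplifying assumptions and is not the VRSSP algorithm: the names `viterbi_min_cost` and `step_cost`, the two toy servers, and their costs and capacities are all hypothetical, and backup stages, routing cost, and the reliability penalty are omitted. The cost callback receives the survivor path so that a path-dependent feasibility check (here, server capacity) can be expressed.

```python
def viterbi_min_cost(stages, step_cost):
    """Minimum-cost path through a trellis, keeping one survivor per state.

    stages: list of lists, the candidate states of each stage.
    step_cost(m, path, x2): cost of appending state x2 in stage m to the
    survivor path `path`, or None if the move is infeasible.
    Returns (total_cost, path) of the best final survivor, or None.
    """
    survivors = {None: (0, [])}  # virtual start state with zero cost
    for m, states in enumerate(stages):
        nxt = {}
        for x2 in states:
            best = None
            for total, path in survivors.values():
                step = step_cost(m, path, x2)
                if step is None:
                    continue  # e.g. not enough resources on this server
                cand = total + step
                if best is None or cand < best[0]:
                    best = (cand, path + [x2])
            if best is not None:
                nxt[x2] = best  # survivor path for state x2 of stage m
        survivors = nxt
    if not survivors:
        return None
    return min(survivors.values(), key=lambda t: t[0])


# Toy example: place three VNFs on two servers (illustrative numbers only).
unit_cost = {"s1": 1, "s2": 3}
capacity = {"s1": 2, "s2": 2}  # maximum number of VNFs per server


def step_cost(m, path, server):
    # Infeasible if the survivor path already fills this server.
    if path.count(server) >= capacity[server]:
        return None
    return unit_cost[server]


best_cost, best_path = viterbi_min_cost([["s1", "s2"]] * 3, step_cost)
```

Because the step cost depends on the whole survivor path and not only on the previous state, keeping a single survivor per state is a heuristic in general; in this toy case it still finds a placement with two VNFs on the cheaper server and one on the other.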
The ψ x ,x ,bm − ,m is the cost of traffic routing for the transition between the x th state of ( m − th stageand the x th state of m th stage which can be computed as ψ x ,x ,bm − ,m = u m = 1 or x = 0 ,b ρ km × (cid:16) C s x , m − ,s x m i x , m − ,i x m + C s x , m − ,s x m i x , m − ,i x m (cid:17) u m ≥ , m ∈ odds ,b ρ km × (cid:16) C s x , m − ,s x m i x , m − ,i x m + C s x , m − ,s x m i x , m − ,i x m (cid:17) u m ≥ , m ∈ evens , (38)where we have i x , m − = Λ Im − ,x [ m − , s x , m − = Λ Wm − ,x [ m − , i x , m − = Λ Im − ,x [ m − , s x , m − = Λ Wm − ,x [ m − , i x , m − = Λ Im − ,x [ m − , and s x , m − = Λ Wm − ,x [ m − . The parameters Λ Im − ,x and Λ Wm − ,x are the vectors of length m − which indicate the index of InP andserver for the survived path of x th state in the ( m − th stage. Also, s x m indicates the indexof server in the x th state of m th stage. When x = 0 , we assume i x m = s x m = 0 and consider C ,j = 0 , C s x , m − , i x , m − , = 0 , C s x , m − , i x , m − , = 0 , C s x , m − , i x , m − , = 0 . Also, we have C ,s x m ,i x m = 0 .Finally, ψ x ,x ,rm − ,m is the cost of violating reliability requirement for the transition between the x th state of ( m − th stage and the x th state of m th stage which can be computed as ψ x ,x ,rm − ,m = M ρ km × (cid:0) OR x m − T x ,x m − ,m (cid:1) × I (cid:0) OR x m − T x ,x m − ,m (cid:1) , (39)where, M ρ km is the cost of violating the reliability requirement of ( ρ k m ) th service type. It is worthnoting that the value of M ρ km should be selected large enough compared to the placement costto meet the reliability requirement. The parameter OR x m is the objective reliability in the x th stateof m th stage which is defined as OR x m = 1 when x = 0 and otherwise OR x m = 1 − F ρ km . In x = 0 which means no backup server assignment for the considered VNF, the path which leadsto the maximum reliability is selected as a survived path. 
$T^{x_1,x_2}_{m-1,m}$ is defined as the reliability level in the $x_2$-th state of the $m$-th stage if the VRSSP algorithm moves from the $x_1$-th state of the $(m-1)$-th stage to the $x_2$-th state of the $m$-th stage. The value of $T^{x_1,x_2}_{m-1,m}$ can be computed as

$$T^{x_1,x_2}_{m-1,m} = \begin{cases}
1 - v_{i^{x_2}_m}, & u_m = 1,\ m \in \text{odds}, \\
1 - \left(v_{\Lambda^I_{m-1,x_1}[m-1]} \times v_{i^{x_2}_m}\right), & u_m = 1,\ m \in \text{evens}, \\
\tau^{x_1}_{m-1} \times \left(1 - v_{i^{x_2}_m}\right), & u_m \geq 2,\ m \in \text{odds}, \\
\tau^{x_1}_{m-1} \times \dfrac{1 - \left(v_{\Lambda^I_{m-1,x_1}[m-1]} \times v_{i^{x_2}_m}\right)}{1 - v_{\Lambda^I_{m-1,x_1}[m-1]}}, & u_m \geq 2,\ m \in \text{evens},
\end{cases} \tag{40}$$

where $\tau^{x_1}_{m-1}$ is the reliability of being in the $x_1$-th state of the $(m-1)$-th stage.

The survived path for the $x_2$-th state of the $m$-th stage can be computed using the decision metric as

$$SP^{x_2}_m = \operatorname*{argmin}_{x_1 \in XP^{x_2}_m} \left(\Theta^{x_1,x_2}_{m-1,m}\right), \tag{41}$$

$$\Lambda^I_{m,x_2} = \left[\Lambda^I_{m-1,SP^{x_2}_m} \;\; i^{x_2}_m\right], \qquad \Lambda^W_{m,x_2} = \left[\Lambda^W_{m-1,SP^{x_2}_m} \;\; s^{x_2}_m\right], \tag{42}$$

where $SP^{x_2}_m$ indicates the index of the survived path and $XP^{x_2}_m$ is the set of possible input states to the $x_2$-th state of the $m$-th stage. $XP^{x_2}_m$ is the subset of the set of states in the $(m-1)$-th stage ($XP^{x_2}_m \subseteq X_{m-1}$) for which there are enough resources for running the VNF of the $m$-th stage in the server of the $x_2$-th state. In (42), the characteristic of the survived path in the $x_2$-th state of the $m$-th stage is determined, in which $i^{x_2}_m$ and $s^{x_2}_m$ indicate the index of the considered InP and server in the $x_2$-th state of the $m$-th stage, respectively. The values of $\phi^{x_2}_m$ and $\tau^{x_2}_m$ can be, respectively, determined as

$$\phi^{x_2}_m = \phi^{SP^{x_2}_m}_{m-1} + \Theta^{SP^{x_2}_m,x_2}_{m-1,m} - \psi^{SP^{x_2}_m,x_2,r}_{m-1,m}, \tag{43}$$

$$\tau^{x_2}_m = \begin{cases}
1 - v_{i^{x_2}_m}, & u_m = 1,\ m \in \text{odds}, \\
1 - \left(v_{\Lambda^I_{m,x_2}[m-1]} \times v_{i^{x_2}_m}\right), & u_m = 1,\ m \in \text{evens}, \\
\tau^{SP^{x_2}_m}_{m-1} \times \left(1 - v_{i^{x_2}_m}\right), & u_m \geq 2,\ m \in \text{odds}, \\
\tau^{SP^{x_2}_m}_{m-1} \times \dfrac{1 - \left(v_{\Lambda^I_{m,x_2}[m-1]} \times v_{i^{x_2}_m}\right)}{1 - v_{\Lambda^I_{m,x_2}[m-1]}}, & u_m \geq 2,\ m \in \text{evens}.
\end{cases} \tag{44}$$

In the last stage, $L_V$, the best path among the survived paths can be determined using

$$\zeta = \operatorname*{argmin}_{x \in X_{L_V}} \left(\phi^{x}_{L_V} + M_{\rho_K} \times \left(OR^{x}_{L_V} - \tau^{x}_{L_V}\right) \times I\left(OR^{x}_{L_V} - \tau^{x}_{L_V}\right)\right), \qquad P^I_{L_V} = \Lambda^I_{L_V,\zeta}, \qquad P^W_{L_V} = \Lambda^W_{L_V,\zeta}, \tag{45}$$

where $P^I_{L_V}$ and $P^W_{L_V}$ are the indexes of the InP and server with the best path, respectively, and $\zeta$ is the index of the best path in the last stage. The number of candidates for the best path is (cid:12)(cid:12) X L V (cid:12)(cid:12)(cid:12)(cid:12)