Probabilistic Bounds on the End-to-End Delay of Service Function Chains using Deep MDN

Abstract

Ensuring the conformance of a service system's end-to-end delay to service level agreement (SLA) constraints is a challenging task that requires statistical measures beyond the average delay. In this paper, we study the real-time prediction of the end-to-end delay distribution in systems with composite services such as service function chains. In order to have a general framework, we use queueing theory to model service systems, while also adopting a statistical learning approach to avoid the limitations of queueing-theoretic methods such as stationarity assumptions or other approximations that are often used to make the analysis mathematically tractable. Specifically, we use deep mixture density networks (MDN) to predict the end-to-end distribution of the delay given the network's state. As a result, our method is sufficiently general to be applied in different contexts and applications. Our evaluations show a good match between the learned distributions and the simulations, which suggest that the proposed method is a good candidate for providing probabilistic bounds on the end-to-end delay of more complex systems where simulations or theoretical methods are not applicable.


arXiv preprint [cs.PF]

Majid Raeis, Ali Tizghadam and Alberto Leon-Garcia

Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada


Index Terms: Service function chaining, queueing networks, distribution prediction, mixture density networks.

I. INTRODUCTION

Measuring the quality of service (QoS) for a particular service or application is often a challenging task. In addition, most applications and services are composed of more fine-grained services, making quality of service assessment even more complicated. Service function chaining is one such example in which the network services consist of an abstract sequence of service functions (SFs) [1], where each service function provides a specific service, such as load balancing, deep packet inspection, etc. Therefore, the end-to-end performance of the service chain depends on the performance of the constituent SFs, as well as the order of the SFs that are visited by the incoming packets. The end-to-end delay of a service chain is one of the important measures of the QoS, particularly when providing time-sensitive services in which the packets must be processed within some specific deadline. In this case, real-time prediction of the delay distribution can be much more informative compared to the existing single-value delay prediction methods. For instance, the predicted distribution can be used for designing an admission controller that rejects the packets that have a high probability of missing the deadline. Moreover, the predicted distribution can be used for other control purposes such as auto-scaling of virtual network functions (VNFs) in a service chain. We refer the reader to [2]–[4] for some existing works on the performance, auto-scaling and admission control of VNF chains.

In order to study the end-to-end performance of such systems, we take a general approach and do not limit ourselves to a specific application. In particular, we use queueing theory as a general framework for modeling service systems. However, we do not adopt the traditional queueing-theoretic methods because of their limitations and unrealistic assumptions. Instead, we take a statistical learning approach, without making any assumptions about the network topology, or service and inter-arrival time distributions. Specifically, we study the end-to-end delay of a service network, which is one of the important measures of the quality of service, by applying queueing models to the underlying services (SFs in the context of SFC), so that the theoretical analysis is replaced with statistical learning methods. It should be noted that real-time prediction and analysis of the end-to-end delay in queueing networks, under non-stationarity assumptions, is an under-explored problem [5], which will be studied in this paper.

A. Background and Previous Work

Here, we briefly review the literature on real-time delay prediction and analysis of queueing systems, as opposed to steady-state studies. In order to perform real-time prediction, different types of information such as queue length and delay history are typically used. We classify delay analysis in service systems based on two aspects: the analysis technique (queueing-theoretic versus data-based) and the system topology (single-stage versus multi-stage network). Let us first begin with the queueing-theoretic methods for single-stage queueing systems.

One of the earliest works on predicting a customer's waiting time in a multi-server queueing system is [6]. This paper investigates the possibility of improving delay predictions by exploiting information about the system state, and the elapsed service time of the customers in service, under non-exponential service time assumptions. Following on [6], the performance of alternative queue-length-based and delay-history-based predictors for multi-server queues has been studied in [7].

In contrast to the single-stage case, real-time delay prediction in multi-stage queueing systems has not yet been extensively studied. One of the few examples in this category, which is also closely related to this paper, is the approximation model proposed in [8] for predicting the sojourn-time distribution of the customers in multi-stage systems. Using phase-type distributions, the authors develop a model for approximating sojourn-time distributions based on queue length information. Although the authors in [8] use general inter-arrival and service time distributions, the method assumes heavy traffic, stationary distributions and knowledge of the system parameters and the network topology.

Limitations of the queueing-theoretic analysis have led to recent interest in data-based methods such as machine-learning algorithms and data-mining techniques. Combining process mining and queueing-theoretic results, a technique called queue-mining is introduced in [9] for predicting waiting times in service systems. In [10], the authors propose a new predictor, called Q-Lasso, which combines the Lasso method from statistical learning and fluid models from queueing theory. Similar to [9] and [10], most of the existing works in this area focus on single-value delay predictions and provide no information on the distribution of the delay. A work closely related to this paper is [11], which studies delay distribution prediction in single-stage queueing systems using delay history information. Taking a statistical learning approach, the method in [11] is capable of predicting the conditional distribution of the delay under non-stationary conditions, without any knowledge of the system parameters.

B. Motivation

Both queueing-theoretic and data-based methods have their own advantages and shortcomings. One of the main disadvantages of queueing-theoretic methods is that the analysis can easily become intractable when introducing more realistic assumptions, such as general non-stationary inter-arrival and service time distributions. Furthermore, the queueing-theoretic methods require knowledge of the model parameters and the network topology, which might not be available and need to be estimated as well. Another shortcoming of these methods is their limitation in exploiting all the available information. On the other hand, the prediction method and the feature selection process in data-based predictions are usually specialized for a particular application and do not provide much insight into the behaviour of general queueing networks. Uncertainty of the estimations and the distribution of the waiting times are additional pieces that are often missing in data-based methods. The combination of these reasons motivated us to use statistical learning methods to study queueing models under more realistic assumptions, such as non-stationary arrivals and non-exponential service times. Furthermore, our proposed method enables us to estimate the conditional distribution of the end-to-end delay, which is much more informative than single-value predictions.

The remainder of this paper is organized as follows. In Section II, we describe the queueing system model and formulate the problems that we study in this paper. We begin our analysis by providing some theoretical results on the end-to-end delay of the service networks in Section III. In Section IV, we propose to use Gaussian mixture models as an approximation of the delay distribution, the parameters of which can be estimated using mixture density networks. The evaluation of the proposed methods is presented in Section V. Finally, Section VI presents the conclusions.

Fig. 1. Network topologies: (a) tandem queue; (b) acyclic queue.

II. SYSTEM MODEL AND PROBLEM SETTING

We consider multi-server queueing systems, with infinite queue size and First Come First Serve (FCFS) service discipline, as the building blocks of the service networks that we study. Furthermore, we study tandem queues and simple acyclic queueing networks as shown in Fig. 1. In a tandem topology, a customer must go through all the stages to receive the end-to-end service, while in an acyclic topology, the customers randomly go through one of the branches with the probabilities specified in Fig. 1. Our MDN-based method does not assume a specific distribution for the service times or inter-arrival times and therefore, these processes can have non-stationary distributions. It should also be noted that our MDN-based method is not limited to the earlier mentioned assumptions about the service discipline or network topologies, and we only consider them so that we can obtain theoretical baselines for comparison.

Consider a network consisting of $N$ queueing systems, where system $n$, $1 \le n \le N$, is a multi-server queueing system with $c_n$ servers. Let $b_n$ denote the queue length (QL) of the $n$th queueing system upon arrival of the customer of interest. We define the queue length information vector as $\mathbf{b} = [b_1, b_2, \cdots, b_N]$. Furthermore, $D(\mathbf{b})$ denotes the end-to-end delay of a new arrival given queue length information $\mathbf{b}$ upon arrival. Our goal is to predict the distribution of a new arrival's end-to-end delay, based on the observed QL information upon arrival of the customer of interest. In other words, we are interested in obtaining the distribution of $D(\mathbf{b})$.

As we mentioned earlier, an important reason for estimating the distribution of the delay is to obtain probabilistic bounds instead of making single-value predictions. More specifically, we can define probabilistic lower bounds ($d_{lb}$) and upper bounds ($d_{ub}$) as follows:

$$P\big(D(\mathbf{b}) > d_{ub}\big) \le \varepsilon_{ub}, \tag{1}$$

$$P\big(D(\mathbf{b}) < d_{lb}\big) \le \varepsilon_{lb}, \tag{2}$$

where $\varepsilon_{ub}$ and $\varepsilon_{lb}$ are the violation probabilities for the upper bounds and the lower bounds, respectively. The confidence interval is another statistic that will be used in this paper to measure the amount of uncertainty for each prediction. Since the confidence intervals will be used along with the MMSE predictions, we define the confidence interval for the random delay $D(\mathbf{b})$ as an interval with endpoints $\big(E[D(\mathbf{b})] - x,\; E[D(\mathbf{b})] + x\big)$ such that

$$P\Big(E[D(\mathbf{b})] - x < D(\mathbf{b}) < E[D(\mathbf{b})] + x\Big) \ge P_{cl}, \tag{3}$$

where $P_{cl}$ denotes the corresponding confidence level.

Finally, we study single-value prediction of the end-to-end delay, which will be denoted by $\hat{D}(\mathbf{b})$. In particular, we are interested in computing the Minimum Mean Square Error (MMSE) predictions, which can be obtained by the conditional expectation of the end-to-end delay, given QL information $\mathbf{b}$. In other words, it is well known that $E[D(\mathbf{b})]$ minimizes the MSE of the predictor, which is defined as

$$\mathrm{MSE}\big(\hat{D}(\mathbf{b})\big) \equiv E\Big[\big(D(\mathbf{b}) - \hat{D}(\mathbf{b})\big)^2\Big]. \tag{4}$$

III. END-TO-END DELAY ANALYSIS

In this section, we analyze the end-to-end delay of multi-stage queueing systems with tandem and acyclic topologies. We only consider queue-length-based methods and investigate the differences between our approach and the existing ones. We begin with some approximations of the first two moments of the end-to-end delay. Using the first two moments, we discuss the normal approximation method, which motivates the use of mixtures of Gaussians for approximating the conditional distribution of the end-to-end delay.

A. Analytical Expressions

We begin by considering a tandem network of $N$ queues as in Fig. 1a. We define the end-to-end delay of a customer as the sum of the waiting times and the service times that are experienced while going through the network, i.e.,

$$D = \sum_{n=1}^{N} (W_n + S_n), \tag{5}$$

where $W_n$ and $S_n$ represent the waiting time and service time of the customer of interest at stage $n$. Let $b_n^{\tau}$, $\tau \in \{0, 1, \cdots, N-1\}$, denote the queue length at stage $n$ once the customer of interest reaches stage $\tau + 1$. As before, $b_n$ represents the queue length at stage $n$ once the customer of interest enters the network, i.e., $b_n \equiv b_n^{0}$. In order to simplify the notation, we define $q_n \equiv b_n^{n-1}$, which represents the queue length of the $n$th queueing system upon arrival of the customer of interest at this stage. In other words, $q_n$, $1 \le n \le N$, represent the sequence of queue lengths that the customer of interest observes as he goes through stages 1 to $N$. As will be discussed later in this section (see Fig. 2), $q_n$ and $b_n$ are not necessarily equal for $n \ge 2$ and therefore, the vector $\mathbf{q} = [q_1, q_2, \cdots, q_N]$ needs to be approximated from $\mathbf{b}$ and other system parameters. We propose an algorithm for this purpose at the end of this section.

Now, having an estimation of $\mathbf{q}$, we can write $W_n$ as $\sum_{i=1}^{q_n+1} U_{n,i}$, where $U_{n,i}$, $1 \le i \le q_n + 1$, denote the intervals between successive service completions at stage $n$. For mathematical tractability, we can approximate $U_{n,i}$ by $S_{n,i}/c_n$ under the exponential service time assumption, where $S_{n,i}$ is a random variable with the same distribution as the service times in stage $n$. It should be noted that we only use the exponential service time assumption to make the derivations tractable, and we will not limit our analysis to a particular arrival or service time distribution. Now, using the approximation, we can write the end-to-end delay as

$$D(\mathbf{q}) \overset{D}{\simeq} \sum_{n=1}^{N} \left( \sum_{i=1}^{q_n+1} \frac{S_{n,i}}{c_n} + S_n \right), \tag{6}$$

where '$\overset{D}{\simeq}$' indicates equality (with approximation) in distribution. Assuming independent service times, the conditional mean and variance of the end-to-end delay can be obtained from Eq. (6) as follows:

$$E[D(\mathbf{q})] \simeq \sum_{n=1}^{N} \left( \frac{q_n + 1}{c_n} + 1 \right) E[S_n], \tag{7}$$

$$Var[D(\mathbf{q})] \simeq \sum_{n=1}^{N} \left( \frac{q_n + 1}{c_n^2} + 1 \right) Var[S_n]. \tag{8}$$

As mentioned earlier, we are interested in the MMSE prediction, $E[D(\mathbf{b})]$, which can be approximated by $E[D(\mathbf{q})]$. The following theorem states that using $E[D(\mathbf{q})]$ as our prediction of the end-to-end delay has the desirable property that the predictor becomes relatively more accurate, i.e., $Var[D(\mathbf{q})]/E[D(\mathbf{q})]^2 \to 0$, as the number of customers waiting in the queues or the number of stages in the network increases.

Theorem 1: For a tandem queueing network with independent and exponentially distributed service times, we have

$$c_D^2(\mathbf{q}) = \frac{Var[D(\mathbf{q})]}{E[D(\mathbf{q})]^2} \to 0 \quad \text{as} \quad \sum_{n=1}^{N} (q_n + 1) \to \infty, \tag{9}$$

where $c_D^2(\mathbf{q})$ is the squared coefficient of variation (SCV) of the end-to-end delay given $\mathbf{q}$.

Proof: Let us define $Q = \sum_{n=1}^{N} (q_n + 1)$, so that $N \le Q$. Using Eq. (8), we have

$$Var[D(\mathbf{q})] = \sum_{n=1}^{N} \frac{q_n + 1}{c_n^2}\, Var[S_n] + \sum_{n=1}^{N} Var[S_n] \le Q\beta_1 + N\beta_2, \tag{10}$$

where $\beta_1 = \max_{1 \le m \le N} \{ Var[S_m]/c_m^2 \}$ and $\beta_2 = \max_{1 \le m \le N} \{ Var[S_m] \}$. Similarly, using Eq. (7) we obtain

$$E[D(\mathbf{q})] = \sum_{n=1}^{N} \frac{q_n + 1}{c_n}\, E[S_n] + \sum_{n=1}^{N} E[S_n] \ge Q\beta_3 + N\beta_4, \tag{11}$$

where $\beta_3 = \min_{1 \le m \le N} \{ E[S_m]/c_m \}$ and $\beta_4 = \min_{1 \le m \le N} \{ E[S_m] \}$. Combining Eqs. (10) and (11), we have

$$c_D^2(\mathbf{q}) = \frac{Var[D(\mathbf{q})]}{E[D(\mathbf{q})]^2} \le \frac{Q\beta_1 + N\beta_2}{(Q\beta_3 + N\beta_4)^2}. \tag{12}$$

Consequently, since $N \le Q$, we have $c_D^2(\mathbf{q}) \le (\beta_1 + \beta_2)/(Q\beta_3^2)$ and therefore $c_D^2(\mathbf{q}) \to 0$ as $Q$ grows to infinity.

Fig. 2. Change of queue lengths as the customer of interest proceeds through a tandem network.

B. Updating Queue Lengths Through Time
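The queue-length update described in this subsection feeds $\mathbf{q}$ into Eqs. (7) and (8), so it may help to first see those expressions evaluated for a given $\mathbf{q}$. The following is a minimal Python sketch, not the authors' code; the function names `delay_moments` and `scv` and the three-stage example parameters are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def delay_moments(q: Sequence[int], c: Sequence[int],
                  mean_s: Sequence[float],
                  var_s: Sequence[float]) -> Tuple[float, float]:
    """Approximate conditional mean and variance of the delay (Eqs. (7), (8))."""
    mean = 0.0
    var = 0.0
    for q_n, c_n, m_n, v_n in zip(q, c, mean_s, var_s):
        mean += ((q_n + 1) / c_n + 1.0) * m_n        # per-stage term of Eq. (7)
        var += ((q_n + 1) / c_n ** 2 + 1.0) * v_n    # per-stage term of Eq. (8)
    return mean, var


def scv(q: Sequence[int], c: Sequence[int],
        mean_s: Sequence[float], var_s: Sequence[float]) -> float:
    """Squared coefficient of variation of Theorem 1, Var / E^2."""
    mean, var = delay_moments(q, c, mean_s, var_s)
    return var / mean ** 2


if __name__ == "__main__":
    # Hypothetical three-stage tandem network (values are illustrative only).
    c = [2, 1, 2]
    mu = [0.5, 1.0, 0.5]
    mean_s = [1.0 / m for m in mu]
    var_s = [m ** 2 for m in mean_s]  # exponential service: variance = mean^2
    for scale in (1, 10, 100):
        q = [3 * scale, 2 * scale, scale]
        print(scale, scv(q, c, mean_s, var_s))
```

In this example the printed SCV shrinks as the queues grow, which is the behaviour Theorem 1 describes.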

In the previous subsection, we obtained our theoretical results based on the knowledge of queue lengths upon arrival of the customer of interest at each stage ($\mathbf{q}$). As shown in Fig. 2, $\mathbf{q}$ is not necessarily equal to $\mathbf{b}$ and needs to be approximated. In this subsection, we explain an algorithm for approximating $\mathbf{q}$ from $\mathbf{b}$ in a tandem network, which is close to the method proposed in [8].

Let us define $T_n$ as the experienced sojourn time of the customer of interest at stage $n$. Furthermore, $\mu_n$ represents the service rate of a single server at stage $n$. We consider $N-1$ updating steps, where the customer of interest proceeds from one stage to the next in each step. The vector $\mathbf{q}$ is initialized to $\mathbf{b}$. In the first step, $\tau = 1$, we update the number of customers in the downstream stages, i.e., 2 to $N$, given that the customer of interest has just arrived at the second queue. We assume that all $b_1 + c_1$ customers in front of the customer of interest have arrived at the second stage. Furthermore, assuming busy servers during this time, we can estimate the number of departed customers from stage 2 by $c_2 \mu_2 E[T_1]$. Therefore, $b_2$ will be updated as $\max\{0,\; b_2 + b_1 + c_1 - \lfloor c_2 \mu_2 E[T_1] \rfloor\}$. For stages $n > 2$, we update the queue lengths by $\max\{0,\; b_n + \lfloor c_{n-1} \mu_{n-1} E[T_1] \rfloor - \lfloor c_n \mu_n E[T_1] \rfloor\}$, where the number of customers arriving from the upstream stage is equal to $\lfloor c_{n-1} \mu_{n-1} E[T_1] \rfloor$. Similarly, in updating step $\tau > 1$, $b_n^{\tau}$, $\tau + 1 \le n \le N$, are updated as explained in Algorithm 1.

Algorithm 1 Calculating queue lengths upon arrival of the customer of interest at each stage

    q ← b    (with $b_n^{0} \equiv b_n$)
    for τ = 1 to N − 1 do
        $b_{\tau+1}^{\tau} \leftarrow \max\{0,\; b_{\tau+1}^{\tau-1} + b_{\tau}^{\tau-1} + c_{\tau} - \lfloor c_{\tau+1}\, \mu_{\tau+1}\, E[T_{\tau}] \rfloor\}$
        $q_{\tau+1} \leftarrow b_{\tau+1}^{\tau}$
        for n = τ + 2 to N do
            $b_{n}^{\tau} \leftarrow \max\{0,\; b_{n}^{\tau-1} + \lfloor c_{n-1}\, \mu_{n-1}\, E[T_{\tau}] \rfloor - \lfloor c_{n}\, \mu_{n}\, E[T_{\tau}] \rfloor\}$
        end for
    end for

A similar approach can be used for updating the queue lengths in an acyclic network, by taking into account the probabilities of each branch. It should be noted that this algorithm is based on the heavy-traffic assumption and will only be used to obtain theoretical results for comparison with our main MDN-based method.

IV. GAUSSIAN MIXTURE MODEL APPROXIMATION
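The approximations in this section start from $\mathbf{q}$, which Algorithm 1 derives from $\mathbf{b}$. A minimal Python sketch of that algorithm is given here for a tandem network. One ingredient is an assumption of this sketch rather than something stated in the text: the expected sojourn time $E[T_\tau]$ is taken as the stage-$\tau$ term of Eq. (7) with $E[S_\tau] = 1/\mu_\tau$. The function name `update_queue_lengths` is likewise hypothetical.

```python
import math
from typing import List, Sequence


def update_queue_lengths(b: Sequence[int], c: Sequence[int],
                         mu: Sequence[float]) -> List[int]:
    """Approximate q (queue length seen at each stage) from b, as in Algorithm 1.

    Lists are 0-based, so index i holds the quantities of stage i + 1.
    """
    n_stages = len(b)
    cur = list(b)   # b^{tau-1}: state before the current updating step
    q = list(b)     # q_1 = b_1; later entries are overwritten below
    for tau in range(1, n_stages):          # tau = 1, ..., N-1
        i = tau - 1                          # 0-based index of stage tau
        # ASSUMPTION: E[T_tau] from the per-stage term of Eq. (7), E[S] = 1/mu.
        t = ((q[i] + 1) / c[i] + 1.0) / mu[i]
        nxt = list(cur)
        # Stage tau+1: everyone ahead at stage tau has moved downstream.
        nxt[i + 1] = max(0, cur[i + 1] + cur[i] + c[i]
                         - math.floor(c[i + 1] * mu[i + 1] * t))
        q[i + 1] = nxt[i + 1]
        # Stages tau+2, ..., N: inflow from upstream minus own departures.
        for n in range(i + 2, n_stages):
            nxt[n] = max(0, cur[n]
                         + math.floor(c[n - 1] * mu[n - 1] * t)
                         - math.floor(c[n] * mu[n] * t))
        cur = nxt
    return q


if __name__ == "__main__":
    # Hypothetical two-stage example with single servers and unit service rates.
    print(update_queue_lengths([2, 1], [1, 1], [1.0, 1.0]))
```

Like Algorithm 1 itself, the sketch assumes busy servers (heavy traffic) and is only meant as a baseline for comparison.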

In this section, we use our results from Section III-A to approximate the distribution of the end-to-end delay given queue length information $\mathbf{b}$. Since the end-to-end delay for a new customer consists of the intervals between service completion times of the customers ahead of the new arrival, the central limit theorem (CLT) and Eq. (6) suggest that the normal distribution can be a good candidate for approximating the conditional distribution in a tandem network. Specifically, we approximate the conditional distribution of the delay in a tandem network with $D(\mathbf{b}) \sim \mathcal{N}\big(m(\mathbf{b}), \sigma^2(\mathbf{b})\big)$, where $m(\mathbf{b})$ and $\sigma^2(\mathbf{b})$ can be approximated by calculating $\mathbf{q}$ from $\mathbf{b}$ using Algorithm 1, and then using Eqs. (7) and (8). Now, considering each path of the acyclic network as a tandem queue and using the normal approximation for each path, we can approximate the total distribution of the delay in an acyclic network by Gaussian mixture models (GMM), where the mixture weights are equal to the probabilities of taking each branch. More specifically, for an acyclic network as in Fig. 1b, we have

$$P\big(D(\mathbf{b})\big) = \sum_{k \in \mathcal{P}} p_k\, \mathcal{N}\big(D \mid m_k(\mathbf{b}), \sigma_k^2(\mathbf{b})\big), \tag{13}$$

where $\mathcal{P}$ is the set of existing paths in the acyclic network, $p_k$ is the probability of taking path $k$, and $m_k(\mathbf{b})$ and $\sigma_k^2(\mathbf{b})$ denote the mean and variance of the delay for path $k$, given queue length information $\mathbf{b}$ upon arrival.

In order to have a preliminary assessment of the proposed approximations, we perform some evaluations on network topologies similar to those in [8] (see the Tandem I and Acyclic I topologies in Table I). Fig. 3 shows the comparison between the PDFs of the end-to-end delay obtained from the simulation, the approximation method in [8] and our GMM approximation method, for the tandem and acyclic networks. It can be observed that the normal distribution can be a good approximation of the conditional distribution of the delay for the tandem network. Furthermore, Fig. 3b shows that the conditional distribution of the end-to-end delay of the acyclic network consists of two modes, since the conditional means of the two paths are far from each other relative to their standard deviations. As can be seen from Fig. 3, the GMM approximation provides acceptable results, even in comparison with the more complex method of [8]. However, both of these methods are limited to stationary systems with heavy traffic and require knowledge of the network topology, as well as other parameters of the network, such as the average service times, which might not be available in practice. In order to address these issues, we adopt a statistical learning approach to estimate the parameters of the Gaussian mixture models.

Fig. 3. Comparison of the conditional PDFs of the end-to-end delay (horizontal axis: delay; vertical axis: PDF) in the (a) tandem network, given queue lengths b = [6, , ], with curves Simulation, Norm. Approx., DMDN and Approx. [Gue]; (b) acyclic network, given queue lengths b = [6, , , ], with curves MDN, GMM, Simulation and Approx. [Gue].

A. Mixture Density Networks (MDNs)
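Before the mixture parameters are learned from data, it may help to see the path-wise mixture of Eq. (13) written out with fixed parameters. The sketch below evaluates such a density in plain Python and checks whether two paths produce two visible modes. The branch probabilities, means and variances are hypothetical values picked for illustration; they are not the parameters behind Fig. 3.

```python
import math
from typing import Sequence


def normal_pdf(d: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(d - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def path_mixture_pdf(d: float, weights: Sequence[float],
                     means: Sequence[float],
                     variances: Sequence[float]) -> float:
    """Eq. (13): one Gaussian per path, weighted by the branch probability."""
    return sum(p * normal_pdf(d, m, v)
               for p, m, v in zip(weights, means, variances))


if __name__ == "__main__":
    # Hypothetical two-path acyclic network (illustrative numbers only).
    weights = [0.5, 0.5]       # branch probabilities p_k
    means = [20.0, 50.0]       # m_k(b), e.g. from Eq. (7) applied per path
    variances = [16.0, 25.0]   # sigma_k^2(b), e.g. from Eq. (8) applied per path
    for d in (20.0, 35.0, 50.0):
        print(d, path_mixture_pdf(d, weights, means, variances))
```

With means 30 apart and standard deviations of 4 and 5, the density at the midpoint is far below the density at either mean, which is the two-mode situation described for the acyclic network above.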

As we discussed in the previous subsection, the Gaussian mixture model could be a good candidate for approximating the conditional distribution of the end-to-end delay in tandem or acyclic networks. However, the theoretical expressions obtained in Section III become less accurate as the number of customers in the network decreases or the degree of non-stationarity increases. Nevertheless, we can still approximate the conditional distribution of the end-to-end delay by GMMs, since they are powerful enough to approximate arbitrary distributions [12].

In order to address the problems related to GMM parameter estimation under more realistic assumptions, we adopt a statistical learning approach called Mixture Density Networks (MDN). The MDN provides a general framework for approximating arbitrary conditional distributions using mixture models. Considering Gaussian components, an MDN approximates the conditional distribution of $y$ given $\mathbf{x}$ by

$$P(y \mid \mathbf{x}) = \sum_{k=1}^{K} \pi_k(\mathbf{x})\, \mathcal{N}\big(y \mid m_k(\mathbf{x}), \sigma_k^2(\mathbf{x})\big), \tag{14}$$

where $\pi_k(\mathbf{x}) \in (0, 1)$ are the mixing coefficients, and $m_k(\mathbf{x})$ and $\sigma_k^2(\mathbf{x})$ denote the mean and variance of the $k$th kernel, $1 \le k \le K$, given $\mathbf{x}$. An MDN estimates the parameters of the mixture model using a fully-connected neural network. As shown in Fig. 4, the output layer consists of three types of nodes which predict the parameters of the mixture model in Eq. (14). The first type uses the softmax activation function to predict the mixing coefficients such that $0 \le \pi_k \le 1$ and $\sum_k \pi_k = 1$. The second group, which predicts the variances of the kernels, uses exponential activations to ensure non-negative values. The last group of nodes uses linear activations and computes the means of the kernels. Using a data set of $N_{sample}$ observations (queue lengths) and their corresponding target values (end-to-end delay), $\{(\mathbf{x}_j = \mathbf{b}_j,\; y_j = D_j) \mid 1 \le j \le N_{sample}\}$, the mixture density network learns the weights of the neural network by minimizing the error function, which is defined as the negative logarithm of the likelihood, i.e.,

$$E = -\sum_{j=1}^{N_{sample}} \ln\big\{ P(y_j \mid \mathbf{x}_j) \big\}. \tag{15}$$

We refer the reader to [13], [14] for more information on the applications of MDNs.

Fig. 4. Using a mixture density network (MDN) for approximating the conditional distribution of y given x.

V. EVALUATION AND NUMERICAL RESULTS
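The experiments in this section rely on the MDN output layer and error function of Eqs. (14) and (15). As a concrete reference, the following framework-free Python sketch maps raw output-layer values to mixture parameters and evaluates the negative log-likelihood. It covers only the output activations and the loss, not the hidden layers or training; the function names and the raw values in the example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[List[float], List[float], List[float]]  # (pis, means, variances)


def mdn_head(z_pi: Sequence[float], z_mean: Sequence[float],
             z_var: Sequence[float]) -> Params:
    """Turn raw output-layer values into valid mixture parameters."""
    top = max(z_pi)                                   # shift for numerical stability
    exps = [math.exp(z - top) for z in z_pi]
    total = sum(exps)
    pis = [e / total for e in exps]                   # softmax: sums to 1
    means = list(z_mean)                              # linear activation
    variances = [math.exp(z) for z in z_var]          # exponential: positive
    return pis, means, variances


def mixture_pdf(y: float, params: Params) -> float:
    """Eq. (14): Gaussian mixture density at y."""
    pis, means, variances = params
    return sum(p * math.exp(-(y - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
               for p, m, v in zip(pis, means, variances))


def negative_log_likelihood(targets: Sequence[float],
                            params_per_sample: Sequence[Params]) -> float:
    """Eq. (15): sum of -ln P(y_j | x_j) over the data set."""
    return -sum(math.log(mixture_pdf(y, p))
                for y, p in zip(targets, params_per_sample))


if __name__ == "__main__":
    # Hypothetical raw outputs for one input x and K = 3 kernels.
    params = mdn_head([0.2, -1.0, 0.5], [30.0, 45.0, 60.0], [2.0, 2.5, 3.0])
    print(params)
    print(negative_log_likelihood([42.0], [params]))
```

In a real MDN the three raw vectors would be produced by the last hidden layer for each input $\mathbf{x}$, and Eq. (15) would be minimized by gradient descent.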

In this section, we evaluate the accuracy of our proposed methods and compare some of our results to the existing method from [8] under different network topologies that are summarized in Table I. Furthermore, we consider multiple types of arrival processes, including non-stationary and non-renewal arrivals (Table II). It should be noted that time is normalized by the mean service time of the ingress queue, i.e., $1/(c_1 \mu_1)$, in all of the following experiments.

TABLE I
NETWORK TOPOLOGIES

Topology      Num. of servers [c_1, c_2, ..., c_N]    Service rates [µ_1, µ_2, ..., µ_N]    Arrival type
Tandem I      [5, , 2]                                [0. , . , . ]                         Gamma
Tandem II     [3, , , , , 4]                          [0. , . , ..., . ]                    NHPP
Acyclic I     [5, , , 2]                              [0. , . , . , . ]                     Gamma
Acyclic II    [1, , , , 1]                            [1. , . , . , . , . ]                 MMPP

TABLE II
ARRIVAL AND SERVICE MODELS

Distribution            Parameters
Gamma (Arrival)         λ = 0. , SCV = .
NHPP (Arrival)          λ̄ = 0. , α = 0. , T_p = 144
MMPP (Arrival)          λ̄ = 0. , P_on→off = 0.4, P_off→on = 0.1
Gamma (Service time)    for µ see Table I, SCV = .

Let us start with revisiting the delay distribution prediction experiment in Section IV and compare our MDN-based predictions to the previously obtained theoretical and simulation results. Our MDN-based predictor consists of three hidden layers with 64, 32 and 32 hidden nodes, and uses the ReLU activation function for the hidden layers. The MDN layer outputs the parameters of a Gaussian mixture model with 3 kernels. The queue lengths of all the stages in the network are used as the feature set, while the ground-truth end-to-end delays serve as the labels. The MDN-based predictor is trained for 500 epochs with a batch size of 512. Fig. 3 shows the predicted probability density function obtained from the MDN-based method along with the simulation and theoretical results, for the Tandem I and Acyclic I topologies (see Table I). As we can see, the MDN can provide acceptable estimations of the conditional distribution, without any knowledge of the network topology or its parameters. Furthermore, once the neural network has been trained, the distribution predictions can be obtained in real time. This can be a significant benefit compared to the more complex methods, such as the approximation method in [8], which require a large number of convolution operations.

Application to Service Function Chaining

As discussed in Section I, service function chaining is one of the examples of the service networks that can benefit from our MDN-based method by providing real-time service guarantees. We model each service function with a multi-server queueing system, where each server represents an instance of a service function.
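One way a predicted mixture could back such a guarantee is through its tail probability and quantiles, following Eqs. (1) and (2). The sketch below is an illustration under stated assumptions rather than the authors' controller: the function names, the threshold `eps`, and the mixture parameters in the example are hypothetical, while the deadline of 80 is the value used in the admission control experiment.

```python
import math
from typing import Sequence


def mixture_cdf(d: float, pis: Sequence[float], means: Sequence[float],
                variances: Sequence[float]) -> float:
    """P(D <= d) for a Gaussian mixture, using the normal CDF via erf."""
    return sum(p * 0.5 * (1.0 + math.erf((d - m) / math.sqrt(2.0 * v)))
               for p, m, v in zip(pis, means, variances))


def miss_probability(deadline: float, pis, means, variances) -> float:
    """P(D > deadline), the left-hand side of Eq. (1)."""
    return 1.0 - mixture_cdf(deadline, pis, means, variances)


def admit(deadline: float, eps: float, pis, means, variances) -> bool:
    """Admit a packet only if its predicted miss probability is below eps."""
    return miss_probability(deadline, pis, means, variances) < eps


def quantile(level: float, pis, means, variances, tol: float = 1e-6) -> float:
    """Smallest d with P(D <= d) >= level, found by bisection on the CDF."""
    lo = min(m - 10.0 * math.sqrt(v) for m, v in zip(means, variances))
    hi = max(m + 10.0 * math.sqrt(v) for m, v in zip(means, variances))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, pis, means, variances) < level:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical predicted mixture for one arriving packet.
    pis, means, variances = [0.7, 0.3], [55.0, 75.0], [36.0, 49.0]
    eps = 0.1   # hypothetical violation probability, not the paper's setting
    print(admit(80.0, eps, pis, means, variances))
    print(quantile(eps, pis, means, variances),        # d_lb of Eq. (2)
          quantile(1.0 - eps, pis, means, variances))  # d_ub of Eq. (1)
```

Bisection is used for the quantile because a Gaussian mixture has no closed-form inverse CDF; any monotone root finder would serve equally well.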

Admission Control:

Let us consider a service function chain as in Fig. 6a, which is modeled by a queueing network with topology Tandem I. Moreover, we assume that every packet must be processed within an end-to-end deadline of $d_{ub} = 80$, otherwise it is considered useless. Our goal is to design an admission controller that drops the packets with a high probability of missing the deadline (more than ) at the entrance of the service chain, so that there will be more room for the following packets and therefore, higher throughputs can be obtained. Fig. 5 shows the end-to-end delay of around K packets that have made it through the service chain, both with and without the admission controller. As can be observed, a smaller number of packets miss the deadline with admission control (Fig. 5b), which shows how dropping a few packets that have a high probability of missing the deadline at the entrance can have a large impact on the end-to-end delay of the following packets. Although the admission controller only rejects the packets for which $P(D > d_{ub}) \ge$ . , the end-to-end throughput has been increased from to . The real-time prediction of the delay distribution in an SFC can be used for different purposes such as SLA compliance prediction and scaling service function chains.

Fig. 5. End-to-end packet delay in a service chain: (a) without admission control; (b) with an admission controller that guarantees P(D > d_ub) < .

Fig. 6. Service function chain: (a) tandem; (b) acyclic.

Multimodal Distribution:

Consider a service chain comprising three service functions as in Fig. 6b. The incoming traffic to SF 1 will be forwarded to one of the three instances of SF 2 with probabilities / , / and / (Fig. 6b). Finally, all instances of SF 2 forward their packets to SF 3. This service chain can be modeled by the acyclic queueing network in Fig. 1b, the parameters of which are summarized in Table I (topology Acyclic II). Moreover, we assume non-renewal arrivals, which are modeled by a Markov Modulated Poisson Process (MMPP) as described in Table II. Fig. 7 shows the comparison between the PDFs obtained from the GMM approximation method and the MDN-based method, given queue lengths b = [6, , , , ]. As can be observed, the MDN-based predictor is capable of capturing the three existing modes in the conditional distribution of the delay, and has a good match with the GMM approximation method.

Fig. 7. Comparison of the conditional PDFs of the end-to-end delay (horizontal axis: delay; vertical axis: PDF; curves: DMDN, GMM) in the SFC shown in Fig. 6b, given queue lengths b = [6, , , , ].

Probabilistic Bounds:

As we discussed earlier, the predicted distributions can also be used to obtain probabilistic bounds on the end-to-end delay. For this experiment, we consider a service chain consisting of service functions in tandem, the parameters of which are summarized in Table I (topology Tandem II). Furthermore, we assume non-stationary arrivals modeled by a Non-homogeneous Poisson Process (NHPP). We adopt the same model as in [7] with a sinusoidal arrival rate, i.e., we consider an arrival rate of $\lambda(t) = \bar{\lambda}\,(1 + \alpha \sin(2\pi t / T_p))$, where $\bar{\lambda}$, $\alpha$ and $T_p$ represent the average arrival rate, relative amplitude and the cycle length of the arrival rate. These parameters are summarized in Table II. Fig. 8a shows the sample paths of the actual end-to-end delay along with the probabilistic upper bounds, lower bounds and the conditional mean, obtained from the learned distribution. The bounds are computed for violation probabilities ε_lb = ε_ub = 0. . It should be noted that there exists a trade-off between the tightness of the bounds and the violation probabilities. Similarly, Fig. 8b shows the MMSE predictions and the confidence intervals computed from Eq. (3). As can be observed, using the confidence intervals along with the predictions, which are shown by the error bars, can be much more informative compared to the single-value predictions, since it provides a region in which the ground-truth delays are more likely to occur.

VI. CONCLUSIONS