SimFaaS: A Performance Simulator for Serverless Computing Platforms
Nima Mahmoudi and Hamzeh Khazaei
Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada
Electrical Engineering and Computer Science, York University, Toronto, Ontario, Canada
[email protected], [email protected]
Keywords: simulator, serverless, serverless computing, performance analysis

Abstract: Developing accurate and extendable performance models for serverless platforms, aka Function-as-a-Service (FaaS) platforms, is a very challenging task. Also, implementation and experimentation on real serverless platforms is both costly and time-consuming. However, at the moment, there is no comprehensive simulation tool or framework to be used instead of the real platform. As a result, in this paper, we fill this gap by proposing a simulation platform, called SimFaaS, which assists serverless application developers to develop optimized Function-as-a-Service applications in terms of cost and performance. On the other hand, SimFaaS can be leveraged by FaaS providers to tailor their platforms to be workload-aware so that they can increase profit and quality of service at the same time. Also, serverless platform providers can evaluate new designs, implementations, and deployments on SimFaaS in a timely and cost-efficient manner. SimFaaS is open-source, well-documented, and publicly available, making it easily usable and extendable to incorporate more use case scenarios in the future. Besides, it provides performance engineers with a set of tools that can calculate several characteristics of serverless platform internal states, which is otherwise hard (mostly impossible) to extract from real platforms. In previous studies, temporal and steady-state performance models for serverless computing platforms have been developed. However, those models are limited to Markovian processes. We designed SimFaaS as a tool that can help overcome such limitations for performance and cost prediction in serverless computing. We show how SimFaaS facilitates the prediction of essential performance metrics such as average response time, probability of cold start, and the average number of instances reflecting the infrastructure cost incurred by the serverless computing provider.
We evaluate the accuracy and applicability of SimFaaS by comparing the prediction results with real-world traces from AWS Lambda.
1 INTRODUCTION

There is very little official documentation made publicly available about the scheduling algorithms used in public serverless computing platforms. However, many works have focused on partially reverse engineering this information through experimentation on these platforms (Wang et al., 2018; Figiela et al., 2018; Lloyd et al., 2018). Using the results of such studies, and through thorough and extensive experimentation with modified versions of their code bases, we have come to a good understanding of the way modern serverless frameworks are operated and managed by service providers. In this work, we use this information to build an open and public performance simulator for modern serverless computing platforms with a high degree of flexibility, fidelity, and accuracy.

In serverless computing platforms, computation is done in function instances. These instances are completely managed by the serverless computing platform provider and act as tiny servers for the incoming triggers (requests). To develop a comprehensive simulator for serverless computing platforms, we first need to understand how they work underneath and how they are managed.

The simulator presented in this work is written in Python. The resulting package can easily be installed using pip (https://pypi.org/project/simfaas/). The source code is openly accessible on the project GitHub repository (https://github.com/pacslab/simfaas), and the documentation is accessible on Read the Docs (https://simfaas.readthedocs.io/en/latest/). For more information, interested readers can check out our GitHub repository, which provides links to all of our artifacts as well as easy-to-set-up environments to try out our sample scenarios.

The remainder of the paper is organized as follows: Section 2 describes the system simulated in SimFaaS in detail. Section 3 outlines the design of SimFaaS with the most important design choices and characteristics. Section 4 lists some of the possible use cases for SimFaaS. In Section 5, we present the experimental evaluation of SimFaaS, validating the accuracy of the simulator. Section 6 gives a summary of the related work. Finally, Section 7 concludes the paper.

2 SYSTEM DESCRIPTION

In this section, we introduce the management system in serverless computing platforms, which has been fully captured by the serverless simulator presented in this paper.
Function Instance States: according to recent studies (Mahmoudi and Khazaei, 2020a; Mahmoudi and Khazaei, 2020b; Wang et al., 2018; Figiela et al., 2018; Mahmoudi et al., 2019), we identify three states for each function instance: initializing, running, and idle. The initializing state happens when the infrastructure is spinning up new instances, which might include setting up new virtual machines, unikernels, or containers to handle the excess workload. The instance remains in the initializing state until it is able to handle incoming requests. We also consider application initializing, the time during which the user's code performs initial tasks like creating database connections, importing libraries, or loading a machine learning model from an S3 bucket, as part of the initializing state, since it needs to happen only once for each new instance. Note that the instance cannot accept incoming requests before performing all initialization tasks. It is worth noting that the application initializing portion is billed by most providers, while the rest of the initializing state is not. When a request is submitted to the instance, the instance goes into the running state. In this state, the request is parsed and processed. The time spent in the running state is also billed by the serverless provider. After the processing of a request is over, the serverless platform keeps the instance warm for some time to be able to handle later spikes in the workload; we then consider the instance to be in the idle state. The application developer is not charged for an instance that is in the idle state.

Cold/Warm Start: as defined in previous work (Lloyd et al., 2018; Wang et al., 2018; Figiela et al., 2018), we refer to a request as a cold start request when it goes through the process of launching a new function instance.
For the platform, this could include launching a new virtual machine, deploying a new function, or creating a new instance on an existing virtual machine, all of which introduce an overhead into the response time experienced by users. In case the platform has an instance in the idle state when a new request arrives, it reuses the existing function instance instead of spinning up a new one. This is commonly known as a warm start request. Cold starts can be orders of magnitude longer than warm starts for some applications. Thus, too many cold starts can impact the application's responsiveness and user experience (Wang et al., 2018). This is the reason a lot of research in the field of serverless computing has focused on mitigating cold starts (Lin and Glikson, 2019; Bermbach et al., 2020; Manner et al., 2018).
Autoscaling: we have identified three main autoscaling patterns among the mainstream serverless computing platforms: 1) scale-per-request; 2) concurrency value scaling; and 3) metrics-based scaling.

In scale-per-request Function-as-a-Service (FaaS) platforms, when a request comes in, it is serviced by one of the available idle instances (warm start), or the platform spins up a new instance for that request (cold start). Thus, there is no queuing involved in the system, and each cold start causes the creation of a new instance, which acts as a tiny server for subsequent requests. As the load decreases, the platform also needs to scale the number of instances down. In the scale-per-request pattern, as long as the requests being made to an instance are less than the expiration threshold apart, the instance is kept warm. In other words, for each instance, at any moment in time, if a request has not been received in the last expiration threshold units of time, the instance is expired and thus terminated by the platform, and the consumed resources are released. To enable simplified billing, most well-known public serverless computing platforms use this scaling pattern, e.g., AWS Lambda, Google Cloud Functions, IBM Cloud Functions, Apache OpenWhisk, and Azure Functions (Wang et al., 2018; Van Eyk et al., 2018). As scale-per-request is the dominant scaling technique used by major providers, in this paper, we strive to simulate the performance of this type of serverless platform.

In the concurrency value scaling pattern (Google Cloud Platform Inc., 2020), function instances can receive multiple requests at the same time. The number of requests that can be made concurrently to the same instance can be set via the concurrency value. Figure 1 shows the effect of the concurrency value on the autoscaling behaviour of the platform.

Figure 1: The effect of the concurrency value on the number of function instances needed. The left service has a concurrency value of 1, while the right service has a concurrency value of 3.
It is worth noting that the scale-per-request autoscaling pattern can be seen as a special case of the concurrency value scaling pattern in which the concurrency value is set to 1. However, due to its popularity, importance, and the fundamental differences in the management layer, we classify them into separate categories. Examples of platforms using the concurrency value scaling pattern are Google Cloud Run and Knative.
Metrics-based scaling tries to keep metrics like CPU or memory usage within a predefined range. Most on-premises serverless computing platforms work with this pattern due to its simplicity and reliability. Some of the serverless computing platforms that use this pattern are AWS Fargate, Azure Container Instances, OpenFaaS, Kubeless, and Fission.

The simulator proposed in this work considers only the platforms that use the scale-per-request pattern due to their importance and widespread adoption in mainstream public serverless computing platforms.
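The scale-per-request lifecycle described above can be condensed into a few lines. The following is a minimal illustration (our own helper, not part of the SimFaaS API): given the times at which each live instance finished its last request, an arriving request is served warm only if some instance is idle and has not yet passed the expiration threshold.

```python
def find_warm_instance(now, last_finish_times, threshold):
    """Return the index of a live idle instance, or None to signal a cold start.

    An instance whose last request finished at time t is expired (and its
    resources released) once now - t exceeds the expiration threshold.
    """
    live = [i for i, t in enumerate(last_finish_times) if now - t <= threshold]
    return live[0] if live else None

# Two instances finished their last requests at t=0 and t=300; threshold 600 s.
print(find_warm_instance(500, [0, 300], 600))   # 0: both instances still warm
print(find_warm_instance(700, [0, 300], 600))   # 1: the first one has expired
print(find_warm_instance(1000, [0, 300], 600))  # None: all expired -> cold start
```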
Initialization Time: as mentioned earlier, when the platform is spinning up new instances, they first go into the initializing state. The initialization time is the amount of time from when the platform receives a request until the new instance is up and running and ready to serve that request. The initialization time, as defined here, comprises the platform initialization time and the application initialization time. The platform initialization time is the time it takes for the platform to make the function instance ready, whether a unikernel or a container, and the application initialization time is the time it takes for the application to run its initialization code, e.g., connecting to the database.
Response Time: the response time usually includes the queuing time and the service time. Since we are addressing scale-per-request serverless computing platforms here, there is no queuing for incoming requests. Due to the inherent linear scalability of serverless computing platforms (Lloyd et al., 2018; Wang et al., 2018; Figiela et al., 2018), the distribution of the response time does not change over time with different loads.
Maximum Concurrency Level: every public serverless computing platform has some limitation on the number of function instances that can be spun up and in the running state for a single function. This is mainly to ensure the availability of the service for others by limiting the number of instances one user can have up and running at the same time. This limit is mostly known as the maximum concurrency level. For example, the default maximum concurrency level for AWS Lambda is 1000 function instances in 2020 for most regions. When the system reaches the maximum concurrency level, any request that needs to be served by a new instance receives an error status showing that the server is not able to fulfill that request at the moment.
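The admission behaviour under the maximum concurrency level can be summarized with a small sketch (a hypothetical helper of ours; the default limit of 1000 mirrors the AWS Lambda figure mentioned above):

```python
def admit(live_instances, idle_instances, max_concurrency=1000):
    """Classify an arriving request: warm start, cold start, or rejection
    once the maximum concurrency level has been reached."""
    if idle_instances > 0:
        return "warm"          # an idle instance can serve the request
    if live_instances < max_concurrency:
        return "cold"          # below the limit: spin up a new instance
    return "reject"            # limit reached and all instances busy

print(admit(999, 0))    # cold
print(admit(1000, 0))   # reject
print(admit(1000, 3))   # warm
```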
Request Routing: in order to minimize the number of containers that are kept warm, and thus free up system resources, the platform routes incoming requests to new containers and uses older containers only if all containers created more recently are busy (McGrath and Brenner, 2017). In other words, the scheduler gives priority to newly instantiated idle instances, using priority scheduling according to creation time, i.e., the newer the instance, the higher the priority. By adopting this approach, the system minimizes the number of requests going to older containers, maximizing their chance of being expired and terminated.
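A minimal sketch of this routing policy (our own illustration, assuming each idle instance is tagged with its creation time):

```python
def route_request(idle_instances):
    """Pick the most recently created idle instance (newest-first priority).

    `idle_instances` is a list of (instance_id, creation_time) tuples; returns
    the chosen instance_id, or None to signal that a cold start is needed.
    """
    if not idle_instances:
        return None
    return max(idle_instances, key=lambda inst: inst[1])[0]

pool = [("A", 10.0), ("B", 42.0), ("C", 25.0)]
print(route_request(pool))  # "B": the newest instance gets the request
print(route_request([]))    # None: no idle instance -> cold start
```

Routing to the newest instance starves the oldest instances of traffic, which is exactly what lets them hit the expiration threshold and be reclaimed.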
3 DESIGN OF SIMFAAS

This section discusses the design of the novel Function-as-a-Service (FaaS) platform simulator (SimFaaS) proposed in this work. SimFaaS was created by the authors as a tool for simplifying the process of validating a developed performance model, and for allowing accurate performance prediction for providers and application developers in the absence of one. SimFaaS mainly targets public serverless computing platforms. There are several built-in tools for visualizing, analyzing, and verifying a developed analytical performance model. In addition, we added tools that can accept custom state encodings and generate approximations of Probability Density Functions (PDF) and Cumulative Distribution Functions (CDF) from the simulations, which can help debug several parts of a given analytical performance model.

The proposed simulator can accurately predict several QoS-related metrics, like the cold start probability, the average response time, and the probability of rejection for requests under different load intensities, which helps application developers understand the limits of their system and measure their SLA compliance without the need for expensive experiments. In addition, it can predict the average running server count and the average total server count, which help predict the cost of service for the application developer and the infrastructure cost incurred by the serverless provider, respectively.

Figure 2: The package diagram of SimFaaS. Each square represents a class in the package and arrows represent dependency.

Figure 2 outlines the package diagram of SimFaaS, showing the dependencies between the different modules. The Utility module provides helper functions for plots and calculations. The SimProcess module helps simulate a single process and allows for comparisons with an optional analytical model provided as a function handle to the module. The FunctionInstance module provides the functionality of a single function instance. The main simulation with an empty initial state is provided by the ServerlessSimulator module. Finally, the ServerlessTemporalSimulator module performs simulations similar to the ServerlessSimulator module, but with added functionality allowing a customized initial state and the calculation of simulation results in a time-bounded fashion.
SimFaaS has been developed entirely in Python using an object-oriented design methodology. In order to leverage the tools within the package, the user needs to write a Python application or a Jupyter notebook, initializing the classes and providing the input parameters. In addition, the user has the option to extend the functionality in the package by extending the classes and adding their custom functionality. Almost every functionality of the classes can be overridden to allow for modification and extension. For example, the arrival, cold start service, and warm start service processes can be redefined by simply extending the SimProcess class. We included deterministic, Gaussian, and exponential processes as examples of such extensions in the package. Examples of such changes can be found in the several examples we have provided for SimFaaS. In addition, the user can include their analytically produced PDF and CDF functions to be compared against the simulation trace results.

The simulator provides all of the functionality needed for modelling modern scale-per-request serverless computing platforms. However, we created a modular framework that can span future types of computational platforms. To demonstrate this, we extended the ServerlessSimulator class to create ParServerlessSimulator, which simulates serverless platforms that allow queuing in the function instances but have a scaling algorithm similar to that of scale-per-request platforms.
SimFaaS includes simulation models able to mimic the most popular public serverless computing platforms like AWS Lambda, Google Cloud Functions, IBM Cloud Functions, Apache OpenWhisk, Azure Functions, and all other platforms with similar autoscaling. We have also performed over one month of experimentation to demonstrate the validity of the simulation results extracted from SimFaaS. To capture the exogenous parameters needed for an accurate simulation, the following information is needed:

• The expiration threshold, which is usually constant for any given public serverless computing platform. According to our experimentation and other works (Shahrad et al., 2020; Mikhail Shilkov, 2020), in 2020, this value is 10 minutes for AWS Lambda, Google Cloud Functions, IBM Cloud, and Apache OpenWhisk, and 20 minutes for Azure Functions. For other serverless computing platforms, experimentation is needed by the users. The use of a non-deterministic expiration threshold is also possible by extending the FunctionInstance class.

• The arrival process, which can rather safely be assumed to be exponential for most consumer-facing applications. However, other applications might use a deterministic process, e.g., cron jobs, or other types like batch arrival. The user can use one of our built-in processes or simply define their own.

• The warm/cold service process, which can be extracted by monitoring the workload response time for cold and warm requests. By default, SimFaaS uses an exponential distribution for this process, but this can be overridden by the user by passing any class that extends the SimProcess class. We have provided Gaussian and fixed-interval distributions as part of the package to demonstrate this.
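As a self-contained sketch of this extension pattern, the stand-in base class below mirrors the idea of SimFaaS's SimProcess (the method name generate_trace() and the subclass names are our assumptions, not necessarily the package's exact API): each process draws one inter-event or service time per call.

```python
import random

class SimProcess:
    """Minimal stand-in for a SimFaaS-style process interface: subclasses
    implement generate_trace() to draw one inter-event (or service) time."""
    def generate_trace(self):
        raise NotImplementedError

class ExpSimProcess(SimProcess):
    """Exponential inter-event times, e.g. Poisson arrivals."""
    def __init__(self, rate):
        self.rate = rate
    def generate_trace(self):
        return random.expovariate(self.rate)

class ConstSimProcess(SimProcess):
    """Deterministic inter-event times, e.g. cron-like arrivals."""
    def __init__(self, interval):
        self.interval = interval
    def generate_trace(self):
        return self.interval

random.seed(0)
arrivals = ExpSimProcess(rate=0.9)  # 0.9 req/s, as in the example workload
samples = [arrivals.generate_trace() for _ in range(10_000)]
print(round(sum(samples) / len(samples), 2))  # close to the mean 1/0.9 ≈ 1.11
```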
4 USE CASES

In this section, we go through a few sample use cases for the serverless platform simulator presented in this work. For more details, a comprehensive list of examples can be found in the project GitHub repository (https://github.com/pacslab/SimFaaS/tree/master/examples).

In this example, we use SimFaaS to calculate the steady-state properties of a given workload on scale-per-request serverless computing platforms. In SimFaaS, the workload is characterized only by the arrival rate, the service time (warm start response time), and the provisioning time (the amount of time it takes a cold start instance to get ready to serve the request), all of which are easily accessible through experimentation and any monitoring dashboard. The only information needed to characterize the serverless computing platform itself is the expiration threshold, which is the amount of time it takes for the platform to expire and recycle the resources of an instance after it has finished processing its last request. This value is usually constant and the same for all users of the serverless computing platform. To run a simple simulation, we can leverage the ServerlessSimulator class and run the simulation long enough to minimize the transient effect and let the system reach the steady state.

Table 1 shows a set of example simulation parameters with the default exponential distribution for both the arrival and service time processes. Note that instead of using exponential distributions, the user can pass a random generator function with a custom distribution to achieve more accurate results for specific applications.

Table 1: An example simulation input and selected output parameters. The output parameters are signified with a leading star (*).

Parameter                     Value
Arrival Rate                  0.9 req/s
Warm Service Time             1.991 s
Cold Service Time             2.244 s
Expiration Threshold          10 min
Simulation Time               10^6 s
Skip Initial Time             100 s
*Cold Start Probability       0.14 %
*Rejection Probability        0 %
*Average Instance Lifespan    6307.7389 s
*Average Server Count         7.6795
*Average Running Servers      1.7902
*Average Idle Count           5.8893

As can be seen, the system can produce QoS-related parameters like the probability of cold start or of rejection for a given arrival rate, which can help the application developer analyze and find the limits of the system. In addition, the application developer can use the average number of running servers as an important measure of the cost of their service, which can be used for setting the configuration of services that the function relies on, e.g., the database concurrent connection capacity (Amazon Web Services Inc., 2020). Besides, information like the average server count can produce an estimate of the infrastructure cost incurred by the serverless provider. The serverless provider can use SimFaaS as a tool to analyze the possible effect of changing parameters like the expiration threshold on its incurred cost and QoS in different scenarios.

Another way the proposed simulator can be leveraged is for extracting information about the system that is not visible to software engineers and developers on public serverless computing platforms like AWS Lambda or Google Cloud Functions. This information could facilitate research on predicting cost, performance, database configurations, or other related parameters. For example, we can find the distribution of instance counts in the system throughout time in the simulated platform for the input parameters shown in Table 1, as shown in Figure 3. This information can help researchers develop performance models based on the internal states of the system with very good accuracy, which is otherwise not possible on public serverless offerings.
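To make the underlying mechanics concrete, the toy discrete-event simulation below (a heavily simplified stand-in of ours, not the SimFaaS implementation) applies the scale-per-request rules to the Table 1 parameters and estimates the cold start probability:

```python
import random

def simulate(arrival_rate, warm_mean, cold_mean, threshold, max_time, seed=1):
    """Toy scale-per-request simulator: Poisson arrivals, exponential service
    times, deterministic expiration threshold. Tracks one busy-until time per
    instance and returns the fraction of requests that hit a cold start."""
    random.seed(seed)
    t, cold, total = 0.0, 0, 0
    busy_until = []  # one entry per live instance
    while t < max_time:
        t += random.expovariate(arrival_rate)  # next arrival
        # expire instances that have been idle longer than the threshold
        busy_until = [b for b in busy_until if t - b <= threshold]
        total += 1
        idle = [i for i, b in enumerate(busy_until) if b <= t]
        if idle:  # warm start: reuse an idle instance
            busy_until[idle[0]] = t + random.expovariate(1 / warm_mean)
        else:     # cold start: provision a new instance
            cold += 1
            busy_until.append(t + random.expovariate(1 / cold_mean))
    return cold / total

# Parameters from Table 1: 0.9 req/s, 1.991 s warm, 2.244 s cold, 10 min threshold
p = simulate(0.9, 1.991, 2.244, 600, max_time=100_000)
print(f"cold start probability: {100 * p:.2f}%")  # small, on the order of 0.1%
```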
To further analyze the reproducibility of our instance count estimation using the parameters in Table 1, we ran 10 independent simulations and generated our estimate of the average instance count over time for each run. Figure 4 shows the average and the 95% confidence interval of our estimates over those runs. As can be seen, our estimate converges, showing less than 1% deviation from the mean in the 95% confidence interval.

Figure 3: The instance count distribution of the simulated process throughout time. The y-axis represents the portion of time in the simulation with a specific number of instances.

Figure 4: The estimated average instance count over time in 10 simulations. The solid line shows the average of the simulations and the shaded area shows the 95% Confidence Interval (CI).
Although the steady-state analysis of the serverless computing platform's performance can give us long-term quality of service metrics, the application developer or the serverless provider might be interested in the platform's transient behaviour. A transient analysis simulation can provide insight into the immediate future, facilitating time-bound performance guarantees. Besides, it can help serverless providers ensure short-term quality of service when trying new designs.

Previous efforts have been made to develop performance models able to provide transient analysis of serverless computing platforms (Mahmoudi and Khazaei, 2020b). However, there are inherent limitations to such performance models, like the absence of batch arrival modelling and being limited to Markovian processes. SimFaaS does not have such limitations and can help both application developers and serverless providers gain insight into the transient aspects of the performance of serverless computing platforms.
Due to the inherently highly dynamic infrastructure of serverless computing platforms, very few tools from traditional performance engineering methodologies and analyses can be applied to the emerging serverless technologies. Because of this lack of tools and resources, serverless computing platforms have been forced to use trial and error through implementation to analyze new designs for performance and efficiency improvements. Previous studies have proposed analytical performance models for serverless computing platforms (Mahmoudi and Khazaei, 2020a; Mahmoudi and Khazaei, 2020b), but these methods have limitations, like only supporting Markovian processes, which limits their applicability in a number of scenarios.

One major benefit of having an accurate serverless platform simulator is the ability to perform what-if analysis on different configurations and find the best-performing settings for a given workload. Implementation and experimentation to gather similar data are both time-consuming and costly, while using the proposed simulator makes data collection much faster and easier. Figure 5 shows an example of such an analysis for different values of the expiration threshold in the system. Different workloads running on serverless computing platforms might have different performance/cost criteria. Using what-if analysis powered by an accurate performance model, one can optimize the configuration for each unique workload. Similar examples can be found in the project examples.
Performing cost prediction under different loads in cloud computing is generally a very challenging task. These challenges tend to be exacerbated by the highly dynamic structure of serverless computing platforms. Generally, there is a broad range of possible costs for a given serverless function, including computation, storage, networking, database, or other API-based services like machine learning engines or statistical analysis. However, all charges incurred by serverless functions can be seen as either per-request charges (e.g., external APIs, machine learning, face recognition, network I/O) or runtime charges billed based on execution time (e.g., memory or computation). Per-request charges can be calculated using only the average arrival rate. However, runtime charges may differ under different load intensities due to the difference in cold start probability. Using the proposed simulator, users can get an estimate of the cold start probability and the average number of running servers, which are necessary for cost estimation under different load intensities.

In addition to the average running server count, which helps estimate the cost incurred by the application developer, the average total server count is linearly proportional to the infrastructure cost incurred by the serverless provider. Thus, using the proposed simulator, both the developer charges and the infrastructure charges incurred by the provider can be estimated under different configurations, which can help improve the platform by studying the effect of different configurations or designs without the need to perform expensive or time-consuming experiments or implementations.
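As a hedged sketch of this cost split, the helper below combines per-request and runtime charges; the prices are hypothetical placeholders (roughly AWS-like for a 128 MB function), and the average running server count is the kind of estimate a simulator such as SimFaaS can provide:

```python
def monthly_cost(arrival_rate, per_request_cost, avg_running_servers,
                 runtime_cost_per_server_second):
    """Split a month of charges into per-request and runtime components.
    All prices here are hypothetical, for illustration only."""
    seconds = 30 * 24 * 3600  # one 30-day month
    request_charges = arrival_rate * seconds * per_request_cost
    runtime_charges = avg_running_servers * seconds * runtime_cost_per_server_second
    return request_charges + runtime_charges

# 0.9 req/s, $0.2 per million requests, 1.79 average running servers, and a
# hypothetical $0.0000021 per instance-second
print(round(monthly_cost(0.9, 0.2e-6, 1.79, 2.1e-6), 2))  # about 10.21
```

Note how the runtime component dominates here; lowering the cold start probability (and hence the average time spent in the billed cold path) shifts this balance.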
Figure 5: Cold start probability against the arrival rate for different values of the expiration threshold (10 sec to 30 min) for the workload specified in Table 1. SimFaaS can ease the process of conducting experiments with several configurations to find the best-performing one.
5 EXPERIMENTAL VALIDATION

To show the accuracy of the proposed simulator, we performed extensive experimentation on AWS Lambda and compared the results with those obtained from SimFaaS. The experiments comprise over one month of running benchmark applications on AWS Lambda and are openly accessible on GitHub (https://github.com/pacslab/serverless-performance-modeling/tree/master/experiments/results). All of our experiments were executed in a 28-hour window with 10 minutes of warm-up time at the beginning, during which we did not record any data. The workload used in this work is based on the work of Wang et al. (Wang et al., 2018) with minor modifications and is openly available in our GitHub repository (https://github.com/pacslab/serverless-performance-modeling). For the experimental validation, we used a combination of CPU-intensive and I/O-intensive workloads. During the experimentation, we obtained performance metrics and other parameters such as cold/warm start information, instance id, lifespan, etc.

In our AWS Lambda deployment, we used the Python 3.6 runtime with 128 MB of RAM deployed in the us-east-1 region in order to have the lowest possible latency from our client. Note that the memory configuration does not affect the accuracy of the simulation, as the results depend on the service time distribution, which captures the effect of changing the memory configuration. For the client, we used a virtual machine with 8 vCPUs, 16 GB of memory, and 1000 Mbps network connectivity with single-digit-millisecond latency to AWS servers, hosted on the Compute Canada Arbutus cloud (https://docs.computecanada.ca/wiki/Cloud_resources). We used Python as the programming language and the official boto3 library to communicate with the AWS Lambda API to make the requests and process the resulting logs for each request. For load testing and the generation of client requests based on a Poisson process, we used our in-house workload generation library (https://github.com/pacslab/pacswg), which is openly accessible through PyPI (https://pypi.org/project/pacswg). The results are stored in a CSV file and then processed using Pandas, Numpy, Matplotlib, and Seaborn. The dataset, parser, and the code for the extraction of system parameters and properties are also publicly available in our analytical model project's GitHub repository.

We need to estimate the system characteristics to be used in our simulator as input parameters. In this section, we discuss our approach to estimating each of these parameters.

Expiration Threshold: here, our goal is to measure the expiration threshold, which is the amount of time after which an inactive function instance in the warm pool is expired and therefore terminated. To measure this parameter, we created an experiment in which we make requests with increasing inter-arrival times until we see a cold start, meaning that the system has terminated the instance between two consecutive requests. We performed this experiment on AWS Lambda with a starting inter-arrival time of 10 seconds, increasing it by 10 seconds each time until we saw a cold start. In our experiments, AWS Lambda seemed to expire an instance exactly after 10 minutes of inactivity (after it had processed its last request). This number did not change in any of our experiments, leading us to assume it is a deterministic value. This observation has also been verified in (Mikhail Shilkov, 2020; Shahrad et al., 2020).

Average Warm Response Time and Average Cold Response Time: with the definitions provided here, the warm response time is the service time of the function, and the cold response time includes both the provisioning time and the service time. To measure the average warm response time and the average cold response time, we used the average of the response times measured throughout the experiment.
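The probing experiment for the expiration threshold can be sketched as follows (a simplified illustration; `invoke` stands in for "wait the given gap since the last request, invoke the function, and report whether a cold start occurred"):

```python
def measure_expiration_threshold(invoke, start=10, step=10, limit=1800):
    """Increase the idle gap between consecutive requests until a cold start
    is observed; the first such gap bounds the expiration threshold."""
    gap = start
    while gap <= limit:
        if invoke(gap):
            return gap  # first gap long enough for the instance to expire
        gap += step
    return None

# Stub platform that expires instances after exactly 600 s of inactivity.
print(measure_expiration_threshold(lambda gap: gap > 600))  # 610
```

With a 10-second step, the measured value overshoots the true threshold by at most one step, which is why repeated runs landing on the same value support a deterministic 10-minute threshold.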
In this section, we outline our methodology for measuring the performance metrics of the system, comparing the results with the predictions of our simulator.
Probability of Cold Start: to measure the probability of cold start, we divide the number of requests causing a cold start by the total number of requests made during our experiment. Due to the inherent scarcity of cold starts in most of our experiments, we observed increased noise in our measurements of the probability of cold start, which led us to increase the data collection window to about 28 hours for each sampled point.
Mean Number of Instances in the Warm Pool: to measure the mean number of instances in the warm pool, we count the number of unique instances that have responded to the client's requests in the past 10 minutes. We use a unique identifier for each function instance to keep track of its life cycle, as obtained in (Wang et al., 2018).
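A minimal sketch of this sliding-window count, assuming each log entry carries a timestamp and the responding instance's unique identifier:

```python
def warm_pool_size(events, now, window=600.0):
    """Number of unique function instances that responded to requests
    within the past `window` seconds (10 minutes by default);
    `events` is a list of (timestamp, instance_id) pairs."""
    return len({iid for ts, iid in events if now - window <= ts <= now})

# Four responses from three distinct instances over ten minutes.
events = [(0.0, "i-a"), (100.0, "i-b"), (550.0, "i-a"), (590.0, "i-c")]
```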
Mean Number of Running Instances: we calculate this metric by observing the system every ten seconds, counting the number of in-flight requests in the system, and taking the average as our estimate.
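This estimator is just the sample mean of the periodic in-flight counts; a sketch:

```python
def mean_running_instances(inflight_samples):
    """Average the number of in-flight requests observed at periodic
    instants (every ten seconds in our experiments); with one request
    per instance, this equals the mean number of running instances."""
    if not inflight_samples:
        raise ValueError("no samples collected")
    return sum(inflight_samples) / len(inflight_samples)

samples = [3, 4, 4, 5, 3, 4]  # in-flight counts at six observation points
```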
Mean Number of Idle Instances: this can be measured as the difference between the average total number of instances in the warm pool and the number of instances busy running requests.
Average Wasted Capacity: for this metric, we define the utilized capacity as the ratio of the number of running instances over all instances in the warm pool. Using this definition, the ratio of idle instances over all instances in the warm pool is the wasted portion of the capacity provisioned for our workload. Note that this value is very important to the provider, as it measures the ratio of the utilized capacity (billed to the application developer) over the deployed capacity (reflecting the infrastructure cost).
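The wasted-capacity ratio follows directly from the warm-pool and running-instance estimates above; a sketch:

```python
def wasted_capacity_ratio(mean_warm_pool, mean_running):
    """Ratio of idle instances over all instances in the warm pool,
    i.e., the wasted portion of the provisioned capacity."""
    if mean_warm_pool <= 0:
        return 0.0
    return (mean_warm_pool - mean_running) / mean_warm_pool

# e.g., 8 instances provisioned on average with 6 busy on average
# means 25% of the provider's deployed capacity is wasted.
```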
Figure 6 shows the probability of cold start for different arrival rates extracted from the simulation compared with real-world results. As can be seen, the results match the performance metrics extracted from the experimentations, with an average error of 12.14%, showing the accuracy of the results obtained from the simulation. Figures 7 and 8 show the average number of instances and the average wasted capacity (in the idle state) for the simulation and the experiments, with a Mean Absolute Percentage Error (MAPE) of 3.43% and 0.…%, respectively.
Figure 6: Probability of cold start extracted from simulation compared with real-world experimentations on AWS Lambda.
Many recent works in the area of serverless computing platforms have focused on studying and finding ways to improve the performance of serverless computing platforms (van Eyk and Iosup, 2018; Manner et al., 2018; Manner, 2019; Boza et al., 2017; Abad et al., 2018; Jeon et al., 2019). However, to the best of the
Figure 7: The average number of instances extracted from simulation compared with real-world experimentations on AWS Lambda.
Figure 8: Average wasted resources extracted from simulation compared with real-world experimentations on AWS Lambda.

authors' knowledge, none have been able to predict or simulate comprehensive performance or quality-of-service characteristics for a given workload. In this section, we go through the most relevant recent works on the prediction of the performance or quality of service of serverless computing platforms.

Some of the previous studies have focused on developing comprehensive performance models for steady-state and transient analysis of a given workload (Mahmoudi and Khazaei, 2020a; Mahmoudi and Khazaei, 2020b). However, the proposed models impose several limitations on the arrival and service time processes and cannot handle batch arrivals. These limitations render the developed performance models unusable for many types of workloads, especially batch and analytics workloads. The serverless performance simulator proposed in this work can handle any type of arrival or service time process and can be adapted to future emerging serverless computing management models with less manual effort.

In (Bortolini and Obelheiro, 2019), Bortolini et al. used experimentations on different configurations and serverless providers in order to find the most important factors influencing the performance and cost of current serverless platforms. In their study, they found the low predictability of cost to be one of the most important drawbacks of serverless computing platforms. Using simulators like the one proposed in this work can help improve the predictability of the cost of a given workload under different load intensities. Hellerstein et al. (Hellerstein et al., 2018) addressed the main shortcomings present in first-generation serverless computing platforms and the anti-patterns present in them. They showed that current implementations restrict distributed programming and cloud computing innovations. Eyk et al.
(Van Eyk et al., 2018) found the most important issues surrounding the widespread adoption of FaaS to be sizeable overheads, unreliable performance, and new forms of cost-performance trade-offs. In their work, they identified six performance-related challenges for the domain of serverless computing and proposed a roadmap for alleviating these challenges. Several of the aforementioned shortcomings of serverless computing platforms can be mitigated by predicting the cost-performance trade-off using serverless simulators.

Eyk et al. (van Eyk and Iosup, 2018) investigated the performance challenges in current serverless computing platforms. They found the most important challenges in the adoption of FaaS to be the remarkable computational overhead, unreliable performance, and the absence of benchmarks. The introduction of an accurate simulator for Function-as-a-Service offerings could overcome some of these shortcomings. Manner et al. (Manner et al., 2018) investigated the factors influencing the cold start performance of serverless computing platforms. Their experiments on AWS Lambda and Azure Functions show that factors like the programming language, deployment package size, and memory settings affect the performance of serverless computing platforms. In a later study, Manner et al. (Manner, 2019) describe the importance of an accurate simulator for Function-as-a-Service (FaaS) products. They mention how scaling, cold starts, function configurations, dependent services, network latency, and other important configurations influence the cost-performance trade-off. In their work, they propose a simulation framework for a cost and performance simulator for serverless computing platforms. In this framework, they suggest using mean values extracted from experiments as inputs to the performance model in order to calculate different properties.

Boza et al. (Boza et al., 2017) introduced a model-based simulation for cloud budget planning.
In their work, they perform cost simulations for using reserved VMs, on-demand VMs, bid-based VMs, and serverless computing for a similar computing task. However, their serverless computing simulation is overly simplistic for performance modelling researchers and lacks several important details. In this work, we focus solely on the performance simulation of serverless computing platforms, but with more details in mind, which seems necessary for the simulator to be leveraged by the performance research community, application developers, and serverless providers.

Abad et al. (Abad et al., 2018) mainly considered the problem of scheduling small cloud functions on serverless computing platforms. As a part of their evaluations, they implemented a SimPy-based simulation of their proposed scheduling method. Although this work shows the promise of rather accurate serverless computing simulations, their focus is on scheduling tasks while ignoring several details of interest for performance modelling. In this work, we strive to fill this gap by providing the performance modelling research community with the proper tooling necessary for high-fidelity performance models of serverless computing platforms. Jeon et al. (Jeon et al., 2019) introduced a CloudSim extension focused on Distributed Function-as-a-Service (DFaaS) on edge devices. Although DFaaS systems hold great promise for the future of serverless computing, their extension does not allow simulation of mainstream serverless computing platforms.
In this work, we presented SimFaaS, a simulator for modern serverless computing platforms with sufficient detail to yield very accurate results. We introduced a range of tools available to performance modelling researchers, giving them insights into several internal properties that are not visible to users of public serverless computing platforms. We reviewed some of the possible use cases of the proposed simulator and showed its accuracy through comparison with real-world traces gathered from running benchmark applications on AWS Lambda.

SimFaaS provides performance modelling researchers with a tool that allows them to develop accurate performance models using the internal state of the system, which cannot be monitored on public serverless computing platforms. Using SimFaaS, both serverless providers and application developers can predict the quality of service, the expected infrastructure and incurred cost, the amount of wasted resources, and the energy consumption without performing lengthy and expensive experimentations. The benefits of using SimFaaS for serverless computing platform providers are two-fold: 1) they can examine new designs, developments, and deployments in their platforms by initially validating new ideas on SimFaaS, which is significantly cheaper in terms of time and cost compared to actual prototyping; 2) they can provide users with fine-grained control over the cost-performance trade-off by modifying the platform parameters (e.g., the expiration threshold). This is mainly due to the fact that there is no universal optimal point in the cost-performance trade-off for all workloads.
By making accurate predictions, a serverless provider can better optimize their resource usage while improving the experience of application developers and, consequently, of end-users.

As future work, we plan to extend SimFaaS to include new generations of serverless computing, in addition to adding several billing schemas in order to predict the cost of running workloads on the serverless platform. SimFaaS will also be maintained by the members of PACS Lab to add new features offered by serverless providers.

Acknowledgements
REFERENCES
Abad, C. L., Boza, E. F., and Van Eyk, E. (2018). Package-aware scheduling of FaaS functions. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, pages 101–106.

Amazon Web Services Inc. (2020). Amazon DynamoDB Read/Write Capacity Mode. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html. Last accessed 2020-11-20.

Bermbach, D., Karakaya, A. S., and Buchholz, S. (2020). Using application knowledge to reduce cold starts in FaaS services. In Proceedings of the 35th ACM/SIGAPP Symposium on Applied Computing.

Bortolini, D. and Obelheiro, R. R. (2019). Investigating performance and cost in function-as-a-service platforms. In International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, pages 174–185. Springer.

Boza, E. F., Abad, C. L., Villavicencio, M., Quimba, S., and Plaza, J. A. (2017). Reserved, on demand or serverless: Model-based simulations for cloud budget planning, pages 1–6. IEEE.

Figiela, K., Gajek, A., Zima, A., Obrok, B., and Malawski, M. (2018). Performance evaluation of heterogeneous cloud functions. Concurrency and Computation: Practice and Experience, 30(23):e4792.

Google Cloud Platform Inc. (2020). Concurrency. https://cloud.google.com/run/docs/about-concurrency. Last accessed 2020-02-13.

Hellerstein, J. M., Faleiro, J., Gonzalez, J. E., Schleier-Smith, J., Sreekanti, V., Tumanov, A., and Wu, C. (2018). Serverless computing: One step forward, two steps back. arXiv preprint arXiv:1812.03651.

Jeon, H., Cho, C., Shin, S., and Yoon, S. (2019). A CloudSim-extension for simulating distributed functions-as-a-service, pages 386–391. IEEE.

Lin, P.-M. and Glikson, A. (2019). Mitigating cold starts in serverless platforms: A pool-based approach. arXiv preprint arXiv:1903.12221.

Lloyd, W., Ramesh, S., Chinthalapati, S., Ly, L., and Pallickara, S. (2018). Serverless computing: An investigation of factors influencing microservice performance, pages 159–169. IEEE.

Mahmoudi, N. and Khazaei, H. (2020a). Performance modeling of serverless computing platforms. IEEE Transactions on Cloud Computing, pages 1–15.

Mahmoudi, N. and Khazaei, H. (2020b). Temporal performance modelling of serverless computing platforms. In Proceedings of the 6th International Workshop on Serverless Computing, WOSC '20, pages 1–6. Association for Computing Machinery.

Mahmoudi, N., Lin, C., Khazaei, H., and Litoiu, M. (2019). Optimizing serverless computing: introducing an adaptive function placement algorithm. In Proceedings of the 29th Annual International Conference on Computer Science and Software Engineering, pages 203–213.

Manner, J. (2019). Towards performance and cost simulation in function as a service. Proc. ZEUS (accepted).

Manner, J., Endreß, M., Heckel, T., and Wirtz, G. (2018). Cold start influencing factors in function as a service, pages 181–188. IEEE.

McGrath, G. and Brenner, P. R. (2017). Serverless computing: Design, implementation, and performance, pages 405–410. IEEE.

Mikhail Shilkov (2020). Cold starts in AWS Lambda. https://mikhail.io/serverless/coldstarts/aws/. Last accessed 2020-03-18.

Shahrad, M., Fonseca, R., Goiri, Í., Chaudhry, G., Batum, P., Cooke, J., Laureano, E., Tresness, C., Russinovich, M., and Bianchini, R. (2020). Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider. arXiv preprint arXiv:2003.03423.

van Eyk, E. and Iosup, A. (2018). Addressing performance challenges in serverless computing. In Proc. ICT.OPEN.

Van Eyk, E., Iosup, A., Abad, C. L., Grohmann, J., and Eismann, S. (2018). A SPEC RG Cloud Group's vision on the performance challenges of FaaS cloud architectures. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, pages 21–24. ACM.

Wang, L., Li, M., Zhang, Y., Ristenpart, T., and Swift, M. (2018). Peeking behind the curtains of serverless platforms. In 2018 USENIX Annual Technical Conference (USENIX ATC 18).