Analysis of SLA Compliance in the Cloud: An Automated, Model-based Approach∗
D. Ancona and G. Pace (Eds.): Verification of Objects at RunTime EXecution 2018 (VORTEX 2018), EPTCS 302, 2019, pp. 1–15, doi:10.4204/EPTCS.302.1
© F. de Boer et al. This work is licensed under the Creative Commons Attribution License.
Frank S. de Boer
CWI Amsterdam, The Netherlands
Elena Giachino
University of Bologna, Italy — [email protected]
Stijn de Gouw
The Open University, The Netherlands — [email protected]
Reiner Hähnle
Technical University of Darmstadt, Germany — [email protected]
Einar Broch Johnsen
University of Oslo, Norway — [email protected]
Cosimo Laneve
University of Bologna, Italy — [email protected]
Ka I Pun
Western Norway University of Applied Sciences and University of Oslo, Norway
Gianluigi Zavattaro
University of Bologna, Italy — [email protected]
Abstract.
Service Level Agreements (SLA) are commonly used to specify the quality attributes between cloud service providers and their customers. A violation of SLAs can result in high penalties. To allow the analysis of SLA compliance before the services are deployed, we describe in this paper an approach for SLA-aware deployment of services on the cloud, and illustrate its workflow by means of a case study. The approach is based on formal models combined with static analysis tools and generated runtime monitors. As such, it fits well within a methodology combining software development with information technology operations (DevOps).
1 Introduction

Every customer wants to be sure about the quality of their purchases. In the cloud world, this quality assurance includes guarantees on service performance. Service Level Agreements (SLAs) are legal documents, signed and agreed upon by cloud service providers and their customers, which specify the agreed quality of service. An SLA violation will result in penalties and possibly in a loss of money, clients, and credibility. Even though the stakes are high, there are only a few tools with limited capabilities available to check the compliance of cloud services with SLAs. But why does it seem to be so difficult to provide tool support for SLA compliance checking and monitoring?

For a start, a number of complex and challenging questions arise: How to describe service performance? How many resources, for example, memory or virtual machines, should be assigned to a particular service, and how should they be configured? How to react optimally at runtime to take advantage of the elasticity of the cloud? How to estimate the future behavior of a service and adjust the resource configuration accordingly? Other concerns include security, support, data management and data protection [15].

∗ Partially funded by the EU project FP7-610582 ENVISAGE: Engineering Virtualized Services.
These are challenging issues! It is beyond current technology to address them in a general way for any given SLA and any given software. To develop effective tools for SLA compliance analysis, we believe it is essential to work at the level of models, and to describe and analyze SLAs in a way that is independent of the concrete technology offered by the cloud service provider. Shifting to the modeling level increases the level of abstraction, reduces complexity, and removes the dependency on a specific runtime environment. The importance of models applies to SLAs as well as to software: a model-centric approach allows us to create a formal representation of the essential aspects of an SLA. At the same time, software services deployed on the cloud can be represented as an executable service model, annotated with parametric expressions for their use of resources. Combining the two models, i.e., of the SLA and of the cloud services, makes it possible to use techniques with a formal basis, such as static software analysis or monitor generation. Such tools provide proven guarantees on service performance, thereby vastly raising the degree of automation.
Benefits of model-centric, tool-based SLA analysis
An effective solution to SLA design and compliance must coordinate all phases of service provisioning:
• Provide assistance in the configuration of SLA metric bounds and in the provisioning of virtual machines whose resources comply with the services' requirements
• Permit automatic monitoring of the service at runtime
• Enable speedy reaction to an SLA violation and assistance in its resolution
• Support deployment: significant simplification and increased automation
In this paper, we present an approach to facilitate SLA-aware deployment of services on the cloud by combining the formal executable model of the target system of deployed services and the formal representation of corresponding SLAs. We define a detailed workflow that takes advantage of the formal models by enabling automated tool support at various stages. With the help of a case study we demonstrate how our approach can be realized for a real-world cloud service provider.

The paper is organised as follows: Section 2 describes the cloud service performance metrics that our approach is supposed to measure and verify; Section 3 outlines the workflow of the model-centric SLA compliance analysis; Section 4 provides a short introduction to the modeling language used in this approach; Section 5 presents the case study; Section 6 discusses related work; and Section 7 concludes the paper.
2 Service Performance Metrics

Service performance metrics measure and assess the performance level of a service, quantitatively and periodically. Typical metrics fall into one of the following categories:
Availability is the property of a service to be accessible and usable on demand. It includes (i) the level of uptime, namely the percentage of time a service is up within a defined period; (ii) the percentage of successful requests, namely the number of requests processed without an error over the total number of submitted requests; (iii) the percentage of timely serviced requests, that is, the number of service provisioning requests completed within a defined time period over the total number of service provisioning requests.

Response time is the time period between a client request event and a service response event. The service metrics used to constrain the response time may return either an average time or a maximum time, given a particular form of request.
Capacity is the maximum amount of a resource used by a service. It includes the service throughput metric, namely the minimum number of requests that will be processed by a service in a stated time period. If no extra resources are provided, the more resources a service requires, the lower the service throughput will be. (The sketch at the end of this section expresses two of these metrics as executable functions.)

Several factors contribute to the quality level of a service. They can be classified as internal, such as the available resources, code quality, or the computational complexity, and external. The latter are outside the direct control of the stakeholders and include, for example, network availability or the number of accesses/requests. The situation is metaphorically illustrated in Figure 1.

Figure 1: Service quality level over time and its influences (internal factors: analyse, plan, deploy; external factors: monitor, mitigate, redeploy)

Internal factors can principally be controlled (and, if so desired, be modified) at deployment time with techniques that either directly verify the code (static analysis) or work with the help of an underlying mathematical model (model checking, simulation, etc.). Whenever the service implementation does not comply with the metric, the designer makes code modifications that eventually lead to compliance. A typical example of this is the analysis of the resource capacity of a service, which measures to which extent a critical resource is used. For instance, a static analysis technique can determine an upper bound on the resources needed by a service [2, 12]; if a service is deployed on an insufficient number of machines, then its response time increases, or it even becomes unavailable. Thus, internal factors can be expressed and analyzed inside a model, and integrated into the plans for the initial deployment of the service on the cloud.

External factors cannot be controlled or analyzed in advance, but they can be supervised by monitoring code that runs independently of the service implementation. Monitoring is always needed, as there are (performance) metrics that are affected by external factors, for example, hardware failures, which cannot be statically verified. In this case, neither the service implementation nor the resource configuration is at fault. However, a runtime monitor can still be helpful, for example, when it triggers a dynamic resource re-allocation that compensates for a faulty component. Thus, external factors cannot be expressed inside the model and must be monitored on the deployed service and then mitigated through (static or dynamic) redeployment of the service.
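To make the metric definitions above concrete, here is a small sketch in ABS's functional sublanguage. It is our illustration only, not part of the paper's model: the datatype Obs and the function names are hypothetical. It computes the percentage of successful requests (availability metric (ii)) and the throughput (a capacity metric) from a list of observed requests:

module MetricSketch;

// One observed request: its processing time in milliseconds
// and whether it completed without an error.
data Obs = Obs(Int procMs, Bool ok);

// Number of successfully processed requests.
def Int countOk(List<Obs> l) =
  case l {
    Nil => 0;
    Cons(o, rest) => (if ok(o) then 1 else 0) + countOk(rest);
  };

// Availability metric (ii): percentage of successful requests.
def Rat successPct(List<Obs> l) =
  if l == Nil then 100 else 100 * countOk(l) / length(l);

// Capacity metric: throughput as requests per second, assuming all
// observations fall within a window of periodMs milliseconds.
def Rat throughput(List<Obs> l, Int periodMs) =
  1000 * length(l) / periodMs;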
3 A Workflow for Model-Centric SLA Compliance Analysis

In Section 1, we argued that a model-centric representation of SLAs and services constitutes the basis for advanced tool support for cloud service configuration and deployment. In Section 2, we explained how the service quality is influenced by internal factors, to be addressed by compliance checking of service implementations against SLAs, and by external factors, to be mitigated with the help of runtime monitors. In Figure 2, we illustrate a workflow that realizes model-centric configuration of cloud services to optimize SLA compliance under internal and external factors.
Figure 2: Workflow of service configuration and deployment. (The figure connects the artifacts SLA, metric functions, service model, resource configuration, service implementation, and monitor add-on with the deployment analysis tools, the runtime techniques, and the monitoring platform via extract, generate, verify, observe/react, (de-)allocate, provision, and feedback edges; each analysis step reports compliance or violation.)

The workflow is divided into three phases, namely Negotiation, Observation and Reaction. Static (deployment analysis) techniques play an important role in generating the initial metric functions and monitors that are used by the runtime techniques.
Feedback loops to a previous phase of the analysis (represented by dashed arrows) stand for modifications to the system that ensure, for example, continued compliance after external changes. Thick blue arrows indicate tool inputs.
Negotiation phase.
This phase includes everything that might happen before signing a prospective SLA. At this stage the SLA metrics are set, so that the service model can be verified against them. The SLA (top-left corner of Figure 2) is written in a machine-readable standard format (ISO 19086-2). Quality-of-service upper bounds, expressed in terms of metric functions over possible service measurements, are extracted from it. An initial resource configuration is defined over the types of resources that are allocated for the service (such as CPUs, memory, bandwidth, etc.). It can be specified manually, or it can be computed automatically by a solver that returns an optimal distribution of resources to service instances, given the knowledge of the initial instances to be deployed, their required computing resources, and the resource costs [1] (see the sketch after this paragraph). At the same time, an executable service model is extracted from the components of the actual service implementation. The system is provisioned and deployed using the initial resource configuration. (Phrases typeset in italics correspond to the artifacts in rectangular boxes in Figure 2.)

A suite of deployment analysis tools now takes the three inputs (metric functions, service model, and resource configuration) and produces responses as output to form a feasibility assessment. The tools can verify properties such as upper bounds on resource consumption (bandwidth, virtual machines, memory allocation, CPU processing cycles), liveness (deadlock-freedom), and safety (functionality). If the tools report that the service model violates an SLA constraint, then either the constraint can be relaxed or the resource allocation can be suitably enlarged during the negotiation phase (with a possible extra charge). Otherwise, in the absence of external factors, the service implementation and the resource configuration comply with the SLA, and the next phase can start.
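As an illustration of what such a solver computes (a simplified sketch; the actual tool [1] handles far richer constraints, e.g., binding requirements between components), the choice of an initial resource configuration can be phrased as a cost-minimization problem:

minimize   Σ_{m∈M} x_m · price(m)
subject to  Σ_{m∈M} x_m · cap_r(m) ≥ Σ_{s∈S} req_r(s)   for every resource type r,

where M is the set of available virtual machine types, x_m the number of machines of type m to allocate, cap_r(m) the amount of resource r (CPU, memory, bandwidth, ...) provided by a machine of type m, and req_r(s) the amount of resource r required by service instance s ∈ S.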
Observation phase.
The SLA is now signed and the service implementation is up and running. Factors under external control, such as the network infrastructure, may affect the behavior of the service in ways that could not be predicted statically. To supervise the service metrics we use a monitoring system, namely code external to the service that continuously monitors its execution and uses self-healing to repair resource failures or mitigate SLA violations.

The code of the monitor add-ons is automatically generated (or configured), starting from the specific metric functions they are intended to monitor. Static techniques may be used at this stage to prove the correctness of the generated code, i.e., that the monitors observe the right property. Moreover, static techniques may be applied again at runtime, periodically, on the service model, to estimate the future behavior of the service in the next time window. Feedback from the monitoring system can significantly augment the precision of the analysis.
Reaction phase.
System monitoring lets the service provider report violations of the agreed SLA via a monitoring platform. However, the ultimate goal for a provider is to dynamically adapt the resource configuration so that SLA violations remain under a penalty threshold while minimizing the cost of the running system. This can be achieved by adding appropriate resources to the service (e.g., scaling up the number and/or size of the virtual machines).

The observation phase takes measurements on services. If an SLA mismatch is observed by the monitoring platform, the number of allocated resources is increased or decreased accordingly in the reaction phase. As for the initial configuration, the modification of the resources assigned to objects can in this phase be done either manually or automatically. Given the knowledge of the current resource configuration and the new requirements indicated by the monitoring framework, a solver computes which new resources are required and how new service instances should be distributed on these resources, or how old objects and resources that are no longer necessary should be un-deployed. Fully automatic dynamic elasticity can be obtained thanks to the combined use of the monitoring framework and the external deployment solver [14].
4 The ABS Modeling Language

ABS [23] is a modeling language which can be used to realize model-centric analysis of SLA compliance according to the workflow outlined in Section 3. We briefly summarize the main relevant features of ABS.
The ABS tool suite (abs-models.org/abs-tools):
• Simulation tool for rapid model exploration and visualization
• Deadlock analysis tool automatically checks that the model is deadlock free, focusing on the communication protocols in the model
• Systematic testing tool provides a technique to eliminate redundant test cases for the concurrent execution of ABS models
• Test case generation tool for the automatic generation of test cases for concurrent objects in ABS
• Termination and resource consumption tool automatically infers cost bounds for selected parts of the model for, e.g., execution cost or transmission data size
• ABS Smart Deployer finds the optimal deployment of components on virtual machines, given a user specification of how components should be connected and of their resource requirements
• Code generation tools enable rapid prototyping on real machines and integration with other programs, using Haskell or a Java library
• Formal verification tool supports deductive analysis of behavioral properties, including communication traces
• Monitoring framework for SLA metrics is used to automatically configure correct monitors for the deployed system and monitor the system at a high level, according to the SLA
Figure 3: ABS tool suite

ABS is a language for Abstract Behavioral Specification, which was designed for analyzability. It combines implementation-level specifications with verifiability, high-level design with executability, and formal semantics with practical usability. ABS is a concurrent, object-oriented modeling language built around a simple functional language with user-defined algebraic datatypes. Models are easy to understand and written in a familiar, Java-like syntax. In addition, ABS enables replaying a real-world log in the corresponding executable model through a so-called Model API [31]. It also explicitly supports the modeling of resource consumption on virtual machine instances [24]. Thus, the language allows analysis of deployment decisions, including a configurable model of cloud provisioning [17], and has been used for industrial case studies [3]. Both the resource requirements and timing properties of models can be expressed and analyzed, which makes it easy to compare deployment decisions at the level of models [33] by means of a large portfolio of analysis and deployment tools (see Figure 3).

5 Case Study: The Fredhopper Cloud Services

The company Fredhopper provided the Fredhopper Cloud Services to offer search on a large product database to e-Commerce companies as services (SaaS) over a cloud computing infrastructure (IaaS). At the time of the case study, Fredhopper Cloud Services powered over 350 global retailers with more than $16 billion in online sales per year. A customer (service consumer) of Fredhopper is a web shop, and an end user is a visitor to the web shop.

Software services offered by Fredhopper are RESTful and deployed as service instances that accept connections over HTTP.

Figure 4: The architecture of the Fredhopper Cloud Services (customers consume service endpoints, behind which a load balancing service distributes requests over service instances; a monitoring/alerting service and a deployment service manage the instances on the cloud provider's infrastructure)
Each instance offers the same service and is exposed via Load Balancer endpoints that distribute requests using a round-robin strategy over the service instances. Figure 4 shows a block diagram of the Fredhopper Cloud Services.

The number of requests can vary greatly over time, and typically depends on several factors. For instance, the time of day where most of the end users are located plays an important role. Typical lows in demand are observed between 2am and 5am. Figure 5 shows a visualization of monitored data in Grafana, the visualization framework used by ABS. The top graph shows the number of queries completed per second (qps), the middle graph (current requests) shows the number of concurrently served requests averaged over all service instances of the customer, and the bottom graph visualizes the average CPU usage over time.
SLA.
Peaks in demand of the Fredhopper Cloud Services typically occur during promotions of the web shop or around Christmas. To ensure a high quality of service, web shops negotiate an aggressive Service Level Agreement (SLA) with Fredhopper. QoS attributes of interest include query latency (response time) and throughput (queries per second). The SLA negotiated with a customer could express, e.g., service degradation requirements as follows:

"Services must respond to queries in less than 200 milliseconds over 99.5% of the service up-time." (a)

An SLA specifies properties of service metric functions. For the example SLA, the service metric function is defined as the percentage of client requests which are processed in a "slow" manner, i.e., the percentage of queries slower than 200 milliseconds.
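Before turning to the grammar-based formalization below, the intended metric can be stated as a simple executable definition. The following ABS functions are our illustration only (the paper specifies the metric with an attribute grammar, cf. Figure 6), and the function names are hypothetical:

// Number of queries slower than 200 ms in a list of processing times.
def Int slowCnt(List<Rat> procTimes) =
  case procTimes {
    Nil => 0;
    Cons(t, rest) => (if t > 200 then 1 else 0) + slowCnt(rest);
  };

// The "fas.200" degradation metric: the fraction of slow queries.
def Rat degradation(List<Rat> procTimes) =
  if procTimes == Nil then 0
  else slowCnt(procTimes) / length(procTimes);

SLA (a) is met when this value stays at or below 5/1000, i.e., when at most 0.5% of the queries are slow.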
Figure 5: Visualization of metrics
Formalizing the service metric function.
In ABS we formalize a service metric function using an attribute grammar as a partial mapping of traces of events to values. The events represent client interactions with an endpoint of an exposed service API. The mapping can be partial to detect and exclude illegal orderings of service invocations. The values correspond to different levels of the provided quality of service (QoS). The definition of the attributes is given for each production in the form of ABS code. This ensures that the attribute values are computable and sufficiently expressive (ABS is Turing complete) to capture general metrics. To establish whether a trace of service events is legal, and if so, what QoS level it should give rise to, the event trace is parsed according to the grammar. As such, grammars are a user-friendly formalism and are particularly well-suited for the specification of both data- and protocol-oriented properties of event traces. All regular grammars (with attributes) are currently supported. (No such method is known for general context-free grammars, and it is unlikely to exist, as this would give a procedure to parse them in linear time in the size of the trace/sequence.)

To formalize our service degradation metric, we identify the processing of a client request sent to an endpoint of the exposed query service API by an event

invoke(Time t, Rat procTime)
This event indicates that the request has been issued at time t and that it has processing time procTime. In our formalization, a service view identifies all the events that are relevant for a particular service metric and associates a name to each such event. These names will be used as grammar terminals. Since there can be many SLAs in a managed cloud service, and each SLA may concern a different subset of events from the service API(s), service views allow users to select only those events relevant for that SLA. For simplicity, we assume that we treat all requests in the same way. A view that identifies the invoke event as the only relevant event and associates the name "query" with this event is expressed as follows:

view Degradation {
  invoke(Time t, Rat procTime) query
}

Figure 6 contains a grammar that computes as its main metric the percentage of slow queries, "degradation". The string "fas.200" gives the name of the metric. The parameters of the invoke event, e.g., "procTime", are referred to directly in the grammar by their name and are used to compute the "degradation" percentage. The grammar further makes use of the auxiliary attributes "cnt", the total number of queries, and "slowCnt", the total number of slow queries, and returns the metric as a Pair of name and value. (With the usual semantics of context-free grammars, this grammar would generate the empty language: no words, i.e., sequences of query events, of finite length would be derivable without an epsilon production. As a convenience to the user, we allow epsilon productions to be omitted by taking the prefix-closure of the given grammar; for the given grammar, this yields all finite sequences of query events.)
Figure 6: Grammar for Service Degradation
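To convey the flavor of Figure 6 (the concrete syntax below is our approximation of the monitoring framework's grammar notation, not the verbatim figure), the degradation metric can be specified by a left-recursive production over query events, with attributes computed by ABS expressions:

// One production over the terminal "query" (cf. the view above);
// the epsilon production initializing cnt = 0 and slowCnt = 0 is
// left implicit by the prefix-closure convention described above.
Queries ::= Queries query {
  cnt     = Queries.cnt + 1;
  slowCnt = Queries.slowCnt + (if procTime > 200 then 1 else 0);
  metric  = Pair("fas.200", slowCnt / cnt);  // the degradation value
}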
The resource-aware service.
We create an abstract service model in ABS of the various services shown in Figure 4. A detailed model of the services is not necessary to exploit the ABS tool-set; it suffices to create a coarse-grained model that captures the Service APIs with stub implementations, as we shall see. The service model can be refined with more detailed implementations whenever necessary to allow more detailed analyses. By way of example we show the model of a Query Service (Figure 7) and the Load Balancing Service (Figure 8). The load balancer distributes requests by means of a round-robin policy and forwards them to query service instances. (Here, current is the number of service instances available in the current round and services the instances available in the next round, which may change dynamically depending on the scaling policy.) The actual service instances process the requests and return a response, e.g., a list of products that match the query in the case of the Fredhopper Cloud Services. The given ABS model abstracts from a detailed implementation and focuses on execution cost by means of the statement [Cost: cost] log = log + 1. The annotation [Cost: cost] is a measure of the estimated number of instructions. An initial value for it can be obtained by using the SACO tool [2] for cost analysis of models in the ABS tool suite [33], or by averaging execution times from real-world client logs produced from existing code.

class QueryServiceImpl (...) implements QueryService {
  ...
  Response invoke(Request request) {
    assert state == RUNNING;
    Int cost = cost(request);
    Int time = currentms();
    [Cost: cost] log = log + 1;
    time = currentms() - time;
    latency = max(latency, time);
    return success();
  }
}

Figure 7: Query Service

class LoadBalancerEndPointImpl implements LoadBalancerEndPoint {
  Int log = 0;
  State state = STOP;
  List ...
}

Figure 8: Load Balancing Service Endpoint (shown in part)
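Since Figure 8 is shown only in part, the following sketch is our reconstruction for illustration, not the authors' exact model; the interface, the State type and the constants STOP/RUNNING are assumed to match Figure 7, and the add method is a hypothetical hook for the deployer:

class LoadBalancerEndPointImpl implements LoadBalancerEndPoint {
  Int log = 0;
  State state = STOP;
  Int current = 0;                    // instances left in this round
  List<QueryService> services = Nil;  // instances for the next round

  Response invoke(Request request) {
    assert state == RUNNING;
    if (current == 0) { current = length(services); }  // new round
    // round robin: take the head, re-append it for the next round
    QueryService s = head(services);
    services = appendright(tail(services), s);
    current = current - 1;
    log = log + 1;
    Fut<Response> f = s!invoke(request);  // forward asynchronously
    await f?;
    return f.get;
  }

  // called when scaling up adds a new service instance
  Unit add(QueryService s) { services = appendright(services, s); }
}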
Negotiation phase.
Before we can accept a proposed SLA, we need to determine whether we can meet it with appropriate expense by deploying a number of QueryServiceImpl instances. We assume a setting where QueryServiceImpl instances run on virtual machines with an allocated capacity of K execution resources (CPU execution capacity, also called ECU).

Static analysis with SACO [2] yields cost/K as the total time required by the invoke method to reply to a single query. Therefore, measuring time in seconds, we obtain cost/K ≤ 1/5 (a response within 200 milliseconds) as a first bound from the SLA (a) above. In order to meet the service degradation requirement expressed in the SLA (a), we need to determine the minimum number of resources in a configuration that complies with the SLA. For simplicity, we here assume a uniform arrival time for the requests, ignore the overhead of load balancing and distribution, and let n be the number of machines with k execution resources that we need. In this case, we know that cost/(n × k) ≤ 1/5, and we obtain 5 × cost/k ≤ n. For example, if the analysis reports cost = 100 and each machine provides k = 250 ECU, then n ≥ 2 machines are needed. For more complex scenarios (especially involving sub-services and synchronization), the ABS tool suite [33] comes in handy to help calculate the required number of machines.

This analysis ignores the actual arrival times of requests as well as any external factors (see Figure 1) which may disrupt service execution. To ensure compliance with the service metrics under non-ideal conditions, we use a monitoring platform, external to the service, that continuously observes it.
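The initial configuration can then be set up directly in the model, e.g., in its main block. The sketch below uses the Real-Time ABS cloud API ([17], [24]) as we understand it: CloudProvider, launchInstance, the Speed resource type and the [DC: ...] annotation belong to that API, while nMachines, k, balancer, and the argument-less QueryServiceImpl constructor are illustrative assumptions:

// Provision nMachines VMs with k ECU each and register one query
// service instance per VM with the load balancer (illustrative sketch).
CloudProvider provider = new CloudProvider("cloud");
Int i = 0;
while (i < nMachines) {   // nMachines >= 5 * cost / k, from SLA (a)
  DeploymentComponent vm =
    await provider!launchInstance(map[Pair(Speed, k)]);
  [DC: vm] QueryService s = new QueryServiceImpl();  // args elided
  balancer!add(s);        // hypothetical registration method
  i = i + 1;
}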
The observation phase. The observation phase in our framework [7] consists of computing the value of the service metric function, as specified by the grammar in Figure 6, from a given event trace. This involves parsing the event trace according to the grammar. From the grammar we automatically synthesize an ABS implementation of the corresponding parser. The use of grammars allows us to build on well-established and widely known parsing technology with optimal performance. Observations can also come from external systems, which publish events to the model using an API over HTTP.

Given our service model in ABS, we can now replay a real-world log using this API, which generates corresponding invoke events for the model according to the specified timings in the logfile (see Figure 9). The resulting trace of invoke events is then parsed according to the grammar in order to compute the "degradation" service metric.
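A minimal sketch of how the model can expose itself for such log replay, based on the documented ABS Model API [31]; the object name "endpoint" and the port are illustrative, and the replay script itself lives outside the model:

// Expose the endpoint object under the name "endpoint"; methods of
// LoadBalancerEndPoint marked [HTTPCallable] then become reachable via
// GET http://localhost:8080/call/endpoint/<method>?...
[HTTPName: "endpoint"] LoadBalancerEndPoint e = new LoadBalancerEndPointImpl();

An external replay script reads the logfile and issues one such HTTP request per logged query, at the logged time, which produces the invoke events consumed by the grammar-generated parser.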
Reaction phase.
Figure 10 shows a monitor corresponding to the grammar in Figure 6 for service degradation. Here metricHist contains the time-stamped history of metric values, which is provided by the general ABS monitoring framework. The monitoring framework further integrates a powerful tool (the ABS Smart Deployer [14]) for the automated deployment of new service instances, based on high-level requirements on deployment configurations. A solver synthesizes a provisioning script, executable in ABS, that implements DeployerIF with appropriate scaling actions, such as allocating new virtual machines, and configuring and deploying additional service instances on these machines. This approach guarantees that the scaling actions preserve the deployment requirements.

The ABS monitor in Figure 10 reacts to the service degradation metric (cf. Figure 6) by asking the deployer to scale the service instances up or down. For instance, if the degradation is larger than 5/1000 (cf. SLA (a)), the method scaleUp is invoked to get more service instances; and if the degradation is less than 1/1000, the method scaleDown is invoked to reduce the number of service instances. Monitoring can be expensive; we must ensure that the monitoring does not degrade performance below the level stipulated in the SLA. Static analysis and simulation of the ABS model together with the monitor allow us to analyze how the monitor affects the SLA before the system is deployed. ABS allows monitors to be deployed asynchronously and decoupled.
Figure 9: Log replay
Unit monitor(DeployerIF deployer) {
  Rat degradation = head(metricHist);
  if (degradation > 5/1000) {
    deployer.scaleUp();
  } else if (degradation < 1/1000) {
    deployer.scaleDown();
  }
}

Figure 10: Monitor for Service Degradation
6 Related Work

The methodology presented in this paper has been devised in the context of the EU project Envisage to provide efficient development of SLA-aware and scalable services, supported by highly automated analysis tools using formal methods.

While there are several proposals for formalizing SLAs [27, 28], there is no study on how such SLAs can be used to both verify and monitor the service and upgrade it as necessary. In this respect, to the best of our knowledge, our technique that uses both static analysis and runtime analysis is original. Below we report the main related work on analysis, deployment and runtime monitoring of systems, focusing on work which is relevant in the context of cloud systems.

Static analysis estimates the computational complexity (e.g., time) or resource usage of a given program and provides guarantees that the program will not exceed the inferred amount of resources [16, 20]. Typically, such analyses apply to traditional sequential applications and, in order to use the above techniques […]
7 Conclusion

This paper describes the analysis of SLA compliance for services deployed on the cloud by combining the formal models of the SLA and of the cloud service. Based on these two formal models, a detailed model-centric, tool-supported workflow is defined to obtain a configuration of cloud services in a semi-automated manner. The basis for our approach is the modeling language ABS, which supports the modeling of deployment decisions on elastic infrastructure and underlies a scalable monitoring framework for deployed services based on service metric functions. Using an industrial case study from the Fredhopper Cloud Services, we show that our model-based approach can help to address the challenging questions posed in Section 1. Our specific combination of model-based tools is a good match for a DevOps methodology.
References

[1] Erika Ábrahám, Florian Corzilius, Einar Broch Johnsen, Gereon Kremer & Jacopo Mauro (2016): Zephyrus2: On the Fly Deployment Optimization Using SMT and CP Technologies. In Martin Fränzle, Deepak Kapur & Naijun Zhan, editors: SETTA, LNCS.
[2] SACO: Static Analyzer for Concurrent Objects. In Erika Ábrahám & Klaus Havelund, editors: Tools and Algorithms for the Construction and Analysis of Systems, 20th Intl. Conf., LNCS.
[3] Formal Modeling of Resource Management for Cloud Architectures: An Industrial Case Study using Real-Time ABS. Journal of Service-Oriented Computing and Applications.
[4] Resource Analysis of Distributed Systems. In Erika Ábrahám, Marcello M. Bonsangue & Einar Broch Johnsen, editors: Theory and Practice of Formal Methods, LNCS.
[5] Apache Mesos. http://mesos.apache.org/.
[6] Amazon AWS: Amazon CloudWatch. https://aws.amazon.com/cloudwatch/.
[7] Frank S. de Boer & Stijn de Gouw (2014): Combining Monitoring with Run-Time Assertion Checking. In Marco Bernardo, Ferruccio Damiani, Reiner Hähnle, Einar Broch Johnsen & Ina Schaefer, editors: Formal Methods for Executable Software Models, 14th Intl. School on Formal Methods for the Design of Computer, Communication, and Software Systems, SFM, Advanced Lectures, LNCS.
[8] SLA Decomposition: Translating Service Level Objectives to System Level Thresholds. In: Fourth International Conference on Autonomic Computing (ICAC), Jacksonville, Florida, USA, IEEE Computer Society, p. 3, doi:10.1109/ICAC.2007.36.
[9] Marco Comuzzi, Constantinos Kotsokalis, George Spanoudakis & Ramin Yahyapour (2009): Establishing and Monitoring SLAs in Complex Service Based Systems. In: IEEE International Conference on Web Services, ICWS, Los Angeles, CA, USA, IEEE Computer Society, pp. 783–790, doi:10.1109/ICWS.2009.47.
[10] Roberto Di Cosmo, Michael Lienhardt, Jacopo Mauro, Stefano Zacchiroli, Gianluigi Zavattaro & Jakub Zwolakowski (2015): Automatic Application Deployment in the Cloud: from Practice to Theory and Back. In Luca Aceto & David de Frutos-Escrig, editors: CONCUR 2015, LIPIcs 42, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, pp. 1–16, doi:10.4230/LIPIcs.CONCUR.2015.1.
[11] Roberto Di Cosmo, Jacopo Mauro, Stefano Zacchiroli & Gianluigi Zavattaro (2014): Aeolus: A component model for the cloud. Inf. Comput.
[12] Static analysis of cloud elasticity. Sci. Comput. Program.
[13] Time Complexity of Concurrent Programs—A Technique Based on Behavioural Types. In Christiano Braga & Peter Csaba Ölveczky, editors: Formal Aspects of Component Software, 12th Intl. Conference, FACS, Niterói, Brazil, Revised Selected Papers, LNCS.
[14] Declarative Elasticity in ABS. In Marco Aiello, Einar Broch Johnsen, Schahram Dustdar & Ilche Georgievski, editors: Proc. 5th IFIP WG 2.14 European Conference on Service-Oriented and Cloud Computing (ESOCC), LNCS.
[15] Cloud Service Level Agreement Standardisation Guidelines. Developed as part of the Commission's European Cloud Strategy. Available at http://ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?action=display&doc_id=6138.
[16] Sumit Gulwani, Krishna K. Mehra & Trishul M. Chilimbi (2009): SPEED: precise and efficient static estimation of program computational complexity. In Zhong Shao & Benjamin C. Pierce, editors: Proc. 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL, Savannah, GA, USA, ACM, pp. 127–139, doi:10.1145/1480881.1480898.
[17] Reiner Hähnle & Einar Broch Johnsen (2015): Designing Resource-Aware Cloud Applications. IEEE Computer.
[18] Ansible.
[19] Kelsey Hightower, Brendan Burns & Joe Beda (2017): Kubernetes: Up and Running: Dive into the Future of Infrastructure, 1st edition. O'Reilly Media, Inc.
[20] Jan Hoffmann & Martin Hofmann (2010): Amortized Resource Analysis with Polynomial Potential. In Andrew D. Gordon, editor: Programming Languages and Systems, 19th European Symposium on Programming, ESOP, Paphos, Cyprus, LNCS.
[21] Mutant Apples: A Critical Examination of Cloud SLA Availability Definitions. In: IEEE 5th International Conference on Cloud Computing Technology and Science, CloudCom, Bristol, United Kingdom, Volume 1, IEEE Computer Society, pp. 379–386, doi:10.1109/CloudCom.2013.56.
[22] Christian Inzinger, Waldemar Hummer, Benjamin Satzger, Philipp Leitner & Schahram Dustdar (2014): Generic event-based monitoring and adaptation methodology for heterogeneous distributed systems. Softw., Pract. Exper.
[23] ABS: A Core Language for Abstract Behavioral Specification. In Bernhard K. Aichernig, Frank de Boer & Marcello M. Bonsangue, editors: Proc. 9th International Symposium on Formal Methods for Components and Objects (FMCO 2010), LNCS.
[24] Integrating deployment architectures and resource consumption in timed object-oriented models. Journal of Logical and Algebraic Methods in Programming.
[25] Translation of SLAs into monitoring specifications. In P. Wieder, J. Butler, W. Teilmann & R. Yahyapour, editors: Service Level Agreements for Cloud Computing, Springer, pp. 79–101.
[26] Luke Kanies (2006): Puppet: Next-generation configuration management. ;login: the USENIX magazine.
[27] The WSLA Framework: Specifying and Monitoring Service Level Agreements for Web Services. J. Network Syst. Mgmt.
[28] SLAng: A Language for Defining Service Level Agreements. In: Proc. 9th IEEE Intl. Workshop on Future Trends of Distributed Computing Systems (FTDCS), San Juan, Puerto Rico, IEEE Computer Society, pp. 100–106, doi:10.1109/FTDCS.2003.1204317.
[29] Opscode: Chef.
[30] Puppet Labs: Marionette Collective. http://docs.puppetlabs.com/mcollective/.
[31] Rudolf Schlatte, Einar Broch Johnsen, Jacopo Mauro, Silvia Lizeth Tapia Tarifa & Ingrid Chieh Yu (2018): Release the Beasts: When Formal Methods Meet Real World Data. In Frank S. de Boer, Marcello M. Bonsangue & Jan Rutten, editors: It's All About Coordination: Essays to Celebrate the Lifelong Scientific Achievements of Farhad Arbab, LNCS.