Long-term IaaS Provider Selection using Short-term Trial Experience

Sheik Mohammad Mostakim Fattah, Athman Bouguettaya, and Sajib Mistry
School of Computer Science, University of Sydney, Australia
Email: {sfat5243, athman.bouguettaya, sajib.mistry}@sydney.edu.au

Abstract—We propose a novel approach to select privacy-sensitive IaaS providers for a long-term period. The proposed approach leverages a consumer's short-term trial experiences for long-term selection. We design a novel equivalence partitioning based trial strategy to discover the temporal and unknown QoS performance variability of an IaaS provider. The consumer's long-term workloads are partitioned into multiple Virtual Machines in the short-term trial. We propose a performance fingerprint matching approach to ascertain the confidence of the consumer's trial experience. A trial experience transformation method is proposed to estimate the actual long-term performance of the provider. Experimental results with real-world datasets demonstrate the efficiency of the proposed approach.
Keywords—Long-term Selection; Privacy Sensitiveness; IaaS Providers; Performance Fingerprint; Performance Discovery; Equivalence Partitioning
I. INTRODUCTION
Cloud computing is a key technology of choice for small to large organizations to establish and manage their IT infrastructures [1]. The cloud provides a fast and cost-effective way to migrate in-house IT infrastructures. A large number of organizations manage their IT infrastructures in the cloud to achieve economy of scale. Large organizations such as governments, universities, and banks subscribe to cloud services over a long-term period (e.g., more than a year) [2]. Infrastructure-as-a-Service (IaaS) is a primary service delivery model in the cloud. IaaS models typically offer computational resources such as CPU, memory, storage, and network bandwidth in the form of Virtual Machines (VMs). Amazon, Google, and Microsoft are examples of popular IaaS providers.
IaaS provider selection for a long-term period is a topical research issue in cloud computing [3]. The performance of IaaS providers plays an important role in their selection. IaaS performance is often measured in terms of Quality of Service (QoS) parameters such as price, throughput, and availability. A consumer is generally concerned with two key aspects of IaaS performance for long-term selection. First, how the provider may perform under the consumer's long-term workloads: the performance of IaaS providers usually varies depending on the workloads [4]. Second, how the performance may vary over the long-term period for those workloads. Most IaaS providers are reluctant to reveal much information about their performance to protect themselves from their competitors. We define this unwillingness to reveal information as the privacy-sensitiveness of IaaS providers. Privacy-sensitiveness is an intrinsic characteristic of IaaS providers that restricts them from divulging detailed and complete information about their services. The main reasons for such privacy-sensitiveness are market competition and business secrecy [5].

Most existing studies focus on short-term IaaS provider selection approaches [6]. These approaches rely on IaaS advertisements for the selection process and are not applicable to selecting privacy-sensitive IaaS providers. IaaS advertisements typically contain incomplete and convoluted information to protect providers' business privacy. For instance, Amazon AWS mentions only the availability of a service in its advertisements; information about throughput and response time is not available. IaaS providers often advertise average or maximum performance information for their services. For instance, the Amazon EC2 A2 instance advertises its network performance as up to 10 Gbps. A consumer may not rely on such advertisements, as the actual performance is not guaranteed.

Several studies introduce application and micro-benchmarks to predict the performance of IaaS providers according to consumer requirements [7]. Application benchmarks evaluate providers using different applications such as web applications and database applications. Micro-benchmarks reveal the performance of individual VM resources such as CPU, memory, and network bandwidth [8]. These approaches do not consider the long-term performance variability of IaaS providers. IaaS providers in the cloud market offer free trials for their services. For example, Microsoft Azure offers a $200 credit for 30 days for a limited number of services. Although IaaS providers do not explicitly share detailed information about their services, consumers may get first-hand experience with IaaS providers during the free trial periods. To the best of our knowledge, existing studies do not consider the effective utilization of free trial periods for long-term IaaS provider selection.
We aim to utilize free trial periods to find out unknown QoS performance information of IaaS providers for the long-term IaaS provider selection.
There are two main challenges in using trial periods for long-term selection. First, IaaS providers typically offer free trial periods for short terms with limited flexibility. The consumer cannot test its long-term workloads in such short trial periods. An unplanned utilization of such
short-term trial periods may not properly reflect the actual performance of the provider. For example, if the workloads of a consumer have a long-tailed distribution, a one-month trial with a balanced request distribution may not divulge the true performance of the long-tailed workloads. Second, the performance information found in the trial periods is applicable only for a short-term period. The performance of public IaaS providers varies over time due to the dynamic and chaotic nature of the cloud environment [9]. The performance observed in trial periods primarily depends on the consumer's workloads and the provider's performance at that time. Both of these factors should be taken into account while performing the trial. We propose a novel trial strategy based on an equivalence partitioning method to capture the effect of the consumer's workloads on the provider's performance while considering the provider's temporal performance behaviour.
We utilize the concept of a performance fingerprint for the long-term selection. The performance fingerprint of an IaaS provider represents an aggregated view of its temporal performance behavior.
We assume the performance fingerprints of IaaS providers are known in this work. We propose a fingerprint matching technique to ascertain the confidence of the consumer's trial experience for long-term selection.
If the trial experience of a consumer is consistent with a provider's performance fingerprint, we utilize the fingerprint to predict the provider's long-term performance for the consumer's long-term workloads. The trial experience may not entirely match the performance fingerprint, as the fingerprint represents an aggregated view of the provider's performance regardless of the consumer's workloads. The provider may also provide an isolated trial environment where a consumer may not be able to observe its actual performance.
We propose a trial experience transformation technique that uses the provider's performance fingerprint to estimate the actual performance of the provider for the consumer's workloads.
Our contributions in this work are as follows:

• An equivalence partitioning based trial strategy using a time series compression technique that maps the consumer's long-term workloads into multiple VMs in a short-term trial period to discover a privacy-sensitive IaaS provider's unknown QoS performance.

• A performance fingerprint matching technique to ascertain the confidence of the consumer's trial experience using the providers' performance fingerprints.

• A long-term performance discovery approach to select privacy-sensitive providers using time-series analysis.
II. MOTIVATION SCENARIO
Let us assume a university requires some general purpose VMs for one year, where each VM has at least 2 vCPU and 4 GB memory. The required number of VMs and the resource requirements for each VM are considered the functional requirements of the university. We assume the university has deterministic workloads, i.e., the workloads are known for one year. The university represents the workloads in terms of the number of requested resources per day. The workloads may change over time depending on the number of students, holiday periods, and so on. The university defines minimum QoS requirements on throughput, response time, and availability of the VMs. The QoS requirements may also vary over time depending on seasonal demands.

Let us assume there are three IaaS providers, Google, Amazon, and Microsoft, that fulfil the university's functional requirements. No provider advertises its long-term performance on throughput, response time, and availability. We assume each provider offers a one-month free trial period to the university and allows the university to use three VMs. The university may run some representative benchmarks on the three VMs for each day of the one-month trial period and monitor the performance of each provider to make the selection. This may lead to poor decision making, as it does not consider the university's long-term workloads and the providers' temporal performance behaviours. The performance of a provider may fluctuate in the trial period. The university requires an effective trial strategy to understand the effect of different types of workloads on the provider's performance while considering the provider's temporal performance behaviour.

We assume that the performance fingerprint of each provider is known to the university. The performance fingerprint provides the university with an aggregated view of a provider's temporal performance behaviour regardless of any specific type of workload distribution. Hence, the university needs to evaluate the performance of the providers using its own workloads. If the provider's performance fingerprint and the trial experience exhibit similar temporal performance behaviour, the university may use the trial experience to evaluate the provider with high confidence. The university requires a fingerprint matching technique to evaluate its trial experience. We propose a set of tools in this paper that enables the university to leverage trial periods effectively to make an informed decision for the long-term period.

III. THE PROPOSED FRAMEWORK
We identify the following key challenges for the long-term selection using free trial periods:

(1) Restriction on Trial Periods: IaaS providers impose different types of restrictions on free trial periods:

• Free trial periods are typically offered for a short-term period. Discovering long-term performance directly from short-term trials may not be possible. Amazon offers a one-year trial period for some services; a consumer may not be able to wait such a long period to discover performance for the IaaS selection.

• Most providers offer trial periods only for a limited number of services. For example, Amazon allows a user to trial only t2 instances from EC2 VMs. The required types of VMs of a consumer may not be available for trial. In such a case, the consumer might be provided with similar yet different types of VMs for the trial. IaaS providers also restrict the number of available VMs for trial.

Figure 1: Long-term IaaS Provider Selection Framework

(2) Temporal Performance Variability:
The performance discovered in the short-term trial periods may not always reflect the actual performance of the provider. Almost all public IaaS providers use multi-tenant environments to provide services to their consumers. The effect of multi-tenancy on the performance may depend on several factors, such as location, workloads on the provider, and QoS management strategy, that vary with time. Multi-tenancy management policies are not revealed publicly due to the privacy-sensitiveness of the providers.
The performance measured in the trial in one month may be different in another month.

(3) Isolated Trial Environment: IaaS providers may use an isolated environment for trial users. In such a case, trial consumers do not perceive the experience of a real cloud environment.
The consumers require a way to find out whether they are treated differently than the existing consumers.
Fig. 1 shows an IaaS provider selection framework that takes a consumer's long-term workloads and the performance fingerprints of the providers to perform the selection. First, the proposed framework generates trial workloads using an equivalence partitioning method. Next, a performance fingerprint matching technique is applied to the trial experience to ascertain its confidence. The trial experience is then used for long-term performance prediction using the providers' fingerprints. Finally, the framework selects providers based on the consumer's long-term performance requirements. We discuss each of these steps in the following sections.

IV. AN EQUIVALENCE PARTITIONING BASED TRIAL STRATEGY
We define an equivalence partitioning based trial strategy where the consumer's workloads are tested in the trial period to discover a provider's performance while considering the provider's temporal performance fluctuation. For simplicity, we assume that the providers offer a fixed number of required VMs in the trial period using a continuous time-based model. We consider the long-term workloads as time series data. We utilize time series compression techniques to capture the essential characteristics of the university's long-term workloads and map these workloads onto the multiple VMs in the trial period.
A. Trial Workload Generation for Multiple VMs
Let us assume the university's long-term workload has $n$ workload data points, i.e., $t_1, t_2, t_3, ..., t_n$, over a period $T$. For instance, the university defines the workload as the average number of requested resources per day for one year. Each provider offers $v$ VMs for a trial period $T_r$. We assume that the performance fluctuation within a period $d$ is negligible. Hence, the workloads for a particular VM should remain the same for every $d$ period over $T_r$ to capture the effect of the temporal performance behavior of a provider. Each VM may run different types of workloads to capture the effect of the provider's performance for different types of workloads. This method of partitioning the workloads is called equivalence partitioning.

The university's long-term workload of size $n$ needs to be mapped onto the $v$ VMs over the period $T_r$. First, we partition the workload into $n/v$ equal parts. Let us assume each part $w$ contains $m$ workload data points. Once we allocate the workloads to each VM, we need to compress each part, as $w$ may still be too large to run in a $d$ period. For example, if the university has one-year workloads and the number of VMs is 12, each part of the workload contains one month of workloads. Each VM should then run one month of workloads on every day ($d = 1$) of the trial period $T_r$; hence, we need to compress each $w$ into a $d$ period for each VM.

Let us assume $d$ can be divided into $k$ data points. If $m \leq k$, then every workload point of $w$ can be tested in the $d$ period. If $m > k$, then there are more workload points than can fit and compression is required. The compression may incur some loss of workload information. The loss depends on the size of $k$, the shape of the workload time series, and the compression method [10]. We use a compression technique $M$ to extract the most important workloads from $w$ and fit them into the $k$ intervals of the $d$ period. $L()$ is the loss function that calculates the loss incurred during the compression using $M$, and $th$ is the maximum acceptable loss, defined by the consumer. During compression, the condition $L(M) \leq th$ must hold; our target is to find an optimal value of $k$ for which it does.

We use Algorithm 1 to generate the workload for the $d$ period. The algorithm takes each $w$ from the university's long-term workload, the length of $d$, a compression technique $M$, a loss function $L()$, and the maximum acceptable loss $th$, and produces the trial workload $tw$ as output. First, it determines the initial value of $k$ using a workload summarization method, which extracts the distinct workloads from the given workloads. The algorithm sets the sampling rate $rate$ based on the size $m$ of the workload and the initial value of $k$. It then applies the compression method $M$ using $rate$ and calculates the amount of loss using $L()$. This process continues while the loss $error$ is below the maximum acceptable loss $th$. Once $error \geq th$, the algorithm stops and returns the trial workload $tw$.
Algorithm 1 Generating Trial Workloads

Input: $w$, $d$, $M$, $L()$, $th$
Output: $tw$

  $m \leftarrow size(w)$;
  $workloadSummary \leftarrow summary(w)$;
  $k \leftarrow size(workloadSummary)$;
  $rate \leftarrow \lceil m/k \rceil$;
  $errorThresh \leftarrow th$;
  $error \leftarrow 0$;
  while $error < errorThresh$ do
      if $rate == size(w)$ then break;
      $tw \leftarrow M(w, rate)$;
      $error \leftarrow L(w, tw)$;
      $rate \leftarrow rate + 1$;
  return $tw$;
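The following is a minimal Python sketch of Algorithm 1, not a definitive implementation: the use of the distinct workload levels as the workload summary, the function names, and the treatment of the returned candidate are our assumptions; $M$ and $L$ are passed in as callables (e.g., the PAA compression and loss of Section IV-B).

```python
import math

def generate_trial_workload(w, M, L, th):
    """Sketch of Algorithm 1: compress one workload part `w` for a d period.
    M(w, rate) is the compression method; L(w, tw) is the loss function;
    th is the consumer's maximum acceptable loss."""
    m = len(w)
    workload_summary = set(w)      # assumed summary: the distinct workload levels
    k = len(workload_summary)
    rate = math.ceil(m / k)        # initial sampling rate
    error, tw = 0.0, list(w)
    while error < th:
        if rate >= m:              # cannot compress beyond a single segment
            break
        tw = M(w, rate)            # candidate trial workload at this rate
        error = L(w, tw)           # loss incurred by this compression rate
        rate += 1
    return tw
```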
B. Workload Compression Technique

The university's long-term workloads are represented as time series. Several approaches exist to compress time series data and generate representative time series. We use a commonly used time series compression technique called Piecewise Aggregate Approximation (PAA). The PAA method reduces the number of data points in a time series by taking the average value in each interval ($I_j$). If $t_i$ represents a timestamp in the workload time series and $m$ is the total number of points in the time series, then the value of the time series in $I_j$ is calculated using the following equation:

$$\bar{I}_j = \frac{1}{x} \sum_{i=(j-1) \cdot x + 1}^{j \cdot x} t_i \quad \text{for } j = 1, ..., \lceil m/x \rceil \quad (1)$$

The size of the original time series can be reduced by any factor by changing the value of $x$. For given $m$ data points in the workloads and $k$ segments in the trial period, we define the minimum $x = \lceil m/k \rceil$. The PAA method introduces some loss of information. For two given time series $w$ and $z$, the loss is defined by the following equation:

$$L = \frac{1}{m} \sum_{i=1}^{m} |z_i - w_i| \quad (2)$$

The original workload time series and the compressed workload time series have different numbers of workload points. We therefore decompress the compressed workload time series to compare it with the original workload time series. We find $k$ workload points using Equation 1 during the compression. We then apply a decompression mechanism on the compressed workload of $k$ points to generate $m$ points. The decompression is performed by mapping each value of the compressed time series with a $\lceil m/k \rceil$ interval into the $m$ space. The remaining $m - k$ points are generated by the linear extrapolation method. After decompression, both the original and the compressed workload time series have the same number of points. Hence, we can apply Equation 2 to compute the mean absolute error.
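A sketch of the PAA compression of Equation 1 and the loss of Equation 2, usable as $M$ and $L$ in the Algorithm 1 sketch above. Note one assumption: decompression here uses linear interpolation between the segment means as a stand-in for the extrapolation step described in the text.

```python
import math
import numpy as np

def paa_compress(series, x):
    """Equation 1: replace each window of x points with its mean."""
    s = np.asarray(series, dtype=float)
    return np.array([s[j * x:(j + 1) * x].mean()
                     for j in range(math.ceil(len(s) / x))])

def paa_decompress(compressed, m):
    """Map the k segment means back over m points for comparison."""
    anchors = np.linspace(0, m - 1, num=len(compressed))
    return np.interp(np.arange(m), anchors, compressed)

def mae_loss(original, compressed):
    """Equation 2: mean absolute error after decompression."""
    recovered = paa_decompress(compressed, len(original))
    return float(np.mean(np.abs(np.asarray(original, dtype=float) - recovered)))
```

Passing `paa_compress` as `M` and `lambda w, tw: mae_loss(w, tw)` as `L` into `generate_trial_workload` reproduces the trial workload generation loop under these assumptions.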
V. PERFORMANCE FINGERPRINT MATCHING

The performance information found in the trial periods is applicable for a short-term period. Many organizations, such as CloudSpectator, CloudHarmony, and CloudStatus, are devoted to monitoring and analyzing the performance of public IaaS cloud providers due to its growing importance. These organizations publish reports on the performance of IaaS providers using standard benchmarks. Each provider shows unique performance characteristics and exhibits distinct temporal performance behavior over long-term periods, which may depend on its provisioning policy, the number of consumers, and location.

We leverage the idea of fingerprinting to represent the temporal performance behavior of a provider. Fingerprinting techniques are well-known for identifying and tracking a user on the Internet based on the impression left by the user [11]. They are typically used to partially or fully identify a user by tracking its activity and preferences without any active identification. We use the concept of a performance fingerprint of an IaaS provider to represent an aggregated view of the provider's long-term performance behavior.
Definition 1 (Performance Fingerprint). A performance fingerprint of an IaaS provider is the average performance of a set of QoS parameters for each time interval over a fixed period that captures the provider's temporal performance behavior.

We denote the performance fingerprint as $F = \{Q_1, Q_2, ..., Q_N\}$, where $N$ is the number of QoS parameters and $Q_i = \{(P_n, t_n) \mid n = 1, 2, 3, ..., k\}$. Here, $t_n$ denotes a timestamp in the period $T$ where the average performance of $Q_i$ is $P_n$. The performance fingerprint of a provider may be known partially or completely. A partial fingerprint refers to a fingerprint that does not have information for all timestamps of a certain period.
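As a concrete illustration, a fingerprint can be stored as a per-QoS map of timestamped average performances. This representation (and the partial-fingerprint check) is our own sketch, not prescribed by the definition; the values shown are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PerformanceFingerprint:
    """F = {Q_1, ..., Q_N}: for each QoS parameter, the average
    performance P_n observed at each timestamp t_n of the period T."""
    qos: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)

    def is_partial(self, period_timestamps) -> bool:
        """Partial if any parameter misses a timestamp of the period."""
        return any(set(t for t, _ in series) < set(period_timestamps)
                   for series in self.qos.values())

# illustrative daily throughput fingerprint over a year
fp = PerformanceFingerprint(qos={
    "throughput": [(t, 220.0) for t in range(365)],
})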
A. Performance Fingerprint Matching

We utilize the performance fingerprint of each provider to ascertain the confidence of the trial experience. If the trial experience is consistent with a provider's performance fingerprint, then the consumer may make the selection with confidence based on the trial experience.

We assume the complete performance fingerprints of the providers are known for the period $T$. The trial is performed in the interval $(t_j, t_k) \in T$, i.e., $T_r = (t_j, t_k)$ where $j < k$ and $j, k \in T$, for a set of VMs $v = \{v_1, v_2, ..., v_p\}$. The trial performance observed by the consumer for each VM is $Q_{vi} = \{q_1, q_2, ..., q_c\}$, where $c$ is the number of QoS parameters in the consumer requirements and $q_i = \{(p_n, t_n) \mid n = t_j, ..., t_k\}$, where $p_n$ is the performance of $q_i$ at the timestamp $t_n$. The first step of fingerprint matching is to aggregate the performance of each VM $v_i$ for each QoS parameter $q_i \in Q_{vi}$. The aggregated performance for each QoS parameter is computed by the following equation:

$$q'_i = sum(v_1(q_i), v_2(q_i), ..., v_p(q_i)) \quad (3)$$

where $sum()$ represents the aggregate function and $v_j(q_i)$ represents the performance time series of QoS parameter $q_i$ in VM $v_j$ in the trial period. The aggregated QoS performance of $q_i$ over all VMs is $q'_i$.

We denote the performance of the trial period for the consumer's aggregated workloads as $Q_{VM} = \{q'_1, q'_2, ..., q'_c\}$. Now, we need to perform fingerprint matching between $Q_{VM}$ and $F$ for the trial interval $(t_j, t_k)$. We use the Pearson correlation coefficient to compute the similarity between the trial experience and the performance fingerprint for each QoS parameter using the following equation:

$$r_{q'_i, q_i} = \frac{\sum_{t=j}^{k} (p'_t - \bar{p}')(p_t - \bar{p})}{\sqrt{\sum_{t=j}^{k} (p'_t - \bar{p}')^2} \sqrt{\sum_{t=j}^{k} (p_t - \bar{p})^2}} \quad (4)$$

where $p'_t$ is the value of the observed performance and $p_t$ is the value of the performance fingerprint at time $t$ of the trial period for a QoS parameter. The mean correlation coefficient over all QoS parameters is computed as follows:

$$R_{Q_{VM}, F} = \frac{1}{c} \sum_{i=1}^{c} r_{q'_i, q_i} \quad (5)$$

The correlation coefficient measures the similarity of two time series in terms of their trends, i.e., how much the trial experience is affected by the performance fingerprint. It does not consider the actual distance from the fingerprint. We define the confidence of the trial by considering both the trend and the distance of the trial experience with respect to the fingerprint using the following equation:

$$\text{Confidence} = (R_{Q_{VM}, F}, \; MNRMSE(Q_{VM}, F)) \quad (6)$$

where $MNRMSE$ is the mean normalized root mean squared error between the trial experience and the performance fingerprint. First, we compute the NRMSE for each QoS parameter in the trial period. The MNRMSE is computed by taking the average NRMSE over all QoS parameters. The consumer defines minimum thresholds $R_t$ and $E_t$ for $R_{Q_{VM}, F}$ and $MNRMSE(Q_{VM}, F)$ respectively. If the confidence of the trial experience is below the thresholds, we consider it a partial fingerprint matching.
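A sketch of the confidence computation of Equations 4-6, under two assumptions of ours: each QoS series is a numpy array aligned over the trial interval, and the NRMSE is normalized by the fingerprint's value range (the text does not fix the normalizer).

```python
import numpy as np

def trial_confidence(trial, fingerprint):
    """Return (R, MNRMSE) per Equations 4-6. `trial` and `fingerprint`
    map QoS names to aligned numpy arrays over the trial interval."""
    rs, nrmses = [], []
    for q, observed in trial.items():
        expected = fingerprint[q]
        rs.append(np.corrcoef(observed, expected)[0, 1])   # Eq. 4 (Pearson r)
        rmse = np.sqrt(np.mean((observed - expected) ** 2))
        span = expected.max() - expected.min()
        nrmses.append(rmse / span if span else rmse)       # assumed normalizer
    return float(np.mean(rs)), float(np.mean(nrmses))      # Eq. 5 and MNRMSE

def is_full_match(R, mnrmse, R_t, E_t):
    """Consumer-defined thresholds; otherwise it is a partial match."""
    return R >= R_t and mnrmse <= E_t
```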
B. Trial Experience Transformation for Partial Matching

We transform the trial experience for a partial fingerprint matching to estimate an approximate performance behavior of the provider for the consumer's workloads. The trial experience may have low correlation with, or a high distance from, the performance fingerprint of a provider. We need to transform the trial experience in a way that reduces the distance or increases the correlation between the trial experience and the performance fingerprint. In both cases, the confidence of the trial may increase.

A partial fingerprint matching indicates that the actual performance of the provider may differ from the trial experience for the consumer's workloads. We use the following equation to transform the trial experience to estimate the actual performance:

$$Q^T_{VM} = Q_{VM} + \frac{1}{2}(F - Q_{VM}) \quad (7)$$

where $Q^T_{VM}$ is the transformed trial experience, $F$ is the performance fingerprint in the trial interval, and $Q_{VM}$ is the trial experience. Equation 7 transforms the trial experience for each aggregated QoS parameter by reducing the distance from the fingerprint by half. The intuition behind this transformation is two-fold. First, if the provider offers an isolated trial environment, the real experience may be closer to the fingerprint than to the trial experience. Second, the performance fingerprint does not contain information for the consumer's workload distribution. The transformation increases the confidence of the trial experience.
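Equation 7 is a one-line adjustment. A sketch applied per aggregated QoS parameter, using the same dict-of-arrays representation assumed above:

```python
def transform_trial_experience(trial, fingerprint):
    """Equation 7: Q^T_VM = Q_VM + (F - Q_VM) / 2, i.e., halve the
    distance between the trial experience and the fingerprint."""
    return {q: observed + 0.5 * (fingerprint[q] - observed)
            for q, observed in trial.items()}
```

Re-running `trial_confidence` on the transformed experience yields a lower MNRMSE by construction, since every point moves halfway toward the fingerprint.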
VI. LONG-TERM IAAS PROVIDER SELECTION
In this section, we discuss the long-term selection process using the trial experience and the provider's performance fingerprint. First, we estimate the providers' long-term performance for the consumer's workloads using the trial experience and the performance fingerprints. Next, we rank the providers based on their performance and the consumer's long-term performance requirements.
A. Long-term Performance Discovery
The university's important workloads are tested in the trial periods and the required QoS parameters are monitored. Let us assume the trial workloads $TW = \{tw_1, tw_2, ..., tw_k\}$ are tested on $v$ VMs. Each type of workload is monitored in the trial period $T_r$ at each $d$ interval, where the performance fluctuation within $d$ is negligible. The QoS performance for each workload is denoted by $Q_{tw_i} = \{q_1, q_2, ..., q_c\}$, where $q_i = \{(p_n, t_n) \mid n = 1, d, ..., T_r\}$ and $p_n$ is the performance observed at the timestamp $t_n$. The consumer's long-term workloads are denoted by $LW = \{W_1, W_2, ..., W_T\}$. We need to find the performance for each $W_i$, which is denoted by $Q_{W_i}$. The trial performance $Q_{tw_i}$ and the performance fingerprint $F$ are used to generate $Q_{W_i}$. We use the following steps to compute $Q_{W_i}$ (a sketch is given after this list):

1) For each $W_i \in LW$, find the closest $tw_i \in TW$.

2) For each $q'_i \in Q_{W_i}$, find the corresponding $q_i \in Q_{tw_i}$.

3) Let $t'_i$ be the timestamp of $W_i$. $tw_i$ has $T_r/d$ observations. We select a timestamp $t_i$ for $tw_i$ where $t_i = t'_i \bmod T_r$. For example, if $T_r = 30$, $T = 360$, and $t'_i = 35$, then $t_i = 5$.

4) We compute the relative weight $r_w$ of the fingerprint at $t_i$ for $q'_i$ using the following equation:

$$r_w = \frac{P_{t'_i}}{P_{t_i}} \quad (8)$$

where $P_{t'_i}$ and $P_{t_i}$ are the performance of the fingerprint at timestamps $t'_i$ and $t_i$ respectively.

5) The performance of $q'_i$ at $t'_i$ is computed as follows:

$$p'_{t'_i} = r_w \ast p_{t_i} \quad (9)$$

where $p'_{t'_i}$ and $p_{t_i}$ are the performance of $q'_i$ at timestamps $t'_i$ and $t_i$ respectively.

We compute the relative weight of the performance fingerprint between the timestamp of the real workload and that of the trial workload. The relative weight is applied to the trial performance of the particular QoS value to compute the performance of the real workload. We perform the above steps for each $q_i \in Q_{W_i}$ to generate the long-term performance for each provider.
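A sketch of the five discovery steps for one QoS parameter, under our reading that the trial counterpart of day $t'$ is $t = t' \bmod T_r$ (consistent with the example above) and that workloads are daily scalars; the names are ours.

```python
import numpy as np

def predict_long_term(long_workloads, trial_workloads, trial_perf,
                      fingerprint, Tr):
    """Estimate one QoS parameter for every day t' of the long-term period.
    long_workloads[t'] : consumer workload on day t' (W_i)
    trial_workloads[j] : workload tested on trial VM j (tw_j)
    trial_perf[j][t]   : trial observation of VM j on trial day t
    fingerprint[t]     : fingerprint value on day t of the full period"""
    trial_workloads = np.asarray(trial_workloads, dtype=float)
    pred = np.empty(len(long_workloads))
    for t_prime, W in enumerate(long_workloads):
        j = int(np.argmin(np.abs(trial_workloads - W)))  # step 1: closest workload
        t = t_prime % Tr                                 # step 3: trial counterpart
        r_w = fingerprint[t_prime] / fingerprint[t]      # step 4: Eq. 8
        pred[t_prime] = r_w * trial_perf[j][t]           # step 5: Eq. 9
    return pred
```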
B. IaaS Provider Selection

We compute the distance between the estimated performance of a provider and the consumer's long-term performance requirements. The rank of each provider is computed based on its distance from the consumer's long-term requirements. We use the normalized root mean square distance to compute the distance for each QoS parameter using the following equation:

$$d(q_c, q_p) = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (q_t - q'_t)^2} \quad (10)$$

where $q_c$ and $q_p$ are the time series of the consumer's long-term requirements and the provider's estimated long-term performance for a particular QoS parameter respectively, with $q \in q_c$ and $q' \in q_p$. The total distance over all QoS parameters is computed by the following:

$$D(Q_c, Q_p) = \sum_{i=1}^{c} d_i(q_{ci}, q_{pi}) \quad (11)$$

where $Q_c$ and $Q_p$ are the consumer's requirements and the provider's estimated performance respectively.
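A sketch of the ranking step using Equations 10 and 11, again assuming range normalization for the per-parameter distance:

```python
import numpy as np

def rank_providers(requirements, estimates):
    """requirements: QoS name -> required time series (numpy array).
    estimates: provider name -> {QoS name -> estimated time series}.
    Returns provider names sorted by total distance (Eq. 11), best first."""
    def nrmse(req, est):                      # Eq. 10, range-normalized
        rmse = np.sqrt(np.mean((req - est) ** 2))
        span = req.max() - req.min()
        return rmse / span if span else rmse
    totals = {p: sum(nrmse(requirements[q], perf[q]) for q in requirements)
              for p, perf in estimates.items()}
    return sorted(totals, key=totals.get)
```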
VII. EXPERIMENTS AND RESULTS

A set of experiments is conducted to evaluate the proposed approach. First, we show that the proposed trial strategy can predict a provider's long-term performance using its performance fingerprint. Next, we evaluate the effectiveness of the trial experience transformation technique considering partial fingerprint matching. Finally, we rank IaaS providers based on the long-term performance prediction.
A. Experiment Setup
Finding real-world cloud traces for a long-term period is challenging. We generate the CPU workloads of the consumers from publicly available Eucalyptus cloud traces, which contain data from 6 clusters covering continuous multi-month time frames [12]. We select one trace to generate CPU workloads for ten consumers. The QoS performance data is collected from the SPEC Cloud IaaS 2016 results [13]. We generate the QoS performance of each provider for each consumer's workloads by a random replication method. Data of one month are mapped into 12-month data points, where each data point is considered the average of a single day's measurement. The performance fingerprint of each provider is generated by taking the average of the observed performances of all consumers. We select one of the ten consumers as the new consumer. The trial data of the selected consumer is generated using the proposed approach for 12 virtual machines and 30 days. We find the closest matched workload from the other nine consumers to generate the performance data of the trial for each workload; the performance of the corresponding workload is considered the trial performance of the new consumer. This approach ensures that the trial experience is affected by the provider's performance behavior.
B. Accuracy of the Performance Prediction
Fig. 2 shows the results of long-term performance prediction for a provider using its performance fingerprint. The performance fingerprint represents an aggregated view regardless of a consumer's workload; hence, we cannot predict the actual performance using only the provider's performance fingerprint. Fig. 2(a) shows the performance prediction for the throughput of a provider without considering the trial experience transformation. The performance prediction considering the partial fingerprint matching is shown in Fig. 2(b). We use fixed confidence thresholds for the similarity and the distance respectively. Once we apply the transformation, the confidence of the trial increases significantly. The prediction accuracy also improves when partial fingerprint matching is considered. Fig. 2 also depicts the long-term performance prediction for ten IaaS providers.
Figure 2: Long-term performance prediction: (a) throughput; (b) throughput with partial fingerprint; (c) normalized RMSE prediction accuracy; (d) normalized RMSE prediction accuracy with partial fingerprint

Fig. 2(c) shows the performance prediction without considering the partial fingerprint matching. Fig. 2(d) shows the performance prediction considering the partial fingerprint matching. The prediction accuracy is higher, i.e., the normalized RMSE distance is lower, in Fig. 2(d) than in Fig. 2(c), which shows that the performance prediction accuracy increases with the trial experience transformation technique.
C. Accuracy of the Long-term Selection
We use the normalized RMSE distance between a provider's performance and a consumer's requirements to rank each provider. The distance between the consumer's requirements and each provider's predicted performance is shown in Fig. 3(a). The figure shows that provider 1 has the minimum distance from the consumer's requirements for throughput, insert response time, and read response time. Fig. 3(b) shows the distance between the providers' actual performance and the consumer's requirements. As we generated the consumer's requirements from provider 1's actual performance, provider 1 has zero distance from the consumer's requirements. Therefore, the proposed approach successfully selects the optimal provider for the long-term period.

Figure 3: Normalized RMSE distance between providers and the consumer: (a) predicted distance; (b) actual distance
VIII. RELATED WORK
Several studies discover the QoS performance of IaaS providers by deploying VMs in the cloud. An extensive study on the performance variance of Amazon EC2 is provided in [14]. It notes that performance unpredictability in the cloud is a significant issue for many users and is often considered a key obstacle to cloud adoption. The study finds that Amazon EC2 shows high variance in its performance. The performance of clouds for scientific computing is analyzed using micro-benchmarks and kernels on Amazon EC2 in [8], [4]. That study observes that the tested clouds are not suitable for scientific computing due to performance variance and low reliability. Most studies conduct experiments to measure short-term performance [15]. Existing performance monitoring and testing approaches do not consider the long-term selection.

Fingerprinting is a well-known approach where a small portion of data is used to identify a data source uniquely. Fingerprinting is used in many computing domains such as public key management, digital video and audio copyright, digital forensics, and user tracking. A number of studies focus on passive fingerprinting techniques to track users from their interactions in the browser without using cookies [11]. These approaches consider users from a provider's perspective. We gain from them the insight that the temporal performance behaviour of an IaaS provider is identifiable. In this work, we introduce the performance fingerprinting of IaaS providers from a consumer perspective to capture their temporal performance variability.
IX. CONCLUSION
We propose a novel approach to select privacy-sensitive IaaS providers using their performance fingerprints. The proposed approach utilizes free trial periods to evaluate a provider's long-term performance. A consumer may choose a provider based on its trial experience. A novel trial strategy using an equivalence partitioning method is proposed to estimate a provider's performance for different types of workloads while considering the provider's performance variability. The trial experience is incorporated with the provider's performance fingerprint to predict long-term performance. A performance fingerprint matching technique is proposed to ascertain the confidence of the consumer's trial experience, and a trial experience transformation method is proposed to improve that confidence. The results of the experiments show that our proposed approach helps a consumer make an informed decision when selecting a privacy-sensitive IaaS provider for the long-term period. A key limitation is that we consider only a limited number of real-world IaaS providers. We aim to study the performance of a large number of IaaS providers to improve our proposed approach.
X. ACKNOWLEDGEMENT
This research was partly made possible by NPRP 9-224-1-049 grant from the Qatar National Research Fund (a member of The Qatar Foundation) and by DP160103595 and LE180100158 grants from the Australian Research Council. The statements made herein are solely the responsibility of the authors.
REFERENCES

[1] S. Chaisiri, B.-S. Lee, and D. Niyato, "Optimization of resource provisioning cost in cloud computing," IEEE TSC, vol. 5, no. 2, pp. 164–177, 2012.
[2] Z. Ye, S. Mistry, A. Bouguettaya, and H. Dong, "Long-term qos-aware cloud service composition using multivariate time series analysis," IEEE TSC, vol. 9, no. 3, pp. 382–393, 2016.
[3] S. Mistry, A. Bouguettaya, H. Dong, and A. Erradi, "Qualitative economic model for long-term iaas composition," in ICSOC. Springer, 2016, pp. 317–332.
[4] A. Iosup, N. Yigitbasi, and D. Epema, "On the performance variability of production cloud services," in CCGrid. IEEE, 2011, pp. 104–113.
[5] C. Binnig, D. Kossmann, T. Kraska, and S. Loesing, "How is the weather tomorrow?: towards a benchmark for the cloud," in Proceedings of the Second International Workshop on Testing Database Systems. ACM, 2009, p. 9.
[6] S. Mistry, A. Bouguettaya, H. Dong, and A. K. Qin, "Metaheuristic optimization for long-term iaas service composition," IEEE Transactions on Services Computing, vol. 11, no. 1, pp. 131–143, 2018.
[7] J. Scheuner and P. Leitner, "Estimating cloud application performance based on micro-benchmark profiling," in CLOUD. IEEE, 2018, pp. 90–97.
[8] S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema, "A performance analysis of ec2 cloud computing services for scientific computing," in ICCC. Springer, 2009, pp. 115–131.
[9] P. Leitner and J. Cito, "Patterns in the chaos—a study of performance variation and predictability in public iaas clouds," ACM TOIT, vol. 16, no. 3, p. 15, 2016.
[10] G. Burtini, S. Fazackerley, and R. Lawrence, "Time series compression for adaptive chart generation," in CCECE. IEEE, 2013, pp. 1–6.
[11] K. Takeda, "User identification and tracking with online device fingerprints fusion," in ICCST. IEEE, 2012, pp. 163–167.
[12] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, "The eucalyptus open-source cloud-computing system," in CCGRID. IEEE, 2009, pp. 124–131.
[13] S. Baset, M. Silva, and N. Wakou, "Spec cloud iaas 2016 benchmark," in Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering. ACM, 2017, pp. 423–423.
[14] J. Schad, J. Dittrich, and J.-A. Quiané-Ruiz, "Runtime measurements in the cloud: observing, analyzing, and reducing variance," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 460–471, 2010.
[15] A. Bouguettaya, S. Nepal, W. Sherchan, X. Zhou, J. Wu, S. Chen, D. Liu, L. Li, H. Wang, and X. Liu, "End-to-end service support for mashups,"