Delay and Price Differentiation in Cloud Computing: A Service Model, Supporting Architectures, and Performance

Abstract

Many cloud service providers (CSPs) provide on-demand service at a price with a small delay. We propose a QoS-differentiated model where multiple SLAs deliver both on-demand service for latency-critical users and delayed services for delay-tolerant users at lower prices. Two architectures are considered to fulfill SLAs. The first is based on priority queues. The second simply separates servers into multiple modules, each for one SLA. As an ecosystem, we show that the proposed framework is dominant-strategy incentive compatible. Although the first architecture appears more prevalent in the literature, we prove the superiority of the second architecture, under which we further leverage queueing theory to determine the optimal SLA delays and prices. Finally, the viability of the proposed framework is validated through numerical comparison with the on-demand service and it exhibits a revenue improvement in excess of 200%. Our results can help CSPs design optimal delay-differentiated services and choose appropriate serving architectures.

XIAOHU WU,

Nanyang Technological University, Singapore

FRANCESCO DE PELLEGRINI,

University of Avignon, France

GIULIANO CASALE,

Imperial College London, United Kingdom

Additional Key Words and Phrases: QoS-differentiation, incentive compatible, cloud computing

The Infrastructure-as-a-Service (IaaS) market is projected to grow to $61.9 billion in 2021 from $30.5 billion in 2018 [1], and is attracting users with different purposes to run their applications on cloud servers. Many cloud service providers (CSPs) provide the standard on-demand service, which is always available at a publicly known price p with a small delay. When a customer arrives, it requests to occupy servers for some period without interruption, and a delay arises, i.e., the time from the request arrival to the service commencement. While delay is a key constraint on resource efficiency, customers often differ in their sensitivity to it [2, 3]. Price differentiation by delay is thus an important research direction for satisfying customers' preferences. Related schemes often use queuing theory for performance analysis and incentive compatibility (IC) to ensure user truthfulness, eliminating the unpredictable effect of non-truthful strategic behaviour on performance.

One line of work considers an architecture that separates servers into two parts, one for the on-demand market and one for the spot market [4, 5]. Each customer has an initial individual willingness-to-pay (WTP) that decreases linearly with the delay. The associated slope c defines how sensitive the customer is to delay and is called its delay-cost type. The customer chooses to join one market or neither so as to maximize its surplus. For any customer of the spot market, Abhishek et al. show that there is a pricing rule that forms a Bayesian-Nash incentive compatible (BNIC) mechanism, i.e., the customer will truthfully bid c if the others also do so [4]. Customers with higher bids can preempt the servers of others, and each type of customer has an individual service class whose delay depends on the job arrival rate of higher bids. Dierks and Seuken extend the model by considering additional constraints such as the preemption cost and the finite capacity of the on-demand market [5].

Differently from [4, 5], we consider the following dimensions.
First, it is more general to use a family of concave functions to characterize the WTPs of users more precisely [6, 7]. Second, in current cloud markets, the price p of a CSP is a predefined value that depends not only on WTPs but also on other factors such as competition, and it is acceptable to most users. We thus consider the case where the initial WTPs of users are all p, implying that on-demand service is acceptable to them, and study price differentiation by delay in this context. Third, the number of delay-cost types can be very large, and it is operationally costly to maintain an individual service level agreement (SLA) for each type of user [8]. Fourth, we focus on non-preemptive scheduling, i.e., the service is continuously provisioned to the customer with no interruption. Preemptions are costly and can increase the uncertainty of delays [9, 10]. In this paper, we use customers and users interchangeably.

Authors' addresses: Xiaohu Wu, Nanyang Technological University, Singapore; Francesco De Pellegrini, University of Avignon, France; Giuliano Casale, Imperial College London, United Kingdom. Publication date: July 2018.

Model.

The standard on-demand service is the fastest service and is designed on the principle of “one size fits all” to satisfy all types of users. We propose a model that offers a limited number of SLAs to provide incentives and service differentiation among users. These SLAs include both on-demand service for latency-critical jobs and services with different levels of delay at lower prices. The service model is supported by an underlying architecture that fulfills the SLAs. Two typical architectures are considered. One is similar to the spot market in [4, 5] and is called the priority-based sharing (PBS) architecture, where delay-tolerant jobs can access the servers of the on-demand market with lower priority. The other simply separates servers into multiple modules, each for one SLA, and is called the separated multi-SLAs (SMS) architecture.

The proposed model may benefit all market participants. Potential customers get the opportunity to trade their delay tolerance for cheaper service. The CSP can thus attract more such customers from its competitors. In queuing systems, the larger the delay, the higher the resource utilization. Delay-differentiated services allow the CSP to process more workload than a pure on-demand service, possibly improving its revenue.

Results.

We derive the main features of the model above as an ecosystem; the main results of this paper are as follows:

(i) We derive a generic pricing rule that gives the optimal SLA prices when the SLA delays are given in advance, and show that the proposed model is dominant-strategy incentive compatible (DSIC): every user truthfully reports its delay-cost type, regardless of what the others do. DSIC is a stronger form of IC than BNIC: BNIC assumes that an individual customer has global knowledge of the distribution of user types, which is not needed in DSIC [11].

(ii) The architecture determines the model's performance. We derive two performance bounds, under the PBS and SMS architectures respectively. They show the superiority of an SMS-based service system, whereas a PBS-based system in fact achieves a revenue similar to that of a pure on-demand system. We then leverage queueing theory to give the optimal SLA prices and delays of an SMS-based system. Finally, we give numerical results to show that it can significantly outperform the standard on-demand service model, with a revenue improvement in excess of 200%.

The rest of this paper is organized as follows. In Section 2, we introduce the related work. We propose the delay-differentiated service model in Section 3. Next, we study the related pricing problems in Section 4. In Section 5, we describe two architectures that support the service model differently, and analyze their performance and optimal parameter configuration. Simulations are done in Section 6 to evaluate the performance numerically. Finally, we conclude this paper in Section 7. Due to space limitations, all proofs are given in the appendix.

CSPs can offer spot service where customers bid to utilize servers, similar to what Amazon Elastic Compute Cloud (EC2) does. The combination of queueing and game theories is used to characterize user behavior and request serving [12, 13]. Currently, two main models exist for CSP systems.

The first model is proposed by Abhishek et al. [4] and has been partly introduced above. There are $n$ classes of jobs whose mean service time is $s$. Each job of type $i \in [1, n]$ has an initial WTP $v_i$ and a linear delay-cost type $C_i$ that is a random variable in $[0, s \cdot v_i]$. The on-demand market is modeled as a $G/G/\infty$ queue with infinitely many servers to guarantee that the service delay $\phi$ is zero for all jobs. The spot market is modeled as a preemptive $G/G/m$ queue with finitely many servers in which each job bids; the higher its bid, the higher its priority to access servers and the lower its delay. Dierks and Seuken extend the first model by considering additional constraints and modeling the on-demand market as a $G/G/m$ queue with finitely many servers [5]; thus, the on-demand service is delivered to customers with a small delay $T$. Finally, their mathematical expressions are instantiated for $M/M/m$ queues, and numerical results are given to show the concrete revenue improvement of this model over the pure on-demand service model.

The second model focuses on enabling users to utilize the idleness of the on-demand market. The idle periods of servers appear at random and are utilized as spot service by the users who bid the highest [14]. Wu et al. show that the challenge is guaranteeing the immediacy of on-demand service and the persistence of spot service while sharing servers [15]. They then give an integral resource allocation and pricing framework for this purpose, which forms a DSIC mechanism. They basically follow the pricing principle used by Amazon EC2 in practice [16] and show how to run such services in cloud systems.
Song and Guérin focus on the statistical features of spot pricing and give the optimal pricing and bidding strategies for a CSP and its users respectively [7]. The spot service is also delay-differentiated in the sense that if a user bids higher, its delay will be smaller. For the sake of tractability, the authors also use a family of linear delay-cost functions to characterize the users' sensitivity to delay; numerical results are given for more general settings including a family of concave functions. Finally, spot service is popular in that users can trade their delay tolerance for cheaper service. However, it creates significant complexity that users have to face and does not provide any delay guarantee [17–19].

Additionally, many works use the theory of auctions and mechanism design to explore potential frameworks for selling computing resources that take into account deadlines [2, 20–22] or virtual machine configuration [23], under which the availability of resources depends on a customer's bid and is uncertain. However, in practice, it is often desirable to offer an on-demand market as an option for customers such that the computing service is certainly available at a fixed unit price, as is the case for most products and services in the real world. This is also one motivation of the models of this paper and [4, 5, 15].

In this section, we describe the proposed QoS-differentiated service model and the associated questions to be addressed. The service model is generic, and we postpone the description of the ways of fulfilling its SLAs until after we have studied the model properties.

Each customer $j$ requests at time $a_j$ to occupy servers for some time $s_j$. We equivalently refer to such a request as a job $j$, to $a_j$ as its arrival time, and to $s_j$ as its service time. Upon arrival, a job may get served with some delay $\phi$, i.e., it starts being served at time $a_j + \phi$; the service then continues without interruption until the job has been served for a duration $s_j$. The standard on-demand service in cloud markets represents the fastest service to satisfy all users. We use $T$ and $p$ to denote its delay and price, and they are fixed system parameters: $T$ is the minimum delay before a user can get served, so $\phi \ge T$, and $p$ is the maximum price that a user needs to pay for service.

In cloud markets, the WTPs of latency-critical jobs drop sharply even if the delay is increased slightly. For delay-tolerant jobs, although they prefer to get service earlier, their WTPs decrease slowly before the delay increases to a threshold, after which their WTPs decrease sharply [24]. The situation of both types of jobs is unified and characterized by a family of functions, denoted by $u(\alpha, \phi)$.

Property 1. The WTP function $u(\alpha, \phi)$ is assumed to have the following properties, where $\alpha$ is a positive real number and $\phi \in [T, +\infty)$:
(i) Normalisation: for all $\alpha \in \mathbb{R}^+$, we have $u(\alpha, T) = p$;
(ii) Non-increasing: fixing the value of $\alpha$, $u(\alpha, \phi)$ is decreasing in $\phi$;
(iii) Monotone Parametrisation: fixing the value of $\phi$, $u(\alpha, \phi)$ is decreasing in $\alpha$ when $\phi > T$;
(iv) Decreasing speed: fixing the value of $\phi$, $\frac{\partial u}{\partial \phi}$ is decreasing in $\alpha$.

The number of users is finite. Each user will choose a specific value of $\alpha$ that best fits its sensitivity to delay, and $\alpha$ is said to be its delay-cost type. The first subproperty implies that all users can accept on-demand service at a price $p$ since their WTPs are all $p$ when the delay is $T$. The second subproperty means that the WTP of a user will decrease as the delay $\phi$ increases.
The third subproperty states that, under the same delay $\phi$, the larger the value of $\alpha$, the smaller the WTP $u(\alpha, \phi)$. Thus, when the delay increases from $T$ to a larger $\phi$, a user of larger $\alpha$ has more value loss and is more sensitive to delay. $\frac{\partial u}{\partial \phi}$ represents the slope of the tangent line at a point. The fourth subproperty guarantees that, if a user has a larger $\alpha$, the decreasing speed of its WTP is also larger.

The function instances can be of any form, and the conclusions of this paper hold provided that they satisfy Property 1. In fact, it can be satisfied by many typical functions in the pricing literature. Specifically, the WTP functions can be a family of linear functions as in [4, 5], where the value loss is characterized by $\alpha \cdot \phi$ when the delay is $\phi$. More interestingly, they can also be a family of concave functions [6] where the value loss is $\alpha \cdot U(\phi)$ and $U(\phi)$ is an increasing convex function; then, $u(\alpha, \phi) = p - \alpha \cdot U(\phi)$. As discussed before, they can precisely characterize the following phenomenon: the WTP decreases slightly as the delay increases before a threshold; then, it decreases significantly.

For example, the WTP functions can be instantiated as

$$u(\alpha, \phi) = p \cdot \left(1 - (\alpha \cdot (\phi - T))^{\beta}\right), \quad \phi \in [T, +\infty), \tag{1}$$

where $\beta \ge 1$ is a fixed parameter. Here, $U(\phi) = p \cdot (\phi - T)^{\beta}$, and $u(\alpha, \phi) = p - \alpha^{\beta} \cdot U(\phi)$. We use the term $\alpha^{\beta}$, rather than the term $\alpha$ in [6], as the coefficient of $U(\phi)$ to simplify the subsequent computation of $\phi$; however, this only affects the user's choice of the value of $\alpha$ to specify the same relation between WTP and delay: choosing $\alpha'$ with the function (1) is equivalent to choosing $\alpha'' = (\alpha')^{\beta}$ with the function of the form in [6]. The function in (1) is concave and satisfies Property 1. Given a customer of type $\alpha$, its WTP becomes zero when the experienced delay $\phi$ equals $\frac{1}{\alpha} + T$, i.e., $u(\alpha, \frac{1}{\alpha} + T) = 0$. When the customer experiences a delay no smaller than $\frac{1}{\alpha} + T$ (i.e., $\phi \ge \frac{1}{\alpha} + T$), it will not accept any service since its WTP is not positive.

We illustrate the function (1) in Fig. 1 where $T$ is set to zero. As illustrated by the solid curves, latency-critical and delay-tolerant users can respectively choose larger and smaller $\alpha$ to reflect their sensitivities to delay, conforming to the explanation of the third subproperty. As illustrated by the leftmost solid curve where φ = .

2, a user's WTP will decrease faster and faster as the delay increases, since the function in (1) is concave. When the delay $\phi$ ranges in the first half interval $\left[0, \frac{1}{2\alpha}\right]$, the WTP decreases slowly from $p$ to $(1 - 2^{-\beta}) \cdot p$; as $\phi$ becomes larger and ranges in the second half interval $\left[\frac{1}{2\alpha}, \frac{1}{\alpha}\right]$, the WTP decreases fast from $(1 - 2^{-\beta}) \cdot p$ to 0.

Finally, the value of $\beta$ affects the decreasing speed of the WTP as the delay increases. The leftmost solid and dashed curves illustrate the cases with β = β =

6. A larger $\beta$ means that the initial decreasing speed is smaller but then turns larger. In this paper, we study the property of a market that consists of customers whose sensitivities to delay are defined by the values of $\alpha$; when the instance in (1) is applied, the parameter $\beta$ is common.

Fig. 1. The curves of the WTP function in (1) for different values of the parameters $(\alpha, \beta)$.

The CSP plans to offer a finite number $L$ of Service Level Agreements (SLAs). For all $l \in [1, L]$, the $l$-th SLA specifies a delay $\phi_l$ and the price $p_l$ of utilizing a server per unit of time; for the customers operating under the $l$-th SLA, whenever their requests arrive, the CSP guarantees that the expected delay of delivering service is at most $\phi_l$. The first SLA represents the standard on-demand service in cloud markets, and it is for latency-critical users who are not willing to tolerate significant delays. Thus, $p_1$ and $\phi_1$ equal the price and delay of an on-demand service. The prices of the other SLAs are lower than $p$, at the expense of delaying the delivery of computing services to their consumers; here, we let

$$T = \phi_1 < \phi_2 < \cdots < \phi_L. \tag{2}$$

Further, for all $l \in [1, L-1]$, the price of the $l$-th SLA, which has a smaller delay, is larger than the price of the $(l+1)$-th SLA. Thus, we have $p = p_1 > p_2 > \cdots > p_L$. We note that $p$ and $T$ are fixed parameters, and $\{\phi_l\}_{l=2}^{L}$ and $\{p_l\}_{l=2}^{L}$ are decision variables.

The interaction process between a CSP and its customers is illustrated in Fig. 2. Specifically, each customer who enters the service system will choose a value $\alpha \in \mathbb{R}^+$ such that $u(\alpha, \phi)$ best fits its sensitivity to delay; then, it reports the chosen $\alpha$ to the CSP. Users of the same $\alpha$ are said to have the same delay-cost type. The CSP aims to satisfy all its customers, without rejecting any service request, since all customers can accept on-demand service. Under an arbitrary SLA $l \in [1, L]$, the surplus of a customer is its WTP minus the SLA price, i.e., $u(\alpha, \phi_l) - p_l$. According to the reported type, the CSP will choose one SLA for each type of customer such that their surplus is maximized. Formally, we have the following definition.

Definition 3.1.

The customers of type $\alpha$ are assigned the $l_\alpha$-th SLA defined below:

$$l_\alpha = \arg\max_{l \in [1, L]} \; u(\alpha, \phi_l) - p_l. \tag{3}$$

Specifically, the CSP regulates that, if the customer achieves the same maximum surplus under multiple SLAs, it will be assigned to the SLA whose number is the largest.

Each customer submits its delay-cost type $\alpha$ to the CSP, which in turn assigns a specific SLA to it. The types of all customers constitute a set $\Phi$; the minimum and maximum values of the elements of $\Phi$ are $\underline{\alpha}$ and $\overline{\alpha}$.

Fig. 2. The interaction between customers and a CSP: a customer reports its type $\alpha$ to the cloud provider, which offers $L$ SLAs and assigns the $l_\alpha$-th SLA to the customer using the computation in Definition 3.1.

Let $P(\alpha) \in (0, 1)$ denote the probability that an arriving customer has a delay-cost type $\alpha$, where $\sum_{\alpha \in \Phi} P(\alpha) = 1$. The mean arrival rate of the jobs of all types is $\Lambda$, and the mean job size is $s$. For all $l \in [1, L]$, let $\Phi_l$ denote the set of the types of the customers who are assigned to the $l$-th SLA, and let $\mathcal{P} = \{\Phi_1, \Phi_2, \cdots, \Phi_L\}$, where $\bigcup_{l=1}^{L} \Phi_l = \Phi$ and $\Phi_{l_1} \cap \Phi_{l_2} = \emptyset$ for all $l_1, l_2 \in [1, L]$ with $l_1 \neq l_2$. Thus, the mean job arrival rate of the $l$-th SLA is

$$\Lambda_l = \Lambda \cdot \sum_{\alpha \in \Phi_l} P(\alpha). \tag{4}$$

The total workload of customers that is processed per unit of time under the $l$-th SLA is $w_l = \Lambda_l \cdot s$. The revenue from the $l$-th SLA per unit of time is $p_l \cdot w_l = p_l \cdot \Lambda_l \cdot s$. The total revenue obtained per unit of time is

$$G = \sum_{l=1}^{L} p_l \cdot w_l = \sum_{l=1}^{L} p_l \cdot \Lambda_l \cdot s. \tag{5}$$

Above, the system input includes $\Phi$, $P(\cdot)$, $\Lambda$, $s$, $m$, $\phi_1$, $p_1$, and the decision variables include $\{\phi_l\}_{l=2}^{L}$, $\{p_l\}_{l=2}^{L}$, and $\mathcal{P}$.

For all $l \in [1, L]$, the $l$-th SLA guarantees that its jobs experience a delay of at most $\phi_l$. Let $\Theta = (\phi_1, \phi_2, \cdots, \phi_L)$. The partition $\mathcal{P}$ determines the job arrival rate of each SLA by (4). Roughly, in a queuing system, the more servers are available, the smaller the actual experienced delay of serving jobs. When there are $x$ servers and $\mathcal{P}$ is given, the actual delay $t_l$ of the jobs of SLA $l \in [1, L]$ is a non-increasing function of $x$. Suppose there are a total of $x = m$ servers for fulfilling all SLAs. Let $\mathcal{T} = (t_1, t_2, \cdots, t_L)$; the CSP will provide the minimum number $m$ of servers needed to fulfill the SLAs such that

$$\mathcal{T} = h(m, \mathcal{P}) \le \Theta. \tag{6}$$

We leverage queuing theory to characterize the actual delay of each SLA and concretize the function $h(\cdot)$, which enables us to better focus on the overall performance of the proposed model and will be elaborated in Section 5.

For the service model, we focus on three questions. In the interaction process illustrated in Fig. 2, each user needs to report its type information to the CSP.
However, this information is private, and customers may seek possible ways to maximize their surplus by misreporting their type information. A mechanism is said to be DSIC if a user gains the most, or at least no less, by being truthful, regardless of what the others do [11]. In the context of this paper, we have the following definition.

Definition 3.2.

Every user of type $\alpha$ will report a type $\alpha'$ to the CSP, with the aim of maximizing its surplus. Our service framework is said to be DSIC if the user's surplus is maximized when it truthfully reports its type, i.e., $\alpha' = \alpha$, no matter whether other users truthfully do so or not.

Thus, the first question is about providing appropriate incentives via pricing SLAs such that our service framework is DSIC.

The second is about market segmentation, i.e., how different types of users are grouped together such that each group of users belongs to the same SLA, when the SLA delays are given in advance; it characterizes the structural property in the mapping of the types to the SLAs (i.e., $\mathcal{P}$). This helps the CSP and users better understand the market structure. Then, we will determine the optimal SLA prices given a specific market segmentation. The third question is what architecture of servers should be used to satisfy (6) for fulfilling SLAs. Then, under a particular architecture, we need to leverage queuing theory to optimally determine the market segmentation and SLA delays in order to maximize the revenue (5). The main notation used in this paper is summarized in Table 1.

Table 1. Key Notation

Symbol                                                    Explanation
$L$                                                       the number of SLAs
$\phi_l$                                                  the delay of the $l$-th SLA
$p_l$                                                     the price of the $l$-th SLA
$T$                                                       the delay of on-demand service, where $\phi_1 = T$
$p$                                                       the price of on-demand service, where $p_1 = p$
$m$                                                       the total number of servers possessed by a CSP
$\Lambda$                                                 the total job arrival rate
$\lambda_l$                                               at a single server, the job arrival rate of the $l$-th SLA
$\hat\lambda_l$                                           at a single server, the total job arrival rate of the first $l$ SLAs
$\Phi$                                                    the set of the types of all customers
$\overline{\alpha}$ (resp. $\underline{\alpha}$)          the maximum (resp. minimum) type of $\Phi$
$\Phi_l$                                                  the set of the types of the customers who are assigned to the $l$-th SLA
$\mathcal{P}$                                             the set $\{\Phi_1, \cdots, \Phi_L\}$
$\hat\alpha_1, \cdots, \hat\alpha_{L+1}$                  a division of $\Phi$ used to define $\Phi_1, \cdots, \Phi_L$ by (7)
$t_l$                                                     the actual job delay of the $l$-th SLA

In this section, we suppose the SLA delays $\phi_1, \phi_2, \cdots, \phi_L$ are given; then, we show that the market segmentation presents a structural property: there exists a sequence $\hat\alpha_1, \hat\alpha_2, \cdots, \hat\alpha_{L+1} \in \Phi$ such that, for all $l \in [1, L]$, the customers of the types between $\hat\alpha_l$ and $\hat\alpha_{l+1}$ will be assigned to the $l$-th SLA. Further, we derive the optimal SLA prices under which our framework forms a DSIC mechanism while the CSP's revenue is maximized.
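To make the service model of Section 3 concrete, the following minimal Python sketch evaluates the example WTP function (1), the assignment rule (3) of Definition 3.1 with its tie-breaking convention, and the revenue (5). The function names, the three customer types, their probabilities, and the SLA delays and prices are illustrative assumptions and are not taken from the paper; the prices are chosen by hand rather than optimized.

```python
from typing import Dict, Sequence


def wtp(alpha: float, phi: float, p: float, T: float, beta: float) -> float:
    """Example WTP function (1): u = p * (1 - (alpha * (phi - T)) ** beta), for phi >= T."""
    return p * (1.0 - (alpha * (phi - T)) ** beta)


def assign_sla(alpha: float, delays: Sequence[float], prices: Sequence[float],
               p: float, T: float, beta: float) -> int:
    """Definition 3.1: return the 1-based SLA index that maximizes the surplus.

    Using ">=" lets a later SLA win ties, i.e. the largest index is chosen.
    """
    best_l, best_surplus = 0, float("-inf")
    for l, (phi, price) in enumerate(zip(delays, prices), start=1):
        surplus = wtp(alpha, phi, p, T, beta) - price
        if surplus >= best_surplus:
            best_l, best_surplus = l, surplus
    return best_l


def revenue(type_probs: Dict[float, float], delays: Sequence[float],
            prices: Sequence[float], total_rate: float, mean_size: float,
            p: float, T: float, beta: float) -> float:
    """Equations (4)-(5): G = sum over SLAs of p_l * Lambda_l * s."""
    total = 0.0
    for alpha, prob in type_probs.items():
        l = assign_sla(alpha, delays, prices, p, T, beta)
        total += prices[l - 1] * total_rate * prob * mean_size
    return total


# Hypothetical instance (all values are assumptions made for illustration).
P_OD, T_OD, BETA = 1.0, 0.0, 2.0            # on-demand price p, delay T, exponent beta
DELAYS = [0.0, 0.5, 1.0]                    # phi_1 = T < phi_2 < phi_3
PRICES = [1.0, 0.8, 0.6]                    # p_1 = p > p_2 > p_3
TYPE_PROBS = {0.2: 0.5, 0.6: 0.3, 1.5: 0.2} # P(alpha) for each type alpha

if __name__ == "__main__":
    for a in sorted(TYPE_PROBS, reverse=True):
        print("type", a, "-> SLA", assign_sla(a, DELAYS, PRICES, P_OD, T_OD, BETA))
    print("revenue per unit time:",
          revenue(TYPE_PROBS, DELAYS, PRICES, 10.0, 1.0, P_OD, T_OD, BETA))
```

In this toy instance the most delay-sensitive type stays with the on-demand SLA while less sensitive types move to slower, cheaper SLAs, which is the threshold structure that Section 4 establishes in general.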
If a customer is more sensitive to delay, its WTP will decrease more quickly when facing the same increment in delay. Formally, we have the following relation on the difference of WTPs under two SLAs.

Lemma 4.1.

Let us consider two arbitrary customers of types $\alpha_1$ and $\alpha_2$ with $\alpha_1 > \alpha_2$, and two SLAs $k_1$ and $k_2$ with $k_1 < k_2$. The customer of type $\alpha_1$ is more sensitive to delay, as explained for Property 1; the SLA delays satisfy $\phi_{k_1} < \phi_{k_2}$ by (2). Then, the difference of the WTPs of the customer of type $\alpha_1$ under the $k_1$-th and $k_2$-th SLAs is larger than its counterpart for the customer of type $\alpha_2$, i.e., $u(\alpha_1, \phi_{k_1}) - u(\alpha_1, \phi_{k_2}) > u(\alpha_2, \phi_{k_1}) - u(\alpha_2, \phi_{k_2})$.

According to Definition 3.1, the CSP will select for each customer an SLA under which its surplus is maximized. Roughly, a customer of larger $\alpha$ is more sensitive to delay and will be assigned to an SLA with a smaller delay, as shown below.

Lemma 4.2. Let us consider two customers of types $\alpha_1$ and $\alpha_2$ where $\alpha_1 > \alpha_2$. If the customers of types $\alpha_1$ and $\alpha_2$ are respectively assigned to the SLAs $k_1$ and $k_2$ (i.e., $\alpha_1 \in \Phi_{k_1}$ and $\alpha_2 \in \Phi_{k_2}$), then we have $k_1 \le k_2$, where the SLA delays satisfy $\phi_{k_1} \le \phi_{k_2}$ by (2).

The following proposition characterizes the market segmentation, i.e., the mapping of the types of customers to the SLAs.

Proposition 4.3.

There exists a sequence $\hat\alpha_1, \hat\alpha_2, \cdots, \hat\alpha_{L+1} \in \Phi$ such that the $l$-th SLA will be assigned the customers of type $\alpha \in \Phi_l$, where $\underline{\alpha} = \hat\alpha_{L+1} < \cdots < \hat\alpha_2 < \hat\alpha_1 = \overline{\alpha}$ and $\Phi_l$ is a subset of the customer types defined below:

$$\Phi_l = \begin{cases} \Phi \cap (\hat\alpha_{l+1}, \hat\alpha_l], & \text{if } l \in [1, L-1], \\ \Phi \cap [\hat\alpha_{L+1}, \hat\alpha_L], & \text{if } l = L. \end{cases} \tag{7}$$

Proposition 4.3 shows that, in a delay-differentiated market, the customers are segmented by a sequence $\hat\alpha_1, \hat\alpha_2, \cdots, \hat\alpha_{L+1}$ such that the customers of type $\alpha \in \Phi_l$ will be assigned to the $l$-th SLA.

Let us suppose in this subsection that we are given a particular market segmentation $\hat\alpha_1, \hat\alpha_2, \cdots, \hat\alpha_{L+1}$ defined in Proposition 4.3. Then, we will derive the corresponding SLA prices $p_1, p_2, \cdots, p_L$ that simultaneously guarantee that (i) they are optimal to maximize a CSP's revenue, and (ii) our service framework forms a DSIC mechanism.

First, we give a definition that is used to define SLA prices.

Definition 4.4.

Let $u^{-}_{l} = u(\hat\alpha_l, \phi_{l-1}) - u(\hat\alpha_l, \phi_l)$ for all $l \in [2, L]$, where $u^{-}_{l}$ is the difference of the WTPs of a customer of type $\hat\alpha_l$ under the $(l-1)$-th and $l$-th SLAs respectively. We define the parameter $\hat p_l$ to be such that
(i) $\hat p_1 = u(\hat\alpha_1, \phi_1) = p$, i.e., the price of on-demand instances;
(ii) for all $l \in [2, L]$, $\hat p_l$ is the maximum possible $p_l$ that satisfies $p_l \le \hat p_{l-1} - u^{-}_{l}$, i.e.,

$$\hat p_l = \hat p_l(\hat\alpha_1, \cdots, \hat\alpha_l, \phi_1, \cdots, \phi_l) = \hat p_{l-1} - u^{-}_{l} = \hat p_1 - \sum_{l'=2}^{l} u^{-}_{l'}.$$

Second, each type of customer is assigned some SLA according to Definition 3.1, and we will show that, when the SLA prices $p_1, p_2, \cdots, p_L$ are set to $\hat p_1, \hat p_2, \cdots, \hat p_L$, the market segmentation is still $\hat\alpha_1, \hat\alpha_2, \cdots, \hat\alpha_{L+1}$, i.e., every customer of type $\alpha \in \Phi_l$ is still assigned to the $l$-th SLA where $\Phi_l$ is given by (7).

To prove this, we consider the surpluses of a customer of type $\alpha \in \Phi_l$ under two adjoining SLAs whose numbers are both no larger than $l$ or both no smaller than $l$. Roughly, its surplus under the SLA whose number is closer to $l$ is always larger than its surplus under the other SLA, as shown below.

Lemma 4.5. Suppose the SLA prices $p_1, p_2, \cdots, p_L$ are set to $\hat p_1, \hat p_2, \cdots, \hat p_L$. Let us consider a customer of type $\alpha \in \Phi_l$ and an SLA $l'$ where $l, l' \in [1, L]$ and $\Phi_l$ is given by (7). The surplus of this customer is such that
(i) in the case that $l' \in [2, l]$,
• if $\alpha = \hat\alpha_l$ and $l' = l$, its surpluses under the $l'$-th and $(l'-1)$-th SLAs are the same, and
• otherwise, its surplus under the $l'$-th SLA is larger than its surplus under the $(l'-1)$-th SLA;
and (ii) in the case that $l' \in [l, L-1]$, its surplus under the $l'$-th SLA is larger than its surplus under the $(l'+1)$-th SLA.

Fig. 3. The priority-based sharing architecture with $L = 2$: a pool of mixed jobs of different SLAs (grey rectangles) is dispatched to $m$ servers irrespective of their SLAs; in the priority-based queue at a single server, the jobs of the first SLA (orange rectangles) have a higher priority to be served than the jobs of the second SLA (golden rectangles).

Using the transitivity of inequalities, we derive the following proposition from Lemma 4.5.

Proposition 4.6.

When the SLA prices $p_1, p_2, \cdots, p_L$ are set to $\hat p_1, \hat p_2, \cdots, \hat p_L$, we have for all $l \in [1, L]$ that a customer of type $\alpha \in \Phi_l$ will be assigned to the $l$-th SLA where $\Phi_l$ is given by (7). In other words, the customer achieves the maximum surplus under the $l$-th SLA.

Third, we show that, when the SLA delays $\phi_1, \cdots, \phi_L$ and the market segmentation $\hat\alpha_1, \cdots, \hat\alpha_{L+1}$ are arbitrarily given, there is a pricing rule such that the SLA prices are optimal and our framework forms a DSIC mechanism.

Proposition 4.7. When the SLA prices $p_1, p_2, \cdots, p_L$ are set to $\hat p_1, \hat p_2, \cdots, \hat p_L$, we have (i) our service framework forms a DSIC mechanism; (ii) $\hat p_1, \hat p_2, \cdots, \hat p_L$ are the optimal SLA prices.

In Sections 3 and 4, we study a generic service model that offers $L$ SLAs and its properties in pricing and user behavior. The SLA fulfillment relies on proper provision of servers to jobs to satisfy (6). In this section, we will consider two typical architectures of servers to serve jobs and fulfill SLAs. Then, we study their performance and the optimal configuration of parameters such as SLA delays.
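The pricing rule of Definition 4.4 and the statement of Proposition 4.6 can be checked numerically on a small instance. The sketch below is an illustration only: it assumes the example WTP function (1), and the helper names, thresholds, and delays are hypothetical choices rather than values from the paper. It computes the prices $\hat p_l$ recursively, then verifies by brute force that the assignment rule of Definition 3.1 reproduces the segmentation (7) and that no listed type gains by reporting another listed type.

```python
from typing import List, Sequence


def wtp(alpha: float, phi: float, p: float, T: float, beta: float) -> float:
    """Example WTP function (1)."""
    return p * (1.0 - (alpha * (phi - T)) ** beta)


def sla_prices(alpha_hat: Sequence[float], delays: Sequence[float],
               p: float, T: float, beta: float) -> List[float]:
    """Definition 4.4: p_hat_1 = p and p_hat_l = p_hat_{l-1} - u_minus_l for l = 2..L.

    alpha_hat[l - 1] holds alpha_hat_l and delays[l - 1] holds phi_l.
    """
    prices = [p]
    for l in range(2, len(delays) + 1):
        a = alpha_hat[l - 1]
        u_minus = wtp(a, delays[l - 2], p, T, beta) - wtp(a, delays[l - 1], p, T, beta)
        prices.append(prices[-1] - u_minus)
    return prices


def assign_sla(alpha: float, delays: Sequence[float], prices: Sequence[float],
               p: float, T: float, beta: float) -> int:
    """Definition 3.1 with ties resolved in favour of the largest SLA index."""
    best_l, best_surplus = 0, float("-inf")
    for l, (phi, price) in enumerate(zip(delays, prices), start=1):
        surplus = wtp(alpha, phi, p, T, beta) - price
        if surplus >= best_surplus:
            best_l, best_surplus = l, surplus
    return best_l


def segment(alpha: float, alpha_hat: Sequence[float]) -> int:
    """Equation (7): the index l with alpha in Phi_l (alpha_hat has L + 1 decreasing entries)."""
    L = len(alpha_hat) - 1
    for l in range(1, L + 1):
        upper, lower = alpha_hat[l - 1], alpha_hat[l]
        if lower < alpha <= upper or (l == L and alpha == lower):
            return l
    raise ValueError("type lies outside the range covered by alpha_hat")


# Hypothetical instance (assumed values, chosen to be exactly representable as floats).
P_OD, T_OD, BETA = 1.0, 0.0, 2.0
DELAYS = [0.0, 0.5, 1.0]            # phi_1, phi_2, phi_3
ALPHA_HAT = [1.5, 1.0, 0.5, 0.2]    # alpha_hat_1 > ... > alpha_hat_4
TYPES = [1.5, 1.0, 0.8, 0.5, 0.2]   # the finite type set Phi

PRICES = sla_prices(ALPHA_HAT, DELAYS, P_OD, T_OD, BETA)


def true_surplus(alpha: float, reported: float) -> float:
    """Surplus of a type-alpha user under the SLA assigned to the reported type."""
    l = assign_sla(reported, DELAYS, PRICES, P_OD, T_OD, BETA)
    return wtp(alpha, DELAYS[l - 1], P_OD, T_OD, BETA) - PRICES[l - 1]


if __name__ == "__main__":
    print("prices:", PRICES)
    for a in TYPES:
        print("type", a, "segment", segment(a, ALPHA_HAT),
              "assigned", assign_sla(a, DELAYS, PRICES, P_OD, T_OD, BETA))
```

The threshold types 1.0 and 0.5 are exactly indifferent between two adjacent SLAs in this instance, so the tie-breaking convention of Definition 3.1 is what places them in the SLA with the larger number.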

A CSP has a total of $m$ servers. When a job $j$ arrives, it is assigned to a server that will serve it for a duration $s_j$. We will respectively consider (i) the PBS architecture and (ii) the SMS architecture. In the former, an arriving job will be assigned to one of the $m$ servers, and the order of serving the jobs at a server depends on their priorities, which depend on the SLAs to which they belong. In the latter, servers are separated into $L$ groups and each group exclusively serves the jobs of the same SLA.

Preliminary. Before elaborating on the architectures, we first introduce the policies used in cloud services for assigning jobs [25]. Suppose there are $m'$ servers to serve a particular group of jobs and the mean job arrival rate is $\Lambda'$. Typical dispatching policies include (i) Random: for every job $j$, it chooses every server with the same probability $\frac{1}{m'}$ and assigns $j$ to the chosen server [26, 27], and (ii) Round-Robin (RR): jobs are assigned to servers in a cyclical fashion with the $j$-th job being assigned to the $i$-th server where $i = j \bmod m'$ [28]. As a result, jobs are evenly dispatched over the $m'$ servers. At each server, the arriving jobs form a single queue with the same mean job arrival rate $\lambda' = \frac{\Lambda'}{m'}$ [29]. The service time of a job is denoted by a random variable $x$, and the mean $s$ of $x$ is normalized to be one, i.e., $s = 1$.

Fig. 4. The separated multi-SLAs architecture with $L = 2$ and $m_1 + m_2 = m$: colored rectangles denote jobs of different SLAs while colored circles denote servers of different SLAs; the jobs in the pool are dispatched to the $m_1$ servers of the first SLA or the $m_2$ servers of the second SLA, and each server has an FCFS queue.

The PBS Architecture. In the PBS architecture, whenever a job arrives, it is assigned to one of the $m$ servers by some dispatching policy described above. The total job arrival rate is $\Lambda$ and the job arrival rate at a single server is $\lambda = \frac{\Lambda}{m}$. At every server, the jobs have $L$ priority classes.
For all $l \in [1, L-1]$, the jobs of SLA $l$ have higher priority to utilize servers than the jobs of SLAs $l+1, \cdots, L$. At the moments of job completion, the server becomes idle and will select a new job of the highest priority to serve, and jobs of the same priority will be chosen in a first-come-first-served (FCFS) discipline. While a job $j$ is being served, the nonpreemptive rule is applied, that is, the job will continuously occupy a server for a duration $s_j$ even if other jobs of higher priorities arrive.

Now, we give the mean delay $t_l$ of the jobs of each SLA $l \in [1, L]$. At each server, the job arrival rate of the $l$-th SLA is $\lambda_l = \lambda \cdot \sum_{\alpha \in \Phi_l} P(\alpha)$. The total arrival rate of the jobs of SLAs $1, \cdots, l$ is $\hat{\lambda}_l = \sum_{l'=1}^{l} \lambda_{l'}$. The jobs of all SLAs at every server form a single queue and their job arrivals are described as a Poisson process with rate $\lambda$. The service time $x$ of jobs is assumed to follow a general distribution where the mean $s$ is one. Such a queue is usually denoted by M/G/1. We can directly use the result for an M/G/1 queue with nonpreemptive priorities to obtain the mean delay of the jobs of the $l$-th SLA:

$$ t_l = \frac{0.5 \cdot \lambda \cdot E[x^2]}{(1 - \hat{\lambda}_{l-1}) \cdot (1 - \hat{\lambda}_l)}, \tag{8} $$

where $l \in [1, L]$, $\hat{\lambda}_0$ is set to zero trivially, and $E[x^2]$ is the second moment of $x$, i.e., its mean-squared value.

The SMS Architecture. In the SMS architecture, the $m$ servers are separated into $L$ groups, and each group has $m_l$ servers and forms a module, where $m = \sum_{l=1}^{L} m_l$. The $l$-th module is used to exclusively serve the jobs of the $l$-th SLA, and every job that belongs to the $l$-th SLA will be assigned to one of the $m_l$ servers under some dispatching policy such as Random or RR. At every server, the jobs will be served in a FCFS discipline. The total job arrival rate of the $l$-th SLA is $\Lambda_l$ and the job arrival rate at a single server is $\lambda_l = \Lambda_l / m_l$. The jobs at every server form a single queue, and when it is an M/G/1 queue, we have from [33] that the job delay of the $l$-th SLA is

$$ t_l = \frac{0.5 \cdot \lambda_l \cdot E[x^2]}{1 - \lambda_l}. \tag{9} $$

On-demand Service System. The delay-differentiated service system of this paper can be viewed as a complement to the standard on-demand service model, which will be used as a benchmark. In a pure on-demand system, all jobs are served with a short delay and processed with the same priority on the $m$ servers. Upon arrival of each job, it will be dispatched to one of the $m$ servers under some policy and the jobs at the same server will be served in a FCFS discipline. The total job arrival rate is $\Lambda_{od}$, and the job arrival rate at a single server is $\lambda_{od} = \Lambda_{od}/m$. Similar to (9), we have that the delay of all jobs is

$$ t = \frac{0.5 \cdot \lambda_{od} \cdot E[x^2]}{1 - \lambda_{od}}. \tag{10} $$

The job delay will be no larger than $T$, which requires that $t \le T$.

Beyond the above architectural description, we will use in this paper the exponential or hyperexponential distribution to model the service time $x$.
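The delay expressions (8)-(10) are easy to evaluate numerically. The following Python sketch is only an illustration: the function names `mg1_delay` and `priority_delays` are hypothetical, and it assumes, as in the text, that the mean service time is normalized to one so that the per-server arrival rate equals the load. The value 2.0 used for $E[x^2]$ in the usage lines is the second moment of an exponential service time with unit mean.

```python
from typing import List


def mg1_delay(lam: float, ex2: float) -> float:
    """Mean delay of an M/G/1 FCFS queue, as in (9) and (10).

    lam: per-server arrival rate (unit mean service time, so lam < 1).
    ex2: second moment E[x^2] of the service time.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("the queue is stable only for 0 <= lam < 1")
    return 0.5 * lam * ex2 / (1.0 - lam)


def priority_delays(lams: List[float], ex2: float) -> List[float]:
    """Mean delays t_1..t_L under nonpreemptive priorities, as in (8).

    lams[l-1] is the per-server arrival rate of SLA l; SLA 1 has the
    highest priority.
    """
    total = sum(lams)
    if total >= 1.0:
        raise ValueError("the total per-server load must be below 1")
    delays = []
    cum_prev = 0.0  # hat-lambda_{l-1}, with hat-lambda_0 = 0
    for lam_l in lams:
        cum = cum_prev + lam_l  # hat-lambda_l
        delays.append(0.5 * total * ex2 / ((1.0 - cum_prev) * (1.0 - cum)))
        cum_prev = cum
    return delays


if __name__ == "__main__":
    ex2 = 2.0  # exponential service time with unit mean
    print(mg1_delay(0.05 / 1.05, ex2))       # on-demand load that meets T = 0.05
    print(priority_delays([0.1, 0.2], ex2))  # two priority classes
```

With a single class, `priority_delays` reduces to `mg1_delay`, which is a quick consistency check between (8) and (9).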
These two distributions are often used in the literature [26, 32]: they have simple closed-form expressions for $E[x^2]$ and guarantee the existence of $E[x^2]$, which enables analytical evaluation of the performance of the architectures above. When $x$ follows an exponential distribution [26, 32], we have

$$ E[x^2] = 2 \cdot s^2 = 2. \tag{11} $$

When $x$ follows a hyperexponential distribution [32], it can be characterized by $h$ tuples $(\pi_i, \eta_i)$ where $i \in [1, h]$ and $\sum_{i=1}^{h} \eta_i = 1$; $x$ has a probability $\eta_i$ to follow an exponential distribution with rate $\pi_i$. For an exponential distribution with rate $\pi_i$, its mean is $1/\pi_i$. The mean of $x$ is

$$ s = \sum_{i=1}^{h} \frac{\eta_i}{\pi_i} = 1, \tag{12} $$

and the second moment of $x$ is

$$ E[x^2] = \sum_{i=1}^{h} \frac{2}{\pi_i^2} \cdot \eta_i. \tag{13} $$

5.2 Optimal SLA Delays

The actual experienced job delays of the $L$ SLAs are $t_1, \cdots, t_L$. As described in (6), $t_l$ should be no larger than the SLA delay $\phi_l$. The delay of the first SLA is $T$. Intuitively, we should keep the other SLA delays as small as possible, i.e., $\phi_l = t_l$ for all $l \in [2, L]$, in order to maximize the revenue. In fact, by doing so, we can make every SLA price as high as possible, and we now rigorously prove this by analyzing the structure of the SLA prices in Definition 4.4.

Proposition 5.1. In order to maximize the revenue, we have $\phi_l = t_l$ for all $l \in [2, L]$.

In this subsection, we will study the performance of the proposed service system built respectively on the PBS and SMS architectures. Recall that $G$ denotes the revenue of the service system of this paper, and we denote by $G_{od}$ the revenue of an on-demand service system. The viability of our service system can be mainly indicated by the ratio of $G$ to $G_{od}$, denoted by $\kappa$; $\kappa - 1$ is the proportion by which $G_{od}$ is improved when our service system is used. It is difficult to give a closed form of the optimal $G$ since this involves solving a system of non-linear equations. We thus seek to give a bound of $\kappa$.

For the PBS-based service system, we will get an upper bound of $\kappa$ that is close to one.
This implies that, at best, it can marginally outperform the on-demand service system, which will discourage the adoption of a PBS-based service system. For the SMS-based service system, we will get a lower bound of $\kappa$ that is significantly larger than one. This implies that the SMS-based service system can significantly outperform the on-demand service system, which will support the use of a SMS-based service system by CSPs. Finally, we will give an optimal algorithm to maximize the revenue of a SMS-based service system.

A Performance Bound of the PBS-based Service System. When a PBS-based service system is considered, we denote by $G_{pbs}$ its revenue. We will derive an upper bound of the ratio of $G_{pbs}$ to $G_{od}$. For the standard on-demand service model, it has a fixed price $p$ and guarantees a small delay of at most $T$. A CSP's revenue is maximized when the delay of the first SLA is $T$ and we have by (10) that the corresponding job arrival rate at a single server is as follows:

$$ \lambda_{od} = \frac{T}{A + T}, \tag{14} $$

where $A = 0.5 \cdot E[x^2]$. Further, the maximum revenue that an on-demand service model can achieve is

$$ G_{od} = m \cdot p \cdot \lambda_{od} \cdot s = m \cdot p \cdot \frac{T}{A + T}. \tag{15} $$

For the PBS-based service system, we have the following analysis. All jobs of different SLAs are executed on the $m$ servers. The first SLA offers service at a fixed price $p$ and guarantees a small delay of at most $T$, and we have by (8) that $\phi_1 = \lambda \cdot A / (1 - \hat{\lambda}_1) \le T$, where $0 < \hat{\lambda}_1 < \lambda < 1$. Thus, we get $\lambda < T/A$. A CSP's revenue is given in (5) and we can get an upper bound of $G_{pbs}$:

$$ G_{pbs} = \sum_{l=1}^{L} p_l \cdot m \cdot \lambda_l \le p \cdot m \cdot \sum_{l=1}^{L} \lambda_l = p \cdot m \cdot \lambda < p \cdot m \cdot \frac{T}{A}, \tag{16} $$

where $p_l \le p$ for all $l \in [1, L]$. It follows from (14) and (16) that

Proposition 5.2. The performance of a PBS-based service model is upper bounded by $1 + T/A$ times the optimal performance of the standard on-demand service model, in terms of the revenue, where $A = 0.5 \cdot E[x^2]$.

Fig. 5.

The Value of A under Varying π . When x follows an exponential distribution, we have A = 1. When x follows a hyperexponential distribution, we use an example in [32] to set h = η = . η = .

25; we let π ∈ ( , ) ,which represents more jobs have relatively smaller service times. We vary the value of π from 0.2to 0.95 with a step size 0.05, and compute the corresponding value of π by (12); then, we can getthe value of A by (13), which is illustrated by the red stars in Fig. 5, where A >

1. In both cases, we can conclude by Proposition 5.2 that the upper bound is at most $1 + T$, and the PBS-based service system can only outperform the standard on-demand service system marginally, since the delay $T$ of the first SLA is small.

A Performance Bound of the SMS-based Service System. When a SMS-based service system is considered, we denote by $G_{sms}$ its revenue, and by $G^*_{sms}$ its optimal revenue where $G^*_{sms} \ge G_{sms}$. In cloud markets, the total number of servers is large, so the revenue from a single server is negligible in comparison with the total revenue. Thus, to give a closed form of the lower bound, we relax in this subsubsection the constraint that the number of servers assigned to each SLA is an integer and allow the number to be fractional; the total revenue after relaxation approximates the total revenue of an integer solution. Further, we have $m_i \cdot \lambda_i = \Lambda_i$, and $\lambda_i$ and $\phi_i$ satisfy the relation (9); for $i \in [1, L]$, the number of servers assigned to the $i$-th SLA is as follows:

$$ m_i = \frac{\Lambda_i \cdot (\phi_i + A)}{\phi_i}, \tag{17} $$

where $A = 0.5 \cdot E[x^2]$. Furthermore, it is known that there are many applications whose workload is delay-tolerant, as illustrated by the prosperity of the spot market [3]; a CSP like Amazon EC2 or Microsoft Azure also has the ability to adjust the provision of servers to properly satisfy the needs of users.

We consider a specific setting of the service system where two SLAs are offered respectively for latency-critical and delay-tolerant jobs; the corresponding revenue can be viewed as a lower bound of $G^*_{sms}$.
The setting is as follows: (i) we choose some $\alpha' \in (\underline{\alpha}, \overline{\alpha})$ such that all customers with $\alpha$ larger than $\alpha'$ will be processed under the first SLA and the others are processed under the second SLA (i.e., $\hat{\alpha}_2 = \alpha'$), and (ii) the CSP intends to adapt its capacity (i.e., the value of $m$) to guarantee that the delay $\phi_2$ of the second SLA is set to some value $\phi'$; $\alpha'$ and $\phi'$ are system parameters set by the CSP. $\alpha'$ determines the proportion of the arriving jobs to be processed under each SLA. Let $\Phi_1 = \Phi \cap (\alpha', \overline{\alpha}]$ and $\Phi_2 = \Phi - \Phi_1$; the job arrival rates for the first and second SLAs are respectively $\Lambda_1 = \sum_{\alpha \in \Phi_1} P(\alpha) \cdot \Lambda$ and $\Lambda_2 = \Lambda - \Lambda_1$. By Proposition 4.7, the prices of the first and second SLAs are $p_1 = p$ and $p_2 = p_1 + (u(\hat{\alpha}_2, \phi_2) - u(\hat{\alpha}_2, \phi_1)) = u(\hat{\alpha}_2, \phi_2)$ where $\phi_1 = T$. We have that the CSP's total revenue is

$$ G_{sms} = p \cdot \Lambda_1 + u(\alpha', \phi') \cdot \Lambda_2. \tag{18} $$

Fig. 6.

The Lower Bound $\kappa'$ under Varying π.

If the CSP only provides on-demand service, the (optimal) revenue $G_{od}$ is as follows:

$$ G_{od} = m \cdot p \cdot \lambda_{od} = (m_1 + m_2) \cdot p \cdot \frac{T}{A + T}, \tag{19} $$

where $\lambda_{od}$ is given in (14). The below conclusion follows from (18) and (19):

Proposition 5.3. The optimal revenue $G^*_{sms}$ of a SMS-based service system is at least $\kappa'$ times the revenue of an on-demand service system where

$$ \kappa' = \frac{G_{sms}}{G_{od}} \ge \frac{\left(p \cdot \Lambda_1 + u(\alpha', \phi') \cdot \Lambda_2\right) \cdot (A + T)}{(m_1 + m_2) \cdot p \cdot T}, \tag{20} $$

where $m_1$ and $m_2$ are given in (17), $A = 0.5 \cdot E[x^2]$, and $\alpha'$ and $\phi'$ are system parameters set by the CSP.

Proposition 5.3 provides a closed form of the lower bound $\kappa'$ of the ratio of $G^*_{sms}$ to $G_{od}$, and $\kappa' - 1$ bounds the revenue improvement from below. To evaluate $\kappa' - 1$, we set $\beta = 3$ and use the WTP function

$$ u(\alpha, \phi) = p \cdot \left(1 - (\alpha \cdot (\phi - T))^3\right), \quad \phi \in [T, +\infty). \tag{21} $$

The parameter $\alpha'$ is set such that a significant portion of jobs (e.g., half of the jobs) is processed under each SLA. Since there are many delay-tolerant jobs in cloud markets, $\hat{\phi} = 1/\alpha' + T$ can be much larger than $T$, and it is the minimum delay under which the WTPs of the customers of the second SLA will become zero. Correspondingly, the delay $\phi_2$ of the second SLA is set to $(\hat{\phi} + T)/2$; as a result, the price $p_2$ of the second SLA is $0.875 \cdot p$, which is not far from the on-demand price $p$. We set $\Lambda_1 = \Lambda_2 = 0.5 \cdot \Lambda$. In this case, we have that

$$ \kappa' = \frac{G^*_{sms}}{G_{od}} \ge \frac{1.875 \cdot \left(1 + \frac{A}{T}\right)}{2 + \frac{A}{T} + 2 \cdot \frac{A}{\hat{\phi}}}. \tag{22} $$

The lower bound of (22) decreases in $T$ and increases in $\hat{\phi}$; we set $T$ to a larger value 0.05 and $\hat{\phi}$ to 0.5. In this case, when the service time $x$ follows an exponential distribution, we have $A = 1$ and $\kappa' \ge 1.514$. When $x$ follows a hyperexponential distribution, we still use the setting in Section 5.3.1 and the value of $\kappa'$ is illustrated in Fig. 6.
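The two-SLA bound can be checked numerically from the server counts in (17) and the revenues in (18) and (19). The sketch below is a hedged illustration rather than the authors' code: `servers_needed` and `kappa_lower_bound` are hypothetical names, the number of servers is allowed to be fractional as in the relaxation above, and the inputs in the usage lines (equal arrival rates, a second-SLA WTP of 0.875 times the on-demand price, and the two second-SLA delays) are assumptions chosen for the example.

```python
def servers_needed(arrival_rate: float, sla_delay: float, a: float) -> float:
    """Fractional number of servers for one SLA, following (17)."""
    return arrival_rate * (sla_delay + a) / sla_delay


def kappa_lower_bound(p: float, rate1: float, rate2: float, wtp2: float,
                      a: float, t: float, phi2: float) -> float:
    """Ratio G_sms / G_od for the two-SLA setting, following (18)-(20).

    p:     on-demand price (price of the first SLA).
    rate1: total arrival rate of the first SLA, served with delay t.
    rate2: total arrival rate of the second SLA, served with delay phi2.
    wtp2:  price of the second SLA, i.e. u(alpha', phi') in (18).
    a:     A = 0.5 * E[x^2].
    """
    m1 = servers_needed(rate1, t, a)
    m2 = servers_needed(rate2, phi2, a)
    g_sms = p * rate1 + wtp2 * rate2        # revenue (18)
    g_od = (m1 + m2) * p * t / (a + t)      # on-demand revenue (19)
    return g_sms / g_od


if __name__ == "__main__":
    # Assumed example inputs: half of the jobs under each SLA, A = 1, T = 0.05.
    for phi2 in (0.275, 0.525):
        print(phi2, kappa_lower_bound(1.0, 0.5, 0.5, 0.875, 1.0, 0.05, phi2))
```

A larger second-SLA delay needs fewer servers for the same arrival rate, which is why the ratio grows with `phi2` when the price is held fixed.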
Algorithm 1: Optimal Parameter Configuration
    $G^* \leftarrow 0$, $\mathcal{M}' \leftarrow \mathcal{M}$; // $G^*$: record the current optimal revenue; $\mathcal{A}'$ and $\mathcal{M}'$: record the tuples unconsidered respectively in $\mathcal{A}$ and $\mathcal{M}$
    while $\mathcal{M}' \neq \emptyset$ do
        Get a tuple $(i_1, i_2, \cdots, i_{L+1})$ from $\mathcal{M}'$, and the $l$-th module is assigned $m_l = i_{l+1} - i_l$ servers;
        $\mathcal{A}' \leftarrow \mathcal{A}$;
        while $\mathcal{A}' \neq \emptyset$ do
            Get a tuple $seq = (\alpha_1, \alpha_2, \cdots, \alpha_{L+1})$ from $\mathcal{A}'$;
            Compute the job arrival rate $\Lambda_l$ of the $l$-th SLA by Equation (4) and Proposition 4.3;
            For all $l \in [1, L]$, compute the actual job delay $t_l$ of the $l$-th SLA using (9);
            if $t_1 \le T < t_2 < t_3 < \cdots < t_L$ then // The delay of the first SLA is no larger than $T$ and the SLA delays are increasing
                Set the delay $\phi_l$ of the $l$-th SLA to $t_l$ for all $l \in [2, L]$, and $\phi_1$ to $T$;
                Use Proposition 4.7 to compute the optimal prices of SLAs $p_1, p_2, \cdots, p_L$;
                Compute the revenue $G$ by (5), where $w_l = \Lambda_l \cdot s = m_l \cdot \lambda_l \cdot s$;
                if $G > G^*$ then
                    $G^* \leftarrow G$, $\phi^*_l \leftarrow \phi_l$, $p^*_l \leftarrow p_l$, $m^*_l \leftarrow m_l$, for all $l \in [1, L]$; // record the optimal SLA delays and prices, and division of servers
            Delete $seq$ from $\mathcal{A}'$;
        Delete the tuple $(i_1, i_2, \cdots, i_{L+1})$ from $\mathcal{M}'$;

In this subsection, we will give a procedure to determine the optimal SLA delays and prices of a SMS-based service system, in order to maximize the revenue. The delays and prices are determined by the market segmentation $\hat{\alpha}_1, \hat{\alpha}_2, \cdots, \hat{\alpha}_{L+1}$, and the numbers of servers assigned to different SLAs $m_1, m_2, \cdots, m_L$. Specifically, as shown in Proposition 4.6, the sequence $\hat{\alpha}_1, \hat{\alpha}_2, \cdots, \hat{\alpha}_{L+1}$ determines the job arrival rate of each SLA by (4). The numbers $m_1, m_2, \cdots, m_L$ determine the delays of SLAs $\phi_1, \phi_2, \cdots, \phi_L$ by (9), which further determine the prices of SLAs by Proposition 4.7. Thus, our decision variables are $\hat{\alpha}_2, \cdots, \hat{\alpha}_L$ and $m_1, \cdots, m_L$ with the aim of maximizing the revenue, where $\hat{\alpha}_1 = \overline{\alpha}$, $\hat{\alpha}_{L+1} = \underline{\alpha}$, and $\sum_{l=1}^{L} m_l = m$.

Now, we give a procedure to determine the optimal decision variables under the SMS architecture. $\hat{\alpha}_1, \hat{\alpha}_2, \cdots, \hat{\alpha}_{L+1}$ uniquely corresponds to an element in the following set

$$ \mathcal{A} = \{(\alpha_1, \alpha_2, \cdots, \alpha_{L+1}) \mid \alpha_1 = \overline{\alpha} > \alpha_2 > \cdots > \alpha_{L+1} = \underline{\alpha},\ \alpha_2, \alpha_3, \cdots, \alpha_L \in \Phi\}, $$

where $\hat{\alpha}_l = \alpha_l$ for all $l \in [1, L+1]$. $m_1, m_2, \cdots, m_L$ uniquely correspond to an element in the following set

$$ \mathcal{M} = \{(i_1, i_2, \cdots, i_{L+1}) \mid 0 = i_1 < i_2 < \cdots < i_{L+1} = m\}. $$

The number $m_l$ is set to $i_{l+1} - i_l$ for all $l \in [1, L]$. We can give a procedure, presented as Algorithm 1, to determine the optimal tuples in $\mathcal{A}$ and $\mathcal{M}$ such that the CSP achieves the maximum revenue; then, the corresponding delays and prices under these two tuples will be the optimal ones, and we have the following conclusion.

Proposition 5.4. Algorithm 1 gives the optimal delays and prices of SLAs, and its time complexity is $O(m^{L-1} \cdot n^{L-1})$.

Fig. 7. Revenue Improvement: the red (resp. blue) stars are for the case of low (resp. high) delay-tolerance; the left subfigure illustrates the maximum revenue improvement under a given number of SLAs $L$, while the right subfigure illustrates the corresponding average load per server; in the on-demand service system, the average load per server is 0.0476.

In this section, we numerically show the revenue improvement that a SMS-based service system achieves over the standard on-demand service system. Besides, we adapt the architecture of [4, 5] to the service model of this paper and compare it with the SMS-based service system; the related results and analysis are put in the Appendix.
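The exhaustive search of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `wtp` and `best_configuration` are hypothetical; customer types are assumed to be uniformly distributed with the $i$-th type ($i \ge 2$) having delay-cost $1/((i-1)\cdot\delta)$; the WTP is assumed to be the cubic function $p \cdot (1 - (\alpha(\phi - T))^3)$; and prices follow the recursion $p_l = p_{l-1} - (u(\hat{\alpha}_l, \phi_{l-1}) - u(\hat{\alpha}_l, \phi_l))$ with $p_1 = p$.

```python
from itertools import combinations


def wtp(alpha: float, phi: float, p: float, t: float) -> float:
    """Assumed cubic WTP for a delay phi >= t."""
    return p * (1.0 - (alpha * (phi - t)) ** 3)


def best_configuration(m, n, total_rate, delta, num_slas,
                       p=1.0, t=0.05, a=1.0):
    """Brute-force search over type cut points and server divisions.

    Returns (revenue, first type index of each SLA, servers per SLA),
    or None if no configuration is feasible.
    """
    rate_per_type = total_rate / n
    best = None
    # Cut points 2 <= i_2 < ... < i_L <= n; SLA l serves types i_l..i_{l+1}-1.
    for cuts in combinations(range(2, n + 1), num_slas - 1):
        starts = (1,) + cuts
        ends = cuts + (n + 1,)
        rates = [(e - s) * rate_per_type for s, e in zip(starts, ends)]
        alphas = [1.0 / ((c - 1) * delta) for c in cuts]  # hat-alpha_2..L
        # Divisions 0 = b_1 < b_2 < ... < b_{L+1} = m of the m servers.
        for split in combinations(range(1, m), num_slas - 1):
            bounds = (0,) + split + (m,)
            servers = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
            loads = [r / s for r, s in zip(rates, servers)]
            if max(loads) >= 1.0:
                continue
            delays = [a * lam / (1.0 - lam) for lam in loads]  # as in (9)
            phis = [t] + delays[1:]  # the first SLA advertises delay T
            increasing = all(x < y for x, y in zip(phis, phis[1:]))
            if delays[0] > t + 1e-12 or not increasing:
                continue
            prices = [p]
            for alpha, prev_phi, phi in zip(alphas, phis, phis[1:]):
                drop = wtp(alpha, prev_phi, p, t) - wtp(alpha, phi, p, t)
                prices.append(prices[-1] - drop)
            revenue = sum(pr * r for pr, r in zip(prices, rates))
            if best is None or revenue > best[0]:
                best = (revenue, starts, tuple(servers))
    return best


if __name__ == "__main__":
    m, t, a = 100, 0.05, 1.0
    revenue, starts, servers = best_configuration(
        m=m, n=50, total_rate=0.1 * m, delta=0.02, num_slas=2)
    g_od = m * 1.0 * t / (a + t)  # on-demand revenue, as in (15)
    print(starts, servers, revenue / g_od)
```

The two nested `combinations` loops mirror the sets of type segmentations and server divisions, so the running time grows as $m^{L-1} \cdot n^{L-1}$; the sketch is practical only for small $L$.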

There are a total of $m$ servers and the WTP function is given in (21). The on-demand price $p$ (i.e., the price $p_1$ of the first SLA) is normalized as 1, and its delay $T$ is 0.05. Given a delay-cost type $\alpha$, let $\phi'_0 = 1/\alpha$; a customer's WTP becomes zero when the delay is $\phi_0 = \phi'_0 + T$, and each $\alpha$ uniquely corresponds to a $\phi_0$. There are $n = 50$ types of customers and, for all $i \in [1, n]$, the WTP of the $i$-th type of customers becomes zero when the delay is $\phi_{0,i} = T + \phi'_{0,i}$; here, $\phi'_{0,i} = \epsilon$ if $i = 1$ and $\phi'_{0,i} = (i-1) \cdot \delta$ otherwise, where $\epsilon$ is arbitrarily small. We have $\phi_{0,1} < \phi_{0,2} < \cdots < \phi_{0,50}$. The first type of customers is the most delay-sensitive and its WTP becomes zero even if the delay is slightly larger than $T$. The value of $\delta$ determines the delay-tolerance of the population, and if it is large, the population has a high delay-tolerance. We consider two cases where the delay-tolerance is low and high respectively: (i) $\delta = 0.02$ and (ii) $\delta = 0.04$. The total job arrival rate is $\Lambda$; the service time of jobs follows an exponential distribution and its mean is normalized as one, i.e., $s = 1$. Customers are independently and uniformly distributed over the $n$ types, and the mean job arrival rate of each type is $\Lambda/n$. Then, $\rho = \frac{\Lambda}{m} \cdot s = \lambda$ denotes the average load per server when all $m$ servers are considered. We denote by $G^*_{sms}$ the optimal revenue achieved by Algorithm 1. In an on-demand service system, $G_{od}$ denotes its revenue and is defined in (15); $\lambda_{od}$ denotes the maximum load per server and it equals 0.0476 since $t \le T$ in (10); hence, the maximum revenue that a CSP can obtain from a single server is also 0.0476. The following ratio is the main performance metric in our experiments:

$$ \gamma = G^*_{sms} / G_{od}. $$

Specifically, if $\gamma > 1$, the SMS-based service system will outperform the on-demand system; the larger the value of $\gamma$, the higher the revenue improvement.

The service model of this paper can be viewed as a complement to the on-demand service, and it can attract potential delay-tolerant customers from the market and improve the revenue efficiency, i.e., the average revenue per server. In practice, a CSP like Amazon EC2 or Microsoft Azure often has rich capital and can adapt its capacity to accept and serve all arriving jobs and maintain its load per server at a desired level.

Fig. 8. Revenue Improvement $\gamma$ under Varying Load $\lambda$: (i) the left and right subfigures correspond to the low and high delay-tolerance cases respectively; (ii) the magenta, blue, red, black and green stars denote the revenue improvement $\gamma$ in the case of two, three, four, five and six SLAs respectively.

Fig. 9. The SLA Prices under Varying Load $\lambda$: (i) the left and right subfigures correspond to the low and high delay-tolerance cases respectively; (ii) in each subfigure, the red, blue and magenta markers denote the results when $L = 2, 3, 4$ respectively; (iii) the markers "stars", "circles" and "squares" denote the SLA prices of the second, third and fourth SLAs respectively; the price of the first SLA is one.

Fig. 10. The SLA Delays under Varying Load $\lambda$: (i) the left and right subfigures correspond to the low and high delay-tolerance cases respectively; (ii) the stars illustrate the SLA delays $\phi_l$ while the squares illustrate the value of $\phi_{0,i_l}$; (iii) the red markers are for the second SLA when offering two SLAs; (iv) the blue and magenta markers are respectively for the second and third SLAs when offering three SLAs.

Revenue Improvement. In Section 5.3.2, we have given a lower bound of the performance in (22), and consider the setting that two SLAs are offered and each SLA is assigned half the jobs. When it is further concretized by our experimental setting, we have $\hat{\phi} = 0.05 + 25 \cdot \delta$. In the low delay-tolerant case, $\delta = 0.02$ and $\hat{\phi} = 0.55$; the revenue improvement $\gamma$ is 1.536. In the high delay-tolerant case, $\delta = 0.04$ and $\hat{\phi} = 1.05$; the revenue improvement $\gamma$ is 1.647.

In the rest of this section, we fix the number of servers $m = 100$ and allocate a proper proportion of servers to each SLA. We vary the average load per server $\lambda$, which increases from 0.05 with a step size 0.01, and calculate the revenue improvement $\gamma$. The value of $\gamma$ varies under different load $\lambda$. The maximum revenue improvement under a given number of SLAs $L$ is summarized in Fig. 7 (left), ranging from 182.5% to 309.9%; the corresponding optimal $\lambda$ is given in Fig. 7 (right). From the figure, we can see that (i) the larger the number $L$ of SLAs, the higher the revenue improvement $\gamma$, and (ii) the higher the delay-tolerance, the higher the revenue improvement. In the low delay-tolerance case, when the number of SLAs offered by a CSP varies from two to six, the revenue improvement increases from 182.5% to 226.0%. The revenue improvement is remarkable even when $L = 2$. In the high delay-tolerance case, the revenue improvement is 229.1% even when $L = 2$. In both low and high delay-tolerance cases, when $L \ge 4$, the revenue improvement increases only marginally as $L$ increases. This may imply that, in practice, offering two or three SLAs may be enough.

Further Observation. In the following, we illustrate some detailed numerical results to help us understand the features of an optimal parameter configuration.

First, we describe the general features. By Proposition 4.7, there exists a sequence $1 = i_1 < i_2 < \cdots < i_L \le n$ such that the $l$-th SLA is assigned the customers whose $\phi_{0,i}$ is such that $i \in [i_l, i_{l+1})$ if $l \in [1, L-1]$ and $i \in [i_L, n]$ if $l = L$; here, we have $\hat{\alpha}_l = 1/\phi'_{0,i_l}$. For all $l \in [2, L]$, the price $p_l$ of the $l$-th SLA equals $p_{l-1}$ minus the difference $u(\hat{\alpha}_l, \phi_{l-1}) - u(\hat{\alpha}_l, \phi_l)$ where $\phi_{l-1} < \phi_l$. Roughly, the revenue is the average price times the load of the $m$ servers. To maximize the revenue, we need to keep the SLA prices high, and the sequence $i_2, i_3, \cdots, i_L$ should be selected in a way such that, for all $l \in [2, L]$,
(i) the SLA delay $\phi_l$ is significantly smaller than $\phi_{0,i_l}$;
(ii) the difference of $\phi_l$ and $\phi_{l-1}$ is small;
(iii) the value of $\phi_{0,i_l}$ is as large as possible;
(iv) the SLA delay $\phi_l$ is significantly larger than $T$.
When the delay is small, the WTP decreases slowly, as explained in Section 3.1. The first two points guarantee that $p_l$ is not far from the on-demand price $p$. By (9), the last two guarantee that the load $\lambda_l$ per server of the $l$-th SLA is significantly larger than $\lambda_{od}$, leading to a larger overall load $\lambda$ per server.

Second, the above features are also embodied in our numerical results. The revenue improvement under varying load is illustrated in Fig. 8. The corresponding SLA prices and delays are given in Figs. 9 and 10. Given the number of SLAs $L$, the revenue improvement $\gamma$ always increases until the load $\lambda$ increases to some threshold; afterwards, $\gamma$ begins to decrease since every server has a too heavy load.
As illustrated in Fig. 10, if $\lambda$ is too large, the SLA delay $\phi_l$ will be large and close to $\phi_{0,i_l}$; then, the WTPs of customers are low, as well as the SLA price, as illustrated in Fig. 9; thus, $\gamma$ becomes smaller even if more workload is processed.

For example, in the low delay-tolerance case with $L = 2$, the optimal $\gamma$ is achieved when the load $\lambda$ is 0.1, as shown in Fig. 7 (right). As the load $\lambda$ increases from 0.05 to 0.1, $\gamma$ keeps increasing, as illustrated by the magenta curve in Fig. 8 (left); afterwards, $\gamma$ begins to decrease. As illustrated by the red curve in Fig. 10 (left), when the load $\lambda$ is 0.12, the SLA delay $\phi_2$ is 0.2228, which is close to $\phi_{0,i_2} = 0.23$; then, the SLA price $p_2$ is 0.1149. In contrast, when $\lambda = 0.1$, the SLA delay $\phi_2$ is well below $\phi_{0,i_2} = 0.29$. Thus, to maintain a large $\gamma$, the average load $\lambda$ per server should be maintained at a proper level by adjusting the total number of servers $m$.

CONCLUSION

In cloud computing, there exist both latency-critical jobs and jobs that could tolerate different degrees of delay. The resource efficiency of a system is much dependent on the job's latency requirement. We propose a delay-differentiated pricing and service model where multiple SLAs are provided, as a complement to the existing on-demand service system. The structure of the market formed by the proposed model is studied and we thus derive the pricing rule under which the proposed framework forms a DSIC mechanism and the CSP's revenue is maximized. We consider two architectures for fulfilling SLAs: the first appears more prevalent and advanced in the literature while the second seems very simple. Our rigorous analysis discourages the adoption of the first architecture and supports the use of the second one. Finally, numerical results are given to show the viability of the proposed service model in comparison with a pure on-demand service system, showing a revenue improvement by up to 209.9%.

REFERENCES

Journal of Network and Computer Applications 45 (2014): 108-120.
[9] K. Psychas and J. Ghaderi. "On Non-Preemptive VM Scheduling in the Cloud." Proceedings of the ACM on Measurement and Analysis of Computing Systems (SIGMETRICS'17) 1, 2, Article 35 (2017), 29 pages.
[10] W. Dargie. "Estimation of the cost of VM migration." In Proceedings of the 23rd International Conference on Computer Communication and Networks (ICCCN'14), pp. 1-8. IEEE, 2014.
[11] N. Nisan, T. Roughgarden, Éva Tardos, and V. Vazirani. "Algorithmic Game Theory." Cambridge University Press, 2007.
[12] Danilo Ardagna, Giuliano Casale, Michele Ciavotta, Juan F Pérez, and Weikun Wang. "Quality-of-service in cloud computing: modeling techniques and their applications." Journal of Internet Services and Applications, 5(1):1-17, 2014.
[13] J. Anselmi, D. Ardagna, John C. S. Lui, Adam Wierman, Y. Xu, and Z. Yang. "The Economics of the Cloud." ACM Transactions on Modeling and Performance Evaluation of Computing Systems 2, 4, Article 18 (December 2017), 23 pages.
[14] Nikhil R. Devanur. "A Report on the Workshop on the Economics of Cloud Computing." ACM SIGecom Exchanges 15.2 (2017): 25-29.
[15] Xiaohu Wu, Francesco De Pellegrini, Guanyu Gao, and Giuliano Casale. "A Framework for Allocating Server Time to Spot and On-Demand Services in Cloud Computing." ACM Transactions on Modeling and Performance Evaluation of Computing Systems 4, 4, Article 20 (2019), 31 pages.
[16] O. Ben-Yehuda, M. Ben-Yehuda, A. Schuster, and D. Tsafrir. "Deconstructing amazon ec2 spot instance pricing." ACM Transactions on Economics and Computation 1, no. 3 (2013): 16.
[17] I. Kash and P. Key. "Pricing the cloud." IEEE Internet Computing 20, no. 1 (2016): 36-43.
[18] Xiaohu Wu, Patrick Loiseau, and Esa Hyytiä. "Towards Designing Cost-Optimal Policies to Utilize IaaS Clouds with Online Learning." IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 3, pp. 501-514, 2020.
[19] Daniel J. Dubois and Giuliano Casale. "OptiSpot: minimizing application deployment cost using spot cloud resources." Cluster Computing 19, no. 2 (2016): 893-909.
[20] Y. Azar, I. Kalp-Shaltiel, B. Lucier, I. Menache, J. Naor, and J. Yaniv. "Truthful online scheduling with commitments." In Proceedings of the Sixteenth ACM Conference on Economics and Computation (EC'15), pp. 715-732. ACM, 2015.
[21] N. Jain, I. Menache, J. Naor, and J. Yaniv. "Near-optimal scheduling mechanisms for deadline-sensitive jobs in large computing clusters." ACM Transactions on Parallel Computing 2, no. 1 (2015): 3.
[22] Xiaohu Wu and Patrick Loiseau. "Algorithms for scheduling deadline-sensitive malleable tasks." In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton'15), pp. 530-537. IEEE, 2015.
[23] X. Zhang, Z. Huang, C. Wu, Z. Li, and F. C.M. Lau. "Online Auctions in IaaS Clouds: Welfare and Profit Maximization with Server Costs." In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'15), pp. 3-15. ACM, 2015.
[24] C. S. Yeo and R. Buyya. "Service Level Agreement based Allocation of Cluster Resources: Handling Penalty to Enhance Utility." In Proceedings of the 2005 IEEE International Conference on Cluster Computing, 2005, pp. 1-10.
[25] Esa Hyytiä, Rhonda Righter, Olivier Bilenne, and Xiaohu Wu. "Dispatching discrete-size jobs with multiple deadlines to parallel heterogeneous servers." In Antonio Puliafito and Kishor Trivedi (Eds.), Systems modeling: methodologies and tools, EAI/Springer Innovations in Communications and Computing, Springer, 2019, pp. 29-46.
[26] J. Rasley, K. Karanasos, S. Kandula, R. Fonseca, M. Vojnovic, and S. Rao. "Efficient queue management for cluster scheduling." In Proceedings of the Eleventh European Conference on Computer Systems (EuroSys'16), p. 36. ACM, 2016.
[27] L. Zheng, C. Joe-Wong, C. Brinton, C.-W. Tan, S. Ha, and M. Chiang. "On the Viability of a Cloud Virtual Service Provider." In Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'16). ACM, 2016.
[28] Weikun Wang and Giuliano Casale. "Evaluating weighted round robin load balancing for cloud web services." In Proceedings of the 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC'14), pp. 393-400. IEEE, 2014.
[29] E. Hyytiä and S. Aalto. "On Round-Robin routing with FCFS and LCFS scheduling." Performance Evaluation 97 (2016): 83-103.
[30] A. Chung, J. W. Park, and G. R. Ganger. "Stratus: cost-aware container scheduling in the public cloud." In Proceedings of the ACM Symposium on Cloud Computing (SoCC'18), pp. 121-134. ACM, 2018.
[31] K. Karanasos, S. Rao, C. Curino, C. Douglas, K. Chaliparambil, G. M. Fumarola, S. Heddaya, R. Ramakrishnan, and S. Sakalanaga. "Mercury: hybrid centralized and distributed scheduling in large shared clusters." In Proceedings of the 2015 USENIX Annual Technical Conference (ATC'15), pp. 485-497. USENIX Association, 2015.
[32] D. Mukherjee, S. Dhara, Sem C. Borst, and J. S.H. van Leeuwaarden. "Optimal Service Elasticity in Large-Scale Distributed Systems." Proceedings of the ACM on Measurement and Analysis of Computing Systems (SIGMETRICS'17) 1, 1, Article 25 (2017), 28 pages.
[33] Dimitri Bertsekas and Robert Gallager. 1987. "Data Networks." Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

A PROOFS

Proof of Lemma 4.1.

Let φ ∈ [ T , + ∞) . It suffices to prove the conclusion that д ( φ ) = u ( α , φ ) − u ( α , φ ) is an increasing function of φ ; then, the lemma holds since д ( φ k ) > д ( φ k ) . To prove this,we note that the derivative of д ( φ ) is д ′ ( φ ) = ∂ u ( α , φ ) ∂ φ − ∂ u ( α , φ ) ∂ φ . Since α > α , we have д ′ ( φ ) > д ( φ ) is increasing. Proof of Lemma 4.2.

We prove this by contradiction. Suppose k < k and the SLA delays satisfy φ k < φ k . The customer of type α (resp. α ) achieves the maximum surplus under the SLA k (resp. k ), and we thus have u ( α , φ k ) − p k ≥ u ( α , φ k ) − p k (23) u ( α , φ k ) − p k ≤ u ( α , φ k ) − p k (24)Multiplying (23) by -1 and adding the resulting inequality to (24), we have u ( α , φ k ) − u ( α , φ k ) ≤ u ( α , φ k ) − u ( α , φ k ) . However, since α > α and k < k , we have by Lemma 4.1 that u ( α , φ k ) − u ( α , φ k ) > u ( α , φ k ) − u ( α , φ k ) , which contradicts the previous inequality. Proof of Lemma 4.3.

Each type of customers will be assigned to some SLA, and Φ l denotes the setof the types of the customers assigned to the l -th SLA for all l ∈ [ , L ] . Let ˆ α l denote the maximumtype in Φ l such that only the customers of type α ≤ ˆ α l will possibly be assigned to the l -th SLA. or all l ∈ [ , L − ] , when the customers of types ˆ α l and ˆ α l + are respectively assigned the l -th and( l + α l > ˆ α l + , which can be easily proved by contradiction.A customer of type α will be assigned to a SLA whose number is no larger than one (i.e., the firstSLA) since α ≥ ˆ α . Thus, we have ˆ α = α .By Lemma 4.2, we also have that (i) for all l ∈ [ , L − ] every customer of type α ∈ ( ˆ α l + , ˆ α l ] ∩ Φ will be assigned to a SLA whose number l ′ is no smaller than l but no larger than l +

1, and (ii)every customer of type α ∈ (cid:2) α , ˆ α L (cid:3) ∩ Φ will be assigned to a SLA whose number is no smaller than L since α ≤ ˆ α L . In the first case, α > ˆ α l + and ˆ α l + is the maximum type of Φ l + ; thus l ′ will besmaller than l + l . The proposition thus holds. Proof of Lemma 4.5.

In the first case, if α = ˆ α l and l ′ = l , the surplus difference of the customerunder the l ′ -th and ( l ′ − ( u ( ˆ α l , φ l ) − p l ) − ( u ( ˆ α l , φ l − ) − p l − ) ; it equals zero due toDefinition 4.4. Otherwise, we have either α < ˆ α l or l ′ < l : in the former, α < ˆ α l ≤ ˆ α l ′ since l ′ ∈ [ , l ] ; in the latter, α ≤ ˆ α l < ˆ α l ′ . Thus, we have α < ˆ α l ′ . The surplus difference under twoadjoining SLAs is ( u ( α , φ l ′ ) − p l ′ ) − ( u ( α , φ l ′ − ) − p l ′ − ) ( a ) = ( u ( ˆ α l ′ , φ l ′ − ) − u ( ˆ α l ′ , φ l ′ )) − ( u ( α , φ l ′ − ) − u ( α , φ l ′ )) ( b ) > α l ′ + < α since α ∈ ( ˆ α l + , ˆ α l ] and l ′ ≥ l , and the difference of the surpluses of the customer underthe l ′ -th and ( l ′ + ( u ( α , φ l ′ ) − p l ′ ) − ( u ( α , φ l ′ + ) − p l ′ + ) ( c ) = ( u ( α , φ l ′ ) − u ( α , φ l ′ + )) − ( u ( ˆ α l ′ + , φ l ′ ) − u ( ˆ α l ′ + , φ l ′ + )) ( d ) > Proof of Lemma 4.6.

In the case that $\alpha \ne \hat{\alpha}_l$, we have by Lemma 4.5 the conclusion that (i) for all $l' \in [2, l]$, the customer achieves a higher surplus under the $l'$-th SLA than under the $(l'-1)$-th SLA, and (ii) for all $l' \in [l, L-1]$, it achieves a higher surplus under the $l'$-th SLA than under the $(l'+1)$-th SLA. Hence, the customer achieves the maximum surplus under the $l$-th SLA. In the case that $\alpha = \hat{\alpha}_l$, we still have the above conclusion, except that the customer achieves the same surplus under the $l$-th and $(l-1)$-th SLAs when $l' \in [2, l]$ and $l' = l$. Hence, the customer achieves the maximum surplus under both the $l$-th and $(l-1)$-th SLAs.

Proof of Lemma 4.7.

Let us consider a customer of type $\alpha \in \Phi_l$ who reports to the CSP that its type is $\alpha'$. No matter what the other users do, we have by Proposition 4.6 that it achieves the maximum surplus under the $l$-th SLA and will be assigned by the CSP to the $l$-th SLA when it truthfully reports its type, i.e., $\alpha' = \alpha$. Thus, it cannot gain more by misreporting its type, since a misreport leads to its being assigned either to the $l$-th SLA or to one of the other SLAs. The first point thus holds by Definition 3.2.

The objective of our framework is to maximize (5); given the market segmentation $\hat{\alpha}_1, \hat{\alpha}_2, \cdots, \hat{\alpha}_{L+1}$ defined in Proposition 4.3, the job arrival rate of each SLA is fixed by (4), and we have the conclusion that the larger the SLA prices, the larger the value of AG. The first SLA's price $p_1$ is fixed and equals $p$. In order to guarantee the truthfulness of the customers of type $\alpha \in \Phi_l$, a necessary condition is that $u_{i_l}(\alpha, \phi_{l-1}) - p_{l-1} \le u_{i_l}(\alpha, \phi_l) - p_l$ for all $l \in [2, L]$. Further, irrespective of the value of $p_{l-1}$, the maximum possible value of $p_l$ is $\hat{p}_l$ for all $l \in [2, L]$. Thus, the second point holds.

Proof of Lemma 5.1. We prove this by contradiction. We have $\phi_l \ge t_l$ for every SLA $l$. Let us consider an optimal solution where the SLA delays and prices are $\phi^*_l$ and $p^*_l$ for all $l \in [1, L]$, and the market segmentation is $\hat{\alpha}_1, \hat{\alpha}_2, \cdots, \hat{\alpha}_{L+1}$. Suppose there exists some SLA $l$ such that $\phi^*_l > t_l$; let $l'$ denote the minimum such $l$, so that the SLA delays before the $l'$-th equal the actual delays, up to $\phi^*_{l'-1} = t_{l'-1}$, if $l' > 2$. If we decrease the delay of the $l'$-th SLA to $t_{l'}$ and keep the others unchanged, we denote the corresponding prices by $p_1, \cdots, p_L$. It suffices to prove the conclusion that $p_l > p^*_l$ for all $l \in [l', L]$ and $p_l = p^*_l$ for all $l \in [1, l'-1]$ if $l' > 2$. This implies that the revenue (5) increases, which contradicts the assumption that $p^*_1, \cdots, p^*_L$ are optimal; the proposition thus holds.

Now, we prove the conclusion. The SLA prices are determined by Proposition 4.7. First, we have $p^*_l = p_l$ for all $l \in [1, l'-1]$ if $l' > 2$; this is because $\phi^*_1, \cdots, \phi^*_{l'-1}$ do not change. Second, for the $l'$-th SLA, we have

$$
p_{l'} = p_{l'-1} + u(\hat{\alpha}_{l'}, t_{l'}) - u(\hat{\alpha}_{l'}, t_{l'-1}) \overset{(a)}{>} p^*_{l'-1} + u(\hat{\alpha}_{l'}, \phi^*_{l'}) - u(\hat{\alpha}_{l'}, \phi^*_{l'-1}) = p^*_{l'}.
$$

The inequality (a) holds because $p_{l'-1} = p^*_{l'-1}$, $u(\hat{\alpha}_{l'}, t_{l'}) > u(\hat{\alpha}_{l'}, \phi^*_{l'})$, and $t_{l'-1} = \phi^*_{l'-1}$. Third, for the $(l'+1)$-th SLA, we have

$$
\begin{aligned}
p_{l'+1} &= p_{l'} + u(\hat{\alpha}_{l'+1}, \phi^*_{l'+1}) - u(\hat{\alpha}_{l'+1}, t_{l'}) \\
&= p_{l'-1} + u(\hat{\alpha}_{l'}, t_{l'}) - u(\hat{\alpha}_{l'}, \phi^*_{l'-1}) + u(\hat{\alpha}_{l'+1}, \phi^*_{l'+1}) - u(\hat{\alpha}_{l'+1}, t_{l'}) \\
&\overset{(b)}{>} p^*_{l'-1} + u(\hat{\alpha}_{l'}, \phi^*_{l'}) - u(\hat{\alpha}_{l'}, \phi^*_{l'-1}) + u(\hat{\alpha}_{l'+1}, \phi^*_{l'+1}) - u(\hat{\alpha}_{l'+1}, \phi^*_{l'}) \\
&= p^*_{l'+1}.
\end{aligned}
$$

Here, the inequality (b) is due to Lemma 4.1. Fourth, if $l'+2 \le L$, for all $l \in [l'+2, L]$, we have by a simple mathematical induction that

$$
p_l = p_{l-1} + u(\hat{\alpha}_l, \phi^*_l) - u(\hat{\alpha}_l, \phi^*_{l-1}) \overset{(c)}{>} p^*_{l-1} + u(\hat{\alpha}_l, \phi^*_l) - u(\hat{\alpha}_l, \phi^*_{l-1}) = p^*_l.
$$

Here, the inequality (c) is due to $p_{l-1} > p^*_{l-1}$.

Proof of Proposition 5.4.

Algorithm 1 searches each possible pair of $(\alpha_1, \alpha_2, \cdots, \alpha_{L+1})$ and $(i_1, i_2, \cdots, i_{L+1})$, respectively in $A$ and $M$ (lines 1, 2, 3, 14, 4, 5, 13 of Algorithm 1), and computes the corresponding revenue under this pair (lines 6-10). Among all pairs that have been searched so far, it records the current maximum revenue, the corresponding SLA delays and prices, and the numbers of servers assigned to the SLAs (lines 1, 11, 12). Thus, the algorithm will return the optimal solution. The sizes of $M$ and $A$ are respectively polynomial in $m$ and $n$ (i.e., $\binom{m}{L-1}$ and $\binom{n}{L-1}$). The loop in line 4 is nested in the loop in line 2; hence, the time complexity is $O\left(m^{L-1} \cdot n^{L-1}\right)$.

B ADDITIONAL EXPERIMENTS

As seen in Section 1, our framework differs from [4, 5] in several aspects. Nevertheless, the service model in Sections 3 and 4 is generic. The architecture of [4, 5] can be adapted to our model and roughly viewed as a hybrid of the PBS and SMS architectures. Specifically, all servers are separated into two parts: the first part is used to fulfill the first SLA, as done by the first module of the SMS architecture; the second part uses priority queues to fulfill the SLAs $2, \cdots, L$, as done by the PBS architecture. In particular, when the number of SLAs is two (i.e., $L = 2$), ... worse than the SMS-based system since the PBS architecture achieves a lower utilization. It can be expected that the hybrid architecture has an in-between performance, as shown later.

Fig. 11. Revenue ratio $\hat{\gamma}$ with $L$ SLAs: the red (resp. blue) stars correspond to the case of low (resp. high) delay-tolerance.

We denote by $G^*_{hyb}$ the maximum revenue achieved by our service model under the hybrid architecture. For all $l \in [2, L]$, let $\hat{\lambda}'_l$ denote the total job arrival rate of SLAs $2, \cdots, l$ at a single server; we can derive the actual delay $t_l$ of the $l$-th SLA by (11) and the equation (8) for the PBS architecture, and have

$$
t_l = \frac{\hat{\lambda}'_L}{(1 - \hat{\lambda}'_{l-1}) \cdot (1 - \hat{\lambda}'_l)}, \qquad (25)
$$

where $\hat{\lambda}'_1$ is set to zero trivially. The value of $G^*_{hyb}$ can be computed by a small modification of line 7 of Algorithm 1, where for all $l \in [2, L]$ we use (25) to compute $t_l$. The revenue ratio $\hat{\gamma}$, defined below, is used to show which of the SMS and hybrid architectures is better:

$$
\hat{\gamma} = G^*_{hyb} / G^*_{sms}.
$$

If $\hat{\gamma} \le 1$, the service model under the hybrid architecture will be no better than the SMS-based service system. This is exactly what the numerical results in Fig. 11 show.

The reason for $\hat{\gamma} \le 1$ is twofold. First, the actual delays $t_2, \cdots, t_L$ are all constrained by the total job arrival rate $\hat{\lambda}'_L$, which is also the average load per server in the second part. Second, there is a sequence $i_2, i_3, \cdots, i_L$ for mapping jobs to SLAs, as described in the last subsubsection. In the second part, the most delay-sensitive jobs have a type $\hat{\alpha}_2 = 1/\phi'_{2,i_2}$ and are assigned to the second SLA, which requires a small SLA delay $\phi_2$ to guarantee that the SLA price $p_2$ does not decrease to a negligible value. This further leads to a small $\hat{\lambda}'_L$.

For example, in the low delay-tolerance case with $L = 4$, the first and second parts have 51 and 49 servers respectively. The market segmentation is $(i_2, i_3, i_4)$ = ( , , ) and we correspondingly have $(\phi_{2,i_2}, \phi_{3,i_3}, \phi_{4,i_4})$ = ( . , . , . ). The SLA delays and prices are as follows: $(\phi_1, \phi_2, \phi_3, \phi_4)$ = ( . , . , . , . ) and $(p_1, p_2, p_3, p_4)$ = ( , . , . , . ). In particular, $\phi_2$ has to be small, around 0.5 times $\phi_{2,i_2}$, to guarantee that the price $p_2 = u(\hat{\alpha}_2, \phi_2)$ is not low, as introduced in Section 3.1. By (25), the value of $\phi_2$ further forces $\hat{\lambda}'_L$ to be small, where $\phi_2 = t_2$. In the experiments, we have $(\hat{\lambda}'_2, \hat{\lambda}'_3, \hat{\lambda}'_4)$ = ( . , . , . ). As a result, the second part of the servers achieves relatively low utilization and revenue. Finally, for the $m$ servers, the average load per server is 0.1.

In contrast, the delays of different SLAs in the SMS architecture are independent by (9), which unlocks the power of trading the job's delay-tolerance for a higher utilization. Specifically, the numbers of servers assigned to different SLAs are $(m_1, m_2, m_3, m_4)$ = ( , , , ). The market segmentation is $(i_2, i_3, i_4)$ = ( , , ) and correspondingly $(\phi_{2,i_2}, \phi_{3,i_3}, \phi_{4,i_4})$ = ( . , . , . ). The SLA delays and prices are $(\phi_1, \phi_2, \phi_3, \phi_4)$ = ( . , . , . , . ) and $(p_1, p_2, p_3, p_4)$ = ( , . , . , . ). Although the value of $\phi_2$ is still small, it imposes no constraints on the value of $\lambda_4$, i.e., the average load per server of the fourth SLA. In the experiments, we have $(\lambda_2, \lambda_3, \lambda_4)$ = ( . , . , . ). For the $m$ servers, the average load per server is 0.12, which is larger than the one in the hybrid architecture, and the revenue ratio is $\hat{\gamma} = 0.8753$.
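The coupling expressed by (25) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `pbs_delays` and the arrival rates used below are hypothetical and are not taken from the experiments; the code simply evaluates (25) on cumulative per-server arrival rates, with $\hat{\lambda}'_1 = 0$.

```python
from itertools import accumulate


def pbs_delays(rates):
    """Evaluate Eq. (25) for the priority-queue part of the hybrid architecture.

    rates[k] is the (assumed) per-server arrival rate of SLA k + 2, so the
    list covers SLAs 2..L. Returns the actual delays [t_2, ..., t_L].
    """
    # cum[0] plays the role of lambda'_1 = 0; cum[j] is lambda'_{j+1};
    # cum[-1] is lambda'_L, the total load per server in the second part.
    cum = [0.0] + list(accumulate(rates))
    total = cum[-1]
    if not 0.0 <= total < 1.0:
        raise ValueError("total load per server must lie in [0, 1)")
    return [
        total / ((1.0 - cum[j - 1]) * (1.0 - cum[j]))
        for j in range(1, len(cum))
    ]


# Hypothetical arrival rates for SLAs 2, 3, 4 (illustrative values only).
base = pbs_delays([0.05, 0.10, 0.15])
# Same rates for SLAs 2 and 3, but a heavier load on the last SLA.
heavier = pbs_delays([0.05, 0.10, 0.30])
```

In this sketch, raising only the load of the last SLA also raises `heavier[0]`, the delay of the second SLA, because $\hat{\lambda}'_L$ appears in the numerator of every $t_l$. This is the constraint discussed above: keeping $t_2$ small caps the load of the whole second part.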