Akita: A CPU scheduler for virtualized Clouds
Esmail Asyabi, Azer Bestavros, Renato Mancuso, Richard West, Erfan Sharafzadeh
Boston University · Johns Hopkins University
{easyabi, best, rmancuso, richwest}@bu.edu · [email protected]

Abstract
Clouds inherit the CPU scheduling policies of operating systems. These policies enforce fairness while leveraging best-effort mechanisms to enhance the responsiveness of all schedulable entities, irrespective of their service level objectives (SLOs). This leads to unpredictable performance that forces cloud providers to enforce strict reservation and isolation policies to prevent high-criticality services (e.g., Memcached) from being impacted by low-criticality ones (e.g., logging), which results in low utilization. In this paper, we present Akita, a hypervisor CPU scheduler that delivers predictable performance at high utilization. Akita allows virtual machines (VMs) to be categorized into high- and low-criticality VMs. It provides strong guarantees on the ability of cloud providers to meet the SLOs of high-criticality VMs, by temporarily slowing down low-criticality VMs if necessary. Akita therefore allows the co-existence of high- and low-criticality VMs on the same physical machine, leading to higher utilization. The effectiveness of Akita is demonstrated by a prototype implementation in the Xen hypervisor. We present experimental results that show the many advantages of adopting Akita as the hypervisor CPU scheduler. In particular, we show that high-criticality Memcached VMs are able to deliver predictable performance despite being co-located with low-criticality CPU-bound VMs.
To be able to harness economies of scale, data centers tend to consolidate diverse workloads, ranging from critical services (e.g., in-memory key-value stores) to non-critical ones (e.g., logging, batch or video processing), onto as few physical machines (PMs) as possible, to raise utilization, mitigate operational costs, and/or reduce energy consumption [10] [23] [16] [25] [27]. In virtualized clouds, however, many research studies show that under the policies currently adopted for processor sharing, co-located workloads introduce a level of unpredictability regarding the performance of critical applications [19] [10] [22] [29] [17] [3]. Therefore, to deliver the expected Quality of Service (QoS), cloud providers resort to strict isolation and reservation policies, leading to significant inefficiencies resulting from wasted/unused processor capacity [15] [30] [25] [16].

The CPU scheduling policies currently adopted in clouds (e.g., Xen's Credit schedulers and Linux's CFS scheduler) are inherited from operating systems. The main goal of these policies is to enforce fair allocation of CPU shares, which are determined by virtual CPU (vCPU) budgets [36]. For I/O, however, they rely on best-effort approaches to raise the responsiveness of all IO-bound vCPUs by reducing their CPU Access Latencies (CALs) as much as possible. For example, Xen's Credit scheduler employs a load balancer that exploits all existing cores in order to reduce the CALs of IO-bound vCPUs [4] [3].

Although best-effort policies improve responsiveness, we argue that they impose several challenges on virtualized clouds. Best-effort mechanisms try to raise the responsiveness of all vCPUs/VMs irrespective of their Service Level Objectives (SLOs). This contradicts the cloud pay-as-you-go business model, since these policies might utilize all processor resources (through load balancing) to deliver IO performance that is neither expected nor paid for. More importantly, these policies lead to unpredictability, where the delivered IO performance of a virtual machine (VM) depends on the behavior and existence of other VMs, making it very difficult to predict VMs' delivered QoS in advance. This forces cloud providers to resort to costly middleware in order to monitor the QoS of VMs to prevent SLO violations (e.g., by migrating VMs [10]), or to seek predictability through over-provisioning (e.g., by pinning each vCPU to a distinct CPU core), resulting in lower utilization, more power consumption and higher operational costs [15] [19] [35].

We present Akita, a CPU scheduler for virtualized clouds. Akita offers several advantages to cloud providers. First, it offers predictable IO performance even at high utilization of processor resources. Second, it characterizes VMs/vCPUs with not only a CPU budget but also an IO Quality (IOQ). Using this information, the Akita scheduler offers differentiated service qualities regarding both the execution time and the IO performance of VMs, determined by VMs' budgets and IOQs, respectively. Third, unlike existing schedulers, Akita operates the lowest possible number of CPU cores while delivering the expected QoS. Finally, Akita offers a schedulability test that lets cloud providers know in advance whether a PM can accommodate a VM's IO and CPU requirements, mitigating the need for costly monitoring services in cloud data centers.

To achieve these goals, Akita characterizes VMs/vCPUs with four attributes: a pessimistic budget, an optimistic budget, an IO quality metric and a criticality level.
A VM's optimistic budget indicates its average required CPU time, while the pessimistic budget indicates its worst-case required CPU time. The IO quality indicates the average response time tolerated by the VM, and the criticality level, which is either high or low, indicates whether the VM is running a critical (e.g., user-facing) application or a non-critical (e.g., background or free-service) application.

On each CPU core, the Akita scheduler alternates between two modes: normal and critical. When the mode is normal, Akita allocates all vCPUs the CPU shares determined by their optimistic budgets, with CPU access latencies that are no higher than their IO qualities. Meanwhile, Akita monitors high-criticality vCPUs. If a high-criticality vCPU requires its pessimistic CPU budget, Akita changes its mode to the critical mode. In the critical mode, Akita ignores low-criticality vCPUs to accommodate the required CPU shares of high-criticality vCPUs. When high-criticality vCPUs no longer need their pessimistic budgets, Akita resets its mode to normal. Therefore, Akita (1) increases the utilization of processor resources by consolidating high- and low-criticality vCPUs on the same pCPU, (2) enforces fairness by allocating VMs CPU shares corresponding to their budgets, (3) offers different service qualities for IO by keeping the CALs of vCPUs below their defined IOQs, and (4) offers predictable performance for high-criticality VMs by temporarily slowing down or ignoring low-criticality ones if needed.

Further, Akita distributes vCPUs based on a first-fit approach. When a new vCPU is created, Akita finds the first pCPU that can accommodate the vCPU's requirements in terms of both CPU share and IO quality. Therefore, Akita operates as few CPU cores as possible while delivering the expected performance, as opposed to existing schedulers that utilize all CPU cores. In summary, our contributions are as follows:

• We show that existing CPU schedulers adopted in virtualized clouds lead to unpredictability and consequently low utilization.

• We present Akita, a hypervisor CPU scheduler that raises utilization while offering predictable IO performance.

• We have built a prototype of Akita in the Xen hypervisor. We present experimental results that show the many advantages of Akita as the hypervisor CPU scheduler. In particular, we show that HI-crit Memcached/RPC VMs are able to deliver predictable performance despite being co-located with LO-crit CPU-bound VMs.
In this section, we study the essential properties of a hypervisor CPU scheduler that improves the performance predictability and utilization of virtualized clouds. We then discuss why existing CPU scheduling policies are not able to address the hard problem of predictable IO at high utilization.
Fairness. The scheduler must enforce fairness, where CPU shares are allocated corresponding to VMs' budgets. These budgets are determined by the cloud provider based on SLOs and VM prices, offering different execution times to CPU-bound workloads.
Differentiated IO qualities. Similar to CPU shares, we argue that the scheduler must also offer different IO qualities, ranging from best-effort to guaranteed and predictable IO.
High utilization. Predictability is easily achievable through over-provisioning. Therefore, we argue that predictable IO should be offered at high utilization of processor resources, which in turn cuts operational and power costs.
Schedulability test. The scheduler must let cloud providers know in advance whether a PM can accommodate a VM's requirements (CPU share and IO quality). This mitigates the need for costly monitoring services.
The Credit scheduler of Xen uses a proportional weighting policy to share processor resources among executable entities (vCPUs). It assigns each vCPU a relative weight that determines the amount of CPU time that can be allocated to the vCPU relative to other vCPUs. The scheduler serves vCPUs in round-robin fashion with a time slice of 30ms. Its main goal is to share processor resources fairly, augmented by a best-effort approach (boosting) that increases the performance of IO-bound vCPUs by giving the highest priority to newly awakened vCPUs, which are very likely to be IO-bound. IO performance is further improved by a load balancer that reduces the waiting time of vCPUs by migrating them from the queues of busier CPUs to idle CPUs. Therefore, by adjusting the weight, different CPU shares can be allocated to VMs, enforcing fairness. When it comes to IO performance, although Xen's best-effort mechanisms (boosting and load balancing) improve the performance of IO-bound workloads, they result in the unpredictable IO performance reported by several research works conducted on Xen-based virtualized clouds, making it very difficult to predict VMs' IO performance in advance. This is because the Xen scheduler treats all VMs/vCPUs as if they were equally important, trying to improve all VMs' IO without any discrimination (they all compete for better IO). Therefore, LO-crit vCPUs can adversely impact the IO performance of HI-crit vCPUs if they are co-located. Figure 1a shows the response times of a Memcached VM when it is alone compared to when it runs alongside CPU-bound VMs. In this experiment, all VMs have enough CPU budget. As shown, response times are significantly impacted by neighbor VMs, forcing cloud providers to resort to over-provisioning to deliver predictable IO. This is because neighbor CPU-bound VMs lengthen the CALs of the Memcached vCPU and consequently the response times of the Memcached VM. We define CAL as the latency between the time when an I/O request arrives and the time when its corresponding vCPU is executed on a pCPU.
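For readers unfamiliar with the mechanism, the sketch below renders the credit/boost idea in simplified C. The structure and names are ours, not Xen's actual code; the real scheduler additionally handles SMP load balancing, accounting periods and rate limiting.

/* Simplified sketch of Credit-style accounting (our illustration,
 * not Xen's implementation). A vCPU within its fair share is UNDER,
 * one that has exceeded it is OVER, and a newly awakened vCPU is
 * temporarily BOOSTed so it runs first, shortening its CAL. */
#include <stdbool.h>

enum credit_prio { PRIO_BOOST, PRIO_UNDER, PRIO_OVER };

struct credit_vcpu {
    int weight;            /* relative share set by the administrator */
    int credits;           /* replenished in proportion to weight */
    enum credit_prio prio; /* scheduling class: lower runs earlier */
};

/* Accounting tick: burn the running vCPU's credits and demote it
 * once its fair share is consumed. */
static void burn_credits(struct credit_vcpu *v, int consumed)
{
    v->credits -= consumed;
    if (v->credits < 0)
        v->prio = PRIO_OVER;   /* deprioritized until the next refill */
}

/* Wakeup path: boost an event-driven vCPU that still has credits so
 * it preempts CPU-bound vCPUs immediately. */
static void on_wakeup(struct credit_vcpu *v)
{
    if (v->credits >= 0)
        v->prio = PRIO_BOOST;
}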
Linux CPU scheduler. The Completely Fair Scheduler (CFS), leveraged by KVM, is another dominant CPU scheduler in clouds whose main goal is to share processor resources fairly. It treats vCPUs the same as Linux processes, referring to them as tasks. The scheduler assigns each task an attribute called vRuntime to track the execution time of tasks. A task's vRuntime inflates as the task runs on a CPU. The speed of inflation depends on the task's priority: the higher the priority, the slower the inflation. The scheduler enforces fairness by keeping the vRuntimes of tasks equal. To do so, CFS uses a red-black tree to sort tasks based on their vRuntimes. At each scheduling decision, it chooses the task with the lowest vRuntime (the leftmost task in the tree) to execute next. Since IO-bound tasks tend to consume less CPU time relative to CPU-bound ones, their vRuntimes grow more slowly, and therefore they are typically located at the leftmost part of the red-black tree, being the first to be executed, which delivers good performance for IO-bound workloads. This scheduler does not offer different service qualities for IO-bound tasks, given its best-effort approach to enhancing IO performance; however, one can assign different priorities to CPU-bound workloads. It also does not offer any schedulability test regarding IO-bound workloads. Similar to Xen's CPU scheduler, it does not support the notion of criticality. Therefore, low-criticality tasks impact the QoS of HI-crit tasks. As shown in Figure 1b, under KVM, the QoS of a Memcached VM, as a HI-crit VM, is notably impacted by neighbor CPU-bound VMs.
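As a concrete illustration, here is a minimal sketch of vRuntime accounting in C; the structures and the linear scan are our simplification (the kernel keeps tasks in a red-black tree and derives weights from per-priority tables).

/* Minimal sketch of CFS-style virtual runtime accounting (our
 * simplification, not kernel code). A task's vruntime grows as it
 * runs, scaled inversely by its weight; the scheduler always picks
 * the task with the smallest vruntime. */
#include <stddef.h>
#include <stdint.h>

#define NICE_0_WEIGHT 1024

struct task {
    uint64_t vruntime; /* weighted execution time so far */
    uint32_t weight;   /* derived from the task's priority */
};

/* Charge delta_ns of real execution time: higher weight (higher
 * priority) inflates vruntime more slowly. */
static void update_vruntime(struct task *t, uint64_t delta_ns)
{
    t->vruntime += delta_ns * NICE_0_WEIGHT / t->weight;
}

/* Pick the task with the lowest vruntime (the leftmost node of the
 * red-black tree in the real implementation). */
static struct task *pick_next(struct task *tasks, size_t n)
{
    struct task *next = NULL;
    for (size_t i = 0; i < n; i++)
        if (next == NULL || tasks[i].vruntime < next->vruntime)
            next = &tasks[i];
    return next;
}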
Fixed-priority schedulers. Simplistic fixed-priority schedulers are not able to offer all the requirements outlined in Section 2.1: fairness, different IO qualities, high utilization and a schedulability test. A promising fixed-priority scheduler is the Rate Monotonic (RM) scheduler, which is used in the RT-Xen scheduler [32]. If RM is used for virtualized clouds, it characterizes vCPUs using a budget (C) and a period (T). Budget C indicates the CPU share that must be allocated to a vCPU during each period T; vCPUs with lower periods get higher priorities. By adjusting C and T, one can assign different quotas of CPU shares to different vCPUs/VMs, offering different execution times to CPU-bound VMs and enforcing fairness. For example, if for vCPU v, C = 25 and T = 100, the vCPU will get 25% of the CPU time. Further, by adjusting T, the cloud provider can adjust the CPU access latencies of vCPUs, because over each T the vCPU certainly gets access to a CPU, offering different service qualities to IO-bound workloads, determined by the periods (T) of their corresponding VMs. Finally, RM features a schedulability test that determines whether a vCPU set is schedulable: $\sum_{i=1}^{n} C_i/T_i \le n\,(2^{1/n} - 1)$, where n is the number of vCPUs. RM treats all vCPUs as equally important, implying that a vCPU will possibly face lengthened CALs if another vCPU fails to be bounded by its own budget. Therefore, in order to deliver predictable IO performance, system designers have to reserve exaggeratedly large amounts of CPU time (budgets) for all vCPUs, including non-critical vCPUs, budgets that highly exceed the actual worst-case execution times, to ensure every vCPU performs correctly even under harsh circumstances, ultimately leading to low utilization [31] [7] [8].

Earliest Deadline First (EDF) is a dynamic-priority CPU scheduling policy that assigns the highest priority to the task with the earliest deadline. Similar to RM, if EDF is used as the hypervisor CPU scheduler, it assigns each vCPU an execution time (C) and a period (T) that can be leveraged to offer different QoS to both IO- and CPU-bound workloads. EDF also features a schedulability test: $\sum_{i=1}^{n} C_i/T_i \le 1$, where n is the number of vCPUs. However, similar to RM, EDF does not consider vCPU criticalities. Therefore, cloud administrators have to reserve the budgets required for peak loads for all vCPUs to avoid the impact of vCPUs on HI-crit vCPUs, a severe waste of processor resources that classic scheduling policies such as EDF and RM are not able to mitigate [31] [8].

Using RM or EDF for vCPU scheduling with strict budget enforcement, a misbehaving VM cannot impact other VMs. However, if a VM does not have enough budget to handle a temporary overload, it ends up lagging behind in all future activations.
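To make the two utilization bounds above concrete, here is a small C sketch of both tests; the struct and function names are ours.

/* Utilization-based schedulability tests for vCPUs characterized by
 * a budget C and a period T (our illustration). */
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

struct vcpu_params { double C, T; };

static double total_utilization(const struct vcpu_params *v, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += v[i].C / v[i].T;
    return u;
}

/* RM (Liu-Layland) bound: sum(C_i/T_i) <= n * (2^(1/n) - 1). */
static bool rm_schedulable(const struct vcpu_params *v, size_t n)
{
    return total_utilization(v, n) <= n * (pow(2.0, 1.0 / (double)n) - 1.0);
}

/* EDF bound for implicit deadlines: sum(C_i/T_i) <= 1. */
static bool edf_schedulable(const struct vcpu_params *v, size_t n)
{
    return total_utilization(v, n) <= 1.0;
}

For the example above (C = 25, T = 100), a set of four such vCPUs has total utilization 1.0: the EDF test admits it, while the RM bound (about 0.757 for n = 4) rejects it.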
[Figure 1: RTTs of Memcached requests. (a) Xen, (b) KVM.]

[Figure 2: Akita in a virtualized cloud.]

Solving this problem by employing classic periodic budgeting requires always assigning enough budget to handle overloads, which is overly pessimistic and leads to resource under-utilization, the main motivation behind designing Akita.

In Akita, we categorize VMs into high-criticality and low-criticality VMs. High-criticality VMs, and consequently their vCPUs, are characterized by three parameters: an optimistic budget (C_opt), a pessimistic budget (C_pes), and a period (T). Low-criticality VMs/vCPUs, on the other hand, are characterized by an optimistic budget and a period. The optimistic budget of a VM indicates the average CPU time the VM needs, which is less than the pessimistic budget, i.e., the highest CPU time a high-criticality VM would require under harsh circumstances (peak load).

Having characterized our schedulable entities (vCPUs), Akita's scheduling policy is as follows: for each vCPU v with period T, Akita allocates v its optimistic budget over T. Meanwhile, Akita monitors high-criticality vCPUs. If v is a high-criticality vCPU that has consumed its optimistic budget and requires its pessimistic budget, Akita switches to the mode that discards all low-criticality vCPUs and schedules only high-criticality ones, allocating them their pessimistic budgets. When high-criticality vCPUs no longer need their pessimistic budgets, Akita resets to the mode where all vCPUs are scheduled and allocated their optimistic budgets regardless of their criticalities.

We will discuss the intuition behind Akita's scheduling policy and its schedulability test in Section 2.3. Akita addresses the requirements outlined in Section 2.1 as follows:
Fairness. The fraction of CPU time allocated to each high-criticality vCPU lies in [C_opt/T, C_pes/T], depending on the demand of the high-criticality vCPU. The fraction of CPU time allocated to each low-criticality vCPU lies in [0, C_opt/T], depending on the behavior and existence of high-criticality vCPUs. Note that the CPU shares of high-criticality vCPUs are not impacted by low-criticality vCPUs, while the CPU shares of low-criticality vCPUs are impacted by high-criticality vCPUs. By adjusting C, T and the criticality level, cloud providers offer different CPU shares to different VMs.

[Figure 3: The Architecture of Akita.]
Different IO qualities. In Akita, each high-criticality vCPU is guaranteed to get access to a CPU over its period, and low-criticality vCPUs typically get access to a CPU over their periods, depending on the behavior of high-criticality vCPUs. Therefore, by adjusting the period and the criticality level, cloud providers offer different IO qualities, ranging from predictable to best-effort performance.
High utilization. In Akita, both high- and low-criticality vCPUs are consolidated on the same CPU, and the performance of high-criticality workloads is not impacted by low-criticality vCPUs. This translates to higher utilization while delivering predictable performance for high-criticality vCPUs, as opposed to existing schedulers that force cloud providers to isolate high-criticality VMs through over-provisioning in order to achieve predictability.
Figure 2 depicts where Akita stands in a virtualized cloud. Cloud providers translate VMs' SLOs into Akita's language, namely the pessimistic and optimistic budgets, the criticality level and the period.

Figure 3 illustrates the architecture of Akita. Akita offers an API that lets cloud providers specify VMs using four parameters: optimistic budget, pessimistic budget, period and criticality level. vCPUs inherit these parameters from their corresponding VMs. When a VM is created, the CPU picker assigns each vCPU to a CPU core that can accommodate the vCPU's requirements according to Akita's scheduling policy. The design of the CPU picker is described in Section 3.6. Akita vCPUs periodically ask for CPU shares; in Section 3.2, we explain how we inject periodic behavior into our vCPUs. Akita's CPUs switch between two modes: normal and high-criticality. In normal mode, Akita schedules all vCPUs irrespective of their criticalities; in high-criticality mode, it schedules only high-criticality vCPUs. In Section 3.3, we describe our policy and its corresponding mechanism for CPU mode switching. Finally, we describe our scheduling algorithm and its schedulability test in Section 3.4.
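The four per-VM parameters can be pictured as a small descriptor that every vCPU inherits. The following C struct is our rendering of that interface, not the prototype's actual API:

/* Illustrative per-VM descriptor for Akita's four parameters (names
 * are our assumptions). Each vCPU inherits its VM's descriptor. */
#include <stdint.h>

enum criticality { LO_CRIT, HI_CRIT };

struct akita_params {
    uint64_t c_opt_ns;      /* optimistic budget per period */
    uint64_t c_pes_ns;      /* pessimistic budget (HI-crit VMs only) */
    uint64_t period_ns;     /* period T, doubling as the IO quality:
                               the bound on CPU access latency */
    enum criticality level; /* criticality level of the VM */
};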
Unlike sporadic/periodic tasks, vCPUs do not have a periodic behavior. Their activations are bursty, without a clear minimum inter-arrival time. To inject periodic behavior, we transform our vCPUs into periodic servers that are given a budget and a period. These budgets are strictly enforced by the scheduler; therefore, the problem of overrun cannot occur. vCPUs, at the beginning of each period, ask for a CPU share that should be received before the beginning of the next period. Therefore, Akita imitates implicit-deadline sporadic task systems, in which tasks' deadlines are equal to their periods.

To enable periodic behavior, Akita's vCPUs feature an internal timer that fires at the beginning of each period. When a vCPU's timer ticks, the vCPU's budget is set to the optimistic budget if the vCPU is a LO-crit vCPU; otherwise, the vCPU is a HI-crit vCPU whose budget is set to its pessimistic budget.

As a vCPU runs on a pCPU, its budget is decreased proportionally. If a vCPU's budget runs out, the vCPU is deactivated, meaning that the scheduler ignores the vCPU if there exist other vCPUs with positive budgets. At the next tick of the vCPU's timer, the vCPU is activated again and its budget is set to its optimistic or pessimistic budget, depending on its criticality level.

Akita's scheduler is work-conserving. When there is no active vCPU, it schedules inactive vCPUs, meaning that Akita will not remain idle while there exist runnable vCPUs.
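A sketch of this replenishment and accounting logic, under our naming assumptions, is shown below; at each period boundary the budget is refilled according to the criticality level and the implicit deadline advances by one period.

/* Sketch of per-vCPU budget replenishment and accounting (names and
 * structure are our assumptions, not the prototype's code). */
#include <stdbool.h>
#include <stdint.h>

struct akita_vcpu {
    uint64_t c_opt_ns, c_pes_ns; /* optimistic/pessimistic budgets */
    uint64_t period_ns;          /* period T (= implicit deadline) */
    bool     hi_crit;            /* criticality level */
    int64_t  budget_ns;          /* budget left in the current period */
    uint64_t deadline_ns;        /* absolute deadline */
    bool     active;             /* has budget left */
};

/* Called when the vCPU's internal period timer fires. */
static void on_period_tick(struct akita_vcpu *v, uint64_t now_ns)
{
    v->budget_ns   = v->hi_crit ? (int64_t)v->c_pes_ns
                                : (int64_t)v->c_opt_ns;
    v->deadline_ns = now_ns + v->period_ns; /* deadline = next period */
    v->active      = true;                  /* eligible again */
}

/* Called when the vCPU is descheduled: charge its CPU consumption.
 * Strict enforcement means an exhausted vCPU is ignored while other
 * vCPUs still have budget; the scheduler stays work-conserving by
 * falling back to inactive vCPUs when no active one exists. */
static void burn_budget(struct akita_vcpu *v, uint64_t ran_ns)
{
    v->budget_ns -= (int64_t)ran_ns;
    if (v->budget_ns <= 0)
        v->active = false;
}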
Each pCPU starts in the LO-crit mode. As a consequence, the scheduler at first treats all vCPUs as equally important and allocates each vCPU its desired optimistic CPU budget. Meanwhile, using our accounting mechanism, we monitor the CPU consumption of the currently running vCPU. If the currently running vCPU is a HI-crit vCPU that has executed for its optimistic budget and is still runnable (there are running and ready processes inside its corresponding VM), this indicates that the HI-crit vCPU requires its pessimistic CPU share. If so, a mode switch is immediately triggered and the scheduler switches to the HI-crit mode. Henceforth, the pCPU behaves as if there were only HI-crit vCPUs: it schedules only them, allocating them their pessimistic budgets, and LO-crit vCPUs do not receive any further execution.

In the HI-crit mode, an early switch back to the LO-crit mode may impact the performance of HI-crit vCPUs. Therefore, before switching back to the LO-crit mode, we need to make sure that HI-crit vCPUs no longer need their pessimistic budgets. To this end, we assign each HI-crit vCPU a temperature. When a HI-crit vCPU needs its pessimistic budget (it has received its optimistic budget and does not signal completion), its temperature is set to a positive number. On the other hand, if a HI-crit vCPU whose temperature is greater than zero has not asked for its pessimistic budget in the last period, we decrease the vCPU's temperature by one degree, cooling down the vCPU. The temperature of a HI-crit vCPU, in fact, indicates how recently the vCPU has asked for its HI-crit CPU share. The hotter a vCPU, the more recently it caused a mode switch to the HI-crit mode.

Knowing the vCPU temperatures, our approach for switching back to the LO-crit mode is straightforward. If there exists no vCPU with a positive temperature, it indicates that no vCPU has recently requested its pessimistic CPU share, and therefore the pCPU's mode can be switched back to the LO-crit mode. The initial value of the temperature determines how fast a pCPU gets reset to the LO-crit mode: the higher the temperature, the slower the switch back. Therefore, in Akita, switching to the HI-crit mode is instant while switching back to the LO-crit mode is conservative, prioritizing the predictability and performance of HI-crit vCPUs and VMs.

The main objective of the scheduling unit is to choose the next vCPU from a pCPU's run queue to execute for a predefined amount of time known as the time slice. In fact, the scheduling function determines the order of scheduling of the vCPUs located on the run queue. In Akita, the scheduling function is invoked for the following reasons: (1) when a vCPU's timer ticks, the vCPU becomes activated; since it is likely that the newly activated vCPU is more important/urgent than the currently running vCPU, the scheduler is invoked; (2) when the currently running vCPU relinquishes the pCPU voluntarily for a reason (e.g., the vCPU is idle), the scheduler is invoked to choose the next runnable vCPU; (3) when the currently running vCPU relinquishes the pCPU forcibly because it has used up its budget; and (4) finally, when a HI-crit vCPU has received its optimistic budget, the scheduler is invoked to check whether the vCPU is still runnable; if so, a mode switch must be triggered.
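The temperature-based mode machine can be sketched as follows; the names and the initial temperature constant are our assumptions (the paper does not fix a value):

/* Sketch of the per-pCPU criticality mode machine (our naming; the
 * initial temperature is a tunable, not a value from the paper). */
#include <stdbool.h>
#include <stddef.h>

enum pcpu_mode { MODE_LO_CRIT, MODE_HI_CRIT };

#define INITIAL_TEMPERATURE 3 /* higher = slower reset to LO-crit */

/* A HI-crit vCPU consumed its optimistic budget and is still
 * runnable: heat it up and switch the pCPU up immediately. */
static void request_pessimistic(int *temperature, size_t idx,
                                enum pcpu_mode *mode)
{
    temperature[idx] = INITIAL_TEMPERATURE;
    *mode = MODE_HI_CRIT;             /* switching up is instant */
}

/* Once per period, over the HI-crit vCPUs of a pCPU: cool vCPUs that
 * did not ask for their pessimistic budget; return to LO-crit mode
 * only when every HI-crit vCPU has fully cooled down. */
static void maybe_cool_down(int *temperature, const bool *asked, size_t n,
                            enum pcpu_mode *mode)
{
    bool any_hot = false;
    for (size_t i = 0; i < n; i++) {
        if (!asked[i] && temperature[i] > 0)
            temperature[i]--;         /* cool by one degree */
        if (temperature[i] > 0)
            any_hot = true;
    }
    if (!any_hot)
        *mode = MODE_LO_CRIT;         /* conservative switch down */
}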
At each scheduling decision, Akita takes several steps. First, it determines the pCPU's mode. Second, it updates the budget of the currently running vCPU based on its CPU consumption. Third, it updates the state (inactive when the vCPU's budget is exhausted; otherwise active) and the temperature of the currently running vCPU. Finally, it inserts the currently running vCPU into the run queue and chooses the next eligible vCPU from the run queue. If the mode is HI-crit, Akita chooses the active HI-crit vCPU with the earliest deadline; if the mode is LO-crit, it chooses the vCPU with the earliest deadline regardless of its criticality. The time slice is set to the budget of the chosen vCPU. vCPUs' deadlines are set according to their periods when their timers tick. Therefore, in each mode, Akita imitates EDF scheduling of implicit-deadline sporadic task systems. vCPUs are sorted based on their deadlines, and active vCPUs are always located before inactive ones. The time complexity of vCPU insertion is O(n). At each scheduling decision, the scheduler picks the vCPU at the head of the run queue to execute next, with time complexity O(1). Therefore, similar to Xen's Credit schedulers, Akita is an O(n) scheduler.
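The selection rule can be made explicit with a short sketch (our illustration; since the run queue is kept deadline-sorted with active vCPUs first, the real scheduler simply picks the head in O(1) rather than scanning):

/* One Akita scheduling decision, spelled out as a linear scan (our
 * illustration of the rule, not the O(1) head-of-queue code). */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rq_vcpu {
    uint64_t deadline_ns; /* absolute deadline (end of period) */
    bool     active;      /* budget not yet exhausted */
    bool     hi_crit;     /* criticality level */
};

static struct rq_vcpu *pick_next_vcpu(struct rq_vcpu *rq, size_t n,
                                      bool hi_crit_mode)
{
    struct rq_vcpu *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!rq[i].active)
            continue;                     /* budget exhausted */
        if (hi_crit_mode && !rq[i].hi_crit)
            continue;                     /* LO-crit ignored in HI mode */
        if (best == NULL || rq[i].deadline_ns < best->deadline_ns)
            best = &rq[i];                /* earliest deadline first */
    }
    return best; /* NULL: fall back to inactive vCPUs (work conserving) */
}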
The low utilization caused by traditional real-time policies such as EDF and RM scheduling led researchers to design a novel scheduling policy, namely mixed-criticality scheduling (MCS), which is the main intuition behind Akita's design. MCS discriminates between important and non-important tasks by assigning a new dimension to tasks, referred to as criticality. An MC scheduler's goal is to allocate each task its optimistic budget over each period until a HI-crit task asks for its pessimistic budget; if so, the scheduler moves to the HI-crit mode, where all LO-crit tasks are immediately discarded. An MC scheduler must not only favor urgent jobs (e.g., jobs with lower periods in RM) but also prioritize high-criticality jobs to prepare them for potentially long executions. The compromise between urgency and importance results in exponentially many choices, making mixed-criticality scheduling NP-hard in the strong sense [6] [11]. Therefore, several approximation algorithms have been proposed for scheduling MC tasks, including but not limited to EDF-VD [8] and OCBP [21].

EDF-VD leverages EDF to schedule tasks. However, it shrinks the deadlines of HI-crit tasks by a certain factor so that HI-crit tasks are promoted by the EDF scheduler. In fact, it reduces the actual deadlines (D) of HI-crit jobs to modified deadlines, called virtual deadlines (VD), which are lower than the actual deadlines. EDF-VD calculates virtual deadlines using Equations (1)-(5), wherein C_opt and C_pes indicate the optimistic and pessimistic budgets, respectively, U_1(1) is the utilization of the LO-crit tasks, U_2(1) is the utilization of the HI-crit tasks considering their optimistic budgets, U_2(2) is the utilization of the HI-crit tasks considering their pessimistic budgets, and VD indicates the virtual deadline calculated by shrinking the actual deadline by a factor of x. Condition (6) is EDF-VD's schedulability test: if it holds for a task set, the task set is MC-schedulable.

$$U_1(1) = \sum_{\tau_i : X_i = 1} \frac{C_{opt}(i)}{T_i} \quad (1)$$
$$U_2(1) = \sum_{\tau_i : X_i = 2} \frac{C_{opt}(i)}{T_i} \quad (2)$$
$$U_2(2) = \sum_{\tau_i : X_i = 2} \frac{C_{pes}(i)}{T_i} \quad (3)$$
$$x = \frac{U_2(1)}{1 - U_1(1)} \quad (4)$$
$$VD_i = \begin{cases} now + x \cdot T_i & \text{if } X_i = 2 \\ now + T_i & \text{if } X_i = 1 \end{cases} \quad (5)$$
$$x \cdot U_1(1) + U_2(2) \le 1 \quad (6)$$

The key property of real-time scheduling strategies is that they guarantee that the deadlines of all tasks are met. This requirement is too strong for virtualized clouds, where infrequent violations of temporal guarantees would not lead to catastrophic consequences. Although the current version of Akita uses EDF-VD to determine the deadlines of our vCPUs as well as our schedulability test, a cloud provider can adopt a less conservative schedulability test (e.g., the EDF schedulability test). Unlike MC systems, Akita does not discard LO-crit vCPUs forever when a mode switch happens; our pCPUs alternate between the LO-crit and HI-crit modes.

When a VM is initially consolidated on a PM, Akita invokes the pCPU picker function to find appropriate pCPUs that can accommodate the vCPUs of the new VM. The pCPU picker uses our schedulability test to determine whether a pCPU can accommodate a vCPU, and assigns each vCPU to the first CPU core that can accommodate it. If the pCPU picker fails to find an appropriate pCPU for a vCPU, Akita simply notifies the cloud provider, implying that hosting this VM would lead to SLO violations. Akita, therefore, offers cloud providers a schedulability test that enables wiser VM placement, thus mitigating the need for costly monitoring services. More importantly, Akita's first-fit mechanism for vCPU assignment keeps the number of operating cores as low as possible. The remaining idle cores leverage the C-state mechanism to turn off their internal components to save power. Akita's counterparts, on the other hand, operate all existing CPU cores in a blind effort to increase IO performance. Therefore, adopting Akita leads to lower power consumption and thus lower operational costs for virtualized clouds.
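Condition (6) and the first-fit placement can be combined into a compact admission routine; the sketch below is our illustration of that logic, with names and the per-pCPU capacity as assumptions.

/* EDF-VD admission test (condition (6)) plus first-fit pCPU picking
 * (our illustration; MAX_VCPUS is an arbitrary bound). */
#include <stdbool.h>
#include <stddef.h>

#define MAX_VCPUS 32

struct mc_vcpu {
    double c_opt, c_pes, period; /* milliseconds, as in Table 1 */
    bool   hi_crit;
};

struct pcpu {
    struct mc_vcpu vcpus[MAX_VCPUS];
    size_t n;
};

/* Check x * U1(1) + U2(2) <= 1 with x = U2(1) / (1 - U1(1)). */
static bool edf_vd_schedulable(const struct mc_vcpu *v, size_t n)
{
    double u1_lo = 0.0, u2_lo = 0.0, u2_hi = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (v[i].hi_crit) {
            u2_lo += v[i].c_opt / v[i].period; /* U2(1) */
            u2_hi += v[i].c_pes / v[i].period; /* U2(2) */
        } else {
            u1_lo += v[i].c_opt / v[i].period; /* U1(1) */
        }
    }
    if (u1_lo >= 1.0)
        return false;
    double x = u2_lo / (1.0 - u1_lo);
    return x * u1_lo + u2_hi <= 1.0;
}

/* First fit: place the vCPU on the first pCPU that remains
 * schedulable with it added; -1 signals a would-be SLO violation. */
static int pick_pcpu(struct pcpu *pcpus, size_t n_pcpus,
                     const struct mc_vcpu *v)
{
    for (size_t i = 0; i < n_pcpus; i++) {
        struct pcpu *p = &pcpus[i];
        if (p->n == MAX_VCPUS)
            continue;
        p->vcpus[p->n] = *v;               /* tentatively add */
        if (edf_vd_schedulable(p->vcpus, p->n + 1)) {
            p->n++;                        /* commit the placement */
            return (int)i;
        }
    }
    return -1; /* no pCPU can accommodate the vCPU */
}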
We evaluate Akita's performance by answering the following questions quantitatively: (1) Is Akita able to offer different service qualities to both IO- and CPU-bound workloads? (2) Does Akita deliver predictable performance for HI-crit VMs co-located with LO-crit VMs, and when multiple HI-crit VMs co-execute? (3) How does Akita's first-fit vCPU placement compare with the load balancers of Xen and KVM in terms of tail latency and idle cores? (4) Does Akita keep the CPU shares of HI-crit VMs intact?

Akita offers different levels of QoS to both IO- and CPU-bound workloads by enforcing different CPU shares, adjusted by periods and budgets, and different CALs, adjusted by periods. In this experiment, the physical server machine hosts a mix of CPU-bound VMs (1x Type2 VM and 1x Type3 VM) and IO-bound VMs (1x Type4 VM and 1x Type5 VM). The CPU-bound VMs run Sysbench, and the IO-bound VMs run an RPC benchmark. Each IO-bound VM is stressed with multiple concurrent client threads hosted on the client physical machine, each generating 500 calls/second forming a Poisson process. Figure 4a shows the execution times of the Sysbench benchmarks, and Figure 4b shows the average latency of the RPC requests sent to each IO-bound VM. Sysbench benchmarks hosted in VMs with a higher budget-to-period ratio have a lower execution time, and RPC workloads hosted in VMs with lower periods (CALs) are more responsive, suggesting that Akita is able to offer different levels of QoS to both IO- and CPU-bound workloads.

Table 1: The configurations of VMs

VM Type   C_opt   C_pes   Period   CL
Type1     5ms     25ms    100ms    High
Type2     25ms    -       100ms    Low
Type3     50ms    -       100ms    Low
Type4     5ms     -       50ms     Low
Type5     10ms    -       100ms    Low
Type6     10ms    50ms    100ms    High

[Figure 4: Differentiated service qualities offered by Akita. (a) Execution time, (b) Average of RTTs.]
To assess the delivered QoS of HI-crit IO-bound VMs under Akita, we use a Memcached server VM (Type1). This VM is stressed for two minutes by 50 concurrent connections, each sending 700 requests per second with exponential inter-arrivals. Figure 5a shows the RTTs of the Memcached server VM when it runs alone on a pCPU under Akita, Credit, and CFS (for the Credit and CFS experiments, we use a VM with the default configuration). As expected, RTTs are predictable and rarely higher than 5ms under all schedulers. This is because the Memcached vCPU runs alone on a pCPU, meaning that its CALs remain intact, which results in predictable and fast response times. In this experiment, the Memcached VM utilizes around 25% of the CPU time to accommodate the incoming key-value workload.

Next, we host three CPU-bound VMs (3x Type2 VMs) running Lookbusy, a synthetic CPU-intensive workload, alongside the Memcached VM.
[Figure 5: Performance of a Memcached VM. (a) RTTs (Alone), (b) RTTs (Not Alone), (c) Tail (Alone), (d) Tail (Not Alone).]

[Figure 6: Performance of Akita when multiple HI-crit VMs co-execute. (a)-(d) RTTs.]
Each of these VMs utilizes around 25% of the pCPU time during the experiment, 75% combined. For the Akita experiment, we configure the Memcached VM as a HI-crit VM and the CPU-bound VMs as LO-crit VMs. For the CFS and Credit experiments, we use the default configuration for all VMs. Given the configuration of these VMs (see Table 1), Akita packs all of them onto the same CPU core based on its first-fit assignment mechanism. To allow for a fair comparison, we pin all VMs to the same CPU core under Credit and CFS. We then stress the Memcached server VM as before, and record the RTTs of Memcached requests under all schedulers.

Figure 5b shows the RTTs of the Memcached server VM. Under Akita, the Memcached server VM still delivers predictable response times even though it is co-located with three other LO-crit VMs, suggesting that under Akita the expected QoS of HI-crit latency-bound VMs is not influenced by co-located LO-crit VMs, whereas the RTTs of Memcached requests deteriorate under both Xen and KVM. Figures 5c and 5d show the tail latency percentiles of the Memcached server VM when it is alone on a pCPU compared to when it runs alongside CPU-bound VMs. Under Akita, the tail latency at the 99.9th percentile remains intact, as if the Memcached server VM ran alone on the pCPU, while the tail latency is increased by 96% and 93% under Xen and KVM, respectively. This is because Akita keeps the CALs of the Memcached vCPU, as a HI-crit vCPU, under the expected value (its period): it switches the pCPU's mode to the HI-crit mode, where LO-crit vCPUs are ignored, to make sure the HI-crit vCPU gets its desired CPU share as well as access to the pCPU before the deadline specified by its period. Under the best-effort scheduling policies of Xen and KVM, on the other hand, co-located CPU-bound vCPUs impose lengthened CALs, resulting in variable performance of the Memcached VM.

We next examine the effectiveness of Akita when multiple HI-crit IO-bound VMs run on a pCPU. At first, a Memcached server VM (Type2), tagged as a LO-crit VM, is stressed by four concurrent connections, each sending 600 requests per second. We then host another Memcached server VM (Type1), tagged as a HI-crit VM, and stress both of the Memcached VMs using 8 concurrent threads. Figure 6b shows the RTTs of the Memcached VMs. The RTTs of the LO-crit VM slightly fluctuate, while the HI-crit Memcached VM delivers a predictable (expected) performance. We increase the number of HI-crit Memcached VMs from 1 up to 3 (3x Type1 VMs) and repeat the experiment. As the number of HI-crit VMs grows, the RTTs of the LO-crit VMs swing more noticeably. However, in our experiment, no VM misses deadlines, because the RTTs of IO-bound VMs mainly depend on the CALs of their corresponding vCPUs. When all vCPUs are latency-sensitive, they remain on a pCPU for a very short time (on the order of microseconds) and then voluntarily relinquish the pCPU. They therefore cannot cause long CALs for one another, which is why no VM misses deadlines in this experiment, as depicted in Figure 8.

[Figure 7: CPU utilization of VMs.]

Linux's CFS and Xen's Credit schedulers both leverage a load-balancing mechanism to exploit multicore machines to raise performance.
Under these mechanisms, idle cores steal executable entities from the ready queues of busier cores for immediate execution, raising the responsiveness of IO-bound vCPUs by mitigating their CALs, and reducing the execution times of CPU-bound vCPUs by utilizing more cores. In contrast, Akita assigns vCPUs to CPU cores using a first-fit mechanism based on Akita's schedulability test; it does not migrate HI-crit vCPUs, and only occasionally migrates LO-crit ones when forced by the RQS mechanism. To compare the best-effort load-balancing policies of Xen and KVM to Akita's approach, we host an RPC server in a VM (Type1) running alongside 5 other CPU-bound VMs (5x Type2 VMs) that consume 25% of the CPU time each, and record the average response times of requests sent from the client physical machine, each client generating 700 calls/second. Note that the RPC VM is a HI-crit VM under Akita. We then increase the number of CPU-bound VMs from 5 up to 24 to utilize more CPU cores. In Figure 8, we see that at low loads, Akita, Xen, and KVM deliver the same performance, because the load balancers of Xen and KVM steal vCPUs from relatively busier cores to idle cores, shortening the CALs of the RPC vCPU and hence mitigating RPC RTTs. This performance, however, diminishes as the number of VMs increases. When all cores are utilized (when there are no idle cores), Xen and KVM are not able to hide the negative impacts of co-located VMs on the RPC VM. When 24 VMs run on the physical machine, Xen and KVM lengthen the tail latency by 10x and 5x, respectively. Akita, on the other hand, delivers predictable performance by keeping the CALs of the RPC vCPU under the expected value regardless of the number of VMs hosted on the physical machine.

We report the number of idle CPU cores under Akita, Xen, and Linux during this experiment in Figure 9. As shown, Akita's first-fit mechanism for assigning vCPUs to cores keeps the number of idle cores and their idleness periods as high as possible. When a CPU core does not find any runnable task/vCPU to execute, the operating system triggers a mechanism known as the idle-state mechanism to turn off processor components in order to reduce power consumption [26] [18]. The increased number of idle cores under Akita results in more frequent triggering of CPU idle states, causing more power savings while still delivering the expected performance. In contrast, Akita's counterparts, using their load balancers, spread vCPUs as much as possible to raise performance, no matter whether this level of performance is expected or not, lowering the number of idle cores and the periods of their idleness, and increasing the energy consumption of CPUs and thus the operational costs of cloud data centers [14] [28].

In this experiment, we aim to study whether Akita keeps the CPU share of HI-crit VMs intact. The physical server machine hosts VM2, VM3, and VM4 (Type2), which run Lookbusy, each utilizing 25% of a pCPU. We also host VM1 (Type6) as a HI-crit VM. We vary the CPU utilization of the HI-crit VM to mimic an unpredictable CPU-bound workload. Figure 7 demonstrates how Akita multiplexes the CPU time among these VMs. At first, the LO-crit VMs utilize 75% of the CPU time, 25% each. When the load offered to the HI-crit VM gets intense, the HI-crit VM gets its desired CPU share while the remaining CPU time is equally allocated to the LO-crit VMs, suggesting that the desired CPU share of the HI-crit VM is not impacted by LO-crit VMs, which may come at the expense of unpredictable utilization for LO-crit VMs.
As shown in Figure 7, when the load offered to the HI-crit VM utilizes 50% of the CPU time, each LO-crit VM can only utilize 13% of the CPU time, despite the fact that each of them needs to utilize 25% of the CPU time.
CPU scheduling in clouds is a matter of great concern. Many research works have reported unpredictable and variable performance of VMs in clouds [33] [34] [13] [24] [20] [12] [2], which stems from traditional CPU scheduling policies whose main goal is to raise performance or fairness using best-effort approaches. Xen's round-robin policy [1] [5], for example, results in variable CPU access latencies for vCPUs that are responsible for handling IO, hampering responsiveness.
[Figure 8: Tail latency of RPC round trip times. (a) Akita, (b) Xen, (c) Linux.]

[Figure 9: Number of idle cores under different schedulers. (a) Akita, (b) Xen, (c) KVM.]

vSlicer [34] has tackled this challenge by executing latency-sensitive vCPUs more frequently, mitigating CPU access latencies (CALs) and thus achieving higher responsiveness. vTurbo [33] is another effort that dedicates specific cores to IRQ processing and modifies the guest VMs' kernels to schedule IO-bound threads on vCPUs that run on the dedicated cores. TerrierTail [3] reduces the tail latency of VMs running network-intensive workloads by recognizing vCPUs that are receiving network packets and scheduling them more frequently, reducing their CPU access latencies and therefore raising responsiveness.

Although these approaches enhance responsiveness, the delivered QoS is still not predictable, because they try to raise the performance of all VMs regardless of the VMs' SLOs. Consequently, some approaches suggest avoiding the co-existence of latency- and CPU-intensive VMs on the same physical machine (isolation) to mitigate interference, enhancing predictability at the expense of low utilization [35] [19]. RT-Xen [32] is another approach that aims at delivering real-time performance in virtualized environments by adopting real-time scheduling in guest OSes and scheduling vCPUs as virtual servers for tasks hosted in the guest OSes. Akita, however, is a cloud CPU scheduler with the aim of delivering predictable IO at high utilization. Although Akita does not require any assumption on guest OS schedulers, by adopting real-time guest OS schedulers, Akita can be used to deliver soft real-time performance in virtualized clouds. Further, Akita offers different levels of QoS by enforcing different CALs and CPU shares for both CPU- and IO-bound workloads. Most importantly, Akita augments VMs/vCPUs with a criticality level and discriminates critical VMs from non-critical ones, allowing the coexistence of HI-crit and LO-crit VMs on the same machine and thus raising utilization.
Mixed-criticality scheduling is a relatively new endeavor that differentiates tasks based on their criticalities for efficient utilization of computing resources while guaranteeing that all critical tasks meet their deadlines [6] [7] [11]. Vestal initiated MC scheduling by proposing a formalism for a multi-criticality task model and conducting fixed-priority response-time analysis for priority assignment [31] [9]. AMC [11] is an implementation scheme for uniprocessor scheduling of MC systems. Under AMC, when the system's mode is switched to the HI-crit mode, all LO-crit tasks are abandoned forever. Akita is an SMP scheduler for virtualized clouds; Akita's CPUs alternate between HI-crit and LO-crit modes, and Akita uses a mode-switching mechanism to return to the LO-crit mode when conditions are appropriate.

EDF-VD [8] is an MC scheduling algorithm for implicit-deadline sporadic task systems that modifies the standard EDF algorithm for MC priority assignment. EDF-VD shrinks the deadlines of HI-crit tasks proportionally so that the HI-crit tasks are promoted by the EDF scheduler. Akita leverages EDF-VD to determine the order of execution of vCPUs. In Akita, unlike EDF-VD, pCPUs in the HI-crit mode are reset to the LO-crit mode when HI-crit vCPUs no longer need their pessimistic CPU shares. Further, Akita reduces the power consumption of cloud data centers by using a first-fit mechanism for allocating CPU cores while delivering the expected QoS.
Akita is a CPU scheduler for virtualized clouds. Akita's main goal is to offer predictable IO even at high utilization of processor resources. To this end, it first characterizes VMs based on their CPU and IO requirements. It then categorizes running VMs into HI-crit and LO-crit VMs. Akita ensures predictable performance for HI-crit VMs even when they are consolidated with other VMs, which may come at the cost of temporarily slowing down the LO-crit VMs. This allows the coexistence of HI-crit and LO-crit VMs on the same machine, which notably enhances the utilization of cloud data centers. Experiments with a prototype implementation of Akita demonstrate that a Memcached server VM, as a HI-crit VM, delivers an intact and predictable performance while running alongside several LO-crit CPU-bound VMs.
References

[1] Xen Credit scheduler. Available: http://wiki.xenproject.org/wiki/Credit_Scheduler. Last accessed: May 2018.
[2] G. Ajay, A. Merchant, and P. J. Varman. mClock: Handling throughput variability for hypervisor IO scheduling. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, 2010.
[3] E. Asyabi, S. SanaeeKohroudi, M. Sharifi, and A. Bestavros. TerrierTail: Mitigating tail latency of cloud virtual machines. IEEE Transactions on Parallel and Distributed Systems, 29(10):2346–2359, Oct 2018.
[4] E. Asyabi, E. Sharafzadeh, S. SanaeeKohroudi, and M. Sharifi. CTS: An operating system CPU scheduler to mitigate tail latency for latency-sensitive multi-threaded applications. Journal of Parallel and Distributed Computing, 133:232–243, 2019.
[5] P. Barham, B. Dragovic, and K. Fraser. Xen and the art of virtualization. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP), pages 164–177, 2003.
[6] S. Baruah, V. Bonifaci, G. D'Angelo, H. Li, A. Marchetti-Spaccamela, N. Megow, and L. Stougie. Scheduling real-time mixed-criticality jobs. IEEE Transactions on Computers, 61(8):1140–1152, Aug 2012.
[7] S. Baruah, V. Bonifaci, G. D'Angelo, H. Li, A. Marchetti-Spaccamela, S. van der Ster, and L. Stougie. The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task systems. Pages 145–154, July 2012.
[8] S. K. Baruah, V. Bonifaci, G. D'Angelo, A. Marchetti-Spaccamela, S. Van Der Ster, and L. Stougie. Mixed-criticality scheduling of sporadic task systems. In ESA, pages 555–566. Springer, 2011.
[9] S. K. Baruah, A. Burns, and R. I. Davis. Response-time analysis for mixed criticality systems. Pages 34–43, Nov 2011.
[10] A. Beloglazov and R. Buyya. Managing overloaded PMs for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints. IEEE Transactions on Parallel and Distributed Systems, 24(7):1366–1379, 2013.
[11] A. Burns and S. Baruah. Towards a more practical model for mixed criticality systems. In Workshop on Mixed-Criticality Systems (colocated with RTSS), 2013.
[12] R. C. Chiang, J. Hwang, H. H. Huang, et al. Matrix: Achieving predictable virtual machine performance in the clouds. Pages 45–56, 2014.
[13] S. Gamage, C. Xu, R. R. Kompella, and D. Xu. vPipe: Piped I/O offloading for efficient data movement in virtualized clouds. In Proceedings of the ACM Symposium on Cloud Computing, pages 1–13, 2014.
[14] C. Gough, I. Steiner, and W. Saunders. Energy Efficient Servers: Blueprints for Data Center Optimization. Apress, Apr. 2015.
[15] K. Hazelwood, S. Bird, D. Brooks, S. Chintala, U. Diril, D. Dzhulgakov, M. Fawzy, B. Jia, Y. Jia, A. Kalro, et al. Applied machine learning at Facebook: A datacenter infrastructure perspective. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 620–629. IEEE, 2018.
[16] C. Iorgulescu, R. Azimi, Y. Kwon, S. Elnikety, M. Syamala, V. Narasayya, H. Herodotou, P. Tomita, A. Chen, J. Zhang, and J. Wang. PerfIso: Performance isolation for commercial latency-sensitive services. Pages 519–532, Boston, MA, 2018. USENIX Association.
[17] K. Jang, J. Sherry, H. Ballani, and T. Moncaster. Silo: Predictable message latency in the cloud. SIGCOMM Computer Communication Review, 45(4):435–448, Aug. 2015.
[18] S. Kanev, K. Hazelwood, G. Wei, and D. Brooks. Tradeoffs between power management and tail latency in warehouse-scale applications. Pages 31–40, Oct. 2014.
[19] J. Leverich and C. Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, pages 4:1–4:14, New York, NY, USA, 2014. ACM.
[20] C. Li, I. Goiri, A. Bhattacharjee, et al. Quantifying and improving I/O predictability in virtualized systems. In IEEE/ACM 21st International Symposium on Quality of Service (IWQoS), 2013.
[21] H. Li and S. Baruah. An algorithm for scheduling certifiable mixed-criticality sporadic task systems. Pages 183–192, Nov 2010.
[22] R. Nathuji, A. Kansal, and A. Ghaffarkhah. Q-Clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of the 5th European Conference on Computer Systems, pages 237–250, 2010.
[23] N. R. Herbst, S. Kounev, and R. Reussner. Elasticity in cloud computing: What it is, and what it is not. In USENIX 10th International Conference on Autonomic Computing (ICAC 2013), pages 23–27, 2013.
[24] D. Ongaro, A. L. Cox, and S. Rixner. Scheduling I/O in virtual machine monitors. In Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 1–10, 2008.
[25] A. Ousterhout, J. Fried, J. Behrens, A. Belay, and H. Balakrishnan. Shenango: Achieving high CPU efficiency for latency-sensitive datacenter workloads. Pages 361–378, Boston, MA, 2019. USENIX Association.
[26] G. Prekas, M. Primorac, A. Belay, C. Kozyrakis, and E. Bugnion. Energy proportionality and workload consolidation for latency-critical applications. In Proceedings of the Sixth ACM Symposium on Cloud Computing, SoCC '15, pages 342–355, New York, NY, USA, 2015. ACM.
[27] H. Qin, Q. Li, J. Speiser, P. Kraft, and J. Ousterhout. Arachne: Core-aware thread management. Pages 145–160, Carlsbad, CA, 2018. USENIX Association.
[28] R. Sen and D. A. Wood. Energy-proportional computing: A new definition. IEEE Computer, 2017.
[29] D. Shue, M. J. Freedman, and A. Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In OSDI '12: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, pages 349–362, 2012.
[30] A. Sriraman and T. F. Wenisch. μTune: Auto-tuned threading for OLDI microservices. Pages 177–194, 2018.
[31] S. Vestal. Preemptive scheduling of multi-criticality systems with varying degrees of execution time assurance. Pages 239–243, Dec 2007.
[32] S. Xi, J. Wilson, et al. RT-Xen: Towards real-time hypervisor scheduling in Xen. In Proceedings of the IEEE International Conference on Embedded Software (EMSOFT), pages 39–48, 2011.
[33] C. Xu, S. Gamage, and H. Lu. vTurbo: Accelerating virtual machine I/O processing using designated Turbo-Sliced core. In USENIX ATC '13: Proceedings of the 2013 USENIX Annual Technical Conference, pages 243–254, 2013.
[34] C. Xu, S. Gamage, P. N. Rao, et al. vSlicer: Latency-aware virtual machine scheduling via differentiated-frequency CPU slicing. In Proceedings of the 21st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC '12), pages 3–14, 2012.
[35] Y. Xu, Z. Musgrave, B. Noble, and M. Bailey. Bobtail: Avoiding long tails in the cloud. In NSDI, volume 13, pages 329–342, 2013.
[36] S. Zhuravlev, J. C. Saez, S. Blagodurov, A. Fedorova, and M. Prieto. Survey of scheduling techniques for addressing shared resources in multicore processors.